New York Investment Network



A Classification Framework for Bayesian Angel Investing [Bayesian Inference]
Posted on April 4, 2013 @ 11:30:00 AM by Paul Meagher

This blog is a follow-up to yesterday's blog introducing the idea of Bayesian Angel Investing.

In 2004 I wrote 3 articles for IBM developerWorks on Bayesian inference and developed PHP-based code to explore the topic with. I'd like to follow up on some of that work by exploring how Bayesian inference might be applied to Angel Investing.

It is hard to pick a starting point for this investigation. I thought the best way to begin would be to give a quick demo of how to use a ClassifierDiagnostics.php class I developed to analyze the relationship between two binary-valued variables (a "test" variable and a "classification" variable). Doing so will introduce you to many concepts, calculations, and stats you should be familiar with if you want to apply Bayesian inference to Angel Investing.

The two variables we will be analyzing in the demo code below are a "Business Plan Quality" test variable and a "Successful Company" classification variable. The data we will be inputting to our software for analysis will consist of a binary rating of Business Plan Quality (0=Fail, 1=Pass) and a binary rating for the Successful Company variable (0=Not Successful, 1=Successful). Each of the four $data records below corresponds to an observation of one startup company: its Business Plan Quality rating and its eventual success or failure. One question to investigate is whether the Business Plan Quality measurement should be used as a "test" for diagnosing whether a startup company will be successful or not.

Without further ado, here is the source code for the business_plan_and_success.php demo script which invokes input, analysis, and output functions supplied by the ClassifierDiagnostics.php class.

<?php

/**
* business_plan_and_success.php
*
* Compute joint frequency and joint probability of two 
* variables: business plan quality (0=Fail, 1=Pass) and 
* company success (0=No, 1=Yes). Displays joint 
* frequency table, joint probability table, and various
* diagnostic statistics about the relationship between 
* the variables.
*/

require_once "ClassifierDiagnostics.php";

$data[0] = array("1", "1"); // Startup 1: BizPlan=Pass, Success=Yes
$data[1] = array("0", "0"); // Startup 2: BizPlan=Fail, Success=No
$data[2] = array("1", "1"); // Startup 3: BizPlan=Pass, Success=Yes
$data[3] = array("0", "1"); // Startup 4: BizPlan=Fail, Success=Yes

$classifier = new ClassifierDiagnostics($data);

$classifier->setRowName("Business Plan");
$classifier->setRowTrue("Pass");
$classifier->setRowFalse("Fail");

$classifier->setColumnName("Successful Company");
$classifier->setColumnTrue("Yes");
$classifier->setColumnFalse("No");

$classifier->showCrossTabs();
$classifier->showStats();

?>

Below is the output generated by running the demo script. The first set of tables below are the joint frequency and joint probability tables. Underneath these tables are various diagnostic stats that can be used to assess the quality of your "test" variable (i.e., Business Plan Quality) in classifying a startup as being successful or not.

Joint frequency table

                         Successful Company
                         Yes          No
Business Plan    Pass    2 (TP)       0 (FP)
                 Fail    1 (FN)       1 (TN)

Joint probability table

                         Successful Company
                         Yes          No
Business Plan    Pass    0.67 (TP)    0.00 (FP)
                 Fail    0.33 (FN)    1.00 (TN)

Diagnostic stats

Test Sensitivity (TP)      0.67
False Alarm Rate (FP)      0.00
Miss Rate (FN)             0.33
Test Specificity (TN)      1.00
Base Rate                  0.75
P(+Test)                   0.50
P(-Test)                   0.50
P(+Class | +Test)          1.00
P(-Class | +Test)          0.00
P(+Class | -Test)          0.50
P(-Class | -Test)          0.50
Likelihood Ratio(+Test)    0.00
Likelihood Ratio(-Test)    0.33
Accuracy                   0.75
Gain                       1.33
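
To make the arithmetic behind these figures concrete, here is a minimal sketch that derives most of them directly from the four cell counts (TP=2, FP=0, FN=1, TN=1). The function name diagnostics_from_counts and the array keys are hypothetical and are not part of ClassifierDiagnostics.php. The sketch also assumes that every row and column total is non-zero. The last entry recomputes P(+Class | +Test) with Bayes' theorem (sensitivity times base rate, divided by P(+Test)) as a cross-check on the direct calculation.

```php
<?php
// Hypothetical helper (not part of ClassifierDiagnostics.php): derive the
// main diagnostic stats from the four cell counts of a 2x2 table.
// Assumes every row total and column total is non-zero.
function diagnostics_from_counts($tp, $fp, $fn, $tn) {
    $n = $tp + $fp + $fn + $tn;
    $stats = array(
        // rates conditioned on the true class (column totals)
        'sensitivity' => $tp / ($tp + $fn),
        'miss_rate'   => $fn / ($tp + $fn),
        'false_alarm' => $fp / ($fp + $tn),
        'specificity' => $tn / ($fp + $tn),
        // unconditional proportions
        'base_rate'   => ($tp + $fn) / $n,   // P(+Class)
        'p_pos_test'  => ($tp + $fp) / $n,   // P(+Test)
        'accuracy'    => ($tp + $tn) / $n,
        // posterior computed directly from the +Test row
        'posterior'   => $tp / ($tp + $fp),  // P(+Class | +Test)
    );
    // gain: how much a passing test raises the probability of success
    $stats['gain'] = $stats['posterior'] / $stats['base_rate'];
    // same posterior via Bayes' theorem, as a cross-check
    $stats['bayes_posterior'] =
        $stats['sensitivity'] * $stats['base_rate'] / $stats['p_pos_test'];
    return $stats;
}

// Cell counts taken from the joint frequency table above.
$stats = diagnostics_from_counts(2, 0, 1, 1);
foreach ($stats as $name => $value) {
    printf("%-16s %01.2f\n", $name, $value);
}
```

Run against the demo counts, the sensitivity, base rate, accuracy, and gain come out as 0.67, 0.75, 0.75, and 1.33, and the Bayes' theorem route gives the same posterior of 1.00 as the direct row calculation.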

I'll return to discussing some of the stats being reported here in a later blog. For now, I'd like to complete the technical part of the demo by showing you the source code for the ClassifierDiagnostics.php object. If you put the ClassifierDiagnostics.php object in the same PHP-enabled folder as the business_plan_and_success.php demo script, then point your browser at the demo script, you will see the output above.

<?php
/**
* @package ClassifierDiagnostics
* @author  Paul Meagher <paul@datavore.com> 
* @license PHP License v3.0    
* @version 0.2
*
* The primary references I used when developing this class were:
*
* @see http://hippocrates.ouhsc.edu/cdmtutor/2x2/2x2tut2.html
* @see http://www.musc.edu/dc/icrebm/sensitivity.html
*
* Sheskin, David. (2004) Handbook of Parametric and Nonparametric 
* Statistical Procedures (pp. 245-333).
*/
class ClassifierDiagnostics {
  
  var $data          = array();
  var $joint_freq    = array();
  var $joint_prob    = array();
  var $col_marginals = array();
  var $row_marginals = array();

  // default labels for crosstab display
  var $row_name  = "test";
  var $col_name  = "class";
  var $row_true  = "+";
  var $row_false = "-";
  var $col_true  = "+";
  var $col_false = "-";
    
  
  /*
  * If a two column data matrix is supplied to the class, it will
  * proceed to compute various accuracy metrics from this data.
  * Otherwise, use the loadJointFrequency method to bypass having
  * to feed in raw data.
  *
  * Declared as __construct because PHP 8 no longer treats a method
  * named after the class as its constructor.
  */
  function __construct($data="empty") {
    if ($data != "empty") {

      $this->data = $data;

      // zero the cell counts
      $this->joint_freq[0][0] = 0; // True Negative  - TN
      $this->joint_freq[0][1] = 0; // False Negative - FN
      $this->joint_freq[1][0] = 0; // False Positive - FP
      $this->joint_freq[1][1] = 0; // True Positive  - TP

      // zero the corresponding cell probabilities
      $this->joint_prob[0][0] = 0;
      $this->joint_prob[0][1] = 0;
      $this->joint_prob[1][0] = 0;
      $this->joint_prob[1][1] = 0;

      $this->getJointFrequency();
      $this->getColumnMarginals();
      $this->getRowMarginals();
      $this->getJointProbability();

    }
  }
  
  
  /*
  * Load joint frequency distribution directly instead of
  * building it from supplied training data.
  */
  function loadJointFrequency($joint_freq) {
    $this->joint_freq = $joint_freq;
    $this->getColumnMarginals();
    $this->getRowMarginals();
    $this->getJointProbability();
  }
     
  
  /**
  * First index in joint_freq[t][c] matrix refers
  * to the test outcome while the second index refers
  * to the classification outcome.
  */
  function getJointFrequency() {
    $nrows = count($this->data);
    for ($i=0; $i < $nrows; $i++) {
      // tally true negatives (TN): -test AND -class (aka specificity of test)
      if ( ($this->data[$i][0] == 0) AND ($this->data[$i][1] == 0) ) {
        $this->joint_freq[0][0]++;
      }
      // tally false negatives (FN): -test AND +class
      if ( ($this->data[$i][0] == 0) AND ($this->data[$i][1] == 1) ) {
        $this->joint_freq[0][1]++;
      }
      // tally false positives (FP): +test AND -class
      if ( ($this->data[$i][0] == 1) AND ($this->data[$i][1] == 0) ) {
        $this->joint_freq[1][0]++;
      }
      // tally true positives (TP): +test AND +class (aka sensitivity of test)
      if ( ($this->data[$i][0] == 1) AND ($this->data[$i][1] == 1) ) {
        $this->joint_freq[1][1]++;
      }
    }
  }
  
  function getRowMarginals() {
    $this->row_marginals[0]  = $this->joint_freq[0][0] + $this->joint_freq[0][1];
    $this->row_marginals[1]  = $this->joint_freq[1][0] + $this->joint_freq[1][1];
  }

  function getColumnMarginals() {
    $this->col_marginals[0]  = $this->joint_freq[0][0] + $this->joint_freq[1][0];
    $this->col_marginals[1]  = $this->joint_freq[0][1] + $this->joint_freq[1][1];
  }

  function getJointProbability() {
    $this->joint_prob[0][0] = $this->joint_freq[0][0] / $this->col_marginals[0];
    $this->joint_prob[1][0] = $this->joint_freq[1][0] / $this->col_marginals[0];
    $this->joint_prob[0][1] = $this->joint_freq[0][1] / $this->col_marginals[1];
    $this->joint_prob[1][1] = $this->joint_freq[1][1] / $this->col_marginals[1];
  }
  
  function getTruePositiveRate() {
    return $this->joint_prob[1][1];
  }

  function getTrueNegativeRate() {
    return $this->joint_prob[0][0];
  }

  function getFalsePositiveRate() {
    return $this->joint_prob[1][0];
  }

  function getFalseNegativeRate() {
    return $this->joint_prob[0][1];
  }

  function getBaseRate() {
    return ($this->joint_freq[1][1] + $this->joint_freq[0][1]) / array_sum($this->col_marginals);
  }

  function getPosterior($row_status, $col_status) {
    if ($row_status == 1) {
      if ($col_status == 1) {
        return $this->joint_freq[1][1] / ( $this->joint_freq[1][1] + $this->joint_freq[1][0] );
      } else {
        return $this->joint_freq[1][0] / ( $this->joint_freq[1][1] + $this->joint_freq[1][0] );
      }
    } else {
      if ($col_status == 1) {
        return $this->joint_freq[0][1] / ( $this->joint_freq[0][1] + $this->joint_freq[0][0] );
      } else {
        return $this->joint_freq[0][0] / ( $this->joint_freq[0][1] + $this->joint_freq[0][0] );
      }
    }
  }

  function getRowProbability($row_status) {
    if ($row_status == 1) {
      return ($this->joint_freq[1][1] + $this->joint_freq[1][0]) / array_sum($this->row_marginals);
    } else {
      return ($this->joint_freq[0][1] + $this->joint_freq[0][0]) / array_sum($this->row_marginals);
    }
  }

  function getAccuracy() {
    return ($this->joint_freq[1][1] + $this->joint_freq[0][0]) / array_sum($this->row_marginals);
  }

  // Note: the denominator is zero whenever the matching -class cell
  // count is zero (as with FP in the demo data), so the ratio is
  // undefined in that case.
  function getLikelihoodRatio($row_status) {
    if ($row_status == 1) {
      $numerator   = $this->joint_freq[1][1] / $this->col_marginals[1];
      $denominator = $this->joint_freq[1][0] / $this->col_marginals[0];
      return $numerator / $denominator;
    } else {
      $numerator   = $this->joint_freq[0][1] / $this->col_marginals[1];
      $denominator = $this->joint_freq[0][0] / $this->col_marginals[0];
      return $numerator / $denominator;
    }
  }

  function getGain() {
    return $this->getPosterior(1, 1) / $this->getBaseRate();
  }

  function setRowName($row_name) {
    $this->row_name = $row_name;
  }

  function setColumnName($col_name) {
    $this->col_name = $col_name;
  }

  function setRowTrue($row_true) {
    $this->row_true = $row_true;
  }

  function setRowFalse($row_false) {
    $this->row_false = $row_false;
  }

  function setColumnTrue($col_true) {
    $this->col_true = $col_true;
  }

  function setColumnFalse($col_false) {
    $this->col_false = $col_false;
  }
  
  function showCrossTabs() {
    ?>
    <table cellpadding='15' align='center'>
      <tr>
        <td>
          <?php
          $this->showTable($this->joint_freq);
          ?>
        </td>
        <td>
          <?php
          $this->showTable($this->joint_prob, "%01.2f");
          ?>
        </td>
      </tr>
    </table>
    <?php
  }

  function showTable($matrix, $format="%u") {
    ?>
    <table border='1' cellspacing='1' cellpadding='8'>
      <tr> 
        <td rowspan='2' colspan='2'> &nbsp; </td>
        <td colspan='2' align='center'><b><?php echo $this->col_name ?></b></td>
      </tr>
      <tr> 
        <td align='center' height='20' bgcolor='silver'><?php echo $this->col_true ?></td>
        <td align='center' height='20' bgcolor='silver'><?php echo $this->col_false ?></td>
      </tr>
      <tr> 
        <td rowspan='2' align='center'><b><?php echo $this->row_name ?></b></td>
        <td align='center' width='20' bgcolor='silver'><?php echo $this->row_true ?></td>
        <td align='center'><?php printf($format, $matrix[1][1]); ?><br/>(TP)</td>
        <td align='center'><?php printf($format, $matrix[1][0]); ?><br/>(FP)</td>
      </tr>
      <tr> 
        <td align='center' bgcolor='silver'><?php echo $this->row_false ?></td>
        <td align='center'><?php printf($format, $matrix[0][1]); ?><br/>(FN)</td>
        <td align='center'><?php printf($format, $matrix[0][0]); ?><br/>(TN)</td>
      </tr>
    </table>  
    <?php
  }
  
  function showStats() {
    ?>
    <table align='center' cellpadding='5'>
     <tr bgcolor='silver'>
       <td>Test Sensitivity (TP)</td>
       <td><?php printf("%01.2f", $this->getTruePositiveRate()); ?></td>
     </tr>
     <tr bgcolor='silver'>
       <td>False Alarm Rate (FP)</td>
       <td><?php printf("%01.2f", $this->getFalsePositiveRate()); ?></td>
     </tr>
     <tr bgcolor='silver'>
       <td>Miss Rate (FN)</td>
       <td><?php printf("%01.2f", $this->getFalseNegativeRate()); ?></td>
     </tr>
     <tr bgcolor='silver'>
       <td>Test Specificity (TN)</td>
       <td><?php printf("%01.2f", $this->getTrueNegativeRate()); ?></td>
     </tr>
     <tr>
       <td>Base Rate</td>
       <td><?php printf("%01.2f", $this->getBaseRate()); ?></td>
     </tr>
     <tr>
       <td>P(+Test)</td>
       <td><?php printf("%01.2f", $this->getRowProbability(1)); ?></td>
     </tr>
     <tr>
       <td>P(-Test)</td>
       <td><?php printf("%01.2f", $this->getRowProbability(0)); ?></td>
     </tr>
     <tr>
       <td>P(+Class | +Test)</td>
       <td><?php printf("%01.2f", $this->getPosterior(1, 1)); ?></td>
     </tr>
     <tr>
       <td>P(-Class | +Test)</td>
       <td><?php printf("%01.2f", $this->getPosterior(1, 0)); ?></td>
     </tr>
     <tr>
       <td>P(+Class | -Test)</td>
       <td><?php printf("%01.2f", $this->getPosterior(0, 1)); ?></td>
     </tr>
     <tr>
       <td>P(-Class | -Test)</td>
       <td><?php printf("%01.2f", $this->getPosterior(0, 0)); ?></td>
     </tr>
     <tr>
       <td>Likelihood Ratio(+Test)</td>
       <td><?php printf("%01.2f", $this->getLikelihoodRatio(1)); ?></td>
     </tr>
     <tr>
       <td>Likelihood Ratio(-Test)</td>
       <td><?php printf("%01.2f", $this->getLikelihoodRatio(0)); ?></td>
     </tr>
     <tr>
       <td>Accuracy</td>
       <td><?php printf("%01.2f", $this->getAccuracy()); ?></td>
     </tr>
     <tr>
       <td>Gain</td>
       <td><?php printf("%01.2f", $this->getGain()); ?></td>
     </tr>
    </table>
    <?php
  }

}
?>
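
One detail in the class is worth a closer look. The likelihood ratio for a positive test divides the true positive rate by the false positive rate, and in the demo data the false positive count is zero, so that division has a zero denominator. The 0.00 reported for Likelihood Ratio(+Test) in the output above is probably a by-product of that division rather than a meaningful ratio, and on PHP 8 a division by zero throws a DivisionByZeroError instead. Below is a minimal sketch of one way to guard the calculation. The function name likelihood_ratio and the choice of returning null for an undefined ratio are my own assumptions and are not part of ClassifierDiagnostics.php. The $freq array uses the same [test][class] layout as the class's joint_freq matrix.

```php
<?php
// Hypothetical helper, not part of ClassifierDiagnostics.php.
// $freq uses the same [test][class] layout as joint_freq:
// [1][1]=TP, [1][0]=FP, [0][1]=FN, [0][0]=TN.
// Returns null (an assumed convention) when the ratio is undefined.
function likelihood_ratio(array $freq, $row_status) {
    $positives = $freq[0][1] + $freq[1][1]; // column total for +class
    $negatives = $freq[0][0] + $freq[1][0]; // column total for -class
    if ($positives == 0 || $negatives == 0) {
        return null; // no rate can be formed without cases in both classes
    }
    $t = ($row_status == 1) ? 1 : 0;
    $numerator   = $freq[$t][1] / $positives; // P(test outcome | +class)
    $denominator = $freq[$t][0] / $negatives; // P(test outcome | -class)
    if ($denominator == 0) {
        return null; // ratio is undefined when the denominator is zero
    }
    return $numerator / $denominator;
}

// Cell counts from the demo data.
$freq = array(
    0 => array(0 => 1, 1 => 1), // Fail row: TN=1, FN=1
    1 => array(0 => 0, 1 => 2), // Pass row: FP=0, TP=2
);
var_dump(likelihood_ratio($freq, 1)); // undefined for the demo data
var_dump(likelihood_ratio($freq, 0)); // (1/3) / (1/1)
```

With the demo counts the positive-test ratio comes back as null, flagging it as undefined, while the negative-test ratio is one third, which agrees with the 0.33 in the output above.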
