perClass Documentation
development version 3.2 (14-Mar-2012)

Chapter 9: Making decisions

9.1. Introduction

Making decisions is the fundamental raison d'être of a pattern recognition system. However, the majority of statistical classifiers do not produce decisions directly. Instead, they deliver soft outputs such as the estimated posterior probability of class membership, a confidence level or the distance to a decision boundary. This section introduces a comprehensive set of perClass tools for converting soft outputs into crisp decisions. Understanding how decisions are made is necessary in order to optimize the performance of a pattern recognition system.

We can distinguish two fundamentally different decision making strategies:

  • Thresholding-based decisions are used when a single concept of interest (target) is to be detected. To decide whether an observation does or does not correspond to the target concept, some form of target-related similarity (soft output) is computed and thresholded. If the similarity value lies above the pre-specified threshold, the observation is labeled as "target", otherwise as "non-target". This decision-making strategy is applied in two-class classification and in detection (sometimes called distance-based rejection or one-class classification).

  • Weighting-based decisions are adopted for selecting one of multiple classes. The classifier provides a set of comparable outputs, each related to a specific class. In the simplest scenario, the output with the maximum value defines the decision. Note that this strategy assumes that all classes are equally important. In many situations, however, some classes carry higher misclassification costs than others. For example, in cancer diagnostics, labeling an ill patient as healthy has a much higher human cost than the opposite error, which can easily be discovered by follow-up analysis. Cost-sensitive decisions are accomplished by weighting the soft outputs before the maximum operation.
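The two strategies can be sketched in a few lines of NumPy. This is not perClass code; the function names `decide_threshold` and `decide_weighted` are hypothetical, and resolving a value exactly at the threshold as "non-target" is our own assumption.

```python
import numpy as np

def decide_threshold(soft, threshold, polarity="similarity"):
    """Thresholding-based decision: True means 'target', False 'non-target'.

    With similarity polarity the target is declared above the threshold,
    with distance polarity below it. Ties go to 'non-target' (assumption).
    """
    soft = np.asarray(soft, dtype=float)
    if polarity == "similarity":
        return soft > threshold
    if polarity == "distance":
        return soft < threshold
    raise ValueError("polarity must be 'similarity' or 'distance'")

def decide_weighted(soft, weights):
    """Weighting-based decision: index of the largest weighted soft output."""
    soft = np.asarray(soft, dtype=float)        # shape (n_samples, n_classes)
    weights = np.asarray(weights, dtype=float)  # shape (n_classes,)
    return np.argmax(soft * weights, axis=1)

# Posterior-like outputs for three samples and two classes
out = np.array([[0.9, 0.1],
                [0.6, 0.4],
                [0.3, 0.7]])
print(decide_weighted(out, [0.5, 0.5]))  # equal weights: [0 0 1]
print(decide_weighted(out, [0.2, 0.8]))  # second class emphasized: [0 1 1]
```

Raising the weight of the second class flips the decision for the ambiguous middle sample, which is exactly the cost-sensitive effect described above.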

Note that, in general, the soft-output weights are not identical to class priors. This analogy holds only when the classifier is trained on balanced data with equal class sizes. Otherwise, the weights do not directly correspond to the test-set priors but rather compensate for the original class imbalance.
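A standard prior-shift argument makes this explicit. It assumes a probabilistic classifier whose posteriors were estimated with training priors $\pi_i^{\mathrm{tr}}$, and class-conditional densities $p(x \mid \omega_i)$ that are the same in training and test data; it is an illustration, not a description of perClass internals:

```latex
p_{\mathrm{tr}}(\omega_i \mid x)
  = \frac{\pi_i^{\mathrm{tr}}\, p(x \mid \omega_i)}{\sum_j \pi_j^{\mathrm{tr}}\, p(x \mid \omega_j)},
\qquad
p_{\mathrm{ts}}(\omega_i \mid x)
  \propto \pi_i^{\mathrm{ts}}\, p(x \mid \omega_i)
  \propto \frac{\pi_i^{\mathrm{ts}}}{\pi_i^{\mathrm{tr}}}\, p_{\mathrm{tr}}(\omega_i \mid x)
\quad\Longrightarrow\quad
w_i = \frac{\pi_i^{\mathrm{ts}}}{\pi_i^{\mathrm{tr}}}
```

Only for balanced training data ($\pi_i^{\mathrm{tr}} = 1/C$) are the weights $w_i$ proportional to the test priors $\pi_i^{\mathrm{ts}}$; otherwise the factor $1/\pi_i^{\mathrm{tr}}$ undoes the training imbalance.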

The decisions are performed at a specific operating point. In the common view, the operating point is the threshold value or the set of weights. In perClass, however, the term operating point refers to the complete body of information required to return decisions on the output of a given classifier in a specific problem. In addition to the threshold or weights, the operating point in perClass also encapsulates the polarity of the classifier output (similarity or distance) and the list of decisions. The polarity makes it possible to perform decisions also for classifiers returning distances, such as the nearest neighbor rule.

9.2. Default operating point

In Chapter 7 we saw that a model trained on a data set provides soft outputs. For a probabilistic classifier, the soft outputs may be interpreted as the confidence that a data point belongs to each of the class models. To convert the soft outputs into decisions, we need to set an operating point. The default operating point uses equal weights for the soft outputs (each class is considered equally important) and assigns each sample to the class with the highest confidence. This is achieved by the sddecide command. Let us first visualize the soft outputs of a probabilistic model:

>> load fruit      
>> a               
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 
>> p=sdgauss(a)   %  train a Gaussian model on each of the three classes   
Gaussian model pipeline 2x3  3 classes, 3 components (sdp_normal)
>> sdscatter(a,p) %  visualize the soft-outputs  

The figures below visualize the soft-outputs: one Gaussian for each class.

The sddecide command specifies that the pipeline returns decisions at the default operating point. The decisions of the resulting classifier are visualized with sdscatter.

>> pd=sddecide(p) %  add the decision step to the pipeline p 
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x3  3 classes, 3 components (sdp_normal)
 2  Decision                3x1  weighting, 3 classes, 1 ops at op 1 (sdp_decide)
>> sdscatter(a,pd) 
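The idea behind this session (one Gaussian per class, then equal-weight maximum) can be sketched in NumPy. This is a hypothetical re-implementation for illustration only. It fits a full-covariance Gaussian per class and uses the class-conditional densities as soft outputs. It does not reproduce what sdgauss or sddecide do internally.

```python
import numpy as np

def fit_gaussians(X, y):
    """Fit one full-covariance Gaussian (mean, covariance) per class label."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        models[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return models

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def soft_outputs(models, X):
    """One column of class-conditional densities per class (the soft outputs)."""
    classes = sorted(models)
    return classes, np.column_stack([gaussian_pdf(X, *models[c]) for c in classes])

def default_decide(soft):
    """Default operating point: equal weights, pick the largest soft output."""
    return np.argmax(soft, axis=1)

# Two well-separated synthetic classes in 2D
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 1, (100, 2)), rng.normal([5, 5], 1, (100, 2))])
y = np.repeat([0, 1], 100)
classes, soft = soft_outputs(fit_gaussians(X, y), np.array([[0.0, 0.0], [5.0, 5.0]]))
dec = default_decide(soft)   # each query point goes to the class centered on it
```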

9.2.1. Visualization of decisions in multi dimensional feature space

The example above illustrates how to visualize the classifier decisions in a 2D feature space. When the data has more dimensions, this visualization is more complex, because a 2D view shows only a small part of a larger space. It is still possible to visualize the decisions, but only with respect to a specific data sample. Given a feature space of dimensionality D (with D>2), sdscatter visualizes the classifier decisions for two features by fixing the other dimensions to the values of the point selected with the cursor. Clicking on a different sample shows a different plane of the decision space, determined by the new point (cyan square marker). In the example below, a Gaussian classifier is trained on a 10-dimensional feature space. The classifier decisions are visualized for features 2 and 8 for 4 different samples. The sample chosen as reference is highlighted in black.

Each reference sample defines a different decision plane and therefore shows different classifier decisions. The two visualized features can be changed using the up/down arrow keys.

This visualization gives us a better feeling for how complex the decision boundary is and what we can expect in the close neighborhood of the selected sample.
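The slicing principle can be sketched as follows. `decision_slice` is a hypothetical helper, not part of perClass. It varies two features over a grid and fixes all remaining features to the values of a reference sample, then evaluates any decision function on the resulting points. Feature indices are 0-based here, so features 2 and 8 of the text become 1 and 7.

```python
import numpy as np

def decision_slice(decide, reference, fi, fj, grid_i, grid_j):
    """Evaluate a classifier on a 2D slice of a D-dimensional feature space.

    Features fi and fj vary over grid_i x grid_j; every other feature is
    fixed to its value in the reference sample.
    """
    reference = np.asarray(reference, dtype=float)
    gi, gj = np.meshgrid(grid_i, grid_j, indexing="ij")
    points = np.tile(reference, (gi.size, 1))  # (n_points, D) copies of the reference
    points[:, fi] = gi.ravel()
    points[:, fj] = gj.ravel()
    return decide(points).reshape(gi.shape)

# Toy 10D "classifier": class 1 if the sum of all features is positive
toy = lambda P: (P.sum(axis=1) > 0).astype(int)
grid = np.linspace(-1, 1, 5)
ref_a = np.zeros(10)
ref_b = np.zeros(10)
ref_b[0] = 5.0                                       # differs only in feature 0
slice_a = decision_slice(toy, ref_a, 1, 7, grid, grid)  # mixed decisions
slice_b = decision_slice(toy, ref_b, 1, 7, grid, grid)  # all class 1
```

The two reference samples differ only in a feature that is not displayed, yet the slices differ. This mirrors why clicking on another sample in sdscatter changes the displayed decisions.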

9.3. Defining operating points

perClass handles decisions only through objects of the sdops (operating point set) class. An sdops object contains everything needed to turn the classifier soft output into crisp decisions. An operating point is defined by:

  • a threshold or a set of weights,
  • the polarity of classifier soft output (similarity or distance),
  • an sdlist object storing the decision names.
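As a mental model, the three ingredients can be bundled into one small data structure. `OperatingPointSet` and its methods are hypothetical and only loosely mirror the sdops/sdlist functions used in this chapter. Indices are 0-based as in Python, whereas perClass uses 1-based indices. The convention that the first name is the "target" decision of a threshold set is our own assumption.

```python
from dataclasses import dataclass, replace
import numpy as np

@dataclass
class OperatingPointSet:
    """Hypothetical stand-in for an operating point set (not the perClass class)."""
    kind: str                      # "w" (weights) or "thr" (thresholds)
    data: np.ndarray               # (n_ops, n_classes) weights or (n_ops,) thresholds
    names: list                    # decision names, e.g. ["apple", "banana"]
    polarity: str = "similarity"   # or "distance"
    curop: int = 0                 # index of the current (default) operating point

    def ind2name(self, ind):
        return self.names[ind]

    def name2ind(self, name):
        return self.names.index(name)

    def with_curop(self, ind):
        """Return a copy with a different current operating point."""
        return replace(self, curop=ind)

    def decide(self, soft):
        """Per-sample decision indices at the current operating point."""
        soft = np.asarray(soft, dtype=float)
        if self.kind == "w":
            w = np.asarray(self.data, dtype=float)[self.curop]
            return np.argmax(soft * w, axis=1)
        thr = np.asarray(self.data, dtype=float)[self.curop]
        out = soft.ravel()
        is_target = out < thr if self.polarity == "distance" else out > thr
        return np.where(is_target, 0, 1)   # 0 = first name (target), 1 = second

ops = OperatingPointSet("w", np.array([[0.5, 0.5], [0.2, 0.8], [0.7, 0.3]]),
                        ["apple", "banana"])
ops3 = ops.with_curop(2)               # weights [0.7, 0.3] become current
```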

Let's consider a two-class problem:

>> load fruit; a=a(:,:,[1,2])
200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100) 

We create a set of three weighting-based operating points. Note that we must provide the decision names (the label list of the data set a), because these are also stored inside the sdops object.

>> ops=sdops('w',[0.5 0.5;0.2 0.8; 0.7 0.3],a.lab.list)
Weight-based operating set (3 ops, 2 classes) at op 1

The content of an sdops object can be inspected either with the dot notation or with the getdata function:

>> size(ops)

ans =

     3     2

>> ops.data

ans =

    0.5000    0.5000
    0.2000    0.8000
    0.7000    0.3000

>> getdata(ops)   %  provides the list of ops 

ans =

    0.5000    0.5000
    0.2000    0.8000
    0.7000    0.3000

>> ops.data(3,:)

ans =

    0.7000    0.3000    

>> getdata(ops,3)   %  provides the weights of a specific operating point

ans =

    0.7000    0.3000

Similarly, we can access the list directly with the dot notation or with the getlist command:

>> ops.list
sdlist (2 entries)
 ind name
   1 apple 
   2 banana     

>> getlist(ops)
sdlist (2 entries)
 ind name
   1 apple 
   2 banana       

Conversions between decision names and their indices in the list may be performed using ind2name and name2ind:

>> ind2name(ops.list,2)   %  provides the label name of the second class

ans =

banana

>> name2ind(ops.list,'banana')    %  specifies the index for class 'banana'

ans =

     2  

9.4. Using operating points

An sdops object always designates one operating point in the set as the current (default) one. Its index may be retrieved using getcurop:

>> getcurop(ops)

ans =

     1

A different operating point may be selected as current using setcurop:

>> ops=setcurop(ops,3)
Weight-based operating set (3 ops, 2 classes) at op 3

9.4.1. Example on weighting-based decision

Let's estimate the output of a classifier on a data set. We train a linear discriminant and estimate its output on the first 5 samples of our training set.

We then convert the data set out into a matrix data and perform decisions on this matrix:

>> p=sdlinear(a);
>> out=a(1:5)*p
'Fruit set' 5 by 2 sddata, class: 'apple'
>> data=double(out)

data =

    0.8885    0.1115
    0.9189    0.0811
    0.5470    0.4530
    0.9923    0.0077
    0.9876    0.0124

The decide function returns two outputs: the per-sample decisions and an sdlist object with all decisions that ops is capable of:

>> [dec,list]=decide(ops,data)

dec =

    1
    1
    1
    1
    1

sdlist (2 entries)
 ind name
   1 apple 
   2 banana

Note that we may perform decisions directly on raw numerical data, because the operating point object contains all the information necessary to make decisions. The decision vector dec holds the numerical codes of the decisions, i.e. their indices in the returned list.

If the decide function is applied to an sddata data set, it returns an sdlab object with the decisions.

>> [dec,list]=decide(ops,out)
sdlab with 5 entries from 'apple'
sdlist (2 entries)
 ind name
   1 apple 
   2 banana

Note the difference between the list returned as the second output of decide and the list stored in dec:

>> dec.list
sdlist (1 entries)
 ind name
   1 apple 

dec is an sdlab object, and therefore its list contains only the entries that actually occur. In our example, this is only apple, because there was no banana decision. The list returned as the second output of decide describes which decisions ops can make. That is why it contains both the apple and the banana entries.
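The distinction between "decisions the operating point can make" and "decisions that actually occurred" can be illustrated in plain Python. `decision_lists` is a hypothetical helper with 0-based indices. Keeping the order of the full list for the occurring entries is our own choice, not documented perClass behavior.

```python
def decision_lists(dec_idx, possible):
    """Separate what an operating point can decide from what it did decide.

    possible: names of all decisions the operating point can make
    dec_idx:  per-sample decision indices (0-based) into possible
    Returns (possible, present); present keeps the order of possible but
    lists only the names that occur among the decisions.
    """
    used = set(int(i) for i in dec_idx)
    present = [name for i, name in enumerate(possible) if i in used]
    return list(possible), present

# The five samples above were all decided as 'apple'
all_names, seen = decision_lists([0, 0, 0, 0, 0], ["apple", "banana"])
```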

9.4.2. Example on thresholding-based decision

We will create a simple distance-based detector. We select one sample from the apple class and compute the squared Euclidean distances to this point using sdprox:

>> proto=a(10)
'Fruit set' 1 by 2 sddata, class: 'apple'
>> d=a*sdprox(proto)
'Fruit set' 200 by 1 sddata, 2 classes: 'apple'(100) 'banana'(100) 
>> d.featlab
sdlab with one entry: 'apple'

The data set d stores the distances of the samples to the chosen prototype (sample number 10, from the class apple). These distances may be used directly to make decisions with the thresholding approach. Samples whose distance to the prototype is below the threshold are assigned to the apple class, all others to the banana class (we assume only these two classes exist in our universe).

We use the bin centers of a histogram of the distances as meaningful threshold values:

>> [h,x]=hist(+d,30);

>> ops=sdops('thr',x,d.lab.list,'polarity','distance')
Thr-based operating set (30 ops) at op 1, distance    

The first threshold is the current operating point of the detector. We can verify that the prototype itself (the 10th sample) is classified as apple:

>> dec=decide(ops,d(1:10))
sdlab with 10 entries, 2 groups: 'apple'(2) 'banana'(8) 
>> +dec

ans =

banana
apple 
banana
banana
banana
banana
banana
banana
banana
apple 

To understand the decisions, let's display the threshold used and the raw distances:

>> ops.data(ops.curop)

ans =

5.1860

>> +d(1:10)

ans =

   83.5997
    0.0976
  110.1402
   47.6317
   13.1391
  136.8932
   80.0115
  101.9587
   20.8805
         0

Samples with a distance smaller than 5.1860 were labeled as apple (the class of the prototype object); the remaining samples were labeled as banana.
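The same detector can be sketched in NumPy on synthetic data. The helper names are hypothetical. `histogram_thresholds` takes the bin centers of `np.histogram` over the data range, which we assume is a close equivalent of the centers returned by MATLAB's `hist`. Distance polarity means "below the threshold is target".

```python
import numpy as np

def squared_distances(X, prototype):
    """Squared Euclidean distance of every row of X to the prototype."""
    diff = np.asarray(X, dtype=float) - np.asarray(prototype, dtype=float)
    return np.sum(diff ** 2, axis=1)

def histogram_thresholds(distances, bins=30):
    """Candidate thresholds at the bin centers of a histogram over [min, max]."""
    _, edges = np.histogram(distances, bins=bins)
    return (edges[:-1] + edges[1:]) / 2

def detect(distances, threshold, target="apple", other="banana"):
    """Distance polarity: below the threshold -> target, otherwise -> other."""
    return np.where(np.asarray(distances) < threshold, target, other)

# Synthetic two-class data; the prototype is the 10th sample (index 9)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])
d = squared_distances(X, X[9])
thresholds = histogram_thresholds(d, 30)
labels = detect(d, thresholds[0])   # first threshold as the current operating point
```

Because the prototype has distance 0 and the first bin center lies strictly above the minimum, the prototype is always detected as target.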

9.5. Setting specific operating points

A set of operating points may be constructed manually by specifying the decision type, the data (thresholds or weights) and the decision names. Let us use a simple two-class artificial problem and train a nearest mean classifier:

>> load fruit; a=a(:,:,[1,2]);
>> [tr,ts]=randsubset(a,0.8)
160 by 2 sddata, 2 classes: 'apple'(80) 'banana'(80) 
40 by 2 sddata, 2 classes: 'apple'(20) 'banana'(20) 
>> p=sdnmean(tr)
sequential pipeline     2x2 'Nearest mean'
 1  sdp_normal          2x2  2 classes, 2 components

We create a set of three operating points using the weighting approach. The decision names will correspond to the classes.

>> ops=sdops('w',[0.2 0.8; 0.5 0.5; 0.8 0.2],tr.lab.list)
Weight-based operating set (3 ops, 2 classes) at op 1

By default, the current operating point is the first one supplied. We can now compare the true labels in the test set to classifier decisions by looking at the confusion matrix:

>> sdconfmat(ts.lab,ts*p*ops)  %  the default operating point is used to make decisions 

 True      | Decisions
 Labels    |  apple banana  | Totals
-------------------------------------
 apple     |     7     13   |    20
 banana    |     0     20   |    20
-------------------------------------
 Totals    |     7     33   |    40

Because the banana output is given the larger weight, we observe no errors on the banana class but more errors on the apple class. The effect of output weighting becomes even more apparent with the 3rd operating point, which uses the opposite weighting scheme where apple is deemed more important than banana:

>> ops=setcurop(ops,3)    %  set the 3rd op.point as current
Weight-based operating set (3 ops, 2 classes) at op 3
>> sdconfmat(ts.lab,ts*p*ops)  

 True      | Decisions
 Labels    |  apple banana  | Totals
-------------------------------------
 apple     |    20      0   |    20
 banana    |    12      8   |    20
-------------------------------------
 Totals    |    32      8   |    40

9.6. Confusion matrices

A confusion matrix shows the agreement between true labels and classifier decisions. It has the true labels in the rows and the estimated labels (decisions) in the columns. The diagonal stores the number of correctly classified objects, while the off-diagonal elements count the misclassified objects. The example below shows the confusion matrix for a two-class data set a where the labels are estimated using the trained pipeline p:

>> truelab=a.lab;                %  sdlab object storing the labels
>> decisions=a*sddecide(p);    %  sdlab object with classifier decisions             
>> sdconfmat(truelab,decisions)
ans =

 True      | Decisions
 Labels    | apple  banana  | Totals
-------------------------------------
 apple     |   430     66   |   496
 banana    |    82    422   |   504
-------------------------------------
 Totals    |   512    488   |  1000 

In the data a there are 496 apples, of which 66 are wrongly classified as 'banana', while 430 are correctly classified as 'apple'.
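Counting a confusion matrix from two label sequences is straightforward. The following plain NumPy sketch is an illustration of the definition, not the sdconfmat implementation; `confusion_matrix` is a hypothetical helper.

```python
import numpy as np

def confusion_matrix(true_labels, decisions, names):
    """Rows: true labels, columns: decisions, both ordered as in names."""
    index = {n: i for i, n in enumerate(names)}
    cm = np.zeros((len(names), len(names)), dtype=int)
    for t, d in zip(true_labels, decisions):
        cm[index[t], index[d]] += 1
    return cm

true_lab = ["apple", "apple", "apple", "banana", "banana"]
dec = ["apple", "banana", "apple", "banana", "banana"]
cm = confusion_matrix(true_lab, dec, ["apple", "banana"])
correct = np.trace(cm)   # diagonal: correctly classified objects
```

Each row sums to the number of true objects of that class, and the trace counts the correct decisions.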

9.6.1. Normalized confusion matrices

The confusion matrix can be normalized by the true number of objects per class, so that each row sums to one. The example below shows a normalized confusion matrix for an eight-class data set a where the labels are estimated using the trained pipeline p:

>> sdconfmat(truelab,decisions,'norm')

ans =

 True      | Decisions
 Labels    |      a      b      c      d      e      f      g      h  | Totals
-------------------------------------------------------------------------------
 a         | 0.908  0.080  0.000  0.000  0.000  0.011  0.001  0.001   | 1.00
 b         | 0.020  0.938  0.000  0.000  0.000  0.000  0.042  0.000   | 1.00
 c         | 0.000  0.000  0.921  0.079  0.000  0.000  0.000  0.000   | 1.00
 d         | 0.000  0.000  0.241  0.758  0.000  0.000  0.001  0.000   | 1.00
 e         | 0.000  0.000  0.000  0.000  0.838  0.162  0.000  0.000   | 1.00
 f         | 0.000  0.000  0.000  0.000  0.153  0.847  0.000  0.000   | 1.00
 g         | 0.000  0.004  0.000  0.000  0.000  0.000  0.866  0.130   | 1.00
 h         | 0.000  0.000  0.000  0.000  0.000  0.000  0.095  0.905   | 1.00
-------------------------------------------------------------------------------
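Row normalization divides each row by its total. The NumPy sketch below uses rows 'a' and 'd' of the count matrix shown in the next subsection. It reproduces the corresponding normalized rows above after rounding to three decimals. The helper name is hypothetical, and mapping empty rows to zeros is our own choice.

```python
import numpy as np

def normalize_confmat(cm):
    """Divide each row by its total; rows without samples stay zero."""
    cm = np.asarray(cm, dtype=float)
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros_like(cm), where=totals > 0)

counts = np.array([[1119, 99, 0, 0, 0, 13, 1, 1],   # row 'a' of the count matrix
                   [0, 0, 306, 962, 0, 0, 1, 0]])   # row 'd'
norm = np.round(normalize_confmat(counts), 3)
```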

9.6.2. Storing confusion matrices as strings

The confusion matrix can be returned as a string (str), for example to generate automatic reports, or as a numeric matrix (cm).

>> str=sdconfmat(truelab,decisions,'string')

str =

 True      | Decisions
 Labels    |      a      b      c      d      e      f      g      h  | Totals
-------------------------------------------------------------------------------
 a         |  1119     99      0      0      0     13      1      1   |  1233
 b         |    25   1189      0      0      0      0     53      0   |  1267
 c         |     0      0   1114     95      0      0      0      0   |  1209
 d         |     0      0    306    962      0      0      1      0   |  1269
 e         |     0      0      0      0   1038    201      0      0   |  1239
 f         |     0      0      0      0    187   1033      0      0   |  1220
 g         |     0      5      0      0      0      0   1122    169   |  1296
 h         |     0      0      0      0      0      0    120   1147   |  1267
-------------------------------------------------------------------------------
 Totals    |  1144   1293   1420   1057   1225   1247   1297   1317   | 10000


>> cm=sdconfmat(truelab,decisions)

cm =

  Columns 1 through 6

        1119          99           0           0           0          13
          25        1189           0           0           0           0
           0           0        1114          95           0           0
           0           0         306         962           0           0
           0           0           0           0        1038         201
           0           0           0           0         187        1033
           0           5           0           0           0           0
           0           0           0           0           0           0

  Columns 7 through 8

           1           1
          53           0
           0           0
           1           0
           0           0
           0           0
        1122         169
         120        1147

9.6.3. Rectangular confusion matrices

sdconfmat can limit the set of classes or decisions to user-specified lists. Note that only the corresponding subset of samples is then used. Rectangular confusion matrices arise, for example, when we have a single 'outlier' true class but several rejection decisions (e.g. 'background', 'not-fruit', ...). In the example below, only the true labels of the classes a, c and d are shown:

>> sdconfmat(truelab,decisions,'classes',{'a','c','d'})

ans =

 True      | Decisions
 Labels    |      a      c      d      e      f      g      h  | Totals
------------------------------------------------------------------------
 a         |   121      0      0      0      5      0      0   |   126
 c         |     0      0    109      0      0      0      0   |   109
 d         |     0     15    130      0      0      0      0   |   145
------------------------------------------------------------------------
 Totals    |   121     15    239      0      5      0      0   |   380

The classes and the decisions to be visualized can be chosen independently:

>> sdconfmat(truelab,decisions,'classes',{'a','c','d'},'decisions',{'a','b','c','d'})

ans =

 True      | Decisions
 Labels    |      a      b      c      d  | Totals
---------------------------------------------------
 a         |   121      0      0      0   |   121
 c         |     0      0      0    109   |   109
 d         |     0      0     15    130   |   145
---------------------------------------------------
 Totals    |   121      0     15    239   |   375

When the 'classes' option is not provided, all true classes are shown by default:

>> sdconfmat(truelab,decisions,'decisions',{'a','b','c','d'})

ans =

 True      | Decisions
 Labels    |      a      b      c      d  | Totals
---------------------------------------------------
 a         |   121      0      0      0   |   121
 b         |    57      0      0      0   |    57
 c         |     0      0      0    109   |   109
 d         |     0      0     15    130   |   145
 e         |     0      0      0      0   |     0
 f         |     0      0      0      0   |     0
 g         |     1      0      0      0   |     1
 h         |     0      0      0      0   |     0
---------------------------------------------------
 Totals    |   179      0     15    239   |   433
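The examples above suggest a simple rule: a sample is counted only if its true label is among the selected classes and its decision is among the selected decisions. `rect_confusion` below is a hypothetical NumPy sketch of that rule, inferred from the row totals shown, not the sdconfmat code. Unlike sdconfmat, it defaults to the labels that actually occur, so classes without samples do not appear as empty rows.

```python
import numpy as np

def rect_confusion(true_labels, decisions, classes=None, decision_names=None):
    """Confusion matrix restricted to selected true classes and decisions."""
    if classes is None:
        classes = sorted(set(true_labels))
    if decision_names is None:
        decision_names = sorted(set(decisions))
    rows = {n: i for i, n in enumerate(classes)}
    cols = {n: j for j, n in enumerate(decision_names)}
    cm = np.zeros((len(classes), len(decision_names)), dtype=int)
    for t, d in zip(true_labels, decisions):
        if t in rows and d in cols:      # samples outside either list are skipped
            cm[rows[t], cols[d]] += 1
    return cm

true_lab = ["a", "a", "b", "c", "c"]
dec = ["a", "f", "a", "d", "c"]
cm = rect_confusion(true_lab, dec, classes=["a", "c"], decision_names=["a", "c", "d"])
```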

9.6.4. Confusion matrices for a set of operating points

Confusion matrices can be estimated from the classifier soft output for a whole set of operating points simultaneously. In this example, a test set with 10 000 samples is used and the confusion matrices are estimated at 10 000 randomly generated weighting-based operating points. The computation time is also shown.

>> tr=sddata(gendatm(10000)); ts=sddata(gendatm(10000))
10000 by 2 sddata, 8 classes: [1270  1276  1142  1243  1315  1289  1238  1227]
>> p=sdquadratic(tr);     
>> out=ts*p;    
>> ops=sdops('w',rand(10000,8),tr.lab.list)
Weight-based operating set (10000 ops, 8 classes) at op 1   
>> tic; [cm,ll]=sdconfmat(ops,out); toc
Elapsed time is 3.182373 seconds.

>> size(cm)

ans =

       8           8       10000

The variable cm stores a confusion matrix (of size 8x8 for an eight-class problem) for each of the 10000 operating points. The sdconfmat routine can also display a single confusion matrix in a readable form, e.g. the one at operating point number 42:

>> sdconfmat(cm(:,:,42),ts.lab.list)
 True      | Decisions
 Labels    |      a      b      c      d      e      f      g      h  | Totals
-------------------------------------------------------------------------------
 a         |  1191     76      0      0      0      6      4      0   |  1277
 b         |    71   1152      0      0      0      0     56      0   |  1279
 c         |     0      0    978    175      0      0      0      0   |  1153
 d         |     0      0    251    984      0      0      0      0   |  1235
 e         |     0      0      0      0   1092    223      0      0   |  1315
 f         |     2      0      0      0    194   1089      0      0   |  1285
 g         |     0      0      0      0      0      0   1192     47   |  1239
 h         |     0      0      0      0      0      0    297    920   |  1217
-------------------------------------------------------------------------------
 Totals    |  1264   1228   1229   1159   1286   1318   1549    967   | 10000
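Computing many confusion matrices at once is mainly a vectorization exercise. The NumPy sketch below is one way to do it, not the sdconfmat algorithm; `confmats_for_ops` is a hypothetical name. It returns a (C, C, K) array, matching the 8 x 8 x 10000 layout shown above, but with 0-based operating-point indices.

```python
import numpy as np

def confmats_for_ops(soft, true_idx, weights):
    """Confusion matrices for many weighting-based operating points at once.

    soft:     (N, C) classifier soft outputs
    true_idx: (N,)   true class indices 0..C-1
    weights:  (K, C) one weight vector per operating point
    Returns an int array of shape (C, C, K): rows true class, columns decision.
    """
    soft = np.asarray(soft, dtype=float)
    weights = np.asarray(weights, dtype=float)
    true_idx = np.asarray(true_idx)
    c = soft.shape[1]
    k = weights.shape[0]
    # decisions[k, n] = argmax over classes of weights[k] * soft[n]
    decisions = np.argmax(weights[:, None, :] * soft[None, :, :], axis=2)
    op_idx = np.broadcast_to(np.arange(k)[:, None], decisions.shape)
    true_b = np.broadcast_to(true_idx[None, :], decisions.shape)
    cm = np.zeros((c, c, k), dtype=int)
    np.add.at(cm, (true_b.ravel(), decisions.ravel(), op_idx.ravel()), 1)
    return cm

soft = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])
cm = confmats_for_ops(soft, [0, 1, 1], [[0.5, 0.5], [0.2, 0.8]])
```

The intermediate product has K x N x C elements. For the 10000 x 10000 x 8 case above, that is far too large to hold in memory, so a practical version would process the operating points in chunks.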

9.6.5. Visualization of the per-class errors

To inspect which samples of a certain class are misclassified, it may be useful to visualize the individual cells of the confusion matrix in the feature space. This can be achieved by creating a new sample property that combines the true labels with the classifier decisions.

>> tr=sddata(gendatf(50)); tr=tr(:,:,[1,2]);
>> ts=sddata(gendatf(10000)); ts=ts(:,:,[1,2]);
>> p=sdmixture(tr,'comp',2,'iter',10); 
>> dec=ts*sddecide(p)
sdlab with 7334 entries, 2 groups: 'apple'(3197) 'banana'(4137)
>> ts.confmat=[ts.lab '-' dec];   %  concatenation of the two class labels
>> getproplist(ts)

ans = 

'class'    'ident'    'confmat'

>> sdscatter(ts)

In the Scatter menu, go to 'Use property' and select confmat.
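The labeling trick itself is independent of perClass. The NumPy sketch below builds one "true-decision" string per sample, so each string identifies one confusion-matrix cell. `confmat_property` is a hypothetical helper. We assume the '-' separator used above yields names of the form 'apple-banana'.

```python
import numpy as np

def confmat_property(true_lab, dec, sep="-"):
    """Per-sample 'true-decision' strings, one group per confusion-matrix cell."""
    return np.char.add(np.char.add(np.asarray(true_lab), sep), np.asarray(dec))

prop = confmat_property(["apple", "apple", "banana", "banana"],
                        ["apple", "banana", "banana", "apple"])
groups, counts = np.unique(prop, return_counts=True)   # cell names and cell counts
```

Coloring samples by these groups separates correct decisions ('apple-apple') from each type of error ('apple-banana').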