perClass Documentation
development version 3.1.2 (22-Dec-2011)

Chapter 15: Performance evaluation

15.1. Introduction

The design of a pattern recognition system aims at two outcomes: an algorithm capable of making decisions on new observations, and an estimate of its performance. Classification performance can be reliably estimated only on data unseen during training. To make the most of the limited amount of labeled examples available in most projects, perClass offers easy-to-use tools for performing sophisticated cross-validation strategies.

15.1.1. Cross-validation by rotation

Cross-validation is an evaluation strategy where the available design data set is split into several parts. One part is left out and the algorithm is trained on the remaining parts. The trained algorithm is executed on the part left out, and its decisions are used to compute the classification error. In perClass, this form of cross-validation is called 'rotation' because the test set rotates over the parts, so that each sample is tested exactly once.
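
The rotation scheme can be illustrated outside perClass with a minimal Python sketch. It is not the sdcrossval implementation: the function name rotation_folds is hypothetical, and the class-wise (stratified) splitting is an assumption, chosen because it is consistent with the per-fold class counts reported later in this chapter.

```python
import random
from collections import defaultdict

def rotation_folds(labels, n_folds, seed=None):
    """Assign every sample index to exactly one test fold, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    folds = [[] for _ in range(n_folds)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        # Deal the shuffled indices of this class round-robin over the folds.
        for pos, idx in enumerate(idxs):
            folds[pos % n_folds].append(idx)
    return folds

# Same class sizes as the fruit data set used in this chapter.
labels = ["apple"] * 100 + ["banana"] * 100 + ["stone"] * 60
folds = rotation_folds(labels, n_folds=10, seed=0)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be trained on train_idx and tested on test_idx here.

# Collect all tested indices: each sample appears in exactly one test fold.
tested = sorted(i for fold in folds for i in fold)
print(tested == list(range(len(labels))))  # True
print([len(f) for f in folds])             # ten folds of 26 samples each
```

With 100, 100 and 60 samples per class and 10 folds, each test fold holds 10 + 10 + 6 = 26 samples and each training set the remaining 234.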

The cross-validation of a Gaussian classifier is performed as follows:

>> load fruit; a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> pd=sdgauss*sddecide
untrained pipeline 2 steps: sdgauss+sdp_decide
>> [s,res]=sdcrossval(pd,a)
10 folds: [1: ] [2: ] [3: ] [4: ] [5: ] [6: ] [7: ] [8: ] [9: ] [10: ] 
s =

 10-fold rotation

 ind mean (std)  measure
   1 0.13 (0.02) mean error over classes, priors [0.3,0.3,0.3]

 average execution speed per fold: 0.74 msec

res = 

      method: 'rotation'
       folds: 10
    measures: {'mean-error'}
        data: [10x1 double]
        mean: 0.1311
         std: 0.0160    
   time_data: [10x1 double]
   time_mean: 7.3944e-04
    time_std: 6.1006e-06
   time_desc: [1x67 char]

The 10-fold cross-validation was performed using the default operating point, which weights all classes equally (equal class priors). Note that the cross-validated algorithm must return decisions, not soft outputs.

The cross-validation result summary is provided as a string s. Detailed results are given in the res structure. The res.data field stores the per-fold estimates of performance measures. By default, the mean error over classes with equal class priors is used. Additional measures may be specified using the measures option:

>> sdcrossval(pd,a,'measures',{'class-errors','sensitivity','apple','specificity','apple'}); 
ans =

 10-fold rotation

 ind mean (std)  measure
   1 0.09 (0.03) error on apple
   2 0.11 (0.04) error on banana
   3 0.18 (0.05) error on stone
   4 0.91 (0.03) sensitivity on apple
   5 0.93 (0.02) specificity on apple

 average execution speed per fold: 0.75 msec    

The 'class-errors' measure yields per-class error rates. Note that some measures, such as sensitivity or specificity, require a target class to be specified. For the list of available performance measures, see the ROC chapter.
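
To see why a target class is needed, consider the following Python sketch, which derives sensitivity and specificity from a confusion matrix. The counts are purely illustrative (they are not perClass output), and the helper sensitivity_specificity is a hypothetical name, not part of the toolbox.

```python
def sensitivity_specificity(confusion, target):
    """confusion[true_class][decision] holds counts.

    Sensitivity: fraction of target samples decided as target.
    Specificity: fraction of non-target samples not decided as target.
    """
    tp = confusion[target].get(target, 0)
    fn = sum(n for dec, n in confusion[target].items() if dec != target)
    fp = sum(row.get(target, 0)
             for true, row in confusion.items() if true != target)
    tn = sum(n
             for true, row in confusion.items() if true != target
             for dec, n in row.items() if dec != target)
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative counts only.
confusion = {
    "apple":  {"apple": 9, "banana": 1, "stone": 0},
    "banana": {"apple": 1, "banana": 8, "stone": 1},
    "stone":  {"apple": 0, "banana": 2, "stone": 4},
}

sens, spec = sensitivity_specificity(confusion, "apple")
print(sens, spec)  # 0.9 0.9375
```

Choosing a different target (for example 'stone') changes which samples count as positives and negatives, and therefore both values.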

We may wish to suppress the information displayed by sdcrossval. This may be done either with the 'nodisplay' option or, globally, by switching off all display messages of perClass commands:

>> sd_display off

The number of cross-validation folds may be changed using the 'folds' option.

>> [s,res]=sdcrossval(pd,a,'folds',20);
>> res

res = 

      method: 'rotation'
       folds: 20
    measures: {'mean-error'}
        data: [20x1 double]
        mean: 0.1311
         std: 0.0167
   time_data: [10x1 double]
   time_mean: 7.3164e-04
    time_std: 2.9048e-06
   time_desc: [1x67 char]

The maximum number of folds in the rotation method is limited by the number of samples in the smallest class.

15.1.2. How are the errors computed

sdcrossval requires that each class in the test set maps to a classifier decision. If a test set class does not have its counterpart in the list of decisions, the corresponding error cannot be computed and sdcrossval raises an error.

This may happen, for example, when training a detector. Let's assume a two-class problem with 'apple' and 'banana' classes.

>> b
'Fruit set' 1334 by 2 sddata, 2 classes: 'apple'(667) 'banana'(667) 

We construct an untrained Gaussian detector on 'apple':

>> pd=sddetector([],'apple',sdgauss)
untrained pipeline 'sddetector'

Cross-validation on b raises an error:

>> s=sdcrossval(pd,b)
10 folds: [1:   1: apple  -> apple    
  2: banana -> non-apple
Warning: Some test set classes do not match to classifier decisions.
True classes ------------------------------
sdlist (2 entries)
 ind name
   1 apple 
   2 banana
Decisions ------------------------------
sdlist (2 entries)
 ind name
   1 apple    
   2 non-apple
??? Error using ==> sdroc.sdroc_err at 124
Cannot compute the error.

The error is raised because the trained detector has apple and non-apple decisions, while the banana class in the test set does not map to any decision.

The solution is to make sure the detector's non-target class is called 'banana':

>> pd=sddetector([],'apple',sdgauss,'nontarget','banana','nodisplay')
untrained pipeline 'sddetector'
>> s=sdcrossval(pd,b)
10 folds: [1: ] [2: ] [3: ] [4: ] [5: ] [6: ] [7: ] [8: ] [9: ] [10: ] 

s =

 10-fold rotation

 ind mean (std)  measure
   1 0.17 (0.01) mean error over classes, priors [0.5,0.5]

Note that the opposite situation is permitted: some of the classifier decisions may not map to any class present in the test set. This happens, for example, in leave-one-out cross-validation, where the single test object belongs to only one class, or when evaluating classifiers with a reject option.
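
The asymmetry of this rule can be sketched in a few lines of Python. This is only an illustration of the matching logic described in this section, not the check performed inside sdcrossval; the function name unmatched_classes is hypothetical.

```python
def unmatched_classes(test_classes, decisions):
    """Return the test-set classes that have no decision of the same name."""
    decision_set = set(decisions)
    return [c for c in test_classes if c not in decision_set]

# Detector with the default non-target name: 'banana' cannot be scored.
print(unmatched_classes(["apple", "banana"], ["apple", "non-apple"]))
# ['banana']

# Non-target decision renamed to 'banana': every test class is covered.
print(unmatched_classes(["apple", "banana"], ["apple", "banana"]))
# []

# Extra decisions without a matching test class are not a problem.
print(unmatched_classes(["stone"], ["apple", "banana", "stone", "reject"]))
# []
```

Only a non-empty result corresponds to the situation in which the error cannot be computed.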

15.1.3. Setting random seed

For the sake of repeatability, we may fix the random seed:

>> [s,res]=sdcrossval(pd,a,'seed',42);
>> [s,res2]=sdcrossval(pd,a,'seed',42);
>> [res.data res2.data]

ans =

0.0556    0.0556
0.1444    0.1444
0.2222    0.2222
0.1000    0.1000
0.1444    0.1444
0.1222    0.1222
0.1333    0.1333
0.0889    0.0889
0.1000    0.1000
0.2000    0.2000
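
The effect of a fixed seed can be illustrated with a generic Python sketch. It uses Python's own random generator, so it says nothing about how perClass seeds its generator internally; it only shows why two runs with the same seed produce identical splits.

```python
import random

def shuffled_indices(n, seed):
    """Return a permutation of range(n) determined entirely by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    idx = list(range(n))
    rng.shuffle(idx)
    return idx

first = shuffled_indices(260, seed=42)
second = shuffled_indices(260, seed=42)

# Identical seeds give identical sample orderings, hence identical folds.
print(first == second)  # True
```

Because the folds are identical, a deterministic training procedure yields identical per-fold errors, as in the two columns above.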

15.2. Accessing algorithms trained in cross-validation

The optional third output of sdcrossval is an object that provides us with access to algorithms trained in each fold.

>> [s,res,e]=sdcrossval(pd,a,'seed',42);

Here we access the pipeline trained in the second fold:

>> e(2)
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x3  3 classes, 3 components (sdp_normal)
 2  Decision                3x1  weighting, 3 classes, 1 ops at op 1 (sdp_decide)

15.3. Accessing per-fold data sets

The evaluation object e also allows us to access the training or test set of any fold.

>> [s,res,e]=sdcrossval(pd,a,'seed',42);

We need to provide it with the original data set used by sdcrossval and with a fold index. To retrieve the training set, use the gettrdata method:

>> tr=gettrdata(e,a,2)
'Fruit set' 234 by 2 sddata, 3 classes: 'apple'(90) 'banana'(90) 'stone'(54) 

For a test set, use the gettsdata method:

>> ts=gettsdata(e,a,2)
'Fruit set' 26 by 2 sddata, 3 classes: 'apple'(10) 'banana'(10) 'stone'(6) 

Using these facilities, we may at any time inspect the confusion matrix of a given fold and compute any error or performance measure of interest:

>> sdconfmat(ts.lab,ts*e(2))

ans =

 True      | Decisions
 Labels    |  apple banana  stone  | Totals
--------------------------------------------
 apple     |    10      0      0   |    10
 banana    |     1      9      0   |    10
 stone     |     0      2      4   |     6
--------------------------------------------
 Totals    |    11     11      4   |    26

>> mean([0 1/10 2/6])   %  mean error over classes

ans =

0.1444
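
The same arithmetic can be written out as a short Python sketch that starts from the confusion matrix of fold 2 shown above. The helper class_errors is a hypothetical name; the sketch also contrasts the mean error over classes with the plain fraction of misclassified samples, which weights classes by their size.

```python
# Confusion matrix of fold 2: confusion[true_class][decision] -> count.
confusion = {
    "apple":  {"apple": 10, "banana": 0, "stone": 0},
    "banana": {"apple": 1,  "banana": 9, "stone": 0},
    "stone":  {"apple": 0,  "banana": 2, "stone": 4},
}

def class_errors(confusion):
    """Per-class error: fraction of each true class that was misclassified."""
    errors = {}
    for true_class, row in confusion.items():
        total = sum(row.values())
        errors[true_class] = 1.0 - row[true_class] / total
    return errors

errors = class_errors(confusion)

# Mean error over classes: every class counts equally (equal priors).
mean_error = sum(errors.values()) / len(errors)
print(round(mean_error, 4))  # 0.1444

# Overall error: every sample counts equally, so large classes dominate.
n_total = sum(sum(row.values()) for row in confusion.values())
n_wrong = n_total - sum(confusion[c][c] for c in confusion)
print(round(n_wrong / n_total, 4))  # 0.1154
```

The first value matches the second entry of res.data for seed 42; the second value differs because the small 'stone' class contributes only 6 of the 26 test samples.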

15.4. Cross-validation by randomization

perClass also provides a 'randomization' type of cross-validation, where the training/test splits are constructed by random sampling of the total set.

By default, 50% of the samples in each class are randomly selected for training in each fold.

>> [s,res]=sdcrossval(pd,a,'method','random');

>> res

res = 

  method: 'randomization'
   folds: 10
measures: {'mean-error'}
    data: [10x1 double]
    mean: 0.1336
     std: 0.0110

The number of cross-validation folds in randomization is not limited.

An optional numerical argument following 'method','random' allows us to set a different training fraction. Because it is passed directly to the randsubset method, we may use it to:

  • select a percentage of each class: sdcrossval(pd,a,'method','random',0.8)
  • specify a number of samples per class: sdcrossval(pd,a,'method','random',50)
  • select a training subset only from certain class/classes: sdcrossval(pd,a,'method','random',[50 0])

The last option is useful when we want to cross-validate a detector trained in a one-class fashion (only on examples of a specific class).
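
The three forms of the argument can be mimicked with a small Python sketch. It is not the randsubset implementation: the function random_split is hypothetical, and the rule used to tell fractions from counts (values strictly between 0 and 1 are fractions, everything else is a count) is an assumption made for this illustration.

```python
import random
from collections import defaultdict

def random_split(labels, class_order, spec, seed=None):
    """Return (train_idx, test_idx) for one randomization fold.

    spec is a fraction in (0, 1), a count per class, or a list of
    per-class counts given in the order of class_order.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train = []
    for pos, lab in enumerate(class_order):
        idxs = by_class[lab]
        value = spec[pos] if isinstance(spec, (list, tuple)) else spec
        # Assumption: values below 1 are fractions, other values are counts.
        count = round(value * len(idxs)) if 0 < value < 1 else int(value)
        train.extend(rng.sample(idxs, count))

    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), test

labels = ["apple"] * 100 + ["banana"] * 100
order = ["apple", "banana"]

tr_frac, ts_frac = random_split(labels, order, 0.8, seed=1)
tr_cnt, ts_cnt = random_split(labels, order, 50, seed=1)
tr_one, ts_one = random_split(labels, order, [0, 50], seed=1)

print(len(tr_frac), len(ts_frac))  # 160 40
print(len(tr_cnt), len(ts_cnt))    # 100 100
print(len(tr_one), len(ts_one))    # 50 150
```

In the last call the training set contains 50 samples of the second class only, and all remaining samples of both classes end up in the test set.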

>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

>> b=a(:,:,{'apple','banana'})
'Fruit set' 200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100) 

>> pd2=sddetector([],'banana',sdgauss,'reject',0.1,'non-target','apple')
untrained pipeline 'sddetector'

>> [s,res,e]=sdcrossval(pd2,b,'method','random',[0 50]);

s =

 10-fold randomization

 ind mean (std)  measure
   1 0.14 (0.01) mean error over classes, priors [0.5,0.5]

Note that we specify the non-target name explicitly. Otherwise, the default non-target decision ('non-banana') would not match any true class and all non-target detections would be counted as errors. We also provide the vector parameter [0 50] of 'method','random' in the order of classes in b.lab.list (banana is second).

>> e(1)
sequential pipeline     2x1 'Gaussian model+Decision'
 1  Gaussian model          2x1  one class, 1 component (sdp_normal)
 2  Decision                1x1  thresholding on banana at op 1 (sdp_decide)

>> tr=gettrdata(e,b,1) %  training data set does not contain apples
'Fruit set' 50 by 2 sddata, class: 'banana'

>> ts=gettsdata(e,b,1)
'Fruit set' 150 by 2 sddata, 2 classes: 'apple'(100) 'banana'(50) 

15.5. Leave-one-out evaluation

sdcrossval also supports the leave-one-out cross-validation scheme, where each sample serves once as the test set and training is performed on the remaining samples. Leave-one-out evaluation is beneficial for very small sample sizes.

>> c=randsubset(a,[3 3 0])
'Fruit set' 6 by 2 sddata, 2 classes: 'apple'(3) 'banana'(3) 

>> [s,res,e]=sdcrossval(sdlinear*sddecide,c,'method','loo')
6 folds: [1: ] [2: ] [3: ] [4: ] [5: ] [6: ] 

s =

 6-fold leave-one-out

 ind mean (std)  measure
   1 0.33 (0.24) error on apple
   2 0.33 (0.24) error on banana

res = 

  method: 'leave-one-out'
   folds: 6
measures: {'err(apple)'  'err(banana)'}
    data: [6x2 double]
    mean: [0.3333 0.3333]
     std: [0.2357 0.2357]

completed 6-fold evaluation 'sde_loo' (ppl: '')

By default, leave-one-out evaluation reports per-class error measures. Note that because each test sample belongs to only one class, the error on the other class is not defined in that fold and is stored as NaN:

>> res.data

ans =

 0   NaN   
 0   NaN   
 1   NaN   
NaN     0   
NaN     0   
NaN     1   
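
The reported per-class means can be reproduced from this table with a short Python sketch, assuming that folds with an undefined (NaN) error are simply skipped when averaging. That assumption is ours; it is consistent with the means of 0.3333 in res.mean. The helper nan_mean is a hypothetical name.

```python
import math

# Per-fold errors copied from res.data: (error on apple, error on banana).
fold_errors = [
    (0.0, math.nan), (0.0, math.nan), (1.0, math.nan),
    (math.nan, 0.0), (math.nan, 0.0), (math.nan, 1.0),
]

def nan_mean(values):
    """Average only the defined entries, skipping NaN."""
    defined = [v for v in values if not math.isnan(v)]
    return sum(defined) / len(defined)

apple = nan_mean([row[0] for row in fold_errors])
banana = nan_mean([row[1] for row in fold_errors])
print(round(apple, 4), round(banana, 4))  # 0.3333 0.3333
```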

15.5.1. Leave-one-out over property

A frequently used type of leave-one-out is cross-validation over specific labels, for example over patients or objects. This allows us to quickly validate the generalization error on an unseen patient or object.

We may use the example small_medical data set that contains data samples originating from a medical diagnostic problem. For each sample, we know not only class and tissue type but also patient label.

>> load small_medical
>> a
'small medical' 300 by 10 sddata, 2 classes: 'disease'(57) 'no-disease'(243) 

>> a'
'small medical' 300 by 10 sddata, 2 classes: 'disease'(57) 'no-disease'(243) 
sample props: 'lab'->'class' 'class'(L) 'pixel'(N) 'patient'(L) 'tissue'(L)
feature props: 'featlab'->'featname' 'featname'(L)
data props:  'data'(N)
>> a.patient'
 ind name        size percentage
   1 Alex         122 (40.7%)
   2 Chris        121 (40.3%)
   3 Gabriel       57 (19.0%)

Using sdcrossval, we may quickly perform leave-one-patient-out validation:

>> pd=sdpca([],3)*sdlinear*sddecide
untrained pipeline 3 steps: sdpca+sdlinear+sdp_decide

>> [s,res]=sdcrossval(pd,a,'method','loo','over','patient')
3 folds: [1: ] [2: ] [3: ] 

s =

 3-fold leave-one-out

 ind mean (std)  measure
   1 0.77 (0.15) error on disease
   2 0.22 (0.20) error on no-disease

res = 

  method: 'leave-one-out'
   folds: 3
measures: {'err(disease)'  'err(no-disease)'  'mean-error'}
    data: [3x2 double]
    mean: [0.7745 0.2225]
     std: [0.1464 0.1981]
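
The splitting behind leave-one-patient-out can be sketched in Python as follows. This is a generic illustration of grouping folds by a label, not the perClass implementation; the function leave_one_group_out is a hypothetical name, and the patient counts are taken from the a.patient listing above.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group."""
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train, test

# One patient label per sample, with the sizes from a.patient.
patients = ["Alex"] * 122 + ["Chris"] * 121 + ["Gabriel"] * 57

splits = list(leave_one_group_out(patients))
for name, train, test in splits:
    print(name, len(train), len(test))
# Alex 178 122
# Chris 179 121
# Gabriel 243 57
```

Each fold tests on all samples of one patient and trains on the samples of the other two, so no patient contributes to both sets within a fold.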