- 9.1. Introduction
- 9.2. Default operating point
- 9.3. Defining operating points
- 9.4. Using operating points
- 9.5. Setting specific operating points
- 9.6. Confusion matrices
9.1. Introduction
Making decisions is the fundamental raison d'être of a pattern recognition system. However, the majority of statistical classifiers do not produce decisions directly. Instead, they deliver soft outputs such as an estimated posterior probability of class membership, a confidence level, or a distance to a decision boundary. This section introduces the perClass tools for converting soft outputs into crisp decisions. Understanding how decisions are made is necessary in order to optimize the performance of a pattern recognition system.
We can distinguish two fundamentally different decision making strategies:
Thresholding-based decisions are performed when a single concept of interest (target) is to be detected. In order to decide whether an observation does or does not correspond to the target concept, some form of the target-related similarity (soft-output) is computed and thresholded. If the similarity value falls above the pre-specified threshold, the observation is labeled as "target", otherwise as "non-target". This decision-making strategy is applied in two-class classification and detection (sometimes called distance-based rejection or one-class classification).
Weighting-based decisions are adopted for selecting one of multiple classes. The classifier provides a set of comparable outputs, each related to a specific class. In the simplest scenario, the output with the maximum value defines the decision. Note that this strategy assumes that all the classes are equally important. In many situations some classes exhibit higher misclassification costs than others. For example, labeling an ill patient as healthy in cancer diagnostics has a much higher human cost than the opposite error, which can be easily discovered by follow-up analysis. Cost-sensitive decisions are accomplished by weighting the soft-outputs before the maximum operation.
Note that, in general, the soft-output weights are not identical to class priors. This analogy holds only in situations where the classifier is trained on balanced data with equal class sizes. Otherwise, the weights do not directly correspond to the test set priors but rather rectify the original class imbalance.
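The relation between weights and priors can be made explicit with a standard Bayes argument. The sketch assumes that the soft outputs estimate the posteriors $p(i \mid x)$ under the training priors $\pi_i$, and that decisions are wanted under different priors $\pi'_i$. The symbols $\pi_i$, $\pi'_i$ and $w_i$ are introduced here for illustration only.

```latex
\hat{\imath}(x) = \arg\max_i \; w_i \, p(i \mid x), \qquad
p'(i \mid x) \;\propto\; p(x \mid i)\,\pi'_i \;\propto\; p(i \mid x)\,\frac{\pi'_i}{\pi_i}
\quad\Longrightarrow\quad
w_i \;\propto\; \frac{\pi'_i}{\pi_i}
```

With balanced training data all $\pi_i$ are equal, so $w_i \propto \pi'_i$ and the weights coincide with the desired priors up to a common factor. Otherwise the factor $1/\pi_i$ first undoes the training imbalance.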
Decisions are made at a specific operating point. According to the popular view, the operating point is the threshold value or the set of weights. In perClass, however, the term operating point refers to the complete body of information required to return decisions on the output of a given classifier in a specific problem. In addition to the threshold or weights, the operating point in perClass also encapsulates the polarity of the classifier output (similarity or distance) and the list of decisions. The polarity makes it possible to perform decisions for classifiers that return distances, such as the nearest neighbor rule.
9.2. Default operating point
In Chapter 7 we have seen that training a model on a data set provides us with the soft-outputs. In the case of a probabilistic classifier, the soft-outputs may be interpreted as the confidence that a data point belongs to each of the class models. In order to convert the soft-outputs into decisions we need to set the operating point. The default operating point is set by giving equal weights to the soft-outputs (each class is considered equally important) and assigning the sample to the class with the highest confidence. This is achieved by the command sddecide. Let us first visualize the soft-outputs of a probabilistic model:
>> load fruit
>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
>> p=sdgauss(a) % train a Gaussian model on each of the three classes
Gaussian model pipeline 2x3 3 classes, 3 components (sdp_normal)
>> sdscatter(a,p) % visualize the soft-outputs
The figures below visualize the soft-outputs: one Gaussian for each class.

The sddecide command adds a decision step, so that the pipeline returns decisions at the default operating point. The decisions of the resulting classifier are visualized with sdscatter.
>> pd=sddecide(p) % add the decision step to the pipeline p
sequential pipeline 2x1 'Gaussian model+Decision'
1 Gaussian model 2x3 3 classes, 3 components (sdp_normal)
2 Decision 3x1 weighting, 3 classes, 1 ops at op 1 (sdp_decide)
>> sdscatter(a,pd)

9.2.1. Visualization of decisions in multi-dimensional feature space
The example above illustrates how to visualize the classifier decisions in a 2D feature space. When the data has more dimensions, the same visualization is more complex because our 2D view is only a small part of a larger space. It is still possible to visualize the decisions, but only with respect to a specific data sample. Given a feature space of dimensionality D (with D>2), sdscatter visualizes the classifier decisions for two features by fixing the other dimensions to the values of the point selected with the cursor. Clicking on a different sample will show a different plane of the decision space, determined by the new point (cyan square marker). In the example below, a Gaussian classifier is trained on a 10-dimensional feature space. The classifier decisions are visualized for features 2 and 8 for 4 different samples. The sample chosen as reference is the sample highlighted in black.

Each sample shows a different decision plane, therefore different classifier decisions. It is, of course, possible to change the two features visualized using the up/down arrow keys.
This visualization gives us a better feeling for how complex the decision boundary is, and for what we can expect in the close neighborhood of the selected sample.
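The slicing idea can be sketched in plain MATLAB without any toolbox calls. The example is only an illustration under stated assumptions: a hypothetical nearest-mean classifier with two made-up class means stands in for the trained model, and the variable names (ref, f1, f2, decMap) are not part of perClass. It does not describe how sdscatter is implemented internally.

```matlab
% Hypothetical setup: nearest-mean classifier on 10-dimensional data
D = 10;
means = [zeros(1,D); 2*ones(1,D)];   % one made-up class mean per row
ref = 0.5*ones(1,D);                 % reference sample fixing the other dimensions
f1 = 2; f2 = 8;                      % the two displayed features

% Regular grid over the two displayed features
ax = linspace(-2, 4, 50);
[g1, g2] = meshgrid(ax, ax);

% Every grid point copies the reference sample except in features f1 and f2
pts = repmat(ref, numel(g1), 1);
pts(:, f1) = g1(:);
pts(:, f2) = g2(:);

% Squared Euclidean distance of each grid point to each class mean
d = zeros(size(pts,1), size(means,1));
for c = 1:size(means,1)
    d(:,c) = sum(bsxfun(@minus, pts, means(c,:)).^2, 2);
end

% Nearest mean wins; reshape the decisions back onto the grid
[~, dec] = min(d, [], 2);
decMap = reshape(dec, size(g1));     % decision map of this particular slice
```

The matrix decMap can be displayed with, for example, imagesc(ax, ax, decMap). Choosing a different ref changes the fixed coordinates and therefore, in general, the displayed decision regions.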
9.3. Defining operating points
perClass handles decisions only through objects of the
sdops (operating point set) class. An sdops object
contains everything needed to turn the classifier soft outputs into
crisp decisions. The operating point is defined by:
- a threshold or a set of weights,
- the polarity of classifier soft output (similarity or distance),
- an sdlist object storing the decision names.
Let's consider a two-class problem:
>> load fruit; a=a(:,:,[1,2])
200 by 2 sddata, 2 classes: 'apple'(100) 'banana'(100)
We create a set of three weighting-based operating points. Note that we
must provide the decision names (the label list of the data a) because these
are also stored inside the sdops object.
>> ops=sdops('w',[0.5 0.5;0.2 0.8; 0.7 0.3],a.lab.list)
Weight-based operating set (3 ops, 2 classes) at op 1
The content of the sdops object can be inspected either with dot notation or
with the getdata function:
>> size(ops)
ans =
3 2
>> ops.data
ans =
0.5000 0.5000
0.2000 0.8000
0.7000 0.3000
>> getdata(ops) % provides the list of ops
ans =
0.5000 0.5000
0.2000 0.8000
0.7000 0.3000
>> ops.data(3,:)
ans =
0.7000 0.3000
>> getdata(ops,3) % provides the weights of a specific operating point:
ans =
0.7000 0.3000
Similarly, we can access the list directly with the dot notation or with
the getlist command:
>> ops.list
sdlist (2 entries)
ind name
1 apple
2 banana
>> getlist(ops)
sdlist (2 entries)
ind name
1 apple
2 banana
Conversions between a decision name and its index in the list may be performed using ind2name and name2ind:
>> ind2name(ops.list,2) % provides the label name of the second class
ans =
banana
>> name2ind(ops.list,'banana') % specifies the index for class 'banana'
ans =
2
9.4. Using operating points
An sdops object always marks one operating point in the set as the current ("default") one. The index of the current operating point may be retrieved using getcurop:
>> getcurop(ops)
ans =
1
A different operating point may be selected as current using setcurop:
>> ops=setcurop(ops,3)
Weight-based operating set (3 ops, 2 classes) at op 3
9.4.1. Example on weighting-based decision
Let's estimate the output of a classifier on a data set. We will train a linear discriminant and estimate its output on the first 5 samples of our training set.
We will convert the data set out into a matrix data and perform
decisions on this matrix:
>> p=sdlinear(a);
>> out=a(1:5)*p
'Fruit set' 5 by 2 sddata, class: 'apple'
>> data=double(out)
data =
0.8885 0.1115
0.9189 0.0811
0.5470 0.4530
0.9923 0.0077
0.9876 0.0124
The decide function returns two outputs, namely the per-sample decisions and
the list object with all decisions that ops can produce:
>> [dec,list]=decide(ops,data)
dec =
1
1
1
1
1
sdlist (2 entries)
ind code : name
1 apple
2 banana
Note that we may perform decisions directly on raw numerical data because
the operating point object contains all information necessary to make
decisions. The vector dec holds the numerical codes of the decided
classes, i.e. their indices in the returned list.
If the decide function is applied to an sddata data set, it returns an sdlab
object with decisions.
>> [dec,list]=decide(ops,out)
sdlab with 5 entries from 'apple'
sdlist (2 entries)
ind name
1 apple
2 banana
Note the difference between the list returned as the second output of
decide and the list stored in dec:
>> dec.list
sdlist (1 entries)
ind name
1 apple
dec is an sdlab object and therefore contains only the
entries that actually occur in its list. In our example, this is only apple
because there was no banana decision. The list returned as the second output of
decide describes what decisions ops can make. That is why it shows
both the apple and the banana entries.
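The effect of the weights on these five outputs can be reproduced in plain MATLAB. The sketch assumes the rule described in the introduction (scale each soft output by its class weight, then take the maximum). It is an illustration, not the internal implementation of decide, and the variable names W and names are chosen here only for the example.

```matlab
% Soft outputs of the five samples (columns: apple, banana), copied from above
data = [0.8885 0.1115
        0.9189 0.0811
        0.5470 0.4530
        0.9923 0.0077
        0.9876 0.0124];

% The three weight sets from Section 9.3, one operating point per row
W = [0.5 0.5
     0.2 0.8
     0.7 0.3];

names = {'apple','banana'};

% Assumed rule: scale each output column by its weight, then take the row-wise maximum
dec = zeros(size(data,1), size(W,1));
for k = 1:size(W,1)
    weighted = bsxfun(@times, data, W(k,:));
    [~, dec(:,k)] = max(weighted, [], 2);
end

disp(dec)                 % one column of decision indices per operating point
disp(names(dec(:,2)))     % decision names at the second operating point
```

Under this assumption, the first and third operating points label all five samples as apple, while the second one (weights 0.2 and 0.8) switches the third sample to banana because 0.4530*0.8 exceeds 0.5470*0.2.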
9.4.2. Example on thresholding-based decision
We will create a simple distance-based detector. We select one sample from
the apple class and compute the squared Euclidean distances to this
point using sdprox:
>> proto=a(10)
'Fruit set' 1 by 2 sddata, class: 'apple'
>> d=a*sdprox(proto)
'Fruit set' 200 by 1 sddata, 2 classes: 'apple'(100) 'banana'(100)
>> d.featlab
sdlab with one entry: 'apple'
The data set d stores the distance of the samples to the chosen prototype
(sample number 10 from the class apple). This distance may be directly
used to make decisions using the thresholding approach. Samples closer to
the prototype than the threshold should be assigned to the apple class,
otherwise to the banana class (we assume only these two classes in our
universe).
We use a histogram to define meaningful threshold values:
>> [h,x]=hist(+d,30);
>> ops=sdops('thr',x,d.lab.list,'polarity','distance')
Thr-based operating set (30 ops) at op 1, distance
The first value is chosen as threshold for the detector. We can verify that
the prototype itself is classified as apple:
>> dec=decide(ops,d(1:10))
sdlab with 10 entries, 2 groups: 'apple'(2) 'banana'(8)
>> +dec
ans =
banana
apple
banana
banana
banana
banana
banana
banana
banana
apple
To understand the decisions, let's display the threshold used and the raw distances:
>> ops.data(ops.curop)
ans =
5.1860
>> +d(1:10)
ans =
83.5997
0.0976
110.1402
47.6317
13.1391
136.8932
80.0115
101.9587
20.8805
0
The entries with a distance smaller than 5.1860 were labeled as apple (the
class of the prototype object); the remaining samples are labeled as banana.
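The same thresholding can be written out in plain MATLAB on the ten distances listed above. This is a minimal sketch: the strict inequality and the variable names (dist, thr, isTarget) are assumptions made for illustration, and the code does not call perClass.

```matlab
% Squared distances of the first ten samples to the prototype, copied from above
dist = [83.5997 0.0976 110.1402 47.6317 13.1391 ...
        136.8932 80.0115 101.9587 20.8805 0]';
thr = 5.1860;                  % threshold of the current operating point
names = {'apple','banana'};    % target decision first, non-target second

% Distance polarity: small values indicate the target
% (whether the comparison is strict is an assumption)
isTarget = dist < thr;

% Map the logical result to decision indices: 1 = target, 2 = non-target
ind = 2 - isTarget;
dec = names(ind)';
disp(dec)
```

For a classifier with similarity polarity the comparison would be reversed, so that values above the threshold are assigned to the target.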
9.5. Setting specific operating points
A set of operating points may be constructed manually by specifying the decision type, the data (thresholds or weights) and the decision names. Let us use a simple two-class problem and train a nearest mean classifier:
>> load fruit; a=a(:,:,[1,2]);
>> [tr,ts]=randsubset(a,0.8)
160 by 2 sddata, 2 classes: 'apple'(80) 'banana'(80)
40 by 2 sddata, 2 classes: 'apple'(20) 'banana'(20)
>> p=sdnmean(tr)
sequential pipeline 2x2 'Nearest mean'
1 sdp_normal 2x2 2 classes, 2 components
We create a set of three operating points using the weighting approach. The decision names will correspond to the classes.
>> ops=sdops('w',[0.2 0.8; 0.5 0.5; 0.8 0.2],tr.lab.list)
Weight-based operating set (3 ops, 2 classes) at op 1
By default, the current operating point is the first one supplied. We can now compare the true labels in the test set to classifier decisions by looking at the confusion matrix:
>> sdconfmat(ts.lab,ts*p*ops) % the default operating point is used to make decisions
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 7 13 | 20
banana | 0 20 | 20
-------------------------------------
Totals | 7 33 | 40
Because the weight for the banana output is emphasized, we can observe no errors on the banana class but more errors for the apple class. The effect of output weighting will become more apparent when we use the 3rd operating point, using the opposite weighting scheme where apple is deemed more important than banana:
>> ops=setcurop(ops,3) % set the 3rd op.point as current
Weight-based operating set (3 ops, 2 classes) at op 3
>> sdconfmat(ts.lab,ts*p*ops)
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 20 0 | 20
banana | 12 8 | 20
-------------------------------------
Totals | 32 8 | 40
9.6. Confusion matrices
A confusion matrix shows the match between true labels and classifier
decisions. It is a matrix with true labels in the rows and estimated labels
in the columns. The diagonal stores the number of correctly classified
objects, while the off-diagonal elements refer to the misclassified
objects. The example below shows the confusion matrix for a two-class data set
a where the labels are estimated using the trained pipeline p:
>> truelab=a.lab; % sdlab object storing the labels
>> decisions=a*sddecide(p); % sdlab object with classifier decisions
>> sdconfmat(truelab,decisions)
ans =
True | Decisions
Labels | apple banana | Totals
-------------------------------------
apple | 430 66 | 496
banana | 82 422 | 504
-------------------------------------
Totals | 512 488 | 1000
In the data a there are 496 apples, of which 66 are wrongly classified as
'banana', while 430 are correctly classified as 'apple'.
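How such a table of counts arises can be sketched in plain MATLAB with accumarray. The labels below are hypothetical class indices invented for the example, and the snippet does not use sdconfmat.

```matlab
% Hypothetical true labels and decisions as class indices (1 = apple, 2 = banana)
truth = [1 1 1 2 2 2 2 1]';
dec   = [1 2 1 2 2 1 2 1]';
nclass = 2;

% Count each (true class, decision) pair; rows are true labels, columns are decisions
cm = accumarray([truth dec], 1, [nclass nclass]);
disp(cm)

% Error rates derived from the counts
classErr = 1 - diag(cm) ./ sum(cm, 2);        % per-class error
totalErr = 1 - sum(diag(cm)) / sum(cm(:));    % overall error
```

Applied to the table above, the same formula gives an error of 66/496 for the apple class.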
9.6.1. Normalized confusion matrices
The confusion matrix can be normalized by the true number of objects per
class, so that each row sums to one. The example below shows a normalized
confusion matrix for an eight-class data set a where the labels are estimated
using the trained pipeline p:
>> sdconfmat(truelab,decisions,'norm')
ans =
True | Decisions
Labels | a b c d e f g h | Totals
-------------------------------------------------------------------------------
a | 0.908 0.080 0.000 0.000 0.000 0.011 0.001 0.001 | 1.00
b | 0.020 0.938 0.000 0.000 0.000 0.000 0.042 0.000 | 1.00
c | 0.000 0.000 0.921 0.079 0.000 0.000 0.000 0.000 | 1.00
d | 0.000 0.000 0.241 0.758 0.000 0.000 0.001 0.000 | 1.00
e | 0.000 0.000 0.000 0.000 0.838 0.162 0.000 0.000 | 1.00
f | 0.000 0.000 0.000 0.000 0.153 0.847 0.000 0.000 | 1.00
g | 0.000 0.004 0.000 0.000 0.000 0.000 0.866 0.130 | 1.00
h | 0.000 0.000 0.000 0.000 0.000 0.000 0.095 0.905 | 1.00
-------------------------------------------------------------------------------
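The normalization itself is a division of each row by its total. A short plain-MATLAB sketch, using the two-class counts from Section 9.6 as input (the variable names are chosen for the example only):

```matlab
% Two-class confusion matrix from Section 9.6 (rows: true labels, columns: decisions)
cm = [430  66
       82 422];

% Divide every row by the number of objects in that true class
rowTotals = sum(cm, 2);
cmNorm = bsxfun(@rdivide, cm, rowTotals);

% Show three decimals, as in the table above
disp(round(cmNorm*1000)/1000)
```

A true class without any samples would produce a division by zero, so such rows need separate handling in practice.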
9.6.2. Storing confusion matrices as strings
The confusion matrix can be saved as a string (str), for example to generate automatic reports, or as a numeric matrix (cm).
>> str=sdconfmat(truelab,decisions,'string')
str =
True | Decisions
Labels | a b c d e f g h | Totals
-------------------------------------------------------------------------------
a | 1119 99 0 0 0 13 1 1 | 1233
b | 25 1189 0 0 0 0 53 0 | 1267
c | 0 0 1114 95 0 0 0 0 | 1209
d | 0 0 306 962 0 0 1 0 | 1269
e | 0 0 0 0 1038 201 0 0 | 1239
f | 0 0 0 0 187 1033 0 0 | 1220
g | 0 5 0 0 0 0 1122 169 | 1296
h | 0 0 0 0 0 0 120 1147 | 1267
-------------------------------------------------------------------------------
Totals | 1144 1293 1420 1057 1225 1247 1297 1317 | 10000
>> cm=sdconfmat(truelab,decisions)
cm =
Columns 1 through 6
1119 99 0 0 0 13
25 1189 0 0 0 0
0 0 1114 95 0 0
0 0 306 962 0 0
0 0 0 0 1038 201
0 0 0 0 187 1033
0 5 0 0 0 0
0 0 0 0 0 0
Columns 7 through 8
1 1
53 0
0 0
1 0
0 0
0 0
1122 169
120 1147
9.6.3. Rectangular confusion matrices
sdconfmat can limit the set of classes or decisions to user-specified
lists. Note that only a subset of the samples is then used! Rectangular confusion
matrices arise, for example, when there is a single 'outlier' true class but several
rejection decisions (e.g. 'background', 'not-fruit', ...). In the example
below only the true classes a, c and d are shown:
>> sdconfmat(truelab,decisions,'classes',{'a','c','d'})
ans =
True | Decisions
Labels | a c d e f g h | Totals
------------------------------------------------------------------------
a | 121 0 0 0 5 0 0 | 126
c | 0 0 109 0 0 0 0 | 109
d | 0 15 130 0 0 0 0 | 145
------------------------------------------------------------------------
Totals | 121 15 239 0 5 0 0 | 380
The classes and the decisions to be visualized can be chosen independently:
>> sdconfmat(truelab,decisions,'classes',{'a','c','d'},'decisions',{'a','b','c','d'})
ans =
True | Decisions
Labels | a b c d | Totals
---------------------------------------------------
a | 121 0 0 0 | 121
c | 0 0 0 109 | 109
d | 0 0 15 130 | 145
---------------------------------------------------
Totals | 121 0 15 239 | 375
When no classes option is provided, all true classes are
shown by default:
>> sdconfmat(truelab,decisions,'decisions',{'a','b','c','d'})
ans =
True | Decisions
Labels | a b c d | Totals
---------------------------------------------------
a | 121 0 0 0 | 121
b | 57 0 0 0 | 57
c | 0 0 0 109 | 109
d | 0 0 15 130 | 145
e | 0 0 0 0 | 0
f | 0 0 0 0 | 0
g | 1 0 0 0 | 1
h | 0 0 0 0 | 0
---------------------------------------------------
Totals | 179 0 15 239 | 433
9.6.4. Confusion matrices for a set of operating points
The confusion matrices from classifier soft output can be estimated for sets of operating points simultaneously. In this example, a test set with 10 000 samples is used and the confusion matrices are estimated at 10 000 randomly selected weighting-based operating points. The speed of the computation is also shown.
>> tr=sddata(gendatm(10000)); ts=sddata(gendatm(10000))
10000 by 2 sddata, 8 classes: [1270 1276 1142 1243 1315 1289 1238 1227]
>> p=sdquadratic(tr);
>> out=ts*p;
>> ops=sdops('w',rand(10000,8),tr.lab.list)
Weight-based operating set (10000 ops, 8 classes) at op 1
>> tic; [cm,ll]=sdconfmat(ops,out); toc
Elapsed time is 3.182373 seconds.
>> size(cm)
ans =
8 8 10000
The variable cm stores a confusion matrix (of size 8x8 for an eight-class
problem) for each of the 10000 operating points. The sdconfmat
routine can also be used to display a single confusion matrix in readable
form, e.g. the one at operating point number 42:
>> sdconfmat(cm(:,:,42),ts.lab.list)
True | Decisions
Labels | a b c d e f g h | Totals
-------------------------------------------------------------------------------
a | 1191 76 0 0 0 6 4 0 | 1277
b | 71 1152 0 0 0 0 56 0 | 1279
c | 0 0 978 175 0 0 0 0 | 1153
d | 0 0 251 984 0 0 0 0 | 1235
e | 0 0 0 0 1092 223 0 0 | 1315
f | 2 0 0 0 194 1089 0 0 | 1285
g | 0 0 0 0 0 0 1192 47 | 1239
h | 0 0 0 0 0 0 297 920 | 1217
-------------------------------------------------------------------------------
Totals | 1264 1228 1229 1159 1286 1318 1549 967 | 10000
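The layout of the three-dimensional result can be illustrated with a small plain-MATLAB loop. The soft outputs, labels and weights below are hypothetical, and the weighted-maximum rule is the assumption described in the introduction. The perClass routine is presumably far more efficient than this loop.

```matlab
% Hypothetical soft outputs for six samples and three classes
out = [0.7 0.2 0.1
       0.4 0.5 0.1
       0.2 0.3 0.5
       0.1 0.8 0.1
       0.3 0.3 0.4
       0.6 0.1 0.3];
truth = [1 1 3 2 3 1]';    % hypothetical true class indices

% Three weighting-based operating points, one per row
W = [1 1 1
     1 3 1
     3 1 1];

[nops, nclass] = size(W);
cm = zeros(nclass, nclass, nops);   % one confusion matrix per operating point
for k = 1:nops
    % Weighted maximum decision at operating point k
    [~, dec] = max(bsxfun(@times, out, W(k,:)), [], 2);
    cm(:,:,k) = accumarray([truth dec], 1, [nclass nclass]);
end

disp(size(cm))       % classes x decisions x operating points
disp(cm(:,:,2))      % confusion matrix at the second operating point
```

Indexing the third dimension, as in cm(:,:,2), selects the confusion matrix of one operating point, in the same way as cm(:,:,42) above.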
9.6.5. Visualization of the per class errors
In order to inspect which samples are misclassified in a certain class it may be useful to visualize the errors of the confusion matrix. This can be achieved by creating a new sample property that combines the true labels with the decisions of the classifier.
>> tr=sddata(gendatf(50)); tr=tr(:,:,[1,2]);
>> ts=sddata(gendatf(10000)); ts=ts(:,:,[1,2]);
>> p=sdmixture(tr,'comp',2,'iter',10);
>> dec=ts*sddecide(p)
sdlab with 7334 entries, 2 groups: 'apple'(3197) 'banana'(4137)
>> ts.confmat=[ts.lab '-' dec]; % concatenation of the two class labels
>> getproplist(ts)
ans =
'class' 'ident' 'confmat'
>> sdscatter(ts)
In the Scatter menu go to Use property and select confmat.
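The combined labels correspond to the cells of the confusion matrix. A plain-MATLAB sketch with hypothetical label names shows the idea; it uses cell arrays of strings instead of sdlab objects and is not the perClass implementation.

```matlab
% Hypothetical true labels and decisions as cell arrays of names
truth = {'apple','apple','banana','banana','apple'};
dec   = {'apple','banana','banana','apple','apple'};

% Join each pair into one label such as 'apple-banana' (true class first)
combined = strcat(truth, '-', dec);

% Count how often each combination occurs
[names, ~, idx] = unique(combined);
counts = accumarray(idx(:), 1);
for k = 1:numel(names)
    fprintf('%-15s %d\n', names{k}, counts(k));
end
```

Each distinct combined label is one cell of the confusion matrix, so samples labeled 'apple-banana' are the apples that were misclassified as banana.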

