- 3.1. Loading data
- 3.2. Training a fruit classifier
- 3.3. Executing the classifier on new data
- 3.4. Performing crisp decisions
- 3.5. Building a fruit detector
- 3.6. Creating a detector/classifier cascade
- 3.7. Executing the classifier in application outside of Matlab
This chapter provides a simple example of training a classifier in perClass and deploying it outside of Matlab in a custom application.
3.1. Loading data
Let us start by loading the Fruit data set:
>> load fruit.mat
>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60)
The 'apple' and 'banana' classes represent genuine fruit to be processed on
the conveyor belt. The samples labeled as 'stone' are outliers that should
be rejected. The data set a contains 260 samples, each represented by two
features.
The data set object is a data matrix augmented with meta-data information such as sample labels or feature names. The samples are stored as data rows and features as columns.
We visualize the Fruit data set in a scatter plot using:
>> sdscatter(a)

We will now split our available data set into training and test subsets. We will use 50% of the data for training the classifier and the rest for estimating its performance:
>> [tr,ts]=randsubset(a,0.5)
'Fruit set' 130 by 2 sddata, 3 classes: 'apple'(50) 'banana'(50) 'stone'(30)
'Fruit set' 130 by 2 sddata, 3 classes: 'apple'(50) 'banana'(50) 'stone'(30)
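The class counts above (50/50/30 in both halves) suggest that the split is drawn separately within each class. The following is a minimal sketch of such a per-class random split in plain Python. It is not the implementation of randsubset; the function name stratified_split and the seed argument are hypothetical and serve only as an illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, fraction, seed=0):
    """Return (train_idx, test_idx), drawing `fraction` of each class for training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                  # random order within one class
        k = round(fraction * len(idx))    # number of training samples for this class
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)

# Same class sizes as the Fruit set: 100 apples, 100 bananas, 60 stones.
labels = ["apple"] * 100 + ["banana"] * 100 + ["stone"] * 60
train_idx, test_idx = stratified_split(labels, 0.5)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the class proportions of the full set in both subsets, which matters for the small 'stone' class.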
3.2. Training a fruit classifier
We will now train a classifier discriminating between the fruit classes. First, let's extract the subset with samples labeled as 'apple' and 'banana'. In perClass, the third dimension represents classes. Therefore, we may extract a subset simply by listing class names:
>> tr2=tr(:,:,{'apple','banana'})
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50)
In order to capture the specific shape of the class distributions, we use the Gaussian mixture model:
>> p=sdmixture(tr2)
[class 'apple' initialization:...................... 2 clusters EM:.....................
......... 2 comp] [class 'banana' initialization:...................... 2 clusters EM:...
........................... 2 comp]
Mixture of Gaussians pipeline 2x2 2 classes, 4 components (sdp_normal)
The output of the sdmixture function is a trained pipeline. Pipelines in
perClass represent components of a pattern recognition system.
Note that we have not specified the number of mixture components. By
default, sdmixture estimates the number of components in each class from
the data automatically, using the EM algorithm and a non-parametric density
estimation approach.
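To make the model concrete: a Gaussian mixture describes the density of one class as a weighted sum of Gaussian components. The sketch below evaluates such a density for two features in plain Python. It covers only the evaluation of an already trained model, not the EM training, and the component parameters are made up for illustration; they are not the values estimated by sdmixture.

```python
import math

def gauss2d(x, mean, cov):
    """Density of a 2-D Gaussian with covariance ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)' * inv(cov) * (x - mean) for a symmetric 2x2 matrix.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def mixture_density(x, components):
    """components: list of (weight, mean, cov) with weights summing to 1."""
    return sum(w * gauss2d(x, m, c) for w, m, c in components)

# Illustrative two-component model for one class (made-up parameters).
apple_model = [
    (0.6, (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    (0.4, (3.0, 1.0), ((2.0, 0.5), (0.5, 1.0))),
]
print(mixture_density((0.5, 0.2), apple_model))
```

With one such model per class, evaluating a sample yields one density value per class.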
3.3. Executing the classifier on new data
The pipeline may be executed on new data using the multiplication operator:
>> out=tr*p
'Fruit set' 130 by 2 sddata, 3 classes: 'apple'(50) 'banana'(50) 'stone'(30)
When executed on new data, the pipeline p returns estimates of the
probability density for each of the two trained classes. We may display
the content of the resulting data set for the first five samples using:
>> +out(1:5)
ans =
0.0157 0.0002
0.0085 0.0000
0.0140 0.0000
0.0012 0.0001
0.0021 0.0001
The unary plus operator is just a convenient shortcut for sddata
conversion to double (double(out)).
We may visualize this soft output of the pipeline p using sdscatter:
>> sdscatter(ts,p)

The scatter plot backdrop now shows the estimated class-conditional density computed on a dense grid over the feature space. You may use the up and down cursor keys to flip between the soft outputs of both classes.
3.4. Performing crisp decisions
To move from the soft output to decisions, we need to add a decision step
to our pipeline. We can do that using the sddecide function:
>> pd=sddecide(p)
sequential pipeline 2x1 'Mixture of Gaussians+Decision'
1 Mixture of Gaussians 2x2 2 classes, 4 components (sdp_normal)
2 Decision 2x1 weighting, 2 classes, 1 ops at op 1 (sdp_decide)
It adds a default operating point that assigns each sample to the class with the maximum class-conditional probability density (the maximum a posteriori rule with equal class priors).
When applying the resulting pipeline pd to any data or matrix of doubles
with 2 columns, we obtain crisp decisions:
>> dec=ts*pd
sdlab with 130 entries, 2 groups: 'apple'(54) 'banana'(76)
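The default decision rule described above amounts to picking, for each sample, the class with the largest soft output. A minimal sketch in plain Python follows, using the five density rows displayed in section 3.3. The function name decide is hypothetical and this is not how sddecide is implemented internally.

```python
def decide(densities, class_names):
    """Assign each row to the class with the largest density (first class wins ties)."""
    decisions = []
    for row in densities:
        best = max(range(len(row)), key=lambda j: row[j])
        decisions.append(class_names[best])
    return decisions

# The five soft outputs shown in section 3.3 (columns: 'apple', 'banana').
out = [
    [0.0157, 0.0002],
    [0.0085, 0.0000],
    [0.0140, 0.0000],
    [0.0012, 0.0001],
    [0.0021, 0.0001],
]
decisions = decide(out, ["apple", "banana"])
print(decisions)
```

With equal priors the densities can be compared directly; unequal priors would require weighting each column before taking the maximum.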
Both labels and decisions in perClass are stored in sdlab
objects. We may access true labels in the test set using:
>> ts.lab
sdlab with 130 entries, 3 groups: 'apple'(50) 'banana'(50) 'stone'(30)
A confusion matrix helps us compare the ground truth labels with the decisions:
>> sdconfmat(ts.lab,dec)
ans =
 True      | Decisions
 Labels    |  apple banana  | Totals
-------------------------------------
 apple     |     49      1  |     50
 banana    |      0     50  |     50
 stone     |      5     25  |     30
-------------------------------------
 Totals    |     54     76  |    130
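A confusion matrix is simply a count of (true label, decision) pairs. The sketch below tallies such counts in plain Python. The label and decision lists are reconstructed only so that they reproduce the counts in the table above; they do not reflect the actual order of samples in ts, and confusion_counts is a hypothetical helper, not a perClass function.

```python
from collections import Counter

def confusion_counts(true_labels, decisions):
    """Count how often each (true label, decision) pair occurs."""
    return Counter(zip(true_labels, decisions))

# Lists arranged to reproduce the counts of the table above.
true_labels = ["apple"] * 50 + ["banana"] * 50 + ["stone"] * 30
decisions = (["apple"] * 49 + ["banana"] * 1      # true apples
             + ["banana"] * 50                    # true bananas
             + ["apple"] * 5 + ["banana"] * 25)   # true stones

conf = confusion_counts(true_labels, decisions)
for true in ["apple", "banana", "stone"]:
    row = [conf[(true, dec)] for dec in ["apple", "banana"]]
    print(f"{true:<8}", row, sum(row))
```

Pairs that never occur, such as a banana decided as 'apple', are reported as 0 because a Counter returns 0 for missing keys.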
We may see that on our test set, most of the stones (25 of 30) get labeled as bananas and the rest as apples. To understand why, let's visualize the classifier decisions in the feature space:
>> sdscatter(ts,pd)

Our mixture model is a discriminant. This means that it labels each new observation as either 'apple' or 'banana'. Although this works fine in the neighborhood of our training observations, the discriminant decision may become meaningless far away from them. In production, we may encounter observations not reliably represented by our training data. In such situations, we may want to avoid making decisions based on the discrimination scheme alone. We can accomplish this in two ways: either by adding a reject option to our discriminant or by training a separate detector for genuine fruit examples. In this chapter, we illustrate the latter case. See Chapter 9 for an explanation of how to add a reject option to a classifier trained in perClass.
3.5. Building a fruit detector
A detector is a statistical model with a thresholded soft output. Training a
detector in perClass may be done in a single step using the sddetector
function. We provide it with the data set, the name of the class to be
modelled, and the untrained model:
>> pd2=sddetector(tr2,'fruit',sdparzen,'reject',0.01)
...sequential pipeline 2x1 'Parzen+Decision'
1 Parzen 2x1 one class, 100 prototypes (sdp_parzen)
2 Decision 1x1 thresholding on fruit at op 1 (sdp_decide)
We train our detector on all examples in the tr2 data set. We specify
'fruit' as the target class. Because no class named 'fruit' is present in
tr2, sddetector trains the model on all data. We adopt the non-parametric
Parzen density estimator.
Finally, because we train the model on all available data, we need to specify how to set the detector threshold. Here, we set it so that 1% of the training data is rejected (one-class approach).
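The one-class idea can be sketched in plain Python: estimate a density from the target samples, score the training samples, and place the threshold so that the lowest-scoring 1% fall below it. The sketch makes several assumptions that are not taken from perClass: a Gaussian kernel with a fixed width h, training samples scored with their own kernel included, synthetic training data, and hypothetical function names. sdparzen and sddetector may select the smoothing and the threshold differently.

```python
import math
import random

def parzen_density(x, prototypes, h):
    """Average of isotropic 2-D Gaussian kernels of width h centred on the prototypes."""
    norm = 2 * math.pi * h * h
    total = sum(
        math.exp(-((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2) / (2 * h * h))
        for p in prototypes
    )
    return total / (norm * len(prototypes))

def reject_threshold(scores, reject_fraction):
    """Threshold with at most int(reject_fraction * n) scores strictly below it."""
    ordered = sorted(scores)
    return ordered[int(reject_fraction * len(ordered))]

# Synthetic stand-in for 100 training samples (not the Fruit data).
rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
h = 0.5                                        # assumed kernel width
scores = [parzen_density(x, train, h) for x in train]
thr = reject_threshold(scores, 0.01)

def detect(x):
    """Accept a sample as 'fruit' when its density reaches the threshold."""
    return "fruit" if parzen_density(x, train, h) >= thr else "non-fruit"

rejected = sum(s < thr for s in scores)
print(rejected, detect((0.0, 0.0)), detect((100.0, 100.0)))
```

Because the threshold depends only on the target class, no outlier examples are needed for training; the price is that a small fraction of genuine samples is rejected by design.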
The detector pd2 returns one of two decisions, 'fruit' or 'non-fruit'.
We visualize its outputs using sdscatter:
>> sdscatter(ts,pd2)

3.6. Creating a detector/classifier cascade
We will now create a two-stage classifier composed of the 'fruit' detector and the 'apple'/'banana' discriminant. The detector will be executed on all input samples. Only the samples accepted by the detector as 'fruit' will be passed to the second stage, which performs the 'apple'/'banana' discrimination.
>> pc=sdcascade(pd2,'fruit',pd)
2-stage cascade pipeline 2x1 (sdp_cascade)
Executing the cascade on our test set, we may see that the 'stone' examples get mostly rejected as 'non-fruit':
>> sdconfmat(ts.lab,ts*pc)
ans =
 True      | Decisions
 Labels    | non-fr  apple banana  | Totals
--------------------------------------------
 apple     |      2     47      1  |     50
 banana    |      4      0     46  |     50
 stone     |     25      0      5  |     30
--------------------------------------------
 Totals    |     31     47     52  |    130
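The control flow of a two-stage cascade is short enough to sketch in plain Python. The detector and classifier below are toy rules invented purely for illustration, and the function cascade is hypothetical; it mirrors only the order of the arguments in the sdcascade call above, not its implementation.

```python
def cascade(sample, detector, target, classifier):
    """Run the detector first; only samples decided as `target` reach the classifier."""
    first = detector(sample)
    if first == target:
        return classifier(sample)
    return first                       # e.g. 'non-fruit' is returned as the final decision

# Toy stages (made-up rules, not the trained pd2 and pd pipelines).
def toy_detector(x):
    inside = 0 <= x[0] <= 10 and 0 <= x[1] <= 10
    return "fruit" if inside else "non-fruit"

def toy_classifier(x):
    return "apple" if x[0] < 5 else "banana"

for sample in [(2, 3), (8, 3), (50, 50)]:
    print(sample, cascade(sample, toy_detector, "fruit", toy_classifier))
```

The final decision set is therefore the detector's reject decision plus the decisions of the second stage, which matches the three columns of the confusion matrix above.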
Finally, we visualize the cascade decisions in a scatter plot:
>> sdscatter(ts,pc)

3.7. Executing the classifier in application outside of Matlab
In order to execute the cascade classifier outside Matlab, we need two
things. First, we link our application with the runtime library
perclass.dll. In our example, we use the simple PRSDDemo application
written in RealBasic. You may find it in the interfaces\GUIDemo directory
of the perClass distribution. This demo serves only as an illustrative
example. perClass may be embedded in any other language or environment as
long as it can call a DLL.

Second, we export the classifier from Matlab using the sdexport
command:
>> sdexport(pc,'cascade.ppl')
Exporting pipeline..ok
This pipeline requires perClass runtime version 3.0 (18-jun-2011) or higher.
The sdexport command creates a binary file cascade.ppl with a complete
description of our cascade classifier.
We also export the test data into a comma-separated file:
>> dlmwrite('test_data.txt',+ts)
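On the application side, the exported text file holds one sample per line with the feature values separated by commas. As a rough sketch of how a custom application might read such a file, here is a plain Python reader. The helper load_samples is hypothetical, and the two rows written to the temporary file are made-up values, not the Fruit test data.

```python
import csv
import os
import tempfile

def load_samples(path):
    """Read a comma-separated file into a list of rows of floats, skipping empty lines."""
    with open(path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f) if row]

# Write a tiny stand-in file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "test_data.txt")
with open(path, "w", newline="") as f:
    f.write("1.5,2.25\n3,4\n")

samples = load_samples(path)
print(samples)
```

Each row must contain as many values as the pipeline has input features, two in this chapter.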
We may now load this pipeline in our perClassDemo application:

We load the text file with comma-separated test samples:

Finally, we execute the cascade classifier:

We may check that we get the same decisions as in Matlab:
>> dec=ts*pc
sdlab with 130 entries, 3 groups: 'non-fruit'(33) 'apple'(46) 'banana'(51)
>> +dec(1:10)
ans =
apple
apple
apple
apple
non-fruit
apple
apple
apple
apple
apple
Note that the classifier running in our application may be changed at any time without recompilation. This allows us to quickly test our classifiers in real-world production conditions.
This concludes our short Getting started intro to perClass.
