perClass Documentation
version 5.0 (21-sep-2016)

kb18: How to protect a trained discriminant against outliers?

Problem: How can I protect a multi-class discriminant against accepting outliers?

Solution: Add a rejection threshold to the discriminant operating point.

Classifiers we train are often executed in environments where new types of measurements appear that were not considered during classifier design. For example, in a fruit sorting problem our classifier distinguishing several types of fruit may also encounter stones, leaves or dirt on the conveyor belt. Accepting stones or dirt as one of the fruit classes results in high sorting error.

In this tutorial, we discuss how to protect a trained multi-class discriminant from accepting such outliers.

The approach we take in this example is adding a reject option to a trained discriminant. This method does not use any outlier examples during training. Note, however, that some outlier examples are still needed for evaluation.

18.1. Fruit data set example

Our data set contains three classes, namely the apple and banana fruits and some stones we have observed. Our goal is to discriminate apple from banana while protecting the decision against any potential outliers.

>> a
'Fruit set' 260 by 2 sddata, 3 classes: 'apple'(100) 'banana'(100) 'stone'(60) 

>> sdscatter(a)

three-class fruit data set

Let us first split our data set into training and test subsets. As mentioned before, we will not use the stone class during training, only in the testing phase. Using the randsubset method, we may randomly sample only some of the classes:

>> [tr,ts]=randsubset(a,[50 50 0])
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50) 
'Fruit set' 160 by 2 sddata, 3 classes: 'apple'(50) 'banana'(50) 'stone'(60) 
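The per-class subsampling that randsubset performs can be sketched in plain Python. This is only an illustration of the idea (a fixed number of randomly drawn samples per class goes to training, the remainder to testing), not the perClass implementation; the function name rand_subset is our own.

```python
import random

def rand_subset(labels, counts_per_class):
    """Draw a fixed number of random samples per class for training;
    all remaining samples form the test set."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train, test = [], []
    for lab, idx in by_class.items():
        random.shuffle(idx)
        n = counts_per_class.get(lab, 0)   # 0 -> class excluded from training
        train.extend(idx[:n])
        test.extend(idx[n:])
    return sorted(train), sorted(test)

# 260 samples: 100 apple, 100 banana, 60 stone, as in the fruit set
labels = ['apple'] * 100 + ['banana'] * 100 + ['stone'] * 60
tr, ts = rand_subset(labels, {'apple': 50, 'banana': 50, 'stone': 0})
print(len(tr), len(ts))  # 100 160
```

Passing 0 for the stone class keeps all stones out of the training subset, exactly as the [50 50 0] vector does above.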

18.2. Training a discriminant ↩

We will now train a model of interest on the two-class fruit problem tr. In this example, we use the Parzen classifier:

>> p=sdparzen(tr)
...Parzen pipeline         2x2  2 classes, 100 prototypes (sdp_parzen)
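The Parzen model underlying this pipeline can be sketched as follows: each class output is a kernel density estimate averaged over that class's prototypes. This is a minimal NumPy illustration of the model family, assuming a fixed Gaussian kernel width h; it is not the perClass implementation, and the names parzen_outputs and h are our own.

```python
import numpy as np

def parzen_outputs(train_x, train_y, test_x, h=0.5):
    """Per-class Parzen density: average a Gaussian kernel over each
    class's prototypes and return one density column per class."""
    classes = sorted(set(train_y))
    outs = []
    for c in classes:
        protos = train_x[np.array(train_y) == c]               # class prototypes
        d2 = ((test_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        k = np.exp(-0.5 * d2 / h ** 2) / (2 * np.pi * h ** 2)  # 2-D Gaussian kernel
        outs.append(k.mean(axis=1))
    return np.stack(outs, axis=1), classes

# toy 2-D data: two well-separated clusters standing in for the fruit classes
rng = np.random.default_rng(0)
x = np.vstack([rng.normal([0, 0], 0.3, (50, 2)),
               rng.normal([3, 3], 0.3, (50, 2))])
y = ['apple'] * 50 + ['banana'] * 50
out, classes = parzen_outputs(x, y, np.array([[0.1, 0.0], [3.1, 2.9]]))
print([classes[i] for i in out.argmax(axis=1)])  # ['apple', 'banana']
```

Because the outputs are densities, they shrink toward zero far from all prototypes, which is what later makes outlier rejection possible.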

To provide decisions, we need to explicitly add a desired operating point using sddecide. We will use the default setting with equal weights for the class outputs:

>> pd=sddecide(p)
sequential pipeline     2x1 'Parzen+Decision'
 1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
 2  Decision                2x1  weighting, 2 classes, 1 ops at op 1 (sdp_decide)

We visualize the decisions of our two-class discriminant pd on the test set ts:

>> sdscatter(ts,pd)

discriminant decisions on a three-class test set with outliers

We may observe that the stones (green markers) are assigned to one of the two fruit classes.

>> sdconfmat(ts.lab,ts*pd)

ans =

 True      | Decisions
 Labels    |  apple banana  | Totals
-------------------------------------
 apple     |    49      1   |    50
 banana    |     0     50   |    50
 stone     |     3     57   |    60
-------------------------------------
 Totals    |    52    108   |   160
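The tabulation behind such a confusion matrix is straightforward: count how often each true label meets each decision. A minimal sketch of what sdconfmat tabulates (the function name confmat is our own, and the toy labels are invented):

```python
def confmat(true_labels, decisions, names):
    """Confusion matrix as a dict of dicts: rows are true labels,
    columns are decisions."""
    m = {t: {d: 0 for d in names} for t in names}
    for t, d in zip(true_labels, decisions):
        m[t][d] += 1
    return m

truth = ['apple', 'apple', 'banana', 'stone']
decided = ['apple', 'banana', 'banana', 'banana']
m = confmat(truth, decided, ['apple', 'banana', 'stone'])
print(m['stone']['banana'])  # 1 -> one stone accepted as banana
```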

18.3. Adding a reject option to the discriminant

Let us now add a reject option to the operating point in pd using the sdreject command. sdreject adds a threshold on the maximum weighted output of the discriminant in pd. The threshold value is selected so that a specific percentage of the training data is rejected.
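The threshold selection can be sketched as follows: take the maximum class output per training sample and choose the threshold as the quantile that rejects the desired fraction. This is an illustration of the idea under our own naming (reject_threshold, decide), not the perClass implementation.

```python
import numpy as np

def reject_threshold(max_outputs, reject_fraction):
    """Threshold on the maximum class output such that roughly
    `reject_fraction` of the training samples fall below it."""
    return np.quantile(max_outputs, reject_fraction)

def decide(max_outputs, argmax_class, threshold):
    """Winning class index, or -1 (reject) when the maximum output
    is below the threshold."""
    return np.where(max_outputs < threshold, -1, argmax_class)

# toy density-like outputs for two classes, 100 training samples
rng = np.random.default_rng(0)
outputs = rng.random((100, 2))
max_out = outputs.max(axis=1)

t = reject_threshold(max_out, 0.01)          # reject 1% by default
decisions = decide(max_out, outputs.argmax(axis=1), t)
print((decisions == -1).sum())  # 1
```

A sample far from all classes yields a small maximum density output and therefore falls below the threshold, which is exactly the mechanism that catches the stones.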

>> pr=sdreject(pd,tr)
Weight-based operating point,2 classes,[0.50,0.50]
sequential pipeline     2x1 'Parzen+Decision'
 1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
 2  Decision                2x1  weight+reject, 3 decisions, ROC 1 ops at op 1 (sdp_decide)

The resulting pipeline pr returns three decisions:

>> pr.list
sdlist (3 entries)
 ind name
   1 apple 
   2 banana
   3 reject

As we may see on the training set, 1% of the samples are rejected by default:

>> sdconfmat(tr.lab,tr*pr)

ans =

 True      | Decisions
 Labels    |  apple banana reject  | Totals
--------------------------------------------
 apple     |    48      1      1   |    50
 banana    |     2     48      0   |    50
--------------------------------------------
 Totals    |    50     49      1   |   100

When executed on the test set, our new classifier with the reject option pr rejects most of the stone samples:

>> sdconfmat(ts.lab,ts*pr)

ans =

 True      | Decisions
 Labels    |  apple banana reject  | Totals
--------------------------------------------
 apple     |    46      1      3   |    50
 banana    |     0     49      1   |    50
 stone     |     0      9     51   |    60
--------------------------------------------
 Totals    |    46     59     55   |   160

Finally, we visualize the decisions of the classifier with reject option on the test set:

>> sdscatter(ts,pr)

rejecting outliers on a three-class test set

18.4. Building a reject curve

Instead of fixing the reject fraction manually, we may build an entire reject curve relating different reject fractions to performance. This is achieved using the sdroc command with the 'reject' option.

Similarly to standard ROC analysis, we first need to estimate the soft outputs of our trained model:

>> out=tr*p
'Fruit set' 100 by 2 sddata, 2 classes: 'apple'(50) 'banana'(50) 

Now we invoke the sdroc command with the 'reject' option:

>> r=sdroc(out,'reject')
ROC (1001 wr-based op.points, 3 measures), curop: 1
est: 1:frac(reject)=0.00, 2:TPr(apple)=0.98, 3:TPr(banana)=0.96

By default, the fraction of rejected samples and the per-class true positive rates (recalls) are estimated.
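The quantities along such a curve can be sketched by sweeping the rejection threshold over the range of maximum outputs and recording, per operating point, the reject fraction and per-class recalls. This is an illustration of what the 'reject' option estimates, not the sdroc implementation; reject_curve and its arguments are our own names.

```python
import numpy as np

def reject_curve(max_outputs, true_labels, pred_labels, n_points=5):
    """For each threshold, record the fraction of rejected samples and
    the true positive rate of every class among the accepted samples."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    classes = sorted(set(true_labels.tolist()))
    curve = []
    for t in np.quantile(max_outputs, np.linspace(0.0, 1.0, n_points)):
        accepted = max_outputs >= t
        frac_reject = 1.0 - accepted.mean()
        tprs = {c: ((accepted & (true_labels == c) & (pred_labels == c)).sum()
                    / (true_labels == c).sum())
                for c in classes}
        curve.append((frac_reject, tprs))
    return curve

# toy soft outputs for two classes
rng = np.random.default_rng(1)
outputs = rng.random((200, 2))
labels = rng.integers(0, 2, 200)
curve = reject_curve(outputs.max(axis=1), labels, outputs.argmax(axis=1))
print(curve[0][0])  # 0.0 -> at the lowest threshold nothing is rejected
```

Raising the threshold trades recall for rejection: the per-class TPr can only drop as the reject fraction grows, which is the trade-off the interactive ROC plot lets us explore.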

To visualize the interactive ROC and scatter plots, use the sdscatter command:

>> sdscatter(ts,p*r,'roc',r)

interactive reject curve

Note that we may visualize the test set containing the additional stone examples. Moving the mouse over the ROC plot, we may investigate how the classifier boundary changes with different rejection thresholds.

18.5. What discriminant models can be used for outlier rejection?

Not all statistical models may be used for outlier rejection. Only models that output a probability density or a distance can reject outliers. If the discriminant outputs are normalized over a set of classes, the domain information is lost and cannot be recovered. Adding a reject option to such a model results in rejection close to the decision boundary (the area of low confidence), not in outlier rejection.
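The loss of domain information can be demonstrated with two toy one-dimensional Gaussian class densities (our own illustration; the means 0 and 4 and the probe point are invented). A point far from both classes has a vanishing maximum density, so a density threshold rejects it; yet its normalized posterior is close to 1, so a posterior threshold confidently accepts it.

```python
import math

def gauss(x, mu, sigma=1.0):
    """One-dimensional Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def densities(x):
    """Two class-conditional densities centred at 0 and 4."""
    return gauss(x, 0.0), gauss(x, 4.0)

def posteriors(x):
    """The same outputs normalized to sum to one."""
    a, b = densities(x)
    return a / (a + b), b / (a + b)

x_outlier = 20.0                # far away from both classes
d = max(densities(x_outlier))   # essentially zero: a density threshold rejects
p = max(posteriors(x_outlier))  # close to 1: a posterior threshold accepts
print(d, p)
```

This is exactly why the normalized pipeline below ends up rejecting only near the decision boundary instead of rejecting the stones.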

As an illustration, we may visualize the decisions of a classifier built on top of Parzen with outputs normalized to sum to one (a posteriori probabilities):

>> pm=sdnorm(p)
sequential pipeline     2x2 'Parzen+Output normalization'
 1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
 2  Output normalization    2x2  (sdp_norm)
>> pr2=sdreject(pm,tr)
sequential pipeline     2x1 'Parzen+Output normalization+Decision'
 1  Parzen                  2x2  2 classes, 100 prototypes (sdp_parzen)
 2  Output normalization    2x2  (sdp_norm)
 3  Decision                2x1  weight+reject, 3 decisions, 1 ops at op 1 (sdp_decide)
>> sdscatter(ts,pr2)

rejecting close to the decision boundary on a three-class test set

PRSD Studio models appropriate for outlier detection: