perClass Documentation
version 5.0 (21-sep-2016)

kb16: Visualize the effect of a change of parameters in a trained classifier

Problem: Can I visualize how a trained classifier is influenced by its parameters?

Solution: The scatter plot provides a slider to directly visualize the effect of changing the parameters of the following classifiers: KNN, Parzen and AdaBoost.

Some classifiers require a parameter to be set. What is the effect of this choice? sdscatter can interactively visualize, in two dimensions, the effect of changing the parameter of a trained classifier for three different models: KNN (number of neighbours k), Parzen (smoothing parameter) and AdaBoost (number of base classifiers).

In all cases the choice of the parameter influences the complexity of the classifier. For example, small values of k for KNN, or of the smoothing parameter for the Parzen classifier, allow a fine fit to the training set and lead to a complex decision boundary. In contrast, large values of k or of the smoothing parameter result in a much simpler, smoother decision boundary. Usually the choice of a parameter is a trade-off between fitting the training data closely and avoiding overfitting, which would reduce the generalization capability of the classifier.

Let's illustrate the effect of changing the smoothing parameter in the Parzen classifier. First we train sdparzen on the three-class dataset a, and visualize the soft-outputs of the classifier (i.e. the probability density function) for the first class.

>> load fruit; a
Fruit set, 260 by 2 dataset with 3 classes: [100  100   60]
>> p=sdparzen(a)
.Parzen pipeline         2x2  2 classes, 100 prototypes (sdp_parzen)    
>> sdscatter(a,p) 

The soft output ranges from high values, shown in red, to small values, shown in deep blue. The figure title shows the value of the smoothing parameter. The slider at the bottom of the figure lets you change the parameter interactively. Small values give a sharply peaked, local probability density function for the class of interest. Increasing the smoothing parameter leads to a smoother, more global probability density function.
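The effect of the smoothing parameter can also be reproduced outside the toolbox. The following is a minimal sketch in base MATLAB, assuming a Gaussian kernel and a one-dimensional toy class. It is not the sdparzen implementation, and the names x, xq, parzen and npeaks are hypothetical.

```matlab
% Illustration only: hand-written Gaussian Parzen estimate (assumed kernel),
% not the sdparzen implementation. All names are hypothetical.
x  = [0; 1; 2; 3];                 % four 1-D training samples of one class
xq = (-2:0.05:8)';                 % points where the density is evaluated

% Average of Gaussian kernels of width h centred on the training samples
parzen = @(q, x, h) mean(exp(-bsxfun(@minus, q, x').^2 / (2*h^2)), 2) ...
                    / (sqrt(2*pi) * h);

p_small = parzen(xq, x, 0.2);      % small smoothing parameter
p_large = parzen(xq, x, 2.0);      % large smoothing parameter

% Count the strict local maxima of a sampled density
npeaks = @(p) sum(p(2:end-1) > p(1:end-2) & p(2:end-1) > p(3:end));

fprintf('h = 0.2: %d peak(s), h = 2.0: %d peak(s)\n', ...
        npeaks(p_small), npeaks(p_large));
```

With the small value the estimate keeps a separate narrow peak at each training sample. With the large value the peaks merge into one broad bump, which is the one-dimensional analogue of what the slider shows.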

To visualize the classifier decisions, we fix a default operating point by adding sddecide to the trained sdparzen pipeline.

>> sdscatter(a,p*sddecide)

As expected, the decision boundary of the classifier becomes smoother as the parameter increases.

Similar behaviour can be observed for the KNN classifier.

>> p=sdknn(a,'k',20,'method','classfrac')
20-NN pipeline          2x2  k=20, 260 prototypes (sdp_knnmc)
>> sdscatter(a,p*sddecide)

Small values of k lead to a complex decision boundary. Increasing the number of nearest neighbours reduces overfitting to the training dataset and results in a smoother decision boundary.
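The same trend can be checked by hand on a small example. The sketch below is base MATLAB only and does not use sdknn. It assumes a simple fraction-of-neighbours vote, which is our reading of a class-fraction output and not taken from the perClass sources. The toy data and all variable names are hypothetical.

```matlab
% Illustration only: hand-written k-NN vote on 1-D toy data, not sdknn.
% The fraction-of-neighbours rule below is an assumption.
x   = (1:10)';                          % 1-D training samples
lab = [1 1 1 2 1 2 2 2 2 2]';           % class labels, with two classes mixed near x = 4..5
xq  = (0.55:0.1:10.45)';                % query points (chosen to avoid distance ties)

ks = [1 5];                             % values of k to compare
nchanges = zeros(size(ks));
for j = 1:numel(ks)
    pred = zeros(size(xq));
    for i = 1:numel(xq)
        [~, idx] = sort(abs(x - xq(i)));    % training samples ordered by distance
        nearest  = lab(idx(1:ks(j)));       % labels of the k nearest samples
        frac1    = mean(nearest == 1);      % fraction of neighbours in class 1
        pred(i)  = 1 + (frac1 < 0.5);       % class 1 if it holds the majority, else class 2
    end
    nchanges(j) = sum(diff(pred) ~= 0);     % number of label switches along the line
end
disp(nchanges)
```

The number of label switches along the line is a crude measure of boundary complexity: with k=1 every isolated sample creates its own region, while k=5 averages it away.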

For the AdaBoost classifier, the type and number of base classifiers need to be chosen. In the example below, 42 simple decision stumps are used as base classifiers.

>> a=gendatb
Banana Set, 100 by 2 dataset with 2 classes: [50  50]
>> w=adaboostc(a,stumpc([],'maxcrit',2),42)
Weighted Voting, 2 to 2 trained  mapping   --> wvotec
>> p=sdconvert(w)
sequential pipeline     2x2 ''
 1  sdp_adaboost        2x2  
>> sdscatter(a,p*sddecide)

The scatter plots show the decision boundary obtained with different numbers of base classifiers. In the first plot only one classifier is used, so the space is simply split into two regions by the single decision stump. When more classifiers are employed, the decision boundary becomes more complex and more accurate, since the decisions of a number of classifiers are fused together. The figure title displays the number of base classifiers used and the execution time. Note that the smaller the number of classifiers, the faster the execution.
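To make the role of the number of base classifiers concrete, here is a minimal sketch of discrete AdaBoost with decision stumps in base MATLAB. It is a textbook-style illustration under our own assumptions (1-D toy data, exhaustive threshold search, three rounds). It is not the adaboostc or sdp_adaboost implementation, and every name in it is hypothetical.

```matlab
% Illustration only: hand-written discrete AdaBoost with decision stumps.
% Not adaboostc or sdp_adaboost; data and names are hypothetical.
x = (0:9)';                                  % 1-D training samples
y = [1 1 1 -1 -1 -1 1 1 1 -1]';              % labels in {-1,+1}
T = 3;                                       % number of base classifiers to train
w = ones(size(x)) / numel(x);                % uniform sample weights
thr = zeros(T,1); pol = zeros(T,1); alpha = zeros(T,1);

for t = 1:T
    best = inf;
    for c = -0.5:1:9.5                       % candidate thresholds
        for s = [1 -1]                       % stump polarity
            h = s * (2*(x < c) - 1);         % predict s left of c, -s right of it
            err = sum(w(h ~= y));            % weighted training error
            if err < best - 1e-12            % tolerance: ties keep the first candidate
                best = err; thr(t) = c; pol(t) = s;
            end
        end
    end
    % Vote weight of this stump (assumes 0 < best < 0.5, true for this data)
    alpha(t) = 0.5 * log((1 - best) / best);
    h = pol(t) * (2*(x < thr(t)) - 1);
    w = w .* exp(-alpha(t) * y .* h);        % up-weight misclassified samples
    w = w / sum(w);                          % renormalize the weights
end

% Training errors when only the first n stumps take part in the vote
nerr = zeros(1, T);
for n = 1:T
    f = zeros(size(x));
    for t = 1:n
        f = f + alpha(t) * pol(t) * (2*(x < thr(t)) - 1);
    end
    nerr(n) = sum(sign(f) ~= y);
end
disp(nerr)
```

On this toy set a single stump leaves three training errors, two stumps still leave three, and the weighted vote of all three stumps leaves none. Evaluating only the first n stumps also costs less, in line with the execution time reported in the figure title.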