24.08.2011  pavel

How to visualize errors in a scatter plot?

Last time, we discussed how to identify samples that fall in a specific entry of a confusion matrix. Today, I will show you how to visualize them in the interactive scatter plot. This lets us inspect the errors closely and quickly see which region of the feature space is most affected.


In perClass, labels and decisions are represented by sdlab objects. A label object behaves like a vertical 1D vector. We will start with the test set and decisions from this example:
>> ts
'medical D/ND' 2842 by 10 sddata, 2 classes: 'disease'(607) 'no-disease'(2235) 

>> ts.lab
sdlab with 2842 entries, 2 groups: 'disease'(607) 'no-disease'(2235) 

>> dec
sdlab with 2842 entries, 2 groups: 'disease'(562) 'no-disease'(2280) 
Because a label object behaves like a column vector, we may concatenate it horizontally with another set of labels. In this example, we concatenate the true labels with the decisions, creating a new label set that describes the individual entries of the confusion matrix. To better distinguish the true label from the decision in the new class names, we add a string separator in between:
>> L=[ts.lab ' / ' dec]
sdlab with 2842 entries, 4 groups: 'no-disease / no-disease'(1951) 'disease / disease'(278) 
'no-disease / disease'(284) 'disease / no-disease'(329) 

>> L'
 ind name                       size percentage
   1 no-disease / no-disease    1951 (69.0%)
   2 disease / disease           278 (10.0%)
   3 no-disease / disease        284 (10.0%)
   4 disease / no-disease        329 (12.0%)
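For readers without perClass at hand, the same idea can be sketched in base MATLAB. This is only an illustration of the technique, not how sdlab is implemented: the labels are plain cell arrays of strings, the values are made-up toy data, and the variable names (trueLab, decLab, combined) are assumptions introduced here.

```matlab
% Toy true labels and decisions (hypothetical values, not the medical data set).
trueLab = {'disease'; 'disease'; 'no-disease'; 'no-disease'; 'no-disease'; 'disease'};
decLab  = {'disease'; 'no-disease'; 'no-disease'; 'disease'; 'no-disease'; 'disease'};

% Join each true label with its decision, using ' / ' as the separator.
combined = cellfun(@(t, d) [t ' / ' d], trueLab, decLab, 'UniformOutput', false);

% Find the distinct combined names and count the samples in each group.
[groupNames, ~, groupIdx] = unique(combined);
groupSizes = accumarray(groupIdx(:), 1);

% Print a small summary table: name, size, percentage.
for k = 1:numel(groupNames)
    fprintf('%-26s %4d (%.1f%%)\n', groupNames{k}, groupSizes(k), ...
            100 * groupSizes(k) / numel(combined));
end
```

Each combined name corresponds to one entry of the confusion matrix, so the group sizes are the confusion matrix counts. Note that unique sorts the names alphabetically, whereas the sdlab output above lists the groups in a different order.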
We may add this new set of labels to our test data set in the same way we would add a field to a Matlab structure:
>> ts.confmat=L
'medical D/ND' 2842 by 10 sddata, 2 classes: 'disease'(607) 'no-disease'(2235) 
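The analogy with a Matlab structure can be made concrete with a plain struct. This is a minimal sketch using a hypothetical toy container named ds. It is not an sddata object, and the values are invented for illustration.

```matlab
% Hypothetical toy container; field names mirror the post, values are made up.
ds = struct();
ds.data = [0.1 0.2; 0.8 0.9; 0.4 0.6];
ds.lab  = {'disease'; 'no-disease'; 'disease'};

% Adding a second set of labels is a plain field assignment.
ds.confmat = {'disease / disease'; 'no-disease / no-disease'; 'disease / no-disease'};

% List the fields the container now holds.
disp(fieldnames(ds));
```

As with the data set above, the new label set has one entry per sample and sits alongside the original labels without replacing them.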
The new set of labels shows up when we display details of the data set ts with the apostrophe operator:
>> ts'
'medical D/ND' 2842 by 10 sddata, 2 classes: 'disease'(607) 'no-disease'(2235) 
sample props: 'lab'->'class' 'class'(L) 'pixel'(N) 'patient'(L) 'tissue'(L) 'confmat'(L)
feature props: 'featlab'->'featname' 'featname'(L)
data props:  'data'(N) 'license'(S)
We may now simply open a scatter plot and inspect the misclassified examples using the confmat labels:
>> sdscatter(ts)
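sdscatter is an interactive perClass tool. As a rough static approximation in base MATLAB, the sketch below draws each confusion-matrix group with its own marker. The data, the toy thresholding "classifier", and all variable names are assumptions made for illustration only.

```matlab
% Hypothetical 2-D toy data: two overlapping Gaussian classes.
rng(0);  % fixed seed so the figure is reproducible
X = [randn(50, 2); randn(50, 2) + 2];
trueLab = [repmat({'disease'}, 50, 1); repmat({'no-disease'}, 50, 1)];

% Toy "classifier": threshold on the sum of both features (illustration only).
decLab = repmat({'disease'}, 100, 1);
decLab(sum(X, 2) > 2) = {'no-disease'};

% Combined 'true / decision' labels, one group per confusion-matrix entry.
combined = cellfun(@(t, d) [t ' / ' d], trueLab, decLab, 'UniformOutput', false);
[groupNames, ~, groupIdx] = unique(combined);

% Plot every group with a different marker (at most four groups can occur).
markers = 'o+xs';
figure; hold on;
for k = 1:numel(groupNames)
    inGroup = (groupIdx == k);
    plot(X(inGroup, 1), X(inGroup, 2), markers(k), 'DisplayName', groupNames{k});
end
hold off;
legend('show', 'Location', 'best');
xlabel('feature 1'); ylabel('feature 2');
```

The misclassified groups ('disease / no-disease' and 'no-disease / disease') should then stand out where the two classes overlap, which is the kind of region the interactive plot lets us explore.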

Using the new labels in a scatter plot is shown in this video:


