24.08.2011 pavel
How to visualize errors in a scatter plot?
Last time, we discussed how to identify samples that fall in a specific entry of a confusion matrix. Today, I will show you how to visualize them in the interactive scatter plot. This allows us to investigate the errors closely and quickly understand which feature space region is most affected.
In perClass, labels and decisions are represented by sdlab objects. A label object behaves as a vertical 1D vector. We will start with the test set and decisions from this example:
>> ts
'medical D/ND' 2842 by 10 sddata, 2 classes: 'disease'(607) 'no-disease'(2235)
>> ts.lab
sdlab with 2842 entries, 2 groups: 'disease'(607) 'no-disease'(2235)
>> dec
sdlab with 2842 entries, 2 groups: 'disease'(562) 'no-disease'(2280)

Because a label object behaves as a vertical vector, we may concatenate it horizontally with another set of labels. In this example, we will concatenate the true labels with the decisions, creating a new label set that describes the different entries of the confusion matrix. To better distinguish the true label from the decision in the new class names, we add a string separator in between:
>> L=[ts.lab ' / ' dec]
sdlab with 2842 entries, 4 groups: 'no-disease / no-disease'(1951) 'disease / disease'(278) 'no-disease / disease'(284) 'disease / no-disease'(329)
>> L'
 ind name                     size percentage
   1 no-disease / no-disease  1951 (69.0%)
   2 disease / disease         278 (10.0%)
   3 no-disease / disease      284 (10.0%)
   4 disease / no-disease      329 (12.0%)

We may add this new set of labels to our test set in the same way we would add a field to a Matlab structure:
>> ts.confmat=L
'medical D/ND' 2842 by 10 sddata, 2 classes: 'disease'(607) 'no-disease'(2235)

The new set of labels shows up when we display details about the data set ts with the apostrophe operator:
>> ts'
'medical D/ND' 2842 by 10 sddata, 2 classes: 'disease'(607) 'no-disease'(2235)
sample props: 'lab'->'class' 'class'(L) 'pixel'(N) 'patient'(L) 'tissue'(L) 'confmat'(L)
feature props: 'featlab'->'featname' 'featname'(L)
data props: 'data'(N) 'license'(S)

We may now simply open a scatter plot and inspect the misclassified examples using the confmat labels:
>> sdscatter(ts)
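
To see what the combined labels contain without the toolbox at hand, the same idea can be sketched in plain Matlab with cell arrays of strings. This is only an illustration under stated assumptions: the five toy samples and the variable names truth, decision, combined and missed are made up for this sketch, and it does not show how sdlab implements the concatenation internally.

```matlab
% Toy labels (hypothetical; not the medical data set used in this post)
truth    = {'disease'; 'disease'; 'no-disease'; 'no-disease'; 'no-disease'};
decision = {'disease'; 'no-disease'; 'no-disease'; 'disease'; 'no-disease'};

% Join true label and decision with a ' / ' separator.
% The separator is passed as a cell so that strcat keeps its spaces.
combined = strcat(truth, {' / '}, decision);

% List the distinct confusion-matrix entries and count the samples in each
[names, ~, idx] = unique(combined);
counts = accumarray(idx(:), 1);

% Print one line per entry: name and number of samples
for k = 1:numel(names)
    fprintf('%-26s %d\n', names{k}, counts(k));
end

% Indices of samples that are truly 'disease' but were decided 'no-disease'
missed = find(strcmp(combined, 'disease / no-disease'));
```

Each distinct string in combined corresponds to one entry of the confusion matrix, so picking out the samples of a single entry reduces to a string comparison, as in the last line.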
The use of the new labels in a scatter plot is shown in the video accompanying this post.
