- 6.1. Interactive scatter plot
- 6.1.1. Legend
- 6.1.2. Changing features
- 6.1.3. Sample inspector
- 6.1.4. Switching between different sets of labels
- 6.1.5. Visualizing subsets of samples
- 6.1.6. Bringing class to top, z-order of classes
- 6.1.7. Hand-painting class labels
- 6.1.8. Renaming classes
- 6.1.9. Visualizing live feature distributions in scatter plot
- 6.2. Interactive plot of per-class feature distributions
- 6.3. Working with image data
- 6.3.1. Visualizing images with sdimage
- 6.3.2. Creating image data set objects directly
- 6.3.3. Hand-painting class labels
- 6.3.4. Cropping images
- 6.3.5. Saving image data set to workspace
- 6.3.6. Working with image subsets
- 6.3.7. Creating image matrix from a data set
- 6.3.8. Storing multiple images in data sets
- 6.3.9. Connecting sdimage and sdscatter
- 6.3.10. Clustering image with k-means
- 6.3.11. Defining connected components
6.1. Interactive scatter plot
perClass provides an interactive scatter plot sdscatter. We can
launch it on any data set - here we create a data set with three features
computed from road sign images. We will compute mean, standard deviation
and median of each data set row (image reshaped to a vector):
>> a
381 by 1024 sddata, 17 classes: [31 28 24 33 19 21 57 26 21 9 13 15 14 1 14 29 26]
>> a2=setdata(a,[mean(+a,2) std(+a,0,2) median(+a,2)])
Warning: Feature names reset to 'Feature X' format.
> In sddata.setdata at 31
381 by 3 sddata, 17 classes: [31 28 24 33 19 21 57 26 21 9 13 15 14 1 14 29 26]
Note the warning message stating that feature labels of the new data set were set automatically.
>> getfeatlab(a2)
sdlab with 3 entries, 3 groups: 'Feature 1'(1) 'Feature 2'(1) 'Feature 3'(1)
We may set the feature labels to more descriptive names using setfeatlab:
>> a2=setfeatlab(a2,sdlab('mean','std','median'))
381 by 3 sddata, 17 classes: [31 28 24 33 19 21 57 26 21 9 13 15 14 1 14 29 26]
Alternatively, we may provide the feature labels directly in the setdata call:
>> a2=setdata(a,[mean(+a,2) std(+a,0,2) median(+a,2)],sdlab('mean','std','median'))
381 by 3 sddata, 17 classes: [31 28 24 33 19 21 57 26 21 9 13 15 14 1 14 29 26]
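The three features above are ordinary row-wise statistics. As a minimal sketch in plain MATLAB (no perClass calls), assuming a small made-up matrix X stands in for the raw data matrix that +a yields in the session above, the same expression produces one row of features per sample:

```matlab
% X is a hypothetical 2-by-4 data matrix: one row per sample (image),
% one column per pixel. It stands in for the raw matrix +a.
X = [1 2 3 10;
     4 4 4  4];

% Row-wise statistics: the dimension argument 2 makes each function
% work across the columns of every row.
feats = [mean(X, 2) std(X, 0, 2) median(X, 2)];

disp(feats)   % one row per sample; columns are mean, std, median
```

The 0 passed to std selects its default normalization by N-1, which is why the same call appears in the setdata examples.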
In order to visualize the scatter plot, we invoke the sdscatter command:
>> sdscatter(a2)
ans =
1
sdscatter opens a new figure and returns its handle:

The figure shows a scatter plot of the first two features in the data set. Each point represents one data sample (here a road sign). The colors and marker styles correspond to different classes.
By moving the mouse over the plot, we shift the focus to the closest data sample, which is highlighted by a black marker. The figure title provides details about the highlighted sample, such as its index in the data set and its class.
6.1.1. Legend
The legend may be switched on either by pressing the l key (as in
legend) or by using the Show legend command in the Scatter menu.

Note that pressing the legend toolbar button does not show correct class names in the legend; this is a known issue.
6.1.2. Changing features
We can change the features shown in sdscatter using the cursor keys. The
"Left" and "Right" arrows flip through the features on the horizontal axis,
and "Up" and "Down" through the features on the vertical axis.
To select a feature of interest directly, right-click on the axis legend. A pop-up menu will appear listing the available features.

If more than 25 features are present in the data set, a dialog will appear allowing us to select a feature by its index.
6.1.3. Sample inspector
The sample inspector shows a detailed view of the current sample. It is especially useful if the samples in the data set represent images (such as in our road sign example).
We can select the Show sample inspector command from the Scatter menu. A
dialog opens asking for the name of the data set which contains the image
data. We will type a2 and click on OK. A separate window opens showing
the road sign image of the currently highlighted sample:

You can use the sample inspector to identify outliers or to understand which objects fall in the area of overlap.
6.1.4. Switching between different sets of labels
It is often beneficial to use multiple sets of labels. For example, in a medical problem, we may be interested not only in the top-level class such as 'cancer'/'non-cancer' but also in the specific type of tissue or in the patient the sample originates from.
sdscatter may visualize any sample labeling available in the data
set. Any sdlab object stored as a sample property is available.
Let's use a medical data set from a cancer detection problem in this example. It contains information on pixels in scans of multiple patients. For each pixel, we know the high-level label ('cancer'/'non-cancer'), the more precise tissue type, and the patient:
>> load medical;
>> a'
'medical all' 225119 by 11 sddata, 2 classes: 'cancer'(56652) 'non-cancer'(168467)
sample props: 'lab'->'class' 'class'(L) 'pixel'(N) 'patient'(L) 'tissue'(L)
feature props: 'featlab'->'featname' 'featname'(L)
data props: 'data'(N)
>> sdscatter(a)

We may switch between different labels via the Use property command in the Scatter menu.

Switching to patient labeling:

We may switch quickly to a specific property using the 1-9 shortcut keys. In our example, the tissue property is accessible by pressing '3':

6.1.5. Visualizing subsets of samples
sdscatter allows us to show only a subset of samples defined by label
values. This feature is accessible via the Sample filter command in the
Scatter menu.
For example, we may be interested only in non-cancer tissues. We can select only the non-cancer samples in *Scatter/Sample filter/class*.

We may combine multiple filters. For example, we might be interested only in the non-cancer samples of patient 'Dick':

Note that sdscatter preserves the axes limits of the total data set also
for the sample subsets. This gives us important clues about the position of the
subset within the total data distribution. If we are interested in a
detailed view of the subset, we may enter the automatic mode by pressing the
'a' key. The limits will then be set according to the subset. Pressing 'a'
again returns us to the full data set limits.
When visualizing sample subsets, we may freely move between different sets of labels. For example, by pressing '3' we switch to the 'tissue' property, which shows us the specific non-cancer tissues of Dick:

To quickly return to the previous filter, use the 'f' key or the *Sample filter/Apply previous filter* command. This allows us to understand differences between distributions.
The visible subset of samples may be stored in a new data set in the Matlab workspace using the Create data set with visible samples menu command.
6.1.6. Bringing class to top, z-order of classes
Overlapping classes may easily obscure scatter plots of large
data sets. sdscatter provides the Class to top command in the Scatter menu,
which allows us to bring a desired class to the top. In this way, we can better
understand what happens in the area of overlap.
We will demonstrate this function on an artificially generated three-class
data set created by the gendatf function:
>> a=sddata(gendatf(10000))
'Fruit set' 10000 by 2 sddata, 3 classes: 'apple'(3333) 'banana'(3333) 'stone'(3334)
>> sdscatter(a)

The stone class obscures the banana distribution. By selecting Class to top and banana, we change the order in which the classes are plotted, so that banana appears on top.

sdscatter also offers two keystrokes for flipping through the
plotting order (z-order) of classes: the + and - keys (to make things
simpler, = works as +, so there is no need to hold SHIFT).
6.1.7. Hand-painting class labels
sdscatter allows us to define class labels directly by painting. In this
way, we can interactively label interesting groups of samples such as
outliers, areas of overlap or class modes.
Painting is accessible both from the Scatter menu and from the context-sensitive menu.

We need to specify which class to paint. It can be either one of the existing classes or a new class. In our example, we are interested in the area of overlap and will, therefore, create a new class called overlap.
In painting mode, a square is added to the scatter plot axes. While holding the left mouse button, we assign the samples inside the square to the selected class.
Note that while painting, you can freely switch between features to find the best views for your problem. You can also hide some of the classes using the Class visibility command. Painting assigns labels only to visible data samples.

When finished, choose Stop painting from the context menu or from the Scatter menu.
6.1.8. Renaming classes
sdscatter provides a simple way to rename classes. This facility is
helpful to re-arrange the data set or to assign meaning to labels generated
by cluster analysis.
The function is accessible through the Rename class command in the context menu or in the Scatter menu.

We can, for example, rename the apple and banana classes to fruit. Using the Create data set in workspace command from the Scatter menu, we can save this data set into the Matlab workspace. The resulting data set will have only two classes, namely stone and fruit.
>> b % Created sddata b with all label sets.
Fruit set, 10000 by 2 sddata, 2 classes: 'stone'(3334) 'fruit'(6666)
>> b.lab.list
sdlist (2 entries)
ind name
1 stone
2 fruit
Note that interactive renaming of classes makes sense when used with
interactively defined classes. For existing classes in the data set, it is
simpler to use the sdrelab function.
6.1.9. Visualizing live feature distributions in scatter plot
When visualizing large data sets, the scatter plot alone is often not
sufficient to judge the class overlap. To visualize the overlap,
sdscatter can add a feature distribution plot for each of the
axes.
Select Show feature distributions in the Scatter menu or press 'd'. The scatter figure will be extended with an additional distribution plot for the horizontal and the vertical axis:
>> a
'medical D/ND' 6400 by 11 sddata, 3 classes: 'disease'(1495) 'no-disease'(4267) 'noise'(638)
>> sdscatter(a)

The distribution sub-plots show histograms for each of the available classes. Because the axes limits are aligned, we can better understand where the true area of high class density is located. When you focus on a subset of classes, switch between sets of labels, or paint labels, the plots are updated accordingly.
To remove the class histograms, select Hide feature distributions from the Scatter menu or press the 'd' key again.
6.2. Interactive plot of per-class feature distributions
Visualizing the per-class distribution of each feature gives an indication of the class overlap. The sdfeatplot command provides this plot.
To visualize the distribution of a different feature, use the up/down cursor keys.

The image shows the distribution for the two classes present in the data. By default, the labels are used, but the 'lab' option allows us to visualize the distribution of other properties present in the data set. The distribution is obtained by computing a histogram. The default number of bins is 30, but it can be customized using the 'bin' option. The distribution may also be visualized using the unique values of the data itself as bins; this is achieved by pressing the 'u' key. The resulting distribution looks more "noisy"; see the right plot in the figure below.
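The difference between the two binning modes can be sketched in plain MATLAB. This only illustrates the idea, under the assumption that the default mode uses equally wide bins over the data range; it is not the sdfeatplot implementation, and the values in x are made up:

```matlab
x = [0 0 0 1 1 5 5 5 5 10];   % toy feature values (illustrative only)
nbins = 5;                    % the text's default is 30; 5 keeps this readable

% Mode 1: equally wide bins spread linearly over the data range.
edges = linspace(min(x), max(x), nbins + 1);
width = edges(2) - edges(1);
idx = min(floor((x - edges(1)) / width) + 1, nbins);  % max value goes to last bin
counts = accumarray(idx(:), 1, [nbins 1])';

% Mode 2: one bin per unique value of the data.
[u, ~, uidx] = unique(x);
ucounts = accumarray(uidx(:), 1)';

disp(counts)          % counts per linear bin
disp([u; ucounts])    % unique values and their counts
```

With sparse data the linear bins contain empty intervals, while the unique-value view shows only values that actually occur.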

The style of the distribution may be customized using the 'style' option.
>> sdfeatplot(out2,'style',{'g-','m--'})

sdfeatplot provides several options to enhance the visualization if the features of interest are themselves obtained by computing histograms. For example, pressing the 's' key switches to a stem plot, highlighting individual histogram bins. The bins may be computed automatically, spread linearly over the data range. Alternatively, the unique values may be visualized using the 'u' key. This is especially useful if the feature histogram is very sparse. In this case, direct inspection of the bin values gives a better understanding (right plot in the figure below) than the distribution plot (plot on the left).

Using the 'x' key, the x-axis for the bins may be specified. This is especially useful if the data has a logarithmic scale.
6.3. Working with image data
perClass provides a set of tools for working with image data. It allows us
to quickly build classifiers based on local image information. The central
component of this framework is the sdimage command. It provides both a
powerful interactive visualization tool and a way to construct
data sets from image data.
6.3.1. Visualizing images with sdimage
Let us consider an RGB image of a traffic scene, 'roadsign09.bmp', loaded
with the Matlab imread command:
>> im = imread('roadsign09.bmp');
>> figure; imagesc(im)

Using the sdimage command on the matrix im opens an interactive viewer:
>> sdimage(im);

The blue layer on top of the image represents the set of labels of the
image data set internally used by sdimage. As in any other sddata
set, each sample (pixel) has a label, which is set to "unknown" by default.
We may toggle this label layer using the space bar. Additionally, we
may adjust the label transparency from very transparent to opaque in the
Image menu.
The three image channels are visualized as three separate image bands. We
can move between the bands with the 'up' and 'down' cursor keys. Each pixel
is a data sample; the figure title shows the pixel's value and class label
('unknown' by default).
sdimage loads the image itself if provided with a filename string:
>> sdimage('roadsign09.bmp')
6.3.2. Creating image data set objects directly
The sdimage command allows us to create an image data set directly at the Matlab
prompt using the 'sddata' option:
>> a=sdimage(im,'sddata')
412160 by 3 sddata, class: 'unknown'
sdimage also accepts an image filename, attempting to load the file
using imread:
>> b=sdimage('roadsign11.bmp','sddata')
412160 by 3 sddata, class: 'unknown'
The objects a and b are standard sddata sets with one sample
for each pixel and three features corresponding to the R, G, and B bands,
respectively. Note that the pixel values were also converted into double
precision.
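The sample count of 412160 is consistent with a 560 by 736 image (560 x 736 = 412160). As a plain-MATLAB sketch of the pixels-as-samples layout, assuming pixels are ordered by MATLAB linear index (column-major) and using a synthetic image in place of the road sign file, the conversion can be pictured as a reshape followed by a cast to double:

```matlab
% Synthetic stand-in for an RGB image of the size used in this section.
im = zeros(560, 736, 3, 'uint8');
im(2, 1, :) = reshape([10 20 30], 1, 1, 3);   % mark the pixel at row 2, column 1

% One row per pixel, one column per band, values in double precision.
X = double(reshape(im, [], size(im, 3)));

disp(size(X))     % number of pixels by number of bands
disp(X(2, :))     % the marked pixel sits at linear index 2
```

This mirrors the data part of the sddata object only; labels and the image size are stored by sdimage as additional properties.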
6.3.3. Hand-painting class labels
The interactive sdimage figure allows us to paint class labels for image
regions. To enter the 'paint' mode, open the Paint submenu in the
Image menu and select the Create new class command:

A dialog window will ask for the name of the class. Let's say we are interested in labeling the road: we provide the name of the class and paint in the image region. Via the Image menu, or by clicking the right mouse button, we can change the brush size or exit the paint mode.

6.3.4. Cropping images
Often, we only want to work with a smaller area of a large image.
sdimage offers a crop function which makes this very quick.
Select the Crop image item in the Image menu.

A cross-hair will appear. Choose two corners of the region you wish to crop. The process may be terminated by clicking the right mouse button.

A new sdimage figure is opened containing the data from the specified
region. The cropped data contains all labels and properties of the complete
image. The image size of the new data set is set to the specified
region. Suppose we save the cropped image into data set c:
>> Creating data set c in the workspace.
12840 by 3 sddata, class: 'unknown'
>> getiminfo(a)
ans =
imsize: [560 736]
>> getiminfo(c)
ans =
imsize: [120 108]
6.3.5. Saving image data set to workspace
The Create data set in workspace command in the Image menu lets us store
the image data together with the painted labels in a new sddata
object in the Matlab workspace. We are asked to provide the variable name for
this new data set.
When storing our image data set with the labeled road region in the data2
variable, we will see the following message in the Matlab command window:
>> Creating data set data2 in the workspace.
412160 by 3 sddata, 2 classes: 'unknown'(403001) 'road'(9159)
6.3.6. Working with image subsets
Image data sets preserve the image information. We may, for example, use
only a subset of the data, e.g. the pixels labeled as 'road' in the data2
object above.
>> sub=data2(:,:,'road')
9159 by 3 sddata, class: 'road'
The image subset may still be visualized as an image:
>> sdimage(sub)

6.3.7. Creating image matrix from a data set
The data set representation of image data is useful for training pattern
recognition algorithms. However, we often need to apply imaging
operations, such as filtering, to our image regions. sdimage allows us to
create an image matrix with pixel values using the 'matrix' option:
>> sub
9159 by 3 sddata, class: 'road'
>> I=sdimage(sub,'matrix');
>> size(I)
ans =
560 736 3
Matrix I is created with the size of the original image from which the sub data
was extracted. The matrix is filled with zeros, and only the pixels
available in the sub data set are inserted into it.
Note that the matrix I uses double precision:
>> class(I)
ans =
double
We may now perform any image processing operation, such as filtering, and bring
the resulting data back into a data set format. This can be done using the
linear indices stored in the 'pixel' property of the sub data set:
>> sub2=setdata(sub, I(sub.pixel))
9159 by 1 sddata, class: 'road'
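The round trip through linear indices can be sketched in plain MATLAB without perClass. The image size, the index vector pix, and the 3x3 mean filter below are illustrative assumptions; pix plays the role of the 'pixel' property:

```matlab
imsize = [4 5];
band = reshape(1:20, imsize);       % toy single-band image, value = linear index
pix = [2 7 8 13]';                  % hypothetical linear indices of the subset pixels

% Zero-filled matrix of the original size with only the subset pixels inserted.
I = zeros(imsize);
I(pix) = band(pix);

% Any image operation can be applied here; a 3x3 mean filter is one example.
F = conv2(I, ones(3) / 9, 'same');

% Read the filtered values back in the original sample order.
vals = F(pix);
disp(vals')
```

Indexing with a single vector of linear indices returns one value per sample. This agrees with the 9159 by 1 result above, where linear indices into the three-band matrix address its first band.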
6.3.8. Storing multiple images in data sets
Image data sets created from multiple images may be joined. This feature allows us to create larger training sets with pixel-level data from multiple images and train robust classifiers.
Each image data set created using sdimage contains an 'image' property
(labels). If the image is loaded by providing the filename, the filename is
used as its image label.
>> im1=sdimage('roadsign09.bmp','sddata')
412160 by 3 sddata, class: 'unknown'
>> im2=sdimage('roadsign11.bmp','sddata')
412160 by 3 sddata, class: 'unknown'
>> a=[im1; im2]
824320 by 3 sddata, class: 'unknown'
>> a'
824320 by 3 sddata, class: 'unknown'
sample props: 'lab'->'class' 'class'(L) 'pixel'(N) 'image'(L)
feature props: 'featlab'->'featname' 'featname'(L)
data props: 'data'(N)
>> a.image
sdlab with 824320 entries, 2 groups: 'roadsign09.bmp'(412160) 'roadsign11.bmp'(412160)
If we create an image data set from a matrix, sdimage creates a random image label
to avoid name clashes with other images.
>> im1=sdimage(imread('roadsign09.bmp'),'sddata')
412160 by 3 sddata, class: 'unknown'
>> im2=sdimage(imread('roadsign11.bmp'),'sddata')
412160 by 3 sddata, class: 'unknown'
>> a=[im1; im2]
824320 by 3 sddata, class: 'unknown'
>> a.image
sdlab with 824320 entries, 2 groups: 'image9552'(412160) 'image6571'(412160)
Note that the image name is generated randomly and no check for identical names is performed when concatenating image data sets. It is the responsibility of the user to make sure that different images in one data set are labeled differently.
6.3.9. Connecting sdimage and sdscatter
It is often useful to inspect the connection between image neighborhoods and
the scatter plot. To visualize this connection, the sdscatter and
sdimage commands can be used together.
We may simply show a data set with image data using sdscatter and then
connect the sdimage plot to the scatter figure using the returned figure
handle.
>> data2 % Created data set data2 in the workspace.
412160 by 3 sddata, 2 classes: 'unknown'(399848) 'road'(12312)
>> h=sdscatter(data2)
h =
2
>> sdimage(data2,h);

By moving the mouse pointer over the image, we may see where the image pixel appears in the feature space. Similarly, moving over the scatter plot shows us the corresponding pixel.
When we paint in the scatter plot, the linked image plot also updates. This helps us to analyze the position of specific feature space clusters in the image domain:

6.3.10. Clustering image with k-means
One way to quickly group image data is to perform clustering. Using the Cluster with k-means command in the Image menu, the data set underlying our image is clustered with the k-means algorithm, considering individual pixels as separate data samples and image bands as features.
We are prompted for the desired number of clusters.

We will obtain a new set of image labels called 'cluster' containing classes called 'C1','C2' etc.

A typical next step is to interpret the clusters. This may be done by assigning meaningful names using the Rename class command.
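For readers unfamiliar with k-means, the idea of treating pixels as samples and bands as features can be sketched with the textbook iteration in plain MATLAB. This is not the perClass implementation; the toy data X, the hand-picked starting centers, and the variable names are assumptions made for illustration:

```matlab
% Toy "pixels": 6 samples with 3 band values each, forming two obvious groups.
X = [ 10  12  11;
      11  10  12;
      12  11  10;
     200 198 201;
     199 201 200;
     201 200 199];
k = 2;
centers = X([1 4], :);   % deterministic start, chosen by hand for this example

for iter = 1:20
    % Squared Euclidean distance of every sample to every center.
    d = zeros(size(X, 1), k);
    for j = 1:k
        delta = X - repmat(centers(j, :), size(X, 1), 1);
        d(:, j) = sum(delta .^ 2, 2);
    end
    [~, assign] = min(d, [], 2);          % nearest center per sample

    % Move each center to the mean of the samples assigned to it.
    newc = centers;
    for j = 1:k
        if any(assign == j)
            newc(j, :) = mean(X(assign == j, :), 1);
        end
    end
    if isequal(newc, centers), break; end  % stop when nothing changes
    centers = newc;
end

% Cluster names in the 'C1', 'C2' style mentioned above.
names = arrayfun(@(j) sprintf('C%d', j), assign, 'UniformOutput', false);
disp(assign')
```

Cluster indices carry no meaning by themselves, which is why renaming the resulting classes is the natural follow-up.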
6.3.11. Defining connected components
sdimage allows us to define spatially-connected components. This allows
us to quickly access individual objects or regions in an image data set.
The Connected components menu is available only if the current set of
labels contains two or more classes.

The Connected component command processes the current set of labels. For each class, the connected components are found separately.

Small isolated components are joined together into a special class (called 'small objects'). This helps us to quickly remove the noise. By default, objects smaller than 10 pixels are removed. This can be changed by the first item in the Connected components menu. In order to separate all isolated objects, use the value of 1.
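The per-class search for connected components and the grouping of small ones can be sketched in plain MATLAB with a simple flood fill. This is only an illustration under stated assumptions: 4-connectivity, a toy label image L, and a size threshold of 4 instead of the default of 10. How sdimage computes components internally is not described here.

```matlab
% Toy label image with two classes (1 and 2); values are illustrative.
L = [1 1 2 2;
     1 2 2 1;
     2 2 1 1];
[nr, nc] = size(L);
obj = zeros(nr, nc);     % object index per pixel, 0 means not visited yet
nobj = 0;

for start = 1:numel(L)
    if obj(start) > 0, continue; end
    nobj = nobj + 1;                 % a new component starts at this pixel
    obj(start) = nobj;
    stack = start;
    while ~isempty(stack)
        p = stack(end); stack(end) = [];
        [r, c] = ind2sub([nr nc], p);
        nb = [r-1 c; r+1 c; r c-1; r c+1];   % 4-connected neighbors
        for n = 1:4
            rr = nb(n, 1); cc = nb(n, 2);
            inside = rr >= 1 && rr <= nr && cc >= 1 && cc <= nc;
            % Grow only into unvisited pixels of the same class.
            if inside && obj(rr, cc) == 0 && L(rr, cc) == L(p)
                obj(rr, cc) = nobj;
                stack(end+1) = sub2ind([nr nc], rr, cc);
            end
        end
    end
end

sizes = accumarray(obj(:), 1)';   % pixel count per component
minsize = 4;                      % toy threshold for this example
small = sizes < minsize;          % components to group as 'small objects'
disp(sizes)
disp(small)
```

Because growth is restricted to pixels of the same class, touching regions of different classes always end up in different components.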
When we save the data set back to the Matlab workspace (by pressing the 's' key),
we can see the 'object' labels. Note that, because we saved the image data
while the 'object' label set was selected, the resulting data set keeps it as the
current label set. Therefore, we may address it as data2.lab:
>> Creating data set data2 in the workspace.
12210 by 3 sddata, 11 'object' groups: [242 160 66 2643 114 6481 72 4 2346 62 20]
>> data2.lab'
ind name size percentage
1 C1-object1 242 ( 2.0%)
2 C1-object2 160 ( 1.3%)
3 C1-object3 66 ( 0.5%)
4 C1-object4 2643 (21.6%)
5 C1-small objects 114 ( 0.9%)
6 C2-object1 6481 (53.1%)
7 C2-object2 72 ( 0.6%)
8 C2-small objects 4 ( 0.0%)
9 C3-object1 2346 (19.2%)
10 C3-object2 62 ( 0.5%)
11 C3-small objects 20 ( 0.2%)
We can remove the small objects quickly with a regular expression. We simply select all classes that do not contain the 'small' substring:
>> data2(:,:,'~/small').lab'
ind name size percentage
1 C1-object1 242 ( 2.0%)
2 C1-object2 160 ( 1.3%)
3 C1-object3 66 ( 0.5%)
4 C1-object4 2643 (21.9%)
5 C2-object1 6481 (53.7%)
6 C2-object2 72 ( 0.6%)
7 C3-object1 2346 (19.4%)
8 C3-object2 62 ( 0.5%)
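The effect of the filter, including the recomputed percentages, can be checked with plain MATLAB string functions. The names and sizes below are copied from the first listing; the built-in regexp function stands in for the perClass '~/small' selection syntax, which is not used in this sketch:

```matlab
names = {'C1-object1', 'C1-object2', 'C1-object3', 'C1-object4', ...
         'C1-small objects', 'C2-object1', 'C2-object2', ...
         'C2-small objects', 'C3-object1', 'C3-object2', 'C3-small objects'};
sizes = [242 160 66 2643 114 6481 72 4 2346 62 20];

% Keep every class whose name does not match the pattern 'small'.
keep = cellfun(@isempty, regexp(names, 'small', 'once'));
kept_names = names(keep);
kept_sizes = sizes(keep);

% Percentages are relative to the remaining samples only.
pct = 100 * kept_sizes / sum(kept_sizes);

for i = 1:numel(kept_names)
    fprintf('%2d %-12s %5d (%4.1f%%)\n', i, kept_names{i}, kept_sizes(i), pct(i));
end
```

The percentages grow slightly after filtering because the 138 removed samples no longer count towards the total.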
