perClass Documentation
development version 3.2 (14-Mar-2012)

Chapter 7: Feature extraction for images

This chapter describes extraction of local image features.


7.1. Introduction

perClass provides a simple-to-use framework for working with local image information. This significantly reduces the time needed to build classifiers for problems such as medical diagnostics or machine inspection. Typically, we need to detect a concept such as "disease" in patients' scans based on local textural and appearance patterns. We start from images with disease regions annotated by an expert. First, we must extract features in local image neighborhoods (blocks). Second, we need to collect such labeled information from multiple images, because only then can our classifiers reach the necessary generalization capability. The perClass sdextract command simplifies the extraction of local image features.

It takes image data as input and returns a data set with feature vectors computed from image blocks on a regular grid.

Let us take a gray-level microscopic image of a detergent particle:

>> im
16384 by 1 sddata, class: 'unknown'
>> sdimage(im)

By default, sdextract computes mean and variance features in 8x8 pixel blocks with one pixel step:

>> a=sdextract(im)
14641 by 2 sddata, class: 'unknown'

We may visualize any data set with image data with sdimage:

>> sdimage(a)

Pressing the space bar removes the blue label layer. Using the up and down cursor keys, we may flip between the two features (bands in image a).

Note that the data set a contains fewer samples than the original image im. This is because each sample in a represents an image block centered around it. Features therefore cannot be computed for border pixels, where part of the block would fall outside the image.
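The effect of the border can be illustrated without perClass. The following is a minimal sketch in plain Matlab, assuming a random 128x128 matrix I as a stand-in for the image and a simple box filter for the block statistics. It is not the sdextract implementation; it only shows why an 8x8 block with one pixel step yields 121x121 = 14641 block positions.

```matlab
% Plain-Matlab sketch of local mean/variance on a dense grid (not perClass code).
I = rand(128);                        % stand-in for a 128x128 gray-level image
block = 8;                            % block size, matching the sdextract default
k = ones(block) / block^2;            % averaging kernel over one block

% 'valid' keeps only positions where the whole block lies inside the image.
m = conv2(I, k, 'valid');             % local mean of each block
v = conv2(I.^2, k, 'valid') - m.^2;   % local (population) variance, E[x^2]-E[x]^2

size(m)                               % 121 x 121 block positions
numel(m)                              % 14641 samples, one per block
```

Taking sqrt(max(v,0)) would give the local standard deviation instead of the variance.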

7.2. Working with image grid

7.2.1. Computing features on image grid

sdextract computes local image features on a grid defined by two parameters: block and step. While block specifies the size of the local neighborhood, step denotes the spacing between two blocks. We may change these parameters using block and step options:

>> b=sdextract(im,'block',4,'step',2)
3969 by 2 sddata, class: 'unknown'

The block size controls the level of local detail captured by the features. Setting a larger step is useful to reduce the number of samples extracted from high-resolution imagery.
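The number of extracted samples can be estimated from the image size, block and step. The formula below is an assumption inferred from the sample counts shown in this chapter (14641 for the defaults, 3969 for block 4 and step 2), not a documented perClass rule:

```matlab
% Assumed grid arithmetic, inferred from the outputs shown in this chapter.
imsize = [128 128];                   % size of the original image
block  = 4;                           % block size in pixels
step   = 2;                           % spacing between neighboring blocks

% Number of block positions that fit entirely inside the image, per dimension.
gridsize = floor((imsize - block) / step) + 1   % [63 63]
nsamples = prod(gridsize)                       % 3969
```

With block 8 and step 1, the same arithmetic gives 121 positions per dimension.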

Any image stored in a data set carries information on the original image size and on the grid settings used to compute the features. This information is available via the getiminfo command. Let us display the image info for the original image im and for the data set b extracted with sdextract:

>> getiminfo(im)

ans = 

imsize: [128 128]

>> getiminfo(b)

ans = 

  imsize: [128 128]
    grid: 1
   block: 4
    step: 2
gridsize: [63 63]

A specific field may be returned directly from the image info structure by passing its name to getiminfo:

>> s=getiminfo(im,'imsize')

s =

   128   128

7.2.2. Visualization of feature images computed on a grid

Data sets with feature vectors computed on the grid may still be visualized using the sdimage command.

>> sdimage(b)

Note the gaps between data points. For visualization, we may wish to remove the grid pattern. This may be done using the 'grid' option:

>> b2=sdimage(b,'grid')
3969 by 2 sddata, class: 'unknown'
>> getiminfo(b2)

ans = 

imsize: [63 63]

>> sdimage(b2)

7.3. Types of local image features

sdextract provides several types of local image features. The feature type is selected using the 'feat' option.

7.3.1. Local mean and standard deviation

This set of two features is the default setting. It may also be selected explicitly using 'feat','moments'.

7.3.2. Local histograms

A local histogram with hist_bins bins is estimated in each image block. The histogram is normalized to sum to one. By default, 8 bins are spread between the minimum and maximum value of the input data set. The data range may be adjusted using the data_range option.

>> c=sdextract(im,'feat','hist')
14641 by 8 sddata, class: 'unknown'

>> c=sdextract(im,'feat','hist','hist_bins',16)
14641 by 16 sddata, class: 'unknown'

>> [min(+im) max(+im)]

ans =

 0   197

>> c=sdextract(im,'feat','hist','data_range',[0 255])
14641 by 8 sddata, class: 'unknown'
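To make the roles of hist_bins and data_range concrete, here is a minimal plain-Matlab sketch of a normalized histogram for a single block. The binning rule (equal-width bins over the data range, with the upper edge assigned to the last bin) and the variable names are assumptions for illustration; perClass may bin differently.

```matlab
% Sketch of a normalized local histogram for one block (assumed binning rule).
I = round(197 * rand(128));           % stand-in image with values in 0..197
blk = I(1:8, 1:8);                    % one 8x8 block
nbins  = 8;                           % plays the role of hist_bins
drange = [0 255];                     % plays the role of data_range

% Map each pixel to a bin index 1..nbins; out-of-range values are clipped.
idx = floor((blk(:) - drange(1)) / (drange(2) - drange(1)) * nbins) + 1;
idx = min(max(idx, 1), nbins);

% Count pixels per bin and normalize so that the histogram sums to one.
h = accumarray(idx, 1, [nbins 1])' / numel(blk);
```

Each block then contributes one feature vector with nbins entries, which matches the 8 and 16 feature columns in the outputs above.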

7.3.3. Features computed from local histograms

The 'histfeat' feature set provides five features computed from the local histogram, namely histogram mean, 2nd moment, skewness, kurtosis and entropy.

Similarly to the 'hist' feature set, we may specify the hist_bins and data_range options.
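The five histogram features can be sketched in plain Matlab as follows. The exact definitions used by perClass are not given here, so the choices below are assumptions: bin indices serve as bin centers, the 2nd moment is taken as the central moment (variance), skewness and kurtosis are the standardized 3rd and 4th moments, and entropy is measured in bits.

```matlab
% Sketch of five features computed from a normalized histogram (assumed definitions).
h = [1 2 4 8 8 4 2 1] / 30;           % example normalized histogram (sums to one)
x = 1:numel(h);                       % bin centers, here simply the bin indices

mu   = sum(x .* h);                   % histogram mean
m2   = sum((x - mu).^2 .* h);         % 2nd central moment (variance)
skew = sum((x - mu).^3 .* h) / m2^1.5;   % skewness
kurt = sum((x - mu).^4 .* h) / m2^2;     % kurtosis
nz   = h > 0;                         % skip empty bins, treating 0*log2(0) as 0
ent  = -sum(h(nz) .* log2(h(nz)));    % entropy in bits

feat = [mu m2 skew kurt ent]          % five features for this block
```

For this symmetric example histogram, the mean lies in the middle of the bin range and the skewness is zero up to rounding.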

7.3.4. Co-occurrence matrices

A co-occurrence matrix is a two-dimensional histogram estimating the probability that a pixel has a specific gray level while a displaced pixel exhibits another gray level. The co-occurrence matrix encodes structural information, which is useful for deriving informative data representations in texture classification problems.

This feature extractor has three parameters.

The first is the number of gray-level bins, 'cm_bins'. The output data set contains the square of 'cm_bins' features. The default value is 8, leading to 64 features. It is often useful to reduce the number of bins so that the co-occurrence matrices are better filled with values.

The second parameter is the displacement distance 'cmdispl', denoting the number of pixels between the pixel pairs used to fill the 2D histogram. By default, 'cmdispl' is 1. The displacement cannot exceed the block size. Increasing the displacement reduces the number of pixel pairs and thus the amount of information in the co-occurrence matrix.

Finally, because the co-occurrence matrix is a histogram, the data_range option sets the range of values it covers. As for the histogram features above, the default is the minimum and maximum of the image data.

>> c=sdextract(im,'feat','cm')
14641 by 64 sddata, class: 'unknown'


>> c=sdextract(im,'feat','cm','cm_bins',4,'block',16)
12769 by 16 sddata, class: 'unknown'

>> +c(1)

ans =

  Columns 1 through 7

0.5417    0.1708    0.0063         0    0.1708    0.0833    0.0104

  Columns 8 through 14

     0    0.0063    0.0104         0         0         0         0

  Columns 15 through 16

     0         0

We may reshape the values to see the 2D co-occurrence histogram:

>> reshape(+c(1),[4 4])

ans =

0.5417    0.1708    0.0063         0
0.1708    0.0833    0.0104         0
0.0063    0.0104         0         0
     0         0         0         0
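For illustration, a co-occurrence matrix for a single small block can be built in plain Matlab as sketched below. The displacement direction (horizontal) and the symmetric counting of each pixel pair are assumptions; the symmetric matrix printed above is consistent with symmetric counting, but the direction used by perClass is not stated here.

```matlab
% Sketch of a co-occurrence matrix for one block (assumed: horizontal
% displacement, symmetric counting; perClass may differ).
B = [0 0 1 1;
     0 0 1 1;
     0 2 2 2;
     2 2 3 3];                        % small block, already quantized to 4 levels
nbins = 4;                            % number of gray-level bins
displ = 1;                            % displacement in pixels

q = B + 1;                            % bin indices 1..nbins
p1 = q(:, 1:end-displ);               % reference pixels
p2 = q(:, 1+displ:end);               % pixels displaced to the right

C = accumarray([p1(:) p2(:)], 1, [nbins nbins]);  % count pairs (p1,p2)
C = C + C';                           % count each pair in both directions
P = C / sum(C(:));                    % normalize so that entries sum to one

feat = P(:)';                         % 1 x nbins^2 feature vector for the block
```

A larger displacement leaves fewer columns in p1 and p2, so fewer pixel pairs contribute to the matrix.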

7.4. Propagating image labels

When applied to an sddata data set with image data, sdextract propagates image labels and all sample properties to the output data.

We may, for example, define labels by hand, painting directly in the sdimage figure. In this example, we painted background, particle and "interesting texture" regions. We save the data set in the Matlab workspace using the Image menu:

>> sdimage(im)

>> Creating data set data2 in the workspace.
16384 by 1 sddata, 3 classes: 'background'(7629) 'particle'(7760) 'interesting texture'(995) 

We may now compute the features on data set data2. The class labels get propagated to the pixels that serve as block centers in feature computation:

>> a=sdextract(data2,'block',4,'feat','histfeat')
15625 by 5 sddata, 3 classes: 'background'(7517) 'particle'(7113) 'interesting texture'(995) 

>> sdimage(a)

7.5. Computing features for image regions

sdextract supports computation of features in user-defined image regions. This is typically used in situations where the image covers a larger area than the object of interest. Using an external classifier or image processing techniques, we may often easily distinguish foreground from background, for example, separate the object from the conveyor belt. We may then compute the features only on the object and not waste time on the background area.

7.5.1. Defining a mask matrix

One way to limit the feature extraction is to provide a mask matrix with the same size as the original image. Let us create a mask matrix and fill the region of interest with ones:

>> m=zeros(getiminfo(im,'imsize'));
>> m(20:100,30:100)=1;

>> d=sdextract(im,'mask',m)
4736 by 2 sddata, class: 'unknown'

>> sdimage(d)

The masking operation has a user-defined parameter 'mask_frac' specifying the minimum fraction of a block that must be covered by the mask for the block to be included in the sdextract output. By default, 'mask_frac' is 1, which means that only blocks fully inside the mask region are included.
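The acceptance rule can be sketched in plain Matlab by counting masked pixels in every block. This is an illustrative assumption about how such a test may work, not the sdextract implementation. With the default 8x8 block, one pixel step and mask_frac equal to 1, it reproduces the 4736 samples of the example above.

```matlab
% Sketch of block acceptance by mask fraction (not perClass code).
m = zeros(128);                       % mask with the same size as the image
m(20:100, 30:100) = 1;                % region of interest
block = 8;                            % block size, matching the sdextract default
mask_frac = 1;                        % minimum masked fraction per block

% Number of masked pixels in each block that lies fully inside the image.
cnt  = conv2(m, ones(block), 'valid');
keep = cnt >= mask_frac * block^2;    % blocks accepted for feature extraction

nnz(keep)                             % 4736 accepted blocks
```

Lowering mask_frac in this sketch also accepts blocks that only partly overlap the mask, so the number of accepted blocks grows.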

7.5.2. Computing features on a data set subset

Because any subset of a data set is itself an image, sdextract may be applied to it. This includes, for example, class-specific regions or regions defined by classifier decisions.

What happens on the region boundaries? perClass takes a simple approach: Features are only computed in local blocks that do not contain holes. This is important because if blocks contain holes, the extracted features may be entirely uninformative and introduce unnecessary noise in the classifier training.

Let us take an example where we compute features on the "interesting texture" regions in data set data2:

>> data2
16384 by 1 sddata, 3 classes: 'background'(7629) 'particle'(7760) 'interesting texture'(995) 

Let us first extract features from the entire image and then visualize the feature image for the "interesting texture" class only. Note that we use a regular expression /int to retrieve this class by substring:

>> out1=sdextract(data2)
14641 by 2 sddata, 3 classes: 'background'(7330) 'particle'(6316) 'interesting texture'(995) 
>> sdimage(out1(:,:,'/int'))

Alternatively, we may first extract the class and then run sdextract on this image subset:

>> input=data2(:,:,'/int')
995 by 1 sddata, class: 'interesting texture'

>> out2=sdextract(input)
372 by 2 sddata, class: 'interesting texture'
>> sdimage(out2)

Note that in the second case, we get fewer output samples because only 372 image blocks are entirely contained in the class region.