perClass Documentation
version 5.1 (31-May-2017)


Chapter 8.4: Representing bands in spectral measurements

8.4.1. Introduction

This section describes feature extraction from spectra. A single spectrum is a 1D signal measuring, for example, the reflectivity of an object at different wavelengths. From a pattern recognition point of view, spectral data contain rich information useful for building material classifiers.

Spectra typically contain tens or hundreds of wavelengths. The key point is that data at neighboring wavelengths are strongly correlated. perClass offers tools for extracting lower-dimensional feature representations from spectral data.

8.4.2. Quick example

We will use a data set with spectra of French fries:

>> a
8609 by 103 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041) 

Each of the 8609 measurements is represented by 103 narrow spectral wavelengths. We have four classes, namely rotten, greening, peel (potato skin) and healthy flesh.

We will now reduce the data dimensionality by extracting spectra-specific band features.

>> b=sdextract(a,'bands','mean','size',10,'step',10)
8609 by 10 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041) 

We used the sdextract command, computing bands with the mean extractor. Bands are defined by a simple sliding window in the spectral domain with size 10 and step 10. The output b is a new data set with 10 features, each being the mean of 10 neighboring wavelengths.
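To make the band computation concrete, here is a minimal Python/NumPy sketch of a mean band extractor. It is not perClass code: `band_means` is a hypothetical helper, and it assumes that windows start at the first wavelength and that only complete windows are kept. That is consistent with the 10 output features above but is not confirmed by this documentation.

```python
import numpy as np

def band_means(X, size, step):
    """Mean of each sliding spectral window of `size` wavelengths, moved by `step`.

    Assumption: windows start at the first wavelength and only complete windows
    are used, so trailing wavelengths that do not fill a window are ignored.
    """
    n_wl = X.shape[1]
    starts = range(0, n_wl - size + 1, step)
    return np.column_stack([X[:, s:s + size].mean(axis=1) for s in starts])

rng = np.random.default_rng(0)
X = rng.random((8, 103))        # 8 placeholder spectra with 103 wavelengths
B = band_means(X, size=10, step=10)
print(B.shape)                  # (8, 10)
```

With 103 wavelengths, size 10 and step 10, the windows start at 0, 10, ..., 90 (0-based), giving 10 band features.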

Spectral feature extraction provides a way to lower data dimensionality by leveraging our prior knowledge of wavelength ordering. We may now train a classifier, such as a probabilistic model, on the 10D data instead of the original 103D space.

8.4.3. Spectral pre-processing

perClass provides a number of spectral pre-processing methods via the sdprep command. For example, it is possible to subtract the mean of each spectrum or to divide the values at all spectral bands (features) by the value at a specific band.

In this example, we create a pre-processing pipeline that subtracts the mean of each input spectrum from that spectrum:

>> tr
37967 by 103 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814) 
>> pn=sdprep(tr,'submean')
Sample mean subtraction pipeline 103x103  
>> tr2=tr*pn
37967 by 103 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814) 
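The two pre-processing operations mentioned above can be sketched in NumPy as follows, assuming each spectrum is one row of the data matrix. `sub_mean` and `div_by_band` are hypothetical names for illustration, not sdprep options, and the band index is 0-based here, unlike in Matlab.

```python
import numpy as np

def sub_mean(X):
    """Subtract from each spectrum (row) its own mean over all wavelengths."""
    return X - X.mean(axis=1, keepdims=True)

def div_by_band(X, band):
    """Divide all wavelengths of each spectrum by its value at `band` (0-based)."""
    return X / X[:, [band]]

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])
print(sub_mean(X))          # rows: [-1, 0, 1] and [-2, 0, 2]
print(div_by_band(X, 0))    # rows: [1, 2, 3] and [1, 1.5, 2]
```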

Possible pre-processing steps:

For the smooth and der procedures, the sigma option may be used to define a custom window size.

Computation of spectral indices

Spectral indices, such as NDVI, may be computed using the sdprep command, specifying the index type and the bands used:


>> p=sdprep(tr,'(a-b)/(a+b)',10,47)
Divide by band pipeline   103x1  
>> out=tr*p
37967 by 1 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814) 
>> sdfeatplot(out)
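NDVI is the normalized difference (NIR - Red) / (NIR + Red), which has the same form as the '(a-b)/(a+b)' index above. The NumPy sketch below computes such an index, assuming that a and b refer to the first and second band arguments of the sdprep call (bands 10 and 47, 1-based). `norm_diff_index` is a hypothetical helper and the spectra are random placeholders.

```python
import numpy as np

def norm_diff_index(X, a, b):
    """(X_a - X_b) / (X_a + X_b) for 1-based band numbers a and b.

    Returns a one-column 2D array so it behaves like a one-feature data set.
    """
    xa = X[:, a - 1]
    xb = X[:, b - 1]
    return ((xa - xb) / (xa + xb))[:, None]

rng = np.random.default_rng(1)
X = rng.random((5, 103)) + 0.1   # positive placeholder spectra, 103 wavelengths
idx = norm_diff_index(X, 10, 47)
print(idx.shape)                 # (5, 1)
```

For positive inputs the index always lies between -1 and 1, which is why normalized differences are convenient for comparing measurements taken under different illumination levels.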

With the 'add' option, the computed index is appended after all input features:

>> p1=sdprep(tr,'(a-b)/(a+b)',10,47,'add')
Divide by band pipeline   103x104  

To join multiple indices in a single pipeline, use horizontal concatenation:

>> p1=sdprep(tr,'(a-b)/(a+b)',10,47)
Divide by band pipeline   103x1  
>> p2=sdprep(tr,'(a-b)/(a+b)',30,100)
Divide by band pipeline   103x1  

>> P=[p1 p2]
stack pipeline            103x2  2 classifiers in 103D space

>> out=tr*P
37967 by 2 sddata, 4 classes: 'rot'(10066) 'green'(3062) 'peel'(12025) 'flesh'(12814) 

>> sdscatter(out)
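In terms of the output data, concatenating index pipelines places one column per index side by side, while the 'add' option appends the index after the original features. The NumPy sketch below reproduces both output shapes with a hypothetical `norm_diff_index` helper and the 1-based band numbers from the sdprep calls above; it only illustrates the data layout, not perClass pipeline objects.

```python
import numpy as np

def norm_diff_index(X, a, b):
    """Normalized difference of 1-based bands a and b, as a one-column array."""
    xa, xb = X[:, a - 1], X[:, b - 1]
    return ((xa - xb) / (xa + xb))[:, None]

rng = np.random.default_rng(2)
X = rng.random((6, 103)) + 0.1   # positive placeholder spectra

# Counterpart of [p1 p2]: one output column per index.
both = np.hstack([norm_diff_index(X, 10, 47), norm_diff_index(X, 30, 100)])
print(both.shape)      # (6, 2)

# Counterpart of the 'add' option: index appended after all 103 input features.
added = np.hstack([X, norm_diff_index(X, 10, 47)])
print(added.shape)     # (6, 104)
```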

8.4.4. perClass band extractors

perClass performs spectral band extraction using the sdextract and sdbands commands. While sdextract returns a new data set with the extracted features, sdbands returns a pipeline object that can be applied to new data or exported for execution outside Matlab.

perClass spectral band extraction separates the step of band definition from feature extraction. In the quick example in Section 8.4.2, we defined bands by fixing the band size to 10 with a step of 10. The feature extractor used was the mean of the wavelength values within each band.

We may leverage two alternative ways of defining bands:

At present, perClass supports two band feature extractors, namely 'mean' and 'LDA'. The mean extractor is applicable to any data set, even if all samples belong to a single class (it is unsupervised). The LDA feature extractor, on the other hand, leverages class labels (it is supervised) and trains a Fisher projection for each of the bands. The output dimensionality of each projection equals the number of classes minus one. Therefore, we obtain a single output feature per band in a two-class problem and, e.g., 5 features per band in a six-class problem.

8.4.5. Examples

Define bands by clustering

Define bands by clustering the spectral domain into 5 clusters:

>> rand('state',1); 
>> b=sdextract(a,'bands','mean','cluster',5)
clustering wavelengths:done
8609 by 5 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041) 
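This documentation does not describe the clustering algorithm perClass uses, and the fixed random seed suggests a randomized procedure. As a swapped-in technique only, the sketch below builds contiguous bands deterministically by greedy bottom-up merging: it starts with one segment per wavelength and repeatedly merges the two neighboring segments whose mean profiles are most correlated, until k segments remain. `contiguous_bands` is hypothetical and will generally not reproduce perClass band boundaries.

```python
import numpy as np

def contiguous_bands(X, k):
    """Group the wavelengths (columns of X) into k contiguous bands.

    Stand-in technique (not perClass's algorithm): greedy bottom-up merging of
    neighboring segments whose mean profiles over all samples are most
    correlated. Returns 1-based inclusive (low, high) wavelength ranges.
    """
    segments = [[j] for j in range(X.shape[1])]
    while len(segments) > k:
        profiles = [X[:, s].mean(axis=1) for s in segments]
        corr = [np.corrcoef(profiles[i], profiles[i + 1])[0, 1]
                for i in range(len(segments) - 1)]
        i = int(np.nanargmax(corr))     # most similar neighboring pair
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
    return [(s[0] + 1, s[-1] + 1) for s in segments]

rng = np.random.default_rng(0)
# Random-walk placeholder "spectra": neighboring wavelengths are correlated.
X = np.cumsum(rng.normal(size=(100, 103)), axis=1)
bands = contiguous_bands(X, 5)
print(bands)   # five (low, high) ranges that together cover wavelengths 1..103
```

Restricting merges to neighboring segments keeps every band a contiguous wavelength range, like the bands listed by sdbands below.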

The resulting data set contains five features. Note that we first fixed the state of the random number generator. This allows us to repeat the same clustering procedure later with identical results.

Defining a band extraction pipeline

To create a band extraction pipeline, use the sdbands command:

>> rand('state',1); p=sdbands(a,'mean','cluster',5)
clustering wavelengths:done
Band extraction pipeline  103x5  5 bands,mean extractor

The pipeline p may now be applied to any data set containing spectra with 103 wavelengths:

>> c=a*p
8609 by 5 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041) 
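Conceptually, a trained band pipeline only needs to store its band definitions and its extractor, which is what lets it be applied to any data set with the same 103 wavelengths. The sketch below mimics this with a hypothetical `BandMeanPipeline` class (not a perClass object), using the band ranges reported by sdbands(p) below (wavelengths 1-15, 16-58, 59-74, 75-84 and 85-103).

```python
import numpy as np

class BandMeanPipeline:
    """Hypothetical stand-in for a trained band-mean pipeline (not a perClass
    object): stores 1-based inclusive wavelength ranges and averages within them."""

    def __init__(self, bands, n_wavelengths):
        self.bands = list(bands)
        self.n_wavelengths = n_wavelengths

    def apply(self, X):
        """Map an (n_samples, n_wavelengths) array to one mean feature per band."""
        if X.shape[1] != self.n_wavelengths:
            raise ValueError(
                f"expected {self.n_wavelengths} wavelengths, got {X.shape[1]}")
        return np.column_stack([X[:, lo - 1:hi].mean(axis=1)
                                for lo, hi in self.bands])

p = BandMeanPipeline([(1, 15), (16, 58), (59, 74), (75, 84), (85, 103)], 103)
new_spectra = np.random.default_rng(3).random((4, 103))  # placeholder data
c = p.apply(new_spectra)
print(c.shape)  # (4, 5)
```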

Like any other pipeline object, p may be exported with sdexport for execution outside Matlab.

Display band information

Detailed information on band extractor pipelines may be displayed using sdbands without further arguments:

>> sdbands(p)
103 input wavelengths, 5 bands, 5 output features
Mean feature extractor

  band      orig.wavelength   output
ind name       low   high      dim
 1  Band 1      1     15        1
 2  Band 2     16     58        1
 3  Band 3     59     74        1
 4  Band 4     75     84        1
 5  Band 5     85    103        1

LDA spectral feature extractor

The supervised LDA feature extractor trains a Fisher projection for each band, projecting the band's input data onto the subspace that maximizes class separation.

>> rand('state',1); p=sdbands(a,'LDA','cluster',5)
clustering wavelengths:done
Band extraction pipeline  103x15  5 bands,LDA extractor

>> sdbands(p)
103 input wavelengths, 5 bands, 15 output features
LDA feature extractor

  band      orig.wavelength   output
ind name       low   high      dim
 1  Band 1      1     15        3
 2  Band 2     16     58        3
 3  Band 3     59     74        3
 4  Band 4     75     84        3
 5  Band 5     85    103        3

>> d=a*p
8609 by 15 sddata, 4 classes: 'rot'(1491) 'green'(1762) 'peel'(3315) 'flesh'(2041) 

In our example, each band (wavelength group) yields 3 output features because the data set a contains four classes (4 - 1 = 3), giving 5 x 3 = 15 output features in total.

Defining bands manually

We may specify the bands manually using the 'bands' option. We need to provide the wavelength indices for each band in a cell array.

>> rand('state',1); p=sdbands(a,'LDA','bands',{16:58 75:84})
Band extraction pipeline  103x6  2 bands,LDA extractor
>> sdbands(p)
103 input wavelengths, 2 bands, 6 output features
LDA feature extractor

  band      orig.wavelength   output
ind name       low   high      dim
 1  Band 1     16     58        3
 2  Band 2     75     84        3

We have complete freedom in defining bands manually: they may overlap, and the same band may be included multiple times.
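The per-band LDA extractor can be sketched as one multi-class Fisher projection per band. The NumPy code below is an illustrative implementation of textbook Fisher LDA, not perClass's implementation: it solves the generalized eigenproblem Sb w = lambda Sw w with a small ridge added to Sw (an assumption for numerical stability) and keeps the C - 1 leading directions, or fewer if a band has fewer wavelengths than that. Bands are given as 1-based inclusive ranges, so both the clustered bands and the manual bands {16:58 75:84} from the examples above can be expressed. The data are synthetic placeholders.

```python
import numpy as np

def fisher_projection(X, y, ridge=1e-6):
    """Textbook multi-class Fisher LDA: return a (d, min(C-1, d)) projection."""
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor Sw = L L^T.
    L = np.linalg.cholesky(Sw + ridge * np.eye(d))
    Linv = np.linalg.inv(L)
    evals, V = np.linalg.eigh(Linv @ Sb @ Linv.T)  # eigenvalues in ascending order
    n_out = min(len(classes) - 1, d)
    return Linv.T @ V[:, ::-1][:, :n_out]          # leading directions first

def lda_bands(X, y, bands):
    """Fit one Fisher projection per band (1-based inclusive ranges) and
    return the projected features of all bands, concatenated."""
    feats = []
    for lo, hi in bands:
        Xb = X[:, lo - 1:hi]
        feats.append(Xb @ fisher_projection(Xb, y))
    return np.hstack(feats)

# Synthetic 4-class data with 103 wavelengths (placeholder for the fries data).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 50)
class_means = rng.normal(size=(4, 103))
X = rng.normal(size=(200, 103)) + class_means[y]

clustered = [(1, 15), (16, 58), (59, 74), (75, 84), (85, 103)]
print(lda_bands(X, y, clustered).shape)             # (200, 15)
print(lda_bands(X, y, [(16, 58), (75, 84)]).shape)  # (200, 6)
```

The output sizes follow the same rule as the perClass examples: four classes give 3 features per band, so 5 bands yield 15 features and 2 manual bands yield 6.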