perClass Documentation
development version 3.1.2 (22-Dec-2011)

Chapter 1: Introduction

1.1. This manual

This manual assumes basic knowledge of pattern recognition and of the Matlab environment. Embedding trained classifiers into custom applications also requires basic familiarity with the C language.

The manual is structured in four parts:

1.2. Introduction to perClass

perClass is a software toolkit that simplifies the design and deployment of pattern recognition algorithms. It consists of two components: the Matlab-based perClass Toolbox, which facilitates algorithm design, and the perClass runtime library, which executes trained classifiers in custom applications.

perClass provides tools for:

  • Construction of data sets
  • Handling of multiple sets of labels and arbitrary meta-data
  • Interactive visualization of data and meta-data
  • Training statistical detectors and discriminants
  • Quick evaluation of classifiers
  • Optimizing classifier decisions according to performance requirements using two-class and multi-class ROC analysis
  • Building hierarchies of classifiers
  • Deploying trained classifiers in custom applications outside of Matlab

1.2.1. Versions

perClass comes in the following versions:

  • Lite: Free, limited version for non-commercial use, intended for people who are learning about pattern recognition. It contains only the perClass Toolbox and is limited to data sets with at most 300 samples and three classes.
  • Academic toolbox: perClass Matlab Toolbox discounted for use by university students and researchers for non-commercial projects. The license is permanent and bound to a hardware dongle, which allows researchers to move between different machines.
  • Academic full: Full version of perClass discounted for use by university students and researchers for non-commercial projects only. The license is permanent, bound to a hardware dongle, and includes both the perClass Toolbox and the perClass runtime library for executing trained classifiers outside of Matlab.
  • Commercial: Full version of perClass for commercial use. It includes a perClass Toolbox license bound to a hardware dongle and the right to deploy trained classifiers in custom commercial applications using the perClass runtime library on the same hardware dongle.

For the Academic and Commercial versions, group licensing is also available through floating licenses provided by a license server.

1.2.2. System requirements

perClass is supported on the following platforms:

  • MS Windows 32-bit
  • MS Windows 64-bit
  • GNU Linux 32-bit (x86)
  • GNU Linux 64-bit (x86)
  • Apple Mac OS X 32-bit (x86)
  • Apple Mac OS X 64-bit (x86)

perClass requires Matlab 7.5 or later.

1.2.3. Useful general commands

1.2.3.1. Displaying perClass version and license information

The perClass version may be displayed using sdversion. It consists of a numerical part (e.g. 2.0.9) and a build date (e.g. 08-Mar-2010).

sdversion also provides several license-related details such as license type (Commercial, Academic or Lite), licensee name and the license expiration date.

>> sdversion
perClass Toolbox 3.0 (01-May-2011), Copyright (C) 2007-2011, PR Sys Design, All rights reserved
Commercial license for perClass. The license will expire on 26-apr-2011.

1.2.3.2. Demo examples

sddemo lists several basic examples to help you get started:

>> sddemo
run perclass_demo(num) where num is the index of the desired example

 1 : Working with data sets
 2 : Training a classifier and visualizing decisions
 3 : Tuning a classifier using ROC analysis
 4 : Multi-class ROC analysis
 5 : Building detectors
 6 : Building a detector-classifier cascade

1.2.3.3. Provide direct feedback to PR Sys Design

The sdfeedback command lets users submit feedback, such as error messages, to PR Sys Design directly from within Matlab. Running sdfeedback without arguments opens an edit dialog where the user may paste or type the message. Alternatively, the message may be passed to sdfeedback as a string.

1.2.3.4. Control messages displayed by perClass

The sddisplay command provides global verbosity control in perClass. Running sddisplay without arguments prints the current display state (on or off). To switch off messages printed by perClass, use:

>> sddisplay off

The default sddisplay state is on. Whenever the perclass_mex library is reloaded into memory, this default state is restored.

Alternatively, you may use the 'nodisplay' option in the functions that support it: sdrelab, sdroc, sddetector and sdcrossval.

1.3. Release notes

Version 3.1.2 (22-Dec-2011)

  • sdexport now supports export to C45 data format
  • sdscatter visualization of sdtree decisions now lets you change the number of thresholds interactively with a slider
  • fixing the problem where the sdrun MEX returned the pipeline index as an int instead of a double (issue pC-1267)
  • fixing the issue with setting the number of decision tree nodes

Version 3.1.1 (24-Nov-2011)

  • sdscatter now shows a value of the soft output under the mouse cursor in the figure title
  • a polygon drawn in an sdscatter figure may now also be saved by pressing the 's' key
  • fix for an sdscatter problem when showing soft outputs
  • sddecide now gives an informative error message when given an empty ROC object (the result of constraining when no operating point is available)
  • fix for erroneous output of the sdp_combine product combiner

Version 3.1 (14-Nov-2011)

  • fast and highly scalable decision tree (sdtree) and random forest (sdrandforest) classifiers
  • polygon classifier may be drawn interactively in sdscatter figures
  • classifier acceleration with 2D lookup table (sdlut) classifier
  • sdfeatsel now supports feature selection based on trained decision tree
  • sdimport supports reading of sdlab labels separately
  • randsubset method supports bootstrap sampling
  • sdexport displays the minimum version of the perClass runtime required to execute the exported pipeline, so deployed runtimes need to be updated only when necessary.
  • sdfeatplot adds interactive selection of a feature threshold.
  • sdcascade now supports classifier cascades as inputs
  • Improvements of sdscatter displaying classifier decisions:
    • Decision under cursor is now shown in the Figure title
    • The color of the decision under cursor may be changed using 'c' key
    • The pipeline drawing the decisions may be saved back into the Matlab workspace with the 's' key. The saved pipeline includes the changed decision colors.
  • sdimage improvements:
    • show decisions of an arbitrary pipeline from the Matlab workspace
    • execute k-means clustering on image data using 'Cluster with k-means' menu command ('c' keystroke)
    • switching to a different class by pressing a digit
  • fix for flipped fonts in sdscatter with a backdrop (caused by a bug in Windows ATI drivers); added the Shift-A keystroke to switch off the alpha level (see http://perclass.com/index.php/forums/viewthread/260)
  • fix for a crash due to a memory leak occurring when computing local histogram features for specific data ranges
  • fix for the internal use of tic/toc functions inside sdexe and classifier execution
  • fix for a possible crash when using a rejection-based operating point
  • fix for the error raised when switching between multiple scatter/ROC plots

Version 3.0.0 (6-Jun-2011)

  • PRSD Studio is renamed to perClass (How to transition to 3.0)

  • new functionality for handling image data

    • support for storing image data in data sets through the sdimage command. Arbitrarily-shaped regions from multiple images may be stored in an sddata object. Single-band and multi-band images are supported.
    • visualization of arbitrarily-shaped pixel subsets. See example
    • texture and appearance features may be computed in local image regions using the new sdextract function
      • support for user-defined grid (region size and step)
      • local histograms, features of local histograms, co-occurrence matrices
      • high-speed feature extraction (extracting 86000 co-occurrence matrices from a 1024x1300 image takes 230 ms on a laptop)
  • new functionality for execution runtime

    • significantly faster execution runtime
    • labels and decisions are represented by integers, not doubles
    • C API for precision timers (sd_Tic and sd_Toc) available to custom applications outside of Matlab on all platforms
    • new deployment tool sdrun for easy execution of trained classifiers in applications built with the Matlab Compiler. sdrun is implemented as a single statically-linked MEX file. To bring perClass classifier execution into a custom Matlab application, you only need to copy one MEX binary and include the pipeline and license files.
    • new ASCII-based pipeline file format allows embedding classifiers directly in source code (see ex_buffer.c SDK example)
  • new toolbox functionality

    • new sdcluster function for direct clustering of a data set with a user-defined model. sdcluster returns a data set with cluster labels. Clustering is performed per class, and the original labels are preserved. See the new chapter on clustering
    • .* operator. Applying a pipeline that returns decisions to a data set with the .* operator returns a data set with the decisions set as its new labels. This is useful for obtaining clustering results or image labels in one step. Example: b=a.*p is equivalent to dec=a*p; b=a; b.lab=dec;
    • sdscatter improvements
      • 'show all' menu command for each property
      • save filter to workspace. Filter is stored as a structure which may be easily edited by hand and loaded back into sdscatter
    • sdfeatplot improvements (read more)
      • left/right cursor keys move to first/last feature, respectively
      • 's' keystroke switches to stem-plot highlighting individual histogram bins
      • 'u' keystroke uses only unique values instead of the default histogram binning
      • 'a' keystroke sets automatic binning
      • 'x' keystroke lets you specify the name of a variable defining the x-axis bins (e.g. logarithmic)
      • 'lab' option specifies the label set used (default: 'lab', example: sdfeatplot(data,'lab','tissue'))
      • 'bins' option allows bin specification from command-line
    • sdrelab includes new 'all' option that sets all samples to a specific class. This works both for labels and data sets.
    • sdrelab may rename pipeline decisions by providing a new list. Example: pd=sddetector(a,'target',sdgauss); pd2=sdrelab(pd,sdlist('accept','reject')); read more
    • generate more data from a Gaussian model using sdgenerate. Example: p=sdmixture(data); b=sdgenerate(p,1000);
    • sdconfmat header lines may be suppressed with the 'no header' option. This is useful when concatenating multiple confusion matrices (e.g. one per patient) into one larger table.
    • sdknn also accepts k directly as the second argument, after the data set. Example: p=sdknn(a,10) instead of p=sdknn(a,'k',10)
    • sdsvc accepts the type (linear, RBF or polynomial) as a direct parameter. Example: p=sdsvc(a,'linear')
    • operating point marker and color in sddrawroc may be changed by the second parameter. See example
  • new core-level functionality

    • length of an sddata object returns the number of samples (see discussion at: http://prsdstudio.com/index.php/forums/viewthread/301)
    • subset and sdrelab preserve the user-specified order of classes when processing a single set of labels.
    • sorting a label list using the sdlist sort method or the sdlab sortlist method. Only the order of classes changes, not the sample labeling.
    • sddata find supports regular expressions to return sample indices. Example: ind=find(a,'/substring') returns indices of all samples whose class name contains substring.
    • direct assignments into sddata property. Example: a(1:10).lab='orange' or a.lab(1:10)='orange'
    • label assignment also supports class indices. Example: lab(1:10)=2 assigns the first ten samples to the second class in lab.list.
  • fixes:

    • sdp_affine fix for scaling, labels optional
    • sdlab does not include extra space in sdlab('Feature',1:10) constructor
    • sdpca allows 1D output
    • sdscatter window will not jump out of the screen when switching on the distribution plots
    • sddata/randsubset and sddata/subset return subset indices in column order
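
The 3.0 notes describe labels as integers that index a class list, and say that sorting the list changes only the order of classes, not the sample labeling. The C sketch below shows one way that invariant can hold. It is an illustration under assumptions: the label_set struct and sort_class_list function are hypothetical and are not part of the perClass runtime API, and indices are 0-based, whereas Matlab's lab(1:10)=2 is 1-based.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical label store: each sample holds a 0-based index into a class list. */
typedef struct {
    const char **names;  /* class list, e.g. {"pear", "apple"} */
    int n_classes;
    int *idx;            /* per-sample class index into names */
    int n_samples;
} label_set;

/* Class list consulted by the qsort comparator (not thread-safe; sketch only). */
static const char **sort_names;

static int cmp_class_names(const void *a, const void *b) {
    return strcmp(sort_names[*(const int *)a], sort_names[*(const int *)b]);
}

/* Sort the class list alphabetically and remap every sample index so that
   each sample keeps its class name. Returns 0 on success, -1 on failure. */
int sort_class_list(label_set *ls) {
    int n = ls->n_classes;
    if (n <= 1) return 0;
    int *order = malloc(n * sizeof *order);           /* order[new] = old */
    int *remap = malloc(n * sizeof *remap);           /* remap[old] = new */
    const char **sorted = malloc(n * sizeof *sorted);
    if (!order || !remap || !sorted) {
        free(order); free(remap); free(sorted);
        return -1;
    }
    for (int i = 0; i < n; i++) order[i] = i;
    sort_names = ls->names;
    qsort(order, n, sizeof *order, cmp_class_names);
    for (int k = 0; k < n; k++) {
        remap[order[k]] = k;
        sorted[k] = ls->names[order[k]];
    }
    memcpy(ls->names, sorted, n * sizeof *sorted);
    for (int s = 0; s < ls->n_samples; s++)
        ls->idx[s] = remap[ls->idx[s]];               /* same name, new index */
    free(order); free(remap); free(sorted);
    return 0;
}
```

Storing only integer indices keeps the per-sample data compact. It also means that renaming or reordering classes touches the short class list plus one index remap, and never the class-name strings of individual samples.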

Version 2.4.0 (7-Feb-2011)

  • new execution utilities (commercial and academic full versions only)
  • toolbox improvements
    • improvements in horizontal label concatenation (internal spaces are omitted; scales to very large data sets, with one million samples processed in under half a second)
    • fixing sdmixture problem where training set priors were not used by default
    • sdtree classifier adds a 'levels' option that may significantly speed up training
    • PRTools AdaBoost classifiers with decision tree or stump base learners may be converted into pipelines using the sdconvert command.

Version 2.3.0 (13-Dec-2010)

  • sdcrossval now provides per-fold measurements of execution speed
  • fixing a bug in sdkmeans that could cause a crash for multi-class data sets with high overlap
  • fixing the problem in sdscatter where regular expression could not be applied to a subset of samples
  • fix for the call subset(data,'lab',{}), which did not raise an error
  • randsubset now works for PRTools datasets

Version 2.2.5 (24-Nov-2010)

  • regular expressions allow simple definition of data subsets and relabeling with sdrelab. Strings starting with a slash (/) are interpreted as regular expressions. For example, subset(data,'/good') returns all classes whose names contain the word 'good'.
  • sdscatter enhancements:
    • undo the last label painting operation (u key or Undo painting command in scatter right-click menu)
    • cycle through all classes showing one at a time (< and > keys)
    • select class subset by regular expression (/ key)
    • class to top (t key)
  • new dissimilarity measures in sdprox (Spectral Angle Mapper, Kolmogorov distance, Match distance)
  • sdmindist classifier directly applicable to dissimilarity representations
  • sdfeatplot enhancements:
    • allows selection of a label set used for plotting the per-group distributions. sdfeatplot(data,'lab','patient') will show per-patient histogram for each feature.
    • change of default behaviour: sdfeatplot now uses all data to construct histograms, use 'maxsamples' option to limit sample count used for large data sets.
    • fixing the problem in sdfeatplot related to constant-value features; sdfeatplot now also shows the constant feature value if present.
  • sddrawroc supports interactive zooming
  • fix in sdscatter sample inspector showing correct labels when focusing on a sample subset
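
Under the regular-expression convention introduced in 2.2.5, a query that starts with / is a pattern, and anything else is an exact class name. The C sketch below illustrates this with the POSIX <regex.h> functions. It is only an illustration: select_classes is a hypothetical helper, POSIX extended regular expressions stand in for Matlab's regexp syntax, and <regex.h> is not available with MSVC on Windows.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: write the indices of classes matching `query` into out[]
   and return how many matched, or -1 if the pattern does not compile.
   A query starting with '/' is an extended regex searched anywhere in the name;
   any other query must equal the class name exactly. */
int select_classes(const char **names, int n, const char *query, int *out) {
    int count = 0;
    if (query[0] != '/') {
        for (int i = 0; i < n; i++)
            if (strcmp(names[i], query) == 0) out[count++] = i;
        return count;
    }
    regex_t re;
    if (regcomp(&re, query + 1, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;                       /* pattern after '/' is invalid */
    for (int i = 0; i < n; i++)
        if (regexec(&re, names[i], 0, NULL, 0) == 0) out[count++] = i;
    regfree(&re);
    return count;
}
```

The pattern is compiled once per query rather than once per class, which matters when a label list holds many classes (e.g. one per patient).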

Version 2.2.4 (5-Oct-2010)

  • support for Mac OS X 64-bit platform
  • new sdimport command for loading sddata objects from text files. User may specify what columns correspond to data matrix, labels and additional sample properties. (read more)
  • sdexport command can store sddata in a comma-separated file (read more)
  • sdsvc support for linear and polynomial kernels including automatic grid search (read more)
  • support for incremental Support Vector Data Description (incsvdd) from DD_Tools.
  • adding support for creating sdlab labels using a vector and class names
  • sdfeatplot allows user definition of line styles used for plotting class-feature distributions (see example)
  • sdp_affine turns empty offsets into zero vectors (forum discussion)
  • sddetector now supports test sets also in one-class mode ('reject' and 'test' options used together)
  • fix of a bug related to scaling proximity data with sdscale
  • fix of a bug in sdprox where prototypes were unnecessarily sorted

Version 2.2.3 (29-July-2010)

  • sdsvc lets you identify training samples that became support vectors (using the original property of the support vector set p{1}.proto)
  • sddetector supports an externally defined test set using the 'test' option
  • sdfeatsel floating search provides history of feature subsets selected by individual steps.
  • sdfeatsel adds a test option which may be used to supply external data set used for evaluation of 1-NN error criterion
  • sddecide allows construction of an operating point manually. Support for both weighting-based discriminants and thresholding-based detectors.
  • sdsvc support for setting an external data set used for error estimation in the parameter grid search with the 'test' option
  • sddata supports cell array properties
  • sdscatter user callbacks are now accessible using 'callback' option
  • untrained classifier pipelines now return names using getname

Version 2.2.2 (22-June-2010)

  • fast feature selection sdfeatsel scalable to large data sets (forward search with 1000 samples, 50 features under five seconds). Individual, random selection, forward, backward and floating searches are supported using 1-NN error on a validation set as a criterion. Feature subset size is selected automatically.
  • sdscatter can call a user-defined function when a data sample is clicked. This allows custom visualization, such as loading an image corresponding to a sample from disk and showing it in a separate figure.
  • support for untrained high-level operations on data (subset, randsubset, sdrelab, sdroc). This allows one to easily express complex sequences of training operations.
  • extended sdscale supporting also robust domain scaling (robust in presence of outliers)
  • cascades may be now trimmed to return output after specific stage using sdconvert, e.g. pc2=sdconvert(pc,'until',2). This helps us to understand how the later stages of hierarchical classifiers improve performance.
  • experimental support for Mac OS X 64-bit platform
  • sdscatter fix for decision colormap when showing classifier decisions

Version 2.2.1 (3-May-2010)

  • interactive visualization of feature distributions in sdscatter for both axes (use 'Show feature distribution' in the 'Scatter' menu or press 'd'). This greatly simplifies understanding of class overlap in very large data sets, where a scatter plot is not very informative. (example)
  • sdkmeans classifier and clustering scalable to very large data sets (1 million samples, 10 clusters in 3.3 sec). sdkmeans provides fast prototype selection method for k-NN classifiers. Classification performance is further improved by prototype pruning (similar effect to editing the training set).
  • sdkcentres classifier and clustering
  • randsubset can limit the maximum number of samples using the 'atmax' option. This is useful for limiting the sample size while tolerating that some classes have fewer samples.
  • find and subset now allow that some of the class names do not exist and return what is present (and not empty [] as before)

Version 2.1.0 (21-Apr-2010)

  • fixing a bug in sddecide related to adding an operating point in an ROC object
  • fixing an error message in sdlab constructor
  • adding RBF support vector machine training using sdsvc command. sdsvc is based on libSVM and offers automatic grid search for sigma and C parameters and one-against-all multi-class support. (examples)
  • adding a reject option to a trained discriminant using the sdreject function (also for multi-class classifiers; both outlier rejection and rejection close to the decision boundary) (examples)
  • sdcrossval support for estimating ROC with variances using operating point averaging (cross-validate a pipeline returning soft outputs and provide fixed operating points using the 'ops' option) (example)
  • adding sdcrossval support for custom sdalg algorithms that are not convertible into a pipeline (algorithm needs to return the list of all possible decisions)
  • sddrawroc now saves complete sdroc objects back in the workspace, not only operating points (by pressing 's' key)
  • sddecide support for default op.point based on thresholding (e.g. for sdsvc on two-class problems)
  • support for clustering using sdmixture with 'cluster' option
  • sdscatter adding the "show only this class" command (press 'o' key)
  • default mean-error performance measure in sdcrossval is not anymore included if user requests a specific set of measures
  • sdneural may switch off the default use of validation for teaching purposes (to illustrate overfitting of the network). Use 'valfrac',[] to suppress the use of a validation set.
  • fixing the problem with sdroc using 'confmat' and 'reject' options together
  • fixing the bug in sdlab constructor for single label per class
  • improving compatibility with PRTools (sdimage,sddetector,sdreject,sdcrossval, sdstackgen, sdscatter visualizing images using sample inspector)

Version 2.0.9 (8-Mar-2010)

  • adding support for subset by logical array for sddata and sdlab objects (example: a( a.lab=='banana' ))
  • sdtest raises a warning if some of the true classes are not matched to classifier decisions (all samples from these classes are considered misclassified)
  • fixed sdscatter problem with the order of classes in "class on top" and "change markers"
  • usability improvements in sdfeatplot (click to change figure title; legend properly displaying special characters)
  • 'mean-error' performance measure may specify optional class priors used for weighting the class errors
  • global display verbosity may be handled using prsd_display command (use prsd_display off to switch off display output of PRSD Studio functions).
  • 'nodisplay' options added in sdmixture, sdparzen, sdcrossval
  • randsubset supports random selection of objects from some classes only (example: [tr,ts]=randsubset(a,[0.5 0]) returns 50% of the first class for the training)
  • sdcrossval outputs a string with the result summary, a result struct and the evaluation object.
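
The prior-weighted 'mean-error' measure from 2.0.9 can be read as a weighted average of per-class error rates. Below is a minimal C sketch under these assumptions: class indices are 0-based, the optional priors weight each class's error (equal weights when none are given), and classes absent from the test data are skipped, with the remaining weights renormalized. mean_error is a hypothetical name, not a perClass function.

```c
#include <stdlib.h>

/* Hypothetical sketch: weighted mean of per-class error rates.
   truth[i] and dec[i] are 0-based class indices; priors may be NULL.
   Returns -1.0 on allocation failure or when no class is present. */
double mean_error(const int *truth, const int *dec, int n,
                  int n_classes, const double *priors) {
    int *total = calloc(n_classes, sizeof *total);  /* samples per true class */
    int *wrong = calloc(n_classes, sizeof *wrong);  /* misclassified per class */
    if (!total || !wrong) { free(total); free(wrong); return -1.0; }
    for (int i = 0; i < n; i++) {
        total[truth[i]]++;
        if (dec[i] != truth[i]) wrong[truth[i]]++;
    }
    double sum = 0.0, wsum = 0.0;
    for (int c = 0; c < n_classes; c++) {
        if (total[c] == 0) continue;                /* class absent: skip */
        double w = priors ? priors[c] : 1.0;
        sum += w * (double)wrong[c] / total[c];
        wsum += w;
    }
    free(total); free(wrong);
    return wsum > 0.0 ? sum / wsum : -1.0;
}
```

Averaging per-class errors, instead of counting errors over all samples, keeps a large class from hiding poor performance on a small one. The priors then shift that balance deliberately.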

Version 2.0.8 (19-Feb-2010)

  • fix in sdimage for multi-dimensional images (image cubes)
  • pipelines now provide operating points via p.ops field
  • API interface simplification and cleanup
  • low-level output of pipelines on matrices and using C API returns indices to decision list as decisions, not the internal codes
  • sdlist and sdlab internal numerical representation is not exposed to the user anymore
  • feature selection pipeline sdp_fsel can now get the feature labels directly from the data set: pf=sdp_fsel(data,[3 4])
  • sddetector handles output polarity automatically (k-NN output is distance, mixture output is similarity)
  • adding easy display of sdlab object details (class sizes, fractions) using the transpose operator (lab')

Version 2.0.5 (22-Dec-2009)

  • classifier output visualization using sdscatter can now switch between different soft outputs interactively using cursor keys
  • added constrain method for easy application of ROC performance constraints
  • enhanced setcurop method to choose operating point minimizing or maximizing specific performance measure or setting op.point based on costs
  • new performance measure nconfmat - the entry in normalized confusion matrix
  • 'target' and 'non-target' options in sddetector and sdroc setting the desired target/non-target names
  • setstate method in sdalg algorithm allows to call algorithm function directly (instead of using the multiplication operator)

Version 2.0.4 (14-Dec-2009)

  • sdrelab can now add a string prefix to all classes in all label sets using the 'add to all' option. This makes it easy to compare two data sets with multiple labelings (classes, patients, tissues).
  • adding sdscale command for data scaling

Version 2.0.3 (9-Dec-2009)

  • isclass method for quick check if certain classes are present (useful for custom algorithms)
  • sdnorm function adding a normalization step to a trained pipeline (this constructs a general discriminant)
  • sdlab fix for incorrect class size when initialized with a list and indices
  • adding initial version of auto-conversion for older-format @sdppl/sdppl and @sdops/sdops objects
  • fix for the inMathOverflow warning/error in sdtree training

Version 2.0.2 (4-Dec-2009)

  • new sdlab object simplifies handling of labels, decisions and indexed meta-data
  • new sddata object brings easy handling of sample meta-data
  • multiple sets of labels or meta-data in a dataset, unified access to sample properties
  • simple queries using multiple criteria (give me all samples labeled as "Cancer" from patient 1,2 and 5 using subset(a,'class','Cancer','patient',[1 2 5]))
  • access to classes is greatly simplified
  • sdroc handles classifier output polarity automatically (sdexe stores the output type in output_type data property)
  • user may change class markers. Data set remembers class markers. Scatter markers are stored in the 'marker' property inside the class list.
  • dissimilarity representation contains as feature properties all prototype sample properties
  • labels and decisions may be easily concatenated. This allows us to add new labels with a breakdown of errors (confusion-matrix entries) in one command.
  • writing custom sdalg algorithms is significantly simplified

1.x Compatibility changes

  • sdppl objects use new internal format.
  • sderror replaced by sdtest

Version 1.3 (30-Nov-2009)

  • fix in sdnmean classifier: now computing pooled diagonal covariance using class priors
  • adding missing parse_measures.p file
  • fixing a p-code compatibility problem with Matlab 7.4

Version 1.2.5 (12-Oct-2009)

  • fixes in findprop for numerical properties
  • adding 'all' and 'nodisplay' options to sdrelab

Version 1.2.4 (13-Aug-2009)

  • sdtree implements training of decision tree classifier scalable to large number of samples (example)
  • fix in prsd_feedback correcting the problem with PRTools not on Matlab path

Version 1.2.3 (15-Jul-2009)

  • visualization: sdscatter provides more detailed information in sample inspector including all sample meta-data
  • sdrelab: adding prefix or suffix to all class names. (example)
  • sdrelab: renaming a single class by relative index
  • simpler installation: PRSD Studio Lite installation does not anymore require software activation
  • sdroc: support for reject option on classifiers with distance soft output (sdknn)
  • selprop,findprop support for set of property values defined by cell array

Version 1.2.2 (16-Jun-2009)

  • libPRSD: support for AdaBoost execution using decision tree as base classifiers
  • visualization: sdscatter allows interactive change of classifier parameters using slider (k in k-NN, smoothing in Parzen, number of base classifiers in AdaBoost)
  • visualization: sdimage may be connected to ROC plot and visualize decisions at different operating points in real-time
  • sdneural provides target option that allows one to approximate trained classifiers (example)
  • sdroc: fraction of all objects may be rejected by specifying fraction after reject option

Version 1.2.1 (27-May-2009)

  • sdnbayes implementing Naive Bayes classifier with automatic selection of number of histogram bins
  • sdroc now supports cost-based selection of operating point for two-class scenario (in addition to the existing multi-class cost-based optimization)
  • sddecide may be used in pipelines to define default operating point
  • sdp_affine can construct simple feature scaling pipelines

Version 1.2 (19-May-2009)

  • sdmixture supports automatic estimation of number of components
  • sdneural implementing feed-forward neural network training
  • sdcrossval now supports untrained pipelines

Version 1.1.6 (9-May-2009)

  • sdparzen Parzen classifier implementing scalar and vector smoothing
  • sdknn k-th nearest neighbor classifier with support for prototype selection and for both detection and multi-class classification

Version 1.1.5 (1-May-2009)

  • libPRSD now supports loading pipelines also from a buffer using prsd_LoadPipelineFromBuffer (pipelines may be now stored in application resources or sent over network).
  • sdroc supports rejection both far away and close to the decision boundary using the reject option.
  • sdscatter: the figure title may be selected interactively by clicking on the title area
  • simplified selection of performance measures in sdroc

Version 1.1.4 (19-Mar-2009)

  • adding support for group licenses via license server
  • support for construction of arbitrary hierarchical classifiers using decision-level fusion and their execution through libPRSD
  • sddetector brings one-command construction of detectors based on arbitrary model (both in one-class setting specifying a threshold using fraction of rejected samples and in two-class setup using ROC analysis to fix the threshold minimizing mean error).
  • sddrawroc can save the current operating point into any relevant object (sdroc, sdops, pipelines, sddecide mappings, custom sdalg algorithms)
  • introducing sdmixture for training Gaussian mixture models (one- or multi-class, variable number of components per class, different stopping criteria (iterations or likelihood delta))
  • sdrelab can define classes using the ~ (tilde) negation operator (e.g. turn everything that is not apple into "non-apple")
  • sdscatter allows the user to flip through order of classes (z-order) by + and - key-strokes
  • number of usability improvements in construction of pipelines and interaction with PRTools (sdroc and sdops objects can be now directly concatenated into pipelines; sdmap wraps pipelines for use in PRTools)
  • many improvements in confusion matrix estimation: sdconfmat
  • setprop now allows to quickly set property to a constant value. This makes it very easy to quickly tag a group of samples with a specific label.
  • sdconfmat can now add new labels with all confusion matrix combinations as a property. This can be used to quickly visualize different types of error directly in the feature-space
  • sdconfmat cosmetic fix: string confusion matrix scales nicely with long class names
  • new function selprop returning a subset of a dataset with given property values
  • significant improvements in scalability of sdroc to large datasets in speed and memory usage. Practical even for datasets with 100 000 samples and tens of thousands of operating points.
  • improved ROC optimizer brings better quality sets of operating points
  • sdconfmat can now estimate confusion matrices for sets of operating points from the soft outputs
  • sdexe can return numerical decision codes ('code' option). This is useful for low-level work with classifier outputs.
  • pipelines can return numerical decision codes using .* operator (e.g. dec=data.*p)
  • sdeaclust clustering can be now executed on new data. Scalable to very large datasets (images).
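
In the one-class setting of sddetector (1.1.4), the threshold is fixed by the fraction of target samples to be rejected. One way to sketch this in C is to sort the target scores and take the value below which the requested fraction falls. The sketch assumes a similarity-type soft output, where higher means more target-like (a distance output such as k-NN would flip the comparison), and assumes no tied scores. reject_fraction_threshold is a hypothetical helper, not part of perClass.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical sketch: choose a threshold on target similarity scores so that
   accepting score >= *thr rejects the floor(frac * n) lowest-scoring targets
   (exact when scores are untied). frac must lie in [0, 1).
   Returns 0 on success, -1 on bad input or allocation failure. */
int reject_fraction_threshold(const double *scores, int n, double frac,
                              double *thr) {
    if (n <= 0 || frac < 0.0 || frac >= 1.0) return -1;
    double *s = malloc(n * sizeof *s);
    if (!s) return -1;
    memcpy(s, scores, n * sizeof *s);      /* leave the caller's data unsorted */
    qsort(s, n, sizeof *s, cmp_double);    /* ascending similarity */
    int k = (int)(frac * n);               /* number of targets to reject */
    *thr = s[k];
    free(s);
    return 0;
}
```

A one-class detector has no non-target data to trade errors against, so the rejected target fraction is the natural knob. The two-class mode instead selects the threshold from an ROC curve.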

Version 1.1.3 (26-Jan-2009)

  • fix in sdscatter allowing label painting with the legend switched on
  • fix in sdscatter retaining the type of numerical properties in a dataset saved back to workspace
  • sdscatter can now switch visibility of classes or groups on/off. That's helpful when inspecting large datasets with many overlapping sample groups (patients). See context menu in sdscatter Figure windows. Painting now applies only to visible samples.
  • initial support for hierarchical systems composed of multiple classifiers returning decisions (sdp_cascade)
    • support for meta-classes and different features at each classifier node.
    • ROC analysis for hierarchical systems
  • sdconfmat added
    • the order of labels and decisions (lablists) can be fixed by the user
    • sdconfmat can correctly handle situations where only some classes/decisions are present in the test set (given the full lablists)
    • sdconfmat can return the string with a table
    • support for normalization of confusion matrices
    • lablists may be supplied as cell arrays of strings or string arrays
  • support for weight-based operating points with reject option (rejection both close to the boundary and distance-based)
  • sdroc automatically shows rejected fraction and all per-class TPrs
  • support for similarity-based and distance-based classifier outputs
  • adding reject fraction estimate to sdroc
  • support for leave-one-out over a property (object, person, patient...)
  • fix for the bug where sdscatter raised an error when the mouse pointer was moved too quickly over the new window
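
The sdconfmat features listed for 1.1.3 are user-fixed label and decision lists, correct handling of classes missing from the test set, and optional normalization. A small C function can illustrate all three. The sketch rests on these assumptions: rows are true labels and columns are decisions, both are 0-based indices into their own lists, and normalization divides each row by its class size. confmat is a hypothetical name.

```c
/* Hypothetical sketch: fill the n_lab x n_dec row-major matrix cm.
   Row r counts samples of true class r; column c counts decision c.
   The list sizes come from the caller, so classes absent from the test set
   still get a (zero) row, and extra decisions such as "reject" get a column. */
void confmat(const int *lab, const int *dec, int n,
             int n_lab, int n_dec, double *cm, int normalize) {
    for (int i = 0; i < n_lab * n_dec; i++) cm[i] = 0.0;
    for (int i = 0; i < n; i++) cm[lab[i] * n_dec + dec[i]] += 1.0;
    if (!normalize) return;
    for (int r = 0; r < n_lab; r++) {
        double row = 0.0;
        for (int c = 0; c < n_dec; c++) row += cm[r * n_dec + c];
        if (row > 0.0)                      /* leave empty rows at zero */
            for (int c = 0; c < n_dec; c++) cm[r * n_dec + c] /= row;
    }
}
```

With row normalization, each diagonal entry becomes the per-class true-positive rate. Fixing both lists up front keeps matrices from different test sets (e.g. per patient) directly comparable.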

Version 1.1.2 (18-Nov-2008)

  • adding fast approximate k-NN (see the example in our blog)
  • adding a k-centres classifier capable of both one-class classification and multi-class discrimination
  • feature selection algorithm sda_featsel now supports also backward feature selection

Version 1.1.1 (09-Nov-2008)

  • adding leave-one-out evaluation to sdcrossval
  • adding sdfeatsel: robust feature selection using internal cross-validation loop. It supports custom-made feature selection algorithms
  • two example algorithms added illustrating the use of feature selection during training (sda_featsel_example1) and in inner cross-validation loop based on sdfeatsel (sda_featsel_example2)

Version 1.1 (04-Nov-2008)

  • fixing a critical dongle-related bug in version 1.1 (26-Oct-2008)
  • fixing the issues with one-sample test sets in ROC

Version 1.1 (26-Oct-2008)

  • sdscatter gets full support for GUI menus and class renaming
  • new sdimage command visualizing an image stored in a dataset. Support for label painting, class renaming, multiple sample groupings, and connection to sdscatter
  • sdscatter support for interactive sample inspector (datasets with 1D data using bar plot or 2D images)
  • sddrawroc can now show confusion matrices at the cursor and at the selected operating point (if present i.e. if 'confmat' flag was specified in sdroc command)
  • sdexe now automatically converts sdalg algorithms into pipelines
  • sdstackgen now returns also a robust base classifier (mean fusion of per-fold trained base classifiers) as a second output
  • improved support for prtools classifiers with output conversion
  • fix: sdroc now stores confusion matrices in multi-class situations using 'confmat' flag
  • fix to scaling using affine projection. scalem is now supported for all affine scaling types

Version 1.0 (15-Sep-2008)

  • added randomization cross-validation scheme sdcrossval(nmc,data,'method','random')
  • ROC object may be queried using short names of measurements r(:,'err(Cancer)')
  • activation support for commercial demos

Version 1.0 (02-Sep-2008)

  • fix: included missing sda_prtools wrapper
  • new feature: sdscatter now allows for user-defined titles (sample details moved to the figure title bar)