
perClass Course: Module Contents

Module 1: Introduction to Machine Learning

Overview of what is needed to design a machine learning system. Supervised and unsupervised classification. Training from examples. Concepts of a class, a feature, and a data sample. Examples of several typical scenarios.

Module 2: Data and labels

How to annotate my data with multiple types of meta-data? Why does this help classifier design?

Tools: sddata object; constructing data sets; using properties; labels, categories and lists; working with data subsets; visualization using scatter and image views; normalization issues; data visualization and its relevance.

Module 3: Supervised Learning

What is a classifier? How to train a classifier? How to choose a good classifier for my problem?

Tools: Bayes theorem; generative and discriminative classifiers; parametric and non-parametric models; naive Bayes; linear, quadratic, and mixture models; Parzen density estimation; linear discriminant analysis; nearest-neighbor rules; support-vector machines; perceptron; neural networks; decision trees; random forests.
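As an illustrative sketch only (not perClass code — perClass is a MATLAB toolbox), the contrast between a generative and a non-parametric classifier from the list above can be shown with scikit-learn, used here purely as a stand-in:

```python
# Conceptual sketch: a generative classifier (Gaussian naive Bayes) vs. a
# non-parametric one (k-nearest neighbors) trained on the same labeled data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Two Gaussian classes: 50 samples each, 2 features, well separated
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

nb = GaussianNB().fit(X, y)              # generative: models p(x | class)
knn = KNeighborsClassifier(3).fit(X, y)  # non-parametric: 3 nearest neighbors

nb_acc = nb.score(X, y)
knn_acc = knn.score(X, y)
```

Both models fit the same training set but encode very different assumptions: naive Bayes commits to a per-class Gaussian density, while k-NN lets the data speak locally.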

Module 4: Evaluation and model selection

How to reliably estimate classifier performance? How to choose a good performance measure? How to test a classifier on unseen objects or patients?

Tools: Error and performance measures; confusion matrix; learning curves; overtraining; classifier complexity; cross-validation.
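As a conceptual sketch (not perClass code), cross-validation and the confusion matrix work together like this; scikit-learn is assumed here purely as a stand-in for illustration:

```python
# Conceptual sketch: every sample is predicted by a model that never saw it
# during training (5-fold cross-validation), then summarized per class.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2.5, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

y_pred = cross_val_predict(LogisticRegression(), X, y, cv=5)
cm = confusion_matrix(y, y_pred)     # rows: true class, columns: decision
accuracy = cm.trace() / cm.sum()     # diagonal = correct decisions
```

Because each prediction comes from a fold that excluded that sample, the resulting accuracy estimate does not suffer from the optimistic bias of evaluating on the training set.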

Module 5: Dimensionality reduction

Why don't more features always give better classifiers? How to choose or create a smaller feature subset? Which features are useful?

Tools: visualizing feature distributions; measures of overlap; feature selection with individual, greedy, and floating search strategies; genetic search; feature extraction: PCA, LDA, and non-linear extraction methods.
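To illustrate the feature-extraction side of this list, here is a minimal PCA sketch (not perClass code; scikit-learn is assumed as a stand-in): five correlated features are compressed onto two components with almost no loss:

```python
# Conceptual sketch: PCA projects correlated features onto a few
# directions of maximum variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
latent = rng.normal(0, 1, (200, 1))
# 5 observed features that are all noisy copies of one latent variable
X = latent @ np.ones((1, 5)) + rng.normal(0, 0.1, (200, 5))

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)               # 200 x 2 instead of 200 x 5
var_explained = pca.explained_variance_ratio_[0]
```

Because the five features share one underlying source, the first principal component alone captures nearly all of the variance, which is exactly the situation where dimensionality reduction pays off.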

Module 6: Classifier optimization

How to make sure we meet performance requirements? How to change the behaviour of an already-trained classifier? How to deal with skewed data sets (one class much smaller than others)? How to protect a classifier from outliers and concepts unknown in training?

Tools: Target detection; one-class classification; ROC analysis for two-class and multi-class problems; class imbalance; performance constraints; cost-sensitive optimization; handling of prior probabilities; rejection of outliers; rejection of low-confidence regions (to find areas of overlap, i.e. difficult samples).
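The core idea of ROC analysis — changing the behaviour of an already-trained classifier by moving its decision threshold — can be sketched as follows (illustrative only, not perClass code; scikit-learn is assumed as a stand-in):

```python
# Conceptual sketch: sweep the decision threshold of a trained two-class
# classifier to trade true-positive rate against false-positive rate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (80, 2)), rng.normal(2, 1, (80, 2))])
y = np.array([0] * 80 + [1] * 80)

clf = LogisticRegression().fit(X, y)
scores = clf.predict_proba(X)[:, 1]          # soft output for class 1
fpr, tpr, thresholds = roc_curve(y, scores)  # one point per operating point
roc_auc = auc(fpr, tpr)
```

Each (fpr, tpr) pair is a different operating point of the same trained classifier; selecting a point on this curve changes the deployed behaviour without any retraining.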

Module 7: Advanced data handling

How to get from raw files to data sets? How to clean raw data? How to learn from (multi-band) image data?

Tools: defining a pattern recognition problem; importing images with annotations; computing local image features in regions; representations for texture and appearance classification; working with high-resolution imagery: extracting local features on a sparse grid, passing labels and classifier decisions between sparse and original image data; training from data extracted from multiple images; dealing with multi-band and hyper-spectral images; extracting spectral bands; importing data from databases using SQL queries; handling data sets that don't fit in memory; handling data validity; working with missing data (removal and imputation).
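The two missing-data strategies named at the end of this list — removal and imputation — can be sketched in a few lines (illustrative only, not perClass code; plain NumPy is assumed as a stand-in):

```python
# Conceptual sketch: handling missing values by removal vs. mean imputation.
import numpy as np

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 6.0]])

# Strategy 1: remove any sample that has a missing feature
X_removed = X[~np.isnan(X).any(axis=1)]

# Strategy 2: fill missing entries with the per-feature mean
col_mean = np.nanmean(X, axis=0)              # mean computed ignoring NaNs
X_imputed = np.where(np.isnan(X), col_mean, X)
```

Removal keeps the data honest but shrinks the training set; imputation keeps every sample but injects an assumption (here, that a missing value equals the feature mean).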

Module 8: Deep Learning

What is Deep Learning? What problems does it solve better than other approaches? How to build a reliable Deep Learning classifier?

Tools: Building blocks of convolutional neural networks (CNNs). Strengths and weaknesses of deep learning. How to build reliable CNNs? How to integrate with other machine learning tools (ROC, cascading with other classifiers).

Module 9: Clustering, similarity representations, classifier fusion

How to define groups of similar observations? How to interpret clustering results? How to combine multiple classifiers? How to incorporate prior knowledge into custom similarity measures and learn from them?

Tools: Using clusters to quickly label data or build better classifiers in multi-modal problems; visualizing clustering solutions; leveraging clustering as a tool to understand the source of classification errors; deciding on the number of clusters; dissimilarity measures; k-means; mixture models; the EM algorithm; representing measurements by proximities; building classifiers in dissimilarity spaces; classifier fusion; crisp and trained combiners; robust combining systems based on unbiased estimation of second-stage soft outputs; cascading of classifiers (solving difficult problems with different features/models than simple ones).
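Two items from this list — k-means and deciding on the number of clusters — can be sketched together (illustrative only, not perClass code; scikit-learn is assumed as a stand-in):

```python
# Conceptual sketch: k-means clustering, with the number of clusters chosen
# by watching within-cluster scatter (inertia) flatten out — the "elbow".
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# Three well-separated groups of observations
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in (0, 3, 6)])

inertias = {}
for k in (1, 2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_  # sum of squared distances to cluster centers

# Inertia drops sharply up to the true number of groups, then flattens
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

Since k-means has no ground truth to score against, inspecting how inertia decreases with k is one simple, unsupervised way to pick the cluster count before using the labels downstream.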

Module 10: System design

How to build robust systems? Why may optimizing a single component (the classifier) not yield good system performance? The system design work-flow.

Tools: Role of meta-data; how to set up robust and realistic system evaluation; custom algorithms; automatic selection of operating points; local and object-level classification; cross-validation over objects.

Module 11: Classifier deployment, embedding classifiers in production

How to move from a research prototype to a production machine? Is my classifier fast enough? How to speed up classifier execution? How to test research ideas in real time directly on a production machine?

Tools: Execution complexity of classifiers; how to measure speed; performance vs. speed characteristics; classifier speedup strategies; cascading for faster execution; practical real-time embedding outside of Matlab with perClass Runtime; linking perClass Runtime to a custom application; API walkthrough; accessing decision names; using multiple pipelines; changing operating points in production.
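The cascading-for-speed idea above can be sketched generically (illustrative only, not perClass Runtime code; scikit-learn is assumed as a stand-in, and the 0.9 confidence threshold is an arbitrary assumed operating point):

```python
# Conceptual sketch: a cheap first-stage classifier decides confident
# samples; only low-confidence samples reach the slower second stage.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

fast = LogisticRegression().fit(X, y)     # cheap stage
slow = KNeighborsClassifier(5).fit(X, y)  # expensive stage

proba = fast.predict_proba(X).max(axis=1)
confident = proba > 0.9                   # assumed confidence threshold
decisions = fast.predict(X)
# Only ambiguous samples pay for the slow classifier
decisions[~confident] = slow.predict(X[~confident])
fraction_fast = confident.mean()
```

The average execution cost drops toward that of the cheap stage whenever most samples fall in its confident region, which is the motivation for cascading in time-critical deployment.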