Lecture 16. Classification (II): Practical Considerations

Outline
- Classifier design issues
  - Which classifier to use?
  - What order of a particular classifier to use?
  - Which set of parameters to use?
- Features
  - Feature transformation
  - Feature dimension reduction: removing irrelevant features, removing redundant features
- Output labels
  - Output encoding
  - Two-class vs. multi-class pattern classification
(C) 2001 by Yu Hen Hu

Classifier Design Issues
Which classifier to use?
- Performance: cross-validation may be used to facilitate performance comparison; each classifier should be fully developed before the comparison.
- Cost: CPU time for developing and executing the algorithm; memory and storage requirements may prevent some classifiers from being used.
(C) 2001 by Yu Hen Hu

Classifier Design Issues
Order selection
- Appears in almost all classifiers: the number of neighbors in kNN, the number of Gaussian mixtures per class in an ML classifier, the number of hidden layers and hidden neurons in an MLP.
- Cross-validation can be used to offer hints for selecting the order of a classifier (see the sketch below).
Which set of parameters to use
- When classifiers are adaptively trained (such as an MLP), different sets of parameters may result from multiple training runs.
- Again, cross-validation may be used to aid the selection of which set of parameters to use.
(C) 2001 by Yu Hen Hu
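
As a concrete illustration (not part of the original slides), the following minimal sketch selects the order of a kNN classifier (the number of neighbors k) by cross-validation. It assumes scikit-learn is available and uses a synthetic data set as a stand-in for real training data.

    # Sketch: choosing the order (here, k in kNN) by cross-validation.
    # Assumes scikit-learn; X, y are a synthetic placeholder data set.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    scores = {}
    for k in [1, 3, 5, 7, 9]:
        clf = KNeighborsClassifier(n_neighbors=k)
        # 5-fold cross-validation accuracy for this candidate order
        scores[k] = cross_val_score(clf, X, y, cv=5).mean()

    best_k = max(scores, key=scores.get)
    print(scores, "-> selected k =", best_k)

The same loop structure applies to any other order parameter (number of mixtures, number of hidden neurons, etc.): evaluate each candidate by cross-validation and keep the best one.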

Feature Representation
[Figure: a typical pattern classification system]
- Proper feature representation is essential; bad features will confuse the classifier.
- Feature transformation: exposing important features hidden among unimportant ones.
- Feature selection: selecting among a set of given features.
- A feature = a particular dimension of the feature vector. E.g. for the feature vector x = [x1, x2, x3], the components xi, i = 1, 2, 3, are the features.
(C) 2001 by Yu Hen Hu

Symbolic Feature Encoding
- Many classifiers allow only numerical input values. Features that are represented with symbols must be encoded in numerical form, e.g. {red, green, blue}, {G, A, C, T}.
- Real-number encoding: map each feature symbol to a quantized number on the real line. E.g. red → -1, green → 0, and blue → +1.
- 1-in-N encoding: e.g. red → [1 0 0], green → [0 1 0], and blue → [0 0 1].
- Fuzzy encoding: e.g. red → [1 0.5 0 0 0], green → [0 0.5 1 0.5 0], and blue → [0 0 0 0.5 1].
(C) 2001 by Yu Hen Hu
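
A small sketch (added here for illustration) of the three encodings for the symbolic feature {red, green, blue}; the numeric values simply restate the examples given on the slide.

    # Sketch of real-number, 1-in-N, and fuzzy encoding of a symbolic feature.
    real_enc  = {"red": -1.0, "green": 0.0, "blue": +1.0}                 # real-number encoding
    one_hot   = {"red": [1, 0, 0], "green": [0, 1, 0], "blue": [0, 0, 1]}  # 1-in-N encoding
    fuzzy_enc = {"red":   [1, 0.5, 0, 0, 0],
                 "green": [0, 0.5, 1, 0.5, 0],
                 "blue":  [0, 0, 0, 0.5, 1]}                               # fuzzy encoding

    samples = ["red", "blue", "green"]
    numeric = [one_hot[s] for s in samples]   # [[1,0,0], [0,0,1], [0,1,0]]
    print(numeric)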

Feature Transformation: Why?
- To expose structure inherent in the data and make a difficult classification problem easier to solve. This requires in-depth understanding of the data and extensive trial-and-error.
- To equalize the influence of different features. Feature value ranges should often be normalized: to have zero mean in each dimension, and to have the same (or similar) range or standard deviation in each dimension (see the sketch below).
- Linear transformation = projection onto a basis.
- Nonlinear transformation: potentially more powerful, but no easy rule to follow.
(C) 2001 by Yu Hen Hu
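
A minimal sketch of the normalization described above (zero mean, unit standard deviation per dimension), assuming numpy; X is a placeholder matrix with rows = samples and columns = features.

    # Sketch: per-feature normalization to zero mean and unit standard deviation.
    import numpy as np

    X = np.array([[1.0, 200.0],
                  [2.0, 220.0],
                  [3.0, 180.0]])
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_norm = (X - mu) / sigma            # each column now has mean 0 and std 1
    print(X_norm.mean(axis=0), X_norm.std(axis=0))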

Feature Transformation: How?
- Bias, shift of origin: x' = x - b
- Linear transformation (data independent): y = T x
  - Rotation: y = R(θ) x, with R(θ) = [cos θ  -sin θ; sin θ  cos θ]
  - Scaling: y = a x
  - DFT: y[k] = Σ_{n=0}^{N-1} x[n] e^{-j2πkn/N}, 0 ≤ k ≤ N-1
  - Linear digital filtering: FIR, IIR
  - Others: discrete cosine transform, singular value decomposition, etc.
(C) 2001 by Yu Hen Hu
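
For concreteness, a sketch of the data-independent linear transforms listed above, assuming numpy; the vectors and constants are arbitrary illustrative values.

    # Sketch: shift of origin, rotation, scaling, and DFT of a feature vector.
    import numpy as np

    x = np.array([2.0, 1.0])

    b = np.array([1.0, 1.0])
    x_shift = x - b                                # bias / shift of origin

    theta = np.pi / 4
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    y_rot = R @ x                                  # rotation, y = T x with T = R(theta)

    a = 0.5
    y_scale = a * x                                # scaling, y = a x

    signal = np.array([1.0, 0.0, -1.0, 0.0])
    Y = np.fft.fft(signal)                         # DFT, k = 0 .. N-1
    print(x_shift, y_rot, y_scale, Y)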

Feature Dimension Reduction
Irrelevant feature reduction
- An irrelevant feature (dimension) is one that is uncorrelated with the class label. E.g. the feature value in a dimension is a constant, or is random.
Redundant feature reduction
- If the values of a feature are linearly dependent on the remaining features, then this feature can be removed.
- Method 1: principal component analysis (eigenvalue/singular value decomposition). Method 2: subset selection.
(C) 2001 by Yu Hen Hu

Irrelevant Feature Reduction
- If a feature value remains constant, or nearly constant, for all the training samples, then this feature dimension can be removed.
- Method: calculate the mean and variance of each feature dimension. If the variance is less than a preset threshold, the feature dimension is marked for removal (see the sketch below).
- If the distributions (histograms) of a feature's values for the different classes overlap each other, this feature "may be" subject to removal. However, higher-dimensional correlations may exist!
(C) 2001 by Yu Hen Hu
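
A minimal sketch of the variance-threshold method, assuming numpy; the matrix X and the threshold value are placeholders for illustration.

    # Sketch: mark near-constant feature dimensions (columns) for removal.
    import numpy as np

    X = np.array([[1.0, 5.0, 0.3],
                  [2.0, 5.0, 0.1],
                  [3.0, 5.0, 0.8],
                  [4.0, 5.0, 0.4]])

    threshold = 1e-3
    variances = X.var(axis=0)              # per-feature variance
    keep = variances >= threshold          # columns with enough spread
    X_reduced = X[:, keep]                 # the constant second column is removed
    print(variances, keep, X_reduced.shape)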

Redundant Feature Reduction
- Given a feature matrix X: each row = a feature (sample) vector, each column = a feature.
- If a column, say x3, is a linear combination of the other columns, then x3 is redundant and hence can be removed without affecting the result of classification.
- Method: perform SVD on X to identify its rank r (≤ M). Repeat M - r times: find i* such that X = [x_i* X_r] and the projection error is minimized; set X = X_r (see the sketch below).
(C) 2001 by Yu Hen Hu
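
A sketch of the rank-identification step, assuming numpy; the third column is constructed as x3 = x1 + x2 to mirror the slide's idea of a linearly dependent feature.

    # Sketch: detecting a redundant (linearly dependent) feature column with SVD.
    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=6)
    x2 = rng.normal(size=6)
    X = np.column_stack([x1, x2, x1 + x2])       # N = 6 samples, M = 3 features

    s = np.linalg.svd(X, compute_uv=False)       # singular values
    tol = 1e-10 * s[0]
    r = int(np.sum(s > tol))                     # numerical rank
    print("rank =", r, "of", X.shape[1], "features -> one column is redundant")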

Higher Dimension Features
(C) 2001 by Yu Hen Hu

Data Sampling
- Samples are assumed to be drawn independently from the underlying population.
- Use resampling, i.e. repeated train-and-test partitions, to estimate the error rate.
- M-fold cross-validation: partition all available samples into M mutually exclusive sets. Each time, use one set as the testing set and the remaining sets as the training set. Repeat M times and take the average testing error as an estimate of the true error rate (see the sketch below).
- If the sample size is < 100, use leave-one-out cross-validation, where only one sample is used as the test set each time.
(C) 2001 by Yu Hen Hu
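
A minimal sketch of M-fold cross-validation, assuming numpy; train_and_test() is a hypothetical callback standing in for whatever training and evaluation routine is used.

    # Sketch: estimate the error rate by M-fold cross-validation.
    import numpy as np

    def m_fold_error(X, y, M, train_and_test):
        # train_and_test(X_tr, y_tr, X_te, y_te) is assumed to return a test error rate.
        idx = np.random.permutation(len(X))        # shuffle sample indices
        folds = np.array_split(idx, M)             # M mutually exclusive sets
        errors = []
        for m in range(M):
            test = folds[m]
            train = np.hstack([folds[j] for j in range(M) if j != m])
            errors.append(train_and_test(X[train], y[train], X[test], y[test]))
        return np.mean(errors)                     # average test error estimate

    # Leave-one-out cross-validation is the special case M = len(X).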