Intro. ANN & Fuzzy Systems, Lecture 16. Classification (II): Practical Considerations. (C) by Yu Hen Hu.



Outline
Classifier design issues
– Which classifier to use?
– What order of a particular classifier to use?
– Which set of parameters to use?
Features
– Feature transformation
– Feature dimension reduction: removing irrelevant features, removing redundant features
Output labels
– Output encoding
– Two-class vs. multi-class pattern classification

Classifier Design Issues
Which classifier to use?
– Performance: cross-validation may be used to facilitate performance comparison; each candidate classifier should be fully developed before the comparison is made.
– Cost: CPU time for developing and executing the algorithm; memory and storage requirements may prevent some classifiers from being used.

Classifier Design Issues (continued)
Order selection
– Appears in almost all classifiers: the number of neighbors in kNN, the number of Gaussian mixtures per class in an ML classifier, the number of hidden layers and hidden neurons in an MLP.
– Cross-validation can be used to offer hints for selecting the order of a classifier.
Which set of parameters to use?
– When classifiers are trained adaptively (such as an MLP), different sets of parameters may result from multiple training runs.
– Again, cross-validation may be used to aid the selection of which set of parameters to use.
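For example, the order of a kNN classifier (the number of neighbors k) can be chosen by comparing cross-validated accuracy over candidate values. The sketch below is only an illustration of this idea, not part of the original lecture; it assumes scikit-learn is available and uses the iris toy data set.

```python
# Order selection via cross-validation: choose k for a kNN classifier.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

candidate_k = [1, 3, 5, 7, 9, 11]
cv_scores = []
for k in candidate_k:
    clf = KNeighborsClassifier(n_neighbors=k)
    # mean 5-fold cross-validation accuracy for this model order
    cv_scores.append(cross_val_score(clf, X, y, cv=5).mean())

best_k = candidate_k[int(np.argmax(cv_scores))]
print("selected order (number of neighbors):", best_k)
```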

Feature Representation
Proper feature representation is essential; bad features will confuse the classifier.
– Feature transformation: expose important features hidden among unimportant ones.
– Feature selection: select among a set of given features.
A feature is a particular dimension of the feature vector. E.g., for the feature vector x = [x1, x2, x3], the components xi, i = 1, 2, 3, are the features.
(Figure: a typical pattern classification system.)

Symbolic Feature Encoding
Many classifiers accept only numerical input values, so features represented with symbols must be encoded in numerical form, e.g. {red, green, blue} or {G, A, C, T}.
1. Real-number encoding: map each feature symbol to a quantized number on the real line, e.g. red → 1, green → 0, etc.
2. 1-in-N encoding: e.g. red → [1 0 0], green → [0 1 0], blue → [0 0 1].
3. Fuzzy encoding: map each symbol to a vector of soft membership values over the categories.
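A minimal numpy sketch of the 1-in-N encoding, using the slide's {red, green, blue} alphabet (the helper name and example values are just for illustration):

```python
# 1-in-N (one-hot) encoding of a symbolic feature.
import numpy as np

symbols = ["red", "green", "blue"]             # known symbol alphabet
index = {s: i for i, s in enumerate(symbols)}  # red -> 0, green -> 1, blue -> 2

def encode_1_in_n(values):
    """Map a sequence of symbols to rows of a 1-in-N matrix."""
    out = np.zeros((len(values), len(symbols)))
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out

print(encode_1_in_n(["red", "blue", "green"]))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```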

Feature Transformation: Why?
To expose structures inherent in the data:
– makes a difficult classification problem easier to solve;
– requires in-depth understanding of the data and extensive trial-and-error.
To equalize the influence of different features:
– feature value ranges should often be normalized to have zero mean in each dimension and the same (or similar) range or standard deviation in each dimension.
Linear transformation = projection onto a set of basis vectors.
Nonlinear transformation: potentially more powerful, but with no easy rule to follow.
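A minimal sketch of the per-dimension normalization just described (zero mean, unit standard deviation); the toy data and scales are arbitrary:

```python
# Per-dimension normalization: zero mean and unit standard deviation per feature.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 0.1], size=(100, 3))  # toy feature matrix

mu = X.mean(axis=0)                 # mean of each feature dimension
sigma = X.std(axis=0)
sigma[sigma == 0] = 1.0             # guard against constant dimensions
X_norm = (X - mu) / sigma           # equalizes the influence of each feature

print(X_norm.mean(axis=0).round(6))   # ~0 in each dimension
print(X_norm.std(axis=0).round(6))    # ~1 in each dimension
```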

Feature Transformation: How?
Bias (shift of origin): x' = x – b.
Linear transformation (data independent): y = T x.
– Rotation: T is a rotation matrix, e.g. in two dimensions T = [cos θ  –sin θ; sin θ  cos θ].
– Scaling: y = a x.
– DFT: y(k) = Σ_{n=0}^{N–1} x(n) e^{–j2πnk/N}, 0 ≤ k ≤ N–1.
– Linear digital filtering: FIR, IIR.
– Others: discrete cosine transform, singular value decomposition, etc.
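These data-independent transforms can be applied directly with numpy. The sketch below (toy 2-D vector, arbitrary angle and bias) illustrates the shift of origin, a rotation, a scaling, and a DFT:

```python
# Data-independent transforms: shift of origin, rotation, scaling, DFT.
import numpy as np

x = np.array([3.0, 4.0])                  # a 2-D feature vector
b = np.array([1.0, -2.0])
x_shifted = x - b                         # bias / shift of origin: x' = x - b

theta = np.pi / 6                         # rotation angle (arbitrary)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y_rot = T @ x                             # y = T x (rotation)
y_scaled = 2.5 * x                        # y = a x (scaling)

signal = np.array([1.0, 2.0, 3.0, 4.0])   # treat a feature vector as a signal
Y = np.fft.fft(signal)                    # DFT coefficients y(k), 0 <= k <= N-1

print(x_shifted, y_rot, y_scaled, np.round(Y, 3))
```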

Feature Dimension Reduction
Irrelevant feature reduction
– An irrelevant feature (dimension) is one that is uncorrelated with the class label.
– E.g., the feature value in a dimension is constant.
– E.g., the feature value in a dimension is random.
Redundant feature reduction
– If the values of a feature are linearly dependent on the remaining features, that feature can be removed.
– Method 1: principal component analysis (eigenvalue/singular value decomposition).
– Method 2: subset selection.

Irrelevant Feature Reduction
If a feature value remains constant, or nearly constant, for all the training samples, that feature dimension can be removed.
Method:
– Calculate the mean and variance of each feature dimension; if the variance is less than a preset threshold, mark the feature dimension for removal.
If the distributions (histograms) of a feature's values for the different classes overlap each other, this feature "may be" subject to removal; however, higher-dimensional correlation may still exist!
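A minimal sketch of the variance test above (the threshold value and toy matrix are arbitrary):

```python
# Mark (nearly) constant feature dimensions for removal via a variance threshold.
import numpy as np

def variance_filter(X, threshold=1e-6):
    """Return indices of feature dimensions whose variance exceeds the threshold."""
    variances = X.var(axis=0)            # variance of each column (feature)
    return np.where(variances > threshold)[0]

X = np.array([[1.0, 7.0, 0.50],
              [2.0, 7.0, 0.50],
              [3.0, 7.0, 0.50]])         # second and third features are constant
keep = variance_filter(X)
X_reduced = X[:, keep]
print("kept feature dimensions:", keep)  # -> [0]
```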

Redundant Feature Reduction
Given a feature matrix X:
– each row = a feature (sample) vector;
– each column = a feature.
If one column, say x3, can be written as a linear combination of the other columns, then x3 is redundant and can be removed without affecting the result of classification.
Method:
1. Perform SVD on X to identify its rank r (r ≤ M, the number of columns).
2. Repeat M – r times: find i* such that, writing X = [x_i* X_r], the error of projecting x_i* onto the columns of X_r is minimized; then set X = X_r.
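A sketch of this greedy procedure under the stated method: estimate the rank from the singular values, then repeatedly drop the column that the remaining columns reconstruct with the smallest least-squares (projection) error. The toy data and tolerance are arbitrary.

```python
# Redundant feature (column) removal guided by SVD rank.
import numpy as np

def remove_redundant_columns(X, tol=1e-8):
    X = X.copy()
    s = np.linalg.svd(X, compute_uv=False)
    r = int(np.sum(s > tol * s[0]))            # numerical rank r (<= number of columns M)
    while X.shape[1] > r:
        errors = []
        for i in range(X.shape[1]):
            Xr = np.delete(X, i, axis=1)       # candidate remaining columns X_r
            coef, *_ = np.linalg.lstsq(Xr, X[:, i], rcond=None)
            errors.append(np.linalg.norm(Xr @ coef - X[:, i]))  # projection error
        i_star = int(np.argmin(errors))        # column best explained by the others
        X = np.delete(X, i_star, axis=1)
    return X

A = np.random.default_rng(1).normal(size=(20, 2))
X = np.column_stack([A, A[:, 0] + 2 * A[:, 1]])          # third column is redundant
print(X.shape, "->", remove_redundant_columns(X).shape)  # (20, 3) -> (20, 2)
```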

Subset Selection Feature Reduction
Given n feature dimensions, select a subset of p dimensions such that the classification performance using the selected p dimensions is the maximum over all possible choices of p dimensions.
In practice, repeatedly throw away the one feature dimension whose removal causes the least performance degradation.
Algorithm:
a. Assume there are n feature dimensions remaining.
b. For i = 1 to n: discard the i-th feature and use the remaining features to perform classification. Choose i* such that the remaining n – 1 feature dimensions yield the highest performance.
c. Decrement n and repeat step b until n = p.
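A sketch of this backward-elimination algorithm, using a kNN classifier and 5-fold cross-validated accuracy as the performance measure (scikit-learn and the iris toy data are assumptions for illustration; any classifier and data could be substituted):

```python
# Backward elimination: repeatedly discard the feature whose removal hurts least.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def backward_eliminate(X, y, p):
    """Reduce X to p feature dimensions by greedy backward elimination."""
    features = list(range(X.shape[1]))
    while len(features) > p:
        scores = []
        for i in features:
            remaining = [f for f in features if f != i]
            clf = KNeighborsClassifier(n_neighbors=3)
            acc = cross_val_score(clf, X[:, remaining], y, cv=5).mean()
            scores.append((acc, i))
        _, i_star = max(scores)        # removing i_star keeps accuracy highest
        features.remove(i_star)
    return features

X, y = load_iris(return_X_y=True)
print("selected feature dimensions:", backward_eliminate(X, y, p=2))
```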

Higher Dimension Features
(Figure in the original slides.)

Data Sampling
Samples are assumed to be drawn independently from the underlying population.
Use resampling, i.e. repeated train-and-test partitions, to estimate the error rate.
M-fold cross-validation: partition all available samples into M mutually exclusive sets. Each time, use one set as the testing set and the remaining sets as the training set. Repeat M times and take the average testing error as an estimate of the true error rate.
If the sample size is small (e.g. < 100), use leave-one-out cross-validation, where only one sample is used as the test set each time.
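A minimal numpy sketch of M-fold cross-validation as described above; the nearest-class-mean classifier and the Gaussian toy data are stand-ins, and setting M equal to the sample size gives leave-one-out cross-validation:

```python
# M-fold cross-validation: average test error over M mutually exclusive partitions.
import numpy as np

def m_fold_error(X, y, M=5, seed=0):
    N = len(y)
    idx = np.random.default_rng(seed).permutation(N)   # shuffle sample indices
    folds = np.array_split(idx, M)                     # M mutually exclusive sets
    errors = []
    for m in range(M):
        test = folds[m]
        train = np.concatenate([folds[j] for j in range(M) if j != m])
        # stand-in classifier: assign each test sample to the class with the nearest mean
        classes = np.unique(y[train])
        means = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        d = ((X[test][:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        y_hat = classes[np.argmin(d, axis=1)]
        errors.append(np.mean(y_hat != y[test]))
    return float(np.mean(errors))                      # estimate of the true error rate

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print("estimated error rate:", m_fold_error(X, y, M=5))  # M = N gives leave-one-out
```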