ECE 8443 – Pattern Recognition / ECE 8527 – Introduction to Machine Learning and Pattern Recognition
LECTURE 10: PRINCIPAL COMPONENTS ANALYSIS

Objectives:
• Mean-Square Error Objective Function
• Projection Operator
• Whitening Transformation
• Class-Independent PCA
• Class-Dependent PCA
• Examples

Resources:
• D.H.S.: Chapter 3 (Part 2)
• JL: Layman's Explanation
• QT: PCA Introduction
• JP: Speech Processing
• KF: Fukunaga, Pattern Recognition

ECE 8527: Lecture 10, Slide 1
Component Analysis

Component analysis is a technique that combines features to reduce the dimension of the feature space. It can also serve as the basis for a classification algorithm. Features of this approach include:
• Statistical decorrelation of the data.
• Linear combinations are simple to compute and tractable.
• Projection of a high-dimensional space onto a lower-dimensional space.
• Insight into the relative importance of each feature.

Three classical approaches for finding the optimal transformation:
• Principal Components Analysis (PCA): the projection that best represents the data in a least-squares sense.
• Multiple Discriminant Analysis (MDA): the projection that best separates the data in a least-squares sense.
• Independent Component Analysis (ICA): the projection that minimizes the mutual information of the components.

ECE 8527: Lecture 10, Slide 2
Principal Component Analysis

Consider representing a set of n d-dimensional samples x_1, ..., x_n by a single vector, x_0. Define a squared-error criterion:

  J_0(x_0) = Σ_{k=1}^{n} ||x_0 - x_k||^2

The solution to this problem is the sample mean:

  x_0 = m = (1/n) Σ_{k=1}^{n} x_k

The sample mean is a zero-dimensional representation of the data set.

Consider a one-dimensional solution of the form x = m + a·e, where e is a unit vector. Minimizing

  J_1(a_1, ..., a_n, e) = Σ_{k=1}^{n} ||(m + a_k e) - x_k||^2

gives a_k = e^t (x_k - m), with e the eigenvector of the scatter matrix S = Σ_{k=1}^{n} (x_k - m)(x_k - m)^t having the largest eigenvalue.

In other words, the best one-dimensional projection of the data (in the least mean-squared error sense) is the projection of the data onto a line through the sample mean in the direction of the eigenvector of the scatter matrix having the largest eigenvalue (hence the name Principal Component).

For the Gaussian case, the eigenvectors are the principal axes of the hyperellipsoidally shaped support region!
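Below is a minimal NumPy sketch (not part of the original slides) of the one-dimensional result above: the best least-squares line passes through the sample mean in the direction of the top eigenvector of the scatter matrix. All names (X, S, e1, X_1d) and the synthetic data are illustrative.

```python
import numpy as np

# Synthetic 2-D data (illustrative only).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[2.0, 1.0], cov=[[3.0, 1.2], [1.2, 0.8]], size=500)

m = X.mean(axis=0)                 # sample mean: the zero-dimensional representation
Xc = X - m                         # centered samples x_k - m
S = Xc.T @ Xc                      # scatter matrix S = sum_k (x_k - m)(x_k - m)^t

evals, evecs = np.linalg.eigh(S)   # symmetric eigendecomposition, eigenvalues ascending
e1 = evecs[:, -1]                  # principal component: eigenvector with largest eigenvalue

a = Xc @ e1                        # coefficients a_k = e1^t (x_k - m)
X_1d = m + np.outer(a, e1)         # best one-dimensional (least-squares) reconstruction
```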

ECE 8527: Lecture 10, Slide 3
Alternate View: Coordinate Transformations

Why is it convenient to convert an arbitrary distribution into a spherical one? (Hint: Euclidean distance.)

Consider the whitening transformation A_w = Φ Λ^(-1/2), where Φ is the matrix whose columns are the orthonormal eigenvectors of Σ and Λ is the diagonal matrix of the corresponding eigenvalues (Σ = Φ Λ Φ^t). Note that Φ is orthogonal (Φ Φ^t = Φ^t Φ = I).

What is the covariance of y = A_w^t x (for zero-mean x, so that E[x x^t] = Σ)?

  E[y y^t] = E[(A_w^t x)(A_w^t x)^t] = A_w^t E[x x^t] A_w = Λ^(-1/2) Φ^t Σ Φ Λ^(-1/2)
           = Λ^(-1/2) Φ^t (Φ Λ Φ^t) Φ Λ^(-1/2) = Λ^(-1/2) Λ Λ^(-1/2) = I

Why is this significant?
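As a quick numerical check of the derivation above, here is a small NumPy sketch (assumed, not from the slides) that builds A_w = Φ Λ^(-1/2) from a covariance matrix and verifies that y = A_w^t x has approximately identity covariance. Names (Sigma, Phi, A_w, Y) are illustrative.

```python
import numpy as np

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])                    # an example covariance matrix

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=100_000)  # zero-mean samples

evals, Phi = np.linalg.eigh(Sigma)                # Sigma = Phi @ diag(evals) @ Phi.T
A_w = Phi @ np.diag(evals ** -0.5)                # whitening transform A_w = Phi Lambda^(-1/2)

Y = X @ A_w                                       # each row of Y is y = A_w^t x
print(np.cov(Y, rowvar=False))                    # approximately the identity matrix
```

The significance: in the whitened space, equal-probability contours are spherical, so distances are ordinary Euclidean distances rather than covariance-weighted (Mahalanobis) distances.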

ECE 8527: Lecture 10, Slide 4
Examples of Maximum Likelihood Classification

Recall that for the Gaussian case, the eigenvectors are the principal axes of the hyperellipsoidally shaped support region. Let's work some examples (class-independent and class-dependent PCA).

In class-independent PCA, a single global covariance matrix is computed from all of the data:
• The equal-probability contours of this covariance form an ellipsoid.
• The first principal component is the major axis of this ellipsoid.
• The second principal component is the minor axis.

In class-dependent PCA, which requires labeled training data, a separate transformation is computed independently for each class.

The ML decision surface is a line (or, in this case, a parabola).
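The sketch below (assumed, not the original worked example) contrasts the two cases on synthetic labeled data: class-independent PCA computes one global covariance and one set of axes for the pooled data, while class-dependent PCA computes a separate set of axes per class. All names (pca_axes, X1, X2) are illustrative.

```python
import numpy as np

def pca_axes(X):
    """Return eigenvalues (descending) and eigenvectors of the sample covariance of X."""
    C = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.9], [0.9, 0.6]], size=400)    # class 1
X2 = rng.multivariate_normal([4.0, 1.0], [[0.5, -0.3], [-0.3, 1.5]], size=400)  # class 2

# Class-independent PCA: one global covariance over the pooled data.
_, global_axes = pca_axes(np.vstack([X1, X2]))

# Class-dependent PCA: a separate transformation for each labeled class.
_, axes_1 = pca_axes(X1)
_, axes_2 = pca_axes(X2)

print("global major axis :", global_axes[:, 0])
print("class 1 major axis:", axes_1[:, 0])
print("class 2 major axis:", axes_2[:, 0])
```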

ECE 8527: Lecture 10, Slide 5
Summary

• Introduced PCA as a whitening transformation.
• Demonstrated class-dependent vs. class-independent PCA.
• The eigenvectors of the covariance matrix provide insight into your features, since each eigenvector represents a weighted sum of the features.
• The eigenvalues indicate the proportion of the variance explained by each eigenvector (the ratio of an eigenvalue to the sum of the eigenvalues).
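For completeness, a tiny sketch (assumed) of the "proportion of variance explained" noted above, where evals holds illustrative eigenvalues of a covariance matrix:

```python
import numpy as np

evals = np.array([4.2, 1.1, 0.4, 0.3])   # illustrative eigenvalues of a covariance matrix
print(evals / evals.sum())                # first component explains 4.2 / 6.0 = 70% of the variance
```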