Discriminant Analysis

LECTURE 11: Discriminant Analysis
Objectives: Principal Components Analysis, Fisher Linear Discriminant Analysis, Multiple Discriminant Analysis, Examples
Resources: Java PR Applet, W.P.: Fisher, DTREG: LDA, S.S.: DFA

Discriminant Analysis
Discriminant analysis seeks directions that are efficient for discrimination. Consider the problem of projecting data from d dimensions onto a line, with the hope that we can choose the orientation of the line to minimize the classification error. Consider a set of n d-dimensional samples x_1, ..., x_n, with n_1 samples in the subset D_1 labeled ω_1 and n_2 samples in the subset D_2 labeled ω_2. Define a linear combination of the components of x, y = w^T x, and a corresponding set of n projected samples y_1, ..., y_n divided into the subsets Y_1 and Y_2. Our challenge is to find the w that maximizes the separation of the projected classes. This can be done by considering the ratio of the between-class scatter to the within-class scatter.
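A minimal NumPy sketch of the projection step; the two synthetic Gaussian classes and the particular direction w are invented for illustration, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic 2-D classes (d = 2), n_1 = 100 and n_2 = 120 samples.
D1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
D2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(120, 2))

# An arbitrary unit direction w; LDA will later choose it optimally.
w = np.array([1.0, 1.0])
w /= np.linalg.norm(w)

# y = w^T x maps each d-dimensional sample to a scalar.
Y1 = D1 @ w
Y2 = D2 @ w
```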

Separation of the Means and Scatter
Define a sample mean for class i: m_i = (1/n_i) Σ_{x in D_i} x. The sample mean of the projected points is then m̃_i = (1/n_i) Σ_{y in Y_i} y = w^T m_i, i.e., the sample mean of the projected points is just the projection of the mean (which is expected, since the projection is a linear transformation). It follows that the distance between the projected means is |m̃_1 - m̃_2| = |w^T (m_1 - m_2)|. Define a scatter for the projected samples of class i: s̃_i^2 = Σ_{y in Y_i} (y - m̃_i)^2. An estimate of the variance of the pooled projected data is then (1/n)(s̃_1^2 + s̃_2^2), and s̃_1^2 + s̃_2^2 is called the total within-class scatter of the projected samples.
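These quantities are straightforward to compute; a small sketch (the helper name is mine, and D1, D2, and w are arrays as in the previous sketch):

```python
import numpy as np

def projected_stats(D1, D2, w):
    """Class means, projected means, and projected scatters for two classes."""
    m1, m2 = D1.mean(axis=0), D2.mean(axis=0)   # class means in d dimensions
    mt1, mt2 = w @ m1, w @ m2                   # projected means: w^T m_i
    Y1, Y2 = D1 @ w, D2 @ w                     # projected samples
    s1 = np.sum((Y1 - mt1) ** 2)                # scatter of class-1 projections
    s2 = np.sum((Y2 - mt2) ** 2)                # scatter of class-2 projections
    return m1, m2, mt1, mt2, s1, s2
```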

Fisher Linear Discriminant and Scatter
The Fisher linear discriminant maximizes the criterion: J(w) = |m̃_1 - m̃_2|^2 / (s̃_1^2 + s̃_2^2). Define a scatter matrix for class i, S_i: S_i = Σ_{x in D_i} (x - m_i)(x - m_i)^T. The within-class scatter matrix, S_W, is: S_W = S_1 + S_2. We can write the scatter for the projected samples as: s̃_i^2 = Σ_{x in D_i} (w^T x - w^T m_i)^2 = w^T S_i w. Therefore, the sum of the scatters can be written as: s̃_1^2 + s̃_2^2 = w^T S_W w.
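A sketch of the scatter matrices in NumPy (function names are mine); the commented check verifies the identity s̃_1^2 + s̃_2^2 = w^T S_W w:

```python
import numpy as np

def scatter_matrix(D):
    """S_i = sum over x in D_i of (x - m_i)(x - m_i)^T."""
    X = D - D.mean(axis=0)
    return X.T @ X

def within_class_scatter(D1, D2):
    """S_W = S_1 + S_2."""
    return scatter_matrix(D1) + scatter_matrix(D2)

# Check against the projected scatters from the previous sketch:
#   S_W = within_class_scatter(D1, D2)
#   np.isclose(s1 + s2, w @ S_W @ w)   # should be True
```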

Separation of the Projected Means
The separation of the projected means obeys: (m̃_1 - m̃_2)^2 = (w^T m_1 - w^T m_2)^2 = w^T S_B w, where the between-class scatter, S_B, is given by: S_B = (m_1 - m_2)(m_1 - m_2)^T. S_W is the within-class scatter and is proportional to the covariance of the pooled data. S_B, the between-class scatter, is symmetric and positive semidefinite, but because it is the outer product of two vectors, its rank is at most one. This implies that for any w, S_B w is in the direction of m_1 - m_2. The criterion function, J(w), can be written as: J(w) = (w^T S_B w) / (w^T S_W w).
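A sketch of S_B and of the criterion J(w), again with hypothetical helper names:

```python
import numpy as np

def between_class_scatter(D1, D2):
    """S_B = (m_1 - m_2)(m_1 - m_2)^T, a matrix of rank at most one."""
    d = D1.mean(axis=0) - D2.mean(axis=0)
    return np.outer(d, d)

def fisher_criterion(w, S_B, S_W):
    """Generalized Rayleigh quotient J(w) = (w^T S_B w) / (w^T S_W w)."""
    return (w @ S_B @ w) / (w @ S_W @ w)
```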

Linear Discriminant Analysis
This ratio is well known as the generalized Rayleigh quotient and has the well-known property that the vector w that maximizes J(·) must satisfy: S_B w = λ S_W w. Because S_B w is always in the direction of m_1 - m_2, the solution is: w = S_W^{-1} (m_1 - m_2). This is Fisher's linear discriminant, also known as the canonical variate. This solution maps the d-dimensional problem to a one-dimensional problem (in this case). From Chapter 2, when the conditional densities p(x|ω_i) are multivariate normal with equal covariances, the optimal decision boundary is given by: w^T x + w_0 = 0, where w = Σ^{-1}(μ_1 - μ_2) and w_0 is a constant that depends on the prior probabilities. The computational complexity is dominated by the calculation of the within-class scatter and its inverse, an O(d^2 n) calculation. But this is done offline! Let's work some examples (class-independent PCA and LDA).
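Putting the pieces together, a minimal two-class Fisher LDA in NumPy; the midpoint threshold w_0 is one simple choice and implicitly assumes equal priors (the function names are mine):

```python
import numpy as np

def fisher_lda(D1, D2):
    """Fisher's linear discriminant: w = S_W^{-1} (m_1 - m_2)."""
    m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
    S_W = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
    # Solve S_W w = (m_1 - m_2) instead of forming the explicit inverse.
    w = np.linalg.solve(S_W, m1 - m2)
    # Threshold at the midpoint of the projected means (equal-prior choice).
    w0 = -w @ (m1 + m2) / 2.0
    return w, w0

def classify(x, w, w0):
    """Assign class 1 if w^T x + w_0 > 0, else class 2."""
    return 1 if w @ x + w0 > 0 else 2
```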

Multiple Discriminant Analysis
For the c-class problem in a d-dimensional space, the natural generalization involves c-1 discriminant functions. The within-class scatter is defined as: S_W = Σ_{i=1}^{c} S_i, where S_i = Σ_{x in D_i} (x - m_i)(x - m_i)^T. Define a total mean vector, m: m = (1/n) Σ_x x = (1/n) Σ_{i=1}^{c} n_i m_i, and a total scatter matrix, S_T, by: S_T = Σ_x (x - m)(x - m)^T. The total scatter is related to the within-class scatter (derivation omitted): S_T = S_W + S_B, where S_B = Σ_{i=1}^{c} n_i (m_i - m)(m_i - m)^T is the between-class scatter. We have c-1 discriminant functions of the form: y_i = w_i^T x, i = 1, ..., c-1, or in matrix form y = W^T x, where the w_i are the columns of the d x (c-1) matrix W.
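A sketch of the multi-class scatter matrices; `classes` is assumed to be a list of per-class sample arrays (an invented interface for illustration):

```python
import numpy as np

def mda_scatters(classes):
    """Within-class, between-class, and total scatter for a list of class arrays."""
    X = np.vstack(classes)
    m = X.mean(axis=0)                            # total mean vector
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for D in classes:
        mi = D.mean(axis=0)
        S_W += (D - mi).T @ (D - mi)              # add per-class scatter S_i
        S_B += len(D) * np.outer(mi - m, mi - m)  # n_i (m_i - m)(m_i - m)^T
    S_T = (X - m).T @ (X - m)                     # total scatter; S_T = S_W + S_B
    return S_W, S_B, S_T
```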

Multiple Discriminant Analysis (Cont.)
The criterion function is: J(W) = |W^T S_B W| / |W^T S_W W|. The solution to maximizing J(W) is once again found via an eigenvalue decomposition: the columns of the optimal W are the generalized eigenvectors corresponding to the largest eigenvalues in S_B w_i = λ_i S_W w_i. Because S_B is the sum of c matrices of rank one or less, and because only c-1 of these are independent, S_B is of rank c-1 or less. Excellent presentations on applications of LDA can be found in the resources listed at the start of the lecture. Example: "PCA Fails!", a case where the direction of maximum variance does not separate the classes, while the LDA projection does.
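A sketch of the eigenvalue step using SciPy's generalized symmetric eigensolver; this assumes S_W is nonsingular (in practice a small ridge can be added when it is not):

```python
import numpy as np
from scipy.linalg import eigh

def mda_projection(S_W, S_B, n_components):
    """Columns of W solve S_B w = lambda S_W w, ordered by decreasing eigenvalue.
    At most c - 1 components carry discriminant information."""
    eigvals, eigvecs = eigh(S_B, S_W)   # generalized eigenproblem A v = lambda B v
    order = np.argsort(eigvals)[::-1]   # eigh returns eigenvalues in ascending order
    return eigvecs[:, order[:n_components]]
```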

Summary
Principal Component Analysis (PCA): represents the data in the directions of greatest variance, which minimizes the squared reconstruction error. PCA is commonly applied in a class-dependent manner, where a whitening transformation is computed for each class and the distance from the mean of that class is measured using this class-specific transformation. PCA gives insight into the important dimensions of your problem through the directions of the eigenvectors. Linear Discriminant Analysis (LDA): attempts to project the data onto a line that represents the direction of maximum discrimination. LDA can be generalized to a multi-class problem through the use of multiple discriminant functions (c classes require c-1 discriminant functions).
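To make the contrast concrete, a small synthetic illustration of class-independent PCA versus LDA (the data, seed, and variable names are invented): the leading PCA eigenvector follows the large within-class variance, while the LDA direction follows the class separation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two elongated classes whose largest-variance direction does NOT separate them.
cov = np.array([[4.0, 0.0], [0.0, 0.25]])
D1 = rng.multivariate_normal([0.0, -1.0], cov, size=200)
D2 = rng.multivariate_normal([0.0,  1.0], cov, size=200)
X = np.vstack([D1, D2])

# PCA direction: leading eigenvector of the pooled covariance (max variance).
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
w_pca = evecs[:, -1]                       # roughly along [1, 0]

# LDA direction: S_W^{-1} (m_1 - m_2) (max discrimination).
m1, m2 = D1.mean(axis=0), D2.mean(axis=0)
S_W = (D1 - m1).T @ (D1 - m1) + (D2 - m2).T @ (D2 - m2)
w_lda = np.linalg.solve(S_W, m1 - m2)      # roughly along [0, 1]
```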