Principal Component Analysis and Linear Discriminant Analysis

Principal Component Analysis and Linear Discriminant Analysis
Chaur-Chin Chen
Institute of Information Systems and Applications
National Tsing Hua University
Hsinchu 30013, Taiwan
E-mail: cchen@cs.nthu.edu.tw

Outline
◇ Motivation for PCA
◇ Problem Statement for PCA
◇ The Solution and Practical Computations
◇ Examples and Undesired Results
◇ Fundamentals of LDA
◇ Discriminant Analysis
◇ Practical Computations
◇ Examples and Comparison with PCA

Motivation
Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are multivariate statistical techniques that are often useful for reducing the dimensionality of a collection of unstructured random variables for analysis and interpretation.

Problem Statement
• Let X be an m-dimensional random vector with covariance matrix C. The problem is to consecutively find the unit vectors a1, a2, . . . , am such that Yi = Xt ai satisfies
1. var(Y1) is the maximum.
2. var(Y2) is the maximum subject to cov(Y2, Y1) = 0.
3. var(Yk) is the maximum subject to cov(Yk, Yi) = 0, where k = 3, 4, . . . , m and k > i.
• Yi is called the i-th principal component.
• Feature extraction by PCA is called principal component projection (PCP).

The Solutions
Let (λi, ui) be the pairs of eigenvalues and eigenvectors of the covariance matrix C such that λ1 ≥ λ2 ≥ . . . ≥ λm (≥ 0) and ∥ui∥ = 1 for all 1 ≤ i ≤ m. Then ai = ui and var(Yi) = λi for 1 ≤ i ≤ m.

Computations
Given n observations x1, x2, . . . , xn of m-dimensional column vectors:
1. Compute the mean vector μ = (x1 + x2 + . . . + xn)/n.
2. Compute the covariance matrix by MLE: C = (1/n) Σi=1..n (xi − μ)(xi − μ)t.
3. Compute the eigenvalue/eigenvector pairs (λi, ui) of C with λ1 ≥ λ2 ≥ . . . ≥ λm (≥ 0).
4. Compute the first d principal components yi(j) = xit uj for each observation xi, 1 ≤ i ≤ n, along the direction uj, j = 1, 2, . . . , d.
5. Choose d so that the retained variance ratio (λ1 + λ2 + . . . + λd)/(λ1 + λ2 + . . . + λd + . . . + λm) > 85%.
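A minimal MATLAB sketch of these five steps is given below. The function and variable names are placeholders (not from the slides); the PCA.m function listed at the end of the talk is the version actually used for the figures.

% pca_steps.m -- sketch of the five computation steps above
function [Y, d] = pca_steps(X)
[n, m] = size(X);
mu = mean(X, 1);                        % step 1: mean vector (1-by-m)
Xc = X - repmat(mu, n, 1);              % center the observations
C = (Xc' * Xc) / n;                     % step 2: covariance by MLE (divide by n)
[U, D] = eig(C);                        % step 3: eigenvalues/eigenvectors of C
[lambda, idx] = sort(diag(D), 'descend');
U = U(:, idx);                          % eigenvectors ordered by eigenvalue
ratio = cumsum(lambda) / sum(lambda);   % step 5: retained variance ratio
d = find(ratio > 0.85, 1);              % smallest d with ratio above 85%
Y = Xc * U(:, 1:d);                     % step 4: first d principal components
                                        % (projecting centered data only shifts the scores)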

An Example for Computations
x1 = [3.03, 2.82]t   x2 = [0.88, 3.49]t   x3 = [3.26, 2.46]t   x4 = [2.66, 2.79]t   x5 = [1.93, 2.62]t
x6 = [4.80, 2.18]t   x7 = [4.78, 0.97]t   x8 = [4.69, 2.94]t   x9 = [5.02, 2.30]t   x10 = [5.05, 2.13]t
μ = [3.61, 2.37]t
C = [ 1.9650  -0.4912
     -0.4912   0.4247 ]
λ1 = 2.1083,  λ2 = 0.2814
u1 = [0.9600, -0.2801]t,  u2 = [0.2801, 0.9600]t
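As a quick check (not part of the original slides), the eigenvalue/eigenvector pairs above can be reproduced directly from the stated covariance matrix; note that eig may return the eigenvectors with the opposite sign.

C = [1.9650 -0.4912; -0.4912 0.4247];
[U, D] = eig(C);                          % eig returns eigenvalues in ascending order
[lambda, idx] = sort(diag(D), 'descend'); % lambda is approximately [2.1083; 0.2814]
U = U(:, idx);                            % columns approximate u1 and u2 (up to sign)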

Results of Principal Projection

Examples
1. The 8OX data set is derived from Munson's hand-printed Fortran character set. It includes 15 patterns from each of the characters '8', 'O', 'X'; each pattern consists of 8 feature measurements.
   Sample patterns: 8: [11, 3, 2, 3, 10, 3, 2, 4],  O: [4, 5, 2, 3, 4, 6, 3, 6]
2. The IMOX data set contains 8 feature measurements on each of the characters 'I', 'M', 'O', 'X'. It contains 192 patterns, 48 per character, and is also derived from Munson's database.

First and Second PCP for data8OX

Third and Fourth PCP for data8OX

First and Second PCP for dataIMOX

Description of datairis
□ The datairis.txt data set contains measurements of three species of iris flowers (setosa, versicolor, virginica).
□ It consists of 50 patterns from each species on each of 4 features (sepal length, sepal width, petal length, petal width).
□ This data set is frequently used as an example for clustering and classification.
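As an illustration (not from the original slides), the first two principal components of the iris data can be plotted with the PCA.m function listed at the end of the talk, here using MATLAB's built-in fisheriris data as a stand-in for datairis.txt (requires the Statistics Toolbox).

load fisheriris                       % meas: 150-by-4 measurements, species: labels
Y = PCA(meas, 2);                     % first two principal components (PCA.m below)
gscatter(Y(:,1), Y(:,2), species)     % scatter plot of the projection, colored by species
title('First and Second PCP for iris data')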

First and Second PCP for datairis

Example Where PCP Is Not Working (two cases shown: one where PCP works as expected and one where it does not)

Fundamentals of LDA
Given n training patterns x1, x2, . . . , xn of m-dimensional column vectors from K categories, where n1 + n2 + . . . + nK = n, the between-class scatter matrix B, the within-class scatter matrix W, and the total scatter matrix T are defined below.
1. The sample mean vector u = (x1 + x2 + . . . + xn)/n.
2. The mean vector of category i is denoted ui.
3. Between-class scatter matrix: B = Σi=1..K ni (ui − u)(ui − u)t.
4. Within-class scatter matrix: W = Σi=1..K Σx∈ωi (x − ui)(x − ui)t.
5. Total scatter matrix: T = Σi=1..n (xi − u)(xi − u)t.
Then T = B + W.
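A minimal MATLAB sketch of these scatter matrices follows. It assumes X is an n-by-m matrix with one pattern per row and y is an n-by-1 vector of class labels 1..K; both names are placeholders, not from the slides.

% scatter_matrices.m -- sketch of B, W, T as defined above
function [B, W, T] = scatter_matrices(X, y)
[n, m] = size(X);
u = mean(X, 1)';                    % sample mean vector u (m-by-1)
B = zeros(m, m);
W = zeros(m, m);
for i = 1:max(y)                    % loop over the K categories
    Xi = X(y == i, :);              % patterns belonging to category i
    ni = size(Xi, 1);
    ui = mean(Xi, 1)';              % category mean vector ui
    B = B + ni * (ui - u) * (ui - u)';    % between-class scatter
    Xc = Xi - repmat(ui', ni, 1);
    W = W + Xc' * Xc;                     % within-class scatter
end
T = B + W;                          % total scatter, T = B + W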

Fisher's Discriminant Ratio
Linear discriminant analysis for a dichotomous problem attempts to find an optimal projection direction w that maximizes Fisher's discriminant ratio
J(w) = (wt B w) / (wt W w).
The optimization problem reduces to solving the generalized eigenvalue/eigenvector problem B w = λ W w; for two classes B = (n1 n2 / n)(u1 − u2)(u1 − u2)t, so the optimal direction is w ∝ W⁻¹(u1 − u2).
Similarly, for multiclass (more than 2 classes) problems, the objective is to find the first few directions for discriminating points in different categories, which is likewise based on optimizing this ratio, i.e., solving B w = λ W w for the eigenvectors associated with the few largest eigenvalues.
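Continuing the sketch above, the discriminant directions can be obtained from the generalized eigenproblem; this is a sketch only, and eig(B, W) assumes the within-class scatter W is nonsingular.

K = max(y);                          % number of categories
[B, W] = scatter_matrices(X, y);     % from the previous sketch
[V, D] = eig(B, W);                  % generalized eigenproblem  B*v = lambda*W*v
[sorted, idx] = sort(diag(D), 'descend');
A = V(:, idx(1:K-1));                % at most K-1 useful discriminant directions
Z = X * A;                           % LDA projection of the patterns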

Fundamentals of LDA

LDA and PCA on data8OX (two panels: LDA projection and PCA projection)

LDA and PCA on dataimox (two panels: LDA projection and PCA projection)

LDA and PCA on datairis (two panels: LDA projection and PCA projection)

Projection of First 3 Principal Components for data8OX

pca8OX.m
fin = fopen('data8OX.txt','r');
d = 8+1; N = 45;                     % 8 features plus class label, N patterns
fgetl(fin); fgetl(fin); fgetl(fin);  % skip 3 header lines
A = fscanf(fin,'%f',[d N]); A = A';  % read data, one pattern per row
X = A(:,1:d-1);                      % remove the last column (class label)
k = 3; Y = PCA(X,k);                 % first k principal components (PCA.m below)
X1 = Y(1:15,1);  Y1 = Y(1:15,2);  Z1 = Y(1:15,3);   % patterns of '8'
X2 = Y(16:30,1); Y2 = Y(16:30,2); Z2 = Y(16:30,3);  % patterns of 'O'
X3 = Y(31:45,1); Y3 = Y(31:45,2); Z3 = Y(31:45,3);  % patterns of 'X'
plot3(X1,Y1,Z1,'d',X2,Y2,Z2,'O',X3,Y3,Z3,'X','markersize',12); grid
axis([4 24 -2 18 -10 25]); legend('8','O','X')
title('First Three Principal Component Projection for 8OX Data')

PCA.m
% Function file: PCA.m
% Find the first K principal components of data X
% X contains n pattern vectors with d features (one per row)
function Y = PCA(X,K)
[n,d] = size(X);
C = cov(X);                          % sample covariance matrix
[U,D] = eig(C);                      % eigen decomposition of C
L = diag(D);
[sorted,index] = sort(L,'descend');  % sort eigenvalues in descending order
Xproj = zeros(d,K);                  % initialize the projection matrix
for j = 1:K
    Xproj(:,j) = U(:,index(j));      % eigenvector of the j-th largest eigenvalue
end
Y = X*Xproj;                         % first K principal components