X.1 Principal component analysis


Dimension reduction methods seek to distill information-rich, high-dimensional data down to a few key dimensions that carry the majority of the desired information. In PCA, the algorithm is guided by identifying the few directions that contain the majority of the overall variance within a data set, since these directions likely carry the greatest information content of the signal.

Note: PCA implicitly assumes that S/N > 1, so that the variations in the measured data due to changes in the signals are greater than the variations due to noise.

Limitations of PCA

Often, we wish to discriminate between classes using known databases of information (e.g., molecular identification based on compiled Raman or MS data, determination of the origin of complex samples, chemical fingerprinting, etc.). The outcome of the measurement is often: i) assignment of an unknown to a class, and ii) an assessment of the confidence of that assignment.

PCA is "unsupervised": the algorithm does not use class information in the dimension reduction. The entire data set, including samples of both known and unknown origin, is treated equally in the analysis. The directions of maximum variance in the data do not necessarily correspond to the directions of maximum class discrimination.

The 5 key steps in PCA

1. Arrange the data as a matrix D, with each "spectrum" as a column of D.
2. Zero-center the data row by row by subtracting the mean of each row, generating Dzc.
3. Calculate the variance/covariance matrix of the data, α = Dzc·Dzcᵀ.
4. Determine the eigenvectors and eigenvalues of α.
5. Select the key eigenvectors based on their eigenvalues, which describe the variance of the data projected along each eigenvector, and transform the data into the principal coordinate system.
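A minimal NumPy sketch of these five steps, using synthetic data and arbitrary array sizes that are not part of the original slides:

```python
import numpy as np

# 1. Arrange the data as a matrix D: each column is one measured "spectrum".
rng = np.random.default_rng(0)
D = rng.random((100, 20))                 # 100 wavelength channels, 20 spectra (synthetic)

# 2. Zero-center row by row: subtract each wavelength channel's mean across spectra.
D_zc = D - D.mean(axis=1, keepdims=True)

# 3. Variance/covariance matrix (unnormalized; see the later slide on alpha).
alpha = D_zc @ D_zc.T

# 4. Eigenvectors and eigenvalues of the symmetric matrix alpha.
eigvals, eigvecs = np.linalg.eigh(alpha)  # eigenvalues returned in ascending order

# 5. Keep the eigenvectors with the largest eigenvalues and project the data.
order = np.argsort(eigvals)[::-1]
n_keep = 2                                # arbitrary choice for this sketch
W = eigvecs[:, order[:n_keep]]            # retained principal directions (columns)
scores = W.T @ D_zc                       # data in the principal coordinate system
```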

Example: two-component mixtures

Consider the following experiment, in which a set of sample solutions contains mixtures of both A and B. The goal of the experiment is to separate the samples into an A-rich group and a B-rich group. However, neither the identities nor the spectra of A and B are known. Although we cannot obtain them directly from the experiments (without some work), we will assume the noise-free pure-component spectra for A and B shown in the accompanying figure.

Example: two-component mixtures

Each spectrum with N data points, generated from a combination of A and B together with noise, represents a point described by a position vector x in an N-dimensional space. The complete set of spectra represents a cloud of data points in this high-dimensional space.

To illustrate the dimension problem, let's consider just two wavelengths from the spectrum and examine the trends in the absorbance values for a data set containing varying concentrations of A and B together with measurement noise. Both A and B contribute to the absorbance at both wavelengths, λ1 (350 nm) and λ2 (500 nm).
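As a sketch of how such a two-wavelength data set might be simulated, assuming a Beer's-law-like mixing model; the absorptivities, compositions, and noise level below are made-up values, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 25

# Hypothetical absorptivities of A and B at the two wavelengths (lambda1, lambda2).
eps_A = np.array([0.9, 0.3])      # A absorbs mainly at the first wavelength
eps_B = np.array([0.2, 0.8])      # B absorbs mainly at the second wavelength

# One A-rich group and one B-rich group of mixtures (compositions are arbitrary).
c_A = np.concatenate([rng.uniform(0.6, 1.0, n_per_group),    # A-rich samples
                      rng.uniform(0.0, 0.4, n_per_group)])   # B-rich samples
c_B = np.concatenate([rng.uniform(0.0, 0.4, n_per_group),
                      rng.uniform(0.6, 1.0, n_per_group)])

# Each column of D is one two-wavelength "spectrum": a single point in a 2-D space.
noise = 0.02 * rng.standard_normal((2, 2 * n_per_group))
D = np.outer(eps_A, c_A) + np.outer(eps_B, c_B) + noise
```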

Example: two-component mixtures

In this two-dimensional case, considering just these two wavelengths, the principal axis containing the greatest spread in the signal, and the greatest ability to separate the two data sets, is not parallel to either wavelength axis. The axis containing the greatest spread in signal is a combination of the two axes (the green line in the slide's figure). The blue line is the only other dimension perpendicular to the green axis, and it contains mostly noise.
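Continuing the sketch above, the direction of greatest spread (the "green" axis) and the orthogonal, mostly-noise direction (the "blue" axis) fall out of the eigenvectors of the 2 × 2 variance/covariance matrix:

```python
# Zero-center the two-wavelength data and diagonalize the 2 x 2 covariance matrix.
D_zc = D - D.mean(axis=1, keepdims=True)
alpha = D_zc @ D_zc.T                      # 2 x 2 variance/covariance matrix

eigvals, eigvecs = np.linalg.eigh(alpha)   # eigenvalues in ascending order
pc_minor = eigvecs[:, 0]                   # perpendicular axis, mostly noise ("blue")
pc_major = eigvecs[:, 1]                   # axis of greatest spread ("green")

print("fraction of total variance along the major axis:",
      eigvals[1] / eigvals.sum())
```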

Example: two-component mixtures

To find these PCA axes mathematically, we follow the five key steps above: subtract the mean from all data points in D, where each column of D is a "spectrum"; calculate α; find the eigenvectors and eigenvalues of α; use the eigenvalues to pick the important eigenvectors; and project each data vector onto those eigenvectors.

Pool the total data to subtract the mean spectrum. For this simple example, only the two-element spectra (the two wavelength channels) are used.

α in two-component mixtures

Explicitly evaluate α = Dzc·Dzcᵀ for two wavelengths a and b (the 2-D case) with n measurements. The diagonal elements, Σi (Da,i − D̄a)² and Σi (Db,i − D̄b)², are the variances of Da and Db about their means (up to the normalization factor n − 1), and the off-diagonal elements, Σi (Da,i − D̄a)(Db,i − D̄b), are the covariance between Da and Db. Thus, α is dubbed the variance/covariance matrix.
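A quick numerical check of this interpretation, continuing the two-wavelength sketch above (D_zc from the previous snippet): each element of Dzc·Dzcᵀ should equal (n − 1) times the corresponding entry of NumPy's sample variance/covariance matrix.

```python
# Compare Dzc·Dzcᵀ against NumPy's sample covariance (np.cov divides by n - 1).
n = D_zc.shape[1]                 # number of measurements (spectra)
alpha = D_zc @ D_zc.T             # 2 x 2, wavelengths a and b
sample_cov = np.cov(D_zc)         # rows = variables (wavelengths), columns = observations

assert np.allclose(alpha, (n - 1) * sample_cov)
print(alpha / (n - 1))            # variances on the diagonal, covariance off-diagonal
```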

Eigenvector/eigenvalue relationship

The experimental variance s² along a particular direction w is given by the projection of the data onto that coordinate. For a unit vector w, the projections can be written in matrix notation using the zero-centered matrix Dzc as wᵀ·Dzc, so that s² = wᵀ·Dzc·Dzcᵀ·w/(n − 1) = wᵀ·α·w/(n − 1). The directions w that maximize this variance correspond to the eigenvectors of the variance/covariance matrix α, with eigenvalues proportional to the experimental variance s².
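A compact way to see why the maximizing directions are eigenvectors, written here as a standard Lagrange-multiplier argument (not reproduced from the slides):

```latex
\begin{align*}
% variance of the zero-centered data projected onto a unit vector w
s^{2}(\mathbf{w}) &= \tfrac{1}{n-1}\,\bigl\lVert \mathbf{w}^{\mathsf T} D_{\mathrm{zc}} \bigr\rVert^{2}
  = \tfrac{1}{n-1}\,\mathbf{w}^{\mathsf T} D_{\mathrm{zc}} D_{\mathrm{zc}}^{\mathsf T}\,\mathbf{w}
  = \tfrac{1}{n-1}\,\mathbf{w}^{\mathsf T}\alpha\,\mathbf{w},
  \qquad \lVert \mathbf{w} \rVert = 1 \\
% maximize subject to the unit-norm constraint
\mathcal{L}(\mathbf{w},\lambda) &= \mathbf{w}^{\mathsf T}\alpha\,\mathbf{w}
  - \lambda\bigl(\mathbf{w}^{\mathsf T}\mathbf{w} - 1\bigr), \qquad
\nabla_{\mathbf{w}}\mathcal{L} = 2\,\alpha\,\mathbf{w} - 2\,\lambda\,\mathbf{w} = 0
  \;\Longrightarrow\; \alpha\,\mathbf{w} = \lambda\,\mathbf{w}
\end{align*}
```

Every stationary direction is therefore an eigenvector of α, and the variance attained along it is s² = λ/(n − 1), proportional to the eigenvalue, which is why the eigenvalues are used to rank and select the principal components.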

Example: two-component mixtures

If we consider PC0 to be unimportant (containing mostly noise), with the majority of the information content contained within PC1, we can reduce the dimensionality of the problem to 1. Hopefully, the position of each sample along PC1 correlates with the amounts of A and B in that sample.

The same can be done using all wavelength channels of the data instead of just two, which captures a larger share of the signal variance with less relative noise.
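Continuing the two-wavelength sketch (pc_major and D_zc from the earlier snippets), a minimal projection onto the retained axis; splitting the samples by the sign of the PC1 score is only an illustration, and which sign corresponds to "A-rich" depends on the arbitrary sign of the eigenvector:

```python
# Project each zero-centered two-wavelength spectrum onto PC1 (the high-variance axis).
scores_pc1 = pc_major @ D_zc              # one score per sample

# Illustrative split of the samples along PC1.
side = np.where(scores_pc1 > 0, "+", "-")
print("samples generated A-rich:", "".join(side[:n_per_group]))
print("samples generated B-rich:", "".join(side[n_per_group:]))
```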
