Principal Component Analysis


An introduction to Principal Component Analysis (PCA)

abstract Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set of variables, that nonetheless retains most of the sample's information. By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.

overview: geometric picture of PCs; algebraic definition and derivation of PCs; usage of PCA; astronomical application

Geometric picture of principal components (PCs). Consider a sample of n observations in 2-D space. Goal: account for the variation in the sample with as few variables as possible, to some accuracy.

Geometric picture of principal components (PCs) the 1st PC is a minimum distance fit to a line in space the 2nd PC is a minimum distance fit to a line in the plane perpendicular to the 1st PC PCs are a series of linear least squares fits to a sample, each orthogonal to all the previous.

Algebraic definition of PCs. Given a sample of n observations on a vector of p variables $x = (x_1, x_2, \ldots, x_p)$, define the first principal component of the sample by the linear transformation $z_1 = a_1^T x = \sum_{i=1}^{p} a_{i1} x_i$, where the vector $a_1 = (a_{11}, a_{21}, \ldots, a_{p1})$ is chosen such that $\mathrm{var}[z_1]$ is maximum.

Algebraic definition of PCs. Likewise, define the kth PC of the sample by the linear transformation $z_k = a_k^T x$, where the vector $a_k = (a_{1k}, a_{2k}, \ldots, a_{pk})$ is chosen such that $\mathrm{var}[z_k]$ is maximum, subject to $\mathrm{cov}[z_k, z_l] = 0$ for $k > l \geq 1$ and to $a_k^T a_k = 1$.

Algebraic derivation of coefficient vectors. To find $a_1$, first note that $\mathrm{var}[z_1] = \langle z_1^2 \rangle - \langle z_1 \rangle^2 = \sum_{i=1}^{p}\sum_{j=1}^{p} a_{i1} a_{j1} S_{ij} = a_1^T S a_1$, where $S$ is the covariance matrix for the variables $x$.

Algebraic derivation of coefficient vectors. To find $a_1$, maximize $a_1^T S a_1$ subject to $a_1^T a_1 = 1$. Let $\lambda$ be a Lagrange multiplier and maximize $a_1^T S a_1 - \lambda (a_1^T a_1 - 1)$ by differentiating with respect to $a_1$: $2 S a_1 - 2\lambda a_1 = 0$, i.e. $(S - \lambda I_p)\, a_1 = 0$; therefore $a_1$ is an eigenvector of $S$ corresponding to eigenvalue $\lambda$.

Algebraic derivation of $\lambda_1$. We have maximized $\mathrm{var}[z_1] = a_1^T S a_1 = \lambda a_1^T a_1 = \lambda$, so $\lambda$ is the largest eigenvalue of $S$, denoted $\lambda_1$. The first PC $z_1$ retains the greatest amount of variation in the sample.

Algebraic derivation of coefficient vectors. To find the next coefficient vector $a_2$, maximize $a_2^T S a_2$ subject to $\mathrm{cov}[z_2, z_1] = 0$ and to $a_2^T a_2 = 1$. First note that $\mathrm{cov}[z_2, z_1] = a_1^T S a_2 = \lambda_1 a_1^T a_2$; then let $\lambda$ and $\phi$ be Lagrange multipliers and maximize $a_2^T S a_2 - \lambda (a_2^T a_2 - 1) - \phi\, a_2^T a_1$.

Algebraic derivation of coefficient vectors. We find that $a_2$ is also an eigenvector of $S$, whose eigenvalue $\lambda_2$ is the second largest. In general $\mathrm{var}[z_k] = a_k^T S a_k = \lambda_k$: the kth largest eigenvalue of $S$ is the variance of the kth PC, and the kth PC $z_k$ retains the kth greatest fraction of the variation in the sample.

Algebraic formulation of PCA. Given a sample of n observations on a vector of p variables $x$, define a vector of p PCs $z = A^T x$, where $A$ is the orthogonal p x p matrix whose kth column $a_k$ is the kth eigenvector of $S$. Then $\Lambda = A^T S A$ is the covariance matrix of the PCs, $\Lambda$ being diagonal with elements $\lambda_1, \ldots, \lambda_p$.
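
As a sanity check on the formulation $z = A^T x$ and $\Lambda = A^T S A$, here is a small sketch (Python/NumPy assumed; the random data are illustrative only) verifying that the covariance matrix of the PC scores is diagonal, with the eigenvalues of $S$ on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: n = 200 observations of p = 4 correlated variables.
n, p = 200, 4
B = rng.normal(size=(p, p))
x = rng.multivariate_normal(np.zeros(p), B @ B.T, size=n)
x -= x.mean(axis=0)

S = np.cov(x, rowvar=False)                 # p x p sample covariance matrix
lam, A = np.linalg.eigh(S)                  # columns of A are eigenvectors
lam, A = lam[::-1], A[:, ::-1]              # sort eigenvalues descending

z = x @ A                                   # PC scores, z_i = A^T x_i
Lambda = A.T @ S @ A                        # covariance matrix of the PCs

# Lambda is (numerically) diagonal with the eigenvalues on the diagonal,
# and those eigenvalues equal the sample variances of the PC scores.
print(np.allclose(Lambda, np.diag(lam), atol=1e-10))
print(np.allclose(np.var(z, axis=0, ddof=1), lam))
```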

usage of PCA: Probability distribution for sample PCs. If (i) the n observations of $x$ in the sample are independent and (ii) $x$ is drawn from an underlying population that follows a p-variate normal (Gaussian) distribution with known covariance matrix $\Sigma$, then $(n-1)S \sim W_p(\Sigma, n-1)$, where $W_p$ denotes the Wishart distribution; else utilize a bootstrap approximation.
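
When the normality assumption is doubtful, the bootstrap alternative mentioned above can be sketched as follows (Python/NumPy assumed; the non-Gaussian test data, the helper pc_eigenvalues, and the choice of 2000 resamples are illustrative): resample the n observations with replacement and recompute the PC eigenvalues to build up an empirical sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative non-Gaussian sample: n observations of p variables.
n, p = 300, 3
x = rng.exponential(scale=[1.0, 2.0, 0.5], size=(n, p))
x -= x.mean(axis=0)

def pc_eigenvalues(sample):
    """Eigenvalues of the sample covariance matrix, largest first."""
    S = np.cov(sample, rowvar=False)
    return np.sort(np.linalg.eigvalsh(S))[::-1]

# Bootstrap: resample observations with replacement and recompute the
# eigenvalues, building an empirical sampling distribution.
n_boot = 2000
boot = np.empty((n_boot, p))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot[b] = pc_eigenvalues(x[idx])

print("sample eigenvalues:", pc_eigenvalues(x))
print("bootstrap std errors:", boot.std(axis=0, ddof=1))
print("95% CI for largest eigenvalue:",
      np.percentile(boot[:, 0], [2.5, 97.5]))
```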

usage of PCA: Probability distribution for sample PCs. If (i) $S$ follows a Wishart distribution and (ii) the population eigenvalues $\tilde\lambda_k$ are all distinct, then the following results hold as $n \to \infty$: all the $\lambda_k$ are independent of all the $a_k$, and the $\lambda_k$ and $a_k$ are jointly normally distributed (a tilde denotes a population quantity).

usage of PCA: Probability distribution for sample PCs. Asymptotically, $E[\lambda_k] = \tilde\lambda_k$ and $E[a_k] = \tilde a_k$ to $O(1/n)$, with $\mathrm{var}[\lambda_k] = 2\tilde\lambda_k^2 / n$ and covariance matrix of $a_k$ equal to $\frac{\tilde\lambda_k}{n} \sum_{l \neq k} \frac{\tilde\lambda_l}{(\tilde\lambda_l - \tilde\lambda_k)^2}\, \tilde a_l \tilde a_l^T$ (a tilde denotes a population quantity).
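
A quick simulation can illustrate the asymptotic results quoted above, e.g. that $\mathrm{var}[\lambda_1] \approx 2\tilde\lambda_1^2/n$ for Gaussian data (Python/NumPy assumed; the population eigenvalues, sample size, and trial count are illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(3)

# Population covariance with distinct eigenvalues (tilde quantities).
pop_eigvals = np.array([4.0, 2.0, 1.0])
p, n, n_trials = 3, 500, 2000

sample_l1 = np.empty(n_trials)
for t in range(n_trials):
    x = rng.multivariate_normal(np.zeros(p), np.diag(pop_eigvals), size=n)
    S = np.cov(x, rowvar=False)
    sample_l1[t] = np.linalg.eigvalsh(S)[-1]    # largest sample eigenvalue

# Asymptotically lambda_1 ~ N(pop_lambda_1, 2 * pop_lambda_1**2 / n).
print("mean of sample lambda_1:", sample_l1.mean(), "(population: 4.0)")
print("variance of sample lambda_1:", sample_l1.var(ddof=1),
      "(asymptotic value 2*4**2/500 =", 2 * 4.0**2 / n, ")")
```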

usage of PCA: Inference about population PCs. If $x$ follows a p-variate normal distribution, then analytic expressions exist* for maximum-likelihood estimates of $\tilde\Sigma$, $\tilde\lambda_k$, and $\tilde a_k$, for confidence intervals for $\tilde\lambda_k$ and $\tilde a_k$, and for hypothesis tests on $\tilde\lambda_k$ and $\tilde a_k$; else bootstrap and jackknife approximations exist. *see references, esp. Jolliffe
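
For the non-analytic route, a jackknife standard error for, say, the largest eigenvalue can be sketched as follows (Python/NumPy assumed; the heavy-tailed test data and helper largest_eigenvalue are illustrative): recompute the statistic leaving out one observation at a time, then combine the leave-one-out values with the usual jackknife variance formula.

```python
import numpy as np

rng = np.random.default_rng(7)

# Sample of n observations of p variables (no normality assumed).
n, p = 150, 3
x = rng.standard_t(df=5, size=(n, p)) @ np.diag([2.0, 1.0, 0.5])
x -= x.mean(axis=0)

def largest_eigenvalue(sample):
    return np.linalg.eigvalsh(np.cov(sample, rowvar=False))[-1]

# Jackknife: recompute the statistic with one observation deleted at a time.
theta_hat = largest_eigenvalue(x)
loo = np.array([largest_eigenvalue(np.delete(x, i, axis=0))
                for i in range(n)])
se_jack = np.sqrt((n - 1) / n * ((loo - loo.mean()) ** 2).sum())

print("largest eigenvalue:", theta_hat, " jackknife std error:", se_jack)
```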

usage of PCA: Practical computation of PCs. In general it is useful to define standardized variables by $x_k' = x_k / \sqrt{\mathrm{var}(x_k)}$. If the $x_k$ are each measured about their sample mean, then the covariance matrix of $x' = (x_1', \ldots, x_p')$ will be equal to the correlation matrix of $x$, and the PCs will be dimensionless.
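
A one-line check of this standardization (Python/NumPy assumed; the three differently-scaled variables are illustrative): dividing each mean-subtracted variable by its sample standard deviation makes the covariance matrix of the standardized variables equal to the correlation matrix of the originals.

```python
import numpy as np

rng = np.random.default_rng(4)

# Variables with very different scales (e.g. different physical units).
n = 400
x = np.column_stack([rng.normal(0, 100.0, n),
                     rng.normal(0, 0.1, n),
                     rng.normal(0, 5.0, n)])
x -= x.mean(axis=0)                      # measure about the sample means

# Standardize: divide each mean-subtracted variable by its std deviation.
x_std = x / x.std(axis=0, ddof=1)

# Covariance matrix of the standardized variables equals the correlation
# matrix of the original variables, so the resulting PCs are dimensionless.
print(np.allclose(np.cov(x_std, rowvar=False),
                  np.corrcoef(x, rowvar=False)))
```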

usage of PCA: Practical computation of PCs. Given a sample of n observations on a vector of p variables (each measured about its sample mean), compute the covariance matrix $S = \frac{1}{n-1} X^T X$, where $X$ is the n x p matrix whose ith row is the ith observation $x_i^T$. Then compute the n x p matrix $Z = X A$ whose ith row $z_i^T$ is the PC score for the ith observation.
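
The whole practical recipe fits in a few lines (Python/NumPy assumed; the random data matrix is illustrative): form S from the mean-subtracted data matrix, take its eigendecomposition, sort by eigenvalue, and project to get the PC scores. In practice one can equivalently take the SVD of X directly rather than forming S explicitly; the right singular vectors are the same $a_k$.

```python
import numpy as np

rng = np.random.default_rng(5)

# X: n x p data matrix, each variable measured about its sample mean.
n, p = 250, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
X -= X.mean(axis=0)

S = X.T @ X / (n - 1)                      # p x p sample covariance matrix
eigvals, A = np.linalg.eigh(S)             # symmetric eigendecomposition
order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
eigvals, A = eigvals[order], A[:, order]   # columns of A are the a_k

Z = X @ A                                  # n x p matrix of PC scores

# The variance of the kth PC score equals the kth eigenvalue.
print(np.allclose(Z.var(axis=0, ddof=1), eigvals))
# Fraction of the total variation retained by each PC.
print(eigvals / eigvals.sum())
```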

usage of PCA: Practical computation of PCs. Write $x_i = A z_i = \sum_{k=1}^{p} z_{ik}\, a_k$ to decompose each observation into its PCs.

usage of PCA: Data compression. Because the kth PC retains the kth greatest fraction of the variation, we can approximate each observation by truncating the sum at the first m < p PCs: $x_i \approx \hat{x}_i = \sum_{k=1}^{m} z_{ik}\, a_k$.

usage of PCA: Data compression. Reduce the dimensionality of the data from p to m < p by approximating $X \approx \hat{X} = Z_m A_m^T$, where $Z_m$ is the n x m portion of $Z$ and $A_m$ is the p x m portion of $A$.
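
The compression step can be sketched the same way (Python/NumPy assumed; the low-rank-plus-noise test data and the choice m = 2 are illustrative): keep the first m columns of Z and A, reconstruct $\hat X = Z_m A_m^T$, and note that the mean-square reconstruction error is governed by the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(6)

# Mean-subtracted n x p data matrix X with approximately low-rank structure.
n, p, m = 250, 6, 2
X = (rng.normal(size=(n, 3)) @ rng.normal(size=(3, p))
     + 0.05 * rng.normal(size=(n, p)))
X -= X.mean(axis=0)

S = X.T @ X / (n - 1)
eigvals, A = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, A = eigvals[order], A[:, order]

Z = X @ A                                  # full PC scores, n x p
Z_m, A_m = Z[:, :m], A[:, :m]              # keep only the first m PCs
X_hat = Z_m @ A_m.T                        # rank-m approximation of X

# Fraction of total variation retained, and the mean-square reconstruction
# error, which equals (n-1) * (sum of discarded eigenvalues) / (n*p).
print("retained variation:", eigvals[:m].sum() / eigvals.sum())
print("mean-square error:", ((X - X_hat) ** 2).mean(),
      "vs. discarded eigenvalues:", eigvals[m:].sum() * (n - 1) / (n * p))
```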

astronomical application: PCs for elliptical galaxies. Rotating to the principal component in $B_T$–$\Sigma$ space improves the Faber–Jackson relation as a distance indicator (Dressler et al. 1987).

astronomical application: Eigenspectra (Karhunen–Loève transform), Connolly et al. 1995.

references
Connolly, A.J., Szalay, A.S., et al., "Spectral Classification of Galaxies: An Orthogonal Approach", AJ, 110, 1071-1082, 1995.
Dressler, A., et al., "Spectroscopy and Photometry of Elliptical Galaxies. I. A New Distance Estimator", ApJ, 313, 42-58, 1987.
Efstathiou, G., and Fall, S.M., "Multivariate analysis of elliptical galaxies", MNRAS, 206, 453-464, 1984.
Johnston, D.E., et al., "SDSS J0903+5028: A New Gravitational Lens", AJ, 126, 2281-2290, 2003.
Jolliffe, I.T., 2002, Principal Component Analysis (Springer-Verlag New York, Secaucus, NJ).
Lupton, R., 1993, Statistics in Theory and Practice (Princeton University Press, Princeton, NJ).
Murtagh, F., and Heck, A., Multivariate Data Analysis (D. Reidel Publishing Company, Dordrecht, Holland).
Yip, C.W., Szalay, A.S., et al., "Distributions of Galaxy Spectral Types in the SDSS", AJ, 128, 585-609, 2004.