
Principal Component Analysis

What is it? Principal Component Analysis (PCA) is a standard tool in multivariate analysis for examining multidimensional data, used to reveal patterns between objects that would not be apparent in a univariate analysis.

What is the goal of PCA? PCA reduces a correlated dataset (values of variables {x1, x2, …, xp}) to a dataset containing fewer new variables, obtained by axis rotation. The new variables are linear combinations of the original ones and are uncorrelated. The PCs are these new variables (or axes), each of which summarizes several of the original variables. If nonzero correlation exists among the variables of the dataset, then a more compact description of the data is possible, which amounts to finding the dominant modes in the data.

How does it work?

If the data are well correlated, PCA can explain most of the variability of the original dataset with a few new variables. Correlation introduces redundancy: if two variables are perfectly correlated, one of them is redundant, because knowing x tells us y. PCA exploits this redundancy in multivariate data to pick out patterns and relationships among the variables and to reduce the dimensionality of the dataset without significant loss of information.

What do we use it for?

Types of data we can use PCA on: basically anything! Usually we have multiple variables (which may be different locations of the same variable, or different variables altogether) and samples or replicates of these variables (e.g. samples taken at different times, or data relating to different subjects or locations). PCA is very useful in the geosciences, where data are generally well correlated (across variables and across space).

What can it be used for?
Exploratory data analysis
Detection of outliers
Identification of clusters (grouping, regionalization)
Reduction of variables (data pre-processing, multivariate modeling)
Data compression (lossy!)
Analysis of variability in space and time
New interpretation of the data (in terms of the main components of variability)
Forecasting (finding relationships between variables)

Some Examples from the Literature

Geosciences in general: rock type identification; remote sensing retrievals; classification of land use.
Hydrology, water quality and ecology: regionalization (e.g. drought and flood regimes); analysis of water quality and its relationships with hydrology; relationships of species richness with morphological and hydrological parameters (lake size, land use, hydraulic residence time); relationships between hydrology/soils and vegetation patterns; contamination source identification.
Atmospheric science and climate analysis: weather typing and classification; identification of major modes of variability; teleconnection patterns; the "hockey stick" plot of the global temperature record (an example of perceived misuse of PCA, though actually not a misuse).
Others: bioinformatics; gene expression analysis; image processing and pattern recognition; data compression.

Example: Image Processing

Example: Analysis of Variability in Climate Data

Figure: first principal component of October–December sea-surface temperatures in the Tropical Pacific (El Niño episodes). Example taken from the Climate Prediction Tool documentation, Simon Mason, IRI.

Example: Classification of Chemical Species

Estimation of a reduced set of independent chemical groups with similar physical-chemical behavior, used to reduce the dimension of environmental data, to map contaminant similarity groups, and for source identification. Here, dimension reduction refers to finding a small number of statistically independent, physically plausible chemical groups that explain a significant fraction of the overall variance in the data set.

Figure: clustering of the chemical species data in PCA space: (a) scatterplot of all data; (b) primary and transition groups; (c) primary groups only.

Example: Effects of Flooding and Drought on Water Quality in Gulf Coastal Plain Streams in Georgia

Figure: principal components analysis of water quality parameters for three creeks in Georgia. The top panel highlights differences between streams; the bottom panel is the same PCA highlighting differences between hydrologic seasons.

Example: Identifying Modes of Climate Variability

The North Atlantic Oscillation (NAO) is a major mode of climate variability in the Northern Hemisphere. The NAO is defined as either the 1st PC (leading mode of variability) of sea-level pressure (SLP) in the North Atlantic, or the difference in pressure between the north (Iceland) and south (Azores) of the North Atlantic. The advantage of the PCA definition is that it takes into account the changing pattern of SLP across the Atlantic and ignores the day-to-day variability of the weather.

Positive phase of the NAO: bad weather in the Mediterranean; good weather in Northern Europe and the Eastern USA. Negative phase of the NAO: good weather in the Mediterranean; bad weather in Northern Europe and the Eastern USA.

Example: Global Soil Moisture Variability

How do we analyze a global dataset of soil moisture with 50 years of data at each grid point? How does it vary in time and space? What are the main modes of variability? Can we relate these to physical phenomena?

Understanding PCA: a simple example

A cluster of data in 3-D. These could be values of three variables (X, Y, Z) or of one variable at three locations (X, Y, Z). Often the variables are correlated and show more variance in certain directions.

Understanding PCA: an example

The first principal component (PC1) is the direction along which there is the largest variation. This is equivalent to rotating the axes and expressing the data in a new coordinate system.

Understanding PCA: an example

The second PC (PC2) is the direction, uncorrelated with the first component, along which the data show the next largest variation. (Figure: looking down the barrel of PC1.)

Understanding PCA: an example

The result: PC1 and PC2 are uncorrelated (orthogonal).
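A minimal numpy sketch of this geometric picture (the covariance values and sample size below are illustrative assumptions, not data from the slides): it draws a correlated 3-D point cloud, finds the directions of largest variance, and rotates the data into the new coordinate system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 3-D data: 500 samples of variables X, Y, Z.
true_cov = np.array([[3.0, 2.0, 1.0],
                     [2.0, 2.5, 0.8],
                     [1.0, 0.8, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0, 0.0], cov=true_cov, size=500)

# Centre the cloud, then find the orthogonal directions of largest variance.
centred = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Axis rotation: express every point in the PC1/PC2/PC3 coordinate system.
scores = centred @ eigvecs
print("variance along PC1, PC2, PC3:", scores.var(axis=0, ddof=1))
print("PC1 and PC2 uncorrelated:",
      np.isclose(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1], 0.0, atol=1e-10))
```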

Understanding PCA: a REAL example

Given a set of exam scores for a group of 14 people, how can we best summarize the scores? One objective of summarizing the scores would be to distinguish the good students from the bad students. Principal components are ideal for obtaining such summaries, but they can also provide other informative summaries. (Example taken from the Climate Prediction Tool documentation, Simon Mason, IRI.)

Input Data

Each subject is a variable (like one grid point); each person is a case, or sample (like one year).

The first PC

PC1 has positive loadings on all exams; this distinguishes good students from bad students. Amplitudes (the projection of the data onto PC1) are shown at right: good students have positive scores. (Figure: loading weights at left, amplitudes at right.)

The second PC

PC2 has oppositely signed loadings on physical versus social sciences; this distinguishes physical scientists from social scientists. Again, amplitudes (the projection of the data onto PC2) are shown at right: physical scientists have positive scores. (Figure: loading weights at left, amplitudes at right.)
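A small sketch of this kind of summary on made-up scores (the subject names and numbers below are hypothetical, not the data used in the original example): the loadings of PC1 and PC2 and the per-student amplitudes come from the eigenvectors of the score covariance matrix.

```python
import numpy as np

subjects = ["physics", "chemistry", "history", "sociology"]
# Hypothetical exam scores: rows = students (cases), columns = subjects (variables).
scores = np.array([[82, 78, 55, 60],
                   [60, 58, 85, 88],
                   [90, 92, 70, 65],
                   [45, 50, 52, 48],
                   [70, 68, 72, 75],
                   [88, 85, 60, 58]], dtype=float)

centred = scores - scores.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
order = np.argsort(eigvals)[::-1]
loadings = eigvecs[:, order]              # column m = loading weights of PC m
amplitudes = centred @ loadings           # row t = PC amplitudes of student t

print("PC1 loadings:", dict(zip(subjects, np.round(loadings[:, 0], 2))))
print("PC2 loadings:", dict(zip(subjects, np.round(loadings[:, 1], 2))))
print("PC1 amplitudes per student:", np.round(amplitudes[:, 0], 1))
```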

How to do PCA?

Step 1: Organize the data (what are the variables, what are the samples?)
Step 2: Calculate the covariance matrix (how do the variables co-vary?)
Step 3: Calculate the eigenvectors and eigenvalues of the covariance matrix
Step 4: Calculate the PCs (project the data onto the eigenvectors)
Step 5: Choose a subset of PCs
Step 6: Interpretation, data reconstruction, data compression, plotting, etc.
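A compact end-to-end sketch of these six steps, assuming numpy and a data matrix with samples as rows and variables as columns (the function name and shapes are illustrative, not from the slides):

```python
import numpy as np

def pca(data, n_components=None):
    """Minimal PCA sketch: returns eigenvalues, eigenvectors and PC scores."""
    # Step 1: data is organised as (N samples) x (M variables); centre each variable.
    centred = data - data.mean(axis=0)
    # Step 2: covariance matrix of the variables (M x M).
    cov = np.cov(centred, rowvar=False)
    # Step 3: eigenvalues / eigenvectors of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: the PCs are the projections of the data onto the eigenvectors.
    scores = centred @ eigvecs
    # Step 5: keep only the leading subset of PCs.
    if n_components is not None:
        eigvals = eigvals[:n_components]
        eigvecs = eigvecs[:, :n_components]
        scores = scores[:, :n_components]
    return eigvals, eigvecs, scores

# Step 6 (interpretation, reconstruction, plotting) is left to the user, e.g.:
# eigvals, eigvecs, scores = pca(my_data, n_components=2)
```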

Step 1: Organize the Data

Perhaps you have observations of several variables at one instant in time, but many samples or realizations of these variables taken at different times. For example:
daily temperature and precipitation at a location for one year (a 2 x 365 matrix: 2 variables measured 365 times);
12 chemical species measured in 10 streams (a 12 x 10 or 10 x 12 matrix);
soil moisture at 25 stations for 31 days (a 25 x 31 matrix: 25 variables or locations measured 31 times).

In general, the data could be:
1) A space-time array: measurements of a single variable at M locations taken at N different times, where M and N are integers.
2) A parameter-time array: measurements of M variables (e.g. temperature, pressure, relative humidity, rainfall, ...) taken at one location at N times.
3) A parameter-space array: measurements of M variables taken at N different locations at a single time.
and so on.
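A minimal sketch of this organization in numpy (the variable names and generated values are illustrative assumptions): each row is one sample (e.g. a time step) and each column is one variable or location.

```python
import numpy as np

# Hypothetical parameter-time array: 2 variables (temperature, precipitation)
# measured on 365 days, stored as (N samples) x (M variables) = 365 x 2.
n_days = 365
rng = np.random.default_rng(1)
temperature = 15 + 10 * np.sin(2 * np.pi * np.arange(n_days) / 365) + rng.normal(0, 2, n_days)
precipitation = rng.gamma(shape=2.0, scale=1.5, size=n_days)

data = np.column_stack([temperature, precipitation])   # shape (365, 2)
print(data.shape)   # N = 365 samples (rows), M = 2 variables (columns)
```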

Step 2: Covariance Matrix

M = number of variables, N = number of samples. The covariance matrix is M x M:

         x1          x2          x3        ...   xM
x1   cov(1,1)    cov(1,2)    cov(1,3)      ...  cov(1,M)
x2   cov(2,1)    cov(2,2)    cov(2,3)      ...  cov(2,M)
x3   cov(3,1)    cov(3,2)    cov(3,3)      ...  cov(3,M)
...
xM   cov(M,1)    cov(M,2)    cov(M,3)      ...  cov(M,M)
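Under the same convention (rows = samples, columns = variables), the covariance matrix can be computed with a sketch like this (the array name is illustrative):

```python
import numpy as np

# data has shape (N samples, M variables); rowvar=False tells numpy that the
# variables are in columns, so the result is the M x M covariance matrix.
def covariance_matrix(data):
    return np.cov(data, rowvar=False)

# Equivalent "by hand": centre each column and average the outer products.
def covariance_matrix_by_hand(data):
    centred = data - data.mean(axis=0)
    n_samples = data.shape[0]
    return centred.T @ centred / (n_samples - 1)
```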

Step 2: Correlation Matrix (a normalized covariance matrix)

         x1           x2           x3        ...   xM
x1   corr(1,1)    corr(1,2)    corr(1,3)     ...  corr(1,M)
x2   corr(2,1)    corr(2,2)    corr(2,3)     ...  corr(2,M)
x3   corr(3,1)    corr(3,2)    corr(3,3)     ...  corr(3,M)
...
xM   corr(M,1)    corr(M,2)    corr(M,3)     ...  corr(M,M)

Note that on the diagonal, corr(i,i) = variance / (sd * sd) = 1.
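A sketch of the normalization, assuming numpy: dividing each covariance by the product of the two standard deviations gives the correlation matrix (np.corrcoef does the same in one call).

```python
import numpy as np

def correlation_matrix(data):
    cov = np.cov(data, rowvar=False)
    sd = np.sqrt(np.diag(cov))          # standard deviation of each variable
    return cov / np.outer(sd, sd)       # corr(i,j) = cov(i,j) / (sd_i * sd_j)

# Equivalent one-liner:
# corr = np.corrcoef(data, rowvar=False)
```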

Step 3: How to find PCs of the Cov/Corr Matrix?

Principal components (PC1, PC2, ...) are derived via successive multiple regression:
use the covariance/correlation matrix to look for associations between variables;
derive a linear equation that summarizes the variation in the data with respect to the multiple variables, i.e. a multiple regression;
repeat as necessary (up to PC M, where M is the number of variables).

Each PC can be written in general as a linear combination of the original variables:

    z = a1 X1 + a2 X2 + a3 X3 + ... + aM XM

where X1 ... XM are the original variables and a1 ... aM (the loadings) are coefficients that reflect how much each variable contributes to the component. A PC 'value' (the projection of the data into PC space) can be computed for every sample in the dataset:

    z(t) = a1 X1(t) + a2 X2(t) + a3 X3(t) + ... + aM XM(t)

where t is a given sample (e.g. a time step) and X1(t) ... XM(t) are the values of the original variables for sample t.
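A sketch of this projection for one component, assuming numpy (the loading vector used in the comment is illustrative):

```python
import numpy as np

def pc_values(data, loadings):
    """z(t) = a1*X1(t) + ... + aM*XM(t) for every sample t (row of data)."""
    centred = data - data.mean(axis=0)   # PCs are defined on the centred data
    return centred @ loadings            # one z value per sample

# Example with a hypothetical loading vector for M = 3 variables:
# z = pc_values(data, np.array([0.7, 0.5, -0.5]))
```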

Step 3: Coefficients of Linear Regression are Eigenvectors of Cov Matrix

Eigenvector: a list showing how much each original variable contributes to the PC (i.e. the coefficients from the PC equation); there is one eigenvector for every PC.
Eigenvalue: a single number that quantifies the amount of the original variance explained by a component; there is one eigenvalue for every component/eigenvector.
PCA is the eigenvalue analysis of a covariance/correlation (dispersion) matrix.

Step 3: Eigen Analysis or Eigen Decomposition

Any symmetric matrix A can be decomposed through an eigen analysis or eigen decomposition:

    A e_i = λ_i e_i,   i = 1, ..., M

where λ_i is an eigenvalue (a scalar) and e_i is an eigenvector, E is the matrix whose columns are the eigenvectors, and L is the diagonal matrix of eigenvalues. This can also be written as

    A = E L E^T

Each e_i has dimension (M x 1) and E has dimension M x M (M is the number of variables). We usually require the eigenvectors to have unit length, so the product of an eigenvector with itself is 1, and the eigenvectors are mutually orthogonal (orthonormal if of unit length):

    e_i^T e_i = 1,   e_i^T e_j = 0 for i ≠ j   (i.e. E^T E = I)
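A quick numerical check of these relations, assuming numpy (the matrix A below is an arbitrary symmetric example, not from the slides):

```python
import numpy as np

A = np.array([[4.0, 2.0, 0.5],
              [2.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])          # any symmetric matrix

eigvals, E = np.linalg.eigh(A)           # columns of E are the eigenvectors e_i
L = np.diag(eigvals)

print(np.allclose(A @ E, E @ L))         # A e_i = lambda_i e_i for every i
print(np.allclose(A, E @ L @ E.T))       # A = E L E^T
print(np.allclose(E.T @ E, np.eye(3)))   # eigenvectors are orthonormal: E^T E = I
```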

Step 4: Calculate PCs from Eigenvectors

Each of the M eigenvectors contains one element for each of the M variables x_k. Each of the PCs is computed from a particular set of observations of the M variables. Geometrically, the first eigenvector e_1 points in the direction in which the data vectors jointly exhibit the most variability; it is associated with the largest eigenvalue, λ_1. The second eigenvector is associated with the second largest eigenvalue, λ_2, and is orthogonal to the first eigenvector. And so on. The eigenvectors define a new coordinate system in which to view the data. The mth principal component, PC_m, is the projection of the data vector x onto the mth eigenvector e_m:

    PC_m = e_m^T x = e_{1,m} x_1 + e_{2,m} x_2 + ... + e_{M,m} x_M
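A sketch of this projection step, assuming numpy and the same layout as above (samples in rows, variables in columns; the function name is illustrative):

```python
import numpy as np

# eigvecs: M x M matrix whose columns e_m come from the covariance matrix,
# sorted so that column 0 has the largest eigenvalue; x: one centred data vector.
def project_onto_pcs(x, eigvecs):
    """PC_m = e_m^T x for every m; returns the vector of PC values for this sample."""
    return eigvecs.T @ x

# For a whole dataset (rows = samples), all PCs at once:
# pcs = centred_data @ eigvecs     # column m holds PC_m for every sample
```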

Step 4: Eigenvector and PC Coefficients

Eigenvector: a list showing how much each original variable contributes to the component (i.e. the coefficients from the component equation):

    z_i = e_{1,i} X_1 + e_{2,i} X_2 + e_{3,i} X_3 + ... + e_{M,i} X_M

So E is (rows = variables, columns = eigenvectors):

           1         2         3      ...     M
x1     e_{1,1}   e_{1,2}   e_{1,3}    ...   e_{1,M}
x2     e_{2,1}   e_{2,2}   e_{2,3}    ...   e_{2,M}
x3     e_{3,1}   e_{3,2}   e_{3,3}    ...   e_{3,M}
...
xM     e_{M,1}   e_{M,2}   e_{M,3}    ...   e_{M,M}

Step 5: PCs and Eigenvalues

The eigenvalues indicate how much variance is explained by each eigenvector: the variance of the mth PC is the mth eigenvalue, λ_m. If you arrange the eigenvector/eigenvalue pairs with the biggest eigenvalues first, then you may be able to explain a large amount of the variance in the original data set with relatively few coordinate directions. Each PC represents a share of the total variation in x that is proportional to its eigenvalue; if all PCs are used, the eigenvalues sum to the total variance and the data can be reconstructed exactly:

    x = PC_1 e_1 + PC_2 e_2 + ... + PC_M e_M

We can also reconstruct the dataset to some approximation using fewer PCs, and thus obtain a compression of the data, i.e. a dataset that retains only the important information:

    x ≈ sum over the retained components m of PC_m e_m
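A sketch of the variance shares and of the truncated reconstruction, assuming numpy and the conventions used above (the function name is illustrative):

```python
import numpy as np

def explained_variance_and_reconstruction(data, n_keep):
    centred = data - data.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    explained = eigvals / eigvals.sum()             # share of total variance per PC
    pcs = centred @ eigvecs[:, :n_keep]             # keep only the leading PCs
    approx = pcs @ eigvecs[:, :n_keep].T + data.mean(axis=0)   # reconstruction
    return explained, approx
```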

2-D example of PCA