Principal Component Analysis and its use in MA clustering, Lecture 12

What is PCA?
This is a MATHEMATICAL procedure that transforms a set of correlated responses into a smaller set of uncorrelated variables called PRINCIPAL COMPONENTS.
Uses:
– Data screening
– Clustering
– Discriminant analysis
– Regression (combating multicollinearity)

Objectives of PCA
PCA is an exploratory technique meant to give researchers a better FEEL for their data:
– Reduce dimensionality, or rather, try to understand the TRUE dimensionality of the data.
– Identify "meaningful" variables.
Given a VARIANCE-COVARIANCE MATRIX S, PCA returns new variables, called principal components, that are:
– Uncorrelated
– Ordered so that the first component explains MOST of the variability
– With the remaining PCs explaining decreasing amounts of variability

Idea of PCA
Consider x to be a random vector with mean μ and variance-covariance matrix Σ. The first PC variable is defined by y₁ = a₁′(x − μ), where a₁ is chosen so that Var(a₁′(x − μ)) is maximized over all vectors a₁ satisfying a₁′a₁ = 1. It can be shown that the maximum value of this variance among all such vectors is λ₁, the first (largest) eigenvalue of the matrix Σ. This implies that a₁ is the eigenvector corresponding to the eigenvalue λ₁. The second PC is defined by the eigenvector corresponding to the second largest eigenvalue λ₂, and so on down to the pth eigenvalue.
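As a minimal sketch (not from the lecture; the data matrix X and seed below are invented for illustration), the first PC direction can be obtained in base R as the leading eigenvector of the sample covariance matrix:

set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)     # 100 observations on 3 variables
X[, 2] <- X[, 2] + 0.8 * X[, 1]           # induce some correlation between variables 1 and 2
S <- cov(X)                               # sample estimate of Sigma
e <- eigen(S)                             # eigenvalues come back in decreasing order
a1 <- e$vectors[, 1]                      # a1 maximizes Var(a'(x - mu)) subject to a'a = 1
e$values[1]                               # the maximized variance, lambda_1
prcomp(X)$rotation[, 1]                   # agrees with a1 up to sign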

Supplementary Info: What are Eigenvalues and Eigenvectors?
Also called characteristic roots (latent roots), eigenvalues are the roots λ of the polynomial equation defined by |Σ − λI| = 0. This leads to an equation of the form c₁λᵖ + c₂λᵖ⁻¹ + … + cₚλ + cₚ₊₁ = 0. If Σ is symmetric, then the eigenvalues are real numbers and can be ordered.

Supplementary Info II: What are Eigenvectors?
Similarly, eigenvectors are the vectors a satisfying the equation Σa − λa = 0. If Σ is symmetric, then there will be p eigenvectors corresponding to the p eigenvalues. They are generally not unique and are normalized so that aⱼ′aⱼ = 1.
Remarks: if two eigenvalues are NOT equal, their eigenvectors will be orthogonal to each other. When two eigenvalues are equal, their eigenvectors are CHOSEN orthogonal to each other (in this case they are non-unique).
Tr(Σ) = λ₁ + λ₂ + … + λₚ and |Σ| = λ₁λ₂⋯λₚ.
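These facts can be checked numerically; the 3 by 3 symmetric matrix below is my own toy example, not one from the slides:

Sigma <- matrix(c(4, 2, 0,
                  2, 3, 1,
                  0, 1, 2), nrow = 3, byrow = TRUE)   # symmetric example matrix
e <- eigen(Sigma)
e$values                              # real eigenvalues, already sorted in decreasing order
zapsmall(crossprod(e$vectors))        # a_j'a_j = 1 and a_j'a_k = 0, so this is the identity
c(sum(diag(Sigma)), sum(e$values))    # Tr(Sigma) equals the sum of the eigenvalues
c(det(Sigma), prod(e$values))         # |Sigma| equals the product of the eigenvalues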

Idea of PCA contd…
Hence the p principal components are defined by a₁, a₂, …, aₚ, the eigenvectors corresponding to the ordered eigenvalues of Σ, where λ₁ ≥ λ₂ ≥ … ≥ λₚ.
Result: two principal components are uncorrelated if and only if their defining eigenvectors are orthogonal to each other. Hence the PCs form an orthogonal axis system onto which the data are projected.
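A quick numerical check of this result, using an invented data matrix (my choice of seed and dimensions): the PC score variables returned by prcomp() are mutually uncorrelated.

set.seed(2)
X  <- matrix(rnorm(50 * 4), ncol = 4)   # 50 observations, 4 variables
pc <- prcomp(X)
round(cor(pc$x), 10)                    # off-diagonal correlations are numerically zero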

Idea of PCA contd…
The variance of the jth component is λⱼ, j = 1, …, p. Remember: tr(Σ) = σ₁₁ + σ₂₂ + … + σₚₚ. Also, tr(Σ) = λ₁ + λ₂ + … + λₚ. Hence a common measure of the "importance" of the jth principal component is λⱼ / tr(Σ).
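A small sketch of this importance measure, again with an invented data matrix; the same proportions appear in the prcomp() summary:

set.seed(3)
X  <- matrix(rnorm(50 * 4), ncol = 4)
ev <- eigen(cov(X))$values
ev / sum(ev)              # lambda_j / tr(Sigma): share of the total variance per component
cumsum(ev) / sum(ev)      # cumulative share, handy when deciding how many PCs to keep
summary(prcomp(X))        # the "Proportion of Variance" row reports the same quantities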

Comments
To actually do PCA we need to compute the principal component scores, the values of the principal component variables for each unit in the data set. These scores give the locations of the observations with respect to the principal component axes. Generally the eigenvectors are normalized to length 1, aⱼ′aⱼ = 1. Often, to make comparisons across components easier, each element of the eigenvector is multiplied by the square root of the corresponding eigenvalue, giving the component vectors cⱼ = √λⱼ aⱼ.
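A minimal sketch of both quantities, assuming an illustrative data matrix X (not the lecture's): the scores are the centered data projected onto the eigenvectors, and the component vectors rescale each eigenvector by the square root of its eigenvalue.

set.seed(4)
X  <- matrix(rnorm(60 * 3), ncol = 3)
e  <- eigen(cov(X))
Xc <- scale(X, center = TRUE, scale = FALSE)     # subtract the column means
scores <- Xc %*% e$vectors                       # PC scores: location of each observation on the PC axes
head(scores)
cj <- sweep(e$vectors, 2, sqrt(e$values), "*")   # component vectors c_j = sqrt(lambda_j) * a_j
cj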

Estimating PC
Life would be easy if μ and Σ were known: all we would have to do is compute the normalized eigenvectors and corresponding eigenvalues. But most of the time we DO NOT know μ and Σ; we need to estimate them, and hence the principal components are the sample quantities computed from the estimated μ and Σ.
Determining the number of PCs:
– Look for eigenvalues that are much smaller than the others.
– Use plots such as the SCREE plot (eigenvalue versus component number).
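A scree plot is available directly in base R; the data below are only illustrative:

set.seed(5)
X  <- matrix(rnorm(60 * 5), ncol = 5)
pc <- prcomp(X)
screeplot(pc, type = "lines", main = "Scree plot")   # variance (eigenvalue) versus component number
plot(eigen(cov(X))$values, type = "b",               # or the same plot by hand
     xlab = "Component number", ylab = "Eigenvalue")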

Caveats
The whole idea of PCA is to transform a set of correlated variables into a set of uncorrelated variables; hence, if the data are already uncorrelated, there is not much additional advantage in doing PCA.
One can do PCA on either the correlation matrix or the covariance matrix. When the correlation matrix is used, the component correlation vectors cⱼ = √λⱼ aⱼ give the correlations between the (standardized) original variables and the jth principal component variable.
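A hedged sketch of the difference, with invented variables on deliberately different scales (the variable names and numbers are mine): prcomp(X) uses the covariance matrix, while scale. = TRUE gives correlation-matrix PCA.

set.seed(6)
X <- cbind(height_cm = rnorm(40, 170, 10),
           weight_kg = rnorm(40, 70, 12),
           income    = rnorm(40, 50000, 15000))   # very different measurement scales
prcomp(X)$rotation                 # covariance PCA: dominated by the large-variance variable
pc <- prcomp(X, scale. = TRUE)     # correlation-matrix PCA: variables standardized first
pc$rotation
cor(X, pc$x)                       # correlations between original variables and the PC variables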

PCA and Multidimensional Scaling
Essentially what PCA does is what is called the SINGULAR VALUE DECOMPOSITION (SVD) of a matrix, X = UDV′, where:
– X is n by p, with n << p (in MA data)
– U is n by n
– D is n by n, a diagonal matrix with decreasing diagonal entries d₁ ≥ d₂ ≥ … ≥ dₙ
– V is a p by n matrix, which rotates X into a new set of coordinates, such that XV = UD
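A small sketch of the decomposition using svd() in base R, on a made-up n << p matrix (5 "samples" by 20 "genes"); the checks confirm X = UDV′ and XV = UD up to rounding error:

set.seed(7)
X <- matrix(rnorm(5 * 20), nrow = 5)           # n = 5 samples, p = 20 genes (n << p)
s <- svd(X)                                    # X = U D V'
c(dim(s$u), length(s$d), dim(s$v))             # 5 x 5, 5, 20 x 5
max(abs(X %*% s$v - s$u %*% diag(s$d)))        # XV = UD, up to floating-point error
max(abs(X - s$u %*% diag(s$d) %*% t(s$v)))     # reconstruction X = UDV'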

SVD and MDS
SVD is a VERY memory-hungry procedure; for MA data with a large number of genes it is slow and often needs HUGE amounts of memory to work.
Multidimensional Scaling (MDS) is a collection of methods that do not use the full data matrix but rather the matrix of distances between the observations (samples). This reduces the computation from n by p to n by n (quite a reduction!).
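One member of that collection is classical (metric) MDS, available in base R as cmdscale(); the sketch below uses invented data and only ever works with the n by n distance object:

set.seed(8)
X   <- matrix(rnorm(30 * 10), nrow = 30)       # 30 samples, 10 variables
d   <- dist(X)                                 # pairwise distances between the 30 samples
mds <- cmdscale(d, k = 2)                      # 2-D configuration computed from distances alone
plot(mds, xlab = "Coordinate 1", ylab = "Coordinate 2",
     main = "Classical MDS of the sample distance matrix")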

Sammon Mapping
A common method used with MA data is SAMMON mapping, which aims to find the two-dimensional representation whose dissimilarity matrix matches the original one as closely as possible.
PCA has the advantage that it represents the samples in a scatterplot whose axes are linear combinations of the most variable genes. Sammon mapping treats all genes equivalently and hence is a bit "duller" than PCA-based clustering.

PCA in Microarrays
A useful technique for understanding the TRUE dimensionality of the data, and useful for clustering. In R (the sammon() function lives in the MASS package) you can use:

library(MASS)                                    # for sammon()
my.data1 <- read.table("cluster.csv", header = TRUE, sep = ",")
princomp(my.data1)                               # PCA of the data
myd.sam <- sammon(dist(my.data1))                # Sammon mapping of the sample distance matrix
plot(myd.sam$points, type = "n")                 # set up an empty 2-D plot of the coordinates
text(myd.sam$points, labels = as.character(1:nrow(my.data1)))  # label each sample by row number