Introduction to Statistical Methods for Measuring “Omics” and Field Data: PCA, PCoA, distance measures, AMOVA

Outline
- Covariance matrix and correlation matrix
- Matrix determinant
- Identity matrix
- Eigenvalues
- Eigenvectors
- Principal Component Analysis (PCA)
- Distance measures
- Principal Coordinate Analysis (PCoA)
- Analysis of Molecular Variance (AMOVA)

Covariance matrix and correlation matrix
Data (10 observations of two variables, X and Y):

    Observation      X         Y
    1             53,047    62,490
    2             49,958    58,850
    3             41,974    49,445
    4             44,366    52,263
    5             40,470    47,674
    6             36,963    43,542
    7             31,474    75,113
    8             54,376    72,265
    9             60,880    98,675
    10            66,774   104,543

Covariance matrix:
          X            Y
    X  121050272   170833020
    Y  170833020   448841482

Correlation matrix:
          X           Y
    X  1.0000000   0.7328961
    Y  0.7328961   1.0000000
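As a quick check, a minimal base R sketch (assuming the ten X/Y observations above, entered without the thousands separators) reproduces both matrices:

    x <- c(53047, 49958, 41974, 44366, 40470, 36963, 31474, 54376, 60880, 66774)
    y <- c(62490, 58850, 49445, 52263, 47674, 43542, 75113, 72265, 98675, 104543)
    dat <- cbind(X = x, Y = y)
    cov(dat)   # covariance matrix (n - 1 denominator), as shown above
    cor(dat)   # correlation matrix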

Matrix determinant
The determinant of a matrix A is written |A|. For a 2 × 2 matrix

    a11  a12
    a21  a22

the determinant is (a11 × a22) - (a12 × a21).

Applied to the covariance matrix above:

          X            Y
    X  121050272   170833020
    Y  170833020   448841482

    determinant = (121050272 × 448841482) - (170833020 × 170833020)
    determinant = 2.51485 × 10^16

Matrix determinant
Using the same 2 × 2 formula for the correlation matrix:

          X           Y
    X  1.0000000   0.7328961
    Y  0.7328961   1.0000000

    determinant = (1.0000000 × 1.0000000) - (0.7328961 × 0.7328961)
    determinant = 0.462863307
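The same determinants can be checked in R with det():

    C <- matrix(c(121050272, 170833020, 170833020, 448841482), nrow = 2)  # covariance matrix
    R <- matrix(c(1, 0.7328961, 0.7328961, 1), nrow = 2)                  # correlation matrix
    det(C)   # about 2.51485e16
    det(R)   # about 0.4628633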

Identity matrix
The identity matrix, denoted I, is a square matrix with ones on the main (NW-SE) diagonal and zeros elsewhere. For example:

    I (2 × 2) =  1 0
                 0 1

    I (3 × 3) =  1 0 0
                 0 1 0
                 0 0 1

    I (4 × 4) =  1 0 0 0
                 0 1 0 0
                 0 0 1 0
                 0 0 0 1

    I (5 × 5) =  1 0 0 0 0
                 0 1 0 0 0
                 0 0 1 0 0
                 0 0 0 1 0
                 0 0 0 0 1

Eigenvalues
Let A be a k × k square matrix and I the k × k identity matrix. The eigenvalues λ1, λ2, ..., λk are the solutions of the characteristic equation

    |A - λI| = 0

For example, with

    A =  1 0
         1 3

    |A - λI| = (1 - λ)(3 - λ) = 0

so the eigenvalues are λ = 1 and λ = 3.

Eigenvectors
Let A be a k × k matrix and λ an eigenvalue of A. An eigenvector x of A associated with λ satisfies

    Ax = λx

For the example matrix

    A =  1 0
         1 3

with eigenvalues λ = 1 and λ = 3, the eigenvectors x = (x1, x2)' are found by solving Ax = 1x for λ = 1 and Ax = 3x for λ = 3 (equivalently, (A - λI)x = 0).

Eigenvectors
Solving these equations gives the eigenvectors associated with the eigenvalues 1 and 3:

    for λ = 1, an eigenvector is (-2, 1)'
    for λ = 3, an eigenvector is (0, 1)'

Any nonzero multiple of these vectors is also an eigenvector for the corresponding eigenvalue.
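In R, eigen() performs this calculation; note that it rescales eigenvectors to unit length, so the returned vectors are proportional to (0, 1) and (-2, 1):

    A <- matrix(c(1, 1, 0, 3), nrow = 2)   # the example matrix (columns 1,1 and 0,3)
    e <- eigen(A)
    e$values    # eigenvalues 3 and 1
    e$vectors   # columns are unit-length eigenvectors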

Principal Component Analysis

PCA: Principal Component Analysis PCA is a mathematical procedure that transforms a set of variables into a smaller set of uncorrelated variables called principal components (PCs). These PCs are linear combinations of the original variables and can be thought of as “new” variables. Uses of PCA: a) data screening (identifying outliers), b) clustering, c) dimension reduction.

PCA
From k original variables x1, x2, ..., xk, produce k new variables y1, y2, ..., yk:

    y1 = a11 x1 + a12 x2 + ... + a1k xk
    y2 = a21 x1 + a22 x2 + ... + a2k xk
    ...
    yk = ak1 x1 + ak2 x2 + ... + akk xk

The yk's are the principal components. Each coefficient vector (ai1, ai2, ..., aik) is an eigenvector of the covariance (or correlation) matrix; for example, a11 is the first element of the first eigenvector.

PCA
Workflow: raw data (original variables) → covariance/correlation matrix → determinant (characteristic equation) → eigenvalues → eigenvectors → principal component scores = original variables × eigenvectors
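A sketch of this workflow in base R, reusing the X and Y vectors defined earlier; prcomp() is included as a cross-check (its scores match the manual calculation up to sign):

    dat <- cbind(X = x, Y = y)                         # x, y as defined above
    S <- cov(dat)                                      # covariance matrix
    e <- eigen(S)                                      # eigenvalues and eigenvectors
    scores <- scale(dat, scale = FALSE) %*% e$vectors  # centered data x eigenvectors
    head(scores)                                       # principal component scores
    head(prcomp(dat)$x)                                # same scores (up to sign)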

PCA for clustering using PC scores

    Protein        M1    M2    M3
    Protein_X1    124    99    4.3
    Protein_X2    106    67    7.5
    Protein_X3    111    90    9.2
    Protein_X4    109
    Protein_X5    113   112    9.6
    Protein_X6     89    72   10.1
    Protein_X7           78    7.7
    Protein_X8    190    87    6.8
    Protein_X9    123    68    7.6
    Protein_X10   116          7.8
    Protein_X11
    Protein_X12

The M1-M3 measurements are run through PCA, and the resulting PC scores are plotted (PC1 against PC2) to look for clusters of proteins.

Distance measures

Similarity and dissimilarity measures
Distance (dissimilarity) measures for quantitative data:
- Euclidean distance
- Manhattan distance
Similarity coefficients for binary data (often converted to a dissimilarity as 1 - similarity):
- Jaccard
- Dice

Euclidean Distance
Euclidean distance is the most commonly used distance measure; in most cases, when people talk about "distance" they mean Euclidean distance. It is the square root of the sum of squared differences between the coordinates of a pair of objects. For example, the distance between the points (1, 1) and (4, 5) is sqrt((4 - 1)^2 + (5 - 1)^2) = 5.

Euclidean Distance
Example: point A has coordinates (0, 3, 4, 5) and point B has coordinates (7, 6, 3, -1):

    Feature     cost   time   weight   incentive
    Plant A       0      3       4         5
    Plant B       7      6       3        -1

The Euclidean distance between A and B is

    sqrt((0-7)^2 + (3-6)^2 + (4-3)^2 + (5-(-1))^2) = sqrt(95) ≈ 9.75

Manhattan Distance
The Manhattan distance (also known as boxcar distance or absolute value distance) is the sum of the absolute differences between the coordinates of a pair of objects. For the same two points:

    Feature     cost   time   weight   incentive
    Plant A       0      3       4         5
    Plant B       7      6       3        -1

    Manhattan distance = |0-7| + |3-6| + |4-3| + |5-(-1)| = 17
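Both distances can be computed directly in R, either by hand or with dist():

    A <- c(cost = 0, time = 3, weight = 4, incentive = 5)
    B <- c(cost = 7, time = 6, weight = 3, incentive = -1)
    sqrt(sum((A - B)^2))                      # Euclidean distance: sqrt(95), about 9.75
    sum(abs(A - B))                           # Manhattan distance: 17
    dist(rbind(A, B), method = "euclidean")   # same values via dist()
    dist(rbind(A, B), method = "manhattan")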

Dissimilarity distance
Example setup: two samples, i and j, are scored (1/0) across seven markers (Marker1-Marker7); their scores are compared marker by marker to compute a dissimilarity.

Genetic distance
Two samples are scored for presence (1) or absence (0) at each marker. Counting over all markers:

    Fa = number of markers present in both samples
    Fb = number of markers present only in sample 1
    Fc = number of markers present only in sample 2
    Fd = number of markers absent in both samples
    N  = Fa + Fb + Fc + Fd

For the example samples, Fa = 3, Fb = 1, Fc = 2 and Fd = 1, so N = 7, and

    Simple match = Fa / N = 3/7 ≈ 0.43
    Jaccard coefficient (genetic similarity) = Fa / (Fa + Fb + Fc) = 3/6 = 0.5

The corresponding Jaccard dissimilarity is 1 - 0.5 = 0.5.
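A small sketch in base R, using hypothetical 0/1 scores for the two samples chosen so that Fa = 3, Fb = 1, Fc = 2 and Fd = 1:

    s1 <- c(1, 1, 1, 1, 0, 0, 0)
    s2 <- c(1, 1, 1, 0, 1, 1, 0)
    fa <- sum(s1 == 1 & s2 == 1)              # 3: present in both
    fb <- sum(s1 == 1 & s2 == 0)              # 1: only in sample 1
    fc <- sum(s1 == 0 & s2 == 1)              # 2: only in sample 2
    fd <- sum(s1 == 0 & s2 == 0)              # 1: absent in both
    fa / (fa + fb + fc + fd)                  # simple match as defined above: 3/7
    fa / (fa + fb + fc)                       # Jaccard coefficient: 0.5
    dist(rbind(s1, s2), method = "binary")    # Jaccard dissimilarity: 1 - 0.5 = 0.5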

Distance Matrix
The distance matrix gives the distance between each pair of elements. For five elements A-E (lower triangle shown):

          A     B     C     D     E
    A     0
    B    63     0
    C    94   111     0
    D    67    79    96     0
    E    16    47    83   100     0
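Assuming the row-wise reading of the lower triangle above, the matrix can be entered in R as a dist object:

    d <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
    d[lower.tri(d)] <- c(63, 94, 67, 16, 111, 79, 47, 96, 83, 100)  # fills the lower triangle column by column
    D <- as.dist(d)   # distance object for elements A-E
    D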

Principal Coordinate Analysis

Principal Coordinate Analysis (PCoA)
Workflow: raw data (original variables) → distance matrix → determinant (characteristic equation) → eigenvalues → eigenvectors → principal coordinate scores (eigenvectors scaled by the square roots of their eigenvalues)
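A minimal PCoA sketch using base R's classical multidimensional scaling function cmdscale() (equivalent to PCoA), applied to the A-E distance matrix D built earlier:

    pco <- cmdscale(D, k = 2, eig = TRUE)    # keep the first two principal coordinates
    pco$points                               # principal coordinate scores for A-E
    pco$eig                                  # eigenvalues
    plot(pco$points, xlab = "PCo1", ylab = "PCo2", type = "n")
    text(pco$points, labels = LETTERS[1:5])  # PCoA plot labelled by element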

Examples of PCoA plots

Analysis of molecular variance (AMOVA)

Analysis of molecular variance (AMOVA)
AMOVA is used to detect population differentiation from molecular markers. It operates on a distance matrix. A P-value is obtained by permutation: the rows and columns of the distance matrix are randomized (e.g., 1000 times), and the P-value is the fraction of randomizations that give a result at least as extreme as the observed one. Unlike ANOVA, AMOVA does not require an assumption of normality.

AMOVA
An example of an AMOVA model: this model measures gene diversity among populations, with specific reference to areas of a region within a continent. Indices: i = individuals, j = alleles, k = populations.
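One way to run such an analysis in R is the amova() function in the pegas package. This is only a hedged sketch with synthetic, hypothetical data (the variables geno, d and pops are illustrative, and the exact arguments should be checked against the package documentation):

    library(pegas)
    set.seed(1)
    geno <- matrix(rnorm(40), nrow = 10)             # 10 hypothetical individuals, 4 variables
    d    <- dist(geno)                               # pairwise distance matrix between individuals
    pops <- factor(rep(c("pop1", "pop2"), each = 5)) # hypothetical population labels
    amova(d ~ pops, nperm = 1000)                    # variance components + permutation P-value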

RStudio