Introduction to Statistical Methods for Measuring “Omics” and Field Data: PCA, PCoA, distance measures, AMOVA


1 Introduction to Statistical Methods for Measuring “Omics” and Field Data
PCA, PCoA, distance measures, AMOVA

2 Outline
Covariance matrix and correlation matrix
Matrix determinant
Identity matrix
Eigenvalues
Eigenvectors
Principal Component Analysis (PCA)
Distance measures
Principal Coordinate Analysis (PCoA)
Analysis of Molecular Variance (AMOVA)

3 Covariance matrix and correlation matrix
Example data: ten observations of two variables, X and Y.

Variable Number        X          Y
 1                53,047     62,490
 2                49,958     58,850
 3                41,974     49,445
 4                44,366     52,263
 5                40,470     47,674
 6                36,963     43,542
 7                31,474     75,113
 8                54,376     72,265
 9                60,880     98,675
10                66,774    104,543

From these data a 2 x 2 covariance matrix and a 2 x 2 correlation matrix of X and Y are computed, each with rows and columns labelled X and Y.
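The deck ends with an RStudio slide, so here is a minimal R sketch of this step: the covariance and correlation matrices of X and Y computed with cov() and cor(). The x and y vectors simply repeat the ten values in the table above.

```r
# Covariance and correlation matrices for the X/Y data in the table above.
x <- c(53047, 49958, 41974, 44366, 40470, 36963, 31474, 54376, 60880, 66774)
y <- c(62490, 58850, 49445, 52263, 47674, 43542, 75113, 72265, 98675, 104543)

xy <- cbind(X = x, Y = y)
cov(xy)  # 2 x 2 covariance matrix of X and Y
cor(xy)  # 2 x 2 correlation matrix of X and Y
```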

4 Matrix determinant
The determinant of a matrix A is written |A|.
For a 2 x 2 matrix with elements a11, a12, a21, a22:
determinant = (a11 x a22) - (a12 x a21)
The same formula gives the determinant of the 2 x 2 covariance matrix of X and Y; because the variances and covariance of these data are large, the result is a very large number (on the order of 10^16).

5 Matrix determinant
The determinant of a matrix A is written |A|.
For a 2 x 2 matrix with elements a11, a12, a21, a22:
determinant = (a11 x a22) - (a12 x a21)
For the correlation matrix, the diagonal elements are 1.00 and the off-diagonal elements are the correlation r between X and Y, so:
determinant = (1.00 x 1.00) - (r x r) = 1 - r^2
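A short R continuation of the same example, reusing the x and y vectors from slide 3: det() gives the determinant of each 2 x 2 matrix.

```r
# Determinants of the covariance and correlation matrices of X and Y.
x <- c(53047, 49958, 41974, 44366, 40470, 36963, 31474, 54376, 60880, 66774)
y <- c(62490, 58850, 49445, 52263, 47674, 43542, 75113, 72265, 98675, 104543)
xy <- cbind(X = x, Y = y)

det(cov(xy))  # determinant of the covariance matrix (a very large number)
det(cor(xy))  # determinant of the correlation matrix (equals 1 - r^2)
```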

6 Identity matrix
The identity matrix, denoted I, is a square matrix with ones on the main (NW-SE) diagonal and zeros elsewhere; there is one for every size: 2 x 2, 3 x 3, 4 x 4, 5 x 5, and so on.
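In R, diag(n) returns the n x n identity matrix, which is a quick way to reproduce these examples.

```r
diag(2)  # 2 x 2 identity matrix
diag(5)  # 5 x 5 identity matrix
```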

7 Eigenvalues
Let A be a k x k square matrix and I the k x k identity matrix. The eigenvalues λ1, λ2, ..., λk satisfy the polynomial equation |A - λI| = 0, called the characteristic equation.
For the 2 x 2 example matrix A on this slide, the characteristic equation factors as (1 - λ)(3 - λ) = 0, so the eigenvalues are λ = 1 and λ = 3.
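The entries of the example matrix did not survive this transcript, so the sketch below uses an assumed 2 x 2 matrix chosen to reproduce the slide's eigenvalues (1 and 3); eigen() returns them directly, and det(A - λI) confirms each one satisfies the characteristic equation.

```r
# Assumed example matrix with eigenvalues 1 and 3 (not necessarily the slide's original matrix).
A <- matrix(c(1, 0,
              1, 3), nrow = 2, byrow = TRUE)

eigen(A)$values       # 3 and 1
det(A - 1 * diag(2))  # |A - 1*I| = 0, so 1 is an eigenvalue
det(A - 3 * diag(2))  # |A - 3*I| = 0, so 3 is an eigenvalue
```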

8 Eigenvectors
Let A be a k x k matrix and λ an eigenvalue of A. An eigenvector x of A associated with λ satisfies Ax = λx, which is solved as (A - λI)x = 0.
For the example matrix A, the eigenvectors are found by solving this system for the unknowns x1 and x2, once with λ = 1 and once with λ = 3.

9 Eigenvectors
For λ = 1 the associated eigenvector is (-2, 1); the eigenvector for λ = 3 is found in the same way from (A - 3I)x = 0.
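Continuing the assumed example matrix from above, eigen() also returns the eigenvectors as the columns of $vectors. They are rescaled to unit length, so the (-2, 1) eigenvector for λ = 1 appears as roughly (-0.89, 0.45), up to sign.

```r
A <- matrix(c(1, 0,
              1, 3), nrow = 2, byrow = TRUE)
e <- eigen(A)

e$vectors             # columns = eigenvectors (for lambda = 3, then lambda = 1)
A %*% e$vectors[, 2]  # equals 1 * e$vectors[, 2]  (A x = lambda x)
A %*% e$vectors[, 1]  # equals 3 * e$vectors[, 1]
```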

10 Principal Component Analysis

11 PCA: Principal Component Analysis
PCA is a mathematical procedure that transforms a set of variables into a smaller set of uncorrelated variables called principal components (PCs). These PCs are linear combinations of the original variables and can be thought of as “new” variables.
Uses of PCA:
a) Data screening (identifying outliers)
b) Clustering
c) Dimension reduction

12 PCA
From k original variables x1, x2, ..., xk, produce k new variables y1, y2, ..., yk:
y1 = a11x1 + a12x2 + ... + a1kxk
y2 = a21x1 + a22x2 + ... + a2kxk
...
yk = ak1x1 + ak2x2 + ... + akkxk
The y's are the principal components, and each coefficient vector (ai1, ai2, ..., aik) is an eigenvector of the covariance (or correlation) matrix.

13 PCA
Raw data: original variables
Covariance/correlation matrix
Determinant
Eigenvalues
Eigenvectors
Principal component scores = original variables x eigenvectors (a code sketch of this workflow follows below)
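A minimal sketch of this workflow in R, assuming the correlation-matrix route and reusing the X/Y data from slide 3: eigen-decompose the correlation matrix and multiply the standardized data by the eigenvectors to get the PC scores. Up to sign, the result matches prcomp(..., scale. = TRUE).

```r
x <- c(53047, 49958, 41974, 44366, 40470, 36963, 31474, 54376, 60880, 66774)
y <- c(62490, 58850, 49445, 52263, 47674, 43542, 75113, 72265, 98675, 104543)
dat <- scale(cbind(X = x, Y = y))          # standardized original variables

e      <- eigen(cor(cbind(X = x, Y = y)))  # eigenvalues and eigenvectors of the correlation matrix
scores <- dat %*% e$vectors                # PC scores = (standardized) variables x eigenvectors

e$values  # variance carried by each principal component
scores    # principal component scores for the ten observations
```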

14 PCA for clustering using PC scores
Protein       M1    M2    M3
Protein_X1   124    99   4.3
Protein_X2   106    67   7.5
Protein_X3   111    90   9.2
Protein_X4   109
Protein_X5   113   112   9.6
Protein_X6    89    72  10.1
Protein_X7          78   7.7
Protein_X8   190    87   6.8
Protein_X9   123    68   7.6
Protein_X10  116         7.8
Protein_X11
Protein_X12

[Figure: PC1 vs PC2 score plot of the PCA used to cluster the proteins measured on M1, M2 and M3]
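A hedged sketch of the clustering step using prcomp(). It keeps only the rows of the table whose M1/M2/M3 values are complete above, so the data frame is a reduced version of the slide's table, and the PC1/PC2 score plot stands in for the plot shown on the slide.

```r
prot <- data.frame(
  row.names = c("Protein_X1", "Protein_X2", "Protein_X3", "Protein_X5",
                "Protein_X6", "Protein_X8", "Protein_X9"),
  M1 = c(124, 106, 111, 113,   89, 190, 123),
  M2 = c( 99,  67,  90, 112,   72,  87,  68),
  M3 = c(4.3, 7.5, 9.2, 9.6, 10.1, 6.8, 7.6)
)

pc <- prcomp(prot, scale. = TRUE)           # PCA on standardized M1-M3
plot(pc$x[, 1], pc$x[, 2], pch = 19,
     xlab = "PC1", ylab = "PC2")            # PC score plot used for clustering
text(pc$x[, 1], pc$x[, 2], rownames(prot), pos = 3, cex = 0.7)
```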

15 Distance measures

16 Similarity and dissimilarity distances
Similarity-based distance measures: Euclidean distance, Manhattan distance
Dissimilarity-based distance measures (binary data): Jaccard, Dice

17 Euclidean Distance
Euclidean distance is the most commonly used distance measure; when people talk about “distance” without qualification they usually mean Euclidean distance. It is the square root of the sum of squared differences between the coordinates of a pair of objects. For example, the Euclidean distance between the points (1, 1) and (4, 5) is sqrt(3^2 + 4^2) = 5.

18 Euclidean Distance
Example: point A has coordinates (0, 3, 4, 5) and point B has coordinates (7, 6, 3, -1) on the features cost, time, weight and incentive.

Features    cost   time   weight   incentive
Plant A        0      3        4           5
Plant B        7      6        3          -1

Euclidean distance between A and B
= sqrt((0-7)^2 + (3-6)^2 + (4-3)^2 + (5-(-1))^2)
= sqrt(49 + 9 + 1 + 36) = sqrt(95) ≈ 9.75
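The same calculation in R, using the coordinates given in the example; dist() with method = "euclidean" reproduces the hand calculation.

```r
A <- c(cost = 0, time = 3, weight = 4, incentive = 5)
B <- c(cost = 7, time = 6, weight = 3, incentive = -1)

sqrt(sum((A - B)^2))                     # sqrt(49 + 9 + 1 + 36) = sqrt(95) ~ 9.75
dist(rbind(A, B), method = "euclidean")  # same result from dist()
```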

19 Manhattan Distance
The Manhattan distance, also known as boxcar or absolute value distance, sums the absolute differences between the coordinates of a pair of objects. For the same two plants:

Features    cost   time   weight   incentive
Plant A        0      3        4           5
Plant B        7      6        3          -1

Manhattan distance = |0-7| + |3-6| + |4-3| + |5-(-1)| = 7 + 3 + 1 + 6 = 17
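And the Manhattan version of the same comparison in R:

```r
A <- c(0, 3, 4, 5)
B <- c(7, 6, 3, -1)

sum(abs(A - B))                          # |0-7| + |3-6| + |4-3| + |5-(-1)| = 17
dist(rbind(A, B), method = "manhattan")  # same result from dist()
```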

20 Dissimilarity distance
[Table: 0/1 scores of two individuals, i and j, at Marker1 through Marker7]

21 Genetic distance
Two samples are scored 0/1 at each of seven markers. Counting matches and mismatches across markers:
Fa = markers scored 1 in both samples = 3
Fb = markers scored 1 only in sample 1 = 1
Fc = markers scored 1 only in sample 2 = 2
Fd = markers scored 0 in both samples = 1
N = Fa + Fb + Fc + Fd = 7
Simple match = Fa / N = 3/7 = 0.43
Genetic distance (Jaccard) = Fa / (Fa + Fb + Fc) = 3/6 = 0.5
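The individual marker scores did not come through in this transcript, so s1 and s2 below are assumed binary vectors chosen only to reproduce the slide's counts (Fa = 3, Fb = 1, Fc = 2, Fd = 1); the calculation itself follows the slide's formulas.

```r
s1 <- c(1, 1, 1, 1, 0, 0, 0)  # sample 1 across 7 markers (assumed values)
s2 <- c(1, 1, 1, 0, 1, 1, 0)  # sample 2 across 7 markers (assumed values)

Fa <- sum(s1 == 1 & s2 == 1)  # band present in both samples
Fb <- sum(s1 == 1 & s2 == 0)  # present only in sample 1
Fc <- sum(s1 == 0 & s2 == 1)  # present only in sample 2
Fd <- sum(s1 == 0 & s2 == 0)  # absent in both samples
N  <- Fa + Fb + Fc + Fd

Fa / N               # simple match value from the slide: 3/7 = 0.43
Fa / (Fa + Fb + Fc)  # Jaccard coefficient: 3/6 = 0.5
```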

22 Distance Matrix
A distance matrix gives the distance between each pair of elements:

      A     B     C     D     E
A     0
B    63     0
C    94   111     0
D    67    79    96     0
E    16    47    83   100     0
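In R the same matrix can be assembled and stored as a dist object, assuming the ten slide values fill the lower triangle row by row:

```r
d <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
d[lower.tri(d)] <- c(63, 94, 67, 16,  # distances from A to B, C, D, E
                     111, 79, 47,     # from B to C, D, E
                     96, 83,          # from C to D, E
                     100)             # from D to E
as.dist(d)  # pairwise distance object used by clustering and PCoA functions
```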

23 Principal Coordinate Analysis

24 Principal Coordinate Analysis (PCoA)
Raw data: original variables
Distance matrix between samples
Determinant
Eigenvalues
Eigenvectors
Principal coordinate scores, obtained from the eigenvectors of the (double-centred) distance matrix
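A minimal PCoA sketch in R: classical multidimensional scaling with cmdscale() on the 5 x 5 distance matrix from slide 22 (rebuilt here under the same lower-triangle assumption) returns the principal coordinate scores.

```r
d <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
d[lower.tri(d)] <- c(63, 94, 67, 16, 111, 79, 47, 96, 83, 100)

pcoa <- cmdscale(as.dist(d), k = 2, eig = TRUE)
pcoa$points  # principal coordinate scores on axes 1 and 2
pcoa$eig     # eigenvalues of the centred distance matrix
plot(pcoa$points, pch = 19, xlab = "PCoA 1", ylab = "PCoA 2")
text(pcoa$points, labels = LETTERS[1:5], pos = 3)
```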

25 Examples of PCoA plots

26 Analysis of molecular variance (AMOVA)

27 Analysis of molecular variance (AMOVA)
AMOVA is used to detect population differentiation from molecular markers. It operates on a distance matrix. The P-value is obtained by permutation: the rows and columns of the distance matrix are randomized (e.g. 1,000 times) and the P-value is the fraction of randomizations giving a result at least as extreme as the observed one. Unlike ANOVA, it does not require the assumption of normality.

28 AMOVA
An example of an AMOVA model: it partitions gene diversity among populations, with specific reference to areas of a region within a continent. Indices: i = individuals, j = alleles, k = populations.
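One way to run such a model in R is the amova() function of the pegas package (ade4 and poppr offer alternatives); the sketch below is an assumed toy example, with made-up marker distances and made-up region/population factors, not the data behind this slide.

```r
library(pegas)

set.seed(1)
# Toy data: 20 individuals scored at 30 binary markers, nested in 2 regions x 2 populations each.
geno_dist  <- dist(matrix(rbinom(20 * 30, 1, 0.5), nrow = 20))
region     <- gl(2, 10, labels = c("North", "South"))
population <- gl(4, 5, labels = paste0("Pop", 1:4))

# Hierarchical AMOVA: among regions, among populations within regions, within populations;
# significance comes from randomizing the distance matrix (here 1000 permutations).
amova(geno_dist ~ region/population, nperm = 1000)
```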

29 RStudio

