Université d’Ottawa / University of Ottawa 2001 Bio 8100s Applied Multivariate Biostatistics L3.1 Lecture 3: A brief background to multivariate statistics.


1 Lecture 3: A brief background to multivariate statistics
• Univariate versus multivariate statistics
• The material of multivariate analysis
• Displaying multivariate data
• The uses of multivariate statistics
• A refresher of matrix algebra

2 Multivariate versus univariate statistics
• In univariate statistical analysis, we are concerned with analyzing variation in a single random variable.
• In multivariate statistical analysis, we are concerned with analyzing variation in several random variables, which may or may not be related.

3 The material of multivariate analysis
• Multivariate data consists of a set of measurements (usually related) of P variables X_1, X_2, …, X_P on n sample units.
• The variables X_j may be ratio, ordinal, or nominal.

4 Example 1: Bumpus’ sparrow data
• 5 morphological measurements (in mm) of 49 sparrows recovered from a storm in

5 Example 2: Biodiversity of SE Ontario wetlands
• Species richness (number of species) of 5 different taxa in 57 wetlands in southeastern Ontario...

6 The material of multivariate analysis
• In some applications, the measured variables comprise both dependent (Y) and independent (X) variables.

7 Example 1: Pgi frequencies in California Euphydryas editha colonies in relation to environmental factors.

8 Example 2: Anurans in SE Ontario wetlands in relation to surrounding forest cover and road densities.

9 Multivariate LS estimators
• The vector of sample means, variances and covariances is an estimate of the true (“population”) means, variances and covariances.
• As such, inferences about the latter based on the former assume random sampling.
[Figure labels: Population, Sample]

10 The sample covariance matrix
• The sample covariance matrix is a square matrix whose diagonal elements give the sample variances for each measured variable (s_i^2), and whose off-diagonal elements are the sample covariances between pairs of variables (c_ik).

11 A review of matrix algebra
• A matrix of size m × n is an array of numbers (either real or complex) with m rows and n columns.
• Matrices with one column are column vectors; matrices with one row are row vectors.

12 Special matrices
• A zero matrix 0 has all elements equal to zero.
• A diagonal matrix T is a square matrix (m = n) with all off-diagonal elements equal to zero.
• An identity matrix I is a diagonal matrix with all diagonal terms equal to one.
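The three special matrices can be built directly in NumPy. This is a minimal sketch; the sizes and diagonal values are arbitrary choices for illustration, not taken from the lecture.

```python
import numpy as np

zero = np.zeros((2, 3))          # zero matrix: every element is 0
T = np.diag([2.0, 5.0, 7.0])     # diagonal matrix: off-diagonal elements are all 0
I = np.eye(3)                    # identity matrix: diagonal elements are all 1

# Multiplying by the identity leaves a conformable matrix unchanged.
assert np.array_equal(T @ I, T)
assert np.array_equal(I @ T, T)
```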

13 Matrix operations
• The transpose of a matrix A (A^T) is obtained by interchanging rows and columns.
• The transpose of a row vector is a column vector, and the transpose of a column vector is a row vector.

14 The trace of a matrix
• The trace of a matrix A, denoted tr(A), is the sum of the diagonal elements.
• The trace is defined only for square matrices.

15 Matrix addition and subtraction
• Two matrices A and B are conformable for addition if they are of the same size (same numbers of rows and columns).
• The resulting matrix A + B (A - B) is obtained by adding (subtracting) individual matrix elements.

16 Matrix multiplication by a scalar
• The multiplication of a matrix A by a scalar k involves multiplying each element of A by k.

17 Matrix multiplication
• Two matrices A (m × n) and B (n × p) are conformable for multiplication (A B) if the number of columns in A equals the number of rows in B.
• A B and B A are both defined only when A is m × n and B is n × m (in particular, when both are square and of the same size), but even then, in general A B ≠ B A.
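A short NumPy sketch of conformability and non-commutativity follows. The matrices A, B, P and Q are made-up values chosen only to make the arithmetic easy to follow.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])           # 3 x 2

AB = A @ B   # (2 x 3)(3 x 2) gives a 2 x 2 matrix
BA = B @ A   # (3 x 2)(2 x 3) gives a 3 x 3 matrix

# Square matrices of the same size: both products exist but usually differ.
P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])
PQ = P @ Q   # [[2, 1], [4, 3]]
QP = Q @ P   # [[3, 4], [1, 2]]
```

Here Q swaps the columns of P when applied on the right and the rows of P when applied on the left, which is why the two products differ.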

18 Matrix inversion
• The inverse of a matrix A, denoted A^-1, is the matrix solving the matrix equation A A^-1 = A^-1 A = I, where I is the identity matrix.
• Only square matrices are invertible, and some square matrices cannot be inverted (“singular” matrices).
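As a hedged illustration, the sketch below inverts a small matrix with NumPy and shows what happens with a singular one. Both matrices are invented for the example.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A_inv = np.linalg.inv(A)

# A times its inverse recovers the identity (up to floating-point error).
assert np.allclose(A @ A_inv, np.eye(2))
assert np.allclose(A_inv @ A, np.eye(2))

# A singular matrix: the second row is twice the first, so no inverse exists.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
    singular = False
except np.linalg.LinAlgError:
    singular = True
```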

19 The covariance matrix
• A multivariate sample is described by a covariance matrix, whose diagonal elements give the sample variances for each measured variable (s_i^2), and whose off-diagonal elements are the sample covariances between pairs of variables (c_ik).

20 Calculating the sample covariance matrix
[Worked matrices lost in extraction. With X_d the matrix of deviations of each observation from its variable mean, the sample covariance matrix is C = X_d^T X_d / (n - 1).]
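A minimal sketch of the calculation, assuming the usual deviation-matrix form C = X_d^T X_d / (n - 1). The data values are hypothetical and are not the ones shown on the original slide.

```python
import numpy as np

# Hypothetical data: n = 5 sample units (rows), p = 3 variables (columns).
X = np.array([[ 2.0, 1.0, 5.0],
              [ 4.0, 3.0, 4.0],
              [ 6.0, 2.0, 6.0],
              [ 8.0, 5.0, 3.0],
              [10.0, 4.0, 7.0]])

n = X.shape[0]
X_d = X - X.mean(axis=0)        # deviations from the column (variable) means
C = X_d.T @ X_d / (n - 1)       # p x p sample covariance matrix

# NumPy's built-in estimator agrees when variables are in columns.
assert np.allclose(C, np.cov(X, rowvar=False))
```

The diagonal of C holds the sample variances and the off-diagonal entries hold the covariances, so C is symmetric by construction.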

21 The determinant of a matrix: 2 × 2 matrices
• The determinant of a matrix A, denoted det(A) or |A|, is a unique number associated with every square matrix. For a 2 × 2 matrix with rows (a, b) and (c, d), |A| = ad - bc.
• In multivariate statistics, the determinant of the sample covariance matrix C plays a crucial role in hypothesis testing.

22 Matrix inversion and the determinant: 2 × 2 matrices
• If a 2 × 2 matrix A is invertible, the elements of its inverse A^-1 are obtained by dividing modified elements of A by |A|.
• Hence, if |A| = 0, the division is undefined and the matrix is non-invertible or singular.
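For the 2 × 2 case the "modified elements" are the diagonal entries swapped and the off-diagonal entries negated. The sketch below spells this out in plain Python; the function names and the example matrix are illustrative choices, not part of the lecture.

```python
def det_2x2(a):
    """Determinant of a 2 x 2 matrix given as [[a11, a12], [a21, a22]]."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix: swap the diagonal, negate the off-diagonal, divide by |A|."""
    d = det_2x2(a)
    if d == 0:
        raise ValueError("matrix is singular: determinant is zero")
    return [[ a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d,  a[0][0] / d]]


A = [[4.0, 7.0],
     [2.0, 6.0]]
A_inv = inverse_2x2(A)   # |A| = 4*6 - 7*2 = 10
```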

23 Multivariate variance: a geometric interpretation
• Univariate variance is a measure of the “volume” occupied by sample points in one dimension.
• Multivariate variance involving m variables is the volume occupied by sample points in an m-dimensional space.
[Figure: samples with larger and smaller variance along X; occupied volume in the (X_1, X_2) plane]

24 Multivariate variance: effects of correlations among variables
• Correlations between pairs of variables reduce the volume occupied by sample points…
• …and hence reduce the multivariate variance.
[Figure: occupied volume in the (X_1, X_2) plane for three panels: no correlation, positive correlation, negative correlation]

25 C and the generalized multivariate variance
• The determinant of the sample covariance matrix C is a generalized multivariate variance…
• …because the squared area of a parallelogram with sides given by the individual standard deviations and angle determined by the correlation between variables (cos θ = r) equals the determinant of C.
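The parallelogram statement can be checked numerically for two variables. This is a sketch with a hypothetical bivariate sample, taking the angle θ to satisfy cos θ = r, so that area² = s_1² s_2² (1 - r²) = |C|.

```python
import numpy as np

# Hypothetical bivariate sample (rows = sample units).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])

C = np.cov(X, rowvar=False)
s1, s2 = np.sqrt(np.diag(C))          # standard deviations of the two variables
r = C[0, 1] / (s1 * s2)               # correlation coefficient

theta = np.arccos(r)                  # angle between the sides of the parallelogram
area = s1 * s2 * np.sin(theta)

assert np.isclose(area**2, np.linalg.det(C))
assert np.isclose(np.linalg.det(C), s1**2 * s2**2 * (1 - r**2))
```

As |r| approaches 1 the parallelogram collapses and |C| approaches zero, which matches the earlier point that correlation reduces the occupied volume.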

26 The use of determinants in multivariate analysis
• For a univariate sample variance s_a^2, the multivariate analog is the determinant of the corresponding sample covariance matrix C_a, i.e., |C_a|…
• …and these generalized variances are often used in the calculation of multivariate test statistics, e.g., Wilks’ Λ.
[Figure: univariate single-classification ANOVA (k groups) compared with multivariate single-classification ANOVA (MANOVA)]
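The exact formulas on the original slide are not recoverable from the transcript. As a hedged sketch, the code below uses the standard definition of Wilks' Λ as the ratio of determinants |W| / |T|, where W is the pooled within-group and T the total sums-of-squares-and-cross-products matrix. The helper names and the two groups of data are hypothetical.

```python
import numpy as np


def sscp(X):
    """Sums-of-squares-and-cross-products matrix of deviations from the column means."""
    X_d = X - X.mean(axis=0)
    return X_d.T @ X_d


def wilks_lambda(groups):
    """Wilks' lambda = |W| / |T| for a list of (n_i x p) group data matrices."""
    W = sum(sscp(g) for g in groups)      # pooled within-group SSCP
    T = sscp(np.vstack(groups))           # total SSCP, ignoring group membership
    return np.linalg.det(W) / np.linalg.det(T)


# Hypothetical data: two groups, two variables.
g1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 2.5], [2.0, 1.5]])
g2 = np.array([[6.0, 7.0], [7.0, 6.0], [8.0, 8.5], [7.5, 7.0]])
lam = wilks_lambda([g1, g2])
```

Values of Λ near 0 indicate that most of the total variation lies between groups; values near 1 indicate little separation among group means.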

27 Eigenvalues
• The eigenvalues of a p × p matrix A are the p solutions λ, some of which may be zero, to the equation |A - λI| = 0.
• The trace of a matrix is the sum of its eigenvalues…
• …and the determinant of a matrix is the product of its eigenvalues.
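These identities are easy to verify numerically. The sketch below uses an arbitrary symmetric 2 × 2 matrix chosen for illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
eigenvalues = np.linalg.eigvals(A)

# The trace is the sum of the eigenvalues; the determinant is their product.
assert np.isclose(eigenvalues.sum(), np.trace(A))
assert np.isclose(eigenvalues.prod(), np.linalg.det(A))

# Each eigenvalue solves the characteristic equation |A - lambda I| = 0.
for lam in eigenvalues:
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0)
```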

28 Eigenvalues and eigenvectors I
• Suppose v is a vector, and L a linear transformation. If L(v) = λv, then v is an eigenvector of L associated with the eigenvalue λ.
• E.g., if L is the reflection in the line y = mx, then a vector along the line, such as (1, m), is an eigenvector associated with eigenvalue 1, and a vector perpendicular to the line, such as (-m, 1), is an eigenvector associated with eigenvalue -1. Note that these two eigenvectors are orthogonal!
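The reflection example can be checked with the standard matrix for reflection in the line y = mx. This is a sketch: the function name and the slope m = 2 are arbitrary choices for illustration.

```python
import numpy as np


def reflection_matrix(m):
    """Matrix of the reflection in the line y = m x."""
    return np.array([[1 - m**2, 2 * m],
                     [2 * m, m**2 - 1]]) / (1 + m**2)


m = 2.0
L = reflection_matrix(m)
along = np.array([1.0, m])      # direction of the line: left unchanged by L
across = np.array([-m, 1.0])    # perpendicular to the line: flipped by L

assert np.allclose(L @ along, along)       # eigenvalue 1
assert np.allclose(L @ across, -across)    # eigenvalue -1
assert np.isclose(along @ across, 0.0)     # the two eigenvectors are orthogonal
```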

29 Eigenvalues and eigenvectors of C
• Eigenvectors of the covariance matrix C are orthogonal directed line segments that “span” the variation in the data, and the corresponding eigenvalues are the variances along these directions (the squared lengths of the segments).
• …so the product of the eigenvalues measures the (squared) “volume” occupied by the data, i.e. the determinant of the covariance matrix.
[Figure: eigenvector axes in the (X_1, X_2) plane for three panels: no correlation, positive correlation, negative correlation]
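A hedged sketch of the same idea on simulated data: the sample, the seed and the 0.8 / 0.6 mixing weights are assumptions made only to produce a positively correlated pair of variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical positively correlated bivariate sample.
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=200)
C = np.cov(np.column_stack([x1, x2]), rowvar=False)

# eigh is intended for symmetric matrices; the columns of `eigenvectors` are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(C)

assert np.allclose(eigenvectors.T @ eigenvectors, np.eye(2))   # orthogonal axes
assert np.isclose(eigenvalues.prod(), np.linalg.det(C))        # product = generalized variance
assert np.isclose(eigenvalues.sum(), np.trace(C))              # sum = total variance
```

With correlated variables one eigenvalue is much larger than the other: most of the variation lies along a single axis, and the product of the eigenvalues shrinks accordingly.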

30 Displaying multivariate data I: Draftsman’s plots (SPLOM)
• Plot pairs of variables against one another.
• Advantages: need only 2 plotting dimensions; bivariate relationships among variables are clear.
• Problems: no direct information on relationships in more than 2 dimensions; relationships between objects unclear.

31 Displaying multivariate data II: multiple 3-D plots
• Plot 3 variables against one another.
• Advantages: trivariate relationships among variables are clear.
• Problems: no direct information on relationships in more than 3 dimensions; relationships between objects unclear.

32 Displaying multivariate data III: plotting index variables
• Generate index variables that combine information from several measured variables, then plot these variables.
• Advantages: 2-D plots make relationships among variables clear.
• Disadvantages: relationships among objects unclear; key information may be lost in data reduction.

33 Displaying multivariate data IV: Icon plots
• Used to visualize relationships among objects, e.g. different canine groups.
• Advantages: all variables displayed simultaneously.
• Problems: order of display of variables arbitrary, and impressions may depend on order. Relationships among variables may be unclear.
[Figure: icon plots for Cuon, Dingo, Prehistoric dog, Chinese wolf, Golden jackal and Modern dog, with rays X_1 to X_6]

34 Displaying multivariate data V: profile plots
• Represent objects by lines, histograms or Fourier plots.
• Advantages: all variables displayed simultaneously.
• Problems: order of display of variables arbitrary, and impressions may depend on order. Relationships among variables may be unclear.

