Lecture 3: A brief background to multivariate statistics

Lecture 3: A brief background to multivariate statistics
Univariate versus multivariate statistics
The material of multivariate analysis
Displaying multivariate data
The uses of multivariate statistics
A refresher of matrix algebra
Bio 8100s Applied Multivariate Biostatistics 2001

Multivariate versus univariate statistics
In univariate statistical analysis, we are concerned with analyzing variation in a single random variable. In multivariate statistical analysis, we are concerned with analyzing variation in several random variables, which may or may not be related.

The material of multivariate analysis
Multivariate data consist of a set of measurements (usually related) of P variables X1, X2, …, XP on n sample units. The variables Xj may be ratio, ordinal, or nominal.

Example 1: Bumpus’ sparrow data
5 morphological measurements (in mm) of 49 sparrows recovered from a storm in 1898. ...

Example 2: Biodiversity of SE Ontario wetlands
Species richness (number of species) of 5 different taxa in 57 wetlands in southeastern Ontario. ...

The material of multivariate analysis
In some applications, the measured variables comprise both dependent (Y) and independent (X) variables.

Example 1: Pgi frequencies in California Euphydryas editha colonies in relation to environmental factors.

Example 2: Anurans in SE Ontario wetlands in relation to surrounding forest cover and road densities

Multivariate LS estimators
The vector of sample means and the matrix of sample variances and covariances are estimates of the true (“population”) means, variances and covariances. As such, inferences about the latter based on the former assume random sampling.
(Figure labels: Population, Sample.)

The sample covariance matrix
The sample covariance matrix is a square matrix whose diagonal elements give the sample variances for each measured variable (s_i^2), and whose off-diagonal elements are the sample covariances between pairs of variables (c_ik).
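
Written out, the entries described above take the following form. This is a sketch that assumes the usual unbiased divisor n - 1, which the transcript does not state; x_ij (the value of variable i on sample unit j) and the barred means are notation introduced here for illustration.

```latex
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad
s_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} \left(x_{ij}-\bar{x}_i\right)^2, \qquad
c_{ik} = \frac{1}{n-1}\sum_{j=1}^{n} \left(x_{ij}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_k\right)

\mathbf{C} =
\begin{pmatrix}
s_1^2  & c_{12} & \cdots & c_{1P} \\
c_{21} & s_2^2  & \cdots & c_{2P} \\
\vdots & \vdots & \ddots & \vdots \\
c_{P1} & c_{P2} & \cdots & s_P^2
\end{pmatrix},
\qquad c_{ik} = c_{ki}
```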

A review of matrix algebra
A matrix of size m x n is an array of numbers (either real or complex) with m rows and n columns. Matrices with one column are column vectors; matrices with one row are row vectors.

Special matrices
A zero matrix 0 has all elements equal to zero. A diagonal matrix T is a square matrix (m = n) with all elements off the main diagonal equal to zero. An identity matrix I is a diagonal matrix with all diagonal terms equal to one.

Matrix operations
The transpose of a matrix A (A^T) is obtained by interchanging rows and columns. The transpose of a row vector is a column vector, and the transpose of a column vector is a row vector.

The trace of a matrix
The trace of a matrix A, denoted tr(A), is the sum of the diagonal elements. The trace is defined only for square matrices.
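
The transpose and the trace can be sketched in a few lines of Python. The helper names `transpose` and `trace`, and the example matrices, are hypothetical and chosen only for illustration; matrices are stored as lists of rows.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*a)]


def trace(a):
    """Return the sum of the diagonal elements of a square matrix."""
    if any(len(row) != len(a) for row in a):
        raise ValueError("the trace is defined only for square matrices")
    return sum(a[i][i] for i in range(len(a)))


# A 2 x 3 matrix becomes a 3 x 2 matrix when transposed.
A = [[1, 2, 3],
     [4, 5, 6]]
A_T = transpose(A)

# A row vector (1 x 3) becomes a column vector (3 x 1).
row_vector = [[1, 2, 3]]
column_vector = transpose(row_vector)

# The trace of a square matrix adds up its diagonal.
S = [[2, 7],
     [1, 5]]
S_trace = trace(S)

print(A_T, column_vector, S_trace)
```

Calling `trace(A)` on the 2 x 3 matrix raises a `ValueError`, mirroring the restriction to square matrices.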

Matrix addition and subtraction
Two matrices A and B are conformable for addition if they are of the same size (same numbers of rows and columns). The resulting matrix A + B (A - B) is obtained by adding (subtracting) individual matrix elements.

Matrix multiplication by a scalar
The multiplication of a matrix A by a scalar k involves multiplying each element of A by k.
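
Element-wise addition, subtraction and scalar multiplication might look as follows in Python. The function names `add`, `subtract` and `scale` and the example matrices are assumptions made for this sketch.

```python
def _same_size(a, b):
    """True when a and b have the same numbers of rows and columns."""
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def add(a, b):
    """Return A + B, element by element."""
    if not _same_size(a, b):
        raise ValueError("matrices are not conformable for addition")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def subtract(a, b):
    """Return A - B, element by element."""
    if not _same_size(a, b):
        raise ValueError("matrices are not conformable for subtraction")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(k, a):
    """Return kA: every element of A multiplied by the scalar k."""
    return [[k * x for x in row] for row in a]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

A_plus_B = add(A, B)
A_minus_B = subtract(A, B)
three_A = scale(3, A)

print(A_plus_B, A_minus_B, three_A)
```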

Matrix multiplication
Two matrices A (m x n) and B (n x p) are conformable for multiplication (A • B) if the number of columns in A equals the number of rows in B; the product is m x p. A • B and B • A are both defined only when B has the dimensions of A^T (for example, when A and B are square and of the same size), but even then, in general A • B ≠ B • A.
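
A small Python sketch of the row-by-column product, with a conformability check and a pair of 2 x 2 matrices for which the order of multiplication matters. The function name `matmul` and the example matrices are hypothetical.

```python
def matmul(a, b):
    """Return the product A • B of an m x n matrix and an n x p matrix."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("columns of A must equal rows of B")
    n, p = len(b), len(b[0])
    # Element (i, j) is the sum over k of a[i][k] * b[k][j].
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(p)]
            for row in a]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = matmul(A, B)  # swaps the columns of A
BA = matmul(B, A)  # swaps the rows of A

print(AB, BA, AB == BA)
```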

Matrix inversion
The inverse of a matrix A, denoted A^-1, is the matrix solving the matrix equation A • A^-1 = A^-1 • A = I, where I is the identity matrix. Only square matrices are invertible, and some square matrices cannot be inverted (“singular” matrices).

The covariance matrix
A multivariate sample is described by a covariance matrix, whose diagonal elements give the sample variances for each measured variable (s_i^2), and whose off-diagonal elements are the sample covariances between pairs of variables (c_ik).

Calculating the sample covariance matrix
(Worked numerical example; the matrices on this slide did not survive extraction.)
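
The calculation named on this slide can be sketched as follows. The data set is invented for illustration and is not the slide's example; the n - 1 divisor and the function names `mean_vector` and `covariance_matrix` are likewise assumptions.

```python
def mean_vector(data):
    """Return the vector of sample means, one entry per variable (column)."""
    n = len(data)
    return [sum(column) / n for column in zip(*data)]


def covariance_matrix(data):
    """Return the sample covariance matrix using the n - 1 divisor."""
    n = len(data)
    means = mean_vector(data)
    # Center every observation on the vector of means.
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    p = len(means)
    return [[sum(row[i] * row[k] for row in centered) / (n - 1)
             for k in range(p)]
            for i in range(p)]


# Three sample units (rows) measured on two variables (columns).
data = [[1.0, 2.0],
        [3.0, 4.0],
        [5.0, 9.0]]

means = mean_vector(data)
C = covariance_matrix(data)

print(means)  # sample means of the two variables
print(C)      # variances on the diagonal, covariances off the diagonal
```

For these data the means are 3 and 5, the variances are 4 and 13, and the covariance is 7, so C is symmetric with the two variances on its diagonal.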

The determinant of a matrix: 2 X 2 matrices
The determinant of a square matrix A, denoted det(A) or |A|, is a unique number associated with that matrix. In multivariate statistics, the determinant of the sample covariance matrix C plays a crucial role in hypothesis testing.
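
The slide's formula did not survive in the transcript. For reference, the standard definition for the 2 X 2 case is given below; the element labels a_11 through a_22 are notation introduced here.

```latex
\mathbf{A} =
\begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{pmatrix},
\qquad
|\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21}
```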

Matrix inversion and the determinant: 2 X 2 matrices
If a 2 X 2 matrix A is invertible, the elements of its inverse A^-1 are obtained by dividing modified elements of A by |A|. Hence, if |A| = 0, the division is undefined and the matrix is non-invertible or singular.
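
A minimal Python sketch of the 2 X 2 case, assuming the standard adjugate formula: swap the diagonal elements, negate the off-diagonal elements, and divide by the determinant. The function names `det2` and `inverse2` and the example matrices are hypothetical.

```python
def det2(a):
    """Determinant of a 2 x 2 matrix stored as a list of two rows."""
    (p, q), (r, s) = a
    return p * s - q * r


def inverse2(a):
    """Inverse of a 2 x 2 matrix; raises ValueError when it is singular."""
    d = det2(a)
    if d == 0:
        raise ValueError("matrix is singular: determinant is zero")
    (p, q), (r, s) = a
    # Swap the diagonal, negate the off-diagonal, divide everything by |A|.
    return [[s / d, -q / d],
            [-r / d, p / d]]


A = [[4.0, 7.0],
     [2.0, 6.0]]
A_inv = inverse2(A)

# The second row is twice the first, so the determinant is zero.
singular = [[1.0, 2.0],
            [2.0, 4.0]]

print(det2(A), A_inv, det2(singular))
```

Calling `inverse2(singular)` raises a `ValueError`, which is the code-level counterpart of the undefined division by |A| = 0.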

Multivariate variance: a geometric interpretation
Univariate variance is a measure of the “volume” occupied by sample points in one dimension. Multivariate variance involving m variables is the volume occupied by sample points in an m-dimensional space.
(Figure: larger versus smaller variance along one axis X, and the occupied volume on axes X1 and X2.)

Multivariate variance: effects of correlations among variables
Correlations between pairs of variables reduce the volume occupied by sample points, and hence reduce the multivariate variance.
(Figure: occupied volume on axes X1 and X2 under no correlation, positive correlation, and negative correlation.)

C and the generalized multivariate variance
The determinant of the sample covariance matrix C is a generalized multivariate variance, because the squared area of a parallelogram with sides given by the individual standard deviations and angle determined by the correlation between variables equals the determinant of C.
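
The parallelogram statement can be checked numerically for two variables. This sketch assumes that the angle theta between the sides satisfies cos(theta) = r, the correlation between the variables; the transcript says only that the angle is "determined by the correlation". The function names and the example values of s1, s2 and r are hypothetical.

```python
import math


def generalized_variance(s1, s2, r):
    """Determinant of the 2 x 2 covariance matrix built from s1, s2 and r."""
    c12 = r * s1 * s2  # covariance implied by the correlation r
    return (s1 ** 2) * (s2 ** 2) - c12 ** 2


def parallelogram_area(s1, s2, r):
    """Area of a parallelogram with sides s1 and s2 and angle acos(r)."""
    theta = math.acos(r)  # assumption: cos(theta) equals the correlation
    return s1 * s2 * math.sin(theta)


s1, s2 = 2.0, 3.0
for r in (0.0, 0.5, -0.5, 0.9):
    det_c = generalized_variance(s1, s2, r)
    area_squared = parallelogram_area(s1, s2, r) ** 2
    print(r, det_c, area_squared)
```

With no correlation the determinant is simply the product of the two variances; any correlation, positive or negative, shrinks it by the factor 1 - r^2.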

The use of determinants in multivariate analysis
For a univariate sample variance s_a^2, the multivariate analog is the determinant of the corresponding sample covariance matrix C_a, i.e., |C_a|, and these determinants are often used in the calculation of multivariate test statistics, e.g., Wilks’ Λ.
(Figure panels: univariate single-classification ANOVA, k groups; multivariate single-classification ANOVA (MANOVA).)
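
As an illustration of how determinants enter such a statistic, one common textbook form of Wilks' Λ for single-classification MANOVA is the ratio below. The transcript does not give the formula; the symbols W (within-group sums of squares and cross-products matrix), B (between-group) and T (total) are assumptions introduced here.

```latex
\Lambda = \frac{|\mathbf{W}|}{|\mathbf{T}|} = \frac{|\mathbf{W}|}{|\mathbf{B} + \mathbf{W}|},
\qquad 0 \le \Lambda \le 1
```

Small values of Λ indicate that within-group variation is a small share of the total, which is evidence against equal group mean vectors.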

Eigenvalues
The eigenvalues of a p X p matrix A are the p solutions, some of which may be zero, to the equation |A - λI| = 0. The trace of a matrix is the sum of its eigenvalues, and the determinant of a matrix is the product of its eigenvalues.
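
For a symmetric 2 X 2 matrix such as a covariance matrix, the equation |A - λI| = 0 is the quadratic λ^2 - tr(A)λ + |A| = 0, so the trace and determinant identities can be checked directly. This is a sketch restricted to the symmetric 2 X 2 case; the function name `eigenvalues_sym2` and the example matrices are hypothetical.

```python
import math


def eigenvalues_sym2(a):
    """Eigenvalues of a symmetric 2 x 2 matrix, largest first."""
    (p, q), (_, s) = a
    tr = p + s
    det = p * s - q * q
    # For a symmetric matrix the discriminant (p - s)^2 + 4 q^2 is never
    # negative, so both roots of the characteristic quadratic are real.
    root = math.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


C = [[2.0, 1.0],
     [1.0, 2.0]]
lam1, lam2 = eigenvalues_sym2(C)
print(lam1 + lam2, lam1 * lam2)  # compare with the trace and determinant

# A singular matrix has a zero eigenvalue, hence a zero determinant.
singular = [[1.0, 2.0],
            [2.0, 4.0]]
print(eigenvalues_sym2(singular))
```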

Eigenvalues and eigenvectors I
Suppose v is a nonzero vector, and L a linear transformation. If L(v) = λv, then v is an eigenvector of L associated with the eigenvalue λ. For example, if L is the reflection in the line y = mx, then a vector a lying along the line is an eigenvector associated with eigenvalue 1, and a vector b perpendicular to the line is an eigenvector associated with eigenvalue -1. Note that a and b are orthogonal!
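
The reflection example can be verified numerically. The sketch below assumes the standard matrix for reflection in the line y = mx, takes a = (1, m) as a vector along the line and b = (-m, 1) as a vector perpendicular to it; the slide's figure is not available, so these choices and the function names are illustrative.

```python
def reflection_matrix(m):
    """Matrix of the reflection in the line y = m x."""
    d = 1.0 + m * m
    return [[(1.0 - m * m) / d, 2.0 * m / d],
            [2.0 * m / d, (m * m - 1.0) / d]]


def apply(matrix, v):
    """Apply a matrix to a vector stored as a flat list."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in matrix]


m = 2.0
L = reflection_matrix(m)

a = [1.0, m]    # along the line: left unchanged, eigenvalue 1
b = [-m, 1.0]   # perpendicular to the line: flipped, eigenvalue -1

La = apply(L, a)
Lb = apply(L, b)
dot_ab = a[0] * b[0] + a[1] * b[1]  # zero when a and b are orthogonal

print(La, Lb, dot_ab)
```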

Eigenvalues and eigenvectors of C
Eigenvectors of the covariance matrix C are orthogonal directed line segments that “span” the variation in the data, and the corresponding (unsigned) eigenvalues give the lengths of these segments, so the product of the eigenvalues is the “volume” occupied by the data, i.e. the determinant of the covariance matrix.
(Figure panels: no correlation, positive correlation, negative correlation; axes X1 and X2.)

Displaying multivariate data I: Draftsman’s plots (SPLOM)
Plot pairs of variables against one another.
Advantages: need only 2 plotting dimensions; bivariate relationships among variables are clear.
Problems: no direct information on relationships in more than 2 dimensions; relationships between objects unclear.

Displaying multivariate data II: multiple 3-D plots
Plot 3 variables against one another.
Advantages: trivariate relationships among variables are clear.
Problems: no direct information on relationships in more than 3 dimensions; relationships between objects unclear.

Displaying multivariate data III: plotting index variables
Generate index variables that combine information from several measured variables, then plot these variables.
Advantages: 2-D plots make relationships among variables clear.
Disadvantages: relationships among objects unclear; key information may be lost in data reduction.

Displaying multivariate data IV: Icon plots
Used to visualize relationships among objects, e.g. different canine groups.
Advantages: all variables displayed simultaneously.
Problems: order of display of variables arbitrary, and impressions may depend on order. Relationships among variables may be unclear.
(Figure: icons labelled Cuon, Dingo, Prehistoric dog, Chinese wolf, Golden jackal, Modern; variables X1 to X6.)

Displaying multivariate data V: profile plots
Represent objects by lines, histograms or Fourier plots.
Advantages: all variables displayed simultaneously.
Problems: order of display of variables arbitrary, and impressions may depend on order. Relationships among variables may be unclear.
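
One way to build the Fourier plots mentioned above is the Andrews curve, which maps each object's measurements onto the coefficients of a finite Fourier series. Treating the slide's "Fourier plots" as Andrews curves is an assumption, and the function name `andrews_curve` and the example object are hypothetical. Note that the curve changes if the variables are reordered, which is the order dependence listed among the problems.

```python
import math


def andrews_curve(x, t):
    """Evaluate x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... at t."""
    total = x[0] / math.sqrt(2.0)
    for i, value in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            total += value * math.sin(harmonic * t)
        else:
            total += value * math.cos(harmonic * t)
    return total


# One object measured on five variables (invented values).
obj = [1.0, 2.0, 3.0, 4.0, 5.0]

# Sample the curve on a grid over [-pi, pi]; each object yields one curve.
grid = [-math.pi + k * (2.0 * math.pi / 8) for k in range(9)]
curve = [andrews_curve(obj, t) for t in grid]

# Reordering the variables changes the curve.
reordered = [andrews_curve(list(reversed(obj)), t) for t in grid]

print(curve)
print(reordered)
```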