Download presentation
Presentation is loading. Please wait.
1
Multivariate Distance and Similarity Robert F. Murphy Cytometry Development Workshop 2000
2
General Multivariate Dataset u We are given values of p variables for n independent observations u Construct an n x p matrix M consisting of vectors X 1 through X n each of length p
3
Multivariate Sample Mean u Define mean vector I of length p or matrix notation vector notation
4
Multivariate Variance Define variance vector of length p matrix notation
5
Multivariate Variance u or vector notation
6
Covariance Matrix Define a p x p matrix cov (called the covariance matrix) analogous to 2
7
Covariance Matrix u Note that the covariance of a variable with itself is simply the variance of that variable
8
Univariate Distance u The simple distance between the values of a single variable j for two observations i and l is
9
Univariate z-score Distance u To measure distance in units of standard deviation between the values of a single variable j for two observations i and l we define the z-score distance
10
Bivariate Euclidean Distance u The most commonly used measure of distance between two observations i and l on two variables j and k is the Euclidean distance
11
Multivariate Euclidean Distance u This can be extended to more than two variables
12
Effects of variance and covariance on Euclidean distance Points A and B have similar Euclidean distances from the mean, but point B is clearly “more different” from the population than point A. B A The ellipse shows the 50% contour of a hypothetical population.
13
Mahalanobis Distance u To account for differences in variance between the variables, and to account for correlations between variables, we use the Mahalanobis distance
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.