Multivariate Distance and Similarity Robert F. Murphy Cytometry Development Workshop 2000.

General Multivariate Dataset
• We are given values of p variables for n independent observations
• Construct an n × p matrix M consisting of vectors X_1 through X_n, each of length p
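The layout of M can be written out explicitly. This is a sketch that assumes each observation occupies one row of M and uses x_{ij} (a symbol introduced here, not taken from the slide) for the value of variable j in observation i.

```latex
% Each row of M is one observation vector X_i (transposed to a row)
M =
\begin{pmatrix}
X_1^{\mathsf{T}} \\ X_2^{\mathsf{T}} \\ \vdots \\ X_n^{\mathsf{T}}
\end{pmatrix}
=
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
```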

Multivariate Sample Mean
• Define the mean vector of length p, in either matrix notation or vector notation
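The two notations can be written out as follows. This is the standard sample-mean definition, assuming M is the n × p data matrix whose rows are X_1 through X_n. The symbols \bar{X}, x_{ij} (element of M), and \mathbf{1}_n (a column vector of n ones) are notation chosen for this sketch, not necessarily the slide's own.

```latex
% Vector notation: average the n observation vectors
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

% Matrix notation: left-multiply M by a row of ones
\bar{X}^{\mathsf{T}} = \frac{1}{n}\, \mathbf{1}_n^{\mathsf{T}} M

% Component j of the mean vector
\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \qquad j = 1, \dots, p
```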

Multivariate Variance
• Define the variance vector σ² of length p (matrix notation)

Multivariate Variance
• or, in vector notation
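Written out, the variance vector might look as follows. This sketch assumes the usual unbiased sample variance (denominator n − 1; a population form would divide by n instead), which the transcript does not confirm. The symbols x_{ij}, \bar{x}_j, \bar{X}, and \mathbf{1}_n are notation chosen here.

```latex
% Component form: variance of variable j
\sigma_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_{ij} - \bar{x}_j \right)^2,
\qquad j = 1, \dots, p

% Vector notation: \circ is the elementwise (Hadamard) product
\sigma^2 = \frac{1}{n-1} \sum_{i=1}^{n}
\left( X_i - \bar{X} \right) \circ \left( X_i - \bar{X} \right)

% Matrix notation: diagonal of the centered cross-product matrix
\sigma^2 = \frac{1}{n-1}\,
\operatorname{diag}\!\left[
\left( M - \mathbf{1}_n \bar{X}^{\mathsf{T}} \right)^{\mathsf{T}}
\left( M - \mathbf{1}_n \bar{X}^{\mathsf{T}} \right)
\right]
```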

Covariance Matrix
• Define a p × p matrix cov (called the covariance matrix), analogous to σ²

Covariance Matrix
• Note that the covariance of a variable with itself is simply the variance of that variable
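A minimal sketch of the covariance matrix in plain Python illustrates the point about the diagonal. It assumes the element cov[j][k] is the sum over observations of (x_ij − mean_j)(x_ik − mean_k) divided by n − 1 (the denominator is an assumption). The function names and the data are hypothetical.

```python
def mean_vector(M):
    """Column means of an n x p data matrix given as a list of rows."""
    n = len(M)
    p = len(M[0])
    return [sum(row[j] for row in M) / n for j in range(p)]


def covariance_matrix(M):
    """p x p sample covariance matrix (denominator n - 1, an assumption)."""
    n = len(M)
    p = len(M[0])
    mean = mean_vector(M)
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            cov[j][k] = sum(
                (row[j] - mean[j]) * (row[k] - mean[k]) for row in M
            ) / (n - 1)
    return cov


# Hypothetical data: n = 4 observations of p = 2 variables.
M = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [4.0, 9.0]]
cov = covariance_matrix(M)

# The diagonal entries cov[j][j] are the variances of the individual variables.
variances = [cov[j][j] for j in range(len(cov))]
```

The matrix is symmetric, since swapping j and k leaves each product unchanged, so only the upper triangle strictly needs to be computed.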

Univariate Distance
• The simple distance between the values of a single variable j for two observations i and l is

Univariate z-score Distance
• To measure distance in units of standard deviation between the values of a single variable j for two observations i and l, we define the z-score distance
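Both univariate distances can be sketched in a few lines. The sketch assumes the simple distance is the absolute difference |x_ij − x_lj| and that the z-score distance divides it by the sample standard deviation of variable j (denominator n − 1). The function names and the data are hypothetical.

```python
import statistics


def univariate_distance(M, i, l, j):
    """Absolute difference between observations i and l on variable j."""
    return abs(M[i][j] - M[l][j])


def z_score_distance(M, i, l, j):
    """Univariate distance expressed in standard deviations of variable j."""
    column = [row[j] for row in M]
    sigma_j = statistics.stdev(column)  # sample standard deviation (n - 1)
    return univariate_distance(M, i, l, j) / sigma_j


# Hypothetical data: 4 observations of 2 variables on very different scales.
M = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]]

d_raw_0 = univariate_distance(M, 0, 3, 0)  # raw distance on variable 0
d_raw_1 = univariate_distance(M, 0, 3, 1)  # raw distance on variable 1
d_z_0 = z_score_distance(M, 0, 3, 0)
d_z_1 = z_score_distance(M, 0, 3, 1)
```

In this data the second variable is a rescaled rearrangement of the first, so the raw distances differ by a factor of 100 while the two z-score distances agree. That scale independence is the reason for dividing by the standard deviation.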

Bivariate Euclidean Distance
• The most commonly used measure of distance between two observations i and l on two variables j and k is the Euclidean distance

Multivariate Euclidean Distance
• This can be extended to more than two variables
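The extension from two variables to p variables can be sketched with a single function: the distance is the square root of the sum of squared differences over all variables. The function name and the sample points are hypothetical.

```python
import math


def euclidean_distance(x_i, x_l):
    """Euclidean distance between two observation vectors of equal length."""
    if len(x_i) != len(x_l):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_l)))


# Bivariate case (p = 2): the legs of a 3-4-5 right triangle.
d2 = euclidean_distance([0.0, 0.0], [3.0, 4.0])

# The same function handles p = 3 without change.
d3 = euclidean_distance([1.0, 2.0, 3.0], [3.0, 4.0, 4.0])
```

Note that every variable contributes in its own raw units, so a variable with a large numeric range dominates the sum.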

Effects of variance and covariance on Euclidean distance
Points A and B have similar Euclidean distances from the mean, but point B is clearly “more different” from the population than point A. The ellipse shows the 50% contour of a hypothetical population.

Mahalanobis Distance
• To account for differences in variance between the variables, and to account for correlations between variables, we use the Mahalanobis distance
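In its usual form the squared Mahalanobis distance between two observations is (X_i − X_l)ᵀ cov⁻¹ (X_i − X_l), where cov is the covariance matrix. The sketch below is restricted to p = 2 so that the matrix inverse can be written in closed form without a linear-algebra library; a general-p version would need a proper inverse or solver. The function name, the covariance values, and the points A and B are hypothetical, chosen so that both points are equally far from the mean in Euclidean terms.

```python
import math


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-vectors x and y for a 2 x 2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2 x 2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - y[0], x[1] - y[1]]
    # Quadratic form diff^T * inv * diff.
    q = sum(diff[r] * inv[r][s] * diff[s] for r in range(2) for s in range(2))
    return math.sqrt(q)


# Hypothetical population: positively correlated variables, mean at the origin.
mean = [0.0, 0.0]
cov = [[4.0, 3.0], [3.0, 4.0]]

A = [2.0, 2.0]    # lies along the direction of the correlation
B = [2.0, -2.0]   # lies across the direction of the correlation

d_A = mahalanobis_2d(A, mean, cov)
d_B = mahalanobis_2d(B, mean, cov)
```

Both points are the same Euclidean distance from the mean, yet B comes out farther in Mahalanobis terms because it lies in a direction where the population varies little.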