
JBR1 Linear Discriminant Analysis
- Two approaches: Fisher's and Mahalanobis'
- For two-group discrimination: essentially equivalent to multiple regression
- For multiple groups: essentially a special case of canonical correlation

JBR2 LDA – Fisher's Approach
- Based on the idea of a discriminant score
- The discriminant score is the linear combination of the variables that produces maximally different scores across the groups
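As a concrete illustration of the discriminant score, here is a minimal sketch of Fisher's two-group direction w = Sw^-1 (m1 - m2), where Sw is the pooled within-group scatter matrix. The two groups and their 2-D values are made up for illustration; this is a sketch of the idea, not the course's implementation.

```python
# Fisher's linear discriminant for two groups of 2-D points (hypothetical data).
g1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0)]   # group 1 observations
g2 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0)]   # group 2 observations

def mean(g):
    n = len(g)
    return (sum(x for x, _ in g) / n, sum(y for _, y in g) / n)

def scatter(g, m):
    # within-group scatter: sum of outer products of centered observations
    sxx = sxy = syy = 0.0
    for x, y in g:
        dx, dy = x - m[0], y - m[1]
        sxx += dx * dx; sxy += dx * dy; syy += dy * dy
    return [[sxx, sxy], [sxy, syy]]

m1, m2 = mean(g1), mean(g2)
s1, s2 = scatter(g1, m1), scatter(g2, m2)
sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]

# invert the 2x2 pooled scatter matrix
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
inv = [[ sw[1][1] / det, -sw[0][1] / det],
       [-sw[1][0] / det,  sw[0][0] / det]]

# discriminant direction w = Sw^-1 (m1 - m2)
d = (m1[0] - m2[0], m1[1] - m2[1])
w = (inv[0][0] * d[0] + inv[0][1] * d[1],
     inv[1][0] * d[0] + inv[1][1] * d[1])

def score(p):
    # discriminant score: projection of an observation onto w
    return w[0] * p[0] + w[1] * p[1]

# scores within each group should come out well separated
print([round(score(p), 3) for p in g1])
print([round(score(p), 3) for p in g2])
```

Projecting onto w collapses each observation to a single score, which is what makes the two-group case behave like a regression with a group indicator.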

JBR3 LDA – Mahalanobis' Approach
- For two groups: uses the locus of points equidistant from the two group means as the classification boundary
- For more than two groups: find the distance to each group centroid and assign each point to the closest centroid
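The closest-centroid rule for more than two groups can be sketched as follows. The three groups and their points are made up, and Euclidean distance stands in for brevity; the full Mahalanobis version replaces d2 with (x - m)' COV^-1 (x - m) using the pooled within-group covariance.

```python
# Nearest-centroid classification for k = 3 groups (hypothetical 2-D data).
groups = {
    "A": [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)],
    "B": [(6.0, 1.0), (7.0, 2.0), (6.0, 2.0)],
    "C": [(1.0, 6.0), (2.0, 7.0), (2.0, 6.0)],
}

def centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

centroids = {g: centroid(pts) for g, pts in groups.items()}

def classify(p):
    # squared distance to each centroid; assign to the closest group
    def d2(m):
        return (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2
    return min(centroids, key=lambda g: d2(centroids[g]))

print(classify((1.5, 1.5)))   # near group A's centroid
print(classify((6.5, 1.5)))   # near group B's centroid
print(classify((1.5, 6.5)))   # near group C's centroid
```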

JBR4 LDA – Iris Data Set
- Using PROC DISCRIM from SAS:

    proc discrim data=iris_train out=iris_out_dis testdata=iris_test
                 distance manova ncan=2;
      title 'Discriminant Analysis - IRIS data set';
      class species;
      var sepallen sepalwid petallen petalwid;
    run;

- Hit rate = .9467
- Error rate = .0533
- With a different training set, hit rate = 1.0

Output (abridged):

    Discriminant Analysis - IRIS data set    07:58 Sunday, November 28, 2004
    The DISCRIM Procedure
    Classification Summary for Test Data: WORK.IRIS_TEST
    Classification Summary using Linear Discriminant Function

    Generalized squared distance function:
        D2_j(X) = (X - Xbar_j)' COV**-1 (X - Xbar_j)

    Posterior probability of membership in each species:
        Pr(j|X) = exp(-.5 D2_j(X)) / SUM_k exp(-.5 D2_k(X))

    Number of observations and percent classified into species
    (rows: from species SETOSA, VERSICOLOR, VIRGINICA; the counts, totals,
    and priors did not survive in the transcript)
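The posterior-probability formula from the PROC DISCRIM output can be illustrated numerically. The squared distances below are made-up values, not taken from the iris run; the point is only that the smallest generalized squared distance yields the largest posterior.

```python
import math

# Posterior probabilities from generalized squared distances, following
# Pr(j|X) = exp(-0.5 * D2[j]) / sum_k exp(-0.5 * D2[k]).
# The D2 values are hypothetical, for illustration only.
D2 = {"SETOSA": 1.2, "VERSICOLOR": 7.5, "VIRGINICA": 14.0}

denom = sum(math.exp(-0.5 * d) for d in D2.values())
posterior = {j: math.exp(-0.5 * d) / denom for j, d in D2.items()}

for species, p in posterior.items():
    print(species, round(p, 4))
# the posteriors sum to 1; the smallest distance gets the largest posterior
```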

JBR5 LDA – Microarray Data
- R code:

    train <- sample(1:7129, 100)
    z <- lda(fmat.train[, train], fy)
    z.predict.test <- predict(z, fmat.test[, 1:3000])$class
    table(fy2, z.predict.test)

- Confusion matrices for different gene subsets (rows fy2 = true class, columns = predicted class):

    30 of first 60 genes (hit rate = .5882)
    fy2    ALL  AML
    ALL     16    4
    AML     10    4

    First 60 genes (hit rate = .6765)
    fy2    ALL  AML
    ALL     15    5
    AML      6    8

    30 of all 7129 genes (hit rate = .7353)
    fy2    ALL  AML
    ALL     14    6
    AML      3   11

    30 of all 7129 genes, second sample (hit rate = .5294)
    fy2    ALL  AML
    ALL     12    8
    AML      8    6

    100 of all 7129 genes (hit rate = .8235)
    fy2    ALL  AML
    ALL     17    3
    AML      5    9

    First 3000 genes (hit rate = .7353)
    fy2    ALL  AML
    ALL     20    0
    AML      9    5
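Each hit rate above is just the diagonal of the confusion matrix divided by the test-set size; a minimal sketch using the "30 of first 60 genes" table:

```python
# Hit rate from a 2x2 confusion matrix: (truth, predicted) -> count.
# Counts taken from the "30 of first 60 genes" run on the slide.
confusion = {("ALL", "ALL"): 16, ("ALL", "AML"): 4,
             ("AML", "ALL"): 10, ("AML", "AML"): 4}

hits = sum(v for (truth, pred), v in confusion.items() if truth == pred)
n = sum(confusion.values())
hit_rate = hits / n
print(round(hit_rate, 4))   # .5882, matching the slide
```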

JBR6 Compare LDA to SVM (First 3000 Genes)

    SVM (table(fy2, pred); most counts did not survive in the transcript)
    fy2    ALL  AML
    ALL      ?    ?
    AML      0    1

    LDA (table(fy2, z.predict.test))
    fy2    ALL  AML
    ALL     20    0
    AML      9    5

JBR7 LDA – Goodness of Fit: Proportional Chance Criterion (PCC)
- t-test: t = (observed hits - expected hits) / sqrt(n*h*(1-h)), where h is the chance hit rate associated with the PCC
- Expected # of hits = n*(prop. of 1st group)^2 + n*(1 - prop. of 1st group)^2
- For the microarray example:
  - Expected # of hits = 17.53 (chance hit rate h = .5156)
  - The resulting t statistic gives a p-value close to .0075
  - LDA appears to do an adequate job
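A sketch of the PCC computation for the microarray test set, with n = 34 (20 ALL, 14 AML). The observed hit count of 25 (the .7353 hit-rate run) is an assumption, since the slide does not say which run its t value came from.

```python
import math

# Proportional chance criterion (PCC) test for the microarray example.
n = 34
p1 = 20 / 34                   # proportion of cases in the first group (ALL)
h = p1 ** 2 + (1 - p1) ** 2    # chance hit rate under the PCC
expected_hits = n * h          # about 17.53

observed_hits = 25             # assumed: the .7353 hit-rate run (25 of 34)
t = (observed_hits - expected_hits) / math.sqrt(n * h * (1 - h))

print(round(h, 4))             # .5156, matching the slide
print(round(t, 3))
```

A t this far above 2 rejects the hypothesis that LDA is only matching chance-level classification.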

JBR8 LDA – Problems
- R was nice enough to give this warning when the number of variables was over 36:

    Warning message:
    variables are collinear in: lda.default(x, grouping, ...)
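One way to see why the warning is inevitable: with fewer observations than variables, the within-group scatter matrix has rank at most n - 1, so it is singular and the variables are necessarily collinear. A small sketch with made-up data (n = 2 observations, p = 3 variables):

```python
# Build the p x p scatter matrix from an n x p data matrix with n < p
# and show that its rank is n - 1 < p, i.e. it is singular.
X = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 5.0]]
n, p = len(X), len(X[0])

# center the columns
means = [sum(row[j] for row in X) / n for j in range(p)]
C = [[row[j] - means[j] for j in range(p)] for row in X]

# scatter matrix S = C' C  (p x p)
S = [[sum(C[k][i] * C[k][j] for k in range(n)) for j in range(p)]
     for i in range(p)]

def rank(m, eps=1e-10):
    # matrix rank via Gaussian elimination with partial pivoting
    m = [row[:] for row in m]
    r = 0
    for c in range(len(m[0])):
        piv = max(range(r, len(m)), key=lambda i: abs(m[i][c]))
        if abs(m[piv][c]) < eps:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            for j in range(c, len(m)):
                m[i][j] -= f * m[r][j]
        r += 1
        if r == len(m):
            break
    return r

print(rank(S))   # 1 == n - 1 < p, so S cannot be inverted and lda() warns
```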