
1 JBR1 Linear Discriminant Analysis
- Two approaches – Fisher & Mahalanobis
- For two-group discrimination: essentially equivalent to multiple regression
- For multiple groups: essentially a special case of canonical correlation

2 LDA – Fisher's Approach
- Based on the idea of a discriminant score
- A linear combination of the variables that produces maximally different scores across the groups
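For two groups, a standard way to obtain such a discriminant score is to take the direction w proportional to S_w^{-1}(m1 - m2), where m1 and m2 are the group means and S_w is the pooled within-group scatter matrix. The sketch below works this out for two variables in plain Python. The data points and all function names are made up for illustration and do not come from the slides.

```python
def mean_vec(rows):
    """Column means of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 sum of outer products of deviations from the group mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(g1, g2):
    """Return w = S_w^{-1} (m1 - m2) for two groups of 2-D points."""
    m1, m2 = mean_vec(g1), mean_vec(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def score(w, x):
    """Discriminant score: linear combination of the variables."""
    return w[0] * x[0] + w[1] * x[1]

# Hypothetical toy data: two well-separated groups.
group_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 7.0)]

w = fisher_direction(group_a, group_b)
scores_a = [score(w, x) for x in group_a]
scores_b = [score(w, x) for x in group_b]
# Every score in group_a lies above every score in group_b.
print(min(scores_a) > max(scores_b))
```

On this toy data the scores of the two groups do not overlap, so any cutoff between them classifies every training point correctly.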

3 LDA – Mahalanobis's Approach
- For two groups: uses the idea of finding the locus of points equidistant from the group means
- For more than two groups: find the distance to each group centroid and assign each point to the closest centroid
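The nearest-centroid rule can be sketched in a few lines, assuming a single covariance matrix shared by all groups (the usual LDA assumption). The centroids, the covariance matrix, and the function names below are hypothetical.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mahalanobis_sq(x, centroid, cov_inv):
    """Squared Mahalanobis distance (x - c)' COV^{-1} (x - c)."""
    d = [x[0] - centroid[0], x[1] - centroid[1]]
    t0 = cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1]
    t1 = cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]
    return d[0] * t0 + d[1] * t1

def classify(x, centroids, cov_inv):
    """Assign x to the group whose centroid is closest."""
    return min(centroids,
               key=lambda g: mahalanobis_sq(x, centroids[g], cov_inv))

# Hypothetical group centroids and shared covariance matrix.
centroids = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}
cov_inv = invert_2x2([[2.0, 0.0], [0.0, 0.5]])

label_1 = classify((1.0, 0.5), centroids, cov_inv)
label_2 = classify((3.5, 0.2), centroids, cov_inv)

# A point on the two-group boundary between A and B is equidistant from both.
gap = (mahalanobis_sq((2.0, 1.0), centroids["A"], cov_inv)
       - mahalanobis_sq((2.0, 1.0), centroids["B"], cov_inv))
print(label_1, label_2, gap)
```

With only groups A and B, the set of points where `gap` is zero is the equidistant locus described for the two-group case.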

4 LDA – Iris Data Set
- Using PROC DISCRIM from SAS:

    proc discrim data=iris_train out=iris_out_dis testdata=iris_test
                 distance manova ncan=2;
      title 'Discriminant Analysis - IRIS data set';
      class species;
      var sepallen sepalwid petallen petalwid;
    run;

- Hit rate = .9467
- Error rate = .0533
- With a different training set, hit rate = 1

SAS output:

    Discriminant Analysis - IRIS data set                          30
                                      07:58 Sunday, November 28, 2004
    The DISCRIM Procedure
    Classification Summary for Test Data: WORK.IRIS_TEST
    Classification Summary using Linear Discriminant Function

Generalized Squared Distance Function:

    $D_j^2(X) = (X - \bar{X}_j)' \, \mathrm{COV}^{-1} \, (X - \bar{X}_j)$

Posterior Probability of Membership in Each species:

    $\Pr(j \mid X) = \exp(-0.5\, D_j^2(X)) \,/\, \sum_k \exp(-0.5\, D_k^2(X))$

Number of Observations and Percent Classified into species:

    From species    SETOSA   VERSICOLOR   VIRGINICA     Total
    SETOSA              24            0           0        24
                    100.00         0.00        0.00    100.00
    VERSICOLOR           0           23           2        25
                      0.00        92.00        8.00    100.00
    VIRGINICA            0            2          24        26
                      0.00         7.69       92.31    100.00
    Total               24           25          26        75
                     32.00        33.33       34.67    100.00
    Priors         0.33333      0.33333     0.33333
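The posterior formula in the SAS output turns the generalized squared distances into class probabilities. A minimal sketch with equal priors follows. The distance values are hypothetical, not taken from the iris run, and `posteriors` is an illustrative name.

```python
import math

def posteriors(sq_dists):
    """Pr(j|X) = exp(-0.5 * D_j^2) / sum_k exp(-0.5 * D_k^2)."""
    weights = {g: math.exp(-0.5 * d) for g, d in sq_dists.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Hypothetical squared distances from one observation to each species centroid.
d2 = {"SETOSA": 40.0, "VERSICOLOR": 2.0, "VIRGINICA": 4.0}
post = posteriors(d2)
predicted = max(post, key=post.get)

for species, p in post.items():
    print(f"{species:<11s} {p:.4f}")
print("classified as", predicted)
```

The observation is assigned to the species with the largest posterior, which with equal priors is also the species with the smallest squared distance.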

5 LDA – Microarray Data

    library(MASS)                          # provides lda()
    train <- sample(1:7129, 100)           # gene columns; this selection changed from run to run
    z <- lda(fmat.train[, train], fy)
    # the test columns must be the same ones used for training;
    # 1:3000 here is from the "First 3000 genes" run
    z.predict.test <- predict(z, fmat.test[, 1:3000])$class
    table(fy2, z.predict.test)

Results (rows: true class fy2; columns: LDA prediction):

30 of first 60 genes (hit rate = .5882)

    fy2   ALL  AML
    ALL    16    4
    AML    10    4

First 60 genes (hit rate = .6765)

    fy2   ALL  AML
    ALL    15    5
    AML     6    8

30 of all 7129 genes (hit rate = .7353)

    fy2   ALL  AML
    ALL    14    6
    AML     3   11

30 of all 7129 genes (hit rate = .5294)

    fy2   ALL  AML
    ALL    12    8
    AML     8    6

100 of all 7129 genes (hit rate = .8235)

    fy2   ALL  AML
    ALL    17    3
    AML     5    9

First 3000 genes (hit rate = .7353)

    fy2   ALL  AML
    ALL    20    0
    AML     9    5
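The hit rate is the share of test samples on the diagonal of the confusion table. The sketch below recomputes it for the "First 3000 genes" run and also reports the rate within each true class. The function name and the dictionary layout are illustrative choices.

```python
def hit_rates(confusion):
    """confusion[true][pred] = count. Returns (overall, per-class) hit rates."""
    total = sum(sum(row.values()) for row in confusion.values())
    hits = sum(confusion[c].get(c, 0) for c in confusion)
    per_class = {c: confusion[c].get(c, 0) / sum(confusion[c].values())
                 for c in confusion}
    return hits / total, per_class

# "First 3000 genes" table from the slide: rows are fy2, columns are predictions.
first_3000 = {"ALL": {"ALL": 20, "AML": 0},
              "AML": {"ALL": 9, "AML": 5}}

overall, per_class = hit_rates(first_3000)
print(round(overall, 4))            # 0.7353
print(round(per_class["ALL"], 4))   # 1.0
print(round(per_class["AML"], 4))   # 0.3571
```

For this run the overall rate of .7353 combines a perfect rate on ALL with 5 of 14 correct on AML.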

6 Compare LDA to SVM (1st 3000 Genes)

SVM (rows: pred; columns: fy2)

    pred   ALL  AML
    ALL     20   13
    AML      0    1

LDA (rows: z.predict.test; columns: fy2)

    z.predict.test   ALL  AML
    ALL               20    9
    AML                0    5

7 LDA – Goodness of Fit
Proportional Chance Criterion (PCC)
- t-test where t = (observed hits − expected hits) / √(n·h·(1 − h)), with h = hit rate associated with the PCC
- Expected # of hits = n(prob. of first group)² + n(1 − prob. of first group)²
- For the microarray example:
  - Expected # of hits = 17.52899 (.5156 hit rate)
  - t = 2.5637
  - Gives a p-value close to .0075
  - LDA looks to do a sufficient job
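The arithmetic behind the proportional chance criterion can be checked directly. The slide does not state the inputs, so the sketch below assumes n = 34 test samples, 20 of them in the first group (ALL), and 25 observed hits. These values are read off the "First 3000 genes" table and are consistent with the slide's figures, but they are an inference. `pcc_t` is an illustrative name.

```python
import math

def pcc_t(observed_hits, n, p_first):
    """Proportional chance criterion statistic for two groups.

    h is the hit rate expected by chance when cases are assigned
    in proportion to the group sizes.
    """
    h = p_first ** 2 + (1.0 - p_first) ** 2
    expected_hits = n * h
    t = (observed_hits - expected_hits) / math.sqrt(n * h * (1.0 - h))
    return h, expected_hits, t

# Assumed inputs (not stated on the slide): 34 test samples, 20 ALL, 25 hits.
h, expected_hits, t = pcc_t(observed_hits=25, n=34, p_first=20 / 34)

print(round(h, 4))              # 0.5156
print(round(expected_hits, 2))  # 17.53
print(round(t, 2))              # 2.56
```

Under these assumptions the result agrees with the slide's expected hits, chance hit rate, and t value to within rounding.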

8 LDA – Problems
- R was nice enough to give me this warning when the number of variables was over 36:

    Warning message:
    variables are collinear in: lda.default(x, grouping, ...)
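One likely reading of this warning: with n training samples in g groups, the pooled within-group scatter matrix has rank at most n − g, so it becomes singular once the number of variables exceeds n − g. A limit of 36 would be consistent with 38 training samples in two groups, though the slides do not state the training-set size. The sketch below shows the effect on a tiny made-up data set, using exact fractions so the rank is not affected by rounding. All names and data are hypothetical.

```python
from fractions import Fraction

def within_scatter(groups):
    """Pooled within-group scatter matrix (sum over groups of d d')."""
    p = len(groups[0][0])
    sw = [[Fraction(0) for _ in range(p)] for _ in range(p)]
    for g in groups:
        mean = [Fraction(sum(r[j] for r in g), len(g)) for j in range(p)]
        for r in g:
            d = [r[j] - mean[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    return sw

def matrix_rank(rows):
    """Rank by exact Gaussian elimination."""
    m = [[Fraction(v) for v in r] for r in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            f = m[i][col] / m[rank][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

# Hypothetical data: 5 samples in 2 groups, 4 variables (4 > 5 - 2).
group_1 = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
group_2 = [(0, 0, 0, 1), (2, 3, 5, 7)]

rank_4_vars = matrix_rank(within_scatter([group_1, group_2]))

# Keeping only the first two variables gives a full-rank 2x2 matrix.
g1_two = [r[:2] for r in group_1]
g2_two = [r[:2] for r in group_2]
rank_2_vars = matrix_rank(within_scatter([g1_two, g2_two]))

print(rank_4_vars, rank_2_vars)   # 3 2
```

With four variables the scatter matrix has rank 3 and cannot be inverted, which is the situation in which a collinearity warning would be expected. With two variables it has full rank.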

