
1 Linear Discriminant Analysis (LDA)

2 Goal: To classify observations into 2 or more groups based on k discriminant functions (the dependent variable Y is categorical with k classes).
Assumptions:
Multivariate normal distribution: the independent variables are normally distributed within each class/group.
Similar group covariances: the variances of the variables, and the correlations between them, should be similar across groups.
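Under the two assumptions above, one common textbook form of the k linear functions is sketched below. The notation is not from the slides and is assumed here for illustration: x is the vector of independent variables, mu_j is the mean vector of class j, Sigma is the covariance matrix shared by all classes, and pi_j is the prior probability of class j. An observation is assigned to the class whose function value is largest.

```latex
f_j(\mathbf{x}) = \mathbf{x}^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_j
  - \tfrac{1}{2}\,\boldsymbol{\mu}_j^{\top}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_j
  + \ln \pi_j, \qquad j = 1, \dots, k

\hat{y}(\mathbf{x}) = \arg\max_{j}\, f_j(\mathbf{x})
```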

3 Dependent Variable: Must be categorical with 2 or more classes (groups). If there are only 2 classes, the discriminant analysis procedure will give the same result as the multiple regression procedure with Y coded as 0/1.

4 Independent Variables: May be continuous or categorical. Categorical variables are converted into binary (dummy) variables, as in multiple linear regression.
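As a rough illustration of dummy coding, the sketch below turns an age value into two binary indicators. The three age bands, the choice of the middle band as baseline, and the function name `age_dummies` are assumptions made for this example.

```python
def age_dummies(age):
    """Return (x1, x2) dummy indicators for a hypothetical three-band age variable.

    x1 = 1 if age is 18-30, x2 = 1 if age is 50 or over.
    The middle band (31-49) is the assumed baseline and is coded (0, 0).
    Ages are assumed to be 18 or over.
    """
    x1 = 1 if 18 <= age <= 30 else 0
    x2 = 1 if age >= 50 else 0
    return x1, x2


for age in (25, 40, 62):
    print(age, age_dummies(age))
```

A variable with three categories needs only two dummies. The baseline category is the one for which both are zero.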

5 Output
Example: Assume 3 classes (y = 1, 2, 3) of the dependent variable.

Y   x11  x12  x13  x14   f1   f2   f3   Pred. Y
1    20   25   10   12   85   78   58      1
1    18   16   14   12   80   68   65      1
…..
2    15        16   17   75   84   70      2
2    14   16   17   18   70   88   67      2
…..
3     8    9    9   11   95   86  105      3
3    10    8    8        96   84  100      3
…..
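In the example output, Pred. Y appears to be the class whose function value (f1, f2 or f3) is largest. The sketch below applies that rule to the six rows of function values on the slide. The helper name `predict_class` is hypothetical.

```python
def predict_class(scores):
    """Return the 1-based label of the class with the largest function value."""
    best_index = max(range(len(scores)), key=lambda j: scores[j])
    return best_index + 1


# (f1, f2, f3) for the six example rows shown on the slide
rows = [
    (85, 78, 58),
    (80, 68, 65),
    (75, 84, 70),
    (70, 88, 67),
    (95, 86, 105),
    (96, 84, 100),
]

predictions = [predict_class(r) for r in rows]
print(predictions)  # [1, 1, 2, 2, 3, 3]
```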

6 Binary Dependent - Regression
If the dependent variable has only 2 classes, multiple regression can be used. Sample data shown below:

Status   Age (18-30)   Age (50+)   Income
Y        X1            X2          X3
0        1             0            30
0        1             0            32
…..
0        0             0            50
0        0             0            28
0        0             0            75
…..
1        0             1           100
1        0             1            90
1        0             1            95

7 Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.833615561
R Square            0.694914903
Adjusted R Square   0.649152139
Standard Error      0.301479577
Observations        24

ANOVA
             df   SS            MS            F             Significance F
Regression    3   4.140534632   1.380178211   15.18516005   2.19698E-05
Residual     20   1.817798702   0.090889935
Total        23   5.958333333

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   -0.337942024   0.22002876       -1.535899327   0.14023269    -0.796913973   0.121029925
X1          -0.160950017   0.155728156      -1.033531901   0.313691534   -0.485793257   0.163893223
X2           0.426373823   0.153140052       2.784208421   0.011449703    0.106929273   0.745818373
Income       0.013571735   0.003078379       4.408727859   0.00027065     0.007150349   0.019993121
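The summary statistics in this output follow from the ANOVA sums of squares by standard formulas. The sketch below recomputes them from the two sums of squares, the number of observations and the number of predictors on the slide. The variable names are chosen here for illustration.

```python
# Values taken from the ANOVA table on the slide
ss_regression = 4.140534632
ss_residual = 1.817798702
n_obs = 24
n_predictors = 3  # X1, X2, Income

ss_total = ss_regression + ss_residual
df_regression = n_predictors
df_residual = n_obs - n_predictors - 1

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
std_error = ms_residual ** 0.5  # standard error of the regression

print(round(r_square, 6), round(adj_r_square, 6))
print(round(f_stat, 4), round(std_error, 6))
```

The results agree with the reported R Square, Adjusted R Square, F and Standard Error to the precision shown.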

8 Classification

Status   Age (18-30)   Age (50+)   Income
Y        X1            X2          X3       Predicted Y   Class
0        1             0            30      -0.0917       0
0        1             0            32      -0.0646       0
0        1             0            40       0.0440       0
0        1             0            38       0.0168       0
0        1             0            55       0.2476       0
0        1             0            56       0.2611       0
0        0             0            45       0.2728       0
0        0             0            40       0.2049       0
0        0             0            65       0.5442       1
0        0             0            50       0.3406       0
0        0             0            28       0.0421       0
1        0             0            75       0.6799       1
1        0             0            50       0.3406       0
1        1             0            80       0.5868       1
1        0             0           100       1.0192       1
1        0             0            90       0.8835       1
1        0             0            95       0.9514       1
1        0             1            75       1.1063       1
1        0             1            50       0.7670       1
1        0             1            85       1.2420       1
1        0             1            40       0.6313       1
1        0             1            88       1.2827       1
1        0             0            78       0.7207       1
1        0             1            65       0.9706       1

Classification rule in this case: If Pred. Y > 0.5 then Class = 1; else Class = 0.
This model yielded 2 misclassifications out of 24.
How good is R-square?
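The Predicted Y and Class columns can be reproduced from the regression coefficients and the 0.5 cutoff. The sketch below does this for the 24 rows on the slide and counts the misclassifications. The function and constant names are hypothetical.

```python
# Coefficients from the regression output slide
INTERCEPT = -0.337942024
B_X1 = -0.160950017
B_X2 = 0.426373823
B_INCOME = 0.013571735

# (Y, X1, X2, Income) for the 24 observations on the classification slide
DATA = [
    (0, 1, 0, 30), (0, 1, 0, 32), (0, 1, 0, 40), (0, 1, 0, 38),
    (0, 1, 0, 55), (0, 1, 0, 56), (0, 0, 0, 45), (0, 0, 0, 40),
    (0, 0, 0, 65), (0, 0, 0, 50), (0, 0, 0, 28), (1, 0, 0, 75),
    (1, 0, 0, 50), (1, 1, 0, 80), (1, 0, 0, 100), (1, 0, 0, 90),
    (1, 0, 0, 95), (1, 0, 1, 75), (1, 0, 1, 50), (1, 0, 1, 85),
    (1, 0, 1, 40), (1, 0, 1, 88), (1, 0, 0, 78), (1, 0, 1, 65),
]


def predicted_y(x1, x2, income):
    """Fitted value of the linear regression for one observation."""
    return INTERCEPT + B_X1 * x1 + B_X2 * x2 + B_INCOME * income


def classify(score, cutoff=0.5):
    """Apply the slide's rule: class 1 if the score exceeds the cutoff, else 0."""
    return 1 if score > cutoff else 0


misclassified = [
    row for row in DATA
    if classify(predicted_y(row[1], row[2], row[3])) != row[0]
]
print(len(misclassified), "misclassified out of", len(DATA))
print(misclassified)
```

The two misclassified rows are the Y = 0 observation with income 65 and the Y = 1 observation with X1 = X2 = 0 and income 50.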

9 Crosstab of Pred. Y and Y
For large datasets, one can format the Predicted Y variable into score bands and create a crosstab with Y to see how accurately the model classifies the data (fictitious results shown here). The Good and Bad columns give the number of observations with each actual Y value.

Predicted Y × 1000   Good   Bad
900 to 1000           410    50
850 to 900            390    70
800 to 850            370    90
750 to 800            350   110
700 to 750            330   130
650 to 700            310   150
600 to 650            290   170
550 to 600            270   190
500 to 550            250   210
450 to 500            230
400 to 450            210   250
350 to 400            190   270
300 to 350            170   290
250 to 300            150   310
200 to 250            130   330
150 to 200            110   350
100 to 150             90   370
50 to 100              70   390
0 to 50                     410
Total                4370

10 Kolmogorov-Smirnov Test
Use the crosstabs shown in the last slide to conduct the KS test to determine:
1. Cutoff score
2. Classification accuracy
3. Forecasts of model performance
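One common way to apply the KS idea to such a crosstab is to accumulate the share of Good and Bad cases band by band and take the largest gap between the two cumulative shares. The band where the gap peaks suggests a cutoff score. The sketch below uses the fictitious counts from the crosstab. Two cells that are blank in the transcript (Bad for 450 to 500 and Good for 0 to 50) are filled with ASSUMED values that continue the pattern of the other rows. The function name `ks_statistic` is hypothetical, and this is only one reading of what the slide intends.

```python
# (upper edge of score band, Good count, Bad count), lowest band first
BANDS = [
    (50, 50, 410),    # Good = 50 is ASSUMED (blank in the transcript)
    (100, 70, 390),
    (150, 90, 370),
    (200, 110, 350),
    (250, 130, 330),
    (300, 150, 310),
    (350, 170, 290),
    (400, 190, 270),
    (450, 210, 250),
    (500, 230, 230),  # Bad = 230 is ASSUMED (blank in the transcript)
    (550, 250, 210),
    (600, 270, 190),
    (650, 290, 170),
    (700, 310, 150),
    (750, 330, 130),
    (800, 350, 110),
    (850, 370, 90),
    (900, 390, 70),
    (1000, 410, 50),
]


def ks_statistic(bands):
    """Return (largest cumulative gap, upper edge of the band where it occurs)."""
    total_good = sum(good for _, good, _ in bands)
    total_bad = sum(bad for _, _, bad in bands)
    cum_good = cum_bad = 0
    best_gap, best_cutoff = 0.0, None
    for upper, good, bad in bands:
        cum_good += good
        cum_bad += bad
        # Gap between cumulative share of Bad and cumulative share of Good
        gap = abs(cum_bad / total_bad - cum_good / total_good)
        if gap > best_gap:
            best_gap, best_cutoff = gap, upper
    return best_gap, best_cutoff


ks, cutoff = ks_statistic(BANDS)
print(round(ks, 4), cutoff)
```

With these counts the gap peaks at about 0.41, around a score of 450 to 500, which is consistent with the 0.5 cutoff used in the classification rule.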

