Chapter Eighteen Discriminant Analysis

18-2 Chapter Outline
1) Overview
2) Basic Concept
3) Relation to Regression and ANOVA
4) Discriminant Analysis Model
5) Statistics Associated with Discriminant Analysis
6) Conducting Discriminant Analysis
   i. Formulation
   ii. Estimation
   iii. Determination of Significance
   iv. Interpretation
   v. Validation

18-3 Chapter Outline
7) Multiple Discriminant Analysis
   i. Formulation
   ii. Estimation
   iii. Determination of Significance
   iv. Interpretation
   v. Validation
8) Stepwise Discriminant Analysis
9) Internet and Computer Applications
10) Focus on Burke
11) Summary
12) Key Terms and Concepts

18-4 Similarities and Differences between ANOVA, Regression, and Discriminant Analysis (Table 18.1)

                                          ANOVA         REGRESSION    DISCRIMINANT ANALYSIS
Similarities
  Number of dependent variables           One           One           One
  Number of independent variables         Multiple      Multiple      Multiple
Differences
  Nature of the dependent variables       Metric        Metric        Categorical
  Nature of the independent variables     Categorical   Metric        Metric

18-5 Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. The objectives of discriminant analysis are as follows:
1) Development of discriminant functions, or linear combinations of the predictor or independent variables, that best discriminate between the categories of the criterion or dependent variable (groups).
2) Examination of whether significant differences exist among the groups in terms of the predictor variables.
3) Determination of which predictor variables contribute most to the intergroup differences.
4) Classification of cases into one of the groups based on the values of the predictor variables.
5) Evaluation of the accuracy of classification.

18-6 Discriminant Analysis
When the criterion variable has two categories, the technique is known as two-group discriminant analysis. When three or more categories are involved, the technique is referred to as multiple discriminant analysis. The main distinction is that in the two-group case it is possible to derive only one discriminant function, whereas in multiple discriminant analysis more than one function may be computed. In general, with G groups and k predictors, it is possible to estimate up to the smaller of G - 1 and k discriminant functions. The first function has the highest ratio of between-groups to within-groups sum of squares. The second function, uncorrelated with the first, has the second highest ratio, and so on. However, not all the functions may be statistically significant.
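The bound of min(G - 1, k) functions can be checked numerically. The sketch below assumes the usual formulation in which the discriminant functions come from the eigenvectors of W^-1 B, where W and B are the within-group and between-group sums-of-squares-and-cross-products matrices; each eigenvalue is the between/within ratio of one function. The data are randomly generated and the helper names are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-group (B) and within-group (W) SSCP matrices."""
    grand_mean = X.mean(axis=0)
    k = X.shape[1]
    B, W = np.zeros((k, k)), np.zeros((k, k))
    for g in np.unique(y):
        Xg = X[y == g]
        d = (Xg.mean(axis=0) - grand_mean)[:, None]   # group mean deviation as a column
        B += len(Xg) * (d @ d.T)
        C = Xg - Xg.mean(axis=0)                       # deviations from the group mean
        W += C.T @ C
    return B, W

def discriminant_eigenvalues(X, y):
    """Eigenvalues of W^-1 B, largest first."""
    B, W = scatter_matrices(X, y)
    ev = np.linalg.eigvals(np.linalg.solve(W, B)).real
    return np.sort(ev)[::-1]

# Hypothetical data: G = 3 groups, k = 4 predictors, 20 cases per group.
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0, 0], [2, 0, 0, 0], [0, 2, 0, 0]], dtype=float)
X = np.vstack([rng.normal(c, 1.0, size=(20, 4)) for c in centres])
y = np.repeat([1, 2, 3], 20)

G, k = len(np.unique(y)), X.shape[1]
ev = discriminant_eigenvalues(X, y)
print(min(G - 1, k))        # 2
print(np.sum(ev > 1e-8))    # 2: only G - 1 eigenvalues are non-zero
```

Because B is built from G group means that are tied to the grand mean, its rank is at most G - 1, which is why the remaining eigenvalues are numerically zero.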

18-7 Discriminant Analysis Model
The discriminant analysis model involves linear combinations of the following form:

$$D = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k$$

where
D = discriminant score
b's = discriminant coefficients or weights
X's = predictor or independent variables

The coefficients, or weights (b), are estimated so that the groups differ as much as possible on the values of the discriminant function. This occurs when the ratio of between-group sum of squares to within-group sum of squares for the discriminant scores is at a maximum.
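For two groups, the weights that maximize this ratio are proportional to W^-1 (mean of group 1 - mean of group 2), Fisher's classical solution, where W is the within-group sums-of-squares-and-cross-products matrix. The sketch below, on hypothetical random data, computes such weights, chooses b_0 so that the scores average zero (one common convention, assumed here), and checks that no randomly drawn set of weights gives a larger between/within ratio. Function names are illustrative only.

```python
import numpy as np

def ss_ratio(scores, y):
    """Between-group SS divided by within-group SS of the discriminant scores."""
    grand = scores.mean()
    ssb = sum(np.sum(y == g) * (scores[y == g].mean() - grand) ** 2 for g in np.unique(y))
    ssw = sum(np.sum((scores[y == g] - scores[y == g].mean()) ** 2) for g in np.unique(y))
    return ssb / ssw

def fisher_two_group(X, y):
    """Return (b0, b) with b proportional to W^-1 (mean_1 - mean_2)."""
    g1, g2 = np.unique(y)
    X1, X2 = X[y == g1], X[y == g2]
    C1, C2 = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    W = C1.T @ C1 + C2.T @ C2                   # within-group SSCP matrix
    b = np.linalg.solve(W, X1.mean(axis=0) - X2.mean(axis=0))
    b0 = -(X.mean(axis=0) @ b)                  # centres the scores at zero (assumed convention)
    return b0, b

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0, 0.0], 1.0, size=(30, 3)),
               rng.normal([1.5, 0.5, 0.0], 1.0, size=(30, 3))])
y = np.repeat([1, 2], 30)

b0, b = fisher_two_group(X, y)
D = b0 + X @ b                                  # D = b0 + b1*X1 + b2*X2 + b3*X3
best = ss_ratio(D, y)
random_ratios = [ss_ratio(X @ rng.normal(size=3), y) for _ in range(1000)]
print(best >= max(random_ratios))               # True
```

The ratio is unchanged by adding a constant or rescaling all weights, so programs are free to report the weights under different normalizations.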

18-8 Statistics Associated with Discriminant Analysis
Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define group membership.
Centroid. The centroid is the mean of the discriminant scores for a particular group. There are as many centroids as there are groups, one for each group. The means for a group on all the functions are the group centroids.
Classification matrix. Sometimes also called the confusion or prediction matrix, the classification matrix contains the numbers of correctly classified and misclassified cases.
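A minimal sketch of these three statistics for a single function, using hypothetical discriminant scores. It assumes a simple nearest-centroid rule for building the classification matrix (programs such as SPSS may instead use a rule based on prior probabilities), and uses the fact that with two groups the set of dummy variables reduces to a single 0/1 dummy.

```python
import numpy as np

def group_centroids(scores, y):
    """Mean discriminant score of each group: one centroid per group."""
    return {g: scores[y == g].mean() for g in np.unique(y)}

def classification_matrix(scores, y):
    """Rows = actual group, columns = predicted group (nearest-centroid rule, assumed)."""
    cents = group_centroids(scores, y)
    groups = sorted(cents)
    predicted = np.array([min(groups, key=lambda g: abs(s - cents[g])) for s in scores])
    return np.array([[np.sum((y == a) & (predicted == p)) for p in groups] for a in groups])

# Hypothetical discriminant scores for ten cases in two groups.
scores = np.array([-1.8, -1.2, -0.9, 0.3, -0.4, 0.8, 1.1, 1.6, -0.2, 1.9])
y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])

print(classification_matrix(scores, y))   # [[4 1]
                                          #  [1 4]]

# With two groups, the canonical correlation is the (absolute) correlation
# between the scores and a 0/1 dummy variable for group membership.
dummy = (y == 2).astype(float)
canonical_r = abs(np.corrcoef(scores, dummy)[0, 1])
```

The diagonal of the matrix holds the correctly classified cases; the off-diagonal cells are the misclassifications.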

18-9 Statistics Associated with Discriminant Analysis
Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of the variables when the variables are in their original units of measurement.
Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
Eigenvalue. For each discriminant function, the eigenvalue is the ratio of between-group to within-group sums of squares. Large eigenvalues imply superior functions.
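The computation of discriminant scores and of a function's eigenvalue can be sketched as follows. The coefficients, constant, and cases are hypothetical values chosen for illustration; the last line uses the standard identity linking a function's eigenvalue to its canonical correlation, r = sqrt(eigenvalue / (1 + eigenvalue)).

```python
import numpy as np

# Hypothetical unstandardized coefficients and constant (illustrative values only).
constant = -3.0
b = np.array([0.05, 0.5])            # weights for INCOME ($000) and HSIZE

# Hypothetical cases (INCOME, HSIZE) and their group labels.
X = np.array([[50.0, 5], [60.0, 4], [45.0, 6],
              [30.0, 2], [35.0, 3], [25.0, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

scores = constant + X @ b            # multiply, sum the products, add the constant
# scores is approximately [2.0, 2.0, 2.25, -0.5, 0.25, -0.75]

def eigenvalue(scores, y):
    """Between-group SS divided by within-group SS of one function's scores."""
    grand = scores.mean()
    ssb = ssw = 0.0
    for g in np.unique(y):
        s = scores[y == g]
        ssb += len(s) * (s.mean() - grand) ** 2
        ssw += np.sum((s - s.mean()) ** 2)
    return ssb / ssw

ev = eigenvalue(scores, y)
canonical_r = np.sqrt(ev / (1 + ev))  # canonical correlation implied by the eigenvalue
```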

18-10 Statistics Associated with Discriminant Analysis
F values and their significance. These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor, in turn, serves as the metric dependent variable in the ANOVA.
Group means and group standard deviations. These are computed for each predictor for each group.
Pooled within-group correlation matrix. The pooled within-group correlation matrix is computed by averaging the separate covariance matrices for all the groups (each weighted by its degrees of freedom) and rescaling the pooled covariance matrix to correlations.
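Both statistics can be sketched directly. The small data set below is hypothetical, and pooling with weights n_g - 1 is the usual convention assumed here; the helper names are illustrative.

```python
import numpy as np

def univariate_f(X, y):
    """One-way ANOVA F ratio for each predictor (grouping variable = factor)."""
    groups = np.unique(y)
    N, G = len(y), len(groups)
    grand = X.mean(axis=0)
    ssb = sum(np.sum(y == g) * (X[y == g].mean(axis=0) - grand) ** 2 for g in groups)
    ssw = sum(((X[y == g] - X[y == g].mean(axis=0)) ** 2).sum(axis=0) for g in groups)
    return (ssb / (G - 1)) / (ssw / (N - G))

def pooled_within_correlation(X, y):
    """Pool the group covariance matrices (weights n_g - 1), then rescale to correlations."""
    groups = np.unique(y)
    N, G = len(y), len(groups)
    pooled = sum((np.sum(y == g) - 1) * np.cov(X[y == g], rowvar=False) for g in groups) / (N - G)
    sd = np.sqrt(np.diag(pooled))
    return pooled / np.outer(sd, sd)

# Hypothetical data: two predictors, two groups of three cases.
X = np.array([[1.0, 4.0], [2.0, 6.0], [3.0, 5.0],
              [7.0, 5.0], [8.0, 7.0], [9.0, 6.0]])
y = np.array([1, 1, 1, 2, 2, 2])

print(univariate_f(X, y))               # F = 54.0 for predictor 1, 1.5 for predictor 2
print(pooled_within_correlation(X, y))  # 1.0 on the diagonal, 0.5 off the diagonal
```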

18-11 Statistics Associated with Discriminant Analysis
Standardized discriminant function coefficients. The standardized discriminant function coefficients are the discriminant function coefficients used as the multipliers when the variables have been standardized to a mean of 0 and a variance of 1.
Structure correlations. Also referred to as discriminant loadings, the structure correlations represent the simple correlations between the predictors and the discriminant function.
Total correlation matrix. If the cases are treated as if they were from a single sample and the correlations are computed, a total correlation matrix is obtained.
Wilks' λ. Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value varies between 0 and 1. Large values of λ (near 1) indicate that group means do not seem to be different. Small values of λ (near 0) indicate that the group means seem to be different.
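A sketch of Wilks' λ for each predictor on a small hypothetical data set. For a single predictor, λ and the one-way ANOVA F ratio carry the same information: F = ((1 - λ) / λ) × (N - G) / (G - 1), which follows from λ = SSW / SST.

```python
import numpy as np

def wilks_lambda_per_predictor(X, y):
    """Within-group SS / total SS for each predictor; always between 0 and 1."""
    ss_total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    ss_within = sum(((X[y == g] - X[y == g].mean(axis=0)) ** 2).sum(axis=0)
                    for g in np.unique(y))
    return ss_within / ss_total

# Hypothetical data: two predictors, two groups of three cases.
X = np.array([[1.0, 4.0], [2.0, 6.0], [3.0, 5.0],
              [7.0, 5.0], [8.0, 7.0], [9.0, 6.0]])
y = np.array([1, 1, 1, 2, 2, 2])

lam = wilks_lambda_per_predictor(X, y)
N, G = len(y), len(np.unique(y))
F = (1 - lam) / lam * (N - G) / (G - 1)   # the corresponding univariate F ratios
print(lam)   # about [0.069, 0.727]: predictor 1 separates the groups, predictor 2 barely does
print(F)     # 54.0 and 1.5
```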

18-12 Conducting Discriminant Analysis (Fig.)
Formulate the Problem → Estimate the Discriminant Function Coefficients → Determine the Significance of the Discriminant Function → Interpret the Results → Assess Validity of Discriminant Analysis

18-13 Conducting Discriminant Analysis: Formulate the Problem
Identify the objectives, the criterion variable, and the independent variables. The criterion variable must consist of two or more mutually exclusive and collectively exhaustive categories. The predictor variables should be selected based on a theoretical model, previous research, or the experience of the researcher. One part of the sample, called the estimation or analysis sample, is used for estimation of the discriminant function. The other part, called the holdout or validation sample, is reserved for validating the discriminant function. Often the distribution of the number of cases in the analysis and validation samples follows the distribution in the total sample.
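Keeping the group distribution the same in the analysis and holdout samples amounts to a stratified split. A minimal sketch, assuming a hypothetical criterion variable with 21 cases per group and a holdout fraction of 2/7; the function name is illustrative.

```python
import numpy as np

def stratified_split(y, holdout_fraction, seed=0):
    """Split case indices so each group keeps the same share in both samples."""
    rng = np.random.default_rng(seed)
    analysis, holdout = [], []
    for g in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == g))   # shuffle this group's cases
        n_hold = int(round(holdout_fraction * len(idx)))
        holdout.extend(idx[:n_hold].tolist())
        analysis.extend(idx[n_hold:].tolist())
    return np.sort(analysis), np.sort(holdout)

# Hypothetical criterion variable: 21 cases in each of two groups.
y = np.array([1] * 21 + [2] * 21)
analysis_idx, holdout_idx = stratified_split(y, holdout_fraction=2 / 7)
print(len(analysis_idx), len(holdout_idx))   # 30 12
print(np.bincount(y[holdout_idx])[1:])       # [6 6]
```

If scikit-learn is available, `train_test_split(..., stratify=y)` from `sklearn.model_selection` performs the same kind of split.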

18-14, 18-15 Information on Resort Visits: Analysis Sample (Table 18.2)
Columns: No. | Resort Visit | Annual Family Income ($000) | Attitude Toward Travel | Importance Attached to Family Vacation | Household Size | Age of Head of Household | Amount Spent on Family Vacation
Amount Spent on Family Vacation, in table order (L = 1, M = 2, H = 3):
first 15 cases: M, H, H, L, H, H, M, M, H, H, H, H, M, H, H
next 15 cases: L, L, M, M, M, L, M, L, L, L, M, M, L, L, L

18-16 Information on Resort Visits: Holdout Sample (Table 18.3)
Columns: No. | Resort Visit | Annual Family Income ($000) | Attitude Toward Travel | Importance Attached to Family Vacation | Household Size | Age of Head of Household | Amount Spent on Family Vacation
Amount Spent on Family Vacation, in table order (L = 1, M = 2, H = 3): M, H, M, M, H, H, L, L, H, L, M, L

18-17 Conducting Discriminant Analysis: Estimate the Discriminant Function Coefficients
The direct method involves estimating the discriminant function so that all the predictors are included simultaneously. In stepwise discriminant analysis, the predictor variables are entered sequentially, based on their ability to discriminate among groups.

18-18 Results of Two-Group Discriminant Analysis (Table 18.4)
Group means by VISIT for INCOME, TRAVEL, VACATION, HSIZE, and AGE (with totals); group standard deviations (with totals); pooled within-groups correlation matrix of INCOME, TRAVEL, VACATION, HSIZE, and AGE; Wilks' λ (U statistic) and univariate F ratio with 1 and 28 degrees of freedom, with significance, for each variable. (contd.)

18-19 Results of Two-Group Discriminant Analysis (Table 18.4, cont.)
Canonical discriminant functions: eigenvalue, % of variance, cumulative %, and canonical correlation for the function, followed by its test (Wilks' λ, chi-square, df, significance). * marks the one canonical discriminant function remaining in the analysis.
Standardized canonical discriminant function coefficients (FUNC 1) for INCOME, TRAVEL, VACATION, HSIZE, and AGE.
Structure matrix: pooled within-groups correlations between the discriminating variables and the canonical discriminant function, with variables ordered by size of correlation: INCOME, HSIZE, VACATION, TRAVEL, AGE. (contd.)

18-20 Results of Two-Group Discriminant Analysis (Table 18.4, cont.)
Unstandardized canonical discriminant function coefficients (FUNC 1) for INCOME, TRAVEL, VACATION, HSIZE, AGE, and the constant.
Canonical discriminant functions evaluated at group means (group centroids).
Classification results for cases selected for use in the analysis:

                 Predicted group membership
Actual group     1          2
1                80.0%      20.0%
2                0.0%       100.0%

Percent of grouped cases correctly classified: 90.00% (contd.)

18-21 Results of Two-Group Discriminant Analysis (Table 18.4, cont.)
Classification results for cases not selected for use in the analysis (holdout sample):

                 Predicted group membership
Actual group     1          2
1                66.7%      33.3%
2                0.0%       100.0%

Percent of grouped cases correctly classified: 83.33%

18-22 Conducting Discriminant Analysis: Determine the Significance of the Discriminant Function
The null hypothesis that, in the population, the means of all discriminant functions in all groups are equal can be statistically tested. In SPSS, this test is based on Wilks' λ. If several functions are tested simultaneously (as in the case of multiple discriminant analysis), the Wilks' λ statistic is the product of the univariate λ for each function. The significance level is estimated based on a chi-square transformation of the statistic. If the null hypothesis is rejected, indicating significant discrimination, one can proceed to interpret the results.
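A sketch of this test, assuming the commonly used Bartlett chi-square approximation: the univariate λ of a function with eigenvalue e is 1 / (1 + e), the joint λ is their product, chi-square = -(N - 1 - (p + G) / 2) ln λ with p predictors, and df = (p - r)(G - r - 1) after removing the first r functions. The eigenvalues and sample size below are hypothetical, and the function name is illustrative.

```python
import numpy as np

def function_tests(eigenvalues, n_cases, n_predictors, n_groups):
    """Wilks' lambda and Bartlett's chi-square after removing the first r functions."""
    ev = np.asarray(eigenvalues, dtype=float)
    results = []
    for r in range(len(ev)):
        lam = np.prod(1.0 / (1.0 + ev[r:]))            # product of the univariate lambdas
        chi2 = -(n_cases - 1 - (n_predictors + n_groups) / 2) * np.log(lam)
        df = (n_predictors - r) * (n_groups - r - 1)
        results.append((r, lam, chi2, df))
    return results

# Hypothetical eigenvalues for a three-group, five-predictor problem with 30 cases.
for r, lam, chi2, df in function_tests([1.5, 0.3], n_cases=30, n_predictors=5, n_groups=3):
    print(f"after function {r}: lambda={lam:.4f}, chi-square={chi2:.2f}, df={df}")
# after function 0: lambda=0.3077, chi-square=29.47, df=10
# after function 1: lambda=0.7692, chi-square=6.56, df=4
```

If SciPy is available, `scipy.stats.chi2.sf(chi2, df)` gives the corresponding significance level.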

18-23 Conducting Discriminant Analysis: Interpret the Results
The interpretation of the discriminant weights, or coefficients, is similar to that in multiple regression analysis. Given the multicollinearity in the predictor variables, there is no unambiguous measure of the relative importance of the predictors in discriminating between the groups. With this caveat in mind, we can obtain some idea of the relative importance of the variables by examining the absolute magnitude of the standardized discriminant function coefficients. Some idea of the relative importance of the predictors can also be obtained by examining the structure correlations, also called canonical loadings or discriminant loadings. These are simple correlations between each predictor and the discriminant function; squared, each indicates the proportion of variance that the predictor shares with the function. Another aid to interpreting discriminant analysis results is to develop a characteristic profile for each group by describing each group in terms of the group means for the predictor variables.
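Both importance measures can be computed from the unstandardized weights and the pooled within-group covariance matrix. A minimal sketch on hypothetical random data, assuming that standardization uses the pooled within-group standard deviations and that structure correlations are pooled within-group correlations between each predictor and the scores; the helper names are illustrative.

```python
import numpy as np

def pooled_within_cov(X, y):
    """Group covariance matrices pooled with weights n_g - 1."""
    groups = np.unique(y)
    N, G = len(y), len(groups)
    return sum((np.sum(y == g) - 1) * np.cov(X[y == g], rowvar=False) for g in groups) / (N - G)

def standardized_coefficients(b, X, y):
    """Unstandardized weight times the predictor's pooled within-group standard deviation."""
    return b * np.sqrt(np.diag(pooled_within_cov(X, y)))

def structure_correlations(b, X, y):
    """Pooled within-group correlation between each predictor and the scores X @ b."""
    S = pooled_within_cov(X, y)
    return (S @ b) / np.sqrt(np.diag(S) * (b @ S @ b))

# Hypothetical data: three predictors measured on very different scales.
rng = np.random.default_rng(2)
sd = np.array([1.0, 5.0, 10.0])
X = np.vstack([rng.normal([0.0, 0.0, 0.0], sd, size=(25, 3)),
               rng.normal([1.0, 2.0, 1.0], sd, size=(25, 3))])
y = np.repeat([1, 2], 25)
S = pooled_within_cov(X, y)
b = np.linalg.solve(S, X[y == 1].mean(axis=0) - X[y == 2].mean(axis=0))  # two-group weights

std_b = standardized_coefficients(b, X, y)
loadings = structure_correlations(b, X, y)
print(np.argsort(-np.abs(std_b)))      # predictors ranked by |standardized coefficient|
print(np.argsort(-np.abs(loadings)))   # predictors ranked by |structure correlation|
```

The two rankings need not agree when predictors are correlated, which is the caveat the slide raises.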

18-24 Conducting Discriminant Analysis: Assess Validity of Discriminant Analysis
Many computer programs, such as SPSS, offer a leave-one-out cross-validation option. When a holdout sample is used, the discriminant weights, estimated by using the analysis sample, are multiplied by the values of the predictor variables in the holdout sample to generate discriminant scores for the cases in the holdout sample. The cases are then assigned to groups based on their discriminant scores and an appropriate decision rule. The hit ratio, or the percentage of cases correctly classified, can then be determined by summing the diagonal elements of the classification matrix and dividing by the total number of cases. It is helpful to compare the percentage of cases correctly classified by discriminant analysis with the percentage that would be obtained by chance. Classification accuracy achieved by discriminant analysis should be at least 25% greater than that obtained by chance.
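A sketch of the hit ratio and a chance comparison for a hypothetical holdout classification matrix. Two assumptions are made here: chance accuracy is measured with the proportional chance criterion (the sum of squared group proportions, which equals 1/G for equal-sized groups), and "at least 25% greater" is read as a relative improvement, i.e. hit ratio ≥ 1.25 × chance.

```python
import numpy as np

def hit_ratio(matrix):
    """Correctly classified cases (the diagonal) divided by all cases."""
    m = np.asarray(matrix)
    return np.trace(m) / m.sum()

def proportional_chance(matrix):
    """Chance accuracy from actual group sizes: sum of squared group proportions (assumed)."""
    m = np.asarray(matrix)
    p = m.sum(axis=1) / m.sum()
    return float(np.sum(p ** 2))

# Hypothetical holdout classification matrix (rows = actual, columns = predicted).
confusion = [[9, 1],
             [2, 8]]
hr = hit_ratio(confusion)                 # 17 / 20 = 0.85
chance = proportional_chance(confusion)   # 0.5 with two equal-sized groups
print(hr >= 1.25 * chance)                # True: 0.85 exceeds 0.625 (relative 25% rule, assumed)
```

For leave-one-out validation, scikit-learn's `cross_val_predict` with `LeaveOneOut()` and `LinearDiscriminantAnalysis`, followed by `sklearn.metrics.confusion_matrix`, would produce a matrix that can be passed to the same functions.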

18-25 Results of Three-Group Discriminant Analysis (Table 18.5)
Group means by AMOUNT for INCOME, TRAVEL, VACATION, HSIZE, and AGE (with totals); group standard deviations (with totals); pooled within-groups correlation matrix of INCOME, TRAVEL, VACATION, HSIZE, and AGE. (contd.)

18-26 Results of Three-Group Discriminant Analysis (Table 18.5, cont.)
Wilks' λ (U statistic) and univariate F ratio with 2 and 27 degrees of freedom, with significance, for INCOME, TRAVEL, VACATION, HSIZE, and AGE.
Canonical discriminant functions: eigenvalue, % of variance, cumulative %, and canonical correlation for each function, followed by the tests of the functions (Wilks' λ, chi-square, df, significance). * marks the two canonical discriminant functions remaining in the analysis.
Standardized canonical discriminant function coefficients (FUNC 1, FUNC 2) for INCOME, TRAVEL, VACATION, HSIZE, and AGE. (contd.)

18-27 Results of Three-Group Discriminant Analysis (Table 18.5, cont.)
Structure matrix: pooled within-groups correlations between the discriminating variables and the canonical discriminant functions (FUNC 1, FUNC 2), with variables ordered by size of correlation within function: INCOME, HSIZE, VACATION, TRAVEL, AGE.
Unstandardized canonical discriminant function coefficients (FUNC 1, FUNC 2) for INCOME, TRAVEL, VACATION, HSIZE, AGE, and the constant.
Canonical discriminant functions evaluated at group means (group centroids) on FUNC 1 and FUNC 2. (contd.)

18-28 Results of Three-Group Discriminant Analysis (Table 18.5, cont.)
Classification results (analysis sample):

                 Predicted group membership
Actual group     1          2          3
1                90.0%      10.0%      0.0%
2                10.0%      90.0%      0.0%
3                0.0%       20.0%      80.0%

Percent of grouped cases correctly classified: 86.67%

Classification results for cases not selected for use in the analysis (holdout sample):

                 Predicted group membership
Actual group     1          2          3
1                75.0%      25.0%      0.0%
2                0.0%       75.0%      25.0%
3                25.0%      0.0%       75.0%

Percent of grouped cases correctly classified: 75.00%

18-29 All-Groups Scattergram (Fig.): Function 1 across, Function 2 down; * indicates a group centroid.

18-30 Territorial Map (Fig.): Function 1 across, Function 2 down; * indicates a group centroid.

18-31 Stepwise Discriminant Analysis Stepwise discriminant analysis is analogous to stepwise multiple regression (see Chapter 17) in that the predictors are entered sequentially based on their ability to discriminate between the groups. An F ratio is calculated for each predictor by conducting a univariate analysis of variance in which the groups are treated as the categorical variable and the predictor as the criterion variable. The predictor with the highest F ratio is the first to be selected for inclusion in the discriminant function, if it meets certain significance and tolerance criteria. A second predictor is added based on the highest adjusted or partial F ratio, taking into account the predictor already selected.
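A minimal sketch of forward selection by partial F, under stated assumptions: the multivariate Wilks' λ of a predictor set is det(W) / det(T), the partial λ for adding a predictor is λ(new set) / λ(current set), and the F to enter is ((N - G - p) / (G - 1)) × (1 - partial λ) / partial λ with p predictors already entered (for p = 0 this is the univariate F). The threshold of 3.84 is an assumed value, the data are hypothetical, and the retention (removal) test described on the next slide is omitted.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Multivariate Wilks' lambda, det(W) / det(T), for the predictors in cols."""
    Xs = X[:, cols]
    Ct = Xs - Xs.mean(axis=0)
    T = Ct.T @ Ct                                   # total SSCP matrix
    W = np.zeros_like(T)                            # within-group SSCP matrix
    for g in np.unique(y):
        Cg = Xs[y == g] - Xs[y == g].mean(axis=0)
        W += Cg.T @ Cg
    return np.linalg.det(W) / np.linalg.det(T)

def forward_stepwise(X, y, f_to_enter=3.84):
    """Enter, one at a time, the predictor with the largest partial F (no removal step)."""
    N, k = X.shape
    G = len(np.unique(y))
    selected, current = [], 1.0                     # lambda of the empty set is 1
    while len(selected) < k:
        p = len(selected)
        best = None
        candidates = [j for j in range(k) if j not in selected]
        for j in candidates:
            lam = wilks_lambda(X, y, selected + [j])
            partial = lam / current                 # partial lambda for adding predictor j
            f = (N - G - p) / (G - 1) * (1 - partial) / partial
            if best is None or f > best[1]:
                best = (j, f, lam)
        if best[1] < f_to_enter:
            break
        selected.append(best[0])
        current = best[2]
    return selected

# Hypothetical data: predictor 0 separates the groups strongly, 1 moderately, 2 not at all.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 3)),
               rng.normal([2.0, 0.8, 0.0], 1.0, size=(40, 3))])
y = np.repeat([1, 2], 40)
print(forward_stepwise(X, y))   # predictor 0 enters first
```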

18-32 Stepwise Discriminant Analysis
Each predictor selected is tested for retention based on its association with the other predictors selected. The process of selection and retention continues until all predictors meeting the significance criteria for inclusion and retention have been entered in the discriminant function. The choice of stepwise procedure depends on the optimizing criterion adopted. The Mahalanobis procedure is based on maximizing a generalized measure of the distance between the two closest groups. The order in which the variables were selected also indicates their importance in discriminating between the groups.
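The "distance between the two closest groups" can be sketched as the smallest squared Mahalanobis distance between group centroids, assuming the pooled within-group covariance matrix as the metric. A Mahalanobis-criterion stepwise procedure would, at each step, add the predictor that makes this smallest distance as large as possible; the sketch below computes only the distance itself, on hypothetical data built so that the answer can be checked by hand.

```python
import numpy as np
from itertools import combinations

def pooled_within_cov(X, y):
    """Group covariance matrices pooled with weights n_g - 1."""
    groups = np.unique(y)
    N, G = len(y), len(groups)
    return sum((np.sum(y == g) - 1) * np.cov(X[y == g], rowvar=False) for g in groups) / (N - G)

def closest_groups(X, y):
    """(D^2, group a, group b) for the pair of centroids with the smallest Mahalanobis D^2."""
    S_inv = np.linalg.inv(pooled_within_cov(X, y))
    means = {g: X[y == g].mean(axis=0) for g in np.unique(y)}
    pairs = []
    for a, b in combinations(sorted(means), 2):
        d = means[a] - means[b]
        pairs.append((float(d @ S_inv @ d), a, b))
    return min(pairs)

# Hypothetical data: identical within-group spread around three group centroids.
spread = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
centroids = {1: [0.0, 0.0], 2: [3.0, 0.0], 3: [0.0, 1.0]}
X = np.vstack([spread + c for c in centroids.values()])
y = np.repeat(list(centroids), len(spread))

d2, g_a, g_b = closest_groups(X, y)
print(g_a, g_b, round(d2, 4))   # 1 3 1.5
```

Here the pooled within-group covariance is (2/3) times the identity, so groups 1 and 3, one unit apart, give D^2 = 1.5.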

18-33 SPSS Windows
The DISCRIMINANT program performs both two-group and multiple discriminant analysis. To select this procedure using SPSS for Windows, click: Analyze > Classify > Discriminant …