Chapter 10 Discriminant Analysis


1 Chapter 10 Discriminant Analysis
Introduction
Discriminant analysis linear equation
Fisher's discriminant analysis
Bayes discriminant analysis
Stepwise discriminant analysis
Discriminant analysis in SPSS (version 20)
Summary
Glossary
Exercises

2 Statistics in practice
When a company, for example WAHAHA, is going to produce a new kind of product, the managers usually want to predict at the outset whether this product will be a success or not. Many factors affect the future of a new product, such as the consumption level and consumption habits of the potential consumers, the infrastructure of the country, and the wrapping, design and quality of the product. The managers need a market survey to collect data on these factors. Relying on that dataset, how can they predict whether the new product will be a success or not? One way is through discriminant analysis.

3 10.1 Introduction
Discriminant Analysis (DA)
Scope Of DA
DA Steps
DA Classification

4 Discriminant Analysis (DA)
A set of tools and methods used to distinguish between two or more groups of populations and to determine how to allocate new observations into groups. Discriminant analysis plays a role similar to multiple linear regression in that it predicts an outcome. The basic purpose of discriminant analysis is to estimate the relationship between a single categorical dependent variable and a set of quantitative independent variables.

5 Scope Of DA Discriminant analysis has widespread applications in situations where the primary objective is identifying the group to which an object belongs. Applications include predicting the success or failure of a new product, determining what category of credit risk a person falls into, classifying students by vocational interest, deciding whether a student should be admitted to graduate school, and predicting whether a company will be a success or not.

6 DA Steps Discriminant analysis is usually split into a two-step process. The first step is to test the significance of a set of discriminant functions. The second step is classification.

7 DA Classification
Two-group discriminant analysis and multi-group discriminant analysis, by the number of groups to discriminate.
Linear discriminant analysis and nonlinear discriminant analysis, according to the mathematical model used.
Stepwise discriminant analysis and sequential discriminant analysis, according to the method used for selecting variables.
Fisher discriminant analysis and Bayes discriminant analysis, according to the discriminant criterion.

8 10.2 Discriminant analysis linear equation
Discriminant Analysis Form
Main Assumptions

9 Discriminant Analysis Form
The linear combination for a discriminant analysis, also known as the discriminant function, is derived from an equation of the following form:

D = b0 + b1·X1 + b2·X2 + … + bk·Xk

where D is the discriminant function; bi is the coefficient or weight of variable Xi; Xi is a predictor variable; and b0 is a constant.
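Once the weights are known, scoring a new observation is a single dot product. A minimal sketch follows; the coefficient values are made up for illustration, not estimates from any dataset in this chapter:

```python
import numpy as np

# Hypothetical constant b0 and weights b1..b3 for illustration only;
# in practice SPSS estimates these from the training data.
b0 = -2.0
b = np.array([0.5, 1.2, -0.3])  # weights b1, b2, b3

def discriminant_score(x):
    """D = b0 + b1*X1 + b2*X2 + b3*X3 for one observation x = (X1, X2, X3)."""
    return b0 + b @ x

print(discriminant_score(np.array([1.0, 2.0, 3.0])))  # -2.0 + 0.5 + 2.4 - 0.9 = 0.0
```

The resulting D score only becomes meaningful when compared against the group centroids, as the SPSS examples later in the chapter show.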

10 Main Assumptions The sample is random, and each predictor variable is normally distributed. The allocations to the dependent categories in the initial classification are correct. There are two or more categorical groups, and each case belongs to exactly one group, so that the groups are mutually exclusive and collectively exhaustive. Each group must be well defined and clearly differentiated from every other group. Putting a median split on an attitude scale is not a natural way to form groups; partitioning quantitative variables is only justifiable if there are easily identifiable gaps at the points of division.

11 Main Assumptions The groups should be defined before the data are collected. The attributes used to separate the groups should discriminate clearly between them, so that there is no group overlap or the overlap is minimal. The group sizes of the dependent variable should not be extremely different.

12 10.3 Fisher’s discriminant analysis
The idea of Fisher’s discriminant analysis
One-way ANOVA Rule

13 The idea of Fisher’s discriminant analysis
Fisher’s discriminant analysis was introduced by R. A. Fisher in 1936. The idea of Fisher’s discriminant analysis is to base the discriminant rule on a projection a, such that a good separation of groups of p-dimensional data is achieved.

14 One-way ANOVA Rule. Let y = Xa denote a linear combination of the observations, where a is a vector in the space R^p. Then the total sum of squares of y, y'Hy, is equal to

(10.1)  y'Hy = a'X'HXa

with the centering matrix H = I_n - n^(-1) 1_n 1_n' and 1_n = (1, …, 1)'. Assume we have samples X_j, j = 1, …, J, where n_j is a positive integer denoting the size of the j-th sample. Then find the linear combination a which maximizes the ratio of the between-group sum of squares to the within-group sum of squares.

15 One-way ANOVA Rule. The within-group sum of squares is given by

(10.2)  sum_{j=1}^{J} y_j'H_j y_j = sum_{j=1}^{J} a'X_j'H_j X_j a

where X_j denotes the j-th submatrix of X corresponding to the observations of the j-th group, y_j = X_j a, and H_j denotes the n_j x n_j centering matrix. The within-group sum of squares measures the sum of variations within each group.

16 One-way ANOVA Rule. The between-group sum of squares is

(10.3)  sum_{j=1}^{J} n_j (ybar_j - ybar)^2 = sum_{j=1}^{J} n_j {a'(xbar_j - xbar)}^2

where ybar_j and xbar_j denote the means of y_j and X_j, and ybar and xbar denote the means of y and X, respectively. The between-group sum of squares measures the variation of the means across groups. The total sum of squares (10.1) is the sum of the within-group sum of squares and the between-group sum of squares, that is,

y'Hy = sum_{j} y_j'H_j y_j + sum_{j} n_j (ybar_j - ybar)^2.

17 One-way ANOVA Rule. Fisher’s discriminant analysis selects the projection vector a that maximizes the ratio

(10.4)  (a'Ba) / (a'Wa)

where W = sum_j X_j'H_j X_j is the within-group scatter matrix and B = sum_j n_j (xbar_j - xbar)(xbar_j - xbar)' is the between-group scatter matrix. The solution can be easily found: since the vector a which maximizes formula (10.4) is the eigenvector of W^(-1)B that corresponds to the largest eigenvalue, a discriminant rule is easy to obtain. Classify x into the group j for which |a'(x - xbar_j)| is smallest. For more properties and details of Fisher’s discriminant analysis, we refer to [1, 2, 3].
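The whole construction, scatter matrices, eigenvector, and classification rule, can be sketched in a few lines. The two groups below are synthetic data chosen only to illustrate the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic groups in R^2; illustrative data, not from the chapter.
groups = [rng.normal(loc=[0.0, 0.0], scale=0.5, size=(30, 2)),
          rng.normal(loc=[2.0, 1.0], scale=0.5, size=(30, 2))]

X = np.vstack(groups)
xbar = X.mean(axis=0)

# Within-group scatter W and between-group scatter B, cf. (10.2) and (10.3).
W = sum((Xj - Xj.mean(axis=0)).T @ (Xj - Xj.mean(axis=0)) for Xj in groups)
B = sum(len(Xj) * np.outer(Xj.mean(axis=0) - xbar, Xj.mean(axis=0) - xbar)
        for Xj in groups)

# Fisher's direction: eigenvector of W^{-1}B with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
a = np.real(eigvecs[:, np.argmax(np.real(eigvals))])

def classify(x):
    """Assign x to the group j whose projected mean makes |a'(x - xbar_j)| smallest."""
    return int(np.argmin([abs(a @ (x - Xj.mean(axis=0))) for Xj in groups]))
```

Note that the decomposition of (10.1) can be checked numerically here: the total scatter of the pooled data equals W + B exactly.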

18 10.4 Bayes discriminant analysis
The idea of Bayes discriminant analysis
Introduce The Principle

19 The idea of Bayes discriminant analysis
Often we know in advance roughly how likely an observation is to come from each group. This kind of information can usually be utilized to form a prior probability distribution. According to Bayes’ rule, the prior distribution can be further refined after the observations are obtained, forming the posterior distribution. The discriminant involving the posterior distribution is often referred to as the Bayes discriminant.


21 Introduce The Principle
Assume that each group Π_i has a density function f_i(x), where i = 1, …, k. We already know that the prior probabilities are q_1, …, q_k respectively, with q_i > 0 and q_1 + … + q_k = 1. Suppose that a partition of the sample space R^p is given: R_1, …, R_k. Hence R_i ∩ R_j = ∅ whenever i ≠ j, and R_1 ∪ … ∪ R_k = R^p. The discriminant rule is usually of the following form: x belongs to Π_i if x ∈ R_i. We use C(j|i) to denote the cost incurred when a sample coming from Π_i is wrongly assigned to Π_j, and the probability of this misjudgment is

P(j|i) = ∫_{R_j} f_i(x) dx.

22 Introduce The Principle
Let C(i|i) = 0. Thus the ECM (expected cost of misclassification) caused by the above discriminant rule is

ECM = sum_{i=1}^{k} q_i sum_{j≠i} C(j|i) P(j|i).

Bayes discriminant analysis finds a partition R_1, …, R_k of R^p such that the ECM is minimized. For more properties of Bayes discriminant analysis and detailed mathematical proofs, refer to [1, 2, 3].
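For two groups, the ECM-minimizing partition allocates x to Π_1 exactly when q1·C(2|1)·f1(x) ≥ q2·C(1|2)·f2(x). A minimal numerical sketch for two univariate normal groups follows; the densities, priors and costs are illustrative assumptions, not values from the text:

```python
import math

def f1(x):  # density of group 1: N(0, 1)
    return math.exp(-0.5 * x ** 2) / math.sqrt(2 * math.pi)

def f2(x):  # density of group 2: N(2, 1)
    return math.exp(-0.5 * (x - 2.0) ** 2) / math.sqrt(2 * math.pi)

q1, q2 = 0.6, 0.4    # prior probabilities (illustrative)
c21, c12 = 1.0, 1.0  # c21 = C(2|1): cost of sending a group-1 case to R_2

# ECM-minimizing rule: allocate x to group 1 when q1*C(2|1)*f1(x) >= q2*C(1|2)*f2(x).
def bayes_classify(x):
    return 1 if q1 * c21 * f1(x) >= q2 * c12 * f2(x) else 2

# Approximate the ECM of this rule on a grid: P(2|1) is the mass of f1
# falling in R_2, and P(1|2) the mass of f2 falling in R_1.
dx = 0.01
xs = [-10.0 + i * dx for i in range(2001)]
p21 = sum(f1(x) for x in xs if bayes_classify(x) == 2) * dx
p12 = sum(f2(x) for x in xs if bayes_classify(x) == 1) * dx
ecm = q1 * c21 * p21 + q2 * c12 * p12
```

With equal costs, the rule reduces to comparing the posterior probabilities q_i·f_i(x), which is the usual "largest posterior" form of the Bayes discriminant.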

23 10.5 Stepwise discriminant analysis
The idea of Stepwise discriminant analysis
The procedures of undertaking stepwise discriminant analysis in SPSS

24 The idea of Stepwise discriminant analysis
Stepwise discriminant analysis is an attempt to find the best set of predictors. It is often used in an exploratory situation to identify, from among a larger number of variables, those that might be used later in a more rigorous, theoretically driven study. In stepwise discriminant analysis, the independent variable most correlated with group membership is usually entered first by the stepwise program, then the second, and so on until an additional independent variable adds no significant amount to the canonical R squared.

25 The procedures of undertaking stepwise discriminant analysis in SPSS
The most economical method is the Wilks Lambda method.
1. Hit “Analyze”, choose “Classify”, then select “Discriminant”.
2. Select the grouping variable and transfer it to the “Grouping Variable” box. Hit the “Define Range” button and enter the lowest and highest codes for your grouping variable.
3. Hit “Continue”, then choose the predictors and enter them into the “Independents” box. Click “Use Stepwise Methods”.
4. Choose “Statistics”, then “Means”, “Univariate ANOVAs”, “Box’s M”, “Unstandardized” and “Within-Groups Correlation”.
5. Hit “Classify”. Select “Compute From Group Sizes”, “Summary Table”, “Leave-One-Out Classification”, “Within-Groups”, and all “Plots”.
6. Click “Continue”, then “Save”, and select “Predicted Group Membership” and “Discriminant Scores”.
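The greedy core of the Wilks Lambda criterion can be sketched as follows. This is a sketch of the stepwise idea only: it omits the F-to-enter and F-to-remove significance tests that SPSS applies on top of the criterion, and the demo data are synthetic:

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda = det(W) / det(T) for the predictor subset `cols`."""
    Xs = X[:, cols]
    T = (Xs - Xs.mean(axis=0)).T @ (Xs - Xs.mean(axis=0))  # total scatter
    W = np.zeros_like(T)                                   # within-group scatter
    for g in np.unique(y):
        Xg = Xs[y == g]
        W += (Xg - Xg.mean(axis=0)).T @ (Xg - Xg.mean(axis=0))
    return np.linalg.det(W) / np.linalg.det(T)

def stepwise_select(X, y, n_select):
    """Greedy forward selection: at each step enter the predictor giving the
    smallest Wilks' lambda for the enlarged subset (smaller = better separation)."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        best = min(remaining, key=lambda j: wilks_lambda(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Tiny synthetic demo: column 0 separates the groups, column 1 is pure noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = np.column_stack([3.0 * y + rng.normal(0, 0.5, 40),
                     rng.normal(0, 1.0, 40)])
order = stepwise_select(X, y, 2)
```

On this demo the discriminating column is entered first, because its one-variable Wilks' lambda is far below that of the noise column.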

26 10.6 Discriminant analysis in SPSS(version 20)
Example 1
Example 2

27 Example 1 To undertake discriminant analysis in SPSS, open the “Analyze” menu, click “Classify”, then click “Discriminant”. In this case we are looking at a dataset that describes mobile phones with a drawback A. Research tells us that the perceived severity of drawback A is often in the eye of the beholder: some people may think that drawback A affects the mobile phone, and other people may not see it as a problem. For example, women may think that drawback A affects the mobile phone, while men may think that it does not.

28 Example 1 We are going to look at whether the symptoms reported on four different measures can tell us whether those symptom ratings were provided by a woman (“people” = 1) or by a man (“people” = 2). In the dialog box, put in “people” as the “grouping variable” (in other words, the variable that you think defines the different groups). It will appear in the box with two question marks after it: you have to tell SPSS what the codes are for the two groups that you want to compare. Click “Define Range” and type in “1” and “2” as the different values of the “people” variable that you want to compare. Then click “Continue” to go on.

29 Example 1 Then put the four “question” variables in as the predictors (“independents”). The sub-dialog box “Statistics” lets you see descriptive statistics on each predictor variable for the different groups. Let’s check “Means” to see some basic descriptive statistics.

30 Example 1 (screenshot)

31 Example 1 (screenshot)

32 Example 1 Click “Continue” to go back to the main dialog box. Then, in the main dialog box, click the “Classify” button to see the next sub-dialog. Here’s the sub-dialog that you get when you hit the “Classify” button:

33 Example 1 (screenshot of the “Classify” sub-dialog)

34 Example 1 On this screen, check the box for the “summary table.” This will give you the classification table (sensitivity, specificity, etc.) on your printout. Hit “Continue” to go back to the main dialog box, and then hit “OK” to see the results of your analysis.

35 Example 1 The following is the result. Discriminant [DataSet1] C:\Users\lenovo\Documents\peopleexample.sav This table just tells you if there’s any missing data.

36 Example 1 This table shows the means that we asked for—it gives means on each variable for people in each sub-group, and also the overall means on each variable.

37 Example 1 Analysis 1 Summary of Canonical Discriminant Functions
This table tells you something about the “latent variable” that you have constructed (the discriminant function), which helps you to differentiate between the groups.

38 Example 1 Here is the multivariate test—Wilks’ lambda, just like in MANOVA.

39 Example 1 These “discriminant function coefficients” work just like the beta weights in regression. Based on these, you can write out the equation for the discriminant function. Using this equation, given someone’s scores on the four measures, you can calculate their score on the discriminant function. To figure out what that D function score means, look at the following group centroids.

40 Example 1 This table tells you the correlation between each item and the discriminant function, but we won’t use it for anything here.

41 Example 1 In practical terms, we usually figure out which group a person is in by calculating a cut score halfway between the two centroids. If an individual person’s score on the function (calculated by plugging their scores on the four measures into the equation we wrote out above) is above –1.034, then they were probably a man. If their function score is below –1.034, then they were probably a woman.
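Numerically the rule is just a midpoint comparison. In the sketch below, the centroid values are hypothetical, chosen only so that their midpoint reproduces the cut score of –1.034 quoted above; SPSS reports the actual centroids:

```python
# Hypothetical group centroids for illustration; their midpoint reproduces
# the -1.034 cut score quoted in the text.
centroid_women = -2.068
centroid_men = 0.0

cut = (centroid_women + centroid_men) / 2.0  # cut score halfway between centroids

def predict(d_score):
    """Classify a discriminant-function score against the cut score."""
    return "man" if d_score > cut else "woman"
```

A halfway cut is appropriate when the two groups have equal sizes and equal misclassification costs; otherwise the cut point shifts toward the smaller or cheaper-to-misclassify group.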

42 Example 1 Classification Statistics

43 Example 1 Here is the classification table that we got by selecting that option in the SPSS dialog box. It gives information about actual group membership and predicted group membership.

44 Example 2 A sample of 12 riding-lawnmower owners and 12 nonowners is drawn from a city, and the income (in thousands of dollars) and lot size (in thousands of square feet) of each family are recorded. A riding-mower manufacturer wants to see whether these two variables adequately separate owners from nonowners, and if so, to direct its marketing on the basis of that separation.

45 Example 2 The data table is the following. Lawnmower owners and nonowners dataset

46 Example 2 If a family has a lawnmower, then “family” = 1; if a family does not have a lawnmower, then “family” = 2. In the dialog box, put in “family” as the “grouping variable” (the variable that you think defines the different groups). It will appear in the box with two question marks after it. Click “Define Range” and type in “1” and “2” as the different values of the “family” variable that you want to compare. Then click “Continue” to go on.

47 Example 2 (screenshot)

48 Example 2 Then put the two predictor variables, income and lot size, in as the “independents”. The sub-dialog box “Statistics” lets you see descriptive statistics on each predictor variable for the different groups. Let’s check “Means” to see some basic descriptive statistics.

49 Example 2 (screenshot)

50 Example 2 (screenshot)

51 Example 2 Click “Continue” to go back to the main dialog box. Then, in the main dialog box, click the “Classify” button to see the next sub-dialog. Here’s the sub-dialog that you get when you click the “Classify” button:

52 Example 2 On this screen, check the box for the “summary table.” This will give you the classification table (sensitivity, specificity, etc.) on your printout. Hit “Continue” to go back to the main dialog box, and then hit “OK” to see the results of your analysis.

53 Example 2 The following is the result. This table just tells you if there’s any missing data.

54 Example 2 This table shows the means that we asked for—it gives means on each variable for people in each sub-group, and also the overall means on each variable.

55 Example 2 Analysis 1 Summary of Canonical Discriminant Functions
This table tells you something about the “latent variable” that you have constructed (the discriminant function), which helps you to differentiate between the groups.

56 Example 2 Here is the multivariate test—Wilks’ lambda.

57 Example 2 These “discriminant function coefficients” work just like the beta weights in regression. Based on these, you can write out the equation for the discriminant function. Using this equation, given a family’s income and lot size, you can calculate their score on the discriminant function. To figure out what that function score means, look at the following group centroids.

58 Example 2 This table tells you the correlation between each item and the discriminant function, but we won’t use it for anything here.

59 Example 2 In practical terms, we usually figure out which group a family is in by calculating a cut score halfway between the two centroids. If an individual family’s score on the function (calculated by plugging their income and lot size into the equation we wrote out above) is above 0, then they were probably lawnmower owners. If their function score is below 0, then they were probably nonowners.

60 Example 2 Classification Statistics

61 Example 2 Here is the classification table that we got by selecting that option in the SPSS dialog box. It gives information about actual group membership and predicted group membership.

62 Summary Discriminant analysis is a set of tools and methods used to distinguish between two or more groups of populations and to determine how to allocate new observations into groups. The purpose of discriminant analysis is to estimate the relationship between a single categorical dependent variable and a set of quantitative independent variables. Discriminant analysis plays a role similar to multiple linear regression in that it predicts an outcome, and it has widespread applications in situations where the primary objective is identifying the group to which an object belongs.

63 Summary This chapter introduced several well-known discriminant analysis methods: Fisher’s discriminant analysis, Bayes discriminant analysis and stepwise discriminant analysis. It also showed how to undertake discriminant analysis in SPSS 20.0 through two examples.

64 Exercises
1. What is the purpose of DA?
2. What is the idea of Fisher’s DA?
3. What is stepwise DA?
4. To apply DA, what conditions should hold?

65 Exercises 5. The following table gives three financial indicators of 10 companies with good financial status (group 1) and 6 companies with poor financial status (group 2). A new company’s data are (420.50, 32.42, 1.98). Using DA, determine whether this company is in group 1 or group 2.

66 Exercises 6. The following table gives two weather-forecast variables for 8 flood years (group 1) and 8 nonflood years (group 2). One year’s data are (33.6, –1.5). Using DA, determine whether this year is in group 1 or group 2. Weather forecast data

67 Glossary
Discriminant analysis: a set of methods and tools used to distinguish between two or more groups of populations and to determine how to allocate new observations into groups.
Fisher’s discriminant analysis: bases the discriminant rule on a projection such that a good separation of the groups is achieved.
Bayes discriminant: the discriminant involving the posterior distribution.
Stepwise discriminant analysis: a method to find the best set of predictors.
Wilks Lambda method: a method that selects predictors so as to minimize Wilks’ lambda.

68 Bibliography
[1] Y. T. Zhang and K. T. Fang, Introduction to Multivariate Analysis, Beijing: Science Press.
[2] X. Q. He, Multivariate Statistical Analysis (2nd ed.), Beijing: China Renmin University Press.
[3] W. Härdle and L. Simar, Applied Multivariate Statistical Analysis, Berlin-Heidelberg-New York: Springer Verlag.
[4] G. L. Wang and X. Q. He, Multivariate Economic Data Statistical Analysis, Shaanxi Science Press.
[5] R. Johnson and D. Wichern, Applied Multivariate Statistical Analysis, Upper Saddle River, N.J.: Prentice Hall, 1982.

69 End of Chapter 10

