Presentation on theme: "ANALYSIS OF VARIANCE (ANOVA)". Presentation transcript:
1 ANALYSIS OF VARIANCE (ANOVA)
2 STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS?
1. Examine Strength and Direction of Relationships
   a. Bivariate (e.g., Pearson Correlation, $r$)
      Between one variable and another: $r_{xy}$ or $Y = a + b_1 x_1$
   b. Multivariate (e.g., Multiple Regression Analysis)
      Between one dep. var. and each of several indep. variables, while holding all other indep. variables constant:
      $Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$
2. Compare Groups
   a. Compare Proportions (e.g., Chi-Square Test, $\chi^2$)
      $H_0: P_1 = P_2 = P_3 = \dots = P_k$
   b. Compare Means (e.g., Analysis of Variance)
      $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
3 ONE-WAY ANOVA
When Do You Use ANOVA?
- To compare the mean values of a certain characteristic among two or more groups.
- To see whether two or more groups are equal (or different) on a given metric characteristic.
- To examine whether a metric dependent variable is a function of a categorical independent variable.
ANOVA was developed in 1919 by Sir Ronald Fisher (1890-1962), a British statistician and geneticist/evolutionary biologist.
4 Statistical Techniques and Levels of Measurement:
                                      INDEPENDENT
DEPENDENT              NOMINAL/CATEGORICAL        METRIC (ORDERED METRIC or HIGHER)
NOMINAL                * Chi-Square               * Discriminant Analysis
                       * Fisher’s Exact Prob.     * Logit Regression
METRIC                 * T-Test                   * Correlation Analysis
                       * Analysis of Variance     * Regression Analysis
(An Example?)
Remember: Level of measurement determines choice of statistical method.
5 ONE-WAY ANOVA
$H_0$ in ANOVA?
$H_0$: There are no differences among the mean values of the groups being compared (i.e., the group means are all equal): $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_a$ (conclusion if $H_0$ is rejected)? Not all group means are equal (i.e., at least one group mean is different from the rest).
6 ONE-WAY ANOVA
The number of steps involved in ANOVA depends on whether we are comparing 2 groups or more than 2 groups:
Scenario 1. When comparing 2 groups, it is a one-step test:
  2 Groups: A B
  Step 1: Check to see whether the two groups are different and, if so, how.
Scenario 2. When comparing 3 or more groups, if $H_0$ is rejected, it is a two-step test:
  3 or more Groups: A B C
  Step 1: Overall test that examines whether all groups are equal.
  And, if not all are equal ($H_0$ rejected), then:
  Step 2: Pair-wise (post-hoc) comparison tests to see where (i.e., among which groups) the differences exist, and how.
7 Test Statistic
The typical solution presented in statistics classes requires constructing an ANOVA TABLE.
Let’s see the intuitive logic…
8 ONE-WAY ANOVA EXAMPLE: Whether or not average earnings per share (EPS) for commercial banks, retailing operations, and utility companies (variable Industry) was the same last year.
Sample Data: A random sample of 9 banks, 10 retailers, and 10 utilities.
Table 1. Earnings Per Share (EPS, in $) of Sample Firms in the Three Industries
          Banking     Retailing    Utility
           6.42         3.52        3.55
           2.83         4.21        2.13
           8.94         4.36        3.24
           6.80         2.67        6.47
           5.70         3.49        3.06
           4.65         4.68        1.80
           6.20         3.30        5.29
           2.71         2.68        2.96
           8.34         7.25        2.90
           -----        0.16        1.73
Size      n_B = 9     n_R = 10    n_U = 10     n = 29
Mean       5.84         3.63        3.31       grand mean = 4.21
$H_0$: There were no differences in average EPS of Banks, Utilities, and Retailers.
First logical thing you do? Compute the means: $\bar{x}_B = 5.84$, $\bar{x}_R = 3.63$, $\bar{x}_U = 3.31$, and the grand mean $\bar{\bar{X}} = 4.21$.
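The group means and the grand mean quoted on this slide can be reproduced directly from Table 1. What follows is a minimal Python sketch using only the standard library; the dictionary name `eps` and the other variable names are illustrative choices, not part of the original slides.

```python
from statistics import mean

# EPS values transcribed from Table 1 (one list per industry).
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Mean EPS of each industry group.
group_means = {industry: mean(values) for industry, values in eps.items()}

# Grand mean over all 29 firms, ignoring group membership.
all_values = [x for values in eps.values() for x in values]
grand_mean = mean(all_values)

for industry, m in group_means.items():
    print(f"{industry:<10} n = {len(eps[industry]):2d}  mean = {m:.2f}")
print(f"{'All firms':<10} n = {len(all_values):2d}  mean = {grand_mean:.2f}")
```

The printed means should agree with the values on the slide (5.84, 3.63, 3.31, and 4.21).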
9 ONE-WAY ANOVA
Why is it called ANOVA?
Differences in EPS (dep. var.) among all 29 firms have two components: differences among the groups and differences within the groups. That is,
a. There are some differences in EPS among the three groups of firms (Banks vs. Retailers vs. Utilities), and
b. There are also some differences/variations in EPS of the firms within each of these groups (among banks themselves, among retailers themselves, and among utilities themselves).
ANOVA partitions/analyzes the variance of the dependent variable (i.e., the differences in EPS) and traces it to its two components/sources, i.e., to differences between groups vs. differences within groups.
WHY?
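In symbols, the partition described on this slide is the standard sum-of-squares identity for one-way ANOVA. The notation is an assumption added here for illustration: $x_{ij}$ is the EPS of firm $i$ in group $j$, $\bar{x}_j$ is the mean of group $j$, $\bar{\bar{X}}$ is the grand mean, $n_j$ is the size of group $j$, and $k$ is the number of groups. The labels SST, SSB, and SSW are common textbook abbreviations, not terms used in the slides.

```latex
\[
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{\bar{X}}\right)^2}_{\text{total (SST)}}
=
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{x}_j-\bar{\bar{X}}\right)^2}_{\text{between groups (SSB)}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2}_{\text{within groups (SSW)}}
\]
```

For the EPS data in Table 1, these three quantities work out to roughly 122.51 = 35.40 + 87.11.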
10 ONE-WAY ANOVA
The underlying intuitive logic in ANOVA:
If the groups being compared come from the same population (i.e., if the groups are alike/equal):
- They should exhibit similar differences (have equal variability).
- Hence, the differences among these groups should be no more than the differences within them (i.e., among members within the same groups).
That is, groups that are alike/similar are expected to have about as much variability between them as they have within them.
11 ONE-WAY ANOVA
On the other hand…
If the groups being compared are divergent/dissimilar/unequal:
- They would exhibit more difference between them than they show within them (i.e., among members within the same groups).
- That is, they will have greater similarity/commonality internally than they have externally (with members of the other groups).
12 ONE-WAY ANOVA
CRITERION USED BY ANOVA: Groups can be considered different if there exist…?
…if there exist larger differences among these groups than there are among members within them.
QUESTION: Given the above, what would one have to do to conduct ANOVA? That is, what do you have to do to judge whether or not two or more groups can be considered different/equal (with respect to a given characteristic)?
a. Compute the differences that exist among these groups, and
b. Compare them with the differences that exist within these groups.
And that is exactly what ANOVA does…
QUESTION: How do we usually measure differences?
13 ONE-WAY ANOVA
QUESTION: How do we usually measure differences/variations?
VARIANCE: A useful index of differences/variations/dispersion among a set of values/scores.
- Estimate of the average (i.e., per observation) squared difference from the mean
Computation?
$S^2 = \dfrac{\text{Sum of squared deviations from the mean}}{\text{Sample Size} - 1} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
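To make the variance formula concrete, here is a small Python sketch that applies it to the nine banking EPS values from Table 1 and cross-checks the result against the standard library's `statistics.variance`. The variable names are illustrative assumptions.

```python
from statistics import mean, variance

# Banking EPS values from Table 1.
banking = [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34]

# Numerator: sum of squared deviations from the sample mean.
x_bar = mean(banking)
sum_sq_dev = sum((x - x_bar) ** 2 for x in banking)

# Denominator: sample size minus 1.
s2 = sum_sq_dev / (len(banking) - 1)

print(f"mean = {x_bar:.2f}, sum of squared deviations = {sum_sq_dev:.2f}")
print(f"sample variance S^2 = {s2:.2f}")
print(f"statistics.variance  = {variance(banking):.2f}")
```

Both computations should give the same value, since `statistics.variance` also divides by the sample size minus 1.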
14 ONE-WAY ANOVA So, steps in performing ANOVA: a.Compute the BETWEEN-GROUP VARIANCE for the characteristic under study (i.e., the dependent variable), b.Compute the WITHIN-GROUP VARIANCE for the same characteristic/variable, and then c.COMPARE the two (i.e., check to see if Between Group var. > Within Group Var.) NOTE: In ANOVA the term “MEAN SQUARE,” rather than variance, is utilized.
15 ONE-WAY ANOVA
Table 1 again (EPS of sample firms in the three industries): $n_B = 9$, $n_R = 10$, $n_U = 10$, $n = 29$; $\bar{x}_B = 5.84$, $\bar{x}_R = 3.63$, $\bar{x}_U = 3.31$, grand mean $\bar{\bar{X}} = 4.21$.
Total WITHIN Group Variance (or Mean Square WITHIN)?
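16 ONE-WAY ANOVA
Mean Square WITHIN Groups (MSW):
Let’s see what we just did: the squared deviations of each firm’s EPS from its own group mean are summed over all three groups and divided by $(n_B - 1) + (n_R - 1) + (n_U - 1) = 8 + 9 + 9 = 26$, called the “Degrees of Freedom”.
The generic mathematical formula for MSW:
$MSW = \dfrac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2}{\sum_{j=1}^{k}(n_j - 1)}$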
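The MSW computation can be sketched in a few lines of Python. This is an illustrative sketch using the Table 1 data; the names `eps`, `ssw`, `df_within`, and `msw` are assumptions introduced here.

```python
from statistics import mean

# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Sum of squared deviations of each firm from its OWN group mean.
ssw = 0.0
for values in eps.values():
    group_mean = mean(values)
    ssw += sum((x - group_mean) ** 2 for x in values)

# Degrees of freedom: (n_B - 1) + (n_R - 1) + (n_U - 1).
df_within = sum(len(values) - 1 for values in eps.values())

msw = ssw / df_within
print(f"SSW = {ssw:.2f}, df = {df_within}, MSW = {msw:.2f}")
```

The resulting MSW should match the 3.35 reported later in the slides.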
17 ONE-WAY ANOVA
Table 1 again (EPS of sample firms in the three industries): $n_B = 9$, $n_R = 10$, $n_U = 10$, $n = 29$; $\bar{x}_B = 5.84$, $\bar{x}_R = 3.63$, $\bar{x}_U = 3.31$, grand mean $\bar{\bar{X}} = 4.21$.
Let’s now compute the BETWEEN Group Variance (Mean Square BETWEEN, MSB).
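18 ONE-WAY ANOVA
Mean Square BETWEEN Groups (MSB):
Let’s see what we just did: the squared deviations of the group means from the grand mean, weighted by the respective group sizes, are summed and divided by $k - 1 = 2$, called the “Degrees of Freedom”.
Mathematical formula for MSB:
$MSB = \dfrac{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{X}})^2}{k - 1}$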
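The same kind of sketch works for MSB. The code below weights each squared deviation of a group mean from the grand mean by the group size, as described on this slide; variable names are illustrative assumptions.

```python
from statistics import mean

# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

grand_mean = mean(x for values in eps.values() for x in values)

# Squared deviation of each group mean from the grand mean,
# weighted by that group's size.
ssb = sum(len(values) * (mean(values) - grand_mean) ** 2 for values in eps.values())

# Degrees of freedom: number of groups minus 1.
df_between = len(eps) - 1

msb = ssb / df_between
print(f"SSB = {ssb:.3f}, df = {df_between}, MSB = {msb:.3f}")
```

The resulting MSB should match the 17.698 reported on the next slide.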
19 ONE-WAY ANOVA Mean Square Between Groups = MSB = 17.698 MSB represents the portion of the total differences/variations in EPS (the dependent variable) that is attributable to (or explained by) differences BETWEEN groups (e.g., industries) That is, the part of differences in companies’ EPS that result from whether they are banks, retailers, or utilities.
20 ONE-WAY ANOVA Mean Square Within Groups (MS Residual/Error) = MSW = 3.35 MSW represents: a.The differences in EPS (the dependent variable) that are due to all other factors that are not examined and not controlled for in the study (e.g., diversification level, firm size, etc.) Plus... b.The natural variability of EPS (the dependent variable) among members within each of the comparison groups (Note that even banks with the same size and same level of diversification would have different EPS levels).
21 ONE-WAY ANOVA
Now, let’s compare MSB & MSW: MSB = 17.698 and MSW = 3.35.
QUESTION: Based on the logic of ANOVA, when would we consider two (or more) groups as different/unequal?
When MSB is significantly larger than MSW.
QUESTION: What would be a reasonable index (a single number) that will show how large MSB is compared to MSW (i.e., a single number that will show if MSB is larger than, equal to, or smaller than MSW)?
22 Compare BETWEEN and WITHIN Group Variances/Mean Squares: Compute the F-Ratio
Ratio of MSB and MSW (call it the F-Ratio): $F = \dfrac{MSB}{MSW} = \dfrac{17.698}{3.35} = 5.28$
What can we infer when the F-ratio is close to 1?
- MSB and MSW are likely to be equal and, thus, there is a strong likelihood that NO difference exists among the comparison groups.
How about when the F-ratio is significantly larger than 1?
- The more the F-ratio exceeds 1, the larger MSB is compared to MSW and, thus, the stronger the likelihood/evidence that group difference(s) exist.
Results of the above computations are usually summarized in an ANOVA TABLE such as the one that follows:
23 ANOVA TABLE
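The table itself did not survive in this transcript. As a hedged reconstruction, the sketch below recomputes the usual one-way ANOVA table (source, sum of squares, degrees of freedom, mean square, F) from the Table 1 data. The column layout is the conventional textbook one and is an assumption; it is not copied from the original slide.

```python
from statistics import mean

# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

all_values = [x for values in eps.values() for x in values]
grand_mean = mean(all_values)
k, n = len(eps), len(all_values)

# Sums of squares: between groups, within groups, and total.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in eps.values())
ssw = sum(sum((x - mean(v)) ** 2 for x in v) for v in eps.values())
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Degrees of freedom, mean squares, and the F-ratio.
df_between, df_within = k - 1, n - k
msb, msw = ssb / df_between, ssw / df_within
f_ratio = msb / msw

print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ssb:>10.3f}{df_between:>5}{msb:>10.3f}{f_ratio:>8.2f}")
print(f"{'Within':<10}{ssw:>10.3f}{df_within:>5}{msw:>10.3f}")
print(f"{'Total':<10}{sst:>10.3f}{n - 1:>5}")
```

The F value printed here should agree with the F = 5.28 discussed on the following slides, and the Between and Within sums of squares should add up to the Total.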
24 ONE-WAY ANOVA
Interpretation and Conclusion:
QUESTION: What does the F = 5.28 mean, intuitively?
- For our sample companies, the EPS difference across the three industries (MSB) is more than 5 times the EPS difference among firms within the industries (MSW).
QUESTION: What is our null hypothesis?
QUESTION: Is the above F-ratio of 5.28 large enough to warrant rejecting the null?
- ANSWER: It would be if the chance of being wrong (in rejecting the null) does not exceed 5%.
- So, look up the F-value in the table of the F-distribution (under the appropriate degrees of freedom) to find out what the $\alpha$-level will be if, given this F-value, we decide to reject the null.
Degrees of Freedom: $v_1 = k - 1 = 2$, $v_2 = n - k = 26$
25 F = 3.37 is significant at $\alpha$ = 0.05 (i.e., if F = 3.37 and we reject $H_0$, we face a 5% chance of being wrong).
26 F = 4.27 is significant at $\alpha$ = 0.025. That is, if F = 4.27 and we reject $H_0$, we would face a 2.5% chance of being wrong.
But our F = 5.28 > 4.27.
- So, what can we say about our $\alpha$-level? Will it be larger or smaller than 0.025?
27 ONE-WAY ANOVA
Our F = 5.28 > 4.27
The odds of being wrong, if we decide to reject the null, would be less than 2.5% (i.e., $\alpha$ < 0.025).
Would rejecting the null be a safe bet?
Conclusion? Reject the null and conclude that the average EPS is NOT EQUAL FOR ALL GROUPS (industries) being compared.
Is the analysis complete?
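The F-table look-ups on the preceding slides can be cross-checked without a table in this particular example. When the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has a simple closed form, $P(F > f) = (1 + 2f/v_2)^{-v_2/2}$. The sketch below relies on that special case only; it is not a general F-distribution routine, and the function name is a hypothetical one chosen for illustration.

```python
def f_upper_tail_df1_is_2(f_value: float, df2: int) -> float:
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


df2 = 26  # v2 = n - k = 29 - 3

# The two table values quoted in the slides, plus our own F-ratio.
for f_value in (3.37, 4.27, 5.28):
    tail = f_upper_tail_df1_is_2(f_value, df2)
    print(f"F = {f_value:.2f}  ->  P(F > {f_value:.2f}) = {tail:.4f}")

p_value = f_upper_tail_df1_is_2(5.28, df2)
```

The first two probabilities should come out close to 0.05 and 0.025, and the probability for F = 5.28 should fall below 0.025, in line with the conclusion on this slide. For other numerator degrees of freedom, a statistics package or an F table is needed.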
28 ONE-WAY ANOVA
Is our analysis complete?
- It would be if we were comparing only two groups; simply examine which sample mean is larger than which and report!
HOWEVER, …
- If the null is rejected and more than two groups are being compared:
REMAINING QUESTION: Where exactly (i.e., between which groups) do the differences lie? And, which group(s) of firms exhibit relatively higher, lower, or equal EPS levels?
ANSWER: Perform post hoc, multiple comparison tests.
- SPSS (and other software packages) offers a variety of options (e.g., LSD, Bonferroni, Tukey, etc.) to choose from.
Let’s now review the steps involved…
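As a rough illustration of the post hoc step, the sketch below runs the simplest of the options named on this slide, LSD (least significant difference) pairwise comparisons, on the Table 1 data. Two assumptions are made loudly here: the critical value 2.056 is the two-sided 5% point of the t distribution with 26 degrees of freedom taken from a standard t table, and no adjustment for multiple comparisons is applied, so Tukey or Bonferroni results from SPSS may be more conservative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# EPS values transcribed from Table 1.
eps = {
    "Banking":   [6.42, 2.83, 8.94, 6.80, 5.70, 4.65, 6.20, 2.71, 8.34],
    "Retailing": [3.52, 4.21, 4.36, 2.67, 3.49, 4.68, 3.30, 2.68, 7.25, 0.16],
    "Utility":   [3.55, 2.13, 3.24, 6.47, 3.06, 1.80, 5.29, 2.96, 2.90, 1.73],
}

# Pooled within-group variance (MSW) and its degrees of freedom.
ssw = sum(sum((x - mean(v)) ** 2 for x in v) for v in eps.values())
df_within = sum(len(v) - 1 for v in eps.values())
msw = ssw / df_within

# ASSUMPTION: two-sided 5% critical t value for 26 df, from a t table.
T_CRIT = 2.056

results = {}
for (g1, v1), (g2, v2) in combinations(eps.items(), 2):
    diff = mean(v1) - mean(v2)
    # Standard error of the difference, based on the pooled MSW.
    se = sqrt(msw * (1 / len(v1) + 1 / len(v2)))
    t_stat = diff / se
    results[(g1, g2)] = (diff, t_stat, abs(t_stat) > T_CRIT)
    verdict = "different" if abs(t_stat) > T_CRIT else "not different"
    print(f"{g1} vs {g2}: diff = {diff:.2f}, t = {t_stat:.2f} -> {verdict}")
```

Under these assumptions, the banking group differs from each of the other two groups, while retailing and utilities do not differ from each other.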
29 ONE-WAY ANOVA
Overall Ho: All Group Means Are Equal
H1: Not All Groups Are Equal
Is overall F significant? (i.e., $\alpha$ < 0.05)
- No ($\alpha$ > .05): Don’t reject Ho; no group diff. found; stop.
- Yes ($\alpha$ < .05): Reject Ho; not all group means are equal (i.e., at least 2 groups are diff.). How many groups are being compared?
  - If only 2: Examine the group means. Report which group has higher/lower mean. Stop.
  - If more than 2: Conduct post-hoc pairwise comparison tests to see where the differences lie. Examine the results. Examine the group means. Report which groups have higher/lower means. Stop.
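The decision flow on this slide can also be written as a small function. This is only a sketch of the flowchart's logic; the function name, argument names, and returned wording are hypothetical, and a significance level exactly equal to the cutoff is treated here as "not significant", which the slide does not specify.

```python
def one_way_anova_next_step(p_value: float, n_groups: int, alpha: float = 0.05) -> str:
    """Return the next step of the one-way ANOVA flow for a given overall F test."""
    if p_value >= alpha:
        # Overall F not significant.
        return "Do not reject Ho: no group difference found; stop."
    if n_groups == 2:
        # Only two groups: the group means themselves tell the story.
        return "Reject Ho: examine the two group means and report which is higher."
    # More than two groups: locate the differences with post hoc tests.
    return ("Reject Ho: run post-hoc pairwise comparisons, then report "
            "which groups have higher/lower means.")


# EPS example: three industries, overall significance level below 0.025.
print(one_way_anova_next_step(p_value=0.012, n_groups=3))
```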
30 ANOVA in SPSS
Let’s now use SPSS to perform the same analysis.
NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT One-Way ANOVA” PDF file with them to class.
Files: SPSS OUTPUT One-Way ANOVA; ONE_WAY_EPS_SPSS_FILE
31 TWO-WAY ANOVA (with Interaction)
In our EPS example, suppose you suspect that a company’s size category (small vs. large) may also have a significant effect on EPS. Since you did not attempt to control for company size when selecting your sample firms, small and large companies may not have been equally represented in the three industry groups (e.g., what if, compared to the banks in the sample, all or a much greater % of the retailers and utilities were small?). You are therefore concerned that the potential confounding effect of company size may have distorted your earlier results.
So, you now wish to examine possible EPS differences among the 3 industries while controlling for the possible confounding effect of company size (i.e., holding size constant/equal for the firms in our three industries).
In other words, you wish to know if there are any differences among the average EPS of banks, retailers, and utilities of equal size.
32 TWO-WAY ANOVA (with Interaction)
So, Two-Way ANOVA will help us learn if banks in general, even after controlling for company size, would, on average, have higher EPS than retailers and utilities.
But an additional advantage of Two-Way ANOVA is that it can also show us whether particular groups of firms (i.e., CERTAIN COMBINATIONS of industry and size category) are more/less conducive to high EPS than other combinations of the two characteristics.
As just one example, it can show us if only the larger banks (and not all banks in general) have significantly higher EPS compared to firms in the other two industries (or compared to only the smaller firms in the other two industries).
33 ANOVA Using SPSS
TWO-WAY ANOVA (with Main & Interaction Effects):
- Analyze: General Linear Model
- Univariate: Y to “Dependent” box, categorical X1 & X2 to the “Fixed Factors” box
- Model: Full, Continue
- Plots: X1 to “Horizontal”, X2 to “Separate Lines”, Add, Continue
- Post Hoc: Move factors (IVs) with >2 groups to “Post Hoc Tests” box, select “Tukey” or “Bonferroni”, Continue
- Options: Move Overall, X1, X2, and X1*X2 to “Display Means” box, check “Descriptive Stats.”, Continue
- OK
NOTE: Students are supposed to have printed and brought the “SPSS OUTPUT Two-Way ANOVA with Interaction” PDF file with them to class.
Files: SPSS OUTPUT Two-Way ANOVA with Interaction; TWO_WAY_EPS_SPSS_FILE
34 TWO-WAY ANOVA (Main & Interaction Effects Model)
Ho: There are no differences among the groups represented by either variable
Is overall F significant? (i.e., $\alpha$ < 0.05)
- No: Don’t reject Ho; no group diff. found; STOP.
- Yes: Reject Ho; some differences exist among the groups represented by at least one of the var. Then:
  a. Examine which main effect, if any, is significant (i.e., differences exist across categories of which independent variable).
  b. Is the significant indep. var. dichotomous (i.e., represents only 2 groups)?
     - Yes, only 2 groups: Examine the group means for that variable; report which group has higher/lower mean.
     - No, more than 2 groups: Conduct post-hoc pairwise comparison tests for that var. to see where the differences lie. Examine the results. Examine the group means for that variable; report which groups have higher/lower means.
  Determine if the interaction effect is significant:
     - NO: STOP.
     - YES: Examine plot of interaction effect for results. STOP.
35 ANOVA CAUTION: Don’t get carried away with the number of factors (independent categorical variables); DON’T DO N-WAY ANOVA !!!
36 ANOVA Using SPSS
ANOTHER EXAMPLE: Using the gss.sav data file, we wish to find out if the age at which one gets married (agewed) is a function of one’s gender (sex) and highest educational degree (degree). That is, whether average marriage age differs between the two genders and among the various educational groups. If so, in what way?
NOTE: Here, we are considering/treating educational degree as a nominal/categorical variable, and NOT as an ordered metric variable.
37 ASSIGNMENT 4
1. Suppose, as a social scientist, you are interested in studying gender differences in preference for different types of music. Specifically, you wish to know if there are differences between men and women relative to how much they like classical music (variable classical). The gss.sav data file (on your SPSS Data Disk) includes data regarding such issues. This data set represents 1500 randomly selected cases from the 1993 General Social Survey. Use the data from this SPSS file to address the above questions.
NOTE: If you check the value labels for the variables classical, opera, and country in the gss.sav file, you will see that they were measured on 5-point scales (1=Like Very Much, 5=Dislike Very Much) and, thus, can be considered metric.
38 ASSIGNMENT 4
2. As a staff researcher in the HR Department of a major company, you are interested in learning if there are differences between male and female employees, and among employees who have different levels of education, regarding the level of importance that they attach to having a fulfilling job. Data regarding such issues have been obtained through the General Social Research Survey using a representative sample of approximately 1500 working men and women in the U.S. You have access to the resulting data (see gss.sav SPSS data file, variables sex, impjob, and degree). Use this data set to address the above issues.
39 IMPORTANT NOTES FOR QUESTIONS 2, 3, AND 4:
- Treat variable “degree” as a categorical/nominal variable.
- When interpreting the results, please pay attention to the fact that, if you check the value labels for the dependent variables, you will notice that they were measured on 5-point scales (1=One of Most Important, 5=Not at All Important).
- If you find it necessary to conduct post-hoc multiple comparison tests, use the Tukey option.
- IMPORTANT: If the alpha level for a given test is just slightly higher than 0.05 (e.g., 0.054), consider that difference statistically significant.
REMINDERS:
- For each analysis, include the Notes part of the SPSS output in the printout. Also edit the first page of every output to include your name.
- Make sure that you state your complete interpretations and explanations on the appropriate pages of the output. Be specific as to how you have used what parts of the output to reach your conclusions.
- Make sure that your explanations are complete. For example, it is not enough to say that there is a difference between groups A and B regarding characteristic C. You have to go on to indicate how the two groups are different on characteristic C (e.g., “on average, group A exhibits more/less of the characteristic C”).