
1 Sociology 5811: Lecture 13: ANOVA Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

2 Announcements Midterm in one week; bring a calculator and a pencil/eraser. Mid-term review sheet handed out today. New topic today: ANOVA.

3 5811 Midterm Exam Exam topics: all class material and readings up through ANOVA. Emphasis: conceptual understanding and interpretation. Memorization of complex formulas is not required; I will provide a “formula sheet”… but the formulas won’t be labeled! Exam format: a mix of short-answer and longer questions, and a mix of math problems and conceptual questions.

4 Review: Mean Difference Tests Since each sample mean falls within a certain range (its sampling distribution), the difference between any two means also falls within a certain range. Example: Group 1 means range from 6.0 to 8.0 and Group 2 means range from 1.0 to 2.0, so the difference in means will range from 4.0 to 7.0. If it is improbable that the sampling distribution of the difference overlaps with zero, then the population means probably differ. A corollary of the C.L.T. provides formulas to estimate the standard error of the difference in means.

5 Z-tests and T-tests If N is large, we can do a Z-test: Z = (Y-bar 1 − Y-bar 2) / (standard error of the difference in means). This Z-score for the difference in means indicates how far the difference falls from zero, measured in “standard errors.” –If Z is large, we typically reject the null hypothesis… the group means probably differ.
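A minimal sketch of this large-sample z-test in Python; the means, variances, and sample sizes below are hypothetical values chosen for illustration, not figures from the lecture.

```python
import math
from scipy import stats

# Hypothetical summary statistics for two large groups (illustration only)
mean1, var1, n1 = 7.0, 1.2, 400
mean2, var2, n2 = 6.7, 0.9, 400

# Standard error of the difference in means (large-sample formula)
se_diff = math.sqrt(var1 / n1 + var2 / n2)

# Z-score: distance of the observed difference from zero, in "standard errors"
z = (mean1 - mean2) / se_diff
p_two_tailed = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}")
```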

6 Z-tests and T-tests If N is small, but the samples are normal with equal variance, we can do a t-test. Small N requires a different formula (based on the pooled variance) to determine the standard error of the difference in means. Again: a large t means reject the null hypothesis.
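A comparable sketch for the small-sample case, again with made-up data; scipy's pooled-variance t-test (equal_var=True) matches the equal-variance assumption stated above.

```python
import numpy as np
from scipy import stats

# Two small, hypothetical samples (illustration only)
g1 = np.array([6.1, 7.4, 6.8, 7.9, 6.5])
g2 = np.array([1.2, 1.9, 1.4, 1.7, 1.1])

# Pooled-variance t-test, matching the equal-variance assumption on the slide
t_stat, p_value = stats.ttest_ind(g1, g2, equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```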

7 T-Test for Mean Difference Question: What if you wanted to compare 3 or more groups, instead of just two? Example: test scores for students in different educational tracks: honors, regular, remedial. Can you use T-tests for 3+ groups? Answer: Sort of… You can do a T-test for every combination of groups, e.g., honors & regular, honors & remedial, regular & remedial. But the possibility of a Type I error proliferates: each test carries a 5% risk, and with 5 groups there are 10 pairwise tests, so the overall chance of an error approaches 50%.
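A quick check of the arithmetic behind that claim, assuming a 5% Type I risk per test: five groups require ten pairwise t-tests, and simply adding 5% per test gives the 50% figure; treating the tests as independent gives roughly 40%.

```python
from math import comb

alpha = 0.05
n_groups = 5
n_tests = comb(n_groups, 2)                  # 10 pairwise t-tests among 5 groups

additive_bound = n_tests * alpha             # adding 5% per test -> 0.50
if_independent = 1 - (1 - alpha) ** n_tests  # if tests were independent -> ~0.40
print(n_tests, additive_bound, round(if_independent, 2))
```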

8 ANOVA ANOVA = “ANalysis Of VAriance.” “Oneway ANOVA” is the simplest form. ANOVA lets us test whether any group mean differs from the mean of all groups combined. It answers: “Are all groups equal or not?” H0: all groups have the same population mean (μ 1 = μ 2 = μ 3 = μ 4). H1: one or more groups differ. But it doesn’t distinguish which specific group(s) differ; maybe only μ 2 differs, or maybe all differ.

9 ANOVA and T-Tests ANOVA and the T-test are similar: many sociological research problems can be addressed by either of them, but they rely on very different mathematical approaches. If you want to compare two groups, both work. If there are many groups, people usually use ANOVA. Also, there are more advanced forms of ANOVA that are very useful.

10 ANOVA: Example Suppose you suspect that a firm is engaging in wage discrimination based on ethnicity Certain groups might be getting paid more or less… The company counters: “We pay entry-level workers all about the same amount of money. No group gets preferential treatment.” Given data on a sample of employees, ANOVA lets you test this hypothesis. Are observed group differences just due to chance? Or do they reflect differences in the underlying population? (i.e., the whole company)

11 ANOVA: Example The company has workers of three ethnic groups: Whites, African-Americans, Asian-Americans. You observe: Y-bar White = $8.78/hour, Y-bar AfAm = $8.52/hour, Y-bar AsianAm = $8.91/hour. Even if all groups had the same population mean (μ White = μ AfAm = μ AsianAm), samples differ randomly. Question: Are the observed differences so large that it is unlikely they are due to random error, and thus unlikely that μ White = μ AfAm = μ AsianAm?
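A sketch of how this test could be run in Python. The per-worker wages below are invented for illustration (the lecture reports only group means); each tiny sample is chosen so its mean reproduces the group mean above.

```python
import numpy as np
from scipy import stats

# Invented per-worker wages (illustration only); each small sample's mean
# matches the group mean reported on the slide
white    = np.array([8.70, 8.85, 8.80, 8.75, 8.80])   # mean 8.78
af_am    = np.array([8.45, 8.55, 8.50, 8.60, 8.50])   # mean 8.52
asian_am = np.array([8.95, 8.85, 8.90, 8.95, 8.90])   # mean 8.91

# One-way ANOVA: H0 is that all three groups share one population mean
f_stat, p_value = stats.f_oneway(white, af_am, asian_am)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```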

12 ANOVA: Concepts & Definitions The grand mean is the mean of all cases, across all groups (ex: the mean for all entry-level workers = $8.75/hour). The group mean is the mean of a particular sub-group of the population. As usual, we hope to make inferences about population grand and group means, even though we only have samples and observed grand and group means. We know Y-bar, Y-bar White, Y-bar AfAm, Y-bar AsianAm. We want to infer about: μ White, μ AfAm, μ AsianAm.

13 ANOVA: Concepts & Definitions Hourly wage is the dependent variable: we are looking to see if wage “depends” upon the particular group a person is in. The effect of a group is the difference between that group’s mean and the grand mean. The effect is denoted by alpha (α). If the grand mean μ = $8.75 and μ White = $8.90, then α White = $0.15. The effect of being in group j is: α j = μ j − μ. It is calculated for samples as: α-hat j = Y-bar j − Y-bar. It is like a deviation, but for a group.
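A small sketch of the sample calculation, reusing the hypothetical wage arrays from the previous sketch: each estimated effect is just the group mean minus the grand mean.

```python
import numpy as np

# Hypothetical wage samples from the earlier sketch (illustration only)
groups = {
    "White":   np.array([8.70, 8.85, 8.80, 8.75, 8.80]),
    "AfAm":    np.array([8.45, 8.55, 8.50, 8.60, 8.50]),
    "AsianAm": np.array([8.95, 8.85, 8.90, 8.95, 8.90]),
}

grand_mean = np.concatenate(list(groups.values())).mean()  # mean of all cases pooled
for name, y in groups.items():
    effect = y.mean() - grand_mean   # estimated effect: group mean minus grand mean
    print(f"{name}: group mean = {y.mean():.2f}, effect = {effect:+.2f}")
```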

14 ANOVA: Concepts & Definitions ANOVA is based on partitioning deviation. We initially calculated deviation as the distance of a point from the grand mean: Y i − Y-bar. But you can also think of deviation from a group mean (called “e”): e = Y i − Y-bar group. Or, for any case i in group j: e ij = Y ij − Y-bar j. Thus, the deviation (from the group mean) of the 27th person in group 4 is: e 27,4 = Y 27,4 − Y-bar 4.

15 ANOVA: Concepts & Definitions The location of any case is determined by: the grand mean, μ, common to all cases; the group “effect,” α, common to group members (the distance between a group and the grand mean); and the within-group deviation (e), called “error” (the distance from the group mean to a case’s value).

16 The ANOVA Model This is the basis for a formal model. For any population with mean μ, comprised of J subgroups with N j cases in each group, each with a group effect α j, the location of any individual can be expressed as follows: Y ij = μ + α j + e ij. Y ij refers to the value of case i in group j; e ij refers to the “error” (i.e., deviation from the group mean) for case i in group j.
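A sketch that generates data directly from this model; the grand mean, group effects, and error spread below are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

mu = 8.75                                                  # hypothetical grand mean
effects = {"White": 0.05, "AfAm": -0.20, "AsianAm": 0.15}  # hypothetical group effects
n_per_group = 5

# Y_ij = mu + alpha_j + e_ij: each case is grand mean + group effect + random "error"
data = {j: mu + a_j + rng.normal(0.0, 0.25, n_per_group) for j, a_j in effects.items()}
for j, y in data.items():
    print(j, np.round(y, 2))
```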

17 Sum of Squared Deviation We are most interested in two parts of the model: the group effects, α j (the deviation of the group from the grand mean), and the individual case error, e ij (the deviation of the individual from the group mean). Each is a deviation that can be “summed up.” Remember, we square deviations when summing; otherwise, they add up to zero. Remember that variance is just squared deviation, averaged.
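A two-line check of the claim that unsquared deviations cancel out, using the six test scores that appear on a later slide.

```python
import numpy as np

scores = np.array([57, 64, 48, 53, 87, 73], dtype=float)  # test scores from a later slide
deviations = scores - scores.mean()

print(deviations.sum())          # ~0: plain deviations cancel out
print((deviations ** 2).sum())   # squared deviations sum to something informative
```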

18 Sum of Squared Deviation The total deviation can be partitioned into α j and e ij components. That is, α j + e ij = total deviation: (Y ij − Y-bar) = (Y-bar j − Y-bar) + (Y ij − Y-bar j), where the first term on the right is the group effect and the second is the within-group error.

19 Sum of Squared Deviation The total deviation can be partitioned into α j and e ij components. The total variation (SS total) is made up of: – α j: between-group variation (SS between) – e ij: within-group variation (SS within) – SS total = SS between + SS within

20 Sum of Squared Deviation Given a sample with J sub-groups, the formula for the sum of squared deviations can be re-written as follows: SS total = Σ j Σ i (Y ij − Y-bar)². This is called the “Total Sum of Squares” (SS total).

21 Sum of Squared Deviation The between-group (α) variation is the squared distance from the grand mean to each group mean, summed over all cases: SS between = Σ j Σ i (Y-bar j − Y-bar)², equivalently Σ j N j (Y-bar j − Y-bar)². The within-group (e) variation is the squared distance from each case to its group mean, summed: SS within = Σ j Σ i (Y ij − Y-bar j)².

22 Sum of Squared Variance The sum of squares grows as N gets larger. To derive a more comparable measure, we “average” it, just as with the variance (i.e., divide by N−1). It is desirable, for similar reasons, to “average” the between and within Sums of Squares. The result is the “Mean Square” variance: –MS between and MS within

23 Sum of Squared Variance Choosing the relevant denominators, we get: MS between = SS between / (J − 1) and MS within = SS within / (N − J).
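A sketch that computes these quantities by hand on the hypothetical wage arrays from the earlier sketches, verifying that SS total = SS between + SS within before dividing by J − 1 and N − J.

```python
import numpy as np

# Hypothetical wage arrays from the earlier sketch (illustration only)
groups = [np.array([8.70, 8.85, 8.80, 8.75, 8.80]),
          np.array([8.45, 8.55, 8.50, 8.60, 8.50]),
          np.array([8.95, 8.85, 8.90, 8.95, 8.90])]
all_y = np.concatenate(groups)
grand_mean, J, N = all_y.mean(), len(groups), len(all_y)

ss_total   = ((all_y - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)
print(np.isclose(ss_total, ss_between + ss_within))   # the partition holds

ms_between = ss_between / (J - 1)   # "averaged" between-group sum of squares
ms_within  = ss_within / (N - J)    # "averaged" within-group sum of squares
print(round(ms_between, 4), round(ms_within, 4))
```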

24 Mean Squares and Group Differences Question: Which suggests that group means are quite different? –MS between > MS within or MS between < MS within

25 Mean Squares and Group Differences [Figures contrasting the two cases: MS between > MS within (group distributions clearly separated) vs. MS between < MS within (group distributions largely overlapping)]

26 Mean Squares and Group Differences Question: Which suggests that group means are quite different: –MS between > MS within or MS between < MS within? Answer: If the between-group variance is greater than the within-group variance, the groups are quite distinct; it is unlikely that they came from a population with the same mean. But if within is greater than between, the groups aren’t very different – they overlap a lot, and it is plausible that μ 1 = μ 2 = μ 3 = μ 4.

27 The F Ratio The ratio of MS between to MS within is referred to as the F ratio: F = MS between / MS within. If MS between > MS within, then F > 1; if MS between < MS within, then F < 1. A higher F indicates that the groups are more separate.

28 The F Ratio The F ratio has a sampling distribution; that is, estimates of F vary depending on exactly which sample you draw. Again, this sampling distribution has known properties that can be looked up in a table: the “F-distribution.” –Different from z & t! Statisticians have determined how much area falls under the curve for a given value of F… so we can test hypotheses.

29 The F Ratio Assumptions required for hypothesis testing using an F-statistic: 1. The J groups are drawn from a normally distributed population. 2. The population variances of the groups are equal. If these assumptions hold, the F statistic can be looked up in an F-distribution table, much like t distributions. –But there are 2 degrees of freedom: J−1 and N−J (one based on the number of groups, one based on N).

30 The F Ratio Example: Looking for wage discrimination within a firm The company has workers of three ethnic groups: Whites, African-Americans, Asian-Americans You observe in a sample of 200 employees: Y-bar White = $8.78 / hour Y-bar AfAm = $8.52 / hour Y-bar AsianAm = $8.91 / hour

31 The F Ratio Suppose you calculate the following from your sample: F = 6.24. Recall that N = 200 and J = 3. Degrees of freedom: J−1 = 2, N−J = 197. If α = .05, the critical F value for 2, 197 degrees of freedom is 3.00 (see Knoke, p. 514). The observed F easily exceeds the critical value. Thus, we can reject H0; we conclude that the groups do not all have the same population mean.
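A sketch of the same decision using scipy's F distribution; the exact critical value it returns comes out slightly above 3.0, consistent with the rounded table value, and the conclusion is unchanged.

```python
from scipy import stats

F_obs, J, N, alpha = 6.24, 3, 200, 0.05
dfn, dfd = J - 1, N - J                      # 2 and 197 degrees of freedom

F_crit = stats.f.ppf(1 - alpha, dfn, dfd)    # critical value, close to the table's 3.00
p_value = stats.f.sf(F_obs, dfn, dfd)        # area beyond the observed F

print(round(F_crit, 2), round(p_value, 4), F_obs > F_crit)  # reject H0 if True
```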

32 Comparison with T-Test T-test strategy: determine the sampling distribution of the mean, then use that information to assess the probability that the groups have the same mean (difference in means = 0). ANOVA strategy: compute the F-ratio, which indicates which kind of deviation is larger, “between” vs. “within” group; a high F-value indicates the groups are separate. Note: for two groups, ANOVA and the T-test produce identical results.

33 Bivariate Analyses Up until now, we have focused on a single variable: Y. Even in the T-test for a difference in means and in ANOVA, we just talked about Y, but for multiple groups… Alternatively, we can think of these as simple bivariate analyses, where group type is a “variable.” –Ex: Seeing if girls differ from boys on a test is equivalent to examining whether gender (a first variable) affects test score (a second variable).

34 2 Groups = Bivariate Analysis

Group 1: Boys        Group 2: Girls
Case   Score         Case   Score
1      57            1      53
2      64            2      87
3      48            3      73

Combined data:
Case   Gender   Score
1      0        57
2      0        64
3      0        48
4      1        53
5      1        87
6      1        73

2 Groups = bivariate analysis of Gender and Test Score
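Since this slide provides actual scores, a quick check of the earlier claim that, for two groups, ANOVA and the T-test agree: the F statistic equals the squared t statistic and the p-values match.

```python
import numpy as np
from scipy import stats

boys  = np.array([57, 64, 48], dtype=float)   # Group 1 scores from this slide
girls = np.array([53, 87, 73], dtype=float)   # Group 2 scores from this slide

t_stat, p_t = stats.ttest_ind(boys, girls, equal_var=True)
f_stat, p_f = stats.f_oneway(boys, girls)

print(round(t_stat ** 2, 3), round(f_stat, 3))   # F equals t squared
print(round(p_t, 4), round(p_f, 4))              # identical p-values
```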

35 T-test, ANOVA, and Regression Both the T-test and ANOVA illustrate fundamental concepts needed to understand “Regression.” Relevant ANOVA concepts: the idea of a “model,” partitioning variance, and a dependent variable. Relevant T-test concepts: using the t-distribution for hypothesis tests. Note: for many applications, regression will supersede the T-test and ANOVA, but in some cases they are still useful…

