Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan ANOVA SECTION 8.1 Testing for a difference in means across multiple categories.

Presentation transcript:

Review: Chi-Square Tests
The χ² goodness-of-fit test tests whether one categorical variable differs from a null distribution.
The χ² test for association tests for an association between two categorical variables.
For both, you compute the expected counts in each cell (assuming H0) and the χ² statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
Find the proportion above the χ² statistic in a randomization distribution or a χ²-distribution (if all expected counts > 5).
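
As a quick refresher on the χ² statistic reviewed above, here is a minimal Python sketch. The function name `chi_square_statistic` and the counts are made up for illustration; they are not from the lecture.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories, with equal expected counts under H0.
observed = [10, 20, 30]
expected = [20, 20, 20]
chi2 = chi_square_statistic(observed, expected)
print(chi2)  # 10.0
```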

Multiple Categories
So far, we've learned how to do inference for a difference in means IF the categorical variable has only two categories.
Today, we'll learn how to do hypothesis tests for a difference in means across multiple categories.

Hypothesis Testing
1. State hypotheses
2. Calculate a test statistic, based on your sample data
3. Create a distribution of this statistic, as it would be observed if the null hypothesis were true
4. Measure how extreme your test statistic from (2) is, as compared to the distribution generated in (3)

Cuckoo Birds
Cuckoo birds lay their eggs in the nests of other birds.
When the cuckoo baby hatches, it kicks out all the original eggs/babies.
If the cuckoo is lucky, the mother will raise the cuckoo as if it were her own.
Do cuckoo birds found in nests of different species differ in size?

Length of Cuckoo Eggs

Notation
k = number of groups
n_j = number of units in group j
n = overall number of units = n_1 + n_2 + … + n_k

Cuckoo Eggs
k = 5
n_1 = 15, n_2 = 60, n_3 = 16, n_4 = 14, n_5 = 15
n = 120
[Table: sample mean, sample SD, and sample size for each host species (Pied Wagtail, Pipit, Robin, Sparrow, Wren) and overall; values not preserved in the transcript]
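
The notation can be checked directly against the group sizes on this slide. A small Python sketch (the variable names are illustrative only):

```python
# Group sizes n_1..n_5 as listed on the cuckoo egg slide.
group_sizes = [15, 60, 16, 14, 15]

k = len(group_sizes)  # number of groups
n = sum(group_sizes)  # overall number of units

print(k, n)  # 5 120
```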

Hypotheses
To test for a difference in means across k groups:
$H_0: \mu_1 = \mu_2 = \dots = \mu_k$
$H_a: \text{at least one } \mu_i \neq \mu_j$

Test Statistic
Why can't we use the familiar formula to get the test statistic?
We need something a bit more complicated…

Difference in Means
Whether or not two means are significantly different depends on:
How far apart the means are
How much variability there is within each group

Difference in Means

Analysis of Variance
Analysis of Variance (ANOVA) compares the variability between groups to the variability within groups.
Total Variability = Variability Between Groups + Variability Within Groups

Analysis of Variance
If the groups are actually different, then
a) the variability between groups should be higher than the variability within groups
b) the variability within groups should be higher than the variability between groups

Discoveries for Today
How to measure variability between groups?
How to measure variability within groups?
How to compare the two measures?
How to determine significance?

Sums of Squares
We will measure variability as sums of squared deviations (aka sums of squares). Familiar?

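Sums of Squares
Total Variability = Variability Between Groups + Variability Within Groups
$$\sum_{i} (x_i - \bar{x})^2 = \sum_{j} n_j (\bar{x}_j - \bar{x})^2 + \sum_{j} \sum_{i} (x_{ij} - \bar{x}_j)^2$$
Here $x_i$ is data value i, $\bar{x}$ is the overall mean, $\bar{x}_j$ is the mean in group j, and $x_{ij}$ is the i-th data value in group j. The first and last terms sum over all data values; the middle term sums over all groups.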

Deviations
[Figure: data values in Group 1 and Group 2, with the overall mean and the Group 1 mean marked]

Sums of Squares
Total Variability = Variability Between Groups + Variability Within Groups
SST (total sum of squares) = SSG (sum of squares due to groups) + SSE (error sum of squares)
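
The three sums of squares can be sketched in a few lines of Python. This is a minimal illustration, assuming each group is a plain list of numbers; the function name `sums_of_squares` and the toy data are hypothetical, not taken from the cuckoo data set.

```python
def sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a list of groups, each a list of numbers."""
    all_values = [x for group in groups for x in group]
    overall_mean = sum(all_values) / len(all_values)

    # Total variability: every data value against the overall mean.
    sst = sum((x - overall_mean) ** 2 for x in all_values)

    ssg = 0.0  # variability between groups
    sse = 0.0  # variability within groups
    for group in groups:
        group_mean = sum(group) / len(group)
        ssg += len(group) * (group_mean - overall_mean) ** 2
        sse += sum((x - group_mean) ** 2 for x in group)
    return sst, ssg, sse


# Toy data (hypothetical): two groups of three values each.
groups = [[1, 2, 3], [4, 5, 6]]
sst, ssg, sse = sums_of_squares(groups)
print(sst, ssg, sse)  # 17.5 13.5 4.0
```

In this toy example SSG + SSE = 13.5 + 4.0 = 17.5 = SST, matching the decomposition of total variability.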

Cuckoo Birds

ANOVA Table
Source   df    Sum of Squares   Mean Square
Groups   k-1   SSG              MSG = SSG/(k-1)
Error    n-k   SSE              MSE = SSE/(n-k)
Total    n-1   SST
The mean square is the sum of squares divided by the degrees of freedom: the sum of squares measures variability, and the mean square measures average variability.

ANOVA Table
Fill in the beginnings of the ANOVA table based on the cuckoo birds data.
Source   df    Sum of Squares   Mean Square
Groups   k-1   SSG              MSG = SSG/(k-1)
Error    n-k   SSE              MSE = SSE/(n-k)
Total    n-1   SST
[Table: sample mean, sample SD, and sample size for each host species (Pied Wagtail, Pipit, Robin, Sparrow, Wren) and overall; values not preserved in the transcript]
SSG = 35.9; SSE = (value missing)

ANOVA Table
Fill in the beginnings of the ANOVA table based on the cuckoo birds data.
[Table: filled-in df, sum of squares, and mean square for Groups, Error, and Total; values not preserved in the transcript]

Discoveries for Today
How to measure variability between groups?
How to measure variability within groups?
How to compare the two measures?
How to determine significance?

F-Statistic
The F-statistic is a ratio of the average variability between groups to the average variability within groups: $F = \frac{MSG}{MSE}$

ANOVA Table
Source   df    Sum of Squares   Mean Square       F Statistic
Groups   k-1   SSG              MSG = SSG/(k-1)   MSG/MSE
Error    n-k   SSE              MSE = SSE/(n-k)
Total    n-1   SST

Cuckoo Eggs
Source   df    Sum of Squares   Mean Square       F Statistic
Groups   4     35.9             35.9/4 = 8.97     8.97/0.88 = 10.19
Error    115   …                SSE/115 = 0.88
Total    119   …
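
The arithmetic in this table can be reproduced with a short Python sketch. The function name `anova_table` is made up for illustration. The SSE value is missing from the transcript, so the sketch back-calculates it from the rounded MSE of 0.88; that is an assumption, not a number from the slide. Because the slide divides the rounded 8.97 rather than 8.975, the F-statistic here comes out slightly above the slide's 10.19.

```python
def anova_table(ssg, sse, k, n):
    """Build the ANOVA table entries from the sums of squares, k groups, n units."""
    df_groups = k - 1
    df_error = n - k
    msg = ssg / df_groups  # mean square for groups
    mse = sse / df_error   # mean square for error
    return {
        "df_groups": df_groups,
        "df_error": df_error,
        "df_total": n - 1,
        "SST": ssg + sse,
        "MSG": msg,
        "MSE": mse,
        "F": msg / mse,
    }


ssg = 35.9        # from the slide
sse = 0.88 * 115  # ASSUMPTION: back-calculated from the slide's rounded MSE
table = anova_table(ssg, sse, k=5, n=120)
for name, value in table.items():
    print(name, round(value, 3))
```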

F-statistic
If there really is a difference between the groups, we would expect the F-statistic to be
a) Higher than we would observe by random chance
b) Lower than we would observe by random chance

Discoveries for Today
How to measure variability between groups?
How to measure variability within groups?
How to compare the two measures?
How to determine significance?

How to determine significance?
We have a test statistic. What else do we need to perform the hypothesis test?
A distribution of the test statistic assuming H0 is true.
How do we get this? Two options:
1) Simulation
2) Distributional theory

Simulation
Because a difference would make the F-statistic higher, calculate the proportion in the upper tail.
An F-statistic this large would be very unlikely to happen just by random chance if the means were all equal, so we have strong evidence that the mean lengths of cuckoo eggs in nests of different species are not all equal.
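
One way to carry out the simulation option is a randomization test: shuffle the pooled values across groups (which mimics H0, that group membership does not matter), recompute the F-statistic each time, and report the proportion in the upper tail. The sketch below assumes this shuffling approach; the function names, the data, the number of repetitions, and the seed are all hypothetical. It also assumes the values within a group are not all identical, so that MSE is never zero.

```python
import random


def f_statistic(groups):
    """F = MSG / MSE for a list of groups, each a list of numbers."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    overall_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ssg = sum(len(g) * (m - overall_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssg / (k - 1)) / (sse / (n - k))


def randomization_p_value(groups, reps=2000, seed=0):
    """Proportion of shuffled F-statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        # Deal the shuffled values back into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            count += 1
    return count / reps


# Hypothetical egg lengths for three host species (not the lecture data).
groups = [
    [20.1, 21.3, 22.0, 21.7],
    [22.9, 23.4, 22.5, 23.8],
    [24.0, 23.1, 24.6, 25.2],
]
p_value = randomization_p_value(groups)
print(p_value)
```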

F-distribution

F-Distribution
If the following conditions hold,
1. Sample sizes in each group are large (each n_j ≥ 30) OR the data are relatively normally distributed
2. Variability is similar in all groups
3. The null hypothesis is true
then the F-statistic follows an F-distribution.
The F-distribution has two degrees of freedom, one for the numerator of the ratio (k – 1) and one for the denominator (n – k).
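
For the distributional-theory option, the p-value is the upper-tail area of the F-distribution with k – 1 and n – k degrees of freedom. The sketch below is one standard-library way to approximate it, using the identity that this area equals a regularized incomplete beta function and integrating the beta density with Simpson's rule. The function name `f_upper_tail`, the step count, and the restriction to denominator df greater than 2 are choices made for this illustration; in practice a library routine such as `scipy.stats.f.sf` does the same job.

```python
import math


def f_upper_tail(f_stat, df_num, df_den, steps=2000):
    """Approximate P(F > f_stat) for an F-distribution with (df_num, df_den) df.

    Uses P(F > f) = I_x(df_den/2, df_num/2) with x = df_den / (df_den + df_num*f),
    where I_x is the regularized incomplete beta function.
    ASSUMPTION for this sketch: f_stat > 0, df_den > 2, and steps is even.
    """
    if f_stat <= 0 or df_den <= 2 or steps % 2 != 0:
        raise ValueError("sketch requires f_stat > 0, df_den > 2, even steps")
    a = df_den / 2.0
    b = df_num / 2.0
    x = df_den / (df_den + df_num * f_stat)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        # Beta(a, b) density; it is 0 at t = 0 because a > 1 here.
        if t <= 0.0:
            return 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    # Composite Simpson's rule on [0, x].
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


# Cuckoo egg example: F = 10.19 with k - 1 = 4 and n - k = 115 degrees of freedom.
p_value = f_upper_tail(10.19, 4, 115)
print(p_value)
```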

Equal Variance
The F-distribution assumes equal within-group variability for each group.
As a rough rule of thumb, this assumption is violated if the standard deviation of one group is more than double the standard deviation of another group.
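
The rule of thumb amounts to comparing the largest and smallest sample standard deviations. A minimal Python sketch, with a made-up function name and made-up standard deviations (the cuckoo values are not preserved in this transcript):

```python
def equal_variance_ok(sample_sds):
    """Rule of thumb: the largest SD is no more than double the smallest SD."""
    return max(sample_sds) <= 2 * min(sample_sds)


# Hypothetical sample standard deviations for three groups.
print(equal_variance_ok([0.9, 1.1, 0.7]))  # True
print(equal_variance_ok([0.5, 1.2, 0.8]))  # False
```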

F-distribution
Can we use the F-distribution to calculate the p-value for the cuckoo bird eggs?
a) Yes
b) No
c) Need more information
[Table: sample mean, sample SD, and sample size for each host species (Pied Wagtail, Pipit, Robin, Sparrow, Wren) and overall; values not preserved in the transcript]

Length of Cuckoo Eggs

ANOVA Table
Source   df    Sum of Squares   Mean Square       F Statistic   p-value
Groups   k-1   SSG              MSG = SSG/(k-1)   MSG/MSE       Use F_{k-1, n-k}
Error    n-k   SSE              MSE = SSE/(n-k)
Total    n-1   SST

Cuckoo Eggs

ANOVA Table
[Table: df, sum of squares, mean square, F statistic, and p-value for Groups, Error, and Total; only the p-value fragment "4.3 × …" is preserved in the transcript]
Conditions checked: equal variability, normal(ish) data.
We have very strong evidence that the average length of cuckoo eggs differs for nests of different species.

Study Hours by Class Year
Can we use the F-distribution to calculate the p-value for whether there is a difference in average hours spent studying per week by class year at Duke?
a) Yes
b) No
c) Need more information
[Table: sample mean, sample SD, and sample size by year (First Year, Sophomore, Upperclass); values not preserved in the transcript]

Study Hours by Class Year
Is there a difference in the average hours spent studying per week by class year at Duke?
a) Yes
b) No
c) Cannot tell from this data
d) I didn't finish
[Table: sample mean, sample SD, and sample size by year (First Year, Sophomore, Upperclass); values not preserved in the transcript]

ANOVA Table
[Table: df, sum of squares, mean square, F-statistic, and p-value for Groups, Error, and Total; values not preserved in the transcript]

Summary
Analysis of variance is used to test for a difference in means between groups by comparing the variability between groups to the variability within groups.
Sums of squares are used to measure variability.
The F-statistic is the ratio of average variability between groups to average variability within groups.
The F-statistic follows an F-distribution if sample sizes are large (or the data are normal), variability is equal across groups, and the null hypothesis is true.

To Do
Read Section 8.1 (we are skipping 8.2)
Do Homework 6 (due Monday, 3/24)