Stat 112: Lecture 22 Notes Chapter 9.1: One Way Analysis of Variance Chapter 9.2: Two Way Analysis of Variance.


Milgram’s Obedience Experiment. Y = maximum voltage level the subject would administer before refusing to continue. X = condition, with four categories: remote feedback, voice feedback, proximity, touch proximity.

Testing whether each of the groups is different. Naïve approach to deciding which groups have a mean that differs from the average of the group means: do a t-test for each group and look for groups with p-value < 0.05. Problem: multiple comparisons.

Errors in Hypothesis Testing

                            State of World
Decision Based on Data      Null Hypothesis True    Alternative Hypothesis True
Accept Null Hypothesis      Correct decision        Type II error
Reject Null Hypothesis      Type I error            Correct decision

When we do one hypothesis test and reject the null hypothesis if the p-value < 0.05, the probability of making a Type I error when the null hypothesis is true is 0.05. We protect against falsely rejecting a null hypothesis by making the probability of a Type I error small.

Multiple Comparisons Problem. Compound uncertainty: when doing more than one test, there is an increased chance of making a mistake. If we do multiple hypothesis tests and reject the null hypothesis in each test whenever its p-value is less than 0.05, the chance of falsely rejecting at least one null hypothesis can be much larger than 0.05.

Multiple Comparisons Simulation. In multiplecomp.JMP, 20 groups are compared with a sample size of ten for each group. The observations for each group are simulated from a standard normal distribution, so in fact all group means are equal. The slide tabulates, for each iteration of the simulation, the number of pairs found to have significantly different means using t-tests at the 0.05 level.
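The table of counts from the slide is not reproduced here, but the simulation itself is easy to recreate. Below is a minimal sketch in Python (illustrative, not the multiplecomp.JMP file used in class): 20 groups of ten standard-normal observations, with a two-sample t-test for every pair of groups.

```python
# Sketch of the multiple-comparisons simulation (illustrative, not the
# multiplecomp.JMP file): 20 groups of 10 observations, all drawn from a
# standard normal, so every null hypothesis of equal means is true.
# Count how many of the 190 pairwise t-tests nonetheless reject at 0.05.
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_groups, n_per_group, alpha = 20, 10, 0.05
groups = [rng.standard_normal(n_per_group) for _ in range(n_groups)]

false_rejections = 0
for i, j in combinations(range(n_groups), 2):
    _, p_value = stats.ttest_ind(groups[i], groups[j])
    if p_value < alpha:
        false_rejections += 1

n_pairs = n_groups * (n_groups - 1) // 2
print(f"{false_rejections} of {n_pairs} pairwise tests rejected at alpha = {alpha}")
```

Even though all 20 population means are identical, a run of this simulation typically flags several pairs as "significantly different"; on average about 0.05 × 190 ≈ 9.5 of the pairwise tests reject.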

Multiple Comparisons Simulation. In multiplecomp.JMP, 20 groups are compared with a sample size of ten for each group. The observations for each group are simulated from a standard normal distribution, so in fact all group means are equal. The slide tabulates, for each of five iterations of the simulation, the number of groups found to have a mean different from the average of the group means, using a t-test for each group and rejecting if the p-value < 0.05.

Individual vs. Familywise Error Rate. When several tests are considered simultaneously, they constitute a family of tests. Individual Type I error rate: the probability for a single test that the null hypothesis will be rejected, assuming that the null hypothesis is true. Familywise Type I error rate: the probability for a family of tests that at least one null hypothesis will be rejected, assuming that all of the null hypotheses are true. When we consider a family of tests, we want to make the familywise error rate small, say 0.05, to protect against falsely rejecting a null hypothesis.
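As a rough calculation of how the two rates differ (assuming, for simplicity, k independent tests each run at individual level 0.05): the familywise Type I error rate is 1 − (1 − 0.05)^k, which is about 0.40 for k = 10 and about 0.64 for k = 20. The pairwise tests in the simulation above are not independent, but the familywise rate grows in the same way.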

Why is it important to make the familywise error rate small? If we consider many tests, we will almost always reach some false conclusions if we do not make the familywise error rate small: –Suppose we test the effect of eating a large amount of 100 different foods on preventing cancer, and suppose that none of the foods helps. If we test each food with individual error rate 0.05, then we will on average find 5 foods that have a significant effect even though, in truth, no food has a significant effect.

Suppose we look for a rule that reliably predicts the presidential election winner. If we look through enough rules for one that is significant at the 0.05 level, we will be able to find one even if no rule is reliable. Before this year’s election, the Washington Post reported this rule: when the Redskins win their final home game before the presidential election, the incumbent party stays in office; when the Redskins lose their final home game before the election, the incumbent party loses (with one exception). Pretty simple, right? Steve Hirdt of the Elias Sports Bureau discovered it has happened precisely that way in 16 of 17 elections. "Anything close to that exact is more than a trend and is certainly politically unbiased."

Bonferroni Method. A general method for doing multiple comparisons for any family of k tests. Denote the familywise Type I error rate we want by p*, say p* = 0.05. Compute the p-value for each individual test and reject the null hypothesis for the ith test if its p-value is less than p*/k. This guarantees that the familywise Type I error rate is at most p*. Why Bonferroni works: if we do k tests and all null hypotheses are true, then using Bonferroni with p* = 0.05, we have probability 0.05/k of making a Type I error on each test and expect to make k(0.05/k) = 0.05 errors in total.
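A minimal sketch of the Bonferroni rule in code (the function name and p-values are illustrative):

```python
# Bonferroni: with k tests and desired familywise rate p*, reject the ith null
# hypothesis only if its p-value is below p*/k.
def bonferroni_reject(p_values, familywise_alpha=0.05):
    cutoff = familywise_alpha / len(p_values)
    return [p < cutoff for p in p_values]

# Five tests with familywise rate 0.05: each p-value is compared to 0.05/5 = 0.01.
print(bonferroni_reject([0.003, 0.020, 0.040, 0.300, 0.008]))
# -> [True, False, False, False, True]
```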

Tukey’s Honest Significant Differences (HSD). Tukey’s HSD is a method specifically designed to control the familywise Type I error rate (at 0.05) for analysis of variance. In JMP, after Fit Y by X, click the red triangle next to Response, click Compare Means, and click All Pairs, Tukey’s HSD. If using Fit Model, click the red triangle next to the X variable in the Leverage Plot for that variable and click LSMeans Tukey HSD.
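For readers working outside JMP, a minimal sketch of the same comparison using statsmodels; the file name milgram.csv and the column names voltage and condition are placeholders rather than the actual course data file.

```python
# Tukey's HSD outside JMP (illustrative file and column names).
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("milgram.csv")   # assumed: one row per subject
result = pairwise_tukeyhsd(endog=df["voltage"], groups=df["condition"], alpha=0.05)
print(result)  # pairwise mean differences with familywise-adjusted confidence intervals
```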

Tukey’s HSD for Milgram Data

Assumptions in one-way ANOVA. Assumptions needed for validity of one-way analysis of variance p-values and CIs:
–Linearity: automatically satisfied.
–Constant variance: spread within each group is the same.
–Normality: distribution within each group is normally distributed.
–Independence: sample consists of independent observations.

Rule of thumb for checking constant variance. Constant variance: look at the standard deviation of the different groups by using Fit Y by X and clicking Means and Std Dev. Rule of thumb: check whether (highest group standard deviation / lowest group standard deviation) is greater than 2. If greater than 2, then constant variance is not reasonable and a transformation should be considered. If less than 2, then constant variance is reasonable. For Milgram’s data the lowest group standard deviation is 63.640, and (highest group standard deviation / lowest group standard deviation) = 2.07. Thus, constant variance is not reasonable for Milgram’s data.
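Outside JMP, the same rule-of-thumb check amounts to computing one standard deviation per group and taking the ratio of the largest to the smallest; a sketch with the same placeholder file and column names as above:

```python
# Rule-of-thumb check for constant variance (illustrative file and column names).
import pandas as pd

df = pd.read_csv("milgram.csv")
group_sd = df.groupby("condition")["voltage"].std()
ratio = group_sd.max() / group_sd.min()
print(group_sd)
print(f"max/min SD ratio = {ratio:.2f}")  # a ratio above 2 suggests trying a transformation
```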

Transformations to correct for nonconstant variance. If the standard deviation is highest for groups with high means, try transforming Y to log Y or √Y. If the standard deviation is highest for groups with low means, try transforming Y to Y². For Milgram’s data, the SD is particularly low for the group with the highest mean, so try transforming to Y². To make the transformation in JMP, right-click in a new column, click New Column, then right-click in the created column, click Formula, and enter the appropriate formula for the transformation.

Transformation of Milgram’s data to squared voltage level. Check of constant variance for the transformed data: the ratio of the highest to the lowest group standard deviation is now less than 2, so the constant variance assumption is reasonable for voltage squared. Analysis of variance tests are approximately valid for the voltage squared data, so the data are reanalyzed using voltage squared.

Analysis using Voltage Squared. Strong evidence that the group mean voltage-squared levels are not all the same. Strong evidence that remote has a higher mean voltage-squared level than proximity and touch-proximity, and that voice-feedback has a higher mean voltage-squared level than touch-proximity, taking the multiple comparisons into account.

Rule of Thumb for Checking Normality in ANOVA. The normality assumption for ANOVA is that the distribution in each group is normal. It can be checked by looking at the boxplot, histogram, and normal quantile plot for each group. If there are more than 30 observations in each group, the normality assumption is not important; ANOVA p-values and CIs will still be approximately valid even for nonnormal data. If there are fewer than 30 observations per group, we can check normality by clicking Analyze, Distribution, putting the Y variable in the Y, Columns box and the categorical variable denoting the group in the By box. We can then create normal quantile plots for each group and check that, for each group, the points in the normal quantile plot are within the confidence bands. If there is nonnormality, we can try a transformation such as log Y and see whether the transformed data are approximately normally distributed in each group.
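Outside JMP, one way to sketch the per-group check is a normal quantile plot for each level of the factor (again with placeholder file and column names):

```python
# Normal quantile plot for each group (illustrative file and column names).
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

df = pd.read_csv("milgram.csv")
levels = df["condition"].unique()
fig, axes = plt.subplots(1, len(levels), figsize=(4 * len(levels), 4))
for ax, level in zip(axes, levels):
    stats.probplot(df.loc[df["condition"] == level, "voltage"], dist="norm", plot=ax)
    ax.set_title(level)
plt.tight_layout()
plt.show()
```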

One-way Analysis of Variance: Steps in Analysis
1. Check assumptions (constant variance, normality, independence). If constant variance is violated, try transformations.
2. Use the effect test (commonly called the F-test) to test whether all group means are the same.
3. If the effect test finds that at least two group means differ, use Tukey’s HSD procedure to investigate which groups are different, taking into account the fact that multiple comparisons are being done.

Analysis of Variance Terminology. Analysis of variance is generally concerned with comparing the means of different groups and is a special case of regression in which all the explanatory variables are categorical. The criterion (or criteria) by which we classify the groups in analysis of variance is called a factor. In one-way analysis of variance, we have one factor. The possible values of the factor are called levels. Milgram’s study: the factor is experimental condition, with levels remote, voice-feedback, proximity, and touch-proximity. Two-way analysis of variance: groups are classified by two factors.

Two-way Analysis of Variance Examples. Milgram’s study: in thinking about the Obedience to Authority study, many people have thought that women would react differently than men. This is a two-way analysis of variance setup in which the two factors are experimental condition (levels remote, voice-feedback, proximity, touch-proximity) and sex (levels male, female). Package design experiment: several new types of cereal packages were designed. Two colors and two styles of lettering were considered. Each combination of lettering and color was used to produce a package, each of these combinations was test marketed in 12 comparable stores, and sales in the stores were recorded. This is a two-way analysis of variance in which the two factors are color (levels red, green) and lettering (levels block, script). Goal of two-way analysis of variance: find out how the mean response in a group depends on the levels of both factors and find the best combination.

Two-way Analysis of Variance. The mean of the group with the ith level of factor 1 and the jth level of factor 2 is denoted μ_ij; e.g., in the package-design experiment, the four group means are μ_(red, block), μ_(red, script), μ_(green, block), and μ_(green, script). As with one-way analysis of variance, two-way analysis of variance can be seen as a special case of multiple regression. For two-way analysis of variance, we have two categorical explanatory variables for the two factors and also include an interaction between the factors.

Two-way ANOVA in JMP. Use Analyze, Fit Model with a categorical variable for the first factor, a categorical variable for the second factor, and an interaction variable that crosses the first factor with the second factor. The LS Means Plots (which show how the means of the groups vary as the levels of the factors vary) are produced by going to the JMP output for each variable to the right of the main output, clicking the red triangle next to each variable (for the package design, the variables are Color, TypeStyle, and TypeStyle*Color), and clicking LS Means Plot.
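Outside JMP, an equivalent two-way fit with an interaction can be sketched with the statsmodels formula interface; the file name package_design.csv and the column names sales, color, and typestyle are placeholders for the package-design data.

```python
# Two-way ANOVA with interaction outside JMP (illustrative file and column names).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("package_design.csv")
model = smf.ols("sales ~ C(color) * C(typestyle)", data=df).fit()
print(anova_lm(model, typ=2))  # effect tests for color, typestyle, and their interaction

# Group means for each color/typestyle combination; for a balanced design such as
# 12 stores per combination, these match the LS Means plotted by JMP.
print(df.groupby(["color", "typestyle"])["sales"].mean())
```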

[Slide shows the estimated means for the Red Block group and the Red Script group, computed from the fitted two-way model.]

The LS Means Plots show how the means of the groups vary as the levels of the factors vary. In the top plot, for color, green refers to the mean of the two green groups (green block and green script) and red refers to the mean of the two red groups (red block and red script). Similarly, in the second plot, for TypeStyle, block refers to the mean of the two block groups (red block and green block). The third plot, for TypeStyle*Color, shows the means of all four groups.