SADC Course in Statistics Analysis of Variance with two factors (Session 13)

Presentation transcript:


2 Learning Objectives
At the end of this session, you will be able to:
- understand and interpret the components of a linear model with two categorical factors
- fit a model involving two factors, interpret the output and present the results
- understand the difference between raw means and adjusted means
- appreciate that a residual analysis is the same with more complex models

3 Using Paddy again!
In the paddy example, there were two categorical factors, variety and village. Here we will look at a model including both factors and the corresponding output. We will also discuss the assumptions associated with anova models with categorical factors and procedures to check these assumptions.

4 A model using two factors
The objective here is to compare paddy yields across the 3 varieties and also across villages. A linear model for this takes the form:
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
Here $\beta_0$ represents a constant, and the $g_j$ ($j = 1, 2, 3$) represent the variety effect as before. We also have the term $v_i$ ($i = 1, 2, 3, 4$) to represent the village effect.
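As a rough illustration of how the two factors enter the model, the sketch below codes each observation as a row of 0/1 indicators. It assumes the first level of each factor (KESEN and New) is the base level set to zero; the function names and the coefficient values are hypothetical and are not the course results.

```python
# Levels taken from the paddy example; the first level of each factor is the base.
VILLAGES = ["KESEN", "NANDA", "NIKO", "SABEY"]   # v_1 .. v_4
VARIETIES = ["New", "Old", "Trad"]               # g_1 .. g_3

def design_row(village, variety):
    """Return [constant, v_2, v_3, v_4, g_2, g_3] indicators for one observation."""
    row = [1]
    row += [1 if village == v else 0 for v in VILLAGES[1:]]
    row += [1 if variety == g else 0 for g in VARIETIES[1:]]
    return row

def predict(row, coefficients):
    """Fitted value: constant + village effect + variety effect."""
    return sum(x * b for x, b in zip(row, coefficients))

# Hypothetical coefficients, in the same order as design_row (assumption).
example_coefficients = [5.0, -1.0, 1.0, 2.0, -0.5, -2.5]

row = design_row("NANDA", "Old")
print(row)                                  # indicators for v_2 and g_2 are switched on
print(predict(row, example_coefficients))   # constant + v_2 + g_2
```

An observation at the base levels (KESEN, New) has only the constant switched on, which is why the other estimates are read as differences from the base level.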

5 Anova results
Source   | d.f. | S.S. | M.S. | F | Prob.
Village  |      |      |      |   |
Variety  |      |      |      |   |
Residual |      |      |      |   |
Total    |      |      |      |   |
Above is a two-way anova since there are two factors explaining the variability in paddy yields. Again the Residual M.S. ($s^2$) describes the variation not explained by village and variety.

6 Sample sizes
        | Variety          |
Village | New   Old   Trad | Total
KESEN   |                  | 7
NANDA   |                  | 14
NIKO    |                  | 5
SABEY   |                  |
Total   |                  |
The table above shows that the data are not balanced. Hence we need to worry about the order of fitting terms. How then should we interpret the sequential S.S.s shown in the slide 5 anova?
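One way to check for balance is to cross-tabulate the two factors and see whether every cell has the same count. The sketch below uses a small made-up list of (village, variety) labels, not the course data, and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical (village, variety) labels, one per plot; not the course data.
plots = [
    ("KESEN", "New"), ("KESEN", "Old"), ("KESEN", "Trad"),
    ("NANDA", "New"), ("NANDA", "New"), ("NANDA", "Old"), ("NANDA", "Trad"),
    ("NIKO", "Old"), ("NIKO", "Trad"),
]

def cell_counts(records):
    """Count the observations in each village-by-variety cell."""
    return Counter(records)

def is_balanced(records):
    """True when every village-by-variety combination has the same count."""
    villages = sorted({v for v, _ in records})
    varieties = sorted({g for _, g in records})
    counts = cell_counts(records)
    sizes = {counts[(v, g)] for v in villages for g in varieties}
    return len(sizes) == 1

counts = cell_counts(plots)
for village in ["KESEN", "NANDA", "NIKO"]:
    row = [counts[(village, g)] for g in ["New", "Old", "Trad"]]
    print(village, row, sum(row))   # cell counts and the row total
print("balanced:", is_balanced(plots))
```

Empty cells count as zero here, so a design with a missing village-by-variety combination is also reported as unbalanced.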

7 Anova with adjusted SS and MS
Source   | d.f. | Adj. S.S. | Adj. M.S. | F | Prob.
Village  |      |           |           |   |
Variety  |      |           |           |   |
Residual |      |           |           |   |
Total    |      |           |           |   |
How may the above results be interpreted? What are your conclusions?
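To see why the order of fitting matters with unbalanced data, the sketch below compares a sequential S.S. (village fitted first) with an adjusted S.S. (village fitted after variety). It uses a tiny made-up data set with two villages and two varieties and a hand-written least squares solver; all names and numbers are assumptions for illustration, not the paddy results.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(rows, y):
    """Residual sum of squares after a least squares fit of y on the design rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, y))

def dummies(value, levels):
    """Indicator columns for every level except the first (the base level)."""
    return [1.0 if value == lev else 0.0 for lev in levels[1:]]

VILLAGES = ["A", "B"]        # hypothetical villages
VARIETIES = ["New", "Old"]   # hypothetical varieties

# Hypothetical unbalanced data: (village, variety, yield).
data = [
    ("A", "New", 5.0), ("A", "New", 6.0), ("A", "New", 5.5), ("A", "Old", 4.0),
    ("B", "New", 4.5), ("B", "Old", 3.0), ("B", "Old", 2.5), ("B", "Old", 3.5),
]

def design(records, use_village, use_variety):
    """Design rows with a constant and, optionally, each factor's indicators."""
    rows = []
    for village, variety, _ in records:
        row = [1.0]
        if use_village:
            row += dummies(village, VILLAGES)
        if use_variety:
            row += dummies(variety, VARIETIES)
        rows.append(row)
    return rows

y = [obs for _, _, obs in data]
rss_null = rss(design(data, False, False), y)
rss_village = rss(design(data, True, False), y)
rss_variety = rss(design(data, False, True), y)
rss_full = rss(design(data, True, True), y)

seq_village = rss_null - rss_village    # village fitted first
seq_variety = rss_village - rss_full    # variety fitted after village
adj_village = rss_variety - rss_full    # village adjusted for variety
adj_variety = rss_village - rss_full    # variety adjusted for village

print("sequential:", seq_village, seq_variety)
print("adjusted:  ", adj_village, adj_variety)
```

In this toy example the sequential and adjusted S.S. for village differ considerably, because village and variety are not balanced against each other. The term fitted last has the same S.S. under both schemes.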

8 Model estimates
Parameter           | Coeff. | Std. error | t | t prob.
$\beta_0$: constant |        |            |   |
$v_2$ (Nanda)       |        |            |   |
$v_3$ (Niko)        |        |            |   |
$v_4$ (Sabey)       |        |            |   |
$g_2$ (old)         |        |            |   |
$g_3$ (trad)        |        |            |   |
What do these results tell us?

9 Relating estimates to means
Again:
Old - New = estimate of $g_2$
Trad - New = estimate of $g_3$
This is similar to the case with one categorical factor: comparisons with the base level can be made easily using the model estimates. But when sample sizes are unequal across the two categorical factors, results should be reported in terms of adjusted means!

10 Raw means and adjusted means
Variety      | Sample size (n) | Raw mean | Std. error (s.d./√n)
New improved |                 |          |
Old improved |                 |          |
Traditional  |                 |          |
Model based summaries (adjusted means):
Variety      | Adjusted mean | Std. error (s/√n)
New improved |               |
Old improved |               |
Traditional  |               |

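11 Computing adjusted means
The model equation $y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$ can be used to find the variety adjusted means, e.g. the adjusted mean for the traditional variety is the constant, plus the average of the four village effects, plus the traditional variety effect:
$\hat{\beta}_0 + (\hat{v}_1 + \hat{v}_2 + \hat{v}_3 + \hat{v}_4)/4 + \hat{g}_3$, where the final term is $-2.614$.
Thus the variety adjusted mean is an average over the 4 villages.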
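The calculation can be sketched in a few lines, assuming the parameterisation in which the first level of each factor is set to zero. The coefficient values below are hypothetical placeholders chosen for easy arithmetic; they are not the estimates from the paddy analysis.

```python
# Hypothetical estimates (assumption), with base levels KESEN and New set to zero.
constant = 5.0
village_effects = {"KESEN": 0.0, "NANDA": -1.0, "NIKO": 1.0, "SABEY": 2.0}
variety_effects = {"New": 0.0, "Old": -0.5, "Trad": -2.5}

def adjusted_variety_means(constant, village_effects, variety_effects):
    """Average the fitted values for each variety over all villages equally."""
    mean_village = sum(village_effects.values()) / len(village_effects)
    return {
        variety: constant + mean_village + effect
        for variety, effect in variety_effects.items()
    }

adjusted = adjusted_variety_means(constant, village_effects, variety_effects)
for variety, mean in adjusted.items():
    print(variety, mean)
```

Each village gets equal weight here regardless of how many plots it contributed, which is what distinguishes an adjusted mean from a raw mean when the data are unbalanced.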

12 Checking model assumptions
The anova model with two categorical factors is:
$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$
Model assumptions are associated with the $\varepsilon_{ij}$. These are checked in exactly the same way as before. A residual analysis is done, looking at plots of residuals in various ways. We give below a residual analysis for the model fitted above.

13 Histogram to check normality
Histogram of standardised residuals after fitting a model of yield on village and variety.

14 A normal probability plot…
Another check on the normality assumption. Do you think the points follow a straight line?

15 Std. residuals versus fitted values
Checking the assumption of variance homogeneity, and identification of outliers. What can you say here about the variance homogeneity assumption?
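The quantities behind these plots can be sketched as follows. The observed and fitted values are made up, and the residuals are standardised crudely by dividing by $s$; statistical packages usually also adjust for leverage, so their values will differ slightly. The cut-off of 2 is a common rule of thumb, used here as an assumption.

```python
import math

# Hypothetical observed yields and fitted values from an additive two-factor model.
observed = [5.0, 6.0, 5.5, 4.0, 4.5, 3.0, 2.5, 3.5]
fitted = [5.5, 5.5, 5.5, 4.0, 4.5, 3.0, 3.0, 3.0]
n_parameters = 3  # constant + one village effect + one variety effect (assumption)

residuals = [y - f for y, f in zip(observed, fitted)]
df_residual = len(observed) - n_parameters
s2 = sum(r * r for r in residuals) / df_residual   # residual mean square
s = math.sqrt(s2)

# Crude standardisation: residual divided by s, ignoring leverage.
std_residuals = [r / s for r in residuals]

# Pairs to plot: fitted value on the x-axis, standardised residual on the y-axis.
points = list(zip(fitted, std_residuals))

# Flag possible outliers using the rule-of-thumb cut-off.
flagged = [(f, r) for f, r in points if abs(r) > 2]
print(points)
print("flagged:", flagged)
```

A histogram or normal probability plot would use `std_residuals`, while the check on variance homogeneity plots `points` and looks for a spread that changes with the fitted value.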

16 Finally… know your software
Different software packages impose different constraints on model parameters, so you need to be aware of which constraint your package uses. For example, Stata and Genstat set the first level of the factor to zero. SPSS and SAS set the last level to zero. Minitab imposes a constraint that sets the sum of the parameter estimates to zero! Check also whether the software produces sequential or adjusted or some other form of sums of squares. The correct interpretation of anova results depends on this.
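The sketch below illustrates why the choice of constraint changes the reported estimates but not the fitted values. It converts a set of hypothetical first-level-zero estimates into sum-to-zero estimates, where the effects of each factor sum to zero; the function name and all numbers are assumptions for illustration.

```python
def to_sum_to_zero(constant, effects_a, effects_b):
    """Re-express first-level-zero estimates so each factor's effects sum to zero."""
    mean_a = sum(effects_a.values()) / len(effects_a)
    mean_b = sum(effects_b.values()) / len(effects_b)
    new_constant = constant + mean_a + mean_b
    new_a = {level: e - mean_a for level, e in effects_a.items()}
    new_b = {level: e - mean_b for level, e in effects_b.items()}
    return new_constant, new_a, new_b

# Hypothetical estimates with the first level of each factor set to zero.
constant = 5.0
village = {"KESEN": 0.0, "NANDA": -1.0, "NIKO": 1.0, "SABEY": 2.0}
variety = {"New": 0.0, "Old": -0.5, "Trad": -2.5}

constant_z, village_z, variety_z = to_sum_to_zero(constant, village, variety)

# Fitted values for every village-by-variety cell under both parameterisations.
fitted_first_zero = {
    (v, g): constant + village[v] + variety[g] for v in village for g in variety
}
fitted_sum_zero = {
    (v, g): constant_z + village_z[v] + variety_z[g] for v in village for g in variety
}

print(constant_z, village_z, variety_z)
print(fitted_first_zero[("NANDA", "Old")], fitted_sum_zero[("NANDA", "Old")])
```

The constant and the individual effects look quite different under the two constraints, yet every fitted cell value agrees, so differences between levels are unaffected.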

17 Practical work follows to ensure learning objectives are achieved…