Lecture 24: Thurs. Dec. 4. Extra sum of squares F-tests (10.3), R-squared statistic (10.4.1), Residual plots (11.2), Influential observations (11.3, 11.4.3)


Lecture 24: Thurs. Dec. 4. Topics: Extra sum of squares F-tests (10.3); R-squared statistic (10.4.1); Residual plots (11.2); Influential observations (11.3, 11.4.3 – very brief); Course summary; More advanced statistics courses

Model Fits Parallel regression lines model vs. separate regression lines model: how do we test whether the parallel regression lines model is appropriate?

Extra Sum of Squares F-tests Suppose we want to test whether several coefficients are all equal to zero, e.g., H0: beta_2 = beta_3 = 0. t-tests, either individually or in combination, cannot be used to test such a hypothesis involving more than one parameter; the F-test assesses the joint significance of several terms.

Extra Sum of Squares F-test Under H0, the F-statistic has an F distribution with degrees of freedom equal to the number of betas being tested and n - (p+1). The p-value can be found using Table A.4, or by creating a formula in JMP with the F-distribution probability function, supplying the value of the F-statistic and the appropriate degrees of freedom. JMP returns P(F random variable with those degrees of freedom < observed F-statistic), which equals 1 - p-value.
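The F-statistic itself can be written in the standard extra-sum-of-squares form (a reconstruction from the usual definitions, with d the number of betas being tested and p+1 the number of parameters in the full model):

```latex
F = \frac{\left(\mathrm{RSS}_{\mathrm{reduced}} - \mathrm{RSS}_{\mathrm{full}}\right)/d}{\mathrm{RSS}_{\mathrm{full}}/\left(n-(p+1)\right)}
```

Under H0, this statistic follows the F distribution with d and n - (p+1) degrees of freedom.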

Extra Sum of Squares F-test example Testing the parallel regression lines model (H0, the reduced model) against the separate regression lines model (the full model) in the manager example. Full model: separate intercepts and separate slopes for each manager. Reduced model: separate intercepts but a common slope. F-statistic: 3.29. p-value: P(F random variable with 2, 53 df > 3.29)

Second Example of F-test For the echolocation study, in the parallel regression model, we test whether the coefficients on the type indicator variables are both zero. Full model: parallel regression lines including the type indicators. Reduced model: the same model without the type indicators. F-statistic: 0.43. p-value: P(F random variable with 2, 16 degrees of freedom > 0.43) = 0.658
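The extra-sum-of-squares calculation in the two examples above can be sketched in code. This is a minimal numpy/scipy sketch using simulated data in the spirit of the manager example (the actual course data are not reproduced here; all variable names and numbers are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 60
run_size = rng.uniform(50, 300, n)             # simulated stand-in data
manager = rng.integers(0, 3, n)                # three managers: 0, 1, 2
time = 150 + 0.25 * run_size + 20 * (manager == 0) + rng.normal(0, 15, n)

# Design matrices: reduced model = parallel lines (manager dummies, common
# slope); full model adds run_size-by-manager interactions (separate slopes).
d1, d2 = (manager == 1).astype(float), (manager == 2).astype(float)
X_reduced = np.column_stack([np.ones(n), run_size, d1, d2])
X_full = np.column_stack([X_reduced, run_size * d1, run_size * d2])

def rss(X, y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rss_full, rss_red = rss(X_full, time), rss(X_reduced, time)
d = X_full.shape[1] - X_reduced.shape[1]       # number of betas tested (2)
df_full = n - X_full.shape[1]                  # n - (p + 1)

F = ((rss_red - rss_full) / d) / (rss_full / df_full)
p_value = stats.f.sf(F, d, df_full)            # P(F_{d, df_full} > observed F)
print(F, p_value)
```

The same comparison of residual sums of squares applies to any pair of nested models.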

Manager Example Findings The runs supervised by Manager a appear abnormally time consuming. Manager b has high initial fixed setup costs, but the time per unit is the best of the three. Manager c has the lowest fixed costs and a per-unit production time in between those of managers a and b. Adjusting via regression only controls for possible differences in size among production runs. Other differences might be relevant, e.g., the difficulty of the production runs. It could be that Manager a supervised the most difficult production runs.

Special Cases of F-test For the multiple regression model E(Y) = beta_0 + beta_1 X_1 + ... + beta_p X_p: if we want to test whether a single coefficient equals zero, e.g., H0: beta_1 = 0, the F-test is equivalent to the t-test. Suppose we want to test H0: beta_1 = beta_2 = ... = beta_p = 0, i.e., the null hypothesis that the mean of Y does not depend on any of the explanatory variables. JMP automatically computes this test under Analysis of Variance, Prob>F. For the separate regression lines model, there is strong evidence that mean run time depends on at least one of run size and manager.
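The first special case, that the F-test for a single coefficient is equivalent to the t-test, can be checked numerically. A sketch with simulated data (F equals the square of the t-statistic for the slope in simple linear regression):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - 2)                    # residual variance estimate

# t-statistic for H0: slope = 0
se_slope = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta[1] / se_slope

# Extra-SS F-statistic comparing the full model to the intercept-only model
rss_full = resid @ resid
rss_red = np.sum((y - y.mean()) ** 2)
F = (rss_red - rss_full) / (rss_full / (n - 2))

print(t_stat**2, F)   # these agree: F = t^2 for a single coefficient
```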

The R-Squared Statistic The R-squared statistic is the proportion of the variation in y explained by the multiple regression model, with the same interpretation as in simple linear regression: R^2 = 1 - (Residual Sum of Squares)/(Total Sum of Squares), where the Total Sum of Squares is the sum of (y_i - ybar)^2 and the Residual Sum of Squares is the sum of (y_i - yhat_i)^2. In the production time example, R^2 for the separate regression lines model measures the fraction of the variation in run time that the model explains.
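The definition translates directly into code. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, n)
y = 3.0 + 1.5 * x + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
rss = np.sum((y - fitted) ** 2)      # residual sum of squares
r_squared = 1.0 - rss / tss
print(r_squared)
```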

Assumptions of Multiple Linear Regression Model Assumptions of multiple linear regression: –For each subpopulation of explanatory variable values, (A-1A) the mean of Y is a linear function of the explanatory variables, (A-1B) the standard deviation of Y is constant, and (A-1C) the distribution of Y is normal [the distribution of the residuals should not depend on the explanatory variables] –(A-2) The observations are independent of one another

Checking/Refining Model Tools for checking (A-1A) and (A-1B): –Residual plots versus predicted (fitted) values –Residual plots versus explanatory variables –If the model is correct, there should be no pattern in the residual plots Tool for checking (A-1C): –Normal quantile plot of the residuals Tool for checking (A-2): –Residual plot versus time or spatial order of the observations

Residual Plots for Echolocation Study The residual vs. predicted plot suggests that the variance is not constant, with possible nonlinearity.

Residual plots for the echolocation study model.

Coded Residual Plots For multiple regression involving nominal variables, a plot of the residuals versus a continuous explanatory variable, with a different plotting code for each level of the nominal variable, is very useful.
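A coded residual plot can be prepared as follows. This sketch fits a parallel regression lines model to simulated data (variable names are illustrative) and computes the residuals to be plotted against the continuous variable, coded by group:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
run_size = rng.uniform(50, 300, n)             # simulated stand-in data
manager = rng.integers(0, 3, n)
time = 150 + 0.25 * run_size + 20 * (manager == 0) + rng.normal(0, 15, n)

# Parallel regression lines model: common slope, manager dummies
d1, d2 = (manager == 1).astype(float), (manager == 2).astype(float)
X = np.column_stack([np.ones(n), run_size, d1, d2])
beta, *_ = np.linalg.lstsq(X, time, rcond=None)
resid = time - X @ beta

# For the coded plot, scatter resid against run_size with a different
# marker/color per manager, e.g. with matplotlib:
#   plt.scatter(run_size[manager == g], resid[manager == g], label=f"manager {g}")
for g in range(3):
    print(g, resid[manager == g].mean())   # each group's residuals average near 0
```

Because the group dummies are in the model, the residuals within each group sum to zero; any remaining pattern within a group signals a problem with the fit.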

Residual Plots for Transformed Model After transformation, the model appears to satisfy Assumptions (A-1B) and (A-1C).

Normal Quantile Plot To check Assumption (A-1C) [that the subpopulations are normal], we can use a normal quantile plot of the residuals, just as with simple linear regression.
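The normal quantile plot can also be computed outside JMP, e.g., with scipy's probplot. A sketch with simulated residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
resid = rng.normal(0, 1, 100)        # stand-in for regression residuals

# Ordered residuals vs. theoretical normal quantiles; a straight line
# (correlation r near 1) supports the normality assumption (A-1C)
(osm, osr), (slope, intercept, r) = stats.probplot(resid, dist="norm")
print(r)
```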

Dealing with Influential Observations By influential observations, we mean one or several observations whose removal would cause a different conclusion or course of action. Display 11.8 provides a strategy for dealing with suspected influential cases.

Cook’s Distance Cook’s distance is a statistic that can be used to flag influential observations. After fitting the model in JMP, click the red triangle next to Response, then Save Columns, then Cook’s D Influence. A Cook’s distance close to or larger than 1 indicates a large influence.
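Outside JMP, Cook's distance can be computed directly from the hat matrix. A sketch with simulated data, using the standard formula D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where p counts the parameters including the intercept:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
x[0], y[0] = 25.0, 0.0           # plant one highly influential point

X = np.column_stack([np.ones(n), x])
p = X.shape[1]                   # number of parameters (incl. intercept)
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                   # leverages
resid = y - H @ y                # residuals: (I - H) y
s2 = resid @ resid / (n - p)     # mean squared error

cooks_d = resid**2 * h / (p * s2 * (1 - h) ** 2)
print(np.argmax(cooks_d), cooks_d.max())
```

Here the planted point has by far the largest Cook's distance, well above the rule-of-thumb threshold of 1.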

Course Summary Cont. Techniques: –Methods for comparing two groups –Methods for comparing more than two groups –Simple and multiple linear regression for predicting a response variable from explanatory variables and, with a randomized experiment, estimating the causal effect of explanatory variables on a response variable.

Course Summary Cont. Key messages: –Scope of inference: randomized experiments vs. observational studies, random samples vs. nonrandom samples. Always use randomized experiments and random samples if possible. –p-values only assess whether there is strong evidence against the null hypothesis; they do not provide information about practical significance. Confidence intervals are needed to assess practical significance. –When designing a study, choose a sample size large enough that the confidence interval is unlikely to contain both the null hypothesis value and a practically significant alternative.

Course Summary Cont. Key messages: –Beware of multiple comparisons and data snooping. Use the Tukey-Kramer method or Bonferroni to adjust for multiple comparisons. –Simple/multiple linear regression is a powerful method for making predictions and for understanding causation in a randomized experiment. But beware of extrapolation, and beware of making causal statements when the explanatory variables were not randomly assigned.

More Statistics? Stat 210: Sample Survey Design. Will be offered next year. Stat 202: Intermediate Statistics. Offered next fall. Stat 431: Statistical Inference. Will be offered this spring (as well as throughout next year). Stat 430: Probability. Offered this spring. Stat 500: Applied Regression and Analysis of Variance. Offered next fall. Stat 501: Introduction to Nonparametric Methods and Log-linear models. Offered this spring.