Part 2: Model and Inference 2-1/49 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics

Part 2: Model and Inference 2-2/49 Regression and Forecasting Models Part 2 – Inference About the Regression

Part 2: Model and Inference 2-3/49 The Linear Regression Model
1. The linear regression model
2. Sample statistics and population quantities
3. Testing the hypothesis of no relationship

Part 2: Model and Inference 2-4/49 A Linear Regression Predictor: Box Office = b0 + b1 Buzz

Part 2: Model and Inference 2-5/49 Data and Relationship
 We suggested that the relationship between box office and internet buzz is the fitted line Box Office = b0 + b1 Buzz.
 Note the obvious inconsistency in the figure: the line by itself is not the relationship. The observed points do not lie on a line.
 How do we reconcile the equation with the data?

Part 2: Model and Inference 2-6/49 Modeling the Underlying Process
 A model that explains the process that produces the data we observe: Observed outcome = the sum of two parts
(1) Explained: the regression line
(2) Unexplained (noise): the remainder
 Regression model: the "model" is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real-world observation.

Part 2: Model and Inference 2-7/49 The Population Regression
 THE model: a specific statement about the parts of the model
(1) Explained: Explained Box Office = β0 + β1 Buzz
(2) Unexplained: the rest is "noise," ε. Random ε has certain characteristics.
 Model statement: Box Office = β0 + β1 Buzz + ε

Part 2: Model and Inference 2-8/49 The Data Include the Noise

Part 2: Model and Inference 2-9/49 The Data Include the Noise: for the observation marked in the figure, Box = 41 and β0 + β1 Buzz = 10, so ε = 41 − 10 = 31.

Part 2: Model and Inference 2-10/49 Model Assumptions
 y_i = β0 + β1 x_i + ε_i
β0 + β1 x_i is the 'regression function':
  it contains the 'information' about y_i that is in x_i
  it is unobserved, because β0 and β1 are not known for certain
ε_i is the 'disturbance,' the unobserved random component.
 Observed y_i is the sum of the two unobserved parts.
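The model statement above translates directly into a data-generating process. The sketch below simulates such data; it is illustrative only, and the parameter values (beta0, beta1, sigma) and variable names are assumptions rather than quantities from the slides, though N = 62 matches the movie sample used later in this part.

```python
# Minimal sketch of the data-generating process y_i = beta0 + beta1*x_i + eps_i.
# All parameter values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5      # hypothetical population parameters
N = 62                                   # same sample size as the movie example

x = rng.uniform(0.0, 1.0, size=N)        # observed regressor ("Buzz" plays this role in the slides)
eps = rng.normal(0.0, sigma, size=N)     # disturbance: mean 0, std. dev. sigma, independent of x
y = beta0 + beta1 * x + eps              # observed outcome = regression function + disturbance
```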

Part 2: Model and Inference 2-11/49 Regression Model Assumptions About ε_i
 Random variable
(1) The regression is the mean of y_i for a particular x_i; ε_i is the deviation of y_i from the regression line.
(2) ε_i has mean zero.
(3) ε_i has variance σ².
 'Random' noise
(4) ε_i is unrelated to any value of x_i (no covariance) – it is "random noise."
(5) ε_i is unrelated to any other disturbance ε_j (not "autocorrelated").
(6) Normal distribution: ε_i is the sum of many small influences.

Part 2: Model and Inference 2-12/49 Regression Model

Part 2: Model and Inference 2-13/49 Conditional Normal Distribution of ε

Part 2: Model and Inference 2-14/49 A Violation of Point (4): c = β0 + β1 q + ε? Electricity Cost Data

Part 2: Model and Inference 2-15/49 A Violation of Point (5) - Autocorrelation Time Trend of U.S. Gasoline Consumption

Part 2: Model and Inference 2-16/49 No Obvious Violations of Assumptions Auction Prices for Monet Paintings vs. Area

Part 2: Model and Inference 2-17/49 Samples and Populations
 Population (theory): y_i = β0 + β1 x_i + ε_i
Parameters: β0, β1
Regression: β0 + β1 x_i = mean of y_i | x_i
Disturbance ε_i: expected value 0, standard deviation σ, no correlation with x_i
 Sample (observed): y_i = b0 + b1 x_i + e_i
Estimates: b0, b1
Fitted regression: b0 + b1 x_i = predicted y_i | x_i
Residuals e_i: sample mean 0, sample std. dev. s_e, sample Cov[x, e] = 0
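The sample quantities in the second column can be computed directly with least squares. The sketch below does this on simulated data (illustrative numbers, not the movie data) and confirms the two residual properties listed above.

```python
# Illustrative sketch: least-squares estimates b0, b1 and the sample residual properties.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 62)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 62)     # simulated data with hypothetical parameters

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                            # residuals

print(b0, b1)
print(e.mean())                                  # zero, up to floating-point rounding
print(np.mean((x - x.mean()) * e))               # sample covariance of x and e: also zero
```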

Part 2: Model and Inference 2-18/49 Disturbances vs. Residuals: ε = y − β0 − β1 Buzz (population), e = y − b0 − b1 Buzz (sample)

Part 2: Model and Inference 2-19/49 Standard Deviation of Residuals
 The standard deviation of ε_i = y_i − β0 − β1 x_i is σ
 σ = √E[ε_i²] (the mean of ε_i is zero)
 The sample statistics b0 and b1 estimate β0 and β1
 The residual e_i = y_i − b0 − b1 x_i estimates ε_i
 Use √((1/N) Σ e_i²) to estimate σ? Close, but not quite: divide by N − 2 instead of N, as in the sketch below. Why N − 2? It relates to the fact that two parameters (β0, β1) were estimated, the same reason N − 1 is used to compute a sample variance.
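A small helper makes the N − 2 divisor concrete. It is an illustrative sketch, and the residual array e is assumed to come from a simple-regression fit such as the one above.

```python
# Estimate sigma from simple-regression residuals: divide by N - 2, not N,
# because two parameters (b0, b1) were estimated.
import numpy as np

def estimate_sigma(e):
    """Standard error of the regression, s_e, from the residuals of a simple regression."""
    e = np.asarray(e, dtype=float)
    return np.sqrt(np.sum(e ** 2) / (len(e) - 2))

# Example (using the residuals e from the least-squares sketch above):
# s_e = estimate_sigma(e)
```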

Part 2: Model and Inference 2-20/49

Part 2: Model and Inference 2-21/49 Linear Regression Sample Regression Line

Part 2: Model and Inference 2-22/49 Residuals

Part 2: Model and Inference 2-23/49 Regression Computations
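For reference, the standard least-squares computations behind the estimates b0, b1 and s_e used throughout this part are the textbook formulas, written here in the notation of these slides:

```latex
b_1 \;=\; \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{N}(x_i-\bar{x})^{2}},
\qquad
b_0 \;=\; \bar{y}-b_1\bar{x},
\qquad
s_e \;=\; \sqrt{\frac{\sum_{i=1}^{N} e_i^{2}}{N-2}},
\quad e_i = y_i - b_0 - b_1 x_i .
```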

Part 2: Model and Inference 2-24/49

Part 2: Model and Inference 2-25/49

Part 2: Model and Inference 2-26/49 Results to Report

Part 2: Model and Inference 2-27/49 The Reported Results

Part 2: Model and Inference 2-28/49 Estimated equation

Part 2: Model and Inference 2-29/49 Estimated coefficients b0 and b1

Part 2: Model and Inference 2-30/49 Sum of squared residuals, Σ_i e_i²

Part 2: Model and Inference 2-31/49 S = s_e = the estimated standard deviation of ε

Part 2: Model and Inference 2-32/49 Interpreting σ (Estimated by s_e). Remember the empirical rule: 95% of observations will lie within the mean ± 2 standard deviations. (The figure shows the band (b0 + b1 x) ± 2 s_e.) The point marked in the figure is 2.2 standard deviations from the regression. Only 3.2% of the 62 observations lie outside the bounds. (We will refine this later.)
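On simulated data this interpretation is easy to check; the helper below is an illustrative sketch (the 3.2% figure quoted above refers to the 62 movies, not to a simulation).

```python
# Share of observations whose residual falls outside the band (b0 + b1*x) +/- k*s_e.
import numpy as np

def fraction_outside_band(e, s_e, k=2.0):
    e = np.asarray(e, dtype=float)
    return np.mean(np.abs(e) > k * s_e)

# With normal disturbances, roughly 5% of residuals should fall outside +/- 2*s_e,
# in line with the empirical rule cited on the slide.
```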

Part 2: Model and Inference 2-33/49 No Relationship: β1 = 0. Relationship: β1 ≠ 0. How to Distinguish These Cases Statistically? y_i = β0 + β1 x_i + ε_i

Part 2: Model and Inference 2-34/49 Assumptions
 (Regression) The equation linking "Box Office" and "Buzz" is stable: E[Box Office | Buzz] = α + β Buzz
 Another sample of movies, say from 2012, would obey the same fundamental relationship.

Part 2: Model and Inference 2-35/49 Sampling Variability. Samples 0 and 1 are a random split of the 62 observations. Fitting the regression Box Office = b0 + b1 Buzz separately to each half produces two different estimated equations: the estimates vary from sample to sample.
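The same experiment is easy to reproduce on simulated data; the sketch below is illustrative (it does not use the movie data) and shows that two random halves of one sample give different coefficient estimates.

```python
# Split a simulated sample in half at random and fit each half separately.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 62)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 62)             # hypothetical parameters, not the movie data

idx = rng.permutation(62)
for label, part in (("Sample 0", idx[:31]), ("Sample 1", idx[31:])):
    slope, intercept = np.polyfit(x[part], y[part], 1)   # degree-1 fit returns [slope, intercept]
    print(label, round(intercept, 3), round(slope, 3))   # the two fits differ by sampling variability
```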

Part 2: Model and Inference 2-36/49 Sampling Distributions

Part 2: Model and Inference 2-37/49 n = N − 2: small sample vs. large sample

Part 2: Model and Inference 2-38/49  Standard Error of Regression Slope Estimator
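For reference, the estimated standard error of the least-squares slope in the simple regression (the quantity reported as "SE Coef" in the output on the next slides) is:

```latex
\widehat{SE}(b_1) \;=\; \frac{s_e}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^{2}}}.
```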

Part 2: Model and Inference 2-39/49 Internet Buzz Regression. Regression Analysis: BoxOffice versus Buzz (Minitab output). The regression equation is BoxOffice = b0 + b1 Buzz, with SE Coef = 10.94 for Buzz; S = s_e, R-Sq = 42.4%, R-Sq(adj) = 41.4%, followed by the Analysis of Variance table (Regression, Residual Error, Total). Range of uncertainty for b1: 72.72 − 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17]. If you use 2.00 from the t table, the limits would be [50.1 to 94.6].
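A quick check of the interval arithmetic quoted above, using the slope estimate 72.72 and standard error 10.94 from the slide's range-of-uncertainty line:

```python
# Confidence limits for the Buzz slope from the values reported on the slide.
b1, se_b1 = 72.72, 10.94

print(b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)   # approximately [51.3, 94.2]
print(b1 - 2.00 * se_b1, b1 + 2.00 * se_b1)   # the "2.00 from the t table" version
```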

Part 2: Model and Inference 2-40/49 Some computer programs report confidence intervals automatically; Minitab does not.

Part 2: Model and Inference 2-41/49 Uncertainty About the Regression Slope. Hypothetical Regression: Fuel Bill vs. Number of Rooms (Minitab output). The regression equation is Fuel Bill = b0 + b1 Rooms; R-Sq = 72.2%, R-Sq(adj) = 72.0%. The Coef entry for Rooms is b1, the estimate of β1. Its "Standard Error" (SE) is the measure of uncertainty about the true value. The "range of uncertainty" is b1 ± 2 SE(b1). (Actually 1.96, but people use 2.)

Part 2: Model and Inference 2-42/49 Sampling Distributions and Test Statistics

Part 2: Model and Inference 2-43/49 t Statistic for Hypothesis Test
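For reference, and consistent with the summary at the end of this part, the t statistic for testing H0: β1 = B (with B = 0 for the no-relationship hypothesis) is:

```latex
t \;=\; \frac{b_1 - B}{\widehat{SE}(b_1)},
\qquad t \ \text{has a Student's } t \text{ distribution with } N-2 \text{ degrees of freedom under } H_0 .
```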

Part 2: Model and Inference 2-44/49 Alternative Approach: The P Value
 Hypothesis: β1 = 0
 The 'P value' is the probability of observing evidence at least as extreme as what you actually observed, if the null hypothesis were true.
 P = Prob(|t| would be this large | β1 = 0)
 If the P value is less than the Type I error probability you have chosen (usually 0.05), reject the hypothesis.
 Interpretation: if the hypothesis were true, it is 'unlikely' that I would have observed this evidence.
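A minimal sketch of the calculation, using the slope estimate and standard error quoted for the Buzz regression (scipy's Student-t distribution supplies the tail probability):

```python
# Two-sided P value for H0: beta1 = 0, using a t statistic with N - 2 degrees of freedom.
from scipy import stats

N = 62                       # sample size in the Buzz example
b1, se_b1 = 72.72, 10.94     # slope estimate and standard error from the regression output above
t_stat = b1 / se_b1          # t for H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=N - 2)

print(t_stat, p_value)       # |t| is about 6.6, so the P value is far below 0.05
```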

Part 2: Model and Inference 2-45/49 P value for hypothesis test

Part 2: Model and Inference 2-46/49 Intuitive Approach: Does the Confidence Interval Contain Zero?
 Hypothesis: β1 = 0
 The confidence interval contains the set of plausible values of β1, based on the data and the test.
 If the confidence interval does not contain 0, reject H0: β1 = 0.

Part 2: Model and Inference 2-47/49 More General Test

Part 2: Model and Inference 2-48/49

Part 2: Model and Inference 2-49/49 Summary: Regression Analysis
 Investigate: Is the coefficient in a regression model really nonzero?
 Testing procedure:
Model: y = β0 + β1 x + ε
Hypothesis: H0: β1 = B
Rejection region: the least squares coefficient is far from B (far from zero when B = 0)
 Test (implemented in the sketch below):
Set the α level for the test, 0.05 as usual
Compute t = (b1 − B) / StandardError
Reject H0 if |t| is above the critical value: 1.96 in a large sample, or the value from the t table in a small sample
Equivalently, reject H0 if the reported P value is less than the α level
Degrees of freedom for the t statistic: N − 2
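The sketch below puts the whole procedure together in one function. It is an illustrative implementation in the notation of this part, not code from the course, and the example call at the bottom uses simulated data with hypothetical parameter values.

```python
# Illustrative implementation of the testing procedure summarized above.
import numpy as np
from scipy import stats

def test_slope(x, y, B=0.0, alpha=0.05):
    """Test H0: beta1 = B in y = beta0 + beta1*x + eps, using t with N - 2 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx      # least squares slope
    b0 = y.mean() - b1 * x.mean()                           # least squares intercept
    e = y - (b0 + b1 * x)                                   # residuals
    s_e = np.sqrt(np.sum(e ** 2) / (N - 2))                 # estimated std. deviation of eps
    se_b1 = s_e / np.sqrt(sxx)                              # standard error of the slope
    t = (b1 - B) / se_b1
    p = 2 * stats.t.sf(abs(t), df=N - 2)
    return {"b1": b1, "se_b1": se_b1, "t": t, "P value": p, "reject H0": p < alpha}

# Example on simulated data (hypothetical parameters):
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 62)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 62)
print(test_slope(x, y, B=0.0))
```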