Part 2: Model and Inference 2-1/49 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics.

Part 2: Model and Inference 2-2/49 Regression and Forecasting Models Part 2 – Inference About the Regression

Part 2: Model and Inference 2-3/49 The Linear Regression Model
1. The linear regression model
2. Sample statistics and population quantities
3. Testing the hypothesis of no relationship

Part 2: Model and Inference 2-4/49 A Linear Regression Predictor: Box Office = -14.36 + 72.72 Buzz

Part 2: Model and Inference 2-5/49 Data and Relationship
• We suggested that the relationship between box office and internet buzz is Box Office = -14.36 + 72.72 Buzz.
• Note the obvious inconsistency in the figure: the observed points do not lie on a line, so the equation cannot be the exact relationship.
• How do we reconcile the equation with the data?

Part 2: Model and Inference 2-6/49 Modeling the Underlying Process
• A model that explains the process that produces the data we observe: Observed outcome = the sum of two parts
  (1) Explained: the regression line
  (2) Unexplained (noise): the remainder
• Regression model: the “model” is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real-world observation.

Part 2: Model and Inference 2-7/49 The Population Regression
• THE model: a specific statement about the parts of the model
  (1) Explained: Explained Box Office = β₀ + β₁ Buzz
  (2) Unexplained: the rest is “noise,” ε. Random ε has certain characteristics.
• Model statement: Box Office = β₀ + β₁ Buzz + ε

Part 2: Model and Inference 2-8/49 The Data Include the Noise

Part 2: Model and Inference 2-9/49 The Data Include the Noise
Figure labels: ε and β₀ + β₁ Buzz. For the marked point: Box Office = 41, β₀ + β₁ Buzz = 10, ε = 31.

Part 2: Model and Inference 2-10/49 Model Assumptions
• yᵢ = β₀ + β₁xᵢ + εᵢ
• β₀ + β₁xᵢ is the ‘regression function.’
  - Contains the ‘information’ about yᵢ in xᵢ
  - Unobserved because β₀ and β₁ are not known for certain
• εᵢ is the ‘disturbance.’ It is the unobserved random component.
• Observed yᵢ is the sum of the two unobserved parts.

Part 2: Model and Inference 2-11/49 Regression Model Assumptions About εᵢ
• Random variable
  (1) The regression is the mean of yᵢ for a particular xᵢ. εᵢ is the deviation of yᵢ from the regression line.
  (2) εᵢ has mean zero.
  (3) εᵢ has variance σ².
• ‘Random’ noise
  (4) εᵢ is unrelated to any values of xᵢ (no covariance); it is “random noise.”
  (5) εᵢ is unrelated to any other observation's disturbance εⱼ (not “autocorrelated”).
  (6) Normal distribution: εᵢ is the sum of many small influences.
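A minimal sketch of what these assumptions mean in practice: generate data from the model with independent, mean-zero normal noise. The parameter values below are illustrative stand-ins (borrowed from the fitted Buzz equation), the Buzz values are made up, and the function name `simulate_box_office` is hypothetical; the true β₀, β₁, and σ are not known.

```python
import random
import statistics

def simulate_box_office(buzz_values, beta0, beta1, sigma, seed=0):
    """Generate y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    rows = []
    for x in buzz_values:
        explained = beta0 + beta1 * x   # regression function (explained part)
        eps = rng.gauss(0.0, sigma)     # disturbance, drawn independently of x
        rows.append((x, explained, eps, explained + eps))
    return rows

# Illustrative "true" values only; these are assumptions, not known parameters.
buzz = [i / 100 for i in range(10, 110)]   # hypothetical Buzz values 0.10 .. 1.09
data = simulate_box_office(buzz, beta0=-14.36, beta1=72.72, sigma=13.39)

eps_values = [row[2] for row in data]
print("mean of eps:", round(statistics.mean(eps_values), 2))
print("std dev of eps:", round(statistics.stdev(eps_values), 2))
```

In a simulation the disturbances are visible because we created them; with real data only y and x are observed.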

Part 2: Model and Inference 2-12/49 Regression Model

Part 2: Model and Inference 2-13/49 Conditional Normal Distribution of ε

Part 2: Model and Inference 2-14/49 A Violation of Point (4): c = β₀ + β₁q + ε? Electricity Cost Data

Part 2: Model and Inference 2-15/49 A Violation of Point (5) - Autocorrelation Time Trend of U.S. Gasoline Consumption

Part 2: Model and Inference 2-16/49 No Obvious Violations of Assumptions Auction Prices for Monet Paintings vs. Area

Part 2: Model and Inference 2-17/49 Samples and Populations
Population (Theory)
• yᵢ = β₀ + β₁xᵢ + εᵢ
• Parameters: β₀, β₁
• Regression: β₀ + β₁xᵢ, the mean of yᵢ | xᵢ
• Disturbance εᵢ: expected value 0, standard deviation σ, no correlation with xᵢ
Sample (Observed)
• yᵢ = b₀ + b₁xᵢ + eᵢ
• Estimates: b₀, b₁
• Fitted regression: b₀ + b₁xᵢ, the predicted yᵢ | xᵢ
• Residuals eᵢ: sample mean 0, sample standard deviation sₑ, sample Cov[x, e] = 0

Part 2: Model and Inference 2-18/49 Disturbances vs. Residuals
ε = y − β₀ − β₁ Buzz
e = y − b₀ − b₁ Buzz

Part 2: Model and Inference 2-19/49 Standard Deviation of Residuals
• The standard deviation of εᵢ = yᵢ − β₀ − β₁xᵢ is σ.
• σ = √E[εᵢ²] (the mean of εᵢ is zero).
• Sample b₀ and b₁ estimate β₀ and β₁.
• Residual eᵢ = yᵢ − b₀ − b₁xᵢ estimates εᵢ.
• Use √((1/N)Σeᵢ²) to estimate σ? Close, but not quite: the estimator divides by N−2, so sₑ = √(Σeᵢ²/(N−2)).
• Why N−2? Two parameters (β₀, β₁) were estimated. This is the same reason N−1 is used to compute a sample variance.
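The N−2 divisor can be checked against the Buzz regression output reported later in the deck (sum of squared residuals 10751.5, N = 62, S = 13.3863). This is a small sketch; the helper name `residual_std` is ours, not part of any package.

```python
import math

def residual_std(residuals, n_params=2):
    """Estimate sigma as sqrt(sum(e_i^2) / (N - n_params))."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than estimated parameters")
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - n_params))

# Figures taken from the reported Buzz regression: SSE = 10751.5, N = 62.
sse_reported, n_obs = 10751.5, 62
s_e = math.sqrt(sse_reported / (n_obs - 2))   # divides by N - 2
naive = math.sqrt(sse_reported / n_obs)       # divides by N, slightly too small
print(round(s_e, 2), round(naive, 2))
```

The first value agrees with the reported S up to rounding of the printed sum of squares; the second shows how little, but in which direction, the naive divisor differs.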

Part 2: Model and Inference 2-21/49 Linear Regression Sample Regression Line

Part 2: Model and Inference 2-22/49 Residuals

Part 2: Model and Inference 2-23/49 Regression Computations

Part 2: Model and Inference 2-26/49 Results to Report

Part 2: Model and Inference 2-27/49 The Reported Results

Part 2: Model and Inference 2-28/49 Estimated equation

Part 2: Model and Inference 2-29/49 Estimated coefficients b₀ and b₁

Part 2: Model and Inference 2-30/49 Sum of squared residuals, Σᵢ eᵢ²

Part 2: Model and Inference 2-31/49 S = sₑ = estimated standard deviation of ε

Part 2: Model and Inference 2-32/49 Interpreting σ (Estimated by sₑ)
Remember the empirical rule: about 95% of observations lie within the mean ± 2 standard deviations. The figure shows (b₀ + b₁x) ± 2sₑ. The marked point is 2.2 standard deviations from the regression. Only 3.2% of the 62 observations (2 points) lie outside the bounds. (We will refine this later.)

Part 2: Model and Inference 2-33/49 No Relationship: β₁ = 0. Relationship: β₁ ≠ 0. How to Distinguish These Cases Statistically? yᵢ = β₀ + β₁xᵢ + εᵢ

Part 2: Model and Inference 2-34/49 Assumptions
• (Regression) The equation linking “Box Office” and “Buzz” is stable: E[Box Office | Buzz] = α + β Buzz
• Another sample of movies, say from 2012, would obey the same fundamental relationship.

Part 2: Model and Inference 2-35/49 Sampling Variability Samples 0 and 1 are a random split of the 62 observations. Sample 1: Box Office = -13.25 + 68.51 Buzz Sample 0: Box Office = -16.09 + 79.11 Buzz
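The split-sample idea can be reproduced on simulated data. The sketch below does NOT use the course's movie data: it generates 62 synthetic observations from assumed parameter values, splits them at random into two halves, and fits a line to each. The helper `fit_line` is a plain least squares formula written for this example.

```python
import random

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

rng = random.Random(42)
# Synthetic stand-in for the 62 movies; parameter values are assumptions.
xs = [rng.uniform(0.1, 1.0) for _ in range(62)]
ys = [-14.36 + 72.72 * x + rng.gauss(0.0, 13.39) for x in xs]

# Randomly assign each observation to sample 0 or sample 1 (31 each).
labels = [0] * 31 + [1] * 31
rng.shuffle(labels)
for s in (0, 1):
    sub_x = [x for x, lab in zip(xs, labels) if lab == s]
    sub_y = [y for y, lab in zip(ys, labels) if lab == s]
    b0, b1 = fit_line(sub_x, sub_y)
    print(f"Sample {s}: Box Office = {b0:.2f} + {b1:.2f} Buzz")
```

Both halves come from the same underlying equation, yet the two fitted lines differ. That difference is sampling variability.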

Part 2: Model and Inference 2-36/49 Sampling Distributions

Part 2: Model and Inference 2-37/49 [Figure: n = N−2; panels: Small sample, Large sample]

Part 2: Model and Inference 2-38/49 Standard Error of Regression Slope Estimator
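The formula on this slide did not survive in the transcript. For reference, the standard textbook expression for the simple regression slope, which we assume is what the slide displayed, is:

```latex
\mathrm{SE}(b_1) = \frac{s_e}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}},
\qquad
s_e = \sqrt{\frac{\sum_{i=1}^{N} e_i^2}{N-2}}
```

The standard error is smaller when the residuals are less noisy (small sₑ), when there are more observations, or when the x values are more spread out.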

Part 2: Model and Inference 2-39/49 Internet Buzz Regression
Regression Analysis: BoxOffice versus Buzz
The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz
Predictor     Coef  SE Coef      T      P
Constant   -14.360    5.546  -2.59  0.012
Buzz         72.72    10.94   6.65  0.000
S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1
Range of uncertainty for b is 72.72 − 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17].
If you use 2.00 from the t table, the limits would be [50.8 to 94.6].
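The range of uncertainty is simple arithmetic on the reported coefficient and standard error. A short sketch, using the rounded values printed in the output (so the last digit can differ slightly from limits computed with unrounded values); the function name `slope_interval` is ours:

```python
def slope_interval(b1, se_b1, multiplier=1.96):
    """Return (lower, upper) = b1 -/+ multiplier * SE(b1)."""
    half_width = multiplier * se_b1
    return b1 - half_width, b1 + half_width

# Rounded values as printed in the regression output.
b1, se_b1 = 72.72, 10.94
for mult in (1.96, 2.00):
    lo, hi = slope_interval(b1, se_b1, mult)
    print(f"multiplier {mult:.2f}: [{lo:.2f}, {hi:.2f}]")
```

With a multiplier of 2.00 the arithmetic gives roughly 50.8 to 94.6. Either way the interval is far from zero.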

Part 2: Model and Inference 2-40/49 Some computer programs report confidence intervals automatically; Minitab does not.

Part 2: Model and Inference 2-41/49 Uncertainty About the Regression Slope
Hypothetical Regression: Fuel Bill vs. Number of Rooms
The regression equation is
Fuel Bill = -252 + 136 Number of Rooms
Predictor    Coef  SE Coef      T      P
Constant   -251.9    44.88  -5.20  0.000
Rooms       136.2     7.09   19.9  0.000
S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%
The Rooms coefficient, 136.2, is b₁, the estimate of β₁. Its “Standard Error” (SE), 7.09, is the measure of uncertainty about the true value. The “range of uncertainty” is b ± 2 SE(b). (Actually 1.96, but people use 2.)

Part 2: Model and Inference 2-42/49 Sampling Distributions and Test Statistics

Part 2: Model and Inference 2-43/49 t Statistic for Hypothesis Test

Part 2: Model and Inference 2-44/49 Alternative Approach: The P value
• Hypothesis: β₁ = 0
• The ‘P value’ is the probability that you would have observed evidence against this hypothesis at least as strong as what you did observe if the null hypothesis were true.
• P = Prob(|t| would be this large | β₁ = 0)
• If the P value is less than the Type I error probability you have chosen (usually 0.05), you will reject the hypothesis.
• Interpret: If the hypothesis were true, it is ‘unlikely’ that I would have observed this evidence.
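A rough sketch of the P value calculation, assuming a large sample so that the standard normal distribution can stand in for the t distribution. Minitab uses the t distribution with N−2 degrees of freedom, so its reported P for the constant (0.012) is a little larger than the normal approximation gives. The function name is ours.

```python
from statistics import NormalDist

def two_sided_p_value(t_stat):
    """Large-sample P value: Prob(|Z| >= |t|) for a standard normal Z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

alpha = 0.05
# t ratios taken from the Buzz regression output.
for name, t_stat in (("Constant", -2.59), ("Buzz", 6.65)):
    p = two_sided_p_value(t_stat)
    decision = "reject" if p < alpha else "do not reject"
    print(f"{name}: t = {t_stat}, P = {p:.4f}, {decision} H0")
```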

Part 2: Model and Inference 2-45/49 P value for hypothesis test

Part 2: Model and Inference 2-46/49 Intuitive approach: Does the confidence interval contain zero?
• Hypothesis: β₁ = 0
• The confidence interval contains the set of plausible values of β₁ based on the data and the test.
• If the confidence interval does not contain 0, reject H₀: β₁ = 0.

Part 2: Model and Inference 2-47/49 More General Test

Part 2: Model and Inference 2-49/49 Summary: Regression Analysis
• Investigate: Is the coefficient in a regression model really nonzero?
• Testing procedure:
  Model: y = β₀ + β₁x + ε
  Hypothesis: H₀: β₁ = B
  Rejection region: least squares coefficient is far from B (far from zero when B = 0)
• Test:
  α level for the test = 0.05 as usual
  Compute t = (b₁ − B)/StandardError
  Reject H₀ if |t| is above the critical value
    - 1.96 if large sample
    - Value from t table if small sample
  Reject H₀ if reported P value is less than α level
  Degrees of freedom for the t statistic is N−2
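The summary procedure can be sketched in a few lines, using the Buzz regression's reported slope and standard error. The function name `t_test_slope` and the nonzero null value of 60 are hypothetical, chosen only to illustrate the general test of H₀: β₁ = B.

```python
def t_test_slope(b1, se_b1, hypothesized=0.0, critical=1.96):
    """Two-sided test of H0: beta1 = hypothesized; returns (t, reject)."""
    t_stat = (b1 - hypothesized) / se_b1
    return t_stat, abs(t_stat) > critical

# Buzz regression: b1 = 72.72, SE = 10.94, N = 62.
t_stat, reject = t_test_slope(72.72, 10.94)   # H0: beta1 = 0
print(round(t_stat, 2), reject)

# A hypothetical nonzero null, H0: beta1 = 60, for illustration only.
# 2.00 is the t-table value the deck quotes for this sample size.
t_alt, reject_alt = t_test_slope(72.72, 10.94, hypothesized=60.0, critical=2.00)
print(round(t_alt, 2), reject_alt)
```

The data reject a slope of zero decisively but cannot rule out a slope of 60, which lies inside the range of uncertainty.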
