Part 2: Model and Inference 2-1/49 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics.


1 Part 2: Model and Inference 2-1/49 Regression Models Professor William Greene Stern School of Business IOMS Department Department of Economics

2 Part 2: Model and Inference 2-2/49 Regression and Forecasting Models Part 2 – Inference About the Regression

3 Part 2: Model and Inference 2-3/49 The Linear Regression Model 1. The linear regression model 2. Sample statistics and population quantities 3. Testing the hypothesis of no relationship

4 Part 2: Model and Inference 2-4/49 A Linear Regression Predictor: Box Office = -14.36 + 72.72 Buzz

5 Part 2: Model and Inference 2-5/49 Data and Relationship
• We suggested that the relationship between box office and internet buzz is Box Office = -14.36 + 72.72 Buzz.
• Note the obvious inconsistency in the figure: the observed points do not lie on a line, so this equation is not the exact relationship.
• How do we reconcile the equation with the data?

6 Part 2: Model and Inference 2-6/49 Modeling the Underlying Process
• A model that explains the process that produces the data that we observe: Observed outcome = the sum of two parts
  (1) Explained: the regression line
  (2) Unexplained (noise): the remainder
• Regression model: the “model” is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real-world observation.

7 Part 2: Model and Inference 2-7/49 The Population Regression
• THE model: a specific statement about the parts of the model
  (1) Explained: Explained Box Office = $\beta_0 + \beta_1\,\text{Buzz}$
  (2) Unexplained: the rest is “noise,” $\varepsilon$. The random $\varepsilon$ has certain characteristics.
• Model statement: $\text{Box Office} = \beta_0 + \beta_1\,\text{Buzz} + \varepsilon$

8 Part 2: Model and Inference 2-8/49 The Data Include the Noise

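9 Part 2: Model and Inference 2-9/49 The Data Include the Noise
Figure annotations: the regression line is $\beta_0 + \beta_1\,\text{Buzz}$. For the highlighted point, Box = 41, $\beta_0 + \beta_1\,\text{Buzz} = 10$, and $\varepsilon = 31$.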

10 Part 2: Model and Inference 2-10/49 Model Assumptions
• $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
• $\beta_0 + \beta_1 x_i$ is the ‘regression function.’
  - It contains the ‘information’ about $y_i$ in $x_i$.
  - It is unobserved because $\beta_0$ and $\beta_1$ are not known for certain.
• $\varepsilon_i$ is the ‘disturbance.’ It is the unobserved random component.
• The observed $y_i$ is the sum of the two unobserved parts.

11 Part 2: Model and Inference 2-11/49 Regression Model Assumptions About $\varepsilon_i$
• Random variable
  (1) The regression is the mean of $y_i$ for a particular $x_i$. $\varepsilon_i$ is the deviation of $y_i$ from the regression line.
  (2) $\varepsilon_i$ has mean zero.
  (3) $\varepsilon_i$ has variance $\sigma^2$.
• ‘Random’ noise
  (4) $\varepsilon_i$ is unrelated to any values of $x_i$ (no covariance); it is “random noise.”
  (5) $\varepsilon_i$ is unrelated to the disturbance $\varepsilon_j$ of any other observation (not “autocorrelated”).
  (6) Normal distribution: $\varepsilon_i$ is the sum of many small influences.
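The assumptions about the disturbance can be made concrete by simulating data from the model. The following is a minimal sketch, not part of the original slides: the function name `simulate`, the uniform distribution for x, and the parameter values (borrowed from the fitted Box Office equation and treated here as if they were the true β values) are all assumptions made for illustration.

```python
import random

def simulate(n, beta0, beta1, sigma, seed=0):
    """Draw n observations from y = beta0 + beta1*x + eps.

    eps is normal with mean 0 and standard deviation sigma, drawn
    independently of x and of every other eps (assumptions 2 to 6).
    The uniform(0, 1) distribution for x is an arbitrary choice.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]
    return xs, ys, eps

# Hypothetical "true" values, chosen only to resemble the Buzz example.
xs, ys, eps = simulate(62, beta0=-14.36, beta1=72.72, sigma=13.4)

# Each observed y is the explained part plus the noise.
explained = [-14.36 + 72.72 * x for x in xs]
print(ys[0], explained[0] + eps[0])
```

In real data only `xs` and `ys` are observed; `eps` and the β values stay hidden, which is why they must be estimated.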

12 Part 2: Model and Inference 2-12/49 Regression Model

13 Part 2: Model and Inference 2-13/49 Conditional Normal Distribution of $\varepsilon$

14 Part 2: Model and Inference 2-14/49 A Violation of Point (4): $c = \beta_0 + \beta_1 q + \varepsilon$? Electricity Cost Data

15 Part 2: Model and Inference 2-15/49 A Violation of Point (5) - Autocorrelation Time Trend of U.S. Gasoline Consumption

16 Part 2: Model and Inference 2-16/49 No Obvious Violations of Assumptions Auction Prices for Monet Paintings vs. Area

17 Part 2: Model and Inference 2-17/49 Samples and Populations
• Population (Theory): $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
  - Parameters: $\beta_0, \beta_1$
  - Regression: $\beta_0 + \beta_1 x_i$, the mean of $y_i \mid x_i$
  - Disturbance $\varepsilon_i$: expected value = 0, standard deviation $\sigma$, no correlation with $x_i$
• Sample (Observed): $y_i = b_0 + b_1 x_i + e_i$
  - Estimates: $b_0, b_1$
  - Fitted regression: $b_0 + b_1 x_i$, the predicted $y_i \mid x_i$
  - Residuals $e_i$: sample mean 0, sample std. dev. $s_e$, sample $\text{Cov}[x, e] = 0$

18 Part 2: Model and Inference 2-18/49 Disturbances vs. Residuals
Disturbance: $\varepsilon = y - \beta_0 - \beta_1\,\text{Buzz}$. Residual: $e = y - b_0 - b_1\,\text{Buzz}$.

19 Part 2: Model and Inference 2-19/49 Standard Deviation of Residuals
• The standard deviation of $\varepsilon_i = y_i - \beta_0 - \beta_1 x_i$ is $\sigma$.
• $\sigma = \sqrt{E[\varepsilon_i^2]}$ (the mean of $\varepsilon_i$ is zero).
• The sample $b_0$ and $b_1$ estimate $\beta_0$ and $\beta_1$.
• The residual $e_i = y_i - b_0 - b_1 x_i$ estimates $\varepsilon_i$.
• Use $\sqrt{(1/N)\sum_i e_i^2}$ to estimate $\sigma$? Close, but not quite. The estimator divides by $N-2$: $s_e = \sqrt{\sum_i e_i^2 / (N-2)}$.
• Why $N-2$? Two parameters ($\beta_0, \beta_1$) were estimated. This is the same reason $N-1$ is used to compute a sample variance.
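The estimate of σ described above can be sketched in a few lines. This example is an illustration only: the five data points are made up, and the function names `fit_line` and `residual_std` are hypothetical. It uses the standard least squares formulas for a single regressor and then divides the sum of squared residuals by N-2.

```python
import math

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for y = b0 + b1*x + e."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def residual_std(xs, ys, b0, b1):
    """s_e: square root of the sum of squared residuals over N-2."""
    n = len(xs)
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Made-up data, used only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
s_e = residual_std(xs, ys, b0, b1)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
print(b0, b1, s_e, sum(residuals))
```

The residuals sum to zero (up to rounding) by construction, which matches the "sample mean 0" property of residuals listed on the Samples and Populations slide.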

20 Part 2: Model and Inference 2-20/49

21 Part 2: Model and Inference 2-21/49 Linear Regression Sample Regression Line

22 Part 2: Model and Inference 2-22/49 Residuals

23 Part 2: Model and Inference 2-23/49 Regression Computations

24 Part 2: Model and Inference 2-24/49

25 Part 2: Model and Inference 2-25/49

26 Part 2: Model and Inference 2-26/49 Results to Report

27 Part 2: Model and Inference 2-27/49 The Reported Results

28 Part 2: Model and Inference 2-28/49 Estimated equation

29 Part 2: Model and Inference 2-29/49 Estimated coefficients b 0 and b 1

30 Part 2: Model and Inference 2-30/49 Sum of squared residuals, $\sum_i e_i^2$

31 Part 2: Model and Inference 2-31/49 $S = s_e$ = estimated std. deviation of $\varepsilon$

32 Part 2: Model and Inference 2-32/49 Interpreting $\sigma$ (Estimated by $s_e$)
• Remember the empirical rule: 95% of observations will lie within the mean ± 2 standard deviations. The figure shows $(b_0 + b_1 x) \pm 2 s_e$.
• The highlighted point is 2.2 standard deviations from the regression.
• Only 3.2% of the 62 observations lie outside the bounds. (We will refine this later.)
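The share of observations outside the band can be counted directly. The sketch below is illustrative: `share_outside` is a hypothetical helper and the three-point input is made up. The count of 2 out of 62 is an inference from the reported 3.2%, not a figure stated on the slide.

```python
def share_outside(ys, fitted, s_e, k=2.0):
    """Fraction of observations farther than k*s_e from the fitted line."""
    outside = sum(1 for y, f in zip(ys, fitted) if abs(y - f) > k * s_e)
    return outside / len(ys)

# Made-up example: with s_e = 1, only the point at distance 5 is outside.
example_share = share_outside([0.0, 1.0, 5.0], [0.0, 0.0, 0.0], s_e=1.0)

# 3.2% of 62 observations corresponds to 2 points (inferred, see above).
slide_share_pct = round(100 * 2 / 62, 1)
print(example_share, slide_share_pct)
```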

33 Part 2: Model and Inference 2-33/49 No Relationship: $\beta_1 = 0$. Relationship: $\beta_1 \neq 0$. How to Distinguish These Cases Statistically? $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

34 Part 2: Model and Inference 2-34/49 Assumptions
• (Regression) The equation linking “Box Office” and “Buzz” is stable: $E[\text{Box Office} \mid \text{Buzz}] = \beta_0 + \beta_1\,\text{Buzz}$
• Another sample of movies, say from 2012, would obey the same fundamental relationship.

35 Part 2: Model and Inference 2-35/49 Sampling Variability Samples 0 and 1 are a random split of the 62 observations. Sample 1: Box Office = -13.25 + 68.51 Buzz Sample 0: Box Office = -16.09 + 79.11 Buzz
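The random-split experiment can be imitated on simulated data. This is a sketch under stated assumptions: the 62 movie observations are not in the transcript, so the data here are generated from a hypothetical model whose parameters merely resemble the fitted equation, and `fit_line` is an illustrative helper.

```python
import random

def fit_line(xs, ys):
    """Least squares intercept and slope for a single regressor."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Hypothetical stand-in for the 62 observations (not the real data).
rng = random.Random(1)
xs = [rng.uniform(0.0, 1.0) for _ in range(62)]
ys = [-14.36 + 72.72 * x + rng.gauss(0.0, 13.4) for x in xs]

# Randomly assign half of the observations to each sample.
labels = [0] * 31 + [1] * 31
rng.shuffle(labels)
sample0 = [(x, y) for x, y, g in zip(xs, ys, labels) if g == 0]
sample1 = [(x, y) for x, y, g in zip(xs, ys, labels) if g == 1]

fit0 = fit_line(*zip(*sample0))
fit1 = fit_line(*zip(*sample1))
print("Sample 0: intercept, slope =", fit0)
print("Sample 1: intercept, slope =", fit1)
```

The two fitted lines differ even though both halves come from the same process, which is the sampling variability the slide illustrates.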

36 Part 2: Model and Inference 2-36/49 Sampling Distributions

37 Part 2: Model and Inference 2-37/49 [Figure labels: n = N-2; Small sample; Large sample]

38 Part 2: Model and Inference 2-38/49 Standard Error of Regression Slope Estimator

39 Part 2: Model and Inference 2-39/49 Internet Buzz Regression
Regression Analysis: BoxOffice versus Buzz
The regression equation is BoxOffice = - 14.4 + 72.7 Buzz

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -14.360 | 5.546 | -2.59 | 0.012 |
| Buzz | 72.72 | 10.94 | 6.65 | 0.000 |

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 7913.6 | 7913.6 | 44.16 | 0.000 |
| Residual Error | 60 | 10751.5 | 179.2 | | |
| Total | 61 | 18665.1 | | | |

Range of uncertainty for b is 72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17]. If you use 2.00 from the t table, the limits would be [50.8 to 94.6].
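The numbers in this output are tied together by simple arithmetic, which can be checked from the reported values alone. The sketch below copies its inputs from the Minitab output on the slide; the variable names are illustrative. Because the inputs are rounded, the results agree with the printed figures only to within rounding.

```python
import math

# Values copied from the regression output above.
b1, se_b1 = 72.72, 10.94
ss_regression, ss_residual, ss_total = 7913.6, 10751.5, 18665.1
n = 62

t_stat = b1 / se_b1                      # reported T for Buzz
ms_residual = ss_residual / (n - 2)      # reported MS for Residual Error
s = math.sqrt(ms_residual)               # reported S
r_sq = ss_regression / ss_total          # reported R-Sq, as a fraction
f_stat = ss_regression / ms_residual     # reported F (1 df in numerator)

# Range of uncertainty for the slope, using 1.96 and then 2.00.
ci_196 = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)
ci_200 = (b1 - 2.00 * se_b1, b1 + 2.00 * se_b1)

print(round(t_stat, 2), round(s, 4), round(100 * r_sq, 1), round(f_stat, 2))
print(ci_196, ci_200)
```

With one regressor, the F statistic equals the square of the slope's t statistic, so the two tests of no relationship always agree.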

40 Part 2: Model and Inference 2-40/49 Some computer programs report confidence intervals automatically; Minitab does not.

41 Part 2: Model and Inference 2-41/49 Uncertainty About the Regression Slope
Hypothetical Regression: Fuel Bill vs. Number of Rooms
The regression equation is Fuel Bill = -252 + 136 Number of Rooms

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -251.9 | 44.88 | -5.20 | 0.000 |
| Rooms | 136.2 | 7.09 | 19.9 | 0.000 |

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

• The Rooms coefficient, 136.2, is $b_1$, the estimate of $\beta_1$.
• Its “Standard Error” (SE), 7.09, is the measure of uncertainty about the true value.
• The “range of uncertainty” is $b \pm 2\,SE(b)$. (Actually 1.96, but people use 2.)

42 Part 2: Model and Inference 2-42/49 Sampling Distributions and Test Statistics

43 Part 2: Model and Inference 2-43/49 t Statistic for Hypothesis Test

44 Part 2: Model and Inference 2-44/49 Alternative Approach: The P value
• Hypothesis: $\beta_1 = 0$
• The ‘P value’ is the probability that you would have observed evidence against this hypothesis at least as strong as the evidence you did observe, if the null hypothesis were true.
• $P = \text{Prob}(|t| \text{ would be this large} \mid \beta_1 = 0)$
• If the P value is less than the Type I error probability (usually 0.05) you have chosen, you will reject the hypothesis.
• Interpret: If the hypothesis were true, it is ‘unlikely’ that I would have observed this evidence.
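A two-sided P value can be approximated from the t statistic. The sketch below is a large-sample approximation that uses the standard normal distribution from Python's standard library in place of the t distribution with N-2 degrees of freedom, so it will not exactly match software output for small samples. The function name `two_sided_p_normal` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_normal(t):
    """Approximate Prob(|T| >= |t|) under H0, using the standard normal.

    This is a large-sample stand-in for the t distribution; the exact
    P value would use N-2 degrees of freedom.
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# t statistics taken from the Internet Buzz regression output.
p_buzz = two_sided_p_normal(6.65)       # slope
p_const = two_sided_p_normal(-2.59)     # constant

print(p_buzz, p_const)
print("Reject beta_1 = 0 at the 0.05 level:", p_buzz < 0.05)
```

For the constant, the normal approximation gives a value slightly below the 0.012 reported by Minitab, which reflects the heavier tails of the t distribution with 60 degrees of freedom.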

45 Part 2: Model and Inference 2-45/49 P value for hypothesis test

46 Part 2: Model and Inference 2-46/49 Intuitive approach: Does the confidence interval contain zero?
• Hypothesis: $\beta_1 = 0$
• The confidence interval contains the set of plausible values of $\beta_1$ based on the data and the test.
• If the confidence interval does not contain 0, reject $H_0: \beta_1 = 0$.
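The interval check is a one-line comparison once the interval is computed. A minimal sketch, assuming the large-sample critical value 1.96; the helper names are hypothetical and the inputs come from the Internet Buzz output.

```python
def confidence_interval(b, se, critical=1.96):
    """Interval b +/- critical*se; 1.96 is the large-sample 95% value."""
    return b - critical * se, b + critical * se

def rejects_zero(b, se, critical=1.96):
    """True when the interval excludes 0, so H0: beta_1 = 0 is rejected."""
    lower, upper = confidence_interval(b, se, critical)
    return not (lower <= 0.0 <= upper)

# Slope and standard error from the Internet Buzz regression.
lower, upper = confidence_interval(72.72, 10.94)
print(lower, upper, rejects_zero(72.72, 10.94))
```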

47 Part 2: Model and Inference 2-47/49 More General Test

48 Part 2: Model and Inference 2-48/49

49 Part 2: Model and Inference 2-49/49 Summary: Regression Analysis
• Investigate: Is the coefficient in a regression model really nonzero?
• Testing procedure:
  - Model: $y = \beta_0 + \beta_1 x + \varepsilon$
  - Hypothesis: $H_0: \beta_1 = B$ (B = 0 for the test of no relationship)
  - Rejection region: the least squares coefficient is far from B.
• Test:
  - The α level for the test is 0.05, as usual.
  - Compute $t = (b_1 - B)/\text{StandardError}$.
  - Reject $H_0$ if $|t|$ is above the critical value: 1.96 for a large sample, or the value from the t table for a small sample.
  - Reject $H_0$ if the reported P value is less than the α level.
  - The degrees of freedom for the t statistic are N-2.
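The summary procedure can be written as a short function. This is a sketch, not a definitive implementation: `t_test_slope` is a hypothetical name, and the default critical value of 1.96 assumes a large sample. For a small sample the caller would pass the value from the t table with N-2 degrees of freedom.

```python
def t_test_slope(b1, se, B=0.0, critical=1.96):
    """Test H0: beta_1 = B with the statistic t = (b1 - B) / se.

    Returns the t statistic and whether H0 is rejected, that is,
    whether |t| exceeds the critical value.
    """
    t = (b1 - B) / se
    return t, abs(t) > critical

# Slope and standard error from the Internet Buzz regression.
t_zero, reject_zero = t_test_slope(72.72, 10.94)          # H0: beta_1 = 0
t_sixty, reject_sixty = t_test_slope(72.72, 10.94, B=60)  # H0: beta_1 = 60

print(t_zero, reject_zero)
print(t_sixty, reject_sixty)
```

The second call uses an arbitrary value B = 60 purely to show the more general test; that hypothesis does not appear in the slides.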

