
1 Regression Review
Population vs. Sample Regression Line
Residual and Standard Error of Regression
Interpretation of intercept & slope
T-test, F-test and Coefficient of determination
Non-normality of error, Multicollinearity, Heteroscedasticity and Autocorrelation
Dummy variables & Non-linear relationships

2 Population vs. Sample Regression
y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε
The population regression line is said to be the true relationship between the X's and Y. It cannot be observed, due to the size of the population and the presence of the random error term ε.

3 Sample Regression Line
y = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ + e
b₀ is the OLS estimator of β₀, b₁ is the OLS estimator of β₁, and so on, based on fitting the line to the sample of data. e is the residual: our best guess at the unobservable error ε, and the vertical distance of a sample data point from the fitted regression line.
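A minimal sketch of fitting such a line in Python with statsmodels; the data and variable names here are made up for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)   # betas known by construction

X = sm.add_constant(np.column_stack([x1, x2]))       # column of 1s for b0
results = sm.OLS(y, X).fit()
print(results.params)      # b0, b1, b2: the OLS estimates of the betas
print(results.resid[:5])   # e: vertical distances from the fitted line
```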

4 Residual and Standard Error of Regression
The smaller the standard deviation of e (also called the Standard Error of the Regression), the better the line fits the data. It is given by the formula
sₑ = √( SSE / (n − k − 1) )
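A one-function sketch of that formula, assuming e is the residual vector from a fit like the one above and k is the number of X variables:

```python
import numpy as np

def standard_error_of_regression(e, k):
    """s_e = sqrt(SSE / (n - k - 1)), where SSE is the sum of squared residuals."""
    n = len(e)
    sse = np.sum(np.asarray(e) ** 2)
    return np.sqrt(sse / (n - k - 1))
```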

5 Predicted Value of Y
ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ
ŷ is the predicted value of Y, obtained by multiplying the data values of X by the sample regression coefficients.
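As a sketch of that arithmetic, with hypothetical coefficient estimates and one observation's X values:

```python
import numpy as np

b = np.array([2.0, 1.5, -0.5])      # hypothetical b0, b1, b2
x_new = np.array([1.0, 0.4, 1.2])   # leading 1 multiplies the intercept b0
y_hat = x_new @ b                   # predicted Y: b0 + b1*x1 + b2*x2
```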

6 Slope & Intercept
The intercept b₀ of the sample regression equation is the value of Y when all the X's are 0. In other words, it captures the impact on Y of everything other than the X's. The slope bᵢ on any variable Xᵢ is the amount by which Y changes when Xᵢ changes by 1 unit, holding the other X's constant.

7 T-test
The t-test tests the hypothesis that a given Xᵢ does not influence Y at all. If we can reject the null hypothesis, we conclude that Xᵢ does have an effect on Y and thus belongs in the regression equation.
H₀: βᵢ = 0
H₁: βᵢ ≠ 0

8 T-test
We reject the null hypothesis at a given significance level (usually 5%) if the absolute value of the calculated t-stat is greater than the critical value of the t-stat from the table (approximately 1.96 for large samples). d.f. = n − k − 1
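A sketch of that decision rule, using hypothetical values for the estimate and its standard error:

```python
from scipy import stats

b_i, se_b_i, n, k = 1.5, 0.6, 100, 2        # hypothetical values

t_calc = b_i / se_b_i                       # calculated t-stat for H0: beta_i = 0
t_crit = stats.t.ppf(0.975, df=n - k - 1)   # two-sided 5% critical value
reject_h0 = abs(t_calc) > t_crit
print(t_calc, t_crit, reject_h0)
```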

9 F-test
The F-test checks whether the model as a whole (all the X's together) does anything to explain the movement of Y around its mean value. If the calculated F-stat is greater than the critical F-stat from the table, we reject the null hypothesis that none of the X's belong in the model.
H₀: β₁ = β₂ = … = βₖ = 0
H₁: At least one βᵢ is not equal to zero.

10 F-test
The calculated F-stat is given by
F = (SSR / k) / (SSE / (n − k − 1))
We reject H₀ if F > F(α, k, n−k−1), the critical value from the table.
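The same computation as a sketch, using hypothetical sums of squares:

```python
from scipy import stats

ssr, sse, n, k = 420.0, 180.0, 100, 2       # hypothetical SSR and SSE

f_calc = (ssr / k) / (sse / (n - k - 1))    # MSR / MSE
f_crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)
reject_h0 = f_calc > f_crit                 # reject: at least one beta_i != 0
print(f_calc, f_crit, reject_h0)
```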

11 Coefficient of Determination
The R-square, or coefficient of determination, gives the percentage of the variation of Y around its mean that can be explained by all the X's. A low R-square usually means that the line does not fit the data very well and that the model may not be very good at explaining changes in Y based on the X's.
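Since SST = SSR + SSE, R-square follows directly; reusing the hypothetical sums of squares from the F-test sketch:

```python
ssr, sse = 420.0, 180.0         # hypothetical SSR and SSE; SST = SSR + SSE
r_squared = ssr / (ssr + sse)   # share of Y's variation explained by the X's
print(r_squared)                # 0.7 here
```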

12 Non-normality of Error
If the assumption that ε is normally distributed is called into question, we cannot use the t-tests, F-test or R-square, because these tests are built on that assumption. Their results become meaningless.

13 Non-normality of Error
Check for normality by computing the Jarque-Bera (JB) stat. Normality is not rejected if the JB stat is less than 5.99 (the 5% critical value of a chi-square with 2 degrees of freedom). If normality is a problem, try a transformation of the Y-variable such as log(Y), 1/Y, etc.
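A sketch of the check using scipy's Jarque-Bera implementation on stand-in residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
resid = rng.normal(size=100)       # stand-in for a model's residuals

jb_stat, p_value = stats.jarque_bera(resid)
normality_ok = jb_stat < 5.99      # 5.99 = 5% chi-square(2) critical value
print(jb_stat, normality_ok)
```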

14 Multicollinearity
When two or more X's are correlated with each other, you have multicollinearity. Symptoms include insignificant t-stats (due to inflated standard errors of the coefficients) alongside a good R-square. Test: run a correlation matrix of all the X variables (see the sketch below). Fix: get more data, or combine the correlated variables.
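A sketch of that correlation-matrix check, with made-up data in which x2 is nearly a copy of x1:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # almost a copy of x1: collinear
x3 = rng.normal(size=100)

X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
print(X.corr())   # off-diagonal entries near +/-1 flag multicollinearity
```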

15 Heteroscedasticity
When the variance of the error term differs for different values of X, you have heteroscedasticity. Test: plot the residuals (vertical axis) against the predicted Y-values (horizontal axis); a cone-shaped pattern indicates heteroscedasticity. Problem: the OLS estimators of the β's are no longer minimum variance, so you can no longer be sure that the value you get for bᵢ lies close to the true βᵢ. Fix: White correction for the standard errors (sketched below).
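A sketch of the fix in statsmodels, which exposes White (heteroscedasticity-consistent) standard errors through cov_type; the data here are made up so that the error spread grows with x:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
y = 1 + 2 * x + rng.normal(scale=x)        # error spread grows with x

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()             # classical standard errors
white = sm.OLS(y, X).fit(cov_type="HC0")   # White-corrected standard errors
print(classical.bse, white.bse)            # same b's, different standard errors
```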

16 Autocorrelation / Serial Correlation
When the error term in one time period is related to the error term in the previous time period, you have first-order autocorrelation. Test: Durbin-Watson test. Problem: the standard errors of the coefficients are wrong, so the t-tests are unreliable. Fix: include time as a variable.
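A sketch of the test using statsmodels' durbin_watson on simulated first-order autocorrelated errors:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
e = np.zeros(100)
for t in range(1, 100):
    e[t] = 0.8 * e[t - 1] + rng.normal()   # each error depends on the last one

print(durbin_watson(e))   # near 2 = no autocorrelation; well below 2 = positive
```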

17 Dummy Variables
Sometimes X's are categorical: male/female, smoker/non-smoker, etc. These can be modelled using variables that take on a value of zero or one. Remember to always use one dummy variable less than the number of categories you are modelling. Why?
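A sketch of building such variables with pandas; the column names here are made up:

```python
import pandas as pd

df = pd.DataFrame({"sex": ["male", "female", "female", "male"],
                   "smoker": ["yes", "no", "yes", "yes"]})

# drop_first=True keeps one dummy fewer than the number of categories; with a
# full set of dummies the columns would sum to the intercept column and the
# OLS estimates could not be computed (the "dummy variable trap")
dummies = pd.get_dummies(df, columns=["sex", "smoker"], drop_first=True)
print(dummies)
```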

18 Non-linear Relationships
You may believe that both X and X-squared influence Y. OLS can still be used: generate a new variable for X-squared by multiplying X by itself, and include both X and X-squared in the regression. Interpretation of coefficients?
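A sketch of that approach on made-up data with a genuine quadratic term:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(size=100)

X = sm.add_constant(np.column_stack([x, x * x]))   # both x and x-squared
results = sm.OLS(y, X).fit()
print(results.params)   # the effect of x on y is now b1 + 2*b2*x, not one slope
```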

