# ASSESSING THE STRENGTH OF THE REGRESSION MODEL



Assessing the Model's Strength
Although the best straight line through a set of points may have been found and the assumptions for the error term $\varepsilon$ may appear valid, is the resulting regression line useful in predicting y?
[Figure: three example plots labeled NO, NO, and YES.]

STEP 4: HOW GOOD IS THE MODEL?
- Can we conclude that there is a linear relation between y and x?
  - This is a hypothesis test (t-test).
- What proportion of the overall variability in y (from its mean) can be explained by changes in x?
  - This is a performance measure called the coefficient of determination (denoted by $r^2$).

Can we conclude a linear relation exists between y and x?
- We are hypothesizing that y changes linearly with x: $y = \beta_0 + \beta_1 x$. That is, if x goes up by 1, y will change by $\beta_1$.
- But if no linear relation exists, then if x goes up by 1, y will not change, i.e. $\beta_1 = 0$.

The Hypothesis Test
To test whether or not a linear relation exists:
- $H_0: \beta_1 = 0$ (no linear relation exists)
- $H_A: \beta_1 \ne 0$ (a linear relation does exist)
- $\alpha$ = the significance level
- Reject $H_0$ (accept $H_A$) if $t > t_{\alpha/2}$ or if $t < -t_{\alpha/2}$, with degrees of freedom = n - (# betas) = n - 2

The t-statistic for the test of $\beta_1 = 0$
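The formula shown on this slide did not survive in the transcript. A sketch of the usual form of the statistic, written with the symbols the presentation uses elsewhere ($b_1$ for the estimated slope, $s_{b_1}$ for its standard deviation), is given here; the symbol $s_e$ for the standard error of the regression is an assumption of this sketch, not notation taken from the slides.

$$
t = \frac{b_1 - 0}{s_{b_1}}, \qquad
s_{b_1} = \frac{s_e}{\sqrt{\sum_{i}(x_i - \bar{x})^2}}, \qquad
s_e = \sqrt{\frac{SSE}{n-2}}
$$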

HAND CALCULATIONS
Test: Reject $H_0$ if $t > t_{.025,8} = 2.306$ or $t < -t_{.025,8} = -2.306$.
5.123 > 2.306, so we can conclude $\beta_1 \ne 0$, i.e. a linear relation exists.
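The hand calculation can be retraced in a few lines of Python. This is only a sketch: it assumes the ten (x, y) pairs listed in the presentation's hand-calculation table for SSR, SSE and SST, and it takes the critical value 2.306 from the slide rather than computing it. The variable names are illustrative.

```python
import math

# Data transcribed from the presentation's hand-calculation table.
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b1 and intercept b0.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard deviation of the slope estimate, with n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(s_xx)

t_stat = b1 / s_b1
t_crit = 2.306  # t_{.025,8}, copied from the slide

# Two-tailed rejection rule: reject H0 if t > t_crit or t < -t_crit.
reject_h0 = abs(t_stat) > t_crit
print(f"b1 = {b1:.5f}, s_b1 = {s_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic agrees with the 5.123 quoted on the slide, so the null hypothesis is rejected.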

95% Confidence Interval for $\beta_1$
(Point Estimate) ± $t_{.025,n-2}$ (Appropriate st'd dev.), i.e. $b_1 \pm t_{.025,n-2}\, s_{b_1}$
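As a rough illustration of the interval formula, the sketch below applies it to the example data from the presentation's hand-calculation table. It assumes the critical value $t_{.025,8} = 2.306$ quoted in the hand calculations; the variable names are illustrative.

```python
import math

# Data transcribed from the presentation's hand-calculation table.
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Point estimate: the least-squares slope b1.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Appropriate standard deviation: s_b1.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_b1 = math.sqrt(sse / (n - 2) / s_xx)

t_crit = 2.306  # t_{.025, n-2} for n = 10, copied from the slide
margin = t_crit * s_b1
ci_low, ci_high = b1 - margin, b1 + margin
print(f"95% CI for beta_1: ({ci_low:.2f}, {ci_high:.2f})")
```

The interval excludes 0, which is consistent with rejecting $H_0: \beta_1 = 0$ at the 5% level.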

Coefficient of Determination -- $r^2$
The proportion of the total change in y that can be explained by changes in the x values is called the coefficient of determination, denoted $r^2$.

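Hand Calculation of SSR, SSE, SST

| i | $x_i$ | $y_i$ | $\hat{y}_i$ | $(\hat{y}_i - \bar{y})^2$ | $(y_i - \hat{y}_i)^2$ | $(y_i - \bar{y})^2$ |
|---|---|---|---|---|---|---|
| 1 | 1200 | 101000 | 109567.57 | 186802403.21 | 73403214.02 | 26010000 |
| 2 | 800 | 92000 | 88540.54 | 54161643.54 | 11967859.75 | 15210000 |
| 3 | 1000 | 110000 | 99054.05 | 9948056.98 | 119813732.7 | 198810000 |
| 4 | 1300 | 120000 | 114824.32 | 358130051.13 | 26787618.75 | 580810000 |
| 5 | 700 | 90000 | 83283.78 | 159168911.61 | 45107560.26 | 34810000 |
| 6 | 800 | 82000 | 88540.54 | 54161643.54 | 42778670.56 | 193210000 |
| 7 | 1000 | 93000 | 99054.05 | 9948056.98 | 36651570.49 | 8410000 |
| 8 | 600 | 75000 | 78027.03 | 319443162.89 | 9162892.622 | 436810000 |
| 9 | 900 | 91000 | 93797.30 | 4421358.66 | 7824872.169 | 24010000 |
| 10 | 1100 | 105000 | 104310.81 | 70741738.50 | 474981.7385 | 82810000 |
| SUM | | | | SSR = 1226927027.03 | SSE = 373972972.97 | SST = 1600900000 |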
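The three column sums can be checked with a short script. This is a minimal sketch that refits the line to the ten (x, y) pairs from the table and accumulates each sum of squares; the variable names are illustrative, not taken from the presentation.

```python
# Data transcribed from the hand-calculation table.
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares fit, then the fitted value y_hat for each x.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained (error)
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation about the mean

print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
```

For a least-squares line with an intercept, SSR + SSE equals SST, which is a useful check on hand arithmetic.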

Hand Calculation of $r^2$
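The worked formula for this slide is missing from the transcript. Assuming the usual definition of $r^2$ as the explained share of total variation, and using the SSR, SSE and SST sums from the hand calculation, the computation would look roughly like this:

$$
r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
    = \frac{1226927027.03}{1600900000} \approx 0.766398
$$

This matches the value of $r^2$ reported in the Excel steps.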

Interpretation of $r^2$
- $r^2 = 1$ -- perfect (positive or negative) relation, i.e. points fit exactly along the regression line
- $r^2$ close to 0 -- very little relation
- The higher the value of $r^2$, the better the model fits the data

Pearson Correlation Coefficient, r
- $r = \pm\sqrt{r^2}$, which can also be calculated by $\mathrm{cov}(x,y)/(s_x s_y)$, is called the Pearson correlation coefficient. This is also used to measure the strength of the relation between y and x.
- r = -1 means perfect negative correlation (i.e. all points fit exactly on a line with negative slope).
- r = +1 means perfect positive correlation (i.e. all points fit exactly on a line with positive slope).
- r = 0 means no correlation.
- Other values give relative strength, but have no exact meaning like $r^2$, so we usually use $r^2$.
- When we take the square root of $r^2$ to get r, the sign in front of r is the sign of $b_1$ (positive or negative slope).
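To illustrate the two routes to r, the sketch below computes it as cov(x, y)/(s_x s_y) for the example data in the presentation's hand-calculation table and compares its square and sign against $r^2$ and the slope. It assumes sample (n - 1) versions of the covariance and standard deviations; the divisor cancels in the ratio, so the choice does not affect r.

```python
import math

# Data transcribed from the presentation's hand-calculation table.
x = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
y = [101000, 92000, 110000, 120000, 90000, 82000, 93000, 75000, 91000, 105000]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample covariance and sample standard deviations (divisor n - 1).
cov_xy = s_xy / (n - 1)
s_x = math.sqrt(s_xx / (n - 1))
s_y = math.sqrt(s_yy / (n - 1))

r = cov_xy / (s_x * s_y)
b1 = s_xy / s_xx  # least-squares slope, used here only for its sign

print(f"r = {r:.4f}, r^2 = {r * r:.6f}, slope sign matches: {(r > 0) == (b1 > 0)}")
```

Here r comes out positive because the slope is positive, and squaring it recovers the coefficient of determination.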

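EXCEL
[Figure: Excel regression output annotated with $r^2$, r ("+" if $\beta_1 > 0$; "-" if $\beta_1 < 0$), SSR, SSE, SST, $s_{b_1}$, the t statistic for the $\beta_1$ test, the p-value for the $\beta_1$ test, and the 95% confidence interval for $\beta_1$.]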

Steps Using Excel
1. Determine the regression equation. Equation: $\hat{y} = 46486.49 + 52.56757x$
2. Can you conclude a linear relation exists between y and x? The p-value for the test is .000904 < $\alpha$ = .05 -- YES
3. What proportion of the overall variation in y is explained by changes to x? This is $r^2 = .766398$ -- a high $r^2$

CONCLUSION: Overall a good model!
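Excel prints the p-value directly. As a rough cross-check without a statistics library, the two-tailed p-value can be approximated by numerically integrating the Student t density. The sketch below does this with Simpson's rule; it takes t = 5.123 and 8 degrees of freedom from the hand-calculation slide, and the function names `t_pdf` and `two_tailed_p` are hypothetical helpers, not part of any library.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=2000):
    """Approximate P(|T| > |t_stat|) using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = total * h / 3  # P(0 <= T <= |t_stat|)
    return 1 - 2 * central

t_stat = 5.123  # from the hand-calculation slide
df = 8          # n - 2 with n = 10
alpha = 0.05

p_value = two_tailed_p(t_stat, df)
print(f"p-value = {p_value:.6f}, reject H0 at alpha = {alpha}: {p_value < alpha}")
```

Under these assumptions the result is close to the .000904 shown in the Excel output, far below $\alpha$ = .05.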

Review
- Can we conclude a linear relation exists?
  - Two-tailed t-test of $\beta_1 \ne 0$
  - Look at the p-value for the x-variable on the Excel output
- Computation of a confidence interval for the amount y will change per unit increase in x (i.e. for $\beta_1$)
  - By hand
  - Printed on the Excel output
- What proportion of the overall variation in y is explained by changes in x? -- $r^2$
  - By hand
  - Printed on the Excel output
- Pearson correlation coefficient -- r
  - Square root of $r^2$
  - Sign is the same as $b_1$

