# July 1, 2008Lecture 17 - Regression Testing1 Testing Relationships between Variables Statistics 111 - Lecture 17.

## Presentation on theme: "July 1, 2008Lecture 17 - Regression Testing1 Testing Relationships between Variables Statistics 111 - Lecture 17."— Presentation transcript:

July 1, 2008Lecture 17 - Regression Testing1 Testing Relationships between Variables Statistics 111 - Lecture 17

July 1, 2008Lecture 17 - Regression Testing2 Administrative Notes Homework 5 due tomorrow Lecture on Wednesday will be review of entire course

July 1, 2008Lecture 17 - Regression Testing3 Final Exam Thursday from 10:40-12:10 It’ll be right here in this room Calculators are definitely needed! Single 8.5 x 11 cheat sheet (two-sided) allowed I’ve put a sample final on the course website

July 1, 2008Lecture 17 - Regression Testing4 Outline Review of Regression coefficients Hypothesis Tests Confidence Intervals Examples

July 1, 2008Lecture 17 - Regression Testing5 Two Continuous Variables Visually summarize the relationship between two continuous variables with a scatterplot Numerically, we focus on best fit line (regression) Education and MortalityDraft Order and Birthday Mortality = 1353.16 - 37.62 · EducationDraft Order = 224.9 - 0.226 · Birthday

June 30, 2008Stat 111 - Lecture 16 - Regression6 Best values for Regression Parameters The best fit line has these values for the regression coefficients: Also can estimate the average squared residual: Best estimate of slope  Best estimate of intercept 

July 1, 2008Lecture 17 - Regression Testing7 Significance of Regression Line Does the regression line show a significant linear relationship between the two variables? If there is not a linear relationship, then we would expect zero correlation (r = 0) So the slope b should also be zero Therefore, our test for a significant relationship will focus on testing whether our slope  is significantly different from zero H 0 :  = 0 versus H a :   0

June 30, 2008Stat 111 - Lecture 16 - Regression8 Linear Regression Best fit line is called Simple Linear Regression Model: Coefficients:  is the intercept and  is the slope Other common notation:  0 for intercept,  1 for slope Our Y variable is a linear function of the X variable but we allow for error (ε i ) in each prediction We approximate the error by using the residual Observed Y i Predicted Y i =  +  X i

June 30, 2008Stat 111 - Lecture 16 - Regression9 Test Statistic for Slope Our test statistic for the slope is similar in form to all the test statistics we have seen so far: The standard error of the slope SE(b) has a complicated formula that requires some matrix algebra to calculate We will not be doing this calculation manually because the JMP software does this calculation for us!

June 30, 2008Stat 111 - Lecture 16 - Regression10 Example: Education and Mortality

July 1, 2008Lecture 17 - Regression Testing11 Confidence Intervals for Coefficients JMP output also gives the information needed to make confidence intervals for slope and intercept 100·C % confidence interval for slope  : b +/- t n-2 * SE(b) The multiple t * comes from a t distribution with n-2 degrees of freedom 100·C % confidence interval for intercept  : a +/- t n-2 * SE(a) Usually, we are less interested in intercept  but it might be needed in some situations

July 1, 2008Lecture 17 - Regression Testing12 Confidence Intervals for Example We have n = 60, so our multiple t * comes from a t distribution with d.f. = 58. For a 95% C.I., t * = 2.00 95 % confidence interval for slope  : -37.6 ± 2.0*8.307 = (-54.2,-21.0) Note that this interval does not contain zero! 95 % confidence interval for intercept  : 1353± 2.0*91.42 = (1170,1536)

July 1, 2008Lecture 17 - Regression Testing13 Another Example: Draft Lottery Is the negative linear association we see between birthday and draft order statistically significant? p-value

July 1, 2008Lecture 17 - Regression Testing14 Another Example: Draft Lottery p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant linear relationship between birthday and draft order Statistical evidence that the randomization was not done properly! 95 % confidence interval for slope  : -.23±1.98*.05 = (-.33,-.13) Multiple t * = 1.98 from t distribution with n-2 = 363 d.f. Confidence interval does not contain zero, which we expected from our hypothesis test

July 1, 2008Lecture 17 - Regression Testing15 Dataset of 78 seventh-graders: relationship between IQ and GPA Clear positive association between IQ and grade point average Education Example

July 1, 2008Lecture 17 - Regression Testing16 Education Example Is the positive linear association we see between GPA and IQ statistically significant? p-value

July 1, 2008Lecture 17 - Regression Testing17 Education Example p-value < 0.0001 so we reject null hypothesis and conclude that there is a statistically significant positive relationship between IQ and GPA 95 % confidence interval for slope  :.101±1.99*.014 = (.073,.129) Multiple t * = 1.99 from t distribution with n-2 = 76 d.f. Confidence interval does not contain zero, which we expected from our hypothesis test

July 1, 2008Lecture 17 - Regression Testing18 Next Class - Lecture 18 Review of course material

Download ppt "July 1, 2008Lecture 17 - Regression Testing1 Testing Relationships between Variables Statistics 111 - Lecture 17."

Similar presentations