Ch 15 – Inference for Regression
Example #1: The following data are pulse rates and heights for a group of 10 female statistics students.
[Data table: Height, Pulse]
a. Sketch a scatterplot of the data. What is the least-squares regression line for predicting pulse rate from height?
$\hat{y} = a + bx$, where $\hat{y}$ = predicted pulse rate and x = height
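The least-squares slope and intercept come from two standard formulas: $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$. Here is a minimal Python sketch. The class data are not reproduced here, so the heights and pulse rates below are hypothetical, and `lsrl` is a helper name chosen for illustration.

```python
from statistics import mean


def lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length lists with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of products of deviations, and sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # the LSRL always passes through (x-bar, y-bar)
    return a, b


# Hypothetical heights (inches) and pulse rates: NOT the data from Example #1
heights = [60, 62, 64, 66, 68]
pulses = [62, 64, 65, 64, 65]
a, b = lsrl(heights, pulses)
print(f"predicted pulse = {a:.1f} + {b:.2f}(height)")  # prints: predicted pulse = 44.8 + 0.30(height)
```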
b. What is the correlation coefficient between height and pulse rate? Interpret this number. r = ____ : a strong, positive linear relationship between height and pulse rate
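The correlation uses the same sums of deviations: $r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}}$. Below is a minimal sketch on hypothetical data. The `correlation` helper is illustrative; Python 3.10+ also provides `statistics.correlation`, which returns the same Pearson r.

```python
from math import sqrt
from statistics import mean


def correlation(x, y):
    """Pearson correlation coefficient r for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (inches) and pulse rates, for illustration only
heights = [60, 62, 64, 66, 68]
pulses = [62, 64, 65, 64, 65]
r = correlation(heights, pulses)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")  # prints: r = 0.775, r^2 = 0.600
```

For these made-up numbers, r² = 0.600 would mean that 60% of the variation in pulse rate is explained by the linear relationship with height.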
c. What is the predicted pulse rate of a 59” tall student?
d. What is the residual for the 59” student? Residual = observed pulse − predicted pulse = $y - \hat{y}$ = −3.54
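A residual is always observed minus predicted, $y - \hat{y}$. A negative residual means the line over-predicts that point. The sketch below uses a hypothetical line ($\hat{y} = 44.8 + 0.3x$) and a hypothetical 59” student with a pulse of 60, because the real coefficients and data are not shown. `residuals` is an illustrative helper name.

```python
def residuals(x, y, a, b):
    """Residual = observed y - predicted y-hat, where y-hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical LSRL and one hypothetical 59-inch student with pulse 60
a, b = 44.8, 0.3
print(residuals([59], [60], a, b))  # negative: the line over-predicts this student
```

The residuals from a least-squares line always sum to zero, which makes a quick arithmetic check on a residual plot.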
e. Construct a residual plot and describe its meaning. There is no pattern in the residual plot, so a linear model is a good fit.
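Ok, so what is the new stuff for chapter 15? The LSRL $\hat{y} = a + bx$ comes from a sample, so it is not the true line for the population! The true (population) regression line is $\mu_y = \alpha + \beta x$, where α = true y-intercept and β = true slope of the population.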
Remember: Residuals tell you about the line and whether it is a good model. Chapter 15 focuses only on the slope. We are going to determine whether there is a linear relationship between two variables ($\beta \neq 0$) or not ($\beta = 0$).
Conditions for Inference:
The observations are independent. (Can’t do repeated observations on the same individual!)
The relationship is linear. (Look for patterns in the residual plot.)
The standard deviation of the response about the true line is the same everywhere. (Look at the spread in the residual plot.)
The response varies Normally about the true regression line. (Make a histogram of the residuals and look to see if it is approximately Normal.)
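Standard Error about the LSRL: $s = \sqrt{\frac{1}{n-2}\sum (y - \hat{y})^2} = \sqrt{\frac{\sum \text{residual}^2}{n-2}}$
s is the standard deviation of the residuals. It estimates σ, the standard deviation of the responses about the true line ($s^2$ is an unbiased estimator of $\sigma^2$).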
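The formula for s can be checked with a short Python sketch. It fits the LSRL, squares and adds the residuals, and divides by n − 2 because two quantities (a and b) were estimated from the data. `regression_s` is an illustrative name, and the data are made up.

```python
from math import sqrt
from statistics import mean


def regression_s(x, y):
    """Standard error about the LSRL: sqrt(sum of squared residuals / (n - 2))."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points (df = n - 2 must be positive)")
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Divide by n - 2, not n - 1: two parameters (a and b) were estimated
    return sqrt(sse / (n - 2))


# Small made-up data set for illustration
print(regression_s([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```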
Calculator Tip! Standard Error: Stat – Tests – LinRegTTest, Xlist: L1 (x), Ylist: L2 (y), leave RegEQ blank, Calculate. The output s = standard error about the LSRL.
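Confidence Intervals for Regression Slope: $b \pm t^* SE_b$, where $SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$ is the standard error of the slope and $t^*$ comes from the t distribution with df = n − 2.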
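The interval $b \pm t^* SE_b$ can be sketched in Python. The standard library has no t distribution, so the helpers `t_pdf`, `t_cdf`, and `t_star` (hypothetical names) approximate it numerically: Simpson's rule for the area and bisection for the critical value. This is a swapped-in technique; in practice you would use a t table, the calculator's invT, or `scipy.stats.t.ppf`. The data are made up.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    if t == 0:
        return 0.5
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_star(confidence, df):
    """Critical value t* leaving `confidence` in the middle, via bisection."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def slope_ci(x, y, confidence=0.95):
    """Confidence interval b +/- t* SE_b for the true slope beta (df = n - 2)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b = s / sqrt(sxx)
    margin = t_star(confidence, n - 2) * se_b
    return b - margin, b + margin


# Made-up data for illustration; with only 5 points the interval is wide
low, high = slope_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"95% CI for beta: ({low:.3f}, {high:.3f})")
```

With these five made-up points the interval contains 0, so they would not give convincing evidence of a linear relationship.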
$SE_b$ estimates the variability in the sampling distribution of the estimated slope (how much slopes vary from experiment to experiment).
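The phrase "how much slopes vary from experiment to experiment" can be made concrete with a simulation. The sketch repeats an experiment many times, generating responses Normally about a known true line each time, and records each fitted slope b. The spread of those slopes should be close to $\sigma / \sqrt{\sum (x - \bar{x})^2}$, which is the quantity $SE_b$ estimates. The true line ($\mu_y = 50 + 2x$), σ = 5, the x values, and the helper name `simulate_slopes` are all hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_slopes(alpha, beta, sigma, x, reps, seed=1):
    """Repeat an 'experiment' many times and record each fitted slope b."""
    rng = random.Random(seed)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        # New responses scattered Normally (sd sigma) about the true line
        y = [alpha + beta * xi + rng.gauss(0, sigma) for xi in x]
        y_bar = sum(y) / len(y)
        b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        slopes.append(b)
    return slopes


# Hypothetical true line mu_y = 50 + 2x with sigma = 5, at x = 1, ..., 10
x = list(range(1, 11))
slopes = simulate_slopes(alpha=50, beta=2, sigma=5, x=x, reps=5000)
theory_sd = 5 / sqrt(sum((xi - mean(x)) ** 2 for xi in x))
print(f"mean of b = {mean(slopes):.3f}, sd of b = {stdev(slopes):.3f}, "
      f"sigma/sqrt(Sxx) = {theory_sd:.3f}")
```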
Minitab Printout:
The regression equation is Predicted y = y-intercept + slope (x-variable)
Predictor     Coef               StDev     T                 P
Constant      y-intercept (a)    ignore    ignore            ignore
X-variable    slope (b)          SE_b      test statistic    P-value (2-sided)
s = standard deviation of residuals    R-sq = r²    R-sq(adj) = ignore
Example #1 Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and later their IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at age three years using the Stanford-Binet IQ test. The data is below.
[Data table: Crying, IQ]
a. Label all important parts of the Minitab printout.
The regression equation is IQ = ____ + ____ Crycount   (LSRL)
Predictor    Coef             StDev          T    P
Constant     ____ (y-int)
Crycount     ____ (slope)     ____ (SE_b)
s = 17.50 (standard deviation of the residuals)    R-sq = 20.7% (coefficient of determination)    R-sq(adj) = 21%
b. Sketch a scatterplot of the data.
c. Calculate the standard deviation of the residuals using your calculator.
d. Construct a 95% confidence interval for the slope.
P: β = true slope of the line for crying vs. IQ
The observations are independent. A: Each infant is a separate individual, and one infant's test does not influence the next.
The relationship is linear. A: No apparent pattern in the residual plot.
The standard deviation of the response about the true line is the same everywhere. A: The residuals are spread out evenly.
The response varies Normally about the true regression line. A: The histogram of the residuals is slightly skewed right.
N: Linear regression t interval
I: $b \pm t^* SE_b$ with df = n − 2: ( ____ , ____ )
C: I am 95% confident the true slope of the line for crying vs. IQ is between ____ and ____. Note: 0 is not in the interval! This means there is evidence of a linear relationship. OR: I am 95% confident the mean IQ increases by between ____ and ____ points for each additional peak in crying.
Ch 15B – Hypothesis Testing for Slope
Remember: $b = r \frac{s_y}{s_x}$, so if r = 0, then b = 0.
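The identity $b = r \frac{s_y}{s_x}$ can be checked numerically. The sketch computes the slope directly from sums of deviations and compares it with $r \, s_y / s_x$. The second data set is made up so that r = 0, and its slope also comes out as 0. `slope_and_r` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean, stdev


def slope_and_r(x, y):
    """Return (b, r): LSRL slope from the sums of deviations, and Pearson r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up data: the direct slope matches r * s_y / s_x
x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
b, r = slope_and_r(x, y)
print(b, r * stdev(y) / stdev(x))

# Made-up data with r = 0: the slope is 0 as well
print(slope_and_r([1, 2, 3], [1, 3, 1]))
```

The identity holds because $s_y / s_x = \sqrt{\sum (y-\bar{y})^2 / \sum (x-\bar{x})^2}$ (the n − 1 cancels), so $r \, s_y / s_x$ reduces to $\sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$.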
$H_0: \beta = 0$, or there is no true linear relationship between x and y. Test Statistic: $t = \frac{b}{SE_b}$, with df = n − 2.
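The test statistic is simply the fitted slope divided by its standard error. The sketch below computes $t = b / SE_b$ and df = n − 2 from raw data. It uses made-up data, and `slope_t_statistic` is an illustrative name.

```python
from math import sqrt
from statistics import mean


def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: beta = 0, where t = b / SE_b and df = n - 2."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s = sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b = s / sqrt(sxx)
    return b / se_b, n - 2


t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"t = {t:.3f}, df = {df}")  # prints: t = 2.121, df = 3
```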
Calculator Tip! Linear Regression t Test: Stat – Tests – LinRegTTest, Xlist: L1 (x), Ylist: L2 (y), leave RegEQ blank.
Example #1 How well does the number of beers a student drinks predict his or her blood alcohol content (BAC)? Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their BAC. The data is below.
[Data table: Student #, Beers, BAC]
a. What is the least-squares regression line?
$\hat{y} = a + bx$, where $\hat{y}$ = predicted BAC and x = # of beers
b. Make a scatterplot of the data and describe its shape. Positive, strong, linear relationship
c. What is the correlation coefficient? What does it mean? r = ____ : a strong, positive linear relationship between # of beers and BAC
d. Label all important parts of the Minitab printout.
The regression equation is BAC = −____ + ____ Beers   (LSRL)
Predictor    Coef             StDev          T                        P
Constant     −____ (y-int)    ____           ____                     ____
Beers        ____ (slope)     ____ (SE_b)    7.48 (test statistic)    ____ (P-value, 2-tailed)
s = ____ (standard deviation of the residuals)    R-sq = 80% (coefficient of determination)    R-sq(adj) = 78.6%
e. Verify the results by using your calculator: Stat – Tests – LinRegTTest, Xlist: L1 (x), Ylist: L2 (y), leave RegEQ blank.
f. Conduct the hypothesis test to see if there is a positive linear relationship between # of beers and BAC.
P: β = true slope of the line for # of beers vs. BAC
H: $H_0: \beta = 0$ (the number of beers has no linear effect on BAC)
$H_a: \beta > 0$ (the number of beers has a positive linear effect on BAC)
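The observations are independent. A: Each of the 16 students is a separate individual, measured once.
The relationship is linear. A: No apparent pattern in the residual plot.
The standard deviation of the response about the true line is the same everywhere. A: The residuals are spread out evenly.
The response varies Normally about the true regression line. A: (check the histogram of the residuals)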
N: Linear regression t test
T: $t = \frac{b}{SE_b} = 7.48$
O: P(t > 7.48) = ____, df = n − 2 = 16 − 2 = 14. From the t table, the P-value is less than ____. Or, on the calculator: P(t > 7.48) = ____.
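The one-sided P-value $P(t > 7.48)$ with df = 14 can be approximated without a table. The sketch integrates the t density numerically with Simpson's rule, a swapped-in technique standing in for the calculator's tcdf or `scipy.stats.t.sf`. The helper names and the significance level α = 0.05 are hypothetical; the notes do not state α.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule area on [0, t]."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = upper_tail(7.48, 14)  # one-sided, because Ha: beta > 0
alpha = 0.05                    # hypothetical significance level
print(f"P(t > 7.48) with df = 14: {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")  # prints: Reject H0
```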
M: P-value < α, so reject the null hypothesis.
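S: There is enough evidence to conclude that the number of beers has a positive linear effect on BAC. Because the number of beers was randomly assigned, drinking more beers does increase BAC.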