Presentation transcript: Regression Inferential Methods
[Scatterplot: weight against height] Suppose you took many samples of the same size from this population & calculated the LSRL for each. Using the slope b from each of these LSRLs, we can create a sampling distribution for the slope of the true LSRL. What shape will this distribution have? What is the mean of the sampling distribution? μ_b = β What is the standard deviation of the sampling distribution?
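The repeated-sampling idea on this slide can be tried out with a small simulation. This is only a sketch: the population values (alpha, beta, sigma), the height range, and the helper names lsrl_slope and simulate_slopes are made up for illustration and do not come from the slides.

```python
import random
import statistics

def lsrl_slope(xs, ys):
    """Least-squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar = statistics.fmean(xs)
    ybar = statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(alpha, beta, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from y = alpha + beta*x + Normal(0, sigma) noise."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(58, 72) for _ in range(n)]  # heights in inches (made up)
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(lsrl_slope(xs, ys))
    return slopes

# Hypothetical population values, chosen only for illustration.
slopes = simulate_slopes(alpha=-100.0, beta=3.5, sigma=15.0, n=20, reps=2000)
print(round(statistics.fmean(slopes), 2))  # should land close to beta
print(round(statistics.stdev(slopes), 2))  # spread of the sampling distribution
```

A histogram of slopes should look roughly bell-shaped and centered near the true slope, which is what the questions on this slide are pointing at.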
[Scatterplot: weight against height] How much would an adult female weigh if she were 5 feet tall? She could weigh varying amounts. In other words, there is a distribution of weights for adult females who are 5 feet tall. We hope this distribution is approximately normal. What would you expect for other heights? Where would you expect the TRUE LSRL to be? What about the standard deviations of all these normal distributions? We want the standard deviations of all these normal distributions to be the same.
Regression Model
The mean response μ_y has a straight-line relationship with x: μ_y = α + βx
–Where: the slope β and the intercept α are unknown parameters
For any fixed value of x, the response y varies according to a normal distribution.
Repeated responses y are independent of each other.
The standard deviation of y (σ_y) is the same for all values of x. (σ_y is also an unknown parameter.)
Suppose we look at part of a population of adult women. These women are all 64 inches tall. What distribution does their weight have?
Person #   Ht (in)   Wt (lb)
1          64        130
10         64        175
15         64        150
19         64        125
21         64        145
40         64        186
47         64        121
60         64        137
63         64        143
68         64        120
70         64        112
78         64        108
83         64        160
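As a quick way to look at the distribution the slide asks about, the 13 weights can be summarized directly. This is a minimal sketch using only the values listed on the slide, with the weights taken to be in pounds.

```python
import statistics

# Weights of the 13 women listed above, all 64 inches tall.
weights = [130, 175, 150, 125, 145, 186, 121, 137, 143, 120, 112, 108, 160]

mean_wt = statistics.fmean(weights)
sd_wt = statistics.stdev(weights)  # sample standard deviation (divides by n - 1)
print(len(weights), round(mean_wt, 1), round(sd_wt, 1))
```

The mean estimates the center of the weight distribution at this one height, and the standard deviation estimates its spread. With only 13 values, a dotplot or boxplot is a better check of shape than any single number.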
The slope b of the LSRL is an unbiased estimator of the true slope β. The intercept a of the LSRL is an unbiased estimator of the true intercept α. The standard error s is our estimator of the true standard deviation of y (σ_y). Note: df = n - 2. We use s = sqrt( Σ(y - ŷ)² / (n - 2) ) to estimate σ_y.
Let’s review the regression model! For a given x-value, the responses (y) are normally distributed. x & y have a linear relationship, with the true LSRL going through the means μ_y. σ_y is the same for each x-value.
[Scatterplot: weight against height] Suppose the LSRL is a horizontal line. Would height be useful in predicting weight? What is the slope of a horizontal line? A slope of zero means that there is NO linear relationship between x & y!
Assumptions for inference on slope (L I N E)
The true relationship is Linear
–Check the scatter plot & residual plot
The observations are Independent and random
–Check that you have an SRS
For any fixed value of x, the response y varies Normally about the true regression line
–Check a histogram or boxplot of residuals
Equal variance about the regression line: the standard deviation of the response is constant
–Check the scatter plot & residual plot
Hypotheses
H0: β = 0 (This implies that there is no linear relationship between x & y, or that x should not be used to predict y.)
Ha: β > 0, Ha: β < 0, or Ha: β ≠ 0
What would the correlation equal if there were a perfect linear relationship between x & y? r = ±1 (the slope itself could be any nonzero value)
Be sure to define β!
Formulas:
Confidence Interval: b ± t* · s_b
Hypothesis test: t = (b - β) / s_b, which is t = b / s_b under H0: β = 0
df = n - 2, because there are two unknowns, α & β
Example: It is difficult to accurately determine a person’s body fat percentage without immersing him or her in water. Researchers hoping to find ways to make a good estimate immersed 20 male subjects, and then measured their weights. a) Find the LSRL, correlation coefficient, and coefficient of determination. Predicted body fat = -27.376 + 0.250 (weight), r = 0.697, r² = 0.485
b) Explain the meaning of slope in the context of the problem. For each increase of 1 pound in weight, predicted body fat increases by approximately 0.25 percentage points. c) Explain the meaning of the coefficient of determination in context. Approximately 48.5% of the variation in body fat can be explained by the regression of body fat on weight.
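The slope interpretation in part b can be checked numerically with the LSRL from part a. This is a small sketch: the function name predicted_body_fat and the weights 180 and 181 are hypothetical, chosen only to show a one-pound difference.

```python
def predicted_body_fat(weight):
    """LSRL from part a: predicted body fat (%) for a given weight."""
    return -27.376 + 0.250 * weight

# Two hypothetical subjects whose weights differ by exactly one pound.
diff = predicted_body_fat(181) - predicted_body_fat(180)
print(round(diff, 3))  # 0.25
```

Whatever pair of weights is chosen, predictions one pound apart differ by the slope, which is why the slope is read as the change in predicted body fat per pound.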
d) Estimate α, β, and σ_y. a = -27.376, b = 0.25, s = 7.049 e) Create a scatter plot and residual plot for the data. [Scatter plot: body fat against weight] [Residual plot: residuals against weight]
f) Is there sufficient evidence that weight can be used to predict body fat?
Assumptions:
The scatterplot and residual plot show a Linear association.
We have an Independent SRS of male subjects.
Since the boxplot of residuals is approximately symmetrical, the responses are approximately Normally distributed.
Since the points are evenly scattered about the LSRL on the scatterplot, σ_y is approximately Equal for all values of weight.
H0: β = 0
Ha: β ≠ 0
Where β is the true slope of the LSRL of body fat on weight.
Since the p-value < α, I reject H0. There is sufficient evidence to suggest that weight can be used to predict body fat.
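The test statistic and p-value for part f can be reproduced from the slope estimate and its standard error in the computer output of part h. This is a sketch only: it uses the standard library, so the p-value comes from numerically integrating the t density, the helper names t_pdf and two_sided_p are made up, and the 0.05 significance level is an assumption because the slides do not state α.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule for the area between 0 and |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central

b, s_b, n = 0.24987414, 0.060653996, 20  # from the output in part h
df = n - 2
t_stat = (b - 0) / s_b  # hypothesized slope is 0 under H0
p_value = two_sided_p(t_stat, df)
print(round(t_stat, 2), df, p_value < 0.05)  # 0.05 is an assumed alpha
```

A statistics package or calculator would normally report this p-value directly. The integration here just keeps the example free of extra libraries.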
g) Give a 95% confidence interval for the true slope of the LSRL.
Assumptions:
The scatter plot and residual plot show a LINEAR association.
We have an INDEPENDENT SRS of male subjects.
Since the boxplot of residuals is approximately symmetrical, the responses are approximately NORMALLY distributed.
Since the points are evenly scattered about the LSRL on the scatterplot, σ_y is approximately EQUAL for all values of weight.
We are 95% confident that the true slope of the LSRL of body fat on weight is between 0.12 and 0.38.
Be sure to show all graphs!
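The interval in part g follows from b ± t* · s_b, using the slope estimate and standard error from the computer output in part h. In this sketch the critical value t* = 2.101 is taken from a t table for 95% confidence with df = 18. It is not given in the slides.

```python
b, s_b, n = 0.24987414, 0.060653996, 20  # from the output in part h
t_star = 2.101  # t table value for 95% confidence, df = n - 2 = 18

margin = t_star * s_b
lower, upper = b - margin, b + margin
print(round(lower, 2), round(upper, 2))  # 0.12 0.38
```

Because the interval does not contain 0, it agrees with the two-sided test in part f.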
h) Here is the computer-generated result from the data:
Sample size: 20
R-square = 48.5%
s = 7.0491323
Parameter   Estimate     Std. Err.
Intercept   -27.376263   11.547428
Weight      0.24987414   0.060653996
Questions about this output: What is the df? What is the correlation coefficient? (Be sure to write R-square as a decimal first!) What does “s” represent (in context)? What do the estimates represent? What does the standard error of the slope represent?
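Two of the questions about this output, the df and the correlation coefficient, come down to short calculations. A minimal sketch using only the values printed in the output:

```python
import math

n = 20
r_squared = 0.485  # R-square written as a decimal first
slope = 0.24987414

df = n - 2
# r takes the sign of the slope; R-square alone cannot give the direction.
r = math.copysign(math.sqrt(r_squared), slope)
print(df, round(r, 3))
```

The result differs slightly in the third decimal place from the r = 0.697 reported in part a, because R-square is rounded to 48.5% in the output.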