CHAPTER 26: Inference for Regression


CHAPTER 26: Inference for Regression
Basic Practice of Statistics, 7th Edition
Lecture PowerPoint Slides

In Chapter 26, We Cover …
- Conditions for regression inference
- Estimating the parameters
- Using technology
- Testing the hypothesis of no linear relationship
- Testing lack of correlation
- Confidence intervals for the regression slope
- Inference about prediction
- Checking the conditions for inference

Introduction
When a scatterplot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we can use the least-squares line fitted to the data to predict y for a given value of x. If the data are a random sample from a larger population, we need statistical inference to answer questions like these:
- Is there really a linear relationship between x and y in the population, or could the pattern we see in the scatterplot plausibly happen just by chance?
- What is the slope (rate of change) that relates y to x in the population, including a margin of error for our estimate of the slope?
- If we use the least-squares regression line to predict y for a given value of x, how accurate is our prediction (again, with a margin of error)?

Example
STATE: Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants 4 to 10 days old and their later IQ test scores.
PLAN: Make a scatterplot. If the relationship appears linear, use correlation and regression to describe it. Finally, ask whether there is a statistically significant linear relationship between crying and IQ.
SOLVE (first steps): Examine the scatterplot, looking for the form, direction, and strength of the relationship, as well as for outliers or other deviations. There is a moderately strong positive linear relationship, with no extreme outliers or potentially influential observations.

Example
SOLVE: Because the scatterplot shows a roughly linear (straight-line) pattern, the correlation describes the direction and strength of the relationship. The correlation between crying and IQ is r = 0.455. We are interested in predicting the response from information about the explanatory variable, so we find the least-squares regression line for predicting IQ from crying. The equation of the regression line is
ŷ = a + bx = 91.27 + 1.493x
CONCLUDE (first steps): Children who cry more vigorously do tend to have higher IQs. Because r² = 0.207, only about 21% of the variation in IQ scores is explained by crying intensity. Prediction of IQ will not be very accurate. Is this observed relationship statistically significant?
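The least-squares slope, intercept, and r² can be computed directly from the standard formulas. The sketch below uses a small made-up data set (the slides do not list the crying study's raw data) to show the arithmetic:

```python
from statistics import mean

# Hypothetical data, NOT the crying study's: any paired x, y lists work.
x = [10, 12, 15, 17, 20, 23]     # explanatory variable (e.g., cry count)
y = [87, 93, 96, 100, 110, 121]  # response (e.g., IQ)

xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

b = sxy / sxx        # least-squares slope
a = ybar - b * xbar  # least-squares intercept
yhat = [a + b * xi for xi in x]  # fitted values

# r^2: fraction of the variation in y explained by the line
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
```

With an intercept in the model, the residuals always sum to zero, which makes a handy sanity check on any hand computation.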

Conditions for Regression Inference
To do inference, think of the slope b and intercept a of the least-squares line as estimates of unknown corresponding parameters β and α that describe the population of interest.
CONDITIONS FOR REGRESSION INFERENCE
We have n observations on an explanatory variable x and a response variable y. Our goal is to study or predict the behavior of y for given values of x.
- For any fixed value of x, the response y varies according to a Normal distribution. Repeated responses y are independent of each other.
- The mean response μy has a straight-line relationship with x given by a population regression line μy = α + βx. The slope β and intercept α are unknown parameters.
- The standard deviation of y (call it σ) is the same for all values of x. The value of σ is unknown.
There are thus three population parameters that we must estimate from the data: α, β, and σ.

Conditions for Regression Inference
The figure below shows the regression model when the conditions are met. The line in the figure is the population regression line μy = α + βx. For each possible value of the explanatory variable x, the mean of the responses μy moves along this line. The value of σ determines whether the points fall close to the population regression line (small σ) or are widely scattered (large σ). The Normal curves show how y will vary when x is held fixed at different values. All the curves have the same standard deviation σ, so the variability of y is the same for all values of x.

Estimating the Parameters
The first step in inference is to estimate the unknown parameters α, β, and σ.
ESTIMATING THE POPULATION REGRESSION LINE
When the conditions for regression are met and we calculate the least-squares line ŷ = a + bx, the slope b of the least-squares line is an unbiased estimator of the population slope β, and the intercept a of the least-squares line is an unbiased estimator of the population intercept α.
The remaining parameter is the standard deviation σ, which describes the variability of the response y about the population regression line.

Estimating the Parameters
The least-squares line estimates the population regression line, so the residuals estimate how much y varies about the population line. Recall that the residuals are the vertical deviations of the data points from the least-squares line:
residual = observed y − predicted y = y − ŷ
REGRESSION STANDARD ERROR
The regression standard error is
s = √[ Σ residual² / (n − 2) ] = √[ Σ (y − ŷ)² / (n − 2) ]
Use s to estimate the standard deviation σ of responses about the mean given by the population regression line.
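A minimal sketch of this computation, again on hypothetical data: fit the line, take the residuals, and divide the sum of squared residuals by n − 2 (not n) before taking the square root.

```python
from math import sqrt
from statistics import mean

# Hypothetical data (not the study's); any equal-length x, y lists work.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

n = len(x)
xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # y - yhat
s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # divide by n - 2, not n
```

The n − 2 in the denominator reflects the two parameters (a and b) already estimated from the data, just as n − 1 does for an ordinary sample standard deviation.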

Using Technology
The least-squares regression line for these data is
predicted IQ = 91.268 + 1.4929 × Crycount
The intercept, or constant coefficient, is the predicted IQ if we observed a cry count of 0. The slope, or regression coefficient, is the predicted change in IQ for each additional crying peak observed: 1.4929, an increase of about 1.5 IQ points per peak.

Testing the Hypothesis of No Linear Relationship
SIGNIFICANCE TEST FOR REGRESSION SLOPE
To test the hypothesis H0: β = 0, compute the test statistic
t = b / SE_b
In this formula, the standard error of the least-squares slope b is
SE_b = s / √[ Σ (x − x̄)² ]
The sum runs over all observations on the explanatory variable x. In terms of a random variable T having the t(n − 2) distribution, the P-value for a test of H0 against
- Ha: β > 0 is P(T ≥ t)
- Ha: β < 0 is P(T ≤ t)
- Ha: β ≠ 0 is 2 × P(T ≥ |t|)
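As a sketch, SE_b is a one-liner once s and the x-values are known, and t follows immediately. The second half of the snippet uses the slope and standard error that the following example reads off the Minitab output (b = 1.4929, SE_b = 0.4870):

```python
from math import sqrt
from statistics import mean

def slope_se(s, x):
    """Standard error of the least-squares slope: s / sqrt(sum (x - xbar)^2)."""
    xbar = mean(x)
    return s / sqrt(sum((xi - xbar) ** 2 for xi in x))

# t statistic for H0: beta = 0, using the values from the example's output.
b, se_b = 1.4929, 0.4870
t = b / se_b
print(round(t, 2))  # 3.07, matching the printout
```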

Example
Crying and IQ: Is the relationship significant?
SOLVE: The hypothesis H0: β = 0 says that crying has no straight-line relationship with IQ. We conjecture that there is a positive relationship, so we use the one-sided alternative, Ha: β > 0. The earlier scatterplot showed a positive relationship; the Minitab printout gives b = 1.4929 and SE_b = 0.4870. Thus,
t = b / SE_b = 1.4929 / 0.4870 = 3.07
CONCLUDE: Software gives t = 3.07 with two-sided P-value 0.004. The P-value for the one-sided test is half of this, P = 0.002. There is very strong evidence that IQ increases as the intensity of crying increases.

Testing Lack of Correlation
The least-squares regression slope b is closely related to the correlation r between the explanatory and response variables. In the same way, the slope β of the population regression line is closely related to the correlation between x and y in the population. Testing the null hypothesis H0: β = 0 is therefore exactly the same as testing that there is no correlation between x and y in the population from which we drew our data.
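A standard identity (not shown on the slide) makes this equivalence concrete: the slope t statistic can be rewritten in terms of the sample correlation as t = r√(n − 2) / √(1 − r²). Plugging in r = 0.455 and n = 38 from the crying-and-IQ example recovers the same t as the slope test:

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for H0: no correlation; algebraically equal to b / SE_b."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(round(t_from_r(0.455, 38), 2))  # 3.07, same as the slope test
```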

Confidence Intervals for the Regression Slope
The slope β is the rate of change of the mean response as the explanatory variable increases. We often want to estimate β. The confidence interval for β has the familiar form
estimate ± t* SE_estimate
CONFIDENCE INTERVAL FOR REGRESSION SLOPE
A level C confidence interval for the slope β of the population regression line is
b ± t* SE_b
Here, t* is the critical value for the t(n − 2) density curve with area C between −t* and t*.

Example
Crying and IQ: estimating the slope
From the computer output, we have slope b = 1.4929 and SE_b = 0.4870. There are 38 data points, so the degrees of freedom are n − 2 = 36. Using software, for 95% confidence, enter the cumulative proportion 0.975 to obtain t* = 2.02809. The 95% confidence interval for the slope of the population regression line is
b ± t* SE_b = 1.4929 ± (2.02809)(0.4870) = 1.4929 ± 0.9877 = 0.505 to 2.481
We are 95% confident that the mean IQ increases by between 0.5 and 2.5 points for each additional peak in crying.
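The interval arithmetic is easy to verify directly from the numbers on the slide:

```python
b, se_b = 1.4929, 0.4870
t_star = 2.02809  # t critical value for 95% confidence with df = 36

margin = t_star * se_b
lo, hi = b - margin, b + margin
print(round(lo, 3), round(hi, 3))  # 0.505 2.481
```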

Inference About Prediction
One of the most common reasons to fit a line to data is to predict the response to a particular value of the explanatory variable. We want not simply a prediction, but a prediction with a margin of error that describes how accurate the prediction is likely to be.
Write the given value of the explanatory variable x as x*. The distinction between predicting a single outcome and predicting the mean of all outcomes when x = x* determines what margin of error is correct. To emphasize the distinction, we use different terms for the two intervals.
- To estimate the mean response, we use a confidence interval. It is an ordinary confidence interval for the mean response when x has the value x*, which is μy = α + βx*. This is a parameter, a fixed number whose value we don't know.
- To estimate an individual response y, we use a prediction interval. A prediction interval estimates a single random response y rather than a parameter like μy. The response y is not a fixed number: if we took more observations with x = x*, we would get different responses.

Inference About Prediction
CONFIDENCE AND PREDICTION INTERVALS FOR REGRESSION RESPONSE
A level C confidence interval for the mean response μy when x takes the value x* is
ŷ ± t* SE_μ̂
The standard error SE_μ̂ is
SE_μ̂ = s √[ 1/n + (x* − x̄)² / Σ(x − x̄)² ]

Inference About Prediction
CONFIDENCE AND PREDICTION INTERVALS FOR REGRESSION RESPONSE
A level C prediction interval for a single observation y when x takes the value x* is
ŷ ± t* SE_ŷ
The standard error for prediction SE_ŷ is
SE_ŷ = s √[ 1 + 1/n + (x* − x̄)² / Σ(x − x̄)² ]
In both intervals, t* is the critical value for the t(n − 2) density curve with area C between −t* and t*.
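The two standard errors differ only by the extra 1 under the square root, which is why prediction intervals are always wider than confidence intervals for the mean response. A sketch:

```python
from math import sqrt
from statistics import mean

def se_mean_response(s, x, x_star):
    """SE for estimating the mean response at x*: s * sqrt(1/n + (x* - xbar)^2 / Sxx)."""
    n, xbar = len(x), mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return s * sqrt(1 / n + (x_star - xbar) ** 2 / sxx)

def se_prediction(s, x, x_star):
    """SE for predicting a single new response at x*: same, plus 1 under the root."""
    n, xbar = len(x), mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return s * sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)

# Both SEs grow as x* moves away from xbar; the prediction SE is always larger.
x = [1, 2, 3, 4, 5]
assert se_prediction(1.0, x, 3) > se_mean_response(1.0, x, 3)
```

Both intervals are narrowest at x* = x̄ and widen as x* moves toward the edges of the observed data, so extrapolation is doubly risky.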

Checking the Conditions for Inference
You can fit a least-squares line to any set of explanatory-response data when both variables are quantitative. If the scatterplot doesn't show a roughly linear pattern, the fitted line may be almost useless. Before you can trust the results of inference, you must check the conditions for inference one by one:
- The relationship is linear in the population.
- The response varies Normally about the population regression line.
- Observations are independent.
- The standard deviation of the responses is the same for all values of x.
You can check all of the conditions for regression inference by looking at graphs of the residuals, such as a residual plot.
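As a complement to the residual plot, the residuals themselves are easy to extract. The sketch below (hypothetical data) also verifies two algebraic facts that hold for any least-squares fit with an intercept: the residuals sum to zero and are uncorrelated with x, so any pattern visible in a residual plot reflects curvature or unequal spread, never a leftover linear trend.

```python
from statistics import mean

# Hypothetical data; residuals from a least-squares fit.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 2.1, 2.8, 4.3, 4.9, 6.2, 6.8, 8.1]

xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Algebraic checks on the computation (both hold exactly for least squares):
assert abs(mean(resid)) < 1e-9                                   # mean zero
assert abs(sum((xi - xbar) * ri for xi, ri in zip(x, resid))) < 1e-9  # no trend
```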