BPS - 5th Ed. Chapter 23: Inference for Regression


Presentation transcript:

1 BPS - 5th Ed. Chapter 23: Inference for Regression

2 Linear Regression (from Chapter 5)
- Objective: to quantify the linear relationship between an explanatory variable (x) and a response variable (y).
- We can then predict the average response for all subjects with a given value of the explanatory variable.

3 Case Study: Crying and IQ
Researchers explored the crying of infants four to ten days old and their IQ test scores at age three to determine whether more crying was a sign of higher IQ.
Karelitz, S. et al., “Relation of crying activity in early infancy to speech and intellectual development at age three years,” Child Development, 35 (1964), pp. 769-777.

4 Case Study: Crying and IQ (data collection)
- Data were collected on 38 infants.
- A snap of a rubber band on the foot caused each infant to cry; the researchers recorded the number of peaks in the most active 20 seconds of crying (explanatory variable x).
- IQ was measured at age three years using the Stanford-Binet IQ test (response variable y).

5 Case Study: Crying and IQ (data)
[Table 23.1: crying peak counts and IQ scores for the 38 infants]

6 Case Study: Crying and IQ (data analysis)
The scatterplot of y vs. x shows a moderate positive linear relationship, with no extreme outliers or potentially influential observations.

7 Case Study: Crying and IQ (data analysis)
- The correlation between crying and IQ is r = 0.455 (from the formula in Chapter 4).
- The least-squares regression line for predicting IQ from crying (as in Chapter 5) is ŷ = 91.27 + 1.493x.
- r² = 0.207, so 21% of the variation in IQ scores is explained by crying intensity.
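As a quick check of the numbers on this slide, the sketch below squares the reported correlation and uses the fitted line to predict mean IQ. The intercept 91.27 and slope 1.493 are the case-study estimates reported on slide 15; the function name `predict_iq` and the example value of 10 crying peaks are illustrative assumptions, not part of the deck.

```python
# Reported summary values from the crying and IQ case study.
r = 0.455            # sample correlation
a, b = 91.27, 1.493  # least-squares intercept and slope

def predict_iq(crying_peaks):
    """Predicted mean IQ at age three for a given crying-peak count."""
    return a + b * crying_peaks

r_squared = r ** 2  # fraction of IQ variation explained by the line
print(f"r^2 = {r_squared:.3f}")
print(f"predicted IQ at 10 peaks = {predict_iq(10):.2f}")
```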

8 Inference
We now want to extend our analysis to include inference on the components of the regression analysis:
- slope
- intercept
- correlation
- predictions

9 Regression Model, Assumptions
Conditions required for inference about regression (we have n observations on an explanatory variable x and a response variable y):
1. For any fixed value of x, the response y varies according to a Normal distribution. Repeated responses y are independent of each other.
2. The mean response µ_y has a straight-line relationship with x: µ_y = α + βx. The slope β and intercept α are unknown parameters.
3. The standard deviation of y (call it σ) is the same for all values of x. The value of σ is unknown.

10 Regression Model, Assumptions
- The regression model has three parameters: α, β, and σ.
- The true regression line µ_y = α + βx says that the mean response µ_y moves along a straight line as x changes. (We cannot observe the true regression line; instead we observe y for various values of x.)
- Observed values of y vary about their means µ_y according to a Normal distribution. (If we take many y observations at a fixed value of x, the Normal pattern will appear for these y values.)

11 Regression Model, Assumptions
- The standard deviation σ is the same for all values of x, meaning the Normal distributions for y have the same spread at each value of x.
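The three model conditions can be illustrated with a small simulation. This is a minimal sketch, assuming hypothetical "true" parameter values (`ALPHA`, `BETA`, `SIGMA` are made up for illustration; in practice they are unknown). At each fixed x it draws many independent Normal responses and shows that the sample mean tracks α + βx while the spread stays near σ.

```python
import random
import statistics

# Hypothetical "true" parameters, chosen only for illustration.
ALPHA, BETA, SIGMA = 90.0, 1.5, 17.5

def simulate_responses(x, n, rng):
    """Draw n independent responses y at a fixed x: Normal(ALPHA + BETA*x, SIGMA)."""
    mean_response = ALPHA + BETA * x
    return [rng.gauss(mean_response, SIGMA) for _ in range(n)]

rng = random.Random(23)  # fixed seed so the run is repeatable
for x in (5, 15, 25):
    ys = simulate_responses(x, 20_000, rng)
    # The mean moves along the line; the standard deviation does not change with x.
    print(x, round(statistics.mean(ys), 1), round(statistics.stdev(ys), 1))
```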

12 Estimating Parameters: Slope and Intercept
When using the least-squares regression line, the slope b is an unbiased estimator of the true slope β, and the intercept a is an unbiased estimator of the true intercept α.

13 Estimating Parameters: Standard Deviation
- The standard deviation σ describes the variability of the response y about the true regression line.
- A residual is the difference between an observed value of y and the value predicted by the least-squares regression line: residual = y − ŷ.
- The standard deviation σ is estimated with a sample standard deviation of the residuals (this is a standard error, since it is estimated from data).

14 Estimating Parameters: Standard Deviation
The regression standard error is the square root of the sum of squared residuals divided by their degrees of freedom (n − 2): s = √( Σ(y − ŷ)² / (n − 2) ).
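The calculation described above can be sketched as follows. The function name `fit_line` and the five data points are made up for illustration (they are not the case-study data, which are not reproduced in this transcript). The code fits the least-squares line, forms the residuals, and divides their sum of squares by n − 2 before taking the square root.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, s): least-squares intercept, slope, regression standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Divide by n - 2 because two parameters (a and b) were estimated.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return a, b, s

# Made-up toy data, not the case-study data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, s = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.3f}")
```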

15 Case Study: Crying and IQ
- Since ŷ = 91.27 + 1.493x, b = 1.493 is an unbiased estimate of the true slope β, and a = 91.27 is an unbiased estimate of the true intercept α.
  - Because the slope b = 1.493, we estimate that on average IQ is about 1.5 points higher for each added crying peak.
- The regression standard error is s = 17.50 (see pages 600-601 in the text for this calculation).

16 Case Study: Crying and IQ (using technology)
[Figure: software regression output]

17 Hypothesis Tests for Slope (test for no linear relationship)
- The most common hypothesis to test regarding the slope is that it is zero, H0: β = 0. This hypothesis says that
  - the regression line is horizontal (the mean of y does not change with x)
  - there is no true linear relationship between x and y
  - the straight-line dependence on x is of no value for predicting y
- Standardize b to get a t test statistic: t = b / SE_b.

18 Hypothesis Tests for Slope
- The standard error of b is a multiple of the regression standard error: SE_b = s / √( Σ(x − x̄)² ).
- The test statistic for H0: β = 0 is t = b / SE_b, which follows a t distribution with df = n − 2.

19 Hypothesis Tests for Slope
P-value [for T following the t(n − 2) distribution]:
- Ha: β > 0: P-value = P(T ≥ t)
- Ha: β < 0: P-value = P(T ≤ t)
- Ha: β ≠ 0: P-value = 2 P(T ≥ |t|)

20 Case Study: Crying and IQ (hypothesis test for slope β)
Test statistic: t = b / SE_b = 1.4929 / 0.4870 = 3.07. Conclusion from the P-value: significant linear relationship.
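The test statistic and its P-value can be reproduced from the reported b and SE_b. This is a minimal sketch using only the standard library: the helper names `t_pdf` and `t_upper_tail` are assumptions of this example, and the tail area is obtained by numerically integrating the t density with Simpson's rule rather than by a table lookup or a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus the Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

b, se_b, n = 1.4929, 0.4870, 38   # values reported in the case study
t = b / se_b
p_one_sided = t_upper_tail(t, n - 2)   # Ha: beta > 0
p_two_sided = 2 * p_one_sided          # Ha: beta != 0
print(f"t = {t:.2f}, one-sided P = {p_one_sided:.4f}, two-sided P = {p_two_sided:.4f}")
```

Using the exact df = 36 here, rather than a table row, should give a two-sided P-value inside the range 0.002 to 0.005 quoted for this case study.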

21 Test for Correlation
- The correlation between x and y is closely related to the slope (for both the population and the observed data); in particular, the correlation is 0 exactly when the slope is 0.
- Therefore, testing H0: β = 0 is equivalent to testing that there is no correlation between x and y in the population from which the data were drawn.

22 Test for Correlation
- There is also a test for correlation that does not require a regression analysis.
  - Table E on page 695 of the text gives critical values and upper-tail probabilities for the sample correlation r under the null hypothesis that the population correlation is 0.
  - Look up n and r in the table (if r is negative, look up its positive value), and read off the associated probability from the top margin of the table to obtain the P-value, just as is done for the t table (Table C).

23 Case Study: Crying and IQ (test for H0: correlation = 0)
- The correlation between crying and IQ is r = 0.455.
- The sample size is n = 38.
- From Table E: for Ha: correlation > 0, the P-value is between 0.001 and 0.0025 (using n = 40).
  - The P-value for the two-sided test is between 0.002 and 0.005 (matches the two-sided P-value for the test on the slope).
  - The one-sided P-value would be between 0.005 and 0.01 if we were very conservative and used n = 30.
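The agreement between the correlation test and the slope test can be checked numerically. The sketch below relies on the standard identity t = r√(n − 2) / √(1 − r²), which is not stated on the slides and is brought in here as an assumption of the example; the function name `t_from_correlation` is likewise illustrative.

```python
import math

def t_from_correlation(r, n):
    """t statistic for H0: population correlation = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t_corr = t_from_correlation(0.455, 38)   # from r and n
t_slope = 1.4929 / 0.4870                # from b / SE_b
print(f"t from r = {t_corr:.2f}, t from slope = {t_slope:.2f}")
```

Both routes give the same t statistic up to rounding of the reported inputs, which is why the two tests give matching P-values.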

24 Confidence Interval for Slope
- A level C confidence interval for the true slope β is b ± t* SE_b.
  - t* is the critical value for the t distribution with df = n − 2 that has area (1 − C)/2 to the right of it.
  - Recall that the standard error of b is a multiple of the regression standard error: SE_b = s / √( Σ(x − x̄)² ).

25 Case Study: Crying and IQ (confidence interval for slope β)
[Figure: software regression output with b and SE_b marked]

26 Case Study: Crying and IQ (confidence interval for slope β)
- b = 1.4929, SE_b = 0.4870, df = n − 2 = 38 − 2 = 36 (df = 36 is not in Table C, so use the next smaller df = 30).
- For a 95% C.I., (1 − C)/2 = 0.025, and t* = 2.042.
- So a 95% C.I. for the true slope β is b ± t* SE_b = 1.4929 ± 2.042(0.4870) = 1.4929 ± 0.9944, or 0.4985 to 2.4873.
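The interval arithmetic on this slide can be sketched in a few lines. The function name `slope_confidence_interval` is an assumption of this example; the inputs are the values quoted on the slide, with t* = 2.042 taken from the df = 30 row as the slide does.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b ± t* SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values from the case study; t* = 2.042 is the Table C entry for df = 30.
lower, upper = slope_confidence_interval(1.4929, 0.4870, 2.042)
print(f"95% CI for the slope: {lower:.4f} to {upper:.4f}")
```

The last printed digit can differ from the slide's 0.4985 to 2.4873, because the slide rounds the margin of error to 0.9944 before adding and subtracting. Either way the interval lies entirely above zero, consistent with the significant test result.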

27 Inference about Prediction
- Once a regression line is fit to the data, it is useful to obtain a prediction of the response for a particular value of the explanatory variable (x*); this is done by substituting x* for x in the equation of the line (ŷ = a + bx) to calculate the predicted value ŷ.
- We now present confidence intervals that describe how accurate this prediction is.

28 Inference about Prediction
- There are two types of predictions:
  - predicting the mean response of all subjects with a certain value x* of the explanatory variable
  - predicting the individual response for one subject with a certain value x* of the explanatory variable
- The predicted values (ŷ) are the same in each case, but the margin of error is different.

29 Inference about Prediction
- To estimate the mean response µ_y, use an ordinary confidence interval for the parameter µ_y = α + βx*.
  - µ_y is the mean of responses y when x = x*.
  - 95% confidence interval: in repeated samples of n observations, 95% of the confidence intervals calculated (at x*) from these samples will contain the true value of µ_y at x*.

30 Inference about Prediction
- To estimate an individual response y, use a prediction interval.
  - It estimates a single random response y rather than a parameter like µ_y.
  - 95% prediction interval: take an observation on y for each of the n values of x in the original data, then take one more observation y at x = x*; the prediction interval from the n observations will cover the additional y in 95% of all repetitions.

31 Inference about Prediction
- Both the confidence interval and the prediction interval have the same form: ŷ ± t* SE.
  - Both t* values have df = n − 2.
  - The standard errors (SE) differ for the two intervals (formulas on the next slide).
  - The prediction interval is wider than the confidence interval.

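32 Inference about Prediction
- Confidence interval for the mean response µ_y at x*: ŷ ± t* SE_µ̂, where SE_µ̂ = s √( 1/n + (x* − x̄)² / Σ(x − x̄)² )
- Prediction interval for an individual response y at x*: ŷ ± t* SE_ŷ, where SE_ŷ = s √( 1 + 1/n + (x* − x̄)² / Σ(x − x̄)² )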
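A minimal sketch of the two standard errors, assuming the usual textbook forms: s√(1/n + (x* − x̄)²/Σ(x − x̄)²) for the mean response and s√(1 + 1/n + (x* − x̄)²/Σ(x − x̄)²) for an individual response. The function name and all summary values below are made up for illustration; they are not the case-study numbers.

```python
import math

def interval_standard_errors(s, n, x_star, x_bar, sxx):
    """Return (SE for the mean response, SE for an individual response) at x*."""
    core = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(core)            # used in the confidence interval
    se_individual = s * math.sqrt(1 + core)  # used in the prediction interval
    return se_mean, se_individual

# Made-up summary values for illustration only (not the case-study data).
s, n, x_bar, sxx = 2.0, 25, 10.0, 400.0
for x_star in (10.0, 20.0):
    se_mean, se_ind = interval_standard_errors(s, n, x_star, x_bar, sxx)
    print(x_star, round(se_mean, 3), round(se_ind, 3))
```

The extra 1 under the square root accounts for the variability of a single response about its mean, so the prediction interval is always the wider of the two. Both standard errors grow as x* moves away from x̄.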

33 Checking Assumptions
- Independent observations
  - no repeated observations on the same individual
- True relationship is linear
  - look at the scatterplot to check the overall pattern
  - a plot of residuals against x magnifies any unusual pattern (should see ‘random’ scatter about zero)

34 Checking Assumptions
- Constant standard deviation σ of the response at all x values
  - scatterplot: the spread of data points about the regression line should be similar over the entire range of the data
  - easier to see with a plot of residuals against x, with a horizontal line drawn at zero (should see ‘random’ scatter about zero); for linear regression, one can also plot residuals against ŷ

35 Checking Assumptions
- Response y varies Normally about the true regression line
  - residuals estimate the deviations of the response from the true regression line, so they should follow a Normal distribution
  - make a histogram or stemplot of the residuals and check for clear skewness or other departures from Normality
  - numerous methods for carefully checking Normality exist (talk to a statistician!)
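The residuals used in all of these checks can be computed as sketched below. The function name `residuals` and the eight data points are made up for illustration. Least-squares residuals always sum to zero and are uncorrelated with x, so the checks look at their pattern and spread, not their average.

```python
def residuals(xs, ys):
    """Residuals y - y_hat from the least-squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up toy data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
res = residuals(xs, ys)
for x, e in zip(xs, res):
    # Listing residuals in x order is a crude stand-in for a residual plot.
    print(f"x = {x}: residual = {e:+.3f}")
```

In practice these values would be plotted against x and summarized in a histogram or stemplot, as the slides describe.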

36 Residual Plots
x = number of beers, y = blood alcohol
Residual plot: roughly linear relationship; spread is even across the entire data range (‘random’ scatter about zero).
Stemplot of residuals (4|1 = .041), close to Normal:
-2 | 731
-1 | 871
-0 | 91
 0 | 5578
 1 | 1
 2 | 39
 3 |
 4 | 1

37 Residual Plots
‘x’ = collection of explanatory variables, y = salary of player
The standard deviation is not constant everywhere (more variation among players with higher salaries).

38 Residual Plots
x = number of years, y = logarithm of salary of player
A clear curved pattern: the relationship is not linear.

