Presentation on theme: "Chapter 12 Inference for Linear Regression"— Presentation transcript:
1 Chapter 12: Inference for Linear Regression. Target Goals: I can make predictions using regression for normal distributions. I can check conditions for performing inference about the slope β of the population (true) regression line. 12.1a h.w.: pg 759: 1 – 11 odd
2 Inference about the Model. We can use the LSRL fitted to data to predict y for a given value of x for two quantitative variables. Now we will do tests and construct confidence intervals in this setting.
4 Ex. Crying and IQ. Infants who cry easily may be more easily stimulated than others, and this may be a sign of higher IQ. The researchers snapped a rubber band on the sole of the foot of infants and caused the infants to cry. At age 3 years, they measured the children's IQ.
5 Step 1: Make a scatterplot of the data. Explanatory variable: Crying. Response variable: IQ. Enter “crying” data into L1 and “IQ” data into L2. Plot and interpret. STAT:CALC:LinReg(a+bx) L1,L2,Y1 (Y1: VARS:Y-VARS:FUNCT:Y1). The scatterplot shows a roughly linear pattern. The correlation r describes the direction and strength of the relationship.
7 Step 3: Identify outliers and influential points. No extreme outliers or potentially influential observations.
8 Step 4: Calculate the correlation (r value). The correlation between crying and IQ is r =
9 Interpret r² ≈ 0.21: only about 21% of the variation in IQ scores (the response variable) is explained by crying intensity. r² is called the coefficient of determination. Is prediction of IQ accurate with this model? No.
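The relationship between r and r² in Steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration: the two lists below are made-up toy values, not the crying/IQ measurements, and the function name `correlation` is a hypothetical helper, not something from the slides.

```python
import math

def correlation(x, y):
    """Sample correlation r between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (NOT the data set used in the slides)
crying_peaks = [10, 12, 15, 18, 20, 25]
iq_scores = [95, 110, 100, 120, 105, 125]

r = correlation(crying_peaks, iq_scores)
r_squared = r ** 2  # coefficient of determination

print(f"r = {r:.3f}")
print(f"r^2 = {r_squared:.3f} (fraction of variation in y explained by x)")
```

Squaring r gives the fraction of the variation in the response that the straight-line relationship accounts for, which is how a moderate correlation can still leave most of the variation unexplained.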
10 It is interesting, though, that behavior shortly after birth can partly predict IQ.
11 Conditions for Regression Inference. Example: predicting how long it will take before Old Faithful erupts again based on the duration of the previous eruption. [Figure: 3 SRSs of 20 Old Faithful eruptions.] The values of the slope b for 1000 sample regression lines are plotted.
13 Conditions for Regression Inference. Our goal is to predict the behavior of y for a given value of x. Linear: For each value of x, the y responses vary from sample to sample according to a normal distribution, and the mean response μy has a straight-line relationship with x. The true regression line is written in the form: μy = α + βx
14 where μy is the mean response, α is the true y-intercept, and β is the true slope.
15 Independent: The y responses are independent of each other. Normal: For any fixed value of x, the observed response value y varies according to a normal distribution having mean μy.
16 Equal Variance: The standard deviation σ about the true regression line is the same (constant) for all values of x. It is usually an unknown parameter. Random: The data come from a well-designed random sample or randomized experiment.
18 The LSRL: ŷ = a + bx, where b is an unbiased estimator of the true slope β and a is an unbiased estimator of the true intercept α.
19 The line μy = α + βx is the true regression line, which shows how the mean response μy changes as the explanatory variable x changes.
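The idea that the sample slope b varies from sample to sample around the true slope β can be sketched with a small simulation. Everything here is an assumption made for illustration: the parameter values ALPHA, BETA and SIGMA are invented, the x values are held fixed across samples (a simplification of taking repeated SRSs), and the helper names are hypothetical.

```python
import random
import statistics

def fit_slope(x, y):
    """Least-squares slope b for the line y-hat = a + b*x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(alpha, beta, sigma, x_values, n_samples, seed):
    """Draw n_samples data sets from the model and return their slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        # Each response is the true mean alpha + beta*x plus normal noise
        y = [alpha + beta * xi + rng.gauss(0, sigma) for xi in x_values]
        slopes.append(fit_slope(x_values, y))
    return slopes

# Hypothetical true regression line and spread (not from the slides)
ALPHA, BETA, SIGMA = 30.0, 10.0, 6.0
x_values = [1.5 + 0.15 * i for i in range(20)]  # 20 fixed x values

slopes = simulate_slopes(ALPHA, BETA, SIGMA, x_values, 1000, seed=1)
print(f"mean of 1000 sample slopes: {statistics.mean(slopes):.2f}")
print(f"std. dev. of sample slopes: {statistics.stdev(slopes):.2f}")
```

Under these assumptions the average of the simulated slopes should land close to BETA, which is what "b is an unbiased estimator of β" means in practice, while the standard deviation of the slopes shows how far a single sample slope typically falls from β.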
20 Standard Deviation. σ determines whether the points fall close to the true regression line (small σ) or are widely scattered (large σ). This is also the size of a typical prediction error if we use the least-squares regression line to predict “how long it will take before Old Faithful erupts again” based on the duration of the previous eruption.
21 Ex: Slope and Intercept. The LSRL is ŷ = a + bx. The slope b measures rate of change: how much higher average IQ is for children with one more peak in their crying measurements. b estimates the unknown β; we estimate that, on average, IQ is about 1.5 points higher for each additional crying peak. [Figure: scatterplot of IQ versus crying peaks.]
22 Standard Deviation. σ describes the variability of the response y about the true regression line. Recall that residuals estimate how much y varies about the true line and are the vertical deviations of the data points from the least-squares line: Residual = observed y – predicted y
23 Standard Error about the LSRL. Since σ is unknown, we estimate it with s, the standard deviation of the residuals, which is also called the standard error about the line (this is the key to inference about the regression): s = √(∑ residual² / (n – 2)). Note: (n – 2) is the degrees of freedom for the regression model.
24 Ex. Calculating Residuals and Standard Error. The quickest way to do this (use ex 14.1 data): Enter “crying” data into L1 and “IQ” data into L2. (We already did this.) Recall: LinReg(a+bx) automatically calculates the residuals and stores them in “Resid.” Store “Resid” in L3. STAT:CALC:1-Var Stats L3 reports ∑x², which here equals ∑resid².
25 To find s, first find s²: enter the value of ∑x² by hand or (VARS:5: : ∑x²) and divide by (n – 2). Take the square root to find s.
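The calculator steps above (fit the line, store the residuals, sum their squares, divide by n – 2, take the square root) can be sketched in Python as follows. The data are a small made-up example, not the ex 14.1 data, and the function names are hypothetical.

```python
import math

def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals_and_s(x, y):
    """Return the residuals and the standard error s about the line."""
    a, b = fit_lsrl(x, y)
    # Residual = observed y - predicted y
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sum_sq_resid = sum(e ** 2 for e in residuals)
    s_squared = sum_sq_resid / (len(x) - 2)  # divide by n - 2 degrees of freedom
    return residuals, math.sqrt(s_squared)

# Hypothetical toy data (NOT the data set used in the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

residuals, s = residuals_and_s(x, y)
print("residuals:", [round(e, 2) for e in residuals])
print(f"s = {s:.4f}")
```

For this toy data the fitted line is ŷ = 2.2 + 0.6x, the squared residuals sum to 2.4, and s = √(2.4 / 3) ≈ 0.8944.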
26 A level C confidence interval for the slope β of the true regression line is b ± t* SEb, where t* is the critical value for the t distribution with n – 2 degrees of freedom.
27 You will rarely have to calculate this by hand. Regression software gives you the standard error SEb and b itself.
29 There are 38 data points, so df = n – 2 = 36. Find the critical value t*. Table C has no row for df = 36, so for a 95% C.I. for the true slope β, use the next smaller df, 30, which gives t* = 2.042.
30 Conclude. We are 95% confident that mean IQ increases by between 0.5 and 2.5 points for each additional peak in crying.
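The interval calculation that regression software performs can be sketched as below. This is an illustration under stated assumptions: the data are a made-up toy set (not the crying/IQ data), the function name is hypothetical, the standard error of the slope is computed with the standard formula SEb = s / √∑(x – x̄)², and t* is passed in by hand from a t table (3.182 is the 95% critical value for df = 3).

```python
import math

def slope_confidence_interval(x, y, t_star):
    """Return (b, se_b, lower, upper) for the interval b +/- t* * SE_b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Standard error about the line: sqrt(sum of squared residuals / (n - 2))
    sum_sq_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sum_sq_resid / (n - 2))
    # Standard error of the slope
    se_b = s / math.sqrt(sxx)
    margin = t_star * se_b
    return b, se_b, b - margin, b + margin

# Hypothetical toy data (NOT the data set used in the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# n = 5, so df = n - 2 = 3; t* = 3.182 for 95% confidence (from a t table)
b, se_b, lower, upper = slope_confidence_interval(x, y, t_star=3.182)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}")
print(f"95% CI for the true slope: ({lower:.3f}, {upper:.3f})")
```

With so few points the interval is wide and includes 0, so this toy data set would not give convincing evidence of a nonzero true slope. The crying example has far more data (n = 38), which is part of why its interval is narrower relative to the slope.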
31 Interpret SEb. SEb estimates how much the slope of the sample regression line typically varies from the slope of the population (true) regression line if we repeat the data production process many times. If we repeated the experiment many times, the slope of the sample regression line would typically vary by about SEb from the slope of the true regression line for predicting IQ from cry count of infants.