1 Inference for Regression Section 14.1.1

2 Starter 14.1.1
The Goodwill second-hand stores did a survey of their customers in Walnut Creek and Oakland. Among other things, they noted the sex of each respondent. Here is the breakdown:

            Men   Women
W.C.         38     203
Oakland      68     150

Is there a significant difference between the proportion of women customers in the two stores?
1. Treat this as a two-sample proportion problem
   – Find the z statistic and p value; draw a conclusion
2. Do a chi-square test
   – Find X² and the p value; draw a conclusion
   – How does X² relate to z?
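The starter can also be checked by machine. The sketch below is in Python rather than on the TI the slides assume, and the function names are illustrative. It computes the pooled two-proportion z statistic and the chi-square statistic (without a continuity correction) from the counts in the table.

```python
import math

# Counts from the starter table: (men, women) per store.
counts = {"W.C.": (38, 203), "Oakland": (68, 150)}

def two_proportion_z(women1, n1, women2, n2):
    """Pooled two-sample z statistic and two-sided p value."""
    p1, p2 = women1 / n1, women2 / n2
    pooled = (women1 + women2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

def chi_square_2x2(table):
    """Chi-square statistic (no continuity correction) and p value, df = 1."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            x2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(x2 / 2))  # chi-square upper tail, df = 1
    return x2, p_value

(wc_men, wc_women), (oak_men, oak_women) = counts["W.C."], counts["Oakland"]
z, p_z = two_proportion_z(wc_women, wc_men + wc_women,
                          oak_women, oak_men + oak_women)
x2, p_x2 = chi_square_2x2([[wc_men, wc_women], [oak_men, oak_women]])
print(f"z = {z:.3f}, p = {p_z:.5f}")
print(f"X² = {x2:.3f}, p = {p_x2:.5f}, z² = {z * z:.3f}")
```

For a 2×2 table the two tests are equivalent: X² equals z², and the two p values agree.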

3 Today’s Objective
The students will find estimates for the three unknown population parameters in a linear regression model:
– True intercept α
– True slope β
– True standard deviation σ
California Standards
12.0 Students find the line of best fit to a given distribution of data by using least squares regression.
13.0 Students know what the correlation coefficient of two variables means and are familiar with the coefficient's properties.

4 Linear Regression Revisited
Recall that we describe the association between two numeric variables in terms of three expressions
– Shape
– Strength
– Direction
To make that description, we perform several steps:
– Draw a scatterplot of the data
– Look for outliers or influential observations
   Discard if justifiable
– Find the equation of the Least Squares Regression Line (LSRL)
– Calculate r, a measure of how well the LSRL fits the data
   r² is the proportion of y variation due to the linear relationship

5 Example
Let’s re-do the Sanchez household gas consumption problem from Chapter 3
Recall that the explanatory variable was “degree-days” and the response variable was consumption of gas in hundreds of cubic feet
– We called those lists LGASDA and LGASFT
Copy the lists into L1 and L2
– Link up to get those lists if you do not still have them in your TI (or enter from p. 754)

6 Example Continued
Draw the scatterplot on your TI
– Are there any points that should be removed?
Perform Linear Regression and add the graph of the equation found to the scatterplot
Draw a sketch of the scatterplot and line
– Write the regression equation and the value of r² below the sketch

7 The Regression Model
Several assumptions are made in regression inference:
1. We assume that the response variable y varies as a linear function of the explanatory variable x.
2. For any given x, the response variable y varies normally about the regression line.
   – Repeated responses are independent of each other
3. The mean of these responses is µ_y
   – µ_y = α + βx
   – α and β are unknown parameters we want to estimate
4. The standard deviation of y is σ
   – σ is the other unknown parameter we want to estimate
   – We assume σ is constant throughout the range of x values
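To make the model concrete, here is a small Python simulation of responses that follow these assumptions. The parameter values are hypothetical, chosen only for illustration, and the function name is not from the slides. At each fixed x, the simulated y values scatter normally about α + βx with the same σ.

```python
import random

# Hypothetical "true" parameters, chosen only for illustration.
ALPHA, BETA, SIGMA = 1.0, 0.2, 0.3

def simulate_responses(x, n, rng):
    """Draw n independent responses at a fixed x: Normal(mean α + βx, sd σ)."""
    mean_y = ALPHA + BETA * x
    return [rng.gauss(mean_y, SIGMA) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
for x in (10, 30, 50):
    ys = simulate_responses(x, 10_000, rng)
    sample_mean = sum(ys) / len(ys)
    print(f"x = {x}: µ_y = {ALPHA + BETA * x:.2f}, sample mean = {sample_mean:.3f}")
```

With many repeated responses at one x, the sample mean should land close to µ_y, which is what the model's third assumption describes.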

8 Estimating α, β, and σ
The LSRL gives us a and b directly
– a is an unbiased estimator of α
– b is an unbiased estimator of β
To estimate σ, we will use Standard Error:
s = √( Σ resid² / (n – 2) )
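As a rough sketch of how all three estimates come out of one data set, the Python function below fits the least squares line and then computes the standard error about the line as the square root of the sum of squared residuals divided by n – 2. The function name and the four data points are made up for illustration; they are not the Sanchez data.

```python
import math

def fit_lsrl(x, y):
    """Return (a, b, s): LSRL intercept, slope, and standard error about the line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope estimate
    a = y_bar - b * x_bar      # intercept estimate
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Divide by n - 2 because two parameters (a and b) were estimated.
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, s

# Made-up data for illustration only (not the Sanchez data).
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
a, b, s = fit_lsrl(x, y)
print(f"a = {a:.4f}, b = {b:.4f}, s = {s:.4f}")
```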

9 Calculating the sum of resid²
Recall that a residual is the difference between the observed and predicted y values
– resid = y – y-hat
So the first (of 16) residuals for Sanchez is 6.3 – [1.0892 + (0.1890)(24)] = 0.6748
Or do all 16 at once in lists: L2 – Y1(L1) → L3
Easier: Store LRESID in L3
Then 1-Var Stats (L3) gives Σx², which here is the sum of the squared residuals
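The same arithmetic can be written as a short Python sketch. The helper names are illustrative; the intercept, slope, and first data point (x = 24, y = 6.3) are the ones quoted on the slide. The second helper mirrors the "all at once in lists" step for any pair of lists.

```python
# Fitted line from the slides: y-hat = 1.0892 + 0.1890 x
A, B = 1.0892, 0.1890

def residual(x, y, a=A, b=B):
    """Observed y minus the value predicted by the line a + b x."""
    return y - (a + b * x)

def sum_squared_residuals(xs, ys, a=A, b=B):
    """Sum of squared residuals over paired lists of x and y values."""
    return sum(residual(x, y, a, b) ** 2 for x, y in zip(xs, ys))

first = residual(24, 6.3)
print(f"first residual = {first:.4f}")
```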

10 Example Concluded
Find Σx² for the Sanchez data
– You should have found 1.6082
Now apply the formula to find s
– You should have found 0.3390
So we now have estimates for the parameters
– Estimate α with a = 1.0892
– Estimate β with b = 0.1890
– Estimate σ with s = 0.3390
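As a quick check of the last step, the Python lines below apply the usual standard error formula, s = √( Σ resid² / (n – 2) ), to the sum reported above. The variable names are illustrative.

```python
import math

n = 16                  # number of observations in the Sanchez data
sum_resid_sq = 1.6082   # sum of squared residuals reported on the slide

s = math.sqrt(sum_resid_sq / (n - 2))
print(f"s = {s:.4f}")
```

The result agrees with the slide's 0.3390 to three decimal places; a difference in the fourth place is presumably because the sum used here is already rounded.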

11 Today’s Objective
The students will find estimates for the three population parameters in a linear regression model:
– True slope β
– True intercept α
– True standard deviation σ
California Standards
12.0 Students find the line of best fit to a given distribution of data by using least squares regression.
13.0 Students know what the correlation coefficient of two variables means and are familiar with the coefficient's properties.

12 Homework
Read pages 752–760
Do problems 1–3
