1. AP Stat Day 15: 63 Days until AP Exam. Least Squares Regression, Coefficient of Determination, Residuals
2. Least Squares Regression Line: A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. You may have performed a linear regression in Algebra II, and in Statistics the process is very similar. But before we run a regression, let’s learn EXACTLY what we are doing.
4. What your calculator does… So, out of all the possible lines, your calculator finds the estimates for the slope and the y-intercept of the line that has the smallest sum of squared residuals (be GLAD you don’t have to do this by hand…). It then reports those values so you can write your equation in the form ŷ = a + bx. ŷ (y-hat) is the predicted value of your dependent variable. It is different from y, your observed value.
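As a rough illustration of what "smallest sum of squared residuals" means, the sketch below fits a line to a small made-up data set (the numbers are hypothetical, not the Kalama data) and checks that nudging the slope or intercept away from the least-squares values only makes the sum bigger.

```python
# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares slope (b) and intercept (a).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

best = sse(a, b)

# Try lines with slightly different intercepts and slopes.
nearby = [
    sse(a + da, b + db)
    for da in (-0.1, 0.0, 0.1)
    for db in (-0.1, 0.0, 0.1)
    if (da, db) != (0.0, 0.0)
]

print(f"y-hat = {a:.2f} + {b:.2f}x")
print("every nearby line is worse:", all(other > best for other in nearby))
```

The calculator does not literally try every line; it uses formulas like the ones for `a` and `b` above, which give the minimizing line directly.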
5. EXAMPLE: Using the data from our Kalama children from last class, let’s run a least squares regression and write our equation.
6. Interpreting the Slope: So, slope in this equation is given by b. Please don’t let this confuse you. When we interpret slope, we talk about how much the predicted y changes, on average, for a 1-unit increase in x. Let’s put this in terms of our Kalama children problem…
7. Interpreting y-intercept: The y-intercept in our equation is given by a. It is the predicted value of y when x = 0. The y-intercept will often be an extrapolation and thus may not make any sense in terms of the problem. Let’s practice with our Kalama children problem…
8. Correlation Coefficient: In order to see r, our correlation coefficient, we need to turn on the diagnostics in your calculator. Now, let’s run the regression analysis again. We have more numbers now: r and r². r is the correlation coefficient. Its size measures the “strength” and its sign gives the “direction” in our strength, form, and direction description. Kalama children:
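For anyone curious what the calculator computes when it reports r, here is a minimal sketch using the z-score formula for correlation. The data are hypothetical and made up for illustration; they are not the Kalama values.

```python
from statistics import mean, stdev

# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)

# r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

print(f"r = {r:.4f}")
```

Because these made-up points fall almost exactly on a rising line, r comes out positive and very close to 1.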
9. Coefficient of Determination: r² is the coefficient of determination. It tells us what proportion of the variation in y is explained by the linear relationship between x and y. r² is usually reported as a percentage. Kalama children:
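One way to see why r² is read as "variation explained" is to compare the leftover variation around the regression line with the total variation in y. The sketch below does this on hypothetical data (made up for illustration) and checks that the result matches the square of r.

```python
from math import sqrt

# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares line y-hat = a + b*x.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Variation left over around the line vs. total variation in y.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = s_yy
r_squared = 1 - sse / sst

# Correlation coefficient, for comparison.
r = s_xy / sqrt(s_xx * s_yy)

print(f"r^2 from variation: {r_squared:.4f}")
print(f"r squared directly: {r ** 2:.4f}")
print(f"as a percentage:    {100 * r_squared:.1f}%")
```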
10. Residuals: A residual is the difference between the observed value (y) and the predicted value (ŷ): residual = y − ŷ. In a residual plot, the residuals are plotted against x and scatter around a horizontal line at zero, some positive and some negative. Unlike the normal probability plot, in a residual plot PATTERNS ARE BAD! Residuals help us determine if a linear model is appropriate for our data. Kalama children:
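A short sketch of how residuals are computed, again on hypothetical data made up for illustration. It also shows a handy fact: the residuals from a least-squares line always add up to zero (apart from rounding).

```python
# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares line y-hat = a + b*x.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# residual = observed y - predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    sign = "above" if res > 0 else "below"
    print(f"x = {x}: residual = {res:+.2f} (point is {sign} the line)")

print(f"sum of residuals = {sum(residuals):.6f}")
```

To make a residual plot by hand, put x on the horizontal axis and these residuals on the vertical axis, then look for curves or fanning.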
11. ACTIVITY: Guess My Age. On p. 13 record the answers to the following questions. I will give you the answers to the questions, then you will create a scatterplot by hand. You will run a regression and interpret your slope and y-intercept. (Do these make sense?) You will interpret your correlation coefficient and coefficient of determination.
12. Minitab Outputs EXAMPLE: The following output from MINITAB shows the number of teachers (in thousands) for each of the states plus the District of Columbia against the number of students (in thousands) enrolled in grades K-12.
Predictor   Coef   Stdev   t-ratio   p
Constant
Enroll
s =         R-sq = 81.5%
What is the equation of the least squares line? Interpret the slope. Find the correlation coefficient and coefficient of determination. Interpret in the context of the problem. Predict the number of students if the number of teachers in the state is 40,000. Predict the number of teachers if the number of students in the state is 35,700.
13. ACTIVITY: Reading Excel and Minitab. On p. 14 of your notebook, answer the following questions: The growth and decline of forests is a matter of great public and scientific interest. The paper “Relationships Among Crown Condition, Growth, and Stand Nutrition in Seven Northern Vermont Sugarbushes” included a scatter plot of y = mean crown dieback (%), which is one indicator of growth retardation, and x = soil pH. The statistical computer package MINITAB gives the following analysis:
The regression equation is: dieback = 31.0 − 5.79 soil pH
Predictor   Coef   Stdev   t-ratio   p
Constant
soil pH
s =         R-sq = 51.5%
What is the equation of the least squares line? Where else in the printout do you find the information for the slope and y-intercept? Roughly, what change in crown dieback would be associated with an increase of 1 in soil pH? What value of crown dieback would you predict when soil pH = 4.0? Would it be sensible to use the least squares line to predict crown dieback when soil pH = 5.67? What is the correlation coefficient?
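As a way to check your arithmetic on this activity, the sketch below uses only the two numbers quoted in the printout (the regression equation and R-sq). The function name `predict_dieback` is made up for illustration. It relies on the fact that r always has the same sign as the slope.

```python
import math

# Values read from the printout quoted above.
intercept = 31.0   # a, in the equation dieback = 31.0 - 5.79 soil pH
slope = -5.79      # b, change in predicted dieback (%) per 1-unit increase in soil pH
r_squared = 0.515  # R-sq = 51.5%

def predict_dieback(soil_ph):
    """Predicted mean crown dieback (%) from the least squares line."""
    return intercept + slope * soil_ph

# Prediction at soil pH = 4.0.
print(round(predict_dieback(4.0), 2))  # 7.84

# r is the square root of r^2, with the same sign as the slope.
r = math.copysign(math.sqrt(r_squared), slope)
print(round(r, 3))  # -0.718
```

Whether a prediction at a given soil pH is sensible depends on the range of soil pH values in the original data, which the printout alone does not show.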
14. Rules of Thumb: Properties of Correlation. A negative r means that there is a negative association. A positive r means that there is a positive association. An r of 0 means that there is no linear association. The closer r is to -1 or 1, the stronger the association. r only measures the strength of a LINEAR relationship and should not be used to describe curved relationships. r is NOT resistant. This means that correlation is easily affected by outliers.
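To see what "not resistant" looks like in practice, the sketch below computes r for a hypothetical, made-up data set, then adds a single outlier and computes r again. The helper name `correlation` is just for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r_without = correlation(xs, ys)

# Same data plus one outlier far below the pattern.
r_with = correlation(xs + [6.0], ys + [0.0])

print(f"r without the outlier: {r_without:.3f}")
print(f"r with the outlier:    {r_with:.3f}")
```

One unusual point is enough to take a nearly perfect positive correlation down to a weak one.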
15. More Rules of Thumb: Properties of the coefficient of determination. This value represents the proportion of variability in y that can be explained by the linear relationship with x.
16. Formulas: We can also calculate the slope of the regression line using the standard deviations and the correlation coefficient: b = r · (s_y / s_x). And the intercept can be found using the means of x and y: a = ȳ − b·x̄.
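The sketch below tries these two relationships, slope = r times (s_y / s_x) and intercept = (mean of y) minus slope times (mean of x), on hypothetical made-up data and checks that they give the same line as fitting least squares directly.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical (x, y) data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
r = s_xy / sqrt(s_xx * s_yy)

# Slope and intercept from the summary statistics.
b = r * s_y / s_x
a = y_bar - b * x_bar

# Slope and intercept from fitting least squares directly.
b_direct = s_xy / s_xx
a_direct = y_bar - b_direct * x_bar

print(f"from r, s_x, s_y: y-hat = {a:.2f} + {b:.2f}x")
print(f"direct fit:       y-hat = {a_direct:.2f} + {b_direct:.2f}x")
```

The intercept formula also tells you that the least squares line always passes through the point (mean of x, mean of y).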
17. Summary p. 15: How do we interpret the correlation coefficient? How do we interpret the coefficient of determination? What do you look for in a Minitab output to write the least squares regression equation?
18. Prep Questions p. 16: What is horsepower? What do you think YOUR horsepower would be? REMEMBER to wear comfortable clothes and running shoes on WEDNESDAY 10/12.