AP Stat Day 15 63 Days until AP Exam
Least Squares Regression Coefficient of Determination Residuals

Least Squares Regression Line
A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. You may have performed a linear regression in Algebra II, and in Statistics the process is very similar. But before we run a regression, let’s learn about EXACTLY what we are doing.

The “Least Squares” part…
A random scatterplot….

So, your calculator sorts through all the possible lines to come up with an estimate for the slope and the y-intercept of the line that has the smallest sum of squared residuals (be GLAD you don’t have to do this…). It then reports those values so you can write your equation in y-hat = a + bx form. y-hat is the predicted value of your dependent variable. It is different from y, your observed value.
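To make the "least squares" idea concrete, here is a minimal Python sketch. The five data points are made up for illustration (they are NOT the Kalama data), and the helper name sum_sq_residuals is my own. It computes the slope and intercept the usual way and then checks that nudging the line in either direction makes the sum of squared residuals larger.

```python
# Hypothetical data for illustration only (not the Kalama children data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def sum_sq_residuals(a, b):
    """Sum of squared vertical gaps between each point and y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Least-squares slope and intercept.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

print(f"y-hat = {a:.2f} + {b:.2f}x")
print("least-squares line:   ", round(sum_sq_residuals(a, b), 3))
print("slightly steeper line:", round(sum_sq_residuals(a, b + 0.1), 3))
print("slightly higher line: ", round(sum_sq_residuals(a + 0.1, b), 3))
```

Any other line you try should give a larger sum than the least-squares line.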

EXAMPLE: Using the data from our Kalama children from last class, let’s run a least squares regression and write our equation.

Interpreting the Slope
So, slope in this equation is given by b- please don’t let this confuse you. When we interpret slope, we talk about how much y changes for a 1-unit change in x. Let’s put this in terms of our Kalama children problem…

Interpreting y-intercept
y-intercept in our equation is given by a. The y-intercept will often be an extrapolation and thus may not make any sense in terms of the problem. Let’s practice with our Kalama children problem…

Correlation Coefficient
In order to see r, our correlation coefficient, we need to turn on the diagnostics in your calculator. Now, let’s run the regression analysis again. We have more numbers now: r and r². r is the correlation coefficient. This is the number that determines the “strength” in our strength, form, and direction description (its sign also gives the direction). Kalama children:

Coefficient of Determination
r² is the coefficient of determination. What this means is that r² tells us how much of the variation in y is explained by the linear relationship between x and y. r² is always reported as a percentage. Kalama children:
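A short Python sketch of what "variation explained" means, using made-up data (not the Kalama data). It computes r from the deviation sums, then shows that r squared matches 1 minus (leftover variation around the line) / (total variation in y). The variable names are my own.

```python
from math import sqrt

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y

r = sxy / sqrt(sxx * syy)                  # correlation coefficient

# Fit the least-squares line and measure the variation it leaves unexplained.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = 1 - sse / syy

print("r   =", round(r, 3))
print("r^2 =", round(r ** 2, 3), "and 1 - SSE/SST =", round(r_squared, 3))
```

For this data the two values agree at 0.6, so we would say about 60% of the variation in y is explained by the linear relationship with x.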

Residuals
Residuals are simply the differences between the observed (y) and the predicted (y-hat) values: residual = y - y-hat. In a residual plot, the residuals are plotted against x and scatter around a horizontal line at zero, some positive and some negative. Unlike the normal probability plot, in a residual plot PATTERNS ARE BAD! Residuals help us determine if a linear model is appropriate for our data. Kalama children:
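Here is a minimal Python sketch of a residual calculation, where residual = observed y minus predicted y-hat. The data and the line y-hat = 2.2 + 0.6x are hypothetical (the line is the least-squares fit for these made-up points, not for the Kalama data).

```python
# Hypothetical data and its least-squares line, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6   # y-hat = a + b*x

predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

for xi, yi, y_hat, res in zip(x, y, predicted, residuals):
    print(f"x={xi}  observed={yi}  predicted={y_hat:.1f}  residual={res:+.1f}")

# For a least-squares line the residuals add up to zero (up to rounding).
print("sum of residuals:", round(sum(residuals), 10))
```

Plotting these residuals against x gives the residual plot; a random scatter above and below zero suggests a line is a reasonable model.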

ACTIVITY- Guess My Age On p.13 record the answers to the following questions. I will give you the answers to the questions, then you will create a scatterplot by hand. You will run a regression and interpret your slope and y-intercept. (Do these make sense?) You will interpret your correlation coefficient and coefficient of determination.

Minitab Outputs EXAMPLE
The following output data from MINITAB shows the number of teachers (in thousands) for each of the states plus the District of Columbia against the number of students (in thousands) enrolled in grades K-12.
Predictor Coef Stdev t-ratio p
Constant
Enroll
s= R-sq=81.5%
What is the equation of the least squares line?
Interpret the slope.
Find the correlation coefficient and coefficient of determination. Interpret in the context of the problem.
Predict the number of students if the number of teachers in the state is 40,000.
Predict the number of teachers if the number of students in the state is 35,700.

On p. 14 of your notebook, answer the following questions:
The growth and decline of forests is a matter of great public and scientific interest. The paper “Relationships Among Crown Condition, Growth, and Stand Nutrition in Seven Northern Vermont Sugarbushes” included a scatter plot of y = mean crown dieback (%), which is one indicator of growth retardation, and x = soil pH. A statistical computer package MINITAB gives the following analysis:
The regression equation is: dieback = 31.0 – 5.79 soil pH
Predictor Coef Stdev t-ratio p
Constant
soil pH
s= R-sq=51.5%
What is the equation of the least squares line?
Where else in the printout do you find the information for the slope and y-intercept?
Roughly, what change in crown dieback would be associated with an increase of 1 in soil pH?
What value of crown dieback would you predict when soil pH = 4.0?
Would it be sensible to use the least squares line to predict crown dieback when soil pH = 5.67?
What is the correlation coefficient?
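As a way to check your work, here is a small Python sketch that uses only the numbers printed in the output above (dieback = 31.0 – 5.79 soil pH and R-sq = 51.5%). The function name predict_dieback is my own. Note that R-sq only gives the size of r, so the sign has to be taken from the slope.

```python
from math import sqrt

# Values read from the MINITAB output in the problem.
intercept = 31.0     # a, the y-intercept
slope = -5.79        # b, change in dieback (%) per 1-unit increase in soil pH
r_sq = 0.515         # R-sq = 51.5%

def predict_dieback(soil_ph):
    """Predicted mean crown dieback (%) from y-hat = a + b*x."""
    return intercept + slope * soil_ph

print("predicted dieback at pH 4.0:", round(predict_dieback(4.0), 2))

# r has the same sign as the slope, so take the negative square root here.
r = -sqrt(r_sq) if slope < 0 else sqrt(r_sq)
print("correlation coefficient r:", round(r, 3))
```

Whether a prediction is sensible still depends on whether the pH value falls inside the range of the original data, which the code cannot tell you.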

Rules of Thumb Properties of Correlation
A negative r means that there is a negative association. A positive r means that there is a positive association. r = 0 means that there is no linear association. The closer r is to -1 or 1, the stronger the association. r only measures the strength of a LINEAR relationship and does not describe curved (nonlinear) relationships. r is NOT resistant. This means that correlation is easily affected by outliers.

More Rules of Thumb Properties of the coefficient of determination:
This value represents the proportion of variability in y that can be explained by the relationship with x.

Formulas
We can also calculate the slope of the regression line using the standard deviations and the correlation coefficient: b = r(s_y / s_x). And the intercept can be found using the means of x and y: a = y-bar - b(x-bar).
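The formulas can be tried out with a short Python sketch: slope b = r(s_y / s_x) and intercept a = y-bar - b(x-bar). The data below are made up for illustration, and r is computed from standardized values so the example does not depend on a calculator.

```python
from statistics import mean, stdev

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)    # sample standard deviations

# Correlation: average product of z-scores, dividing by n - 1.
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

b = r * s_y / s_x                # slope
a = y_bar - b * x_bar            # intercept

print(f"r = {r:.3f}")
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

The intercept formula also shows that the least-squares line always passes through the point (x-bar, y-bar).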

Summary p. 15 How do we interpret the correlation coefficient?
How do we interpret the coefficient of determination? What do you look for in a Minitab output to write the least squares regression equation?

Prep Questions p.16 What is horsepower?
What do you think YOUR horsepower would be? REMEMBER to wear comfortable clothes and running shoes on WEDNESDAY 10/12.