 # The correlation coefficient, r, tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition,


The correlation coefficient, r, tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to have a numerical description (model) of how both variables vary together. We would like to make predictions based on that numerical description. The relationship above looks linear. But which line best describes our data?

The regression line. The least-squares regression line is the unique line such that the sum of the squares of the vertical distances of the data points to the line is the smallest possible. Example: $\hat{y} = 0.125x - 41.4$
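To make the "smallest possible sum of squares" idea concrete, here is a minimal Python sketch. The data are invented purely for illustration (they are not from the slides), and the function names `sum_squared_residuals` and `least_squares_line` are hypothetical helpers, not part of JMP or any library.

```python
# Toy data invented for illustration only (not the data behind the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of squared vertical distances from the points to the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(b0, b1, xs, ys)

# Nudging the intercept or the slope in either direction makes the sum larger.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_squared_residuals(b0 + db0, b1 + db1, xs, ys) > best
```

The loop at the end checks only a few nearby lines, so it illustrates the minimizing property rather than proving it.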

Definition, pg 114, Introduction to the Practice of Statistics, Sixth Edition © 2009 W.H. Freeman and Company. This means that we don't have to calculate the squared distances to find the least-squares regression line for a data set; we can instead rely on the equations above. But use JMP ("Fit Y by X") or some other technology (e.g., a calculator) to do the tedious computations.
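The equations this slide refers to did not survive in the transcript. Assuming they are the standard textbook formulas for the least-squares slope and intercept, they would read as follows, where $r$ is the correlation coefficient, $s_x$ and $s_y$ are the standard deviations of x and y, and $\bar{x}$ and $\bar{y}$ are their means:

$$
b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\,\bar{x}
$$

The second formula is the reason the fitted line always passes through the point $(\bar{x}, \bar{y})$.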

The equation completely describes the regression line. To plot the regression line you only need to choose two x values, put them into the prediction equation, calculate y, and draw the line that goes through those two points... or let JMP do it for you! NOTE: The regression line always passes through the point of means $(\bar{x}, \bar{y})$. The points you use for drawing the regression line are computed from the equation $\hat{y} = 0.125x - 41.4$: $0.125 \times 450 - 41.4 = 14.85$ and $0.125 \times 700 - 41.4 = 46.1$. So plot the points (450, 14.85) and (700, 46.1).
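The two-point recipe can be sketched in a few lines of Python. The slope and intercept come from the slide; the function name `predict` is a hypothetical helper chosen for this sketch.

```python
def predict(x, intercept=-41.4, slope=0.125):
    """Predicted y from the slide's fitted line: y-hat = 0.125x - 41.4."""
    return intercept + slope * x

# The two x values chosen on the slide; any two distinct x values would do.
points = [(x, round(predict(x), 2)) for x in (450, 700)]
print(points)
```

Drawing a straight line through the two resulting points reproduces the regression line.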

The distinction between explanatory and response variables is crucial in regression. If you exchange y for x in calculating the regression line, you will get a different line. Regression examines the distance of all points from the line in the y (vertical) direction only. Hubble telescope data about galaxies moving away from Earth: the two lines in the plot are the regression lines calculated either correctly (x = distance, y = velocity, solid line) or incorrectly (x = velocity, y = distance, dotted line).
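A small sketch shows why swapping the roles of x and y changes the line. The data here are made up for illustration (they are not the Hubble data), and `fit` is a hypothetical helper.

```python
def fit(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Invented data with some scatter (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

a0, a1 = fit(x, y)  # regress y on x: y-hat = a0 + a1*x
c0, c1 = fit(y, x)  # regress x on y: x-hat = c0 + c1*y

# Rewrite x = c0 + c1*y as y = (x - c0)/c1 so both lines share the same axes.
swapped_slope = 1 / c1
swapped_intercept = -c0 / c1

print(a1, swapped_slope)  # the two slopes differ unless the points lie exactly on a line
```

For this toy data the y-on-x line has slope 0.8, while the x-on-y line drawn on the same axes has slope 1.25. The two coincide only when the correlation is exactly +1 or -1.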

There is a positive linear relationship between the number of powerboats registered (in 1000s) and the number of manatee deaths. The least-squares regression line has the equation $\hat{y} = 0.125x - 41.4$. Thus, if we were to limit the number of powerboat registrations to 500,000, what could we expect for the number of manatee deaths? Substituting x = 500 gives $\hat{y} = 0.125(500) - 41.4 = 21.1$, or roughly 21 manatees.
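The one step that is easy to get wrong in this prediction is the unit of x, which is thousands of registrations. A minimal sketch, with `predicted_manatee_deaths` as a hypothetical helper name:

```python
def predicted_manatee_deaths(registrations):
    """Predicted deaths from the slide's line; x is in thousands of registrations."""
    x = registrations / 1000  # convert the raw count to thousands
    return 0.125 * x - 41.4

print(round(predicted_manatee_deaths(500_000), 1))
```

Passing 500,000 directly as x, without dividing by 1000, would give a wildly wrong answer.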

The least-squares regression line of y on x is the line that minimizes the sum of the squares of the vertical distances of the data points to the line. The equation of the least-squares line is usually written as $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ = the predicted value of y, $b_0$ = the intercept (predicted value of y when x = 0), and $b_1$ = the slope of the prediction line. The correlation coefficient, r, is related to the least-squares regression line as follows: the square of r ($r^2$) is equal to the fraction of the variation in the values of the response variable y that is explained by the least-squares regression of y on x. (See next slide.)
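The claim that $r^2$ equals the fraction of variation explained can be checked numerically. The sketch below uses invented data (illustration only) and a hypothetical helper `r_and_explained_fraction`. It computes r directly, then separately computes 1 minus the ratio of leftover (residual) variation to total variation in y.

```python
import math

def r_and_explained_fraction(xs, ys):
    """Return (r, fraction of the variation in y explained by the least-squares line)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

    r = sxy / math.sqrt(sxx * syy)

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Variation left over after fitting the line (sum of squared residuals).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    explained = 1 - sse / syy
    return r, explained

# Invented data (illustration only).
r, explained = r_and_explained_fraction([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
assert abs(r ** 2 - explained) < 1e-9
```

For this toy data r is 0.8, so the line accounts for 64% of the variation in y.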

Here are two plots of height (response) against age (explanatory) of some children: one with r = 0.994, r-square = 0.988, and one with r = 0.921, r-square = 0.848. Notice how $r^2$ relates to the variation in heights...

Now take the manatee data and regress the number of deaths against the year:

- What is the explanatory variable? The response variable?
- What does the scatterplot look like? How would you describe this relationship?
- What is the correlation coefficient, r?
- What is the interpretation of the square of r?
- What is the intercept? The slope?
- What is the meaning of these two quantities?
- How many manatee deaths would you predict in 2000? In 2009?
- Could you now reverse the roles of powerboat registration and manatee deaths to predict the number of boat registrations in 2009?

Homework:

- You should read through section 2.3. Make sure you understand the meaning of the slope and intercept quantities in the linear equations.
- Do #2.53 - 2.58, 2.62, 2.64, 2.66, 2.68, 2.73, 2.74. Use JMP to make the scatterplots, describe the association you see between the two variables, and then let JMP calculate the intercept, slope, and r-square for the regression line.
- With your own data, perform a meaningful regression and bring your interpretation to class on Wednesday.
