LSRLs: Interpreting r vs. r². r, the correlation coefficient, tells you the strength and direction of the linear relationship between two variables (x and y; for example, height v. weight, r = ?). -1 ≤ r ≤ 1. r does not describe non-linear relationships. r has no units (r ≠ .30 pounds). r is not resistant to outliers! Consider the effect of outliers when looking at r; report r both with and without the outliers. r is the same regardless of which variable is explanatory and which is response.
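
The properties of r listed on this slide can be checked numerically. The following is a minimal sketch, not part of the packet: the heights and weights are made-up illustration data, and `correlation` is a hypothetical helper written here from the standard definition of r.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r for paired data (standard definition)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds); NOT data from the packet.
heights = [60, 62, 64, 66, 68, 70]
weights = [105, 115, 122, 135, 142, 155]

r_xy = correlation(heights, weights)
r_yx = correlation(weights, heights)      # swap explanatory and response

# Same data with heights converted to centimeters: r has no units.
heights_cm = [h * 2.54 for h in heights]
r_cm = correlation(heights_cm, weights)

# Add one hypothetical outlier (a tall but very light student).
r_outlier = correlation(heights + [72], weights + [100])

print("r (height, weight):   ", round(r_xy, 3))
print("r (weight, height):   ", round(r_yx, 3))
print("r with heights in cm: ", round(r_cm, 3))
print("r with one outlier:   ", round(r_outlier, 3))
```

The first three values agree, while the single outlier pulls r down sharply, which is why r should be reported both with and without outliers.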
Understanding what is expected with LSRLs. Note: When finding the LSRL, the placement of the explanatory and response variables DOES matter! y_hat = _x + _ (prediction equation, equation of the line of best fit). It is found by minimizing the sum of the squares of the residuals. *Extra credit for manual calculation from the packet.
1. Find the LSRL using the calculator: stat->calc->8 or 4 (LinReg) gave the output for the packet examples: y_hat = a + bx (#8), y_hat = ax + b (#4).
2. Find the LSRL using the mathematical formula for minimizing a quadratic function (extra credit).
3. Find the LSRL using computer output.
4. Find the LSRL using b = r·s_y/s_x. (You are not given data; you are given statistics: s_y, s_x, x_bar, y_bar, and r.) Find b. Substitute it into y_bar = a + b·x_bar, since the LSRL always passes through (x_bar, y_bar). Solve for a. Substitute a and b into y_hat = a + bx and you are done.
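
Method 4 (finding the LSRL from summary statistics only) can be sketched as follows. This is an illustration under stated assumptions: the function name `lsrl_from_summary` and the summary statistics are hypothetical, and the code uses the y_hat = a + bx form, where b is the slope and a is the intercept.

```python
def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (a, b) for y_hat = a + b*x using only summary statistics."""
    b = r * s_y / s_x          # slope: b = r * s_y / s_x
    a = y_bar - b * x_bar      # line passes through (x_bar, y_bar)
    return a, b

# Hypothetical summary statistics (height in inches, weight in pounds).
a, b = lsrl_from_summary(x_bar=65.0, y_bar=130.0, s_x=4.0, s_y=20.0, r=0.8)

def predict(x):
    """Predicted y for a given x from the fitted line."""
    return a + b * x

print("slope b =", round(b, 3))
print("intercept a =", round(a, 3))
print("prediction at x_bar:", round(predict(65.0), 3))
```

With these made-up numbers the slope is 0.8 × 20 / 4 = 4 and the intercept is 130 − 4 × 65 = −130, so the prediction at x_bar comes back as y_bar.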
Simple understanding: r v. r². r, the correlation coefficient: strength and direction; only about the relationship between x and y; r is related to the slope of the LSRL (b = r·s_y/s_x). r², the coefficient of determination: how strong (accurate) is our LSRL? How much better is the LSRL at making a prediction than using y_bar alone?
Examining LSRLs: r v. r². Students' height v. weight: y_hat = 4.915x; predicted weight = 4.915(height); r = ; r² =
To answer the question in your packet: which is the better prediction equation (which would be more accurate in making a prediction)? The one with the highest r² value! The higher the value, the greater the percentage of variation in y that is explained by the LSRL of y on x.
Theory behind r²: It tells us how much better a line with a slope would be at predicting than the horizontal line y = y_bar. It compares the vertical deviations (residuals) from the sloped line with those from the horizontal line (y = y_bar) and tells how much better the sloped line is at accounting for this variation. This math and theory can be found in the book. You don't have to know the mathematical formulas for finding it for the AP Test or my test.
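
For anyone curious about the comparison described on this slide, here is a minimal sketch using the usual textbook form r² = 1 − SSE/SST. SSE is the sum of squared deviations from the sloped line and SST is the sum of squared deviations from y = y_bar. The data and the helper name `fit_lsrl` are hypothetical.

```python
def fit_lsrl(xs, ys):
    """Least-squares line y_hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Hypothetical heights (inches) and weights (pounds); NOT data from the packet.
heights = [60, 62, 64, 66, 68, 70]
weights = [105, 115, 122, 135, 142, 155]

a, b = fit_lsrl(heights, weights)
y_bar = sum(weights) / len(weights)

# Squared vertical deviations from the sloped line (LSRL).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(heights, weights))
# Squared vertical deviations from the horizontal line y = y_bar.
sst = sum((y - y_bar) ** 2 for y in weights)

r_squared = 1 - sse / sst
print("SSE (sloped line):     ", round(sse, 3))
print("SST (horizontal line): ", round(sst, 3))
print("r^2 = 1 - SSE/SST:     ", round(r_squared, 4))
```

The closer SSE is to zero relative to SST, the closer r² is to 1, meaning the sloped line accounts for nearly all of the variation that the horizontal line leaves unexplained.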
What You Should Know: Summary of r². r² tells us how accurate our LSRL is at making predictions. Do you think the x value in each observation tells you something about y? How much is it actually telling you? When r² = 1, we say “100% of the variation in weight is explained by the LSRL.” When r² = .64, we say “64% of the variation in weight is explained by the LSRL.” r² tells us the fraction of the variation in y that is explained by the LSRL of y on x. MUST USE THIS SPECIFIC LANGUAGE TO INTERPRET r² ON THE AP TEST AND MY TEST!!!
What is a residual? The vertical deviation of each observation from the LSRL: observed y minus predicted y_hat, “y - y_hat”. The residual values (the vertical deviations) are stored in your calculator each time you run a linear regression, LinReg a+bx. These residuals can be found in RESID in your calculator: 2nd->Stat->RESID.
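
The residual list that the calculator stores in RESID can be reproduced by hand. The sketch below assumes made-up height and weight data and a hypothetical helper `fit_lsrl`; each residual is computed as y − y_hat.

```python
def fit_lsrl(xs, ys):
    """Least-squares line y_hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Hypothetical heights (inches) and weights (pounds); NOT data from the packet.
heights = [60, 62, 64, 66, 68, 70]
weights = [105, 115, 122, 135, 142, 155]

a, b = fit_lsrl(heights, weights)

# One residual per observation: observed y minus predicted y_hat.
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]

for x, y, e in zip(heights, weights, residuals):
    print(f"x = {x}, y = {y}, y_hat = {a + b * x:.2f}, residual = {e:+.2f}")

# Residuals from a least-squares line sum to zero (up to rounding).
print("sum of residuals:", round(sum(residuals), 6))
```

Positive residuals are points above the line and negative residuals are points below it.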
What do the residuals tell us? The residuals tell us whether a line is a good fit for the data (a non-linear function, exponential or power, might fit the data better and help us predict better). How to create a residual plot: Plot x, the explanatory variable (L1), on the horizontal axis vs. y = RESID on the vertical axis (x vs. RESID). If the plot shows a pattern (not randomly scattered), then a line is not the best fit.
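
To see what "a pattern" in the residuals looks like, here is a small sketch that fits a line to deliberately curved, made-up data (y = x²) and lists the (x, residual) pairs that a residual plot would show. The helper name `fit_lsrl` is hypothetical.

```python
def fit_lsrl(xs, ys):
    """Least-squares line y_hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

# Hypothetical curved data: y = x^2, so a straight line is the wrong model.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# These (x, residual) pairs are the points of the residual plot.
for x, e in zip(xs, residuals):
    print(f"x = {x}, residual = {e:+.1f}")
```

Here the fitted line is y_hat = −12 + 8x and the residuals work out to 5, 0, −3, −4, −3, 0, 5: positive at both ends and negative in the middle. That U shape is a clear pattern, so a line is not the best model for these data even though the points rise steadily.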