The Big Idea
Plot the data on a scatterplot.
Interpret what you see: direction, form, strength, and outliers.
Numerical summary: mean of x and y, standard deviation of x and y, and r.
Least-squares regression line.
How well does it fit: r and r^2.

Vocabulary
Response Variable: output, dependent variable, y value
Explanatory Variable: input, independent variable, x value
Scatterplot: a mathematical diagram that shows values for two variables as points on a Cartesian plane; best used for quantitative data
Outlier: an observation that has a large residual

Vocabulary
Influential Point: a point that has a large effect on the slope of a regression line; it often has a small residual because it pulls the line toward itself
Correlation: a measure of the direction and strength of the linear relationship between two quantitative variables
Residual: the difference between the observed value of the response variable and the value predicted by the regression line

Vocabulary
Least-squares Regression Line: the line that makes the sum of the squared vertical distances of the data points from the line as small as possible
Sum of Squared Errors: a measure of the difference between the values estimated by the linear regression and the actual observations
Total Sum of Squares: a measure of the difference between the values on the horizontal line y = y-bar (the mean of the observed y values) and the actual observed values

Vocabulary
Coefficient of Determination: the fraction of the variation in the values of the response variable that can be explained by the LSRL of y on x
Residual Plot: a plot of the residuals against the explanatory variable
Extrapolation: the use of a regression line for prediction outside the range of values of the explanatory variable
Lurking Variable: a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables

Key Topics
Data: categorical and quantitative
Scatterplots and descriptions
Strong/weak, positive/negative, linear/not linear
Outliers and influential points
Creating the least-squares regression line
Calculating correlation and coefficient of determination

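Formulas
To calculate the correlation r: $r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
To calculate the slope, b, of the least-squares regression line: $b = r \frac{s_y}{s_x}$
To calculate the y-intercept: $a = \bar{y} - b\bar{x}$
To calculate the sum of squared errors, SSE: $SSE = \sum (y_i - \hat{y}_i)^2$
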
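Formulas
To calculate the total sum of squares, SSM: $SSM = \sum (y_i - \bar{y})^2$
To calculate the coefficient of determination: $r^2 = \frac{SSM - SSE}{SSM}$
Or the correlation r could be squared.
To calculate the residual: $\text{residual} = y - \hat{y}$ (observed y minus predicted y)
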
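As a rough check on the quantities listed on the formula slides, the following Python sketch computes r, b, a, SSE, SSM, and r^2. It assumes the standard textbook forms (r as the sum of products of standardized values divided by n - 1, b = r * s_y / s_x, a = y-bar - b * x-bar). The function names and the five-point data set are made up for illustration and are not part of the original slides.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """r: sum of products of standardized x and y values, divided by n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


def lsrl(x, y):
    """Return (a, b) for the line y-hat = a + b*x."""
    b = correlation(x, y) * sample_sd(y) / sample_sd(x)
    a = mean(y) - b * mean(x)
    return a, b


def sums_of_squares(x, y):
    """Return (SSE, SSM): squared distances from the LSRL and from y-bar."""
    a, b = lsrl(x, y)
    y_bar = mean(y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ssm = sum((yi - y_bar) ** 2 for yi in y)
    return sse, ssm


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = lsrl(x, y)
sse, ssm = sums_of_squares(x, y)
r = correlation(x, y)
r_squared = (ssm - sse) / ssm

print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}")
print(f"SSE = {sse:.4f}, SSM = {ssm:.4f}, r^2 = {r_squared:.4f}")
```

Both routes to the coefficient of determination, (SSM - SSE) / SSM and squaring r, should agree up to rounding.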
Calculator Key Strokes
To make a scatterplot with the calculator, first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Then push “2nd” “Y=” “ENTER” “ENTER”. Next push “ZoomStat” to view the scatterplot.
To overlay the least-squares regression line on the scatterplot, follow the steps for making a scatterplot and the steps for calculating the least-squares regression line (“STAT”, “CALC”, “8”). However, after pushing “8”, choose to store the regression equation by first selecting RegEQ:. Next, push “VARS”, scroll over to “Y-VARS”, and push “ENTER” twice. Push “ENTER” twice again to calculate the least-squares regression line. Next, push “ZoomStat” to view the scatterplot with the least-squares regression line overlaid.

Calculator Key Strokes
To calculate the least-squares regression line, r, and r^2, first push “MODE”. Scroll down to “Stat Diagnostics” and select “ON”. Push “ENTER”. Enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Press “STAT”, choose “CALC”, then push “8”. Push “ENTER” five times. The y-intercept, slope, r, and r^2 will be calculated.

Calculator Key Strokes
To create a residual plot in the calculator, first enter the explanatory variable data in L1. Then enter the corresponding response variable data in L2. Next, calculate the least-squares regression line. Then push “2nd” “Y=” “ENTER”. Turn on Plot1, make sure the scatterplot form is selected, and check that Xlist is L1. Ylist should be changed to Resid. This is done by selecting Ylist, then pushing “2nd” “STAT” “Resid”. Next push “ZoomStat” to view the residual plot.

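For readers without the calculator at hand, the same residuals can be computed in a few lines of Python. This is only a sketch: it uses the shoe-size data from the example problem in this deck, read as ten (shoe size, height) pairs, and prints the residuals as text instead of drawing a plot. The variable names are illustrative.

```python
# Shoe sizes (x) and heights in inches (y) from the example problem.
shoe_size = [7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10]
height = [64, 69, 71, 68, 71, 70, 72, 74, 77, 68]

n = len(shoe_size)
x_bar = sum(shoe_size) / n
y_bar = sum(height) / n

# Slope as the sum of cross-deviations over the sum of squared x-deviations,
# which is algebraically the same as r * s_y / s_x.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(shoe_size, height))
s_xx = sum((x - x_bar) ** 2 for x in shoe_size)
b = s_xy / s_xx
a = y_bar - b * x_bar

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(shoe_size, height)]

# Text stand-in for the residual plot: residuals listed in order of shoe size.
for x, e in sorted(zip(shoe_size, residuals)):
    print(f"shoe size {x:>4}: residual {e:+.2f}")
```

Residuals from a least-squares line sum to zero apart from rounding, so the thing to look for is a pattern (such as a curve), not the overall level. A patternless scatter around zero suggests a line is a reasonable model.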
Example Problem
With this data, find the LSRL. Start by entering the data into list 1 and list 2.
Shoe Size (men’s U.S.)    Height (in)
7                         64
10                        69
12                        71
8                         68
9.5                       71
10.5                      70
11                        72
12.5                      74
13.5                      77
10                        68

Example Problem
Results of the Regression
a = 53.24
b = 1.65
r^2 = .8422
r = .9177

Example Problem
Interpreting the intercept: When your shoe size is 0, you should be about 53.24 inches tall. Of course, this does not make much sense in the context of the problem.
Interpreting the slope: For each increase of 1 in shoe size, we would expect the height to increase by 1.65 inches.
Making predictions: How tall might you expect someone to be who has a shoe size of 12.5? Plug in 12.5: Height = 53.24 + 1.65(12.5) = 73.865 inches.

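The prediction step can be sketched in Python as follows. The intercept and slope are the rounded values from the regression results, and the shoe-size range (7 to 13.5) is taken from the example data. The function name and the extrapolation flag are illustrative additions, not part of the original slides.

```python
A = 53.24  # y-intercept from the regression results
B = 1.65   # slope from the regression results
X_MIN, X_MAX = 7, 13.5  # smallest and largest shoe sizes in the example data


def predict_height(shoe_size):
    """Return (predicted height in inches, True if this is an extrapolation)."""
    predicted = A + B * shoe_size
    is_extrapolation = not (X_MIN <= shoe_size <= X_MAX)
    return predicted, is_extrapolation


for size in (12.5, 0):
    predicted, extrapolated = predict_height(size)
    note = " (extrapolation, treat with caution)" if extrapolated else ""
    print(f"shoe size {size}: predicted height {predicted:.3f} in{note}")
```

A shoe size of 0 lies far outside the observed range, which is why the intercept has no sensible interpretation in this context.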
Helpful Hints
Our eyes are not good judges of how strong a linear relationship is.
Correlation requires that both variables be quantitative.
Correlation makes no distinction between explanatory and response variables.
r does not change when the units of measurement of x or y change.
The correlation r is always a number between -1 and 1.

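Two of these hints are easy to check numerically. The sketch below uses the shoe-size data from the example problem and an illustrative helper, pearson_r, written in the equivalent form "sum of cross-deviations over the square root of the product of the squared-deviation sums". It compares r with the variables swapped and with height converted from inches to centimeters.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation r between two equal-length lists of numbers."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


shoe_size = [7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10]
height_in = [64, 69, 71, 68, 71, 70, 72, 74, 77, 68]
height_cm = [h * 2.54 for h in height_in]  # same heights, different units

r_original = pearson_r(shoe_size, height_in)
r_swapped = pearson_r(height_in, shoe_size)  # roles of x and y exchanged
r_in_cm = pearson_r(shoe_size, height_cm)    # units of y changed

print(f"r (shoe size vs. height in inches): {r_original:.4f}")
print(f"r (variables swapped):              {r_swapped:.4f}")
print(f"r (height in centimeters):          {r_in_cm:.4f}")
```

All three values should match apart from floating-point rounding.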
Helpful Hints
Correlation measures the strength of linear relationships only.
The correlation is strongly affected by outliers.
Regression, unlike correlation, requires that we have an explanatory variable and a response variable.
The size of the LSRL slope does not determine how important a relationship is.
There is a close connection between correlation and the slope of the LSRL.

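To see the outlier effect and the link between r and the slope, the sketch below recomputes both for the shoe-size data from the example problem, then adds a single hypothetical point (shoe size 14, height 60 inches) that is not in the original data. The helper name summarize is illustrative.

```python
from math import sqrt


def summarize(x, y):
    """Return (r, b, s_x, s_y) for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    r = s_xy / sqrt(s_xx * s_yy)
    b = s_xy / s_xx                 # least-squares slope
    s_x = sqrt(s_xx / (n - 1))      # sample standard deviation of x
    s_y = sqrt(s_yy / (n - 1))      # sample standard deviation of y
    return r, b, s_x, s_y


shoe_size = [7, 10, 12, 8, 9.5, 10.5, 11, 12.5, 13.5, 10]
height = [64, 69, 71, 68, 71, 70, 72, 74, 77, 68]

r, b, s_x, s_y = summarize(shoe_size, height)
print(f"original data: r = {r:.4f}, b = {b:.4f}, r * s_y / s_x = {r * s_y / s_x:.4f}")

# Hypothetical outlier (NOT in the original data): large shoe size, short height.
r_out, b_out, _, _ = summarize(shoe_size + [14], height + [60])
print(f"with outlier:  r = {r_out:.4f}, b = {b_out:.4f}")
```

The slope equals r * s_y / s_x, and one unusual point is enough to pull both r and the slope down sharply.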
Helpful Hints
Do not forget to use y-hat in the equations. Write the equation in the form y-hat = a + bx.
Extrapolation produces unreliable predictions.
Lurking variables can make correlation misleading.
Correlations based on averages are usually too high when applied to individuals.
Association does not imply causation.