# Copyright © 2010 Pearson Education, Inc. Slide 8 - 1 The correlation between two scores X and Y equals 0.8. If both the X scores and the Y scores are converted.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 1 The correlation between two scores X and Y equals 0.8. If both the X scores and the Y scores are converted to z-scores, then the correlation between the z-scores for X and the z-scores for Y would be: a. -0.8 b. -0.2 c. 0.0 d. 0.2 e. 0.8
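The idea behind this question can be checked numerically: standardizing only shifts and rescales each variable, so the correlation should not change. The sketch below uses a small made-up data set (the values of `x` and `y` are illustrative assumptions, not from the slides) and computes r as the sum of z-score products divided by n - 1.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r_raw = correlation(x, y)                    # correlation of the raw scores
r_z = correlation(z_scores(x), z_scores(y))  # correlation of the z-scores

print(round(r_raw, 4), round(r_z, 4))  # the two values agree
```

Because the two results agree, a correlation of 0.8 between raw scores stays 0.8 after both variables are converted to z-scores.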

Copyright © 2010 Pearson Education, Inc. Slide 8 - 4 Fat Versus Protein: An Example The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu, with a correlation of 0.83:

Copyright © 2010 Pearson Education, Inc. Slide 8 - 5 The Linear Model The linear model (line of best fit, least squares line, regression line) is just an equation of a straight line through the data that shows us how the values are associated. Using this line we will be able to predict values. Predicted values are denoted $\hat{y}$ (also called y-hat). The hat tells you they are predicted values. The difference between the observed value and the predicted value is called the residual: residual = observed – predicted = $y - \hat{y}$

Copyright © 2010 Pearson Education, Inc. Slide 8 - 6 A negative residual means the predicted value is too big (an overestimate). A positive residual means the predicted value is too small (an underestimate). In the figure, the estimated fat of the BK Broiler chicken sandwich is 36 g, while the true value of fat is 25 g. What is the residual?
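As a quick check of the BK Broiler question, the residual is simply observed minus predicted. The helper name `residual` below is an illustrative assumption; the two numbers come from the slide.

```python
def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted

# BK Broiler chicken sandwich: true fat 25 g, predicted fat 36 g.
bk_broiler_residual = residual(observed=25, predicted=36)
print(bk_broiler_residual)  # -11
```

The residual is negative, so the model overestimates the fat content of this sandwich by 11 g.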

Copyright © 2010 Pearson Education, Inc. Slide 8 - 7 Best Fit Means Least Squares Some residuals are positive, others are negative, and, on average, they cancel each other out. To calculate how well the line fits the data we square the residuals (to eliminate the negatives) then find the sum of the squares. The smaller the sum, the better the fit. That is why another name is least squares line.
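A minimal sketch of the least squares idea, using a small made-up data set (an assumption, not data from the slides). For these four points the least squares line happens to be y-hat = 0 + 1.9x, so it is compared against an arbitrary competing line y-hat = 0 + 1.0x.

```python
def residuals(xs, ys, intercept, slope):
    """Observed minus predicted for each point, for the line intercept + slope * x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared residuals: smaller means a better fit."""
    return sum(e ** 2 for e in residuals(xs, ys, intercept, slope))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

best_fit_residuals = residuals(xs, ys, 0.0, 1.9)       # least squares line for this data
sse_best = sum_squared_residuals(xs, ys, 0.0, 1.9)     # about 0.70
sse_other = sum_squared_residuals(xs, ys, 0.0, 1.0)    # 25.0

print(round(sum(best_fit_residuals), 10))  # residuals cancel out, so the sum is about 0
print(round(sse_best, 2), sse_other)
```

The plain sum of residuals is useless as a measure of fit because it cancels to zero, while the sum of squares clearly separates the better line from the worse one.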

Copyright © 2010 Pearson Education, Inc. Slide 8 - 8 If the variables are standardized (z-scores, measured in standard deviations): The equation of the line of best fit is: $\hat{z}_y = r\,z_x$ Correlation (also called r) is the same for x and y because it is standardized. Therefore:

Copyright © 2010 Pearson Education, Inc. Slide 8 - 9 Example: A scatterplot of house prices vs. house size for houses shows a relationship that is straight, with only moderate scatter and no outliers. The correlation between house price and house size is 0.77. a. You go to an open house and find the house is 1 standard deviation above the mean in size. What would you guess about its price? b. You read an ad for a house priced 2 standard deviations below the mean. What would you guess about its size? c. A friend tells you about a house whose size in square meters (he's European) is 1.5 standard deviations above the mean. What would you guess about its size in square feet?
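One way to work the house example is with the standardized model, where the predicted z-score of one variable is r times the z-score of the other. The function name `predict_z` is an illustrative assumption; r = 0.77 comes from the slide.

```python
def predict_z(r, z_known):
    """Standardized regression: predicted z-score = r * known z-score."""
    return r * z_known

r = 0.77

# a. Size is 1 SD above the mean, so predict price.
z_price = predict_z(r, 1.0)   # 0.77 SD above the mean price

# b. Price is 2 SD below the mean, so predict size (r is the same in both directions).
z_size = predict_z(r, -2.0)   # 1.54 SD below the mean size

# c. Changing units (square meters to square feet) does not change a z-score,
#    so no regression is needed: the house is still 1.5 SD above the mean size.
z_size_sq_ft = 1.5

print(round(z_price, 2), round(z_size, 2), z_size_sq_ft)
```

Note that the predictions in parts a and b are pulled toward the mean (0.77 and 1.54 rather than 1 and 2), while part c is only a change of units.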

Copyright © 2010 Pearson Education, Inc. Slide 8 - 10 Sometimes we are given the regression line in REAL UNITS!!! The regression line for the Burger King data fits the data well. The equation is $\widehat{fat} = 6.8 + 0.97\,protein$ Example: What is the predicted fat content for a BK Broiler chicken sandwich (with 30 g of protein)?

Copyright © 2010 Pearson Education, Inc. Slide 8 - 11 To find the regression line (in real units): You may be given the standard deviations, correlation and means. OR …You may be given raw data.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 12 First make sure a regression is appropriate: Since regression and correlation are closely related, we need to check the same conditions for regressions as we did for correlations: Quantitative Variables Condition Straight Enough Condition (look at scatterplot) Outlier Condition (look at scatterplot)

Copyright © 2010 Pearson Education, Inc. Slide 8 - 13 To create the Regression Line in Real Units given the standard deviations, correlation (r), and means: You know the equation of a line. In Statistics we use a slightly different notation: $\hat{y} = b_0 + b_1 x$ We write $b_1$ and $b_0$ for the slope and intercept of the line. (The slope is always in units of y per unit of x.)
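The standard least squares formulas for building the line from summary statistics are $b_1 = r\,s_y / s_x$ for the slope and $b_0 = \bar{y} - b_1\bar{x}$ for the intercept. The sketch below applies them; the function name and the summary values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def regression_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x from summary statistics."""
    b1 = r * sd_y / sd_x          # slope, in units of y per unit of x
    b0 = mean_y - b1 * mean_x     # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

# Hypothetical summary statistics, for illustration only.
b0, b1 = regression_from_summary(r=0.5, mean_x=10.0, sd_x=2.0, mean_y=50.0, sd_y=8.0)
print(b0, b1)  # 30.0 2.0

# The fitted line always predicts mean_y at mean_x.
print(b0 + b1 * 10.0)  # 50.0
```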

Copyright © 2010 Pearson Education, Inc. Slide 8 - 14 To find a regression line (linear model) with raw data: Use your calculator! First, be sure to check: Quantitative, Linear (scatterplot), No outliers (scatterplot). If the data are not quantitative, not linear, or have outliers, you will NOT be able to model the data with a linear model.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 15 TI Tips: Equation of the Regression Line STAT, CALC, Choose LinReg(a + bx) (Not the first one … the second one … scroll down!) Specify that x and y are YR and TUIT (we put these in our calculator before.)

Copyright © 2010 Pearson Education, Inc. Slide 8 - 16 TI Tips: Equation of the Regression Line Graphed on the Scatterplot STAT, CALC, Choose LinReg(a + bx) (Not the first one … the second one … scroll down!) Specify that x and y are YR and TUIT (we put these in our calculator before.) We want the screen to say: LinReg(a+bx) YR, TUIT, Y1 (this will send the equation to Y1 and then we will see it on our graph) To add Y1 to the end: VARS, Y-VARS, 1:Function and choose Y1 ENTER See the equation. It has also been placed in Y1. Hit GRAPH.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 17 Example: Using the relationship between house price (in thousands of dollars) and house size (in thousands of square feet) the regression model is: predicted price = -3.117 + (94.454)(size) a. What is the slope and what does it mean? b. What are the units of the slope? c. Your house is 2000 square feet bigger than your neighbor's house. How much more do you expect it to be worth? d. Is the y-intercept of -3.117 meaningful? Explain.
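A short sketch of part c, assuming the model predicted price = -3.117 + 94.454 × size that the transcript quotes for this data (price in thousands of dollars, size in thousands of square feet). The variable names are illustrative.

```python
SLOPE = 94.454  # thousands of dollars per thousand square feet

# A difference of 2000 square feet is 2 units of size, because size is in thousands.
extra_size = 2000 / 1000
extra_price_thousands = SLOPE * extra_size

print(round(extra_price_thousands, 3))  # 188.908, i.e. about $188,908 more
```

The intercept does not enter this calculation, because only the difference between two predictions is needed.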

Copyright © 2010 Pearson Education, Inc. Slide 8 - 18 Example: The linear model relating hurricanes' wind speeds to their central pressures was: Predicted MaxWindSpeed = 955.27 - (0.897)CentralPressure Hurricane Katrina had a central pressure measured at 920 millibars. What does our regression model predict for her maximum wind speed? How good is that prediction, given that Katrina's actual wind speed was measured at 110 knots?
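The Katrina question can be worked directly from the model on the slide. The function name below is an illustrative assumption; the coefficients, pressure, and actual wind speed are the slide's values.

```python
def predicted_max_wind_speed(central_pressure_mb):
    """Model from the slide: wind speed in knots from central pressure in millibars."""
    return 955.27 - 0.897 * central_pressure_mb

katrina_predicted = predicted_max_wind_speed(920)   # about 130.03 knots
katrina_residual = 110 - katrina_predicted          # observed minus predicted, about -20.03

print(round(katrina_predicted, 2), round(katrina_residual, 2))
```

The negative residual means the model overestimates Katrina's wind speed by roughly 20 knots.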

Copyright © 2010 Pearson Education, Inc. Slide 8 - 19 More about Residuals In a scatterplot of all the residuals, the graph should be completely random! It should show no bends and should have no outliers.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 20 Draw examples of a residual graph that is not random.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 21 TI Tips – Residual Plots You look at the scatterplot to make sure it is linear. Sometimes it is hard to tell. After you do a regression do a residual plot. If the residual plot is completely random, you know your scatterplot was linear. The calculator automatically stores the residuals in a list named RESID after you run a regression. To look at them … STAT EDIT cursor over to RESID. To create the residual plot … STAT PLOT, Plot2, Xlist:YR and Ylist: RESID Y= may still have your regression line in it. You can either turn it off or remove it. ZoomStat Do you see a curve?

Copyright © 2010 Pearson Education, Inc. Slide 8 - 22 Example: Our linear model for homes uses the model: predicted price = -3.117 + (94.454)(size) a. Would you prefer to find a home with a negative or a positive residual? Explain. b. You plan to look for a home of about 3000 square feet. How much should you expect to have to pay? c. You find a nice home that size selling for \$300,000. What's the residual?
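Parts b and c can be sketched as follows, assuming (as stated for this data earlier in the deck) that price is in thousands of dollars and size is in thousands of square feet. The function name is illustrative.

```python
def predicted_price(size_thousand_sq_ft):
    """Model from the slide: price in thousands of dollars."""
    return -3.117 + 94.454 * size_thousand_sq_ft

# b. A 3000 square foot home is size = 3 (thousands of square feet).
expected_price = predicted_price(3.0)     # about 280.245, i.e. about $280,245

# c. The home sells for $300,000, i.e. 300 in thousands of dollars.
home_residual = 300.0 - expected_price    # about 19.755, i.e. about $19,755

print(round(expected_price, 3), round(home_residual, 3))
```

The residual is positive, so this home is priced about \$19,755 above what the model predicts; a buyer would generally prefer a negative residual.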

Copyright © 2010 Pearson Education, Inc. Slide 8 - 23 The Residual Standard Deviation The standard deviation of the residuals, $s_e$, measures how much the points spread around the regression line. Errors in predictions based on this model have a standard deviation of $s_e$ (in y units). We estimate the SD of the residuals using: $s_e = \sqrt{\dfrac{\sum e^2}{n - 2}}$
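A minimal sketch of the residual standard deviation, using the usual formula $s_e = \sqrt{\sum e^2 / (n - 2)}$. The residual values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def residual_sd(residuals):
    """Standard deviation of the residuals, dividing by n - 2."""
    n = len(residuals)
    return sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Hypothetical residuals from a fitted line (they sum to zero).
example_residuals = [0.1, 0.2, -0.7, 0.4]
s_e = residual_sd(example_residuals)

print(round(s_e, 4))  # about 0.5916
```

The divisor is n - 2 rather than n - 1 because two quantities (slope and intercept) were estimated from the data.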

Copyright © 2010 Pearson Education, Inc. Slide 8 - 24 $R^2$: The Variation Accounted For (cont.) All regression analyses include this statistic, although by tradition, it is written $R^2$ (pronounced R-squared). An $R^2$ of 0 means that none of the variance in the data is in the model; all of it is still in the residuals. When interpreting a regression model you need to tell what $R^2$ means: the percentage of variability in y that is explained by x. $R^2$ is always between 0% and 100%. What makes a good $R^2$ value depends on the kind of data you are analyzing and on what you want to do with it. Always report the slope and intercept for a regression, along with $R^2$, so that readers can judge for themselves how successful the regression is at fitting the data.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 25 Assumptions and Conditions Quantitative Variables Condition: Regression can only be done on two quantitative variables (and not two categorical variables). Straight Enough Condition: The linear model assumes that the relationship between the variables is linear. (check by scatterplot)

Copyright © 2010 Pearson Education, Inc. Slide 8 - 26 Assumptions and Conditions (cont.) If the scatterplot is not straight enough, stop here. You can only use a linear model on two variables that are related linearly! Some nonlinear relationships can be saved by re-expressing the data to make the scatterplot more linear.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 27 Assumptions and Conditions (cont.) It's a good idea to check linearity again after computing the regression, when we can examine the residuals. Does the Plot Thicken? Condition: Residual plots should be scattered. Don't confuse this with Normal Probability Plots from unit one (used to see if a distribution is normal), which should be a straight line.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 28 Assumptions and Conditions (cont.) Outlier Condition: Watch out for outliers. Outlying points can dramatically change a regression model. Outliers can even change the sign of the slope, misleading us about the underlying relationship between the variables.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 29 What Can Go Wrong? Don't fit a straight line to a nonlinear relationship. Beware extraordinary points (y-values that stand off from the linear pattern or extreme x-values). Don't extrapolate beyond the data; the linear model may no longer hold outside of the range of the data. Don't infer that x causes y just because there is a good linear model for their relationship; association is not causation. Don't choose a model based on $R^2$ alone.

Copyright © 2010 Pearson Education, Inc. Slide 8 - 30 A few IMPORTANT things to remember: The percentage of variability in y that is explained by x is $r^2$ (an example of this will be homework problem #7). Correlation = r = ± the square root of $r^2$ (you need to decide if it is + or – for a positive or negative correlation). residual = observed – predicted = $y - \hat{y}$. $R^2$ tells you how well the actual data fit the model (1 is perfect, zero is no correlation). $1 - r^2$ is the fraction of the original variance left in the residuals. Be careful not to use a regression to extrapolate (predict values beyond the scope/time frame of the model).
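The relationships between r and $R^2$ listed above can be sketched with the Burger King correlation of 0.83 quoted earlier in the deck. The helper `r_from_r_squared` is an illustrative assumption; it takes the sign of r from the sign of the slope, which is one way to "decide if it is + or –".

```python
from math import copysign, sqrt

def r_from_r_squared(r_squared, slope):
    """Recover r from R^2; the sign of r matches the sign of the slope."""
    return copysign(sqrt(r_squared), slope)

r = 0.83                       # correlation of fat vs. protein from the deck
r_squared = r ** 2             # about 0.6889: roughly 69% of the variability explained
left_in_residuals = 1 - r_squared   # about 0.3111: roughly 31% left in the residuals

print(round(r_squared, 4), round(left_in_residuals, 4))

# With a positive slope we recover +0.83; with a negative slope, -0.83.
print(round(r_from_r_squared(r_squared, slope=1.0), 2))
print(round(r_from_r_squared(r_squared, slope=-1.0), 2))
```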

Copyright © 2010 Pearson Education, Inc. Slide 8 - 31 Homework (Day 1) Pg. 192 1-33 odd (skip 9)
