Chapter 8: Prediction


1 Chapter 8: Prediction. Often with bivariate data, we want to know how well we can predict a Y value given a value of X. Example: With the Stress/Eating Difficulties data below, what is the expected level of eating difficulty for a stress level of 15? [Table: Stress (X) and Eating Difficulties (Y)]

2 Example: With the Stress/Eating Difficulties data below, what is the expected level of eating difficulty for a stress level of 15? The basic procedure is to draw the best-fitting line through the scatter plot and find the value of Y corresponding to X = 15. What does it mean to be the ‘best-fitting line’? [Figure: scatter plot of Eating Difficulties (Y) against Stress (X), with Y = ??? marked at X = 15]

3 A quick look back on the definitions of the mean and variance: it turns out that the mean is the value that minimizes the sum of squared deviations. This inequality: $\sum (X - \bar{X})^2 \le \sum (X - a)^2$ is true for all values of a. In other words, the mean is the ‘best-fitting’ value for the values of X in terms of the sum of squared deviations. Correspondingly, the best-fitting line is the line that minimizes the sum of squared differences between the line and each data point. This line is called the regression line.
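
The minimizing property of the mean can be checked numerically. The sketch below uses a small made-up sample (not the Stress/Eating Difficulties data) and a hypothetical helper, `sum_sq_dev`, to compare the sum of squared deviations around the mean with the sum around a few other candidate values of a.

```python
# Made-up sample for illustration only; not the data from the slides.
xs = [2.0, 4.0, 7.0, 11.0]


def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from a candidate value a."""
    return sum((x - a) ** 2 for x in values)


mean = sum(xs) / len(xs)
at_mean = sum_sq_dev(xs, mean)

# Every other candidate should give a sum at least as large as the mean does.
candidates = [mean - 2, mean - 0.5, mean + 0.5, mean + 2]
for a in candidates:
    print(f"a = {a}: {sum_sq_dev(xs, a)} >= {at_mean}")
```

A handful of candidates is not a proof, but it illustrates why the mean counts as the ‘best-fitting’ single value.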

4 [Figure: scatter plot of Eating Difficulties (Y) against Stress (X) with the best-fitting line, and Y = ??? marked at X = 15]

5 Quick review on slopes and intercepts: The ‘point-slope’ formula for a line is $Y = m(X - x_1) + y_1$, where m is the slope and $(x_1, y_1)$ is a point on the line. [Figure: a line through the point $(x_1, y_1)$ that rises by m for each increase of 1 in X]

6 The regression line passes through the means of X and Y, that is, through the point $(\bar{X}, \bar{Y})$. [Figure: scatter plot of Eating Difficulties (Y) against Stress (X) with the regression line, and Y = ??? marked at X = 15]

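7 The slope of the regression line is $m = r \frac{S_Y}{S_X}$, where r is the Pearson correlation and $S_X$ and $S_Y$ are the standard deviations of X and Y. Putting it together with the point-slope formula and the point $(\bar{X}, \bar{Y})$, the equation of the regression line is $Y' = r \frac{S_Y}{S_X}(X - \bar{X}) + \bar{Y}$, or equivalently $Y' = \left(r \frac{S_Y}{S_X}\right) X + \left(\bar{Y} - r \frac{S_Y}{S_X} \bar{X}\right)$, where the coefficient of X is the slope and the remaining term is the Y-intercept.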
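
As a rough sketch of this recipe, the function below computes the slope as r times the ratio of the standard deviations and then picks the intercept so that the line passes through the means. The function name `regression_line` and the data are made up for illustration. The ratio of the standard deviations is computed as the square root of the ratio of the sums of squares, which gives the same value whether the standard deviations divide by n or by n - 1.

```python
import math


def regression_line(xs, ys):
    """Return (slope, intercept) of the least-squares line Y' = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)   # sum of squares of X
    ss_y = sum((y - mean_y) ** 2 for y in ys)   # sum of squares of Y
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sp_xy / math.sqrt(ss_x * ss_y)          # Pearson correlation
    slope = r * math.sqrt(ss_y / ss_x)          # r * S_Y / S_X
    intercept = mean_y - slope * mean_x         # forces the line through the means
    return slope, intercept


# Made-up data for illustration only; not the data from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = regression_line(xs, ys)
print(f"Y' = {slope:.2f}X + {intercept:.2f}")
```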

8 Calculating the regression line in our example. [Table: columns n, X, Y, X², Y², XY, with a Totals row; summary values $SS_X$, $SS_Y = 662.4$, $r = 0.675$] Equation of regression line:

9 Original example question: What is the expected level of eating difficulty for a stress level of 15? Answer: Plug 15 in for X in the equation of the regression line. We expect an eating difficulty level of 12.2 for a stress level of 15. [Figure: scatter plot of Eating Difficulties (Y) against Stress (X) with the regression line, and Y = 12.2 marked at X = 15]

10 Another example: The ages of men and women in the US when they marry correlate with a value of r = 0.85. Suppose that the average age at marriage for women is 25.1 years with a standard deviation of 5 years, and the average age at marriage for men is 26.8 years with a standard deviation of 6 years (these numbers are made up). What is the expected age of a groom for a 22-year-old bride? Answer: We need to calculate the regression line with X = women and Y = men, and evaluate it for X = 22. The regression line is $Y' = 0.85 \cdot \frac{6}{5}(X - 25.1) + 26.8 = 1.02X + 1.20$. Plugging in 22 for X: $Y' = 1.02(22) + 1.20 \approx 23.6$. The expected age of a groom for a 22-year-old bride is 23.6 years.
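
The arithmetic for this example can be reproduced from the summary statistics alone. The sketch below assumes the slope is r times the ratio of the standard deviations (S_Y over S_X) and that the line passes through the two means; the helper name `line_from_summary` is made up for illustration.

```python
def line_from_summary(r, mean_x, s_x, mean_y, s_y):
    """Regression line Y' = slope * X + intercept from summary statistics."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x   # line passes through (mean_x, mean_y)
    return slope, intercept


# X = bride's age, Y = groom's age, using the (made-up) numbers from the example.
slope, intercept = line_from_summary(r=0.85, mean_x=25.1, s_x=5.0,
                                     mean_y=26.8, s_y=6.0)
groom_age = slope * 22 + intercept

print(f"Y' = {slope:.2f}X + {intercept:.2f}")
print(f"Expected groom age for a 22-year-old bride: {groom_age:.1f}")
```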

11 Here’s what a typical sample of 200 couples would look like with r = 0.85. [Figure: scatter plot of Age of Groom against Age of Bride, r = 0.85, with regression line Y' = 1.02X + 1.20]

12 Here are typical distributions for extreme values of r. [Figure: Age of Groom against Age of Bride, r = 1.00, regression line Y' = 1.20X - 3.32] [Figure: Age of Groom against Age of Bride, r = 0.00, regression line Y' = 0.00X + 26.8] Look what happens to the regression line. When r = 1, we can perfectly predict the groom’s age from the bride’s age, using only the means and standard deviations. When the correlation is zero, we can’t say anything about the groom’s age based on the bride’s, so our best estimate is the mean age of the groom (26.8 years).
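
To see how the regression line changes with the correlation, the sketch below recomputes the line for r = 0, 0.85, and 1 while holding the means and standard deviations from the marriage example fixed. It assumes the slope is r times S_Y over S_X and that the line passes through the means; the helper name `line_from_summary` is made up for illustration.

```python
def line_from_summary(r, mean_x, s_x, mean_y, s_y):
    """Regression line Y' = slope * X + intercept from summary statistics."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Means and standard deviations from the (made-up) marriage example.
MEAN_BRIDE, SD_BRIDE = 25.1, 5.0
MEAN_GROOM, SD_GROOM = 26.8, 6.0

lines = {}
for r in (0.0, 0.85, 1.0):
    lines[r] = line_from_summary(r, MEAN_BRIDE, SD_BRIDE, MEAN_GROOM, SD_GROOM)
    slope, intercept = lines[r]
    print(f"r = {r:.2f}: Y' = {slope:.2f}X + ({intercept:.2f})")
```

With r = 0 the slope vanishes and the prediction is the groom's mean age for every bride; as r grows, the slope approaches the ratio of the standard deviations.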

13 $S_{YX}$: the standard error of the estimate. The regression line is the ‘best-fitting’ line through a bivariate distribution in terms of minimizing the sum of squared deviations between the values of Y and the line. How can we measure how good this fit is? A natural measure is the standard error of the estimate, $S_{YX}$, which is built from the deviations (Y - Y') between each data point and the line. This is just like a standard deviation: it’s a measure of the average deviation between the line and the data. [Figure: scatter plot of Eating Difficulties (Y) against Stress (X) with the regression line and a deviation (Y - Y') marked]

14 Another way of calculating $S_{YX}$ is $S_{YX} = S_Y \sqrt{1 - r^2}$. This makes intuitive sense: how well the data fit the line is a combination of the variability in Y ($S_Y$) and the correlation (r). If the correlation is perfect (r = 1), then $S_{YX} = 0$. If the correlation is zero, then $S_{YX} = S_Y$. Notice that $S_{YX}$ is always less than or equal to $S_Y$. [Figures: two scatter plots of Age of Groom against Age of Bride, including one with r = 1]
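
The two routes to the standard error of the estimate can be compared on a small made-up data set. The sketch below assumes the shortcut is S_Y multiplied by the square root of (1 - r²), which matches the limiting cases described above, and it assumes that both S_Y and S_YX divide by n; the slides' exact denominator is not recoverable here, and the two routes agree only when both use the same one.

```python
import math

# Made-up data for illustration only; not the data from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sp_xy / math.sqrt(ss_x * ss_y)      # Pearson correlation
slope = sp_xy / ss_x                    # algebraically equal to r * S_Y / S_X
intercept = mean_y - slope * mean_x

# Assumption: standard deviations here divide by n.
s_y = math.sqrt(ss_y / n)

# Route 1: directly from the deviations Y - Y' around the regression line.
predicted = [slope * x + intercept for x in xs]
ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))
s_yx_direct = math.sqrt(ss_resid / n)

# Route 2: the shortcut using S_Y and r.
s_yx_shortcut = s_y * math.sqrt(1 - r ** 2)

print(f"direct:   {s_yx_direct:.4f}")
print(f"shortcut: {s_yx_shortcut:.4f}")
```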

15 [Table: columns n, X, Y, with Totals and Means rows; summary values $SS_X$, $SS_Y = 662.4$, $r = 0.675$] Example: Calculating $S_{YX}$ using

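16 Another example of calculating r and $S_{YX}$. [Table: columns n, X, Y, Y', Y - Y', (Y - Y')², with Totals and Means rows; summary labels $SS_X$, $S_X$, $SS_Y$, $S_Y$, $S_{YX}$; recoverable values 21.58, r = 0.145, slope = 0.186]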

