# Chapter 8: Prediction


Often with bivariate data, we want to know how well we can predict a Y value given a value of X.

Example: With the Stress/Eating Difficulties data below, what is the expected level of eating difficulty for a stress level of 15?

| Stress (X) | Eating Difficulties (Y) |
|---|---|
| 17 | 9 |
| 8 | 13 |
| 8 | 7 |
| 20 | 18 |
| 14 | 11 |
| 7 | 2 |
| 21 | 5 |
| 22 | 15 |
| 19 | 26 |
| 30 | 28 |

[Figure: scatter plot of Eating Difficulties (Y) against Stress (X)]

The basic procedure is to draw the best-fitting line through the scatter plot and find the value of Y corresponding to X = 15. What does it mean to be the ‘best-fitting line’?

[Figure: scatter plot of Eating Difficulties against Stress with a line drawn through it; X = 15 is marked and the corresponding Y is unknown]

A quick look back at the definitions of the mean and variance:

$$\bar{X} = \frac{\sum X}{n}, \qquad S_X^2 = \frac{\sum (X - \bar{X})^2}{n}$$

It turns out that the mean is the value that minimizes the sum of squared deviations. This inequality:

$$\sum (X - \bar{X})^2 \le \sum (X - a)^2$$

is true for all values of $a$. In other words, the mean is the ‘best-fitting’ value for the values of X in terms of the sum of squared deviations.

Correspondingly, the best-fitting line is the line that minimizes the sum of squared differences between the line and each data point. This line is called the regression line.
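The minimizing property of the mean can be checked numerically. The sketch below is only an illustration: it uses the ten stress scores from the example, and the helper name `sum_sq_dev` and the handful of candidate values for `a` are choices made here, not part of the slides.

```python
# Stress scores (X) from the Stress/Eating Difficulties example.
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]


def sum_sq_dev(values, a):
    """Sum of squared deviations of the values from a candidate point a."""
    return sum((x - a) ** 2 for x in values)


mean = sum(stress) / len(stress)

# Try the mean itself and a few values on either side of it.
candidates = [mean - 2, mean - 1, mean, mean + 1, mean + 2]
for a in candidates:
    print(f"a = {a:5.1f}  sum of squared deviations = {sum_sq_dev(stress, a):.1f}")
```

The smallest total is reached at the mean (16.6), and moving `a` away from the mean in either direction makes the total larger.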

[Figure: the regression line drawn through the Stress/Eating Difficulties scatter plot, with X = 15 marked and the corresponding Y to be found]

Quick review of slopes and intercepts: The ‘point-slope’ formula for a line is:

$$Y = m(X - x_1) + y_1$$

where $m$ is the slope and $(x_1, y_1)$ is a point on the line.

[Figure: a line through the point (x1, y1) that rises m units for every 1-unit step in X]

The regression line passes through the means of X and Y, that is, through the point $(\bar{X}, \bar{Y})$.

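The slope of the regression line is:

$$m = r\,\frac{S_Y}{S_X}$$

where $r$ is the Pearson correlation, and $S_X$ and $S_Y$ are the standard deviations of X and Y.

Putting it together with the point-slope formula and the point $(\bar{X}, \bar{Y})$, the equation of the regression line is:

$$Y' = r\,\frac{S_Y}{S_X}\,(X - \bar{X}) + \bar{Y} = \underbrace{r\,\frac{S_Y}{S_X}}_{\text{slope}}\,X + \underbrace{\left(\bar{Y} - r\,\frac{S_Y}{S_X}\,\bar{X}\right)}_{\text{Y-intercept}}$$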
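As a minimal sketch of how the slope and Y-intercept follow from the summary statistics, the function below takes the correlation, the two means, and the two standard deviations. The function name `regression_line` and the input numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def regression_line(r, mean_x, s_x, mean_y, s_y):
    """Return (slope, intercept) of the regression line for predicting Y from X."""
    slope = r * s_y / s_x                 # slope = r * S_Y / S_X
    intercept = mean_y - slope * mean_x   # forces the line through (mean_x, mean_y)
    return slope, intercept


# Hypothetical summary statistics, for illustration only.
slope, intercept = regression_line(r=0.5, mean_x=10.0, s_x=2.0, mean_y=50.0, s_y=8.0)
print(f"Y' = {slope:.2f}X + {intercept:.2f}")

# Evaluating the line at the mean of X returns the mean of Y.
print(slope * 10.0 + intercept)
```

With these inputs the slope is 2 and the intercept is 30, so the line evaluated at the mean of X (10) gives the mean of Y (50).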

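Calculating the regression line in our example (n = 10):

| | X | Y | X² | Y² | XY |
|---|---|---|---|---|---|
| | 17 | 9 | 289 | 81 | 153 |
| | 8 | 13 | 64 | 169 | 104 |
| | 8 | 7 | 64 | 49 | 56 |
| | 20 | 18 | 400 | 324 | 360 |
| | 14 | 11 | 196 | 121 | 154 |
| | 7 | 2 | 49 | 4 | 14 |
| | 21 | 5 | 441 | 25 | 105 |
| | 22 | 15 | 484 | 225 | 330 |
| | 19 | 26 | 361 | 676 | 494 |
| | 30 | 28 | 900 | 784 | 840 |
| Totals | 166 | 134 | 3248 | 2458 | 2610 |

$SS_X = 492.4$, $SS_Y = 662.4$, $r = 0.675$

Equation of regression line: the means are $\bar{X} = 16.6$ and $\bar{Y} = 13.4$, and because $S_Y / S_X = \sqrt{SS_Y / SS_X}$, the slope is $0.675 \times \sqrt{662.4 / 492.4} \approx 0.783$ and the Y-intercept is $13.4 - 0.783 \times 16.6 \approx 0.40$:

$$Y' = 0.783X + 0.40$$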
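The table's column totals are enough to recompute the sums of squares, the correlation, and the regression line. The sketch below uses the ten (X, Y) pairs as they are listed in the computation table. The sum-of-products quantity `sp_xy` is not named in the slides; it comes from the standard computational formula for the Pearson correlation and is an assumption here about how r was obtained.

```python
import math

# (stress, eating difficulty) pairs as listed in the computation table.
pairs = [(17, 9), (8, 13), (8, 7), (20, 18), (14, 11),
         (7, 2), (21, 5), (22, 15), (19, 26), (30, 28)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_y2 = sum(y * y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Sums of squares and sum of products from the column totals.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
sp_xy = sum_xy - sum_x * sum_y / n

r = sp_xy / math.sqrt(ss_x * ss_y)

# Standard deviations with n in the denominator.
s_x = math.sqrt(ss_x / n)
s_y = math.sqrt(ss_y / n)

slope = r * s_y / s_x
intercept = sum_y / n - slope * sum_x / n

print(f"SS_X = {ss_x:.1f}, SS_Y = {ss_y:.1f}, r = {r:.3f}")
print(f"Y' = {slope:.3f}X + {intercept:.3f}")
```

The printed sums of squares and correlation can be compared directly with the values listed under the table.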

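Original example question: What is the expected level of eating difficulty for a stress level of 15?

Answer: Plug 15 in for X in the equation of the regression line. With the slope and intercept that follow from the computation table ($r = 0.675$, $SS_X = 492.4$, $SS_Y = 662.4$, means 16.6 and 13.4):

$$Y' = 0.783(15) + 0.40 \approx 12.15$$

We expect an eating difficulty level of about 12.15 for a stress level of 15.

[Figure: the regression line on the Stress/Eating Difficulties scatter plot, with the predicted Y marked at X = 15]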

Another example: The ages of men and women in the US when they marry correlate with a value of r = 0.85. Suppose that the average age at marriage for women is 25.1 years with a standard deviation of 5 years, and the average age at marriage for men is 26.8 years with a standard deviation of 6 years. What is the expected age of a groom for a 22-year-old bride? (These numbers are made up.)

Answer: We need to calculate the regression line with X = women and Y = men, and evaluate it for X = 22. The slope is $0.85 \times 6 / 5 = 1.02$ and the Y-intercept is $26.8 - 1.02 \times 25.1 \approx 1.20$, so:

$$Y' = 1.02X + 1.20$$

Plugging in 22 for X:

$$Y' = 1.02(22) + 1.20 \approx 23.6$$

The expected age of a groom for a 22-year-old bride is 23.6 years.

Here’s what a typical sample of 200 couples would look like with r = 0.85.

[Figure: scatter plot of Age of Groom against Age of Bride; r = 0.85, regression line Y' = 1.02X + 1.20]

Here are typical distributions for extreme values of r.

[Figure, two panels of Age of Groom against Age of Bride. Correlation of r = 1.0: regression line Y' = 1.20X - 3.32. Correlation of r = 0.0: regression line Y' = 0.00X + 26.8]

Look what happens to the regression line. When r = 1, we can predict the groom’s age perfectly from the bride’s age, using only the means and standard deviations. When the correlation is zero, the bride’s age tells us nothing about the groom’s age, so our best estimate is the mean age of the groom (26.8 years).
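The effect of r on the regression line can be reproduced from the made-up marriage statistics alone. In the sketch below, the helper `regression_line` and the constant names are illustrative choices; the means, standard deviations, and the three values of r are the ones used in the example.

```python
def regression_line(r, mean_x, s_x, mean_y, s_y):
    """Return (slope, intercept) of the regression line for predicting Y from X."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up statistics from the marriage example (X = bride, Y = groom).
BRIDE_MEAN, BRIDE_SD = 25.1, 5.0
GROOM_MEAN, GROOM_SD = 26.8, 6.0

lines = {}
for r in (0.0, 0.85, 1.0):
    slope, intercept = regression_line(r, BRIDE_MEAN, BRIDE_SD, GROOM_MEAN, GROOM_SD)
    lines[r] = (slope, intercept)
    predicted = slope * 22 + intercept  # expected groom age for a 22-year-old bride
    print(f"r = {r:.2f}: Y' = {slope:.2f}X {intercept:+.2f}, "
          f"prediction at X = 22: {predicted:.1f}")
```

As r shrinks toward zero the slope flattens and the prediction for any bride's age moves toward the grooms' mean of 26.8.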

$S_{YX}$: the standard error of the estimate

The regression line is the ‘best-fitting’ line through a bivariate distribution in terms of minimizing the sum of squared deviations between the values of Y and the line. How can we measure how good this fit is? A natural measure is the standard error of the estimate:

$$S_{YX} = \sqrt{\frac{\sum (Y - Y')^2}{n}}$$

This is just like a standard deviation. It’s a measure of the average deviation between the line and the data.

[Figure: Stress/Eating Difficulties scatter plot with the regression line; the vertical distance (Y - Y') between a data point and the line is marked]
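A minimal sketch of the standard error of the estimate computed directly from the deviations between each Y and the line, using the stress data as listed in the computation table. Two assumptions are made here: the sum of squared deviations is divided by n, matching the way the standard deviations are computed elsewhere in the chapter, and the slope is written as the sum of products over $SS_X$, which is algebraically the same as $r\,S_Y / S_X$.

```python
import math

# (stress, eating difficulty) pairs as listed in the computation table.
pairs = [(17, 9), (8, 13), (8, 7), (20, 18), (14, 11),
         (7, 2), (21, 5), (22, 15), (19, 26), (30, 28)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Regression line through the point of means.
ss_x = sum((x - mean_x) ** 2 for x, _ in pairs)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x

# Deviation of each observed Y from its predicted value Y'.
residuals = [y - (slope * x + intercept) for x, y in pairs]

# Root of the mean squared deviation from the line (denominator n).
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / n)
print(f"S_YX = {s_yx:.2f}")
```

The deviations sum to zero, just as deviations from a mean do, which is why they are squared before averaging.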

Another way of calculating $S_{YX}$ is:

$$S_{YX} = S_Y\sqrt{1 - r^2}$$

This makes intuitive sense. How well the data fit the line depends on both the variability in Y ($S_Y$) and the correlation ($r$).

[Figure, two panels of Age of Groom against Age of Bride: r = 0 and r = 1]

If the correlation is perfect (r = 1), then $S_{YX} = 0$. If the correlation is zero, then $S_{YX} = S_Y$. Notice that $S_{YX}$ is always less than or equal to $S_Y$.
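The shortcut can be tried on the summary values from the stress example ($SS_Y = 662.4$, n = 10, r = 0.675), together with the two extreme correlations. The function name `std_error_of_estimate` is a hypothetical helper introduced for this sketch.

```python
import math


def std_error_of_estimate(s_y, r):
    """Standard error of the estimate from S_Y and the correlation r."""
    return s_y * math.sqrt(1 - r ** 2)


# S_Y for the eating-difficulties scores: sqrt(SS_Y / n).
s_y = math.sqrt(662.4 / 10)

for r in (0.0, 0.675, 1.0):
    print(f"r = {r:.3f}: S_YX = {std_error_of_estimate(s_y, r):.3f}  (S_Y = {s_y:.3f})")
```

At r = 0 the result equals $S_Y$, at r = 1 it is zero, and for the observed correlation it falls in between.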

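Example: Calculating $S_{YX}$ using $S_{YX} = S_Y\sqrt{1 - r^2}$ (n = 10):

| | X | Y |
|---|---|---|
| | 17 | 9 |
| | 8 | 13 |
| | 8 | 7 |
| | 20 | 18 |
| | 14 | 11 |
| | 7 | 2 |
| | 21 | 5 |
| | 22 | 15 |
| | 19 | 26 |
| | 30 | 28 |
| Totals | 166 | 134 |
| Means | 16.6 | 13.4 |

$SS_X = 492.4$, $SS_Y = 662.4$, $r = 0.675$

With $S_Y = \sqrt{SS_Y / n} = \sqrt{662.4 / 10} \approx 8.14$:

$$S_{YX} = 8.14 \times \sqrt{1 - 0.675^2} \approx 8.14 \times 0.738 \approx 6.0$$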

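Another example of calculating r and $S_{YX}$ (n = 10):

| | X | Y | Y' | Y - Y' | (Y - Y')² |
|---|---|---|---|---|---|
| | 22 | 22 | 42.79 | -21.25 | 451.61 |
| | 5 | 11 | 39.80 | -28.56 | 815.43 |
| | 73 | 67 | 52.46 | 14.21 | 201.86 |
| | 60 | 10 | 50.01 | -39.81 | 1584.86 |
| | 22 | 79 | 42.93 | 36.41 | 1325.33 |
| | 13 | 60 | 41.14 | 18.68 | 349.05 |
| | 17 | 48 | 41.98 | 6.27 | 39.26 |
| | 9 | 3 | 40.42 | -37.55 | 1410.03 |
| | 11 | 63 | 40.92 | 21.69 | 470.60 |
| | 30 | 74 | 44.37 | 29.91 | 894.78 |
| Totals | 263 | 437 | | 0 | 7543 |
| Means | 26.32 | 43.68 | | | |

$SS_X = 4657.6$, $S_X = 21.58$, $SS_Y = 7704.504$, $S_Y = 27.76$

$r = 0.145$, slope $= 0.186$, $S_{YX} = 27.46$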
