Regression Line A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x. Chapter 15
Linear Regression Objective: To quantify the linear relationship between an explanatory variable and a response variable. We can then predict the average response for all subjects with a given value of the explanatory variable. Regression equation: y = a + bx – x is the value of the explanatory variable. – y is the average value of the response variable. – a is the intercept and b is the slope of the line.
Figure 15.1 Using a straight-line pattern for prediction, for Example 1. The data are the lengths of two bones in 5 fossils of the extinct beast Archaeopteryx.
Figure 15.2 A weaker straight-line pattern. The data are the percentage in each state who voted Democratic in the two Reagan presidential elections.
Least Squares Least squares is the method used to determine the “best” line. We want the line to be as close as possible to the data points in the vertical (y) direction, since y is what we are trying to predict. Least Squares: Use the line that minimizes the sum of the squares of the vertical distances of the data points from the line.
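The least-squares idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the figures: the data values are hypothetical, and the function names `least_squares_line` and `sum_squared_errors` are made up for this example. It uses the standard closed-form solution for the slope and intercept.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the
    sum of squared vertical distances from the data points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only (not from the figures).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
# Any other line, such as one with a shifted intercept, has a larger sum.
shifted = sum_squared_errors(xs, ys, a + 0.1, b)

print(f"y = {a:.2f} + {b:.2f}x")
print(f"sum of squares: best line {best:.3f}, shifted line {shifted:.3f}")
```

Comparing the fitted line with a slightly shifted one is a quick check that the fitted line really does give the smaller sum of squares.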
Figure 15.3 A regression line aims to predict y from x. So a good regression line makes the vertical distances from the data points to the line small.
Prediction via Regression Line: Ages of Husband and Wife The regression equation is: y = 3.6 + 0.97x – y is the average age of all husbands who have wives of age x. For all women aged 30, we predict the average husband age to be 32.7 years: 3.6 + (0.97)(30) = 32.7 years
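The prediction on this slide is simple arithmetic, shown here as a small Python sketch. The coefficients 3.6 and 0.97 come from the slide; the function name `predict_husband_age` is invented for this example.

```python
def predict_husband_age(wife_age):
    """Predicted average husband age (years) for wives of the given age,
    using the slide's regression equation y = 3.6 + 0.97x."""
    return 3.6 + 0.97 * wife_age


prediction = predict_husband_age(30)
print(f"Predicted average husband age for wives aged 30: {prediction:.1f} years")
```

The result is an average over all husbands with wives of that age, not a prediction for any one couple.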
Coefficient of Determination (r^2) Measures the usefulness of regression prediction. r^2, the square of the correlation, measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line. – r = 1: r^2 = 1: the regression line explains all (100%) of the variation in y. – r = 0.7: r^2 = 0.49: the regression line explains almost half (49%) of the variation in y.
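To see why r^2 can be read as "fraction of variation explained", the sketch below computes it two ways on hypothetical data: by squaring the correlation, and as 1 minus the ratio of leftover variation around the least-squares line to total variation in y. The data and the function names are assumptions made for illustration.

```python
import math


def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def explained_fraction(xs, ys):
    """1 - (variation left around the least-squares line) / (total variation in y)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx              # least-squares slope
    a = y_mean - b * x_mean    # least-squares intercept
    leftover = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    total = sum((y - y_mean) ** 2 for y in ys)
    return 1 - leftover / total


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.5, 4.0, 7.5, 6.0, 9.0]

r = correlation(xs, ys)
r_squared = r ** 2
explained = explained_fraction(xs, ys)

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, explained fraction = {explained:.3f}")
```

For a least-squares line the two quantities agree, which is what justifies the "percentage of variation explained" reading.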
Beware of Extrapolation Sarah’s height was plotted against her age. Can you predict her height at age 42 months? Can you predict her height at age 30 years (360 months)?
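Beware of Extrapolation Regression line: y = 71.95 + 0.383x. Height at age 42 months? y = 88 cm. Height at age 30 years (360 months)? y = 209.8 cm. – She is predicted to be 6' 10.5" at age 30, an implausible result that shows why predictions far outside the range of the observed ages cannot be trusted.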
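The two predictions can be reproduced with a short Python sketch. The coefficients come from the slide; the helper names `predicted_height_cm` and `cm_to_feet_inches` are invented for this example, and the conversion uses 2.54 cm per inch.

```python
def predicted_height_cm(age_months):
    """Height predicted by the slide's regression line y = 71.95 + 0.383x."""
    return 71.95 + 0.383 * age_months


def cm_to_feet_inches(cm):
    """Convert centimeters to (whole feet, remaining inches)."""
    total_inches = cm / 2.54
    feet = int(total_inches // 12)
    inches = total_inches - 12 * feet
    return feet, inches


for age in (42, 360):
    height = predicted_height_cm(age)
    feet, inches = cm_to_feet_inches(height)
    print(f"age {age} months: {height:.1f} cm, about {feet} ft {inches:.1f} in")
```

The line itself gives no warning at 360 months. Nothing in the arithmetic signals that the second input lies far beyond the ages that were observed, so the check has to come from the analyst.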
Correlation Does Not Imply Causation Even very strong correlations may not correspond to a real causal relationship.
Evidence of Causation A properly conducted experiment establishes the connection. Other considerations: – A reasonable explanation for a cause and effect exists. – The connection happens in repeated trials. – The connection happens under varying conditions. – Potential confounding factors are ruled out. – The alleged cause precedes the effect in time.
Reasons Two Variables May Be Related (Correlated) Explanatory variable causes change in response variable. Response variable causes change in explanatory variable. Explanatory variable may be a cause, but is not the sole cause, of changes in the response variable. Confounding variables may exist. Both variables may result from a common cause, for example when both variables change over time. The correlation may be merely a coincidence.
Explanatory is not Sole Contributor Explanatory: Consumption of barbecued foods. Response: Incidence of stomach cancer. Barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute.
Common Response (both variables change due to common cause) Explanatory: Divorce among men. Response: Percent abusing alcohol. Both may result from an unhappy marriage.
The Relationship May Be Just a Coincidence Some strong correlations (or apparent associations) occur just by chance, even when the variables are not related in the population.
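A small simulation can make the coincidence point concrete. This sketch is an illustration only, and every setting in it is an assumption chosen for the example: the sample size of 5, the 1000 trials, the 0.7 cutoff, and the random seed. It draws pairs of variables that are unrelated by construction and counts how often the sample correlation still looks strong.

```python
import math
import random


def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 1000            # number of simulated small studies
sample_size = 5          # observations per study

strong_count = 0
for _ in range(trials):
    # x and y are generated independently, so they are unrelated in the population.
    xs = [rng.gauss(0, 1) for _ in range(sample_size)]
    ys = [rng.gauss(0, 1) for _ in range(sample_size)]
    if abs(correlation(xs, ys)) > 0.7:
        strong_count += 1

print(f"{strong_count} of {trials} unrelated samples had |r| > 0.7")
```

With so few observations per sample, strong-looking correlations turn up regularly even though no relationship exists. Larger samples make such coincidences much rarer.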
Key Concepts Least squares; regression equation; r^2; correlation does not imply causation; confirming causation; reasons variables may be correlated.
Cautions about Correlation and Regression Both describe only linear relationships. Both are affected by outliers. Always plot the data before interpreting. Beware of extrapolation. – Extrapolation is predicting outside the range of x in the data. Beware of lurking variables. – They have an important effect on the relationship among the variables in a study, but are not included in the study. Association does not imply causation.
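The caution about outliers can be illustrated with a short Python sketch. The data are hypothetical: five points that fall close to a straight line, plus one made-up outlier. The function name `correlation` is chosen for this example.

```python
import math


def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data that follow a nearly perfect straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
r_clean = correlation(xs, ys)

# The same data with one hypothetical outlier added at (6, 0).
r_with_outlier = correlation(xs + [6.0], ys + [0.0])

print(f"r without outlier: {r_clean:.3f}")
print(f"r with one outlier: {r_with_outlier:.3f}")
```

A single point is enough to change the correlation substantially, which is one reason to plot the data before interpreting r or a fitted line.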