# Chapter 15. Describing Relationships: Regression, Prediction, and Causation


Slide 2: Thought Question 1

From a long-term study of several families, researchers constructed a scatterplot of the cholesterol level of a child at age 50 versus the cholesterol level of the father at age 50. You know the cholesterol level of your best friend’s father at age 50. How could you use this scatterplot to predict what your best friend’s cholesterol level will be at age 50?

Slide 3: Thought Question 2

From past natural disasters, a strong positive correlation has been found between the amount of aid sent and the number of deaths. Would you interpret this to mean that sending more aid causes more people to die? Explain.

Slide 4: Thought Question 3

Studies have shown a negative correlation between the amount of beta carotene-rich food consumed and the incidence of lung cancer in adults. Does this correlation provide evidence that beta carotene is a contributing factor in the prevention of lung cancer? Explain.

Slide 5: Thought Question 4

A scatterplot of the number of bicycles sold versus the number of bank robberies in the United States for each year over the past century would show a very strong positive correlation. Why would this be true? Does an increase in one cause an increase in the other?

Slide 6: Linear Regression

- Objective: to quantify the linear relationship between an explanatory variable and a response variable. We can then predict the average response for all subjects with a given value of the explanatory variable.
- Regression equation: y = a + bx
  - x is the value of the explanatory variable
  - y is the average value of the response variable
  - a and b are just the intercept and slope of a straight line
  - the correlation r and the slope b are not the same thing, but their signs will agree
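To make the last point concrete, here is a minimal Python sketch that computes both the correlation r and the least-squares slope b for two small data sets. The data values and the helper name `correlation_and_slope` are invented for illustration and do not come from the slides.

```python
from math import sqrt


def correlation_and_slope(xs, ys):
    """Return (r, b): the correlation and the least-squares slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)  # unitless, always between -1 and 1
    b = sxy / sxx              # in units of y per unit of x, can be any size
    return r, b


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys_up = [2.1, 2.9, 4.2, 4.8, 6.1]    # tends to rise with x
ys_down = [9.8, 8.1, 6.2, 3.9, 2.0]  # tends to fall with x

r_up, b_up = correlation_and_slope(xs, ys_up)
r_down, b_down = correlation_and_slope(xs, ys_down)
print(f"rising:  r = {r_up:.3f}, b = {b_up:.3f}")
print(f"falling: r = {r_down:.3f}, b = {b_down:.3f}")
```

In both cases r and b share a sign because they share the same numerator, but the slope of the falling data set is larger than 1 in magnitude, which a correlation can never be.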

Slide 7: Least Squares Regression

- Used to determine the “best” line
- We want the line to be as close as possible to the data points in the vertical (y) direction, since y is what we are trying to predict
- Least squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line
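The least squares idea can be sketched in a few lines of Python. The sketch below uses the standard closed-form formulas for the slope and intercept, then checks that nudging the line in any direction makes the sum of squared vertical distances larger. The data and the function names are hypothetical, chosen only for this illustration.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

a, b = fit_least_squares(xs, ys)
best = sum_squared_errors(a, b, xs, ys)

# Any other line should do worse: try shifting the intercept and slope.
nudged = [
    sum_squared_errors(a + da, b + db, xs, ys)
    for da in (-0.1, 0.1)
    for db in (-0.05, 0.05)
]
print(f"least-squares line: y = {a:.3f} + {b:.3f}x, SSE = {best:.4f}")
print("SSE of nudged lines:", [round(s, 4) for s in nudged])
```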

Slide 8: Prediction via Regression Line

Husband and Wife: Ages (data from Hand et al., A Handbook of Small Data Sets, London: Chapman and Hall)

- The regression equation is y = 3.6 + 0.97x
  - y is the average age of all husbands who have wives of age x
- For all women aged 30, we predict the average husband age to be 32.7 years: 3.6 + (0.97)(30) = 32.7 years
- Suppose we know that an individual wife’s age is 30. What would we predict her husband’s age to be?
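The arithmetic on this slide is easy to reproduce. The sketch below wraps the slide's regression equation in a function (the name `predict_husband_age` is ours, not the source's). The same number serves as the best single guess for an individual couple, although individual husbands will be scattered above and below that average.

```python
def predict_husband_age(wife_age):
    """Average husband age predicted by the slide's line y = 3.6 + 0.97x."""
    return 3.6 + 0.97 * wife_age


print(round(predict_husband_age(30), 1))  # 32.7
```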

Slide 9: Coefficient of Determination (R²)

- Measures the usefulness of regression prediction
- R² (or r², the square of the correlation): measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line
  - r = 1: R² = 1: the regression line explains all (100%) of the variation in y
  - r = 0.7: R² = 0.49: the regression line explains almost half (49%) of the variation in y
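As a rough check of the "explained variation" reading, the following sketch computes R² directly as one minus the ratio of unexplained to total variation, and compares it with the square of the correlation. The data set and function names are hypothetical, used only to illustrate the identity for simple linear regression.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def r_squared(xs, ys):
    """Fraction of the variation in y explained by the regression line."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - ss_res / ss_tot


def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 3.5, 3.0, 5.5, 5.0, 7.5]

r = correlation(xs, ys)
r2 = r_squared(xs, ys)
print(f"r = {r:.3f}, r squared = {r * r:.3f}, R squared = {r2:.3f}")
```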

Slide 10: A Caution: Beware of Extrapolation

- Sarah’s height was plotted against her age
- Can you predict her height at age 42 months?
- Can you predict her height at age 30 years (360 months)?

Slide 11: A Caution: Beware of Extrapolation

- Regression line: y = 71.95 + 0.383x
- Height at age 42 months? y = 88 cm
- Height at age 30 years (360 months)? y = 209.8 cm
  - She is predicted to be 6' 10.5" at age 30.
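The two predictions can be reproduced with a short Python sketch. The line's coefficients are the ones on the slide; the function names and the unit conversion helper are our own additions. The formula happily returns a number for any age, which is exactly why extrapolation is dangerous: nothing in the arithmetic warns that 360 months is far outside the childhood ages the line was fitted to.

```python
def predicted_height_cm(age_months):
    """Height predicted by the slide's line y = 71.95 + 0.383x."""
    return 71.95 + 0.383 * age_months


def cm_to_feet_inches(cm):
    """Convert centimetres to (whole feet, remaining inches)."""
    total_inches = cm / 2.54
    feet = int(total_inches // 12)
    return feet, total_inches - 12 * feet


for age in (42, 360):
    height = predicted_height_cm(age)
    feet, inches = cm_to_feet_inches(height)
    print(f"age {age} months: {height:.1f} cm ({feet} ft {inches:.1f} in)")
```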

Slide 12: Correlation Does Not Imply Causation

Even very strong correlations may not correspond to a real causal relationship.

Slide 13: Evidence of Causation

- A properly conducted experiment establishes the connection
- Other considerations:
  - A reasonable explanation for a cause and effect exists
  - The connection happens in repeated trials
  - The connection happens under varying conditions
  - Potential confounding factors are ruled out
  - Alleged cause precedes the effect in time

Slide 14: Reasons Two Variables May Be Related (Correlated)

- Explanatory variable causes change in response variable
- Response variable causes change in explanatory variable
- Explanatory variable may be a cause, but is not the sole cause, of changes in the response variable
- Confounding variables may exist
- Both variables may result from a common cause
  - such as both variables changing over time
- The correlation may be merely a coincidence

Slide 15: Explanatory causes Response

- Example 1
  - Explanatory: pollen count from grasses
  - Response: percentage of people suffering from allergy symptoms
- Example 2
  - Explanatory: amount of food eaten
  - Response: hunger level

Slide 16: Response causes Explanatory

- Explanatory: hotel advertising dollars
- Response: occupancy rate
- Positive correlation? Does more advertising lead to an increased occupancy rate?
- Actual correlation is negative: lower occupancy leads to more advertising

Slide 17: Explanatory is not Sole Contributor

- Explanatory: consumption of barbecued foods
- Response: incidence of stomach cancer
- Barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute

Slide 18: Confounding Variables (Meditation vs. Aging)

- Explanatory: meditation
- Response: aging (measurable aging factor)
- General concern for one’s well-being may be confounded with the decision to try meditation

Slide 19: Common Response (both variables change due to a common cause)

- Explanatory: divorce among men
- Response: percent abusing alcohol
- Both may result from an unhappy marriage

Slide 20: Both Variables are Changing Over Time

- Both divorces and suicides have increased dramatically since 1900.
- Are divorces causing suicides?
- Are suicides causing divorces?
- The population has increased dramatically since 1900 (causing both to increase).
- Better to investigate: has the rate of divorce or the rate of suicide changed over time?
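The difference between counts and rates can be shown with a small Python sketch. All numbers below are made up for illustration (they are not real divorce, suicide, or population figures): two event counts grow only because the population grows, so the raw counts rise together while the rates per 100,000 people stay flat.

```python
# Hypothetical numbers, invented purely for illustration.
population = {1900: 1_000_000, 1950: 2_000_000, 2000: 4_000_000}
events_a = {1900: 500, 1950: 1_000, 2000: 2_000}  # stand-in for one count
events_b = {1900: 100, 1950: 200, 2000: 400}      # stand-in for another count


def rate_per_100k(count, people):
    """Events per 100,000 people."""
    return 100_000 * count / people


rates_a = {yr: rate_per_100k(events_a[yr], population[yr]) for yr in population}
rates_b = {yr: rate_per_100k(events_b[yr], population[yr]) for yr in population}

for yr in sorted(population):
    print(
        f"{yr}: counts A={events_a[yr]}, B={events_b[yr]}; "
        f"rates per 100k A={rates_a[yr]:.1f}, B={rates_b[yr]:.1f}"
    )
```

In this made-up example the counts quadruple together, yet neither rate changes at all, so the apparent association in the counts comes entirely from population growth.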

Slide 21: The Relationship May Be Just a Coincidence

We will see some strong correlations (or apparent associations) just by chance, even when the variables are not related in the population.
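A small simulation gives a feel for how often this happens. The sketch below repeatedly draws two unrelated samples of only five values each and counts how many pairs show a correlation stronger than 0.8 in magnitude. The sample size, threshold, number of trials, and seed are arbitrary choices for illustration, not values from the slides.

```python
import random
from math import sqrt


def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def count_strong_chance_correlations(trials, n, threshold, seed):
    """Count trials where two independent samples have |r| > threshold."""
    rng = random.Random(seed)
    strong = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(correlation(xs, ys)) > threshold:
            strong += 1
    return strong


strong = count_strong_chance_correlations(trials=1000, n=5, threshold=0.8, seed=0)
print(f"{strong} of 1000 pairs of unrelated samples had |r| > 0.8")
```

With samples this small, a noticeable share of unrelated pairs look strongly correlated; increasing `n` makes such coincidences much rarer.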

Slide 22: Coincidence (?): Vaccines and Brain Damage

- A required whooping cough vaccine was blamed for seizures that caused brain damage
  - led to reduced production of vaccine (due to lawsuits)
- A study of 38,000 children found no evidence for the accusations (reported in the New York Times)
  - “people confused association with cause-and-effect”
  - “virtually every kid received the vaccine…it was inevitable that, by chance, brain damage caused by other factors would occasionally occur in a recently vaccinated child”

Slide 23: Case Study: Social Relationships and Health

House, J., Landis, K., and Umberson, D. “Social Relationships and Health,” Science, Vol. 241 (1988), pp. 540-545.

- Does lack of social relationships cause people to become ill?
- Or, are unhealthy people less likely to establish and maintain social relationships?
- Or, is there some other factor that predisposes people both to have lower social activity and become ill?

Slide 24: Key Concepts

- Least squares regression equation
- R²
- Correlation does not imply causation
- Confirming causation
- Reasons variables may be correlated

Slide 25: Cautions about Correlation and Regression

- both describe only linear relationships
- both are affected by outliers
- always plot the data before interpreting
- beware of extrapolation
  - predicting outside of the range of x
- beware of lurking variables
  - these have an important effect on the relationship among the variables in a study, but are not included in the study
- association does not imply causation
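The outlier caution is easy to demonstrate. In the sketch below, six made-up points show a weak negative correlation, and adding a single extreme point flips the result to a strong positive one. The data are hypothetical, chosen only to make the effect visible.

```python
from math import sqrt


def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 2, 3, 1, 2]

r_without = correlation(xs, ys)
# Add one point far from the rest of the data.
r_with = correlation(xs + [20], ys + [15])

print(f"without the outlier: r = {r_without:.3f}")
print(f"with the outlier:    r = {r_with:.3f}")
```

A scatterplot would reveal the lone point immediately, which is the reason for the advice to plot the data before interpreting r or a fitted line.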

Slide 26: Least Squares Regression

A least squares regression line makes the vertical distances from the data points to the line small.

Slide 27: A few explanations for an observed association

A dashed line shows an association. An arrow shows a cause-and-effect link. Variable x is explanatory, y is a response variable, and z is a lurking variable.
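The common-cause case can be simulated in a few lines of Python. In this sketch, a lurking variable z drives both x and y, and neither x nor y influences the other. The model (x = z + noise, y = z + noise), the noise level, the sample size, and the seed are all assumptions made for illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)
n = 2000

# z is the lurking variable; x and y each depend on z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)

# Remove z's contribution from each variable and correlate what is left.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = correlation(x_rest, y_rest)

print(f"correlation of x and y:             {r_xy:.3f}")
print(f"correlation after accounting for z: {r_rest:.3f}")
```

The strong association between x and y essentially disappears once z is accounted for, which matches the diagram's case of a dashed line between x and y with arrows running from z to both.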