1 10. Causality and Correlation ECON 251 Research Methods.


2 Example 1
- A strong correlation has been found in a certain city in the northeastern United States between weekly sales of hot chocolate and weekly sales of facial tissues.
- Would you interpret that to mean that hot chocolate causes people to need facial tissues? Explain.

3 Example 2
- Researchers found a correlation of 0.86 between the number of churchgoers and the number of burglaries committed in different towns.
- Explanation?
  - More churchgoers means more empty houses
  - Attending church makes people want to rob
- Common Third Cause:

4 Example 3
- Researchers have shown that there is a positive correlation between the average fat intake and the breast cancer rate across countries. In other words, countries with higher fat intake tend to have higher breast cancer rates.
- Does this correlation prove that dietary fat is a contributing cause of breast cancer? Explain.

5 Example 4
- If you were to draw a scatterplot of number of women in the work force versus number of Christmas trees sold in the United States for each year between 1930 and the present, you would find a very strong correlation.
- Why do you think this would be true?
- Does one cause the other?

6 Example 5
- Explain this cartoon in terms of correlation and causation.

7 Causation vs. Association
- Some studies want to find the existence of causation.
- Example of causation:
  - Increased drinking of alcohol causes a decrease in coordination.
  - Smoking and lung cancer.
- Example of association:
  - High SAT scores are associated with a high freshman-year GPA.
  - Smoking and lung cancer.

8 Explaining Associations
- Some possible explanations for an observed association.
- The dashed lines show an association. The solid arrows show a cause-and-effect link.
- x is explanatory, y is response, and z is a lurking variable.

9 Reasons Two Variables Could Be Related:
1. The explanatory variable is the direct cause of the response variable.
   Example: Amount of food consumed in the past hour and level of hunger.
2. The response variable is causing a change in the explanatory variable.
   Example: In a study in Resource Manual, it was noted that divorced men were twice as likely to abuse alcohol as married men. The authors concluded that getting divorced caused alcohol abuse. But it is just as reasonable to assume that alcohol abuse causes divorce.

10 Reasons Two Variables Could Be Related:
3. The explanatory variable is a contributing but not sole cause of the response variable.
   Example: A carcinogen in the diet is not the sole cause of cancer, but rather a necessary contributor to it.
4. Confounding variables may exist. A confounding variable is related to the explanatory variable and affects the response variable, so we cannot determine how much of the change is due to the explanatory variable and how much is due to the confounding variable(s).
   Example: Consider the relationship between hours studied per day and grade point average. Studying increases grade point average, but it is also reasonable that a desire to do well in school means that a person studies more and that their grade point average is high.

11 Confounding
- Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.
- Example: Studies have found that religious people live longer than nonreligious people. Religious people also take better care of themselves and are less likely to smoke or be overweight.

12 Lurking Variables
- Lurking variables can create nonsense correlations.
- For the world’s nations, let x be the number of TVs per person and y be the average life expectancy.
  - There is a high positive correlation: nations with more TV sets have higher life expectancies.
  - Could we lengthen the lives of people in Rwanda by shipping them more TVs?
- Lurking variable: wealth of the nation
  - Rich nations: more TV sets.
  - Rich nations: longer life expectancies because of better nutrition, clean water, and better health care.
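The TV example can be illustrated with a small simulation. This is only a sketch with made-up numbers: the `wealth` scale, the noise levels, and the 0.45 to 0.55 wealth band are all assumptions chosen for illustration, not data from the slides. TVs and life expectancy each depend on wealth and never on each other, yet they come out strongly correlated until wealth is held roughly fixed.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical nations: wealth is the lurking variable z (arbitrary 0-1 scale).
wealth = [rng.uniform(0, 1) for _ in range(500)]
# x and y each respond to wealth plus independent noise; x never affects y.
tvs_per_person = [w + rng.gauss(0, 0.1) for w in wealth]
life_expectancy = [50 + 30 * w + rng.gauss(0, 3) for w in wealth]

r_all = pearson_r(tvs_per_person, life_expectancy)

# Hold the lurking variable roughly fixed: keep nations of similar wealth only.
band = [i for i, w in enumerate(wealth) if 0.45 <= w <= 0.55]
r_band = pearson_r([tvs_per_person[i] for i in band],
                   [life_expectancy[i] for i in band])

print(f"all nations:    r = {r_all:.2f}")
print(f"similar wealth: r = {r_band:.2f} (n = {len(band)})")
```

Among nations of similar wealth the correlation should be much weaker, which is what we expect if wealth, not TV ownership, drives life expectancy.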

13 Lurking Variables
Examples:
- Students who use tutors have lower test scores than students who don’t.
  Lurking variable:
- Negative association between moderate amounts of wine drinking and death rates from heart disease in developed nations.
  Lurking variable:
- Number of churches and number of bars.
  Lurking variable:
- Lurking variables can create nonsense (false) correlations!

14 Lurking Variables
- How can we spot the presence of lurking variables?
  - In general, this is difficult.
  - Many lurking variables change systematically over time.
- Plot both the response variable and the residuals against the time order of the observations whenever possible.
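As a numerical stand-in for the residuals-versus-time plot, the sketch below fits a least-squares line and then checks whether the residuals drift with time order. The data are invented for illustration: `y` depends on `x` and also on a hypothetical lurking variable that grows by 0.5 per time step. The helper names `pearson_r` and `least_squares` are not from the slides.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def least_squares(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Invented data: 30 observations in time order.
time = list(range(30))
x = [(7 * t) % 10 for t in time]                   # explanatory variable
y = [2.0 * xi + 0.5 * t for xi, t in zip(x, time)] # response drifts with time

intercept, slope = least_squares(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# A strong trend in residuals against time order hints at a lurking variable.
r_resid_time = pearson_r(time, residuals)
print(f"slope = {slope:.2f}, residuals vs. time: r = {r_resid_time:.2f}")
```

In practice a scatterplot of `residuals` against `time` conveys the same information more directly; the correlation here is only a compact summary of that plot.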

15 Reasons Two Variables Could Be Related:
5. Both variables may result from a common cause.
   Example: Students who have high SAT scores in high school have high GPAs in their first year of college. This positive correlation can be explained as a common response to students’ ability and knowledge.
- The observed association between two variables x and y could be explained by a third lurking variable z.
- Both x and y change in response to changes in z. This creates an association even though there is no direct causal link.

16 Common Response
- “There is a strong positive correlation between the number of firefighters at a fire and the amount of damage the fire does. So sending lots of firefighters just causes more damage.”
- What is the lurking variable?
  a) Number of firefighters
  b) Amount of damage
  c) How large the fire is
  d) If the fire is close to the fire station

17 Reasons Two Variables Could Be Related:
6. Both variables are changing over time. Nonsensical associations result from correlating two variables that have both changed over time.
   Example: The number of divorces and the number of suicides have both increased dramatically since 1900. This does not mean that divorces are causing suicides. All such statistics increase as the population increases.
7. The association may be nothing more than coincidence, even though the odds of it happening by chance appear to be very small.
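Reason 6 can be sketched with invented numbers. Everything below is hypothetical: the population series, the per-person rates, and the noise levels are assumptions for illustration, not historical data. The two per-person rates are generated independently, so any correlation between the yearly counts comes only from population growth.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is repeatable
years = list(range(1900, 2001))

# Hypothetical population (millions), growing steadily over the century.
population = [76.0 + 2.0 * (year - 1900) for year in years]

# Per-person rates fluctuate independently around constant levels.
divorce_rate = [0.004 + rng.gauss(0, 0.0003) for _ in years]
suicide_rate = [0.0001 + rng.gauss(0, 0.00001) for _ in years]

# Yearly counts scale with population, so both counts trend upward together.
divorces = [r * p for r, p in zip(divorce_rate, population)]
suicides = [r * p for r, p in zip(suicide_rate, population)]

r_counts = pearson_r(divorces, suicides)
r_rates = pearson_r(divorce_rate, suicide_rate)
print(f"counts: r = {r_counts:.2f}")
print(f"rates:  r = {r_rates:.2f}")
```

Comparing rates per person rather than raw counts is one simple way to take the shared population trend out of the picture.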

18 Simpson’s Paradox
- Simpson’s paradox is a severe form of confounding in which a lurking variable reverses the direction of an association.
- Overall direction of association: _________
- But when we color different habitats in different colors, the data are separated by a lurking variable (different habitats) into a series of ______ linear associations.

19 Simpson’s Paradox
- Is acceptance into a college (response variable) predicted by gender (explanatory variable)?
- Consider these data:
- Proportions accepted by gender:
  - Male success rate = 198 / 360 = 0.55
  - Female success rate = 88 / 200 = 0.44
- Conclude: males were accepted at a _______ rate than females.

20 Simpson’s Paradox
- Broken down according to the lurking variable "major…"
  - Male proportion = 18 / 120 = 0.15
    Female proportion = 24 / 120 = 0.20
    Therefore: males were accepted at a _____ rate than females.
  - Male proportion = 180 / 240 = 0.75
    Female proportion = 64 / 80 = 0.80
    Therefore: males were accepted at a _______ rate than females.
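The arithmetic on these two slides can be checked directly. The sketch below uses the counts given above; the labels "Major A" and "Major B" are placeholders, since the slides do not name the two majors. It computes the acceptance rate for each gender within each major and again after pooling the majors.

```python
# (accepted, applied) counts from the slides, split by major and gender.
# "Major A" and "Major B" are placeholder labels, not names from the source.
admissions = {
    "Major A": {"male": (18, 120), "female": (24, 120)},
    "Major B": {"male": (180, 240), "female": (64, 80)},
}

def rate(accepted, applied):
    """Proportion of applicants accepted."""
    return accepted / applied

def pooled_rate(gender):
    """Acceptance rate for one gender with both majors combined."""
    accepted = sum(groups[gender][0] for groups in admissions.values())
    applied = sum(groups[gender][1] for groups in admissions.values())
    return rate(accepted, applied)

# Rates within each major.
by_major = {
    major: {gender: rate(*counts) for gender, counts in groups.items()}
    for major, groups in admissions.items()
}
# Rates after pooling the majors, matching 198/360 and 88/200 on slide 19.
overall = {gender: pooled_rate(gender) for gender in ("male", "female")}

for major, rates in by_major.items():
    print(major, {g: round(r, 2) for g, r in rates.items()})
print("Overall", {g: round(r, 2) for g, r in overall.items()})
```

The comparison flips between the per-major rates and the pooled rates because the two genders applied to the two majors in very different proportions: 240 of 360 men applied to the major with the higher acceptance rate, against 80 of 200 women.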

21 Evidence for Causation
- Evidence of a possible causal connection:
  - The association is strong (high r value).
  - The association is consistent (it can be found in several studies of different subjects).
  - Higher doses are associated with stronger responses.
  - The alleged cause precedes the effect in time.
  - The alleged cause is plausible (storks do not bring babies).
- Other things to keep in mind:
  - Data from an observational study, in the absence of any other evidence, cannot be used to establish causation.

22 Summary
- Association does not imply causation!
- Correlation and regression can be misleading if you ignore important lurking variables.
- A correlation based on averages is usually higher than one computed from data on individuals.
- A lurking variable can even reverse the direction of an association (Simpson’s paradox).
- Do not use a regression on inappropriate data. Use residual plots for help. Warning signs:
  - Pattern in the residuals
  - Presence of large outliers
  - Clumped data falsely appearing linear
- A relationship, however strong, does not itself imply causation.

