10. Causality and Correlation. ECON 251 Research Methods.

Presentation transcript:

1 10. Causality and Correlation ECON 251 Research Methods

2 Example 1  A strong correlation has been found in a certain city in the northeastern United States between weekly sales of hot chocolate and weekly sales of facial tissues.  Would you interpret that to mean that hot chocolate causes people to need facial tissues? Explain.

3 Example 2  Researchers found a correlation of 0.86 between the number of churchgoers and the number of burglaries committed in different towns.  Possible explanations: More churchgoers means more empty houses. Attending church makes people want to rob.  Common third cause:

4 Example 3  Researchers have shown that there is a positive correlation between the average fat intake and the breast cancer rate across countries. In other words, countries with higher fat intake tend to have higher breast cancer rates.  Does this correlation prove that dietary fat is a contributing cause of breast cancer? Explain.

5 Example 4  If you were to draw a scatterplot of number of women in the work force versus number of Christmas trees sold in the United States for each year between 1930 and the present, you would find a very strong correlation.  Why do you think this would be true?  Does one cause the other?

6 Example 5  Explain this cartoon in terms of correlation and causation

7 Causation vs. Association  Some studies aim to establish that a causal relationship exists.  Examples of causation: Increased drinking of alcohol causes a decrease in coordination. Smoking and lung cancer.  Examples of association: High SAT scores are associated with a high freshman-year GPA. Smoking and lung cancer.

8 Explaining Associations Some possible explanations for an observed association. The dashed lines show an association. The solid arrows show a cause-and-effect link. x is explanatory, y is response, and z is a lurking variable.

9 Reasons Two Variables Could Be Related: 1. The explanatory variable is the direct cause of the response variable. Example: Amount of food consumed in the past hour and level of hunger. 2. The response variable is causing a change in the explanatory variable. Example: In a study in the Resource Manual, it was noted that divorced men were twice as likely to abuse alcohol as married men. The authors concluded that getting divorced caused alcohol abuse. But it is just as reasonable to assume that alcohol abuse causes divorce.

10 Reasons Two Variables Could Be Related: 3. The explanatory variable is a contributing but not sole cause of the response variable. Example: A carcinogen in the diet is not the sole cause of cancer, but rather a necessary contributor to it. 4. Confounding variables may exist. A confounding variable is related to the explanatory variable and affects the response variable, so we cannot determine how much of the change is due to the explanatory variable and how much is due to the confounding variable(s). Example: Consider the relationship between hours studied per day and grade point average. Studying increases grade point average, but it is also reasonable that a desire to do well in school leads a person both to study more and to earn a high grade point average.

11 Confounding  Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables. Example: Studies have found that religious people live longer than nonreligious people. Religious people also take better care of themselves and are less likely to smoke or be overweight.

12 Lurking Variables  Lurking variables can create nonsense correlations.  For the world's nations, let x be the number of TVs per person and y be the average life expectancy. There is a high positive correlation: nations with more TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them more TVs?  Lurking variable: wealth of the nation. Rich nations have more TV sets. Rich nations have longer life expectancies because of better nutrition, clean water, and better health care.

13 Lurking Variables Examples:  Students who use tutors have lower test scores than students who don’t. Lurking variable:  Negative association between moderate amounts of wine drinking and death rates from heart disease in developed nations. Lurking variable:  Number of churches and number of bars Lurking variable:  Lurking variables can create nonsense (false) correlations!

14 Lurking Variables  How can we spot the presence of lurking variables? In general, it is difficult. Many lurking variables change systematically over time.  Plot both the response variable and the residuals against the time order of the observations whenever possible.
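The advice to check residuals against time order can be sketched in Python. This is a minimal illustration on made-up data, not data from the lecture: the response y depends on x and also drifts upward over time, and the drift plays the role of the lurking variable. The names `fit_line`, `pearson_r`, and all numeric settings are assumptions chosen for the example.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


rng = random.Random(14)  # fixed seed so the run is repeatable
n = 200
time_order = list(range(n))
x = [rng.uniform(0, 10) for _ in range(n)]
# Hypothetical data: y depends on x plus a slow drift over time.
y = [2.0 * xi + 0.05 * t + rng.gauss(0, 1) for xi, t in zip(x, time_order)]

intercept, slope = fit_line(x, y)
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# A strong correlation between residuals and time order hints that
# something changing over time was left out of the regression.
r_resid_time = pearson_r(time_order, resid)
print(f"correlation of residuals with time order: {r_resid_time:.2f}")
```

In practice the same check is usually done by eye, with a plot of the residuals against time order; the correlation here is only a numeric stand-in for that plot.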

15 Reasons Two Variables Could Be Related: 5. Both variables may result from a common cause. Example: Students who have high SAT scores in high school have high GPAs in their first year of college. This positive correlation can be explained as a common response to students’ ability and knowledge.  The observed association between two variables x and y could be explained by a third lurking variable z.  Both x and y change in response to changes in z. This creates an association even though there is no direct causal link.
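A small simulation can make the common-response idea concrete. The sketch below uses invented data in which z drives both x and y and there is no direct link between x and y; the noise levels and the helper names `pearson_r` and `residuals` are assumptions for illustration only.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """Residuals from the least-squares line predicting ys from zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - (my + slope * (z - mz)) for z, y in zip(zs, ys)]


rng = random.Random(251)  # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(2000)]   # lurking variable
x = [zi + rng.gauss(0, 0.5) for zi in z]     # x responds to z only
y = [zi + rng.gauss(0, 0.5) for zi in z]     # y responds to z only

r_xy = pearson_r(x, y)
# Remove the part of x and y explained by z, then correlate what is left.
r_after_z = pearson_r(residuals(x, z), residuals(y, z))

print(f"r(x, y) ignoring z:          {r_xy:.2f}")
print(f"r(x, y) after adjusting for z: {r_after_z:.2f}")
```

The first correlation is strong even though x never influences y; once z is accounted for, the association largely disappears. This adjustment only works when the lurking variable has actually been measured.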

16 Common Response  “There is a strong positive correlation between the number of firefighters at a fire and the amount of damage the fire does. So sending lots of firefighters just causes more damage.”  What is the lurking variable? a) Number of firefighters b) Amount of damage c) How large the fire is. d) If the fire is close to the fire station.

17 Reasons Two Variables Could Be Related: 6. Both variables are changing over time. Nonsensical associations result from correlating two variables that have both changed over time. Example: The number of divorces and the number of suicides have both increased dramatically since This does not mean that divorces are causing suicides. All such statistics increase as the population increases. 7. Association may be nothing more than coincidence. Association is a coincidence, even though odds of it happening appear to be very small.
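The point that such counts rise together simply because the population rises can be sketched with made-up numbers. In the example below the per-person rates are unrelated random noise around fixed values, yet the yearly counts are strongly correlated because both scale with a growing population. All figures are hypothetical, not real divorce or suicide statistics.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(17)  # fixed seed so the run is repeatable
years = 80
# Hypothetical population growing steadily from 1.0 million.
population = [1_000_000 + 12_500 * t for t in range(years)]
# Hypothetical per-person rates: independent noise, no trend, no link.
divorce_rate = [rng.gauss(0.004, 0.0002) for _ in range(years)]
suicide_rate = [rng.gauss(0.0001, 0.000005) for _ in range(years)]

divorces = [p * r for p, r in zip(population, divorce_rate)]
suicides = [p * r for p, r in zip(population, suicide_rate)]

r_counts = pearson_r(divorces, suicides)
r_rates = pearson_r(divorce_rate, suicide_rate)

print(f"correlation of yearly counts: {r_counts:.2f}")
print(f"correlation of per-person rates: {r_rates:.2f}")
```

Comparing rates rather than raw counts is one simple way to take population growth out of the picture before looking for an association.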

18 Simpson’s Paradox  Simpson’s paradox is a severe form of confounding in which there is a reversal in the direction of an association caused by a lurking variable.  Overall direction of association: _________  But when we color different habitats in different colors, the data is separated by a lurking variable (different habitats) into a series of ______ linear associations.

19 Simpson’s Paradox  Is acceptance into a college (response variable) predicted by gender (explanatory variable)?  Consider these data:  Proportions accepted by gender: Male success rate = 198 / 360 = 0.55 Female success rate = 88 / 200 = 0.44  Conclude: males were accepted at a _______ rate than females.

20 Simpson’s Paradox  Broken down according to the lurking variable "major…" Male proportion = 18 / 120 = 0.15 Female proportion = 24 / 120 = 0.20 Therefore: males were accepted at a _____ rate than females. Male proportion = 180 / 240 = 0.75 Female proportion = 64 / 80 = 0.80 Therefore: males were accepted at a _______ rate than females.
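The reversal in these two slides can be checked directly from the counts given. The sketch below uses the accepted and applied counts quoted in the transcript; the labels "major A" and "major B" are placeholders, since the transcript does not name the majors.

```python
# (accepted, applied) counts taken from the slides; the major labels
# are placeholders because the transcript does not name them.
applicants = {
    "major A": {"male": (18, 120), "female": (24, 120)},
    "major B": {"male": (180, 240), "female": (64, 80)},
}


def rate(accepted, applied):
    """Acceptance rate as a proportion."""
    return accepted / applied


# Pooled rates: add up accepted and applied across majors first.
pooled = {}
for sex in ("male", "female"):
    accepted = sum(applicants[major][sex][0] for major in applicants)
    applied = sum(applicants[major][sex][1] for major in applicants)
    pooled[sex] = rate(accepted, applied)

# Rates within each major separately.
within = {
    major: {sex: rate(*counts) for sex, counts in by_sex.items()}
    for major, by_sex in applicants.items()
}

print("pooled:", {sex: round(p, 2) for sex, p in pooled.items()})
for major, rates in within.items():
    print(major, {sex: round(p, 2) for sex, p in rates.items()})
```

Pooled over both majors, the male rate is higher; within each major, the female rate is higher. The reversal arises because most male applicants applied to the major with the higher acceptance rate, while most female applicants applied to the one with the lower rate.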

21 Evidence for Causation  Evidence of a possible causal connection: The association is strong (high r value). The association is consistent (it can be found in several studies of different subjects). Higher doses are associated with stronger responses. The alleged cause precedes the effect in time. The alleged cause is plausible (storks do not bring babies).  Other things to keep in mind: Data from an observational study, in the absence of any other evidence, cannot be used to establish causation.

22 Summary  Association does not imply causation!  Correlation and regression can be misleading if you ignore important lurking variables.  A correlation based on averages is usually higher than one based on data for individuals.  In Simpson's paradox, a lurking variable can even reverse the direction of an association.  Do not use a regression on inappropriate data: a pattern in the residuals, the presence of large outliers, or clumped data falsely appearing linear. Use residual plots for help.  A relationship, however strong, does not itself imply causation.
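The summary's remark about correlations based on averages can be illustrated with a short simulation. The data are invented: individuals belong to groups, each group has its own typical level, and individual values scatter widely around it. The group count, group size, and noise levels are assumptions made for the sketch.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(22)  # fixed seed so the run is repeatable
groups, per_group = 20, 50

xs, ys, x_means, y_means = [], [], [], []
for _ in range(groups):
    level = rng.gauss(0, 1)  # hypothetical group-level typical value
    gx = [level + rng.gauss(0, 2) for _ in range(per_group)]
    gy = [level + rng.gauss(0, 2) for _ in range(per_group)]
    xs.extend(gx)
    ys.extend(gy)
    x_means.append(sum(gx) / per_group)
    y_means.append(sum(gy) / per_group)

r_individuals = pearson_r(xs, ys)
r_averages = pearson_r(x_means, y_means)

print(f"r using individuals:    {r_individuals:.2f}")
print(f"r using group averages: {r_averages:.2f}")
```

Averaging removes most of the individual scatter, so the group averages line up far more tightly than the individuals do. A strong correlation among averages therefore says little about how well x predicts y for any one individual.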