Published by Roberta Fitzgerald. Modified over 7 years ago.
1
Relationships Between Quantitative Variables
Scatterplots, Association, Correlation, and Linear Regression
2
Thought Questions 1. Judging from the scatterplot, there is a positive correlation between verbal SAT score and GPA. For used cars, there is a negative correlation between the age of the car and the selling price. Explain what it means for two variables to have a positive correlation or a negative correlation.
3
Thought Questions 2. Do you think each of the following pairs of variables would have a positive correlation, a negative correlation, or no correlation?
Calories eaten per day and weight
Calories eaten per day and IQ
Amount of alcohol consumed and accuracy on a manual dexterity test
Height of husband and height of wife
Hours on Facebook and GPA for college students
4
Looking at Scatterplots
Scatterplots may be the most common and most effective display for data. In a scatterplot, you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others. Scatterplots are the best way to start observing the relationship and the ideal way to picture associations between two quantitative variables.
5
Looking at Scatterplots
Form of the relationship: If there is a straight line (linear) relationship, it will appear as a cloud or swarm of points stretched out in a generally consistent, straight form.
6
Looking at Scatterplots
Strength of the Relationship
At one extreme, the points appear to follow a single stream.
At the other extreme, the points appear as a vague cloud with no discernible trend or pattern.
7
Looking at Scatterplots
Other Forms of the Relationship
The relationship isn't straight, but curves gently, while still increasing or decreasing steadily.
The relationship curves sharply.
8
Looking at Scatterplots
Unusual features: Look for the unexpected. Often the most interesting thing to see in a scatterplot is the thing you never thought to look for. One example of such a surprise is an outlier standing away from the overall pattern of the scatterplot. Clusters or subgroups should also raise questions.
9
SAS - Scatterplots

title "Using PROC SGPLOT to Produce a Scatter Plot";
proc sgplot data=example.store;
   scatter x=Book_Sales y=Music_Sales;
run;
quit;

title2 "Adding Gender Information to the Plot";
proc sgplot data=store;
   scatter x=Book_Sales y=Music_Sales / group=Gender;
run;
10
SAS - Scatterplots
11
SAS - Scatterplots
12
Roles for Variables It is important to determine which of the two quantitative variables goes on the x-axis and which on the y-axis. This determination is made based on the roles played by the variables. When the roles are clear, the explanatory or predictor variable goes on the x-axis, and the response variable (variable of interest) goes on the y-axis.
13
Roles for Variables The roles that we choose for variables are more about how we think about them rather than about the variables themselves. Just placing a variable on the x-axis doesn’t necessarily mean that it explains or predicts anything. And the variable on the y-axis may not respond to it in any way.
14
Thinking about correlation between two quantitative variables
Pearson’s Correlation Coefficient measures:
Strength
Direction
Shape
15
1: Strength of the relationship between two quantitative variables
Correlation measures the strength of the linear association (relationship) between two quantitative variables.
Correlation Conditions:
Correlation applies only to quantitative variables.
Correlation measures the strength only of the linear association, and will be misleading if the relationship is not linear.
Outliers can distort the correlation dramatically.
16
1: Strength of the relationship between two quantitative variables
Correlation Properties
The sign of a correlation coefficient gives the direction of the association. Correlation is always between -1 and +1. Correlation can be exactly equal to -1 or +1, but these values are unusual in real data because they mean that all the data points fall exactly on a single straight line.
17
Correlation
Measures the relative strength of the linear relationship between two numerical variables. Sample correlation:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]
where x̄ and ȳ are the sample means of X and Y.
18
Correlation - Formulas
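The correlation formulas on these slides appeared as images in the original deck. As a small illustration (in Python rather than the SAS used elsewhere in the deck, with made-up data), the sample correlation can be computed straight from the definition:

```python
import math

def pearson_r(x, y):
    # r = sum((xi - xbar)(yi - ybar)) / sqrt(sum((xi - xbar)^2) * sum((yi - ybar)^2))
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear data gives the extreme values +1 and -1:
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0
```

Only data that falls exactly on a straight line reaches r = +1 or r = -1, which is why those values are rare in real data.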
20
2: Direction of the relationship between two variables
Typically the sign of the correlation coefficient indicates the direction. A positive relationship means that as the value of the X variable increases (decreases), the value of the Y variable also increases (decreases). A negative relationship means that as the value of the X variable increases, the value of the Y variable decreases.
21
3: Shape of the relationship between two variables
There can be many different shapes that characterize the relationship. Linear relationship is only one kind.
22
3: Shape of the relationship between two variables
Why is this a correlation of 0? No matter how much X changes, Y doesn’t change, so X and Y are not related.
23
Correlation Data collected from students in Statistics classes included their heights (in inches) and weights (in pounds): Here we see a positive association and a fairly straight form, although there seems to be a high outlier.
24
Correlation How strong is the association between weight and height of Statistics students? If we had to put a number on the strength, we would not want it to depend on the units we used. A scatterplot of heights (in centimeters) and weights (in kilograms) doesn’t change the shape of the pattern:
25
Example: Husbands and Wives
Scatterplot of British husbands’ and wives’ ages; r = .94 Scatterplot of British husbands’ and wives’ heights (in millimeters); r = .36
26
Relationship of Collegiate Football Experience and Concussion With Hippocampal Volume and Cognitive Outcomes
27
Thought Question All but one of these statements contain a mistake. Which could be true? A) The correlation between a football player’s weight and the position he plays is 0.54. B) The correlation between the amount of fertilizer used and the yield of beans is 0.42. C) There is a high correlation (1.09) between height of a corn stalk and its age in weeks. D) There is a correlation of 0.63 between gender and political party.
28
Question 39. Income and housing. The Office of Federal Housing Enterprise Oversight collects data on various aspects of housing costs around the United States. Here is a scatterplot of the Housing Cost Index versus the Median Family Income for each of the 50 states. The correlation is 0.65.
a) Describe the relationship between the Housing Cost Index and the Median Family Income by state.
b) If we standardized both variables, what would the correlation coefficient between the standardized variables be?
c) If we had measured Median Family Income in thousands of dollars instead of dollars, how would the correlation change?
d) Washington, DC, has a housing cost index of 548 and a median income of about $45,000. If we were to include DC in the data set, how would that affect the correlation coefficient?
e) Do these data provide proof that by raising the median family income in a state, the housing cost index will rise as a result? Explain.
29
What Can Go Wrong? Don’t say “correlation” when you mean “association.” More often than not, people say correlation when they mean association. The word “correlation” should be reserved for measuring the strength and direction of the linear relationship between two quantitative variables.
30
What Can Go Wrong? Don’t correlate categorical variables.
Be sure to check the Quantitative Variables Condition. Be sure the association is linear. There may be a strong association between two variables even when that association is nonlinear.
31
What Can Go Wrong? Just because the correlation coefficient is high, don’t assume the relationship is linear. Here the correlation is 0.979, but the relationship is actually bent.
32
What Can Go Wrong? Watch out for lurking variables.
A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking variable. Example: A strong correlation has been found in a certain city in the northeastern United States between weekly sales of hot chocolate and weekly sales of facial tissues. Would you interpret that to mean that hot chocolate causes people to need facial tissues? Explain.
33
What Can Go Wrong? Beware of outliers. Even a single outlier
can dominate the correlation value. Make sure to check the Outlier Condition.
34
What Can Go Wrong? Don’t confuse correlation with causation.
35
What Can Go Wrong? Don’t confuse correlation with causation.
36
SAS - Correlations

title "Computing Pearson Correlation Coefficients";
proc corr data=example.exercise nosimple rank;
   var Rest_Pulse Max_Pulse Run_Pulse Age;
   with Pushups;
run;
37
SAS - Correlations
38
SAS - Correlations
39
SAS - Correlations

ods pdf file='myreport.pdf';
title "Computing Pearson Correlation Coefficients";
proc corr data=example.exercise nosimple plots=matrix(histogram);
   var Pushups Rest_Pulse Max_Pulse Run_Pulse Age;
run;
ods pdf close;
40
SAS - Correlations
42
Study Example Study investigating the possible link between alcohol consumption and the death rate per 100,000 of the population from cirrhosis and alcoholism (data collected before West Germany ceased to exist as a separate country)
44
Study Example – SAS Code
data drinking;
   input country $ 1-12 alcohol cirrhosis;
cards;
France
Italy
W.Germany
Austria
Belgium
USA
Canada
E&W
Sweden
Japan
Netherlands
Ireland
Norway
Finland
Israel
;
45
Study Example – SAS Code
proc sgplot data=drinking;
   scatter y=cirrhosis x=alcohol / datalabel=country;
run;

proc corr data=drinking;
   var alcohol cirrhosis;
run;
46
Study Example – SAS Output
47
Study Example – SAS Output
48
Specifying Linear Relationships with Linear Regression
People are often willing to assume that the two variables are linearly related. Tip: to make sure this assumption is not unreasonable, always look at the scatterplot first.
49
Specifying Linear Relationships with Linear Regression
In regression, we want to model the relationship between two quantitative variables, one the explanatory (independent) variable and the other the response (dependent) variable.
50
Fat Versus Protein: An Example
The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu:
51
The Linear Model
The correlation in the Burger King example is 0.83.
It says “There seems to be a linear association between these two variables,” but it doesn’t tell what that association is. We can say more about the linear relationship between two quantitative variables with a model. A model simplifies reality to help us understand underlying patterns and relationships.
52
The Linear Model The linear model is just an equation of a straight line through the data. The points in the scatterplot don’t all line up, but a straight line can summarize the general pattern with only a couple of parameters. The linear model can help us understand how the values are associated.
53
Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
54
Simple Linear Regression Model
Yᵢ = β₀ + β₁Xᵢ + εᵢ
where Yᵢ is the dependent variable, Xᵢ the independent variable, β₀ the population Y intercept, β₁ the population slope coefficient, and εᵢ the random error term. β₀ + β₁Xᵢ is the linear component; εᵢ is the random error component.
55
Simple Linear Regression Model
(Figure: scatter of Y against X with the fitted line, showing for a given Xᵢ the observed value of Y, the predicted value of Y, the random error εᵢ between them, the slope β₁, and the intercept β₀.)
56
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
Ŷᵢ = b₀ + b₁Xᵢ
where Ŷᵢ is the estimated (or predicted) Y value for observation i, b₀ is the estimate of the regression intercept, b₁ is the estimate of the regression slope, and Xᵢ is the value of X for observation i.
57
Residuals The model won’t be perfect, regardless of the line we draw.
Some points will be above the line and some will be below. The estimate made from a model is the predicted value (denoted as ŷ).
58
Residuals
The difference between the observed value and its associated predicted value is called the residual. To find the residuals, we always subtract the predicted value from the observed one:
residual = observed − predicted, i.e. e = y − ŷ
59
Residuals A negative residual means the predicted value’s too big (an overestimate). A positive residual means the predicted value’s too small (an underestimate). In the figure, for a Burger item with 30g of protein, the estimated fat of the BK Broiler chicken sandwich is 36 g, while the true value of fat is 25 g, so the residual is –11 g of fat.
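A one-line check of the slide's arithmetic (Python, values taken from the example above):

```python
# Burger King example from the slide: at 30 g of protein the line
# predicts 36 g of fat, but the BK Broiler chicken sandwich has 25 g.
observed_fat = 25    # grams (actual)
predicted_fat = 36   # grams (from the fitted line)

residual = observed_fat - predicted_fat  # observed minus predicted
print(residual)  # -11 -> a negative residual: the model overestimated
```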
60
“Best Fit” Means Least Squares
Some residuals are positive, others are negative, and, on average, they cancel each other out. So, we can’t assess how well the line fits by adding up all the residuals. Similar to what we did with (standard) deviations, we square the residuals and add the squares. The smaller the sum, the better the fit. The line of best fit is the line for which the sum of the squared residuals is smallest, the least squares line.
61
The Least Squares Method
b₀ and b₁ are obtained by finding the values that minimize the sum of the squared differences between Y and Ŷ:
min Σ(Yᵢ − Ŷᵢ)² = min Σ(Yᵢ − (b₀ + b₁Xᵢ))²
62
Interpretation of the Slope and the Intercept
b0 is the estimated average value of Y when the value of X is zero b1 is the estimated change in the average value of Y as a result of a one-unit increase in X
63
The Regression Line in Real Units
In our model, we have a slope (b₁). The slope is built from the correlation and the standard deviations:
b₁ = r · (sy / sx)
Our slope is always in units of y per unit of x.
In our model, we also have an intercept (b₀). The intercept is built from the means and the slope:
b₀ = ȳ − b₁x̄
Our intercept is always in units of y.
64
Fat Versus Protein: An Example
The regression line for the Burger King data fits the data well. The equation is fat = 6.8 + 0.97 protein. The predicted fat content for a BK Broiler chicken sandwich (with 30 g of protein) is 6.8 + 0.97(30) = 35.9 grams of fat. For 31 g of protein, the line predicts 6.8 + 0.97(31) = 36.87 grams of fat.
65
Textbook Body Fat Question 71
66
Textbook Body Fat Example: Using Excel Data Analysis Function
1. Choose Data
2. Choose Data Analysis
3. Choose Regression
67
Textbook Body Fat Example
68
Textbook Body Fat Example: Using Excel Data Analysis Function
Enter Y’s and X’s and desired options
69
Textbook Body Fat Example: Excel Output
The regression equation is:
70
Textbook Body Fat Example: Interpretation of b0
b0 is the estimated average value of body fat (%) when the value of weight (lb) is zero (if weight = 0 were in the range of observed X values). Because we can’t have a weight of 0, b0 has no practical application.
71
Textbook Body Fat Example: Interpreting b1
b1 (0.2499) estimates the change in the average value of body fat (%) as a result of a one-unit increase in weight (lb). Here, b1 = 0.2499 tells us that the mean value of body fat (%) increases by 0.2499, on average, for each additional one-pound increase in weight.
72
Textbook Body Fat Example: Making Predictions
Predict the body fat (%) for a person whose weight is 190 lbs. What is the residual for someone who weighs 190 lbs and has a body fat content of 21%?
73
Study: Amygdala volume and social network size in humans
We found that amygdala volume correlates with the size and complexity of social networks in adult humans. An exploratory analysis of subcortical structures did not find strong evidence for similar relationships with any other structure, but there were associations between social network variables and cortical thickness in three cortical areas, two of them with amygdala connectivity. These findings indicate that the amygdala is important in social behavior.
74
Study: Amygdala volume and social network size in humans
In 58 healthy adults (22 females; mean age M = 52.6, s.d. = 21.2, range = 19–83 years) with confirmed absence of DSM-IV Axis I diagnoses and normal performance on cognitive testing, we examined social network size and complexity with two subscales of the Social Network Index. One SNI subscale (Number of People in Social Network) measures the total number of regular contacts that a person maintains, reflecting overall network size. A second subscale (Number of Embedded Networks) measures the number of different groups these contacts belong to, reflecting network complexity.
75
Study: Amygdala volume and social network size in humans
76
Study: Amygdala volume and social network size in humans
77
Study: Amygdala volume and social network size in humans
Figure 1a – A Closer Look
y: Total number of people in social network (psn)
x: amygdala volume (av)
psn = 9 + 0.38(av), where intercept = 9 and slope = 0.38
av = 3: psn = 9 + 0.38(3) => psn = 10.14
av = 4: psn = 9 + 0.38(4) => psn = 10.52
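A quick check of the slide's arithmetic (Python; the intercept 9 and slope 0.38 come from the figure):

```python
# Fitted line from the figure: psn = 9 + 0.38 * av
def predicted_psn(av):
    return 9 + 0.38 * av

print(round(predicted_psn(3), 2))  # 10.14
print(round(predicted_psn(4), 2))  # 10.52
```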
78
Residuals Revisited
The linear model assumes that the relationship between the two variables is a perfect straight line. The residuals are the part of the data that hasn’t been modeled.
Data = Model + Residual, or (equivalently) Residual = Data − Model. Or, in symbols, e = y − ŷ.
79
Residuals Revisited Residuals help us to see whether the model makes sense. When a regression model is appropriate, nothing interesting should be left behind. After we fit a regression model, we usually plot the residuals in the hope of finding…nothing.
80
Textbook Body Fat Example
81
Burger King Example The residuals for the BK menu regression look appropriately boring:
82
Fuel Efficiency Example
83
Question 27
84
R2—The Variation Accounted For (Burger King Example)
If the correlation were 1.0 and the model predicted the fat values perfectly, the residuals would all be zero and have no variation. As it is, the correlation is 0.83—not perfection. However, we did see that the model residuals had less variation than total fat alone. We can determine how much of the variation is accounted for by the model and how much is left in the residuals.
85
R2—The Variation Accounted For (Burger King Example)
The squared correlation, r², gives the fraction of the data’s variance accounted for by the model. Thus, 1 − r² is the fraction of the original variance left in the residuals. For the BK model, r² = 0.83² ≈ 0.69, so 31% of the variability in total fat has been left in the residuals.
86
R2—The Variation Accounted For (Burger King Example)
All regression analyses include this statistic, although by tradition, it is written R² (pronounced “R-squared”). An R² of 0 means that none of the variance in the data is in the model; all of it is still in the residuals. When interpreting a regression model you need to tell what R² means. In the BK example, 69% of the variation in total fat is accounted for by variation in the protein content.
87
Textbook Body Fat Example
In the Body Fat example, 48.5% of the variation in body fat (%) is accounted for by variation in weight (lbs).
88
SAS – Simple Linear Regression
ods graphics on;
title "Running a Simple Linear Regression Model";
proc reg data=example.exercise;
   model Pushups = Rest_Pulse;
run;
quit;
89
SAS – Simple Linear Regression
90
SAS – Simple Linear Regression
91
SAS – Simple Linear Regression
93
Question 41. More real estate. Consider the Albuquerque home sales from Exercise 29 again. A random sample of records of home sales from Feb. 15 to Apr. 30, 1993, from the files maintained by the Albuquerque Board of Realtors gives the Price and Size (in square feet) of 117 homes. A regression to predict Price (in thousands of dollars) from Size has an R-squared of 71.4%.
a) Explain what the slope of the line says about housing prices and house size.
b) What price would you predict for a 3000-square-foot house in this market?
c) A real estate agent shows a potential buyer a 1200-square-foot home, saying that the asking price is $6000 less than what one would expect to pay for a house of this size. What is the asking price, and what is the $6000 called?
94
Regression Wisdom Extrapolation, Impact of Outliers, Correlation versus Causation, Science as Falsification
95
Thought Question If the point in the upper left corner of this scatterplot is removed from the data set, then what will happen to the slope of the regression (b) and to the correlation (r)? A) both will increase. B) both will decrease. C) b will increase, and r will decrease. D) b will decrease, and r will increase. E) both will remain the same.
96
Thought Question 2. A scatterplot of the damage (in dollars) caused to a house by fire and the number of firefighters on the scene shows a strong correlation. Do more firefighters cause more damage?
97
Thought Questions 3. An article in the Sacramento Bee (29 May, 1998, p. A17) noted “Americans are just too fat, researchers say, with 54 percent of all adults heavier than is healthy. If the trend continues, experts say that within a few generations virtually every U.S. adult will be overweight.” This prediction is based on “extrapolating,” which assumes the current rate of increase will continue indefinitely. Is that a reasonable assumption? Do you agree with the prediction? Explain.
98
Extrapolation: Reaching Beyond the Data
Linear models give a predicted value for each case in the data. We cannot assume that a linear relationship in the data exists beyond the range of the data. The farther the new x value is from the mean of x, the less trust we should place in the predicted value. Once we venture into new x territory, such a prediction is called an extrapolation.
99
Extrapolation: Reaching Beyond the Data
Extrapolations are dubious because they require the additional—and very questionable — assumption that nothing about the relationship between x and y changes even at extreme values of x. Extrapolations can get you into deep trouble. You’re better off not making extrapolations.
100
Textbook Body Fat Example
101
Extrapolation - Predicting the Future
Here is a timeplot of the Energy Information Administration (EIA) predictions and the actual price of oil per barrel. How did the forecasters do? They seem to have missed a sharp run-up in oil prices in the past few years.
102
Don’t Blink! The Hazards of Confidence – NY Times, OCT 2011
Mutual funds are run by highly experienced and hard-working professionals who buy and sell stocks to achieve the best possible results for their clients. Nevertheless, the evidence from more than 50 years of research is conclusive: for a large majority of fund managers, the selection of stocks is more like rolling dice than like playing poker. At least two out of every three mutual funds underperform the overall market in any given year.
103
Don’t Blink! The Hazards of Confidence – NY Times, OCT 2011
More important, the year-to-year correlation among the outcomes of mutual funds is very small, barely different from zero. The funds that were successful in any given year were mostly lucky; they had a good roll of the dice. There is general agreement among researchers that this is true for nearly all stock pickers, whether they know it or not — and most do not. The subjective experience of traders is that they are making sensible, educated guesses in a situation of great uncertainty. In highly efficient markets, however, educated guesses are not more accurate than blind guesses.
104
Predicting the Future Extrapolation is always dangerous. But, when the x-variable in the model is time, extrapolation becomes an attempt to peer into the future. Knowing that extrapolation is dangerous doesn’t stop people. The temptation to see into the future is hard to resist. Here’s some realistic advice: If you must extrapolate into the future, at least don’t believe that the prediction will come true.
105
Outliers Outlying points can strongly influence a linear regression. Even a single point far from the body of the data can dominate the analysis. Any point that stands away from the others can be called an outlier and deserves your special attention.
106
Outliers The following scatterplot shows that something was awry in Palm Beach County, Florida, during the 2000 presidential election…
107
Outliers
The red line shows the effect that one unusual point can have on a regression.
Red line: regression of Buchanan votes on Nader votes, with r = 0.65.
With Palm Beach County (the red dot) removed, the correlation increases to r = 0.90.
108
Outliers: Leverage and Influence
A data point can also be unusual if its x-value is far from the mean of the x-values. Such points are said to have high leverage. A point with high leverage has the potential to change the regression line. We say that a point is influential if omitting it from the analysis gives a very different model.
109
Outliers: Leverage and Influence
The extraordinarily large shoe size gives the data point high leverage. Wherever the IQ is, the line will follow! If a point has enough leverage, it can pull the line right to it. Then it is highly influential.
110
Question If the point in the upper right corner of this scatterplot is removed from the data set, then what will happen to the slope of the regression (b) and to the correlation (r)? A) b will decrease, and r will increase B) b will remain the same, and r will increase. C) b will remain the same, and r will decrease. D) b will decrease, and r will remain the same. E) both will remain the same.
112
Study Example – SAS Output
113
Study Example – SAS Output
114
Study Example – SAS Output
115
Study Example – SAS Output (France Removed)
116
Study Example – SAS Output (France Removed)
117
Study Example – SAS Output (France Removed)
118
35. Interest rates 2014. Here’s a plot showing the federal rate on 3-month Treasury bills from 1950 to 1980, and a regression model fit to the relationship between the Rate (in %) and Years Since 1950.
a) What is the correlation between Rate and Year?
b) Interpret the slope and intercept.
c) What does this model predict for the interest rate in the year 2000?
d) Would you expect this prediction to have been accurate? Explain.
119
Outliers: Leverage and Influence
When we investigate an unusual point, we often learn more about the situation than we could have learned from the model alone. You cannot simply delete unusual points from the data. You can, however, fit a model with and without these points as long as you examine and discuss the two regression models to understand how they differ.
120
Legitimate Correlation Does Not Imply Causation
The scatterplot shows that the average life expectancy for a country is related to the number of doctors per person in that country. This new scatterplot shows that the average life expectancy for a country is also related to the number of televisions per person in that country. The basic meaning of causation is that by changing the value of one variable we can bring about a change in the value of another variable.
121
Legitimate Correlation Does Not Imply Causation
Example For school children, shoe size is strongly correlated with reading skills. However, learning new words does not make the feet get bigger. Instead, there is a third variable involved - ?
122
Study: Chocolate Consumption, Cognitive Function, and Nobel Laureates – NEJM, Oct 2012
123
Correlation does not imply Causation
124
Correlation does not imply Causation
125
Lurking Variables and Causation
No matter how strong the association, no matter how large the R2 value, no matter how straight the line, there is no way to conclude from a regression alone that one variable causes the other. There’s always the possibility that some third variable is driving both of the variables you have observed. With observational data, as opposed to data from a designed experiment, there is no way to be sure that a lurking variable is not the cause of any apparent association.
126
Causation vs. Association
Example of causation: Increased drinking of alcohol causes a decrease in coordination. Example of association: High SAT scores are associated with a high College Freshman year GPA.
127
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
Anyone living in the United States in the early 1990s … could be forgiven for having been scared out of his skin. The culprit was crime. It had been rising relentlessly. Death by gunfire, intentional and otherwise, had become commonplace. So too had carjacking and crack dealing, robbery and rape. Violent crime was a gruesome, constant companion. And things were about to get even worse. Much worse. All the experts were saying so.
128
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
And then, instead of going up and up and up, crime began to fall. And fall and fall and fall some more. It was ubiquitous, with every category of crime falling in every part of the country. It was persistent, with incremental decreases year after year. By 2000 the overall murder rate in the United States had dropped to its lowest level in thirty-five years. So had the rate of just about every other sort of crime, from assault to car theft.
129
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
130
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
CRIME-DROP EXPLANATION / NUMBER OF CITATIONS (IN MEDIA)
1. Innovative policing strategies / 52
2. Increased reliance on prisons / 47
3. Changes in crack and other drug markets / 33
4. Aging of the population / 32
5. Tougher gun control laws / 32
6. Strong economy / 28
7. Increased number of police / 26
8. All other explanations (increased use of capital punishment, concealed-weapons laws, gun buybacks, and others) / 34
131
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
There was another factor, meanwhile, that had greatly contributed to the massive crime drop of the 1990s. It had taken shape more than twenty years earlier and concerned a young woman in Dallas named Norma McCorvey (Roe v. Wade)
132
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
The strong economy: Studies have shown that an unemployment decline of 1 percentage point accounts for a 1 percent drop in nonviolent crime. During the 1990s, the unemployment rate fell by 2 percentage points; nonviolent crime, meanwhile, fell by roughly 40 percent. This weak link is made even weaker by glancing back to a recent decade, the 1960s, when the economy went on a wild growth spurt, as did violent crime. So while a strong 1990s economy might have seemed, on the surface, a likely explanation for the drop in crime, it almost certainly didn’t affect criminal behavior in any significant way.
133
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
Increased number of police: The number of police officers per capita in the United States rose about 14 percent during the 1990s. From 1960 to 1985, the number of police officers fell more than 50 percent relative to the number of crimes. This 50 percent decline in police translated into a roughly equal decline in the probability that a given criminal would be caught. The hiring of additional police accounted for roughly 10 percent of the 1990s crime drop.
134
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
Tougher gun laws: Nearly two-thirds of U.S. homicides involve a gun, a far greater fraction than in other industrialized countries. But guns are not the whole story. In Switzerland, every adult male is issued an assault rifle for militia duty and is allowed to keep the gun at home. On a per capita basis, Switzerland has more firearms than just about any other country, and yet it is one of the safest places in the world. In other words, guns do not cause crime.
135
Freakonomics – Crime Drops in the 90’s – by Steve Levitt
Final reasoning: In 1966, one year after Nicolae Ceauşescu became the Communist dictator of Romania, he made abortion illegal. Abortion was in fact the main form of birth control, with four abortions for every live birth. Now, virtually overnight, abortion was forbidden. In one important way, the Romanian abortion story is a reverse image of the American crime story. The children born in the wake of the abortion ban were much more likely to become criminals than children born earlier. Researchers found that in the instances where the woman was denied an abortion, she often resented her baby and failed to provide it with a good home.
136
Confirming Causation
The only legitimate way to try to establish a causal connection statistically is through the use of randomized experiments. If a randomized experiment cannot be done, then nonstatistical considerations must be used to determine whether a causal link is reasonable.
Evidence of a possible causal connection:
There is a reasonable explanation of cause and effect.
The connection happens under varying conditions.
Potential confounding variables are ruled out.
137
Evidence for Causation – Smoking causing Lung Cancer
What are the criteria for establishing causation when we cannot do an experiment?
The association is strong. The association between smoking and lung cancer is very strong.
The association is consistent. Many studies of different kinds of people in many countries link smoking to lung cancer. That reduces the chance that a confounding variable specific to one group or one study explains the association.
Higher doses are associated with stronger responses. People who smoke more cigarettes per day or who smoke over a longer period get lung cancer more often. People who stop smoking reduce their risk.
138
Evidence for Causation – Smoking causing Lung Cancer
The alleged cause precedes the effect in time. Lung cancer develops after years of smoking. The number of men dying of lung cancer rose as smoking became more common, with a lag of about 30 years. Lung cancer was rare among women until women began to smoke. Lung cancer in women rose along with smoking, again with a lag of about 30 years, and has now passed breast cancer as the leading cause of cancer death among women.
The alleged cause is plausible. Experiments with animals show that tars from cigarette smoke do cause cancer.