Chapter 4: More about Relationships Between Two Variables


4.1 – Transforming to Achieve Linearity – Exponential Growth

Not all data can be expressed with a linear model.

PROBLEM! We cannot fit nonlinear data directly with least-squares regression, because least-squares regression depends on correlation, which measures only the strength of linear relationships.

SOLUTION! Transform the data so that the relationship is linear, then use least-squares regression to find the best-fitting line for the transformed data. Finally, apply the inverse transformation to obtain an equation that models the original nonlinear data.

Properties of Logarithms
1. $\log(ab) = \log a + \log b$
2. $\log(a/b) = \log a - \log b$
3. $\log(x^p) = p \log x$
Remember: log has a base of 10 and natural logs (ln) have a base of e. It doesn’t matter which one you use, as long as you use the same one throughout.

Linearizing Exponential Functions: We want to write an exponential function of the form $y = ab^x$ as a linear model (where x and y are variables and a and b are constants).
$y = ab^x$
$\log y = \log(ab^x)$
$\log y = \log a + \log b^x$
$\log y = \log a + x \log b$
The transformation (x, y) → (x, log y) turns exponential data into linear data.

CONCLUSIONS: 1. If the graph of (x, y) is exponential, then the graph of (x, log y) is linear. 2. If the graph of (x, log y) is linear, then the graph of (x, y) is exponential.

Example #1 Transform the exponential model $y = 5(2)^x$ to a linear model using logs and then natural logs.
Using common logs:
$\log y = \log(5 \cdot 2^x)$
$\log y = \log 5 + \log 2^x$
$\log y = \log 5 + x \log 2$
$\log y = 0.69897 + 0.3010x$
Using natural logs:
$\ln y = \ln(5 \cdot 2^x)$
$\ln y = \ln 5 + \ln 2^x$
$\ln y = \ln 5 + x \ln 2$
$\ln y = 1.6094 + 0.6931x$

Example #2 Convert the equation back to an exponential function.
$\ln y = 16 + 9x$
$e^{\ln y} = e^{16 + 9x}$
$y = e^{16} \cdot e^{9x}$
$y = e^{16} \cdot (e^{9})^x$
$y = 8{,}886{,}110.521 \cdot 8103.0839^x$

Example #3 Convert the equation back to an exponential function.
$\log y = 4 + 2x$
$10^{\log y} = 10^{4 + 2x}$
$y = 10^{4} \cdot 10^{2x}$
$y = 10^{4} \cdot (10^{2})^x$
$y = 10{,}000 \cdot 100^x$
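The back-transformation in Examples #2 and #3 can be sketched in Python. This is only an illustration: the helper name `back_transform_exponential` is made up here, and it assumes the fitted line has the form log_base(y) = intercept + slope·x.

```python
import math

def back_transform_exponential(intercept, slope, base=10.0):
    """Turn log_base(y) = intercept + slope*x into y = a * b**x.

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    a = base ** intercept   # constant multiplier
    b = base ** slope       # growth factor per unit of x
    return a, b

# Example #3: log y = 4 + 2x (base 10)
a10, b10 = back_transform_exponential(4, 2, base=10.0)

# Example #2: ln y = 16 + 9x (base e)
a_e, b_e = back_transform_exponential(16, 9, base=math.e)

print(a10, b10)
print(round(a_e, 3), round(b_e, 4))
```

The same helper works for either base, which matches the earlier remark that it does not matter whether log or ln is used.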

Calculator Tip: Exponential Functions
L1: x
L2: y
L3: leave blank for now!
L4: log y
LinReg(L1, L4, Y1), that is, (x, log y, Y1)
To prevent an overflow error, convert years to a smaller number (for example, years since 1900).

Calculator Tip: Residual Plot
After calculating the line of regression, the residuals can be found in the Lists menu.

Calculator Tip: Exponential Equation ExpReg(L1, L2, Y2) - (x, y, Y2)

Exponential to Linear Change:
1. The ratios of consecutive y-values should be fairly constant.
2. Graph x and y and look at the pattern.
3. Calculate the transformed linear model.
4. Describe the r value and the residual plot.

Example #4: Consider the following data representing the Asian and Pacific Islander population.

Year                       1950  1960  1970  1980  1990  2000
Population (in thousands)  1131  1620  2320  3330  4770  6850

1. Make a scatterplot of the data and describe the graph.

D (direction): Positive; as year increases, population increases
F (form): Nonlinear
S (strength): Strong

2. Describe the pattern of change and find the percent of change for each y (ratio of y’s). The ratios of consecutive y-values are fairly consistent (each is about 1.43, an increase of roughly 43% per decade), suggesting an exponential model.
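As a quick check of the ratio test, the following Python sketch divides each population value by the one before it. The variable names are illustrative only.

```python
# Population (in thousands) for 1950, 1960, ..., 2000 from Example #4
population = [1131, 1620, 2320, 3330, 4770, 6850]

# Ratio of each value to the previous one; roughly constant ratios
# suggest exponential growth.
ratios = [later / earlier for earlier, later in zip(population, population[1:])]

for ratio in ratios:
    print(round(ratio, 3))
```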

3. Find r and describe its meaning. D: Positive. S: Strong.

4. Graph and comment on the residual plot for x and y. The residual plot is curved, so a linear model is not a good fit.

5. Take the log of the y-values and make a new scatterplot.
Original (x, y): D: Positive. F: Nonlinear. S: Strong.
Transformed (x, log y): D: Positive. F: Linear. S: Strong.

6. Find the least-squares regression line of the transformed data. log(Population) = 2.27095 + 0.0156432(Year), where Year is measured in years since 1900.
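The fit in step 6 can be reproduced with NumPy instead of a calculator. This is a sketch under the assumption that Year is coded as years since 1900 (50, 60, ..., 100), which is the coding that matches the prediction step later in this example.

```python
import numpy as np

# Assumption: years coded as years since 1900
years_since_1900 = np.array([50, 60, 70, 80, 90, 100], dtype=float)
population = np.array([1131, 1620, 2320, 3330, 4770, 6850], dtype=float)

log_population = np.log10(population)

# Degree-1 polyfit returns (slope, intercept) of the least-squares line
slope, intercept = np.polyfit(years_since_1900, log_population, 1)

# Correlation between x and log y
r = np.corrcoef(years_since_1900, log_population)[0, 1]

print(intercept, slope, r)
```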

7. Find the value of r and describe its meaning.
Original data (step 3): r = 0.968. D: Positive. S: Strong.
Transformed data: r = 0.999999. D: Positive. S: Strong.

8. Construct the residual plot and describe its meaning.
Original data (step 4): Curve, not a good linear model.
Transformed data: No pattern, so a good linear model.

9. Perform the inverse transformation to express y-hat as an exponential equation.
$\log \hat{y} = 2.27095 + 0.0156432x$
$10^{\log \hat{y}} = 10^{2.27095 + 0.0156432x}$
$\hat{y} = 10^{2.27095} \cdot 10^{0.0156432x}$
$\hat{y} = 10^{2.27095} \cdot (10^{0.0156432})^x$
$\hat{y} = 186.6162 \cdot 1.0367^x$

10. Check your work on your calculator using ExpReg.

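11. Make a prediction for the population in 2010 using both equations (x = 110, population in thousands).
Linear form:
$\log \hat{y} = 2.27095 + 0.0156432(110)$
$\log \hat{y} = 3.991697$
$10^{\log \hat{y}} = 10^{3.991697}$
$\hat{y} = 9810.6342$
Exponential form:
$\hat{y} = 186.6162 \cdot 1.0367^{x}$
$\hat{y} = 186.6162 \cdot 1.0367^{110}$
$\hat{y} = 9810.6342$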

Example #5: Consider the following data representing an account balance over time:

x: time (months)        0    48      96      144     192     240
y: account balance ($)  100  161.22  259.93  419.06  675.62  1089.30

1. Make a scatterplot of the data and describe the graph.

D: Positive; as time increases, account balance increases
F: Nonlinear
S: Strong

2. Describe the pattern of change and find the percent of change for each y (ratio of y’s).

3. Find r and describe its meaning. D: Positive. S: Strong.

4. Graph and comment on the residual plot for x and y. The residual plot is curved, so a linear model is not a good fit.

5. Take the natural log of the y-values and make a new scatterplot.
Original (x, y): D: Positive. F: Nonlinear. S: Strong.
Transformed (x, ln y): D: Positive. F: Linear. S: Strong.

6. Find the least squares regression line of the transformed data. ln(Account Balance) = 4.60516 + 0.00995047(Months)

7. Find r and describe its meaning.
Original data (step 3): r = 0.9481. D: Positive. S: Strong.
Transformed data: D: Positive. S: Strong.

8. Construct the residual plot and describe its meaning.
Original data (step 4): Curved, not a good linear model.
Transformed data: No pattern, so a good linear model.

9. Perform the inverse transformation to express y-hat as an exponential equation.
$\ln \hat{y} = 4.60516 + 0.00995047x$
$e^{\ln \hat{y}} = e^{4.60516 + 0.00995047x}$
$\hat{y} = e^{4.60516} \cdot e^{0.00995047x}$
$\hat{y} = e^{4.60516} \cdot (e^{0.00995047})^x$
$\hat{y} = 99.9988 \cdot 1.01^x$

10. Check your work on your calculator using ExpReg.

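11. Make a prediction for the account balance in 60 months using both equations.
Linear form:
$\ln \hat{y} = 4.60516 + 0.00995047(60)$
$\ln \hat{y} = 5.20218656728$
$e^{\ln \hat{y}} = e^{5.20218656728}$
$\hat{y} = \$181.67$
Exponential form:
$\hat{y} = 99.9988 \cdot 1.01^{x}$
$\hat{y} = 99.9988 \cdot 1.01^{60}$
$\hat{y} = \$181.67$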
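A short Python sketch of the same prediction, using the coefficients reported in step 6. It illustrates that the linear form and the back-transformed exponential form are the same model and so give the same answer; the variable names are illustrative.

```python
import math

# Coefficients of ln(balance) = intercept + slope * months, from step 6
intercept = 4.60516
slope = 0.00995047
months = 60

# Route 1: predict ln(y) on the linear scale, then undo the log
prediction_linear = math.exp(intercept + slope * months)

# Route 2: back-transform first to y = a * b**x, then plug in x
a = math.exp(intercept)
b = math.exp(slope)
prediction_exponential = a * b ** months

print(round(prediction_linear, 2), round(prediction_exponential, 2))
```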

4.1 – Transforming to Achieve Linearity – Power Model

A power model is in the form $y = ax^p$. To transform this equation into a linear model you must apply the log transformation to both variables x and y.
$y = ax^p$
$\log y = \log(ax^p)$
$\log y = \log a + \log x^p$
$\log y = \log a + p \log x$
How is this different from exponential functions? You have to take the log of both x and y to make a linear model.

Example #6 Find the LSRL for $y = 4x^5$ by taking the logs and then the natural logs.
Using common logs:
$\log y = \log(4x^5)$
$\log y = \log 4 + \log x^5$
$\log y = \log 4 + 5 \log x$
$\log y = 0.6021 + 5 \log x$
Using natural logs:
$\ln y = \ln(4x^5)$
$\ln y = \ln 4 + \ln x^5$
$\ln y = \ln 4 + 5 \ln x$
$\ln y = 1.3863 + 5 \ln x$

Example #7 Convert the equation back to a power equation.
$\ln y = -5 + 9 \ln x$
$e^{\ln y} = e^{-5 + 9 \ln x}$
$y = e^{-5} \cdot e^{9 \ln x}$
$y = e^{-5} \cdot (e^{\ln x})^9$
$y = 0.0067x^9$

Example #8 Convert the equation back to a power equation.
$\log y = 0.5 + 2 \log x$
$10^{\log y} = 10^{0.5 + 2 \log x}$
$y = 10^{0.5} \cdot 10^{2 \log x}$
$y = 10^{0.5} \cdot (10^{\log x})^2$
$y = 3.1623x^2$

Calculator Tip: Power Functions
L1: x
L2: y
L3: log x
L4: log y
LinReg(L3, L4, Y1), that is, (log x, log y, Y1)

Calculator Tip: Power Equation PwrReg(L1, L2, Y2) - (x, y, Y2)

Example #9 The distances from our sun and the periods of the 9 planets in the solar system are given below.

Distance (astronomical units)  .39  .72  1  1.5  5.2  9.5  19  30   40
Period (earth years)           .24  .62  1  1.9  12   29   84  160  250

1. Make a scatterplot of the data and describe the graph.

D: Positive; as distance increases, period increases
F: Nonlinear
S: Strong

2. Describe the pattern of change and find the percent of change for each y (ratio of y’s). The ratios of the y’s are not similar, so the relationship is probably not exponential.

3. Find r and describe its meaning. D: Positive. S: Strong.

4. Graph an exponential model and discuss if it is appropriate to use this model. The plot is still curved, so it is not a good linear model and an exponential model is not appropriate.

5. Transform the data to a linear model by taking the log of the x’s and the y’s. Make a sketch of the new scatterplot.
Original (x, y): D: Positive. F: Nonlinear. S: Strong.
Transformed (log x, log y): D: Positive. F: Linear. S: Strong.

6. Find the least squares regression line of the transformed data. log(Period) = 0.002916 + 1.49627log(Distance)
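The log-log fit can be sketched with NumPy as follows. One assumption is flagged in the code: Earth's entry (distance 1, period 1) is included so that there are nine planets, since a period of 1 earth year for Earth is implied by the units.

```python
import numpy as np

# Distance from the sun in astronomical units
distance = np.array([0.39, 0.72, 1.0, 1.5, 5.2, 9.5, 19.0, 30.0, 40.0])
# Period in earth years; the third value (Earth = 1) is an assumption
# implied by the units rather than read from the data table.
period = np.array([0.24, 0.62, 1.0, 1.9, 12.0, 29.0, 84.0, 160.0, 250.0])

log_distance = np.log10(distance)
log_period = np.log10(period)

# Least-squares line for log(period) versus log(distance)
slope, intercept = np.polyfit(log_distance, log_period, 1)

# Back-transform to the power model: period = a * distance**p
a = 10 ** intercept
p = slope

print(intercept, slope, a)
```

A slope near 1.5 on the log-log scale corresponds to a power model with exponent near 1.5.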

7. Find the value of r and describe its meaning.
Original data (step 3): r = 0.9779. D: Positive. S: Strong.
Transformed data: r = 0.9999765. D: Positive. S: Strong.

8. Construct the residual plot and describe its meaning. No pattern, so a good linear model.

9. Perform the inverse transformation to express y-hat as a power equation.
$\log \hat{y} = 0.002916 + 1.49627 \log x$
$10^{\log \hat{y}} = 10^{0.002916 + 1.49627 \log x}$
$\hat{y} = 10^{0.002916} \cdot 10^{1.49627 \log x}$
$\hat{y} = 10^{0.002916} \cdot (10^{\log x})^{1.49627}$
$\hat{y} = 1.0067x^{1.49627}$

10. Check your work on your calculator using PwrReg.

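11. If a planet were discovered 35 astronomical units from our sun, predict its period using both equations.
Power form:
$\hat{y} = 1.0067x^{1.49627}$
$\hat{y} = 1.0067(35)^{1.49627}$
$\hat{y} = 205.709$
Linear form:
$\log \hat{y} = 0.002916 + 1.49627 \log(35)$
$10^{\log \hat{y}} = 10^{0.002916 + 1.49627 \log(35)}$
$\hat{y} \approx 205.7$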

How do you determine if the model is exponential or power?
1. Graph the original data. Do you see a curve?
2. Look at the ratios of consecutive y-values to see if the model may be exponential.
3. Take the logs of both x and y. Then graph (x, log y) and (log x, log y). Which graph looks more linear?
4. Use the r value and the residual plot to determine the strength of the linear relationship.
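Steps 3 and 4 can be sketched in Python by comparing the correlation of (x, log y) with that of (log x, log y). The function name `compare_models` and the synthetic data are illustrative assumptions, and r alone is not enough in practice: the residual plots should be checked too.

```python
import numpy as np

def compare_models(x, y):
    """Return (r_exponential, r_power) for positive-valued x and y.

    r_exponential: correlation of x with log y (exponential model check)
    r_power:       correlation of log x with log y (power model check)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_exponential = np.corrcoef(x, np.log10(y))[0, 1]
    r_power = np.corrcoef(np.log10(x), np.log10(y))[0, 1]
    return r_exponential, r_power

# Synthetic data generated from a power model, y = 3x^2
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = 3 * x ** 2

r_exponential, r_power = compare_models(x, y)
better = "power" if abs(r_power) > abs(r_exponential) else "exponential"
print(r_exponential, r_power, better)
```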

Example #10 An experiment was conducted to determine the effect of practice time (in seconds) on the percent of unfamiliar words recalled. Here is a Fathom scatterplot of the results with a least-squares regression line superimposed. (a) Sketch a residual plot below.

(b) Does a linear model fit the data well? Justify your answer. No. The residual plot shows a curved pattern, so a linear model does not fit well.

We used Fathom to transform the original data in hopes of achieving linearity. The screen shots below show the results of two different transformations. (c) Would an exponential model or a power model fit the original data better? Justify your answer. The power model: its transformed data have a stronger r value and a residual plot that is not as curved.

(d) Use the model you chose in (c) to predict word recall for 25 seconds of practice. Show your method.

Example #11 Foresters are interested in predicting the amount of usable lumber they can harvest from various tree species. The following data have been collected on the diameter of Ponderosa pine trees, measured at chest height, and the yield in board feet. Note that a board foot is defined as a piece of lumber 12 inches by 12 inches by 1 inch. Determine if an exponential or power model would make a better model. Support your reasoning. Using the model you have chosen, predict the yield in board feet from a diameter of 40.

1. Graph the original data. Do you see a curve? Yes.
Diameter Bd Feet 36 192 28 113 88 41 294 19 32 123 22 51 38 252 25 56 17 16 31 141 20 86 21 39 231 33 187 37 205 23 57 265

2. Look at the ratios of the y-values to see if the model may be exponential. No.

3. Take the logs of both x and y. Then graph (x, log y) and (log x, log y). Which graph looks more linear? The power transformation is more linear.

4. Use the r value and the residual plot to determine the strength of the linear relationship. The power transformation has a stronger r value and no curve in its residual plot; therefore, a power model is the better choice.

Using the model you have chosen, predict the yield in board feet from a diameter of 40.

4.2 – Relationship between Categorical Variables

http://www.ruf.rice.edu/~lane/stat_sim/transformations/index.html

Because we cannot perform direct calculations on categorical data, we use the counts or percents of individuals in each category.
Two-Way Table: Classifies categorical data according to two variables.
Marginal Distribution: The distribution of one variable alone, given by the row totals or column totals in the margins of the table.
Conditional Distribution: The distribution of one variable within a given category of the other variable.

Segmented bar graph: Shows each conditional distribution as a bar whose segments add up to 100%. The following segmented bar graph represents the conditional distributions of living arrangements for each race category:

Example #12 In a national survey of adult Americans in 1998, people were asked to indicate their age and to classify their interest in politics as very much, somewhat, or not much. The ages were grouped in ranges.

            18-35  36-55  56-94  Total
Not Much    146    146    89     381
Somewhat    192    260    154    606
Very Much   47     125    106    278
Total       385    531    349    1265

a. Calculate the row and column totals (shown in the Total row and column above).

b. What proportion of the survey respondents were between ages 18 and 35? 385/1265 = 0.3043

c. What proportion of the survey respondents were between 36 and 55? 531/1265 = 0.41976

d. What proportion of the survey respondents were between 56 and 94? 349/1265 = 0.27589

e. Restrict your attention (for the moment) to just the respondents aged 18 to 35. What proportion of these young respondents classify themselves as having not much interest in politics? 146/385 = 0.3792

f. What proportion of the young respondents classify themselves as somewhat interested in politics? 192/385 = 0.4987

g. What proportion of the young respondents classify themselves as having very much interest in politics? 47/385 = 0.1221

h. Record the conditional distribution that you have just calculated in the table below.

            18-35   36-55   56-94
Not Much    0.3792  0.2749  0.2550
Somewhat    0.4987  0.4896  0.4413
Very Much   0.1221  0.2354  0.3037
Total       1.000   1.000   1.000
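The marginal and conditional distributions in Example #12 can be sketched with NumPy. One value is an assumption: the "Not Much" count for ages 36-55 is taken to be 146, the value implied by the row total of 381 and the column total of 531.

```python
import numpy as np

# Rows: Not Much, Somewhat, Very Much; columns: 18-35, 36-55, 56-94.
# The middle entry of the first row (146) is inferred from the totals.
counts = np.array([
    [146, 146, 89],
    [192, 260, 154],
    [47, 125, 106],
])

row_totals = counts.sum(axis=1)      # totals for each interest level
column_totals = counts.sum(axis=0)   # totals for each age group
grand_total = counts.sum()

# Marginal distribution of age: column totals over the grand total
marginal_age = column_totals / grand_total

# Conditional distribution of interest within each age group:
# each column is divided by its own column total
conditional_by_age = counts / column_totals

print(row_totals, column_totals, grand_total)
print(np.round(marginal_age, 4))
print(np.round(conditional_by_age, 4))
```

Each column of `conditional_by_age` sums to 1, which is what lets a segmented bar graph stack to 100%.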

i. Construct a segmented bar graph

Example #13 The University of California at Berkeley was charged with having discriminated against women in their graduate admissions process for the fall quarter of 1973. The table below identifies the number of acceptances and denials for both men and women applicants in each of the six largest graduate programs at the institution at that time.

           Men Accepted  Men Denied  Women Accepted  Women Denied
Program A  511           314         89              19
Program B  352           208         17              8
Program C  120           205         202             391
Program D  137           270         132             243
Program E  53            138         95              298
Program F  22            351         24              317
Total      1195          1486        559             1276

a. Start by ignoring the program distinction, collapsing the data into a two-way table of gender by admissions status. To do this, find the total number of men accepted and denied and the total number of women accepted and denied. Fill in the table below:

       Accepted  Denied  Total
Men    1195      1486    2681
Women  559       1276    1835
Total  1754      2762    4516

b. Consider for the moment just the men applicants. Of the men who applied to one of these programs, what proportion were accepted? Now consider the women applicants; what proportion of them were accepted? Do these proportions seem to support the claim that men were given preferential treatment in admissions decisions?
Men: 1195/2681 = 0.4457
Women: 559/1835 = 0.3046

c. To try to isolate the program responsible for the alleged mistreatment of women applicants, calculate the proportion of men and the proportion of women within each program who were accepted. Each proportion divides the number accepted by the number who applied to that program (accepted plus denied). Record your results in the table below:

           Proportion of men accepted  Proportion of women accepted
Program A  511/825 = 0.6194            89/108 = 0.8241
Program B  352/560 = 0.6286            17/25 = 0.6800
Program C  120/325 = 0.3692            202/593 = 0.3406
Program D  137/407 = 0.3366            132/375 = 0.3520
Program E  53/191 = 0.2775             95/393 = 0.2417
Program F  22/373 = 0.0590             24/341 = 0.0704

d. Does it seem as if any program is responsible for the large discrepancy between men and women in the overall proportions admitted? No. Within each program, women were accepted at about the same rate as men or at a higher rate; only programs C and E show slightly lower rates for women. The overall discrepancy arises because most women applied to programs with low acceptance rates (C through F), while more than half of the men applied to programs A and B, which accepted a majority of their applicants.
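A minimal Python sketch of the within-program calculation, where each acceptance rate is the number accepted divided by the number who applied to that program (accepted plus denied). The dictionary layout and names are illustrative.

```python
# (men accepted, men denied, women accepted, women denied) per program
programs = {
    "A": (511, 314, 89, 19),
    "B": (352, 208, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (137, 270, 132, 243),
    "E": (53, 138, 95, 298),
    "F": (22, 351, 24, 317),
}

# Acceptance rate within each program, separately for men and women
rates = {}
for name, (men_acc, men_den, women_acc, women_den) in programs.items():
    men_rate = men_acc / (men_acc + men_den)
    women_rate = women_acc / (women_acc + women_den)
    rates[name] = (men_rate, women_rate)
    print(name, round(men_rate, 4), round(women_rate, 4))

# Overall rates after collapsing across programs
men_overall = sum(p[0] for p in programs.values()) / sum(p[0] + p[1] for p in programs.values())
women_overall = sum(p[2] for p in programs.values()) / sum(p[2] + p[3] for p in programs.values())
print(round(men_overall, 4), round(women_overall, 4))
```

Comparing the per-program rates with the overall rates shows how the collapsed table can point in a different direction from the individual programs.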

Example #14: The following two-way table classifies hypothetical hospital patients according to the hospital that treated them and whether they survived or died:

            Survived  Died  Total
Hospital A  800       200   1000
Hospital B  900       100   1000

a. Calculate the proportion of hospital A’s patients who survived and the proportion of hospital B’s patients who survived. Which hospital saved the higher percentage of its patients?
Hospital A: 800/1000 = 0.80
Hospital B: 900/1000 = 0.90

Suppose that when we further categorize each patient according to whether they were in fair condition or poor condition prior to treatment we obtain the following two-way tables:

FAIR CONDITION
            Survived  Died  Total
Hospital A  590       10    600
Hospital B  870       30    900

POOR CONDITION
            Survived  Died  Total
Hospital A  210       190   400
Hospital B  30        70    100

b. Among those who were in fair condition, compare the recovery rates for the two hospitals. Which hospital saved the greater percentage of its patients who had been in fair condition?
Hospital A: 590/600 = 0.9833
Hospital B: 870/900 = 0.9667

c. Among those who were in poor condition, compare the recovery rates for the two hospitals. Which hospital saved the greater percentage of its patients who had been in poor condition?
Hospital A: 210/400 = 0.525
Hospital B: 30/100 = 0.3

Simpson’s Paradox: An association that holds within each of several groups can reverse direction when the data from the groups are combined.
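The reversal can be checked with a short Python sketch using the hypothetical hospital counts from Example #14. The data structure and function name are illustrative.

```python
# (survived, died) for each hospital, split by prior condition
data = {
    "fair": {"A": (590, 10), "B": (870, 30)},
    "poor": {"A": (210, 190), "B": (30, 70)},
}

def survival_rate(survived, died):
    """Proportion of patients who survived."""
    return survived / (survived + died)

# Survival rate within each condition group
by_condition = {
    condition: {h: survival_rate(*counts) for h, counts in hospitals.items()}
    for condition, hospitals in data.items()
}

# Survival rate after combining the two condition groups
overall = {}
for hospital in ("A", "B"):
    survived = sum(data[c][hospital][0] for c in data)
    died = sum(data[c][hospital][1] for c in data)
    overall[hospital] = survival_rate(survived, died)

print(by_condition)
print(overall)
```

Hospital A has the higher rate within each condition group, while hospital B has the higher combined rate, because hospital A treated far more patients who arrived in poor condition.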

d. Write a few sentences explaining (arguing from the given data) how it happens that hospital B has the higher recovery rate overall, yet hospital A has the higher recovery rate for each type of patient.
e. Which hospital would you rather go to if you were ill? Explain.

4.3 – Establishing Causation

The only time you can determine causation is when you conduct an experiment. What if you can’t do an experiment? Look for the following:
1. A strong, consistent association
2. Larger values of the explanatory variable are associated with stronger responses
3. A plausible cause

Causation: x causes y

Common response: It seems x causes y, but z has an effect on both x and y (“z is common to both”), making it look like x causes y

Confounding: It seems x causes y, but z also has an effect on y, making it look like x causes y

Example #15 A soccer coach wanted to improve the team's playing ability, so he had them run two miles a day. At the same time the players decided to take vitamins. In two weeks the team was playing noticeably better, but the coach and players did not know whether it was from the running or the vitamins. What type of variable is this? Confounding. Both the running and the vitamins may have improved the team's ability, and their effects cannot be separated.

Example #16 An article that appeared in the San Luis Obispo Tribune (November 11, 1999) was titled “Study Points Out Dangerous Side to SUV Popularity: Half of All 1996 Ejection Deaths Occur in SUVs.” This article states that SUVs have a much higher rate of passengers being thrown from a window during an accident than do automobiles. The article also states that more than half of all deaths caused by ejection involved SUVs – the basis for the conclusion that SUVs are more dangerous than cars. Later in the article, there is a comment that about 98% of those injured or killed in ejection accidents were not wearing seat belts. Comment on the conclusion that SUVs are more dangerous than cars. Confounding variable: seat belt use.

Diagram: SUV, rollover, seat belts

Example #17 A study showed that households with more TV sets tend to have longer life expectancies. Describe a possible common response relationship. COMMON RESPONSE! Having more money may lead both to more TVs and to a longer life expectancy.

Example #18 Based on a survey conducted on the DietSmart.com website, investigators concluded that women who regularly watched Oprah were only one-seventh as likely to crave fattening foods as those who watched other daytime talk shows. Is it reasonable to conclude that watching Oprah causes a decrease in cravings for fattening foods? Explain. No. This was a survey, not an experiment.