Section 4.4: Simpson's Paradox. Section 4.5: Linearizing an association between two variables by performing a mathematical transformation


1 Section 4.4: Simpson's Paradox. Section 4.5: Linearizing an association between two variables by performing a mathematical transformation

2 Simpson's Paradox

Consider the following study: accident rates in California. A study showed that male teenagers have twice the accident rate of female teenagers.

                             Male     Female
Proportion of accidents      0.162    0.075

The study did not take into account the confounding variable: number of miles driven per year!

                                                 Male     Female
Accident rate                                    0.162    0.075
Average number of miles per person               9,557    4,643
Average number of accidents per 100,000 miles    1.78     1.77

This more accurate study shows NO DIFFERENCE! The higher proportion of accidents for male teenagers is explained away by the fact that male teenagers typically drive more. This is an example of Simpson's paradox.

3 Another example: medical study of a treatment
http://qjmed.oxfordjournals.org/cgi/content/full/95/4/247

Table 1. Number of patients responding to treatment A vs. treatment B: A is better than B

               Response    No response    Response rate
Treatment A    20          20             20/40 = 50%
Treatment B    16          24             16/40 = 40%

Table 2. Number of patients with high serum X responding to treatment A vs. treatment B: in this subgroup, B is better than A

               Response    No response    Response rate
Treatment A    18          12             18/30 = 60%
Treatment B    7           3              7/10 = 70%

Table 3. Number of patients with low serum X responding to treatment A vs. treatment B: in this subgroup too, B is better than A

               Response    No response    Response rate
Treatment A    2           8              2/10 = 20%
Treatment B    9           21             9/30 = 30%
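The reversal in Tables 1 to 3 can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the dictionary layout and the variable names (`counts`, `subgroup_rates`, `overall_rates`) are illustrative choices, not part of the original study. It recomputes the response rates per serum X subgroup and after pooling the subgroups.

```python
# (responders, total patients) per subgroup and treatment, from Tables 2 and 3.
counts = {
    "high serum X": {"A": (18, 30), "B": (7, 10)},
    "low serum X": {"A": (2, 10), "B": (9, 30)},
}

# Response rate within each subgroup.
subgroup_rates = {
    group: {treatment: responders / total
            for treatment, (responders, total) in by_treatment.items()}
    for group, by_treatment in counts.items()
}

# Response rate after pooling both subgroups (this reproduces Table 1).
overall_rates = {}
for treatment in ("A", "B"):
    responders = sum(counts[group][treatment][0] for group in counts)
    total = sum(counts[group][treatment][1] for group in counts)
    overall_rates[treatment] = responders / total

for group, rates in subgroup_rates.items():
    print(group, rates)
print("pooled", overall_rates)
```

Within each subgroup B has the higher rate, while the pooled rates favor A, because most of A's patients are in the high-response (high serum X) subgroup and most of B's are in the low-response one.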

4 Conclusion
http://qjmed.oxfordjournals.org/cgi/content/full/95/4/247

"Thus, if the patient's serum X level is unknown, treatment A seems to be better, but if serum X is known, treatment B is preferable (and one can better predict the response rate of a patient). This phenomenon is a result of the aggregation of two (or more) subgroups [1]. The numbers of the example are kept simple to demonstrate this phenomenon of severe confounding, but there are a number of real examples in the literature, including the medical literature [2–4]. This aggregation effect can occur in the case of an uneven distribution of a 'latent variable' (in this case the serum X level) among the groups studied."

5 Simpson's Paradox represents a situation in which an association between two variables inverts or goes away when:
- data are collapsed across a sub-classification (in the previous example, across different serum X levels), so that the overall association may not represent what is really happening;
- a lurking variable is present and data from unequal-sized groups are combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results misleadingly.
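One way to see the weighting effect is to write each pooled response rate from the treatment example as a weighted average of the subgroup rates, with weights equal to the share of that treatment's patients in each serum X subgroup. The worked equations below use only the counts from the tables in the previous example; the symbols $r_A$ and $r_B$ are introduced here for illustration.

```latex
\begin{align*}
r_A &= \tfrac{30}{40}(0.60) + \tfrac{10}{40}(0.20) = 0.45 + 0.05 = 0.50 \\
r_B &= \tfrac{10}{40}(0.70) + \tfrac{30}{40}(0.30) = 0.175 + 0.225 = 0.40
\end{align*}
```

Treatment B has the higher rate in both subgroups, but three quarters of its patients fall in the low-response subgroup, which pulls its pooled rate below that of treatment A.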

6 Nonlinear Regression: Exponential relationship

7 Power relationship

8 Linearization

Apply a logarithmic transformation to re-express the previous exponential or power functions as linear functions, using the properties of logarithms:

$\log_a (MN) = \log_a M + \log_a N$
$\log_a M^r = r \log_a M$

(M, N, and a are positive real numbers, a ≠ 1, and r is any real number.)

9 Exponential Model

$y = ab^x$
$\log y = \log(ab^x)$ (take the common logarithm of both sides)
$\log y = \log a + \log b^x$
$\log y = \log a + x \log b$
$Y = A + Bx$, where $Y = \log y$, $A = \log a$, $B = \log b$, so that $a = 10^A$ and $b = 10^B$.
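The derivation above can be sketched in code: regress log y on x, then back-transform the intercept and slope. The example below is a minimal Python sketch using only the standard library. The helper `fit_line` and the data are hypothetical; the data are generated from y = 3 · 2^x purely so that the recovered a and b are easy to check.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data that follow y = 3 * 2**x exactly (illustration only).
xs = [0, 1, 2, 3, 4, 5]
ys = [3.0 * 2.0 ** x for x in xs]

# Linearize: Y = log10(y) is linear in x with intercept A and slope B.
log_ys = [math.log10(y) for y in ys]
A, B = fit_line(xs, log_ys)

# Back-transform to the exponential model y = a * b**x.
a = 10 ** A
b = 10 ** B
print(f"a = {a:.4f}, b = {b:.4f}")
```

With real, noisy data the fit minimizes squared error in log y rather than in y, so the back-transformed curve is not the same as a direct nonlinear least-squares fit.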

10 Power Model

$y = ax^b$
$\log y = \log(ax^b)$ (take the common logarithm of both sides)
$\log y = \log a + \log x^b$
$\log y = \log a + b \log x$
$Y = A + bX$, where $Y = \log y$, $X = \log x$, $A = \log a$, so that $a = 10^A$.

11 Example: The statistics of poverty and inequality. Data from the U.N.E.S.C.O. 1990 Demographic Year Book. For 97 countries in the world, data are given for birth rates and for an index of the Gross National Product. The plot shows an exponential relation!

12 Linearization using the log function: birth rate vs. log G.N.P.

The previous plot shows a non-linear association. We can make it linear by transforming G.N.P. with the natural log and plotting birth rate against log G.N.P.

13 EXAMPLE: Finding the Curve of Best Fit to a Power Model

Cathy wishes to measure the relation between a light bulb's intensity and the distance from some light source. She measures a 40-watt light bulb's intensity 1 meter from the bulb and at 0.1-meter intervals up to 2 meters from the bulb and obtains the following data.

Distance    Intensity
1.0         0.0972
1.1         0.0804
1.2         0.0674
1.3         0.0572
1.4         0.0495
1.5         0.0433
1.6         0.0384
1.7         0.0339
1.8         0.0294
1.9         0.0268
2.0         0.0224

14
(a) Draw a scatter diagram of the data, treating the distance, x, as the predictor variable.
(b) Determine X = log x and Y = log y and draw a scatter diagram, treating X = log x as the predictor variable and Y = log y as the response variable. Comment on the shape of the scatter diagram.
(c) Find the least-squares regression line of the transformed data.
(d) Determine the power equation of best fit and graph it on the scatter diagram obtained in part (a).
(e) Use the power equation of best fit to predict the intensity of the light if you stand 2.3 meters away from the bulb.
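Parts (b) through (e) can be sketched numerically as follows. This is a minimal Python sketch using only the standard library, not the worked solution from the slides; the helper `least_squares` and the variable names are illustrative. It log-transforms both variables, fits the least-squares line Y = A + bX, back-transforms to y = a·x^b, and evaluates the fitted model at 2.3 meters.

```python
import math

# Distance (m) and intensity from the light bulb example.
distance = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
intensity = [0.0972, 0.0804, 0.0674, 0.0572, 0.0495, 0.0433,
             0.0384, 0.0339, 0.0294, 0.0268, 0.0224]

def least_squares(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# (b) Transform both variables with the common logarithm.
X = [math.log10(x) for x in distance]
Y = [math.log10(y) for y in intensity]

# (c) Least-squares line Y = A + b X on the transformed data.
A, b_hat = least_squares(X, Y)

# (d) Back-transform the intercept to get the power model y = a * x**b.
a_hat = 10 ** A

# (e) Predicted intensity at 2.3 meters (an extrapolation beyond the data).
predicted = a_hat * 2.3 ** b_hat

print(f"Y = {A:.4f} + ({b_hat:.4f}) X")
print(f"y = {a_hat:.4f} * x^({b_hat:.4f})")
print(f"predicted intensity at 2.3 m: {predicted:.4f}")
```

Note that 2.3 meters lies outside the measured range of 1 to 2 meters, so the prediction in part (e) relies on the power relationship continuing to hold beyond the data.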


