More on Two-Variable Data. Chapter Objectives Identify settings in which a transformation might be necessary in order to achieve linearity. Use transformations.

1 More on Two-Variable Data

2 Chapter Objectives Identify settings in which a transformation might be necessary in order to achieve linearity. Use transformations involving powers and logarithms to linearize curved relationships. Explain what is meant by a two-way table, and describe its parts. Give an example of Simpson’s Paradox. Explain what gives the best evidence for causation. Explain the criteria for establishing causation when experimentation is not feasible.

3 The Goal Our goal is to fit a model to curved data so that we can make predictions as we did in chapter 3. HOWEVER, the only statistical tool we have to fit a model is the least-squares regression model. THEREFORE, in order to find a model for curved data, we must first “straighten it out”….

4 Transforming Relationships Data that displays a curved pattern can be modeled by a number of different functions. The two most common: –Exponential ($y = AB^x$) –Power ($y = Ax^B$) Chapter 4 focuses on these two models.

5 pp. 195 – 6 Example 4.1 Brain weight v. body weight Note about variables: –Sometimes we wish to transform x, or y, or both x and y. –Therefore we refer to variables generically as t.

6 Why Linear transformations cannot straighten a curved relationship between two variables. Because of this, we must resort to functions that are not linear.

7 A Note about Monotonic Functions

8 4.1 A. $y = 2.54x$: monotonic increasing. B. $y = 60/x$: monotonic decreasing. C. circumference = π(diameter): monotonic increasing. D. SquaredError = $(\text{time} - 5)^2$: not monotonic.

9 Figure 4.5 What can we learn? –The graph of a linear function (power p = 1) is a straight line. –Powers greater than 1 (like p = 2 and p = 4) give graphs that bend upward. The sharpness of the bend increases as p increases. –Powers less than 1 but greater than 0 (like p = 0.5) give graphs that bend downward. –Powers less than 0 (like p = -0.5 and p = -1) give graphs that decrease as x increases. Greater negative values of p result in graphs that decrease more quickly. –Look at the p = 0 graph. You may be surprised that this is not the graph of $y = x^0$. Why not? The 0th power $x^0$ is just the constant 1, which is not very useful. The p = 0 entry in the figure is not constant; it is the logarithm, $\log x$. That is, the logarithm fits into the hierarchy of power transformations at p = 0.
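
A minimal sketch of the ladder of power transformations described on this slide. The function name `ladder_transform` and the sample x values are illustrative assumptions, not part of the original presentation; the only rule taken from the slide is that the p = 0 rung is the logarithm rather than the constant $x^0$.

```python
import math

def ladder_transform(x, p):
    """Apply the power-p rung of the ladder to a positive value x.

    By the convention on the slide, p = 0 stands for the logarithm
    (base 10 is assumed here), not for the constant x**0 = 1.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if p == 0:
        return math.log10(x)
    return x ** p

# Print each rung of the ladder for a few illustrative x values.
for p in (4, 2, 1, 0.5, 0, -0.5, -1):
    print(p, [round(ladder_transform(x, p), 3) for x in (1, 10, 100)])
```

Moving down the ladder (smaller p) pulls in large x values more strongly, which is why lower rungs are tried when a scatterplot bends upward.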

10 pp. 201 - 202 Example 4.2 runs through several steps from the ladder of power transformations. This emphasizes that the process can be one of –(a) making a good guess, based on observations of a graph of the data, about the type of transformation needed and –(b) trying several types of the transformation chosen. This can get tedious, so the next section introduces a more analytic approach. The first approach is to look for an exponential growth pattern, which has the advantage that it can be linearized by taking logarithms (of the response variable) to transform the data.

11 4.3 Weight = $c_1(\text{height})^3$ and strength = $c_2(\text{height})^2$; therefore, strength = $c(\text{weight})^{2/3}$, where $c$ is a constant.

12 4.4 A graph of the power law $y = x^{2/3}$ shows that strength does not increase linearly with body weight, as would be the case if a person 1 million times as heavy as an ant could lift 1 million times more than the ant. Rather, strength increases more slowly. For example, if weight is multiplied by 1000, strength will increase by a factor of $(1000)^{2/3} = 100$.
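
The arithmetic in 4.4 can be checked with a short sketch. The helper name `strength_factor` is an assumption made for illustration; it only encodes the two-thirds power law stated on the slide.

```python
def strength_factor(weight_factor):
    """Factor by which strength grows when weight is multiplied by
    weight_factor, under the power law strength = c * weight**(2/3)."""
    return weight_factor ** (2 / 3)

# Weight multiplied by 1000, as on the slide.
print(strength_factor(1000))

# A person one million times as heavy as an ant.
print(strength_factor(1_000_000))
```

Under this law, a million-fold increase in weight gives far less than a million-fold increase in strength.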

13 4.5 Let y = average heart rate and x = body weight. Kleiber’s law says that total energy consumed is proportional to the three-fourths power of body weight, that is, Energy = $c_1 x^{3/4}$. But total energy consumed is also proportional to the product of the volume of blood pumped by the heart and the heart rate, that is, Energy = $c_2(\text{volume})y$. The volume of blood pumped by the heart is proportional to body weight, that is, Volume = $c_3 x$. Putting these three equations together yields $c_1 x^{3/4} = c_2(\text{volume})y = c_2(c_3 x)y$. Solving for y, we obtain $y = \frac{c_1 x^{3/4}}{c_2 c_3 x} = c\,x^{-1/4}$, where $c = c_1/(c_2 c_3)$.

14 Exponential Growth Linear growth: adding a fixed increment in each equal time period. Exponential growth: multiplying by a fixed number in each equal time period. –Can also be looked at as growing by a fixed percentage.
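
To make the contrast concrete, here is a small sketch with made-up numbers (a starting value of 100, an increment of 10, and a growth factor of 1.10 are all assumptions chosen for illustration).

```python
def linear_growth(start, increment, periods):
    """Add a fixed increment in each equal time period."""
    return [start + increment * t for t in range(periods + 1)]

def exponential_growth(start, factor, periods):
    """Multiply by a fixed factor in each equal time period."""
    return [start * factor ** t for t in range(periods + 1)]

linear = linear_growth(100, 10, 5)
exponential = exponential_growth(100, 1.10, 5)  # 10% growth per period

# Linear growth has constant differences; exponential has constant ratios.
differences = [b - a for a, b in zip(linear, linear[1:])]
ratios = [b / a for a, b in zip(exponential, exponential[1:])]
print(linear, differences)
print([round(v, 2) for v in exponential], [round(r, 2) for r in ratios])
```

Checking for roughly constant ratios between successive values is the same test the later exercises apply to real data.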

15 p. 205 Example 4.4 Is this exponential growth? What is the projected amount for 2005? Actual was 203,000,000 (2005) Other interesting statistics: –2,000,000,000 cell phones world wide 4.5% world without –Average American spends 13 talking hours per month –Average American in 18 – 24 age group spends 22 talking hours per month

16 Texting in the United States

17 Logarithm $\log_b x = y$ if and only if $b^y = x$. The rules for logarithms are $\log(AB) = \log A + \log B$, $\log(A/B) = \log A - \log B$, and $\log(X^p) = p \log X$.

18 p. 209 Example 4.6

19 4.6 A.

20 4.6 B. 226260/63024 = 3.59, 907075/226260 = 4.01, 2826095/907075 = 3.12. C. log y yields 4.7996, 5.3546, 5.9576, 6.4512.
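
The ratio and logarithm steps in 4.6 B and C can be sketched as follows. The four values are the ones that appear in the ratios above; the variable names are illustrative assumptions, and base-10 logarithms are assumed.

```python
import math

# Values taken from the ratios listed in 4.6 B.
acres = [63024, 226260, 907075, 2826095]

# Ratios of successive values: roughly constant ratios suggest
# exponential growth.
ratios = [later / earlier for earlier, later in zip(acres, acres[1:])]

# Base-10 logarithms of the response, used to linearize the pattern.
logs = [math.log10(a) for a in acres]

print([round(r, 2) for r in ratios])
print([round(v, 4) for v in logs])
```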

21 4.6 C.

22 4.6 D. use calculator to confirm E. The residual plot of the transformed data shows no clear pattern, so the line is a reasonable model for these points.

23 4.6 F.

24 4.6 G. The predicted number of acres defoliated in 1982 is the exponential function evaluated at 1982, which gives 10,719,964.92 acres.

25 4.9

26 4.10 A.
Year   # children killed
1951   2
1952   4
1953   8
1954   16
1955   32
1956   64
1957   128
1958   256
1959   512
1960   1024

27 4.10 B.

28 4.10 C. If x = number of years after 1950, then y = the number of children killed x years after 1950 = $2^x$. At $x = 45$, $y = 2^{45} \approx 3.52 \times 10^{13}$, or about 35,200,000,000,000.

29 4.10 D.

30 4.10 E. b = 0.3010 a = -587.008
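
A hedged sketch of how the coefficients in 4.10 E might be reproduced, assuming the regression is of $\log_{10} y$ on the calendar year and that the counts double every year from 2 in 1951 (as in part A). The use of NumPy's `polyfit` and the helper `predict` are choices made here, not something the slides specify.

```python
import numpy as np

# Doubling data from 4.10 A: 2 in 1951, 4 in 1952, ..., 1024 in 1960.
years = np.arange(1951, 1961)
killed = 2.0 ** (years - 1950)

# Least-squares line for log10(y) against the year: log10(y) = a + b * year.
b, a = np.polyfit(years, np.log10(killed), 1)
print(round(b, 4), round(a, 3))

def predict(year):
    """Back-transform the fitted line to the original scale."""
    return 10 ** (a + b * year)

# Extrapolating to 1995 gives the implausibly large value from part C.
print(f"{predict(1995):.3e}")
```

The slope is $\log_{10} 2$, which is why the back-transformed model doubles every year; the absurd 1995 prediction shows the danger of extrapolating an exponential model.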

31 p. 215 Exponential growth models become linear when we apply the logarithm transformation to the response variable y. Power law models become linear when we apply the logarithm transformation to both variables.
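
As an illustration of the power-law case, the sketch below generates hypothetical data from $y = 3x^{1.5}$ (the constants 3 and 1.5 are invented for this example) and recovers them by regressing $\log y$ on $\log x$.

```python
import numpy as np

# Hypothetical power-law data y = A * x**B with A = 3 and B = 1.5.
x = np.arange(1, 11, dtype=float)
y = 3.0 * x ** 1.5

# Taking logs of BOTH variables gives log y = log A + B * log x,
# a straight line with slope B and intercept log A.
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)

B = slope
A = 10 ** intercept
print(round(A, 4), round(B, 4))
```

For an exponential model $y = AB^x$ the same idea applies with only $y$ logged: $\log y = \log A + x \log B$.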

32 4.17 A.
Year   Value
1      537.50
2      577.81
3      621.15
4      667.73
5      717.81
6      771.65
7      829.52
8      891.74
9      958.62
10     1030.52

33 4.17 B.

34 4.17 C. 2.73, 2.76, 2.79, 2.82, 2.86, 2.89, 2.92, 2.95, 2.98, 3.01
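
The logarithms in 4.17 C can be recomputed from the values in part A. This is only a checking sketch: base-10 logarithms are assumed, and the ratio check is an addition that tests whether the values grow by a fixed factor each year.

```python
import math

# Values from 4.17 A, years 1 through 10.
values = [537.50, 577.81, 621.15, 667.73, 717.81,
          771.65, 829.52, 891.74, 958.62, 1030.52]

# Base-10 logs, rounded to two decimals as on the slide.
logs = [round(math.log10(v), 2) for v in values]
print(logs)

# Successive ratios; a constant ratio indicates exponential growth.
ratios = [later / earlier for earlier, later in zip(values, values[1:])]
print([round(r, 3) for r in ratios])
```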

35 4.18 Alice has Fred has

36 Cautions About Correlation and Regression

37 Our Tools for Describing Data Sets Correlation –r: Strength, form, direction Regression –Generalized pattern –Useful for predictions Limitations of our tools –Correlation and regression describe only linear relationships –The correlation “r” and the “LSRL” are NOT RESISTANT
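
A small sketch of what "not resistant" means in practice. The data are hypothetical: ten points lying exactly on the line y = 2x + 1, plus one invented outlier at (30, 0).

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1.
x = np.arange(1, 11, dtype=float)
y = 2 * x + 1

r_clean = np.corrcoef(x, y)[0, 1]
slope_clean, _ = np.polyfit(x, y, 1)

# Add a single invented outlier far from the pattern.
x_out = np.append(x, 30.0)
y_out = np.append(y, 0.0)

r_outlier = np.corrcoef(x_out, y_out)[0, 1]
slope_outlier, _ = np.polyfit(x_out, y_out, 1)

print(round(r_clean, 3), round(slope_clean, 3))
print(round(r_outlier, 3), round(slope_outlier, 3))
```

One extreme point is enough to change both the sign of r and the slope of the least-squares line.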

38 Other Cautions Extrapolation –The use of a regression line for prediction far outside the domain used. –Examples: Age v. Height Time v. Death Rate ( Swine Flu) Time v. Water Level of a Lake Time v. Children gunned down

39 Other Cautions Lurking Variables –A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among these variables. –Can falsely suggest relationship between x and y –Can hide actual relationship between x and y

40 Other Cautions Lurking Variables –An example….

41 There's this guy who's going to clean the windows of a mental asylum. A patient follows him and shouts, "I gotta secret, I gotta secret..." He ignores the patient. Again the patient follows him, but he ignores his cries. By the time he's nearly finished the building, he's really curious about what the patient's secret is, so he decides to ask the patient. The patient pulls a matchbox out of his pocket, opens it and puts it on a table. Out crawls this little spider. The patient says, "Spider, go left," and the spider walks to its left a bit. Then he says, "Spider, go right," and the spider walks to its right a little bit. He says, "Spider, turn around, walk forward, then go right," and sure enough the spider turns around, walks forward, and then goes right a bit. The window cleaner is amazed. "Wow!" he says, "that's amazing!" "No, that's not my secret," says the patient. "Watch." He picks up the spider in his hand, pulls all its legs off, then puts it back on the table. "Spider, go right." The spider doesn't move. "Spider, go left." The spider doesn't move. "Spider, turn around." Again the spider doesn't move. "There!" he says. "That's my secret: if you pull all a spider's legs off, they go deaf."

42 The answer is not available in the original data, but was discovered through some additional research on the Buick Estate Wagon. These data were collected by Consumer's Union on a test track (rather than using the EPA test values for fuel efficiency) following the manufacturer's recommendations for each car's maintenance. Additional research revealed that starting with this model year, Buick recommended a higher tire inflation pressure for the Buick Estate Wagon. The recommended inflation pressure level was higher than the level for other cars in the survey. Harder tires present less rolling resistance and improve gas mileage; therefore, the Buick Estate Wagon outperformed our expectations based on our regression model, which did not account for tire inflation pressure. In our model, tire pressure is a lurking variable: a variable that helps in predicting gas mileage but is not included in the model.

43 Other Cautions Using averaged data –Pay particular attention to data that has been averaged –The correlation and LSRL of these data sets should not be applied to the individuals that the averages came from Example –Examining monthly data and attempting to apply it to a day of that month.

44 Beware the post-hoc fallacy “Post hoc, ergo propter hoc.” To avoid the post-hoc fallacy (assuming that an observed correlation is due to causation), you must put any statement of relationship through sharp inspection. Causation cannot be established “after the fact.” It can only be established through well-designed experiments. {see Ch 5}

45 Explaining Association Strong associations can generally be explained by one of three relationships. Confounding: x may cause y, but y may instead be caused by a confounding variable z. Common Response: x and y are reacting to a lurking variable z. Causation: x causes y.

46 Causation Causation is not easily established. The best evidence for causation comes from experiments that change x while holding all other factors fixed. Even when direct causation is present, it is rarely a complete explanation of an association between two variables. Even well-established causal relations may not generalize to other settings.

47 Common Response “Beware the Lurking Variable” The observed association between two variables may be due to a third variable. Both x and y may be changing in response to changes in z.
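
Common response can be illustrated with a simulation. Everything here is hypothetical: z is a simulated lurking variable, and x and y each depend on z plus independent noise, with no direct link between them. The helper `residuals` is an assumption introduced to remove the linear effect of z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated lurking variable z; x and y each respond to z but not
# to each other.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# x and y are strongly correlated even though neither causes the other.
r_xy = np.corrcoef(x, y)[0, 1]

def residuals(v, z):
    """Remove the least-squares linear effect of z from v."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

# Once z is accounted for, the association essentially disappears.
r_given_z = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(round(r_xy, 2), round(r_given_z, 2))
```

In real observational data z is usually unmeasured, so this adjustment is not available, which is why the slide warns about lurking variables.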

48 Confounding Two variables are confounded when their effects on a response variable cannot be distinguished from each other. Confounding prevents us from drawing conclusions about causation. We can help reduce the chances of confounding by designing a well-controlled experiment.

49 Example People with two cars tend to live longer than people who own only one car. Owning three cars is even better, and so on. What might explain the association?

50 p. 238 4.38: People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar. Does artificial sweetener use cause weight gain? –There may be a causative effect, but in the direction opposite to the one suggested: People who are overweight are more likely to be on diets, and so choose artificial sweeteners over sugar. Also, heavier people are at a higher risk to develop diabetes; if they do, they are likely to switch to artificial sweeteners.

51 p. 238 4.39: Women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed chemicals cause the miscarriages. Another explanation may be the fact that these workers spend a lot of time on their feet. –Time standing up is a confounding variable in this case.

52 p. 239 4.41: Children who watch many hours of TV get lower grades on average than those who watch less TV. Why does this fact not show that watching TV causes low grades?

53 p. 239 4.43: High school students who take the SAT, enroll in an SAT coaching course, and take the SAT again raise their mathematics score from an average of 521 to 561. Can this increase be attributed entirely to taking the course? The effect of coaching is confounded with that of experience. A student who has taken the SAT once may improve his or her score on the second attempt because of increased familiarity with the test.

