
Published by Kayden Jenney

1
Chapter 4: More about Relationships Between Two Variables

2
4.1 – Transforming to Achieve Linearity – Exponential Growth

3
Not all data can be expressed with a linear model.

4
PROBLEM! A least-squares regression line is a poor model for nonlinear data: it describes a straight-line relationship, and the correlation r that goes with it measures only the strength of a linear relationship. SOLUTION! Transform the data into a linear set, then use least-squares regression to find the best-fitting line for the transformed data. Finally, apply the inverse (reverse) transformation to that equation to get a model for the original nonlinear data.

5
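Properties of Logarithms
1. $\log(ab) = \log a + \log b$
2. $\log\left(\frac{a}{b}\right) = \log a - \log b$
3. $\log(x^p) = p\log x$
Remember: log has a base of 10 and natural log (ln) has a base of e. These properties hold in either base, so it doesn't matter which one you use, as long as you use the same one throughout.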

6
Linearizing Exponential Functions: We want to write an exponential function of the form $y = ab^x$ (where x, y are variables and a, b are constants) as a linear model.
$y = ab^x$
$\log y = \log(ab^x)$
$\log y = \log a + \log b^x$
$\log y = \log a + x\log b$
So log y is a linear function of x, with intercept log a and slope log b.
[Graphs: (x, y) and (x, log y)]

7
CONCLUSIONS: 1. If the graph of (x, y) is exponential, then the graph of (x, log y) is linear. 2. If the graph of (x, log y) is linear, then the graph of (x, y) is exponential.

8
Example #1 Transform the exponential data to a linear model using logs and then natural logs.
$y = 5(2)^x$
Using log: $\log y = \log(5 \cdot 2^x) = \log 5 + \log 2^x = \log 5 + x\log 2$, so $\log y = 0.69897 + 0.3010x$
Using ln: $\ln y = \ln(5 \cdot 2^x) = \ln 5 + \ln 2^x = \ln 5 + x\ln 2$, so $\ln y = 1.6094 + 0.6931x$
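A quick numeric check of Example #1, written as a small Python sketch with only the standard `math` module: the intercept and slope of the transformed line are just the logs of a = 5 and b = 2, and every transformed point lands exactly on that line.

```python
import math

a, b = 5, 2  # y = 5(2)^x from Example #1

# Base-10 version: log y = log 5 + x log 2
print(f"log y = {math.log10(a):.5f} + {math.log10(b):.4f}x")  # log y = 0.69897 + 0.3010x
# Natural-log version: ln y = ln 5 + x ln 2
print(f"ln y = {math.log(a):.4f} + {math.log(b):.4f}x")       # ln y = 1.6094 + 0.6931x

# Every point (x, log y) lies on the line log y = log 5 + x log 2
for x in range(6):
    y = a * b ** x
    assert math.isclose(math.log10(y), math.log10(a) + x * math.log10(b))
```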

9
Example #2 Convert the equation back to an exponential function.
$\ln y = 16 + 9x$
Raise e to both sides: $y = e^{16 + 9x}$
$y = e^{16}e^{9x}$
$y = e^{16}\left(e^{9}\right)^x$
$y = 8{,}886{,}110.521(8103.0839)^x$

10
Example #3 Convert the equation back to an exponential function.
$\log y = 4 + 2x$
Raise 10 to both sides: $y = 10^{4 + 2x}$
$y = 10^{4} \cdot 10^{2x}$
$y = 10^{4}\left(10^{2}\right)^x$
$y = 10{,}000(100)^x$
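Examples #2 and #3 follow the same recipe: if $\log_{base} y = c_0 + c_1 x$, then $y = base^{c_0}\left(base^{c_1}\right)^x$. The sketch below (the helper name `back_transform` is hypothetical) checks both examples.

```python
import math

def back_transform(intercept, slope, base):
    """Turn log_base(y) = intercept + slope*x into y = a * b**x.

    Raising `base` to both sides gives y = base**intercept * (base**slope)**x.
    """
    a = base ** intercept
    b = base ** slope
    return a, b

# Example #2: ln y = 16 + 9x  (base e)
a2, b2 = back_transform(16, 9, math.e)
print(f"y = {a2:,.3f}({b2:.4f})^x")   # y = 8,886,110.521(8103.0839)^x

# Example #3: log y = 4 + 2x  (base 10)
a3, b3 = back_transform(4, 2, 10)
print(f"y = {a3:,.0f}({b3:.0f})^x")   # y = 10,000(100)^x
```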

11
Calculator Tip: Exponential Functions
L1: x
L2: y
L3: leave blank for now!
L4: log y
LinReg(L1, L4, Y1) fits the line to (x, log y) and stores it in Y1.
To prevent an overflow error: convert years to a smaller number (for example, years since 1900).

12
Calculator Tip: Residual Plot
After calculating the regression line, the residuals are stored in the RESID list, which you can find in the LIST menu. Plot x against RESID.

13
Calculator Tip: Exponential Equation
ExpReg(L1, L2, Y2) fits $y = ab^x$ directly to (x, y) and stores it in Y2.
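The calculator workflow (LinReg on (x, log y), the residuals of that fit, and an exponential equation) can be mirrored in a few lines of Python. This is a sketch, not the calculator's own algorithm: `least_squares` and `exp_fit` are hypothetical helper names, and nothing is assumed about how ExpReg works internally. It shows that back-transforming the LinReg line gives an equation of the form y = ab^x, and that the result is the same whether you use log or ln.

```python
import math

def least_squares(x, y):
    """Intercept, slope, and correlation r of the least-squares line (what LinReg reports)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

def exp_fit(x, y, log=math.log10, base=10):
    """Fit y = a * b**x by regressing log(y) on x, then undoing the log."""
    log_y = [log(v) for v in y]                           # the "L4: log y" list
    c0, c1, r = least_squares(x, log_y)                   # like LinReg(L1, L4)
    resid = [ly - (c0 + c1 * xi) for xi, ly in zip(x, log_y)]  # residuals for the residual plot
    return base ** c0, base ** c1, r, resid               # a = base**c0, b = base**c1

x = [0, 1, 2, 3, 4, 5]
y = [5 * 2 ** k for k in x]                               # exact data from y = 5(2)^x
a, b, r, resid = exp_fit(x, y)                            # base-10 logs
a_e, b_e, _, _ = exp_fit(x, y, log=math.log, base=math.e) # natural logs
print(round(a, 6), round(b, 6), round(a_e, 6), round(b_e, 6))  # 5.0 2.0 5.0 2.0
```

Switching from log to ln multiplies both the intercept and the slope by the same constant, and raising the matching base undoes exactly that, so a and b come out identical.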

14
Exponential to Linear Change:
1. For equally spaced x-values, the ratios of consecutive y's should be fairly constant.
2. Graph x and y and look at the pattern.
3. Calculate the transformed linear model.
4. Describe the r value and the residual plot.

15
Example #4: Consider the following data representing the Asian and Pacific Islander population.
Year                          1950   1960   1970   1980   1990   2000
Population (in thousands)     1131   1620   2320   3330   4770   6850
1. Make a scatterplot of the data and describe the graph.

16
D: Positive, as year increases, population increases F: Nonlinear S: Strong

17
2. Describe the pattern of change and find the percent of change for each y (ratio of y's). The ratios of consecutive y's are fairly consistent, about 1.43 each decade (roughly 43% growth per decade), suggesting an exponential model.
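The ratio check in step 2 is easy to script. This sketch uses the Example #4 populations; because the years are equally spaced (every 10 years), a roughly constant ratio points to exponential growth.

```python
pop = [1131, 1620, 2320, 3330, 4770, 6850]   # thousands, for 1950, 1960, ..., 2000

# Ratio of each y-value to the one before it; the x-values are 10 years apart
ratios = [later / earlier for earlier, later in zip(pop, pop[1:])]
percent_change = [100 * (q - 1) for q in ratios]

print([round(q, 3) for q in ratios])          # [1.432, 1.432, 1.435, 1.432, 1.436]
print([round(p) for p in percent_change])     # [43, 43, 44, 43, 44]
```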

18
3. Find r and describe its meaning r = 0.968 D: Positive S: Strong

19
4. Graph and comment on the residual plot for x and y. Curve, not a good linear model

20
5. Take the log of the y-values and make a new scatterplot.
Transformed data (x, log y): D: Positive F: Linear S: Strong
Original data (x, y): D: Positive F: Nonlinear S: Strong

21
6. Find the least squares regression line of the transformed data. log(Population) = 2.27095 + 0.0156432(Year), where Year is measured in years since 1900 (so 1950 is entered as 50).
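The slide does not state how the years were coded. Entering them as years since 1900 is an assumption, but it reproduces the intercept 2.27095 and slope 0.0156432. The sketch below regresses log10(population) on that coded year; `least_squares` is a hypothetical helper, not a calculator command.

```python
import math

def least_squares(x, y):
    """Intercept, slope, and correlation r of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

years = [1950, 1960, 1970, 1980, 1990, 2000]
pop = [1131, 1620, 2320, 3330, 4770, 6850]   # thousands

x = [yr - 1900 for yr in years]              # assumed coding: years since 1900
log_pop = [math.log10(p) for p in pop]
b0, b1, r = least_squares(x, log_pop)
print(f"log(Population) = {b0:.5f} + {b1:.7f}(Year),  r = {r:.6f}")
```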

22
7. Find the value of r and describe its meaning. r = 0.999999 D: Positive S: Strong
Compare step 3 (original data): r = 0.968 D: Positive S: Strong

23
8. Construct the residual plot and describe its meaning. No pattern, so good linear model
Compare step 4 (original data): curved residual plot, not a good linear model

24
9. Perform the inverse transformation to express y-hat as an exponential equation (x = years since 1900).
Raise 10 to both sides: $\hat{y} = 10^{2.27095 + 0.0156432x}$
$\hat{y} = 10^{2.27095} \cdot 10^{0.0156432x}$
$\hat{y} = 10^{2.27095}\left(10^{0.0156432}\right)^x$
$\hat{y} = 186.6162(1.0367)^x$

25
10. Check your work on your calculator using ExpReg.

26
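11. Make a prediction for the population in 2010 (x = 110) using both equations.
Log equation: $\log \hat{y} = 2.27095 + 0.0156432(110) \approx 3.9917$, so $\hat{y} = 10^{3.9917} \approx 9{,}810.6$
Exponential equation: $\hat{y} = 186.6162(1.0367)^{110} \approx 9{,}810.6$
Both values use the unrounded coefficients; the prediction is about 9,811 thousand.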
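A sketch of step 11 in Python, assuming x = years since 1900 so that 2010 is x = 110. It evaluates the log equation and the back-transformed exponential equation, then shows what happens if b is rounded to 1.0367 before it is raised to the 110th power.

```python
b0, b1 = 2.27095, 0.0156432      # log(Population) = b0 + b1 * x, x = years since 1900
x = 2010 - 1900                  # 110

log_y = b0 + b1 * x              # linear model for log y
pred_from_log = 10 ** log_y      # undo the log

a, b = 10 ** b0, 10 ** b1        # exponential model y = a * b**x (unrounded a and b)
pred_from_exp = a * b ** x

pred_rounded_b = 186.6162 * 1.0367 ** x   # same model, but with b rounded to 4 decimals

print(round(pred_from_log, 1), round(pred_from_exp, 1), round(pred_rounded_b, 1))
```

The first two predictions agree (about 9,811 thousand). The third is about 9,835 thousand: the small rounding error in b is compounded 110 times, so keep the unrounded coefficients when predicting.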

27
Example #5: Consider the following data representing an account balance over time:
x: time (months)           0      48       96      144      192      240
y: account balance ($)   100  161.22   259.93   419.06   675.62  1089.30
1. Make a scatterplot of the data and describe the graph.

28
D: Positive, as time increases, account balance increases F: Nonlinear S: Strong

29
2. Describe the pattern of change and find the percent of change for each y (ratio of y's). The ratios of consecutive y's are all about 1.612 (for example, 161.22/100 = 1.6122), so the balance grows by about 61% every 48 months, suggesting an exponential model.
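The same ratio check for Example #5, as a small Python sketch. The months are equally spaced (48 apart), and taking the 48th root of each ratio converts it into a growth factor per month.

```python
balance = [100, 161.22, 259.93, 419.06, 675.62, 1089.30]   # at 0, 48, ..., 240 months

ratios = [later / earlier for earlier, later in zip(balance, balance[1:])]
# A constant ratio over each 48-month step implies a constant monthly growth factor
monthly_factor = [q ** (1 / 48) for q in ratios]

print([round(q, 4) for q in ratios])
print([round(m, 4) for m in monthly_factor])   # [1.01, 1.01, 1.01, 1.01, 1.01]
```

Each 48-month ratio works out to about 1% growth per month.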

30
3. Find r and describe its meaning r = 0.9481 D: Positive S: Strong

31
4. Graph and comment on the residual plot for x and y. Curved, not good linear model

32
5. Take the natural log of the y-values and make a new scatterplot.
Original data (x, y): D: Positive F: Nonlinear S: Strong
Transformed data (x, ln y): D: Positive F: Linear S: Strong

33
6. Find the least squares regression line of the transformed data. ln(Account Balance) = 4.60516 + 0.00995047(Months)
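A sketch of step 6 in Python: regress ln(balance) on months. `least_squares` is a hypothetical helper that plays the role of LinReg; with the data above it should reproduce the slide's coefficients.

```python
import math

def least_squares(x, y):
    """Intercept, slope, and correlation r of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

months = [0, 48, 96, 144, 192, 240]
balance = [100, 161.22, 259.93, 419.06, 675.62, 1089.30]

ln_balance = [math.log(v) for v in balance]    # natural log this time
b0, b1, r = least_squares(months, ln_balance)
print(f"ln(Account Balance) = {b0:.5f} + {b1:.8f}(Months),  r = {r:.6f}")
```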

34
7. Find r and describe its meaning. r = 0.999999 D: Positive S: Strong
Compare step 3 (original data): r = 0.9481 D: Positive S: Strong

35
8. Construct the residual plot and describe its meaning. No pattern, so good linear model
Compare step 4 (original data): curved, not a good linear model

36
9. Perform the inverse transformation to express y-hat as an exponential equation.
Raise e to both sides: $\hat{y} = e^{4.60516 + 0.00995047x}$
$\hat{y} = e^{4.60516}e^{0.00995047x}$
$\hat{y} = e^{4.60516}\left(e^{0.00995047}\right)^x$
$\hat{y} = 99.9988(1.01)^x$

37
10. Check your work on your calculator using ExpReg.

38
11. Make a prediction for the account balance in 60 months using both equations.
Log equation: $\ln \hat{y} = 4.60516 + 0.00995047(60) \approx 5.20219$, so $\hat{y} = e^{5.20219} \approx \$181.67$
Exponential equation: $\hat{y} = 99.9988(1.01)^{60} \approx \$181.67$
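Steps 9 and 11 for Example #5 in one short Python sketch: undo the natural log to get a and b, then predict the balance at 60 months from both forms of the model.

```python
import math

b0, b1 = 4.60516, 0.00995047          # ln(balance) = b0 + b1 * months
x = 60

ln_y = b0 + b1 * x
pred_from_ln = math.exp(ln_y)          # undo the natural log

a, b = math.exp(b0), math.exp(b1)      # y = a * b**x with a close to 100 and b close to 1.01
pred_from_exp = a * b ** x

print(f"${pred_from_ln:.2f}  ${pred_from_exp:.2f}")   # $181.67  $181.67
```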

39
4.1 – Transforming to Achieve Linearity – Power Model

40
A power model has the form $y = ax^p$. To transform this equation into a linear model you must apply the log transformation to both variables x and y.
$y = ax^p$
$\log y = \log(ax^p)$
$\log y = \log a + \log x^p$
$\log y = \log a + p\log x$
How is this different from exponential functions? You have to take the log of both x and y to make a linear model.

41
Example #6 Find the LSRL by taking the logs and then the natural logs.
$y = 4x^5$
Using log: $\log y = \log(4x^5) = \log 4 + \log x^5 = \log 4 + 5\log x$, so $\log y = 0.6021 + 5\log x$
Using ln: $\ln y = \ln(4x^5) = \ln 4 + \ln x^5 = \ln 4 + 5\ln x$, so $\ln y = 1.3863 + 5\ln x$

42
Example #7 Convert the equation back to a power equation.
$\ln y = -5 + 9\ln x$
Raise e to both sides: $y = e^{-5 + 9\ln x}$
$y = e^{-5}e^{9\ln x}$
$y = e^{-5}\left(e^{\ln x}\right)^9$
$y = 0.0067x^9$

43
Example #8 Convert the equation back to a power equation.
$\log y = 0.5 + 2\log x$
Raise 10 to both sides: $y = 10^{0.5 + 2\log x}$
$y = 10^{0.5} \cdot 10^{2\log x}$
$y = 10^{0.5}\left(10^{\log x}\right)^2$
$y = 3.1623x^2$
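For a power model the inverse transformation is even simpler: the intercept becomes $a = base^{c_0}$ and the slope stays as the power p. A sketch with a hypothetical helper, checked against Examples #7 and #8:

```python
import math

def power_back_transform(intercept, slope, base):
    """Turn log_base(y) = intercept + slope*log_base(x) into y = a * x**p."""
    return base ** intercept, slope   # a = base**intercept, p = slope

a7, p7 = power_back_transform(-5, 9, math.e)   # Example #7: ln y = -5 + 9 ln x
a8, p8 = power_back_transform(0.5, 2, 10)      # Example #8: log y = 0.5 + 2 log x
print(f"y = {a7:.4f}x^{p7}")   # y = 0.0067x^9
print(f"y = {a8:.4f}x^{p8}")   # y = 3.1623x^2

# Spot-check: the log form and the power form give the same y at x = 3
x = 3
assert math.isclose(math.exp(-5 + 9 * math.log(x)), a7 * x ** p7)
```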

44
Calculator Tip: Power Functions
L1: x
L2: y
L3: log x
L4: log y
LinReg(L3, L4, Y1) fits the line to (log x, log y) and stores it in Y1.

45
Calculator Tip: Power Equation
PwrReg(L1, L2, Y2) fits $y = ax^p$ directly to (x, y) and stores it in Y2.
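A Python sketch of the power-function workflow: build the log x and log y lists, run a least-squares fit on them (the LinReg(L3, L4) step), then undo the logs. `least_squares` and `power_fit` are hypothetical names, and this does not claim to be how PwrReg is computed internally. The usage line checks it on exact data from Example #6.

```python
import math

def least_squares(x, y):
    """Intercept, slope, and correlation r of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

def power_fit(x, y):
    """Fit y = a * x**p by regressing log y on log x, then undoing the logs."""
    log_x = [math.log10(v) for v in x]     # L3: log x
    log_y = [math.log10(v) for v in y]     # L4: log y
    c0, p, r = least_squares(log_x, log_y)
    return 10 ** c0, p, r                  # a = 10**intercept; the slope is the power p

x = [1, 2, 3, 4, 5]
y = [4 * k ** 5 for k in x]               # exact data from y = 4x^5 (Example #6)
a, p, r = power_fit(x, y)
print(round(a, 6), round(p, 6))           # 4.0 5.0
```

Both x and y must be positive for the logs to exist.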

46
Example #9 The distances from our sun and the periods of the 9 planets in the solar system are given below.
Distance (astronomical units)   0.39   0.72   1   1.5   5.2   9.5   19    30    40
Period (earth years)            0.24   0.62   1   1.9   12    29    84   160   250
1. Make a scatterplot of the data and describe the graph.

47
D: Positive, as distance increases, period increases F: Nonlinear S: Strong

48
2. Describe the pattern of change and find the percent of change for each y (ratio of y's). The ratios of the y's are not similar, so the data are perhaps not exponential.

49
3. Find r and describe its meaning r = 0.9779 D: Positive S: Strong

50
4. Graph an exponential model and discuss if it is appropriate to use this model. Curved, not good linear model

51
5. Transform the data to a linear model by taking the log of the x's and the y's. Make a sketch of the new scatterplot.
Transformed data (log x, log y): D: Positive F: Linear S: Strong
Compare step 1 (original data): D: Positive F: Nonlinear S: Strong

52
6. Find the least squares regression line of the transformed data. log(Period) = 0.002916 + 1.49627log(Distance)
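Step 6 for the planet data as a Python sketch: take log10 of both variables and fit the least-squares line. `least_squares` is a hypothetical helper; with the table above it should give a slope close to 1.5.

```python
import math

def least_squares(x, y):
    """Intercept, slope, and correlation r of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

distance = [0.39, 0.72, 1, 1.5, 5.2, 9.5, 19, 30, 40]   # astronomical units
period = [0.24, 0.62, 1, 1.9, 12, 29, 84, 160, 250]     # earth years

log_d = [math.log10(d) for d in distance]
log_p = [math.log10(p) for p in period]
b0, b1, r = least_squares(log_d, log_p)
print(f"log(Period) = {b0:.6f} + {b1:.5f}log(Distance),  r = {r:.7f}")
```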

53
7. Find the value of r and describe its meaning. r = 0.9999765 D: Positive S: Strong
Compare step 3 (original data): r = 0.9779 D: Positive S: Strong

54
8. Construct the residual plot and describe its meaning. No pattern, so good linear model

55
9. Perform the inverse transformation to express y-hat as a power equation.
Raise 10 to both sides: $\hat{y} = 10^{0.002916 + 1.49627\log x}$
$\hat{y} = 10^{0.002916} \cdot 10^{1.49627\log x}$
$\hat{y} = 10^{0.002916}\left(10^{\log x}\right)^{1.49627}$
$\hat{y} = 1.0067x^{1.49627}$

56
10. Check your work on your calculator using PwrReg.

57
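11. If a planet were discovered 35 astronomical units from our sun, predict its period using both equations.
Log equation: $\log \hat{y} = 0.002916 + 1.49627\log(35) \approx 2.31326$, so $\hat{y} = 10^{2.31326} \approx 205.71$
Power equation: $\hat{y} = 1.0067(35)^{1.49627} \approx 205.709$
The predicted period is about 205.7 earth years.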
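A sketch of the prediction for a planet 35 AU from the sun, computed from both the log equation and the back-transformed power equation:

```python
import math

b0, p = 0.002916, 1.49627          # log(Period) = b0 + p * log(Distance)
x = 35                             # astronomical units

log_y = b0 + p * math.log10(x)     # linear model for log y
pred_from_log = 10 ** log_y        # undo the log

a = 10 ** b0                       # power model y = a * x**p, a is about 1.0067
pred_from_power = a * x ** p

print(round(pred_from_log, 2), round(pred_from_power, 2))
```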

58
How do you determine if the model is exponential or power?
1. Graph the original data. Do you see a curve?
2. Look at the ratios of the y values to see whether the data may be exponential.
3. Take the logs of both x and y. Then graph (x, log y) and (log x, log y). Which graph looks more linear?
4. Use the r value and the residual plot to determine the strength of the linear relationship.
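Steps 3 and 4 can be sketched in Python by computing r for both transformations of the same data. Here the planet data from Example #9 is reused; the `correlation` helper is a hypothetical name. A higher r alone is not the whole story, so the residual plot should still be checked.

```python
import math

def correlation(x, y):
    """Pearson correlation r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

distance = [0.39, 0.72, 1, 1.5, 5.2, 9.5, 19, 30, 40]
period = [0.24, 0.62, 1, 1.9, 12, 29, 84, 160, 250]

log_d = [math.log10(d) for d in distance]
log_p = [math.log10(p) for p in period]

r_exponential = correlation(distance, log_p)   # (x, log y): tests an exponential model
r_power = correlation(log_d, log_p)            # (log x, log y): tests a power model
print(round(r_exponential, 4), round(r_power, 6))
```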

59
Example #10 An experiment was conducted to determine the effect of practice time (in seconds) on the percent of unfamiliar words recalled. Here is a Fathom scatterplot of the results with a least-squares regression line superimposed. (a) Sketch a residual plot below.

60
(b) Does a linear model fit the data well? Justify your answer. No, the residual plot has a curved pattern, so the relationship is not linear.

61
We used Fathom to transform the original data in hopes of achieving linearity. The screen shots below show the results of two different transformations. (c) Would an exponential model or a power model fit the original data better? Justify your answer. Power. The power transformation gives the stronger r value, and its residual plot is less curved.

62
(d) Use the model you chose in (c) to predict word recall for 25 seconds of practice. Show your method.

63
Example #11 Foresters are interested in predicting the amount of usable lumber they can harvest from various tree species. The following data have been collected on the diameter of Ponderosa pine trees, measured at chest height, and the yield in board feet. Note that a board foot is defined as a piece of lumber 12 inches by 12 inches by 1 inch. Determine if an exponential or power model would make a better model. Support your reasoning. Using the model you have chosen, predict the yield in board feet from a diameter of 40.

64
Diameter   36   28   28   41   19   32   22   38   25   17
Bd Feet   192  113   88  294   28  123   51  252   56   16
Diameter   31   20   25   19   39   33   17   37   23   39
Bd Feet   141   32   86   21  231  187   22  205   57  265
1. Graph the original data. Do you see a curve? Yes.

65
2. Look at the ratios of the y values to see whether the data may be exponential. No, the ratios are not roughly constant.

66
3. Take the logs of both x and y. Then graph (x, log y) and (log x, log y). Which graph looks more linear? Power is more linear

67
4. Use the r value and the residual plot to determine the strength of the linear relationship. Exponential (x, log y): r = 0.9751. Power (log x, log y): r = 0.9880. The power transformation has the stronger r value and no curve in its residual plot, so the power model is the better choice.

68
Using the model you have chosen, predict the yield in board feet from a diameter of 40.

69
4.2 – Relationship between Categorical Variables

70
http://www.ruf.rice.edu/~lane/stat_sim/transformations/index.html

71
Because we cannot do arithmetic directly on categorical data, we use the counts or percents of individuals in each category.
Two-Way Table: Classifies categorical data according to two variables.
Marginal Distribution: The distribution of one variable alone, found from the totals in the margins of the table (the row totals or the column totals).
Conditional Distribution: The distribution of one variable among only the individuals in a given category of the other variable.

72
Segmented bar graph: The following segmented bar graph represents the conditional distributions of living arrangements for each race category. Each bar shows one conditional distribution, so its segments add up to 100%.

73
Example #12 In a national survey of adult Americans in 1998, people were asked to indicate their age and to classify their interest in politics as very much, somewhat, or not much. The ages were grouped in ranges.
a. Calculate the row and column totals.
               18-35   36-55   56-94   Total
Not Much        146     146      89     381
Somewhat        192     260     154     606
Very Much        47     125     106     278
Total           385     531     349    1265

74
b. What proportion of the survey respondents were between ages 18 and 35? 385/1265 = 0.3043

75
c. What proportion of the survey respondents were between ages 36 and 55? 531/1265 = 0.41976

76
d. What proportion of the survey respondents were between ages 56 and 94? 349/1265 = 0.27589

77
e. Restrict your attention (for the moment) to just the respondents aged 18 to 35. What proportion of these young respondents classify themselves as having not much interest in politics? 146/385 = 0.3792

78
f. What proportion of the young respondents classify themselves as somewhat interested in politics? 192/385 = 0.4987

79
g. What proportion of the young respondents classify themselves as having very much interest in politics? 47/385 = 0.1221

80
h. Record the conditional distribution that you have just calculated in the table below.
               18-35    36-55    56-94
Not Much       0.3792   0.2750   0.2550
Somewhat       0.4987   0.4896   0.4413
Very Much      0.1221   0.2354   0.3037
Total          1.000    1.000    1.000
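The marginal and conditional distributions for Example #12 can be computed directly from the counts. In this Python sketch the 36-55 "Not Much" count is 146, the value implied by the row and column totals; the dictionary layout is just one convenient choice.

```python
counts = {
    "Not Much":  {"18-35": 146, "36-55": 146, "56-94": 89},
    "Somewhat":  {"18-35": 192, "36-55": 260, "56-94": 154},
    "Very Much": {"18-35": 47,  "36-55": 125, "56-94": 106},
}
ages = ["18-35", "36-55", "56-94"]

row_totals = {interest: sum(row.values()) for interest, row in counts.items()}
col_totals = {age: sum(counts[interest][age] for interest in counts) for age in ages}
grand = sum(row_totals.values())

# Marginal distribution of age: column totals divided by the grand total
marginal_age = {age: col_totals[age] / grand for age in ages}

# Conditional distribution of interest within each age group (each column sums to 1)
conditional = {age: {interest: counts[interest][age] / col_totals[age] for interest in counts}
               for age in ages}

for age in ages:
    print(age, {k: round(v, 4) for k, v in conditional[age].items()})
```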

81
i. Construct a segmented bar graph

82
Example #13 The University of CA at Berkeley was charged with having discriminated against women in their graduate admissions process for the fall quarter of 1973. The table below identifies the number of acceptances and denials for both men and women applicants in each of the six largest graduate programs at the institution at that time.
             Men Accepted   Men Denied   Women Accepted   Women Denied
Program A        511           314             89               19
Program B        352           208             17                8
Program C        120           205            202              391
Program D        137           270            132              243
Program E         53           138             95              298
Program F         22           351             24              317
Total           1195          1486            559             1276

83
a. Start by ignoring the program distinction, collapsing the data into a two-way table of gender by admissions status. To do this, find the total number of men accepted and denied and the total number of women accepted and denied. Fill in the table below:
          Accepted   Denied   Total
Men         1195      1486     2681
Women        559      1276     1835
Total       1754      2762     4516

84
b. Consider for the moment just the men applicants. Of the men who applied to one of these programs, what proportion were accepted? Now consider the women applicants; what proportion of them were accepted? Do these proportions seem to support the claim that men were given preferential treatment in admissions decisions?
Men: 1195/2681 = 0.4457. Women: 559/1835 = 0.3046. At first glance, yes: men were accepted at a noticeably higher rate than women.

85
c. To try to isolate the program responsible for the alleged mistreatment of women applicants, calculate the proportion of men and the proportion of women within each program who were accepted (the number accepted divided by the number who applied to that program). Record your results in the table below:
             Proportion of men accepted   Proportion of women accepted
Program A    511/825 = 0.6194             89/108 = 0.8241
Program B    352/560 = 0.6286             17/25 = 0.6800
Program C    120/325 = 0.3692             202/593 = 0.3406
Program D    137/407 = 0.3366             132/375 = 0.3520
Program E    53/191 = 0.2775              95/393 = 0.2417
Program F    22/373 = 0.0590              24/341 = 0.0704

86
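d. Does it seem as if any program is responsible for the large discrepancy between men and women in the overall proportions admitted? No. Within each program the acceptance rates for men and women are similar, and women had the higher rate in programs A, B, D, and F. The overall gap appears because most women applied to programs C through F, which accepted a much smaller proportion of all their applicants, while most men applied to programs A and B, which accepted most of their applicants. This is an example of Simpson's paradox.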
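A Python sketch of parts (a) through (c): the within-program acceptance rates use each program's own applicants as the denominator, and the collapsed totals give the overall rates. The tuple layout is an arbitrary choice for this example.

```python
# (men accepted, men denied, women accepted, women denied) for each program
programs = {
    "A": (511, 314, 89, 19),
    "B": (352, 208, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (137, 270, 132, 243),
    "E": (53, 138, 95, 298),
    "F": (22, 351, 24, 317),
}

rates = {}
for name, (ma, md, wa, wd) in programs.items():
    rates[name] = (ma / (ma + md), wa / (wa + wd))   # acceptance rate within this program
    print(f"Program {name}: men {rates[name][0]:.4f}, women {rates[name][1]:.4f}")

# Collapsing over programs gives the overall rates instead
men_acc = sum(v[0] for v in programs.values())
men_all = sum(v[0] + v[1] for v in programs.values())
women_acc = sum(v[2] for v in programs.values())
women_all = sum(v[2] + v[3] for v in programs.values())
print(f"Overall: men {men_acc / men_all:.4f}, women {women_acc / women_all:.4f}")
```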

87
Example #14: The following two-way table classifies hypothetical hospital patients according to the hospital that treated them and whether they survived or died:
              Survived   Died   Total
Hospital A       800      200    1000
Hospital B       900      100    1000
a. Calculate the proportion of hospital A's patients who survived and the proportion of hospital B's patients who survived. Which hospital saved the higher percentage of its patients?
Hospital A: 800/1000 = 0.80. Hospital B: 900/1000 = 0.90. Hospital B saved the higher percentage.

88
Suppose that when we further categorize each patient according to whether they were in fair condition or poor condition prior to treatment we obtain the following two-way tables:
FAIR CONDITION   Survived   Died   Total
Hospital A          590       10     600
Hospital B          870       30     900
POOR CONDITION   Survived   Died   Total
Hospital A          210      190     400
Hospital B           30       70     100

89
b. Among those who were in fair condition, compare the recovery rates for the two hospitals. Which hospital saved the greater percentage of its patients who had been in fair condition? Hospital A: 590/600 = 0.9833. Hospital B: 870/900 = 0.9667. Hospital A saved the greater percentage.

90
c. Among those who were in poor condition, compare the recovery rates for the two hospitals. Which hospital saved the greater percentage of its patients who had been in poor condition? Hospital A: 210/400 = 0.525. Hospital B: 30/100 = 0.3. Hospital A saved the greater percentage.

91
Simpson’s Paradox: A comparison or association that holds within every group can reverse direction when the groups are combined into a single group.
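The hospital data make a compact demonstration of Simpson's paradox. This Python sketch (the helper name `survival_rate` is hypothetical) computes the survival rate within each condition and then for all patients combined.

```python
# (survived, died) counts by hospital and condition
counts = {
    "A": {"fair": (590, 10), "poor": (210, 190)},
    "B": {"fair": (870, 30), "poor": (30, 70)},
}

def survival_rate(pairs):
    """Proportion surviving among all the (survived, died) pairs given."""
    survived = sum(s for s, _ in pairs)
    total = sum(s + d for s, d in pairs)
    return survived / total

for condition in ("fair", "poor"):
    ra = survival_rate([counts["A"][condition]])
    rb = survival_rate([counts["B"][condition]])
    print(f"{condition}: A {ra:.4f}, B {rb:.4f}")   # A is higher in each group

overall_a = survival_rate(counts["A"].values())
overall_b = survival_rate(counts["B"].values())
print(f"overall: A {overall_a:.2f}, B {overall_b:.2f}")   # overall: A 0.80, B 0.90
```

The reversal comes from the group sizes. Hospital A treated 400 poor-condition patients compared with Hospital B's 100, and poor-condition patients survive far less often, which pulls A's combined rate down.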

92
d. Write a few sentences explaining (arguing from the data given) how it happens that hospital B has the higher recovery rate overall, yet hospital A has the higher recovery rate for each type of patient.
e. Which hospital would you rather go to if you were ill? Explain.

93
4.3 – Establishing Causation

94
The most convincing way to establish causation is to conduct an experiment. What if you can't do an experiment? Look for:
1. A strong association.
2. A consistent association (seen in many different studies).
3. Larger values of the explanatory variable associated with stronger responses.
4. A plausible cause.

95
Causation: x causes y.

96
Common response: it seems x causes y, but a lurking variable z affects both x and y, making it look like x causes y. "z is common to both"

97
Confounding: it seems x causes y, but z also has an effect on y, so the effect of x on y cannot be separated from the effect of z.

98
Example #15 A soccer coach wanted to improve the team's playing ability, so he had them run two miles a day. At the same time the players decided to take vitamins. In two weeks the team was playing noticeably better, but the coach and players did not know whether it was from the running or the vitamins. What type of variable is this? Confounding: the effects of running and of the vitamins on the team's improved ability cannot be separated.
[Diagram: Running → Improved team ability ← Vitamins]

99
Example #16 An article that appeared in the San Luis Obispo Tribune (November 11, 1999) was titled "Study Points Out Dangerous Side to SUV Popularity: Half of All 1996 Ejection Deaths Occur in SUVs." This article states that SUVs have a much higher rate of passengers being thrown from a window during an accident than cars do. The article also states that more than half of all deaths caused by ejection involved SUVs, which is the basis for the conclusion that SUVs are more dangerous than cars. Later in the article, there is a comment that about 98% of those injured or killed in ejection accidents were not wearing seat belts. Comment on the conclusion that SUVs are more dangerous than cars. Seat-belt use is a confounding variable: almost all of the victims were unbelted, so the effect of driving an SUV cannot be separated from the effect of not wearing a seat belt.

100
[Diagram: SUV, Rollover, Seat belts]

101
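Example #17 A study showed that households with more TV sets tend to have longer life expectancies. Describe a possible common response relationship. COMMON RESPONSE! More money (household wealth) affects both the number of TV sets and life expectancy.
[Diagram: More $ → More TVs; More $ → Longer life expectancy]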

102
Example #18 Based on a survey conducted on the DietSmart.com website, investigators concluded that women who regularly watched Oprah were only one-seventh as likely to crave fattening foods as those who watched other daytime talk shows. Is it reasonable to conclude that watching Oprah causes a decrease in cravings for fattening foods? Explain. No. This was a survey, not an experiment, so it cannot establish causation: women who choose to watch Oprah may differ in other ways from women who watch other shows.
