Chapter 4 Review: More About Relationships Between Two Variables
Group Members: Qianya Meng, Nikta Kheiri, Min Kim
1st period, 12/14/11
The Big Idea
Vocabulary You Need to Know
Transforming (or re-expressing) the data means applying a function such as the logarithm or square root to a quantitative variable.
Log rules:
1) log_b(mn) = log_b(m) + log_b(n)
2) log_b(m/n) = log_b(m) - log_b(n)
3) log_b(m^n) = n · log_b(m)
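The three log rules (product, quotient, and the power rule log_b(m^n) = n · log_b(m)) can be checked numerically. The sketch below is only a sanity check; the base and the values of m and n are arbitrary choices, not values from the chapter.

```python
import math

def log_b(x, b):
    """Logarithm of x in base b, via the change-of-base formula."""
    return math.log(x) / math.log(b)

# Arbitrary positive values chosen for illustration (assumption, not chapter data).
b, m, n = 10.0, 8.0, 3.0

checks = {
    "product":  (log_b(m * n, b), log_b(m, b) + log_b(n, b)),
    "quotient": (log_b(m / n, b), log_b(m, b) - log_b(n, b)),
    "power":    (log_b(m ** n, b), n * log_b(m, b)),
}

for name, (lhs, rhs) in checks.items():
    print(f"{name}: {lhs:.6f} = {rhs:.6f}")
```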
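Vocabulary
Linear growth increases by a fixed amount in each equal time period.
Exponential growth increases by a fixed percentage of the previous total in each equal time period.
Exponential growth model: log y = log a + (log b)x, so predicted y = ab^x
Power law model: log y = log a + p log x, so predicted y = ax^p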
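One way to tell the two kinds of growth apart is to look at consecutive values: linear growth has constant differences, while exponential growth has constant ratios. This is a minimal sketch with a made-up starting value and made-up growth settings.

```python
# Hypothetical starting value and growth settings, chosen only for illustration.
start = 100.0
fixed_amount = 10.0   # linear growth: add 10 each period
fixed_ratio = 1.10    # exponential growth: multiply by 1.10 each period

linear = [start + fixed_amount * t for t in range(6)]
exponential = [start * fixed_ratio ** t for t in range(6)]

# Differences of consecutive terms are constant for linear growth.
linear_differences = [later - earlier for earlier, later in zip(linear, linear[1:])]
# Ratios of consecutive terms are constant for exponential growth.
exponential_ratios = [later / earlier for earlier, later in zip(exponential, exponential[1:])]

print("linear differences:", [round(d, 2) for d in linear_differences])
print("exponential ratios:", [round(r, 2) for r in exponential_ratios])
```

The same ratio check is what Q1 part b) asks for with the light-intensity data.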
Vocabulary
A two-way table describes two categorical variables.
Marginal distributions are the distributions of the row variable alone and of the column variable alone, read from the row and column totals.
Conditional distributions: the distribution of the column variable given each value of the row variable, or of the row variable given each value of the column variable.
Simpson’s paradox: an association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group.
Vocabulary
Causation: Changes in x cause changes in y.
Common response: Changes in both x and y are caused by changes in a lurking variable z.
Confounding: The effect (if any) of x on y is confounded with the effect of a lurking variable z.
Key Topics Covered in this Chapter
Modeling nonlinear data
Relations in categorical data
Establishing causation
Formulas You Should Know
Exponential growth model: log y = log a + (log b)x; predicted y = ab^x
Power law model: log y = log a + p log x; predicted y = ax^p
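Both linear forms follow from taking the logarithm of the model and applying the log rules. A short derivation, writing \hat{y} for predicted y (notation assumed here, not used on the slides):

```latex
\begin{align*}
\hat{y} = a\,b^{x}
  &\;\Longrightarrow\; \log \hat{y} = \log a + \log\!\left(b^{x}\right) = \log a + (\log b)\,x \\
\hat{y} = a\,x^{p}
  &\;\Longrightarrow\; \log \hat{y} = \log a + \log\!\left(x^{p}\right) = \log a + p\,\log x
\end{align*}
```

So an exponential model is linear in (x, log y), and a power law model is linear in (log x, log y).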
Calculator Key Strokes: Exponential growth modeling
1) Enter the explanatory data into L1 and the response data into L2.
2) Draw the scatterplot of y versus x.
3) Define L3 as the logarithm (common or natural) of L2, then make a scatterplot of L3 versus L1.
4) Perform the least-squares regression on the transformed data and store the equation in Y1.
5) Draw the scatterplot of the transformed data.
6) Plot the residuals versus L1.
7) With the regression equation in Y1, define Y2 = e^(Y1) if you used ln, or Y2 = 10^(Y1) if you used log.
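For readers without the calculator at hand, the same exponential-modeling steps can be sketched in Python. This is an illustration only: the data are made up (roughly y = 50 · 1.5^x), the natural logarithm is used, and `least_squares` and `predict` are helper names introduced here.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line ys = intercept + slope * xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data (assumption, not from the chapter).
x_data = [0, 1, 2, 3, 4, 5]                          # plays the role of L1
y_data = [50.0, 76.0, 111.0, 170.0, 252.0, 381.0]    # plays the role of L2

ln_y = [math.log(y) for y in y_data]                 # L3 = ln(L2)
intercept, slope = least_squares(x_data, ln_y)       # regression of L3 on L1 (Y1)

# Residuals on the transformed scale; plot them against x to look for a pattern.
residuals = [ly - (intercept + slope * x) for x, ly in zip(x_data, ln_y)]

def predict(x):
    """Inverse transformation: Y2 = e^(Y1)."""
    return math.exp(intercept + slope * x)

a, b = math.exp(intercept), math.exp(slope)          # predicted y = a * b^x
print(f"predicted y = {a:.2f} * {b:.3f}^x")
print(f"prediction at x = 6: {predict(6):.1f}")
```

Because the regression is fit on ln y, the intercept and slope must be exponentiated to recover a and b in predicted y = ab^x.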
Calculator Key Strokes: Power law modeling
1) Enter the explanatory data into L1 and the response data into L2.
2) Draw the scatterplot of y versus x.
3) Define L3 as the logarithm (common or natural) of L1 and L4 as the same logarithm of L2.
4) Plot L4 versus L3.
5) Calculate the regression equation for the transformed data and store it in Y1.
6) Construct a residual plot.
7) Define Y2 as (10^a)(x^b) if you used log, or (e^a)(x^b) if you used ln, where a is the intercept and b is the slope of the regression line.
8) Plot Y2 and the scatterplot of the original data together.
9) To make a prediction for the value x = k, evaluate Y2(k) on the home screen.
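The power law steps translate to Python in much the same way, except that both variables are transformed. Again this is only a sketch under stated assumptions: the data are made up (roughly y = 2 · x^3), the natural logarithm is used, and the helper names are introduced here.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line ys = intercept + slope * xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data (assumption, not from the chapter).
x_data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]              # plays the role of L1
y_data = [2.1, 15.8, 54.5, 127.0, 251.0, 430.0]      # plays the role of L2

ln_x = [math.log(x) for x in x_data]                 # L3 = ln(L1)
ln_y = [math.log(y) for y in y_data]                 # L4 = ln(L2)
intercept, slope = least_squares(ln_x, ln_y)         # regression of L4 on L3 (Y1)

coefficient = math.exp(intercept)                    # e^a in the calculator notation
power = slope                                        # b in the calculator notation

def predict(x):
    """Y2 = (e^a)(x^b), the model on the original scale."""
    return coefficient * x ** power

print(f"predicted y = {coefficient:.2f} * x^{power:.2f}")
print(f"prediction at x = 7: {predict(7.0):.1f}")
```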
Helpful Hints
When the explanatory variable is years, transform the data to “years since” so that the values are smaller and don’t create overflow problems when you perform the inverse transformation.
If there is a clear explanatory/response relationship, compare the conditional distributions of the response variable for the separate values of the explanatory variable.
Even when direct causation is present, it is rarely a complete explanation of an association between two variables.
Q1
Some college students collected data on the intensity of light at various depths in a lake. Here are their data:
Depth (m)   Light intensity (lumens)
a) Make a scatterplot suitable for predicting light intensity from depth. Describe the form of the relationship.
b) To verify that the decrease in light intensity follows an exponential model, calculate the ratio of light intensity at consecutive depths. Start with /168.00 = . What do you conclude?
c) Take the natural logarithm (ln) of the light intensity measurements and plot these values against the corresponding depths. Does this transformation achieve linearity?
d) Calculate the least-squares regression equation for the transformed data. Interpret the slope and y intercept of this equation in this setting.
e) Construct and interpret a residual plot.
f) Perform the inverse transformation to express light intensity as an exponential function of depth in the lake. Display a scatterplot of the original data with the exponential model superimposed. Is your exponential function a satisfactory model for the data?
g) Use your model to predict the light intensity at a depth of 22 meters. The actual light intensity reading at that depth was 0.58 lumens. Does this surprise you?
Answer Q2
(a) The relationship is curved, strong, and positive.
(b) If x = time and y = distance, predicted y = x^2
(c) r^2 = , and the residual plot shows random scatter and fairly small residuals, so this looks like an appropriate model.
(d) Yes. Square root of the predicted y = x
(e) r^2 = , and the residual plot shows no pattern, which suggests a good model.
(f) Using the model from (b): cm. Using the model from (d): cm.
Q3
Here are data from eight schools on smoking among students and among their parents.
a) How many students are described in the two-way table?
b) What percent of these students smoke?
c) Give the marginal distribution of parents’ smoking behavior, both in counts and in percents.
d) Calculate three conditional distributions of students’ smoking behavior: one for each of the three parental smoking categories. Describe the relationship between the smoking behaviors of students and their parents in a few sentences.
                          Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke    1168                    1823                1380
Student smokes            188                     416                 400
Answer Q3
A) 5375 students
B) 18.7%
C) Both parents smoke: 1780, 33.1%. One parent smokes: 2239, 41.7%. Neither parent smokes: 1356, 25.2%.
D) Student smokes, given both parents smoke: 400/(400 + 1380) = 0.225
Student doesn’t smoke, given both parents smoke: 1380/(400 + 1380) = 0.775
Student smokes, given one parent smokes: 416/(416 + 1823) = 0.186
Student doesn’t smoke, given one parent smokes: 1823/(416 + 1823) = 0.814
Student smokes, given neither parent smokes: 188/(188 + 1168) = 0.139
Student doesn’t smoke, given neither parent smokes: 1168/(188 + 1168) = 0.861
Students are more likely to smoke when one or more of their parents smoke: the proportion of student smokers rises from 0.139 (neither parent) to 0.186 (one parent) to 0.225 (both parents).
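The arithmetic in this answer can be reproduced in a few lines of Python. The counts are the ones quoted in the answer; the dictionary layout and variable names are choices made for this sketch.

```python
# Counts from the answer: (student smokes, student does not smoke) per parental category.
counts = {
    "both parents smoke": (400, 1380),
    "one parent smokes": (416, 1823),
    "neither parent smokes": (188, 1168),
}

total = sum(smokes + not_smokes for smokes, not_smokes in counts.values())
student_smokers = sum(smokes for smokes, _ in counts.values())
print(f"students: {total}, percent who smoke: {100 * student_smokers / total:.1f}%")

# Marginal distribution of parents' smoking behavior (category totals).
marginal = {group: smokes + not_smokes for group, (smokes, not_smokes) in counts.items()}
for group, count in marginal.items():
    print(f"{group}: {count} ({100 * count / total:.1f}%)")

# Conditional distribution of student smoking, given each parental category.
conditional = {group: smokes / (smokes + not_smokes)
               for group, (smokes, not_smokes) in counts.items()}
for group, proportion in conditional.items():
    print(f"P(student smokes | {group}) = {proportion:.3f}")
```

Comparing the three conditional proportions, rather than the raw counts, is what shows the association, because the three parental categories have different sizes.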
Q4
Whether a convicted murderer gets the death penalty seems to be influenced by the race of the victim. Here are data on 326 cases in which the defendant was convicted of murder:
a) Use these data to make a two-way table of defendant’s race vs. death penalty.
b) Show that Simpson’s paradox holds: a higher percent of white defendants are sentenced to death overall, but for both black and white victims a higher percent of black defendants are sentenced to death.
c) Use the data to explain why the paradox holds in language that a judge could understand.
         White defendant                 Black defendant
         White victim   Black victim    White victim   Black victim
Death
Not
Answer Q4
A) White defendant: 19 yes, 141 no. Black defendant: 17 yes, 149 no.
B) Overall death penalty: 11.9% of white defendants, 10.2% of black defendants. For white victims, 12.6% of white defendants and 17.5% of black defendants; for black victims, 0% and 5.8%.
C) The death penalty is more likely when the victim was white (14%) rather than black (5.4%). Because most convicted killers are of the same race as their victims, white defendants are more often sentenced to death overall.
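The reversal can be seen directly by computing the rates within each victim group and then overall. Note the assumption: the four cell counts below do not appear in the transcript. They are reconstructed so that they reproduce the 326 cases, the totals in part A, and the percentages in parts B and C, and should be read as an illustration rather than as the original table.

```python
# (death, not death) keyed by (defendant race, victim race).
# ASSUMPTION: reconstructed counts chosen to match the totals and percentages
# quoted in the answer; they are not given in the transcript.
cases = {
    ("white", "white"): (19, 132),
    ("white", "black"): (0, 9),
    ("black", "white"): (11, 52),
    ("black", "black"): (6, 97),
}

def death_rate(defendant=None, victim=None):
    """Percent sentenced to death among cases matching the given races (None = any)."""
    death = total = 0
    for (d, v), (yes, no) in cases.items():
        if defendant in (None, d) and victim in (None, v):
            death += yes
            total += yes + no
    return 100 * death / total

for defendant in ("white", "black"):
    print(f"{defendant} defendants overall: {death_rate(defendant):.1f}%")
    for victim in ("white", "black"):
        print(f"  with {victim} victim: {death_rate(defendant, victim):.1f}%")

for victim in ("white", "black"):
    print(f"{victim} victims overall: {death_rate(victim=victim):.1f}%")
```

Within each victim group the black-defendant rate is higher, yet the overall white-defendant rate is higher because white defendants' cases mostly involve white victims, where death sentences are more common.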
Q5
A study showed that women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed that exposure to chemicals used in production causes the miscarriages. Another possible explanation is that these workers spend most of their time standing up. Can we conclude that exposure to chemicals causes more miscarriages? Why or why not?
Answer Q5 No. The “number of hours standing up at work” is a confounding variable.
Q6
A study finds that high school students who take the SAT, enroll in an SAT coaching course, and then take the SAT a second time raise their SAT mathematics scores from a mean of 521 to a mean of 561. What factors other than taking the course might explain this improvement?
Answer Q6
The variable “knowledge gained as a result of taking the SAT previously” is a confounding variable.