Presentation on theme: "Chapter 4 Review: More About Relationship Between Two Variables"— Presentation transcript:
1 Chapter 4 Review: More About Relationship Between Two Variables
Group Members: Qianya Meng, Nikta Kheiri, Min Kim
1st period, 12/14/11
2 The Big Idea
Transform the graph to achieve linearity.
Transform exponential graphs, y = ab^x, to achieve linearity and come up with a transformed equation for use in extrapolation.
Transform power functions, y = ax^p, to achieve linearity and come up with a transformed equation for use in extrapolation.
Learn to use marginal and conditional distributions.
Recognize relationships between two variables.
3 Vocabulary You Need to Know
Transforming or re-expressing the data means applying a function, such as the logarithm or square root, to a quantitative variable.
Log Rules:
1) log_b(mn) = log_b(m) + log_b(n)
2) log_b(m/n) = log_b(m) - log_b(n)
3) log_b(m^n) = n · log_b(m)
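The three log rules can be spot-checked numerically. The following is a minimal sketch; the values of b, m, and n are arbitrary illustrative choices, not taken from the slides.

```python
import math

# Illustrative values only: any positive m and n with a base b > 0, b != 1 work.
b, m, n = 10.0, 8.0, 3.0

def log_b(value, base=b):
    """Logarithm of value in the given base."""
    return math.log(value, base)

# Rule 1: log of a product is the sum of the logs.
product_rule = math.isclose(log_b(m * n), log_b(m) + log_b(n))
# Rule 2: log of a quotient is the difference of the logs.
quotient_rule = math.isclose(log_b(m / n), log_b(m) - log_b(n))
# Rule 3: log of a power is the exponent times the log.
power_rule = math.isclose(log_b(m ** n), n * log_b(m))

print(product_rule, quotient_rule, power_rule)
```

Rules 1 and 3 are the ones that turn an exponential or power model into a straight line once logarithms are taken.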
4 Vocabulary
Linear growth increases by a fixed amount in each equal time period.
Exponential growth model: log y = log a + (log b)x; predicted y = ab^x
Power law model: log y = log a + p log x; predicted y = ax^p
5 Vocabulary
Two-way table: describes two categorical variables.
Marginal distributions: the distributions of the row variable alone and of the column variable alone, given by the row and column totals.
Conditional distributions: the distribution of the column variable given a value of the row variable, or of the row variable given a value of the column variable.
Simpson’s paradox: an association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group.
6 Vocabulary
Causation: Changes in x cause changes in y.
Common response: Changes in both x and y are caused by changes in a lurking variable z.
Confounding: The effect (if any) of x on y is mixed up with the effect of a lurking variable z, so the two effects cannot be separated.
7 Key Topics Covered in this Chapter
Modeling nonlinear data
Relations in categorical data
Establishing causation
8 Formulas You Should Know
Exponential growth model: log y = log a + (log b)x; predicted y = ab^x
Power law model: log y = log a + p log x; predicted y = ax^p
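Both linear forms follow from taking the logarithm of each side of the model and applying the product and power log rules. A short derivation, using the same symbols as the slide:

```latex
\begin{align*}
y = a b^{x}
  &\;\Longrightarrow\; \log y = \log\!\left(a b^{x}\right)
   = \log a + \log\!\left(b^{x}\right)
   = \log a + (\log b)\,x \\[4pt]
y = a x^{p}
  &\;\Longrightarrow\; \log y = \log\!\left(a x^{p}\right)
   = \log a + \log\!\left(x^{p}\right)
   = \log a + p \log x
\end{align*}
```

So an exponential model is linear in (x, log y), while a power model is linear in (log x, log y).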
9 Calculator Key Strokes
Exponential growth modeling
1) Enter the explanatory data into L1 and the response data into L2.
2) Draw the scatterplot of y versus x.
3) Define L3 as the common (log) or natural (ln) logarithm of L2, then make a scatterplot of L3 versus L1.
4) Perform the least-squares regression on the transformed data (L3 on L1).
5) Draw the scatterplot of the transformed data with the regression line.
6) Plot the residuals versus L1.
7) With the regression equation in Y1, define Y2 = e^(Y1) if ln was used, or Y2 = 10^(Y1) if log was used.
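The same exponential workflow can be sketched away from the calculator. This is a minimal illustration, not the slides' own method: `least_squares` and `fit_exponential` are hypothetical helper names, and the data are made up from y = 3 · 2^x so the fit has a known answer.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing ln(y) on x, then undoing the log."""
    intercept, slope = least_squares(xs, [math.log(y) for y in ys])
    # ln y = ln a + (ln b) x, so a = e^intercept and b = e^slope.
    return math.exp(intercept), math.exp(slope)

# Hypothetical data generated from y = 3 * 2**x (not from the slides).
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]

a, b = fit_exponential(xs, ys)
print(a, b)
```

Because the made-up data are exactly exponential, the fit should recover a close to 3 and b close to 2; with real data the residual plot in step 6 is what tells you whether the model is adequate.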
10 Calculator Key Strokes
Power law modeling
1) Enter the explanatory data into L1 and the response data into L2.
2) Draw the scatterplot of y versus x.
3) Define L3 as the common (log) or natural (ln) logarithm of L1, and define L4 as the same logarithm of L2.
4) Plot L4 versus L3.
5) Calculate the regression equation for the transformed data and store it in Y1.
6) Construct a residual plot.
7) Define Y2 as (10^a)(x^b) if log was used, or (e^a)(x^b) if ln was used, where a is the intercept and b is the slope of the regression equation in Y1.
8) Plot Y2 and the scatterplot for the original data together.
9) To make a prediction for the value x = k, evaluate Y2(k) on the home screen.
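The power law steps can be sketched the same way. This is an illustration under stated assumptions: `fit_power` is a hypothetical helper name, natural logs are used throughout, and the data are made up from y = 2x^3 so the expected result is known.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**p by regressing ln(y) on ln(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_lx = sum(log_x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((lx - mean_lx) ** 2 for lx in log_x)
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_ly - slope * mean_lx
    # ln y = ln a + p ln x, so a = e^intercept and p = slope.
    return math.exp(intercept), slope

# Hypothetical data generated from y = 2 * x**3 (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 3 for x in xs]

a, p = fit_power(xs, ys)
k = 6
prediction = a * k ** p  # plays the role of Y2(k) on the calculator
print(a, p, prediction)
```

Note that only the intercept is exponentiated in the inverse transformation; the slope of the log-log line becomes the power p directly.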
11 Helpful Hints
When the explanatory variable is years, transform the data to “years since” so that the values are smaller and don’t create overflow problems when you perform the inverse transformation.
If there is a clear explanatory/response relationship, compare the conditional distributions of the response variable for the separate values of the explanatory variable.
Even when direct causation is present, it is rarely a complete explanation of an association between two variables.
12 Q1
Some college students collected data on the intensity of light at various depths in a lake. Here are their data:
Depth (m)    Light intensity (lumens)
5            168.00
6            120.42
7            86.31
8            61.87
9            44.34
10           31.78
11           22.78
A) Make a scatterplot suitable for predicting light intensity from depth. Describe the form of the relationship.
B) To verify that the decrease in light intensity follows an exponential model, calculate the ratio of light intensity at consecutive depths. Start with 120.42/168.00. What do you conclude?
C) Take the natural logarithm (ln) of the light intensity measurements and plot these values against the corresponding depths. Does this transformation achieve linearity?
D) Calculate the least-squares regression equation for the transformed data. Interpret the slope and y-intercept of this equation in this setting.
E) Construct and interpret a residual plot.
F) Perform the inverse transformation to express light intensity as an exponential function of depth in the lake. Display a scatterplot of the original data with the exponential model superimposed. Is your exponential function a satisfactory model for the data?
G) Use your model to predict the light intensity at a depth of 22 meters. The actual light intensity reading at that depth was 0.58 lumens. Does this surprise you?
13 Answer Q1
A) The relationship is strong, negative, and curved.
B) The ratios are all about 0.717, so an exponential model is appropriate.
C) Yes, it achieves linearity.
D) If x = depth and y = ln(light intensity), then predicted y = 6.789 - 0.333x. The intercept, 6.789, provides an estimate for the average value of the natural log of the light intensity at a depth of 0 meters. The slope, -0.333, indicates that the natural log of the light intensity decreases on average by 0.333 for each one-meter increase in depth.
E) The residual plot shows a fairly random scatter and relatively small residuals, so the linear model is appropriate.
F) If x = depth and y = light intensity, predicted y = (e^6.789)(e^(-0.333x)). It is a satisfactory model.
G) At 22 m, the predicted light intensity would be 0.584 lumens. No, not surprised.
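The numerical parts of Q1 (B, D, and G) can be checked directly from the seven data points. The sketch below uses only the depths and intensities given in the question; the variable names are illustrative.

```python
import math

# Data from Q1: depth in meters, light intensity in lumens.
depths = [5, 6, 7, 8, 9, 10, 11]
intensity = [168.00, 120.42, 86.31, 61.87, 44.34, 31.78, 22.78]

# Part B: ratio of each intensity to the one at the previous depth.
ratios = [later / earlier for earlier, later in zip(intensity, intensity[1:])]

# Part D: least-squares regression of ln(intensity) on depth.
log_intensity = [math.log(v) for v in intensity]
n = len(depths)
mean_x = sum(depths) / n
mean_y = sum(log_intensity) / n
sxx = sum((x - mean_x) ** 2 for x in depths)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(depths, log_intensity))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(depth):
    """Part F: inverse transformation, intensity = e^(intercept + slope * depth)."""
    return math.exp(intercept + slope * depth)

print([round(r, 3) for r in ratios])
print(round(intercept, 3), round(slope, 3))
print(round(predict(22), 3))  # Part G
```

The printed values should agree, up to rounding, with the answers above: ratios near 0.717, an intercept near 6.789, a slope near -0.333, and a prediction near 0.584 lumens at 22 m. Keep in mind that 22 m is far outside the 5 to 11 m range of the data, so part G is an extrapolation.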
14 Q2
Some high school physics students dropped a ball and measured its height at various points along its descent. Table 4.3 shows the time since release and the distance the ball had fallen.
(a) Make a scatterplot suitable for predicting distance fallen from time since release. Describe the direction, form, and strength of the relationship.
(b) Perform an appropriate transformation to achieve linearity. Then find a least-squares regression model for the transformed data.
(c) Comment on the quality of your model in (b) by referring to a residual plot and r^2.
(d) Make a scatterplot of the points (time, √distance) to see if this transformation works. Then find a least-squares regression model for the transformed data.
(e) Comment on the quality of your model in (d) by referring to a residual plot and r^2.
(f) Use the two models you obtained in (b) and (d) to predict the distance that the object had fallen after 0.47 seconds. Which prediction do you think is closer to the actual value? Why?
timedistance.1612.1.2429.8.2532.7.342.844.2.3255.8.3663.565.1.5124.6129.7.57150.2.61182.21189.4.68220.4.72254.0261.0.83334.6.88375.5.89399.1
15 Answer Q2
(a) The relationship is curved, strong, and positive.
(b) If x = time and y = distance, predicted y = ___ x^2
(c) r^2 = ___ and the residual plot shows random scatter and fairly small residuals, so this looks like an appropriate model.
(d) Yes. Square root of the predicted y = ___ x
(e) r^2 = ___ and the residual plot shows no pattern, which suggests a good model.
(f) Using the model from (b): ___ cm. Using the model from (d): ___ cm.
16 Q3
Here are data from eight schools on smoking among students and among their parents.
                          Neither parent smokes   One parent smokes   Both parents smoke
Student does not smoke    1168                    1823                1380
Student smokes            188                     416                 400
A) How many students are described in the two-way table?
B) What percent of these students smoke?
C) Give the marginal distribution of parents’ smoking behavior, both in counts and in percents.
D) Calculate three conditional distributions of students’ smoking behavior: one for each of the three parental smoking categories. Describe the relationship between the smoking behaviors of students and their parents in a few sentences.
17 Answer Q3
A) 5375 students
B) 18.7%
C) Both parents smoke: 1780, 33.1%. One parent smokes: 2239, 41.7%. Neither parent smokes: 1356, 25.2%.
D) Student smokes, given both parents smoke: 400/(400 + 1380) = 22.5%
Student doesn’t smoke, given both parents smoke: 1380/(400 + 1380) = 77.5%
Student smokes, given one parent smokes: 416/(416 + 1823) = 18.6%
Student doesn’t smoke, given one parent smokes: 1823/(416 + 1823) = 81.4%
Student smokes, given neither parent smokes: 188/(188 + 1168) = 13.9%
Student doesn’t smoke, given neither parent smokes: 1168/(188 + 1168) = 86.1%
Students who smoke are most likely to come from families where one or more of their parents smoke.
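The marginal and conditional distributions in Q3 can be computed from the six counts in the two-way table. This is a minimal sketch using those counts; the dictionary layout and names are illustrative choices.

```python
# Counts from the Q3 two-way table, keyed by parental smoking category.
counts = {
    "neither parent smokes": {"student smokes": 188, "student does not smoke": 1168},
    "one parent smokes": {"student smokes": 416, "student does not smoke": 1823},
    "both parents smoke": {"student smokes": 400, "student does not smoke": 1380},
}

# Parts A and B: table total and overall percent of students who smoke.
total = sum(sum(row.values()) for row in counts.values())
student_smokers = sum(row["student smokes"] for row in counts.values())
percent_smokers = 100 * student_smokers / total

# Part C: marginal distribution of parents' smoking (counts and percents).
marginal_counts = {parents: sum(row.values()) for parents, row in counts.items()}
marginal_percents = {p: 100 * c / total for p, c in marginal_counts.items()}

# Part D: conditional distribution of student smoking within each parent category.
conditional = {
    parents: {status: 100 * c / marginal_counts[parents] for status, c in row.items()}
    for parents, row in counts.items()
}

print(total, round(percent_smokers, 1))
for parents in counts:
    print(parents, marginal_counts[parents], round(marginal_percents[parents], 1),
          round(conditional[parents]["student smokes"], 1))
```

Each conditional distribution divides by the total for that parent category, not by the table total, which is what distinguishes it from the marginal distribution in part C.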
18 Q4
Whether a convicted murderer gets the death penalty seems to be influenced by the race of the victim. Here are data on 326 cases in which the defendant was convicted of murder.
            White defendant                  Black defendant
            White victim    Black victim     White victim    Black victim
Death       19              0                11              6
Not         132             9                52              97
A) Use these data to make a two-way table of defendant’s race vs. death penalty.
B) Show that Simpson’s paradox holds: a higher percent of white defendants are sentenced to death overall, but for both Black and white victims a higher percent of Black defendants are sentenced to death.
C) Use the data to explain why the paradox holds in language that a judge could understand.
19 Answer Q4
A) White defendant: 19 yes, 141 no. Black defendant: 17 yes, 149 no.
B) Overall death penalty: 11.9% of white defendants, 10.2% of Black defendants. For white victims, 12.6% of white defendants and 17.5% of Black defendants; for Black victims, 0% and 5.8%.
C) The death penalty is more likely when the victim was white (14%) rather than Black (5.4%). Because most convicted killers are of the same race as their victims, white defendants are more often sentenced to death overall.
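The reversal in Q4 can be reproduced from the eight cell counts. The sketch below is an illustration: `death_rate` is a hypothetical helper, and the count of 0 death sentences for white defendants with Black victims is an assumption inferred from the 326-case total and the 0% figure in the answer, since that cell is missing from the flattened table.

```python
# (defendant race, victim race) -> (death sentences, other sentences).
# The 0 in the ("white", "black") cell is inferred, as noted above.
cases = {
    ("white", "white"): (19, 132),
    ("white", "black"): (0, 9),
    ("black", "white"): (11, 52),
    ("black", "black"): (6, 97),
}

def death_rate(defendant=None, victim=None):
    """Percent sentenced to death among cases matching the given races.

    Passing None for an argument combines both races for that variable.
    """
    death = total = 0
    for (d, v), (yes, no) in cases.items():
        if defendant in (None, d) and victim in (None, v):
            death += yes
            total += yes + no
    return 100 * death / total

# Combined over victims: white defendants have the higher rate.
overall = {d: death_rate(defendant=d) for d in ("white", "black")}
# Within each victim group: Black defendants have the higher rate.
by_victim = {
    v: {d: death_rate(defendant=d, victim=v) for d in ("white", "black")}
    for v in ("white", "black")
}
# Part C: the victim's race is the lurking variable.
victim_effect = {v: death_rate(victim=v) for v in ("white", "black")}

print(overall)
print(by_victim)
print(victim_effect)
```

Comparing `overall` with `by_victim` shows the direction of the comparison flipping once the victim's race is held fixed, and `victim_effect` shows why: cases with white victims carry a much higher death-sentence rate, and most white defendants had white victims.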
20 Q5
A study showed that women who work in the production of computer chips have abnormally high numbers of miscarriages. The union claimed that exposure to chemicals used in production causes the miscarriages. Another possible explanation is that these workers spend most of their time standing up. Can we conclude that exposure to chemicals causes more miscarriages? Why or why not?
21 Answer Q5
No. The “number of hours standing up at work” is a confounding variable: its effect on miscarriages cannot be separated from the effect of exposure to chemicals.
22 Q6
A study finds that high school students who take the SAT, enroll in an SAT coaching course, and then take the SAT a second time raise their SAT mathematics scores from a mean of 521 to a mean of 561. What factors other than taking the course might explain this improvement?
23 Answer Q6
The variable “knowledge gained as a result of taking the SAT previously” is a confounding variable.