Transforming to Achieve Linearity


1 Section 12.2: Transforming to Achieve Linearity

2 Example 1: A fisheries biologist wants to predict the weight (in grams) of perch (a type of fish) caught in a certain lake from their length (in cm). He catches, measures, and weighs 13 perch whose lengths are between 8 and 48 cm. Below is a scatterplot of his data, along with the residual plot from a linear regression analysis. a) Is a linear model appropriate for these data? Justify your answer. A linear model is not appropriate because both the scatterplot and the residual plot show a curved pattern.

3 If the scatterplot of the logarithm (common or natural) of the response variable against the original explanatory variable has a linear form, then the two variables can be modeled using an exponential function. If the scatterplot of the logarithm (common or natural) of the response variable against the logarithm of the explanatory variable has a linear form, then the two variables can be modeled using a power function.
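The rule on this slide can be tried out numerically. The following is a minimal sketch, assuming synthetic data generated from an exact power law (y = 2x^3); it is not the perch data, and the helper name least_squares is introduced here only for illustration. It fits a line to ln(y) vs. x (the exponential check) and to ln(y) vs. ln(x) (the power check) and compares r^2.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Synthetic illustration only (an assumption, not data from the slides):
# an exact power law y = 2 * x^3.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 3 for xi in x]

ln_x = [math.log(v) for v in x]
ln_y = [math.log(v) for v in y]

# Exponential check: is ln(y) linear in x?
_, _, r2_exponential = least_squares(x, ln_y)

# Power check: is ln(y) linear in ln(x)?
a, b, r2_power = least_squares(ln_x, ln_y)

print(f"ln(y) vs x:     r^2 = {r2_exponential:.4f}")
print(f"ln(y) vs ln(x): r^2 = {r2_power:.4f}")
print(f"recovered power model: y = {math.exp(a):.3f} * x^{b:.3f}")
```

Because the data follow a power law, the ln-ln fit is the linear one, and its slope and exponentiated intercept recover the exponent and the coefficient. In practice r^2 alone is not enough; the residual plot of each transformed fit should also be checked, as the examples below do.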

4 b) Below is a scatterplot of the natural logarithm of weight vs. the natural logarithm of length. This relationship is clearly more linear than the one above. Does this suggest that the relationship between length and weight can be modeled by an exponential function or by a power function? Explain.

5 The relationship between length and weight can be modeled by a power function because taking the natural logarithm of both variables produced a scatterplot with a linear pattern.

6 c) Computer output from the regression of ln (Weight) vs. ln (Length) is given below. Use it to predict the weight of a fish that is 75 cm long.
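The regression output itself is not reproduced in this transcript, so the prediction in part (c) can only be sketched symbolically. Writing a for the intercept and b for the slope reported in that output (placeholder symbols, not values from the source), back-transforming the fitted line gives a power model:

```latex
\[
\widehat{\ln(\text{weight})} = a + b\,\ln(\text{length})
\quad\Longrightarrow\quad
\widehat{\text{weight}} = e^{a}\cdot \text{length}^{\,b}
\]
\[
\text{length} = 75:\qquad
\widehat{\text{weight}} = e^{\,a + b\ln 75}\ \text{grams}
\]
```

Note that 75 cm lies well outside the 8 to 48 cm range of the sampled perch, so this prediction is an extrapolation and should be treated with caution.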

7 Example 2: Is there a link between the amount of cigarette smoking in countries and death rates from coronary heart disease (CHD)? Below is computer output from a regression analysis of this relationship for 14 randomly selected countries from around the world, along with a residual plot. The explanatory variable is annual consumption of cigarettes per person and the response variable is annual deaths from coronary heart disease per 100,000 people.

8 a) What is the equation of the least-squares regression line based on these data? Define any variables used. b) Interpret the slope of the regression line. A one-cigarette increase in the annual number of cigarettes consumed in a country is associated with a predicted increase of in annual deaths from CHD.

9 c) If we are trying to determine the relationship between these two variables throughout the world, is the slope you provided in part (b) a statistic or a parameter? Explain. This is a statistic: it is an estimate of the population regression slope based on this particular random sample of 14 countries.

10 d) Assuming all conditions have been met, construct and interpret a 90% confidence interval for the slope of the least squares regression of annual CHD deaths on annual cigarette consumption. State: We want to estimate β, the true slope of the population regression line relating annual cigarette consumption to annual deaths from CHD, with 90% confidence. Plan: We are told to assume all conditions for inference have been met, so we will use a t-interval for the slope to estimate β.

11 Do: df = 14 – 2 = 12. For a 90% confidence level, the critical value is t* = 1.782. So the 90% confidence interval for β is ± 1.782( ) ≈ ± (–0.0116, ) Conclude: We are 90% confident that the interval from –0.0116 to captures the actual slope of the population regression line relating annual deaths from CHD to annual cigarette consumption per person in all countries.

12 e) If you were to perform a test of the hypotheses H0: β = 0 versus Ha: β ≠ 0 at the α = 0.10 level, what would you conclude? Justify your answer using your result in part (d). Since the 90% confidence interval contains 0, we fail to reject H0 at the α = 0.10 level. We do not have enough evidence to suggest that the slope of the population regression line relating annual cigarette consumption to annual deaths from CHD is different from 0.
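The link between parts (d) and (e) can be sketched in a few lines of Python. The critical value t* = 1.782 (df = 12, 90% confidence) comes from the slides; the slope estimate and standard error below are hypothetical stand-ins, since the regression output is not reproduced in this transcript, and the function names are introduced here only for illustration.

```python
def slope_t_interval(b, se_b, t_star):
    """Confidence interval for a regression slope: b +/- t* x SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


def rejects_zero_slope(interval):
    """Two-sided test of H0: beta = 0 at level alpha, using the matching
    (1 - alpha) confidence interval: reject only if 0 lies outside it."""
    lower, upper = interval
    return not (lower <= 0.0 <= upper)


t_star = 1.782          # from the slides: df = 14 - 2 = 12, 90% confidence
b_hypothetical = 0.05   # ASSUMED slope estimate, not the value from the output
se_hypothetical = 0.04  # ASSUMED standard error, not the value from the output

interval = slope_t_interval(b_hypothetical, se_hypothetical, t_star)
print(f"90% CI for the slope: ({interval[0]:.4f}, {interval[1]:.4f})")
print("Reject H0 at alpha = 0.10?", rejects_zero_slope(interval))
```

With these stand-in numbers the interval straddles 0, so the test fails to reject H0, which mirrors the reasoning in part (e): a 90% interval that contains 0 corresponds to a two-sided test at α = 0.10 that does not reject.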

13 Example 3: Lupe is shopping for a used car and collects data on age (in years) and price (in thousands of dollars) for Ford Taurus sedans on a used-car web site. Computer output for three different regression models (Price vs. Age, Log (Price) vs. Age, and Log (Price) vs. Log (Age)) is shown on this page and the next. I. Price versus Age

14 II. Log Price versus Age

15 III. Log Price versus Log Age

16 a) Explain how the information provided suggests that a linear model may not be appropriate for describing the relationship between car age and price. The scatterplot shows a curved relationship between Price and Age. This is reinforced by the distinctive “U-shaped” pattern in the residual plot: positive residuals for high and low ages and negative residuals in between.

17 b) Would an exponential model or a power model provide a better description of this relationship? Use the information provided to justify your answer. The plot of Log (Price) versus Age is clearly linear, and the residual plot shows a random scatter of points on either side of the line residual = 0. (The Log (Price) versus Log (Age) scatterplot and residual plot do not suggest a linear relationship.) Because Log (Price) vs. Age is roughly linear, Price vs. Age can be modeled well by an exponential function.

18 c) Give the equation of the model you chose in part (b), using the transformed variable(s).

19 d) Use the model you chose in part (c) to predict the price of a 5-year-old Ford Taurus. Show your work!
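Since the computer output is not reproduced in this transcript, parts (c) and (d) can only be outlined symbolically. Here a and b stand for the intercept and slope from the Log (Price) vs. Age output (placeholder symbols, not values from the source), and Log is assumed to mean the base-10 logarithm; if the software used the natural logarithm, replace 10 with e.

```latex
\[
\widehat{\log(\text{price})} = a + b\,(\text{age})
\quad\Longrightarrow\quad
\widehat{\text{price}} = 10^{\,a + b\,(\text{age})}
\]
\[
\text{age} = 5:\qquad
\widehat{\text{price}} = 10^{\,a + 5b}\ \text{thousand dollars}
\]
```

The first line is the answer form for part (c), stated in the transformed variable; the second undoes the logarithm to return the prediction to the original units of price.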

