Transforming to achieve linearity


1 Transforming to achieve linearity

2 Body and brain weight of 96 species of mammals
For this data, r = 0.86, but why might we not trust the given correlation? If we remove the elephant, the correlation changes to r = 0.5!
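A minimal sketch of how a single extreme point can inflate a correlation. The numbers below are made up for illustration (they are not the mammal data), and the helper name `correlation` is an assumption of this sketch.

```python
import numpy as np

# Hypothetical data: a loose cluster of small values plus one extreme point.
body = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0])
brain = np.array([3.0, 1.0, 4.0, 2.0, 5.0, 2.5, 90.0])

def correlation(x, y):
    """Pearson correlation r between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# r computed with and without the last (extreme) observation
r_all = correlation(body, brain)
r_without = correlation(body[:-1], brain[:-1])

print(f"r with the extreme point:    {r_all:.2f}")
print(f"r without the extreme point: {r_without:.2f}")
```

In this made-up example the extreme point does most of the work: once it is removed, the remaining cluster shows only a weak linear relationship.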

3 Body and brain weight of 96 species of mammals
Here is a close-up of the blob in the lower-left corner. Is the data linear? The data is not exactly linear: notice that the data bends to the right as body weight increases.

4 How does our data look now?
Biologists know that data on sizes often behave better if we take logarithms before doing more analysis. This plot graphs the logarithm of brain weight against the logarithm of body weight for all 96 species. How does our data look now?

5 Applying a function such as the logarithm or square root to a quantitative variable is called transforming or re-expressing the data.

6 Why transform?
1. To make the distribution of a single variable (as seen in a histogram, for example) more symmetric.
2. To make the spread of several groups (as seen in side-by-side boxplots) more alike.
3. To make the form of a scatterplot more nearly linear (as seen in the previous example).
4. To make the scatter in a scatterplot spread out evenly rather than following a fan shape.
In this chapter, we'll focus on the third reason.

7 Common Transformations
Transformations we may use include raising our data to a power (like squared or cubed), square rooting our data, taking the logarithm of our data, or taking the reciprocal of our data.

8 Common transformations
The situation may help us know which transformations will best achieve linearity. For example... A problem dealing with area might benefit from squaring the data (power of 2) since area involves square units. A problem dealing with weight or volume might benefit from cubing or cube-rooting (a power of 3 or one-third) the data since volume involves cubic units. Data involving a ratio (like miles per gallon) might benefit from a reciprocal transformation (power of -1).

9 Example 4.2 This example has data comparing the lengths and weights of fish, and asks us to find a model that helps us predict the weight of a fish given its length.

10 Weight versus length of fish
Here's a graph of the data. Describe the form of the data. Since the data is not linear, we want to try a transformation that will make it linear.

11 Common transformations
Which transformation should we try? A problem dealing with area might benefit from squaring the data (power of 2) since area involves square units. A problem dealing with weight or volume might benefit from cubing or cube-rooting (a power of 3 or one-third) the data since volume involves cubic units. Data involving a ratio (like miles per gallon) might benefit from a reciprocal transformation (power of -1).

12 Weight versus length^3 Notice what happens to our graph when we cube all our lengths. Our form is now linear.

13 Weight versus length^3 The least-squares regression line has the form
weight = intercept + slope × length^3, with r^2 = 0.995. Notice our explanatory variable is length^3, because we cubed all our lengths. Would you feel comfortable using this model for prediction?

14 Weight versus length^3 What can you say about the residual plot?
Despite the slight pattern in the residual plot, the residuals themselves are quite small compared to the hundreds of grams we were measuring our fish in. We should be safe using our LSRL for prediction.

15 Prediction So to predict the weight of a fish with a length of 36 centimeters, plug 36 into our LSRL: weight = intercept + slope × length^3, so weight = intercept + slope × (36)^3, which gives the predicted weight in grams.
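The cube-then-fit procedure can be sketched as follows. This is an illustration only: the lengths and weights are hypothetical values generated from a made-up rule, not the fish data from Example 4.2, and `predict_weight` is a name invented for this sketch.

```python
import numpy as np

# Hypothetical fish data (NOT the Example 4.2 data): lengths in cm, and
# weights in grams generated from the made-up rule weight = 10 + 0.01 * length^3.
length = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
weight = 10.0 + 0.01 * length**3

# Transform the explanatory variable, then fit a least-squares line
# of weight on length^3. np.polyfit returns [slope, intercept] for degree 1.
length_cubed = length**3
slope, intercept = np.polyfit(length_cubed, weight, 1)

# r^2 for the transformed data
r_squared = float(np.corrcoef(length_cubed, weight)[0, 1] ** 2)

def predict_weight(length_cm):
    """Predict weight by cubing the length before applying the fitted line."""
    return intercept + slope * length_cm**3

predicted = predict_weight(36.0)
print(f"weight = {intercept:.3f} + {slope:.5f} * length^3, r^2 = {r_squared:.3f}")
print(f"predicted weight at 36 cm: {predicted:.1f} g")
```

The key point is that the new length must be cubed before it goes into the line, because the line was fitted to length^3 rather than to length.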

16 The ladder of powers A review of functions
When transforming with powers (like in the last example), a general understanding of different power functions can sometimes help, since we could use any of these powers in transforming our data.

17 The ladder of powers A review of functions
The power of 1 graph is a straight line

18 The ladder of powers A review of functions
Powers greater than one (like 2 and 4) give graphs that bend upward.

19 The ladder of powers A review of functions
Powers less than 1 but greater than 0 (like 0.5 or the square root) give graphs that bend downward.

20 The ladder of powers A review of functions
Powers less than zero (like -1 or the reciprocal transformation) give graphs that decrease as x increases.

21 The ladder of powers A review of functions
The zero power in the ladder is replaced by the graph of log x.

22 A country's GDP & life expectancy
So let's say we were looking at a graph such as this, which compares a country's gross domestic product and life expectancy, and we wanted to linearize the data.

23 A country's GDP & life expectancy
There isn't an obvious relationship between GDP and life expectancy like there was between length & weight, so just start somewhere on the ladder and move down.

24 A country's GDP & life expectancy
Here's our data to the power of 0.5, or in other words square rooted. Compare our new r value to the old. How linear is the data?

25 A country's GDP & life expectancy
We could do better, so let's go down the ladder another step to see what happens.

26 A country's GDP & life expectancy
Here's the log of our data (which takes the power of 0 on the ladder). Compare our new r value to the old. How linear is the data? Let's go one more step on the ladder.

27 A country's GDP & life expectancy
Here's our data to the power of -0.5, or in other words the reciprocal of the square root. Compare our new r value to the old. How linear is the data?

28 A country's GDP & life expectancy
I'm sure you noticed that as we moved down the ladder of powers, the scatterplots became straighter. This final plot has a fairly linear form apart from the outliers.
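Walking down the ladder can be sketched in a few lines. The data here are hypothetical, constructed so that the response is exactly linear in the -0.5 power of the explanatory variable. They are not the GDP and life expectancy data from the plots, and `ladder_transform` is a helper name assumed for this sketch.

```python
import numpy as np

def ladder_transform(x, power):
    """Re-express x using the ladder of powers; log takes the place of power 0."""
    x = np.asarray(x, dtype=float)
    if power == 0:
        return np.log10(x)
    return x**power

# Hypothetical data: y is constructed to be exactly linear in gdp^(-0.5).
gdp = np.array([500.0, 1000.0, 2000.0, 5000.0, 10000.0, 20000.0, 40000.0])
life_expectancy = 85.0 - 600.0 / np.sqrt(gdp)

# Move down the ladder one step at a time and record r at each step.
results = {}
for power in [1, 0.5, 0, -0.5]:
    transformed = ladder_transform(gdp, power)
    results[power] = float(np.corrcoef(transformed, life_expectancy)[0, 1])
    print(f"power {power:>4}: r = {results[power]:.4f}")

# Negative powers reverse the direction of the relationship, so r changes
# sign; compare the absolute value of r when judging linearity.
best_power = max(results, key=lambda p: abs(results[p]))
print("straightest plot at power", best_power)
```

Comparing |r| is only a rough guide. A scatterplot and a residual plot of the transformed data remain the better check.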

29 Although this guess-and-check method ultimately accomplished the goal of achieving linearity, the ladder of powers is rarely used in practice. It is much more satisfactory to begin with a theory or mathematical model that we expect to describe a relationship (as in the length and weight of fish example). Also note that not all data will become linear with a transformation.

30 Modeling Nonlinear Data
Nonlinear growth can often be modeled by exponential or power functions. When transformation of the data reveals a linear form on a scatterplot, we can find an LSRL. Using the rules of logs and exponents, we can then perform an inverse transformation which gives us a curve that fits the original data.

31 Review of Log Properties
Rules of Logs
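As a reminder, the three standard rules of logarithms, which hold for any base and for positive A and B, can be written as:

```latex
\begin{align*}
\log(AB) &= \log A + \log B \\
\log\!\left(\frac{A}{B}\right) &= \log A - \log B \\
\log\!\left(A^{p}\right) &= p \log A
\end{align*}
```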

32 An exponential function is a function with the form:
y = ab^x Note that a is the y-intercept and b is the base: the factor by which y is multiplied each time x increases by 1. When a is positive, b > 1 indicates growth. When a is positive, 0 < b < 1 indicates decay.

33 Exponential Growth Remember that linear growth increases by a fixed amount in each equal time period. Exponential growth occurs when a variable increases by a fixed multiple (b) in each equal time period. Or another way to say it is that exponential growth increases by a fixed percent of the previous total in each equal time period.

34 Exponential Decay Exponential decay occurs when a variable decreases by a fixed multiple (b) in each equal time period. Or another way to say it is that exponential decay decreases by a fixed percent of the previous total in each equal time period.

35 Common Ratio The rate of increase or decrease for exponential data is called the common ratio. It is the ratio of the y values for equal-interval x values: y_n / y_(n-1). By determining if there is a common ratio, we can determine if data are exponential. Note that real data will not have a perfect common ratio. We are looking to see if the ratio is approximately the same.

36 Common Ratio Determine if the following data sets represent exponential growth, exponential decay or neither. If it is exponential, what is the common ratio?
Dataset A
x    y
2    270
3    92
4    31
5    10
6
7    1
Dataset B
x    y
1    10
2    34
3    51
4    66
5    83
6    97
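One way to check for a common ratio is to divide each y value by the one before it. The sketch below uses Dataset B and the first four y values of Dataset A (x = 2 through 5). The function names and the 10% tolerance are assumptions of this sketch, not a standard rule.

```python
def common_ratios(y_values):
    """Return the ratios y_n / y_(n-1) for consecutive y values."""
    return [later / earlier for earlier, later in zip(y_values, y_values[1:])]

def looks_exponential(y_values, tolerance=0.1):
    """Rough check: every ratio lies within `tolerance` (relative) of the mean ratio.

    The 10% default is an arbitrary choice for illustration.
    """
    ratios = common_ratios(y_values)
    mean_ratio = sum(ratios) / len(ratios)
    return all(abs(r - mean_ratio) <= tolerance * mean_ratio for r in ratios)

dataset_a = [270, 92, 31, 10]          # y values for x = 2, 3, 4, 5
dataset_b = [10, 34, 51, 66, 83, 97]   # y values for x = 1, ..., 6

for name, ys in [("A", dataset_a), ("B", dataset_b)]:
    ratios = [round(r, 2) for r in common_ratios(ys)]
    print(f"Dataset {name}: ratios {ratios}, roughly constant: {looks_exponential(ys)}")
```

A roughly constant ratio below 1 points to exponential decay, and one above 1 points to exponential growth. Ratios that drift steadily suggest the data are not exponential.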

37 More on Moore's law You may recall that last time we talked about Moore's law, which predicted in 1965 that the number of transistors on an integrated circuit chip would double every 18 months.

38 More on Moore's law
Construct a scatterplot of the data.

39 Your plot should have looked like this
We will answer two questions: 1. What transformation will linearize the data? 2. Is the data exponential?

40 Verifying exponential growth
The answer to question 1: taking the log of the response variable will linearize the data. The answer to question 2: the data is exponential. The log (or ln) of the response variable plotted vs. the explanatory variable will always produce a linear relationship if the data is exponential!

41 Verifying exponential growth
In other words, if our data are growing exponentially and we plot the logarithm (base 10 or base e) of y against x, we should observe a straight line for the transformed data. Perform the transformation and graph ln y vs. x.
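The reason the transformed plot is straight follows from the rules of logs. Writing the exponential model as y = ab^x with a > 0 and b > 0, taking the natural log of both sides gives:

```latex
\begin{align*}
y &= a\,b^{x} \\
\ln y &= \ln\!\left(a\,b^{x}\right) \\
      &= \ln a + \ln\!\left(b^{x}\right) \\
      &= \ln a + (\ln b)\,x
\end{align*}
```

So ln y is a linear function of x with intercept ln a and slope ln b. The same argument works with base 10 logs.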

42 Verifying exponential growth
Your graph should look like this. It's fairly linear. Let's perform a regression to see how linear.

43 Verifying exponential growth
Perform a linear regression and record your regression equation, correlation, and r2 values. Check your residual plot. How does the model fit?

44 Verifying Exponential Growth
The residual plot is shown on page 274 of your text. Once again, there is a slight pattern to our residuals, but they are so small that we can justify using our model to make predictions.

45 Residuals Again When calculating residuals by hand, remember:
For linear data: residual = observed value - predicted value. When using transformed data, be sure to use the transformed values for the observations. For log y vs. x: residual = observed log y - predicted log y. For ln y vs. x: residual = observed ln y - predicted ln y.
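A short sketch of computing residuals on the transformed scale. The x and y values are hypothetical, roughly exponential numbers chosen for illustration, not the transistor data.

```python
import numpy as np

# Hypothetical, roughly exponential data (NOT the transistor counts).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 8.2, 15.8, 32.5, 63.0])

# Fit the least-squares line to the transformed response ln y.
ln_y = np.log(y)
slope, intercept = np.polyfit(x, ln_y, 1)

# Residuals use the transformed observations:
# residual = observed ln y - predicted ln y
predicted_ln_y = intercept + slope * x
residuals = ln_y - predicted_ln_y

for xi, res in zip(x, residuals):
    print(f"x = {xi:.0f}: residual = {res:+.4f}")
```

Subtracting the predicted ln y from the raw y values would mix two different scales and give meaningless residuals.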

46 Predictions using our LSRL
With our regression equation, we can now make predictions. To predict the number of transistors on Intel’s Itanium 2 chip, which was released in 2003, we substitute 33 for “years since 1970” in the regression equation: ln(transistors) = intercept + slope × (years since 1970), so ln(transistors) = intercept + slope × (33). Then change to exponential form (remember ln is base e): transistors = e^(intercept + slope × 33).

47 Inverse Transformation (Used to obtain a model to fit the original data)
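For an exponential model that has been transformed by taking the ln of the response variable, the inverse transformation is exponentiation with base e: if the fitted line is ln(y) = c + dx, then y = e^(c + dx) = (e^c)(e^d)^x, which has the exponential form y = ab^x with a = e^c and b = e^d.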

48 Use the correct inverse transformation to obtain a model that will fit our original transistor data.
Make a scatterplot of the original data and this model. How does it look?
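The fit-then-invert procedure can be sketched as follows. The counts are hypothetical, generated to double exactly every 2 years from a starting value of 1000. They are not the transistor data, and `predict_count` is a name assumed for this sketch.

```python
import numpy as np

# Hypothetical counts (NOT the transistor data): start at 1000 and
# double every 2 years, observed every 2 years for 20 years.
years = np.arange(0.0, 21.0, 2.0)
counts = 1000.0 * 2.0 ** (years / 2.0)

# Step 1: fit a least-squares line to ln(counts) vs. years.
slope, intercept = np.polyfit(years, np.log(counts), 1)

# Step 2: invert the transformation by exponentiating both sides:
# ln(y) = intercept + slope * x  =>  y = e^(intercept + slope * x)
def predict_count(year):
    """Predict on the original scale by undoing the ln transformation."""
    return float(np.exp(intercept + slope * year))

# The same model written as y = a * b^x
a = float(np.exp(intercept))  # starting value
b = float(np.exp(slope))      # growth factor per year

print(f"ln(y) = {intercept:.4f} + {slope:.4f} x")
print(f"y = {a:.1f} * {b:.4f}^x")
print(f"predicted count at year 4: {predict_count(4.0):.1f}")
```

Plotting `predict_count` over the original scatterplot gives a curve on the original scale, which is what the inverse transformation is meant to produce.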

