Transforming Relationships


Transforming Relationships AP Statistics Practice of Statistics Section 4.1

What You’ll Learn
★ Recognize when the relationship between two variables is either an exponential relationship or a power relationship
★ Perform the appropriate transformation to “linearize” the data, find the LSRL on the transformed points, and “untransform” to find a model for the original data

Not everything is Linear! We’ve looked at several sets of data in which the relationships are linear in nature. What about relationships that exhibit a different, “nonlinear” pattern? Consider for a moment gypsy moths. An outbreak of gypsy moths in Massachusetts from 1978 to 1981 resulted in many acres of defoliated land. The acreages are listed in the following table.

Gypsy Moths
The data and graph depict the number of acres defoliated by gypsy moths in Massachusetts between 1978 and 1981.

Year    Acres of defoliated land
1978          63042
1979         226260
1980         907075
1981        2826095

Calculator: Create a scatter plot. L1: Years, L2: Acres. Stat Plot, On, scatterplot, Zoom9

So, this doesn’t look too bad! Let’s try a linear regression on the data, remembering to check both the correlation coefficient and the residual plot.

Calculator: Stat Calc 4, Store RegEQ, VARS, Y-VARS, Y1, Calculate, Graph (LSRL appears) OR Stat Calc 4, Y=, VARS, 5, EQ, 1:RegEQ

Dependent Variable: Acres
Independent Variable: Year
Acres = -1.7746007E9 + 896997.4 (Year)
Sample size: 4
R (correlation coefficient) = 0.9136
R-sq = 0.8347045
Estimate of error standard deviation: 631139.44

Well, a visual of the line doesn’t look too bad, and that’s a great correlation coefficient. (Remember though, sometimes “r” is deceptive, so be sure to check the residuals!)
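If you'd like to reproduce the calculator output away from the TI, here is a minimal Python sketch of the same least-squares fit. The helper name `least_squares` is our own (it is not from the slides), and it uses only the textbook formulas for slope, intercept, and r.

```python
# Gypsy moth data from the slides
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


intercept, slope, r = least_squares(years, acres)
print(f"Acres = {intercept:.7g} + {slope:.1f} (Year)")
print(f"r = {r:.4f}, r-sq = {r * r:.4f}")
```

The printed slope, r, and r-sq should agree with the regression output on this slide, up to rounding.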

The Residuals
A check of the residuals indicates that a linear model is NOT appropriate! (Notice the parabolic pattern in the plot, which can be seen even with only 4 data points!)
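To see the pattern without a plot, a small Python sketch can list the residuals (observed minus predicted) from the linear model. It plugs in the rounded coefficients from the regression output, so the values are approximate; the variable names are ours.

```python
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

# Rounded LSRL coefficients from the linear regression output
intercept = -1.7746007e9
slope = 896997.4

# residual = observed - predicted
residuals = [a - (intercept + slope * yr) for yr, a in zip(years, acres)]

for yr, res in zip(years, residuals):
    print(f"{yr}: residual = {res:12.1f}")
```

The signs run positive, negative, negative, positive: the ends of the data sit above the line and the middle sits below it, which is the curved pattern the residual plot shows.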

So, what type of relationship is this?
Remember from linear regression that when the relationship is linear, the response variable increases (or decreases) by a constant amount. (Add or subtract the same number each time.)

Years since 1977    Acres of defoliated land    Difference in acres
1                         63042
2                        226260                       163218
3                        907075                       680815
4                       2826095                      1919020

Notice that the difference between the number of acres is not constant. With this in mind, and the problem with the residual plot, let’s consider another type of relationship.

Exponential Relationships
In an exponential relationship, the response variable increases by a fixed percentage of the previous total. In other words, we should be able to multiply the previous value by some constant to get the next one. So, let’s check out this possibility, looking at the increase over each 1-year interval.

Years since 1977    Acres of defoliated land    Ratio (Next/Prev)
1                         63042
2                        226260                      3.5890
3                        907075                      4.0090
4                       2826095                      3.1156

Ratio: 226260/63042 = 3.5890

Notice that although the ratio is not exactly the same each time (we wouldn’t expect it to be exact with “real” data), there does appear to be a fairly consistent ratio value.
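The two checks (constant difference for linear, constant ratio for exponential) are quick to script. This is just a sketch of the arithmetic in the tables above; the list names are ours.

```python
acres = [63042, 226260, 907075, 2826095]

# Linear growth would give roughly constant differences
differences = [nxt - prev for prev, nxt in zip(acres, acres[1:])]

# Exponential growth would give roughly constant ratios
ratios = [nxt / prev for prev, nxt in zip(acres, acres[1:])]

print("Differences:", differences)
print("Ratios:", [round(r, 4) for r in ratios])
```

The differences grow by a factor of more than ten, while the ratios all stay between about 3 and 4.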

So How Do We Create the Model? If the relationship is an exponential one, we can use a mathematical transformation to “linearize” the data, find the LSRL of the transformed data, then “untransform” to find the model that will fit the original data. Ok, so let’s take it step by step

Finding the Model
Step 1: Use a mathematical transformation to “linearize” the data (create a new data set whose relationship is linear).
If the original data is exponential, find the logarithm (either common log or natural log) of each of the response values.
When working with years it is also helpful to “code” the year data so our calculators can handle the values (most computer programs are capable of creating models using the full year). To do this we will take each year and subtract 1977 (this way all of our values are > 0).

Calculator: Stat Edit, L1: 1, 2, 3, 4. L3, up to select, Log(L2), Enter

Year    Acres of defoliated land    Years since 1977    Log10 (acres)
1978          63042                       1                 4.7996
1979         226260                       2                 5.3546
1980         907075                       3                 5.9576
1981        2826095                       4                 6.4512
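Here is the same Step 1 as a short Python sketch, for anyone following along on a computer instead of with lists L1 to L3. `math.log10` is the common (base 10) logarithm; the variable names are ours.

```python
import math

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

# "Code" the years by subtracting 1977, as on the slide
years_since_1977 = [yr - 1977 for yr in years]

# Take the common log of each response value (this plays the role of L3)
log_acres = [math.log10(a) for a in acres]

for t, la in zip(years_since_1977, log_acres):
    print(t, round(la, 4))
```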

Finding the Model
Now, let’s check a scatterplot of the transformed data.
Calculator: Stat Plot, On, Scatter, L1, L3, Graph, Zoom9
Notice the change in the pattern from our original data to the transformed data. The logarithm transformation really “straightened” our data. (Using the natural logarithm would have had the same effect; our values would just have been different.)

Finding the Model
Step 2: Find the LSRL for the transformed data (remember to check the “r” and the residuals!)

Calculator: Stat Calc 4, L1, L3, Enter. (2nd Zero, DiagnosticOn)

Dependent Variable: log10(Acres)
Independent Variable: Year-1977
log10(Acres) = 4.2513404 + 0.5557706 (Year-1977)
Sample size: 4
R (correlation coefficient) = 0.9993
R-sq = 0.9985874
Estimate of error standard deviation: 0.033050213

This model looks promising, but remember to CHECK THE RESIDUALS!!!
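As a cross-check on Step 2, this Python sketch fits the least-squares line to (years since 1977, log10 acres). The helper `least_squares` is our own name and uses the standard formulas; it is a sketch, not the calculator's internal routine.

```python
import math

years_since_1977 = [1, 2, 3, 4]
acres = [63042, 226260, 907075, 2826095]
log_acres = [math.log10(a) for a in acres]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


intercept, slope, r = least_squares(years_since_1977, log_acres)
print(f"log10(Acres) = {intercept:.7f} + {slope:.7f} (Year-1977)")
print(f"r = {r:.4f}")
```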

Check the Residual Plot
Calculator:
Stat Edit, L4, up to select, enter the LSRL equation 4.2513404 + 0.5557706 (L1), Enter. This populates the y-hat data in L4.
Stat Edit, L5, up to select, enter the residual equation L3 – L4, Enter. Remember, L3 is the new (log) transformed y, and L4 is y-hat.
Stat Plot, On, Scatter, L1, L5, Graph, Zoom9
A check of the residuals confirms that an exponential model is appropriate. (No pattern is present now.)
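The L4 and L5 steps translate directly to Python. In this sketch `y_hat` stands in for L4 and `residuals` for L5 (our names), and the coefficients are the ones from the LSRL on the transformed data. The last lines compute the usual estimate of the error standard deviation, sqrt(SSE / (n - 2)).

```python
import math

years_since_1977 = [1, 2, 3, 4]
acres = [63042, 226260, 907075, 2826095]
log_acres = [math.log10(a) for a in acres]          # L3

# L4: predicted log10(acres) from the LSRL on the transformed data
y_hat = [4.2513404 + 0.5557706 * t for t in years_since_1977]

# L5: residual = transformed y minus y-hat
residuals = [y - yh for y, yh in zip(log_acres, y_hat)]

sse = sum(res ** 2 for res in residuals)
s = math.sqrt(sse / (len(residuals) - 2))

print([round(res, 4) for res in residuals])
print(f"s = {s:.4f}")
```

All four residuals are within about 0.04 of zero on the log scale, and `s` should land close to the error standard deviation in the regression output.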

“Untransforming” to find the model for our original data
★ Remember that our goal was to find a model that we could use to predict the number of defoliated acres of land for a given year.
★ The linear model we have would predict the common logarithm of acres. In order for our model to be useful, we need to reverse the transformation to create the model that fits the original data.
★ Although many transformations are easier to “untransform” after evaluating, we can use the properties of logarithms with both exponential and power models (we’ll look at those next) to find the model for our original data.

Properties of Logarithms
Before we try to “untransform”, let’s review the properties of logarithms you learned in Algebra (yes, you really did learn these!)
$\log_b(xy) = \log_b x + \log_b y$ (Addition rule)
$\log_b(x^m) = m \log_b x$ (Power rule)
$\log_b(b^n) = n$ (Same base)
$10^{\log_{10} n} = n$
$\log_b(x/y) = \log_b x - \log_b y$ (Subtraction rule)
Since any subtraction can be rewritten as an addition, we will not use this last rule much!

Rewriting Log/Exponential Forms
Also recall rewriting from exponential to logarithmic form: $b^x = a$ is the same as $\log_b a = x$. “log base answer = exponent”

Review Exponent Rules

Homework: Notebook, page 69 and 70

Day 2: UNTRANSFORMING Linearized Data Notes: Page 73, 74

“Untransforming” exponential expressions
An exponential function takes the form $y = ab^x$, where a and b are constants. (This is the form we want to end up with.) So, let’s get started.

$\log_{10}(\text{Acres}) = 4.2513404 + 0.5557706\,(\text{Year} - 1977)$
   Linear regression of the transformed data
$10^{\log_{10}(\text{Acres})} = 10^{4.2513404 + 0.5557706\,(\text{Year} - 1977)}$
   Raise 10 to the power of each side (same base)
$\text{Acres} = (10^{4.2513404})\,(10^{0.5557706\,(\text{Year} - 1977)})$
   Same base law and multiplication law for exponents
$\text{Acres} = 17837.7634\,(3.5956)^{(\text{Year} - 1977)}$
   Simplify the constants

This is now in the form $y = ab^x$, where a = 17837.7634 and b = 3.5956. Notice that “b” is approximately the average of the ratios (next/prev) we calculated when we began looking for a model.
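The "simplify the constants" step is just two powers of 10, which this short Python sketch evaluates. It also compares b with the average of the three ratios from earlier; the variable names are ours.

```python
# Coefficients of the LSRL on the transformed data
intercept = 4.2513404
slope = 0.5557706

# Untransform: Acres = a * b ** (Year - 1977)
a = 10 ** intercept
b = 10 ** slope

# Average of the next/prev ratios computed earlier
mean_ratio = (3.5890 + 4.0090 + 3.1156) / 3

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"average ratio = {mean_ratio:.4f}")
```

The average ratio comes out near 3.57, close to (but not exactly) b, which is what "approximately" means here.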

So, does it fit our original data? Since our original goal was to find a model that would allow us to predict the number of acres of defoliated land if we knew the year, we need to check to see if our model actually fits the data. The model looks pretty good, but as with any model we need to use caution when predicting outside our original data range.
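One way to check the fit numerically is to compare the model's predictions with the observed acreages, as in the Python sketch below. The function name `predict_acres` is ours, and the final line deliberately extrapolates one year past the data to show how fast the model grows.

```python
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]


def predict_acres(year):
    """Exponential model from the untransformed LSRL: a * b ** (year - 1977)."""
    return 17837.7634 * 3.5956 ** (year - 1977)


relative_errors = []
for yr, observed in zip(years, acres):
    predicted = predict_acres(yr)
    rel = (predicted - observed) / observed
    relative_errors.append(rel)
    print(f"{yr}: observed {observed:>8}, predicted {predicted:>10.0f}, error {rel:+.1%}")

# Extrapolation: outside the original data range, so treat with caution
print(f"1982 (extrapolated): {predict_acres(1982):.0f}")
```

Within 1978 to 1981 every prediction is within 10% of the observed value. The 1982 figure is well over three times the 1981 value, a reminder of why predictions outside the data range deserve caution.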

Power Models
Another important transformation used in modeling is the power model. Power models have the form $y = ax^b$, where a and b are constants. We can find an appropriate power model by taking the logarithms of both the response and explanatory variables, finding the linear regression for the transformed data, then using the laws of logarithms and exponents to “untransform”. Let’s look at an example.

Fishing Tournament
In a fishing tournament that you are in charge of, you need to find a way to record the weight of each fish caught without destroying or killing the fish. Since it is easier to measure the length of a fish than its weight, we must find a way to convert the length to weight. The local marine research lab has been gracious enough to provide you with the data for the average length and weight at different ages for Atlantic Ocean rockfish, which model most fish species growing under normal feeding conditions.

The Data

Age (yr)    Length (cm)    Weight (g)
 1             5.2             2
 2             8.5             8
 3            11.5            21
 4            14.3            38
 5            16.8            69
 6            19.2           117
 7            21.3           148
 8            23.3           190
 9            25.0           264
10            26.7           293
11            28.2           318
12            29.6           371
13            30.8           455
14            32.0           504
15            33.0           518
16            34.0           537
17            34.9           651
18            36.4           719
19            37.1           726
20            37.7           810

Since length is one dimensional and weight is three dimensional, we should be able to find a reasonable model using a power model. (The residuals for a regression on the original data confirm that the variables are NOT linearly related, but we already knew that!) As before, we need to first transform our data, but this time we have to perform transformations on both length and weight.

Transforming the Data

Age (yr)    Length (cm)    Log10 (length)    Weight (g)    Log10 (weight)
 1             5.2            0.7160              2            0.3010
 2             8.5            0.9294              8            0.9031
 3            11.5            1.0607             21            1.3222
 4            14.3            1.1553             38            1.5798
 5            16.8            1.2253             69            1.8388
 6            19.2            1.2833            117            2.0682
 7            21.3            1.3284            148            2.1703
 8            23.3            1.3674            190            2.2788
 9            25.0            1.3979            264            2.4216
10            26.7            1.4265            293            2.4669
11            28.2            1.4502            318            2.5024
12            29.6            1.4713            371            2.5694
13            30.8            1.4886            455            2.6580
14            32.0            1.5051            504            2.7024
15            33.0            1.5185            518            2.7143
16            34.0            1.5315            537            2.7300
17            34.9            1.5428            651            2.8136
18            36.4            1.5611            719            2.8567
19            37.1            1.5694            726            2.8609
20            37.7            1.5763            810            2.9085

The scatterplot of the transformed data indicates that a linear regression on the logarithms of both variables is certainly one to consider.

Linear Regression on the transformed data
Simple linear regression results:

Dependent Variable: log10(Weight(g))
Independent Variable: log10(Length(cm))
log10(Weight(g)) = -1.8993973 + 3.049418 log10(Length(cm))
Sample size: 20
R (correlation coefficient) = 0.9993
R-sq = 0.9985228

A check of the correlation coefficient is certainly promising (r = .9993), the scatterplot of the transformed data indicates the line fits very well, and most importantly, look at those residuals!!! Yes, statisticians get very excited when they see residuals that look that good!
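For the power model, both variables get logged before the fit. The Python sketch below does that for the rockfish data (ages 1 to 20 in order) and runs the same least-squares formulas; the helper name `least_squares` is ours.

```python
import math

# Rockfish data from the slides, ages 1 through 20
length_cm = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
             28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weight_g = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
            318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

# Transform BOTH variables for a power model
log_length = [math.log10(v) for v in length_cm]
log_weight = [math.log10(v) for v in weight_g]


def least_squares(x, y):
    """Return (intercept, slope, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy / (sxx * syy) ** 0.5


intercept, slope, r = least_squares(log_length, log_weight)
print(f"log10(Weight) = {intercept:.4f} + {slope:.4f} log10(Length)")
print(f"r = {r:.4f}")
```

A slope near 3 fits the "one dimensional versus three dimensional" reasoning: weight grows roughly like length cubed.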

“Untransforming” a power model

$\log_{10}(\text{Weight}) = -1.8993973 + 3.049418 \log_{10}(\text{Length})$
   Linear equation of the transformed data
$10^{\log_{10}(\text{Weight})} = 10^{-1.8993973 + 3.049418 \log_{10}(\text{Length})}$
   Raise 10 to the power of each side
$\text{Weight} = 10^{-1.8993973}\,\left(10^{3.049418 \log_{10}(\text{Length})}\right)$
   Same base and multiplication law for exponents
$\text{Weight} = 10^{-1.8993973}\,\left(10^{\log_{10}(\text{Length}^{3.049418})}\right)$
   Power rule for logarithms
$\text{Weight} = 10^{-1.8993973}\,(\text{Length})^{3.049418}$
   Same base
$\text{Weight} = 0.01261\,(\text{Length})^{3.049418}$
   Simplify constants

(Weight is in grams and Length is in centimeters.)

Last check: plot the new model on the original data. Looks like we’ve got a model that will be very useful for estimating the weight of a fish if we know its length!
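Here is the untransformed power model as a Python sketch. The function name `predict_weight` is ours; the constants come straight from the regression on the transformed data.

```python
# Untransform: Weight = a * Length ** b
a = 10 ** -1.8993973   # simplify the constant
b = 3.049418           # the slope becomes the exponent


def predict_weight(length_cm):
    """Estimated weight in grams for a fish of the given length in centimeters."""
    return a * length_cm ** b


print(f"a = {a:.5f}")
print(f"Estimated weight of a 25.0 cm fish: {predict_weight(25.0):.0f} g")
```

The estimate for a 25.0 cm fish is in the same range as the 264 g listed in the table for that length, though not identical; the model smooths over the scatter in the individual averages.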

Are there Other Possibilities?
There are many other possible ways to transform data in order to find a model. If neither an exponential nor a power model is appropriate, you may try:
★ Square the response or explanatory variable
★ Take the square root of either variable
★ Take the reciprocal of either variable
The possibilities are endless, but for now we will concentrate mostly on either an exponential or power model.

Transforming on the TI
There are a couple of different ways to find both an exponential and a power regression model on your TI calculator:
★ Using lists to transform
★ Using the built-in regression models

Using lists to transform
We’ll use the Gypsy Moth data first.
Enter the data in lists 1 & 2. L1: years since 1977, L2: acres of defoliated land
Take the common log of the values in list 2 and put the new values in list 3. L3: log(L2)
Now do a linear regression on lists 1 & 3.
You can check residuals just like we did before to verify this regression.
Now “untransform” as we did before to get the exponential model.
Note: for a power model, create another list for the logarithm of the explanatory variable and do the linear regression on these two lists.

Using the Regression Models
The TI family of calculators has both an exponential and a power model built into the STAT CALC menu.
Create a list for the explanatory variable and one for the response variable.
From the home screen: STAT, CALC, 0:ExpReg (or A:PwrReg), L1, L2
The model does not need untransforming.
The residuals created are the residuals from the linear regression on the transformed data. (Yes, your calculator actually transforms the data, does a linear regression, then untransforms.)

How to decide which model
Creating mathematical models for real data involves a lot of trial and error. One strategy:
★ Try a linear model first (check the residuals)
★ Then try an exponential model (check the residuals)
★ Then try a power model (check the residuals)
If all residuals show a pattern, you can continue to try different transformations or choose the model with the best correlation.
Remember, no model is perfect, but some models are useful. We wish to find a useful model.
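The "best correlation" tie-breaker can be sketched in Python for the gypsy moth data. This only compares r for each candidate on its own (transformed) scale; it does not replace looking at the residual plots, which the strategy above puts first. The function and dictionary names are ours.

```python
import math

years_since_1977 = [1, 2, 3, 4]
acres = [63042, 226260, 907075, 2826095]


def correlation(x, y):
    """Correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


log_x = [math.log10(t) for t in years_since_1977]
log_y = [math.log10(a) for a in acres]

r_by_model = {
    "linear": correlation(years_since_1977, acres),        # y vs x
    "exponential": correlation(years_since_1977, log_y),   # log y vs x
    "power": correlation(log_x, log_y),                    # log y vs log x
}

for name, r in r_by_model.items():
    print(f"{name:12s} r = {r:.4f}")

best = max(r_by_model, key=lambda name: abs(r_by_model[name]))
print("Largest |r|:", best)
```

For this data set the exponential model has the largest r, which agrees with what the residual plots suggested.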

Homework: Notebook, page 71, problem #1 only Handout “Practice Before Quiz 3.3”