Chapter 12-2 Transforming Relationships Day 2




1 Chapter 12-2 Transforming Relationships Day 2
AP Statistics

2 The Ladder of Power
Let’s examine the ln rung of the ladder a little more closely:
POWER            Comment
(x, ln(y))       Very useful when the values of y increase by a percentage.
(ln(x), y)       Useful when x has a wide range, or when the scatterplot descends rapidly to the left and levels off to the right.
(ln(x), ln(y))   Often useful when one of the ladder powers is too big and the other is too small, or when the scatterplot is thickening.

3 Non-Linear Regression
Let’s examine the relationship between shutter speed and f/stops of a particular camera lens:
Shutter speed    f/stop
1/1000           2.8
1/500            4
1/250            5.6
1/125            8
1/60             11
1/30             16
1/15             22
1/8              32

4 Non-Linear Regression
Store the ln(f/stop) and ln(shutter speed) into L5 and L8, respectively:
POWER             Variables
(x, y)            (speed, f/stop)
(x, ln(y))        (speed, ln(f/stop))
(ln(x), y)        (ln(speed), f/stop)
(ln(x), ln(y))    (ln(speed), ln(f/stop))

5 Non-Linear Regression
Let’s examine the relationship between shutter speed and f/stops of a particular camera lens:
(x, y)            No! Curved
(x, ln(y))        No! Curved
(ln(x), y)        No! Curved
(ln(x), ln(y))    YES! Linear

6 Non-Linear Regression
Let’s examine the relationship between shutter speed and f/stops of a particular camera lens: Although the data look linear, they may still be curved. We need to check whether the data are actually linear or just appear to be. Let’s make a residual plot for this data.

7 Non-Linear Regression
Let’s examine the relationship between shutter speed and f/stops of a particular camera lens: The points appear to have a random spread about the modified LSRL, so this seems to be a good model for the data, although the spread may be increasing. Be careful when determining the actual LSRL.

8 Non-Linear Regression
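Let’s examine the relationship between shutter speed and f/stops of a particular camera lens: The following is the modified equation, with a and b the fitted intercept and slope: ln(ŷ) = a + b·ln(x). However, in the calculator we have the following: y = a + bx, where the calculator’s x and y are the stored lists ln(speed) and ln(f/stop). Take the first equation and solve for y-hat; this is the true modified equation of the LSRL: ŷ = e^(a + b·ln(x)) or ŷ = e^a · x^b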

9 Non-Linear Regression
Let’s examine the relationship between shutter speed and f/stops of a particular camera lens: Let’s make sure that our new equation fits the original data: Graph it: It looks pretty good. See if you can determine the f/stop for a shutter speed of ¼ . Y1(1/4) =

10 Non-Linear Regression
Let’s attempt to solve the following: What is the equation of the curve of “best-fit”? Predict the salary for a superstar for 2005.
Player              Year    Salary (Millions)
Nolan Ryan          1980    1
George Foster       1982    2.04
Kirby Puckett       1990    3
Jose Canseco                4.7
Roger Clemens       1991    5.3
Ken Griffey, Jr.    1996    8.5
Albert Belle        1997    11
Pedro Martinez      1998    12.5
Mike Piazza         1999
Mo Vaughn                   13.3
Kevin Brown                 15
Carlos Delgado      2001    17
Alex Rodriguez              25.2

11 Non-Linear Regression
Let’s attempt to solve the following: What is the equation of the curve of “best-fit”? Predict the salary for a superstar for 2005. The scatterplot of this data is clearly curved, so a transformation is needed to answer this question.

12 Non-Linear Regression
Let’s check the ln transformations. Store the ln(x) and ln(y) into new lists:
POWER             Variables
(x, y)            (year, salary)             We already know that this is curved
(x, ln(y))        (year, ln(salary))
(ln(x), y)        (ln(year), salary)
(ln(x), ln(y))    (ln(year), ln(salary))

13 Non-Linear Regression
Check all three models… Each panel shows a scatter plot and a residual plot:
Original (x, y)
Exponential (x, ln(y))     Best One!!
Logarithmic (ln(x), y)
Power (ln(x), ln(y))

14 Non-Linear Regression
After checking the scatter plots and the residual plots, we see that (x, ln(y)) is the best transformation. The scatter plots and residual plots for Logarithmic (ln(x), y) and Power (ln(x), ln(y)) are NO GOOD!!! However, we should check the ladder of powers to make sure that there is not a better transformation.

15 Non-Linear Regression
The ladder of powers. Let’s try to transform the data according to the ladder: Power: ½ ln ½

16 Non-Linear Regression
To get a better picture of the somewhat linear models, let’s look at the residual plots. Power: ½ ln ½
Plot captions: This is definitely curved. This residual plot still looks curved. We’ve already done this one. This looks pretty good too, although it may be slightly curved. This is definitely curved. This is the best one!!!

17 Non-Linear Regression
Let’s examine the relationship between years and baseball superstar salaries: best transformation (x, ln(y)). The following is the modified equation, with a and b the fitted intercept and slope: ln(ŷ) = a + bx. However, in the calculator we have the following: y = a + bx, where the calculator’s y is the stored list ln(salary). Change the equation to account for the transformation. Solve for y: ŷ = e^(a + bx). This is the equation of the function of “best-fit.”

18 Non-Linear Regression
Now let’s predict the salary of a superstar in 2005: Plug in 2005: The average baseball superstar in 2005 will make about million If we look at the graph, it is evident that the curve fits the data However, we should be wary of extrapolation

19 Common Errors
Do we have to re-scale our numbers? We need to be very careful what numbers we use when we try to transform our data. Data values that are far from 1 are often not affected much by our transformations unless the range is very large. It is often useful to try to get numbers between 1 and 100, since our re-expression will have a greater effect on 1 to 100 than on 100,001 to 100,100.
Why can’t I find the “perfect” model? Don’t expect to ever find the perfect model; more likely than not, it doesn’t exist. Just remember that “real-world” data is messy and it is difficult to find a model that will fit the data perfectly.
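A quick calculation shows why the scale matters. The sketch below is only an illustration using the two ranges named above; subtracting 100,000 is one possible way to re-scale, chosen here as an assumption.

```python
import math

# How far apart ln puts the two ends of each range.
spread_small = math.log(100) - math.log(1)            # values from 1 to 100
spread_large = math.log(100_100) - math.log(100_001)  # values from 100,001 to 100,100

print(f"ln spread over 1..100:             {spread_small:.4f}")
print(f"ln spread over 100,001..100,100:   {spread_large:.6f}")

# Re-scaling (here: subtracting 100,000) brings the large values back to
# 1..100, where the log re-expression bends the data noticeably again.
rescaled_spread = math.log(100_100 - 100_000) - math.log(100_001 - 100_000)
print(f"ln spread after subtracting 100,000: {rescaled_spread:.4f}")
```

On the large range ln is almost a straight line, so taking logs there changes the shape of the scatterplot very little.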

20 Common Errors
Is it me or does that graph look weird? Re-expression can straighten many relationships, but not those that go up and down and up again (or something to that effect). You should refuse to analyze such data with methods that require a linear form.
Why is my re-expression missing data? It is impossible to re-express negative values for several rungs on our ladder of powers. Such values are omitted by the calculator and the effect on your re-expression can be significant. Try not to lose good data values while transforming your data. Sometimes adding small values such as 1/2 or 1/6 is useful.
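The effect of dropped values can be sketched with a few made-up numbers. The data and the helper `ln_kept` are hypothetical, and the shift of 1/2 only works here because the most negative value is larger than -1/2. Any such shift also changes the model being fitted, so it is a judgment call rather than a free fix.

```python
import math

# Made-up data with one negative value and one zero.
values = [-0.2, 0.0, 0.5, 3.0, 12.0]

def ln_kept(data):
    """Return ln of each value that has a real logarithm, dropping the rest."""
    return [math.log(v) for v in data if v > 0]

kept = ln_kept(values)
shifted = ln_kept([v + 0.5 for v in values])  # add 1/2 before taking logs

print(len(values) - len(kept), "values lost without a shift")
print(len(values) - len(shifted), "values lost after adding 1/2")
```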

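21 Transformations
Type of Model    Transformation     Re-expressed Model       New Model
Exponential      (x, ln(y))         ln(ŷ) = a + bx           ŷ = e^(a + bx)
Logarithmic      (ln(x), y)         ŷ = a + b·ln(x)          ŷ = a + b·ln(x)
Power            (ln(x), ln(y))     ln(ŷ) = a + b·ln(x)      ŷ = e^a · x^b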

22 Can’t We Just Use the Curve?
Although your calculator will do other types of regression (quadratic, exponential, etc.), using the curve has drawbacks. First, lines are easy to understand. Using the curve throws out all of our understanding of linear regression. We understand how to interpret the slope and the y-intercept, and linear models are more useful in advanced statistical practices. In order to use the curve, we would have to come up with a whole new system of understanding. It’s best to use the linear model.

23 Outliers
In regression analysis, a data point that diverges greatly from the overall pattern of data is called an outlier. There are basically four ways that a point can be considered an outlier:
It could have an extreme X value compared to other data points.
It could have an extreme Y value compared to other data points.
It could have extreme X and Y values compared to other data points.
It might be distant from the rest of the data, even without extreme X or Y values.

24 Examples
Four scatterplots: Extreme X; Extreme Y; Extreme X and Y; Distant Data Point

25 Influential Points
An influential point is an outlier that greatly affects the slope of the regression line. One way to test the influence of an outlier is to compute the regression equation with and without the outlier. To put it simply, influential points are data points that have disproportionate effects on the slope of the regression equation.

26 Influential Points
This type of analysis is illustrated below. The slope is larger when the outlier is present, so this outlier would be considered an influential point. Sometimes, an influential point will cause the coefficient of determination to be bigger; sometimes, smaller. In this example, the coefficient of determination is smaller when the outlier is present.
Without Outlier    Regression equation: ŷ = x    Coefficient of determination: R² = 0.94
With Outlier       Regression equation: ŷ = x    Coefficient of determination: R² = 0.55

27 Example
Which statement about influential points is true?
I. Removal of an influential point changes the regression line.
II. Data points that are outliers in the horizontal direction are more likely to be influential than points that are outliers in the vertical direction.
III. Influential points have large residuals.
I and II are true statements.
A linear transformation neither increases nor decreases the linear relationship between variables; it preserves the relationship. A nonlinear transformation is used to increase the relationship between variables. The most effective transformation method depends on the data being transformed.

