Chapter 12-2 Transforming Relationships Day 2


Chapter 12-2 Transforming Relationships Day 2 AP Statistics

The Ladder of Powers Let's examine the ln rung of the ladder a little more closely:
(x, ln(y)): very useful if the values of y increase by a percentage.
(ln(x), y): useful when x has a wide range, or when the scatterplot descends rapidly on the left and levels off on the right.
(ln(x), ln(y)): when one of the ladder powers is too big and the other one is too small, this is often a useful transformation. It can also be very useful if the scatterplot is thickening.
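The following Python sketch shows why the (x, ln(y)) rung suits values that grow by a percentage. It uses hypothetical data that is not from the slides: y starts at 100 and grows 5% per step. Each step multiplies y by 1.05, so each step adds the constant ln(1.05) to ln(y). As a result, the points (x, ln(y)) fall on a straight line.

```python
import math

# Hypothetical data (not from the slides): y grows by 5% for each 1-unit step in x.
xs = list(range(10))
ys = [100 * 1.05 ** x for x in xs]

# Re-express y with the ln rung of the ladder: the (x, ln(y)) plot.
ln_ys = [math.log(y) for y in ys]

# Consecutive differences in ln(y) are all ln(1.05), i.e. a constant slope,
# so the re-expressed points lie on a straight line.
diffs = [later - earlier for earlier, later in zip(ln_ys, ln_ys[1:])]
print([round(d, 5) for d in diffs])
print(round(math.log(1.05), 5))
```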

Non-Linear Regression Let's examine the relationship between shutter speed and f/stops of a particular camera lens:

Shutter speed (s)   f/stop
1/1000              2.8
1/500               4
1/250               5.6
1/125               8
1/60                11
1/30                16
1/15                22
1/8                 32

Non-Linear Regression Store the ln(F-stop) and ln(shutter speed) into L5 and L8, respectively, and plot each of the four combinations:
(x, y) = (speed, f/stop)
(x, ln(y)) = (speed, ln(f/stop))
(ln(x), y) = (ln(speed), f/stop)
(ln(x), ln(y)) = (ln(speed), ln(f/stop))

Non-Linear Regression The four plots of the shutter speed and f/stop data:
(speed, f/stop): No! Curved
(speed, ln(f/stop)): No! Curved
(ln(speed), f/stop): No! Curved
(ln(speed), ln(f/stop)): YES! Linear
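The slide judges the four plots by eye. The Python sketch below adds a rough numeric companion: the correlation r for each re-expression of the shutter speed data, copied from the slide's table. The helper name `pearson_r` is hypothetical. A high r alone does not prove a relationship is linear; a residual plot is still needed for that.

```python
import math

# Shutter speed (seconds) and f/stop from the slide's table.
speed = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
fstop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

ln_speed = [math.log(s) for s in speed]
ln_fstop = [math.log(f) for f in fstop]

# The four (x, y) combinations plotted on the slide.
candidates = {
    "(speed, f/stop)": (speed, fstop),
    "(speed, ln(f/stop))": (speed, ln_fstop),
    "(ln(speed), f/stop)": (ln_speed, fstop),
    "(ln(speed), ln(f/stop))": (ln_speed, ln_fstop),
}
r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in candidates.items()}
for name, r in r_values.items():
    print(f"{name:26s} r = {r:.4f}")
```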

Non-Linear Regression Although the (ln(speed), ln(f/stop)) data look linear, the relationship could still be curved. We need to check whether the data are actually linear or only appear to be, so let's make a residual plot.

Non-Linear Regression The residuals appear to be randomly scattered about the MODIFIED LSRL. So this seems to be a good model for the data, although the spread may be increasing. Be careful when writing the equation of the actual LSRL.
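Here is a Python sketch of the numbers behind the residual plot. It fits the LSRL to (ln(speed), ln(f/stop)) and computes each residual. Plotting is left out: the sketch just prints the (ln(speed), residual) pairs that a residual plot would display. The helper `lsrl` is a hypothetical name, not a calculator or library function.

```python
import math

# Shutter speed (seconds) and f/stop from the slide's table.
speed = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
fstop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

# The linear-looking re-expression: (ln(speed), ln(f/stop)).
x = [math.log(s) for s in speed]
y = [math.log(f) for f in fstop]

def lsrl(xs, ys):
    """Return (a, b) for the least-squares regression line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

a, b = lsrl(x, y)

# Residual = observed ln(f/stop) minus predicted ln(f/stop).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# These (ln(speed), residual) pairs are what a residual plot displays.
for xi, e in zip(x, residuals):
    print(f"ln(speed) = {xi:7.3f}   residual = {e:+.4f}")
```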

Non-Linear Regression The modified equation is ln(ŷ) = a + b·ln(x), where x is the shutter speed and ŷ is the predicted f/stop. However, the calculator displays it as y = a + bx, where y actually stands for ln(f/stop) and x for ln(speed). Take the first equation and solve for ŷ; this is the true modified equation of the LSRL: ŷ = e^(a + b·ln(x)), or equivalently ŷ = e^a · x^b.

Non-Linear Regression Let's make sure that our new equation fits the original data. Graph it: it looks pretty good. Now use it to find the f/stop for a shutter speed of 1/4: Y1(1/4) = 43.612.
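The same steps can be sketched in Python. The sketch fits ln(f/stop) against ln(speed) by least squares, then back-transforms the fit to ŷ = e^a · x^b. Finally, it evaluates the curve at a shutter speed of 1/4. The function name `predict_fstop` is hypothetical; it plays the role of the calculator's Y1.

```python
import math

# Shutter speed (seconds) and f/stop from the slide's table.
speed = [1/1000, 1/500, 1/250, 1/125, 1/60, 1/30, 1/15, 1/8]
fstop = [2.8, 4, 5.6, 8, 11, 16, 22, 32]

# Least-squares fit of ln(f/stop) on ln(speed): ln(y-hat) = a + b*ln(x).
x = [math.log(s) for s in speed]
y = [math.log(f) for f in fstop]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

def predict_fstop(shutter_speed):
    """Back-transformed model: f/stop-hat = e^a * speed^b."""
    return math.exp(a) * shutter_speed ** b

print(f"f/stop-hat = {math.exp(a):.3f} * speed^{b:.4f}")
print(f"prediction at 1/4 s: {predict_fstop(1/4):.3f}")
```

The printed prediction should land close to the slide's Y1(1/4) = 43.612. Note that 1/4 is slower than every shutter speed in the table (1/1000 to 1/8), so this prediction is an extrapolation.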

Non-Linear Regression Let's attempt to solve the following: What is the equation of the curve of “best-fit”? Predict the salary of a superstar for 2005.

Player              Year   Salary ($ millions)
Nolan Ryan          1980   1
George Foster       1982   2.04
Kirby Puckett       1990   3
Jose Canseco               4.7
Roger Clemens       1991   5.3
Ken Griffey, Jr.    1996   8.5
Albert Belle        1997   11
Pedro Martinez      1998   12.5
Mike Piazza         1999
Mo Vaughn                  13.3
Kevin Brown                15
Carlos Delgado      2001   17
Alex Rodriguez             25.2

Non-Linear Regression The scatterplot of this data is definitely curved, so a transformation is needed to answer these questions.

Non-Linear Regression Let's check the ln transformations. Store ln(x) and ln(y) into new lists and plot:
(x, y) = (year, salary): we already know that this is curved
(x, ln(y)) = (year, ln(salary))
(ln(x), y) = (ln(year), salary)
(ln(x), ln(y)) = (ln(year), ln(salary))

Non-Linear Regression Check all three models against the original, using a scatterplot and a residual plot for each:
Original (x, y)
Exponential (x, ln(y)): best one!!
Logarithmic (ln(x), y)
Power (ln(x), ln(y))

Non-Linear Regression After checking the scatterplots and the residual plots, we see that (x, ln(y)) is the best transformation. The logarithmic (ln(x), y) and power (ln(x), ln(y)) models are no good. However, we should check the ladder of powers to make sure that there is not a better transformation.

Non-Linear Regression The ladder of powers: let's transform the data using each rung of the ladder, with powers 2, ½, ln, -½, and -1.
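Here is a minimal Python sketch of "one rung of the ladder". The slide lists only the powers, so the sketch assumes the rungs are applied to y (salary) and uses the first salaries from the table. The function name `ladder` is hypothetical. Power 0 stands for the ln rung. Negative rungs are plain powers here (1/√y and 1/y); some texts negate them to keep the original order of the values. The sketch accepts positive values only.

```python
import math

def ladder(values, power):
    """Re-express positive values with one rung of the ladder of powers.

    power == 0 stands for the ln rung; any other rung raises each value
    to that power (so -1 gives 1/v and -0.5 gives 1/sqrt(v)).
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch expects positive values only")
    if power == 0:
        return [math.log(v) for v in values]
    return [v ** power for v in values]

# Rungs from the slide: 2, 1/2, ln, -1/2, -1.
rungs = [2, 0.5, 0, -0.5, -1]

# First few salaries ($ millions) from the salary table, used as y.
salaries = [1, 2.04, 3, 4.7, 5.3, 8.5, 11]
for p in rungs:
    label = "ln" if p == 0 else str(p)
    print(f"{label:>4}: {[round(v, 3) for v in ladder(salaries, p)]}")
```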

Non-Linear Regression To get a better picture of the somewhat linear models, let's look at the residual plots for each power:
2: this is definitely curved.
½: this residual plot still looks curved.
ln: we've already done this one, and it is the best one!!!
-½: this looks pretty good too, although it may be slightly curved.
-1: this is definitely curved.

Non-Linear Regression For years and baseball superstar salaries, the best transformation is (x, ln(y)). The modified equation is ln(ŷ) = a + b·x, where x is the year and ŷ is the predicted salary. However, the calculator displays it as y = a + bx, where y actually stands for ln(salary). Change the equation to account for the transformation by solving for ŷ: ŷ = e^(a + b·x) = e^a · (e^b)^x. This is the equation of the function of “best-fit.”
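The exponential fit can be sketched in Python: regress ln(y) on x, then solve for ŷ. The salary table above has empty cells, so this sketch does not refit the salary data. Instead, it checks itself on hypothetical data generated exactly from y = 2e^(0.1x). The function names `fit_exponential` and `predict` are hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit ln(y-hat) = a + b*x by least squares on (x, ln(y)).

    Returns (a, b); the back-transformed curve is y-hat = e^(a + b*x) = e^a * (e^b)^x.
    """
    ln_ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ln_ys) / n
    b = (sum((x - mean_x) * (ly - mean_y) for x, ly in zip(xs, ln_ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def predict(a, b, x):
    """Evaluate the back-transformed exponential model at x."""
    return math.exp(a + b * x)

# Hypothetical check data: points generated exactly from y = 2 * e^(0.1 x).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * math.exp(0.1 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(round(math.exp(a), 6), round(b, 6))
```

`predict` computes e^(a + b·x) in a single step. This is algebraically the same as e^a · (e^b)^x.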

Non-Linear Regression Now let's predict the salary of a superstar in 2005 by plugging in x = 2005. The model predicts that a baseball superstar in 2005 would make about $29.312 million. The graph shows that the curve fits the data well. However, we should be wary of extrapolation: 2005 lies beyond the years in the data.

Common Errors Do we have to re-scale our numbers? We need to be very careful about which numbers we use when we transform our data. Data values that are far from 1 are often not affected much by our transformations unless their range is very large. It is often useful to rescale the numbers to lie between 1 and 100, since a re-expression has a much greater effect on values from 1 to 100 than on values from 100,001 to 100,100.
Why can't I find the “perfect” model? Don't expect to ever find the perfect model; more likely than not, it doesn't exist. “Real-world” data are messy, and it is difficult to find a model that fits them perfectly.
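To check the rescaling advice numerically, this Python sketch measures how close ln(x) is to a straight-line function of x over two ranges of equal spread: 1 to 100 and 100,001 to 100,100. If r(x, ln(x)) is essentially 1, taking logs barely bends the data on that range, so the re-expression accomplishes almost nothing. The helper `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

small = list(range(1, 101))            # values 1 to 100
large = list(range(100_001, 100_101))  # values 100,001 to 100,100 (same spread)

# How linear is ln(x) as a function of x on each range?
r_small = pearson_r(small, [math.log(v) for v in small])
r_large = pearson_r(large, [math.log(v) for v in large])
print(f"r(x, ln x) for 1..100:            {r_small:.6f}")
print(f"r(x, ln x) for 100,001..100,100:  {r_large:.6f}")
```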

Common Errors Is it me, or does that graph look weird? Re-expression can straighten many relationships, but not those that go up and down and up again (or otherwise change direction). You should refuse to analyze such data with methods that require a linear form.
Why is my re-expression missing data? Several rungs on our ladder of powers (such as ½ and ln) cannot re-express negative values, and ln and the negative powers cannot handle zero. The calculator omits such values, and the effect on your re-expression can be significant. Try not to lose good data values while transforming your data; sometimes adding a small constant, such as 1/2 or 1/6, to every value is useful.
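In Python, `math.log` raises `ValueError` for zero or negative input instead of silently dropping the value. The sketch below shows the "add a small constant" fix on hypothetical counts that include a zero. The function name `log_reexpress` and its `shift` parameter are hypothetical.

```python
import math

def log_reexpress(values, shift=0.0):
    """Return ln(v + shift) for each value; raise ValueError if any v + shift <= 0."""
    shifted = [v + shift for v in values]
    bad = [v for v in shifted if v <= 0]
    if bad:
        raise ValueError(f"{len(bad)} value(s) not positive after shift; ln is undefined")
    return [math.log(v) for v in shifted]

counts = [0, 1, 3, 7, 12]  # hypothetical counts that include a zero

try:
    log_reexpress(counts)
except ValueError as err:
    print("without shift:", err)

# Adding 1/2 to every value keeps the zero instead of losing it.
print("with shift 1/2:", [round(v, 3) for v in log_reexpress(counts, shift=0.5)])
```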

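Transformations
Type of model   Transformation    Re-expressed model      New model
Exponential     (x, ln(y))        ln(ŷ) = a + b·x         ŷ = e^a · (e^b)^x
Logarithmic     (ln(x), y)        ŷ = a + b·ln(x)         ŷ = a + b·ln(x)
Power           (ln(x), ln(y))    ln(ŷ) = a + b·ln(x)     ŷ = e^a · x^b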

Can't We Just Use the Curve? Although your calculator will do other types of regression (quadratic, exponential, etc.), using the curve has drawbacks. First, lines are easy to understand: we know how to interpret the slope and the y-intercept, and linear models are more useful in advanced statistical practice. Using the curve throws out all of our understanding of linear regression; we would have to come up with a whole new system of understanding. It's best to use the linear model.

Outliers In regression analysis, a data point that diverges greatly from the overall pattern of data is called an outlier. There are basically four ways that a point can be considered an outlier:
1. It could have an extreme X value compared to other data points.
2. It could have an extreme Y value compared to other data points.
3. It could have extreme X and Y values compared to other data points.
4. It might be distant from the rest of the data, even without extreme X or Y values.

Examples (figure: one scatterplot for each case: Extreme X; Extreme Y; Extreme X and Y; Distant Data Point)

Influential Points An influential point is an outlier that greatly affects the slope of the regression line. One way to test the influence of an outlier is to compute the regression equation with and without the outlier. To put it simply, influential points are data points that have disproportionate effects on the slope of the regression equation.
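The with-and-without test can be sketched in Python on hypothetical data: five points lying exactly on y = 2x, plus one point that is extreme in the x direction. The helper `lsrl` is a hypothetical name.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares regression line y-hat = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data lying exactly on y = 2x ...
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
# ... plus one hypothetical point that is extreme in the x direction.
out_x, out_y = 15, 5

a0, b0 = lsrl(xs, ys)                      # without the outlier
a1, b1 = lsrl(xs + [out_x], ys + [out_y])  # with the outlier

print(f"without outlier: y-hat = {a0:.2f} + {b0:.2f}x")
print(f"with outlier:    y-hat = {a1:.2f} + {b1:.2f}x")

# Residuals under the line fitted WITH the outlier.
resid_outlier = out_y - (a1 + b1 * out_x)
resid_5 = 10 - (a1 + b1 * 5)
print(f"residual at (15, 5): {resid_outlier:.2f}; residual at (5, 10): {resid_5:.2f}")
```

A single point drops the slope from 2.00 to about 0.08, so it is influential. The line is pulled toward that point, which leaves the point's own residual (about -1.60) smaller in size than the residual at (5, 10) (about 4.17). For this reason, "influential points have large residuals" is not a safe rule.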

Influential Points This type of analysis is illustrated below. The slope changes from -4.10 without the outlier to -3.32 with it (the line is flatter when the outlier is present), so this outlier would be considered an influential point. Sometimes an influential point will make the coefficient of determination bigger; sometimes, smaller. In this example, the coefficient of determination is smaller when the outlier is present.
Without outlier: regression equation ŷ = 104.78 - 4.10x; coefficient of determination R² = 0.94
With outlier: regression equation ŷ = 97.51 - 3.32x; coefficient of determination R² = 0.55

Example Which of the following statements about influential points are true?
I. Removal of an influential point changes the regression line.
II. Data points that are outliers in the horizontal direction are more likely to be influential than points that are outliers in the vertical direction.
III. Influential points have large residuals.
Answer: I and II are true statements.

A linear transformation neither increases nor decreases the linear relationship between variables; it preserves the relationship. A nonlinear transformation is used to increase the linear relationship between variables. The most effective transformation method depends on the data being transformed.
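The claim that a linear transformation preserves the linear relationship can be checked in Python on hypothetical paired data. Applying x → 32 + 1.8x (a positive slope) leaves r unchanged, while the nonlinear ln(x) changes it. A negative slope would flip the sign of r but keep its size. The helper `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3]

x_linear = [32 + 1.8 * v for v in x]   # linear transformation (positive slope)
x_log = [math.log(v) for v in x]       # nonlinear transformation

r_original = pearson_r(x, y)
r_linear = pearson_r(x_linear, y)
r_log = pearson_r(x_log, y)
print(f"original: {r_original:.6f}  linear: {r_linear:.6f}  ln: {r_log:.6f}")
```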