Transformations.  Although linear regression might produce a ‘good’ fit (high r value) to a set of data, the data set may still be non-linear. To remove.


Transformations

Although linear regression might produce a ‘good’ fit (a value of r close to 1 or −1) to a set of data, the relationship may still be non-linear. To remove as much of this non-linearity as possible, the data can be transformed: the x-values, the y-values, or both are transformed in some way so that the transformed data are more nearly linear. This allows more accurate predictions (interpolations and extrapolations) from the regression equation.
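
As a minimal sketch of this idea, the Python below computes Pearson's r for hypothetical points lying exactly on the curve y = x² (made-up data, not the sunflower counts). The linear fit looks ‘good’ (r is about 0.97) even though the points lie on a curve; transforming the x-values (x → x²) makes the data perfectly linear (r = 1).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on y = x^2 (NOT the sunflower data).
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r_raw = pearson_r(xs, ys)                           # high, yet the points lie on a curve
r_transformed = pearson_r([x ** 2 for x in xs], ys)  # after the x -> x^2 transformation

print(f"r (original data)  = {r_raw:.4f}")
print(f"r (x^2 transformed) = {r_transformed:.4f}")
```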

To decide on an appropriate transformation, examine the points on the scatterplot with high values of x and/or y (that is, away from the origin) and decide, for each axis, whether it needs to be stretched or compressed to make the points line up. The best way to see which transformation to use is to compare the scatterplot with a number of standard ‘data patterns’.
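
These patterns are often summarised as a ‘circle of transformations’ (Mosteller and Tukey's bulge rule). The slides' own pattern diagrams are not reproduced here, so treat this lookup table as an assumption about what they showed. If the curve bulges towards larger y, stretch y (y²); towards smaller y, compress y (log y, 1/y); likewise stretch x (x²) for a bulge to the right, or compress x (log x, 1/x) for a bulge to the left. The helper name `suggest_transformations` is hypothetical.

```python
# Hypothetical helper: the circle of transformations (bulge rule) as a lookup table.
# Keys describe the direction in which the curve bulges: (vertical, horizontal).
CIRCLE_OF_TRANSFORMATIONS = {
    ("up", "right"): {"x": ["x^2"], "y": ["y^2"]},
    ("up", "left"): {"x": ["log x", "1/x"], "y": ["y^2"]},
    ("down", "left"): {"x": ["log x", "1/x"], "y": ["log y", "1/y"]},
    ("down", "right"): {"x": ["x^2"], "y": ["log y", "1/y"]},
}

def suggest_transformations(vertical: str, horizontal: str) -> dict:
    """Return candidate x- and y-transformations for a bulge direction.

    vertical is "up" or "down"; horizontal is "left" or "right".
    """
    try:
        return CIRCLE_OF_TRANSFORMATIONS[(vertical, horizontal)]
    except KeyError:
        raise ValueError("use 'up'/'down' and 'left'/'right'") from None

# An increasing, concave-up curve bulges down and to the right.
print(suggest_transformations("down", "right"))  # {'x': ['x^2'], 'y': ['log y', '1/y']}
```

For an increasing, concave-up pattern this suggests either stretching x with x² or compressing y with log y or 1/y.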

 Example:  The seeds in the sunflower are arranged in spirals for a compact head. Counting the number of seeds in the successive circles starting from the centre and moving outwards, the following number of seeds were counted. Regression equation is: y = x Or Number of seeds = 18.77(circle) – 49.73

Fit a least-squares regression line and plot the data.
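
For reference, the least-squares line y = a + bx has slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. A minimal Python sketch, checked on hypothetical points lying on y = x² rather than on the sunflower counts:

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data on the curve y = x^2, x = 1..10 (NOT the sunflower data).
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
a, b = least_squares(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")  # y = -22.00 + 11.00x
```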

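Find the correlation coefficient. What does this mean? r = 0.88, indicating a strong, positive linear association. (A high r does not by itself show that the relationship is linear; the residuals must also be checked.)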

Using the regression line for the original data, predict the number of seeds in the 11th circle. (Note: the 11th circle lies beyond the circles counted, so this is an extrapolation.) y = 18.77x − 49.73. When x = 11, y = 18.77(11) − 49.73 = 156.74, which rounds to the nearest whole seed as y = 157.
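
The arithmetic of this extrapolation as a small Python check, using only the coefficients quoted in the regression equation (the function name `predict_seeds` is hypothetical):

```python
# Coefficients of the regression line for the original (untransformed) data.
SLOPE = 18.77
INTERCEPT = -49.73

def predict_seeds(circle: int) -> float:
    """Predicted number of seeds in a given circle from the linear model."""
    return SLOPE * circle + INTERCEPT

y11 = predict_seeds(11)
print(f"{y11:.2f} -> {round(y11)} seeds")  # 156.74 -> 157 seeds
```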

Find the residuals (residual = observed y − predicted y).
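
A minimal sketch of the residual calculation e = y − ŷ, applied to hypothetical points on y = x² whose least-squares line is y = −22 + 11x (not the sunflower data):

```python
def residuals(xs, ys, intercept, slope):
    """Residual e = observed y - predicted y for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data on y = x^2; its least-squares line is y = -22 + 11x.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
res = residuals(xs, ys, intercept=-22, slope=11)
print(res)  # [12, 4, -2, -6, -8, -8, -6, -2, 4, 12]
```

For a least-squares line the residuals always sum to zero, which is a quick check on the arithmetic.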

Plot the residuals on a separate graph. Are the data linear?
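
As a rough numerical companion to the residual plot (a heuristic, not a formal test), the sketch below writes each residual as + or − and counts the runs of equal signs. Residuals from a linear relationship should scatter randomly about zero; a long run of positives, then negatives, then positives (a U shape) points to curvature. The residuals used are those of hypothetical points on y = x² about their line y = −22 + 11x.

```python
def sign_pattern(res):
    """Encode each residual as '+' (above the line), '-' (below) or '0' (on it)."""
    return "".join("+" if e > 0 else "-" if e < 0 else "0" for e in res)

def count_runs(pattern):
    """Number of runs of identical signs; very few runs suggests a curved trend."""
    if not pattern:
        return 0
    return 1 + sum(1 for a, b in zip(pattern, pattern[1:]) if a != b)

# Hypothetical residuals of y = x^2 (x = 1..10) about the line y = -22 + 11x.
res = [12, 4, -2, -6, -8, -8, -6, -2, 4, 12]
p = sign_pattern(res)
print(p, count_runs(p))  # ++------++ 3
```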

What type of transformation could be applied to (i) the x-values? (ii) the y-values?

Apply the log10 y transformation to the data used in the previous question.
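
A minimal sketch of the log10 y transformation: each y-value is replaced by log10 y while the x-values are left unchanged. The logarithm is only defined for positive values, so the hypothetical helper `log10_y` rejects zero or negative counts; the sample values are made up.

```python
import math

def log10_y(ys):
    """Apply the log10 y transformation; every y-value must be positive."""
    if any(y <= 0 for y in ys):
        raise ValueError("log10 y needs all y-values > 0")
    return [math.log10(y) for y in ys]

# Hypothetical y-values (NOT the sunflower counts).
print(log10_y([1, 10, 100, 1000]))
```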

Fit a least-squares regression line to the transformed data and plot it with the data.

Find the correlation coefficient for the transformed data. Is there an improvement? Why?

Write down the least-squares regression equation for the transformed data.

Calculate the coefficient of determination. What does it tell you about how much of the variation in log10 y is explained by x?
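
The sketch below compares r before and after the log10 y transformation and computes the coefficient of determination r². It uses hypothetical exponential data y = 3 × 2^x (not the sunflower counts), for which log10 y = log10 3 + (log10 2)x is exactly linear. For real data, r² × 100% is read as the percentage of the variation in log10 y that is explained by the variation in x.

```python
import math

def fit_stats(xs, ys):
    """Return (intercept, slope, r) for the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / math.sqrt(sxx * syy)

# Hypothetical exponential data y = 3 * 2^x (NOT the sunflower counts).
xs = list(range(1, 11))
ys = [3 * 2 ** x for x in xs]
log_ys = [math.log10(y) for y in ys]

_, _, r_before = fit_stats(xs, ys)     # linear fit to the raw data
a, b, r_after = fit_stats(xs, log_ys)  # linear fit to (x, log10 y)
r_squared = r_after ** 2               # coefficient of determination

print(f"r before = {r_before:.3f}, r after = {r_after:.3f}")
print(f"log10(y) = {a:.4f} + {b:.4f}x, r^2 = {r_squared:.3f}")
```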

Using the regression line for the transformed data, predict the number of seeds in the 11th circle.

How does this compare with the prediction of 157 seeds from the regression line for the original data?
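
Because the line is fitted to log10 y, its prediction comes out in log units and must be back-transformed with y = 10^(a + bx) before it is compared with the earlier estimate. The fitted coefficients for the transformed sunflower data are not given here, so the sketch below uses hypothetical values a = log10 3 and b = log10 2 (the function name is also hypothetical).

```python
import math

def predict_from_log10_model(intercept, slope, x):
    """Back-transform a prediction from the model log10(y) = intercept + slope * x."""
    return 10 ** (intercept + slope * x)

# Hypothetical coefficients (log10 3 and log10 2), NOT the fitted sunflower values.
y11 = predict_from_log10_model(math.log10(3), math.log10(2), 11)
print(round(y11))  # 6144
```

Rounding to a whole number only happens after back-transforming, since rounding log10 y first would introduce a large error in y.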