YOU NEED TO KNOW WHAT THIS MEANS


REGRESSION ANALYSIS
In the OUTCOME that you will commence in the second week back, you might be given data and asked to perform a REGRESSION ANALYSIS. You need to know what this means.

REGRESSION ANALYSIS is the process of fitting a linear model to a data set. The aim is to determine the best linear model possible and to use it to make predictions.
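For readers who want to see the idea outside the calculator, here is a minimal Python sketch (an illustration only, with made-up x and y values, not the GDP data used later) of fitting a least-squares line and using it to make a prediction:

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares line y = a + b*x (np.polyfit returns [slope, intercept])
b, a = np.polyfit(x, y, 1)
print(f"y = {a:.2f} + {b:.2f}x")

# Use the fitted model to make a prediction at x = 6
print(f"predicted y at x = 6: {a + b * 6:.2f}")
```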

What do we mean by the “best possible linear model”? The best possible linear model is the one in which:
a. the data is linear, or has been linearized by a data transformation, and

b. the linear model has the greatest possible value of r².
REMEMBER: the value of the coefficient of determination (r²) measures the predictive power of our regression model.

If r² > 30%, then our model will have predictive power.

STEP 1: Construct a scatterplot of the RAW (original) data and note: its shape, and the value of the coefficient of determination.
FIRST we must decide which is the INDEPENDENT (x) variable and which is the DEPENDENT (y) variable. We are predicting LIFE EXPECTANCY from GDP, so:
x = GDP and y = LIFE EXPECTANCY.

List A = gdp, List B = le (life expectancy). The data (GDP, life expectancy):

GDP      Life expectancy
950      58
1670     65
4250     68
11520    74
12280    73
4170
14300    75
5540     71
9830     72
1680     61
320      67
22260    66
550      50
930
940      64
2670
11220
1420     48
150      41
330      44
520      49
350
180

(Life expectancy values for the blank rows are not legible in this transcript.)

CONCLUSION: Data is NON-LINEAR.

From the Home screen, determine the value of r². Value of r² = 0.3665.
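For comparison outside the calculator, a minimal Python sketch of STEP 1 using only the (GDP, life expectancy) pairs legible in the table above; since a few rows are missing from this copy, the r² it prints may differ slightly from the slide's 0.3665:

```python
import numpy as np
import matplotlib.pyplot as plt

# The (GDP, life expectancy) pairs that are legible in the table above
gdp = np.array([950, 1670, 4250, 11520, 12280, 14300, 5540, 9830, 1680,
                320, 22260, 550, 940, 1420, 150, 330, 520], dtype=float)
le = np.array([58, 65, 68, 74, 73, 75, 71, 72, 61,
               67, 66, 50, 64, 48, 41, 44, 49], dtype=float)

# STEP 1: scatterplot of the raw data
plt.scatter(gdp, le)
plt.xlabel("GDP")
plt.ylabel("Life expectancy")
plt.title("Raw data")
plt.show()

# Coefficient of determination for the raw data
r = np.corrcoef(gdp, le)[0, 1]
print(f"r^2 (raw data) = {r**2:.4f}")
```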

STEP 2: We seek a transformation to linearize the data. CHECK THE CIRCLE OF TRANSFORMATIONS!! Our scatterplot most closely resembles Quadrant 2, so the POTENTIALLY SUITABLE TRANSFORMATIONS are: y², log x, and 1/x.

STEP 3: Try each of these transformations to determine which one effectively linearizes the data and gives the highest value of r². In each case, obtain a RESIDUAL PLOT to confirm that the transformed data is linear.
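A minimal Python sketch of STEP 3, assuming the gdp and le arrays from the previous sketch are defined; it applies each candidate transformation and prints r², though the values will differ slightly from the slide figures (38.3%, 66.0%, 51.5%) because only the legible rows are used:

```python
import numpy as np

# Assumes gdp and le from the earlier sketch are already defined.
# Candidate transformations from Quadrant 2 of the circle of transformations.
candidates = {
    "y^2   (transform y)": (gdp, le ** 2),
    "log x (transform x)": (np.log10(gdp), le),
    "1/x   (transform x)": (1.0 / gdp, le),
}

for name, (x_t, y_t) in candidates.items():
    r = np.corrcoef(x_t, y_t)[0, 1]
    print(f"{name}: r^2 = {r**2:.3f}")
```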

Y SQUARED TRANSFORMATION
List A = gdp (x variable), List B = le (y variable), List C = lesqu (the y² transformed variable).
r² = 38.3%
CONCLUSION: The transformed data still appears NON-LINEAR.

Establish the value of r² on the Home screen.

CONFIRM WITH A RESIDUAL PLOT
Remember: to get the correct residual plot, use the split-screen view. Make sure that the scatterplot at the top has the correct transformed variable.
CONCLUSION: The residual plot shows a definite curved pattern, indicating that the transformed data is still not linear. The y² transformation has NOT succeeded in producing an effective linear model.
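For readers working outside the calculator, a minimal sketch of a residual plot, again assuming the gdp and le arrays from the earlier sketch; a curved pattern means the transformation has not linearized the data, while random scatter about zero means it has:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumes gdp and le from the earlier sketch are already defined.
def residual_plot(x_t, y_t, xlabel):
    """Fit a least-squares line and plot residuals against the explanatory variable."""
    b, a = np.polyfit(x_t, y_t, 1)
    residuals = y_t - (a + b * x_t)      # observed y minus predicted y
    plt.scatter(x_t, residuals)
    plt.axhline(0.0, linestyle="--")
    plt.xlabel(xlabel)
    plt.ylabel("Residual")
    plt.show()

# y^2 model: residuals show a curved pattern (transformation ineffective)
residual_plot(gdp, le ** 2, "GDP")

# log x model: residuals scatter randomly about zero (transformation effective)
residual_plot(np.log10(gdp), le, "log10(GDP)")
```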

NEXT STEP… You guessed it!! Now we try the next candidate transformation: the log x transformation!

(Delete the y² column, as we have discarded that transformation.)
List A = gdp, List B = le, List C = loggdp (the log(GDP) transformed x variable).
r² = 66.0%
CONCLUSION: It appears that the log(GDP) transformation has successfully linearized the data! The scatterplot appears linear, and r² has increased.


Now confirm this by creating a RESIDUAL PLOT for the log(x) transformation. Open a new graphing screen!!
CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the transformed data is linear. The value of r² has now increased to 66.0%. The log x transformation has succeeded in producing an effective linear model for the data, with significant predictive power.

And now… yes, you guessed it! We need to check the reciprocal (1/x) transformation, because maybe it will give a higher coefficient of determination than log x! (Here we go again.)

Don’t delete the log x column, because we think that model was effective!
List A = gdp, List B = le, List C = loggdp, List D = recgdp (1/GDP, the transformed x variable).
r² = 51.5%
CONCLUSION: The transformed data appears to be linear, but the value of the coefficient of determination is 51.5%, lower than for the loggdp transformation.


Remember to create a new graphing screen for the new transformation!! CONCLUSION: The residual plot shows a random scattering of points with no pattern, indicating that the 1/x transformation has made the data linear.

OVERALL CONCLUSIONS
We have tested three transformations:
y² transformation: ineffective (did not linearize the data)
log(x) transformation: effective in linearizing the data, with r² = 66.0%
1/x transformation: effective in linearizing the data, with r² = 51.5%
Based on this regression analysis, we conclude that the log(GDP) transformation provides the best model for making predictions from this data.

MAKING A PREDICTION
Use your linear regression model to predict the life expectancy in a country where the GDP is $8000.
Find the equation of the LEAST SQUARES REGRESSION line for the log transformation: Regression (a + bx), with Xlist = log(GDP) and Ylist = le, giving a = 14.3 and b = 14.5.
Life Expectancy = 14.3 + 14.5 × log(GDP)
Life Expectancy = 14.3 + 14.5 × log(8000) = 70.9
Predicted life expectancy = 70.9 years.
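A minimal Python sketch of the same calculation, assuming the gdp and le arrays from the earlier sketch; because only the legible rows are used, the fitted coefficients may differ slightly from the slide's a = 14.3 and b = 14.5:

```python
import numpy as np

# Assumes gdp and le from the earlier sketch are already defined.
# Least-squares regression of life expectancy on log10(GDP): le = a + b*log10(GDP)
b, a = np.polyfit(np.log10(gdp), le, 1)
print(f"Life expectancy = {a:.1f} + {b:.1f} * log10(GDP)")

# Predict life expectancy for a country with GDP = $8000
predicted = a + b * np.log10(8000)
print(f"Predicted life expectancy at GDP = $8000: {predicted:.1f} years")
```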