Stat 112: Lecture 10 Notes
Fitting Curvilinear Relationships
–Polynomial Regression (Ch. 5.2.1)
–Transformations (Ch. 5.2.2-5.2.4)
Schedule:
–Homework 3 due on Thursday.
–Quiz 2 next

Curvilinear Relationship
Reconsider the simple regression problem of estimating the conditional mean of Y given X, E(Y|X=x). For many problems, E(Y|X=x) is not linear. The linear regression model makes the restrictive assumption that the increase in the mean of Y|X for a one-unit increase in X equals the slope β1 for every x. Curvilinear relationship: E(Y|X=x) is a curve, not a straight line; the increase in the mean of Y|X for a one-unit increase in X is not the same for all x.

Example 1: How does rainfall affect yield?
Data on average corn yield and rainfall in six U.S. states, cornyield.JMP.

Example 2: How do people's incomes change as they age?
Weekly wages and age of 200 randomly chosen males between ages 18 and 70 from the 1998 March Current Population Survey.

Example 3: Display.JMP
A large chain of liquor stores would like to know how much display space in its stores to devote to a new wine. It collects sales and display space data from 47 of its stores.

Polynomial Regression
Add powers of x as additional explanatory variables in a multiple regression model:
E(Y|X) = β0 + β1X + β2X² + … + βKX^K.
Often the centered variable (x − x̄) is used in place of x; this does not affect the predicted mean E(Y|X) that is obtained from the multiple regression model. The quadratic model (K=2) is often sufficient.
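As a concrete illustration, here is a minimal sketch in Python (added here; the course itself uses JMP, and the rainfall/yield numbers below are hypothetical stand-ins) of fitting a quadratic by least squares:

```python
# Minimal sketch: quadratic polynomial regression with numpy.
# x and y are hypothetical stand-ins for rainfall and corn yield.
import numpy as np

x = np.array([8.0, 9.5, 11.0, 12.5, 14.0, 16.0])
y = np.array([24.0, 28.5, 31.0, 32.0, 31.5, 29.0])

# Center x to reduce the correlation between x and x^2;
# this changes the coefficients but not the fitted values.
xc = x - x.mean()
X = np.column_stack([np.ones_like(xc), xc, xc**2])  # columns: 1, x, x^2

# Least squares: solve min ||y - X beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", beta)
print("fitted values:", X @ beta)
```

Centering x before squaring changes the reported coefficients but reproduces exactly the same fitted values.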

Polynomial Regression in JMP
Two ways to fit the model:
–Create the variables x, x², …, x^K, then use Fit Model with these variables.
–Use Fit Y by X. Click on the red triangle next to Bivariate Analysis … and click Fit Polynomial instead of the usual Fit Line. This method produces nicer plots.

Interpretation of coefficients in polynomial regression
The usual interpretation of multiple regression coefficients doesn't make sense in polynomial regression: we can't hold x fixed while changing x². The effect of increasing x by one unit depends on the starting value x = x*.
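For instance, under the quadratic model E(Y|X) = β0 + β1X + β2X² (a worked step added here for illustration; it is implied by, but not written out on, the original slide), the change in the mean when x increases from x* to x* + 1 is

[β1(x* + 1) + β2(x* + 1)²] − [β1x* + β2x*²] = β1 + β2(2x* + 1),

which depends on x*: if β2 < 0, the effect of one more unit of x shrinks as x* grows.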

Interpretation of coefficients in wage data

Choosing the order in polynomial regression
Is it necessary to include a kth order term, x^k? Test H0: βk = 0 vs. Ha: βk ≠ 0. Choose the largest k such that the test still rejects H0 (at the 0.05 level). If we use x^k, always keep the lower-order terms x, …, x^(k−1) in the model. For the corn yield data, use the K=2 polynomial regression model. For the income data, use the K=2 polynomial regression model.
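A minimal sketch of this selection rule in Python (an added illustration with simulated data; statsmodels is assumed to be available): fit decreasing orders and keep the largest K whose highest-order term rejects at the 0.05 level.

```python
# Sketch: choose the polynomial order by testing the highest-order term.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(18, 70, 200)                       # e.g., age
y = 100 + 40 * x - 0.4 * x**2 + rng.normal(0, 50, size=x.size)

for K in range(4, 0, -1):                          # try K = 4, 3, 2, 1
    X = np.column_stack([x**k for k in range(1, K + 1)])
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    p_top = fit.pvalues[-1]                        # t-test for the x^K term
    if p_top < 0.05:
        print(f"keep order K = {K} (p = {p_top:.3g} for x^{K})")
        break
```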

Transformations
Curvilinear relationship: E(Y|X) is not a straight line. Another approach to fitting curvilinear relationships is to transform Y or X. Transformations: perhaps E(f(Y)|g(X)) is a straight line, where f(Y) and g(X) are transformations of Y and X, and a simple linear regression model holds for the response variable f(Y) and explanatory variable g(X).

Curvilinear Relationship
Y = Life Expectancy in 1999; X = Per Capita GDP (in US dollars) in 1999. Data in gdplife.JMP. The linearity assumption of simple linear regression is clearly violated: the increase in mean life expectancy for each additional dollar of GDP is smaller for large GDPs than for small GDPs. There are decreasing returns to increases in GDP.

The mean of Life Expectancy | log Per Capita GDP appears to be approximately a straight line.

How do we use the transformation?
Testing for association between Y and X: if the simple linear regression model holds for f(Y) and g(X), then Y and X are associated if and only if the slope in the regression of f(Y) on g(X) does not equal zero. The p-value for the test that the slope is zero is <.0001: strong evidence that per capita GDP and life expectancy are associated.
Prediction and mean response: what would you predict the life expectancy to be for a country with a per capita GDP of $20,000?
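A minimal sketch of that prediction in Python (the intercept and slope below are hypothetical placeholders, not the actual gdplife.JMP estimates). Because only X is transformed, the prediction comes straight off the fitted line with no back-transformation of Y:

```python
# Sketch: mean-response prediction after a log-x transformation.
# b0 and b1 are hypothetical estimates from regressing
# life expectancy on log(per capita GDP).
import math

b0, b1 = 20.0, 5.5
gdp = 20_000
pred = b0 + b1 * math.log(gdp)   # plug the transformed x into the line
print(f"Predicted life expectancy at GDP ${gdp:,}: {pred:.1f} years")
```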

How do we choose a transformation?
Tukey's Bulging Rule: see handout. Match the curvature in the data to the shape of one of the curves drawn in the four quadrants of the figure in the handout. Then use the associated transformations, selecting one for X, Y, or both.

Transformations in JMP
1. Use Tukey's Bulging Rule (see handout) to determine transformations which might help.
2. After Fit Y by X, click the red triangle next to Bivariate Fit and click Fit Special. Experiment with the transformations suggested by Tukey's Bulging Rule.
3. Make residual plots of the residuals for the transformed model vs. the original X by clicking the red triangle next to Transformed Fit to … and clicking Plot Residuals. Choose transformations which make the residual plot have no pattern in the mean of the residuals vs. X.
4. Compare different transformations by looking for the transformation with the smallest root mean square error on the original y-scale. If using a transformation that involves transforming Y, look at the root mean square error for the fit measured on the original scale (see the sketch after this list).
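A minimal sketch of step 4 in Python (an added illustration with simulated data): when Y has been transformed, back-transform the fitted values so both models are scored on the same, original y-scale.

```python
# Sketch: compare a log-y fit to an untransformed fit by RMSE
# computed on the original y-scale.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 50, 60)
y = 5 * np.exp(0.05 * x) * rng.lognormal(0, 0.1, size=x.size)

def rmse(y_obs, y_fit):
    return np.sqrt(np.mean((y_obs - y_fit) ** 2))

# Untransformed: regress y on x.
fit_linear = np.polyval(np.polyfit(x, y, 1), x)

# Transformed: regress log(y) on x, then exponentiate the fitted
# values to return to the original scale before scoring.
fit_logy = np.exp(np.polyval(np.polyfit(x, np.log(y), 1), x))

print("RMSE, linear fit:", rmse(y, fit_linear))
print("RMSE, log-y fit :", rmse(y, fit_logy))
```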

By looking at the root mean square error on the original y-scale, we see that all of the transformations improve upon the untransformed model and that the transformation to log x is by far the best.

The transformation to log X appears to have mostly removed the trend in the mean of the residuals. This means that the straight-line model for the mean of Y given log X is a reasonable fit. There is still a problem of nonconstant variance.

Comparing models for curvilinear relationships
In comparing two transformations, use the transformation with the lower RMSE, measured on the original y-scale if Y was transformed [this is equivalent to choosing the transformation with the higher R² or adjusted R²]. In comparing transformations to polynomial regression models, compare the R² of the best transformation to the R² of the best polynomial regression model (selected using the criterion on slide 10). If the transformation's R² is close to (e.g., within .01 of) but not as high as the polynomial regression's, it is still reasonable to use the transformation on the grounds of parsimony.

Adjusted R² (Section 4.3.1)
Problem with R²: it never decreases, even if we add useless variables. The adjusted R²,
adjusted R² = 1 − (1 − R²)(n − 1)/(n − K − 1),
can decrease if useless variables are added. It is useful for comparing regression models with different numbers of variables, but it no longer represents the proportion of variation in y explained by the multiple regression line. Found under Summary of Fit in JMP.
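A quick numeric check of the formula in Python (the R² values, n, and K below are made up for illustration): adding a useless variable nudges R² up but pushes adjusted R² down.

```python
# Sketch: adjusted R^2 from R^2, sample size n, and the number
# of explanatory variables K (hypothetical numbers).
def adjusted_r2(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(0.750, 47, 2))  # ~0.739
print(adjusted_r2(0.751, 47, 3))  # ~0.734: lower despite higher R^2
```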

Transformations and Polynomial Regression for Display.JMP
[Table comparing R² for the linear, log x, 1/x, and fourth-order polynomial models; the numerical values did not survive extraction.]
The fourth-order polynomial is the best polynomial regression model using the criterion on slide 10. The fourth-order polynomial is the best model overall – it has the highest R².

Summary
Two methods for fitting regression models for curvilinear relationships:
–Polynomial Regression
–Transformations