Stat 112: Lecture 13 Notes
Finish Chapter 5: –Review of predictions in the log-log transformation. –Polynomials and transformations in multiple regression.
Start Chapter 6: Checking the assumptions of multiple regression and remedies for when the assumptions are not met.
Schedule: Homework 4 will be assigned next week and will be due Thursday, Nov. 2nd.

Another Example of Transformations: Y = count of tree seeds, X = seed weight

By comparing the root mean square errors on the original y-scale, we see that both of the transformations improve upon the untransformed model and that the transformation to log y and log x is by far the best.

Prediction using the log y/log x transformation
What is the predicted seed count for a tree whose seeds weigh 50 mg? Math trick: exp{log(y)} = y (remember, by log we always mean the natural log, ln). That is, if the fitted model is log(ŷ) = b0 + b1·log(x), then ŷ = exp{b0 + b1·log(x)}.
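Outside JMP, a minimal sketch of this back-transformation in Python; the data and the resulting coefficients below are made up for illustration, not the actual tree-seed regression output:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: x = seed weight (mg), y = seed count.
x = np.array([1, 5, 10, 50, 100, 500], dtype=float)
y = np.array([8000, 1500, 700, 120, 50, 8], dtype=float)

# Fit log(y) = b0 + b1*log(x) by least squares.
X = sm.add_constant(np.log(x))
fit = sm.OLS(np.log(y), X).fit()
b0, b1 = fit.params

# Predict log(y) at x = 50 mg, then back-transform with exp.
log_pred = b0 + b1 * np.log(50)
print("predicted seed count:", np.exp(log_pred))
```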

Polynomials and Transformations in Multiple Regression
Example: Fast Food Locations. An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income, and the mean age of children in the area. Data in fastfoodchain.jmp.

Scatterplot Matrix
There seems to be a nonlinear relationship between revenue and income, and between revenue and age.

Polynomials and Transformations for Multiple Regression in JMP
For multiple regression, transformations can be done by creating a new column, right-clicking it, and selecting Formula to enter the transformation. Polynomials can be added by using Fit Model, highlighting the X variable in both the Select Columns box and the Construct Model Effects box, and then clicking Cross. To choose the order of the polynomial, we use the same procedure as in simple regression: make the polynomial higher order until the coefficient on the highest-order term is not significant.
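The same order-selection procedure, sketched in Python rather than JMP; the data are simulated for illustration, and the 0.05 cutoff is one conventional choice:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with a genuinely quadratic relationship.
rng = np.random.default_rng(0)
x = rng.uniform(20, 60, 100)
y = 5 + 2.0 * x - 0.02 * x**2 + rng.normal(0, 1, 100)

# Raise the polynomial order until the highest-order term is not significant.
order = 1
while True:
    X = sm.add_constant(np.column_stack([x**k for k in range(1, order + 1)]))
    fit = sm.OLS(y, X).fit()
    if fit.pvalues[-1] > 0.05:   # p-value of the highest-order coefficient
        order -= 1               # fall back to the last significant order
        break
    order += 1
print("chosen polynomial order:", order)
```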

Polynomial Regression for Fast Food Chain Data

Chapter 6: Checking the Assumptions of the Regression Model and Remedies for When the Assumptions Are Not Met

Assumptions of the Multiple Linear Regression Model
1. Linearity: E(Y | X1 = x1, …, Xp = xp) = β0 + β1x1 + … + βpxp.
2. Constant variance: The standard deviation of Y for the subpopulation of units with X1 = x1, …, Xp = xp is the same for all subpopulations.
3. Normality: The distribution of Y for the subpopulation of units with X1 = x1, …, Xp = xp is normally distributed for all subpopulations.
4. Independence: The observations are independent.

Assumptions for linear regression and their importance to inferences:
–Point prediction, point estimation: linearity, independence.
–Confidence interval for slope, hypothesis test for slope, confidence interval for mean response: linearity, constant variance, independence, normality (only if n < 30).
–Prediction interval: linearity, constant variance, independence, normality.

Fast Food Chain Data

Checking Linearity
Plot the residuals versus each of the explanatory variables. Each of these plots should look like random scatter, with no pattern in the mean of the residuals. If the residual plots show a problem, we can try transforming the x-variable and/or the y-variable. Residual plot: use Fit Y by X with Y being the residuals; Fit Line will draw a horizontal line.

Residual Plots in JMP
After Fit Model, click the red triangle next to Response, select Save Columns, and click Residuals. Then use Fit Y by X with Y = the residuals and X = the explanatory variable of interest; Fit Line will draw a horizontal line with intercept zero. It is a property of the residuals from multiple linear regression that a least squares regression of the residuals on an explanatory variable has slope zero and intercept zero.
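A rough Python equivalent of this check, for readers not using JMP; the variable names and data below are placeholders, not the actual fastfoodchain.jmp columns:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Placeholder data standing in for the 25 fast food locations.
rng = np.random.default_rng(1)
income = rng.uniform(20, 60, 25)
age = rng.uniform(2, 16, 25)
revenue = 100 + 3 * income + 5 * age + rng.normal(0, 10, 25)

# Fit the multiple regression and save the residuals.
X = sm.add_constant(np.column_stack([income, age]))
fit = sm.OLS(revenue, X).fit()

# Plot residuals versus one explanatory variable; the least squares line
# through these points has slope zero and intercept zero.
plt.scatter(income, fit.resid)
plt.axhline(0)
plt.xlabel("income")
plt.ylabel("residual")
plt.show()
```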

Residual by Predicted Plot
Fit Model displays the Residual by Predicted plot automatically in its output. It is a plot of the residuals versus the predicted Y's. We can think of the predicted Y's as summarizing all the information in the X's. As usual, we would like this plot to show random scatter.
–A pattern in the mean of the residuals as the predicted Y's increase indicates a problem with linearity. Look at the residual plots versus each explanatory variable to isolate the problem and consider transformations.
–A pattern in the spread of the residuals indicates a problem with constant variance.

Corrections for Violations of the Linearity Assumption
When the residual plot shows a pattern in the mean of the residuals for one of the explanatory variables Xj, we should consider:
–Transforming Xj.
–Adding polynomial terms in Xj (Xj, Xj^2, …).
–Transforming Y.
After making the transformation/adding polynomials, we need to refit the model and look at the new residual plot versus Xj to see whether linearity has been achieved.

Quadratic Polynomials for Age and Income

Linearity now appears to be satisfied.

Checking the Constant Variance Assumption
The residual plot versus each explanatory variable should exhibit constant variance. The residual plot versus the predicted values should also exhibit constant variance (this plot is often the most useful for detecting nonconstant variance).

Heteroscedasticity
When the requirement of a constant variance is violated, we have a condition of heteroscedasticity. Diagnose heteroscedasticity by plotting the residuals against the predicted values ŷ. [Figure: residual plot versus ŷ in which the spread of the residuals increases with ŷ.]
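A minimal sketch of this diagnostic in Python, with data simulated so that the error spread grows with the fitted values and the plot shows the fan shape described above:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated data whose error standard deviation grows with x (and so with y-hat).
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 3 * x + rng.normal(0, 0.5 * x)

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Residuals versus predicted values: a fan shape signals heteroscedasticity.
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0)
plt.xlabel("predicted y")
plt.ylabel("residual")
plt.show()
```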

How much traffic would a building generate?
The goal is to predict how much traffic will be generated by a proposed new building with 150,000 occupied square feet. (The data are from the MidAtlantic States City Planning Manual.) The data tell how many automobile trips per day were made in the morning to office buildings of different sizes. The variables are X = occupied square feet of floor space in the building (in 1000 sq ft) and Y = number of automobile trips arriving at the building per day in the morning.

Reducing Nonconstant Variance/Nonnormality by Transformations
A brief list of transformations:
»y′ = y^(1/2) (for y > 0): use when the error variance σ² increases with the predicted value ŷ.
»y′ = log y (for y > 0): use when the error standard deviation σ increases with ŷ, or when the error distribution is skewed to the right.
»y′ = y²: use when the error variance σ² decreases with ŷ, or when the error distribution is skewed to the left.

The heteroscedasticity shows up here.

To try to fix the heteroscedasticity, we transform Y to log(Y). This fixes the heteroscedasticity… BUT it creates a nonlinear pattern.

To fix the nonlinearity, we now transform X to log(X), without changing the Y-axis any further. The resulting pattern is both satisfactorily homoscedastic AND linear.
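A sketch of this two-step fix in Python, on simulated data shaped like the traffic example (the actual MidAtlantic data are not reproduced here):

```python
import numpy as np
import statsmodels.api as sm

# Simulated building data: multiplicative error, so the spread of Y grows with Y.
rng = np.random.default_rng(3)
sqft = rng.uniform(10, 300, 100)                          # occupied sq ft (1000s)
trips = 15 * sqft**0.8 * np.exp(rng.normal(0, 0.3, 100))  # morning trips

# Step 1: log(Y) stabilizes the variance. Step 2: log(X) restores linearity.
fit = sm.OLS(np.log(trips), sm.add_constant(np.log(sqft))).fit()
print(fit.params)  # roughly [log(15), 0.8] for this simulation
```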

Often we will plot the residuals versus the predicted values. For simple regression, the two residual plots (versus X and versus the predicted values) are equivalent.