Multiple Regression ©2005 Dr. B. C. Paul

Problems with Regression So Far: We have only been able to consider one controlling factor at a time. Everything else has gone into the sum of squares for error. With our MPG data we suspect other factors are involved.

Looking at Other Variables: Let's try plotting MPG against outside temperature. Click on Graphs, highlight Interactive to bring up the side menu, then highlight and click Scatterplot.

Setting the Plot: Move your Y and X axis variables into position. This interface requires you to use a drag-and-drop method rather than clicking on arrows. When you are done, click OK.

Up Comes the Plot: There appears to be evidence that MPG improves as the outside temperature increases.

Ordering a Model: We would like a model that includes more than one factor at a time. Such a model exists.

Function of the Model: It works by least-squared error. The objective remains to pick the coefficients such that the average squared error between the model and the data points is minimized. Again we will skip any derivations or explanations of how this is done; for us, we'll push the right SPSS buttons.
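For readers who want to see the least-squares machinery behind those SPSS buttons, here is a minimal sketch in NumPy. The data are synthetic stand-ins, not the slides' actual MPG dataset; the variable names and coefficient values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the MPG data: miles per gallon driven by
# outside temperature and drive distance, plus random noise.
rng = np.random.default_rng(1)
n = 50
temp = rng.uniform(20, 90, n)
dist = rng.uniform(5, 60, n)
mpg = 18.0 + 0.05 * temp + 0.1 * dist + rng.normal(0, 0.3, n)

# Design matrix: a column of ones for the intercept, then one column
# per predictor. lstsq picks the coefficients that minimize the sum
# of squared errors between model and data.
X = np.column_stack([np.ones(n), temp, dist])
beta, *_ = np.linalg.lstsq(X, mpg, rcond=None)
print(beta)  # should be close to [18.0, 0.05, 0.1]
```

SPSS, Minitab, and every other regression package solve this same minimization; the only difference is which buttons you push.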

How Do We Deal with Significance? We have seen that any coefficient in a model can be tested individually for certainty that it is not really zero (if it is zero, that term is not even in the model). The trick: the significance of a coefficient depends on how much of the total variation the model explains, and on how much of the credit for that is going to other variables. It makes a difference what is in the model. Example: for MPG = f(distance driven), the linear regression was significant, but when a quadratic term was added, the fit improved yet neither term, including the linear one, was significant.
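That deflation effect can be reproduced in a small simulation. This is an illustrative sketch on synthetic data (not the course dataset): because x and x² are strongly correlated, adding the quadratic column inflates the standard error of the linear coefficient, so its t score drops even though the fit improves.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 40)
y = 2.0 + 1.5 * x - 0.05 * x**2 + rng.normal(0, 0.5, x.size)

def t_stats(X, y):
    # OLS coefficients divided by their standard errors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x**2])

t_lin_only = t_stats(X_lin, y)[1]        # t for x, alone in the model
t_lin_with_quad = t_stats(X_quad, y)[1]  # t for x, with x**2 present
print(t_lin_only, t_lin_with_quad)       # the second is much smaller
```

The coefficient on x has not changed meaning; it simply has to share credit with a correlated variable, which is exactly the point the slide is making.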

Methods of Variable Entry. The Decree Method: I can tell SPSS I want it to do a regression with such-and-such variables. SPSS will do the best regression it can and then show me the ANOVA table. I can look and see whether I believe my coefficients are strong enough for me to sign off on.

Forward Regression: The computer will look at the variables available and try a linear regression on each one. If one variable comes up at 95% significance, it becomes a candidate to enter; the computer gets the significance of each variable, and if several are over 95%, it picks the best. The computer then looks at the remaining variables to explain the residuals: it tries each variable, checks its significance in explaining the residuals, looks for variables over 95% significance, and chooses the best.

Forward Regression Continued: The process of variables entering continues until all variables have been selected or no more variables are significant. 95% significance is the default significance to enter; we can reset it to a different level.
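The forward procedure described on the two slides above can be sketched as a greedy loop. This is a hedged illustration, not SPSS's exact algorithm: a |t| threshold stands in for the 95% significance-to-enter rule, and the data are made up.

```python
import numpy as np

def t_stats(X, y):
    # OLS coefficients divided by their standard errors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta / np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

def forward_select(candidates, y, t_enter=2.0):
    """candidates: dict of name -> data column. At each step, try each
    remaining variable alongside those already chosen and enter the one
    with the largest |t| exceeding t_enter; stop when none qualifies."""
    chosen, pool = [], dict(candidates)
    n = len(y)
    while pool:
        best, best_t = None, t_enter
        for name, col in pool.items():
            X = np.column_stack([np.ones(n)] + [candidates[c] for c in chosen] + [col])
            t = abs(t_stats(X, y)[-1])   # t of the trial variable
            if t > best_t:
                best, best_t = name, t
        if best is None:
            break                        # nothing clears the bar to enter
        chosen.append(best)
        del pool[best]
    return chosen

# Made-up data: two real effects and one pure-noise variable
rng = np.random.default_rng(2)
n = 60
temp = rng.uniform(20, 90, n)
dist = rng.uniform(5, 60, n)
junk = rng.normal(size=n)
mpg = 18 + 0.05 * temp + 0.1 * dist + rng.normal(0, 0.3, n)

order = forward_select({"temp": temp, "dist": dist, "junk": junk}, mpg)
print(order)
```

The order of entry matters: the first variable in claims credit for everything it correlates with, which is why the later slides spend so much time on the entry history.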

Backward Regression: The computer starts by doing a regression with all possible variables in the equation. It then does a t test on each coefficient to see if the coefficient might be zero. Any variable that falls below 75% significance is removed from the equation (75% is a default that you can reset). The t tests are then repeated, and the process continues until no more variables can be thrown out of the equation.
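Backward elimination runs the same t-test machinery in the other direction. Again a hedged sketch on made-up data: a |t| threshold (`t_remove`) stands in for the slides' 75% significance-to-stay default.

```python
import numpy as np

def backward_eliminate(candidates, y, t_remove=1.15):
    """Start with every variable in the model, then repeatedly drop the
    variable with the smallest |t| until all survivors clear t_remove
    (a rough stand-in for the 75%-significance-to-stay default)."""
    names = list(candidates)
    n = len(y)
    while names:
        X = np.column_stack([np.ones(n)] + [candidates[c] for c in names])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        t = np.abs(beta / np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X))))
        t_vars = t[1:]                     # skip the intercept's t score
        weakest = int(np.argmin(t_vars))
        if t_vars[weakest] >= t_remove:
            break                          # every variable survives its t test
        del names[weakest]
    return names

# Made-up data: one real effect, one pure-noise variable
rng = np.random.default_rng(3)
n = 60
temp = rng.uniform(20, 90, n)
junk = rng.normal(size=n)
mpg = 18 + 0.05 * temp + rng.normal(0, 0.3, n)

print(backward_eliminate({"temp": temp, "junk": junk}, mpg))
```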

Stepwise Regression: It starts out like forward regression and moves forward until two variables are in the equation. Now the computer does t tests on all the variables in the equation, as in backward regression, to see if any should be thrown out. If not, it goes into another forward step. After each forward step it again checks all variables in the equation with t tests. It continues this back-and-forth process until nothing changes (or it goes into an infinite loop).
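The alternation described above can be sketched by combining the two previous ideas; this is again an illustrative simplification on made-up data, with |t| thresholds standing in for SPSS's significance-to-enter and significance-to-remove rules, and an iteration cap guarding against the infinite-loop case the slide mentions.

```python
import numpy as np

def _t(X, y):
    # OLS coefficients divided by their standard errors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta / np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

def stepwise(candidates, y, t_enter=2.0, t_remove=1.7, max_iter=20):
    """Alternate a forward entry step and a backward removal step until
    nothing changes. max_iter caps the back-and-forth so a ping-ponging
    variable cannot loop forever."""
    chosen = []
    n = len(y)
    def design(names):
        return np.column_stack([np.ones(n)] + [candidates[c] for c in names])
    for _ in range(max_iter):
        changed = False
        # Forward step: enter the best remaining variable, if any qualifies
        best, best_t = None, t_enter
        for name in candidates:
            if name in chosen:
                continue
            t = abs(_t(design(chosen + [name]), y)[-1])
            if t > best_t:
                best, best_t = name, t
        if best is not None:
            chosen.append(best)
            changed = True
        # Backward step: drop the weakest variable if its |t| fell too low
        if chosen:
            t_all = np.abs(_t(design(chosen), y)[1:])
            weakest = int(np.argmin(t_all))
            if t_all[weakest] < t_remove and chosen[weakest] != best:
                del chosen[weakest]
                changed = True
        if not changed:
            break
    return chosen

# Made-up data: two real effects and one pure-noise variable
rng = np.random.default_rng(2)
n = 60
temp = rng.uniform(20, 90, n)
dist = rng.uniform(5, 60, n)
junk = rng.normal(size=n)
mpg = 18 + 0.05 * temp + 0.1 * dist + rng.normal(0, 0.3, n)

sel = stepwise({"temp": temp, "dist": dist, "junk": junk}, mpg)
print(sel)
```

Keeping the entry threshold stricter than the removal threshold (here 2.0 versus 1.7) is what usually prevents a variable from cycling in and out forever.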

What are the chances that the methods will give you the same answer? About zero. Some smaller, easier sets will converge for all methods, but larger sets usually do not yield the same answer. Which is right? Maybe it's a dumb question: the method does influence the answer. It may be more important that you carefully make sure you have a good, defensible method. (Maybe it's the teacher's favorite answer: stepwise.)

Let's Try It: I added a variable for distance squared. Multilinear regression can only consider linear effects of a variable, but I can trick it by creating a non-linear variable. In my case, I still think I saw the MPG bending down as the drive distance increased (logical, because the engine warmed up).
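The trick is simply to add a new column that is the square of an existing one; the model stays linear in its coefficients, so the ordinary multiple-regression machinery still applies. A minimal sketch with hypothetical drive distances:

```python
import numpy as np

# Hypothetical drive distances (not the slides' actual data)
distance = np.array([3.0, 7.5, 12.0, 20.0, 31.0])
distance_sq = distance ** 2   # the manufactured "non-linear variable"

# Design matrix: intercept, distance, and distance squared. To the
# regression this is just three columns; it never knows one column
# is a function of another.
X = np.column_stack([np.ones_like(distance), distance, distance_sq])
print(X.shape)  # (5, 3)
```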

Start Like We Are Going to Do Regular Linear Regression: Click Analyze to pull down the menu, highlight Regression to pop out the side menu, then highlight and click Linear.

Select My Variables: Note that the change here is that I entered all the possible independent variables (you can't see that I also entered distance squared).

Set the Regression Method to Stepwise

Check My Options: Click on Options. Note that this controls my significance to enter and to remove; the default is set to 95% to enter and 90% to remove.

Set My Plots: I ask for my histograms, and I ask for my residuals to be plotted against the predicted value to search for trends in the residuals.

Click OK and Out Comes Stuff

We Can See Some Model History: Our first model was MPG as a linear function of outside temperature. It explained about 54% of the observed variation.

The Saga Continues: The next step was to add an effect for distance. The two variables explained 91% of the observed variation.

The Rest of the History: The model next added age, and finally a distance-squared term. It appears that none of the variables was removed in a backward step; this run just moved forward until all variables were in. In the end we have 93.5% of the variation explained.
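The climbing percentages in this history (54%, then 91%, then 93.5%) reflect a general fact: in-sample R² can never decrease when a variable is added, because least squares can always ignore a useless column. A small sketch on synthetic data, not the course dataset:

```python
import numpy as np

def r_squared(X, y):
    # Fraction of total variation explained by the least-squares fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

# Made-up MPG-style data with two real effects
rng = np.random.default_rng(5)
n = 40
temp = rng.uniform(20, 90, n)
dist = rng.uniform(5, 60, n)
mpg = 18 + 0.05 * temp + 0.1 * dist + rng.normal(0, 0.5, n)

ones = np.ones(n)
r2_one = r_squared(np.column_stack([ones, temp]), mpg)          # temp alone
r2_two = r_squared(np.column_stack([ones, temp, dist]), mpg)    # temp + dist
print(r2_one, r2_two)   # the second is at least as large as the first
```

This monotone rise is why stepwise procedures gate entry on significance rather than on R² alone: R² always goes up, significant or not.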

Looking at the ANOVA for the Regression Equations: All four regressions were highly significant.

Checking the Significance of Coefficients: We actually knew that none of our variables got bounced out. Note that every variable is significant above the alpha = 5% level.

We Can See Interaction Between Variables as They Enter: Note that the t score for distance dips when distance squared entered (not surprising, since distance and distance squared are strongly correlated).

More Interactions: As the unexplained random variation decreased, the significance of the temperature effect increased steadily.

Our Equation Is
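The fitted equation itself appears as an image in the original slides, so the coefficient values are not recoverable here. From the variables that entered, its general form would be:

```latex
\widehat{\mathrm{MPG}} = b_0 + b_1\,(\text{Outside Temperature}) + b_2\,(\text{Distance}) + b_3\,(\text{Age}) + b_4\,(\text{Distance})^2
```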

Look at the Significance that Controlled the Order the Variables Came In: To start with, age had less than 50% significance, but distance and distance squared were both strong. Distance had a better t score and entered next.

Next Regression Step: In the next step, both age and distance squared were above the 5% level, but age was stronger.

Checking Out Our Residuals: I've seen better normal distributions on a cell-by-cell basis, but this doesn't trigger any immediate concerns. (Remember, we do assume our residuals will be normally distributed with a mean of zero around the predicted value.)

Looking at Cumulative Probability: On a cumulative probability chart we do very well in assuming a normal distribution of the error with a mean of 0.

Our Scatter Plot: If there is a trend there, I don't see it (which is exactly what one wants to see after the regression is well done).
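Two of the facts behind these residual plots can be checked numerically rather than by eye: a least-squares fit with an intercept forces the residuals to average exactly zero, and forces zero linear correlation between residuals and fitted values. A sketch on synthetic data standing in for the MPG example:

```python
import numpy as np

# Hypothetical data in place of the slides' MPG dataset
rng = np.random.default_rng(4)
n = 80
temp = rng.uniform(20, 90, n)
dist = rng.uniform(5, 60, n)
mpg = 18 + 0.05 * temp + 0.1 * dist + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), temp, dist])
beta, *_ = np.linalg.lstsq(X, mpg, rcond=None)
fitted = X @ beta
resid = mpg - fitted

# Both quantities are essentially zero by the algebra of least squares
print(resid.mean())
print(np.corrcoef(fitted, resid)[0, 1])
```

So a visible trend in the residual-versus-predicted plot cannot be a simple linear one; what you are hunting for there is curvature, funneling, or outliers.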

Summary on Regression: ANOVA works for category data, where the question is whether a particular category is significant. Ford Escorts are made in 3 plants; is there a difference in the mechanical-problem rate that depends on which factory built the car? Plants #1, #2, and #3 really have no order except an arbitrary one. If I had looked at MPG based on spring, summer, fall, and winter, assigning a numeric value to the seasons would be totally arbitrary. Category data lends itself poorly to regression.

So When Should I Choose Regression? For continuous quantitative variables. I could break my drivers' ages into groups, but the break points would be arbitrary. This little artifact is one of the reasons two car insurance companies can look at the same regional risk for drivers in an area and yet quote different rates for the same coverage: creating categories out of continuous data can cause some weird effects. Regression tends to work better for continuously distributed quantitative data, and it also provides predictive models as opposed to category means.