Multiple Linear Regression

Slides:



Advertisements
Similar presentations
Chapter 5 Multiple Linear Regression
Advertisements

Automated Regression Modeling Descriptive vs. Predictive Regression Models Four common automated modeling procedures Forward Modeling Backward Modeling.
Chapter 3 – Data Exploration and Dimension Reduction © Galit Shmueli and Peter Bruce 2008 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
Stat 112: Lecture 7 Notes Homework 2: Due next Thursday The Multiple Linear Regression model (Chapter 4.1) Inferences from multiple regression analysis.
Chapter 8 – Logistic Regression
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Chapter 7 – Classification and Regression Trees
Chapter 7 – Classification and Regression Trees
MULTIPLE REGRESSION. OVERVIEW What Makes it Multiple? What Makes it Multiple? Additional Assumptions Additional Assumptions Methods of Entering Variables.
Statistics for Managers Using Microsoft® Excel 5th Edition
BA 555 Practical Business Analysis
Statistics for Managers Using Microsoft® Excel 5th Edition
Part I – MULTIVARIATE ANALYSIS C3 Multiple Linear Regression II © Angel A. Juan & Carles Serrat - UPC 2007/2008.
Multiple Regression Involves the use of more than one independent variable. Multivariate analysis involves more than one dependent variable - OMS 633 Adding.
© 2003 Prentice-Hall, Inc.Chap 14-1 Basic Business Statistics (9 th Edition) Chapter 14 Introduction to Multiple Regression.
Lecture 6: Multiple Regression
1 Chapter 9 Variable Selection and Model building Ray-Bing Chen Institute of Statistics National University of Kaohsiung.
Lecture 11 Multivariate Regression A Case Study. Other topics: Multicollinearity  Assuming that all the regression assumptions hold how good are our.
EPI809/Spring Testing Individual Coefficients.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 11 th Edition.
Chapter 15: Model Building
Copyright ©2011 Pearson Education 15-1 Chapter 15 Multiple Regression Model Building Statistics for Managers using Microsoft Excel 6 th Global Edition.
Regression Model Building
Correlation & Regression
Quantitative Business Analysis for Decision Making Multiple Linear RegressionAnalysis.
Multiple Linear Regression Response Variable: Y Explanatory Variables: X 1,...,X k Model (Extension of Simple Regression): E(Y) =  +  1 X 1 +  +  k.
Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall 15-1 Chapter 15 Multiple Regression Model Building Statistics for Managers using Microsoft.
© 2004 Prentice-Hall, Inc.Chap 15-1 Basic Business Statistics (9 th Edition) Chapter 15 Multiple Regression Model Building.
Lecture 12 Model Building BMTRY 701 Biostatistical Methods II.
Variable selection and model building Part II. Statement of situation A common situation is that there is a large set of candidate predictor variables.
Business Statistics, 4e, by Ken Black. © 2003 John Wiley & Sons Business Statistics, 4e by Ken Black Chapter 15 Building Multiple Regression Models.
Topic 14: Inference in Multiple Regression. Outline Review multiple linear regression Inference of regression coefficients –Application to book example.
Shonda Kuiper Grinnell College. Statistical techniques taught in introductory statistics courses typically have one response variable and one explanatory.
Basics of Regression Analysis. Determination of three performance measures Estimation of the effect of each factor Explanation of the variability Forecasting.
Week 6: Model selection Overview Questions from last week Model selection in multivariable analysis -bivariate significance -interaction and confounding.
1 Model Selection Response: Highway MPG Explanatory: 13 explanatory variables Indicator variables for types of car – Sports Car, SUV, Wagon, Minivan There.
Multiple Linear Regression. Purpose To analyze the relationship between a single dependent variable and several independent variables.
Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Lesson Multiple Regression Models. Objectives Obtain the correlation matrix Use technology to find a multiple regression equation Interpret the.
1 Multiple Regression A single numerical response variable, Y. Multiple numerical explanatory variables, X 1, X 2,…, X k.
Model Selection and Validation. Model-Building Process 1. Data collection and preparation 2. Reduction of explanatory or predictor variables (for exploratory.
Chapter 22: Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
1 Experimental Statistics - week 12 Chapter 12: Multiple Regression Chapter 13: Variable Selection Model Checking.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
Lesson 14 - R Chapter 14 Review. Objectives Summarize the chapter Define the vocabulary used Complete all objectives Successfully answer any of the review.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 15-1 Chapter 15 Multiple Regression Model Building Basic Business Statistics 10 th Edition.
1 Building the Regression Model –I Selection and Validation KNN Ch. 9 (pp )
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1 Chapter 14 Multiple Regression Model Building Statistics for Managers.
© Galit Shmueli and Peter Bruce 2010 Chapter 6: Multiple Linear Regression Data Mining for Business Analytics Shmueli, Patel & Bruce.
Using SPSS Note: The use of another statistical package such as Minitab is similar to using SPSS.
DATA ANALYSIS AND MODEL BUILDING LECTURE 9 Prof. Roland Craigwell Department of Economics University of the West Indies Cave Hill Campus and Rebecca Gookool.
2011 Data Mining Industrial & Information Systems Engineering Pilsung Kang Industrial & Information Systems Engineering Seoul National University of Science.
Chapter 11 – Neural Nets © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building.
Canadian Bioinformatics Workshops
Model selection and model building. Model selection Selection of predictor variables.
Yandell – Econ 216 Chap 15-1 Chapter 15 Multiple Regression Model Building.
Chapter 15 Multiple Regression Model Building
Chapter 9 Multiple Linear Regression
Multiple Linear Regression
Statistics in MSmcDESPOT
BUSI 410 Business Analytics
Business Statistics, 4e by Ken Black
Chapter 6: Multiple Linear Regression
Regression Model Building
Regression Model Building
Linear Model Selection and regularization
Multiple Linear Regression
Regression Forecasting and Model Building
Business Statistics, 4e by Ken Black
Presentation transcript:

Multiple Linear Regression Data Mining for Business Intelligence Shmueli, Patel & Bruce © Galit Shmueli and Peter Bruce 2008

Topics Explanatory vs. predictive modeling with regression Example: prices of Toyota Corollas Fitting a predictive model Assessing predictive accuracy Selecting a subset of predictors

Explanatory Modeling Goal: Explain relationship between predictors (explanatory variables) and target Familiar use of regression in data analysis Model Goal: Fit the data well and understand the contribution of explanatory variables to the model “goodness-of-fit”: R2, residual analysis, p-values

Predictive Modeling Goal: predict target values in other data where we have predictor values, but not target values Classic data mining context Model Goal: Optimize predictive accuracy Train model on training data Assess performance on validation (hold-out) data Explaining role of predictors is not primary purpose (but useful)

Example: Prices of Toyota Corolla ToyotaCorolla.xls Goal: predict prices of used Toyota Corollas based on their specification Data: Prices of 1442 used Toyota Corollas, with their specification information

Data Sample (showing only the variables to be used in analysis)

Variables Used Price in Euros Age in months as of 8/04 KM (kilometers) Fuel Type (diesel, petrol, CNG) HP (horsepower) Metallic color (1=yes, 0=no) Automatic transmission (1=yes, 0=no) CC (cylinder volume) Doors Quarterly_Tax (road tax) Weight (in kg)

Preprocessing Fuel type is categorical, must be transformed into binary variables Diesel (1=yes, 0=no) CNG (1=yes, 0=no) None needed for “Petrol” (reference category)

60% training data / 40% validation data Subset of the records selected for training partition (limited # of variables shown) 60% training data / 40% validation data

The Fitted Regression Model

Error reports

Predicted Values Residuals = difference between actual and predicted prices Predicted price computed using regression coefficients

Distribution of Residuals Symmetric distribution Some outliers

Selecting Subsets of Predictors Goal: Find parsimonious model (the simplest model that performs sufficiently well) More robust Higher predictive accuracy Exhaustive Search Partial Search Algorithms Forward Backward Stepwise

Exhaustive Search All possible subsets of predictors assessed (single, pairs, triplets, etc.) Computationally intensive Judge by “adjusted R2” Penalty for number of predictors

Forward Selection Start with no predictors Add them one by one (add the one with largest contribution) Stop when the addition is not statistically significant

Backward Elimination Start with all predictors Successively eliminate least useful predictors one by one Stop when all remaining predictors have statistically significant contribution

Stepwise Like Forward Selection Except at each step, also consider dropping non- significant predictors

Backward elimination (showing last 7 models) Top model has a single predictor (Age_08_04) Second model has two predictors, etc.

All 12 Models

Diagnostics for the 12 models Good model has: High adj-R2, Cp = # predictors

Next step Subset selection methods give candidate models that might be “good models” Do not guarantee that “best” model is indeed best Also, “best” model can still have insufficient predictive accuracy Must run the candidates and assess predictive accuracy (click “choose subset”)

Model with only 6 predictors Model Fit Predictive performance (compare to 12-predictor model!)

Summary Linear regression models are very popular tools, not only for explanatory modeling, but also for prediction A good predictive model has high predictive accuracy (to a useful practical level) Predictive models are built using a training data set, and evaluated on a separate validation data set Removing redundant predictors is key to achieving predictive accuracy and robustness Subset selection methods help find “good” candidate models. These should then be run and assessed.