Review of Building Multiple Regression Models

Presentation transcript:

Review of Building Multiple Regression Models
Multiple regression generalizes univariate linear regression models.
Each unit of data has one value of the dependent variable and values of p independent variables.

Multiple Regression Model
Y_i is the value of the dependent variable for the i-th unit.
x_i1, x_i2, …, x_ip are the values of the independent variables for that unit.
Z_i is an unobservable error.
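The model equation itself is not in the transcript. A form consistent with the definitions on this slide and with the parameters named on the Objectives slide (β_0, …, β_p and σ) might be written as follows. The placement of σ as a scale factor on Z_i and the number of units n are assumptions, not recovered text:

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \sigma Z_i,
\qquad i = 1, \dots, n
```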

Objectives
Estimate the regression coefficients β_0, β_1, …, β_p.
Estimate σ (crucial for tests).
Test whether the regression coefficients β_1, …, β_p are all simultaneously zero (note that the intercept β_0 is left out of this test).
Test whether some of the regression coefficients, β_q, …, β_p, are zero.
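The first three objectives can be sketched outside SPSS. The following is a minimal illustration in Python with NumPy, not the course's procedure: the function name fit_ols, the simulated data, and the true coefficient values are all made up for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (beta_hat, sigma_hat, f_stat); beta_hat[0] is the intercept.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])      # prepend the intercept column
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta_hat
    sse = float(residuals @ residuals)             # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares about the mean
    sigma_hat = np.sqrt(sse / (n - p - 1))         # estimate of sigma
    # Overall F statistic for H0: beta_1 = ... = beta_p = 0 (intercept left out)
    f_stat = ((sst - sse) / p) / (sse / (n - p - 1))
    return beta_hat, sigma_hat, f_stat

# Simulated data (hypothetical): 200 units, 3 predictors, the third with no effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=200)

beta_hat, sigma_hat, f_stat = fit_ols(X, y)
print(np.round(beta_hat, 2), round(float(sigma_hat), 2), round(float(f_stat), 1))
```

Under the null hypothesis the F statistic is compared with an F distribution on p and n − p − 1 degrees of freedom; a value far above one argues against all slopes being zero.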

Assumptions for Multiple Regression
The regression function is linear.
The error terms are independent.
The error variance is constant.
The distribution of the errors is normal.

Context of your second project
Artificial data set, available on the web site.
Each data set is individual.
–If you analyze the wrong data set, no credit!
Three dependent variables.
–Three separate sections of your report!
Six independent variables.
500 data points with replicated observations.

Check Scatterplots
Use a scatterplot matrix to get a quick summary look.
–Graphs, Scatterplot, Matrix.
If the plot of Y against x_i is flat and patternless, the interpretation is that the regression coefficient of x_i is zero.
Two of the dependent variables are random samples.

Table of regression coefficients
Contains the OLS estimates.
The line labeled (Constant) refers to β_0, the intercept.
There is a line for each variable in the model; the line for the q-th independent variable refers to β_q, its partial regression coefficient (slope).

Table of regression coefficients
Five columns of numbers.
Two are labeled “unstandardized coefficients”.
–The B column contains the OLS estimates.
–The Std. Error column contains the estimated standard deviation of each estimate.

Table of regression coefficients
One is the standardized coefficient.
–Scale-free coefficient, often used in social science studies for comparison across studies.
There is a column for t.
–As usual, t = (B − 0)/SE(B).
There is a column for Sig.
–Interpret as a p-value.
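To make the columns concrete, here is a rough sketch of how B, Std. Error, the standardized coefficient, and t could be computed by hand with NumPy. The function name coefficient_table and the simulated data are hypothetical; this is not SPSS's implementation.

```python
import numpy as np

def coefficient_table(X, y):
    """Return B, Std. Error, standardized coefficient, and t for each row.

    Row 0 is the intercept; its standardized coefficient is reported as NaN.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(design.T @ design)
    b = xtx_inv @ design.T @ y                         # OLS estimates (B column)
    residuals = y - design @ b
    mse = float(residuals @ residuals) / (n - p - 1)   # estimate of sigma squared
    std_err = np.sqrt(mse * np.diag(xtx_inv))          # Std. Error column
    t = (b - 0.0) / std_err                            # t column
    beta_std = np.full(p + 1, np.nan)
    # Standardized slope: B rescaled by sd(x) / sd(y), which removes the units
    beta_std[1:] = b[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return b, std_err, beta_std, t

# Simulated data (hypothetical): the third predictor has no effect on y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.normal(size=200)

b, std_err, beta_std, t = coefficient_table(X, y)
for row in zip(b, std_err, beta_std, t):
    print(" ".join(f"{value:9.3f}" for value in row))
```

The Sig. column would then come from the t distribution with n − p − 1 degrees of freedom, for example as the two-sided value 2 * scipy.stats.t.sf(abs(t), n - p - 1) if SciPy is available.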

Interpretation
There appears to be an association between an independent variable and the dependent variable if the observed significance level for that coefficient is small.
Specify which dependent variable has associations and which independent variables are significant.

Refinement of Model
Rerun the regression using only those variables that appear to be significant.
Usually, the database of a study has many variables that have no association with the dependent variable.
Most clients prefer that these variables not be used.
–There are some technical problems with this approach that are widely ignored.

Strategy of Stepwise Regression
Let the computer do the work.
In the regression dialog box, specify stepwise.
The computer checks whether additional variables can be added or previously added variables deleted.
There are three basic strategies: forward selection, backward selection, and stepwise.
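As an illustration of the simplest of the three strategies, the sketch below implements forward selection only, using a partial F statistic as the entry criterion. It is not the algorithm the statistics package runs: the threshold f_to_enter = 4.0, the function names, and the simulated six-predictor data are assumptions made for the example.

```python
import numpy as np

def sse(X, y, columns):
    """Error sum of squares for the model with an intercept and the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ beta
    return float(r @ r)

def forward_selection(X, y, f_to_enter=4.0):
    """Add, one at a time, the variable with the largest partial F above f_to_enter."""
    n, p = X.shape
    selected = []
    current_sse = sse(X, y, selected)
    while len(selected) < p:
        best_j, best_f, best_sse = None, f_to_enter, None
        for j in range(p):
            if j in selected:
                continue
            new_sse = sse(X, y, selected + [j])
            df_error = n - len(selected) - 2   # n minus (intercept + chosen + candidate)
            f = (current_sse - new_sse) / (new_sse / df_error)
            if f > best_f:
                best_j, best_f, best_sse = j, f, new_sse
        if best_j is None:                     # no candidate clears the threshold
            break
        selected.append(best_j)
        current_sse = best_sse
    return selected

# Simulated data (hypothetical): six predictors, only columns 0 and 3 matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=300)

selected = forward_selection(X, y)
print(selected)
```

A full stepwise procedure would also re-test the variables already in the model after each addition and drop any that no longer meet a removal criterion.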

Using Stepwise Regression
Examine the final model selected.
Note which variables are included.
Examine the information for excluded variables.
–Check whether there is any possibility that one of the variables left out might matter.

Checking the Model
Residual plots.
Diagnostics.
Lack of fit test.

Residual Plots
Always plot the unstandardized residuals against the unstandardized predicted values.
Plot the unstandardized residuals against each independent variable in the model.
If there is a time order to the data, plot the residuals in time order.

Diagnostics
Check for outliers.
Check for influential points.
–Cook’s distance is useful. Deleting the point with the largest Cook’s distance causes the greatest change in the coefficients.
Box plot of residuals.
Q-Q plot of residuals.
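Cook’s distance can be computed directly from the residuals and the leverages. The sketch below uses the standard formula D_i = e_i² h_ii / (k · MSE · (1 − h_ii)²), where k is the number of estimated coefficients including the intercept. The function name and the data, which contain one deliberately planted influential point, are hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for every observation in an OLS fit with an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    k = p + 1                                    # coefficients, intercept included
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    leverage = np.diag(hat)                      # h_ii
    residuals = y - hat @ y                      # e_i
    mse = float(residuals @ residuals) / (n - k)
    return residuals**2 * leverage / (k * mse * (1.0 - leverage) ** 2)

# Simulated data (hypothetical) with one planted influential point at index 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 1))
y = 1.0 + 2.0 * X[:, 0] + 0.3 * rng.normal(size=50)
X[0, 0] = 6.0      # far from the other x values (high leverage)
y[0] = -5.0        # far from the line followed by the other points

d = cooks_distance(X, y)
print(int(np.argmax(d)), float(d.max()))
```

Forming the full n × n hat matrix is fine for a few hundred points; for much larger data sets the leverages would normally be obtained from a QR decomposition instead.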

Lack of Fit Test
Requires replicated points (the same settings of the independent variables, with different runs determining the dependent variable).
Your data set has replicated points.
Design your studies so that you can do a lack of fit test.

Approximate Lack of Fit Test
Statistics, Compare Means, One-Way ANOVA.
The dependent variable is the residuals from the regression model that you think is correct.
The independent variable is the second column of your data set.
Click OK.

Interpretation of Approximate Lack of Fit Test
If the F statistic is near one (observed significance level large), then the model that generated the residuals “appears to be adequate.” That is, there is no empirical reason to go on.
If the F statistic is much larger than one (small observed significance level), the model should be improved.
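The approximate test can be mimicked in a few lines: fit the regression, then run a one-way ANOVA of the residuals across the replicated settings. The sketch below is only an illustration. It assumes a single predictor whose value identifies the replicate group (in the project data that role is played by the grouping column named on the slide), and the function names and simulated data are hypothetical.

```python
import numpy as np

def one_way_anova_f(values, groups):
    """One-way ANOVA F statistic of `values` across the levels of `groups`."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand_mean) ** 2
        for g in levels
    )
    ss_within = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum()
        for g in levels
    )
    df_between = len(levels) - 1
    df_within = len(values) - len(levels)
    return (ss_between / df_between) / (ss_within / df_within)

def line_residuals(x, y):
    """Residuals from a straight-line least-squares fit of y on x."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Simulated data (hypothetical): 10 settings of x, 5 replicates at each setting.
rng = np.random.default_rng(2)
x = np.repeat(np.arange(10.0), 5)
noise = rng.normal(size=x.size)
y_linear = 3.0 + 2.0 * x + noise                          # a straight line is correct
y_curved = 3.0 + 2.0 * x + 0.5 * (x - 4.5) ** 2 + noise   # a straight line is wrong

f_ok = one_way_anova_f(line_residuals(x, y_linear), x)
f_bad = one_way_anova_f(line_residuals(x, y_curved), x)
print(round(float(f_ok), 2), round(float(f_bad), 2))
```

The test is approximate because the one-way ANOVA gives the between-groups sum of squares c − 1 degrees of freedom for c distinct settings, whereas an exact lack of fit test would use c − p − 1, allowing for the coefficients already estimated.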

Theory behind Lack of Fit Test
One-way analysis of variance.
Covered next class.
Happy Thanksgiving.