Regression analysis and multiple regression: Here’s the beef* *Graphic kindly provided by Microsoft.


Generally, regression analysis is used with interval and ratio data. Regression analysis is a method of determining the specific function relating Y to X ===> Y = f(X). It is not really about cause and effect(s), but about how the independent variables combine to help predict the dependent variable. Regression is widely used in the social sciences. It provides a value called R² (R-squared), which tells how well a set of independent variables explains the dependent variable.

To explain means to reduce errors when predicting the dependent variable's scores on the basis of information about the independent variables. The regression results measure the direction and size of the effect of each variable on the dependent variable. The form of the regression line is Y = a + bX, where Y is the dependent variable, a is the intercept, b is the slope, and X is the independent variable.

Regression analysis examples: If we know Tara's IQ, what can we say about her prospects of satisfactorily completing a university degree? Knowing Nancy's prior voting record, can we make any informed guesses concerning her vote in the coming provincial election? Kendra is enrolled in a statistics course. If we know her score on the midterm exam, can we make a reasonably good estimate of her grade on the final?

There are several forms of regression analysis, depending on the complexity of the relationships being studied. The simplest is known as linear regression, which assumes a linear (straight-line) association between two variables. The straight line fitted through the points is called the regression line.

The regression line rarely cuts through all points in a distribution (e.g., picture a scatterplot). As such, we draw an approximate line showing the best possible linear representation of the several points. Recall from geometry that a straight line on a graph can be represented by the equation Y = a + bX.

To simplify our discussion, let's start with an example of two variables that are usually perfectly related: monthly salary and yearly income ===> Y = 12X. Let's add one more factor to this linear relationship: suppose that we received a Christmas bonus of $500 ===> Y = 500 + 12X. In this income example, the slope of the line is 12, which means that Y changes by 12 for each change of one unit in X.
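
As a quick arithmetic check, here is a minimal Python sketch (not part of the original slides) that evaluates the bonus-plus-salary line:

```python
def yearly_income(monthly_salary):
    """The slide's line with the $500 bonus: Y = 500 + 12X."""
    return 500 + 12 * monthly_salary

print(yearly_income(3000))  # 36500: twelve months of salary plus the bonus
```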

Example of linear regression: If we are interested in exploring the relationship between SEI and EDUC using linear regression, we would do the following. First, assign SEI as our dependent variable and EDUC as our independent variable. Run SPSS using Analyze-->Regression-->Linear. Interpret the output -- look only at R² and the unstandardized coefficients and their associated levels of significance.
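
The slides work through SPSS menus; as an aside, a roughly equivalent sketch in Python with statsmodels (the file name gss.csv and the data frame are hypothetical, not from the slides) would be:

```python
import pandas as pd
import statsmodels.formula.api as smf

gss = pd.read_csv("gss.csv")  # hypothetical extract containing SEI and EDUC

model = smf.ols("SEI ~ EDUC", data=gss).fit()  # ordinary least squares: SEI = a + b*EDUC

print(model.rsquared)   # R-squared
print(model.params)     # unstandardized coefficients: Intercept (constant) and EDUC slope
print(model.pvalues)    # significance levels of the coefficients
```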

Taking the unstandardized B coefficients for the constant and the variable EDUC gives us the following regression equation: SEI = a + (EDUC*3.917), where a is the constant (intercept) reported in the output. For example, the predicted SEI for someone with 18 years of education is SEI = a + (18*3.917) = 66.2.
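
Continuing the hypothetical statsmodels sketch above, the same prediction for 18 years of education can be read straight off the fitted model:

```python
import pandas as pd

print(model.predict(pd.DataFrame({"EDUC": [18]})))  # about 66.2 if the fit matches the slide
```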

Let’s look at Pearson’s r for SEI and EDUC

The scatterplot shows the relationship between SEI and EDUC.

Multiple regression example: If we believe that variables other than EDUC influence SEI, we could bring them into the model using stepwise multiple regression. Let's consider the influence of EDUC, AGE, and SEX. Now remember… we can only use interval/ratio variables in regression, and SEX is nominal. To get around this we need to use dummy-variable recoding for SEX.

Since SEX is coded 1=male and 2=female in the GSS, and we believe a priori that being male confers status advantages, we will code for "maleness." We want to recode so that male=1 and female=0. This allows us to treat male as 100% male and female as 0% male. Use Transform-->Recode-->Into different variables to create a new variable called SEX2. Run the regression by Analyze-->Regression-->Linear (make sure that method=stepwise).
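
For readers not using SPSS, a rough Python equivalent is sketched below (again with a hypothetical gss.csv; statsmodels has no built-in stepwise procedure, so the three nested models are simply fitted in the entry order reported on the next slide):

```python
import pandas as pd
import statsmodels.formula.api as smf

gss = pd.read_csv("gss.csv")  # hypothetical extract with SEI, EDUC, AGE, SEX

# Recode SEX (1=male, 2=female in the GSS) as SEX2 = "maleness": male=1, female=0
gss["SEX2"] = (gss["SEX"] == 1).astype(int)

m1 = smf.ols("SEI ~ EDUC", data=gss).fit()
m2 = smf.ols("SEI ~ EDUC + AGE", data=gss).fit()
m3 = smf.ols("SEI ~ EDUC + AGE + SEX2", data=gss).fit()

for m in (m1, m2, m3):
    print(m.params)
    print(m.rsquared)
```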

Model 1 (EDUC): SEI = a + (EDUC*3.919)
Model 2 (EDUC and AGE): SEI = a + (EDUC*4.017) + (AGE*.123)
Model 3 (EDUC, AGE, and SEX2): SEI = a + (EDUC*4.000) + (AGE*.124) + (SEX2*1.819)
In each model, a is the constant (intercept) reported in the SPSS output.

Example from Model 3: What is the predicted SEI score for a 40-year-old woman with 13 years of education? SEI = a + (13*4.000) + (40*.124) + (0*1.819). What is the predicted SEI score for a 25-year-old man with 18 years of education? SEI = a + (18*4.000) + (25*.124) + (1*1.819) = 65.09.
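
A small sketch of the same arithmetic in Python (the constant does not appear in this transcript, so it is left as an explicit argument to be taken from the regression output):

```python
def predict_sei_model3(a, educ, age, sex2):
    """Model 3 from the slide; a is the constant (intercept) from the SPSS output."""
    return a + educ * 4.000 + age * 0.124 + sex2 * 1.819

# e.g. predict_sei_model3(a, 13, 40, 0) for the 40-year-old woman and
#      predict_sei_model3(a, 18, 25, 1) for the 25-year-old man.
```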

Multiple regression: with two independent variables, the fitted relationship is viewed as a plane rather than a line.

There are several assumptions associated with using a multiple regression model: linearity; equal variance: variation around the regression line is constant (known as homoscedasticity); normality: errors are normally distributed; and independence: different errors are sampled independently. Multicollinearity occurs when independent variables are highly correlated (usually over .80).
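
These assumptions are usually checked by examining the residuals; a minimal sketch (hypothetical data as before, with matplotlib and scipy assumed available) might look like:

```python
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from scipy import stats

gss = pd.read_csv("gss.csv")  # hypothetical data, as in the earlier sketches
model = smf.ols("SEI ~ EDUC + AGE", data=gss).fit()

# Linearity and equal variance: residuals should scatter evenly around zero
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()

# Normality of the errors: quantile-quantile plot of the residuals
stats.probplot(model.resid, plot=plt)
plt.show()
```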

Dummy regression analysis: Multiple regression accommodates several quantitative independent variables, but frequently the independent variables of interest are qualitative. Dummy-variable regressors permit the effects of qualitative independent variables to be incorporated into a regression equation. Suppose that, along with a quantitative independent variable X, there is a two-category (dichotomous) independent variable thought to influence the dependent variable Y. For example, if Y is income, X may be years of education and the qualitative independent variable may be gender.
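
In the notation used earlier, such a model can be sketched with D as a 0/1 dummy regressor for the dichotomous variable: Y = a + bX + cD. Setting D = 0 gives Y = a + bX for one group, and setting D = 1 gives Y = (a + c) + bX for the other, i.e., two parallel regression lines whose intercepts differ by c.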

Dummy-variable coding of a polytomous independent variable: When a qualitative independent variable has several categories (polytomous), its effects can be captured by coding a set of dummy regressors. A variable with m categories gives rise to m-1 dummy variables. For example, to add region effects to a regression in which income is the dependent variable and education and labour-force experience are quantitative independent variables:

Dummy regressors:
Region    D1  D2  D3  D4
East       1   0   0   0
Quebec     0   1   0   0
Ontario    0   0   1   0
Prairie    0   0   0   1
B.C.*      0   0   0   0
* arbitrary reference or baseline category
Thus the model represents 5 parallel regression planes, one for each region.
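
A possible way to build these regressors with pandas (hypothetical data frame, not part of the original slides):

```python
import pandas as pd

df = pd.DataFrame({"region": ["East", "Quebec", "Ontario", "Prairie", "B.C."]})

# One 0/1 indicator per region, then drop the baseline category (B.C.)
dummies = pd.get_dummies(df["region"]).astype(int).drop(columns=["B.C."])
print(dummies)  # m-1 = 4 regressors, as in the table above
```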

Diagnosing and correcting problems in regression. Collinearity: When there is a perfect linear relationship among the independent variables in a regression, the least-squares regression coefficients are not uniquely defined. Strong, but less than perfect, collinearity (sometimes called multicollinearity) doesn't prevent the least-squares coefficients from being calculated, but makes them unstable: coefficient standard errors are big; small changes in the data (due even to rounding errors) can cause large changes in the regression coefficients.

The variance inflation factor (VIF) measures the extent to which collinearity inflates the sampling variance of a coefficient: for the j-th independent variable, VIF_j = 1/(1 - R_j²), where R_j² is the R² from regressing that variable on all of the other independent variables. The VIF is at a minimum (1) when R_j² = 0 and at a maximum (infinity) when R_j² = 1. Caveat: the VIF is not very useful when an effect is spread over several degrees of freedom. Collinearity is a data problem; it does not imply that the model is wrong, only that the data are incapable of providing good estimates of the model parameters. If, for example, X1 and X2 are perfectly correlated in a set of data, it's impossible to separate their effects.
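
A minimal sketch of computing VIFs with statsmodels for the regressors used earlier (same hypothetical data):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

gss = pd.read_csv("gss.csv")  # hypothetical data, as before
gss["SEX2"] = (gss["SEX"] == 1).astype(int)
X = sm.add_constant(gss[["EDUC", "AGE", "SEX2"]])

for i in range(1, X.shape[1]):  # skip the constant column
    print(X.columns[i], variance_inflation_factor(X.values, i))
```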

There are, however, several strategies for coping with collinear data. Give up: this is an honest, if unsatisfying, answer. Collect new data. Reconsider the model: perhaps X1 and X2 are better conceived as alternative measures of the same construct, in which case their high correlation is indicative of high reliability; get rid of one of them, or combine them in some manner (e.g., into an index).