A first order model with one binary and one quantitative predictor variable.

Examples of binary predictor variables
- Gender (male, female)
- Smoking status (smoker, nonsmoker)
- Treatment (yes, no)
- Health status (diseased, healthy)

On average, do smoking mothers have babies with lower birth weight?
- Random sample of n = 32 births
- y = birth weight of baby (in grams)
- x1 = length of gestation (in weeks)
- x2 = smoking status of mother (yes, no)

Coding the binary (two-group qualitative) predictor
Using a (0,1) indicator variable:
- xi2 = 1, if mother smokes
- xi2 = 0, if mother does not smoke
Other terms used: dummy variable, binary variable.
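As a minimal sketch in plain Python (the status values here are hypothetical, not the study's data), the (0,1) indicator can be built directly from the recorded smoking status:

```python
# Hypothetical smoking-status records for four mothers
smoking_status = ["yes", "no", "no", "yes"]

# xi2 = 1 if the mother smokes, xi2 = 0 if she does not
x2 = [1 if status == "yes" else 0 for status in smoking_status]
print(x2)  # [1, 0, 0, 1]
```

Statistical packages build such columns automatically, but the coding itself is nothing more than this.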

On average, do smoking mothers have babies with lower birth weight?

A first order model with one binary and one quantitative predictor

yi = β0 + β1xi1 + β2xi2 + εi

where …
- yi is birth weight of baby i
- xi1 is length of gestation of baby i
- xi2 = 1, if mother smokes and xi2 = 0, if not

and … the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

An indicator variable for 2 groups yields 2 response functions:
- If mother is a smoker (xi2 = 1): μY = (β0 + β2) + β1xi1
- If mother is a nonsmoker (xi2 = 0): μY = β0 + β1xi1
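With made-up coefficient values (for illustration only, not the slide's estimates), a short Python check confirms that the two response functions are parallel lines whose vertical separation is the indicator's coefficient:

```python
# Hypothetical coefficients, for illustration only
b0, b1, b2 = -2000.0, 140.0, -250.0

def mean_weight(gest, smoker):
    """One first-order function; the (0,1) indicator shifts the intercept."""
    return b0 + b1 * gest + b2 * smoker

# The gap between the smoker and nonsmoker lines is b2,
# regardless of gestation length:
gaps = [mean_weight(g, 1) - mean_weight(g, 0) for g in (34, 38, 42)]
print(gaps)  # [-250.0, -250.0, -250.0]
```

The slope on gestation is the same for both groups; only the intercept differs.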

Interpretation of the regression coefficients:
- β1 represents the change in the mean response μY for each additional unit increase in the quantitative predictor x1 … for both groups.
- β2 represents how much higher (or lower) the mean response function for the second group is than the one for the first group … for any value of x1.

The estimated regression function

The regression equation is
Weight = Gest Smoking

The regression equation is
Weight = Gest Smoking

Predictor  Coef  SE Coef  T  P
Constant
Gest
Smoking

S =   R-Sq = 89.6%   R-Sq(adj) = 88.9%

A significant difference in mean birth weights for the two groups?

Why not instead fit two separate regression functions? One for the smokers and one for the nonsmokers?

Using indicator variable, fitting one function to 32 data points

The regression equation is
Weight = Gest Smoking

Predictor  Coef  SE Coef  T  P
Constant
Gest
Smoking

S =   R-Sq = 89.6%   R-Sq(adj) = 88.9%

Using indicator variable, fitting one function to 32 data points

Predicted Values for New Observations
New Obs  Fit  SE Fit  95.0% CI     95.0% PI
                      (2740.6, )   (2559.1, )
                      (2989.1, )   (2804.7, )

Values of Predictors for New Observations
New Obs  Gest  Smoking

Fitting function to 16 nonsmokers

The regression equation is
Weight = Gest

Predictor  Coef  SE Coef  T  P
Constant
Gest

S =   R-Sq = 91.5%   R-Sq(adj) = 90.9%

Fitting function to 16 nonsmokers

Predicted Values for New Observations
New Obs  Fit  SE Fit  95.0% CI     95.0% PI
                      (2990.3, )   (2811.3, )

Values of Predictors for New Observations
New Obs  Gest

Fitting function to 16 smokers

The regression equation is
Weight = Gest

Predictor  Coef  SE Coef  T  P
Constant
Gest

S =   R-Sq = 87.4%   R-Sq(adj) = 86.5%

Fitting function to 16 smokers

Predicted Values for New Observations
New Obs  Fit  SE Fit  95.0% CI     95.0% PI
                      (2731.7, )   (2526.4, )

Values of Predictors for New Observations
New Obs  Gest

Summary table

Model estimated using…   SE(Gest)   Length of CI for μY: (NS)  (S)
32 data points           9.128
nonsmokers
smokers

Reasons to “pool” the data and to fit one regression function
- Model assumes equal slopes for the groups and equal variances for all error terms. It makes sense to use all of the data to estimate these quantities.
- More degrees of freedom associated with MSE, so confidence intervals that are a function of MSE tend to be narrower.

How to answer the research question using one regression function?

The regression equation is
Weight = Gest Smoking

Predictor  Coef  SE Coef  T  P
Constant
Gest
Smoking

S =   R-Sq = 89.6%   R-Sq(adj) = 88.9%

How to answer the research question using two regression functions?

Nonsmokers:
The regression equation is
Weight = Gest

Predictor  Coef  SE Coef  T  P
Constant
Gest

Smokers:
The regression equation is
Weight = Gest

Predictor  Coef  SE Coef  T  P
Constant
Gest

Reasons to “pool” the data and to fit one regression function It allows you to easily answer research questions concerning the binary predictor variable.

What if we instead tried to use two indicator variables? One variable for smokers and one variable for nonsmokers?

Definition of two indicator variables – one for each group
Using a (0,1) indicator variable for smokers:
- xi2 = 1, if mother smokes
- xi2 = 0, if mother does not smoke
Using a (0,1) indicator variable for nonsmokers:
- xi3 = 1, if mother does not smoke
- xi3 = 0, if mother smokes

The modified regression function with two binary predictors

μY = β0 + β1xi1 + β2xi2 + β3xi3

where …
- μY is mean birth weight for given predictors
- xi1 is length of gestation of baby i
- xi2 = 1, if smokes and xi2 = 0, if not
- xi3 = 1, if does not smoke and xi3 = 0, if smokes

Implication on data analysis

Regression Analysis: Weight versus Gest, x2*, x3*

* x3* is highly correlated with other X variables
* x3* has been removed from the equation

The regression equation is
Weight = Gest x2*

Predictor  Coef  SE Coef  T  P
Constant
Gest
x2*

S =   R-Sq = 89.6%   R-Sq(adj) = 88.9%

To prevent problems with the data analysis
A qualitative variable with c groups should be represented by c-1 indicator variables, each taking on values 0 and 1.
- 2 groups, 1 indicator variable
- 3 groups, 2 indicator variables
- 4 groups, 3 indicator variables
- and so on…
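A sketch of the c-1 rule in plain Python (a hypothetical helper; statistical software builds these columns automatically):

```python
def indicator_columns(groups):
    """Build c-1 (0,1) indicator columns for a c-group qualitative
    variable, dropping one level to avoid perfect collinearity."""
    levels = sorted(set(groups))
    return {level: [1 if g == level else 0 for g in groups]
            for level in levels[1:]}  # keep c-1 of the c levels

cols = indicator_columns(["low", "mid", "high", "mid"])
print(sorted(cols))  # ['low', 'mid'] -- 3 groups -> 2 indicators
```

The dropped level ("high" here, the first in sorted order) becomes the baseline group: an observation with all indicators equal to 0.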

What is the impact of using a different coding scheme? … such as (1, -1) coding?

The regression model defined using (1, -1) coding scheme

yi = β0 + β1xi1 + β2xi2 + εi

where …
- yi is birth weight of baby i
- xi1 is length of gestation of baby i
- xi2 = 1, if mother smokes and xi2 = -1, if not

and … the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².

The regression model yields 2 different response functions:
- If mother is a smoker (xi2 = 1): μY = (β0 + β2) + β1xi1
- If mother is a nonsmoker (xi2 = -1): μY = (β0 - β2) + β1xi1

Interpretation of the regression coefficients:
- β1 represents the change in the mean response μY for each additional unit increase in the quantitative predictor x1 … for both groups.
- β0 represents the “average” intercept.
- β2 represents how far each group is “offset” from the “average”.
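A quick numeric check of this reading, using made-up coefficient values: under (1, -1) coding the two group intercepts are β0 + β2 and β0 - β2, so β0 sits exactly halfway between them.

```python
# Hypothetical (1, -1)-coding coefficients, for illustration only
b0, b1, b2 = -2125.0, 140.0, -125.0

smoker_intercept = b0 + b2      # group with x2 = 1
nonsmoker_intercept = b0 - b2   # group with x2 = -1

# b0 is the average of the two group intercepts,
# and b2 is each group's offset from that average:
average = (smoker_intercept + nonsmoker_intercept) / 2
print(average == b0)  # True
```

Note the contrast with (0,1) coding, where the corresponding coefficient is the full gap between the groups rather than half of it.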

The estimated regression function

The regression equation is
Weight = Gest Smoking2

What is the impact of using a different coding scheme?
- Interpretation of the regression coefficients changes.
- When reporting your results, make sure you explain what coding scheme was used!
- When interpreting others’ results, make sure you know what coding scheme was used!