Dummy Variables


1 Dummy Variables

2 Topics for This Chapter
1. Intercept Dummy Variables
2. Slope Dummy Variables
3. Different Intercepts & Slopes
4. Testing Qualitative Effects
5. Are Two Regressions Equal?
6. Interaction Effects

3 Dummy variables
• Dummy variables, often called binary or dichotomous variables, are explanatory variables that take only two values, usually 0 and 1.
• These simple variables are a very powerful tool for capturing qualitative characteristics of individuals, such as gender, race, and geographic region of residence.
• In general, we use dummy variables to describe any event that has only two possible outcomes.

4 Intercept Dummy Variables
Dummy variables are binary (0, 1): $D_t = 1$ if red car, $D_t = 0$ otherwise.
$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \varepsilon_t$
$y_t$ = speed of car in miles per hour; $X_t$ = age of car in years.
Police: red cars travel faster. $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$

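5 $y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \varepsilon_t$
Red cars ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + \varepsilon_t$
Other cars ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
[Figure: speed $y_t$ (miles per hour) against age $X_t$ (years). Two parallel lines with slope $\beta_2$: red cars with intercept $\beta_1 + \beta_3$, other cars with intercept $\beta_1$.]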
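The intercept-dummy model above can be tried out on simulated data. The sketch below is only an illustration: the sample size, the "true" parameter values, and the variable names `age`, `red`, and `speed` are assumptions made for the example, not figures from the slides. It estimates $\beta_1, \beta_2, \beta_3$ by least squares and forms the $t$ statistic for $H_0: \beta_3 = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
age = rng.uniform(0, 15, size=T)            # X_t: age of car in years
red = (rng.random(T) < 0.5).astype(float)   # D_t: 1 if red car, 0 otherwise

# Hypothetical parameters, used only to generate illustrative data
beta1, beta2, beta3 = 70.0, -1.0, 5.0
speed = beta1 + beta2 * age + beta3 * red + rng.normal(0, 2.0, size=T)

# Design matrix [1, X_t, D_t] and least-squares estimates b1, b2, b3
X = np.column_stack([np.ones(T), age, red])
b, *_ = np.linalg.lstsq(X, speed, rcond=None)

# Usual OLS covariance estimate: sigma^2 (X'X)^{-1}
resid = speed - X @ b
sigma2 = resid @ resid / (T - X.shape[1])
cov_b = sigma2 * np.linalg.inv(X.T @ X)
t_b3 = b[2] / np.sqrt(cov_b[2, 2])          # t statistic for H0: beta3 = 0

print("intercept, other cars:", b[0])
print("intercept, red cars:  ", b[0] + b[2])
print("t statistic for b3:   ", t_b3)
```

A large positive `t_b3` would lead us to reject $H_0$ in favour of $H_1: \beta_3 > 0$ when compared with the one-sided critical value of the $t_{T-3}$ distribution.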

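6 Slope Dummy Variables
$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t X_t + \varepsilon_t$
Stock portfolio ($D_t = 1$): $y_t = \beta_1 + (\beta_2 + \beta_3) X_t + \varepsilon_t$
Bond portfolio ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
$\beta_1$ = initial investment
[Figure: value of portfolio $y_t$ against years $X_t$. Two lines from the common intercept $\beta_1$: stocks with slope $\beta_2 + \beta_3$, bonds with slope $\beta_2$.]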

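7 Different Intercepts & Slopes
$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + \varepsilon_t$
Miracle seed ($D_t = 1$): $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + \varepsilon_t$
Regular seed ($D_t = 0$): $y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
[Figure: harvest weight of corn $y_t$ against rainfall $X_t$. Miracle seed line with intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$; regular seed line with intercept $\beta_1$ and slope $\beta_2$.]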
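A quick numerical check of the model with both an intercept and a slope dummy: fitting the single equation with $D_t$ and $D_t X_t$ gives the same two lines as fitting each group separately. The data, the parameter values, and the names `rain` and `miracle` below are hypothetical, chosen only for this sketch.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
T = 120
rain = rng.uniform(10, 40, size=T)                 # X_t: rainfall
miracle = (np.arange(T) % 2 == 0).astype(float)    # D_t: 1 for Miracle seed

# Hypothetical parameters, used only to simulate harvest weights
y = 20 + 1.5 * rain + 4 * miracle + 0.5 * miracle * rain + rng.normal(0, 3, T)

# One regression with intercept dummy and slope dummy
X_full = np.column_stack([np.ones(T), rain, miracle, miracle * rain])
b1, b2, b3, b4 = ols(X_full, y)

# Two separate regressions, one per seed type
m = miracle == 1
regular_fit = ols(np.column_stack([np.ones((~m).sum()), rain[~m]]), y[~m])
miracle_fit = ols(np.column_stack([np.ones(m.sum()), rain[m]]), y[m])

print("regular: ", regular_fit, "vs", (b1, b2))
print("miracle: ", miracle_fit, "vs", (b1 + b3, b2 + b4))
```

The separate fits reproduce $(b_1, b_2)$ for regular seed and $(b_1 + b_3, b_2 + b_4)$ for Miracle seed, which is the algebra on the slide.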

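8 Testing for discrimination in starting wage
$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \varepsilon_t$
For men $D_t = 1$; for women $D_t = 0$.
Men: $y_t = (\beta_1 + \beta_3) + \beta_2 X_t + \varepsilon_t$
Women: $y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
$H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$
[Figure: wage rate $y_t$ against years of experience $X_t$. Two parallel lines with slope $\beta_2$: men with intercept $\beta_1 + \beta_3$, women with intercept $\beta_1$.]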

9 $y_t = \beta_1 + \beta_5 X_t + \beta_6 D_t X_t + \varepsilon_t$
For men $D_t = 1$; for women $D_t = 0$.
Men: $y_t = \beta_1 + (\beta_5 + \beta_6) X_t + \varepsilon_t$
Women: $y_t = \beta_1 + \beta_5 X_t + \varepsilon_t$
Men and women have the same starting wage, $\beta_1$, but their wage rates increase at different rates when $\beta_6 \neq 0$. $\beta_6 > 0$ means that men's wage rates are increasing faster than women's wage rates.
[Figure: wage rate $y_t$ against years of experience $X_t$. Two lines from the common intercept $\beta_1$: men with slope $\beta_5 + \beta_6$, women with slope $\beta_5$.]

10 An Ineffective Affirmative Action Plan
$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + \varepsilon_t$
Men: $y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + \varepsilon_t$
Women: $y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
Women are given a higher starting wage, $\beta_1$, while men get the lower starting wage, $\beta_1 + \beta_3$ (note: $\beta_3 < 0$). But men get a faster rate of increase in their wages, $\beta_2 + \beta_4$, which is higher than the rate of increase for women, $\beta_2$ (since $\beta_4 > 0$).
[Figure: wage rate $y_t$ against years of experience $X_t$. Women's line with intercept $\beta_1$ and slope $\beta_2$; men's line with the lower intercept $\beta_1 + \beta_3$ and the steeper slope $\beta_2 + \beta_4$.]

11 Testing Qualitative Effects
1. Test for differences in intercept.
2. Test for differences in slope.
3. Test for differences in both intercept and slope.

12 $Y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + \varepsilon_t$
men: $D_t = 1$; women: $D_t = 0$
Testing for discrimination in starting wage (intercept): $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 > 0$
$t = \dfrac{b_3 - 0}{\sqrt{\widehat{\mathrm{Var}}(b_3)}} \sim t_{T-4}$
Testing for discrimination in wage increases (slope): $H_0: \beta_4 = 0$ vs. $H_1: \beta_4 > 0$
$t = \dfrac{b_4 - 0}{\sqrt{\widehat{\mathrm{Var}}(b_4)}} \sim t_{T-4}$

13 Testing intercept and slope jointly
$H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise
$SSE_R = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t)^2$ (restricted model)
$SSE_U = \sum_{t=1}^{T} (y_t - b_1 - b_2 X_t - b_3 D_t - b_4 D_t X_t)^2$ (unrestricted model)
$F = \dfrac{(SSE_R - SSE_U)/2}{SSE_U/(T-4)} \sim F_{2,\,T-4}$
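The joint test can be computed directly from the two sums of squared errors. The following is a minimal sketch on simulated wage data; the parameter values and the names `exper`, `male`, and `wage` are assumptions made for illustration.

```python
import numpy as np

def sse(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(2)
T = 100
exper = rng.uniform(0, 30, T)                  # X_t: years of experience
male = (np.arange(T) < T // 2).astype(float)   # D_t: 1 for men, 0 for women

# Hypothetical parameters, used only to simulate wages
wage = 10 + 0.4 * exper + 2.0 * male + 0.1 * male * exper + rng.normal(0, 1.5, T)

ones = np.ones(T)
sse_r = sse(np.column_stack([ones, exper]), wage)                       # restricted
sse_u = sse(np.column_stack([ones, exper, male, male * exper]), wage)   # unrestricted

J, K = 2, 4   # two restrictions, four unrestricted coefficients
F = ((sse_r - sse_u) / J) / (sse_u / (T - K))
print("SSE_R =", sse_r, "SSE_U =", sse_u, "F =", F)
```

The computed `F` is compared with the critical value of the $F_{2,\,T-4}$ distribution; a value above it rejects $H_0: \beta_3 = \beta_4 = 0$.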

14 The University Effect on House Prices
• A real estate economist collects data on two similar neighborhoods, one bordering a large state university and one about 3 miles from the university.
• The economist records 1000 observations.
• Dependent variable: house price, in dollars.
• Independent variables:
  • SQFT is the number of square feet of living area.
  • AGE is the age of the house (years).
  • UTOWN = 1 for homes near the university, 0 otherwise.
  • USQFT = SQFT × UTOWN.
  • POOL = 1 if a pool is present, 0 otherwise.
  • FPLACE = 1 if a fireplace is present, 0 otherwise.

15 • We anticipate that all the coefficients in this model will be positive except the coefficient on AGE, which is an estimate of the effect of age (or depreciation) on house price.
• The model R-squared = and the overall-F statistic value is F=
[Regression output table. Columns: Variable, DF, Parameter Estimate, Standard Error, T for H0: Parameter=0, Prob > |T|. Rows: INTERCEP, UTOWN, SQFT, USQFT, AGE, POOL, FPLACE.]

16 Based on these regression estimates, what do we conclude?
• We estimate the location premium, for lots near the university, to be $27,453.
• We estimate the price per square foot to be $89.11 for houses near the university (the $76.12 base price plus the $12.994 USQFT coefficient), and $76.12 for houses in other areas.
• We estimate that houses depreciate $ per year
• We estimate that a pool increases the value of a home by $
• We estimate that a fireplace increases the value of a home by $

17 Are Two Regressions Equal?
Chow Test (there are two alternative ways)
I. Restricted versus Unrestricted Models
$y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + \varepsilon_t$
$y_t$ = wage rate; $X_t$ = years of experience
men: $D_t = 1$; women: $D_t = 0$
$H_0: \beta_3 = \beta_4 = 0$ vs. $H_1$: otherwise

18 II. Get $SSE_U$ separately (running three regressions)
Everyone: $y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$, giving $SSE_R$ (forcing men and women to have the same $\beta_1, \beta_2$).
Men only: $y_{tm} = \delta_1 + \delta_2 X_{tm} + \varepsilon_{tm}$, giving $SSE_m$.
Women only: $y_{tw} = \gamma_1 + \gamma_2 X_{tw} + \varepsilon_{tw}$, giving $SSE_w$.
Allowing men and women to be different: $SSE_U = SSE_m + SSE_w$.
$F = \dfrac{(SSE_R - SSE_U)/J}{SSE_U/(T-K)}$
$J$ = number of restrictions = 2; $K$ = number of unrestricted coefficients = 4.
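The two routes to $SSE_U$ should agree: adding the residual sums of squares from the men-only and women-only regressions gives the same number as the single regression with intercept and slope dummies. The sketch below checks this on simulated data; all parameter values and names are hypothetical.

```python
import numpy as np

def sse(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def with_const(x):
    """Design matrix with a column of ones followed by x."""
    return np.column_stack([np.ones(len(x)), x])

rng = np.random.default_rng(3)
T = 100
exper = rng.uniform(0, 30, T)          # X_t: years of experience
male = np.arange(T) % 2 == 0           # True for men, False for women

# Hypothetical parameters, used only to simulate wages
wage = 10 + 0.4 * exper + np.where(male, 2.0 + 0.1 * exper, 0.0) + rng.normal(0, 1.5, T)

# Three regressions: everyone, men only, women only
sse_r = sse(with_const(exper), wage)
sse_m = sse(with_const(exper[male]), wage[male])
sse_w = sse(with_const(exper[~male]), wage[~male])
sse_u = sse_m + sse_w

# Same SSE_U from the single dummy-variable regression
D = male.astype(float)
sse_u_dummy = sse(np.column_stack([np.ones(T), exper, D, D * exper]), wage)

J, K = 2, 4
F = ((sse_r - sse_u) / J) / (sse_u / (T - K))
print("SSE_m + SSE_w =", sse_u, " dummy-regression SSE_U =", sse_u_dummy, " F =", F)
```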

19 Interaction Variables
1. Interaction Dummies
2. Polynomial Terms (special case of continuous interaction)
3. Interaction Among Continuous Variables

20 Interactions Between Qualitative Factors
• Suppose we are estimating a wage equation, in which an individual's wages are explained as a function of their experience, skill, and other factors related to productivity.
• It is customary to include dummy variables for race and gender in such equations.
• Including just race and gender dummies will not capture interactions between these qualitative factors. Special wage treatment for being "white" and "male" is not captured by separate race and gender dummies.
• To allow for such a possibility, consider the following specification, where for simplicity we use only experience (EXP) as a productivity measure:

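21 $\text{Wage} = \beta_1 + \beta_2\,\text{EXP} + \delta_1\,\text{RACE} + \delta_2\,\text{SEX} + \gamma\,(\text{RACE} \times \text{SEX}) + \varepsilon$
where
$\delta_1$ measures the effect of race,
$\delta_2$ measures the effect of gender,
$\gamma$ measures the effect of being "white" and "male."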

22 1. Interaction Dummies
Wage Gap between Men and Women
$y_t$ = wage rate; $X_t$ = experience
For men $M_t = 1$; for women $M_t = 0$. For black $B_t = 1$; for nonblack $B_t = 0$.
No interaction (wage gap assumed the same): $y_t = \beta_1 + \beta_2 X_t + \beta_3 M_t + \beta_4 B_t + \varepsilon_t$
Interaction (wage gap depends on race): $y_t = \beta_1 + \beta_2 X_t + \beta_3 M_t + \beta_4 B_t + \beta_5 M_t B_t + \varepsilon_t$
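In the interaction model, the men-minus-women wage gap at a given experience level is $\beta_3 + \beta_5 B_t$: it equals $\beta_3$ for nonblack workers and $\beta_3 + \beta_5$ for black workers. The sketch below illustrates this with noise-free data generated from hypothetical coefficients (the numbers are assumptions, not estimates from the slides).

```python
import numpy as np

# Hypothetical coefficients beta1..beta5, chosen only for illustration
beta = np.array([12.0, 0.5, 3.0, -2.0, -1.0])

# Every gender-by-race cell at ten experience levels, with no noise
X_t, M_t, B_t = np.meshgrid(np.arange(0, 10.0), [0.0, 1.0], [0.0, 1.0], indexing="ij")
X_t, M_t, B_t = X_t.ravel(), M_t.ravel(), B_t.ravel()
design = np.column_stack([np.ones_like(X_t), X_t, M_t, B_t, M_t * B_t])
y = design @ beta

# Least squares recovers the coefficients exactly from noise-free data
b, *_ = np.linalg.lstsq(design, y, rcond=None)

def gender_gap(b, black):
    """Men-minus-women wage gap at a given value of B_t: b3 + b5 * B_t."""
    return b[2] + b[4] * black

print("gap for nonblack workers:", gender_gap(b, 0))
print("gap for black workers:   ", gender_gap(b, 1))
```

Without the $M_t B_t$ term the gap would be forced to equal $\beta_3$ in both groups.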

23 2. Polynomial Terms
Polynomial Regression
$y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + \varepsilon_t$
Linear in parameters but nonlinear in variables.
$y_t$ = income; $X_t$ = age
[Figure: income $y_t$ against age $X_t$.] People retire at different ages or not at all.

24 Polynomial Regression
$y_t = \beta_1 + \beta_2 X_t + \beta_3 X_t^2 + \beta_4 X_t^3 + \varepsilon_t$
$y_t$ = income; $X_t$ = age
Rate at which income changes as we age:
$\dfrac{\partial y_t}{\partial X_t} = \beta_2 + 2\beta_3 X_t + 3\beta_4 X_t^2$
The slope changes as $X_t$ changes.
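To see the changing slope concretely, the sketch below evaluates $\beta_2 + 2\beta_3 X_t + 3\beta_4 X_t^2$ at two ages and compares it with a finite-difference approximation. The coefficient values are hypothetical, picked only so that income rises early in life and falls later.

```python
# Hypothetical coefficients (income in thousands), for illustration only
B1, B2, B3, B4 = 5.0, 2.0, 0.05, -0.001

def income(age):
    """Cubic polynomial regression function (error term omitted)."""
    return B1 + B2 * age + B3 * age**2 + B4 * age**3

def slope(age):
    """Analytic derivative: beta2 + 2*beta3*X + 3*beta4*X^2."""
    return B2 + 2 * B3 * age + 3 * B4 * age**2

def numeric_slope(age, h=1e-3):
    """Central finite-difference approximation to the derivative."""
    return (income(age + h) - income(age - h)) / (2 * h)

for age in (30, 60):
    print(age, slope(age), numeric_slope(age))
```

With these assumed values the slope is positive at age 30 and negative at age 60, so the same regression allows income to rise and then fall with age.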

25 3. Continuous Interaction
$y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + \varepsilon_t$
Exam grade = f(sleep: $Z_t$, study time: $B_t$)
Sleep and study time do not act independently. More study time will be more effective when combined with more sleep and less effective when combined with less sleep.

26 Continuous interaction
$y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + \varepsilon_t$
Exam grade = f(sleep: $Z_t$, study time: $B_t$)
$\dfrac{\partial y_t}{\partial B_t} = \beta_3 + \beta_4 Z_t$: your studying is more effective with more sleep.
$\dfrac{\partial y_t}{\partial Z_t} = \beta_2 + \beta_4 B_t$: your mind sorts things out while you sleep (when you have things to sort out).
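Differentiating the interaction model gives a marginal effect of study time of $\beta_3 + \beta_4 Z_t$ and a marginal effect of sleep of $\beta_2 + \beta_4 B_t$. Because the model is linear in each variable separately, a one-hour change reproduces these effects exactly, which the sketch below checks. The coefficient values are hypothetical.

```python
# Hypothetical coefficients (beta1, beta2, beta3, beta4), for illustration only
BETA = (20.0, 1.0, 2.0, 0.125)

def grade(z, b, beta=BETA):
    """Expected exam grade given sleep z and study time b (hours)."""
    b1, b2, b3, b4 = beta
    return b1 + b2 * z + b3 * b + b4 * z * b

def effect_of_study(z, beta=BETA):
    """Marginal effect of one more hour of study: beta3 + beta4 * Z."""
    return beta[2] + beta[3] * z

def effect_of_sleep(b, beta=BETA):
    """Marginal effect of one more hour of sleep: beta2 + beta4 * B."""
    return beta[1] + beta[3] * b

# An extra hour of study is worth more to a well-rested student
print(effect_of_study(4), effect_of_study(8))
print(grade(8, 6) - grade(8, 5))   # equals effect_of_study(8)
```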

27 $y_t = \beta_1 + \beta_2 Z_t + \beta_3 B_t + \beta_4 Z_t B_t + \varepsilon_t$
Exam grade = f(sleep: $Z_t$, study time: $B_t$)
If $Z_t + B_t = 24$ hours, then $B_t = 24 - Z_t$:
$y_t = \beta_1 + \beta_2 Z_t + \beta_3 (24 - Z_t) + \beta_4 Z_t (24 - Z_t) + \varepsilon_t$
$y_t = (\beta_1 + 24\beta_3) + (\beta_2 - \beta_3 + 24\beta_4) Z_t - \beta_4 Z_t^2 + \varepsilon_t$
$y_t = \delta_1 + \delta_2 Z_t + \delta_3 Z_t^2 + \varepsilon_t$
Sleep needed to maximize your exam grade:
$\dfrac{\partial y_t}{\partial Z_t} = \delta_2 + 2\delta_3 Z_t = 0$, where $\delta_2 > 0$ and $\delta_3 < 0$
$Z_t = -\dfrac{\delta_2}{2\delta_3}$
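The substitution $B_t = 24 - Z_t$ collapses the interaction model into a quadratic in sleep, with $\delta_1 = \beta_1 + 24\beta_3$, $\delta_2 = \beta_2 - \beta_3 + 24\beta_4$, and $\delta_3 = -\beta_4$. The sketch below works through the arithmetic for one set of hypothetical coefficients (assumed values, chosen so the answer comes out as a round number).

```python
# Hypothetical coefficients of the interaction model, for illustration only
b1, b2, b3, b4 = 20.0, 1.0, 2.0, 0.125

# Coefficients of the quadratic in sleep after substituting B = 24 - Z
delta1 = b1 + 24 * b3
delta2 = b2 - b3 + 24 * b4
delta3 = -b4

def grade(z):
    """Expected grade when the 24 hours are split into sleep z and study 24 - z."""
    return b1 + b2 * z + b3 * (24 - z) + b4 * z * (24 - z)

def grade_quadratic(z):
    """Same function written as delta1 + delta2*Z + delta3*Z^2."""
    return delta1 + delta2 * z + delta3 * z**2

# First-order condition: delta2 + 2*delta3*Z = 0
z_star = -delta2 / (2 * delta3)
print("sleep that maximizes the grade:", z_star)
print("grade at the optimum:", grade(z_star))
```

The maximum requires $\delta_3 < 0$, that is $\beta_4 > 0$ in the original model, which holds for the values assumed here.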

28 Qualitative Variables with Several Categories
• Many qualitative factors have more than two categories.
• Examples are region of the country (North, South, East, West) and level of educational attainment (less than high school, high school, college, postgraduate).
• For each category we create a separate binary dummy variable.
• To illustrate, let us again use a wage equation as an example, and focus only on experience and level of educational attainment (as a proxy for skill) as explanatory variables.

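29 Define dummies for educational attainment as follows:
$E_0 = 1$ if less than high school, 0 otherwise
$E_1 = 1$ if high school diploma, 0 otherwise
$E_2 = 1$ if college degree, 0 otherwise
$E_3 = 1$ if postgraduate degree, 0 otherwise
Specify the wage equation as
$\text{Wage} = \beta_1 + \beta_2\,\text{EXP} + \delta_1 E_1 + \delta_2 E_2 + \delta_3 E_3 + \varepsilon$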

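30 • First notice that we have not included all the dummy variables for educational attainment. Doing so would have created a model in which exact collinearity exists.
• Since the educational categories are exhaustive, the sum of the education dummies is equal to 1. Thus the "intercept variable" is an exact linear combination of the education dummies.
• The usual solution to this problem is to omit one dummy variable, which defines a reference group, as we shall see by examining the regression function:
$E(\text{Wage}) = \beta_1 + \beta_2\,\text{EXP}$ (less than high school)
$E(\text{Wage}) = (\beta_1 + \delta_1) + \beta_2\,\text{EXP}$ (high school diploma)
$E(\text{Wage}) = (\beta_1 + \delta_2) + \beta_2\,\text{EXP}$ (college degree)
$E(\text{Wage}) = (\beta_1 + \delta_3) + \beta_2\,\text{EXP}$ (postgraduate degree)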

31 • $\delta_1$ measures the expected wage differential between workers who have a high school diploma and those who do not.
• $\delta_2$ measures the expected wage differential between workers who have a college degree and those who did not graduate from high school, and so on.

32 • The omitted dummy variable, $E_0$, identifies those who did not graduate from high school. The coefficients of the dummy variables represent expected wage differentials relative to this group.
• The intercept parameter $\beta_1$ represents the base wage for a worker with no experience and no high school diploma.
• Mathematically it does NOT matter which dummy variable is omitted, although the choice of $E_0$ is convenient in the example above.
• If we are estimating an equation using geographic dummy variables, N, S, E and W, identifying regions of the country, the choice of which dummy variable to omit is arbitrary.
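Both points about the omitted category can be checked numerically: including the intercept together with all four education dummies makes the design matrix rank deficient, and switching the omitted dummy changes the coefficients but not the fitted values. The sketch below uses simulated data; the category coding, the parameter values, and names such as `level` and `exper` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 80
exper = rng.uniform(0, 30, T)                  # EXP: years of experience
level = np.arange(T) % 4                       # hypothetical coding: 0..3 = E0..E3
E = (level[:, None] == np.arange(4)).astype(float)   # dummy columns E0, E1, E2, E3

# Hypothetical parameters, used only to simulate wages
wage = 8 + 0.3 * exper + np.array([0.0, 2.0, 5.0, 7.0])[level] + rng.normal(0, 1, T)

ones = np.ones(T)
X_all = np.column_stack([ones, exper, E])            # intercept plus all four dummies
X_drop0 = np.column_stack([ones, exper, E[:, 1:]])   # omit E0 (reference: no diploma)
X_drop3 = np.column_stack([ones, exper, E[:, :3]])   # omit E3 (reference: postgraduate)

# Six columns but only rank five: E0 + E1 + E2 + E3 equals the intercept column
rank_all = np.linalg.matrix_rank(X_all)

b_drop0, *_ = np.linalg.lstsq(X_drop0, wage, rcond=None)
b_drop3, *_ = np.linalg.lstsq(X_drop3, wage, rcond=None)
fit0 = X_drop0 @ b_drop0
fit3 = X_drop3 @ b_drop3

print("columns:", X_all.shape[1], "rank:", rank_all)
print("max difference in fitted values:", np.abs(fit0 - fit3).max())
```

With $E_3$ as the reference group, the coefficient on $E_0$ is the negative of the coefficient on $E_3$ from the regression that omits $E_0$: the same differential, measured from the other side.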