# Lecture 3-4 Summarizing relationships among variables ©


Biases in OLS coefficients

- Suppose you are interested in estimating the effect of education on wage. If your data contain IQ, then you can estimate Model 1:

  $$\text{wage} = \beta_0 + \beta_1(\text{education}) + \beta_2(\text{experience}) + \beta_3(\text{IQ})$$

- If your data do not contain IQ, then you can only estimate Model 2, in which IQ is not included:

  $$\text{wage} = \beta_0 + \beta_1(\text{education}) + \beta_2(\text{experience})$$

- Using the "Returns on Education 2" data set, the estimated models are:

  Model 1: $$\text{wage} = -539.4 + 58.1(\text{education}) + 17.4(\text{experience}) + 5.6(\text{IQ})$$

  Model 2: $$\text{wage} = -272.5 + 76.2(\text{education}) + 17.6(\text{experience})$$

  Thus, the effect of education appears to be much larger in Model 2.

- The effect of education appears to be larger in Model 2 because Model 2 suffers from omitted variable bias. When an omitted variable directly affects both education and wage, the estimated effect of education will be biased.

Model 2: $$\text{wage} = \beta_0 + \beta_1(\text{education}) + \beta_2(\text{experience})$$

IQ, which is not in the model, positively (+) affects education and positively (+) affects wage.

Since IQ affects education positively and, at the same time, affects wage positively, $\beta_1$ captures the mixed effects of education and IQ on wage. In this case, $\beta_1$ is biased upward (i.e., $\beta_1$ overstates the true effect of education).
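The upward bias can be reproduced in a small simulation. The following is a minimal sketch on synthetic data, not the "Returns on Education 2" data set: the sample size, the coefficients of the data-generating process, and the helper `ols` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process (all numbers are illustrative assumptions).
iq = rng.normal(100, 15, n)
education = 12 + 0.05 * (iq - 100) + rng.normal(0, 2, n)   # IQ raises education (+)
experience = rng.uniform(0, 20, n)
wage = (100 + 50 * education + 15 * experience + 5 * iq    # IQ raises wage (+)
        + rng.normal(0, 100, n))

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slopes...] by least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

model1 = ols(wage, education, experience, iq)   # IQ included
model2 = ols(wage, education, experience)       # IQ omitted

print("education coefficient, IQ included:", round(model1[1], 1))
print("education coefficient, IQ omitted: ", round(model2[1], 1))
```

In this setup the true education coefficient is 50 by construction. The estimate with IQ included should land near that value, while the estimate with IQ omitted should be noticeably larger, because it also picks up part of the effect of IQ.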

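- In general, suppose that there is an omitted variable Z that affects both X and Y, and that you estimate $Y = \beta_0 + \beta_1 X$.
  - $\beta_1$ is biased upward if Z affects X and Y in the same direction: $(+, +)$ or $(-, -)$.
  - $\beta_1$ is biased downward if Z affects X and Y in opposite directions: $(+, -)$ or $(-, +)$.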
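The sign rule follows from the standard omitted-variable-bias formula, sketched here under assumptions that are not stated on the slides: the true model is linear in X and Z, the error $u$ is uncorrelated with both, and $\beta_2$ and $\delta_1$ are symbols introduced only for this illustration.

$$
\begin{aligned}
\text{True model:} \quad & Y = \beta_0 + \beta_1 X + \beta_2 Z + u \\
\text{Relation between } Z \text{ and } X\text{:} \quad & Z = \delta_0 + \delta_1 X + v \\
\text{Regression omitting } Z\text{:} \quad & E[\tilde{\beta}_1] = \beta_1 + \beta_2\,\delta_1
\end{aligned}
$$

The bias term $\beta_2\,\delta_1$ is positive when the effect of Z on Y ($\beta_2$) and the association between Z and X ($\delta_1$) have the same sign, and negative when their signs differ.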

- When a coefficient suffers from omitted variable bias, the coefficient does not show the causal effect.
- There are many other situations where a coefficient can be biased.
- Many econometric techniques focus on eliminating these biases (in order to estimate causal effects).

1. Panel Data Introduction

- Panel data is a data set that contains repeated observations of the same units over time.
- Panel data is often used by researchers to extract the causal effect of one variable on another variable.
- The purpose of this lecture, however, is to familiarize you with this form of data.

Panel Data -Example-

- Open Panel Data Exercise.
- This data set contains production data for several construction companies for the period between 1990 and 1997. Production for each company is measured by the total material moved, in tonnes. Employment is measured by the number of persons employed. Equipment is measured by the sum of the engine power of all the equipment used.

Panel Data -Example-

- Notice that for each company, observations are collected for several years: you have repeated observations for the same company over time. This is an example of panel data.
- Suppose you would like to know how many employees you have to hire in order to achieve a certain level of production.
- The simplest model would be:

  $$\text{Production} = \beta_0 + \beta_1(\text{Employment}) + \beta_2(\text{Equipment})$$

Panel Data -Example-

- However, when we use panel data, we consider the year effects as well.
- A year effect is the aggregate effect of unobserved factors that affect the production of all the companies equally in a particular year. For example, the government may have relaxed the environmental regulation requirements for the construction industry in a particular year. Such a policy would affect the production of all the construction companies equally.

Panel Data -Example: Year effect-

- If such a change in governmental regulation is not observed by the analysts, and if we (as data analysts) do not take this unobserved factor into consideration, we may mistakenly attribute the year effects to employment or equipment. This may give an inflated (or deflated) picture of the effects of employment or equipment on the production level.

Panel Data -Incorporating year effects in the model-

- The simplest way to incorporate the year effects is to include year dummy variables in the model.
- Year dummy variables are often called year dummies.
- The following slides show how to construct year dummy variables.

Panel Data -Constructing year dummy variables-

- We take Panel Data Exercise: Data A as an example. This panel data covers the period between 1990 and 1999. For each year except the first year in the data, you construct a dummy variable that equals 1 if the observation is from that year and 0 otherwise. For example, Year91 = 1 for observations from 1991 and Year91 = 0 for observations from any other year.
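The construction of year dummies can be sketched in Python as follows. This is a minimal illustration on a made-up `years` column (two hypothetical firms observed from 1990 to 1999), not the Data A spreadsheet itself; the names `Year91` to `Year99` follow the model on the next slide.

```python
import numpy as np

# Hypothetical year column for a small panel (two firms, 1990-1999).
years = np.tile(np.arange(1990, 2000), 2)

# One dummy per year except the first (1990), which serves as the reference year.
dummy_years = np.arange(1991, 2000)
year_dummies = {f"Year{str(y)[-2:]}": (years == y).astype(int) for y in dummy_years}

print(list(year_dummies))           # Year91 ... Year99
print(year_dummies["Year91"][:3])   # rows for 1990, 1991, 1992 -> [0 1 0]
```

Each row has at most one dummy equal to 1, and rows from 1990 have all dummies equal to 0.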

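Panel Data -Incorporating year dummy variables in the model-

- After constructing the year dummies, we can incorporate these dummy variables in the model in the following way:

$$
\begin{aligned}
\text{Production} = {} & \beta_0 + \beta_1(\text{Employment}) + \beta_2(\text{Equipment}) \\
& + \beta_3\,\text{Year91} + \beta_4\,\text{Year92} + \beta_5\,\text{Year93} + \beta_6\,\text{Year94} + \beta_7\,\text{Year95} \\
& + \beta_8\,\text{Year96} + \beta_9\,\text{Year97} + \beta_{10}\,\text{Year98} + \beta_{11}\,\text{Year99}
\end{aligned}
$$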
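To see why the year dummies matter, the sketch below estimates the model with and without them on a synthetic panel. Everything here is an assumption made for illustration: the 13 firms, the data-generating coefficients, a year effect that grows over time, and the helper `ols`. It does not use the course data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, years = 13, np.arange(1990, 2000)

# Hypothetical panel: every number below is an illustrative assumption.
year = np.tile(years, n_firms)
trend = year - 1990
employment = 300 + 20 * trend + rng.normal(0, 50, year.size)   # grows over time
equipment = 3000 + rng.normal(0, 500, year.size)
year_effect = 100 * trend                                       # common to all firms
production = (500 + 5 * employment + 0.5 * equipment + year_effect
              + rng.normal(0, 100, year.size))

# Year dummies for 1991-1999; 1990 is the omitted reference year.
dummies = np.column_stack([(year == y).astype(float) for y in years[1:]])

def ols(y, X):
    """Return OLS coefficients [intercept, slopes...] by least squares."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

coef_without = ols(production, np.column_stack([employment, equipment]))
coef_with = ols(production, np.column_stack([employment, equipment, dummies]))

print("employment coefficient without year dummies:", round(coef_without[1], 2))
print("employment coefficient with year dummies:   ", round(coef_with[1], 2))
```

Because employment and the year effect both rise over time in this setup, the regression without year dummies attributes part of the year effect to employment and overstates its coefficient (the true value is 5 by construction). Including the dummies should bring the estimate back close to 5.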

Year dummies: exercise

- Use Panel Data Exercise Data A to construct the year dummy variables.

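More exercises

- Exercise 1. Using the data you constructed in the previous exercise, estimate the effect of employment and equipment on the production level using the following model. Make sure to incorporate year dummy variables in your model.

$$
\begin{aligned}
\text{Production} = {} & \beta_0 + \beta_1(\text{Employment}) + \beta_2(\text{Equipment}) \\
& + \beta_3\,\text{Year91} + \beta_4\,\text{Year92} + \beta_5\,\text{Year93} + \beta_6\,\text{Year94} + \beta_7\,\text{Year95} \\
& + \beta_8\,\text{Year96} + \beta_9\,\text{Year97} + \beta_{10}\,\text{Year98} + \beta_{11}\,\text{Year99}
\end{aligned}
$$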

More exercises

- Exercise 2. Using the results of Exercise 1, answer the following questions.
  - Exercise 2-1: If a firm hires 600 workers and uses equipment equal to 4000, what would be the expected production of the firm? Assume that the year effect is equal to the year effect of 1998. (For this type of question, use all the coefficients, even if some of them are not statistically significant.)
  - Exercise 2-2: Suppose that the firm is using equipment equal to 5000. If the firm would like to achieve 7000 tonnes of production, how many workers does it have to hire? Assume that the year effect is the same as the year effect of 1998.
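The arithmetic behind both questions can be sketched as follows. The coefficient values are hypothetical placeholders chosen only so that the calculation is easy to follow; they are not the estimates from Exercise 1 and should be replaced with your own results.

```python
# Hypothetical coefficient values, for illustration only; replace them with
# the estimates from your own Exercise 1 regression.
b0, b_employment, b_equipment, b_year98 = 500.0, 5.0, 0.5, 200.0

# Exercise 2-1: expected production with 600 workers, equipment 4000,
# and the 1998 year effect (Year98 = 1, all other year dummies = 0).
production = b0 + b_employment * 600 + b_equipment * 4000 + b_year98 * 1

# Exercise 2-2: solve the same equation for employment, given a target of
# 7000 tonnes, equipment 5000, and the 1998 year effect.
workers = (7000 - b0 - b_equipment * 5000 - b_year98) / b_employment

print(production)  # 5700.0 with the placeholder coefficients
print(workers)     # 760.0 with the placeholder coefficients
```

Setting Year98 to 1 and every other year dummy to 0 is what "assume the year effect of 1998" means in practice, so only the Year98 coefficient enters the calculation.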

Notes about year dummy variables

- When you use panel data, construct year dummy variables for every year except the first. (More precisely, there must be at least one year for which you do not include a year dummy.)
- If you include a year dummy for all the years, including the first year, you will have a problem called perfect multicollinearity. If this happens, the OLS regression procedure will no longer work. (Excel will automatically drop one year dummy.)
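The multicollinearity problem can be checked numerically: with an intercept and a dummy for every year, the dummy columns sum to the intercept column, so the design matrix loses one rank. The sketch below uses a made-up panel of three firms over 1990 to 1999; the variable names are assumptions for this example.

```python
import numpy as np

years = np.tile(np.arange(1990, 2000), 3)    # hypothetical panel: 3 firms, 10 years
intercept = np.ones((years.size, 1))

all_dummies = np.column_stack([(years == y).astype(float) for y in range(1990, 2000)])
drop_first = all_dummies[:, 1:]               # omit the 1990 dummy

X_bad = np.hstack([intercept, all_dummies])   # intercept + 10 year dummies
X_ok = np.hstack([intercept, drop_first])     # intercept + 9 year dummies

print(X_bad.shape[1], np.linalg.matrix_rank(X_bad))   # 11 columns, rank 10
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))     # 10 columns, rank 10
```

When the rank is smaller than the number of columns, the OLS coefficients are not uniquely determined. Dropping one year dummy restores full rank.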

2. Policy analysis using panel data

- Regression analysis is widely used for policy analysis.
- Examples of policy analysis include the analysis of:
  - The effect of government subsidies for small and medium-sized enterprises on the growth of these enterprises.
  - The effect of job training on the wages of workers.
  - The effect of changing the packaging of a product on the revenue from the product.
  - The effect of changing the compensation scheme on the productivity of firms.

Example: The effect of changing the compensation scheme on the productivity

- We continue using the Panel Data Exercise data set.
- Some of the construction companies in the data set began to introduce a new compensation scheme called a productivity bonus. The productivity bonus is tied to the amount of production (e.g., the company pays \$0.003 for each tonne of material moved).
- We would like to see whether the productivity bonus scheme has increased the productivity of these companies and, if so, by how much.

Example: The effect of changing the compensation scheme on the productivity, contd

- The simplest way to evaluate the effect of the productivity bonus is to incorporate a dummy variable for it. We can construct the dummy variable in the following way:

  $$\text{Productivity bonus dummy} = \begin{cases} 1 & \text{if the productivity bonus exists} \\ 0 & \text{if the productivity bonus does not exist} \end{cases}$$

  Such a dummy variable is often called a policy dummy variable, since it shows whether a particular policy (the compensation scheme in this example) exists or not.

Example: The effect of changing the compensation scheme on the productivity, contd

- Open the data Panel Data Exercise, Data C. This data set contains the productivity bonus dummy.
- Notice that from 1993, some of the companies began to introduce the productivity bonus scheme. By the end of the sample period (year 1999), the productivity bonus had become fairly prevalent: 6 out of 13 firms were using it.
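Data C already contains the productivity bonus dummy, but if you had to build one yourself, the construction could look like the sketch below. The firm names and adoption years are hypothetical, and the sketch assumes that a firm keeps the scheme in every year after it adopts it, which the slides do not state.

```python
# Hypothetical adoption years per firm; None means the firm never adopts the bonus.
adoption_year = {"Firm A": 1993, "Firm B": 1996, "Firm C": None}
years = range(1990, 2000)

rows = []  # one (firm, year, bonus_dummy) tuple per firm-year observation
for firm, start in adoption_year.items():
    for y in years:
        # Assumption: the bonus stays in place from the adoption year onward.
        bonus = 1 if start is not None and y >= start else 0
        rows.append((firm, y, bonus))

for row in rows[:5]:
    print(row)
```

The dummy varies both across firms and over time, which is what distinguishes it from a year dummy.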

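Example: The effect of changing the compensation scheme on the productivity

- Estimate the effect of the productivity bonus on production by estimating the following model:

$$
\begin{aligned}
\text{Production} = {} & \beta_0 + \beta_1(\text{Employment}) + \beta_2(\text{Equipment}) + \beta_3(\text{Productivity Bonus Dummy}) \\
& + \beta_4(\text{Year91}) + \beta_5(\text{Year92}) + \beta_6(\text{Year93}) + \beta_7(\text{Year94}) + \beta_8(\text{Year95}) \\
& + \beta_9(\text{Year96}) + \beta_{10}(\text{Year97}) + \beta_{11}(\text{Year98}) + \beta_{12}(\text{Year99})
\end{aligned}
$$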

Summary for policy analysis using panel data

- Construct a policy dummy variable (the productivity bonus dummy in our example).
- Construct year dummies for all years except the first year.
- Estimate a model including the policy dummy variable and the year dummies. The coefficient on the policy dummy variable can be interpreted as the effect of the policy.
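The three steps can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not an analysis of Data C: the adoption pattern (6 of 13 firms adopting between 1993 and 1998), the assumed true policy effect of 300, and all other numbers are made up. Adoption is also unrelated to the error term by construction, which is what allows the coefficient to be read as the policy effect here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, years = 13, np.arange(1990, 2000)
firm = np.repeat(np.arange(n_firms), years.size)
year = np.tile(years, n_firms)

# Step 1: policy dummy. Hypothetical adoption: firms 0-5 adopt in 1993-1998
# (one firm per year) and keep the scheme; firms 6-12 never adopt.
adopt = np.full(n_firms, 9999)
adopt[:6] = np.arange(1993, 1999)
bonus = (year >= adopt[firm]).astype(float)

# Step 2: year dummies for every year except the first (1990).
year_dummies = np.column_stack([(year == y).astype(float) for y in years[1:]])

# Synthetic outcome with an assumed true policy effect of 300.
true_effect = 300.0
employment = rng.normal(400, 60, year.size)
equipment = rng.normal(3000, 500, year.size)
production = (500 + 5 * employment + 0.5 * equipment + true_effect * bonus
              + 50 * (year - 1990) + rng.normal(0, 50, year.size))

# Step 3: estimate the model with the policy dummy and the year dummies.
X = np.column_stack([np.ones(year.size), employment, equipment, bonus, year_dummies])
coef, *_ = np.linalg.lstsq(X, production, rcond=None)

print("estimated bonus effect:", round(coef[3], 1))
```

The estimated coefficient on the bonus dummy should be close to the assumed effect of 300. With real data, the same coefficient is only as credible as the assumption that no omitted factor drives both adoption and production.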