Lecture slides by Todd Gormley

Common Errors: How to (and Not to) Control for Unobserved Heterogeneity

What are these slides? The following slides are a combination of lecture slides used by Todd Gormley in his Ph.D. course on “Empirical Methods in Corporate Finance” at The Wharton School. For more details about the issues discussed in these slides, please see the article below. Gormley, T. and D. Matsa, 2014, “Common Errors: How to (and Not to) Control for Unobserved Heterogeneity,” Review of Financial Studies 27(2):

Motivation [Part 1] Controlling for unobserved heterogeneity is a fundamental challenge in empirical finance. Unobservable factors affect corporate policies and prices, and these factors may be correlated with variables of interest. Important sources of unobserved heterogeneity are often common across groups of observations: demand shocks across firms in an industry, differences in local economic environments, etc.
Notes E.g. basic corporate policies regarding financing or investment will depend on factors that are inherently unobservable, like cost of capital, managerial quality, etc. Failing to control for these group-level heterogeneities can cause serious identification challenges.

Motivation [Part 2] E.g., consider the firm-level estimation
$$leverage_{i,j,t} = \beta_0 + \beta_1\, profit_{i,j,t} + u_{i,j,t}$$
where leverage is debt/assets for firm i, operating in industry j in year t, and profit is the firm’s net income/assets. What might be some unobservable omitted variables in this estimation?

Motivation [Part 3] Oh, there are so, so many…
Managerial talent and/or risk aversion; cost of capital; industry supply and/or demand shocks; regional demand shocks; and so on… It is easy to think of ways these might affect leverage and be correlated with profits. Sadly, this is easy to do with other dependent or independent variables…
Notes More talented managers might have more profits and higher leverage (because they take advantage of tax shields). More risk-averse managers may have lower profits (because they don’t take sufficient risk) and lower leverage. An adverse industry shock might dampen profits and affect firms’ access to capital (and debt). A lower cost of capital might cause firms to take on more leverage, but is also correlated with profits.

Panel data to the rescue…
Thankfully, panel data can help us with a particular type of unobserved variable… What type of unobserved variable does panel data help us with, and why? Answer = It helps with any unobserved variable that doesn’t vary within groups of observations

Outline for lecture Panel data and fixed effects (FE)
How not to control for unobserved heterogeneity General implications Benefits and limitations of FE model Estimating high-dimensional FE models

Panel data Panel data = whenever you have multiple observations per unit of observation i (e.g. you observe each firm over multiple years). Let’s assume N units i and J observations per unit i [i.e. balanced panel]. E.g., you observe 5,000 firms in Compustat over a twenty-year period [i.e. N=5,000, J=20].

The underlying model [Part 1]
When unobserved heterogeneity is thought to be present, the researcher implicitly assumes the following:
$$y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$$
i indexes groups of observations (e.g. industry); j indexes observations within each group (e.g. firm). $y_{i,j}$ = dependent variable; $X_{i,j}$ = independent variable of interest; $f_i$ = unobserved group heterogeneity; $\varepsilon_{i,j}$ = error term.
Notes The above model can easily be augmented to reflect more complicated sources of unobserved heterogeneity without affecting our subsequent analysis. Time-varying omitted factors (such as industry shocks that vary over time) can be captured by adding an additional subscript t to each variable. And, a model with two types of unobserved heterogeneity, such as firm and year group effects in panel data, can be captured by adding a second type of unobserved heterogeneity to the model. The intuition for our subsequent findings carries through to these more complicated models; so, we just stick with this simple framework.

The following standard assumptions are made: N groups, J observations per group, where J is small and N is large. X and ε are i.i.d. across groups, but not necessarily i.i.d. within groups. All variables are mean zero with finite variance:
$$\mathrm{var}(f) = \sigma_f^2,\ \mu_f = 0; \qquad \mathrm{var}(X) = \sigma_X^2,\ \mu_X = 0; \qquad \mathrm{var}(\varepsilon) = \sigma_\varepsilon^2,\ \mu_\varepsilon = 0$$
The mean-zero assumption simplifies some expressions, but doesn’t change any results.
Notes If the variables were not mean zero, this would only affect the bias on the constant term in the regression. The mean-zero assumption only simplifies the analysis and has no effect on the estimate of β under the different estimation techniques we analyze.

Finally, the following assumptions are made:
$$\mathrm{cov}(f_i, \varepsilon_{i,j}) = 0, \qquad \mathrm{cov}(X_{i,j}, \varepsilon_{i,j}) = \mathrm{cov}(X_{i,j}, \varepsilon_{i,-j}) = 0, \qquad \mathrm{cov}(X_{i,j}, f_i) = \sigma_{Xf} \neq 0$$
The nonzero covariance $\sigma_{Xf}$ is the source of the identification concern. What do these imply? Answer = Model is correct in that if we can control for f, we’ll properly identify the effect of X; but if we don’t control for f there will be omitted variable bias.
Notes Usually, we’d just assume $X_{i,j}$ is uncorrelated with $\varepsilon_{i,j}$. But here, we assume that an X is uncorrelated with all other errors in the group. It’s stronger than what we’d typically assume, and it is called ‘strict exogeneity’. It is the correlation between X and f that makes this a fixed effects problem and not just random effects. Because there exists nonzero covariance, $\sigma_{Xf}$, between the unobserved group heterogeneity, f, and the independent variable of interest, failing to account for the unobserved heterogeneity, f, causes an omitted variable problem. We assume the unobserved heterogeneity is the only omitted variable; that is, neither f nor the independent variable of interest, X, covaries with the residual.

OLS estimate of β is inconsistent
By failing to control for the group effect, $f_i$, OLS suffers from omitted variable bias.
True model is:
$$y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$$
But OLS estimates:
$$y_{i,j} = \beta^{OLS} X_{i,j} + u^{OLS}_{i,j}$$
so that
$$\hat{\beta}^{OLS} \xrightarrow{p} \beta + \frac{\sigma_{Xf}}{\sigma_X^2}$$
Alternative estimation strategies are required…
Notes The bias represents the standard omitted variable bias in a univariate regression: it equals the coefficient from a regression of the omitted variable, f, onto the included variable, X, multiplied by the coefficient on the omitted variable in the true model (which in this case is 1). It’s an omitted variable problem because OLS fails to control for the group effect, which is correlated with both X and Y; i.e. cov(X,u) is not zero. Because of this, the estimate of β will conflate the effects of X and f on the dependent variable Y. The sign of the bias will depend on the sign of the correlation between X and f.
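The omitted variable bias described above can be checked numerically. The following is a minimal simulation sketch, not code from the slides or the paper; the sample sizes, the loading of 0.5 on the group effect, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, beta = 20_000, 5, 1.0            # assumed: many groups, few obs per group

f = rng.normal(size=(N, 1))            # unobserved group effect f_i, variance 1
X = 0.5 * f + rng.normal(size=(N, J))  # var(X) = 1.25, cov(X, f) = 0.5
eps = rng.normal(size=(N, J))
y = beta * X + f + eps                 # true model: y = beta*X + f + eps

# Pooled OLS of y on X, ignoring f (no intercept: all variables are mean zero)
b_ols = (X * y).sum() / (X * X).sum()

# Omitted variable bias formula: sigma_Xf / sigma_X^2
bias_formula = 0.5 / 1.25
print(f"OLS estimate: {b_ols:.3f}, beta + bias: {beta + bias_formula:.3f}")
```

In this setup the OLS estimate should settle near the true coefficient plus the ratio of cov(X, f) to var(X), rather than near the true coefficient itself.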

Can solve this by transforming data
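First, notice that if you take the population mean of the dependent variable for each unit of observation, i, you get…
$$\bar{y}_i = \beta \bar{X}_i + f_i + \bar{\varepsilon}_i$$
where
$$\bar{y}_i = \frac{1}{J} \sum_{j} y_{i,j}, \qquad \bar{X}_i = \frac{1}{J} \sum_{j} X_{i,j}, \qquad \bar{\varepsilon}_i = \frac{1}{J} \sum_{j} \varepsilon_{i,j}$$
Again, I assumed there are J obs. per unit i.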

Transforming data [Part 2]
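Now, if we subtract $\bar{y}_i$ from $y_{i,j}$, we have
$$y_{i,j} - \bar{y}_i = \beta (X_{i,j} - \bar{X}_i) + (\varepsilon_{i,j} - \bar{\varepsilon}_i)$$
And look! The unobserved variable, $f_i$, is gone (as would be any constant) because it is group-invariant. With our earlier assumptions, it is easy to see that $(X_{i,j} - \bar{X}_i)$ is uncorrelated with the new disturbance, $(\varepsilon_{i,j} - \bar{\varepsilon}_i)$, which means… ?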

Fixed effects (or within) estimator
Answer: OLS estimation of transformed model will yield a consistent estimate of β The prior transformation is called the “within transformation” because it demeans all variables within their group This is also called the FE estimator
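To make the within transformation concrete, here is a small sketch that demeans both the dependent and the independent variable within each group and then runs OLS on the transformed data. The data-generating process (group effect loading of 0.5, sample sizes, seed) is an assumption chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, beta = 20_000, 5, 1.0            # assumed sizes; true beta = 1

f = rng.normal(size=(N, 1))            # unobserved group effect
X = 0.5 * f + rng.normal(size=(N, J))  # X is correlated with f
y = beta * X + f + rng.normal(size=(N, J))

# Within transformation: subtract each group's mean from y AND from X
X_w = X - X.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)

# OLS on the demeaned data is the FE (within) estimator
b_fe = (X_w * y_w).sum() / (X_w ** 2).sum()
print(f"FE (within) estimate: {b_fe:.3f}")
```

Because the group effect is constant within a group, it drops out of both demeaned variables, so the estimate should be close to the true coefficient even though X and f are correlated.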

Least Squares Dummy Variable (LSDV)
Another way to do the FE estimation is by adding indicator (dummy) variables I.e. create a dummy variable for each group i, and add it to the regression This is least squares dummy variable model Now, our estimation equation exactly matches the true underlying model

LSDV versus FE [Part 1] Why do both approaches work? Well…
Frisch-Waugh-Lovell Theorem shows us there are two ways to estimate the $\beta_1$ below:
$$y = \beta_0 + \beta_1 x + \beta_2 z + u$$
Estimate directly; i.e. regress y onto both x and z. OR we can just partial z out from both y and x before regressing y on x (i.e. regress residuals from regression of y on z onto residuals from regression of x on z).

LSDV versus FE [Part 2] Can show that LSDV and within-transformation of FE are identical because demeaned variables of within regression are the residuals from a regression onto group dummies!
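The equivalence between LSDV and the within transformation can be verified directly on a small dataset. The sketch below is illustrative only: the panel dimensions, seed, and the helper name `demean` are assumptions, and the dummy matrix is built explicitly, which is only practical when the number of groups is small.

```python
import numpy as np

rng = np.random.default_rng(2)
N, J = 50, 4                                   # small illustrative panel
group = np.repeat(np.arange(N), J)             # group id for each observation
f = rng.normal(size=N)[group]                  # group effect, repeated within group
x = 0.5 * f + rng.normal(size=N * J)
y = 1.0 * x + f + rng.normal(size=N * J)

# LSDV: regress y on x plus one dummy per group (no separate constant)
D = (group[:, None] == np.arange(N)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)
b_lsdv = coef[0]

def demean(v):
    """Subtract the group mean from each observation (hypothetical helper)."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

# Within estimator: OLS on group-demeaned data
xw, yw = demean(x), demean(y)
b_within = (xw @ yw) / (xw @ xw)

# Frisch-Waugh-Lovell link: residuals of x on the dummies are the demeaned x
x_resid = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]
print(b_lsdv, b_within, np.allclose(x_resid, xw))
```

The two slope estimates agree up to floating-point error, which is the Frisch-Waugh-Lovell result applied to group dummies.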

Other approaches… Gormley and Matsa (RFS 2014) note that the existing literature uses various other strategies to control for unobserved group-level heterogeneity… Their questions: How does each of the approaches differ? And, when are they consistent? Their answer: Some popular strategies can distort inferences and should not be used; the FE estimator should be used instead.

They focus on two popular strategies
“Adjusted-Y” (AdjY) – dependent variable is demeaned within groups [e.g. ‘industry-adjust’] “Average effects” (AvgE) – uses group mean of dependent variable as control [e.g. ‘state-year’ control] Notes AdjY and AvgE are labels given by Gormley and Matsa (RFS 2014). Industry-adjusting is common example of AdjY; it is used to remove common industry factors in firm-level analysis AvgE is commonly implemented to control for time-varying differences in local economic environments

AdjY & AvgE are widely used
In Journal of Finance, Journal of Financial Economics, and Review of Financial Studies Used since at least the late 1980s Still used, 60+ papers published in Variety of subfields; asset pricing, banking, capital structure, governance, M&A, etc. Also been used in papers published in the American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics Notes The exact origin of the two estimators in finance is unclear, but one potential source is the event-studies literature where stock returns are regressed on the average market return to construct market-adjusted returns. This approach is qualitatively similar in that it uses a group mean—the average market return for a given period—to account for a potential group effect. In this regard, event studies’ analysis of market-adjusted returns is similar to AdjY, and estimation of the market model is similar to AvgE. An important difference in the event studies literature, however, is that the underlying statistical model assumes that each return is a function of the average market return whereas the statistical model underlying unobserved group-level heterogeneity does not.

But, AdjY and AvgE are inconsistent
As Gormley and Matsa (RFS 2014) show… Both can be more biased than OLS. Both can get the opposite sign of the true coefficient. In practice, bias is likely, and trying to predict its sign or magnitude will typically be impractical.

More implications of GM (RFS 2014)
Other, related strategies should also not be used: “Characteristically-adjusted” stock returns in AP; “Adjusted” stock returns when trying to estimate firms’ internal value of cash; simple comparisons of benchmark-adjusted outcomes before & after events (like M&A); “Diversification discount”; using the group average of an independent variable as an instrumental variable. Now, let’s see why…
Notes Basically, they show that a key difference between AdjY and AvgE on the one hand and FE on the other is that AdjY and AvgE only transform the dependent variable to remove unobserved heterogeneity, while FE is equivalent to transforming both the dependent and independent variables. From this perspective, it is easy to see that other more complicated strategies used to transform the dependent variable (such as subtracting the mean or median of comparable firms) will be inconsistent as well. Even simple comparisons of ‘industry-adjusted’ means before and after events, as is common in analysis of corporate control transactions, will not reveal the causal effect of the event.

Adjusted-Y (AdjY) Tries to remove unobserved group heterogeneity by demeaning the dependent variable within groups. AdjY estimates:
$$y_{i,j} - \bar{y}_i = \beta^{AdjY} X_{i,j} + u^{AdjY}_{i,j}$$
where
$$\bar{y}_i = \frac{1}{J} \sum_{k \in \text{group } i} \left( \beta X_{i,k} + f_i + \varepsilon_{i,k} \right)$$
Note: Researchers often exclude the observation at hand when calculating the group mean or use a group median, but both modifications will yield similarly inconsistent estimates.
Notes When this adjustment is applied at the industry or industry-year level, researchers typically refer to the dependent variable as being “industry-adjusted.” In some cases, the authors do not exclude the observation at hand, and in other cases, the median is used. This isn’t key for any of the subsequent findings.

Example AdjY estimation
One example: firm value regression:
$$Q_{i,j,t} - \bar{Q}_{i,t} = \alpha + \beta' X_{i,j,t} + \varepsilon_{i,j,t}$$
$Q_{i,j,t}$ = Tobin’s Q for firm j, industry i, year t; $\bar{Q}_{i,t}$ = mean of Tobin’s Q for industry i in year t; $X_{i,j,t}$ = vector of variables thought to affect value. Researchers might also include firm & year FE. Anyone know why AdjY is going to be inconsistent?
Notes Again, this type of adjustment for unobserved industry factors in AdjY is referred to as industry-adjusted analysis.

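Here is why… Rewriting the group mean, we have:
$$\bar{y}_i = \beta \bar{X}_i + f_i + \bar{\varepsilon}_i$$
Therefore, AdjY transforms the true data to:
$$y_{i,j} - \bar{y}_i = \beta X_{i,j} - \beta \bar{X}_i + \varepsilon_{i,j} - \bar{\varepsilon}_i$$
What is the AdjY estimation forgetting?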

AdjY has omitted variable bias
$\hat{\beta}^{AdjY}$ can be inconsistent when $\beta \neq 0$. By failing to control for $\bar{X}_i$, AdjY suffers from omitted variable bias when $\sigma_{X\bar{X}} \neq 0$, where $\sigma_{X\bar{X}}$ is the covariance between X and its group mean.
True model:
$$y_{i,j} - \bar{y}_i = \beta X_{i,j} - \beta \bar{X}_i + \varepsilon_{i,j} - \bar{\varepsilon}_i$$
But, AdjY estimates:
$$y_{i,j} - \bar{y}_i = \beta^{AdjY} X_{i,j} + u^{AdjY}_{i,j}$$
so that
$$\hat{\beta}^{AdjY} \xrightarrow{p} \beta - \beta \frac{\sigma_{X\bar{X}}}{\sigma_X^2}$$
In practice, a positive covariance between X and $\bar{X}$ will be very common.
Notes In practice, a positive covariance between X and its group mean is commonplace, leading to inconsistent AdjY estimates. For example, consider a standard firm-level capital structure estimation, where leverage is regressed onto multiple independent variables, such as return on assets, bankruptcy risk, and market-to-book ratio. Because firms in the same industry are subject to common demand and technology shocks, their leverage, return on assets, bankruptcy risk, and the other regressors will positively co-vary with their industry averages. Moreover, it can be shown that this covariance will be non-zero whenever group averages differ.

Further analysis of AdjY estimate
Bias doesn’t disappear as group size J increases. AdjY can be inconsistent even when OLS is not; this happens when σXf = 0 and $\sigma_{X\bar{X}} \neq 0$ (i.e. X covaries with its group mean). Bias is more complicated with two variables…
Notes The bias in the adjusted-Y estimation is present even with very large groups. Because the AdjY estimator suffers from an omitted variable bias, increasing group size does not eliminate the identification problem; the estimation’s error term still contains the omitted group average of the independent variable. Even when there isn’t any unobserved heterogeneity (and OLS is consistent), AdjY introduces a new omitted variable problem in its attempt to control for a nonexistent omitted variable problem in the original OLS specification.
Other Notes In practice, the bias of AdjY is typically attenuating (in the one variable case) because it is unusual for the X’s to be negatively correlated with their group mean when there is also a group component, f, that has nonzero correlation with X (as in Equation (1)). Specifically, the covariance matrix for the underlying data structure implied by Equation (1) is positive definite only if certain conditions hold. More often than not, you’ll need a positive covariance to satisfy this condition. See paper for more details.
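The case where AdjY is inconsistent even though OLS is consistent can be illustrated with a short simulation. This is a sketch under assumed parameters: the group effect f is drawn independently of X (so σXf = 0), X has its own group component so that group means differ, and the group mean used for demeaning includes the observation at hand. The helper name `slope` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J, beta = 20_000, 5, 1.0               # assumed sizes; true beta = 1

f = rng.normal(size=(N, 1))               # group effect, independent of X here
g = rng.normal(size=(N, 1))               # group component of X (group means differ)
X = g + rng.normal(size=(N, J))           # var(X) = 2, cov(X, f) = 0
y = beta * X + f + rng.normal(size=(N, J))

def slope(x, z):
    """No-intercept OLS slope of z on x (all variables are mean zero)."""
    return (x * z).sum() / (x * x).sum()

b_ols = slope(X, y)                                   # consistent: sigma_Xf = 0
b_adjy = slope(X, y - y.mean(axis=1, keepdims=True))  # AdjY: demean y only

# Omitted variable logic: beta * (1 - cov(X, Xbar) / var(X)),
# with cov(X, Xbar) = 1 + 1/J and var(X) = 2 under these assumptions
adjy_formula = beta * (1 - (1 + 1 / J) / 2)
print(f"OLS: {b_ols:.3f}  AdjY: {b_adjy:.3f}  formula: {adjy_formula:.3f}")
```

Under these assumptions OLS recovers the true coefficient while AdjY is attenuated well below it, because demeaning only the dependent variable leaves the group mean of X in the error term.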

AdjY estimates with 2 variables
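Suppose there are instead two RHS variables. True model:
$$y_{i,j} = \beta X_{i,j} + \gamma Z_{i,j} + f_i + \varepsilon_{i,j}$$
Use the same assumptions as before, but add:
$$\mathrm{cov}(Z_{i,j}, \varepsilon_{i,j}) = \mathrm{cov}(Z_{i,j}, \varepsilon_{i,-j}) = 0; \qquad \mathrm{var}(Z) = \sigma_Z^2,\ \mu_Z = 0; \qquad \mathrm{cov}(X_{i,j}, Z_{i,j}) = \sigma_{XZ}; \qquad \mathrm{cov}(Z_{i,j}, f_i) = \sigma_{Zf}$$
Notes The first two assumptions are standard: Z does not co-vary with the error term or the errors of others in the group, and it has some variance and is mean zero. Then, we allow there to be some correlation between X and Z. And, we assume Z and f may be correlated, which is yet another reason to control for group effects.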

AdjY estimates with 2 variables [Part 2]
With a bit of algebra, it is shown that: Notes Determining size and magnitude of bias will be generally difficult because it depends on so many parameters, some of which are not observable (gamma and beta) The bias becomes more complicated because the omitted variable now includes the group mean of X and the group mean of Z which can each be correlated with X and Z. Again, such correlations are commonplace in practice. Other Notes The bias in Proposition 2 can be verified by applying the multivariate generalization of the omitted variable bias formula. See Section of Greene (2002) and page 61, footnote 14 of Angrist and Pischke (2009) for details. Estimates of both β and γ can be inconsistent Determining sign and magnitude of bias will typically be difficult

Average effects (AvgE)
AvgE uses the group mean of the dependent variable as a control for unobserved heterogeneity. AvgE estimates:
$$y_{i,j} = \beta^{AvgE} X_{i,j} + \gamma^{AvgE} \bar{y}_i + u^{AvgE}_{i,j}$$

Example AvgE estimation
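The following profit regression is an AvgE example:
$$ROA_{i,s,t} = \alpha + \beta' X_{i,s,t} + \gamma\, \overline{ROA}_{s,t} + \varepsilon_{i,s,t}$$
$\overline{ROA}_{s,t}$ = mean of ROA for state s in year t; $X_{i,s,t}$ = vector of variables thought to affect profits. Researchers might also include firm & year FE. Anyone know why AvgE is going to be inconsistent?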

Average effects (AvgE)
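AvgE uses the group mean of the dependent variable as a control for unobserved heterogeneity. AvgE estimates:
$$y_{i,j} = \beta^{AvgE} X_{i,j} + \gamma^{AvgE} \bar{y}_i + u^{AvgE}_{i,j}$$
Recall, true model:
$$y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$$
Problem is that $\bar{y}_i$ measures $f_i$ with error.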

AvgE has measurement error bias
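Recall that the group mean is given by
$$\bar{y}_i = \beta \bar{X}_i + f_i + \bar{\varepsilon}_i$$
Therefore, $\bar{y}_i$ measures $f_i$ with error $\beta \bar{X}_i + \bar{\varepsilon}_i$. As is well known, even classical measurement error causes all estimated coefficients to be inconsistent. Bias here is complicated because the error can be correlated with both the mismeasured variable, $f_i$, and with $X_{i,j}$ when $\sigma_{X\bar{X}} \neq 0$ (i.e. when X covaries with its group mean).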

AvgE estimate of β with one variable
With a bit of algebra, it is shown that: Determining the magnitude and direction of the bias is difficult. Covariance between X and $\bar{X}$ is again problematic, but not needed for the AvgE estimate to be inconsistent. Even the non-i.i.d. nature of errors can affect the bias!
Notes Similar to AdjY, the AvgE estimator will be inconsistent when there is correlation between x_bar and x. For example, in a firm-level AvgE estimation where a researcher uses the state-year mean to control for time-varying differences in local economic environments, any correlation between the independent variable and state averages of firms would cause AvgE to be inconsistent. Such correlations are likely to exist since firms located in the same state are subject to similar local demand shocks. In fact, as shown in our paper, such a correlation will exist whenever the average of the independent variable varies across groups. However, that covariance is no longer necessary. Just a correlation between X and f is enough.
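A quick simulation can illustrate that using the group mean of y as a control does not recover the true coefficient, while the within estimator does. This sketch is not from the paper; the data-generating process (loading of 0.5 on f, unit variances, J = 5, seed) is an assumption, and the group mean here includes the observation at hand.

```python
import numpy as np

rng = np.random.default_rng(4)
N, J, beta = 20_000, 5, 1.0                 # assumed sizes; true beta = 1

f = rng.normal(size=(N, 1))                 # unobserved group effect
X = 0.5 * f + rng.normal(size=(N, J))       # X correlated with f
y = beta * X + f + rng.normal(size=(N, J))

# Group mean of y, repeated for every observation in the group
ybar = np.broadcast_to(y.mean(axis=1, keepdims=True), y.shape)

# AvgE: regress y on X and the group mean of y (a noisy proxy for f)
A = np.column_stack([X.ravel(), ybar.ravel()])
b_avge = np.linalg.lstsq(A, y.ravel(), rcond=None)[0][0]

# FE: demean both y and X within groups
Xw = X - X.mean(axis=1, keepdims=True)
yw = y - ybar
b_fe = (Xw * yw).sum() / (Xw ** 2).sum()

print(f"AvgE: {b_avge:.3f}  FE: {b_fe:.3f}  true beta: {beta}")
```

With these particular parameters the AvgE coefficient on X settles noticeably below the true value, whereas FE is close to it; other parameter choices can push the AvgE bias in either direction.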

How common will the bias be?
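First, we look at when $\sigma_{X\bar{X}} \neq 0$ by separating $X_{i,j}$ into its group and idiosyncratic components:
$$X_{i,j} = \bar{x}_i + w_{i,j}$$
The idiosyncratic component $w_{i,j}$ is distributed with mean 0 and variance $\sigma_w^2$. Assume group means $\bar{x}_i$ are i.i.d. with mean zero and variance $\sigma_{\bar{x}}^2$. And, assume $\mathrm{cov}(\bar{x}_i, w_{i,j}) = 0$ and $\mathrm{cov}(w_{i,j}, w_{i,-j}) = \sigma_{w_i w_{-i}}$.
Notes Each group has a mean, xbar, and some idiosyncratic component, w(i,j). For simplicity, the mean of both components is equal to zero, and they don’t covary.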

AdjY and AvgE bias very common
Both AdjY and AvgE are biased when $\sigma_{X\bar{X}} \neq 0$. But with the prior setup, we can show that…
$$\sigma_{X\bar{X}} = \sigma_{\bar{x}}^2 + \sigma_{w_i w_{-i}}$$
where $\sigma_{\bar{x}}^2$ is the variance of the group means and $\sigma_{w_i w_{-i}}$ is the covariance between the idiosyncratic components of different observations in the same group. So there is bias whenever there are different means across groups! Or, bias whenever observations within groups are not independent! * Solved excluding observation at hand (most common approach)
Notes If you think about this, this is almost assuredly true in most finance applications. E.g. observations will have different means across industries, geographies, and other groupings. And, non-independence within groups is also very common. Any group with multiple observations over time will have this if there is any serial correlation across time. We solve this covariance in the case where researchers exclude the observation at hand… This is the most common approach. The covariance is more complicated (but very similar) when the observation at hand isn’t excluded because it has an added term because of the mechanical correlation caused by leaving the observation in the group mean.
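The claim that X covaries with its group mean whenever group means differ can be checked with a short simulation. The sketch below assumes independent idiosyncratic components within groups and a group-mean variance of 0.5; these values, the sizes, and the seed are illustrative. The group mean excludes the observation at hand.

```python
import numpy as np

rng = np.random.default_rng(5)
N, J = 50_000, 5                             # assumed sizes
var_group_mean = 0.5                         # assumed variance of group means

xbar = rng.normal(scale=np.sqrt(var_group_mean), size=(N, 1))  # group component
w = rng.normal(size=(N, J))                  # idiosyncratic part, independent within group
X = xbar + w

# Leave-one-out group mean: average of the other J-1 observations in the group
X_loo = (X.sum(axis=1, keepdims=True) - X) / (J - 1)

# Sample covariance between X and its leave-one-out group mean (means are zero)
cov_x_xbar = np.mean(X * X_loo)
print(f"cov(X, Xbar): {cov_x_xbar:.3f}  variance of group means: {var_group_mean}")
```

With independent idiosyncratic components, the covariance should be close to the variance of the group means, so it is nonzero as soon as group means differ at all.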

Analytical comparisons
Next, we use analytical solutions to compare the relative performance of OLS, AdjY, and AvgE. To do this, we re-express solutions… We use correlations (e.g. solve bias in terms of the correlation between X and f, ρXf, instead of σXf). We also assume i.i.d. errors [just makes bias of AvgE less complicated].

ρXf has large effect on performance
(from Figure 1A: OLS, AdjY, and AvgE estimates plotted against ρXf; true β = 1; other parameters held constant) AdjY is more biased than OLS, except for large values of ρXf. AvgE is worst for low correlations, best for high.
Notes Unlike OLS, performance of AdjY is unaffected by corr(x,f). But, it is only better than OLS when this correlation is very large. AvgE bias, in contrast, is nonlinear in corr(x,f). Under the chosen parameters, AvgE is very biased for low correlations, but better at very high correlations. Default parameters, when not being varied, are corr(x,f) = 0.25, corr(xi,x-i) = 0.5, J = 10, sigma(e)/sigma(x) = sigma(f)/sigma(x) = 1.

Relative variation across groups key
(from Figure 1B: OLS, AdjY, and AvgE estimates) Bias gets worse as variation in group means increases.
Notes OLS is unaffected by variation in group means, whereas AvgE and AdjY are. As shown earlier, this is because both AdjY and AvgE fail to account for X_bar, while OLS instead fails to account for the unobserved heterogeneity, fi.

More observations need not help!
(from Figure 1F: OLS, AdjY, and AvgE estimates plotted against J)
Notes OLS and AdjY are unaffected by observations per group because the omitted variable is independent of J. Bias in AvgE is affected because higher J reduces the noise of the measurement error, but under these parameters, more observations actually increase the bias! AvgE just asymptotes to a biased coefficient (as the measurement error asymptotes to a non-zero value) that need not be closer to the true parameter.
Other Notes The bias for an AdjY estimator that does not exclude the observation at hand when demeaning the dependent variable, however, will be affected by J because of the diminishing mechanical correlation that occurs between X and X_bar as J increases.

Summary of OLS, AdjY, and AvgE
In general, all three estimators are inconsistent in presence of unobserved group heterogeneity AdjY and AvgE may not be an improvement over OLS; depends on various parameter values AdjY and AvgE can yield estimates with opposite sign of the true coefficient

Comparing FE, AdjY, and AvgE
To estimate the effect of X on Y controlling for Z, one could regress Y onto both X and Z [add group FE]… Or, regress residuals from a regression of Y on Z onto residuals from a regression of X on Z [within-group transformation!]. AdjY and AvgE aren’t the same as finding the effect of X on Y controlling for Z because AdjY only partials Z out from Y, and AvgE uses fitted values of Y on Z as a control.

The differences matter! Example #1
Consider the following capital structure regression:
$$(D/A)_{i,t} = \alpha + \beta' X_{i,t} + f_i + \varepsilon_{i,t}$$
$(D/A)_{i,t}$ = book leverage for firm i, year t; $X_{i,t}$ = vector of variables thought to affect leverage; $f_i$ = firm fixed effect. We now run this regression for each approach to deal with firm fixed effects, using data, winsorizing at 1% tails…

Estimates vary considerably
(from Table 2)

Consider the following firm value regression:
$$Q_{i,j,t} = \alpha + \beta' X_{i,j,t} + f_{j,t} + \varepsilon_{i,j,t}$$
$Q_{i,j,t}$ = Tobin’s Q for firm i, industry j, year t; $X_{i,j,t}$ = vector of variables thought to affect value; $f_{j,t}$ = industry-year fixed effect. We now run this regression for each approach to deal with industry-year fixed effects…
Notes Again, this type of adjustment for unobserved industry factors in AdjY is referred to as industry-adjusted analysis.

(from Table 4)
Notes As in the prior examples, the various techniques lead to significantly different estimates of β. These differences indicate the presence of unobserved industry-year heterogeneities and correlations across observations within a given industry-year that will cause OLS, AdjY, and AvgE to provide inconsistent estimates. It is apparent that analyzing industry-adjusted data rather than using industry-year FE can distort inference. For the coefficient on Delaware incorporation, the AdjY and AvgE estimates are statistically insignificant and considerably smaller than the statistically significant OLS and FE estimates. A researcher relying on AdjY or AvgE would conclude that incorporation in Delaware is uncorrelated with firm value after accounting for industry trends, when the opposite is true.

It also matters in the literature on antitakeover laws. Past papers used AvgE to control for unobserved, time-varying differences across states & industries. Gormley and Matsa (2014) show that properly using the industry-year, state-year, and firm FE estimator changes estimates considerably. E.g., using this framework, they show that managers have an underlying preference to “Play it Safe”. For details, see

General implications With this framework, easy to see that other commonly used estimators will be biased AdjY-type estimators in M&A, asset pricing, etc. AvgE-type instrumental variables

Other AdjY estimators are problematic
Same problem arises with other AdjY estimators: subtracting off the median or value-weighted mean; subtracting off the mean of a matched control sample [as is customary in studies of the diversification “discount”]; comparing industry-adjusted means for treated firms pre- versus post-event [as often done in M&A studies]; characteristically adjusted returns [as used in asset pricing].
Notes Even the simple pre- versus post-treatment AdjY comparison does not reveal the true effect of the event being analyzed. This is because the comparison can incorrectly remove part of the treatment’s effect on the dependent variable when it demeans the data using an industry average that might include both treated and untreated firms.

AdjY-type estimators in asset pricing
Common to sort and compare stock returns across portfolios based on a variable thought to affect returns But, returns are often first “characteristically adjusted” I.e. researcher subtracts the average return of a benchmark portfolio containing stocks of similar characteristics This is equivalent to AdjY, where “adjusted returns” are regressed onto indicators for each portfolio Approach fails to control for how avg. independent variable varies across benchmark portfolios Notes From an econometric perspective, there is nothing wrong with using the adjusted return as a measure of stocks’ performance, as proposed by Daniel et al. (1997); it accurately summarizes a portfolio’s performance relative to a benchmark return. Problems arise, however, if the adjusted return is then correlated with other (unadjusted) stock or firm characteristics. As an example, consider analyses of R&D intensity and stock returns. Although R&D and returns are positively correlated, it is possible that differences in firm size confound this relationship. Larger firms are associated with lower stock returns (for reasons presumably unrelated to R&D intensity). If R&D intensity is also correlated with firm size, then the correlation between R&D and returns may be attributable to firm size rather than a causal relation. Using size-matched benchmark portfolios to adjust returns but not adjust R&D does not adequately control for size, because it does not account for average differences in R&D intensity across the benchmark portfolios. The average R&D of firms in a stock’s benchmark portfolio affects its adjusted stock return but this fact is overlooked when one compares adjusted returns across portfolios sorted on firms’ unadjusted R&D intensity.

Asset Pricing AdjY – Example
Asset pricing example; sorting returns based on R&D expenses / market value of equity Notes We construct 48×5=240 benchmark portfolios at the end of June in each year based on firms’ 48 Fama-French industry classification and size quintile. We then demean each stock return using the corresponding return on the benchmark portfolio and sort these characteristically adjusted returns based on R&D. The market-weighted adjusted returns, reported in Table 5, Difference between Q5 and Q1 is 5.3 percentage points We use industry-size benchmark portfolios and sorted using R&D/market value

(from Table 5) Same AdjY result, but in regression format; quintile 1 is excluded. Use benchmark-period FE to transform both returns and R&D; this is equivalent to a double sort.
Notes Like in the other examples, the various techniques lead to different estimates of β. In this case, correlations between R&D, industry, and firm size cause AdjY to provide inconsistent estimates for the relation between R&D and returns. I.e. the AdjY estimator is failing to control for how average R&D varies across bins, which will affect the adjusted return. To control for unobserved differences, one must also “adjust” the independent variable used for sorting. E.g. double-sort returns on the benchmark portfolio and the independent variable and then compare returns across the independent variable within each benchmark portfolio. This is equivalent to an estimator with benchmark portfolio fixed effects; please see paper for details. In practice, the double-sort is cumbersome to report when there are many benchmark portfolios (e.g., 5 size x 5 book-to-market x 5 momentum = 125 portfolios) and systematic patterns may be difficult to eyeball. The FE estimator accurately summarizes these patterns and is easy to implement.

AvgE IV estimators also problematic
Many researchers try to instrument a problematic $X_{i,j}$ with the group mean, $\bar{X}_{i,-j}$, excluding observation j. The argument is that $\bar{X}_{i,-j}$ is correlated with $X_{i,j}$ but not the error. But, this is typically going to be problematic. Any correlation between $X_{i,j}$ and an unobserved heterogeneity, $f_i$, causes the exclusion restriction to not hold. Can’t add FE to fix this since the IV only varies at the group level.
Notes While it is possible that there exist scenarios where the independent variables are not i.i.d. within groups and the underlying data structure exhibits no other unobserved heterogeneity at the group level that is correlated with X, researchers should probably not assume this is true absent a strong economic argument to justify it.
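The failure of the exclusion restriction can be seen in a simple simulation in which the leave-one-out group mean of X is used as an instrument. This is a sketch under assumed parameters (loading of 0.5 on f, unit variances, seed); it uses the just-identified IV formula cov(Z, y) / cov(Z, X) without an intercept because all variables are mean zero.

```python
import numpy as np

rng = np.random.default_rng(6)
N, J, beta = 50_000, 5, 1.0                  # assumed sizes; true beta = 1

f = rng.normal(size=(N, 1))                  # unobserved group effect
X = 0.5 * f + rng.normal(size=(N, J))        # X correlated with f
y = beta * X + f + rng.normal(size=(N, J))

# Instrument: group mean of X excluding the observation at hand
Z = (X.sum(axis=1, keepdims=True) - X) / (J - 1)

# Just-identified IV estimate and pooled OLS for comparison
b_iv = (Z * y).sum() / (Z * X).sum()
b_ols = (X * y).sum() / (X * X).sum()
print(f"IV with group-mean instrument: {b_iv:.3f}  OLS: {b_ols:.3f}  true beta: {beta}")
```

In this setup the instrument is correlated with X only through the group effect f, which also enters the error, so the IV estimate ends up further from the true coefficient than plain OLS.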

What if AdjY or AvgE is true model?
If data exhibits the structure of the AvgE estimator, this would be a peer effects model [i.e. group mean affects outcome of other members]. In this case, none of the estimators (OLS, AdjY, AvgE, or FE) reveal the true β [Manski 1993; Leary and Roberts 2010]. Even if interested in studying $y_{i,j} - \bar{y}_i$, AdjY is only consistent if $X_{i,j}$ does not affect $y_{i,j}$!

FE Estimator – Benefits [Part 1]
There are many benefits of FE estimator Allows for arbitrary correlation between each fixed effect, fi, and each x within group i I.e. it is very general and not imposing much structure on what the underlying data must look like Very intuitive interpretation; coefficient is identified using only changes within cross-sections

FE Estimator – Benefits [Part 2]
It is also very flexible and can help us control for many types of unobserved heterogeneities Can add year FE if worried about unobserved heterogeneity across time [e.g. macroeconomic shocks] Can add CEO FE if worried about unobserved heterogeneity across CEOs [e.g. talent, risk aversion] Add industry-by-year FE if worried about unobserved heterogeneity across industries over time [e.g. investment opportunities, demand shocks]

FE Estimator – Limitations
But, FE estimator also has its limitations Can’t identify variables that don’t vary within group Subject to potentially large measurement error bias Can be hard to estimate in some cases

Limitation #1 – Can’t est. some var.
If no within-group variation in the independent var., x, of interest, can’t disentangle it from group FE It is collinear with group FE; and will be dropped by computer or swept out in the within transformation In some cases, IV can be used to obtain estimates for variables that do not vary within groups [see Hausman and Taylor 1981]
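The collinearity between a group-invariant regressor and the group fixed effects can be shown in a few lines. The sketch below uses a small, made-up panel (sizes and seed are assumptions) and checks both views of the problem: the within transformation wipes the variable out, and adding it to the dummy matrix does not increase the matrix rank.

```python
import numpy as np

rng = np.random.default_rng(7)
N, J = 30, 4                                   # small illustrative panel
group = np.repeat(np.arange(N), J)
z = rng.normal(size=N)[group]                  # regressor constant within each group

# Within transformation: z minus its group mean is zero everywhere
group_means = np.bincount(group, weights=z) / np.bincount(group)
z_within = z - group_means[group]

# LSDV view: z is a linear combination of the group dummies
D = (group[:, None] == np.arange(N)).astype(float)
rank_with_z = np.linalg.matrix_rank(np.column_stack([z, D]))

print(np.allclose(z_within, 0.0), rank_with_z, N)
```

Since the demeaned variable has no variation left, its coefficient cannot be estimated once group fixed effects are included.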

Limitation #2 – Noisy ind. variables
If some within-group variation is noise, then the share of exploited variation that is noise rises in FE. Think of there being two types of variation: good (meaningful) variation, and noise variation because we don’t perfectly measure the underlying variable of interest. Adding FE can sweep out a lot of the good variation; the fraction of remaining variation coming from noise goes up [What will this do?]

Noisy independent variables [Part 2]
Answer: Attenuation bias on mismeasured (i.e. noisy) independent variable will go up! Practical advice: Be careful in interpreting ‘zero’ coefficients on potentially mismeasured regressors; might just be attenuation bias! Note… sign of bias on other coefficients will be generally difficult to know
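The increase in attenuation bias can be illustrated with a simulated regressor that is measured with classical noise. In this sketch most of the true variation in x is across groups (variance 4) and little is within groups (variance 1), while the measurement noise has variance 1; there is deliberately no unobserved group effect, so the only bias is attenuation. All parameter values and helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
N, J, beta = 20_000, 5, 1.0                     # assumed sizes; true beta = 1

# True regressor: large between-group variation, small within-group variation
x_true = 2.0 * rng.normal(size=(N, 1)) + rng.normal(size=(N, J))
x_obs = x_true + rng.normal(size=(N, J))        # observed with classical noise
y = beta * x_true + rng.normal(size=(N, J))

def slope(x, z):
    """No-intercept OLS slope of z on x (all variables are mean zero)."""
    return (x * z).sum() / (x * x).sum()

def within(v):
    """Demean within groups (rows are groups)."""
    return v - v.mean(axis=1, keepdims=True)

b_ols = slope(x_obs, y)                         # signal share of variance: 5/6
b_fe = slope(within(x_obs), within(y))          # signal share within groups: 1/2
print(f"OLS: {b_ols:.3f}  FE: {b_fe:.3f}  true beta: {beta}")
```

The fixed effects remove the between-group signal but not the noise, so the FE estimate is attenuated far more than the pooled OLS estimate in this example.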

Noisy independent variables [Part 3]
Problem can also apply even when all variables are perfectly measured [How?] Answer: Adding FE might throw out relevant variation; e.g. y in a firm FE model might respond to sustained changes in x, rather than transitory changes [see McKinnish 2008 for more details]. With FE you’d only have the transitory variation left over; might find x uncorrelated with y in FE estimation even though sustained changes in x are the most important determinant of y.

Possible solutions for Limitation #2
Standard solutions for measurement error apply (e.g. IV), but in practice, it is hard to fix
For examples of how to deal with measurement error, see the following papers
  Griliches and Hausman (JoE 1986)
  Biorn (Econometric Reviews 2000)
  Erickson and Whited (JPE 2000, RFS 2012)
  Almeida, Campello, and Galvao (RFS 2010)

Limitation #3 – Computation issues
Researchers occasionally motivate using AdjY and AvgE because the FE estimator is computationally difficult when there is more than one high-dimensional FE
Now, let’s see why this is (and isn’t) a problem…
Notes
The within transformation only removes one group effect. The usual way to handle a second group effect is to use the within transformation to remove one type of group effect (particularly the one with high dimension) and add dummy variables for the other, e.g. firm FE swept out and year dummies added.
This can be a problem, however, if both groups are of high dimension, because it can be computationally infeasible to store in memory and invert the matrix necessary to complete OLS.
If the data is balanced, meaning there is a consistent set of observations for each subgroup, the data can be transformed by demeaning all the variables in both dimensions and then using OLS. Greene (2000) shows how to do the transformation in a balanced panel.
But, in empirical finance, we almost always deal with unbalanced panels (e.g. firms exit and enter the data), and the transformation there is considerably more complicated.
The transformation in the unbalanced case (but with patterned data) was proposed by Wansbeek and Kapteyn (1989) and is discussed by Baltagi (1995). But, this case is not that applicable in practice, since the cases where such computational problems arise are not cases where the data is patterned.
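For the balanced case mentioned in the notes, the two-way demeaning can be sketched as follows. The simulated data, the panel size, and the helper names are assumptions for illustration only. Each variable is transformed as value minus firm mean minus year mean plus overall mean, and the resulting slope is compared with a regression that includes explicit firm and year dummies.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 30, 6                       # hypothetical balanced panel
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)

f = rng.normal(size=n_firms)[firm]             # firm effects
d = rng.normal(size=n_years)[year]             # year effects
x = 0.5 * f + 0.5 * d + rng.normal(size=firm.size)
y = 2.0 * x + f + d + rng.normal(size=firm.size)

def group_mean(v, g):
    """Mean of v within each group, expanded back to observation level."""
    return (np.bincount(g, weights=v) / np.bincount(g))[g]

def two_way_demean(v):
    """Balanced-panel transformation: v - firm mean - year mean + overall mean."""
    return v - group_mean(v, firm) - group_mean(v, year) + v.mean()

x_dd, y_dd = two_way_demean(x), two_way_demean(y)
beta_demeaned = float(x_dd @ y_dd / (x_dd @ x_dd))

# Benchmark: least squares with explicit firm and year dummies (one year dropped).
D_firm = (firm[:, None] == np.arange(n_firms)).astype(float)
D_year = (year[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([x, D_firm, D_year])
beta_lsdv = float(np.linalg.lstsq(X, y, rcond=None)[0][0])

print(abs(beta_demeaned - beta_lsdv) < 1e-8)
```

This equivalence relies on the panel being balanced; with firms entering and exiting, the simple double demeaning no longer reproduces the dummy-variable estimate.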

Computational issues [Part 1]
Estimating a model with multiple types of FE can be computationally difficult
When there is more than one type of FE, you cannot remove both using the within-transformation
Generally, you can only sweep one away with the within-transformation; the other FE are dealt with by adding dummy variables to the model
E.g. firm and year fixed effects [See next slide]

Consider the below model:
  $y_{i,t} = \beta x_{i,t} + \delta_t + f_i + u_{i,t}$
where $\delta_t$ is the year FE and $f_i$ is the firm FE
To estimate this in Stata, we’d use a command something like the following…
  xtset firm
  xi: xtreg y x i.year, fe
"xtset firm" tells Stata that the panel dimension is given by the firm variable
The "fe" option tells Stata to remove the FE for panels (i.e. firms) by doing the within-transformation
"i.year" tells Stata to create and add dummy variables for the year variable

Dummies not swept away in the within-transformation are actually estimated
With year FE, this isn’t a problem because there aren’t that many years of data
If we had to estimate 1,000s of firm FE, however, it might be a problem…

Why is this a problem?
Estimating an FE model with many dummies can require a lot of computer memory
E.g., estimation with both firm and 4-digit industry-year FE requires ≈ 40 GB of memory
Most researchers don’t have this much memory; hence, we don’t see these regressions being used
Notes
For example, researchers that use firm-level data are increasingly concerned about unobserved heterogeneity across firms AND time-varying heterogeneity across industries, such as industry-level shocks to demand. The way to control for this would be to put in both firm and industry-year FE.
Even if you dialed it down to just 3-digit industry-year fixed effects, it would still require about 27 GB.
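A rough back-of-the-envelope calculation shows why explicit dummies become expensive. The sample sizes below are hypothetical and are not the data behind the 40 GB figure above; actual memory use also depends on the software and on what else must be stored. The sketch simply counts the bytes needed to hold a dense observations-by-dummies matrix in double precision.

```python
def dense_dummy_memory_gb(n_obs, n_dummies, bytes_per_entry=8):
    """Memory (in GB) needed to store a dense n_obs x n_dummies matrix of doubles."""
    return n_obs * n_dummies * bytes_per_entry / 1e9

# Hypothetical sizes, for illustration only.
year_dummies_gb = dense_dummy_memory_gb(n_obs=100_000, n_dummies=30)
high_dim_dummies_gb = dense_dummy_memory_gb(n_obs=100_000, n_dummies=20_000)

print(year_dummies_gb)      # a few dozen year dummies: a small fraction of a GB
print(high_dim_dummies_gb)  # tens of thousands of dummies: many GB
```

Memory grows linearly in the number of dummies, so a second high-dimensional FE quickly exceeds what a typical machine can hold.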

This is a growing problem
Multiple unobserved heterogeneities are increasingly argued to be important
Manager and firm fixed effects in executive compensation and other CF applications [Graham, Li, and Qiu 2011, Coles and Li 2011]
Firm, industry×year, state×year FE to control for industry- and state-level shocks [Gormley and Matsa 2014]
Notes
In the analysis of executive compensation, there may be concern about unobserved heterogeneity across managers (such as skill, risk aversion, personality) and unobserved heterogeneity across firms (such as firm culture and organization capital) that might also be correlated with the variables of interest (Graham, Li, Qiu, forthcoming).
Likewise, researchers that use firm-level data are increasingly concerned about time-varying heterogeneity across industries and states, such as industry- and state-level shocks to demand (Gormley and Matsa 2014).
Note: These are not papers that use AvgE or AdjY! The first set use an alternative approach, and Gormley and Matsa (2014) used techniques discussed in the following slides.

But, there are solutions!
There exist two techniques that can be used to arrive at consistent FE estimates without requiring as much memory
#1 – Interacted fixed effects
#2 – Memory-saving procedures

#1 – Interacted fixed effects
Combine multiple fixed effects into a one-dimensional set of fixed effects, and remove them using the within transformation
E.g. firm and industry-year FE could be replaced with firm-industry-year FE
But, there are limitations…
  Can severely limit the parameters you can estimate
  Could have serious attenuation bias
Notes
In the above example, you would have firm-year FE and only be able to identify the effect of variables that vary WITHIN firm-years. Given most finance datasets only contain one observation per firm-year, using interacted fixed effects in this case is infeasible because there is no within variation left after including firm-year fixed effects!
And even when some variation remains after transforming the data using interacted fixed effects, the estimates may suffer from serious attenuation bias because of measurement error. You’ve in essence removed more variation than you really needed to.
Another limitation of interacted fixed effects estimation is that it does not allow the researcher to recover the uninteracted fixed effects. When the researcher seeks to analyze the distribution, correlation, and importance of the fixed effects for specific groups (such as manager/worker fixed effects, as in Abowd, Kramarz, and Margolis 1999; Abowd, Creecy, and Kramarz 2002; Coles and Li 2011a; Graham, Li, and Qiu 2011), other estimation techniques that allow the researcher to recover the estimates on the separate fixed effects are required.
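The infeasibility described in the notes can be checked directly. The sketch below uses a made-up panel with exactly one observation per firm-year (panel size and variable names are illustrative assumptions): once firm is interacted with year, every cell holds a single observation, so demeaning within cells wipes out all variation in x.

```python
import numpy as np

# Hypothetical firm-year panel with exactly one observation per firm-year.
n_firms, n_years = 4, 3
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
x = np.random.default_rng(7).normal(size=firm.size)

# Interacting firm with year yields one group id per firm-year cell.
firm_year = firm * n_years + year
obs_per_cell = np.bincount(firm_year)

# Within transformation on the interacted groups.
cell_means = np.bincount(firm_year, weights=x) / obs_per_cell
x_within = x - cell_means[firm_year]

print(obs_per_cell.max())           # 1: a single observation in every cell
print(np.allclose(x_within, 0.0))   # True: no within variation remains
```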

#2 – Memory-saving procedures
Use properties of sparse matrices to reduce required memory, e.g. Cornelissen (2008)
Or, instead iterate to a solution, which eliminates the memory issue entirely, e.g. Guimaraes and Portugal (2010)
See Gormley and Matsa (RFS 2014) for details of how each method works
Both can be done in Stata using the user-written commands FELSDVREG and REGHDFE
Notes
The basic idea of the sparse matrix approach is that the matrix with all your fixed effects includes just zeros and ones. There are ways to store such matrices that require very little memory. Making use of these can reduce memory problems significantly. E.g. in the 4-digit industry-year and firm FE example, memory drops from 40 GB to just 2 GB.
The iteration is very quick. The only problem is that it has the potential to require a large number of computations. Though, in practice, we’ve never had a problem with it.
Iteration is very intuitive. You basically make an initial guess of the parameters. Then, solve for the first beta that minimizes the squared residuals holding the other guesses constant. You update your guess and continue to repeat this process until it converges. This is the Gauss-Seidel zigzag algorithm.
These iterative algorithms provide consistent estimates of the coefficients and standard errors for the K parameters of interest.
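The zigzag iteration described in the notes can be sketched in a few lines. This is only an illustrative implementation in the spirit of that description, not the code used by the cited papers or by the Stata commands; the simulated unbalanced panel, group counts, tolerance, and variable names are all assumptions. Each step minimizes the sum of squared residuals with respect to one block (beta, the firm effects, or the industry-year effects) while holding the others fixed, and the result is compared with a regression on explicit dummies.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_firms, n_indyr = 3000, 200, 40        # hypothetical, unbalanced panel
firm = rng.integers(0, n_firms, size=n_obs)
ind_year = rng.integers(0, n_indyr, size=n_obs)

f = rng.normal(size=n_firms)[firm]
d = rng.normal(size=n_indyr)[ind_year]
x = 0.5 * f + 0.5 * d + rng.normal(size=n_obs)
y = 2.0 * x + f + d + rng.normal(size=n_obs)

def group_mean(v, g):
    """Mean of v within each group, expanded back to observation level."""
    counts = np.maximum(np.bincount(g), 1)     # guard against empty group ids
    return (np.bincount(g, weights=v) / counts)[g]

beta = 0.0
a = np.zeros(n_obs)   # current guess of each observation's firm effect
b = np.zeros(n_obs)   # current guess of each observation's industry-year effect
for _ in range(5000):
    # Step 1: best beta given the current fixed-effect guesses.
    r = y - a - b
    beta_new = float(x @ r / (x @ x))
    # Step 2: best firm effects given beta and the industry-year effects.
    a = group_mean(y - beta_new * x - b, firm)
    # Step 3: best industry-year effects given beta and the firm effects.
    b = group_mean(y - beta_new * x - a, ind_year)
    converged = abs(beta_new - beta) < 1e-12
    beta = beta_new
    if converged:
        break

# Benchmark: explicit dummies (feasible here only because the example is small).
D_firm = (firm[:, None] == np.arange(n_firms)).astype(float)
D_indyr = (ind_year[:, None] == np.arange(1, n_indyr)).astype(float)
X = np.column_stack([x, D_firm, D_indyr])
beta_lsdv = float(np.linalg.lstsq(X, y, rcond=None)[0][0])

print(abs(beta - beta_lsdv) < 1e-6)
```

The loop only ever stores vectors of length n_obs, never the dummy matrix, which is why memory stops being the binding constraint. Standard errors are not computed in this sketch.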

These latter techniques work…
Estimated a typical capital structure regression with firm and 4-digit industry×year dummies
Standard FE approach would not work; my computer did not have enough memory…
Sparse matrix procedure took 8 hours…
Iterative procedure took 5 minutes
See the new Gormley and Matsa “Playing it Safe” working paper for an example application
Notes
Sparse matrix dropped required memory from 40 GB to just 2 GB
Computational times will obviously vary based on computing resources

See website for more details…
For examples of SAS, Stata, and R code one can use to estimate these high-dimensional FE models, please see our website

Concluding remarks
Unobserved heterogeneity across groups is a common identification concern in empirical finance
Despite heavy use, AdjY and AvgE are typically biased
  Can lead to very misleading inferences, including estimates with the opposite sign of the true effect
  Problem also applies to other, ad hoc transformations of the dependent variable used in the literature
FE is the best way to account for unobserved heterogeneity; its limitations can easily be overcome

Practical advice… the punch lines
Don’t use AdjY or AvgE!
Don’t use group averages as instruments!
But, do use fixed effects
  Should use benchmark portfolio-period FE in asset pricing rather than characteristic-adjusted returns
  Use iteration techniques to estimate models with multiple high-dimensional FE

Additional sources
In addition to Gormley and Matsa (RFS 2014), other sources used to construct these slides are…
Chapter 10 of Wooldridge, Jeffrey M., 2010, Econometric Analysis of Cross-Section and Panel Data, MIT Press, Massachusetts, Second Edition.
Chapter 11 of Greene, William H., 2011, Econometric Analysis, Prentice Hall, N.J., Seventh Edition.
Section 5.1 of Angrist, Joshua D., and Jorn-Steffen Pischke, 2009, Mostly Harmless Econometrics, Princeton University Press, New Jersey.
