# Workshop on Endogeneity

Workshop on Endogeneity
Application examples and discussion in financial research. Important note: this file was prepared for workshop use only. Many of the statements and ideas here are copied from available reports, so DO NOT cite it in your references.

Reality about endogeneity ------from Jesper B Sørensen, 2012
“Endogeneity is a fancy word for a simple problem. So fancy, in fact, that the Microsoft Word spell-checker does not recognize it.” “Unfortunately, there are other sources of endogeneity that are not so easily dealt with.” “And in fact, I think that in most cases where the charge of endogeneity is filed, people are not so much worried about omitted variables. Rather, what they are worried about is things like simultaneity – i.e., X causes Y but Y also causes X, -- and self-selection.” “The problem with such endogeneity problems is that no amount of control variables will address them.”

Endogeneity problem
Dependent variable: Y. Independent variables: Xs. True model: Y = α + β Xs + ε (disturbance). Predicted model: Ye = αe + βe Xs + u (error). In a model, a variable is said to be endogenous when it is correlated with the disturbance ε of the true model.

Endogeneity problem
Broadly, endogeneity arises when there is more than one causal path between the independent variable and the dependent variable of a model (the second path runs through the disturbance when X and the disturbance are related: X --> ε --> Y). Problem: a regression cannot tell relation (1) apart from relation (3).

Endogeneity problem The common problem type in application:
Some other factors influence both the dependent variable (relation 4) and the independent variables (relation 5) in a stable way; the net effect is “equivalent” to relation (3). These other factors may be unknown (or not measurable).

Endogeneity problem
Endogeneity can arise as a result of: omitted variables; measurement error; other factors. Other forms: simultaneity; sample selection bias; autoregression with autocorrelated errors.

Related issues: easier said than done
Difficult to defend: a reviewer can easily attack, for the simple reason that there are generally many other potential factors. The question is how much is enough.

Type 1: Omitted variable
The regression result will be biased. The degree of bias depends on the relative strength of the correlations, the functional relation, and the correlated variables (we can get a feel for these in the Excel simulation). Signal of the problem: the estimated parameters are not stable across robustness tests (different sample periods or data). Testing for the problem: tests on the residuals, D-W, RESET, etc. Remedy: IV, 2SLS, GMM (see an econometrics textbook).

Type 2: Measurement error
Imagine that instead of observing the true X* we observe X = X* + ν, where ν is the measurement "noise". In this case the true model is Y = α + β X* + ε, from which it follows that Y = α + β (X - ν) + ε = α + β X + u (where u = ε - β ν). Since X and u both depend on ν, they are correlated, so the OLS estimate of β is biased toward zero (downward when β > 0). Note: measurement error in the dependent variable, however, does not cause endogeneity (though it does increase the variance of the error term).
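
The downward bias from measurement error in X can be checked with a small simulation. The sketch below is illustrative only: the parameter values, the seed, and the helper name `ols_slope` are assumptions, and the noise is assumed independent of the true regressor and of the disturbance (the classical case).

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)            # fixed seed, illustrative only
n, alpha, beta = 20_000, 1.0, 2.0 # assumed sample size and true parameters

x_true = [rng.gauss(0, 1) for _ in range(n)]                 # true regressor, unobserved
y = [alpha + beta * xs + rng.gauss(0, 1) for xs in x_true]   # true model
x_obs = [xs + rng.gauss(0, 1) for xs in x_true]              # observed = true + noise

b_true = ols_slope(x_true, y)   # regression on the true regressor
b_obs = ols_slope(x_obs, y)     # regression on the noisy regressor
print(b_true, b_obs)
```

With equal variances for the true regressor and the noise, the slope on the noisy regressor should come out at roughly half the true β, since the attenuation factor is Var(true X) / (Var(true X) + Var(noise)).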

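Type 3: Other factors
Other factors are related to X and Y at the same time.

True model: Y = α + β X + S1*F + u, with X = T0 + T1*F + v.

Solving for F: F = -T0/T1 + (1/T1) X - v/T1.

Substituting: Y = α + β X + S1 (-T0/T1 + (1/T1) X - v/T1) + u = (α - S1*T0/T1) + (β + S1/T1) X + (u - S1*v/T1).

So the regression model is Y = a’ + b’ X + μ with b’ = β + S1/T1, a biased result. Because μ = u - S1*v/T1 still contains v, which is part of X, the OLS slope equals β + S1/T1 exactly only when v has no variance; in general it converges to β + S1*Cov(X, F)/Var(X).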
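
A minimal simulation of this "other factor" case, using the symbols β, S1, T0, T1 from the derivation above. All numeric values, the seed, and the helper name `ols_slope` are assumptions chosen for illustration. It compares the slope from regressing Y on X alone with β + S1/T1 and with the general large-sample value β + S1*Cov(X, F)/Var(X).

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)            # fixed seed, illustrative only
n = 20_000
alpha, beta, S1 = 0.0, 1.0, 2.0   # Y = alpha + beta*X + S1*F + u
T0, T1 = 0.5, 2.0                 # X = T0 + T1*F + v
sd_F, sd_v = 1.0, 0.5             # assumed standard deviations of F and v

F = [rng.gauss(0, sd_F) for _ in range(n)]
X = [T0 + T1 * f + rng.gauss(0, sd_v) for f in F]
Y = [alpha + beta * x + S1 * f + rng.gauss(0, 1) for x, f in zip(X, F)]

b_short = ols_slope(X, Y)         # Y regressed on X alone, F omitted
b_no_v = beta + S1 / T1           # value when v has no variance
# Large-sample value with noisy X: beta + S1*Cov(X, F)/Var(X)
b_limit = beta + S1 * T1 * sd_F**2 / (T1**2 * sd_F**2 + sd_v**2)
print(b_short, b_no_v, b_limit)
```

The estimated slope should sit well above the true β and close to `b_limit`; the gap between `b_limit` and `b_no_v` shrinks as the variance of v goes to zero.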

Remedy: Instrumental variable (Z)
Z is correlated with X (and so with Y); the stronger this correlation, the more of the informational variance of X the instrument keeps. Z has no correlation with ε (this breaks the second relation). The estimate will be consistent (though not unbiased in finite samples), at the cost of some efficiency. The regression result will depend on the relative values of the model coefficients and on the correlation of the instrumental variable with X.

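Remedy: Instrumental variable (Z)
Structural model: Y = α + β*X + ε. First stage: X = a + b*Z + u, with fitted value Xh = a + b*Z. Second stage: Y = α + β*Xh + ω, where ω = ε + β*u (substitute X = Xh + u into the structural model).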

Remedy: 2SLS (Z)
Structural model: Y1 = α + β*Y2 + ε. First stage: Y2 = a + b*Z + u, with fitted value Y2h = a + b*Z. Second stage: Y1 = α + β*Y2h + ω, where ω = ε + β*u.
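
The two stages can be sketched in a few lines of standard-library Python. This is only an illustration under assumed values: the data-generating process, the seed, and the helper `ols` are hypothetical, and the endogeneity is created by a shock shared between X and the disturbance. Standard errors from a manual second stage are not valid, so only the point estimates are compared.

```python
import random
from statistics import fmean

def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

rng = random.Random(2)             # fixed seed, illustrative only
n, alpha, beta = 20_000, 0.5, 2.0  # assumed true parameters

Z = [rng.gauss(0, 1) for _ in range(n)]        # instrument
common = [rng.gauss(0, 1) for _ in range(n)]   # shock shared by X and the disturbance
X = [1.0 + 1.0 * z + c + rng.gauss(0, 1) for z, c in zip(Z, common)]
Y = [alpha + beta * x + c + rng.gauss(0, 1) for x, c in zip(X, common)]

_, b_ols = ols(X, Y)                  # naive OLS, affected by the shared shock
a_hat, b_hat = ols(Z, X)              # first stage: X on Z
Xh = [a_hat + b_hat * z for z in Z]   # fitted values Xh = a + b*Z
_, b_2sls = ols(Xh, Y)                # second stage: Y on Xh
print(b_ols, b_2sls)
```

In this setup the naive slope should overstate β, while the second-stage slope should be close to it, because Z is generated independently of the shared shock.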

Natural experiments: state-of-the-art solutions to endogeneity (J. Gippel, T. Smith, Y. Zhu)
A natural experiment can be one of two things: a randomized trial set up by the researcher in a natural setting; or a naturally occurring state (event) resulting from a social or political situation and thus not intentionally set up by the researcher.

Natural experiments: randomized trials set up by the researcher
Examples of a randomized trial set up by the researcher in a natural setting: an experiment where subjects were given a hypothetical \$10,000 to invest and were randomly assigned to receive varying levels of information about four S&P 500 index funds; and a study of the impact of short selling on market efficiency in which, with the cooperation of a large money manager, stocks were randomly made available to or withheld from the lending market. Controlling the assumed conditions is difficult.

Natural experiments: naturally occurring events
The second kind of natural experiment is a naturally occurring state (event) resulting from a social or political situation and thus not intentionally set up by the researcher. Because the event is not set up by the researcher, the treatment group is not randomly assigned, and the chosen control and treatment groups may differ in systematic ways other than the treatment. Key: identify the treated and control groups so as to rule out the effects of other factors (including the endogeneity elements).

Natural experiments: advantages
The advantage of using natural experiments, when well considered, is that the event is exogenous. At least, the researcher should convince her audience that this is so. Studies using such events make a strong case for a causal interpretation of the results.

Differences-in-differences
Multicollinearity Problem?

Example: Corporate Governance and Dividend Payout (Francis et al., 2011, FM)
Why do firms pay dividends? What are the determinants? Signaling and tax clientele theories are not always consistent with test results. Agency problems between insiders and outsiders have an impact on dividend payout policy. Dividend payout = a + b * CG

Particular: Dividend payout = a + b * CG
High-payout firms are associated with better CG because managers make efficient investments. Many other (unknown) factors affect dividends and C.G. at the same time (endogeneity problem).

Possible factors
Increased cash usage in: CEO compensation; employee benefits; reduction in leverage; increased investments. Increased restrictions on dividend payments imposed by state laws (external C.G.).

Exogenous event and special treatment
Impacts of antitakeover laws (event): Managers are insulated from market pressure. They begin to enjoy a “quiet” life and increase their compensation. Entrenched managers prefer to reduce or cut dividend payouts. All else being equal, after managers are insulated from takeovers they reduce their firms’ dividend payout ratios (or are less inclined to pay). The effect is more pronounced when firms have poor internal C.G. (and for small firms).

Exogenous event and special treatment
Increased restrictions on dividend payments imposed by state laws (external C.G.). Exogenous measure of C.G.: passage of antitakeover laws. Clear treatment group (firms in the states with the law change) and control group (firms in the states without the change).

Multivariate regression
Proxy for the C.G. change: passage of antitakeover laws. Payout patterns before and after the event. Multivariate regression to estimate the relation (differences-in-differences). Change in propensity to pay (using a logit model).

Differences-in-differences
Multivariate regression to estimate the relation (differences-in-differences). Control group: firms in the states that did not enact antitakeover laws at that time. The key variable in the regression (Treat*PostLaw) captures the pure effect of the law passage.

Differences-in-differences

Y = B0 + B1*Treat + B2*PostLaw + B3*(Treat*PostLaw) + Other Controls
Expected value for a treated firm after the law: B0+B1+B2+B3. PostLaw is exogenous and a proxy for the change in C.G. Common change is controlled by B2 (assumed 0 in the study)(?). Only the treated group is influenced by the event, measured by B3.
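
As a sketch of how B3 is recovered, the simulation below generates outcomes from the equation above without the other controls and computes the difference of the two before/after changes. In that no-controls case the difference of cell means equals the OLS estimate of B3. The coefficient values, the cell size, and the names `simulate` and `cell_mean` are assumptions for illustration, not figures from the study.

```python
import random
from statistics import fmean

rng = random.Random(3)                 # fixed seed, illustrative only
B0, B1, B2, B3 = 2.0, 0.3, 0.1, -0.5   # assumed coefficients
n_per_cell = 10_000                    # assumed observations per group and period

def simulate(treat, post):
    """Draw outcomes Y for one (Treat, PostLaw) cell, no other controls."""
    mean = B0 + B1 * treat + B2 * post + B3 * treat * post
    return [mean + rng.gauss(0, 1) for _ in range(n_per_cell)]

# Average outcome in each of the four cells
cell_mean = {(t, p): fmean(simulate(t, p)) for t in (0, 1) for p in (0, 1)}

change_treated = cell_mean[(1, 1)] - cell_mean[(1, 0)]   # estimates B2 + B3
change_control = cell_mean[(0, 1)] - cell_mean[(0, 0)]   # estimates B2
did = change_treated - change_control                    # estimates B3
print(did)
```

The common change B2 and the fixed group difference B1 both cancel in the double difference, which is why the identification rests on the two groups sharing the same trend in the absence of the event.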

Reality about endogeneity ------from Jesper B Sørensen, 2012
In general I think a lot of researchers, in thinking about their research designs, misallocate effort. They dedicate too much time to collecting a wide range of control variables, even though it is often the case that many of those variables either don’t affect the X or affect the Y (or both). Instead, the efforts of researchers are often better focused on thinking about actual research design -- how they might anticipate and address concerns about endogeneity in the form of things like simultaneity and self-selection.

Self-selection problems
Size effects in finance: Return = a + b * size. Does size take its value randomly, or is it a result of operating the business? Survival bias: currently listed firms vs. delisted firms. Financial bankruptcy prediction: bankrupt firms vs. surviving firms.
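
A stylized illustration of survival bias, under assumptions that are not from the slides: returns are generated from Return = a + b * size plus noise, and only firms whose return exceeds a cutoff of zero remain in the sample ("survivors"). The variable names, parameter values, and cutoff rule are hypothetical.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(4)      # fixed seed, illustrative only
n, a, b = 20_000, 0.0, 1.0  # assumed true intercept and slope

size = [rng.gauss(0, 1) for _ in range(n)]
ret = [a + b * s + rng.gauss(0, 1) for s in size]

# Assumed survival rule: a firm stays in the sample only if its return is positive
survivors = [(s, r) for s, r in zip(size, ret) if r > 0.0]
size_surv = [s for s, _ in survivors]
ret_surv = [r for _, r in survivors]

b_full = ols_slope(size, ret)            # all firms
b_surv = ols_slope(size_surv, ret_surv)  # survivors only
print(b_full, b_surv)
```

Because selection here depends on the dependent variable itself, the slope estimated on survivors should be noticeably smaller than the slope on the full sample, even though the same model generated both.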

Dynamic models
Prediction models: R(t) = a + λ * R(t-1) + b * X(t-1). Granger causal relation models (coefficients and error terms omitted): Y1(t) = Y1(t-1) + Y2(t-1) + X(t-1); Y2(t) = Y1(t-1) + Y2(t-1) + X(t-1). Endogeneity problem?

Prediction Models Y(t) = a + λ * Y(t-1) + b * X (t-1) + u(t);
Correlation (Y(t-1), u(t)) =0 (?)
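
Whether Y(t-1) is uncorrelated with u(t) depends on the error process. The sketch below assumes the errors follow u(t) = ρ u(t-1) + e(t) and drops X(t-1) for brevity; both choices, along with the parameter values and function names, are illustrative assumptions. It estimates λ by OLS with and without autocorrelated errors.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def estimate_lambda(rho, n=20_000, a=0.0, lam=0.5, seed=5):
    """Simulate Y(t) = a + lam*Y(t-1) + u(t), u(t) = rho*u(t-1) + e(t),
    then return the OLS slope of Y(t) on Y(t-1)."""
    rng = random.Random(seed)
    y_prev, u_prev = 0.0, 0.0
    ys = []
    for _ in range(n):
        u = rho * u_prev + rng.gauss(0, 1)
        y = a + lam * y_prev + u
        ys.append(y)
        y_prev, u_prev = y, u
    return ols_slope(ys[:-1], ys[1:])

lam_white = estimate_lambda(rho=0.0)  # serially uncorrelated errors
lam_auto = estimate_lambda(rho=0.5)   # autocorrelated errors
print(lam_white, lam_auto)
```

With ρ = 0 the estimate should be close to the true λ. With ρ > 0, u(t) carries over part of u(t-1), which is also contained in Y(t-1), so the lagged dependent variable becomes endogenous and the estimate moves away from λ.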

Dynamic models: CF(t) = a + b * CF(t-1) + Operating(t)
The current year's CF could be related to its previous level (as in the model). Capital investment in the preceding years could be an important factor determining both CF(t-1) and CF(t). Endogeneity is not automatic.

Dynamic models
“The endogeneity problem is particularly relevant in the context of time series analysis of causal processes. It is common for some factors within a causal system to be dependent for their value in period t on the values of other factors in the causal system in period t-1.” “Suppose that the level of pest infestation is independent of all other factors within a given period, but is influenced by the level of rainfall and fertilizer in the preceding period. In this instance it would be correct to say that infestation is exogenous within the period, but endogenous over time.” --- Jesper B Sørensen, 2012

Development and positives
A real benefit of the growing emphasis on endogeneity concerns is to force people to be explicit about this assumption. Right now, people often come to face this issue in the review process. But we can hope that over time people will internalize this concern in the same way that they have internalized the concerns with omitted variables. Find a good solution in the research design.

Trade-off
It is most helpful to think about the difference between theory generation and theory testing (a trade-off). In each research project, our choice is which side of the trade-off to take.