3 Forecasting
Objective: Forecast.
Distinction: ex post vs. ex ante forecasting.
  Ex post: RHS data are observed.
  Ex ante: RHS data must be forecasted.
Prediction vs. model validation.
  Within-sample prediction
  "Hold out sample"
4 Prediction Intervals
Given $x^0$, predict $y^0$. Two cases:
  Estimate $E[y|x^0] = \beta'x^0$
  Predict $y^0 = \beta'x^0 + \varepsilon^0$
Obvious predictor: $b'x^0$ + estimate of $\varepsilon^0$. Forecast $\varepsilon^0$ as 0, but allow for variance.
Alternative: When we predict $y^0$ with $b'x^0$, what is the 'forecast error'?
  $\hat{y}^0 - y^0 = b'x^0 - \beta'x^0 - \varepsilon^0$,
so the variance of the forecast error is
  $x^{0\prime}\,\mathrm{Var}[b - \beta]\,x^0 + \sigma^2$.
How do we estimate this? Form a confidence interval. Two cases:
  If $x^0$ is a vector of constants, the variance of $b'x^0$ is just $x^{0\prime}\,\mathrm{Var}[b]\,x^0$. Form the confidence interval as usual.
  If $x^0$ had to be estimated, then $b'x^0$ is a product of random variables. What is the variance of the product? (Ouch!) One possibility: use bootstrapping.
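The first case above (fixed $x^0$) can be sketched numerically. This is a minimal illustration on simulated data, not part of the original slides: the data-generating values, the point `x0`, and the use of a normal critical value in place of the t value are all assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Simulated data (illustrative assumption): y = 1 + 2*x + eps, sd(eps) = 0.5
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

# OLS: b = (X'X)^{-1} X'y, s^2 = e'e/(n-K), Est.Var[b] = s^2 (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
K = X.shape[1]
s2 = (e @ e) / (n - K)
var_b = s2 * XtX_inv

# Fixed point x0 (hypothetical values; the leading 1 is the constant term)
x0 = np.array([1.0, 0.3])
y0_hat = x0 @ b

# Case 1: variance for estimating E[y|x0]
var_mean = x0 @ var_b @ x0
# Case 2: variance of the forecast error for y0 itself adds sigma^2
var_forecast = s2 + var_mean

# Normal critical value used as a large-sample stand-in for the t value
z = NormalDist().inv_cdf(0.975)
ci_mean = (y0_hat - z * np.sqrt(var_mean), y0_hat + z * np.sqrt(var_mean))
pi_y0 = (y0_hat - z * np.sqrt(var_forecast), y0_hat + z * np.sqrt(var_forecast))

print("forecast:", y0_hat)
print("interval for E[y|x0]:", ci_mean)
print("interval for y0:", pi_y0)
```

The interval for $y^0$ is always wider than the interval for $E[y|x^0]$, because it carries the extra $\sigma^2$ term for the disturbance.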
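5 Forecast Variance
Variance of the forecast error is
  $\sigma^2 + x^{0\prime}\,\mathrm{Var}[b]\,x^0 = \sigma^2 + \sigma^2[x^{0\prime}(X'X)^{-1}x^0]$.
If the model contains a constant term, this is
  $\sigma^2\left[1 + \frac{1}{n} + \sum_j \sum_k (x_j^0 - \bar{x}_j)(x_k^0 - \bar{x}_k)\,(S^{-1})_{jk}\right]$,
where the sums run over the non-constant regressors and S is the matrix of sums of squares and cross products of deviations from means.
Interpretation: Forecast variance is smallest in the middle of our "experience" and increases as we move outside it.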
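A small numerical check of the interpretation may help. The sketch below uses simulated regressors and treats $\sigma^2$ as known; the names `forecast_var`, `forecast_var_dev`, `W`, and `S_inv` are hypothetical and introduced only for this illustration. It compares the general formula with the deviations-from-means form and confirms that the variance is lowest at the sample means.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated non-constant regressors (illustrative assumption)
n = 100
W = rng.normal(loc=[2.0, -1.0], scale=[1.0, 3.0], size=(n, 2))
X = np.column_stack([np.ones(n), W])
sigma2 = 1.0  # treated as known for the illustration

XtX_inv = np.linalg.inv(X.T @ X)

# Sums of squares and cross products of deviations from means
w_bar = W.mean(axis=0)
Wc = W - w_bar
S_inv = np.linalg.inv(Wc.T @ Wc)

def forecast_var(w0):
    """sigma^2 + sigma^2 * x0'(X'X)^{-1} x0, with x0 = (1, w0)."""
    x0 = np.concatenate([[1.0], np.asarray(w0, dtype=float)])
    return sigma2 * (1.0 + x0 @ XtX_inv @ x0)

def forecast_var_dev(w0):
    """Same quantity written in deviations from the sample means."""
    d = np.asarray(w0, dtype=float) - w_bar
    return sigma2 * (1.0 + 1.0 / n + d @ S_inv @ d)

at_mean = forecast_var(w_bar)
away = forecast_var(w_bar + np.array([2.0, 2.0]))
print("variance at the means:", at_mean)
print("variance away from the means:", away)
```

At the sample means the quadratic term vanishes, so the variance reduces to $\sigma^2(1 + 1/n)$; any move away from the means adds a positive amount.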
12 Dummy Variable for One Observation
A dummy variable that isolates a single observation. What does this do?
Define d to be the dummy variable in question: $d_j = 1$ for observation j and 0 elsewhere.
Z = all other regressors. X = [Z, d].
Multiple regression of y on X. We know that X'e = 0, where e = the column vector of residuals. That means d'e = 0, which says that $e_j = 0$ for that particular residual. The observation will be predicted perfectly.
Fairly important result. Important to know.
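The result can be verified directly. The following sketch uses simulated data; the sample size, coefficients, and the index `j` are arbitrary choices for illustration. It also checks a closely related fact not stated on the slide: the coefficients on Z equal those obtained by simply dropping observation j.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (illustrative assumption)
n = 50
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = Z @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# Dummy variable isolating observation j (arbitrary index)
j = 7
d = np.zeros(n)
d[j] = 1.0
X = np.column_stack([Z, d])

# Least squares regression of y on X = [Z, d]
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# Regression of y on Z with observation j removed
keep = np.arange(n) != j
b_drop, *_ = np.linalg.lstsq(Z[keep], y[keep], rcond=None)

print("residual for observation j:", e[j])
print("coefficients on Z with dummy:", b[:-1])
print("coefficients on Z, obs j dropped:", b_drop)
```

Because d'e = 0 picks out the single element $e_j$, that residual is zero up to rounding error, and the dummy coefficient absorbs whatever the other regressors fail to explain for that observation.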
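13 Oaxaca Decomposition
Two groups, two regression models: (Two time periods, men vs. women, two countries, etc.)
  $y_1 = X_1\beta_1 + \varepsilon_1$ and $y_2 = X_2\beta_2 + \varepsilon_2$
Consider mean values,
  $y_1^* = E[y_1 \mid \text{mean } x_1] = x_1^{*\prime}\beta_1$
  $y_2^* = E[y_2 \mid \text{mean } x_2] = x_2^{*\prime}\beta_2$
Now, explain why $y_1^*$ is different from $y_2^*$. (I.e., departing from $y_2$, why is $y_1$ different?) (Could reverse the roles of 1 and 2.)
  $y_1^* - y_2^* = x_1^{*\prime}\beta_1 - x_2^{*\prime}\beta_2$
  $\qquad\quad\;\; = x_1^{*\prime}(\beta_1 - \beta_2) + (x_1^* - x_2^*)'\beta_2$
The first term is the change in model; the second term is the change in conditions.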
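The decomposition is an algebraic identity, which a short computation makes concrete. This is a sketch on simulated data for two hypothetical groups, with the population coefficients replaced by OLS estimates `b1` and `b2`; the group sizes and parameter values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

def ols(X, y):
    """Least squares coefficients for y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Two simulated groups with different coefficients and different mean x
n1, n2 = 300, 400
X1 = np.column_stack([np.ones(n1), rng.normal(1.0, 1.0, n1)])
X2 = np.column_stack([np.ones(n2), rng.normal(0.5, 1.0, n2)])
y1 = X1 @ np.array([2.0, 1.5]) + rng.normal(size=n1)
y2 = X2 @ np.array([1.0, 1.0]) + rng.normal(size=n2)

b1, b2 = ols(X1, y1), ols(X2, y2)
x1_bar, x2_bar = X1.mean(axis=0), X2.mean(axis=0)

# Difference in predicted means and its two components
gap = x1_bar @ b1 - x2_bar @ b2
model_part = x1_bar @ (b1 - b2)            # change in model
conditions_part = (x1_bar - x2_bar) @ b2   # change in conditions

print("gap:", gap)
print("change in model:", model_part)
print("change in conditions:", conditions_part)
```

Since both regressions include a constant term, the fitted value at the mean of the regressors equals the sample mean of y, so `gap` also equals the raw difference in group means.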
15 Application - Income
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. The data can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set: there are altogether 27,326 observations. The number of observations per individual ranges from 1 to 7. (Frequencies are: 1=1525, 2=1079, 3=825, 4=926, 5=1051, 6=1000, 7=887.)
Variables in the file are
  HHNINC = household nominal monthly net income in German marks / (4 observations with income=0 were dropped)
  HHKIDS = 1 if there are children under age 16 in the household, 0 otherwise
  EDUC = years of schooling
  AGE = age in years
  MARRIED = 1 if married, 0 if not
  FEMALE = 1 if female, 0 if male