MKFM6: multivariate stationary state-space time-series modeling using ML estimation in the Kalman Filter at yt et zt S H G at+1 yt+1 et+1 zt+1 Z B xt+1 R at-1 at-2
y[t] an ny dimensional random vector repeatedly observed at occasion t-1...t...t+1 in a sample of N=1 or N>1 Q Q zt zt+1 G G H H H at-1 at at+1 at-2 B B S S xt xt+1 Z yt Z yt+1 et et+1 t=1.....T R R
y[t] = S a[t] S=I Q Q zt zt+1 G G H H H at-1 at at+1 at-2 B B S S xt et et+1 R R
a[t+1] = H a[t+1] + G z[t+1] G=I y[t] = S a[t] a[t+1] = H a[t+1] + G z[t+1] G=I Q Q zt zt+1 G G H H H at-1 at at+1 at-2 B B S S xt xt+1 Z yt Z yt+1 et et+1 1st Order Autoregressive in structure, Markov model VARMA(p,q) models R R
y[t] = S a[t] + e[t] S≠I a[t+1] = H a[t+1] + G z[t+1] at latent (as in factor analysis) Q Q zt zt+1 G G H H H at-1 at at+1 at-2 B B S S xt xt+1 Z yt Z yt+1 yt observed et et+1 R R
y[t] = S a[t] + e[t] + Z x[t] a[t+1] = H a[t+1] + G z[t+1] covariance matrices regression parameters Q Q zt zt+1 G G H H H at-1 at at+1 at-2 B B S S xt xt+1 yt yt+1 Z Z et et+1 R R x is fixed regressor (e.g., if x=1, Z are means)
y[t] = S a[t] + e[t] + Z x[t] a[t+1] = H a[t+1] + G z[t+1] + B x[t+1] covariance matrices regression parameters Q Q zt zt+1 G G H H H at-1 at at+1 at-2 B B S S xt xt+1 yt yt+1 Z Z et et+1 R R
y[t] = S a[t] + d + e[t] + Z x[t] a[t+1] = H a[t+1] + c + G z[t+1] + B x[t+1] d and c superfluous, but convenient Q Q zt zt+1 G G at-1 H H at H at+1 at-2 B B xt S S xt+1 Z yt Z yt+1 et et+1 R R
y[t] = S a[t] + d + e[t] + Z x[t] a[t+1] = H a[t+1] + c + G z[t+1] + B x[t+1] d and c superfluous, but convenient Q Q zt zt+1 G G at-1 H H at H at+1 at-2 c c 1 S S 1 d yt d yt+1 et et+1 R R
N Groups T type Software large ≥1 small SEM LISREL, M+, Mx at y at+1 e at-1 zt-1 zt zt+1 yt-11 yt1 yt+11 ....... yT1 ..... yt-1N ytN yt+1N ....... yTN subject time N Groups T type Software large ≥1 small SEM LISREL, M+, Mx small ≥1 intermediate hybrid MKFM 1 ≥1 large Timeseries Many, MKFM Acually all structural equation modeling, the details dictate computational strategies
S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1 at y at+1 e at-1 zt-1 zt zt+1 S R H Q G 1 d Syntax nm=1 se=yes mo=1 ny=4 ne=1 nx=0 df=ts1 rf=no ns=1 mi=-9 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1
S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 B=0 G=1 R fi di 0 0 0 0 R fr di 1 2 3 4 G fi di 1 G fr di d fr 11 12 13 14 d fi H fi H fr 21 Q fi fu 1 Q fr fu 31 S fi S fr 41 42 43 Syntax
R parameters - diagonal 0.395 0.453 0.563 0.500 H parameters 0.799 Model 1 of 1 S parameters 1.000 0.934 0.812 0.683 R parameters - diagonal 0.395 0.453 0.563 0.500 H parameters 0.799 Q parameters 0.348 d parameters 4.936 5.081 5.030 4.947 G parameters - diagonal output Q zt-1 G at-1 1 H d S y y y y e e e e R
Colored lines - observed series Black line - estimated latent series (Kalman Filter)
at y at+1 e at-1 zt-1 zt zt+1 S R H Q G 1 d Q: what if H is zero?
at y at+1 e at-1 zt-1 zt zt+1 S R Q G 1 d Q: what if H is zero? A: data at each occasion are independent. If H is zero, I can fit the model in LISREL (or Mx, or M+) Or in MKFM6
H=0 at+1 y e zt+1 S R Q I S parameters 1.000 0.867 0.841 0.665 LAMBDA-Y Q parameters 0.949 PSI R parameters - diagonal 0.557 0.481 0.472 0.634 THETA-EPS 0.557 0.481 0.472 0.634
Similarities between the LISREL model and the MKF State- Space model. measurement (linear factor) model y[t] = Sa[t] + d + e[t] + Z x[t] y[i] = Lh[i] + t + e[i] structural regression model a[t+1] = H a[t+1] + c + G z[t+1] + B x[t+1] h[i] = B h[i] + a + I z[t+1] cov(e) = R cov(e) = Q regression parameters covariance matrices cov(z) = GQG' = Q cov(z) = Y Sy-Miin Chow et al. SEM 2010. next example VAR
Restricted vector autoregressive model, S=I a1t a2t a3t a1t+1 a2t+1 a3t+1 H 1 x B d Q Restricted vector autoregressive model, S=I y[t] and a[t] variables identical
Effect of x on a2 and a3 via a1 (a causal model) regression on fixed x x x B B Q a1t a1t+1 a2t a2t+1 H a3t a3t+1 intercepts d 1 1 Effect of x on a2 and a3 via a1 (a causal model)
timeseries a1, a2, a3 x fixed variable gaps: 25% missing in each series
x x B Q B a1t a1t+1 a2t a2t+1 H a3t a3t+1 d 1 G fi di 1 1 1 Q fi di G fr di 0 0 0 d fr 1 2 3 d fi H fi H fr 4 5 6 7 8 9 10 11 12 Q fi di 1 1 1 Q fr di 21 22 23 S fi di S fr di 0 0 0 B fr 31 32 33 B fi x x B Q B a1t a1t+1 a2t a2t+1 H a3t a3t+1 d 1 1 S=1 R=0 H=1 Q=1 d=1 c=0 Z=0 P=1 B=1 G=1 next example latent var
D - depression, A anxiety autoregressive / cross lagged regressive model - with indicators
D - depression, A anxiety wife A D husband
N=1 Meas. Inv. of indicators w.r.t. external variable x z z z z a a a a y y y y y y y y y y y y e e e e e e e e e e e e B S H d (not shown) i.e. intercepts G
N=1 Meas. Inv. of indicators w.r.t. external variable x f(yi|a*) = f(yi|a*,xi) z z a a y y y y y y e e e e e e
N=1 Meas. Inv. of indicators w.r.t. external variable x two indicators biased w.r.t. x. x x z z z z a a a a y y y y y y y y y y y y e e e e e e e e e e e e B S H Z (bias with respect to x) G f(yi|a*) ≠ f(yi|a*,xi)
f(yi|a*) = f(yi|a*,subjecti) z e x B f(yi|a*) = f(yi|a*,subjecti) Are the indicators measurement invariant w.r.t. subject (e.g., N=2)? d is invariant (intercepts equal), B zero in subject 1, B free in subject 2 X could equal 1.
f(yi|a*) = f(yi|a*,xi) f(yi|a*) = f(yi|a*,subjecti) Definition of measurement invariance in N=1 or N=2. + Interpretation as an intra-individual causal model Relationship with inter-individual causal model Issue of power: simulation? exact simulation? is N=100, T=1 relevant to N=1, T=100. Application (real data)
ML parameter estimates R11 0.51000 R22 0.36000 R33 0.51000 R44 0.36000 T=50 Ny=4 N=250 ML parameter estimates R11 0.51000 R22 0.36000 R33 0.51000 R44 0.36000 D1 0.00000 D2 0.00000 D3 0.00000 D4 0.00000 H 0.00000 Q 1.00000 S21 0.80000 S31 0.70000 S41 0.80000 ML parameter estimates 0.50998 (.51) 0.35997 (.36) 0.50994 (.51) 0.36001 (.36) -0.00153 (.0) -0.00175 (.0) 0.79994 (.80) 0.37097 (.36) 0.79994 (.8) 0.69997 (.7) 0.79992 (.8)
Good points: N=1, N=few, N=many Multigroup, where group is N=1 or N>1 No limitation on length of timeseries T Can handle N=few T=intermediate (a niche!) Missing data no problem (under assumptions) Model quite flexible Freely available (FORTRAN 77 code) Easy-ish to use Bad points: Stationarity (cov structure) ML fixed effect only (no random effects) Continuous indicators, conditional normality
Estimation: Maximum Likelihood in the Kalman Filter (prediction error decomp.) Documentation: mkfm.doc (manual with examples: DFA, ARMA, includes FORTAN source code) j_adolf.doc (more examples incl meas. inv.) some technical doc (online; Ellen Hamaker) My main reference: A.C. Harvey (1996). Forecasting, structural time series models and the Kalman Filter. Cambridge: Cambridge Univ. Press. Other good references: Hamilton, Kim & Nelson. One or two articles using MKFM: Ellen Hamaker (UU).
To use: 1) Organize data input (manual) 2) Write input script (manual) 3) Run analysis in DOS window mkfm6-1 < inputfile > outputfile
Input #1: Model specification title example simulated nm=1 se=yes S fi mo=1 ny=4 ne=1 nx=1 df=ts1mi rf=no ns=1 mi=-999 B=1 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 G=1 R fi di 0 0 0 0 R fr di 1 2 3 4 G fi di 1 G fr di d fr 11 12 13 14 d fi H fi H fr 21 Q fi Q fr 31 Input #1: Model specification S fi 1 S fr 41 42 43 B fi B fr 50 P fi 100 P fr st ... lb ub
Input part #2: the data file 250 4.621209 5.754381 6.855362 7.026104 0 data file: TS1MI 250 4.621209 5.754381 6.855362 7.026104 0 5.826427 6.732414 6.818705 5.448525 0 4.840544 4.112223 6.653377 7.070116 0 6.197258 7.596196 7.257844 3.101740 1 ..........
Output part#1: Model specification max nm= 5 nt=5000 ns= 10 ny=30 nx= 5 ne=30 npar=400 Read from input file title example simulated nm=1 se=yes mo=1 ny=4 ne=1 nx=1 df=ts1mi rf=no ns=1 mi=-999 B=1 S=1 R=1 H=1 Q=1 d=1 c=0 Z=0 P=1 G=1 =================== MKFv1 April 2010 Model 1 of 1 S fr parameters (nonzero) 11 12 13 R fr parameters (nonzero) - diagonal 1 2 3 4 H fr parameters (nonzero) 9 Q fr parameters (nonzero) 10 d fr parameters (nonzero) 5 6 7 8 B fr parameters (nonzero) 14 Output part#1: Model specification
Output part#2: Summary stats + parameter estimates, st errs, tvals DATA SUMMARY MODEL 1 of 1 NY= 4 NX= 1 NE= 1 Ncases= 1 START= 1 END= 1 CASE 1 T= 250 N of T missing= 0 datafile ts1mi State_0 0.00 var 1 2 3 4 %miss 0.2720 0.2640 0.2520 0.2880 mean 5.52 5.41 5.49 5.40 var 1.83 1.67 1.39 0.94 std 1.35 1.29 1.18 0.97 min 1.32 2.01 2.23 3.22 max 8.82 9.75 9.33 8.01 Number of fixed regressors 1 ML parameter estimates nr 1 0.55179 g 0.000026 se 0.0851 t 6.48 nr 2 0.51076 g 0.000012 se 0.0748 t 6.83 nr 3 0.68355 g 0.000048 se 0.0830 t 8.23 nr 4 0.44029 g 0.000038 se 0.0562 t 7.83 nr 5 4.47651 g 0.000020 se 0.2983 t 15.01 nr 6 4.47448 g 0.000051 se 0.2748 t 16.28 nr 7 4.73732 g -0.000049 se 0.2244 t 21.11 nr 8 4.71898 g -0.000096 se 0.1928 t 24.47 nr 9 0.72616 g -0.000106 se 0.0506 t 14.36 nr 10 0.48247 g -0.000055 se 0.0932 t 5.18 nr 11 0.92335 g 0.000051 se 0.0738 t 12.52 nr 12 0.71907 g -0.000143 se 0.0710 t 10.13 nr 13 0.62061 g -0.000031 se 0.0609 t 10.19 nr 14 0.95749 g -0.000021 se 0.1301 t 7.36 Logl -513.914 -2xLogL 1027.828 Inform(NPSOL) 0
Output part#3: parameter estimates in matrices title example simulated Model 1 of 1 S parameters 1.000 0.923 0.719 0.621 R parameters - diagonal 0.552 0.511 0.684 0.440 H parameters 0.726 Q parameters 0.482 P parameters 100.000 d parameters 4.477 4.474 4.737 4.719 G parameters - diagonal B parameters 0.957 P(t|t) error cov 0.233 Output part#3: parameter estimates in matrices