Implementation of the Bayesian approach to imputation at SORS Zvone Klun and Rudi Seljak Statistical Office of the Republic of Slovenia Oslo, September.


1 Implementation of the Bayesian approach to imputation at SORS
Zvone Klun and Rudi Seljak
Statistical Office of the Republic of Slovenia
Oslo, September 2012

2 EU-SILC data
Data on income and living conditions
Data on household members and selected individuals
Among the large number of variables we selected:
VARIABLE TO BE IMPUTED
PY010G - Gross annual income
About 11% of the data were deleted completely at random
EXPLANATORY VARIABLES
PE040 - Level of education attained
PL060 - Number of hours usually worked per week
AGE - Age of person
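The deletion step on this slide can be illustrated with a minimal Python sketch. It assumes that "completely at random" means each value is dropped independently with probability 0.11; the function name `delete_mcar` and the synthetic log-normal income values are hypothetical stand-ins and are not EU-SILC data.

```python
import numpy as np

def delete_mcar(values, rate, rng):
    """Return a copy of `values` with each entry set to NaN with probability
    `rate`, independently of every variable, plus the deletion mask."""
    values = np.asarray(values, dtype=float).copy()
    mask = rng.random(values.shape[0]) < rate  # True marks a deleted value
    values[mask] = np.nan
    return values, mask

rng = np.random.default_rng(2012)
# Synthetic, right-skewed stand-in for PY010G (assumption, not survey data).
income = rng.lognormal(mean=9.5, sigma=0.6, size=5000)
income_with_gaps, deleted = delete_mcar(income, rate=0.11, rng=rng)
print(f"share deleted: {deleted.mean():.3f}")
```

Keeping the original values alongside the mask makes it possible to compare imputed results with the initial data later on.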

3 Analysis of PY010G
PY010G is very asymmetrical
Analysis according to PE040
Because PE040 is categorical, models of equal form are fitted for each education level

4 Further analysis of PY010G
For each level of education achieved
Analysis according to AGE and PL060
For the 5th education level

5 Model for PY010G
$PY010G = \beta_1 + \beta_2 \cdot PL060 + \beta_3 \cdot AGE + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
In matrix form: $Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$
Estimations:
$\hat{\beta} = (X_{obs}^T X_{obs})^{-1} X_{obs}^T Y_{obs}$
$s^2 = (Y_{obs} - X_{obs}\hat{\beta})^T (Y_{obs} - X_{obs}\hat{\beta}) / (n_{obs} - k)$
Example for: PE040=5, AGE=40, PL060=40
Graphs of normal distribution with respect to the data (red) and regression model (green).
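The estimation step on this slide can be sketched in Python with NumPy. This is a minimal illustration of the least-squares formulas on the observed rows, not the authors' SAS code; the function name `ols_fit` and the synthetic values for PL060, AGE and the coefficients are assumptions made only for the example.

```python
import numpy as np

def ols_fit(X_obs, y_obs):
    """Least-squares estimates on the observed rows.

    Returns beta_hat = (X'X)^-1 X'y, the residual variance
    s2 = RSS / (n_obs - k), and (X'X)^-1 for later use.
    """
    n_obs, k = X_obs.shape
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    s2 = resid @ resid / (n_obs - k)
    return beta_hat, s2, XtX_inv

rng = np.random.default_rng(0)
n = 300
hours = rng.uniform(20, 50, size=n)   # hypothetical PL060 values
age = rng.uniform(20, 65, size=n)     # hypothetical AGE values
X = np.column_stack([np.ones(n), hours, age])  # intercept, PL060, AGE
true_beta = np.array([2000.0, 300.0, 150.0])   # made-up coefficients
y = X @ true_beta + rng.normal(0.0, 3000.0, size=n)

beta_hat, s2, _ = ols_fit(X, y)
print("beta_hat:", np.round(beta_hat, 1))
print("s2:", round(float(s2), 1))
```

In this sketch one such fit would be run separately for each education level, which matches the per-level analysis described on the earlier slides.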

6 Bayes approach
Equal treatment for
DATA: $Y$ (PY010G)
and PARAMETERS: $\beta$, $\sigma^2$
Parameters: are not fixed values, have their own probability distribution.

7 Simulations and Multiple imputation
Simulations of parameters:
first draw the variance: $\sigma^2 \mid Y_{obs}, X_{obs} \sim \text{Scaled-Inv-}\chi^2(n_{obs} - k,\ s^2)$,
then draw the coefficients: $\beta \mid \sigma^2, Y_{obs}, X_{obs} \sim N(\hat{\beta},\ (X_{obs}^T X_{obs})^{-1} \sigma^2)$.
Simulations of missing values (Multiple imputation):
draw each missing value: $y_{mis,i} \sim N(X_{mis,i}\beta,\ \sigma^2)$, independently for each missing value ($i = 1, 2, \ldots, n_{mis}$).
5 imputations give almost 98% efficiency (Rubin's formula for a rate of missing information of about 11%).
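The two simulation steps and the efficiency figure on this slide can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the names `impute_once` and `mi_efficiency` and all synthetic data are assumptions. The scaled inverse chi-square draw is obtained as $\nu s^2 / \chi^2_\nu$ with $\nu = n_{obs} - k$, and the efficiency is computed as $(1 + \gamma/m)^{-1}$ for a rate of missing information $\gamma$ and $m$ imputations.

```python
import numpy as np

def impute_once(X_obs, y_obs, X_mis, rng):
    """One imputation: draw sigma^2, then beta, then the missing values."""
    n_obs, k = X_obs.shape
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    nu = n_obs - k
    s2 = resid @ resid / nu
    # sigma^2 ~ Scaled-Inv-chi2(nu, s2), drawn as nu * s2 / chi2_nu
    sigma2 = nu * s2 / rng.chisquare(nu)
    # beta ~ N(beta_hat, sigma2 * (X'X)^-1), via a Cholesky factor
    chol = np.linalg.cholesky(XtX_inv)
    beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(k))
    # y_mis,i ~ N(X_mis,i beta, sigma2), independently for each missing value
    return rng.normal(loc=X_mis @ beta, scale=np.sqrt(sigma2))

def mi_efficiency(gamma, m):
    """Relative efficiency of m imputations for missing-information rate gamma."""
    return 1.0 / (1.0 + gamma / m)

rng = np.random.default_rng(7)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(20, 50, n), rng.uniform(20, 65, n)])
y = X @ np.array([2000.0, 300.0, 150.0]) + rng.normal(0.0, 3000.0, n)  # synthetic
missing = rng.random(n) < 0.11  # about 11% deleted completely at random

imputations = [impute_once(X[~missing], y[~missing], X[missing], rng)
               for _ in range(5)]
print("first imputed values:", np.round(imputations[0][:3], 1))
print(f"efficiency: {mi_efficiency(0.11, 5):.3f}")  # about 0.978
```

Drawing new parameters for every imputation is what lets the five completed data sets reflect uncertainty about the model as well as about the individual missing values.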

8 Imputed values Example of 5 imputations for: PE040=5, AGE=40, PL060=40

9 Evaluation
Comparison of the average gross annual income
(Initial data: data before deleting.)
Small relative errors
Relatively narrow 95% confidence intervals
Poorer results for model 6, because:
only 58 units
high variance from the linear regression ( )
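The slide does not say how the averages and confidence intervals were combined across the five imputations. One common choice is Rubin's combining rules, sketched below in Python as an assumption rather than as the authors' method. The function name `pool_mean` and the data are hypothetical, the fill-in step is a crude stand-in for the regression-based imputation, and the interval uses a normal quantile as a simplification of the usual t reference distribution.

```python
import numpy as np
from statistics import NormalDist

def pool_mean(completed):
    """Combine the mean over m completed data sets with Rubin's rules."""
    m = len(completed)
    q = np.array([np.mean(d) for d in completed])                   # per-set means
    u = np.array([np.var(d, ddof=1) / len(d) for d in completed])   # their variances
    q_bar = q.mean()
    within = u.mean()
    between = q.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    z = NormalDist().inv_cdf(0.975)  # normal approximation (assumption)
    half = z * np.sqrt(total)
    return q_bar, total, (q_bar - half, q_bar + half)

rng = np.random.default_rng(1)
initial = rng.normal(20000.0, 5000.0, size=2000)   # synthetic "initial data"
missing = rng.random(initial.size) < 0.11
obs = initial[~missing]

completed = []
for _ in range(5):
    filled = initial.copy()
    # Stand-in imputation: draws from the observed mean and spread only.
    filled[missing] = rng.normal(obs.mean(), obs.std(ddof=1), size=missing.sum())
    completed.append(filled)

q_bar, total_var, (lo, hi) = pool_mean(completed)
rel_err = (q_bar - initial.mean()) / initial.mean()
print(f"pooled mean: {q_bar:.1f}, 95% CI: ({lo:.1f}, {hi:.1f})")
print(f"relative error vs initial data: {rel_err:.4%}")
```

The relative error here compares the pooled mean with the mean of the data before deletion, which is the kind of comparison the slide describes.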

10 Discussion
The method is effective if the data are successfully described by the selected model.
The mechanism of missing values is ignorable if the missing data are MAR and the parameters of the model and the parameters of the missing-value mechanism are distinct (the parameters are independent).
Imputed and explanatory variables have to be numerical.
We tested the method progressively using SAS.
The method is already included in the MCMC procedure in newer versions (9.2 and 9.3) of SAS.
Thank you for your attention!

