When adjusting for bias due to linkage errors: a sensitivity analysis Q2014 Tiziana Tuoto 05/06/2014 Joint work with Loredana Di Consiglio.

1 When adjusting for bias due to linkage errors: a sensitivity analysis
Q2014, Tiziana Tuoto, 05/06/2014
Joint work with Loredana Di Consiglio

2 Outline of the talk
1. Motivations
2. Linkage errors and total survey error
3. Methodologies for analyses on linked data
4. A sensitivity analysis
5. Concluding remarks and future work
Adjusting for bias due to linkage errors, Tiziana Tuoto – Vienna, June 5, 2014

3 Why linking and why linkage errors?
The integration of different sources (surveys, administrative lists, registers) has acquired a preeminent role.
The considerable effort needed to link data is not the final aim of the statistical process.
Whatever statistical analysis is performed on integrated data, it should take into account that record linkage is subject to two types of errors:
1. erroneous acceptance of false links
2. rejection of true matches (missed links)

4 Linkage Errors and Total Survey Error (Biemer, 2010)

5 Linkage Errors and Total Survey Error (Zhang, 2012)

6 Methodologies for analyses on linked data
1965: Neter, Maynes and Ramanathan
1993-1997: Scheuren and Winkler
2000: Lahiri and Larsen
2009: Chambers, Regression analysis of probability-linked data, Official Statistics Research Series, Vol. 4
2011: Chipperfield, Bishop and Campbell
Chambers (2009) gives a systematic overview of regression analysis of linked data, describes the approaches developed by Neter et al., Scheuren and Winkler, and Lahiri and Larsen, and proposes his own bias-corrected estimators of the regression parameters.

7 Methodologies for analyses on linked data
These methods work under strong assumptions:
An exchangeable linkage errors model
Linking sets of equal size (or the smaller set contained in the larger one)
Linkage under a 1:1 constraint
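The exchangeability assumption can be written compactly. The following is a sketch in notation commonly used for this model (for example in Chambers, 2009). The symbols λ, γ, M and A do not appear on the slides and are introduced here as assumptions. Within a block of M records linked 1:1, the observed responses y* are a random permutation A of the true responses y; each record has the same probability λ of being correctly linked, and a wrongly linked record is equally likely to be paired with any of the other M − 1 records:

```latex
\mathbf{y}^{*} = \mathbf{A}\,\mathbf{y}, \qquad
E[\mathbf{A} \mid \mathbf{X}] = (\lambda - \gamma)\,\mathbf{I}_{M} + \gamma\,\mathbf{1}_{M}\mathbf{1}_{M}^{\top}, \qquad
\gamma = \frac{1 - \lambda}{M - 1}
```

Each row of the expected matrix sums to λ + (M − 1)γ = 1. Writing A as a square permutation matrix is what requires the two linking sets to have equal size and the linkage to be 1:1.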

8 A sensitivity analysis
Winkler (2014) notes: «Scheuren and Winkler (1997) observed that, if linkage error is below 1%, then can perform statistical analysis without adjustment. Most ‘good’ matching situations have overall linkage error above 10%. Even ‘high match scores’ sets of pairs may have linkage error in range 1-5%. The current models may adjust the ‘observed’ matched pairs to having linkage error down from 10% to 7.5%»

9 Experimental data
Random sample of 1000 units from the fictitious population census data of the ESSnet DI (2011).
Linear model (as in Chambers, 2009): Y = Xβ + ε, with X ~ [1, Uniform(0,1)], β = [1, 5], ε ~ Norm(0,1).
Logistic model: X ~ Bernoulli(0.75), Y ~ Multinom(0.7, 0.05, 0.2, 0.05), dependent on X.
Two lists, L1 and L2, were generated: L1 = [Xs, 942 units], L2 = [Ys, 921 units].
Units in common (the true matches): 868; true un-matches: 127.
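As an illustration of the linear setup on this slide, the sketch below generates 1000 units, reading the model as Y = Xβ + ε with β = [1, 5] and standard normal errors. The seed and variable names are arbitrary choices made here, and the draw will not reproduce the values used in the talk. The logistic case is left out because the slide does not say how Y depends on X.

```python
import numpy as np

rng = np.random.default_rng(2014)  # arbitrary seed, for reproducibility only
n = 1000                           # sample size stated on the slide

# Design matrix X = [1, Uniform(0, 1)] and true coefficients beta = [1, 5]
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, size=n)])
beta = np.array([1.0, 5.0])

# Response with standard normal errors
y = X @ beta + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares fit on the perfectly linked (X, y) pairs
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

With perfectly linked pairs the fitted coefficients should land close to [1, 5], up to sampling noise.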

10 The three linkage scenarios
Probabilistic record linkage procedures (Fellegi and Sunter, 1969) were run with the software RELAIS (2011).
Gold scenario: Name, Surname, Complete date of birth
Silver scenario: Name, Surname, Year of birth
Bronze scenario: Day of birth, Month of birth, Address

Table 1 – Results of linkage procedures for the three scenarios
| Scenario | Declared matches | False matches in declared | Prob. missing true matches | Prob. false matches | False matches rate |
|---|---|---|---|---|---|
| Gold | 826 | 0 | 0.048 | 0 | 0 |
| Silver | 752 | 11 | 0.146 | 0.087 | 0.015 |
| Bronze | 786 | 30 | 0.129 | 0.236 | 0.038 |
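The three rates in Table 1 can be recomputed from the counts given on the slides. The denominators below are an inference, not something the slides state: they are chosen because they reproduce the published figures (868 true matches for the missed-match probability, 127 true un-matches for the false-match probability, and the number of declared matches for the false-match rate). The function and variable names are illustrative.

```python
TRUE_MATCHES = 868     # units in common between L1 and L2 (slide 9)
TRUE_UNMATCHES = 127   # true un-matches (slide 9)

# (declared matches, false matches among the declared) per scenario, from Table 1
scenarios = {"Gold": (826, 0), "Silver": (752, 11), "Bronze": (786, 30)}

def linkage_rates(declared, false_links):
    """Return (prob. missing true matches, prob. false matches, false matches rate)."""
    correct_links = declared - false_links
    prob_missed = (TRUE_MATCHES - correct_links) / TRUE_MATCHES
    prob_false = false_links / TRUE_UNMATCHES   # assumed denominator
    false_rate = false_links / declared
    return round(prob_missed, 3), round(prob_false, 3), round(false_rate, 3)

rates = {name: linkage_rates(*counts) for name, counts in scenarios.items()}
for name, values in rates.items():
    print(name, values)
```

For example, the Silver scenario has 752 − 11 = 741 correct links, so 868 − 741 = 127 true matches are missed, which gives 127 / 868 ≈ 0.146.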

11 Linear Model – Naïve estimator and linkage error bias adjusted estimators

| Linkage scenario | Estimator | Beta | Standard Error |
|---|---|---|---|
| Population | True Value | 0.886 - 5.155 | 0.064 - 0.112 |
| Perfect Linkage | Naïve | 0.907 - 5.093 | 0.069 - 0.121 |
| Gold Linkage | Naïve | 0.927 - 5.085 | 0.071 - 0.123 |
| Silver Linkage | Naïve | 0.988 - 4.976 | 0.079 - 0.138 |
| Silver Linkage | Ratio – ModOLS – Predictive | 0.952 - 5.050 | 0.080 - 0.141 |
| Silver Linkage | Eb_CUE | 0.949 - 5.055 | 0.080 - 0.141 |
| Bronze Linkage | Naïve | 1.045 - 4.876 | 0.078 - 0.135 |
| Bronze Linkage | Ratio – ModOLS – Predictive | 0.949 - 5.070 | 0.081 - 0.144 |
| Bronze Linkage | Eb_CUE | 0.947 - 5.075 | 0.081 - 0.144 |
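To make the idea of a bias-adjusted estimator concrete, here is a minimal sketch of one simple ratio-type correction under an exchangeable linkage error model. It is not necessarily any of the estimators reported in the table. The correct-link probability `lam`, the way mislinks are simulated, and all function names are assumptions made for illustration; the expected linkage matrix has `lam` on the diagonal and `(1 - lam) / (n - 1)` elsewhere.

```python
import numpy as np

def exchangeable_matrix(m, lam):
    """Expected linkage matrix: lam on the diagonal, (1 - lam)/(m - 1) off it."""
    gamma = (1.0 - lam) / (m - 1)
    return (lam - gamma) * np.eye(m) + gamma * np.ones((m, m))

def naive_ols(X, y_star):
    """OLS that ignores linkage errors."""
    return np.linalg.solve(X.T @ X, X.T @ y_star)

def ratio_adjusted(X, y_star, lam):
    """Ratio-type correction: replace X'X by X'AX, with A the expected linkage matrix."""
    A = exchangeable_matrix(len(y_star), lam)
    return np.linalg.solve(X.T @ A @ X, X.T @ y_star)

rng = np.random.default_rng(0)   # arbitrary seed
n, lam = 1000, 0.9               # lam is a hypothetical correct-link probability
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, 5.0]) + rng.normal(size=n)

# Simulate mislinks: roughly 10% of records have their responses shuffled among
# themselves (a shuffled record can land back on itself, so the realised
# correct-link rate is slightly above lam).
y_star = y.copy()
wrong = rng.random(n) > lam
y_star[wrong] = rng.permutation(y_star[wrong])

b_naive = naive_ols(X, y_star)
b_adj = ratio_adjusted(X, y_star, lam)
print("naive:   ", b_naive)
print("adjusted:", b_adj)
```

In this intercept-plus-one-covariate setting, the correction amounts to dividing the naïve slope by `lam - (1 - lam) / (n - 1)`, which undoes the attenuation toward zero caused by false links. Like the methods it imitates, the sketch takes `lam` as known and ignores missed matches.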

12 Logistic Model – Naïve and adjusted estimators

| Linkage scenario | Estimator | Beta | Standard Error |
|---|---|---|---|
| Population | True Value | -1.680 | 0.087 |
| Perfect Linkage | Naïve | -1.744 | 0.096 |
| Gold Linkage | Naïve | -1.762 | 0.100 |
| Silver Linkage | Naïve | -1.795 | 0.106 |
| Silver Linkage | Est. Equ. ML | -1.798 | 0.106 |
| Silver Linkage | LL | -1.803 | 0.107 |
| Silver Linkage | Est. Equ. Ch. | -1.817 | 0.107 |
| Bronze Linkage | Naïve | -1.734 | 0.101 |
| Bronze Linkage | Est. Equ. ML | -1.741 | 0.102 |
| Bronze Linkage | LL | -1.755 | 0.102 |
| Bronze Linkage | Est. Equ. Ch. | -1.789 | 0.104 |

13 Remarks
Missed matches are relevant: they must be accounted for to completely remove the effect of linkage errors on the bias of the estimates.
The naïve estimators under perfect linkage and under the Gold scenario are still biased because of missing true matches.
In the logistic regression, the naïve estimate is less biased under the Bronze scenario because the missed-matches component is lower there than in the other scenarios.
The bias correction is effective in the linear case (a bias reduction of about 10% for the Silver scenario, and higher for the Bronze one), but more work is needed for the logistic case, where the naïve estimator performs slightly better.

14 Future work
Further work to investigate the effect of linkage errors on the variability component.
Further analyses to assess the trade-off between bias adjustment and the expected increase in variance.
A more flexible framework, as in Chipperfield et al. (2011), where exchangeability of linkage errors is not required and missed matches are explicitly considered.
Finally, the probability of being correctly linked and the probability of missed matches are assumed here to be known, whereas evaluating linkage errors is not a straightforward task.

15 Bibliography
Biemer (2010). Total Survey Error: Design, Implementation, and Evaluation. Public Opinion Quarterly, Vol. 74, No. 5.
Chambers, R. (2009). Regression analysis of probability-linked data. Official Statistics Research Series, Vol. 4.
Chipperfield, J. O., Bishop, G. R. and Campbell, P. (2011). Maximum likelihood estimation for contingency tables and logistic regression with incorrectly linked data. Survey Methodology, Vol. 37, No. 1.
Fellegi, I. P. and Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64, 1183-1210.
Lahiri, P. and Larsen, M. D. (2000). Model based analysis of records linked using mixture models. Proc. of the Section on Survey Research Methods, ASA, 11-19.
Lahiri, P. and Larsen, M. D. (2005). Regression analysis with linked data. Journal of the American Statistical Association, 100, 222-230.
McLeod, Heasman and Forbes (2011). Simulated data for the on the job training, ESSnet DI. http://www.cros-portal.eu/content/job-training

16 Bibliography
Neter, J., Maynes, S. and Ramanathan, R. (1965). The effect of mismatching on the measurement of response errors. JASA.
RELAIS (2011). User’s guide version 2.2, available at http://joinup.ec.europa.eu/software/relais/release/22
Scheuren, F. and Winkler, W. E. (1993). Regression analysis of data files that are computer matched. Survey Methodology, 39-58.
Scheuren, F. and Winkler, W. E. (1997). Regression analysis of data files that are computer matched, part II. Survey Methodology, 157-165.
Winkler, W. E. (2014). Quality and Analysis of National Files - Computational Methods for Censuses and Surveys. Presentation, January 9, 2014.
Zhang, L.-C. (2012). Topics of statistical theory for register-based statistics and data integration. Statistica Neerlandica, 66.

