WP 9 Assessing Disclosure Risk in Microdata using Record Level Measures Natalie Shlomo University of Southampton Office for National Statistics


1 WP 9: Assessing Disclosure Risk in Microdata using Record Level Measures. Natalie Shlomo (University of Southampton; Office for National Statistics), Chris Skinner (University of Southampton)

2 Disclosure Risk Assessment for Microdata. Assume: sample; categorical key variables; no measurement error. Seek: record level risk measures, aggregated to file level measures.

3 Record Level Measures. Each record has a combination of key variable values, a sample count of records with the same combination, and a population count of records with the same combination. Only sample unique records are considered, i.e. records whose sample count is 1. Measures for each such record: Pr(population unique) and Pr(correct match).
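One common way to write these two measures is sketched below. The notation is an assumption rather than something given in the transcript: $f_k$ and $F_k$ denote the sample and population counts in cell $k$ of the table formed by cross-classifying the key variables, and $r_{1k}$, $r_{2k}$ are labels chosen here for illustration. The second line further assumes that an intruder matches a sample unique record to one of the $F_k$ population units in its cell at random.

```latex
\begin{aligned}
r_{1k} &= \Pr(F_k = 1 \mid f_k = 1) && \text{(population unique)} \\
r_{2k} &= E\!\left( 1/F_k \mid f_k = 1 \right) && \text{(correct match)}
\end{aligned}
```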

4 Aggregated File-level Measures. Expected number of population uniques in the sample. Expected number of correct matches among sample uniques to the population. Note: both are aggregated over the sample uniques.
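When the full population is available, as in a census-based evaluation, the realised counterparts of these two file-level measures can be counted directly. The sketch below is illustrative only: the function name, the toy key combinations, and the names `tau1` and `tau2` are hypothetical, and it computes observed values rather than model-based expectations.

```python
from collections import Counter


def file_level_measures(population_keys, sample_keys):
    """Return (tau1, tau2) computed from known population and sample keys.

    tau1: number of sample unique records that are also population unique.
    tau2: sum of 1 / (population count) over sample unique records.
    """
    pop_counts = Counter(population_keys)   # population count per key combination
    samp_counts = Counter(sample_keys)      # sample count per key combination
    sample_uniques = [k for k, c in samp_counts.items() if c == 1]
    tau1 = sum(1 for k in sample_uniques if pop_counts[k] == 1)
    tau2 = sum(1.0 / pop_counts[k] for k in sample_uniques)
    return tau1, tau2


# Toy data: each tuple is one record's (sex, age) combination.
population = [("M", 30), ("M", 30), ("F", 41), ("F", 52),
              ("F", 52), ("F", 52), ("M", 67)]
sample = [("M", 30), ("F", 41), ("F", 52), ("F", 52)]
tau1, tau2 = file_level_measures(population, sample)
```

In this toy example the sample uniques are ("M", 30) and ("F", 41); only the second is unique in the population.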

5 Estimation Problem. To make inference about: the record level measures for each sample unique record; the file level measures.

6 Log-linear Model. The model is stated in three parts, including a conditional independence assumption and the sampling fraction. Parameters are estimated by maximum likelihood.
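As an illustration of how fitted cell means translate into risk estimates, the sketch below assumes a standard Poisson setup that is not spelled out in the transcript: population counts are independent Poisson with mean `lam` per cell, and each unit is sampled independently with probability `pi`. Under those assumptions the unsampled count in a cell is Poisson with mean `(1 - pi) * lam`, independent of the sample count, which gives closed forms for both record level measures. The function names are hypothetical.

```python
import math


def record_risks(lam, pi):
    """Record level risks for a sample unique cell with Poisson mean lam.

    Assumes the unsampled count in the cell is Poisson with mean
    mu = (1 - pi) * lam, independent of the sample count.
    Returns (Pr(population unique), Pr(correct match)).
    """
    mu = (1.0 - pi) * lam
    pr_unique = math.exp(-mu)                 # Pr(no unsampled units in the cell)
    if mu == 0.0:
        pr_match = 1.0                        # limit of (1 - exp(-mu)) / mu as mu -> 0
    else:
        pr_match = -math.expm1(-mu) / mu      # E[1 / (1 + X)] for X ~ Poisson(mu)
    return pr_unique, pr_match


def file_risks(lams, pi):
    """Sum the record level risks over the sample unique cells."""
    risks = [record_risks(lam, pi) for lam in lams]
    return sum(r[0] for r in risks), sum(r[1] for r in risks)


# Hypothetical fitted means for three sample unique cells, sampling fraction 0.5.
tau1_hat, tau2_hat = file_risks([2.0, 0.4, 10.0], pi=0.5)
```

In practice `lams` would be fitted values from the chosen log-linear model, evaluated at the sample unique cells only.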

7 Some Literature. Skinner and Holmes (1998, JOS): good properties of the risk estimate under the all two-way interactions log-linear model with an additional term. Elamir and Skinner (2006, JOS): good properties of both risk estimates under the all two-way interactions model, with no need for the additional term.

8 Model Sensitivity. The all two-way interactions model performs well, but there is still evidence of some model-dependence of the risk estimates in the neighbourhood of this model. Estimated risk tends to decrease as model complexity increases.

9 Model Choice. Goodness of fit tests? Pearson? Likelihood ratio? AIC, BIC? Problems with very large and sparse tables.
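For reference, the criteria named on this slide can be computed from observed and fitted cell counts as in the minimal sketch below. It assumes a Poisson likelihood for the cell counts, and the choice of the number of cells as the sample size in BIC is an assumption made here. With very sparse tables most fitted counts are far below 1, so the usual chi-square reference distributions for the Pearson and likelihood ratio statistics become unreliable.

```python
import math


def poisson_fit_stats(observed, fitted, n_params):
    """Pearson, likelihood ratio (deviance), AIC and BIC for Poisson cell counts."""
    pairs = list(zip(observed, fitted))
    pearson = sum((o - m) ** 2 / m for o, m in pairs)
    # Likelihood ratio statistic; the term o * log(o / m) is taken as 0 when o == 0.
    deviance = 2.0 * sum(
        (o * math.log(o / m) if o > 0 else 0.0) - (o - m) for o, m in pairs
    )
    loglik = sum(o * math.log(m) - m - math.lgamma(o + 1) for o, m in pairs)
    aic = 2 * n_params - 2 * loglik
    # Assumption: BIC penalises by log(number of cells).
    bic = n_params * math.log(len(pairs)) - 2 * loglik
    return {"pearson": pearson, "deviance": deviance, "aic": aic, "bic": bic}


# Toy table of four cells with equal fitted means.
stats = poisson_fit_stats(observed=[3, 0, 1, 4], fitted=[2.0, 2.0, 2.0, 2.0], n_params=1)
```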

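10 Bias Criterion. Allow for small departures from the log-linear model. Estimate the bias of each risk estimate under such departures. Choose the model to minimise the estimated bias. This is similar to choosing the model to minimise over- or under-dispersion.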

11 Minimising Over- (Under-) Dispersion. For each model, one statistic estimates the degree of over- or under-dispersion and a second tests the hypothesis of equal dispersion (Cameron and Trivedi, 1998).
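The transcript does not show which form of the Cameron and Trivedi statistic was used. One widely used version is their regression-based test, sketched below as an assumption: it regresses `((y - mu)^2 - y) / mu` on `g(mu) / mu` without an intercept, where `g(mu) = mu^2` (the "nb2" form) or `g(mu) = mu` (the "nb1" form). The slope estimates the dispersion parameter and its t statistic tests equal dispersion. The function name and toy data are hypothetical.

```python
import math


def cameron_trivedi_test(y, mu, form="nb2"):
    """Regression-based dispersion test on counts y with fitted means mu.

    Returns (alpha, t_stat). alpha > 0 suggests over-dispersion,
    alpha < 0 under-dispersion, relative to the Poisson model.
    """
    n = len(y)
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    # Regressor g(mu) / mu: mu for the nb2 form, 1 for the nb1 form.
    x = [mi if form == "nb2" else 1.0 for mi in mu]
    sxx = sum(xi * xi for xi in x)
    alpha = sum(xi * zi for xi, zi in zip(x, z)) / sxx   # no-intercept OLS slope
    resid = [zi - alpha * xi for xi, zi in zip(x, z)]
    s2 = sum(r * r for r in resid) / (n - 1)             # residual variance
    se = math.sqrt(s2 / sxx)
    t_stat = alpha / se if se > 0 else float("nan")
    return alpha, t_stat


# Toy counts that vary more than a Poisson with mean 2.125 would suggest.
over_alpha, over_t = cameron_trivedi_test(
    [0, 1, 0, 5, 0, 2, 9, 0], [2.125] * 8, form="nb1"
)
# Toy counts that vary less than a Poisson with mean 2 would suggest.
under_alpha, under_t = cameron_trivedi_test(
    [2, 2, 3, 1, 2, 2], [2.0] * 6, form="nb1"
)
```

A model whose statistic is close to zero is approximately Poisson-dispersed, which is the property the slide uses as a selection criterion.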

12 Samples from 2001 UK Census. Two areas with a population of 944,793. ‘Large’ key: Area (2), Sex (2), Age (101), Marital Status (6), Ethnicity (17), Economic Activity (10); 412,080 cells. ‘Small’ key: same except Age (18); 73,440 cells.
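The cell counts quoted on this slide are the products of the category counts of the key variables, which the short check below reproduces. The average population per cell is added here only as a rough indication of how sparse the two tables are; the variable names are illustrative.

```python
from math import prod

# Number of categories per key variable, as listed on the slide.
large_key = {
    "Area": 2, "Sex": 2, "Age": 101,
    "Marital Status": 6, "Ethnicity": 17, "Economic Activity": 10,
}
small_key = {**large_key, "Age": 18}   # same key with age grouped into 18 bands

large_cells = prod(large_key.values())
small_cells = prod(small_key.values())

population = 944_793
avg_per_cell_large = population / large_cells
avg_per_cell_small = population / small_cells
```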

13 Small key, simple random sample of size 18,896. True values: number of population uniques in sample; sum of correct-match probabilities over sample uniques. Table of estimates and Cameron-Trivedi statistics by model (Independence; All 2-way); values not available.

14 Large key, simple random sample of size 4,724. True values, with a table of estimates and Cameron-Trivedi statistics by model (Independence; All 2-way); values not available.

15 Model Search Algorithm. Starting solution: all 2-way interactions log-linear model. Search by: removing terms; adding terms; swapping terms. TABU method of Drezner, Marcoulides and Salhi (1999).
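The three move types can be illustrated with a generic tabu search over sets of model terms. This is a minimal sketch under stated assumptions, not the specific procedure of Drezner, Marcoulides and Salhi: the `score` function is a hypothetical stand-in for whatever criterion is minimised (for example the absolute value of a dispersion statistic from a fitted model), and the tabu list simply stores recently visited models.

```python
def tabu_search(all_terms, start, score, n_iter=50, tabu_len=5):
    """Search over sets of model terms, minimising score(model).

    Moves: remove one term, add one term, or swap one term for another.
    Recently visited models are tabu and cannot be revisited immediately.
    """
    current = frozenset(start)
    best, best_score = current, score(current)
    tabu = [current]
    for _ in range(n_iter):
        outside = [t for t in all_terms if t not in current]
        neighbours = [current - {t} for t in current]              # remove a term
        neighbours += [current | {t} for t in outside]             # add a term
        neighbours += [(current - {t_out}) | {t_in}                # swap terms
                       for t_out in current for t_in in outside]
        candidates = [m for m in neighbours if m not in tabu]
        if not candidates:
            break
        # Move to the best non-tabu neighbour, even if it is worse than current.
        current = min(candidates, key=score)
        tabu.append(current)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        current_score = score(current)
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score


# Toy example: the score counts how many terms differ from a hypothetical target.
terms = ["ea*s", "s*a", "a*m", "m*et", "et*ec"]
target = frozenset({"s*a", "a*m"})
best_model, best_value = tabu_search(terms, terms, lambda m: len(m ^ target))
```

In a real application each call to `score` would refit a log-linear model, so caching scores for visited models would matter far more than it does in this toy example.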

16 Large key, simple random sample of size 9,448. True values, with estimates by model (values not available). Models: Independent; All 2-way; Drop {ea*s}; Drop {ea*a}; Drop {ea*m}; Drop {ea*et}; Drop {ea*ec}; Drop {s*ec}; Drop {a*m}; Drop {a*et}; Drop {a*ec}; Drop {m*et}; Drop {m*ec}; Drop {et*ec}.

17 True values, with estimates by model (values not available). Models: Drop {et*ec}; Drop {ea*s}{et*ec}; Drop {ea*a}{et*ec}; Drop {ea*et}{et*ec}; Drop {s*a}{et*ec}; Drop {s*m}{et*ec}; Drop {s*et}{et*ec}; Drop {s*ec}{et*ec}; Drop {a*et}{et*ec}; Drop {m*et}{et*ec}; Drop {m*ec}{et*ec}; In {ea}, Out {et*ec}{ea*s}{ea*a}{ea*m}{ea*et}{ea*ec}; In {s}, Out {et*ec}{ea*s}{s*a}{s*m}{s*et}{s*ec}.

18 Record Level Risk Measures. Preferred model: {ea}{s*a}{s*m}{s*et}{s*ec}{a*m}{a*et}{a*ec}{m*et}{m*ec}. True global risk compared with estimated global risk (values not available).

19 Record Level Risk Measures. Preferred model: {ea}{s*a}{s*m}{s*et}{s*ec}{a*m}{a*et}{a*ec}{m*et}{m*ec}. True global risk compared with estimated global risk. Cross-tabulation of true record level risk measures against estimated record level risk measures, in bands between 0 and 1 with row and column totals (cell values not available).

20 Conclusions. Model selection by assessing over- and under-dispersion. Similar risk estimates for models with nearly Poisson dispersion. Further work: stratification of files; complex survey designs.

