
Www.ihsn.org Geoffrey Greenwell, IHSN/PARIS21 IASSIST Conference Tampere, Finland, May 2009 Development of Microdata Anonymization Tools by the Olivier.




1 www.ihsn.org
Development of Microdata Anonymization Tools by the
Geoffrey Greenwell, IHSN/PARIS21
Olivier Dupriez, World Bank
Francois Fonteneau, IHSN/P21
Mark McConaghy, DFID
IASSIST Conference, Tampere, Finland, May 2009

2 About IHSN
International Household Survey Network
A network of international agencies
Based in Paris at the OECD, at PARIS21
A coordinating mechanism to:
– Improve quality and use of household survey data in developing countries
– Harmonize international recommendations for survey design, data analysis, etc.
– Produce and disseminate international good practices
– …

3 Accelerated Data Program
Implementing the IHSN tools in the countries
Technical and financial support to establish national data archives (in > 50 countries)
Many datasets documented (DDI)
Access to data by researchers has improved, but is not yet satisfactory. Demand can be measured through the NADA.
The need to anonymize data remains the most frequently expressed concern and obstacle to data access.
The ADP has provided some guidance, but ADP countries lack simple and intuitive tools and guidelines.

4 ADP/IHSN in the world
[Map. Legend: ADP country; Expected ADP in 2009; By partners]

5 www.ihsn.org Setting up Catalogs

6 Focus Nigeria
Effects of data availability on MDG 7: halving the proportion of the population without sustainable access to safe drinking water.
Providing robust estimates to inform policy makers and sector monitoring.
Water and Sanitation Sector. Workshop with WHO/UNICEF.

7 www.ihsn.org Effects of Data Availability Nigeria and the MDG: Rural access to improved water source

8 Resistance in the countries
Nigeria:
– Statistics law: the Statistical Act of 2007 obliges microdata release after due anonymization, so the legal framework exists.
– Willing institution (the NBS in Nigeria).
– However, current anonymization strategies are limited to the removal of direct identifiers.
Other countries are unable to articulate a proper policy for dissemination and tend to use confidentiality as a barrier that masks political resistance or inertia.
IHSN anonymization tools will be a way to deal with both real ethical concerns and political resistance.

9 Better use of survey data
Lots of survey data remain under-exploited because they are not accessible to researchers/users.
Obstacles:
– Technical
– Psychological
– Financial → Support by many sponsors
– Legal
– Ethical
– Political → … ? …
IHSN Dissemination Policy Guidelines
Missing piece: SDC tools
→ IHSN data documentation and cataloguing tools and guidelines

10 Anonymize: Process
Direct identifiers are variables such as names, addresses, or identity card numbers. They permit direct identification of a respondent but are not needed for statistical or research purposes, and should thus be removed from the published dataset.
Indirect identifiers are characteristics that may be shared by several respondents, and whose combination could lead to the re-identification of one of them. For example, the combination of variables such as district of residence, age, sex, and profession would be identifying if only one individual of that particular sex, age and profession lived in that particular district. Such variables are needed for statistical purposes, and should thus not be removed from the published data files.
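The district/age/sex/profession example above can be illustrated with a small sketch. It is not taken from the IHSN plug-ins; the function name `sample_frequencies` and the toy records are hypothetical. It counts how many records share each combination of indirect identifiers and flags the combinations that occur only once in the file.

```python
from collections import Counter

def sample_frequencies(records, keys):
    """For each record, count the records sharing its combination of key values."""
    combos = [tuple(rec[k] for k in keys) for rec in records]
    counts = Counter(combos)
    return [counts[c] for c in combos]

# Toy data (hypothetical): direct identifiers are assumed to be removed already.
records = [
    {"district": "A", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "A", "age": 34, "sex": "F", "profession": "teacher"},
    {"district": "A", "age": 61, "sex": "M", "profession": "surgeon"},
    {"district": "B", "age": 34, "sex": "F", "profession": "teacher"},
]
keys = ["district", "age", "sex", "profession"]

freqs = sample_frequencies(records, keys)
# Records whose combination is unique in the file are the risky ones.
unique_rows = [i for i, f in enumerate(freqs) if f == 1]
print(freqs, unique_rows)
```

A record that is unique in the sample is not necessarily unique in the population, which is why the risk measures listed later in the deck also take sampling weights into account.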

11 Defining the problem
Once all direct identifiers have been removed, a disclosure problem can still remain: the indirect identifiers have to be dealt with.
The IHSN anonymization tools will approach this problem by building on a great deal of technical work undertaken by experts in the field.
The IHSN hosted an expert meeting in October 2008 to present its tools and acknowledges the work done by:
– University of Manchester
– ISTAT (Italian Statistics)
– Cornell University
– ICPSR

12 Developing SDC tools
Building on existing work
Not an integrated software package
A collection of specialized tools for:
– Measuring the risk
– Reducing the risk
– Assessing the information loss
12 plug-ins developed in C++ that interface with SPSS, STATA or a direct server (Windows/Linux).
Need to be thoroughly tested.

13 12 Plug-ins
1. The μ-argus risk for weighted sample
2. Re-identification rate to individual risk threshold
3. Individual risk to household risk
4. L-diversity for unweighted data
5. SUDA2: DIS-sample data
6. Kanon: Micro-aggregation
7. Local recoding
8. Fixed length micro aggregation
9. Noise Addition
10. Pram: Post Randomization
11. Rank Swapping
12. Sampling
[Diagram labels: Risk Measures & Intruder Scenarios (What does the intruder know? What does the intruder want?); Risk Reduction]
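The slides do not say how plug-in 3 converts individual risk into household risk. One common convention, sketched here under the assumption that re-identification events within a household are independent, takes the household risk as the probability that at least one member is re-identified, and assigns that value to every member. The field names `hh_id` and `risk` are hypothetical.

```python
from collections import defaultdict

def household_risk(individual_risks):
    """Probability that at least one member is re-identified (independence assumed)."""
    p_none = 1.0
    for r in individual_risks:
        p_none *= 1.0 - r
    return 1.0 - p_none

def assign_household_risk(records):
    """Return, for each record, the risk of the household it belongs to."""
    by_household = defaultdict(list)
    for rec in records:
        by_household[rec["hh_id"]].append(rec["risk"])
    hh_risk = {h: household_risk(rs) for h, rs in by_household.items()}
    return [hh_risk[rec["hh_id"]] for rec in records]

# Toy data (hypothetical individual risks).
records = [
    {"hh_id": 1, "risk": 0.5},
    {"hh_id": 1, "risk": 0.5},
    {"hh_id": 2, "risk": 0.25},
]
risks = assign_household_risk(records)
print(risks)
```

Because identifying one member usually identifies the whole household, the household value is never lower than the largest individual risk in that household.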

14 Measuring Disclosure Risk
Based on CENEX Handbook on Statistical Disclosure Control, Version 1.01
[Diagram. Labels in transcript order: Individual risk methodology; Poisson model; Individual; Hierarchical; K-anonymity; l-diversity; t-completeness; SUDA; Record linkage; Distance-based; Probabilistic; Others]
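Two of the measures named on this slide, k-anonymity and l-diversity, are simple enough to sketch. The example below is an illustration, not the plug-in code: it uses the simplest ("distinct") form of l-diversity, and the function name `k_and_l` and the toy records are hypothetical. A file is k-anonymous if every combination of indirect identifiers is shared by at least k records, and distinct l-diverse if every such group contains at least l different values of the sensitive variable.

```python
from collections import defaultdict

def k_and_l(records, quasi_ids, sensitive):
    """Return (k, l): smallest group size and smallest number of distinct
    sensitive values over all groups defined by the quasi-identifiers."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_ids)].append(rec[sensitive])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l

# Toy data (hypothetical).
records = [
    {"district": "A", "age_group": "30-39", "illness": "flu"},
    {"district": "A", "age_group": "30-39", "illness": "asthma"},
    {"district": "B", "age_group": "30-39", "illness": "flu"},
    {"district": "B", "age_group": "30-39", "illness": "flu"},
    {"district": "B", "age_group": "30-39", "illness": "flu"},
]
k, l = k_and_l(records, ["district", "age_group"], "illness")
print(k, l)
```

In this toy file every group has at least two records, yet everyone in district B has the same illness, so knowing that a person is in that group reveals the sensitive value without re-identifying anyone. This is the gap that l-diversity is meant to close.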

15 Reducing risk disclosure
Based on CENEX Handbook on Statistical Disclosure Control, Version 1.01
[Diagram. Labels in transcript order: Masking data; Synthetic data file; Perturbative; Sampling; Global recoding; Top/bottom coding; Local suppression; Non perturbative; MASCC; Fixed/variable group; Uni-/Multivariate; Uncorrelated; Correlated; Non-linear; Noise addition; Multiplicative noise; Micro-aggregation; Data swapping; Rank swapping; Rounding; Resampling; PRAM; Local recoding]
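As an illustration of two methods named on this slide, the sketch below shows top coding (a non-perturbative method) and univariate fixed-size micro-aggregation (a perturbative one). It is a minimal version written for this transcript, not the plug-in implementation; the function names and the income values are hypothetical, and a short trailing group is merged into the previous one so that no group has fewer than k records.

```python
def top_code(values, cap):
    """Replace every value above cap by cap."""
    return [min(v, cap) for v in values]

def microaggregate(values, k):
    """Sort the values, form groups of k neighbours, and replace each value
    by the mean of its group."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    starts = list(range(0, n, k))
    # Merge a trailing group smaller than k into the group before it.
    if len(starts) > 1 and n - starts[-1] < k:
        starts.pop()
    out = [0.0] * n
    for j, start in enumerate(starts):
        end = starts[j + 1] if j + 1 < len(starts) else n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out

incomes = [10, 200, 30, 40, 1000, 20, 50]  # hypothetical values
capped = top_code(incomes, 100)
aggregated = microaggregate(incomes, 3)
print(capped)
print(aggregated)
```

Micro-aggregation keeps the overall mean of the variable unchanged, while top coding lowers it. Differences of this kind are what the information-loss measures are designed to quantify.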

16 Measuring Information Loss
Based on CENEX Handbook on Statistical Disclosure Control, Version 1.01
Categorical data:
– Direct comparison
– Comparison of contingency tables
– Entropy-based measures
Continuous data:
– Mean square error
– Mean absolute error
– Mean variation
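The continuous-data measures and the direct comparison for categorical data can be sketched as follows. These are common textbook definitions written for this transcript; the exact formulas in the handbook and in the plug-ins may differ, and the function names and data are hypothetical. Mean variation is taken here as the average relative change, which assumes that no original value is zero.

```python
def mean_square_error(orig, masked):
    return sum((o - m) ** 2 for o, m in zip(orig, masked)) / len(orig)

def mean_absolute_error(orig, masked):
    return sum(abs(o - m) for o, m in zip(orig, masked)) / len(orig)

def mean_variation(orig, masked):
    # Average relative change; assumes every original value is non-zero.
    return sum(abs(o - m) / abs(o) for o, m in zip(orig, masked)) / len(orig)

def share_changed(orig, masked):
    """Direct comparison for categorical data: share of values that differ."""
    return sum(o != m for o, m in zip(orig, masked)) / len(orig)

# Hypothetical original and anonymized values.
income = [10, 20, 30, 40]
income_masked = [20, 20, 20, 40]
region = ["a", "b", "c", "a"]
region_masked = ["a", "b", "a", "a"]

mse = mean_square_error(income, income_masked)
mae = mean_absolute_error(income, income_masked)
mv = mean_variation(income, income_masked)
changed = share_changed(region, region_masked)
print(mse, mae, mv, changed)
```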

17 Developing SDC tools: Proposal
In Stata (SPSS, SAS) using C++ plugins
– Stata version 9 or later
– Log file for easy replication of procedure
– Informative output
Or command-line (plugins with “data server”)
Why Stata (SPSS/SAS)?
– Because most countries use/know these packages
– Can use all tabulation and analysis functions

18 Beta Interface

19 Target use
Large, imperfect datasets in under-resourced countries
For use by official data producers in developing countries (IHSN objective)
Relevant for other users as well
Free to all; public source code

20 Work Program for 2009
Testing, “calibrating” and documenting
– Cornell + IHSN + selected countries
Development/implementation of training and TA program
– Detailed documentation and guidelines
– Reference manual and training materials
Possibly launched before end of the year (IHSN website)
Participation of others welcome

21 Adding to the tools to facilitate data access in developing countries:
– Tools: Metadata Editor; CDROM/HTML developer; Web Based National Data Archives; Question Bank
– Guidelines: Data Dissemination; Documentation Guide; Survey Quality Assessment Framework

22 www.ihsn.org Thank you. The End

