Statistical Disclosure Control (SDC) for 2011 Census Progress Update Keith Spicer – ONS SDC Methodology 23 April 2009.



CONTENTS
- 2011 Census: Context; Progress
- Tabular outputs: Short-listed methods; Risk Utility Framework and measures; Registrars General statements
- Microdata: Reflection on 2001 use of SDC; Issues arising

2011 Census - Context
- SDC for 2011 Census outputs is a major concern for users
- Different SDC methodologies were adopted for tabular 2001 Census outputs across the UK
- Late addition of small cell adjustment by ONS/NISRA resulted in a high level of user confusion and dissatisfaction
- Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

Progress
Development of SDC Strategy:
- UK SDC working group, consisting of representatives from Wales, Northern Ireland and Scotland, established to take forward methodological work
- UKCDMAC subgroup set up to QA the work
Methodological research:
- Determine the short-list of SDC methods (Aug 07)
- Quantitative evaluation of the short-list (continuing)

Short-listed methods
PRE-TABULAR:
- Record swapping
- Over-imputation
POST-TABULAR:
- IACP (Invariant ABS Cell Perturbation)
Using 2001 Census tables to assess SDC methods

Record Swapping
Record in Area A: Age: 22, Sex: Male, Marital Status: Married, No. of Cars: 3, Region: Area A. Unique as the only person with 3 cars in Area A.
Record in Area B: Age: 22, Sex: Male, Marital Status: Married, No. of Cars: 1, Region: Area B. Matches all variables except No. of Cars.
Treatment:
- Find a different geographical area
- Identify another individual in a different area with virtually all the same characteristics
- Swap the records
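The swap on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `Person` fields, the `match_key` rule (match on age, sex and marital status, ignoring number of cars) and the "first match wins" choice are all hypothetical, since the slides do not specify how records are targeted or matched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Person:
    age: int
    sex: str
    marital: str
    cars: int
    area: str


def match_key(p):
    # Hypothetical matching rule: all variables except cars and area.
    return (p.age, p.sex, p.marital)


def swap_with_match(records, target_index):
    """Swap the area of records[target_index] with the first record from a
    different area that shares its match key. Returns a new list; if no
    partner is found, the records are returned unchanged."""
    target = records[target_index]
    for i, other in enumerate(records):
        if (i != target_index
                and other.area != target.area
                and match_key(other) == match_key(target)):
            out = list(records)
            out[target_index] = replace(target, area=other.area)
            out[i] = replace(other, area=target.area)
            return out
    return list(records)


records = [
    Person(22, "Male", "Married", 3, "A"),   # only person with 3 cars in Area A
    Person(40, "Female", "Single", 1, "A"),
    Person(22, "Male", "Married", 1, "B"),   # matches except for cars
]
swapped = swap_with_match(records, 0)
```

After the swap the 3-car record is reported in Area B and its partner in Area A, so a table for Area A no longer shows the unique combination with certainty.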

Over-Imputation
Example records:
- 25, male, single, 6 people in hhld, 0 cars, student
- 21, male, single, 6 people in hhld, 0 cars, student
Steps:
- Select set of records to be protected, either random or targeted
- Blank out age from record
- Find a donor to impute age: distance-based nearest neighbour, using a set of matching variables
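A minimal sketch of the over-imputation steps, assuming records held as dictionaries and a simple mismatch count as the distance. Both are hypothetical choices, as is treating the 25-year-old as the protected record and the 21-year-old as the donor; the slides do not state the distance function or which example record plays which role.

```python
def mismatches(a, b, keys):
    # Hypothetical distance: number of matching variables on which a and b differ.
    return sum(a[k] != b[k] for k in keys)


def over_impute(records, target_index, variable, matching_keys):
    """Blank `variable` on the target record and fill it from the nearest
    donor (fewest mismatches on matching_keys). Returns a new record."""
    target = records[target_index]
    donors = [r for i, r in enumerate(records) if i != target_index]
    donor = min(donors, key=lambda r: mismatches(target, r, matching_keys))
    protected = dict(target)
    protected[variable] = donor[variable]
    return protected


records = [
    {"age": 25, "sex": "male", "marital": "single", "hhld": 6, "cars": 0, "status": "student"},
    {"age": 60, "sex": "female", "marital": "married", "hhld": 2, "cars": 2, "status": "retired"},
    {"age": 21, "sex": "male", "marital": "single", "hhld": 6, "cars": 0, "status": "student"},
]
keys = ["sex", "marital", "hhld", "cars", "status"]
protected = over_impute(records, 0, "age", keys)
```

Here the donor is the record that agrees on every matching variable, so the protected record keeps its other characteristics and takes the donor's age.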

Invariant ABS Cell Perturbation (IACP) Method
- Based on a method developed by the Australian Bureau of Statistics (ABS)
- Perturb each cell value in a table to create uncertainty around the true value
- This new post-tabular method preserves consistency: the same cell has the same value in every table in which it appears. However, small inconsistencies arise when cells are broken down further
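The consistency property can be illustrated with a small sketch. It is not the ABS or ONS algorithm: the slides give no mechanics, so the per-record random keys, the `NOISE_TABLE` values and the modulus below are all assumptions chosen only to show how a perturbation derived from the records in a cell gives the same answer wherever that cell appears.

```python
import random

# Hypothetical noise lookup; a real scheme would depend on the cell size.
NOISE_TABLE = [-1, 0, 0, 1]


def assign_record_keys(n_records, seed=2011):
    # One fixed random key per record, assigned once before tabulation.
    rng = random.Random(seed)
    return [rng.randrange(2**32) for _ in range(n_records)]


def perturb_cell(record_ids, record_keys):
    """Perturb the count of a cell made up of the given records.
    The noise depends only on which records are in the cell."""
    count = len(record_ids)
    if count == 0:
        return 0  # empty cells are left at zero in this sketch
    cell_key = sum(record_keys[i] for i in record_ids) % 2**32
    noise = NOISE_TABLE[cell_key % len(NOISE_TABLE)]
    return max(0, count + noise)


keys = assign_record_keys(10)
cell_in_table_1 = perturb_cell([0, 3, 7], keys)
cell_in_table_2 = perturb_cell([7, 0, 3], keys)  # same records, another table
```

Because sub-cells are perturbed from their own keys, their perturbed values need not add up to the perturbed parent cell, which is one way the small inconsistencies mentioned on the slide can arise.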

Risk Utility Framework
- Minimising risk of disclosure is important (in fact probably the most important aspect of SDC)
- But so is maintaining utility of data...

The Statistical Disclosure Control Problem
[Figure: risk-utility map. Axes: Disclosure Risk (information about confidential units) and Data Utility (information about legitimate items). Marked on the map: Original Data, Released Data, No data, and the Maximum Tolerable Risk threshold.]

Risk and Utility Measures
Risk measures (original v protected):
- Attribute disclosure - % protected
- Group disclosure
- Within group disclosure
- Negative attribute disclosure
- % of zeros left unchanged
- Identity disclosure - % small cells unperturbed
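Two of the risk measures listed above are simple to compute from a pair of tables. The sketch below assumes tables flattened to equal-length lists of counts and treats counts of 1 and 2 as "small cells"; that threshold and the example counts are hypothetical, not taken from the slides.

```python
def pct_unchanged(original, protected, is_target):
    """Percentage of cells selected by is_target(original count) whose
    protected count equals the original count."""
    pairs = [(o, p) for o, p in zip(original, protected) if is_target(o)]
    if not pairs:
        return 0.0
    return 100.0 * sum(o == p for o, p in pairs) / len(pairs)


def pct_small_cells_unperturbed(original, protected, small=(1, 2)):
    # Assumption: "small cells" means counts of 1 or 2.
    return pct_unchanged(original, protected, lambda o: o in small)


def pct_zeros_unchanged(original, protected):
    return pct_unchanged(original, protected, lambda o: o == 0)


# Hypothetical cell counts before and after protection.
original = [0, 1, 2, 5, 0, 1, 12, 0]
protected = [0, 0, 2, 5, 1, 2, 12, 0]

small_risk = pct_small_cells_unperturbed(original, protected)
zero_risk = pct_zeros_unchanged(original, protected)
```

In this example one of the three small cells and two of the three zeros survive unchanged, so a higher value on either measure indicates less protection.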

Risk and Utility Measures
Utility measures (original v protected table):
- Ratio of variances across variables
- Association between variables: Cramér's V
- Hellinger distance metric
- Absolute deviation: relative & absolute
- Impact on totals & sub-totals
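Three of these utility measures can be sketched with the standard library alone. The formulas are the textbook ones, and the choices below are assumptions rather than the evaluation actually used: the Hellinger distance is computed on cell proportions, the absolute deviation is averaged per cell, and every row and column margin is assumed to be non-zero for Cramér's V.

```python
import math


def hellinger(original, protected):
    # Hellinger distance between the two tables' cell proportions (0 = identical).
    total_o, total_p = sum(original), sum(protected)
    squared = sum((math.sqrt(o / total_o) - math.sqrt(p / total_p)) ** 2
                  for o, p in zip(original, protected))
    return math.sqrt(squared / 2)


def average_absolute_deviation(original, protected):
    # Mean absolute change per cell.
    return sum(abs(o - p) for o, p in zip(original, protected)) / len(original)


def cramers_v(table):
    # Cramér's V for a two-way table given as a list of rows.
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * min(len(table) - 1, len(table[0]) - 1)))


# Hypothetical 2x2 table before and after protection.
before = [[10, 0], [0, 10]]
after = [[9, 1], [1, 9]]
v_before, v_after = cramers_v(before), cramers_v(after)
```

Comparing `v_before` with `v_after` shows how much of the association between the two variables survives protection; the other two functions take the same tables flattened to lists of cell counts.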

Registrars General statements
- Commitment to aim for common UK SDC methodology
- Small counts could be included in publicly disseminated tables provided that:
  - there is sufficient uncertainty that the count is the true value
  - creating that uncertainty does not significantly damage the data
- Key risk for 2011 output is attribute disclosure
- Their preference is for a pre-tabular method

SDC for Tabular Outputs: Next steps
- Intention to go to UKCC in July 2009 with broad strategy
- Additional work on level of protection necessary

Microdata: reflection on 2001 use of SDC
Ind L SAR | SAM | SL-HSAR | CAMS
PRAM | PRAM (more) | Some PRAM | -
Recode | Recode (more) | Some Recode | ,157+
GOR | LA | E&W combined | LA
3% indiv | 5% indiv | 1% hhold | 3%, 1%
Access: EUL, SL, VML

Microdata: Issues arising I
- Protection through either access (CAMS), data perturbation (EUL samples) or a bit of both (SL-HSAR)
- PRAM involved post-randomisation of variables using a transition probability matrix; most values were perturbed, if at all, by one or two categories; the goal was to treat sample uniques that are also population uniques
- How much protection is offered by EUL, SDS, VML?
- Onus on researchers to comply with conditions as well as on ONS to provide access
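The PRAM step described above can be sketched as a draw from a transition probability matrix. The age bands, the matrix values and the fixed seed are hypothetical; the matrix is chosen so that a value either stays put or moves by one category, which is narrower than the "one or two categories" reported on the slide.

```python
import random


def pram(values, categories, transition, seed=0):
    """Replace each value by a category drawn from its row of the
    transition matrix (row = original category, column = new category)."""
    rng = random.Random(seed)
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=transition[index[v]], k=1)[0]
            for v in values]


categories = ["0-15", "16-24", "25-44", "45+"]

# Hypothetical transition matrix: each row sums to 1 and only allows
# staying in place or moving to an adjacent category.
transition = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
    [0.00, 0.00, 0.10, 0.90],
]

ages = ["0-15", "16-24", "16-24", "25-44", "45+", "45+"]
perturbed = pram(ages, categories, transition)
```

With an identity matrix the data pass through unchanged, so the off-diagonal weights are what control how much uncertainty is added.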

Microdata: Issues arising II
- A smaller sample does help (uncertainty that an individual or household is in the microdata)
- Want tabular outputs to provide sufficient uncertainty at all geographies (cf. record swapping in Scotland 2001)
- Over-imputation and IACP would offer some protection to microdata
- After the decision on tabular outputs, need to consider any additional SDC needed for microdata products

Summary
- UK SDC Working Group in mid-June; UKCC in late July to agree strategy for tabular outputs
- Three short-listed methods
- Effect on microdata is among assessment criteria
- Choice of method for tables will influence how we protect microdata
- Likely to be a range of microdata samples, making use of either/both SDC and access conditions
- Work on specific SDC methods for microdata will progress further after decision on tabular methods

Thank you. Any questions?