Access routes to 2001 UK Census Microdata: Issues and Solutions
Jo Wathan
SARs Support Unit, CCSR, University of Manchester, UK

UK Census context
Traditional 10 yearly census at present
Medium length form (c. 30 person questions, c. 10 household questions)
– Ethnicity + optional religion question
– No income question
Legal framework in GB is Census Act 1920
– No statistics Act
– Legislation only deals with confidentiality restrictions – up to 2 years imprisonment!

1991 SARs
Samples of Anonymised Records (SARs) from 1991 were first to be released
Highly successful. c. 400 research papers used the data between 1993 &
Also used in teaching.
SARs are a commissioned output, paid for by UK Economic and Social Research Council.
SARs support unit at CCSR represent client, disseminate and support the data.

Disclosure Control 1991
After work had been undertaken to demonstrate low risk of disclosure
– Users had to register to use them
– Some ‘broadbanding’ or grouping of rare categories
– Very large households (12+ residents) had individual detail suppressed
– 2 non-overlapping files for different interest groups: one for geographers, one for sociologists/demographers
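
The two data-level controls listed above (grouping rare categories, and suppressing individual detail in households of 12 or more residents) can be illustrated with a minimal Python sketch. The function names, the record layout and the `min_count` threshold are hypothetical and chosen only for illustration; they are not the procedure the Census Offices actually used.

```python
from collections import Counter

def broadband_rare(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with one grouped label.

    min_count is an illustrative assumption, not a published threshold.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

def suppress_large_households(records, threshold=12):
    """Drop individual detail for members of households with threshold+ residents.

    Each record is a dict with a 'household_id' key (hypothetical layout);
    every other key is treated as individual detail.
    """
    sizes = Counter(r["household_id"] for r in records)
    result = []
    for r in records:
        if sizes[r["household_id"]] >= threshold:
            # Keep only the household link; individual detail is suppressed.
            result.append({"household_id": r["household_id"]})
        else:
            result.append(dict(r))
    return result

if __name__ == "__main__":
    occupations = ["clerk", "clerk", "clerk", "farrier", "clerk", "thatcher"]
    print(broadband_rare(occupations, min_count=2))

    big = [{"household_id": 1, "age": 20 + i} for i in range(12)]
    small = [{"household_id": 2, "age": 40}]
    print(suppress_large_households(big + small)[-2:])
```

Both functions trade detail for lower disclosure risk: rare values and unusually large households are the records most likely to be unique, so they are the ones coarsened.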

What did the 91 SARs look like?
Individual SAR
– Individual level file
– 2% (c. 1.2M cases)
– Geography population threshold 120k = 278 SAR areas
– Individual year of age
– 10 ethnicity categories
– 73 categories of occupation
Household SAR
– Household hierarchy
– 1% (c. 0.6M cases)
– Regional
– Individual year of age
– 10 ethnicity categories
– 358 categories of occupation

Request for 2001 SARs
New work on disclosure control showed that we had previously overestimated the risk of disclosure
– Requested larger sample size
– Slightly more geography
– A 3rd SAR for small areas
However, a new, stricter interpretation of the degree of disclosure risk was required
Initial level of detail available would not provide files of sufficient use for research

Why?
Census Office concerns:
– Perceived increased levels of concern amongst respondents
– Increased data processing power
– Increased levels of storage of personal information that might be used to match to the data
Major strategic review of data stewardship issues at the time that Census outputs were due for release

Principles
Ongoing need for user consultation
Recognise different users require different levels of detail (and may be able to accept different conditions) – trading detail/access against each other
Trading different types of detail against each other: geog against socio/demographic etc.
Flexible approach to combining a range of access and disclosure approaches:
– Safe Data
– Safe Users
– Safe Setting
International role models were very helpful

Where we are now
Have succeeded in obtaining access to
– End User License (Safe Data): 2 datasets which are accessible in the same way as in 1991, with less detail on some variables but enough detail for research purposes
– Special License (Safe Users): 1 dataset available for distribution but with extra access conditions
– Controlled Access Microdata (Safe Setting): much more detailed versions of 2 datasets available in a safe setting

Safe Data: End User License Files
Standard online application procedure for those with electronic signature (otherwise equivalent paper system)
Not public data!
Available for very low risk files
Risk reduced by
– Broadbanding (e.g. age, geography)
– Perturbing data
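
Broadbanding a numeric variable such as age can be sketched as follows. This is a minimal illustration only: the `age_band` helper and the band widths are assumptions made for the example, not the banding scheme applied to the released files.

```python
from collections import Counter

def age_band(age, width=10):
    """Return a label such as '30-39' for the fixed-width band containing age.

    width is an illustrative assumption; it is not taken from the 2001 files.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

if __name__ == "__main__":
    ages = [3, 17, 34, 35, 36, 71]
    # Single years of age are replaced by coarser bands before release.
    print(Counter(age_band(a) for a in ages))
```

Wider bands place more people in each cell, which lowers the chance that any one record is unique, at the cost of analytical detail.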

EUL Files
Individual SAR
– Individual level file
– 3% (c. 1.8M cases)
– Regional (13 categories)
– Ages banded
– 16 categories of ethnicity
– 81 categories of occupation
Small area microdata
– Individual level file
– 5% (c. 3M cases)
– Local authority geography (< 90k)
– 13 age bands (c. 10 years)
– 13 categories of ethnicity
– Only broad social class variable (economic activity 3 groups)

Safe Users: The 2001 S-L Household SAR
Additional complexity of a household SAR required special license
No geography at all & not available for Northern Ireland or Scotland
Age in 2-year bands
16 categories of ethnicity
81 categories of occupation

Safe setting
To compensate for loss of detail in the end user and special license files
Same records as Individual and Household SARs but with MUCH more detail
Managed by the Census offices
Access currently at only a handful of census office sites
Virtual microdata laboratory environment, outputs manually checked prior to release to user
Access only permitted if this is the only available data source, for work in keeping with the aims of the Census Office

Controlled Access Microdata
Individual CAM
– Individual level file
– 3% (c. 1.8M cases)
– Local authority – with context at lower level
– Individual year of age to ethnicity categories
– Over 200 categories of occupation
Household CAM
– Household hierarchy
– 1% (c. 0.6M cases)
– Local authority – with context at lower level
– Individual year of age to ethnicity categories
– Over 200 categories of occupation

Conclusion
Have a range of research-worthy datasets by treating different user groups differently
Traded off:
– Safe data
– Safe users
– Safe setting