Access routes to 2001 UK Census Microdata: Issues and Solutions Jo Wathan SARs support Unit, CCSR University of Manchester, UK


1 Access routes to 2001 UK Census Microdata: Issues and Solutions Jo Wathan SARs support Unit, CCSR University of Manchester, UK Jo.wathan@manchester.ac.uk

2 UK Census context
Traditional 10 yearly census at present
Medium length form (c. 30 person questions, c. 10 household questions)
  – Ethnicity + optional religion question
  – No income question
Legal framework in GB is Census Act 1920
  – No Statistics Act
  – Legislation only deals with confidentiality restrictions – up to 2 years imprisonment!

3 1991 SARs
Samples of Anonymised Records (SARs) from 1991 were the first to be released
Highly successful: c. 400 research papers used the data between 1993 & 2002. Also used in teaching.
SARs are a commissioned output, paid for by the UK Economic and Social Research Council.
The SARs support unit at CCSR represents the client, and disseminates and supports the data.

4 Disclosure Control 1991
After work had been undertaken to demonstrate low risk of disclosure:
  – Users had to register to use them
  – Some ‘broadbanding’ or grouping of rare categories
  – Very large households (12+ residents) had individual detail suppressed
  – 2 non-overlapping files for different interest groups: one for geographers, one for sociologists/demographers
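As an illustration only, the grouping of rare categories and the suppression of individual detail for households with 12+ residents could be sketched as below. The function names, the record layout (`household_id` plus detail fields), the `min_count` cut-off and the "Other" label are all hypothetical; only the 12+ resident threshold comes from the slide, and this is not the Census Offices' actual procedure.

```python
from collections import Counter


def group_rare_categories(values, min_count, other_label="Other"):
    """Replace any category seen fewer than min_count times with other_label.

    min_count and other_label are illustrative assumptions.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def suppress_large_households(people, large_from=12):
    """Drop individual detail for members of very large households.

    people: list of dicts, each with a 'household_id' key plus detail fields
    (a hypothetical layout). Members of households with large_from or more
    residents keep only their household_id.
    """
    sizes = Counter(p["household_id"] for p in people)
    result = []
    for p in people:
        if sizes[p["household_id"]] >= large_from:
            result.append({"household_id": p["household_id"]})
        else:
            result.append(dict(p))
    return result
```

Grouping trades category detail for lower risk across the whole file, whereas suppression removes detail only for the few records that are most identifiable.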

5 What did the 91 SARs look like?
Individual SAR
  – Individual level file
  – 2% (c. 1.2M cases)
  – Geography: population threshold 120k = 278 SAR areas
  – Individual year of age
  – 10 ethnicity categories
  – 73 categories of occupation
Household SAR
  – Household hierarchy
  – 1% (c. 0.6M cases)
  – Regional geography
  – Individual year of age
  – 10 ethnicity categories
  – 358 categories of occupation

6 Request for 2001 SARs
New work on disclosure control showed that we had previously overestimated the risk of disclosure
  – Requested larger sample size
  – Slightly more geography
  – A 3rd SAR for small areas
However, a new, stricter interpretation of the degree of disclosure risk was required
The initial level of detail available would not provide files of sufficient use for research

7 Why?
Census Office concerns:
  – Perceived increased levels of concern amongst respondents
  – Increased data processing power
  – Increased levels of storage of personal information that might be used to match to the data
Major strategic review of data stewardship issues at the time that Census outputs were due for release

8 Principles
Ongoing need for user consultation
Recognise different users require different levels of detail (and may be able to accept different conditions) – trading detail/access against each other
Trading different types of detail against each other: geography against socio/demographic etc.
Flexible approach to combining a range of access and disclosure approaches:
  – Safe Data
  – Safe Users
  – Safe Setting
International role models were very helpful

9 Where we are now
Have succeeded in obtaining access to:
  – End User License (Safe Data): 2 datasets which are accessible in the same way as in 1991, with less detail on some variables but enough detail for research purposes
  – Special License (Safe Users): 1 dataset available for distribution but with extra access conditions
  – Controlled Access Microdata (Safe Setting): much more detailed versions of 2 datasets available in a safe setting

10 Safe Data: End User License Files
Standard online application procedure for those with electronic signature (otherwise equivalent paper system)
Not public data!
Available for very low risk files
Risk reduced by:
  – Broadbanding (e.g. age, geography)
  – Perturbing data
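A minimal sketch of what broadbanding a variable such as age might look like in practice. The function name, the default 10-year width and the 90+ top code are illustrative assumptions, not the banding scheme actually applied to the End User License files; perturbation is not shown.

```python
def band_age(age, width=10, top_code=90):
    """Map a single year of age to a band label such as '30-39'.

    width and top_code are illustrative defaults, not the official scheme.
    Ages at or above top_code are collapsed into one open-ended band.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top_code:
        return f"{top_code}+"
    lower = (age // width) * width
    upper = min(lower + width - 1, top_code - 1)
    return f"{lower}-{upper}"


ages = [3, 34, 67, 95]
banded = [band_age(a) for a in ages]
```

Wider bands mean more people share each value, which lowers the chance that a combination of characteristics is unique to one respondent, at the cost of analytical detail.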

11 EUL Files
Individual SAR
  – Individual level file
  – 3% (c. 1.8M cases)
  – Regional (13 categories)
  – Ages 16-74 banded
  – 16 categories of ethnicity
  – 81 categories of occupation
Small area microdata
  – Individual level file
  – 5% (c. 3M cases)
  – Local authority geography (< 90k)
  – 13 age bands (c. 10 years)
  – 13 categories of ethnicity
  – Only broad social class variable (economic activity 3 groups)

12 Safe Users: The 2001 S-L Household SAR
Additional complexity of a household SAR required special license
No geography at all & not available for Northern Ireland or Scotland
Age in 2-year bands
16 categories of ethnicity
81 categories of occupation

13 Safe setting
To compensate for loss of detail in the end user and special license files
Same records as Individual and Household SARs but with MUCH more detail
Managed by the Census offices
Access currently at only a handful of census office sites
Virtual microdata laboratory environment; outputs manually checked prior to release to user
Access only permitted if this is the only available data source, for work in keeping with the aims of the Census Office

14 Controlled Access Microdata
Individual CAM
  – Individual level file
  – 3% (c. 1.2M cases)
  – Local authority – with context at lower level
  – Individual year of age to 90+
  – 16 ethnicity categories
  – Over 200 categories of occupation
Household CAM
  – Household hierarchy
  – 1% (c. 0.6M cases)
  – Local authority – with context at lower level
  – Individual year of age to 90+
  – 16 ethnicity categories
  – Over 200 categories of occupation

15 Conclusion
Have a range of research worthy datasets by treating different user groups differently
Traded off:
  – Safe data
  – Safe users
  – Safe setting
http://www.ccsr.ac.uk/sars

