Disclosure Control in the UK Census Keith Spicer 11 January 2005.


1 Disclosure Control in the UK Census
Keith Spicer, 11 January 2005

2 Contents
– National Statistics Code of Practice
– Background
– 2001 Census disclosure control: tables
– 2001 Samples of Anonymised Records
– Summary and lessons learnt

3 “The information you provide is protected by law and treated in strict confidence” (2001 Census form)
“Precautions will be taken so that published tabulations and abstracts of statistical data do not reveal any information about identifiable individuals or households” (2001 Census White Paper, Cm4523, para 120)

4 National Statistics Code of Practice
“The National Statistician will set standards for protecting confidentiality, including a guarantee that no statistics will be produced that are likely to identify an individual unless specifically agreed with them”
“It would take a disproportionate amount of time, effort and expertise for an intruder to identify a statistical unit to others, or to reveal information about that unit not already in the public domain”

5 National Statistics Code of Practice
The purpose of disclosure control is to ensure that no unauthorised individual, technically competent and with public data and private information, could identify, with a reasonable degree of confidence, information on an individual that has been supplied in confidence to ONS (such as in census or survey returns).

6 National Statistics Code of Practice
– Identity disclosure: the association of a respondent’s identity with a disseminated data record
– Attribute disclosure: the association of a respondent with an attribute value in the disseminated data (or an estimated attribute value based on the disseminated data)

7 Background: Disclosure Example 1
Widowed males aged 45-59, country of birth (COB) not UK, Area A (LLTI = limiting long-term illness):

                   LLTI   No LLTI   TOTAL
Econ Active           4        16      20
Not Econ Active      12         1      13
TOTAL                16        17      33

The table is disclosive because:
(1) The one person who is Not Econ Active and has no LLTI can be identified in the table, both by themselves and by others who know all of this information about them (identity disclosure).
(2) Any of these could then deduce that every other widowed male aged 45-59, COB not UK, who is Not Econ Active has an LLTI (attribute disclosure).
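Here is a minimal sketch of the check behind example 1. It assumes a table is held as a Python dict of rows. The helper name `find_unit_cells` is hypothetical and is not an ONS tool. The function lists the cells equal to 1, which is where identity disclosure starts.

```python
# Area A counts from Disclosure Example 1 (widowed males aged 45-59, COB not UK).
AREA_A = {
    "Econ Active": {"LLTI": 4, "No LLTI": 16},
    "Not Econ Active": {"LLTI": 12, "No LLTI": 1},
}


def find_unit_cells(table):
    """Return (row, column) pairs whose count is exactly 1.

    A cell of 1 singles out one person, the pattern behind identity disclosure.
    """
    return [
        (row, col)
        for row, cells in table.items()
        for col, count in cells.items()
        if count == 1
    ]


print(find_unit_cells(AREA_A))  # [('Not Econ Active', 'No LLTI')]
```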

8 Background: Disclosure Example 2
Area B:

                 2+ Cars   1 Car   0 Cars   TOTAL
Single                 4      19        8      31
Married               14       8        5      27
Sep/Div/Wid            0       6        0       6
TOTAL                 18      33       13      64

The table is disclosive because, if you know someone in Area B who is separated, widowed or divorced, you can deduce that they have 1 car. Information about them is being disclosed (attribute disclosure).
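The pattern in example 2 can be screened for in a similar way. This sketch uses a hypothetical helper, `find_attribute_disclosures`, and assumes the same dict-of-rows layout. It flags any row whose whole total falls in a single column, so that knowing a person belongs to that row is enough to reveal the attribute.

```python
# Area B counts from Disclosure Example 2 (TOTAL column omitted; it is derivable).
AREA_B = {
    "Single": {"2+ Cars": 4, "1 Car": 19, "0 Cars": 8},
    "Married": {"2+ Cars": 14, "1 Car": 8, "0 Cars": 5},
    "Sep/Div/Wid": {"2+ Cars": 0, "1 Car": 6, "0 Cars": 0},
}


def find_attribute_disclosures(table):
    """Return {row: column} for rows whose entire total sits in one column."""
    disclosures = {}
    for row, cells in table.items():
        nonzero = [col for col, count in cells.items() if count > 0]
        if len(nonzero) == 1:
            disclosures[row] = nonzero[0]
    return disclosures


print(find_attribute_disclosures(AREA_B))  # {'Sep/Div/Wid': '1 Car'}
```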

9 Background: Disclosure Example 3
Area C (contains two smaller areas, D and E):

           LLTI   No LLTI   TOTAL
Qual         12       165     177
No Qual      14       108     122
TOTAL        26       273     299

Area D:

           LLTI   No LLTI   TOTAL
Qual         11       105     116
No Qual       8        73      81
TOTAL        19       178     197

The tables are disclosive because, although neither table is disclosive by itself, they are in combination: subtracting the Area D table from the Area C table gives the corresponding table for Area E. The Area E table would have a 1 in the LLTI/Qual cell (12 - 11 = 1). This is disclosure by differencing.
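Differencing is cell-by-cell subtraction. The following is a minimal sketch using the Area C and Area D counts from the slide. The name `difference_tables` is a hypothetical helper.

```python
AREA_C = {
    "Qual": {"LLTI": 12, "No LLTI": 165},
    "No Qual": {"LLTI": 14, "No LLTI": 108},
}
AREA_D = {
    "Qual": {"LLTI": 11, "No LLTI": 105},
    "No Qual": {"LLTI": 8, "No LLTI": 73},
}


def difference_tables(outer, inner):
    """Subtract `inner` from `outer` cell by cell (the part of outer not in inner)."""
    return {
        row: {col: outer[row][col] - inner[row][col] for col in cells}
        for row, cells in outer.items()
    }


area_e = difference_tables(AREA_C, AREA_D)
print(area_e)
# {'Qual': {'LLTI': 1, 'No LLTI': 60}, 'No Qual': {'LLTI': 6, 'No LLTI': 35}}
```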

10 Background: 1991 Census
– Barnardisation: cells in tables were adjusted by -1, 0 or +1, so that an observed 1 was not certain to be a true 1
– However, there was still a good chance that an observed 1 was a ‘true’ 1
– A degree of uncertainty about the accuracy of information apparently disclosed about an individual does not ensure that confidentiality has been completely protected
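This is a sketch of Barnardisation under stated assumptions. The slide gives only the -1/0/+1 adjustment. The probability `p` of each non-zero change, the floor at zero and the equal mix of true 0s, 1s and 2s below are all illustrative assumptions, not the 1991 parameters. The second (hypothetical) function estimates by simulation how often a published 1 was a true 1.

```python
import random


def barnardise(counts, p, rng):
    """Return counts with -1, 0 or +1 added to each cell.

    Assumed scheme, for illustration only: -1 and +1 each occur with
    probability p and no change with probability 1 - 2p; results are
    floored at zero so no count becomes negative.
    """
    adjusted = []
    for count in counts:
        u = rng.random()
        delta = -1 if u < p else (1 if u < 2 * p else 0)
        adjusted.append(max(0, count + delta))
    return adjusted


def share_of_observed_ones_that_are_true(true_counts, p, trials, seed=0):
    """Estimate P(true count == 1 | published count == 1) by simulation."""
    rng = random.Random(seed)
    observed_ones = true_ones = 0
    for _ in range(trials):
        for true, seen in zip(true_counts, barnardise(true_counts, p, rng)):
            if seen == 1:
                observed_ones += 1
                true_ones += true == 1
    return true_ones / observed_ones if observed_ones else float("nan")


# Hypothetical small-area cells: equal numbers of true 0s, 1s and 2s.
cells = [0, 1, 2] * 10
print(share_of_observed_ones_that_are_true(cells, p=0.25, trials=2000))
```

With p = 0.25 and equal numbers of true 0s, 1s and 2s, a published 1 arises from a true 0 or a true 2 with probability 0.25 each and from a true 1 with probability 0.5. The simulation should therefore print a value close to 0.5, meaning that half of the observed 1s are still real.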

11 Background
Since 1991, the risk of disclosure had increased for 2001:
– 2001 Census results are more widely accessible, and Census data can be downloaded more freely
– Other data sets are now much easier to store electronically, increasing the risk of Census data being matched with other sources

12 Background
– More detail in 2001 Census outputs, as users wanted smaller areas and more flexible boundaries. The geographic units provided were considerably smaller than the lowest level provided in 1991
– Changing attitudes to the trust in which public agencies are held
– 2001 Census data were 100% coded, as opposed to 10% coding (for some variables) in 1991; the 10% coding added a level of uncertainty to published results

13 2001 Census Disclosure Control: pre-tabulation
Changes are made to data records before tables are prepared. The 2001 Census was the first to consider pre-tabulation methods as part of disclosure control.
Record swapping:
– An entire household record, except the geographic variables, is swapped with another in a neighbouring area (households are paired on the number of persons and their sex and grouped age)
– Swapping is within the LA, so it does not affect statistics at LA level or above
– No need for additional edit checks
– Statistical differences are smaller than the volume of changes
– Creates uncertainty about the accuracy of any apparent identification
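Here is a minimal record-swapping sketch. It assumes each household is a dict with `la`, `area` and a list of `(sex, age_group)` persons. The function names, the selection `rate`, and the use of "any other area in the same LA" as a stand-in for "neighbouring area" are all hypothetical simplifications. Exchanging the area codes of two matched records is equivalent to swapping everything except geography.

```python
import random
from collections import Counter


def pairing_key(household):
    """Assumed matching key: sorted (sex, age group) of every person.

    This captures the number of persons plus their sex and grouped age.
    """
    return tuple(sorted(household["persons"]))


def swap_households(households, rate, seed=0):
    """Swap area codes between matched households in different areas of one LA."""
    rng = random.Random(seed)
    result = [dict(h) for h in households]
    used = set()
    order = list(range(len(result)))
    rng.shuffle(order)
    for i in order:
        # Skip households already swapped or not selected at this rate.
        if i in used or rng.random() >= rate:
            continue
        partners = [
            j for j, other in enumerate(result)
            if j != i and j not in used
            and other["la"] == result[i]["la"]
            and other["area"] != result[i]["area"]
            and pairing_key(other) == pairing_key(result[i])
        ]
        if partners:
            j = rng.choice(partners)
            result[i]["area"], result[j]["area"] = result[j]["area"], result[i]["area"]
            used.update((i, j))
    return result


def persons_by(households, geography):
    """Count persons per value of a geographic variable ("la" or "area")."""
    totals = Counter()
    for h in households:
        totals[h[geography]] += len(h["persons"])
    return totals


households = [
    {"id": 1, "la": "LA1", "area": "A", "tenure": "Owned", "persons": [("M", "45-59")]},
    {"id": 2, "la": "LA1", "area": "B", "tenure": "Rented", "persons": [("M", "45-59")]},
    {"id": 3, "la": "LA1", "area": "A", "tenure": "Rented", "persons": [("F", "30-44"), ("M", "30-44")]},
    {"id": 4, "la": "LA1", "area": "B", "tenure": "Owned", "persons": [("F", "30-44"), ("M", "30-44")]},
]
swapped = swap_households(households, rate=1.0)
# Swaps never cross an LA boundary, so persons_by(swapped, "la") is unchanged.
```

Because partners share the pairing key, person counts by sex and age group are unchanged even at area level. Only the unmatched variables, such as `tenure` here, move between areas. This is one reading of why the statistical differences are smaller than the volume of changes.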

14 2001 Census Disclosure Control: post-tabulation
Changes are made after tables are prepared. This is generally time-consuming, as each output has to be checked.
Small cell adjustment:
– Only cells containing small counts are adjusted, so the level of adjustment is considerably less than that imposed by rounding
– Adjustment usually has little impact on the conclusions that can validly be drawn from the data
– Each table is internally additive, though the same total obtained from different breakdowns may differ
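The slide does not state the adjustment rule. This sketch assumes, purely for illustration, that counts of 1 and 2 are moved to 0 or 3 with probabilities that leave each cell's expected value unchanged. It then recomputes totals from the adjusted cells, which is what keeps each table internally additive. The function names are hypothetical.

```python
import random


def adjust_small_cells(table, seed=0):
    """Illustrative small cell adjustment (assumed rule, not taken from the slide).

    Counts of 1 or 2 become 0 or 3, with the probability of 3 set to count / 3
    so that the expected value of each adjusted cell equals the original count.
    Zeros and larger counts are left alone.
    """
    rng = random.Random(seed)
    adjusted = {}
    for row, cells in table.items():
        adjusted[row] = {}
        for col, count in cells.items():
            if count in (1, 2):
                count = 3 if rng.random() < count / 3 else 0
            adjusted[row][col] = count
    return adjusted


def add_totals(table):
    """Recompute row, column and grand totals from the (adjusted) cells."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    columns = list(next(iter(table.values())))
    col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    return row_totals, col_totals, sum(row_totals.values())


area_a = {
    "Econ Active": {"LLTI": 4, "No LLTI": 16},
    "Not Econ Active": {"LLTI": 12, "No LLTI": 1},
}
adjusted = adjust_small_cells(area_a)
rows, cols, grand = add_totals(adjusted)
```

Each table's totals are rebuilt from its own adjusted cells. As a result, two tables that share a margin can end up with different totals for it.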

15 2001 Census Disclosure Control
2001 Census disclosure control used:
– Record swapping, to introduce a degree of uncertainty into identity without affecting figures at LA level and above
– Small cell adjustment in addition, so that highly unusual people and households are significantly less visible in the outputs
– Thresholds: Output Areas have a minimum of 40 households and 100 persons (recommended size 125 households); Standard Tables require a minimum of 400 households and 1000 persons
– Output Areas used as building blocks
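The thresholds can be expressed as a simple check. This sketch reads "minimum 40 households, 100 persons" as requiring both minimums, which is an interpretation. It builds larger areas by summing whole Output Areas. The names `meets_threshold` and `combine_output_areas` and the example areas are hypothetical.

```python
# Minimum population sizes quoted on the slide.
THRESHOLDS = {
    "output_area": {"households": 40, "persons": 100},
    "standard_table": {"households": 400, "persons": 1000},
}


def meets_threshold(level, households, persons):
    """True if an area has at least the minimum households AND persons for `level`."""
    minimum = THRESHOLDS[level]
    return households >= minimum["households"] and persons >= minimum["persons"]


def combine_output_areas(output_areas):
    """Build a larger area by summing whole Output Areas (the building blocks)."""
    households = sum(oa["households"] for oa in output_areas)
    persons = sum(oa["persons"] for oa in output_areas)
    return households, persons


# Hypothetical Output Areas at the recommended size of 125 households.
oas = [{"households": 125, "persons": 300} for _ in range(4)]
print(meets_threshold("output_area", 125, 300))  # True
print(meets_threshold("standard_table", *combine_output_areas(oas)))  # True
```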

16 2001 Census Disclosure Control
Effects:
– Small cells in tables will not necessarily be ‘true’ figures
– Each table is internally additive, but totals may appear inconsistent between different tables
– Checking each set of tables produced is time-consuming for ONS, particularly Commissioned Output for small areas, where disclosure by differencing is possible

17 2001 Census Disclosure Control
Advice for users:
– Use the highest level of geography with the fewest breakdowns, and sum the fewest cells
– Disclosure control is not the only source of error: there is also coverage error, respondent error and other processing error, e.g. from the One Number Census adjustment, data capture and coding, and edit and imputation

18 Samples of Anonymised Records
Licensed Samples of Anonymised Records (SARs) from the 2001 Census:
– 3% sample of individual records, to Regional level (Version 1 available October 2004)
– 1% sample of household records, to Country level (due to be available Spring 2005)
– Version 2 of the individual SAR due to be available February 2005

19 Samples of Anonymised Records
Licensed individual SAR, available through CCSR:
– All researchers must sign an agreement not to attempt to identify any individual from the SAR
– Disclosure may still occur inadvertently, by differencing between a number of tables

20 Samples of Anonymised Records
Initial approach: restrict sample uniques by recoding. Version 1 individual SAR:
– age grouped: individual years to 15, 16-18, 8 bands 18-74, individual year 75+
– ethnic group variable grouped to 5 categories
– occupation grouped to 25 categories
– country of birth grouped to E, W, S, NI, Rep Ire, EU, Other
Post-randomisation (PRAMming): perturbation of some variables, normally by one category, on only a percentage of ‘risky’ records
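Both steps on this slide can be sketched briefly. The first function finds sample uniques on a chosen set of key variables. The second applies a simplified PRAM-style perturbation that moves an ordinal code by one category on risky records only. Treating sample uniques as the "risky" records, the integer coding, the probability `prob`, and the up-or-down move rule are all assumptions. Real PRAM uses a transition matrix that the slide does not give, and the function names are hypothetical.

```python
import random
from collections import Counter


def sample_uniques(records, keys):
    """Indices of records whose combination of `keys` occurs once in the sample."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return {i for i, r in enumerate(records) if combos[tuple(r[k] for k in keys)] == 1}


def pram_one_step(records, var, n_categories, risky, prob, seed=0):
    """Simplified PRAM-style perturbation of an ordinal variable.

    For each record index in `risky`, with probability `prob` the value of
    `var` (coded 0..n_categories-1) moves one category up or down, clamped
    at the ends. All other records are copied unchanged.
    """
    rng = random.Random(seed)
    out = []
    for i, record in enumerate(records):
        record = dict(record)
        if i in risky and rng.random() < prob:
            step = rng.choice((-1, 1))
            record[var] = min(n_categories - 1, max(0, record[var] + step))
        out.append(record)
    return out


# Hypothetical records: age band and country-of-birth group coded as integers.
records = [
    {"age_band": 3, "cob": 0}, {"age_band": 3, "cob": 0},
    {"age_band": 5, "cob": 6}, {"age_band": 3, "cob": 1},
]
risky = sample_uniques(records, ["age_band", "cob"])  # {2, 3}
perturbed = pram_one_step(records, "age_band", n_categories=10, risky=risky, prob=0.5)
```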

21 Samples of Anonymised Records
Any observed ‘1’ in a SAR table is unlikely to be a real population ‘1’:
– the 1 is 1 from a 3% sample (whose members are unknown)
– PRAMming will have the effect of ‘moving’ members into and out of cells
Version 2 individual SAR (due February 2005) will have:
– 81 occupational categories (25 in Version 1)
– the full 16 ethnic group categories (5 in Version 1)
– country of birth broken down into 16 categories (7 in Version 1)
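A rough worked figure for the first point follows. It assumes, as a simplification, that the 3% SAR behaves like a simple random sample with sampling fraction f. Under that assumption, a sample count of n = 1 scales up to an estimated population count of about 33. A person who is unique in the population has only a 3% chance of being in the sample at all.

```latex
f = 0.03, \qquad
\hat{N} = \frac{n}{f} = \frac{1}{0.03} \approx 33, \qquad
\Pr(\text{a population-unique person is sampled}) = f = 0.03
```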

22 Samples of Anonymised Records
In-house controlled access:
– SARs with full detail on the 3% sample of individuals
– Labs in Titchfield and London
– Access by application, using a form available from ONS; applications are assessed by the Census Research Access Board (CRAB)
– All lab outputs are assessed for disclosure (normally within one week)

23 Summary and lessons learnt
– Tables protected by both pre-tabulation (record swapping) and post-tabulation (small cell adjustment) methods
– SARs available for bespoke analysis:
  – licensed through CCSR
  – controlled access through the ONS data lab

24 Lessons learnt
– Protecting the confidentiality of individual details becomes more difficult with each Census
– The disclosure risk assessment should have been carried out earlier, to allow earlier consultation and more time to conduct research and develop different options

25 Lessons learnt
– Need to provide users with information about the measurement and other errors that exist within Census data
– Review of 2001 disclosure control in preparation for 2011

26 Contact details
Keith Spicer
Office for National Statistics
Segensworth Road, Titchfield, Fareham PO15 5RR
01329 813062
keith.spicer@ons.gov.uk
sars@ons.gov.uk

