Presentation on theme: "Confidentiality and the SARs Update on SAR progress, and discussion of the disclosure work done for Scotland. Sam Smith"— Presentation transcript:
Confidentiality and the SARs Update on SAR progress, and discussion of the disclosure work done for Scotland. Sam Smith firstname.lastname@example.org
Update 2001 SARs Newsletter published very recently: more delays. Disclosure control by CAPRI is ongoing. The current estimate is for the Individual data to be with the SARs team in June. In-house access at ONS is available for users with urgent need.
England and Wales For the release of 100% tables, England and Wales and Northern Ireland rounded small cell counts. It is not possible to match between the SAR and the tables for England, Wales and NI.
Scotland Scotland did not round their 100% tables. As a result, there are counts of 1 in the tables. If any of these individuals are present in the SAR, it is disclosive.
Background The following work has been carried out in collaboration with the General Register Office for Scotland, by the SARs team at CCSR. At time of writing, I have had no access to disclosive data. There is no geography below Scotland level.
Population Uniques Population Uniques are people who have one or more characteristics which are Unique in the Population. Sample Uniques are people who are unique on one or more characteristics in the Sample.
Scale There are 62 variables in both the SAR and 100% tables. GROS are interested in Tri-variate tables. Only concerned with uniques. We obtained 37,820 tables, covering all combinations of trivariate tables.
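The table count follows from choosing 3 of the 62 shared variables. A minimal Python check of that arithmetic is sketched below; the variable names are placeholders, since the real list of 62 variables is not given in these slides.

```python
from itertools import combinations
from math import comb

# Placeholder names: the 62 real variable names are not listed in the slides.
variables = [f"var{i:02d}" for i in range(1, 63)]

# Every unordered choice of three distinct variables defines one trivariate table.
table_specs = list(combinations(variables, 3))

print(len(table_specs))  # number of trivariate table specifications
print(comb(62, 3))       # the same count from the closed form 62! / (3! * 59!)
```

Both lines print 37820, matching the number of tables obtained.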
Request of the tables An example request for input to their system was provided by GROS. We then replicated and modified it, making one request for each table. The tables arrived on 4 CDs, a month later.
An example table
Space-Time Research 2001 ED Based OSD - Test 1
Table 1: Cars - Number of by Ever worked Indicator and Number of Rooms for Person
Only "No code required" is shown for Ever Worked.

Cars \ Number of Rooms    Not applicable     01-02     03-04     05-06        7+
None                                   -    53,323   421,443   232,335    18,719
One                                    -    33,839   577,499   759,187   188,235
Two                                    -     6,104   174,884   499,420   368,657
Three                                  -       772    20,029    83,915    84,619
Four or more                           -       222     4,622    20,353    29,984
Communal establishment            50,485         -         -         -         -
A Bigger Example Table Age, Industry, Occupation Add table here
Analysis Custom software was written to parse each table and list the file, variables and value locations of every unique. There are 2.4 million uniques.
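The custom software itself is not shown in these slides. As an illustration only, the sketch below scans one table for cells with a count of exactly 1. It assumes a simplified layout with one row per cell and a `count` column, which is not the format the tables were delivered in; the function name `find_uniques` and the example values are hypothetical.

```python
import csv
import io

def find_uniques(table_name, variables, csv_text):
    """Return (table, variables, values) for every cell whose count is exactly 1."""
    uniques = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row["count"].replace(",", "")
        # Cells shown as "-" (not applicable) are skipped rather than parsed.
        if cell.isdigit() and int(cell) == 1:
            values = tuple(row[v] for v in variables)
            uniques.append((table_name, variables, values))
    return uniques

# Hypothetical table in the assumed one-row-per-cell layout.
example = """age,industry,occupation,count
16-17,Fishing,Manager,1
16-17,Fishing,Clerical,12
18-19,Mining,Manager,1
18-19,Mining,Clerical,-
"""

found = find_uniques("table_00001", ("age", "industry", "occupation"), example)
for unique in found:
    print(unique)
```

Running the same function over every table and concatenating the results would give a single list of uniques, keyed by file, variables and values.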
Implementation Step by Step process. Keep intermediate steps. Keep It Simple.
Target The Scotland Specification is as compatible as possible with the England and Wales specification. Use recodes to reduce the unique count to a level where the remaining uniques can be dealt with on a record-by-record basis.
Simple Suppression of Uniques All records with uniques must be perturbed. Approximately 96% of Uniques will be immediately suppressed by virtue of the sample being 4%. There are also reductions because of differences in the specifications.
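As a rough worked example of the 96% figure, the sketch below applies the 4% sampling fraction quoted here to the 2.4 million uniques quoted in the Analysis slide. It assumes every unique is equally likely to fall in the sample, and it counts unique table cells rather than people, so one person may account for several of them.

```python
population_uniques = 2_400_000  # figure quoted in the Analysis slide
sampling_percent = 4            # sampling fraction quoted on this slide

# Integer arithmetic avoids floating-point rounding in the percentage.
expected_in_sample = population_uniques * sampling_percent // 100
expected_outside_sample = population_uniques - expected_in_sample

print(expected_in_sample)       # uniques expected to remain in the sample
print(expected_outside_sample)  # uniques removed simply by not being sampled
```

Under these assumptions about 96,000 uniques would remain before any recoding, which is why recodes are still needed to bring the count down.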
Recodes Variables were recoded to coarser categories. Some recodes were also used to aid the E&W disclosure work, including Age, Hours of Work, Industry and others. At time of writing, Occupation is the only additional recode for Scotland.
Running the recodes. The previous slide represents 6 weeks of iterative work. Each recode had the uniques analysis run, producing a list of uniques.
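The effect of one iteration can be sketched as follows: recode a variable to coarser bands, then recount the uniques. The records, the 5-year band width and the helper names `band_age` and `count_uniques` are hypothetical, chosen only to show how merging categories removes uniques.

```python
from collections import Counter

def band_age(age, width):
    """Map a single year of age to a band label such as '20-24'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def count_uniques(records):
    """Count the distinct value combinations that occur exactly once."""
    return sum(1 for n in Counter(records).values() if n == 1)

# Hypothetical records: (age, industry, occupation).
records = [
    (20, "Fishing", "Manager"),
    (21, "Fishing", "Manager"),
    (22, "Fishing", "Manager"),
    (37, "Mining", "Clerical"),
]

# Recode age to 5-year bands and leave the other variables unchanged.
recoded = [(band_age(age, 5), industry, occupation)
           for age, industry, occupation in records]

print(count_uniques(records))  # uniques before the recode
print(count_uniques(recoded))  # uniques after the recode
```

In this toy data the recode reduces four uniques to one, because the three Fishing managers now share the band 20-24.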
Moving forwards We now have a slightly more restrictive specification for Scotland: Age recoded to bands of between 2 and 5 years (for age 16+; possibly also for EWNI); Occupation in ?? categories; Industry in 15 categories (applied to EWNI); Hours of Work banded (applied to EWNI).
So far… Everything has been done on publicly accessible data. The above process needs to be rerun on the SAR to find Sample Uniques. This requires access to the disclosive microdata.
Future Work The 37,820 tables will be recreated for the records in the sample. The lists of Population Uniques and Sample Uniques will be compared. Where there is a Population Unique in the Sample, it will be flagged.
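The comparison step can be sketched as a set intersection. The code below assumes each unique is identified by the table's variables together with the cell's values; the example entries are hypothetical.

```python
# Each unique is keyed by (variables of the table, values of the cell).
population_uniques = {
    (("age", "industry", "occupation"), ("16-17", "Fishing", "Manager")),
    (("age", "industry", "occupation"), ("18-19", "Mining", "Manager")),
}
sample_uniques = {
    (("age", "industry", "occupation"), ("16-17", "Fishing", "Manager")),
    (("age", "industry", "occupation"), ("20-24", "Retail", "Clerical")),
}

# A cell that is unique in both lists is a Population Unique present in the Sample.
flagged = population_uniques & sample_uniques

for variables, values in sorted(flagged):
    print(variables, values)
```

A cell that is unique only in the sample, such as the Retail example here, is not flagged, because more than one person in the population shares those values.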
Applying this to the Microdata All the Population Uniques in the Sample will be perturbed by ONS. The method of perturbation will be the same as that used for England, Wales and NI records. This method is likely to involve PRAM (the Post-Randomisation Method). Discussion paper available from the SARs website?
The 100% Tables The 37,820 tables requested cost £2,000 - paid for by the SARs project. They will be made available to registered SARs/Census users for use in research.
And Finally…. Slides will be available on the seminars webpage tomorrow. Any questions?