The 2001 SARs The Individual Licensed SAR Accessing the data Quality and analysis issues Controlled Access Microdata files The Household SAR Small Area.

1 The 2001 SARs The Individual Licensed SAR Accessing the data Quality and analysis issues Controlled Access Microdata files The Household SAR Small Area Microdata

2 Introduction to the 2001 Licensed Individual SAR Background to data development Licensing Accessing the data

3 Census Microdata Census outputs have historically been aggregate tables – safe but inflexible Microdata permits more flexibility

4 The 1991 Samples of Anonymised Records Available for the first time after research into the confidentiality risk Two samples –Individual SAR Detailed geog (large LAs) 2% Sample –Household SAR Hierarchical, linked individuals - Detailed occupational information 1% Sample

5 The Request for the 2001 Individual SAR Request sent in autumn 2001 Following consultation with users and a confidentiality assessment, we asked for similar detail to 1991, e.g.: –16 categories of ethnic group (or national equivalent) –SOC 2000 minor (81 categories) But with a 3% sample and more LADs ONS had greater concerns over confidentiality A more detailed Controlled Access Microdata Sample is available in a safe setting

6 Safe Data Subject to extensive disclosure control –Broad banding –Special uniques analysis –Further recodes –Less detail than 1991 on: Geography Industry/occupation Age Ethnicity*/country of birth –Released October 2004

7 Second version of SARs ONS reconsidered confidentiality of SARs Have now released a second version with more detail Downloadable from CCSR web-site end March Users must undertake to destroy version 1 before downloading version 2

8 Licensed file content - geographical Regional Geography –GOR Region PLUS Inner/Outer London Northern Ireland Scotland Wales Country of birth –16 categories –Increased from version 1

9 Licensed file contents: demographic Age banded v.2 –Individual year to 15 –16-18; 19-22; 23-29; 30-44; –45-59; 60-64; 65-69; –70-74; single years; 95+ Ethnic group v.2 –16 categories (E and W) –14 Scotland –2 N. Ireland

10 Licensed file content: Socio-economic Occupation –2000 SOC Minor categories v.2 NS-SEC –38 valid categories Industry –15 categories A-O, P, Q Hours of work – single hours to 80+

11 New or Improved Data Improved highest qualification –4 categories Religion – varies considerably by nation v.2 –9 categories in England and Wales –7 in Scotland – current only –7 in Northern Ireland, plus religion brought up in General health –Good / fairly good / not good Caring –Hours caring, 3 bands –Number of carers in household

12 Research value Ability to recode variables as wished Ability to select populations and variables Ability to conduct multivariate analysis Learning and Teaching Preliminary work before using in-house file (CAMS)

13 The Licence All users need to be licensed Academics complete the licence as part of the Census Registration System process Non-academic users sign the licence as part of the data registration process Cannot pass the data to an unlicensed user Cannot attempt to identify an individual

14 The licence – good practice Keep your data password protected Destroy your data when you have finished using it Remove SAR files before passing on your PC to someone else Tell CCSR about your publications Tell CCSR if you leave your institution

15 Access Arrangements Data distributed by CCSR Academics, no charge –Register for the data under Census Registration System –Access the data online from CCSR website Non-academics –Not for profit £500 per file –Business users £1000 per file –10 users per application, incl. software –Download End User License from web

16 Accessing the data Non-academic users –Data available in NSDstat –Other formats available on CD –Can arrange direct download Academic users –Direct download (SPSS/Stata/tab delimited) –Nesstar, explore online and subset (wider range of formats available) –NSDstat available











27 Working with the 2001 Licensed Individual SAR Coverage and quality SAR data issues Analysing SAR data Software

28 Census coverage Major effort to improve coverage in 2001 One Number Census Use of large Census Coverage Survey (CCS) to correct census results, 300K households –Design independent of census –Used matched census and CCS data to estimate total population in each area –Adjusted all results for census non-response using imputation of households and individuals –Results in final database for UK adjusted for non-response
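The slide does not give the estimation formula. As a rough illustration of how matched census and coverage-survey counts can yield a population total, the sketch below uses the textbook dual-system (capture-recapture) estimator. The function name and the counts are hypothetical, and this is a simplification rather than the full One Number Census procedure.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Textbook dual-system estimate of the true population of an area.

    census_count  : people counted by the census in the area
    survey_count  : people counted by the independent coverage survey
    matched_count : people found in both sources after matching
    """
    if matched_count <= 0:
        raise ValueError("at least one matched person is needed")
    return census_count * survey_count / matched_count


# Hypothetical area: the census found 940 people, the survey 500,
# and 470 of the survey respondents were matched to a census record.
estimate = dual_system_estimate(940, 500, 470)
implied_census_coverage = 940 / estimate
print(f"estimated population: {estimate:.0f}")
print(f"implied census coverage: {implied_census_coverage:.0%}")
```

The estimator relies on the two counts being independent, which is why the slide stresses that the survey design was independent of the census.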

29 Census coverage Coverage before imputation: –94% households returned forms, with another 4% estimated to be in households identified by enumerators. Response rate lowest for –Young people in their early 20s (men aged resp. rate of 87%) –Inner London (resp rate of 78%) Once imputed cases are included estimated to be 100% coverage

30 Population base One population base: usual residents Differs from 1991, when the user had to choose either the present or the usual resident base Students enumerated at term-time address Communal establishments are included

31 Implications for 2001 SARs 1991 SARs selected from 10% sample –Did not include imputed households –96% coverage 2001 SARs selected from 100% ONC database –94% response; 6% imputed –Imputed individuals/hholds are identified –Imputed items are flagged

32 Two kinds of imputation Entire individual or household may be imputed as part of ONC –Complete records copied from enumerated individuals/hhold –Variable oncperim Variables imputed when information missing

33 Edit 13.7 million edit procedures undertaken –28% of the population had 1+ items imputed –Common: Missing prof quals set to none Carer set to no where missing (unless economic activity also missing) Travel to work set to work mainly at/from home where workplace was mainly at/from home –Others 14k people multi-ticked sex (so imputed) 6k children had marital status changed to single Impossible values set to missing then imputed Missing values are imputed on the basis of similar local cases; this does not remove unlikely values

34 Item imputation For census output database as a whole: One or more items imputed for 28% of the population Employment variables most affected: –Industry ever worked: 18% –Occupation ever worked: 14% –Workplace size: 9% Under-enumerated groups are most imputed, esp. single people

35 Can I tell what/who has been imputed? Oncperim records whether an individual has been imputed as part of the ONC –Copies entire record from census database z variables identify whether individual has imputed information on a specific variable –Parallel set of variables –zethew, zage0

36 Crosstab ethnic group (ethew) by imputation flag (zethew)
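A cross-tabulation like this one can be sketched in a few lines of Python. The records below are made up, and the coding of the flag (0 for an untouched value, 1 for a flagged one) is an assumption to check against the SARs User Guide; only the variable names ethew and zethew come from the slides.

```python
from collections import Counter

# Made-up records of (ethew, zethew). The flag coding is an assumption:
# 0 = value as reported, 1 = value flagged as imputed or perturbed.
records = [
    ("White", 0), ("White", 0), ("White", 1),
    ("Mixed", 0), ("Mixed", 1),
    ("Black", 0), ("Black", 0), ("Black", 1), ("Black", 1),
]


def crosstab(rows):
    """Count the records in each (ethnic group, flag) cell."""
    return Counter(rows)


def percent_flagged(rows):
    """Percentage of records in each ethnic group with a non-zero flag."""
    totals = Counter(group for group, _ in rows)
    flagged = Counter(group for group, flag in rows if flag != 0)
    return {group: 100.0 * flagged[group] / totals[group] for group in totals}


table = crosstab(records)
rates = percent_flagged(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.1f}% flagged")
```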

37 Percentage with ethnicity variable imputed, 2001 SARs Columns: Not imputed, Imputed Rows: White, Mixed, Asian, Black, Chinese/Other, All

38 Percentage ONC imputed, 2001 SARs Columns: Not ONC imputed, ONC imputed Rows: White, Mixed, Asian, Black, Chinese/Other, All

39 Should I use imputed individuals or variables? Imputation of individuals is designed to compensate for under-enumeration –using imputed cases will give results comparable with national data –will help overcome bias from non-response Imputed variables are generally reported as accurate –in general we advise using imputed information

40 Ethnicity But doubt over imputed ethnic group Simpson and Akinwale used Longitudinal Study to compare 1991 ethnic group with imputed 2001 ethnic group Majority of imputed records are wrong Recommend not using imputed records for minority groups –SARs Percentage ethnic group imputed: – 2.5% white; 7.4% black; 11.7% mixed

41 PRAMMing PRAMMing is perturbation designed to deal with very unusual cases, e.g. widowed 16-year-olds Avoids additional broad-banding Perturbation is constrained to –preserve univariate distributions –preserve multivariate distributions on control variables –prevent strange results (like 5-year-old widows) Affects 15 variables –Primary economic activity – 1% cases

42 The z-variables PRAMMed variables are flagged along with imputed variables –Cannot distinguish them Imputation flags are stored in variables with z prefix Two versions of the download file –use the larger *-impflag-*.extension version if interested in imputation/PRAMMing

43 General advice If unsure about impact of PRAMMing and imputation –Do a sensitivity test –use the z var to exclude cases with imputed variables and then repeat your analysis –Use ONCPERIM to exclude imputed individuals and repeat your analysis

44 National variation There is one file for the whole UK Some variables are country specific: –Irish language Other variables have national variations –educational qualifications –ethnicity –Watch out for the E, W, S and N suffixes! Slight variation in the sampling fraction for each country: –3.125% in England and Wales –3.246% in Scotland –3.139% in Northern Ireland
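One practical consequence of the differing sampling fractions is that sample counts gross up to population estimates by slightly different factors in each country. The sketch below assumes the figures on the slide are percentages and uses the simple inverse of the sampling fraction as the weight; the sample count is hypothetical.

```python
# Sampling fractions from the slide, read as percentages (an assumption).
SAMPLING_FRACTION_PCT = {
    "England and Wales": 3.125,
    "Scotland": 3.246,
    "Northern Ireland": 3.139,
}

# Grossing weight: how many people each sampled record represents.
weights = {
    country: 100.0 / pct for country, pct in SAMPLING_FRACTION_PCT.items()
}


def gross_up(sample_count, country):
    """Estimate a population count from a sample count for one country."""
    return sample_count * weights[country]


for country, weight in weights.items():
    print(f"{country}: each record represents about {weight:.1f} people")

# Hypothetical example: 250 sampled cases in Scotland.
print(round(gross_up(250, "Scotland")))
```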

45 How does the SARs compare to the aggregate data?

46 Get to know the data Use the documentation SARs User Guide –Use Census schedules to check questions –Check univariate frequencies –Do exploratory analyses –Contact if you can't find the information you need in the online documentation Contact if you think there is a problem with the data

47 SARs as a LARGE dataset 1.8 million cases can cause trouble! Use Nesstar to do initial data exploration Extract a subset using Nesstar or take a subset from the downloaded file For serious analysis, using a syntax file to record your commands makes re-running easier –Create a single syntax file which starts with the original data –Use file naming conventions that will enable you to trace versions –Keep a record of work done
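Taking a subset from the downloaded tab-delimited file does not require loading all the cases into memory. The sketch below streams rows one at a time and keeps only the wanted columns and cases. The column names and codes in the tiny in-memory example are hypothetical stand-ins for the real file.

```python
import csv
import io


def extract_subset(source, keep, condition):
    """Yield selected columns for rows of a tab-delimited file that match.

    source    : an open text file (or any iterable of lines) with a header row
    keep      : list of column names to retain
    condition : function taking a row dict and returning True to keep it
    """
    reader = csv.DictReader(source, delimiter="\t")
    for row in reader:
        if condition(row):
            yield {name: row[name] for name in keep}


# Stand-in for the downloaded file; with real data, pass
# open(path, newline="") instead (path chosen by the user).
sample = io.StringIO(
    "age0\tsex\tregion\n"
    "25\t1\t2\n"
    "67\t2\t2\n"
    "31\t2\t5\n"
)

subset = list(
    extract_subset(sample, ["age0", "sex"], lambda row: row["region"] == "2")
)
print(subset)
```

Keeping the extraction in a script, rather than doing it by hand, follows the slide's advice to start every run from the original data.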

48 SARs as sample data Geographically stratified sample –approximates to simple random sample –no clustering in Individual file –Household file – clustering within households –Although large sample you may have small sample sizes when using sub-groups –use standard errors and confidence intervals
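For a sub-group proportion, the standard error and confidence interval recommended here can be computed as below. This is a minimal sketch that treats the Individual file as a simple random sample, uses the normal approximation, and ignores any finite population correction; the counts are hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Return (proportion, standard error, (lower, upper)) for a sample.

    Uses the normal approximation; z=1.96 gives a 95% interval.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, se, (lower, upper)


# Hypothetical sub-group: 30 of 200 sampled people have the characteristic.
p, se, (low, high) = proportion_ci(30, 200)
print(f"p = {p:.3f}, se = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With very small sub-groups or proportions near 0 or 1, the normal approximation is poor and an exact or Wilson interval would be a better choice.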

49 Comparisons between 1991 and 2001 Population base changed –Imputation (no imputed values in 1991 SARs) –Students – enumerated at term-time address –Residents only (choice in 1991) Variable continuity –Variable names have been changed where the variable is not exactly the same –Some variables (e.g. age, LLI) are easy to compare by grouping 1991 values –Some variables are harder to compare as the question has changed (eg qualifications)
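Grouping 1991 values to match the 2001 bands is a simple recode. The sketch below applies the 2001 age bands listed earlier for ages 16 to 74 to single-year ages; it returns None outside that range, where the banding should be taken from the SARs documentation. The function name and the example ages are hypothetical.

```python
# 2001 licensed-file age bands for ages 16-74, as listed on the
# "Licensed file contents: demographic" slide.
BANDS = [
    (16, 18), (19, 22), (23, 29), (30, 44),
    (45, 59), (60, 64), (65, 69), (70, 74),
]


def band_age(age):
    """Map a single year of age to its 2001 band label, or None if unbanded here."""
    for low, high in BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


# Hypothetical single-year ages from a 1991 extract.
ages_1991 = [17, 22, 44, 61, 74, 80]
banded = [band_age(age) for age in ages_1991]
print(banded)
```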

50 Ethnicity 91/01 Different questions asked in 1991 and 2001 No agreed and perfect correspondence Simpson and Akinwale use LS to show how 1991 maps on to 2001

51 Software options Supported packages –Nesstar –NSDstat –SPSS –Stata Other options –Import or Stat/transfer to another package –Use Nesstar to save to SAS or Statistica –unless you use a v. small subsample the SARs will be too big for most spreadsheets!

52 Looking forward: Moving forward Controlled Access Microdata Samples Household SARs Small Area Microdata sample Learning and Teaching

53 CAMS content Controlled Access Microdata designed for professional researchers: Access in safe setting only Specification on SARs website Individual file and Household file

54 Content of CAMS files Files contain much more detail; e.g. –Individual year of age (topcoded at 95) –Full coding on country of birth –SOC Unit Group –Local authority geography –Index of Deprivation for SOAs –Index of Deprivation for migrants' last address

55 Controlled Access CAMS is managed by ONS Data is accessed at London/Titchfield/Newport in Virtual Laboratory setting on a server Virtual lab looks like a standard windows interface Use SPSS/Stata in usual way output checked for confidentiality before release Further information and appropriate forms at Contact for more details

56 CAMS Good practice Use the licensed SARs... –to exhaust the potential of other datasets –to write your syntax files Check the disclosure guidelines before writing your file Avoid complex tables –small cell counts aren't reliable –unique cells will usually be suppressed Do use models

57 Household SAR 1% of households and all individuals Allows linkage between individuals in households Similar detail to Individual SAR –Continuing discussion over ONS confidentiality concerns Large households: less detail on households of 6+ Specification of Household SAR on website

58 The hierarchy of the household file Household 1 North West Social rented Person 1 HoH Female 28 No quals No LTILL Person 2 Son of HoH Male 12 N/A No LTILL Household 2 Wales Owner occupier Person 1 HoH Male 33 Degree No LTILL Person 2 Spouse of HOH Female 31 Degree P/T Employee No LTILL Person 3 Parent of HoH Female 72 No quals Econ Inactive LTILL
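The two-level structure on this slide can be represented as households that each hold a list of people. The sketch below encodes the two example households and shows how linkage within a household supports derived measures. The class and field names are illustrative assumptions, not the variable names of the Household SAR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    number: int
    relationship: str  # relationship to head of household (HoH)
    sex: str
    age: int


@dataclass
class Household:
    hid: int
    region: str
    tenure: str
    people: List[Person] = field(default_factory=list)


households = [
    Household(1, "North West", "Social rented", [
        Person(1, "HoH", "Female", 28),
        Person(2, "Son of HoH", "Male", 12),
    ]),
    Household(2, "Wales", "Owner occupier", [
        Person(1, "HoH", "Male", 33),
        Person(2, "Spouse of HoH", "Female", 31),
        Person(3, "Parent of HoH", "Female", 72),
    ]),
]


def flatten(hhs):
    """One row per person, carrying the household-level attributes."""
    return [
        {"hid": h.hid, "region": h.region, "tenure": h.tenure,
         "person": p.number, "relationship": p.relationship,
         "sex": p.sex, "age": p.age}
        for h in hhs for p in h.people
    ]


rows = flatten(households)
# Derived household measure: households containing a parent of the HoH.
with_parent = [
    h.hid for h in households
    if any(p.relationship == "Parent of HoH" for p in h.people)
]
print(len(rows), with_parent)
```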

59 Release of household SAR Discussions are continuing over release of Household SAR One possibility is a dataset with full information on age and all individuals in large households but under more tightly regulated conditions than the Individual SAR

60 Small Area Microdata file 5% sample of individuals Full range of variables LA lowest geography Except Isles of Scilly and City of London in E and W; similar exceptions in S and NI –Excludes communal establishments –Age 11-year bands –Ethnicity – 5 groups or 16 with records swapping between LAs –Economic activity – 3 categories Summer 2005 for delivery

61 Using the SARs in Learning and Teaching SARs provides easy to use dataset Fits well with aggregate data Supported by learning and teaching materials – Access managed in same way: –use Census Registration System –need ATHENS (for data and CHCC)

62 User support Web pages are regularly updated Documentation online Resources and links added as we go Seminar invitations welcome! Regional workshop invites welcome! SARs Helpdesk –(0161) Join and newsletter lists SARs User Group – July 15th, RSS, London
