Working with the 2001 Licensed Individual SAR
- Coverage and quality
- SAR data issues
- Analysing SAR data
- Software
- The other datasets…

The SARs
- Introduced with the 1991 Census as an alternative to tabular outputs
  - Improved flexibility
  - Huge sample sizes
  - Only released following demonstration of non-disclosiveness
- Content and access methods of the 2001 data are much more affected by confidentiality
  - Less detail on many variables in the licensed files
  - Codebook online

2001 files
- Data available for download
  - Individual licensed SAR
- On their way
  - Household licensed SAR, under special licence from the UK Data Archive
  - Small area microdata file
- If you need more detail: Controlled Access Microdata Samples (CAMS)
  - Individual file
  - Household file (version 1)

Census coverage
- Major effort to improve coverage in 2001: the One Number Census (ONC)
- A large Census Coverage Survey (CCS) of 300K households was used to correct census results
  - Design independent of the census
  - Matched census and CCS data were used to estimate the total population in each area
  - All results were adjusted for census non-response by imputing households and individuals
- The final UK results database is therefore adjusted for non-response

Census coverage
- Coverage before imputation:
  - 94% of households returned forms, with a further 4% estimated to be in households identified by enumerators
- Response rate was lowest for
  - Young people in their early 20s (response rate of 87% for men in this age group)
  - Inner London (response rate of 78%)
- Once imputed cases are included, coverage is estimated to be 100%

Population base
- One population base: usual residents
  - Differs from 1991, when users had to choose either the present or the usual resident base
- Students are enumerated at their term-time address
  - And are included in the data. Use stulaway>1 to exclude those other than usual residents
- Communal establishments are included in the individual file

Implications for the 2001 SARs
- 1991 SARs were selected from the 10% sample
  - Did not include imputed households
  - 96% coverage
- 2001 SARs were selected from the 100% ONC database
  - 94% response; 6% imputed
  - Imputed individuals/households are identified
  - Imputed items are flagged

Two kinds of imputation
- An entire individual or household may be imputed as part of the ONC
  - Complete records are copied from enumerated individuals/households
  - Variable oncperim
- Individual variables are imputed when information is missing

Edit
- 13.7 million edit procedures were undertaken
  - 28% of the population had 1+ items imputed
- Common edits:
  - Missing professional qualifications set to none
  - Carer set to no where missing (unless economic activity was also missing)
  - Travel to work set to "work mainly at/from home" where workplace was mainly at/from home
- Others:
  - 14k people multi-ticked sex (so it was imputed)
  - 6k children had marital status changed to single
  - Impossible values were set to missing, then imputed
- Missing values are imputed on the basis of similar local cases
- Editing does not remove unlikely values

Item imputation
- For the census output database as a whole, one or more items were imputed for 28% of the population
- Employment variables were most affected:
  - Industry ever worked: 18%
  - Occupation ever worked: 14%
  - Workplace size: 9%
- Under-enumerated groups are the most imputed, especially single people

Can I tell what/who has been imputed?
- oncperim records whether an individual has been imputed as part of the ONC
  - The entire record is copied from the census database
- z variables identify whether an individual has imputed information on a specific variable
  - A parallel set of variables
  - e.g. zethew, zage0

Crosstab ethnic group (ethew) by imputation flag (zethew)
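A minimal sketch of this crosstab, computing the percentage of each ethnic group whose ethew value is flagged in zethew. It assumes the records have been loaded as a list of dicts and that the flag equals 1 when the value was imputed or PRAMmed; that coding and the text labels are assumptions, so check the SARs codebook (the real file stores numeric codes).

```python
from collections import Counter

def pct_imputed_by_group(records, group_var="ethew", flag_var="zethew", imputed_code=1):
    """Row percentages of a group variable crossed with its z imputation flag.

    ASSUMPTION: flag_var equals imputed_code when the value was imputed or
    PRAMmed; check the SARs codebook for the real coding.
    """
    totals = Counter()
    imputed = Counter()
    for rec in records:
        group = rec[group_var]
        totals[group] += 1
        if rec[flag_var] == imputed_code:
            imputed[group] += 1
    table = {g: 100.0 * imputed[g] / totals[g] for g in totals}
    table["All"] = 100.0 * sum(imputed.values()) / sum(totals.values())
    return table

# Toy records with illustrative labels; the real file stores numeric codes.
records = (
    [{"ethew": "White", "zethew": 0}] * 3
    + [{"ethew": "White", "zethew": 1}]
    + [{"ethew": "Black", "zethew": 0}, {"ethew": "Black", "zethew": 1}]
)
print(pct_imputed_by_group(records))
```

In SPSS or Stata the same result comes from a two-way table of ethew by zethew with row percentages.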

[Chart: Percentage with ethnicity variable imputed, 2001 SARs; not imputed vs imputed, by ethnic group (White, Mixed, Asian, Black, Chinese/Other, All)]

[Chart: Percentage ONC imputed, 2001 SARs; not ONC imputed vs ONC imputed, by ethnic group (White, Mixed, Asian, Black, Chinese/Other, All)]

Should I use imputed individuals or variables?
- Imputation of individuals is designed to compensate for under-enumeration
  - Using imputed cases will give results comparable with national data
  - It will help overcome bias from non-response
- Imputed variables are generally reported as accurate
  - In general, we advise using imputed information

Ethnicity
- But there is doubt over imputed ethnic group
- Simpson and Akinwale used the Longitudinal Study to compare 1991 ethnic group with imputed 2001 ethnic group
  - The majority of imputed records are wrong
  - They recommend not using imputed records, especially for minority groups
- Percentage with ethnic group imputed in the SARs: 2.5% White; 7.4% Black; 11.7% Mixed

PRAMMing PRAMMing is perturbation designed to deal with very unusual cases, eg widowed 16-year olds Avoids additional broad-banding Perturbation is constrained to –preserve univariate distributions –Preserve multivariate distributions on control variables –prevents strange results (like 5 year old widows) Affects 15 variables –Primary economic activity – 1% cases

The z-variables PRAMMed variables are flagged along with imputed variables –Cannot distinguish them Imputation flags are stored in variables with z prefix Two versions of the download file –use the larger *-impflag-*.extension version if interested in imputation/PRAMMing

General advice
- If you are unsure about the impact of PRAMming and imputation, do a sensitivity test:
  - Use the z variable to exclude cases with imputed values, then repeat your analysis
  - Use oncperim to exclude imputed individuals, then repeat your analysis
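The sensitivity test amounts to computing the same estimate three times on different subsets. The sketch below assumes that both the z flag and oncperim are 0 for records that were not imputed (an assumption to check against the codebook). The proportion of ethew equal to "Black" is only an illustrative estimate, and the toy records are hypothetical.

```python
def share(records, var, value):
    """Proportion of records with var == value (nan if there are no records)."""
    if not records:
        return float("nan")
    return sum(1 for r in records if r[var] == value) / len(records)

def sensitivity(records, var, value, zflag, onc_var="oncperim"):
    """Same estimate on all cases, without item-imputed cases, without ONC-imputed people.

    ASSUMPTION: zflag and onc_var are 0 for records that were not imputed.
    """
    return {
        "all cases": share(records, var, value),
        "excl. item-imputed": share([r for r in records if r[zflag] == 0], var, value),
        "excl. ONC-imputed": share([r for r in records if r[onc_var] == 0], var, value),
    }

# Hypothetical toy records.
records = [
    {"ethew": "Black", "zethew": 0, "oncperim": 0},
    {"ethew": "White", "zethew": 0, "oncperim": 0},
    {"ethew": "White", "zethew": 0, "oncperim": 0},
    {"ethew": "Black", "zethew": 1, "oncperim": 0},
    {"ethew": "Black", "zethew": 1, "oncperim": 1},
]
for label, p in sensitivity(records, "ethew", "Black", "zethew").items():
    print(f"{label}: {p:.3f}")
```

If the three estimates tell the same substantive story, imputation and PRAMming are unlikely to be driving your result.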

National variation
- There is one file for the whole UK
- Some variables are country-specific
  - e.g. Irish language
- Other variables have national variations
  - Educational qualifications
  - Ethnicity
  - Watch out for the E, W, S and N suffixes!
- The sampling fraction is not quite consistent across countries!
  - Unlikely to result in major bias of proportions
  - Will not gross up to census figures

Sampling fraction (%), by country and sex

Country            Male     Female
England            3.097    3.092
Wales              3.089    3.098
Scotland           3.210!   3.232!
N Ireland          3.125    3.065
UK total           3.105 (both sexes)
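A small worked sketch of why the file does not gross up exactly: if every record is multiplied by a single UK-wide factor (100 / 3.105), each cell's grossed count is over- or under-stated by the ratio of its own sampling fraction to the UK total. The fractions come from the table above; treating them as percentages is an assumption, as is the idea of using country-by-sex inverse fractions as weights.

```python
# Sampling fractions (%) from the table above, by country and sex.
FRACTIONS = {
    ("England", "Male"): 3.097, ("England", "Female"): 3.092,
    ("Wales", "Male"): 3.089, ("Wales", "Female"): 3.098,
    ("Scotland", "Male"): 3.210, ("Scotland", "Female"): 3.232,
    ("N Ireland", "Male"): 3.125, ("N Ireland", "Female"): 3.065,
}
UK_TOTAL = 3.105

def grossing_factor(pct):
    """Inverse sampling fraction: census persons represented by one SAR record."""
    return 100.0 / pct

for cell, pct in FRACTIONS.items():
    # Ratio > 1 means a single UK-wide factor over-states the cell's census count.
    overstatement = pct / UK_TOTAL
    print(cell, round(grossing_factor(pct), 2), round(overstatement, 3))
```

For example, Scottish women (3.232 / 3.105 ≈ 1.041) would be over-stated by about 4% under a uniform factor. Proportions within a country are barely affected, which matches the slide's advice.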

How do the SARs compare to the aggregate data?
- Tables comparing the licensed individual SAR with the aggregate tables are available in the online user guide
- Results are very similar, with occasional deviations outside the 95% confidence interval
- We looked at univariate distributions of economic activity, general health, marital status and ethnicity
  - No proportion is significantly different from the aggregate data at UK level
  - By country, 9 of 107 cells (about 8%) are significantly different, somewhat more than the 5% expected by chance; we will be looking to see whether PRAMming is to blame
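One simple way to run this kind of check yourself is to compute a 95% confidence interval for a SAR proportion and see whether the published aggregate value falls inside it. The sketch uses the normal approximation under simple random sampling, which the slides say the SAR design approximates. It is not necessarily the exact test used in the user guide, and the counts are hypothetical.

```python
import math

def aggregate_within_ci(sar_count, sar_n, aggregate_prop, z=1.96):
    """Does the published aggregate proportion fall inside the SAR's 95% CI?

    Uses the normal approximation for a simple random sample.
    """
    p = sar_count / sar_n
    se = math.sqrt(p * (1 - p) / sar_n)
    low, high = p - z * se, p + z * se
    return low <= aggregate_prop <= high, (low, high)

# Hypothetical numbers for illustration only.
ok, (low, high) = aggregate_within_ci(sar_count=5_000, sar_n=10_000, aggregate_prop=0.505)
print(ok, round(low, 4), round(high, 4))
```

At the 5% level, about 5 of 107 cells (107 × 0.05 ≈ 5.4) would be expected to fall outside their intervals by chance alone. This is the baseline against which the 9 observed cells should be judged.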

Get to know the data
- Use the documentation: SARs User Guide
  - Use the Census schedules to check questions
  - Check univariate frequencies
  - Do exploratory analyses
- Contact us if you can't find the information you need in the online documentation
- Contact us if you think there is a problem with the data

The SARs as a LARGE dataset
- 1.8 million cases can cause trouble!
- Use Nesstar for initial data exploration
- Extract a subset using Nesstar, or take a subset from the downloaded file
- For serious analysis, use a syntax (or .do) file to record your commands; this makes re-running easier
  - Create a single syntax file which starts with the original data
  - Use file naming conventions that will enable you to trace versions
  - Keep a record of work done
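If you take a subset from the downloaded file outside a stats package, streaming it row by row avoids loading all 1.8 million cases into memory. This sketch assumes you have first exported the data to CSV with a header row. The column names (age0, sex, ethew) and the file name in the comment are illustrative assumptions, not confirmed SAR variable names. Keeping the extraction in one script that always starts from the original file follows the slide's reproducibility advice.

```python
import csv
import io

def extract_subset(in_file, out_file, keep_vars, row_filter):
    """Stream a CSV export row by row, keeping selected columns and rows.

    Only one row is held in memory at a time, so the file size does not matter.
    """
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=keep_vars)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row_filter(row):
            writer.writerow({v: row[v] for v in keep_vars})
            kept += 1
    return kept

# In practice, open real files, e.g. open("sar_export.csv", newline="") (hypothetical name).
source = io.StringIO("age0,sex,ethew\n25,1,1\n70,2,1\n30,2,2\n")
subset = io.StringIO()
n = extract_subset(source, subset, ["age0", "sex"], lambda r: int(r["age0"]) < 65)
print(n)
print(subset.getvalue())
```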

The SARs as sample data
- Geographically stratified sample
  - Approximates a simple random sample
  - No clustering in the Individual file
  - Household file: clustering within households
- Although the sample is large, you may have small sample sizes when using sub-groups
  - Use standard errors and confidence intervals
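A sketch of both cases. For the Individual file, a simple-random-sampling standard error is a reasonable approximation. For person-level estimates from the Household file, one common approach is a linearised (ratio) standard error that treats households as independent clusters. This ignores stratification and the finite population correction, so treat it as a rough guide rather than a full design-based analysis. The toy household data are hypothetical.

```python
import math

def srs_proportion_ci(successes, n, z=1.96):
    """Proportion, standard error and 95% CI assuming simple random sampling."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)

def household_clustered_se(households):
    """Linearised SE of a person-level proportion, treating households as clusters.

    households: list of (y, m) pairs, where y = persons with the attribute and
    m = persons in the household. Ignores stratification and the finite
    population correction.
    """
    h = len(households)
    total_y = sum(y for y, _ in households)
    total_m = sum(m for _, m in households)
    p = total_y / total_m
    resid_sq = sum((y - p * m) ** 2 for y, m in households)
    variance = h / (h - 1) * resid_sq / total_m ** 2
    return p, math.sqrt(variance)

# Toy household file: members of each household share the attribute.
households = [(2, 2), (0, 2), (2, 2), (0, 2)]
p_srs, se_srs, _ = srs_proportion_ci(4, 8)
p_hh, se_hh = household_clustered_se(households)
print(round(se_srs, 3), round(se_hh, 3))
```

When household members resemble each other, the clustered standard error is larger than the naive one. Ignoring clustering in the Household file would therefore overstate precision.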

Comparisons between 1991 and 2001
- Population base changed
  - Imputation (no imputed values in the 1991 SARs)
  - Students enumerated at term-time address
  - Residents only (a choice of base in 1991)
- Variable continuity
  - Variable names have been changed where the variable is not exactly the same
  - Some variables (e.g. age, LLI) are easy to compare by grouping 1991 values
  - Some variables are harder to compare because the question has changed (e.g. qualifications)
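Grouping 1991 values means mapping the finer 1991 detail onto categories that both files can support. The sketch below bands single years of age. The band boundaries are HYPOTHETICAL; replace them with the banding actually used in the 2001 licensed SAR (see the codebooks).

```python
import bisect

# HYPOTHETICAL common banding; replace with the bands that the 1991 and 2001
# licensed SARs can both support (see the codebooks).
BAND_STARTS = [0, 16, 25, 35, 45, 55, 65, 75]
BAND_LABELS = ["0-15", "16-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"]

def age_band(age):
    """Map a single year of age to the common band it falls in."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return BAND_LABELS[bisect.bisect_right(BAND_STARTS, age) - 1]

ages_1991 = [3, 16, 24, 67, 90]
print([age_band(a) for a in ages_1991])
```

Keeping the band boundaries in one table makes it easy to reuse the same mapping in every script that compares the two censuses.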

Ethnicity 1991/2001
- Different questions were asked in 1991 and 2001
- No agreed and perfect correspondence
- Simpson and Akinwale use the LS to show how 1991 maps on to

Software options
- Supported packages
  - Nesstar
  - NSDstat
  - SPSS
  - Stata
- Other options
  - Import into another package directly, or convert with Stat/Transfer
  - Use Nesstar to save to SAS or Statistica format
  - Unless you use a very small subsample, the SARs will be too big for most spreadsheets!

Looking forward
- Controlled Access Microdata Samples
- Household SARs
- Small Area Microdata sample
- Learning and Teaching

CAMS content
- Controlled Access Microdata designed for professional researchers
- Access in a safe setting only
- Specification on the SARs website
- Individual file and Household file

Content of the CAMS files
- Files contain much more detail, e.g.
  - Individual year of age (top-coded at 95)
  - Full coding of country of birth
  - SOC Unit Group
  - Local authority geography
  - Index of Multiple Deprivation for SOAs
  - Index of Multiple Deprivation for migrants' last address

Controlled access
- CAMS is managed by ONS
- Data are accessed at London/Titchfield/Newport/Southport in a Virtual Laboratory setting, on a server
  - New bases soon
- The Virtual Lab looks like a standard Windows interface
  - Use SPSS/Stata in the usual way
  - Output is checked for confidentiality before release
- Further information and appropriate forms at
- Contact for more details

CAMS good practice
- Use the licensed SARs...
  - To exhaust the potential of other datasets
  - To write your syntax files
- Check the disclosure guidelines before writing your file
- Avoid complex tables
  - Small cell counts aren't reliable
  - Unique cells will usually be suppressed
- Do use models

Household SAR
- 1% of households and all individuals in them
- Allows linkage between individuals in households
- Will be available SOON under special licence
- Similar detail to the Individual SAR
- Specification of the Household SAR on the website

The hierarchy of the household file
Household 1: North West, social rented
  Person 1: HoH, female, 28, no quals, no LTILL
  Person 2: son of HoH, male, 12, N/A, no LTILL
Household 2: Wales, owner occupier
  Person 1: HoH, male, 33, degree, no LTILL
  Person 2: spouse of HoH, female, 31, degree, P/T employee, no LTILL
  Person 3: parent of HoH, female, 72, no quals, econ inactive, LTILL
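The two-level structure above can be modelled as households that contain persons. It can then be flattened into one row per person that carries the household attributes, which is how linkage between household members is typically used in analysis. The field names, the hh_id and person_no numbering, and the use of None for attributes not shown on the slide are all illustrative assumptions, not the actual Household SAR layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Person:
    relationship: str
    sex: str
    age: int
    quals: str
    econ_activity: Optional[str]  # None where the slide does not show it
    ltill: bool

@dataclass
class Household:
    region: str
    tenure: str
    persons: List[Person] = field(default_factory=list)

def person_rows(households):
    """Flatten to one row per person, carrying household-level attributes."""
    rows = []
    for hh_id, hh in enumerate(households, start=1):
        for person_no, p in enumerate(hh.persons, start=1):
            rows.append({"hh_id": hh_id, "person_no": person_no,
                         "region": hh.region, "tenure": hh.tenure,
                         "relationship": p.relationship, "sex": p.sex,
                         "age": p.age, "ltill": p.ltill})
    return rows

# The two example households from the slide.
HOUSEHOLDS = [
    Household("North West", "Social rented", [
        Person("HoH", "Female", 28, "No quals", None, False),
        Person("Son of HoH", "Male", 12, "N/A", None, False),
    ]),
    Household("Wales", "Owner occupier", [
        Person("HoH", "Male", 33, "Degree", None, False),
        Person("Spouse of HoH", "Female", 31, "Degree", "P/T employee", False),
        Person("Parent of HoH", "Female", 72, "No quals", "Econ inactive", True),
    ]),
]
rows = person_rows(HOUSEHOLDS)
print(len(rows), sum(r["ltill"] for r in rows))
```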

Small Area Microdata file
- 5% sample of individuals
- Full range of variables
- LA is the lowest geography
  - Except Isles of Scilly and City of London in E and W; similar exceptions in S and NI
- Excludes communal establishments
- Age in 11-year bands
- Ethnicity: 5 groups, or 16 with records swapping between LAs
- Economic activity: 3 categories
- Delivery at CCSR soon

Using the SARs in learning and teaching
- The SARs provide an easy-to-use dataset
- Fits well with aggregate data
- Supported by learning and teaching materials
- Access managed in the same way:
  - Use the Census Registration System
  - Need ATHENS (for data and CHCC)

User support
- Web pages are regularly updated
- Documentation online
- Resources and links added as we go
- Seminar invitations welcome!
- Regional workshop invitations welcome!
- SARs Helpdesk: (0161)
- Join and newsletter lists
- SARs User Group: 15 July, RSS, London