Quality issues in the 2001 Census Ludi Simpson, Cathie Marsh Centre for Census and Survey Research.


1 Quality issues in the 2001 Census
Ludi Simpson, Cathie Marsh Centre for Census and Survey Research

2 Quality for which purposes?
• Macro administration and planning
  • Fine geography, timeliness, constant questions
• Policy evaluation, research
  • Cross-tabulated questions
• Micro administration
  • Identified individuals

3 Major changes for the 2001 census
• Content and design of the questionnaire
  • New questions
    • Health, lowest floor level, caring for others, relationship between residents, religion
  • Students at term-time address, no visitors
• Fieldwork
  • Pre-printed list of addresses
  • Contracts for major operations
  • Post-back
  • Focus on hard-to-count areas

4 More major changes for the 2001 census
• Processing and validation
  • All forms scanned
  • All responses fully coded and processed - no 10% tables
  • Redesigned census coverage survey (CCS)
  • Impute the whole population before output: a One Number Census
• Output
  • Simpler cross-tabulations and key statistics UK wide
  • Standard area statistics free on the internet
  • Census Access Project
    • All standard area statistics; migration and commuting; vector boundaries
    • Not included: commissioned output; SARs
• Confidentiality - disclosure control

5 Plan
The quality of enumeration, and adjustments before publication
• Differential undercount
• Disclosure control: small cell adjustment
• One Number Census

6 "The key to our whole strategy has been to try to minimise the amount of differential undercount"
ONC Steering Group, and ONS evidence to Treasury Committee
[Image: courtesy of Mt Meagre cosmetic stones]

7 Procedures affecting differential undercount
• Community Liaison programme
• Pre-listed addresses
• Double EDs for enumerators in easier areas
• Post-back
• Checks on returned forms
• Centralised form production
• Local enumerator loyalty

8 Census non-response 1991 and 2001
[Figure: % non-response in 1991 and 2001; legend: "More in 2001", "Less in 2001"]

9 Person non-response
(1) Differential between LADs

                        1991        2001
Mean of LADs            2.7%        5.2%
Range                   0% - 14%    1% - 36%
Standard Deviation      1.9%        4.6%
Inter Quartile Range    1.1%        3.2%
SD/mean

(2) Differential between types of people
Biases severe, but not as marked as in 1991:
• Young men or young people generally
• Private renters, unemployed, not-White
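The SD/mean row (the coefficient of variation) has no values in this transcript. As a rough illustration only, the ratio can be recomputed from the mean and standard deviation figures quoted on the slide. The sketch assumes the first column is 1991 and the second 2001; the function and variable names are hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Return SD/mean, a unit-free measure of spread between LADs."""
    return sd / mean

# Figures quoted on the slide (percent person non-response across LADs).
# Assumption: first column is 1991, second column is 2001.
lad_nonresponse = {
    "1991": {"mean": 2.7, "sd": 1.9},
    "2001": {"mean": 5.2, "sd": 4.6},
}

cv = {
    year: coefficient_of_variation(stats["sd"], stats["mean"])
    for year, stats in lad_nonresponse.items()
}

for year, value in cv.items():
    print(f"{year}: SD/mean = {value:.2f}")
```

A larger ratio in the second column would mean that variation between LADs grew faster than the average level of non-response.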

10 Item non-response: missing, invalid, inconsistent responses
• Under 1%: age, sex, marital status
• 1-5%: 15 variables
• 5-10%: religion, provision of care, qualifications, employment status, supervisor status, industry, workplace address, hours worked, travel to work, number of rooms
• > 10%: professional qualifications 17%, company size 14%
• LAD variation: Wokingham, Eastleigh, Hart, E Dorset best; Manchester, Blackburn and 5 London Boroughs worst
• "Biases … were in the same direction as those present in the 1991 Census, but were less marked." (Edit-imputation evaluation report)

11 Conclusions on non-response
• Post-back problems jeopardised quality
• Neither levelling up of response rates nor levelling down
• Lower response than 1991
• Wider geographical differences than 1991
• More of all types of people missed
• Future preparation for next census

12 Measures to prevent disclosure have an impact on data quality
• Measures to prevent disclosure:
  • Thresholds for census areas
    • OAs 40 households and 100 residents
  • Broad output categories
    • No SARs sub-regional geography; no large households
  • Imputed records not distinguishable
  • Record swapping between areas
  • July 2002: adjust small numbers to eliminate all 1s and 2s
    • Table totals the sum of internal cells
    • Different tables, different totals

13 Why avoid 1s and 2s?
"it should not be possible for someone to recognise their own information or information about someone they know from census outputs with sufficient confidence that they would be prepared to act on that information as though it were true." (ONS AG(02)03, April)
Example: "There are no health professionals in the fishing industry in England or Wales"

14 Impact on small area indicators
Unemployment rate from ST028
• (Sum of 30 cells) divided by (Sum of 390 cells)
• 21 ± 4 divided by 729 ± 11
• Rate: 2.9% is likely to be in error by 0.6%
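The slide does not say how the 0.6% figure was derived. One way to get a figure of that size is first-order error propagation for a ratio. The sketch below treats the numerator and denominator errors as independent, which is an assumption of this sketch and not something the slide states; the function name is hypothetical.

```python
import math

def ratio_with_error(num, num_err, den, den_err):
    """Return num/den and its approximate error.

    First-order propagation, assuming (for illustration only) that the
    errors in numerator and denominator are independent.
    """
    ratio = num / den
    relative_error = math.sqrt((num_err / num) ** 2 + (den_err / den) ** 2)
    return ratio, ratio * relative_error

# Figures from the slide: 21 +/- 4 unemployed out of 729 +/- 11.
rate, rate_err = ratio_with_error(21, 4, 729, 11)
print(f"rate = {rate:.1%} +/- {rate_err:.1%}")  # rate = 2.9% +/- 0.6%
```

Almost all of the uncertainty comes from the small numerator: 4 out of 21 is a relative error of about 19%, against about 1.5% for the denominator.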

15 Impact of aggregating small area adjustment: 8,850 wards, 16 age groups (KS02)
[Figure: difference between Ward value and sum of OA values, against number of OA values of 0 or 3]

16 How many rounded cells?
[Figure: out of every 100 OA values for each age group up to 85 & over, how many are 0 or 3. E.g. Table KS02, age]
Q: What is the impact?
A: It depends on the number of true 1s and 2s: the impact is different on each cell (unlike 1991)

17 Impact of small area adjustment: advice for users
• Substitute 1.5 for 0s and 3s? No!
• Average error = sqrt(no. of 0s and 3s) × 0.8
  • 5% of the time the error is more than twice this
• Beware percentages based on small rounded totals
• Statistical analyses are affected: measurement error
• Aggregate a minimal number of areas or cells
  • Key statistics better than Standard Tables
  • Use Univariate tables for denominators
  • 1 ward is better than the sum of its OAs
  • 1 ward minus 2 OAs is better than 18 OAs
• Impact of adjustment is only worse than in 1991 if 0s and 3s are one third of summed values
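The rule of thumb on this slide is easy to apply before summing cells. A minimal sketch follows, with hypothetical function names; the 0.8 multiplier and the "twice this, 5% of the time" bound come from the slide, and everything else is illustrative.

```python
import math

ERROR_FACTOR = 0.8  # multiplier quoted on the slide

def average_adjustment_error(n_zeros_and_threes):
    """Average error of a sum containing this many cells equal to 0 or 3."""
    return math.sqrt(n_zeros_and_threes) * ERROR_FACTOR

def rarely_exceeded_error(n_zeros_and_threes):
    """Error exceeded only about 5% of the time (twice the average error)."""
    return 2 * average_adjustment_error(n_zeros_and_threes)

# Example: a total built from cells of which 25 are 0 or 3.
n_cells = 25
print(average_adjustment_error(n_cells))  # 4.0
print(rarely_exceeded_error(n_cells))     # 8.0
```

Because the error grows with the square root of the number of adjusted cells, a published ward figure, which counts as one value, carries less adjustment error than the same figure rebuilt from many OA cells.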


19 Measuring census error
• What is the probable distance of the 2001 ONC from the truth?
• For future Census planners: for which populations is the ONC expected to have greater error than the census enumeration and other alternatives?
  • Population size; variable; undercount

20 How accurate is the census population?
Root mean square error, %
Columns: Enumeration; Census + absent households; One Number Census

E&W            6.2%     0.1%
LAD            10.4%    0.74%
Ward
Output Area

21 Statistics of a complete population versus speedy delivery of results
• Compromise: a firm output prospectus
• Achievement of data release
  • Standard area statistics: Feb 03 - Sept 03 (3-4 months' delay)
  • Origin-Destination statistics: Jan 04?
  • Documentation, software
• Public availability of key statistics via NeSS/SCROL…

22 Enumerated

23 Additions to enumerated

                                                  48,843,000   Evidence                      Judgements
Dual system estimate of undercount                3,199,000    Census Coverage Survey        Medium
Revised household estimate                        230,000      Addresspoint with LFS         Medium
Census day to June 30                             43,000       Births, deaths, migration     Light
Revised persons estimate                          193,000      Longitudinal Study            Medium
Further revised persons estimate                  ?            Address matching Mcr, Westr   Medium
Unmonitored intl migration                                     , census                      Heavy
Unmonitored intl migration                        c111,000     Improved 91-98                Light
                                                  c85,000      Visitor switchers             Medium
                                                  c108,000     Migrant switchers             Heavy
Unexplained difference with rolled forward MYE                 , census October

24 Independence of census and its post-enumeration survey
• Assumption: the CCS is equally likely to find those missed by the Census as those counted by the Census

25 Dependence between the census and its Coverage Survey
• Dual system estimation (DSE) example:
• DSE assumes those missed by CCS are no more likely than others to have been missed by Census

                             Census - counted   Census - missed   Total
Coverage Survey - counted    1,000              50                1,050
Coverage Survey - missed
Total                        1,100              55                1,155
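The totals in the example can be reproduced with the standard dual system (capture-recapture) calculation. A minimal sketch, with hypothetical function and argument names: the three observed cells are those counted by both sources (1,000), by the Census only (1,100 - 1,000 = 100) and by the CCS only (50). Under independence, the cell missed by both is estimated from the other three.

```python
def dual_system_estimate(both, census_only, ccs_only):
    """Dual system estimate, assuming Census and CCS misses are independent.

    Returns (estimated number missed by both sources, estimated total).
    """
    census_total = both + census_only
    ccs_total = both + ccs_only
    missed_by_both = census_only * ccs_only / both
    total = census_total * ccs_total / both
    return missed_by_both, total

# Counts implied by the slide's example table.
missed_by_both, total = dual_system_estimate(both=1000, census_only=100, ccs_only=50)
print(missed_by_both)  # 5.0
print(total)           # 1155.0
```

The estimated total of 1,155 matches the table, and the 55 "Census - missed" is the 50 found by the CCS plus the 5 estimated to have been missed by both.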

26 Check independence with a third, independent, source
• Households: AddressPoint calibrated against LFS
  • ONS added 230,000 in E&W, 41,000 in Scotland => dependence odds ratio of 1.5
• People within households: no third source
  • Dependence odds ratio of 3
  • 3 times more likely to have been missed in CCS if missed in Census
  • Adds half a million
• Other QA: demographic analysis, migration estimates
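To see what a dependence odds ratio does to the estimate, the independence calculation can be generalised. In the sketch below the odds ratio is defined as (counted by both × missed by both) / (Census only × CCS only), so a value of 1 recovers the ordinary DSE. The counts are the illustrative ones from slide 25, not ONS's actual figures, and the function name is hypothetical.

```python
def dse_with_dependence(both, census_only, ccs_only, odds_ratio=1.0):
    """Estimate the total population allowing for Census/CCS dependence.

    odds_ratio = (both * missed_by_both) / (census_only * ccs_only);
    odds_ratio == 1.0 corresponds to the independence assumption.
    """
    missed_by_both = odds_ratio * census_only * ccs_only / both
    return both + census_only + ccs_only + missed_by_both

# Illustrative counts from the slide 25 example.
independent = dse_with_dependence(1000, 100, 50, odds_ratio=1.0)
households = dse_with_dependence(1000, 100, 50, odds_ratio=1.5)
persons = dse_with_dependence(1000, 100, 50, odds_ratio=3.0)

print(independent, households, persons)  # 1155.0 1157.5 1165.0
```

Only the cell missed by both sources changes, but it scales in proportion to the odds ratio.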

27 Male undercount (E&W)
[Figure: two series: according to ONC 2001 population & revised 1991; according to population rolled forward from 1981]

28 Migration loss, evidence
• IPS weaknesses
• Australia, best statistics
  • Not such a male dominance
• Revisions published June 2003
  • Explain two thirds of 1.1m unmonitored change
  • Mostly uncorroborated assumptions
  • Immigrants returning overseas
• Not a firm basis for future estimates

29 Sub-national
• Best ever sub-national estimation procedures for census year
• Methods agreed by users before census
• Concerns:
  • Insufficient information about quality assurance
  • Administrative comparisons not acted upon
  • Unconvincing geography of undercount

30 QA of local population: comparators
• Students, armed forces, prisoners
• Local pre-census population estimates and administrative records
  • Child benefit, pensioner, births, school census, adjusted GP patients
• "calculate a range of plausible values for the number of people of each sex within five-year age groups in each geographical area" (ONC Guide)
• Diagnostic range up to double that of the comparators



33 Composition of undercount: ratio of undercount rates
Change in population sex ratio
Columns: Type of District; Undercount rate, all persons; Male / Female; All / All other; M/F, ONC2001 - MYE2000

Type of District                    Undercount rate, all persons
Inner London                        22%
Outer London                        10%
Principal metropolitan cities       9%
Large cities                        7%
Small cities                        6%
Resort, port and retirement         5%
Other metropolitan Districts        5%
New towns                           5%
Industrial areas                    4%
Urban and mixed urban-rural         4%
Remoter, mainly rural               4%
England and Wales                   6%

34 Where did population estimates fall?
• A few Districts in each region account for most of change
• Districts with transient populations: students, armed forces, seasonal labour, immigration
• ONS now focusing on outliers
  • Manchester and Westminster address-matching
  • Were address lists complete?

35 Conclusions
• Widespread undercount: new types of people
• Impact on output reduced by ONC imputation?
• Uncertain population total
  • Young men, children
• Multi-source error for small areas: advice for users
• Future priorities
  • Fieldwork: improved management
  • Output: firm timetable, pre-release documentation
  • Information: users are the Census's best friends
  • Third way: validated administrative records
  • International migration: beyond interim revisions
  • Residence definitions: avoid legal population
  • Why were people missed?
