Editing a Mixture of Canadian 2006 Census and Tax Data Mike Bankier Statistics Canada 2006 Work Session on Statistical Data Editing

Introduction
Census respondents can give permission to link to their tax form rather than answer the 13-part census income question on the 20% sample long form.
Early returns indicate a permission rate of 83%.
Done to reduce response burden; the partial/total NR rate was also rising for income.
Also, census responses are often approximate while tax responses are generally very accurate.

Overview of Talk
Brief review of census/tax record linkage.
Census data collection and processing prior to E&I.
Strategy to perform E&I on mixture of census and income tax data.

Census/Tax Record Linkage
STC’s Generalized Record Linkage System (GRLS), based on Fellegi/Sunter, will be used.
Name, birthdate, address, telephone number, sex, marital status, disability status, labour activity status (but not SIN) used to link.
Nicknames, reordering of names, accounting for typographic errors, search across Canada, more weight for common names will be used to achieve expected 85% match rate.
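As an illustration of the Fellegi/Sunter approach named on this slide, here is a minimal sketch of how field-by-field agreement can be turned into a match weight and a link decision. It is not GRLS itself: the field names, the m/u probabilities and the two thresholds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical m/u probabilities per field (not the parameters used by GRLS):
# m = P(fields agree | records belong to the same person)
# u = P(fields agree | records belong to different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birthdate": (0.97, 0.003),
    "postal": (0.85, 0.02),
    "sex": (0.99, 0.5),
}

def match_weight(census, tax):
    """Sum of log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if census.get(field) is None or tax.get(field) is None:
            continue  # a missing value gives no evidence either way
        if census[field] == tax[field]:
            total += math.log2(m / u)              # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts weight
    return total

def classify(weight, upper=10.0, lower=0.0):
    """Two-threshold rule: link, non-link, or undecided in between."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible"
```

Raising the upper threshold corresponds to the next slide's choice of retaining only very good matches, at the cost of a lower match rate.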

Census/Tax Record Linkage
Only very good matches retained since incorrect matches can generate undesirable outliers.
No manual review done of all links because of large volume of data.
Parameters fine-tuned by running linkage several times and assessing quality of links for a sample of persons.

Data Collection/Processing Prior to E&I
In 2001, enumerators listed dwellings and dropped off a questionnaire. Questionnaires completed and mailed back by respondent.
In 2006, dwellings listed in advance and questionnaires were mailed to them for approximately 2/3 of dwellings. Other 1/3 treated the same way as in 2001.
20% of questionnaires completed over the Internet.

Data Collection/Processing Prior to E&I
Completed questionnaires scanned and data captured using intelligent character recognition.
Any responses not captured were keyed from the imaged questionnaire.
In 2001, corrections made before keying (for example cents recorded as dollars) but not feasible for 2006.
In 2004 test, error rate of 11% for income variables.

Data Collection/Processing Prior to E&I
Non-respondents or partial respondents with non-response to many questions were phoned or visited.
Coverage edits applied at processing centre and persons were added or subtracted occasionally.

Data Collection/Processing Prior to E&I
Edits flagged persons with income responses outside limits.
Reviewed manually by comparing to correlated characteristics, looking at questionnaire image and manually modifying if necessary.
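A range edit of the kind described on this slide might be sketched as follows. The field names and dollar limits are hypothetical, since the talk does not give the production limits; the function only flags fields for manual review and changes nothing.

```python
# Hypothetical limits in dollars; the production limits are not given in the talk.
LIMITS = {
    "wages": (0, 500_000),
    "self_employment": (-200_000, 500_000),
    "investment": (-100_000, 300_000),
}

def flag_out_of_range(person):
    """Return the income fields whose reported value falls outside its limits."""
    flagged = []
    for field, (low, high) in LIMITS.items():
        value = person.get(field)
        # Missing values are left to imputation; only reported amounts are checked.
        if value is not None and not (low <= value <= high):
            flagged.append(field)
    return flagged
```

For example, wages keyed as 4,500,000 because a decimal point was not recognized would be flagged and sent to manual review against the questionnaire image.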

Data Collection/Processing Prior to E&I
Majority of income errors the result of
–Decimals not recognized or not provided
–Confusion between income sources
–Monthly amounts reported
–Occasionally erroneous amounts entered as prank
Tax forms excluded from manual process because linkage done later and tax data mostly error free.

Adjustments Done – Coverage Studies
Dwelling Classification Survey revisited sample of households to determine if they had been classified correctly as not part of housing stock, unoccupied or occupied. Census data base adjusted for estimated undercoverage and overcoverage.
Reverse Record Check measures undercoverage and overcoverage from all sources, is used to adjust the provincial population totals but does not adjust the Census data base.

E&I of the Income Questions
With completion of the tax/census linkage, income data from Census and tax sources will be available on the Census data base.
Canadian Edit and Imputation System (CANCEIS) will be used for all Census variables including income to perform
–Deterministic imputation
–Minimum change donor imputation
–Derivation of new variables

E&I of the Income Questions
Assumed income data given by most respondents is correct so every attempt will be made to change as few responses as possible.
Some fields imputed deterministically.
Donor imputation used to resolve NR.
Also balance edits to make sure income components sum to within 10% of total income.
Total income is adjusted in later step to ensure perfect agreement with components.
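The balance edit and the later reconciliation step can be sketched as below. This is a minimal illustration, assuming the 10% tolerance is measured relative to the reported total; the function and field names are hypothetical and this is not CANCEIS code.

```python
def passes_balance_edit(total, components, tolerance=0.10):
    """True when the components sum to within `tolerance` of total income."""
    component_sum = sum(components.values())
    if total == 0:
        return component_sum == 0
    return abs(component_sum - total) <= tolerance * abs(total)

def reconcile_total(components):
    """Later step: reset total income to the exact sum of its components."""
    return sum(components.values())
```

A record with a total of 50,000 and components summing to 47,000 passes the edit (the gap of 3,000 is under the 5,000 allowed), and the later step would then set the total to 47,000.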

E&I of the Income Questions
Series of CANCEIS modules used.
First three modules
–Merge tax and census data together.
–Calculate average employment income by occupation and geography (SAS) for later use as matching variable.
–Define strata to be used in later modules.
–Determine status for each income field (income with amount reported, income indicated, loss indicated, no income, non-response).

E&I of the Income Questions
Modules 4 to 6 impute missing income responses while ensuring total within 10% of sum of components.
Module 4 imputes partial respondents who provided total income.
Module 5 imputes partial respondents who did not provide total income but provided all the components of employment income.
Module 6 imputes all other partial and total non-respondents to the income question.
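The routing of records to Modules 4, 5 and 6 follows directly from which fields were reported. The sketch below illustrates that routing only; the field names are hypothetical stand-ins, since the talk does not list the 13 parts of the income question.

```python
# Hypothetical field names; the talk does not list the 13 parts of the question.
EMPLOYMENT_FIELDS = ("wages", "farm_self_employment", "nonfarm_self_employment")
OTHER_FIELDS = ("investment", "pension", "other_income")

def imputation_module(record):
    """Return 4, 5 or 6 for the module that would handle the record,
    or None when every field is reported and nothing needs imputing."""
    components = EMPLOYMENT_FIELDS + OTHER_FIELDS
    has_total = record.get("total") is not None
    has_all_components = all(record.get(f) is not None for f in components)
    if has_total and has_all_components:
        return None
    if has_total:
        return 4  # partial respondent who provided total income
    if all(record.get(f) is not None for f in EMPLOYMENT_FIELDS):
        return 5  # no total, but all employment income components present
    return 6      # all other partial and total non-respondents
```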

E&I of the Income Questions
Modules 7 and 8 select a sample of respondents with no pension benefits and impute positive amounts through donor imputation.
Modules 9 and 10 do something similar but for employment insurance benefits.
Module 11 derives other government benefits such as old age security pension.

E&I of the Income Questions
Module 12 uses donor imputation to resolve non-response to the income tax field.
Module 13 derives total income after tax.
Other modules aggregate income to the family and household levels plus derive 2 low income flags.

E&I of the Income Questions
Donor selection edits extensively used to restrict which of the records that pass the edits can be used as donors.
Reduces the number of outliers generated through imputation.

E&I of the Income Questions
In search for donors, distance measure applies larger weights to income fields considered more important or reliable such as total income.
Numeric amount can be missing but boxes checked can indicate that amount should be negative. Distance measure can be configured to almost guarantee that negative quantity will then be imputed.
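A weighted distance of this kind can be sketched in a few lines. This is a simplified illustration, not the CANCEIS distance function: the weights, the scaling constants and the field names are all hypothetical, and each field's contribution is capped at 1 so that a single extreme value does not dominate.

```python
# Hypothetical weights and scales: total income counts most, as the slide describes.
WEIGHTS = {"total": 3.0, "wages": 1.0, "age": 1.0}
SCALES = {"total": 10_000.0, "wages": 10_000.0, "age": 5.0}

def distance(recipient, donor):
    """Weighted sum of scaled absolute differences over fields the recipient reported."""
    d = 0.0
    for field, weight in WEIGHTS.items():
        value = recipient.get(field)
        if value is None:
            continue  # missing on the recipient: this is what the donor will supply
        d += weight * min(abs(value - donor[field]) / SCALES[field], 1.0)
    return d

def nearest_donor(recipient, donors):
    """Pick the donor with the smallest weighted distance to the recipient."""
    return min(donors, key=lambda donor: distance(recipient, donor))
```

With a larger weight on total income, a donor whose total is close to the recipient's is preferred even when its age is further away.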

Other Changes in E&I Since 2001
Number of strata will be reduced dramatically since some variables used for stratification in 2001 are now used in the distance measure to identify donors; this reduces boundary effects.
Also, in the past, exact matches within a stratum were required, while with CANCEIS near matches will be allowed (e.g. age difference of 3 years). In the past, default imputation was sometimes used, while with CANCEIS a donor will always be found.

Differences in Processing of Census/Tax Data
During donor imputation, data from tax records will generally be treated the same as data from census forms.
For tax data, will derive
–Quebec provincial tax
–Child Benefits
–GST Credits
For census form data, Child Benefits, GST Credits will be derived.

Differences in Processing of Census/Tax Data
When adjusting for under-reporting of pensions and employment insurance, tax responses are not adjusted because of policy not to modify them.
When imputing income tax field from census forms, donors restricted to tax forms because of poor quality of responses on census forms.

Evaluation of Income E&I
On experimental basis, income responses were blanked out and then CANCEIS imputed the blanks.
CANCEIS was quite effective at replicating responses and preserving distributions when matching variables were correlated with the variable being imputed.
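The blank-and-impute experiment can be mimicked on a small scale as follows. This is a sketch under stated assumptions: a single-variable nearest-neighbour donor stands in for CANCEIS, and the function name, the blanking rate and the seed are hypothetical choices made for the example.

```python
import random

def blank_and_impute(records, target, match_var, rate=0.2, seed=0):
    """Blank `target` on a random subset, impute it from the nearest remaining
    record on `match_var`, and return the mean absolute error on the blanked set."""
    rng = random.Random(seed)
    n_blank = max(1, int(rate * len(records)))
    blanked = set(rng.sample(range(len(records)), n_blank))
    donors = [r for i, r in enumerate(records) if i not in blanked]
    errors = []
    for i in blanked:
        rec = records[i]
        # Nearest-neighbour donor on the matching variable (stand-in for CANCEIS).
        donor = min(donors, key=lambda d: abs(d[match_var] - rec[match_var]))
        errors.append(abs(donor[target] - rec[target]))
    return sum(errors) / len(errors)
```

Running it once with a matching variable that is strongly related to the target and once with an unrelated one shows the effect noted on the slide: the error on the blanked records is small only when the matching variable is correlated with the variable being imputed.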

Future Evaluations of Income
Some people provide permission to link and also answer the income question on the census form.
It will be interesting to compare the tax and census responses for these people.
In 2004 test, census income data often rounded to nearest thousand or five thousand.
Mode effects (paper versus internet) may also be studied.
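One simple way to quantify the rounding seen in the 2004 test is to compute the share of reported amounts that are exact multiples of 1,000 or 5,000. The sketch below is illustrative only; the function name is hypothetical and the amounts are assumed to be whole dollars.

```python
def rounding_shares(amounts):
    """Share of non-zero amounts that are exact multiples of 1,000 and of 5,000."""
    nonzero = [a for a in amounts if a != 0]
    if not nonzero:
        return {"thousand": 0.0, "five_thousand": 0.0}
    return {
        "thousand": sum(a % 1000 == 0 for a in nonzero) / len(nonzero),
        "five_thousand": sum(a % 5000 == 0 for a in nonzero) / len(nonzero),
    }
```

Computing these shares separately for the census and tax responses of people who supplied both would give a direct measure of how much more the census answers are rounded.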

Future Changes to E&I
It is hoped that we can eliminate certain deterministic modules by obtaining Child Benefits, for example, from other sources.
Using CANCEIS, it may be possible to reduce the number of modules used in later censuses and improve consistency with labour and education variables.

Conclusions
Many changes to processing including use of tax data, new questionnaire layout for scanning, use of new E&I system.
These changes will require careful monitoring during production and may require fine-tuning.
Given high quality of tax data, its availability should prove useful.