Lyne Guertin Census Data Processing and Estimation Section Social Survey Methods Division Methodology Branch, Statistics Canada UNECE April 28-30, 2014.


1 Lyne Guertin Census Data Processing and Estimation Section Social Survey Methods Division Methodology Branch, Statistics Canada UNECE April 28-30, 2014 Editing the 2011 Census data with CANCEIS and options considered for 2016 1 UNECE 2014 Statistics Canada Statistique Canada

2 Outline
1. Overview of CANCEIS
2. Recent improvements to CANCEIS and to the 2011 edit and imputation (E&I) strategy
3. Options considered for 2016

3 1. Overview of CANCEIS (CANadian Census Edit and Imputation System)

5 CANCEIS users
- Domestic users (other than the Census):
  - National Household Survey
  - Canadian Income Survey
  - Survey of Financial Security
  - Survey of Household Spending
  - Longitudinal and International Study of Adults

6
- Other countries (users, past users, or exploring CANCEIS): Argentina, Australia, Brazil, Israel, Italy, Japan, New Zealand, Peru, Switzerland, UK, USA
- CSPA (Common Statistical Production Architecture) initiative: CANCEIS was targeted in a pilot with New Zealand to test portability.

7 Imputation methods available
- Deterministic imputation
- Donor imputation, based on the principles of:
  - minimum change
  - preserving the distribution of the data

8 New Imputation Methodology (NIM)
Developed by Mike Bankier in the 1990s
A. Apply edits
- Search for invalid values, missing values and inconsistencies
- Classify records as Passed or Failed
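The edit step can be illustrated with a minimal Python sketch. It is not CANCEIS code (CANCEIS reads its edits from Decision Logic Tables); the two edits, the field names `age` and `marital_status`, and the function `classify` are hypothetical and chosen only to show how records are split into Passed and Failed.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]
# An edit is a rule that returns True when the record is in error.
Edit = Callable[[Record], bool]


def missing_or_invalid_age(record: Record) -> bool:
    """Hypothetical edit: age is missing or outside the range 0-120."""
    age = record.get("age")
    return age is None or not (0 <= age <= 120)


def married_child(record: Record) -> bool:
    """Hypothetical consistency edit: a person under 15 reported as married."""
    age = record.get("age")
    return age is not None and age < 15 and record.get("marital_status") == "married"


EDITS: List[Edit] = [missing_or_invalid_age, married_child]


def classify(records: List[Record], edits: List[Edit]) -> Tuple[List[Record], List[Record]]:
    """Split records into (passed, failed); a record fails if any edit fires."""
    passed: List[Record] = []
    failed: List[Record] = []
    for record in records:
        if any(edit(record) for edit in edits):
            failed.append(record)
        else:
            passed.append(record)
    return passed, failed
```

Passed records then form the pool of potential donors for the failed ones.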

9 New Imputation Methodology (NIM) (cont'd)
B. Perform donor imputation
Step 1: Establish a list of the best donors (i.e. the passed records that most resemble the failed record)
Step 2: Find the best imputation actions for these donors
Step 3: Select an imputation action at random
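The three steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the CANCEIS implementation: the distance is a plain count of mismatched fields, a "best" imputation action is taken to be one that copies the fewest fields from a donor while making the record pass all edits, and all function names are hypothetical.

```python
import random
from itertools import combinations


def mismatch_distance(a, b, fields):
    """Assumed distance: number of fields on which two records differ."""
    return sum(1 for f in fields if a.get(f) != b.get(f))


def passes(record, edits):
    """An edit returns True when the record is in error."""
    return not any(edit(record) for edit in edits)


def imputation_actions(failed, donor, fields, edits):
    """Return (size, candidates): the smallest number of fields that must be
    copied from the donor for the record to pass, and the resulting records."""
    differing = [f for f in fields if failed.get(f) != donor.get(f)]
    for size in range(1, len(differing) + 1):
        candidates = []
        for subset in combinations(differing, size):
            candidate = dict(failed)
            candidate.update({f: donor.get(f) for f in subset})
            if passes(candidate, edits):
                candidates.append(candidate)
        if candidates:
            return size, candidates
    return None, []


def impute(failed, passed_records, fields, edits, n_donors=5, rng=random):
    # Step 1: list the donors that most resemble the failed record.
    donors = sorted(
        passed_records, key=lambda p: mismatch_distance(failed, p, fields)
    )[:n_donors]
    # Step 2: keep the imputation actions that change the fewest fields.
    best_size, best_actions = None, []
    for donor in donors:
        size, actions = imputation_actions(failed, donor, fields, edits)
        if size is None:
            continue
        if best_size is None or size < best_size:
            best_size, best_actions = size, list(actions)
        elif size == best_size:
            best_actions.extend(actions)
    if not best_actions:
        return None
    # Step 3: select one of the retained actions at random.
    return rng.choice(best_actions)
```

Choosing at random among equally good actions, rather than always taking the first, is what helps preserve the distribution of the data.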

10 Advantages of this methodology
- Offers a practical solution to an operational problem
- Allows simplification of edits: use the minimum set in relation to the donor chosen
- Computationally efficient
- Can deal with non-linear edits
- Data-driven imputation

11 CANCEIS Features
- Handles categorical, numerical and alphanumeric variables
- Handles large numbers of edits and large data files
- Portable, flexible and efficient
- Fully parameterized, so easy to customize
- Ten different distance functions to find the best donors, covering different types of variables

12 Distance Measure for Potential Donors
$$D_{fp} = \sum_{i} w_i \, D_i(V_{fi}, V_{pi})$$
summed over all paired fields $i$, where
- $V_{fi}$ is the value of matching variable $i$ for the failed record;
- $V_{pi}$ is the value of matching variable $i$ for the passed record;
- $w_i$ is the weight of variable $i$ ($w_i \ge 0$);
- $D_i$ is the distance function chosen for variable $i$ ($0 \le D_i \le 1$).
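A weighted distance of this kind can be sketched in a few lines of Python. The two per-variable distance functions below (`exact_match` and `scaled_abs_diff`) are illustrative assumptions and are not taken from the ten functions CANCEIS provides; the only properties relied on are the ones stated on the slide, namely non-negative weights and per-variable distances between 0 and 1.

```python
def exact_match(a, b):
    """Assumed distance for categorical variables: 0 if equal, else 1."""
    return 0.0 if a == b else 1.0


def scaled_abs_diff(max_diff):
    """Assumed distance for numeric variables: |a - b| / max_diff, capped at 1."""
    def distance(a, b):
        return min(abs(a - b) / max_diff, 1.0)
    return distance


def donor_distance(failed, passed, spec):
    """Weighted sum of per-variable distances between a failed and a passed record.

    spec maps each matching variable to a (weight, distance_function) pair.
    """
    total = 0.0
    for field, (weight, dist) in spec.items():
        if weight < 0:
            raise ValueError(f"weight for {field!r} must be non-negative")
        total += weight * dist(failed[field], passed[field])
    return total
```

For example, with weights 1 for `sex` and 2 for `age` and a maximum age difference of 10, a failed record `{"sex": "F", "age": 30}` and a passed record `{"sex": "M", "age": 35}` are at distance 1 × 1 + 2 × 0.5 = 2.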

13 CANCEIS System Components
- Inputs: Data, Data Dictionary, System Parameters, Decision Logic Tables
- CANCEIS components: Donor Imputation, Deterministic Imputation
- Outputs: Imputed Data, Reports & Logs

14 2. Recent improvements to CANCEIS and to the 2011 E&I strategy

15 Improvements
- For 2011, CANCEIS was rewritten in C# (C-sharp) in a .NET environment:
  - Easier to maintain
  - Improved efficiency (lower processing time)
  - Increased stability

16 Improvements (cont'd)
- Multi-threading is now possible in donor imputation:
  - Enables processing of multiple failed units at one time
  - Increases performance and reduces processing time
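Processing several failed units at once works because each unit can be imputed independently against a read-only pool of donors. The Python sketch below illustrates only that idea; CANCEIS itself is written in C#, and the helper names and the simple nearest-donor rule used here are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest_donor(failed, donors, fields):
    """Assumed rule: pick the donor with the fewest mismatched fields."""
    return min(donors, key=lambda d: sum(failed.get(f) != d.get(f) for f in fields))


def impute_one(failed, donors, fields):
    """Fill the missing fields of one failed unit from its nearest donor."""
    donor = nearest_donor(failed, donors, fields)
    return {f: donor[f] if failed.get(f) is None else failed[f] for f in fields}


def impute_all(failed_units, donors, fields, workers=4):
    """Impute failed units concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda unit: impute_one(unit, donors, fields), failed_units))
```

The donor pool is never modified, so the worker threads need no locking. In CPython, threads give little speed-up for CPU-bound work of this kind, so the sketch shows the structure of the approach rather than its performance.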

17 Improvements (cont'd)
- CANCEIS is more user-friendly:
  - Before: could handle only .txt files (inputs/outputs)
  - Now: also handles data dictionaries in Excel and creates summary reports in HTML

18 Improvements (cont'd)
- Increased content and level of detail in the logs:
  - Facilitates troubleshooting
  - Facilitates validating the desired strategy for each module

19 New features added
- Additional flexibility in specifying imputation parameters
- New parameter to specify that the staged search will not stop until an excellent donor is found:
  - The search continues if the target quality is not reached

20 Modification to the 2011 E&I strategy
- Group these five processes into one ethnocultural process:
  - Place of birth of parents
  - Immigration status
  - Aboriginal status
  - Citizenship
  - Visible minorities

21 Modification to the 2011 E&I strategy (cont'd)
- Goals:
  - Increase data coherence between processes by using a single donor to impute all variables
  - Reduce manual fixes after E&I
- Challenge: managing a large number of edits and a large volume of data

22 3. Options considered for 2016

23
- Planning the E&I strategy for 2016:
  - Evaluating the use of administrative data as an alternative source of data
  - Exploring whether the language processes (mother tongue, home language, official language) could be grouped
  - Exploring whether steps within processes could be grouped
  - Exploring whether processes could be run in parallel
- Goals: improve quality, reduce processing time

24
- Continue improving CANCEIS to serve future requirements of the Census:
  - Research and development is ongoing, done by programmers and methodologists
- CANCEIS v5.2 to be released by Dec. 2014:
  - Allows Decision Logic Tables (DLTs) and System Parameters in Excel
  - Revisited contents of inputs and outputs
  - Standardized naming convention
  - Improvements to default values of parameters

25
- Will offer the CANVERT conversion tool:
  - Ensures a smooth transition from v5.1 to v5.2
- Updated documentation will be provided:
  - Basic User Guide (with two simple examples and basic features)
  - Comprehensive User Guide (with more examples and all features)

26 Thank you for your attention!
For more information, please contact: Lyne Guertin (1-613-951-4543), lyne.guertin@statcan.gc.ca

