
1. Census 1996, 2001 & Community Survey (CS)
United Nations Regional Workshop on Census Data Processing: Contemporary Technology for Census Data Capturing and Editing
A perspective of the South African data processing system
A presentation by the South African Data Processing Team
Dar-es-Salaam, Tanzania, 9-13 June 2008

2. The presentation layout
- Introduction
- Data processing goal
- Planning phase
- Design of the data processing system
- System development & testing
- Implementation & operations
  - Process flow
  - Document Management System
  - Progress reporting
  - Scanning tool
  - Exceptions
  - Quality Assurance (QA)
- Accounting or balancing process
- Data validation & editing
- Tabulation and output products

3. Introduction
- Data processing is considered part of the survey operations value chain, with a properly defined accountability structure;
- There are defined inter-dependency links with other census sections (e.g. questionnaire design, data collection, …);
- It depends heavily on the information technology support available around the country (management of the system is outsourced);
- Tight project management principles are applied, checking timelines, resources and detailed production lines;
- It must adapt to ever-changing technology (1996: key from paper (KFP); 2001 Census: scanning; 2007: scanning with old scanners; 2011 Census: scanning with upgraded scanners).

4. Goal of data processing
To accurately process or convert the statistical information from different collection tools, such as the questionnaire, into comprehensive electronic data that is clean, accurate, consistent and reliable.

5. Planning phase
Review of lessons learned from previous censuses and surveys (1996 Census, 2001 Census, 2007 Community Survey):
- Preparation of processing site
  - In the 1996 Census, data processing was distributed across centres in 9 provinces.
  - In the 2001 Census and 2007 CS, a centralised data processing centre was used.
- Mode of data capturing
  - 1996 Census: manual capturing (key from paper) running on an SQL database with an interface developed in Visual Basic.
  - 2001 Census and 2007 CS: proprietary scanning technology linked to an Oracle database.
- Census budget
  - 1996 Census: estimated at 500 million Rand.
  - 2001 Census: estimated at 1.2 billion Rand.
  - 2007 CS: estimated at 600 million Rand.
- Human resources
  - The 1996 Census had more staff for key from paper (options were considered for job creation across the country).
  - The 2001 Census and 2007 CS had a reduced number of staff supporting the scanning technology, working in shifts.
- Duration
  - 1996 Census data capturing was planned for 12 months.
  - The 2001 Census was planned for 6 months; however, the period was extended 18 months because the new technology had not been tested.
  - The 2007 CS took only 3 months, as planned.
- Systems design and specifications
  - In the 2001 Census, system specification & development were reviewed during implementation.
  - In the 2007 CS, most of the system specification & development was completed and tested before production.

6. Planning phase: Strategic plan
- There is a policy on standard procedures for documentation, process flow, metadata and concepts, managed by the DMID (Data Management and Information Delivery) project;
- A common strategy across survey programmes uses scanning technology, with transactions controlled in a database;
- Moving toward a centralised corporate data processing centre (store management, …);
- Production transactions are accounted for by tracking each questionnaire with a barcode;
- Quality is measured at each process of the production;
- A permanent team of data processors is kept in order to retain experience while building capacity;
- A system or module is accepted into production only after it has gone through a testing phase, to avoid repeating the 2001 Census experience of an untested system.

7. Planning phase: Operational plan & budget
- Since the 2001 Census, there has been a detailed list of activities, sub-activities and tasks with timelines (start and end dates) and responsible persons;
- Since the 2007 CS, each activity has been linked to the budget in what is called activity/task-based costing;
- Since the 2007 CS, an independent, dedicated team has been in charge of project management and monitoring of activities;
- A list of documents and other deliverables is submitted to the project management team (PMO) to keep track of progress;
- Performance indicators were developed for the PMO to track, giving daily production counts per process;
- Because of activity-based costing, the budget has never been an issue, except in the 2001 Census when the project went beyond the planned period.
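
Activity/task-based costing can be illustrated as a roll-up of task costs to their parent activity, compared against an approved budget. This is a minimal sketch, not Stats SA's actual system. The `Task` record, its fields, the `cost_by_activity`/`over_budget` helpers and all the figures are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    """Hypothetical task record: one task under one activity."""
    activity: str
    name: str
    responsible: str
    start: date
    end: date
    cost: float  # planned cost (illustrative units)


def cost_by_activity(tasks):
    """Roll task costs up to their parent activity."""
    totals = defaultdict(float)
    for task in tasks:
        totals[task.activity] += task.cost
    return dict(totals)


def over_budget(tasks, budget):
    """Return {activity: overrun} for activities costing more than budgeted."""
    totals = cost_by_activity(tasks)
    return {
        activity: total - budget.get(activity, 0.0)
        for activity, total in totals.items()
        if total > budget.get(activity, 0.0)
    }


if __name__ == "__main__":
    # Made-up tasks and budget, for illustration only.
    tasks = [
        Task("Scanning", "Install scanners", "IT lead", date(2007, 1, 8), date(2007, 1, 31), 100.0),
        Task("Scanning", "Scan batches", "Flow manager", date(2007, 2, 1), date(2007, 4, 30), 250.0),
        Task("Editing", "Run edit specifications", "Statistician", date(2007, 5, 2), date(2007, 5, 31), 80.0),
    ]
    print(cost_by_activity(tasks))  # {'Scanning': 350.0, 'Editing': 80.0}
    print(over_budget(tasks, {"Scanning": 300.0, "Editing": 100.0}))  # {'Scanning': 50.0}
```

Linking every task to an activity lets the PMO see an overrun at activity level as soon as task estimates change, rather than only at the end of the project.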

8. Design of data processing
- The data processing team gets the user requirements from the questionnaire design team and the data collection team;
- The team, comprising data processors, a system analyst (1 person), programmers, statisticians and data technologists (IT technicians), prepares the overall design specifications;
- The data processing team is supplemented by the data collection team in managing production and staff on the production flow;
- The scanning module of the system is outsourced (in 2001 to a consortium of companies; in the 2007 CS, one company was accountable);
- In the 2007 CS, data processing project management was controlled in-house to avoid the lack of accountability observed in the 2001 Census, where it was handled by an external party (PROCON);
- Because the workflow kept changing during the 2001 Census, an approved workflow with the operational procedure manual was ready for the 2007 CS before the start of production;
- Functional specifications were done only in the 2007 CS, as part of the overall system specification;
- Technical specifications were completed for the as-built system in the 2001 Census, whereas the 2007 CS specifications were done before any implementation.

9. System development & testing
- In the 1996 Census, system development was done by an in-house team supported by Swedish consultants;
- In the 2001 Census, system development was outsourced to a locally based company that put together a consortium of service providers in project management, system development, scanner specialists/maintenance, and image and recognition software;
- In the 2007 CS, system development and project management were done in-house, with only the scanning software and scanner maintenance outsourced;
- In the 1996 Census, only unit tests were conducted, whereas in the 2001 Census most of the tests (unit tests, production load tests, …) were conducted while already in production;
- In the 2007 CS, all tests were done before production:
  - For instance, background colour drop-out was tested in the 2007 CS, whereas the blue background colour in the 2001 Census required a blue light in the scanner (tested after months of production);
  - The decision on exception handling (rescan or transcription) was made during production in the 2001 Census, whereas in the 2007 CS the questionnaires were sent to Key From Paper (KFP) or Key From Image (KFI);
  - In the 2007 CS, false-positive readings were reduced by introducing voting rules between two different recognition engines, whereas in the 2001 Census all false-positive readings were sent to the verification stage (tiling and completion/key correction).
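
The voting rule between two recognition engines can be sketched as follows. The sketch assumes each engine returns a text value and a confidence score per field. The agreement-plus-minimum-confidence rule, the 0.8 threshold and the names `Reading`, `vote` and `route_fields` are illustrative assumptions. The presentation does not document the vendor's actual voting rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """One engine's result for a field: recognised text and confidence in [0, 1]."""
    text: str
    confidence: float


def vote(a: Reading, b: Reading, min_conf: float = 0.8) -> Optional[str]:
    """Accept a value only if both engines agree and both are confident.

    Returns the accepted text, or None when the field should go to
    manual verification (for example Key From Image).
    """
    if a.text == b.text and min(a.confidence, b.confidence) >= min_conf:
        return a.text
    return None


def route_fields(fields, min_conf=0.8):
    """Split {field_name: (reading_a, reading_b)} into accepted values and fields to verify."""
    accepted, to_verify = {}, []
    for name, (a, b) in fields.items():
        value = vote(a, b, min_conf)
        if value is None:
            to_verify.append(name)
        else:
            accepted[name] = value
    return accepted, to_verify
```

Requiring two independent engines to agree trades a somewhat larger verification queue for fewer false positives slipping through as accepted values.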

10. Implementation & operations
Operational procedures
- In the 2001 Census, the operational procedure manual was prepared during production;
- In the 2007 CS, the operational procedures were in place before training;
- A production account is produced every day (extracted from the Oracle database).
Recruitment
- In the 1996 Census, production staff were selected based on keying speed only;
- In the 2001 Census, production staff were recruited according to the requirements of each process;
- In the 2007 CS, production staff had versatile skills as data processors and could move between processes as needed, as determined by the flow manager;
- In the 2001 Census, staff worked 24 hours a day, 7 days a week in 3 shifts. In the 2007 CS, only one shift was operated to meet the deadline.
Training
- In the 2001 Census, training was conducted by the service provider (PROCON), whereas in the 1996 Census and 2007 CS it was given by the senior data processors, system developers and statisticians who were part of the design team.
Preparation of work environment
- The 1996 Census used 9 sites. The 2001 Census used one warehouse site, and the 2007 CS used two sites (one for main storage and the other for production).
- Site preparation, including partitioning and hardware and network installation, was completed one month before the end of the census field operation.

11. High-level process flow (Operations cont.)

12. Document Management System (Operations cont.)
- Tracks the movement of documents across processes;
- Accounts for all transactions, including production staff logins;
- Database driven (Sybase in 1996, Oracle in 2001 and 2007);
- Progress reporting per user, per function and per process;
- Reporting supports performance management (speed, time, production units, …).
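
A minimal sketch of barcode-based document tracking and progress reporting, with the transaction log modelled as a list of records. The `Transaction` fields and the helper names are hypothetical. The production system was database driven (Oracle in 2001 and 2007). This in-memory version only illustrates the logic, for example what a daily extraction would compute.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """Hypothetical log row: a batch (by barcode) completing one process under a staff login."""
    barcode: str
    process: str
    user: str
    timestamp: datetime


def current_process(log, barcode):
    """Return the last process a batch completed, or None if it was never logged."""
    events = [t for t in log if t.barcode == barcode]
    if not events:
        return None
    return max(events, key=lambda t: t.timestamp).process


def progress_by(log, key):
    """Count completed transactions grouped by 'process' or 'user'."""
    return Counter(getattr(t, key) for t in log)
```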

13. Progress reporting (Operations cont.)

14. Progress reporting (Operations cont.)

15. Scanning tool (Operations cont.)
- Kodak 9520D
  - Used in the 2001 Census;
  - Used in the 2007 CS.
- Differential scanner feeding (page by page and/or in batches);
- Barcode recognition at scanning time.

16. Exceptions (Operations cont.)
- Questionnaire transcription:
  - Damaged
  - Unscannable
  - Inconsistent page numbering
  - Unique identifier (barcode)
- Key From Paper (KFP):
  - Poor image quality
  - Faint writing
  - Missing pages
  - Wrong unique identifier (Enumerator Area, Dwelling Unit & Household Number)
- False-positive reading:
  - Poor software recognition
  - Poor image quality
  - Incomplete text (character)
  - Unrecognized mark or character
- Failed quality checks:
  - Quality rate below the threshold (95% accuracy rate)
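
The exception categories above suggest a routing step. The sketch below assigns a questionnaire to a queue based on a set of problem flags. The flag codes, the order of precedence and the `route` function are assumptions for illustration. Only the category names and the 95% threshold come from the slide. The slide lists poor image quality under both KFP and false-positive reading; this sketch treats it as KFP.

```python
# Hypothetical flag codes grouped by the exception categories on the slide.
TRANSCRIPTION = {"damaged", "unscannable", "inconsistent_page_numbering", "barcode_problem"}
KEY_FROM_PAPER = {"poor_image_quality", "faint_writing", "missing_pages", "wrong_identifier"}
VERIFICATION = {"poor_recognition", "incomplete_character", "unrecognised_mark"}

QUALITY_THRESHOLD = 0.95  # accuracy rate below which a quality check fails


def route(flags, quality_rate=None):
    """Return the exception queue for a questionnaire, or 'accept'.

    flags: set of problem codes raised for the questionnaire.
    quality_rate: accuracy from the quality check, if one was run.
    The precedence order (transcription, KFP, verification, QC) is an assumption.
    """
    if flags & TRANSCRIPTION:
        return "transcription"
    if flags & KEY_FROM_PAPER:
        return "key_from_paper"
    if flags & VERIFICATION:
        return "verification"
    if quality_rate is not None and quality_rate < QUALITY_THRESHOLD:
        return "failed_quality_check"
    return "accept"
```

Checking the most severe physical problems first means a damaged form goes to transcription even if it also raised recognition flags.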

17. Quality Assurance (QA) (Operations cont.)
- In the 1996 Census, quality control was implemented as part of double keying, without any measurement attached to it;
- In the 2001 Census, quality was measured at scanning time (checking image quality) and after data capturing (key from image of sampled batches; the threshold was 97%);
- In the 2007 CS, a sample of the captured data was captured a second time and compared with the first capture to determine the agreement rate (the threshold was reduced to 95% because of good image quality):
  - For scanned cases: a sample was keyed from image and an agreement rate was calculated;
  - For exceptional cases: 100% were double keyed from paper and an agreement rate was calculated.
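
The agreement-rate check can be sketched as a field-by-field comparison of two independent captures of the same sampled questionnaires. The record layout, the sampling fraction and the function names are hypothetical. Only the 95% (2007 CS) and 97% (2001 Census) thresholds come from the slide.

```python
import random


def agreement_rate(first, second):
    """Share of fields on which two independent captures give the same value.

    first, second: lists of records (dicts keyed by field name) for the
    same questionnaires, in the same order.
    """
    if len(first) != len(second):
        raise ValueError("captures must cover the same questionnaires")
    matches = total = 0
    for rec1, rec2 in zip(first, second):
        for field in rec1.keys() | rec2.keys():
            total += 1
            matches += rec1.get(field) == rec2.get(field)
    return matches / total if total else 1.0


def passes_qa(first, second, threshold=0.95):
    """Default threshold is the 2007 CS value; the 2001 Census used 0.97."""
    return agreement_rate(first, second) >= threshold


def sample_batches(batch_ids, fraction, seed=None):
    """Draw a simple random sample of batches for second capture (hypothetical design)."""
    rng = random.Random(seed)
    k = max(1, round(len(batch_ids) * fraction))
    return rng.sample(list(batch_ids), k)
```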

18. Accounting or balancing process
- After capturing, each questionnaire is accounted for, linked to its geographical area (EA) and checked for the correct data structure (household, persons, …) before any export;
- In the 1996 Census, captured data were exported into SAS/ASCII for the post-capture processes (editing and tabulation);
- In the 2001 Census, the balancing process took longer because postal (self-enumeration) questionnaires lacked a reference link to the EA;
- In the 2007 CS, a Census and Administration System (CSAS) helped obtain a full account of the questionnaires linked to their reference geography.
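
The balancing step can be sketched as a reconciliation of captured questionnaires against the list expected for each EA, plus a structural check. The input layout (`barcode`, `ea`, `household`, `persons`) and the household-size rule are assumptions. The presentation does not describe the checks actually performed by the CSAS.

```python
def balance(expected, captured):
    """Reconcile captured questionnaires against those expected per EA.

    expected: {ea_code: set of questionnaire barcodes expected for that EA}
    captured: list of dicts with keys 'barcode', 'ea', 'household', 'persons'
    """
    missing, unexpected, invalid = {}, [], []
    seen = {ea: set() for ea in expected}
    for q in captured:
        if q["barcode"] not in expected.get(q["ea"], set()):
            # Not linked to a known EA: cannot be balanced yet.
            unexpected.append(q["barcode"])
            continue
        seen[q["ea"]].add(q["barcode"])
        # Structural check (assumed rule): a household record must exist and
        # its declared size must equal the number of person records.
        household = q.get("household")
        if household is None or household.get("size") != len(q.get("persons", [])):
            invalid.append(q["barcode"])
    for ea, barcodes in expected.items():
        if barcodes - seen[ea]:
            missing[ea] = barcodes - seen[ea]
    return {"missing": missing, "unexpected": unexpected, "invalid": invalid}


def is_balanced(report):
    """Export only when nothing is missing, unexpected or structurally invalid."""
    return not any(report.values())
```

Unlinked forms, such as the postal questionnaires mentioned for 2001, show up in `unexpected` until they can be tied to an EA.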

19. Data validation & editing
- In 1996, the adopted strategy was not to impute any derived value; only manual editing was allowed;
- In the 2001 Census, automated editing was implemented using IMPS/CSPro, based on an editing specification prepared with the assistance of the US Census Bureau. The 2007 CS followed the same approach as the 2001 Census. Editing reports with imputation rates were produced for an editing committee, which decided on the rules to apply for correction;
- In the 2001 Census and 2007 CS, limited manual editing was implemented;
- One key editing rule is the removal of cases with only minimal processable content, caused by poor recognition or false-positive readings;
- Although the editing was done in ASCII, the output database is exported in different formats (user driven: ASCII, Oracle, SAS, …) linked with the metadata.
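
The editing strategy (rule-based correction, imputation-rate reports for the editing committee, and removal of records with too little usable content) can be sketched in Python as below. This is not CSPro code. The rule format, the `min_filled` cut-off and the fixed replacement value in the usage lines are hypothetical placeholders for the committee's actual rules.

```python
from collections import Counter


def edit_and_impute(records, rules, min_filled=2):
    """Apply edit rules and report the imputation rate per variable.

    records: list of dicts (one per person).
    rules: {variable: (is_valid, impute)} where is_valid(record) -> bool and
           impute(record) -> replacement value.
    min_filled: records with fewer non-blank values are dropped as not
                minimally processable (cut-off is an assumption).
    Returns (kept_records, {variable: imputation rate}, number_dropped).
    """
    kept, dropped = [], 0
    imputed = Counter()
    for rec in records:
        if sum(v not in (None, "") for v in rec.values()) < min_filled:
            dropped += 1
            continue
        rec = dict(rec)  # do not modify the captured record in place
        for var, (is_valid, impute) in rules.items():
            if not is_valid(rec):
                rec[var] = impute(rec)
                imputed[var] += 1
        kept.append(rec)
    n = len(kept)
    rates = {var: imputed[var] / n if n else 0.0 for var in rules}
    return kept, rates, dropped


# Usage with a placeholder rule: ages outside 0-120 are replaced by 30.
age_rule = {"age": (lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
                    lambda r: 30)}
```

The per-variable rates are the kind of figure an editing committee would review to judge whether a rule is over-correcting.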

20. Data validation & editing (cont.)

21. Tabulation and output products
- Since Stats SA policy is to give data users access, the strategy is to make the census data available in different formats to increase accessibility and promote data use;
- In the 1996 Census, the output database was packaged in a SuperCross database, and a set of aggregated databases was put on CD for users;
- In the 2001 Census, access to the data was increased by adding online tabulation tools (PX-Web) alongside SuperCross, reduced ASCII files, …;
- In the 2007 CS, the data are also available in different formats (SuperCross, ASCII files, PX-Web and other map/chart-linked tools);
- The traditional reports are still produced based on the tabulation plan/output reports.
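
Tables in a tabulation plan are essentially cross-tabulations of unit records. The following is a minimal sketch. It assumes records are dicts and that each plan entry names a title, a row variable and a column variable. The plan format and function names are hypothetical and unrelated to SuperCross or PX-Web.

```python
from collections import Counter


def crosstab(records, row, col):
    """Count records by (row value, column value); empty cells are shown as 0."""
    counts = Counter((r[row], r[col]) for r in records)
    rows = sorted({k[0] for k in counts})
    cols = sorted({k[1] for k in counts})
    return {rv: {cv: counts[(rv, cv)] for cv in cols} for rv in rows}


def run_plan(records, plan):
    """Produce every table in a plan given as [(title, row_var, col_var), ...]."""
    return {title: crosstab(records, row, col) for title, row, col in plan}
```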

22. Benefits of scanning technology
- Improves the quality of the data
- Saves time
- Reduces costs

23. THANK YOU!