UNSD Census Workshop Day 2 - Session 7 Data Capture: Intelligent Character Recognition Andy Tye – International Manager

Presentation transcript:

UNSD Census Workshop Day 2 - Session 7
Data Capture: Intelligent Character Recognition
Andy Tye – International Manager
DRS are worldwide specialists in Census data capture

Data Capture
Intelligent Character Recognition (ICR)
Elements
Form design
Hardware/Software requirements
  – Scanners
  – Computer infrastructure
Workflow
Accuracy
Advantages
Disadvantages

Data Capture - ICR
Forms design
Typical stock grade paper (90 gsm)
Corner stones advised
Dropout colour is recommended but not essential

Data Capture - ICR
Hardware requirements
Image scanners
  – TWAIN or ISIS
Database server (full redundancy)
Storage server – terabytes
  – (RAID 5, mirrored, etc.)
Network (Gb preferred)
Administrator PC
CSPro PCs
Key correction PCs (verification)
Character inspection PCs
  – (Mass verification, optional)
Scanner PCs
Automatic data capture PCs
Software requirements
MS-SQL or other database
Data storage, archive and retrieval
Backup software
Software for Administrator PC
CSPro for analysis and reporting PCs
Software for key correction PCs
Software for character inspection PCs
Software for scanner PCs
Software for automatic data capture

Data Capture - ICR
Typical Workflow
ICR

Data Capture - ICR
Typical Workflow
Paper Movement – Processing Centre/s

Data Capture - ICR
Typical Workflow
Receiving

Data Capture - ICR
Typical Workflow
Logging/Checking
Open batch
Verify contents
Register batch

Data Capture - ICR
Typical Workflow
Sifting
Orientation
Other forms

Data Capture - ICR
Typical Workflow
Spine removal
Cut booklets
30,000/day

Data Capture - ICR
Typical Workflow
Scanning
Double sided
High speed
Double detection
Ease of use

Data Capture - ICR
Typical Workflow
Scanning/sorting
Automatic identification
Data capture

Data Capture - ICR
Typical Workflow
Storage
Conditions
Retrieval
Space

Data Capture - ICR
Typical Workflow
Image Movement/Data Extraction – Processing Centre/s

Data Capture - ICR
Typical Workflow
Image interpretation
Automated process
Background task
Page identification
De-skew
Image clean up
Pre-defined areas
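The de-skew step can be illustrated with a small sketch. It assumes, hypothetically, that two of the corner stones printed on the form have already been located in the scanned image and that they should lie on a horizontal line. The function name and the coordinates are illustrative only and are not taken from the presentation or from any particular ICR product.

```python
import math

def skew_angle_degrees(top_left, top_right):
    """Return the angle (in degrees) of the line joining two corner marks.

    Coordinates are (x, y) pixel positions. A result of 0.0 means the two
    marks are level, i.e. the page is not skewed.
    """
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical corner-stone positions found on one scanned page.
angle = skew_angle_degrees((100, 120), (2100, 155))
print(f"Measured skew: {angle:.2f} degrees")
```

The image would then be rotated by the opposite angle before the pre-defined areas are read. The sign convention for that rotation depends on the imaging library in use.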

Data Capture - ICR
Typical Workflow
Character inspection
Tiling
High confidence
Operator decision
Field context
Tall to short

Data Capture - ICR
Typical Workflow
Key correction
Low confidence
Operator decision
From context
External verification
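One way to read the two operator stages is as a split on recognition confidence: high-confidence characters go to mass character inspection, low-confidence ones go to key correction. The sketch below shows that split under stated assumptions. The threshold value, the `RecognisedChar` type and the queue names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass

# Illustrative assumption: the slides give no numeric threshold.
INSPECTION_THRESHOLD = 0.90

@dataclass
class RecognisedChar:
    value: str         # character proposed by the ICR engine
    confidence: float  # 0.0 to 1.0, as reported by a hypothetical engine

def route(char):
    """Pick the operator queue for one recognised character."""
    if char.confidence >= INSPECTION_THRESHOLD:
        # High confidence: shown tiled with similar characters for a quick check.
        return "character_inspection"
    # Low confidence: an operator keys the value, using the field context.
    return "key_correction"

chars = [RecognisedChar("7", 0.99), RecognisedChar("1", 0.42)]
for c in chars:
    print(c.value, "->", route(c))
```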

Data Capture - ICR
Typical Workflow
Key correction
ASCII file
CSV format
1 line/form
CSPro import
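The output described here, an ASCII file in CSV format with one line per form, can be sketched with Python's standard `csv` module. The field names and values are made up for illustration. Whether the downstream CSPro import needs a header row or a specific column layout depends on the data dictionary in use and is not covered by this sketch.

```python
import csv
import io

def export_forms(forms, field_names):
    """Write one CSV line per form and return the text.

    `forms` is a list of dicts keyed by `field_names`. No header row is
    written, so the number of lines equals the number of forms.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=field_names, lineterminator="\n")
    for form in forms:
        writer.writerow(form)
    return buffer.getvalue()

# Hypothetical captured values for two household forms.
fields = ["form_id", "household_size", "age_of_head"]
forms = [
    {"form_id": "0001", "household_size": 3, "age_of_head": 27},
    {"form_id": "0002", "household_size": 5, "age_of_head": 41},
]
output = export_forms(forms, fields)
print(output, end="")
```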

Data Capture - ICR
Typical Workflow
ICR

Data Capture - ICR
Accuracy
This is always the first question
Handprint
Numeric only in isolated fields: 98%
Numeric only in semi-constrained fields: 95-96%
Alpha upper case only: 90%
Alpha lower case only: 85-87%
Alpha mixed case: 75-80%
Alpha/Numeric mixed case: 50% or less
  – reduce by 5% if there are special characters (not a-z and 0-9)
The accuracy level after data correction (i.e. the final output accuracy) should be 100% (subject to good operators)
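The percentages above can be turned into a rough estimate of how often a whole field is read correctly. The sketch below assumes they are per-character rates and that character errors are independent. Neither assumption is stated in the slides, so the results are only indicative.

```python
def field_accuracy(char_accuracy, field_length):
    """Probability that every character in a field is read correctly,
    assuming independent per-character errors (a simplifying assumption)."""
    return char_accuracy ** field_length

# A 4-digit numeric field at 98% per character.
numeric_field = field_accuracy(0.98, 4)
# A 10-letter upper-case name at 90% per character.
name_field = field_accuracy(0.90, 10)

print(f"4-digit numeric field: {numeric_field:.1%}")
print(f"10-letter upper-case field: {name_field:.1%}")
```

Under these assumptions the numeric field comes out at about 92% and the name field at about 35%, which is one reason alphabetic fields generate far more correction work than numeric ones.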

Data Capture - ICR
Accuracy continued…
The accuracy of all modern ICR engines is broadly comparable
The major differences between suppliers' solutions are the methods and workflow used in each offering
Detecting a false positive takes 10 times longer than entering a character recognised with low confidence
False positives (substitutions) are the most expensive errors
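The 10-to-1 ratio quoted above can be put into a simple effort model. The character counts below are hypothetical and the unit of effort is arbitrary. Only the factor of 10 comes from the slide.

```python
# From the slide: finding a false positive takes 10 times longer than
# keying a character that was flagged with low confidence.
FALSE_POSITIVE_COST_FACTOR = 10

def correction_effort(low_confidence_chars, false_positive_chars, unit_cost=1.0):
    """Total operator effort, in units of 'one low-confidence key entry'."""
    flagged = low_confidence_chars * unit_cost
    substitutions = false_positive_chars * unit_cost * FALSE_POSITIVE_COST_FACTOR
    return flagged + substitutions

# Hypothetical batch: 1,000 flagged characters and 100 undetected substitutions.
effort = correction_effort(1000, 100)
print(f"Total effort: {effort:.0f} units")
```

In this made-up batch the 100 substitutions cost as much operator time as the 1,000 flagged characters, which illustrates why a workflow that flags doubtful characters can be cheaper than one that accepts them silently.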

Data Capture - ICR
Accuracy continued…
Accuracy can be improved by:
Restricting the responses to any given question
Using external verification
Using multiple ICR engines to 'vote', which is expensive
Training your ICR engines on local handwriting styles (if possible)
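The idea of several ICR engines 'voting' can be sketched as a strict majority vote over each engine's proposal for one character. This is a minimal illustration, not a description of how any commercial product combines engines. The function name and the rule for ties are assumptions.

```python
from collections import Counter

def vote(candidates):
    """Return the character proposed by a strict majority of engines.

    Returns None when there is no strict majority, so that the character
    can be passed to an operator instead of being accepted automatically.
    """
    if not candidates:
        return None
    value, count = Counter(candidates).most_common(1)[0]
    if count * 2 > len(candidates):
        return value
    return None

# Hypothetical outputs from three engines for the same character.
print(vote(["7", "7", "1"]))  # two of three agree
print(vote(["7", "1", "4"]))  # no agreement, needs an operator
```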

Data Capture - ICR
Advantages
No specialist hardware required
An image archive of every form can be produced automatically
Very high speed scanning can be achieved
Both OMR and ICR can be interpreted using ICR software
Forms designed for ICR are relatively easy to fill in
Locally printed forms can be used
Allows capture of much more complex data than with OMR alone

Data Capture - ICR
Disadvantages
Significant hardware/software and trained IT staff will be required
Accuracy is dependent on manual intervention
High calibre IT staff are required to support the ICR system
More complex cost/benefit analysis than with OMR alone

Data Capture - ICR
Indicative Costs
For a census of a population of 65 million (20M single-sided A4 household forms)
Processing period of 12 weeks (8 hours/day, 5 days/week)
Hardware: $800k-$1M in total
Software: $700k-$1.3M in total
Total indicative costs: $1.5M to $2.3M
No. of staff in total
  – 6-10 Managers
  – PC Operators
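The figures on this slide imply a required processing rate and a cost per form, which the short calculation below works out. It uses only the numbers quoted above. The cost per form covers the listed hardware and software totals only, since the slide gives no staff costs.

```python
FORMS = 20_000_000          # single-sided A4 household forms
WEEKS = 12
DAYS_PER_WEEK = 5
HOURS_PER_DAY = 8
TOTAL_COST_LOW = 1_500_000   # USD, hardware + software, low estimate
TOTAL_COST_HIGH = 2_300_000  # USD, hardware + software, high estimate

working_days = WEEKS * DAYS_PER_WEEK
working_hours = working_days * HOURS_PER_DAY

forms_per_day = FORMS / working_days
forms_per_hour = FORMS / working_hours
cost_per_form_low = TOTAL_COST_LOW / FORMS
cost_per_form_high = TOTAL_COST_HIGH / FORMS

print(f"Working days: {working_days}, working hours: {working_hours}")
print(f"Required rate: {forms_per_day:,.0f} forms/day, {forms_per_hour:,.0f} forms/hour")
print(f"Hardware and software cost per form: ${cost_per_form_low:.3f} to ${cost_per_form_high:.3f}")
```

That is 60 working days, so the centre as a whole has to process roughly 333,000 forms per day, at an equipment and software cost of about 8 to 12 US cents per form.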

Data Capture - ICR
Summary
ICR offers considerable flexibility at the cost of more highly skilled IT personnel
The single most important factor for timely and accurate data capture is to make sure 'the forms are filled in correctly and are returned in good condition'

UNSD Census Workshop Day 2 - Session 7
Thank you for listening
Andy Tye – International Manager