UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing
Doha, State of Qatar, May 2008

Optical Data Capture: Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) and Intelligent Recognition (IR)

Summary
- Concept/Definition
- Forms Design
- Scanners & Software
- Storage
- Accuracy
- OCR/ICR Advantages and Disadvantages
- Intelligent Recognition (IR)
- Commercial Suppliers

Definition/Concept of OCR
- Gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters.
- The images of the machine-printed characters are extracted from a bitmap of the scanned image.
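
The conversion from bitmap to characters can be illustrated with classic template matching, one of the simplest OCR techniques. The sketch below is illustrative only. It assumes characters have already been segmented and binarized into small grids, and the 5x3 glyph templates are hypothetical. Commercial recognition engines use far richer features and trained classifiers.

```python
from typing import Dict, List, Tuple

Glyph = List[str]  # rows of a binarized character image: '#' = ink, '.' = paper

# Hypothetical 5x3 templates for three digits (illustration only).
TEMPLATES: Dict[str, Glyph] = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", "..#", ".#.", ".#."],
}


def pixel_distance(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two equally sized glyphs (Hamming distance)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyphs must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph: Glyph, templates: Dict[str, Glyph] = TEMPLATES) -> Tuple[str, int]:
    """Return the label of the closest template and its pixel distance."""
    label = min(templates, key=lambda k: pixel_distance(glyph, templates[k]))
    return label, pixel_distance(glyph, templates[label])


if __name__ == "__main__":
    noisy_seven = ["###", "..#", "..#", ".#.", "##."]  # a '7' with one stray ink pixel
    print(recognize(noisy_seven))  # ('7', 1)
```

The distance returned alongside the label can serve as a crude confidence score. A larger distance means a poorer match.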

Definition/Concept of ICR
- Gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters.
- The images of the handwritten characters are extracted from a bitmap of the scanned image.

OCR and ICR Differences
- OCR is less accurate than OMR but more accurate than ICR.
- ICR requires editing to achieve high data coverage.
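
In practice, the editing that ICR needs is usually targeted. The engine reports a confidence for each field, and only low-confidence fields go to operators for verification. A minimal sketch of that routing follows. The `FieldResult` record, the 0-to-1 confidence scale and the 0.90 threshold are assumptions for illustration, not values from the workshop.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FieldResult:
    form_id: str
    field: str
    value: str
    confidence: float  # assumed 0.0-1.0 score reported by the recognition engine


def route_for_verification(
    results: List[FieldResult], threshold: float = 0.90
) -> Tuple[List[FieldResult], List[FieldResult]]:
    """Split results into auto-accepted fields and fields queued for manual verification."""
    accepted: List[FieldResult] = []
    to_verify: List[FieldResult] = []
    for r in results:
        (accepted if r.confidence >= threshold else to_verify).append(r)
    return accepted, to_verify


if __name__ == "__main__":
    batch = [
        FieldResult("F0001", "age", "34", 0.98),
        FieldResult("F0001", "occupation", "TEACHFR", 0.62),
        FieldResult("F0002", "age", "7", 0.91),
    ]
    ok, review = route_for_verification(batch)
    print(len(ok), [r.field for r in review])  # 2 ['occupation']
```

Raising the threshold sends more fields to operators. This improves accuracy at the cost of more manual work, so the threshold is normally tuned on a test sample.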

Forms
- OCR/ICR form design is less strict than OMR form design:
  - no timing tracks;
  - registration marks are used.
- ICR requires hand-print boxes, each filled with one alphanumeric character.
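
One reason ICR asks for one character per box is that the boxes settle character segmentation in advance. The sketch below assumes a binarized comb field whose boxes have equal width and no gaps. This is a simplification: real forms are first located through the registration marks. Under that assumption, splitting the field into cells and flagging blank boxes is straightforward. The function name and the `min_ink` noise threshold are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[str]  # binarized rows: '#' = ink, '.' = paper


def split_comb_field(field: Bitmap, n_boxes: int, min_ink: int = 2) -> List[Tuple[Bitmap, bool]]:
    """Cut a comb field into equal-width boxes; return each box crop and whether it holds a character.

    Assumes the field is already located and deskewed, and that the boxes are
    equally wide with no gaps between them (hypothetical simplification).
    """
    width = len(field[0])
    if any(len(row) != width for row in field):
        raise ValueError("all rows must have the same width")
    if n_boxes <= 0 or width % n_boxes:
        raise ValueError("field width must be a positive multiple of n_boxes")
    box_w = width // n_boxes
    cells = []
    for b in range(n_boxes):
        crop = [row[b * box_w:(b + 1) * box_w] for row in field]
        ink = sum(r.count("#") for r in crop)
        cells.append((crop, ink >= min_ink))  # a few stray pixels are treated as blank
    return cells


if __name__ == "__main__":
    field = [
        "#.......#...",
        "#.......#...",
        "#.......#...",
    ]
    print([filled for _, filled in split_comb_field(field, n_boxes=4)])  # [True, False, True, False]
```

Each non-blank crop can then go to the character classifier on its own. Cursive or run-together handwriting breaks this model.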

OCR Forms
- OCR/ICR is more flexible because:
  - no timing tracks are required;
  - the image can float on the page.
- A drop-out color reduces the size of the scanner's output and improves accuracy.
- ICR/OCR technology often uses registration marks at the four corners of a document to locate the image during recognition.
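
Because the image can float on the page, the software must locate it before reading any field. A common approach is to detect the registration marks. It then fits a transform from their positions on the master template to their positions on the scan. The sketch below fits an affine transform by least squares in pure Python. It assumes the mark centres have already been detected, and the coordinates in the example are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[List[float], List[float]]  # ([a, b, c], [d, e, f])


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate marks (collinear or duplicated)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (aug[r][3] - sum(aug[r][c] * x[c] for c in range(r + 1, 3))) / aug[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares affine map taking template mark positions (src) to scanned ones (dst)."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three matching registration marks")
    ata = [[0.0] * 3 for _ in range(3)]  # normal equations A^T A
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atx[i] += row[i] * u
            aty[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return _solve3(ata, atx), _solve3(ata, aty)


def apply_affine(t: Affine, p: Point) -> Point:
    (a, b, c), (d, e, f) = t
    x, y = p
    return (a * x + b * y + c, d * x + e * y + f)


if __name__ == "__main__":
    template_marks = [(0, 0), (1000, 0), (0, 1400), (1000, 1400)]
    # Hypothetical scan: page shifted and slightly stretched by the scanner.
    scanned_marks = [(1.01 * x + 12, 0.99 * y + 7) for x, y in template_marks]
    t = fit_affine(template_marks, scanned_marks)
    print(apply_affine(t, (500, 700)))  # where a template field at (500, 700) lands on the scan
```

With four marks the fit is overdetermined. The residual at each mark can therefore also flag pages that were scanned badly.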

OCR/ICR Scanners and Software
- Forms are scanned, and the recognition engine of the OCR/ICR system then interprets the images, turning images of handwritten or printed characters into ASCII data (machine-readable characters).
- Users can scan forms without running OCR.
- Speeds range from: … sheets/min (dependent on the recognition engine).
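
Throughput depends on the scanner and the recognition engine, so capacity planning usually starts from a rated speed in sheets per minute. The helper below is a back-of-the-envelope sketch. The efficiency factor covers paper jams, batch handling and rescans, and it and all figures in the example are hypothetical.

```python
import math


def scanning_days(
    total_sheets: int,
    sheets_per_min: float,
    scanners: int,
    hours_per_day: float,
    efficiency: float = 1.0,
) -> int:
    """Whole working days needed to process every sheet at the given rate.

    `efficiency` in (0, 1] discounts the rated speed for batch preparation, jams
    and rescans; its value is a planning assumption, not a vendor figure.
    """
    if min(sheets_per_min, scanners, hours_per_day) <= 0 or not 0 < efficiency <= 1:
        raise ValueError("speed, scanner count and hours must be positive; 0 < efficiency <= 1")
    sheets_per_day = sheets_per_min * 60 * hours_per_day * efficiency * scanners
    return math.ceil(total_sheets / sheets_per_day)


if __name__ == "__main__":
    print(scanning_days(1_200_000, sheets_per_min=100, scanners=4, hours_per_day=8, efficiency=0.75))  # 9
```

In this hypothetical case, 4 scanners at an effective 75 sheets/min for 8 hours give 144,000 sheets a day, so 1,200,000 sheets need 9 working days. Image capture can run separately from recognition (scan-only mode), so the same estimate can be made for each stage with its own rate.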

OCR/ICR Storage Characteristics
- Storage/Retrieval:
  - images are scanned, stored and maintained electronically;
  - there is no need to store the paper forms as long as the electronic files are safeguarded;
  - with OCR/ICR technologies, images can be scanned, indexed and written to optical media.
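
Discarding paper forms is only safe if the electronic files can be shown to be complete and unaltered. One simple safeguard is a manifest of SHA-256 checksums, stored with each copy (for example on the optical media) and re-verified periodically. The sketch assumes a file naming convention of `*.tif` images in a single directory.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks so large images need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(image_dir: Path, pattern: str = "*.tif") -> Dict[str, str]:
    """Map each image file name to its checksum (the pattern is an assumed naming convention)."""
    return {p.name: sha256_of(p) for p in sorted(image_dir.glob(pattern))}


def verify_manifest(image_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Names of images that are missing or whose content no longer matches the manifest."""
    problems = []
    for name, expected in sorted(manifest.items()):
        path = image_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems


if __name__ == "__main__":
    import json
    import sys

    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print(json.dumps(build_manifest(folder), indent=2))
```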

Ideal OCR/ICR Accuracy Thresholds
- Accuracy: the accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR under perfect tuning (~99.5%).
- Up to 99.9% accuracy can be reached with editing (as with OMR).
- The recognition engine must be tuned, tested and validated very carefully.
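
Accuracy figures such as ~99.5% are only comparable if clerks and the engine are measured the same way. One common approach, assumed here, compares recognized values against a verified sample, such as double-keyed data. Character accuracy counts edit operations (Levenshtein distance), while field accuracy counts exact matches. At 99.5% character accuracy, roughly 5 characters in every 1,000 are wrong.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free when equal)
        prev = cur
    return prev[-1]


def character_accuracy(truth: List[str], recognized: List[str]) -> float:
    """1 - (edit operations / characters in the verified sample)."""
    if len(truth) != len(recognized):
        raise ValueError("truth and recognized values must be aligned field by field")
    total = sum(len(t) for t in truth)
    if total == 0:
        raise ValueError("verified sample contains no characters")
    errors = sum(levenshtein(t, r) for t, r in zip(truth, recognized))
    return 1.0 - errors / total


def field_accuracy(truth: List[str], recognized: List[str]) -> float:
    """Share of fields recognized exactly."""
    if len(truth) != len(recognized) or not truth:
        raise ValueError("need a non-empty, aligned sample")
    return sum(t == r for t, r in zip(truth, recognized)) / len(truth)


if __name__ == "__main__":
    truth = ["AHMED", "1975", "DOHA"]        # hypothetical verified values
    recognized = ["AHMFD", "1975", "DOHA"]   # engine output with one substitution
    print(round(character_accuracy(truth, recognized), 4), round(field_accuracy(truth, recognized), 4))
```

A single wrong character lowers character accuracy only slightly, but it makes the whole field wrong. For census editing, field accuracy is often the more telling figure.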

OCR/ICR Advantages
- Recognition engines used with imaging can capture highly specialized data sets.
- OCR/ICR recognizes machine-printed or hand-printed characters.
- Scanning and recognition allow efficient management and planning of the rest of the processing workload.
- Images can be retrieved quickly for editing and reprocessing.

OCR/ICR Disadvantages
- The technology is costly.
- It may require significant manual intervention.
- It adds to the workload of data collectors.
- ICR has severe limitations when it comes to human handwriting:
  - characters must be hand-printed or machine-printed as separate characters in boxes;
  - it is ineffective with cursive characters.

OMR-OCR/ICR Compared

OCR/ICR Challenges/Issues
- Shares the corresponding issues of OMR.
- Algorithm development (preparation of the memory dictionary).
- Processing time considerations due to the recognition engine.
- Development costs.
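
One plausible reading of the "memory dictionary" is a lexicon of valid responses, such as place names or occupation titles, against which raw recognition output is checked. The minimal sketch below assumes a hypothetical upper-case lexicon and an edit-distance tolerance of one character. It snaps a near-miss to the closest entry and returns `None` for anything farther away, so that value can be sent for manual review.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def snap_to_lexicon(raw: str, lexicon: Sequence[str], max_distance: int = 1) -> Optional[str]:
    """Closest lexicon entry within max_distance edits, else None (send to manual review)."""
    if not lexicon:
        return None
    word = raw.strip().upper()  # lexicon entries are assumed to be upper case
    best = min(lexicon, key=lambda w: (levenshtein(word, w), w))  # ties broken alphabetically
    return best if levenshtein(word, best) <= max_distance else None


if __name__ == "__main__":
    places = ["DOHA", "KHOR", "DUKHAN"]  # hypothetical lexicon
    print(snap_to_lexicon("D0HA", places))  # DOHA (zero misread as letter O)
    print(snap_to_lexicon("XYZQ", places))  # None
```

Building and maintaining such lexicons for every coded field is part of the development cost noted above.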

Definition/Concept of IR
- State-of-the-art recognition technology.
- Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters.
- The images of the handwritten and cursive characters are extracted from a bitmap of the scanned image.
- The ability to capture cursive writing makes this method unique.

Definition/Concept of IR
- Eight elements make up the trajectories of all cursive letters (Figure 1).
[Figure 1. Photo: Parascript LLC]

Definition/Concept of IR
- Intelligent Recognition uses context dynamically:
  - context is used during the recognition process, improving the accuracy of the results;
  - context helps to identify letters where the symbol segmentation of an image is ambiguous.
[Photo: Parascript LLC]
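
A simplified illustration of using context follows. The engine keeps several candidate characters per position, and the lexicon entry that best explains them is chosen, rather than the single best character at each position. The sketch assumes per-position probabilities from a hypothetical engine and scores equal-length lexicon words by summed log-probability. It gives a small floor probability to characters the engine did not propose. It assumes segmentation is already fixed; handling ambiguous segmentation would also mean scoring alternative splits. It is a generic lexicon-constrained decoder, not Parascript's actual method.

```python
import math
from typing import Dict, List, Optional, Sequence

# Per-position candidate characters with probabilities, as a hypothetical engine might report them.
Candidates = List[Dict[str, float]]


def best_lexicon_match(cands: Candidates, lexicon: Sequence[str], floor: float = 1e-4) -> Optional[str]:
    """Pick the equal-length lexicon word with the highest summed log-probability.

    Characters the engine did not propose at a position get the small `floor`
    probability instead of zero, so one missed candidate does not veto a word.
    """
    best_word, best_score = None, -math.inf
    for word in lexicon:
        if len(word) != len(cands):
            continue  # simplified: segmentation is assumed to have fixed the length
        score = sum(math.log(pos.get(ch, floor)) for ch, pos in zip(word, cands))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    observed = [
        {"D": 0.9, "O": 0.1},
        {"O": 0.5, "U": 0.5},   # ambiguous on its own
        {"H": 0.6, "N": 0.4},
        {"A": 0.95, "R": 0.05},
    ]
    print(best_lexicon_match(observed, ["DOHA", "KHOR", "DUKHAN"]))  # DOHA
```

Here the second character is a coin toss in isolation, but only "DOHA" in the lexicon fits all four positions well. The word-level context therefore resolves the ambiguity.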

Technology Evolution
[Figure: text styles (machine print, constrained handprint, unconstrained handprint, bad-quality machine print, cursive) set against form types. Form types range from forms specially designed for automatic recognition (constraining boxes or combs, drop-out ink for preprinted text and boxes) to legacy forms (no special form design, no constraining boxes or combs, condensed strings, dirty and noisy forms, bad-quality paper). The technology evolves from OCR to ICR to Intelligent Recognition. Illustration: Conference on Technology Options for 2011 Census]

Major Commercial Suppliers
- Top Image Systems (TIS)
- ReadSoft
- Teleform
- Scanner suppliers: Fujitsu, Canon, Bell & Howell, Kodak

THANK YOU!