UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data.


1 UNSD-ESCWA Regional Workshop on Census Data Processing in the ESCWA region: Contemporary technologies for data capture, methodology and practice of data editing
Doha, State of Qatar, 18-22 May 2008
Optical Data Capture:
- Optical Character Recognition (OCR)
- Intelligent Character Recognition (ICR)
- Intelligent Recognition

2 Summary
- Concept/Definition
- Forms Design
- Scanners & Software
- Storage
- Accuracy
- OCR/ICR Advantages and Disadvantages
- Intelligent Recognition (IR)
- Commercial Suppliers

3 Definition/Concept of OCR
- Gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters.
- Images of the machine-printed characters are extracted from a bitmap of the scanned image.

4 Definition/Concept of ICR
- Gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters.
- Images of the handwritten characters are extracted from a bitmap of the scanned image.

5 OCR and ICR Differences
- OCR is less accurate than OMR but more accurate than ICR
- ICR will require editing to achieve high data coverage
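
The editing step mentioned above can be pictured as routing low-confidence recognition results to an operator. The following is a minimal sketch, assuming the recognition engine reports a confidence between 0 and 1 for each character; the function name, field names and the 0.90 threshold are hypothetical and not taken from the presentation.

```python
def route_fields(fields, threshold=0.90):
    """Split recognized fields into accepted values and fields needing manual editing.

    fields: list of (field_name, text, per_character_confidences).
    threshold: hypothetical cut-off; a real system would tune it on test forms.
    """
    accepted, to_edit = {}, []
    for name, text, confidences in fields:
        # A field is treated as only as reliable as its weakest character.
        if confidences and min(confidences) >= threshold:
            accepted[name] = text
        else:
            to_edit.append(name)
    return accepted, to_edit


# Illustrative input: the second field has one poorly recognized character.
fields = [
    ("age", "34", [0.99, 0.97]),
    ("household_size", "1?", [0.98, 0.41]),
]
accepted, to_edit = route_fields(fields)
print(accepted)  # {'age': '34'}
print(to_edit)   # ['household_size']
```

Raising the threshold sends more fields to manual editing and fewer unchecked errors into the data, so the cut-off is a trade-off between workload and accuracy.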

6 Forms
- OCR/ICR form design is less strict than OMR form design:
  - No timing tracks
  - Has registration marks
- ICR requires boxes filled in by hand printing, one alphanumeric character per box
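
Because each box holds exactly one character, the position of every character in a field can be computed from the form layout rather than discovered in the image. A minimal sketch, assuming a horizontal row of equally sized boxes with known pixel coordinates; the function name and all dimensions are hypothetical.

```python
def box_rectangles(left, top, box_width, box_height, count, gap=0):
    """Return (left, top, right, bottom) pixel rectangles for a row of character boxes."""
    boxes = []
    for i in range(count):
        # Each box starts one box width plus one gap after the previous box.
        x = left + i * (box_width + gap)
        boxes.append((x, top, x + box_width, top + box_height))
    return boxes


print(box_rectangles(100, 200, 40, 50, 3, gap=5))
# [(100, 200, 140, 250), (145, 200, 185, 250), (190, 200, 230, 250)]
```

Each rectangle could then be cropped from the scanned image and passed to the recognition engine as a single character.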

7 OCR Forms
- OCR/ICR is more flexible since:
  - No timing tracks are required
  - The image can float on a page
- The use of drop color reduces the size of the scanner’s output and enhances accuracy
- ICR/OCR technology often uses registration marks on the four corners of a document in the recognition of an image
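
One way to see why registration marks let the image "float" on the page: once the marks are located in the scan, field positions defined on the form template can be mapped into scan coordinates. The sketch below is a simplification that uses only two marks and assumes the scan differs from the template by shift, rotation and uniform scale; a production system would typically use all four marks. All names and coordinates are hypothetical.

```python
def similarity_from_marks(ref_a, ref_b, scan_a, scan_b):
    """Estimate the shift/rotation/scale that maps template points to scan points.

    Points are (x, y) tuples. Complex numbers make rotation plus scaling
    a single multiplication.
    """
    ra, rb = complex(*ref_a), complex(*ref_b)
    sa, sb = complex(*scan_a), complex(*scan_b)
    scale_rot = (sb - sa) / (rb - ra)   # rotation and scale
    offset = sa - scale_rot * ra        # translation
    return scale_rot, offset


def template_to_scan(point, transform):
    """Map one template coordinate into scan coordinates."""
    scale_rot, offset = transform
    z = scale_rot * complex(*point) + offset
    return (z.real, z.imag)


# Template marks at (0, 0) and (100, 0); in the scan the page is shifted by (10, 20).
transform = similarity_from_marks((0, 0), (100, 0), (10, 20), (110, 20))
print(template_to_scan((50, 30), transform))  # (60.0, 50.0)
```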

9 OCR/ICR Scanners and Software
- Forms are fed through a scanner, and the recognition engine of the OCR/ICR system then interprets the images, turning images of handwritten or printed characters into ASCII data (machine-readable characters).
- Users can scan forms without running OCR
- Speeds range from 85 to 160 sheets/min (depending on the recognition engine)
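
The quoted speed range translates directly into scanning time for a given workload. A rough sketch, assuming continuous operation with no allowance for jams, batch changes or rescans; the one-million-sheet workload is a hypothetical figure, not one from the presentation.

```python
def scanning_hours(total_sheets, sheets_per_minute, scanners=1):
    """Hours of pure scanning time for a workload at a given rated speed."""
    if sheets_per_minute <= 0 or scanners <= 0:
        raise ValueError("speed and scanner count must be positive")
    minutes = total_sheets / (sheets_per_minute * scanners)
    return minutes / 60


# Hypothetical workload of 1,000,000 sheets at both ends of the quoted range.
for speed in (85, 160):
    print(speed, round(scanning_hours(1_000_000, speed), 1))
# 85 196.1
# 160 104.2
```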

10 OCR/ICR Storage Characteristics
- Storage/Retrieval
  - Images are scanned, stored and maintained electronically
  - There is no need to store the paper forms as long as the electronic files are safeguarded
  - With OCR/ICR technologies, images can be scanned, indexed, and written to optical media

11 Ideal OCR/ICR Accuracy Thresholds
- Accuracy:
  - The accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR in perfect tuning (~99.5%)
  - Up to 99.9% accuracy with editing (like OMR)
- The recognition engine must be tuned, tested and validated very carefully
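
To make the difference between 99.5% and 99.9% concrete, the expected number of wrong characters can be computed for a given volume. A small sketch, assuming accuracy is measured per character and errors are independent; the one-million-character volume is hypothetical.

```python
def expected_errors(characters, accuracy):
    """Expected number of misrecognized characters at a per-character accuracy."""
    return characters * (1.0 - accuracy)


# Per 1,000,000 captured characters:
print(round(expected_errors(1_000_000, 0.995)))  # 5000
print(round(expected_errors(1_000_000, 0.999)))  # 1000
```

Under these assumptions, moving from 99.5% to 99.9% removes four out of every five remaining errors.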

12 OCR/ICR Advantages
- Recognition engines used with imaging can capture highly specialized data sets
- OCR/ICR recognizes machine-printed or hand-printed characters
- Scanning and recognition allow efficient management and planning for the rest of the processing workload
- Quick retrieval for editing and reprocessing

13 OCR/ICR Disadvantages
- Technology is costly
- May require significant manual intervention
- Additional workload for data collectors
- ICR has severe limitations when it comes to human handwriting:
  - Characters must be hand-printed or machine-printed as separate characters in boxes
  - Ineffective when dealing with cursive characters

14 OMR-OCR/ICR Compared

15 OCR/ICR Challenges/Issues
- Has issues corresponding to those of OMR
- Algorithm development (preparation of memory dictionary)
- Processing time considerations due to the recognition engine
- Development costs

16 Definition/Concept of IR
State-of-the-art recognition technology
- Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters
- Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image
- The ability to capture cursive makes this method unique

17 Definition/Concept of IR
- Eight elements make up the trajectories of all cursive letters (figure 1)
Photo: Parascript LLC

18 Definition/Concept of IR
- Intelligent Recognition dynamically uses context
- Context is used during the recognition process, improving the accuracy of results
- Context helps to identify letters where the symbol segmentation of an image is ambiguous
Photo: Parascript LLC
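
The role of context can be illustrated with a lexicon lookup: instead of picking the best guess for each character in isolation, the recognizer picks the dictionary word that best fits all the per-character guesses together. This is a simplified sketch, not Parascript's method. It assumes the segmentation into characters is already fixed and only the identity of each character is uncertain; the candidate probabilities and the lexicon are invented for illustration.

```python
import math


def best_word(candidates, lexicon):
    """Pick the lexicon word with the highest combined character probability.

    candidates: one dict per character position, mapping character -> probability.
    Returns None if no lexicon word is compatible with the candidates.
    """
    best, best_score = None, -math.inf
    for word in lexicon:
        if len(word) != len(candidates):
            continue
        score = 0.0
        for ch, options in zip(word, candidates):
            p = options.get(ch, 0.0)
            if p == 0.0:
                score = -math.inf  # word contains a character the engine ruled out
                break
            score += math.log(p)   # sum of logs = log of the product
        if score > best_score:
            best, best_score = word, score
    return best


# The second character is a toss-up between the letter O and the digit 0.
candidates = [
    {"D": 0.6, "O": 0.4},
    {"O": 0.5, "0": 0.5},
    {"H": 0.9, "N": 0.1},
    {"A": 0.8, "R": 0.2},
]
print(best_word(candidates, ["DOHA", "DONA", "OMAN"]))  # DOHA
```

Without the lexicon, the second position could not be resolved; with it, the digit reading is rejected because no listed word contains it.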

19 Technology Evolution
TEXT STYLES: Cursive; Bad quality machine print; Unconstrained Handprint; Constrained Handprint; Machine Print
FORM TYPES:
- No special form design: no constraining boxes or combs; condensed strings; dirty & noisy forms; bad quality paper; legacy forms
- Specially designed for automatic recognition: constraining boxes or combs; drop out ink for preprinted text & boxes
TECHNOLOGY EVOLUTION: OCR; ICR; Intelligent Recognition
Illustration: Conference on Technology Options for 2011 Census

20 Major Commercial Suppliers
- Top Image Systems (TIS) (http://www.topimagesystems.com)
- ReadSoft (http://www.readsoft.com)
- Teleform (http://www.intelliscan.com/TeleForm1.htm)
- Scanner Suppliers: Fujitsu, Canon, Bell & Howell, Kodak

21 THANK YOU!

