UN Workshop on Data Capture, Dar es Salaam Session 7 Data Capture


1 UN Workshop on Data Capture, Dar es Salaam Session 7 Data Capture
Richard Lang, International Manager

2 Agenda
OCR: Optical Character Recognition
ICR: Intelligent Character Recognition
DFR: Dynamic Form Recognition
12/9/2018

3 OCR = optical character recognition
The technology was first invented in 1929, when Gustav Tauschek obtained a patent on OCR in Germany: a mechanical device that used templates.
The first commercial system was installed at Reader's Digest in 1955; years later it was donated to the Smithsonian Institution.
Today, recognition of machine-written text is considered largely a solved problem, with accuracy rates exceeding 99%.

4 OCR
Beta Systems has extensive experience with these recognition engines in banks.
Germany: OCR-A, with the special characters ⑁ (Chair), ⑀ (Hook) and ⑂ (Fork).
Austria: OCR-B, Plus.

5 ICR Intelligent Character Recognition
Handwriting recognition system.
The technique is far ahead of OCR because of the ongoing development of ICR.
Allows different styles of handwriting to be learned by a computer before or during processing, to improve accuracy and recognition rates.

6 ICR Process: Capturing the image with scanners
Processing by ICR and/or OCR.
Segmentation is a very important step: deciding whether regions that meet a homogeneity criterion belong to the foreground or to the background. Human editors can make that decision from the context.
Compare computed tomography, where the computer reconstructs the picture from X-ray measurements taken from different angles.
The first step yields only a suitable starting point (sets of pixels). The growing process then links all neighbouring pixels (computation of valleys and peaks with a high degree of confidence).
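The growing step described on this slide can be illustrated with a small seeded region-growing routine. This is only a sketch under stated assumptions: the function name `grow_region`, the 4-neighbour connectivity, the grey-value `tolerance` and the sample image are all hypothetical and are not taken from the presentation or from any particular engine.

```python
from collections import deque

def grow_region(image, seed, tolerance):
    """Collect all pixels connected to `seed` whose grey value differs
    from the seed's grey value by at most `tolerance` (4-neighbourhood)."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

# Hypothetical grey-scale patch: low values are ink, 255 is paper.
IMAGE = [
    [255, 255, 255, 255, 255],
    [255,  10,  20, 255, 255],
    [255, 255,  15, 255, 255],
    [255, 255,  12, 255,   5],
    [255, 255, 255, 255, 255],
]

stroke = grow_region(IMAGE, seed=(1, 1), tolerance=30)
print(sorted(stroke))  # [(1, 1), (1, 2), (2, 2), (3, 2)]
```

The dark pixel at (3, 4) is not linked to the stroke because it is not adjacent to any pixel already in the region, which is how an isolated speck stays in the background.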

7 ICR Process: Pre-processing
Deskew
Shift, rotate
Stretch

8 ICR Process: Enhance
Less / more contrast.
Clean up (de-noise, halftone removal) to enable the recognition engine to give the best results.
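Two of the enhancement steps named here, contrast adjustment and de-noising, can be sketched in a few lines. The helpers `stretch_contrast` and `remove_specks` are hypothetical illustrations (a linear contrast stretch and removal of isolated ink pixels); the slide does not say which filters the actual software applies, and halftone removal is not covered.

```python
def stretch_contrast(image):
    """Linearly rescale grey values so the darkest pixel becomes 0
    and the brightest becomes 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]

def remove_specks(binary):
    """Clear ink pixels (1) that have no ink pixel among their 8 neighbours."""
    rows, cols = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1:
                ink_in_window = sum(
                    binary[nr][nc]
                    for nr in range(max(r - 1, 0), min(r + 2, rows))
                    for nc in range(max(c - 1, 0), min(c + 2, cols))
                )
                if ink_in_window == 1:  # only the pixel itself
                    cleaned[r][c] = 0
    return cleaned

print(stretch_contrast([[100, 150], [125, 200]]))  # [[0, 127], [63, 255]]
print(remove_specks([[1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]))
# [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
```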

9 ICR Process: Feature extraction
Data reduction
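One common way to read "feature extraction as data reduction" is zoning: the character bitmap is split into a few zones and only the ink density of each zone is kept. The sketch below assumes a square binary bitmap whose side is divisible by the number of zones; `zone_features` and the sample bitmap `ONE` are hypothetical and are not the features used by the presenter's engine.

```python
def zone_features(bitmap, zones_per_side):
    """Reduce a square binary bitmap to one ink-density value per zone."""
    step = len(bitmap) // zones_per_side
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            ink = sum(
                bitmap[r][c]
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            )
            features.append(ink / (step * step))
    return features

# Hypothetical 4x4 bitmap of a handwritten "1" (1 = ink).
ONE = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]

print(zone_features(ONE, zones_per_side=2))  # [0.25, 0.5, 0.0, 0.5]
```

Here 16 pixels are reduced to 4 numbers, which is the kind of compact vector a classifier can then compare.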

10 ICR Process: Classification
A one was written.
90 % = 1
8 % = 7
2 % = 4
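The ranked result on this slide can be reproduced by normalising raw classifier scores into percentages. This is a minimal sketch: the raw scores, the function names and the `reject_below` threshold are assumptions chosen to match the slide's numbers, not output from a real engine.

```python
def rank_candidates(scores):
    """Turn raw per-character scores into (character, percent) pairs,
    best candidate first."""
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(char, 100 * score / total) for char, score in ranked]

def accept_or_reject(ranked, reject_below=80.0):
    """Return the best character, or None if its confidence is too low
    and the character should go to manual correction."""
    best_char, best_confidence = ranked[0]
    return best_char if best_confidence >= reject_below else None

# Hypothetical raw scores for a handwritten "1".
RAW_SCORES = {"7": 4, "1": 45, "4": 1}

ranked = rank_candidates(RAW_SCORES)
print(ranked)                    # [('1', 90.0), ('7', 8.0), ('4', 2.0)]
print(accept_or_reject(ranked))  # 1
```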

11 ICR Algorithm
Neural network, using:
kNN (k-nearest neighbour)
SVM (support vector machine): SVMs simultaneously minimize the empirical classification error and maximize the geometric margin; hence they are also known as maximum margin classifiers.
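Of the classifiers listed, k-nearest neighbour is the simplest to show in full. The sketch below is a plain-Python illustration, not the presenter's implementation: the training vectors are invented four-value feature vectors, and `knn_classify` is a hypothetical helper that takes a majority vote among the `k` closest samples by Euclidean distance.

```python
import math
from collections import Counter

def knn_classify(samples, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented training data: (feature vector, label).
TRAINING = [
    ((0.25, 0.5, 0.0, 0.5), "1"),
    ((0.20, 0.5, 0.1, 0.5), "1"),
    ((0.30, 0.6, 0.0, 0.4), "1"),
    ((0.50, 0.5, 0.0, 0.5), "7"),
    ((0.60, 0.6, 0.0, 0.4), "7"),
    ((0.50, 0.5, 0.5, 0.5), "4"),
]

print(knn_classify(TRAINING, (0.25, 0.55, 0.05, 0.5)))  # 1
print(knn_classify(TRAINING, (0.55, 0.55, 0.0, 0.45)))  # 7
```

An SVM would replace the distance vote with a learned maximum-margin decision boundary; that is not shown here.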

12 ICR Process
After the different classification alternatives have been evaluated, the appropriate confidence is provided.
Recognition limitation: restrict the engine to the most probable characters. For example, if only the characters 3, 6 and 0 are possible, the engine can be limited to this set and the results are much better.
Voting machine.
Usability: security, efficiency and accuracy.
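Limiting recognition to a permitted character set and voting between engines can both be sketched briefly. The confidences, the function names `restrict` and `vote`, and the `min_agree` rule are assumptions for illustration; the slide does not describe how the actual voting machine combines results.

```python
from collections import Counter

def restrict(confidences, allowed):
    """Drop characters outside `allowed` and rescale the rest to 100 %."""
    kept = {ch: c for ch, c in confidences.items() if ch in allowed}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {ch: 100 * c / total for ch, c in kept.items()}

def vote(results, min_agree=2):
    """Return the character most engines agree on, or None (reject)
    if fewer than `min_agree` engines agree."""
    char, count = Counter(results).most_common(1)[0]
    return char if count >= min_agree else None

# Hypothetical confidences: unrestricted, the engine would answer "8".
confidences = {"8": 40, "3": 30, "6": 20, "0": 10}
restricted = restrict(confidences, allowed={"3", "6", "0"})

print(max(restricted, key=restricted.get))  # 3
print(vote(["3", "3", "8"]))                # 3
print(vote(["3", "8", "6"]))                # None
```

In this example the field can only contain 3, 6 or 0, so the wrong top candidate "8" is excluded before the decision is made.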

13 Dynamic Field Recognition
No fixed position is required.
If only half of the form is available, that half is still readable.
No special forms are required.
No timing tracks are necessary on the forms for OMR, yet OMR results are available at the same time.
No cleaning of LEDs in the scanner is necessary.
Robust against vertical / horizontal stretching or shrinking (e.g. from different printers).

14 Dynamic Field Recognition
Recognizes: features (word as pixel cloud), boxes, lines and symbols.

15 Hardware / Software Requirements
Hardware:
Scanner
PC
Network
Disc storage (only necessary if images are needed for audit purposes)
Software:
Scan software
One recognition and voting software for OMR, OCR, ICR and barcode

16 Cost Comparatives in general
OMR cost comparatives in general: OMR from image | dedicated OMR scanner
Forms design: same
Forms production: - | up to 50% more
Enumerator training: up to double the cost
Scanners
PC: low-cost PC | PC
Operators
Servers
Cost of more/new flexibility: low | high

17 ICR Advantages
Better than:
Manual keying: 90 % (plus) correct keys. Manual keying has a higher substitution rate than automated recognition, is time-consuming, and allows deliberate manipulation.
OMR, because OMR is space-consuming.
OCR, because OCR reads only machine-written text and is therefore of limited use.

18 ICR Advantages
Clear accuracy for OMR, because software removes dirt depending on the size and shape of the mark.
Can detect a line and can ignore dirt.
Clear result.

19 ICR Advantages
Barcode, OCR, OMR and ICR recognition with one software.

20 ICR Advantages
Pro:
Only rejected characters/fields need correction; the rest of the form is untouched.
Open to new technologies in the future: faster, better quality.
Standardized correction mode.
Handwriting of the corresponding country will be recognized.
The previously mentioned advantages do not have to be repeated here again.

21 Thank you for your attention

