Projects CS 661. DAS 02, Princeton, NJ OCR Features and Systems –Degradation models, script ID, Bilingual OCR, Kannada OCR, Tamil OCR, mp versus hw checks,

1 Projects CS 661

2 DAS 02, Princeton, NJ
OCR Features and Systems – Degradation models, script ID, Bilingual OCR, Kannada OCR, Tamil OCR, machine-printed (mp) versus handwritten (hw) checks, traffic ticket reading
Handwriting Recognition – Stochastic models, holistic methods, Japanese OCR
Classifiers and Learning – Multi-classifier systems
Layout Analysis – Skew correction, geometric methods, text/graphics separation, logical labeling
Tables and Forms – Detecting tables in HTML documents, use of graph grammars, semantics
Text Extraction
Indexing and Retrieval
Document Engineering
New Applications – CAPTCHA, Tachograph chart system, accessing driving directions

3 ICDAR 03, Edinburgh, UK
Multiple Classifiers
Postal Automation and Check Processing
Document Understanding
HMM Classifiers
Segmentation
Character Recognition
Graphics Recognition
Non-Latin Alphabets – Kanji/Chinese, Korean/Hangul, Arabic/Indian
Web Documents, Video
Word Recognition
Image Processing
Writer Identification
Forms and Tables

4 Project Assignments
Faisal Farooq – Multilingual Digital Library: indexing, retrieval, script discrimination
Swapnil Khedekar – Multilingual document layout analysis, OCR
Kompalli Surya – Multilingual OCR using HMMs
Lei Hansheng – Off-line and on-line handwriting integration and matching
Sumit Manocha – Fingerprint image enhancement and minutiae extraction
Lin Yu-Hsuan ** – Multiple classifier combination: multiple modalities
Praveer Mansukhani – Interactive Handwriting Recognition Model
Amalia Rusu – Handwritten CAPTCHAs
Sutanto Adi ** – Indirect biometric data extraction from medical forms

5 Multilingual Digital Library

6 [Screenshot panels: Query Result, Control Panel, Query Input]
Telugu and Arabic modules under development

7 Multilingual DIA and OCR

8 Text/Image Separation
[Figure label: Intervals between peaks]
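One common way to use an "intervals between peaks" cue is a horizontal projection profile. Text regions produce a train of peaks, one per text line, at nearly regular intervals, while pictures generally do not. The sketch below assumes a binary image given as a list of 0/1 rows. The function names and the 20% tolerance are illustrative assumptions, not the project's actual method.

```python
def row_profile(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]

def line_bands(profile):
    """Return (start, end) row ranges where the profile is non-zero."""
    bands, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands

def peak_intervals(image):
    """Distances between the centres of consecutive profile bands (peaks)."""
    centres = [(a + b) / 2 for a, b in line_bands(row_profile(image))]
    return [b - a for a, b in zip(centres, centres[1:])]

def looks_like_text(image, tolerance=0.2):
    """Heuristic (assumed): text gives >= 2 nearly equal peak intervals."""
    gaps = peak_intervals(image)
    if len(gaps) < 2:
        return False
    mean = sum(gaps) / len(gaps)
    return all(abs(g - mean) <= tolerance * mean for g in gaps)
```

A solid picture block yields a single wide band and no intervals, so it is rejected. Three evenly spaced line bands are accepted.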

9 Line Separation
Problem: ascenders and descenders interfere with neighbouring lines
Region-growing approach: in Devanagari, a single word is a single connected component, so regions are grown from horizontally adjacent components
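The region-growing idea can be sketched as follows, under some assumptions. Connected components have already been extracted as bounding boxes `(x0, y0, x1, y1)`. A component joins a growing line region when it overlaps that region vertically and starts within a hypothetical `max_gap` pixels of its right edge. This greedy left-to-right merge is an illustration, not the project's algorithm.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates

def v_overlap(a: Box, b: Box) -> int:
    """Number of rows shared by two boxes (<= 0 means no vertical overlap)."""
    return min(a[3], b[3]) - max(a[1], b[1]) + 1

def grow_lines(boxes: List[Box], max_gap: int) -> List[Box]:
    """Greedily merge component boxes into line regions, scanning left to right."""
    regions: List[Box] = []
    for box in sorted(boxes):  # tuples sort by x0 first
        for i, r in enumerate(regions):
            if v_overlap(box, r) > 0 and box[0] - r[2] <= max_gap:
                # Grow the region to cover the new component.
                regions[i] = (min(r[0], box[0]), min(r[1], box[1]),
                              max(r[2], box[2]), max(r[3], box[3]))
                break
        else:
            regions.append(box)  # no adjacent region: start a new line
    return regions
```

Because regions grow vertically as they absorb components, a tall ascender or descender can still bridge two lines. That is the interference problem the slide points out, and a real system would need an extra check for it.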

10 Word Separation
In Devanagari, all characters in a word are joined by the Shirorekha (headline), so the vertical projection profile easily separates words
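Here is a minimal sketch of word separation by vertical projection profile. It assumes a binary line image (list of 0/1 rows) and a hypothetical `min_gap` parameter, the number of empty columns that counts as a word break. Since the Shirorekha links the letters of a word, empty columns are expected mainly between words.

```python
def column_profile(line_img):
    """Count foreground pixels in each column of a binary text-line image."""
    return [sum(col) for col in zip(*line_img)]

def word_spans(line_img, min_gap=1):
    """Return (start, end) column spans of words.

    A run of at least min_gap empty columns between inked columns ends a word.
    """
    spans = []
    start = end = None
    for x, v in enumerate(column_profile(line_img)):
        if not v:
            continue
        if start is None:
            start = end = x
        elif x - end - 1 >= min_gap:  # blank run between end and x is wide enough
            spans.append((start, end))
            start = end = x
        else:
            end = x
    if start is not None:
        spans.append((start, end))
    return spans
```

In the example below, row 0 plays the role of the headline joining each word's letters.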

11 Multilingual OCR using HMMs

12 Continuous Attributes
Attributes per grapheme: position (pos), orientation, angle
Examples: down cusp (pos 3.0, angle -90°), up loop, down arc

13 Stochastic Model

14 Observations
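Slides 11 to 14 model handwriting with HMMs over sequences of grapheme observations. As an illustration, the forward algorithm below scores an observation sequence against one discrete HMM. The two-state left-to-right model, its probabilities, and the grapheme labels are made-up assumptions. The continuous attributes on slide 12 (pos, angle) would call for continuous emission densities, which this sketch leaves out.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """P(obs | model) for a discrete HMM, via the forward algorithm.

    start_p[s], trans_p[r][s] and emit_p[s][o] are probabilities; missing
    entries are treated as 0.
    """
    alpha = {s: start_p.get(s, 0.0) * emit_p[s].get(obs[0], 0.0) for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r].get(s, 0.0) for r in states)
            * emit_p[s].get(o, 0.0)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state, left-to-right model over grapheme labels.
STATES = ["s1", "s2"]
START = {"s1": 1.0, "s2": 0.0}
TRANS = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s2": 1.0}}
EMIT = {
    "s1": {"up loop": 0.8, "down cusp": 0.2},
    "s2": {"up loop": 0.1, "down cusp": 0.9},
}

p = forward(["up loop", "down cusp"], STATES, START, TRANS, EMIT)
```

In a recognizer, one such model per character or word would be scored and the highest-probability model taken as the answer.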

15 Integrating Online and Offline Handwriting Recognition

16 Structural Features
[Figure: BAG with labeled features: junction, loops, loop, turns, end]

17 Feature Extraction and Ordering
Critical node: a node whose removal disconnects a connected component
2-degree critical nodes keep the feature ordering from left to right
[Figure labels: Left Component, Right Component; features: loop, end, turns, junction, loops, end, turns]
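In graph terms, a "critical node" as defined on slide 17 is an articulation point (cut vertex). The sketch finds articulation points with Tarjan's DFS low-link method and then keeps those of degree 2, which split a component into a left part and a right part. It assumes the graph is given as an undirected edge list with hypothetical node names. It is not the project's own code.

```python
from collections import defaultdict

def articulation_points(edges):
    """Return (cut_vertices, adjacency) for an undirected graph.

    A cut vertex is one whose removal disconnects its connected component.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    disc, low, cut = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[u]:
            if w == parent:
                continue
            if w in disc:                      # already visited: back edge
                low[u] = min(low[u], disc[w])
            else:                              # tree edge
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if parent is not None and low[w] >= disc[u]:
                    cut.add(u)
        if parent is None and children > 1:    # DFS root rule
            cut.add(u)

    for u in list(adj):
        if u not in disc:
            dfs(u, None)
    return cut, adj

def degree2_critical_nodes(edges):
    """Critical nodes with exactly two neighbours (one on each side)."""
    cut, adj = articulation_points(edges)
    return {u for u in cut if len(adj[u]) == 2}

# Two loops (a-b-c and e-f-g) joined by a short stroke through d.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("g", "e")]
```

Here `c`, `d` and `e` are all critical, but only `d` has degree 2. Removing `d` leaves the left loop and the right loop, whose order can then be kept.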

18 Fingerprint Enhancement and Feature Extraction

19 Fingerprint Recognition Orientation maps and minutiae detection
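For the minutiae-detection step, a widely used technique is the crossing number on a thinned (one-pixel-wide) ridge skeleton. A crossing number of 1 marks a ridge ending and 3 marks a bifurcation. This is a swapped-in standard method, not necessarily the one used in the project. The sketch assumes a binary skeleton as a list of 0/1 rows and does not cover orientation-map estimation.

```python
# 8-neighbour offsets in circular order around a pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def crossing_number(skel, r, c):
    """Half the number of 0/1 transitions around pixel (r, c)."""
    p = [skel[r + dr][c + dc] for dr, dc in OFFSETS]
    return sum(abs(p[i] - p[(i + 1) % 8]) for i in range(8)) // 2

def minutiae(skel):
    """Return (row, col, kind) for ridge endings (CN=1) and bifurcations (CN=3)."""
    found = []
    for r in range(1, len(skel) - 1):           # skip the image border
        for c in range(1, len(skel[0]) - 1):
            if skel[r][c]:
                cn = crossing_number(skel, r, c)
                if cn == 1:
                    found.append((r, c, "ending"))
                elif cn == 3:
                    found.append((r, c, "bifurcation"))
    return found
```

A straight ridge gives two endings. A T-shaped ridge adds a bifurcation at the junction.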

20 Preprocessing Operations
Filtering
Image Enhancement
Image Segmentation
Correlation among fingers
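For the image segmentation step, one simple approach separates the fingerprint area from the background by gray-level variance per block. Ridge areas alternate between dark and light, while background is nearly flat. The block size and threshold here are illustrative assumptions, and the method stands in for whatever the project actually used.

```python
def segment_blocks(img, block=8, threshold=100.0):
    """Return a mask with one boolean per block x block tile: True where the
    tile's gray-level variance exceeds threshold (treated as foreground)."""
    rows, cols = len(img), len(img[0])
    mask = []
    for r0 in range(0, rows, block):
        mask_row = []
        for c0 in range(0, cols, block):
            vals = [img[r][c]
                    for r in range(r0, min(r0 + block, rows))
                    for c in range(c0, min(c0 + block, cols))]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            mask_row.append(var > threshold)
        mask.append(mask_row)
    return mask
```

For example, an 8x16 image whose left half has alternating dark and light rows (ridges) and whose right half is a flat gray gives the mask `[[True, False]]`.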

21 Multiple Classifier Systems

22 Combination and Dynamic Selection [Govindaraju and Ianakiev, MCS 2000]
[Figure labels: WR 1, WR 2, WR 3, +, Lexicon 1, Top 5, <55, Top 50, image]
Optimization problem: combinatorial explosion in the arrangement of recognizers and lexicon reduction levels
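The arrangement problem can be made concrete with a serial cascade. Each word recognizer ranks the current lexicon and passes only its top-k words to the next stage. The recognizer interface and the two toy recognizers are hypothetical, and this is only in the spirit of the cited work, not its exact scheme.

```python
from typing import Callable, List, Sequence, Tuple

# A recognizer scores every candidate word for an image (higher = better).
Recognizer = Callable[[object, Sequence[str]], List[Tuple[str, float]]]

def cascade(image, lexicon: Sequence[str],
            stages: List[Tuple[Recognizer, int]]) -> List[str]:
    """Run recognizers in series; each stage keeps only its top-k words."""
    candidates = list(lexicon)
    for recognize, k in stages:
        ranked = sorted(recognize(image, candidates),
                        key=lambda ws: ws[1], reverse=True)
        candidates = [word for word, _ in ranked[:k]]
    return candidates

# Toy recognizers (hypothetical): the "image" is simply the true word string.
def length_recognizer(image, words):
    return [(w, -abs(len(w) - len(image))) for w in words]

def position_recognizer(image, words):
    return [(w, sum(a == b for a, b in zip(w, image))) for w in words]

LEXICON = ["me", "memo", "memory", "mellon", "memoirs", "table"]
result = cascade("memory", LEXICON,
                 [(length_recognizer, 3), (position_recognizer, 1)])
```

Each choice of recognizer order and of k per stage gives a different cascade. The number of such configurations grows quickly with the number of recognizers and reduction levels, which is the combinatorial explosion the slide refers to.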

23 Lexicon Density [Govindaraju, Slavik, and Xue, IEEE PAMI 2002]
Lexicon 1    Lexicon 2
Me           Me
He           Memo
So           Memory
To           Memoirs
In           Mellon
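To show why the two lexicons differ in difficulty, here is a simple density proxy: the reciprocal of the mean length-normalised edit distance over all word pairs. This proxy is an illustrative assumption and is not the definition from the cited paper. It also assumes the lexicon entries are distinct.

```python
from itertools import combinations

def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def density_proxy(lexicon):
    """Reciprocal of the mean length-normalised edit distance over all pairs;
    higher means the entries look more alike."""
    pairs = list(combinations(lexicon, 2))
    mean = sum(levenshtein(a, b) / max(len(a), len(b))
               for a, b in pairs) / len(pairs)
    return 1.0 / mean

LEXICON_1 = ["Me", "He", "So", "To", "In"]
LEXICON_2 = ["Me", "Memo", "Memory", "Memoirs", "Mellon"]
```

Under this proxy, Lexicon 2 scores as denser than Lexicon 1, because most of its entries share the prefix "Me".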

24 Interactive Handwriting Recognition

25 Handwriting Recognition
[Figure labels: Context, Ranked Lexicon]

26 Multiple Choice Question
[Figure labels: Context, Ranked Lexicon]

27 Interactive Models [McClelland and Rumelhart, Psychological Review, 1981]
[Figure: levels Words, Letters, Features; word nodes ABLE, TRIP, TRAP; letter nodes A, T, N]
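The toy sketch below conveys the interactive idea: letters excite consistent words, and words feed activation back to their letters, so word context can resolve an ambiguous letter. It is a heavy simplification with no resting levels, decay, or activation bounds. It is not the McClelland and Rumelhart model or its parameters. Per-position letter evidence stands in for the feature level, and the excitation and inhibition weights are assumptions.

```python
def interactive_activation(words, evidence, steps=5, excite=0.1, inhibit=0.1):
    """Toy word-letter interaction.

    words:    equal-length candidate words, e.g. ["ABLE", "TRIP", "TRAP"]
    evidence: one dict per letter position, {letter: bottom-up strength}
    Returns (letter activations per position, word activations).
    """
    if any(len(w) != len(evidence) for w in words):
        raise ValueError("every word needs one letter per evidence position")
    letters = [dict(e) for e in evidence]
    word_act = {w: 0.0 for w in words}
    for _ in range(steps):
        # Bottom-up: consistent letters excite a word, competing letters inhibit it.
        for w in words:
            support = sum(letters[i].get(ch, 0.0) for i, ch in enumerate(w))
            conflict = sum(a for i, ch in enumerate(w)
                           for other, a in letters[i].items() if other != ch)
            word_act[w] = max(0.0, support - inhibit * conflict)
        # Top-down: each letter gets feedback from words containing it there.
        for i, e in enumerate(evidence):
            for ch in set(e) | {w[i] for w in words}:
                feedback = sum(a for w, a in word_act.items() if w[i] == ch)
                letters[i][ch] = e.get(ch, 0.0) + excite * feedback
    return letters, word_act

# Third letter is ambiguous between A and N; no candidate word has N there.
letters, word_scores = interactive_activation(
    ["ABLE", "TRIP", "TRAP"],
    [{"T": 1.0}, {"R": 1.0}, {"A": 0.5, "N": 0.5}, {"P": 1.0}],
)
```

After a few iterations, feedback from TRAP lifts A above N in the third position, and TRAP becomes the most active word.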

28 Handwritten CAPTCHAs

29 “CAPTCHAs”: Completely Automated Public Turing Tests to Tell Computers & Humans Apart
– challenges can be generated & graded automatically (i.e. the judge is a machine)
– accepts virtually all humans, quickly & easily
– rejects virtually all machines
– resists automatic attack for many years (even assuming that its algorithms are known?)
NOTE: the machine administers, but cannot pass the test!
L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003 [to appear].

30 Yahoo!’s present CAPTCHA: “EZ-Gimpy”
– Randomly picks one English word, deformations, degradations, occlusions, colored backgrounds, etc.
– Better tolerated by users
– Now used on a large scale to protect various services
– Weaknesses: a single typeface, an English lexicon
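The "generated and graded automatically" property can be sketched without any rendering. The machine randomly picks a word and distortion settings, keeps the answer, and grades the response itself. The word list, parameter names, and ranges are all hypothetical. This is not how EZ-Gimpy is implemented, and drawing the distorted image is not shown.

```python
import random
from dataclasses import dataclass

# Hypothetical stand-ins: a tiny word list and background palette.
WORDS = ["memory", "mellon", "trap", "able"]
BACKGROUNDS = ["white", "yellow", "lightblue", "pink"]

@dataclass
class Challenge:
    answer: str       # the word to be rendered
    warp: float       # deformation strength in [0.2, 0.8] (illustrative scale)
    noise: float      # degradation level in [0.0, 0.5] (illustrative scale)
    occlusions: int   # number of occluding strokes, 0 to 3
    background: str   # background colour name

def make_challenge(rng: random.Random) -> Challenge:
    """Randomly pick a word plus distortion settings; rendering is not shown."""
    return Challenge(
        answer=rng.choice(WORDS),
        warp=rng.uniform(0.2, 0.8),
        noise=rng.uniform(0.0, 0.5),
        occlusions=rng.randint(0, 3),
        background=rng.choice(BACKGROUNDS),
    )

def grade(challenge: Challenge, response: str) -> bool:
    """Machine grading: accept the response if it matches the hidden word."""
    return response.strip().lower() == challenge.answer.lower()
```

Drawing every answer from one small lexicon is the kind of weakness the slide names: an attacker can match against the word list instead of reading the image.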

31 Indirect Biometrics from Medical Forms Images

32 The Biometrics Spectrum
Hard biometrics: Face, Eye (Retina & Iris), Fingerprint, Hand Geometry, Handwriting, Speech, DNA
Soft biometrics: Age, Ethnicity, Nationality, Build, Gait, Mannerisms, Writing style (Semantic)
Derived biometrics: Text/News, WWW
Indirect biometrics: Driver’s License, Medical Records, INS Forms
Fields: PR, Statistics, NLP, Cog Sc, Ontology, Anthropometry, Sociology, Vision, Digital Lib, AI, Law
Biometric Consortium (www.biometrics.org) lists several products:
– Faces (30); Fingerprints (50); Hand geometry (30); Handwriting (5); Iris (5); Multimodal (6); Retinal (2); Vein (3); Voice (22); Other (20)
– NONE on soft biometrics
– NONE on the fusion of indirect and derived biometrics

33 NYS EMS PCR Form
[Image: NYS PCR example]
Thousands are filed each day and passed from EMS to the hospital.
PCR purpose:
– Medical care/diagnosis
– Legal documentation
– Quality assurance
EMS abbreviations:
COPD: Chronic Obstructive Pulmonary Disease
CHF: Congestive Heart Failure
D/S: Dextrose in Saline
PID: Pelvic Inflammatory Disease
GSW: Gunshot Wound
NKA: No known allergies
KVO: Keep vein open
NaCl: Sodium Chloride
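A first step in mining recognized PCR text is often normalising abbreviations like those listed on slide 33. The sketch expands them with a single regular expression. It assumes the recognized text is a plain string and that matching is case-insensitive. The dictionary holds only the slide's examples, and the approach is illustrative, not the project's pipeline.

```python
import re

# Abbreviations listed on the slide, keyed in upper case.
EMS_ABBREVIATIONS = {
    "COPD": "Chronic Obstructive Pulmonary Disease",
    "CHF": "Congestive Heart Failure",
    "D/S": "Dextrose in Saline",
    "PID": "Pelvic Inflammatory Disease",
    "GSW": "Gunshot Wound",
    "NKA": "No known allergies",
    "KVO": "Keep vein open",
    "NACL": "Sodium Chloride",
}

# Longest keys first; the look-arounds stop matches inside longer tokens.
_PATTERN = re.compile(
    r"(?<![\w/])("
    + "|".join(re.escape(k) for k in
               sorted(EMS_ABBREVIATIONS, key=len, reverse=True))
    + r")(?![\w/])",
    re.IGNORECASE,
)

def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation in recognized text with its expansion."""
    return _PATTERN.sub(lambda m: EMS_ABBREVIATIONS[m.group(1).upper()], text)
```

Case-insensitive matching tolerates OCR case errors such as "NaCL" versus "NaCl". It can also expand an ordinary word that happens to spell an abbreviation, so a real system would need context checks.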

34 Medical Text Recognition and Data Mining

