1 UC Berkeley CS294-9, Fall 2000
Document Image Analysis
Lecture 11: Word Recognition and Segmentation
Richard J. Fateman, Henry S. Baird
University of California – Berkeley; Xerox Palo Alto Research Center

2 The course so far…
DIA overview, objectives, measuring success
Isolated-symbol recognition:
–Symbols/glyphs, models/features/classifiers
–image metrics, scaling up to 100 fonts of full ASCII
–last 2 lectures: no single ‘best’ classifier dominates, but: voting helps; combinations of randomized features/classifiers!

3 Recall: we can often spot words when characters are unclear…
Crude segmentation into columns, paragraphs, lines, words
–Bottom up, by smearing horiz/vert … or
–Top down, by recursive x-y cuts
What we really want is WORD recognition, most of the time.
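The top-down option can be sketched in a few lines. This is a minimal illustration of recursive x-y cuts on a binary image stored as a list of rows (1 = ink), not the lecture's own code. The names `split_runs`, `xy_cut`, and the `min_gap` threshold are assumptions made for the example.

```python
def split_runs(profile, min_gap):
    """Return (start, end) index ranges of ink separated by blank gaps >= min_gap."""
    runs, start, blank = [], None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            blank = 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:
                runs.append((start, i - blank + 1))
                start, blank = None, 0
    if start is not None:
        runs.append((start, len(profile) - blank))
    return runs


def xy_cut(img, top, bottom, left, right, min_gap=2, by_rows=True, stalled=False):
    """Recursively split a region into (top, bottom, left, right) blocks.

    Alternates between cutting along rows and along columns; a region is a
    leaf once neither direction yields a gap of at least min_gap (assumed rule).
    """
    if by_rows:
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
    else:
        profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
    boxes = []
    for a, b in split_runs(profile, min_gap):
        if by_rows:
            boxes.append((top + a, top + b, left, right))
        else:
            boxes.append((top, bottom, left + a, left + b))
    if len(boxes) == 1 and stalled:
        return boxes  # no cut found in either direction: a leaf block
    blocks = []
    for box in boxes:
        blocks.extend(xy_cut(img, *box, min_gap, not by_rows, len(boxes) == 1))
    return blocks


page = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
]
blocks = xy_cut(page, 0, len(page), 0, len(page[0]))
print(blocks)
```

On this toy page the first cut separates the top band from the bottom line, and the second cut splits the top band into two columns. Real pages need a gap threshold tuned to resolution and layout.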

4 Recall the scenario (lecture 9)
Lopresti & Zhou (1994)

5 The flow goes one way
No opportunity to correct failures in segmentation at symbol stage
No opportunity to object to implausible text at the next stage.
(providing alternative character choices gives limited flexibility)

6 Recall: Character-by-Character Voting Succeeds & Fails
Majority vote (the most commonly used method)
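A minimal sketch of character-by-character majority voting, assuming each recognizer returns a string and that the strings are already aligned position by position. The function name `vote_by_character` and the sample strings are illustrative assumptions, not taken from the slide's figure.

```python
from collections import Counter


def vote_by_character(outputs):
    """Majority vote at each character position over equal-length strings."""
    if len({len(s) for s in outputs}) != 1:
        # Recognizers disagree on segmentation, so positions do not line up.
        raise ValueError("outputs must be aligned strings of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*outputs))


# Succeeds: isolated substitution errors are outvoted.
print(vote_by_character(["word", "w0rd", "worb"]))  # word

# Fails: "rn" read as "m" changes the length, so there is nothing to vote on.
try:
    vote_by_character(["modern", "modem", "modem"])
except ValueError as err:
    print("cannot vote:", err)
```

The second case is one plausible way the scheme breaks: a segmentation disagreement, rather than a substitution error, leaves the votes with no common positions to compare.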

7 High accuracy requires some cleverness
In fact, some words, even in cleanly typeset text scanned at high resolution, have touching characters
In noisy or low-resolution images, adjacent characters may be nearly entirely touching or broken (or both touching and broken!)
If we accept the flowchart model: we need perfect segmentation to feed the symbol recognition module
If we reject the flowchart: OK, where do we go from here?

8 Compare alternative approaches
First clarify the word recognition problem and see how to approach it.
Next we see how good a job we can do on segmentation (a fall-back when we can’t use the word recognition model).
Robustness might require both approaches (multiple algorithms again!)

9 Formalize the word recognition problem (TKHo)
Machine printed, ordinary fonts (var. width)
Cut down on the variations
–NOT: A word is all in same font/size [shape = feature]
[we could trivialize task with one font, e.g. E-13B]
Known lexicon (say 100,000 English words)
26^6 is about 309 million; our lexicon is well under 0.3% of this (about 0.03%)
[trivialize with 1 item (check the box, say “yes”..)]
Applications in mind: post office, UNLV bakeoff
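The lexicon arithmetic on this slide can be checked directly. The variable names are illustrative, and comparing a lexicon of words of every length against six-letter strings alone is only a rough yardstick.

```python
lexicon_size = 100_000          # "say 100,000 English words"
six_letter_strings = 26 ** 6    # every possible string of six lowercase letters

fraction = lexicon_size / six_letter_strings
print(six_letter_strings)       # 308915776
print(f"{fraction:.4%}")        # 0.0324%
```

So the lexicon covers roughly 0.03% of the six-letter string space, comfortably inside the slide's "< 0.3%" bound. This sparsity is what makes lexicon constraints so powerful.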

10 Word Recognition: Objective

11 At Least Three Approaches

12 In reality, a combination:
Later we will find that additional processing, such as inter-word statistics or even natural language parsing, may be incorporated in the ranking.

13 Character Recognition Approach
Symbol recognition is done at the character level.
Contextual knowledge is used only at the ranking stage.

14 One error in character segmentation can distort many characters
Figure labels: Input word image; Character Segmentation; Segmented and normalized characters; Recognition decisions

15 How to segment words into characters?
Aspect ratio (fixed width, anyway)
Projection profile
Other tricks

16 Projection Profiles
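Since the slide's figure is not reproduced in this transcript, here is a minimal sketch of the idea: sum the ink in each pixel column of a word image and cut wherever the profile drops to zero. The image is a list of rows with 1 = ink. The function names and the cut-at-zero rule are assumptions for the example.

```python
def vertical_projection(img):
    """Count ink pixels in each column of a binary image (list of rows)."""
    return [sum(column) for column in zip(*img)]


def cut_at_gaps(profile):
    """Return (start, end) column ranges where the profile is non-zero."""
    segments, start = [], None
    for x, value in enumerate(profile):
        if value > 0 and start is None:
            start = x
        elif value == 0 and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments


word = [
    [1, 0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
]
profile = vertical_projection(word)
print(profile)               # [3, 0, 1, 2, 3, 0]
print(cut_at_gaps(profile))  # [(0, 1), (2, 5)]
```

Cutting only at zero columns works for well-separated characters. Touching characters leave no zero column, which is where the modified profiles and other tricks come in.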

17 Modified Projection profiles
“and” adjacent columns
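One reading of "and adjacent columns" is to AND each pixel column with its right-hand neighbour before counting ink, so the profile measures how much two neighbouring columns overlap vertically. The sketch below follows that reading. The function name `and_projection` and the toy image are assumptions, not the slide's figure.

```python
def and_projection(img):
    """profile[x] = number of rows where columns x and x+1 both contain ink."""
    columns = list(zip(*img))
    return [
        sum(a & b for a, b in zip(columns[x], columns[x + 1]))
        for x in range(len(columns) - 1)
    ]


# Two strokes whose edges meet only diagonally: every column contains ink,
# so the plain projection has no zero, but columns 1 and 2 share no row.
touching = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
plain = [sum(column) for column in zip(*touching)]
print(plain)                     # [3, 1, 1, 3]
print(and_projection(touching))  # [1, 0, 1]
```

The zero at index 1 suggests a cut between columns 1 and 2, a boundary the plain profile does not reveal.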

18 Poor images: confusing profiles

19 The argument for more context
Similar shapes in different contexts, in each case different characters, or parts of them.

20 Segmentation-based Approach
Segment the word into characters.
Extract the features from normalized character images.
Concatenate the feature vectors to form a word feature vector.
The character features are compared in the context of a word.
(Works if segmentation is easy but characters are difficult to recognize in isolation)
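The steps on this slide can be sketched as follows, assuming the word has already been segmented into size-normalized binary glyphs. Raw pixel values stand in for the features, and Hamming distance to per-word prototype vectors stands in for the comparison. Both are simplifying assumptions, as are all names and the tiny 2x2 glyphs.

```python
def char_features(glyph):
    """Flatten a normalized binary glyph (list of rows) into a feature list."""
    return [pixel for row in glyph for pixel in row]


def word_vector(glyphs):
    """Concatenate per-character features into one word feature vector."""
    vector = []
    for glyph in glyphs:
        vector.extend(char_features(glyph))
    return vector


def rank_lexicon(glyphs, prototypes):
    """Rank lexicon words of matching length by Hamming distance to the input."""
    target = word_vector(glyphs)
    scored = []
    for word, proto_glyphs in prototypes.items():
        if len(proto_glyphs) != len(glyphs):
            continue  # only compare words with the same number of characters
        candidate = word_vector(proto_glyphs)
        distance = sum(a != b for a, b in zip(target, candidate))
        scored.append((distance, word))
    return [word for _, word in sorted(scored)]


# Hypothetical 2x2 prototypes for three "characters".
A = [[1, 0], [1, 1]]
B = [[1, 1], [0, 1]]
C = [[0, 1], [1, 0]]
prototypes = {"ab": [A, B], "ac": [A, C], "abc": [A, B, C]}

noisy_b = [[1, 1], [0, 0]]  # a degraded "b"
print(rank_lexicon([A, noisy_b], prototypes))  # ['ab', 'ac']
```

The decision is made over the whole word vector, so a glyph that is ambiguous in isolation can still be resolved by its neighbours and the lexicon.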

21 Segmentation-based Word Recognition
Note that you would not have much chance to recognize these individual characters!

22 Word-shape Analysis Approach
Squeeze out extra white space, locate global reference lines (upper, top, base, bottom: Xxp)
TKH partitions a word into 40 cells: 4 vertical regions and 10 horizontal.
Some words have no descender or ascender regions: Hill
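A minimal sketch of mapping a pixel to one of the 40 cells. The slide does not say exactly how the four vertical regions are derived from the reference lines, so the region boundaries are passed in as a hypothetical `row_bounds` list of five increasing y-values. The ten horizontal divisions are assumed to be equal-width slices of the word.

```python
def cell_index(x, y, width, row_bounds, n_cols=10):
    """Map pixel (x, y) to a cell number in a 4 x 10 grid, numbered row by row.

    row_bounds: five increasing y-values delimiting the four vertical regions
    (an assumption; how they follow from the reference lines is not given).
    """
    col = min(x * n_cols // width, n_cols - 1)
    for region in range(len(row_bounds) - 1):
        if row_bounds[region] <= y < row_bounds[region + 1]:
            return region * n_cols + col
    raise ValueError("y lies outside the reference lines")


bounds = [0, 5, 10, 15, 20]               # hypothetical region boundaries
print(cell_index(0, 0, 50, bounds))       # 0  (top-left cell)
print(cell_index(25, 7, 50, bounds))      # 15 (second region, sixth column)
print(cell_index(49, 19, 50, bounds))     # 39 (bottom-right cell)
```

Features such as stroke directions or endpoints can then be tallied per cell, giving a fixed-length description of the word regardless of how many characters it contains.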

23 Word transformations

24 Detecting base, upper, top by smearing

25 The 40 area partitions

26 Stroke Directions

27 Edges, Endpoints

28 Cases Each Approach is Best At …

29 Most effective features?
Best: Defined locally, yet containing shape information: stroke vectors, Baird templates
Less effective: very high level “holes”; very low level “pixel values”
Uncertainty / partial matching is important (TK Ho)

30 TKHo’s experiments
Context: Zip code recognition
Redundancy check requires reading the whole address
33850 Postal words
Character recognizer trained on 19151 images
77 font samples were used to make prototypes

31 TKHo’s experiments
Five (10?) methods used in parallel
1. A fuzzy character template matcher plus heuristic contextual postprocessor
2. Six character recognizers
3. Segmentation-based word recognizer using pixel values
4. Word shape analyzer using strokes
5. Word shape analyzer using Baird templates

32 TKHo’s experiments
Many interesting conclusions..
1. If several methods agree, they are almost always (99.6%) correct or right on second choice (100%)
2. Classifiers can be dynamically selected
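The first conclusion suggests a simple decision rule: accept a word when enough recognizers agree on their top choice, and otherwise defer to further processing. The sketch below illustrates that rule only. The function name, the `min_agree` threshold, and the sample rankings are assumptions, not Ho's procedure.

```python
from collections import Counter


def agreed_top_choice(rankings, min_agree):
    """Return the word ranked first by at least min_agree recognizers, else None."""
    tops = Counter(ranking[0] for ranking in rankings if ranking)
    if not tops:
        return None
    word, votes = tops.most_common(1)[0]
    return word if votes >= min_agree else None


# Hypothetical ranked outputs from three word recognizers.
rankings = [
    ["main", "maim"],
    ["main", "rnain"],
    ["maim", "main"],
]
print(agreed_top_choice(rankings, min_agree=2))  # main
print(agreed_top_choice(rankings, min_agree=3))  # None
```

Returning `None` leaves room for the second conclusion: when the methods disagree, a different classifier or a second-choice check can be selected for that input.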

