(Off-Line) Cursive Word Recognition
Tal Steinherz, Tel-Aviv University
Cursive Word Recognition
Pipeline stages (diagram): Preprocessing → Segmentation → Feature Extraction → Recognition → Post-Processing
Preprocessing
–skew correction
–slant correction
–smoothing
–reference line finding
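Slant correction is typically implemented as a horizontal shear. Below is a minimal sketch, assuming the slant angle has already been estimated (for example, from the dominant direction of near-vertical stroke fragments); it illustrates the general idea rather than the specific method used in these slides.

```python
import numpy as np

def deslant(image: np.ndarray, slant_angle_rad: float) -> np.ndarray:
    """Shear a binary word image horizontally to remove the given slant."""
    h, w = image.shape
    shear = np.tan(slant_angle_rad)
    pad = int(np.ceil(abs(shear) * h))
    out = np.zeros((h, w + pad), dtype=image.dtype)
    for y in range(h):
        # Rows farther from the bottom (baseline) are shifted the most.
        dx = int(round(shear * (h - 1 - y)))
        if shear < 0:
            dx += pad
        out[y, dx:dx + w] = image[y]
    return out
```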
Segmentation Motivation
Given a 2-dimensional image and a model that expects a 1-dimensional input signal, one needs to derive an ordered list of features. Fragmentation is an alternative in which the resulting pieces have no literal (letter-level) meaning.
Segmentation Dilemma
To segment or not to segment? That is the question! Sayre’s paradox: “To recognize a letter, one must know where it starts and where it ends; to isolate a letter, one must recognize it first.”
Recognition Model
What is the basic (atomic) model?
–word (remains identical through training and recognition)
–letter (concatenated on demand during recognition)
What are the training implications?
–specific = total cover (several samples of each word)
–dynamic = brick cover (samples of various words that together include all possible letters)
Basic Word Model
(diagram) 1st letter sub-model → … → i-th letter sub-model → … → last letter sub-model
Segmentation-Free
In a segmentation-free approach, recognition is based on measuring the distance between observation sequences.
Segmentation-Free (cont.)
The most popular metric is Levenshtein’s edit distance, in which one sequence is transformed into another by atomic operations (insertion, deletion and substitution), each associated with a different cost. Implementations: dynamic programming, HMM.
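As an illustration of the dynamic-programming implementation, here is a minimal sketch of Levenshtein’s edit distance between two observation sequences; the unit costs are placeholders for the operation-specific costs mentioned above.

```python
import numpy as np

def edit_distance(a, b, ins=1.0, dele=1.0, sub=1.0) -> float:
    """Minimum cost of transforming sequence a into sequence b."""
    d = np.zeros((len(a) + 1, len(b) + 1))
    d[:, 0] = np.arange(len(a) + 1) * dele   # delete all of a
    d[0, :] = np.arange(len(b) + 1) * ins    # insert all of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i, j] = min(
                d[i - 1, j] + dele,                                   # deletion
                d[i, j - 1] + ins,                                    # insertion
                d[i - 1, j - 1] + (0 if a[i - 1] == b[j - 1] else sub))  # substitution
    return d[len(a), len(b)]
```

For example, edit_distance("kitten", "sitting") returns 3.0 (two substitutions and one insertion).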
Segmentation-Free (demo)
Each column was translated into a feature vector. Two types of features:
–number of zero-crossings
–gradient of the word’s curve
(Figure annotation) The gradient of the word’s curve at a given pixel column.
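A hedged sketch of extracting these two per-column features. Treating the word’s curve as the mean vertical ink position of each column is an assumption made for illustration; the original features may be defined differently.

```python
import numpy as np

def column_features(binary: np.ndarray) -> np.ndarray:
    """binary: 2-D {0,1} array, 1 = ink. Returns one feature vector per column."""
    b = binary.astype(np.int8)
    h, w = b.shape
    # Feature 1: ink/background transitions ("zero-crossings") per column.
    crossings = np.abs(np.diff(b, axis=0)).sum(axis=0)
    # Word curve: mean vertical ink position per column
    # (empty columns default to mid-height).
    counts = b.sum(axis=0)
    ys = (np.arange(h)[:, None] * b).sum(axis=0)
    curve = np.where(counts > 0, ys / np.maximum(counts, 1), h / 2.0)
    # Feature 2: finite-difference gradient of the word's curve.
    gradient = np.gradient(curve)
    return np.stack([crossings, gradient], axis=1)   # shape (w, 2)
```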
Letter sub-HMM components
(diagram) Normal transitions and null transitions
Letter sub-HMM
(diagram) Normal transitions and null transitions
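One common way to handle null transitions, sketched below, is to fold chains of non-emitting transitions into an effective transition matrix before running standard HMM inference. The 3-state left-to-right topology and the probabilities are illustrative only, not taken from the slides.

```python
import numpy as np

# Emitting ("normal") and non-emitting ("null") transitions of a 3-state
# left-to-right letter sub-HMM. Each row of A_normal + A_null sums to 1.
A_normal = np.array([[0.5, 0.3, 0.0],
                     [0.0, 0.5, 0.3],
                     [0.0, 0.0, 1.0]])
A_null = np.array([[0.0, 0.2, 0.0],    # skip ahead without emitting
                   [0.0, 0.0, 0.2],
                   [0.0, 0.0, 0.0]])

# Fold chains of null transitions into an effective emitting matrix:
# P = A_normal + A_null @ P  =>  P = (I - A_null)^-1 @ A_normal.
P = np.linalg.inv(np.eye(3) - A_null) @ A_normal
assert np.allclose(P.sum(axis=1), 1.0)   # rows remain stochastic
```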
Segmentation-Based
In a segmentation-based approach, recognition is based on a complete bipartite matching between blocks of primitive segments and the letters of a word.
Segmentation-Based (cont.)
The best match is found by dynamic programming (the Viterbi algorithm). An HMM implementation is very popular and further enhances the model’s capabilities.
Segmentation-Based (demo)
First the word is heuristically segmented. It is preferable to over-segment a character; nevertheless, a character must not span more than a predefined number of segments. Each segment is translated into a feature vector. The resulting matching problem is sketched below.
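A minimal sketch of that matching as dynamic programming: consecutive primitive segments are grouped into blocks, each block is consumed by the next letter of the candidate word, and no letter may span more than a predefined number of segments. The function letter_score is a hypothetical stand-in for whatever classifier scores a block against a letter.

```python
import numpy as np

MAX_SPAN = 4  # predefined limit on segments per character

def match_word(word: str, segments: list, letter_score) -> float:
    """Best log-score of aligning `segments` to the letters of `word`."""
    n, m = len(word), len(segments)
    NEG = -np.inf
    best = np.full((n + 1, m + 1), NEG)
    best[0, 0] = 0.0
    for i in range(1, n + 1):               # letters consumed so far
        for j in range(1, m + 1):           # segments consumed so far
            for span in range(1, min(MAX_SPAN, j) + 1):
                prev = best[i - 1, j - span]
                if prev > NEG:
                    s = prev + letter_score(word[i - 1], segments[j - span:j])
                    best[i, j] = max(best[i, j], s)
    return best[n, m]
```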
Features in Segments (demo)
Global features:
–ascenders, descenders, loops, i-dots, t-strokes
Local features:
–X crossings, T crossings, end points, sharp curvatures, parametric strokes
Non-symbolic features:
–pixel moments, pixel distributions, contour codings
Letter sub-HMM (maximum 4 segments per character)
(diagram of the state topology)
Two-letter joined sub-HMM (0.5-3 segments per character)
(diagram of the state topology)
Pattern Recognition Issues
Lexicon size:
–small (up to 100 words)
–limited (between 100 and 1,000 words)
–infinite (more than 1,000 words)
Word Model Extension
A new approach to performing recognition:
–path discriminant (a single general word model; each path through it is a hypothesis for one word)
(Diagram: the ‘a’ … ‘m’ … ‘z’ letter sub-HMMs arranged in parallel.)
Online vs. Off-Line
Online – captured by pen-like devices. The input format is a two-dimensional signal of pixel locations as a function of time, (x(t), y(t)).
Off-line – captured by scanning devices. The input format is a two-dimensional image of gray-scale values as a function of location, I(m × n); strokes have significant width.
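A minimal sketch of the two input formats as data containers; the class and field names are illustrative only.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class OnlineSignal:
    x: np.ndarray   # pen x-position as a function of time, x(t)
    y: np.ndarray   # pen y-position as a function of time, y(t)

@dataclass
class OfflineImage:
    pixels: np.ndarray  # gray-scale image I of size m x n; strokes have width > 1
```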
Online vs. Off-Line (demo)
Online vs. Off-Line (cont.)
In general, online classifiers are superior to off-line classifiers, because some valuable strokes are blurred in the static image. Sometimes temporal information (stroke order) is also a must in order to distinguish between similar objects.
Online Weaknesses
–Sensitivity to variations in stroke order, stroke number and stroke characteristics: shapes that look similar in the image domain may be produced by different sets of strokes.
–Many redundant strokes (consecutive superfluous pixels) that are byproducts of the continuous nature of cursive handwriting.
–Incomplete (open) loops are more frequent.
Off-Line can improve Online
Sometimes the off-line representation enables one to recognize words that are not recognized from the online signal. An optimal system would combine online-based and off-line-based classifiers.
The desired integration between online and off-line classifiers
Use a single word-recognition engine to process both the online and the off-line data. This requires an off-line-to-online transformation that extracts an alternative list of strokes, one that preserves off-line-like features while being consistent in order.
The “pseudo-online” transformation (system diagram): the online signal is projected to the image domain, producing a bitmap image with stroke width = 1; “painting” (thickening the strokes) then yields a real static image with stroke width > 1; the pseudo-online transformation converts this image into a pseudo-online representation. The online signal feeds the online classifiers and the pseudo-online representation feeds the pseudo-online classifiers, both running on the same online recognition engine; the online and pseudo-online classification outputs are integrated by some combination scheme to produce the recognition results.
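A hedged sketch of the first two boxes of the diagram: projecting the online point sequence to a width-1 bitmap, then “painting” (thickening) it so it resembles a scanned static image. Binary dilation is used here as one plausible way to thicken; the canvas size and stroke width are assumptions.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def project_and_paint(points, height, width, stroke_width=3):
    """points: iterable of (x, y) pen positions from the online signal,
    assumed to fit inside the (height, width) canvas."""
    bitmap = np.zeros((height, width), dtype=bool)
    for x, y in points:
        # For brevity only sample points are plotted; a faithful projection
        # would also draw line segments between consecutive samples.
        bitmap[int(round(y)), int(round(x))] = True      # stroke width = 1
    # "Painting": grow the strokes so the image resembles real scanned ink.
    structure = np.ones((stroke_width, stroke_width), dtype=bool)
    return binary_dilation(bitmap, structure=structure)  # stroke width > 1
```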
Cursive Handwriting Terms
Axis – the main subset of strokes that assembles the backbone, i.e., the shortest path from left to right, including loops on several occasions.
Tarsi – the other subsets of connected strokes, which produce branches that hang above the axis (in the case of ascenders) or below it (in the case of descenders).
The Pseudo-Online Transformation
Follow the skeleton of the axis from the leftmost pixel until reaching the first intersection with a tarsus. Surround the tarsus by tracking its contour until returning to the intersection point we started from. Continue along the axis to the next intersection with a tarsus, and so on, until the rightmost pixel is reached. Loops encountered along the axis are also surrounded completely.
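A sketch of only the first step (finding the traversal’s starting point), assuming skimage’s generic skeletonization as a stand-in for the axis computation described in the following slides; the contour-tracking of tarsi and loops is not shown.

```python
import numpy as np
from skimage.morphology import skeletonize

def axis_start(binary_image: np.ndarray):
    """binary_image: 2-D bool array, True = ink.
    Returns the skeleton and its leftmost pixel, where the traversal begins."""
    skel = skeletonize(binary_image)
    ys, xs = np.nonzero(skel)
    i = np.argmin(xs)                 # leftmost skeleton pixel
    return skel, (ys[i], xs[i])
```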
Computing the axis’s skeleton (figure)
Computing the axis’s skeleton (cont., figure)
Computing the axis’s skeleton (cont., figure)
Processing the tarsi (figure)
Processing the tarsi (cont., figure)
Handling i-dots (figure)
Experimental Setup
–The online word recognition engine of Neskovic et al., which satisfies trainability and versatility.
–Combinations of 6 or 12 online and pseudo-online classifiers.
–Several combination schemes: majority vote, max rule, sum rule.
–An extension of HP’s dataset that can be found in the UNIPEN collection.
Experimental Setup (cont.)
–Different training sets of 46 writers.
–Disjoint validation sets of 9 writers.
–A disjoint test set of 11 writers.
–The lexicon contains 862 words.
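A hedged sketch of the three combination schemes named above, applied to per-word score vectors produced by the individual classifiers; the score format (one vector of word scores per classifier) is an assumption.

```python
import numpy as np

def combine(scores: np.ndarray, scheme: str = "sum") -> int:
    """scores: (n_classifiers, n_words) array. Returns the winning word index."""
    if scheme == "sum":    # sum rule: add each word's scores across classifiers
        return int(np.argmax(scores.sum(axis=0)))
    if scheme == "max":    # max rule: take each word's best single score
        return int(np.argmax(scores.max(axis=0)))
    if scheme == "vote":   # majority vote over per-classifier winners
        votes = np.argmax(scores, axis=1)
        return int(np.bincount(votes).argmax())
    raise ValueError(f"unknown scheme: {scheme}")
```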
Experimental Results for 6 Classifiers (results table)
Experimental Results for 12 Classifiers (results table)
Result Analysis
Word level – in 110 word classes (12.8% of the 862-word lexicon), at least 7 word samples (10.6%) were correctly recognized only by the combination that includes the pseudo-online classifiers.
Writer level – for 12 writers (18.2%, i.e., 12 of the 66 writers in total), at least 65 of the words they produced (7.5% of the lexicon) were correctly recognized only by the combination that includes the pseudo-online classifiers.
Result Analysis (cont.)
–909 of the input words (5.9%) were correctly recognized by at least one pseudo-online classifier and by none of the 12 online classifiers.
–357 of the input words (2.3%) were correctly recognized by at least 4 of the 12 pseudo-online classifiers and by none of the 12 online classifiers.
–For 828 of the input words (5.3%), the difference between the number of pseudo-online and online classifiers that correctly recognized them was 6 or more.
Conclusions
The pseudo-online representation does add information that cannot be obtained by optimizing or extending a combination of online classifiers only.