Lukáš Neumann and Jiří Matas
Centre for Machine Perception, Department of Cybernetics, Czech Technical University, Prague



2015.08.25 Neumann, Matas, ICDAR 2015

• Problem Introduction
• Contributions:
  1. Text Fragments – Generalization of character detection
  2. Stroke Support Pixels
  3. Text-line Resegmentation
• Experiments
• Conclusion

• Text
  ◦ Anything that can be represented as a sequence of Unicode characters

Scene Text (Text in the Wild)
• Typically short snippet(s) of text, arbitrary script and orientation, non-standard fonts, out-of-vocabulary words, complex backgrounds
• Image/video taken by a camera
[Figure labels: Text in the wild; Other text]

• Region-based methods assume: one region (connected component) represents one character
• We generalize this assumption by detecting arbitrary Text Fragments in a single pass
• Text Fragment
  ◦ Part of a Character
  ◦ Character
  ◦ Group of Characters
  ◦ Word

• Text Fragments in the majority of scripts and fonts share the “strokeness” property
• This observation was popularized in the Stroke Width Transform [1] to detect individual characters

[1] B. Epshtein et al., “Detecting text in natural scenes with stroke width transform,” in CVPR 2010

• Text Fragment candidates detected as MSERs over multiple scales and color projections
• MSERs classified as either
  ◦ Character (character or a character part)
  ◦ Multi-character (group of characters or words)
  ◦ Background
• Characters and multi-characters grouped into text lines with an efficient exhaustive search strategy [2]
• Each text line is refined using a local text model
• Character segmentations are recognized using an OCR module trained on synthetic data [3]

[2] L. Neumann, J. Matas, “Text localization in real-world images using efficiently pruned exhaustive search,” in ICDAR 2011
[3] L. Neumann, J. Matas, “On combining multiple segmentations in scene text recognition,” in ICDAR 2013

• Area A of a stroke is approximately equal to the product of the stroke axis length s_l and the stroke width s_w
• Stroke area ratio A_s / A is a very discriminative feature to eliminate non-text regions
• A character can be “drawn” by a circular brush with a possibly changing diameter d_i, equal to the stroke width s_w, sweeping a curve S – the stroke axis
• The non-constant diameter models characters made of strokes of different width
[Figure labels: s_w, s_l, d_i, S]
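The brush model above can be checked numerically. The following is a minimal sketch, not the authors' code: it rasterizes a circular brush of constant diameter swept along a straight stroke axis and compares the pixel count with the product s_l · s_w. The function names, the image size and the pixel-centre convention are all assumptions made for illustration.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment AB."""
    abx, aby = bx - ax, by - ay
    length_sq = abx * abx + aby * aby
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of P onto AB, clamped to the segment.
    t = ((px - ax) * abx + (py - ay) * aby) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * abx), py - (ay + t * aby))

def axis_length(axis):
    """Length s_l of a polyline stroke axis."""
    return sum(math.hypot(bx - ax, by - ay)
               for (ax, ay), (bx, by) in zip(axis, axis[1:]))

def brush_stroke_area(axis, stroke_width, height, width):
    """Count pixels whose centre lies within stroke_width / 2 of the axis."""
    radius = stroke_width / 2.0
    count = 0
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # pixel centre (assumed convention)
            d = min(point_segment_distance(cx, cy, *a, *b)
                    for a, b in zip(axis, axis[1:]))
            if d <= radius:
                count += 1
    return count

# Hypothetical example: a horizontal stroke, 40 px long and 6 px wide.
axis = [(5.0, 10.0), (45.0, 10.0)]
s_w = 6.0
s_l = axis_length(axis)
area = brush_stroke_area(axis, s_w, height=20, width=50)
print("s_l * s_w =", s_l * s_w, " rasterized area =", area)
```

The product s_l · s_w covers the body of the stroke; the rasterized area is slightly larger because the circular brush also paints a round cap at each end of the axis.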

• The stroke is “in the mind of the writer” (it could be easily found in an online handwriting setup)
• The Stroke Support Pixels (SSPs) are a subset of pixels that lie on the stroke (but, unlike a skeleton, the subset does not have to be continuous)
• The subset is found as local maxima in a region’s distance map
• Stroke area discretization effects are compensated by weighting all SSPs in a 3x3 neighborhood
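A minimal sketch of the idea, assuming a binary region mask given as nested lists: compute a distance map, keep the pixels that are local maxima in their 3x3 neighbourhood, and sum a local stroke width over them. The slides do not give the exact formula, so the city-block distance, the width estimate 2·d − 1 and the weight 3 / N (N being the number of SSPs in the 3x3 neighbourhood) are assumptions chosen for illustration, as are all function names.

```python
from collections import deque

NEIGHBOURS_8 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)]

def distance_map(mask):
    """City-block distance of each foreground pixel to the nearest background.

    Pixels outside the image count as background.
    """
    h, w = len(mask), len(mask[0])
    # Pad with one background pixel on every side.
    padded = ([[0] * (w + 2)]
              + [[0] + list(row) + [0] for row in mask]
              + [[0] * (w + 2)])
    dist = [[0 if v == 0 else None for v in row] for row in padded]
    queue = deque((r, c) for r in range(h + 2) for c in range(w + 2)
                  if padded[r][c] == 0)
    while queue:  # multi-source breadth-first search from the background
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h + 2 and 0 <= nc < w + 2 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return [row[1:-1] for row in dist[1:-1]]

def stroke_support_pixels(dist):
    """Foreground pixels whose distance is a non-strict 3x3 local maximum."""
    h, w = len(dist), len(dist[0])
    ssp = set()
    for r in range(h):
        for c in range(w):
            d = dist[r][c]
            if d == 0:
                continue
            neighbours = [dist[r + dr][c + dc] for dr, dc in NEIGHBOURS_8
                          if 0 <= r + dr < h and 0 <= c + dc < w]
            if all(d >= n for n in neighbours):
                ssp.add((r, c))
    return ssp

def stroke_area_estimate(dist, ssp):
    """Sum local stroke widths over SSPs, down-weighting crowded neighbourhoods.

    ASSUMPTION: width 2*d - 1 and weight min(1, 3 / N) are illustrative only.
    """
    total = 0.0
    for r, c in ssp:
        n = 1 + sum((r + dr, c + dc) in ssp for dr, dc in NEIGHBOURS_8)
        total += (2 * dist[r][c] - 1) * min(1.0, 3.0 / n)
    return total

def stroke_area_ratio(mask):
    """Estimated stroke area divided by the region area."""
    dist = distance_map(mask)
    ssp = stroke_support_pixels(dist)
    area = sum(sum(row) for row in mask)
    return stroke_area_estimate(dist, ssp) / area

# Hypothetical inputs: a thin bar (stroke-like) and a filled square (blob).
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 13 else 0 for c in range(16)]
       for r in range(7)]
blob = [[1] * 12 for _ in range(12)]
print(stroke_area_ratio(bar), stroke_area_ratio(blob))
```

On these toy inputs the bar scores far higher than the blob, which is the behaviour that makes the ratio useful for rejecting non-text regions.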

• Stroke Support Pixels are less sensitive to discretization effects and scale change than standard skeleton algorithms; their detection is trivial



[Table columns: Character/Fragment | Multi-character | Background]
* only not rotation invariant, replaced in current work to achieve full rotation invariance

• Key feature in the classification
• Works for a wide variety of scripts and fonts
• Example: MSERs 460
[Figure legend: Character, Multi-character, Non-character MSER]

• Not all characters (even their fragments or groups) are detected as MSERs
• Characters which are detected can have many different segmentations (over-complete representation)
• The detected Text Fragments are used to initialize a hypotheses-verification iterative process
• For each text line, a local color model is iteratively updated using a standard graph cut framework
• The graph cut is initialized using the stroke support pixels
• Note that, unlike with MSERs, the segmentation is not limited to thresholding a scalar value
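The iterative update of a local color model can be sketched as follows. This is not a graph cut: the pairwise smoothness term is dropped and each class is modelled by a single mean color, both simplifications made only to keep the example short. Seed pixels stand in for the stroke support pixels; the function names and the RGB values are hypothetical.

```python
def mean_colour(pixels):
    """Component-wise mean of a non-empty list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[k] for p in pixels) / n for k in range(3))

def sq_dist(a, b):
    """Squared Euclidean distance between two colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def refine_text_line(colours, seed_indices, max_iterations=10):
    """Alternate between labelling pixels and re-estimating the colour models.

    colours: list of RGB tuples for the pixels of one text line.
    seed_indices: indices known to be text (stand-in for SSPs); at least one
    seed and at least one non-seed pixel are assumed.
    Returns a list of booleans, True meaning "text".
    """
    seeds = set(seed_indices)
    fg_model = mean_colour([colours[i] for i in seeds])
    bg_model = mean_colour([c for i, c in enumerate(colours) if i not in seeds])
    labels = None
    for _ in range(max_iterations):
        # Hypothesis: assign each pixel to the closer model; seeds stay text.
        new_labels = [i in seeds or sq_dist(c, fg_model) < sq_dist(c, bg_model)
                      for i, c in enumerate(colours)]
        if new_labels == labels:
            break  # labelling is stable, stop iterating
        labels = new_labels
        # Verification: re-estimate both models from the current labelling.
        fg = [c for c, is_text in zip(colours, labels) if is_text]
        bg = [c for c, is_text in zip(colours, labels) if not is_text]
        fg_model = mean_colour(fg)
        if bg:
            bg_model = mean_colour(bg)
    return labels

# Hypothetical text line: four reddish text pixels, six bluish background ones.
colours = [(200, 30, 30), (190, 40, 35), (210, 25, 40), (205, 35, 30),
           (30, 40, 200), (35, 45, 190), (25, 35, 210), (30, 40, 205),
           (40, 30, 195), (28, 42, 200)]
labels = refine_text_line(colours, seed_indices=[0])
print(labels)
```

Because the model works on full color vectors, the result is not tied to a threshold on a single scalar channel. A real graph cut would add a penalty for neighbouring pixels that receive different labels.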

[Figure panels: Source Image, MSER detection, Initialization, Iteration #1, Iteration #2, Final iteration (#6); stroke support pixels in green]
After every iteration:
• the text box position is re-estimated
• connected components are classified (character, multi, non-char)

[Figure panels: Source Image, Text Fragment detection, Final Segmentation]
Latin (stencil), Hebrew Script

[Figure panels: Source Image, Text Fragment detection, Final Segmentation]
Indian (Kannada), “Latin”, Armenian Script

ICDAR 2013 Dataset – Text Localization

pipeline                     recall   precision   f
Proposed method              72.4     81.8        77.1
Yin et al. [4]               68.3     86.3        76.2
TexStar (ICDAR’13 winner)    66.4     88.5        75.9
our previous method [3]      64.8     87.5        74.5
Kim (ICDAR’11 winner)        62.5     83.0        71.3

[4] X.-C. Yin, X. Yin, K. Huang, and H.-W. Hao, “Robust text detection in natural scene images,” TPAMI 2014
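The slides do not say how the f column was computed. Assuming it is the usual harmonic mean of recall and precision, the following sketch recomputes it from the reported figures; the function name is hypothetical and the numbers are copied from the table above.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (assumed definition of f)."""
    return 2 * precision * recall / (precision + recall)

# (recall, precision, reported f) as listed on the slide.
reported = {
    "Proposed method": (72.4, 81.8, 77.1),
    "Yin et al. [4]": (68.3, 86.3, 76.2),
    "TexStar": (66.4, 88.5, 75.9),
    "our previous method [3]": (64.8, 87.5, 74.5),
    "Kim": (62.5, 83.0, 71.3),
}

for name, (r, p, f) in reported.items():
    print(f"{name}: reported f = {f}, harmonic mean = {f_measure(r, p):.2f}")
```

Under this assumption four rows agree with the reported f to within 0.1. The harmonic mean for the proposed method comes out near 76.8 rather than 77.1, so that entry may have been computed differently (for example from unrounded or per-image values).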

[Figure: example images with the text TAXI, CARLING, D8LL, iMac, THE DOLLAR ARMS, PANTENE PROV]

• Arbitrary Text Fragments detected in a single pass
• An efficiently calculated “strokeness” feature exploited to discriminate between Text Fragments and background clutter
• Detected Text-lines are refined by re-segmentation in a hypotheses-verification iterative process that exploits local text line properties
• Competitive results with the state-of-the-art
• Online demo available at http://www.textspotter.org/
• Current and future work
  ◦ Rotation-invariant real-time character detector (~5 fps)
  ◦ OCR accuracy improvement

Thank you for your attention!
http://www.TextSpotter.org/

