High-level Component Filtering for Robust Scene Text Detection

Presentation on theme: "High-level Component Filtering for Robust Scene Text Detection"— Presentation transcript:

1 High-level Component Filtering for Robust Scene Text Detection
Weilin Huang (黄韡林), Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences; Multimedia Laboratory, The Chinese University of Hong Kong

2 Outline
■ Introduction ♦ Connected Component and Sliding-Window Methods ♦ Stroke Width Transform (SWT) ♦ SWT based Text Detection ■ Stroke Feature Transform ♦ Colour Information on Text Stroke Detection ■ Text Covariance Descriptor (TCD) ♦ TCD for Component Filtering ♦ TCD for Text-line Filtering ■ Convolution Neural Network Induced MSER Trees ♦ Maximally Stable Extremal Regions (MSERs) ♦ CNN for Component Classification ♦ Component Splitting

3 I. Introduction: Text Detection Methods
■ Connected Component Methods ♦ Step 1: Separate text and non-text information at the pixel level ♦ Step 2: Group text pixels to construct character components ♦ Advantages: fast to compute ♦ Limitations: not robust, erroneous components, many false alarms ♦ Examples: SWT, MSERs ■ Sliding-Window Methods ♦ Step 1: Train a text classifier ♦ Step 2: Scan a sliding sub-window through the image ♦ Advantages: high-level text classification ♦ Limitations: computationally costly, difficult feature design
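Step 2 of the connected component approach (grouping text pixels into character components) can be illustrated with a minimal sketch. It assumes the pixel-level step has already produced a binary mask and that 4-connectivity is used; the function name `label_components` is hypothetical and not taken from the talk.

```python
from collections import deque


def label_components(mask):
    """Group 4-connected foreground pixels of a binary mask into components.

    `mask` is a list of rows holding 0/1 values. The result is a list of
    components, each a list of (row, col) pixel coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components
```

Each returned component is a character candidate that later filtering stages would accept or reject.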

4 I. Introduction: Stroke Width Transform (1)
■ Example: SWT Operator → SWT Map ■ SWT Operator: low-level pixel filter ♦ Canny edges ♦ Gradient orientation for ray tracking ♦ Compute stroke width between paired pixels ♦ Stroke width constraint: |Op - Oq| < λ ■ Problem 1: Erroneous connections ♦ Connecting multiple characters ♦ Separating single characters ■ Problem 2: Many non-text components

5 I. Introduction: SWT based Text Detection
■ Complete Processing: SWT → Comp. filtering → Text components → GP → Grouped text lines → TL filtering → Final text lines ■ Existing filters: Heuristic Filtering, Random Forest classifier (heuristic and geometric features) ■ Our Improvements: More powerful high-level filters ■ C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012.

6 II. Stroke Feature Transform (SFT) (1)
■ SWT ♦ Stroke Width Constraint: |Op - Oq| < λ ♦ Output: Stroke Width Map ■ Stroke Feature Transform (SFT) ♦ Stroke Width Constraint: |Op - Oq| < λ1 ♦ Stroke Color Constraint: |Cp - Cq| < λ2 ♦ Neighborhood Coherency Constraint ♦ Output: Stroke Width Map, Stroke Color Map
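The two SFT constraints on a ray between paired edge pixels p and q can be sketched as a single acceptance test. This is only an illustration under stated assumptions: O is taken to be the gradient orientation, compared after flipping the direction at q (as in the original SWT formulation, where the two gradients should be roughly opposite); C is taken to be an RGB colour compared by Euclidean distance; and the default thresholds `lambda1` and `lambda2` are placeholder values, not the ones used in the talk.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def accept_ray(theta_p, theta_q, color_p, color_q,
               lambda1=math.pi / 6, lambda2=60.0):
    """Return True if the ray from p to q satisfies both constraints.

    theta_p, theta_q: gradient orientations (radians) at the two edge pixels.
    color_p, color_q: RGB triples at the two edge pixels.
    lambda1, lambda2: hypothetical thresholds chosen for illustration only.
    """
    # Stroke width constraint: gradients at p and q should be roughly opposite.
    width_ok = angle_diff(theta_p, theta_q + math.pi) < lambda1
    # Stroke colour constraint: both ends of the stroke should look alike.
    color_ok = math.dist(color_p, color_q) < lambda2
    return width_ok and color_ok
```

Dropping the colour test reduces the check to the SWT-style width constraint alone, which is the difference between the two operators shown on the slide.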

7 II. Stroke Feature Transform (SFT) (2)
■ SFT vs SWT ♦ Mitigate inter-component connections ♦ Enhance intra-component connections ♦ Better character candidate detection ♦ Higher Recall

8 II. Stroke Feature Transform (SFT) (3)
■ Limitation: low-level operations are not robust ♦ Text-like outliers: bricks, windows, leaves, …… ♦ Many false alarms ♦ Low Precision ♦ Heuristic filters do not work well ♦ High-level learning-based filtering required

9 III. Text Covariance Descriptor (TCD) (1)
♦ Each pixel is represented by d features ♦ TCD is computed as the d x d covariance matrix of these features over a given region U ♦ Multiple features are incorporated in a single matrix
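The formula itself is not present in this transcript. Assuming TCD follows the standard region covariance descriptor (an assumption, not confirmed by the slide text), it would take the following form, where f_i is the d-dimensional feature vector of the i-th pixel in U, n is the number of pixels in U, and μ is the mean feature vector:

```latex
C_U = \frac{1}{n-1} \sum_{i=1}^{n} (f_i - \mu)(f_i - \mu)^{\top},
\qquad
\mu = \frac{1}{n} \sum_{i=1}^{n} f_i,
\qquad
f_i \in \mathbb{R}^{d}
```

Under this form C_U is a symmetric d x d matrix, which is consistent with the 9 x 9 matrix built from 9 features on the next slide.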

10 III. Text Covariance Descriptor (TCD) (2)
■ TCD for components (9x9 Covariance Features) ♦ Pixel coordinates in X- and Y-axis: encode spatial information ♦ Pixel intensities and RGB values: color uniformity ♦ Stroke width and distance values: stroke width/distance consistency ♦ Edge information by Canny detector: stroke spatial layout ■ In total, 9 features construct a 9 x 9 matrix ■ Transform to a 45-dim feature vector ■ Get component confidence maps by RF classifier
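The step from a 9 x 9 matrix to a 45-dim vector follows from symmetry: a symmetric 9 x 9 matrix has 9 * 10 / 2 = 45 distinct entries. The sketch below assumes the sample covariance is used and that the vector is simply the upper triangle read row by row; both function names are hypothetical, and any normalisation the authors may apply is not shown.

```python
def covariance_descriptor(features):
    """Sample covariance of per-pixel feature vectors.

    `features` is a list of n vectors (n >= 2), each of length d.
    Returns the d x d covariance matrix as nested lists.
    """
    n = len(features)
    d = len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        centered = [f[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centered[a] * centered[b]
    return [[value / (n - 1) for value in row] for row in cov]


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    size = len(matrix)
    return [matrix[a][b] for a in range(size) for b in range(a, size)]
```

With 9 features per pixel, `upper_triangle(covariance_descriptor(features))` has 45 entries, and a vector of that kind is what a Random Forest classifier would score to produce the component confidence map.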

11 III. Text Covariance Descriptor (TCD) (3)
■ TCD for Text-line ♦ 12x12 Covariance Features: mean properties of component features (uniformity); coordinates of component centers (spatial information); heights of components (consistency); horizontal distances between components (text spatial layout) ♦ 16x16 Covariance Features: 16-bin HOG on edge pixels (oriented spatial features) ■ Get text-line confidence maps by RF classifier

12 III. Text Covariance Descriptor (TCD) (4)
■ Component and text-line confidence maps

13 III. Text Covariance Descriptor (TCD) (5)
■ Top: TCD for component; Middle: TCD for text-line; Bottom: detection

14 III. Text Covariance Descriptor (TCD) (5)
■ Results ■ Failure Cases W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.

15 Convolution Neural Network Induced MSER Trees (1)
■ Maximally Stable Extremal Region (MSER) Tree: L. Neumann and J. Matas, Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011. ■ MSER vs SWT ♦ Detect low-quality texts → Higher Recall ♦ Generate more non-text components → Lower Precision ♦ Require a more powerful classifier/filter

16 Convolution Neural Network Induced MSER Trees (2)
■ A Two-layer Convolutional Neural Network (CNN): T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.

17 Convolution Neural Network Induced MSER Trees (3)
■ Training Data: Synthetic samples ■ Data Transformation ♦ Fixed size of 32x32 ♦ Horizontal warp ♦ Include additional image context

18 Convolution Neural Network Induced MSER Trees (3)
■ CNN Confidence Scores: MSERs → CNN Scores → Comp. Splitting → Detection

19 Convolution Neural Network Induced MSER Trees (4)
■ Component Splitting ■ Erroneously connected component ♦ High aspect ratio ♦ Positive conf. score ♦ Leaf of the MSER tree, or conf. score > all children
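The three criteria for spotting an erroneously connected component can be sketched as a predicate over MSER tree nodes. The `MserNode` class, the width/height definition of aspect ratio, and the `min_aspect_ratio` threshold are all hypothetical choices made for illustration; the slide gives no concrete values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MserNode:
    """Hypothetical MSER tree node: bounding-box size, CNN score, children."""
    width: int
    height: int
    score: float
    children: List["MserNode"] = field(default_factory=list)


def needs_splitting(node, min_aspect_ratio=2.0):
    """Check the three slide criteria for an erroneously connected component."""
    high_aspect = node.width / node.height > min_aspect_ratio
    positive = node.score > 0
    # True for a leaf (no children) or when the node outscores every child.
    dominant = all(node.score > child.score for child in node.children)
    return high_aspect and positive and dominant


def split_candidates(root):
    """Collect every node in the tree that satisfies needs_splitting."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if needs_splitting(node):
            found.append(node)
        stack.extend(node.children)
    return found
```

Nodes returned by `split_candidates` would then be passed to the splitting step, which this sketch does not cover.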

20 Convolution Neural Network Induced MSER Trees (5)
■ Comparisons with SFT-TCD

21 Convolution Neural Network Induced MSER Trees (6)
■ Results

22 Convolution Neural Network Induced MSER Trees (7)
■ Results on the ICDAR 2011 Database W. Huang, Y. Qiao, and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.

23 The End Thank You!
