End-to-End Text Recognition with Convolutional Neural Networks


1 End-to-End Text Recognition with Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department, Stanford University
* Denotes equal contribution

2 Scene Text Recognition Overview
Text “in the wild” is hard to recognize: backgrounds, textures, fonts, and lighting conditions vary widely.
Datasets shown: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011)
Tao Wang

3 Two-Stage Framework
Stage 1: Detection/Classification
Stage 2: High-level Inference
Example output: “HOTEL”

4 Prior Works
Work | Classification and detection | High-level inference
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial Structure
Mishra et al., 2012 | HOG + SVM with RBF Kernel | CRF + N-gram model
Neumann and Matas, 2012 | MSER + SVM with RBF Kernel | Exhaustive Graph Search

5 Most Other Approaches vs. Our Approach
Approach | Classification and detection | High-level inference
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph based inference models
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics

6 State of the Art (SOTA) on Various Benchmarks
Detection/Classification:
- ICDAR 62-way cropped character classification: SOTA
End-to-end system after high-level inference:
- ICDAR and SVT cropped word recognition: SOTA on ICDAR
- ICDAR and SVT end-to-end text recognition (lexicon): SOTA

7 Unsupervised Feature Learning
Pipeline: Contrast Normalization + ZCA whitening, then K-Means (Coates et al., 2011)
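
The pipeline named on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the regularizers `eps`, the plain Lloyd-style K-Means, and the patch and dictionary sizes are all hypothetical, and the exact K-Means variant and constants used by the authors are not given in the slides.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    # Per-patch normalization: subtract the patch mean, divide by its std.
    # eps is an assumed regularizer, not a value from the slides.
    mean = patches.mean(axis=1, keepdims=True)
    std = np.sqrt(patches.var(axis=1, keepdims=True) + eps)
    return (patches - mean) / std

def zca_whiten(patches, eps=0.1):
    # ZCA whitening: rotate to the eigenbasis of the covariance,
    # rescale each direction, and rotate back.
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W, mean, W

def kmeans(data, k, iters=10, seed=0):
    # Plain Lloyd-style K-Means (an assumption; other variants exist).
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((data ** 2).sum(axis=1)[:, None]
              - 2.0 * data @ centroids.T
              + (centroids ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def learn_dictionary(patches, k):
    # patches: (n_patches, n_pixels) array of flattened image patches.
    normalized = contrast_normalize(patches)
    whitened, _, _ = zca_whiten(normalized)
    return kmeans(whitened, k)
```

The learned centroids would then serve as the first-layer filters; for example, `learn_dictionary(patches, 96)` on flattened 8×8 patches returns a (96, 64) array.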

8 2-layer CNN
1st layer: Convolution (96) + Spatial Pooling
2nd layer: Convolution (256) + Spatial Pooling
L2-SVM Classifier: √ Text, × Non-Text
Training: Backpropagation
~10K parameters for detection; ~50K parameters for classification
Large representation but not enough data. Overfitting?
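
As a rough sketch of the forward pass this slide depicts (two convolution and spatial pooling stages feeding a linear classifier), assuming a 32×32 grayscale input, 8×8 first-layer filters, 2×2 second-layer filters, average pooling, and a simple rectification. None of these sizes or choices are stated on the slide, and all function names are hypothetical.

```python
import numpy as np

def conv_valid(x, filters):
    # x: (C, H, W); filters: (K, C, fh, fw) -> output (K, H-fh+1, W-fw+1)
    K, C, fh, fw = filters.shape
    H, W = x.shape[1:]
    out = np.zeros((K, H - fh + 1, W - fw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + fh, j:j + fw]
            out[:, i, j] = np.tensordot(filters, patch,
                                        axes=([1, 2, 3], [0, 1, 2]))
    return out

def avg_pool(x, size):
    # Non-overlapping average pooling; any remainder rows/columns are trimmed.
    K, H, W = x.shape
    H2, W2 = H // size, W // size
    x = x[:, :H2 * size, :W2 * size]
    return x.reshape(K, H2, size, W2, size).mean(axis=(2, 4))

def forward(image, f1, f2, w, b, pool1=5, pool2=2):
    # image: (H, W) grayscale window; returns a linear classifier score.
    h1 = avg_pool(np.maximum(conv_valid(image[None], f1), 0.0), pool1)
    h2 = avg_pool(np.maximum(conv_valid(h1, f2), 0.0), pool2)
    return float(w @ h2.ravel() + b)

def is_text(image, f1, f2, w, b):
    # Detection decision: a positive score means "text".
    return forward(image, f1, f2, w, b) > 0.0
```

With 96 first-layer and 256 second-layer filters under these assumed sizes, the feature vector passed to the classifier would have 256 × 2 × 2 = 1024 entries.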

9 Synthetic Data
Panels: Real Data; Unrealistic Synthetic Data; Synthetic Data
Items listed: Java.Font + Natural backgrounds; Color Statistics; Synthetic “hard negatives”

10 Detector Performance

11 Text Line Bounding Boxes
Candidate spaces

12 Classifier Performance
Chart: 62-way classification accuracy (%) on ICDAR cropped characters (ICDAR-Sample characters); higher is better.
Value shown: 83.9

14 Sliding Window Position
Figure axes: character class; sliding window position
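
One way to read this figure is as a score matrix with one row per character class and one column per sliding window position. A minimal sketch of building such a matrix follows, assuming a cropped grayscale text-line image and a hypothetical `classify` callable that returns one score per class; the window width and stride are also assumptions.

```python
import numpy as np

def score_matrix(line_image, classify, window_width=32, stride=1):
    # line_image: (H, W) array for one cropped text line.
    # classify: hypothetical callable mapping an (H, window_width) window
    #           to a 1-D array of per-class scores.
    H, W = line_image.shape
    positions = range(0, W - window_width + 1, stride)
    columns = [classify(line_image[:, x:x + window_width]) for x in positions]
    # Rows index character classes, columns index window positions.
    return np.stack(columns, axis=1)
```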

15 Word Recognition max ∑ Lexicon: … MAKE S E R I E S SERIES ESTATE
POKER S E R I E S -5.45 7.82 -1.74 -9.02 max ∑ Tao Wang
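
The “max ∑” scoring can be illustrated with a small dynamic program: for each lexicon word, choose strictly increasing window positions for its characters so that the summed character scores are maximal, then pick the best-scoring word. This is a simplified sketch rather than the authors' exact procedure; the 26-letter `ALPHABET`, the function names, and the absence of any gap or length penalty are assumptions.

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed; the slides use 62 classes

def alignment_score(word, M, alphabet=ALPHABET):
    # M: (n_classes, n_positions) score matrix; word must be non-empty.
    # best[p] = best total score with the current character placed at p.
    best = M[alphabet.index(word[0])].astype(float)
    for ch in word[1:]:
        running = np.maximum.accumulate(best)
        # The previous character must sit strictly to the left.
        shifted = np.concatenate(([-np.inf], running[:-1]))
        best = M[alphabet.index(ch)] + shifted
    return float(best.max())

def recognize(M, lexicon, alphabet=ALPHABET):
    # Return the lexicon word with the highest alignment score.
    return max(lexicon, key=lambda word: alignment_score(word, M, alphabet))
```

For example, `recognize(M, ["MAKE", "SERIES", "ESTATE", "POKER"])` returns whichever word aligns best with the responses in `M`. Because the scores are plain sums, a practical system would likely also need to handle word length and spacing between characters.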

16 Cropped Word Recognition Accuracy
Chart: accuracy on cropped-word benchmarks; higher is better.

17 Candidate spaces generated by detector

19 End-to-end text recognition results
Chart: F-score on end-to-end benchmarks; higher is better.
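
For reference, a minimal sketch of the metric, assuming the balanced F-score (harmonic mean of precision and recall). How a predicted word is matched to a ground-truth word is benchmark-specific and not covered here; the function names are hypothetical.

```python
def f_score(precision, recall):
    # Balanced F-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def f_score_from_counts(true_pos, false_pos, false_neg):
    # Precision: fraction of predicted words that are correct.
    # Recall: fraction of ground-truth words that were found.
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return f_score(precision, recall)
```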

20 Sample Output Images from SVT

21 Sample Output Images from ICDAR-FULL

22 Hunspell
c: “confidence margin”
Diagram: strings are passed to Hunspell, and its Suggested Words form the LEXICON.
Strings shown: POS, POST, PEOST, PEOSTEL
Suggested Words shown: POSE, POST, PEOPLE, PISTOL
Our F-score: 0.38
Neumann and Matas, 2010: 0.40

23 Conclusion
Learnt features + 2-layer CNN for character detection and classification
Simple heuristics to build an end-to-end scene text recognition system
State-of-the-art performance on
- ICDAR cropped character classification
- ICDAR cropped word recognition
- Lexicon based end-to-end recognition on ICDAR and SVT
Extensible to a more general lexicon with an off-the-shelf spelling checker

24 Questions?

