Robust Scene Text Detection with Adaptive Clustering
Xu-Cheng Yin (殷绪成), PhD
Pattern Recognition and Information Retrieval Lab, Department of Computer Science and Technology, University of Science and Technology Beijing
Text detection in natural scenes: Background
Challenges in scene text detection:
- Complex backgrounds
- Variations in font and size
- Variations in text color
- Variations in illumination
- Variations in text orientation
Text detection in natural scenes: Review
Previous text detection technologies
Region-based (sliding-window-based):
- K. Kim et al., “Texture-based approach for text detection in images using SVM …”, TPAMI 2003.
- X. Chen and A. Yuille, “Detecting and reading text in natural scenes”, CVPR 2004.
- T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, “End-to-end text recognition with CNN”, ICPR 2012.
Drawback: very slow (every pixel is examined at multiple scales).
Connected-components-based:
- B. Epshtein et al., “Detecting text in natural scenes with stroke width transform (SWT)”, CVPR 2010.
- C. Yao, X. Bai et al., Detecting texts of arbitrary orientations in natural images…, CVPR 2012, TIP 2014.
- W. Huang et al., Text localization in natural images with Stroke Feature Transform …, ICCV 2013, ECCV 2014.
- C. Yi and Y. Tian, Text string detection from natural scenes with boundary clustering, stroke segmentation, structure modeling, …, TIP 2011, TIP 2012, CVIU 2013.
- Y.-F. Pan, X. Hou and C.-L. Liu, “A hybrid approach to detect and localize texts in natural scene images”, TIP 2011.
Drawback: fragility in CC computation.
Text detection in natural scenes: Review
Recent MSER/ER-based text detection technologies
Maximally Stable Extremal Regions (MSERs) / Extremal Regions (ERs): robust to variations in color, size, illumination, and resolution.
MSER/ER-based detection:
- A specific category of CC-based methods
- Uses MSERs/ERs as character candidates (the focus of many recent works)
Representative work:
- L. Neumann and J. Matas, (Realtime) Text localization and recognition in real-world images, ACCV 2010, ICDAR 2011/2013, CVPR 2012, ICCV 2013.
- H. I. Koo and D. H. Kim, “Scene text detection via connected component clustering and nontext filtering”, TIP 2013.
- C. Shi, C. Wang, B. Xiao, et al., Scene text detection using graph model, MSER, CRF, …, Pattern Recognition Letters 2013, CVPR 2013, ICDAR 2013, TCSVT 2014, PR 2014.
- L. Sun, Q. Hou, et al., Robust text detection in natural scene images by Generalized Color enhanced contrasting extremal region, … ICPR 2012, ICDAR 2013, ICPR 2014.
- L. Kang, D. Doermann, et al., Orientation robust text line detection with HOCC…, CVPR 2014.
- X.-C. Yin, et al., “Robust text detection in natural scene images,” TPAMI 2014.
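As a rough illustration of the stability idea behind MSERs, the sketch below (a toy, not an MSER detector) grows the dark region containing one seed pixel at each gray-level threshold t and scores q(t) = (|Q(t+Δ)| − |Q(t−Δ)|) / |Q(t)|, where |Q(t)| is the region's area. The image, the seed, and Δ (`delta`) are hypothetical. Practical detectors such as OpenCV's `cv2.MSER_create` build the whole component tree at once and keep every local minimum of q, rather than only the first global minimum returned here.

```python
from collections import deque


def region_area(img, seed, t):
    """Area of the 4-connected region of pixels with value <= t containing seed (0 if seed > t)."""
    h, w = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen and img[nr][nc] <= t:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def most_stable_threshold(img, seed, delta=5, levels=range(256)):
    """First threshold t minimizing q(t) = (|Q(t+delta)| - |Q(t-delta)|) / |Q(t)|."""
    best_t, best_q = None, float("inf")
    for t in levels:
        area = region_area(img, seed, t)
        if area == 0:
            continue  # the seed is not yet inside any region at this level
        q = (region_area(img, seed, t + delta) - region_area(img, seed, t - delta)) / area
        if q < best_q:
            best_t, best_q = t, q
    return best_t, best_q


# Hypothetical 5x5 image: a dark pixel inside a mid-gray block on a bright background.
img = [
    [200, 200, 200, 200, 200],
    [200, 100, 100, 100, 200],
    [200, 100, 10, 100, 200],
    [200, 100, 100, 100, 200],
    [200, 200, 200, 200, 200],
]
result = most_stable_threshold(img, (2, 2))  # (15, 0.0)
```

Here the single dark pixel is already stable from t = 15, and the surrounding 3×3 block is equally stable for t between 105 and 194, which is why keeping all local minima matters in practice.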
Text detection in natural scenes: Motivation
Main pitfalls of MSER/ER-based text detection methods, and the proposed remedies:
- Most detected character candidates (MSERs/ERs) correspond to non-characters. Remedy: MSER pruning.
- Text candidate construction is inadequate, relying on time-consuming and error-prone parameter tuning with rule-based methods. Remedy: adaptive hierarchical clustering with metric learning.
- The text candidate classifier is trained on unbalanced data. Remedy: eliminating most non-text candidates with the character classifier.
Text detection in natural scenes: System overview
Text detection in natural scenes: Highlights
- An MSER pruning algorithm that minimizes regularized variations is proposed to remove most non-characters.
- Character candidates are clustered into text candidates by an adaptive single-link clustering algorithm, in which the distance weights and the threshold are learned simultaneously using a self-training metric learning algorithm.
- The posterior probability that each text candidate corresponds to non-text is estimated with the character classifier, and text candidates with a high non-text probability are removed efficiently.
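The first highlight can be sketched as follows, under loud assumptions: each MSER carries its stability score (variation), the score is "regularized" by a linear penalty on aspect ratios outside a plausible character range, and along one chain of nested parent-child regions only the region with the smallest regularized variation survives. The `Region` type, the penalty form, and the defaults `a_min`, `a_max`, `theta1`, `theta2` are hypothetical choices for illustration, not values taken from the work described here.

```python
from dataclasses import dataclass


@dataclass
class Region:
    width: int
    height: int
    variation: float  # MSER stability score; lower means more stable


def regularized_variation(r, a_min=0.3, a_max=1.2, theta1=0.03, theta2=0.08):
    """Variation plus a linear penalty when the aspect ratio w/h leaves [a_min, a_max].

    The penalty form and all default parameter values are hypothetical."""
    a = r.width / r.height
    if a > a_max:
        return r.variation + theta1 * (a - a_max)
    if a < a_min:
        return r.variation + theta2 * (a_min - a)
    return r.variation


def prune_chain(chain):
    """Among nested regions on one parent-child chain, keep the one with minimal regularized variation."""
    return min(chain, key=regularized_variation)


chain = [Region(12, 14, 0.12), Region(10, 12, 0.10), Region(40, 10, 0.05)]
kept = prune_chain(chain)  # Region(width=10, height=12, variation=0.1)
```

In the example chain, the 40×10 region has the lowest raw variation but is penalized for its 4:1 aspect ratio, so the 10×12 region is kept.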
Text detection in natural scenes: Key technologies
- Character candidate extraction with MSER pruning
- Text candidate construction with adaptive hierarchical clustering and distance metric learning
- Text candidate elimination with the character classifier
Text Candidates Construction
Clustering-based grouping of character candidates (MSERs) into text candidates:
- Clustering: single-link clustering (suited to elongated clusters such as text lines)
- Similarity: weighted distance
- Threshold: determines the number of clusters
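The grouping rule can be sketched with a union-find: two character candidates end up in the same text candidate whenever a chain of pairs with distance below the threshold connects them, which is single-link clustering cut at that threshold. Plain Euclidean distance between hypothetical character centers stands in for the weighted distance here, and `eps` is an illustrative threshold.

```python
import math


def single_link(items, dist, eps):
    """Single-link clustering cut at eps: i and j share a cluster if a chain of pairs with dist < eps joins them."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if dist(items[i], items[j]) < eps:
                parent[find(i)] = find(j)  # union the two clusters

    clusters = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)
    return list(clusters.values())


# Hypothetical character centers: a chained word, a second word, and an isolated blob.
centers = [(0, 0), (10, 0), (20, 0), (100, 0), (110, 0), (50, 80)]
groups = single_link(centers, math.dist, eps=15)
```

The first word shows why single-link suits elongated text lines: (0, 0) and (20, 0) are 20 apart, above the threshold, yet they are grouped because (10, 0) links them.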
Adaptive single-link clustering with distance metric learning
Feature space (similarity)
Adaptive single-link clustering with distance metric learning
Weighted distance and clusters
How should the weights and the threshold be selected?
- Rule-based: time-consuming and error-prone
- Clustering-based: a separate two-stage learning style (first the weights, then the threshold)
- Adaptive (single-link) clustering: the distance weights and the threshold are learned simultaneously using a self-training metric learning algorithm
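To make "weighted distance" concrete, the sketch below computes a few hypothetical pairwise features between two character candidates (horizontal gap, height difference, vertical misalignment, and gray-level difference, each normalized) and combines them as d(a, b) = Σ wᵢ fᵢ(a, b). The `Char` fields and the feature set are assumptions for illustration; choosing the weights, and the threshold applied to d, is exactly the question this slide raises.

```python
from dataclasses import dataclass


@dataclass
class Char:
    x: float     # left edge
    y: float     # top edge
    w: float     # width
    h: float     # height
    gray: float  # mean intensity of the region


def pair_features(a, b):
    """Hypothetical pairwise features, each normalized so the values are roughly scale-free."""
    mean_h = (a.h + b.h) / 2
    gap = max(b.x - (a.x + a.w), a.x - (b.x + b.w), 0.0)  # horizontal gap, 0 if overlapping
    dy = abs((a.y + a.h / 2) - (b.y + b.h / 2))           # vertical offset of the centers
    return [gap / mean_h, abs(a.h - b.h) / mean_h, dy / mean_h, abs(a.gray - b.gray) / 255.0]


def weighted_distance(a, b, weights):
    """d(a, b) = sum_i weights[i] * f_i(a, b)."""
    return sum(wt * f for wt, f in zip(weights, pair_features(a, b)))


a = Char(0, 0, 10, 20, 50)       # two neighboring characters of similar size and color
b = Char(14, 0, 10, 20, 55)
c = Char(200, 40, 10, 10, 200)   # a distant, smaller, brighter blob
unit = [1.0, 1.0, 1.0, 1.0]      # placeholder weights
d_ab, d_ac = weighted_distance(a, b, unit), weighted_distance(a, c, unit)
```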
Adaptive single-link clustering with distance metric learning
(1) Sample selection
Focus on the hardest samples (the closest and the farthest data).
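A minimal sketch of hard-sample selection, under one reading of "closest and farthest" that is an assumption here: negatives are the closest pairs of characters from different texts, and positives are the farthest among each character's nearest same-text links (single-link only needs such links to chain a text line, so a long line's end-to-end distance does not matter). The points, labels, and `k` are hypothetical.

```python
import math


def select_hard_pairs(points, labels, dist, k=2):
    """Return (hard_pos, hard_neg) as index pairs.

    hard_pos: the k farthest nearest-same-text-neighbor links.
    hard_neg: the k closest nearest-other-text pairs."""
    n = len(points)
    pos, neg = set(), set()
    for i in range(n):
        same = [(dist(points[i], points[j]), j) for j in range(n) if j != i and labels[j] == labels[i]]
        other = [(dist(points[i], points[j]), j) for j in range(n) if labels[j] != labels[i]]
        if same:
            d, j = min(same)
            pos.add((d, min(i, j), max(i, j)))  # i's nearest link inside its own text
        if other:
            d, j = min(other)
            neg.add((d, min(i, j), max(i, j)))  # i's nearest character from another text
    hard_pos = [(i, j) for _, i, j in sorted(pos, reverse=True)[:k]]
    hard_neg = [(i, j) for _, i, j in sorted(neg)[:k]]
    return hard_pos, hard_neg


# Hypothetical 1-D layout: text 0 has three characters, text 1 has two.
points = [(0, 0), (10, 0), (30, 0), (45, 0), (55, 0)]
labels = [0, 0, 0, 1, 1]
hard_pos, hard_neg = select_hard_pairs(points, labels, math.dist)
```

In this example the hardest cross-text pair (2, 3) is only 15 apart while the hardest same-text link (1, 2) is 20 apart, so no single threshold on plain Euclidean distance separates them; that is the kind of case a learned metric must handle.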
Adaptive single-link clustering with distance metric learning
(2) Weight conversion
The original weighted distance and threshold are converted into a form in which the weights and the threshold are learned simultaneously.
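One plausible reading of the weight conversion, assumed here because the slide's formulas are not preserved: the merge rule Σ wᵢ fᵢ < ε is equivalent to Σ wᵢ fᵢ − ε < 0, so appending a constant feature 1 to every pairwise feature vector turns the weights and the threshold into a single parameter vector θ = [w, −ε] that one model can fit. The numbers below are hypothetical; the check only confirms that both rules make identical decisions.

```python
def merge_original(f, weights, eps):
    """Original rule: merge two candidates if the weighted distance is below the threshold."""
    return sum(w * x for w, x in zip(weights, f)) < eps


def merge_converted(f, theta):
    """Converted rule: one parameter vector theta = weights + [-eps] acting on [f, 1]."""
    x = list(f) + [1.0]  # constant feature that carries the threshold
    return sum(t * v for t, v in zip(theta, x)) < 0


weights, eps = [0.5, 2.0, 1.0], 1.5
theta = weights + [-eps]
samples = [[0.2, 0.3, 0.1], [1.0, 1.0, 1.0], [0.0, 0.7, 0.2]]
decisions = [(merge_original(f, weights, eps), merge_converted(f, theta)) for f in samples]
```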
Adaptive single-link clustering with distance metric learning
(3) Model determination
With the logistic regression loss, a discriminative model is designed for distance metric learning.
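Continuing that reading (still an assumption), logistic regression on augmented pairwise features yields both quantities at once: with label 1 for pairs from different texts, the model p(different | f) = σ(w·f − ε) merges a pair exactly when w·f < ε. The sketch fits it by plain batch gradient descent on hypothetical two-feature data (a normalized gap and a height difference). Nothing here forces the weights to be non-negative, which a true distance would require; a real implementation might add that constraint.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_metric(pairs, labels, lr=1.0, epochs=5000):
    """Fit p(different texts | f) = sigmoid(w.f - eps) by batch gradient descent on the log loss.

    labels: 1 = different texts (do not merge), 0 = same text.
    Returns (weights, eps); merging then means w.f < eps, i.e. p(different) < 0.5."""
    dim = len(pairs[0]) + 1  # the last entry is the bias, i.e. -eps
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for f, y in zip(pairs, labels):
            x = list(f) + [1.0]
            err = sigmoid(sum(t * v for t, v in zip(theta, x))) - y  # d(log loss)/d(score)
            for k in range(dim):
                grad[k] += err * x[k]
        theta = [t - lr * g / len(pairs) for t, g in zip(theta, grad)]
    return theta[:-1], -theta[-1]


pairs = [[0.2, 0.1], [0.4, 0.0], [0.3, 0.2], [0.5, 0.1],   # same text
         [2.0, 0.5], [1.5, 0.9], [3.0, 0.2], [1.2, 1.0]]   # different texts
labels = [0, 0, 0, 0, 1, 1, 1, 1]
weights, eps = fit_metric(pairs, labels)
```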
Adaptive single-link clustering with distance metric learning
(4) Self-training algorithm
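The self-training step is not spelled out on this slide, so the loop below is only one plausible shape for it, assumed for illustration: rank pairs with the current metric, keep the `k` farthest same-text pairs and the `k` closest cross-text pairs, refit a logistic model on them, and stop once the selected hard set no longer changes. The starting unit weights, `k`, `max_rounds`, and the data are hypothetical, and `feats` is assumed to contain only neighboring pairs.

```python
import math


def fit(X, y, lr=1.0, epochs=3000):
    """Logistic regression by batch gradient descent; every row of X ends with a constant 1.0."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, t in zip(X, y):
            s = sum(a * b for a, b in zip(theta, x))
            p = 1 / (1 + math.exp(-s)) if s >= 0 else math.exp(s) / (1 + math.exp(s))
            for k, v in enumerate(x):
                grad[k] += (p - t) * v
        theta = [a - lr * g / len(X) for a, g in zip(theta, grad)]
    return theta


def self_train(feats, same, k=3, max_rounds=10):
    """feats: pair id -> pairwise features; same: pair id -> True if both characters share a text.

    Each round ranks pairs with the current metric, keeps the k hardest of each kind, and refits."""
    n_feat = len(next(iter(feats.values())))
    theta = [1.0] * n_feat + [0.0]  # hypothetical start: unit weights
    chosen = None
    for _ in range(max_rounds):
        dist = {p: sum(w * f for w, f in zip(theta[:-1], fs)) for p, fs in feats.items()}
        pos = sorted((p for p in feats if same[p]), key=dist.get, reverse=True)[:k]  # farthest same-text
        neg = sorted((p for p in feats if not same[p]), key=dist.get)[:k]            # closest cross-text
        if set(pos + neg) == chosen:
            break  # the hard set no longer changes
        chosen = set(pos + neg)
        X = [list(feats[p]) + [1.0] for p in pos + neg]
        y = [0] * len(pos) + [1] * len(neg)  # 1 = different texts (do not merge)
        theta = fit(X, y)
    return theta[:-1], -theta[-1]  # (distance weights, threshold)


feats = {"p1": [0.2, 0.05], "p2": [0.3, 0.1], "p3": [0.6, 0.05], "p4": [0.1, 0.0],
         "n1": [0.4, 0.8], "n2": [1.5, 0.1], "n3": [1.2, 0.3], "n4": [5.0, 0.9]}
same = {p: p.startswith("p") for p in feats}
weights, eps = self_train(feats, same)
```

On this toy data the first refit already separates every pair, including the two easy ones (`p4`, `n4`) it never trained on, and the second round selects the same hard set, so the loop stops.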
Text Candidates Elimination
Empirical results:
- In the ICDAR 2011 competition training set, only 9% of the text candidates correspond to true text.
- It is hard to train an effective text classifier on such an unbalanced dataset.
Text candidates elimination:
- Most methods are based on rules and heuristics.
- Our discriminative method uses a character classifier to estimate the posterior probability that a text candidate corresponds to non-text, and removes candidates with a high non-text probability.
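A minimal sketch of the elimination step, assuming a character classifier has already produced p(non-text | character) for each character candidate (the probabilities below are made up) and assuming the text-candidate score is the mean over its characters; the actual aggregation rule is not given on the slide. Presumably the appeal of this design is that the character classifier is trained on character samples, which sidesteps the 9%-positive imbalance at the text-candidate level.

```python
def eliminate(candidates, char_nontext_prob, threshold=0.5):
    """Keep text candidates whose estimated non-text probability is at most threshold.

    candidates: lists of character ids; char_nontext_prob: id -> p(non-text | character),
    as a (hypothetical) character classifier might output. The candidate-level probability
    is the mean over its characters, an assumed aggregation rule."""
    kept = []
    for cand in candidates:
        p = sum(char_nontext_prob[c] for c in cand) / len(cand)
        if p <= threshold:
            kept.append(cand)
    return kept


probs = {"a": 0.1, "b": 0.2, "c": 0.15, "x": 0.9, "y": 0.8}
candidates = [["a", "b", "c"], ["x", "y"], ["b", "x"]]
survivors = eliminate(candidates, probs)  # [['a', 'b', 'c']]
```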
Experiments
On the ICDAR 2011 Robust Reading Competition set (Challenge 2: Reading Text in Scene Images), compared with:
- The top 4 winners of ICDAR 2011: Kim’s, Yi’s, the TH-TextLoc system, and Neumann’s
- Shi et al.’s (Pattern Recognition Letters, 2013(2))
- Neumann and Matas’s (CVPR 2012)
Experiments: Speed on the ICDAR 2011 data set
| Method | Time per image (s) | Remarks |
|---|---|---|
| Our method | 0.43 | A Linux laptop with an Intel(R) Core(TM)2 Duo 2.00 GHz CPU |
| Shi et al.’s | 1.5 | A PC with an Intel(R) Core(TM)2 Duo 2.33 GHz CPU |
| Neumann and Matas’s | 1.8 | A “standard PC” |
Experiments (ICDAR 2011 samples)
Note the robustness against low contrast, complex backgrounds, and font variations.
Experiments
On a publicly available multilingual dataset (including Chinese and English):
- Scheme III: constructed on the ICDAR 2011 training set
- Scheme IV: constructed on the multilingual training set
- Pan et al.’s method (Yifeng Pan, Xinwen Hou, and Cheng-Lin Liu, IEEE TIP 20(3), 2011)
The speed of Pan et al.’s method was measured on a PC with a Pentium D 3.4 GHz CPU.
ICDAR 2013 Robust Reading Competition Results
Results for the ICDAR 2013 Robust Reading Competition (Challenge 2: Text Localization in Real Scenes)
ICDAR 2013 Robust Reading Competition Results
Results for the ICDAR 2013 Robust Reading Competition (Challenge 1: Text Localization in Born-Digital Images (Web and Email))
Main References
- Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, “Robust text detection in natural scene images,” IEEE Trans. Pattern Analysis and Machine Intelligence (TPAMI), 36(5), 2014.
- Xu-Cheng Yin, Wei-Yi Pei, Jun Zhang, and Hong-Wei Hao, “Multi-orientation scene text detection with adaptive clustering,” IEEE TPAMI, submitted (under revision), 2014.
- Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao, “Accurate and robust text detection: A step-in for text retrieval in natural scene images,” ACM SIGIR’13.
- Xuwang Yin, Xu-Cheng Yin, et al., “Effective text localization in natural scene images with MSER, geometry-based grouping and AdaBoost,” IAPR ICPR’12.
Discussions and Questions
Industrial R&D: multilingual text detection and recognition in natural scenes, web images, ubiquitous documents, and videos
Academic research: end-to-end text recognition and retrieval in natural scenes and web images with Feedforward-Feedback