High-level Component Filtering for Robust Scene Text Detection

Presentation transcript:

High-level Component Filtering for Robust Scene Text Detection
Weilin Huang (黄韡林)
Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences
Multimedia Laboratory, The Chinese University of Hong Kong

Outline
■ Introduction
  ♦ Connected Component and Sliding-Window Methods
  ♦ Stroke Width Transform (SWT)
  ♦ SWT based Text Detection
■ Stroke Feature Transform
  ♦ Colour Information on Text Stroke Detection
■ Text Covariance Descriptor (TCD)
  ♦ TCD for Component Filtering
  ♦ TCD for Text-line Filtering
■ Convolution Neural Network Induced MSER Trees
  ♦ Maximally Stable Extremal Regions (MSERs)
  ♦ CNN for Component Classification
  ♦ Component Splitting

I. Introduction: Text Detection Methods
■ Connected Component Methods
  ♦ Step 1: Separate text and non-text information at the pixel level
  ♦ Step 2: Group text pixels to construct character components
  ♦ Advantages: fast to compute
  ♦ Limitations: not robust; erroneous components; many false alarms
  ♦ Examples: SWT, MSERs
■ Sliding-Window Methods
  ♦ Step 1: Train a text classifier
  ♦ Step 2: Scan a sliding sub-window through the image
  ♦ Advantages: high-level text classification
  ♦ Limitations: computationally costly; features are difficult to design
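
Step 2 of the connected component methods (grouping text pixels into character components) can be sketched as a flood fill over a binary mask. This is only an illustration: the function name `connected_components`, the 4-connectivity choice, and the list-of-lists mask are assumptions, not details given on the slide.

```python
from collections import deque

def connected_components(mask):
    """Group 4-connected pixels with value 1 into components (lists of (row, col))."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

# Hypothetical pixel-level output of Step 1: two separate blobs.
mask = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
components = connected_components(mask)
```

Every component found this way still has to be filtered, which is where the limitation "many false alarms" on the slide comes from.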

I. Introduction: Stroke Width Transform (1)
■ SWT Operator
  ♦ Low-level pixel filter
  ♦ Canny edges
  ♦ Gradient orientation for ray tracking
  ♦ Compute stroke width between paired pixels
  ♦ Stroke width constraint: |Op - Oq| < λ
■ Example: SWT Map
■ Problem 1: Erroneous connections
  ♦ Connecting multiple characters
  ♦ Separating single characters
■ Problem 2: Many non-text components
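
The ray step of the SWT operator (follow the gradient direction from one edge pixel until a second edge pixel is met, then take the distance between the pair as the stroke width) can be sketched as below. This is a simplified illustration, not the full operator: `trace_ray`, the `max_steps` cap, and the toy edge map are assumptions, and the stroke width constraint check on the paired pixels is left out.

```python
import math

def trace_ray(edges, start, direction, max_steps=50):
    """Walk from edge pixel `start` along `direction` (dy, dx) until another
    edge pixel is hit. Returns ((row, col), stroke_width) or None."""
    rows, cols = len(edges), len(edges[0])
    y0, x0 = start
    dy, dx = direction
    norm = math.hypot(dy, dx)
    dy, dx = dy / norm, dx / norm  # unit step along the ray
    for step in range(1, max_steps + 1):
        y = int(round(y0 + dy * step))
        x = int(round(x0 + dx * step))
        if not (0 <= y < rows and 0 <= x < cols):
            return None  # ray left the image without finding a partner
        if (y, x) != (y0, x0) and edges[y][x]:
            return (y, x), math.hypot(y - y0, x - x0)
    return None

# Toy edge map: two vertical stroke boundaries four pixels apart.
edges = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
hit = trace_ray(edges, start=(0, 0), direction=(0, 1))
```

In a full implementation the direction would come from the image gradient at each Canny edge pixel, and every pixel along an accepted ray would be assigned the measured width.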

I. Introduction: SWT based Text Detection
■ Complete Processing: SWT → Comp. filtering → Text components → GP → Grouped text lines → TL filtering → Final text lines
■ Existing filters: Heuristic Filtering; Random Forest classifier (heuristic and geometric features)
■ Our Improvements: More powerful high-level filters
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012.

II. Stroke Feature Transform (SFT) (1)
■ SWT
  ♦ Stroke width constraint: |Op - Oq| < λ
  ♦ Output: Stroke Width Map
■ Stroke Feature Transform (SFT)
  ♦ Stroke width constraint: |Op - Oq| < λ1
  ♦ Stroke color constraint: |Cp - Cq| < λ2
  ♦ Neighborhood coherency constraint
  ♦ Output: Stroke Width Map, Stroke Color Map
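
The two thresholded constraints that SFT applies to a pixel pair (p, q) can be written as a single predicate. This is a minimal sketch under stated assumptions: O is treated as a scalar per pixel, C as an RGB triple compared by Euclidean distance, and the function name and threshold values are hypothetical. The neighborhood coherency constraint is not modelled.

```python
def satisfies_stroke_constraints(o_p, o_q, c_p, c_q, lam1, lam2):
    """Return True if the pair (p, q) passes both SFT thresholds.

    o_p, o_q: scalar O values at p and q (assumed scalar here).
    c_p, c_q: color values at p and q, assumed to be RGB triples.
    """
    width_ok = abs(o_p - o_q) < lam1                      # |Op - Oq| < λ1
    color_dist = sum((a - b) ** 2 for a, b in zip(c_p, c_q)) ** 0.5
    color_ok = color_dist < lam2                          # |Cp - Cq| < λ2
    return width_ok and color_ok

# Similar colors on both sides of the stroke: the pair is kept.
same_stroke = satisfies_stroke_constraints(
    5.0, 6.0, (200, 10, 10), (205, 12, 9), lam1=2.0, lam2=10.0)

# Very different colors: the pair is rejected even though O agrees.
different_color = satisfies_stroke_constraints(
    5.0, 6.0, (200, 10, 10), (0, 0, 255), lam1=2.0, lam2=10.0)
```

Dropping the color test reduces the predicate to the single SWT constraint, which is how the extra cue can stop a ray from pairing pixels that belong to differently colored regions.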

II. Stroke Feature Transform (SFT) (2)
■ SFT vs SWT
  ♦ Mitigate inter-component connections
  ♦ Enhance intra-component connections
  ♦ Better character candidate detection
  ♦ Higher recall

II. Stroke Feature Transform (SFT) (3)
■ Limitation: low-level operations are not robust
  ♦ Text-like outliers: bricks, windows, leaves, ...
  ♦ Many false alarms
  ♦ Low precision
  ♦ Heuristic filters do not work well
  ♦ High-level, learning-based filtering is required

III. Text Covariance Descriptor (TCD) (1)
■ Each pixel is represented by d features
■ TCD is computed as the covariance matrix of these feature vectors over a given region U
■ Multiple features are incorporated in one matrix
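
The formula itself does not appear in the transcript. Assuming the TCD follows the standard region covariance form (an assumption, with the symbols n, f_i and μ introduced here for illustration), it would read:

```latex
% Assumed standard region covariance over a region U with n pixels,
% where f_i \in \mathbb{R}^d is the feature vector of the i-th pixel.
C_U = \frac{1}{n-1} \sum_{i=1}^{n} (f_i - \mu)(f_i - \mu)^{\top},
\qquad
\mu = \frac{1}{n} \sum_{i=1}^{n} f_i
```

Under this form C_U is a symmetric d x d matrix, whatever the size of the region U.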

III. Text Covariance Descriptor (TCD) (2)
■ TCD for components (9x9 covariance features)
  ♦ Pixel coordinates in X- and Y-axis: encode spatial information
  ♦ Pixel intensities and RGB values: color uniformity
  ♦ Stroke width and distance values: stroke width/distance consistency
  ♦ Edge information by Canny detector: stroke spatial layout
■ In total, 9 features construct a 9 x 9 matrix
■ Transform to a 45-dim feature vector
■ Get component confidence maps by RF classifier
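
The step from a 9 x 9 matrix to a 45-dim vector is consistent with keeping only the upper triangle of a symmetric matrix, since 9 * 10 / 2 = 45. The sketch below shows that arithmetic; the function names, the 1/(n-1) normalisation, and the toy data are assumptions, and the slide does not say how the vector is actually formed.

```python
def covariance_matrix(features):
    """Sample covariance of a list of d-dimensional feature vectors."""
    n, d = len(features), len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for a in range(d):
            for b in range(d):
                cov[a][b] += (f[a] - mean[a]) * (f[b] - mean[b])
    return [[v / (n - 1) for v in row] for row in cov]

def upper_triangle(cov):
    """Flatten the upper triangle (diagonal included) of a symmetric matrix."""
    d = len(cov)
    return [cov[a][b] for a in range(d) for b in range(a, d)]

# Small hand-checkable case: two pixels, two features.
small = covariance_matrix([(1.0, 2.0), (3.0, 6.0)])

# Toy region with 9 features per pixel (values are arbitrary placeholders).
pixels = [[float((i * 7 + k * 3) % 11) for k in range(9)] for i in range(20)]
descriptor = upper_triangle(covariance_matrix(pixels))
```

The resulting fixed-length vector is what a classifier such as the RF mentioned on the slide could take as input, independent of how many pixels the component contains.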

III. Text Covariance Descriptor (TCD) (3)
■ TCD for text-lines: 12x12 covariance features
  ♦ Mean properties of component features: uniformity
  ♦ Coordinates of component centers: spatial information
  ♦ Heights of components: consistency
  ♦ Horizontal distances between components: text spatial layout
■ TCD for text-lines: 16x16 covariance features
  ♦ 16-bin HOG on edge pixels: oriented spatial features
■ Get text-line confidence maps by RF classifier

III. Text Covariance Descriptor (TCD) (4) ■ Component and text-line confidence maps

III. Text Covariance Descriptor (TCD) (5) ■ Top: TCD for component; Middle: TCD for text-line; Bottom: detection

III. Text Covariance Descriptor (TCD) (6)
■ Results
■ Failure Cases
W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.

Convolution Neural Network Induced MSER Trees (1)
■ Maximally Stable Extremal Region (MSER) Tree
L. Neumann and J. Matas. Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011.
■ MSER vs SWT
  ♦ Detect low-quality texts → Higher recall
  ♦ Generate more non-text components → Lower precision
  ♦ Require a more powerful classifier/filter

Convolution Neural Network Induced MSER Trees (2)
■ A Two-layer Convolution Neural Network (CNN)
T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.

Convolution Neural Network Induced MSER Trees (3)
■ Training Data: 15,000 synthetic samples
■ Data Transformation
  ♦ Fixed size of 32x32
  ♦ Horizontal warp
  ♦ Include additional image context
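
One possible reading of the data transformation is: enlarge the component's bounding box to take in some surrounding context, then resize the crop to a fixed 32x32 input, which distorts the aspect ratio of non-square crops. The sketch below follows that reading only. `expand_box`, `resize_nearest`, the 25% margin, and nearest-neighbour sampling are all assumptions made for illustration.

```python
def expand_box(box, image_size, margin=0.25):
    """Grow a (left, top, right, bottom) box by `margin` of its size,
    clipped to the image, so the crop includes extra context."""
    left, top, right, bottom = box
    img_w, img_h = image_size
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (max(0, left - pad_x), max(0, top - pad_y),
            min(img_w, right + pad_x), min(img_h, bottom + pad_y))

def resize_nearest(patch, size=32):
    """Resize a 2-D patch (list of rows) to size x size by nearest neighbour.
    Non-square patches are stretched to fit the square output."""
    rows, cols = len(patch), len(patch[0])
    return [[patch[r * rows // size][c * cols // size] for c in range(size)]
            for r in range(size)]

# Hypothetical component box inside a 100x100 image.
context_box = expand_box((10, 10, 30, 20), (100, 100))

# A 2x4 patch stretched to the fixed 32x32 input size.
patch = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
fixed = resize_nearest(patch)
```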

Convolution Neural Network Induced MSER Trees (3)
■ CNN Confidence Scores
  ♦ Figure panels: MSERs, CNN Scores, Comp. Splitting, Detection

Convolution Neural Network Induced MSER Trees (4)
■ Component Splitting: criteria for an erroneously connected component
  ♦ High aspect ratio
  ♦ Positive confidence score
  ♦ Leaf of the MSER tree, or confidence score greater than that of all children
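
The three criteria for selecting a component to split can be combined into one test on a node of the MSER tree. This is a sketch under assumptions: the `MserNode` class, the width/height definition of aspect ratio, and the threshold of 2.0 are hypothetical, since the slide gives no numeric values.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MserNode:
    width: int
    height: int
    score: float                       # CNN confidence score of the region
    children: List["MserNode"] = field(default_factory=list)

def is_split_candidate(node, min_aspect_ratio=2.0):
    """True if the node meets all three criteria listed on the slide."""
    high_aspect = node.width / node.height > min_aspect_ratio
    positive = node.score > 0
    is_leaf = not node.children
    beats_children = all(node.score > child.score for child in node.children)
    return high_aspect and positive and (is_leaf or beats_children)

# A wide, confident region whose children all score lower.
wide = MserNode(90, 30, 0.9, [MserNode(30, 30, 0.5), MserNode(28, 30, 0.4)])

# Same shape, but one child scores higher, so the child is preferred.
dominated = MserNode(90, 30, 0.3, [MserNode(30, 30, 0.8)])
```

Nodes that pass the test are the ones a splitting step would then try to break into individual characters.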

Convolution Neural Network Induced MSER Trees (5) ■ Comparisons with SFT-TCD

Convolution Neural Network Induced MSER Trees (6) ■ Results

Convolution Neural Network Induced MSER Trees (7)
■ Results on the ICDAR 2011 Database
W. Huang, Y. Qiao, and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.

The End Thank You!