Symbol Recognition Contest 2009: current status. Philippe Dosch (1), Ernest Valveny (2) and Mathieu Delalandre (2). (1) LORIA, QGAR team, Nancy, France. (2) CVC, DAG Group, Barcelona, Spain. GREC 2009 Workshop, La Rochelle, France, Thursday 23rd of July 2009.
Introduction. Context: many recognition methods exist, sometimes very ad hoc and domain dependent. Which are the most generic/robust ones? Able to recognize a large variety of data, from different application domains; robust to common noise and distortion found in documents; easy to implement and/or tune. Objective: measure their performance and robustness under different criteria and kinds of noise.
Introduction. Past recognition contests: ICPR’00, GREC’2003, GREC’2005 and GREC’2007. Contest evolution: ICPR’00, GREC’2003, GREC’2005: segmented technical symbols; GREC’2007: segmented logos; GREC’2009: whole drawings (i.e. symbol localization). Agenda: by 31st of July, training datasets will be available; by October, the contest will be run online. Interested people are invited to participate.
Introduction Concerned data
Plan: recognition datasets (segmented technical symbols and logos); localization datasets (drawings, queries); conclusions.
Recognition Datasets. Basic dataset (images/class): all classes included. Scalability: subsets of the basic dataset with increasing number of classes (25, 50, 100, 150). Geometric transformations: application of rotation and scaling to the images of the basic dataset. Image degradations: application of increasing levels of degradation to the images of the basic dataset (for each kind of degradation).
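The scalability subsets described above can be sketched as follows. This is only an illustration: the slides do not say how the classes are chosen, so the random selection, the nesting of each subset inside the next larger one, the function name `scalability_subsets` and the class names are all assumptions.

```python
import random


def scalability_subsets(class_names, sizes=(25, 50, 100, 150), seed=0):
    """Return {size: sorted list of class names} for each requested size.

    Assumption: subsets are nested, i.e. the 25-class subset is contained
    in the 50-class subset, and so on. The slides do not confirm this.
    """
    if max(sizes) > len(class_names):
        raise ValueError("largest subset exceeds the number of available classes")
    rng = random.Random(seed)
    shuffled = list(class_names)
    rng.shuffle(shuffled)
    # Prefixes of a single shuffled list are nested by construction.
    return {n: sorted(shuffled[:n]) for n in sorted(sizes)}


# Hypothetical class names standing in for the basic dataset.
classes = [f"symbol_{i:03d}" for i in range(150)]
subsets = scalability_subsets(classes)
```

Fixing the seed keeps the subsets reproducible, so that every participant would be evaluated on the same class lists.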
Recognition Datasets. Noise A, noise B, noise E.
Recognition Datasets (Domain, Nº of symbol models, Nº of images / symbol models, Symbols, Noise)
Technical: Rotation; Scaling; Rotation and Scaling; Noise A (1-5); Noise B (1-5); Noise E (1-6)
Logos: Rotation; Scaling; Rotation and Scaling; Noise A (1-5); Noise B (1-5); Noise E (1-6)
Training sets
Localization Datasets. Building engine: (1) edit, (2) run, (3) display. Figure elements: symbol models, background image, symbol model, loaded symbol, bounding box and control point alignment (labels c1, c2, M1 to M4, C1 to C4, L1, θ1, p1, L2, θ2, p2, p, L).
Localization Datasets
Groundtruth. Generation of queries: 1. random selection of a document; 2. random selection of a symbol; 3. random crop. Background Dataset 1, Background Dataset 2, ..., Background Dataset n: random selection of a test image with groundtruth, then image degradation, giving Contest Dataset 1, Contest Dataset 2, ..., Contest Dataset n.
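The three query-generation steps can be sketched as below. The slides give no detail on the crop, so this sketch assumes that "random crop" means a window around the chosen symbol with random padding on each side, clipped to the page. The groundtruth layout (dicts with `name`, `size`, `symbols`, `bbox`), the `margin` parameter and the function name `generate_query` are hypothetical.

```python
import random


def generate_query(documents, rng, margin=10):
    """Pick a document, pick one of its symbols, and return a random crop box."""
    doc = rng.choice(documents)            # 1. random selection of a document
    symbol = rng.choice(doc["symbols"])    # 2. random selection of a symbol
    x0, y0, x1, y1 = symbol["bbox"]
    width, height = doc["size"]
    # 3. random crop (assumed): pad each side by 0..margin pixels, clip to the page
    crop = (
        max(0, x0 - rng.randint(0, margin)),
        max(0, y0 - rng.randint(0, margin)),
        min(width, x1 + rng.randint(0, margin)),
        min(height, y1 + rng.randint(0, margin)),
    )
    return {"document": doc["name"], "label": symbol["label"], "crop": crop}


# Hypothetical groundtruth for two drawings.
documents = [
    {"name": "plan_01", "size": (400, 300),
     "symbols": [{"label": "door", "bbox": (20, 30, 60, 70)},
                 {"label": "sofa", "bbox": (200, 100, 280, 150)}]},
    {"name": "plan_02", "size": (500, 500),
     "symbols": [{"label": "sink", "bbox": (0, 0, 40, 40)}]},
]
rng = random.Random(1)
queries = [generate_query(documents, rng) for _ in range(50)]
```

Because the padding is never negative, each crop always contains the full bounding box of the symbol it was generated from.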
Localization Datasets. Level 1, Level 2, Level 3.
Localization Datasets (Type, Domain, Nº of symbol models, Images, Symbols, Noise)
Drawings, Architectural: ideal
Drawings, Architectural: level 1
Drawings, Architectural: level 2
Drawings, Architectural: level 3
Drawings, Electrical: 20, 246, ideal
Drawings, Electrical: 20, 274, level 1
Drawings, Electrical: 20, 237, level 2
Drawings, Electrical: 20, 322, level 3
Queries, Both: 36900 NA 900
Conclusions. New feature of the contest: localization datasets. Remaining work: performance characterization for localization, with a simple method (e.g. bounding box overlapping). Agenda: by 31st of July, training datasets will be available; by October, the contest will be run online. Interested people are invited to participate, please contact us:
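As an illustration of the "bounding box overlapping" idea mentioned under remaining work, here is a minimal sketch. The contest's actual measure is not defined in these slides; the intersection-over-union ratio, the 0.5 threshold, the greedy one-to-one matching and the function names are all assumptions.

```python
def bbox_overlap(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def score_localization(detections, groundtruth, threshold=0.5):
    """Greedy matching of (label, box) detections to groundtruth symbols.

    A detection counts as correct when it overlaps a not-yet-matched
    groundtruth symbol of the same label by at least `threshold` (assumed).
    Returns (precision, recall).
    """
    unmatched = list(groundtruth)
    correct = 0
    for label, box in detections:
        candidates = [g for g in unmatched if g[0] == label]
        best = max(candidates, key=lambda g: bbox_overlap(box, g[1]), default=None)
        if best is not None and bbox_overlap(box, best[1]) >= threshold:
            unmatched.remove(best)
            correct += 1
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(groundtruth) if groundtruth else 0.0
    return precision, recall


# Hypothetical groundtruth and system output for one drawing.
groundtruth = [("sofa", (0, 0, 10, 10)), ("door", (20, 20, 30, 30))]
detections = [("sofa", (1, 1, 10, 10)), ("door", (100, 100, 110, 110))]
precision, recall = score_localization(detections, groundtruth)
```

In this example the sofa detection overlaps its groundtruth box well and is counted as correct, while the door detection lies far from the real door and is not.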