Avoiding Segmentation in Multi-digit Numeral String Recognition by Combining Single and Two-digit Classifiers Trained without Negative Examples Dan Ciresan.


Avoiding Segmentation in Multi-digit Numeral String Recognition by Combining Single and Two-digit Classifiers Trained without Negative Examples
Dan Ciresan
Politehnica University of Timisoara, Computer Department, Timisoara, Romania

Introduction
the objective of this work is to provide an efficient technique for off-line recognition of handwritten numeral strings
practical applications:
–postal code recognition (in the USA alone, 250 million envelopes are sorted every day)
–information extraction from the fields of various forms
best digit recognition rate: 99.6% (Simard [18]) on MNIST
best numeral string recognition rate: 96-97% (Liu, Sako and Fujisawa [12]) on a set from NIST SD 19
almost all existing methods use segmentation and training with negative examples

Numeral string recognition system
the proposed solution uses two Convolutional Neural Networks (CNNs): one for single digit recognition (1CNN) and one for numeral strings composed of two partially overlapping digits (2CNN)
both classifiers are trained without negative examples, and the use of the 2CNN removes any need for segmentation
by comparing the results of the two classifiers, the system decides whether the image contains one digit or two
evaluated on NIST SD 19; the results are comparable with the best results in the literature, even though those rely on elaborate segmentation

Connected Component Analyzer (CCA)
all connected components of the input image are extracted using a recursive search in only four directions, horizontal and vertical (a sketch follows below)
for each component, several parameters are computed: size, bounding rectangle, width, height, aspect ratio (width/height), and distances to the other components
the CCA assigns a color (label) to each component
no segmentation is performed: not even a single connected component is split into parts; only already separated components are extracted
in contrast to other methods [12, 6, 7], complex segmentation is avoided by using a classifier for two connected digits
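
As a rough illustration of this extraction step, here is a minimal sketch of 4-connected component labeling together with the per-component parameters listed above; the function name, the dictionary fields, and the use of an iterative flood fill instead of the recursive search are illustrative choices, not taken from the paper:

    import numpy as np

    def label_components(binary):
        """Label 4-connected foreground components of a binary image (H x W array of 0/1)."""
        h, w = binary.shape
        labels = np.zeros((h, w), dtype=int)
        components = []
        current = 0
        for y in range(h):
            for x in range(w):
                if binary[y, x] and labels[y, x] == 0:
                    current += 1
                    stack, pixels = [(y, x)], []
                    while stack:                      # flood fill in four directions only
                        cy, cx = stack.pop()
                        if 0 <= cy < h and 0 <= cx < w and binary[cy, cx] and labels[cy, cx] == 0:
                            labels[cy, cx] = current
                            pixels.append((cy, cx))
                            stack += [(cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)]
                    ys, xs = zip(*pixels)
                    wdt, hgt = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
                    components.append({
                        "label": current,
                        "size": len(pixels),
                        "bbox": (min(xs), min(ys), max(xs), max(ys)),
                        "width": wdt, "height": hgt,
                        "aspect": wdt / hgt,          # width / height
                    })
        return labels, components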

The Clustering Stage
in order to reconstruct the image, the clustering stage performs the following four operations in sequence (sketched below):
–all small components far from any other component are deleted
–any two components separated by only one pixel are concatenated if they satisfy several conditions
–the previous step is repeated for components separated by two or three pixels (e.g. reconnecting a broken digit five)
–any two components that completely overlap horizontally are concatenated
(figures on the slide: "Reconnecting the digit five", "The end result of clustering")
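
A sketch of how such a clustering pass might look, operating on the component dictionaries produced above; the gap and size thresholds and the exact merge conditions are assumptions, since the slide does not list them in full:

    def bbox_gap(a, b):
        """Pixel gap between two bounding boxes (0 if they touch or overlap)."""
        ax0, ay0, ax1, ay1 = a["bbox"]
        bx0, by0, bx1, by1 = b["bbox"]
        dx = max(bx0 - ax1 - 1, ax0 - bx1 - 1, 0)
        dy = max(by0 - ay1 - 1, ay0 - by1 - 1, 0)
        return max(dx, dy)

    def fully_overlapped_horizontally(a, b):
        """True if one component's horizontal extent lies entirely within the other's."""
        ax0, _, ax1, _ = a["bbox"]
        bx0, _, bx1, _ = b["bbox"]
        return (bx0 <= ax0 and ax1 <= bx1) or (ax0 <= bx0 and bx1 <= ax1)

    def merge(a, b):
        """Union of two components (only size and bounding box are tracked here)."""
        ax0, ay0, ax1, ay1 = a["bbox"]
        bx0, by0, bx1, by1 = b["bbox"]
        return {"size": a["size"] + b["size"],
                "bbox": (min(ax0, bx0), min(ay0, by0), max(ax1, bx1), max(ay1, by1))}

    def cluster(components, min_size=15, far_gap=10, max_gap=3):
        # 1. delete small components that are far from every other component (thresholds assumed)
        kept = [c for c in components
                if c["size"] >= min_size
                or any(bbox_gap(c, o) <= far_gap for o in components if o is not c)]
        # 2./3. concatenate components separated by at most max_gap pixels
        # 4. concatenate components that completely overlap horizontally
        changed = True
        while changed:
            changed = False
            for i in range(len(kept)):
                for j in range(i + 1, len(kept)):
                    if bbox_gap(kept[i], kept[j]) <= max_gap or fully_overlapped_horizontally(kept[i], kept[j]):
                        kept[i] = merge(kept[i], kept[j])
                        del kept[j]
                        changed = True
                        break
                if changed:
                    break
        return kept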

The Classifier
uses two Convolutional Neural Networks (CNNs), one for single digit recognition and one for pairs of partially overlapping digits
the single digit CNN (1CNN) is similar to that of Simard [18]
–trained on the LNIST set of digit images
–tested on the LNIST set of digit images
–recognition rate: 99.34%
–architecture: five layer CNN (L0 – input, L1 and L2 – convolution, L3 and L4 – fully connected)
the two-digit CNN (2CNN) was presented in previous work [4]
–trained with images automatically generated from digit images of NIST SD 19
–tested with 21x13 pixel images
–recognition rate: 94.65%
–architecture: five layer CNN (L0 – input, L1 and L2 – convolution, L3 and L4 – fully connected)
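
The slide only fixes the layer layout (input, two convolutional layers, two fully connected layers), so the following PyTorch sketch fills in Simard-style feature map counts, 5x5 kernels and a 29x29 single-digit input as assumptions; the 100-class output for the 2CNN (one class per digit pair '00'..'99') is likewise an assumption:

    import torch
    import torch.nn as nn

    class FiveLayerCNN(nn.Module):
        """L0 input, L1-L2 convolution, L3-L4 fully connected; layer sizes are assumed, not from the paper."""
        def __init__(self, in_size=(29, 29), n_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 5, kernel_size=5, stride=2, padding=2),    # L1: 5 feature maps
                nn.Tanh(),
                nn.Conv2d(5, 50, kernel_size=5, stride=2, padding=2),   # L2: 50 feature maps
                nn.Tanh(),
            )
            with torch.no_grad():
                n_flat = self.features(torch.zeros(1, 1, *in_size)).numel()
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(n_flat, 100),        # L3: hidden fully connected layer
                nn.Tanh(),
                nn.Linear(100, n_classes),     # L4: output layer
            )

        def forward(self, x):
            return self.classifier(self.features(x))

    one_cnn = FiveLayerCNN(in_size=(29, 29), n_classes=10)    # 1CNN: classes '0'..'9'
    two_cnn = FiveLayerCNN(in_size=(21, 13), n_classes=100)   # 2CNN: classes '00'..'99' (assumed encoding)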

Test set - 3DNS (Three Digit Numeral String)
in order to compare our method with the best in the field [12], we tested our recognition system on exactly the same data set (derived from NIST SD 19)
all three-digit images from 300 writers were extracted
four pages written with faded ink were discarded
each form contains five fields with three-digit strings, for a total of 1480 images
four images that mismatch the ground truth were eliminated, leaving 1476 test images

Recognition system based on maximum score
each component is presented to both 1CNN and 2CNN; the best score of each CNN is compared, and the classifier that produces the maximum determines the class and the number of digits (see the sketch below)
out of 1476 three-digit images, 126 were incorrectly recognized, i.e. a 91.46% recognition rate
observations:
–because both CNNs were trained without negative examples, they cannot be used directly to detect whether a component contains one, two or more digits
–a NN trained only with positive examples will try to map any input image, even an invalid one, to the closest resembling class
–many images containing the digit 0, 1, 3, 4, 7, 8 or 9 were recognized by the 2CNN as 01, 11, 31, 41, 71, 81 or 91 respectively; the problem can be attributed to the method of joining the digits ([4]) for the 2CNN training set; these cases will be denoted X1 cases
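
A minimal sketch of the maximum-score decision rule described on this slide, assuming each classifier returns a vector of class scores for the component image (function and variable names are illustrative):

    import numpy as np

    def classify_max_score(scores_1cnn, scores_2cnn):
        """The classifier whose best score is larger decides the class and the digit count."""
        best1, best2 = np.max(scores_1cnn), np.max(scores_2cnn)
        if best1 >= best2:
            return str(int(np.argmax(scores_1cnn))), 1       # e.g. ('7', 1)
        return f"{int(np.argmax(scores_2cnn)):02d}", 2       # e.g. ('71', 2)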

Translating the image on the input field of the classifiers
because the images for the 2CNN are only 21x13 pixels and are not mass centered, the 2CNN is more sensitive to translation than the 1CNN
if the image is perfectly centered on the classifier's input, the scores are better
we therefore tried moving the image within the input field of the classifiers
the 1CNN was trained with mass-centered images; we repeatedly applied the 1CNN to the input image translated by ±1 pixel relative to the mass center and kept the greatest score of the 3×3 = 9 tests
the 2CNN was trained with bounding-box-centered images; following the generation method [4] of those images, we placed the 18x10 pixel image at all positions in the 21x13 input of the 2CNN, giving (21 − 18 + 1) × (13 − 10 + 1) = 4 × 4 = 16 possibilities (sketched below)
the recognition rate decreased to only 71.54%
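
A sketch of the translation sweep for the 2CNN, placing the 18x10 image at every position of the 21x13 input and keeping the best-scoring placement; for the 1CNN the same idea is restricted to ±1 pixel shifts around the mass center. The zero padding and the function name are assumptions:

    import numpy as np

    def best_over_translations(classifier, img, input_shape=(21, 13)):
        """Evaluate the classifier at every placement of img inside the input field,
        returning the score vector of the best placement and its offset."""
        ih, iw = input_shape
        h, w = img.shape
        best_scores, best_offset = None, None
        for dy in range(ih - h + 1):            # e.g. 21 - 18 + 1 = 4 vertical positions
            for dx in range(iw - w + 1):        # e.g. 13 - 10 + 1 = 4 horizontal positions
                field = np.zeros(input_shape, dtype=img.dtype)
                field[dy:dy + h, dx:dx + w] = img
                scores = classifier(field)
                if best_scores is None or np.max(scores) > np.max(best_scores):
                    best_scores, best_offset = scores, (dy, dx)
        return best_scores, best_offset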

Recognition system based on differences of scores
we try to avoid training the CNNs with negative examples by using the fact that the difference between the best and the second best score of a NN is very large for a correct recognition
for X1 cases (at least one of the recognized digits is 1), the condition is strengthened
the recognition rate is only 76.49%
applying translation for both 1CNN and 2CNN increases the recognition rate to 93.36%
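
A sketch of the score-difference criterion, where the margin between the best and the second best score of each classifier drives the decision; the strengthening factor applied to X1 cases is purely illustrative, since the slide does not give its value:

    import numpy as np

    def margin(scores):
        """Difference between the best and the second best score."""
        top2 = np.sort(scores)[-2:]
        return top2[1] - top2[0]

    def classify_by_margin(scores_1cnn, scores_2cnn, x1_factor=2.0):
        """The classifier with the larger margin wins; X1 cases require a larger 2CNN margin."""
        m1, m2 = margin(scores_1cnn), margin(scores_2cnn)
        pair = int(np.argmax(scores_2cnn))
        if pair % 10 == 1 or pair // 10 == 1:    # X1 case: one of the recognized digits is 1
            m2 /= x1_factor                       # strengthened condition (factor is assumed)
        if m1 >= m2:
            return str(int(np.argmax(scores_1cnn))), 1
        return f"{pair:02d}", 2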

Recognition system based on both maximum score and differences of scores
a combination of the two previous methods: for each classifier we simply add the best score to the difference between it and the second best score
for X1 cases (at least one of the recognized digits is 1), the condition is strengthened
the recognition rate, 83.40%, is greater than that of the method based on differences of scores, but smaller than that of the method based on maximum scores
applying translation for both 1CNN and 2CNN increases the recognition rate to 93.77%
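
The combined criterion simply adds the best score and its margin over the second best score; a sketch (the X1 strengthening is omitted here for brevity):

    import numpy as np

    def combined(scores):
        """Best score plus the difference between the best and the second best score."""
        top2 = np.sort(scores)[-2:]
        return top2[1] + (top2[1] - top2[0])

    def classify_combined(scores_1cnn, scores_2cnn):
        """Pick the classifier with the larger combined score; returns (label, digit count)."""
        if combined(scores_1cnn) >= combined(scores_2cnn):
            return str(int(np.argmax(scores_1cnn))), 1
        return f"{int(np.argmax(scores_2cnn)):02d}", 2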

Error analysis
the 92 misrecognized images out of the 1476 in the 3DNS test set (93.77% recognition rate) fall into the following types:
–4 segmentation errors, all caused by very poorly scanned images
–8 three-digit errors; this case is not addressed by the current method
–23 errors of the 1CNN; main causes: under-representation in the training set, very confusing images
–26 errors of the 2CNN; main cause: the two digits differ greatly in height (even by more than 200%), while the 2CNN was trained with pairs of digits differing by at most 10% in height
–63 errors generated by selecting the wrong classifier; they could be corrected by complicating the rules that select the classifier or, preferably, by training the classifiers with negative examples
in order to verify the benefit of the 2CNN, we deactivated it and tested the recognition system with the single digit classifier (1CNN) only; the recognition rate decreased by more than 4%

Conclusions
we devised a new method for numeral string recognition, based on two CNNs: one for single digit recognition and the other for pairs of digits
the segmentation process was eliminated by using a two-digit CNN
training with negative examples was avoided by implementing simple rules for choosing the proper CNN
our recognition rate of 93.77% is better than all previous results (Liu, Sako and Fujisawa [12]) obtained with NNs trained without negative examples, and within 3% of the best result from [12], even though we used neither segmentation nor negative examples for training the networks
adding the two-digit classifier to the recognition system increased the recognition rate by more than 4%

Future work
implementing the CNNs on GPU for speed acceleration (more than 20x) => bigger CNNs can be trained (2CNN input from 21x13 to 41x29)
adding negative examples to the training process in order to increase the recognition rate and to further simplify the recognition system
devising a method for the (rare) cases with three or more joined digits
improving the automated process for generating the 2CNN training set (e.g. the X1 cases)