Handwriting Recognition CPSC UTC/CSE. Handprint Recognition aims to design systems which are able to recognize handwriting of natural language.

Handwriting Recognition CPSC UTC/CSE

Handprint Recognition
Handprint recognition aims to design systems that can recognize handwritten natural language. Methods and recognition rates depend on the level of constraints on the handwriting. The constraints are mainly characterized by:
– the types of handwriting
– the number of writers (scriptors)
– the size of the vocabulary
– the spatial layout

Methods and Strategies
Recognition strategies depend heavily on the nature of the data to be recognized. In the cursive case, the problem is complex because the writing is fundamentally ambiguous: the letters in a word are generally linked together, poorly written, and may even be missing. By contrast, hand-printed word recognition is closer to printed word recognition, since the individual letters composing the word are usually much easier to isolate and identify.

Character Recognition
Character recognition techniques can be classified according to two criteria:
– the way preprocessing is performed on the data
– the type of decision algorithm
Preprocessing techniques include:
– global transforms (correlation, Fourier descriptors, etc.)
– local comparison (local density, intersections with straight lines, variable masks, etc.)
– geometrical or topological characteristics (strokes, loops, openings, diacritical marks, skeleton, etc.)
Decision methods include:
– various statistical methods
– neural networks
– structural matching (on trees, chains, etc.)
– stochastic processing (Markov chains, etc.)

Word Recognition
Two main types of strategies have been applied to this problem:
– The holistic approach: recognition is performed globally on the whole representation of a word, with no attempt to identify characters individually. The main advantage of holistic methods is that they avoid word segmentation.
– The analytical approach: recognition deals with several levels of representation corresponding to increasing levels of abstraction (usually the feature level, the grapheme or pseudo-letter level, and the word level). Words are not considered as a whole but as sequences of smaller units, which must be easily related to characters so that recognition is independent of a specific vocabulary.

Form-based Handprint Recognition
The National Institute of Standards and Technology (NIST) released to the public a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR).
Reference: Yang/CPSC415/4-Handwriting/hsfsys2.pdf

Form-based Handprint Recognition
The NIST system is designed to read the hand-printed characters written on a Handwriting Sample Form (HSF). The form is designed to collect a large sample of handwriting to support handprint recognition research. NIST Special Database 19 (SD19) contains 3669 completed forms, each filled in by a unique writer and scanned in binary at 11.8 pixels per millimeter. The dataset also contains over 800,000 segmented and labeled character images from these forms.

There is a blank form provided that can be printed, filled in, scanned and recognized.

System Components
– Batch Initialization
– Load Form Image
– Register Form Image
– Remove Form Box
– Isolate Lines of Handprint
– Segment Text Lines
– Normalize Characters
– Extract Feature Vectors
– Classify Characters
– Spell-correct Text Lines

Batch Initialization
Load the items pre-computed during training:
– A list of image files to be processed
– Coordinate locations of dominant form structures used for form registration
– Spatial template containing the coordinate locations
– Basis functions used for feature extraction
– Neural network weights for classification
– Dictionaries for spelling correction
There are four types of fields: numeric, lowercase, uppercase, and preamble paragraph. Each type of field requires a separate set of basis functions and neural network weights.

Register Form Image
To reliably isolate the handprint on a form, form registration automatically estimates the amount of rotation and translation in the image. Because most forms contain a fixed configuration of vertical and horizontal lines, parallel rays are traced across the image, accumulating the number of black pixels along each ray. A range of ray angles is sampled, and the angle producing the maximum response is used to estimate the rotational skew.
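The ray-tracing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NIST implementation: the function name `estimate_skew`, the representation of the image as a list of (x, y) black-pixel coordinates, and the choice of "the single ray hitting the most black pixels" as the response are all assumptions made for this example.

```python
import math
from collections import Counter

def estimate_skew(black_pixels, angles_deg):
    """Return the candidate angle (in degrees) whose parallel rays give the strongest response.

    black_pixels: iterable of (x, y) coordinates of black pixels (assumed input format).
    angles_deg:   candidate ray angles to sample.
    """
    pixels = list(black_pixels)
    best_angle, best_score = None, -1
    for angle in angles_deg:
        theta = math.radians(angle)
        # A ray at this angle is identified by the pixel's offset perpendicular
        # to the ray direction, rounded to the nearest integer.
        counts = Counter(
            round(y * math.cos(theta) - x * math.sin(theta)) for x, y in pixels
        )
        score = max(counts.values())  # black pixels on the strongest ray
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

For a single form line drawn at 5 degrees, sampling `range(-10, 11)` returns 5, because only at that angle do all the line's pixels fall on the same ray.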

A prototypical form is scanned, its rotational distortion is automatically measured and removed, and the positions of the detected dominant lines are stored for future registration. The image is the result of logically ORing corresponding pixels across a set of 500 registered images.

Remove Form Box
Given a field sub-image, black pixels corresponding to the handwriting must be separated from the black pixels corresponding to the form. The box must be located within the field sub-image and its sides removed intelligently so as to preserve overlapping characters. The sides of the box are detected using a run-based technique that tracks the longest runs across the sub-image. Overlapping character strokes are identified using spatial cues, and only pixels that are distinctly part of the form's box are removed.
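A run-based detector for the horizontal sides of a box might look like the sketch below. It is only an illustration: the names `longest_run` and `find_box_rows`, the 0/1 list-of-rows image format, and the 80% width threshold are assumptions, and the stroke-preservation step described above is not attempted.

```python
def longest_run(row):
    """Length of the longest run of consecutive black (truthy) pixels in a row."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best

def find_box_rows(image, min_fraction=0.8):
    """Indices of rows whose longest black run spans most of the image width.

    image: list of rows, each a list of 0/1 pixels (assumed format).
    min_fraction: hypothetical threshold, not a value from the NIST system.
    """
    width = len(image[0])
    return [y for y, row in enumerate(image) if longest_run(row) >= min_fraction * width]
```

The same function applied to the transposed image would give candidate columns for the vertical sides.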

Isolate Lines of Handprint
A connected component is defined as the largest set of black pixels where each pixel is a direct neighbor of at least one other black pixel in the component. For multiple-line responses, no lines are provided within the paragraph box to guide the writer, so a bottom-up approach is used to isolate the lines of handprint within a paragraph. Each component is represented by its geometric center. To reconstruct the handprinted lines of text, a nearest-neighbor search is performed left-to-right and top-to-bottom through the resulting system of 2-dimensional points.
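The bottom-up procedure can be sketched as follows. This is a simplified illustration, not the NIST code: the function names, the 8-connected neighborhood, the use of the pixel centroid as the "geometric center", and the `max_dy` vertical tolerance used when chaining points into lines are all assumptions.

```python
import math
from collections import deque

def connected_components(image):
    """Return 8-connected components of a 0/1 image as lists of (x, y) pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                x, y = queue.popleft()
                pixels.append((x, y))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            components.append(pixels)
    return components

def center(pixels):
    """Geometric center of a component, taken here as the mean pixel position."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def group_into_lines(centers, max_dy):
    """Greedily chain centers left-to-right into lines, working top-to-bottom."""
    remaining, lines = list(centers), []
    while remaining:
        top = min(y for _, y in remaining)
        # Start a line at the leftmost point close to the current top of the page.
        current = min((p for p in remaining if p[1] - top <= max_dy), key=lambda p: p[0])
        remaining.remove(current)
        line = [current]
        while True:
            candidates = [p for p in remaining
                          if p[0] > current[0] and abs(p[1] - current[1]) <= max_dy]
            if not candidates:
                break
            current = min(candidates, key=lambda p: math.dist(p, current))
            remaining.remove(current)
            line.append(current)
        lines.append(line)
    return lines
```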

Segment Text Lines
Connected components are used as first-order approximations to single and complete characters. Connected components frequently represent single characters and are computed very quickly. Errors occur when characters touch one another and when characters are written with disconnected strokes (which occurs naturally with dotted letters).

A Simple Adaptive Model of Writing Style
In a simple adaptive model of writing style, fragmented characters are reconstructed, multiple characters are split, and noise components are identified and discarded.
M. D. Garris, “Component-Based Handprint Segmentation Using Adaptive Writing Style Model,” NIST Internal Report 5843, June 1996.

Model Writing Style
To adapt to variations in handwriting style, one needs to capture statistically how much black ink (how many pixels) in an image is likely to constitute a single character. Two simple statistical features are measured from each isolated image of handwriting:
– The estimated stroke width (esw) approximates the width of the lines comprising the characters.
– The estimated character height (ech) is the maximum height of all the connected components in the image.
Two derived quantities follow from these:
– Standard stroke pixel (ssp) = esw * esw (the square of one stroke width)
– Standard stroke area (ssa) = esw * ech

Noise and dot components are identified with two rules:
If (component.area < (0.5 * ssa)) then Noise
where component.area is the pixel area of the component and ssa is the standard stroke area.
If (component.width < (2 * esw)) && (component.height < (3 * esw)) then Dot
where component.width and component.height are the pixel width and height of the component.
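The two rules translate directly into code. The sketch below follows the thresholds exactly as written on the slide; the `Component` class, the function name, the order in which the rules are applied, and the fallback label "stroke" are assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class Component:
    width: int   # pixel width of the bounding box
    height: int  # pixel height of the bounding box
    area: int    # pixel area of the component

def classify_component(c, esw, ech):
    """Label a component as 'noise', 'dot', or (hypothetical fallback) 'stroke'."""
    ssa = esw * ech  # standard stroke area
    if c.area < 0.5 * ssa:
        return "noise"
    if c.width < 2 * esw and c.height < 3 * esw:
        return "dot"
    return "stroke"
```

For example, with esw = 4 and ech = 10 (so ssa = 40), a 5-by-6 component with area 25 is labeled a dot, while one with area 10 is discarded as noise.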

Characters that required the merging of connected components

Multiple Character Detection
Before one can split touching characters, one must be able to detect that multiple characters exist in a component image. A simple aspect ratio (ar) was tested:
ar = w / ech
where w is the width of the component and ech is the estimated character height for the field. The larger the width is relative to the height, the more likely the component contains multiple characters. A training set of single and touching character components was used to compute a range of aspect ratio samples, and a threshold was derived empirically.

Multiple Character Detection
A second feature is the standard stroke count (ssc):
ssc = p / ssa
where p is the black pixel count of the component.
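Both features are simple ratios and can be computed as below. The function names are illustrative, and the threshold is passed in by the caller because the empirically derived value is not given here; how the original system combines the two features is not shown in this sketch.

```python
def aspect_ratio(width, ech):
    """ar = w / ech: component width relative to the estimated character height."""
    return width / ech

def standard_stroke_count(black_pixels, esw, ech):
    """ssc = p / ssa, with ssa = esw * ech."""
    return black_pixels / (esw * ech)

def looks_like_multiple(width, ech, ar_threshold):
    """Flag a component as likely holding several characters (threshold is caller-supplied)."""
    return aspect_ratio(width, ech) > ar_threshold
```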

Vertically Straight Cut
(Figure: an example of multiple characters.)
A component determined to contain multiple touching characters must be analyzed further to derive a strategy for splitting the characters.

Vertically Straight Cut
Perpendicular distances are computed from the left and right feature points to the detector line, and the larger of the two distances is stored along with the x-position of the vertical cut. By minimizing the maximum perpendicular distance across the range of cuts, the vertical cut is selected whose left and right pieces both contain maximal pixel data and both qualify as single characters.

Contoured Cut Path
Sometimes a single straight cut does not satisfactorily divide the component. In these cases, a more sophisticated non-straight path is required. Starting at the x-position of the optimal vertical cut, a search (or trace) is initiated from the top of the component downwards and from the bottom of the component upwards. The trace downwards (the top-trace) behaves much like sand being dribbled down the side of a complex surface.

Normalize Characters and Extract Feature Vectors
The segmented character images vary greatly in size, slant, and shape. Image normalization is performed to deal with the size and slant of the writing, leaving the recognition process primarily the task of differentiating characters by variation in shape. The Karhunen-Loève (KL) transform is applied to the resulting binary pixel vectors in order to reduce dimensionality, suppress noise, and produce optimally compact features (in terms of variance) for classification.
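To make the KL transform concrete, the sketch below computes only the first basis function (the direction of greatest variance) and projects a vector onto it. It is a toy illustration in pure Python: power iteration is used here for simplicity and is not claimed to be the method used in the NIST system, and the function names are assumptions. A practical system would keep many basis functions and use a linear algebra library.

```python
import math

def first_kl_basis(vectors, iterations=100):
    """Return (mean, basis) where basis is the leading eigenvector of the covariance matrix."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    # Power iteration from an arbitrary, non-uniform starting vector.
    basis = [1.0 / (i + 1) for i in range(dim)]
    for _ in range(iterations):
        product = [sum(cov[i][j] * basis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(p * p for p in product))
        if norm == 0:
            break  # no variance in the data; keep the current vector
        basis = [p / norm for p in product]
    return mean, basis

def project(vector, mean, basis):
    """KL feature: coordinate of the mean-centered vector along the basis function."""
    return sum((v - m) * b for v, m, b in zip(vector, mean, basis))
```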

Classify Characters
Once segmented characters are represented as feature vectors, a whole host of different pattern classification techniques can be applied. Examples include the Probabilistic Neural Network (PNN) and the Multi-Layer Perceptron (MLP) neural network.
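As an illustration of the PNN idea, the sketch below scores each class by the average Gaussian kernel response between the sample and that class's training vectors, then picks the highest-scoring class. The function name, the data layout, and the smoothing parameter `sigma` are assumptions for this example, not details of the NIST classifier.

```python
import math
from collections import defaultdict

def pnn_classify(sample, training, sigma=1.0):
    """Return the label whose training vectors give the highest mean kernel response.

    training: list of (feature_vector, label) pairs (assumed format).
    sigma:    kernel width; a hypothetical default, to be tuned on real data.
    """
    responses = defaultdict(list)
    for features, label in training:
        dist2 = sum((a - b) ** 2 for a, b in zip(sample, features))
        responses[label].append(math.exp(-dist2 / (2 * sigma ** 2)))
    return max(responses, key=lambda label: sum(responses[label]) / len(responses[label]))
```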

Spell-Correct Text Line(s)
At this point, segmented character images have been extracted from the handprinted paragraph, sorted into reading order line by line, and classified. This results in one long contiguous character stream for each line in the paragraph. Words are parsed from each line of raw classifications by applying the preloaded dictionary.
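One simple way to parse words out of a contiguous character stream is dictionary-driven segmentation. The sketch below uses exact matching with dynamic programming; it is an assumption-laden simplification, since the description above does not say how the system handles misclassified characters, and the function name is hypothetical.

```python
def parse_words(stream, dictionary):
    """Split a character stream into dictionary words, or return None if impossible."""
    # best[i] holds one valid segmentation of stream[:i], or None if there is none.
    best = [None] * (len(stream) + 1)
    best[0] = []
    for end in range(1, len(stream) + 1):
        for start in range(end):
            word = stream[start:end]
            if best[start] is not None and word in dictionary:
                best[end] = best[start] + [word]
                break
    return best[-1]
```

Handling recognition errors would require approximate matching (for example, an edit-distance search against the dictionary) in place of the exact `in dictionary` test.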

Lab Exercises
Download and install the NIST form-based handprint software. Test and document the process of handwriting recognition.
databases/defs/nist_ocr.html

References
M. D. Garris and P. J. Grother, “Generalized Form Registration Using Structure-Based Techniques,” NIST Internal Report 5726 and in Proceedings of the Fifth Annual Symposium on Document Analysis and Information Retrieval, pp , UNLV, April.
M. D. Garris, “Method and Evaluation of Character Stroke Preservation of Handprint Recognition,” NIST Internal Report 5687, July 1995, and in Proceedings of Document Recognition III, Vol. 2660, pp , SPIE, San Jose, February.
M. D. Garris, “Component-Based Handprint Segmentation Using Adaptive Writing Style Model,” NIST Internal Report 5843, June 1996.
W. Postl, “Method for Automatic Correction of Character Skew in the Acquisition of a Text Original in the Form of Digital Scan Results,” United States Patent Number 4,723,297, February 1988.

Graphology

Graphology
Graphology, or handwriting analysis, is a science of interpreting a person's character from his/her personal handwriting.
– Handwriting analysis can tell a lot about personality.
– Large companies use graphology to check job applications.
– The police still use handwriting experts to determine who wrote what.

Graphology
Coupled with psychology and knowledge of human behavior, graphology can be used for recruitment, marriage compatibility, career guidance, child development, etc. A signature is easy to collect, but it is not the best way to analyze character from handwriting, for several reasons:
– Firstly, the signature is sometimes illegible and different from the person's normal handwriting.
– Secondly, the signature alone does not give enough words and letters to help the graphologist in his/her judgment.

Graphology
Graphology can analyze a person's personality without the person knowing that he/she is being analyzed. Results may be distorted when the subject knows that he/she is being analyzed, which is common in most question-and-answer personality analysis tools. This behavior analysis is more accurate than putting the subject under the unnatural stress of a long questionnaire. A single personality questionnaire may reveal a single dimension of personality, but two handwriting samples from the same person on different occasions may reveal different behavior characteristics.

What is Graphoanalysis?
Graphoanalysis is a scientific system of identifying and assessing the character and personality of an individual through a study of handwriting. The techniques used are based on a well-defined, standardized method of:
– identifying strokes,
– relating these strokes to specific personality traits, and
– evaluating the relative strength of the interrelated traits.

The Hidden Meaning of Handwriting
Handwriting analysis gives us access to inside secrets about the hidden meaning of handwriting. If you received a note like the one below:
– What would you be able to tell about the writer? The words are friendly enough!
– If you look a little closer, you will see that there are a number of conflicting signs.
– Can you see the danger signals? There are red flags popping up all over the place.

Graphology
An understanding of handwriting analysis will quickly show you that this writer is dogmatic and dictatorial, emotionally unstable, bad tempered and possibly even violent! It's even possible to find the underlying reasons for the dangerous signals and to understand why the writer has so much personal conflict in his life. You'll be surprised to discover how much detail can be extracted from just a single page of handwriting. Of course, to be able to read these signs yourself, you will need to have some understanding of handwriting analysis first.

Handwriting Guidelines
Get a sheet of paper, sign your name, and start to analyze with these guidelines:
– Writing that leans to the right shows good communication skills.
– Straight, vertical writing shows independence and stability.
– Writing that leans to the left may mean you find it difficult to communicate.
– Small writing can be a sign of modesty.
– Large writing indicates enthusiasm and generosity.

Exercise (15 minutes)
Get a sheet of paper and write the following sentences:
– Go west young man and grow up with the country
– It is Valentine's Day today, so I will give her …
Give it to a member of your group to analyze, based on the guidelines.

Lab Exercise
Go to the following site and do an online handwriting analysis:
– ex.html