Probabilistic Artificial Neural Network for Recognizing the Arabic Hand Written Characters. Khalaf Khatatneh, Ibrahiem El Emary, and Basem Al-Rifai. Journal.


1 Probabilistic Artificial Neural Network for Recognizing the Arabic Hand Written Characters
Khalaf Khatatneh, Ibrahiem El Emary, and Basem Al-Rifai, Journal of Computer Science, 2006
Presented by: Rahma A. Al-Zahrani and Nailah S. Al-Hassoun
Supervised by: Prof. Mohamed Batouche
King Saud University, College of Computer and Information Science, Computer Science Department
CSC 563 Neural Network

2 Objectives
– Survey the various techniques used to recognize handwritten Arabic characters.
– Describe Optical Character Recognition (OCR).
– Present a new technique that assists in developing a recognition system for Arabic handwritten text (AHOCR).

3 Outline
– Introduction
– Definition
– History
– Optical Character Recognition (OCR) system
– Proposed method (AHOCR): "Arabic Hand Written Optical Character Recognition"
– Moment Invariants
– Probabilistic Neural Network
– Experimental Results
– Conclusion
– References

4 Introduction
– Machine simulation of human reading has been the subject of intensive research.
– A large body of research is concerned with Latin, Chinese and Japanese characters.
– Little work has been conducted on the automatic recognition of Arabic characters because of the complexity of Arabic text.
– We first present the state of Arabic character recognition research, then the proposed method.

5 Definition
The task of recognizing characters can be separated into two categories:
1. Recognition of machine-printed data: uniform (size and position are fixed for any given font), and therefore simpler.
2. Recognition of handwritten data: non-uniform (styles and sizes differ between writers and even for the same writer), and therefore difficult.

6 Definition cont.
There are two kinds of input for character recognition.
Off-line character recognition:
1. Takes a raster image from a scanner, digital camera or other digital input source.
2. Processing happens after the writing of the characters is complete and the scanned image has been preprocessed.
3. Its knowledge is limited to whether a given pixel is on (1) or off (0).
On-line character recognition:
1. Takes (x, y) coordinate pairs from an electronic pen touching a pressure-sensitive digital tablet.
2. Processing happens in real time while the writing is taking place.
3. Relationships between pixels and strokes are available because of the implicit sequencing of on-line systems, which can assist the recognition task.
Example (figure): off-line and on-line handwriting inputs.
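The difference between the two input kinds can be illustrated with a minimal sketch. The function name `rasterize_stroke` and the toy stroke are hypothetical and not part of the presented system. The sketch turns an ordered on-line stroke into an off-line bitmap and shows that the writing order is lost in the process.

```python
def rasterize_stroke(points, height, width):
    """Turn ordered (x, y) pen samples into a binary raster (1 = ink, 0 = background)."""
    image = [[0] * width for _ in range(height)]
    for x, y in points:
        image[y][x] = 1  # row index is y, column index is x
    return image


# On-line input: the order of the samples encodes how the stroke was written.
stroke = [(1, 1), (2, 1), (3, 1), (3, 2), (3, 3)]

# Off-line input: only on/off pixels remain after rasterization.
offline_image = rasterize_stroke(stroke, height=5, width=5)

# Writing the same stroke backwards produces an identical bitmap,
# so the sequencing information is not recoverable from the image alone.
reversed_image = rasterize_stroke(stroke[::-1], height=5, width=5)
```

An off-line recognizer therefore has to work from shape alone, while an on-line recognizer can also use stroke order and direction.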

7 History
(1990) Hierarchical rule-based approach. The authors (Sheik and Al Taweel) assumed:
– A reliable segmentation stage that divides letters into four groups by position: 1. initial, 2. medial, 3. final, 4. isolated.
– A recognition system based on a hierarchical division by the number of strokes (one-stroke letters were classified separately from two-stroke letters, etc.).

8 History Cont.
(1990) Segmented structural analysis approach. The authors (Al-Emami and Usher) presented:
– An on-line system to recognize handwritten Arabic words.
– A structural analysis method for selecting features of Arabic characters.
– Classification with a decision tree.
– Words are segmented into primitives that are usually smaller than characters.
– The system is fed with specifications of the primitives of each character.
– The system was trained on 10 writers and tested on one writer, with a recognition rate of 86%.

9 History Cont.
(1995) Template matching and dynamic programming approach. The authors (Alimi and Ghorbel) showed:
– How to minimize error in an off-line recognition system for isolated Arabic characters using template matching and dynamic programming.
– A reference bank of prototypes was prepared.
– When new data was presented to the system, the distance between the prototype and the new data string was minimized using dynamic programming.
– The number of prototypes was varied to see the effect on recognition rates. More prototypes gave better accuracy.
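The general idea of matching new data against a bank of prototypes with dynamic programming can be sketched as follows. The slide does not state which distance Alimi and Ghorbel used, so dynamic time warping over numeric feature sequences is used here purely as a stand-in. The names `dtw_distance` and `classify` and the toy prototypes are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic-programming alignment cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(sample, prototypes):
    """Return the label of the prototype with the smallest alignment cost.

    prototypes is a list of (label, sequence) pairs, i.e. the reference bank.
    """
    return min(prototypes, key=lambda entry: dtw_distance(sample, entry[1]))[0]
```

Adding more prototypes per class gives the matcher more writing variants to align against, which is consistent with the trend reported on the slide.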

10 History Cont.
(1998) Structural and fuzzy approach. The authors (Amin and Bouslama) presented a hybrid system that combines structural and fuzzy techniques:
– Structural analysis: separates the various letter classes to be recognized.
– Fuzzy logic: allows for variability in people's handwriting within the same class.

11 History Cont.
(2005) Artificial neural network classifiers. The authors (Haraty and El-Zabadani) presented:
– A system for recognition of handwritten Arabic text using neural networks.
– Their work builds upon previous work that dealt with the vertical segmentation of the written text.
– They faced problems such as overlapping characters that share the same vertical space, and tried to fix them by performing horizontal segmentation.
– The system was tested and the recognition rate obtained was 90%.

12 Optical Character Recognition system (OCR)
Optical Character Recognition (OCR): a machine reads machine-printed or handwritten characters and tries to determine which character from a fixed set each one is intended to represent.
The goal is to classify optical patterns (often contained in a digital image) corresponding to alphanumeric or other characters.
The process of OCR involves several steps, including:
1. segmentation,
2. feature extraction,
3. classification.

13 OCR: Pre-processing
Pre-processing receives a first binary image of characters. It is primarily used to reduce variations of characters. The pre-processing steps often performed in OCR are:
– Scanning,
– Binarization,
– Noise removal,
– Segmentation,
– Normalization.
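As an illustration of the binarization step, here is a minimal sketch that assumes a grayscale image stored as a list of rows with values from 0 (black) to 255 (white). The fixed threshold of 128 and the function name `binarize` are assumptions for illustration. The slides do not say how the threshold is chosen.

```python
def binarize(gray, threshold=128):
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Toy 2 x 3 grayscale patch: dark ink on the left, light paper on the right.
patch = [
    [10, 40, 250],
    [20, 200, 255],
]
binary_patch = binarize(patch)
```

Real scans usually need a threshold adapted to the page, since a fixed value only works when ink and paper are well separated.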

14 Feature Extractor
Feature extraction abstracts high-level information about individual patterns to facilitate recognition.

15 Recognizer
There are three categories of character recognizers (classifiers):
– Neural network approach,
– Statistical approach,
– Structural approach.

16 Classification Process
There are two steps in building a classifier:
1. Training.
2. Testing.

17 Post-processor
The post-processor is designed to improve the accuracy of the recognition process. It uses various techniques to distinguish one character from another. For example, the geometry of an image can be used to distinguish the characters it represents. Geometric features include:
– Loops, such as those that appear in the handwritten letters ﻩ، و، ق،ف.
– Straight lines, which appear in handwritten letters such as أ.
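One way to turn the loop feature into a number is to count the background regions that are fully enclosed by ink. The sketch below is an illustration only and is not taken from the presented system. It assumes a binary image stored as a list of rows (1 = ink), and the name `count_loops` is hypothetical.

```python
from collections import deque


def count_loops(image):
    """Count enclosed background regions (holes) in a binary image (1 = ink)."""
    height, width = len(image), len(image[0])
    # Pad with a background border so all outside background forms one region.
    padded = ([[0] * (width + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (width + 2)])
    seen = [[False] * (width + 2) for _ in range(height + 2)]

    def flood(start_row, start_col):
        """Mark every background pixel 4-connected to the start pixel."""
        queue = deque([(start_row, start_col)])
        seen[start_row][start_col] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < height + 2 and 0 <= nc < width + 2
                        and not seen[nr][nc] and padded[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    flood(0, 0)  # the outside background is not a loop
    loops = 0
    for r in range(height + 2):
        for c in range(width + 2):
            if padded[r][c] == 0 and not seen[r][c]:
                loops += 1  # unvisited background must be enclosed by ink
                flood(r, c)
    return loops
```

A letter shape with one closed loop would give 1, while a plain vertical stroke would give 0, so the count can help separate characters that a classifier confuses.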

18 Proposed Method (AHOCR)
Arabic Hand Written Optical Character Recognition (AHOCR) aims to convert image documents to text:
– The document is moved from a storage location to a digital scanner.
– The output is an image in bmp, jpg or jpeg format.
– The image is passed to the recognition system.

19 (Flowchart of the AHOCR system)
Input: image document
Preprocess: binarization, noise removal, segmentation, normalization (thinning, size normalization, slant correction)
Character feature extraction: feature extraction (moment invariants), recognizer (probabilistic neural network)
Output: text document

20 The main components of AHOCR
Preprocess:
– This stage consists of several phases: binarization, noise removal, segmentation and normalization.
Character feature extraction and recognition:
– Geometric moment invariants as features, and a probabilistic neural network as the classifier.

21 (Processing steps of AHOCR)
1. Scan the handwritten document at 300 dpi to create an image document (bmp or jpg file format).
2. Binarization: convert the color image to a black-and-white (binary) image.
3. Noise removal using median filtering and a Wiener filter.
4. Segment the image document so that each character is stored as a single bmp image file.
5. Thinning.
6. Size normalization and slant correction.
7. Calculate the seven moment invariants for each segmented image (handwritten Arabic character) and store them in a moment text file.
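The median filtering part of the noise removal step can be sketched as follows. This is an illustration under assumptions: a 3 x 3 window, a window that is simply truncated at the image border, and an image stored as a list of rows. The slides give no window size, and the Wiener filter is not shown here.

```python
def median_filter_3x3(image):
    """Replace each pixel by the median of its 3 x 3 neighbourhood."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            # Collect the neighbourhood, clipped at the image border.
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(height, r + 2))
                      for cc in range(max(0, c - 1), min(width, c + 2))]
            window.sort()
            # For an even-sized border window this takes the upper median.
            result[r][c] = window[len(window) // 2]
    return result


# A single speck of noise on a clean background is removed.
noisy = [[0] * 5 for _ in range(5)]
noisy[2][2] = 1
cleaned = median_filter_3x3(noisy)
```

The median is a common choice for scanned text because it removes isolated specks without blurring stroke edges as much as an averaging filter would.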

22 Moment Invariants
Moment invariants are properties of connected regions in binary images that are invariant to translation, rotation and scale. They are useful because they define a simply calculated set of region properties that can be used for shape classification and part recognition.
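The slides do not write out the seven invariants. The sketch below assumes they are Hu's classical set, computed from normalized central moments of a binary image (1 = ink). The function name `hu_moments` is hypothetical.

```python
def hu_moments(image):
    """Return Hu's seven moment invariants of a binary image (list of rows)."""
    pixels = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    m00 = float(len(pixels))                # area of the region
    xc = sum(x for x, _ in pixels) / m00    # centroid gives translation invariance
    yc = sum(y for _, y in pixels) / m00

    def eta(p, q):
        """Normalized central moment: mu_pq / m00 ** (1 + (p + q) / 2)."""
        mu = sum((x - xc) ** p * (y - yc) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

On a pixel grid the values are exactly preserved under translation and 90-degree rotation, and only approximately under scaling and other rotations because of resampling. The seven numbers form the feature vector that would be passed to the classifier.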

23 Probabilistic Neural Network
A Probabilistic Neural Network (PNN) has three layers of nodes:
– The input layer contains N nodes, one for each of the N input features of a feature vector.
– The hidden layer contains a node for each training vector. The hidden nodes are collected into groups, one group for each of the K classes.
– The output layer has a node for each class that is recognized by the PNN.
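The three-layer structure described above can be sketched in a few lines. This is a minimal illustration, assuming Gaussian kernels in the hidden layer and a smoothing width `sigma`. Neither is stated in the slides, and the class name and the value 0.5 are hypothetical.

```python
import math


class ProbabilisticNeuralNetwork:
    """Minimal PNN: one hidden node per training vector, grouped by class."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma   # kernel width (assumed, not taken from the slides)
        self.groups = {}     # class label -> list of stored training vectors

    def fit(self, vectors, labels):
        """Training only stores the vectors in their class group."""
        self.groups = {}
        for vector, label in zip(vectors, labels):
            self.groups.setdefault(label, []).append(list(vector))
        return self

    def class_scores(self, x):
        """Output layer: average Gaussian kernel response of each class group."""
        scores = {}
        for label, patterns in self.groups.items():
            total = 0.0
            for pattern in patterns:
                squared = sum((xi - pi) ** 2 for xi, pi in zip(x, pattern))
                total += math.exp(-squared / (2 * self.sigma ** 2))
            scores[label] = total / len(patterns)
        return scores

    def predict(self, x):
        """Pick the class whose output node has the largest score."""
        scores = self.class_scores(x)
        return max(scores, key=scores.get)


# Toy usage with 2 features; the presented system uses 7 moment invariants.
pnn = ProbabilisticNeuralNetwork(sigma=0.5).fit(
    [[0, 0], [0, 1], [5, 5], [5, 6]], ["a", "a", "b", "b"])
label = pnn.predict([0.2, 0.4])
```

Because training only stores vectors, the hidden layer grows with the training set, which is the main cost of this design.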

24 Probabilistic Neural Network

25 Experimental Results
Documents from 25 independent writers were collected and then processed. The experiments were done on 3 disjoint data sets:
1. Training (37800) = 20 volunteers x 5 iterations x 378 characters.
2. Validation (3780) = 10 volunteers x 378 characters.
3. Test (764): 5 volunteers with a different number of characters in each document.
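The set sizes quoted above follow directly from the stated factors, as this short check shows. The variable names are illustrative.

```python
characters_per_document = 378

training_size = 20 * 5 * characters_per_document   # 20 volunteers x 5 iterations
validation_size = 10 * characters_per_document     # 10 volunteers
test_size = 764                                    # given directly on the slide

total_samples = training_size + validation_size + test_size
```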

26 Experimental Results
– The baseline for recognition accuracy was defined as the average accuracy on the validation and test sets of the best PNN architecture.
– The probabilistic neural network (PNN) has 7 nodes in its input layer and 142 nodes in its output layer.

27 Experimental Results
After running our algorithm with a learning rate of η = 0.9:
Column headings on the slide: average accuracy rate, test recognition accuracy rate, validation recognition accuracy rate, train recognition accuracy rate.
Values on the slide: 97%, 98%, 99%.

28 Conclusion
– Arabic handwriting recognition is a difficult problem, but the AHOCR system is a step towards a neural network approach that solves it robustly.
– AHOCR is optical character recognition (OCR) software for handwritten Arabic, capable of producing a fully editable electronic document, with a current accuracy of 97% for isolated Arabic handwritten character recognition and 96% for Arabic handwritten document recognition.

29 References
– Klassen, T., "Towards Neural Network Recognition of Handwritten Arabic Letters," Dalhousie University (2001).
– Abuhaiba, I.S., "A Discrete Arabic Script for Better Automatic Document Understanding," The Arabian J. Science and Eng., vol. 28, pp. 77-94 (2003).
– https://www.dacs.dtic.mil/techs/neural/neural7.php
– www.homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/FISHER/mominv.htm
– www.scipub.org/fulltext/ajas/ajas.html

