Optical Character Recognition. Qurat-ul-Ain (Ainie) Akram, Sarmad Hussain. Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan.

Presentation transcript:

Optical Character Recognition
Qurat-ul-Ain (Ainie) Akram, Sarmad Hussain
Center for Language Engineering, Al-Khawarizmi Institute of Computer Science
University of Engineering and Technology, Lahore, Pakistan
Lecture 8

Syllable String Creation using lookup table

Columns: Syllable String, Main body ID, Diacritics1_ID, ….
Entries: تا 5002 و 501 پتھر
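The lookup step can be sketched as a dictionary keyed by the recognized main body ID and its diacritic IDs. This is a minimal sketch only: the IDs in `LOOKUP`, the function name `create_syllable_string`, and the `"?"` placeholder for unknown combinations are assumptions made for illustration, not the IDs or code used in the course.

```python
# Hypothetical lookup table: (main body ID, diacritic IDs) -> syllable string.
# The IDs are invented for this example and do not come from the slides.
LOOKUP = {
    (10, ()): "و",      # main body with no diacritics
    (20, (1,)): "تا",   # main body plus one diacritic
}


def create_syllable_string(recognized, lookup=LOOKUP, unknown="?"):
    """Map each (main_body_id, diacritic_ids) pair to its syllable string.

    Pairs that are missing from the table are replaced by `unknown`.
    """
    syllables = []
    for main_body_id, diacritic_ids in recognized:
        key = (main_body_id, tuple(diacritic_ids))
        syllables.append(lookup.get(key, unknown))
    return syllables
```

A call such as `create_syllable_string([(20, [1]), (10, [])])` looks up each pair in turn; keeping the diacritic IDs as a tuple makes the key hashable and preserves their order.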

Project Presentation
1. Front Page
   – Optical Character Recognition (in English)
   – Optical Character Recognition (in Your Language)
   – Document Image
   – Output of OCR (Recognized Syllable Strings of OCR)
   – Syllable String Recognition Accuracy (Correctly Recognized Syllables / Total Syllables * 100)
   – Group Members' Names
ISSALE 2014
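The accuracy figure requested on the front page can be computed as below. This is a small sketch that assumes the numerator is the number of correctly recognized syllables; the function name `percent_accuracy` is hypothetical.

```python
def percent_accuracy(correct, total):
    """Return correct / total * 100 as a float.

    Assumes `correct` counts correctly recognized items out of `total`.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total
```

The same function applies to the line, syllable, main body and diacritics accuracy columns, since each is a correct count over a total count.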

1. Preprocessing
   – Line Segmentation
      • Samples of line segmentation
      • Line segmentation accuracy results
      • Samples of incorrect line segmentation
   – Syllable/Ligature Segmentation
      • Samples of Syllable/Ligature segmentation
      • Syllable/Ligature Segmentation Accuracy Results
      • Samples of incorrect Syllable/Ligature segmentation

Total Lines | Correct Lines | Incorrect Lines | % Accuracy
Total Syllables | Correct Syllables | Incorrect Syllables | % Accuracy
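The slides do not say which line segmentation method to use. One common choice, shown here only as an assumed example, is a horizontal projection profile: rows that contain ink belong to a text line, and blank rows separate lines. The function name `segment_lines` and the 0/1 list-of-rows image format are assumptions for this sketch.

```python
def segment_lines(image):
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    `image` is a list of rows; each row is a list of 0 (background)
    and 1 (ink) values. A line is a maximal run of rows containing ink.
    """
    lines = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines
```

In scripts where diacritics sit well above or below the main bodies, a blank row can appear inside a single text line, so a practical version would likely merge bands that are separated by very small gaps.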

Pre-processing
– Main body and diacritics disambiguation

Total main bodies | Correctly classified as main bodies | % Accuracy
Total diacritics | Correctly classified as diacritics | % Accuracy
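The slides leave the disambiguation method open. As one possible starting point, and purely as an assumption, connected components can be separated by size: components much smaller than the largest one in a syllable are treated as diacritics. The function name `classify_components` and the `ratio` threshold of 0.3 are hypothetical and would need tuning on real data.

```python
def classify_components(areas, ratio=0.3):
    """Label each component 'main_body' or 'diacritic' by relative area.

    `areas` holds the pixel area of each connected component in one
    syllable. A component is a main body if its area is at least
    `ratio` times the largest area; otherwise it is a diacritic.
    """
    if not areas:
        return []
    threshold = ratio * max(areas)
    return ["main_body" if a >= threshold else "diacritic" for a in areas]
```

A size rule alone will misclassify small main bodies, so position relative to the baseline is a natural additional feature.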

Classification and Recognition
– Data Description
   • 15 Main body Types (DataSet-1)
      – Training Data (35 Tokens)
      – Testing Data (15 Tokens)
      – Image samples
   • Document Images (DataSet-2)
      – Testing Data
         » X Tokens of Y main body Types
         » X Tokens of Y diacritics Types
         » Image sample

Main body Type | Total tokens in document images | Total unique syllables in document images
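The 35/15 division of DataSet-1 can be sketched per main body type as follows. The function name `split_tokens`, the fixed seed, and the choice to shuffle before splitting are assumptions; the slides only give the two token counts.

```python
import random


def split_tokens(tokens, n_train=35, n_test=15, seed=0):
    """Split the tokens of one main body type into training and testing sets.

    Shuffles a copy with a fixed seed so the split is reproducible.
    """
    if len(tokens) < n_train + n_test:
        raise ValueError("not enough tokens for the requested split")
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test
```

Running this once per main body type keeps every class represented by the same number of training and testing tokens.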

Classification and recognition results
– Recognition Results on DataSet-1 using Decision Trees
   • Main body recognition accuracy
   • Diacritics recognition accuracy
– Recognition Results on DataSet-1 using Tesseract
   • Main body recognition accuracy
   • Diacritics recognition accuracy

Class Type | Total Samples | Test data (15 Tokens) | Correctly Recognized | % Accuracy
Class Type | Total Samples | Test data (15 Tokens) | Correctly Recognized | % Accuracy
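The per-class rows of the results table can be filled from the true and predicted labels, whichever recognizer produced them. This is a sketch under assumptions: the function name `per_class_accuracy` and the class labels in the usage note are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: (total, correct, percent_accuracy)} per true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {
        cls: (totals[cls], correct[cls], 100.0 * correct[cls] / totals[cls])
        for cls in totals
    }
```

For example, with true labels `["MB1", "MB1", "MB2", "MB2"]` and predictions `["MB1", "MB2", "MB2", "MB2"]`, the entry for `MB1` reports 2 samples with 1 correct, and `MB2` reports 2 samples with 2 correct.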

Classification and recognition results
– Recognition Results on DataSet-2 using Decision Trees
   • Main body recognition accuracy
   • Diacritics recognition accuracy
OR
– Recognition Results on DataSet-2 using Tesseract
   • Main body recognition accuracy
   • Diacritics recognition accuracy

Class Type | Total Samples | Correctly Recognized | % Accuracy
Class Type | Total Samples | Correctly Recognized | % Accuracy

Post-processing
– Syllable String Creation
– Syllable String Recognition Accuracy

Columns: Syllable String, Main body ID, Diacritics1_ID, ….
Entries: تا 5002 و 501

Syllable Type | Total Samples | Correctly Recognized | % Accuracy

Output of OCR
Input Document Image
OCR Output

Deliverables to submit
1. Presentation slides
2. OCR Complete Code
   1. Line segmentation
   2. Syllable segmentation
   3. Recognition of diacritics and main bodies
   4. Syllable string creation using lookup Table
   5. Output.txt file generation
3. Data Set-1
4. Data Set-2
5. Tesseract Traineddata file
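The Output.txt generation step in the code deliverable could look like the sketch below. The function name `write_output`, the space separator between syllables, and the one-text-line-per-file-line layout are assumptions; the slides only name the file. UTF-8 is used so that non-Latin syllable strings are written correctly.

```python
from pathlib import Path


def write_output(lines, path="Output.txt"):
    """Write recognized syllable strings to a UTF-8 text file.

    `lines` is a list of text lines, each a list of syllable strings.
    Syllables are joined with spaces (an assumed layout), one text
    line per file line.
    """
    text = "\n".join(" ".join(syllables) for syllables in lines)
    Path(path).write_text(text + "\n", encoding="utf-8")
    return path
```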

Good Luck

Document Image Creation

Syllable_of_MB1_Samples_1 Syllable_of_MB2_Samples_1 Syllable_of_MB2_Samples_1 Syllable_of_MB3_Samples_1 Syllable_of_MB4_Samples_1 Syllable_of_MB5_Samples_1 … Syllable_of_MB15_Samples_1
Syllable_of_MB1_Samples_2 Syllable_of_MB2_Samples_2 Syllable_of_MB2_Samples_2 Syllable_of_MB3_Samples_2 Syllable_of_MB4_Samples_2 Syllable_of_MB5_Samples_2 … Syllable_of_MB15_Samples_2
Syllable_of_MB1_Samples_3 Syllable_of_MB2_Samples_3 Syllable_of_MB2_Samples_3 Syllable_of_MB3_Samples_3 Syllable_of_MB4_Samples_3 Syllable_of_MB5_Samples_3 … Syllable_of_MB15_Samples_3
Syllable_of_MB1_Samples_4 Syllable_of_MB2_Samples_4 Syllable_of_MB2_Samples_4 Syllable_of_MB3_Samples_4 Syllable_of_MB4_Samples_4 Syllable_of_MB5_Samples_4 … Syllable_of_MB15_Samples_4
…
Syllable_of_MB1_Samples_15 Syllable_of_MB2_Samples_15 Syllable_of_MB2_Samples_15 Syllable_of_MB3_Samples_15 Syllable_of_MB4_Samples_15 Syllable_of_MB5_Samples_15 … Syllable_of_MB15_Samples_15

Syllable = MB + Diacritics or Syllable = MB
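The grid above can be generated programmatically when planning the document image. This sketch assumes one row per sample number and one cell per main body type, each appearing once per row (the slide lists MB2 twice in every row, which is not reproduced here). The function name `document_layout` is hypothetical.

```python
def document_layout(n_main_bodies=15, n_samples=15):
    """Return the grid of syllable labels for the document image.

    Row s (1-based) holds sample s of a syllable for each main body
    type MB1..MB<n_main_bodies>, in order.
    """
    return [
        [f"Syllable_of_MB{mb}_Samples_{s}" for mb in range(1, n_main_bodies + 1)]
        for s in range(1, n_samples + 1)
    ]
```

With the default arguments the grid has 15 rows of 15 labels, running from `Syllable_of_MB1_Samples_1` to `Syllable_of_MB15_Samples_15`.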

Examples of Document Image