iLayout: Performance Evaluation


Dataset Sample Images

Architecture
Traditional OCR: Document image → OCR → Output text
OCR using i-Layout: Document image → Page segmentation by i-Layout → OCR → Combining output from blocks → Output text
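The two pipelines on the Architecture slide can be sketched roughly as follows. This is only an illustration: the `segment` and `ocr` callables, the `Block` tuple format, and the list-of-rows image type are all hypothetical stand-ins, not the actual i-Layout or OCR interfaces, and the blocks are assumed to arrive already in reading order.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types, chosen only for this sketch.
Block = Tuple[int, int, int, int]      # (x0, y0, x1, y1) in pixels
Image = Sequence[Sequence[int]]        # rows of pixel values


def crop(image: Image, block: Block) -> List[Sequence[int]]:
    """Cut the rectangular block out of the page image."""
    x0, y0, x1, y1 = block
    return [row[x0:x1] for row in image[y0:y1]]


def traditional_ocr(image: Image, ocr: Callable[[Image], str]) -> str:
    """Traditional pipeline: run OCR on the whole page at once."""
    return ocr(image)


def ocr_with_layout(
    image: Image,
    segment: Callable[[Image], List[Block]],
    ocr: Callable[[Image], str],
) -> str:
    """Layout-aware pipeline: segment the page, OCR each block, combine.

    `segment` stands in for the page segmentation step and is assumed to
    return blocks in reading order.
    """
    texts = [ocr(crop(image, block)) for block in segment(image)]
    return "\n".join(text for text in texts if text)
```

Passing the segmenter and the OCR engine in as plain callables keeps the sketch independent of any particular library.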

Evaluation Metrics
Goal: Systematic and detailed analysis of layout performance at every stage
- Intersection/Union based
- Error wise (quantitative and qualitative analysis)
- Reading order based penalty
- Goal oriented (OCR accuracy)

Intersection/Union based measure  
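The formula on this slide did not survive in the transcript. As a reference point only, the usual intersection-over-union measure between a ground-truth region and a detected region can be sketched as below. Axis-aligned rectangular regions and the `Box` type are assumptions made for this sketch; the slide may define the measure differently (for example over pixel masks).

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle; (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: int
    y0: int
    x1: int
    y1: int


def area(b: Box) -> int:
    """Area of a box; degenerate boxes count as zero."""
    return max(0, b.x1 - b.x0) * max(0, b.y1 - b.y0)


def intersection(a: Box, b: Box) -> int:
    """Area of the overlap between two boxes (zero if they are disjoint)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, w) * max(0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union, in [0, 1]."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

A value of 1 means the detected block coincides with the ground-truth block, and 0 means they do not overlap at all.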

Error based evaluation
Can we quantify each error individually?

Error based performance evaluation for i-Layout on 100 pages (Telugu book)

Evaluation Measure          Score
Over-segmentation score     0.0984
Under-segmentation score    0.2923
False alarm score           0.5640
Missing score               0.0045
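The slides do not preserve how the four error scores are computed. One common way to separate these error types is to match ground-truth blocks against detected blocks by overlap and count the matches, as in the sketch below. The matching rule, the `min_cover` threshold, and the function names are assumptions made for illustration. The sketch returns raw counts, whereas the scores above are fractional, so the original measures evidently apply some normalization that is not shown here.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed block format


def area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def overlap(a: Box, b: Box) -> int:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def count_errors(gt: List[Box], pred: List[Box], min_cover: float = 0.1) -> Dict[str, int]:
    """Count segmentation errors by type (illustrative rule, not the slides' own).

    Two blocks match when their overlap covers at least `min_cover`
    of the smaller of the two blocks.
    """
    def matched(g: Box, p: Box) -> bool:
        inter = overlap(g, p)
        return inter > 0 and inter >= min_cover * min(area(g), area(p))

    gt_hits = [[p for p in pred if matched(g, p)] for g in gt]
    pred_hits = [[g for g in gt if matched(g, p)] for p in pred]
    return {
        # one ground-truth block split into several detected blocks
        "over_segmentation": sum(len(h) > 1 for h in gt_hits),
        # one detected block merging several ground-truth blocks
        "under_segmentation": sum(len(h) > 1 for h in pred_hits),
        # detected block with no ground-truth counterpart
        "false_alarm": sum(len(h) == 0 for h in pred_hits),
        # ground-truth block that was not detected at all
        "missing": sum(len(h) == 0 for h in gt_hits),
    }
```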

Less Penalty
Such splits do not affect OCR accuracy and should therefore be penalized less.

Goal oriented evaluation
Comparison of OCR accuracy before and after page segmentation using i-Layout

Language  #Pages  OCR Accuracy before iLayout   OCR Accuracy after iLayout
                  Char      Word                Char      Word
Telugu    50      45.98     4.84                70.14     46.97
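The slides do not say how character and word accuracy are computed. A common convention, assumed here, is to take the edit distance between the OCR output and the ground-truth text and report 100 × (1 − distance / reference length), once over characters and once over whitespace-separated words. The function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref: Sequence, hyp: Sequence) -> float:
    """Accuracy in percent, clamped at 0 when errors exceed the reference length."""
    if not ref:
        return 100.0 if not hyp else 0.0
    return max(0.0, 100.0 * (1 - edit_distance(ref, hyp) / len(ref)))


def char_accuracy(ref_text: str, ocr_text: str) -> float:
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text: str, ocr_text: str) -> float:
    return accuracy(ref_text.split(), ocr_text.split())
```

Under this convention a single wrong character costs one character error but a whole word error, which is consistent with word accuracy being much lower than character accuracy in the table above.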

Evaluation on failed pages
Evaluated on 130 images previously reported as failures by CDAC-Noida.

Language  #Pages  OCR Accuracy after iLayout
                  Char      Word
Telugu    100     61.66     14.51
Hindi     30      68.87     46.07

Visual Results [figure panels: Consortium OCR, i-Layout]