Toward Automatic Processing and Indexing of Microfilm.


Microfilm Processing Images are scanned from ribbons of microfilm. Each image on the microfilm ribbon is then cropped and de-skewed.

Microfilm Processing Cropped and De-skewed Image

Image Zoning Lines in a document emit a unique signature. The algorithm searches for these patterns to detect the lines that describe a table.

Automatically Identifies Table Structure.
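
The slides do not say what the line signature is. One common way to look for ruling lines, shown here only as an illustration, is a projection profile: a row or column made up almost entirely of dark pixels is probably a rule. The sketch assumes a binarized image stored as a list of rows of 0/1 values; the function names and the `min_fill` threshold are hypothetical.

```python
def find_horizontal_lines(image, min_fill=0.8):
    """Return (start_row, end_row) spans whose rows are mostly dark pixels.

    image: list of equal-length rows, 1 = dark pixel, 0 = background.
    min_fill: assumed fraction of dark pixels needed to call a row a line.
    """
    spans = []
    start = None
    for y, row in enumerate(image):
        is_line = sum(row) >= min_fill * len(row)
        if is_line and start is None:
            start = y                      # a run of line rows begins
        elif not is_line and start is not None:
            spans.append((start, y - 1))   # the run ended on the previous row
            start = None
    if start is not None:                  # the run reaches the bottom edge
        spans.append((start, len(image) - 1))
    return spans


def find_vertical_lines(image, min_fill=0.8):
    """Apply the same test to columns by transposing the image."""
    transposed = [list(col) for col in zip(*image)]
    return find_horizontal_lines(transposed, min_fill)
```

The horizontal and vertical spans together bound the cells of a table, which is the kind of structure the zoning step is described as finding.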

Optical Character Recognition A neural net evaluates each zone in the image. The neural net converts the printed characters in each zone into ASCII text.

Optical Character Recognition Automatically Converts Printed Text to ASCII.

Column-Row Recognition The algorithm uses the geometry of each zone to identify the table’s columns and rows. The algorithm associates each column and row label with its values in the table.

Column-Row Recognition
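
As a rough illustration of using zone geometry, the sketch below groups zones whose left edges nearly coincide into columns and zones whose top edges nearly coincide into rows. It assumes each zone is a tuple `(x, y, width, height)` and that row 0 holds the column labels; the function names and the `tolerance` value are hypothetical, and the presentation's actual method may differ.

```python
def cluster_positions(values, tolerance):
    """Group sorted coordinates lying within `tolerance` of their group's first value."""
    groups = []
    for v in sorted(set(values)):
        if groups and v - groups[-1][0] <= tolerance:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def assign_grid(zones, tolerance=5):
    """Map each zone (x, y, width, height) to a (row, column) index pair."""
    col_groups = cluster_positions([z[0] for z in zones], tolerance)
    row_groups = cluster_positions([z[1] for z in zones], tolerance)
    col_of = {x: i for i, group in enumerate(col_groups) for x in group}
    row_of = {y: i for i, group in enumerate(row_groups) for y in group}
    return {z: (row_of[z[1]], col_of[z[0]]) for z in zones}


def values_by_label(grid, text):
    """Collect the values under each column label, assuming row 0 holds the labels."""
    labels = {col: text[z] for z, (row, col) in grid.items() if row == 0}
    table = {label: [] for label in labels.values()}
    for z, (row, col) in sorted(grid.items(), key=lambda item: item[1]):
        if row > 0:
            table[labels[col]].append(text[z])
    return table
```

Here `text` maps each zone to its OCR output, so `values_by_label` ties every label to the values printed beneath it.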

Identify Labels The algorithm maps the printed text of each label to a standardized name. The standardized names correspond to the fields in a database.

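Identify Labels The printed label “ROAD, STREET, &c., And No. or NAME of HOUSE” maps to the standardized name Address.

Identify Labels The printed label “NAME and Surname of each Person” maps to the standardized name Full Name.

Identify Labels The printed label “RELATION to Head of Family” maps to the standardized name Relationship.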
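
A minimal sketch of such a mapping, assuming a simple lookup table keyed on normalized label text. The three entries are the ones shown on the slides; the table name, the normalization rule, and the function names are hypothetical.

```python
import re

# Assumed lookup table: normalized label text -> database field name.
# Only the three labels from the slides are listed here.
STANDARD_NAMES = {
    "road street c and no or name of house": "Address",
    "name and surname of each person": "Full Name",
    "relation to head of family": "Relationship",
}


def normalize(label):
    """Lowercase the label and collapse punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


def standard_name(label):
    """Return the standardized field name, or None for an unknown label."""
    return STANDARD_NAMES.get(normalize(label))
```

Normalizing before the lookup makes the match tolerant of differences in case, spacing, and punctuation, but not of OCR misreadings of the letters themselves.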

Extract Data The algorithm identifies factored table values. The algorithm stores each record in an XML file.

Extract Data Fields: Address, Full Name, Relationship. Address: Collafer. * Extracted by hand.

Extract Data Address: Collafer. Full Name: John Eyres. Relationship: Head. * Extracted by hand.

Extract Data Address: Collafer. Full Name: Annie Eyres. Relationship: Wife. * Extracted by hand.

Extract Data Address: Collafer. Full Name: Lehailes Eyre. Relationship: Son. * Extracted by hand.
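
The sketch below illustrates the two steps, assuming that a “factored” value is one written once in the table and shared by the rows that follow, as the address Collafer is in the example. The XML element names are hypothetical and are not the authors' schema.

```python
import xml.etree.ElementTree as ET


def fill_factored(rows, field):
    """Copy the last non-empty value of `field` into rows that leave it blank."""
    last = None
    filled = []
    for row in rows:
        row = dict(row)                    # avoid mutating the caller's data
        if row.get(field):
            last = row[field]
        elif last is not None:
            row[field] = last
        filled.append(row)
    return filled


def record_to_xml(row):
    """Serialize one record as an XML string (hypothetical element names)."""
    record = ET.Element("record")
    for field, value in row.items():
        child = ET.SubElement(record, field.replace(" ", ""))
        child.text = value
    return ET.tostring(record, encoding="unicode")
```

Applied to the rows above, `fill_factored(rows, "Address")` gives every household member the address Collafer before each record is serialized.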

Microfilm Queries A web form provides the interface to query the microfilm database. Individuals can enter keywords (such as a first and last name), and the system locates appropriate records in the indexed microfilm documents.

Web Query Example keywords: John, Eyre.
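
A keyword lookup of this kind can be sketched as follows, assuming each record is a dictionary of field names to text values. The function name and the substring-matching rule are assumptions; a real system would more likely use a database or full-text index.

```python
def search(records, *keywords):
    """Return records whose field values contain every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for record in records:
        text = " ".join(record.values()).lower()
        if all(k in text for k in wanted):
            hits.append(record)
    return hits
```

Because matching is by substring, the keyword Eyre also matches the surname Eyres.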

Search Results The system returns the indexed images that contain the results. Since the database indexes both the text and geometry of the document, the process can return just the relevant regions of the microfilm image.

Search Results Click an image to select a result document.

Search Results Relevant region of the document is displayed.
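
Returning only the relevant region amounts to cropping the image to the bounding box stored with the matching record. The sketch assumes the image is a list of pixel rows and the box is `(x, y, width, height)`; the function name and the `margin` parameter are hypothetical.

```python
def crop_region(image, box, margin=0):
    """Return the sub-image covered by box = (x, y, width, height).

    image: list of rows of pixel values.
    margin: optional padding in pixels around the box.
    The result is clamped to the image edges.
    """
    x, y, w, h = box
    top = max(y - margin, 0)
    left = max(x - margin, 0)
    bottom = min(y + h + margin, len(image))
    right = min(x + w + margin, len(image[0]) if image else 0)
    return [row[left:right] for row in image[top:bottom]]
```

A small margin keeps some surrounding context, such as neighboring cells, visible around the matched entry.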

Just-In-Time Browsing To make the query results display quickly, the system uses Just-In-Time Browsing. Just-In-Time Browsing will allow people to browse digitized microfilm and other large collections of images over the Internet at interactive rates.

Just-In-Time Browsing Small versions of each image will allow rapid browsing of the collection as a whole.

Just-In-Time Browsing People will be able to “Zoom In” on individual images and parts of images as necessary.
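
The slides do not describe how Just-In-Time Browsing is implemented. One widely used way to serve small versions first and finer detail on zoom is a resolution pyramid, in which each level halves the previous one. The sketch below only computes the level sizes, under that assumption; the function name and the `thumb_max` parameter are hypothetical.

```python
def pyramid_levels(width, height, thumb_max=128):
    """List (width, height) per level, halving until the longer side fits thumb_max."""
    if thumb_max < 1:
        raise ValueError("thumb_max must be at least 1")
    levels = [(width, height)]
    while max(levels[-1]) > thumb_max:
        w, h = levels[-1]
        levels.append((max(w // 2, 1), max(h // 2, 1)))
    return levels
```

A viewer built this way would show the smallest level while browsing and request larger levels only for the image, or the part of it, that the user zooms into.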