Toward Automatic Processing and Indexing of Microfilm.


1 Toward Automatic Processing and Indexing of Microfilm

2 Microfilm Processing Images are scanned from ribbons of microfilm. Each image on the microfilm ribbon is then cropped and de-skewed.

3 Microfilm Processing Cropped and De-skewed Image
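The slides do not say how cropping is done. As a minimal sketch, assuming the scanned frame has already been binarized into a grid of 0s and 1s, cropping can be approximated by cutting the frame down to the bounding box of its dark pixels. The names `bounding_box` and `crop` are hypothetical, and de-skewing would need an additional rotation step that is not covered here.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: 1 = dark pixel, 0 = background


def bounding_box(image: Image) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of all dark pixels; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        raise ValueError("image contains no dark pixels")
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image: Image) -> Image:
    """Cut the image down to the bounding box of its dark pixels."""
    top, left, bottom, right = bounding_box(image)
    return [row[left:right] for row in image[top:bottom]]


# Toy frame: a small dark shape surrounded by background.
frame = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(crop(frame))  # [[1, 1], [1, 0]]
```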

4 Image Zoning Lines in a document emit a unique signature. The algorithm searches for these patterns to detect the lines that describe a table.
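The slide does not define the "signature" of a line. One common reading, assumed here, is a projection profile: a ruled table line shows up as a row or column that is almost entirely dark. The sketch below works on a binarized image; the function names and the `min_fill` threshold are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = dark pixel, 0 = background


def horizontal_rules(image: Image, min_fill: float = 0.8) -> List[int]:
    """Return indices of rows whose share of dark pixels is at least min_fill."""
    width = len(image[0])
    return [r for r, row in enumerate(image) if sum(row) / width >= min_fill]


def vertical_rules(image: Image, min_fill: float = 0.8) -> List[int]:
    """Return indices of columns whose share of dark pixels is at least min_fill."""
    height = len(image)
    return [
        c
        for c in range(len(image[0]))
        if sum(row[c] for row in image) / height >= min_fill
    ]


# Toy page: a 2-cell table drawn with one-pixel rules.
page = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]
print(horizontal_rules(page))  # [0, 3]
print(vertical_rules(page))    # [0, 2, 4]
```

The detected row and column indices bound the zones that later steps operate on.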

5 Automatically Identifies Table Structure.

6 Optical Character Recognition A neural net evaluates each zone in the image. The neural net converts the printed characters in each zone into ASCII text.

7 Optical Character Recognition Automatically Converts Printed Text to ASCII.
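The slides give no detail about the network itself. Purely as an illustration of the final classification step, the sketch below scores a flattened glyph against one linear unit per character and returns the best-scoring ASCII character. The 3x3 templates and the hand-set weights are toy assumptions standing in for a trained network.

```python
from typing import Dict, List

# Hypothetical 3x3 glyph templates, flattened row by row (1 = ink).
TEMPLATES: Dict[str, List[int]] = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "-": [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}

# Toy weights: +1 where the template has ink, -1 elsewhere.
# A trained network would learn its weights from labeled samples instead.
WEIGHTS: Dict[str, List[int]] = {
    ch: [1 if pixel else -1 for pixel in template]
    for ch, template in TEMPLATES.items()
}


def classify_zone(pixels: List[int]) -> str:
    """Return the ASCII character whose linear unit scores highest for this glyph."""
    scores = {
        ch: sum(w * x for w, x in zip(weights, pixels))
        for ch, weights in WEIGHTS.items()
    }
    return max(scores, key=scores.get)


noisy_i = [0, 1, 0, 0, 1, 0, 0, 1, 1]  # an "I" with one stray ink pixel
print(classify_zone(noisy_i))  # I
```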

8 Column-Row Recognition The algorithm uses the geometry of each zone to identify the table’s columns and rows. The algorithm associates each column and row label with its values in the table.

9 Column-Row Recognition
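A minimal sketch of the geometric grouping, assuming every zone carries the position of its top-left corner and that zones in the same column or row are exactly aligned. Real scans would need a tolerance. The `Zone` type, `build_table`, and the coordinates are hypothetical, and only column labels are handled.

```python
from typing import Dict, List, NamedTuple


class Zone(NamedTuple):
    left: int  # x position of the zone's left edge
    top: int   # y position of the zone's top edge
    text: str  # text recognized in the zone


def build_table(zones: List[Zone]) -> Dict[str, List[str]]:
    """Place zones in a grid by position and key each column by its label."""
    lefts = sorted({z.left for z in zones})
    tops = sorted({z.top for z in zones})
    grid = [["" for _ in lefts] for _ in tops]
    for z in zones:
        grid[tops.index(z.top)][lefts.index(z.left)] = z.text
    header, *body = grid  # assumption: the top row holds the column labels
    return {label: [row[i] for row in body] for i, label in enumerate(header)}


zones = [
    Zone(0, 0, "Full Name"), Zone(100, 0, "Relationship"),
    Zone(0, 20, "John Eyres"), Zone(100, 20, "Head"),
    Zone(0, 40, "Annie Eyres"), Zone(100, 40, "Wife"),
]
print(build_table(zones))
# {'Full Name': ['John Eyres', 'Annie Eyres'], 'Relationship': ['Head', 'Wife']}
```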

10 Identify Labels The algorithm maps the printed text of each label to a standardized name. The standardized names correspond to the fields in a database.

11 Identify Labels "ROAD, STREET, &c., And No. or NAME of HOUSE" → Address

12 Identify Labels "NAME and Surname of each Person" → Full Name

13 Identify Labels "RELATION to Head of Family" → Relationship
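The label mapping can be sketched as a lookup table keyed by a normalized form of the printed label. The three entries come from the slides. The normalization rule and the names `normalize` and `standard_name` are assumptions for illustration.

```python
import re
from typing import Dict, Optional


def normalize(label: str) -> str:
    """Lower-case a printed label and collapse punctuation and spacing."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", label.lower()).split())


# Printed census labels and the database fields they map to.
FIELD_NAMES: Dict[str, str] = {
    normalize(printed): field
    for printed, field in [
        ("ROAD, STREET, &c., And No. or NAME of HOUSE", "Address"),
        ("NAME and Surname of each Person", "Full Name"),
        ("RELATION to Head of Family", "Relationship"),
    ]
}


def standard_name(printed_label: str) -> Optional[str]:
    """Return the standardized field name, or None for an unknown label."""
    return FIELD_NAMES.get(normalize(printed_label))


print(standard_name("Relation  to head of family."))  # Relationship
print(standard_name("OCCUPATION"))                    # None
```

Normalizing before the lookup lets small OCR differences in case, spacing, and punctuation still resolve to the same field.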

14 Extract Data The algorithm identifies factored table values. The algorithm stores each record in an XML file.

15 Extract Data Address: Collafer *. (* Extracted by hand.)

16 Extract Data Address: Collafer *; Full Name: John Eyres; Relationship: Head. (* Extracted by hand.)

17 Extract Data Address: Collafer *; Full Name: Annie Eyres; Relationship: Wife. (* Extracted by hand.)

18 Extract Data Address: Collafer *; Full Name: Lehailes Eyre; Relationship: Son. (* Extracted by hand.)
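The slides do not define "factored" values. The sketch below assumes it means a value written once in the table and shared by the rows that follow, as the address Collafer appears to be for the three people in the example. It carries such values forward and serializes each record with Python's standard `xml.etree.ElementTree`. The element names and helper functions are hypothetical, not the presentation's actual schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

FIELDS = ["Address", "Full Name", "Relationship"]
FACTORED = {"Address"}  # assumption: only these fields are shared across rows


def fill_factored(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Copy the last seen value of each factored field into rows that omit it."""
    filled: List[Dict[str, str]] = []
    last: Dict[str, str] = {}
    for row in rows:
        current = {}
        for field in FIELDS:
            value = row.get(field, "")
            if not value and field in FACTORED:
                value = last.get(field, "")
            current[field] = value
        filled.append(current)
        last = current
    return filled


def to_xml(rows: List[Dict[str, str]]) -> str:
    """Serialize records as XML, one <record> element per row."""
    root = ET.Element("records")
    for row in rows:
        record = ET.SubElement(root, "record")
        for field in FIELDS:
            ET.SubElement(record, field.replace(" ", "")).text = row[field]
    return ET.tostring(root, encoding="unicode")


rows = [
    {"Address": "Collafer", "Full Name": "John Eyres", "Relationship": "Head"},
    {"Full Name": "Annie Eyres", "Relationship": "Wife"},
    {"Full Name": "Lehailes Eyre", "Relationship": "Son"},
]
print(to_xml(fill_factored(rows)))
```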

19 Microfilm Queries A web form provides the interface to query the microfilm database. Individuals can enter keywords (such as a first and last name), and the system locates appropriate records in the indexed microfilm documents.

20 Web Query Keywords: John Eyre
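How the system matches keywords is not described. A minimal sketch, assuming case-insensitive substring matching against the Full Name field and requiring every keyword to match, could look like this. The `search` function is hypothetical, and the records reuse the example data from the extraction slides.

```python
from typing import Dict, List

RECORDS: List[Dict[str, str]] = [
    {"Address": "Collafer", "Full Name": "John Eyres", "Relationship": "Head"},
    {"Address": "Collafer", "Full Name": "Annie Eyres", "Relationship": "Wife"},
    {"Address": "Collafer", "Full Name": "Lehailes Eyre", "Relationship": "Son"},
]


def search(records: List[Dict[str, str]], keywords: str) -> List[Dict[str, str]]:
    """Return records whose Full Name contains every keyword (case-insensitive)."""
    terms = [k.lower() for k in keywords.split()]
    if not terms:
        return []  # an empty query matches nothing
    return [
        r for r in records
        if all(term in r["Full Name"].lower() for term in terms)
    ]


for hit in search(RECORDS, "John Eyre"):
    print(hit["Full Name"])  # John Eyres
```

Substring matching is what lets the keyword "Eyre" find the indexed spelling "Eyres"; an exact-match search would miss it.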

21 Search Results The system returns the indexed images that contain the results. Since the database indexes both the text and geometry of the document, the process can return just the relevant regions of the microfilm image.

22 Search Results Click an image to select a result document.

23 Search Results Relevant region of the document is displayed.
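Because the index stores geometry along with text, returning a relevant region amounts to cropping the page image to a stored bounding box. The sketch below assumes the index keeps a box per record and adds a small margin for context. The `Region` type, `relevant_region`, and the margin are hypothetical.

```python
from typing import List, NamedTuple

Image = List[List[int]]  # pixel grid; values are arbitrary in this sketch


class Region(NamedTuple):
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def relevant_region(image: Image, region: Region, margin: int = 1) -> Image:
    """Crop the image to the region plus a margin, clamped to the image bounds."""
    height, width = len(image), len(image[0])
    top = max(region.top - margin, 0)
    left = max(region.left - margin, 0)
    bottom = min(region.bottom + margin, height)
    right = min(region.right + margin, width)
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x5 image whose pixel values encode their own row and column.
image = [[r * 10 + c for c in range(5)] for r in range(4)]
print(relevant_region(image, Region(1, 2, 2, 4)))
# [[1, 2, 3, 4], [11, 12, 13, 14], [21, 22, 23, 24]]
```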

24 Just-In-Time Browsing To make the query results display quickly, the system uses Just-In-Time Browsing. Just-In-Time Browsing will allow people to browse digitized microfilm and other large collections of images over the Internet at interactive rates.

25 Just-In-Time Browsing Small versions of each image will allow rapid browsing of the collection as a whole.

26 Just-In-Time Browsing People will be able to “Zoom In” on individual images and parts of images as necessary.
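The slides do not explain how Just-In-Time Browsing is implemented. One plausible mechanism, assumed here and not confirmed by the source, is a multi-resolution tile pyramid: the small versions are coarse levels of the pyramid, and zooming in fetches only the tiles that overlap the visible area. The tile size and function names are hypothetical.

```python
from typing import List, Tuple

TILE = 256  # hypothetical tile edge length in pixels


def level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """Image size at a pyramid level; level 0 is full size, each level halves it."""
    scale = 2 ** level
    # -(-a // b) is ceiling division, so partial pixels still count.
    return max(1, -(-width // scale)), max(1, -(-height // scale))


def tiles_for_viewport(
    left: int, top: int, right: int, bottom: int, tile: int = TILE
) -> List[Tuple[int, int]]:
    """Return (column, row) indices of tiles overlapping the viewport.

    The viewport is given in pixels of the chosen level; right/bottom are exclusive.
    """
    cols = range(left // tile, (right - 1) // tile + 1)
    rows = range(top // tile, (bottom - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]


print(level_size(4000, 3000, 4))               # (250, 188)
print(tiles_for_viewport(200, 100, 600, 300))
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Under this assumption, a 4000 by 3000 scan shrinks to a single tile at level 4, which is cheap to send for browsing, while a zoomed-in view of the full-size image needs only the handful of tiles under the viewport.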

