Creating a Digital Microfilm Library Dan R. Olsen Jr Computer Science Dept Brigham Young University.

Slides:



Advertisements
Similar presentations
Introduction to Computers Chapter 2 §Computers are common in every facet of human activity. §They overcome the shortcomings of conventional radiography.
Advertisements

From books to bytes: accelerating digitization at TTU Libraries with Kirtas BookScan APT 2400 Jessica Lu Donell Callender Texas Tech University Libraries.
Basic Computer Vocabulary
CLEARSPACE Digital Document Archiving system INTRODUCTION Digital Document Archiving is the process of capturing paper documents through scanning and.
Why should I consider Implementing a Document Imaging / Management System? Created by Harold Hegerhorst North American Technology. LLC © North American.
THE JOKOMO / YAMADA LIBRARY DIGITAL LIBRARY PROJECT.
Chapter 4: Representation of data in computer systems
David Underdown 22 November 2013 Digital Records and the County Durham Home Guard Catalogue Day 2013.
Computer Hardware.
* Making Timely and Accurate Patient Care Decisions through Vista Imaging.
An Introduction to Computer Forensics James L. Antonakos Professor Computer Science Department.
ARCHIVE IMAGING SEARCHABLE VIA THE WEBPAC Marthie de Kock The Hong Kong Institute of Education 9 December 2002.
ELPUB 2006 June Bansko Bulgaria1 Automated Building of OAI Compliant Repository from Legacy Collection Kurt Maly Department of Computer.
Basic Computer Components
Real-time Computer Vision with Scanning N-Tuple Grids Simon Lucas Computer Science Dept.
Introduction to Computer Terminology
assumes basic arithmetic
Introduction to Computers Essential Understanding of Computers and Computer Operations.
In the puzzle of history, archives are the clues. records created by individuals, institutions and governments. the institutions that preserve them. Archives.
3-2 Solving Inequalities Using Addition or Subtraction
1 Australian Newspapers Digitisation Program Development of the Newspapers Content Management System Rose Holley – ANDP Manager ANPlan/ANDP Workshop, 28.
HISTORY OF WINDOWS OS! BY JASON. ORIGINS OF WINDOWS OS. Windows started with Windows Windows 1 had “windows” you can click out of with.
Communications Technology 2104 Mercedes Lahey. Bit 1. bit=From a shortening of the words “binary digit” 2. the basic unit of information for computers.
Introduction to Video Game Programming (VGP) Mr. Shultz.
Computer Systems I’m ONLY a machine! Standard Grade Revision.
Introduction to Computers
Computer Systems. Basic Components Auxiliary Storage OutputProcessor Main Memory Input.
Computer System.
INTRODUCTION TO COMPUTERS The personal computer was far from reality in Computer users were limited to specially trained operators and engineers.
Uganda Science Digital Library (USDL) Digitizing and publishing documents Bergen – Makerere visit February 2005.
The Complexities & Economics of Digitizing Microfilm
Computer Basics By: Nicole Hayes. Purposes of Computers Business – Track inventory. – Calculate payroll. – Maintain databases. Personal – Surf the web.
Copyright © Allyn & Bacon 2008 POWER PRACTICE Chapter 4 Digital Technologies in the Classroom START This multimedia product and its contents are protected.
SilverImage Streamline tool for photographing herbarium specimen sheets Developed by SilverBiology Michael.
1 Bridging the gap between the paper past and digital future.
1 Introduction to Computers By Masseta ICT Dept. Mzumbe University.
LDS-TCGS Archive Project By Shon R. Edwards A.G., M.A.
NCAR storage accounting and analysis possibilities David L. Hart, Pam Gillman, Erich Thanhardt NCAR CISL July 22, 2013
Training begins in… 15:00 minutes Training begins in… 14:00 minutes.
FAMILYSEARCH INDEXING IS WORLDWIDE. INDEXING 1.WHAT IS INDEXING? - A PROCESS WHERE A PERSON CAN TRANSCRIBE DATA FROM A DIGITAL IMAGE WHICH IS THEN POSTED.
How We Measure Memory. Learning Goal Today we are going to learn how the computer stores information.
MSBCS-BCSI-1 Students will identify computer system components MSBCS-BCSIII-8 Students will examine basics of networking Computer Basics & Terminology.
Control & Data Handling, Operator Control, Aircraft Interface to C&DH Steve Musko Space Physics Research Laboratory University of Michigan Ann Arbor, MI.
Computers - The Journey Inside continues…
Semester One 2001/2002 Sheffield Hallam University1 What does a PC do? computers collect, process, store, and communicate information Collect via input.
Hashed Files Text Versus Binary Meghan Cavanagh. Hashed Files a file that is searched using one of the hashing methods User gives the key, the function.
1 Objectives Discuss reasons for taking this course on computers Outline the scope of this course Define the computer Differentiate between hardware and.
Analysing Indexes SQLBits 6 th October 2007 © Colin Leversuch-Roberts Kelem Consulting Limited September 2007.
Measuring Memory and Storage
The Complexities & Economics of Scanning Microfilmed Documents Videos
2.1.4 Data Representation Units.
1 CS 430: Information Discovery Lecture 17 Web Crawlers.
Chapter 2 Hardware.
The Big Picture Things to think about What different ways are there to collect information automatically? What are the advantages and disadvantages of.
What to look for?  Consider whether you should purchase a new or used computer.  Many people who sell computers before they are out of date allow you.
Basic Computer Components Unit 2. What is a computer?  A computer is an electronic device that accepts raw data and processes it into information that.
1 FamilySearch Catalog Update CCLA 04/21/ FamilySearch Catalog.
Digital Storage Digital Storage Ann Ware ~
Hardware specifications
Tools for identifying duplicate files and known software files
Computer Hardware and Software
File Sizes Bits and Bytes.
Automatic Digitizing.
CHAPTER 4 Memory System Organization and Architecture
Before we get started You will need the following apps downloaded to your mobile device: Microsoft Translator Office Lens  This matches with Engage section.
Storage Chapter 7.
instructor Sami Aleswi Faculty of Computing and Information Technology
Digital Storage Digital Storage
Digital Storage Digital Storage Ann Ware ~
Lesson 1 Computer Technology.
Presentation transcript:

Creating a Digital Microfilm Library Dan R. Olsen Jr Computer Science Dept Brigham Young University

The meaning of family history

Getting in touch with your ancestors

Family history out of the library and into the home

How big is the problem? 2.5 million films and growing 1000 images per film = 2.5 billion images 600 KB per image = 1,500,000 Gigabytes 25,000 laptop hard disks :-)

What will it cost to store it? 1,500,000 GB $30 per GB - real servers not PCs Total library cost –Today - $45,000,000 –5 years - $4,500,000 $450,000 –10 years - $450,000

Producing the Image Library Scanning rate frames per minute (optimistic) Images to scan Billion Scanner time per year hours To complete the library –208 scanner years

Cost of production 20 scanners = $1,000,000 –10 years to finish –replacement costs $1,000,000 Worker costs = 10x$100,000 = $1,000, year plan –GB per year - 150,000 $3,000,000 –Cost - $3,000,000

Creating the Digital Microfilm Library 10 years $8,000,000

Delivering images to the home Image size = 500K Dialup data rate = 5K Bytes / sec –(on a good day) Time per image = 1.6 minutes You cannot scan digital images the way you scan through microfilm The library must be indexed at the image level

Extracting data from images Extraction 12/hour –Assumes one record per image hours to extract the entire library –208 million hours –cost at $5/hour = $1 billion 20,000 extractors hours per year –104 years to complete 2 million extractors to complete in 10 years

To build the library in 10 years $5,000,000 to store $3,000,000 to scan must be indexed $1,000,000,000 to extract using current approach Need a new indexing plan

Extract for index Ordered collections by name by date –Parish records –Death records –Main archive group sheets Unordered collections –Wills –Deeds –Other records where image order is not helpful

Ordered collections Extract top date or name from each image –100/hour –12 years to complete - 20,000 volunteers Sample extract every 10 images then interpolate –1000/hour –1.2 years to complete - 20,000 volunteers

Unordered collections Extract only essential name, date and place info Let the image carry most of the data –Eliminate interpretation errors in extraction Extractors map extracted data to image fragments Auto extraction methods - OCR and Handwriting –Use extraction as a training set for new algorithms

What library should we build?

Lessons from the past Microfilm library –Make the raw data available in a uniform way WWW/GEDCOM –Make the library open –Base collection Tools on top but not in place Support both people and software Guaranteed archiving –Digital Stone Guaranteed naming

The vision Out of the library into the home Scan it all in 10 years –$8,000,000 Index not extract –High level index (beat microfilm) –Deeper indices on important collections Open library architecture –Raw data and raw indexes publicly available Library of last resort - Digital Stone –Guaranteed archive, guaranteed naming

Family history out of the library and into the home