TECHNOLOGY SUPPORT FOR ESSSS Progress, Issues, and Challenges Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library.


1 TECHNOLOGY SUPPORT FOR ESSSS: Progress, Issues, and Challenges
Marshall Breeding
Director for Innovative Technology and Research, Vanderbilt University Library
Founder and Publisher, Library Technology Guides
http://www.librarytechnology.org/
http://twitter.com/mbreeding
ESSSS Digital Archive Workshop, February 4, 2012

2 Turning Pages on Paper to Digital Images
- Digitizing in the field involves many compromises compared to what can be done in more controlled settings
- Access to archives may be of limited duration
  - Arbitrary and political
- Materials deteriorating rapidly
  - Practices related to physical preservation tend to be minimal
- Must be light, fast, and inexpensive

6 Achieve best results possible
- Maximize quality and consistency
- Handheld digital cameras
  - Rapid advancement in capabilities
  - Early images done at lower resolutions compared with what is possible today
- Fixed camera stands
  - Consistency in orientation and framing
- Organization of images (folders / image names)

7 Image Standards
- TIFF: Currently regarded as the best image format for archiving images
- RAW: Native proprietary format of a camera
- JPEG: Compressed images for display on the Web
  - Data lost during compression: non-reversible
  - VU system creates multiple sizes of JPEG images
- JPEG2000
  - Offers a lossless compression method
  - Not well supported on the Web
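The slide notes that the VU system creates multiple sizes of JPEG images. How that system computes its sizes is not described, so the following is only a minimal sketch of one common approach: scale each image so its longer side fits a pixel limit while keeping the aspect ratio. The derivative names and pixel limits are hypothetical, and the code computes dimensions only; it does not read or write image files.

```python
# Hypothetical derivative names and longest-side limits in pixels
# (assumptions for illustration, not the VU system's actual settings).
DERIVATIVES = {"thumbnail": 150, "preview": 800, "full": 2400}


def scaled_size(width, height, max_side):
    """Return (width, height) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale a small image
    # Integer arithmetic keeps the result exact and reproducible.
    new_width = max(1, width * max_side // longest)
    new_height = max(1, height * max_side // longest)
    return new_width, new_height


def derivative_sizes(width, height):
    """Map each derivative name to the dimensions it would have."""
    return {
        name: scaled_size(width, height, limit)
        for name, limit in DERIVATIVES.items()
    }
```

Because JPEG compression is lossy, derivatives like these would normally be generated from the archival master (TIFF or RAW) rather than from another JPEG.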

8 Bringing Images to the Web
- Take advantage of infrastructure developed by the Vanderbilt University Library to manage images
- Digital Library framework:
  - Presentation and functionality created in Perl-based interface
  - Data and metadata stored in MySQL relational tables
  - ODBC connectivity between presentation layer and MySQL
  - Microsoft Windows Server/IIS for Web server
- Images reside on digital storage provided by the Vanderbilt University Library

9 Digital Preservation
- Disaster Recovery
  - Ability to restore files in the case of any hardware, software, or human error
- Digital Preservation
  - Commitment and processes in place to preserve digital information for the very long term
  - Multiple replications
  - Migration of data into future formats as current standards become obsolete
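Keeping multiple replications is only useful if the copies can be shown to still match. The slides do not say how replicas are verified, so the sketch below is an assumption: it compares SHA-256 checksums of each replica against a master copy using Python's standard `hashlib`. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_replicas(master, replicas):
    """Return the replica paths whose contents differ from the master."""
    expected = sha256_of(master)
    return [str(path) for path in replicas if sha256_of(path) != expected]
```

An empty result means every replica is byte-for-byte identical to the master; any path returned would be a candidate for restoring from a good copy.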

10 Building structure through Metadata
- Metadata structure based on Dublin Core
- Volume-level descriptive metadata
  - Courtney Campbell designed metadata structure and is analyzing volumes to populate metadata for each volume
- EXIF data extracted from images into the individual records for each page
- Page-level structure
  - Supports ability to select volumes and browse page images
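The volume-level and page-level structure described above can be pictured as two related tables. The sketch below is illustrative only: it uses Python's built-in `sqlite3` as a stand-in for the MySQL tables, and the table names, column names, and the choice of Dublin Core elements are assumptions, not the actual schema of the project.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical volume and page tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE volume (
            id       INTEGER PRIMARY KEY,
            title    TEXT NOT NULL,  -- Dublin Core: title
            date     TEXT,           -- Dublin Core: date
            coverage TEXT,           -- Dublin Core: coverage
            language TEXT            -- Dublin Core: language
        );
        CREATE TABLE page (
            id            INTEGER PRIMARY KEY,
            volume_id     INTEGER NOT NULL REFERENCES volume(id),
            sequence      INTEGER NOT NULL,  -- page order within the volume
            image_file    TEXT NOT NULL,     -- path to the page image
            exif_datetime TEXT               -- value extracted from EXIF
        );
        """
    )
    return conn


def pages_in_volume(conn, volume_id):
    """Return the image files of one volume in page order, for browsing."""
    rows = conn.execute(
        "SELECT image_file FROM page WHERE volume_id = ? ORDER BY sequence",
        (volume_id,),
    )
    return [row[0] for row in rows]
```

Selecting a volume and then calling `pages_in_volume` mirrors the "select volumes and browse page images" behavior: descriptive metadata lives once per volume, while each page record carries only its position, image file, and extracted EXIF values.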

11 Demonstration
- Image management environment
- Interface
- Metadata
- Page Images

12 Turning Pages into Data
- The contents of the page images contain valuable data
- Page images can be read by humans but do not support essential features: search, computer analysis, etc.
- Full value of these collections can be realized through transcription

13 Challenges in transcription
- Page characteristics
  - Handwritten by many different hands
  - Many names and numbers
  - Spanish language
  - Varying contrast
  - Many defects: water damage, insects, etc.

18 Human transcription
- Scholars who work with pages of interest can create transcriptions manually
- Optical character recognition?
  - Highly accurate for typescript
  - Not effective for handwritten manuscripts

19 Crowdsourcing
- Find ways to have large numbers of people create transcript snippets
- Google uses crowdsourcing to improve transcripts for the Google Books project

20 Google ReCAPTCHA: “Digitizing books one word at a time”
- Each transaction transcribes one or two words
- Each word is transcribed many times
- Results compared to determine correct version
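The idea of transcribing each word many times and comparing the results can be sketched as a simple majority vote. This is not ReCAPTCHA's actual algorithm, which the slides do not describe; the function name, the thresholds `min_votes` and `min_agreement`, and the case-insensitive comparison are all assumptions made for illustration.

```python
from collections import Counter


def consensus(transcriptions, min_votes=3, min_agreement=0.6):
    """Return the agreed reading of one word, or None if there is no consensus.

    transcriptions: strings typed by different people for the same word image.
    min_votes: hypothetical minimum number of non-empty answers required.
    min_agreement: hypothetical share of answers that must match the winner.
    """
    # Normalize so stray whitespace and letter case do not split the vote.
    # This is a simplification: it discards capitalization, which matters
    # for names.
    normalized = [t.strip().casefold() for t in transcriptions if t.strip()]
    if len(normalized) < min_votes:
        return None  # not enough independent answers yet
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_agreement:
        return word
    return None  # answers disagree too much; collect more transcriptions
```

For example, four answers of which three agree after normalization yield that reading, while three answers that all differ yield `None`, signaling that the word should be shown to more people.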

21 Google ReCAPTCHA

22 Crowdsourcing to Transcribe ESSSS
- Scholars contribute any transcriptions created as they work with any given set of pages
- Students assigned to create transcriptions
  - Language, history, LIS
- Collaboration with some organization with ReCAPTCHA-like infrastructure

