METS Case Study: The NYU Digital Library Team METS Opening Day 27 October, 2003 Leslie Myrick.


1 METS Case Study: The NYU Digital Library Team METS Opening Day 27 October, 2003 Leslie Myrick

2 Projects at NYU using METS
EAD Finding Aid Project
Tokyo Tribunal Proceedings
Afghanistan Digital Library
CRL Political Web Archiving Project
DRAM *
Hemispheric Institute *
REPO History Sign Project *

3 Why METS? (1)
METS was formulated to serve as a:
Submission Information Package
Archival Information Package
Dissemination Information Package

4 Why METS? (2)
In other words, it’s a …
Transfer Syntax
Archival Syntax
Functional Syntax

5 METS and Complex Digital Objects
Finding aid + images with multiple scans/versions
Page turner for photo albums, documents, books
– Edisto Album, Tokyo Tribunal brief, Afghanistan Digital Library
Multimedia/Time-Based Media Navigators: Hemispheric Institute; SMIL Viewer
Web Site Navigator
– CRL Political Communications Web Archiving Project

6 Using METS as a SIP
Berol Collection Finding Aid -- in negotiations with RLG Cultural Materials Project
METS will be bundled with objects; EAD

7 METS as a Functional Syntax
METS is designed not only for transfer and archival management, but also for giving access to and navigating an object
METS + XSLT can create dynamic interfaces with links to resources and their metadata
METS can be dumped into Oracle, indexed, and searched using context-aware queries
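The navigation idea on this slide can be illustrated with a minimal sketch. It is not the XSLT the team used: it uses Python's standard library to walk a METS structMap and resolve each page to its file link, which is the lookup a page-turner stylesheet would also need. The sample document, its file names, and the `page_links` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical two-page object; not taken from any NYU project.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="reference">
      <file ID="F1" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="URL" xlink:href="page001.jpg"/>
      </file>
      <file ID="F2" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="URL" xlink:href="page002.jpg"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div TYPE="album" LABEL="Sample album">
      <div TYPE="page" ORDER="2" LABEL="Page 2"><fptr FILEID="F2"/></div>
      <div TYPE="page" ORDER="1" LABEL="Page 1"><fptr FILEID="F1"/></div>
    </div>
  </structMap>
</mets>"""


def page_links(mets_xml):
    """Return (order, label, href) tuples for every page div, sorted by ORDER."""
    ns = {"m": METS_NS}
    root = ET.fromstring(mets_xml)

    # Map each file ID to the xlink:href of its FLocat.
    hrefs = {}
    for file_el in root.iter("{%s}file" % METS_NS):
        locat = file_el.find("m:FLocat", ns)
        if locat is not None:
            hrefs[file_el.get("ID")] = locat.get("{%s}href" % XLINK_NS)

    # Walk the structMap and follow each page's fptr to its file.
    pages = []
    struct_map = root.find("m:structMap", ns)
    for div in struct_map.iter("{%s}div" % METS_NS):
        fptr = div.find("m:fptr", ns)
        if div.get("TYPE") == "page" and fptr is not None:
            pages.append((int(div.get("ORDER")), div.get("LABEL"),
                          hrefs[fptr.get("FILEID")]))
    return sorted(pages)


if __name__ == "__main__":
    for order, label, href in page_links(SAMPLE_METS):
        print(order, label, href)
```

Sorting on ORDER rather than document order means the page sequence is driven by the structMap metadata, not by how the file happens to be serialized.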

8 METS Plays Well With Others
We have …
EAD Finding Aids pointing to METS
METS pointing to Finding Aids and MARCXML records
METS pointing to and manipulating TEI

9 METS and Extensions at NYU
MODS and DC for descriptive
MIX for images/technical
textMD for text/technical
LC A/V Prototype + smptetechMD + AES
Missing links: overall preservation schema plugin (PREMIS); rights MD schema

10 Ingredients (so far)
Perl
MySQL and some Oracle
Tomcat Servlets and JSP
Saxon and XT
XSLT

11 Tools for Creation
zeroDB Database
Input via interface as well as batch loading of metadata extracted by scripts, e.g. ImageMagick identify, arcscraper.pl
Outputs METS using Perl DBI

12 Tools for Dissemination
Page-turners
Multimedia Viewers
Thumbnail Browsers

13 Typical METS Creation Workflow
ImageMagick extraction of image metadata
Database input (batch and manual entry) of descriptive and technical metadata
Generation of METS using Perl DBI against MySQL
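The last step of this workflow, generating METS from database rows, might look roughly like the sketch below. It is an illustration only: the original used Perl DBI against MySQL, whereas this uses Python with an in-memory SQLite database as a stand-in, and the `files` table, its columns, and the sample rows are hypothetical. The output is a skeletal METS document that has not been validated against the METS schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def demo_connection():
    """Create a throwaway database with a hypothetical 'files' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (object_id TEXT, seq INTEGER, href TEXT, mimetype TEXT)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        [
            ("album-1", 2, "page002.jpg", "image/jpeg"),
            ("album-1", 1, "page001.jpg", "image/jpeg"),
        ],
    )
    return conn


def build_mets(conn, object_id):
    """Serialize one object's file rows as a skeletal METS document."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, q("div"), {"TYPE": "object"})

    rows = conn.execute(
        "SELECT seq, href, mimetype FROM files WHERE object_id = ? ORDER BY seq",
        (object_id,),
    )
    for seq, href, mimetype in rows:
        file_id = "F%03d" % seq
        file_el = ET.SubElement(
            file_grp, q("file"), {"ID": file_id, "MIMETYPE": mimetype}
        )
        ET.SubElement(
            file_el,
            q("FLocat"),
            {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: href},
        )
        # One structMap div per file, pointing back at the file by ID.
        page = ET.SubElement(top_div, q("div"), {"TYPE": "page", "ORDER": str(seq)})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mets(demo_connection(), "album-1"))
```

A real record would also carry the descriptive and technical metadata sections (for example MODS and MIX) that the earlier workflow steps collect; they are omitted here to keep the sketch short.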

14 ImageMagick Verbose Dump
Image: taqw_001s.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 625x886
  Class: DirectClass
  Type: true color
  Depth: 8 bits-per-pixel component
  Colors: 33080
  Profile-color: 552 bytes
  Profile-iptc: 5636 bytes
    unknown: êëÿ
  Resolution: 100x100 pixels/inch
  Filesize: 210kb
  Interlace: None
  Background Color: white
  Border Color: #dfdfdf
  Matte Color: grey74
  Iterations: 0
  Compression: JPEG
  signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320f2226181d062822
  Tainted: False

15 ImageMagick non-Verbose Dump
taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06
taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01
taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01
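A script that batch-loads this kind of output into a database first has to split each line into fields. The following is a minimal sketch of that parsing step in Python, assuming the field layout shown on this slide; the layout differs between ImageMagick versions, and the function name and returned keys are hypothetical rather than taken from the NYU scripts.

```python
import re

# Pattern for lines shaped like the slide's examples:
#   name[scene] FORMAT WIDTHxHEIGHT Class N-bit size ...
IDENTIFY_LINE = re.compile(
    r"^(?P<filename>\S+?)(?:\[(?P<scene>\d+)\])?\s+"
    r"(?P<format>\w+)\s+"
    r"(?P<width>\d+)x(?P<height>\d+)\s+"
    r"(?P<image_class>\S+)\s+"
    r"(?P<depth>\d+)-bit\s+"
    r"(?P<size>\S+)"
)


def parse_identify_line(line):
    """Split one non-verbose identify line into a dict, or return None."""
    match = IDENTIFY_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields; the size is kept as reported (e.g. '191kb').
    for key in ("width", "height", "depth"):
        fields[key] = int(fields[key])
    if fields["scene"] is not None:
        fields["scene"] = int(fields["scene"])
    return fields


if __name__ == "__main__":
    sample = "taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01"
    print(parse_identify_line(sample))
```

Returning None for lines that do not match lets a batch loader log and skip unexpected output instead of inserting partial records.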

16 Extracting METS from a DB
doWebArchive.cgi
MODS for homepage; DC for pages
MIX for images/technical
textMD for web page/technical

17 METS for Discovery
Dump METS files into Oracle as CLOB
Create Oracle Intermedia index
– XML-aware full-text search
Example: CRL political web archiving project

18 CRL Political Web Archive
Collaboration between Stanford, Cornell, Texas, NYU, IA under aegis of CRL, Mellon
Sub-Saharan Africa, South East Asia, Latin America, Western Europe
Testbed: 400 URLs; websites from radical groups, NGOs
Internet Archive .arc files

19 .arc file
100 MB aggregate of harvested files, along with HTTP headers and crawler-generated header for each file
Fine as a simple SIP, but basically unmanageable as an AIP or DIP
At present accessed using byte offsets to grab content from aggregate file
Only searchable by URL (Wayback Machine)
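Byte-offset access to an aggregate file can be sketched as follows. This is a generic illustration in Python, not a reader for the actual .arc record layout: the aggregate is built in memory, and the index that maps each URL to an (offset, length) pair is a hypothetical stand-in for whatever lookup the real system used.

```python
import io


def build_aggregate(records):
    """Concatenate (url, payload) pairs and record where each payload starts."""
    stream = io.BytesIO()
    index = {}
    for url, payload in records:
        index[url] = (stream.tell(), len(payload))
        stream.write(payload)
        stream.write(b"\n")  # separator between records
    stream.seek(0)
    return stream, index


def fetch(stream, index, url):
    """Seek to a record's byte offset and read exactly its length."""
    offset, length = index[url]
    stream.seek(offset)
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("aggregate is shorter than the index claims")
    return data


if __name__ == "__main__":
    stream, index = build_aggregate([
        ("http://example.org/a", b"first page"),
        ("http://example.org/b", b"second page"),
    ])
    print(fetch(stream, index, "http://example.org/b"))
```

Every lookup here is keyed by URL and nothing else, which mirrors the limitation noted on the slide.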

20 Automated extraction of text-based metadata, e.g. web pages
arcscraper.pl
– Descriptive and technical MD for object
datscraper.pl
– Checksums, titles
– Links from each object
makeLinkTable.pl
– Creates link-to-object relationships
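The link-extraction and link-table steps can be illustrated with a short sketch. The Perl scripts named on the slide are not reproduced here; this is a Python stand-in using only the standard library, and the class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def make_link_table(pages):
    """Return (source_url, target_url) rows for a dict of url -> HTML text."""
    table = []
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links against the page they were found on.
            table.append((url, urljoin(url, href)))
    return table


if __name__ == "__main__":
    pages = {
        "http://example.org/index.html":
            '<a href="about.html">About</a> <a href="http://example.net/">Other</a>',
    }
    for row in make_link_table(pages):
        print(row)
```

Resolving each link to an absolute URL lets the rows be joined against the URLs of the harvested objects.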

21 Go to Videotape

22 The Future?
Persistent Identifiers
Preservation Metadata Schema
Java development
Move from Oracle to Cheshire II

