DRS 2 Metadata Migration June 25, 2013. Agenda Introduction Preliminary results - content analysis Metadata options Next steps Questions.

Presentation transcript:

DRS 2 Metadata Migration June 25, 2013

Agenda
  Introduction
  Preliminary results - content analysis
  Metadata options
  Next steps
  Questions

INTRODUCTION

Reason for metadata migration
  Different data model
    – File -> Object (a coherent set of content that is considered a single intellectual unit for purposes of description, use and/or management; for example, a particular book, web harvest, serial or photograph)
  Different metadata schemas
    – Many locally defined -> community standard
  Different packaging of metadata
    – Use of METS in some cases -> consistent use of METS

Path to metadata migration
  Analysis: metadata, content, users
  Prototype: proof-of-concept, time estimates
  Migration plan: sequence, schedule
  Develop tools: dashboard, object builders
  Metadata migration
  [Diagram marker: We are here]

Key feedback points
  [Same path diagram, with callouts: Technical options, Process options]

Timing
  [Same path diagram, with callout: Next 3 months]

What does it involve?
  Aggregate DRS1 files into objects
    – Different object types = content models
  Generate an object descriptor per object

Document example
  New object (content model = DOCUMENT)
    – PDF file
    – Descriptor file

Still image example
  New object (content model = STILL IMAGE)
    – Archival master image file
    – Production master image file
    – Deliverable image file
    – Descriptor file

Aggregate DRS1 files into objects
  One content file per object
    – Color profile
    – Document
    – Google document container 1
    – Google document container 2
    – Google document container 3
    – Opaque container
    – Text

Aggregate DRS1 files into objects
  Multiple content files per object
    – Audio
    – Web harvest
    – Biomedical image
    – PDS document
    – Target image
    – MOA2
    – Still image
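The aggregation step described on these slides can be sketched in Python. This is only an illustration under stated assumptions: the `Drs1File` and `Drs2Object` classes, the `group_key` field, and the uppercase spelling of the content model names (other than DOCUMENT and STILL IMAGE) are hypothetical, and the slides do not say how DRS actually decides which files belong together.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Content models listed on the slides as "one content file per object".
# The uppercase spellings are an assumption modeled on DOCUMENT / STILL IMAGE.
SINGLE_FILE_MODELS = {
    "COLOR PROFILE", "DOCUMENT", "GOOGLE DOCUMENT CONTAINER 1",
    "GOOGLE DOCUMENT CONTAINER 2", "GOOGLE DOCUMENT CONTAINER 3",
    "OPAQUE CONTAINER", "TEXT",
}

@dataclass
class Drs1File:
    file_id: str
    content_model: str
    group_key: str   # hypothetical: whatever ties related files together
    role: str = ""   # e.g. archival master, production master, deliverable

@dataclass
class Drs2Object:
    content_model: str
    files: list = field(default_factory=list)

def aggregate(files):
    """Group DRS1 file records into conceptual DRS2 objects."""
    objects = []
    grouped = defaultdict(list)
    for f in files:
        if f.content_model in SINGLE_FILE_MODELS:
            # One content file per object.
            objects.append(Drs2Object(f.content_model, [f]))
        else:
            # Multiple content files per object: collect by model and key.
            grouped[(f.content_model, f.group_key)].append(f)
    for (model, _key), members in grouped.items():
        objects.append(Drs2Object(model, members))
    return objects
```

With this sketch, an archival master, production master and deliverable that share a `group_key` end up in one STILL IMAGE object, while a PDF becomes its own DOCUMENT object. The descriptor file would be generated afterwards, once per object.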

Generate object descriptors
  METS format
    – Embedded schemas (PREMIS, MODS, MIX, etc.)
  Metadata sources
    – DRS1 database
    – DRS1 METS files where they exist
    – Examining the content files
    – Catalog records?
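To make the idea of a METS descriptor with an embedded schema concrete, here is a minimal Python sketch using only the standard library. It builds a bare METS skeleton with one MODS title wrapped in a `dmdSec`, plus a file section and a structure map. The function name, the IDs (`dmd1`, `file1`, ...) and the choice of elements are illustrative assumptions; the actual DRS2 descriptor layout is not specified on the slides and will certainly carry more (PREMIS, MIX, and so on).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)

def build_descriptor(label, title, file_hrefs):
    """Return a minimal METS document (as a string) for one object."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"LABEL": label})

    # Descriptive metadata: MODS embedded inside a METS dmdSec.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "dmd1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # File inventory and a flat structure map pointing at each file.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"DMDID": "dmd1"})
    for i, href in enumerate(file_hrefs, start=1):
        file_id = f"file{i}"  # illustrative ID scheme
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {"ID": file_id})
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})

    return ET.tostring(mets, encoding="unicode")
```

In a real migration the title and label would come from the metadata sources listed above (DRS1 database, existing METS files, the content files themselves) rather than from function arguments.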

PRELIMINARY RESULTS: CONTENT ANALYSIS

Preliminary content analysis
  Conceptually "built" objects for 13 of 14 content models (~36 million of 44 million files)
    – All but still image
    – Order helps!
  [Diagram labels: Still Image, MOA2, Biomedical Image, PDS Document]

Preliminary content analysis
  1,091,670 objects from 36,190,120 files
    – ~33 files per object
  Relatively few surprises, but content analysis is not complete
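The two ratios quoted in the preliminary analysis can be checked with a few lines of Python. The 44 million total is the rounded figure from the slides, so the coverage share is approximate.

```python
objects = 1_091_670
files = 36_190_120
total_files = 44_000_000  # rounded total from the slides ("44 million files")

files_per_object = files / objects
share = files / total_files

print(f"{files_per_object:.2f} files per object")       # 33.15
print(f"{share:.0%} of DRS1 files covered so far")      # 82%
```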

Content cleanup
  MOA2 files (8,024)
  Index maps (2,686)
  Entity files (1)
  Merged PDS descriptors (22,203)

Content cleanup
  Orphaned target image files (5), target description files (4)
  Orphaned audio files (71)

METADATA OPTIONS

DRS1 -> DRS2 [diagram]
  DRS1 FILE INFO: e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
  DRS2 FILE INFO: e.g., accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
  DRS2 OBJECT INFO: e.g., billingCode, ownerCode, owner-suppliedName
  DRS2 DESCRIPTOR

DRS1 -> DRS2 [diagram]
  DRS1 FILE INFO: e.g., billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
  DRS1 METS: Object Label, MODS, PDS info, etc.
  DRS2 FILE INFO: accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
  DRS2 OBJECT INFO: billingCode, ownerCode, owner-suppliedName, caption unit name, view text
  DRS2 DESCRIPTOR: Object Label, Object-level MODS

Objects
  Owner supplied name (OSN) is required
  Need to generate during migration
  Four cases
    – A METS file exists
    – New object will be built from a single content file
    – New object will be built from multiple content files
    – No OSN (potential case)
  Proposal for most cases:
    – Add prefix or suffix to METS or content file owner supplied name
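One way to read the four cases is as a small decision function. The sketch below is hypothetical throughout: the `_OBJECT` suffix, the rule for picking among several content file names, and the fallback for the no-OSN case are placeholders, since the slides leave those choices open.

```python
from typing import Optional, Sequence

def generate_object_osn(
    mets_osn: Optional[str],
    content_osns: Sequence[Optional[str]],
    fallback_id: str,
    suffix: str = "_OBJECT",  # hypothetical suffix, not a DRS convention
) -> str:
    """Derive an owner supplied name for a new object."""
    names = [n for n in content_osns if n]
    if mets_osn:
        # Case 1: a METS file exists, so reuse its name.
        base = mets_osn
    elif len(names) == 1:
        # Case 2: object built from a single content file.
        base = names[0]
    elif names:
        # Case 3: multiple content files. Taking the alphabetically first
        # name is an arbitrary assumption made for this sketch.
        base = min(names)
    else:
        # Case 4: no OSN at all. Fall back to some other identifier.
        base = fallback_id
    return base + suffix
```

For example, `generate_object_osn(None, ["page_002", "page_001"], "12345")` returns `"page_001_OBJECT"` under these assumptions.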

Objects
  Other required object elements
    – insertionDate
        date of earliest file?
    – captionBehavior
        for existing objects, set based on billing code
        prospectively, set by depositor
    – viewText
        available for all objects, not just PDS
        default to off
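The defaults proposed for these required elements could be derived as follows. This is a sketch, assuming the "date of earliest file" option is chosen for insertionDate; the function name and the billing-code lookup table are hypothetical, and the actual captionBehavior values are not given on the slides.

```python
from datetime import date
from typing import Iterable, Mapping

def required_object_elements(
    file_dates: Iterable[date],
    billing_code: str,
    caption_by_billing_code: Mapping[str, str],  # hypothetical lookup table
) -> dict:
    """Compute default values for a migrated object's required elements."""
    dates = list(file_dates)
    if not dates:
        raise ValueError("an object needs at least one file date")
    return {
        # Assumes the earliest file date is used, as the slide suggests.
        "insertionDate": min(dates),
        # None if the billing code has no entry in the lookup table.
        "captionBehavior": caption_by_billing_code.get(billing_code),
        # Slide: available for all objects, default to off.
        "viewText": False,
    }
```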

Objects
  Descriptive metadata
    – Take MODS from existing METS as is or import new
        From Aleph
        From finding aid
    – If re-imported, update METS label or not?
    – Import from OLIVIA based on owner supplied name for the file?

Objects from existing METS
  Identifiers for Harvard metadata
    – Identify finding aid identifiers
    – Convert "Old HOLLIS" numbers
    – Aleph IDs: include check digit or not?
    – Convert to URIs or actionable URNs from plain IDs
        Could DRS format such URIs for new DRS2 input?

Objects from existing METS
  PDS elements
    – PDF owner text becomes caption unit name
    – viewOcr function becomes viewText
    – goto function will be automatically determined by presence of structMap/div attributes
  Caption behavior
    – for existing objects, set by billing code

Files
  Run automated processes to identify, validate and characterize file technical characteristics
  Extract technical metadata

Files
  isFirstGenerationinDrs
    – Values: yes, no, unspecified
    – Should we supply "yes" for archival masters and/or top of derivation chain?
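The open question on this slide amounts to choosing a rule. One possible rule is sketched below, purely as an illustration: the role string `"ARCHIVAL_MASTER"` and the `has_source_in_drs` input are hypothetical, and the slides do not settle which answer will be adopted.

```python
from typing import Optional

def first_generation_value(role: str, has_source_in_drs: Optional[bool]) -> str:
    """Return 'yes', 'no' or 'unspecified' for a migrated file.

    has_source_in_drs is True when the file is known to derive from another
    file in the repository, False when it is known to be the top of its
    derivation chain, and None when nothing is known.
    """
    if role == "ARCHIVAL_MASTER" or has_source_in_drs is False:
        return "yes"
    if has_source_in_drs is True:
        return "no"
    return "unspecified"
```

Under this rule an archival master always gets "yes", a known derivative gets "no", and everything else stays "unspecified" rather than guessing.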

Image files
  Converting from local scheme to MIX
  Local field questions
    – Methodology
    – History
    – Source
    – Enhancements

Text files
  Converting from local scheme to textMD
  Descriptor_type will be absorbed into different places in DRS2
  Extracted metadata can supply
    – markup_basis
    – markup_language for specific schemas
    – possibly other elements

Audio files
  Moving from local schema to AES: Audio object structures for preservation and restoration

Versioned metadata
  History will be tracked for key administrative elements:
    – Access flag
    – Admin flag (new)
    – Billing code
    – Owner code
  What values to assign for required creation date and agent for migrated content?
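A minimal sketch of what tracking history for these elements might look like, in Python. The class, the element spellings (in particular `adminFlag`) and the shape of a history entry are assumptions made for illustration; they also show why the last question matters, since every recorded change needs a date and an agent.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Elements whose changes are tracked, per the slide. The identifier
# spellings are assumed; "adminFlag" in particular is hypothetical.
TRACKED = {"accessFlag", "adminFlag", "billingCode", "ownerCode"}

@dataclass
class VersionedMetadata:
    current: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def set(self, element: str, value: str, when: datetime, agent: str) -> None:
        """Update an element, recording a history entry if it is tracked."""
        if element in TRACKED and self.current.get(element) != value:
            self.history.append((element, value, when, agent))
        self.current[element] = value
```

For migrated content, the first call for each tracked element would need some creation date and agent, for example the migration date and a migration agent, or values carried over from DRS1.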

NEXT STEPS

Next steps
  Continue analysis and development of technical requirements
  Build prototype
  September check-in on progress
  Create metadata migration plan
  Open meeting to review plan

OPEN FOR QUESTIONS