Published by Cayden Bryars. Modified over 9 years ago.
1
DRS 2 Metadata Migration
June 25, 2013
2
Agenda
– Introduction
– Preliminary results: content analysis
– Metadata options
– Next steps
– Questions
3
INTRODUCTION
4
Reason for metadata migration
Different data model
– File -> Object (a coherent set of content that is considered a single intellectual unit for purposes of description, use and/or management: for example, a particular book, web harvest, serial or photograph)
Different metadata schemas
– Many locally defined -> community standard
Different packaging of metadata
– Use of METS in some cases -> consistent use of METS
5
Path to metadata migration
– Analysis: Metadata, Content, Users
– Prototype: Proof-of-concept, Time estimates
– Migration plan: Sequence, Schedule
– Develop tools: Dashboard, Object builders
– Metadata migration
(Diagram marker: "We are here")
6
Key feedback points
– Analysis: Metadata, Content, Users
– Prototype: Proof-of-concept, Time estimates
– Migration plan: Sequence, Schedule
– Develop tools: Dashboard, Object builders
– Metadata migration
(Diagram markers: "Technical options", "Process options")
7
Timing
– Analysis: Metadata, Content, Users
– Prototype: Proof-of-concept, Time estimates
– Migration plan: Sequence, Schedule
– Develop tools: Dashboard, Object builders
– Metadata migration
(Diagram marker: "Next 3 months")
8
What does it involve?
Aggregate DRS1 files into objects
– Different object types = content models
Generate an object descriptor per object
9
Document example
– PDF file
10
Document example
– PDF file
New object (content model = DOCUMENT)
11
Document example
– PDF file
– Descriptor file
New object (content model = DOCUMENT)
12
Still image example
– Archival master image file
13
Still image example
– Archival master image file
– Production master image file
14
Still image example
– Archival master image file
– Production master image file
– Deliverable image file
15
Still image example
– Archival master image file
– Production master image file
– Deliverable image file
New object (content model = STILL IMAGE)
16
Still image example
– Archival master image file
– Production master image file
– Deliverable image file
– Descriptor file
New object (content model = STILL IMAGE)
17
Aggregate DRS1 files into objects
One content file per object
– Color profile
– Document
– Google document container 1
– Google document container 2
– Google document container 3
– Opaque container
– Text
18
Aggregate DRS1 files into objects
Multiple content files per object
– Audio
– Web harvest
– Biomedical image
– PDS document
– Target image
– MOA2
– Still image
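The aggregation step can be illustrated with a minimal Python sketch. It is not the DRS migration code: the `Drs1File` and `Drs2Object` classes, the `group_key` field and the sample file records are all hypothetical, and the slides do not say how related files are actually linked in DRS1.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Drs1File:
    file_id: str
    kind: str           # e.g. "PDF file", "archival master"
    content_model: str  # e.g. "DOCUMENT", "STILL IMAGE"
    group_key: str      # hypothetical: whatever links related files together


@dataclass
class Drs2Object:
    content_model: str
    files: list = field(default_factory=list)


def aggregate(files):
    """Group DRS1 file records into one object per (content model, group key)."""
    groups = defaultdict(list)
    for f in files:
        groups[(f.content_model, f.group_key)].append(f)
    return [Drs2Object(model, members) for (model, _), members in groups.items()]


files = [
    Drs1File("f1", "PDF file", "DOCUMENT", "report-1"),
    Drs1File("f2", "archival master", "STILL IMAGE", "photo-7"),
    Drs1File("f3", "production master", "STILL IMAGE", "photo-7"),
    Drs1File("f4", "deliverable", "STILL IMAGE", "photo-7"),
]
objects = aggregate(files)
for obj in objects:
    print(obj.content_model, [f.file_id for f in obj.files])
```

The single PDF ends up alone in a DOCUMENT object, while the three image files share one STILL IMAGE object, matching the two examples on the earlier slides.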
19
Generate object descriptors
METS format
– Embedded schemas (PREMIS, MODS, MIX, etc.)
Metadata sources
– DRS1 database
– DRS1 METS files where they exist
– Examining the content files
– Catalog records?
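As a rough illustration of what a METS-format descriptor skeleton looks like, the sketch below builds one with Python's standard `xml.etree.ElementTree`. It only shows the outer METS structure (`fileSec` and `structMap`). The function name `build_descriptor`, the label and the file IDs are assumptions, and the embedded PREMIS, MODS and MIX sections that a real DRS2 descriptor would carry are omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag):
    """Return the namespace-qualified name of a METS element."""
    return f"{{{METS_NS}}}{tag}"


def build_descriptor(label, file_ids):
    """Build a bare METS skeleton: one fileGrp plus one structMap div."""
    root = ET.Element(q("mets"), {"LABEL": label})
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"))
    div = ET.SubElement(ET.SubElement(root, q("structMap")), q("div"))
    for file_id in file_ids:
        ET.SubElement(file_grp, q("file"), {"ID": file_id})
        ET.SubElement(div, q("fptr"), {"FILEID": file_id})  # link div to file
    return root


descriptor = build_descriptor(
    "Example still image object", ["FILE_1", "FILE_2", "FILE_3"]
)
print(ET.tostring(descriptor, encoding="unicode"))
```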
20
PRELIMINARY RESULTS: CONTENT ANALYSIS
21
Preliminary content analysis
Conceptually “built” objects for 13 of 14 content models (~36 million of 44 million files)
– All but still image
– Order helps!
(Diagram labels: Still Image, MOA2, Biomedical Image, PDS Document)
22
Preliminary content analysis
1,091,670 objects from 36,190,120 files
– ~33 files per object
Relatively few surprises, but content analysis is not complete
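The "~33 files per object" figure follows directly from the two counts on this slide, as a quick check shows (the variable names are illustrative only):

```python
files_built = 36_190_120    # files covered by the 13 content models
objects_built = 1_091_670   # objects conceptually built from them

files_per_object = files_built / objects_built
print(f"{files_per_object:.2f}")  # 33.15
```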
23
Content cleanup
– MOA2 files (8,024)
– Index maps (2,686)
– Entity files (1)
– Merged PDS descriptors (22,203)
24
Content cleanup
– Orphaned target image files (5), target description files (4)
– Orphaned audio files (71)
25
METADATA OPTIONS
26
DRS1 -> DRS2
DRS1 file info, e.g.: billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
DRS2 FILE INFO, e.g.: accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
DRS2 OBJECT INFO, e.g.: billingCode, ownerCode, owner-suppliedName
DRS2 DESCRIPTOR
28
DRS1 -> DRS2
DRS1 file info, e.g.: billingCode, ownerCode, accessFlag, tech metadata, owner-suppliedName, role, purpose, quality, usageClass
DRS1 METS: Object Label, MODS, PDS info, etc.
DRS2 FILE INFO: accessFlag, tech metadata, owner-suppliedName, role, processing, quality, usageClass
DRS2 OBJECT INFO: billingCode, ownerCode, owner-suppliedName, caption unit name, view text
DRS2 DESCRIPTOR: Object Label, Object-level MODS
29
Objects
Owner supplied name (OSN) is required
Need to generate during migration
Four cases
– A METS file exists
– New object will be built from a single content file
– New object will be built from multiple content files
– No OSN (potential case)
Proposal for most cases:
– add prefix or suffix to METS or content file owner supplied name
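The four cases could be handled along the following lines. This is a sketch under stated assumptions, not the agreed migration rule: the `_OBJECT` suffix is invented, the slides do not say which file's name to use when an object has several content files (here the first named file in the caller's order is used), and returning `None` for the no-OSN case simply flags it for separate handling.

```python
from typing import Optional, Sequence

OBJECT_SUFFIX = "_OBJECT"  # hypothetical; the slides only say "prefix or suffix"


def derive_object_osn(
    mets_osn: Optional[str], content_osns: Sequence[Optional[str]]
) -> Optional[str]:
    """Derive an object-level owner supplied name from existing file names."""
    # Case 1: a METS file exists, so reuse its owner supplied name.
    if mets_osn:
        return mets_osn + OBJECT_SUFFIX
    # Cases 2 and 3: one or more content files. Assumption: take the first
    # content file that has a name, in the order the caller supplies.
    named = [name for name in content_osns if name]
    if named:
        return named[0] + OBJECT_SUFFIX
    # Case 4: no owner supplied name anywhere; leave for manual handling.
    return None


print(derive_object_osn("vol1_mets", ["page1", "page2"]))
print(derive_object_osn(None, ["report.pdf"]))
print(derive_object_osn(None, []))
```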
30
Objects
Other required object elements
– insertionDate
  – date of earliest file?
– captionBehavior
  – for existing objects, set based on billing code
  – prospectively, set by depositor
– viewText
  – available for all objects, not just PDS
  – default to off
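A minimal sketch of how these three defaults might be filled in for a migrated object. The billing code, the caption behavior values and the function name are placeholders invented for illustration, since the slides give neither the real codes nor the allowed values, and using the earliest file date for insertionDate is still an open question on the slide.

```python
from datetime import date

# Hypothetical lookup table; real billing codes and values are not in the slides.
CAPTION_BY_BILLING_CODE = {"EXAMPLE.CODE.A": "behavior-a"}
DEFAULT_CAPTION_BEHAVIOR = "behavior-default"  # placeholder value


def required_object_elements(file_dates, billing_code):
    """Fill in required object elements for an object built during migration."""
    return {
        # Open question on the slide: use the date of the earliest file.
        "insertionDate": min(file_dates),
        # Existing objects: set from the billing code.
        "captionBehavior": CAPTION_BY_BILLING_CODE.get(
            billing_code, DEFAULT_CAPTION_BEHAVIOR
        ),
        # Available for all objects, defaulting to off.
        "viewText": False,
    }


elements = required_object_elements(
    [date(2009, 3, 1), date(2007, 11, 15)], "EXAMPLE.CODE.A"
)
print(elements)
```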
31
Objects
Descriptive metadata
– Take MODS from existing METS as is or import new
  – From Aleph
  – From Finding Aid
– If re-imported, update METS label or not?
– Import from OLIVIA based on owner supplied name for the file?
32
Objects from existing METS
Identifiers for Harvard metadata
– Identify finding aid identifiers
– Convert “Old HOLLIS” numbers
– Aleph IDs: include check digit or not?
– Convert to URIs or actionable URNs from plain IDs
  – Could DRS format such URIs for new DRS2 input?
33
Objects from existing METS
PDS elements
– PDF owner text becomes caption unit name
– viewOcr function becomes viewText
– goto function will be automatically determined by presence of structMap/div attributes
Caption behavior
– for existing objects, set by billing code
34
Files
Run automated processes to identify, validate and characterize file technical characteristics
Extract technical metadata
35
Files
isFirstGenerationinDrs
– Values: yes, no, unspecified
– Should we supply “yes” for archival masters and/or top of derivation chain?
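One possible reading of the option raised above, sketched in Python. The slide poses this as a question, so the rule is an assumption rather than a decision: the function name, the `has_source_in_drs` parameter and the role string are hypothetical.

```python
from typing import Optional


def is_first_generation(role: Optional[str], has_source_in_drs: Optional[bool]) -> str:
    """Return "yes", "no" or "unspecified" for a migrated file.

    has_source_in_drs is True when the file is known to derive from another
    DRS file, False when it is known to sit at the top of its derivation
    chain, and None when the derivation is unknown.
    """
    # Assumed rule: archival masters and chain tops count as first generation.
    if role == "archival master" or has_source_in_drs is False:
        return "yes"
    if has_source_in_drs is True:
        return "no"
    return "unspecified"


print(is_first_generation("archival master", None))  # yes
print(is_first_generation("deliverable", True))      # no
print(is_first_generation("deliverable", None))      # unspecified
```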
36
Image files
Converting from local scheme to MIX
Local field questions
– Methodology
– History
– Source
– Enhancements
37
Text files
Converting from local scheme to textMD
Descriptor_type will be absorbed into different places in DRS2
Extracted metadata can supply
– markup_basis
– markup_language for specific schemas
– possibly other elements
38
Audio files
Moving from local schema to AES57-2011: Audio object structures for preservation and restoration
39
Versioned metadata
History will be tracked for key administrative elements:
– Access flag
– Admin flag (new)
– Billing code
– Owner code
What values to assign for required creation date and agent for migrated content?
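To make the idea of tracked history concrete, here is a small sketch of a versioned element that records a creation date and agent with every value. The class names, the flag values, the dates and the "migration" agent are all placeholders, and which date and agent to record for migrated content is exactly the open question above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Version:
    value: str
    created: date  # required creation date
    agent: str     # required agent


@dataclass
class VersionedElement:
    name: str
    history: list = field(default_factory=list)

    def set(self, value, created, agent):
        """Append a new version instead of overwriting the old value."""
        self.history.append(Version(value, created, agent))

    @property
    def current(self):
        return self.history[-1].value if self.history else None


access_flag = VersionedElement("accessFlag")
# Placeholder date and agent for the value carried over by migration.
access_flag.set("value-a", date(2013, 6, 25), "migration")
access_flag.set("value-b", date(2014, 1, 10), "example-user")
print(access_flag.current, len(access_flag.history))
```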
40
NEXT STEPS
41
Next steps
– Continue analysis and development of technical requirements
– Build prototype
– September check-in on progress
– Create metadata migration plan
– Open meeting to review plan
42
OPEN FOR QUESTIONS