Metadata for Digitization and Preservation


Introduction What is metadata and why it matters The key elements How metadata is created Where metadata is stored Metadata standards How much will it cost?

What is metadata? Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects. The creation of metadata is governed by a body of standards, best practices and schemas that, when appropriately applied, work together to facilitate the management, description, and preservation of digital objects. This is my definition for this workshop – you can agree or not. It is specific to the information professional community – doesn’t apply to other communities that use metadata in different ways.

What is metadata? Tony Gill – ARTstor/CJH Metadata refers to structured descriptions, stored as computer data, that attempt to describe the essential properties of other discrete computer data objects. Big picture definition: the sum total of what can be said about any information object at any level of aggregation

What is metadata for? According to the World Wide Web Consortium, metadata serves two purposes: to provide a means to discover that a data set exists and how it might be obtained or accessed; and to document the content, quality, and features of a data set, indicating its fitness for use. Therefore we need to think about content, context and structure

Why Does Metadata Matter? “Doing research on the Web is like using a library assembled piecemeal by packrats and vandalized nightly.” – R. Ebert, Internet Life Finding the needle in the haystack Managing thousands of identical looking needles Finding visual materials without viewing them Expanding use Preserving content and context

Key Elements Administrative Metadata – used in managing and administering information resources Descriptive Metadata – used to describe or identify information resources Preservation Metadata – related to the preservation management of information resources Structural Metadata – used for control over complex digitized objects Technical Metadata – related to how a system functions or how metadata behaves Use Metadata – related to the level and type of use of information resources

Structure of metadata Collection Work Item

How metadata is created By software tools From resource content e.g. catalogues or databases From creation tool e.g. digital camera or file header By human intervention Description by resource creator/owner Description by third party provider e.g. technical metadata Creating and maintaining good metadata is time-consuming and costly

Where metadata is stored Embedded in the resource EXIF information with TIFF images – viewable in Photoshop File headers or invisible copyright watermarking Linked to resource Created as record in database format

Metadata Standards Dublin Core DIG35 – for technical metadata http://vads.ahds.ac.uk/guides/creating_guide/sect43.html DIG35 – for technical metadata www.i3a.org/I_dig35.html Categories for the Description of Works of Art (CDWA) www.getty.edu/research/institute/standards/cdwa/ Visual Resources Association Core Categories www.vraweb.org/ SEPIA working group www.knaw.nl/ecpa/sepia/workinggroups/wp5/cataloguing.html Resource Description Framework (RDF) Encoded Archival Description (EAD)

How much will it cost? How long is a piece of string? Depends upon the stop points There is no one-size-fits-all or one-cost framework Depends upon the description already in place and how well the collection is currently indexed In-house measurement Balance skill, time, and automation Photographs – descriptive metadata rarely takes less than 5 minutes per photograph and usually no more than 30 minutes
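As a rough illustration of the per-photograph range on this slide, here is a back-of-the-envelope estimate in Python. The function name and the collection size of 1,200 photographs are assumptions made for the example, not figures from the workshop.

```python
def description_hours(photo_count, min_minutes=5, max_minutes=30):
    """Return (low, high) staff hours needed to describe photo_count photographs.

    The 5 and 30 minute bounds are the per-photograph range quoted on the slide.
    """
    low = photo_count * min_minutes / 60
    high = photo_count * max_minutes / 60
    return low, high


# 1200 photographs is a made-up collection size used only for illustration.
low, high = description_hours(1200)
print(f"{low:.0f} to {high:.0f} staff hours")
```

The width of the resulting range is the point: the estimate is only useful once a project has measured its own in-house rate on a sample.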

Traditional Functions Traditionally we applied these functions to: Paper based and microform based information resources Monographs, serials, photographs, etc. Access provided through local library services Including inter-library loan

New Functions Apply these functions to: Web documents, online serials, digital images, digital collections, web sites, digital audio and video, born digital material, etc. Access provided via the web and email We are facing a new environment with digital resources. Born digital (define if necessary) and digitized resources We provide access in new ways and have to manage the resources in new ways.

Why are these digital objects different? Information explosion Multiple versions Instant access Less physical control over collection Some are surrogates Increased user expectations Preservation is more complex Information explosion – the web has changed everything. People want instant access to information…self-serve via the web or with the assistance of a librarian, often via email. Creates new pressures on librarians to make collections available on the web and to provide reference service in new ways. Key differences are Access and Preservation Access: access often via the web with no reference interview, no interaction with the patron. Preservation: digital preservation is much more difficult

Why do we need metadata to do these things? Provides the necessary tools to manage, preserve and provide access to information in the digital environment Our jobs have not fundamentally changed; but our collections have and our users have ASK AUDIENCE TO DEFINE METADATA

About Metadata Sets Encoding standards/schema Metadata set = rules Encoding schema = representation There are many different metadata standards that cover varied facets of metadata function. However, two major thematic divisions are apparent. Specifically, metadata divides into those standards that may be used commonly for all resources, and metadata standards specific to a particular discipline, sector or domain.   A common metadata schema is a ‘core’ set (i.e. schema) of metadata elements that can be applied to all resources because they answer common functions for metadata to perform. Common metadata functions to consider are: resource discovery, administration, recordkeeping, preservation, rights management, and structural / technical. The common schema may then be supplemented with additional domain-specific schemas of metadata. The add-on nature of schemas has resulted in metadata being described as ‘modular’. Domain-specific metadata refers to metadata that is necessary only for a certain field or discipline. For instance, statistical resources would use both a common schema like Dublin Core (DC) as its ‘core’ metadata set, and a statistical schema to add helpful and relevant elements for searching and retrieving information, such as: “statistical population”, “geographical coverage”, “observation unit”, etc. The various metadata schemas are then collected and organized into schema registries to enable organizations to discover the best fitting schema for their use, and to facilitate standardization and interoperability globally. Finally, as metadata labels or envelopes the information object, encoding standards label or envelope the metadata. Encoding standards are not metadata standards, but they affect how metadata is marked-up or coded, transmitted, accessed, and used.

Metadata Sets AACR2 Dublin Core Visual Resources Association (VRA) Metadata Object Description Schema (MODS) Text Encoding Initiative (TEI) Encoded Archival Description (EAD)

Encoding Standards/Schema HTML MARC Metadata Encoding and Transmission Standard (METS) Resource Description Framework (RDF) XML Z39.50

Choosing Sets and Schema: Interoperability Why is interoperability important? How is it achieved? Crosswalks/mapping Standardization Schema Controlled vocabulary Open Archives Initiative (OAI) Common elements harvested and made searchable from one interface Very basic level of description, working to develop it to make it better Promotes interoperability and allows cross collection searching of metadata. Based on 3 DC elements – title, creator, and description. An OAI harvester can “grab” these three elements from several digitization projects. Then the metadata from all the harvested projects can be made searchable using an OAI server – providing one search interface to all the collections’ metadata. OAI is still being developed. Currently it only supports these three elements and as such, is certainly not as detailed and robust as it could be.
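The harvesting step described in these notes can be sketched in a few lines of Python. This is a minimal sketch, not a harvester: the sample record is invented, it is parsed from a string rather than fetched over the network, and the function name harvest_fields is hypothetical. The two namespace URIs are the ones used for unqualified Dublin Core records in OAI responses.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Invented sample record; a real harvester would receive this from a repository.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Main Street, looking north</dc:title>
  <dc:creator>Unknown photographer</dc:creator>
  <dc:description>Black-and-white street scene.</dc:description>
  <dc:date>1910</dc:date>
</oai_dc:dc>
"""


def harvest_fields(record_xml, wanted=("title", "creator", "description")):
    """Pull the wanted Dublin Core elements out of one record.

    Every element is treated as repeatable, so each value is a list.
    """
    root = ET.fromstring(record_xml.strip())
    return {name: [el.text for el in root.findall(DC + name)] for name in wanted}


fields = harvest_fields(SAMPLE_RECORD)
print(fields["title"])
```

Running the same function over records from several projects and indexing the results together is what gives users one search interface across collections.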

Choosing an Encoding Schema The more digitized objects you have; the more complex they are; the more data sharing you do; the more important it will be to utilize an encoding schema XML is the most prevalent encoding schema All metadata schema have XML based encoding schema already available

Factors in Metadata Decisions for Digitization Projects Audience Workflow and Timelines Preservation Interoperability Number of and complexity of digitized objects Audience (discuss at length in Planning session) -access points, reading level, web interface Workflow and Timelines -creating and managing metadata can be time consuming; plan for how much time and money you need to do it; cost to scan is a few dollars; cost to manage, a recent estimate, may be as high as $70 per object Technical Skills -software requirements, hardware requirements, technical expertise available Preservation -need much more complete metadata if you are digitizing with preservation in mind Interoperability -collaborative projects will require metadata that will work together -ultimately, we want to make these images available in a more centralized way. Adhering to schema and standards facilitates this.

What Do You Want To Do? Digitize for access only? Descriptive Some administrative Digitize for preservation? Administrative Technical Eventually preservation

What Materials Are You Digitizing? The more complex the material, the more complex your metadata Structural metadata becomes vital For example….

Complex Digital Objects Original = 150 page book with 7 chapters Digitization results in 4 versions of the same content 150 master TIFF images 150 JPEG access images 150 JPEG thumbnail images 7 ASCII text transcripts (one per chapter) Files to manage = 457
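The file count on this slide can be checked with a short Python calculation. The dictionary keys simply restate the four versions listed above.

```python
# Files produced by digitizing one 150-page, 7-chapter book.
versions = {
    "master TIFF images": 150,
    "JPEG access images": 150,
    "JPEG thumbnail images": 150,
    "ASCII text transcripts": 7,  # one per chapter
}

total_files = sum(versions.values())
print(total_files)  # 457
```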

Complex Digital Objects and Structure Which images belong in which chapter? Which digital version is which? Where is chapter 3 in each version? There is technical metadata for each digital version AND each digital file. How do we relate the correct metadata to the correct version/file? Probably the most practical way to easily show the structure of many types of digitized resources is through the implementation of simple, yet well thought out file structures and naming conventions. However, this process does not result in metadata that records the relationships between several related files.
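A naming convention of the kind mentioned here can be sketched in Python. The pattern below (book id, chapter, page, version, extension, as in bk001_ch03_p045_master.tif) is a hypothetical convention invented for this example, as are the function names. Note that the structure is only implied by the names; nothing here records the relationships as metadata.

```python
import re
from collections import defaultdict

# Hypothetical convention: <book>_ch<2-digit chapter>_p<3-digit page>_<version>.<ext>
PATTERN = re.compile(
    r"^(?P<book>[a-z0-9]+)_ch(?P<chapter>\d{2})_p(?P<page>\d{3})"
    r"_(?P<version>master|access|thumb)\.(?P<ext>tif|jpg)$"
)


def parse_name(filename):
    """Split a filename into its parts, or fail loudly if it breaks the convention."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"name does not follow the convention: {filename}")
    info = match.groupdict()
    info["chapter"] = int(info["chapter"])
    info["page"] = int(info["page"])
    return info


def files_for_chapter(filenames, chapter):
    """Group the files of one chapter by digital version."""
    grouped = defaultdict(list)
    for name in sorted(filenames):
        info = parse_name(name)
        if info["chapter"] == chapter:
            grouped[info["version"]].append(name)
    return dict(grouped)


names = [
    "bk001_ch03_p045_master.tif",
    "bk001_ch03_p045_access.jpg",
    "bk001_ch03_p046_master.tif",
    "bk001_ch04_p060_master.tif",
]
print(files_for_chapter(names, 3))
```

Rejecting names that break the pattern is a deliberate choice: a convention only answers "where is chapter 3 in each version?" if every file follows it.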

Digitization and Metadata Descriptive metadata for access and administration Technical metadata for preservation Structural metadata for control over complex digitized objects Preservation metadata for management within a digital archive

Descriptive Metadata Information users will have to gain access to the digitized material Should facilitate access to the original source material whenever possible Access via a web interface search engine User friendly Standardized Well written Read

Common Descriptive Metadata Sets for Digitization Projects Visual Resources Association (VRA) Metadata Object Description Schema (MODS) Encoded Archival Description Text Encoding Initiative Dublin Core MARC VRA: VRA metadata is based on Dublin Core, but was created specifically for visual resources and the images that document them. It describes both the “work” (original) and the “image” (representation of the work) and allows you to link the two. As with DC, no elements are mandatory, all are repeatable, and controlled vocabularies are recommended. MODS: Developed by LC, particularly for libraries. It is currently listed as draft. It is a subset of MARC fields that uses language-based tags instead of the numeric MARC tags, and it is encoded using XML. MODS is richer than Dublin Core, yet simpler than MARC. You can convert MARC records to MODS records or create original records using MODS. One drawback is that you cannot easily convert from MODS to MARC, since a MODS field often incorporates more than one MARC field; there is no one-to-one mapping back from MODS to MARC.
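The one-way nature of the MARC to MODS conversion can be illustrated with a toy mapping in Python. The table below is a heavily simplified assumption for illustration (five MARC tags and their top-level MODS elements), not the full Library of Congress crosswalk, and the function name is hypothetical.

```python
# Simplified, illustrative mapping from MARC tags to top-level MODS elements.
MARC_TO_MODS = {
    "100": "name",       # main entry, personal name
    "700": "name",       # added entry, personal name
    "245": "titleInfo",  # title statement
    "650": "subject",    # topical subject
    "651": "subject",    # geographic subject
}


def mods_to_marc_candidates(mods_element):
    """List every MARC tag that could have produced the given MODS element."""
    return sorted(tag for tag, element in MARC_TO_MODS.items() if element == mods_element)


print(mods_to_marc_candidates("titleInfo"))  # ['245']
print(mods_to_marc_candidates("subject"))    # ['650', '651']
```

Going from MARC to MODS is a simple lookup, but going back returns more than one candidate tag for some elements, which is why a round trip cannot be fully automatic.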

Choosing a Set Should we use MARC? Should we use something else? Integrated into existing work Rules for creation already exist Less technical infrastructure necessary Complex – more training Time consuming Should we use something else? Collaborating? Interoperability concerns? Staff expertise Size of project Exhibit and web access

Choosing a Schema Can we use both? MARC for collection level Metadata for item level MARC for all Crosswalked to web accessible database Database for all Crosswalked to MARC

Implementation What informational elements do you need? List them, making sure to think through web design, audience and access issues What descriptive schema will you use? MARC Dublin Core VRA MODS A review of how descriptive metadata will be created and maintained. Note that converting to XML is not a requirement, but a best practice. And, if metadata is stored in a database, conversion to XML is just a technical issue that can be confronted in the future as necessary. We’ll talk more about XML later

Implementation Build database or implement content management system for metadata storage Map the fields to the schema you have chosen Document the mapping Create Style Guide for your project Staff creates the metadata manually according to Style Guide and established work processes Metadata is reviewed for quality
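The "map the fields" and "document the mapping" steps can be sketched in Python. The local field names (photo_title, photographer, and so on) are invented for the example, and Dublin Core is assumed as the chosen schema; a real project would substitute its own database columns.

```python
# Hypothetical local database fields mapped to Dublin Core element names.
# Keeping the mapping in one table also serves as its documentation.
FIELD_TO_DC = {
    "photo_title": "title",
    "photographer": "creator",
    "caption": "description",
    "date_taken": "date",
    "accession_no": "identifier",
}


def crosswalk(record, mapping=FIELD_TO_DC):
    """Convert one local record to Dublin Core.

    Returns the mapped record plus the list of local fields that have no
    mapping, so that gaps are visible during quality review.
    """
    mapped, unmapped = {}, []
    for field, value in record.items():
        element = mapping.get(field)
        if element is None:
            unmapped.append(field)
        elif value:
            mapped.setdefault(element, []).append(value)
    return mapped, unmapped


local_record = {
    "photo_title": "Main Street, looking north",
    "photographer": "",
    "date_taken": "1910",
    "shelf_location": "Box 12",
}
print(crosswalk(local_record))
```

Reporting unmapped fields instead of silently dropping them gives the review step something concrete to check.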

Implementation Metadata is stored and made web accessible XML (if supported) Back-ups, “master” metadata record, and/or web access

Dublin Core Title Creator Subject/Keywords Description Publisher Contributor Date Audience Resource Type Format Resource Identifier Source Language Relation Coverage Rights Management 15 core elements (Audience, listed above, is a later DCMI addition rather than one of the original 15) DC began as a metadata effort for web documents. Elements were to be embedded in the header of an HTML document. It was designed to be very easy and flexible, for use with high-volume materials, and easy to implement without trained catalogers. Although the use of DC within web documents hasn’t really taken off, the standard does have many user communities implementing it, including digitization projects.
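Embedding Dublin Core in the header of an HTML document, as these notes describe, can be sketched in Python. The DC.element naming of meta tags and the schema.DC link follow the common DCMI convention for HTML; the function name and the sample record are assumptions made for the example.

```python
from html import escape


def dc_meta_tags(record):
    """Render a Dublin Core record as HTML <meta> tags for a page header.

    record maps element names to lists of values, since elements are repeatable.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        for value in values:
            content = escape(value, quote=True)  # keep quotes and & from breaking the tag
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


sample = {
    "title": ["Farm & Ranch Photographs"],
    "creator": ["Unknown photographer"],
    "subject": ["Agriculture", "Ranch life"],
}
print(dc_meta_tags(sample))
```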

Characteristics of the Dublin Core All elements optional All elements repeatable All elements displayable in any order Extensible (a starting place for richer description) International Originally intended to include only descriptive information about the DIGITAL object. But, this intention is often overlooked and DC records often include some technical elements and some information about the original source. DC wasn’t created with digitization in mind. It doesn’t always facilitate the description of digitized resources in the best way possible. No good way to include and relate information about the original to information about the digitized object. Thus, implementation of the standard is often customized for digitization. We will see this when we look at the Western States Metadata Guide. Extensible – each of the 15 elements can be refined for richer, more thorough description.

Extensibility Refining mechanism for elements improve sharpness of description with qualifiers Means for extending element set complementary packages of other types of metadata (administrative, rights management, discipline-specific, etc) Two kinds of qualifiers Element Refinement -narrows the meaning of the element -definitions of a refining qualifier are to be publicly available Encoding schemes -established schemes that help interpret the metadata value -ex. Controlled vocabularies, established schemes for representing dates, geographic location, etc. -must be definitive and publicly available
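The two kinds of qualifiers can be modelled in a short Python sketch. The Statement class is a hypothetical structure for illustration, and the date check covers only the date-only forms of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD), not the forms that include a time.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Date-only W3CDTF forms: YYYY, YYYY-MM, or YYYY-MM-DD.
W3CDTF_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


@dataclass
class Statement:
    element: str                      # the Dublin Core element, e.g. "date"
    value: str
    refinement: Optional[str] = None  # element refinement: narrows the meaning
    scheme: Optional[str] = None      # encoding scheme: says how to read the value

    def is_valid(self):
        """Check the value against its encoding scheme, where this sketch knows one."""
        if self.scheme == "W3CDTF":
            return bool(W3CDTF_DATE.match(self.value))
        return True  # schemes not modelled here are accepted unchecked


good = Statement("date", "2003-10-28", refinement="created", scheme="W3CDTF")
bad = Statement("date", "Oct. 28, 2003", refinement="created", scheme="W3CDTF")
print(good.is_valid(), bad.is_valid())  # True False
```

The refinement ("created") tells a reader which date this is; the scheme ("W3CDTF") tells software how to parse it.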

Technical Metadata Information about a file that facilitates management and preservation of the file Technical information about: Master file (TIFF) Scanning specifications (resolution, bit depth, etc) Derivative Storage – compression Instructor: Pass around single copy of NISO draft standard for audience to look at. There are some resources available that have been created by particular organizations and projects, including Harvard, Library of Congress, California Digital Library and others. NISO probably provides the most complete listing of technical metadata requirements. A data dictionary, available from the Library of Congress, is in your packets to give you an idea of what kind of information makes up technical metadata. It is not the same as the NISO draft standard.

NISO Metadata Purpose: To define a standard set of metadata elements for digital images Facilitate interoperability Support long term management of and continuing access to digital images Management refers to the tasks and operations needed to support image quality assessment and image data processing throughout the image life cycle. Intended to facilitate the development of applications to validate, manage, migrate and process images of enduring value Refers frequently to TIFFs. Flexible and platform independent format Specification is widely available Header includes rich set of technical information important for long term retention The metadata set, however, is for all file formats, not just TIFF

Tagged Image File Format – Background and Metadata TIFF is a specification for a file format Spec includes a “directory” or “header” section which consists of several metadata fields A TIFF can consist of several images Directory/Header information is unique for each image
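The directory structure described on this slide can be walked with a few lines of Python. This is a minimal sketch: it reads only four well-known tags stored as single inline SHORT values, and the sample bytes built at the end are a synthetic two-directory header with no pixel data, not a viewable image. The helper names are hypothetical.

```python
import struct

TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample", 259: "Compression"}


def read_directories(data):
    """Return one dict of technical metadata per image directory (IFD) in a TIFF."""
    byte_order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    directories = []
    while offset:  # an offset of 0 marks the end of the directory chain
        (count,) = struct.unpack_from(byte_order + "H", data, offset)
        fields = {}
        for i in range(count):
            tag, ftype, n, raw = struct.unpack_from(
                byte_order + "HHI4s", data, offset + 2 + 12 * i
            )
            if tag in TAG_NAMES and ftype == 3 and n == 1:  # one SHORT, stored inline
                fields[TAG_NAMES[tag]] = struct.unpack(byte_order + "H", raw[:2])[0]
        directories.append(fields)
        (offset,) = struct.unpack_from(byte_order + "I", data, offset + 2 + 12 * count)
    return directories


def _entry(tag, value):
    # 12-byte directory entry: tag, type 3 (SHORT), count 1, value, padding.
    return struct.pack("<HHIHH", tag, 3, 1, value, 0)


def _ifd(entries, next_offset):
    return struct.pack("<H", len(entries)) + b"".join(entries) + struct.pack("<I", next_offset)


# Synthetic sample: a master-sized image followed by a thumbnail-sized one.
first = [_entry(256, 2400), _entry(257, 3600)]
second = [_entry(256, 150), _entry(257, 225)]
second_offset = 8 + 2 + 12 * len(first) + 4
sample = struct.pack("<2sHI", b"II", 42, 8) + _ifd(first, second_offset) + _ifd(second, 0)

print(read_directories(sample))
```

Each directory carries its own width and length, which is the sense in which header information is unique for each image in a multi-image TIFF.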

Encoding: METS Metadata Encoding and Transmission Standard Product of Making of America project Digital Library Federation Initiative Provides an XML schema for encoding metadata necessary for: management of digital library objects exchange of those objects (OAIS) Brings all the metadata together PASS AROUND SAMPLE METS encoded record

Encoding: METS Five Sections of a METS document Descriptive Administrative File Group Structural Map Behavior A METS document has five sections – read them. We will briefly look at each section
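The five sections can be seen as an empty skeleton built in Python. This is a sketch of the outline only: the result is not schema-valid METS (required attributes and content are omitted), and the function name and object identifier are invented for the example. The namespace URI and the element names dmdSec, amdSec, fileSec, structMap and behaviorSec are those of the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# Slide name -> METS element name.
SECTIONS = {
    "Descriptive": "dmdSec",
    "Administrative": "amdSec",
    "File Group": "fileSec",
    "Structural Map": "structMap",
    "Behavior": "behaviorSec",
}


def mets_skeleton(object_id):
    """Build an empty METS outline containing the five sections, in order."""
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    for element_name in SECTIONS.values():
        ET.SubElement(root, f"{{{METS_NS}}}{element_name}")
    return root


skeleton = mets_skeleton("bk001")  # "bk001" is a made-up identifier
print(ET.tostring(skeleton, encoding="unicode"))
```

In a real record, the descriptive section would wrap or point to something like a Dublin Core or MODS record, and the structural map would tie the files in the file section to chapters and pages.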

Preservation Metadata If you are digitizing with preservation in mind, ALL metadata is preservation oriented Metadata must be of the highest quality that is possible Incorporate the creation and management of metadata into your project at the planning stage Preservation strategies for digital materials are still unclear. But, we know they will depend on metadata; and the higher quality the metadata, the better off we will be.

Preservation Metadata Designed to facilitate the process of preservation and management in a digital repository Generally implemented at the time a digital resource is moved to a digital archive Several schemas under development for particular operating environments and/or programs INSTRUCTOR: PASS OUT OCLC DIGITAL ARCHIVE METADATA so the audience gets a “sense” of it. All the preservation metadata schemes we will discuss today are based on the Open Archival Information System Reference Model (OAIS). OAIS is a high-level model, developed by the scientific community, that models the functions of an archive. The model itself, at first glance, seems relatively technical, but archivists will notice that it is really describing traditional archival functions using new language. For example, using this model, you don’t “accession” a record; you “ingest” a record. The OAIS provides information on what kind of metadata will be necessary to manage a digital object when it is within a digital archive. It has also been the basis for several implementations of digital archives, and these implementations have developed their own, unique preservation metadata schemas. So, preservation metadata is metadata that documents and facilitates the management and preservation of a digital object within a digital archive. We are going to briefly look at four preservation metadata schemas. The goal is to make you aware that these exist, so that if you ever find yourself implementing or using a digital archive, you have some very basic knowledge of these schemes, where they came from, etc. It is safe to say that all these schemas are really working drafts, although they don’t specifically say that. They have not been tested yet, so I do not consider them to be in their final form.

Preservation Metadata Sets CEDARS – Consortium of University Research Libraries, Exemplars in Digital Archives project www.leeds.ac.uk/cedars/guideto/metadata/ NLA -- National Library of Australia www.nla.gov.au/preserve/pmeta.html NEDLIB – Networked European Deposit Library www.kb.nl/coop/nedlib/results/D4.2/D4.2.htm OCLC Digital Archive www.oclc.org/digitalarchive/about/works/metadata/ Based on the OAIS model. High level elements, with no sub-elements. Although they seem to assume that the elements can and will be narrowed into sub-elements as necessary. Applicable at any level of granularity (collection, object, file) Contains technical, administrative, descriptive and legal elements within the set. There will be some overlap within this set with your DC (or descriptive metadata) and your technical and structural metadata.

Preservation Metadata Inference that there is a core of metadata necessary for preservation regardless of the preservation strategy More work needs to be done to identify the particular elements necessary for particular preservation strategies Again, they do need to be tested and refined. This won’t happen for a little while, since digital objects that are being placed in digital archives have, for the most part, not reached a point where they require preservation action.

Metadata Wrap up New tools for new resources Metadata schema = rules Encoding schema = mark up and storage

Descriptive Metadata Use an established metadata schema Create a project style guide to facilitate standardized, high quality creation Store in content management software or database to provide web access Document the database design and map fields to DC (or other schema) within the documentation Encode and back up using XML, if technically feasible

Technical and Structural Use TIFF Document the scanning software used, as TIFF has many different “flavors” Use as much of the NISO draft standard as possible – watch for implementation developments, or… Use descriptive schema to collect technical information Structural metadata (METS) to manage numerous, complex digital objects, or… Documented file naming and structures

Planning Plan for the costs associated with good metadata Creation and research Technical resources (staff, hardware, software, backups) Get a team of appropriate people together Identify goals, elements, and research appropriate schema and encoding Style Guide for descriptive metadata Create the highest quality, most thorough metadata possible in your situation Document mappings

Some Conclusions Metadata is a work in progress at both the community level and the project level Use standards Technical metadata will be easier to implement in time Structural metadata is vital for large projects with complex digital objects Preservation metadata isn’t standardized yet