Besser--JISC Image Metadata 6/20/02 4 Serious Longevity Problems What we know from prior widespread digital file formats Images separating from their metadata Inaccessibility of software needed to view an image Inability to even decode the file format of an image
Besser--JISC Image Metadata 6/20/02 5 Traditional Digital Library Model DL user search & presentation
Besser--JISC Image Metadata 6/20/02 6 Ideal Digital Library Model DL user search & presentation
Besser--JISC Image Metadata 6/20/02 7 For Interoperability Digital Libraries Need Standards Descriptive Metadata for consistent description Discovery Metadata for finding Administrative Metadata for viewing and maintaining Structural Metadata for navigation ... Terms & Conditions Metadata for controlling access...
Besser--JISC Image Metadata 6/20/02 8 Why are Standards and Metadata consensus important? Managing digital files over time Longevity Interoperability Veracity Recording in a consistent manner Will give vendors incentive to create applications that support this
Besser--JISC Image Metadata 6/20/02 9 Philosophical Metadata Decisions- _ Warwick vs MARC _ Where to put the metadata
Besser--JISC Image Metadata 6/20/02 10 Containers and Packages of Metadata Warwick, not MARC _ modular _ overlapping _ extensible _ community-based _ designed for a networked world to aid commonality btwn communities while still providing full functionality within each community
Besser--JISC Image Metadata 6/20/02 11 Some different schemes where Metadata is kept _ embedded within the object (HTML tags) _ in a separate related DB maintained by same organization (OPAC, MOA II) _ in a separate DB maintained by a separate organization (Books in Print, ratings systems) _ derived on-the-fly from a different scheme (MARC-to-DC)
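As a rough illustration of the first scheme (metadata embedded within the object), the sketch below pulls `<meta>` tags out of an HTML page using Python's standard-library `html.parser`. The sample page and its `DC.title` / `DC.creator` names are assumptions invented for this example; they do not come from the slides.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags embedded in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta name="..." content="..."> tags are of interest here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


def extract_embedded_metadata(html_text):
    """Return a dict of the metadata embedded in the page's <meta> tags."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    return parser.metadata


# Hypothetical page, made up for illustration.
SAMPLE_PAGE = """
<html><head>
<meta name="DC.title" content="Photo album, 1906">
<meta name="DC.creator" content="Unknown photographer">
</head><body></body></html>
"""

embedded = extract_embedded_metadata(SAMPLE_PAGE)
```

The other three schemes keep the record outside the object, so the same lookup would become a database query or an on-the-fly conversion instead of a parse of the object itself.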
Besser--JISC Image Metadata 6/20/02 13 Dublin Core--further work _ Warwick Framework – metadata packages for extensible functions – laid the groundwork for RDF _ Canberra Qualifiers – refining the semantics of the element set to provide more precise info – SUBELEMENT, SCHEME, LANG _ Granularity – no hierarchical relationships w/i a given DC record; only one record per discrete object (collection or item-level), and relationship field plus qualifier links them
The Research Process and Functional Categories of Metadata _ Discovery _ Retrieval _ Collation _ Analysis _ Re-presentation
Besser--JISC Image Metadata 6/20/02 15 Structural & Administrative Metadata- Making of America II (MOA2) Metadata Encoding & Transmission Standard (METS)
Besser--JISC Image Metadata 6/20/02 16 MOA II Classes of Objects Continuous Tone Photos Photo Albums Diaries, journals, letterpress books Ledgers Correspondence
Besser--JISC Image Metadata 6/20/02 17 MOA II Metadata _ Administrative Metadata – for enhancing resource management _ Structural Metadata – for reflecting internal hierarchies and relationships btwn parts _ Raw/Seared/Cooked
Besser--JISC Image Metadata 6/20/02 19 MOA II Best practices Use/Users/Collection: Benchmarking Masters vs. Derivatives Scanning- Administrative Metadata- Structural Metadata-
Besser--JISC Image Metadata 6/20/02 20 Scanning Best Practices
_ Think about users (and potential users), uses, and type of material/collection
_ Scan at the highest quality that does not exceed the likely potential users/uses/material
_ Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
_ Many documents which appear to be bitonal actually are better represented with greyscale scans
_ Include color bar and ruler in the scan
_ Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
_ Don’t use lossy compression
_ Store in a common (standardized) file format
_ Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
Besser--JISC Image Metadata 6/20/02 21 Why Scale is important
Besser--JISC Image Metadata 6/20/02 22 Administrative Metadata to uniquely identify a digital resource and manage it over time _ Information about where the various pieces/versions of the object reside _ Information to view the digital object _ Information about the scanning process
Besser--JISC Image Metadata 6/20/02 23 Structural Metadata: that which is relevant to presentation of the digital object to the user _ metadata defining the “object”: a book, a diary, a photo album _ metadata defining the “sub-objects”: pages (physical) or chapters and subheads (intellectual)
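One way to picture the object / sub-object distinction is as a small tree in which intellectual divisions (chapters, subheads) group physical pages. The sketch below is only an illustration under that assumption; the `StructuralNode` class, its `kind` values, and the sample diary are hypothetical and are not the MOA II or METS encoding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuralNode:
    """One node in an object's structure: the object itself or a sub-object."""
    label: str
    kind: str  # hypothetical values: "object", "intellectual", "physical"
    children: List["StructuralNode"] = field(default_factory=list)


def physical_pages(node):
    """Return the labels of all physical pages under a node, in reading order."""
    if node.kind == "physical":
        return [node.label]
    pages = []
    for child in node.children:
        pages.extend(physical_pages(child))
    return pages


# Hypothetical diary: two intellectual divisions grouping three physical pages.
diary = StructuralNode("Diary", "object", [
    StructuralNode("January entries", "intellectual", [
        StructuralNode("page 1", "physical"),
        StructuralNode("page 2", "physical"),
    ]),
    StructuralNode("February entries", "intellectual", [
        StructuralNode("page 3", "physical"),
    ]),
])
```

A viewer could use a traversal like `physical_pages(diary)` for page-turning and the intellectual nodes for a table of contents.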
Besser--JISC Image Metadata 6/20/02 24 Other Types of Metadata- _ Actors Metadata _ Longevity _ Identification/Provenance _ Rights Management
Besser--JISC Image Metadata 6/20/02 25 Reference Models for Digital Libraries: Actors and Roles DELOS/NSF Working Group http://www.delos-nsf.actorswg.cdlib.org/
Besser--JISC Image Metadata 6/20/02 27 Multimedia & Collaborative Authorship imply _ Not only: –Authors –Editors –Publishers _ But also creators of –Text –Illustrations –Composers –Musicians...
Besser--JISC Image Metadata 6/20/02 28 And goes beyond conventional authors _ Others that are part of digital library process –Users –Catalogers –Reference librarians _ Even other groups/entities –Software agents –Mediators –Special rights holders...
Besser--JISC Image Metadata 6/20/02 29 Borbinha’s “naive tentative sketch” of the problem... [Diagram labels: User, Registered, Anonymous, Librarian, Agent, Creator, Editor, Distributor, Preservation, Publication, Licensing, Acquisition, Registration, Dissemination, Search, Digital Library, Access]
Besser--JISC Image Metadata 6/20/02 30 Benefits for _ Linking metadata to authority records _ Rights management _ Privacy protection
Besser--JISC Image Metadata 6/20/02 31 Deliverables _ Workshop proceedings: proceedings with invited contributions and papers selected from a call, intended to be a reference source for the current state of the art. _ White paper: –Definition and introduction to the problem. –Description and analysis of the requirements. –A proposal to the community for a reference model, focusing on definitions of key concepts, terminology, classes of agents, services, relationships, etc. –Proposals for an international agenda for further technical and collaborative developments.
Besser--JISC Image Metadata 6/20/02 32 Core group
DELOS (Europe)
_ José Borbinha, National Library of Portugal (DELOS coordinator)
_ Michel Mabe, Elsevier Science, UK (Publishing industry)
_ Peter Mutschke, Social Science Information Centre, Germany (Software agents, Information Retrieval)
_ Hans-Jörg Lieder, Berlin State Library, Germany (LEAF project)
_ Gunnar Karlsen, University of Bergen, Norway (Archives)
WIPO – World Intellectual Property Organisation
_ Glenn Macstravic
NSF (USA)
_ John Kunze, University of California, USA (NSF coordinator)
_ Barbara Tillett, Library of Congress, USA (Libraries)
_ Becky Dean, OCLC, USA (Libraries services)
_ Angela Spinazze, CIMI/RLG, USA (Museums)
_ Howard Besser, University of California, USA (Multimedia and digital art production)
DCMI - Dublin Core Metadata Initiative
_ Warwick Cathro, National Library of Australia
Besser--JISC Image Metadata 6/20/02 33 Work plan
Phase 1: Starting (March - April 2002)
_ Tuning objectives, scope, and action plan
_ Identification of reference sources
_ Call for contributions to the workshop
Phase 2: Internal Discussion (May - June 2002)
_ Analysis of the problem
_ Draft paper
Phase 3: Public Discussion (July - October 2002)
_ Expose the draft paper. Promote open public discussion
_ Workshop in Portugal (July 3-5). Workshop report
_ Draft paper (second version)
Phase 4: Conclusions (November - December 2002)
_ Review of the work done...
_ Final report
Besser--JISC Image Metadata 6/20/02 34... Actors and Roles ???
Besser--JISC Image Metadata 6/20/02 35 Recent Digital Preservation Activities - The Problem Preservation Repositories Preservation Metadata Other Digital Preservation Activities Special concerns of Cult Heritage community
Besser--JISC Image Metadata 6/20/02 36 Serious Longevity Problems What we know from prior widespread digital file formats Previous formats required little ongoing intervention (remote storage facilities, Iron Mtn); digital formats require intense ongoing management The Short Life of Digital Info-
Besser--JISC Image Metadata 6/20/02 37 The Short Life of Digital Info: Digital Longevity Problems Disappearing Information The Viewing Problem The Scrambling Problem The Inter-relation Problem The Custodial Problem The Translation Problem
Besser--JISC Image Metadata 6/20/02 38 Older Longevity Projects http://sunsite.berkeley.edu/Longevity/ CPA Task Force Getty “Time & Bits” Conference & Follow-ups- Preservation experiments in US and Europe NEDLIB, CURL, Michigan Internet Archive Long Now
Besser--JISC Image Metadata 6/20/02 39 Preservation Repositories: Projects based on OAIS Model CEDARS NEDLIB Pandora CDL OCLC/RLG Working Group on Preservation Metadata, Attributes of a Trusted Digital Repository, August 2001-
Besser--JISC Image Metadata 6/20/02 40 Preservation Metadata OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 31 2001 OCLC/RLG Working Group on Preservation Metadata, A Recommendation for Content Information, October 2001
Besser--JISC Image Metadata 6/20/02 42 LC’s National Digital Information Infrastructure and Preservation Program _ Authorized Dec 2000 _ LC, Dept of Commerce, NARA, White House Office of Sci & Tech Policy _ with help from CLIR, NLM, NAL, OCLC, RLG _ Ongoing collab process _ Commissioned papers on preserving: the Web, periodicals, digital sound, E-Books, Digital TV, Digital Video
Besser--JISC Image Metadata 6/20/02 43 InterPARES International Research on Permanent Authentic Records in Electronic Systems _ Ongoing international archival world project examining how to make electronically-generated records last over time _ Developing the theoretical and methodological knowledge needed, then will formulate model policies, strategies, and standards _ Next year will be extended to include images and rich media
Besser--JISC Image Metadata 6/20/02 44 Electronic Resource Preservation and Access NETwork (ERPANET) _ Best practices and skills development for digital preservation of cultural heritage and scientific objects _ 3 year project launched Nov 2001; 1.2 million Euros
Besser--JISC Image Metadata 6/20/02 50 Complexity of Rich Media _ Works often have artistic nature (including video games) _ Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact) _ Too complex to save every one of these aspects for every type of material _ Importance of saving documentation
Besser--JISC Image Metadata 6/20/02 51 What can we do specific to Electronic Art? _ Works themselves may no longer even exist; in many cases, what we can save amounts to forensic evidence _ Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact) _ Too complex to save every one of these aspects for every type of material _ Importance of saving pieces, representations, and documentation _ Involve the artists to capture their intentions _ Importance of Standards _ Familiarize ourselves with recent conservation developments (Who Knows?, TechArcheology, Tate, IMAP)
Besser--JISC Image Metadata 6/20/02 52 Standards for encoding artists’ intentions (group efforts w/i Cult Heritage community) _ Artists Interviews Project, Netherlands Institute for Cultural Heritage 1998-1999, Modern Art: Who Cares (http://www.icn.nl/english/6.4.2.html) _ TechArcheology: A Symposium on Installation Preservation (SFMOMA) _ More recent SFMOMA/Tate collaborations _ IMAP _ Guggenheim’s Variable Media
Besser--JISC Image Metadata 6/20/02 53 Structural Metadata Standards for Encoding Multimedia- (no time for details) _ SMIL _ MPEG 4
Besser--JISC Image Metadata 6/20/02 54 Identification/Provenance (Images)- The number of variant forms of a work can be enormous Image Families A digital image frequently has many layers of parentage Information about the parentage that can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation") how to deal with different versions derived from the same scan or different encoding schemes Vocabulary Standards to express this
Besser--JISC Image Metadata 6/20/02 55 The number of variant forms of a work can be enormous different views of the same object different scans of the same photo different resolutions different compression schemes different compression ratios different file storage formats different details of the same image ...
Besser--JISC Image Metadata 6/20/02 57 Identification/Provenance how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF) Vocabulary Standards to express this – VRA Surrogate Categories – CIMI's “Image Elements”
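The idea of an image family with layers of parentage can be sketched as records that each point to the version they were derived from. This is a minimal illustration, not the VRA or CIMI vocabulary: the `ImageVersion` fields, the identifiers, and the particular derivation chain are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ImageVersion:
    identifier: str
    file_format: str                    # e.g. "TIFF", "JFIF"
    role: str                           # e.g. "master", "hi-res", "browse"
    derived_from: Optional[str] = None  # identifier of the parent version


def lineage(identifier, family):
    """Walk from one version back toward the original scan, nearest parent first."""
    chain = []
    seen = set()
    current = family[identifier].derived_from
    # The `seen` set guards against accidental cycles in the parent links.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = family[current].derived_from
    return chain


# Hypothetical family: every file below descends from one master scan.
family = {v.identifier: v for v in [
    ImageVersion("scan-001", "TIFF", "master"),
    ImageVersion("scan-001-hi", "TIFF", "hi-res", derived_from="scan-001"),
    ImageVersion("scan-001-med", "JFIF", "medium res", derived_from="scan-001-hi"),
    ImageVersion("scan-001-browse", "JFIF", "browse", derived_from="scan-001-med"),
]}
```

Recording the parent link is what lets a user judge the quality and veracity of a browse image by tracing it back to its master.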
Besser--JISC Image Metadata 6/20/02 58 NISO/DLF Image Metadata Workshop (Z39.87-2002 draft) Possible Goals Metadata fields Rules for Field Contents (authority control) Core set of necessary fields Syntax for expressing fields and contents (headers)
Besser--JISC Image Metadata 6/20/02 59 Image Metadata Focus on Metadata that may prove helpful for management use preservation ...
Besser--JISC Image Metadata 6/20/02 62 Image Metadata Break-out Group Considerations Providing standard practices for a wide body of metadata vs. practicality and costs of gathering that metadata Distinction between documentation of the physical production of the digital image and the more intellectual links to what that image represents What can we all agree upon? How should we handle disagreements?
Besser--JISC Image Metadata 6/20/02 63 Image Metadata Break-out Groups: Work Done Characteristics and Features of Images Image Production and Reformatting Features Image Identification and Integrity
Besser--JISC Image Metadata 6/20/02 64 Image Metadata Characteristics and Features of Images Format issues Resolution issues Color issues Compression stuff Other characteristics Characteristics passed on to other groups Guiding Principles
Besser--JISC Image Metadata 6/20/02 65 Image Metadata Characteristics: Format issues MIME type (M) File Format (M) File Size (O) Class ID/ ‘Genotype’ (Desirable)
Besser--JISC Image Metadata 6/20/02 70 Image Metadata Characteristics: Passed to other groups Date & Time (to both) Image Enhancement (to both) Audit Trail (to both) Dimensions of original object (‘Descriptive’) Reflective/Transmission (to Production) Lamp/Sensor characteristics (to Production) Identification (to Identification)
Besser--JISC Image Metadata 6/20/02 71 Image Metadata Characteristics: Guiding principles Metadata that is not directly ‘actionable’ should not necessarily be in the file header Metadata should have a long–term utility We should specifically deal with the image at hand We should be aware of the cost of omission Those elements described as (M) are only mandatory if not already represented within the file format in question.
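The last guiding principle, combined with the (M)/(O) levels on the "Format issues" slide, can be sketched as a small check. The field keys, the "D" code for "Desirable", and the `represented_in_format` parameter are assumptions introduced for this illustration; they are not part of the Z39.87 draft.

```python
# Requirement levels as listed on the "Format issues" slide:
# M = mandatory, O = optional, D = desirable (hypothetical code for this sketch).
FORMAT_FIELDS = {
    "mime_type": "M",
    "file_format": "M",
    "file_size": "O",
    "class_id": "D",
}


def missing_mandatory(record, represented_in_format=()):
    """Return the mandatory fields that are absent from `record`.

    Fields named in `represented_in_format` are treated as already carried by
    the file format itself, so they are not required in the external record.
    """
    missing = []
    for name, level in FORMAT_FIELDS.items():
        if level != "M":
            continue
        if name in represented_in_format:
            continue
        if not record.get(name):
            missing.append(name)
    return missing
```

For example, `missing_mandatory({"mime_type": "image/tiff"})` reports `file_format` as missing, while passing `represented_in_format={"file_format"}` waives it.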
Besser--JISC Image Metadata 6/20/02 72 Image Metadata Image Production and Reformatting Features Need to document the intent of the reformatting Document what you do Use full-text field to include all the attributes of the scanning process Include a target with the digital image
Besser--JISC Image Metadata 6/20/02 73 Image Metadata Image Identification and Integrity Administrative and Descriptive metadata are hard to think about separately The identification and integrity problems for images are often the same as other digital files, but the solutions are not How to handle future situations when home digital camera images eventually enter archives? Often bad image metadata is good Verification needs to deal with the digital object, not a particular file format representation of that object Need a vocabulary for expressing generational relationships (Image Families) “When” is easy compared to “Who”, “Why”, and “What”
Besser--JISC Image Metadata 6/20/02 74 NISO/DLF Image Metadata Workshop Where do we go from here? More work in this area? Extend this kind of work to other areas? Creating best practices? Getting buy-in from larger communities? What new groups do we need to reach out to?
Besser--JISC Image Metadata 6/20/02 75 Other Metadata _ Description of depiction/surrogate (What VRA calls its "Surrogate Categories") _ Description of original object _ Rights and Reproduction Information _ Location Information
Besser--JISC Image Metadata 6/20/02 76 Data Structures: The VRA Core 28 elements specifically for visual resource collections Work Description Categories- Visual Document Description Categories- http://www.oberlin.edu/~art/vra/dsc.html
Besser--JISC Image Metadata 6/20/02 77 VRA Core: Work Description Categories _ Work type _ Title _ Measurements _ Material _ Technique _ Creator _ Role _ Date _ Repository name _ Repository place _ Repository number _ Current site _ Original site _ Style/period/group/movement _ Nationality/culture _ Subject _ Related work _ Relationship type _ Notes
Besser--JISC Image Metadata 6/20/02 80 LCSH very general
Besser--JISC Image Metadata 6/20/02 81 Thesaurus for Graphic Materials designed for subject indexing of pictorial materials, particularly large general collections of historical images for cataloging and retrieval good for general audiences and broad approaches to the material TGM-I: Subject Terms & TGM-II: Genre and Physical Characteristic Terms http://lcweb.loc.gov/rr/print/tgm/toc.html
Besser--JISC Image Metadata 6/20/02 82 AAT 120,000 terms for describing objects, textual materials, images, architecture, and material culture from antiquity to present large and complex http://www.getty.edu/gri/vocabularies/
Besser--JISC Image Metadata 6/20/02 84 Thesaurus of Geographic Names over 1 million records hierarchical and global throughout history most records include coordinates and descriptive notes
Besser--JISC Image Metadata 6/20/02 85 Metadata for Digital Commerce DOI -
Besser--JISC Image Metadata 6/20/02 86 INDECS formal structure for describing and uniquely identifying intellectual property itself, the people and businesses involved in its trading, and the agreements which they make about it (primarily for publishing, music, and visual arts) will develop high-level specifications for the services that will be required to implement a global IP trading system based on this generic data model focus is on encoding rights at a high level, not on resource discovery likely to involve metadata schema registration and directory to allow interoperation of personal identifiers for rightsholders and users supported by EEC DG-13 First meeting July 1999 http://www.indecs.org/
Besser--JISC Image Metadata 6/20/02 88 Crosswalks mapping btwn differing metadata structures eliminate the need for monolithic, universally adopted standards focus on flexibility and interoperability RDF-based metadata registries
Besser--JISC Image Metadata 6/20/02 89 Crosswalk Example
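A crosswalk can be sketched as a lookup table from one scheme's fields to another's, applied on the fly. The mapping below is a heavily simplified, illustrative MARC-to-Dublin-Core table assumed for this example; it is not a complete or authoritative crosswalk, and the sample record is invented.

```python
# Illustrative, heavily simplified MARC-to-Dublin-Core mapping (an assumption
# for this sketch; real crosswalks cover many more fields and subfields).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
}


def crosswalk(marc_record, mapping=MARC_TO_DC):
    """Derive a Dublin Core-style record from MARC-style fields.

    Returns the derived record plus the list of source tags that had no
    mapping, so that information lost in translation stays visible.
    """
    dc_record = {}
    unmapped = []
    for tag, values in marc_record.items():
        element = mapping.get(tag)
        if element is None:
            unmapped.append(tag)
            continue
        dc_record.setdefault(element, []).extend(values)
    return dc_record, unmapped


# Hypothetical source record keyed by MARC tag.
marc = {
    "245": ["Views of the harbor"],
    "100": ["Doe, Jane"],
    "650": ["Harbors", "Photographs"],
    "300": ["1 album (40 photographs)"],
}
dc, unmapped = crosswalk(marc)
```

Returning the unmapped tags alongside the result reflects a common trade-off: mapping from a richer scheme to a simpler one usually drops detail.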
Besser--JISC Image Metadata 6/20/02 90 Resource Description Framework (RDF, spec released 2/99) _ W3C Metadata activity _ designed to move the Web beyond simple links to semantically-rich relationships btwn resources _ metadata application using XML as a common syntax for exchange and processing _ flexible architecture for managing diverse application-specific metadata packets that can be processed by machines _ associates resources, property types, and corresponding values _ http://www.w3.org/RDF/
Besser--JISC Image Metadata 6/20/02 91 RDF _ Resources (character strings, names, digital objects) _ Property (“is the author of”) _ Value _ resources+properties=relationships _ many different relationships can be reflected
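The resource / property / value model can be sketched with plain tuples, without any RDF library. The `Statement` type, the `example.org` identifiers, and the property names below are hypothetical, chosen only to show how many different relationships about one resource can be collected.

```python
from collections import namedtuple

# One statement associates a resource with a property and a value.
Statement = namedtuple("Statement", ["resource", "prop", "value"])

# Hypothetical statements; a value may itself be another resource.
statements = [
    Statement("http://example.org/images/42", "creator", "Jane Doe"),
    Statement("http://example.org/images/42", "format", "image/tiff"),
    Statement("http://example.org/images/43", "isVersionOf",
              "http://example.org/images/42"),
]


def describe(resource, all_statements):
    """Gather every property/value pair asserted about one resource."""
    description = {}
    for statement in all_statements:
        if statement.resource == resource:
            description.setdefault(statement.prop, []).append(statement.value)
    return description
```

Because the third statement's value is itself a resource, following it links one description to another, which is the sense in which resources plus properties yield relationships.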
Besser--JISC Image Metadata 6/20/02 93 Should you start building with RDF today? _ Tools are primitive _ Standard still likely to evolve
Besser--JISC Image Metadata 6/20/02 94 Metadata for Digital Libraries Howard Besser UCLA School of Education & Information Baca, Murtha (ed). Introduction to Metadata, Los Angeles: Getty Information Institute, 1998 http://www.getty.edu/gri/standard/intrometadata/ http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/ http://sunsite.berkeley.edu/Metadata/sp2000.html http://sunsite.berkeley.edu/Longevity/ http://www.oclc.org/digitalpreservation/presmeta_wp.pdf http://is.gseis.ucla.edu/us-interpares/ http://www.niso.org/commitau.html http://sunsite.berkeley.edu/moa2/ http://www.ifla.org/II/metadata.htm http://www.gseis.ucla.edu/~howard/image-meta.html http://sunsite.berkeley.edu/Imaging/Databases/#standards
Besser--JISC Image Metadata 6/20/02 96 Preservation Repositories: Open Archival Info System Model High-level reference model describing submission, organization and management, and continuing access Conceptual framework for different organizations to share discussions with a common language Producers, consumers, management, actual repository SIP, DIP, AIP AIP consists of data objects plus representation info (Content, Preservation Description, Packaging, Descriptive) Originally developed for Space Science community
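As a loose sketch of the AIP description above, the class below bundles data objects with the four kinds of information named on the slide and reports which kinds are still empty. The class name, field names, and sample values are assumptions for illustration only and are not drawn from the OAIS specification text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four kinds of information named on the slide, as hypothetical keys.
INFO_TYPES = ("content", "preservation_description", "packaging", "descriptive")


@dataclass
class ArchivalInformationPackage:
    data_objects: List[str]
    information: Dict[str, dict] = field(default_factory=dict)

    def missing_information(self):
        """List the kinds of information not yet supplied for this package."""
        return [kind for kind in INFO_TYPES if not self.information.get(kind)]


# Hypothetical package with only two of the four kinds filled in.
aip = ArchivalInformationPackage(
    data_objects=["scan-001.tif"],
    information={
        "content": {"format": "TIFF"},
        "packaging": {"manifest": ["scan-001.tif"]},
    },
)
```

A repository might run a check like `aip.missing_information()` at ingest, before a submitted package is accepted for long-term storage.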
Besser--JISC Image Metadata 6/20/02 98 OCLC/RLG Selected Recommendations _ Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments _ Stakeholders meet to decide how to describe what is in a dig repository _ Examine special properties of particular classes of digital objects _ Technical standards for exchange and interoperability btwn repositories _ Develop projects and case studies _ Copyright issues
Besser--JISC Image Metadata 6/20/02 100 E-Journal Archiving _ Issues –License, don’t own; may not be even able to obtain right to make archival copy –Increasingly no paper back-up at all –Usually we don’t have the important redundancy factor _ Mellon funded projects (2001) –Yale, Harvard, Penn working w/individual publishers –Cornell, NYPL--specific disciplines –MIT exploring characteristics that change (dynamic) –Stanford--archiving software tools
Besser--JISC Image Metadata 6/20/02 101 NEA 2001 grant to BAVC for $150,000 _ “To support development and dissemination of a DVD that contains a curriculum for the preservation of electronic art. The DVD will feature a preservation overview; discussions with conservators, artists, curators and technicians; a curriculum to train professionals in the field and project case studies to conserve electronic art.”
Besser--JISC Image Metadata 6/20/02 102 A few questions the NINCH community should address _ Special issues raised by non-library institutions _ Special issues raised by images and rich media _ What is the work (or salient points we need to preserve)? _ Bring the arts communities (artist intent, BAVC) together with the preservation repository communities and the preservation metadata communities _ Specifically get Cult Heritage communities involved with the selected OCLC/RLG recommendations _ Get cult heritage groups started on working to make sure that structure standards incorporate our works _ What organizations will take responsibility to save today’s digital “ephemeral” materials (online ‘zines, arts discussion groups, etc.)?