September, 1999 Grace Agnew Metadata Overview

Metadata:
Data that describes data.
Structured data about data.
Pure metadata has meaning only in relation to the primary data that is being described.

Metadata may be either:
Extrinsic: existing independently of the primary data being described, usually in an indexable metadatabase; or
Intrinsic: existing as a part of the primary data being described.

Design Criteria for a Metadata System:
Durable: independent of changes to hardware, software and network infrastructure.
Interoperable: can be seamlessly shared across the web with disparate hardware, software, network infrastructure and search engines.

Precise: enables the creation of customized virtual collections, pulling objects together seamlessly from any digital space to meet exact information requirements.
Flexible: supports any search engine, search strategy, transport or display option.
Efficient: provides immediate access to the most appropriate asset for the searcher.
Controlled: ensures digital assets come from a trusted source and reach an authorized end user.

Granular: able to search the top page, subsequent pages, or drill down to an underlying database of objects. Break through the web skin.
[Diagram labels: Query; Metadatabase; Search Engine; Underlying Object Database]

Key Concepts:
Semantics: meaning ascribed by a community to a metadata element or to the values for that element. Organized into a vocabulary.
Structure: imposes order for the unambiguous expression of the semantics: consistent coding, exchange and display of metadata elements, providing consistent interpretation by the end user.
Syntax: provides a means to represent one or more structures in a flexible, extensible manner. Provides the underlying mechanism for encoding, exchange, display and machine processing of metadata. Example: XML.

Schema:
Identifies, defines, organizes and constrains the elements in a set, their characteristics and descriptions.
Involves both semantics and structure.
Examples: Dublin Core, RDF.

Types of Metadata: Structural
Describes the physical and logical attributes of the object, related to creation, transport, storage and display.
Describes the hardware and software used to create the object. (Some place this in Administrative metadata.)
Describes the hardware, software and bandwidth needed to transport and display the object.
May be machine-readable, human-readable or both.
May be part of the digital object header (ex: TWAIN).

Provenance/Ingest Metadata:
Admission ticket to the Archive or Data Repository. Acknowledges the rules of entry and identifies the object for positioning within the Archive.
Best if intrinsic in the object, e.g. in the header.
Identifies the owner/creator of the metadata.
Identifies the owner/creator of the digital asset.
Provides date created, permanence of the asset, and updates and modifications to the asset.
May push the asset to users when content changes.

Rights & Access:
Provides requirements for access, display and download/storage of the asset.
Should integrate with the organization's access and authorization system, e.g.:
Reference/hyperlink to a digital certificate authority.
Indicate user restrictions (may reference an attribute on the certificate authority's user attribute server).
Support multilayered access: download only vs. store; free vs. fee; asset versions (high res. vs. low res.).

Descriptive:
Should uniquely identify an asset through:
Physical description (overlap with structural metadata).
Publication/creation information (overlap with ingest metadata).
Should describe the information content in subject and free-text fields, so that the asset can be identified and selected in response to a query from a search engine.

Linking Metadata
Persistent links:
Metadata record and the described asset.
All physical instantiations of the asset.
Registries for metadata schemas used to provide a meta-schema to describe the object.
Security system for access and authorization and/or link to an intermediary access page.
Considerable overlap with other metadata types.

Mining Web Assets: Current Practice
A query is sent to a proprietary search engine, or to a metasearch engine which queries many engines.
Benefits: ubiquitous and free; competition results in better precision and coverage.
Drawbacks: access for assets only, not long-term management; ephemeral metadata; asset creator has no control over description and access.

Standards are Developed to:
Create durable, persistent metadata records that precisely define the asset, so that exactly relevant assets are identified and retrieved in response to a query.
Create metadata that is flexible, extensible and scalable to support the needs of any organization, any type of asset, and varying skill and interest levels of metadata creators.
Allow the metadata records from many schemas with differing levels of complexity to interoperate for data discovery.
Enable machine intervention for automatic interpretation of metadata and data discovery, particularly among disparate search and retrieval platforms.

ISO Joint Standard
Joint standard of the ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) to provide a robust framework for defining data elements in an unambiguous and persistent manner within user committees.
Also provides a framework for creating and maintaining metadata registries to store and maintain data element definitions.
NCITS L8 draft standards available at the following websites:

Relevant Metadata Standards:
Dublin Core Element Set V. 1.1 (IETF Recommendation)
- Flexible lowest-common-denominator standard with 15 optional, repeatable fields;
- XML and HTML based: integrates completely with assets that live on the web, or that are accessed via the web and live in an attached database;
- May be intrinsic or separate from the asset described;
- Automated tools for generating/validating Dublin Core are freely available, e.g. DC.dot:

[Figure: from Description of Dublin Core Elements]

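The "15 optional, repeatable fields" rule can be sketched in a few lines of Python. This is only an illustration: the function name and the dict-of-lists record layout are assumptions made here, not part of the Dublin Core standard or of DC.dot.

```python
# The 15 elements of the Dublin Core Element Set, version 1.1.
DC_ELEMENTS = frozenset([
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
])


def unknown_elements(record):
    """Return the element names in `record` that are not Dublin Core 1.1 elements.

    `record` maps an element name to a list of values (a hypothetical layout).
    No element is required and any element may repeat, so an empty record
    and multi-valued elements both pass this check.
    """
    return sorted(name for name in record if name not in DC_ELEMENTS)


# Sample record, invented for illustration.
record = {
    "Title": ["Metadata Overview"],
    "Creator": ["Agnew, Grace"],
    "Date": ["1999-09"],
}
print(unknown_elements(record))               # every name is a DC element
print(unknown_elements({"Author": ["x"]}))    # "Author" is not a DC element
```

The check deliberately says nothing about the values themselves, which mirrors the drawback noted on the following slides: the element set alone does not standardize how each element is used.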
Dublin Core Drawbacks:
Too flexible and simple for complex, sophisticated collections.
Elements lack standardized use and precision.
Different communities are developing extensions to specify and categorize the elements. Approved extensions are available but slow to appear.
Some elements (rights, coverage) are ambiguous in their application.

Dublin Core Drawbacks (continued):
Intended for web objects that are textual or primarily textual. Does not provide for:
Media asset components (video sequences, scenes, shots, frames, objects);
Sequential media (audio and video, slide shows);
Synchronized media (video, audio, caption file or transcription; slide shows).

Result: Every Community Creates Their Own Metadata
Archives: EAD (Encoded Archival Description)
Government: GILS (Government or Global Information Locator System)
IMS: Instructional Metadata System
TEI: Text Encoding Initiative, for books and humanities; TEIH (TEI Header) used for metadata description
Dublin Core flavors: EdNA; CIMI
Guide to Best Practice: Dublin Core. Available as PDF from

MARC: Machine-Readable Cataloging; most library catalogs worldwide.
MPEG-7: digital audio, video and still image files. (In development. Committee draft due October 2000.)

MPEG-7:
Intended to describe audiovisual information regardless of storage, coding, display, medium or technology. Will include analog and digital media and combinations of media formats.
Will standardize:
* Core set of Descriptors (D)
* Description Schemes (DS): codified structures of Descriptors (definition, constraints, relationships among Descriptors)
* Language defining Description Schemes and Descriptors

MPEG-7 Structural Model
[Figure] From: Jane Hunter. MPEG-7: Behind the Scenes. In D-Lib Magazine, September 1999 (v. 5, no. 9): 6)

Possible MPEG-7 schema incorporating DC (element values only):
Image.Moving.TV.News.sequence.scene
Footage of Grenade Attack
Sam Rainsy knows the violence of political life in Cambodia. Four months ago, 16 of his supporters were killed in a grenade attack in Phnom Penh.
10 seconds
19:31:57;1
19:32:07;1
From: Jane Hunter and Renato Iannella. The Application of Metadata Standards to Video Indexing. In Research and Advanced Technology for Digital Libraries: Second European Conference, ECDL '98, Heraklion, Crete, Greece, September 21-23, 1998: Proceedings. Berlin: Springer, 1998 (Lecture Notes in Computer Science: 1513):

Beyond the Metadata Schema: Access to Information
Information stored and managed within your organization (possibly under different metadata schemas).
Information stored and managed by outside organizations.

Query: Books and web sites written by Grace Agnew
Search parameter: Author: Agnew, Grace
Parameter mapping: DC.Creator, DC.Contributor
Metadatabase (Dublin Core):
Record 1: DC.Creator: Grace Agnew
Record 70: DC.Contributor: Grace Agnew
Result set:
AGNEW, GRACE…1999………………
AGNEW, GRACE…1994……………...

Query: Books and web sites written by Grace Agnew
Search Engine 1: Author: Agnew, Grace. Parameter mapping: DC.Creator, DC.Contributor
Search Engine 2: Author: Agnew, Grace. Parameter mapping: 100, 700

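The parameter mapping on these two slides can be sketched as a small lookup table. The schema keys, function names and sample records below are assumptions for illustration; reading "100, 700" as MARC personal-name fields (main entry and added entry) is an interpretation of the slide, not something it states.

```python
# Hypothetical crosswalk: one user-facing search parameter expands to the
# schema-specific fields named on the slides.
PARAMETER_MAP = {
    "dublin_core": {"author": ["DC.Creator", "DC.Contributor"]},
    "marc": {"author": ["100", "700"]},
}


def map_parameter(parameter, schema):
    """Return the schema-specific fields for a generic search parameter."""
    return PARAMETER_MAP[schema][parameter.lower()]


def search(records, parameter, value, schema):
    """Return the records in which any mapped field holds the value."""
    fields = map_parameter(parameter, schema)
    return [
        record for record in records
        if any(value in record.get(field, []) for field in fields)
    ]


# Toy metadatabase: records 1 and 70 follow the slide, record 71 is invented.
dc_records = [
    {"id": 1, "DC.Creator": ["Agnew, Grace"]},
    {"id": 70, "DC.Contributor": ["Agnew, Grace"]},
    {"id": 71, "DC.Creator": ["Hunter, Jane"]},
]

hits = search(dc_records, "Author", "Agnew, Grace", "dublin_core")
print([record["id"] for record in hits])
```

The same "Author" query run with `schema="marc"` would look in fields 100 and 700 instead, which is the point of the slide: the searcher's vocabulary stays fixed while the mapping changes per engine.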
Z39.50
Information Retrieval (Z39.50): Application Service Definition and Protocol Specification.
Enables a client to interact with multiple servers, employing different search engines and different data element formats and definitions, to search databases and retrieve the records that result from the search.

Z39.50:
Initiates a session between client and server.
Executes a query from the client against one or more databases on the server.
Creates a result set consisting of records that match the query on one or more query attributes (access points).

Z39.50 (continued):
Returns a report on the number of records matching the search.
Returns records (individual records selected by the client) in a format selected by the client.
Primary formats returned: MARC, SUTRS, extending to SQL, Dublin Core, other schemas.

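The sequence described above (open a session, run a query, build a result set, report the hit count, return selected records in a chosen format) can be sketched as a toy in-memory model. This is not a Z39.50 implementation and uses no real Z39.50 library: the class, the method names, the record fields and the sample data are all assumptions, and nothing here touches the network or the actual protocol encoding.

```python
def to_sutrs(record):
    """Render a record as unstructured text, one 'label: value' per line."""
    return "\n".join(f"{key}: {value}" for key, value in record.items())


def to_dublin_core(record):
    """Relabel the toy fields with Dublin Core element names."""
    labels = {"author": "DC.Creator", "title": "DC.Title", "date": "DC.Date"}
    return {labels[key]: value for key, value in record.items()}


FORMATS = {"SUTRS": to_sutrs, "DC": to_dublin_core}


class ToyServer:
    """In-memory stand-in for a server holding named databases."""

    def __init__(self, databases):
        self.databases = databases
        self.result_sets = {}      # session id -> list of matching records
        self._next_session = 1

    def init(self):
        """Open a session and return its id."""
        session = self._next_session
        self._next_session += 1
        self.result_sets[session] = []
        return session

    def search(self, session, database, access_point, term):
        """Build the session's result set and report the number of hits."""
        hits = [
            record for record in self.databases[database]
            if term.lower() in record.get(access_point, "").lower()
        ]
        self.result_sets[session] = hits
        return len(hits)

    def present(self, session, positions, record_format):
        """Return the selected records in the format the client asked for."""
        render = FORMATS[record_format]
        return [render(self.result_sets[session][i]) for i in positions]


# Sample data invented for illustration.
server = ToyServer({"catalog": [
    {"author": "Agnew, Grace", "title": "Metadata Overview", "date": "1999"},
    {"author": "Agnew, Grace", "title": "Sample record", "date": "1994"},
    {"author": "Hunter, Jane", "title": "MPEG-7: Behind the Scenes", "date": "1999"},
]})
session = server.init()
print(server.search(session, "catalog", "author", "agnew"))
print(server.present(session, [0], "DC"))
```

The server keeps the result set between the search and the retrieval step, so the client first learns how many records matched and only then decides which ones to fetch and in which format.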
Z39.50 Version 3
Extends the capabilities of the standard to include:
Boolean and proximity searching.
Extended services, including saved queries to be periodically re-executed (SDI).
Explain facility, which allows the client to solicit information about the server and dynamically reconfigure itself.

Z39.50 Profiles for User Groups:
LOC: Access to Digital Collections
LOC: Access to Digital Library Objects
CIMI: Companion profile for museum digital collections and objects
GEO: Geospatial datasets
Z+SQL: extension to the SQL query language

Z39.50 Limitations:
Requires client software and Z39.50-enabled server software (which requires a Z39.50-aware search engine).
Most commercial client/server products have not implemented the Explain feature in version 3.
Requires human collaboration for implementation, particularly at the profile level.
Limited primarily to features provided by commercial servers and clients.

Z39.50 Limitations (continued):
Indexing parameters proprietary to the server database are not shared with the client, so the client cannot override or extend the proprietary search parameters.
Databases that are not on a Z39.50 server are invisible.

Metadata Registries:
Dynamic specification, maintenance and description of metadatabase structures:
Unambiguous definition of data structures.
Unambiguous definition and description of relationships between data structures, behaviors of data structures, and integrity constraints on the contents of data structures.
Semantics (meaning in context) and structure definition.

Metadata Registries (continued):
Links/hooks into subordinate registries used to define data content within a metadata element.
Mapping of data structures between registries.
Should be both eye-readable and able to be interpreted by computer programs, for seamless, unambiguous discovery, query and display across disparate database and search engine structures, and to enable intelligent query agents, advanced data mining, etc.

Metadata Registries
Collaborative effort of the Joint Technical Committee 1 (JTC1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).
Open Forum on Metadata Registries: events/openforum/index.htm

Metadata Registries
REGGIE: Java applet that dynamically creates metadata according to available online registries.
Allows you to enter your own registry, describing, characterizing and constraining all the elements in the set.
UK/Australia joint effort.

Query: Anything by Grace Agnew?
Metadatabase: Scheme = DC
Dublin Core. Author defined as: Creator, Contributor

Resource Description Framework
W3C Resource Description Framework (RDF) Model and Syntax Specification (22 February 1999):
Provides robust application of metadata in the web environment.
Model for unambiguous, schema-independent description of resources.
Key Concepts:
Resource: any object uniquely identifiable by a URI (uniform resource identifier).
Property type: property associated with a resource.
Value: associated with a property type; may be atomic (a string) or another resource, creating a new hierarchy.

RDF
Property types express the relationships of values associated with resources.
Famous example: The Author of Metadata Overview is Grace Agnew.
Resource: Metadata Overview
Property type: Author
Value: Grace Agnew

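The resource / property type / value statement above maps naturally onto a three-field tuple. A minimal sketch follows; the `Statement` type and the helper names are assumptions made for illustration and are not drawn from any RDF library.

```python
from collections import namedtuple

# One RDF statement: a resource, a property type, and a value.
Statement = namedtuple("Statement", ["resource", "property_type", "value"])


def describe(statement):
    """Read a statement back as the sentence form used on the slide."""
    return (f"The {statement.property_type} of {statement.resource} "
            f"is {statement.value}")


def values_of(statements, resource, property_type):
    """Collect every value recorded for one property type of one resource."""
    return [
        s.value for s in statements
        if s.resource == resource and s.property_type == property_type
    ]


statements = [Statement("Metadata Overview", "Author", "Grace Agnew")]
print(describe(statements[0]))
print(values_of(statements, "Metadata Overview", "Author"))
```

Because a value may itself be a resource, a value such as "Grace Agnew" could appear as the `resource` of further statements, which is how the hierarchy mentioned on the previous slide arises.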
RDF
Enables interoperability among metadata schemes, including the modular use of multiple schemes within a metadata record, utilizing the XML namespace facility.
Adds machine-interpretable semantics to the encoding, exchange and reuse of structured metadata.
Enables automatic negotiation between search engine, metadata record and metadata registry for powerful, flexible search and retrieval independent of server and client search and retrieval infrastructures (or, at least, it will!).

Application of Dublin Core and RDF for resource description:
Dublin Core in HTML: resides in the HEAD element.
A Thousand Wheels are Set in Motion - Georgia Tech Library and Information Center
Full metadata record:

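Dublin Core is commonly embedded in an HTML head as `meta` tags with names of the form `DC.Element`. The sketch below generates such tags in Python. The markup of the original record is not shown on the slide, so which value belongs to which element here is an assumption, as are the function name and the record layout.

```python
from html import escape


def dc_meta_tags(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags.

    `record` maps an element name (e.g. "Title") to a list of values,
    so repeated elements produce repeated tags.
    """
    lines = []
    for element, values in record.items():
        for value in values:
            content = escape(value, quote=True)   # protect quotes, <, >, &
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Assumed assignment of the slide's text to elements.
record = {
    "Title": ["A Thousand Wheels are Set in Motion"],
    "Publisher": ["Georgia Tech Library and Information Center"],
    "Contributor": ["Chritton, Heather", "Crafts, Laurel"],
}
print("<head>")
print(dc_meta_tags(record))
print("</head>")
```

Keeping the metadata inside the page is the intrinsic case described at the start of the talk: the description travels with the asset rather than living in a separate metadatabase.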
RDF / Dublin Core in XML (element values only):
A Thousand Wheels are Set in Motion
The Building of Georgia Tech at the Turn of the 20th Century
Georgia Tech Library and Information Center

Georgia Institute of Technology--Buildings
This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from
As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata (SIM) format.
Chritton, Heather
Crafts, Laurel

Notes:
1. RDF shows three types of relationships among collected resources:
Sequence (specified ordering of elements)
Bag (all members of equal importance)
Alternatives (choice between members)
In this example, I am specifying among contributors that Heather Chritton, the web page developer, appears first among contributors and Laurel Crafts, the digital image creator, appears second. Other contributors follow (text creation, metadata creation, indexing, etc.) in specified order in the complete record. I use the RDF Sequence list to establish this fixed contributor order.
2. LCSH (Library of Congress Subject Headings) and LCNAF (Library of Congress Name Authority File) do not currently reside on web pages at a URL. The URLs provided are for illustration only.

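The fixed contributor order described in note 1 can be sketched with Python's standard `xml.etree.ElementTree`, building an `rdf:Seq` inside a `dc:contributor` element. This is a reconstruction under assumptions, not the record from the slides: the `about` URL is a placeholder, the function name is invented, and the exact element layout of the original record is unknown.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_record(about, title, contributors):
    """Build an RDF description whose contributors keep their given order."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    ET.SubElement(description, f"{{{DC_NS}}}title").text = title
    contributor = ET.SubElement(description, f"{{{DC_NS}}}contributor")
    sequence = ET.SubElement(contributor, f"{{{RDF_NS}}}Seq")
    for name in contributors:
        # Each rdf:li is one member; document order carries the ranking.
        ET.SubElement(sequence, f"{{{RDF_NS}}}li").text = name
    return root


root = build_record(
    "http://example.org/thousand-wheels",   # placeholder URL, not from the slides
    "A Thousand Wheels are Set in Motion",
    ["Chritton, Heather", "Crafts, Laurel"],
)
print(ET.tostring(root, encoding="unicode"))
```

Swapping `Seq` for `Bag` would keep the same members while dropping the claim that their order matters, and `Alt` would present them as a choice.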
XML
Extensible Markup Language, a subset of SGML (Standard Generalized Markup Language), provides the ability to define elements within a web document.
XML documents have a logical and a physical structure.
Each unit of an XML document is an entity. Entities are defined within the document in relation to each other.
The logical and physical structures of the document include declarations, elements, comments, character references and processing instructions.
Structural relationship is provided through nesting.

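The statement that structural relationship is provided through nesting can be seen by parsing a small document and walking it. The sample document and the `outline` helper below are invented for illustration; the parser is Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# A made-up document: element names are defined by the author, not by XML.
SAMPLE = """
<record>
  <title>Metadata Overview</title>
  <creator>
    <name>Grace Agnew</name>
  </creator>
</record>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs in document order, one per element."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


for depth, tag in outline(ET.fromstring(SAMPLE)):
    print("  " * depth + tag)
```

The printed indentation mirrors the nesting in the source: `name` sits under `creator`, which sits under `record`, and that containment is the only thing expressing how the elements relate.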
XML
XML display is governed by an attached style document, formulated in CSS (Cascading Style Sheets) or XSL (Extensible Style Language), to provide rules for display.
Styles can be applied to single elements as well as to the entire document.
More than one style sheet or style document can be provided for a document or element, with precedence rules governing the given display.

DTD
The Document Type Definition provides a formally defined structure, vocabulary and syntax for an XML document type.
Documents are validated against a DTD to ensure that nested structure and semantic constraints are followed, giving consistent meaning across documents.
DCD (Document Content Description)
A semantic superset of XML DTDs, intended to be conformant with the RDF Model and Syntax Specification.
Describes an XML vocabulary for schemas, for specifying object classes.
Based on elements (RDF property types) and attributes.
Supports RDF vocabulary and constructs.

SOX
Schema for Object-Oriented XML.
Alternative to DTD for validating XML documents.
Supports scalar (numeric) datatypes, enumerated datatypes (values enumeration) and format datatypes.
An expanded namespace facility allows objects from any identifiable namespace to be used to build the document.

Role of the Database:
A database that can be parsed and reported to a validated XML metadata format, as well as other metadata syntaxes, provides a robust space for metadata development.
Also reports to any XML document type and hooks into applications via APIs, to support unique user needs.
[Diagram labels: Oracle database; MARC-based catalog; collaborative research space; web-based courseware application; subject-specific web research tool; personal research space]

Last Step: Data Retrieval
Data storage, access and delivery architecture should be open, standards-based, and hardware and software independent, providing users across platforms with a common, consistent interface and underlying storage structure for efficient retrieval, display, storage and use of digital information.
Data architecture should support a well-defined, widely available security system to validate authenticity of users and provide data for a variety of uses according to a scalable authorization hierarchy.

Last Step: Data Retrieval (continued)
Data architecture should support data as objects for scalable, extensible access, with sophisticated and flexible support for object relationships, particularly to support different physical instantiations of identical data, e.g. a digital video object as D1, MPEG1, Quicktime, etc.
CORBA (Common Object Request Broker Architecture): emerging architecture for open distributed object computing. Intended to provide transparent access to applications and databases, regardless of the hardware and software infrastructure at each end of the transaction.

Putting It All Together: A Digital Archive Architecture
Reference Model for Open Archival Information Systems (OAIS), developed by a US ISO archiving group under ISO TC20/SC13 and the Consultative Committee for Space Data Systems (CCSDS).
This model has recently been released for formal ISO and CCSDS review.
An electronic version of the OAIS Reference Model can be found at

Reference Model for Open Archival Information Systems (OAIS)
[Figure: External Data Flow Diagram]