Presentation on theme: "Metadata Overview Metadata: Data that describes data"— Presentation transcript:
1 Metadata Overview
Metadata: Data that describes data
Structured data about data

“Pure” metadata has meaning only in relation to the primary data that is being described. Metadata provides for both description and management of primary data. It can exist intrinsic to the data or separate (extrinsic) from the data.

Extrinsic metadata exists in a metadatabase separate from the primary data. Library catalogs, which have traditionally described books, are examples of extrinsic metadatabases. Books, journals, etc. are described in MARC (machine-readable cataloging) databases, which provide a call number or other shelf location to identify and retrieve the book. MARC provides a hyperlink field for materials in electronic format. Metadatabases are designed in ORACLE®, Microsoft Access®, MySQL, etc.

A critical component of an extrinsic metadatabase is a field that provides a persistent link between the metadata record and the primary data being described. The primary benefits of extrinsic metadatabases for digital primary data are:
(1) Flexibility and scalability: as additional fields become necessary, they can be added without re-encoding the primary data.
(2) Asset management: the asset management overhead (bandwidth traffic for discovery, for example) is separated from the primary data. This can provide both security (the user discovering the data must be authorized and/or authenticated to access the data) and some level of network traffic management.
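The two benefits of an extrinsic metadatabase can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layout, the column names and the `urn:example:` identifier are assumptions made for the sketch, not part of any system described in the slides.

```python
import sqlite3

# In-memory metadatabase; the primary data (images, books, files) lives elsewhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " creator TEXT,"
    " object_link TEXT NOT NULL UNIQUE)"  # persistent link to the primary data
)
conn.execute(
    "INSERT INTO metadata (title, creator, object_link) VALUES (?, ?, ?)",
    ("Campus aerial photograph", "Example Photographer", "urn:example:image:0001"),
)

# Benefit (1): a new field is added later without re-encoding the primary data.
conn.execute("ALTER TABLE metadata ADD COLUMN rights TEXT")
conn.execute(
    "UPDATE metadata SET rights = ? WHERE object_link = ?",
    ("Campus use only", "urn:example:image:0001"),
)


def find_links(connection, keyword):
    """Benefit (2): discovery runs against the metadatabase only and
    returns links; the primary data is not touched during the search."""
    rows = connection.execute(
        "SELECT object_link FROM metadata WHERE title LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


links = find_links(conn, "aerial")
print(links)
```

Access control would then be applied when a returned link is followed, which keeps discovery traffic and authorization checks separate from the stored objects.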
2 Metadata Overview
Metadata may be either:
Extrinsic: Existing independently of the primary data being described, usually in an indexable metadatabase
or
Intrinsic: Existing as a part of the primary data being described

Intrinsic metadata offers the following benefits:
(1) The metadata may be synchronized with the data, for context-sensitive access. This is particularly useful for data with sequential playback, such as audio and video files. “Bookmarks” within a data file are an example of intrinsic metadata.
(2) The metadata may reside in a header, providing information necessary to the playback technology (computer, DVD player, plug-in software, etc.), providing durable description for the data contents and providing technical information about the creation of the data, to aid in data management and data migration.
Intrinsic metadata can also provide information about the ownership and management of the data, to automate the archiving process through the use of “smart metadata.”

The main problem with intrinsic metadata is the management of metadata that is diffused throughout the archive rather than collected in a file. For example, if the metadata includes a URL and the object’s location is changed, it can be difficult to find and change the URL in the metadata record. There are also technical difficulties inherent in creating and revising intrinsic metadata.

The OAIS (Open Archival Information System) standard recommends a combination of extrinsic and intrinsic metadata. Descriptive information would be extrinsic, in a separate database, and the structural and administrative information would be intrinsic, generally within the object header.
3 Metadata Overview
Design Criteria for a Metadata System:
Durable - Independent of changes to hardware, software and network infrastructure
Interoperable - Can be seamlessly shared across the web with disparate hardware, software, network infrastructure and search engines

Durable, interoperable metadata will have the following characteristics:
- Open, standards-based, documented schema and syntax
- Platform and search engine independent, for storage, retrieval, access and display
- Able to support a wide variety of standards, for a common knowledge management system among a user base, without being dependent on a single standard.

This last item is a crucial issue. It is very important to be able to support a common metadata standard, such as Dublin Core, so that the user body can agree on the elements that define data, for precise description, retrieval and display. However, metadata standards change at a rapid pace and usually serve the lowest common denominator of user needs. A metadata system needs to accommodate a commonly supported standard but also the unique, specific needs of individuals or small groups within a consortial community. Currently, the most robust metadata systems are developed within an open database platform (Oracle, SQL, Access, etc.) with fields that can report to a number of popular standards (at a minimum, MARC and Dublin Core) within an XML-based RDF wrapper.
4 Metadata Overview
Precise - Enables the creation of customized “virtual collections,” pulling objects together seamlessly from any digital space to meet exact information requirements.
Flexible - Supports any search engine, search strategy, transport or display option.
Efficient - Provides immediate access to the most appropriate asset for the searcher.
Controlled - Ensures digital assets are delivered from a trusted source to an authorized end user.
5 Metadata Overview
Granular - Able to search the top page, subsequent pages, or drill down to an underlying database of objects.
“Break through the web skin”

To be useful for large object collections or to describe objects in the context of the rapidly expanding Internet, metadata must move to the next generation, a melding of data description and programming to create dynamic, self-sustaining metadata records. I call this “smart metadata.”

Metadata must dynamically interact with the objects that are described. This can be as simple as dynamically displaying metadata information with the object, when the object is retrieved. Georgia Tech has experimented with this in an image database:

Also, the modification or deletion of the metadata record should be tightly bound to the modification or deletion of the digital object.

For web pages, an iterative metadata record could be created that self-iterates and expands from the home page downward. For example, the base or home page record provides a basic who, what, when, where description and then iterates for subsequent pages by adding a field that indicates the hierarchical relationship (e.g. child, sibling, related, etc.) between the new page and the home page.

[Diagram: Query, Search Engine, Underlying Object Database, metadatabase]
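The iterative web page record described in the notes above could look roughly like the following sketch. The site structure, the field names (`relation`, `relation_target`) and the choice to record each page's relationship to its immediate parent are illustrative assumptions, not an existing implementation.

```python
def iterate_records(base_record, page, relation=None, parent_url=None):
    """Copy the base who/what/when/where description to a page and its
    descendants, adding a field for the hierarchical relationship."""
    record = dict(base_record)          # inherit the home page description
    record["url"] = page["url"]
    if relation is not None:            # the home page itself has no relation field
        record["relation"] = relation
        record["relation_target"] = parent_url
    records = [record]
    for child in page.get("children", []):
        records.extend(iterate_records(base_record, child, "child", page["url"]))
    return records


# Hypothetical site structure and base record.
site = {
    "url": "/",
    "children": [
        {"url": "/about", "children": [{"url": "/about/staff"}]},
        {"url": "/contact"},
    ],
}
base = {
    "who": "Example Library",
    "what": "Library web site",
    "when": "2000",
    "where": "Atlanta, GA",
}

records = iterate_records(base, site)
for r in records:
    print(r["url"], r.get("relation"), r.get("relation_target"))
```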
6 Metadata Overview
Key Concepts:
Semantics: Meaning ascribed by a community to a metadata element or to the values for that element. Organized into a “vocabulary.”
Structure: Imposes order for the unambiguous expression of the semantics: consistent coding, exchange and display of metadata elements, providing consistent interpretation by the end user.
Syntax: Provides a means to represent one or more structures in a flexible, extensible manner. Provides the underlying mechanism for encoding, exchange, display and machine processing of metadata. Example: XML

It is critical to develop or use a robust, consistent metadata system that will persist, and be intelligible to your target audience and, hopefully, to an international audience, beyond your working lifetime. A metadata system may start with a unifying or base schema, which defines the structure and rules of the metadata record fields, but extend into subschemes, related schemes, etc. The same is true of semantics, which provide common definitions for the field labels and the information contained within the fields. Contextual definitions may be flexibly used within a metadata scheme to support different domains or users by referencing and using different online thesauri, for example.

A metadata registry structure should be designed as a relational and/or object-oriented database that can support current and future schemas, structures, semantics and syntaxes. Documentation should be robust and precise, according to the format provided by ISO/IEC. Creating a metadata registry will help you develop a schema that applies consistent rules for schema, semantics and syntax, robust documentation that can be interpreted by human and machine, and consistency of application throughout the metadatabase. Otherwise, the repository will face a costly redesign and conversion at some point in the future. Always design for a minimum of two generations of users and include a clear migration path for future technologies.

Syntax is a thorny issue at the moment as the world straddles uneasily between HTML and XML. XML is finally maturing into a browser-friendly, ubiquitous language. Given the rate of change on the Internet, however, XML will migrate to something even less text-based and more extensible. I think the next syntax will be more object-oriented and include inherent object management, without the requirement to hook programming code, such as Java or Perl, into the syntax. The object-oriented versions of XML are worth exploring for establishing a syntax migration path.
7 Metadata Overview
Schema
Identifies, defines, organizes and constrains the elements in a set, their characteristics and descriptions. Involves both semantics and structure. Examples: Dublin Core, RDF

There are many available schemas, but Dublin Core, as a flexible “least common denominator” international standard, is currently dominating the field. I say this with the caveat that no group has found Dublin Core in its basic form to be adequate, and therefore each group is developing and applying extensions or adding fields to the base record. Much work is currently being done on Dublin Core extensions. In addition, groups such as DC Education are looking at extensions and field applications for specific interest groups, such as the education community. Dublin Core is an excellent, robust base record that can be indexed by a variety of search engines, with a Z39.50 client/server interface, etc. Additional fields that target your user base, documented in a registry, can be added to the core record for a standards-based system that offers more sophisticated and responsive data management and access.

Developments in Dublin Core and RDF technology are discussed further in the presentation.
8 Metadata Overview
Structural
Types of Metadata:
- Describes the physical and logical attributes of the object, related to creation, transport, storage and display;
- Describes the hardware and software used to create the object (some place this in Administrative metadata);
- Describes the hardware, software and bandwidth needed to transport and display the object.
May be machine-readable, human-readable or both. May be part of the digital object header (ex: TWAIN)

Structural and Administrative Metadata serve two major purposes:
1. Enable the display and manipulation of objects that require viewer software or that require a bandwidth or client hardware or software threshold for display. Examples include: video files, audio files, SMIL files, digital images, VRML objects, programs, etc.
2. Enable digital persistence for digital objects by indexing information about digital objects so that as technology standards change, objects for an obsolete standard can be identified and methods for object migration can be developed.

Currently, object migration strategies include:
Technology emulation: maintaining emulation programs on newer technology platforms that allow objects to be retrieved and, ideally, transcoded to the new standard.
Backward compatibility/migration path: this is a standards- and vendor-driven solution, where technology for storage, indexing, display and manipulation supports the previous standard. MPEG1, 2 and 4 are backward compatible. Reputable vendors also build a migration or conversion path between new and old standards that their platforms support.
9 Metadata Overview
Provenance/Ingest Metadata:
“Admission ticket” to the Archive or Data Repository. Acknowledges the rules of entry and identifies the object for positioning within the Archive. Best if intrinsic in the object, e.g. in the header.
- Identifies the owner/creator of the metadata.
- Identifies the owner/creator of the digital asset.
- Provides date created, permanence of asset; updates and modifications to asset. May “push” asset to users when content changes.

Interpolation is a strategy that I expect to occur for compressed objects. As client platforms become more powerful and provide richer display, algorithms that interpolate compressed data to provide richer, more complete visual and audio information will probably be developed. For this reason, levels of service for MPEG standards, for example, should be codified.

Some structural/administrative metadata, as well as provenance/ingest metadata, can be automatically supplied within object headers. It is critical, however, that this metadata is fully searchable, for long-term object management, so that if the licensed rights to display a commercial collection expire, for example, all the objects in that collection can be identified and retrieved. Also, if an encoding standard becomes obsolete or is enhanced, the objects in a large, distributed collection can be quickly identified and adjusted to the new standard.

Metadata serves more than one purpose: object retrieval and display for the client but, of equal importance, management of the digital object collection by the collection administrators.

See Further: Michael Day. Extending Metadata for Digital Preservation
10 Metadata Overview
Rights & Access:
Provides requirements for access, display and download/storage of asset.
Should integrate with the organization’s access and authorization system, e.g.
- Reference/hyperlink to digital certificate authority
- Indicate user restrictions (may reference an attribute on the certificate authority’s user attribute server)
- Support multilayered access: download only vs. store; free vs. fee; asset versions (high res. vs. low res.)

An excellent discussion and proposal for rights metadata: Rust, Godfrey. “Metadata: the Right Approach: an Integrated Model for Descriptive and Rights Metadata in E-commerce” D-Lib Magazine (July/August 1998):

The Digital Object Identifier (DOI), an initiative of the international publishing community, is an important development in the digital rights and access arena. It is intended to direct users, via a unique and permanent identifier (the DOI), to an intermediary server providing rights and access information (as well as the ability to pay for access). Learn more about DOI at

The DOI initiative has a rights metadata working group that is developing standardized rights metadata to facilitate automated rights transactions, particularly procedures for authentication:

An excellent short webliography on rights metadata can be found at:

An interesting article on electronic copyright management:

For an interesting discussion of rights management in the context of migrating and refreshing digital files to ensure file longevity, see:
11 Metadata Overview
Descriptive
Should uniquely identify an asset through:
- Physical description (overlap with structural metadata)
- Publication/creation information (overlap with ingest metadata)
Should describe the information content in subject and free-text fields to identify and select the asset in response to a query from a search engine.

By and large, descriptive metadata should exist in a database separate from the electronic object, for at least two reasons:
1. Search engines operate against metadatabases to drill down to the most relevant citations. This can require several iterative searches, as the search strategy is refined, etc. A separate database for identification of relevant resources is faster and more efficient, with less bandwidth overhead.
2. Data elements in a metadatabase change over time, particularly geographic names, URLs, etc. A separate metadatabase can be globally modified.

Dublin Core is recommended as the core descriptive base for any metadata record, to provide interoperability across metadata systems, digital archives, domains, etc. Additional fields that support the unique needs of the repository or the subject domain should be added but codified in a registry, to ensure that the elements can be understood and used by the end user and to enable management and control of the metadata registry.
12 Metadata Overview
Linking Metadata
Persistent Links:
- Metadata record and the described asset.
- All physical instantiations of the asset.
- Registries for metadata schemas used to provide a “meta-schema” to describe the object.
- Security system for access and authorization and/or link to intermediary access page
Considerable overlap with other metadata types

A growing concern among digital archive and metadata managers is the use of URLs to link to an object. URLs are not a standard intended to persist, but rather a technology and a convention. As large repositories of digital objects are created, the management of linking metadata will become a serious problem if a persistent identifier is not used.

The recommended practice is not to provide a URL for the direct link to the object but instead to link to an entry in a permanent registry that resolves location changes. One such technology is the Handle technology developed by CNRI and used for the DOI (digital object identifier) standard.

Best practice for developing a persistent location for a digital object is to establish a URN (Uniform Resource Name) according to IETF RFC 1737

Also currently in use, with PURL registries supported by OCLC (US), UKOLN, NAL (Australia), among others, is:
PURL: Persistent URL, which involves intermediate resolution by a third-party registry

See also:
Library of Congress. National Digital Library Program. “The Relationship between URNs, Handles and PURLs.”
Internet Engineering Task Force on URNs.
World Wide Web Consortium pages on Uniform Resource Identifiers.
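The idea of linking to a registry entry that resolves location changes, rather than storing a URL directly, can be sketched as follows. This is a toy in-memory resolver, not the Handle, PURL or URN resolution protocols; the class name, identifier and URLs are invented for illustration.

```python
class ResolverRegistry:
    """Toy registry that maps persistent identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        # Only the registry entry changes when an object is relocated.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        return self._locations[identifier]


registry = ResolverRegistry()
registry.register("urn:example:report-42", "http://old.example.org/reports/42.pdf")

# The metadata record stores the persistent identifier, never the URL.
metadata_record = {"title": "Annual report", "identifier": "urn:example:report-42"}

# The object moves; the metadata record needs no edit.
registry.move("urn:example:report-42", "http://new.example.org/archive/42.pdf")
current_url = registry.resolve(metadata_record["identifier"])
print(current_url)
```

With thousands of records pointing at the same repository, a server move becomes a registry update instead of a search-and-replace across every metadata record.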
13 Metadata Overview
Mining Web Assets: Current Practice
A query is sent to a proprietary search engine, or a metasearch engine which queries many engines.
Benefits: Ubiquitous and free; competition results in better precision and coverage
Drawbacks: Access for assets only, not long-term management; “ephemeral” metadata; asset creator has no control over description and access.

Search engines that can be used to mine Dublin Core metadatabases include: Harvest, SWISH-E, Isite, Zebra, Blue Angel Technologies, sgrep (particularly worth a look!)

Interesting article: Guidelines for extending the use of Dublin Core elements to create a generic application integrating all kinds of information resources. Draft, 10/1/97

XML-based search tools:
BUS (Bottom Up Scheme) of indexing and retrieval
GoXML: GoXML.com is an XML context-based search processor that can index, store, and enable the accurate searching of XML data.
XRS: XML Retrieval System. Link: XRS is an XML indexing and retrieval engine that uses the BUS (Bottom-Up Scheme of index retrieval), which only indexes the leaf elements as defined by a DTD. XRS uses a Java component to render its XML output as HTML, and is connected to a "Query Mediator" servlet between the end user and the back end of the search engine.
XML Query Language: XQL.
14 Metadata Overview
Standards are Developed to:
- Create durable, persistent metadata records that precisely define the asset so that exactly relevant assets are identified and retrieved in response to a query.
- Create metadata that is flexible, extensible, and scalable to support the needs of any organization, any type of asset, and varying skill and interest levels of metadata creators.
- Allow the metadata records from many schemas with differing levels of complexity to interoperate for data discovery.
- Enable machine intervention for automatic interpretation of metadata and data discovery, particularly among disparate search and retrieval platforms
15 Metadata Overview
ISO 11179
Joint Standard of the ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) to provide a robust framework for defining data elements in an unambiguous and persistent manner within user committees.
Also provides a framework for creating and maintaining metadata registries to store and maintain data element definitions.
NCITS L8 Draft Standards available at the following websites:

Six Parts:
- Framework for the Specification and Standardization of Data Elements
- Classification for Data Elements
- Basic Attributes of Data Elements
- Rules and Guidelines for the Formulation of Data Definitions
- Naming and Identification Principles for Data Elements
- Registration of Data Elements
16 Metadata Overview
Relevant Metadata Standards:
Dublin Core Element Set V. 1.1 (IETF Recommendation)
- Flexible “lowest common denominator” standard with 15 optional, repeatable fields;
- XML and HTML based: integrates completely with assets that live on the web or are accessed via the web and live in an attached database;
- May be intrinsic or separate from the asset described;
- Automated tools for generating/validating Dublin Core are freely available, e.g. DC.dot:

Best Practices Guidelines worth study: Drenth, B.D., et al. Guide to Best Practice: Dublin Core. Consortium for the Computer Interchange of Museum Information

Interesting new RFC: RFC 2731. Kunze, J. Encoding Dublin Core Metadata in HTML (December 1999): ftp://ftp.isi.edu/in-notes/rfc2731.txt
The Dublin Core [DC1] is a small set of metadata elements for describing information resources. This document explains how these elements are expressed using the META and LINK tags of HTML [HTML4.0]. A sequence of metadata elements embedded in an HTML file is taken to be a description of that file. Examples illustrate conventions allowing interoperation with current software that indexes, displays, and manipulates metadata, such as [SWISH-E], [freeWAIS-sf2.0], [GLIMPSE], [HARVEST], [ISEARCH], etc., and the Perl [PERL] scripts in the appendix.
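As a rough illustration of Dublin Core expressed through the META and LINK tags of HTML, the sketch below turns a small record into tags that could be placed in a page's head. The `DC.` name prefix follows the element naming used elsewhere in these slides (DC.Creator, DC.Contributor); the `schema.DC` link and the namespace URL used as its default are assumptions about typical usage and should be checked against RFC 2731 itself. The sample record is invented.

```python
from html import escape


def dublin_core_meta(record, schema_href="http://purl.org/dc/elements/1.1/"):
    """Render a dict of Dublin Core elements as HTML LINK and META tags.

    Values may be a string or a list of strings, since Dublin Core
    fields are optional and repeatable.
    """
    lines = [f'<link rel="schema.DC" href="{escape(schema_href, quote=True)}">']
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:  # one META tag per repeated value
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical record for illustration.
record = {
    "Title": 'Metadata "Overview"',
    "Creator": "Agnew, Grace",
    "Subject": ["Metadata", "Dublin Core"],
}
head_fragment = dublin_core_meta(record)
print(head_fragment)
```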
17 Metadata Overview
From “Description of Dublin Core Elements”

Current recommended articles on Dublin Core:
[DC-SCHEMA 1999] DC Schema Discussion Paper: Dublin Core views of an underlying data model. Edited by Carl Lagoze.
Bearman, David, et al. A Common Model to Support Interoperable Metadata: Progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI Communities. D-Lib Magazine, January Volume 5, Number 1.
Stuart Weibel. The State of the Dublin Core Metadata Initiative April 1999
18 Metadata Overview
Dublin Core Drawbacks:
- Too flexible and simple for complex, sophisticated collections;
- Elements lack standardized use and precision. Different communities are developing extensions to specify and categorize the elements. Approved extensions are available but slow to appear.
- Some elements (rights, coverage) are ambiguous in their application

Useful Articles:
Martin, David. “Beyond Dublin Core: The Need for High Quality Product Information”
Clarke, Roger. Beyond the Dublin Core: Rich Meta-Data and Convenience of Use are Compatible After All.
19 Metadata Overview
Dublin Core Drawbacks:
Intended for web objects that are textual or primarily textual. Does not provide for:
- Media asset components (video sequences, scenes, shots, frames, objects);
- Sequential media (audio and video, slide shows);
- Synchronized media (video, audio, caption file or transcription; slide shows).
20 Metadata Overview
Result: Every Community Creates Their Own Metadata
Archives: EAD (Encoded Archival Description)
Government: GILS (Government or Global Information Locator System)
IMS: Instructional Metadata System
TEI: Text Encoding Initiative - books and humanities; TEIH (TEI Header) used for metadata description
Dublin Core “Flavors”: EdNA

CIMI Guide to Best Practice: Dublin Core. Available as PDF from
EAD References:
Home page:
Browsable, searchable tag set:
GILS Home Page:
IMS Home Page:
TEI Home Page:
Overview of Metadata Formats: Heery, Rachel. Overview of Metadata Formats:
21 Metadata Overview
MARC - Machine-readable cataloging: most library catalogs worldwide.
MPEG-7 - Digital audio, video and still image files. (In development. Committee draft due October 2000)

Very interesting tool available: Stanford Medical School’s Lane Library has released free XMLMARC software (Java client/server application):
Other development work on MARC XML/SGML DTDs, as well as info on the MARC standard itself:
MARC to XML mapping table (Library of Congress) and SGML/XML DTD:
Also, Dublin Core/GILS/MARC crosswalk:
Perl utilities for MARC-->XML conversion
There is also a web interface to MARC.pm available at
where you can upload MARC records and receive back the output.
Two very interesting products are available from MARCView that convert text to MARC and provide complex searching of MARC records:
22 Metadata Overview
MPEG-7:
Intended to describe audiovisual information regardless of storage, coding, display, medium or technology. Will include analog and digital media and combinations of media formats.
Will Standardize:
* Core set of Descriptors (D)
* Description Schemes (DS): codified structures of Descriptors (definition, constraints, relationships among Descriptors)
* Language defining Description Schemes and Descriptors

MPEG-7 has a home page now:

The development of MPEG7 will include a Description Definition Language (DDL) to overcome the limitations of the primarily text-based languages and schemas currently available.

MPEG7 will address primarily audiovisual content, whether digital or analog, and will address only text embedded within audiovisual objects, not text-only objects.

MPEG7 is being designed to work with extraction tools (many of which are in basic stages of development or do not exist) which, for example, convert audio speech to text, analyze direction and movement vectors, isolate and analyze color histograms, provide shape recognition, etc., to provide audiovisual rather than text-based recognition. MPEG7 will be a standard for the indexing middleware that rests between the recognition technology and the source object, so that a user can search for an object (e.g. a dog) within a video file collection, search by musical note or melody through audio files, etc.
23 Metadata Overview
MPEG-7 Structural Model
Development of the Standard:
Call for Proposals: October 1998
Working Draft: December 1999
Committee Draft: October 2000
Final Committee Draft: February 2001
Draft International Standard: July 2001
International Standard: September 2001

MPEG-7, formally called "Multimedia Content Description Interface", will standardise:
- A set of description schemes and descriptors
- A language to specify description schemes, i.e. a Description Definition Language (DDL)
- A scheme for coding the description
Jane Hunter. “MPEG-7: Behind the Scenes” in D-Lib Magazine September, 1999 (v. 5, no. 9)
24 Metadata Overview
Possible MPEG7 schema incorporating DC
<DC:Type>Image.Moving.TV.News.sequence.scene</DC:Type>
<DC:Description.text>"Footage of Grenade Attack"</DC:Description.text>
<DC:Description.transcript>"Sam Rainsy knows the violence of political life in Cambodia. Four months ago, 16 of his supporters were killed in a grenade attack in Phnom Penh."</DC:Description.transcript>
<DC:Format.Length>10seconds</DC:Format.Length>
<DC:Coverage.t.min DC.Scheme="SMPTE">19:31:57;1</DC:Coverage.t.min>
<DC:Coverage.t.max DC.Scheme="SMPTE">19:32:07;1</DC:Coverage.t.max>

From: Jane Hunter and Renato Iannella. “The Application of Metadata Standards to Video Indexing.” In Research and advanced technology for digital libraries : second European conference, ECDL '98, Heraklion, Crete, Greece, September 21-23, 1998 : Proceedings. Berlin: Springer: 1998 (Lecture Notes in Computer Science: 1513):
25 Metadata Overview
Beyond the Metadata Schema: Access to Information:
- Information stored and managed within your organization (possibly under different metadata schema)
- Information stored and managed by outside organizations

Currently, access to information across servers and disparate databases is very inadequate.
The Z39.50 standard, discussed later, provides fairly well-understood but limited cross-platform access to primarily textual metadata record repositories.
XML and object-oriented database development, such as CORBA, are opening up cross-platform access, retrieval and manipulation of information objects.

Interesting developments include:
XMI: Open information interchange model for exchange of models and data over the Internet in a standardized manner.
Tool: XMI Toolkit. Available from IBM Alpha Works. 90-day cost-free testing period.
26 Metadata Overview
QUERY: Books and web sites written by Grace Agnew
Search Engine: Author: Agnew, Grace. Parameter mapping: DC.Creator, DC.Contributor
Metadatabase - Dublin Core: Record 1, DC.Creator: Grace Agnew; Record 70, DC.Contributor: Grace Agnew
Result Set: AGNEW, GRACE…1999…; AGNEW, GRACE…1994…

Common Warehouse Metadata Interchange:
Request for Proposal issued. Submissions due
OMG Document ad/ Available from
Objectives:
- Establish an industry standard specification for common warehouse metadata interchange
- Provide a generic mechanism that can be used to transfer warehouse metadata
- Leverage existing vendor-neutral interchange mechanisms

The American Library Association has an excellent metadata bibliography that includes a section on transmission protocols, discussing Z39.50, SQL (structured query language), OLE, CORBA and WAIS:
27 Metadata Overview
QUERY: Books and web sites written by Grace Agnew
SEARCH ENGINE 1, SEARCH ENGINE 2
Author: Agnew, Grace. Parameter mapping: 100, 700
Author: Agnew, Grace. Parameter mapping: DC.Creator, DC.Contributor

BXXP Protocol: Interesting New Development!
Multiplexes several generic application channels carrying XML (or other MIME-type data) on a single socket connection.
Provides for segmented data, windowed flow control, user authentication, profile negotiation and secure transport
Block Architectural Precepts - Marshall Rose and Carl Malamud
Blocks Simple Exchange Profile (M. Rose)
Blocks eXtensible eXchange Protocol (M. Rose)
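The parameter mapping in the two diagrams, where one author query is translated into the fields of each metadatabase, can be sketched as follows. Reading 100 and 700 as the MARC personal-name fields is an interpretation of the slide; the sample records, the `FIELD_MAPPINGS` table and the naive name normalization are assumptions made for the example.

```python
# One logical access point, mapped to the fields each schema uses.
FIELD_MAPPINGS = {
    "dublin_core": {"author": ["DC.Creator", "DC.Contributor"]},
    "marc": {"author": ["100", "700"]},
}


def normalize(name):
    """Naive normalization so 'Agnew, Grace' matches 'Grace Agnew'."""
    return tuple(sorted(name.replace(",", " ").lower().split()))


def search(records, schema, access_point, term):
    """Return the records whose mapped fields contain the search term."""
    fields = FIELD_MAPPINGS[schema][access_point]
    needle = normalize(term)
    hits = []
    for record in records:
        values = [v for f in fields for v in record.get(f, [])]
        if any(normalize(v) == needle for v in values):
            hits.append(record)
    return hits


# Hypothetical metadatabases.
dc_records = [
    {"id": 1, "DC.Creator": ["Grace Agnew"]},
    {"id": 70, "DC.Contributor": ["Grace Agnew"]},
    {"id": 71, "DC.Creator": ["Someone Else"]},
]
marc_records = [
    {"id": "m1", "100": ["Agnew, Grace"]},
    {"id": "m2", "700": ["Agnew, Grace"]},
    {"id": "m3", "245": ["Agnew, Grace"]},  # a title field, not an author field
]

dc_hits = search(dc_records, "dublin_core", "author", "Agnew, Grace")
marc_hits = search(marc_records, "marc", "author", "Agnew, Grace")
print([r["id"] for r in dc_hits], [r["id"] for r in marc_hits])
```

The user's query stays the same for both engines; only the mapping table differs, which is the part that has to be agreed between collaborating repositories.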
28 Metadata Overview
Z39.50
Information Retrieval (Z39.50): Application Service Definition and Protocol Specification
Enables a client to interact with multiple servers, employing different search engines and different data element formats and definitions, to search databases and retrieve the records that result from the search

Overview of Z39.50:
Current version: 3
The American Library Association provides a very useful annotated bibliography, including citations to freeware:
29 Metadata Overview
Z39.50
- Initiates a session between client and server
- Executes a query from the client against one or more databases on the server
- Creates a result set consisting of records that match the query on one or more query attributes (access points)

Two of the best introductions are older articles (1996) that apparently are no longer available on the web. They are well worth reading because they provide the programming behind the server and client applications, for those wishing to build their own. Anyone wishing copies of either article should contact Grace Agnew at
LeVan, Ralph. “Building a Z39.50 Client” OCLC Online Computer Library Center. (pdf file)
Kunze, John A. “Basic Z39.50 Server Concepts and Creation.” University of California at Berkeley. (pdf file)

Anyone interested in implementing Z39.50 can find software at the following location:
List of commercial and shareware systems:
30 Metadata Overview
Z39.50
- Returns a report on the number of records matching the search
- Returns records (individual records selected by the client) in a format selected by the client
- Primary formats returned: MARC, SUTRS, extending to SQL, Dublin Core, other schema

One of the core components of the Z39.50 standard is the attribute set, that part of the search query that specifies characteristics of the search term.
The Z39.50 Implementers Group (ZIG) has recently approved a new architecture for attribute sets. A new attribute set that adheres to the new architecture is currently in development. The most recent version of the attribute set is the third draft of the Bib-2 attribute set:
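The sequence described on the last two slides (open a session, run a query that creates a result set, report the hit count, then return selected records in a format chosen by the client) can be mimicked with a toy in-memory model. This does not implement the Z39.50 protocol or any real client library; the class, its method names, the sample catalog and the simplified "SUTRS" and "DC" renderings are all assumptions made for illustration.

```python
class ToyZServer:
    """In-memory stand-in for a server that keeps named result sets."""

    def __init__(self, databases):
        self.databases = databases
        self.result_sets = {}
        self.session_open = False

    def init(self):
        # Step 1: a session is initiated between client and server.
        self.session_open = True
        return {"accepted": True}

    def search(self, database, attribute, term, result_set_name="default"):
        # Step 2: the query runs against a database and creates a result set.
        if not self.session_open:
            raise RuntimeError("init must be called before search")
        matches = [
            r for r in self.databases[database]
            if term.lower() in r.get(attribute, "").lower()
        ]
        self.result_sets[result_set_name] = matches
        return len(matches)  # Step 3: only the hit count is reported.

    def present(self, start, count, record_format="SUTRS",
                result_set_name="default"):
        # Step 4: the client selects records (1-based here) and a format.
        selected = self.result_sets[result_set_name][start - 1:start - 1 + count]
        if record_format == "SUTRS":  # rendered as unstructured text
            return ["\n".join(f"{k}: {v}" for k, v in r.items()) for r in selected]
        if record_format == "DC":     # rendered as a structured record
            return [dict(r) for r in selected]
        raise ValueError(f"unsupported format: {record_format}")


catalog = {
    "catalog": [
        {"title": "Metadata Basics", "creator": "Agnew, Grace"},
        {"title": "Digital Rights", "creator": "Agnew, Grace"},
        {"title": "Other Work", "creator": "Someone Else"},
    ]
}
server = ToyZServer(catalog)
server.init()
hit_count = server.search("catalog", "creator", "agnew")
first_as_text = server.present(1, 1, record_format="SUTRS")
second_as_dc = server.present(2, 1, record_format="DC")
print(hit_count, first_as_text, second_as_dc)
```

Separating the search step from the present step is what lets a client inspect the hit count and refine the query before any records are transferred.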
31 Metadata Overview
Z39.50 Version 3
Extends the capabilities of the standard to include:
- Boolean and proximity searching
- Extended services, including saved queries to be periodically re-executed (“SDI”)
- “Explain” facility to allow the client to solicit information about the server and dynamically reconfigure itself.

Z39.50 Variations:
Z-SQL (SQL query language and generic record export in Z39.50)
ZORBA (CORBA object retrieval in Z39.50)
Ward, Nigel, Michael Lawley & Sonya Finnegan. “ZORBA: Information Retrieval Using Distributed Object Technologies”

Use of Dublin Core with Metadata Schema:
LeVan, Ralph. “Dublin Core and Z39.50”
32Metadata Overview Z39.50 Profiles for User Groups: LOC: Access to Digital Collections; LOC: Access to Digital Library Objects; CIMI: Companion profile for museum digital collections and objects; GEO: Geospatial Datasets; Z+SQL: extension to the SQL query language. Profiles: Profile for Access to Online Thesauri: Profile for Access to Digital Library Collections: CIMI Profile for Museum Collections. The Bath Profile is probably the most interesting and useful development in Z39.50 profiles, particularly for APAN: The Bath Profile: An International Z39.50 Specification for Library Applications and Resource Discovery
33Metadata Overview Z39.50 - Limitations: Requires client software and Z39.50-enabled server software (which requires a Z39.50-aware search engine). Most commercial client/server products have not implemented the “Explain” feature in version 3. Requires human collaboration for implementation, particularly at the profile level. Limited primarily to features provided by commercial servers and clients. Two very interesting developments, intended to overcome some of the limitations of Z39.50, are: The Mozilla organization’s project to integrate RDF and Z39.50: DSTC in Australia has developed a Z39.50 implementation that can search more than one server at a time with its “Hot Oil” product, providing simultaneous keyword searching of MELVYL and UNILINC.
34Metadata Overview Z39.50 Limitations: Indexing parameters proprietary to the server database are not shared with the client, so the client cannot override or extend the proprietary search parameters. Databases that are not on a Z39.50 server are invisible.
35Metadata Overview Metadata Registries: Dynamic specification, maintenance and description of metadatabase structures: unambiguous definition of data structures; unambiguous definition and description of relationships between data structures, behaviors of data structures, and integrity constraints on the contents of data structures; semantics (meaning in context) and structure definition. Article on Metadata Registries: Heery, Rachel. “Metadata Corner: Naming Names: Metadata Registries”. Ariadne (e-journal) issue 11: From X.25, Metamodel for Data Registries: Data Registry: “A place to keep characteristics of data that are necessary to clearly describe, inventory, analyze and classify data. A data registry supports data sharing with cross-system and cross-organization descriptions of common units of data. A data registry allows users of shared data to have a common understanding of a unit of data’s meaning, representation and identification.” The current, practical application of metadata registration is to establish concise, unambiguous definitions and context for atomic data elements, as well as the structure and format for the values that represent the data element, for sharing data, primarily in large datasets or technical reports.
36Metadata Overview Metadata Registries: Links/hooks into subordinate registries used to define data content within a metadata element. Mapping of data structures between registries. Should be both eye-readable and able to be interpreted by computer programs, for seamless, unambiguous discovery, query and display across disparate database and search engine structures and to enable intelligent query agents, advanced data mining, etc. The standard for metadata registration is ISO/IEC 11179, Specification and Standardization of Data Elements, a six-part standard: (1) Framework for the Specification and Standardization of Data Elements; (2) Classification for Data Elements; (3) Basic Attributes of Data Elements; (4) Rules and Guidelines for the Formulation of Data Definitions; (5) Naming and Identification Principles for Data Elements; (6) Registration of Data Elements. Part three is currently undergoing extensive revision, which may take as much as 24 months for development, to conform more to a metamodel format.
37Metadata Overview Metadata Registries: Collaborative effort of the Joint Technical Committee 1 (JTC1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Open Forum on Metadata Registries: Data elements within the described dataset are registered in an ISO-compliant registry to: standardize representation of the data element to enable shareability and durability (reuse) of data; establish context and meaning for intelligent retrieval and interpretation of data. A data element is the equivalent of an attribute in a data or object model: the representation of a single property of a class of objects in the natural world. Draft Standards:
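As a rough illustration of what registering a data element involves, the sketch below keeps a definition, a datatype and an optional list of permitted values for each element. It is not an ISO/IEC 11179 implementation; the class names, field names and sample element are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One registered data element (hypothetical, simplified field set)."""
    identifier: str             # unique identifier within the registry
    name: str                   # human-readable name
    definition: str             # concise, unambiguous definition
    datatype: str               # representation of the values
    permitted_values: tuple = ()   # empty tuple means any value is accepted


class Registry:
    """Minimal in-memory registry of data elements."""

    def __init__(self):
        self._elements = {}

    def register(self, element):
        if element.identifier in self._elements:
            raise ValueError(f"already registered: {element.identifier}")
        if not element.definition.strip():
            raise ValueError("a data element needs a definition")
        self._elements[element.identifier] = element

    def lookup(self, identifier):
        return self._elements[identifier]

    def is_valid(self, identifier, value):
        """Check a value against the registered constraints for the element."""
        allowed = self.lookup(identifier).permitted_values
        return not allowed or value in allowed


registry = Registry()
registry.register(DataElement(
    identifier="example:video-format",
    name="Video format",
    definition="The encoding format of a digital video object.",
    datatype="string",
    permitted_values=("D1", "MPEG1", "Quicktime"),
))
```

Because the definition and the value constraints live in the registry rather than in each database, two systems that cite the same identifier can interpret the element the same way.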
38Metadata Overview Metadata Registries: REGGIE - Java applet that dynamically creates metadata according to available online registries. Allows you to enter your own registry, describing, characterizing and constraining all the elements in the set. UK/Australia joint effort. A key objective for creating a metadata registry is to document atomic data (“data elements”) in a reusable form, understandable by humans and open to computer manipulation. For this purpose, it has been suggested that registries be documented in a standard format, such as an XML DTD. This is not currently happening. Many tools exist to generate DTDs. The Center for Advanced Information Systems is developing DTD-Miner, a system that automatically generates a DTD from a given small set of XML documents that do not have any DTD. DTDGenerator will attempt to create a DTD from a well-formed XML document:
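The idea behind such DTD generators can be sketched in a few lines. This is a deliberately naive illustration and does not reproduce how DTD-Miner or DTDGenerator work: the function `infer_dtd` is hypothetical, it treats every child as optional and repeatable, and it ignores mixed content (elements holding both text and child elements).

```python
import xml.etree.ElementTree as ET


def infer_dtd(xml_text):
    """Derive a permissive DTD from one well-formed XML document."""
    root = ET.fromstring(xml_text)
    children = {}     # element name -> child element names, in order first seen
    attributes = {}   # element name -> attribute names, in order first seen
    for elem in root.iter():
        kids = children.setdefault(elem.tag, [])
        for child in elem:
            if child.tag not in kids:
                kids.append(child.tag)
        attrs = attributes.setdefault(elem.tag, [])
        for name in elem.attrib:
            if name not in attrs:
                attrs.append(name)

    lines = []
    for tag, kids in children.items():
        # Elements with no child elements are assumed to hold character data.
        model = "(" + " | ".join(kids) + ")*" if kids else "(#PCDATA)"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for name in attributes[tag]:
            lines.append(f"<!ATTLIST {tag} {name} CDATA #IMPLIED>")
    return "\n".join(lines)


sample = '<record><title lang="en">A</title><creator>B</creator><creator>C</creator></record>'
dtd = infer_dtd(sample)
print(dtd)
```

A DTD inferred from one document only describes what that document happened to contain, which is why the real tools work from a set of sample documents.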
39Metadata Overview XML DTDs, cont’d: There is also the suite of IBM AlphaWorks tools, currently available at no cost for testing and evaluation: DDbE: accepts well-formed XML documents and constructs a DTD; XMI Toolkit: generates DTDs and shares Java objects; XML Parser for Java: validating parser; XML Generator: generates instances of valid XML from a DTD. Diagram labels: query “Anything by Grace Agnew?”; Search Engine; Metadatabase (Scheme = DC <URL of Registry>); REGISTRY (Dublin Core Author defined as: Creator, Contributor).
40Metadata Overview Resource Description Framework: W3C Resource Description Framework (RDF) Model and Syntax Specification (22 February 1999): Provides robust application of metadata in the web environment: a model for unambiguous, schema-independent description of resources. Key Concepts: Resource: Any object uniquely identifiable by a URI (uniform resource identifier). Property-type: Property associated with a resource. Value: Associated with a property type; may be atomic (a string) or another resource, creating a new hierarchy. Some resources for RDF: Dave Beckett’s Resource Description Framework (RDF) Resources: RDF Schema Functional Requirements: Boye, Janus. “RDF: What’s in it for us?” (IRT Org e-newsletter:)
41Metadata Overview RDF: Property types express the relationships of values associated with resources. “Famous Example”: The Author of “Metadata Overview” is Grace Agnew. Diagram: Resource “Metadata Overview”; Property Type “Author”; Value “Grace Agnew”. DC-dot can now generate Dublin Core metadata for a submitted URL and display the metadata in RDF/XML format. It is located at The Goettingen Digitization Center of the Lower Saxony State and University Library, a national supply center for digitization in German libraries, has defined and applies an XML/RDF format for its digitized documents. Following the XML syntax (well formed) and the RDF naming conventions, they are currently transforming the structure into a DTD. Reggie, from the DSTC in Australia, will also export records in abbreviated RDF format:
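The resource / property type / value statement from the “Famous Example” can be written out as RDF/XML with a short script. This is a sketch under assumptions: the resource URI is invented, “Author” is mapped to the Dublin Core `creator` element, and the namespace URIs are the ones published for RDF (1999) and Dublin Core 1.1 rather than anything given on the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def describe(resource_uri, properties):
    """Serialize (property type, value) pairs about one resource as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for property_type, value in properties:
        ET.SubElement(description, f"{{{DC_NS}}}{property_type}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical URI standing in for the presentation itself.
rdf_xml = describe(
    "http://example.org/metadata-overview",
    [("title", "Metadata Overview"), ("creator", "Grace Agnew")],
)
print(rdf_xml)
```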
42Metadata Overview RDF: Enables interoperability among metadata schemes, including the modular use of multiple schemes within a metadata record utilizing the XML namespace facility. Adds machine-interpretable semantics to the encoding, exchange and reuse of structured metadata. Enables automatic negotiation between search engine, metadata record, and metadata registry for powerful, flexible search and retrieval independent of server and client search and retrieval infrastructures (or, at least, it will!). See also the OCLC Cooperative Online Resource Catalog (CORC), a cooperative cataloging venture of web-based resources that provides output in MARC, Dublin Core and Dublin Core/RDF. Home Page: Click “log on to CORC” and you will be automatically logged on as a guest. Search by subject to locate a resource and display that resource in MARC, Dublin Core and Dublin Core/RDF.
43Metadata Application of Dublin Core and RDF for resource description: Dublin Core in HTML - Resides in the Header Element
<html>
<head>
<title>A Thousand Wheels are set in Motion - Georgia Tech Library and Information Center</title>
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="A Thousand Wheels are Set in Motion">
<meta name="DC.Title.Alternative" content="The Building of Georgia Tech at the Turn of the 20th Century, ">
<meta name="DC.Creator.CorporateName" scheme="LCNAF" content="Georgia Institute of Technology Library and Information Center">
<meta name="DC.Subject" scheme="LCSH" content="Georgia Institute of Technology--Buildings">
<meta name="DC.Description" content="This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata format.">
<meta name="DC.Publisher.CorporateName" scheme="LCNAF" content="Georgia Institute of Technology Library and Information Center">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Chritton, Heather">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Crafts, Laurel">
Resources: Guidance on Expressing the Dublin Core within the Resource Description Framework (RDF). (Dublin Core Metadata Initiative. Draft Proposal): Application of RDF for Extensible Dublin Core Metadata (a bit old but still valuable): layman.html Full Metadata record:
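Header tags of this kind are easy to generate from a list of fields, which also takes care of quoting. The following is a minimal sketch; the function name `dc_meta_tags` and the (name, content, scheme) tuple layout are assumptions for the example, and the schema link simply repeats the one used on the slide.

```python
from html import escape


def dc_meta_tags(fields):
    """Build Dublin Core <meta> tags for an HTML header.

    fields: iterable of (element name, content, scheme or None).
    """
    tags = ['<link rel="schema.DC" href="http://purl.org/dc">']
    for name, content, scheme in fields:
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        # escape() also converts double quotes, so content cannot break the attribute.
        tags.append(
            f'<meta name="DC.{escape(name)}"{scheme_attr} content="{escape(content)}">'
        )
    return "\n".join(tags)


header = dc_meta_tags([
    ("Title", "A Thousand Wheels are Set in Motion", None),
    ("Subject", "Georgia Institute of Technology--Buildings", "LCSH"),
    ("Contributor.PersonalName", "Chritton, Heather", "LCNAF"),
])
print(header)
```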
44Metadata Overview RDF / Dublin Core in XML
<?xml:namespace href="http://www.w3c.org/RDF/" as="RDF"?>
<?xml:namespace href="http://purl.org/RDF/DC" as="DC"?>
<?xml:namespace href="http://loc.gov/LCNAF" as="LCNAF"?>
<?xml:namespace href="http://loc.gov/LCSH" as="LCSH"?>
<RDF:RDF>
<RDF:Description RDF:HREF="http://purl.org/metadata/dublin_core_elements">
<DC:Title>A Thousand Wheels are Set in Motion</DC:Title>
<DC:Title.Alternative>The Building of Georgia Tech at the Turn of the 20th Century, </DC:Title.Alternative>
<DC:Creator.CorporateName>
<RDF:Description>
<LCNAF:CorporateName>Georgia Tech Library and Information Center</LCNAF:CorporateName>
</RDF:Description>
</DC:Creator.CorporateName>
45Metadata Overview
<DC:Subject>
<RDF:Description>
<LCSH:CorporateName>Georgia Institute of Technology--Buildings</LCSH:CorporateName>
</RDF:Description>
</DC:Subject>
<DC:Description>This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata (SIM) format.</DC:Description>
<RDF:Seq>
<RDF:LI><LCNAF:PersonalName>Chritton, Heather</LCNAF:PersonalName></RDF:LI>
<RDF:LI><LCNAF:PersonalName>Crafts, Laurel</LCNAF:PersonalName></RDF:LI>
</RDF:Seq>
46Metadata Overview Notes: 1. RDF shows three types of relationships among collected resources: Sequence (specified ordering of elements); Bag (all members of equal importance); Alternatives (choice between members). In this example, I am specifying among contributors that Heather Chritton, the web page developer, appears first among contributors and Laurel Crafts, the digital image creator, appears second. Other contributors follow (text creation, metadata creation, indexing, etc.) in specified order in the complete record. I use the RDF Sequence list to establish this fixed contributor order. 2. LCSH (Library of Congress Subject Headings) and LCNAF (Library of Congress Name Authority File) do not currently reside on web pages at a URL. The URLs provided are for illustration only.
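The three container types in note 1 can be built programmatically. The sketch below uses the element names from the RDF syntax (`Seq`, `Bag`, `Alt`, `li`) with the published RDF namespace URI, which is an assumption relative to the illustrative URLs on the slides; the helper `container` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def container(kind, members):
    """Build an RDF container: Seq (ordered), Bag (unordered) or Alt (alternatives)."""
    if kind not in ("Seq", "Bag", "Alt"):
        raise ValueError(f"unknown container type: {kind}")
    node = ET.Element(f"{{{RDF_NS}}}{kind}")
    for member in members:
        # Members are written in the order given; only Seq gives that order meaning.
        ET.SubElement(node, f"{{{RDF_NS}}}li").text = member
    return node


ET.register_namespace("rdf", RDF_NS)
contributors = container("Seq", ["Chritton, Heather", "Crafts, Laurel"])
print(ET.tostring(contributors, encoding="unicode"))
```

Choosing `Seq` rather than `Bag` is what tells a consuming program that the first contributor is meant to appear first.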
47Metadata Overview XML: Extensible Markup Language, a subset of SGML (Standard Generalized Markup Language), provides the ability to define elements within a web document. XML documents have a logical and a physical structure. Each unit of an XML document is an entity. Entities are defined within the document in relation to each other. The logical and physical structures of the document include declarations, elements, comments, character references and processing instructions. Structural relationship is provided through nesting. Excellent XML schema documents are now available: And a Schema Primer! Good Websites for XML Information: XMLInfo: XML Zone:
48Metadata Overview XML: XML display is governed by an attached style document, formulated in CSS (Cascading Style Sheets) or XSL (Extensible Style Language), that provides rules for display. Styles can be applied to single elements as well as to the entire document. More than one style sheet or style document can be provided for a document or element, with precedence rules governing the given display. Using XSL and CSS Together: W3C Note 11-September 1998: Introduction to XSL (slide show): Formatting XML with XSL (tutorial):
49Metadata Overview DTD: The Document Type Definition provides a formally defined structure, vocabulary and syntax for an XML document type. Documents are validated against a DTD to ensure that nested structure and semantic constraints are followed, so that meaning is consistent across documents. DCD (Document Content Description): A semantic superset of XML DTDs, intended to be conformant with the RDF Model and Syntax Specification. Describes an XML vocabulary for schemas, for specifying object classes. Based on elements (RDF property types) and attributes. Supports RDF vocabulary and constructs. Web Developer’s Virtual Library (WDVL): XML Specifications, Proposals and Vocabularies (master list of all XML-based specifications and proposals submitted to W3C). To see a metadata record for video expressed in RDF, XML DTD, DCD and SOX, see Hunter, Jane and Armstrong, Liz. “A Comparison of Schemas for Video Metadata Representation”
50Metadata Overview SOX (Schema for Object-Oriented XML): An alternative to DTD for validating XML documents. Supports scalar (numeric) datatypes, enumerated datatypes (values enumeration) and format datatypes. An expanded namespace facility allows objects from any identifiable namespace to be used to build the document.
51Metadata Overview Role of the Database: A database that can be parsed and reported to a validated XML metadata format, as well as other metadata syntaxes, provides a robust space for metadata development. The database also reports to any XML document type and hooks into applications via APIs, to support unique user needs. Diagram labels: ORACLE DATABASE; SUBJECT-SPECIFIC WEB RESEARCH TOOL; MARC-BASED CATALOG; PERSONAL RESEARCH SPACE; COLLABORATIVE RESEARCH SPACE; WEB-BASED COURSEWARE APPLICATION. A good bibliography on the creation of digital libraries: Klemperer, Katharina and Chapman, Stephen. Digital Libraries: A Selected Resource Guide:
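Reporting database rows out to XML can be sketched with nothing more than a standard library. The example uses an in-memory SQLite database purely as a stand-in for the Oracle database in the diagram; the table name, column names and output element names are assumptions made for illustration, and the output is not validated against any DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_records(conn):
    """Report every row of the (hypothetical) metadata table as an XML record."""
    root = ET.Element("records")
    cursor = conn.execute("SELECT title, creator, subject FROM metadata ORDER BY id")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            if value is not None:   # skip empty fields instead of writing empty elements
                ET.SubElement(record, column).text = value
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (id INTEGER PRIMARY KEY, title TEXT, creator TEXT, subject TEXT)"
)
conn.execute(
    "INSERT INTO metadata (title, creator, subject) VALUES (?, ?, ?)",
    (
        "A Thousand Wheels are Set in Motion",
        "Georgia Institute of Technology Library and Information Center",
        "Georgia Institute of Technology--Buildings",
    ),
)
xml_report = export_records(conn)
print(xml_report)
```

Keeping the metadata in the database and generating each syntax on request means a new output format is a new report, not a re-encoding of the records.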
52Metadata Overview Last Step: Data Retrieval. Data storage, access and delivery architecture should be open, standards-based, and hardware and software independent, providing users across platforms with a common, consistent interface and underlying storage structure for efficient retrieval, display, storage and use of digital information. Data architecture should support a well-defined, widely available security system to validate authenticity of users and provide data for a variety of uses according to a scalable authorization hierarchy. Interesting Work in the Commercial Center: Enterprise Information Portal: Hummingbird Enterprise Software and Solutions: or
53Metadata Overview Last Step: Data Retrieval. Data architecture should support data as objects for scalable, extensible access, with sophisticated and flexible support for object relationships, particularly to support different physical instantiations of identical data, e.g. a digital video object as D1, MPEG1, Quicktime, etc. CORBA (Common Object Request Broker Architecture): emerging architecture for open distributed object computing. Intended to provide transparent access to applications and databases, regardless of the hardware and software infrastructure at each end of the transaction. CORBA Specifications (postscript format): Object Management Group (OMG) Home Page: Keahey, Kate. A Brief Tutorial on CORBA. Brando, Thomas J. Interoperability and the CORBA Specification: Interesting Implementation: OASIS (Open Architecture Scientific Information System): OASIS Overview:
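One way to picture “different physical instantiations of identical data” is a single logical object that knows about each of its encodings. The sketch below is only a data-structure illustration, not CORBA: the class, its method names, the identifier and the file locations are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """One logical object with several physical instantiations (hypothetical model)."""
    identifier: str
    title: str
    instantiations: dict = field(default_factory=dict)   # format name -> location

    def add_instantiation(self, fmt, location):
        self.instantiations[fmt] = location

    def best_for(self, accepted_formats):
        """Return (format, location) for the first format the client accepts, else None."""
        for fmt in accepted_formats:
            if fmt in self.instantiations:
                return fmt, self.instantiations[fmt]
        return None


video = DigitalObject("example:video-001", "Campus tour")
video.add_instantiation("D1", "/archive/master/video-001.d1")       # archival master
video.add_instantiation("MPEG1", "/delivery/video-001.mpg")         # delivery copy
choice = video.best_for(["Quicktime", "MPEG1"])
```

The metadata describes the object once, and the choice of instantiation is made at retrieval time according to what the requesting client can use.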
54Metadata Overview Putting It All Together: A Digital Archive Architecture. Reference Model for Open Archival Information Systems (OAIS), developed by a US ISO archiving group under ISO TC20/SC13 and the Consultative Committee for Space Data Systems (CCSDS). This model has recently been released for formal ISO and CCSDS review. An electronic version of the OAIS Reference Model can be found at Interesting Draft Implementation of an OAIS-compatible archive: Holdsworth, D. Proposed Architecture for CEDARS Demonstrator:
55Reference Model for Open Archival Information Systems (OAIS) EXTERNAL DATA FLOW DIAGRAM