Dublin Core Metadata Initiative

Similar presentations
Dublin Core in Multiple Languages Thomas Baker Sixth Dublin Core Workshop Library of Congress, Washington DC Tuesday, 3 November 1998.

Distributed Systems Architectures
Chapter 7 System Models.
Copyright © 2003 Pearson Education, Inc. Slide 8-1 Created by Cheryl M. Hughes, Harvard University Extension School Cambridge, MA The Web Wizards Guide.
Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
A centre of expertise in digital information management The OAI Protocol for Metadata Harvesting Andy Powell UKOLN,
doi> Digital Object Identifier: overview
1 Web Search Environments Web Crawling Metadata using RDF and Dublin Core Dave Beckett Slides:
T. Baker / 27 March 2000 A Registry for Dublin Core Thomas Baker, GMD IuK 2000: "Information, Knowledge and Knowledge Management Darmstadt, 27 March 2000.
T. Baker / 23 Sep 2000 Dublin Core Qualifiers and A Grammar for Dublin Core Thomas Baker DC-8, National Library of Canada, Ottawa 4 October 2000.
DC8 Registries Breakout. Goals of the session Discuss and clarify : Requirements for registry Framework for policy Relate issues raised to EOR prototype.
DC2001, Tokyo DCMI Registry : Background and demonstration DC2001 Tokyo October 2001 Rachel Heery, UKOLN, University of Bath Harry Wagner, OCLC
DC Architecture WG meeting Monday Sept 12 Slot 1: Slot 2: Location: Seminar Room 4.1.E01.
OLAC Metadata Steven Bird University of Melbourne / University of Pennsylvania OLAC Workshop 10 December 2002.
Dublin Core Metadata Tutorial July 9, 2007 Stuart Weibel Senior Research Scientist OCLC Programs and Research.
18 Copyright © 2005, Oracle. All rights reserved. Distributing Modular Applications: Introduction to Web Services.
A centre of expertise in digital information management IMS Digital Repositories Interoperability Andy Powell UKOLN,
PwC SCHEMAS Forum for metadata schema implementers The SCHEMAS project and metadata ETB Workshop, London, 9-10 January 2001 Michael Day,
Metadata vocabularies and ontologies Dr. Manjula Patel Technical Research and Development
Community, Consensus, and the Trajectory of Progress Reflections on the Dublin Core experience and what it tells us about the future Stuart Weibel OCLC.
UKOLN, University of Bath
An overview of collection-level metadata Applications of Metadata BCS Electronic Publishing Specialist Group, Ismaili Centre, London, 29 May 2002 Pete.
An ontology server for the agentcities.NET project Dr. Manjula Patel Technical Research and Development
Andy Powell, Eduserv Foundation Feb 2007 The Dublin Core Abstract Model – a packaging standard?
Dublin Core, OAI-PMH and the eBank UK schema Monica Duke UKOLN, University of Bath, UK UKOLN is supported by:
February Harvesting RDF metadata Building digital library portals with harvested metadata workshop EU-DL All Projects concertation meeting DELOS.
The Discovery Landscape in Crystallography UKOLN is supported by: Monica Duke UKOLN, University of Bath, UK – eBank UK project A centre.
Dr. Alexandra I. Cristea CS 253: Topics in Database Systems: C3.
© Tally Solutions Pvt. Ltd. All Rights Reserved Shoper 9 License Management December 09.
Welcome. © 2008 ADP, Inc. 2 Overview A Look at the Web Site Question and Answer Session Agenda.
Configuration management
Collections and services in the information environment JISC Collection/Service Description Workshop, London, 11 July 2002 Pete Johnston UKOLN, University.
Copyright © 2012, Elsevier Inc. All rights Reserved. 1 Chapter 7 Modeling Structure with Blocks.
Encoding DC in (X)HTML, XML and RDF Andy Powell UKOLN, University of Bath, UK UKOLN is supported by: Tutorial.
1 An inference engine for the semantic web Naudts Guido Student at the Open University Netherlands.
RefWorks: The Basics October 12, What is RefWorks? A personal bibliographic software manager –Manages citations –Creates bibliographies Accessible.
Introduction Peter Dolog dolog [at] cs [dot] aau [dot] dk Intelligent Web and Information Systems September 9, 2010.
1 DIGITAL INTERACTIVE MEDIA Wednesday, October 28, 2009.
Developing a Metadata Exchange Format for Mathematical Literature David Ruddy Project Euclid Cornell University Library DML 2010 Paris 7 July 2010.
DC 2004, Shanghai, October 2004, D. Hillmann, Slide 1 An Introduction to Dublin Core Diane I. Hillmann National Science Digital Library DC2004 Tutorial,
© Tefko Saracevic, Rutgers University. Metadata considerations for digital libraries.
RDF Kitty Turner. Current Situation there is hardly any metadata on the Web search engine sites do the equivalent of going through a library, reading.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Everything Around the Core Practices, policies, and models around Dublin Core Thomas Baker, Fraunhofer-Gesellschaft DC2004, Shanghai Library
Stuart Weibel OCLC, Inc. October, 1997 Dublin Core Metadata Stuart Weibel Consulting Research Scientist OCLC Office of Research purl.org/net/weibel October.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
Metadata and identifiers for e-journals Copenhagen Juha Hakala Helsinki University Library
Metadata Standards and Applications 5. Applying Metadata Standards: Application Profiles.
The role of metadata schema registries XML and Educational Metadata, SBU, London, 10 July 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN.
The Semantic Web Service Shuying Wang Outline Semantic Web vision Core technologies XML, RDF, Ontology, Agent… Web services DAML-S.
Metadata Modularization Concepts and Tools Carl Lagoze CS
Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – Carl Lagoze – Cornell University.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Lifecycle Metadata for Digital Objects November 1, 2004 Descriptive Metadata: “Modeling the World”
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Registry of MEG-related schemas MEG BECTa, Coventry, 17 July 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported by:
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Introduction to Metadata
Some Options for Non-MARC Descriptive Metadata
Presentation transcript:

Dublin Core Metadata Initiative. Stuart Weibel, OCLC Office of Research; Director, Dublin Core Metadata Initiative

Presentation Outline: Introduction to Metadata; Dublin Core Metadata Initiative; Metadata Registries; Syntax Alternatives for Web Metadata; A Few Strategic Applications.

Introduction to Metadata

The Web as an Information System: Search systems are motivated by business models, not user needs. Index coverage is unpredictable and limited. Too much recall, too little precision. Index spam abounds. Resources (and their names) are volatile. Archiving is presently unsolved. Authority and quality of service are spotty. Managing intellectual property rights is hard.

Metadata: Part of a Solution. Metadata is structured data about data. It supports the organization and management of content, resource discovery, directing content into channels, and automated discovery and manipulation.

The Internet Commons includes multiple communities: commerce, home pages, geospatial data, libraries, museums, scientific data, and whatever else...

Interoperability requires conventions about: semantics (the meaning of the elements); structure (human-readable and machine-parseable); syntax (grammars to convey semantics and structure).

Haven't we done metadata already? The MARC family of standards is the single most successful resource description standard in the world.

What's wrong with this model on the Web? It is expensive and complex, and it requires professional catalogers. It is biased towards bibliographic artifacts and fixed resources. It handles resource evolution and other resource relationships incompletely. It is Anglo-centric: MARC 21 accounts for ¾ of MARC records, but there are other varieties.

Dublin Core Metadata Initiative

History of the Dublin Core: 1994: simple tags to describe Web pages. 1995: the Dublin Core is one of many vocabularies needed (the "Warwick Framework"). 1996: the 13 Dublin Core elements are expanded to 15, appropriate for text and images. 1997: the Warwick Framework (WF) needs formal expression in a Resource Description Framework (RDF). 2000: the Dublin Core Metadata Initiative recommends qualifiers and broadens its organizational scope beyond the Core.

Dublin Core Metadata Initiative: The mission of DCMI is to make it easier to find resources using the Internet through the following activities: developing metadata standards for discovery across domains (for example, the Dublin Core); defining frameworks for the interoperation of metadata sets; facilitating the development of community- or discipline-specific metadata sets.

DCMI Organizational Structure: Board of Trustees; Executive Director; Usage Board; Directorate; Managing Director; Advisory Board; DCMI Subscribers; DCMI activity areas: Standards Development WGs, Infrastructure WGs, User Support and Education WGs, and Liaison.

DCMI Activities: standards development and maintenance; metadata registry and infrastructure; technical working groups and periodic workshops; tutorial materials and user guides; education and training; open source software; liaisons with other standards or user communities.

Unqualified Dublin Core is the pidgin metadata language. Metadata is language. Dublin Core is a small and simple language (a pidgin) for finding resources across domains using the Internet. Speakers of different languages naturally "pidginize" to communicate.

Qualifiers and Domain-specific Extensions: The Dublin Core architecture supports more sophisticated metadata solutions through the addition of qualifiers, domain-specific extensions, and application profiles involving mixed namespaces (more on this later). Increased sophistication comes at the cost of some degree of interoperability.

Varieties of Qualifiers: Value Encoding Schemes. A scheme says that the value is either a term from a controlled vocabulary (e.g., Library of Congress Subject Headings) or a string formatted in a standard way (e.g., the ISO 8601 date "2001-05-02" means May 2, not February 5). Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
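A minimal Python sketch of how software might act on an encoding scheme: when the scheme is recognized (here only a hypothetical `"ISO8601"` label for YYYY-MM-DD dates), the value is parsed; otherwise the raw string is kept, since it should still be usable for discovery. The function name and scheme labels are illustrative assumptions, not part of any DCMI specification.

```python
from datetime import date
from typing import Optional


def interpret_value(value: str, scheme: Optional[str]) -> object:
    """Interpret a DC value according to a (hypothetical) encoding-scheme label.

    Only ISO 8601 calendar dates (YYYY-MM-DD) are handled in this sketch.
    For any other or unknown scheme the raw string is returned unchanged,
    because it should still be usable for resource discovery.
    """
    if scheme == "ISO8601":
        # Year-month-day order: "2001-05-02" is May 2, not February 5.
        return date.fromisoformat(value)  # raises ValueError if malformed
    return value


d = interpret_value("2001-05-02", "ISO8601")
print(d.month, d.day)
```

Falling back to the plain string, rather than rejecting the record, mirrors the slide's point that an unknown scheme should not make the value unusable.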

Varieties of Qualifiers: Element Refinements. Refinements make the meaning of an element narrower or more specific: a Date Created versus a Date Modified; an IsReplacedBy Relation versus a Replaces Relation. If your software does not understand the qualifier, you can safely ignore it.

A Grammar of Dublin Core (http://www.dlib.org/dlib/october00/baker/10baker.html): By design not as subtle as mother tongues, but easy to learn and useful in practice. Pidgins have small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives) and simple grammars: sentences (statements) follow a simple, fixed pattern...

The pattern is "Resource has property X". The subject (the resource) and the verb ("has") are implied; the property is one of the 15 elements (DC:Creator, DC:Title, DC:Subject, DC:Date...); X is the property value (an appropriate literal). Qualifiers act as optional adjectives: one can modify the property, another can modify the value.

Examples: Resource has Subject "Languages -- Grammar" (scheme: LCSH). Resource has Date "2000-06-13" (scheme: ISO8601; refinement: Revised).
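The statement pattern can be modelled as a small data structure. This is a sketch, not DCMI's data model: `Statement`, its field names, and the rendering format are hypothetical; only the list of fifteen element names comes from the Dublin Core element set.

```python
from dataclasses import dataclass
from typing import Optional

# The fifteen unqualified Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


@dataclass(frozen=True)
class Statement:
    """One 'Resource has Property Value' sentence with optional qualifiers."""
    prop: str                          # one of the 15 elements (the noun)
    value: str                         # the property value (a literal)
    refinement: Optional[str] = None   # adjective on the property, e.g. "Revised"
    scheme: Optional[str] = None       # adjective on the value, e.g. "ISO8601"

    def __post_init__(self) -> None:
        if self.prop.lower() not in DC_ELEMENTS:
            raise ValueError(f"not one of the 15 DC elements: {self.prop}")

    def sentence(self) -> str:
        # The subject ("Resource") and verb ("has") are implied in DC.
        prop = f"{self.refinement} {self.prop}" if self.refinement else self.prop
        text = f'Resource has {prop} "{self.value}"'
        return f"{text} ({self.scheme})" if self.scheme else text


print(Statement("Subject", "Languages -- Grammar", scheme="LCSH").sentence())
```

Keeping the qualifiers as optional fields means a statement is still complete without them, which is the property the dumb-down principle relies on.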

Dumb-Down Principle for Qualifiers: The fifteen elements should be usable and understandable with or without the qualifiers. Qualifiers refine meaning (but may be harder to understand); nouns can stand on their own without adjectives. If your software encounters an unfamiliar qualifier, look it up, or just ignore it!
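One way to apply the dumb-down principle in software, assuming qualified element names are written with a dotted suffix such as `Date.Created` (a naming convention assumed here for illustration): strip the qualifier, keep the value, and skip anything that does not reduce to one of the fifteen elements. The function name is hypothetical.

```python
from typing import List, Tuple

SIMPLE_DC = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Reduce (possibly qualified) element/value pairs to simple Dublin Core."""
    simple = []
    for name, value in pairs:
        # Assumed convention: "Date.Created" -> base element "date".
        element = name.lower().split(".", 1)[0]
        if element in SIMPLE_DC:
            simple.append((element, value))  # qualifier ignored, value kept
        # Non-DC elements (e.g. local extensions) are skipped entirely.
    return simple


print(dumb_down([("Date.Created", "2000-06-13"), ("local.shelfmark", "A1")]))
```

The value survives unchanged; only the adjective is lost, so a Date Created still reads correctly as a Date.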

Using DC with other vocabularies: Specialized application profiles may need to use general-purpose Dublin Core elements; use elements from another, more domain-specific standard; narrow standard definitions of DC elements for specific local uses; or invent local elements outside the scope of existing standards.

What is an Application Profile? A metadata schema incorporating a set of elements from one or more metadata element sets; a set of policies defining how the elements should be applied to the domain of the application; a set of guidelines that make the policies concerning elements explicit.

Application Profiles and Namespaces: Namespaces declare terms and definitions (the Dublin Core namespace = the Dublin Core standard). Application profiles re-use terms from one or more namespaces: they may package terms from multiple namespaces, may adapt definitions to local purposes, and may include locally defined namespaces. All terms must be defined in namespaces.

Multiple Namespace Fragment (the xmlns declarations belong on an enclosing element, not shown):
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:co="http://purl.org/rss/1.0/modules/company/"
<dc:publisher>The O'Reilly Network</dc:publisher>
<dc:creator>Rael Dornfest</dc:creator>
<dc:rights>Copyright © 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
<dc:date>2000-01-01T12:00+00:00</dc:date>
<dc:description>XML is placing increasingly heavy loads on the existing technical infrastructure of the Internet.</dc:description>
<co:name>XML.com</co:name>
<co:market>NASDAQ</co:market>
<co:symbol>XML</co:symbol>
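To see the two namespaces at work, the fragment can be parsed with Python's standard `xml.etree.ElementTree`, which reports each element tag as `{namespace-URI}local-name`. The enclosing `<item>` element is an assumption added only to make the fragment well-formed, and `dc:description` is omitted for brevity; `terms_by_namespace` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Fragment wrapped in an assumed <item> element so that it is well-formed.
# Note that "&" must be escaped as "&amp;" inside XML text.
FRAGMENT = """<item xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:co="http://purl.org/rss/1.0/modules/company/">
  <dc:publisher>The O'Reilly Network</dc:publisher>
  <dc:creator>Rael Dornfest</dc:creator>
  <dc:rights>Copyright © 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
  <dc:date>2000-01-01T12:00+00:00</dc:date>
  <co:name>XML.com</co:name>
  <co:market>NASDAQ</co:market>
  <co:symbol>XML</co:symbol>
</item>"""


def terms_by_namespace(xml_text):
    """Group (local name, text) pairs by the namespace URI of each child."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for child in root:
        # ElementTree writes namespaced tags as "{uri}local".
        ns, local = child.tag[1:].split("}", 1)
        groups[ns].append((local, child.text))
    return dict(groups)


for ns, terms in terms_by_namespace(FRAGMENT).items():
    print(ns, [name for name, _ in terms])
```

The prefixes `dc:` and `co:` are only local abbreviations; it is the namespace URI that tells software which vocabulary each term comes from.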

Adapting standard definitions to local uses: In the Dublin Core namespace, DC:Title is the machine-readable name of the element, and "Title: A name given to the resource" is its human-readable name and definition. In the Collection Description Profile (UKOLN), DC:Title is the name reused from the DC namespace, but the definition is modified for the application context: "Title: A name given to the collection". Local adaptations should not change the semantics of the element definition, but rather clarify it within a local context.

Namespaces and Translation: Dublin Core has been translated into 26 languages. Machine-readable tokens are shared by all translations, while human-readable labels are defined in different languages. Translations are distributed and maintained in many countries, and will eventually be linked in the DCMI registry.

One concept identifier with labels in many languages: dc:creator has the rdfs:label "Verfasser" [German], "Creator" [English], and "Pencipta" [Indonesian].
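A sketch of how a registry or user interface might choose a display label for one shared identifier, falling back to English when no label exists for the requested language. The data structure, function name, and fallback policy are assumptions; the labels are those on the slide, keyed by the language tags `de`, `en`, and `id`.

```python
from typing import Dict

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# One machine-readable identifier, many human-readable labels.
LABELS: Dict[str, Dict[str, str]] = {
    DC_CREATOR: {"de": "Verfasser", "en": "Creator", "id": "Pencipta"},
}


def label_for(term_uri: str, lang: str, fallback: str = "en") -> str:
    """Return the label in `lang`, else the fallback language, else the URI."""
    labels = LABELS.get(term_uri, {})
    return labels.get(lang) or labels.get(fallback) or term_uri


print(label_for(DC_CREATOR, "de"))
print(label_for(DC_CREATOR, "fi"))  # no Finnish label here: falls back
```

Because every translation hangs off the same identifier, records created in German and Indonesian interfaces still share one machine-readable element.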

Metadata Registries: Dictionaries of Metadata Terms and Usage

Metadata is language: Metadata schemas are languages for making statements about resources, e.g. Book has Title "Gone with the Wind"; Web page has Publisher "Springer Verlag". Vocabulary terms (elements) are defined in standards like Dublin Core; metadata grammars constrain the statements and data models one can form.

Metadata languages are multilingual: Metadata is not a spoken language. The words of metadata ("elements") are symbols that stand for concepts expressible in multiple natural languages, and standards may have dozens of translations. But are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?

Languages Evolve With Use: Inevitably, languages resist stability. People stretch official definitions; implementers misunderstand the intended meaning or use of elements; implementers coin local terms and extensions. If the application does not fit the standard, the standard is often "customized" to fit the application.

How do we manage this evolution? How can we monitor the usage of a language that is never spoken and rarely published in a way that can be harvested? How can dictionary editors help a metadata language evolve and grow in response to usage? How can this evolution occur across (human) languages?

RDF Schemas (RDFS), a W3C specification: a dictionary format for metadata terms, using a simple XML format for namespaces, terms, and definitions. Example: "Title" (Dublin Core) has the human-readable label and definition "Title: A name given to the resource." and the unique, machine-readable identifier dc:title. RDFS supports cross-references between multiple language renditions of a namespace, between terms in related standards, and between local adaptations and related standards.

Registries can function as dictionaries: Metadata dictionaries can help metadata vocabularies evolve more like other human languages: not just top-down, like traditional standards, but also bottom-up, in response to usage.

DCMI Metadata Registry: stores official metadata element definitions in a central database or repository. It supports managing a namespace (as a standards agency), publishing qualifiers as they become available, with version control, and managing translations of the standard in multiple languages. Eventually: a user guide interface; support for standardisation processes (peer review); downloadable input to software tools for generating, editing, and validating DC metadata.

Dictionaries as a tool for harmonization: Knowledge of how other projects are using standards avoids "reinventing the wheel" and helps information providers harmonize their schemas for improved access within domains: between countries (Nordic Metadata Project), preprint repositories (Open Archives Initiative), subject gateways (Renardus), theses and dissertations (NDLTD), and mathematics and physics (MathNet, PhysNet).

A global registry infrastructure? The RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web. Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries. Can a well-managed registry infrastructure allow this language to evolve, with flexible innovation in usage alongside more stable standards?

EOR, an RDF Toolkit for Schema Infrastructure: EOR harvests RDF Schemas distributed on multiple Web servers and creates a huge database of schemas for searching. Its Web interface functions as a "metadata browser": click on cross-references between linked terms. Downloadable as open source software: http://eor.dublincore.org/

EOR Toolkit: integrates RDF components for supporting search services, topic maps, site maps, annotation environments, and semantic metadata registries. Base-level functionality of this toolkit includes: creation, deletion, and management of RDF databases; the ability to infuse RDF instance data into RDF databases; the ability to search RDF databases; generic interface design capabilities to support RDF applications. Its Web interface functions as a "metadata browser". Open source: http://eor.dublincore.org

Syntax Alternatives for Web Metadata

Syntax Alternatives: HTML. Advantages: a simple mechanism (META tags embedded in content); widely deployed infrastructure (the Web); public domain tools. Disadvantages: limited structural richness (won't easily support hierarchical, tree-structured data or entity distinctions).
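As an illustration of the META-tag approach, the sketch below renders flat (element, value) pairs as HTML using, to the best of our knowledge, the DCMI convention of a `schema.DC` link element plus `DC.`-prefixed meta names. The function name is hypothetical. The output is a flat list of name/value pairs, which is exactly the limitation noted above: there is no way to nest structure or distinguish entities (for instance, a creator's name from the creator's affiliation).

```python
from html import escape
from typing import List, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(pairs: List[Tuple[str, str]]) -> str:
    """Render (element, value) pairs as DC-in-HTML <link>/<meta> tags."""
    # The link element binds the "DC" prefix to the Dublin Core namespace.
    lines = [f'<link rel="schema.DC" href="{DC_NS}">']
    for element, value in pairs:
        # Escape &, <, > and quotes so the value is safe inside an attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


print(dc_meta_tags([("title", "Dublin Core & RDF"), ("creator", "Stuart Weibel")]))
```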

Syntax Alternatives: XML. The standard for networked text and data, with widespread tool support: parser APIs (DOM and SAX); extensibility (namespaces); type definition (XML Schema); transformation and rendering (XSLT); rich linking semantics (XLink).

XML DTDs: They work, but… DTDs are a stopgap measure. Extensibility is problematic. There are many ways to "say" the same thing (too much flexibility). Interoperability must be pre-coordinated. DTDs cannot evolve gracefully. Granularity is at the level of the DTD.

XML Schemas: a rich XML-based language for expressing type semantics, replacing the arcane and limited DTD (which originates in SGML). Facilities: data typing (both complex and primitive), constraints, and defaults.

Syntax Alternatives: RDF. RDF (Resource Description Framework) is the instantiation of the Warwick Framework on the Web: a rich data model supporting notions of distinct entities and properties, with syntax expressed in XML. Granularity is at the level of the element, not the entire schema as with XML DTDs.

RDF Components: RDF Model and Syntax (from the RDF Model and Syntax WG): a formal data model and a syntax for interchange of data. RDF Schema (RDFS): a type system (schema model).

RDF Schemas: declaration of vocabularies, i.e. properties defined by a particular community, and characteristics of properties and/or constraints on corresponding values. Schema type system basic types: Property, Class, SubClassOf, Domain, Range. The type system is minimal (but extensible) at this time, to minimize significant clashes with the typing system designed by the XML Schema WG. It is expressible in the RDF model and syntax.
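To make Domain and Range concrete, the sketch below applies the two corresponding RDFS inference rules once to a set of triples: if property P has domain C, any subject of P is of type C; if P has range C, any object of P is of type C. The `ex:` vocabulary is hypothetical, triples are plain string tuples rather than a real RDF library's types, and each property is assumed to have at most one declared domain and one declared range.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Return the rdf:type triples entailed by rdfs:domain and rdfs:range."""
    # Assumption: at most one domain and one range per property.
    domains = {s: o for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    inferred: Set[Triple] = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))  # subject typed by domain
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))   # object typed by range
    return inferred


data = {
    ("ex:author", RDFS_DOMAIN, "ex:Book"),
    ("ex:author", RDFS_RANGE, "ex:Person"),
    ("ex:b1", "ex:author", "ex:p1"),
}
print(sorted(infer_types(data)))
```

Because the schema statements are themselves triples, they can be stored and exchanged with the data they describe, which is what "expressible in the RDF model and syntax" buys.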

RDF in Summary: RDF metadata transmission: embedded (e.g. <META>), transmitted with the resource (HTTP), or from a trusted third party. RDF data model: supports consistent encoding, exchange, and processing of metadata, which is critical when aggregating data from multiple sources. RDF Schema: declare, define, and reuse vocabularies.
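A minimal sketch of RDF's XML syntax carrying Dublin Core: the standard library's `ElementTree` can emit an RDF/XML description in which each DC element becomes a property of one resource identified by `rdf:about`. The function name and example URI are assumptions; a production system would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Serialize with the conventional rdf: and dc: prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_to_rdfxml(about, pairs):
    """Describe one resource (by URI) with simple DC element/value pairs."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": about})
    for element, value in pairs:
        # Each DC element is an RDF property whose value is a literal.
        ET.SubElement(desc, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_to_rdfxml("http://example.org/page",
                   [("title", "Example"), ("creator", "A. Author")]))
```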

Unresolved Issues Concerning RDF and XML Schemas: RDF Schemas and XML Schemas have overlapping functionality. XML Schemas provide strong data typing, but also support semantic specifications; RDF is focused on a semantic data model and extensible namespace management. Resolution of the overlap and market acceptance will determine the future of each. The Semantic Web Activity in the W3C is chartered to address such issues: http://www.w3.org/2001/sw

A Few Strategic Projects

Open Archives Initiative (http://www.openarchives.org): protocols to support alternative scholarly publishing solutions. Federated repositories for ePrints, libraries, and publishers. OAI archives may contain full text or surrogates (metadata). Metadata harvesting protocols.

OAI Metadata: OAI archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata, and therefore archives should implement the basic Open Archives Metadata Set.

OAI Metadata Solutions: adoption of the unqualified Dublin Core Element Set as required metadata. Support for parallel metadata sets is maintained: EPMS (e-print community) and others from the research library and museum communities.
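On the harvesting side, a sketch of building a request for unqualified Dublin Core records. The parameter names (`verb=ListRecords`, `metadataPrefix=oai_dc`, `from`) are those of OAI-PMH version 2.0, which was published after these slides; the base URL and the function name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (protocol version 2.0 names)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(list_records_url("http://archive.example.org/oai", from_date="2001-01-01"))
```

Defaulting to `oai_dc` reflects the requirement above: every archive can answer in unqualified Dublin Core, while richer parallel formats are requested with a different prefix.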

Renardus Project (EU) (http://www.konbib.nl/coop/reynard): national libraries (the Netherlands coordinates); NDR (National Digital Resource) in the UK; Die Deutsche Bibliothek. Goal: integrated access to subject gateways in Europe. High-level agreement on a simple, Dublin-Core-based schema as the common denominator.

Networked Digital Library of Theses and Dissertations (NDLTD) (http://www.ndltd.org): an international consortium of projects putting dissertations online. NDLTD agreed on a small Dublin-Core-based set of metadata elements, with extensions to support application-specific needs: http://www.ndltd.org/standards/metadata/current.html

PRISM (Publishing Requirements for Industry Standard Metadata): an XML metadata standard for syndicating, aggregating, post-processing, and multi-purposing content from magazines, news, catalogs, books, and mainstream journals. It uses DC and its relation types as the foundation for its metadata. Organizations involved: Adobe, Time Inc., Getty Images, Condé Nast, Sotheby's, Interwoven… http://www.prismstandard.org

Rich Site Summary (RSS) (http://purl.org/RSS): metadata for content syndication (news feeds), used in developing media content portals. It is built on established vocabularies (DC), using RDF syntax, with layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.

For further information: "Metadata Watch Reports" of the SCHEMAS Project (http://www.schemas-forum.org), a critical overview (with expert commentary) of the metadata landscape as it evolves, with a related database of individual activity reports; D-Lib Magazine, http://www.dlib.org/dlib/; Ariadne, http://ariadne.ac.uk; DCMI Homepage, http://dublincore.org

DC-2001 in Tokyo, October 22-26, 2001. Three tracks: technical working group meetings; implementation reports and research papers; general introduction and tutorials for non-experts.

How to Participate: join the DC-General mailing list; join a working group; create a working group. Information on lists and working groups is available at http://dublincore.org