4 The Web as an Information System
- Search systems are motivated by business models, not user needs
- Index coverage is unpredictable and limited
- Too much recall, too little precision
- Index spam abounds
- Resources (and their names) are volatile
- Archiving is presently unsolved
- Authority and quality of service are spotty
- Managing intellectual property rights is hard
5 Metadata: Part of a Solution
- Structured data about data
- Organization and management of content
- Support discovery
- Direct content in channels
- Enable automated discovery/manipulation
6 Internet Commons includes Multiple Communities
[Diagram: the Internet Commons shared by multiple communities: Commerce, Home Pages, Geo, Library, Museums, Scientific Data, Whatever...]
7 Interoperability requires conventions about:
- Semantics: the meaning of the elements
- Structure: human-readable, machine-parseable
- Syntax: grammars to convey semantics and structure
8 Haven’t we done metadata already?
- The MARC family of standards is the single most successful resource description standard in the world
9 What’s wrong with this model on the Web?
- Expensive
- Complex
- Professional catalogers required
- Bias towards bibliographic artifacts
- Fixed resources
- Incomplete handling of resource evolution and other resource relationships
- Anglo-centric
  - MARC 21 accounts for ¾ of MARC records, but there are other varieties
11 History of the Dublin Core
- 1994: Simple tags to describe Web pages
- 1995: The Dublin Core is one of many vocabularies needed ("Warwick Framework")
- 1996: The Dublin Core: 13 elements expanded to 15, appropriate for text and images
- 1997: The Warwick Framework (WF) needs formal expression in a Resource Description Framework (RDF)
- 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core
12 Dublin Core Metadata Initiative
The mission of DCMI is to make it easier to find resources using the Internet through the following activities:
- Developing metadata standards for discovery across domains (example: the Dublin Core)
- Defining frameworks for the interoperation of metadata sets
- Facilitating the development of community- or discipline-specific metadata sets
13 DCMI Organizational Structure
[Organization chart. Boxes: Board of Trustees, Executive Director, Usage Board, Directorate, Managing Director, Advisory Board, DCMI Subscribers. DCMI Activity Areas: Standards Development WGs, Infrastructure WGs, User Support and Education WGs, Liaison]
14 DCMI Activities
- Standards development and maintenance
- Metadata registry and infrastructure
- Technical working groups and periodic workshops
- Tutorial materials and user guides
- Education and training
- Open source software
- Liaisons with other standards or user communities
15 Unqualified Dublin Core is the Pidgin metadata language
- Metadata is language
- Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains using the Internet
- Speakers of different languages naturally "pidginize" to communicate
16 Qualifiers and Domain-specific Extensions
- The Dublin Core architecture supports more sophisticated metadata solutions through the addition of:
  - Qualifiers
  - Domain-specific extensions
  - Application profiles involving mixed namespaces (more on this later)
- Increased sophistication comes at the cost of some degree of interoperability
17 Varieties of Qualifiers: Value Encoding Schemes
- An encoding scheme says that the value is
  - a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
  - a string formatted in a standard way (e.g., " " means May 2, not February 5)
- Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
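The role of an encoding scheme can be sketched in a few lines of Python. This is only an illustration: `interpret_value` is a hypothetical helper, and the date string is an invented sample (the slide's own example value did not survive). Software that knows the scheme gets a typed value; software that does not still has a usable string.

```python
from datetime import date


def interpret_value(value, scheme=None):
    """Return a typed value if the scheme is understood, else the raw string."""
    if scheme == "ISO8601":
        # ISO 8601 writes dates as year-month-day, so month and day
        # cannot be confused.
        return date.fromisoformat(value)
    # Unknown or missing scheme: the value is still usable as plain text.
    return value


parsed = interpret_value("2001-05-02", scheme="ISO8601")  # illustrative date
print(parsed.month, parsed.day)
print(interpret_value("Languages -- Grammar", scheme="LCSH"))
```

An unrecognised scheme such as LCSH falls through to the plain string, which matches the slide's point that the value should remain usable for discovery.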
18 Varieties of qualifiers: Element Refinements
- Make the meaning of an element narrower or more specific.
  - a Date Created versus a Date Modified
  - an IsReplacedBy Relation versus a Replaces Relation
- If your software does not understand the qualifier, you can safely ignore it.
19 A Grammar of Dublin Core
- By design not as subtle as mother tongues, but easy to learn and useful in practice
- Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
- Simple grammars: sentences (statements) follow a simple fixed pattern...
20 Resource has property X
[Diagram of the statement pattern: implied subject (Resource), implied verb (has), property (one of 15 properties: DC:Creator, DC:Title, DC:Subject, DC:Date, ...), optional qualifiers (adjectives), and property value (an appropriate literal)]
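21 Resource has Subject "Languages -- Grammar"
[Diagram of two example statements: Resource has Subject "Languages -- Grammar", with qualifier LCSH; Resource has Date " ", with qualifiers Revised and ISO8601]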
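The fixed sentence pattern can be modelled as a small data structure. The sketch below is an assumption-laden illustration, not a DCMI data model: the `Statement` class, its field names, and the bracketed rendering of qualifiers are all invented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One 'Resource has property X' sentence (hypothetical model)."""
    resource: str                     # the implied subject
    element: str                      # one of the fifteen elements
    value: str                        # an appropriate literal
    refinement: Optional[str] = None  # optional qualifier (adjective)
    scheme: Optional[str] = None      # optional encoding scheme qualifier

    def sentence(self) -> str:
        # List only the qualifiers that are actually present.
        qualifiers = [q for q in (self.refinement, self.scheme) if q]
        suffix = f" [{', '.join(qualifiers)}]" if qualifiers else ""
        return f'{self.resource} has {self.element} "{self.value}"{suffix}'


subject = Statement("Resource", "Subject", "Languages -- Grammar", scheme="LCSH")
print(subject.sentence())
```

Keeping the qualifiers as optional fields means the same structure covers both unqualified and qualified statements.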
22 Dumb-Down Principle for Qualifiers
- The fifteen elements should be usable and understandable with or without the qualifiers
- Qualifiers refine meaning (but may be harder to understand)
- Nouns can stand on their own without adjectives
- If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
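A minimal Python sketch of the dumb-down principle follows. It assumes a statement is held as a plain dictionary with hypothetical keys `element`, `value`, `refinement`, and `scheme`; the function name `dumb_down` and the sample date are likewise illustrative.

```python
def dumb_down(statement, known_qualifiers=frozenset()):
    """Drop qualifiers the software does not recognise; keep element and value."""
    return {
        key: value
        for key, value in statement.items()
        if key in ("element", "value")
        or (key in ("refinement", "scheme") and value in known_qualifiers)
    }


# Illustrative qualified statement: a Date Modified in ISO 8601 form.
qualified = {"element": "Date", "value": "2001-05-02",
             "refinement": "Modified", "scheme": "ISO8601"}

unqualified = dumb_down(qualified)
partly_qualified = dumb_down(qualified, known_qualifiers={"ISO8601"})
print(unqualified)
print(partly_qualified)
```

The element and its value always survive, so the statement stays usable even when every qualifier is ignored.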
23 Using DC with other vocabularies
Specialized application profiles may need to:
- Use general-purpose Dublin Core elements
- Use elements from another, more domain-specific standard
- Narrow standard definitions of DC elements for specific local uses
- Invent local elements outside the scope of existing standards
24 What is an Application Profile?
- A metadata schema incorporating a set of elements from one or more metadata element sets
- A set of policies defining how the elements should be applied to the domain of the application
- A set of guidelines that make the policies concerning elements explicit
25 Application Profiles and Namespaces
- Namespaces declare terms and definitions
  - Dublin Core namespace = Dublin Core standard
- Application profiles re-use terms from one or more namespaces
  - May package terms from multiple namespaces
  - May adapt definitions to local purposes
  - All terms must be defined in namespaces
  - May include locally defined namespaces
27 Adapting standard definitions to local uses
- Dublin Core Namespace:
  - DC:Title - machine-readable name of an element
  - "Title: A name given to the resource" -- human-readable name and definition
- Collection Description Profile (UKOLN)
  - DC:Title - name reused from the DC namespace
  - "Title: A name given to the collection"
  - Definition is modified for the application context
- Local adaptations should not change the semantics of the element definition, but rather clarify it within a local context
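The relationship between a namespace and an application profile can be sketched as two lookup tables. The dictionary layout and the `definition` helper are assumptions made for illustration; only the term name and the two definitions come from the slide.

```python
# The namespace declares the term and its standard definition.
DC_NAMESPACE = {"DC:Title": "A name given to the resource"}

# The profile reuses the term and clarifies its definition locally.
COLLECTION_PROFILE = {
    "DC:Title": {"local_definition": "A name given to the collection"},
}


def definition(term, profile, namespace):
    """Prefer the profile's local wording, falling back to the namespace."""
    if term not in namespace:
        # Every term a profile uses must be defined in some namespace.
        raise KeyError(f"{term} is not defined in the namespace")
    return profile.get(term, {}).get("local_definition", namespace[term])


print(definition("DC:Title", COLLECTION_PROFILE, DC_NAMESPACE))
print(definition("DC:Title", {}, DC_NAMESPACE))
```

The machine-readable token stays the same in both cases; only the human-readable definition is adapted.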
28 Namespaces and Translation
- Dublin Core has been translated into 26 languages
  - machine-readable tokens are shared by all
  - human-readable labels are defined in different languages
  - translations are distributed, maintained in many countries
  - eventually linked in DCMI registry
29 One concept identifier – with labels in many languages
dc:creator
- rdfs:label [German]: “Verfasser”
- rdfs:label [English]: “Creator”
- rdfs:label [Indonesian]: “Pencipta”
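One shared identifier with per-language labels maps naturally onto a nested lookup. In this sketch the two-letter language codes (`de`, `en`, `id`) and the English fallback are assumptions; the labels themselves are the ones on the slide.

```python
# One machine-readable token, many human-readable labels.
LABELS = {
    "dc:creator": {"de": "Verfasser", "en": "Creator", "id": "Pencipta"},
}


def label(term, language, fallback="en"):
    """Return the label in the requested language, or the fallback label."""
    labels = LABELS[term]
    return labels.get(language, labels[fallback])


print(label("dc:creator", "de"))
print(label("dc:creator", "fi"))  # no Finnish label in this sample
```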
30 Metadata Registries: Dictionaries of Metadata Terms and Usage
31 Metadata is language
- Metadata schemas are languages for making statements about resources:
  - Book has Title "Gone with the Wind".
  - Web page has Publisher "Springer Verlag".
- Vocabulary terms (elements) are defined in standards like Dublin Core
- Metadata grammars constrain the statements and data models one can form
32 Metadata languages are Multilingual
- Metadata is not a spoken language
- The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages
- Standards may have dozens of translations
- Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?
33 Languages Evolve With Use
- Inevitably, languages resist stability
- People stretch official definitions
- Implementers misunderstand the intended meaning or use of elements
- Implementers coin local terms and extensions
- If the application does not fit the standard, the standard is often "customized" to fit the application
34 How do we manage this evolution?
- How can we monitor the usage of a language that is:
  - Never spoken?
  - Rarely published in a way that can be harvested?
- How can dictionary editors help a metadata language evolve and grow in response to usage?
- How can this evolution occur across (human) languages?
35 RDF Schemas (RDFS) -- W3C standard
A dictionary format for metadata terms:
- Simple XML format for namespaces, terms and definitions
- Example: "Title" (Dublin Core)
  - Human-readable label and definition: "Title: A name given to the resource."
  - Unique, machine-readable identifier: dc:title
- Support for cross-references
  - between multiple language renditions of a namespace
  - between terms in related standards
  - between local adaptations and related standards
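As a rough illustration of such a dictionary entry, the Python sketch below reads a hand-written RDF Schema fragment for the Title element with the standard library's ElementTree. The fragment is a simplified sample written for this example rather than the official DCMI schema file, and `read_terms` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Simplified sample declaration (not the official DCMI schema).
SCHEMA = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label xml:lang="en">Title</rdfs:label>
    <rdfs:comment xml:lang="en">A name given to the resource.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>"""


def read_terms(xml_text):
    """Map each property's identifier to its label and definition."""
    terms = {}
    for prop in ET.fromstring(xml_text).iter(f"{{{RDF}}}Property"):
        identifier = prop.get(f"{{{RDF}}}about")
        terms[identifier] = {
            "label": prop.findtext(f"{{{RDFS}}}label"),
            "definition": prop.findtext(f"{{{RDFS}}}comment"),
        }
    return terms


terms = read_terms(SCHEMA)
print(terms)
```

The identifier is the machine-readable part; the label and comment carry the human-readable name and definition.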
36 Registries can function as dictionaries
- Metadata dictionaries can help metadata vocabularies evolve more like other human languages
  - Not just top-down, like traditional standards
  - Also bottom-up, in response to usage
37 DCMI – Metadata Registry
- Stores official metadata element definitions in a central database or repository
- Managing a namespace (as a standards agency): publish qualifiers as available, with version control
- Managing translations of the standard in multiple languages
- Eventually:
  - User guide interface
  - Support for standardisation processes (peer review)
  - Downloadable input to software tools for generating, editing, validating DC metadata
38 Dictionaries as a tool for harmonization
- Knowledge of how other projects are using standards will avoid "reinventing the wheel"
- To help information providers harmonize their schemas for improved access within domains:
  - Between countries (Nordic Metadata Project)
  - Preprint repositories (Open Archives Initiative)
  - Subject gateways (Renardus)
  - Theses and dissertations (NDLTD)
  - Mathematics and physics (MathNet, PhysNet)
39 A global registry infrastructure?
- RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web
- Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries
- Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?
40 EOR -- an RDF Toolkit for Schema Infrastructure
- Harvests RDF Schemas
  - Schemas distributed on multiple Web servers
  - Creates huge database of schemas for searching
- Web interface functions as a "metadata browser"
  - Click on cross-references between linked terms
- Downloadable as open source software
41 EOR Toolkit
- Integrate RDF components for supporting search services, topic-maps, site-maps, annotation environments and semantic metadata registries
- Base-level functionality of this toolkit includes:
  - Creation, deletion, and management of RDF databases.
  - Ability to infuse RDF instance data into RDF databases.
  - Ability to search RDF databases.
  - Generic interface design capabilities to support RDF applications.
- Web interface functions as a "metadata browser"
- Open Source
43 Syntax Alternatives: HTML
- Advantages:
  - Simple mechanism: META tags embedded in content
  - Widely deployed infrastructure (the Web)
  - Public domain tools
- Disadvantages:
  - Limited structural richness (won’t easily support hierarchical, tree-structured data or entity distinctions).
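To make the META-tag mechanism concrete, here is a small Python sketch that pulls Dublin Core statements out of an HTML head using the standard library's `html.parser`. It assumes the common `DC.` prefix convention for element names; the sample page and the `DCMetaParser` class are invented for illustration.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (name, content) pairs from META tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if tag == "meta" and name.lower().startswith("dc."):
            self.records.append((name, attributes.get("content")))


# Illustrative page; the values are sample data only.
PAGE = """<html><head>
<meta name="DC.Title" content="Gone with the Wind">
<meta name="DC.Publisher" content="Springer Verlag">
<meta name="keywords" content="not Dublin Core">
</head><body></body></html>"""

parser = DCMetaParser()
parser.feed(PAGE)
print(parser.records)
```

The result is a flat list of name/value pairs, which also shows the limitation noted on the slide: nothing here can express nesting or distinguish one described entity from another.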
44 Syntax Alternatives: XML
- The standard for networked text and data
- Widespread tool support
  - Parsers (DOM and SAX)
  - Extensibility (namespaces)
  - Type definition (XML Schema)
  - Transformation and rendering (XSLT)
  - Rich linking semantics (XLink)
45 XML DTDs
- Works, but… DTDs are a stopgap measure
- Extensibility is problematic
- Many ways to ‘say’ the same thing (too much flexibility)
- Interoperability must be pre-coordinated
- DTDs cannot evolve gracefully
- Granularity is at the level of the DTD
46 XML Schemas
- Rich XML-based language for expressing type semantics
- Replaces arcane and limited DTD (origin in SGML)
- Facilities
  - Data typing (both complex and primitive)
  - Constraints
  - Defaults
47 Syntax Alternatives: RDF
- RDF (Resource Description Framework)
- The instantiation of the Warwick Framework on the Web
- Rich data model supporting notions of distinct entities and properties
- Syntax expressed in XML
- Granularity is at the level of the element, not the entire schema as with XML DTDs
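The entity-and-property model can be illustrated by reading a small RDF/XML description into (subject, property, value) triples. This Python sketch handles only the simplest case, flat literal-valued properties inside `rdf:Description`, and is not a general RDF parser; the resource URL and the `triples` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Illustrative description of a made-up resource.
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/page">
    <dc:title>Gone with the Wind</dc:title>
    <dc:publisher>Springer Verlag</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Return (subject, property, literal) for each flat property element."""
    result = []
    for description in ET.fromstring(xml_text).iter(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; joining the two
            # parts gives the property's full identifier.
            predicate = prop.tag.replace("{", "").replace("}", "")
            result.append((subject, predicate, prop.text))
    return result


statements = triples(DOCUMENT)
print(statements)
```

Each property carries its own namespace, which is what lets elements from different vocabularies be mixed in one description.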
48 RDF Components
- RDF Model and Syntax WG
  - Formal data model
  - Syntax for interchange of data
- RDF Schema (RDFS)
  - Type system (schema model)
49 RDF Schemas
- Declaration of vocabularies
  - properties defined by a particular community
  - characteristics of properties and/or constraints on corresponding values
- Schema Type System - Basic Types
  - Property, Class, SubClassOf, Domain, Range
  - Minimal (but extensible) at this time
  - Aims to minimize significant clashes with the typing system designed by the XML Schema WG
- Expressible in the RDF model and syntax
50 RDF: In Summary
- RDF Metadata transmission
  - Embedded (e.g. <META>), transmitted with resource (HTTP), or from a trusted 3rd party
- RDF Data Model
  - Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources
- RDF Schema
  - Declare, define, reuse vocabularies
51 Unresolved Issues Concerning RDF and XML Schemas
- RDF Schemas and XML Schemas have overlapping functionality
  - XML Schemas provide strong data typing, but also support semantic specifications
  - RDF is focused on a semantic data model and extensible namespace management
- Resolution of overlap and market acceptance will determine the future of each
- The Semantic Web Activity in the W3C is chartered to address such issues
53 Open Archives Initiative
http://www.openarchives.org
- Protocols to support alternative scholarly publishing solutions
- Federated repositories for:
  - ePrints
  - Libraries
  - Publishers
- OAI archives may contain full text or surrogates (metadata)
- Metadata harvesting protocols
54 OAI Metadata
- OAI archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle.
- However, interoperability depends on a shared format for exchanging metadata, and therefore archives should implement the basic Open Archives Metadata Set.
55 OAI Metadata Solutions
- Adoption of unqualified Dublin Core Element Set as required metadata.
- Support for parallel metadata sets maintained
  - EPMS (e-print community)
  - Others
    - Research library community
    - Museum community
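A harvester asks an archive for records in a named metadata format. The Python sketch below only builds such a request URL; it assumes the `ListRecords` verb and the `oai_dc` metadata prefix from the published OAI harvesting protocol, which may differ in detail from the protocol version current when these slides were written. The base URL and the `list_records_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a harvesting request for all records in one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical archive endpoint, for illustration only.
url = list_records_url("http://archive.example.org/oai")
print(url)
```

Passing a different `metadata_prefix` is how a harvester would request one of the parallel, community-specific metadata sets instead of unqualified Dublin Core.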
56 Renardus Project (EU)
http://www.konbib.nl/coop/reynard
- National libraries (Netherlands coordinates)
- NDR: National Digital Resource in UK
- Die Deutsche Bibliothek
- Goal: integrated access to subject gateways in Europe
- High-level agreement on simple, Dublin-Core-based schema as common denominator
57 Networked Digital Library of Theses and Dissertations (NDLTD)
- International consortium of projects putting dissertations online
- NDLTD agreement on a small Dublin-Core-based set of metadata elements with extensions to support application-specific needs
58 PRISM: Publishing Requirements for Industry Standard Metadata
- PRISM is an XML metadata standard for syndicating, aggregating, post-processing and multi-purposing content from magazines, news, catalogs, books and mainstream journals.
- Uses DC and its relation types as the foundation for its metadata
- Adobe, Time Inc., Getty Images, Conde Nast, Sotheby’s, Interwoven…
59 Rich Site Summary (RSS)
http://purl.org/RSS
- Metadata for content syndication (news feeds)
- Used in developing media content portals
- Built on established vocabularies (DC), using RDF syntax
- Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.
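The layering of Dublin Core on top of a syndication vocabulary can be seen in a feed item that mixes both namespaces. The Python sketch below reads a simplified, hand-written fragment in the style of RSS 1.0 (a real feed also needs a channel element); the item content and the `items` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified sample feed; all values are made up.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/news/1">
    <title>Example headline</title>
    <link>http://example.org/news/1</link>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2001-05-02</dc:date>
  </item>
</rdf:RDF>"""


def items(feed_text):
    """Return title (RSS vocabulary) plus creator and date (Dublin Core) per item."""
    root = ET.fromstring(feed_text)
    return [
        {
            "title": item.findtext("rss:title", namespaces=NS),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),
        }
        for item in root.findall("rss:item", NS)
    ]


entries = items(FEED)
print(entries)
```

The syndication terms (title, link) and the descriptive terms (creator, date) come from separate namespaces, so either vocabulary can evolve without breaking the other.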
60 For further information....
- "Metadata Watch Reports" of SCHEMAS Project
  - Critical overview (with expert commentary) on the metadata landscape as it evolves
  - Related database of individual activity reports
- D-Lib Magazine
- Ariadne
- DCMI Homepage
61 DC-2001
- DC-2001 in Tokyo, October 22-26, 2001
- Three tracks:
  - Technical working group meetings
  - Implementation reports and research papers
  - General introduction and tutorials for non-experts
62 How to Participate
- Join the DC-General mailing list
- Join a working group
- Create a working group
- Information on lists and working groups is available at