IFLA/DELOS/NSF Workshop: Standards and Metadata. EVA 2000, Moscow, November 2, 2000. Thomas Baker (GMD), Carl Lagoze (Cornell Univ.)
EVA 2000 Introductions Thomas Baker –GMD Library, Bonn, Germany –Dublin Core Executive Committee –EU DELOS Network of Excellence Carl Lagoze –Digital Library Research Group, Faculty of Computing and Information, Cornell University, Ithaca, NY, USA –Dublin Core Advisory Committee –NSF Digital Library Initiative
EVA 2000 Workshop Roadmap Introduction to Metadata (30 min.) Dublin Core Metadata Initiative (60 min.) Break Simplicity and Complexity (45 min.) Metadata Infrastructure (45 min.) Lunch Deploying and Using Metadata (90 min.) Metadata Landscape (30 min.)
Introduction to Metadata EVA 2000 Moscow
EVA 2000 Haven’t we done metadata already?
EVA 2000 What’s wrong with this model? Expensive –Complex (even for its original goal?) –Professional intervention (assumes single community of expertise) Monolithic –One size fits all approach –Reflects its centralized system origins Bias towards physical artifacts –Fixed resources –Incomplete handling of resource evolution and other resource relationships
EVA 2000 Internet Commons includes Multiple Communities Scientific Data Home Pages Geo Internet Commons Library Museums Commerce Whatever...
EVA 2000 Web Challenge to Traditional Cataloging Scale Permanence Authenticity Organizational Context Variety
EVA 2000 State of the Web as an Information System Search systems are motivated by advertising Index coverage is unpredictable and limited (1/3) Too much recall, too little precision Index spam abounds Resources (and their names) are volatile What about versions, editions, back issues? Archiving is presently unsolved Authority and quality of service are spotty Managing Intellectual Property Rights is hard
EVA 2000 Metadata: Part of a Solution Structured data about data –helps to impose order on chaos –enables automated discovery/manipulation Variety across various dimensions: –specialization –decentralization –democratization
EVA 2000 Metadata Takes Many Forms
EVA 2000 Metadata Challenges Accommodate multiple varieties of metadata Tension: functionality and simplicity Tension: extensibility and interoperability Human and machine creation and use Community-specific functionality, creation, administration, access
EVA 2000 Warwick Framework: Containing Chaos Conceptual Architecture for metadata from the Warwick Metadata Workshop (DC-2) Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata Provide context for metadata efforts (including Dublin Core) –avoids the “black-hole” of comprehensive element sets –focuses interoperability issues at package level
EVA 2000 Modularization Allows Distributed Management Communities of expertise (not software vendors) are responsible for: –Semantics –Registration –Administration –Access management –Authority of data –Sharing and Distribution
EVA 2000 Interoperability requires conventions about: Semantics –The meaning of the elements Structure –human-readable –machine-parseable Syntax –grammars to convey semantics and structure
Dublin Core Metadata Initiative EVA 2000 Moscow
EVA 2000 History of the Dublin Core 1994: "Do we have a simple set of tags for ordinary people to describe their Web pages?" 1995: The Dublin Core: 13 elements, later 15 1996: The Dublin Core is but one of many vocabularies needed ("Warwick Framework") 1997: "WF needs formal expression in a Resource Description Framework (RDF)" 2000: Dublin Core Metadata Initiative recommends qualifiers, broadens its organizational scope beyond the Core
EVA 2000 A pidgin for digital tourists Metadata is language. Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains. Speakers of different languages naturally "pidginize" to communicate –E.g., tourists using simple phrases to order beer ("zwei Bier bitte" "dva pivo" "biru o san bai"...) We are all "tourists" on the global Internet.
EVA 2000 A grammar of Dublin Core By design not as subtle as mother tongues, but easy to learn and extremely useful in practice Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives) Simple grammars: sentences (statements) follow a simple fixed pattern...
EVA 2000 Example Dublin Core statements Resource has Title 'Grammar of Dublin Core'. Resource has Creator 'Tom Baker'. Resource has Subject 'Metadata'. Resource has Relation
EVA 2000 Resource has property X –Resource: implied subject –has: implied verb –property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...) –X: property value (an appropriate literal) –[optional qualifier]: qualifiers (adjectives)
EVA 2000 The fifteen special nouns (properties)
EVA 2000 Dumb-Down Principle for qualifiers The fifteen elements should be usable and understandable with or without the qualifiers Like saying that nouns can stand on their own without adjectives If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
EVA 2000 Resource has Date " " [Revised] [ISO8601] Resource has Subject "Languages -- Grammar" [LCSH] To test whether qualifiers are "good", cover them with your hand and ask: -- Does the statement still make sense? -- Is it still correct?
EVA 2000 Element Refinements Make the meaning of an element narrower or more specific. –a Date Created versus a Date Modified –an IsReplacedBy Relation versus a Replaces Relation If your software does not understand the qualifier, you can safely ignore it.
EVA 2000 Value Encoding Schemes Says that the value is –a term from a controlled vocabulary (e.g., Library of Congress Subject Headings) –a string formatted in a standard way (e.g., " " means May 3, not March 5) Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
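The dumb-down principle described on the last few slides can be sketched in a few lines of Python. This is only an illustration, not DCMI software: the `Statement` type, the `dumb_down` function, and the sample date value are hypothetical names and data chosen for the example.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """One Dublin Core statement: Resource has <element> <value>."""
    element: str                      # one of the fifteen elements, e.g. "Date"
    value: str                        # the property value (a literal)
    refinement: Optional[str] = None  # element refinement, e.g. "Created"
    scheme: Optional[str] = None      # value encoding scheme, e.g. "LCSH"


def dumb_down(stmt: Statement) -> Statement:
    """Drop both kinds of qualifier; element and value must still make sense."""
    return Statement(stmt.element, stmt.value)


# Hypothetical qualified statements (the date value is an assumption).
qualified = [
    Statement("Date", "2000-11-02", refinement="Created", scheme="ISO8601"),
    Statement("Subject", "Languages -- Grammar", scheme="LCSH"),
]

simple = [dumb_down(s) for s in qualified]
for s in simple:
    print(f"Resource has {s.element} '{s.value}'.")
```

Software that does not recognize a refinement or scheme can apply the same step and still index the value under the unqualified element.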
EVA 2000 Peer review of proposals for new terms DCMI Usage Committee reviews proposals for new qualifiers (and perhaps elements) Evaluates proposals in light of grammatical principles (are the qualifiers ignorable?) Tiered model of approval status (tentative): proposed, conforming, recommended, obsolete First qualifiers "recommended" in July 2000
EVA 2000 A not-so-good example Resource has Creator "Last.name: Smith First.name: John Type: Person Affiliation: IBM"
EVA 2000 Open questions in Dublin Core What are "appropriate values" for the fifteen properties? How can they be used for cross-domain searching? How can DCMI control the evolution of Dublin Core as it is adapted in practice? How can an application use DC as a pidgin while describing resources with more complex metadata? Can we keep the Core simple?
EVA 2000 Search buckets versus description Think of DC elements as fuzzy search buckets –Different types of data appropriate for different buckets: URLs, date strings, word strings, names –Separate books about Sigmund Freud versus books by Sigmund Freud into different buckets Search bucket: for discovering resources But general, fuzzy categories may not be sufficient for describing resources –After searching, display more detailed descriptions on screen
EVA 2000 DCMI broadens its mission (Oct 2000) The mission of the DCMI is to make it easier to find resources using the Internet through the following activities: –Developing metadata standards for discovery across domains (example: the Dublin Core) –Defining frameworks for the interoperation of metadata sets –Facilitating the development of community or disciplinary specific metadata sets that are consistent with items 1 and 2
EVA 2000 A context for the Core If "the Dublin Core" is the core of DCMI, what is the surrounding context? If "the Dublin Core" is the simple pidgin, what is the broader landscape of metadata language? How do pidgins relate to more complex models or "application profiles"? Do we need pidgins for describing other things, such as "people" and "events"?
EVA 2000 Using DC with other vocabularies Specialized application profiles [government information, education, mathematics] may need to: –Use general-purpose Dublin Core elements –Use elements from another, more domain-specific standard –Narrow standard definitions of DC elements for specific local uses –Invent local elements outside the scope of existing standards
EVA 2000 Namespaces versus Profiles Namespaces declare terms and definitions –Dublin Core namespace = Dublin Core standard Application profiles (only) re-use terms from namespaces –May package terms from multiple namespaces –May adapt definitions to local purposes –All terms must be defined in namespaces
EVA 2000 Adapting standard definitions to local uses Dublin Core Namespace: –DC:Title - machine-readable name of an element –"Title: A name given to the resource" -- human-readable name and definition Collection Description Profile (UKOLN) –DC:Title - name reused from the DC namespace –"Title: A name given to the collection" Definition is modified for the application context
EVA 2000 Example: adapting DC:Title to local uses As defined in the official Dublin Core "namespace": –"Title: A name given to the resource" As defined in a UK "application profile": –"Title: A name given to the collection" Definition is narrower
EVA 2000 Profiles may model multiple entities "Resource" (a thing) as an entity with its own: –Title (dc:title) –Date created (dc:date dcq:created) –Identifier (dc:identifier) "Agent" (a person) with its own –Name (vcard:fn) –Date of birth (vcard:bday) –Identifier (dc:identifier)
EVA 2000 Namespaces in translation Dublin Core has been translated into 26 languages –machine-readable tokens are shared by all –human-readable labels are defined in different languages –translations are distributed, maintained in many countries
EVA 2000 One token - labels in many languages dc:creator –rdfs:label “Verfasser” [Server in Germany] –rdfs:label “Pencipta” [Server in Jakarta] –rdfs:label “Creator” [DCMI Server]
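A minimal Python sketch of the "one token, many labels" idea follows. The `LABELS` table and `label_for` function are hypothetical; in the architecture described on the slide, each label would be published by a different server rather than held in one dictionary.

```python
# Hypothetical registry contents: one shared machine-readable token,
# with human-readable labels kept per language code.
LABELS = {
    "dc:creator": {"en": "Creator", "de": "Verfasser", "id": "Pencipta"},
}


def label_for(token: str, lang: str, fallback: str = "en") -> str:
    """Return the label for a token in the requested language.

    Falls back to the English label when no translation is registered.
    """
    labels = LABELS[token]
    return labels.get(lang, labels[fallback])


print(label_for("dc:creator", "de"))
print(label_for("dc:creator", "fr"))  # no French label in this sketch
```

Because applications exchange only the token, a record created with the German label remains searchable through an interface that displays the Indonesian one.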
EVA 2000 RDF -- a more powerful sentence pattern Dublin Core statements: –Resource has Creator "Tom Baker". –Resource has Identifier Resource Description Framework "triples" - a more powerful way to say the same thing: –http://foo.org/bar.htm has Creator "Tom Baker".
EVA 2000 Resource has property X –Resource: implied subject –has: implied verb –property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...) –X: property value (an appropriate literal) –[optional qualifier]: qualifiers (adjectives)
EVA 2000 Resource has Property "X" –Resource: explicit subject –has: implied verb –Property: predicate, a property (from any vocabulary) –"X": object, also known as "property value" (a literal -- or another resource)
EVA 2000 DCMI Re-organization Expanded mission –Core metadata elements for Agents (or Events)? –Frameworks for integrating multiple standards Re-organization model –Membership organization like W3C or Unicode Consortium? –Retain open consensus model –International perspective –Better training, documentation, outreach
EVA 2000 DCMI Open Metadata Registry Managing vocabularies defined by the DCMI –Languages –Versioning –Controlled vocabularies Foundation for modular, incremental integration and evolution Collaboration with European SCHEMAS Project and ULIS in Tsukuba, Japan
EVA 2000 Official recognition of the Dublin Core CEN Workshop Agreement –endorse Dublin Core elements as CWA13874 –provide usage guidelines for European industry NISO Z39.85 –National Information Standards Organization, an ANSI affiliate –Balloting concluded in August 2000
EVA 2000 DCMI Activities Standards development and maintenance Metadata registry Technical working groups and periodic workshops Tutorial materials and user guides Education and training Access to software Liaisons with other standards or user communities
EVA 2000 DC-9 Workshop in Tokyo, 2001 DC-8 Workshop was at the National Library of Canada (Ottawa) –emphasis on application profiles, longer-term organizational mission, and domain-specific adaptations of Dublin Core DC-9 in Tokyo: well-defined tracks –implementation reports and research papers –ongoing technical working group meetings –general introduction and tutorials for non-experts
Simplicity and Complexity EVA 2000 Moscow
EVA 2000 Warwick Framework Container/Package approach to metadata Rejection of universal ontology Recognition of individual community needs Provide scope for metadata efforts
EVA 2000 Warwick Framework Design Containers for aggregating Packages of typed metadata sets Container Package MARC Metadata Package Indirect Reference Package Terms and Conditions URI Package Dublin Core
EVA 2000 Warwick Framework Implementation and Research Packaging, linking, storing, and transmitting component/package framework Semantic interactions and interoperability among multiple metadata packages/vocabularies
EVA 2000 Interoperability among Metadata Vocabularies ABC core classes; Dublin Core; MARC; INDECS; IMS
EVA 2000 Harmony Project Project Investigators –Dan Brickley - ILRT, Bristol (U.K.) –Jane Hunter - DSTC, Brisbane (Australia) –Carl Lagoze - Computer Science, Cornell (U.S.) More Information –http://www.ilrt.bris.ac.uk/discovery/harmony/
EVA 2000 Attribute/Value approaches to metadata… The playwright of Hamlet was Shakespeare –Hamlet (subject) has a (implied verb) creator (metadata noun) Shakespeare (literal) –Playwright: metadata adjective –R1 dc:title “Hamlet” –R1 dc:creator.playwright “Shakespeare”
EVA 2000 …run into problems for richer descriptions… The playwright of Hamlet was Shakespeare, who was born in Stratford –Hamlet has a creator Shakespeare –Hamlet has a creator birthplace Stratford –R1 dc:creator.playwright “Shakespeare” –R1 dc:creator.birthplace “Stratford”
EVA 2000 …because of their failure to model entity distinctions –R1 title “Hamlet” –R1 creator R2 –R2 name “Shakespeare” –R2 birthplace “Stratford”
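The difference between the flat attribute/value record and the entity model can be shown with two small Python structures. This is a sketch for illustration only; the dictionary layout and the `creator_birthplace` helper are assumptions, not part of any of the vocabularies named here.

```python
# Flat attribute/value record: every property hangs off the one resource R1,
# so nothing says the birthplace belongs to the creator rather than the play.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity model: R1 (the play) and R2 (the person) are distinct resources,
# and R1's creator property points at R2 instead of holding a string.
resources = {
    "R1": {"title": "Hamlet", "creator": "R2"},
    "R2": {"name": "Shakespeare", "birthplace": "Stratford"},
}


def creator_birthplace(resource_id: str) -> str:
    """Follow the creator link, then read the birthplace of that entity."""
    creator_id = resources[resource_id]["creator"]
    return resources[creator_id]["birthplace"]


print(creator_birthplace("R1"))
```

In the second form a second creator can be added as another entity with its own birthplace, which the flat record cannot express without ambiguity.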
EVA 2000 Applying a Model-Centric Approach Formally define common entities and relationships underlying multiple metadata vocabularies Describe them (and their inter-relationships) in a simple logical model Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
EVA 2000 Applications of the ABC Model Guidance for communities developing vocabularies Foundation for understanding existing vocabularies Basis for mappings among vocabularies using formalisms such as RDF
EVA 2000 Harmony/ABC Workshop January CNI Washington Representatives from –Dublin Core, INDECS, MPEG-7, IFLA –Archives, Museums, Libraries, Audiovisual Result: Importance of processes, events, and states in understanding and describing resources
EVA 2000 Conceptual Basis: Evolution of Content over Time IFLA Entity Model From Bearman et al., D-Lib Magazine, January 1999.
EVA 2000 Events help metadata relationships? Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model) Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. Clarifying attachment points facilitates mapping across common entities in different vocabularies.
EVA 2000 Content, Events, & Descriptions
EVA 2000 ABC Event Model
EVA 2000 A Simple Example: Live At Lincoln Performance Performance at The Lincoln Center for the Performing Arts On April 7, 1998 at 8pm Eastern time Orchestra is New York Philharmonic Musical score – “Concerto for Violin” 130 minute MP3 audio recording Rights held by Lincoln Center
EVA 2000 Example in ABC Model
EVA 2000 Live At the Lincoln Centre 7/4/98 20:00 Lincoln Centre New York Philharmonic Orchestra Lincoln Center for Performing Arts
EVA 2000 Derivation of Multiple Views CIDOC CRM Model ABC Description in XML ID3 tags embedded in MP3 MPEG-7 description in DDL Dublin Core in XML/RDF
EVA 2000 Step 1 – Structural Mapping Event-aware model Resource-centric model
EVA 2000 Structural Mapping Rules Event attributes transferred to output: Context/Date, /Time, /Place -> Date.Performance, Time.Performance, Place.Performance Act/Role -> Agent.Role e.g. Orchestra Event Type -> Relation between input & output e.g. Performance -> Relation.isPerformanceOf Output Description generated from event Type and input Title e.g. “Performance of Concerto for Violin”
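The mapping rules above can be sketched as a small Python function. The workshop describes this step in terms of XSLT; the dictionary layout, the function name `to_resource_centric`, and the use of the input title as the relation value are assumptions made for this illustration.

```python
# Hypothetical event-aware description of the Lincoln Center example.
event = {
    "type": "Performance",
    "context": {"date": "1998-04-07", "time": "20:00", "place": "Lincoln Center"},
    "acts": [{"agent": "New York Philharmonic", "role": "Orchestra"}],
    "input": {"title": "Concerto for Violin"},
    "output": {"format": "MP3"},
}


def to_resource_centric(event: dict) -> dict:
    """Flatten an event-aware description onto the event's output resource."""
    kind = event["type"]
    record = dict(event["output"])
    # Context/Date, /Time, /Place -> Date.<Type>, Time.<Type>, Place.<Type>
    for key, value in event["context"].items():
        record[f"{key.capitalize()}.{kind}"] = value
    # Act/Role -> Agent.<Role>
    for act in event["acts"]:
        record[f"Agent.{act['role']}"] = act["agent"]
    # Event Type -> Relation between input and output
    record[f"Relation.is{kind}Of"] = event["input"]["title"]
    # Output Description generated from event Type and input Title
    record["Description"] = f"{kind} of {event['input']['title']}"
    return record


record = to_resource_centric(event)
for key, value in record.items():
    print(f"{key}: {value}")
```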
EVA 2000 Live At Lincoln Center :00 Lincoln Centre New York Philharmonic comp523 Performance of 'Concerto for Violin' Lincoln Center for Performing Arts audio MP3 130
EVA 2000 Step 2 – Semantic Mapping
EVA 2000 XSLT for Transformations Works well for structural and syntactic mapping between metadata descriptions Semantic mappings need to be hardcoded Unsuitable for loosely constrained or variable input
EVA 2000 A More General Solution Flexible semantic mappings require additional knowledge: –Metadata Term Ontology – MetaNet Methods for using that context knowledge for mapping –Some combination of procedural language (Java) and XSLT –Investigating more general mapping rule language (analogies to compiler technology)
EVA 2000 Planned Experimental Context CIMI Experiments –Dublin Core for basic resource descriptions –Richer descriptions derived from ABC model –Mapping among descriptions –Understanding relationship between ABC and CIDOC CRM Connecting with Recordkeeping Metadata Issue - SPIRT Project
Metadata Infrastructure EVA 2000 Moscow
EVA 2000 Metadata is language Metadata schemas are languages for making statements about resources: –Book has Title "Gone with the Wind". –Web page has Publisher "Springer Verlag". Vocabulary terms (elements) are defined in standards like Dublin Core Metadata grammars constrain the statements and data models one can form
EVA 2000 But languages evolve with use Inevitably, languages resist stability People stretch official definitions Implementers misunderstand the intended meaning or use of elements Implementors coin local terms and extensions If the application does not fit the standard, the standard is often "customized" to fit the application
EVA 2000 Metadata languages are "multilingual" Metadata is not a spoken language The words of metadata -- "elements" -- are symbols that stand for concepts expressible in multiple natural languages Standards may have dozens of translations Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?
EVA 2000 What metadata languages lack Comprehensive dictionaries –Where can one get an overview of vocabulary terms used in metadata languages? A publication context for implementers –Where can you see how they are using metadata? Standard grammars –How do we understand the principles of metadata?
EVA 2000 Can we manage this evolution? How can we (scalably) monitor the usage of a language that is: –Never spoken? –Rarely published in a way that can be harvested? How can dictionary editors help a metadata language evolve and grow in response to usage? How can this evolution occur across (human) languages?
EVA 2000 RDF Schemas (RDFS) -- W3C standard A dictionary format for metadata terms: –Simple XML format for terms and definitions Example: "Title" (Dublin Core) –Human-readable label and definition: Title: A name given to the resource. –Unique, machine-readable identifiers dc:title Support for cross-references –between terms in related standards –between local adaptations and related standards
EVA 2000 Print world versus the Web Traditional print world –Standards are currently defined and published as paper documents or Web pages in HTML –Metadata implementors rarely publish their local extensions and adaptations RDF Schemas (RDFS) –Web-based publication format –Explicit cross references from implementation schemas and the standards on which they are based
EVA 2000 EOR -- an RDF Schema Browser Harvests RDF Schemas –Schemas distributed on multiple Web servers –Creates huge database of schemas for searching –Web interface functions as a "metadata browser" –Click on cross-references between linked terms Downloadable as open source software –http://eor.dublincore.org/index.html –Authors: Eric Miller (OCLC, RDF Working Group, DCMI) and Tod Matola
EVA 2000 Hyperlink Metadata Terms over the Web Index of metadata terms searchable as one huge database Click on cross-references to follow term-to-term links between vocabularies Point-to-point, like the Web itself –In 1992, Gopher located the right file within directory trees (but not points within the file) –HTML enabled point-to-point links between documents
EVA 2000 "Editor" -- a MARC relator -- refines "Contributor"
EVA 2000 Follow the link to MARC Relator Terms
EVA 2000 ...the source of which looks like this:
EVA 2000 ...or to Contributor [here, in English, French, German]
EVA 2000 Or view the schema of MyRDF itself...
EVA 2000 ...itself an RDF schema like the others
EVA 2000 Registries can function as dictionaries Historically, dictionaries of English, French, etc: recorded variants, prescribed forms, and helped standardize (national) languages Metadata dictionaries can help metadata vocabularies evolve more like other human languages –Not just top-down, like traditional standards –Also bottom-up, in response to usage
EVA 2000 Dictionaries prescribe and describe Prescribe definitions and recommend usage Describe how terms are actually used –Monitor usage through collecting examples Editors and usage boards must strike a balance between prescription and description.
EVA 2000 SCHEMAS Project -- a Thin Registry an EU Project Pointers to resources elsewhere (a "thin" registry or portal) Short descriptions of metadata standards activities Critical commentaries by domain experts Promote the publication of schemas (in RDF) Goal: help implementors discover how others (e.g. EU Projects) are using standards in order to harmonize usage
EVA 2000 DCMI -- a Thick Registry A thick registry: stores official metadata element definitions in a central database or repository Managing a namespace (as a standards agency): publish qualifiers as available, with version control –Managing translations of the standard in multiple languages Eventually: –User guide interface –Support for standardisation processes (peer review) –Downloadable input to software tools for generating, editing, validating DC metadata
EVA 2000 Dictionaries as a tool for harmonization Knowledge of how other projects are using standards will avoid "reinventing the wheel" To help information providers harmonize their schemas for improved access within domains: –Between countries (Nordic Metadata Project) –Preprint repositories (Open Archives Initiative) –Subject gateways (Renardus) –Theses and dissertations (NDLTD) –Mathematics and physics (MathNet, PhysNet)
EVA 2000 A global registry infrastructure? Analogously to HTML for text, RDF Schema format suggests a scalable ecology of metadata vocabularies on the Web Sharing machine-readable elements translated into many languages suggests a global (multilingual) metadata language for digital libraries Can a well-managed registry infrastructure allow this language to evolve -- with flexible innovation in usage alongside more stable standards?
EVA 2000 The scope of registries Anything "semantic" (terms and definitions) is potentially an RDF schema: –controlled vocabularies –namespaces, application profiles, annotations –the "schema" of the registry itself Application constraints can be modelled in XML Schemas –"title is mandatory"; "date must be after 1980" Will XML and RDF Schemas merge?
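The two application constraints quoted above can be made concrete with a short check. The slide has XML Schemas in mind for this job; the Python below is only a stand-in to show what such constraints do, and the `validate` function, the record layout, and the assumption that dates start with a four-digit year are all hypothetical.

```python
def validate(record: dict) -> list:
    """Check a record against two sample application constraints.

    Returns a list of violated constraints (empty when the record passes).
    Assumes that a date, when present, starts with a four-digit year.
    """
    errors = []
    if not record.get("title"):
        errors.append("title is mandatory")
    date = record.get("date")
    if date is not None and int(date[:4]) <= 1980:
        errors.append("date must be after 1980")
    return errors


print(validate({"title": "Grammar of Dublin Core", "date": "2000-11-02"}))
print(validate({"date": "1975"}))
```

Constraints of this kind belong to a particular application, which is why the slide separates them from the term definitions held in RDF schemas.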
Deploying and Using Metadata EVA 2000 Moscow
EVA 2000 Syntax Alternatives: HTML Advantages: –Simple Mechanism – META tags embedded in content –Widely deployed tools and knowledge Disadvantages –Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions). –Limited formalisms (parsing and schema definition)
EVA 2000 Dublin Core in HTML
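The markup shown on this slide did not survive in the transcript. As a stand-in, the following Python sketch generates META tags using the common `DC.`-prefixed naming convention; the function name `dc_meta_tags` is hypothetical, and the sample values are reused from the example statements earlier in the workshop.

```python
from html import escape


def dc_meta_tags(record: dict) -> str:
    """Render a flat Dublin Core record as HTML META tags, one per element."""
    lines = []
    for element, value in record.items():
        content = escape(value, quote=True)  # keep quotes from breaking the attribute
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


record = {
    "Title": "Grammar of Dublin Core",
    "Creator": "Tom Baker",
    "Subject": "Metadata",
}
print(dc_meta_tags(record))
```

Each tag carries exactly one name and one string value, which is the structural limit noted on the previous slide: there is no place to say that a value is itself an entity with properties.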
EVA 2000 Syntax Alternatives: XML The standard for networked text and data Widespread tool support –Parsers (DOM and SAX) –Extensibility (namespaces) –Type definition (XML Schema) –Transformation and Rendering (XSLT) –Rich linking semantics (XLINK)
EVA 2000 XML Schema Rich XML-based language for expressing type semantics Replaces arcane and limited DTD (origin in SGML) Facilities –Data typing (both complex and primitive) –Constraints –Defaults
EVA 2000 Dublin Core in XML Carl Lagoze Accommodating Simplicity and Complexity in Metadata Cornell University, Computer Science
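Only the text values of this slide's XML example survive in the transcript, so the element names are not known. The sketch below shows one plausible shape, built with Python's standard `xml.etree.ElementTree`: the `record` wrapper element and the choice of `creator` and `title` for the first two values are assumptions, and the third value is left out because its element cannot be recovered.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields: dict) -> ET.Element:
    """Build an XML record with one namespaced child per Dublin Core element."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


root = dc_record({
    "creator": "Carl Lagoze",
    "title": "Accommodating Simplicity and Complexity in Metadata",
})
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The namespace declaration is what lets elements from other vocabularies sit in the same record without name clashes.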
EVA 2000 Syntax Alternatives: RDF RDF (Resource Description Framework) The instantiation of the Warwick Framework on the Web Provides enabling technology for richly structured metadata Rich data model supporting notions of distinct entities and properties Syntax expressed in XML
EVA 2000 RDF Components Formal data model Syntax for interchange of data Schema Type system (schema model)
EVA 2000 RDF Data Model Directed labeled graphs Model elements –Resource –Property –Value –Statement –Containers
EVA 2000 RDF Model Primitives Resource Property Value Resource Statement
EVA 2000 RDF Syntax Example URI:R “CIMI Presentation” Title Creator dc: “Eric Miller” CIMI Presentation Eric Miller
EVA 2000 “Eric Miller” RDF Model Example #2 URI:R URI:ERIC oclc.org” “Eric Miller” “OCLC” bib: bib:Aff bib:Name URI:OCLC “CIMI Presentation” Title Creator oa: dc:
EVA 2000 CIMI Presentation Eric Miller RDF Syntax Example #2
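The graph in the second RDF example can be approximated as a list of (subject, predicate, object) triples. The diagram is only partly legible in this transcript, so the node names and prefixes below are a loose reading of it and should be treated as assumptions; the `objects` helper is hypothetical.

```python
# Each triple is (subject, predicate, object). An object is either a literal
# string or the identifier of another resource in the same graph.
triples = [
    ("URI:R", "dc:Title", "CIMI Presentation"),
    ("URI:R", "dc:Creator", "URI:ERIC"),
    ("URI:ERIC", "bib:Name", "Eric Miller"),
    ("URI:ERIC", "bib:Aff", "URI:OCLC"),
    ("URI:OCLC", "bib:Name", "OCLC"),
]


def objects(subject: str, predicate: str) -> list:
    """Return every object of the statements matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Walk two edges of the graph: resource -> creator -> creator's name.
creator_names = [
    name
    for creator in objects("URI:R", "dc:Creator")
    for name in objects(creator, "bib:Name")
]
print(creator_names)
```

In the first example the creator was a literal string; here it is a resource of its own, which is what allows the affiliation to be attached to the person rather than to the presentation.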
EVA 2000 RDF Containers Permit the aggregation of several values for a property Express multiple aggregation semantics –unordered –sequential or priority order –alternative
EVA 2000 RDF Schemas Declaration of vocabularies –properties defined by a particular community –characteristics of properties and/or constraints on corresponding values Schema Type System - Basic Types –Property, Class, SubClassOf, Domain, Range –Minimal (but extensible) at this time –minimize significant clashes with typing system designed for XML Schema WG Expressible in the RDF model and syntax
EVA 2000 Relationships among vocabularies dc:Creator ms:director marc:100 bib:Author
EVA 2000 Bringing it together RDF Metadata transmission –Embedded (e.g. ), Transmitted with resource (HTTP), Trusted 3rd Party (HTTP GET) RDF Data Model –Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources RDF Schema –Declare, define, reuse vocabularies
EVA 2000 Open Archives Initiative
EVA 2000 History Increasing interest in alternative scholarly publishing solutions – e.g., LANL arXiv Facilitation through federation UPS Mtg., Santa Fe, October 1999 –Representatives of various ePrint, library, and publishing communities –Goal: definition of an interoperability framework among ePrint providers
EVA 2000 What is Interoperability? Naming? –Handles –Purls Metadata? –MARC –Dublin Core Document models? –WebDAV Federated searching? –Z39.50? –DASL? Services and Protocols? –Dienst
EVA 2000 The World According to OAI Service Providers (Searching, Current Awareness, Summarization) harvesting Data Providers
EVA 2000 UPS Meeting Results Establishment of Open Archives Initiative –Loose coalition to experiment with interoperability solutions Santa Fe Convention –Organizational and technical framework to support metadata harvesting for ePrint archives
EVA 2000 Metadata Harvesting is not New Harvest Project ( ) –DARPA-funded –Mike Schwartz (U. Colorado), Mic Bowman (Penn State), Udi Manber (U. Arizona)
EVA 2000 “Open” Archives Political Agenda? –Author self-archiving of E-Prints –“Mission” to reformulate scholarly publishing framework Technical? –Infrastructure to facilitate interoperability across multiple domains
EVA 2000 Other communities of interest “Cambridge” digital library federation meetings –research library community has many materials for which they’d like to ‘expose’ metadata San Antonio OAI workshop –librarians, publishers (some), others
EVA 2000 Technical Umbrella for Practical Interoperability… Reference Libraries Publishers E-Print Archives …that can be exploited by different communities
EVA 2000 Acting mission statement Supply and promote an application independent technical framework – a supportive infrastructure that empowers different scholarly communities to pursue their own interests in interoperability in the technical, legal, business, and organizational contexts that are appropriate to them. Dan Greenstein, Director DLF
EVA 2000 What does this REALLY Mean? Keep the bar low enough to make widespread adoption possible Provide enough back-doors to make true “disruption” possible (e.g., ePrint community): –refine record notion to mandate full-content connection –refine metadata to mandate linkage to full-content
EVA 2000 Organizational Stability Institutional backing of CNI (Coalition for Networked Information) and DLF (Digital Library Federation) Formation of steering committee –first steps towards international involvement
EVA 2000 Framework for Partitioning Tasks Steering Committee –policy guidance Technical Committee –technical specifications Workshops –public dissemination, feedback, community-building
EVA 2000 Ithaca Technical Meeting Input –experiences gained with implementing & discussing the current SFc specs –emerging interest for the application of SFc-concepts as a general interoperability framework in a scholarly environment
EVA 2000 Ithaca technical meeting Output –guidelines for an in-depth revised technical spec to be issued early 2001 –stable for experimentation; not definitive –minimize risk for early adopters –maximize chances for future interoperability across communities
EVA 2000 Components of OAI Model –underlying concepts –abstract principles –concrete implementation of principles
EVA 2000 OAI Underlying Concepts –service providers –records in an archive –open interface to archives –managed archives (data providers)
EVA 2000 Building on Underlying Concepts Abstract principles: metadata harvesting; identifiers; metadata set formats; acceptable use; registration Implementation of principles: OAI harvesting protocol; URIs (community schemes); DC & XML container (parallel sets); Flow Control (usage restrictions); (community specific)
EVA 2000 What is a record? A record in an archive is a metadata-record. The metadata record describes – and can contain an entry point to- full-content.
EVA 2000 Metadata: Interoperability & Extensibility We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set.
EVA 2000 Metadata Solutions Adoption of unqualified Dublin Core Element Set as required metadata. Support for parallel metadata sets maintained –EPMS (e-print community) –Others: research library community, museum community
EVA 2000 Metadata XML Container oai:arXiv:hep/ Ernest Rutherford Investigations of Radioactivity doi:1234/5432
EVA 2000 Identifier Issues Basic identifier constraints based on URI specifications –A key for requesting a record from a repository –Key and metadata format ID uniquely identify a record Individual communities may develop URN registration schemes
EVA 2000 Identifier Solutions full-identifier = oai:archive-identifier:record-identifier –oai: registered URI scheme –archive-identifier: registered within OAI –record-identifier: unique ID within archive (syntax is archive-specific) example = oai:ncstrl:ncstrl.cornellcs/TR
EVA 2000 Repositories, Identifiers, and Records Identifier Datestamp MF1 MF2 MF3 MF4 … ….
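The three-part identifier pattern can be sketched as a small parser. This is an illustration under stated assumptions, not part of the OAI specification: the `OAIIdentifier` type, the `parse_full_identifier` function, and the sample identifier are hypothetical.

```python
from typing import NamedTuple


class OAIIdentifier(NamedTuple):
    scheme: str   # always "oai", the registered URI scheme
    archive: str  # archive identifier, registered within OAI
    record: str   # unique ID within the archive (archive-specific syntax)


def parse_full_identifier(full_id: str) -> OAIIdentifier:
    """Split oai:archive-identifier:record-identifier into its parts."""
    # Split on the first two colons only: the record part is archive-specific
    # and may itself contain colons.
    parts = full_id.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not a full OAI identifier: {full_id!r}")
    return OAIIdentifier(*parts)


# Hypothetical identifier, used only for illustration.
parsed = parse_full_identifier("oai:example-archive:report/42")
print(parsed)
```

Keeping the record part opaque matches the slide's point that its syntax is left to each archive.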
EVA 2000 Selective harvesting Recognized need for light-weight facility for selective harvesting –By Date Sets –A low-cost means of selective harvesting –NOT a general tool for defining global categories –Attribution of meanings to sets can be done within communities and in bilateral fashion
EVA 2000 Protocol Solutions Normalized and Enhanced Verb Set –GetRecord –Identity –ListIdentifiers –ListMetadataFormats –ListRecords –ListSets
EVA 2000 Protocol Solutions CGI-script friendly syntax –baseurl?verb=verbname&argname=argval... –verbname is the name of the verb –argname is the name of the attribute –argval is the value of the attribute Example
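The example request on this slide did not survive in the transcript. The Python sketch below builds a request in the `baseurl?verb=verbname&argname=argval` pattern; the base URL, the argument names `identifier` and `metadataFormat`, and the identifier value are hypothetical, and the verb list is copied from the previous slide as written.

```python
from urllib.parse import urlencode

# Verb names as listed on the "Normalized and Enhanced Verb Set" slide.
VERBS = {
    "GetRecord", "Identity", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Build a CGI-style harvesting request: baseurl?verb=...&argname=argval."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    # The verb goes first; the remaining arguments are sorted for stable output.
    query = urlencode([("verb", verb), *sorted(args.items())])
    return f"{base_url}?{query}"


url = build_request(
    "http://archive.example.org/oai",            # hypothetical base URL
    "GetRecord",
    identifier="oai:example-archive:report/42",  # hypothetical identifier
    metadataFormat="dc",                         # hypothetical argument name
)
print(url)
```

Because every request is a plain URL, a harvester can be tested from a browser or a short script, which fits the low-barrier goal stated for the protocol.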
EVA 2000 Registration Solutions Automation through: –On-line registration of: Archive identifier (uniqueness enforcement) base-url of archives OAI protocol implementation –Identity verb that exposes archive characteristics –Use of protocol for registration of metadata formats and validity checking Registration of service providers is still an open issue
EVA 2000 Release Schedule October 15 – normalized meeting notes distributed to meeting group November 1 – beta specification to steering committee and limited distribution Early January – stabilization of specification and public meeting
Metadata Landscape EVA 2000 Moscow
EVA 2000 Conferences –ACM Digital Libraries 2001, San Antonio, June 2001 –European Conference on Digital Libraries, Darmstadt, Sep –Asian Digital Library Conference, Seoul, December 2000 –Tenth International WWW Conference, Hong Kong, May 2001
EVA 2000 NSF Digital Library Initiative Phase I ( ): six large-scale testbeds involving research universities, industrial partners, and next-generation technologies Phase II (1999+): expanded scope, smaller projects as well as large testbeds, emphasis on making accessible new types of content
EVA 2000 Distributed National Electronic Resource (UK) A managed environment for Internet access to scholarly journals and other materials relevant to higher education in the UK Uses international standards (e.g., Dublin Core) National purchase and licensing agreements for best value to UK education community eLib research funding since mid-1990s emphasized incremental improvement of standards and services
EVA 2000 Global Info (Germany) "The German Digital Library Project" Since 1996, integrating access to scientific information among libraries, publishers, learned societies, and individual scientists Emphasis on open standards (e.g., Dublin Core) and open-standard formats (e.g., XML, RDF, MPEG)
EVA 2000 European Union Fifth Framework Programme, –several dozen projects with several countries each –Digital Heritage, Cultural Content –Interactive Electronic Publishing –Multimedia Content and Tools DELOS Network of Excellence –http://www.ercim.org/delos/ –Communication within European digital library research community and international networking
EVA 2000 MathNet German Mathematical Societies index math pre-prints and home pages of mathematicians –Encourages use of Dublin-Core-based metadata by distributing free metadata editor; displays hits "with metadata" separately from hits "without metadata" International Mathematical Union (IMU) planning international Web service based on German MathNet model Seeking international agreement on simple metadata profiles for types of math materials
EVA 2000 IMS Global Learning Consortium, Inc. Teachers seeking appropriate classroom materials on Web may want to know: –for which age-group? –has it already been used successfully in classrooms? –will it work on my equipment? IMS: Rich descriptions of learning resources in a standard record format
EVA 2000 Federal Geographic Data Committee (US) FGDC Content Standard for Digital Geospatial Metadata: integrate access to resources about a particular area found in diverse repositories Government, education, and business needs –Emergency management –Integrated databases and comprehensive maps –City planning –Environmental control
EVA 2000 Visual Resources Association VRA Core Categories in a two-level model for describing objects such as paintings and buildings "Works" described separately from "images" of those works (One-to-One Principle) Conceptual clarity of One-to-One Principle implies more complex work-flow and processing for catalogers and software
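The two-level model can be illustrated with a small data structure in which a work and each image of it are described in separate records. This is a hedged sketch of the One-to-One Principle only: the class names, field names, and sample values are hypothetical and do not reproduce the VRA Core Categories.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Work:
    work_id: str
    title: str
    creator: str


@dataclass(frozen=True)
class Image:
    image_id: str
    work_id: str  # link back to the work this image depicts
    medium: str   # describes the image itself, not the work


def images_of(work: Work, images: Iterable[Image]) -> List[Image]:
    """Collect every image record that points at the given work."""
    return [img for img in images if img.work_id == work.work_id]


painting = Work("w1", "Sample Painting", "Sample Artist")
images = [
    Image("i1", "w1", "35mm slide"),
    Image("i2", "w1", "digital scan"),
    Image("i3", "w2", "digital scan"),
]
print([img.image_id for img in images_of(painting, images)])
```

The extra join between works and images is the added work-flow cost the slide refers to: catalogers maintain two kinds of records and software must follow the link between them.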
EVA 2000 Nordic Metadata Project Cooperation between Scandinavian countries (since circa 1996) Pioneered idea of metadata-based distributed index across national boundaries NetLab (Lund University) maintains SAFARI, which harvests Dublin-Core-based metadata embedded in documents on Web servers
EVA 2000 Renardus Project (EU) –National libraries (Netherlands coordinates) –NDR: National Digital Resource in UK –Die Deutsche Bibliothek Goal: integrated access to subject gateways in Europe High-level agreement on simple, Dublin-Core-based schema as common denominator
EVA 2000 Networked Digital Library of Theses and Dissertations (NDLTD) International consortium of projects putting dissertations online Difficult to agree on single unified metadata schema -- national, legal, and disciplinary requirements differ significantly NDLTD agreement on a small Dublin-Core-based set of metadata elements?
EVA 2000 CIDOC International Council of Museums: object-oriented model (CIDOC) designed for describing multiple entities that may be –physical (e.g., museum objects) –conceptual (e.g., works) –temporal (e.g., historical periods) –spatial (e.g., places) Implies an integrated information space of "encyclopedic" scope
EVA 2000 Rich Site Summary (RSS) Metadata for content syndication (news feeds) Used in developing media content portals Built on established vocabularies (DC), uses RDF syntax Layers of application-specific semantics: syndication vocabularies, annotation vocabularies, etc.
EVA 2000 Moving Picture Experts Group (MPEG) MPEG 4: encoding and interacting with audio-visual objects MPEG 7: multimedia content description interface for such objects MPEG 21: ambitious "umbrella" framework describing the infrastructure for delivering and consuming multimedia content
EVA 2000 More... INDECS - Uses an event-based model to describe intellectual property rights for commercial transactions DOI - Uses the INDECS framework with a Digital Object Identifier for content description and management of references between scientific, technical, and medical journals BSR - Basic Semantic Registry as a universal interlingua of concepts GILS - Government Information Locator Service
EVA 2000 and more... PDS - Planetary Data System IEEE Learning Object Metadata - an elaborate, hierarchical scheme for describing multiple facets of educational material MARC 21 - Machine Readable Cataloging format and related vocabularies for libraries EPICS Data Dictionary, a subset of which -- ONIX -- describes books in a specific XML format (pushed by Amazon.com)
EVA 2000 For further information.... "Metadata Watch Reports" of SCHEMAS Project –Critical overview (with expert commentary) on the metadata landscape as it evolves –Related database of individual activity reports D-Lib Magazine Ariadne
EVA 2000 Why the Web won Tim Berners-Lee's original model was very simple, and it was easy to implement Real-world experience with simple HTML led iteratively to better understanding of priorities –As with bicycles and airplanes, there was no "theory" for design -- design was perfected iteratively, starting simple Complex standards impose significant costs, especially if legacy data must be converted
EVA 2000 Learning from experience People are only human: the most perfect language is always subject to interpretation By design, metadata languages must allow for innovation and evolution Physics and art history, Chinese and Finnish -- different languages will continue in real life Likewise, a diversity of metadata languages is inevitable Interoperability over "everything" can only be via a simple and general pidgin