An introduction to metadata for libraries, museums and archives Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003 Pete Johnston.

An introduction to metadata for libraries, museums and archives Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported by:

Section 1 : An Introduction to Metadata

An introduction to metadata
- Memory institutions, network services and metadata
- What is metadata?
- Exposing/sharing metadata
- Exposing/sharing metadata: semantics
  - the Dublin Core Metadata Initiative

Memory institutions, network services and metadata

Memory institutions
Museums, libraries and archives, often called memory institutions, are trusted organizations that collectively document the entire range of human experience and expression. Memory institutions are engaged in the important work of:
- Capturing, authenticating, and making sense of cultural memory;
- Preserving the human record for future generations; and
- Sharing knowledge to support education and learning.

Delivering services
- Memory institutions provide services to users
  - (At least some of) these services provide access to resources
- Emergence of [...] built on global networks
  - remote access to digital resources for all (potentially…)
  - resources available round the clock
  - resources comparable to other digital resources from elsewhere
- Investment in
  - digitisation of cultural content
  - network services providing access to digitised content

Delivering services
- Potential for new types of service
  - digital libraries, virtual museums etc
  - integrated access to resources from multiple remote content providers
  - services defined by theme/subject/activity/audience etc, not by location/source
  - packaging and re-purposing of content
  - user-oriented rather than provider-oriented
- Changing user expectations
  - user wants information relevant to task/activity
  - may see structural/organisational boundaries of content providers as unimportant!
  - user wants access from any location
  - user wants access at any time

Delivering services
- Move from web sites to portals
  - "A network service that provides a personalised, single point of access to a range of heterogeneous network services, local and remote, structured and unstructured" (Andy Powell, 2002)
- Content providers exposing content for delivery through multiple services, channels
- Presentation services surfacing content from multiple (distributed) sources
- Memory institutions may perform both roles
- Move away from silo mentality towards more joined-up approaches

Resource discovery on the Web
- Broadly two approaches to providing discovery services
  - software indexing of resource content
  - human description of resources
- Web search engines
  - software agents (robots) retrieve documents by following hyperlinks (crawling)
  - index text of documents
  - make index available as searchable database
  - some clever ranking algorithms
    - e.g. Google infers page ranking based on links to document
  - find pages which link to page X
  - find pages similar to X

Resource discovery on the Web
- Web search engines
  - tend to generate many results
  - and may suffer from spamming
  - ranking algorithms may help
  - don't support structured search
    - search on author name
    - search on document type (journal article)
  - limited to textual resources
    - generally, poor support for search for multimedia objects
- The hidden Web
  - robots may not crawl documents dynamically generated from databases/CMS

Resource discovery on the Web
- But automated indexing
  - is low cost
    - at least compared to human resource description
  - (usually) scales to large numbers of resources
  - can be a useful tool!
- Challenge of finding appropriate balance of approaches for context

Metadata for services
- Metadata has been important to traditional service provision…
- … is essential component of effective network services

What is metadata?

What is metadata?
Simple definitions…
- "Structured data about data." (Dublin Core Metadata Initiative FAQ, 2003)
- "Machine-understandable information about Web resources or other things." (Tim Berners-Lee, W3C, 1997)

Towards a functional view of metadata
- "Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person." (Lorcan Dempsey & Rachel Heery, 1998)
- "Structured data about resources that can be used to help support a wide range of operations" (Michael Day, 2001)

What resources, objects, things?
- HTML documents
- digital images
- databases
- books
- museum objects
- archival records
- metadata records
- Web sites
- collections
- services
- physical places
- people
- institutions
- abstract works
- concepts
- events
Metadata might exist for almost anything
- digital, physical, abstract resources

What resources, objects, things?
Metadata records include
- bibliographic records in library catalogues or from abstracting & indexing services
- descriptions of archival material in archival finding aids
- object records in museum documentation / collection management systems
- entries in directories of organisations, individuals and services
- descriptions of digital objects (documents, images, software)
- descriptions of collections of digital objects
- descriptions of network services
- descriptions of metadata records

What operations?
- Operations by human users, software tools
- Metadata might be used to support many different functions
  - resource disclosure & discovery
  - resource management, including preservation
  - intellectual property rights management
  - commerce
  - authentication and authorisation
  - personalisation and localisation of services
- Different functions require different types/classes of metadata
  - No "one size fits all" solution
  - Need to specify functional requirements

Metadata elements & element sets
- Metadata describes attributes or properties of a resource
- Each attribute or property is described by a metadata element
  - Can be identified, formally documented/defined
  - May be represented in different forms
- A metadata element set
  - coherent bounded set of elements formulated as basis for metadata creation
  - created for purpose, as a unit
- Schema
  - structured representation of an element set

Metadata for resource discovery
User wishes to
1. discover resources according to some criteria
2. (optionally) identify a specific resource
   - confirm that resource described is resource sought
   - distinguish similar resources
3. select
   - evaluate, choose resource appropriate to needs
4. locate resource
5. obtain/access resource
6. use resource
   - open, read, display, run, play, copy, unpackage/repackage
   - interpret content
Resource discovery metadata supports (primarily) operations 1 to 4

Metadata for resource discovery
Continuum of complexity/functionality:
- full-text indexes
  - might not be classed as metadata by some!
  - generated by software tools
  - support discovery (by content), location
- semantically simple forms (e.g. Dublin Core)
  - typically covering description of broad range of resources
  - may be partly generated automatically, partly human authored
  - support discovery, identification, selection, location
- richer, complex forms (e.g. MARC, EAD, CIMI-SPECTRUM, AMICO etc)
  - typically covering specific types of resources
  - often associated with particular community/domain
  - creation may involve relatively high degree of human expertise
  - support discovery, identification, selection, location, access, use (which may be type specific)

Association of resource and metadata (1)
Metadata embedded in resource
[Figure: Resource1 containing embedded metadata: Creator = J Smith, Date = , Title = Report]
- e.g. meta elements in HTML docs; summary properties in word processor docs
- Can resource support embedding of metadata?
- Does metadata creator have write access to resource?
- Can service extract embedded metadata?
- Metadata about aggregates of resources?
- Metadata about people, places, concepts?

Association of resource and metadata (2)
Metadata record as separate object; record identifier embedded in resource
[Figure: Resource1 containing "Metadata rec = 1", pointing to Metadata rec 1: Creator = J Smith, Date = , Title = Report]
- e.g. link elements in HTML docs
- Metadata record may be remote from resource
- Can resource support embedding of link?
- Does metadata creator have write access to resource?
- Can service follow link to metadata record?
- What happens when resource deleted?
- Metadata about aggregates of resources?
- Metadata about people, places, concepts?

Association of resource and metadata (3)
Metadata record as separate object; resource identifier in metadata record
[Figure: Resource1, and Metadata rec 1: Creator = J Smith, Date = , Title = Report, Doc = 1]
- Metadata record may be remote from resource
- Does not require embedding of metadata or link
- Does not require metadata creator to have write access to resource
- Metadata record created independently of resource (possibly multiple records)
- Service uses metadata records independently of resource
- Metadata record may persist after resource deleted
- Metadata record can describe anything (with identifier…)

Metadata as managed resource
[Figure: database table with columns Creator, Date, Title, Doc and a row with values J Smith, Report, 1]
- Metadata record is used separately from resource described
- Recognition that metadata is resource to be managed, separately from resource described
- Metadata content stored in database, exposed in form(s) appropriate for service(s)

Exposing/sharing metadata

How is metadata exposed/shared?
- Resource description communities
  - characterised by consensus on conventions for internal exchange of metadata
- Metadata for resource discovery
  - is used beyond its creator community
  - is combined/compared with metadata from other communities
  - is aggregated or cross-searched by services
- How does a content provider make metadata records available in a commonly understood form?
- How does a service provider obtain these metadata records from data providers?

How is metadata exposed/shared?
Effective sharing of information expressed in metadata record requires agreement on
- metadata semantics
  - what metadata elements mean
- metadata structure
  - data model, relationships of component parts
- metadata syntax
  - rules of expression
- protocols
  - how metadata records are transmitted between content provider and service provider
Agreements formalised as specifications and standards (ideally…)

Exposing/sharing metadata : semantics Introducing the Dublin Core

Introducing the Dublin Core
- Initiative to improve resource discovery on Web
  - not for complex resource description
  - based on description of simple document-like objects
  - extended to other classes of resource
- International, cross-disciplinary consensus on simple element set
  - 15 elements
  - all optional
  - all repeatable

Introducing the Dublin Core (2)
- Title
- Subject
- Description
- Creator
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights

Dublin Core: creator
- Term Name: creator
- Label: Creator
- Definition: An entity primarily responsible for making the content of the resource.
- Comment: Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.
- Type of Term: element
- Status: recommended
- Date issued:
- URI:

Dublin Core: date
- Term Name: date
- Label: Date
- Definition: A date associated with an event in the life cycle of the resource.
- Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.
- Type of Term: element
- Status: recommended
- Date issued:
- URI:
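The YYYY-MM-DD recommendation in the comment above can be illustrated with a small Python sketch. It handles only the complete-date form; W3CDTF also permits shorter forms such as YYYY and YYYY-MM, which this sketch does not cover. The function names `to_w3cdtf` and `is_w3cdtf_date` are made up for illustration.

```python
from datetime import date


def to_w3cdtf(d: date) -> str:
    """Format a date in the complete-date form YYYY-MM-DD."""
    return d.isoformat()


def is_w3cdtf_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return False
    # Round-trip so that other ISO 8601 spellings are rejected as well.
    return parsed.isoformat() == value


print(to_w3cdtf(date(2003, 4, 16)))
print(is_w3cdtf_date("16 April 2003"))
```

A service receiving Date values could use a check like this to decide whether a value can be sorted or range-searched as a date, or must be treated as free text.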

Standardisation of Dublin Core
- CEN Workshop Agreement (EU)
  - 2000: Dublin Core elements endorsed as CWA13874
  - Usage guidelines for European industry
- NISO Z39.85 (USA)
  - 2001: National Information Standards Organization, an ANSI affiliate
- ISO
  - 2002: Dublin Core Metadata Element Set approved as ISO 15836

Using the Dublin Core
- Tom Baker, "A Grammar of Dublin Core", D-Lib Magazine, October 2000
- Metaphor of metadata as language
- DC as a simple "pidgin" language for use by "tourists on the Internet commons"
- Small vocabulary, simple grammar/structure
  - This Resource has Title "An introduction to metadata"
  - This Resource has Subject "Resource discovery"
- Not subtly expressive, but easy to learn and deploy: good enough to work

Using the Dublin Core
- Designed for simplicity of semantics, ease of use
- Provides basic semantic interoperability
  - semantics sufficiently general to be useful across domains
- Can provide "15 windows" into richer resource descriptions
  - disclose rich description in simple form
  - semantic cross-walks, mappings

Using the Dublin Core
[Figure: fields of a rich description (title, creator, date, desc, rights) mapped onto a simple DC description]
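The idea of disclosing a rich description through a simple DC view can be sketched as a cross-walk table. This is only an illustration: the rich field names (`main_title`, `author`, `shelf_mark` and so on) and the mapping itself are hypothetical and do not come from any real schema.

```python
# Hypothetical cross-walk from fields of a richer local schema to DC elements.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "date_created": "date",
    "abstract": "description",
    "licence": "rights",
}


def to_simple_dc(rich_record):
    """Project a rich record onto simple DC; unmapped fields are dropped."""
    simple = {}
    for field, value in rich_record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no DC equivalent in this cross-walk
        # DC elements are repeatable, so values are collected in lists.
        simple.setdefault(element, []).append(value)
    return simple


rich = {"main_title": "Report", "author": "J Smith", "shelf_mark": "X/1"}
print(to_simple_dc(rich))
```

The rich record remains the master copy; the simple DC view is derived from it for sharing, and loses the fields that have no counterpart.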

Qualifying Dublin Core
Allows for controlled extensibility through qualifiers
- Element refinements
  - make element meanings narrower, more specific:
    - a Date Created versus Date Modified
    - an IsReplacedBy versus Replaces Relation
- Encoding schemes
  - provide contextual information or parsing rules that aid in the interpretation of a value
  - may specify that a value is drawn from a controlled vocabulary (e.g. LCSH, TGN etc)
  - may specify that a value is formatted in accordance with a specified notation (e.g. date formats)

Qualifying Dublin Core
- Qualifiers make elements more specific
  - Element Refinements narrow meanings, never extend
  - Encoding Schemes give context to element values
- The "dumb-down" rule
  - Application should be able to use the value as if it were unqualified
  - Ignore unknown Encoding Schemes
  - Resolve (semantically more specific) Element Refinements to (more generic) Elements
- Some loss of specificity, but still generally correct and useful for discovery
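The dumb-down rule can be sketched in a few lines of Python. The refinement names below are the ones mentioned in these slides (Created, Modified, Valid, IsReplacedBy, Replaces); the statement layout, a dict with `property`, `value` and an optional `scheme`, is an assumption made for this example rather than any standard representation.

```python
# Element refinements mentioned in the slides, mapped to the element they refine.
REFINES = {
    "created": "date",
    "modified": "date",
    "valid": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
}


def dumb_down(statements):
    """Reduce qualified DC statements to simple (unqualified) DC."""
    simple = []
    for stmt in statements:
        # Resolve a refinement to its parent element; plain elements pass through.
        element = REFINES.get(stmt["property"], stmt["property"])
        # The encoding scheme is ignored: the value is kept as a plain string.
        simple.append({"property": element, "value": stmt["value"]})
    return simple


qualified = [
    {"property": "created", "value": "2003-04-16", "scheme": "W3CDTF"},
    {"property": "title", "value": "Report"},
]
print(dumb_down(qualified))
```

After dumbing down, the first statement says only that 2003-04-16 is a date associated with the resource. That is less specific than "date created", but still correct.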

Dublin Core: valid
- Term Name: valid
- Label: Valid
- Definition: Date (often a range) of validity of a resource.
- Type of Term: element-refinement
- Status: recommended
- Date issued:
- URI:

Using the Dublin Core
- Not a replacement for richer descriptive standards
- But useful
  - If you wish to disclose community-specific metadata to other communities using commonly understood semantics
  - If you wish to provide integrated access to your own metadata databases with different underlying semantics
  - If you only need simple metadata semantics

Using the Dublin Core
Inherent tensions in DC
- Broad, fuzzy search buckets or rigidly prescribed usage?
- Generic applicability across domains or intra-domain precision?
- One-size-fits-all or customise-as-you-please?
- Simply discovering resources (a few typical search attributes) or describing them fully (lots of detail)?
- Dublin Core primarily as a native record format or extracted from richer metadata?
- Broad-brush minimalism or comprehensive structuralism?

Summary
- Emergence of global networks enables new approaches to providing access to resources
  - Increasing requirement to provide resource discovery across boundaries
- Metadata supports many functions, including resource discovery
- DC as simple, cross-disciplinary metadata element set
- Next:
  - How metadata records are represented: syntax/structure
  - How metadata records are exposed/shared/used in resource discovery services

Section 2 : Sharing metadata: XML and the OAI Protocol for Metadata Harvesting

Sharing metadata : XML and OAI
- Exposing/sharing metadata: syntax and structure
  - Extensible Markup Language (XML)
  - XML Schema
- Metadata harvesting
  - The Open Archives Initiative Protocol for Metadata Harvesting
- Some OAI-based services
- Developing metadata-based services

Exposing/sharing metadata : syntax and structure XML & XML Schema

Embedding DC metadata in (X)HTML
- Dublin Core metadata can be embedded into (X)HTML documents
  - Simple to deploy but may be difficult to manage, maintain
- But almost none of the Web search engine services index it
- Lack of trust in open Web context
  - Abuse by content providers seeking to improve the ranking of their documents
- However, may be useful technique in closed context
  - e.g. single Web site or where there is control over which documents are indexed

Embedding DC metadata in (X)HTML
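The slide's own markup example is not preserved in this transcript. As a stand-in, the sketch below shows one common way of embedding DC in an HTML head, using `meta` elements whose names carry a `DC.` prefix, together with a small Python extractor built on the standard library `html.parser` module. The sample page and the `DCMetaParser` class are illustrative assumptions, not the presenter's example.

```python
from html.parser import HTMLParser

# Illustrative page: DC values carried in <meta name="DC.xxx" content="...">.
PAGE = """<html><head>
<title>Report</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="Report">
<meta name="DC.creator" content="J Smith">
</head><body></body></html>"""


class DCMetaParser(HTMLParser):
    """Collect (element, value) pairs from DC-prefixed meta elements."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            # Strip the "DC." prefix to leave the element name.
            self.fields.append((name[3:], content))


def extract_dc(html_text):
    parser = DCMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


print(extract_dc(PAGE))
```

An indexer for a single Web site, the closed context mentioned on the previous slide, could run an extractor like this over each page it controls.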

Introducing XML
- Extensible Markup Language
  - Recommendation of W3C, 1998, 2000
- Defines means of describing tree-structured data in text-based format
  - embedded markup delimits and describes data
- Simple, platform-independent syntax
- Standard programming interfaces
  - reusable software components
- Support from major software vendors
- Widely adopted for transferring data between programs, systems

[Figure: a database table row (columns Creator, Date, Title, Doc; values J Smith, Report, 1) shown alongside the same record as a tree: table > record > doc (1), title (Report), creator (J Smith), date]

[Figure: a database record (Creator, Date, Title, Doc) passes through serialisation, transmission and de-serialisation to reach a remote application]
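The serialise, transmit, de-serialise cycle can be sketched with Python's standard `xml.etree.ElementTree` module. The element names `record`, `doc`, `title` and `creator` follow the labels that survive from the diagram; the date field is left out because its value is not preserved in the transcript. The functions are a minimal illustration, not a prescribed exchange format.

```python
import xml.etree.ElementTree as ET


def serialise(record):
    """Turn a flat dict into an XML string with one child element per field."""
    root = ET.Element("record")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def deserialise(xml_text):
    """Rebuild the dict from the XML string on the receiving side."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


record = {"doc": "1", "title": "Report", "creator": "J Smith"}
xml_text = serialise(record)   # this string is what would be transmitted
print(xml_text)
print(deserialise(xml_text) == record)
```

Both sides need only agree on the XML structure; neither needs to know how the other stores the record internally.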

XML and interoperability
- Meta-language
  - language for describing markup languages
  - can define unlimited number of markup languages
- But….
  - XML says nothing about what your names mean
  - will a software agent process my XML element correctly?
- Interoperability requires consensus on
  - the names of components (XML elements and attributes)
  - the structural model of a class of document
  - the semantics represented by the components and the structure
- Shared use of common XML schemas

XML schemas
- Means to codify syntax/structure rules for class of XML document
  - what markup is allowed
  - structural constraints on use of markup
- Document Type Definition (DTD)
  - part of XML Recommendation
- W3C XML Schema
  - W3C recommendation
  - data-typing i.e. tighter control on element content
  - support for XML Namespaces
  - uses XML syntax
- Software can validate instance against DTD/schema

Metadata harvesting: The Open Archives Initiative Protocol for Metadata Harvesting

Searching & harvesting
- Resource discovery services operating across the resources of multiple distributed content providers
- Possible strategies
  - Distributed search
    - submit parallel queries to multiple metadata databases
    - collate multiple result sets for presentation to user
  - Harvest
    - gather metadata records from multiple providers into single database
    - (periodic re-gathering to refresh data)
    - query central database
- Performance issues in cross-searching
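The two strategies can be contrasted with a toy in-memory model. In this sketch each provider is just a list of records held in a dict, and the provider names and records are invented; a real service would issue network queries (distributed search) or run periodic harvests against remote repositories.

```python
# Invented providers, each holding a few metadata records.
PROVIDERS = {
    "library": [{"title": "Report on metadata"}, {"title": "Annual review"}],
    "museum": [{"title": "Metadata for objects"}],
}


def matches(record, term):
    return term.lower() in record["title"].lower()


def distributed_search(providers, term):
    """Query every provider at search time and collate the result sets."""
    results = []
    for name, records in providers.items():
        results.extend((name, r) for r in records if matches(r, term))
    return results


def harvest(providers):
    """Gather all records into one central index, ahead of any search."""
    return [(name, r) for name, records in providers.items() for r in records]


def search_index(index, term):
    """Query only the central index built by harvest()."""
    return [(name, r) for name, r in index if matches(r, term)]


index = harvest(PROVIDERS)
print(search_index(index, "metadata"))
```

Both routes return the same hits here. The practical difference is where the cost falls: distributed search waits on every provider at query time, while harvesting pays that cost in advance and then has to re-gather periodically to stay current.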

Introducing OAI
Open Archives Initiative
- develops/promotes interoperability standards to facilitate dissemination of content
- roots in e-prints community seeking to improve access to scholarly publications
  - Deposit pre-prints: for quicker dissemination
  - Deposit post-prints: to reduce institutional costs, maximise impact
- e-print archives
  - institutional
  - federated subject/discipline-based
- required simple low-cost interface to expose metadata for reuse

Introducing OAI (2)
- Terminology
  - Archive = repository, not "archive"
  - Open in terms of architecture, not free/unlimited access to repository
- Protocol for Metadata Harvesting (OAI-PMH)
  - Developed by international technical committee
  - Shift from optimising discovery of e-prints to more generic resource discovery
  - OAI committed to version 2.0 as a production release

Introducing OAI PMH
- Lightweight, low-cost protocol which allows data providers to expose metadata records for retrieval by service providers
- Service providers can say "give me all/some of your metadata records"
- Built on HTTP, XML
  - Six verbs: requests from service provider to data provider sent using HTTP GET/POST
  - responses from data provider to service provider as XML documents
- Not a distributed search protocol
- Not limited to e-print archives

Introducing OAI PMH (2)
- Supports transfer of metadata records
  - resources made available separately
  - identifier/locator of resources typically included in metadata record
- Data provider must provide simple/unqualified DC metadata record
  - may provide metadata records in other formats
  - metadata formats must be associated with a W3C XML Schema
- Extensible framework for metadata about
  - repository, sets, records
- Metadata and resources often freely available
  - but not a requirement

Introducing OAI PMH (3)
- Supports selective harvesting
  - by sets
  - by datestamps
- Example
  - Service Provider: List all records added since Jan in simple DC format (oai_dc)
    - verb = ListRecords
    - from =
    - metadataPrefix = oai_dc
    - http:// bin/oai?verb=ListRecords&from= &metadataPrefix=oai_dc
  - Data Provider: Returns XML document containing records
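A selective-harvesting request like the one in the example can be assembled with Python's standard `urllib.parse` module. The argument names `verb`, `from`, `set` and `metadataPrefix` are defined by OAI-PMH; the base URL `http://example.org/oai` and the date `2003-01-01` are placeholders chosen for this sketch, since the host and date in the original slide are not preserved.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for an HTTP GET."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec     # selective harvesting by set
    return base_url + "?" + urlencode(params)


# Placeholder repository address and date, for illustration only.
url = list_records_url("http://example.org/oai", from_date="2003-01-01")
print(url)
```

The data provider answers such a request with an XML document containing the matching records.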

[Figure: content providers, each with resources, metadata and a Web site, expose DC metadata via OAI-PMH to portal Web sites]

OAI DC metadata record (from Library of Congress Repository 1)
Element values (the XML markup is not preserved in this transcript):
- Empire State Building. [View from], to Central Park
- Gottscho, Samuel H , photographer
- Jan. 19
- image
- two-dimensional nonprojectible graphic
- Cityscape photographs.
- Acetate negatives.
- United States--New York (State)--New York.
- No known restrictions on publication.
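To show roughly what such a record looks like on the wire, the sketch below wraps a few of the values in `oai_dc` markup and reads them back with Python's `xml.etree.ElementTree`. The two namespace URIs are the standard ones for oai_dc and the DC element set. Which DC element each value belongs to is a guess, because the element names are missing from the transcript, and only the values that can be placed with reasonable confidence are included.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Element assignments below are guesses; the original tags were lost.
RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:creator>Gottscho, Samuel H , photographer</dc:creator>
  <dc:type>image</dc:type>
  <dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>"""


def dc_fields(xml_text):
    """Return {element name: [values]} for the DC elements in a record."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            # Elements are repeatable, so each name maps to a list.
            fields.setdefault(name, []).append(child.text or "")
    return fields


print(dc_fields(RECORD)["creator"])
```

A harvester would apply something like `dc_fields` to each record in a ListRecords response before loading it into its own database.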

Some OAI based services

Resource Discovery Network (RDN)
Co-operative network of subject gateways
–Funded by JISC for HE and FE
Seven hubs
–ALTIS: Hospitality, Leisure, Sport and Tourism
–BIOME: Health and Life Sciences
–EEVL: Engineering, Mathematics and Computing
–GESource: Geography and Environment
–Humbul: Humanities
–PSIgate: Physical Sciences
–SOSIG: Social Sciences, Business and Law
Databases of metadata records describing Internet resources selected for high quality

Resource Discovery Network (RDN)
Hubs as subject communities
–metadata creators are subject specialists
–good links with users
–separate metadata schemas
Hubs provide their own Web interfaces
–search databases
–other services: tutorials, guides, alerting etc
But operate within a shared policy framework
–collection development
–cataloguing guidelines
–technical standards
–agreements on IPR

Resource Discovery Network (RDN)
RDN Resource Finder
–Cross-search of Hubs' metadata records
–Initially distributed search using Z39.50
–Performance issues
–Difficult to build flexible browse interface
–Now using OAI PMH to harvest records
–Currently harvesting simple DC
–Basic keyword searching
–Exploring harvesting some richer record formats for additional functionality
Also some sharing of metadata
–between Hubs (DC plus extensions)
–between Hubs and other similar services (LOM)
–but Hubs' metadata not freely available for harvest

Resource Discovery Network

e-Prints UK
JISC-funded project
Provide access to e-prints via subject-based RDN services
Harvest metadata from e-print archives
–institutional, non-institutional, personal
Automatically enhance harvested metadata (using Web Services)
–Add (or validate) authoritative forms of author names (OCLC)
–Assign subject classification (based on analysis of full-text of resource) (OCLC)
–Generate OpenURLs from citations (based on analysis of full-text of resource) (Univ of Southampton/UKOLN)

e-Prints UK
Provide search services
–across all metadata
–subject-partitioned search services for Hubs
Enhanced metadata records made available to originating e-print archive
Note
–service provider enhancing harvested metadata to provide more functionality
–some of enhancement process requires access to resource as well as metadata record
–two-way flow of metadata records
–recommendations for how to use simple DC to describe e-prints to maximise benefits of metadata disclosure

e-Prints UK
[Architecture diagram. Labels: institutional, non-institutional and personal e-print archives; subject classification service and name authority service (Web services offered by OCLC); citation analysis service (Web service offered by Southampton); RDN gateway/portal services (end-user services thru the RDN); protocols shown: OAI-PMH, SOAP, Javascript/HTTP, Z39.50]

Developing metadata-based services

Developing services
Consensus on metadata semantics/syntax, transport protocols etc as minimal requirements
Resource selection
–collections policies
Metadata quality assurance
–cataloguing rules
–mandatory elements, minimum-level records
–guidance on content of values of elements: formats, controlled vocabularies, identifiers etc
–Maintenance, currency of metadata
Agreements on IPR, usage rights, branding
–for metadata records as well as resources
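A small sketch of what an automated check for "mandatory elements" and "controlled vocabularies" could look like. The policy itself (which elements are mandatory, which type values are allowed) is invented for illustration and is not taken from any RDN or DCMI guideline; `check_record` is a hypothetical helper operating on records held as a dict of lists.

```python
# Hypothetical cataloguing policy, for illustration only.
MANDATORY = ("title", "identifier")
CONTROLLED = {"type": {"Text", "Image", "Sound", "Dataset"}}


def check_record(record):
    """Return a list of problems found in a simple DC record (dict of lists)."""
    problems = []
    for element in MANDATORY:
        if not record.get(element):
            problems.append(f"missing mandatory element: {element}")
    for element, allowed in CONTROLLED.items():
        for value in record.get(element, []):
            if value not in allowed:
                problems.append(f"{element}: '{value}' is not in the controlled vocabulary")
    return problems


good = {"title": ["A report"], "identifier": ["http://example.org/1"], "type": ["Text"]}
poor = {"title": ["Untitled"], "type": ["picture"]}

print(check_record(good))
print(check_record(poor))
```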

Developing services
DCMES intended to be simple enough for creation by untrained creators
–assumption that metadata creation straightforward?
Recognition that precision in services depends on quality of metadata
Subject terms/classification difficult for non-expert
Different services providing different functionality to different audiences may require different metadata

Developing services
Human creation of metadata is not cheap!
Where possible, use automated methods to
–Generate metadata
–Normalise/enhance metadata
Service providers as well as data providers can contribute (e.g. e-Prints UK)
Reuse/repurpose metadata
Where human creation required, provide support
–Education, guidelines
–Appropriate software tools

Developing services
Service developers use/implement metadata standards in pragmatic way
Standards creators concerned with
–Consensus, commonality, interoperability
–e.g. DCMES
Implementers concerned with
–Functionality, specificity, localisation
–e.g. Using simple DC to describe e-Prints
Application profile
–A metadata element set optimised for a particular application

Summary
Standards for metadata semantics
XML as syntax for metadata exchange, but requires consensus on structures
Harvesting model as alternative to distributed search
–OAI PMH
Service provision
–metadata quality
–rights issues
–application profiles
Next:
–A common framework for metadata?
–Towards the Semantic Web?

Section 3 : Sharing metadata: RDF and the Semantic Web

Sharing metadata: RDF & the Semantic Web
Is there a problem?
The vision of the Semantic Web
Introducing RDF
Some RDF applications

The problem with XML?
XML as a mechanism for expressing tree-structured data
Different communities make different design choices for the meaning of their trees
–All good (and valid against XML DTD/Schema)
Within resource description community, meaning(s) of structure(s) may be limited
But applications working across communities have to work with multiple XML trees
–potentially unlimited
–not scalable in an open Web environment?
–how to manage ever increasing set of conventions
–always encountering new structures/schemas

The Semantic Web
Activity of World Wide Web Consortium (W3C)
To make data available on the Web in a form which is easier for machines to process
–Machine-processable statements about all kinds of things (Web pages, organisations, people, concepts, products, etc) and the relationships/links between them
To share data between programs and systems designed independently
–Unlock the data held in databases
–Link data from different sources
–To enable richer, more flexible services

The Semantic Web
Builds on
–use of Uniform Resource Identifiers (URIs) to uniquely identify resources
–the Resource Description Framework (RDF) as a common model for expressing information about resources
–an XML syntax for representing RDF data
–existing Web protocols (HTTP) for transferring data

Introducing RDF

Introducing RDF
Resource Description Framework
–Model & Syntax, W3C Recommendation, 1999
–RDF Core WG activity,
Set of revised/expanded specifications currently (April 2003) in last call
–Semantics: formal model
–Concepts: abstract syntax (graph)
–RDF/XML syntax: conventions for encoding statements using XML
–Test Cases
–Vocabulary Description Language
–Primer: introduction

Introducing RDF (2)
Provides generic framework for representing information about resources
–set of conventions/infrastructure for applications exchanging metadata
–allows semantics to be defined by different resource description communities
–accommodates mixing of information from diverse sources
Resource: any object identified by URI
–not necessarily accessible via Web
Property: attribute to describe resource
–properties also uniquely identified by URI
Statement: triple of specific resource, property, and value

The RDF model
[Diagram labels: author, John]
A resource has some property whose value is either (i) a simple string value (literal)…
The resource identified by the URI has a property author whose value is John
Or, John is the author of the resource identified by

The RDF model (2)
… or (ii) another resource...
[Diagram labels: author, name]
The value of property author is another resource which has a property name with value John and a property with value

The RDF model (3)
… which may itself have a URI
[Diagram labels: author, John, name]

The RDF model (4)
Properties themselves are identified by URIs
[Diagram label: John]
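The four model slides can be sketched as a small data structure. This is only an illustration: the URIs of the slides were lost, so every `http://example.org/...` URI below is a hypothetical stand-in, and the `URI`, `Literal` and `Statement` types are names chosen here rather than those of any RDF toolkit.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class URI:
    """A resource or property identified by a URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A simple string value."""
    value: str


class Statement(NamedTuple):
    """A triple of resource, property and value."""
    subject: URI
    predicate: URI
    object: Union[URI, Literal]


# Hypothetical URIs, for illustration only.
DOC = URI("http://example.org/doc")
JOHN = URI("http://example.org/people/john")
AUTHOR = URI("http://example.org/terms/author")
NAME = URI("http://example.org/terms/name")

# (i) The value of the property is a literal.
simple = [Statement(DOC, AUTHOR, Literal("John"))]

# (ii) The value is another resource, here with its own URI and its own properties.
structured = [
    Statement(DOC, AUTHOR, JOHN),
    Statement(JOHN, NAME, Literal("John")),
]


def values(statements, subject, predicate):
    """Return the values of one property of one resource."""
    return [s.object for s in statements
            if s.subject == subject and s.predicate == predicate]


print(values(simple, DOC, AUTHOR))
print(values(structured, DOC, AUTHOR))
```

Keeping `URI` and `Literal` as distinct types means a string that happens to look like a URI is never confused with a resource.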

The power of the RDF model
Extensible model
–supports any vocabularies
Supports arbitrary complexity of description
URIs as unique fixed points to identify
–resources
–properties
Descriptions created independently can be merged using URIs as anchors
–i.e. supports distributed metadata

First source
[Diagram labels: author, John, name]

Second source
[Diagram labels: subject, XML]

Third source
[Diagram labels: organisation, JS Foundation]

Three descriptions merged
[Diagram labels: organisation, JS Foundation, author, John, name, subject, XML]
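Merging on URIs can be sketched as a set union of triples. The URIs are hypothetical stand-ins for those lost from the diagrams, and attaching `organisation` to the author (rather than to the document) is an assumption. For brevity, URIs and literals are both plain strings here.

```python
# Hypothetical URIs, for illustration only.
DOC = "http://example.org/doc"
JOHN = "http://example.org/people/john"
TERMS = "http://example.org/terms/"

# Three descriptions created independently of one another.
first = {(DOC, TERMS + "author", JOHN), (JOHN, TERMS + "name", "John")}
second = {(DOC, TERMS + "subject", "XML")}
third = {(JOHN, TERMS + "organisation", "JS Foundation")}  # assumed to describe the author

# Because the sources use the same URIs, merging is a union of the statements.
merged = first | second | third


def describe(graph, subject):
    """Return every (property, value) pair known about a resource."""
    return {(p, o) for s, p, o in graph if s == subject}


for prop, value in sorted(describe(merged, JOHN)):
    print(prop, value)
```

No source needed to know about the others: the shared URIs for the document and the person are what join the three descriptions.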

A simple DC metadata record (the hedgehog)
[Diagram: one resource with fifteen properties: dc:subject, dc:type, dc:title, dc:creator, dc:contributor, dc:coverage, dc:rights, dc:relation, dc:format, dc:identifier, dc:date, dc:description, dc:source, dc:language, dc:publisher]

The RDF XML syntax
XML representation of model
–to store/exchange descriptions
Use of XML Qualified Names and XML Namespaces to represent URIs in RDF/XML
Conventions for the meaning of structures in RDF/XML document
Service can know in advance the meaning of structures in RDF/XML document
–i.e. always represents RDF graphs
–even if unanticipated vocabularies used
–can read multiple descriptions into store and merge on URIs

A simple DC metadata record (RDF/XML)
[RDF/XML example with fifteen placeholder values: a b c d e f g h i j k l m n o]
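The markup of this example was lost, so the following sketch shows one plausible shape for such a record and how a service might turn it back into triples. The resource URI, the choice of three elements and the pairing of elements with the placeholder values are all assumptions. The `triples` helper handles only the simplest case (literal-valued property elements inside `rdf:Description`), not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Assumed record: hypothetical resource URI, placeholder values as on the slide.
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>a</dc:title>
    <dc:creator>b</dc:creator>
    <dc:subject>c</dc:subject>
  </rdf:Description>
</rdf:RDF>
"""


def triples(rdf_xml):
    """Read literal-valued statements from a very simple RDF/XML document."""
    root = ET.fromstring(rdf_xml.strip())
    result = []
    for description in root.findall(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        for prop in description:
            # ElementTree names tags as {namespace}local; the property URI
            # is the namespace name followed by the local name.
            namespace, local = prop.tag[1:].split("}", 1)
            result.append((subject, namespace + local, prop.text))
    return result


for statement in triples(RDF_XML):
    print(statement)
```

The qualified name `dc:title` thus stands for the property URI `http://purl.org/dc/elements/1.1/title`, which is what allows records from different sources to be merged.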

RDF Vocabulary Description Language (RDF Schema)
Provides mechanisms to describe
–terms used in RDF statements
–relationships between terms
–e.g. Dublin Core metadata element set described using RDF(S)
Defines type system
–resources grouped into classes
–classes may be related hierarchically (subClassOf)
–properties may be related hierarchically (subPropertyOf)
–use of properties may be constrained (domain, range)
More RDF statements
–i.e. metadata about metadata elements
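One practical use of subPropertyOf can be sketched as follows: if a local property is declared a sub-property of `dc:creator`, a service that only understands Dublin Core can still use the statement. The `photographer` property and the photo URI are hypothetical; the rule applied (a statement made with a sub-property also holds for its super-property) is the standard RDFS one.

```python
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical application-specific property and resource.
EX_PHOTOGRAPHER = "http://example.org/terms/photographer"
PHOTO = "http://example.org/photo/1"

graph = {
    (EX_PHOTOGRAPHER, SUB_PROPERTY_OF, DC_CREATOR),   # schema statement
    (PHOTO, EX_PHOTOGRAPHER, "Gottscho, Samuel H"),   # instance statement
}


def infer_super_properties(statements):
    """Add (s, q, o) whenever (s, p, o) and (p, subPropertyOf, q) are present."""
    inferred = set(statements)
    changed = True
    while changed:  # repeat so that chains of sub-properties are followed
        changed = False
        sub_of = {(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF}
        for s, p, o in list(inferred):
            for sub, sup in sub_of:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


closure = infer_super_properties(graph)
print((PHOTO, DC_CREATOR, "Gottscho, Samuel H") in closure)
```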

Description of Dublin Core Creator
[Diagram labels: rdfs:label Creator; rdfs:comment An entity …; dc:description Examples of a …; rdf:type]

Description of Dublin Core Creator (RDF/XML)
Creator
An entity primarily responsible for making the content of the resource.
Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.

Simplicity, contradiction, trust
In RDF, meaning is expressed by simple statements:
–Subject-Predicate-Object
Anyone on Web can assert (in RDF sense) anything about anything
–software agents navigating Web of statements
–may be able to process some of these statements but not all
–ignore the statements you don't understand
–tolerance of inconsistency and errors
Establishing trust as fundamental part of Semantic Web infrastructure
–Who said this (and when etc)

Metadata and the Semantic Web
Argued that the Semantic Web principles fit the nature of metadata
–Metadata supports many different functions
–Metadata is inherently "modular"
–Metadata creation is not a one-off act, but an ongoing, distributed process
–the metadata creator can't predict how users may want to use resources and query metadata
–new uses of resources result in new metadata
–Metadata is not (or at least not only) "objective", "authoritative" information
–Some attributes represent interpretations
–Some attributes are context-dependent
–Multiple (even conflicting) descriptions can co-exist

Some RDF applications

RDF Site Summary (RSS) 1.0
Simple RDF metadata vocabulary designed to support syndication of "news" items
An RSS "channel" is published as an RDF/XML document
Provides metadata about
–The channel itself
–A summary of its scope and purpose
–A sequence of items
–Summary descriptions of Web documents
Content of channel regularly updated by provider
Wide, simple, automated distribution
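A minimal sketch of an RSS 1.0 channel and of how an aggregator might read its items. The channel, its URLs and its single item are invented for illustration; `read_items` is a hypothetical helper that extracts only the title and link of each item.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical channel, for illustration only.
CHANNEL = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{RSS}">
  <channel rdf:about="http://example.org/news.rss">
    <title>Example news</title>
    <link>http://example.org/</link>
    <description>Hypothetical channel for illustration.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/item1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/item1">
    <title>First item</title>
    <link>http://example.org/item1</link>
  </item>
</rdf:RDF>
"""


def read_items(rss_xml):
    """Return (title, link) for each item in an RSS 1.0 document."""
    root = ET.fromstring(rss_xml.strip())
    ns = {"rss": RSS}
    return [
        (item.findtext("rss:title", namespaces=ns),
         item.findtext("rss:link", namespaces=ns))
        for item in root.findall("rss:item", ns)
    ]


for title, link in read_items(CHANNEL):
    print(title, link)
```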

RDF Site Summary (RSS) 1.0
Typical applications
–Web sites: render content of specific channels as part of their own Web sites
–Online aggregator services: harvest numerous channels and provide search/filtering services across the items
–e.g. Meerkat
–Desktop news readers: allow users to "subscribe" to list of channels, regularly download content for user to browse
–e.g. Amphetadesk
RSS also generated from some Weblog management systems
–SWAD(E) activity on "semantic weblogging"

Metadata schema registries
How to encourage convergence and reuse of metadata vocabularies
Implementers
–may be unaware of existing vocabularies
–adapt/customise "standard" terms for application-specific use
–may combine terms from multiple "standard" sources
–coin application-specific terms or extensions
Application profile
–A metadata element set optimised for a particular application

Metadata schema registries
A publication context for
–"standard" metadata vocabularies and their terms
–(depending on scope of registry) also implementer usages/adaptations of those vocabularies and their terms
–To provide a "dictionary" function
–To highlight relationships, encourage reuse/convergence
Based on indexing RDF data distributed on Web?
Requires shared conventions for describing
–metadata vocabularies
–and their usages and adaptations

Summary
RDF provides a common framework for making machine-processable statements about resources
The Semantic Web provides a vision of metadata as
–modular, extensible
–distributed, devolved
–dynamic, evolving
Seeks to address (some of) the challenges of cross-domain, cross-community interoperability
Fundamental role of trust on the Semantic Web

Overall summary
Global networks have created a new context for the delivery of services
Metadata fundamental to service provision
Services being built (successfully!)
–OAI PMH as a low-barrier technology
No one-size-fits-all solution
Debates, tensions, balances…
–automated processes v human labour
–domain-specific richness v cross-domain (over-?) simplicity
–standards v their implementation
–objectivity v subjectivity
–centralisation v distribution
Emergence of a Semantic Web?

Acknowledgements
Parts of the content of this presentation are adapted from earlier presentations by: Tom Baker (Fraunhofer-Gesellschaft, Berlin), Michael Day, Rachel Heery, Paul Miller, and Andy Powell (UKOLN)

Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.