An introduction to metadata for libraries, museums and archives Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003 Pete Johnston.


1 An introduction to metadata for libraries, museums and archives Metadata in Digital Libraries, DELOS meeting, Riga, Latvia, 16 April 2003 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported by: p.johnston@ukoln.ac.uk http://www.ukoln.ac.uk/

2 Section 1 : An Introduction to Metadata

3 An introduction to Metadata Memory institutions, network services and metadata What is metadata? Exposing/sharing metadata Exposing/sharing metadata : semantics –the Dublin Core Metadata Initiative

4 Memory institutions, network services and metadata

5 Memory institutions Museums, libraries and archives, often called memory institutions, are trusted organizations that collectively document the entire range of human experience and expression. Memory institutions are engaged in the important work of: Capturing, authenticating, and making sense of cultural memory; Preserving the human record for future generations; and Sharing knowledge to support education and learning. http://www.ukoln.ac.uk/interop-focus/ccs/positions/

6 Delivering services Memory institutions provide services to users –(At least some of) these services provide access to resources Emergence of built on global networks –remote access to digital resources for all (potentially…) –resources available round the clock –resources comparable to other digital resources from elsewhere Investment in –digitisation of cultural content –network services providing access to digitised content

7 Delivering services Potential for new types of service –digital libraries, virtual museums etc –integrated access to resources from multiple remote content providers –services defined by theme/subject/activity/audience etc, not by location/source –packaging and re-purposing of content –user-oriented rather than provider-oriented Changing user expectations –user wants information relevant to task/activity –may see structural/organisational boundaries of content providers as unimportant! –user wants access from any location –user wants access at any time

8 Delivering services Move from web sites to portals –"A network service that provides a personalised, single point of access to a range of heterogeneous network services, local and remote, structured and unstructured" –Andy Powell, 2002 Content providers exposing content for delivery through multiple services, channels Presentation services surfacing content from multiple (distributed) sources Memory institutions may perform both roles Move away from silo mentality towards more joined-up approaches

9 Resource discovery on the Web Broadly two approaches to providing discovery services –software indexing of resource content –human description of resources Web search engines –software agents (robots) retrieve documents by following hyperlinks (crawling) –index text of documents –make index available as searchable database –some clever ranking algorithms –e.g. Google infers Page Ranking based on links to document –find pages which link to page X –find pages similar to X

10 Resource discovery on the Web Web search engines –tend to generate many results –and may suffer from spamming –ranking algorithms may help –don't support structured search –search on author name –search on document type (journal article) –limited to textual resources –generally, poor support for search for multimedia objects The hidden Web –robots may not crawl documents dynamically generated from databases/CMS

11 Resource discovery on the Web But automated indexing –is low cost –At least compared to human resource description –(usually) scales to large numbers of resources –can be a useful tool! Challenge of finding appropriate balance of approaches for context

12 Metadata for services Metadata has been important to traditional service provision… … and is an essential component of effective network services

13 What is metadata?

14 What is metadata? Simple definitions… "Structured data about data." –Dublin Core Metadata Initiative FAQ, 2003 "Machine-understandable information about Web resources or other things." –Tim Berners-Lee, W3C, 1997

15 Towards a functional view of metadata "Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person." –Lorcan Dempsey & Rachel Heery, 1998 "Structured data about resources that can be used to help support a wide range of operations" –Michael Day, 2001

16 What resources, objects, things? HTML documents, digital images, databases, books, museum objects, archival records, metadata records, Web sites, collections, services, physical places, people, institutions, abstract works, concepts, events. Metadata might exist for almost anything –digital, physical, abstract resources

17 What resources, objects, things? Metadata records include –bibliographic records in library catalogues or from abstracting & indexing services –descriptions of archival material in archival finding aids –object records in museum documentation / collection management systems –entries in directories of organisations, individuals and services –descriptions of digital objects (documents, images, software) –descriptions of collections of digital objects –descriptions of network services –descriptions of metadata records

18 What operations? Operations by human users, software tools Metadata might be used to support many different functions –resource disclosure & discovery –resource management, including preservation –intellectual property rights management –commerce –authentication and authorisation –personalisation and localisation of services Different functions require different types/classes of metadata –No "one size fits all" solution –Need to specify functional requirements

19 Metadata elements & element sets Metadata describes attributes or properties of a resource Each attribute or property is described by a metadata element –Can be identified, formally documented/defined –May be represented in different forms A metadata element set –coherent bounded set of elements formulated as basis for metadata creation –created for purpose, as a unit Schema –structured representation of an element set

20 Metadata for resource discovery User wishes to 1. discover resources according to some criteria 2. (optionally) identify a specific resource –confirm that resource described is resource sought –distinguish similar resources 3. select –evaluate, choose resource appropriate to needs 4. locate resource 5. obtain/access resource 6. use resource –open, read, display, run, play, copy, unpackage/repackage –interpret content Resource discovery metadata supporting (primarily) operations 1-4

21 Metadata for resource discovery Continuum of complexity/functionality: (1) full-text indexes –might not be classed as metadata by some! –generated by software tools –support discovery (by content), location (2) semantically simple forms (e.g. Dublin Core) –typically covering description of broad range of resources –may be partly generated automatically, partly human authored –support discovery, identification, selection, location (3) richer complex forms (e.g. MARC, EAD, CIMI-SPECTRUM, AMICO etc) –typically covering specific types of resources –often associated with particular community/domain –creation may involve relatively high degree of human expertise –support discovery, identification, selection, location, access, use (which may be type specific)

22 Association of resource and metadata (1) Metadata embedded in resource –e.g. meta elements in HTML docs; summary properties in word processor docs –Can resource support embedding of metadata? –Does metadata creator have write access to resource? –Can service extract embedded metadata? –Metadata about aggregates of resources? –Metadata about people, places, concepts? [Diagram: Resource 1 with embedded metadata: Creator = J Smith, Date = 2001-11-05, Title = Report]

23 Association of resource and metadata (2) Metadata record as separate object; record identifier embedded in resource –e.g. link elements in HTML docs –Metadata record may be remote from resource –Can resource support embedding of link? –Does metadata creator have write access to resource? –Can service follow link to metadata record? –What happens when resource deleted? –Metadata about aggregates of resources? –Metadata about people, places, concepts? [Diagram: Resource 1 carries Metadata rec = 1, pointing to Metadata rec 1: Creator = J Smith, Date = 2001-11-05, Title = Report]

24 Association of resource and metadata (3) Metadata record as separate object; resource identifier in metadata record –Metadata record may be remote from resource –Does not require embedding of metadata or link –Does not require metadata creator to have write access to resource –Metadata record created independently of resource – possibly multiple records –Service uses metadata records independently of resource –Metadata record may persist after resource deleted –Metadata record can describe anything (with identifier…) [Diagram: Metadata rec 1 (Doc = 1, Creator = J Smith, Date = 2001-11-05, Title = Report) pointing to Resource 1]

25 Metadata as managed resource –Metadata record is used separately from resource described –Recognition that metadata is resource to be managed, separately from resource described –Metadata content stored in database, exposed in form(s) appropriate for service(s) [Table row: Doc = 1, Creator = J Smith, Date = 2001-11-05, Title = Report]

26 Exposing/sharing metadata

27 How is metadata exposed/shared? Resource description communities –characterised by consensus on conventions for internal exchange of metadata Metadata for resource discovery –is used beyond its creator community –is combined/compared with metadata from other communities –is aggregated or cross-searched by services How does a content provider make metadata records available in a commonly understood form? How does a service provider obtain these metadata records from data providers?

28 How is metadata exposed/shared? Effective sharing of information expressed in metadata record requires agreement on –metadata semantics –what metadata elements mean –metadata structure –data model, relationships of component parts –metadata syntax –rules of expression –protocols –how metadata records transmitted between content provider and service provider Agreements formalised as specifications and standards (ideally…)

29 Exposing/sharing metadata : semantics Introducing the Dublin Core

30 Introducing the Dublin Core Initiative to improve resource discovery on Web –not for complex resource description –based on description of simple document-like objects –extended to other classes of resource International, cross-disciplinary consensus on simple element set –15 elements –all optional –all repeatable http://dublincore.org/

31 Introducing the Dublin Core (2) Title, Subject, Description, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
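The element set above, together with the "all optional, all repeatable" rule from the previous slide, can be sketched in a few lines of Python. This is only an illustration: the dictionary-of-lists representation, the `check_record` helper and the example record are assumptions made for this sketch, not part of the Dublin Core specification.

```python
# The 15 elements of the simple Dublin Core element set, as listed on the slide.
DC_ELEMENTS = (
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def check_record(record):
    """Return the names used in `record` that are not simple DC elements.

    `record` maps an element name to a list of values. A missing key models
    an optional element; a list with several items models a repeated one.
    """
    return sorted(name for name in record if name not in DC_ELEMENTS)


# Hypothetical record: no element is required, and "subject" is repeated.
example = {
    "title": ["An introduction to metadata"],
    "creator": ["Pete Johnston"],
    "subject": ["Metadata", "Resource discovery"],
}

print(check_record(example))               # names outside the element set
print(check_record({"colour": ["blue"]}))  # "colour" is not a DC element
```

Holding every value in a list keeps repeated elements from needing a special case.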

32 Dublin Core: creator Term Name: creator Label: Creator Definition: An entity primarily responsible for making the content of the resource. Comment: Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity. Type of Term: element Status: recommended Date issued: 1999-07-02 URI: http://purl.org/dc/elements/1.1/creator

33 Dublin Core: date Term Name: date Label: Date Definition: A date associated with an event in the life cycle of the resource. Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format. Type of Term: element Status: recommended Date issued: 1999-07-02 URI: http://purl.org/dc/elements/1.1/date

34 Standardisation of Dublin Core CEN Workshop Agreement (EU) 2000: Dublin Core elements endorsed as CWA13874 Usage guidelines for European industry NISO Z39.85 (USA) 2001: National Information Standards Organization, an ANSI affiliate ISO 2002: Dublin Core Metadata Element Set approved as ISO 15836

35 Using the Dublin Core Tom Baker, "A Grammar of Dublin Core", D-Lib Magazine, October 2000 Metaphor of metadata as language DC as "a simple pidgin language for use by tourists on the Internet commons" Small vocabulary, simple grammar/structure –This Resource has Title "An introduction to metadata" –This Resource has Subject "Resource discovery" Not subtly expressive, but easy to learn and deploy - good enough to work

36 Using the Dublin Core Designed for simplicity of semantics, ease of use Provides basic semantic interoperability –semantics sufficiently general to be useful across domains Can provide 15 windows into richer resource descriptions –disclose rich description in simple form –semantic cross-walks, mappings

37 Using the Dublin Core [Diagram: a rich description mapped to a simple DC description with title, creator, date, desc, rights]

38 Qualifying Dublin Core Allows for controlled extensibility through qualifiers –Element refinements –make element meanings narrower, more specific: –a Date Created versus Date Modified –an IsReplacedBy versus Replaces Relation –Encoding schemes –provide contextual information or parsing rules that aid in the interpretation of a value –may specify that a value is drawn from a controlled vocabulary (e.g. LCSH, TGN etc) –may specify that a value is formatted in accordance with a specified notation (e.g. date formats)

39 Qualifying Dublin Core Qualifiers make elements more specific –Element Refinements narrow meanings, never extend –Encoding Schemes give context to element values The dumb-down rule –Application should be able to use the value as if it were unqualified –Ignore unknown Encoding Schemes –Resolve (semantically more specific) Element Refinements to (more generic) Elements Some loss of specificity, but still generally correct and useful for discovery
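A minimal Python sketch of the dumb-down rule described above follows. The `(term, value, scheme)` tuple layout, the `dumb_down` function and the sample values are assumptions made for illustration, and the refinement table lists only a handful of refinements, including those named on these slides.

```python
# Element refinements mapped to the (more generic) elements they refine.
# Deliberately partial: just enough entries to illustrate the rule.
REFINES = {
    "created": "date",
    "modified": "date",
    "valid": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
}


def dumb_down(statements):
    """Reduce qualified statements to simple (element, value) pairs.

    Each statement is a (term, value, scheme) tuple, where `scheme` names an
    encoding scheme such as "W3CDTF" or "LCSH", or is None.
    """
    simple = []
    for term, value, _scheme in statements:
        # Resolve a refinement to its element; plain elements pass through.
        element = REFINES.get(term, term)
        # The encoding scheme is ignored: the value is kept as plain text.
        simple.append((element, value))
    return simple


# Hypothetical qualified description.
qualified = [
    ("created", "2001-11-05", "W3CDTF"),
    ("modified", "2003-04-16", "W3CDTF"),
    ("subject", "Metadata", "LCSH"),
    ("title", "Report", None),
]

for element, value in dumb_down(qualified):
    print(element, "=", value)
```

Both dates end up under the generic `date` element. That is the loss of specificity the slide mentions, while each statement remains true of the resource.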

40 Dublin Core: valid Term Name: valid Label: Valid Definition: Date (often a range) of validity of a resource. Type of Term: element-refinement Status: recommended Date issued: 2000-07-11 URI: http://purl.org/dc/terms/valid

41 Using the Dublin Core Not a replacement for richer descriptive standards But useful –If you wish to disclose community-specific metadata to other communities using commonly understood semantics –If you wish to provide integrated access to your own metadata databases with different underlying semantics –If you only need simple metadata semantics

42 Using the Dublin Core Inherent tensions in DC –Broad, fuzzy search buckets or rigidly prescribed usage? –Generic applicability across domains or intra-domain precision? –One-size-fits-all or customise-as-you-please? –Simply discovering resources (a few typical search attributes) or describing them fully (lots of detail)? –Dublin Core primarily as a native record format or extracted from richer metadata? –Broad-brush minimalism or comprehensive structuralism?

43 Summary Emergence of global networks enables new approaches to providing access to resources –Increasing requirement to provide resource discovery across boundaries Metadata supports many functions, including resource discovery DC as simple, cross-disciplinary metadata element set Next: –How metadata records are represented: syntax/structure –How metadata records are exposed/shared/used in resource discovery services

44 Section 2 : Sharing metadata: XML and the OAI Protocol for Metadata Harvesting

45 Sharing metadata : XML and OAI Exposing/sharing metadata: syntax and structure –Extensible Markup Language (XML) –XML Schema Metadata harvesting –The Open Archives Initiative Protocol for Metadata Harvesting Some OAI-based services Developing metadata-based services

46 Exposing/sharing metadata : syntax and structure XML & XML Schema

47 Embedding DC metadata in (X)HTML Dublin Core metadata can be embedded into (X)HTML documents –Simple to deploy but may be difficult to manage, maintain But almost none of the Web search engine services index it Lack of trust in open Web context –Abuse by content providers seeking to improve the ranking of their documents However, may be useful technique in closed context –e.g. single Web site or where control over which documents indexed

48 Embedding DC metadata in (X)HTML
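The markup example from this slide did not survive in the transcript. As a stand-in, the Python sketch below generates head markup following the DCMI convention for HTML: a `link` element declaring the `DC` prefix, plus `meta` elements named `DC.<element>`. The `dc_meta_tags` helper and the sample record are assumptions made for illustration, not the markup originally shown.

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record):
    """Build (X)HTML head markup for a simple DC record.

    `record` is a list of (element, value) pairs, so elements may repeat.
    """
    # The link element associates the "DC." prefix with the DC namespace.
    lines = ['<link rel="schema.DC" href="%s" />' % DC_NAMESPACE]
    for element, value in record:
        lines.append(
            '<meta name="DC.%s" content="%s" />'
            % (element, escape(value, quote=True))
        )
    return "\n".join(lines)


# Hypothetical record, reusing the values from the earlier diagrams.
print(dc_meta_tags([
    ("title", "Report"),
    ("creator", "J Smith"),
    ("date", "2001-11-05"),
]))
```

Values are escaped before they are written into the attribute, so a title containing `&` or a quotation mark does not break the page.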

49 Introducing XML Extensible Markup Language –Recommendation of W3C, 1998, 2000 Defines means of describing tree-structured data in text-based format –embedded markup delimits and describes data Simple, platform-independent syntax Standard programming interfaces –reusable software components Support from major software vendors Widely adopted for transferring data between programs, systems

50 [Figure: one metadata record (Doc 1, Creator J Smith, Date 2001-11-05, Title Report) shown three ways: as a row in a table with columns Doc, Creator, Date, Title; as XML markup; and as a tree with root table, child record, and leaves doc, creator, date, title]

51 [Figure: table rows (Doc, Creator, Date, Title) pass through Serialisation, Transmission and De-serialisation to reach a remote application]
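The serialise, transmit, de-serialise round trip pictured on the last two slides can be sketched with Python's standard `xml.etree.ElementTree` module. The element names (`table`, `record`, `doc`, `creator`, `date`, `title`) are taken from the tree labels on the slide; the exact markup originally shown is not preserved, so treat them and both function names as assumptions.

```python
import xml.etree.ElementTree as ET


def record_to_xml(doc_id, creator, date, title):
    """Serialise one table row as a <record> inside a <table> element."""
    table = ET.Element("table")
    record = ET.SubElement(table, "record")
    fields = (("doc", doc_id), ("creator", creator),
              ("date", date), ("title", title))
    for name, value in fields:
        ET.SubElement(record, name).text = value
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(table, encoding="unicode")


def xml_to_rows(xml_text):
    """De-serialise: turn each <record> back into a dict of field values."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in rec}
            for rec in root.iter("record")]


xml_text = record_to_xml("1", "J Smith", "2001-11-05", "Report")
print(xml_text)               # the text that would be transmitted
print(xml_to_rows(xml_text))  # what the remote application reconstructs
```

The receiving side needs only an XML parser and knowledge of the element names; it does not depend on how the sender stores its table.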

52 XML and interoperability Meta-language –language for describing markup languages –can define unlimited number of markup languages But…. –XML says nothing about what your names mean –will a software agent process my XML element correctly? Interoperability requires consensus on –the names of components (XML elements and attributes) –the structural model of a class of document: –the semantics represented by the components and the structure Shared use of common XML schemas

53 XML schemas Means to codify syntax/structure rules for class of XML document –what markup is allowed –structural constraints on use of markup Document Type Definition (DTD) –part of XML Recommendation W3C XML Schema –W3C recommendation –data-typing i.e. tighter control on element content –support for XML Namespaces –uses XML syntax Software can validate instance against DTD/schema

54 Metadata harvesting: The Open Archives Initiative Protocol for Metadata Harvesting

55 Searching & harvesting Resource discovery services operating across the resources of multiple distributed content providers Possible strategies –Distributed search –submit parallel queries to multiple metadata databases –collate multiple result sets for presentation to user –Harvest –gather metadata records from multiple providers into single database –(periodic re-gathering to refresh data) –query central database Performance issues in cross-searching

56 Introducing OAI Open Archives Initiative –develops/promotes interoperability standards to facilitate dissemination of content –roots in e-prints community seeking to improve access to scholarly publications –Deposit pre-prints – for quicker dissemination –Deposit post-prints – to reduce institutional costs, maximise impact –e-print archives –institutional –federated subject/discipline-based –required simple low-cost interface to expose metadata for reuse http://www.openarchives.org/

57 Introducing OAI (2) Terminology –"Archive" = repository, not archive –"Open" in terms of architecture, not free/unlimited access to repository Protocol for Metadata Harvesting (OAI-PMH) –Developed by international technical committee, 1999-2002 –Shift from optimising discovery of e-prints to more generic resource discovery –OAI committed to version 2.0 as a production release

58 Introducing OAI PMH Lightweight, low-cost protocol which allows data providers to expose metadata records for retrieval by service providers Service providers can say "give me all/some of your metadata records" Built on HTTP, XML –Six verbs: requests from service provider to data provider sent using HTTP GET/POST –responses from data provider to service provider as XML documents Not a distributed search protocol Not limited to e-print archives

59 Introducing OAI PMH (2) Supports transfer of metadata records –resources made available separately –identifier/locator of resources typically included in metadata record Data provider must provide simple/unqualified DC metadata record –may provide metadata records in other formats –metadata formats must be associated with a W3C XML Schema Extensible framework for metadata about –repository, sets, records Metadata and resources often freely available –but not a requirement

60 Introducing OAI PMH (3) Supports selective harvesting –by sets –by datestamps Example –Service Provider: List all records added since Jan 1 2002 in simple DC format (oai_dc) –verb = ListRecords –from = 2002-01-01 –metadataPrefix = oai_dc –http://www.myarchive.org/cgi-bin/oai?verb=ListRecords&from=2002-01-01&metadataPrefix=oai_dc –Data Provider: Returns XML document containing records
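The example request above can be assembled programmatically. The sketch below builds the GET URL with Python's standard `urllib.parse.urlencode`; the `oai_request_url` helper is an assumption made for illustration, and the base URL is the placeholder used on the slide. The set of names is the six verbs defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build an HTTP GET URL for an OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError("not an OAI-PMH verb: %r" % verb)
    query = [("verb", verb)] + list(arguments.items())
    return base_url + "?" + urlencode(query)


# "from" is a Python keyword, so the arguments are unpacked from a dict.
url = oai_request_url(
    "http://www.myarchive.org/cgi-bin/oai",
    "ListRecords",
    **{"from": "2002-01-01", "metadataPrefix": "oai_dc"},
)
print(url)
```

A harvester would then fetch this URL and parse the XML response; that step is left out here because it needs a live repository.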

61 [Diagram labels: two Web sites, each with Resources and Metadata; three Portal Web sites; DC metadata exchanged via OAI-PMH]

62 OAI DC metadata record (from Library of Congress Repository 1): Empire State Building. [View from], to Central Park | Gottscho, Samuel H. 1875-1971, photographer. | 1932 Jan. 19 | image | two-dimensional nonprojectible graphic | Cityscape photographs. | Acetate negatives. | http://hdl.loc.gov/loc.pnp/gsc.5a18067 | United States--New York (State)--New York. | No known restrictions on publication.
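The element tags of this record were lost in the transcript. The Python sketch below shows how such a record might look in the `oai_dc` format and how a service provider could read values from it. The two namespace URIs are the ones used by OAI-PMH and Dublin Core, but the assignment of each value to a particular element is a guess, and the `dc_values` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed markup: which value belongs to which DC element is a guess,
# because the slide's tags were not preserved.
RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Empire State Building. [View from], to Central Park</dc:title>
  <dc:creator>Gottscho, Samuel H. 1875-1971, photographer.</dc:creator>
  <dc:date>1932 Jan. 19</dc:date>
  <dc:type>image</dc:type>
  <dc:identifier>http://hdl.loc.gov/loc.pnp/gsc.5a18067</dc:identifier>
  <dc:rights>No known restrictions on publication.</dc:rights>
</oai_dc:dc>
"""


def dc_values(xml_text, element):
    """Return every value of one DC element (elements may repeat or be absent)."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("dc:" + element, NS)]


print(dc_values(RECORD, "title"))
print(dc_values(RECORD, "identifier"))
print(dc_values(RECORD, "subject"))  # absent element: empty list
```

Returning a list in every case reflects the "all optional, all repeatable" design of simple DC.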

63 Some OAI based services

64 Resource Discovery Network (RDN) Co-operative network of subject gateways –Funded by JISC for HE and FE Seven hubs –ALTIS: Hospitality, Leisure, Sport and Tourism –BIOME: Health and Life Sciences –EEVL: Engineering, Mathematics and Computing –GESource: Geography and Environment –Humbul: Humanities –PSIgate: Physical Sciences –SOSIG: Social Sciences, Business and Law Databases of metadata records describing Internet resources selected for high quality http://www.rdn.ac.uk/

65 Resource Discovery Network (RDN) Hubs as subject communities –metadata creators are subject specialists –good links with users –separate metadata schemas Hubs provide their own Web interfaces –search databases –other services: tutorials, guides, alerting etc But operate within a shared policy framework –collection development –cataloguing guidelines –technical standards –agreements on IPR

66 Resource Discovery Network (RDN) RDN Resource Finder –Cross-search of Hubs' metadata records –Initially distributed search using Z39.50 –Performance issues –Difficult to build flexible browse interface –Now using OAI PMH to harvest records –Currently harvesting simple DC –Basic keyword searching –Exploring harvesting some richer record formats for additional functionality Also some sharing of metadata –between Hubs (DC plus extensions) –between Hubs and other similar services (LOM) –but Hubs' metadata not freely available for harvest

67 Resource Discovery Network http://www.rdn.ac.uk/

68 e-Prints UK JISC-funded project, 2002-2004 Provide access to e-prints via subject-based RDN services Harvest metadata from e-print archives –institutional, non-institutional, personal Automatically enhance harvested metadata (using Web Services) –Add (or validate) authoritative forms of author names (OCLC) –Assign subject classification (based on analysis of full-text of resource) (OCLC) –Generate OpenURLs from citations (based on analysis of full-text of resource) (Univ of Southampton/UKOLN) http://www.rdn.ac.uk/projects/eprints-uk/

69 e-Prints UK Provide search services –across all metadata –subject-partitioned search services for Hubs Enhanced metadata records made available to originating e-print archive Note –service provider enhancing harvested metadata to provide more functionality –some of enhancement process requires access to resource as well as metadata record –two-way flow of metadata records –recommendations for how to use simple DC to describe e-prints to maximise benefits of metadata disclosure

70 e-Prints UK [Architecture diagram labels: institutional, non-institutional and personal e-print archives; OAI-PMH; subject classification service and name authority service (Web services offered by OCLC); citation analysis service (Web service offered by Southampton); SOAP; Javascript/HTTP; Z39.50; RDN gateway/portal services providing end-user services through the RDN]

71 Developing metadata-based services

72 Developing services Consensus on metadata semantics/syntax, transport protocols etc as minimal requirements Resource selection –collections policies Metadata quality assurance –cataloguing rules –mandatory elements, minimum-level records –guidance on content of values of elements: formats, controlled vocabularies, identifiers etc –Maintenance, currency of metadata Agreements on IPR, usage rights, branding –for metadata records as well as resources

73 Developing services DCMES intended to be simple enough for creation by untrained creators –assumption that metadata creation straightforward? Recognition that precision in services depends on quality of metadata Subject terms/classification difficult for non-expert Different services providing different functionality to different audiences may require different metadata

74 Developing services Human creation of metadata is not cheap! Where possible, use automated methods to –Generate metadata –Normalise/enhance metadata Service providers as well as data providers can contribute (e.g. e-prints UK) Reuse/repurpose metadata Where human creation required, provide support –Education, guidelines –Appropriate software tools

75 Developing services Service developers use/implement metadata standards in pragmatic way Standards creators concerned with –Consensus, commonality, interoperability –e.g. DCMES Implementers concerned with –Functionality, specificity, localisation –e.g. Using simple DC to describe e-Prints Application profile –A metadata element set optimised for a particular application

76 Summary Standards for metadata semantics XML as syntax for metadata exchange, but requires consensus on structures Harvesting model as alternative to distributed search –OAI PMH Service provision –metadata quality –rights issues –application profiles Next: –A common framework for metadata? –Towards the Semantic Web?

77 Section 3 : Sharing metadata: RDF and the Semantic Web

78 Sharing metadata: RDF & the Semantic Web Is there a problem? The vision of the Semantic Web Introducing RDF Some RDF applications

79 The problem with XML? XML as a mechanism for expressing tree-structured data Different communities make different design choices for the meaning of their trees –All good (and valid against XML DTD/Schema) Within resource description community, meaning(s) of structure(s) may be limited But applications working across communities have to work with multiple XML trees –potentially unlimited –not scalable in an open Web environment? –how to manage ever increasing set of conventions –always encountering new structures/schemas

80 The Semantic Web Activity of World Wide Web Consortium (W3C) To make data available on the Web in a form which is easier for machines to process –Machine-processable statements about all kinds of things (Web pages, organisations, people, concepts, products, etc) and the relationships/links between them To share data between programs and systems designed independently –Unlock the data held in databases –Link data from different sources –To enable richer more flexible services http://www.w3.org/2001/sw/

81 The Semantic Web Builds on –use of Uniform Resource Identifiers (URIs) to uniquely identify resources –the Resource Description Framework (RDF) as a common model for expressing information about resources –an XML syntax for representing RDF data –existing Web protocols (HTTP) for transferring data

82 Introducing RDF

83 Introducing RDF Resource Description Framework –Model & Syntax, W3C Recommendation, 1999 –RDF Core WG activity, 2001-2003 Set of revised/expanded specifications currently (April 2003) in last call –Semantics: formal model –Concepts: abstract syntax (graph) –RDF/XML syntax: conventions for encoding statements using XML –Test Cases –Vocabulary Description Language –Primer: introduction http://www.w3.org/RDF/

84 Introducing RDF (2) Provides generic framework for representing information about resources –set of conventions/infrastructure for applications exchanging metadata –allows semantics to be defined by different resource description communities –accommodates mixing of information from diverse sources Resource : any object identified by URI –not necessarily accessible via Web Property : attribute to describe resource –properties also uniquely identified by URI Statement : triple of specific resource, property, and value

85 The RDF model [Diagram: http://example.org/doc/1 --author--> "John"] A resource has some property whose value is either (i) a simple string value (literal)… The resource identified by the URI http://example.org/doc/1 has a property author whose value is John. Or: John is the author of the resource identified by http://example.org/doc/1

86 The RDF model (2) … or (ii) another resource... [Diagram: http://example.org/doc/1 --author--> (unnamed resource); (unnamed resource) --name--> "John"; (unnamed resource) --email--> "john@example.org"] The value of property author is another resource which has a property name with value John and a property email with value john@example.org

87 The RDF model (3) … which may itself have a URI [Diagram: http://example.org/doc/1 --author--> http://example.org/person/john; http://example.org/person/john --name--> "John"; http://example.org/person/john --email--> "john@example.org"]

88 The RDF model (4) Properties themselves are identified by URIs [Diagram: http://example.org/doc/1 --http://example.org/author--> http://example.org/person/john; http://example.org/person/john --http://example.org/name--> "John"; http://example.org/person/john --http://example.org/email--> "john@example.org"]
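The four model slides above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an RDF library: the graph is assumed to be a plain set of (resource, property, value) tuples, URIs are ordinary strings, and the Literal wrapper and the values() helper are hypothetical names introduced only for this sketch.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A simple string value, kept distinct from a URI (hypothetical helper)."""
    value: str


EX = "http://example.org/"

# Each statement is a triple: (resource URI, property URI, value).
# The value is either another resource (a URI string) or a Literal.
graph = {
    (EX + "doc/1", EX + "author", EX + "person/john"),
    (EX + "person/john", EX + "name", Literal("John")),
    (EX + "person/john", EX + "email", Literal("john@example.org")),
}


def values(graph, subject, prop):
    """Return the set of values that `prop` takes for `subject`."""
    return {o for (s, p, o) in graph if s == subject and p == prop}


# Follow the author arc from the document, then read the author's name.
author = next(iter(values(graph, EX + "doc/1", EX + "author")))
author_names = values(graph, author, EX + "name")
```

Because the author is itself a resource with a URI, the same values() call that found the author can be reused to read the author's own properties.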

89 The power of the RDF model Extensible model –supports any vocabularies Supports arbitrary complexity of description URIs as unique fixed points to identify –resources –properties Descriptions created independently can be merged using URIs as anchors –i.e. supports distributed metadata

90 First source [Diagram: http://example.org/doc/1 --author--> http://example.org/person/john; http://example.org/person/john --name--> "John"; http://example.org/person/john --email--> "john@example.org"]

91 Second source [Diagram: http://example.org/doc/1 --subject--> "XML"]

92 Third source [Diagram: http://example.org/person/john --organisation--> "JS Foundation"]

93 Three descriptions merged [Diagram: http://example.org/doc/1 --author--> http://example.org/person/john; http://example.org/doc/1 --subject--> "XML"; http://example.org/person/john --name--> "John"; http://example.org/person/john --email--> "john@example.org"; http://example.org/person/john --organisation--> "JS Foundation"]
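The merge of the three sources can be sketched as a set union, assuming (as an illustration only) that each description is a set of (resource, property, value) tuples. For brevity, literals are plain strings here and the short property names from the slides are expanded into hypothetical http://example.org/ URIs; merge() and describe() are names invented for this sketch.

```python
EX = "http://example.org/"
DOC = EX + "doc/1"
JOHN = EX + "person/john"

# Three descriptions created independently of one another.
first = {
    (DOC, EX + "author", JOHN),
    (JOHN, EX + "name", "John"),
    (JOHN, EX + "email", "john@example.org"),
}
second = {(DOC, EX + "subject", "XML")}
third = {(JOHN, EX + "organisation", "JS Foundation")}


def merge(*sources):
    """Union the triples; statements about the same URI join up automatically."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def describe(graph, subject):
    """Return the (property, value) pairs asserted about one resource."""
    return sorted((p, o) for (s, p, o) in graph if s == subject)


merged = merge(first, second, third)

# A question that no single source can answer:
# which organisation does the author of doc/1 belong to?
authors = [o for (p, o) in describe(merged, DOC) if p == EX + "author"]
organisations = [
    o
    for a in authors
    for (p, o) in describe(merged, a)
    if p == EX + "organisation"
]
```

No mapping step is needed because both the first and third sources use the same URI for John; that shared URI is the anchor the previous slide refers to.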

94 A simple DC metadata record (the hedgehog) [Diagram: http://example.org/doc/1 at the centre, with one arc for each of fifteen properties: dc:subject, dc:type, dc:title, dc:creator, dc:contributor, dc:coverage, dc:rights, dc:relation, dc:format, dc:identifier, dc:date, dc:description, dc:source, dc:language, dc:publisher]

95 The RDF XML syntax XML representation of model –to store/exchange descriptions Use of XML Qualified Names and XML Namespaces to represent URIs in RDF/XML Conventions for the meaning of structures in RDF/XML document Service can know in advance the meaning of structures in RDF/XML document –i.e. always represents RDF graphs –even if unanticipated vocabularies used –can read multiple descriptions into store and merge on URIs

96 A simple DC metadata record (RDF/XML) [RDF/XML example; the markup was lost in extraction, leaving only the placeholder property values: a b c d e f g h i j k l m n o]
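As a rough sketch of what such a record looks like and how a service might read it, the Python below embeds a hand-written RDF/XML fragment and turns it into triples with the standard-library XML parser. The fragment is an assumption, not the slide's original listing: it shows only three of the fifteen properties, and which placeholder value belongs to which property is a guess. The triples_from_rdfxml() helper is hypothetical and handles only this simple shape (rdf:Description elements with literal-valued children); it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written sample; values "a", "b", "c" are placeholders as on the slide.
record = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/1">
    <dc:title>a</dc:title>
    <dc:creator>b</dc:creator>
    <dc:subject>c</dc:subject>
  </rdf:Description>
</rdf:RDF>
"""


def triples_from_rdfxml(text):
    """Read literal-valued statements from simple rdf:Description elements."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for child in desc:
            # ElementTree expands the qualified name dc:title to
            # "{namespace}title"; joining namespace and local name
            # yields the property URI.
            namespace, local = child.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, child.text))
    return triples


triples = triples_from_rdfxml(record)
```

The reader never needs to know the dc: vocabulary in advance: the namespace declaration alone is enough to recover each property's full URI.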

97 RDF Vocabulary Description Language (RDF Schema) Provides mechanisms to describe –terms used in RDF statements –relationships between terms –e.g. Dublin Core metadata element set described using RDF(S) Defines type system –resources grouped into classes –classes may be related hierarchically (subClassOf) –properties may be related hierarchically (subPropertyOf) –use of properties may be constrained (domain, range) More RDF statements –i.e. metadata about metadata elements
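One consequence of relating properties hierarchically can be sketched as follows: if a property is declared a subPropertyOf another, any statement made with the narrower property also holds for the broader one. The Python below applies just that single rule to a set of triples. The ex:illustrator property and its relationship to dc:contributor are hypothetical, made up for illustration, and infer_subproperty() is not part of any library.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/terms/"  # hypothetical application vocabulary
SUBPROPERTY_OF = RDFS + "subPropertyOf"

graph = {
    # Schema statement (metadata about a metadata element).
    (EX + "illustrator", SUBPROPERTY_OF, DC + "contributor"),
    # Instance statement using the narrower, application-specific term.
    ("http://example.org/doc/1", EX + "illustrator", "Jane"),
}


def infer_subproperty(graph):
    """Add (s, q, o) whenever (s, p, o) and (p subPropertyOf q) are present."""
    inferred = set(graph)
    changed = True
    while changed:
        broader = {(s, o) for (s, p, o) in inferred if p == SUBPROPERTY_OF}
        new = {
            (s, q, o)
            for (s, p, o) in inferred
            for (narrow, q) in broader
            if p == narrow
        }
        changed = not new <= inferred
        inferred |= new
    return inferred


closure = infer_subproperty(graph)
```

A service that understands only Dublin Core can then still find the document by querying dc:contributor, even though the original description used the local term.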

98 Description of Dublin Core Creator [Diagram: http://purl.org/dc/elements/1.1/creator --rdfs:label--> "Creator"; --rdfs:comment--> "An entity …"; --dc:description--> "Examples of a …"; --rdf:type--> http://www.w3.org/1999/02/22-rdf-syntax-ns#Property]

99 Description of Dublin Core Creator (RDF/XML) [RDF/XML example; the markup was lost in extraction, leaving only the property values:] Creator / An entity primarily responsible for making the content of the resource. / Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity. / 1999-07-02

100 Simplicity, contradiction, trust In RDF, meaning is expressed by simple statements: –Subject-Predicate-Object Anyone on Web can assert (in RDF sense) anything about anything –software agents navigating Web of statements –may be able to process some of these statements but not all –ignore the statements you don't understand –tolerance of inconsistency and errors Establishing trust as fundamental part of Semantic Web infrastructure –Who said this (and when etc)
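The two habits named on this slide, ignoring statements you do not understand and asking who said something, can be sketched together. In the Python below each statement carries the URI of the source that asserted it; that source field, the lists of known properties and trusted sources, and the usable() helper are all assumptions made for illustration, not features of RDF itself.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"


class Statement(NamedTuple):
    subject: str
    predicate: str
    value: str
    source: str  # who asserted the statement (hypothetical provenance field)


statements = [
    Statement("http://example.org/doc/1", DC + "title", "Annual report",
              "http://example.org/"),
    Statement("http://example.org/doc/1", "http://example.net/terms/mood", "gloomy",
              "http://example.org/"),
    Statement("http://example.org/doc/1", DC + "creator", "Someone else",
              "http://untrusted.example/"),
]

# Properties this particular agent understands, and sources it trusts.
KNOWN_PROPERTIES = {DC + "title", DC + "creator"}
TRUSTED_SOURCES = {"http://example.org/"}


def usable(statements, known, trusted):
    """Keep statements the agent understands and whose source it trusts."""
    return [
        s for s in statements
        if s.predicate in known and s.source in trusted
    ]


accepted = usable(statements, KNOWN_PROPERTIES, TRUSTED_SOURCES)
```

The rejected statements are not errors; another agent with a different vocabulary or a different trust policy could accept them, which is how conflicting descriptions coexist.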

101 Metadata and the Semantic Web Argued that the Semantic Web principles fit the nature of metadata –Metadata supports many different functions –Metadata is inherently "modular" –Metadata creation is not a one-off act, but an ongoing, distributed process –the metadata creator can't predict how users may want to use resources and query metadata –new uses of resources result in new metadata –Metadata is not (or at least not only) "objective", "authoritative" information –Some attributes represent interpretations –Some attributes are context-dependent –Multiple (even conflicting) descriptions can co-exist

102 Some RDF applications

103 RDF Site Summary (RSS) 1.0 Simple RDF metadata vocabulary designed to support syndication of "news" items An RSS "channel" is published as an RDF/XML document Provides metadata about –The channel itself –A summary of its scope and purpose –A sequence of items –Summary descriptions of Web documents Content of channel regularly updated by provider Wide, simple, automated distribution http://purl.org/rss/1.0/

104 RDF Site Summary (RSS) 1.0 Typical applications –Web sites: render content of specific channels as part of their own Web sites –Online aggregator services: harvest numerous channels and provide search/filtering services across the items –e.g. Meerkat –Desktop news readers: allow users to "subscribe" to list of channels, regularly download content for user to browse –e.g. Amphetadesk RSS also generated from some Weblog management systems – SWAD(E) activity on "semantic weblogging"
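To give a feel for how a news reader or aggregator consumes a channel, the sketch below parses a small hand-written RSS 1.0 document with Python's standard library. The feed content, titles and example.org URLs are invented for illustration; only the two namespace URIs come from the RSS 1.0 and RDF specifications. Treating the feed as ordinary XML, as done here, is a shortcut; an RDF-aware application would read it into a triple store instead.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hand-written sample channel with two items (all content hypothetical).
feed = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rss">
    <title>Example News</title>
    <link>http://example.org/news/</link>
    <description>Announcements from an example organisation.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/news/1"/>
        <rdf:li rdf:resource="http://example.org/news/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/news/1">
    <title>First story</title>
    <link>http://example.org/news/1</link>
  </item>
  <item rdf:about="http://example.org/news/2">
    <title>Second story</title>
    <link>http://example.org/news/2</link>
  </item>
</rdf:RDF>
"""

root = ET.fromstring(feed)

# Metadata about the channel itself.
channel = root.find("rss:channel", NS)
channel_title = channel.findtext("rss:title", namespaces=NS)

# Summary descriptions of the individual Web documents.
items = [
    (item.findtext("rss:title", namespaces=NS),
     item.findtext("rss:link", namespaces=NS))
    for item in root.findall("rss:item", NS)
]
```

An aggregator would run the same extraction over many channels and index the resulting (title, link) pairs.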

105 http://www.ukoln.ac.uk/

106 Metadata schema registries How to encourage convergence and reuse of metadata vocabularies Implementers –may be unaware of existing vocabularies –adapt/customise "standard" terms for application-specific use –may combine terms from multiple "standard" sources –coin application-specific terms or extensions Application profile –A metadata element set optimised for a particular application

107 Metadata schema registries A publication context for –"standard" metadata vocabularies and their terms –(depending on scope of registry) also implementer usages/adaptations of those vocabularies and their terms –To provide a "dictionary" function –To highlight relationships, encourage reuse/convergence Based on indexing RDF data distributed on Web? Requires shared conventions for describing –metadata vocabularies –and their usages and adaptations

108 http://dublincore.org/dcregistry/

109 Summary RDF provides a common framework for making machine-processable statements about resources The Semantic Web provides a vision of metadata as –modular, extensible –distributed, devolved –dynamic, evolving Seeks to address (some of) the challenges of cross-domain, cross-community interoperability Fundamental role of trust on the Semantic Web

110 Overall summary Global networks have created a new context for the delivery of services Metadata fundamental to service provision Services being built (successfully!) –OAI PMH as a low-barrier technology No one-size-fits-all solution Debates, tensions, balances…. –automated processes v human labour –domain-specific richness v cross-domain (over-?) simplicity –standards v their implementation –objectivity v subjectivity –centralisation v distribution Emergence of a Semantic Web?

111 Acknowledgements Parts of the content of this presentation are adapted from earlier presentations by: Tom Baker (Fraunhofer-Gesellschaft, Berlin), Michael Day, Rachel Heery, Paul Miller, and Andy Powell (UKOLN)

112 Acknowledgements UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based. http://www.ukoln.ac.uk/

