Cornell CS 502 Metadata for the Web From Discovery to Description CS 502 – 20020226 Carl Lagoze – Cornell University.

1 Cornell CS 502 Metadata for the Web From Discovery to Description CS 502 – 20020226 Carl Lagoze – Cornell University

2 Cornell CS 502 Co-existing Cost/Functionality Levels Greater Functionality & Cost

3 Cornell CS 502 Dublin Core Qualifiers From fuzzy buckets to more specific description Model of “graceful degradation” –Support both simplicity and specificity –Intra-domain and inter-domain semantics

4 Cornell CS 502 Resource has property X: implied subject (Resource), implied verb (has), one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...), property value (an appropriate literal), [optional qualifier]. Qualifiers act as adjectives.

5 Cornell CS 502 Varieties of Qualifiers: Element Refinements Make the meaning of an element narrower or more specific. Narrowing implies an "is a" relationship –a "date created" is a "date" –an "is part of relation" is a "relation" If your software does not understand the qualifier, you can safely ignore it.

6 Cornell CS 502 Varieties of Qualifiers: Value Encoding Schemes Says that the value is –a term from a controlled vocabulary (e.g., Library of Congress Subject Headings) –a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5) Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
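
As a rough illustration of why a declared encoding scheme matters, the sketch below reads the slide's date string as a year-month-day value. The function name `parse_complete_date` is hypothetical, and the sketch assumes the scheme in use is the complete-date form (YYYY-MM-DD) shared by ISO 8601 and W3CDTF.

```python
from datetime import date


def parse_complete_date(value: str) -> date:
    """Parse a YYYY-MM-DD string (hypothetical helper, complete-date form only)."""
    return date.fromisoformat(value)


parsed = parse_complete_date("2001-05-02")
# The scheme fixes the field order: year, then month, then day.
print(parsed.year, parsed.month, parsed.day)
```

Software that does not know the scheme can still index the raw string, which is the point the slide makes about values staying usable for discovery.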

7 Cornell CS 502 Resource has Date "2000-06-13" (element refinement: Revised; encoding scheme: ISO8601) Resource has Subject "Languages -- Grammar" (encoding scheme: LCSH)

8 Cornell CS 502 Dumb-Down Principle for Qualifiers The fifteen elements should be usable and understandable with or without the qualifiers Qualifiers refine meaning (but may be harder to understand) Nouns can stand on their own without adjectives If your software encounters an unfamiliar qualifier, look it up -- or just ignore it! "has a" relations break the model –E.g., a creator has a hair color
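
The dumb-down principle can be sketched in a few lines, assuming qualified names follow the dotted `DC.element.refinement` convention used in the HTML examples in this deck. The function names are hypothetical, and this is one possible reading of the principle rather than a DCMI-defined algorithm.

```python
def dumb_down_name(name: str) -> str:
    """Drop any element refinement: 'DC.Date.Created' becomes 'DC.Date'."""
    parts = name.split(".")
    return ".".join(parts[:2])


def dumb_down_statement(name, content, scheme=None):
    """Return an unqualified (name, content) pair, ignoring refinement and scheme."""
    return (dumb_down_name(name), content)


print(dumb_down_statement("DC.Date.Created", "2000-10-23", scheme="W3CDTF"))
print(dumb_down_statement("DC.Title", "Hamlet"))
```

An unqualified name passes through unchanged, so the same function can be applied to every statement in a record.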

9 Cornell CS 502 Resource has Date "2000-06-13" (element refinement: Revised; encoding scheme: ISO8601) Resource has Subject "Languages -- Grammar" (encoding scheme: LCSH) Test for "good" qualifiers: cover the qualifier and ask: -- Does the statement still make sense? -- Is it still correct?

10 Cornell CS 502 "Incorrect" Qualification Resource has subject "pre-schoolers" (qualifier: audience) Resource has creator "Cornell University" (qualifier: affiliation)

11 Cornell CS 502 Open questions in this model Are uncontrolled and unconstrained values really useful for discovery? Is it possible for an organization (DCMI) to control the evolution of a language? How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation? Can DC serve as a lingua franca (mapping template) among more complex models?

12 Cornell CS 502 Models for Deploying Metadata Embedded in the resource –Low deployment threshold –Limited flexibility, limited model Linked to from resource –Using XLink –Is there only one source of metadata? Independent resource referencing resource –Model of accessing the object through its surrogate

13 Cornell CS 502 Syntax Alternatives: HTML Advantages: –Simple mechanism – META tags embedded in content –Widely deployed tools and knowledge Disadvantages: –Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions).

14 Cornell CS 502 Dublin Core in HTML http://www.dublincore.org/documents/2000/08/15/dcq-html/ HTML constructs: <link> to establish pseudo-namespace; <meta> for metadata statements; name attribute for DC element (DC.element.ER); content attribute for element value; scheme attribute for encoding scheme or controlled vocabulary; lang attribute for language of element value

15 Cornell CS 502 Dublin Core in HTML example <meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">
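
To show how a harvester might read such a tag, here is a minimal sketch using Python's standard `html.parser`. The class name `DCMetaParser` and the dictionary keys are assumptions made for illustration. The code only assumes the dotted `DC.element.refinement` naming convention described on the previous slide.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core statements from <meta> tags (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.startswith("DC."):
            return  # not a Dublin Core statement
        parts = name.split(".")
        self.statements.append({
            "element": parts[1],
            "refinement": parts[2] if len(parts) > 2 else None,
            "scheme": attributes.get("scheme"),
            "lang": attributes.get("lang"),
            "value": attributes.get("content"),
        })


parser = DCMetaParser()
parser.feed('<meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23">')
print(parser.statements)
```

Keeping the refinement and scheme as separate optional fields means a consumer can ignore either one and still use the element and value.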

16 Cornell CS 502 Unqualified Dublin Core in XML http://www.dublincore.org/documents/2000/11/dcmes-xml/ <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/"> <rdf:Description> <dc:title>Dave Beckett's Home Page</dc:title> <dc:creator>Dave Beckett</dc:creator> <dc:publisher>ILRT, University of Bristol</dc:publisher> <dc:date>2000-06-06</dc:date> </rdf:Description> </rdf:RDF>
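
A record like this can be produced with the standard library, as in the sketch below. The two namespace URIs come from the slide. The `rdf:about` URI is a placeholder, and the pairing of the slide's four values with `title`, `creator`, `publisher`, and `date` is an assumption. The function name `build_dc_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dc_record(about, fields):
    """Serialize (element, value) pairs as unqualified Dublin Core in RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for element, value in fields:
        child = ET.SubElement(description, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_dc_record(
    "http://example.org/home-page",  # placeholder URI, not from the slide
    [
        ("title", "Dave Beckett's Home Page"),
        ("creator", "Dave Beckett"),
        ("publisher", "ILRT, University of Bristol"),
        ("date", "2000-06-06"),
    ],
)
print(xml_text)
```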

17 Cornell CS 502 Example of Dublin Core Use A map in the United States Library of Congress online American Memory Collection

18 Cornell CS 502 Title The name given to the resource <meta name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">

19 Cornell CS 502 Creator An entity primarily responsible for making the content of the resource <meta name="DC.Creator" content="Nicolaum Visscher">

20 Cornell CS 502 Subject The topic of the content of the resource <meta name="DC.Subject" content="Middle Atlantic States" scheme="LCSH"> <meta name="DC.Subject" content="Maps" scheme="LCSH"> <meta name="DC.Subject" content="Early works to 1800" scheme="LCSH">

21 Cornell CS 502 Description An account of the content of the resource <meta name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">

22 Cornell CS 502 Publisher An entity responsible for making the resource available <meta name="DC.Publisher" content="Library of Congress, United States">

23 Cornell CS 502 Contributor An entity responsible for making contributions to the content of the resource. <meta name="DC.Contributor" content="Historic Urban Plans">

24 Cornell CS 502 Date A date associated with an event in the lifecycle of the resource <meta name="DC.Date.Created" content="1996-04-17" scheme="W3C-DTF">

25 Cornell CS 502 Type The nature or genre of the content of the resource <meta name="DC.Type" content="image" scheme="DCMIType">

26 Cornell CS 502 Format The physical or digital manifestation of the resource <meta name="DC.Format.Medium" content="image/gif" scheme="IMT"> <meta name="DC.Format.Extent" content="556K">

27 Cornell CS 502 Identifier An unambiguous reference to the resource in the current context <meta name="DC.Identifier" content="http://loc.gov/coll1/img456.jpg" scheme="URI">

28 Cornell CS 502 Source A reference to a resource from which the present resource is derived. <meta name="DC.Source" content="G3715 1685.V5 1969 (LOC catalog #)">

29 Cornell CS 502 Language Language of the intellectual content of the object <meta name="DC.Language" content="nl" scheme="ISO 639-2">

30 Cornell CS 502 Relation A reference to a related resource <meta name="DC.Relation.isPartOf" content="http://lcweb2.loc.gov/ammem/gmdhtml/dsxpimg.html" scheme="URI">

31 Cornell CS 502 Coverage The extent or scope of the content of the resource <meta name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">

32 Cornell CS 502 Rights Information about rights in and over the resource <meta name="DC.Rights" content="http://www.loc.gov/rights_statement.htm">
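
The element-by-element examples above could be emitted from a single record structure. The following is a sketch only: `meta_tag` is a hypothetical helper, and the `record` list holds just three of the statements from the map example. Attribute values are escaped so that quotes or ampersands in a value cannot break the tag.

```python
from html import escape


def meta_tag(name, content, scheme=None, lang=None):
    """Render one Dublin Core statement as an HTML <meta> tag (hypothetical helper)."""
    attrs = [("name", name), ("content", content)]
    if scheme:
        attrs.append(("scheme", scheme))
    if lang:
        attrs.append(("lang", lang))
    rendered = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attrs)
    return f"<meta {rendered}>"


# A few statements taken from the map example in the slides.
record = [
    {"name": "DC.Creator", "content": "Nicolaum Visscher"},
    {"name": "DC.Subject", "content": "Maps", "scheme": "LCSH"},
    {"name": "DC.Coverage.Spatial", "content": "New Jersey", "scheme": "TGN"},
]

for statement in record:
    print(meta_tag(**statement))
```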

33 Cornell CS 502 Distributed Content The Metadata Challenge From fixed, contained physical artifacts to fluid, distributed digital objects Need for basis of trust and authenticity in network environment Decentralization and specialization of resource description and need for mapping formalisms

34 Cornell CS 502 Multi-entity nature of object description Photographer Camera type Software Computer artist

35 Cornell CS 502 Understanding Metadata based on Query Capabilities Simple boolean tags? –Creator="Tom Baker" and Title contains "Dublin Core" Agent, time, place questions? –Who was responsible for what, and when and where?
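
The "simple boolean tags" style of query can be sketched over flat attribute/value records as follows. The sample records and the `matches` function are invented for illustration and are not drawn from any real catalog.

```python
# Hypothetical flat records: each element maps to a list of literal values.
records = [
    {"creator": ["Tom Baker"], "title": ["Dublin Core in practice"]},
    {"creator": ["Tom Baker"], "title": ["Another report"]},
    {"creator": ["Someone Else"], "title": ["Dublin Core overview"]},
]


def matches(record, creator, title_contains):
    """Creator equals `creator` AND some title contains `title_contains`."""
    return creator in record.get("creator", []) and any(
        title_contains in title for title in record.get("title", [])
    )


hits = [r for r in records if matches(r, "Tom Baker", "Dublin Core")]
print(hits)
```

A flat record like this has no place to record when or where an agent acted, which is why the agent, time, and place questions on the slide need a richer model.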

36 Cornell CS 502 Attribute/Value approaches to metadata… "Hamlet has a creator Shakespeare": subject (Hamlet), implied verb (has), metadata noun (creator), literal (Shakespeare), metadata adjective (playwright). The playwright of Hamlet was Shakespeare. Graph: R1 --dc:title--> "Hamlet"; R1 --dc:creator.playwright--> "Shakespeare"

37 Cornell CS 502 …run into problems for richer descriptions… "Hamlet has a creator" with birthplace Stratford. The playwright of Hamlet was Shakespeare, who was born in Stratford. Graph: R1 --dc:creator.playwright--> "Shakespeare"; R1 --dc:creator.birthplace--> "Stratford"

38 Cornell CS 502 …because of their failure to model entity distinctions Graph: R1 --title--> "Hamlet"; R1 --creator--> R2; R2 --name--> "Shakespeare"; R2 --birthplace--> "Stratford"
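
The contrast between the flat attribute/value record and a model with distinct entities can be sketched with two small data classes. The class and field names here are assumptions chosen to mirror the labels in the diagrams (R1 as the resource, R2 as the creator).

```python
from dataclasses import dataclass
from typing import Optional

# Flat approach: every property hangs off the resource, so "birthplace"
# reads as if it described Hamlet rather than its creator.
flat_record = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}


@dataclass
class Agent:
    """R2 in the diagram: the creator as an entity in its own right."""
    name: str
    birthplace: Optional[str] = None


@dataclass
class Resource:
    """R1 in the diagram: the described work."""
    title: str
    creator: Agent


hamlet = Resource(title="Hamlet", creator=Agent(name="Shakespeare", birthplace="Stratford"))
print(hamlet.creator.birthplace)
```

With the agent modeled separately, the birthplace is unambiguously a property of the person, and the same `Agent` could be shared by other resources.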

39 Cornell CS 502 Applying a Model-Centric Approach Formally define common entities and relationships underlying multiple metadata vocabularies Describe them (and their inter-relationships) in a simple logical model Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.

40 Cornell CS 502 Events are key to understanding metadata relationships? Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.

41 Cornell CS 502 Content, Events, & Descriptions

42 Cornell CS 502 ABC/Harmony Event-aware metadata ontology Recognizing inherent lifecycle aspects of description (esp. of digital content) Modeling incorporates time (events and situations) as first-class objects –Supplies clear attachment points for agents, roles, existential properties Resource description as a “story-telling” activity

43 Cornell CS 502 Resource-centric Metadata Title: Anna Karenina; Author: Leo Tolstoy; Illustrator: Orest Vereisky; Translator: Margaret Wettlin; Date Created: 1877; Date Translated: 1978; Description: Adultery & Depression; Birthplace: Moscow; Birthdate: 1828 ?

44 Cornell CS 502 [Diagram] Node and edge labels: "Anna Karenina"; "Tragic adultery and the search for meaningful love"; "creation", "1877", "author", "Leo Tolstoy", "Russian", "Moscow", "1828"; "translation", "1978", "translator", "Margaret Wettlin", "English"; "illustrator", "Orest Vereisky"

45 Cornell CS 502 Queries over complex descriptive graphs Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
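
A query of this kind becomes straightforward once events are first-class objects. The sketch below is not the ABC/Harmony model itself; the `Event` class is a simplified assumption. The Anna Karenina entries use values from the earlier slides, while the War and Peace entries are invented placeholders included only to exercise the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str   # e.g. "creation" or "translation"
    work: str   # title of the work the event applies to
    agent: str  # who was responsible
    year: int   # when it happened


events = [
    Event("creation", "Anna Karenina", "Leo Tolstoy", 1877),
    Event("translation", "Anna Karenina", "Margaret Wettlin", 1978),
    # Placeholder data, not real bibliographic facts:
    Event("translation", "War and Peace", "Translator A", 1985),
    Event("translation", "War and Peace", "Translator B", 1995),
]


def translations_between(all_events, work, start, end):
    """Return translation events for `work` with start <= year <= end."""
    return [
        e for e in all_events
        if e.kind == "translation" and e.work == work and start <= e.year <= end
    ]


print(translations_between(events, "War and Peace", 1980, 1990))
```

Because agent and year are attached to the event rather than to the work, the same structure answers "who was responsible for what, and when".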

