Cornell CS 502 Metadata for the Web From Discovery to Description CS 502 – 20020226 Carl Lagoze – Cornell University.



Cornell CS 502 Co-existing Cost/Functionality Levels
[Figure; axis label: Greater Functionality & Cost]

Cornell CS 502 Dublin Core Qualifiers
From fuzzy buckets to more specific description
Model of “graceful degradation”
– Support both simplicity and specificity
– Intra-domain and inter-domain semantics

Cornell CS 502 Resource has property X
Resource: implied subject
has: implied verb
property: one of 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date, ...)
X: property value (an appropriate literal)
[optional qualifier]: qualifiers (adjectives)

Cornell CS 502 Varieties of Qualifiers: Element Refinements
Make the meaning of an element narrower or more specific.
Narrowing implies an "is a" relationship
– a "date created" is a "date"
– an "is part of relation" is a "relation"
If your software does not understand the qualifier, you can safely ignore it.

Cornell CS 502 Varieties of Qualifiers: Value Encoding Schemes
Says that the value is
– a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
– a string formatted in a standard way (e.g., " " means May 3, not February 5)
Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery.
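A minimal sketch of why a declared encoding scheme matters: once a date string is known to follow the ISO 8601 / W3C-DTF day form YYYY-MM-DD, software can read month and day without guessing. The function name and the sample value are hypothetical; the value on the slide itself did not survive.

```python
from datetime import date


def parse_w3cdtf_day(value: str) -> date:
    """Parse a YYYY-MM-DD string (the day-precision form of W3C-DTF)."""
    return date.fromisoformat(value)


# Hypothetical value, chosen only to illustrate the month/day ordering.
parsed = parse_w3cdtf_day("2002-05-03")
print(parsed.month, parsed.day)
```

Without the scheme, a string such as "05/03" could be read either way; with it, the ordering is fixed by the standard.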

Cornell CS 502 Resource has Date " " (element refinement: Revised; encoding scheme: ISO8601)
Resource has Subject "Languages -- Grammar" (encoding scheme: LCSH)

Cornell CS 502 Dumb-Down Principle for Qualifiers
The fifteen elements should be usable and understandable with or without the qualifiers
Qualifiers refine meaning (but may be harder to understand)
Nouns can stand on their own without adjectives
If your software encounters an unfamiliar qualifier, look it up -- or just ignore it!
"has a" relations break the model
– E.g., a creator has a hair color
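The dumb-down principle can be sketched as a function that drops the qualifier from a dotted name and keeps only the element. This is an illustrative sketch, not DCMI's reference behavior; the function name `dumb_down` and the dotted `DC.Element.Refinement` naming are assumptions based on the HTML convention used later in these slides.

```python
from typing import Optional

# The fifteen Dublin Core elements, lower-cased for comparison.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def dumb_down(name: str) -> Optional[str]:
    """Reduce 'DC.Date.Created' to 'DC.Date'; return None for non-DC names."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        return None
    element = parts[1]
    if element.lower() not in DC_ELEMENTS:
        return None
    # Anything after the element is a qualifier and is simply ignored.
    return "DC." + element


print(dumb_down("DC.Date.Created"))
print(dumb_down("DC.Creator.HairColor"))
```

Note that the second call still returns `DC.Creator`, which is exactly why "has a" qualifiers break the model: after dumbing down, a hair color would be presented as a creator.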

Cornell CS 502 Resource has Date " " (Revised, ISO8601)
Resource has Subject "Languages -- Grammar" (LCSH)
Test for "good" qualifiers: cover the qualifier and ask:
-- Does the statement still make sense?
-- Is it still correct?

Cornell CS 502 “Incorrect” Qualification
Resource has subject [audience] “pre-schoolers”
Resource has creator [affiliation] “Cornell University”

Cornell CS 502 Open questions in this model
Are uncontrolled and unconstrained values really useful for discovery?
Is it possible for an organization (DCMI) to control the evolution of a language?
How can "simple discovery metadata" be combined with complex descriptions?
Is there a notion of graceful degradation?
Can DC serve as a lingua franca (mapping template) among more complex models?

Cornell CS 502 Models for Deploying Metadata
Embedded in the resource
– Low deployment threshold
– Limited flexibility, limited model
Linked to from resource
– Using XLink
– Is there only one source of metadata?
Independent resource referencing resource
– Model of accessing the object through its surrogate

Cornell CS 502 Syntax Alternatives: HTML
Advantages:
– Simple mechanism: META tags embedded in content
– Widely deployed tools and knowledge
Disadvantages:
– Limited structural richness (won’t support hierarchical, tree-structured data or entity distinctions)

Cornell CS 502 Dublin Core in HTML
HTML constructs
– <link> to establish pseudo-namespace
– <meta> for metadata statements
name attribute for DC element (DC.element.ER)
content attribute for element value
scheme attribute for encoding scheme or controlled vocabulary
lang attribute for language of element value

Cornell CS 502 Dublin Core in HTML example <meta name="DC.Date.Created" scheme="W3CDTF" content=" ">
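As a rough sketch, the name, scheme, lang, and content attributes described above could be assembled like this. The helper `dc_meta` is hypothetical, and the date value is a placeholder since the slide's own value was lost.

```python
from html import escape
from typing import Optional


def dc_meta(element: str, content: str, refinement: Optional[str] = None,
            scheme: Optional[str] = None, lang: Optional[str] = None) -> str:
    """Render one Dublin Core statement as an HTML meta tag."""
    name = "DC." + element
    if refinement:
        name += "." + refinement
    attrs = [("name", name)]
    if scheme:
        attrs.append(("scheme", scheme))
    if lang:
        attrs.append(("lang", lang))
    attrs.append(("content", content))
    # Escape attribute values so quotes or ampersands cannot break the tag.
    rendered = " ".join(f'{key}="{escape(value, quote=True)}"' for key, value in attrs)
    return f"<meta {rendered}>"


# Placeholder date value, for illustration only.
tag = dc_meta("Date", "2002-02-26", refinement="Created", scheme="W3CDTF")
print(tag)
```

Keeping the refinement as a separate argument makes it easy to emit the unqualified form by simply omitting it.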

Cornell CS 502 Unqualified Dublin Core in XML
[RDF/XML example using the rdf and dc namespaces; element values: "Dave Beckett's Home Page", "Dave Beckett", "ILRT, University of Bristol"]
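A hedged sketch of how an unqualified Dublin Core description might be serialized as RDF/XML with Python's standard library. The two namespace URIs are the standard RDF and Dublin Core element namespaces; the resource URI `http://example.org/home` and the assignment of the three values to title, creator, and publisher are assumptions, since the slide's markup is not available.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def describe(about: str, properties: dict) -> str:
    """Serialize simple element/value pairs as an rdf:Description."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for element, value in properties.items():
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical resource URI; element assignment is an assumption.
xml_text = describe(
    "http://example.org/home",
    {
        "title": "Dave Beckett's Home Page",
        "creator": "Dave Beckett",
        "publisher": "ILRT, University of Bristol",
    },
)
print(xml_text)
```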

Cornell CS 502 Example of Dublin Core Use
A map in the United States Library of Congress online American Memory Collection

Cornell CS 502 Title
The name given to the resource
<META name="DC.Title" content="Novi Belgii Novæque Angliæ: nec non partis Virginiæ tabula multis in locis emendata" lang="la">

Cornell CS 502 Creator
An entity primarily responsible for making the content of the resource
<META name="DC.Creator" content="Nicolaum Visscher">

Cornell CS 502 Subject
The topic of the content of the resource
<META name="DC.Subject" content="Middle Atlantic States" scheme="LCSH">
<META name="DC.Subject" content="Maps" scheme="LCSH">
<META name="DC.Subject" content="Early works to 1800" scheme="LCSH">

Cornell CS 502 Description
An account of the content of the resource
<META name="DC.Description.Abstract" content="An historical map showing the coast of New Jersey as perceived in the seventeenth century">

Cornell CS 502 Publisher
An entity responsible for making the resource available
<META name="DC.Publisher" content="Library of Congress, United States">

Cornell CS 502 Contributor
An entity responsible for making contributions to the content of the resource
<META name="DC.Contributor" content="Historic Urban Plans">

Cornell CS 502 Date
A date associated with an event in the lifecycle of the resource
<META name="DC.Date.Created" content=" " scheme="W3C-DTF">

Cornell CS 502 Type
The nature or genre of the content of the resource
<META name="DC.Type" content="image" scheme="DCMIType">

Cornell CS 502 Format
The physical or digital manifestation of the resource
<META name="DC.Format.Medium" content="image/gif" scheme="IMT">
<META name="DC.Format.Extent" content="556K">

Cornell CS 502 Identifier
An unambiguous reference to the resource in the current context
<META name="DC.Identifier" content=" " scheme="URI">

Cornell CS 502 Source
A reference to a resource from which the present resource is derived
<META name="DC.Source" content="G V (LOC catalog #)">

Cornell CS 502 Language
Language of the intellectual content of the object
<META name="DC.Language" content="nl" scheme="ISO 639-2">

Cornell CS 502 Relation
A reference to a related resource
<META name="DC.Relation.isPartOf" content=" gmdhtml/dsxpimg.html" scheme="URI">

Cornell CS 502 Coverage
The extent or scope of the content of the resource
<META name="DC.Coverage.Spatial" content="New Jersey" scheme="TGN">

Cornell CS 502 Rights
Information about rights in and over the resource
<META name="DC.Rights" content=" rights_statement.htm">
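Going the other way, a harvester could pull the embedded statements back out of a page. The sketch below uses Python's standard `html.parser`; the class name `DCMetaCollector` and the two-tag sample are illustrative, with values taken from the map example above.

```python
from html.parser import HTMLParser


class DCMetaCollector(HTMLParser):
    """Collect (name, content, scheme) from meta tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.records.append(
                (name, attributes.get("content"), attributes.get("scheme"))
            )


sample = (
    '<META name="DC.Type" content="image" scheme="DCMIType">'
    '<META name="DC.Publisher" content="Library of Congress, United States">'
    '<META name="keywords" content="not Dublin Core">'
)
collector = DCMetaCollector()
collector.feed(sample)
print(collector.records)
```

The collector keeps the scheme alongside the value so that downstream software can either interpret it or ignore it.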

Cornell CS 502 Distributed Content
The Metadata Challenge
From fixed, contained physical artifacts to fluid, distributed digital objects
Need for basis of trust and authenticity in network environment
Decentralization and specialization of resource description and need for mapping formalisms

Cornell CS 502 Multi-entity nature of object description
[Figure labels: Photographer, Camera type, Software, Computer artist]

Cornell CS 502 Understanding Metadata based on Query Capabilities
Simple boolean tags?
– Creator=“Tom Baker” and Title contains “Dublin Core”
Agent, time, place questions?
– Who was responsible for what, and when and where?
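The "simple boolean tags" kind of query is easy to express over flat attribute/value records. In this sketch the records and titles are invented for illustration; only the query itself comes from the slide.

```python
# Hypothetical flat records, invented for illustration.
records = [
    {"Creator": "Tom Baker", "Title": "Notes on Dublin Core qualifiers"},
    {"Creator": "Tom Baker", "Title": "Notes on something else"},
    {"Creator": "Someone Else", "Title": "Dublin Core in practice"},
]


def simple_match(record: dict, creator: str, title_phrase: str) -> bool:
    """Creator equals a value AND Title contains a phrase."""
    return (
        record.get("Creator") == creator
        and title_phrase in record.get("Title", "")
    )


hits = [r["Title"] for r in records if simple_match(r, "Tom Baker", "Dublin Core")]
print(hits)
```

A flat record like this has nowhere to put "when" and "where" for a given agent, which is the limitation the agent, time, and place questions point at.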

Cornell CS 502 Attribute/Value approaches to metadata…
Hamlet (subject) has a (implied verb) creator (metadata noun) [playwright (metadata adjective)] Shakespeare (literal)
The playwright of Hamlet was Shakespeare
R1 dc:title “Hamlet”
R1 dc:creator.playwright “Shakespeare”

Cornell CS 502 …run into problems for richer descriptions…
Hamlet has a creator birthplace Stratford
The playwright of Hamlet was Shakespeare, who was born in Stratford
R1 dc:creator.playwright “Shakespeare”
R1 dc:creator.birthplace “Stratford”

Cornell CS 502 …because of their failure to model entity distinctions
R1 title “Hamlet”
R1 creator R2
R2 name “Shakespeare”
R2 birthplace “Stratford”
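The difference between the flat description and the one with a separate creator entity can be sketched with plain triples. The identifiers R1 and R2 follow the slide; the helper names `objects` and `creator_birthplaces` are hypothetical.

```python
# Entity-aware description: the creator is its own resource (R2).
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject: str, predicate: str) -> list:
    """Return every object of (subject, predicate, ?)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_birthplaces(resource: str) -> list:
    """Follow resource -> creator -> birthplace across two entities."""
    places = []
    for creator in objects(resource, "creator"):
        places.extend(objects(creator, "birthplace"))
    return places


print(creator_birthplaces("R1"))
print(objects("R1", "birthplace"))
```

The birthplace is found only by going through R2; asking R1 directly returns nothing, so the play is never described as having been born in Stratford.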

Cornell CS 502 Applying a Model-Centric Approach
Formally define common entities and relationships underlying multiple metadata vocabularies
Describe them (and their inter-relationships) in a simple logical model
Provide the framework for extending these common semantics to domain- and application-specific metadata vocabularies

Cornell CS 502 Events are key to understanding metadata relationships? Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. Clarifying attachment points facilitates understanding and querying “who was responsible for what when”.

Cornell CS 502 Content, Events, & Descriptions

Cornell CS 502 ABC/Harmony Event-aware metadata ontology
Recognizing inherent lifecycle aspects of description (esp. of digital content)
Modeling incorporates time (events and situations) as first-class objects
– Supplies clear attachment points for agents, roles, existential properties
Resource description as a “story-telling” activity

Cornell CS 502 Resource-centric Metadata
Title: Anna Karenina
Author: Leo Tolstoy
Illustrator: Orest Vereisky
Translator: Margaret Wettlin
Date Created: 1877
Date Translated: 1978
Description: Adultery & Depression
Birthplace: Moscow
Birthdate: 1828
?

Cornell CS 502 [Graph; node and arc labels: “Anna Karenina”, “Tragic adultery and the search for meaningful love”, “creation”, “1877”, “author”, “Leo Tolstoy”, "Moscow", “1828”, “Russian”, “translation”, “1978”, “translator”, “Margaret Wettlin”, “English”, “illustrator”, “Orest Vereisky”]
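One way to picture events as first-class objects is the sketch below, which is not the ABC model itself. The values come from the slides; attaching the birthplace and birth year to Leo Tolstoy is an assumption (the flat record leaves this ambiguous), and the illustrator is left out because the surviving text does not say which event that role belongs to. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Agent:
    name: str
    birthplace: Optional[str] = None
    birth_year: Optional[int] = None


@dataclass
class Event:
    kind: str
    year: Optional[int]
    # Each participant is a (role, agent) pair attached to this event.
    participants: List[Tuple[str, Agent]] = field(default_factory=list)


# Assumption: the birthplace and birth year belong to the author.
tolstoy = Agent("Leo Tolstoy", birthplace="Moscow", birth_year=1828)
wettlin = Agent("Margaret Wettlin")

events = [
    Event("creation", 1877, [("author", tolstoy)]),
    Event("translation", 1978, [("translator", wettlin)]),
]


def who(event_list: List[Event], kind: str) -> List[Tuple[str, str]]:
    """List (role, agent name) pairs for every event of the given kind."""
    return [
        (role, agent.name)
        for event in event_list
        if event.kind == kind
        for role, agent in event.participants
    ]


print(who(events, "translation"))
```

Because dates hang off events and birth details hang off agents, "who was responsible for what when" no longer depends on guessing which flat field goes with which entity.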

Cornell CS 502 Queries over complex descriptive graphs
Ability to ask questions like “show me all the translations of War and Peace between 1980 and 1990”
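Once translation events carry their own dates, a question like the one above becomes a simple filter. This sketch treats "between 1980 and 1990" as inclusive, which is an assumption; all records, translator names, and years are invented for illustration.

```python
from typing import List, NamedTuple


class TranslationEvent(NamedTuple):
    work: str
    translator: str
    year: int


# Hypothetical events, invented for illustration only.
translation_events = [
    TranslationEvent("War and Peace", "Translator A", 1975),
    TranslationEvent("War and Peace", "Translator B", 1984),
    TranslationEvent("War and Peace", "Translator C", 1990),
    TranslationEvent("Anna Karenina", "Translator D", 1985),
]


def translations_between(events: List[TranslationEvent], work: str,
                         start: int, end: int) -> List[TranslationEvent]:
    """Return translation events for a work with start <= year <= end."""
    return [e for e in events if e.work == work and start <= e.year <= end]


matches = translations_between(translation_events, "War and Peace", 1980, 1990)
print([m.translator for m in matches])
```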