Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University.

1 Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

2 Cornell CS 502 “Metadata is data about data”

3 Cornell CS 502 Metadata is semi-structured data conforming to commonly agreed upon models, providing operational interoperability in a heterogeneous environment

4 Cornell CS 502 Some untested hypotheses
Metadata is useful for…
–People
–Machines
More metadata is better
(semi) automated digital libraries and simple metadata

5 Cornell CS 502 Some known facts
Number and variety of metadata vocabularies will continue to increase
The Tower of Babel is a franchise
–There is not one common view of reality
“The one thing I know about metadata is that it is expensive” (Bill Arms)
“I hate metadata projects because they make every other digital library project more expensive” (Michael Lesk)

6 Cornell CS 502 Are metadata and data distinguishable? Objectivity? Intellectual property? Structure? Aboutness?

7 Cornell CS 502 The fiction of classification …there is no classification of the universe that is not fictional and conjectural. Jorge Luis Borges

8 Cornell CS 502 Lenses and Views
All classification does and should provide a biased lens or view of reality
Each view emphasizes certain characteristics and hides others
Geospatial, Rights, Museum

9 Cornell CS 502 Reality is Complex
Created by: George Castaldo; Created on: 1994
Created by: Leonardo da Vinci; Created on: 1506
Relationship?

10 Cornell CS 502 Objects are Related IFLA Entity Model

11 Cornell CS 502 Entities, Events, and Agents Photographer Camera type Software Computer artist

12 Cornell CS 502 Haven’t we done metadata already?

13 Cornell CS 502 What’s wrong with this model?
Expensive
–Complex (even for its original goal?)
–Professional intervention (assumes single community of expertise)
Monolithic
–One size fits all approach
–Reflects its centralized system origins
Bias towards physical artifacts
–Fixed resources
–Incomplete handling of resource evolution and other resource relationships
Anglo-centric

14 Cornell CS 502 Web Challenge to Traditional Cataloging
Scale
Permanence
Authenticity
Organizational Context
Custodial Control
Variety

15 Cornell CS 502 Internet Commons includes Multiple Communities
Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...

16 Cornell CS 502 Metadata Takes Many Forms

17 Cornell CS 502 Metadata Challenges
Accommodate multiple varieties of metadata
–community-specific functionality, creation, administration, access
Tensions
–functionality and simplicity
–extensibility and interoperability
–human and machine creation and use

18 Cornell CS 502 Interoperability has many facets
Semantics
–Meaning/classification/ontology
Models/Structure
–Entities and relationships
Syntax
–grammars to convey semantics and structure

19 Cornell CS 502 Warwick Framework: Containing Chaos
Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2)
Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
Provide context for metadata efforts (including Dublin Core)
–avoids the “black-hole” of comprehensive element sets
–focuses interoperability issues at package level

20 Cornell CS 502 Metadata Container
Container
–Package: Dublin Core
–Package: MARC record
–Package: Indirect Reference
–Package: Terms and Conditions
URI
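The container-and-package idea on slides 19 and 20 can be sketched as a small data structure. This is only an illustrative sketch, not the Warwick Framework's own specification: the class names `Package`, `IndirectReference`, and `Container`, the schema labels, the sample field values, and the `example.org` URI are all hypothetical. It also assumes that the indirect reference is the package that points, by URI, to a terms-and-conditions package held elsewhere, which the flattened slide text does not make explicit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Package:
    """A typed metadata package stored directly inside a container."""
    schema: str             # label for the metadata vocabulary (hypothetical)
    record: Dict[str, str]  # the metadata itself; opaque to the container


@dataclass
class IndirectReference:
    """A package held elsewhere and reached through a URI."""
    schema: str
    uri: str


@dataclass
class Container:
    """Aggregates independently managed packages about one resource."""
    packages: List[Union[Package, IndirectReference]] = field(default_factory=list)

    def add(self, package: Union[Package, IndirectReference]) -> None:
        self.packages.append(package)

    def schemas(self) -> List[str]:
        # The container only knows each package's type, not its contents.
        return [p.schema for p in self.packages]

    def find(self, schema: str) -> List[Union[Package, IndirectReference]]:
        return [p for p in self.packages if p.schema == schema]


container = Container()
# Sample values reuse the creator and date shown on slide 9.
container.add(Package("dublin-core", {"creator": "Leonardo da Vinci", "date": "1506"}))
container.add(Package("marc", {"100": "Leonardo da Vinci"}))
# Assumption: terms and conditions live outside the container, behind a URI.
container.add(IndirectReference("terms-and-conditions", "http://example.org/terms"))

print(container.schemas())
```

The container never interprets a package's contents, which is one way to read "focuses interoperability issues at package level": each community can manage its own vocabulary while the container stays generic.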

21 Cornell CS 502 Modularization Allows Distributed Management
Communities of expertise (not software vendors) are responsible for:
–Semantics
–Registration
–Administration
–Access management
–Authority of data
–Sharing and Distribution

22 Cornell CS 502 Realities of Web search and discovery
Search systems are motivated by advertising
Index coverage is unpredictable and limited
Too much recall, too little precision
Index spam abounds
Resources (and their names) are volatile
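"Too much recall, too little precision" refers to the standard information-retrieval measures: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. A minimal sketch follows; the function name `precision_recall` and the page identifiers are made up for illustration and do not come from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's result list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 100 results come back, but only 4 of the
# 5 relevant pages are among them.
retrieved = [f"page{i}" for i in range(1, 101)]
relevant = ["page1", "page2", "page3", "page4", "page500"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.04 recall=0.80
```

In this made-up case the searcher finds most of what exists but has to wade through 96 irrelevant results to do so, which is the situation the slide describes.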

23 Cornell CS 502 Metadata: Part of a Solution
Structured data about data
–helps to impose order on chaos
–enables automated discovery/manipulation
Variety across various dimensions:
–specialization
–decentralization
–democratization

24 Cornell CS 502 Web Metadata Models: Drill-Down Searching Paradigm
Moving along a specificity spectrum
Inter-domain vs. intra-domain terms, models, query mechanisms
One size doesn't fit all
–Cognitive models of searching and browsing

25 Cornell CS 502 Drill-down search paradigm Domain Independent view Domain Specific View

26 Cornell CS 502 Metadata: Part of the problem
Chart: cost versus functionality, with AACR2/MARC, Google, and Dublin Core plotted

27 Cornell CS 502 Why hasn’t metadata worked on the Web?
It’s all about trust
People are lazy
Metadata is hard
No perceived benefit
–“Reverse tragedy of the commons”
No agreement on one way to describe things
“Metacrap” - http://www.well.com/~doctorow/metacrap.htm

