CS 430: Information Discovery


1 CS 430: Information Discovery
Lecture 13 Descriptive Metadata: Dublin Core

2 Course Administration

3 Notes on MARC
A great achievement: developed in the 1960s as a magnetic tape exchange format for printing catalog records. At the dawn of computing it already supported mixed upper and lower case, variable-length fields, repeated fields, and non-Roman scripts. 100(?) million records with standard content and format. Thousands of trained librarians (millions?).

4 Notes on MARC
A great problem: not designed for computer algorithms. One record per item (poor links between records). Tied to traditional materials and traditional practices. Not Unicode. 100 million records at $ $10 billion. A classic legacy system!

5 IFLA Model Work. A work is the underlying abstraction, e.g., The Iliad; the Computer Science departmental web site; Beethoven's Fifth Symphony; the Unix operating system; the 1996 U.S. census. This is roughly equivalent to the concept of "literary work" used in copyright law.

6 IFLA Model Expression. A work is realized through an expression, e.g., The Iliad has oral expressions and written expressions. A musical work has score and performance(s). Software has source code and machine code. Many works have only a single expression, e.g., a web page or a book.

7 IFLA Model Manifestation. An expression is given form in one or more manifestations, e.g., the text of The Iliad has been manifest in numerous manuscripts and printed books. A musical performance can be distributed on CD or broadcast on television. Software is manifest as files, which may be stored or transmitted in any digital medium.

8 IFLA Model Item. When many copies are made of a manifestation, each is a separate item, e.g., a specific copy of a book or of a computer file. [Works, expressions, manifestations and items are explored in CS 502, Architecture of Web Information Systems.]
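
The four levels of the IFLA model nest naturally, so they can be sketched as a small containment hierarchy. This is only an illustrative sketch: the class and field names are assumptions made for the example, not part of the IFLA model, and the shelf locations are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One specific copy, e.g. a particular copy of a book or a computer file."""
    location: str


@dataclass
class Manifestation:
    """A form given to an expression, e.g. a printed book or a CD."""
    form: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. an oral or a written expression."""
    kind: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction, e.g. The Iliad."""
    name: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Walk work -> expression -> manifestation -> item."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Illustrative data only; the locations are invented for the example.
iliad = Work(
    name="The Iliad",
    expressions=[
        Expression(kind="oral"),
        Expression(
            kind="written",
            manifestations=[
                Manifestation(
                    form="printed book",
                    items=[Item("library copy 1"), Item("library copy 2")],
                ),
                Manifestation(form="manuscript", items=[Item("archive copy")]),
            ],
        ),
    ],
)

print(len(iliad.all_items()))
```

A MARC-style catalog, with one record per item, would flatten this hierarchy; the nesting makes the links between copies of the same work explicit.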

9 Dublin Core
A simple set of metadata elements for online information: 15 basic elements, intended for all types and genres of material; all elements optional; all elements repeatable. Developed by an international group chaired by Stuart Weibel since 1995. (Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)

10

11 Dublin Core
publisher: OCLC
creator: Weibel, Stuart L.
creator: Miller, Eric J.
title: Dublin Core Reference Page
date:
format: text/html (MIME type)
language: en (English)
identifier:

12 Dublin Core elements
1. Title: The name given to the resource by the creator or publisher.
2. Creator: The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents; artists, photographers, or illustrators in the case of visual resources.
3. Subject: The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

13 Dublin Core elements
4. Description: A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
5. Publisher: The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
6. Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

14 Dublin Core elements
7. Date: A date associated with the creation or availability of the resource.
8. Type: The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.
9. Format: The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.
10. Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

15 Dublin Core elements
11. Source: Information about a second resource from which the present resource is derived.
12. Language: The language of the intellectual content of the resource.
13. Relation: An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).

16 Dublin Core elements
14. Coverage: The spatial locations and temporal durations characteristic of the resource.
15. Rights: A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
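
Because every element is optional and repeatable, one plausible in-memory shape for an unqualified record is a mapping from element name to a list of values. The sketch below assumes that representation; `DC_ELEMENTS`, `Record`, and `add_value` are hypothetical names chosen for the example, and the values come from the reference-page record on slide 11 (the date and identifier are omitted because the transcript does not give them).

```python
from typing import Dict, List

# The 15 element names listed on the slides above, lowercased.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical representation: element name -> list of values.
Record = Dict[str, List[str]]


def add_value(record: Record, element: str, value: str) -> None:
    """Append a value; calling this twice for one element models repetition."""
    if element not in DC_ELEMENTS:
        raise ValueError(f"not a Dublin Core element: {element!r}")
    record.setdefault(element, []).append(value)


record: Record = {}
add_value(record, "publisher", "OCLC")
add_value(record, "creator", "Weibel, Stuart L.")
add_value(record, "creator", "Miller, Eric J.")
add_value(record, "title", "Dublin Core Reference Page")
add_value(record, "format", "text/html")
add_value(record, "language", "en")

# Repeated element: two creators. Optional element: "rights" is simply absent.
print(record["creator"])
print("rights" in record)
```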

17 Qualifiers
Element qualifier. Example: Date
DC.Date.Created 1997-11-01
DC.Date.Issued
DC.Date.Available /
DC.Date.Valid /

18 Qualifiers
Value qualifiers. Example: Subject
DC.Subject.DDC 509.123
DC.Subject.LCSH Digital libraries-United States

19 Representations of Dublin Core: Meta Tags
<meta name="publisher" content="OCLC">
<meta name="creator" content="Weibel, Stuart L.">
<meta name="creator" content="Miller, Eric J.">
<meta name="title" content="Dublin Core Reference Page">
<meta name="date" content=" ">
<meta name="format" content="text/html">
<meta name="language" content="en">
<meta name="identifier" content="
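
Meta tags of this shape can be generated mechanically from a list of (element, value) pairs. The following is a minimal sketch, not a prescribed Dublin Core tool: `to_meta_tags` is a hypothetical helper, and it uses only `html.escape` from the Python standard library to keep quotes and angle brackets in values from breaking the markup.

```python
from html import escape
from typing import Iterable, List, Tuple


def to_meta_tags(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Render (element, value) pairs as HTML meta tags, one tag per pair."""
    return [
        '<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in pairs
    ]


# A list of pairs (rather than a dict) keeps repeated elements such as creator.
pairs = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
]
tags = to_meta_tags(pairs)
print("\n".join(tags))
```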

20

21 Representations of Dublin Core: XML (with qualifiers)
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<publisher>Corporation for National Research Initiatives</publisher>
<date date-type = "publication">January 2000</date>
<type resource-type = "work">article</type>
<identifier uri-type = "DOI"> /january2000-levy</identifier>
<identifier uri-type = "URL">
<language>English</language>
<rights>Copyright (c) David M. Levy</rights>
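
In this XML form the qualifiers travel as attributes, so a reader can keep them next to each value. The sketch below parses a cut-down copy of the record with the standard-library `xml.etree.ElementTree`. The `<record>` wrapper element is an assumption added so the fragment parses as a single document, `read_fields` is a hypothetical helper, and the identifier elements are left out because their values are incomplete in the transcript.

```python
import xml.etree.ElementTree as ET

# <record> is a hypothetical wrapper; the slide shows only the child elements.
XML = """<record>
  <title>Digital Libraries and the Problem of Purpose</title>
  <creator>David M. Levy</creator>
  <date date-type="publication">January 2000</date>
  <type resource-type="work">article</type>
  <language>English</language>
</record>"""


def read_fields(xml_text):
    """Return (element, qualifiers, value) triples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]


fields = read_fields(XML)
for tag, qualifiers, value in fields:
    print(tag, qualifiers, value)
```

A client that does not understand a qualifier can ignore the middle member of each triple and still use the element name and value.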

22 Representations of Dublin Core: Text (with qualifiers)
See next two slides for an example of a Dublin Core record for a web site prepared by a professional cataloguer at the Library of Congress. Note that the record does not follow the principle of dumbing-down.

23

24

25 Old Midterm Examination
What is the Dublin Core principle of dumbing-down? Are there any fields in this record that do not satisfy the principle?

26 Old Midterm Examination
What is the Dublin Core principle of dumbing-down? Are there any fields in this record that do not satisfy the principle? "The theory behind this principle is that consumers of metadata should be able to strip off qualifiers and return to the base form of a property. ... this principle makes it possible for client applications to ignore qualifiers in the context of more coarse-grained, cross-domain searches." Lagoze 2001

27 Old Midterm Examination
Dumbing-down failures:
Description.note "Title from home page as viewed on Nov. 1, 2000." dumbs down to Description "Title from home page as viewed on Nov. 1, 2000.", which is not a description of the object.
Publisher.place "Nashville, Tenn. :" dumbs down to Publisher "Nashville, Tenn. :", which is not the publisher of the object.
Correct dumbing-down:
Subject.class.LCC "E840.8.G65" dumbs down to Subject "E840.8.G65", which is a subject code.
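
Mechanically, dumbing-down amounts to dropping everything after the base element name. The sketch below assumes the dotted notation used on the slides, with an optional leading "DC." prefix; `dumb_down` and `dumb_down_record` are hypothetical helper names. Note that the code can only strip qualifiers; it cannot tell whether the remaining value is still meaningful for the base element, which is exactly what the failures above illustrate.

```python
from typing import Dict, List, Tuple


def dumb_down(qualified_name: str) -> str:
    """Strip qualifiers from a dotted name, returning the base element.

    Assumes dotted notation, with an optional leading "DC." prefix
    (as in DC.Date.Created).
    """
    parts = qualified_name.split(".")
    if parts[0].upper() == "DC" and len(parts) > 1:
        parts = parts[1:]
    return parts[0]


def dumb_down_record(fields: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values under their base element, discarding all qualifiers."""
    simple: Dict[str, List[str]] = {}
    for name, value in fields:
        simple.setdefault(dumb_down(name), []).append(value)
    return simple


# Field values taken from the examination record discussed on the slides.
qualified = [
    ("Description.note", "Title from home page as viewed on Nov. 1, 2000."),
    ("Publisher.place", "Nashville, Tenn. :"),
    ("Publisher", "Gore/Lieberman,"),
    ("Subject.class.LCC", "E840.8.G65"),
]
simple = dumb_down_record(qualified)

# After dumbing-down, the place of publication is listed as a Publisher value.
print(simple["Publisher"])
```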

28 Old Midterm Examination
4(b) The metadata in the fields Publisher and Publisher place end in punctuation marks. Can you suggest any reasons for doing so?

29 Old Midterm Examination
4(b) The metadata in the fields Publisher and Publisher place end in punctuation marks. Can you suggest any reasons for doing so? This is a historic curiosity. It comes from the assumption that the metadata will be printed, so the metadata is stored in a printable format. The values Publisher "Gore/Lieberman," and Publisher.place "Nashville, Tenn. :" are intended to be combined with a date as follows: Nashville, Tenn. : Gore/Lieberman, 2001

30 Old Midterm Examination
4(c) This record has no Creator field. It has a Contributor.nameCorporate field with value "Gore/Lieberman, Inc." Do you consider that this is correct use of Dublin Core? What would you put in the Creator and Contributor fields? Why?

31 Old Midterm Examination
Specification of Dublin Core:
A. All fields are optional. It is not necessary to have a Creator.
B. Definitions of fields:
Creator: The person or organization primarily responsible for the intellectual content of the resource.
Contributor: A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element.
Gore/Lieberman, Inc. is the corporate author of this web site and is therefore the Creator.

32 Limits of Dublin Core: Complex objects
Metadata records: complete object; sub-objects. Examples: article within a journal; a thumbnail of another image; the March 28 final edition of a newspaper.

33 Flat v. linked records
Flat record: All information about an item is held in a single Dublin Core record, including information about related items. Convenient for access and preservation. Information is repeated, which creates a maintenance problem.
Linked record: Related information is held in separate records with a link from the item record. Less convenient for access and preservation. Information is stored once.
Compare with normal forms in relational databases.
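
The trade-off between flat and linked records can be made concrete with a small sketch. Everything here is illustrative: the article titles, the `serial-1` key, the field names, and the replacement serial name are invented for the example, and plain dictionaries stand in for Dublin Core records.

```python
from typing import Dict, List

# Flat: every article record repeats the details of the serial it appears in.
flat_records: List[Dict[str, str]] = [
    {"title": "Article A", "serial_name": "D-Lib Magazine", "volume": "6", "issue": "1"},
    {"title": "Article B", "serial_name": "D-Lib Magazine", "volume": "6", "issue": "1"},
]

# Linked: serial details are stored once; each article carries only a link.
serials: Dict[str, Dict[str, str]] = {"serial-1": {"serial_name": "D-Lib Magazine"}}
linked_records: List[Dict[str, str]] = [
    {"title": "Article A", "volume": "6", "issue": "1", "relation": "serial-1"},
    {"title": "Article B", "volume": "6", "issue": "1", "relation": "serial-1"},
]


def rename_serial_flat(records: List[Dict[str, str]], old: str, new: str) -> int:
    """Update every record that repeats the serial name; return records touched."""
    touched = 0
    for record in records:
        if record.get("serial_name") == old:
            record["serial_name"] = new
            touched += 1
    return touched


def rename_serial_linked(table: Dict[str, Dict[str, str]], serial_id: str, new: str) -> int:
    """Update the single serial record; return records touched."""
    table[serial_id]["serial_name"] = new
    return 1


def resolve(record: Dict[str, str], table: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Follow the link to rebuild a flat view (the extra step linked access needs)."""
    merged = {key: value for key, value in record.items() if key != "relation"}
    merged.update(table[record["relation"]])
    return merged


flat_writes = rename_serial_flat(flat_records, "D-Lib Magazine", "Renamed Serial")
linked_writes = rename_serial_linked(serials, "serial-1", "Renamed Serial")
print(flat_writes, linked_writes)
print(resolve(linked_records[0], serials)["serial_name"])
```

The flat form needs one write per article when the shared information changes, while the linked form needs one write in total but an extra lookup on every read, which mirrors the normalization trade-off in relational databases.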

34 Dublin Core with flat record extension
Continuation
<relation rel-type = "InSerial">
<serial-name>D-Lib Magazine</serial-name>
<issn> </issn>
<volume>6</volume>
<issue>1</issue>
</relation>

35 Events
[Diagram: Version 1, new material, Version 2]
Should Version 2 have its own record or should extra information be added to the Version 1 record? How are these represented in Dublin Core?

36 Minimalist versus structuralist
Minimalists: 15 elements, no qualifiers; suitable for non-professionals; encourage creators to provide metadata.
Structuralists: 15 elements, qualifiers, RDF, detailed coding rules; will require trained metadata experts.
[For an example of how complex Dublin Core can become, see the source of:

37 Dublin Core: Personal Opinion
Dublin Core is a simple way to describe digital content that: is a single, self-contained object ("document-like"); is static with time; has few relationships. Some web sites satisfy these criteria.
Dublin Core is not suitable for digital content that: is heavily structured; changes dynamically.
Dublin Core contains limited descriptive metadata for information discovery.

38 Dublin Core in Many Languages
See: Thomas Baker, Languages for Dublin Core, D-Lib Magazine December 1998,

