1 CS 430: Information Discovery Lecture 6 Descriptive Metadata 2 Library Catalogs Dublin Core.


1 CS 430: Information Discovery
Lecture 6
Descriptive Metadata 2
Library Catalogs
Dublin Core

2 Course Administration
Assignment 1
  Submission instructions will be posted soon. You will need a csuglab account. If you do not have such an account, go to Upson 311.
Programming in Perl
  The first class on Perl is Wednesday night, Hollister 110, 7:30 to 9:00 p.m.

3 Course Administration
New course: LAW 410, Limits on and Protection of Creative Expression - Copyright Law and Its Close Neighbors
This course, offered during fall term 2001, provides an introduction to copyright law and closely related legal regimes for non-law students.

4 Example: Monograph catalog record
Citation: Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.

5 MARC fields
tag  value
001  89-16879 r93
050  Z675.U5C16 1990
082  027.7/0973 20
245  Campus strategies for libraries and electronic information/Caroline Arms, editor.  [title statement]
260  {Bedford, Mass.} : Digital Press, c1990.  [publisher]
300  xi, 404 p. : ill. ; 24 cm.  [collation]
440  EDUCOM strategies series on information technology  [series title]
504  Includes bibliographical references (p. {373}-381).
020  ISBN 1-55558-036-X : $34.95

6 MARC fields (continued)
tag  value
650  Academic libraries--United States--Automation.  [subject heading]
650  Libraries and electronic publishing--United States.
650  Library information networks--United States.
650  Information technology--United States.
700  Arms, Caroline R. (Caroline Ruth)
040  DLC DLC DLC
043  n-us---
955  CIP ver. br02 to SL 02-26-90
985  APIF/MIG

7 MARC Encoding: For Print and Computer Processing
tag: 260
subfield a: {Bedford, Mass.} :
subfield b: Digital Press,
subfield c: c1990.
MARC encoding: &2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%
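The encoded string on this slide can be unpacked mechanically. The following is a minimal Python sketch that assumes the slide's simplified notation rather than the control characters of a real MARC exchange record: `&` opens the field, the first three characters are the tag, `#` separates the list of subfield codes from the values and the values from each other, and `%` closes the field. The function name `parse_field` is hypothetical, and treating the fourth character (`0`) as an indicator is a guess.

```python
def parse_field(encoded: str) -> dict:
    """Parse one field written in the slide's simplified notation."""
    if not (encoded.startswith("&") and encoded.endswith("%")):
        raise ValueError("field must start with '&' and end with '%'")
    body = encoded[1:-1]
    # First piece is tag plus trailing character, second is the subfield codes,
    # the remaining pieces are the subfield values in the same order.
    header, codes, *values = body.split("#")
    if len(codes) != len(values):
        raise ValueError("number of subfield codes and values differ")
    return {
        "tag": header[:3],
        "indicator": header[3:],  # assumption: the slide does not explain this character
        "subfields": list(zip(codes, values)),
    }


field = parse_field("&2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%")
for code, value in field["subfields"]:
    print(field["tag"], code, value)
```

Note that the punctuation used for printing (` :` and `,`) is stored inside the subfield values, which is one reason the format serves both print and computer processing.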

8 Name authority files
Caroline R. Arms or Caroline Ruth Arms?
Which William Phillips of Cardiff?
Mark Twain or Samuel Clemens?
Epithets:
  of Cardiff
  doctor
Dates:
  1832 - 1876
  flourished 1860
  circa 1832 - 1876
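One way to picture an authority file is as a lookup from every variant form of a name to a single authorized heading. The Python sketch below is illustrative only: the table `AUTHORITY` and the function `authorized_form` are hypothetical, and the choice of which form counts as authorized is made up for the example (the Arms heading is copied from the 700 field shown earlier).

```python
# Hypothetical authority file: each variant form maps to one authorized heading.
AUTHORITY = {
    "Arms, Caroline R.": "Arms, Caroline R. (Caroline Ruth)",
    "Arms, Caroline Ruth": "Arms, Caroline R. (Caroline Ruth)",
    "Clemens, Samuel": "Twain, Mark",
    "Twain, Mark": "Twain, Mark",
}


def authorized_form(name: str) -> str:
    """Return the authorized heading, or the name unchanged if it is not listed."""
    return AUTHORITY.get(name, name)


print(authorized_form("Clemens, Samuel"))
print(authorized_form("Arms, Caroline Ruth"))
```

Epithets and dates play the same role for names that would otherwise collide: they are extra text in the heading that keeps two different people apart.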

9 Shared cataloguing
OCLC -- large centralized transaction processing database system
When a library catalogs a book, it deposits a MARC record in OCLC.
Other libraries can copy the record.
  saves duplication of cataloguing
  builds a database of holdings
The OCLC database has 43 million records.

10 Subject information
Library of Congress Subject Headings
  Academic libraries--United States--Automation
Hierarchical classification
  Library of Congress call number: Z675.U5C16
  Dewey Decimal Classification: 027.7
Creation and maintenance of lists of subject headings and classifications is a never-ending task.

11 Notes on MARC
A great achievement:
Developed in the 1960s
Magnetic tape exchange format for printing catalog records
The dawn of computing:
  mixed upper and lower case
  variable length fields, repeated fields
  non-Roman scripts
100(?) million records with standard content and format
Thousands of trained librarians (millions?)

12 Notes on MARC
A great problem:
Not designed for computer algorithms
One record per item (poor links between records)
Tied to traditional materials and traditional practices
Not Unicode
100 million records at $100 each -- $10 billion
A classic legacy system!

13 Cataloguing Objectives
Functions of catalogs:
  finding
  collocating (recall and precision)
  choosing
  acquiring
  navigating
... among items in a bibliographic universe
Compare use cases in software design.

14 IFLA Model
Work. A work is the underlying abstraction, e.g.,
  The Iliad
  The Computer Science departmental web site
  Beethoven's Fifth Symphony
  Unix operating system
  The 1996 U.S. census
This is roughly equivalent to the concept of "literary work" used in copyright law.

15 IFLA Model
Expression. A work is realized through an expression, e.g.,
  The Iliad has oral expressions and written expressions.
  A musical work has score and performance(s).
  Software has source code and machine code.
Many works have only a single expression, e.g., a web page or a book.

16 IFLA Model
Manifestation. An expression is given form in one or more manifestations, e.g.,
  The text of The Iliad has been manifest in numerous manuscripts and printed books.
  A musical performance can be distributed on CD, or broadcast on television.
  Software is manifest as files, which may be stored or transmitted in any digital medium.

17 IFLA Model
Item. When many copies are made of a manifestation, each is a separate item, e.g.,
  a specific copy of a book or computer file
[Works, expressions, manifestations and items are explored in CS 502, Computing Methods of Digital Libraries.]
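The four levels of the IFLA model nest naturally: a work has expressions, an expression has manifestations, and a manifestation has items. A minimal Python sketch of that containment follows; the class and field names are illustrative choices, not part of the IFLA model, and the sample copies are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    label: str  # one specific copy


@dataclass
class Manifestation:
    form: str  # e.g. "printed book", "CD"
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    kind: str  # e.g. "written", "oral"
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)


# Invented sample data for illustration.
iliad = Work("The Iliad", [
    Expression("written", [
        Manifestation("printed book", [Item("copy 1"), Item("copy 2")]),
        Manifestation("manuscript", [Item("the manuscript itself")]),
    ]),
    Expression("oral"),
])


def count_items(work: Work) -> int:
    """Count every item under every manifestation of every expression."""
    return sum(len(m.items) for e in work.expressions for m in e.manifestations)


print(count_items(iliad))
```

A one-record-per-item catalog flattens this tree, which is why links between related records are hard to express in it.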

18 Dublin Core
Simple set of metadata elements for online information
  15 basic elements
  intended for all types and genres of material
  all elements optional
  all elements repeatable
Developed by an international group chaired by Stuart Weibel since 1995. (Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)


20 Dublin Core
publisher: OCLC
creator: Weibel, Stuart L.
creator: Miller, Eric J.
title: Dublin Core Reference Page
date: 1996-05-28
format: text/html (MIME type)
language: en (English)
identifier: http://purl.org/dc/documents/rec-dces-199809.htm#

21 Dublin Core with Meta Tags
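A record like the one on the previous slide is commonly embedded in a web page as HTML meta elements whose name is `DC.` followed by the element name. The Python sketch below generates such tags from a list of (element, value) pairs; it illustrates that convention and is not the markup originally shown on this slide. The function name `to_meta_tags` is hypothetical.

```python
from html import escape

# (element, value) pairs; a list is used because elements are repeatable.
record = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
    ("date", "1996-05-28"),
    ("format", "text/html"),
    ("language", "en"),
]


def to_meta_tags(pairs):
    """Render each pair as one HTML meta element, escaping special characters."""
    return [
        f'<meta name="DC.{escape(name)}" content="{escape(value)}">'
        for name, value in pairs
    ]


tags = to_meta_tags(record)
print("\n".join(tags))
```

Because the record is a list rather than a dictionary, the two creator entries both survive, matching the rule that every element may be repeated.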

22 Dublin Core elements
1. Title
   The name given to the resource by the creator or publisher.
2. Creator
   The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.
3. Subject
   The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

23 Dublin Core elements
4. Description
   A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.
5. Publisher
   The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.
6. Contributor
   A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

24 Dublin Core elements
7. Date
   A date associated with the creation or availability of the resource.
8. Type
   The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.
9. Format
   The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.
10. Identifier
   A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

25 Dublin Core elements
11. Source
   Information about a second resource from which the present resource is derived.
12. Language
   The language of the intellectual content of the resource.
13. Relation
   An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).

26 Dublin Core elements
14. Coverage
   The spatial locations and temporal durations characteristic of the resource.
15. Rights
   A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
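Since every element is optional and repeatable, about the only structural check a simple Dublin Core record admits is that each name is one of the 15 elements. A minimal Python sketch of that check follows; the lowercase spellings and the function name `unknown_elements` are choices made for this example.

```python
# The 15 elements listed on the preceding slides, written in lowercase.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the element names in a record that are not among the 15.

    `record` is a list of (element, value) pairs. Missing elements and
    repeated elements are both acceptable, so neither is reported.
    """
    return sorted({name for name, _ in record if name not in DC_ELEMENTS})


print(unknown_elements([("title", "x"), ("creator", "a"), ("creator", "b")]))
print(unknown_elements([("author", "x"), ("title", "y")]))
```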

27 Qualifiers
Element qualifier
Example: Date
  DC.Date -> Created: 1997-11-01
  DC.Date -> Issued: 1997-11-15
  DC.Date -> Available: 1997-12-01/1998-06-01
  DC.Date -> Valid: 1998-01-01/1998-06-01
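The Date example shows why element qualifiers matter to software: without the qualifier, a program cannot tell a creation date from a validity period. The Python sketch below stores each statement as an (element, qualifier, value) triple and parses the values, assuming a slash separates the start and end of a period as in the slide. The names `parse_period` and `is_valid_on` are hypothetical.

```python
from datetime import date


def parse_period(value: str):
    """Split 'start/end' (or a single date) into a (start, end) pair of dates."""
    start, _, end = value.partition("/")
    first = date.fromisoformat(start)
    last = date.fromisoformat(end) if end else first
    return first, last


statements = [
    ("date", "created", "1997-11-01"),
    ("date", "issued", "1997-11-15"),
    ("date", "available", "1997-12-01/1998-06-01"),
    ("date", "valid", "1998-01-01/1998-06-01"),
]

# Index the parsed periods by qualifier.
periods = {qualifier: parse_period(value) for _, qualifier, value in statements}


def is_valid_on(day: date) -> bool:
    """True if `day` falls inside the period given by the Valid qualifier."""
    start, end = periods["valid"]
    return start <= day <= end


print(is_valid_on(date(1998, 3, 1)))
```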

28 Qualifiers
Value qualifiers
Example: Subject
  DC.Subject -> DDC: 509.123
  DC.Subject -> LCSH: Digital libraries--United States


30 Dublin Core with qualifiers
Digital Libraries and the Problem of Purpose
David M. Levy
Corporation for National Research Initiatives
January 2000
article
10.1045/january2000-levy
http://www.dlib.org/dlib/january00/01levy.html
English
Copyright (c) David M. Levy


33 Limits of Dublin Core
Complex objects
  Article within a journal
  A thumbnail of another image
  The March 28 final edition of a newspaper
Complete object
Sub-objects
Metadata records

34 Flat v. linked records
Flat record
All information about an item is held in a single Dublin Core record, including information about related items
  convenient for access and preservation
  information is repeated -- maintenance problem
Linked record
Related information is held in separate records with a link from the item record
  less convenient for access and preservation
  information is stored once
Compare with normal forms in relational databases
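The trade-off between flat and linked records can be seen with two articles from the same journal. In the Python sketch below the article titles, the key `j1`, and all field names are invented for illustration; only the journal title and ISSN come from the lecture.

```python
# Flat: the journal details are copied into every article record.
flat_records = [
    {"title": "Article A", "journal_title": "D-Lib Magazine", "journal_issn": "1082-9873"},
    {"title": "Article B", "journal_title": "D-Lib Magazine", "journal_issn": "1082-9873"},
]

# Linked: the journal is described once and each article points to it.
journals = {"j1": {"title": "D-Lib Magazine", "issn": "1082-9873"}}
linked_records = [
    {"title": "Article A", "journal": "j1"},
    {"title": "Article B", "journal": "j1"},
]


def journal_title(record: dict) -> str:
    """Follow the link from an article record to its journal record."""
    return journals[record["journal"]]["title"]


# A correction to the journal title is one update in the linked form ...
journals["j1"]["title"] = "D-Lib Magazine (corrected)"
linked_titles = [journal_title(r) for r in linked_records]

# ... but every flat record still carries the old copy until each is edited.
stale = [r for r in flat_records if r["journal_title"] != "D-Lib Magazine (corrected)"]
print(linked_titles, len(stale))
```

The flat record can be read on its own, which is what makes it convenient for access and preservation; the linked record needs the journal table to be available, but it cannot drift out of date.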

35 Dublin Core with flat record extension
Continuation
D-Lib Magazine
1082-9873
6
1

36 Events
Version 1
New material
Version 2
Should Version 2 have its own record, or should extra information be added to the Version 1 record?
How are these represented in Dublin Core?

37 Minimalist versus structuralist
Minimalist
  15 elements, no qualifiers, suitable for non-professionals
  encourage creators to provide metadata
Structuralist
  15 elements, qualifiers, RDF, detailed coding rules
  will require trained metadata experts
[For an example of how complex Dublin Core can become, see the source of: http://purl.org/dc/documents/rec-dces-199809.htm#]

38 Dublin Core in many languages
See: Thomas Baker, Languages for Dublin Core, D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html

39 Dublin Core: Personal Opinion
Dublin Core is a simple way to describe digital content that:
  is a single, self-contained object ("document-like")
  is static with time
  has few relationships
Some web sites satisfy these criteria.
Dublin Core is not suitable for digital content that:
  is heavily structured
  changes dynamically

