2002.09.10 - SLIDE 1IS 202 - Fall 2002 Lecture 05: Metadata: Introduction Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30.


1 2002.09.10 - SLIDE 1IS 202 - Fall 2002 Lecture 05: Metadata: Introduction Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 SIMS 202: Information Organization and Retrieval

2 2002.09.10 - SLIDE 2IS 202 - Fall 2002 Lecture Contents Review –Categories –The Vocabulary Problem Organization of Information Metadata Kinds of Metadata Dublin Core

3 2002.09.10 - SLIDE 3IS 202 - Fall 2002 Lecture Contents Review –Categories –The Vocabulary Problem Organization of Information Metadata Kinds of Metadata Dublin Core

4 2002.09.10 - SLIDE 4IS 202 - Fall 2002 Categorization Classical categorization –Necessary and sufficient conditions for membership –Generic-to-specific monohierarchical structure Modern categorization –Characteristic features (family resemblances) –Centrality/typicality (prototypes) –Basic-level categories

5 2002.09.10 - SLIDE 5IS 202 - Fall 2002 Properties of Categorization Family Resemblance –Members of a category may be related to one another without all members having any property in common Prototypes –Some members of a category may be “better examples” than others, i.e., “prototypical” members

6 2002.09.10 - SLIDE 6IS 202 - Fall 2002 Furnas: The Vocabulary Problem People use different words to describe the same things –“If one person assigns the name of an item, other untutored people will fail to access it on 80 to 90 percent of their attempts.” –“Simply stated, the data tell us there is no one good access term for most objects.”

7 2002.09.10 - SLIDE 7IS 202 - Fall 2002 Vocabulary Problem Solutions? Furnas et al. –Make the user memorize precise system meanings –Have the user and system interact to identify the precise referent Minsky and Lenat –Give the system “commonsense” so it can understand what the user’s words can mean

8 2002.09.10 - SLIDE 8IS 202 - Fall 2002 Calling Things Names Impromptu Study by Nathan Good –Asked people to identify 3 common objects –Although the objects were fairly common, people came up with widely different names for them –Found 14 people from four different contexts (Soda Hall, my home, HP Labs, bus stop)

9 2002.09.10 - SLIDE 9IS 202 - Fall 2002 Results
2 - Vanilla Coke Bottle; 1 - Vanilla Coke Can; 1 - Coke; 1 - Empty bottle; 2 - Bottle; 2 - Coke bottle; 1 - Bottle of coke; 2 - Plastic bottle; 1 - Empty vanilla coke bottle; 1 - 20 oz coke bottle
7 - Pen; 1 - A horizontal line; 1 - Blue Ball point Pen; 1 - Ink Pen; 1 - Pencil; 1 - Pentel Pen; 1 - Transparent Pen; 1 - Pentel pen with blue rubber grip

10 2002.09.10 - SLIDE 10IS 202 - Fall 2002 Results 9 - Notebook 1 - A black object 1 - Black Media Star Notebook 1 - Black Notebook 1 - Binder 1 - Spiral notebook
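As a rough way to quantify the spread in these results, the sketch below computes the chance that two different respondents, picked at random, gave the same name for an object. The tallies are transcribed from the two Results slides (14 respondents per object). The `pairwise_agreement` helper and the choice of measure are illustrative assumptions, not the method used by Furnas et al. or in the impromptu study.

```python
def pairwise_agreement(counts):
    """Probability that two distinct respondents, drawn at random, used the same name.

    `counts` holds how many respondents gave each distinct name.
    """
    n = sum(counts)
    same_name_pairs = sum(c * (c - 1) for c in counts)  # ordered pairs that agree
    return same_name_pairs / (n * (n - 1))              # over all ordered pairs


# Tallies transcribed from the Results slides, one entry per distinct name.
bottle = [2, 1, 1, 1, 2, 2, 1, 2, 1, 1]
pen = [7, 1, 1, 1, 1, 1, 1, 1]
notebook = [9, 1, 1, 1, 1, 1]

for label, counts in [("bottle", bottle), ("pen", pen), ("notebook", notebook)]:
    print(f"{label}: {pairwise_agreement(counts):.2f}")
# bottle: 0.04
# pen: 0.23
# notebook: 0.40
```

Even for the notebook, where nine of fourteen people said "Notebook", two people agree well under half the time, which is in the spirit of the Furnas observation that no single access term works for most objects.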

11 2002.09.10 - SLIDE 11IS 202 - Fall 2002 Lecture Contents Review –Categories –The Vocabulary Problem Organization of Information Metadata Kinds of Metadata Dublin Core

12 2002.09.10 - SLIDE 12IS 202 - Fall 2002 Organization of Information Is there a basic human need to put things into some sort of order? –Much of natural language concerns categories of things rather than individual things –Why do we organize things and information? Why do spoons go in THAT drawer in the kitchen and not in a can in the garage? Why do your favorite books go on one shelf and not-so-favorite on another?

13 2002.09.10 - SLIDE 13IS 202 - Fall 2002 Why Organize Information? The main reason –So that you can find things more effectively I.e., effective retrieval is predicated on some sort of organization applied to information resources Historically there have been many institutions and tools devoted to information organization –Libraries –Museums –Archives –Indexes and catalogs, dictionaries, phone books, etc.

14 2002.09.10 - SLIDE 14IS 202 - Fall 2002 Why Organize Information? A question of scale: –Using your own ad hoc set of categories and methods to organize your own collection of books seems to work fine… –What if your collection grew to 10 times the size? How would you organize it? 100 times? 1,000 times? 100,000 times?

15 2002.09.10 - SLIDE 15IS 202 - Fall 2002 What is Information Organization? Identifying the existence of all types of information-bearing entities as they are made available Identifying the works contained within those information-bearing entities or as parts of them Systematically pulling together these information-bearing entities into collections in libraries, archives, museums, Internet communications files and other such depositories From Hagler via Taylor, Chap. 1

16 2002.09.10 - SLIDE 16IS 202 - Fall 2002 What is Information Organization? Producing lists of these information-bearing entities prepared according to standard rules for citation Providing name, title, subject and other useful access to these information-bearing entities Providing the means of locating each information-bearing entity or a copy of it

17 2002.09.10 - SLIDE 17IS 202 - Fall 2002 Organizing Information Libraries Archives Museums and galleries Internet Corporate and office environments

18 2002.09.10 - SLIDE 18IS 202 - Fall 2002 Key Issues in This Course How to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them –Organizing How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs –Retrieving

19 2002.09.10 - SLIDE 19IS 202 - Fall 2002 Key Issues [Figure: diagram with the labels Creation; Utilization; Searching; Active; Semi-Active; Inactive; Retention/Mining; Disposition; Discard; Using; Creating; Authoring; Modifying; Organizing; Indexing; Storing; Retrieval; Distribution; Networking; Accessing; Filtering]

20 2002.09.10 - SLIDE 20IS 202 - Fall 2002 Organizing/Indexing Collecting and integrating information Affects data, information and metadata “Metadata” describes data and information –More on this later Organizing information –Types of organization? Indexing

21 2002.09.10 - SLIDE 21IS 202 - Fall 2002 Accessing/Filtering Using the organization created in the O/I stage to: –Select desired (or relevant) information –Locate that information –Retrieve the information from its storage location (often via a network)

22 2002.09.10 - SLIDE 22IS 202 - Fall 2002 Structure of an IR System [Figure labels: Information Storage and Retrieval System; Interest profiles & Queries; Documents & data; Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language); Formulating query in terms of descriptors; Indexing (Descriptive and Subject); Storage of profiles; Storage of Documents; Store1: Profiles/Search requests; Store2: Document representations; Storage Line; Comparison/Matching; Potentially Relevant Documents]

23 2002.09.10 - SLIDE 23IS 202 - Fall 2002 Lecture Contents Review –Categories –The Vocabulary Problem Organization of Information Metadata Kinds of Metadata Dublin Core

24 2002.09.10 - SLIDE 24IS 202 - Fall 2002 Metadata Metadata is –“Data about Data” (database systems) –Information about Information First used (to the best we can discover) in 1978 (meta-data) Used for databases in (Meta-Data Base) –“a data base which itself contains the structural and semantic data of other data bases” »Thomas R. Cousins & Wayne D. Dominick, “The Management of Data Bases of Data Bases” ASIS Proceedings, 1978.

25 2002.09.10 - SLIDE 25IS 202 - Fall 2002 Metadata Structures and languages for the description of information resources and their elements (components or features) “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

26 2002.09.10 - SLIDE 26IS 202 - Fall 2002 Metadata Often two main types of metadata are distinguished: –Descriptive metadata Describes the information/data object and its properties May use a variety of descriptive formats and rules –Topical metadata Describes the topic or “aboutness” of an information/data object May include a variety of vocabularies for describing subjects, topics, categories, etc.

27 2002.09.10 - SLIDE 27IS 202 - Fall 2002 Lecture Contents Review –Categories –The Vocabulary Problem Organization of Information Metadata Kinds of Metadata Dublin Core

28 2002.09.10 - SLIDE 28IS 202 - Fall 2002 Types of Metadata Element names Element description Element representation Element coding Element semantics Element classification

29 2002.09.10 - SLIDE 29IS 202 - Fall 2002 How Can You Describe an Information-Bearing Object?

30 2002.09.10 - SLIDE 30IS 202 - Fall 2002 Goals of Descriptive Cataloging To enable a person to find a document of which –The author, or –The title, or –The subject is known To show what a library has –By a given author –On a given subject (and related subjects) –In a given kind (or form) of literature. To assist in the choice of a document –As to its edition (bibliographically) –As to its character (literary or topical) Charles A. Cutter, 1876

31 2002.09.10 - SLIDE 31IS 202 - Fall 2002 Rules for Descriptive Cataloging ISBD AACR AACR II

32 2002.09.10 - SLIDE 32IS 202 - Fall 2002 AACRII Sources of Information ISBD areas Choice of Access Points

33 2002.09.10 - SLIDE 33IS 202 - Fall 2002 Sources of Information Each different type of material has a preferred location for deriving information about it –Books and printed material Title page –Cartographic materials (maps, globes, etc) The map itself, or containers, stands, etc. –Sound recordings Disc label, cassette label, etc.

34 2002.09.10 - SLIDE 34IS 202 - Fall 2002 ISBD Areas Title and statement of responsibility; Edition; Material or type of publication specification; Publication, distribution (etc.); Physical description; Series; Notes; Standard numbers

35 2002.09.10 - SLIDE 35IS 202 - Fall 2002 ISBD Punctuation Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers: terms of availability (qualifications).

36 2002.09.10 - SLIDE 36IS 202 - Fall 2002 Bibliographic Record Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992. -- (Library science text series).
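To make the punctuation pattern concrete, here is a minimal sketch that assembles a few ISBD areas into the record shown on this slide. The `isbd` function and its parameter names are hypothetical, and it covers only the areas this example uses (title and responsibility, edition, publication, series), not the full ISBD or AACR2 rules.

```python
def isbd(title, responsibility, edition=None, edition_responsibility=None,
         place=None, publisher=None, date=None, series=None):
    """Join a subset of ISBD areas using the prescribed punctuation (simplified)."""
    # Area 1: title proper / first statement of responsibility
    areas = [f"{title} / {responsibility}"]
    # Area 2: edition, optionally with its own statement of responsibility
    if edition:
        area = edition
        if edition_responsibility:
            area += f" / {edition_responsibility}"
        areas.append(area)
    # Area 4: place : publisher, date
    if place and publisher and date:
        areas.append(f"{place} : {publisher}, {date}")
    # Area 6: series statement in parentheses
    if series:
        areas.append(f"({series})")
    # Each area ends with a full stop; areas are separated by " -- ".
    return " -- ".join(a if a.endswith(".") else a + "." for a in areas)


record = isbd(
    title="Introduction to cataloging and classification",
    responsibility="Bohdan S. Wynar",
    edition="8th ed.",
    edition_responsibility="Arlene G. Taylor",
    place="Englewood, Colo.",
    publisher="Libraries Unlimited",
    date="1992",
    series="Library science text series",
)
print(record)
```

The printed string matches the bibliographic record on the slide, which shows that the punctuation alone is enough to delimit the areas.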

37 2002.09.10 - SLIDE 37IS 202 - Fall 2002 Choice of Access Points Title(s) (Always main title) Main Entry?? Added Entries Series Titles Identifying Numbers

38 2002.09.10 - SLIDE 38IS 202 - Fall 2002 More Metadata Systems The following are a sample of metadata systems for a variety of special types of data/documents/objects

39 2002.09.10 - SLIDE 39IS 202 - Fall 2002 Metadata Systems and Standards Naming and ID systems Bibliographic description –Texts Music Images and objects Numeric data Geospatial data Collections Video and motion pictures

40 2002.09.10 - SLIDE 40IS 202 - Fall 2002 Naming and ID Systems URLs (Uniform Resource Locators) –URIs (Uniform Resource Identifiers) URNs (Uniform Resource Names) URCs (Uniform Resource Characteristics) Kahn/Wilensky Handles SICI (Serial Item and Content Identifiers) ISBN ISSN

41 2002.09.10 - SLIDE 41IS 202 - Fall 2002 Bibliographic Description MARC (Machine Readable Cataloging) DUBLIN CORE –Warwick Framework for Dublin Core Metadata GILS (Government Information Locator Service) RFC 1807 (Format for Bibliographic Records) RDF (Resource Description Framework)

42 2002.09.10 - SLIDE 42IS 202 - Fall 2002 More Bibliographic Descriptors TEI Headers (Text Encoding Initiative) BibTeX PICS (Platform for Internet Content Selection) SOIF (Summary Object Interchange Format)

43 2002.09.10 - SLIDE 43IS 202 - Fall 2002 Music Standard Music Description Language (SMDL)

44 2002.09.10 - SLIDE 44IS 202 - Fall 2002 Numeric Data ICPSR Data Documentation Initiative (SGML DTD development) Standard for Survey Design and Statistical Methodology Metadata (SDSM)

45 2002.09.10 - SLIDE 45IS 202 - Fall 2002 Images and Objects Categories for the Description of Works of Art (Getty Art Institute) Consortium for the Computer Interchange of Museum Information (CIMI) RLG REACH Element Set (for Shared Description of Museum Objects) VRA Core Categories (Visual Resources Association)

46 2002.09.10 - SLIDE 46IS 202 - Fall 2002 Geospatial Data Content Standards for Digital Geospatial Metadata FGDC (Federal Geographic Data Committee) ASTM Section D18.01.05 Draft Specification Content Specification for Digital Geospatial Metadata (American Society for Testing and Materials, ASTM)

47 2002.09.10 - SLIDE 47IS 202 - Fall 2002 Collection Level Descriptors EAD (Encoded Archival Description) Z39.50 Profile for Access to Digital Collections RSLP Collection Description (Research Support Libraries Programme)

48 2002.09.10 - SLIDE 48IS 202 - Fall 2002 Video and Motion Pictures: Multimedia MPEG-7 (more on this later) Video Development Initiative (ViDe) User's Guide: Dublin Core Application Profile for Digital Video Data Dictionary for Audio/Video Metadata (Library of Congress Digital Audio-Visual Preservation Prototyping project)

49 2002.09.10 - SLIDE 49IS 202 - Fall 2002 Mega-Metadata Standards METS - Metadata Encoding and Transmission Standard –Developed by the Digital Library Federation as an implementation strategy for preservation metadata –“XML document format for encoding metadata necessary for both management of digital library objects within a repository and exchange of such objects between repositories (or between repositories and their users)” –Provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata

50 2002.09.10 - SLIDE 50IS 202 - Fall 2002 Lecture Contents Review –Categories –The Vocabulary Problem Organization of Information Metadata Kinds of Metadata Dublin Core

51 2002.09.10 - SLIDE 51IS 202 - Fall 2002 Dublin Core Simple metadata for describing internet resources For “Document-Like Objects” 15 Elements (in base DC)

52 2002.09.10 - SLIDE 52IS 202 - Fall 2002 Dublin Core Elements Title; Creator; Subject; Description; Publisher; Other Contributors; Date; Resource Type; Format; Resource Identifier; Source; Language; Relation; Coverage; Rights Management

53 2002.09.10 - SLIDE 53IS 202 - Fall 2002 Title Label: TITLE The name given to the resource by the CREATOR or PUBLISHER

54 2002.09.10 - SLIDE 54IS 202 - Fall 2002 Author or Creator Label: CREATOR The person(s) or organization(s) primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

55 2002.09.10 - SLIDE 55IS 202 - Fall 2002 Subject and Keywords Label: SUBJECT The topic of the resource, or keywords or phrases that describe the subject or content of the resource. The intent of the specification of this element is to promote the use of controlled vocabularies and keywords. This element might well include scheme-qualified classification data (for example, Library of Congress Classification Numbers or Dewey Decimal numbers) or scheme-qualified controlled vocabularies (such as Medical Subject Headings or Art and Architecture Thesaurus descriptors) as well.

56 2002.09.10 - SLIDE 56IS 202 - Fall 2002 Description Label: DESCRIPTION A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources. Future metadata collections might well include computational content description (spectral analysis of a visual resource, for example) that may not be embeddable in current network systems. In such a case this field might contain a link to such a description rather than the description itself.

57 2002.09.10 - SLIDE 57IS 202 - Fall 2002 Publisher Label: PUBLISHER The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. The intent of specifying this field is to identify the entity that provides access to the resource.

58 2002.09.10 - SLIDE 58IS 202 - Fall 2002 Other Contributors Label: CONTRIBUTORS Person(s) or organization(s) in addition to those specified in the CREATOR element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the CREATOR element (for example, editors, transcribers, illustrators, and convenors).

59 2002.09.10 - SLIDE 59IS 202 - Fall 2002 Date Label: DATE The date the resource was made available in its present form. The recommended best practice is an 8 digit number in the form YYYYMMDD as defined by ANSI X3.30-1985. In this scheme, the date element for the day this is written would be 19961203, or December 3, 1996. Many other schemes are possible, but if used, they should be identified in an unambiguous manner.
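The YYYYMMDD form is easy to produce and check mechanically. The sketch below uses Python's standard `datetime` module to write and read the slide's own example date; the helper names `dc_date` and `parse_dc_date` are made up for illustration.

```python
from datetime import date, datetime


def dc_date(d):
    """Format a date as the 8-digit YYYYMMDD string recommended for DATE."""
    return d.strftime("%Y%m%d")


def parse_dc_date(text):
    """Parse an 8-digit YYYYMMDD string; raises ValueError if it is not a real date."""
    return datetime.strptime(text, "%Y%m%d").date()


print(dc_date(date(1996, 12, 3)))   # 19961203
print(parse_dc_date("19961203"))    # 1996-12-03
```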

60 2002.09.10 - SLIDE 60IS 202 - Fall 2002 Resource Type Label: TYPE The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary. It is expected that RESOURCE TYPE will be chosen from an enumerated list of types. One preliminary set of such types can be found at the following URL (now out of date): http://www.roads.lut.ac.uk/Metadata/DC-ObjectTypes.html

61 2002.09.10 - SLIDE 61IS 202 - Fall 2002 Format Label: FORMAT The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. The intent of specifying this element is to provide information necessary to allow people or machines to make decisions about the usability of the encoded data (what hardware and software might be required to display or execute it, for example). As with RESOURCE TYPE, FORMAT will be assigned from enumerated lists such as registered Internet Media Types (MIME types). In principle, formats can include physical media such as books, serials, or other non-electronic media.

62 2002.09.10 - SLIDE 62IS 202 - Fall 2002 Resource Identifier Label: IDENTIFIER String or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs (when implemented). Other globally-unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names would also be candidates for this element.
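One practical property of an identifier such as the ISBN is that it can be checked for transcription errors. The sketch below applies the standard ISBN-10 check (digits weighted 10 down to 1, total divisible by 11) to the two ISBNs that appear in the sample records later in this deck. The function name `is_valid_isbn10` is hypothetical, and 13-digit ISBNs are out of scope here.

```python
def is_valid_isbn10(isbn):
    """Return True if `isbn` passes the ISBN-10 check-digit test."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch == "X" and position == 9:
            value = 10                      # 'X' stands for 10, last position only
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += (10 - position) * value    # weights run 10, 9, ..., 1
    return total % 11 == 0


print(is_valid_isbn10("0872879674"))  # True  (paper edition in the sample record)
print(is_valid_isbn10("0872878112"))  # True  (cloth edition in the sample record)
print(is_valid_isbn10("0872879675"))  # False (last digit altered)
```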

63 2002.09.10 - SLIDE 63IS 202 - Fall 2002 Source Label: SOURCE The work, either print or electronic, from which this resource is derived, if applicable. For example, an html encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.

64 2002.09.10 - SLIDE 64IS 202 - Fall 2002 Language Label: LANGUAGE Language(s) of the intellectual content of the resource. Where practical, the content of this field should coincide with the Z39.53 three character codes for written languages. See: http://www.sil.org/sgml/nisoLang3-1994.html

65 2002.09.10 - SLIDE 65IS 202 - Fall 2002 Relation Label: RELATION Relationship to other resources. The intent of specifying this element is to provide a means to express relationships among resources that have formal relationships to others, but exist as discrete resources themselves. For example, images in a document, chapters in a book, or items in a collection. A formal specification of RELATION is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

66 2002.09.10 - SLIDE 66IS 202 - Fall 2002 Coverage Label: COVERAGE The spatial locations and temporal duration characteristic of the resource. Formal specification of COVERAGE is currently under development. Users and developers should understand that use of this element should be currently considered experimental.

67 2002.09.10 - SLIDE 67IS 202 - Fall 2002 Rights Management Label: RIGHTS The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intent of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made by users if such a field is empty or not present.

68 2002.09.10 - SLIDE 68IS 202 - Fall 2002 The Same Item in Different Metadata Systems ISBD Dublin Core RFC 1807 TEI Header MARC Record

69 2002.09.10 - SLIDE 69IS 202 - Fall 2002 ISBD Punctuation Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers: terms of availability (qualifications).

70 2002.09.10 - SLIDE 70IS 202 - Fall 2002 Bibliographic Record Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992. -- (Library science text series).

71 2002.09.10 - SLIDE 71IS 202 - Fall 2002 Dublin Core
TITLE: Introduction to cataloging and classification
CREATOR: Taylor, Arlene G.
OTHER CONTRIBUTOR: Wynar, Bohdan S.
DATE: 1992
FORMAT: BOOK
LANGUAGE: ENG
PAGES: 633
PUBLISHER: Libraries Unlimited
SUBJECT: Cataloging.
SUBJECT: subject cataloging.
SUBJECT: Classification -- Books
DESCRIPTION: Textbook on cataloging and classification
RESOURCE TYPE: text.monograph
RESOURCE IDENTIFIER: (ISBN) 0872879674
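Two features of this record are worth seeing in code: elements may repeat (three SUBJECT values), and a label can be checked against the 15 base elements listed earlier in the lecture. The sketch below is illustrative only. The function names are hypothetical, and treating the singular label OTHER CONTRIBUTOR as the element "Other Contributors" is an assumption on my part.

```python
# The 15 base elements as named on the "Dublin Core Elements" slide, upper-cased.
DC_ELEMENTS = {
    "TITLE", "CREATOR", "SUBJECT", "DESCRIPTION", "PUBLISHER",
    "OTHER CONTRIBUTORS", "DATE", "RESOURCE TYPE", "FORMAT",
    "RESOURCE IDENTIFIER", "SOURCE", "LANGUAGE", "RELATION",
    "COVERAGE", "RIGHTS MANAGEMENT",
}
# Assumption: the record's singular label refers to the plural element name.
LABEL_ALIASES = {"OTHER CONTRIBUTOR": "OTHER CONTRIBUTORS"}

RECORD = """\
TITLE: Introduction to cataloging and classification
CREATOR: Taylor, Arlene G.
OTHER CONTRIBUTOR: Wynar, Bohdan S.
DATE: 1992
FORMAT: BOOK
LANGUAGE: ENG
PAGES: 633
PUBLISHER: Libraries Unlimited
SUBJECT: Cataloging.
SUBJECT: subject cataloging.
SUBJECT: Classification -- Books
DESCRIPTION: Textbook on cataloging and classification
RESOURCE TYPE: text.monograph
RESOURCE IDENTIFIER: (ISBN) 0872879674
"""


def parse_record(text):
    """Return (label, value) pairs in order, keeping repeated labels."""
    pairs = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        label = label.strip().upper()
        pairs.append((LABEL_ALIASES.get(label, label), value.strip()))
    return pairs


def values(pairs, label):
    """All values recorded under one label."""
    return [value for name, value in pairs if name == label]


def unknown_labels(pairs):
    """Labels that are not among the 15 base elements, in first-seen order."""
    seen = []
    for name, _ in pairs:
        if name not in DC_ELEMENTS and name not in seen:
            seen.append(name)
    return seen


pairs = parse_record(RECORD)
print(values(pairs, "SUBJECT"))
print(unknown_labels(pairs))  # ['PAGES']
```

Under these assumptions the only label outside the base set is PAGES, which suggests the sample record borrows one field from a richer description format.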

72 2002.09.10 - SLIDE 72IS 202 - Fall 2002 RFC 1807
BIB-VERSION:: CS-TR-v2.1
ID:: UCB//123456
ENTRY:: September 9, 1997
TYPE:: BOOK
TITLE:: Introduction to cataloging and classification
AUTHOR:: Wynar, Bohdan S.
AUTHOR:: Taylor, Arlene G.
DATE:: 1992
PAGES:: 633
COPYRIGHT:: Libraries Unlimited, 1992
SERIES:: Library Science Text Series
END:: UCB//123456
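The `FIELD:: value` layout of this record is simple enough to read with a few lines of code. The sketch below parses the sample record and checks its envelope: BIB-VERSION first, END last, and the END value equal to the ID. These checks reflect my reading of the sample record and are not a complete implementation of RFC 1807; the function names are hypothetical.

```python
RECORD = """\
BIB-VERSION:: CS-TR-v2.1
ID:: UCB//123456
ENTRY:: September 9, 1997
TYPE:: BOOK
TITLE:: Introduction to cataloging and classification
AUTHOR:: Wynar, Bohdan S.
AUTHOR:: Taylor, Arlene G.
DATE:: 1992
PAGES:: 633
COPYRIGHT:: Libraries Unlimited, 1992
SERIES:: Library Science Text Series
END:: UCB//123456
"""


def parse_rfc1807(text):
    """Return (field, value) pairs in order; repeated fields such as AUTHOR are kept."""
    fields = []
    for line in text.splitlines():
        if "::" not in line:
            continue
        name, value = line.split("::", 1)   # split on the first '::' only
        fields.append((name.strip().upper(), value.strip()))
    return fields


def envelope_ok(fields):
    """Check that the record opens with BIB-VERSION and closes with END matching ID."""
    names = [name for name, _ in fields]
    ids = [value for name, value in fields if name == "ID"]
    ends = [value for name, value in fields if name == "END"]
    return (
        bool(names)
        and names[0] == "BIB-VERSION"
        and names[-1] == "END"
        and len(ids) == 1
        and ends == ids
    )


fields = parse_rfc1807(RECORD)
print([value for name, value in fields if name == "AUTHOR"])
print(envelope_ok(fields))  # True
```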

73 2002.09.10 - SLIDE 73IS 202 - Fall 2002 Minimal TEI Header
Introduction to cataloging and classification
Bohdan S. Wynar
8th edition by Arlene G. Taylor
Libraries Unlimited
Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992.
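The markup of this slide did not survive in the transcript; only the text content remains. As a hedged reconstruction, the sketch below places that text in the four parts a minimal TEI file description normally has (`titleStmt`, `editionStmt`, `publicationStmt`, `sourceDesc`). Which element each string originally sat in is my assumption, so read the output as a plausible shape rather than the slide's actual markup.

```python
import xml.etree.ElementTree as ET


def child(parent, tag, text=None):
    """Append a child element, optionally with text, and return it."""
    element = ET.SubElement(parent, tag)
    if text is not None:
        element.text = text
    return element


header = ET.Element("teiHeader")
file_desc = child(header, "fileDesc")

title_stmt = child(file_desc, "titleStmt")
child(title_stmt, "title", "Introduction to cataloging and classification")
child(title_stmt, "author", "Bohdan S. Wynar")

edition_stmt = child(file_desc, "editionStmt")
child(edition_stmt, "edition", "8th edition by Arlene G. Taylor")

publication_stmt = child(file_desc, "publicationStmt")
child(publication_stmt, "publisher", "Libraries Unlimited")

source_desc = child(file_desc, "sourceDesc")
child(
    source_desc,
    "bibl",
    "Introduction to cataloging and classification / Bohdan S. Wynar. -- "
    "8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992.",
)

if hasattr(ET, "indent"):   # pretty-printing helper, available from Python 3.9
    ET.indent(header)
print(ET.tostring(header, encoding="unicode"))
```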

74 2002.09.10 - SLIDE 74IS 202 - Fall 2002 MARC Record (Display)
ID:DCLC9124851-B RTYP:c ST:p FRN: MS:c EL: AD:06-20-91 CC:9110 BLT:am DCF:a CSC: MOD: SNR: ATC: UD:04-11-92 CP:cou L:eng INT: GPC: BIO: FIC:0 CON:b PC:s PD:1992/ REP: CPI:0 FSI:0 ILC:a II:1 MMD: OR: POL: DM: RR: COL: EML: GEN: BSE:
010 9124851
020 0872878112 (cloth)
020 0872879674 (paper)
040 DLC$cDLC$dDLC
050 00 Z693$b.W94 1991
082 00 025.3$220
100 1 Wynar, Bohdan S.
245 10 Introduction to cataloging and classification /$cBohdan S. Wynar.
250 8th ed. /$bArlene G. Taylor.
260 Englewood, Colo. :$bLibraries Unlimited,$c1992.
300 xvii, 633 p. :$bill. ;$c24 cm.
440 0 Library science text series
504 Includes bibliographical references (p. 591-599) and index.
650 0 Cataloging.
650 0 Subject cataloging.
650 0 Classification$xBooks.
630 00 Anglo-American cataloguing rules.
700 10 Taylor, Arlene G.,$d1941-
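In this display, `$` followed by one character marks the start of a subfield, and the text before the first `$` appears to be an implicit subfield `a`. The sketch below splits the data portion of a field into subfields on that basis. The implicit-`a` convention is an assumption drawn from this display, the function name is hypothetical, and the tag and indicators are deliberately left out because the display does not separate them unambiguously from the data.

```python
def parse_subfields(body, first_code="a"):
    """Split the data portion of a displayed MARC field into (code, value) pairs.

    Assumes text before the first '$' belongs to subfield `first_code`.
    """
    parts = body.split("$")
    subfields = []
    if parts[0]:
        subfields.append((first_code, parts[0].strip()))
    for part in parts[1:]:
        if part:
            subfields.append((part[0], part[1:].strip()))  # first char is the code
    return subfields


# Data portion of the 260 (publication) field from the record above.
print(parse_subfields("Englewood, Colo. :$bLibraries Unlimited,$c1992."))
# [('a', 'Englewood, Colo. :'), ('b', 'Libraries Unlimited,'), ('c', '1992.')]

# Data portion of the 082 (Dewey classification) field.
print(parse_subfields("025.3$220"))
# [('a', '025.3'), ('2', '20')]
```

Note that the ISBD punctuation (` :`, `,`, `.`) is stored inside the subfield values, so the MARC field carries both the coded structure and the display form.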

75 2002.09.10 - SLIDE 75IS 202 - Fall 2002 Metadata Resources Check the Links section from the class home page Best site is the “Digital Library: Metadata Resources” page from IFLA at http://www.ifla.org/II/metadata.htm For another good source of information on metadata standards see http://www.chin.gc.ca/English/Standards

76 2002.09.10 - SLIDE 76IS 202 - Fall 2002 Next Time Controlled vocabularies (Introduction) Readings for next time (in Protected) –Paper by Chris Borgman on online catalogs –Paper by Marcia Bates on a design model for access –Paper by Elaine Svenonius on controlled vocabularies

