2009.01.26 - SLIDE 1IS 257 – Fall 2009 Controlled Vocabularies University of California, Berkeley School of Information IS 245: Organization of Information.


1 2009.01.26 - SLIDE 1IS 257 – Fall 2009 Controlled Vocabularies University of California, Berkeley School of Information IS 245: Organization of Information In Collections

2 2009.01.26 - SLIDE 2IS 257 – Fall 2009 Lecture Contents Review –Organization of Information –Metadata –Dublin Core Controlled Vocabularies Discussion

3 2009.01.26 - SLIDE 3IS 257 – Fall 2009 Lecture Contents Review –Organization of Information –Metadata –Dublin Core Controlled Vocabularies Discussion

4 2009.01.26 - SLIDE 4IS 257 – Fall 2009 Organization of Information Is there a basic human need to put things into some sort of order? –Much of natural language concerns categories of things rather than individual things –Why do we organize things and information? Why do spoons go in THAT drawer in the kitchen and not in a can in the garage? Why do your favorite books go on one shelf and not-so-favorite on another?

5 2009.01.26 - SLIDE 5IS 257 – Fall 2009 Why Organize Information? The main reason –So that you can find things more effectively I.e., effective retrieval is predicated on some sort of organization applied to information resources Historically there have been many institutions and tools devoted to information organization –Libraries –Museums –Archives –Indexes and catalogs, dictionaries, phone books, etc.

6 2009.01.26 - SLIDE 6IS 257 – Fall 2009 Why Organize Information? A question of scale –Using your own ad hoc set of categories and methods to organize your own collection of books or CDs seems to work fine… –What if your collection grew to 10 Times the size? How would you organize it? 100 Times? 1000 Times? 100000 times? What if it wasn’t physical objects, but electronic?

7 2009.01.26 - SLIDE 7IS 257 – Fall 2009 Key Issues in This Course How to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them –Organizing How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs –Retrieving

8 2009.01.26 - SLIDE 8IS 257 – Fall 2009 Metadata Metadata is –“Data about Data” (database systems) –Information about Information First used (to the best we can discover) in 1978 (meta-data) Used for databases in (Meta-Data Base) –“a data base which itself contains the structural and semantic data of other data bases” »Thomas R. Cousins & Wayne D. Dominick, “The Management of Data Bases of Data Bases” ASIS Proceedings, 1978.

9 2009.01.26 - SLIDE 9IS 257 – Fall 2009 Metadata Structures and languages for the description of information resources and their elements (components or features) “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

10 2009.01.26 - SLIDE 10IS 257 – Fall 2009 Metadata Often two main types of metadata are distinguished –Descriptive metadata Describes the information/data object and its properties May use a variety of descriptive formats and rules –Topical metadata Describes the topic or “aboutness” of an information/data object May include a variety of vocabularies for describing subjects, topics, categories, etc.

11 2009.01.26 - SLIDE 11IS 257 – Fall 2009 Types of Metadata Element names Element description Element representation Element coding Element semantics Element classification

12 2009.01.26 - SLIDE 12IS 257 – Fall 2009 Dublin Core Simple metadata for describing internet resources For “Document-Like Objects” 15 Elements (in base DC)

13 2009.01.26 - SLIDE 13IS 257 – Fall 2009 Dublin Core (original version)
TITLE: Introduction to cataloging and classification
CREATOR: Taylor, Arlene G.
OTHER CONTRIBUTOR: Wynar, Bohdan S.
DATE: 1992
FORMAT: BOOK
LANGUAGE: ENG
PAGES: 633
PUBLISHER: Libraries Unlimited
SUBJECT: Cataloging.
SUBJECT: subject cataloging.
SUBJECT: Classification -- Books
DESCRIPTION: Textbook on cataloging and classification
RESOURCE TYPE: text.monograph
RESOURCE IDENTIFIER: (ISBN) 0872879674

14 2009.01.26 - SLIDE 14IS 257 – Fall 2009 Dublin Core (XML) Introduction to cataloging and classification Taylor, Arlene G. Wynar, Bohdan S. 1992 BOOK ENG 633 pages Libraries Unlimited Cataloging. subject cataloging. Classification -- Books. Textbook on cataloging and classification text.monograph (ISBN) 0872879674
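The element tags on this slide did not survive extraction, so only the field values remain. As a rough sketch of what such a record might look like in XML, the Python below rebuilds it with the standard library's xml.etree.ElementTree and the Dublin Core element namespace. The wrapper element name `record`, the helper `build_record`, and the assignment of each value to one of the 15 base elements (for example, putting "633 pages" under format) are assumptions for illustration, not the markup the original slide used.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 base Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# (element, value) pairs; values come from the slide, the element
# chosen for each value is an assumption made for this sketch.
FIELDS = [
    ("title", "Introduction to cataloging and classification"),
    ("creator", "Taylor, Arlene G."),
    ("contributor", "Wynar, Bohdan S."),
    ("date", "1992"),
    ("format", "BOOK"),
    ("language", "ENG"),
    ("format", "633 pages"),
    ("publisher", "Libraries Unlimited"),
    ("subject", "Cataloging."),
    ("subject", "subject cataloging."),
    ("subject", "Classification -- Books."),
    ("description", "Textbook on cataloging and classification"),
    ("type", "text.monograph"),
    ("identifier", "(ISBN) 0872879674"),
]


def build_record(fields):
    """Build a hypothetical <record> wrapper holding one dc: element per pair."""
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = build_record(FIELDS)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Repeating an element (three subject entries, two format entries) mirrors the repeated SUBJECT lines in the original-version record on the previous slide.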

15 2009.01.26 - SLIDE 15IS 257 – Fall 2009 Dublin Core Elements Title Creator Subject Description Publisher Contributor Date Type Format Identifier Source Language Relation Coverage Rights

16 2009.01.26 - SLIDE 16IS 257 – Fall 2009 Mega-Metadata Standards METS - Metadata Encoding and Transmission Standard (http://www.loc.gov/standards/mets) –Developed by the Digital Library Federation as an implementation strategy for preservation metadata –"XML document format for encoding metadata necessary for both management of digital library objects within a repository and exchange of such objects between repositories (or between repositories and their users)” –Provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata

17 2009.01.26 - SLIDE 17IS 257 – Fall 2009 Metadata Resources Check the Links section from the class home page Best site is the “Digital Library: Metadata Resources” page from IFLA at http://www.ifla.org/II/metadata.htm For another good source of information on metadata standards see http://www.chin.gc.ca/English/Standards

18 2009.01.26 - SLIDE 18IS 257 – Fall 2009 Lecture Contents Review –Organization of Information –Metadata –Dublin Core Controlled Vocabularies (Introduction) Discussion

19 2009.01.26 - SLIDE 19IS 257 – Fall 2009 Controlled Vocabularies Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information That is, it is an attempt to provide a consistent set of descriptions for use in (or as) metadata

20 2009.01.26 - SLIDE 20IS 257 – Fall 2009 Controlled Vocabularies Dictionaries Names and name authorities Gazetteers (geographic names) Code lists (e.g., LC language codes) Subject heading lists Classification schemes Thesauri Other (new) structures –Time Period Directories –?

21 2009.01.26 - SLIDE 21IS 257 – Fall 2009 Control of Names Cutter’s (1876) objectives of bibliographic description –To enable a person to find a document of which The author, or The title, or The subject is known –To show what a library has By a given author On a given subject (and related subjects) In a given kind (or form) of literature. First serves access Second serves collocation

22 2009.01.26 - SLIDE 22IS 257 – Fall 2009 Problems with Names How many names should be associated with a document? Which of these should be the “main entry?” What form should each of the names take? What references should be made from other possible forms of names that haven’t been used?

23 2009.01.26 - SLIDE 23IS 257 – Fall 2009 The Problem Proliferation of the forms of names –Different names for the same person –Different people with the same names Examples –from Books in Print (semi-controlled but not consistent) –ERIC author index (not controlled)

24 2009.01.26 - SLIDE 24IS 257 – Fall 2009 Goethe …etc…

25 2009.01.26 - SLIDE 25IS 257 – Fall 2009 John Muir

26 2009.01.26 - SLIDE 26IS 257 – Fall 2009 Pauline Cochrane nee Atherton

27 2009.01.26 - SLIDE 27IS 257 – Fall 2009 Pauline Cochrane nee Atherton

28 2009.01.26 - SLIDE 28IS 257 – Fall 2009 Rules for Description AACR II and other sets of descriptive cataloging rules provide guidelines for: –Determining the number of name entries –Choosing a main entry –Deciding on the form of name to be used –Deciding when to make references

29 2009.01.26 - SLIDE 29IS 257 – Fall 2009 Authority Control Authority control is concerned with the creation and maintenance of a set of terms that have been chosen as the standard representatives (also known as established headings) based on some set of rules If you have rules, why do you need to keep track of all of the headings? Can’t you just infer the headings from the rules?

30 2009.01.26 - SLIDE 30IS 257 – Fall 2009 Conditions of Authorship? Single person or single corporate entity Unknown or anonymous authors –Fictitiously ascribed works Shared responsibility Collections or editorially assembled works Works of mixed responsibility (e.g., translations) Related works

31 2009.01.26 - SLIDE 31IS 257 – Fall 2009 Choice of Name AACR II says that the predominant form of the name used in a particular author’s writings should be chosen as the form of name References should be made from the other forms of the name

32 2009.01.26 - SLIDE 32IS 257 – Fall 2009 Form of the Name When names appear in multiple forms, one form needs to be chosen Criteria for choice are: –Fullness (e.g., full names vs. initials only) –Language of the name –Spelling (choose predominant form) Entry element: –John Smith or Smith, John? –Mao Zedong or Zedong, Mao? (Mao Tse Tung?)

33 2009.01.26 - SLIDE 33IS 257 – Fall 2009 Name Authority Files ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91 Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 053 PR6005.R517 100 10 Creasey, John 400 10 Cooke, M. E. 400 10 Cooke, Margaret,$d1908-1973 400 10 Cooper, Henry St. John,$d1908-1973 400 00 Credo,$d1908-1973 400 10 Fecamps, Elise 400 10 Gill, Patrick,$d1908-1973 400 10 Hope, Brian,$d1908-1973 400 10 Hughes, Colin,$d1908-1973 400 10 Marsden, James 400 10 Matheson, Rodney 400 10 Ranger, Ken 400 20 St. John, Henry,$d1908-1973 400 10 Wilde, Jimmy 500 10 $wnnnc$aAshe, Gordon,$d1908-1973 Different names for the same person
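One way to read the record above: the 100 field carries the established heading, each 400 field is a "see" reference from a variant form, and the 500 field is a "see also" link to a related heading. The sketch below turns a few of those field lines into a lookup table from variant to established form. The function names and the simplified parsing (a leading $a is assumed when a field does not start with a subfield code) are assumptions for illustration, not a full MARC parser.

```python
import re

# A few field lines copied from the Creasey record on the slide.
AUTHORITY_RECORD = """\
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Cooke, Margaret,$d1908-1973
400 00 Credo,$d1908-1973
400 10 Marsden, James
500 10 $wnnnc$aAshe, Gordon,$d1908-1973
"""


def name_subfield(data):
    """Return the name ($a) part of a field, dropping dates and control subfields."""
    if not data.startswith("$"):
        data = "$a" + data  # assumption: the display omits a leading $a
    subfields = dict(re.findall(r"\$(\w)([^$]*)", data))
    return subfields["a"].strip().rstrip(",")


def build_index(record_text):
    """Map every variant (400) form to the established (100) heading."""
    heading, variants, see_also = None, [], []
    for line in record_text.splitlines():
        tag, _indicators, data = line.split(" ", 2)
        name = name_subfield(data)
        if tag == "100":
            heading = name
        elif tag == "400":
            variants.append(name)
        elif tag == "500":
            see_also.append(name)
    index = {variant.lower(): heading for variant in variants}
    index[heading.lower()] = heading
    return index, see_also


index, see_also = build_index(AUTHORITY_RECORD)
print(index["cooke, margaret"])
print(see_also)
```

A searcher who types any of the variant names is led to the one heading under which the works are filed, which is the collocation function the authority file exists to serve.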

34 2009.01.26 - SLIDE 34IS 257 – Fall 2009 Name Authority Files ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91 RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91 040 OCoLC$cOCoLC 100 10 Marric, J. J.,$d1908-1973 500 10 $wnnnc$aCreasey, John 663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John 670 OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric) 670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric) 670 Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)

35 2009.01.26 - SLIDE 35IS 257 – Fall 2009 Name Authority Files ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91 Other Versions: earlier 040 DLC$cDLC$dDLC$dOCoLC 100 10 Butler, William Vivian,$d1927- 400 10 Butler, W. V.$q(William Vivian),$d1927- 400 10 Marric, J. J.,$d1927- 670 His The durable desperadoes, 1973. 670 His The young detective's handbook, c1981:$bt.p. (W.V. Butler) 670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric) Different people writing with the same name

36 2009.01.26 - SLIDE 36IS 257 – Fall 2009 The Haunting of Lauran Paine 1. Paine, Lauran. ALSO KNOWN AS: Carrel, Mark. Thompson, Russ. Andrews, A. A. Benton, Will. Bradford, Will. Bradley, Concho. Brennan, Will. Carter, Nevada. Allen, Clay. Almonte, Rosa. Armour, John. Cassady, Claude. Glendenning, Donn. Kelley, Ray. Kilgore, John. Martin, Tom. Slaughter, Jim. Standish, Buck. … Batchelor, Reg. Beck, Harry. Bedford, Kenneth. Bosworth, Frank. Bovee, Ruth. Cassidy, Claude. Custer, Clint. Dana, Amber. Dana, Richard. Davis, Audrey. Drexler, J. F. Duchesne, Antoinette. Fisher, Margot. Fleck, Betty. Frost, Joni. Gordon, Angela. Gorman, Beth. Hayden, Jay. Houston, Will. Howard, Troy. Ingersol, Jared. … Kelly, Ray. Ketchum, Jack. Liggett, Hunter. Lucas, J. K. Lyon, Buck. Morgan, Arlene. Morgan, Valerie. O'Connor, Clint. St. George, Arthur. Sharp, Helen. Thorn, Barbara. Archer, Dennis. Clark, Badger.

37 2009.01.26 - SLIDE 37IS 257 – Fall 2009 Some Interesting Ones…

38 2009.01.26 - SLIDE 38IS 257 – Fall 2009 4Ws – What, Where, When, Who

39 2009.01.26 - SLIDE 39IS 257 – Fall 2009 Metadata as Infrastructure The difference between memorization and understanding lies in knowing the context and relationships of whatever is of interest. When setting out to learn about a new topic, a well-tested practice is to follow the traditional “5Ws and the H”: Who?, What?, When?, Where?, Why?, and How?

40 2009.01.26 - SLIDE 40IS 257 – Fall 2009 Metadata as Infrastructure The reference collections of paper-based libraries provide a structured environment for resources, with encyclopedias and subject catalogs, gazetteers, chronologies, and biographical dictionaries, offering direct support for at least What, Where, When, and Who. The digital environment does not yet provide an effective, and easily exploited, infrastructure comparable to the traditional reference library.

41 2009.01.26 - SLIDE 41IS 257 – Fall 2009 What? Searching texts by topic, e.g. Dewey, LCSH, any subject index, or category scheme applied to documents. Two kinds of mapping in every search: Documents are assigned to topic categories, e.g. Dewey Queries have to map to topic categories, e.g. Dewey’s Relativ Index from ordinary words/phrases to Decimal Classification numbers. Also mapping between topic systems, e.g. US Patent classification and International Patent Classification.

42 2009.01.26 - SLIDE 42IS 257 – Fall 2009 Texts ‘What’ searches involve mapping to controlled vocabularies Thesaurus/ Ontology

43 2009.01.26 - SLIDE 43IS 257 – Fall 2009 Find Plutonium In Arabic Chinese Greek Japanese Korean Russian Tamil Statistical association Digital library resources

44 2009.01.26 - SLIDE 44IS 257 – Fall 2009 EVI example EVI 1 Index term: “pass mtr veh spark ign eng” User Query “Automobile” EVI 2 Index term: “automobiles” OR “internal combustible engines”
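The idea behind an entry vocabulary index (EVI) can be sketched as a table that links the words users actually type to the index terms a particular collection uses. The Python below is a toy version under stated assumptions: the training records are invented for illustration, and plain co-occurrence counts stand in for whatever statistical association measure a real EVI would use.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (free text, index term assigned by an indexer).
TRAINING = [
    ("automobile exports rise", "pass mtr veh spark ign eng"),
    ("used automobile prices", "pass mtr veh spark ign eng"),
    ("automobile engine design", "automobiles"),
    ("wheat exports fall", "wheat"),
]


def build_evi(records):
    """Count how often each word co-occurs with each assigned index term."""
    assoc = defaultdict(Counter)
    for text, term in records:
        for word in set(text.lower().split()):
            assoc[word][term] += 1
    return assoc


def suggest(assoc, query, limit=3):
    """Rank index terms by their summed co-occurrence with the query words."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(assoc.get(word, {}))
    return [term for term, _count in scores.most_common(limit)]


evi = build_evi(TRAINING)
suggestions = suggest(evi, "Automobile")
print(suggestions)
```

With this toy data the query "Automobile" leads to the coded index term first and the plain-language heading second, which is the kind of mapping the slide's two EVIs illustrate for two different collections.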

45 2009.01.26 - SLIDE 45IS 257 – Fall 2009 Texts Numeric datasets It is also difficult to move between different media forms Thesaurus/ Ontology EVI

46 2009.01.26 - SLIDE 46IS 257 – Fall 2009 Searching across data types Different media can be linked indirectly via metadata, but often (e.g. for socio-economic numeric data series) you also need to specify WHERE to get correct results

47 2009.01.26 - SLIDE 47IS 257 – Fall 2009 Texts Numeric datasets But texts associated with numeric data can be mapped as well… Thesaurus/ Ontology captions EVI

48 2009.01.26 - SLIDE 48IS 257 – Fall 2009 Texts Numeric datasets But there are also geographic dependencies… Thesaurus/ Ontology captions Maps/ Geo Data EVI

49 2009.01.26 - SLIDE 49IS 257 – Fall 2009 WHERE: Place names are problematic… Variant forms: St. Petersburg, Санкт Петербург, Saint-Pétersbourg,... Multiple names: Cluj, in Romania / Roumania / Rumania, is also called Klausenburg and Kolozsvar. Name changes: Bombay → Mumbai. Homographs: Vienna, VA, and Vienna, Austria; –50 Springfields. Anachronisms: No Germany before 1870 Vague, e.g. Midwest, Silicon Valley Unstable boundaries: 19th century Poland; Balkans; USSR Use a gazetteer!
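A gazetteer addresses several of these problems by mapping every known name form to a place record with coordinates. The sketch below is a minimal in-memory version: the `Place` class, the `add_names` and `lookup` helpers, and the preferred name forms are assumptions for illustration, and the coordinates are approximate values included only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str      # preferred form chosen for this sketch
    country: str
    lat: float     # approximate, for illustration only
    lon: float


ST_PETERSBURG = Place("Saint Petersburg", "Russia", 59.94, 30.31)
CLUJ = Place("Cluj", "Romania", 46.77, 23.59)
MUMBAI = Place("Mumbai", "India", 19.08, 72.88)
VIENNA_AT = Place("Vienna", "Austria", 48.21, 16.37)
VIENNA_VA = Place("Vienna", "United States (Virginia)", 38.90, -77.27)

GAZETTEER = {}


def add_names(place, *names):
    """Register every name form of a place; one name may point to many places."""
    for name in names:
        GAZETTEER.setdefault(name.casefold(), []).append(place)


add_names(ST_PETERSBURG, "Saint Petersburg", "St. Petersburg",
          "Санкт Петербург", "Saint-Pétersbourg")
add_names(CLUJ, "Cluj", "Klausenburg", "Kolozsvar")
add_names(MUMBAI, "Mumbai", "Bombay")
add_names(VIENNA_AT, "Vienna")
add_names(VIENNA_VA, "Vienna")


def lookup(name):
    """Return all candidate places for a name (an empty list if unknown)."""
    return GAZETTEER.get(name.casefold(), [])


print(lookup("Bombay"))
print([place.country for place in lookup("Vienna")])
```

Returning a list rather than a single place keeps homographs such as the two Viennas visible, so the interface or the searcher can disambiguate them.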

50 2009.01.26 - SLIDE 50IS 257 – Fall 2009 WHERE – Geo-Temporal Search Place names found in documents. Gazetteer provided lat. & long. Places displayed on map. Timebar

51 2009.01.26 - SLIDE 51IS 257 – Fall 2009 Zoom on map. Click on place for a list of records. Click on record to display text.

52 2009.01.26 - SLIDE 52IS 257 – Fall 2009 Texts Numeric datasets So geographic search becomes part of the infrastructure Thesaurus/ Ontology Gazetteers captions Maps/ Geo Data EVI

53 2009.01.26 - SLIDE 53IS 257 – Fall 2009 WHEN: Search by time is also weakly supported… Calendars are the standard for time But people use the names of events to refer to time periods Named time periods resemble place names in being: –Unstable: European War, Great War, First World War –Multiple: Second World War, Great Patriotic War –Ambiguous: “Civil war” in different centuries in England, USA, Spain, etc. Places have temporal aspects & periods have geographical aspects: When the Stone Age was, varies by region

54 2009.01.26 - SLIDE 54IS 257 – Fall 2009 Linking vocabularies WHAT, WHERE, WHEN Library subject headings: Topic – Geographic subdivision – Chronological subdivision Place name gazetteer: Place name – Type – Spatial markers (Lat & long) – When Time Period Directory: Period name – Type – Time markers (Calendar) – Where Connecting Metadata Systems
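The time period directory entry described here (period name, type, calendar markers, and a place) can be sketched as a small record type. In the Python below, the class and function names are assumptions for illustration, the year ranges are the commonly cited ones, and the `where` values are deliberately simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    name: str    # period name
    kind: str    # type
    start: int   # time markers (calendar years)
    end: int
    where: str   # place the period applies to (simplified)


DIRECTORY = [
    Period("First World War", "war", 1914, 1918, "Europe"),
    Period("Great War", "war", 1914, 1918, "Europe"),
    Period("Civil War", "war", 1642, 1651, "England"),
    Period("Civil War", "war", 1861, 1865, "USA"),
    Period("Civil War", "war", 1936, 1939, "Spain"),
]


def find_periods(name, where=None):
    """Look up a period name, optionally narrowed by place to resolve ambiguity."""
    return [p for p in DIRECTORY
            if p.name.casefold() == name.casefold()
            and (where is None or p.where == where)]


def periods_covering(year, where=None):
    """Go the other way: from a calendar year (and place) to named periods."""
    return [p for p in DIRECTORY
            if p.start <= year <= p.end
            and (where is None or p.where == where)]


print(len(find_periods("civil war")))
print(find_periods("Civil War", where="Spain"))
print([p.name for p in periods_covering(1916)])
```

The place field is what lets an ambiguous name such as "Civil War" resolve to a single calendar range, and the shared place values are also the hook for joining this directory to a gazetteer.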

55 2009.01.26 - SLIDE 55IS 257 – Fall 2009 Time period directories link via the place (or time) Texts Numeric datasets Thesaurus/ Ontology Gazetteers captions Maps/ Geo Data EVI Time Period Directory Time lines, Chronologies

56 2009.01.26 - SLIDE 56IS 257 – Fall 2009 WHEN: Time Period Directory Timeline Link to Catalog Link to Wikipedia

57 2009.01.26 - SLIDE 57IS 257 – Fall 2009 WHO: Biographical Dictionary Complex relationships Life events metadata WHAT: Actions prisoner WHERE: Places Holstein WHEN: Times 1261-1262 WHO: People Margaret Sambiria Need external links
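The life-event example on this slide can be sketched as a record with one field per facet, each of which points to the vocabulary that would supply the external link. The `LifeEvent` class and the `facet_links` method below are hypothetical names used only for illustration; the values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifeEvent:
    who: str
    what: str
    where: str
    when: tuple  # (start_year, end_year)

    def facet_links(self):
        """Pair each facet value with the kind of vocabulary that could resolve it."""
        start, end = self.when
        return {
            "WHO": ("biographical dictionary", self.who),
            "WHAT": ("thesaurus", self.what),
            "WHERE": ("gazetteer", self.where),
            "WHEN": ("time period directory", f"{start}-{end}"),
        }


event = LifeEvent(who="Margaret Sambiria", what="prisoner",
                  where="Holstein", when=(1261, 1262))
links = event.facet_links()
for facet, (vocabulary, value) in links.items():
    print(facet, vocabulary, value)
```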

58 2009.01.26 - SLIDE 58IS 257 – Fall 2009 Any document, object, or performance Any resource: Audio, Images, Texts, Numeric data, Objects, Virtual reality, Webpages Any catalog: Archives, Libraries, Museums, TV, Publishers Connect it with its context – and other resources.
Facet – Vocabulary – Displays
WHAT – Thesaurus, e.g. LCSH – Cross-references
WHERE – Gazetteer – Map
WHEN – Period directory – Timeline
WHO – Biograph. dict., e.g. Who’s Who – Personal relations

59 2009.01.26 - SLIDE 59IS 257 – Fall 2009 Demo of search interface

60 2009.01.26 - SLIDE 60IS 257 – Fall 2009 Entry Vocabulary Index suggests correct LCSH with different spelling

61 2009.01.26 - SLIDE 61IS 257 – Fall 2009 Related places

62 2009.01.26 - SLIDE 62IS 257 – Fall 2009 Potentially related people

63 2009.01.26 - SLIDE 63IS 257 – Fall 2009 Potentially related periods

64 2009.01.26 - SLIDE 64IS 257 – Fall 2009 Mostly in India 16th - 18th century

65 2009.01.26 - SLIDE 65IS 257 – Fall 2009 Find out more about this area.

66 2009.01.26 - SLIDE 66IS 257 – Fall 2009 Different Browsing Options!

67 2009.01.26 - SLIDE 67IS 257 – Fall 2009 Zooming in to South Asia Restricting time frame Select

68 2009.01.26 - SLIDE 68IS 257 – Fall 2009 More information about the country of India…

69 2009.01.26 - SLIDE 69IS 257 – Fall 2009 More information about the country of India… Wikipedia CIA Factbook BBC Ethnologue Berkeley Natural History Museums

70 2009.01.26 - SLIDE 70IS 257 – Fall 2009 Historical events – linked to Library catalog & Wikipedia : none avail. for this time period

71 2009.01.26 - SLIDE 71IS 257 – Fall 2009 ECAI Cultural Atlases: presenting history in its geographical & chronological contexts

72 2009.01.26 - SLIDE 72IS 257 – Fall 2009 Mongol Empire Video

73 2009.01.26 - SLIDE 73IS 257 – Fall 2009 Structure of an IR System Search Line Interest profiles & Queries Documents & data Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language) Storage Line Potentially Relevant Documents Comparison/ Matching Store1: Profiles/ Search requests Store2: Document representations Indexing (Descriptive and Subject) Formulating query in terms of descriptors Storage of profiles Storage of Documents Information Storage and Retrieval System Adapted from Soergel, p. 19

74 2009.01.26 - SLIDE 74IS 257 – Fall 2009 Uses of Controlled Vocabularies Library subject headings, classification, and authority files Commercial journal indexing services and databases Yahoo, and other web classification schemes Online and manual systems within organizations –SunSolve –MacArthur

75 2009.01.26 - SLIDE 75IS 257 – Fall 2009 Types of Indexing Languages Uncontrolled keyword indexing Indexing languages –Controlled, but not structured Thesauri –Controlled and structured Classification systems –Controlled, structured, and coded Faceted thesauri and classification systems Much more on these topics later…

76 2009.01.26 - SLIDE 76IS 257 – Fall 2009 Lecture Contents Review –Lexical Relations –WordNet Organization of Information Metadata Dublin Core Controlled Vocabularies Discussion

77 2009.01.26 - SLIDE 77IS 257 – Fall 2009 Discussion

78 2009.01.26 - SLIDE 78IS 257 – Fall 2009 Assignment (Due Thursday) Describe the Svenonius book using the 15 Dublin Core elements

79 2009.01.26 - SLIDE 79IS 257 – Fall 2009 Next Time More on bibliographic description and rules (particularly AACR II) –History –Goals

