The Significance of Vocabulary Michael Buckland School of Information Management and Systems University of California, Berkeley.


1 The Significance of Vocabulary Michael Buckland School of Information Management and Systems University of California, Berkeley

2 The Significance of Vocabulary. An economic claim: Vocabulary problems reduce the benefits of, and the return on investment in, information services. Vocabulary is used for indexicality; therefore issues of identity are central to LIS. Vocabulary is central to digital libraries. Vocabulary is central to explaining the history of conceptions of LIS!

3 A correctly formed Library of Congress Subject heading, but who would think of such search terms? God --- Knowableness --- History of doctrines --- Early church, ca. 30-600 --- Congresses.

4 Economic Rationale: Massive investment in repositories. Large investment in categorization schemes: classifications, thesauri, concept codes, headings, … Categorization schemes are usually specialized and stylized. They are increasingly unfamiliar to searchers, hence ineffective, inefficient use.

5 Remedy Support for searching unfamiliar metadata vocabularies: Interface to translate searcher’s vocabulary into system’s vocabulary.

6 Examples Automobile import, export data (Census Bureau) Automobiles? No data. Cars? “Railway or tramway stock” (Passenger motor vehicles, spark ignition engine.)

7 “Automobiles”, also known as... TL 205 in Library of Congress Classification; 180/280 in U.S. Patent Classification; 3711 in Standard Industrial Classification.

8 Example: Coastal pollution. F SU COASTAL POLLUTION: 0. F TW COASTAL POLLUTION, then SUMMARIZE SUBJECTS. LCSH: Marine pollution; Coastal zone management; Water --- Pollution; Petroleum industry and trade; Beach erosion; Coasts; Barrier islands. MeSH: Seawater; Water pollution; Bacteria; Water microbiology; Air pollution; Environmental monitoring; Bathing beaches.

9 International Harmonized Commodity Classification System: “Computer” HS 84: “Nuclear reactors, boilers, machines and mechanical appliances” HS 8471: “Automatic data processing machines and units thereof, magnetic or optical readers, machines for transcribing data” HS 847120: “Digital auto data proc mach contng in the same housing a CPU and input & output device”
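The three HS entries above nest by code prefix: a 2-digit chapter, a 4-digit heading, and a 6-digit subheading. A minimal sketch of walking that hierarchy follows. The `HS_LABELS` table holds only the three labels quoted on the slide, and the function name `ancestry` is a hypothetical helper, not part of any official HS tooling.

```python
# Labels copied from the slide; a real table would hold the full HS schedule.
HS_LABELS = {
    "84": "Nuclear reactors, boilers, machines and mechanical appliances",
    "8471": "Automatic data processing machines and units thereof, "
            "magnetic or optical readers, machines for transcribing data",
    "847120": "Digital auto data proc mach contng in the same housing "
              "a CPU and input & output device",
}


def ancestry(code, labels=HS_LABELS):
    """Return (code, label) pairs from chapter down to the given code.

    Levels are taken as 2-, 4- and 6-digit prefixes of the code.
    Prefixes missing from the table are skipped.
    """
    path = []
    for length in (2, 4, 6):
        prefix = code[:length]
        if length <= len(code) and prefix in labels:
            path.append((prefix, labels[prefix]))
    return path


if __name__ == "__main__":
    for hs_code, label in ancestry("847120"):
        print(hs_code, "-", label)
```

The sketch makes the searcher's difficulty visible: nothing on the path from chapter 84 down to subheading 847120 contains the word "computer".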

10 INSPEC Thesaurus subdomain-based indexes: “Water” subdomain: Fission reactor safety; Fission reactor fuel; Polymers; Organic insulating materials; Water supply; Cable insulation; Insulation testing; and Insulating oils. “Biology” subdomain: Water; Biomechanics; Physiological models; Neurophysiology; Cellular effects of radiation. “Information Studies” subdomain: Agriculture; Natural resources; Forecasting theory; Operations research; Erosion.

11 Example: Vietnam War. U.C. MELVYL Online Catalog: FIND XSU VIETNAM WAR. Search Results: 0 records. FIND XSU VIETNAMESE CONFLICT. Search Results: 4,190 records.

12 Dictionaries don’t always help Emanuel Goldberg: Aerial photography using a “Drachen” Actual meaning: Aerodynamic tethered balloon. Standard contemporary English was: Aerostat. German: Drachen (= Kite in dictionary)

13 “Entry vocabulary” search interfaces: Software and algorithms map natural language vocabulary to specialized metadata terms. Allows users to enter ordinary language queries while taking advantage of existing subject headings and categorization. Uses co-occurrence statistics to link users’ ordinary language terms to system vocabularies: statistical association between lexical items in titles and abstracts and the system’s metadata vocabulary. Suggests the most likely system vocabulary.
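The co-occurrence idea on this slide can be illustrated with a small sketch. The slide does not say which association statistic is used, so this example assumes a simple stand-in: the share of records containing a word that also carry a given heading, summed over the query words. The training records are toy data invented for illustration (the headings are borrowed from the coastal pollution slide), and `build_associations` and `suggest` are hypothetical names.

```python
from collections import Counter, defaultdict

# Toy training records (invented): title text plus the headings assigned to it.
RECORDS = [
    ("oil spills and coastal pollution", ["Marine pollution"]),
    ("coastal pollution from sewage outfalls",
     ["Marine pollution", "Water --- Pollution"]),
    ("managing the coastal zone", ["Coastal zone management"]),
    ("beach erosion on barrier islands", ["Beach erosion", "Barrier islands"]),
]


def build_associations(records):
    """Count how often each title word co-occurs with each heading."""
    word_counts = Counter()            # number of records containing the word
    pair_counts = defaultdict(Counter)  # word -> heading -> co-occurrence count
    for text, headings in records:
        for word in set(text.lower().split()):
            word_counts[word] += 1
            for heading in headings:
                pair_counts[word][heading] += 1
    return word_counts, pair_counts


def suggest(query, word_counts, pair_counts, top_n=3):
    """Rank headings by summed share of co-occurrence with the query words.

    This score is an assumption made for the sketch, not the statistic
    used by the Entry Vocabulary Modules described in the slides.
    """
    scores = Counter()
    for word in set(query.lower().split()):
        for heading, n in pair_counts.get(word, {}).items():
            scores[heading] += n / word_counts[word]
    return [heading for heading, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    wc, pc = build_associations(RECORDS)
    print(suggest("coastal pollution", wc, pc))
```

With these toy records, the ordinary-language query "coastal pollution" ranks "Marine pollution" first, even though the query phrase itself is not a heading. A production system would need far more training data and a more robust statistic.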

14 Thesaurus navigation: Facilitates browsing where structure is present: broader, narrower, related terms. Guides searcher to other parts of the structure. Retrieval set analysis. Navigation within micro-domain.
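Browsing by broader, narrower, and related terms can be sketched with a small in-memory structure. The links below are invented for illustration and are not taken from LCSH, MeSH, or INSPEC. `Term`, `build_thesaurus`, and `neighbours` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


# Hypothetical links: ("BT", a, b) means b is a broader term of a;
# ("RT", a, b) means a and b are related terms.
LINKS = [
    ("BT", "Marine pollution", "Water pollution"),
    ("BT", "Water pollution", "Pollution"),
    ("RT", "Marine pollution", "Coastal zone management"),
]


def build_thesaurus(links):
    """Build a label -> Term map, recording each link in both directions."""
    terms = {}

    def get(label):
        return terms.setdefault(label, Term(label))

    for kind, a, b in links:
        if kind == "BT":
            get(a).broader.append(b)
            get(b).narrower.append(a)
        elif kind == "RT":
            get(a).related.append(b)
            get(b).related.append(a)
        else:
            raise ValueError(f"unknown link type: {kind}")
    return terms


def neighbours(terms, label):
    """Return the terms a searcher could move to from the given term."""
    term = terms[label]
    return {
        "broader": term.broader,
        "narrower": term.narrower,
        "related": term.related,
    }


THESAURUS = build_thesaurus(LINKS)

if __name__ == "__main__":
    print(neighbours(THESAURUS, "Water pollution"))
```

Storing every link in both directions lets a searcher who lands on any term move up, down, or sideways without needing to know the structure in advance.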

15 Web access: WWW forms-based application supported by Perl. Supports searches on remote repositories. Four subdomain dictionaries in three databases: BIOSIS (Biological Abstracts): subdomain “water”; INSPEC: subdomains “information science”, “water”; U.S. Patent Office classification.

16 Statement of work: Varied prototype Entry Vocabulary Modules (EVMs). Unintrusive development of EVMs by agents. Sensitivity to subdomains. Natural language processing to augment statistical term frequency. Recommendations for metadata “codebooks” for numeric databases. www.sims.berkeley.edu/metadata/

