Thesaurus Design and Development


Thesaurus Design and Development University of California, Berkeley School of Information Management and Systems SIMS 202: Information Organization and Retrieval 8/28/97 Information Organization and Retrieval

Review
Origins and Uses of Controlled Vocabularies for Information Retrieval
Types of Indexing Languages, Thesauri and Classification Systems

Indexing Languages
An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.
An indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.

Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower, and other related terms.
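As a minimal sketch of the structure just described, a thesaurus entry can be modeled as a preferred term plus its links to other terms. The class name `Entry`, its field names, the helper `link_broader`, and the sample terms are illustrative assumptions, not taken from any particular thesaurus or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One preferred term (descriptor) and its links to other terms."""
    term: str
    used_for: set = field(default_factory=set)   # synonymous or equivalent entry terms
    broader: set = field(default_factory=set)    # broader terms (BT)
    narrower: set = field(default_factory=set)   # narrower terms (NT)
    related: set = field(default_factory=set)    # related terms (RT)


def link_broader(thesaurus, narrow, broad):
    """Record a BT/NT pair in both directions so the links stay reciprocal."""
    thesaurus.setdefault(narrow, Entry(narrow)).broader.add(broad)
    thesaurus.setdefault(broad, Entry(broad)).narrower.add(narrow)


# Hypothetical sample data
thesaurus = {}
link_broader(thesaurus, "cats", "mammals")
link_broader(thesaurus, "dogs", "mammals")
thesaurus["cats"].used_for.add("felines")
```

Storing each hierarchical link on both entries is one possible design choice; it keeps a lookup from either side cheap, at the cost of having to update two entries whenever a link changes.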

Thesauri (cont.)
Examples:
The ERIC Thesaurus of Descriptors
The Art and Architecture Thesaurus
The Medical Subject Headings (MeSH) of the National Library of Medicine

Classification Systems
A classification system is an indexing language often based on a broad ordering of topical areas.
Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics.
Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms.
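To illustrate how a coded notation can carry a topic's place in the hierarchy, here is a small sketch. The dotted notation, the class labels, and the functions `broader` and `narrower` are invented for illustration; real schemes such as DDC use their own notational rules.

```python
# Hypothetical scheme: each dot-separated segment adds one level of specificity.
scheme = {
    "A": "Science",
    "A.1": "Physics",
    "A.1.1": "Optics",
    "A.2": "Chemistry",
}


def broader(notation):
    """Return the notation one level up, or None at the top level."""
    head, sep, _ = notation.rpartition(".")
    return head if sep else None


def narrower(notation):
    """Return the notations exactly one level below the given one."""
    return sorted(n for n in scheme if broader(n) == notation)
```

With this kind of notation, the broader class is recoverable from the code itself, without consulting a separate table of relationships.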

Automatic Indexing and Classification
Automatic indexing is typically the simple deriving of keywords from a document and providing access to all of those words.
More complex automatic indexing systems attempt to select controlled vocabulary terms based on terms in the document.
Automatic classification attempts to automatically group similar documents using either:
  A fully automatic clustering method.
  An established classification scheme and a set of documents already indexed by that scheme.
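The simplest form of automatic indexing mentioned above (derive every word and provide access through it) might be sketched as follows. The function name `build_keyword_index`, the tokenization rule, and the sample documents are assumptions made for this example only.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map every word found in the documents to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


# Hypothetical sample documents
docs = {
    "d1": "Thesaurus construction and maintenance",
    "d2": "Automatic indexing of documents",
    "d3": "Indexing languages and thesaurus design",
}
index = build_keyword_index(docs)
```

Note that this sketch applies no vocabulary control at all: a search for "thesauri" would not find documents that only say "thesaurus".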

Today
Thesaurus design
Steps in thesaurus development
Indexing

Why develop a thesaurus?
To provide a conceptual structure or “space” for a body of information.
To make it possible to adequately describe the topical contents of informational objects at an appropriate level of generality or specificity.
To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material).

Why develop a thesaurus?
To provide vocabulary (or terminological) control.
When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with.
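Vocabulary control of this kind can be sketched as a lookup that leads from any entry term to its preferred term. The function names, the normalization rule, and the sample vocabulary below are hypothetical.

```python
def normalize(term):
    """Fold case and collapse whitespace so lookups ignore trivial variation."""
    return " ".join(term.lower().split())


def build_lead_in(preferred_to_variants):
    """Build a lookup from every variant (and the preferred term itself) to the preferred term."""
    lookup = {}
    for preferred, variants in preferred_to_variants.items():
        lookup[normalize(preferred)] = preferred
        for variant in variants:
            lookup[normalize(variant)] = preferred
    return lookup


# Hypothetical vocabulary
lookup = build_lead_in({
    "automobiles": ["cars", "motor cars", "autos"],
    "aircraft": ["airplanes", "aeroplanes"],
})


def preferred_term(term):
    """Return the preferred term, or None if the thesaurus has no entry for it."""
    return lookup.get(normalize(term))
```

Whichever term the indexer or searcher starts with, `preferred_term` returns the same descriptor, which is the point of the control.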

Preliminary considerations
What is used now?
  Continue using an existing thesaurus?
  Ad hoc modification of existing thesaurus?
  Develop a new well-structured thesaurus?
What is the scope and complexity of the subject field?
What kind of retrieval objects or data will be dealt with?
How exhaustive and specific is the desired description of objects?

Preliminary Considerations
The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus.
It is better to plan for a larger and more comprehensive system than for a smaller system that will rapidly become inadequate as the database grows.
Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists.

Development of a Thesaurus
1. Term Selection
2. Merging and Development of Concept Classes
3. Definition of Broad Subject Fields and Subfields
4. Development of Classificatory Structure
5. Review, Testing, Application, Revision

1. Term Selection
Select sources for the collection of terms.
  Prearranged sources
  Open-ended sources
Assign codes to each source.
Selection of terms
  For part of the prearranged sources and for all open-ended sources
Enter terms into database with all information.

1.1 Kinds of Sources
Prearranged sources
  Existing descriptor lists, classification schemes, and thesauri. This includes universal schemes like DDC or LCSH.
  Nomenclatures of single disciplines
  Treatises on the terminology of a field
  Encyclopedias, lexica, dictionaries, and glossaries
  Tables of contents of textbooks and handbooks
  Indexes of journals or abstracting journals
  Indexes of other publications in the field

1.1 Kinds of Sources
Open-ended sources
  Lists of search requests or interest profiles
  Description of projects/activities to be served by the information retrieval system
  Discussion with specialists in the field
  Sample of documents in the field
    Ask users why and how these documents relate to the field.
    Have documents indexed by experts in the field.
  Lists of titles of documents in the field
  Abstracts and reviews of documents
  Your own knowledge

Selection of sources
Prearranged sources require less effort in gathering the material, and may already indicate some relationships among terms and concepts.
Open-ended sources can reflect current terminology and may provide more complete coverage.
Choose a set of sources that are current, as complete as possible, and considered authoritative.

Selection of Sources
Each selected source is assigned an ID for tracking its use in the development of the thesaurus.
  Useful when making decisions about which terms to prefer
  Useful for backtracking when questions arise (where did this come from?)

Selection of Terms
Terms can be transferred directly from prearranged sources to the recording medium (cards or database).
You have to decide which terms and references to include, or whether to take the whole source.

Selection of Terms
In open-ended sources you read through the source and pick out terms (i.e., words and phrases) that might be useful in retrieval or as references to other terms.
Alternatively, use keyword and phrase extraction software to create lists of terms and select from those.
Transfer selected terms to the recording medium (cards or database).
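A very small stand-in for keyword and phrase extraction software might look like the sketch below: it counts single words and two-word phrases, skipping any that contain a stopword. The stopword list, the function `candidate_terms`, and the sample titles are assumptions; real extraction tools are considerably more sophisticated.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # illustrative only


def candidate_terms(texts, max_words=2):
    """Count words and short phrases that contain no stopwords."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if not any(w in STOPWORDS for w in gram):
                    counts[" ".join(gram)] += 1
    return counts


# Hypothetical document titles
titles = [
    "Information retrieval in digital libraries",
    "Evaluation of information retrieval systems",
    "Digital libraries and metadata",
]
counts = candidate_terms(titles)
```

The resulting counts are only a candidate list; a person still selects which of the candidates are worth transferring to the term database.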

2. Merging and Development of Concept Classes
Sort the term database into alphabetical order.
First round: Merge information for identical terms, possibly pulling information from additional sources.
Second round: Merge synonyms or terms in the same concept class.
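The two merging rounds can be sketched on a toy term database in which every record carries the code of the source it came from. The records, the source codes, and the synonym decisions below are hypothetical; in practice the synonym decisions are intellectual judgments made by the thesaurus builder.

```python
from collections import defaultdict

# Hypothetical term records: (term as collected, source code)
records = [
    ("Physicians", "S1"),
    ("doctors", "S2"),
    ("physicians", "S3"),
    ("Nurses", "S1"),
    ("doctors", "S1"),
]

# First round: merge records for identical terms, keeping every source code.
merged = defaultdict(set)
for term, source in records:
    merged[term.lower()].add(source)

# Alphabetical listing of the merged term database.
term_db = sorted(merged.items())

# Second round: group terms into concept classes using the builder's
# synonym decisions (variant -> term chosen to stand for the class).
synonym_of = {"doctors": "physicians"}
concept_classes = defaultdict(set)
for term in merged:
    concept_classes[synonym_of.get(term, term)].add(term)
```

Keeping the source codes attached to each merged term makes it possible to trace later where a term came from.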

3. Definition of Broad Subject Fields and Subfields
Work out the detailed structure.
Select preferred terms.
Merge information for terms in the same concept class.
Repeat these steps for each subfield within a broad field and for each broad field, until all terms have been consolidated and preferred terms selected.
Define broad subject fields and sort terms into these broad fields.
Define subfields within each broad field and sort terms into these subfields.

4. Development of Classificatory Structure
Produce a preliminary version of the classified index and update the working database.
Improve the classificatory structure.
Reality check: produce a version of the classified index and distribute it to users/experts.

5. Final Stages
Review
Testing
Application
Revision

Review
Discuss classified index with users/experts.
Select descriptors and checklist descriptors.
Assign notational symbols.
Produce main thesaurus and indexes.

Review (cont.)
Check cross references and insert where needed.
Produce test version.
Test by indexing.
Modify as needed.
Produce production version.

Testing a Thesaurus
Assign descriptors to a sample set of NEW documents (use enough to get an idea of any gaps in the thesaurus).
Test retrieval using sample questions and see how effectively the thesaurus maps them to the appropriate descriptors.
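Both tests can be approximated with simple bookkeeping, as in the following sketch. The lead-in vocabulary, the sample concepts, the sample questions, and the success-rate measure are all invented for illustration; they are not a standard evaluation procedure.

```python
# Hypothetical lead-in vocabulary: entry term -> descriptor
lead_in = {
    "physicians": "physicians",
    "doctors": "physicians",
    "nurses": "nurses",
}

# Concepts an indexer identified in a sample of new documents (hypothetical).
sample_concepts = {
    "doc1": ["doctors", "nurses"],
    "doc2": ["paramedics", "physicians"],
}

# Gaps: concepts for which the thesaurus offers no descriptor.
gaps = sorted({
    concept
    for concepts in sample_concepts.values()
    for concept in concepts
    if concept not in lead_in
})

# Retrieval test: does each sample question term lead to the expected descriptor?
sample_questions = [("doctors", "physicians"), ("medics", "physicians")]
hits = [lead_in.get(asked) == expected for asked, expected in sample_questions]
success_rate = sum(hits) / len(hits)
```

Here the gap list points to a missing concept ("paramedics") and the failed question points to a missing entry term ("medics"), two different kinds of revision.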

The Indexing Process
Concept identification
Term selection (via thesaurus)
Term assignment

Application: The Indexing Process (Manual)
[Flowchart adapted from ISO 5963, p. 5; the connections between boxes are not recoverable from the transcript. Box labels include: Start; Examine Document and Identify Significant Concepts; Establish Term Denoting Concept; Does Thesaurus contain term for Concept?; Consider Preferred Term; Is Term suitable?; Consider any associated terms in Thesaurus (NT, BT); Would Concept be better represented by one of these terms?; Select Alternative term to represent Concept; Prefer Alternative Term(s); Can Concept be expressed combining terms?; Consider Each of These Terms; Admit New Term Into Thesaurus; Assign Terms to Document; Is There Another Concept?; End.]

Thesaurus Revision and Updates
There will always be new concepts, products, or expressions that need to be added to the thesaurus.
Set a regular schedule of reviews and revisions.
Collect complaints, problems, etc., and fold them into the revision of the thesaurus.

References
Soergel, D. Indexing Languages and Thesauri: Construction and Maintenance. Los Angeles: Melville Publishing Co., 1974.
Foskett, A.C. The Subject Approach to Information. London: Clive Bingley, 1982.
Standards:
ANSI/NISO Z39.19-1994. American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri.
ANSI/NISO Draft Standard Z39.4-199x. American National Standard Guidelines for Indexes in Information Retrieval.
ISO 2788. Documentation -- Guidelines for the establishment and development of monolingual thesauri.
ISO 5964. Documentation -- Guidelines for the establishment and development of multilingual thesauri.