Copyright © 2006 Access Innovations, Inc. 1 Building Taxonomies Part 3 Alice Redmond-Neal Access Innovations, Inc. Enterprise Search Summit New York City,

Presentation transcript:

Copyright © 2006 Access Innovations, Inc. 1 Building Taxonomies Part 3 Alice Redmond-Neal Access Innovations, Inc. Enterprise Search Summit New York City, May 21, 2006

Slide 2: Build a taxonomy – simple steps
Get paper and pencil
  – Sharpen pencil
Define subject field
Collect terms
Organize terms
Fill in gaps
Flesh out and interrelate terms
You’re done!

Slide 3: Define subject field
Review representative collection of content
Determine:
  – Core areas
  – Peripheral topics
Psychology, Education, Sociology, Law
Scope can be modified later

Slide 4: Before you go on: Build or buy?
Survey existing thesaurus/taxonomy resources for your domain
Test for
  – Scope
  – Depth
      Make-or-break terms
  – Cost
Don’t reinvent the wheel!

Slide 5: Collect terms
Your documents and databases
Departmental terminology
Text books and their indexes (indices)
Book tables of contents and indexes
Journal quarterly indexes
Encyclopediae
Lexicons, glossaries on the topic
Web resources
Users and experts
Search logs

Slide 6: Gather terms from search logs
Beyond the Spider: The Accidental Thesaurus (Richard Wiggins, Information Today, Oct 2002)
Top ~100 search terms from search logs
Match to web site with appropriate answer
Basis for favorites or best bets, presented at the top of results list. (AKA behavior-based taxonomy)
Not a thesaurus or taxonomy, but still a useful source of terms.
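
The "top ~100 search terms" step can be sketched in a few lines of Python. This is only an illustration under assumptions: the log is taken to be a plain list of query strings, and `top_search_terms` and `sample_log` are hypothetical names that do not come from the presentation.

```python
from collections import Counter


def top_search_terms(queries, n=100):
    """Return the n most frequent queries as (term, count) pairs."""
    # Normalize case and whitespace so trivial variants count as one term.
    normalized = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalized if q)
    return counts.most_common(n)


# Hypothetical log entries, invented for illustration.
sample_log = [
    "Health insurance",
    "health  insurance",
    "dental plan",
    "HEALTH INSURANCE",
    "dental plan",
    "401k",
]

top_two = top_search_terms(sample_log, n=2)
print(top_two)
```

The resulting list is a candidate term source only; each entry still has to be reviewed by a person before it becomes a best bet or a taxonomy term.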

Slide 7: Organize terms – roughly
Sort terms into several major categories – logical groups of similar concepts as Top Terms
  – Identify core areas and peripheral topics
  – 10 – 20 to start
  – Consider moving proper names to authority files
Result: loose collection of terms under several main headings
  – Rough and tentative – see how it fits as you go
  – Initial gap analysis
  – Add / modify / delete as needed

Slide 8: Labelling a concept – cognitive linguistics
Most-used labels are middle in range from abstract to specific --- relates to search
Linguistic universal – true across cultures
Unique beginner, Life form, Generic, Specific, Varietal
Insurance, Health insurance, Group health insurance
Practical application?

Slide 9: Craft the Top Terms
Toughest job and most important step!
Dictates further organization
Determines how browsers/searchers perceive the taxonomy
  – Coverage
  – Formality
Establish the concept first, tweak the wording later

Slide 10: The term record
Main Term (MT)
Top Term (TT)
Broader Terms (BT)
Narrower Terms (NT)
Related Terms (RT)
  – See also (SA)
Scope Note (SN)
History (H)
NonPreferred Term (NP)
  – Used for (UF), See (S)
Term = subject term, heading, node, category, descriptor, class (see Lexicographer’s lexicon)
(Diagram labels: TAXONOMY, THESAURUS)
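
One possible way to hold a term record in code is a small data class, sketched below. The field names are assumptions chosen to mirror the abbreviations on the slide (MT, TT, BT, NT, RT, SN, H, UF); the `display` layout is a common thesaurus-style listing, not something the presentation prescribes. The sample values reuse the elections example from later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermRecord:
    """Hypothetical container for one thesaurus term record."""
    main_term: str                                        # MT
    top_term: str = ""                                    # TT
    broader: List[str] = field(default_factory=list)      # BT
    narrower: List[str] = field(default_factory=list)     # NT
    related: List[str] = field(default_factory=list)      # RT
    scope_note: str = ""                                  # SN
    history: str = ""                                     # H
    used_for: List[str] = field(default_factory=list)     # UF (nonpreferred terms)

    def display(self):
        """Render the record as indented, label-prefixed lines."""
        lines = [self.main_term]
        if self.top_term:
            lines.append(f"  TT {self.top_term}")
        for label, values in (("BT", self.broader), ("NT", self.narrower),
                              ("RT", self.related), ("UF", self.used_for)):
            lines.extend(f"  {label} {value}" for value in values)
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        if self.history:
            lines.append(f"  H {self.history}")
        return "\n".join(lines)


elections = TermRecord(
    main_term="Elections",
    top_term="Politics",
    broader=["Politics"],
    narrower=["Presidential elections", "Gubernatorial elections", "Mayoral elections"],
)
print(elections.display())
```

A taxonomy would typically populate only the hierarchical fields, while a thesaurus would also fill in related terms, scope notes and nonpreferred terms.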

Slide 11: Usefulness of a term – the “duh” factor
Some terms are so basic for a domain that they have little or no value
  – “Sports” in Sports Illustrated
  – “Technology” in Technology Review
  – “Golf” in Golf Magazine
How useful will the term be for indexing?
  – Apply to everything in the domain?
  – Distinguish important concepts?
  – If term is needed, specify limited use conditions in Scope Note
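
A rough way to check whether a term would "apply to everything in the domain" is to measure how many documents it indexes. The sketch below assumes each document is represented by a set of index terms; `term_coverage`, the sample data and the 0.9 cutoff are all hypothetical choices for illustration, not values from the presentation.

```python
def term_coverage(term, indexed_docs):
    """Fraction of documents whose index terms include `term`."""
    docs = list(indexed_docs)
    if not docs:
        return 0.0
    return sum(1 for terms in docs if term in terms) / len(docs)


# Hypothetical index terms for four articles in a golf magazine.
golf_docs = [
    {"Golf", "Putting"},
    {"Golf", "Golf courses"},
    {"Golf", "Tournaments"},
    {"Golf", "Golf clubs"},
]

THRESHOLD = 0.9  # assumed cutoff; tune it for the collection at hand
low_value = [t for t in ("Golf", "Putting")
             if term_coverage(t, golf_docs) >= THRESHOLD]
print(low_value)
```

A term flagged this way is not necessarily dropped; as the slide notes, it can stay with a Scope Note that limits when it is used.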

Slide 12: Hierarchy structures – variations on a theme
Not pre-determined
  – Wines → type → variety → region → cost
  – Or Wines → cost → type….
Varies by user group and needs
  – May have multiple views of same content
  – Standard alpha view or customized notation
Affects information architecture, i.e. how web site functions

Slide 13: How do terms relate?
Hierarchical relationships -- Parents and their children
Equivalence relationships -- Aliases
Associative relationships -- Cousins
(Diagram labels: TAXONOMY, THESAURUS)

Slide 14: Hierarchical relationships
Broader Term represents the category
Narrower Term represents the specific
Three types:
  – Generic relationship (BTG/NTG)
  – Whole-part relationship (BTP/NTP)
  – Instance relationship (BTI/NTI)
BTs/NTs have a reciprocal relationship
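
The reciprocal BT/NT rule lends itself to a simple consistency check. The sketch below is one way to do it, assuming the hierarchy is stored as two dictionaries of sets; the `Hierarchy` class and the "Campaigns" term are hypothetical and only serve to show a one-sided link being caught.

```python
from collections import defaultdict


class Hierarchy:
    """Minimal BT/NT store that keeps both directions of each link."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its Broader Terms
        self.narrower = defaultdict(set)   # term -> its Narrower Terms

    def add(self, broader_term, narrower_term):
        # Writing both directions together keeps the relationship reciprocal.
        self.narrower[broader_term].add(narrower_term)
        self.broader[narrower_term].add(broader_term)

    def missing_reciprocals(self):
        """Return (BT, NT) pairs recorded in only one direction."""
        missing = []
        for bt, nts in self.narrower.items():
            for nt in nts:
                if bt not in self.broader.get(nt, set()):
                    missing.append((bt, nt))
        for nt, bts in self.broader.items():
            for bt in bts:
                if nt not in self.narrower.get(bt, set()):
                    missing.append((bt, nt))
        return sorted(missing)


h = Hierarchy()
h.add("Politics", "Elections")
h.add("Elections", "Presidential elections")
ok = h.missing_reciprocals()

# Simulate a hand-edited, one-sided link ("Campaigns" is a made-up term).
h.narrower["Politics"].add("Campaigns")
broken = h.missing_reciprocals()
print(ok, broken)
```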

Slide 15: Broader to Narrower Terms
Politics (Generic)
  – Elections (Specific)
      – Presidential elections, Gubernatorial elections, Mayoral elections (Varietal)

Slide 16: Hierarchy – Generic (genus-species) relationship
Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)
Applies to entities, actions, properties, agents – not just biological taxonomies
Value: Cultural value, Economic value, Moral value, Social value
Teachers: Adult educators, School teachers, Special ed teachers, Student teachers
Thinking: Contemplation, Divergent thinking, Lateral thinking, Reasoning

Slide 17: Generic relationship test – 1
Both terms in same fundamental category
“All-and-some” test
Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests

Slide 18: Generic relationship test – 2
Terms compared: Pests, Squirrels, Rodents
ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents
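
The all-and-some test can be expressed with sets: a generic BT/NT link holds when ALL members of the narrower class belong to the broader class and only SOME members of the broader class belong to the narrower one, which is a proper-subset check. The sketch below assumes each class can be listed as a set of example members; the member names are invented for illustration.

```python
def passes_all_and_some(narrower_members, broader_members):
    """True when ALL narrower members are in the broader class
    and only SOME of the broader class are in the narrower one."""
    narrower = set(narrower_members)
    broader = set(broader_members)
    # Proper subset: every narrower member is included, and the broader
    # class has at least one member outside the narrower class.
    return bool(narrower) and narrower < broader


# Hypothetical example members, invented for illustration.
squirrels = {"gray squirrel", "red squirrel"}
rodents = squirrels | {"house mouse", "beaver"}
pests = {"gray squirrel", "house mouse", "cockroach"}

results = {
    "Squirrels NT of Rodents": passes_all_and_some(squirrels, rodents),
    "Squirrels NT of Pests": passes_all_and_some(squirrels, pests),
    "Pests NT of Rodents": passes_all_and_some(pests, rodents),
}
print(results)
```

In practice the judgment is conceptual rather than computed, since nobody enumerates every squirrel; the set form just makes the logic of the test explicit.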

Slide 19: Hierarchy – Whole-part relationship
Also known as meronymy or partonomy
Four types allowed in thesaurus standards
  – Body systems and organs: Ear → Middle ear
  – Geographical locations: Bernalillo County → Albuquerque
  – Fields of study: Geology → Physical geology
  – Hierarchical organizational/corporate/social/political structures: Diocese → Parish

Slide 20: Hierarchy – Instance relationship
General category (common noun) = BT
Individual example (proper noun) = NT
Seas: Baltic Sea, Caspian Sea, Mediterranean Sea
New York museums: Guggenheim Museum, Museum of Modern Art, Museum of Natural History
Essentially identical to “final node” in taxonomies.
Best practice: long list → move to authority file

Slide 21: Polyhierarchical relationship
Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
New to ANSI/NISO standards
Sporks: BT Spoons, BT Forks
Nurse administrators: BT Nurses, BT Health administrators
Accounting: BT Finance, BT Careers
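
Polyhierarchy means a term's Broader Terms form a list rather than a single parent, so walking "up" the structure has to follow every branch. The sketch below assumes a plain dictionary from term to its Broader Terms; the "Cutlery" parent is a made-up addition to give the example a second level.

```python
# Term -> list of Broader Terms. The first three entries come from the slide;
# "Cutlery" is a hypothetical parent added for illustration.
BROADER = {
    "Sporks": ["Spoons", "Forks"],
    "Nurse administrators": ["Nurses", "Health administrators"],
    "Accounting": ["Finance", "Careers"],
    "Spoons": ["Cutlery"],
    "Forks": ["Cutlery"],
}


def all_ancestors(term, broader):
    """Collect every term reachable by following BT links upward."""
    seen = set()
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen


def polyhierarchical_terms(broader):
    """Terms that have Multiple Broader Terms (MBT)."""
    return sorted(t for t, bts in broader.items() if len(bts) > 1)


spork_ancestors = all_ancestors("Sporks", BROADER)
mbt_terms = polyhierarchical_terms(BROADER)
print(spork_ancestors, mbt_terms)
```

The `seen` set keeps a shared ancestor such as "Cutlery" from being visited twice when it is reachable through both parents.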

Slide 22: Equivalence relationship
Preferred Term
  – Thesaurus term and valid for indexing
  – Thesaurus notation: USE
NonPreferred Term
  – Not valid for indexing
  – An alias or imposter
  – Entry point, directs user to Preferred Term
  – Thesaurus notation: UF or NPT
Spiders UF Arachnids
Plant pathology USE Phytopathology

Slide 23: Equivalence – when to use
Synonyms, slang, quasi-synonyms
Scientific and trade names
  – Ibuprofen UF Motrin™
Lexical variants
  – Fiber optics UF Fibre optics
  – Mouse UF Mice
Upward posting of narrow concepts not specified in taxonomy or thesaurus
  – Social class UF Elite, Middle class, Working class
Get equivalent terms from search logs, brainstorming…
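
The USE/UF pairs above amount to a lookup table from entry terms to Preferred Terms. A minimal sketch, assuming the UF lists are kept in a dictionary keyed by Preferred Term and that matching should ignore case; `build_use_index` and `resolve` are hypothetical helper names.

```python
# Preferred Term -> nonpreferred terms (UF), taken from the slide examples.
USED_FOR = {
    "Spiders": ["Arachnids"],
    "Phytopathology": ["Plant pathology"],
    "Ibuprofen": ["Motrin"],
    "Fiber optics": ["Fibre optics"],
    "Mouse": ["Mice"],
    "Social class": ["Elite", "Middle class", "Working class"],
}


def build_use_index(used_for):
    """Invert the UF lists into a case-insensitive USE lookup."""
    index = {}
    for preferred, variants in used_for.items():
        index[preferred.lower()] = preferred      # a Preferred Term maps to itself
        for variant in variants:
            index[variant.lower()] = preferred    # variant USE preferred
    return index


def resolve(term, index):
    """Return the Preferred Term for an entry term, or None if unknown."""
    return index.get(term.strip().lower())


use_index = build_use_index(USED_FOR)
print(resolve("fibre optics", use_index))
print(resolve("Working class", use_index))
```

Upward posting shows up here as several narrow entry terms resolving to one broader Preferred Term, as with "Social class".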

Slide 24: Associative relationship
Related Terms (RTs) ~ cousins
“…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
  – Should siblings be Related Terms??
Both terms are valid thesaurus terms for indexing, and have reciprocal relationship
Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
Standards describe specific types (see Lexicon)

Slide 25: Sibling rivalry and facets
Format and sense of sibling terms should be consistent
If siblings don’t coexist well, separate them
Subdivide large groups of terms into facets, mutually exclusive subcategories
Growing demand with faceted navigation
Facet examples
  – Properties, Materials, Agents, Actions, Influence
  – Objects, Styles and periods, Color, Shape (Art & Architecture Thesaurus)

Slide 26: Faceted classification
Pharmaceuticals
  – (by action) Anti-inflammatory agents…
  – (by chemical structure) Alkaloids…
  – (by indication) Pain…
  – (by use) Immunosuppression…
Facet indicators (aka Node labels), not to be used for indexing

Slide 27: Faceting challenge
Paint
  – Oil paint
  – High-gloss paint
  – Interior paint
  – Matte paint
  – Latex paint
  – Semi-gloss paint
  – Exterior paint
Propose facet indicators and subgroup these paint varieties into facets.

Slide 28: Do you agree?
Paint
  – (by type) Oil paint, Latex paint
  – (by use) Interior paint, Exterior paint
  – (by surface) High-gloss paint, Matte paint, Semi-gloss paint
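
The paint answer can be written down as a small data structure that also enforces two points made on the facet slides: facets are mutually exclusive, and facet indicators (node labels) are not themselves indexing terms. This is only a sketch; the dictionary layout and function names are assumptions.

```python
# Facet indicator (node label) -> terms grouped under it, from the slide.
PAINT_FACETS = {
    "(by type)": ["Oil paint", "Latex paint"],
    "(by use)": ["Interior paint", "Exterior paint"],
    "(by surface)": ["High-gloss paint", "Matte paint", "Semi-gloss paint"],
}


def indexable_terms(facets):
    """Terms valid for indexing; facet indicators are deliberately excluded."""
    return sorted(term for terms in facets.values() for term in terms)


def facets_are_exclusive(facets):
    """True when no term appears under more than one facet."""
    seen = set()
    for terms in facets.values():
        for term in terms:
            if term in seen:
                return False
            seen.add(term)
    return True


paint_terms = indexable_terms(PAINT_FACETS)
exclusive = facets_are_exclusive(PAINT_FACETS)
print(exclusive, paint_terms)
```

Because the facets are independent, a single product such as an interior latex matte paint would be indexed with one term from each facet.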

Slide 29: Scope Notes (SN)
Indicate meaning of the term in the context of this thesaurus, for this audience
  – Stress – Metal, Psychological, Physiological
Indicate any restriction in meaning
Indicate range of topics covered
Provide direction for indexers; for terms often confused, may suggest an alternative term
Use only as needed – not for every term
Establish and stick with consistent format
Be concise