Published by Deirdre Pope. Modified over 9 years ago.
1
Synonyms & Taxonomies: Thesaurus Design for Information Architects
an ACIA Seminar by Peter Morville & Samantha Bailey
2
Introductions Peter Morville (morville@argus-inc.com)
CEO, Argus Associates; co-author, Information Architecture for the World Wide Web; Director, ACIA; LIS background; Fortune 500 consulting
3
Introductions Samantha Bailey (bailey@argus-inc.com)
VP of Operations, Argus Associates; LIS background; Fortune 500 consulting; VC experience
4
Seminar Outline
Topics: Thesauri in Context; Value of Thesauri; Methodology; Metadata; Vocabulary Control; Structure & Relationships; Thesaurus Management; Case Study; Related Topics
Instructional Methods: Exercises, Quizzes, Discussions, Breaks
5
Our Approach
Assumptions: Understanding of IA Basics; Interest in Thesauri and the Web
Philosophy: Reality is Important; Technology has Limitations; Success takes Time; Tension can be Healthy
6
Thesauri in Context What is IA?
The art and science of structuring and organizing information systems to help people achieve their goals.
7
Thesauri in Context An Ecological Approach
Books: Information Ecologies by Bonnie Nardi and Information Ecology by Thomas Davenport
8
Thesauri in Context IA From Top to Bottom
Top-Down: portal; strategy; hierarchy; primary path
Bottom-Up: sub-site; objects; metadata; multiple paths
Diagram: portal > local subsites (HR, Engineering, R&D…) > Object X (Name: Product Category: Topic: Stale Date: Author: Security:)
9
Thesauri in Context Where Does IA Fit?
The Elements of User Experience, by Jesse James Garrett
10
Thesauri in Context What is Vocabulary Control?
Controlled Vocabulary: a list of preferred and variant terms; a subset of natural language.
Preferred | Variants | Authority
AZ | Ariz, Arizona, 85XXX | US Postal Service
IBM | Intl Bus Machines, Big Blue | NY Stock Exchange
Nyctalopia | Night blindness, Moon blindness | National Library of Medicine
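As a minimal sketch of how a controlled vocabulary resolves variants to preferred terms, the Python below loads the three example rows from this slide into a lookup table. The names `CONTROLLED_VOCABULARY`, `build_lookup`, and `to_preferred` are illustrative assumptions, and the ZIP-code pattern 85XXX is left out because it would need pattern matching rather than a literal lookup.

```python
# Hypothetical controlled vocabulary built from the slide's three examples.
CONTROLLED_VOCABULARY = {
    "AZ": ["Ariz", "Arizona"],
    "IBM": ["Intl Bus Machines", "Big Blue"],
    "Nyctalopia": ["Night blindness", "Moon blindness"],
}


def build_lookup(vocabulary):
    """Map every variant (and the preferred term itself) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        for term in [preferred, *variants]:
            lookup[term.casefold()] = preferred
    return lookup


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)


def to_preferred(term):
    """Return the preferred term, or None when the term is not in the vocabulary."""
    return LOOKUP.get(term.strip().casefold())
```

A call such as `to_preferred("big blue")` resolves to the preferred term `"IBM"`, while a term outside the vocabulary returns `None`, which is one simple way to flag candidate terms for later review.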
11
Thesauri in Context Why Control Vocabulary?
Language is ambiguous: synonyms, homonyms, antonyms, contronyms, etc.
In the Oxford English Dictionary, “round” takes 7½ pages (15,000 words) to define, and “set” has 58 uses as a noun, 126 as a verb, and 10 as an adjective.
Source: The Mother Tongue: English & How It Got That Way, by Bill Bryson
12
Thesauri in Context Why Control Vocabulary?
So Your Users Don’t Have To!
13
Thesauri in Context Semantic Relationships
Types: 1. Equivalence; 2. Hierarchical; 3. Associative
Example (preferred term: Vermont)
1. Equivalence: (Variant) Vt; (Variant) Green Mountain State
2. Hierarchical: (Broader) United States; (Narrower) Burlington
3. Associative: (Related) Skiing; (Related) Maple Syrup
14
Thesauri in Context Levels of Control
15
Thesauri in Context What is a Thesaurus?
Traditional Use: dictionary of synonyms (Roget’s); from one word to many words.
Information Retrieval Context: a controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval; from many words to one concept.
16
Thesauri in Context Terminology
Preferred Terms (UF subject headings, descriptors)
  SN: Scope Note
  UF: Used For
  BT: Broader Term
  NT: Narrower Term
  RT: Related Term (“See Also”)
Variant Terms (UF non-preferred, entry terms)
  USE (“See”)
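One way to picture these tags is as fields on a thesaurus record. The sketch below is an assumption about how such a record could be modeled, not a format prescribed by any standard; the `Term` class and `format_entry` helper are hypothetical names, and the sample data reuses the Vermont example from the Semantic Relationships slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term with its scope note and relationships (illustrative model)."""
    name: str
    scope_note: str = ""
    used_for: list = field(default_factory=list)   # UF: variant terms
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT


def format_entry(term):
    """Render a term in the tagged layout commonly used for printed thesauri."""
    lines = [term.name]
    if term.scope_note:
        lines.append(f"  SN {term.scope_note}")
    tagged = (("UF", term.used_for), ("BT", term.broader),
              ("NT", term.narrower), ("RT", term.related))
    for tag, values in tagged:
        for value in values:
            lines.append(f"  {tag} {value}")
    return "\n".join(lines)


vermont = Term(
    name="Vermont",
    used_for=["Vt", "Green Mountain State"],
    broader=["United States"],
    narrower=["Burlington"],
    related=["Skiing", "Maple Syrup"],
)
```

Calling `print(format_entry(vermont))` lists the term followed by its UF, BT, NT, and RT lines in that order.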
17
Thesauri in Context Types of Thesauri
18
Thesauri in Context Visibility
Classic Use: both indexers and searchers explicitly map natural language terms onto controlled vocabularies.
Web Environment: able to choose level of visibility (implicit use, thesaural browsers); opportunity to educate users (terminology, associative learning).
19
Thesauri in Context Niche Applications (hypothetical example)
20
Thesauri in Context Thesaurus Standards
Mono-Lingual Thesauri: ISO 2788 (1974, 1985, 1986, International); BS 5723 (1987, British); AFNOR NFZ (1981, French); DIN 1463 ( , German); ANSI/NISO Z39.19 (1994, United States)
Multi-Lingual Thesauri: ISO 5964 (1985, International)
21
Thesauri in Context ANSI/NISO Standard
Z Guidelines for the Construction, Format, and Management of Monolingual Thesauri. 84 pp. ISBN: Price: $49.00
Reasons to Follow Standard: significant thinking behind guidelines; technology integration; cross-database compatibility
22
Thesauri in Context Oracle’s Perspective
“The phrase…thesaurus standard is somewhat misleading. The computing industry considers a ‘standard’ to be a specification of behavior or interface. These standards do not specify anything. If you are looking for a thesaurus function interface, or a standard thesaurus file format, you won't find it here. Instead, these are guidelines for thesaurus compilers -- compiler being an actual human, not a program. What Oracle has done is taken the ideas in these guidelines and in ANSI Z39.19…and used them as the basis for a specification of our own creation…So, Oracle supports ISO-2788 relationships or ISO-2788 compliant thesauri.”
23
Thesauri in Context A World in Transition
“The majority of basic problems of thesaurus construction had already been solved by 1967.” (Krooks and Lancaster, 1993)
Traditional Thesauri | Web Thesauri
Print | Online
Academic / Library | Business
Expert / Repeat Users | Novice / Infrequent Users
Visible | Invisible
Accepted Value | Unknown Value
24
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
25
Value of Thesauri IA Metrics
Cost of finding (time, clicks, frustration, precision). Cost of not finding (success, recall, frustration, alternatives). Cost of development (time, budget, staff, frustration). Value of learning (related products, services, projects, people).
26
Value of Thesauri KM Metrics
Revenue Generation (% revenues spent on KM, new revenue generation)
Opportunity Cost (staff time, customers lost)
Knowledge Efficiency (faster product development, # mistakes made twice)
Data Quality (% knowledge on intranet, % with attachments)
Intranet Usage (# hits, # contributions)
Individual Behavior (# citations)
Technical Performance (uptime, search response time)
Source: Working Council for Chief Information Officers, Basic Principles of Information Architecture
27
Value of Thesauri Web Site Statistics
Wasted expense: most sites will waste between $1.5M and $2.1M on redesigns next year.
Forfeited revenue: poorly architected retailing sites are underselling by as much as 50%.
Lost customers: the sites we tested are driving away up to 40% of repeat traffic.
Eroded brand: people who have a bad experience typically tell 10 others.
Source: Forrester Research, Why Most Web Sites Fail (Sept 98)
28
Value of Thesauri Intranet Statistics
Employees spend 35% of productive time searching for information online. (Working Council for Chief Information Officers, Basic Principles of Information Architecture)
Managers spend 17% of their time (6 weeks a year) searching for information. (Information Ecology, Thomas Davenport and Lawrence Prusak)
29
Value of Thesauri Intranet Statistics
Sun Microsystems’ usability experts calculated that 21,000 employees were wasting an average of six minutes per day due to inconsistent intranet navigation structures. When lost time was multiplied by staff salaries, the estimated productivity loss exceeded $10 million per year.
Source: Jakob Nielsen, Web Design and Development, September 1997
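The arithmetic behind a figure like this can be sketched in a few lines. The employee count and the six minutes per day come from the slide; the number of working days and the loaded hourly cost are assumptions chosen only to show how the estimate scales, not figures from the Sun study.

```python
EMPLOYEES = 21_000             # from the slide
MINUTES_LOST_PER_DAY = 6       # from the slide
WORKING_DAYS_PER_YEAR = 240    # assumption, not from the source
LOADED_HOURLY_COST = 20.0      # assumption in USD, not from the source


def annual_productivity_loss(employees, minutes_per_day, working_days, hourly_cost):
    """Multiply the hours lost per year by the cost of an hour of staff time."""
    hours_lost_per_year = employees * minutes_per_day / 60 * working_days
    return hours_lost_per_year * hourly_cost


loss = annual_productivity_loss(
    EMPLOYEES, MINUTES_LOST_PER_DAY, WORKING_DAYS_PER_YEAR, LOADED_HOURLY_COST
)
```

Under these assumptions the lost time is 504,000 hours per year, so any loaded cost of roughly $20 per hour or more puts the estimate above $10 million.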
30
Value of Thesauri Intranet Statistics
After spending two years and $3 million on development and usability testing, Bay Networks expects to see $10 million in productivity gains and a 10 percent cycle-time reduction for new product development as a result of its new information architecture.
Source: Working Council for Chief Information Officers, Basic Principles of Information Architecture
31
Value of Thesauri Intranet Statistics
40% of corporate users can’t find the information they need on their intranet.
Prior to intranet reengineering in 1997, Ford conducted a survey of its 100,000+ user base. Employees stated they could only find 15% of the information they needed to do their jobs.
Under-investment in (unstructured) information. 80% spending on 20% (structured) data.
Source: Working Council for Chief Information Officers, Basic Principles of Information Architecture
32
Value of Thesauri Searching Problems
“Most of the complaints we get are due to the way users search – they use the wrong keywords.” (a manufacturing company)
“We have problems with the way customers enter queries. Capitalizations and misspellings give us headaches.” (a software company)
Source: Forrester Research, Must Search Stink? (June 2000)
33
Value of Thesauri Searching Statistics
“Search will become the centerpiece of navigation.”
90% of firms rate search as very or extremely important.
52% don’t measure search effectiveness.
Source: Forrester Research, Must Search Stink? (June 2000)
34
Value of Thesauri CV Statistics
Researchers at Bell Labs found the probability that two people would choose the same word to describe an object to be less than 20%. (Furnas, Landauer, et al., Bell Labs, 1987)
30% of corporations systematically utilize metadata to classify information, while only one to three percent of companies populate those metadata tags using controlled vocabularies.
71% don’t account for misspellings or synonyms.
Source: Forrester Research, Building an Intranet Portal (Jan 1999)
35
Value of Thesauri CV Statistics
Principle of unlimited aliasing: by leveraging synonyms, recall went from 20% to 80% (in a small collection).
Source: The Trouble with Computers; research study at Bellcore (Furnas et al. 1987)
“The findings indicate that a hypertext index with multiple access points for each concept…led to greater effectiveness and efficiency of retrieval on almost all measures.”
Source: A Usability Assessment of Online Indexing Structures, by Carol A. Hert, Elin K. Jacob, and Patrick Dawson, Journal of the American Society for Information Science (September 2000)
36
Value of Thesauri Complementary Approaches
Basic: Navigation Design (Browsing); Full Text Indexing (Searching)
Advanced: Collaborative Filtering; Lexical Databases; Automated Hierarchy-Generation
37
Value of Thesauri Navigation Design
Relationships: Global & Local (hierarchical); Contextual (associative)
38
Value of Thesauri Full Text Indexing
Strengths: enables high precision (exact phrase); enables high recall (word occurrence)
Weaknesses: often results in low precision (“aboutness”); often results in low recall (synonyms)
Complementary Use: provide users with option (search CV, full text); intelligent next step (no hits on CV > full text); full text search within CV search zones
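To make the recall weakness concrete, the sketch below computes precision and recall for a full-text query with and without a synonym. The four documents and the relevance judgments are invented for illustration; only the pediatrics/paediatrics pair is taken from this seminar's own variant-term examples.

```python
# Hypothetical mini-collection; document 2 uses the British spelling.
DOCS = {
    1: "pediatrics clinic hours",
    2: "paediatrics ward visiting policy",
    3: "parking permits",
    4: "cafeteria menu",
}
RELEVANT = {1, 2}  # assumed relevance judgments for a query about pediatrics


def search(terms):
    """Return ids of documents containing any of the query terms as a whole word."""
    wanted = {t.casefold() for t in terms}
    return {doc_id for doc_id, text in DOCS.items()
            if wanted & set(text.casefold().split())}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


plain = precision_recall(search(["pediatrics"]), RELEVANT)
expanded = precision_recall(search(["pediatrics", "paediatrics"]), RELEVANT)
```

In this toy collection the plain query finds only one of the two relevant documents, and adding the variant recovers the other without retrieving anything irrelevant.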
39
Value of Thesauri Collaborative Filtering
SN. Approaches that leverage knowledge about preferences or behaviors of people or organizations to facilitate information retrieval.
Popularity / Importance: Direct Hit (analysis of searcher behavior); Amazon (cross-title purchasing habits); Google (citation indexing)
Considerations: favors established materials; lacks benefits of vocabulary control; user-centric (ignores content, context)
40
Value of Thesauri Lexical Databases
Scope Notes: broad term banks or semantic networks that specify lexical variants and term relationships. General-interest, off-the-shelf thesauri.
Examples: Roget’s Thesaurus; WordNet; Plumb Design Visual Thesaurus
41
Value of Thesauri Lexical Databases
Number of Terms (General, Niche)
Importance of Context (Bug in Software, Espionage)
 | # of Terms | # of Meanings | Notes
WordNet | 50,000 | 70,000 |
Oxford English Dictionary | 615,000 | 2.4M | > 20,000 New Terms Per Year
Named Insect Species | 1.4M | | Drosophila UF Fruit Fly
Square D Products | 300,000 | | Electrical Distribution
42
Value of Thesauri Hierarchy-Generation Software
An Intimidating Vocabulary: multivariate regression models, probabilistic Bayesian models, neural networks, symbolic rule learning, computational semiotics, and support vector machines
General Techniques: clustering (similarity, word co-occurrence); vector space (extract “meaning” from terms, teach by example)
43
Value of Thesauri Hierarchy-Generation Software
Examples: Autonomy; Semio; Cartia
Hyperbole: Autonomy claims their software eliminates "the need for any manual labor in the process."
44
Value of Thesauri Hierarchy-Generation Software
Considerations: no business context; no consideration of users; no planning for future; mixed category schemes; hidden costs (integration, rule design, training)
Trends: niche use (e.g., news, web search results); integration with manual classification schemes
45
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
46
Methodology Overview
[Diagram: Process, Deliverables, and Consulting tracks across the Strategy, Design, and Build phases; a marker indicates special emphasis during each phase]
47
Methodology Strategy x Process
Information Architect’s Toolbox *
Business Context: strategy meetings; opinion leader interviews; technology assessment
Content & Applications: content inventory; content analysis; metadata evaluation
Users: log analysis; observation / usability testing; interviews / affinity modeling
Existing IA: heuristic evaluation; classification scheme analysis; benchmarking
* Select the right mix for the project; this is a partial list of tools.
48
Methodology Design x Deliverables
Information Architect’s Toolbox *
Organization & Labeling: metadata specifications; controlled vocabularies; thesaurus
Navigation (Embedded): primary taxonomy; classification schemes; blueprints and wireframes
Navigation (Supplemental): search system; sitemap / indexes; personalization / customization
Synthesis: design / authoring guidelines; content management policies; functional specifications
* Select the right mix for the project; this is a partial list of tools.
49
Methodology Consulting x Build
Information Architect’s Toolbox *
Metadata Application: object-level indexing guides; support indexers; support thesaurus managers
Point of Production: support designers / developers; usability testing input / analysis; fix problems
Post-Launch: metrics; evaluation; improvement
* Select the right mix for the project; this is a partial list of tools.
50
Methodology Thesaurus Construction
Strategy
1. Define Thesaurus Strategy
2. Develop Project Plan
Design
3. Gather Candidate Terms / Variants
4. Select Preferred Terms
5. Develop Facet Hierarchies
6. Identify ‘See Also’ Links
7. Write Design / Functional Specifications
8. Build / Buy Software Applications
Build
9. Launch Indexing Operation
10. Refine Controlled Vocabularies
51
Methodology Strategy Questions
Does vocabulary control make sense? Where and for what purposes? How will it align with business goals? How will it support users’ goals? How will it impact content management? Will we buy, borrow, or build?
52
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
53
Metadata
Definition: information about information
Purposes:
1. Document surrogate (abstract)
2. Provides context (date, publisher)
3. Facilitates retrieval (subject)
54
Metadata Ways to Leverage
User Interface: generate browsable indexes (site-wide, sub-site, specialized authority files); enable field-specific searching (filters, zones, sorting); support personalization (map profile to tags)
Behind the Scenes: enable efficient content management; support decentralized tagging
55
Metadata Types of Indexing
Type | Manual | Automated
Full Text | x | complete text minus stop words
Keyword (Natural Language) | humans assign “relevant” words and phrases | software assigns “relevant” words and phrases
Controlled Vocabulary | humans map variants to preferred terms | software maps variants to preferred terms
56
Metadata Full Text Indexing
57
Metadata Keyword Indexing
<HTML><HEAD>
<TITLE>STARTREK.COM:The Official Star Trek Web Site!</TITLE>
<META NAME='description' CONTENT='STARTREK.COM:The Official Star Trek Web Site! The starting point for all Star Trek information on the web.'>
<META NAME='keywords' CONTENT='star trek, enterprise, james kirk, mister spock, seven of nine, doctor mccoy, captain sulu, borg, klingon, romulan, ferengi, human, starfleet command, delta quadrant, alpha quadrant, gamma quadrant, excelsior, paramount, voyager, deep space nine, captain sisko, jean luc picard, kathryn janeway, starfleet academy, united federation of planets'>
<META NAME='author' CONTENT='Paramount Digital Entertainment'>
58
Metadata CV Indexing
Partners/Competitors
UI | Accepted Term | LRID | Variant Terms
PC0004 | Bell Atlantic | | BellAtlantic; Bell Atlantic / North; NYNEX; Nynex
PC0091 | NLG | | National Leisure Group
PC0076 | VH1 | | Video Hits 1; VH-1
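A rough sketch of automated controlled-vocabulary indexing, under the assumption that indexing simply means spotting any accepted or variant form in the text and recording the accepted term. The vocabulary is taken from the Partners/Competitors table; `index_document` and the sample sentence are hypothetical, and a production indexer would need far more than literal matching.

```python
import re

# Accepted terms and variants from the Partners/Competitors table.
VOCABULARY = {
    "Bell Atlantic": ["BellAtlantic", "Bell Atlantic / North", "NYNEX"],
    "NLG": ["National Leisure Group"],
    "VH1": ["Video Hits 1", "VH-1"],
}


def index_document(text, vocabulary):
    """Return the sorted accepted terms whose accepted or variant form occurs in text."""
    found = set()
    for accepted, variants in vocabulary.items():
        for form in [accepted, *variants]:
            # Match the form as a whole token, ignoring case.
            pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(accepted)
                break
    return sorted(found)


tags = index_document("Customer moved from Nynex to a VH-1 promotion.", VOCABULARY)
```

Here `tags` holds the accepted terms "Bell Atlantic" and "VH1", so a search on either accepted term would retrieve the document even though neither string appears in it verbatim.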
59
Metadata Indexing Guidelines
Considerations
Specificity: rule of specific entry
Exhaustivity: number of terms per document
Aboutness: strive for consistent interpretation
Consistency: can be more important than quality
Quality: balance against speed and consistency
60
Metadata Comparative Analysis
Full Text (extraction): high specificity enables precision (sometimes); exhaustivity allows for high recall (sometimes)
Keyword (assignment or extraction): relatively low level of investment; selection of more relevant words / phrases may increase recall and precision (sometimes)
Controlled Vocabulary (assignment): synonym management increases recall; disambiguation increases precision (value increases with size, Medline > 6M documents); enables hierarchical and “see also” browsing
61
Metadata Cost Analysis
62
Metadata Automated Indexing
Primary Benefit: save money (cost of manually classifying 1 journal article = $1.70)
Approaches
Term Extraction: extraction of “important” words and phrases (proximity, stemming)
Latent Semantic Indexing: vector space approach (extracts meaning, training required)
Desired Features: assign terms from controlled vocabularies; integrate with thesauri, database tools, etc.; handle multi-lingual collections
63
Metadata Automated Indexing
Software Categories & Labels: Search Engines, Data Mining, Text Extraction, Knowledge Management, Automatic Classification, Meta-Tagging
Leading Products: Metacode’s Metatagger; Mohomine; Oingo; InXight Categorizer; Semio Taxonomy; Inktomi / Ultraseek CCE
64
Metadata Selecting a Strategy
Factors to Consider | Manual | Automated
Cost (per document) | High | Low
Speed | Slow | Fast
Consistency | Variable |
Quality | |
Multimedia-Capable | Yes | No
Intelligent (understand text and guidelines) | |
65
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
66
Vocabulary Control Getting Started
Types: 1. Equivalence; 2. Hierarchical; 3. Associative
Example (preferred term: Vermont)
1. Equivalence: (Variant) Vt; (Variant) Green Mountain State
2. Hierarchical: (Broader) United States; (Narrower) Burlington
3. Associative: (Related) Skiing; (Related) Maple Syrup
67
Vocabulary Control Identify Terms
Published Reference Materials: thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes
Content: representative sample of web site / intranet
Users: search log analysis, surveys, interviews
Experts: authors, subject experts
68
Vocabulary Control Organize Terms
1. Define preferred terms
2. Link synonyms and variants
3. Group preferred terms by subject
4. Identify broader and narrower terms
5. Identify related terms
Note: steps 3-5 are tentative designations and part of an iterative process.
69
Vocabulary Control Form of Preferred Terms
Grammatical Form (noun, adjective, verb)
Spelling (defined authority, house style)
Singular & Plural Form (count nouns)
Abbreviations & Acronyms (popular use)
Considerations: stemming helps (but not for mouse/mice); global guidelines / term-specific decisions; rules simplify decision-making; consistency enhances usability
70
Vocabulary Control Selection of Preferred Terms
ANSI/NISO Z 3.0
“Literary warrant (occurrence of terms in documents) is the guiding principle for selection of the preferred (term).”
“Preferred terms should be selected to serve the needs of the majority of users.”
71
Vocabulary Control Definition of Terms
The meaning of the term must be deliberately restricted.
Qualifiers (manage homographs): Cells (biology) / Cells (electric)
Scope Notes (restrict meaning): Hamburger. SN: includes burgers made with beef. Otherwise use “Turkey Burger” or “Veggie Burger”.
Definition (clarify and educate): trend towards integration of glossaries
72
Vocabulary Control Variant Terms
Variant terms provide the users with entry points into the vocabulary.
Synonyms (same meaning): cats USE felines; helicopters USE whirlybirds
Lexical Variants (different word forms): paediatrics USE pediatrics; BK USE Burger King
Quasi-Synonyms (treated as equivalent): generic posting (beagle USE dog); antonyms/continuum (wetness USE dryness)
73
Vocabulary Control Recall and Precision
74
Vocabulary Control Term Specificity
Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).
Vocabulary A: United States
Vocabulary B: United States > California > San Diego
75
Vocabulary Control Compound Terms
ANSI/NISO Z39.19. “Each descriptor…should represent a single concept.” ISO 2788. “It is a general rule that…compound terms should be factored (split) into simple elements.”
76
Vocabulary Control Compound Terms
Article: “Software for Information Architecture”
77
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
78
Structure & Relationships
Types: bottom-up (semantic, term to term); top-down (shape, classification)
Semantic Relationships (reciprocity): Equivalence; Hierarchical; Associative
79
Structure & Relationships Semantic Relationships
Example (preferred term: Settlements)
Equivalence: (Synonym) Inhabited Places; (Variant) Human Settlements
Hierarchical: (Broader) Cultural Landscapes; (Narrower) Ghost Towns
Associative: (Related) Housing; (Related) Dwellings
80
Structure & Relationships Semantic Relationships
Equivalence: Use/Used For (USE/UF)
Leads from variants to preferred terms, e.g., prams: USE baby carriages
81
Structure & Relationships Semantic Relationships
Hierarchical: Broader Term/Narrower Term (BT/NT)
Types
Generic (class/species, inheritance): Vertebrata NT Amphibia
Whole-Part (associative unless exclusive): Ear NT Vestibular Apparatus
Instance (proper name): Seas NT Mediterranean Sea
82
Structure & Relationships Semantic Relationships
Associative: Related Term (RT, See Also)
Non-hierarchical and non-equivalent
Relation should be “strongly implied”, e.g., hammers RT nails
83
Structure & Relationships Associative Relationships
Examples
Field of Study and Object of Study: Forestry RT Forests
Process and its Agent: Temperature Control RT Thermostat
Concepts and their Properties: Poisons RT Toxicity
Action and Product of Action: Weaving RT Cloth
Concepts Linked by Causal Dependence: Bereavement RT Death
84
Structure & Relationships Classification Schemes
SN Hierarchical arrangement of terms. In navigation context, use Hierarchy.
UF Categorization; Taxonomy; Ontology
RT Hierarchy
85
Structure & Relationships Pre- & Post-Coordination
Enumerative Classification Schemes: pre-coordinate (more compound terms). All terms are enumerated (listed) in their entirety in the scheme. Example: Library of Congress Classification Scheme.
Synthetic Classification Schemes: post-coordinate (more uni-terms). New terms can be created by combining terms during a search (AND). Example: Art & Architecture Thesaurus.
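Post-coordination can be sketched as a set intersection at search time: each document is indexed under simple uni-terms, and the searcher combines them with AND. The postings below are invented for illustration, and `search_all` is a hypothetical helper rather than any particular system's API.

```python
# Hypothetical postings: uni-term -> ids of documents indexed with that term.
POSTINGS = {
    "groundwater": {1, 2},
    "soil": {3, 4},
    "pollution": {1, 3},
    "purification": {4},
}


def search_all(terms, postings):
    """Post-coordinate search: documents indexed with every one of the given terms."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


soil_pollution = search_all(["soil", "pollution"], POSTINGS)
```

With these postings `soil_pollution` contains only document 3. A pre-coordinate scheme would instead need a compound heading to exist in advance for each such combination.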
86
Structure & Relationships Pre- & Post-Coordination
In the highly enumerative LC Classification, “Groundwater -- Pollution” and “Soil pollution” are dispersed at indexing (high precision, low recall). Keyword searching improves recall but hurts precision (a synthetic band-aid, with a potential false drop on “soil purification standards”).
87
Structure & Relationships Polyhierarchy
Strict Hierarchies: each term appears in only one place in the hierarchy. Essential for placement of physical objects.
Polyhierarchies: terms cross-listed in multiple categories. Accepts complex nature of reality.
88
Structure & Relationships Polyhierarchy
Medical Subject Headings (MeSH)
Compound terms needed to manage 6 million documents in Medline.
High level of pre-coordination forces polyhierarchy.
Terms may have more than one BT.
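A polyhierarchy is easy to represent if each term stores a list of broader terms rather than a single parent. The sketch below borrows the "machine aided indexing" example from the ASIS Thesaurus slide; the extra level ("information organization") is a hypothetical addition used only to show that ancestors are collected transitively.

```python
# Term -> list of broader terms (BT). More than one BT makes a polyhierarchy.
BROADER = {
    "machine aided indexing": ["computer applications", "indexing"],
    "computer applications": [],
    "indexing": ["information organization"],  # hypothetical extra level
    "information organization": [],
}


def ancestors(term, broader):
    """Collect every broader term reachable from term, following all BT links."""
    seen = set()
    stack = list(broader.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader.get(parent, []))
    return seen


all_broader = ancestors("machine aided indexing", BROADER)
```

A strict hierarchy is the special case in which every list holds at most one entry, so the same traversal works for both.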
89
Structure & Relationships Faceted Classification
Overview: invented by S.R. Ranganathan (1930s); handles complex subjects (reality); one principle of division at a time; multiple “pure” taxonomies; UF analytico-synthetic scheme, fielded database
Facets
Fundamental facets: personality, matter, energy, space, time
Common facets: subject (about), geography (in), author (by whom)
Examples: Art & Architecture Thesaurus, ASIS Thesaurus
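In a fielded database, faceted classification amounts to tagging each object once per facet and letting users combine facet values freely. The following sketch uses the three common facets named on this slide; the documents, author names, and the `filter_by_facets` helper are all invented for illustration.

```python
# Hypothetical documents tagged along three independent facets.
DOCUMENTS = [
    {"id": 1, "subject": "Skiing", "geography": "Vermont", "author": "Lee"},
    {"id": 2, "subject": "Maple Syrup", "geography": "Vermont", "author": "Lee"},
    {"id": 3, "subject": "Skiing", "geography": "Colorado", "author": "Kim"},
]


def filter_by_facets(documents, **facets):
    """Return ids of documents that match the requested value on every given facet."""
    return [
        doc["id"]
        for doc in documents
        if all(doc.get(facet) == value for facet, value in facets.items())
    ]


vermont_skiing = filter_by_facets(DOCUMENTS, subject="Skiing", geography="Vermont")
```

Each facet stays a small, "pure" list of values, yet the combinations cover cases that an enumerative scheme would have to list one by one.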
90
Structure & Relationships Facets, Coordination, Specificity
91
Structure & Relationships Yahoo
Characteristics
Single Facet (a topical hierarchy)
Fairly Enumerative (search on “Boston” finds 45 categories including: Boston Celtics, Boston Tea Party, Anonymous Account of the Boston Massacre)
Polyhierarchical (Computer listed under Computers & Internet and Science)
Observations
Huge number of categories and levels (unwieldy)
Fits user expectations (where do I find this?)
92
Structure & Relationships ASIS Thesaurus
Characteristics
Faceted (16 facets including document types, fields and disciplines, organizations, qualities)
Fairly Synthetic (large percentage of one or two word single-concept descriptors)
Polyhierarchical (machine aided indexing BT computer applications, BT indexing)
Observations
Faceted approach allows small number of terms to be combined in large number of unexpected ways (e.g., ambiguity and informatics)
Presentation is not accessible to typical user
93
Structure & Relationships A Unification Theory
Hypothesis: This hybrid information architecture will become a common model for web sites and intranets over the next several years.
Taxonomy | Thesaurus
single facet, enumerative | faceted, synthetic
fits user expectations (where did they put this?) | fits content complexity (how can I describe this?)
use for top few levels (familiar gateway to site) | populate the hierarchy (combinations, see also)
early user tests (best primary hierarchy) | ongoing user tests (leverage power, flexibility)
application of human expertise | human-software hybrid (facet-specific solutions)
94
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
95
Thesaurus Management What’s Involved?
Software, workflow, quality control
Vocabularies evolve over time
Impacts authors, indexers, users
Vocabulary Maintenance Tasks: add, delete, enhance, normalize terms; overall evaluation
96
Thesaurus Management Software: What to Look For
Traditional database functionality
Compliant with standards (ANSI, ISO)
Relationship control (reciprocity, validation, orphan identification)
Term status (proposed, provisional, accepted)
Flexible output (alphabetical, hierarchical)
Integration with related tools and tasks (indexing, searching, browsing)
Willpower’s List of Thesaurus Software
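Relationship control is the most mechanical item on this list, so it lends itself to a small sketch. Assuming relationships are stored as (source, type, target) triples, the hypothetical helpers below report reciprocal links that are missing and terms with no relationships at all; the sample triples reuse examples from the Semantic Relationships slides.

```python
# Each relationship type and the reciprocal that should accompany it.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}

RELATIONS = {
    ("Vertebrata", "NT", "Amphibia"),
    ("Amphibia", "BT", "Vertebrata"),
    ("hammers", "RT", "nails"),          # reciprocal RT is missing
    ("prams", "USE", "baby carriages"),  # reciprocal UF is missing
}


def missing_reciprocals(relations):
    """Return the reciprocal triples that should exist but do not."""
    missing = set()
    for source, rel_type, target in relations:
        expected = (target, INVERSE[rel_type], source)
        if expected not in relations:
            missing.add(expected)
    return missing


def orphans(terms, relations):
    """Return the terms that take part in no relationship at all."""
    linked = {s for s, _, _ in relations} | {t for _, _, t in relations}
    return set(terms) - linked


to_add = missing_reciprocals(RELATIONS)
```

Dedicated thesaurus software typically enforces reciprocity on entry rather than auditing afterwards, but an audit like this is useful when a vocabulary has been maintained in a spreadsheet or a general-purpose database.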
97
Thesaurus Management Software: What You’ll Find
Standards-compliant, sophisticated; poor integration (library-centric). Examples: Lexico, MultiTes
Database Management Software: strong integration; less thesaurus-specific functionality. Examples: Oracle (interMedia), Sybase (English Wizard)
98
Thesaurus Management Software What You’ll Find
Search Engines: watch for casual use of “thesaurus”; look for integration with browsing.
Ultraseek: Thesaurus Expansion for Queries: Administrators may put sets of synonyms in the thesaurus.txt file…When a query matches one of the terms in that file, the synonyms will automatically appear, so the user has the option to add it to the query.
Verity: Verity's core search products include the following advanced knowledge retrieval capabilities: advanced query expansion and disambiguation tools, including linguistic stemming and thesaurus expansion.
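The general idea of synonym-based query expansion can be sketched independently of any product. The code below is not Ultraseek's or Verity's implementation and does not reproduce the thesaurus.txt format; the in-memory `SYNONYM_SETS` structure and the `suggest_expansions` helper are assumptions made for illustration.

```python
# Hypothetical synonym sets, stored in lower case.
SYNONYM_SETS = [
    {"prams", "baby carriages"},
    {"pediatrics", "paediatrics"},
]


def suggest_expansions(query, synonym_sets):
    """Return, in sorted order, the synonyms a user could add to the query."""
    normalized = query.strip().casefold()
    suggestions = set()
    for group in synonym_sets:
        if normalized in group:
            suggestions |= group - {normalized}
    return sorted(suggestions)


options = suggest_expansions("Prams", SYNONYM_SETS)
```

Flat synonym sets like these cover only the equivalence relationship; they carry no preferred term, hierarchy, or related terms, which is why the slide warns about casual use of the word “thesaurus”.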
99
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
100
Case Study Call Center Intranet
Introduction: KM application; 6,000 users (customer care associates); 8,000 documents (hierarchy, search); 6 month project (10/97 to 4/98); $500K of $10M redesign
Goals: reduce training time / time to find; increase use / customer satisfaction
101
Case Study: Call Center Intranet Process Overview
Strategy: background, vocabulary, meetings, observation (4 weeks x 2.5 PM + 1 IA)
Design: bottom-up focus (doc types, fields, templates) (4 weeks x 2 PM + 2 IA; 4 weeks x 1 IA during implementation)
Implementation: indexing / develop controlled vocabularies; specifications (authors, indexers, developers) (16 weeks x 4 indexers + 1 IA + 2 PM subject expert)
102
Case Study: Call Center Intranet Controlled Vocabularies
Primary Vocabularies: Partners/Competitors (122); Plans/Promotions (173); Products/Services (151 / 184 variants); Geographic Codes (51)
Secondary Vocabularies: Adjustment Codes (36); Corporate Terminology (70); Time Codes (12)
103
Case Study: Call Center Intranet Primary Vocabularies
Partners/Competitors
UI | Accepted Term | LRID | Variant Terms
PC0004 | Bell Atlantic | | BellAtlantic; Bell Atlantic / North; NYNEX; Nynex
PC0091 | NLG | | National Leisure Group
PC0076 | VH1 | | Video Hits 1; VH-1
104
Case Study: Call Center Intranet Primary Vocabularies
Products/Services
UI | Accepted Term | LRID | Variant Terms
PS0135 | Access Dialing | | 10-288; ; dial around
PS0006 | Air Miles | | AirMiles
PS0151 | XYZ Direct | | USADirect; XYZ USA Direct; XYZDirect card
105
Case Study: Call Center Intranet Primary Vocabularies
Geographic Codes
CT | Connecticut |
DE | Delaware |
DC | District of Columbia | Dist. of Columbia; Dist. Columbia
Note: Continental U.S. is equivalent to the lower 48 states.
106
Case Study: Call Center Intranet Secondary Vocabularies
Adjustment Codes
DAK | Denies All Knowledge | -
MOS | Monthly Service Charge | Mnthly. Service Charge; Mnthly. Svc. Charge; Monthly Svc. Charge
WNO | Wrong Number |
WTN | Working Telephone Number | Working Tele. Number
107
Case Study: Call Center Intranet Secondary Vocabularies
Corporate Terminology
Billed Telephone Number (BTN) | Billed Tele. Number
Cross Boundary Account | Foreign Account
Fraud - Multi Level Marketing | Multi-Level Marketing; MultiLevel Marketing; MLM
World Wide Web | WWW; WorldWideWeb
108
Case Study: Call Center Intranet Blueprints
109
Case Study: Call Center Intranet Wireframes: Content
110
Case Study: Call Center Intranet Wireframes: Browsable Index
Provides ability to view all documents tagged with same preferred term. Ability to combine fields for powerful search/browse.
111
Case Study: Call Center Intranet Deliverables Overview
Blueprints and Wireframes; Controlled Vocabularies; Authoring & Indexing Guidelines; Indexed Documents (4,000); Functional Specifications; Documentation & Training
112
Section Break Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
113
Related Topics Multi-Lingual Thesauri
Concepts: Source / Target Language; Degrees of Equivalence; Localization, not Globalization
Facts (from The Mother Tongue by Bill Bryson)
There are now more students of English in China than there are people in the United States.
The French can’t distinguish house and home.
Finnish has 15 case forms (noun variants).
The Eskimos have 50 words for types of snow but no word that just means snow.
A blizzard in England is a flurry in Nebraska.
114
Related Topics The List Goes On…
Thesauri AND: Business Strategy; Content Management; Markup Languages; Notation; XML
115
Seminar Review Thesauri in Context Value of Thesauri Methodology
Metadata Vocabulary Control Structure & Relationships Thesaurus Management Case Study Related Topics
116
How To Learn More http://argus-acia.com/seminars/
Argus Center for Information Architecture Web Site
Newsletter: Strange Connections, Events, Interviews
Thesaurus Resources & Examples (user name and password both = “lajolla”)
117
Contact Us
Argus Associates, Inc.
912 North Main Street
Ann Arbor, Michigan 48104
(734)
Sales; Employment; Web Sites