
1 Copyright © 2006 Access Innovations, Inc. 1 Building Taxonomies Alice Redmond-Neal Access Innovations, Inc. Enterprise Search Summit New York City, May 21, 2006

2 So what’s a taxonomy?
Words – controlled vocabulary
Used as labels for indexing – descriptive metadata
Attached to documents, digital objects, or physical objects
Organized to aid retrieval – hierarchical structure
–Hierarchical presentation of a thesaurus

3 Perspectives on taxonomies
Taxonomist (aka Lexicographer, Thesaurus builder)
Information architect
Indexer
Searcher
Each has a different view and need for words in retrieving information.
Each need relates to using a taxonomy for indexing.

4 Taxonomies for information retrieval online
Conceptual framework for web content – reflects organization of knowledge in a domain
Foundation for information architecture
Often 3 levels deep – depends on domain
May be hidden or displayed

5 Info retrieval starts with a knowledge organization system
Ranging from not complex to highly complex:
Uncontrolled list
Name authority file
Synonym set/ring
Controlled vocabulary
Taxonomy
Thesaurus
Ontology
Semantic network
LOTS OF OVERLAP!

6 Structure of controlled vocabularies
Types, in order of INCREASING COMPLEXITY: List of words, Synonyms, Taxonomy, Thesaurus
Ambiguity control – three of the four types
Synonym control – three of the four types
Hierarchical rel’s – two types (Taxonomy, Thesaurus)
Associative rel’s – one type (Thesaurus)

7 Controlled vocabulary construction standards
ANSI (American National Standards Institute)
NISO (National Information Standards Organization)
ISO (International Organization for Standardization)
BS (British Standards, published by the British Standards Institution)
Differences are minor and diminishing.
ANSI/NISO Z39.19-2005 revision approved.

8 Taxonomy defined – ANSI/NISO Z39.19-2005*
“A controlled vocabulary consisting of preferred terms all of which are connected in a hierarchy or polyhierarchy.”
Key words highlighted on the slide: controlled, hierarchy
Missing: equivalence, homographic, and associative relationships and notes – features of a THESAURUS.
* http://www.niso.org/standards/resources/Z39-19-2005.pdf

9 Taxonomy as an organization system
Controlled vocabulary
Hierarchical format
–Parent-child relationships
Specific items appear as final leaves on hierarchy branches
Common on websites
–Pick list
–Browsable directory
–Other variations

10 Thesaurus as an organization system
Controlled vocabulary
Focus on conceptual classes, not specifics
Hierarchy – implicit if not displayed
–Parent-child relationships
Various display formats may be available
Network of relationships between terms helps user to find information
–Cousins, friends, aliases
Scope notes, term history
More elaborate and informative
Long established standards

11 Thesaurus defined – ANSI/NISO Z39.19-1993, -2005
“A controlled vocabulary of terms in natural language that are designed for postcoordination...”
“Terms are arranged…so that various relationships are displayed clearly…”
“The controlled vocabulary is established by information specialists or lexicographers and is generally employed in indexing.”

12 Thesaurus defined – ANSI/NISO Z39.19-2005
“A controlled vocabulary arranged in a known order in which equivalence, homographic, hierarchical, and associative relationships among terms are clearly displayed and identified by standardized relationship indicators, which must be employed reciprocally. Its purposes are to promote consistency in the indexing of content objects, especially for postcoordinated information storage and retrieval systems, and to facilitate browsing and searching by linking entry terms with terms. Thesauri may also facilitate the retrieval of content objects in free text searching.”

13 Standards and pragmatism
Standards are your friends
–Lead to richer, more informative product
–Promote interoperability – allow you to adopt or adapt other controlled vocabularies
–Promote predictability
–Allow repurposing within your organization and by other organizations
Follow standards for taxonomy building
–Incorporate authority files / final nodes as needed
Your taxonomy or thesaurus must meet your needs

14 Your taxonomy / thesaurus end product
Reflects
–scope of your concern
–degree of precision you need
Facilitates
–data storage and retrieval by vocabulary control
–discovery of ideas
Promotes learning
–preferred terminology
–relationships among concepts
–organized guide to your field

15 Talk about terms and taxonomies
How to choose terms
How to ensure term clarity, avoid ambiguity
–Vocabulary control – why and how
How to format terms
Terms within a taxonomy – the big picture

16 How do you choose terms?
Importance in the subject area
Use in the literature, by the organization or community
Necessary degree of specificity or detail
Relationship with other controlled vocabularies

17 Vocabulary control – why?
“The need for vocabulary control arises from two basic features of natural language, namely: two or more words or terms can be used to represent a single concept, and two or more words that have the same spelling can represent different concepts.”
ANSI/NISO Z39.19-2005

18 Vocabulary control through disambiguation
Synonyms – de-duplicate meanings
Multiple words for the same concept
–President of the United States, POTUS
–Biological technology, Biotech
Homographs (polysemes) – eliminate ambiguity
Same written word used for multiple meanings
–Balloon – which kind? Box – which kind?
–Cells, Mercury, Records, Bridge/Bridges, Bush

19 Vocabulary control – how?
Organize terms
–to show which of two or more synonymous terms is preferred or authorized for use
–to distinguish between homographs
–to indicate hierarchical and associative relationships among terms

20 Vocabulary control – in practice
Use unambiguous terms, clear to the user group
Distinguish between terms that appear similar
Use Scope Notes when necessary
Use terms as elements that can be coordinated in a flexible manner
Create compound terms (noun+modifier) when necessary

21 One term / one concept
“Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)
“Each descriptor included in a thesaurus should represent a single concept (or unit of thought). …frequently expressed by a single-word term but in many cases a multiword term is required.” (ANSI/NISO Z39.19-2005)

22 A “term” synonym ring
Term, Node, Subject heading, Category, Descriptor

23 So what’s a concept?
“A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”
Three main categories
–Abstract concepts
–Concrete entities
–Proper nouns

24 Concrete entities as terms
Things and their physical parts
–primates: head
–buildings: floors
Materials
–cement
–wood
–lead

25 Abstract concepts as terms
Actions and events
–evolution, skating, management, ceremonies
Abstract entities
–law, theory
Properties of things, materials, and actions
–strength, efficiency
Disciplines and sciences
–physics, meteorology, mathematics
Units of measurement
–pounds, kilograms, miles, meters, nanoseconds

26 Proper nouns as terms
Individual entities – “classes of one” – expressed as proper nouns
–San Francisco, Lake Michigan
Thesaurus standards prefer to exclude proper names, persons, and trade names.
Extensive lists → authority files.
Taxonomies include them as final nodes.

27 Pop quiz – which qualify as terms?
Criterion: “single unit of thought”
–rooms
–living rooms
–living room furniture
–schools
–public schools
–public school curricula
–marketing and advertising
–societal issues
–information ethics, plagiarism, credibility
–information literacy, lifelong learning

28 The term record
Main Term (MT)
Top Term (TT)
Broader Terms (BT)
Narrower Terms (NT)
Related Terms (RT)
–See also (SA)
Scope Note (SN)
History (H)
NonPreferred Term (NP)
–Used for (UF), See (S)
See Lexicographer’s lexicon: term = subject term, heading, node, category, descriptor, class
Slide labels: TAXONOMY, THESAURUS
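The fields of the term record above map naturally onto a small data structure. The following is only a sketch in Python: the class name `TermRecord` and its attribute names are assumptions made for illustration, not part of any standard or product. The sample values come from terms used elsewhere in this deck, and treating Politics as the Top Term is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One term record; attribute names are illustrative, not standardized."""
    main_term: str                                      # MT
    top_term: str = ""                                  # TT
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT / See also
    scope_note: str = ""                                # SN
    history: str = ""                                   # H
    used_for: list[str] = field(default_factory=list)   # UF (NonPreferred Terms)


# Sample records built from terms that appear in this presentation.
elections = TermRecord(
    main_term="Elections",
    top_term="Politics",  # assumed to be the top of its branch
    broader=["Politics"],
    narrower=["Gubernatorial elections", "Mayoral elections",
              "Presidential elections"],
)
spiders = TermRecord(main_term="Spiders", used_for=["Arachnids"])
```

A plain taxonomy would typically fill only the hierarchy fields, while a thesaurus would also populate the related, scope note, history, and used-for fields.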

29 Build a taxonomy – simple steps
Get paper and pencil
–Sharpen pencil
Define subject field
Collect terms
Organize terms
Fill in gaps
Flesh out and interrelate terms
You’re done!

30 Define subject field
Review representative collection of content
Determine:
–Core areas
–Peripheral topics
Example fields shown: Psychology, Education, Sociology, Law
Scope can be modified later

31 Before you go on: Build or buy?
Survey existing thesaurus/taxonomy resources for your domain
Test for
–Scope
–Depth (make-or-break terms)
–Cost
Don’t reinvent the wheel!

32 Collect terms
Your documents and databases
Departmental terminology
Textbooks and their indexes (indices)
Book tables of contents and indexes
Journal quarterly indexes
Encyclopediae
Lexicons, glossaries on the topic
Web resources
Users and experts
Search logs

33 Gather terms from search logs
Beyond the Spider: The Accidental Thesaurus (Richard Wiggins, Information Today, Oct 2002)
Top ~100 search terms from search logs
Match to web site with appropriate answer
Basis for favorites or best bets, presented at the top of results list (AKA behavior-based taxonomy)
Not a thesaurus or taxonomy, but still a useful source of terms.
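Pulling the top search terms out of a log can be sketched in a few lines of Python. This assumes the log has already been reduced to one raw query string per entry. The function name `top_queries` and the normalization (lowercasing, collapsing whitespace) are illustrative choices, not something the slide prescribes.

```python
from collections import Counter


def top_queries(log_lines, n=100):
    """Count normalized queries and return the n most frequent as (query, count)."""
    counts = Counter(
        " ".join(line.lower().split())   # lowercase and collapse whitespace
        for line in log_lines
        if line.strip()                  # skip empty entries
    )
    return counts.most_common(n)


# Toy log, invented for illustration.
sample_log = ["Parking", "parking ", "library hours", "Parking", "Library  Hours", ""]
print(top_queries(sample_log, n=2))
```

The resulting list is raw material only: each frequent query still has to be matched by hand to a best-bet page or considered as a candidate term.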

34 Organize terms – roughly
Sort terms into several major categories – logical groups of similar concepts as Top Terms
–Identify core areas and peripheral topics
–10 – 20 to start
–Consider moving proper names to authority files
Result: loose collection of terms under several main headings
–Rough and tentative – see how it fits as you go
–Initial gap analysis
–Add / modify / delete as needed

35 Labelling a concept – cognitive linguistics
Most-used labels are middle in range from abstract to specific – relates to search
Linguistic universal – true across cultures
Levels: Unique beginner, Life form, Generic, Specific, Varietal
Example: Insurance, Health insurance, Group health insurance
Practical application?

36 Craft the Top Terms
Toughest job and most important step!
Dictates further organization
Determines how browsers/searchers perceive the taxonomy
–Coverage
–Formality
Establish the concept first, tweak the wording later

37 Usefulness of a term – the “duh” factor
Some terms are so basic for a domain that they have little or no value
–“Sports” in Sports Illustrated
–“Technology” in Technology Review
–“Golf” in Golf Magazine
How useful will the term be for indexing?
–Apply to everything in the domain?
–Distinguish important concepts?
–If term is needed, specify limited use conditions in Scope Note

38 Hierarchy structures – variations on a theme
Not pre-determined
–Wines → type → variety → region → cost
–Or Wines → cost → type…
Varies by user group and needs
–May have multiple views of same content
–Standard alpha view or customized notation
Affects information architecture, i.e. how web site functions

39 How do terms relate?
Hierarchical relationships – Parents and their children
Equivalence relationships – Aliases
Associative relationships – Cousins
Slide labels: TAXONOMY, THESAURUS

40 Hierarchical relationships
Broader Term represents the category
Narrower Term represents the specific
Three types:
–Generic relationship (BTG/NTG)
–Whole-part relationship (BTP/NTP)
–Instance relationship (BTI/NTI)
BTs/NTs have a reciprocal relationship
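Reciprocity is easiest to guarantee when every BT link is written together with its matching NT link. The Python sketch below assumes a simple in-memory store. The class `Hierarchy` and its method names are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links; add() always writes both directions."""

    def __init__(self):
        self.broader = defaultdict(set)    # term -> its Broader Terms
        self.narrower = defaultdict(set)   # term -> its Narrower Terms

    def add(self, narrower_term, broader_term):
        """Record the BT and the reciprocal NT in one step."""
        self.broader[narrower_term].add(broader_term)
        self.narrower[broader_term].add(narrower_term)

    def is_reciprocal(self):
        """True if every BT link has its NT counterpart and vice versa."""
        forward = all(nt in self.narrower.get(bt, ())
                      for nt, bts in self.broader.items() for bt in bts)
        backward = all(bt in self.broader.get(nt, ())
                       for bt, nts in self.narrower.items() for nt in nts)
        return forward and backward


h = Hierarchy()
h.add("Elections", "Politics")
h.add("Presidential elections", "Elections")
```

The `is_reciprocal` check is useful mainly when links are imported from another source or edited by hand, where one direction can go missing.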

41 Broader to Narrower Terms
Politics (Generic)
–Elections (Specific)
  Gubernatorial elections, Presidential elections, Mayoral elections (Varietal)

42 Hierarchy – Generic (genus-species) relationship
Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)
Applies to entities, actions, properties, agents – not just biological taxonomies
Value
–Cultural value
–Economic value
–Moral value
–Social value
Teachers
–Adult educators
–School teachers
–Special ed teachers
–Student teachers
Thinking
–Contemplation
–Divergent thinking
–Lateral thinking
–Reasoning

43 Generic relationship test – 1
Both terms in same fundamental category
“All-and-some” test
–Rodents / Squirrels: SOME rodents are squirrels, ALL squirrels are rodents
–Pests / Squirrels: SOME pests are squirrels, NOT ALL squirrels are pests

44 Generic relationship test – 2
Terms: Pests, Squirrels, Rodents
Passes: ALL squirrels are rodents
Fails: NOT ALL squirrels are pests
Fails: NOT ALL pests are rodents
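The all-and-some test can be read as a set comparison: the narrower class must sit entirely inside the broader class without filling it. The Python sketch below uses invented toy membership sets purely for illustration. The function name `all_and_some` is an assumption, and in practice the judgment is made by an editor, not computed.

```python
def all_and_some(narrower_members, broader_members):
    """ALL members of the narrower class are in the broader class,
    and only SOME of the broader class are in the narrower one."""
    narrower_members = set(narrower_members)
    broader_members = set(broader_members)
    # Non-empty proper subset: all of narrower, but not all of broader.
    return bool(narrower_members) and narrower_members < broader_members


# Toy membership sets, invented for illustration only.
squirrels = {"gray squirrel", "red squirrel"}
rodents = squirrels | {"house mouse", "beaver"}
pests = {"gray squirrel", "house mouse", "cockroach"}

print(all_and_some(squirrels, rodents))  # Squirrels under Rodents
print(all_and_some(squirrels, pests))    # Squirrels under Pests
print(all_and_some(pests, rodents))      # Pests under Rodents
```

Only the first pairing passes, which matches the slide: Squirrels can take Rodents as a Broader Term, but not Pests.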

45 Hierarchy – Whole-part relationship
Also known as meronymy or partonomy
Four types allowed in thesaurus standards
–Body systems and organs: Ear → Middle ear
–Geographical locations: Bernalillo County → Albuquerque
–Fields of study: Geology → Physical geology
–Hierarchical organizational/corporate/social/political structures: Diocese → Parish

46 Hierarchy – Instance relationship
General category (common noun) = BT
Individual example (proper noun) = NT
Seas
–Baltic Sea
–Caspian Sea
–Mediterranean Sea
New York museums
–Guggenheim Museum
–Museum of Modern Art
–Museum of Natural History
Essentially identical to “final node” in taxonomies.
Best practice: long list → move to authority file

47 Polyhierarchical relationship
Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
New to ANSI/NISO standards
–Sporks: BT Spoons, BT Forks
–Nurse administrators: BT Nurses, BT Health administrators
–Accounting: BT Finance, BT Careers

48 Equivalence relationship
Preferred Term
–Thesaurus term and valid for indexing
–Thesaurus notation: USE
NonPreferred Term
–Not valid for indexing
–An alias or imposter
–Entry point, directs user to Preferred Term
–Thesaurus notation: UF or NPT
Examples
–Spiders UF Arachnids
–Plant pathology USE Phytopathology
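The USE / UF pair behaves like a lookup table read in two directions. The Python sketch below stores only the USE direction and derives UF from it, so the two cannot disagree. The names `USE`, `preferred`, and `used_for` are illustrative, and the entries are the two examples from the slide.

```python
# NonPreferred Term -> Preferred Term (the USE reference)
USE = {
    "Arachnids": "Spiders",
    "Plant pathology": "Phytopathology",
}


def preferred(term):
    """Follow a USE reference; a term with no entry is returned unchanged."""
    return USE.get(term, term)


def used_for(preferred_term):
    """Reciprocal UF list, derived from USE rather than stored separately."""
    return sorted(np for np, pt in USE.items() if pt == preferred_term)


print(preferred("Arachnids"))
print(used_for("Phytopathology"))
```

An indexing or search front end could call `preferred` on whatever the user types, so that a NonPreferred Term works as an entry point without ever being assigned to a document.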

49 Equivalence – when to use
Synonyms, slang, quasi-synonyms
Scientific and trade names
–Ibuprofen UF Motrin™
Lexical variants
–Fiber optics UF Fibre optics
–Mouse UF Mice
Upward posting of narrow concepts not specified in taxonomy or thesaurus
–Social class UF Elite, Middle class, Working class
Get equivalent terms from search logs, brainstorming…

50 Associative relationship
Related Terms (RTs) ~ cousins
“…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
–Should siblings be Related Terms??
Both terms are valid thesaurus terms for indexing, and have reciprocal relationship
Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
Standards describe specific types (see Lexicon)

51 Sibling rivalry and facets
Format and sense of sibling terms should be consistent
If siblings don’t coexist well, separate them
Subdivide large groups of terms into facets, mutually exclusive subcategories
Growing demand with faceted navigation
Facet examples
–Properties, Materials, Agents, Actions, Influence
–Objects, Styles and periods, Color, Shape (Art & Architecture Thesaurus)

52 Faceted classification
Pharmaceuticals
–(by action) Anti-inflammatory agents…
–(by chemical structure) Alkaloids…
–(by indication) Pain…
–(by use) Immunosuppression…
Facet indicators (aka Node labels), not to be used for indexing

53 Faceting challenge
Paint
–Oil paint
–High-gloss paint
–Interior paint
–Matte paint
–Latex paint
–Semi-gloss paint
–Exterior paint
Propose facet indicators and subgroup these paint varieties into facets.
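One possible answer to the challenge, written as a Python sketch so the grouping can be checked mechanically. The facet indicator wording is an assumption (only "by finish" appears elsewhere in this deck), and other groupings could be equally defensible. The helper `check_facets` is hypothetical and simply confirms that every term lands under exactly one facet.

```python
PAINTS = [
    "Oil paint", "High-gloss paint", "Interior paint", "Matte paint",
    "Latex paint", "Semi-gloss paint", "Exterior paint",
]

# Facet indicators (node labels) are not themselves used for indexing.
FACETS = {
    "(by base)": ["Latex paint", "Oil paint"],
    "(by finish)": ["High-gloss paint", "Matte paint", "Semi-gloss paint"],
    "(by location of use)": ["Exterior paint", "Interior paint"],
}


def check_facets(facets, terms):
    """True if each term appears under exactly one facet and none is left out."""
    placed = [term for group in facets.values() for term in group]
    return sorted(placed) == sorted(terms)


print(check_facets(FACETS, PAINTS))
```

Because facets are meant to be mutually exclusive subcategories, a term showing up twice or not at all makes the check fail.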

54 Scope Notes (SN)
Indicate meaning of the term in the context of this thesaurus, for this audience
–Stress – Metal, Psychological, Physiological
Indicate any restriction in meaning
Indicate range of topics covered
Provide direction for indexers; for terms often confused, may suggest an alternative term
Use only as needed – not for every term
Establish and stick with consistent format
Be concise

55 Evaluating terms
Do terms represent all necessary concepts?
–Gap analysis
Do terms capture necessary details?
–Level of granularity
Are terms understood by users?
–Domain expert vs. common user

56 Talk about terms
Term format
Grammatical issues
Singular and plural forms
Spelling
Abbreviations and acronyms
Capitalization
Other punctuation
Consistency

57 Term format
KISS – Keep it short and simple
–1-2-3 words
Effect on search
Factoring, Postcoordination (coming)
Grammatical issues
–Nouns and noun phrases
–Verbish things
–Adjectives
–Adverbs
–Initial articles

58 Most terms are nouns
Nouns or simple noun phrases (phrase = compound or bound term)
–Adj + Noun – Art history (ANSI/NISO standard)
–Noun + Prep + Noun – History of art (ISO standard)
–Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.

59 Other parts of speech
Verbs
–Gerund form: Fishing
Adjectives
–Not used in isolation
–Very rare (lots in Art & Architecture Thesaurus)
–OK when combined with another term – Dental bridges
Adverbs
–No, except as part of proper name – Very Large Array
Articles
–No, except as part of proper name – El Salvador, Le Mans

60 Singular and plural forms
Plural form for count nouns
–“how many” clouds, animals, highways
Singular form for mass nouns
–“how much” security, oxygen, rain
Exceptions
–Body parts in medicine → singular (heart, foot)
–Unique entities → singular (Brooklyn Bridge)
–User warrant → plural/singular (fishes)
stocks? fishes? monies?

61 Term spelling
Preferred spelling depends on audience
–Multinational company may need alternative spellings in same taxonomy
Use most widely accepted spelling
Use secondary spelling as NonPreferred Term (synonym)
Exception:
–Proper names – Labour Party

62 Abbreviations and acronyms
Use only when full form is rarely seen – SCUBA, LASER, DNA, LASIK
Use full form if abbreviation is not widely used and understood
–Automated teller machines – for ATM
–Driving while intoxicated – for DWI
Alternative becomes NonPreferred Term
Use and acceptance always shifting
Be consistent

63 Capitalization
Standards: use all lower case
–Exceptions:
  Initialisms – DNA
  Proper names – Queen Mary
  Trade names – Thesaurus Master™
  Taxonomic names – Homo sapiens
Much variation in practice

64 Parentheses
Use only for
–Parenthetical qualifiers to disambiguate homographs
  Bridges (Dentistry), Bridges (Roadways), Bridges (Music)
–Different meanings for singular / plural word forms
  Bridges [all the above] vs. Bridge (Card game)
  Wood (Material) vs. Woods (Forest)
  Damage (Injury) vs. Damages (Law)
–Facet indicators – Paint (by finish)
–Part of the term – benzo(a)pyrene
–Trademark indicator (tm) becomes ™

65 Hyphens
Generally avoid – nonfiction
Use only if
–Omitting the hyphen would be ambiguous
  cocitation vs. co-occurrence
–The hyphen is part of the term
  n-body problem
  p-benzoquinone
  CD-ROM

66 Other punctuation bits
Apostrophes
–Keep for possessive case
Diacritical marks
–Keep if possible – Québec
Other random marks
–Keep if part of a proper name – A&W Root Beer, Standard & Poor’s

67 Compound terms (aka bound terms) and factored terms
Term consisting of more than one word that represents a single concept
Keep compound term or factor out (split)?

68 Compound terms are precoordinated
Elements are bound together to specify a concept at the indexing stage
Can’t change the parts
–Water pollution
–Library science
–Television influence on preschoolers
Chicken dinner with turnips and rutabagas – no substitutions of menu items!

69 Factored terms can be postcoordinated
Elements can be strung together to specify a concept at the search stage
Elements can be mixed and combined as needed
–Few clothing pieces → several outfits
The sum of the elements reflects the concept (usually)

70 To factor or not to factor
Is each factor a single concept?
Is each factor in your thesaurus?
If YES, break term down to factors:
–California highway construction → California + Highways + Construction
If NO, or if factoring would be confusing, retain the compound term
–Children’s television → Television + Children ??
–Science library → Library + Science ??
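The decision rule above can be sketched as a small Python function. Everything named here is an assumption for illustration: the toy `THESAURUS` set, the `WORD_TO_TERM` lookup that maps a word as written to its thesaurus form, and the `keep_bound` override standing in for the editor's judgment that factoring "would be confusing".

```python
# Toy thesaurus and word-to-term lookup, invented for illustration.
THESAURUS = {"California", "Highways", "Construction", "Science", "Library"}
WORD_TO_TERM = {
    "california": "California",
    "highway": "Highways",
    "construction": "Construction",
    "science": "Science",
    "library": "Library",
}


def factor(compound, keep_bound=frozenset()):
    """Split a compound into thesaurus terms if every word maps to one;
    otherwise, or if the editor has marked it as bound, keep it whole."""
    if compound in keep_bound:
        return [compound]
    factors = [WORD_TO_TERM.get(word.lower()) for word in compound.split()]
    if all(f in THESAURUS for f in factors):
        return factors
    return [compound]


print(factor("California highway construction"))
print(factor("Rapid transit"))
print(factor("Science library", keep_bound={"Science library"}))
```

The `keep_bound` set matters because the mechanical test alone would happily split "Science library", which the slide flags as a questionable case.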

71 Precoordination positives
User expectations – Rapid transit
–Occurs commonly in data
–Splitting would be odd
–Reflects a single concept for the audience
Better accuracy – captures specific concepts precisely
Fewer false drops
Term information is retained (Related Terms, NonPreferred Terms, Scope Notes, …)

72 Precoordination negatives
Poorer total recall
Term proliferation
–Combinations and permutations increase thesaurus size
Higher cost
Limited flexibility in expressing new concepts

73 Postcoordination pros and cons
Pros
–Higher recall
–Lower cost
–Greater flexibility – enables expression of new concepts through novel combinations
Cons
–Lower accuracy, some false drops
  Library science NOT = Library + Science
  Art museums NOT = Art + Museums
Postcoordination is implicit in most online searches (implied AND between search words)
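The implied AND, and the false drop it can cause, can be shown with a toy index in Python. The document ids, their assigned terms, and the function name `search_postcoordinated` are all invented for illustration.

```python
# Toy index: document id -> index terms assigned to it (invented data).
INDEX = {
    "doc1": {"Library science"},       # about the discipline
    "doc2": {"Library", "Science"},    # e.g. a library that collects science books
    "doc3": {"Art museums"},
}


def search_postcoordinated(*terms):
    """Implied AND: a document matches if it carries every query term."""
    wanted = set(terms)
    return sorted(doc for doc, assigned in INDEX.items() if wanted <= assigned)


print(search_postcoordinated("Library", "Science"))
print(search_postcoordinated("Library science"))
```

A searcher who wants the discipline but combines the two factors gets doc2, not doc1: the factors are both present, yet their sum is not the compound concept.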

74 About “and”
Avoid “and” in terms – not a single concept
–Instead of: Children and television
–Factor and postcoordinate: USE Media influence + Television + Children
“and” OK when both elements are members of a broader class
–Vessels: Ships and boats
Your need for granularity may dictate your choice

75 So far you’ve got
Hierarchy
Complete term records
–Broader and Narrower Terms
  Polyhierarchies when needed
–Preferred/NonPreferred Terms (equivalence relationships)
–Related Terms (associative relationships)
–Scope Notes
–Correct term format
–Compound terms when needed

76 Notation
Symbols (numbers, letters, hyphens, colons…)
–1: Apples
  1.1: Granny Smith
  1.2: Winesap
Another kind of ordering (non-alphabetic)
–Chronological, positional, numeric sequence, or other logical sequence for user group
–Same terms presented differently
–Different user groups, different purposes
Adjunct to verbal expression of term
Secondary to verbal concept organization
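Dotted notation like the Apples example can be generated from the hierarchy itself. The Python sketch below assumes the hierarchy is held as nested dictionaries whose insertion order is the intended display order. The function name `notate` is illustrative.

```python
def notate(tree, prefix=""):
    """Yield (notation, term) pairs, e.g. ('1.2', 'Winesap'), from nested dicts."""
    for position, (term, children) in enumerate(tree.items(), start=1):
        code = f"{prefix}.{position}" if prefix else str(position)
        yield code, term
        yield from notate(children, code)   # recurse into Narrower Terms


tree = {"Apples": {"Granny Smith": {}, "Winesap": {}}}
for code, term in notate(tree):
    print(f"{code}: {term}")
```

Because the codes are derived from position, reordering the tree for a different user group produces a different notation over the same terms.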

77 Review, edit, test, edit, use, edit, and maintain, i.e. edit
Review
–Users
–Expert reviewers
Test
–Index 500+ documents (more for variable writing style; fewer for strict style)
–Monitor search log
Edit and maintain
–Add term
–Change existing term
–Change term status
–Delete term
–Add term relationship
–Delete term relationship
–Add/modify Scope Note
–Change overall structure
Consider machine automated / assisted indexing software

78 Automatic taxonomy construction
Words and phrases from documents
Based on frequency and co-occurrence of words
No semantic analysis
Produces list of possible terms
Requires editorial analysis
–hierarchical and conceptual organization
–association of related concepts
–identifying and deduplicating equivalent concepts
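A bare-bones version of frequency and co-occurrence counting might look like the Python sketch below. It is an assumption-laden illustration, not how any particular product works: words are split with a simple regular expression, each word counts once per document, and there is no stop-word list or phrase detection.

```python
import re
from collections import Counter
from itertools import combinations


def candidates(documents, min_count=2):
    """Return (frequent words, frequent co-occurring word pairs)."""
    term_freq, pair_freq = Counter(), Counter()
    for text in documents:
        words = sorted(set(re.findall(r"[a-z]+", text.lower())))
        term_freq.update(words)                    # document frequency per word
        pair_freq.update(combinations(words, 2))   # pairs sharing a document
    terms = sorted(w for w, c in term_freq.items() if c >= min_count)
    pairs = sorted(p for p, c in pair_freq.items() if c >= min_count)
    return terms, pairs


docs = ["Squirrels are rodents", "Rodents and squirrels", "Pests"]
print(candidates(docs))
```

The output says only that two words turn up together. Whether they are parent and child, related terms, or synonyms is exactly the editorial analysis the slide says is still required.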

79 Show ‘em what you’ve got – displays for every user
Thesaurus/taxonomy views and functions depend on audience and purpose
–taxonomists
–indexers
–corporate workers
–public searchers

80 For the taxonomist
Hierarchy view
Alphabetic view
Permuted (KWIC) view
Single term record view
Graphical view
Notational view
Deleted terms
Candidate terms
Retrieve term record
Find term in hierarchy view
Taxonomists NEED MOST and WANT even MORE!
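Of the views listed, the permuted (KWIC) view is the one most easily derived by program: each term is listed once under every word it contains. A minimal Python sketch follows, with an assumed function name `kwic` and terms taken from the elections example earlier in the deck.

```python
def kwic(terms):
    """Permuted index: one (keyword, term) entry per word of each term,
    sorted by keyword and then by term."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.lower(), term))
    return sorted(entries)


for keyword, term in kwic(["Presidential elections", "Elections", "Mayoral elections"]):
    print(f"{keyword:15} {term}")
```

A taxonomist scanning the "elections" block sees every term that contains the word, wherever it sits in the hierarchy, which helps spot duplicates and inconsistent wording.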

81 Hierarchy Alphabetical Permuted (KWIC) Term record

82 Notation view

83 For the indexer
Search to retrieve term record
Access to Scope Notes, Related Terms, NonPreferred Terms
Hierarchy view for the big picture
Automated proposal of indexing terms

85 For the searcher
Browsable directory (Yahoo.com, MediaSleuth.com)
Faceted navigation (MOMA.org, LandsEnd.com)
Alpha term list or terms grouped by letter
Drop down list with selected terms
Portal view – complete or partial taxonomy
–Display terms may be identical to taxonomy terms
–Display terms may be variants, mapped to taxonomy terms
Taxonomy may not be accessible – requires random guessing

86 Display taxonomy categories Results from sample of 1,100 documents (not all categories are populated)

87 Reveal Narrower Terms

88 Select taxonomy category to display titles

89 Access full bibliographic record

90 Faceted navigation

91 SLA website and thesaurus

92 SLA search

93 Concept indexing – effect on retrieval
Search query: THESAURUS
Precision search based on M.A.I. indexing: 3 hits
Free text, no indexing → 0 hits

95 Leverage taxonomy term information to aid search
Search: kangaroo
–Broader Terms
–Narrower Terms
–Related Terms
–Use (synonyms)

96 Indexing rule Term record

97 What we’ve covered
Taxonomy – from different perspectives
Collecting and organizing concepts
Term choice and vocabulary control
Taxonomy structure
Term relationships
Term format
Factored and compound terms
Constructing a simple taxonomy
Display variations for different users

98 “The Computer and the Poet”
“The biggest single need in computer technology is not for improved circuitry, or enlarged capacity, or prolonged memory, or miniaturized containers, but for better questions and better use of answers.”
Norman Cousins, editorial in The Saturday Review, July 23, 1966 special issue on “The New Computer Age”
Through taxonomies, effectively applied through indexing, we aim to efficiently connect the questions and the answers.

99 Thanks for your attention!
Alice Redmond-Neal
ared@accessinn.com
Access Innovations, Inc. www.AccessInn.com
Data Harmony software www.DataHarmony.com
Questions? Comments?

