Taxonomies, Lexicons and Organizing Knowledge Wendi Pohs, IBM Software Group
Agenda. Benefits, business and technical; A few definitions; Planning; Issues; Measuring value; Futures; Q&A
The Mantra. Knowledge is in the eye of the beholder, but reflecting end user needs is as critical as representing texts... and it takes work!
Business Benefits. Mergers and acquisitions; Research and development; Industries: Consulting, Pharmaceuticals, Financial services, Legal. "If only I could find information to help me do my job better..."
Technical Benefits. Site creation; Navigation/search; Personalization; Defining areas of expertise
Definitions: Taxonomy. “The science, laws or principles of classification” (from the Greek: rules of arrangement); Biology (Linnaeus); Education (Bloom); A hierarchical collection of categories and documents; Structure and content
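The "hierarchical collection of categories and documents" can be pictured as a tree. The following is a minimal sketch, not anything shown in the talk: the `Category` class, its field names, and the sample labels and file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One taxonomy node: a label, child categories, and documents filed here."""
    label: str
    children: list["Category"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, label: str) -> "Category":
        """Create a narrower category under this one and return it."""
        child = Category(label)
        self.children.append(child)
        return child

    def all_documents(self) -> list[str]:
        """Documents filed in this category or in any descendant category."""
        found = list(self.documents)
        for child in self.children:
            found.extend(child.all_documents())
        return found


# Hypothetical sample data: structure (categories) plus content (documents).
root = Category("Industries")
pharma = root.add_child("Pharmaceuticals")
root.documents.append("overview.txt")
pharma.documents.append("trial-report.txt")
```

Browsing a category then means listing `all_documents()` for that node, which is where "structure and content" meet.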
Definitions: Directory. More general than taxonomy; Natural structure; Wide vs deep; Category structure less controlled; Examples: File system, Yahoo (http://www.yahoo.com), Yellow Pages, Corporate Web sites (http://www.ibm.com)
Definitions: Thesaurus. Controlled vocabulary; Subject headings, labels; Synonyms (U, UF); Relation types (TT, BT, NT, SN, HN, RT, SA); Examples: http://www.loc.gov/flicc/wg/taxonomy.html
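A thesaurus entry can be sketched as a small record, reading the abbreviations in their usual thesaurus sense (U = use, UF = used for, BT = broader term, NT = narrower term, RT = related term, SN = scope note, TT = top term). The `Term` class, the two helper functions, and the sample terms below are hypothetical illustrations, not part of the presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    label: str                                          # preferred term
    broader: Optional[str] = None                       # BT
    narrower: List[str] = field(default_factory=list)   # NT
    related: List[str] = field(default_factory=list)    # RT
    used_for: List[str] = field(default_factory=list)   # UF (synonyms)
    scope_note: str = ""                                # SN


def preferred(thesaurus: Dict[str, Term], word: str) -> Optional[str]:
    """Follow a U (use) reference: map an entry word to its preferred term."""
    wanted = word.lower()
    for term in thesaurus.values():
        synonyms = [s.lower() for s in term.used_for]
        if wanted == term.label.lower() or wanted in synonyms:
            return term.label
    return None


def top_term(thesaurus: Dict[str, Term], label: str) -> str:
    """TT: climb BT links until a term has no broader term (guards against cycles)."""
    seen = {label}
    current = thesaurus[label]
    while current.broader is not None and current.broader not in seen:
        seen.add(current.broader)
        current = thesaurus[current.broader]
    return current.label


# Hypothetical sample vocabulary.
thesaurus = {
    "Vehicles": Term("Vehicles", narrower=["Automobiles"]),
    "Automobiles": Term(
        "Automobiles",
        broader="Vehicles",
        used_for=["Cars", "Motor cars"],
        scope_note="Passenger vehicles only.",
    ),
}
```

Here a search for "cars" would be redirected to "Automobiles", which is the point of controlling the vocabulary.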
Definitions: Meta-data and tagging. Meta-data: Properties, attributes; Information describing types of data [Crandall]; The ‘energy’ required to keep things organized [Earley]; Tagging, Document Properties
Definitions: Classification. Analyzing documents and assigning them to predefined categories; Rule-based vs natural; Classification schemes: Dewey, Library of Congress, Industry-specific
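The rule-based side of classification can be illustrated with keyword rules that assign a document to every predefined category whose keywords it mentions. This is only a sketch under assumed rules: the `RULES` table, the `classify` function, and the keywords are made up for the example, and real rule languages are usually richer.

```python
import re

# Hypothetical rules: category name -> keywords that trigger it.
RULES = {
    "Legal": {"contract", "litigation", "patent"},
    "Financial services": {"loan", "equity", "audit"},
}


def classify(text, rules):
    """Return, in alphabetical order, every category with a keyword in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(category for category, keywords in rules.items() if words & keywords)
```

For instance, `classify("The patent litigation delayed the audit.", RULES)` matches both categories, while a text with none of the keywords gets an empty list and would need a rule change or manual filing.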
Definitions: Clustering. Automatically generating groups of similar documents based on distance or proximity measures; "Bags of words"; Vector analysis determines boundaries; Adaptive, but not abstract
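To make "bags of words" and proximity measures concrete, here is a minimal sketch that turns each document into a word-count vector and groups documents by cosine similarity. The single-pass grouping, the `threshold` value, and the sample texts are assumptions chosen for brevity, not the method of any particular product.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.5):
    """Single pass: join the first cluster whose seed document is similar enough,
    otherwise start a new cluster. Returns lists of document indices."""
    clusters = []  # each entry: (seed vector, [document indices])
    for index, text in enumerate(docs):
        vector = bag_of_words(text)
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = [
    "server crash after software update",
    "software update caused server crash",
    "quarterly loan audit results",
]
groups = cluster(docs)
```

The result is a set of unlabeled groups of indices, which is one way to read "adaptive, but not abstract": the groups follow the content, but nothing names them.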
Develop a Plan. Determine user information needs; Information audit, Content audit; Select appropriate sources; Create initial taxonomy; Edit categories; Categorize new documents; Test the UI; Train the taxonomy
Plan: Information audit. What is the objective of the system? Who owns the project? What do users need? What do content creators need? What do system managers need?
Plan: Content audit. Is there an existing taxonomy? How clean is the meta-data? Is the content suited to automatic classification techniques? Good example: Notes discussion databases. Not-so-good example: Web site with little text, lots of links. Is a subset of a source better than the whole?
Plan: Select sources. Which sources? Who owns them? Which sources do users access most often? How do users access these sources? What is the lifecycle of the content? Who identifies the most current content?
Plan: Maintenance. Resources. Centralized or department-level. Who decides when new content is added? Term approval process. How do new concepts get into the taxonomy?
Identify issues. Getting user involvement and buy-in; Maintenance resources; Directory versus taxonomy; Meta-data; Globalization and regionalization; Hidden vs published taxonomies
Understand the BIG issues. Organizational “perfection complex” [Chait]; Multiple taxonomies; Automated versus manual categorization
Multiple taxonomies. Many editors; Term approval process, synonyms; Standard tools across the enterprise; Federated taxonomies; Taxonomy links, “cross-connections,” facets, views; Taxonomy mapping
Measuring value. NCR Corporation - Support Organization; Needed to convince organization of the value of captured content; Managers resisted diverting resources to maintaining content; Current measure: Time per incident; How could the value of a knowledge classification system be demonstrated?
Measuring value. NCR developed a new parameter: Knowledge helpful (the answer was in the support database and was used to solve the problem); Knowledge not effective (the answer sent them in the wrong direction, did not help to address the issue); Knowledge not available (nothing available to assist in solving the problem); Knowledge not required (problem solved without the use of the knowledge base)
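One way to turn the four-value parameter into a number that managers can track is to record one value per closed incident and report each value's share. The sketch below is an assumption about how that could be tallied, not NCR's actual system; the `KnowledgeOutcome` enum, `outcome_shares`, and the sample incidents are hypothetical.

```python
from collections import Counter
from enum import Enum


class KnowledgeOutcome(Enum):
    HELPFUL = "knowledge helpful"
    NOT_EFFECTIVE = "knowledge not effective"
    NOT_AVAILABLE = "knowledge not available"
    NOT_REQUIRED = "knowledge not required"


def outcome_shares(outcomes):
    """Fraction of incidents recorded with each outcome (all 0.0 if there are none)."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {o: (counts[o] / total if total else 0.0) for o in KnowledgeOutcome}


# Hypothetical sample: one recorded outcome per closed incident.
incidents = [
    KnowledgeOutcome.HELPFUL,
    KnowledgeOutcome.HELPFUL,
    KnowledgeOutcome.NOT_AVAILABLE,
    KnowledgeOutcome.NOT_REQUIRED,
]
shares = outcome_shares(incidents)
```

A rising "helpful" share argues for the value of captured content, while "not available" and "not effective" point at gaps and errors that maintenance effort should address.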
Futures. Methods: Feature extraction, statistical analysis, rules-based, label generation; Starter taxonomies, imports; Taxonomy mapping; Interfaces: Visualization, better training tools