Taxonomies, Lexicons and Organizing Knowledge
Wendi Pohs, IBM Software Group, wpohs@us.ibm.com
Infotoday 2003 Content Management Symposium
May 8, 2003
11/8/2018
Agenda
- Benefits, business and technical
- Definitions
- Planning and Implementation
- Issues
- Futures
- Q&A
The Mantra
Knowledge is in the eye of the beholder, but reflecting end user needs is as critical as representing texts... and it takes work!
Business Benefits - Lifecycle Integration
- eLearning: Technology, Government, Pharmaceutical
- Regulatory Compliance: Pharmaceutical, Government
- Corporate accountability: Financial, Life Sciences
- Intellectual Capital Management: Consulting, Law firms, Financial
- Innovation, Discovery: Government, Pharmaceutical, Retail, Technology
Technical Benefits
- Integration with content management systems
- Site creation
- Site navigation
- Enhance full text search
- Gap analysis
- Personalization
- Defining skills, areas of expertise
Definitions: Taxonomy
- “The science, laws or principles of classification” (from the Greek: rules of arrangement)
- Biology (Linnaeus)
- Education (Bloom)
- A hierarchical collection of categories and documents
- Structure and content
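
The "hierarchical collection of categories and documents" can be pictured as a simple tree. The following is a minimal sketch only: the `Category` class, its field names, and the sample labels and file name are illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One taxonomy node: a label, child categories, and assigned documents."""
    label: str
    children: list["Category"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, label: str) -> "Category":
        """Create a child category under this one and return it."""
        child = Category(label)
        self.children.append(child)
        return child

    def paths(self, prefix: tuple[str, ...] = ()):
        """Yield (path, documents) for this category and every descendant."""
        here = prefix + (self.label,)
        yield here, self.documents
        for child in self.children:
            yield from child.paths(here)


# Hypothetical sample data, loosely modeled on a Linnaean hierarchy.
root = Category("Animalia")
chordata = root.add_child("Chordata")
mammalia = chordata.add_child("Mammalia")
mammalia.documents.append("field-guide.pdf")

for path, docs in root.paths():
    print(" > ".join(path), docs)
```

The structure (the tree of labels) and the content (the documents attached to each node) stay separate here, which mirrors the "structure and content" distinction on the slide.
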
Definitions: Lexicon
- A word book or dictionary
- Vocabulary of a particular field of study
- Keywords, synonyms, jargon
Definitions: Directory
- More general than taxonomy
- Natural structure
- Wide vs deep
- Category structure less controlled
- File system
- Yahoo (http://www.yahoo.com)
- Yellow Pages
- Corporate Web sites (http://www.ibm.com)
Definitions: Thesaurus
- Controlled vocabulary
- Subject headings, labels
- Synonyms (U, UF)
- Relation types (TT, BT, NT, SN, HN, RT, SA)
- Examples: http://www.loc.gov/flicc/wg/taxonomy.html
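
As an illustration of how a few of these relation types behave, here is a minimal sketch covering BT/NT (broader/narrower term), RT (related term), and U/UF (use/used for). The `Thesaurus` class, its method names, and the sample terms are assumptions made for this example; the remaining relation types are not modeled.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal controlled vocabulary with BT/NT, RT, and U/UF relations."""

    def __init__(self):
        # term -> relation code -> set of terms
        self.relations = defaultdict(lambda: defaultdict(set))
        # non-preferred term -> preferred term (the "U" / USE pointer)
        self.use = {}

    def add_broader(self, term, broader):
        # BT and NT are reciprocal: recording one implies the other.
        self.relations[term]["BT"].add(broader)
        self.relations[broader]["NT"].add(term)

    def add_related(self, a, b):
        # RT is symmetric.
        self.relations[a]["RT"].add(b)
        self.relations[b]["RT"].add(a)

    def add_synonym(self, preferred, variant):
        # UF is recorded on the preferred term; U points back from the variant.
        self.relations[preferred]["UF"].add(variant)
        self.use[variant] = preferred

    def preferred(self, term):
        """Resolve a variant to its preferred term; other terms pass through."""
        return self.use.get(term, term)


# Hypothetical sample terms.
t = Thesaurus()
t.add_broader("Automobiles", "Motor vehicles")
t.add_related("Automobiles", "Roads")
t.add_synonym("Automobiles", "Cars")
print(t.preferred("Cars"))
```
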
Definitions: Meta-data and Tags
- Properties, attributes: information describing types of data [Crandall]
- The ‘energy’ required to keep things organized [Earley]
- Tags: <META>, <Source>
- Document Properties: $CreatedBy
Definitions: Classification
- Analyzing documents and assigning them to predefined categories
- Rule-based vs natural
- Statistical vs semantic
- Classification schemes
  - Dewey
  - Library of Congress
  - Industry-specific
Planning: Initial Analysis
- Determine user needs through content and knowledge audits
  - What is the objective of the system?
  - What are typical "day in the life" scenarios?
  - Do you need to comply with existing standards?
- Select representative content
  - No need to include every document in every source
  - Look for a subset of documents with
    - Good meta-data (Titles, Authors)
    - Rich, representative body text
Implementation: Creation and Strategy
- Create an initial taxonomy
  - On paper, on a whiteboard, in a spreadsheet
  - Look at existing databases, Web sites, org charts
  - Reuse good, representative categories
  - Prototype taxonomy structure (flat, hierarchy, associative)
- Review the initial taxonomy
- Determine a categorization strategy
  - Rules-based, keyword-based, statistical, others
- Review taxonomy creation, content management tools
- Consider resource requirements for taxonomy maintenance
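
To make one of the categorization strategies concrete, the sketch below shows a very simple keyword-based approach. It is only an illustration: the `RULES` table, the category names chosen for it, and the `categorize` function are assumptions for this example, and a production system would likely need stemming, phrase matching, and weighting.

```python
import re

# Hypothetical rules: each category lists keywords that signal it.
RULES = {
    "Regulatory Compliance": {"audit", "regulation", "compliance"},
    "eLearning": {"course", "training", "curriculum"},
}


def categorize(text, rules=RULES, min_hits=1):
    """Return categories whose keywords appear in the text, best match first."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Score each category by the number of distinct keywords found.
    scores = {cat: len(words & keywords) for cat, keywords in rules.items()}
    hits = [(cat, n) for cat, n in scores.items() if n >= min_hits]
    # Highest score first; ties are broken by category label.
    return [cat for cat, _ in sorted(hits, key=lambda item: (-item[1], item[0]))]


print(categorize("New compliance training course and curriculum"))
print(categorize("Quarterly sales report"))
```

Raising `min_hits` trades recall for precision, which is one of the knobs worth testing against representative content before committing to a strategy.
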
Implementation: Testing and Maintenance
- Test the taxonomy
  - Track queries to determine accuracy
  - Enable categorization to test strategies; refine if necessary
  - Test with disparate user groups
- Maintain taxonomy
  - Establish a workable change-management process
  - Move documents, promote/demote categories, merge/delete as necessary
  - Add more content
- Iterate
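
One maintenance operation, merging two categories while keeping a record of the change, might look like the sketch below. The flat dictionary representation, the `merge_categories` function, and the shape of the log entries are all assumptions made for illustration, not a description of any particular tool.

```python
from datetime import date


def merge_categories(taxonomy, source, target, log, when=None):
    """Move every document from `source` into `target`, then delete `source`.

    `taxonomy` maps a category label to a list of document IDs (a flat,
    hypothetical representation). Each change is appended to `log`.
    """
    if source not in taxonomy or target not in taxonomy:
        raise KeyError("both categories must exist before merging")
    if source == target:
        raise ValueError("cannot merge a category into itself")
    moved = taxonomy.pop(source)
    # Preserve order and avoid listing a document twice.
    for doc in moved:
        if doc not in taxonomy[target]:
            taxonomy[target].append(doc)
    log.append({
        "date": when or date.today(),
        "action": "merge",
        "source": source,
        "target": target,
        "moved": len(moved),
    })
    return taxonomy


# Hypothetical sample data.
taxonomy = {"Laptops": ["doc1", "doc2"], "Notebooks": ["doc2", "doc3"]}
change_log = []
merge_categories(taxonomy, "Notebooks", "Laptops", change_log, when=date(2003, 5, 8))
print(taxonomy)
```

Keeping the log separate from the taxonomy makes it possible to review or audit changes later, which is the point of a change-management process.
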
Issues: Understand the BIG issues
- Maintenance
- Content expert or info professional
- Multiple taxonomies
- Organizational “perfection complex” [Chait]
- Categorization strategy
- Manual, automatic, both
Issues: Multiple taxonomies
- Many editors
- Term approval process, synonyms
- Standard tools across the enterprise
- Federated taxonomies
- Taxonomy links, “cross-connections,” facets, views
- Taxonomy mapping
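
Taxonomy mapping can be sketched as a lookup table from category paths in one taxonomy to paths in another. The example below is an illustration under stated assumptions: the two departmental taxonomies, the `SALES_TO_SUPPORT` table, and the `map_category` function are hypothetical.

```python
# Hypothetical mapping between category paths in two departmental taxonomies.
SALES_TO_SUPPORT = {
    "Products/Laptops": "Hardware/Portable",
    "Products/Servers": "Hardware/Servers",
}


def map_category(path, mapping):
    """Translate a category path, falling back to the nearest mapped ancestor."""
    parts = path.split("/")
    # Try the full path first, then progressively shorter ancestor paths.
    for depth in range(len(parts), 0, -1):
        prefix = "/".join(parts[:depth])
        if prefix in mapping:
            # Keep any unmapped trailing levels under the mapped target.
            return "/".join([mapping[prefix], *parts[depth:]])
    return None  # unmapped: flag for an editor to review


print(map_category("Products/Laptops/Ultralight", SALES_TO_SUPPORT))
print(map_category("Services/Consulting", SALES_TO_SUPPORT))
```

Returning `None` for unmapped paths, rather than guessing, leaves the decision to an editor and fits the term approval process listed above.
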
Futures
- Methods: Feature extraction, statistical analysis, rules-based, better semantics, label generation
- Starter taxonomies, imports
- "Plug and play" classifiers
- Taxonomy mapping
- Interfaces: Visualization, better training tools
- Semantic Web
Q&A