Knowledge Retrieval Taxonomies & Auto-Categorization Tom Reamy Knowledge Architect Intranet Consultant.

Knowledge Retrieval
- Taxonomy: What, Why, How?
- Taxonomy and Auto-Categorization
  - Approaches and Companies
- Applied Taxonomies:
  - Content Management, Search
- Future Directions
  - Information Retrieval to Knowledge Retrieval

Taxonomy: What
- What is a Taxonomy?
  - Organization: Hierarchical, web, etc.
  - Card Catalog, Yahoo
  - Creates a context within which facts are related
  - Find, Identify, Describe information, relations, context.

Taxonomy: What
- Is this a Taxonomy?
  - Things that begin with the letter A
  - Things that have 4 legs
  - Things that are used to write with
  - Fantasy Animals
  - Large Orange Objects
  - Objects used by non-humans for undisclosed purposes.
- Jorge Luis Borges

Taxonomy: What
- What makes a good taxonomy?
- The Library of Congress catalog?
  - No. Not unless your intranet contains as much information as the LC.
- An understandable organization of content that enables people to find information and which supports knowledge discovery.

Taxonomy: Why
- Search Stinks
- Professionals spend more time looking for information than using it.
- Solution: Browse and Search
- Need a Taxonomy
- It ain't easy, so why do it?

Taxonomy: Why
- Cost of poor Search and Content Management
  - If it's not organized, you can't find it.
  - If you can't find it, you can't use it.
  - If you can't find it, you waste a lot of time.
  - If you can't find it, you could lose an account.
  - If you can't find it, you could look stupid.
  - If you can't find it, it doesn't exist.

Taxonomy: Why
- How does a Taxonomy improve Search and Content Management?
  - Browse and Search works better than Search
    - ecommerce: 56% of all searches fail = lost income
    - Intranet: lost time, lost business, lost ideas
  - Improved Publishing Model: By category, not department
  - Rich semantic web of concepts, not an unstructured collection of documents

Taxonomy: Why
- How does Content Management improve Taxonomies?
  - CM supports intelligent distributed categorization:
    - Work Flow: Central and local
    - Multiple roles: IA, SME, author, editor
  - CM supports automatic meta data and categorization

Taxonomy: How
- Old Answer: Manual
  - Hire a bunch of librarians and IAs
  - Costly, difficult to maintain
- New Answer:
  - Cyborg: Manual and Automatic Categorization
  - Integrate Content Management and Taxonomy
  - Integrate central IAs and local authors

Automatic vs. Humanatic
- Humans are better, but not as consistent
  - General bin, understandable mistakes
  - Bring outside contexts to the document
    - Purpose, similar documents, common sense
- Computers are faster and cheaper.
  - Faster yes, Cheaper?
  - Cost of poorer quality categorization
    - Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
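The slide does not say how the $20,000 figure was derived. One way to reproduce it is sketched below, under two assumptions that are not in the original: each user loses the extra 60 seconds once per week, and an hour of staff time costs $60 fully loaded.

```python
# Back-of-envelope cost of slower information finding.
# ASSUMPTIONS (not stated on the slide): the 60-second delay happens once
# per user per week, and staff time costs $60 per hour.
USERS = 20_000
EXTRA_SECONDS_PER_WEEK = 60
ASSUMED_HOURLY_COST = 60.0  # dollars, hypothetical


def weekly_cost(users, extra_seconds, hourly_cost):
    """Return the weekly dollar cost of the extra time spent searching."""
    return users * extra_seconds * hourly_cost / 3600


print(weekly_cost(USERS, EXTRA_SECONDS_PER_WEEK, ASSUMED_HOURLY_COST))  # 20000.0
```

With a different hourly rate or more than one delayed lookup per week, the total scales linearly.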

News Feeds - Corporate Intranets
- News Feeds and Content providers
  - Uniform content, size and structure
  - Professional writers
  - Simple or standard vocabulary
- Corporate intranet
  - Wildly varied content
  - Mix of good, bad, and ugly writers
  - Tower of Babel: Acronyms, special meanings

Auto-Categorization: the How
- Automatic Methods
  - Catalog by Example
    - Training Sets (5-500)
    - Bag of Words or language and concepts
  - Statistical Clustering
    - Set of Documents & Taxonomy Level
- Semi-Automatic: Rules
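A minimal sketch of "catalog by example" with a bag-of-words model, for illustration only. The category names are borrowed from the browse taxonomy later in the deck; the training texts, the function names, and the use of cosine similarity are assumptions, not any vendor's method. Real training sets would be in the 5-500 range the slide mentions rather than two examples per category.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count bags."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Build one summed bag of words per category from example texts."""
    profiles = {}
    for category, texts in examples.items():
        profile = Counter()
        for text in texts:
            profile.update(bag_of_words(text))
        profiles[category] = profile
    return profiles


def categorize(text, profiles):
    """Return the category whose profile is most similar to the text."""
    bag = bag_of_words(text)
    return max(profiles, key=lambda category: cosine(bag, profiles[category]))


# Hypothetical training set, far smaller than a real one.
EXAMPLES = {
    "HR / Benefits": [
        "health insurance enrollment and vacation policy",
        "retirement plan benefits and payroll",
    ],
    "Education & Training": [
        "online course schedule for new manager training",
        "classroom training registration and course catalog",
    ],
}

profiles = train(EXAMPLES)
print(categorize("how do I change my health insurance plan", profiles))
# HR / Benefits
```

A pure word-count match like this has no notion of acronyms or special meanings, which is one reason the deck contrasts bag of words with "language and concepts".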

Auto-Categorization: the How
- Next Generation
  - Support Vector Machines
  - Machine Learning
  - World Knowledge
- Incremental Improvement
  - From 75% to 85%
- Critical Issue: Integration

Categorization Explosion
- Autonomy
- Semio
- Verity
- Inxight
- Topical Net
- Mohomine
- Simile
- H5Technologies
- YellowBrix
- GammaSite
- MetaTagger
- Applied Semantics
- Sageware
- SmartLogik
- Quiver
- Stratify
- Vivisimo
- Other - Tacit

Auto-Categorization: Features
- The Categorization Algorithm
  - SVM – Vector space is an improvement
  - Higher Accuracy
  - Fewer documents for training set
  - White Box – customize recall & precision
  - Categorize multiple file types & sizes
- Clustering – Taxonomy Builder
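Recall and precision are the two measures a "white box" categorizer would let you trade off. The sketch below uses the standard definitions for a single category; the document identifiers are made up for illustration.

```python
def precision_recall(assigned, relevant):
    """Compute precision and recall for one category.

    assigned: documents the categorizer placed in the category.
    relevant: documents that truly belong in the category.
    """
    assigned, relevant = set(assigned), set(relevant)
    true_positives = len(assigned & relevant)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: four documents assigned, two of them correct,
# and one relevant document missed.
p, r = precision_recall(
    assigned={"doc1", "doc2", "doc3", "doc4"},
    relevant={"doc1", "doc2", "doc5"},
)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Tightening a categorizer usually raises precision at the cost of recall, and loosening it does the reverse.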

Auto-Categorization: Features
- Support Distributed Activities
  - Distributed work flow: authors, subject matter experts, information architects
  - Provisional categorization, keywords, meta data
  - Automatic summarization
  - Ease of Use, Integration with CM and Search
- Integrate with Rules, Meta Data
  - Content to Context

Auto-Categorization: Features
- Platform for Knowledge Retrieval
  - World Knowledge
    - Pre-Built Categories
    - Rich Semantic Net (WordNet+)
    - Entity Extraction
  - Integration
    - Specialized Audiences & Vocabularies
    - Content, Expertise, Communities, Activities

The Answer is Cyborg
- Automatic Categorization is Not.
- Professional Services: Initial Taxonomy
- Cyborg: Human and Automatic Integration
  - Distributed Work Flow
- Cyborg Integration with Content Management, Search

Content Management and Taxonomy
- Taxonomic Publishing Model
  - Publish by Category, not web site
- Web site is the wrong unit of organization
  - 10 pages to 10,000 pages
  - 10 users to 20,000 users
  - 1 activity to 100s of activities

Content Management and Taxonomy
- Content Re-Organization
  - Support Browse by Topic, Type, Task
  - Rich Web of Related Content
    - Product information
      - Basic Info + background contexts
      - Legal / Policy contexts
      - Technical Contexts
      - Customer / Task contexts

Content Management and Taxonomy
- Content Re-Organization: Next Steps
  - Document can be the wrong unit of organization
  - Information / Learning objects
  - XML-based objects: reuse, combine (relations and contexts) in more flexible and sophisticated ways.

Content Management: Re-organize Authoring
- Streamline Authoring
  - Minimize IT / Web Developer Bottleneck
- Integrated Work Flow & Categorization
  - Central: Librarian and/or Information Architects
  - Distributed: content owners, authors, SMEs
  - Distributed Categorization, Meta Data

Applied Taxonomy: Search
- Intranet Environments
- Case Studies:
  - Meta Data
  - Browse / Search Model

Intranet Environments
- Global, Distributed
- Variety of Documents, People, Activities
- 100s of independent Web Sites
- Documents, Databases, Applications
© 2001 Charles Schwab & Co., Inc., member NYSE/SIPC. All rights reserved.

Meta Data: Dublin Core+
- Title
- Description
- Keywords
- Creator
- Publisher
- ContentType
- Audience
- SectionName
- Language
- Contributor
- Contributor.Technical
- Date.Created
- Date.Review
- Format
- Identifier
- Rights

Controlled Vocabularies
- ContentType
  - Application
  - Calendar
  - Form
  - FAQ
  - Mission
  - Reference
  - Training
- Audience
  - Function
    - Project Manager
    - Trainer
  - Enterprise
    - Retail
    - Technology
  - Role
    - Admin Assistant
    - Officer
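A controlled vocabulary only helps if tagged values are checked against it. The sketch below shows one way such a check might look, using the terms listed on the slide. The `validate` function, the record layout, and the flattening of the Audience facets (Function, Enterprise, Role) into one set are illustrative assumptions, not part of the original system.

```python
# Terms copied from the slide. Flattening the three Audience facets into a
# single set is a simplification made for this sketch.
CONTROLLED = {
    "ContentType": {
        "Application", "Calendar", "Form", "FAQ",
        "Mission", "Reference", "Training",
    },
    "Audience": {
        "Project Manager", "Trainer",   # Function
        "Retail", "Technology",         # Enterprise
        "Admin Assistant", "Officer",   # Role
    },
}


def validate(record):
    """Return a list of problems found in a metadata record (empty if none)."""
    errors = []
    for field, allowed in CONTROLLED.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif value not in allowed:
            errors.append(f"{field}: {value!r} is not a controlled term")
    return errors


# Hypothetical records.
good = {"Title": "Expense report form", "ContentType": "Form", "Audience": "Officer"}
bad = {"Title": "Team page", "ContentType": "Homepage"}

print(validate(good))  # []
print(validate(bad))
# ["ContentType: 'Homepage' is not a controlled term", 'Audience: missing']
```

Free-text fields such as Title and Description are left unchecked here; only the fields with a controlled vocabulary are validated.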

First Generation Browse Taxonomy
- News
- Education & Training
- HR / Benefits
- Employee Services & Programs
- Departments
- Communities
- Tools, Forms, Calendars
- How To / FAQs
- Products
- Reference & Resources

Future Directions
- Extending Taxonomies
  - Richer World Knowledge
  - Smarter Learning
  - Additional Content: Databases, Word Docs on network drives
  - Integration of external content

Future Directions
- Integration: Creation to Retrieval
  - Collaborative Filtering and Categorization
- Integration throughout the Enterprise
  - People, Communities, Expertise
- Contextualizing content
  - Related topics and related contexts
- Categories for Stories