Cyborg Categorization: The Basics
Tom Reamy, Knowledge Architect, Intranet Consultant

2 Categorization Explosion
- Autonomy
- Semio
- Verity
- Inxight
- Topical Net
- Mohomine
- Simile
- H5Technologies
- YellowBrix
- GammaSite
- MetaTagger
- Applied Semantics
- Sageware
- SmartLogik
- Quiver
- Stratify
- Vivisimo
- Other - Tacit

3 Categorization: Why Now?
- Search Stinks
- Professionals spend more time looking for information than using it.
- Solution: Browse and Search
- Buy Search to get Categorization
- Need a Taxonomy

4 Taxonomy: How
- Old Answer: Manual
  - Hire a bunch of librarians and IAs
  - Costly, difficult to maintain
- New Answer:
  - Automatic Categorization
- A Better Answer:
  - Cyborg Categorization
  - Integrate Content Management, Search, Taxonomy
  - Integrate central IAs and local authors

5 Auto-Categorization: the How
- Automatic Methods
  - Catalog by Example
    - Training Sets (5-500)
    - Bag of Words or language and concepts
  - Statistical Clustering
    - Set of Documents & Taxonomy Level
- Semi-Automatic: Rules
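
Catalog by Example with a bag of words can be sketched in a few lines. This is a minimal illustration, not any vendor's method: the category names, the sample documents, and the choice of cosine similarity against a merged per-category word profile are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(training_sets):
    """Merge each category's example documents into one word profile."""
    profiles = {}
    for category, examples in training_sets.items():
        profile = Counter()
        for doc in examples:
            profile.update(bag_of_words(doc))
        profiles[category] = profile
    return profiles

def categorize(text, profiles):
    """Return (best_category, score) for the text."""
    bag = bag_of_words(text)
    scores = {cat: cosine(bag, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical training sets; real ones would hold 5-500 examples per category.
TRAINING_SETS = {
    "Benefits": ["health insurance enrollment form", "vacation policy and paid leave"],
    "IT Support": ["reset your network password", "laptop repair request form"],
}

profiles = train(TRAINING_SETS)
category, score = categorize("how do I reset my password", profiles)
```

The score doubles as a rough confidence value, which matters later when deciding which documents a human should look at.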

6 Auto-Categorization: the How
- Next Generation
  - Support Vector Machines
  - Machine Learning
  - World Knowledge
- Incremental Improvement
  - From 75% to 85%
- Critical Issue: Integration

7 Automatic vs. Humanatic
- Humans are better, but not as consistent
  - General bin, understandable mistakes
  - Bring outside contexts to the document
    - Purpose, similar documents, common sense
- Automatic is faster and cheaper.
  - Faster yes, Cheaper?
  - Cost of poorer quality categorization
    - Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
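
The $20,000 a week figure can be reproduced with simple arithmetic. The slide gives only the user count and the 60 seconds; the sketch below assumes one slowed lookup per user per week and a loaded labor cost of $60 an hour, which are the values that happen to yield the stated total.

```python
def weekly_cost(users, extra_seconds, lookups_per_week, hourly_cost):
    """Cost per week of the extra time spent looking for information."""
    # Multiply first, then convert seconds to hours, to keep the result exact.
    return users * extra_seconds * lookups_per_week * hourly_cost / 3600

cost = weekly_cost(
    users=20_000,        # from the slide
    extra_seconds=60,    # from the slide
    lookups_per_week=1,  # assumption: one slowed lookup per user per week
    hourly_cost=60.0,    # assumption: loaded cost of $60 per hour
)
print(f"${cost:,.0f} a week")
```

With more than one slowed lookup per user per week, the cost scales up in direct proportion.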

8 Automatic vs Humanatic: News Feeds to Corporate Intranets
- News Feeds and Content providers
  - Uniform content, size and structure
  - Professional writers
  - Simple or standard vocabulary
- Corporate intranet
  - Wildly varied content
  - Mix of good, bad, and ugly writers
  - Tower of Babel: Acronyms, special meanings

9 The Answer is Cyborg
- No single software product has the best of automatic categorization
- Automatic Categorization is not
- Integration not Assimilation
- Human and Computer Integration
- Cyborg Integration and Content Management, Search

10 Human - Computer Integration
- Humans
  - Create top level taxonomy
  - Create rules, select training sets
  - Final Quality Control
- Automatic
  - Provisional Categorization and Meta Data
  - Automatic Summarization
- Combination
  - Integration of Rules, World Knowledge
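
One way to picture this division of labor is a small routing step: human-written rules are applied first, the automatic engine supplies a provisional category otherwise, and anything below a confidence threshold goes to a human for final quality control. Everything named here is hypothetical: the rules, the 0.8 threshold, and the stand-in `auto_categorize` function, which a real system would replace with its categorization engine.

```python
# Hypothetical human-authored rules: keyword -> category.
RULES = {"401k": "Benefits", "vpn": "IT Support"}

def auto_categorize(text):
    """Stand-in for a real engine; returns (category, confidence)."""
    return ("General", 0.55)

def provisional_category(text):
    """Apply human-written rules first, then fall back to the automatic guess."""
    lowered = text.lower()
    for keyword, category in RULES.items():
        if keyword in lowered:
            # Assumption: a rule match is treated as fully trusted.
            return category, 1.0, "rule"
    category, confidence = auto_categorize(text)
    return category, confidence, "automatic"

def route(doc_id, text, review_queue, threshold=0.8):
    """Assign a provisional category; queue low-confidence ones for human review."""
    category, confidence, source = provisional_category(text)
    if confidence < threshold:
        review_queue.append((doc_id, category, source))
    return category

review_queue = []
first = route("doc-1", "How to connect to the VPN from home", review_queue)
second = route("doc-2", "Notes from the offsite", review_queue)
```

Here the first document is settled by a rule, while the second keeps its provisional category and lands in the review queue.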

11 Content Management & Search
- Content Management
  - Distributed Work Flow: Central IA & local authors
  - Collaborative Categorization
  - Taxonomic Publishing Model
- Search
  - Support Browse and Search
  - Real time clustering, categorizing
  - Collaborative filtering - by category
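
Supporting browse and search together can be as simple as grouping search hits by their assigned category, so that a result list doubles as a browsable view. The sketch below assumes each hit already carries a category from the categorization step; the titles and categories are made up for illustration.

```python
from collections import defaultdict

def group_by_category(hits):
    """Group (title, category) search hits into a category -> titles mapping."""
    groups = defaultdict(list)
    for title, category in hits:
        groups[category].append(title)
    return dict(groups)

# Hypothetical search hits for the query "form".
hits = [
    ("Health insurance enrollment form", "Benefits"),
    ("Laptop repair request form", "IT Support"),
    ("Vacation request form", "Benefits"),
]
browse_view = group_by_category(hits)
```

Hits keep their original ranking order within each category.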

12 Lessons Learned
- Out of the Box, Out of Your Mind
- Play well with others
- Brain surgery is fun!
- World revolves around you
- Quality counts and size matters
- Let a Hundred Flowers Bloom
- The End

13 The END
- Really.

