Published by Augustus Hart. Modified over 9 years ago.
1
Cyborg Categorization: The Basics
Tom Reamy
Knowledge Architect, Intranet Consultant
2
Categorization Explosion
• Autonomy
• Semio
• Verity
• Inxight
• Topical Net
• Mohomine
• Simile
• H5Technologies
• YellowBrix
• GammaSite
• MetaTagger
• Applied Semantics
• Sageware
• SmartLogik
• Quiver
• Stratify
• Vivisimo
• Other - Tacit
3
Categorization: Why Now?
• Search Stinks
• Professionals spend more time looking for information than using it.
• Solution: Browse and Search
• Buy Search to get Categorization
• Need a Taxonomy
4
Taxonomy: How
• Old Answer: Manual
  – Hire a bunch of librarians and IAs
  – Costly, difficult to maintain
• New Answer:
  – Automatic Categorization
• A Better Answer:
  – Cyborg Categorization
  – Integrate Content Management, Search, Taxonomy
  – Integrate central IAs and local authors
5
Auto-Categorization: the How
• Automatic Methods
  ▪ Catalog by Example
    – Training Sets (5-500)
    – Bag of Words or language and concepts
  ▪ Statistical Clustering
    – Set of Documents & Taxonomy Level
• Semi-Automatic: Rules
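To make "Catalog by Example" with a "Bag of Words" concrete, here is a minimal sketch, not any vendor's actual method. It merges each category's training documents into one word-count profile and assigns a new document to the most similar profile by cosine similarity. The category names, sample texts, and function names are all hypothetical, and real training sets would hold far more than two examples.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens (word order is ignored)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(training_sets):
    """Merge each category's example documents into one word-count profile."""
    profiles = {}
    for category, examples in training_sets.items():
        profile = Counter()
        for doc in examples:
            profile.update(bag_of_words(doc))
        profiles[category] = profile
    return profiles

def categorize(text, profiles):
    """Return (best_category, similarity) for a new document."""
    words = bag_of_words(text)
    scores = {cat: cosine(words, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical training sets; the slide suggests 5-500 examples per category.
TRAINING_SETS = {
    "benefits": ["health insurance enrollment deadline",
                 "dental and vision plan coverage"],
    "travel": ["submit expense report for airfare",
               "hotel booking and travel reimbursement policy"],
}

profiles = train(TRAINING_SETS)
category, score = categorize("How do I get reimbursement for a hotel stay?", profiles)
print(category, round(score, 2))
```

Because the profile is only a word count, acronyms and local jargon that never appear in the training sets contribute nothing to the score, which is one reason the size and quality of the training set matter so much.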
6
Auto-Categorization: the How
• Next Generation
  ▪ Support Vector Machines
  ▪ Machine Learning
  ▪ World Knowledge
• Incremental Improvement
  ▪ From 75% to 85%
• Critical Issue: Integration
7
Automatic vs. Humanatic
• Humans are better, but not as consistent
  – General bin, understandable mistakes
  – Bring outside contexts to the document
    • Purpose, similar documents, common sense
• Automatic is faster and cheaper.
  – Faster yes, Cheaper?
  – Cost of poorer quality categorization
    • Intranet: 20,000 users taking 60 seconds longer = $20,000 a week
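The intranet figure can be checked with simple arithmetic. The slide gives the user count, the extra 60 seconds, and the $20,000 weekly total, but not how often the delay occurs or what an hour of staff time costs. The sketch below assumes each user loses the 60 seconds once per week and backs out the hourly rate that assumption implies. The `weekly_cost` helper is a hypothetical name for trying other assumptions.

```python
USERS = 20_000               # from the slide
EXTRA_SECONDS_PER_USER = 60  # from the slide
WEEKLY_COST = 20_000         # dollars per week, from the slide

# Assumption (not in the slide): each user loses the extra 60 seconds once per week.
lost_hours_per_week = USERS * EXTRA_SECONDS_PER_USER / 3600
implied_hourly_rate = WEEKLY_COST / lost_hours_per_week

def weekly_cost(users, extra_seconds, lookups_per_week, hourly_rate):
    """Weekly cost of slower lookups under caller-supplied assumptions."""
    return users * extra_seconds * lookups_per_week / 3600 * hourly_rate

print(f"Lost hours per week: {lost_hours_per_week:.1f}")
print(f"Implied hourly rate: ${implied_hourly_rate:.2f}")
```

Under that assumption the slide's number corresponds to roughly 333 lost hours a week at about $60 per hour. If users hit the delay more than once a week, the cost scales linearly with the number of lookups.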
8
Automatic vs Humanatic: News Feeds to Corporate Intranets
• News Feeds and Content providers
  – Uniform content, size and structure
  – Professional writers
  – Simple or standard vocabulary
• Corporate intranet
  – Wildly varied content
  – Mix of good, bad, and ugly writers
  – Tower of Babel: Acronyms, special meanings
9
The Answer is Cyborg
• No one software has best of automatic
• Automatic Categorization is not
• Integration not Assimilation
• Human and Computer Integration
• Cyborg Integration and Content Management, Search
10
Human - Computer Integration
• Humans
  – Create top level taxonomy
  – Create rules, select training sets
  – Final Quality Control
• Automatic
  – Provisional Categorization and Meta Data
  – Automatic Summarization
• Combination
  – Integration of Rules, World Knowledge
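The division of labor on this slide can be sketched as a small workflow: human-written rules drive a provisional automatic categorization, a naive summary is attached, and anything the rules are unsure about goes to a person for final quality control. This is an illustration under stated assumptions, not a description of any product. The rule set, the review threshold, the category names, and the first-sentence "summary" are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass

# Hypothetical human-written rules: category -> keywords that trigger it.
RULES = {
    "hr/benefits": {"insurance", "enrollment", "401k"},
    "finance/expenses": {"reimbursement", "invoice", "expense"},
}
REVIEW_THRESHOLD = 2  # assumed: fewer keyword hits than this needs a human

@dataclass
class Document:
    text: str
    category: str = "uncategorized"
    summary: str = ""
    needs_review: bool = True

def auto_categorize(doc):
    """Automatic step: provisional category plus a naive summary."""
    words = set(re.findall(r"[a-z0-9]+", doc.text.lower()))
    hits = {cat: len(words & keywords) for cat, keywords in RULES.items()}
    best = max(hits, key=hits.get)
    if hits[best] > 0:
        doc.category = best
    doc.needs_review = hits[best] < REVIEW_THRESHOLD
    doc.summary = doc.text.split(".")[0]  # first sentence as a stand-in summary
    return doc

def human_review(doc, corrected_category=None):
    """Human step: final quality control; accept or override the category."""
    if corrected_category:
        doc.category = corrected_category
    doc.needs_review = False
    return doc

confident = auto_categorize(
    Document("Submit the expense form for reimbursement. Allow ten days."))
unsure = auto_categorize(Document("Parking garage closed Friday."))
unsure = human_review(unsure, corrected_category="facilities/parking")
```

The threshold is the knob that trades human effort against quality: raising it sends more documents to reviewers, lowering it trusts the automatic step more often.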
11
Content Management & Search
• Content Management
  – Distributed Work Flow: Central IA & local authors
  – Collaborative Categorization
  – Taxonomic Publishing Model
• Search
  – Support Browse and Search
  – Real time clustering, categorizing
  – Collaborative filtering - by category
12
Lessons Learned
• Out of the Box, Out of Your Mind
• Play well with others
• Brain surgery is fun!
• World revolves around you
• Quality counts and size matters
• Let a Hundred Flowers Bloom
• The End
13
The END
• Really.