How Clustering of Search Results Can Aid Taxonomy Building.


1 How Clustering of Search Results Can Aid Taxonomy Building

2 About
- Vivisimo Inc. is an enterprise software company
  - Carnegie Mellon spinoff (June ’00)
  - Profitable since 2002-2003 FY
- Vivisimo.com is an award-winning web-search site
  - Best meta-search site 2001-2003 (Search Engine Watch)
- $1M funding from National Science Foundation SBIR
- Raul Valdes-Perez, PhD (valdes@vivisimo.com)
  - President & co-founder
  - Adjunct Assoc Prof, Carnegie Mellon Computer Science Dept

3 Categorization Saves Time & Money
- Intuition
  - Lots of wasted effort if information is disorganized
  - View few results before exhausting your patience
- Modeling Assumptions
  - User spends 12 min before giving up or moving on
  - Eye skips over search results or folders sequentially
  - 1,000 users at $60 per hour
  - 2 searches per user/day
  - 10 minutes to solve problem elsewhere when search fails
- Folders let you see 11 docs in detail vs. 6 for ranked lists
- Conclusion: savings of $1M+ per year (white paper)
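The modeling assumptions on this slide can be turned into a rough back-of-the-envelope calculation. The sketch below is not the white paper's model. It only combines the figures stated on the slide with two assumptions of its own: a 250-day working year (`WORKING_DAYS`) and a hypothetical parameter, `failure_rate_reduction`, for how much the share of failed searches drops when results are shown in folders.

```python
# Figures taken from the slide.
USERS = 1_000
HOURLY_COST = 60.0               # dollars per user-hour
SEARCHES_PER_USER_PER_DAY = 2
FAILURE_PENALTY_MIN = 10         # minutes to solve the problem elsewhere

# ASSUMPTION (not on the slide): number of working days per year.
WORKING_DAYS = 250


def annual_savings(failure_rate_reduction: float) -> float:
    """Dollars saved per year if the fraction of failed searches
    drops by `failure_rate_reduction` (hypothetical parameter, 0..1)."""
    searches_per_year = USERS * SEARCHES_PER_USER_PER_DAY * WORKING_DAYS
    cost_per_failure = HOURLY_COST * FAILURE_PENALTY_MIN / 60
    return searches_per_year * failure_rate_reduction * cost_per_failure


def break_even_reduction(target_savings: float) -> float:
    """Reduction in failure rate needed to reach `target_savings` per year."""
    return target_savings / annual_savings(1.0)


if __name__ == "__main__":
    needed = break_even_reduction(1_000_000)
    print(f"Failure-rate reduction needed for $1M/year: {needed:.0%}")
```

Under these assumptions each failed search costs $10, so the slide's $1M+ figure corresponds to avoiding roughly 100,000 failed searches per year out of 500,000.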

4 Taxonomy Building Challenges I
- Getting Everyone on Board
  - “We have no process for consistently tagging our content. We have 50 different business units. People in one unit do a great job, but others do not use tags at all.” (Forrester Report)
- Expense
  - Forrester says $4 per page to make a controlled vocabulary
  - $50 per document to manually tag (large pharma)
- Expertise
  - Need highly qualified staff to maintain the taxonomy
  - NLM has staff of ten (4 PhDs and 1 MD) to update MeSH

5 Taxonomy Building Challenges II
- Discovery
  - If users are familiar with the material, controlled vocabularies offer little scope for surprise
- Currency
  - Controlled vocabulary lags fast-changing world
- Federated Search
  - How to handle external information sources?

6 When Can Categorization Occur?
- Categorization Moments
  - Taxonomy building categorizes at creation time
  - Clustering can categorize at delivery time of search results
- Cluster top 200-500 Search Results into Labelled Folders
  - Uses title, snippet, and (optionally) meta-tags if they exist
  - Works with any good search engine (Autonomy, Convera, FAST, Google, Sharepoint, Ultraseek, Verity, etc.)
  - Interoperable with search engine’s XML; also outputs XML
- Cluster Categories Need to …
  - Be concise, accurate, natural, & distinctive
  - Allow search results to appear in more than one category
  - Not let them appear in too many categories (1.4 on average)
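As a toy illustration of delivery-time clustering, the sketch below groups a handful of search results into labelled folders using only their titles and snippets. It is not Vivisimo's algorithm, which the slides do not describe. It simply uses shared terms as folder labels, lets a result appear in more than one folder, and caps the number of folders per result. All names (`cluster_results`, `max_folders_per_result`, the sample data) are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "with", "is"}


def tokens(text):
    """Lowercase word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cluster_results(results, query, min_size=2, max_folders_per_result=2):
    """Group results (dicts with 'title' and 'snippet') into overlapping
    folders labelled by a shared term. Returns {label: [result indices]}."""
    n = len(results)
    query_terms = tokens(query)
    term_to_docs = defaultdict(list)
    for i, r in enumerate(results):
        # Query terms occur in nearly every result, so they make poor labels.
        for t in tokens(r["title"] + " " + r["snippet"]) - query_terms:
            term_to_docs[t].append(i)

    # Candidate labels: terms shared by several results, largest groups first.
    candidates = sorted(
        (t for t, docs in term_to_docs.items() if min_size <= len(docs) < n),
        key=lambda t: (-len(term_to_docs[t]), t),
    )

    folders, seen = {}, set()
    memberships = [0] * n
    for t in candidates:
        # Respect the per-result cap so no result lands in too many folders.
        docs = tuple(i for i in term_to_docs[t]
                     if memberships[i] < max_folders_per_result)
        if len(docs) < min_size or docs in seen:
            continue  # too small, or identical to an existing folder
        seen.add(docs)
        folders[t] = list(docs)
        for i in docs:
            memberships[i] += 1

    leftover = [i for i in range(n) if memberships[i] == 0]
    if leftover:
        folders["(other)"] = leftover
    return folders


def average_memberships(folders, n_results):
    """Average number of folders each result appears in."""
    return sum(len(docs) for docs in folders.values()) / n_results


sample = [
    {"title": "Jaguar car reviews", "snippet": "Road test of the new jaguar car"},
    {"title": "Jaguar cat habitat", "snippet": "Where the jaguar cat lives in the rainforest"},
    {"title": "Jaguar car dealers", "snippet": "Find a jaguar car dealer"},
    {"title": "Rainforest cat species", "snippet": "The jaguar is a rainforest cat"},
    {"title": "Jaguar car in rainforest rally", "snippet": "A jaguar car crosses the rainforest"},
]
folders = cluster_results(sample, "jaguar")
```

Calling `average_memberships(folders, len(sample))` gives the same kind of statistic as the slide's "1.4 on average". A real system would also need labels that are concise and natural phrases rather than single terms.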

7 Clustering Can Aid Taxonomy Building
- Key Idea: Cluster on Title, Abstract, and Index Terms
  - Treat everything as free text
  - Index terms get parsed, stemmed, etc. exactly like the rest
- Advantages
  - Proceed with taxonomies without needing universal agreement
  - Proceed with taxonomies as budget allows
  - Lack of expert staff for indexing won’t kill the approach
  - Combination with spontaneous categories allows surprise
  - New categories can emerge immediately (e.g., SARS)
  - Federated search is not a problem: your documents can be indexed into the taxonomy, the external ones need not
- ClusterMed clusters on title, abstract, and MeSH terms
- Or on any of them individually, or author, affiliation etc.
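The key idea, treating index terms as free text alongside title and abstract, can be sketched as follows. This is an illustrative assumption about how such a feature step might look, not ClusterMed's implementation. The field names, the `features` helper, and the `crude_stem` function (a deliberately rough stand-in for a real stemmer) are all hypothetical.

```python
import re


def crude_stem(word):
    """Rough stand-in for a real stemmer: drops a trailing plural 's'."""
    if len(word) > 4 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]
    return word


def features(record, fields=("title", "abstract", "index_terms")):
    """Merge the chosen fields into one free-text bag of stemmed words.
    Index terms go through exactly the same parsing as the prose fields.
    Missing fields are skipped, so untagged documents still work."""
    parts = []
    for field in fields:
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        parts.append(value)
    text = " ".join(parts).lower()
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text)}


# An internally indexed document (has index terms) ...
internal = {
    "title": "SARS outbreak report",
    "abstract": "Respiratory infections in hospitals",
    "index_terms": ["Respiratory Tract Infections", "Disease Outbreaks"],
}
# ... and an external, untagged one from a federated source.
external = {
    "title": "Tracking disease outbreaks",
    "abstract": "New respiratory infection cases",
}

shared = features(internal) & features(external)
```

Because the index terms are parsed like any other text, the tagged and untagged documents end up with comparable features, and passing a different `fields` tuple restricts clustering to any subset of fields.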

8

9 Federated (Meta) Search Challenge

10 Clustering at Delivery Time Allows Categorizing Federated-Search Results

11 Some Organizations that Have Selected Clustering

12 Conclusion
- Taxonomy Building faces many challenges
  - Rarely have the resources to do the job fast
  - Therefore, implementation delays are long
  - Have to wait until everything is completed?
- Combine Clustering with Taxonomies
  - Provide categorized info to users right away
  - Work on taxonomy & indexing as resources permit

