How Clustering of Search Results Can Aid Taxonomy Building.

About
• Vivisimo Inc. is enterprise software company
  - Carnegie Mellon spinoff (June ’00)
  - Profitable since FY
• Vivisimo.com is award-winning web-search site
  - Best meta-search site (Search Engine Watch)
• $1M Funding from National Science Foundation SBIR
• Raul Valdes-Perez, PhD
  - President & co-founder
  - Adjunct Assoc Prof, Carnegie Mellon Computer Science Dept

Categorization Saves Time & Money
• Intuition
  - Lots of wasted effort if information is disorganized
  - View few results before exhausting your patience
• Modeling Assumptions
  - User spends 12 min before giving up or moving on
  - Eye skips over search results or folders sequentially
  - 1,000 users at $60 per hour
  - 2 searches per user/day
  - 10 minutes to solve problem elsewhere when search fails
• Folders let you see 11 docs in detail vs. 6 for ranked lists
• Conclusion: savings of $1M+ per year (white paper)
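The figures on this slide allow a rough back-of-the-envelope check. The sketch below is not the white paper's model: it prices only the 10 minutes lost when a search fails, and it assumes 250 working days per year and a hypothetical drop in the share of failed searches, neither of which appears in the slide.

```python
# Figures taken from the slide.
USERS = 1_000
HOURLY_COST = 60.0             # dollars per user-hour
SEARCHES_PER_USER_DAY = 2
RECOVERY_MINUTES = 10          # time to solve the problem elsewhere after a failed search

# Assumption NOT in the slide (hypothetical, for illustration only).
WORKDAYS_PER_YEAR = 250


def cost_per_failed_search() -> float:
    """Dollar cost of one failed search: recovery time priced at the hourly rate."""
    return RECOVERY_MINUTES * HOURLY_COST / 60


def searches_per_year() -> int:
    """Total searches per year across all users."""
    return USERS * SEARCHES_PER_USER_DAY * WORKDAYS_PER_YEAR


def annual_savings(failure_rate_drop: float) -> float:
    """Yearly savings if the share of failed searches falls by failure_rate_drop (0..1)."""
    return searches_per_year() * failure_rate_drop * cost_per_failed_search()


def required_drop(target_savings: float) -> float:
    """Drop in failure rate needed to reach target_savings per year."""
    return target_savings / (searches_per_year() * cost_per_failed_search())
```

Under these assumptions, each percentage-point reduction in failed searches is worth about $50,000 per year, so a $1M saving would correspond to roughly a 20-point reduction.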

Taxonomy Building Challenges I
• Getting Everyone on Board
  - “We have no process for consistently tagging our content. We have 50 different business units. People in one unit do a great job, but others do not use tags at all.” (Forrester Report)
• Expense
  - Forrester says $4 per page to make a controlled vocabulary
  - $50 per document to manually tag (large pharma)
• Expertise
  - Need highly qualified staff to maintain the taxonomy
  - NLM has staff of ten (4 PhDs and 1 MD) to update MeSH

Taxonomy Building Challenges II
• Discovery
  - If users are familiar with the material, controlled vocabularies offer little scope for surprise
• Currency
  - Controlled vocabulary lags fast-changing world
• Federated Search
  - How to handle external information sources?

When Can Categorization Occur?
• Categorization Moments
  - Taxonomy building categorizes at creation time
  - Clustering can categorize at delivery time of search results
• Cluster top Search Results into Labelled Folders
  - Uses title, snippet, and (optionally) meta-tags if they exist
  - Works with any good search engine (Autonomy, Convera, FAST, Google, Sharepoint, Ultraseek, Verity, etc.)
  - Interoperable with search engine’s XML; also outputs XML
• Cluster Categories Need to …
  - Be concise, accurate, natural, & distinctive
  - Allow search results to appear in more than one category
  - Not let them appear in too many categories (1.4 on average)
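Delivery-time clustering with overlapping folders can be illustrated with a minimal sketch. This is not Vivisimo's algorithm: it simply uses shared title and snippet words as folder labels, and the function names, stopword list, and the `min_size` and `max_folders_per_result` parameters are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def terms(text):
    """Lowercased word set of a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cluster(results, query, min_size=2, max_folders_per_result=2):
    """Group results into labelled, possibly overlapping folders.

    results: list of dicts with 'title' and 'snippet' keys.
    Returns {label: [result indices]}.
    """
    query_terms = terms(query)
    postings = defaultdict(list)
    for i, r in enumerate(results):
        # Query words appear everywhere, so they make poor folder labels.
        for t in terms(r["title"] + " " + r["snippet"]) - query_terms:
            postings[t].append(i)

    # Candidate folders: terms shared by enough results, largest first.
    candidates = sorted(
        ((t, ids) for t, ids in postings.items() if len(ids) >= min_size),
        key=lambda item: (-len(item[1]), item[0]),
    )

    folders = {}
    memberships = defaultdict(int)
    for label, ids in candidates:
        # Cap how many folders a single result may appear in.
        kept = [i for i in ids if memberships[i] < max_folders_per_result]
        # Skip folders that are too small or duplicate an existing folder.
        if len(kept) < min_size or kept in folders.values():
            continue
        folders[label] = kept
        for i in kept:
            memberships[i] += 1
    return folders


def average_folders_per_result(folders, n_results):
    """Mean number of folders each result appears in."""
    return sum(len(ids) for ids in folders.values()) / n_results
```

The cap on folders per result reflects the slide's point that a result may appear in more than one category but should not appear in too many.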

Clustering Can Aid Taxonomy Building
• Key Idea: Cluster on Title, Abstract, and Index Terms
  - Treat everything as free text
  - Index terms get parsed, stemmed, etc. exactly like the rest
• Advantages
  - Proceed with taxonomies without needing universal agreement
  - Proceed with taxonomies as budget allows
  - Lack of expert staff for indexing won’t kill the approach
  - Combination with spontaneous categories allows surprise
  - New categories can emerge immediately (e.g., SARS)
  - Federated search is not a problem: your documents can be indexed into the taxonomy, the external ones need not
• ClusterMed clusters on title, abstract, and MeSH terms
• Or on any of them individually, or author, affiliation etc.
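The key idea, treating index terms as free text, can be sketched as follows. This is an illustration only, not ClusterMed's implementation: the record layout, the field names, and the crude `stem` function (a stand-in for a real stemmer) are all assumptions.

```python
import re


def stem(word):
    """Very crude suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def free_text_terms(record, fields=("title", "abstract", "index_terms")):
    """Tokenize and stem every field the same way, index terms included."""
    tokens = set()
    for field in fields:
        value = record.get(field, "")
        if isinstance(value, (list, tuple)):
            # Controlled-vocabulary terms are flattened into plain text.
            value = " ".join(value)
        tokens.update(stem(w) for w in re.findall(r"[a-z]+", value.lower()))
    return tokens


# Hypothetical record; the index terms are illustrative, not real tags.
record = {
    "title": "Clustering search results",
    "abstract": "Clusters of documents",
    "index_terms": ["Document Clustering", "Information Retrieval"],
}

from_text = free_text_terms(record, fields=("title", "abstract"))
from_index = free_text_terms(record, fields=("index_terms",))
shared = from_text & from_index
```

Because both sources pass through the same tokenizer and stemmer, a tagged document and an untagged one can land in the same cluster whenever their vocabulary overlaps, which is why missing or inconsistent tags need not block the approach.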

Federated (Meta) Search Challenge

Clustering at Delivery Time Allows Categorizing Federated-Search Results

Some Organizations that Have Selected Clustering

Conclusion
• Taxonomy Building faces many challenges
  - Rarely have the resources to do the job fast
  - Therefore, implementation delays are long
  - Have to wait until everything is completed?
• Combine Clustering with Taxonomies
  - Provide categorized info to users right away
  - Work on taxonomy & indexing as resources permit