TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.


1 TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal

2 Introduction: What is query routing?
• Searching online can be both rewarding and frustrating.
• General search engines such as Yahoo and Lycos return a lot of irrelevant information in response to a user's query.
• In this context, query routing attempts to dynamically route each user's query to the appropriate specialized search engine.

3 Problem Description
• There are many general search engines, such as Yahoo, Lycos, and Alta Vista.
• There are also many topic-specific search engines, such as VaccationSpot.com and KidsHealth.com.
• However, many casual users are not familiar with all of these topic-specific search engines.
• In this context, topic-centric query expansion is important.

4 Why Topic Centric Query Routing:
• Before discussing the importance of topic-centric routing, it is worth analyzing other query routing systems.
• Manual Query Routing Services:
- provide a categorized list of specialized search engines
- users have to choose the search engines themselves
- although a keyword search interface is provided, the terms accepted as keywords are limited.
• Query Routing based on Centroids:
- rely on centroids, which are summaries of databases
- each summary consists of a complete list of the terms in a database and their frequencies.

5 - the relevant search engine is located by deciding which databases are relevant to a user query, comparing the query with each centroid.
- this technique cannot be applied to most topic-specific search engines on the Web, because access to their internal databases is restricted.
• Query Routing Without Centroids:
- instead of centroids, these systems generate a short text that describes each database.
- a search engine is located only if the search keywords are contained in that text.
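
The centroid comparison described above can be illustrated with a minimal Python sketch. It is not taken from any of the systems surveyed: the function name `route_by_centroid`, the scoring rule (summing the centroid frequencies of the query terms), and the three toy centroids are all assumptions made for illustration.

```python
from collections import Counter

def route_by_centroid(query, centroids):
    """Rank databases by how often the query terms occur in each centroid.

    `centroids` maps a database name to a term -> frequency summary.
    The scoring rule (sum of frequencies) is an assumption for this sketch.
    """
    terms = query.lower().split()
    scores = {
        name: sum(centroid.get(term, 0) for term in terms)
        for name, centroid in centroids.items()
    }
    # Keep only databases that mention at least one query term, best first.
    matching = [name for name in scores if scores[name] > 0]
    return sorted(matching, key=lambda name: scores[name], reverse=True)

# Hypothetical centroids; real ones would list every term in each database.
centroids = {
    "movies": Counter({"python": 3, "monty": 5, "film": 9}),
    "programming": Counter({"python": 12, "java": 8, "object": 4}),
    "travel": Counter({"hotel": 7, "beach": 6}),
}

ranked = route_by_centroid("python", centroids)
print(ranked)
```

The sketch also shows the limitation noted on the slide: it needs the full term and frequency list of every database, which most topic-specific search engines on the Web do not expose.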

6 Research Objective:
In this context, Topic Centric Query Routing is appropriate because it uses a routing model that expands the query. The general framework of the query routing model is as follows:
• Getting relevant terms from the Web:
- the routing model does not use any special dictionaries; it uses the Web as the source of relevant terms.
- it dynamically finds Web documents relevant to the user query by submitting that query to a general search engine.
- the relevant terms are extracted from those documents.

7 • Co-occurrence based evaluation of term relevance:
- the mutual relevance of terms is evaluated on the basis of their co-occurrences in documents.
- co-occurrences with the search keywords are counted across all the documents retrieved by the general search engine.
- the routing model lists all the distinct terms contained in those documents and counts, for each term, the number of documents that contain both the search keyword and that term.
• Using a pseudo-feedback technique:
- it is difficult to determine term relevance from the results of only a single document search on the general search engine.
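
The co-occurrence count described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `cooccurrence_counts`, the whitespace tokenization, and the four sample documents are assumptions, not part of the presented system.

```python
from collections import Counter

def cooccurrence_counts(keyword, documents):
    """For each distinct term, count the documents that contain both
    the search keyword and that term."""
    keyword = keyword.lower()
    counts = Counter()
    for text in documents:
        terms = set(text.lower().split())  # distinct terms in this document
        if keyword in terms:
            counts.update(terms - {keyword})
    return counts

# Hypothetical documents standing in for general search engine results.
docs = [
    "python java programming",
    "monty python movie",
    "python programming tutorial",
    "java coffee",
]

counts = cooccurrence_counts("python", docs)
print(counts["programming"], counts["java"], counts["coffee"])
```

Using a set per document means a term is counted once per document, which matches the "number of documents" wording on the slide rather than raw term frequency.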

8 - even relevant terms often have few co-occurrences in the documents selected by the first search.
- the query routing model therefore re-evaluates such low co-occurrence terms in three steps: it selects terms to be re-evaluated from the first search results, formulates new queries by adding each selected term to the original query, and performs the co-occurrence based evaluation for each formulated query.

9 Query expansion procedure
1. Get a document set D_0 relevant to a user query Q_0, where the search keywords are w_01, ..., w_0n, by sending Q_0 to a general search engine.
2. Count co-occurrences of the search keywords and other terms in the document set D_0.
3. Let WH_0 be the set of terms whose co-occurrences exceed a certain threshold, and WL_0 the set of the other terms. WH_0 is considered relevant to the query Q_0 and will be a part of the query expansion result.
4. Pick up at most four topic terms wt_1-wt_4 from WL_0.
5. Formulate four queries QT_1-QT_4 by combining wt_1-wt_4 with Q_0 (for example, QT_1 = "w_01 ... w_0n wt_1").
Figure 4: Query expansion procedure.

10 6. Cluster all terms in D_0 into at most three clusters: W_1 = {w_11, ..., w_1m}, W_2 = {w_21, ..., w_2k} and W_3 = {w_31, ..., w_3j}.
7. Formulate three queries Q_1-Q_3 by combining W_1-W_3 with Q_0 (for example, Q_1 = "w_01 ... w_0n w_11 ... w_1m").
8. Get document sets DT_1-DT_4 and D_1-D_3 by sending QT_1-QT_4 and Q_1-Q_3 independently to a general search engine.
9. Count co-occurrences in DT_1-DT_4 and D_1-D_3. The sets of high co-occurrence terms WTH_1-WTH_4 and WH_1-WH_3, together with WH_0 from step 3, are the query expansion results.
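
The topic-term branch of the procedure (steps 1-5, 8 and 9) can be sketched as follows. This is a hedged illustration, not the presented implementation: the clustering branch (steps 6-7) is omitted, the `search` callable and the toy corpus stand in for a general search engine, and choosing the topic terms as the highest-count terms of WL_0 (ties broken alphabetically) is an assumption, since the slides do not say how they are picked.

```python
from collections import Counter

def count_cooccurrences(keywords, documents):
    """Count, per term, the documents containing all keywords and that term."""
    keywords = {k.lower() for k in keywords}
    counts = Counter()
    for text in documents:
        terms = set(text.lower().split())
        if keywords <= terms:
            counts.update(terms - keywords)
    return counts

def expand_query(q0, search, threshold, max_topic_terms=4):
    """Return a mapping from each issued query to its high co-occurrence terms."""
    counts = count_cooccurrences(q0.split(), search(q0))        # steps 1-2
    wh0 = {t for t, c in counts.items() if c > threshold}       # step 3
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    wl0 = [t for t, c in ordered if c <= threshold]
    topic_terms = wl0[:max_topic_terms]                         # step 4 (assumed rule)
    expansion = {q0: wh0}
    for wt in topic_terms:
        qt = f"{q0} {wt}"                                       # step 5
        qt_counts = count_cooccurrences(qt.split(), search(qt)) # steps 8-9
        expansion[qt] = {t for t, c in qt_counts.items() if c > threshold}
    return expansion

# Hypothetical search engine: returns the first three documents of a toy
# corpus that contain every query word.
CORPUS = [
    "python programming language tutorial",
    "python programming language object",
    "python monty movie",
    "python monty movie comedy",
    "python monty movie film",
    "python programming java",
]

def search(query, limit=3):
    words = set(query.lower().split())
    return [d for d in CORPUS if words <= set(d.split())][:limit]

expansion = expand_query("python", search, threshold=1)
for query, terms in expansion.items():
    print(query, sorted(terms))
```

In this toy run, "monty" and "movie" each co-occur with "python" only once in the first result set, so they fall into WL_0; re-querying with "python monty" retrieves more movie documents and lifts "movie" above the threshold, which is the pseudo-feedback effect the procedure relies on.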

11

12 Query Routing Result
Query: “python” (user query)
If you are looking for information about… (phrase to explain topic)
movie - monty python
[1600] Search/Go to Search the Internet Movie Database
[1600] Search/Go to The Roger Ebert Movie Files
[1600] Search/Go to Horror Search
(recommended topic search engines)

13 Object oriented programming in python
[7500] Search/Go to Index to Object Oriented Information Sources
[3600] Search/Go to Unix Programming
jpython - python in java
[6300] Search/Go to java.sun.com – The Source for Java™ Technology
[5641] Search/Go to Gamelan - The official Java Directory
[4921] Search/Go to JCentral – Search the web for Java
[4266] Search/Go to Index to Object Oriented Information Sources
Other Topics….

14 Topic Centric Query Routing
Importance of Topic Centric Query Routing
• A query routing model is used.
• The query routing model does not generate centroids.
• It consists of an offline preprocessing component and an online interface.
• Offline, the query routing model takes a set of search engines as input and creates, for each engine, an approximate textual model of that engine's content or scope.

15 Online, the query routing model takes a user query as input and applies a novel query expansion technique to it. It then clusters the output of the query expansion to suggest multiple topics that the user may be interested in. Each topic is associated with a set of search engines, e.g., for the query “Python”.
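
One way to picture how a suggested topic could be associated with search engines is to compare the topic's terms with each engine's offline textual model. The sketch below is an assumption throughout: the slides do not state the matching rule, so simple term overlap is used, and the engine names and term sets are invented for illustration.

```python
def recommend_engines(topic_terms, engine_models, top_k=3):
    """Rank engines by the number of terms their textual model shares
    with the topic (term overlap is an assumed scoring rule)."""
    topic = set(topic_terms)
    scored = [(len(topic & model), name) for name, model in engine_models.items()]
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return [name for _, name in scored[:top_k]]

# Hypothetical offline models: one approximate term set per engine.
engine_models = {
    "movie-engine": {"movie", "film", "actor", "comedy"},
    "java-engine": {"java", "jpython", "applet"},
    "oo-engine": {"object", "programming", "class"},
}

movie_engines = recommend_engines({"monty", "movie", "comedy"}, engine_models)
java_engines = recommend_engines({"jpython", "java", "object"}, engine_models)
print(movie_engines, java_engines)
```

Because the engine models are built offline, only the query expansion and this lightweight comparison need to run while the user waits.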

16 The query expansion model can automatically obtain terms relevant to a query from the Web. With the query expansion model, it is not necessary to maintain a massive dictionary of terms covering a wide range of fields.

17 Conclusion
Topic centric query routing uses a query expansion model. The query expansion model obtains all the information needed for query routing from the Web. The query routing model is thus an intelligent agent that uses the Web as its knowledge base and identifies the topics of a given query dynamically through query expansion.

