Query Suggestions in the Absence of Query Logs. Sumit Bhatia, Debapriyo Majumdar, Prasenjit Mitra. SIGIR’11, July 24–28, 2011, Beijing, China.


1 Query Suggestions in the Absence of Query Logs
Sumit Bhatia, Debapriyo Majumdar, Prasenjit Mitra
SIGIR’11, July 24–28, 2011, Beijing, China

2 Introduction
• Query suggestions are most useful for difficult topics, where users know too little to formulate meaningful queries
• A meaningful query must reflect the user’s intent and information needs and must help the user find the documents that contain relevant information
• Existing web search engines rely on query logs to make query suggestions, but such logs are not available for desktop or enterprise search systems
• Solution: a document-centric probabilistic mechanism that generates query suggestions without query logs, using phrases extracted from the document corpus

3 Related Work
• Most previous work provides query expansion and refinement rather than query suggestions
• CompleteSearch (CompSearch) method:
  ◦ Provides real-time auto-completion of the last query term typed by the user
  ◦ Requires the user to type at least two characters of the last query term, which is then completed with the most frequent matching term
• SimSearch method:
  ◦ The phrase index is searched for phrases that contain the user-submitted partial query as a sub-phrase
  ◦ Selected phrases are presented to the user in order of occurrence frequency

4 Proposed QS Approach
• Based on a document-centric probabilistic mechanism
• Extracts phrases from the document corpus to build a database of phrases that can be used to complete partial user queries
• Uses N-grams of orders 1, 2, and 3 (unigrams, bigrams, and trigrams) from the document corpus
• Uses an idea similar to skip-grams rather than plain N-grams: N counts only the non-stop-words in a phrase
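The phrase extraction step can be illustrated with a minimal sketch. It assumes whitespace tokenization, a tiny hypothetical stop-word list, and phrases that begin and end on a non-stop-word; the function names are illustrative and are not taken from the paper.

```python
from collections import Counter

# Hypothetical, tiny stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def extract_phrases(tokens, max_order=3):
    """Return phrases containing 1..max_order non-stop-words.

    Each phrase starts and ends on a non-stop-word. Stop words that fall
    in between stay in the phrase text but do not count toward its order.
    """
    content_positions = [i for i, tok in enumerate(tokens) if tok not in STOP_WORDS]
    phrases = []
    for start_idx, start in enumerate(content_positions):
        for order in range(1, max_order + 1):
            end_idx = start_idx + order - 1
            if end_idx >= len(content_positions):
                break
            end = content_positions[end_idx]
            phrases.append(" ".join(tokens[start:end + 1]))
    return phrases


def build_phrase_counts(documents, max_order=3):
    """Count phrase occurrences over a list of document strings."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_phrases(doc.lower().split(), max_order))
    return counts
```

Under these assumptions, "bank of america" counts as a bigram because "of" is skipped when the order is computed.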

5 Query Suggestions
• At any given instant, after the user has entered k characters, the partial query, denoted $Q_1^k$, can be decomposed as
  $Q_1^k = Q_c + Q_t$    (1)
  where $Q_c$ is a set of words with $|Q_c| \ge 0$, and $Q_t$ is a (possibly incomplete) word with $|Q_t| \in \{0, 1\}$
• Given a partial query $Q_1^k$ and a phrase $p_i \in P = \{p_1, p_2, \ldots, p_n\}$, what is the probability $P(p_i \mid Q_1^k)$, i.e., the probability that the user will type $p_i$ after typing $Q_1^k$?
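The decomposition in (1) can be sketched as follows. The rule that a trailing space marks the last word as complete is an assumption made for this example, and `split_partial_query` is a hypothetical name.

```python
def split_partial_query(partial_query):
    """Split a partial query into (Q_c, Q_t).

    Q_c is the list of completed words; Q_t is the word still being typed,
    or the empty string when there is none.
    """
    words = partial_query.split()
    # Assumption: trailing whitespace means the last word is finished.
    if not words or partial_query[-1].isspace():
        return words, ""
    return words[:-1], words[-1]
```

For example, "ubuntu inst" yields a one-word $Q_c$ and the incomplete word "inst" as $Q_t$, while "ubuntu " yields the same $Q_c$ and an empty $Q_t$.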

6 Query Suggestions

7
• The proposed query suggestion is defined as [equation image]
• The probability of selecting a phrase given a partial word is [equation image]
• The importance of phrases is determined by their occurrence frequencies in the document corpus
Labels on the equations: "a vocabulary word that starts with $Q_t$"; "a phrase that contains the word $c_i$"
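Because the equations on this slide did not survive in the transcript, the following is only one plausible reading of the labels, not the authors' formula. It assumes that the probability of a candidate word given the partial word is its frequency normalized over all vocabulary words sharing that prefix, that the probability of a phrase given a word is its frequency normalized over all phrases containing that word, and that the two are combined by summing over candidate words. All function names are hypothetical.

```python
def completion_probs(q_t, word_counts):
    """Assumed P(c | Q_t): frequency share among words starting with q_t."""
    candidates = {w: n for w, n in word_counts.items() if w.startswith(q_t)}
    total = sum(candidates.values())
    return {w: n / total for w, n in candidates.items()} if total else {}


def phrase_probs_given_word(word, phrase_counts):
    """Assumed P(p | c): frequency share among phrases containing the word."""
    matching = {p: n for p, n in phrase_counts.items() if word in p.split()}
    total = sum(matching.values())
    return {p: n / total for p, n in matching.items()} if total else {}


def phrase_probs_given_partial(q_t, word_counts, phrase_counts):
    """Assumed P(p | Q_t) = sum over candidate words c of P(p | c) * P(c | Q_t)."""
    scores = {}
    for word, p_word in completion_probs(q_t, word_counts).items():
        for phrase, p_phrase in phrase_probs_given_word(word, phrase_counts).items():
            scores[phrase] = scores.get(phrase, 0.0) + p_word * p_phrase
    return scores
```

With both factors driven by corpus frequency, frequent completions of the partial word and frequent phrases containing them rise to the top, which matches the slide's statement that phrase importance is determined by occurrence frequency.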

8 Estimating Phrase-Query Correlation
• The contextual relationship between a phrase $p_i$ and a user-submitted query $Q_c$ is captured by their joint occurrence: $Q_c$ is the first half of the complete query and $p_i$ is the second half
• Both $P(Q_c, p_i)$ and $P(p_i)$ in the previous equation can be estimated from the corpus as follows: [equation image]
  where $D_{p_i}$ and $D_{Q_c}$ are the sets of documents that contain the phrase $p_i$ and $Q_c$, respectively
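The estimation formula itself is missing from the transcript. A natural document-frequency estimate, assumed here rather than quoted from the paper, is $P(Q_c, p_i) = |D_{p_i} \cap D_{Q_c}| / |D|$ and $P(p_i) = |D_{p_i}| / |D|$, where $D$ is the whole collection. The sketch below further assumes that a document contains $Q_c$ when it contains every word of $Q_c$; the helper names are hypothetical.

```python
def docs_containing_phrase(phrase, documents):
    """Indices of documents containing the phrase as a whole-word sequence."""
    return {i for i, doc in enumerate(documents) if f" {phrase} " in f" {doc} "}


def docs_containing_all(words, documents):
    """Indices of documents containing every word in `words`."""
    return {i for i, doc in enumerate(documents)
            if all(w in doc.split() for w in words)}


def phrase_query_correlation(q_c_words, phrase, documents):
    """Return (P(Q_c, p), P(p), P(Q_c | p)) under the assumed estimates."""
    d_p = docs_containing_phrase(phrase, documents)
    d_q = docs_containing_all(q_c_words, documents)
    n = len(documents)
    p_joint = len(d_p & d_q) / n if n else 0.0
    p_phrase = len(d_p) / n if n else 0.0
    p_cond = p_joint / p_phrase if p_phrase else 0.0
    return p_joint, p_phrase, p_cond
```

The conditional value reduces to the share of documents containing the phrase that also contain the completed part of the query.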

9 Experimental Results: Datasets
• Two datasets were used
  ◦ TREC: more than 200K news articles published in the Financial Times between 1991 and 1994
  ◦ Ubuntu: more than 100K discussion threads crawled from ubuntuforums.org, with 25 queries and relevance judgments

10 Baseline Methods
• The proposed method was compared with the following two baseline methods
  ◦ Similarity-based phrase search (SimSearch): indexed phrases that contain the user query as a sub-phrase are retrieved and ranked by occurrence frequency
  ◦ CompleteSearch (CompSearch): offers real-time auto-completion of the last query term being typed by the user, and also uses frequency as the ranking criterion
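The two baselines, as described on this slide, can be approximated with a short sketch. It is a simplification rather than a reimplementation of either system: sub-phrase matching is reduced to substring matching, CompSearch is reduced to prefix completion of the last term ranked by term frequency, and the two-character minimum comes from the Related Work slide. The function and parameter names are hypothetical.

```python
def sim_search(partial_query, phrase_counts, top_k=10):
    """Phrases containing the partial query, most frequent first."""
    query = partial_query.strip()
    if not query:
        return []
    hits = [(p, n) for p, n in phrase_counts.items() if query in p]
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [p for p, _ in hits[:top_k]]


def comp_search(partial_query, word_counts, top_k=10, min_chars=2):
    """Complete the last term being typed, most frequent completion first."""
    words = partial_query.split()
    # No completion if the last word is finished or shorter than min_chars.
    if not words or partial_query[-1].isspace() or len(words[-1]) < min_chars:
        return []
    head, prefix = words[:-1], words[-1]
    hits = sorted(
        ((w, n) for w, n in word_counts.items() if w.startswith(prefix)),
        key=lambda item: (-item[1], item[0]),
    )
    return [" ".join(head + [w]) for w, _ in hits[:top_k]]
```

Neither sketch looks at the words preceding the term being typed when ranking, which is the context that the proposed phrase-query correlation is meant to add.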

11 Test Queries
• For each dataset, 40 partial test queries were generated from 20 randomly chosen, multi-keyword, non-stop-word queries
• Type-A queries: generated by retaining only the first keyword of each of the 20 original queries
• Type-B queries: generated by retaining the first keyword of the query followed by the first k characters of the remaining query string, with k chosen at random (2 ≤ k ≤ length of the remaining query string)
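The construction of the two query types can be sketched as below. The sketch assumes that keywords are separated by single spaces, that the remainder of each original query has at least two characters, and that k is drawn uniformly; the function names are hypothetical.

```python
import random


def make_type_a(query):
    """Type-A: keep only the first keyword of the original query."""
    return query.split()[0]


def make_type_b(query, rng):
    """Type-B: first keyword plus the first k characters of the rest.

    k is drawn uniformly from 2..len(rest), both ends inclusive.
    """
    first, rest = query.strip().split(maxsplit=1)
    k = rng.randint(2, len(rest))
    return f"{first} {rest[:k]}"


def make_test_queries(original_queries, seed=0):
    """Return the Type-A and Type-B partial queries for a list of originals."""
    rng = random.Random(seed)
    type_a = [make_type_a(q) for q in original_queries]
    type_b = [make_type_b(q, rng) for q in original_queries]
    return type_a, type_b
```

Twenty original queries yield twenty Type-A and twenty Type-B partial queries, which gives the 40 test queries per dataset.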

12 Test Queries (cont’d)

13 Evaluation
• For each test query, the top 10 suggestions generated by SimSearch, CompSearch, and the proposed probabilistic method were collected and evaluated by 3 assessors
• Evaluation was performed with the help of 12 volunteers, colleagues who were not associated with the project
• For each query suggestion, each assessor assigned one of four ratings (shown on the slide), and the majority vote was used
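Majority voting over assessor ratings can be sketched as follows. The slide does not say how a three-way disagreement was resolved, so returning `None` in that case is an assumption of this example, and the integer rating values are placeholders rather than the actual labels.

```python
from collections import Counter


def majority_rating(ratings):
    """Return the rating chosen by more than half of the assessors, else None."""
    label, votes = Counter(ratings).most_common(1)[0]
    return label if votes > len(ratings) / 2 else None
```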

14 Suggestions Created by Two Test Queries

15 Success Rate of Different Methods
• A query suggestion method is successful for a given partial query if it generates at least one meaningful suggestion for that query
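The definition above translates directly into a small computation. The input format is an assumption of this sketch: each partial query maps to a list of booleans saying whether each of its suggestions was judged meaningful.

```python
def success_rate(judgments):
    """Fraction of partial queries with at least one meaningful suggestion.

    `judgments` maps each partial query to a list of booleans, one per
    suggestion, that are True when the suggestion was judged meaningful.
    """
    if not judgments:
        return 0.0
    successes = sum(1 for flags in judgments.values() if any(flags))
    return successes / len(judgments)
```

A query for which a method returns no suggestions at all counts as a failure under this definition.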

16 Quality of Suggestions

17 Precision Values Achieved by Different QS

18 Effectiveness of Suggested Queries
• The query clarity score is used to measure the retrieval performance of suggested queries
• The clarity score of a query increases when added terms reduce query ambiguity and decreases when added terms make the query more ambiguous
• The clarity score for a query q with respect to a collection of documents C is computed using KL-divergence: [equation image]
  where V is the vocabulary of the collection
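The formula is missing from the transcript. The clarity score is commonly written as the KL-divergence between the query language model and the collection language model, $\sum_{w \in V} P(w \mid q) \log_2 \frac{P(w \mid q)}{P(w \mid C)}$, and that form is assumed in this sketch. How the query model $P(w \mid q)$ is estimated is not stated on the slide, so both distributions are taken as inputs here.

```python
import math


def clarity_score(query_model, collection_model):
    """KL-divergence, in bits, of the query model from the collection model.

    Both arguments map words to probabilities. Every word with non-zero
    probability in `query_model` must also have non-zero probability in
    `collection_model`.
    """
    score = 0.0
    for word, p_q in query_model.items():
        if p_q > 0.0:
            score += p_q * math.log2(p_q / collection_model[word])
    return score
```

A query model identical to the collection model scores 0, and the score grows as the query model concentrates on words that are rare in the collection.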

19 Clarity Scores Achieved by Different QS

20 Conclusions and Future Work
• Meaningful query suggestions can be made in the absence of query logs with a probabilistic approach that uses the occurrence of terms and phrases in a corpus of documents
• Future work
  ◦ Ensure that badly formed combinations of phrases are eliminated from the suggestions
  ◦ Explore the use of synonyms and synonymous phrases so that the system can suggest alternatives
  ◦ Develop a systematic approach to diversifying the suggested queries
  ◦ Apply the method at a larger scale

