Query Suggestions in the Absence of Query Logs. Sumit Bhatia, Debapriyo Majumdar, Prasenjit Mitra. SIGIR’11, July 24–28, 2011, Beijing, China.

Introduction
• Query suggestions are most useful for difficult topics, where users know too little to formulate meaningful queries
• A meaningful query suggestion must reflect the user’s query intent and information needs, and must help the user find documents containing relevant information
• Existing web search engines rely on query logs to make query suggestions; such logs are not available for desktop or enterprise search systems
• Solution: a document-centric probabilistic mechanism that generates query suggestions without query logs, using phrases extracted from the document corpus

Related Work
• Most previous work addresses query expansion and refinement rather than query suggestion
• CompleteSearch (CompSearch) method:
  ◦ Provides real-time auto-completion of the last query term typed by the user
  ◦ Requires the user to type at least two characters of the last query term; the most frequent matching terms are suggested
• SimSearch method:
  ◦ A phrase index is searched for phrases that contain the user’s partial query as a sub-phrase
  ◦ The selected phrases are presented to the user in order of occurrence frequency

Proposed QS Approach
• Based on a document-centric probabilistic mechanism
• Phrases are extracted from the document corpus to build a phrase database that is used to complete partial user queries
• N-grams of order 1, 2, and 3 (unigrams, bigrams, and trigrams) are extracted from the document corpus
• The extraction follows an idea similar to skip-grams rather than plain N-grams: the order N of a phrase counts only its non-stop-words
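
The phrase extraction step can be illustrated with a minimal sketch. It assumes that a phrase starts and ends on a non-stop-word and that stop words inside the phrase are kept but not counted toward its order; the slides do not spell out these details. The stop-word list and the function name `extract_phrases` are hypothetical.

```python
# Hypothetical, tiny stop-word list used only for illustration.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "to", "for"}


def extract_phrases(tokens, max_order=3):
    """Return phrases whose order (number of non-stop-words) is 1..max_order.

    Assumption: each phrase starts and ends on a non-stop-word; stop words
    in between stay in the phrase text but do not count toward its order.
    """
    phrases = []
    for start, first in enumerate(tokens):
        if first in STOP_WORDS:
            continue  # a phrase may not start on a stop word
        order = 0
        for end in range(start, len(tokens)):
            if tokens[end] in STOP_WORDS:
                continue  # skip over stop words without closing the phrase
            order += 1
            if order > max_order:
                break
            phrases.append(" ".join(tokens[start:end + 1]))
    return phrases


phrases = extract_phrases("the bank of america".split())
```

Under these assumptions, "bank of america" counts as a bigram because only "bank" and "america" are non-stop-words.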

Query Suggestions
• At any given instant, the user has entered k characters, denoted $Q_1^k$, which can be decomposed as
  $Q_1^k = Q_c + Q_t$  (1)
  where $Q_c$ is a set of completed words with $|Q_c| \ge 0$, and $Q_t$ is a (possibly incomplete) word with $|Q_t| \in \{0, 1\}$
• Given a partial query $Q_1^k$ and a phrase $p_i \in P = \{p_1, p_2, \ldots, p_n\}$, what is the probability $P(p_i \mid Q_1^k)$, i.e., the probability that the user will type $p_i$ after typing $Q_1^k$?
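
The decomposition in Eq. (1) can be sketched as follows. This is an illustrative reading that assumes whitespace tokenization and treats a trailing space as the signal that the last word is complete; the helper name `split_partial_query` is hypothetical.

```python
def split_partial_query(query):
    """Split a partial query into (Q_c, Q_t).

    Q_c is the list of completed words; Q_t is the word still being typed,
    or "" when the query is empty or ends with whitespace (assumption).
    """
    words = query.split()
    if not words or query[-1].isspace():
        return words, ""
    return words[:-1], words[-1]


q_c, q_t = split_partial_query("information re")
```

Here `q_c` holds the completed context and `q_t` the prefix that candidate completions must start with.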

Query Suggestions (cont’d)

• The proposed query suggestion is defined as: [equation not preserved in the transcript]
• The probability of selecting a phrase given a partial word is: [equation not preserved in the transcript]
  ◦ Equation annotations: $c_i$ is a vocabulary word that starts with $Q_t$; the phrase is one that contains the word $c_i$
• The importance of phrases is determined by their occurrence frequencies in the document corpus

Estimating Phrase-Query Correlation
• The contextual relationship between a phrase $p_i$ and the user-submitted query context $Q_c$ is captured through their joint occurrence: $p_i$ is the second half of the complete query and $Q_c$ is the first half
• Both $P(Q_c, p_i)$ and $P(p_i)$ in the previous equation can be estimated from the corpus [equations not preserved in the transcript], where $D_{p_i}$ and $D_{Q_c}$ are the sets of documents that contain the phrase $p_i$ and $Q_c$, respectively
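
Since the estimation formulas themselves are missing from the transcript, the sketch below only illustrates one plausible form: document-fraction estimates built from the two document sets. The estimator, the substring-based matching, and the function name `cooccurrence_estimates` are assumptions made for illustration, not the authors' stated formulas.

```python
def cooccurrence_estimates(docs, context, phrase):
    """Estimate P(Q_c, p_i) and P(p_i) as fractions of documents.

    Assumption: a document "contains" a string if it occurs as a substring,
    which is a simplification of real phrase matching.
    """
    n = len(docs)
    d_phrase = {i for i, doc in enumerate(docs) if phrase in doc}    # D_{p_i}
    d_context = {i for i, doc in enumerate(docs) if context in doc}  # D_{Q_c}
    p_joint = len(d_phrase & d_context) / n
    p_phrase = len(d_phrase) / n
    return p_joint, p_phrase


docs = ["stock market crash", "stock market rally", "flea market", "stock exchange"]
p_joint, p_phrase = cooccurrence_estimates(docs, "stock", "market")
```

With these estimates, the ratio `p_joint / p_phrase` is the fraction of documents containing the phrase that also contain the query context.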

Experimental Results: Datasets
• Two datasets were used
• TREC:
  ◦ More than 200K news articles published in the Financial Times between 1991 and 1994
• Ubuntu:
  ◦ More than 100K discussion threads crawled from ubuntuforums.org, with 25 queries and relevance judgments

Baseline Methods
• The proposed method was compared with the following two baseline methods
• Similarity-based phrase search (SimSearch):
  ◦ Indexed phrases that contain the user query as a sub-phrase are retrieved and ranked by occurrence frequency
• CompleteSearch (CompSearch):
  ◦ Offers real-time auto-completion of the last query term being typed by the user
  ◦ Also uses frequency as the ranking criterion
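
The SimSearch baseline as described here can be approximated with a short sketch. It assumes the phrase index is a plain mapping from phrase to corpus frequency and uses substring matching as a stand-in for sub-phrase matching; the function name `sim_search` and the tie-breaking rule are hypothetical.

```python
def sim_search(phrase_counts, partial_query, top_k=10):
    """Return up to top_k indexed phrases containing partial_query.

    Phrases are ranked by descending frequency; ties are broken
    alphabetically (an assumption, chosen to keep the output deterministic).
    """
    matches = [(p, c) for p, c in phrase_counts.items() if partial_query in p]
    matches.sort(key=lambda pc: (-pc[1], pc[0]))
    return [p for p, _ in matches[:top_k]]


phrase_counts = {
    "stock market": 5,
    "stock market crash": 2,
    "flea market": 3,
    "stock exchange": 4,
}
suggestions = sim_search(phrase_counts, "stock")
```

Because the ranking depends on frequency alone, this baseline ignores how well a phrase fits the rest of the query.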

Test Queries
• For each dataset, 40 partial test queries were generated from 20 randomly chosen, non-stop-word, multi-keyword queries
• Type-A queries:
  ◦ Generated by retaining only the first keyword of each of the 20 original queries
• Type-B queries:
  ◦ Generated by retaining the first keyword of the query followed by the first k characters of the remaining query string, with k chosen at random (2 ≤ k ≤ length of the remaining query string)
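
The two query types can be reproduced with a small sketch. It assumes that keywords are separated by single spaces and that each original query has at least two keywords; the function names `type_a` and `type_b` are hypothetical.

```python
import random


def type_a(query):
    """Type-A: keep only the first keyword of the original query."""
    return query.split()[0]


def type_b(query, rng):
    """Type-B: first keyword plus the first k characters of the remainder.

    k is drawn uniformly from 2..len(remainder), inclusive.
    """
    first, _, rest = query.partition(" ")
    k = rng.randint(2, len(rest))
    return first + " " + rest[:k]


rng = random.Random(0)  # fixed seed so the example is repeatable
original = "information retrieval systems"
partial_a = type_a(original)
partial_b = type_b(original, rng)
```

Each original query yields one Type-A and one Type-B partial query, which matches the 20-to-40 ratio given on the slide.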

Test Queries (cont’d)

Evaluation
• For each test query, the top 10 suggestions generated by SimSearch, CompSearch, and the proposed probabilistic method were collected and evaluated by 3 assessors
• Evaluation was performed with the help of 12 volunteers, colleagues who were not associated with the project
• For each query suggestion, each assessor assigned one of four ratings, and the final rating was decided by majority vote

Suggestions Created by Two Test Queries

Success Rate of Different Methods
• A query suggestion method is successful for a given partial query if it generates at least one meaningful suggestion for that query
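
The success criterion translates directly into a success rate over a set of partial queries. The sketch below assumes each suggestion has already been reduced to a boolean "meaningful" label; the data layout and the function name `success_rate` are hypothetical.

```python
def success_rate(judgments):
    """Fraction of partial queries with at least one meaningful suggestion.

    judgments maps each partial query to a list of booleans, one per
    suggestion, where True means the suggestion was judged meaningful.
    """
    successes = sum(1 for labels in judgments.values() if any(labels))
    return successes / len(judgments)


judgments = {
    "stock": [False, True, True],
    "flea ma": [False, False],
    "ubuntu in": [True],
    "kernel": [],  # no suggestions generated counts as a failure
}
rate = success_rate(judgments)
```

A method that returns no suggestions at all for a query is counted as unsuccessful for that query under this reading.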

Quality of Suggestions

Precision Values Achieved by Different QS

Effectiveness of Suggested Queries
• The query clarity score is used to measure the retrieval performance of suggested queries
• The clarity score of a query increases when added terms reduce query ambiguity and decreases when added terms make the query more ambiguous
• The clarity score of a query q with respect to a document collection C is computed using KL-divergence [equation not preserved in the transcript], where V is the vocabulary of the collection
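
As a rough illustration, the sketch below uses the commonly cited form of the clarity score, the KL divergence between a query language model and the collection language model: $\sum_{w \in V} P(w \mid q) \log_2 \frac{P(w \mid q)}{P(w \mid C)}$. The slide's own formula is not available here, so this form is an assumption, and both distributions are taken as given rather than estimated. The function name `clarity_score` is hypothetical.

```python
import math


def clarity_score(query_model, collection_model):
    """KL divergence (in bits) of the query model from the collection model.

    query_model and collection_model map each vocabulary word w to
    P(w | q) and P(w | C). Words with P(w | q) == 0 contribute nothing.
    """
    return sum(
        p_q * math.log2(p_q / collection_model[w])
        for w, p_q in query_model.items()
        if p_q > 0
    )


collection = {"stock": 0.5, "market": 0.5}
focused = {"stock": 1.0, "market": 0.0}  # query model concentrated on one word
score = clarity_score(focused, collection)
```

A query model identical to the collection model scores zero, while a model concentrated on few words scores higher, which matches the ambiguity behaviour described in the bullets.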

Clarity Scores Achieved by Different QS

Conclusions and Future Work
• Meaningful query suggestions can be made in the absence of query logs with a probabilistic approach that uses the occurrence of terms and phrases in a document corpus
• Future work:
  ◦ Ensure that badly formed combinations of phrases are eliminated from the suggestions
  ◦ Explore the use of synonyms and synonymous phrases so that the system can suggest alternatives
  ◦ Develop a systematic approach to diversifying the suggested queries
  ◦ Apply the approach at a larger scale