
Putting Query Representation and Understanding in Context:
A Decision-Theoretic Framework for Optimal Interactive Retrieval through Dynamic User Modeling

ChengXiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign

Including joint work with Xuehua Shen and Bin Tan

What is a query?

Query = a sequence of keywords that describes the information need of a particular user, at a particular time, for finishing a particular task.

Example query: "iPhone battery". Rich context! Query = just a sequence of keywords?

A query must be put in context

Query: "Jaguar". Mac OS? Car? Animal?
- What queries did the user type in before this query?
- What documents were just viewed by this user?
- What documents were skipped by this user?
- What other users looked for similar information?
- ...

Context helps query understanding

Possible senses of "Jaguar": car, software, animal. Suppose we know:
1. Previous query = "racing cars" vs. "Apple OS"
2. "car" occurs far more frequently than "Apple" in pages browsed by the user in the last 20 days
3. The user just viewed an "Apple OS" document

Questions

- How can we model a query in a context-sensitive way? → Generalize query representation to a user model.
- How can we model the dynamics of user information needs? → Dynamic updating of user models.
- How can we put query representation into a retrieval framework to improve search? → A framework for optimal interactive retrieval.

Rest of the talk: the UCAIR Project

1. A decision-theoretic framework
2. Statistical language models for implicit feedback (personalized search without extra user effort)
3. Open challenges

UCAIR Project

UCAIR = User-Centered Adaptive IR:
- user modeling ("user-centered")
- search context modeling ("adaptive")
- interactive retrieval

Implemented as a personalized search agent that:
- sits on the client side (owned by the user)
- integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
- collaborates with other agents
- goes beyond search toward task support

Main Idea: Putting the User in the Center!

[Diagram: instead of N users sharing one search engine, a personalized search agent sits with each user, mediating between the user's query (e.g., "java") and many sources: Web search engines, desktop files, viewed Web pages, query history, ...]

A search agent can know about a particular user very well.

1. A Decision-Theoretic Framework for Optimal Interactive Retrieval

IR as Sequential Decision Making

A dialogue between the user (information need) and the system (model of the information need):
- A1: the user enters a query. The system decides which documents to present and how, and returns results R_i (i = 1, 2, 3, ...).
- The user decides which documents to view. A2: the user views a document. The system decides which part of the document to show, and how, and returns the document content R'.
- The user decides whether to view more. A3: the user clicks the "Back" button. ...

Retrieval Decisions

The user U takes actions A_1, A_2, ..., A_{t-1}, A_t; the system returns responses R_1, R_2, ..., R_{t-1}. Let H = {(A_i, R_i)}, i = 1, ..., t-1, be the interaction history and C the document collection.

Given U, C, A_t, and H, choose the best R_t ∈ r(A_t), the set of all possible responses to A_t:
- A_t = query "Jaguar": r(A_t) = all possible rankings of C; the best R_t is the best ranking for the query.
- A_t = click on the "Next" button: r(A_t) = all possible rankings of unseen documents; the best R_t is the best ranking of the unseen documents.

A Risk Minimization Framework

Observed: the user U, the interaction history H, the current user action A_t, the document collection C, and all possible responses r(A_t) = {r_1, ..., r_n}.
Inferred: the user model M = (S, θ_U, ...), where S is the set of seen documents and θ_U is the information need.
A loss function L(r_i, A_t, M) measures the badness of each candidate response; the optimal response r* is the one with minimum loss, i.e., minimum Bayes risk under the posterior over user models.
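Written out in LaTeX, the Bayes-risk decision rule the slide refers to (in the form given by the risk minimization framework of Zhai & Lafferty, cited in the references) is:

```latex
% Optimal response: minimize expected loss under the posterior over user models M
r^* = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM
```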

A Simplified Two-Step Decision-Making Procedure

Approximate the Bayes risk by the loss at the mode of the posterior distribution. Two-step procedure:
- Step 1: compute an updated user model M* based on the currently available information.
- Step 2: given M*, choose a response to minimize the loss function.
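In the notation of the previous slide, the two steps amount to:

```latex
% Step 1: point estimate of the user model (posterior mode)
M^* = \arg\max_M P(M \mid U, H, A_t, C)
% Step 2: minimize the loss at the mode instead of the full Bayes risk
R_t = \arg\min_{r \in r(A_t)} L(r, A_t, M^*)
```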

Optimal Interactive Retrieval

[Diagram: the interaction loop between the user, the IR system, and the collection C. For each user action A_1, A_2, A_3, ..., the system infers an updated user model M*_i from P(M_i | U, H, A_i, C) and returns the response R_i that minimizes L(r, A_i, M*_i).]

Refinement of Risk Minimization

r(A_t): decision space (depends on A_t)
- r(A_t) = all possible subsets of C (document selection)
- r(A_t) = all possible rankings of documents in C
- r(A_t) = all possible rankings of unseen documents
- r(A_t) = all possible subsets of C + summarization strategies

M: user model
- Essential component: θ_U = user information need
- S = seen documents
- n = "topic is new to the user"

L(R_t, A_t, M): loss function
- Generally measures the utility of R_t for a user modeled as M
- Often encodes retrieval criteria (e.g., using M to select a ranking of documents)

P(M | U, H, A_t, C): user model inference
- Often involves estimating a unigram language model θ_U

Case 1: Context-Insensitive IR
- A_t = "enter a query Q"
- r(A_t) = all possible rankings of documents in C
- M = θ_U, a unigram language model (word distribution)
- p(M | U, H, A_t, C) = p(θ_U | Q)
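A minimal sketch of what Case 1 instantiates in practice, assuming the loss function encodes standard language-model ranking (ranking by query likelihood, which is rank-equivalent to minimizing KL divergence from the empirical query model); the Dirichlet smoothing parameter and the whitespace tokenization are illustrative assumptions, not details from the talk:

```python
import math
from collections import Counter

def doc_lm(doc_tokens, collection_lm, mu=2000.0):
    """Dirichlet-smoothed unigram document model p(w|d); mu is an assumed default."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return lambda w: (counts[w] + mu * collection_lm.get(w, 1e-9)) / (n + mu)

def rank_context_insensitive(query, docs, collection_lm):
    """Case 1: theta_U is estimated from Q alone, so documents are ranked
    by the log-likelihood of the query under each smoothed document model."""
    q_counts = Counter(query.lower().split())
    scored = []
    for doc_id, tokens in docs.items():
        p = doc_lm(tokens, collection_lm)
        score = sum(c * math.log(p(w)) for w, c in q_counts.items())
        scored.append((score, doc_id))
    return sorted(scored, reverse=True)
```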

Case 2: Implicit Feedback
- A_t = "enter a query Q"
- r(A_t) = all possible rankings of documents in C
- M = θ_U, a unigram language model (word distribution)
- H = {previous queries} + {viewed snippets}
- p(M | U, H, A_t, C) = p(θ_U | Q, H)

Case 3: General Implicit Feedback
- A_t = "enter a query Q", or the "Back"/"Next" button
- r(A_t) = all possible rankings of unseen documents in C
- M = (θ_U, S), where S = seen documents
- H = {previous queries} + {viewed snippets}
- p(M | U, H, A_t, C) = p(θ_U | Q, H)

Case 4: User-Specific Result Summary
- A_t = "enter a query Q"
- r(A_t) = {(D, η)}, D ⊆ C, |D| = k, η ∈ {"snippet", "overview"}
- M = (θ_U, n), n ∈ {0, 1}: "topic is new to the user"
- p(M | U, H, A_t, C) = p(θ_U, n | Q, H); M* = (θ*, n*)

Loss for each summary type:

                  n* = 1    n* = 0
  η = snippet       1         0
  η = overview      0         1

Choose the k most relevant documents. If the topic is new (n* = 1), give an overview summary; otherwise, a regular snippet summary.
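A toy sketch of this decision rule, assuming we have the posterior probability p_new = p(n = 1 | Q, H); with the 0/1 losses in the table above, minimizing expected loss reduces to picking the more probable option:

```python
def choose_summary_type(p_new: float) -> str:
    """Bayes-optimal summary type under the 0/1 loss table: a snippet costs 1
    when the topic is new (n* = 1); an overview costs 1 when it is not."""
    expected_loss = {
        "snippet": p_new,         # wrong exactly when the topic is new
        "overview": 1.0 - p_new,  # wrong exactly when the topic is familiar
    }
    return min(expected_loss, key=expected_loss.get)

# e.g., choose_summary_type(0.8) == "overview"
```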

Case 5: Active Feedback
- A_t = "enter a query Q", or the "Back"/"Next" button
- r(A_t) = all subsets of k documents in C; r_i = {d_1, ..., d_k}
- A_{t+1} = {J_1, ..., J_k}: relevance judgments on R_t
- M = θ_U, a unigram language model (word distribution)
- L = utility of R_t for the user + utility of R_t for feedback

Trade-off between relevance and diversity: for difficult topics, diversity dominates the loss.
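The slide does not commit to one concrete loss here; as one illustration of a relevance/diversity trade-off, an MMR-style greedy selection could stand in for the document-selection step (a sketch under that assumption, not the method of the papers cited):

```python
def select_k_docs(candidates, relevance, similarity, k, lam=0.5):
    """Greedy MMR-style selection of k documents. lam trades relevance against
    redundancy with already-selected documents; for a difficult topic one
    would lower lam, letting diversity dominate (as the slide suggests)."""
    selected = []
    pool = set(candidates)
    while pool and len(selected) < k:
        def mmr_score(d):
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1.0 - lam) * redundancy
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected
```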

2. Statistical Language Models for Implicit Feedback (Personalized Search without Extra User Effort)

Risk Minimization for Implicit Feedback
- A_t = "enter a query Q"
- r(A_t) = all possible rankings of documents in C
- M = θ_U, a unigram language model (word distribution)
- H = {previous queries} + {viewed snippets}
- p(M | U, H, A_t, C) = p(θ_U | Q, H)

→ We need to estimate a context-sensitive language model.

Estimate a Context-Sensitive LM

User model = query history + clickthrough history:
- Query history: Q_1, Q_2, ..., Q_k, e.g., Q_1 = "Apple software", current query Q_k = "Jaguar".
- Clickthrough history: C_i = {C_{i,1}, C_{i,2}, C_{i,3}, ...}, the snippets the user clicked for Q_i, e.g., "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, ..."
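A minimal sketch of turning these two histories into unigram language models; the whitespace tokenization, maximum-likelihood estimation, and example strings are illustrative assumptions:

```python
from collections import Counter

def unigram_lm(texts):
    """Maximum-likelihood unigram model from a list of text strings."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Hypothetical history data in the spirit of the slide's example:
past_queries = ["apple software", "apple os"]
clicked_snippets = [
    "Apple - Mac OS X: The Apple Mac OS X product page. "
    "Describes features in the current version of Mac OS X."
]
h_q = unigram_lm(past_queries)      # query-history model p(w | H_Q)
h_c = unigram_lm(clicked_snippets)  # clickthrough-history model p(w | H_C)
```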

Short-term vs. long-term implicit feedback

Short-term implicit feedback:
- context = current retrieval session
- past queries in the context are closely related to the current query
- clickthroughs → the user's current interests

Long-term implicit feedback:
- context = the entire search interaction history
- not all past queries/clickthroughs are related to the current query

"Bayesian interpolation" for short-term implicit feedback

Average the query history Q_1, ..., Q_{k-1} and the clickthrough history C_1, ..., C_{k-1} into history models, and use them as a Dirichlet prior when estimating the model of the current query Q_k. Intuition: trust the current query Q_k more if it is longer.
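In LaTeX, an estimator consistent with this description (the BayesInt method of the SIGIR 2005 paper in the references, with μ and ν as the prior weights on the query-history model p(w | H_Q) and the clickthrough-history model p(w | H_C); the notation is reconstructed, not copied from the slide):

```latex
p(w \mid \theta_k) \;=\;
  \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}
       {|Q_k| + \mu + \nu}
```

Because the prior mass μ + ν is fixed, a longer current query (larger |Q_k|) automatically outweighs the history, matching the stated intuition.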

Overall Effect of Search Context

Improvement of Q + H_Q + H_C over the query-only baseline (MAP / pr@20):

Method (parameters)            Q3 improvement    Q4 improvement
FixInt (α=0.1, β=1.0)          72.4% / 32.6%     66.2% / 15.5%
BayesInt (μ=0.2, ν=5.0)        93.8% / 39.4%     78.2% / 19.9%
OnlineUp (μ=5.0, ν=15.0)       67.7% / 20.2%     47.8% / 6.9%
BatchUp (μ=2.0, ν=15.0)        92.4% / 39.4%     77.2% / 16.4%

- Short-term context helps the system improve retrieval accuracy.
- BayesInt is better than FixInt; BatchUp is better than OnlineUp.

Using Clickthrough Data Only

BayesInt (μ=0.0, ν=5.0), i.e., query history switched off (MAP / pr@20 improvement over the query-only baseline):

All documents:
  Q3 + H_C: 81.9% / 37.1%
  Q4 + H_C: 72.6% / 18.1%
→ Clickthrough is the major contributor.

Performance on unseen documents:
  Q3 + H_C: 99.7% / 42.4%
  Q4 + H_C: 67.2% / 13.9%

Using snippets of non-relevant documents:
  Q3 + H_C: 23.8% / 23.0%
  Q4 + H_C: 15.7% / -4.1%
→ Snippets for non-relevant docs are still useful!

Mixture model with dynamic weighting for long-term implicit feedback

Each past session i, with query q_i, result documents D_i, and clickthrough C_i, yields a session model θ_{S_i}. The history model θ_H mixes the session models θ_{S_1}, ..., θ_{S_{t-1}} with weights λ_1, ..., λ_{t-1}, and the final model interpolates the current query model θ_q with the history: θ_{q,H} = λ_q θ_q + (1 - λ_q) θ_H. The weights {λ} are selected to maximize P(D_t | θ_{q,H}), using the EM algorithm.
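A compact sketch of the EM step, written as generic mixture-weight estimation against fixed component models (the session models plus the current-query model); the initialization, iteration count, and smoothing floor are illustrative assumptions:

```python
def em_mixture_weights(doc_counts, components, iters=50):
    """Fit mixture weights lambda_i to maximize
    sum_w c(w, D_t) * log( sum_i lambda_i * p(w | theta_i) ),
    holding the component models fixed.
    doc_counts: {word: count} from D_t; components: list of {word: prob} models."""
    k = len(components)
    lam = [1.0 / k] * k  # uniform initialization
    for _ in range(iters):
        # E-step: expected count attributed to each component
        stats = [0.0] * k
        for w, c in doc_counts.items():
            joint = [lam[i] * components[i].get(w, 1e-12) for i in range(k)]
            z = sum(joint)
            for i in range(k):
                stats[i] += c * joint[i] / z
        # M-step: renormalize expected counts into new weights
        total = sum(stats)
        lam = [s / total for s in stats]
    return lam
```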

Results: Different Individual Search Models

recurring queries ≫ fresh queries
combination ≈ clickthrough > docs > query, contextless

Results: Different Weighting Schemes for the Overall History Model

hybrid ≈ EM > cosine > equal > contextless

Open Challenges

- What is a query?
- How to collect as much context information as possible without infringing on user privacy?
- How to store and organize the collected context information?
- How to accurately interpret/exploit context information?
- How to formally represent the evolving information need of a user?
- How to optimize search results for an entire session?
- What's the right architecture (client-side, server-side, or a client-server combination)?

References

Framework:
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Implicit User Modeling for Personalized Search. In Proceedings of CIKM 2005.
- ChengXiang Zhai and John Lafferty. A Risk Minimization Framework for Information Retrieval. Information Processing and Management, 42(1), Jan. 2006.

Short-term implicit feedback:
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-Sensitive Information Retrieval Using Implicit Feedback. In Proceedings of SIGIR 2005.

Long-term implicit feedback:
- Bin Tan, Xuehua Shen, and ChengXiang Zhai. Mining Long-Term Search History to Improve Search Accuracy. In Proceedings of KDD 2006.

Thank You!