
1 Putting Query Representation and Understanding in Context: A Decision-Theoretic Framework for Optimal Interactive Retrieval through Dynamic User Modeling
ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign
Including joint work with Xuehua Shen and Bin Tan
SIGIR 2010 Workshop on Query Representation and Understanding, July 23, 2010, Geneva, Switzerland

2 What is a query?
Query = a sequence of keywords?
Query = a sequence of keywords that describe the information need of a particular user at a particular time for finishing a particular task.
Example: even a short query such as "iPhone battery" comes with rich context!

3 A query must be put in a context
Example query: "Jaguar". Mac OS? Car? Animal?
- What queries did the user type in before this query?
- What documents were just viewed by this user?
- What documents were skipped by this user?
- What other users looked for similar information?
- ...

4 Context helps query understanding
Is "Jaguar" about a car, software, or an animal? Suppose we know:
1. The previous query was "racing cars" vs. "Apple OS"
2. "car" occurs far more frequently than "Apple" in pages browsed by the user in the last 20 days
3. The user just viewed an "Apple OS" document

5 Questions
- How can we model a query in a context-sensitive way? → Generalize query representation to a user model
- How can we model the dynamics of user information needs? → Dynamic updating of user models
- How can we put query representation into a retrieval framework to improve search? → A framework for optimal interactive retrieval

6 Rest of the talk: the UCAIR Project
1. A decision-theoretic framework
2. Statistical language models for implicit feedback (personalized search without extra user effort)
3. Open challenges

7 UCAIR Project
UCAIR = User-Centered Adaptive IR
- user modeling ("user-centered")
- search context modeling ("adaptive")
- interactive retrieval
Implemented as a personalized search agent that
- sits on the client side (owned by the user)
- integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
- can collaborate with other agents
- goes beyond search toward task support

8 Main Idea: Putting the User in the Center!
(Diagram: instead of a single search engine answering the query "java" directly, a personalized search agent sits between the user and multiple sources (web search engine, email, desktop files) and also observes the user's viewed web pages and query history.)
A search agent can know about a particular user very well.

9 1. A Decision-Theoretic Framework for Optimal Interactive Retrieval

10 IR as Sequential Decision Making
(Diagram: the user, with an information need, and the system, with a model of that information need, take turns.)
- A1: the user enters a query; the system decides which documents to present and how to present them, returning results Ri (i = 1, 2, 3, ...)
- A2: the user decides which documents to view and views one; the system decides which part of the document to show and how, returning the document content R'
- A3: the user decides whether to view more and clicks the "Back" button; and so on

11 Retrieval Decisions
- User U takes actions A1, A2, ..., At-1, At; the system returns responses R1, R2, ..., Rt-1.
- Interaction history: H = {(Ai, Ri)}, i = 1, ..., t-1; document collection: C.
- Given U, C, At, and H, choose the best Rt ∈ r(At), the set of all possible responses to At.
- Example: At = query "Jaguar"; r(At) = all possible rankings of C; the best Rt = the best ranking for the query.
- Example: At = click on the "Next" button; r(At) = all possible rankings of unseen docs; the best Rt = the best ranking of the unseen docs.

12 A Risk Minimization Framework
- Observed: user U, interaction history H, current user action At, document collection C.
- Inferred: user model M = (S, θU, ...), where θU is the information need and S the seen documents.
- All possible responses: r(At) = {r1, ..., rn}.
- Loss function L(ri, At, M).
- Optimal response r*: the response with minimum loss (Bayes risk).
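In symbols, the Bayes-risk decision rule this slide sketches is the following (notation as above; the full derivation is in the risk minimization paper listed under References):

$$ r^* = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, p(M \mid U, H, A_t, C)\, dM $$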

13 A Simplified Two-Step Decision-Making Procedure
Approximate the Bayes risk by the loss at the mode of the posterior distribution.
- Step 1: compute an updated user model M* based on the currently available information.
- Step 2: given M*, choose a response to minimize the loss function.
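Written out, the two steps approximate the full Bayes risk as:

$$ M^* = \arg\max_M p(M \mid U, H, A_t, C), \qquad r^* = \arg\min_{r \in r(A_t)} L(r, A_t, M^*) $$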

14 Optimal Interactive Retrieval
(Diagram: at each step the IR system observes the user action At together with U, H, and the collection C, infers the updated user model M*t from P(Mt | U, H, At, C), returns the response Rt that minimizes L(r, At, M*t), and then waits for the next action At+1.)

15 Refinement of Risk Minimization
- r(At): decision space (At dependent)
  - r(At) = all possible subsets of C (document selection)
  - r(At) = all possible rankings of docs in C
  - r(At) = all possible rankings of unseen docs
  - r(At) = all possible subsets of C + summarization strategies
- M: user model
  - Essential component: θU = user information need
  - S = seen documents
  - n = "topic is new to the user"
- L(Rt, At, M): loss function
  - Generally measures the utility of Rt for a user modeled as M
  - Often encodes retrieval criteria (e.g., using M to select a ranking of docs)
- P(M | U, H, At, C): user model inference
  - Often involves estimating a unigram language model θU

16 Case 1: Context-Insensitive IR
- At = "enter a query Q"
- r(At) = all possible rankings of docs in C
- M = θU, a unigram language model (word distribution)
- p(M | U, H, At, C) = p(θU | Q)
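As a concrete, simplified illustration of this case: with θU estimated from the query alone and a ranking loss that prefers documents whose language models are close to θU, the optimal response reduces to standard language-model ranking. The sketch below scores documents by negative cross entropy between a query model and Dirichlet-smoothed document models (rank-equivalent to KL-divergence retrieval); the function names and the smoothing parameter are illustrative, not taken from the talk.

```python
import math
from collections import Counter

def doc_model(doc_tokens, collection_model, mu=2000.0):
    """Dirichlet-smoothed unigram document model p(w | d)."""
    counts = Counter(doc_tokens)
    dlen = len(doc_tokens)
    return lambda w: (counts.get(w, 0) + mu * collection_model.get(w, 1e-9)) / (dlen + mu)

def rank_documents(query_model, docs, collection_model):
    """Rank documents by sum_w p(w | theta_U) * log p(w | d),
    i.e. negative cross entropy of the document model under the query model
    (higher is better, rank-equivalent to smaller KL divergence)."""
    scored = []
    for doc_id, tokens in docs.items():
        p_d = doc_model(tokens, collection_model)
        score = sum(p_q * math.log(p_d(w)) for w, p_q in query_model.items())
        scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```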

17 Case 2: Implicit Feedback
- At = "enter a query Q"
- r(At) = all possible rankings of docs in C
- M = θU, a unigram language model (word distribution)
- H = {previous queries} + {viewed snippets}
- p(M | U, H, At, C) = p(θU | Q, H)

18 Case 3: General Implicit Feedback
- At = "enter a query Q", or the "Back"/"Next" button
- r(At) = all possible rankings of unseen docs in C
- M = (θU, S), where S = seen documents
- H = {previous queries} + {viewed snippets}
- p(M | U, H, At, C) = p(θU | Q, H)

19 Case 4: User-Specific Result Summary
- At = "enter a query Q"
- r(At) = {(D, ε)}, D ⊆ C, |D| = k, ε ∈ {"snippet", "overview"}
- M = (θU, n), n ∈ {0, 1}: "topic is new to the user"
- p(M | U, H, At, C) = p(θU, n | Q, H), with mode M* = (θ*, n*)
- Loss table (loss 1 for a mismatched summary type):
    ε = snippet:  loss 1 if n* = 1, loss 0 if n* = 0
    ε = overview: loss 0 if n* = 1, loss 1 if n* = 0
- Decision: choose the k most relevant docs; if the topic is new (n* = 1), give an overview summary, otherwise a regular snippet summary.
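To make the decision concrete, here is a toy sketch of the summary-type choice implied by the 0/1 loss table. The slide's two-step procedure uses the posterior mode n*, while the version below compares expected losses directly; both reduce to "overview if the topic is more likely new than not". The function name is illustrative.

```python
def choose_summary_type(p_new_topic):
    """Expected loss under the 0/1 table:
    snippet costs 1 when the topic is new, overview costs 1 when it is not."""
    expected_loss = {
        "snippet": p_new_topic,         # wrong exactly when n = 1
        "overview": 1.0 - p_new_topic,  # wrong exactly when n = 0
    }
    return min(expected_loss, key=expected_loss.get)
```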

20 Case 5: Active Feedback
- At = "enter a query Q", or the "Back"/"Next" button
- r(At) = all subsets of k docs in C; ri = {d1, ..., dk}
- At+1 = {J1, ..., Jk}, relevance judgments on Rt
- M = θU, a unigram language model (word distribution)
- L = utility of Rt for the user + utility of Rt for feedback
- Tradeoff between relevance and diversity; for difficult topics, diversity dominates the loss.
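Schematically (the weighting factor is my notation, not the talk's), the combined loss balances immediate retrieval utility against the value of the judgments the result set will elicit:

$$ L(R_t, A_t, M) \;=\; L_{\mathrm{retrieval}}(R_t, M) \;+\; \lambda\, L_{\mathrm{feedback}}(R_t, M) $$

with the feedback term effectively weighted more heavily for difficult topics, where learning from judgments (diversity) matters more than squeezing out immediate relevance.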

21 2. Statistical Language Models for Implicit Feedback (Personalized Search without Extra User Effort)

22 Risk Minimization for Implicit Feedback
- At = "enter a query Q"
- r(At) = all possible rankings of docs in C
- M = θU, a unigram language model (word distribution)
- H = {previous queries} + {viewed snippets}
- p(M | U, H, At, C) = p(θU | Q, H)
- Need to estimate a context-sensitive LM

23 Estimate a Context-Sensitive LM
(Diagram: the user model combines the query history Q1, ..., Qk, e.g. "Apple software", ..., "Jaguar", with the clickthrough history C1 = {C1,1, C1,2, C1,3, ...}, C2 = {C2,1, C2,2, C2,3, ...}, ..., e.g. a clicked result "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, ...".)

24 Short-term vs. long-term implicit feedback
Short-term implicit feedback:
- context = current retrieval session
- past queries in the context are closely related to the current query
- clickthroughs → user's current interests
Long-term implicit feedback:
- context = all search interaction history
- not all past queries/clickthroughs are related to the current query

25 "Bayesian interpolation" for short-term implicit feedback
(Diagram: the past queries Q1, ..., Qk-1 and clickthroughs C1, ..., Ck-1 are averaged into a history model, which serves as a Dirichlet prior when estimating the model for the current query Qk.)
Intuition: trust the current query Qk more if it's longer.
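A minimal sketch of the Dirichlet-prior ("Bayesian") interpolation the slide describes: the averaged history model acts as a prior with pseudo-count mu, so a longer current query pulls the estimate toward itself. The helper names and the default mu are illustrative; the exact estimator and the parameter settings used in the experiments are in the SIGIR 2005 paper listed under References.

```python
from collections import Counter

def unigram_model(text):
    """Maximum-likelihood unigram model of a piece of text."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def history_model(history_texts):
    """Average the unigram models of past queries and clicked snippets."""
    models = [unigram_model(t) for t in history_texts]
    vocab = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}

def bayes_interpolate(current_query, history_texts, mu=5.0):
    """Dirichlet-prior estimate of the current query model theta_k:
    p(w | theta_k) = (c(w, Q_k) + mu * p(w | theta_H)) / (|Q_k| + mu)."""
    q_counts = Counter(current_query.split())
    q_len = sum(q_counts.values())
    theta_h = history_model(history_texts)
    vocab = set(q_counts) | set(theta_h)
    return {w: (q_counts.get(w, 0) + mu * theta_h.get(w, 0.0)) / (q_len + mu)
            for w in vocab}
```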

26 Overall Effect of Search Context

             FixInt            BayesInt          OnlineUp          BatchUp
             (α=0.1, β=1.0)    (μ=0.2, ν=5.0)    (μ=5.0, ν=15.0)   (μ=2.0, ν=15.0)
Query        MAP     pr@20     MAP     pr@20     MAP     pr@20     MAP     pr@20
Q3           0.0421  0.1483    0.0421  0.1483    0.0421  0.1483    0.0421  0.1483
Q3+HQ+HC     0.0726  0.1967    0.0816  0.2067    0.0706  0.1783    0.0810  0.2067
Improve      72.4%   32.6%     93.8%   39.4%     67.7%   20.2%     92.4%   39.4%
Q4           0.0536  0.1933    0.0536  0.1933    0.0536  0.1933    0.0536  0.1933
Q4+HQ+HC     0.0891  0.2233    0.0955  0.2317    0.0792  0.2067    0.0950  0.2250
Improve      66.2%   15.5%     78.2%   19.9%     47.8%   6.9%      77.2%   16.4%

- Short-term context helps the system improve retrieval accuracy.
- BayesInt is better than FixInt; BatchUp is better than OnlineUp.

27 Using Clickthrough Data Only

BayesInt (μ=0.0, ν=5.0):
Query     MAP      pr@20
Q3        0.0421   0.1483
Q3+HC     0.0766   0.2033
Improve   81.9%    37.1%
Q4        0.0536   0.1930
Q4+HC     0.0925   0.2283
Improve   72.6%    18.1%
Clickthrough is the major contributor.

Performance on unseen docs:
Query     MAP      pr@20
Q3        0.0331   0.125
Q3+HC     0.0661   0.178
Improve   99.7%    42.4%
Q4        0.0442   0.165
Q4+HC     0.0739   0.188
Improve   67.2%    13.9%

Query     MAP      pr@20
Q3        0.0421   0.1483
Q3+HC     0.0521   0.1820
Improve   23.8%    23.0%
Q4        0.0536   0.1930
Q4+HC     0.0620   0.1850
Improve   15.7%    -4.1%
Snippets for non-relevant docs are still useful!

28 Mixture model with dynamic weighting for long-term implicit feedback
(Diagram: each past session i, with query qi, viewed documents Di, and clickthrough Ci, contributes a session model θSi; the session models are combined with weights λ1, ..., λt-1 into a history model θH, which is interpolated with the current query model θq (weights λq and 1-λq) to form θq,H. The weights {λ} are selected to maximize P(Dt | θq,H) using the EM algorithm.)
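As a rough illustration of the weight-fitting step (not the exact model of the long-term feedback paper, which structures the mixture over sessions as sketched above), the snippet below fits the weights of fixed unigram component models by EM so as to maximize the likelihood of observed word counts; the names and iteration count are illustrative.

```python
def em_mixture_weights(word_counts, component_models, n_iter=50):
    """Fit mixture weights over fixed unigram components by EM,
    maximizing the likelihood of the observed word counts.
    word_counts: dict word -> count; component_models: list of dicts word -> prob."""
    k = len(component_models)
    lam = [1.0 / k] * k                       # start from uniform weights
    total = sum(word_counts.values())
    for _ in range(n_iter):
        expected = [0.0] * k
        for w, c in word_counts.items():
            # E-step: posterior probability that word w came from component i
            probs = [lam[i] * component_models[i].get(w, 1e-10) for i in range(k)]
            norm = sum(probs)
            for i in range(k):
                expected[i] += c * probs[i] / norm
        # M-step: re-estimate the weights from the expected counts
        lam = [e / total for e in expected]
    return lam
```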

29 Results: Different Individual Search Models
recurring ≫ fresh; combination ≈ clickthrough > docs > query, contextless

30 Results: Different Weighting Schemes for the Overall History Model
hybrid ≈ EM > cosine > equal > contextless

31 3. Open Challenges
- What is a query?
- How to collect as much context information as possible without infringing user privacy?
- How to store and organize the collected context information?
- How to accurately interpret/exploit context information?
- How to formally represent the evolving information need of a user?
- How to optimize search results for an entire session?
- What is the right architecture (client-side, server-side, or a client-server combination)?

32 References
Framework:
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Implicit User Modeling for Personalized Search. In Proceedings of CIKM 2005, pp. 824-831.
- ChengXiang Zhai and John Lafferty. A risk minimization framework for information retrieval. Information Processing and Management, 42(1), Jan. 2006, pp. 31-55.
Short-term implicit feedback:
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-Sensitive Information Retrieval Using Implicit Feedback. In Proceedings of SIGIR 2005, pp. 43-50.
Long-term implicit feedback:
- Bin Tan, Xuehua Shen, and ChengXiang Zhai. Mining long-term search history to improve search accuracy. In Proceedings of KDD 2006, pp. 718-723.

33 Thank You!

