
1  Nonintrusive Personalization in Interactive IR
Xuehua Shen, Department of Computer Science, University of Illinois at Urbana-Champaign
Thesis Committee: Nick Belkin (Rutgers University), Kevin Chang, Jiawei Han, Marianne Winslett, ChengXiang Zhai (Chair)

2  Problem of Non-personalized Search
The same query means different things to different users: "Jaguar" may refer to the car, the Apple software (Mac OS X), the animal, or the chemistry software, yet a non-personalized engine returns the same results to everyone.

3  Personalized Search: Put Search in Context
– Query history, e.g., "Apple software", hobby, ...
– Other context info: dwelling time, mouse movement, clickthrough

4  Previous Work on Personalized Search
Personalization has focused on the ranking stage of the retrieval process
– Implicit feedback methods, e.g., White et al. [ECIR 04]
– Personalization using a desktop index, e.g., Teevan et al. [SIGIR 05]
Limited notion of personalization
– Stages of interactive IR: query formulation, ranking, result presentation, and feedback
– Personalization can be applied at each stage

5  Previous Work on Personalized Search (cont.)
Personalization is applied at the server side
– Personalized PageRank, e.g., Haveliwala [WWW02]; Jeh and Widom [WWW03]
– Learning a ranking function, e.g., Radlinski and Joachims [KDD05]
Privacy issues and server overhead
– Client-side personalization reduces privacy concerns and distributes the computation and storage

6  Previous Work on Personalized Search (cont.)
Studies of how different factors affect the effectiveness of personalization
– Implicit interest indicators (e.g., display time), e.g., Claypool et al. [IUI01]; Kelly and Belkin [SIGIR04]
– User interfaces for personalization, e.g., Dumais et al. [SIGCHI 01]; Cutrell et al. [SIGCHI 06]
A practical system is needed to demonstrate the effectiveness of personalization
– Test personalization algorithms in real search activities
– Collect data sets for the evaluation

7  Our View of Personalization
Personalization can be taken with a broader view
– The whole interactive retrieval process can be personalized
Personalization should be done in a nonintrusive way
– Protected privacy: personalization should not infringe on user privacy
– Progressive personalization: personalization should not reduce retrieval performance
– No/minimal user effort: personalization should minimize user effort even when the user is willing to provide explicit feedback

8  Thesis Statement
Personalization can be applied at different stages of the interactive information retrieval process to improve the user's search experience in a nonintrusive way.
Stages: framework, ranking, result presentation, feedback. Nonintrusive requirements: protected privacy, progressive personalization, no/minimal user effort.

9  Thesis Work
Modeling interactive IR
– Decision-theoretic framework (completed, [CIKM05])
No/minimal user effort
– Implicit feedback: personalized search without user effort (completed, [SIGIR05a])
– Active feedback: minimize user effort in relevance feedback (completed, [SIGIR05b])
Protected privacy and personalized search system
– UCAIR search agent design and evaluation (work in progress, [CIKM05])
Broader view of personalization
– Personalization in clustering result presentation (remaining work)
Progressive personalization
– Progressive personalization (remaining work)

10  Model Interactive IR as Sequential Decision Making [CIKM05]
The user (driven by an information need) and the system (maintaining a model of that information need) take turns:
– A1: the user enters a query; the system decides which documents to present and how to present them
– R1: the system presents search results; the user decides which documents to view
– A2: the user views a document; the system decides which part of the document to show and whether to show other documents
– R2: the system presents the document content and reranks other documents; the user decides whether to view more results
– A3: the user clicks on the "Next" link; ...

11  Decision-Theoretic Framework
Observed: the user U, the interaction history H, the current user action A_t, the document collection C, and the set of all possible responses r(A_t) = {r_1, ..., r_n}.
Inferred: the user model M = (S, θ_U, ...), covering the seen documents S and the information need θ_U.
A loss function L(r_i, A_t, M) scores each candidate response; the optimal response r* is the one with minimum loss (Bayes risk).
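In symbols, the optimal response is the one minimizing the expected loss over the posterior of the user model, a standard Bayes-risk formulation consistent with the quantities defined on this slide:

```latex
r^{*} \;=\; \arg\min_{r \in r(A_t)} \int_{M} L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM
```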

12  Optimal Interactive Retrieval
Diagram: the user issues actions A1, A2, A3, ...; for each action, the IR system infers the user model M*_t from the posterior P(M_t | U, H, A_t, C) over the collection C, then returns the response R_t that minimizes L(r, A_t, M*_t).

13  Benefit of the Framework
Traditional view of IR
– Retrieval = match a query against documents
– Insufficient for modeling personalized search (the user and the interaction history are not part of the retrieval model)
New framework (user action, system response, research problem):
– New query; return search results; implicit feedback
– New query; select a subset of documents; active feedback
– Click 'back' and 'next page'; rerank search results; eager feedback
– Browsing search results; reorganize search results; personalization in presentation
– New query; rerank search results; progressive personalization

14  Thesis Work
Modeling interactive IR
– Decision-theoretic framework (completed, [CIKM05])
No/minimal user effort
– Implicit feedback: personalized search without user effort (completed, [SIGIR05a])
– Active feedback: minimize user effort in relevance feedback (completed, [SIGIR05b])
Protected privacy and personalized search system
– UCAIR search agent design and evaluation (work in progress, [CIKM05])
Broader view of personalization
– Personalization in clustering result presentation (remaining work)
Progressive personalization
– Progressive personalization (remaining work)

15  Implicit Feedback [SIGIR05a]
A search session consists of user queries Q_1, Q_2, ..., Q_k and, for each query Q_i, the user's clickthrough C_i = {C_i,1, C_i,2, C_i,3, ...}. For example, an earlier query "Apple software" with the clicked result "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, a screenshot gallery, latest software downloads, and a directory of...", followed by the current query "Jaguar".
How do we model and use all the information?

16  Retrieval Model
Basis: unigram language models with KL-divergence ranking. Estimate a query model θ_{Q_k} and a document model θ_D, and rank documents by the similarity of the two.
Personalized search: update the query model using the user's query history and clickthrough history.
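Concretely, under the KL-divergence retrieval method the score of document D for the current query Q_k is rank-equivalent to the cross entropy of the query model with the document model; personalization then amounts to replacing the plain query model with a history-updated model (notation follows the slide):

```latex
\mathrm{score}(Q_k, D) \;=\; -\,D\!\big(\theta_{Q_k}\,\|\,\theta_D\big)
\;\stackrel{\mathrm{rank}}{=}\; \sum_{w} p(w \mid \theta_{Q_k}) \,\log p(w \mid \theta_D)
```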

17  Experiment Design
– Data collection: TREC AP88-90
– Topics: 30 hard topics from TREC topics 1-150
– Context: query and clickthrough history of 3 participants
– Models: FixInt, BayesInt, OnlineUp, and BatchUp
– Performance comparison: Q_k vs. Q_k + H_Q + H_C
– Evaluation metrics: MAP and Pr@20 docs

18  Overall Effect of Search Context

            FixInt (α=0.1, β=1.0)   BayesInt (μ=0.2, ν=5.0)   OnlineUp (μ=5.0, ν=15.0)   BatchUp (μ=2.0, ν=15.0)
            MAP      pr@20          MAP      pr@20            MAP      pr@20             MAP      pr@20
Q3          0.0421   0.1483         0.0421   0.1483           0.0421   0.1483            0.0421   0.1483
Q3+HQ+HC    0.0726   0.1967         0.0816   0.2067           0.0706   0.1783            0.0810   0.2067
Improve     72.4%    32.6%          93.8%    39.4%            67.7%    20.2%             92.4%    39.4%
Q4          0.0536   0.1933         0.0536   0.1933           0.0536   0.1933            0.0536   0.1933
Q4+HQ+HC    0.0891   0.2233         0.0955   0.2317           0.0792   0.2067            0.0950   0.2250
Improve     66.2%    15.5%          78.2%    19.9%            47.8%    6.9%              77.2%    16.4%

Interaction history helps the system improve retrieval accuracy.

19  Using Clickthrough Data Only
BayesInt (μ=0.0, ν=5.0)

Query     MAP      pr@20
Q3        0.0421   0.1483
Q3+HC     0.0766   0.2033
Improve   81.9%    37.1%
Q4        0.0536   0.1930
Q4+HC     0.0925   0.2283
Improve   72.6%    18.1%

Clickthrough data is very useful for improving retrieval accuracy.

20  Thesis Work
Modeling interactive IR
– Decision-theoretic framework (completed, [CIKM05])
No/minimal user effort
– Implicit feedback: personalized search without user effort (completed, [SIGIR05a])
– Active feedback: minimize user effort in relevance feedback (completed, [SIGIR05b])
Protected privacy and personalized search system
– UCAIR search agent design and evaluation (work in progress, [CIKM05])
Broader view of personalization
– Personalization in clustering result presentation (remaining work)
Progressive personalization
– Progressive personalization (remaining work)

21  Active Feedback: Document Selection in Relevance Feedback [SIGIR05b]
The user issues a query; the retrieval system must decide which k documents from the collection to present for feedback judgments (d_1 +, d_2 -, ..., d_k -).
Can we do better than just presenting the top k? (Consider diversity...)

22  Active Feedback (AF)
An IR system actively selects documents for obtaining relevance judgments.
If a user is willing to judge K documents, which K documents should we present in order to maximize learning effectiveness?

23  Illustration of Three AF Methods
Given a ranked list 1, 2, 3, ..., 16, ...: Top-K (normal feedback) presents the first K documents; Gapped Top-K skips documents between selections; K-Cluster Centroid clusters the top results and presents one representative document per cluster. The latter two aim at high diversity.
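A minimal sketch of the three selection strategies in Python. The ranked list, the document vectors, the pool size, and the gap width are illustrative assumptions, and the clustering here is plain k-means, which may differ from the clustering used in [SIGIR05b]:

```python
import numpy as np
from sklearn.cluster import KMeans

def top_k(ranked_docs, k):
    """Normal feedback: present the k highest-ranked documents."""
    return ranked_docs[:k]

def gapped_top_k(ranked_docs, k, gap=2):
    """Skip `gap` documents between selections to increase diversity."""
    return ranked_docs[::gap + 1][:k]

def k_cluster_centroid(ranked_docs, doc_vectors, k, pool_size=100):
    """Cluster a top pool into k clusters; present the document closest to each centroid."""
    pool = ranked_docs[:pool_size]
    X = np.array([doc_vectors[d] for d in pool])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    selected = []
    for c in range(k):
        members = [i for i, label in enumerate(km.labels_) if label == c]
        closest = min(members, key=lambda i: np.linalg.norm(X[i] - km.cluster_centers_[c]))
        selected.append(pool[closest])
    return selected
```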

24  Comparison of Three AF Methods
(MAP and Pr@10doc include judged docs)

Collection   Active FB method   #AFRel per topic   MAP       Pr@10doc
HARD 2003    Baseline           /                  0.301     0.501
             Pseudo FB          /                  0.320     0.515
             Top-K              3.0                0.325     0.527
             Gapped             2.6                0.330**   0.548*
             Clustering         2.4                0.332     0.565
AP88-89      Baseline           /                  0.201     0.326
             Pseudo FB          /                  0.218     0.343
             Top-K              2.2                0.228     0.351
             Gapped             1.5                0.234*    0.389**
             Clustering         1.3                0.237**   0.393**

Top-K is the worst! Clustering uses the fewest relevant docs.

25  Thesis Work
Modeling interactive IR
– Decision-theoretic framework (completed, [CIKM05])
No/minimal user effort
– Implicit feedback: personalized search without user effort (completed, [SIGIR05a])
– Active feedback: minimize user effort in relevance feedback (completed, [SIGIR05b])
Protected privacy and personalized search system
– UCAIR search agent design and evaluation (work in progress, [CIKM05])
Broader view of personalization
– Personalization in clustering result presentation (remaining work)
Progressive personalization
– Progressive personalization (remaining work)

26  UCAIR Toolbar Architecture [CIKM05] (http://sifaka.cs.uiuc.edu/ir/ucair/download.html)
The UCAIR client sits between the user and the search engine (e.g., Google). It maintains a search history log (e.g., past queries, clicked results) and a result buffer, and contains modules for user modeling, query modification, and result re-ranking; the user's queries, results, and clickthrough flow through the agent.

27  System Characteristics
Client-side personalization
– Privacy
– Distribution of computation
– More clues about the user
Implicit user modeling and eager feedback
Bayesian decision theory and statistical language models

28  User Actions (A_t) and System Responses (R_t)
User actions
– Submit a keyword query
– View a document
– Click the "Back" button
– Click the "Next" link
System responses
– Decide relatedness of neighboring queries and do query expansion
– Update the user model according to clickthrough
– Rerank unseen documents
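To make the "update the user model / rerank unseen documents" responses concrete, here is a minimal sketch of clickthrough-based eager reranking in that spirit. The unigram snippet models, the interpolation weight, the whitespace tokenizer, and the result format (a dict with a "snippet" field) are illustrative assumptions, not UCAIR's actual implementation:

```python
import math
from collections import Counter

def update_user_model(user_model, clicked_text, weight=0.5):
    """Interpolate the current unigram user model with a model of a clicked snippet."""
    counts = Counter(clicked_text.lower().split())
    total = sum(counts.values()) or 1
    vocab = set(user_model) | set(counts)
    return {w: (1 - weight) * user_model.get(w, 0.0) + weight * counts.get(w, 0) / total
            for w in vocab}

def rerank_unseen(user_model, unseen_results, epsilon=1e-6):
    """Rank unseen results by cross entropy of the user model with each snippet's model."""
    def score(snippet):
        counts = Counter(snippet.lower().split())
        total = sum(counts.values()) or 1
        return sum(p * math.log(counts.get(w, 0) / total + epsilon)
                   for w, p in user_model.items())
    return sorted(unseen_results, key=lambda r: score(r["snippet"]), reverse=True)
```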

29  A User Study of Personalized Search
– Six participants use the UCAIR toolbar to do web search
– Topics are selected from the TREC web track and terabyte track
– Participants explicitly evaluate the relevance of the top 30 search results from Google and UCAIR

30  Precision at Top N Documents

Ranking Method   prec@5   prec@10   prec@20   prec@30
Google           0.538    0.472     0.377     0.308
UCAIR            0.581    0.556     0.453     0.375
Improvement      8.0%     17.8%     20.2%     21.8%

More user interaction, better user model and retrieval accuracy.

31  UCAIR Search Agent: Remaining Work
– Incorporate personalization in feedback and in clustering result presentation into the UCAIR search agent
– Incorporate progressive personalization algorithms into the UCAIR search agent
– Evaluate the different personalization algorithms by deploying the prototype of the UCAIR search agent

32  Thesis Work
Modeling interactive IR
– Decision-theoretic framework (completed, [CIKM05])
No/minimal user effort
– Implicit feedback: personalized search without user effort (completed, [SIGIR05a])
– Active feedback: minimize user effort in relevance feedback (completed, [SIGIR05b])
Protected privacy and personalized search system
– UCAIR search agent design and evaluation (work in progress, [CIKM05])
Broader view of personalization
– Personalization in clustering result presentation (remaining work)
Progressive personalization
– Progressive personalization (remaining work)

33  Personalization in Result Presentation
Can we personalize the clustering presentation according to user interaction?
– User interaction provides more clues about the user's information need and its dynamic changes
– The clustering presentation should adapt to user interaction for better retrieval performance

34  Personalization in Result Presentation (cont.)
What are the different personalization strategies for clustering presentation?
– Reranking documents based on a selected cluster
– Reranking documents based on a viewed document
– Promoting "near-miss" documents
– Merging unselected clusters

35  Personalization in Result Presentation (cont.)
What is the impact of different personalization strategies on different types of users?
– Smart users: always make the optimal cluster selection themselves
– Dummy users: selection depends on the search engine's ranking
– Stochastic mixture: real users

36  Illustration of Document Reranking for Two Types of Users
– Baseline: sort by retrieval score (dummy baseline) or sort by relevance percentage (smart baseline)
– Adaptive (N docs grouped into K clusters): rerank based on the top cluster (dummy users) or the best cluster (smart users)
– Reranking: rerank based on the top document (dummy users) or the best relevant document (smart users)

37  Thesis Work
Modeling interactive IR
– Decision-theoretic framework (completed, [CIKM05])
No/minimal user effort
– Implicit feedback: personalized search without user effort (completed, [SIGIR05a])
– Active feedback: minimize user effort in relevance feedback (completed, [SIGIR05b])
Protected privacy and personalized search system
– UCAIR search agent design and evaluation (work in progress, [CIKM05])
Broader view of personalization
– Personalization in clustering result presentation (remaining work)
Progressive personalization
– Progressive personalization (remaining work)

38  Progressive Personalization
Definition and measurement of intrusiveness
– Definition: the impact of personalization on retrieval performance
– Tolerance of intrusiveness for different types of users (risky users vs. conservative users)
– Impact of different context information on intrusiveness
Propose an algorithm to predict the retrieval performance of personalized search
– Borrow the idea of predicting query performance, e.g., Croft [DELOS01]; Carmel et al. [SIGIR06]
Do personalization in a progressive way
– Minimize the maximal potential loss brought by personalization
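One possible formalization of "minimize the maximal potential loss" (an assumption of mine, not a result stated in the proposal): instead of committing to the single inferred user model, choose the response whose worst-case loss over a set of plausible user models is smallest:

```latex
r^{*} \;=\; \arg\min_{r \in r(A_t)} \; \max_{M \in \mathcal{M}_{\mathrm{plausible}}} \; L(r, A_t, M)
```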

39  Timeline
– July 2006 – September 2006: personalization in clustering result presentation in interactive IR
– October 2006 – December 2006: progressive personalization study
– January 2007 – March 2007: implementation of the algorithms in the UCAIR agent and evaluation
– April 2007 – May 2007: thesis write-up; thesis defense in May

40  Publications
[SIGIR05a] Xuehua Shen, Bin Tan, ChengXiang Zhai. Context-Sensitive Information Retrieval Using Implicit Feedback. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[SIGIR05b] Xuehua Shen, ChengXiang Zhai. Active Feedback in Ad Hoc Information Retrieval. Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[CIKM05] Xuehua Shen, Bin Tan, ChengXiang Zhai. Implicit User Modeling for Personalized Search. Proceedings of the 14th ACM International Conference on Information and Knowledge Management.
[SIGKDD06a] Bin Tan, Xuehua Shen, ChengXiang Zhai. Mining Long-term Search History to Improve Search Accuracy (poster paper). Proceedings of the 2006 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[SIGKDD06b] Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han. Discovering Interesting Patterns Through User's Interactive Feedback (poster paper). Proceedings of the 2006 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

41  Thank you! The End

42  Modeling Interactive IR
Model interactive IR as an "action dialog": cycles of user action (A_i) and system response (R_i).

User action (A_i)     System response (R_i)
Submit a new query    Retrieve new documents
View a document       Present the selected document; rerank unseen documents

43  Retrieval Decisions
The user U issues actions A_1, A_2, ..., A_{t-1}, A_t; the system gives responses R_1, R_2, ..., R_{t-1}. The history is H = {(A_i, R_i)}, i = 1, ..., t-1, over the document collection C.
Given U, C, A_t, and H, choose the best R_t ∈ r(A_t), the set of all possible responses to A_t.
Examples: if A_t is the query "Jaguar", r(A_t) is all possible rankings of C and the best R_t is the best ranking for the query; if A_t is a click on the "Next" button, r(A_t) is all possible rankings of the unseen documents and the best R_t is the best ranking of the unseen docs.

44  A Simplified Two-Step Decision-Making Procedure
Approximate the Bayes risk by the loss at the mode of the posterior distribution.
– Step 1: compute an updated user model M* based on the currently available information
– Step 2: given M*, choose a response to minimize the loss function
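In the notation of slide 11, the two steps can be written as (a restatement consistent with the slides, not new material):

```latex
M^{*} \;=\; \arg\max_{M} P(M \mid U, H, A_t, C),
\qquad
r^{*} \;=\; \arg\min_{r \in r(A_t)} L(r, A_t, M^{*})
```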

45  Fixed Coefficient Interpolation (FixInt)
Average the user's query history models (Q_1, ..., Q_{k-1}) and clickthrough models (C_1, ..., C_{k-1}); linearly interpolate the two history models; then linearly interpolate the current query model Q_k with the combined history model.
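A sketch of the resulting query model, assuming a fixed weight α on the current query and a fixed weight β on the clickthrough part of the history; θ_{H_Q} and θ_{H_C} denote the averaged query-history and clickthrough-history models (the exact parameterization is in [SIGIR05a]):

```latex
p(w \mid \theta_k) \;=\; \alpha\, p(w \mid \theta_{Q_k})
\;+\; (1-\alpha)\Big[\beta\, p(w \mid \theta_{H_C}) + (1-\beta)\, p(w \mid \theta_{H_Q})\Big]
```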

46  Bayesian Interpolation (BayesInt)
Average the user's query history and clickthrough history, and use them as a Dirichlet prior when estimating the model of the current query Q_k.
Intuition: if the current query Q_k is longer, we should trust Q_k more.
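A sketch of the Dirichlet-prior estimate this suggests, with μ and ν controlling the prior weights of the query history and the clickthrough history (an assumed form; see [SIGIR05a] for the actual one). A longer current query (larger |Q_k|) automatically dominates the history, matching the stated intuition:

```latex
p(w \mid \theta_k) \;=\;
\frac{c(w, Q_k) \;+\; \mu\, p(w \mid \theta_{H_Q}) \;+\; \nu\, p(w \mid \theta_{H_C})}
     {|Q_k| \;+\; \mu \;+\; \nu}
```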

47  Online Bayesian Update (OnlineUp)
The user model is updated sequentially after each query and each clickthrough event (Q_1, C_1, Q_2, C_2, ..., Q_k).
Intuition: continuous belief update about the user's information need.
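One plausible form of such a sequential update, applying Dirichlet-prior smoothing at every step so that older evidence gradually decays (an assumed sketch, not the exact recurrence, which is given in [SIGIR05a]):

```latex
p(w \mid \theta_i) \;=\;
\frac{c(w, X_i) \;+\; \mu\, p(w \mid \theta_{i-1})}{|X_i| \;+\; \mu},
\qquad X_i \in \{Q_1, C_1, Q_2, C_2, \ldots, Q_k\}
```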

48  Batch Bayesian Update (BatchUp)
Queries Q_1, Q_2, ..., Q_k are incorporated as they arrive, while the clickthrough data C_1, C_2, ..., C_{k-1} is accumulated and applied as a batch.
Intuition: clickthrough data may not decay.

49  Example of a Hard Topic
Topic 2, "Acquisitions" (283 relevant docs in 242,918 documents): the document discusses a currently proposed acquisition involving a U.S. company and a foreign company. To be relevant, a document must discuss a currently proposed acquisition (which may or may not be identified by type, e.g., merger, buyout, leveraged buyout, hostile takeover, friendly acquisition). The suitor and target must be identified by name; the nationality of one of the companies must be identified as U.S. and the nationality of the other company must be identified as NOT U.S.

50  Performance of a Hard Topic Example
– Q1: acquisition u.s. foreign company (MAP: 0.004; Pr@20: 0.000)
– Q2: acquisition merge takeover u.s. foreign company (MAP: 0.026; Pr@20: 0.100)
– Q3: acquire merge foreign abroad international (MAP: 0.004; Pr@20: 0.050)
– Q4: acquire merge takeover foreign european japan (MAP: 0.027; Pr@20: 0.200)

51  Sensitivity of BatchUp Parameters
BatchUp is stable across different parameter settings; the best performance is achieved at μ=2.0, ν=15.0.

52  Precision-Recall Curve (figure)

53  Document Selection in Relevance Feedback [SIGIR05b]
The retrieval system runs the user's query against the document collection, returns the top K results with scores (d_1: 3.5, d_2: 2.4, ..., d_k: 0.5), and the user provides feedback judgments (d_1 +, d_2 -, ..., d_k -).

54  Progressive Personalization (cont.)
Can we propose a personalized search algorithm that guarantees retrieval performance with high probability?
– Use different personalized search algorithms according to the impact of different context information
– Use different personalized search algorithms according to different types of users

