
Maximum Personalization: User-Centered Adaptive Information Retrieval
ChengXiang (“Cheng”) Zhai
Department of Computer Science, Graduate School of Library & Information Science, Department of Statistics, Institute for Genomic Biology
University of Illinois at Urbana-Champaign
Keynote, AIRS 2010, Taipei, Dec. 2, 2010

Happy Users

Sad Users
“They’ve got to know the users better! I work on information retrieval; I searched for similar pages last week; I clicked on AIRS-related pages (including the keynote); …”
How can search engines better help these users?

Current Search Engines are Document-Centered
[Diagram: many different users all issue the same query “airs” to one search engine over the document collection]
It’s hard for a search engine to know everyone well!

To maximize personalization, we must put a user in the center!
[Diagram: each user has a client-side personalized search agent that sends the query “airs” to Web search engines and desktop search, integrating the user’s viewed Web pages, query history, and desktop files]
A search agent knows about a particular user very well.

User-Centered Adaptive IR (UCAIR)
A novel retrieval strategy emphasizing
– user modeling (“user-centered”)
– search context modeling (“adaptive”)
– interactive retrieval
Implemented as a personalized search agent that
– sits on the client side (owned by the user)
– integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
– collaborates with other agents
– goes beyond search toward task support

Much work has been done on personalization
Personalized data collection: Haystack [Adar & Karger 99], MyLifeBits [Gemmell et al. 02], Stuff I’ve Seen [Dumais et al. 03], Total Recall [Cheng et al. 04], Google Desktop Search, Microsoft Desktop Search
Server-side personalization: My Yahoo! [Manber et al. 00], Personalized Google Search
Capturing user information & search context: SearchPad [Bharat 00], Watson [Budzik & Hammond 00], IntelliZap [Finkelstein et al. 01], understanding clickthrough data [Joachims et al. 05]
Implicit feedback: SVM [Joachims 02], BM25 [Teevan et al. 05], language models [Shen et al. 05]
However, we are far from unleashing the full power of personalization.

UCAIR is unique in emphasizing maximum exploitation of client-side personalization
Benefits of client-side personalization:
– More information about the user, thus more accurate user modeling
– Can exploit the complete interaction history (e.g., can easily capture all clickthrough information and navigation activities)
– Can exploit the user’s other activities (e.g., searching immediately after reading an email)
– Naturally scalable
– Alleviates the privacy problem
– Can potentially maximize the benefit of personalization

Maximum Personalization = Maximum User Information + Maximum Exploitation of User Info. = Client-Side Agent + (Frequent + Optimal) Adaptation

Examples of Useful User Information
Textual information
– Current query
– Previous queries in the same search session
– Past queries in the entire search history
Clicking activities
– Skipped documents
– Viewed/clicked documents
– Navigation traces on non-search results
– Dwell time
– Scrolling
Search context
– Time, location, task, …

Examples of Adaptation
Query formulation
– Query completion: provide assistance while a user enters a query
– Query suggestion: suggest useful related queries
– Automatic generation of queries: proactive recommendation
Dynamic re-ranking of unseen documents
– As a user clicks on the “Back” button
– As a user scrolls down a result list
– As a user clicks on the “Next” button to view more results
Adaptive presentation/summarization of search results
Adaptive display of a document: display the most relevant part of a document

Challenges for UCAIR
General: how to obtain maximum personalization without requiring extra user effort?
Specific challenges:
– What’s an appropriate retrieval framework for UCAIR?
– How do we optimize retrieval performance in interactive retrieval?
– How can we capture and manage all user information?
– How can we develop robust and accurate retrieval models to maximally exploit user information and search context?
– How do we evaluate UCAIR methods?
– …

The Rest of the Talk
Part I: A decision-theoretic framework for UCAIR
Part II: Algorithms for personalized search
– Optimize initial document ranking
– Dynamic re-ranking of search results
– Personalize search result presentation
Part III: Summary and open challenges

Part I: A Decision-Theoretic Framework for UCAIR

IR as Sequential Decision Making
The user (with an information need) and the system (with a model of that information need) take turns:
– A_1: the user enters a query; the system decides which documents to present and how to present them, and returns results R_i (i = 1, 2, 3, …)
– A_2: the user decides which documents to view and views one; the system decides which part of the document to show and how, and returns the document content R'
– A_3: the user decides whether to view more and clicks on the “Back” button; …

Retrieval Decisions
User U: A_1, A_2, …, A_{t-1}, A_t
System: R_1, R_2, …, R_{t-1}, R_t = ?
History H = {(A_i, R_i)}, i = 1, …, t-1; document collection C.
Given U, C, A_t, and H, choose the best R_t ∈ r(A_t), the set of all possible responses to A_t.
Examples: if A_t is the query “jaguar”, r(A_t) is all possible rankings of C, and the best R_t is the best ranking for the query; if A_t is a click on the “Next” button, r(A_t) is all possible rankings of the unseen docs, and the best R_t is the best ranking of the unseen docs.

A Risk Minimization Framework
Observed: user U, interaction history H, current user action A_t, document collection C.
All possible responses: r(A_t) = {r_1, …, r_n}.
Inferred user model: M = (S, θ_U, …), with S = seen docs and θ_U = information need.
Loss function: L(r_i, A_t, M).
Optimal response r* (minimum Bayes risk):
r* = arg min_{r ∈ r(A_t)} ∫ L(r, A_t, M) P(M | U, H, A_t, C) dM

A Simplified Two-Step Decision-Making Procedure
Approximate the Bayes risk by the loss at the mode of the posterior distribution:
– Step 1: compute an updated user model M* = arg max_M P(M | U, H, A_t, C) based on the currently available information
– Step 2: given M*, choose a response r* = arg min_{r ∈ r(A_t)} L(r, A_t, M*) to minimize the loss function
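To make the two-step procedure concrete, here is a minimal Python sketch; the model set, posterior, and loss function are illustrative placeholders, not UCAIR's actual data structures.

```python
def choose_response(responses, candidate_models, posterior, loss):
    """Two-step approximation of the Bayes-optimal response.

    Step 1: take the posterior mode M* instead of integrating over all
            user models (the loss at the mode approximates the Bayes risk).
    Step 2: return the response r minimizing L(r, A_t, M*).
    """
    m_star = max(candidate_models, key=posterior)          # Step 1: M* = posterior mode
    return min(responses, key=lambda r: loss(r, m_star))   # Step 2: min-loss response
```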

Optimal Interactive Retrieval
[Diagram: at each step t, the IR system infers the user model M*_t from P(M_t | U, H, A_t, C) and returns the response R_t that minimizes L(r, A_t, M*_t) over the collection C]
Many possible actions: type in a query character; scroll down a page; click on any button; …
Many possible responses: query completion; display a relevant passage; recommendation; clarification; …

Refinement of Risk Minimization
r(A_t): decision space (A_t-dependent)
– r(A_t) = all possible rankings of docs in C
– r(A_t) = all possible rankings of unseen docs
– r(A_t) = all possible summarization strategies
– r(A_t) = all possible ways to diversify the top-ranked documents
M: user model
– Essential component: θ_U = user information need
– S = seen documents
– n = “topic is new to the user”; r = “reading level of the user”
L(R_t, A_t, M): loss function
– Generally measures the utility of R_t for a user modeled as M
– Often encodes retrieval criteria, but may also capture other preferences
P(M | U, H, A_t, C): user model inference
– Often involves estimating the unigram language model θ_U
– May also involve inference of other variables (e.g., readability, tolerance of redundancy)

Case 1: Context-Insensitive IR
– A_t = “enter a query Q”
– r(A_t) = all possible rankings of docs in C
– M = θ_U, a unigram language model (word distribution)
– p(M | U, H, A_t, C) = p(θ_U | Q)

Case 2: Implicit Feedback
– A_t = “enter a query Q”
– r(A_t) = all possible rankings of docs in C
– M = θ_U, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)

Case 3: General Implicit Feedback
– A_t = “enter a query Q”, or the “Back” button, or the “Next” button
– r(A_t) = all possible rankings of unseen docs in C
– M = (θ_U, S), with S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)

Case 4: User-Specific Result Summary
– A_t = “enter a query Q”
– r(A_t) = {(D, Σ)}, D ⊆ C, |D| = k, Σ_i ∈ {“snippet”, “overview”}
– M = (θ_U, n), n ∈ {0, 1}: “topic is new to the user”
– p(M | U, H, A_t, C) = p(θ_U, n | Q, H), M* = (θ*, n*)
Loss table:
                 n* = 1   n* = 0
Σ_i = snippet       1        0
Σ_i = overview      0        1
Choose the k most relevant docs. If the topic is new (n* = 1), give an overview summary; otherwise, a regular snippet summary.
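Under that 0/1 loss table, the presentation decision reduces to comparing expected losses; a minimal sketch, assuming the probability p(n = 1 | Q, H) has already been estimated:

```python
def choose_summary_type(p_new_topic):
    """Pick the summary type with minimum expected loss under the 0/1
    loss table: a snippet costs 1 when the topic is new (n* = 1), an
    overview costs 1 when it is not (n* = 0)."""
    expected_loss_snippet = p_new_topic          # loss 1 iff n = 1
    expected_loss_overview = 1.0 - p_new_topic   # loss 1 iff n = 0
    return "overview" if expected_loss_overview < expected_loss_snippet else "snippet"
```

So an overview summary is chosen exactly when p(n = 1) > 0.5.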

Part II: Algorithms for Personalized Search
– Optimize initial document ranking
– Dynamic re-ranking of search results
– Personalize search result presentation

Scenario 1: After a user types in a query, how can we exploit long-term search history to optimize the initial results?

Case 2: Implicit Feedback
– A_t = “enter a query Q”
– r(A_t) = all possible rankings of docs in C
– M = θ_U, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)

Long-Term Implicit Feedback from Personal Search Log
Search interests: the user is interested in X (e.g., champaign, luxury cars); consistent & distinct; most useful for ambiguous queries.
Search preferences: for query Y, the user prefers result X (e.g., “jaguar quotes” → newcars.com); most useful for recurring queries.
[Example search log (avg. 80 queries/month): query “champaign map”; query “jaguar”; query “champaign jaguar”; click champaign.il.auto.com; query “jaguar quotes”; click newcars.com; query “yahoo mail” (noise); query “jaguar quotes”; click newcars.com (recurring query)]

Estimate the Query Language Model Using the Entire Search History
[Diagram: each past search S_k consists of a query q_k, result documents D_k, and clicks C_k, and yields a model θ_{S_k}; the history models θ_{S_1}, …, θ_{S_{t-1}} are combined with weights λ_1, …, λ_{t-1} into a history model θ_H, which is interpolated with the current query model θ_q: θ_{q,H} = λ_q θ_q + (1 − λ_q) θ_H]
How can we optimize the λ_k and λ_q?
– Need to distinguish informative vs. noisy past searches
– Need to distinguish queries with strong vs. weak support from the history

Adaptive Weighting with a Mixture Model [Tan et al. 06]
[Diagram: the clicked document D_t (e.g., “jaguar car official site … jaguar is a big cat … local jaguar dealer in champaign …”) is modeled as a mixture θ_mix of the current query model θ_q, the past-search models θ_{S_1}, …, θ_{S_{t-1}} (past “jaguar” and “champaign” searches), and a background model θ_B, with weights λ_q, λ_1, …, λ_{t-1}, and λ_B]
Select {λ} to maximize P(D_t | θ_mix), using the EM algorithm.
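A hedged sketch of that EM step in Python: the component language models are held fixed and only the mixture weights are fit to the clicked document. The data structures (dicts of word probabilities) are illustrative stand-ins, not the system's representation.

```python
def em_mixture_weights(doc_tokens, component_models, n_iter=50):
    """Fit mixture weights so the fixed components (past-search models,
    current query model, background model) maximize P(D_t | theta_mix)."""
    names = list(component_models)
    lam = {k: 1.0 / len(names) for k in names}  # uniform initialization
    for _ in range(n_iter):
        counts = {k: 1e-10 for k in names}      # tiny floor avoids 0/0
        for w in doc_tokens:
            # E-step: posterior probability that component k generated w
            denom = sum(lam[k] * component_models[k].get(w, 1e-9) for k in names)
            for k in names:
                counts[k] += lam[k] * component_models[k].get(w, 1e-9) / denom
        # M-step: renormalize expected counts into new weights
        total = sum(counts.values())
        lam = {k: counts[k] / total for k in names}
    return lam
```

Past searches that explain the clicked document well automatically receive larger λ, which is how informative and noisy histories get distinguished.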

Sample Results: Improving Initial Ranking with Long-Term Implicit Feedback
– recurring queries ≫ fresh queries
– combination ≈ clickthrough > docs > query > contextless

Scenario 2: The user is examining search results; how can we further dynamically optimize the search results based on clickthroughs?

Case 3: General Implicit Feedback
– A_t = “enter a query Q”, or the “Back” button, or the “Next” button
– r(A_t) = all possible rankings of unseen docs in C
– M = (θ_U, S), with S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)

Estimate a Context-Sensitive LM
User queries: Q_1 (e.g., “Apple software”), Q_2, …, Q_k (e.g., “Jaguar”)
User clickthrough: C_i = {C_{i,1}, C_{i,2}, C_{i,3}, …}, e.g., “Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, …”
User model: built from the query history and the clickthrough history.

Method 1: Fixed-Coefficient Interpolation (FixInt)
[Diagram: average the past query models (Q_1, …, Q_{k-1}) and the clickthrough models (C_1, …, C_{k-1}) into history models; linearly interpolate the two history models; then linearly interpolate the current query Q_k with the combined history model]

Method 2: Bayesian Interpolation (BayesInt)
[Diagram: average the query-history models (Q_1, …, Q_{k-1}) and the clickthrough models (C_1, …, C_{k-1}), then use them as a Dirichlet prior when estimating the model of the current query Q_k]
Intuition: trust the current query Q_k more if it’s longer.
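A sketch of how such Dirichlet-prior interpolation behaves, assuming the standard conjugate-prior form (the exact equation on the slide did not survive, so this is a reconstruction, with μ and ν as prior pseudo-counts for the two history models):

```python
def bayes_int(query_counts, hist_query_model, hist_click_model, mu=0.2, nu=5.0):
    """Dirichlet-prior interpolation of the current query with history.
    query_counts: term counts of the current query Q_k.
    hist_query_model / hist_click_model: averaged history distributions.
    Because raw query counts enter unweighted, a longer Q_k outweighs the
    fixed pseudo-counts mu and nu, matching the slide's intuition."""
    q_len = sum(query_counts.values())
    vocab = set(query_counts) | set(hist_query_model) | set(hist_click_model)
    denom = q_len + mu + nu
    return {w: (query_counts.get(w, 0)
                + mu * hist_query_model.get(w, 0.0)
                + nu * hist_click_model.get(w, 0.0)) / denom
            for w in vocab}
```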

Method 3: Online Bayesian Updating (OnlineUp)
[Diagram: update the language model incrementally after each query and clickthrough, in the order Q_1, C_1, Q_2, C_2, …, Q_k]
Intuition: incremental updating of the language model.

Method 4: Batch Bayesian Updating (BatchUp)
[Diagram: update with the queries Q_1, Q_2, …, Q_k incrementally, but fold in the accumulated clickthrough data C_1, …, C_{k-1} in one batch]
Intuition: all clickthrough data are equally useful.

Overall Effect of Search Context [Shen et al. 05b]
Improvement of Q_i + H_Q + H_C over using the query Q_i alone (two evaluation measures per method):

       FixInt (α=0.1, β=1.0)   BayesInt (μ=0.2, ν=5.0)   OnlineUp (μ=5.0, ν=15.0)   BatchUp (μ=2.0, ν=15.0)
Q_3:   72.4% / 32.6%           93.8% / 39.4%             67.7% / 20.2%              92.4% / 39.4%
Q_4:   66.2% / 15.5%           78.2% / 19.9%             47.8% / 6.9%               77.2% / 16.4%

– Short-term context helps the system improve retrieval accuracy.
– BayesInt is better than FixInt; BatchUp is better than OnlineUp.

Using Clickthrough Data Only — BayesInt (μ=0.0, ν=5.0)
Improvement from adding the clickthrough history H_C alone (two measures per cell):
– All results: Q_3: 81.9% / 37.1%; Q_4: 72.6% / 18.1% → clickthrough is the major contributor.
– Performance on unseen docs: Q_3: 99.7% / 42.4%; Q_4: 67.2% / 13.9%
– Using clickthroughs on non-relevant docs only: Q_3: 23.8% / 23.0%; Q_4: 15.7% / −4.1% → snippets for non-relevant docs are still useful!

UCAIR Outperforms Google: PR Curve
[Figure: precision-recall curves comparing UCAIR with Google]

Scenario 3: The user has not viewed any document on the first result page and is now clicking on “Next” to view more: how can we optimize the search results on the next page?

Problem Formulation
Query Q is issued against collection C. The first result page L_1, L_2, …, L_f has been seen by the user with nothing clicked (seen, negative examples N). The unseen results L_{f+1}, L_{f+2}, …, L_{f+r} on the following pages (2nd through 101st) are to be reranked. How do we rerank these unseen docs?

Strategy I: Query Modification
[Diagram: use the negative examples N = {L_1, …, L_10} to modify the original query Q into Q_new (controlled by a parameter), then rerank the unseen documents D_11, D_12, … with Q_new]

Strategy II: Score Combination
[Diagram: score the unseen documents against the original query Q and against a negative query model Q_neg learned from N, then combine the two scores (controlled by a parameter) to produce the new ranking]

Multiple Negative Models
Negative feedback examples may be quite diverse:
– They may distract in totally different ways
– A single negative model is not optimal
Multiple negative models:
– Learn multiple models Q_neg^(1), …, Q_neg^(m) from N
– Score each document against the negative models and combine with the query score via an aggregation function F (see the sketch below)
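A hedged sketch of the combined scoring; the penalty weight beta and the choice of max as the aggregation function F are illustrative, not necessarily the paper's exact configuration:

```python
def negative_feedback_score(score_q, scores_neg, beta=0.5, aggregate=max):
    """Combine the original query score with a penalty from negative models.
    score_q: score of a document against the original query Q.
    scores_neg: scores of the same document against the negative models
    learned from the skipped first-page results N (one entry = SingleNeg,
    several entries = MultiNeg, aggregated by F, here max)."""
    return score_q - beta * aggregate(scores_neg)
```

With max aggregation, a document is pushed down as soon as it resembles any one of the diverse distracting directions, which is the motivation for using multiple negative models.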

Effectiveness of Negative Feedback [Wang et al. 08]
[Table: MAP and GMAP on ROBUST+LM, ROBUST+VSM, GOV+LM, and GOV+VSM, comparing OriginalRank, SingleQuery, SingleNeg (two variants), and MultiNeg (two variants)]

Scenario 4: Can we leverage user interaction history to personalize result presentation?

Need for User-Specific Summaries
Query = “Asian tsunami”
Such a snippet summary may be fine for a user who knows about the topic, but for a user who hasn’t been tracking the news, a theme-based overview summary may be more useful.

A Theme Overview Summary (Asian Tsunami)
[Diagram: themes arranged over time and connected by evolutionary transitions into theme evolution threads: Immediate Reports → Statistics of Death and Loss; Personal Experiences of Survivors; Statistics of Further Impact; Aid from Local Areas → Aid from the World → Donations from Countries → Specific Events of Aid → …; Lessons from the Tsunami; Research Inspired]

Risk Minimization for User-Specific Summaries
– A_t = “enter a query Q”
– r(A_t) = {(D, Σ)}, D ⊆ C, |D| = k, Σ_i ∈ {“snippet”, “overview”}
– M = (θ_U, n), n ∈ {0, 1}: “topic is new to the user”
– p(M | U, H, A_t, C) = p(θ_U, n | Q, H), M* = (θ*, n*)
Loss table as in Case 4: a snippet costs 1 when n* = 1 and 0 when n* = 0; an overview costs 0 when n* = 1 and 1 when n* = 0.
Task 1 = estimating n*, with p(n = 1) estimated from p(Q | H)
Task 2 = generating an overview summary

Temporal Theme Mining for Generating Overview News Summaries
General problem definition:
– Given a text collection with time stamps
– Extract a theme evolution graph
– Model the life cycles of the most salient themes
[Diagram: time spans T_1, T_2, …, T_n; themes (Theme 1.1, Theme 1.2, Theme 2.1, Theme 2.2, Theme 3.1, Theme 3.2, …) linked across spans into a theme evolution graph; theme life cycles plotting the strength of Theme A and Theme B over time]

A Topic Modeling Approach [Mei & Zhai 05]
[Pipeline diagram: a collection with time stamps is partitioned into spans t_1, t_2, t_3, …; global salient themes θ_1, θ_2, θ_3, … (plus background B) are extracted with mixture models; theme transitions are modeled with KL divergence to build the theme evolution graph; the collection is decoded with an HMM and theme strength is computed over time to obtain theme life cycles]

Task I: Theme Extraction
There are k themes in the collection (or in a time span); each document is a sample of words generated by multiple themes. Infer the theme language models that best fit the data.
[Diagram: generating word w in document d: with probability λ_B the word is drawn from the background model B (e.g., “is” 0.05, “the” 0.04, “a” …); otherwise a theme θ_j is chosen with document-specific weight π_{d,j} and w is drawn from it (e.g., θ_1: “warning” 0.3, “system” …; θ_2: “aid” 0.1, “donation” 0.05, “support” …; θ_k: “statistics” 0.2, “loss” 0.1, “dead” …)]
Parameters: λ_B = noise level (manually set); the θ’s and π’s are estimated with maximum likelihood.
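The generative word probability under this mixture can be written down directly; a minimal sketch (the dict-of-probabilities structures and the default noise level are illustrative, and the θ’s and π’s would in practice come from EM estimation):

```python
def word_prob(w, background, themes, pi_d, lambda_b=0.9):
    """P(w | d) under the theme mixture: background with probability
    lambda_b (the manually set noise level), otherwise one of the k
    themes chosen with document-specific weights pi_d.

    P(w|d) = lambda_b * P(w|B) + (1 - lambda_b) * sum_j pi_d[j] * P(w|theta_j)
    """
    theme_part = sum(pi_d[j] * themes[j].get(w, 0.0) for j in range(len(themes)))
    return lambda_b * background.get(w, 0.0) + (1.0 - lambda_b) * theme_part
```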

Task II: Transition Modeling
Theme spans in an earlier time interval could evolve into theme spans in a later time interval.
[Diagram: candidate evolutionary transitions between theme spans A, B, C across intervals t_1 … t_2; example theme models: “microarray 0.2, gene 0.1, protein 0.05”; “web 0.3, classification 0.1, topic 0.1”; “information 0.2, topic 0.1, classification 0.1, text 0.05”]
The similarity between two theme spans is modeled with the KL divergence between their word distributions.
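A small sketch of that similarity computation; the epsilon smoothing for unseen words is an illustrative detail, not the paper's exact treatment:

```python
import math

def kl_divergence(p, q, epsilon=1e-10):
    """KL(p || q) between two theme word distributions (dicts of word
    probabilities); a smaller divergence indicates a stronger candidate
    evolutionary transition between the two theme spans."""
    vocab = set(p) | set(q)
    return sum(p.get(w, epsilon) * math.log(p.get(w, epsilon) / q.get(w, epsilon))
               for w in vocab)
```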

Task III: Theme Segmentation
View the whole collection as a sequence of words ordered by time, and model the theme shifts in documents with a Hidden Markov Model.
[Diagram: an HMM whose states are the extracted themes θ_1, θ_2, θ_3 and the background B, with output probabilities P(w | θ) and trained transition probabilities; decoding the collection yields the theme segments]
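Decoding such an HMM is a standard Viterbi pass; a hedged sketch, with all inputs (theme models as word-probability dicts, a dense transition matrix, a uniform start distribution) as illustrative stand-ins:

```python
import math

def viterbi_segment(words, theme_models, trans_prob):
    """Most likely theme-state sequence for a word stream, given the
    theme language models (output probabilities) and trained transition
    probabilities trans_prob[i][j]."""
    k = len(theme_models)
    emit = lambda s, w: math.log(theme_models[s].get(w, 1e-9))
    best = [math.log(1.0 / k) + emit(s, words[0]) for s in range(k)]
    back = []
    for w in words[1:]:
        ptrs, new_best = [], []
        for j in range(k):
            # best predecessor state for landing in state j at this word
            i = max(range(k), key=lambda i: best[i] + math.log(trans_prob[i][j]))
            new_best.append(best[i] + math.log(trans_prob[i][j]) + emit(j, w))
            ptrs.append(i)
        best, back = new_best, back + [ptrs]
    state = max(range(k), key=lambda s: best[s])
    path = [state]
    for ptrs in reversed(back):   # trace back the best path
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```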

Theme Evolution Graph: Tsunami
[Graph over time (12/28/04 → 01/05/05 → 01/15/05 → …), e.g.: {aid, relief, U.S., military, U.N., …} → {Bush, U.S., $, relief, million, …} → {Indonesian 0.01, military 0.01, islands, foreign, aid, …}; {system, Bush, warning, conference, US, …} → {system, China, warning, Chinese, …} → {warning, system, islands, Japan, quake, …}; …]

Theme Life Cycles: Tsunami (CNN, Absolute Strength)
[Plot of theme strength over time; example themes: “Aid from the world” ($, million, relief, aid, U.N., …) and “Personal experiences” (I, wave, beach, saw, sea, …)]

Theme Life Cycles: Tsunami (XINHUA News, Absolute Strength)
[Plot of theme strength over time; themes: “Aid from the world” (dollars, million, aid, U.N., reconstruction, …), “Aid from China” (China, yuan, Beijing, $, donation, …), “Research”, “Statistics”, “Scene and Experiences”]

Theme Life Cycles: Tsunami (XINHUA News, Normalized Strength)
[Same themes with normalized strength: “Aid from the world” ($, million, relief, aid, U.N., …), “Aid from China” (China, yuan, Beijing, $, donation, …), “Research”, “Statistics”, “Scene and Experiences”]

Theme Evolution Graph: KDD
[Graph over time (…, 1999, …), e.g.: {SVM, criteria, classification, linear, …} → {decision, tree, classifier, class, Bayes, …} → {classification, text, unlabeled, document, labeled, learning, …} → {information, web, social, retrieval, distance, networks, …}; later themes: {web, classification, features 0.006, topic, …}, {mixture, random, cluster, clustering, variables, …}, {topic, mixture, LDA, semantic, …}]

Theme Life Cycles: KDD
[Plot: life cycles of global themes in the KDD abstracts, e.g., {gene, expressions, probability, microarray, …}, {marketing, customer, model, business, …}, {rules, association, support, …}]

The UCAIR Prototype System
A client-side search agent that talks to any browser (both Firefox and IE).

UCAIR Screen Shots: Immediate Implicit Feedback
[Screenshots: standard mode vs. adaptive mode]

Screen Shots of the UCAIR System: query = “airs accommodation”
[Screenshots: adaptive mode vs. standard mode]

Screen Shots of UCAIR: query = “airs registration”
[Screenshots: adaptive mode vs. standard mode]

Part III: Summary and Open Challenges

Summary
One size doesn’t fit all; each user needs his/her own search agent (especially important for long-tail search).
User-centered adaptive IR (UCAIR) emphasizes:
– collecting the maximum amount of user information and search context
– formal models of user information needs and other user status variables
– information integration
– optimizing every response in interactive IR, thus potentially maximizing effectiveness
Preliminary results show that implicit user modeling can improve search accuracy in many different ways.

Open Challenges
Formal user models
– More in-depth analysis of user behavior (e.g., why did the user drop a query word and add it again later?)
– Exploit more implicit feedback clues (e.g., a dwell-time-based language model)
– Collaborative user modeling (e.g., smoothing of user models)
Context-sensitive retrieval models based on appropriate loss functions
– Optimize long-term utility in interactive retrieval (e.g., active feedback, the exploration-exploitation tradeoff, incorporation of Fuhr’s interactive retrieval model)
– Robust and non-intrusive adaptation (e.g., considering the confidence of adaptation)
UCAIR system extension
– The right architecture: client + server? P2P?
– Design of novel interfaces to facilitate acquisition of user information
– Beyond search to support querying + browsing + recommendation

Final Goal: A Unified Personal Intelligent Information Agent
[Diagram: a user profile at the center, drawing on WWW, e-commerce, blogs, sports, literature, IM, desktop, intranet, …, with modules for intelligent adaptation, proactive info service, frequently accessed info, a security handler, task support, …]

Acknowledgments
Collaborators: Xuehua Shen, Bin Tan, Maryam Karimzadehgan, Qiaozhu Mei, Xuanhui Wang, Hui Fang, and other TIMAN group members.
Funding: [sponsor logos]

References
– Xuehua Shen, Bin Tan, ChengXiang Zhai. Implicit User Modeling for Personalized Search. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM'05).
– Xuehua Shen, Bin Tan, ChengXiang Zhai. Context-Sensitive Information Retrieval with Implicit Feedback. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'05), pages 43-50.
– Bin Tan, Xuehua Shen, ChengXiang Zhai. Mining Long-Term Search History to Improve Search Accuracy. In Proceedings of the 2006 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'06).
– Xuanhui Wang, Hui Fang, ChengXiang Zhai. A Study of Methods for Negative Relevance Feedback. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08).
– Qiaozhu Mei, ChengXiang Zhai. Discovering Evolutionary Theme Patterns from Text: An Exploration of Temporal Text Mining. In Proceedings of the 2005 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'05).
– Maryam Karimzadehgan, ChengXiang Zhai. Exploration-Exploitation Tradeoff in Interactive Relevance Feedback. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM'10).
– Norbert Fuhr. A Probability Ranking Principle for Interactive Information Retrieval. Information Retrieval 11(3), 2008.