PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR'12 Advisor: Dr. Jia-ling Koh Speaker: Shun-Chen Cheng

Similar presentations
Date: 2013/1/17 Author: Yang Liu, Ruihua Song, Yu Chen, Jian-Yun Nie and Ji-Rong Wen Source: SIGIR12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Adaptive.

Less is More Probabilistic Model for Retrieving Fewer Relevant Documents Harr Chen and David R. Karger MIT CSAIL SIGIR2006 4/30/2007.
Term Level Search Result Diversification DATE : 2013/09/11 SOURCE : SIGIR’13 AUTHORS : VAN DANG, W. BRUCE CROFT ADVISOR : DR.JIA-LING, KOH SPEAKER : SHUN-CHEN,
DQR : A Probabilistic Approach to Diversified Query recommendation Date: 2013/05/20 Author: Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo Source:
Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Toward Whole-Session Relevance: Exploring Intrinsic Diversity in Web Search Date: 2014/5/20 Author: Karthik Raman, Paul N. Bennett, Kevyn Collins-Thompson.
Sentiment Diversification with Different Biases Date : 2014/04/29 Source : SIGIR’13 Advisor : Prof. Jia-Ling, Koh Speaker : Wei, Chang 1.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Evaluating Search Engine
Personalized Search Result Diversification via Structured Learning
Retrieval Evaluation. Brief Review Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
Topic-Sensitive PageRank Taher H. Haveliwala. PageRank Importance is propagated A global ranking vector is pre-computed.
Retrieval Evaluation. Introduction Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
Modern Retrieval Evaluations Hongning Wang
On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date : 2014/05/13 Source : CIKM’13 Advisor : Prof. Jia-Ling, Koh Speaker : Yi-Hsuan.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Minimal Test Collections for Retrieval Evaluation B. Carterette, J. Allan, R. Sitaraman University of Massachusetts Amherst SIGIR2006.
A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.
Improved search for Socially Annotated Data Authors: Nikos Sarkas, Gautam Das, Nick Koudas Presented by: Amanda Cohen Mostafavi.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
WEB SEARCH PERSONALIZATION WITH ONTOLOGICAL USER PROFILES Data Mining Lab XUAN MAN.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
Less is More Probabilistic Models for Retrieving Fewer Relevant Documents Harr Chen, David R. Karger MIT CSAIL ACM SIGIR 2006 August 9, 2006.
Diversifying Search Result WSDM 2009 Intelligent Database Systems Lab. School of Computer Science & Engineering Seoul National University Center for E-Business.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Ranking Clusters for Web Search Gianluca Demartini Paul-Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover, 21/11/2015.
Chapter 23: Probabilistic Language Models April 13, 2004.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
1 Blog site search using resource selection 2008 ACM CIKM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Diversifying Search Results Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong Search Labs, Microsoft Research WSDM, February 10, 2009 TexPoint.
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
Diversifying Search Results Rakesh Agrawal Sreenivas Gollapudi Alan Halverson Samuel Ieong Search Labs, Microsoft Research.
+ User-induced Links in Collaborative Tagging Systems Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbolt CIKM’09 Speaker: Nonhlanhla Shongwe 18 January.
Performance Measures. Why to Conduct Performance Evaluation? 2 n Evaluation is the key to building effective & efficient IR (information retrieval) systems.
Relevance-Based Language Models Victor Lavrenko and W.Bruce Croft Department of Computer Science University of Massachusetts, Amherst, MA SIGIR 2001.
Date: 2013/6/10 Author: Shiwen Cheng, Arash Termehchy, Vagelis Hristidis Source: CIKM’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Predicting the Effectiveness.
Dependence Language Model for Information Retrieval Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, Guihong Cao, Dependence Language Model for Information Retrieval,
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
Web Search Personalization with Ontological User Profile Advisor: Dr. Jai-Ling Koh Speaker: Shun-hong Sie.
DivQ: Diversification for Keyword Search over Structured Databases Elena Demidova, Peter Fankhauser, Xuan Zhou and Wolfgang Nejdl L3S Research Center,
Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Search Result Diversification in Resource Selection for Federated Search Date : 2014/06/17 Author : Dzung Hong, Luo Si Source : SIGIR’13 Advisor: Jia-ling.
Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
Federated text retrieval from uncooperative overlapped collections Milad Shokouhi, RMIT University, Melbourne, Australia Justin Zobel, RMIT University,
QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.
Compact Query Term Selection Using Topically Related Text
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
INF 141: Information Retrieval
Ranking using Multiple Document Types in Desktop Search
Preference Based Evaluation Measures for Novelty and Diversity
Presentation transcript:

PERSONALIZED DIVERSIFICATION OF SEARCH RESULTS Date: 2013/04/15 Author: David Vallet, Pablo Castells Source: SIGIR'12 Advisor: Dr. Jia-ling Koh Speaker: Shun-Chen Cheng

Outline  Introduction  Personalized Diversity: IA-Select and xQuAD, Personalized IA-Select, Personalized xQuAD  Evaluation  Experimental Results  Conclusions

Introduction Search personalization: adapt the search results to the specific aspects that may interest the user. (Slide figure: a query retrieves a result list; results are then ranked by their similarity to the query, producing the ranked result list.)

Introduction Diversification: consider multiple query aspects in order to maximize the probability that some aspect is relevant to the user. (Slide figure: the result list for a query is clustered into aspects c1, c2, c3, producing a clustered, diversified result list.)

Introduction Goal: question the antagonistic view of personalization versus diversification, and hypothesize that these two directions may in fact be effectively combined and enhance each other.

Outline  Introduction  Personalized Diversity: IA-Select and xQuAD, Personalized IA-Select, Personalized xQuAD  Evaluation  Experimental Results  Conclusions

IA-Select and xQuAD Both algorithms use an explicit representation of query intents (aspects) for diversification. IA-Select: greedily selects the documents that maximize the probability that the result set covers the query's aspects. xQuAD (eXplicit Query Aspect Diversification): interpolates query relevance with explicit aspect coverage; see the sketch below.
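Both follow the same greedy re-ranking template. For reference, the published xQuAD objective scores each remaining candidate d as (1 - λ)·p(d|q) + λ·Σ_c p(c|q)·p(d|c)·Π_{d'∈S}(1 - p(d'|c)), where S is the set of documents already selected; IA-Select uses an analogous aspect-covering marginal utility. Below is a minimal Python sketch of the greedy loop; the function and variable names are illustrative, not taken from the paper.

def xquad_rerank(candidates, rel, aspect_prob, coverage, lam=0.5, k=10):
    """Greedy xQuAD-style re-ranking (illustrative sketch).

    rel[d]         -- relevance score p(d|q)
    aspect_prob[c] -- aspect importance p(c|q)
    coverage[d][c] -- aspect coverage p(d|c)
    """
    selected, remaining = [], set(candidates)
    # For each aspect c, running product of (1 - p(d'|c)) over the selected d'.
    uncovered = {c: 1.0 for c in aspect_prob}
    while remaining and len(selected) < k:
        def score(d):
            diversity = sum(aspect_prob[c] * coverage[d].get(c, 0.0) * uncovered[c]
                            for c in aspect_prob)
            return (1 - lam) * rel[d] + lam * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.discard(best)
        for c in aspect_prob:
            uncovered[c] *= 1.0 - coverage[best].get(c, 0.0)
    return selected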

Personalized IA-Select Three personalized components are needed: the personalized search system p(q|d,u), the personalized query aspect distribution p(c|q,u), and the personalized aspect distribution over documents p(c|d,u). p(q|d,u): estimated from the position of document d in the ranking induced by the retrieval system scores s(d,q) for d ∈ R_q, assuming q and u are conditionally independent given a document.
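In math form (a reconstruction; the slide's exact rank-to-probability mapping is in an image and not recoverable):

\[ p(q \mid d, u) = p(q \mid d) \quad (q \perp u \mid d), \]
with \( p(q \mid d) \) estimated from the rank of \( d \in R_q \) under the scores \( s(d,q) \).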

 p(c|d,u): two simplifying assumptions are considered: (1) conditional independence between documents and users given a query aspect; (2) conditional independence between aspects and users given a document.
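How the two assumptions play out (plain Bayes algebra; a sketch consistent with the slide, not the paper's verbatim equations):

\[ p(c \mid d, u) = \frac{p(d \mid c, u)\, p(c \mid u)}{\sum_{c'} p(d \mid c', u)\, p(c' \mid u)} \;\approx\; \frac{p(d \mid c)\, p(c \mid u)}{\sum_{c'} p(d \mid c')\, p(c' \mid u)} \quad (d \perp u \mid c), \]
\[ p(c \mid d, u) \;\approx\; p(c \mid d) \quad (c \perp u \mid d). \]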

Two ways to calculate p(d|u): 1. A user preference model adapted from the BM25 probabilistic model, where: w is a tag in the folksonomy (Delicious); tf(w,u) is the number of times user u used tag w in their profile's bookmark annotations; tf(w,d) is the number of times tag w was used (by any user) to annotate document d; iuf(w) is the inverse user frequency of tag w over the set of users; |u| is the size of the user profile, computed as Σ_w tf(w,u); Δ is the document collection; b = 0.75 and k1 = 2. 2. A purely probabilistic estimate of p(d|u) (the variant behind PIA-Select without the BM25 suffix in the experiments).
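A minimal Python sketch of option 1, assembled from the quantities the slide defines (tf(w,u), tf(w,d), iuf(w), |u|, b = 0.75, k1 = 2). The exact saturation formula is an assumption following the standard BM25 template, not necessarily the paper's precise adaptation.

import math

K1, B = 2.0, 0.75  # parameter values given on the slide

def iuf(w, profiles):
    # Inverse user frequency of tag w over the set of user profiles (an IDF analogue).
    n_with_w = sum(1 for p in profiles.values() if w in p)
    return math.log(len(profiles) / (1.0 + n_with_w))

def tag_weight(w, profile, avg_len, profiles):
    # BM25-style saturation of the user's tag frequency, normalized by profile size.
    tf_u = profile.get(w, 0)
    size = sum(profile.values())  # |u| = sum over w of tf(w, u)
    norm = K1 * ((1 - B) + B * size / avg_len)
    return tf_u / (norm + tf_u) * iuf(w, profiles)

def p_d_u(doc_tags, profile, profiles):
    # p(d|u) up to normalization: match the document's tag counts tf(w,d)
    # against the user's weighted tag profile.
    avg_len = sum(sum(p.values()) for p in profiles.values()) / len(profiles)
    return sum(tf_d * tag_weight(w, profile, avg_len, profiles)
               for w, tf_d in doc_tags.items())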

 p(c|q,u): a convenient option is to develop p(c|q,u) by marginalizing over the set of documents, because this reuses the computation of the two previous top-level components (equations 1 and 2), assuming the conditional independence of query aspects and queries given a user and a document.
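The marginalization in math form (a reconstruction consistent with the stated independence assumption):

\[ p(c \mid q, u) = \sum_{d \in R_q} p(c \mid d, q, u)\, p(d \mid q, u) \;\approx\; \sum_{d \in R_q} p(c \mid d, u)\, p(d \mid q, u), \]
where both factors are the components estimated above.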

Personalized xQuAD Three personalized components are needed: the personalized search system p(q|d,u), the personalized query aspect distribution p(c|q,u), and the personalized, aspect-dependent document distribution p(d|c,u). p(d|c,u): documents and users are assumed conditionally independent given a query aspect. p(c|d) is obtained from the Textwise ODP classification service, which returns up to three possible ODP classifications per document, ranked by a score in [0,1] reflecting the degree of confidence in the classification.
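Substituting the personalized components into the standard xQuAD template gives a plausible personalized objective (a reconstruction; the interpolation form and λ are carried over from xQuAD, not read off the slide):

\[ \mathrm{score}(d) = (1 - \lambda)\, p(d \mid q, u) + \lambda \sum_{c} p(c \mid q, u)\, p(d \mid c, u) \prod_{d' \in S} \bigl(1 - p(d' \mid c, u)\bigr), \]
with \( p(d \mid c, u) = p(d \mid c) \) under the independence assumption above.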

Outline  Introduction  Personalized Diversity: IA-Select and xQuAD, Personalized IA-Select, Personalized xQuAD  Evaluation  Experimental Results  Conclusions

Evaluation Crowdsourcing services: Amazon Mechanical Turk and CrowdFlower. Data set: Delicious. Assessment collection: gathered over four weeks. Tested users: 35, covering a total of 180 topics and 3,800 individual results. Equal amounts of topics of size K = 1 and K = 2 were randomly generated; the top P = 5 results were assessed.

Evaluation An interactive evaluation interface was used to collect the assessments (screenshot on the slide).

Evaluation Q1 (user): how relevant the result is to the user's interests. Q2 (topic): how relevant the result is to the evaluated topic. Q3 (subtopic): workers assign each result to a specific subtopic of the evaluated topic. Q1 measures the accuracy of the evaluated approaches with respect to the user's interests. Q2: a successful reordering technique will rank highly those results assessed as relevant both to the topic and to the user's interests.

Outline  Introduction  Personalized Diversity: IA-Select and xQuAD, Personalized IA-Select, Personalized xQuAD  Evaluation  Experimental Results  Conclusions

Experimental Results Nine approaches were compared: (1) Baseline; (2) IA-Select; (3) xQuAD; (4) a plain personalized search approach based on social tagging profiles and BM25 (PersBM25); (5) xQuAD BM25; (6) PIA-Select (probabilistic calculation of p(d|u)); (7) PIA-Select BM25 (BM25 calculation of p(d|u)); (8) PxQuAD; (9) PxQuAD BM25.

Experimental Results Metrics used to evaluate diversity: the intent-aware version of expected reciprocal rank (ERR-IA), α-nDCG, and subtopic recall (S-recall). Metrics used for accuracy: nDCG and precision.

α-nDCG Worked example (slide table): documents D1-D4 judged against subtopics C1-1, C1-2, C1-3, with α = 0.5. (Judgment values are shown only in the slide image.)

IG (the novelty-biased gain at each rank), ICG (its discounted cumulation), IDCG (the ideal discounted cumulation), and α-nDCG = ICG / IDCG are then computed rank by rank; the numeric values appear in the slide image.
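A runnable Python sketch of the computation the slide walks through. The judgment matrix below is hypothetical (the slide's actual values are not recoverable), and the ideal ranking uses the standard greedy construction.

import math

ALPHA = 0.5
# Hypothetical judgments: document -> subtopics it is relevant to (assumed values).
JUDGED = {"D1": {"C1-1"}, "D2": {"C1-1", "C1-2"}, "D3": {"C1-3"}, "D4": {"C1-2"}}

def gains(ranking, judged, alpha=ALPHA):
    # IG: novelty-biased gain; each repeat of a subtopic is damped by (1 - alpha).
    seen, out = {}, []
    for d in ranking:
        out.append(sum((1 - alpha) ** seen.get(s, 0) for s in judged[d]))
        for s in judged[d]:
            seen[s] = seen.get(s, 0) + 1
    return out

def dcg(g):
    # ICG/IDCG: log-discounted cumulation of a gain vector, at every depth.
    return [sum(x / math.log2(i + 2) for i, x in enumerate(g[:k + 1]))
            for k in range(len(g))]

def ideal_ranking(judged, alpha=ALPHA):
    # Standard greedy ideal: always take the document with the largest gain next.
    seen, order, left = {}, [], set(judged)
    while left:
        best = max(left, key=lambda d: sum((1 - alpha) ** seen.get(s, 0)
                                           for s in judged[d]))
        order.append(best)
        left.discard(best)
        for s in judged[best]:
            seen[s] = seen.get(s, 0) + 1
    return order

ranking = ["D1", "D2", "D3", "D4"]
icg = dcg(gains(ranking, JUDGED))
idcg = dcg(gains(ideal_ranking(JUDGED), JUDGED))
print([round(a / b, 3) for a, b in zip(icg, idcg)])  # alpha-nDCG at each depth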

Subtopic recall (S-recall) For a topic T with n_A subtopics, S-recall at rank K is the fraction of subtopics covered by the top K documents: S-recall(K) = |∪_{i=1..K} subtopics(d_i)| / n_A, where subtopics(d_i) is the set of subtopics to which d_i is relevant. Worked example for a topic with subtopics s1, ..., s10:

d_i   subtopics(d_i)
D1    s1, s3, s10
D2    s3, s4, s6
D3    s2, s5
D4    s2, s7, s9

S-recall(1) = 3/10, S-recall(2) = 5/10, S-recall(3) = 7/10, S-recall(4) = 9/10
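A minimal Python sketch that reproduces the slide's numbers (n_A = 10 subtopics):

SUBTOPICS = {  # subtopics(d_i) from the slide's table
    "D1": {"s1", "s3", "s10"},
    "D2": {"s3", "s4", "s6"},
    "D3": {"s2", "s5"},
    "D4": {"s2", "s7", "s9"},
}
N_A = 10  # total subtopics of topic T: s1..s10

def s_recall(ranking, subtopics, n_a):
    # Fraction of the topic's subtopics covered by the top-K documents, for each K.
    covered, out = set(), []
    for d in ranking:
        covered |= subtopics[d]
        out.append(len(covered) / n_a)
    return out

print(s_recall(["D1", "D2", "D3", "D4"], SUBTOPICS, N_A))
# -> [0.3, 0.5, 0.7, 0.9], matching S-recall(1..4) on the slide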

Diversity metric values for the evaluated approaches (table on slide). Bold: the best value for each metric. Underlined: a statistically significant difference with respect to the baseline. Double underlined: statistical significance with respect to xQuAD (Wilcoxon, p < 0.05). PxQuAD BM25 performs significantly better than the baseline and the plain diversification approaches in terms of ERR-IA; in contrast, the probabilistic estimate of the personalized factor has a negative effect on the overall behavior of the PIA-Select algorithm.

Accuracy metrics for the evaluated approaches (table on slide). User relevance: PersBM25 appears to be on par with PxQuAD BM25. Topic relevance: PersBM25 underperforms the baseline, while PxQuAD BM25 improves over the baseline in this regard, with statistical significance.

Outline  Introduction  Personalized Diversity: IA-Select and xQuAD, Personalized IA-Select, Personalized xQuAD  Evaluation  Experimental Results  Conclusions

Conclusions The paper presents a number of approaches that combine personalization and diversification components, investigating the introduction of the user as an explicit random variable in two state-of-the-art diversification models: IA-Select and xQuAD. The combined approaches achieve statistically significant improvements over the baselines, ranging between 3% and 11% in accuracy metrics and between 3% and 8% in diversity metrics.