
Diversifying Search Results
Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong
Search Labs, Microsoft Research
WSDM '09

Outline
– Introduction
– Problem Formulation
– A Greedy Algorithm for DIVERSIFY(k)
– Performance Metrics
– Evaluation
– Conclusions

Introduction
Goal: minimize the risk of dissatisfaction for the average user.
Assume there exist
– a taxonomy
– a model of user intents
Consider both the relevance of the documents and the diversity of the search results, and trade off relevance against diversity.

Problem Formulation
Choosing the number of results to show for each category in proportion to the percentage of users interested in that category may perform poorly.
Example: query "Flash"
– Technology: 0.6
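A hypothetical illustration of why proportional allocation can fail (the numbers here are ours, not from the slides): with k = 1 and intents c1 (probability 0.6) and c2 (probability 0.4), suppose the best c1 document has quality 0.2 and the best c2 document has quality 0.9. Proportional allocation shows the c1 document, satisfying 0.6 × 0.2 = 0.12 of users, while showing the c2 document would satisfy 0.4 × 0.9 = 0.36, three times as many.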

Problem Formulation
The objective is defined over an unordered set of results; our algorithm is nevertheless designed to generate an ordering of results rather than just a set.

Problem Formulation
DIVERSIFY(k) is NP-hard.
The optimal set for DIVERSIFY(k-1) need not be a subset of the optimal set for DIVERSIFY(k).
Example: p(c1|q) = p(c2|q) = 0.5
– DIVERSIFY(1): d1, d2, d3
– DIVERSIFY(2): d2, d3, d1

A Greedy Algorithm for DIVERSIFY(k)
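A minimal Python sketch of the paper's IA-Select greedy loop, assuming the intent distribution P(c|q) and the per-category document quality V(d|q,c) are supplied as dictionaries (all names here are ours):

```python
def ia_select(k, docs, p_c_given_q, v):
    """Greedy diversification, a sketch of the paper's IA-Select.

    p_c_given_q: dict mapping category c -> P(c|q)
    v: dict mapping (doc, category) -> V(d|q,c), the probability
       that document d satisfies a user with intent c
    Returns an ordered list of up to k documents.
    """
    # U(c|q,S): probability that a user with intent c is still
    # unsatisfied by the selection S so far; initially P(c|q).
    unsatisfied = dict(p_c_given_q)

    def gain(d):
        # Marginal utility g(d|q,c,S) = sum_c U(c|q,S) * V(d|q,c)
        return sum(u * v.get((d, c), 0.0) for c, u in unsatisfied.items())

    remaining, selected = list(docs), []
    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        # Discount each intent by the chance that `best` satisfied it.
        for c in unsatisfied:
            unsatisfied[c] *= 1.0 - v.get((best, c), 0.0)
    return selected
```

Because the underlying objective is submodular, this pick-the-best-marginal-gain loop yields the (1 - 1/e) approximation guarantee referenced in the conclusions.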

Performance Metrics
NDCG, MRR, and MAP do not take into account the value of diversification; intent-aware measures do.
Example: p(c2|q) >> p(c1|q)
– d1 is Excellent for c1 (but unrelated to c2)
– d2 is Good for c2 (but unrelated to c1)
– Classical IR metrics rank: d1, d2
– Intent-aware measures rank: d2, d1

Intent-Aware Measure
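The intent-aware measures condition each classical metric on an intent and weight it by the intent distribution, e.g. NDCG-IA(S, k) = sum over c of P(c|q) · NDCG(S, k | c). A minimal sketch, assuming graded relevance judgments rel[(d, c)] per intent (names are ours):

```python
import math

def ndcg_at_k(gains, k):
    """Standard NDCG@k over a list of graded gains in rank order.

    The ideal DCG is computed over the same list (a simplification;
    strictly it should use the best k gains in the whole corpus).
    """
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def ndcg_ia(ranking, k, p_c_given_q, rel):
    """NDCG-IA(S, k) = sum_c P(c|q) * NDCG(S, k | c).

    rel: dict mapping (doc, category) -> graded relevance of the
         doc when the query is interpreted under intent c.
    """
    return sum(p * ndcg_at_k([rel.get((d, c), 0.0) for d in ranking], k)
               for c, p in p_c_given_q.items())
```

MRR-IA and MAP-IA are formed the same way, replacing the inner metric.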

Evaluation
Evaluate our approach against three commercial search engines.
Conduct three sets of experiments, which differ in how the distributions of intents and the relevance of the documents are obtained.

Experiment 1
The distributions of intents for both queries and documents are obtained via standard classifiers.
The relevance of documents comes from a proprietary repository of human judgments that we have been granted access to.
Dataset: 10,000 random queries with the top 50 documents each; many of the top 10 documents for each query have human judgments.

Experiment 1
Sample about 900 queries that
– span at least two categories
– have a significant fraction of associated documents with human judgments

Experiment 1

Experiment 2
Obtain the distributions of intents for queries and the document relevance using the Amazon Mechanical Turk platform.
Sample 200 queries from the dataset that span at least three categories.
Submit these queries, along with the three most likely categories as estimated by the classifier and the top five results produced by IA-Select, to the Mechanical Turk workers.

Experiment 2

Experiment 3
IA-Select: run with the p(c|q) obtained from the Amazon Mechanical Turk platform.
Metrics: the p(c|q) and document relevance are the same as those used in Experiment 1.

Conclusions
Provide a greedy algorithm with good approximation guarantees.
To evaluate the effectiveness of our approach, we proposed generalizations of well-studied metrics that take into account the intents of users.
Our approach outperforms the results produced by commercial search engines on all of the metrics.