Search Results Need to be Diverse
Mark Sanderson, University of Sheffield

How to have fun while running an evaluation campaign
Mark Sanderson, University of Sheffield

Aim: tell you about our test collection work in Sheffield, and how we've been having fun building test collections.

Organising this is hard: TREC (Donna, Ellen), CLEF (Carol), NTCIR (Noriko). Make sure you enjoy it.

ImageCLEF: cross-language image retrieval, running for 6 years. Tasks include Photo, Medical, and others. Imageclef.org

How do we do it? Organise and conduct research: imageCLEFPhoto 2008 studied diversity in search results. Diversity?

SIGIR

ACL

Mark Sanderson

Cranfield model

Operational search engines receive ambiguous queries. What is the correct interpretation? We don't know, so serve as diverse a range of results as possible.

Diversity is studied:
Carbonell, J. and Goldstein, J. (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In ACM SIGIR.
Zhai, C. (2002) Risk Minimization and Language Modeling in Text Retrieval. PhD thesis, Carnegie Mellon University.
Chen, H. and Karger, D. R. (2006) Less is more: probabilistic models for retrieving fewer relevant documents. In ACM SIGIR.
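To make the first of these concrete, here is a minimal sketch of MMR-style re-ranking in the spirit of Carbonell and Goldstein (1998). The function names, the similarity callbacks sim_qd and sim_dd, and the lambda value are illustrative assumptions, not the authors' implementation; any query-document and document-document similarity (e.g. cosine over TF-IDF vectors) could be plugged in.

```python
def mmr_rerank(query, docs, sim_qd, sim_dd, lam=0.7, k=20):
    """Greedily pick k documents, trading off relevance against redundancy.

    sim_qd(query, doc)   -> relevance score
    sim_dd(doc_a, doc_b) -> redundancy score
    lam close to 1 favours relevance; close to 0 favours diversity.
    """
    selected, remaining = [], list(docs)
    while remaining and len(selected) < k:
        def mmr_score(d):
            relevance = sim_qd(query, d)
            redundancy = max((sim_dd(d, s) for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: hard-coded scores stand in for real similarities.
docs = ["d1", "d2", "d3"]
rel = {"d1": 0.9, "d2": 0.85, "d3": 0.2}
dup = {("d1", "d2"): 0.95}  # d1 and d2 are near-duplicates
print(mmr_rerank("q", docs,
                 sim_qd=lambda q, d: rel[d],
                 sim_dd=lambda a, b: dup.get((a, b), dup.get((b, a), 0.0)),
                 lam=0.5, k=2))  # -> ['d1', 'd3']: the near-duplicate d2 is skipped
```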

Cluster hypothesis: "closely associated documents tend to be relevant to the same requests" (van Rijsbergen, 1979).

Most test collections have a focussed topic, with relevance judgments made (almost always) by one person, and little or no consideration of alternative interpretations. There is a gap between test and operational settings.

Few test collections do consider this:
Hersh, W. R. and Over, P. (1999) TREC-8 Interactive Track Report. In TREC-8.
Over, P. (1997) TREC-5 Interactive Track Report. In TREC-5.
Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. (2008) Novelty and diversity in information retrieval evaluation. In ACM SIGIR.

Study diversity: what sorts of diversity are there (e.g. ambiguous query words)? How often is it a feature of search, that is, how often are queries ambiguous? How can we add it into test collections?

Extent of diversity? "Ambiguous queries: test collections need more sense", SIGIR 2008. How do you define ambiguity? Using Wikipedia and WordNet.

Disambiguation page

Wikipedia stats: enwiki pages-articles.xml (12.7 GB). Disambiguation pages are easy to spot: "_(disambiguation)" in the title (e.g. Chicago) or the "{{disambig}}" template in the page text (e.g. George_bush).
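A minimal sketch of that check, assuming each page's title and wikitext have already been pulled out of the dump (e.g. by a streaming XML parser). The helper is_disambiguation_page and the single-template regex are simplifications of what a real pipeline would use; Wikipedia has several disambiguation template variants, and dump titles use spaces where the slide shows the URL form with underscores.

```python
import re

# Matches {{disambig}} and {{disambiguation}} templates, with optional parameters.
DISAMBIG_TEMPLATE = re.compile(r"\{\{\s*disambig(uation)?[^}]*\}\}", re.IGNORECASE)

def is_disambiguation_page(title: str, wikitext: str) -> bool:
    """Flag a page using the two cues on the slide:
    "(disambiguation)" in the title, or a {{disambig}} template in the text."""
    if title.endswith("(disambiguation)"):
        return True
    return bool(DISAMBIG_TEMPLATE.search(wikitext))

print(is_disambiguation_page("Chicago (disambiguation)", ""))        # True, by title
print(is_disambiguation_page("George Bush", "{{disambig}}\n* ..."))  # True, by template
```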

Conventional source: downloaded WordNet v3.0 (88K words).
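A rough sketch of the WordNet side of the ambiguity test: treat a query term as ambiguous if WordNet lists more than one sense (synset) for it. This is one reading of the slide, not necessarily the exact procedure used, and the helper name is illustrative.

```python
from nltk.corpus import wordnet as wn  # requires nltk and nltk.download("wordnet")

def is_ambiguous_in_wordnet(term: str) -> bool:
    """True if WordNet records more than one sense for the term."""
    return len(wn.synsets(term)) > 1

print(is_ambiguous_in_wordnet("bank"))            # True: financial institution, river bank, ...
print(is_ambiguous_in_wordnet("photosynthesis"))  # False: a single sense
```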

Query logs used:
Log  | Unique queries (all) | Most frequent (fr) | Year(s) gathered
Web  | 1,000,000            | 8,…                | …
PA   | 507,914              | 14,…               | …

Fraction of ambiguous queries (Wi = Wikipedia, WN = WordNet):
Name       | Wi    | WN   | WN+Wi
Web (freq) | 7.6%  | 4.0% | 10.0%
Web (all)  | 2.5%  | 0.8% | 3.0%
PA (freq)  | 10.5% | 6.4% | 14.7%
PA (all)   | 2.1%  | 0.8% | 2.7%

Conclusions: ambiguity is a problem; it is present in query logs, and not just in Web search. If ambiguity is present, IR systems need to produce diverse results.

Test collections don't test for diversity. Do search systems deal with it?

ImageCLEFPhoto: build a test collection, encourage the study of diversity, study how others deal with diversity, and have some fun.

Collection: IAPR TC-12, 20,000 travel photographs with text captions; 60 existing topics, used in two previous studies, of which 39 were used for the diversity study.

Diversity needs in a topic: "Images of typical Australian animals".

Types of diversity: 22 geographical topics (e.g. "Churches in Brazil") and 17 other topics (e.g. "Australian animals").

Relevance judgments: the existing qrels were clustered by multiple assessors, with a good level of agreement on the clusters.

Evaluation measures: precision at 20, P(20), the fraction of relevant documents in the top 20; and cluster recall at 20, CR(20), the fraction of the different clusters represented in the top 20. A sketch of both measures follows.
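A minimal sketch of the two measures as described on the slide. The data structures are illustrative assumptions: qrels maps a document id to its relevance, and clusters maps each relevant document id to the subtopic cluster it belongs to.

```python
def precision_at_k(ranking, qrels, k=20):
    """Fraction of the top-k documents that are relevant."""
    top = ranking[:k]
    return sum(1 for d in top if qrels.get(d, 0) > 0) / k

def cluster_recall_at_k(ranking, clusters, k=20):
    """Fraction of all subtopic clusters that appear in the top-k documents."""
    all_clusters = set(clusters.values())
    found = {clusters[d] for d in ranking[:k] if d in clusters}
    return len(found) / len(all_clusters) if all_clusters else 0.0

# Example with three clusters; the run finds two of them in its top 20.
qrels = {"d1": 1, "d2": 1, "d3": 1, "d4": 0}
clusters = {"d1": "kangaroo", "d2": "koala", "d3": "wombat"}
run = ["d1", "d4", "d2"] + [f"x{i}" for i in range(17)]
print(precision_at_k(run, qrels), cluster_recall_at_k(run, clusters))  # 0.1 0.666...
```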

The track was popular: 24 groups, 200 runs in total.

Submitted runs

Compare with past years: the same 39 topics were used in 2006 and 2007, but without clustering. Comparing cluster recall on past runs, matched on identical P(20), cluster recall increased substantially and significantly. A sketch of such a comparison follows.
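A rough sketch of that comparison, not the exact analysis from the talk: pair the diversity-year runs with past-year runs that have (near-)identical P(20), then test whether CR(20) differs across the pairs. The run scores, the matching tolerance, and the choice of a paired t-test are illustrative assumptions; a sign test or Wilcoxon test would be equally reasonable.

```python
from scipy.stats import ttest_rel

def match_runs(new_runs, old_runs, tol=0.005):
    """new_runs/old_runs: lists of (p20, cr20) tuples. Greedily pair runs whose
    P(20) values agree within tol and return the paired CR(20) scores."""
    new_cr, old_cr, used = [], [], set()
    for p_new, cr_new in new_runs:
        for i, (p_old, cr_old) in enumerate(old_runs):
            if i not in used and abs(p_new - p_old) <= tol:
                new_cr.append(cr_new)
                old_cr.append(cr_old)
                used.add(i)
                break
    return new_cr, old_cr

new_cr, old_cr = match_runs(
    [(0.40, 0.55), (0.35, 0.48), (0.30, 0.52)],  # 2008 runs (made-up numbers)
    [(0.40, 0.41), (0.35, 0.39), (0.30, 0.40)],  # earlier runs (made-up numbers)
)
print(ttest_rel(new_cr, old_cr))  # paired test on the matched CR(20) scores
```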

Meta-analysis: this was fun. We experimented on participants' outputs, not by design but by lucky accident.

We were not the first to think of this: Buckley and Voorhees (SIGIR 2000, 2002) used submitted runs to generate new research.

Conduct a user experiment: do users prefer diversity? The usual experiment would be to build a system that does this, show users your system alongside a baseline system, and measure the users.

Why bother, when others have done the work for you? Pair up randomly sampled runs, one with high CR(20) and one with low CR(20), and show them to users (see the sketch below).
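A small sketch of pairing runs for the side-by-side user study: sample one run from the high-CR(20) half and one from the low-CR(20) half of the submitted runs. The split point, the sampling scheme, and the data structures are illustrative assumptions rather than the procedure actually used.

```python
import random

def sample_run_pairs(runs, n_pairs=25, seed=0):
    """runs: list of (run_id, cr20). Returns n_pairs of (high-CR run, low-CR run)."""
    rng = random.Random(seed)
    ordered = sorted(runs, key=lambda r: r[1])
    low, high = ordered[: len(ordered) // 2], ordered[len(ordered) // 2 :]
    return [(rng.choice(high), rng.choice(low)) for _ in range(n_pairs)]

runs = [(f"run{i}", cr) for i, cr in enumerate([0.2, 0.3, 0.5, 0.6, 0.7])]
for high_run, low_run in sample_run_pairs(runs, n_pairs=3):
    print("show side by side:", high_run[0], "vs", low_run[0])
```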

Animals swimming

Numbers: 25 topics, 31 users, 775 result pairs compared.

User preferences: 54.6% preferred the more diversified results; 19.7% the less diversified; 17.4% rated both as equal; 8.3% preferred neither.

Conclusions: diversity appears to be important; systems don't do diversity by default; users prefer diverse results; test collections don't support diversity, but they can be adapted.

And organising evaluation campaigns is rewarding, and can generate novel research.