Operational search engine
  Ambiguous queries
  What is correct interpretation?
  Don’t know
  Serve as diverse a range as possible
  09/05/2015 11
Diversity is studied
  Carbonell, J. and Goldstein, J. (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In ACM SIGIR, 335-336.
  Zhai, C. (2002) Risk Minimization and Language Modeling in Text Retrieval. PhD thesis, Carnegie Mellon University.
  Chen, H. and Karger, D. R. (2006) Less is more: probabilistic models for retrieving fewer relevant documents. In ACM SIGIR, 429-436.
Cluster hypothesis
  “closely associated documents tend to be relevant to the same requests”
  Van Rijsbergen (1979)
Most test collections
  Focussed topic
  Relevance judgments
  Who says what is relevant?
  (almost always) one person
  Consideration of interpretations
  Little or none
  Gap between test and operation
Few test collections
  Hersh, W. R. and Over, P. (1999) TREC-8 Interactive Track Report. TREC-8.
  Over, P. (1997) TREC-5 Interactive Track Report. TREC-5, 29-56.
  Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. (2008) Novelty and diversity in information retrieval evaluation. In ACM SIGIR.
Study diversity
  What sorts of diversity are there?
  Ambiguous query words
  How often is it a feature of search?
  How often are queries ambiguous?
  How can we add it into test collections?
Extent of diversity?
  “Ambiguous queries: test collections need more sense”, SIGIR 2008
  How do you define ambiguity?
  Wikipedia
  WordNet
Compare with past years
  Same 39 topics used in 2006, 2007
  But without clustering
  Compare cluster recall on past runs
  Based on identical P(20)
  Cluster recall increased
  Substantially
  Significantly
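
The comparison above rests on two measures, P(20) and cluster recall at 20. A minimal sketch of both follows. It assumes the usual definitions: P(k) is the fraction of the top k documents that are relevant, and cluster recall at k is the fraction of a topic's clusters represented by at least one document in the top k. The function names, the `doc_clusters` mapping and the toy document ids are hypothetical and only for illustration.

```python
from typing import Dict, List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def cluster_recall_at_k(
    ranking: List[str], doc_clusters: Dict[str, Set[str]], k: int = 20
) -> float:
    """Fraction of the topic's clusters covered by the top-k documents.

    doc_clusters maps each relevant document id to the set of cluster
    labels it belongs to (an assumed judgment format).
    """
    all_clusters: Set[str] = set().union(*doc_clusters.values())
    if not all_clusters:
        return 0.0
    seen: Set[str] = set()
    for doc in ranking[:k]:
        seen |= doc_clusters.get(doc, set())
    return len(seen) / len(all_clusters)


# Toy topic with three clusters (a, b, c); x1 and x2 are non-relevant.
doc_clusters = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}, "d4": {"c"}}
relevant = set(doc_clusters)

run_low = ["d1", "d2", "x1", "x2"]   # two relevant documents, same cluster
run_high = ["d1", "d3", "x1", "x2"]  # two relevant documents, two clusters

p_low = precision_at_k(run_low, relevant, k=4)
p_high = precision_at_k(run_high, relevant, k=4)
cr_low = cluster_recall_at_k(run_low, doc_clusters, k=4)
cr_high = cluster_recall_at_k(run_high, doc_clusters, k=4)
```

In this toy case both runs have the same precision at 4, but the second run covers two of the three clusters where the first covers only one, which is the kind of difference the comparison on identical P(20) is meant to isolate.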
Meta-analysis
  This was fun
  We experimented on participants’ outputs
  Not by design
  Lucky accident
Not first to think of this
  Buckley and Voorhees SIGIR 2000, 2002
  Use submitted runs to generate new research
Conduct user experiment
  Do users prefer diversity?
  Experiment
  Build a system to do this
  Show users your system
  Baseline system
  Measure users
Why bother…
  …when others have done the work for you
  Pair up randomly sampled runs
  High CR(20)
  Low CR(20)
  Show to users