Search Results Need to be Diverse Mark Sanderson University of Sheffield.


1 Search Results Need to be Diverse. Mark Sanderson, University of Sheffield

2 How to have fun while running an evaluation campaign. Mark Sanderson, University of Sheffield

3 Aim: Tell you about our test collection work in Sheffield; how we’ve been having fun building test collections. 09/05/2015

4 Organising this is hard: TREC (Donna, Ellen); CLEF (Carol); NTCIR (Noriko). Make sure you enjoy it.

5 ImageCLEF: Cross-language image retrieval; running for 6 years; Photo, Medical, and other tasks. Imageclef.org

6 How do we do it? Organise and conduct research; ImageCLEFPhoto 2008; study diversity in search results. Diversity?

7 SIGIR

8 ACL

9 Mark Sanderson

10 Cranfield model

11 Operational search engine: Ambiguous queries. What is the correct interpretation? Don’t know. Serve as diverse a range as possible.

12 Diversity is studied Carbonell, J. and Goldstein, J. (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In ACM SIGIR, 335-336. Zhai, C. (2002) Risk Minimization and Language Modeling in Text Retrieval, PhD thesis, Carnegie Mellon University. Chen, H. and Karger, D. R. (2006) Less is more: probabilistic models for retrieving fewer relevant documents. In ACM SIGIR, 429-436.

13 Cluster hypothesis: “closely associated documents tend to be relevant to the same requests” (Van Rijsbergen, 1979)

14 Most test collections: Focussed topic. Relevance judgments: who says what is relevant? (Almost always) one person. Consideration of interpretations: little or none. Gap between test and operation.

15 Few test collections: Hersh, W. R. and Over, P. (1999) TREC-8 Interactive Track Report. TREC-8. Over, P. (1997) TREC-5 Interactive Track Report. TREC-5, 29-56. Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. (2008) Novelty and diversity in information retrieval evaluation. In ACM SIGIR.

16 Study diversity: What sorts of diversity are there? Ambiguous query words. How often is it a feature of search? How often are queries ambiguous? How can we add it into test collections?

17 Extent of diversity? “Ambiguous queries: test collections need more sense”, SIGIR 2008. How do you define ambiguity? Wikipedia; WordNet.

18 Disambiguation page

19 Wikipedia stats: enwiki-20071018-pages-articles.xml (12.7 GB). Disambiguation pages are easy to spot: “_(disambiguation)” in title (Chicago); “{{disambig}}” template (George_bush).
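
The two cues named on this slide could be checked roughly as follows. This is a minimal sketch, not the script used in the study: the function name `is_disambiguation_page`, the case-insensitive matching, and the exact regular expressions are assumptions made for illustration.

```python
import re

# Assumed patterns: the slide names the cues but not how they were matched.
# Title cue: "_(disambiguation)" (or " (disambiguation)") at the end of the title.
TITLE_CUE = re.compile(r"[ _]\(disambiguation\)$", re.IGNORECASE)
# Template cue: "{{disambig}}", optionally with parameters such as "{{disambig|geo}}".
TEMPLATE_CUE = re.compile(r"\{\{\s*disambig\s*(\|[^}]*)?\}\}", re.IGNORECASE)


def is_disambiguation_page(title: str, wikitext: str) -> bool:
    """Return True if the title or the page text carries a disambiguation cue."""
    return bool(TITLE_CUE.search(title) or TEMPLATE_CUE.search(wikitext))


# Toy inputs, invented for illustration.
print(is_disambiguation_page("Chicago_(disambiguation)", ""))
print(is_disambiguation_page("George_bush", "Several people... {{disambig}}"))
print(is_disambiguation_page("Sheffield", "A city in England."))
```

Checking both the title and the page text matters because, as the slide's two examples suggest, some disambiguation pages are marked only by the template.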

20 Conventional source: Downloaded WordNet v3.0; 88K words.

21 Query logs
Log | Unique queries (all) | Most frequent (fr) | Year(s) gathered
Web | 1,000,000 | 8,719 | 2006
PA | 507,914 | 14,541 | 2006-7

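22 Fraction of ambiguous queries (Wi = Wikipedia, WN = WordNet; freq = most frequent queries, all = all unique queries)
Name | Queries | Wi | WN | WN+Wi
Web | freq | 7.6% | 4.0% | 10.0%
Web | all | 2.5% | 0.8% | 3.0%
PA | freq | 10.5% | 6.4% | 14.7%
PA | all | 2.1% | 0.8% | 2.7%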
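
A fraction like those on this slide might be computed along the following lines. This is only a sketch under stated assumptions: the slides do not say how queries were matched against Wikipedia or WordNet, so the exact-match rule after lowercasing, the function name `fraction_ambiguous`, and the toy data are all hypothetical. The combined WN+Wi figure is read here as the union of the two sources, which would explain why it is smaller than the sum of the two separate columns.

```python
def fraction_ambiguous(queries, ambiguous_terms):
    """Fraction of queries whose normalised text appears in ambiguous_terms.

    Assumption: a query counts as ambiguous only on an exact match
    after stripping whitespace and lowercasing.
    """
    if not queries:
        return 0.0
    normalised_terms = {term.strip().lower() for term in ambiguous_terms}
    hits = sum(1 for q in queries if q.strip().lower() in normalised_terms)
    return hits / len(queries)


# Toy data, invented for illustration only (not taken from the query logs).
wiki = {"chicago", "george bush"}
wordnet = {"bank", "chicago"}
queries = ["Chicago", "bank", "cheap flights", "sheffield weather"]

print(fraction_ambiguous(queries, wiki))            # 0.25
print(fraction_ambiguous(queries, wordnet))         # 0.5
print(fraction_ambiguous(queries, wiki | wordnet))  # 0.5
```

In the toy data "chicago" is in both sources, so the union adds nothing beyond the WordNet set; overlap of this kind keeps a combined figure below the sum of its parts.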

23 Conclusions: Ambiguity is a problem. Ambiguity is present in query logs, not just Web search. Ambiguity present? Need for IR systems to produce diverse results.

24 Test collections: Don’t test for diversity. Do search systems deal with it?

25 ImageCLEFPhoto: Build a test collection; encourage the study of diversity; study how others deal with diversity; have some fun.

26 Collection: IAPR TC-12; 20,000 travel photographs; text captions; 60 existing topics, used in two previous studies; 39 used for diversity study.

27 Diversity needs in topic: “Images of typical Australian animals”

28 Types of diversity: 22 geographical (“Churches in Brazil”); 17 other (“Australian animals”).

29 Relevance judgments: Clustered existing qrels; multiple assessors; good level of agreement on clusters.

30 Evaluation: Precision at 20, P(20): fraction of relevant in top 20. Cluster recall at 20, CR(20): fraction of different clusters in top 20.
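
The two measures on this slide can be sketched in Python as below. The function names, the dictionary mapping each relevant document to a single cluster label, and the toy ranking are assumptions for illustration; cluster recall is taken here to mean the share of a topic's clusters that appear at least once in the top k.

```python
def precision_at_k(ranking, relevant, k=20):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k


def cluster_recall_at_k(ranking, doc_clusters, k=20):
    """Fraction of a topic's clusters represented in the top-k items.

    doc_clusters maps each relevant document to its cluster label
    (assumed: one cluster per document).
    """
    all_clusters = set(doc_clusters.values())
    if not all_clusters:
        return 0.0
    seen = {doc_clusters[doc] for doc in ranking[:k] if doc in doc_clusters}
    return len(seen) / len(all_clusters)


# Toy topic, invented for illustration: three clusters of relevant images.
doc_clusters = {"img1": "kangaroo", "img2": "kangaroo", "img3": "koala", "img4": "emu"}
ranking = ["img1", "img2", "other", "img3"]

print(precision_at_k(ranking, set(doc_clusters), k=2))   # 1.0
print(cluster_recall_at_k(ranking, doc_clusters, k=2))
print(precision_at_k(ranking, set(doc_clusters), k=4))   # 0.75
print(cluster_recall_at_k(ranking, doc_clusters, k=4))
```

At k=2 the toy ranking has perfect precision but covers only one of three clusters, which is the situation the diversity measure is meant to expose.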

31 Track was popular: 24 groups; 200 runs in total.

32 Submitted runs

33 Compare with past years: Same 39 topics used in 2006, 2007, but without clustering. Compare cluster recall on past runs, based on identical P(20). Cluster recall increased: substantially, significantly.

34 Meta-analysis: This was fun. We experimented on participants’ outputs. Not by design: lucky accident.

35 Not first to think of this: Buckley and Voorhees, SIGIR 2000, 2002. Use submitted runs to generate new research.

36 Conduct user experiment: Do users prefer diversity? Experiment: build a system to do this; show users your system; baseline system; measure users.

37 Why bother… …when others have done the work for you: Pair up randomly sampled runs, high CR(20) and low CR(20); show to users.

38 Animals swimming

39 Numbers: 25 topics; 31 users; 775 result pairs compared.

40 User preferences: 54.6% more diversified; 19.7% less diversified; 17.4% both were equal; 8.3% preferred neither.

41 Conclusions: Diversity appears to be important. Systems don’t do diversity by default. Users prefer diverse results. Test collections don’t support diversity, but can be adapted.

42 and: Organising evaluation campaigns is rewarding, and can generate novel research.

