Walid Magdy Gareth Jones


1 Examining the Robustness of Evaluation Metrics for Patent Retrieval with Incomplete Relevance Judgements
Walid Magdy, Gareth Jones
Dublin City University
CLEF, 21 Sep 2010

2 Patent Retrieval
Search a collection of patents for relevant ones
Objective: find all possible relevant documents
Search: takes much longer
Users: professionals and more patient
IR campaigns: NTCIR, TREC, CLEF
Evaluation: MAP, recall, PRES
 - MAP: focuses on finding relevant documents earlier
 - Recall: focuses on finding more relevant documents
 - PRES: focuses on finding more relevant documents at relatively good ranks
W. Magdy and G. Jones. PRES: a score metric for evaluating recall-oriented information retrieval applications. SIGIR 2010

3 What’s up?
Missing a relevant document in patent search is harmful
What about missing it in the relevance judgements?
How will evaluation metrics be affected?
Are the metrics robust in evaluating systems?
Bompad et al. On the robustness of relevance measures with incomplete judgements. SIGIR 2007

4 Data Used
CLEF-IP 2009 qrels for 400 topics
Avg. number of relevant documents per topic = 6
48 runs submitted by 15 participants
Runs ranked according to MAP, recall, and PRES

5 Experimental Setup
Create versions of incomplete judgements (20%, 40%, 60%, 80% of the qrels)
Re-compute scores with the new judgements
Re-rank runs according to the new scores
Monitor the change in ranking
Measure the correlation between the original and new rankings using Kendall tau
The higher the correlation, the more robust the metric
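
The setup on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy runs, and the choice to sample a fixed fraction of relevant documents per topic (keeping at least one) are all hypothetical, since the slides do not say how the reduced qrels were drawn. Only MAP and recall are shown; PRES is omitted because its formula does not appear in the slides. Kendall tau is computed here in its simple tau-a form.

```python
import random
from itertools import combinations

def subsample_qrels(qrels, fraction, seed=0):
    """Keep a random fraction of each topic's relevant docs (assumed: per topic, at least one)."""
    rng = random.Random(seed)
    reduced = {}
    for topic, relevant in qrels.items():
        docs = sorted(relevant)
        keep = max(1, round(fraction * len(docs)))
        reduced[topic] = set(rng.sample(docs, keep))
    return reduced

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each retrieved relevant document."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def recall(ranking, relevant):
    """Fraction of the relevant documents that appear anywhere in the ranking."""
    return len(set(ranking) & relevant) / len(relevant) if relevant else 0.0

def mean_score(run, qrels, metric):
    """Average a per-topic metric over all topics in the qrels."""
    return sum(metric(run.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two {run name: score} dicts (tied pairs count as neither)."""
    concordant = discordant = 0
    for x, y in combinations(sorted(scores_a), 2):
        s = (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(scores_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

def robustness(runs, qrels, metric, fraction, seed=0):
    """Tau between run rankings under the full qrels and under a reduced version."""
    full = {name: mean_score(run, qrels, metric) for name, run in runs.items()}
    reduced = subsample_qrels(qrels, fraction, seed)
    partial = {name: mean_score(run, reduced, metric) for name, run in runs.items()}
    return kendall_tau(full, partial)

# Toy data (hypothetical): two topics, three runs.
qrels = {"t1": {"d1", "d2", "d3"}, "t2": {"d4", "d5"}}
runs = {
    "runA": {"t1": ["d1", "d2", "d3", "x"], "t2": ["d4", "d5", "x"]},
    "runB": {"t1": ["x", "d1", "d2"], "t2": ["x", "d4"]},
    "runC": {"t1": ["x", "y", "d3"], "t2": ["y", "x", "z"]},
}

for fraction in (0.2, 0.4, 0.6, 0.8):
    for name, metric in (("MAP", average_precision), ("recall", recall)):
        print(fraction, name, robustness(runs, qrels, metric, fraction))
```

With real data, `qrels` and `runs` would be loaded from the CLEF-IP 2009 files, and the sampling would typically be repeated over several seeds so that one unlucky draw does not dominate the reported correlation.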

6 Results
Voorhees E. M. Evaluation by highly relevant documents. SIGIR 2001
Kendall tau > 0.9: nearly equivalent ranking
Kendall tau < 0.8: noticeable change in ranking
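
For reference, the thresholds above apply to a coefficient that can be written as follows. This is the simple tau-a form, which assumes no tied runs; the slides do not say which variant was used. Here C and D are the numbers of run pairs ordered the same way and the opposite way in the two rankings, and n is the number of runs.

```latex
\tau = \frac{C - D}{\tfrac{1}{2}\, n (n - 1)},
\qquad
n = 48 \;\Rightarrow\; \tfrac{1}{2}\, n (n - 1) = 1128 \text{ run pairs}
```

With tau = 1 the two rankings are identical, and with tau = -1 one is the exact reverse of the other.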

7 Conclusion
MAP is not a robust score for evaluating patent search when relevance judgements are incomplete
PRES and recall are more robust

8 Recommendation
Based on metric robustness + performance for patent search evaluation
Stop using MAP
 - does not reflect system recall
 - not robust with incomplete judgements
Start using PRES
 - reflects system recall + quality of ranking
 - highly robust with incomplete judgements
Get PRESeval from:

9 Thank you
Get PRESeval from:

10 Number of Relevant Docs per Topic

