Walid Magdy Gareth Jones

Examining the Robustness of Evaluation Metrics for Patent Retrieval with Incomplete Relevance Judgements
Walid Magdy, Gareth Jones
Dublin City University
CLEF, 21 Sep 2010

Patent Retrieval
Search a collection of patents for relevant ones
Objective: find all possible relevant documents
Search: takes much longer
Users: professionals and more patient
IR Campaigns: NTCIR, TREC, CLEF
Evaluation: MAP, recall, PRES
 - MAP: focuses on finding relevant documents earlier
 - Recall: focuses on finding more relevant documents
 - PRES: focuses on finding more relevant documents at relatively good ranks
W. Magdy and G. Jones. PRES: a score metric for evaluating recall-oriented information retrieval applications. SIGIR 2010
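The difference between the three metrics is easier to see on a single ranked list. The sketch below is illustrative only: the function names and the `n_max` cutoff parameter are assumptions, and the PRES formula is written from the definition as it is usually stated for the cited SIGIR 2010 paper (relevant documents not retrieved within the cutoff are assumed to sit at the worst-case ranks just beyond it). It is not taken from PRESeval and should be checked against that tool before use.

```python
def recall(ranked, relevant, n_max):
    """Fraction of the relevant documents found in the top n_max results."""
    found = sum(1 for doc in ranked[:n_max] if doc in relevant)
    return found / len(relevant)


def average_precision(ranked, relevant, n_max):
    """Average precision for one topic; rewards relevant documents found early."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked[:n_max], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def pres(ranked, relevant, n_max):
    """PRES for one topic (assumed form): 1 - (mean rank - (n + 1) / 2) / n_max."""
    n = len(relevant)
    ranks = [r for r, doc in enumerate(ranked[:n_max], start=1) if doc in relevant]
    missing = n - len(ranks)
    # Assumption: unretrieved relevant documents occupy the worst-case
    # ranks ending at n_max + n.
    worst = [n_max + n - j for j in range(missing)]
    mean_rank = (sum(ranks) + sum(worst)) / n
    return 1 - (mean_rank - (n + 1) / 2) / n_max


# Hypothetical topic: two relevant documents, found at ranks 1 and 3.
relevant = {"a", "b"}
ranked = ["a", "x", "b", "y"]
scores = (recall(ranked, relevant, 4),
          average_precision(ranked, relevant, 4),
          pres(ranked, relevant, 4))
```

All three functions assume a ranked list without duplicates and a topic with at least one relevant document.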

What’s up?
Missing a relevant document in patent search is harmful
What about missing it in the relevance judgements?
How will evaluation metrics be affected?
Are the metrics robust in evaluating systems?
Bompad et al. On the robustness of relevance measures with incomplete judgements. SIGIR 2007

Data Used
CLEF-IP 2009 qrels for 400 topics
Avg. number of relevant documents per topic = 6
48 runs submitted by 15 participants
Runs ranked according to MAP, recall, and PRES

Experimental Setup
Create versions of incomplete judgements (20%, 40%, 60%, 80% of the qrels)
Re-compute scores with the new judgements
Re-rank runs according to new scores
Monitor the change in ranking
Measure correlation between rankings using Kendall tau
The higher the correlation, the more robust the metric
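The procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: the data layout (`qrels` as a dict from topic to a set of relevant document ids, each run as a dict from topic to a ranked list), the uniform sampling over (topic, document) pairs, the fixed seed, and the use of mean recall as the example metric are all hypothetical choices. Kendall tau is computed here in its simple tau-a form, where tied pairs contribute zero.

```python
import random
from itertools import combinations


def subsample_qrels(qrels, fraction, seed=0):
    """Keep a random fraction of the (topic, doc) relevance pairs."""
    pairs = [(t, d) for t, docs in sorted(qrels.items()) for d in sorted(docs)]
    rng = random.Random(seed)
    kept = rng.sample(pairs, round(fraction * len(pairs)))
    reduced = {}
    for topic, doc in kept:
        reduced.setdefault(topic, set()).add(doc)
    return reduced  # topics that lost all judgements are dropped


def mean_recall(run, qrels, n_max):
    """Mean recall at n_max over the topics that still have judgements."""
    scores = []
    for topic, relevant in qrels.items():
        top = run.get(topic, [])[:n_max]
        scores.append(sum(1 for d in top if d in relevant) / len(relevant))
    return sum(scores) / len(scores)


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def robustness(runs, qrels, fraction, n_max, metric=mean_recall, seed=0):
    """Correlation between run rankings under full and reduced judgements."""
    full = [metric(run, qrels, n_max) for run in runs]
    reduced_qrels = subsample_qrels(qrels, fraction, seed)
    reduced = [metric(run, reduced_qrels, n_max) for run in runs]
    return kendall_tau(full, reduced)
```

Calling `robustness(runs, qrels, 0.2, n_max)` through `robustness(runs, qrels, 0.8, n_max)` for each metric gives one correlation value per judgement level. Averaging over several seeds would reduce the effect of any single random sample.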

Results
Kendall tau > 0.9: nearly equivalent ranking
Kendall tau < 0.8: noticeable change in ranking
Voorhees E. M. Evaluation by highly relevant documents. SIGIR 2001

Conclusion
MAP is not a robust score for evaluating patent search when relevance judgements are incomplete
PRES and recall are more robust

Recommendation
Based on metric robustness + performance for patent search evaluation
Stop using MAP
 - does not reflect system recall
 - not robust with incomplete judgements
Start using PRES
 - reflects system recall + quality of ranking
 - highly robust with incomplete judgements
Get PRESeval from:

Thank you
Get PRESeval from:

Number of Relevant Docs per Topic
