# PRES: A Score Metric for Evaluating Recall-Oriented IR Applications (Walid Magdy, Gareth Jones, Dublin City University, SIGIR, 22 July 2010)



## Recall-Oriented IR

- Examples: patent search and legal search
- Objective: find all possible relevant documents
- Search: takes much longer
- Users: professionals, and more patient
- IR campaigns: NTCIR, TREC, CLEF
- Evaluation: mainly MAP!!!

## Current Evaluation Metrics

For a topic with 4 relevant docs, where the first 100 retrieved docs are to be checked:

- System1: relevant ranks = {1}
- System2: relevant ranks = {50, 51, 52, 53}
- System3: relevant ranks = {1, 2, 3, 4}
- System4: relevant ranks = {1, 98, 99, 100}

| Metric | System1 | System2 | System3 | System4 |
|---|---|---|---|---|
| AP | 0.25 | 0.0481 | 1 | 0.2727 |
| Recall | 0.25 | 1 | 1 | 1 |
| F1 (P@100, Recall) | 0.0192 | 0.0769 | 0.0769 | |
| F1 (AP, Recall) | 0.25 | 0.0917 | 1 | |
| F4 (AP, Recall) | 0.25 | 0.462 | 1 | 0.864 |
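A minimal sketch (not the authors' code) that reproduces the metric values above for the four example systems. System2's relevant ranks are taken as {50, 51, 52, 53}, which is the set consistent with the quoted AP of 0.0481; the two F rows are F-measures combining precision-at-100 or AP with recall.

```python
# Hypothetical helper functions for the slide's example: n = 4 relevant
# docs per topic, the first 100 retrieved docs are checked.

def average_precision(ranks, n_relevant):
    """AP over all relevant docs; un-retrieved relevant docs contribute 0."""
    return sum((i + 1) / r for i, r in enumerate(sorted(ranks))) / n_relevant

def recall(ranks, n_relevant):
    return len(ranks) / n_relevant

def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of p and r; beta > 1 emphasises recall."""
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

systems = {
    "System1": [1],
    "System2": [50, 51, 52, 53],
    "System3": [1, 2, 3, 4],
    "System4": [1, 98, 99, 100],
}

n, checked = 4, 100
for name, ranks in systems.items():
    ap = average_precision(ranks, n)
    r = recall(ranks, n)
    p_at_100 = len(ranks) / checked
    print(name,
          round(ap, 4),                         # AP
          r,                                    # Recall
          round(f_beta(p_at_100, r), 4),        # F1 of P@100 and Recall
          round(f_beta(ap, r, beta=4), 3))      # F4 of AP and Recall
```

Note how F4 still rewards System4 (0.864) despite three relevant docs sitting at the very bottom of the checked list, which is the weakness PRES addresses.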

## Normalized Recall (R_norm)

R_norm is the area between the actual case and the worst case, as a proportion of the area between the best case and the worst case:

R_norm = 1 − (Σᵢ rᵢ − Σᵢ i) / (n(N − n))

where:

- N: collection size
- n: number of relevant docs
- rᵢ: the rank at which the i-th relevant document is retrieved
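A minimal sketch of Rocchio's normalized recall as defined above, assuming all n relevant documents are retrieved somewhere in a collection of size N:

```python
def r_norm(ranks, n, N):
    """Normalized recall: 1 - (sum of actual ranks - sum of ideal ranks)
    divided by the worst-minus-best area n * (N - n)."""
    actual = sum(ranks)
    ideal = n * (n + 1) // 2   # best case: relevant docs at ranks 1..n
    return 1 - (actual - ideal) / (n * (N - n))

print(r_norm([1, 2, 3, 4], 4, 100))      # best case
print(r_norm([97, 98, 99, 100], 4, 100)) # worst case
```

The best case (ranks 1..n) scores 1 and the worst case (ranks N−n+1..N) scores 0, matching the area interpretation.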

## Applicability of R_norm

R_norm requires the following:

1. Known collection size (N)
2. Known number of relevant documents (qrels) (n)
3. Retrieving documents until reaching 100% recall (rᵢ)

Workarounds:

- Un-retrieved relevant docs are treated as the worst case
- For a large-scale document collection: R_norm ≈ Recall

## R_norm Modification: PRES (Patent Retrieval Evaluation Score)

PRES modifies R_norm by redefining the worst case through the user's maximum number of checked documents, N_max: a relevant document missing from the top N_max results is assumed to appear just after them, so the worst case becomes N_worst = N_max + n.

PRES = 1 − ( (Σᵢ rᵢ)/n − (n + 1)/2 ) / N_max

where the i-th relevant document, if not retrieved in the top N_max results, is assigned rank N_max + i.

Bounds:

- For recall = 1: n/N_max ≤ PRES ≤ 1
- For recall = R: nR²/N_max ≤ PRES ≤ R
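A sketch of the PRES formula under the worst-case assumption that the i-th relevant document, if missing from the top N_max results, sits at rank N_max + i:

```python
def pres(found_ranks, n, n_max):
    """PRES = 1 - (mean relevant rank - (n+1)/2) / N_max, with un-retrieved
    relevant docs padded in at worst-case ranks N_max + i."""
    k = len(found_ranks)
    ranks = sorted(found_ranks) + [n_max + i for i in range(k + 1, n + 1)]
    return 1 - (sum(ranks) / n - (n + 1) / 2) / n_max

# The four example systems, n = 4 relevant docs, N_max = 100:
for ranks in ([1], [50, 51, 52, 53], [1, 2, 3, 4], [1, 98, 99, 100]):
    print(ranks, round(pres(ranks, 4, 100), 2))
```

This reproduces the slide's values: a perfect run scores 1, while System4's tail-end relevant docs drag PRES down to 0.28 even though its recall is 1.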

## PRES Performance

For the same topic (4 relevant docs, first 100 docs to be checked; n = 4, N_max = 100):

- System1: relevant ranks = {1}
- System2: relevant ranks = {50, 51, 52, 53}
- System3: relevant ranks = {1, 2, 3, 4}
- System4: relevant ranks = {1, 98, 99, 100}

| Metric | System1 | System2 | System3 | System4 |
|---|---|---|---|---|
| AP | 0.25 | 0.0481 | 1 | 0.2727 |
| R / R_norm | 0.25 | 1 | 1 | 1 |
| F4 | 0.25 | 0.462 | 1 | 0.864 |
| PRES | 0.25 | 0.51 | 1 | 0.28 |

## Average Performance

PRES vs. MAP vs. Recall over the 48 runs in CLEF-IP 2009 (N_max = 1000). Sample runs showing the change in scores and ranking:

| Run ID | MAP | Recall | PRES |
|---|---|---|---|
| R47 | 0.104 | 0.589 | 0.484 |
| R12 | 0.088 | 0.534 | 0.430 |
| R23 | 0.087 | 0.728 | 0.603 |
| R26 | 0.084 | 0.511 | 0.431 |
| R18 | 0.033 | 0.656 | 0.490 |

Correlation between the metrics:

- PRES vs. MAP: 0.66
- MAP vs. Recall: 0.56
- PRES vs. Recall: 0.87
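An illustrative sketch only (not the paper's computation): a simple Kendall tau-a rank correlation over just the five sample runs quoted above. The slide's correlation figures are computed over all 48 CLEF-IP 2009 runs, so the numbers here differ, but the pattern (PRES tracks Recall more closely than MAP) is visible even on this tiny sample.

```python
from itertools import combinations

# The five sample runs from the slide: (MAP, Recall, PRES)
runs = {
    "R47": (0.104, 0.589, 0.484),
    "R12": (0.088, 0.534, 0.430),
    "R23": (0.087, 0.728, 0.603),
    "R26": (0.084, 0.511, 0.431),
    "R18": (0.033, 0.656, 0.490),
}

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs (no ties)."""
    pairs = list(combinations(range(len(xs)), 2))
    conc = sum((xs[i] - xs[j]) * (ys[i] - ys[j]) > 0 for i, j in pairs)
    disc = sum((xs[i] - xs[j]) * (ys[i] - ys[j]) < 0 for i, j in pairs)
    return (conc - disc) / len(pairs)

maps = [m for m, r, p in runs.values()]
recs = [r for m, r, p in runs.values()]
press = [p for m, r, p in runs.values()]

print("MAP vs PRES:   ", kendall_tau(maps, press))
print("Recall vs PRES:", kendall_tau(recs, press))
```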

## Conclusions

PRES:

- Is designed for recall-oriented applications
- Gives a higher score to systems achieving higher recall and a better average relative ranking
- Is designed for laboratory testing
- Depends on the user's potential/effort (N_max)
- Is going to be applied in CLEF-IP 2010

Get PRESeval from: www.computing.dcu.ie/~wmagdy/PRES.htm


