Cumulated Gain-Based Evaluation of IR Techniques

Liu bingbing

Motivation
There are many different kinds of IR techniques. Which one is better, and how should they be evaluated?

Outline
Introduction
Cumulated gain-based measurements
Case study: comparison of some TREC-7 results at different relevance levels
Discussion

Background
Highly relevant documents should be identified and ranked first
Measures are needed to evaluate how well different IR techniques do this

Old measures
Highly and marginally relevant documents are given equal credit
Documents are judged either relevant or irrelevant
Graded relevance judgments

New measures
CG, DCG, nCG, nDCG

Principles
Highly relevant documents are more important than marginally relevant ones
Documents found late are less important

Relationship
[Diagram relating the vectors G, CG, DCG, BV, and n(D)CG]

Direct Cumulated Gain (CG)
For example:
G' = <3, 2, 3, 0, 0, 1, 2, 2, 3, 0, ...>
CG' = <3, 5, 8, 8, 8, 9, 11, 13, 16, 16, ...>
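The CG vector is a running sum of the gain vector: the value at rank i is the total gain of the documents at ranks 1 to i. A minimal Python sketch of that idea follows; the function name `cumulated_gain` is chosen here for illustration and does not come from the slides.

```python
from itertools import accumulate


def cumulated_gain(gains):
    """Return the CG vector: entry i is the sum of the gains at ranks 1..i."""
    return list(accumulate(gains))


# Gain vector from the slide (relevance levels 0-3 used directly as gains).
G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(cumulated_gain(G))
```

For the gain vector on the slide this prints `[3, 5, 8, 8, 8, 9, 11, 13, 16, 16]`, the CG' values shown above.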

Discounted Cumulated Gain (DCG)
For example (log base 2):
G' = <3, 2, 3, 0, 0, 1, 2, 2, 3, 0, ...>
DCG' = <3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61, ...>
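The DCG values in this example are reproduced by dividing the gain at rank i by the logarithm of i (base b) before adding it to the running total, with ranks below b left undiscounted. A small Python sketch under that assumption, with base 2 as in the example; the function name is illustrative only.

```python
import math


def discounted_cumulated_gain(gains, base=2):
    """Return the DCG vector for a gain vector.

    Ranks below `base` are added undiscounted; from rank `base` onward
    each gain is divided by log_base(rank). `base` must be greater than 1.
    """
    dcg = []
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        if rank < base:
            total += gain
        else:
            total += gain / math.log(rank, base)
        dcg.append(total)
    return dcg


G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print([round(x, 2) for x in discounted_cumulated_gain(G, base=2)])
```

Rounded to two decimals, the result is `[3.0, 5.0, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61]`, matching the DCG' vector above. A larger base discounts less steeply and starts discounting at a later rank.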

Best possible Vectors Theoretically

A sample ideal gain vector (BV)
CG' = <3, 6, 9, 11, 13, 15, 16, 17, 18, 19, 19, 19, 19, ...>
DCG' = <3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 11.21, 11.53, 11.83, 11.83, 11.83, ...> (log base 2)
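The best possible ranking places every judged document in descending order of gain. The sketch below assumes a hypothetical recall base of three documents at level 3, three at level 2, and four at level 1, which are the counts implied by the successive differences of the CG' values above. The name `ideal_gain_vector` is illustrative, not taken from the slides.

```python
from itertools import accumulate


def ideal_gain_vector(judged_gains, length):
    """Sort the gains of all judged documents in descending order,
    then pad with zeros (or truncate) to the requested length."""
    ranked = sorted(judged_gains, reverse=True)[:length]
    return ranked + [0] * (length - len(ranked))


# Hypothetical recall base, listed in arbitrary order.
recall_base = [1, 3, 2, 1, 3, 2, 1, 2, 3, 1]

ideal = ideal_gain_vector(recall_base, 13)
ideal_cg = list(accumulate(ideal))
print(ideal)
print(ideal_cg)
```

Under these assumptions the ideal gain vector is `[3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0]`, and its cumulated gain is `[3, 6, 9, 11, 13, 15, 16, 17, 18, 19, 19, 19, 19]`, the CG' vector above.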

Relative to the Ideal Measure—the Normalized (D)CG Measure
norm-vect(V, I) = <v1/i1, v2/i2, ..., vk/ik>
For example:
nCG = norm-vect(CG, CG_I)
nDCG = norm-vect(DCG, DCG_I)
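Normalization is a position-by-position division of a measured vector by the corresponding ideal vector. A minimal Python sketch, reusing the CG and ideal CG values from the earlier examples; it assumes every entry of the ideal vector is nonzero, which holds when the recall base contains at least one relevant document.

```python
def norm_vect(v, ideal):
    """Divide each component of v by the matching component of the ideal vector."""
    return [x / i for x, i in zip(v, ideal)]


cg = [3, 5, 8, 8, 8, 9, 11, 13, 16, 16]           # CG' of the example ranking
cg_ideal = [3, 6, 9, 11, 13, 15, 16, 17, 18, 19]  # ideal CG', first ten ranks

ncg = norm_vect(cg, cg_ideal)
print([round(x, 2) for x in ncg])
```

This prints `[1.0, 0.83, 0.89, 0.73, 0.62, 0.6, 0.69, 0.76, 0.89, 0.84]`. A value of 1 means the ranking is as good as the ideal up to that rank. The same function applies unchanged to DCG vectors.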

Comparison to Earlier Measures
Average search length (ASL): estimates the average position of a relevant document
Expected search length (ESL): the average number of documents that must be examined to retrieve a given number of relevant documents
...
These measures either don't take the degree of document relevance into account, or depend on the retrieved list size, or …

The strengths of the new measures: CG, DCG, nCG, nDCG
Take the degree of document relevance into account
Don't depend on the size of the recall base
Don't depend on outliers
Are obvious to interpret

In addition, DCG has further advantages
Weights down gain found later in the ranking
Models user persistence

Data source
TREC-7: 50 queries from topic statements
51800 documents, or 1.9 GB of data
We used result lists for 20 topics by five participants from the TREC-7 ad hoc manual track

Relevance judgments
The new judgments are reliable
The new judgments are stricter

Cumulated gain
[Figure: cumulated gain curves, (a) binary weighting, (b) nonbinary weighting]

Discounting gain

Normalized (D)CG Vectors and Statistical Testing

About the case study
For example:
Ideal = <3, 3, 3, 2, 2, 1, 1, 1, 0, 0>
A = <2, 3, 2, 1, 3, …>
[Tables listing the gain G at each document rank D = 1 to 10 are not reproduced]

Several parameters
Last rank considered
Gain values
Discounting factor
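To show how these three parameters interact, the sketch below computes nDCG at a chosen last rank, with a configurable mapping from relevance levels to gain values and a configurable logarithm base as the discounting factor. The weighting schemes, the example ranking, and the recall base are assumptions made for illustration, as are the names `dcg` and `ndcg_at`.

```python
import math


def dcg(gains, base):
    """DCG vector: ranks below `base` are undiscounted, later ranks are
    divided by log_base(rank). `base` must be greater than 1."""
    total, out = 0.0, []
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < base else gain / math.log(rank, base)
        out.append(total)
    return out


def ndcg_at(run_levels, judged_levels, weights, base, last_rank):
    """nDCG at `last_rank` for one ranking.

    run_levels    -- relevance level of each retrieved document, in rank order
    judged_levels -- relevance levels of all judged documents (recall base)
    weights       -- mapping from relevance level to gain value
    """
    run_gains = [weights[level] for level in run_levels[:last_rank]]
    ideal_gains = sorted((weights[level] for level in judged_levels),
                         reverse=True)[:last_rank]
    return dcg(run_gains, base)[-1] / dcg(ideal_gains, base)[-1]


# Hypothetical data: one ranking and its recall base, relevance levels 0-3.
run = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
recall_base = [3, 3, 3, 2, 2, 2, 1, 1, 1, 1]

graded = {0: 0, 1: 1, 2: 2, 3: 3}   # nonbinary weighting (assumed scheme)
binary = {0: 0, 1: 1, 2: 1, 3: 1}   # binary weighting (assumed scheme)

graded_score = ndcg_at(run, recall_base, graded, base=2, last_rank=10)
binary_score = ndcg_at(run, recall_base, binary, base=2, last_rank=10)
print(round(graded_score, 3), round(binary_score, 3))
```

Changing `weights` alters how much extra credit highly relevant documents receive, raising `base` softens the penalty for late-ranked documents, and `last_rank` sets how deep in the result list the comparison looks.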

Limitations
Don't take order effects on relevance judgments or document overlap into account
Deal with a single dimension only
Are unable to handle dynamic changes

Benefits
Take the degree of document relevance into account
Model user persistence
