LIS618 lecture 11 i/r performance evaluation Thomas Krichel 2011-11-21.

1 LIS618 lecture 11 i/r performance evaluation Thomas Krichel 2011-11-21

2 retrieval performance evaluation "Recall" and "Precision" are two classic measures of the performance of information retrieval for a single query. Both assume that there is an answer set of documents that contain the answer to the query. Performance is optimal if – the database returns all the documents in the answer set – the database returns only documents in the answer set Recall is the fraction of the relevant documents that the query result has captured. Precision is the fraction of the retrieved documents that are relevant.
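
A minimal Python sketch of these two definitions, assuming the answer set and the query result are available as sets of document identifiers:

    def recall(relevant, retrieved):
        # fraction of the relevant documents that the result has captured
        return len(relevant & retrieved) / len(relevant)

    def precision(relevant, retrieved):
        # fraction of the retrieved documents that are relevant
        return len(relevant & retrieved) / len(retrieved)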

3 recall and precision curves Assume that all the retrieved documents arrive at once and are examined one by one. During that process, the user discovers more and more relevant documents. Recall increases. During the same process, at least eventually, there are fewer and fewer useful documents. Precision declines (usually). This can be represented as a curve.

4 Example Let the answer set be {0,1,2,3,4,5,6,7,8,9} and let non-relevant documents be represented by letters. A query returns the following ranked result: 7,a,3,b,c,9,n,j,l,5,r,o,s,e,4. For the first document, (recall, precision) is (10%,100%), for the third (20%,66%), for the sixth (30%,50%), for the tenth (40%,40%), and for the last (50%,33%).
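
The numbers on this slide can be reproduced with a short Python sketch that walks down the ranking and prints recall and precision each time a relevant document is seen (the precision figures on the slide are rounded down):

    relevant = set("0123456789")        # the answer set
    ranking = list("7a3bc9njl5rose4")   # the ranked query result
    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            # rank, recall so far, precision so far
            print(rank, found / len(relevant), found / rank)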

5 recall/precision curves Such curves can be formed for each query. An average curve can be calculated over several queries by averaging the precision at each recall level. Recall and precision levels can also be used to calculate two single-valued summaries. – average precision at seen document – R-precision
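
The slide does not name a particular averaging method; one common choice, assumed here, is the 11-point interpolated curve, where the precision at each standard recall level is the highest precision observed at that recall or beyond. A Python sketch reusing the data of the example slide:

    def eleven_point_curve(relevant, ranking):
        points = []                        # (recall, precision) at each relevant document
        found = 0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                found += 1
                points.append((found / len(relevant), found / rank))
        curve = []
        for i in range(11):
            r = i / 10                     # standard recall levels 0.0, 0.1, ..., 1.0
            curve.append(max((p for rec, p in points if rec >= r), default=0.0))
        return curve

    # the average curve over several queries is the element-wise mean of such curves
    print(eleven_point_curve(set("0123456789"), list("7a3bc9njl5rose4")))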

6 R-precision This is a pretty ad-hoc measure. Let R be the size of the answer set. Take the first R results of the query. Find the number of relevant documents among them. Divide by R. In our example, the R-precision is 40%. An average can be calculated over a number of queries.
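
Continuing the Python sketch with the data of the example slide, R-precision could be computed like this:

    relevant = set("0123456789")        # the answer set, so R = 10
    ranking = list("7a3bc9njl5rose4")   # the ranked query result
    R = len(relevant)
    r_precision = sum(doc in relevant for doc in ranking[:R]) / R
    print(r_precision)                  # 0.4, i.e. 40% as on the slide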

7 average precision at seen document To find it, sum the precision levels obtained at each new relevant document discovered by the user and divide by the number of relevant documents seen. In our example, it is (100+66+50+40+33)/5 = 57.8%. This measure favors retrieval methods that get the relevant documents to the top.
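
With the same example data, a Python sketch of this calculation:

    relevant = set("0123456789")        # the answer set
    ranking = list("7a3bc9njl5rose4")   # the ranked query result
    precisions = []
    found = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            found += 1
            precisions.append(found / rank)   # precision at each seen relevant document
    # prints 0.58; the slide gets 57.8% because it rounds the precisions down first
    print(sum(precisions) / len(precisions))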

8 critique of recall & precision Recall has to be estimated by an expert, and it is very difficult to estimate in a large collection. Both measures focus on one query only. No serious user works like this. There are some other measures, but that is more for an advanced course in IR.

9 http://openlib.org/home/krichel Please shut down the computers when you are done. Thank you for your attention!

