Information Access I Measurement and Evaluation GSLT, Göteborg, October 2003 Barbara Gawronska, Högskolan i Skövde.

1 Information Access I Measurement and Evaluation GSLT, Göteborg, October 2003 Barbara Gawronska, Högskolan i Skövde

2 Evaluation in process (after Hirschman, L. & Mani, I. (2003): Evaluation. In: Mitkov, R. (ed.): The Oxford Handbook of Computational Linguistics, Oxford University Press, p. 415)

3 Evaluation types (White's classification): feasibility evaluation, internal evaluation, declarative evaluation, usability evaluation

4 Intrinsic vs. extrinsic measures. Intrinsic: measure the system in itself. Extrinsic: measure the system's efficiency and acceptability in some task.

5 Glass-box vs. black-box evaluation

6 Standard measures: precision. a = retrieved and relevant; b = retrieved and not relevant; Ret = the set of retrieved records; Rel = the set of relevant records. $P = \frac{a}{a+b} = \frac{|Ret \cap Rel|}{|Ret|}$
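A minimal sketch of set-based precision, assuming documents are identified by string IDs (the IDs below are hypothetical). Returning 0.0 when nothing is retrieved is an assumed convention, since P is undefined in that case.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """P = a / (a + b) = |Ret ∩ Rel| / |Ret|."""
    if not retrieved:
        return 0.0  # assumed convention: precision is undefined, report 0
    a = len(retrieved & relevant)  # retrieved and relevant
    b = len(retrieved - relevant)  # retrieved and not relevant
    return a / (a + b)


ret = {"d1", "d2", "d3", "d4"}  # hypothetical retrieved set
rel = {"d2", "d4", "d7"}        # hypothetical relevance judgments
print(precision(ret, rel))  # 0.5 (2 of the 4 retrieved records are relevant)
```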

7 Standard measures: precision (2). a = retrieved and relevant; b = retrieved and not relevant; $S_i$ = the sum of the scores of retrieved records; $S_{max}$ = the maximum possible total score, over all records retrieved

8 Standard measures: recall. a = retrieved and relevant; c = not retrieved and relevant; Ret = the set of retrieved records; Rel = the set of relevant records. $R = \frac{a}{a+c} = \frac{|Ret \cap Rel|}{|Rel|}$
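The matching sketch for set-based recall, under the same assumption of string document IDs. Unlike precision, recall needs the full set of relevant records, which in practice must come from relevance judgments; raising an error when that set is empty is a design choice made here.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """R = a / (a + c) = |Ret ∩ Rel| / |Rel|."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant records")
    a = len(retrieved & relevant)  # retrieved and relevant
    c = len(relevant - retrieved)  # not retrieved and relevant
    return a / (a + c)


ret = {"d1", "d2", "d3", "d4"}  # hypothetical retrieved set
rel = {"d2", "d4", "d7"}        # hypothetical relevance judgments
print(recall(ret, rel))  # 2/3, since d7 was missed
```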

9 Standard measures: recall (2). a = retrieved and relevant; c = not retrieved and relevant; $RL_i$ = the relevance of record i; $S_{max}$ = the maximum possible total score, over all records retrieved

10 Typical relationship between precision and recall
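The figure on this slide is not preserved in the transcript. As a rough illustration of why the two measures tend to trade off, this sketch computes precision and recall at each cutoff k of a ranked result list; the ranking and the relevance judgments are invented for illustration.

```python
def pr_at_cutoffs(ranking: list[str], relevant: set[str]) -> list[tuple[int, float, float]]:
    """Return (k, precision@k, recall@k) for every cutoff k of the ranking."""
    if not relevant:
        raise ValueError("need at least one relevant record")
    results = []
    hits = 0  # relevant records seen so far
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        results.append((k, hits / k, hits / len(relevant)))
    return results


ranking = ["d2", "d5", "d4", "d1", "d7", "d3"]  # hypothetical system output
relevant = {"d2", "d4", "d7"}                   # hypothetical judgments
for k, p, r in pr_at_cutoffs(ranking, relevant):
    print(f"k={k}: P={p:.2f} R={r:.2f}")
```

Recall can only grow as k increases, while precision drifts downward (not strictly monotonically) as more non-relevant records enter the retrieved set.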

11 Composite efficiency measures:

12 The E-measure: combine precision and recall into one number (van Rijsbergen 1979): $E = 1 - \frac{1+b^2}{\frac{b^2}{R} + \frac{1}{P}}$, where P = precision, R = recall, b = measure of the relative importance of P or R

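13 The E- and F-measures: combine precision and recall into one number (van Rijsbergen 1979): $F = 1 - E = \frac{1+b^2}{\frac{b^2}{R} + \frac{1}{P}}$, where P = precision, R = recall. With the F-measure, larger values are better (with E, smaller values are better).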

14 The F-measure: b = 0 means F = precision; b → ∞ means F = recall; b = 1 means recall and precision are weighted equally; b = 0.5 means recall is half as important as precision; b = 2.0 means recall is twice as important as precision (because 0 ≤ P, R ≤ 1, a larger value in the denominator means a smaller value overall)
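A minimal sketch of the weighted F- and E-measures. It uses the algebraically equivalent form $F = \frac{(1+b^2)PR}{b^2P + R}$ (multiply numerator and denominator of $\frac{1+b^2}{b^2/R + 1/P}$ by PR) so that P = 0 or R = 0 does not cause a division by zero; returning 0.0 when the denominator vanishes is an assumed convention.

```python
def f_measure(p: float, r: float, b: float = 1.0) -> float:
    """F_b = (1 + b^2) P R / (b^2 P + R), equivalent to (1 + b^2) / (b^2/R + 1/P)."""
    denom = b * b * p + r
    if denom == 0:
        return 0.0  # assumed convention when the measure is undefined
    return (1 + b * b) * p * r / denom


def e_measure(p: float, r: float, b: float = 1.0) -> float:
    """E_b = 1 - F_b; smaller is better."""
    return 1.0 - f_measure(p, r, b)


p, r = 0.5, 0.8
print(f_measure(p, r, b=0.0))  # 0.5: F reduces to precision
print(f_measure(p, r, b=1.0))  # harmonic mean of P and R, about 0.615
print(f_measure(p, r, b=1e6))  # very large b: F approaches recall (0.8)
print(e_measure(p, r, b=1.0))  # about 0.385
```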

15 CLEF – Cross-Language Evaluation Forum (IR): tasks ("topics") based on TREC ad hoc tasks; multilingual, bilingual and monolingual retrieval evaluation

16 Sample TREC queries (topics). Number: 168. Topic: Financing AMTRAK. Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK). Narrative: A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss the privatization of AMTRAK as an alternative to continuing government subsidies. Documents comparing government subsidies given to air and bus transportation with those provided to AMTRAK would also be relevant.

17 CLEF tasks for multilingual IR: given a topic in language L, retrieve relevant documents in other languages (2002: Dutch, English, French, German, Spanish, Swedish); evaluation in terms of recall and precision; focus on recall (criticized)

18 Evaluation of summaries: most evaluations are intrinsic, typically "golden set" evaluation against ideal summaries, measuring content overlap (by sentence and phrase recall and precision, or by simple word overlap)
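A minimal sketch of the "simple word overlap" variant: a system summary is compared with an ideal summary by word-level precision and recall. Lowercasing, whitespace tokenization, and clipping repeated words with multiset intersection are assumptions made here; this is not any particular evaluation package, and the example sentences are invented.

```python
from collections import Counter


def word_overlap(candidate: str, reference: str) -> tuple[float, float]:
    """Return (precision, recall) of candidate words against a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection: min count per word
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    return p, r


gold = "the government funds amtrak operations"  # hypothetical ideal summary
system = "the government subsidizes amtrak"      # hypothetical system summary
print(word_overlap(system, gold))  # (0.75, 0.6)
```

Note that "subsidizes" and "funds" do not match: plain word overlap rewards identical wording, not identical content.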

19 Evaluation of summaries (2): two basic measures for summary evaluation: Compression Ratio: CR = (length of summary) / (length of text); Retention Ratio: RR = (info in summary) / (info in text)
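A minimal sketch of the two ratios. Measuring length in words is an assumption (characters or sentences would work the same way), and because "info" has no single definition, the retention ratio simply takes two externally estimated amounts. The example counts are hypothetical, in the spirit of the Question Game: questions answerable from the summary vs. from the full text.

```python
def compression_ratio(summary: str, text: str) -> float:
    """CR = (length of summary) / (length of text), length counted in words (assumed)."""
    text_len = len(text.split())
    if text_len == 0:
        raise ValueError("text must not be empty")
    return len(summary.split()) / text_len


def retention_ratio(info_in_summary: float, info_in_text: float) -> float:
    """RR = (info in summary) / (info in text); how info is estimated is up to the caller."""
    if info_in_text <= 0:
        raise ValueError("info_in_text must be positive")
    return info_in_summary / info_in_text


text = ("Amtrak receives federal subsidies and some documents discuss "
        "privatization as an alternative to continuing government subsidies")
summary = "Amtrak receives federal subsidies"
print(compression_ratio(summary, text))  # 0.25 (4 words / 16 words)

# Hypothetical counts: 4 questions answerable from the full text,
# 2 of them answerable from the summary alone.
print(retention_ratio(2, 4))  # 0.5
```

A good summary combines a low CR with a high RR.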

20 How to estimate the amount of information? TIPSTER/SUMMAC evaluation: the Shannon Game (guessing the content of the text letter by letter) and the Question Game (answering questions about the text)

