
1 Assessing the Retrieval, A.I. Lab, 2007.01.20, 박동훈

2 Contents
4.1 Personal Assessment of Relevance
4.2 Extending the Dialog with RelFbk
4.3 Aggregated Assessment: Search Engine Performance
4.4 RAVE: A Relevance Assessment Vehicle
4.5 Summary

3 4.1 Personal Assessment of Relevance
4.1.1 Cognitive Assumptions
– Users are trying to do 'object recognition'
– Comparison with respect to a prototypic document
– Reliability of user opinions?
– Relevance scale
– RelFbk is nonmetric

4 Relevance Scale

5 Users naturally provide only preference information, not a (metric) measurement of how relevant a retrieved document is: RelFbk is nonmetric.

6 4.2 Extending the Dialog with RelFbk RelFbk Labeling of the Retr Set

7 Query Session, Linked by RelFbk

8 4.2.1 Using RelFbk for Query Refinement

9

10

11 4.2.2 Document Modifications due to RelFbk (Fig. 4.7)
Change the documents themselves: make each document more like the queries that successfully match it, and less like the queries that do not.

12 4.3 Aggregated Assessment: Search Engine Performance
4.3.1 Underlying Assumptions
– RelFbk(q,di) assessments are independent
– Users' opinions will all agree with a single 'omniscient' expert's

13 4.3.2 Consensual relevance Consensually relevant

14 4.3.4 Basic Measures Relevant versus Retrieved Sets

15 Contingency table
– NRel: the number of relevant documents
– NNRel: the number of irrelevant documents
– NDoc: the total number of documents
– NRet: the number of retrieved documents
– NNRet: the number of documents not retrieved
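The contingency counts and the measures built from them can be sketched directly from the relevant and retrieved sets. A minimal sketch, assuming sets of document ids; the function and variable names are illustrative, not from the slides:

```python
# Contingency-table counts for one query, from the relevant and
# retrieved sets plus the total collection size NDoc.
def contingency(relevant: set, retrieved: set, n_doc: int):
    rel_ret = len(relevant & retrieved)    # relevant and retrieved
    nrel_ret = len(retrieved - relevant)   # irrelevant but retrieved
    rel_nret = len(relevant - retrieved)   # relevant but not retrieved
    nrel_nret = n_doc - rel_ret - nrel_ret - rel_nret  # neither
    return rel_ret, nrel_ret, rel_nret, nrel_nret

def precision(relevant: set, retrieved: set) -> float:
    return len(relevant & retrieved) / len(retrieved)

def recall(relevant: set, retrieved: set) -> float:
    return len(relevant & retrieved) / len(relevant)

rel = {1, 2, 3, 4}
ret = {3, 4, 5}
print(contingency(rel, ret, n_doc=10))  # (2, 1, 2, 5)
print(precision(rel, ret), recall(rel, ret))
```

The four counts sum to NDoc, which is a quick sanity check when wiring these up against real run data.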

16 4.3.4 Basic Measures (cont)

17 4.3.4 Basic Measures (cont)

18 4.3.5 Ordering the Retr Set
Each document is assigned a hitlist rank Rank(di), in descending order of Match(q,di): Rank(di) < Rank(dj) whenever Match(q,di) ≥ Match(q,dj), and ideally Rank(di) < Rank(dj) implies Pr(Rel(di)) ≥ Pr(Rel(dj)).
Coordination level: a document's rank in Retr, determined by the number of keywords shared by document and query.
Goal: the Probability Ranking Principle.

19 A tale of two retrievals: Query 1 vs. Query 2

20 Recall/precision curve Query1

21 Recall/precision curve Query1
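The recall/precision curves on these slides are built by walking down the ranked hitlist and recording a (recall, precision) point at each relevant document encountered. A minimal sketch under that assumption; the names are mine:

```python
# Recall/precision points down a ranked hitlist: one point per
# relevant document found, at that document's rank.
def rp_curve(hitlist: list, relevant: set) -> list:
    points, hits = [], 0
    for rank, doc in enumerate(hitlist, start=1):
        if doc in relevant:
            hits += 1
            # recall = fraction of relevant found; precision = fraction
            # of the top-`rank` documents that are relevant
            points.append((hits / len(relevant), hits / rank))
    return points

print(rp_curve(["a", "b", "c", "d"], {"a", "c"}))
# [(0.5, 1.0), (1.0, 0.6666666666666666)]
```

Plotting these points for two queries makes the "tale of two retrievals" comparison concrete: a ranking that puts relevant documents early keeps precision high as recall grows.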

22 Retrieval envelope

23 4.3.6 Normalized recall
ri: the hitlist rank of the i-th relevant document
(Figure: worst vs. best orderings)
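The formula itself did not survive the transcript. As a hedged reconstruction, the standard (Salton-style) definition of normalized recall, which the worst/best figure presumably bracketed, is:

```latex
R_{\text{norm}} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n\,(N - n)}
```

where $n$ is the number of relevant documents, $N$ the hitlist length, and $r_i$ the hitlist rank of the $i$-th relevant document. The best ordering ($r_i = i$) gives $R_{\text{norm}} = 1$; the worst, with all relevant documents at the bottom, gives $0$.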

24 4.3.8 One-Parameter Criteria
– Combining recall and precision
– Classification accuracy
– Sliding ratio
– Point alienation

25 Combining recall and precision
F-measure [Jardine & van Rijsbergen, 1971] [Lewis & Gale, 1994]
Effectiveness [van Rijsbergen, 1979]: E = 1 - F, with α = 1/(β² + 1); α = 0.5 gives the harmonic mean of precision and recall.
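A minimal sketch of the F and E measures under the slide's parameterisation α = 1/(β² + 1); function names are illustrative:

```python
# Van Rijsbergen-style F-measure: F_beta combines precision p and
# recall r, weighting recall beta times as much as precision.
def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    return (beta**2 + 1) * p * r / (beta**2 * p + r)

def effectiveness(p: float, r: float, beta: float = 1.0) -> float:
    # E = 1 - F: lower is better
    return 1.0 - f_measure(p, r, beta)

# beta = 1 (alpha = 0.5) is the harmonic mean of precision and recall
print(f_measure(0.5, 1.0), effectiveness(0.5, 1.0))
```

With β = 1 the two operands are weighted equally; β > 1 favours recall, β < 1 favours precision.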

26 Classification accuracy
Accuracy: correct identification of both relevant and irrelevant documents.

27 Sliding ratio
Imagine a nonbinary, metric Rel(di) measure; Rank1 and Rank2 are computed by two separate systems.
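One common formulation, offered here as a hedged reconstruction since the slide lost its formula: at each cutoff k, compare the cumulative (metric, nonbinary) relevance accumulated by a ranking's top k against that of the ideal ordering's top k. The names are mine:

```python
# Sliding ratio at cutoff k: cumulative Rel of the system's top-k
# documents divided by the cumulative Rel of the ideal (descending
# relevance) ordering's top k. 1.0 means the ranking is ideal so far.
def sliding_ratio(ranked_rels: list, k: int) -> float:
    ideal = sorted(ranked_rels, reverse=True)
    return sum(ranked_rels[:k]) / sum(ideal[:k])

rels = [1, 3, 0, 2]            # metric Rel(di) down the hitlist
print(sliding_ratio(rels, 2))  # (1 + 3) / (3 + 2) = 0.8
```

Computing the ratio for each system's ranking against the same ideal gives a rank-by-rank comparison of Rank1 and Rank2.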

28 Point alienation
Developed to measure human preference data, capturing the fundamentally nonmetric nature of RelFbk.

29 4.3.9 Test corpora
More data required for a "test corpus"
Standard test corpora
TREC: Text REtrieval Conference
TREC's refined queries
TREC is constantly expanding and refining its tasks

30 More data required for “test corpus” Documents Queries Relevance assessments Rel(q,d) Perhaps other data too – Classification data (Reuters) – Hypertext graph structure (EB5)

31 Standard test corpora

32 TREC is constantly expanding and refining its tasks
– Ad hoc query task
– Routing/filtering task
– Interactive task

33 Other measures
Expected search length (ESL)
– Length of the "path" as the user walks down the HitList
– ESL = number of irrelevant documents seen before each relevant document
– ESL for random retrieval
– ESL reduction factor
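The bullets above can be sketched for the basic case of a user who wants some number of relevant documents; the names and the `need` parameter are illustrative, not from the slides:

```python
# Search length: the number of irrelevant documents a user passes
# while walking down the hitlist until `need` relevant ones are found.
def search_length(hitlist: list, relevant: set, need: int) -> int:
    irrelevant_seen, found = 0, 0
    for doc in hitlist:
        if doc in relevant:
            found += 1
            if found == need:
                return irrelevant_seen
        else:
            irrelevant_seen += 1
    return irrelevant_seen  # hitlist exhausted before `need` was met

print(search_length(["a", "x", "y", "b"], {"a", "b"}, need=2))  # 2
```

Averaging this over queries gives the expected search length; comparing it against the value for a random ordering yields the ESL reduction factor the slide mentions.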

34 4.5 Summary
– Discussed both metric and nonmetric relevance feedback
– The difficulties in getting users to provide relevance judgments for documents in the retrieved set
– Quantified several measures of system performance

