1 A Task-Oriented Non-Interactive Evaluation Methodology for IR Systems
Paper by Jane Reid. Presented by Alyssa Katz, LIS 551, March 30, 2004

2 Introduction
- Reid offers an alternative to evaluating IR systems on the basis of an objective, static concept of relevance
- Instead, she proposes "task-oriented relevance":
  - Non-topical relevance
  - A shift toward more user-centered evaluation methods
  - An attempt to bridge the systems-oriented view with a user-oriented view

3 Non-Interactive Evaluation
Advantages:
- Cheap
- Simple to conduct
- Statistical measures allow for easy comparison of systems
Disadvantages:
- Relevance judgments are too simplistic
- Does not account for the interactive nature of current IR systems
- "Systems" framework, not "situational"

4 Task-Oriented IR
- A task is a purpose for seeking information
- Features common to all tasks can be identified as the "task framework"
- [Diagram: a setter and a performer are linked through the task representation, task requirements, task outcome, and the performer's task model]

5 Task-Oriented Evaluation
- The task model is continuously refined
- Takes place within an external context
- Externally vs. internally generated tasks imply different criteria for judgment

6 A Task-Oriented Test Collection
1. A textual or mixed-media collection of documents
2. A description of the task, which can include performer, outcome, and completion information
3. Natural-language queries, created by the performers and submitted to the IR system
4. Relevance judgments (all four components are sketched in code below)
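As a concrete illustration, here is a minimal sketch of how these four components might be represented in Python. All class and field names are illustrative assumptions, not taken from Reid's paper.

```python
# A minimal sketch of the four test-collection components as Python
# dataclasses. All names here are illustrative, not from Reid's paper.
from dataclasses import dataclass

@dataclass
class TaskDescription:
    performer: str        # who carries out the task (e.g. "novice student")
    outcome: str          # what the task should produce
    completion_info: str  # how completion of the task is judged

@dataclass
class TestCollection:
    documents: dict[str, str]                # doc_id -> text or media reference
    task: TaskDescription
    queries: list[str]                       # natural-language queries from performers
    judgments: dict[tuple[str, str], float]  # (query, doc_id) -> relevance weight
```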

7 Relevance Judgments (cont'd)
- Subjective (query to document) vs. objective (query to end need)
- Task-oriented judgments are viewed as subjective, but more narrowly construed: document to performer's task model
- The paper goes further because it accounts for the feedback and learning stage:
  - The notion of relevance is modified throughout the entire process
  - Only after feedback can a definitive answer about the relevance of a document be determined

8 Implementation of Test Collection
Task descriptions:
- Use real tasks
- Categorize to allow for variety:
  - Expert- vs. novice-generated tasks
  - Externally vs. internally generated tasks
  - Simple (well-defined) vs. complex (poorly defined)

9 Implementation of Test Collection (2)
Queries:
- Created by the task performers
- All queries should be included

10 Implementation of Test Collection (3)
Relevance judgments:
- Use individual weighted relevance judgments, not binary judgments
- Ask performers to judge relevance on a scale (a conversion sketch follows below)
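A minimal sketch of how a performer's scale judgment might be turned into a weight. Both the 0-4 scale and the normalization to the range 0.0-1.0 are assumptions; the slides do not specify the scale.

```python
# Illustrative conversion of a performer's scale judgment into a weight.
# The 0-4 scale and the normalization to [0, 1] are assumptions.
SCALE_MAX = 4  # 0 = not relevant ... 4 = highly relevant

def to_weight(scale_judgment: int) -> float:
    """Map a scale judgment onto a 0.0-1.0 relevance weight."""
    return scale_judgment / SCALE_MAX

print(to_weight(3))  # 0.75
```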

11 Statistical Measures: Recall
A different definition of recall (and, below, precision):
Old: recall = (number of relevant documents retrieved) / (total number of relevant documents)
New: recall = (relevance weight of documents retrieved) / (total relevance weight of all documents)

12 Statistical Measures: Recall
- Accumulate the amount of relevance for each document across all task performers
- Gives the most intuitive view of recall (sketched in code below)
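A sketch of the new recall measure combined with the accumulation step from slide 12, assuming per-performer weights in [0, 1]; the function and variable names are illustrative, not from the paper.

```python
# Weighted recall: relevance weight of the retrieved documents divided by
# the total relevance weight of all documents. `judgments` maps each doc_id
# to the weights assigned by the individual task performers.

def weighted_recall(retrieved: set[str],
                    judgments: dict[str, list[float]]) -> float:
    # Accumulate the amount of relevance per document across all performers.
    weight = {doc: sum(ws) for doc, ws in judgments.items()}
    total = sum(weight.values())
    retrieved_weight = sum(weight.get(doc, 0.0) for doc in retrieved)
    return retrieved_weight / total if total else 0.0

judgments = {"d1": [1.0, 0.75], "d2": [0.5, 0.25], "d3": [0.0, 0.25]}
print(weighted_recall({"d1", "d3"}, judgments))  # 2.0 / 2.75, about 0.727
```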

13 Statistical Measures: Precision
Old: precision = (number of relevant documents retrieved) / (total number of documents retrieved)
New: precision = (relevance weight of documents retrieved) / (potential relevance weight of documents retrieved)
(a code sketch of the new measure follows)
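A companion sketch for the new precision measure. Reading "potential relevance weight" as the maximum weight each retrieved document could have received (1.0 per performer) is an interpretation, not something the slide pins down.

```python
# Weighted precision: actual relevance weight of the retrieved documents
# divided by their potential (maximum possible) relevance weight.

def weighted_precision(retrieved: list[str],
                       judgments: dict[str, list[float]],
                       n_performers: int) -> float:
    actual = sum(sum(judgments.get(doc, [])) for doc in retrieved)
    potential = len(retrieved) * n_performers * 1.0  # max 1.0 per judgment
    return actual / potential if potential else 0.0

judgments = {"d1": [1.0, 0.75], "d2": [0.5, 0.25]}
print(weighted_precision(["d1", "d2"], judgments, 2))  # 2.5 / 4.0 = 0.625
```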

14 Statistical Measures: Precision
For an overall picture of an IR system's performance:
- Calculate a precision score every time another document is retrieved
- Average recall and precision points across all queries (see the sketch below)
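The procedure above can be sketched as follows, reusing `weighted_precision` from the previous example. Averaging point-by-point over ranks, rather than interpolating at fixed recall levels, is a simplifying assumption.

```python
# Compute a precision score after each retrieved document, then average the
# corresponding points across all queries for an overall performance picture.

def precision_points(ranking: list[str],
                     judgments: dict[str, list[float]],
                     n_performers: int) -> list[float]:
    # Precision after the 1st, 2nd, ... retrieved document.
    return [weighted_precision(ranking[:k], judgments, n_performers)
            for k in range(1, len(ranking) + 1)]

def average_points(per_query_points: list[list[float]]) -> list[float]:
    depth = min(len(p) for p in per_query_points)  # truncate to shortest run
    return [sum(p[k] for p in per_query_points) / len(per_query_points)
            for k in range(depth)]
```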

15 Advantages of Task-Oriented Evaluation
- Easy to use
- Results can still be compared across systems
- Incorporates elements of interaction
- Grounded in the real world, keeping the user's task and the dynamic nature of information seeking in mind

16 Implications for Future Research
- Use for multimedia searches, which are not as clear-cut
- How do we arrive at weighted relevance judgments?
- Measures other than recall and precision

