Evaluation Issues: June 2002 Donna Harman Ellen Voorhees.


1 Evaluation Issues: June 2002
Donna Harman, Ellen Voorhees

2 NIST Interlocking Evaluation Plan
Major metrics evaluation to be in TREC QA track
Additional Aquaint-specific evaluations to be run for narrowly focused areas
Pilot tasks to test new evaluation methodologies to be run every 6 months; resulting tasks will then migrate to TREC or to an Aquaint-specific evaluation
Testbed will be focused on integration and usability issues

3 NIST Why use TREC QA?
To open the evaluation to a much broader community
–allows many different/unusual approaches
–ensures that Aquaint technology is “competitive” with the outside world
–encourages more rapid technology transfer
To maintain continuity across the various question types, building to an ever larger set of question-answering capabilities

4 NIST When would an evaluation be Aquaint-specific?
Evaluation plan not likely to scale to TREC-size participation
–example: user dialog evaluation
Data not available outside of Aquaint
–example: CNS data
Narrow focus not likely to attract many research groups
–example: multimedia/multilingual QA

5 NIST Criteria for Pilot Tasks
Known type of question with evaluation problems
–example: definitional/biographical questions (who is Colin Powell?)
Known area of interest from Aquaint users
–example: questions with no answer or only a partial answer
Known area of research concentration
–example: multimedia QA

6 NIST June 02 Pilot Evaluation Tasks
Dialog for QA
–Tomek Strzalkowski, Sanda Harabagiu
Relationships or cause-and-effect QA
–John Prager, Eric Nyberg
Answer explanation/justification
–Stefano Bertolo, Richard Fikes
QA access to multimedia data
–Howard Wactlar, Yiming Yang, Herb Gish

7 NIST June 02 Pilot Evaluation Tasks
Opinion questions
–Eduard Hovy, Kathleen McKeown
Definitional (who is, what is) questions
–Ralph Weischedel, Dan Moldovan
Questions for a fixed domain
–Jerry Hobbs, Daniel Marcu
Questions with no or only partial answer
–Maureen Caudill, Bill Ogden

8 NIST Evaluation Breakout Goals
Develop a workable evaluation plan for a pilot evaluation of the target task
Pilot evaluations run July-November 2002
–results reported at December meeting
–the result of interest is the effectiveness of the evaluation, not of the systems

9 NIST Stakeholders in Evaluation Plans
Contractors/researchers
–plan needs to address an appropriate facet of problem so that groups will participate
Eventual end-users (analysts)
–plan needs to reflect some facet of user needs so that evaluation is seen as useful
Implementers
–plan needs to be specific, actually do-able so that NIST (or others) can carry it out

10 NIST Workable Evaluation Plan
Concrete definition of problem to be addressed
Detailed specification of the data structure that systems are to return as a response
Operational method for scoring the quality of a response, including any human judgments required

11 NIST Examples of thorny issues
Who is/what is questions
–Task definition: how to supply context to help systems select “better” answers?
–Form of answer: a ranked/“binned” list of facts? a filled template? a narrative?
–Judgment: recall of “important” facts? missing a critical fact? precision/redundancy?
–Operational details of pilot (who is doing what?)
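Illustrative sketch (not from the slides): one possible way to make the "form of answer" and "judgment" questions above concrete is to treat a response as a ranked list of facts, have a human assessor map each returned fact to a key fact or to nothing, and then compute recall over the "important" facts and precision over what was returned, counting redundant repeats as misses. The class name `JudgedFact`, the function `score_response`, and the key-fact ids are all hypothetical; the slides leave the actual answer format and scoring method open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgedFact:
    """One fact returned by a system, plus an assessor's label (hypothetical schema)."""
    text: str
    key_fact_id: Optional[str]  # id of the key fact it conveys, or None if irrelevant


def score_response(returned, important):
    """Return (recall over important facts, precision over returned facts).

    Assumption: a key fact is credited once; later restatements of the
    same key fact count as redundant and lower precision.
    """
    seen = set()
    useful = 0
    for fact in returned:
        if fact.key_fact_id is not None and fact.key_fact_id not in seen:
            seen.add(fact.key_fact_id)
            useful += 1
    recall = len(seen & important) / len(important) if important else 0.0
    precision = useful / len(returned) if returned else 0.0
    return recall, precision


# Placeholder judgments for a "who is X?" question.
response = [
    JudgedFact("fact A", "k1"),
    JudgedFact("fact A restated", "k1"),  # redundant
    JudgedFact("unrelated text", None),   # irrelevant
    JudgedFact("fact B", "k3"),           # relevant but not marked important
]
recall, precision = score_response(response, important={"k1", "k2"})
```

Under these assumptions a missing critical fact shows up as lost recall, while redundancy and irrelevant material show up as lost precision; how to weight the two is one of the open questions the slide raises.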

12 NIST More thorny issues
Answer justification
–Task definition: what does this mean?
–Form of answer: a logical reasoning chain? a list of document extracts? metadata?
–Judgment: ???
–Operational details of pilot (who is doing what?)

13 NIST Breakout areas
Dialog for QA
Definitional (who is, what is) questions
Opinion questions
Relationships or cause-and-effect QA
Questions for a fixed domain
Questions with no or only partial answer
Answer explanation/justification
QA access to multimedia data

