Answer Validation Exercise - AVE: QA subtrack at the Cross-Language Evaluation Forum. UNED (coord.): Anselmo Peñas, Álvaro Rodrigo, Valentín Sama, Felisa Verdejo.


1 Answer Validation Exercise - AVE
QA subtrack at the Cross-Language Evaluation Forum
UNED (coord.): Anselmo Peñas, Álvaro Rodrigo, Valentín Sama, Felisa Verdejo
Thanks to: Bernardo Magnini, Danilo Giampiccolo, Pamela Forner, Petya Osenova, Christelle Ayache, Bogdan Sacaleanu, Diana Santos, Juan Feu, Ido Dagan, …

2 nlp.uned.es/QA/ave
What? Answer Validation Exercise: validate the correctness of the answers given by real QA systems, namely the answers of participants at CLEF QA 2006.
Why? Give feedback on a single QA module, improve QA systems' performance, improve systems' self-scoring, help humans assess QA systems' output, develop criteria for collaborative QA systems, ...

3 How? Turning it into an RTE exercise
If the text semantically entails the hypothesis, then the answer is expected to be correct.
[Diagram] Question -> QA system -> Exact Answer + Supporting snippet & doc ID
  Hypothesis: the question and the exact answer, put into affirmative form
  Text: the supporting snippet, several sentences (<500 bytes)

4 Example
• Question: Who is the President of Mexico?
• Answer (obsolete): Vicente Fox
• Hypothesis: Vicente Fox is the President of Mexico
• Supporting Text: “...President Vicente Fox promises a more democratic Mexico...”
• Exercise: Text entails Hypothesis? Answer: YES | NO
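The example above can be sketched in code. This is only an illustration: the slides say hypotheses were built semi-automatically but do not describe the tooling, so the `<answer>` slot pattern, the `Pair` class, and `build_hypothesis` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One text-hypothesis pair, as in the example slide."""
    text: str        # supporting snippet returned by the QA system
    hypothesis: str  # question + answer in affirmative form


def build_hypothesis(pattern: str, answer: str) -> str:
    """Fill the answer into an affirmative pattern derived from the question.

    The "<answer>" slot syntax is an assumption made for this sketch.
    """
    return pattern.replace("<answer>", answer.strip())


# Pattern written by hand for "Who is the President of Mexico?"
pattern = "<answer> is the President of Mexico"
pair = Pair(
    text="...President Vicente Fox promises a more democratic Mexico...",
    hypothesis=build_hypothesis(pattern, "Vicente Fox"),
)
print(pair.hypothesis)
```

Because the answer string is inserted verbatim, a wrong or inexact answer yields a hypothesis that may be ungrammatical or false, which is the situation the validation systems have to cope with.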

5 Looking for robust systems
• Hypotheses are built semi-automatically from systems' answers
  - Some answers are correct and exact
  - Many are too long, too short, too wrong
• Many hypotheses have
  - Wrong syntax but understandable
  - Wrong syntax and not understandable
  - Wrong semantics

6 So, the exercise
• Return an entailment value (YES|NO) for each given text-hypothesis pair
• Results were evaluated against the QA human assessments
• Subtasks: English, Spanish, Italian, Dutch, French, German, Portuguese and Bulgarian
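To make the task interface concrete, here is a minimal sketch of a validator that returns YES or NO for a text-hypothesis pair using plain word overlap. It is not any participant's system: the tokenizer, the `validate` function, and the 0.5 threshold are assumptions chosen for illustration.

```python
import re


def tokens(s: str) -> set[str]:
    """Lowercased word tokens; \\w+ also matches accented letters."""
    return set(re.findall(r"\w+", s.lower()))


def validate(text: str, hypothesis: str, threshold: float = 0.5) -> str:
    """Return "YES" if enough hypothesis words also occur in the text.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    hyp = tokens(hypothesis)
    if not hyp:
        return "NO"
    overlap = len(hyp & tokens(text)) / len(hyp)
    return "YES" if overlap >= threshold else "NO"


text = "...President Vicente Fox promises a more democratic Mexico..."
hypothesis = "Vicente Fox is the President of Mexico"
print(validate(text, hypothesis))  # 4 of 7 hypothesis words appear in the text
```

Word overlap ignores syntax and meaning entirely, so a sketch like this would accept many hypotheses that merely share vocabulary with the snippet.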

7 Collections
Available for CLEF participants at nlp.uned.es/QA/ave/
Language     Testing           Training
English      2088 (10% YES)    2870 (15% YES)
Spanish      2369 (28% YES)    2905 (22% YES)
German       1443 (25% YES)
French       3266 (22% YES)
Italian      1140 (16% YES)
Dutch        807 (10% YES)
Portuguese   1324 (14% YES)

8 Evaluation
• Collections are not balanced
• Approach: detect whether there is enough evidence to accept an answer
• Measures: precision, recall and F over YES pairs (where the text entails the hypothesis)
• Baseline system: accept all answers (always answer YES)
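The measures above can be sketched as follows. This assumes the balanced F-measure (F1), which the slide does not state explicitly; the function name and the toy labels are invented for illustration and are not AVE data.

```python
def precision_recall_f(gold, predicted):
    """Precision, recall and balanced F computed over the YES class only."""
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == "YES" and p == "YES")
    pred_yes = sum(1 for p in predicted if p == "YES")
    gold_yes = sum(1 for g in gold if g == "YES")
    precision = true_pos / pred_yes if pred_yes else 0.0
    recall = true_pos / gold_yes if gold_yes else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy collection (made up): 1 YES pair out of 4.
gold = ["YES", "NO", "NO", "NO"]

# Baseline from the slide: accept every answer.
baseline = ["YES"] * len(gold)
p, r, f = precision_recall_f(gold, baseline)
print(p, r, f)
```

For the accept-all baseline, recall is always 1 and precision equals the proportion of YES pairs, so F reduces to 2p / (1 + p). On an unbalanced collection this baseline therefore scores low, which is why measuring over YES pairs rewards systems that find real evidence.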

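9 Participants and runs
Runs per language (column order: DE, EN, ES, FR, IT, NL, PT), then the total. Rows with fewer than seven values list only the runs entered.
Participant                      Runs per language                            Total
Fernuniversität in Hagen         2                                            2
Language Computer Corporation    1, 1                                         2
U. Rome "Tor Vergata"            2                                            2
U. Alicante (Kozareva)           DE 2, EN 2, ES 2, FR 2, IT 2, NL 2, PT 1     13
U. Politecnica de Valencia       1                                            1
U. Alicante (Ferrández)          2                                            2
LIMSI-CNRS                       1                                            1
U. Twente                        DE 1, EN 2, ES 2, FR 1, IT 1, NL 2, PT 1     10
UNED (Herrera)                   2                                            2
UNED (Rodrigo)                   1                                            1
ITC-irst                         1                                            1
R2D2 project                     1                                            1
Total                            DE 5, EN 11, ES 9, FR 4, IT 3, NL 4, PT 2    38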

10 Results
Language     Baseline (F)   Best (F)   Reported techniques
English      .27            .44        Logic
Spanish      .45            .61        Logic
German       .39            .54        Lexical, Syntax, Semantics, Logic, Corpus
French       .37            .47        Overlapping, Learning
Dutch        .19            .39        Syntax, Learning
Portuguese   .38            .35        Overlapping
Italian      .29            .41        Overlapping, Learning

11 Conclusions
• Developed methodologies
  - Build collections from QA responses
  - Evaluate in chain with a QA Track
• New testing collections for the QA and RTE communities
  - In 7 languages, not only English
• Evaluation in a real environment
  - Real systems' outputs -> AVE input

12 Conclusions
• Reformulating Answer Validation as a Textual Entailment problem is feasible
  - It introduces a 4% error (in the semi-automatic generation of the collection)
• Good participation
  - 11 systems, 38 runs, 7 languages
• Systems that reported the use of Logic obtained the best results

