1 ResPubliQA 2010: QA on European Legislation
Anselmo Peñas, UNED, Spain; Pamela Forner, CELCT, Italy; Richard Sutcliffe, U. Limerick, Ireland; Alvaro Rodrigo, UNED, Spain

2 Outline
 The Multiple Language Question Answering Track at CLEF – a bit of history
 ResPubliQA this year – what is new
 Participation, runs and languages
 Assessment and metrics
 Results
 Conclusions
ResPubliQA 2010, 22 September, Padua, Italy

3 Multiple Language Question Answering at CLEF
Started in 2003; 2010 is the eighth year.
 Era I: ungrouped, mainly factoid questions asked against monolingual newspapers; exact answers returned
 Era II: grouped questions asked against newspapers and Wikipedia; exact answers returned
 Era III (ResPubliQA): ungrouped questions against multilingual parallel-aligned EU legislative documents; passages returned

4 ResPubliQA 2010 – Second Year
 Key points:
– same set of questions in all languages
– same document collections: parallel-aligned documents
 Same objectives:
– to move towards a domain of potential users
– to allow direct comparison of performance across languages
– to allow QA technologies to be evaluated against IR approaches
– to promote the use of validation technologies
But also some novelties…

5 What’s new
1. New task (Answer Selection)
2. New document collection (EuroParl)
3. New question types
4. Automatic evaluation

6 The Tasks
 Paragraph Selection (PS): extract a relevant paragraph of text that completely satisfies the information need expressed by a natural language question
 Answer Selection (AS) [NEW]: demarcate the shorter string of text corresponding to the exact answer, supported by the entire paragraph

7 The Collections
 Subset of JRC-Acquis (10,700 docs per language)
– EU treaties, EU legislation, agreements and resolutions
– between 1950 and 2006
– parallel-aligned at the document level (not always at the paragraph level)
– XML-TEI.2 encoding
 Small subset of EuroParl (~150 docs per language) [NEW]
– proceedings of the European Parliament: Debates (CRE) from 2009 and Texts Adopted (TA) from 2007, with translations into Romanian from January 2009
– parallel-aligned at the document level (not always at the paragraph level)
– XML encoding

8 EuroParl Collection
 is compatible with the Acquis domain
 allows the scope of the questions to be widened
 Unfortunately:
– small number of texts
– documents are not fully translated
The specific fragments of JRC-Acquis and EuroParl used by ResPubliQA are available at

9 Questions
 Two new question categories:
– OPINION: What did the Council think about the terrorist attacks on London?
– OTHER: What is the e-Content program about?
 Reason and Purpose categories merged together: Why was Perwiz Kambakhsh sentenced to death?
 And also Factoid, Definition, Procedure

10 ResPubliQA Campaigns
[Table: registered groups, participant groups, submitted runs and organizing people for ResPubliQA 2009 (including baseline runs; 9 in the last surviving cell) and ResPubliQA 2010 (49 runs: 42 PS and 7 AS; 6, plus 6 additional translators/assessors); the remaining counts did not survive in this transcript.]
More participants and more submissions.

11 ResPubliQA 2010 Participants
System  Team                                                                            Reference
bpac    SZTAKI, Hungary                                                                 Nemeskey
dict    Dhirubhai Ambani Institute of Information and Communication Technology, India   Sabnani et al.
elix    University of Basque Country, Spain                                             Agirre et al.
icia    RACAI, Romania                                                                  Ion et al.
iles    LIMSI-CNRS, France                                                              Tannier et al.
ju_c    Jadavpur University, India                                                      Pakray et al.
loga    University Koblenz, Germany                                                     Glöckner and Pelzer
nlel    U. Politecnica Valencia, Spain                                                  Correa et al.
prib    Priberam, Portugal                                                              -
uaic    "Al.I.Cuza" University of Iasi, Romania                                         Iftene et al.
uc3m    Universidad Carlos III de Madrid, Spain                                         Vicente-Díez et al.
uiir    University of Indonesia, Indonesia                                              Toba et al.
uned    UNED, Spain                                                                     Rodrigo et al.
13 participants, 8 countries, 4 new participants

12 Submissions by Task and Language
Source \ Target   DE        EN         ES        FR        IT        PT        RO        Total
DE                4 (4,0)                                                                4 (4,0)
EN                          19 (16,3)                                          2 (2,0)   21 (18,3)
ES                                     7 (6,1)                                           7 (6,1)
EU                          2 (2,0)                                                      2 (2,0)
FR                                               7 (5,2)                                 7 (5,2)
IT                                                         3 (2,1)                       3 (2,1)
PT                                                                   1 (1,0)             1 (1,0)
RO                                                                             4 (4,0)   4 (4,0)
Total             4 (4,0)   21 (18,3)  7 (6,1)   7 (5,2)   3 (2,1)   1 (1,0)   6 (6,0)   49 (42,7)
Cells give the number of runs, with (PS, AS) counts in parentheses.

13 System Output
 Two options:
– give an answer (paragraph or exact answer)
– return NoA as response = no answer is given; the system is not confident about the correctness of its answer
 Objective:
– avoid returning incorrect answers
– use NoA to reduce only the portion of wrong answers, not of correct ones
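The answer-or-NoA decision can be sketched as below, assuming a system that attaches a validation confidence score to each candidate answer; the threshold value and function names are hypothetical, not from the campaign.

```python
# Sketch: decide between answering and returning NoA, assuming each
# candidate answer carries a validation confidence in [0, 1].
# The 0.5 threshold is a hypothetical choice, not from the campaign.

def respond(candidate: str, confidence: float, threshold: float = 0.5) -> str:
    """Return the candidate answer, or "NoA" when confidence is too low."""
    if confidence >= threshold:
        return candidate
    return "NoA"

print(respond("Article 5, paragraph 2", 0.9))  # confident: the answer is returned
print(respond("Article 5, paragraph 2", 0.2))  # not confident: NoA
```

Withheld candidates are still recorded, since the assessors also evaluate the answers behind NoA responses.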

14 Evaluation Measure
c@1 = (n_R + n_U · (n_R / n)) / n
n_R: number of questions correctly answered
n_U: number of questions unanswered
n: total number of questions (200 this year)
If n_U = 0, then c@1 = n_R / n, i.e. plain accuracy.
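The measure can be sketched directly in code; this is a minimal illustration of the c@1 formula as given, rewarding unanswered questions at the rate the system answers correctly.

```python
# c@1: unanswered questions (NoA) are credited at the system's observed
# accuracy, instead of being treated as plain errors.

def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """Compute c@1 = (n_R + n_U * n_R / n) / n."""
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total

# With no unanswered questions, c@1 reduces to accuracy:
print(c_at_1(100, 0, 200))   # 0.5
# Leaving 40 questions unanswered scores better than answering them wrongly:
print(c_at_1(100, 40, 200))  # 0.6
```

This is why the metric encourages a validation module: withholding a likely-wrong answer raises the score relative to submitting it.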

15 Assessment
Two steps:
1) Automatic evaluation
o responses automatically compared against a manually produced Gold Standard
– answers that exactly match the Gold Standard are given the correct value (R)
– correctness of a response requires an exact match of the document identifier, the paragraph identifier, and the text retrieved by the system with respect to those in the Gold Standard
2) Manual assessment
o non-matching paragraphs/answers judged by human assessors
o anonymous and simultaneous for the same question
31% of the answers were automatically marked as correct.
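The automatic step amounts to a three-field exact-match test. A minimal sketch, assuming responses and gold entries as dictionaries; the field names and identifier values are illustrative, not the campaign's actual format.

```python
# Sketch of the automatic evaluation step: a response is marked Right (R)
# only if document id, paragraph id and retrieved text all exactly match
# the gold-standard entry. Field names and values are illustrative.

def auto_correct(response: dict, gold: dict) -> bool:
    return (response["doc_id"] == gold["doc_id"]
            and response["para_id"] == gold["para_id"]
            and response["text"] == gold["text"])

gold = {"doc_id": "doc-001", "para_id": "12", "text": "Member States shall..."}
hit  = {"doc_id": "doc-001", "para_id": "12", "text": "Member States shall..."}
miss = {"doc_id": "doc-001", "para_id": "13", "text": "Member States shall..."}

print(auto_correct(hit, gold))   # True: marked R automatically
print(auto_correct(miss, gold))  # False: goes on to manual assessment
```

Anything that fails this strict comparison is not marked wrong outright; it is passed to the human assessors.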

16 Assessment for Paragraph Selection (PS)
 Binary assessment:
– Right (R)
– Wrong (W)
 NoA answers:
– automatically filtered and marked as U (Unanswered)
– discarded candidate answers were also evaluated:
NoA R: NoA, but the candidate answer was correct
NoA W: NoA, and the candidate answer was incorrect
NoA Empty: NoA and no candidate answer was given
 Evaluators were guided by the initial “gold” paragraph – only a hint
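The PS judgements above can be sketched as a small labelling function; this is a simplified illustration of the label scheme, not the actual assessment tooling.

```python
# Sketch: assign the PS assessment label for one response.
# answered:  did the system give an answer, or return NoA?
# candidate: the discarded candidate answer behind a NoA, if any.
# correct:   the judgement of the (candidate) answer against the question.
from typing import Optional

def ps_label(answered: bool, candidate: Optional[str], correct: bool) -> str:
    if answered:
        return "R" if correct else "W"
    if candidate is None:
        return "NoA Empty"
    return "NoA R" if correct else "NoA W"

print(ps_label(True, None, True))             # R
print(ps_label(False, "some passage", True))  # NoA R: a correct answer was withheld
print(ps_label(False, None, False))           # NoA Empty
```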

17 Assessment for Answer Selection (AS)
R (Right): the answer string is an exact and correct answer, supported by the returned paragraph
X (ineXact): the answer string contains either part of a correct answer present in the returned paragraph, or all of the correct answer plus unnecessary additional text
M (Missed): the answer string does not contain a correct answer even in part, but the returned paragraph does contain one
W (Wrong): the answer string does not contain a correct answer and neither does the returned paragraph; or it contains an unsupported answer

18 Monolingual Results for PS
[Table: per-language monolingual PS scores for the participating systems (bpac, dict, elix, icia, iles, ju_c, loga, nlel, prib, uaic, uc3m, uiir, uned), together with a combination run and an IR baseline (uned); the numeric scores did not survive in this transcript.]

19 Improvement in the Performance
[Table: best and average scores for ResPubliQA 2009 vs. ResPubliQA 2010 on the monolingual PS task, and for the 2010 JRC-Acquis and EuroParl collections separately; the numeric scores did not survive in this transcript.]

20 Cross-language Results for PS
elix102 (EU-EN): 0.36
elix101 (EU-EN): 0.33
icia101 (EN-RO): 0.29
icia102 (EN-RO): 0.29
In comparison to ResPubliQA 2009:
– more cross-language runs (+2)
– improvement in the best performance: from 0.18 to 0.36

21 Results for the AS Task
[Table: for each AS run (a combination run, ju_c101ASenen, iles101ASenen, iles101ASfrfr, nlel101ASenen, nlel101ASeses, nlel101ASitit, nlel101ASfrfr), the counts of R, W, M and X answers, the corresponding NoA counts, and empty NoAs; the numbers did not survive in this transcript.]

22 Conclusions
 Successful continuation of ResPubliQA 2009
 AS task: few groups and poor results
 Overall improvement of results
 New document collection and new question types
 The evaluation metric encourages the use of a validation module

23 More on System Analyses and Approaches
MLQA’10 Workshop on Wednesday, 14:30 – 18:00

24 ResPubliQA 2010: QA on European Legislation
Thank you!

