1 Co-funded by the European Union
The QALL-ME Benchmark: a Multilingual Resource of Annotated Spoken Requests for Question Answering
E. Cabrio, M. Kouylekov, B. Magnini, M. Negri (FBK-Irst); L. Hasler, C. Orasan (University of Wolverhampton); D. Tomas, J.L. Vicedo (University of Alicante); G. Neumann, C. Weber (DFKI)

2 Outline:
o Motivations and goals
o QALL-ME Project
o QALL-ME Benchmark
o Data collection
o Translation into English
o Speech Acts Annotation
o Question Answering Annotation
o Annotation of relations
o Conclusion and Future Work
LREC - 28-30 May 2008 - Marrakech (Morocco)

3 Context: the Qall-me project
QALL-ME (Question Answering Learning technologies in a multiLingual and multiModal Environment) is an EU-funded project aiming to build a shared and distributed infrastructure for Question Answering systems on mobile devices (e.g. mobile phones).
[Figure: a digital assistant with input and output channels: SMS, MMS, voice, text, video]

4 QALL-ME details
o Reference: FP6 IST-033860
o Contract Type: STREP
o Start date: October 1st, 2006
o Duration: 36 months
o Project Funding: 2.82 M euros
http://qallme.fbk.eu
Partners: FBK-Irst, Italy; DFKI, Germany; University of Alicante, Spain; University of Wolverhampton, UK; Comdata S.p.A., Italy; Ubiest S.p.A., Italy; Waycom S.r.l., Italy

5 Motivations
o Providing a dataset of requests beyond factoid questions (e.g. verification, procedural)

6 Motivation: beyond factoid…
VERIFICATION
o has Venezia hotel a restaurant
o is there a toll free number for the INAIL office in via Gazzoletti in Trento
PROCEDURAL
o where is the INAIL office and how can I get there
o how can I get to the pharmacy De Gerloni of Trento

7 Motivations
o Providing a dataset of requests beyond factoid questions (e.g. verification, procedural)
o Investigating domain-dependent vs domain-independent annotation schemas (Qall-me project domain: cultural events in a town).

8 Challenges
o Context aware QA
  o What can I see tonight at cinema
  o Where is the nearest pharmacy
o Persistent vs dynamic information
o Multiple sources (database, newspaper, web)

9 Challenges related to events
o Context aware QA
  o What can I see tonight at cinema (in Trento)
  o Where is the nearest pharmacy (to piazza Duomo)
o Persistent vs dynamic information
o Multiple sources (database, newspaper, web)

10 Motivations
o Providing a dataset of requests beyond factoid questions (e.g. verification, procedural)
o Investigating domain-dependent vs domain-independent annotation schemas (Qall-me project domain: cultural events in a town).
o Experimenting with the impact of QA annotations (e.g. EAT) on spoken requests (speech vs QA).

11 QA annotation
may I know where the ice stadium of Trento is located and at what time it opens
Expected Answer Type: LOCATION, DATE

12 Motivations
o Providing a dataset of requests beyond factoid questions (e.g. verification, procedural)
o Investigating domain-dependent vs domain-independent annotation schemas (Qall-me project domain: cultural events in a town).
o Experimenting with the impact of QA annotations (e.g. EAT) on spoken requests (speech vs QA).
o Investigating the portability of semantic annotation across languages.

13 Portability of annotations
Expected Answer Type: LOCATION
o English: may I know where the ice stadium of Trento is located
o Italian: potrei sapere dov’è lo stadio del ghiaccio di Trento
o Spanish: puedo saber donde esta el estadio de hielo de Trento
o German: ich möchte wissen wo das Eisstadium von Trento ist

14 Data collection
o 14645 questions in four different languages: ITALIAN, ENGLISH, GERMAN, SPANISH
o Domain: cultural events in a town
Acquisition: every speaker produces 30 questions, based on 15 scenarios:
o Using a graphical interface, for each scenario the speaker first produces a spontaneous request and then reads a predefined written one
o Questions were acquired over the telephone.

15 Data collection
                                   # words   # utterances   avg. len (words)
ITALIAN   read utterances            25715           2290               11.2
          spontaneous utterances     33492           2374               14.1
          total utterances           59207           4664               12.7
SPANISH   read utterances            25919           2250              11.52
          spontaneous utterances     26327           2250              11.70
          total utterances           52246           4500              11.61
ENGLISH   read utterances            26626           2215                 12
          spontaneous utterances     36000           2286               15.8
          total utterances           62626           4501               13.9
GERMAN    read utterances            10990            903              12.17
          spontaneous utterances       985             77              12.79
          total utterances           11975            980              12.22

16 Data acquisition features
      # speakers   males   females   non-native   tot. speech duration   avg. utt. dur
IT           161      68        93           12                  9h20'              7"
SP           150     109        41                 816h4' [sic]                 5.14"
EN           113      46        63           21                  7h35'            6.1"
GER            9       4         5            2                  1h21'            4.9"

17 Transcription
All the audio files acquired from a speaker were joined together and orthographically transcribed using the tool Transcriber (http://trans.sourceforge.net).
Because the scenarios are domain-restricted, they sometimes led to the same utterance (matching word sequence); however, the number of repetitions is small.

18 Translation into English
Translations were made by simulating the real situation of an English speaker visiting a foreign city, e.g.:
o what is the address of museo dell'aeronautica Gianni Caproni
Future work: all data collected for one language will be translated into the other three languages

19 Annotation of speech acts
o As a starting point for further analyses, it is important to separate, within an utterance (each speaker’s turn), what has to be interpreted as the actual request from what does not need an answer.
Example from the QALL-ME benchmark: hallo I am in Trento and I would like to visit a church in the centre of the town I would like to know the name and the location of one of these churches thanks

20 Annotation of speech acts
o As a starting point for further analyses, it is important to separate, within an utterance (each speaker’s turn), what has to be interpreted as the actual request from what does not need an answer.
Example from the QALL-ME benchmark: hallo I am in Trento and I would like to visit a church in the centre of the town I would like to know the name and the location of one of these churches thanks
to greet: hallo

21 Annotation of speech acts
o As a starting point for further analyses, it is important to separate, within an utterance (each speaker’s turn), what has to be interpreted as the actual request from what does not need an answer.
Example from the QALL-ME benchmark: hallo I am in Trento and I would like to visit a church in the centre of the town I would like to know the name and the location of one of these churches thanks
to contextualise: I am in Trento and I would like to visit a church in the centre of the town

22 Annotation of speech acts
o As a starting point for further analyses, it is important to separate, within an utterance (each speaker’s turn), what has to be interpreted as the actual request from what does not need an answer.
Example from the QALL-ME benchmark: hallo I am in Trento and I would like to visit a church in the centre of the town I would like to know the name and the location of one of these churches thanks
to ask: I would like to know the name and the location of one of these churches

23 Annotation of speech acts
o As a starting point for further analyses, it is important to separate, within an utterance (each speaker’s turn), what has to be interpreted as the actual request from what does not need an answer.
Example from the QALL-ME benchmark: hallo I am in Trento and I would like to visit a church in the centre of the town I would like to know the name and the location of one of these churches thanks
to thank: thanks

24 Annotation of speech acts
UTTERANCE
REQUESTS
o DIRECT: wh-questions; requests introduced by "Could you tell me…", "May I know…"; requests pronounced with ascendant intonation
o INDIRECT: requests formulated in indirect or implicit ways
NON REQUESTS: all the utterances used by the speaker to introduce himself, to contextualize himself or his request in time and space, to thank, to greet.
o ASSERT
o GREETINGS
o THANKS
o OTHER
For our purposes, we used CLaRK, an XML Based System for Corpora Development (http://www.bultreebank.org/clark/index.html).
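
The label set above can be sketched as a small data structure. This is only an illustration, not the CLaRK XML format used in the project: the `Segment` class and the `requests_only` helper are hypothetical names, and the labels assigned to the example segments (GREETINGS, ASSERT, INDIRECT, THANKS) are an assumed mapping of the "to greet / to contextualise / to ask / to thank" glosses shown on the earlier slides.

```python
from dataclasses import dataclass

# Label inventory taken from the slide.
REQUEST_LABELS = {"DIRECT", "INDIRECT"}
NON_REQUEST_LABELS = {"ASSERT", "GREETINGS", "THANKS", "OTHER"}


@dataclass
class Segment:
    """One speech-act segment of an utterance (hypothetical structure)."""
    text: str
    label: str

    def __post_init__(self):
        if self.label not in REQUEST_LABELS | NON_REQUEST_LABELS:
            raise ValueError(f"unknown speech-act label: {self.label}")


def requests_only(utterance):
    """Keep only the segments that actually need an answer."""
    return [seg.text for seg in utterance if seg.label in REQUEST_LABELS]


# Example utterance from the benchmark; the labels are an assumed mapping.
utterance = [
    Segment("hallo", "GREETINGS"),
    Segment("I am in Trento and I would like to visit a church "
            "in the centre of the town", "ASSERT"),
    Segment("I would like to know the name and the location "
            "of one of these churches", "INDIRECT"),
    Segment("thanks", "THANKS"),
]

print(requests_only(utterance))
```

Separating the request segments in this way is what lets later annotation layers (such as EAT) apply only to the part of the turn that needs an answer.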

25 Agreement (speech acts)
Inter-annotator agreement for ITALIAN, calculated on 1000 randomly picked sentences:
Dice coefficient = 2C/(A+B)
C = number of common annotations; A, B = number of annotations provided by the first and the second annotator
Overall agreement   96.1%
ASSERT              85.5%
DIRECT              97.88%
INDIRECT            97.33%
OTHER               76.47%
THANKS              98.51%
GREETINGS           99.49%
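
The Dice formula above can be worked through on a toy case. This is a minimal sketch under assumptions: the annotations are made-up (utterance id, segment index, label) tuples, not benchmark data, and treating two annotations as "common" when the tuples are identical is one possible reading of C.

```python
from collections import Counter


def dice_coefficient(annotations_a, annotations_b):
    """Dice = 2C / (A + B), with C the number of annotations common to both."""
    count_a = Counter(annotations_a)
    count_b = Counter(annotations_b)
    common = sum((count_a & count_b).values())              # C
    total = sum(count_a.values()) + sum(count_b.values())   # A + B
    return 2 * common / total if total else 1.0


# Hypothetical annotations: (utterance id, segment index, label).
annotator_1 = [(1, 0, "GREETINGS"), (1, 1, "ASSERT"),
               (1, 2, "INDIRECT"), (1, 3, "THANKS")]
annotator_2 = [(1, 0, "GREETINGS"), (1, 1, "OTHER"),
               (1, 2, "INDIRECT"), (1, 3, "THANKS")]

score = dice_coefficient(annotator_1, annotator_2)
print(f"{score:.2f}")  # 3 common out of 4 + 4 annotations: 2*3/8 = 0.75
```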

26 Expected Answer Type
For EAT annotation we propose the following scheme:
EAT: PROCEDURAL, VERIFICATION, FACTOID, DEFINITION/DESCRIPTION; DOMAIN-INDEPENDENT (SEKINE’S ENE HIERARCHY), DOMAIN-SPECIFIC (QALL-ME ONTOLOGY)
Extracted from Graesser’s (1988) taxonomy

27 Sekine’s ENE vs Qall-me ont.
what is the restaurant in via Brennero in Trento
[Figure: EAT of the request in Sekine’s ENE hierarchy and in the Qall-me ontology]

28 Sekine’s ENE vs Qall-me ont.
can you give me the name of the pharmacy in piazza Pasi 20 in Trento
[Figure: EAT of the request in Sekine’s ENE hierarchy and in the Qall-me ontology]

29 Annotation of Relations
o Relations among entities: convey and complete the context in which a specific request has to be interpreted
At what time is the movie il grande capo beginning tomorrow afternoon at Vittoria cinema
o Rel1 (MOVIE, DATE)
o Rel2 (MOVIE, STARTINGHOUR)
o Rel3 (MOVIE, CINEMA)
o 10% of the Italian questions (referring to Cinema/Movie domain) have been annotated with the 12 relations holding in such domain (Qall-me ontology).

30 Status of the benchmark
Present situation and tentative scheduling:
Columns: audio, transcr., translat., speech acts, EAT Sekine, EAT ontology
ITALIAN: X X X X X X
SPANISH: XXXX X in progress
ENGLISH: X X - - - in progress
GERMAN: in progress
The QALL-ME benchmark is being made incrementally available at the project website (http://qallme.fbk.eu)

31 Future work
Additional annotation layers will be considered:
o Focus of the question
o Multiwords
o Named Entities
o Normalized Temporal Expressions
o …

32 Conclusions
o QALL-ME benchmark: multilingual resource (for Italian, Spanish, English and German) of annotated spoken requests in the tourism domain.
o Beyond factoid
o Context aware QA and dynamic changes
o QA annotation on spoken requests
o Portability of semantic annotation
o Reference resource, useful to train and test ML based QA systems

33 Thank you
{cabrio, kouylekov, magnini, negri}@fbk.eu
{L.Hasler, c.orasan}@wlv.ac.uk
{tomas, vicedo}@disi.ua.es
{neumann, cowe01}@dfki.de
Project website: http://qallme.fbk.eu

34 Acquisition scenarios
Scenario fields: SubDomain, DesiredOutput, MandatoryItems, OptionalItems

35 Example from the corpus
spk075_27mar07comd_it_sid023 6
o buongiorno chiamo da Trento avrei bisogno dell'indirizzo del teatro Auditorium per un concerto di Salvatore Accardo del 17 gennaio 2007
o buongiorno chiamo da Trento ho [mmm] avrei bisogno dell'indirizzo del teatro Auditorium per un [eh] concerto di Salvatore Accardo del 17 gennaio 2007 [b]
o good morning I am calling from Trento I would like to know the address of Auditorium theatre for Salvatore Accardo's concert on 17th January 2007

36 Expected Answer Type (1)
The semantic category associated with the desired answer, chosen out of a predefined set of labels (e.g. PERSON, LOCATION, DATE).
o How many colors are in the Italian flag: QUANTITY
o Where is the Uffizi museum: LOCATION
Most QA systems described in the literature rely heavily on EAT information, at least in the Answer Extraction phase, to narrow the search space of potential answer candidates.

37 Example from the corpus
What are the address and the telephone number of Venezia hotel in Trento

38 Expected Answer Quantifier
Attribute of the EAT that specifies the number of expected items in the answer.
o I would like to know the three colors of the Italian flag
o which movies are on tonight at Multisala Modena: all
The possible values are: one, at least one, all, n.
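
One way to picture the quantifier attribute is as a check on how many items an answer returns. The sketch below is an assumption-laden illustration: the `ExpectedAnswer` class, its `accepts` method, and the EAT labels COLOR and MOVIE are hypothetical, and reading "three colors" as the value n with n = 3 is an interpretation, since the slide does not label that example.

```python
from dataclasses import dataclass
from typing import Optional

# Quantifier values listed on the slide.
QUANTIFIERS = {"one", "at least one", "all", "n"}


@dataclass(frozen=True)
class ExpectedAnswer:
    """EAT plus its quantifier attribute (hypothetical structure)."""
    eat: str
    quantifier: str
    n: Optional[int] = None  # only meaningful when quantifier == "n"

    def __post_init__(self):
        if self.quantifier not in QUANTIFIERS:
            raise ValueError(f"unknown quantifier: {self.quantifier}")
        if self.quantifier == "n" and self.n is None:
            raise ValueError("quantifier 'n' needs an explicit count")

    def accepts(self, returned: int, available: int) -> bool:
        """Does an answer with `returned` items (out of `available`) fit?"""
        if self.quantifier == "one":
            return returned == 1
        if self.quantifier == "at least one":
            return returned >= 1
        if self.quantifier == "all":
            return returned == available
        return returned == self.n  # quantifier == "n"


# "the three colors of the Italian flag": assumed to map to n = 3.
colors = ExpectedAnswer(eat="COLOR", quantifier="n", n=3)
# "which movies are on tonight at Multisala Modena": labelled "all" on the slide.
movies = ExpectedAnswer(eat="MOVIE", quantifier="all")

print(colors.accepts(returned=3, available=3))
print(movies.accepts(returned=4, available=5))
```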

