1 COLLATE Deriving FrameNet Representations: Towards Meaning-Oriented Question Answering
Gerhard Fliedner, DFKI GmbH and Computational Linguistics, Saarland University
NLDB 2004, 23 June 2004

2 Overview
- Introduction
- Question Answering Using FrameNet
- Deriving FrameNet Structures from Texts
- Implementation Issues
- Conclusions

3 Introduction
- We present a system for automatically annotating German texts with lexical semantic structures, namely FrameNet.
- This module is eventually to form the core of a Question Answering system that directly matches FrameNet representations of both the document collection and the user's questions.
- This work is pursued within the Collate project (Computational Linguistics and Language Technology for Real Life Applications) at DFKI GmbH, partly jointly with the Computational Linguistics Department of Saarland University, Saarbrücken.
- Joint work with Christian Braun (Saarland University).

4 Meaning-Oriented QA: System Architecture

5 Overview
- Introduction
- Question Answering Using FrameNet
- Deriving FrameNet Structures from Texts
- Implementation Issues
- Conclusions

6 Using Semantics in Question Answering
- Most QA systems use IR techniques based on surface words for document/passage retrieval.
- Often-used extensions:
  - Stemming
  - Query expansion using semantically related words (mostly using WordNet)
  - Deeper linguistic processing of retrieved passages (using logic forms or similar)
- However, semantic relations between words are rarely taken into account.

7 Different textual realisations
- We want to reliably capture systematic semantic relations such as
  - Synonymy: buy vs. purchase
  - Converse/inverse relations: buy vs. sell
  - (Some) hyponymy/hypernymy: order/request
- Realisations as e.g. verbs or nouns should receive the same representation:
  A sold B to C vs. the sale of B to C by A
- We also want to factor out different surface realisations of argument PPs (especially with 'picture nouns'):
  compare A with B vs. compare A to B

8 FrameNet
- As a framework for a 'flat' semantic representation, we have chosen FrameNet.
- FrameNet is a database that documents the semantic and syntactic valence of words, using a concept derived from the idea of thematic roles (Fillmore, 1968).
- Related words are grouped into a hierarchical structure of frames according to word fields.
- Instead of universal thematic roles, each frame has a set of specific roles (frame elements).
- For example: Commerce (buy, sell, sale) defines the frame elements Buyer and Seller.

9 FrameNet: Resources
- English FrameNet: ICSI, University of California, Berkeley; Charles Fillmore et al.; overall running time: 5 years.
- German FrameNet: SALSA (The Saarbrücken Lexical Semantics Annotation and Analysis Project); Leibniz programme of the German Research Foundation (DFG); Saarland University; Manfred Pinkal et al.

10 FrameNet: Example

11 Example: Commerce_buy
Commerce_buy (buy.v, purchase.v, purchase.n)
Core elements:
- Buyer
- Seller
- Goods
- Money
- Rate (e.g. "five dollars an hour")
- Unit (e.g. "by the pound")
Non-core elements:
- Means (e.g. "buy with cash")
- Place
- Purpose
- Reason
- Time
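
A frame description of this kind can be written down as a small data structure. The following is a minimal sketch in Python; the class and variable names (Frame, COMMERCE_BUY) are invented for illustration and simply encode the element lists shown on this slide.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Minimal frame description: lexical units plus named frame elements."""
    name: str
    lexical_units: list
    core_elements: list
    non_core_elements: list = field(default_factory=list)

# Hypothetical encoding of the Commerce_buy frame from the slide above.
COMMERCE_BUY = Frame(
    name="Commerce_buy",
    lexical_units=["buy.v", "purchase.v", "purchase.n"],
    core_elements=["Buyer", "Seller", "Goods", "Money", "Rate", "Unit"],
    non_core_elements=["Means", "Place", "Purpose", "Reason", "Time"],
)

if __name__ == "__main__":
    print(COMMERCE_BUY.name, COMMERCE_BUY.core_elements)
```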

12 Overview
- Introduction
- Question Answering Using FrameNet
- Deriving FrameNet Structures from Texts
- Implementation Issues
- Conclusions

13 Deriving FrameNet Structures from Texts
- Cascade of parsers
- Parsers use hand-crafted grammars.
- Easy-first parsing (Abney): every parser recognises one linguistically motivated 'layer'.
- Ambiguities are in general left unresolved, so that later processing steps may resolve them.
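
As a rough illustration of the cascade idea (not the actual implementation, which uses hand-crafted grammars at each layer), the following Python sketch chains placeholder stages and keeps ambiguity as sets of candidate analyses; all function names and data layouts are invented.

```python
# Minimal sketch of an easy-first cascade: each stage adds one annotation
# layer and may return several readings instead of deciding early.

def tokenize(text):
    return text.split()

def morphology(tokens):
    # Each token maps to a *set* of candidate analyses (ambiguity is kept).
    return [{"surface": t, "analyses": {t.lower()}} for t in tokens]

def topological_parse(morph):
    return {"fields": None, "tokens": morph}   # placeholder layer

def chunk_np_pp(topo):
    return {"chunks": [], **topo}              # placeholder layer

PIPELINE = [tokenize, morphology, topological_parse, chunk_np_pp]

def analyse(text):
    result = text
    for stage in PIPELINE:
        result = stage(result)   # later stages may resolve earlier ambiguities
    return result

print(analyse("Lockheed erhielt den Auftrag"))
```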

14 Morphology
- Tokenisation (rule-based, using abbreviation recognition)
- Morphological analysis using GERTWOL, the German Two-Level Morphology by Lingsoft Oy, Helsinki
- Full German morphology (inflection, derivation, composition)
- Broad-coverage lexicon (~350,000 stems)

15 Topological Parser (Braun 99, 03)
- Recognises German sentence structure based on sentence topology.
- German sentences have a relatively rigid structure (Vorfeld, left sentence bracket, Mittelfeld, right sentence bracket, Nachfeld).
- Helps to recognise the following:
  - Subordinate clauses
  - Split verbs
  - Verb clusters
- The parser uses a context-free grammar.
- Evaluation: 87% precision and recall (perfect match).

16 German Sentence Topology
- Stellungsfeldertheorie ([Drach, 1937], [Engel, 1970])
- A German sentence can be divided into fields:
  Main clause:        [VF Das Unternehmen] [LK hat] [MF 1999 gute Gewinne] [RK gemacht] [NF weil es expandiert hat]
  Subordinate clause: [LK weil] [MF es] [RK expandiert hat]
  Word-by-word gloss: "The company has 1999 good profits made, because it expanded has."
- VF: front field (Vorfeld), MF: midfield (Mittelfeld), NF: back field (Nachfeld), LK: left bracket, RK: right bracket
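
To make the field structure concrete, here is a toy Python sketch that splits a main clause into topological fields once the positions of the left and right bracket elements are known. It is illustrative only: the real topological parser finds these positions with a grammar rather than fixed indices.

```python
# Given the position of the finite verb (left bracket) and of the clause-final
# verb cluster (right bracket), split a German main clause into fields.

def split_fields(tokens, lk_index, rk_index):
    return {
        "VF": tokens[:lk_index],               # front field
        "LK": tokens[lk_index:lk_index + 1],   # left bracket (finite verb)
        "MF": tokens[lk_index + 1:rk_index],   # midfield
        "RK": tokens[rk_index:],               # right bracket (verb cluster)
    }

tokens = "Das Unternehmen hat 1999 gute Gewinne gemacht".split()
print(split_fields(tokens, lk_index=2, rk_index=6))
# {'VF': ['Das', 'Unternehmen'], 'LK': ['hat'],
#  'MF': ['1999', 'gute', 'Gewinne'], 'RK': ['gemacht']}
```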

17 NE Recognition
- Finite-state-based rule set
- Developed in the Collate IE subproject (multilingual NE recognition)
- Covers company names, currency expressions, date expressions, number expressions, person names
- Gazetteer with several thousand company names
- Evaluation: precision 96%, recall 82% (average over different text types)
- Complementing the rules with more sophisticated techniques is under investigation (e.g. "learn-filter-apply-forget", Volk/Clematide 01).
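
The finite-state rules themselves are not shown in the talk; the following is a deliberately tiny regular-expression sketch in Python for two of the listed NE classes (dates and currency amounts), just to illustrate the flavour of such rules. Patterns and labels are invented and far simpler than the real rule set and gazetteer.

```python
import re

# Toy finite-state-style patterns for two NE classes (abbreviated month list).
PATTERNS = {
    "DATE": re.compile(
        r"\b\d{1,2}\.\s?(Januar|Februar|März|April|Mai|Juni|Juli|August"
        r"|September|Oktober|November|Dezember)\s\d{4}\b"),
    "CURRENCY": re.compile(r"\b\d+(?:[.,]\d+)?\s?(Euro|Dollar|DM)\b"),
}

def tag_entities(text):
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits

print(tag_entities("Der Auftrag über 200 Dollar wurde am 23. Juni 2004 erteilt."))
# [('DATE', '23. Juni 2004'), ('CURRENCY', '200 Dollar')]
```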

18 NP/PP Chunking
- NP/PP chunking based on an extended finite-state grammar
- The extension allows complex, self-embedded NPs/PPs (e.g. adjective phrases with pre-nominal complements/modifiers).
- The chunker includes results from the NE recogniser (as N' or NP), allowing complex NPs, e.g. with coordination.
- Evaluation (NEGRA corpus, Brants et al. 99, as gold standard): recall 92%, precision 71% (the lower precision is due to different handling of post-nominal attachment).
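
A finite-state chunker can be approximated by regular expressions over POS-tag sequences. The sketch below does this for simple PPs in Python; the tag names loosely follow STTS, and the single rule is invented and much smaller than the extended grammar used in the system.

```python
import re

# Chunk rule over a POS-tag string: a PP is a preposition followed by a
# (very simplified) NP pattern.
NP = r"(ART )?(ADJA )*(NN |NE )+"
PP = r"APPR " + NP

def chunk_pps(tagged):
    """tagged: list of (word, tag); returns the PP spans found."""
    tag_string = "".join(tag + " " for _, tag in tagged)
    spans = []
    for match in re.finditer(PP, tag_string):
        start = tag_string[:match.start()].count(" ")
        end = start + match.group(0).count(" ")
        spans.append([word for word, _ in tagged[start:end]])
    return spans

sentence = [("aus", "APPR"), ("Großbritannien", "NE"),
            ("der", "ART"), ("Auftrag", "NN")]
print(chunk_pps(sentence))   # [['aus', 'Großbritannien']]
```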

19 PReDS
- Syntacto-semantic dependency structure (Partially Resolved Dependency Structure)
- Abstracts away from certain surface differences (active/passive), retains others (prepositions in PPs).
- Underspecified in case of ambiguities
- Derived using a context-free grammar
- Brings together the results from all previous steps
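
To illustrate the kind of abstraction PReDS performs, here is a small Python sketch that maps a passive clause onto the same deep roles as its active counterpart while keeping PP arguments with their preposition. The node layout and slot names are invented; the actual PReDS representation is richer.

```python
# Sketch only: promote the agent PP of a passive clause to deep subject and
# demote the surface subject to deep object.

def normalise_passive(node):
    if node.get("voice") != "passive":
        return node
    deep = dict(node, voice="active")
    deep["deep_subject"] = node.get("pp_von")          # agent phrase
    deep["deep_object"] = node.get("surface_subject")
    return deep

passive = {
    "head": "erteilen", "voice": "passive",
    "surface_subject": "der Auftrag", "pp_von": "Großbritannien",
}
print(normalise_passive(passive)["deep_subject"])   # Großbritannien
```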

20 Deriving FrameNet structures
- Based on PReDS
- Subtree matching using weighted rules
- Based on FrameNet valency information
- Coverage for German is still small, but it grows with increasing FrameNet coverage.
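
The following Python sketch shows what a weighted subtree-matching rule might look like: a rule keyed on the head lemma maps PReDS-style dependents to frame elements. The rule format, weights, and slot names are invented for illustration; the real rules are derived from FrameNet valency descriptions.

```python
# One illustrative rule: 'erhalten' (receive) with deep subject/object maps
# to the Getting frame.
RULES = [
    {
        "weight": 0.9,
        "head_lemma": "erhalten",
        "frame": "Getting",
        "element_map": {"deep_subject": "Recipient",
                        "deep_object": "Theme",
                        "pp_von": "Donor"},
    },
]

def apply_rules(preds_node):
    candidates = []
    for rule in RULES:
        if preds_node.get("head") != rule["head_lemma"]:
            continue
        elements = {fe: preds_node[dep]
                    for dep, fe in rule["element_map"].items()
                    if dep in preds_node}
        candidates.append((rule["weight"], rule["frame"], elements))
    return max(candidates, key=lambda c: c[0], default=None)

node = {"head": "erhalten", "deep_subject": "Lockheed",
        "deep_object": "den Auftrag", "pp_von": "Großbritannien"}
print(apply_rules(node))
# (0.9, 'Getting', {'Recipient': 'Lockheed', 'Theme': 'den Auftrag',
#                   'Donor': 'Großbritannien'})
```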

21 Putting it all together
Gloss of the German example sentence: "Lockheed has from Great Britain the order for 25 transport planes received." (i.e. Lockheed has received the order for 25 transport planes from Great Britain.)

22 Overview
- Introduction
- Question Answering Using FrameNet
- Deriving FrameNet Structures from Texts
- Implementation Issues
- Conclusions

23 Implementation Issues of the QA System
- Frame merging
- Question type recognition / question typology
- Efficient storing of FrameNet structures (database)
- 'Ontology-enabled' matching
- Matching interlinked frames ('database join')
- Inferencing

24 Question/Matching
Document: Lockheed has received an order for 25 transport planes from Great Britain.
Question: From whom has Lockheed received an order?
Document frame: Getting
  Target: Receive
  Donor: Great Britain
  Recipient: Lockheed
  Theme:
Question frame: Getting
  Target: Receive
  Donor: ? [Person_or_Organisation]
  Recipient: Lockheed
  Theme: (Request)
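
The matching step can be pictured as comparing the question frame, which has a wildcard slot for the wh-element, against document frames. The Python sketch below does exactly that on plain dictionaries; it is a simplification under assumed slot names, not the system's actual matcher.

```python
# A document frame answers the question if all filled slots agree; the
# wildcard slot yields the answer.
WILDCARD = "?"

def answers(document_frame, question_frame):
    """Return the filler of the wildcard slot, or None if frames don't match."""
    if document_frame["frame"] != question_frame["frame"]:
        return None
    answer = None
    for slot, value in question_frame["elements"].items():
        doc_value = document_frame["elements"].get(slot)
        if value == WILDCARD:
            answer = doc_value
        elif value is not None and value != doc_value:
            return None
    return answer

doc = {"frame": "Getting", "elements": {"Target": "Receive",
       "Donor": "Great Britain", "Recipient": "Lockheed"}}
question = {"frame": "Getting", "elements": {"Target": "Receive",
            "Donor": WILDCARD, "Recipient": "Lockheed"}}
print(answers(doc, question))   # Great Britain
```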

27 Overview
- Introduction
- Question Answering Using FrameNet
- Deriving FrameNet Structures from Texts
- Implementation Issues
- Conclusions

28 Conclusions
- We have presented a system for deriving FrameNet structures from German texts.
- The coverage needs to be extended along with the growing German FrameNet.
- An evaluation is in progress, based on matching of grammatical relations (Carroll, Minnen, Briscoe 03).
- The QA system is still in its design phase; some of the open issues have been shown.

29 Questions

30 Backups

31 FrameNet Representation: Current Thoughts
- Basis: frame instances
- Frame elements are references (links) to frame instances.
- A FrameNet representation thus forms a network of linked frame instances.
- This is comparable to the A-Box in Knowledge Representation.

32 Example
Lockheed has received an order for 25 transport planes from Great Britain.
Getting
  Target: Receive
  Donor: Great Britain
  Recipient: Lockheed
  Theme:
Request
  Target: Order
  Message: 25 transport planes
  Speaker:
  Addressee:
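
The network-of-instances idea from the previous slide can be made concrete with a few lines of Python: every frame instance gets an identifier, and an element value may be such an identifier rather than a string. Identifiers and the printing helper are invented for illustration.

```python
# Two linked frame instances: the Theme of the Getting instance points to
# the Request instance, forming a small A-Box-like network.
instances = {
    "i1": {"frame": "Getting", "elements": {
        "Target": "Receive", "Donor": "Great Britain",
        "Recipient": "Lockheed", "Theme": "i2"}},      # link to i2
    "i2": {"frame": "Request", "elements": {
        "Target": "Order", "Message": "25 transport planes"}},
}

def resolve(instance_id, depth=0):
    """Pretty-print an instance, following links to other instances."""
    inst = instances[instance_id]
    print("  " * depth + inst["frame"])
    for element, value in inst["elements"].items():
        if value in instances:
            print("  " * depth + f"  {element}:")
            resolve(value, depth + 1)
        else:
            print("  " * depth + f"  {element}: {value}")

resolve("i1")
```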

33 Frame Merging
Lockheed has received an order for 25 transport planes from Great Britain.
Getting
  Target: Receive
  Donor: Great Britain
  Recipient: Lockheed
  Theme:
Request
  Target: Order
  Message: 25 transport planes
  Speaker:
  Addressee:

34 Frame Merging
- Comparable to template merging in IE systems.
- In IE, this is often done by sets of rules describing equality and inequality constraints over template slots.
- Hand-crafted rules? (Observation: 'give an order' / 'receive an order' are strong collocations.)
- Use machine learning?
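
What such a hand-crafted merging rule could look like is sketched below in Python: when the Theme of a Getting instance is a Request instance, the empty Speaker and Addressee slots of the Request are filled from the Getting frame. The rule and slot correspondence are assumptions made for illustration, not the rules of the actual system.

```python
# Toy merging rule in the spirit of IE template merging.
def merge(getting, request):
    # Equality constraint: the Theme of the Getting frame is this Request.
    if getting["elements"].get("Theme") is not request:
        return None
    merged = dict(request["elements"])
    # Assumed correspondence: the Donor placed the order, the Recipient got it.
    for target, source in (("Speaker", "Donor"), ("Addressee", "Recipient")):
        if merged.get(target) is None:
            merged[target] = getting["elements"].get(source)
    return {"frame": "Request", "elements": merged}

request = {"frame": "Request", "elements": {"Target": "Order",
           "Message": "25 transport planes",
           "Speaker": None, "Addressee": None}}
getting = {"frame": "Getting", "elements": {"Target": "Receive",
           "Donor": "Great Britain", "Recipient": "Lockheed",
           "Theme": request}}
merged = merge(getting, request)
print(merged["elements"]["Speaker"], "/", merged["elements"]["Addressee"])
# Great Britain / Lockheed
```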

35 Matching
- Storing and matching are in principle straightforward, but:
- Hyponyms/hypernyms should be matched, e.g. plane should match transport plane in 'Who has ordered planes from Lockheed?'.
- Similar to 'ontology-enabled' searching (Weikum et al.)
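
A minimal sketch of hyponym-aware slot matching in Python: the tiny hypernym table stands in for a proper lexical ontology such as WordNet or GermaNet, and the function names are invented.

```python
# 'plane' in the question should match 'transport plane' in the document.
HYPERNYMS = {
    "transport plane": "plane",
    "plane": "vehicle",
}

def is_a(term, concept):
    while term is not None:
        if term == concept:
            return True
        term = HYPERNYMS.get(term)
    return False

def slots_match(question_value, document_value):
    return question_value == document_value or is_a(document_value, question_value)

print(slots_match("plane", "transport plane"))   # True
print(slots_match("transport plane", "plane"))   # False (query is more specific)
```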

36 Missing Frames
- The FrameNet coverage is not yet complete; therefore 'missing' frames will have to be inserted.
- Different degrees of difficulty:
  - Named entities (such as 'Great Britain'): introduce a pseudo-frame without frame elements.
  - Nouns (such as 'transport planes'): introduce a frame, try to 'position' it using sortal information.
  - Verbs (such as 'pinch' for 'get'): introduce a frame, try to 'position' it using sortal information, assign underspecified frame elements.

37 Underspecified frames
Document: Lockheed has received an order for 25 transport planes from Great Britain.
Question: Who pinched the order from Great Britain?
Document frame: Getting
  Target: Receive
  Donor: Great Britain
  Recipient: Lockheed
  Theme:
Question frame: PseudoFrame_pinch
  Target: pinch
  DeepSubject: ? [Person_or_Organisation]
  PP_from: Great Britain
  DeepObject: (Request)

38 Sortal Information
Document: Lockheed has received an order for 25 transport planes from Great Britain.
Question: From whom has Lockheed received an order for the construction of transport planes?
Document frame: Request
  Target: Order
  Message: 25 transport planes
Question frames: Request
  Target: Order
  Message: (see Construction)
Construction
  Target: Construction
  Created_entity: 25 transport planes

39 Sortal Information
Same example as on the previous slide; the "?" marks the problematic match between the Construction frame required by the question and the plain NP "25 transport planes" in the document (see next slide).

40 Sortal Mismatch
- Case of sortal mismatch: 'Message' should contain an event, but '25 transport planes' is not an event.
- General solution: type coercion.
- Two solutions possible:
  - Introduce an empty, underspecified frame during indexing.
  - Enhance matching to handle these cases.
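
The first solution can be pictured as follows in Python: when a slot requires an event but the filler denotes an object, an empty, underspecified event frame is inserted and the object attached to it. Sort labels, table contents, and the Unknown_event frame name are assumptions for illustration only.

```python
# Toy type coercion at indexing time.
SLOT_SORTS = {("Request", "Message"): "event"}
FILLER_SORTS = {"25 transport planes": "object"}

def coerce(frame, slot, filler):
    required = SLOT_SORTS.get((frame, slot))
    actual = FILLER_SORTS.get(filler)
    if required == "event" and actual == "object":
        # Wrap the object filler in an underspecified event frame.
        return {"frame": "Unknown_event", "elements": {"Undergoer": filler}}
    return filler

print(coerce("Request", "Message", "25 transport planes"))
# {'frame': 'Unknown_event', 'elements': {'Undergoer': '25 transport planes'}}
```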

41 Matching Interlinked Frames
Document: Lockheed has received an order for 25 transport planes from Great Britain.
Question: Who has received an order for 25 transport planes?
Getting
  Target: Receive
  Donor: Great Britain
  Recipient: Lockheed
  Theme:
Request
  Target: Order
  Message: 25 transport planes
  Speaker:
  Addressee:

42 Database join
- In relational databases, such a query would be done using a join (very efficient).
- Can that be brought together with our other requirements?
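
To make the join analogy concrete, the Python sketch below stores Getting and Request instances in two small SQLite tables and answers the previous slide's question with a join on the Theme link. SQLite is used here only to illustrate the analogy; the table layout is invented and says nothing about the project's actual storage.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE getting (id INTEGER, recipient TEXT, donor TEXT, theme INTEGER);
    CREATE TABLE request (id INTEGER, target TEXT, message TEXT);
    INSERT INTO getting VALUES (1, 'Lockheed', 'Great Britain', 2);
    INSERT INTO request VALUES (2, 'Order', '25 transport planes');
""")

# 'Who has received an order for 25 transport planes?'
rows = db.execute("""
    SELECT g.recipient
    FROM getting AS g JOIN request AS r ON g.theme = r.id
    WHERE r.message = '25 transport planes'
""").fetchall()
print(rows)   # [('Lockheed',)]
```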

43 Inferencing
- Quite often, inferencing might help to find answers to 'hard' questions: List plane manufacturers.
  plane_manufacturer(x) ↔ company(x) ∧ ∃y. produce(x, y) ∧ plane(y)
  company(lockheed).
  ∃z. receive_from(lockheed, z, great_britain) ∧ order_to(z, w) ∧ ∃F. F(lockheed, v) ∧ plane(v).
  ⇒ plane_manufacturer(lockheed).
- See the QA engines by LCC (Harabagiu et al.)
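
A toy forward-chaining version of this inference in Python: a company that produces planes counts as a plane manufacturer. The fact base and predicate names are simplified assumptions (in particular, the produce fact is taken as already derived from the order), far from the logic forms used by the cited systems.

```python
# Facts as (predicate, arguments) tuples.
facts = {
    ("company", ("lockheed",)),
    ("plane", ("transport_plane",)),
    ("produce", ("lockheed", "transport_plane")),  # assumed derived from the order
}

def infer_plane_manufacturers(facts):
    """plane_manufacturer(x) <- company(x) & produce(x, y) & plane(y)."""
    derived = set()
    for predicate, args in facts:
        if predicate == "produce":
            x, y = args
            if ("company", (x,)) in facts and ("plane", (y,)) in facts:
                derived.add(("plane_manufacturer", (x,)))
    return derived

print(infer_plane_manufacturers(facts))
# {('plane_manufacturer', ('lockheed',))}
```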

44 Parser Evaluation: Grammatical Function Annotation
Example: Die im Direktvertrieb aktiven Gesellschaften schneiden 1994 gut ab.
("The companies active in direct sales perform well in 1994.")
Annotated grammatical relations:
  ncsubj(abschneiden, Gesellschaft, _)
  dobj(abschneiden, gut, _)
  ncmod(_, Gesellschaft, aktiv)
  iobj(in, aktiv, Direkt#vertrieb)
  ncmod(_, abschneiden, 1994)

