Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik.

1 Deep Grammars in Hybrid Machine Translation University of Bergen Helge Dyvik

2 Lexicon, Lexical Semantics, Grammar, and Translation for Norwegian. A 4-year project (2002-2006) involving groups at the University of Oslo, the University of Bergen, and NTNU (the University of Trondheim). Cooperation with PARC (John Maxwell) and others.

3 The LOGON system Schematic architecture

4 XLE: Xerox Linguistic Environment. A platform developed over more than 20 years at Xerox PARC (now PARC). Developer: John Maxwell. Components: LFG grammar development; parsing; generation; transfer; stochastic parse selection; interaction with shallow methods.

5 An LFG analysis: Det regnet 'It rained'

6 ParGram: The Parallel Grammar Project. A long-term project (1993-). Develops parallel grammars on XLE: English, French, German, Norwegian, Japanese, Urdu, Welsh, Malagasy, Arabic, Hungarian, Chinese, Vietnamese. 'Parallel grammars' means parallel f-structures: a common inventory of features and common principles of analysis.

7 LOGON Analysis Modules. [Diagram; labels: Input string; Tokenization; Named entities; Compounds; Morphology; Norsk ordbank lexicon; String of stems and tags; XLE Parser; NorGram; LFG lexicons (NKL-derived, hand coded); Lexical templates; Syntactic rules; Rule templates; c-structures; f-structures; MRSs; Output-input; Supporting knowledge base]

8 Scope of NorGram. Lexicon: about 80 000 lemmas. In addition: automatically analyzed compounds, automatically recognized proper names, and "guessed" nouns. Syntax: 229 complex rules, giving rise to about 48 000 arcs. Semantics: Minimal Recursion Semantics projections for all readings.

9 Coverage. Performance on an unseen corpus of newspaper text: 17 randomly selected pieces of coherent text, comprising 1000 sentences taken from 9 newspapers (Adresseavisen, Aftenposten, Aftenposten nett, Bergens Tidende, Dagbladet, Dagens Næringsliv, Dagsavisen, Fædrelandsvennen, Nordlys), from the editions of November 11th 2005.

10

11 The LOGON challenge: From a resource grammar based on independent linguistic principles, derive MRS structures harmonized with the MRS structures of the HPSG English Resource Grammar.

12 Semantics for translation: two issues. The representational subset problem. - Desirable: normalization to flat structures with unordered elements. Complete and detailed semantic analyses may be unnecessary. - Desirable: rich possibilities of underspecification.

13 Basics of Minimal Recursion Semantics. Developers: A. Copestake, D. Flickinger, R. Malouf, S. Riehemann, I. Sag. A framework for the representation of semantic information, developed in the context of HPSG and machine translation (Verbmobil). Sources of inspiration: - Quasi-Logical Form (H. Alshawi): underspecification, e.g. of quantifier scope - Shake-and-bake translation (P. Whitelock): a bag of words as interface structure

14 An MRS representation is a bag of semantic entities (some corresponding to words, some not), each with a handle, plus a bag of handle constraints allowing the underspecification of scope, plus a handle and an index. Each semantic entity is referred to as an Elementary Predication (EP). Relations among EPs are captured by means of shared variables. There are three elementary variable types: - handles (or 'labels') (h) - events (e) - referential indices (x)
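
The description on this slide can be made concrete with a small data-structure sketch. It is only an illustration, not the LOGON or XLE representation: the class names, the role names (ARG0, RSTR, BODY), the predicate names and the "qeq" constraint label are assumptions borrowed from common MRS practice, and the sentence is the ferry example used on the following slides.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EP:
    """Elementary Predication: a labelled predicate with role-value arguments."""
    label: str    # handle of this EP, e.g. "h4"
    pred: str     # predicate name (illustrative, not taken from any SEM-I)
    args: tuple   # (role, variable) pairs; role names are assumptions

@dataclass
class MRS:
    top: str                                    # top handle
    index: str                                  # index variable
    rels: list = field(default_factory=list)    # bag of EPs
    hcons: list = field(default_factory=list)   # handle constraints (hole, "qeq", label)

VARIABLE_TYPES = {"h": "handle", "e": "event", "x": "referential index"}

def variable_type(var):
    """Classify a variable by its prefix letter (h, e or x)."""
    return VARIABLE_TYPES[var[0]]

def shared_variables(mrs):
    """Return the variables used as arguments by more than one EP."""
    users = {}
    for position, ep in enumerate(mrs.rels):
        for _role, var in ep.args:
            users.setdefault(var, set()).add(position)
    return {var for var, eps in users.items() if len(eps) > 1}

# "Every ferry crosses some fjord", with scope left open.
ferry = MRS(
    top="h0", index="e1",
    rels=[
        EP("h1", "every_q", (("ARG0", "x1"), ("RSTR", "h2"), ("BODY", "h3"))),
        EP("h4", "ferry_n", (("ARG0", "x1"),)),
        EP("h5", "cross_v", (("ARG0", "e1"), ("ARG1", "x1"), ("ARG2", "x2"))),
        EP("h6", "some_q", (("ARG0", "x2"), ("RSTR", "h7"), ("BODY", "h8"))),
        EP("h9", "fjord_n", (("ARG0", "x2"),)),
    ],
    hcons=[("h0", "qeq", "h5"), ("h2", "qeq", "h4"), ("h7", "qeq", "h9")],
)

print(sorted(shared_variables(ferry)))  # ['x1', 'x2']
```

The two shared variables are exactly the links between the quantifiers, the nouns and the verb; the body holes h3 and h8 are deliberately left unconstrained.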

15 From standard logical form to MRS. «Every ferry crosses some fjord». Two readings. Replace operators with generalized quantifiers: every(variable, restriction, body), some(variable, restriction, body). The first reading (wide-scope every): [figure; parts labelled var, restriction, body]
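
The formulas themselves did not survive in this transcript. As a hedged reconstruction in standard notation (the predicate names ferry, fjord and cross are assumptions), the two readings and their generalized-quantifier counterparts can be written as:

```latex
% Reading 1: wide-scope "every"
\forall x\,\bigl(\mathit{ferry}(x) \rightarrow \exists y\,(\mathit{fjord}(y) \wedge \mathit{cross}(x,y))\bigr)
\quad\Longrightarrow\quad
\mathit{every}\bigl(x,\ \mathit{ferry}(x),\ \mathit{some}(y,\ \mathit{fjord}(y),\ \mathit{cross}(x,y))\bigr)

% Reading 2: wide-scope "some"
\exists y\,\bigl(\mathit{fjord}(y) \wedge \forall x\,(\mathit{ferry}(x) \rightarrow \mathit{cross}(x,y))\bigr)
\quad\Longrightarrow\quad
\mathit{some}\bigl(y,\ \mathit{fjord}(y),\ \mathit{every}(x,\ \mathit{ferry}(x),\ \mathit{cross}(x,y))\bigr)
```

In each generalized quantifier the three argument positions are the bound variable, the restriction and the body, in that order.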

16 Make the structure flat: give each EP a handle; replace embedded EPs by their handles; collect all EPs on the same level (understood as conjunction).
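
The three flattening steps can be sketched in a few lines. This is a minimal illustration assuming nested terms are written as Python tuples `(pred, arg, ...)`; the function name `flatten` and the handle numbering are hypothetical, not part of any LOGON tool.

```python
from itertools import count

def flatten(term, fresh=None, eps=None):
    """Flatten a nested (pred, *args) term.

    Returns (handle_of_term, eps) where eps is a flat list of
    (handle, pred, args) triples and embedded terms have been
    replaced by their handles.
    """
    if fresh is None:
        fresh = count(1)
    if eps is None:
        eps = []
    handle = f"h{next(fresh)}"            # step 1: give the EP a handle
    pred, *args = term
    flat_args = []
    for arg in args:
        if isinstance(arg, tuple):        # step 2: embedded EP -> its handle
            sub_handle, _ = flatten(arg, fresh, eps)
            flat_args.append(sub_handle)
        else:
            flat_args.append(arg)
    eps.append((handle, pred, tuple(flat_args)))  # step 3: collect on one level
    return handle, eps

# Wide-scope "every" reading of "Every ferry crosses some fjord".
reading = ("every", "x", ("ferry", "x"),
           ("some", "y", ("fjord", "y"), ("cross", "x", "y")))

top, eps = flatten(reading)
for ep in sorted(eps):
    print(ep)
```

The resulting list is read as a conjunction; the nesting of the original term is recoverable only through the handles that appear in argument positions.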

17 Underspecified scope by means of handle constraints. Make the structure flat: give each EP a handle; replace embedded EPs by their handles; collect all EPs on the same level (understood as conjunction). Wide scope: some. Wide scope: every.
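
To show what an underspecified structure stands for, the sketch below enumerates the fully scoped readings by brute force, permuting the quantifiers over a fixed verbal nucleus. This is a deliberate simplification and not a handle-constraint resolver: it assumes no quantifier occurs inside another quantifier's restriction, and the function and argument names are hypothetical.

```python
from itertools import permutations

def scoped_readings(quantifiers, restrictions, nucleus):
    """Build one scoped formula per ordering of the quantifiers.

    quantifiers:  list of (quantifier_name, variable) pairs
    restrictions: dict mapping each variable to its restriction formula
    nucleus:      the formula that ends up in the innermost body
    """
    readings = []
    for order in permutations(quantifiers):
        body = nucleus
        # Wrap from the innermost quantifier outwards.
        for name, var in reversed(order):
            body = f"{name}({var}, {restrictions[var]}, {body})"
        readings.append(body)
    return readings

readings = scoped_readings(
    quantifiers=[("every", "x"), ("some", "y")],
    restrictions={"x": "ferry(x)", "y": "fjord(y)"},
    nucleus="cross(x, y)",
)
for reading in readings:
    print(reading)
```

With two quantifiers this yields the wide-scope every and wide-scope some readings. A single underspecified MRS leaves both open, which is what allows translation without choosing between them.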

18 MRS as feature structure (also adding event variables): Norwegian translation: «Hver ferge krysser en fjord»

19 Projecting MRS representations from f-structures «Katten sover» 'The cat sleeps'

20 Projecting MRS representations from f-structures «Katten sover» 'The cat sleeps'

21

22 mrs::

23

24 Composition: top-level MRS with unions of HCONS and RELS:

25

26 Post-processing this structure brings us back to the LOGON MRS format: http://decentius.aksis.uib.no/logon/xle-mrs.xml

27 Examples

28 bil 'car' (as in "Han kjøpte bil" 'He bought [a] car') No SPEC

29 disse hans mange spørsmål 'these his many questions' Multiple SPECs

30 Han jaget barnet ut nakent 'He chased the child out naked'

31 The Transfer Component Developer of the formalism: Stephan Oepen

32 Example of transfer. Source sentence: Henter han bilen sin? Gloss: fetches he car.DEF POSS.REFL.SG.MASC. 'Does he fetch his car?' Alternative reading: 'Does he fetch the one of the car?'

33 Parse output:

34 Choosing the first reading of Henter han bilen sin?

35 The variables have features. Interrogative is coded as [SF ques] on the event variable.

36 Two of four transfer outputs

37 Norwegian transfer input; one of four English transfer outputs

38 Generator output from the chosen transfer output

39 Transfer formalism (Stephan Oepen). The form of a transfer rule: C = context, I = input, F = filter, O = output.

40 Simple example: lexical transfer rule, transferring bekk into creek. No context, no filter; only the predicate is replaced.

41 Example with a context restriction: gå en tur (lit. 'go a trip') is transferred into the light-verb construction take a trip. In the context of _tur_n as its second argument, _gå_v is transferred to _take_v.
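
The two rules just described can be mimicked with a toy rewriter over flat lists of EPs. This is a sketch under stated assumptions, not Oepen's formalism: the `Rule` fields, the positional reading of arguments (ARG0, ARG1, ARG2) and the predicate names `_bekk_n` and `_creek_n` are hypothetical, and real transfer rules match whole MRS fragments rather than single predicates.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class EP:
    pred: str
    args: tuple   # variables in positional order: (ARG0, ARG1, ARG2, ...)

@dataclass(frozen=True)
class Rule:
    input_pred: str                       # I: predicate to be rewritten
    output_pred: str                      # O: predicate that replaces it
    context_pred: Optional[str] = None    # C: must hold of the input's ARG2
    filter_pred: Optional[str] = None     # F: blocks the rule if present

def context_ok(rule, ep, eps):
    """Check that some EP with the context predicate is about ep's ARG2."""
    if rule.context_pred is None:
        return True
    if len(ep.args) < 3:
        return False
    second_argument = ep.args[2]
    return any(other.pred == rule.context_pred
               and other.args[:1] == (second_argument,)
               for other in eps)

def apply_rule(rule, eps):
    """Return a new EP list with the rule applied wherever it matches."""
    if rule.filter_pred is not None and any(ep.pred == rule.filter_pred for ep in eps):
        return list(eps)
    return [replace(ep, pred=rule.output_pred)
            if ep.pred == rule.input_pred and context_ok(rule, ep, eps)
            else ep
            for ep in eps]

lexical = Rule("_bekk_n", "_creek_n")                      # no context, no filter
light_verb = Rule("_gå_v", "_take_v", context_pred="_tur_n")

walk = [EP("_gå_v", ("e1", "x1", "x2")), EP("_tur_n", ("x2",))]
print(apply_rule(light_verb, walk)[0].pred)   # _take_v
```

The context EP is only inspected, never rewritten, so `_tur_n` still needs a rule of its own to reach English.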

42 The SEM-I (Semantic Interface). A documentation of the external semantic interface of a grammar, crucial for the writer of transfer rules. To enforce maintenance of the SEM-I, LOGON parsing returns fail if every parse contains at least one predicate not in the SEM-I.
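
The failure condition stated on this slide is easy to misread, so here it is as a small check. The function names and all predicate names are hypothetical; each parse is reduced to the list of predicates in its MRS.

```python
def parsing_fails(parses, semi):
    """True if every parse has at least one predicate missing from the SEM-I."""
    return all(any(pred not in semi for pred in parse) for parse in parses)

def unknown_predicates(parses, semi):
    """Predicates that would have to be documented in the SEM-I."""
    return sorted({pred for parse in parses for pred in parse if pred not in semi})

semi = {"_hente_v", "_bil_n", "_gå_v", "_tur_n"}

# One parse is fully covered, so parsing succeeds despite the other.
mixed = [["_hente_v", "_bil_n"], ["_hente_v", "_undocumented_n"]]
print(parsing_fails(mixed, semi))               # False

# No parse is fully covered: the condition on the slide is met.
uncovered = [["_hente_v", "_undocumented_n"]]
print(parsing_fails(uncovered, semi))           # True
print(unknown_predicates(uncovered, semi))      # ['_undocumented_n']
```

A single fully documented parse is enough for success; whether the remaining parses are then discarded is not stated on the slide and is left open here.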

43 A small section of the verb part of the NorGram SEM-I. Size of the Norwegian SEM-I: slightly fewer than 6000 entries.

44 Parse Selection. Parsing, transfer and generation may each give many solutions, leading to a fanout tree. The outputs at each of the three stages are statistically ranked.
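
One way to picture the fanout tree and its ranking is a beam over three stages. The sketch below is an assumption-laden toy: the slide says only that each stage's outputs are ranked, so the pruning to a fixed beam, the summing of scores across stages, and the three `toy_*` stages with their scores are all invented for illustration.

```python
import heapq

def fanout(source, stages, beam):
    """Run source through successive stages, keeping the `beam` best paths.

    Each stage maps one item to a list of (output, score) pairs; scores
    are treated as log-probabilities and summed along a path.
    """
    frontier = [(0.0, source)]
    for stage in stages:
        expanded = [
            (score + step_score, output)
            for score, item in frontier
            for output, step_score in stage(item)
        ]
        frontier = heapq.nlargest(beam, expanded, key=lambda pair: pair[0])
    return frontier

def toy_parse(sentence):
    return [(sentence + " > parse1", -0.1), (sentence + " > parse2", -1.2)]

def toy_transfer(mrs):
    return [(mrs + " > transfer1", -0.3), (mrs + " > transfer2", -0.4)]

def toy_generate(mrs):
    return [(mrs + " > string1", -0.2), (mrs + " > string2", -2.0)]

ranked = fanout("src", [toy_parse, toy_transfer, toy_generate], beam=2)
for score, path in ranked:
    print(path)
```

Without pruning, two alternatives per stage already give eight complete paths; the beam keeps the tree from growing with every stage.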

45 The Parsebanker. Efficient treebank building by discriminants. Developer: Paul Meurer, Bergen. Predecessors in discriminant analysis: David Carter (1997); Stephan Oepen, Dan Flickinger & al. (2003). Example of a four-way ambiguity: Det regnet 'It rained' / 'It calculated' / 'That one calculated' / 'That rain'.

46 1 2

47 3 4

48 Packed representations and discriminants (Paul Meurer)

49

50 Clicking on one discriminant is in this case sufficient to select a unique solution:
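
The idea behind discriminant-based disambiguation can be sketched with sets. This is not the Parsebanker implementation: the property labels attached to the four readings of Det regnet are hypothetical, and real discriminants are computed from packed c- and f-structures rather than from hand-written sets.

```python
def discriminants(analyses):
    """Properties that hold for some analyses but not for all of them."""
    property_sets = list(analyses.values())
    every_property = set().union(*property_sets)
    shared_by_all = set.intersection(*property_sets)
    return every_property - shared_by_all

def select(analyses, discriminant, accept=True):
    """Keep the analyses that have (or, with accept=False, lack) the discriminant."""
    return {name: props for name, props in analyses.items()
            if (discriminant in props) == accept}

# The four readings of "Det regnet", with invented property labels.
analyses = {
    "It rained":           {"regnet:V", "det:expletive", "pred:rain"},
    "It calculated":       {"regnet:V", "det:pronoun", "pred:calculate"},
    "That one calculated": {"regnet:V", "det:demonstrative", "pred:calculate"},
    "That rain":           {"regnet:N", "det:determiner"},
}

print(sorted(discriminants(analyses)))
print(list(select(analyses, "pred:rain")))   # ['It rained']
```

Accepting "pred:rain" leaves a single analysis, which matches the one-click case on this slide; a less specific choice such as "regnet:V" would leave three analyses and require further decisions.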

51 The Parsebanker

52

53

54 'After all, a human being must be something more than a machine?'

55 TigerSearch. The implementation is under development by Paul Meurer. Find selected prepositional phrases with sentential objects:

56 Find selected prepositional phrases with the preposition 'om' and nominal objects:

57 Find topicalized objects:

