Machine Translation II: Direct MT, Transfer MT, Interlingual MT. Postgraduate Diploma in Translation, Jan 2005.


1 Jan 2005, Machine Translation II. Postgraduate Diploma in Translation. Machine Translation II: Direct MT, Transfer MT, Interlingual MT

2 Today’s Lecture
Part I
- Historical Background
- Different Translation Models: Direct MT, Transfer Based MT, Interlingual MT
Part II
- Example Based MT
- Statistical MT

3 History – Pre ALPAC
- 1952: First MT Conference (MIT)
- 1954: Georgetown System (word-for-word based) successfully translated 49 Russian sentences
- 1954–1965: Much investment in a brute-force empirical approach: crude word-for-word techniques with limited reshuffling of output
- The ALPAC (Automatic Language Processing Advisory Committee) Report concludes that research funds should be directed into more fundamental linguistic research

4 History – Post ALPAC
1965–1970
- Operational systems approach: SYSTRAN (eventually became the basis for Babelfish)
- University centres established in Grenoble (CETA), Montreal and Saarbruecken
Systems developed on the basis of linguistic and non-linguistic representations
1970–1990
- Ariane (Dependency Grammar)
- TAUM METEO (Metamorphosis Grammars)
- EUROTRA (multilingual intermediate representations)
- ROSETTA (Landsbergen): interlingua based
- BSO (Witkam): Esperanto
1990–
- Data-driven translation systems

5 MT Methods
MT
- Direct MT
- Rule-Based MT: Transfer, Interlingua
- Data-Driven MT: EBMT, SMT

6 Basic Architecture: Direct Translation
[Diagram: source text → target text]
Basic idea
- language pair specific
- no intermediate representation
- pipeline architecture

7 Direct Translation: Advantages
- Exploits the fact that certain potential ambiguities can be left unresolved: Wand/Mauer (De), parete/muro (It) → wall (En)
- Designers can concentrate on the special cases where the languages differ.
- Minimal resources necessary: a cheap bilingual dictionary and a rudimentary knowledge of the target language suffice.
- Translation memories are a successful and much-used development of this approach.

8 Direct Translation: Disadvantages
Computationally naive
- Basic model: word-for-word translation plus local reordering (e.g. to handle adj+noun order)
Linguistically naive
- no analysis of the internal structure of the input, especially of the grammatical relationships between the main parts of sentences
- no generalisation; everything is handled on a case-by-case basis
Generally, poor translation
- except in simple cases where there is a lot of isomorphism between sentences

9 Example of Direct Translation
French: Les soldats sont dans le café

10 Example of Direct Translation
French: Les soldats sont dans le café
English: The soldiers are in the coffee
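
A minimal Python sketch of how a word-for-word system could arrive at this output. The dictionary `FR_EN` and the function `direct_translate` are hypothetical names invented for this illustration; the sketch assumes a dictionary that stores a single English equivalent per French word, which is exactly why the "café" (place) reading is lost.

```python
# Toy bilingual dictionary (hypothetical entries, one equivalent per word).
FR_EN = {
    "les": "the",
    "le": "the",
    "soldats": "soldiers",
    "sont": "are",
    "dans": "in",
    "café": "coffee",  # only the drink sense is stored; the place sense is lost
}


def direct_translate(sentence: str) -> str:
    """Translate word for word; unknown words are passed through unchanged."""
    words = sentence.lower().split()
    translated = [FR_EN.get(word, word) for word in words]
    return " ".join(translated).capitalize()


print(direct_translate("Les soldats sont dans le café"))
```

No analysis of the sentence is performed, so nothing in the pipeline can notice that soldiers are more likely to be in a café than in a coffee.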

11 Transfer Model of MT
To overcome language differences, first build a more abstract representation of the input. The translation process as such (called transfer) operates at the level of this representation.
This architecture assumes
- analysis via some kind of parsing process
- synthesis via some kind of generation process

12 Basic Architecture: Transfer Model
[Diagram: source text → (analysis) → source representation → (transfer) → target representation → (generation) → target text]

13 The Analysis Problem
The aim of analysis is to transform unstructured text into a structured representation that is easier to translate. There are two major problems:
- Ambiguity
- Ill-formed input: written language abounds in spelling errors, repeated words, grammatical errors, etc.

14 Does Ambiguity Matter?
In some cases ambiguity can be ignored or preserved, e.g.
- they (En) → sie (De), irrespective of gender
Different language pairs behave in different ways:
- they (En) → ils (Fr)
- they (En) → elles (Fr)

15 Does Ambiguity Matter? (2)
Pauline writes to her friends in Paris.
- The ambiguity can remain in the French translation: Pauline écrit à ses amis à Paris
Pauline misses her friends in Paris.
- The ambiguity has to be resolved:
- À Paris les amis manquent à Pauline
- Les amis à Paris manquent à Pauline

16 Two Key Points
Ambiguities combine. The resulting collection of ambiguities can be very large.
Different kinds of information are required to resolve ambiguities.
- Grammatical information (here, agreement):
  John hit Bill then he hit him
  John hit Mary then she hit him
- World knowledge:
  Pregnant women and men came to the meeting.

17 Ambiguities Multiply
In the worst case, if we have N words that are each two-ways ambiguous, we have 2^N possible readings. This is exponential growth: a combinatorial explosion.
Sometimes some of the ambiguities can be ruled out a priori, e.g.
- Sam loves presents
but not always: typically there are tens or even hundreds of possible analyses for very ordinary sentences.
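
The multiplication can be checked with a short Python sketch. The tag inventory below is an assumption made for illustration (each of "loves" and "presents" is treated as either a noun or a verb), and `count_readings` and `enumerate_readings` are hypothetical helper names.

```python
from itertools import product
from math import prod


def count_readings(senses_per_word):
    """Worst case: the number of readings is the product of the per-word counts."""
    return prod(senses_per_word)


def enumerate_readings(word_senses):
    """List every combination of one sense per word."""
    return list(product(*word_senses))


# Hypothetical tag inventory for "Sam loves presents".
sam_loves_presents = [
    ["Sam/Noun"],
    ["loves/Verb", "loves/Noun"],
    ["presents/Noun", "presents/Verb"],
]

print(len(enumerate_readings(sam_loves_presents)))  # candidate tag sequences
print(count_readings([2] * 10))  # ten two-way ambiguous words
```

A grammar can discard most of the candidate tag sequences for this sentence straight away, which is the a priori filtering the slide refers to; the worst-case count still doubles with every additional two-way ambiguous word.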

18 Ill-formed Input
Two methods for dealing with it:
- Permuting, inserting and deleting words until an analysis is found. There are so many possibilities that this can easily lead to combinatorial explosion.
- Relaxing constraints on, e.g., agreement, so that an input such as "The problems are interesting, but the solution leave something to be desired" can still be analysed. If used generally, this can create further ambiguities.
There is an inverse relationship between robustness and potential for ambiguity.

19 Transfer
The task of transfer is to take the source interface structure produced by analysis and produce a target interface structure which can be input to the synthesis component. In general, this transformation is effected by transfer rules.

20 An Example
Source interface structure: I miss London
[sentence/pres miss [nounphrase/sing/1 pronoun] [nounphrase London]]
Target interface structure: Londres me manque
[sentence/pres manquer [nounphrase Londres] [nounphrase/sing/1 pronoun]]

21 Transfer Rules
- London translates as Londres.
- miss translates as manquer.
- A first person singular noun phrase translates as a first person singular noun phrase.
- The direct object of miss translates as the subject of manquer.
- The subject of miss translates as the indirect object of manquer.
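
A rough Python sketch of how these five rules might be applied to an interface structure for "I miss London". The dictionary-based encoding, the attribute names (`subject`, `object`, `indirect_object`, `head`) and the functions `transfer_np` and `transfer_sentence` are all assumptions made for this illustration, not the notation of any particular MT system.

```python
# Lexical transfer rules: source word -> target word.
LEXICON = {"miss": "manquer", "London": "Londres"}


def transfer_np(np):
    """Copy a noun phrase, translating its head word if it has one.

    Features such as person and number are carried over unchanged, so a
    first person singular pronoun stays first person singular.
    """
    out = dict(np)
    if "head" in out:
        out["head"] = LEXICON.get(out["head"], out["head"])
    return out


def transfer_sentence(src):
    """Structural transfer for 'miss': the two arguments swap roles."""
    if src["verb"] != "miss":
        raise ValueError("this sketch only covers the verb 'miss'")
    return {
        "cat": "sentence",
        "tense": src["tense"],
        "verb": LEXICON[src["verb"]],
        "subject": transfer_np(src["object"]),           # direct object -> subject
        "indirect_object": transfer_np(src["subject"]),  # subject -> indirect object
    }


source = {
    "cat": "sentence",
    "tense": "pres",
    "verb": "miss",
    "subject": {"cat": "nounphrase", "number": "sing", "person": 1, "type": "pronoun"},
    "object": {"cat": "nounphrase", "head": "London"},
}

target = transfer_sentence(source)
print(target)
```

Turning the target structure into the string "Londres me manque" is left to the generation component, which has to choose the clitic pronoun and the verb form.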

22 Transfer Rules
In general, there are two kinds of transfer rule:
- Lexical transfer rules: these deal with cross-lingual mappings at the level of words and fixed phrases.
- Structural transfer rules: these deal with differences in syntactic structure.

23 Lexical Transfer
Word → Word (usually)
Easy cases are based on bilingual dictionary lookup.
Resolution of ambiguities may require further knowledge:
- know → savoir
- know → connaître
Not necessarily word for word:
- schimmel → white horse

24 Structural Transfer Rule
Tree → Tree
NP_s(Adj_s, Noun_s) → NP_t(Noun_t, Adj_t)
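
A small Python sketch of this tree-to-tree rule. Trees are encoded here as nested tuples of the form `(label, child, ...)`; this encoding and the function name `reorder_np` are assumptions made for illustration. The rule only reorders; lexical transfer of the words themselves is deliberately left out.

```python
def reorder_np(tree):
    """Apply NP(Adj, Noun) -> NP(Noun, Adj) everywhere in a tree.

    A tree is either a word (str) or a tuple (label, child1, child2, ...).
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [reorder_np(child) for child in children]
    if (
        label == "NP"
        and len(children) == 2
        and children[0][0] == "Adj"
        and children[1][0] == "Noun"
    ):
        children = [children[1], children[0]]
    return (label, *children)


source_tree = ("S",
               ("NP", ("Adj", "red"), ("Noun", "car")),
               ("VP", ("V", "stops")))

print(reorder_np(source_tree))
```

Because the function recurses over the whole tree, the same rule fires on every noun phrase that matches the pattern, which is the generalisation that case-by-case direct translation lacks.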

25 Structural Transfer (1)
Passive constructions
- apples are sold here (passive)
- man verkoopt hier appels (impersonal)
  gloss: one sells here apples
- se venden manzanas aquí (reflexive)
  gloss: self they sell apples here

26 Structural Transfer (2)
Adjective/noun correspondences: an adjective in En translates as a noun in De
- I am hungry → Ich habe Hunger
  gloss: I have hunger
Knock-on effect: we cannot use the normal translation of very (= sehr in German)
- I am very hungry → Ich habe einen riesigen Hunger
  gloss: I have a huge hunger

27 Sometimes the knock-on effect requires insertion of new information
- das für Sam neue Auto
  gloss: the for Sam new car
  translation: the car which is new to Sam
The problem here is that
- we have to supply a verb (in this case is)
- the verb has to have a tense
Neither of these pieces of information was present in the source.

28 Generally, translation may involve inserting information missing in the source.
- In English, one cannot avoid the issue of whether a noun phrase is singular or plural. In Japanese, this information can remain unspecified. Therefore, there is a problem going from Japanese into English.
- Similarly, one cannot avoid the issue of the social relationship between reader and writer in Japanese, but one can in English. So there is a (different) problem in going from English to Japanese.

29 Towards an Interlingua
The transfer problem arises because of differences between source and target interface structures. The more similar they are, the smaller the transfer problem should be. There is clearly a relationship between the “depth” of interface structure and the size of the respective components of an MT system.

30 Interlingual Translation: The Vauquois Triangle
[Diagram: source text → (analysis) → interlingua → (generation) → target text; the depth of representation increases towards the interlingua at the apex]

31 Interlingual Translation
- The transfer model requires different transfer rules for each language pair: much work for a multilingual system.
- The interlingual approach eliminates transfer altogether by creating a language-independent canonical form known as an interlingua.
- Various logic-based and feature-based schemes have been used to represent such forms.
- Other approaches include attribute/value matrices called feature structures.

32 Possible Feature Structure for “There was an old man gardening”
event: gardening
agent:
  type: man
  number: sg
  definiteness: indef
aspect: progressive
tense: past
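
For readers who think in code, an attribute/value matrix of this kind maps naturally onto a nested dictionary. The sketch below is only one possible encoding; `get_path` is a hypothetical helper that follows a path of attributes, and the nesting of type, number and definiteness under agent is read off the slide layout.

```python
feature_structure = {
    "event": "gardening",
    "agent": {
        "type": "man",
        "number": "sg",
        "definiteness": "indef",
    },
    "aspect": "progressive",
    "tense": "past",
}


def get_path(fs, path):
    """Follow a sequence of attribute names down into a feature structure."""
    value = fs
    for attribute in path:
        value = value[attribute]
    return value


print(get_path(feature_structure, ["agent", "number"]))
print(get_path(feature_structure, ["tense"]))
```

Nothing in this structure records English word order or the word "there", which is what makes it a candidate language-independent form. It also records nothing about "old", so a fuller structure would need a further attribute on the agent.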

33 Interlingual Translation. Problem 1: Unnecessarily complex analysis
- The basic idea is that source and target interface structures are identical.
- With this approach analysis is more complex. In particular, all distinctions relevant for translation into any target language must be present in the interlingua.
- English/Japanese: analysis of the word sister will have to distinguish between younger and older sister. This is wasteful for an English/French system.

34 Interlingual Translation. Problem 2: Ensuring identity of S & T
Sam eats only fish
- Natural En interlingual representation: if e is an eating event with eater Sam, the thing eaten is fish
- Natural Jp interlingual representation: there are no eating events with Sam as eater that do not involve fish as object
These representations are intuitively equivalent but they are not identical. To get from one to the other requires something like a logic for the interlingua which provides a well-defined notion of equivalence.

35 Interlingual Approach: Pros and Cons
Pros
- Portable (avoids the N^2 problem)
- Because the representation is normalised, structural transformations are simpler to state.
- Elegance
Cons
- Difficult to deal with terms at the level of primitives: are there universal concepts?
- Must decompose and reassemble concepts
- Useful information is lost (paraphrase)
In practice, works best in small domains.
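
The portability argument can be made concrete by counting modules. The sketch below assumes one transfer module per ordered language pair in a transfer system, and one analysis plus one generation module per language in an interlingual system; the function names are hypothetical.

```python
def transfer_modules(n_languages: int) -> int:
    """Transfer modules needed to translate between every ordered pair."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """One analysis and one generation module per language."""
    return 2 * n_languages


for n in (2, 5, 10, 20):
    print(n, transfer_modules(n), interlingua_modules(n))
```

The first count grows quadratically and the second linearly. A transfer system also needs its own analysis and generation modules, so the gap shown here understates the difference.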

36 Interlingual Systems. Problem 3: Ontological Issues
The designer of an interlingua has a very difficult task. What is the appropriate inventory of attributes and values? Clearly, the choice has radical effects on the ability of the system to translate faithfully. For instance, to handle the muro/parete distinction, the internal/external characteristic of the wall would have to be encoded.

37 Summary
More abstract representations are a good thing because they make the job of the transfer component smaller. Yet there are irreducible differences in the way that languages express the same content. So the transfer problem cannot be entirely eliminated.

38 The Generation (Synthesis) Problem
There are typically many ways in which the same content can be expressed. Sometimes only one of the ways of expressing the content is correct, e.g.
- What time is it?
- How late is it?
- What is the hour?
It is difficult to keep a list of contents that should be realised idiomatically.

39 Synthesis: Problem 2
There may be no obvious way to realise the content.
- Sam saw a cat. It was black.
- Sam saw something black. It was a cat.
- Sam saw a cat which was black.
- There was a black cat. Sam saw it.
How does one select between the alternatives? Heuristic: stick to the form of the source sentence.

40 Components of a typical MT system
- Source and target lexicons (10,000+ entries)
- Morphological rules (50+ rules)
- Source analysis rules (50–100 rules)
- Target generation rules (50–100 rules)
- Transfer component, if the system is not interlingual (100–1000 rules)
Ten man-years to produce a basic system

