Published by Hilary Burke. Modified over 9 years ago.
1
Jan 2005, CSA4050 Machine Translation II
CSA4050: Advanced Techniques in NLP
Machine Translation II
Direct MT
Transfer MT
Interlingual MT
2
History – Pre ALPAC
1952 – First MT Conference (MIT)
1954 – Georgetown System (word-for-word based) successfully translated 49 Russian sentences
1954-1965 – Much investment into brute-force empirical approach: crude word-for-word techniques with limited reshuffling of output
ALPAC (Automatic Language Processing Advisory Committee) Report concludes that research funds should be directed into more fundamental linguistic research
3
History – Post ALPAC
1965-1970
– Operational Systems approach: SYSTRAN (eventually became the basis for Babelfish)
– University centres established in Grenoble (CETA), Montreal and Saarbruecken
Systems developed on the basis of linguistic and non-linguistic representations
1970-1990
– Ariane (Dependency Grammar)
– TAUM METEO (Metamorphoses Grammars)
– EUROTRA (multilingual intermediate representations)
– ROSETTA (Landsbergen), interlingua based
– BSO (Witkam), Esperanto
1990-
– Data Driven Translation Systems
4
MT Methods
MT
  Direct MT
  Rule-Based MT
    Transfer
    Interlingua
  Data-Driven MT
    EBMT
    SMT
5
Basic Architecture: Direct Translation
source text → target text
Basic idea
- language pair specific
- no intermediate representation
- pipeline architecture
6
Staged Direct MT (En/Jp)
7
Direct Translation Advantages
Exploits the fact that certain potential ambiguities can be left unresolved, e.g. wall: Wand/Mauer (German), parete/muro (Italian).
Designers can concentrate more on special cases where languages differ.
Minimal resources necessary: a cheap bilingual dictionary and rudimentary knowledge of the target language suffice.
Translation memories are a (successful and much used) development of this approach.
8
Direct Translation Disadvantages
Computationally naive
– Basic model: word-for-word translation + local reordering (e.g. to handle adj+noun order)
Linguistically naive:
– no analysis of internal structure of input, esp. wrt the grammatical relationships between the main parts of sentences.
– no generalisation; everything on a case-by-case basis.
Generally, poor translation
– except in simple cases where there is lots of isomorphism between sentences.
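The basic model above (word-for-word lookup plus local reordering) can be sketched in a few lines. This is only a toy illustration: the English-to-French lexicon, the tag names and the function `direct_translate` are assumptions made up for the example, not part of any system named in these slides.

```python
# Toy English -> French direct translation: dictionary lookup plus one
# local reordering rule. Lexicon and tag set are illustrative assumptions.
LEXICON = {
    "the": ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat": ("chat", "NOUN"),
    "sleeps": ("dort", "VERB"),
}

def direct_translate(sentence):
    # Stage 1: word-for-word lookup; unknown words pass through untagged.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # Stage 2: local reordering, swapping each ADJ NOUN pair to NOUN ADJ.
    out = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN"):
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

print(direct_translate("the black cat sleeps"))  # le chat noir dort
```

Note that nothing here looks at sentence structure: every reordering has to be listed as its own special case, which is the lack of generalisation the slide points to.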
9
Transfer Model of MT
To overcome language differences, first build a more abstract representation of the input.
The translation process as such (called transfer) operates at the level of this representation.
This architecture assumes
– analysis via some kind of parsing process.
– synthesis via some kind of generation.
10
Basic Architecture: Transfer Model
source text → (analysis) → source representation → (transfer) → target representation → (generation) → target text
11
Transfer Rules
In general there are two kinds of transfer rule:
Structural Transfer Rules: these deal with differences in the syntactic structures.
Lexical Transfer Rules: these deal with cross-lingual mappings at the level of words and fixed phrases.
12
Structural Transfer Rule
NP_s(Adj_s, Noun_s) → NP_t(Noun_t, Adj_t)
(subscript s = source language, t = target language)
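A structural transfer rule of this kind can be read as a tree rewrite. The sketch below assumes a very simple tree encoding, (label, children) tuples with (POS, word) leaves; that encoding and the helper names `transfer_np` and `leaves` are hypothetical and chosen only for illustration. Lexical transfer is deliberately left out, so the words stay in the source language.

```python
# Trees are (label, children) tuples; leaves are (POS, word) pairs.
# This encoding is an assumption made for the example.
def transfer_np(tree):
    """Apply NP(Adj, Noun) -> NP(Noun, Adj) everywhere in the tree."""
    label, children = tree
    if isinstance(children, str):      # leaf: (POS, word)
        return tree
    children = [transfer_np(c) for c in children]
    labels = [c[0] for c in children]
    if label == "NP" and labels == ["Adj", "Noun"]:
        children = [children[1], children[0]]
    return (label, children)

def leaves(tree):
    """Read the words off the tree from left to right."""
    label, children = tree
    if isinstance(children, str):
        return [children]
    return [w for c in children for w in leaves(c)]

source = ("S", [("NP", [("Adj", "white"), ("Noun", "house")]),
                ("VP", [("Verb", "stands")])])
target = transfer_np(source)
print(leaves(target))  # ['house', 'white', 'stands']
```

Unlike the direct approach, the rule is stated once over the NP category and applies wherever an NP of that shape occurs in the analysed input.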
13
existential-there-sentence: there was an old man gardening
intermediate-representation-1: an old man gardening was
intermediate-representation-2: gardening an old man was
japanese-s: niwa no teire o suru ojiisan ita
Transformations applied:
– delete initial "there"
– make "gardening" modify NP
– reverse order of NP/modifier
– lexical transfer
14
More Structural Transfer Rules
15
Lexical Transfer
Easy cases are based on bilingual dictionary lookup.
Resolution of ambiguities may require further knowledge:
– know → savoir
– know → connaître
Not necessarily word for word:
– Schimmel → white horse
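The two hard cases on this slide can be illustrated with a small lookup sketch. The dictionaries and function names are hypothetical, and the cue used to choose between savoir and connaître (whether the object of "know" is a clause or an entity) is a simplifying assumption, not a rule given in the slides.

```python
# Hypothetical transfer lexicons. The object-type cue for "know" is a
# simplifying assumption for illustration.
EN_FR = {
    "know": {"clause": "savoir", "entity": "connaître"},
}
DE_EN = {
    "schimmel": ["white", "horse"],   # one source word, two target words
}

def transfer_know(object_type):
    """Pick the French verb from the kind of object 'know' takes."""
    return EN_FR["know"][object_type]

def transfer_de_en(words):
    """Map German words to English, allowing one-to-many entries."""
    out = []
    for w in words:
        out.extend(DE_EN.get(w.lower(), [w]))
    return out

print(transfer_know("clause"))       # savoir
print(transfer_know("entity"))       # connaître
print(transfer_de_en(["Schimmel"]))  # ['white', 'horse']
```

The point of the sketch is that the lexicon alone is not enough for "know": the choice needs information that only the analysis stage can supply.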
16
Transfer Model
Degree of generalisation depends upon depth of representation:
– The deeper the representation, the harder it is to do analysis or generation.
– The shallower the representation, the larger the transfer component.
Where does ambiguity get resolved?
Number of bilingual components can get large.
17
Interlingual Translation: The Vauquois Triangle
source text → (analysis, increasing depth) → interlingua → (generation) → target text
18
Interlingual Translation
The transfer model requires different transfer rules for each language pair, which means much work for a multilingual system.
The interlingual approach eliminates transfer altogether by creating a language-independent canonical form known as an interlingua.
Various logic-based schemes have been used to represent such forms.
Other approaches include attribute/value matrices called feature structures.
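The workload argument can be made concrete with a rough count. The sketch assumes one transfer component per ordered language pair and, for the interlingual design, one analyser and one generator per language; the function names are made up for the example. A transfer system also needs its own analysers and generators, which are not counted here.

```python
def transfer_modules(n):
    # One transfer component per ordered language pair (source, target).
    return n * (n - 1)

def interlingua_modules(n):
    # One analyser into the interlingua and one generator out of it
    # per language.
    return 2 * n

for n in (2, 5, 10):
    print(n, transfer_modules(n), interlingua_modules(n))
```

For 10 languages this gives 90 transfer components against 20 interlingual components, so the first grows quadratically and the second linearly.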
19
Possible Feature Structure for “There was an old man gardening”
event: gardening
agent:
  type: man
  number: sg
  definiteness: indef
aspect: progressive
tense: past
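An attribute/value matrix like the one above maps naturally onto a nested dictionary. The sketch below is one possible encoding, with the agent's features nested under `agent`; the helper `get_path` is a hypothetical name introduced for the example.

```python
# Feature structure for "There was an old man gardening" as a nested dict.
fs = {
    "event": "gardening",
    "agent": {"type": "man", "number": "sg", "definiteness": "indef"},
    "aspect": "progressive",
    "tense": "past",
}

def get_path(structure, path):
    """Follow a sequence of attribute names into a nested feature structure."""
    for attr in path:
        structure = structure[attr]
    return structure

print(get_path(fs, ["agent", "number"]))  # sg
print(get_path(fs, ["tense"]))            # past
```

A generator for any target language would read its choices (number marking, tense, aspect) from paths like these rather than from the source words.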
20
Ontological Issues
The designer of an interlingua has a very difficult task.
What is the appropriate inventory of attributes and values?
Clearly, the choice has radical effects on the ability of the system to translate faithfully.
For instance, to handle the muro/parete distinction, the internal/external characteristic of the wall would have to be encoded.
21
Feature Structure for “muro”
word: muro
syntax:
  POS:
    class: noun
    type: count
field: buildings
semantics:
  type: structural
  position: external
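The muro/parete case shows why the attribute inventory matters: generation into Italian can only pick the right word if the position feature is present. The sketch below assumes a single `position` attribute with values `external` and `internal`; the lexicon and the function `generate_wall` are hypothetical.

```python
# Hypothetical Italian generation lexicon for a "wall" concept, keyed on
# the position feature discussed in the slides.
ITALIAN_WALL = {"external": "muro", "internal": "parete"}

def generate_wall(semantics):
    """Choose the Italian word for 'wall' from interlingual features."""
    position = semantics.get("position")
    if position not in ITALIAN_WALL:
        # Without the feature, the generator cannot choose faithfully.
        raise ValueError("interlingua must encode internal/external position")
    return ITALIAN_WALL[position]

print(generate_wall({"type": "structural", "position": "external"}))  # muro
print(generate_wall({"type": "structural", "position": "internal"}))  # parete
```

If the source language (English "wall", say) does not mark the distinction, the analysis stage has to recover it from context, which is part of what makes interlingua design hard.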
22
Interlingual Approach Pros and Cons
Pros
– Portable (avoids the N² problem)
– Because representation is normalised, structural transformations are simpler to state.
– Explanatory Adequacy
Cons
– Difficult to deal with terms on primitive level: universals?
– Must decompose and reassemble concepts
– Useful information lost (paraphrase)
In practice, works best in small domains.