Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004.

1 Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004

2 Translation: “substitute the text material of one language (SL) by the equivalent text material of another language (TL)” (Catford 1965: 20); “Translation consists in producing in the target language the closest natural equivalent of the text material of the source language, in the first hand concerning meaning, in the second hand concerning style” (Nida 1975: 32); “Translation is in theory impossible, but in practice fairly possible” (Mounin 1967). References: Catford, J. C. (1965), A Linguistic Theory of Translation, Oxford Press, England. Mounin, G. (1967), Les problèmes théoriques de la traduction, Paris. Nida, E. (1975), A Framework for the Analysis and Evaluation of Theories of Translation, in Brislin, R. W. (ed.) (1975), Translation Application and Research, Gardner Press, New York.

3 Equivalence: form; meaning; style; effect

4 Formal and dynamic equivalence: Formal equivalence focuses attention on the message itself, in both form and content. It aims to allow the reader to understand as much of the SL context as possible. Dynamic equivalence is based on the principle of equivalent effect, i.e. that the relationship between receiver and message should aim at being the same as that between the original receivers and the SL message. (Nida 1975)

5 Can computers translate? Not a simple yes or no; it depends on the purpose of the translation and the required quality.

6 Classical problems with MT: unrealistic expectations; bad translations; difficulties in integrating MT in the workflow (the Ericsson case)

7 Feasibility of machine translation: quality in relation to purpose; control of the source language; human-machine interaction; re-use of translations; evaluation

8 Quality: publishing quality; editing quality; browsing quality

9 Translation-related tasks: translation; browsing; gisting; drafting; message dissemination; cross-language information searches; cross-language interchanges

10 MT as a cross-language communication tool: MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001). Hutchins, J., 2001, Towards a new vision for MT, introductory speech at MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.pdf)

11 Control of the source language: spell-checked and grammar-checked SL; sublanguage (domain, text type); controlled language

12 Spell checking and grammar checking: If there are spelling errors or typos in the SL, dictionary search will fail. If there are grammatical errors in the SL, grammatical analysis will fail. Where and how should spell and grammar checking be accounted for: before or in the process?

13 Controlled language: consistent authoring of source texts (reduction of ambiguity, full linguistic coverage); controlled vocabulary (full lexical coverage); controlled grammar (full grammatical coverage); controlled language checking (e.g. Scania Checker)
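A controlled vocabulary can be enforced mechanically before translation. The sketch below is only an illustration of that idea: the word list and the function name `check_vocabulary` are assumptions made for this example, and it does not describe how Scania Checker or any other real checker works.

```python
import re

# Hypothetical approved vocabulary; a real checker would load a maintained term list.
APPROVED = {"fill", "gearbox", "with", "oil", "tilt", "the", "cab"}

def check_vocabulary(sentence, approved=APPROVED):
    """Return the words of `sentence` that fall outside the approved vocabulary."""
    words = re.findall(r"[a-zA-Z]+", sentence.lower())
    return [w for w in words if w not in approved]

# "lubricant" is not in the toy vocabulary, so it is flagged for the author.
print(check_vocabulary("Fill the gearbox with lubricant."))
```

Flagging at authoring time is what gives the "full lexical coverage" named on the slide: every word that reaches the MT dictionary is known in advance.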

14 Ex. of controlled languages: Simplified English; KANT Controlled English; Scania Swedish (Scania Checker)

15 Human intervention: before (language checking); during (e.g. ambiguity resolution); after (post-editing)

16 Re-use of translations: translation memories; translation dictionaries incl. terminologies; lexicalistic translation; statistical machine translation; example-based translation

17 Evaluation of MT: human; automatic (using a gold standard); coverage (recall); quality (precision); global similarity measures (merge of recall and precision; BLEU, NIST)
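Recall and precision against a gold standard can be illustrated at the word level. The following is a minimal sketch, not BLEU or NIST themselves (those also count longer n-grams, and BLEU adds a brevity penalty); the function name and the reference translation in the example are assumptions.

```python
from collections import Counter

def unigram_scores(candidate, reference):
    """Word-level precision, recall and F1 of a candidate against one reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped word matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Systran output from Ex. 2 versus an assumed reference with corrected word order.
print(unigram_scores(
    "when the villages were contacted had they not even been exposed to flu",
    "when the villages were contacted they had not even been exposed to flu",
))  # → (1.0, 1.0, 1.0)
```

The perfect score for a sentence with wrong word order shows why single-word measures are not enough and why the global measures compare longer word sequences.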

18 Why machine translation? cheaper; faster; more consistent – when it succeeds …

19 What is MT proper? To be considered as MT, a system should provide minimally correct morphology; minimal syntactic processing; minimal semantic processing; handle and produce full sentences. Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://nl.ijs.si/eamt00/proc/Hutchins.pdf)

20 Examples of MT products: Systran (http://babelfish.altavista.com/); Comprendium (based on Metal); ProMT (http://www.translate.ru/eng); ESTeam. See further: http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-4.pdf, http://www.foreignword.com/Technology/mt/mt.htm

21 Basic strategies: direct translation; rule-based translation (transfer, interlingua); example-based translation; statistical translation; hybrids

22 Direct translation: no complete intermediary sentence structure; translation proceeds in a number of steps, each step dedicated to a specific task; the most important component is the bilingual dictionary; typically general language; problems with ambiguity, inflection, word order and other structural shifts

23 Simplistic approach: sentence splitting; tokenisation; handling capital letters; dictionary look-up and lexical substitution incl. some heuristics for handling ambiguities; copying unknown words, digits, signs of punctuation etc.; formal editing
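The steps of the simplistic approach can be sketched in a few lines. This is an illustrative toy, not a description of any real system: the dictionary entries and the function name `translate_direct` are assumptions, and ambiguity heuristics are left out.

```python
import re

# Toy Swedish-English dictionary (assumed entries, keyed on lower-cased word forms).
DICTIONARY = {"tippa": "tilt", "hytten": "the cab", "fyll": "fill", "olja": "oil"}

def translate_direct(text, dictionary=DICTIONARY):
    """Word-by-word translation following the steps of the simplistic approach."""
    translated = []
    # Sentence splitting: naive split after sentence-final punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Tokenisation: runs of word characters, or single punctuation marks.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        # Capital letters are neutralised for look-up; unknown words,
        # digits and punctuation are copied unchanged.
        words = [dictionary.get(tok.lower(), tok) for tok in tokens]
        # Formal editing: re-attach punctuation and restore the initial capital.
        out = re.sub(r"\s+([.,!?;:])", r"\1", " ".join(words))
        translated.append(out[:1].upper() + out[1:])
    return " ".join(translated)

print(translate_direct("Tippa hytten."))  # → Tilt the cab.
```

Because nothing here looks beyond the single token, the sketch inherits exactly the problems listed for direct translation: ambiguity, inflection and word order.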

24 Advanced classical approach (Tucker 1987): source text dictionary look-up and morphological analysis; identification of homographs; identification of compound nouns; identification of noun and verb phrases; processing of idioms

25 Advanced approach, cont.: processing of prepositions; subject-predicate identification; syntactic ambiguity identification; synthesis and morphological processing of target text; rearrangement of words and phrases in target text

26 Feasibility of the direct translation strategy: Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a complete sentence structure?

27 Assignment 1: manual direct translation. Sv. Ytterst handlar kampen för sysselsättning om att hålla samman Sverige. → En. Ultimately, the fight for full employment concerns the cohesion of Swedish society. (from Statement of Government Policy 1996) Define an algorithm and a dictionary (based on Norstedts) for simplistic translation of the example. Present the model and the result.

28 Assignment 1, cont.: Improve the result stepwise in accordance with the advanced direct translation strategy (specify each step carefully and demonstrate its effect on the translation). Evaluate and discuss the final result. Translate the example using Systran (http://kwic.systran.fr/systran/svdemo) and discuss the differences in an evaluative way. Report the assignment and upload it on the web (041001).

29 Current trends in direct translation: re-use of translations (translation memories of sentences and sub-sentence units such as words, phrases and larger units; lexicalistic translation; example-based translation; statistical translation). Will re-use of translations overcome the problems with the direct translation approach that were discussed above? If so, how can they be handled?

30 Systran: System Translation; developed in the US by Peter Toma; first version 1969 (Ru-En); EC bought the rights of Systran in 1976; currently 18 language pairs; demo version sv-en in 2003 (http://kwic.systran.fr/systran/svdemo); http://babelfish.altavista.com/

31 Systran, cont.: more than 1,600,000 dictionary units; 20 domain dictionaries; daily use by EC translators, administrators of the European institutions; originally a direct translation strategy (see H&S); today more of a transfer-based strategy

32 Ex. 1: fairly good translation / Systran sv-en: "Enskilda företagare som inte bildat bolag klassificeras hit." "Individual entrepreneurs that have not formed companies are classified here." The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.

33 Ex. 2: word order problem / Systran sv-en: "När byarna kontaktades hade de inte ens utsatts för influensa." "When the villages were contacted had they not even been exposed to flu." The system has not found the subject and predicate and therefore produces the wrong word order.

34 Ex. 3: ambiguity problem / Systran sv-en: "Vad kan vi lära av Arrawetestammen?" "What can we faith of the Arawete?" The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

35 Ex. 4: ambiguity problem / Systran sv-en: "Extrapoleringen går till så här." "The extrapolation goes to so here." The system does not know the particle verb gå till and therefore translates incorrectly, word for word.

36 Systran Linguistic Resources: Dictionaries (POS Definitions, Inflection Tables, Decomposition Tables, Segmentation Dictionaries); Disambiguation Rules; Analysis Rules

37 Systran Processing Steps: Analysis (Lookup, Compound Decomposition, Disambiguation, Syntactic Analysis, Compound Expansion); Sentence Transfer (Initial Target Structure, Lookup, Default Transfer of Attributes, Structure Transformation)

38 Systran Processing Steps (cont): Sentence Synthesis (Structure Transformation, Inflection Lookup, Surface Transformation)

39 Motivations for transfer-based translation: lexical ambiguity; structural differences. See further Ingo 91

40 Example 1: Sv. Fyll på olja i växellådan. → En. Fill gearbox with oil. (from the Scania corpus) fyll på → fill; obj → adv; adv → obj

41 Example 2: Sv. I oljefilterhållaren sitter en överströmningsventil. → En. The oil filter retainer has an overflow valve. (from the Scania corpus) sitter → has; adv → subj; subj → obj

42 Transfer-based translation: intermediary sentence structure; basic processes (analysis, transfer, generation (synthesis)); language modules (dictionary and grammar of SL; transfer dictionary and transfer rules; dictionary and grammar of TL)

43 [Figure: translation strategies between SL and TL. Labels: Interlingua, Direct translation, Transfer, Multra, Metal]

44 Levels of intermediary structure: cf. J&M, Chapter 21; word order

45 Metal: See H&S

46 MULTRA: Multilingual Support for Translation and Writing; translation engine; transfer-based (shake-and-bake); modular; unification-based; preference machinery; traceable

48 Analysis: chart parser (Lisp → C), procedural formalism; unification and other kinds of operations; sentence structure (feature structure; grammatical relations; surface order implicit via grammatical relations). See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)

49 Transfer: unification-based; declarative formalism (Multra transfer formalism, Beskow 93); lexical and structural rules; rules are partially ordered; a more specific rule takes precedence over a less specific one (specificity in terms of number of transfer equations); all applicable rules are applied; written in Prolog
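Rule precedence by specificity can be pictured with a small sketch. It makes strong simplifying assumptions: feature structures are flat dictionaries rather than unification structures, only the single most specific rule is returned, and the rule label `tänka-think`, the feature names and `select_rule` are hypothetical. It is not the Multra transfer formalism.

```python
# Each hypothetical rule: (label, source-side equations as feature -> value).
RULES = [
    ("tänka-think", {"lex": "tänka.vb.1"}),
    ("tänka.på-think.about", {"lex": "tänka.vb.1", "prep": "på.pp.1"}),
]

def applicable(rule, fs):
    """A rule applies when every one of its equations is satisfied by `fs`."""
    _, equations = rule
    return all(fs.get(feature) == value for feature, value in equations.items())

def select_rule(fs, rules=RULES):
    """Pick the applicable rule with the largest number of equations."""
    candidates = [rule for rule in rules if applicable(rule, fs)]
    return max(candidates, key=lambda rule: len(rule[1]), default=None)

# Both rules match; the two-equation rule is more specific and wins.
print(select_rule({"lex": "tänka.vb.1", "prep": "på.pp.1"})[0])
```

Counting equations gives only a partial order: two rules with the same number of equations are not ranked against each other, which matches the slide's wording.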

50 Generation: syntactic generation (Multra syntactic generation formalism, Beskow 97a; PATR-like style; unification; concatenation; typed features); morphological generation (Beskow 97b; lexical insertion rules; morphological realisation and phonological finish in Prolog); written in Prolog

51 An example: Tippa hytten.
Tippa hytten. :
(* = (PHR.CAT = CL
      MODE = IMP
      SUBJ = 2ND
      VERB = (WORD.CAT = VERB
              INFF = IMP
              DIAT = ACT
              LEX = TIPPA.VB.1
              VSURF = +)
      OBJ.DIR = (PHR.CAT = NP
                 NUMB = SING
                 GENDER = UTR
                 CASE = BASIC
                 DEF = DEF
                 HEAD = (LEX = HYTT.NN.1
                         WORD.CAT = NOUN))
      REG = (V1.LEM = TIPPA.VB)
      SEP = (WORD.CAT = SEP
             LEX = STOP.SR.0)))

52 Transfer structure
[VERB : [WORD.CAT : VERB
         LEX : TILT.VB.0
         DIAT : ACT
         INFF : IMP]
 OBJ.DIR : [PHR.CAT : NP
            DEF : DEF
            NUMB : SING
            HEAD : [WORD.CAT : NOUN
                    LEX : CAB.NN.0]]
 MODE : IMP
 SUBJ : 2ND
 VSURF : +
 SEP : [WORD.CAT : SEP
        LEX : STOP.SR.0]
 PHR.CAT : CL]

53 Generation: Tilt the cab.

54 A grammar rule: defrule legal.obj { = 'np, not = 'gen, not = 'subj }

55 Transfer rules: copy feature; delete feature; transfer feature; assign feature

56 Copy feature: LABEL mode SOURCE = ?x1 TARGET = ?x2 TRANSFER

57 Delete feature: LABEL REG SOURCE = ANY TARGET = TRANSFER

58 Transfer feature: LABEL OBJ.DIR SOURCE = ?x1 TARGET = ?x2 TRANSFER ?x1 ?x2

59 Define feature: LABEL trycka.in-press SOURCE =trycka.vb+in.ab.1 =VERB TARGET =press.vb.1 =VERB TRANSFER
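The effect of the rule types on the "Tippa hytten." example can be imitated with nested dictionaries. This is a loose sketch and not the Multra formalism: the tables `LEXICON` and `DELETE` and the function `transfer` are assumptions, and the choice to drop GENDER and CASE is inferred only from their absence in the transfer structure on slide 52.

```python
# Lexical correspondences read off the "Tippa hytten." example.
LEXICON = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}
# Features removed in transfer (REG per the delete rule; GENDER, CASE inferred).
DELETE = {"REG", "GENDER", "CASE"}

def transfer(fs):
    """Map a source feature structure to a target one, feature by feature."""
    target = {}
    for feature, value in fs.items():
        if feature in DELETE:                   # delete feature
            continue
        if isinstance(value, dict):             # transfer feature (recursive)
            target[feature] = transfer(value)
        elif feature == "LEX":                  # lexical rule
            target[feature] = LEXICON.get(value, value)
        else:                                   # copy feature
            target[feature] = value
    return target

source = {
    "PHR.CAT": "CL", "MODE": "IMP", "SUBJ": "2ND",
    "VERB": {"WORD.CAT": "VERB", "INFF": "IMP", "DIAT": "ACT", "LEX": "TIPPA.VB.1"},
    "OBJ.DIR": {"PHR.CAT": "NP", "NUMB": "SING", "GENDER": "UTR", "CASE": "BASIC",
                "DEF": "DEF", "HEAD": {"LEX": "HYTT.NN.1", "WORD.CAT": "NOUN"}},
    "REG": {"V1.LEM": "TIPPA.VB"},
    "SEP": {"WORD.CAT": "SEP", "LEX": "STOP.SR.0"},
}
target = transfer(source)
print(target["VERB"]["LEX"], target["OBJ.DIR"]["HEAD"]["LEX"])  # → TILT.VB.0 CAB.NN.0
```

Lexemes without an entry, such as STOP.SR.0, are copied unchanged, which matches the separator in the transfer structure shown earlier.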

60 A generation rule: LABEL CL.IMP X1 ---> X2 X3 X4 : = CL = = IMP =

61 A contextual lexical rule: LABEL tänka.på-think.about SOURCE = tänka.vb.1 = pp = ?prep = på.pp.1 = ?rect1 TARGET = pp = PREP = about.pp.1 = ?rect2 TRANSFER ?rect1 ?rect2

62 A generation trace: 1-Applying Rule cl-sep; 1-Applying Rule cl.imp; 1-Applying Rule subj2nd-verb-obj.dir; 1-Applying Rule verb.main.act; 1-Applying Rule np.the-df; 1-Applying Rule ng.noun-def; 1-Success!

63 Language resources in the MATS system: dictionary in a database with different views; analysis grammar; transfer grammar (incl. contextually defined lexical rules); generation grammar

64 sv-en_LinkLexicon

65 en-Inflections

66 en_LemmaLexicon

67 en_LexemeLexicon

68 en_Lexicon

69 en_StemLexicon

70 sv_Inflections

71 sv_LemmaLexicon

72 sv_LexemeLexicon

73 sv_Lexicon

74 sv_StemLexicon

75 The MATS system: Frozen demo…

76 Assignment 2: Working with MATS http://stp.ling.uu.se/~evapet/mt04/assignment2.html

77 Lexicalistic translation: Identify (lexical) translation units in the source sentence. Translate each unit separately (considering the context). Order the result in agreement with a model of the target language. Formulation due to Lars Ahrenberg; see further AH (reading list); see also Beaven, John L., Shake-and-Bake Machine Translation, COLING-92, Nantes, 23-28 Août 1992.
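The three steps (identify units, translate each, order by a target-language model) can be sketched as follows. Everything here is a toy assumption: the unit lexicon, the two-sentence target corpus, and the use of a seen-bigram count as the "model of the target language". Shake-and-Bake proper constrains the ordering by unification, which this sketch does not attempt.

```python
from itertools import permutations

# Assumed translation units (already identified) and their translations.
UNIT_LEXICON = {"hade": "had", "de": "they",
                "utsatts för influensa": "been exposed to flu"}
# Tiny assumed target-language corpus standing in for a language model.
TARGET_CORPUS = ["they had been contacted", "the villages had been exposed to flu"]

def bigrams(words):
    return list(zip(words, words[1:]))

SEEN = {bg for sentence in TARGET_CORPUS for bg in bigrams(sentence.split())}

def score(words):
    """Number of adjacent word pairs that were observed in the target corpus."""
    return sum(bg in SEEN for bg in bigrams(words))

def translate_lexicalistic(units):
    translated = [UNIT_LEXICON[u] for u in units]           # translate each unit
    best = max(permutations(translated),                    # try every ordering
               key=lambda order: score(" ".join(order).split()))
    return " ".join(best)

print(translate_lexicalistic(["hade", "de", "utsatts för influensa"]))
# → they had been exposed to flu
```

Trying every permutation is only workable for a handful of units, since the number of orderings grows factorially.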

78 T4F – a lexicalistic system: processes in T4F (tokenisation, tagging, transfer, transposition, filtering). See further AH (in the reading list)

79 Interlingua translation: See SN

83 Applications of alignment translation memories translation dictionaries lexicalistic translation statistical machine translation example-based translation

84 Translation memories: based on sentence links; optionally, sub-sentence links. See further Macklovitch, E. (2000)
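A sentence-level translation memory amounts to a lookup over stored sentence links, usually with some tolerance for near matches. The sketch below assumes a two-entry memory and uses the character-based similarity ratio from Python's `difflib` as the fuzzy-match score; the threshold of 0.8 and the function name `tm_lookup` are arbitrary choices for illustration. Sub-sentence links are not covered.

```python
from difflib import SequenceMatcher

# Assumed memory of sentence links (source sentence -> stored translation).
MEMORY = {
    "Tippa hytten.": "Tilt the cab.",
    "Fyll på olja i växellådan.": "Fill gearbox with oil.",
}

def tm_lookup(source, memory=MEMORY, threshold=0.8):
    """Return (translation, match score); translation is None below the threshold."""
    if source in memory:                        # exact match
        return memory[source], 1.0
    best_src, best_ratio = None, 0.0
    for stored in memory:                       # fuzzy match on character similarity
        ratio = SequenceMatcher(None, source, stored).ratio()
        if ratio > best_ratio:
            best_src, best_ratio = stored, ratio
    if best_src is not None and best_ratio >= threshold:
        return memory[best_src], best_ratio
    return None, best_ratio

print(tm_lookup("Tippa hytten"))  # missing full stop, still a close match
```

Returning the score alongside the translation lets a translator, or a surrounding system, decide how much to trust a fuzzy hit.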

85 Translation dictionaries: based on word links; refinement of word links

86 Refinement of word alignment data: neutralise capital letters where appropriate; lemmatise or tag source and target units; identify ambiguities (search for criteria to resolve them); identify partial links (compounds? remove or complete them); manual revision?
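Turning word links into a translation dictionary can be sketched as a counting step. The link list below is invented for illustration, and lower-casing every unit is a cruder policy than "where appropriate"; lemmatisation, tagging and the treatment of partial links are left out.

```python
from collections import Counter, defaultdict

# Hypothetical word links (source unit, target unit) as an aligner might output them.
LINKS = [
    ("Motorn", "the engine"),
    ("motorn", "the motor"),
    ("motorn", "the engine"),
    ("Hytten", "the cab"),
]

def build_dictionary(links):
    """Group links by source unit and order the alternatives by link frequency."""
    table = defaultdict(Counter)
    for source, target in links:
        table[source.lower()][target.lower()] += 1    # neutralise capital letters
    return {source: counts.most_common() for source, counts in table.items()}

dictionary = build_dictionary(LINKS)
print(dictionary["motorn"])  # → [('the engine', 2), ('the motor', 1)]
```

An entry with more than one alternative, as for motorn here, is exactly the kind of ambiguity the refinement step asks to identify and resolve.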

87 Informally about statistical MT: build a translation dictionary based on word alignment; aim for as big fragments as possible; keep information on link frequency; build an n-gram model of the target language; implement a direct translation strategy (including alternatives ordered by length and frequency); process the output by the n-gram model, filtering out the best alternatives, and adjust the translation accordingly
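The interplay between link frequencies and an n-gram model can be shown on the lära ambiguity from Ex. 3. This is a toy sketch under stated assumptions: the relative frequencies in `TABLE` and the two-sentence corpus are invented, the language model is a bigram model with add-one smoothing, and word order is kept monotone, so no reordering is attempted.

```python
import math
from collections import Counter
from itertools import product

# Assumed translation alternatives with relative link frequencies.
TABLE = {"vi": {"we": 1.0}, "kan": {"can": 1.0},
         "lära": {"faith": 0.6, "learn": 0.4}}
# Assumed target-language corpus for the bigram model.
CORPUS = ["we can learn from them", "what can we learn"]

BIGRAMS, CONTEXTS, VOCAB = Counter(), Counter(), set()
for sentence in CORPUS:
    words = sentence.split()
    VOCAB.update(words)
    for first, second in zip(words, words[1:]):
        BIGRAMS[(first, second)] += 1
        CONTEXTS[first] += 1

def lm_logprob(words):
    """Bigram log-probability with add-one smoothing."""
    return sum(math.log((BIGRAMS[(a, b)] + 1) / (CONTEXTS[a] + len(VOCAB)))
               for a, b in zip(words, words[1:]))

def translate(source_words):
    """Choose the alternatives that maximise link frequency times LM probability."""
    best, best_score = None, -math.inf
    for choice in product(*(TABLE[w].items() for w in source_words)):
        words = [target for target, _ in choice]
        total = sum(math.log(p) for _, p in choice) + lm_logprob(words)
        if total > best_score:
            best, best_score = words, total
    return " ".join(best)

print(translate(["vi", "kan", "lära"]))  # → we can learn
```

Although "faith" is the more frequent link in the toy table, the bigram model has seen "can learn" and so the less frequent alternative is filtered out as the best one.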

88 Example-based MT: HS (in the reading list)

89 Some current research topics: intersentential dependences; hybrid systems: data-driven and rule-driven; improved alignment techniques; improved language modeling in ST; automatic learning from post-editing; translation by structural correspondences; translation of spoken language; improved preference strategies; ambiguity preserving translation

90 Intersentential dependencies: pronoun resolution; lexical ambiguity resolution, such as (torkar)motorn → the motor, (förbrännings)motorn → the engine; fluency
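The motorn example suggests carrying information from one sentence to the next. A minimal sketch, assuming that the most recent compound in the discourse decides the sense: the two compounds come from the slide, while the default sense, the second and third example sentences and the function name are assumptions.

```python
# Senses of "motorn" as signalled by an earlier compound (from the slide).
COMPOUND_SENSE = {"torkarmotorn": "motor", "förbränningsmotorn": "engine"}

def translate_motorn(sentences, default="engine"):
    """Translate each bare 'motorn' using the most recent compound as antecedent."""
    sense = default                       # assumed fallback when no compound was seen
    translations = []
    for sentence in sentences:
        for word in sentence.lower().split():
            word = word.strip(".,")
            if word in COMPOUND_SENSE:
                sense = COMPOUND_SENSE[word]      # remember across sentences
            elif word == "motorn":
                translations.append(f"the {sense}")
    return translations

print(translate_motorn([
    "Torkarmotorn M2 är sammankopplad med omkopplare S24.",
    "En termovakt bryter strömmen till motorn.",
]))  # → ['the motor']
```

The state variable `sense` is what a strictly sentence-by-sentence system lacks, which is why this kind of ambiguity counts as an intersentential dependency.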

91 Preserving the information structure: information structure is expressed in different ways in the source and the target; syntactic clues are exploited in the analysis to compute the information structure (topic-focus articulation); information structure is used to guide the generation

92 An example: Torkarmotorn M2 är sammankopplad med omkopplare S24 och intervallrelä R22. För att inte motorn skall överbelastas, t.ex. om torkarbladen fastnat, finns en inbyggd termovakt som bryter strömmen till motorn när … Wiper motor M2 is connected to switch S24 and intermittent relay R22. To prevent motor overload, e.g. if the wiper blade gets stuck, there is an integral thermal sensor which breaks the current to the motor when …

93 Preferences: syntactic preferences (the principle of right association; the principle of minimal attachment; two-stage processing); semantic preferences (lexical selectional restrictions; lexical contextual rules; conceptual taxonomies; likelihood of occurrence). See further Bennett, P. & Paggio, P., 1993, Preference in Eurotra.

94 Preferences in Multra: parsing (a formalism for expressing syntactic preferences in the parse, not fully developed); transfer (contextual lexical rules; rule specificity); generation (rule specificity)

95 Hybrid systems: aims; components; problems; architecture; scores

96 Aims of a hybrid system: simple techniques for simple tasks; complex techniques for complex tasks

97 Components of a hybrid system: component strategies: translation memory (full sentences, fragments); direct translation; statistical translation; EBMT

98 Component strategies, cont'd: rule-based translation: simplistic analysis (cf. direct translation), word by word (S → sequence of words) or phrase by phrase (S → sequence of phrases); partial parsing; full parsing

99 Problems of a hybrid system: how does the system know when a simple technique is appropriate? Does the source tell? Does the target tell?

100 Architecture and scores: simple first? concerting results? scoring?
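One possible answer to "simple first?" is a cascade in which each component either returns a translation or declines. The sketch below is an assumption-laden toy: the memory, the lexicon, the strategy names and the rule that word-by-word translation declines on any unknown word are all invented, and no scoring or combination of results is attempted.

```python
# Assumed resources for the two toy strategies.
MEMORY = {"Tippa hytten.": "Tilt the cab."}
LEXICON = {"tippa": "tilt", "hytten": "the cab", "motorn": "the engine"}

def tm_strategy(sentence):
    """Translation memory: succeeds only on an exact sentence match."""
    return MEMORY.get(sentence)

def word_by_word(sentence):
    """Direct translation: declines (returns None) if any word is unknown."""
    words = sentence.rstrip(".").lower().split()
    if not words or any(w not in LEXICON for w in words):
        return None
    return " ".join(LEXICON[w] for w in words).capitalize() + "."

# Ordered from simplest to most complex; further strategies would be appended.
STRATEGIES = [("translation memory", tm_strategy), ("word by word", word_by_word)]

def translate_hybrid(sentence, strategies=STRATEGIES):
    """Return (strategy name, translation) from the first strategy that succeeds."""
    for name, strategy in strategies:
        result = strategy(sentence)
        if result is not None:
            return name, result
    return "untranslated", sentence

print(translate_hybrid("Tippa motorn."))  # → ('word by word', 'Tilt the engine.')
```

Here only the source side decides whether a technique is appropriate; letting the target side tell, for example by scoring each output, would need the scoring step the slide leaves open.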

101 Improved techniques for re-use of translation: combining clues for word alignment (Tiedemann 2003); interactive word alignment (Ahrenberg et al. 2003); parallel treebanks

102 Translation by structural correspondences: LFG; HPSG

103 Translation of spoken language: See Krauwer, Steven (ed.), 2000, Machine Translation, June 2000, Volume 15, Issue 1-2, Special issue on Spoken Language Translation.

