A Hybrid Machine Translation System from Turkish to English Ferhan Türe MSc Thesis, Sabancı University Advisor: Kemal Oflazer.

1 A Hybrid Machine Translation System from Turkish to English Ferhan Türe MSc Thesis, Sabancı University Advisor: Kemal Oflazer

2 Introduction
Goal: Create a machine translation system that translates Turkish text into English text.
- Turkish has an agglutinative morphology: ev+im+de+ki+ne → "to the one at my home"
- Turkish has free word order: "Ben eve gittim", "Eve gittim ben", "Gittim ben eve", ... → "I went to the house"
Idea: Write rules to translate an analyzed Turkish sentence into English.

3 Outline
- Machine Translation (MT)
  - Motivation
  - Challenges in MT
  - History of MT
  - Classical Approaches to MT
- The Hybrid Approach
  - Challenges
  - Translation Steps
    - Analysis and Preprocessing
    - Transfer and Generation
    - Decoding
- Evaluation
  - Methods
  - Experimental Results
  - Examples
- Conclusions

4 Machine Translation
Translation
- Given: input text s in source language S
- Find: a well-formed text in target language T that is equivalent to s
Machine Translation (MT): any system using an electronic computer to perform translation

5 Motivation
- Satisfy the increasing demand for translation
  - 100 languages with 5 million or more native speakers
- Reduce the cost and effort of human translation
  - 13% of EU budget
  - weeks vs. minutes
- Make information available to more people in less time
  - automatic translation of web sites
- Explore the limits of computers' abilities and the linguistic challenges involved

6 Challenges in MT
- Morphological issues
  - Each language has a different morphology
- Syntactic issues
  - Word order in sentences and noun phrases
  - Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns)
- Semantic issues
  - Word sense ambiguities: bank → geographical term OR financial institution?
  - Idiomatic phrases: kafa çekmek → "pull head" OR "drink alcohol"?

7 History of MT
- Idea by Warren Weaver in 1945
- 1950s: Russian-English MT research during the Cold War between the US and the USSR
- 1960s: Funding for research stopped due to failure
- Mid-1970s
  - MÉTÉO: English-French MT in Canada
  - Systran and Eurotra: multilingual MT in Europe
  - TITRAN and the MU Project at Kyoto University, Japan
- After the 1990s
  - Statistical MT: uses statistics and large amounts of data

8 MT between English and Turkish
- Morphological analyzer
  - Oflazer, 1993.
- Morphological disambiguator
  - Oflazer & Kuruöz, 1994.
  - Hakkani-Tür et al., 2000.
  - Yuret & Türe, 2006.
- English-to-Turkish MT
  - Sagay, 1981.
  - Hakkani et al., 1998.
  - Keyder Turhan, 1997.
- No Turkish-to-English system

9 Classical Approaches to MT

10 Vauquois Triangle
[Figure: the Vauquois triangle, with Analysis on one side and Generation on the other, Transfer across the lexical, syntactic, and semantic levels, and the Interlingua at the apex.]

11 Word-by-word Translation
Pipeline: Source sentence → Bilingual Dictionary → Target sentence
Source sentence: Ali evdeki kediyi çok sevmez
Translation: Ali home cat very like
Reference: Ali does not like the cat at home very much
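The word-by-word scheme on this slide can be sketched in a few lines of Python. The toy dictionary is an assumption built only from the slide's example sentence, and the names `TOY_DICTIONARY` and `translate_word_by_word` are hypothetical; this illustrates the idea and is not code from the thesis.

```python
# Toy bilingual dictionary, assumed from the slide's example sentence only.
TOY_DICTIONARY = {
    "Ali": "Ali",
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def translate_word_by_word(sentence: str, dictionary: dict[str, str]) -> str:
    """Replace each source word with its dictionary entry, keeping source order.

    Words missing from the dictionary are passed through unchanged.
    """
    return " ".join(dictionary.get(word, word) for word in sentence.split())


print(translate_word_by_word("Ali evdeki kediyi çok sevmez", TOY_DICTIONARY))
```

The output keeps Turkish word order and loses the negation and case information carried by the suffixes, which is the gap between the slide's translation and its reference.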

12 Direct Translation
Pipeline: Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence
Source: Ali evde -ki kediyi çok sevmez
Analysis: Ali ev +Loc Rel +Adj kedi +Acc çok +Adv sev +Neg+Present
Lexical: Ali home +Loc at +Adj cat +Acc very much +Adv like +Neg+Present
Reorder: Ali at +Adj home +Loc cat +Acc like +Neg+Present very much +Adv
Generate: Ali at home cat not like very much

13 Transfer-based Translation
Pipeline: Source sentence → SL Grammar → SL Representation → Transfer rules / Dictionary → TL Representation → TL Grammar → Target sentence

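14 Transfer-based Translation
Pipeline: Source sentence → SL Grammar → SL Representation → Transfer rules / Dictionary → TL Representation → TL Grammar → Target sentence
Example: mavi evin duvarı → the wall of the blue house
[Figure: parse trees for the example. Turkish side: A "mavi", N "ev+in", and N "duvar+ı" under AP and NP nodes. English side: Det "the", N "wall", and a PP made of Prep "of" and an NP containing Det "the", A "blue" (AP), and N "house".]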

15 Interlingual Translation
Pipeline: Source sentence → Analysis → Interlingua → Generation → Target sentence
Source: Ali evdeki kediyi çok sevmez
Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much))
Translation: Ali does not like the cat at home very much

16 Statistical MT
Given a Turkish sentence t, find the English sentence e that is the "most likely" translation of t

17 Statistical MT
- Translation Model P(t|e): whether an English text e is a good translation of a Turkish text t (built from Turkish-English aligned text)
- Language Model P(e): whether an English text e is well-formed English or not (built from English text)
- Decoding: argmax over e of P(e) × P(t|e)
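The decoding objective on this slide follows from Bayes' rule. The derivation below is the standard noisy-channel argument, written out as a sketch; the symbol \hat{e} for the selected translation is notation introduced here and does not appear on the slides.

```latex
\hat{e} = \arg\max_{e} P(e \mid t)
        = \arg\max_{e} \frac{P(t \mid e)\, P(e)}{P(t)}
        = \arg\max_{e} P(t \mid e)\, P(e)
```

The denominator P(t) is the same for every candidate e, so it can be dropped without changing which e attains the maximum.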

18 Statistical MT
Input: Ali çok açtı
Translation e        LM score P(e)   TM score P(t|e)   Score P(t|e)×P(e)
I have a book        0.9             0.2               0.18
Hungry Ali be so     0.1             0.8               0.08
Ali was so hungry    0.8             0.8               0.64
...
Output: Ali was so hungry
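The table on this slide can be reproduced with a small scoring sketch. The numbers are the ones shown on the slide; the third row's TM score of 0.8 is inferred from its product (0.64 / 0.8). The names `CANDIDATES`, `noisy_channel_score`, and `decode` are hypothetical, and a real decoder would search a much larger candidate space instead of a fixed list.

```python
# (translation, LM score P(e), TM score P(t|e)) for the input "Ali çok açtı",
# taken from the slide's table.
CANDIDATES = [
    ("I have a book", 0.9, 0.2),
    ("Hungry Ali be so", 0.1, 0.8),
    ("Ali was so hungry", 0.8, 0.8),
]


def noisy_channel_score(lm_score: float, tm_score: float) -> float:
    """Combined score P(t|e) * P(e) for one candidate translation."""
    return tm_score * lm_score


def decode(candidates: list[tuple[str, float, float]]) -> str:
    """Return the candidate translation with the highest combined score."""
    best = max(candidates, key=lambda c: noisy_channel_score(c[1], c[2]))
    return best[0]


print(decode(CANDIDATES))
```

Neither model alone picks the right answer here: the language model prefers "I have a book", and the translation model cannot separate the last two candidates.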

19 Outline
- Machine Translation (MT)
  - Motivation
  - Challenges in MT
  - History of MT
  - Classical Approaches to MT
- The Hybrid Approach
  - Challenges
  - Translation Steps
    - Analysis and Preprocessing
    - Transfer and Generation
    - Decoding
- Evaluation
  - Methods
  - Experimental Results
  - Examples
- Conclusions

20 The Hybrid Approach

21 Why Hybrid?
- Classical transfer-based approaches are good at representing the structural differences between the source and target languages.
- Statistical methods are good at extracting knowledge from large amounts of data about how well-formed a sentence is or how "meaningful" a translation is.

22 Challenges: Morphological differences
Avrupalılaştıramadıklarımızdanmışsınız → "You were among the ones who we were not able to cause to become European"
- An extreme case of a word in an agglutinative language
- Each Turkish morpheme corresponds to one or more words in English

23 Challenges: Morphological differences
arkadaşımdakiler → "the ones at my friend"

24 Challenges: Structural differences
dinle+miş+sin → (someone told me that) you listened
dinle+di+n → you listened
dinle+t+ti+n → you made (someone) listen
dinle+t+tir+di+n → you had (someone) make (someone) listen
dinle+r+im → I listen
dinle+r+di+m → I used to listen
dinle+t+ebil+ir+miş+im → ???

25 Challenges: Structural differences
Adam evde kitap okuyordu → The man was reading a book at home
  SUBJ ADJCT OBJ V → SUBJ V OBJ ADJCT
mavi kitap → blue book
  AP NP → AP NP
evdeki kitap → the book at home
  AP NP → NP AP
kitabımın kapağı → my book's cover
  NP1 NP2 → NP1 NP2
arkadaşımın yüzünden → because of my friend
  NP1 NP2 → NP2 NP1

26 Challenges: Ambiguities
koyun
1. sheep (or bosom)
2. your bay
3. your dark (one)
4. of the bay
5. put!

27 Challenges: Ambiguities
silahını evine koy
1. put your gun to your home
2. put your gun to his home
3. put his gun to your home
4. put his gun to his home
5. put your gun to her home
6. put her gun to your home
7. put her gun to her home

28 Challenges: Ambiguities
kitabın kapağı
1. the book's cover
2. book's cover
3. the cover of the book

29 Challenges: Ambiguities
ev+Dative (gitti) → (went) to the house
masa+Dative (çıktı) → (jumped) on the table
adam+Dative (baktı) → (looked) at the man

30 Challenges
- Morphological differences: use morphological analysis on the Turkish side and generation on the English side
- Structural differences: transfer rules can represent such transformations
- Ambiguities: an English language model can determine the most probable translation statistically

31 The Avenue Transfer System
The Avenue Project, initiated by the CMU LTI group, provides:
- a grammar formalism, which allows one to manually create a parallel grammar between two languages, and
- a transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar

32 Overview of Our Approach
[Figure: system pipeline. Turkish sentence → Morphological Analyzer → Preprocessor (Analysis) → Lattice → Avenue Transfer Engine (using transfer rules) → English translations → English Language Model → Most probable English translation]

33 I. Analysis and Preprocessing
Morphological analyses of each word: a set of features describing the structural properties of the word
Example sentence: adam evde oğlunu yendi

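34 I. Analysis and Preprocessing
Lattice representation of the sentence (nodes 0 to 6), with edges labeled by the morphological analyses of each word:
- adam: ada+N+P1Sg | adam+N+PNon
- evde: ev+N+Loc
- oğlunu: oğul+N+P2Sg | oğul+N+P3Sg
- yendi: ye+V+Pass+V+Past | yen+N Zero+V+Past | yen+V+Past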
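To see how quickly the ambiguity in this lattice multiplies, the sketch below enumerates every combination of analyses for the example sentence. It is a simplification under stated assumptions: each word is treated as a single slot with alternatives, whereas the slide's lattice has extra intermediate nodes whose exact layout is not recoverable from the transcript. The names `ANALYSES` and `enumerate_readings` are hypothetical.

```python
from itertools import product

# Candidate analyses per word of "adam evde oğlunu yendi", as listed on the slide.
# Simplifying assumption: one slot per word, alternatives side by side.
ANALYSES = [
    ["ada+N+P1Sg", "adam+N+PNon"],
    ["ev+N+Loc"],
    ["oğul+N+P2Sg", "oğul+N+P3Sg"],
    ["ye+V+Pass+V+Past", "yen+N Zero+V+Past", "yen+V+Past"],
]


def enumerate_readings(analyses: list[list[str]]) -> list[list[str]]:
    """Return every path through the simplified lattice, one analysis per word."""
    return [list(path) for path in product(*analyses)]


readings = enumerate_readings(ANALYSES)
print(len(readings))  # 2 * 1 * 2 * 3 combinations
for reading in readings[:3]:
    print(" ".join(reading))
```

A four-word sentence already yields twelve readings in this simplified view, which is why the later stages need a way to rank alternatives.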

35 I. Analysis and Preprocessing
Representation of IGs (inflectional groups)

36-49 II. Transfer and Generation
[Figure sequence: a parse tree is built step by step over the Turkish sentence "adam evde oğlunu yendi" (tags N, N, N, V), alongside the corresponding English tree over "man won son house" (tags N, V, N, N). Across the slides, NP nodes and the determiner "the" are added, the subject is labeled SUBJ, the adjunct (Adjct) is added with the preposition "at", the object NP (OBJ) receives the possessive "his", the verb is wrapped in Vc and Vfin nodes, and both sides are finally joined under S. The last slide shows only the top-level constituents SUBJ, Adjct, OBJ, and Vfin under S on each side.]

50 II. Transfer and Generation
[Figure: Turkish Adjunct over NP, English Adjunct over "at" NP]
{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
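As a rough illustration of what a rule like this does, the Python sketch below checks a case constraint on a source NP and, if it holds, emits "at" followed by the NP's English words. It is not the Avenue engine and does not use its API: the feature-structure layout (a dict with `CASE` and `english` keys) and the name `apply_adjunct_rule` are assumptions made for this example, and the rule's `poss` constraint is left out for brevity.

```python
from typing import Optional


def apply_adjunct_rule(np: dict) -> Optional[list[str]]:
    """Toy counterpart of Adjunct::Adjunct : [NP] -> ["at" NP].

    `np` is a hypothetical feature structure such as
    {"CASE": "Loc", "english": ["the", "house"]}, where "english" holds the
    already translated words of the NP. Returns None when the constraint
    fails, mirroring a rule that does not fire.
    """
    if np.get("CASE") != "Loc":  # loosely corresponds to ((x1 CASE) =c Loc)
        return None
    # Target side: the literal "at" comes first, then the aligned NP (x1::y2).
    return ["at"] + list(np["english"])


print(apply_adjunct_rule({"CASE": "Loc", "english": ["the", "house"]}))
print(apply_adjunct_rule({"CASE": "Acc", "english": ["his", "son"]}))
```

The first call produces the words for "at the house"; the second returns None because an accusative NP does not satisfy the locative constraint.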

51 II. Transfer and Generation
[Figure: tree fragments with Vc and Vfin nodes]
;; yendi -> won
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  ;Analysis
  (x0 = x1)
  ;Constraints
  ((x1 lex) =c (*or* "yen"...)
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  ;Transfer
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  ;Generation
  (y0 = y1)
)

52 III. Decoding
- The transfer engine outputs n translations T1, ..., Tn
- We use an English language model to calculate the probability of each translation, and pick the one with the highest language model score

53 III. Decoding

54 III. Decoding
Translation                        Log probability
My island beat your son at home    -29.5973
My island beat his son at home     -27.1953
The man beat your son at home      -23.7629
The man beat his son at home       -26.1649
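The selection step described for decoding amounts to taking the candidate with the highest language model log probability. The sketch below applies that to the four candidates and scores shown on this slide; the names `SCORED` and `pick_best` are hypothetical, and the scores are copied from the slide, not computed by an actual language model.

```python
# (translation, language model log probability) pairs copied from the slide.
SCORED = [
    ("My island beat your son at home", -29.5973),
    ("My island beat his son at home", -27.1953),
    ("The man beat your son at home", -23.7629),
    ("The man beat his son at home", -26.1649),
]


def pick_best(scored: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the (translation, log probability) pair with the highest score.

    Log probabilities are negative, so "highest" means closest to zero.
    """
    return max(scored, key=lambda pair: pair[1])


best_translation, best_logprob = pick_best(SCORED)
print(best_translation, best_logprob)
```

With these numbers the language model rules out both "My island" readings and selects "The man beat your son at home".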

55 Outline
- Machine Translation (MT)
  - Motivation
  - Challenges in MT
  - History of MT
  - Classical Approaches to MT
- The Hybrid Approach
  - Challenges
  - Translation Steps
    - Analysis and Preprocessing
    - Transfer and Generation
    - Decoding
- Evaluation
  - Methods
  - Experimental Results
  - Examples
- Conclusions

56 Evaluation

57 MT Evaluation
Manual evaluation:
- SSER (subjective sentence error rate)
- Correct/Incorrect
Manual evaluations require human effort and time.
Automatic evaluation:
- WER (word error rate)
- BLEU (Bilingual Evaluation Understudy)
- METEOR

58 Automatic Evaluation
- Word Error Rate (WER): number of insertions, deletions, and substitutions required to transform the reference translation into the system translation
- BLEU: number of word n-grams shared between the system translation S and a set of reference translations
- METEOR: similar to BLEU, but also considers roots and synonyms
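The two simplest metrics above can be sketched directly. The code below computes WER as a word-level edit distance divided by the reference length (the usual normalization, which the slide does not spell out), and the clipped n-gram precision that BLEU is built on. It is not full BLEU: there is no brevity penalty, no geometric mean over n-gram orders, and only a single reference. All function names are hypothetical.

```python
from collections import Counter


def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn `reference` into `hypothesis` (Levenshtein distance)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # delete ref_word
                            curr[j - 1] + 1,       # insert hyp_word
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def clipped_ngram_precision(reference: str, hypothesis: str, n: int) -> float:
    """Fraction of hypothesis n-grams also found in the reference, counting
    each reference n-gram at most as often as it occurs there."""
    def ngrams(text: str) -> Counter:
        words = text.split()
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

    ref_counts, hyp_counts = ngrams(reference), ngrams(hypothesis)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


reference = "on the door behind Elif"
translation = "at the door at the back of Elif"
print(wer(reference, translation))
print(clipped_ngram_precision(reference, translation, 1))
```

The usage lines score one of the noun-phrase examples from the talk: five edits against a five-word reference give a WER of 1.0, and three of the eight translated words match the reference, for a unigram precision of 0.375.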

59 Experimental Results
- The system contains over 200 transfer rules and 20,000 lexical rules
- It can parse and translate challenging sentences
- Translations are sound, but not complete
- We tested the system on 192 noun phrases and 70 sentences
  - BLEU score for noun phrases: 60.38
  - BLEU score for sentences: 33.17

60 Examples
- Noun phrase: siyahlarla birlikte bir protesto yürüyüşünde
  Translation: in a protest walk with the blacks
  Reference: in a protest walk with the blacks
- Noun phrase: Elif'in arkasındaki kapıda
  Translation: at the door at the back of Elif
  Reference: on the door behind Elif
- Noun phrase: alışveriş dünyasında
  Translation: in the shopping world
  Reference: at the shopping world

61 Examples
- Sentence: Bu tutku zamanla bana acı vermeye başladı
  Translation: This passion began to give pain to me with time
  Reference: In time this passion began to give me pain
- Sentence: Perşembe uzun yürüyüşler ve ziyaretler yapıyorum
  Translation: I am doing long walks and visits on Thursday
  Reference: On Thursdays I take long walks and make visits
- Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
  Translation: It grew more as escaping, it became a passion
  Reference: He grew as he ran away, became an obsession

62 Conclusions & Future Work
A hybrid machine translation system from Turkish to English
- Wide linguistic coverage through manually crafted transfer rules in Avenue
- Ambiguities handled by an English language model
- Computationally inefficient translation
- Time-consuming development
Future work
- Further improvement of transfer rules
- Learning rules automatically from a parallel corpus

63 Thank you!

