A Hybrid Machine Translation System from Turkish to English Ferhan Türe MSc Thesis, Sabancı University Advisor: Kemal Oflazer.



2 Introduction
Goal: Create a machine translation system that translates Turkish text into English text
• Turkish has an agglutinative morphology: ev+im+de+ki+ne → "to the one at my home"
• Turkish has free word order: "Ben eve gittim", "Eve gittim ben", "Gittim ben eve", ... → "I went to the house"
Idea: Write rules to translate an analyzed Turkish sentence into English

3 Outline
Machine Translation (MT)
• Motivation
• Challenges in MT
• History of MT
• Classical Approaches to MT
The Hybrid Approach
• Challenges
• Translation Steps
  - Analysis and Preprocessing
  - Transfer and Generation
  - Decoding
Evaluation
• Methods
• Experimental Results
• Examples
Conclusions

4 Machine Translation
Translation
• Given: Input text s in source language S
• Find: A well-formed text in target language T that is equivalent to s
Machine Translation (MT)
• Any system using an electronic computer to perform translation

5 Motivation
Satisfy increasing demand for translation
• 100 languages with 5 million or more native speakers
Reduce the cost and effort of human translation
• 13% of EU budget
• weeks vs. minutes
Make information available to more people in less time
• automatic translation of web sites
Explore the limits of computers' abilities and the linguistic challenges involved

6 Challenges in MT
Morphological issues
• Each language has a different morphology
Syntactic issues
• Word order in sentences and noun phrases
• Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns)
Semantic issues
• Word sense ambiguities: bank → geographical term OR financial institution?
• Idiomatic phrases: kafa çekmek → "pull head" OR "drink alcohol"?

7 History of MT
Idea by Warren Weaver
1950s: Russian-English MT research during the Cold War between the US and the USSR
1960s: Funding for research stopped due to failure
Mid-1970s
• MÉTÉO: English-French MT in Canada
• Systran and Eurotra: Multilingual MT in Europe
• TITRAN and MU Project at Kyoto University, Japan
After the 90s
• Statistical MT: Uses statistics and large amounts of data

8 MT between English and Turkish
Morphological analyzer
• Oflazer
Morphological disambiguator
• Oflazer & Kuruöz
• Hakkani-Tür et al.
• Yuret & Türe
English-to-Turkish MT
• Sagay
• Hakkani et al.
• Keyder Turhan
No Turkish-to-English system

9 Classical Approaches to MT

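10 Vauquois Triangle
[Figure: Vauquois triangle. Analysis runs up the source side and generation runs down the target side; transfer is possible at the lexical, syntactic, and semantic levels, with the interlingua at the top.]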

11 Word-by-word Translation
Source sentence → Bilingual Dictionary → Target sentence
Source sentence: Ali evdeki kediyi çok sevmez
Translation: Ali home cat very like
Reference: Ali does not like the cat at home very much
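The word-by-word scheme can be sketched in a few lines. This is only an illustration, assuming a hand-built toy dictionary keyed on the surface forms of the slide's example; `TOY_DICTIONARY` and `translate_word_by_word` are hypothetical names, not part of the thesis system.

```python
# Toy bilingual dictionary (assumption: hand-built from the slide's example,
# keyed on inflected surface forms rather than roots).
TOY_DICTIONARY = {
    "ali": "Ali",
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def translate_word_by_word(sentence, dictionary):
    """Replace each source word with its dictionary entry, keeping source order."""
    output = []
    for word in sentence.split():
        # Words missing from the dictionary are passed through unchanged.
        output.append(dictionary.get(word.lower(), word))
    return " ".join(output)


print(translate_word_by_word("Ali evdeki kediyi çok sevmez", TOY_DICTIONARY))
```

The output keeps Turkish word order and drops negation and case information, which is why it differs so much from the reference translation.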

12 Direct Translation
Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence
Source: Ali evde -ki kediyi çok sevmez
Analysis: Ali ev +Loc Rel +Adj kedi +Acc çok +Adv sev +Neg+Present
Lexical: Ali home +Loc at +Adj cat +Acc very much +Adv like +Neg+Present
Reorder: Ali at +Adj home +Loc cat +Acc like +Neg+Present very much +Adv
Generate: Ali at home cat not like very much

13 Transfer-based Translation
Source sentence → (SL Grammar) → SL Representation → (Transfer rules / Dictionary) → TL Representation → (TL Grammar) → Target sentence

14 Transfer-based Translation
Example: mavi evin duvarı → the wall of the blue house
[Figure: parse trees for the example. Turkish side: A "mavi" and N "ev+in" under AP and NP nodes, with N "duvar+ı" completing the top NP. English side: NP "the wall" followed by a PP made of Prep "of" and NP "the blue house".]

15 Interlingual Translation
Source sentence → Analysis → Interlingua → Generation → Target sentence
Source: Ali evdeki kediyi çok sevmez
Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much))
Translation: Ali does not like the cat at home very much

16 Statistical MT Given a Turkish sentence t, find the English sentence e that is the “most likely” translation of t

17 Statistical MT
Translation Model P(t|e)
• whether an English text e is a good translation of a Turkish text t
• learned from Turkish-English aligned text
Language Model P(e)
• whether an English text e is well-formed English or not
• learned from English text
Decoding
• ê = argmax_e P(e) × P(t|e)

18 Statistical MT
Source sentence t: Ali çok açtı
Each candidate translation e receives an LM score P(e), a TM score P(t|e), and a combined score P(t|e) × P(e):
• I have a book
• Hungry Ali be so
• Ali was so hungry
Selected translation: Ali was so hungry
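The scoring on this slide can be sketched as follows. The probabilities are invented purely for illustration (the values from the original slide are not available); `CANDIDATES`, `noisy_channel_score`, and `decode` are hypothetical names. Scores are combined in log space, which is equivalent to multiplying P(e) and P(t|e).

```python
import math

# Hypothetical probabilities for illustration only; they are NOT the
# values from the original slide.
CANDIDATES = {
    "I have a book": {"lm": 0.010, "tm": 0.00001},      # fluent, wrong meaning
    "Hungry Ali be so": {"lm": 0.00001, "tm": 0.010},   # right words, not fluent
    "Ali was so hungry": {"lm": 0.005, "tm": 0.008},    # fluent and adequate
}


def noisy_channel_score(lm_prob, tm_prob):
    """Return log P(e) + log P(t|e), i.e. the log of P(e) * P(t|e)."""
    return math.log(lm_prob) + math.log(tm_prob)


def decode(candidates):
    """Pick the candidate e that maximizes P(e) * P(t|e)."""
    return max(
        candidates,
        key=lambda e: noisy_channel_score(candidates[e]["lm"], candidates[e]["tm"]),
    )


print(decode(CANDIDATES))
```

With these made-up numbers, a candidate that is only fluent or only adequate loses to one that scores reasonably on both models.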


20 The Hybrid Approach

21 Why Hybrid?
Classical transfer-based approaches are good at
• representing the structural differences between the source and target languages
and statistical methods are good at
• extracting knowledge from large amounts of data about how well-formed a sentence is or how "meaningful" a translation is

22 Challenges
Morphological differences
Avrupalılaştıramadıklarımızdanmışsınız → "You were among the ones who we were not able to cause to become European"
• Extreme case of a word in an agglutinative language
• Each Turkish morpheme corresponds to one or more words in English

23 Challenges
Morphological differences
arkadaşımdakiler → the ones at my friend

24 Challenges
Structural differences
• dinle+miş+sin → (someone told me that) you listened
• dinle+di+n → you listened
• dinle+t+ti+n → you made (someone) listen
• dinle+t+tir+di+n → you had (someone) make (someone) listen
• dinle+r+im → I listen
• dinle+r+di+m → I used to listen
• dinle+t+ebil+ir+miş+im → ???

25 Challenges
Structural differences
• Adam evde kitap okuyordu → The man was reading a book at home (SUBJ ADJCT OBJ V → SUBJ V OBJ ADJCT)
• mavi kitap → blue book (AP NP → AP NP)
• evdeki kitap → the book at home (AP NP → NP AP)
• kitabımın kapağı → my book's cover (NP1 NP2 → NP1 NP2)
• arkadaşımın yüzünden → because of my friend (NP1 NP2 → NP2 NP1)
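The sentence-level reordering in the first example (SUBJ ADJCT OBJ V to SUBJ V OBJ ADJCT) can be sketched as below. This assumes the constituents have already been translated and labeled with their roles; `ENGLISH_ORDER` and `reorder_constituents` are hypothetical names, and real transfer rules handle many more patterns than this single one.

```python
# Target constituent order for a simple English declarative sentence
# (assumption: only the four roles from the slide are considered).
ENGLISH_ORDER = ["SUBJ", "V", "OBJ", "ADJCT"]


def reorder_constituents(labeled_constituents, target_order=ENGLISH_ORDER):
    """Rearrange (role, text) pairs into the target-language role order."""
    by_role = dict(labeled_constituents)
    # Roles absent from the input are skipped.
    return [by_role[role] for role in target_order if role in by_role]


# Constituents in Turkish order, already translated into English.
turkish_order = [
    ("SUBJ", "The man"),
    ("ADJCT", "at home"),
    ("OBJ", "a book"),
    ("V", "was reading"),
]

print(" ".join(reorder_constituents(turkish_order)))
```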

26 Challenges
Ambiguities
koyun
1. sheep (or bosom)
2. your bay
3. your dark (one)
4. of the bay
5. put!

27 Challenges
Ambiguities
silahını evine koy
1. put your gun to your home
2. put your gun to his home
3. put his gun to your home
4. put his gun to his home
5. put your gun to her home
6. put her gun to your home
7. put her gun to her home

28 Challenges
Ambiguities
kitabın kapağı
1. the book's cover
2. book's cover
3. the cover of the book

29 Challenges
Ambiguities
• ev+Dative (gitti) → (went) to the house
• masa+Dative (çıktı) → (jumped) on the table
• adam+Dative (baktı) → (looked) at the man

30 Challenges
• Morphological differences: Use morphological analysis on the Turkish side and generation on the English side
• Structural differences: Transfer rules can represent such transformations
• Ambiguities: An English language model can determine the most probable translation statistically

31 The Avenue Transfer System
The Avenue Project, initiated by the CMU LTI group, provides
• a grammar formalism, which allows one to manually create a parallel grammar between two languages, and
• a transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar

32 Overview of Our Approach
[Figure: system pipeline. Turkish sentence → Morphological Analyzer → Preprocessor (analysis) → Avenue Transfer Engine with transfer rules → English translations → English Language Model → most probable English translation. Intermediate representation: lattice.]

33 I. Analysis and Preprocessing
Morphological analyses of each word: a set of features describing the structural properties of the word
Example sentence: adam evde oğlunu yendi

34 I. Analysis and Preprocessing
Lattice representation of the sentence
• adam: ada+N+P1Sg | adam+N+PNon
• evde: ev+N+Loc
• oğlunu: oğul+N+P2Sg | oğul+N+P3Sg
• yendi: ye+V+Pass+V+Past | yen+N Zero+V+Past | yen+V+Past
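A minimal sketch of why the lattice matters: each word contributes several analyses, and every path through the lattice is one candidate reading of the sentence. The analyses below are copied from the slide; `LATTICE` and `enumerate_paths` are hypothetical names, and a plain list of alternatives per word is an assumed simplification of the real lattice structure.

```python
from itertools import product

# One list of alternative analyses per word of "adam evde oğlunu yendi",
# as listed on the slide.
LATTICE = [
    ["ada+N+P1Sg", "adam+N+PNon"],
    ["ev+N+Loc"],
    ["oğul+N+P2Sg", "oğul+N+P3Sg"],
    ["ye+V+Pass+V+Past", "yen+N Zero+V+Past", "yen+V+Past"],
]


def enumerate_paths(lattice):
    """Return every combination that picks one analysis per word."""
    return [list(path) for path in product(*lattice)]


paths = enumerate_paths(LATTICE)
print(len(paths))  # 2 * 1 * 2 * 3 = 12 candidate readings
for path in paths[:3]:
    print(" ".join(path))
```

The number of paths grows multiplicatively with sentence length, which is consistent with the later remark that translation is computationally inefficient.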

35 I. Analysis and Preprocessing Representation of IGs

36-49 II. Transfer and Generation
[Figure sequence: step-by-step construction of parallel parse trees for "adam evde oğlunu yendi". The words are tagged N N N V and transferred lexically (man, house, son, won). The nouns are built into NPs ("the man", "at the house", "his son") labeled SUBJ, Adjct, and OBJ; the verb is built into Vc and then Vfin; finally the constituents are combined into S on both the Turkish and the English side.]

50 II. Transfer and Generation
[Figure: Turkish Adjunct over NP; English Adjunct over "at" + NP]
{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
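To make the rule's behavior concrete, here is a rough Python analogue, not Avenue code. It assumes that `=c` means "the feature must already be present with this value", that `(x1::y2)` aligns the Turkish NP with the second English element, and that `(y0 = x0)` copies the features upward. The dictionary layout and the name `apply_adjunct_rule` are invented for this sketch.

```python
def apply_adjunct_rule(np_node):
    """Mimic Adjunct::Adjunct : [NP] -> ["at" NP] on a toy NP node.

    Returns the English-side Adjunct, or None when a constraint fails.
    """
    features = np_node["features"]
    # ((x1 CASE) =c Loc): the NP must carry locative case.
    if features.get("CASE") != "Loc":
        return None
    # ((x1 poss) =c yes): the poss feature must be present and equal to yes.
    if features.get("poss") != "yes":
        return None
    return {
        "label": "Adjunct",
        # (x1::y2): the translated NP becomes the second English element.
        "children": ["at", np_node["english"]],
        # (x0 = x1), (y0 = x0): features are passed up unchanged.
        "features": dict(features),
    }


# Toy inputs; the feature values are assumptions chosen to exercise the rule.
locative_np = {"english": "the house", "features": {"CASE": "Loc", "poss": "yes"}}
accusative_np = {"english": "his son", "features": {"CASE": "Acc", "poss": "yes"}}

print(apply_adjunct_rule(locative_np)["children"])
print(apply_adjunct_rule(accusative_np))
```

A rule that fails its constraints simply does not apply, leaving other rules to cover the constituent.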

51 II. Transfer and Generation
[Figure: Vc and Vfin nodes on the Turkish and English sides]
;; yendi -> won
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  ;Analysis
  (x0 = x1)
  ;Constraints
  ((x1 lex) =c (*or* "yen" ...))
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  ;Transfer
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  ;Generation
  (y0 = y1)
)

52 III. Decoding
The transfer engine outputs n translations T1, ..., Tn
We use an English language model to calculate the probability of each translation, and pick the one with the highest language model score

53 III. Decoding

54 III. Decoding
Translation | Log Probability
My island beat your son at home |
My island beat his son at home |
The man beat your son at home |
The man beat his son at home |
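The decoding step can be illustrated with a toy bigram language model. This sketch assumes add-one smoothing and a three-sentence training corpus invented for the example; the thesis does not state which language model or data were used, and `TOY_CORPUS`, `train_bigram_counts`, `log_probability`, and `pick_best` are hypothetical names.

```python
import math
from collections import Counter

# Tiny invented training corpus (assumption, for illustration only).
TOY_CORPUS = [
    "the man beat his son",
    "the man went home",
    "his son was at home",
]

CANDIDATES = [
    "My island beat your son at home",
    "My island beat his son at home",
    "The man beat your son at home",
    "The man beat his son at home",
]


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram_counts(corpus):
    """Collect context counts, bigram counts, and the vocabulary size."""
    context_counts, bigram_counts, vocabulary = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        vocabulary.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, len(vocabulary)


def log_probability(sentence, model):
    """Sum add-one smoothed bigram log probabilities over the sentence."""
    context_counts, bigram_counts, vocab_size = model
    tokens = tokenize(sentence)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def pick_best(translations, model):
    """Return the translation with the highest language model score."""
    return max(translations, key=lambda t: log_probability(t, model))


model = train_bigram_counts(TOY_CORPUS)
print(pick_best(CANDIDATES, model))
```

Because the language model sees only English, it resolves the "ada/adam" and "your/his" ambiguities purely by fluency, which is the division of labor the hybrid approach relies on.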


56 Evaluation

57 MT Evaluation
Manual evaluation:
• SSER (subjective sentence error rate)
• Correct/Incorrect
Manual evaluations require human effort and time
Automatic evaluation:
• WER (word error rate)
• BLEU (Bilingual Evaluation Understudy)
• METEOR

58 Automatic Evaluation
Word Error Rate (WER)
• Number of insertions, deletions, and substitutions required to transform the reference translation into the system translation
BLEU
• Number of word n-grams shared between the system translation S and a set of reference translations
METEOR
• Similar to BLEU, but also considers roots and synonyms
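The WER definition above amounts to a word-level edit distance. The sketch below assumes whitespace tokenization and normalizes by the reference length, which is a common convention but is not stated on the slide; `edit_distance` and `word_error_rate` are hypothetical names.

```python
def edit_distance(reference, hypothesis):
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn the reference into the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by reference length (reference must be non-empty)."""
    return edit_distance(reference, hypothesis) / len(reference.split())


# Reference and system translation taken from the examples in this deck.
print(word_error_rate("at the shopping world", "in the shopping world"))
```

For this pair, one substitution over four reference words gives a WER of 0.25.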

59 Experimental Results
• System contains over 200 transfer rules, and lexical rules
• It can parse and translate challenging sentences
• Translations are sound, but not complete
• We tested the system on 192 noun phrases and 70 sentences
• BLEU score for noun phrases:
• BLEU score for sentences: 33.17

60 Examples
Noun phrase: siyahlarla birlikte bir protesto yürüyüşünde
Translation: in a protest walk with the blacks
Reference: in a protest walk with the blacks
Noun phrase: Elif'in arkasındaki kapıda
Translation: at the door at the back of Elif
Reference: on the door behind Elif
Noun phrase: alışveriş dünyasında
Translation: in the shopping world
Reference: at the shopping world

61 Examples
Sentence: Bu tutku zamanla bana acı vermeye başladı
Translation: This passion began to give pain to me with time
Reference: In time this passion began to give me pain
Sentence: Perşembe uzun yürüyüşler ve ziyaretler yapıyorum
Translation: I am doing long walks and visits on Thursday
Reference: On Thursdays I take long walks and make visits
Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
Translation: It grew more as escaping, it became a passion
Reference: He grew as he ran away, became an obsession

62 Conclusions & Future Work
A hybrid machine translation system from Turkish to English
• wide linguistic coverage by manually crafted transfer rules in Avenue
• ambiguities handled by English language model
• computationally inefficient translation
• time-consuming development
Future work
• further improvement of transfer rules
• learning rules automatically from a parallel corpus

63 Thank you!