Machine Translation
Diana Trandabăţ
Academic Year 2015-2016

Course overview
– Approaches to MT
– Language model
– Translation model
– Statistical modeling and IBM Models
– EM algorithm
– Word alignment
– Phrase-based translation
– Syntax-based translation
– Reordering
– Decoding
– Evaluation

Prerequisites WILL TO LEARN!!!

Minimum expectations
LEARN something:
– adequately use machine translation terminology
– create language models
– develop and/or implement translation models
– better presentation skills
DO something:
– assignments
– project
TEACH me something

Evaluation
– Laboratory: 100 points (attendance 10%, homework 90%)
– Project: 100 points
– Exam: 100 points (midterm and final exam)

Homework (roughly weekly)
– In-class delivery: 50% of the points for delivery in class; 50% for the submitted homework
– Late submissions: 100% of the points for on-time delivery, 80% for delivery 1 day late, 60% for delivery 2 days late, …
– Naming convention: MT_HomeworkNO_StudentName_ProgrammingLanguage
– Each implementation task is submitted with a short write-up (max. 1 page) describing implementation details, challenges, methods/solutions, errors, problems, etc.

Projects
We’ll get to that later…

What I expect you to know after today
– What machine translation is
– What statistical machine translation is
– Problems of machine translation
– We are not alone in the universe!?

How do humans translate?

Spend years learning a new language:
– memorizing words
– learning syntactic patterns
– exercising
– …
Use dictionaries and detailed world knowledge to:
– identify meaning
– find proper words to use in the new language
– produce a syntactically correct text
– preserve meaning
– …

What is machine translation?
Translation performed by a machine (computer).

How do machines translate?
Flowers are lovely!

How do machines translate?
Using available resources:
– electronic bilingual dictionaries
– templates, transfer rules
– thesaurus, WordNet, FrameNet, …
– parallel data, comparable data
Using available NLP tools:
– tokenizer, morphological analyzer, syntactic parser, …
More resources exist for major languages, fewer for “minor” languages.
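The simplest way to combine a tokenizer with an electronic bilingual dictionary is direct, word-by-word lookup. The sketch below applies it to the example sentence “Flowers are lovely!”; the three-entry English-French dictionary and the regular-expression tokenizer are hypothetical stand-ins, not real resources.

```python
import re

# Hypothetical three-entry English-French dictionary (illustration only).
DICTIONARY = {"flowers": "fleurs", "are": "sont", "lovely": "jolies"}

def tokenize(text):
    """Very simple tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def word_by_word(text):
    """Translate each token independently; unknown tokens (e.g. punctuation) pass through."""
    return " ".join(DICTIONARY.get(tok.lower(), tok) for tok in tokenize(text))

print(word_by_word("Flowers are lovely!"))  # fleurs sont jolies !
```

A fluent French rendering needs an article that has no counterpart in the English source (“Les fleurs sont jolies !”); independent word lookup cannot insert it.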

How do machines translate?

Statistical machine translation

– Take a very large data set of good translations
– Automatically infer a statistical model of translation from it
– Apply the translation model to new texts to guess a reasonable translation
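One way to “automatically infer a statistical model of translation” from good translations is relative-frequency (maximum-likelihood) estimation of word translation probabilities t(f|e). The sketch assumes, unrealistically, that the word alignment is known and strictly 1:1 and monotone; the four German-English sentence pairs are invented toy data.

```python
from collections import Counter, defaultdict

# Invented German-English sentence pairs (hypothetical toy data).
pairs = [("das haus", "the house"), ("das buch", "the book"),
         ("der mann", "the man"), ("ein buch", "a book")]

# Simplifying assumption: the i-th German word translates the i-th English word.
joint, e_count = Counter(), Counter()
for f_sent, e_sent in pairs:
    f_words, e_words = f_sent.split(), e_sent.split()
    assert len(f_words) == len(e_words), "toy data must be 1:1 aligned"
    for f, e in zip(f_words, e_words):
        joint[(f, e)] += 1
        e_count[e] += 1

# Relative-frequency estimate: t(f | e) = count(f, e) / count(e).
t = defaultdict(dict)
for (f, e), n in joint.items():
    t[e][f] = n / e_count[e]

print(t["the"])  # {'das': 0.6666666666666666, 'der': 0.3333333333333333}
```

Real systems cannot assume the alignment; they have to infer it from sentence-aligned data as well.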

Noisy channel

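Language model P(e)
– takes care of fluency in the target language
– data: corpora in the target language
Translation model P(f|e)
– lexical, faithful correspondence between the languages
– data: aligned corpora in the source and target languages
argmax
– search, done by the decoder: for a source sentence f, find the target sentence ê = argmax_e P(e|f) = argmax_e P(e) · P(f|e) (Bayes' rule; P(f) is constant for a given input)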
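A toy version of the noisy-channel decision: score each candidate English sentence e for the German input f by log P(e) + log P(f|e) and keep the best. All probabilities are invented; the bigram language model uses a crude floor instead of real smoothing, and the translation model is a simplified IBM-Model-1-style formula (uniform alignments, no NULL word) chosen for brevity, not a description of any particular system.

```python
import math

# Hypothetical toy models: every number below is invented for illustration only.

# Bigram language model P(e): P(word | previous word); "<s>"/"</s>" mark sentence boundaries.
BIGRAMS = {
    ("<s>", "the"): 0.3, ("the", "house"): 0.2, ("the", "home"): 0.05,
    ("house", "is"): 0.3, ("home", "is"): 0.3,
    ("is", "small"): 0.1, ("is", "little"): 0.05,
    ("small", "</s>"): 0.3, ("little", "</s>"): 0.3,
}
UNSEEN = 1e-4  # crude floor for bigrams missing from the table (not real smoothing)

# Lexical translation table t(f | e): German word f given English word e.
T = {
    ("das", "the"): 0.7, ("haus", "house"): 0.8, ("haus", "home"): 0.3,
    ("ist", "is"): 0.8, ("klein", "small"): 0.6, ("klein", "little"): 0.4,
}

def log_lm(e):
    """log P(e) under the bigram model."""
    words = ["<s>"] + e + ["</s>"]
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(words, words[1:]))

def log_tm(f, e):
    """log P(f | e), simplified Model-1 style: uniform alignments, no NULL word."""
    total = -len(f) * math.log(len(e))
    for fw in f:
        # Sum over all English words fw could align to (assumes at least one t > 0).
        total += math.log(sum(T.get((fw, ew), 0.0) for ew in e))
    return total

def decode(f, candidates):
    """Noisy-channel choice: argmax over candidates of log P(e) + log P(f | e)."""
    return max(candidates, key=lambda e: log_lm(e) + log_tm(f, e))

source = "das haus ist klein".split()
candidates = [c.split() for c in [
    "the house is small", "the home is small", "small is the house", "the house is little",
]]
best = decode(source, candidates)
print(" ".join(best))  # the house is small
```

Because this translation model ignores word order, “small is the house” gets exactly the same P(f|e) as “the house is small”; only the language model rejects it, which mirrors the split between fluency (P(e)) and faithfulness (P(f|e)).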

Accurate vs. Fluent
It is often impossible to produce a true translation, i.e. one that is both:
– faithful to the source language, and
– fluent in the target language.
Japanese: “fukaku hansei shite orimasu”
– Fluent translation: “we apologize”
– Faithful translation: “we are deeply reflecting (on our past behaviour, and what we did wrong, and how to avoid the problem next time)”
We need to compromise between faithfulness and fluency.
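One common way to make this compromise explicit, used in log-linear SMT models, is to weight the language-model (fluency) and translation-model (faithfulness) scores before adding them. The log-probabilities below are invented for the two translations of the Japanese example, and the weight parameters are hypothetical.

```python
# Hypothetical log-probabilities (invented numbers) for the two translations
# of "fukaku hansei shite orimasu".
CANDIDATES = {
    "we apologize": {"log_lm": -6.0, "log_tm": -14.0},  # fluent, loses nuance
    "we are deeply reflecting on our past behaviour": {"log_lm": -22.0, "log_tm": -5.0},  # faithful, awkward
}

def best(weight_lm, weight_tm):
    """Return the candidate with the highest weighted sum of LM and TM log-scores."""
    def score(e):
        return weight_lm * CANDIDATES[e]["log_lm"] + weight_tm * CANDIDATES[e]["log_tm"]
    return max(CANDIDATES, key=score)

print(best(1.0, 1.0))  # we apologize  (equal weights: the plain noisy-channel product)
print(best(0.2, 1.0))  # we are deeply reflecting on our past behaviour
```

With equal weights the score is just log P(e) + log P(f|e); lowering the language-model weight shifts the choice toward the more faithful translation.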

Question What is your input on clients which sell pharmaceuticals in Europe?

Group activity

CENTAURI | ARCTURAN
Ok-voon ororok sprok. | At-voon bichat dat.
Ok-drubel ok-voon anok plok sprok. | At-drubel at-voon pippat rrat dat.
Erok sprok izok hihok ghirok. | Totat dat arrat vat hilat.
Ok-voon anok drok brok jok. | At-voon krat pippat sat lat.
Wiwok farok izok stok. | Totat jjat quat cat.
Lalok sprok izok jok stok. | Wat dat krat quat cat.
Lalok farok ororok lalok sprok izok enemok. | Wat jjat bichat wat dat vat eneat.
Lalok brok anok plok nok. | Iat lat pippat rrat nnat.
Wiwok nok izok kantok ok-yurp. | Totat nnat quat oloat at-yurp.
Lalok mok nok yorok ghirok clok. | Wat nnat gat mat bat hilat.
Lalok nok crrrok hihok yorok zanzanok. | Wat nnat arrat mat zanzanat.
Lalok rarok nok izok hihok mok. | Wat nnat forat arrat vat gat.
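One way to start decoding the corpus above is to count which words occur in the same sentence pairs. The sketch scores each Centauri/Arcturan word pair with the Dice coefficient; this particular association measure is an illustrative choice, not something prescribed by the activity.

```python
from collections import Counter

# Centauri/Arcturan sentence pairs from the activity (lower-cased, final periods dropped).
PAIRS = [
    ("ok-voon ororok sprok", "at-voon bichat dat"),
    ("ok-drubel ok-voon anok plok sprok", "at-drubel at-voon pippat rrat dat"),
    ("erok sprok izok hihok ghirok", "totat dat arrat vat hilat"),
    ("ok-voon anok drok brok jok", "at-voon krat pippat sat lat"),
    ("wiwok farok izok stok", "totat jjat quat cat"),
    ("lalok sprok izok jok stok", "wat dat krat quat cat"),
    ("lalok farok ororok lalok sprok izok enemok", "wat jjat bichat wat dat vat eneat"),
    ("lalok brok anok plok nok", "iat lat pippat rrat nnat"),
    ("wiwok nok izok kantok ok-yurp", "totat nnat quat oloat at-yurp"),
    ("lalok mok nok yorok ghirok clok", "wat nnat gat mat bat hilat"),
    ("lalok nok crrrok hihok yorok zanzanok", "wat nnat arrat mat zanzanat"),
    ("lalok rarok nok izok hihok mok", "wat nnat forat arrat vat gat"),
]

# Each sentence as a set of word types, so a repeated word counts once per sentence.
pairs = [(set(c.split()), set(a.split())) for c, a in PAIRS]

c_count = Counter(w for c, _ in pairs for w in c)  # sentences containing each Centauri word
a_count = Counter(w for _, a in pairs for w in a)  # sentences containing each Arcturan word
co_count = Counter((cw, aw) for c, a in pairs for cw in c for aw in a)  # pairs containing both

def dice(cw, aw):
    """Dice coefficient: 2 * joint count / (count of cw + count of aw)."""
    return 2 * co_count[(cw, aw)] / (c_count[cw] + a_count[aw])

def best_translation(cw):
    """Arcturan word with the highest Dice score for the Centauri word cw."""
    return max(a_count, key=lambda aw: dice(cw, aw))

for word in ["sprok", "ok-voon"]:
    guess = best_translation(word)
    print(word, "->", guess, round(dice(word, guess), 2))
```

Both pairs score 1.0 because the words occur in exactly the same sentence pairs. For a word seen only once, such as ok-yurp, several Arcturan words (oloat, at-yurp) tie, and co-occurrence alone cannot decide between them.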

What we’ve learned
– Direct (word-by-word) translation
– Reordering
– Different word alignments: 1:1, 0:1, 1:0, etc.
– Translation model
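A word alignment can be written as a set of links between source and target word positions; the alignment types listed above then follow from counting links per word. The German-English sentence pair and its alignment below are a hand-made, hypothetical example.

```python
# Hypothetical hand-made alignment, for illustration only.
source = "ich gehe ja nicht zum haus".split()
target = "I do not go to the house".split()
# Alignment links as (source index, target index) pairs.
links = {(0, 0), (1, 3), (3, 2), (4, 4), (4, 5), (5, 6)}

def link_types(source, target, links):
    """Label each source word by how many target words it links to (1:0, 1:1, 1:2, ...)
    and list the target words with no link at all (0:1)."""
    labels = [(word, f"1:{sum(1 for s, _ in links if s == i)}")
              for i, word in enumerate(source)]
    unaligned_targets = [w for j, w in enumerate(target)
                         if all(t != j for _, t in links)]
    return labels, unaligned_targets

labels, inserted = link_types(source, target, links)
print(labels)    # [('ich', '1:1'), ('gehe', '1:1'), ('ja', '1:0'), ('nicht', '1:1'), ('zum', '1:2'), ('haus', '1:1')]
print(inserted)  # ['do']
```

Here “ja” is dropped (1:0), “do” is inserted (0:1), and “zum” covers two English words (“to the”); the word order also differs (“gehe … nicht” vs. “not go”), which is the reordering problem.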

Question What is your input on clients which sell pharmaceuticals in Europe?

References
– Philipp Koehn. Statistical Machine Translation. Cambridge University Press, 2009. xii, 433 pp.
– Yorick Wilks. Machine Translation: Its Scope and Limits. New York: Springer, 2009. x, 252 pp.
– John Hutchins. “Machine translation: general overview”. Chapter 27 in R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics. Oxford, 2004.
– Harold Somers. “Machine Translation”. Chapter 13 in R. Dale, H. Moisl & H. Somers (eds.), Handbook of Natural Language Processing. New York: Marcel Dekker, 2000.
– Nico Weber (ed.). Machine Translation: Theory, Applications, and Evaluation. An Assessment of the State-of-the-Art. St. Augustin: Gardez! Verlag, 1998.
– Kishore Papineni et al. “Bleu: a Method for Automatic Evaluation of Machine Translation”. ACL '02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 2002.

“One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ” Warren Weaver (1947)

See you next time!