Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.

Slides:



Advertisements
Similar presentations
Machine Translation. Can you imagine working as a translator without the help of computer?
Advertisements

The Chinese Room: Understanding and Correcting Machine Translation This work has been supported by NSF Grants IIS Solution: The Chinese Room Conclusions.
Machine Translation II How MT works Modes of use.
Computational language: week 10 Lexical Knowledge Representation concluded Syntax-based computational language Sentence structure: syntax Context free.
A Syntactic Translation Memory Vincent Vandeghinste Centre for Computational Linguistics K.U.Leuven
Bilingual Dictionaries
CALTS, UNIV. OF HYDERABAD. SAP, LANGUAGE TECHNOLOGY CALTS has been in NLP for over a decade. It has participated in the following major projects: 1. NLP-TTP,
LING NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006.
The Challenges of Multilingual Search Paul Clough The Information School University of Sheffield ISKO UK conference 8-9 July 2013.
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004.
Machine Translation Anna Sågvall Hein Mösg F
Introduction to Computational Linguistics Lecture 2.
SB Program University of Jyväskylä Machine Translation Research Seminar on Software Business Antti Ilmo.
Machine Translation Prof. Alexandros Potamianos Dept. of Electrical & Computer Engineering Technical University of Crete, Greece May 2003.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Evaluating an MT French / English System Widad Mustafa El Hadi Ismaïl Timimi Université de Lille III Marianne Dabbadie LexiQuest - Paris.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
The LC-STAR project (IST ) Objectives: Track I (duration 2 years) Specification and creation of large word lists and lexica suited for flexible.
1 Statistical NLP: Lecture 13 Statistical Alignment and Machine Translation.
Language Translators By: Henry Zaremba. Origins of Translator Technology ▫1954- IBM gives a demo of a translation program called the “Georgetown-IBM experiment”
MACHINE TRANSLATION TRANSLATION(5) LECTURE[1-1] Eman Baghlaf.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
MACHINE TRANSLATION A precious key to communicate beyond linguistic barriers 1.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Machine translation Context-based approach Lucia Otoyo.
Linguistics, Pragmatics & Natural Grammar
Computational Methods to Vocalize Arabic Texts H. Safadi*, O. Al Dakkak** & N. Ghneim**
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Computational Linguistics INTroduction
Week 9: resources for globalisation Finish spell checkers Machine Translation (MT) The ‘decoding’ paradigm Ambiguity Translation models Interlingua and.
Evaluation of the Statistical Machine Translation Service for Croatian-English Marija Brkić Department of Informatics, University of Rijeka
Globalisation and machine translation Machine Translation (MT) The ‘decoding’ paradigm Ambiguity Translation models Interlingua and First Order Predicate.
1 Natural Language Processing Gholamreza Ghassem-Sani Fall 1383.
Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture. CS 479, section 1: Natural Language Processing.
Natural Language Processing Introduction. 2 Natural Language Processing We’re going to study what goes into getting computers to perform useful and interesting.
Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin.
Chapter 6. Semantics is the study of the meaning of words, phrases and sentences. In semantic analysis, there is always an attempt to focus on what the.
Postgraduate Diploma in Translation Lecture 1 Computers and Language.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
Approaches to Machine Translation CSC 5930 Machine Translation Fall 2012 Dr. Tom Way.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
Grammars Grammars can get quite complex, but are essential. Syntax: the form of the text that is valid Semantics: the meaning of the form – Sometimes semantics.
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Designing a Machine Translation Project Lori Levin and Alon Lavie Language Technologies Institute Carnegie Mellon University CATANAL Planning Meeting Barrow,
Jan 2005CSA4050 Machine Translation II1 CSA4050: Advanced Techniques in NLP Machine Translation II Direct MT Transfer MT Interlingual MT.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 1 (03/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Introduction to Natural.
Utkal University We Work On Image Processing Speech Processing Knowledge Management.
Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell Language Technologies.
Avenue Architecture Learning Module Learned Transfer Rules Lexical Resources Run Time Transfer System Decoder Translation Correction Tool Word- Aligned.
For Monday Read chapter 26 Homework: –Chapter 23, exercises 8 and 9.
NATURAL LANGUAGE PROCESSING
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Natural Language Processing Tasneem Ghnaimat Spring 2013.
INTRODUCTION TO APPLIED LINGUISTICS
Jan 2012MT Architectures1 Human Language Technology Machine Translation Architectures Direct MT Transfer MT Interlingual MT.
Introduction to Machine Translation
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
Translation Tutorial Lauri Carlson MOLTO kickoff meeting Barcelona March 2010.
English-Korean Machine Translation System
Approaches to Machine Translation
Introduction to Machine Translation
Approaches to Machine Translation
Introduction to Machine Translation
Artificial Intelligence 2004 Speech & Natural Language Processing
Presentation transcript:

Machine Translation Dr. Radhika Mamidi

What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software to translate text or speech in between natural languages Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another.

Translation process The translation process can be stated as: Decoding the meaning of the source text, and Re-encoding this meaning in the target language =Analyze the source text for meaning and generate the meaning in target language.

Five types of knowledge used in the translation process: Knowledge of the source language, which allows us to understand the original text. Knowledge of the target language, which makes it possible to produce a coherent text in that language. Knowledge of equivalents between the source and target languages. Knowledge of the subject field as well as general knowledge, both of which aid comprehension of the source language text. Knowledge of socio-cultural aspects, that is, of the customs and conventions of the source and target cultures.

In other words, we need… in-depth knowledge of both the grammar, semantics, syntax, idioms and the like of the source language, as well as the culture of its speakers the same in-depth knowledge of target language is needed to re-encode the meaning in the target language

The challenge How to program a computer to "understand" a text as a human being does Also to "create" a new text in the source language that "sounds" as if it has been written by a human

Some issues Languages have different word orders/structures. Some languages are morphologically complex. Some languages do not have determiners. Identifying and finding equivalents of idioms, collocations, phrasal verbs etc. Lexical gaps Lexical ambiguity Structural ambiguity

History The Georgetown experiment in 1954 [after the second world war] involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of significant funding for machine translation research. After the ALPAC report in 1966, which found that the ten years long research had failed to fulfill the expectations, the funding was dramatically reduced. ALPAC = Automatic Language Processing Advisory Committee

History (contd.) Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation. Today there are many software programs for translating natural language, several of them online, such as the SYSTRAN system which powers both Google translate and the AltaVista's Babelfish.

Compromise It would be absurd to claim that a machine could produce a target text of the same quality as that of a human being A machine can do the first stage of translation automatically and then human beings can revise and edit the translation. –Machine/Computer Aided Translation systems –Human Aided Machine Translation systems

Currently, the product of machine translation is sometimes called a "gisting translation”. MT will often produce only a rough translation that will at best allow the reader to "get the gist" of the source text. It may not convey a complete understanding of it. The user may find the raw translation sufficiently useful as it is. Despite their inherent limitations, MT programs are currently used by various organisations around the world.

Very useful for… Translating user manuals Translating UN, EU documents Translating instructions Translating web pages Limited domain

Approaches Source Text Target Text Direct Translation Transfer Interlingua AnalysisGeneration

Methods 1. Rule based: Words will be translated in a linguistic way — the most suitable words of the target language will replace the ones in the source language This method requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Methods 2. Corpus based: Usually statistics is used to find the equivalents in two languages from the parallel corpus for a given word, phrase or sentence. Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker.

Types of MT systems 1.Dictionary based MT Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary does — word by word, usually without much correlation of meaning between them.

Types of MT systems 2. Statistical machine translation The machine translation tries to generate translations using statistical methods based on bilingual text corpora 3. Example-based machine translation The EBMT approach is often characterised by its use of a bilingual corpus as its main knowledge base, at run-time. It is essentially a translation by analogy

Evaluation Evaluating the MT system is essential. There are various methods for evaluating the performance of machine translation systems: –The oldest method is by using human judges to tell the quality of a translation, –Automated methods include BLEU, NIST and METEOR.

Online available MT software Alpha Works Emerging Technologies Systran Language Translation Technologies Free Professional Translation Links to Online Translators and Machine Translation Software

Problem -1 Word Sense Disambiguation: Identifying the correct meaning of a polysemous word. Translating with the correct meaning of the word. Example: Telugu-English equivalents perugu = yogurt [N], grow [V] cheyyi = hand [N], do [V] pallu = fruits [N], teeth [N]

Problem 2 Anaphora Resolution: Identifying the NP the pronoun refers to. Translating correctly the pronoun. Examples: Sam went to market. He met Harry. He gave him his telephone number. John and Mary saw some cars. They were new models. John and Mary saw some cars. They were excited.

Problem 3 Structural ambiguity: A sentence having at least two structures, each structure having a different meaning. Examples: They are cooking apples. Flying planes can be dangerous. Susan wrote a letter to her daughter from London.

Problem 4 Translating idioms, collocations, multi-word expressions, frozen expressions, phrasal verbs etc. Problem 5 Translating verbalized nouns as in English He nailed the picture on the wall = He hung the picture on the wall with a nail. He axed him to death = He killed him with an axe.

Assignment 3 Due date: 3 rd June, 2009 Step 1: Choose an English text of five sentences [This text should contain at least some challenging problems that we have discussed] Step 2: Use Systrans/Google translate for translating this text into Arabic Step 3:Use the translated output in Arabic as the input now. Step 4: Translate this back into English Is the original English text and the translated English text the same? Comment where the translation went wrong. [Important: Your comments are very important. So don’t copy from each other.]