Presentation is loading. Please wait.

Presentation is loading. Please wait.

Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.

Similar presentations


Presentation on theme: "Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software."— Presentation transcript:

1 Machine Translation Dr. Radhika Mamidi

2 What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software to translate text or speech in between natural languages Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another.

3 Translation process The translation process can be stated as: Decoding the meaning of the source text, and Re-encoding this meaning in the target language =Analyze the source text for meaning and generate the meaning in target language.

4 Five types of knowledge used in the translation process: Knowledge of the source language, which allows us to understand the original text. Knowledge of the target language, which makes it possible to produce a coherent text in that language. Knowledge of equivalents between the source and target languages. Knowledge of the subject field as well as general knowledge, both of which aid comprehension of the source language text. Knowledge of socio-cultural aspects, that is, of the customs and conventions of the source and target cultures.

5 In other words, we need… in-depth knowledge of both the grammar, semantics, syntax, idioms and the like of the source language, as well as the culture of its speakers the same in-depth knowledge of target language is needed to re-encode the meaning in the target language

6 The challenge How to program a computer to "understand" a text as a human being does Also to "create" a new text in the source language that "sounds" as if it has been written by a human

7 Some issues Languages have different word orders/structures. Some languages are morphologically complex. Some languages do not have determiners. Identifying and finding equivalents of idioms, collocations, phrasal verbs etc. Lexical gaps Lexical ambiguity Structural ambiguity

8 History The Georgetown experiment in 1954 [after the second world war] involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of significant funding for machine translation research. After the ALPAC report in 1966, which found that the ten years long research had failed to fulfill the expectations, the funding was dramatically reduced. ALPAC = Automatic Language Processing Advisory Committee

9 History (contd.) Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation. Today there are many software programs for translating natural language, several of them online, such as the SYSTRAN system which powers both Google translate and the AltaVista's Babelfish.

10 Compromise It would be absurd to claim that a machine could produce a target text of the same quality as that of a human being A machine can do the first stage of translation automatically and then human beings can revise and edit the translation. –Machine/Computer Aided Translation systems –Human Aided Machine Translation systems

11 Currently, the product of machine translation is sometimes called a "gisting translation”. MT will often produce only a rough translation that will at best allow the reader to "get the gist" of the source text. It may not convey a complete understanding of it. The user may find the raw translation sufficiently useful as it is. Despite their inherent limitations, MT programs are currently used by various organisations around the world.

12 Very useful for… Translating user manuals Translating UN, EU documents Translating instructions Translating web pages Limited domain

13 Approaches Source Text Target Text Direct Translation Transfer Interlingua AnalysisGeneration

14 Methods 1. Rule based: Words will be translated in a linguistic way — the most suitable words of the target language will replace the ones in the source language This method requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

15 Methods 2. Corpus based: Usually statistics is used to find the equivalents in two languages from the parallel corpus for a given word, phrase or sentence. Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker.

16 Types of MT systems 1.Dictionary based MT Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary does — word by word, usually without much correlation of meaning between them.

17 Types of MT systems 2. Statistical machine translation The machine translation tries to generate translations using statistical methods based on bilingual text corpora 3. Example-based machine translation The EBMT approach is often characterised by its use of a bilingual corpus as its main knowledge base, at run-time. It is essentially a translation by analogy

18 Evaluation Evaluating the MT system is essential. There are various methods for evaluating the performance of machine translation systems: –The oldest method is by using human judges to tell the quality of a translation, –Automated methods include BLEU, NIST and METEOR.

19 Online available MT software Alpha Works Emerging Technologies http://www.alphaworks.ibm.com/aw.nsf/html/mt Systran Language Translation Technologies http://www.systransoft.com/index.html Free Professional Translation http://www.freetranslation.com/ 100 Links to Online Translators and Machine Translation Software http://www.bultra.com/mtlinks.htm

20 Problem -1 Word Sense Disambiguation: Identifying the correct meaning of a polysemous word. Translating with the correct meaning of the word. Example: Telugu-English equivalents perugu = yogurt [N], grow [V] cheyyi = hand [N], do [V] pallu = fruits [N], teeth [N]

21 Problem 2 Anaphora Resolution: Identifying the NP the pronoun refers to. Translating correctly the pronoun. Examples: Sam went to market. He met Harry. He gave him his telephone number. John and Mary saw some cars. They were new models. John and Mary saw some cars. They were excited.

22 Problem 3 Structural ambiguity: A sentence having at least two structures, each structure having a different meaning. Examples: They are cooking apples. Flying planes can be dangerous. Susan wrote a letter to her daughter from London.

23 Problem 4 Translating idioms, collocations, multi-word expressions, frozen expressions, phrasal verbs etc. Problem 5 Translating verbalized nouns as in English He nailed the picture on the wall = He hung the picture on the wall with a nail. He axed him to death = He killed him with an axe.

24 Assignment 3 Due date: 3 rd June, 2009 Step 1: Choose an English text of five sentences [This text should contain at least some challenging problems that we have discussed] Step 2: Use Systrans/Google translate for translating this text into Arabic Step 3:Use the translated output in Arabic as the input now. Step 4: Translate this back into English Is the original English text and the translated English text the same? Comment where the translation went wrong. [Important: Your comments are very important. So don’t copy from each other.]


Download ppt "Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software."

Similar presentations


Ads by Google