SB Program University of Jyväskylä Machine Translation Research Seminar on Software Business 21.5.2003 Antti Ilmo.


1 Machine Translation
SB Program, University of Jyväskylä
Research Seminar on Software Business, 21.5.2003
Antti Ilmo

2 Outline
• Introduction
• Translation and Machine Translation Techniques
• The Early Machine Translation Systems
• Problems of Machine Translation
• Proposed Solutions to the Problems
• Summary

3 Introduction
• The Internet and globalisation have increased the need to localize documentation and to support interaction between people of different nationalities
• Localization is expensive and time-consuming
• Machine translation is a potential solution
• But…

4 Introduction (2)
• MT quality is not good enough
  – language works on many levels
    – interpretation: a dictionary may give a meaning, but not how it is interpreted
      » the competence, experience and internal models of language users are important
    – local usage (e.g. Canadian French vs. French French): a translation may sound "wrong" in a dialect
    – typos: syntactic errors occur


6 What is translation?
• Preservation of the original text
  – stylistic and semantic characteristics
    – word-for-word
    – meaning-for-meaning
• Rules of language
  – e.g. the letters "c", "a" and "t" form a word only in the right order
• Translation process (translating) and translation product (translated text)
  – the concept of translation covers both
• The translator re-codes the message into a different language

7 MT Technology
• Machine Translation (MT)
  – the machine takes care of the translation process
• Machine-Aided Translation (MAT)
  – Machine-Assisted Human Translation (MAHT)
    – humans translate, machine assists
  – Human-Assisted Machine Translation (HAMT)
    – machine translates, humans assist
    – e.g. choosing the correct word from a dictionary
• Terminology Databanks (TD)
  – technical terminology
  – most commonly used nowadays

8 Linguistic Techniques
• Direct vs. indirect
  – direct uses word replacement
  – indirect tries to express a meaning
• Interlingua vs. transfer
  – the interlingua approach does not take into account variations in target languages
  – the transfer approach uses language-specific meaning
• Local vs. global
  – local scope uses word-level analysis
  – global scope analyses whole sentences or even larger units
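As an illustration of the direct, local technique, here is a minimal Python sketch of word replacement. The tiny English-to-French lexicon and the function name `direct_translate` are assumptions made for this example; they do not come from any of the systems discussed.

```python
# Hypothetical toy English-to-French lexicon (illustrative entries only).
LEXICON = {
    "the": "le",
    "white": "blanc",
    "cat": "chat",
    "sleeps": "dort",
}


def direct_translate(sentence: str) -> str:
    """Replace each word independently; unknown words pass through unchanged."""
    words = sentence.lower().split()
    return " ".join(LEXICON.get(word, word) for word in words)


print(direct_translate("The white cat sleeps"))
# prints "le blanc chat dort"; idiomatic French would be "le chat blanc dort"
```

Because each word is handled in isolation, the English word order is carried over unchanged. Fixing the adjective position would need at least sentence-level (global) analysis.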


10 Early Systems (GAT)
• Georgetown Automatic Translation
  – one of the earliest MT projects
    – development began in 1952, in use 1964-1979
  – physics texts from Russian to English
  – replacement of words
  – no real linguistic theory
    – "The spirit is willing, but the flesh is weak" translated to Russian and then back to English. The result: "The wine is agreeable, but the meat has spoiled"

11 Early Systems (CETA)
• Centre d’Etudes pour la Traduction Automatique
  – launched in 1961 in Grenoble
  – in use 1967-71
    – approximately 400,000 words translated
  – Russian to French
  – sentence-based analysis
  – interlingua and transfer mixed
    – grammatical level vs. dictionary level
  – Realization: the interlingua approach is not a good one

12 Early Systems (SYSTRAN)
• one of the first systems marketed
• installed in 1970 (US Air Force Foreign Technology Division)
• also used at NASA and EURATOM
• semantic features ad hoc
• negative feedback at first
• post-editing found to be a good approach
  – GM of Canada claimed the system sped up the work of human translators three to four times (3000-4000 words a day, approximately what a human translator now translates with the help of translation workbenches)

13 Early Systems (TAUM-METEO)
• TAUM-METEO was the first truly automatic MT system
• developed in the 1960s
• used by the Canadian Meteorological Center
  – scanned the network for English weather reports and translated them into French
• corrected its own errors without post-editors
  – forwarded offending content to human translators
• 24,000 words/day
• problems
  – communication noise
  – misspellings
  – words missing from the dictionary
• specialised language made translations possible
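The idea of translating what the restricted vocabulary covers and forwarding the rest to humans can be sketched as follows. This is not how TAUM-METEO was implemented; the vocabulary, the `route` function and the word-by-word output are all assumptions chosen to keep the example short.

```python
from typing import List, Tuple, Union

# Hypothetical restricted weather vocabulary (English -> French).
WEATHER_LEXICON = {
    "sunny": "ensoleillé",
    "cloudy": "nuageux",
    "today": "aujourd'hui",
    "tomorrow": "demain",
}


def route(report: str) -> Tuple[str, Union[str, List[str]]]:
    """Translate a report, or hand it to a human if any word is unknown."""
    words = report.lower().split()
    unknown = [word for word in words if word not in WEATHER_LEXICON]
    if unknown:
        # Misspellings and missing dictionary entries both end up here.
        return ("human", unknown)
    return ("machine", " ".join(WEATHER_LEXICON[word] for word in words))


print(route("Sunny today"))  # ('machine', "ensoleillé aujourd'hui")
print(route("Suny today"))   # ('human', ['suny'])
```

A misspelling such as "suny" looks exactly like a missing dictionary entry to this kind of check, which matches two of the problems listed on the slide.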


15 Problems
• Translation is not straightforward
  – it is not replacing words with words
  – word order
  – rewriting of text into another language
  – choosing the right words
  – e.g. imperative mood in English -> infinitive in French

16 Problems (2)
• Automation of translation is not easy
  – quality is poor
  – homographs
    – "fan": a ventilator or an enthusiast
  – different word classes
    – e.g. "love" is both a verb and a noun
    – "you" can be both singular and plural
  – idioms
    – e.g. "country music" meaning a type of music
  – personal pronouns
    – second-person pronouns may vary between familiar and formal situations
  – post-editing can also take more time than translating from scratch
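The ambiguity examples above can be made concrete with a small sense table. This is a sketch only: the `SENSES` inventory and both helper functions are hypothetical, and the glosses are simply the examples from the slide.

```python
from typing import Dict, List, Tuple

# Hypothetical sense inventory: word -> list of (word class, gloss).
SENSES: Dict[str, List[Tuple[str, str]]] = {
    "fan": [("noun", "ventilator"), ("noun", "enthusiast")],
    "love": [("verb", "to feel affection"), ("noun", "affection")],
    "you": [("pronoun", "second person singular"),
            ("pronoun", "second person plural")],
}


def candidate_senses(word: str) -> List[Tuple[str, str]]:
    """Return every sense the table lists for a word (empty if unknown)."""
    return SENSES.get(word.lower(), [])


def is_ambiguous(word: str) -> bool:
    """A word is ambiguous when the table alone offers more than one sense."""
    return len(candidate_senses(word)) > 1


for word in ("fan", "love", "you"):
    print(word, is_ambiguous(word), candidate_senses(word))
```

The lookup can list the candidates but has no information for choosing among them; that choice needs context beyond the single word.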

17 Problems (3)
• Morphological analysis
  – e.g. Chinese and Japanese are written without spaces between words
    – word boundaries are not marked by anything
• Syntactic analysis
  – modifiers are a problem
    – "The boy saw a girl with a telescope"
      » the girl had a telescope vs. the boy used a telescope to see a girl
• Analysis of context
  – 20-40 words in a sentence -> 100 million possible translations
• There are always going to be problem cases
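The slide does not say how the figure of 100 million was derived. One plausible reading, assumed here purely for illustration, is that the candidate translations of each word multiply across the sentence. The helper names below are hypothetical.

```python
from math import prod
from typing import Sequence


def count_combinations(candidates_per_word: Sequence[int]) -> int:
    """Number of word-by-word combinations if every choice were independent."""
    return prod(candidates_per_word)


def average_needed(total: float, sentence_length: int) -> float:
    """Average candidates per word that would yield `total` combinations."""
    return total ** (1 / sentence_length)


# Hypothetical 20-word sentence: ten words with 2 candidates, ten with 3.
print(count_combinations([2] * 10 + [3] * 10))  # 60466176
print(round(average_needed(1e8, 20), 2))        # 2.51
print(round(average_needed(1e8, 40), 2))        # 1.58
```

Under this assumption, an average of only about 1.6 to 2.5 candidates per word is enough to reach 100 million combinations for sentences of 40 to 20 words.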


19 AI-Based Approach
• Raman & Alwar 1990
• Conversations carried out across enquiry counters at railway stations in India
• The system should understand a text before translating it
  – the text is analysed to understand its meaning, which is stored in a language-free semantic map
  – semantic maps are used to generate translations
• The analyzer analyses one sentence at a time
  – unnecessary adjectives are not taken into account
  – morphological analysis first
  – building of the semantic map second
  – the stages work concurrently
  – a large dictionary is needed

20 AI-Based Approach (2)
• A natural language generator builds a sentence in the target language
  – the analyzer's result is fed into the generator
  – translate everything vs. leave something out
  – definition of structure
    – words in the right order and inflected correctly
  – minimal importance given to style
• Successful in a specific application and with a restricted set of sentences
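The analyzer-to-generator flow can be sketched with a toy example. Nothing here reproduces the system of Raman & Alwar: the single sentence pattern, the dictionary used as a "semantic map", the ignored-adjective list and the English/French templates are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical list of adjectives the analyzer skips as unnecessary.
IGNORED_ADJECTIVES = {"next", "fast"}

# Hypothetical generator templates, keyed by communicative act and language.
TEMPLATES = {
    "ask_departure_time": {
        "en": "When does the train to {destination} leave?",
        "fr": "Quand part le train pour {destination} ?",
    },
}


def analyze(sentence: str) -> Optional[Dict[str, str]]:
    """Build a language-free semantic map, or None if the sentence is not covered."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", sentence)]
    text = " ".join(w for w in words if w not in IGNORED_ADJECTIVES)
    match = re.fullmatch(r"when does the train to ([a-z]+) (?:leave|depart)", text)
    if match is None:
        return None
    return {"act": "ask_departure_time", "destination": match.group(1).capitalize()}


def generate(semantic_map: Dict[str, str], language: str) -> str:
    """Render the semantic map in the requested target language."""
    template = TEMPLATES[semantic_map["act"]][language]
    return template.format(destination=semantic_map["destination"])


meaning = analyze("When does the next train to Delhi leave?")
print(meaning)
print(generate(meaning, "fr"))  # Quand part le train pour Delhi ?
```

Because the generator only sees the semantic map, adding a target language means adding templates, not touching the analyzer. The price is that only sentences the analyzer's patterns cover can be translated, which fits the restricted application described on the slide.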

21 Interactive Approach
• Sen, Zhaoxiong and Heyan 1997
• Knowledge of MT systems is incomplete -> incorrect translations
• Possibility for an MT system to learn
  – quality should improve
• Interaction starts when the system finds a sentence that it cannot analyse properly
  – message to the user
  – the user responds with a coded message
    – updates the system's knowledge base
  – interaction is limited to three stages
    – lexical analysis
    – uncertain modifiers
    – multiple translations
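A minimal sketch of the learning loop, covering only the lexical stage. The function `translate_interactively`, the `ask_user` callback and the word-by-word output are assumptions for illustration; the coded message format of the original system is not described on the slide and is not modelled here.

```python
from typing import Callable, Dict, List


def translate_interactively(
    sentence: str,
    knowledge_base: Dict[str, str],
    ask_user: Callable[[str], str],
) -> str:
    """Translate word by word, asking the user about words the system lacks."""
    output: List[str] = []
    for word in sentence.lower().split():
        if word not in knowledge_base:
            # Message to the user; the answer updates the knowledge base.
            knowledge_base[word] = ask_user(word)
        output.append(knowledge_base[word])
    return " ".join(output)


# Usage with a scripted "user" in place of a real dialogue.
kb = {"the": "le", "train": "train"}
questions: List[str] = []


def scripted_user(word: str) -> str:
    questions.append(word)
    return {"arrives": "arrive"}[word]


print(translate_interactively("The train arrives", kb, scripted_user))  # le train arrive
print(translate_interactively("The train arrives", kb, scripted_user))  # no question asked this time
print(questions)  # ['arrives']
```

The second call asks nothing because the first answer was stored, which is the sense in which quality should improve over time.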

22 Multiple Translation Engines & Sentence Partitioning
• Ren, Shi and Kuroiwa 2000
• Multiple MT systems running in parallel
  – all use different MT techniques
  – a controller coordinates translating
  – each engine translates a sentence independently
  – the controller chooses the best translation
    – no proper translation -> sentence partitioning
    – the process starts again from the beginning
    – in the end the partitioned sentence is put back together

23 Multiple Translation Engines & Sentence Partitioning (2)
• Parallel processing should improve the success rate
  – a correct translation is preserved through the procedures
  – combining the best translations should improve quality
• Morphological analysis
  – the analysis gives results that are used as inputs for the engines
  – the engines are then run in parallel
  – if more than one result -> the number of engines increases
  – if no results -> the sentence is partitioned
    – problem of partitioning a sentence, e.g. Chinese & Japanese
• In a test situation with four engines the results improved dramatically
  – the time consumed doubled
  – a single MT system translated 45.6 % of sentences correctly; with multiple engines the result was 74.2 % (Japanese to Chinese)
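The controller logic can be sketched as below. Several details are assumptions, not taken from Ren, Shi and Kuroiwa: each engine returns a (translation, confidence) pair or None, the "best" translation is the one with the highest confidence, partitioning simply halves the word list, and the engines are called one after another instead of in parallel.

```python
from typing import Callable, List, Optional, Tuple

# An engine returns (translation, confidence), or None if it cannot translate.
Engine = Callable[[str], Optional[Tuple[str, float]]]


def translate(sentence: str, engines: List[Engine]) -> Optional[str]:
    """Pick the best engine result; if there is none, partition and retry."""
    results = [r for r in (engine(sentence) for engine in engines) if r is not None]
    if results:
        return max(results, key=lambda r: r[1])[0]
    words = sentence.split()
    if len(words) < 2:
        return None  # nothing left to partition
    middle = len(words) // 2
    left = translate(" ".join(words[:middle]), engines)
    right = translate(" ".join(words[middle:]), engines)
    if left is None or right is None:
        return None
    return f"{left} {right}"  # put the partitioned sentence back together


# Two hypothetical toy engines (English -> French).
def phrase_engine(text: str) -> Optional[Tuple[str, float]]:
    return {"good morning": ("bonjour", 0.9)}.get(text.lower())


def word_engine(text: str) -> Optional[Tuple[str, float]]:
    table = {"dear": ("chers", 0.6), "friends": ("amis", 0.6)}
    return table.get(text.lower())


print(translate("good morning dear friends", [phrase_engine, word_engine]))
# bonjour chers amis
```

Splitting on spaces only works for languages that mark word boundaries, which is why the slide names Chinese and Japanese as the hard case for partitioning.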


25 Summary
• A definitive solution is still to be found
• The biggest problems of MT are linguistic
  – it is very hard to cover all the rules and adjust them to all possible languages and variations
  – misspellings cause problems, which means a very good proof-reading function is needed
• There is a long way to go before MT systems replace human translators
• Machine translation can be used in applications where the language is very specific

