CS4025: Machine Translation
Computing Science, University of Aberdeen
• Background, how languages differ
• MT Techniques
• Controlled languages
For more info: J&M, chap. 21 in 1st ed., 25 in 2nd ed. Also extra notes.

Machine Translation
• Automatically translate texts between languages (e.g., English to Japanese)
  » Or assist human translators?
• One of the oldest dreams of NLP, AI, and CS (first system in 1954).

Varieties of Machine Translation
Translating from a source language to a target language.
• (FA)MT – (Fully Automatic) Machine Translation
• HAMT – Human-Aided MT (aid before or after)
• MAHT – Machine-Aided Human Translation

Brief History of MT
• Serious but naïve work in the 1950s
• 1966 ALPAC report (speed, cost, accuracy) terminated most research funding
• “Underground” MT systems developed into products (e.g. SYSTRAN) in the 1970s
• More MT products emerged in the 1980s and 1990s, though still relatively simple
• MT now in everyday widespread use (e.g. for web pages), in spite of its problems

Translation is Hard: Language differences
Lexical
• Meanings assigned to a word
  » to know a person
  » to know a fact
• Boundaries on a scale
  » friend vs acquaintance
• Preferences
  » sibling vs brother vs elder brother
• Gaps
  » Japanese has no word for privacy

Overlaps between word senses (Eng/Fr)

Syntactic differences
• Morphology vs word order
  » English: John saw Jane
  » Russian: John[+subject] saw Jane[+object]
• Word order within phrases
  » English: a cheap car
  » French: a car cheap
• Argument order (e.g. VSO/SVO/SOV languages)
  » English: John likes apples
  » Spanish: apples gustar John

Pragmatic differences
• Zero pronouns
  » Bake [] for 20 minutes
• Extra distinctions
  » Relative-status markers in Japanese
• Cultural knowledge
  » mu -> curtains of her bed, not just curtains

Translating from Chinese to English…
• Source (romanised): dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.
• Word-by-word gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
• Human translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Perfect Translation needs World Knowledge
Example: Translating “it” into a language which associates grammatical gender with nouns requires identifying the antecedent:
  » A hollow cylinder … rests on a surface … and an object is suspended so that it …

English    German     Gender      Pronoun
Surface    Flaeche    Feminine    sie
Cylinder   Zylinder   Masculine   er
Object     Objekt     Neuter      es

Approaches to MT

Direct Translation
No intermediate representation. Possibly morphological analysis and simple reordering principles.
• Input: [Japanese text]
• After word-by-word translation
  » I give PAST pen on desk John to
• After word-order, det rewrite rules
  » I give PAST the pen on the desk to John
• After morphology
  » I gave the pen on the desk to John
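
The rewriting stages on this slide can be sketched in a few lines of Python. This is a toy illustration rather than a real MT system: it starts from the word-by-word gloss shown above, and the rule tables (POSTPOSITIONS, COMMON_NOUNS, PAST_FORMS) and all function names are hypothetical, hand-written to cover only this one sentence.

```python
# Toy direct-translation pipeline. All rules are hypothetical and cover
# only the example sentence from the slide.
POSTPOSITIONS = {"to"}          # glossed words that follow their noun
COMMON_NOUNS = {"pen", "desk"}  # nouns that need a determiner in English
PAST_FORMS = {"give": "gave"}   # irregular past-tense forms


def reorder(tokens):
    """Word-order rule: move each postposition in front of the word it follows."""
    out = []
    for tok in tokens:
        if tok in POSTPOSITIONS and out:
            out.insert(len(out) - 1, tok)
        else:
            out.append(tok)
    return out


def add_determiners(tokens):
    """Determiner rule: put 'the' in front of every known common noun."""
    out = []
    for tok in tokens:
        if tok in COMMON_NOUNS:
            out.append("the")
        out.append(tok)
    return out


def apply_morphology(tokens):
    """Morphology: fuse 'verb PAST' into a single inflected verb form."""
    out = []
    for tok in tokens:
        if tok == "PAST" and out:
            verb = out.pop()
            out.append(PAST_FORMS.get(verb, verb + "ed"))
        else:
            out.append(tok)
    return out


def direct_translate(gloss):
    """Run the rewrite stages on a word-by-word gloss."""
    tokens = add_determiners(reorder(gloss.split()))
    return " ".join(apply_morphology(tokens))


print(direct_translate("I give PAST pen on desk John to"))
# I gave the pen on the desk to John
```

Every rule here is tied to one language pair and one sentence pattern, which is the main limitation of the direct approach.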

Direct Translation - Issues
• Completely tied to a language pair
  » Complete new system for each pair
• Problems dealing with ambiguity: Example (Russian-English)
  » My trebuem mira
  » We require world (direct translation)
  » We want peace (correct translation)
• Doesn’t need complex NLP
  » used in cheap translators
• Useful as a “default translation” if more complex techniques fail
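
The ambiguity problem can be made concrete with a small lexicon lookup. The sketch below is only an illustration: the LEXICON table holds just the senses mentioned on the slide, the ordering of senses is an assumption, and the function names are hypothetical.

```python
from itertools import product

# Toy Russian-English lexicon with the senses from the slide.
# Which sense is listed first is an assumption made for illustration.
LEXICON = {
    "my": ["we"],
    "trebuem": ["require", "want"],
    "mira": ["world", "peace"],
}


def translate_word_by_word(sentence):
    """Direct translation: take the first listed sense, ignoring context."""
    return " ".join(LEXICON[word][0] for word in sentence.lower().split())


def all_readings(sentence):
    """Every combination of senses; picking the right one needs context."""
    choices = [LEXICON[word] for word in sentence.lower().split()]
    return [" ".join(combo) for combo in product(*choices)]


print(translate_word_by_word("My trebuem mira"))  # we require world
print(all_readings("My trebuem mira"))
```

The correct reading, "we want peace", is among the four combinations, but nothing in a word-by-word system tells it which one to choose.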

Structural Transfer
• Three steps
  » parse input text (reusable)
  » rewrite parse tree into parse tree of new language (specific to language pair)
    – English NP -> Det Adj N becomes
    – French NP -> Det N Adj
  » generate output text (reusable)
• More in next lecture
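
The NP rewrite rule above can be sketched as a recursive tree transformation. This is a minimal illustration under stated assumptions: the tuple encoding of parse trees and the names transfer and leaves are hypothetical, parsing is skipped (the tree is written by hand), and only word order is transferred, not the words themselves.

```python
# A parse tree is (label, child, child, ...) for a phrase
# and (label, word) for a leaf. This encoding is an assumption.

def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)


def transfer(tree):
    """Rewrite English NP -> Det Adj N as French-style NP -> Det N Adj."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if label == "NP" and [child[0] for child in children] == ["Det", "Adj", "N"]:
        det, adj, noun = children
        children = [det, noun, adj]
    return (label, *children)


def leaves(tree):
    """Generation step (trivial here): read the words off the tree."""
    if is_leaf(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


english = ("NP", ("Det", "a"), ("Adj", "cheap"), ("N", "car"))
print(" ".join(leaves(transfer(english))))  # a car cheap
```

Because the rule is applied recursively, the same rewrite also fires on noun phrases nested inside larger sentences.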

Structural Transfer - Issues
• Most popular approach (?)
  » Used in Systran (Altavista translator)
• n*(n-1) transfer components needed for translation between n languages
• Good for syntax, less good for words, pragmatics
  » supplement with other techniques, such as statistical translation of individual words?

Interlingua Approach
• Two steps
  » full analysis of input text, into a meaning (interlingua)
    – e.g., know into KnowFact or KnowPerson
  » full generation of output text, from meaning
• Can’t be done except in a small domain
• Preserving ambiguity
  » if target language uses same word for KnowFact or KnowPerson, no need to disambiguate know

Interlingua Approach - Issues
• Interlingua must contain all aspects of meaning needed for all the languages (e.g. gender for Spanish cats)
• Interlingua must reflect all the different views on how the world is made up (e.g. Japanese “yasai” refers mostly to vegetables, but also mint but not carrots)
• For this to work, the domain must be restricted and the languages similar
• Translation between n languages only needs n analysis components and n generation components
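
The component counts quoted for the transfer and interlingua approaches can be compared directly. The short sketch below just evaluates the two formulas from the slides; the function names are hypothetical, and the transfer count covers only the pair-specific transfer modules, not the reusable parsers and generators.

```python
def transfer_components(n):
    """Transfer approach: one transfer module per ordered language pair."""
    return n * (n - 1)


def interlingua_components(n):
    """Interlingua approach: one analyser plus one generator per language."""
    return 2 * n


for n in (2, 3, 5, 10, 20):
    print(n, transfer_components(n), interlingua_components(n))
```

The two counts are equal at n = 3; the interlingua approach needs fewer components only when more than three languages are involved, and the gap then grows quadratically.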

Statistical Approach
• Noisy channel model for speech recognition: look for the sentence Sent that maximises P(Sig|Sent) * P(Sent)
• MT: look for the translation Sent that maximises P(Input|Sent) * P(Sent)
  » faithfulness * fluency??
  » P(Sent) - estimated using bigrams/trigrams
  » P(Input|Sent) - estimated by analysing a corpus of human-translated texts
    – e.g., how often is know translated as savoir (know fact) and how often as connaitre (know person)
    – Also model reordering, insertions, deletions
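
A minimal sketch of the noisy-channel decision rule follows. All probabilities are invented for illustration, the candidate list is hand-written rather than generated, and the names TRANSLATION_MODEL, BIGRAMS, UNSEEN and the functions are hypothetical. The English input is assumed to be "I know Marie"; the faithfulness model rates both French candidates equally, so the bigram fluency model has to decide.

```python
import math

# Hypothetical toy numbers, not estimated from any corpus.
# P(Input | Sent) for the English input "I know Marie".
TRANSLATION_MODEL = {
    "je sais Marie": 0.5,
    "je connais Marie": 0.5,
}

# Bigram probabilities for the French language model P(Sent).
BIGRAMS = {
    ("<s>", "je"): 0.5,
    ("je", "sais"): 0.2,
    ("je", "connais"): 0.2,
    ("sais", "Marie"): 0.01,
    ("connais", "Marie"): 0.1,
    ("Marie", "</s>"): 0.5,
}
UNSEEN = 1e-6  # crude floor for bigrams missing from the table


def log_p_sentence(sentence):
    """Fluency: log P(Sent) under the bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(words, words[1:]))


def score(candidate):
    """log P(Input|Sent) + log P(Sent), i.e. faithfulness times fluency."""
    return math.log(TRANSLATION_MODEL[candidate]) + log_p_sentence(candidate)


def best_translation(candidates):
    """Return the candidate that maximises P(Input|Sent) * P(Sent)."""
    return max(candidates, key=score)


print(best_translation(TRANSLATION_MODEL))  # je connais Marie
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.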

Statistical Approach - Issues
• P(Input|Sent)
  » Very hard to model situations where translation reorders material, even if this has a simple syntactic description
  » How “faithful” is a proposed output sentence to the original input text?
  » Less clear what this means once we go beyond translating individual words
  » Combine with direct techniques?

MT Performance
• Translating 100 sentences is trivial; the problems are all in the scaling-up.
  » Good dictionaries are key.
• Three uses
  » Fully automatic rough translation
    – like Altavista/Systran Babelfish
  » Draft translations which a human post-edits (humans can post-edit quickly as long as less than 20% of words need to be changed)
  » Tools for translators (MAHT)

Another approach to HAMT: Controlled Languages
• A controlled (simplified, basic) English is a subset of full English.
  » Limited vocabulary: repair but not fix
  » Limited syntax: I ate but not I have eaten
• Mainly used for technical documents
• Originally intended to make manuals easier for non-native speakers
• MT works much better if input is Controlled English

AECMA Simplified English
• (Emerging) standard for commercial aerospace industry.
• Designed by academic linguists as well as practitioners (technical authors).

AECMA: vocabulary
• Fixed vocabulary (2000 words?) with additions limited to specific areas (e.g., company names).
• Goal is “each word means only one thing”, and “each concept is expressed by only one word”.
• No ambiguity, no synonyms.

Example words
• Above: only use to indicate physical position
  » Legal: The wing is above the wheel
  » Illegal: The engine temperature is above normal
  » Legal: The engine temperature is more than normal
• Test: use as noun only
  » Legal: the system test
  » Illegal: Test the circuit.
  » Legal: Do a test on the circuit.

AECMA: Syntax
Rule: Forbid “unusual” English syntax
• Ex: only simple past, present, future tenses
  » Illegal: Any other information is to be ignored
  » Legal: Ignore any other information
• Ex: No gerunds
  » Illegal: Changing the light is dangerous.
  » Legal: It is dangerous to change the light.

AECMA: Syntax Examples (2)
• Only two noun-noun modifiers
  » Illegal: The aircraft door attachment bolt
  » Legal: The attachment bolt of the aircraft door
• Verbs and determiners must be included
  » Illegal: Rotary switch to INPUT
  » Legal: Set the rotary switch to INPUT

AECMA: Stylistic Rules
• Sentences should be 20 words or fewer.
• Paragraphs should be 6 sentences or fewer.
• Start warnings with a command
  » Illegal: The oil used in the engine contains toxic additives which may be absorbed through the skin.
  » Legal: Do not get the oil on your skin. It is poisonous.

Controlled-Language MT
• Much easier
  » No problems disambiguating words
  » Hard syntax is forbidden
  » May also prohibit/restrict pronouns
• Authors must write in CE
  » CE conformance checkers
• Lot of commercial interest
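
A conformance checker of the kind mentioned above could, in its simplest form, look like the sketch below. It is a hypothetical toy, not the AECMA tooling: it checks only the two length limits from the stylistic rules and one vocabulary substitution (repair, not fix) taken from the controlled-language slide. The names MAX_WORDS, MAX_SENTENCES, UNAPPROVED and check_paragraph are assumptions, and the sentence splitter is a crude regular expression.

```python
import re

MAX_WORDS = 20                 # words per sentence
MAX_SENTENCES = 6              # sentences per paragraph
UNAPPROVED = {"fix": "repair"}  # unapproved word -> approved alternative


def check_paragraph(paragraph):
    """Return a list of rule violations found in one paragraph."""
    problems = []
    # Crude sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    if len(sentences) > MAX_SENTENCES:
        problems.append(
            f"paragraph has {len(sentences)} sentences (limit {MAX_SENTENCES})"
        )
    for number, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z']+", sentence)
        if len(words) > MAX_WORDS:
            problems.append(
                f"sentence {number} has {len(words)} words (limit {MAX_WORDS})"
            )
        for word in words:
            approved = UNAPPROVED.get(word.lower())
            if approved:
                problems.append(
                    f"sentence {number}: use '{approved}' instead of '{word}'"
                )
    return problems


print(check_paragraph("Do not get the oil on your skin. It is poisonous."))
print(check_paragraph("Fix the circuit."))
```

Rules such as "start warnings with a command" or "no gerunds" need part-of-speech information, so a checker that enforces them would have to include at least a tagger or parser.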