

Administration
Introduction/Signup sheet
Course web site
Course location and time: Thursday, 1:30pm – 4:20pm, Robertson Hall 023
TA: Juan Carlos Niebles
– Office: 215 Computer Science Bldg.
– Phone: (609)
– jniebles [at] princeton
– Office hour: TBD or by appointment.
Suggested Reading List:
– (NSW) Readings in Machine Translation, S. Nirenburg, H. Somers and Y. Wilks, MIT Press, 2002
– (AT) Translation Engines: Techniques for Machine Translation, Arturo Trujillo, Springer, 1999
– (JM) Speech and Language Processing, Jurafsky and Martin, Prentice Hall
– (HS) An Introduction to Machine Translation, W. John Hutchins and Harold L. Somers, London: Academic Press
Assessment:
– Class participation and attendance: 15%
– Homework assignments: 20%
– Midterm exam: 30%
– Final exam/Term paper: 35%

Machine Translation
Srinivas Bangalore
AT&T Research, Florham Park, NJ 07932

The funnier side of translation…
In a Belgrade hotel elevator – “The lift is being fixed for the next day. During that time we regret that you will be unbearable”
In a Paris hotel lobby – “Please leave your values at the front desk”
On the menu of a Swiss restaurant – “Our wines leave you nothing to hope for”
Outside a Hong Kong tailor shop – “Ladies may have a fit upstairs”
In an advertisement by a Hong Kong dentist – “Teeth extracted by the latest Methodists”
In a Norwegian cocktail lounge – “Ladies are requested not to have children in the bar”
In a pet shop in Malaysia – “For hygienic purposes, do not feed your hand to the dog”
Machine Translation:
– “The spirit is willing but the flesh is weak” → Russian → “The vodka is good but the meat is rotten”
Source: the web

Outline
History of Machine Translation
Machine Translation Paradigms
Machine Translation Evaluation
Applications of Machine Translation

Early days of Machine Translation
Success in cryptography (code-breaking) during the war
– Source Text → Encoded Source Text → Transmit Text
– Receive Text → Decode Text → Target Text
Ciphers: algorithms to encode and decode
– Plain text → cipher text → decoded cipher text
– cat → dog; fog → bat; ?? → bog
Warren Weaver (1947)
– “When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
Ciphers are created to be hard to break, but are usually unambiguous.
Natural languages are not as simple!
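The cat → dog; fog → bat; ?? → bog puzzle can be worked through mechanically if we assume the cipher is a simple letter-for-letter substitution (the slide does not name the cipher, so that is an assumption). The sketch below, with illustrative function names, learns the letter mapping from the two known pairs and inverts it to recover the missing plain text.

```python
def learn_substitution(pairs):
    """Infer a plain-letter -> cipher-letter mapping from known word pairs."""
    mapping = {}
    for plain, cipher in pairs:
        if len(plain) != len(cipher):
            raise ValueError("a substitution cipher preserves word length")
        for p, c in zip(plain, cipher):
            # Each plain letter must always map to the same cipher letter.
            if mapping.setdefault(p, c) != c:
                raise ValueError(f"inconsistent mapping for {p!r}")
    return mapping


def decode(cipher_text, mapping):
    """Invert the mapping; letters never seen in training become '?'."""
    inverse = {c: p for p, c in mapping.items()}
    return "".join(inverse.get(ch, "?") for ch in cipher_text)


# The two known pairs from the slide.
known_pairs = [("cat", "dog"), ("fog", "bat")]
mapping = learn_substitution(known_pairs)
print(decode("bog", mapping))  # fat
```

Each cipher letter decodes to exactly one plain letter, which is the sense in which ciphers are unambiguous. Words in natural language offer no such guarantee.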

Complexity of Machine Translation
Computer program compilation is translation
– Languages are designed to be unambiguous and formal
– Source language and target language
Natural languages are ambiguous
– Lexical (e.g. bank, lead)
– Structural (e.g. “John saw a man with a telescope”; “flying planes can be dangerous”)
For Machine Translation:
– Ambiguity is compounded!
– Mapping between words of the two languages is not unique
– Lexical gaps: languages have different mappings from concepts to words
– Word order differences: English is Subject-Verb-Object; Japanese and Hindi are Subject-Object-Verb.

Issues in Machine Translation
Orthography
– Writing from left-to-right vs right-to-left
– Character sets (alphabetic, logograms, pictograms)
– Segmentation into word/word-like units
Morphology
Lexical: Word senses
– bank → “river bank”, “financial institution”
Syntactic: Word order
– Subject-verb-object → subject-object-verb
Semantic: meaning
– “ate pasta with a spoon”, “ate pasta with marinara”, “ate pasta with John”
Pragmatic: world knowledge
– “Can you pass me the salt?”
Social: conversational norms
– pronoun usage depends on the conversational partner
Cultural: idioms and phrases
– “out of the ballpark”, “came from left field”
Contextual
In addition, for Speech Translation
– Prosody: JOHN eats bananas; John EATS bananas; John eats BANANAS
– Pronunciation differences
– Speech recognition errors
In a multilingual environment
– Code switching: use of the linguistic apparatus of one language to express ideas in another language.

Machine Translation: Why and what’s it good for?
Understanding people across linguistic barriers
– Socio-political
– Commercial: globalization
Limited availability of human expertise
What is it good for?
– Tasks with limited vocabulary and syntax (technical manuals)
– Rough translations for web pages, e-mails
– Applications that use translation as one of the components
What is it not good for?
– Hard and important domains (literature, legal, medical)
Machine Translation need not be fully automated!
– Human-assisted machine translation
– Machine-assisted human translation
– Machine Translation as a productivity enhancement tool.

Machine Translation: Past and Present
MT as code breaking; IBM-Georgetown Univ. demonstration
Large bilingual dictionaries; linguistic and formal-grammar-motivated syntactic reordering; lots of funding, little progress
ALPAC report: “there is no immediate or predictable prospect of useful fully automatic machine translation”
Translation continued in Canada, France and Germany. Beyond English-Russian translation. Meteo for translating weather reports. Systran in 1970
Emphasis on ‘indirect’ translation: semantic and knowledge-based. Advent of microcomputers. Translation companies: Systran, Logos, GlobalLink. Domain-specific machine-aided translation systems.
Corpus-based methods: IBM’s Candide, Japanese ‘example-based’ translation. Speech-to-speech translation: Verbmobil, Janus. ‘Pure’ to practical MT for embedded applications: cross-lingual IR

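MT Approaches: Different levels of meaning transfer
[Figure: translation pyramid from Source to Target, with depth of analysis increasing upward. Direct MT at the base; Transfer-based MT in the middle (Parsing → Syntactic Structure → Syntactic Generation); Interlingua at the top (Semantic Interpretation → Semantic Generation)]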

Direct Machine Translation
Words are replaced using a dictionary
– Some amount of morphological processing
Word reordering is limited
Quality depends on the size of the dictionary and the closeness of the languages
Example 1:
– Spanish: ajá quiero usar mi tarjeta de crédito
– English: yeah I wanna use my credit card
– Alignment: [Figure: word alignment]
Example 2:
– English: I need to make a collect call
– Japanese: 私は コレクト コールを かける 必要があります
– Alignment: [Figure: word alignment]
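A minimal sketch of the direct approach on the Spanish example: dictionary lookup plus one local reordering rule. The dictionary entries and the "X de Y" rule are hypothetical and cover only this sentence; a real system would need a far larger lexicon and proper morphological processing.

```python
# Toy Spanish-to-English dictionary (hypothetical, just enough for the example).
# "quiero" carries its subject in the verb ending, so the entry supplies "I".
ES_EN = {
    "ajá": "yeah",
    "quiero": "I wanna",
    "usar": "use",
    "mi": "my",
    "tarjeta": "card",
    "crédito": "credit",
}


def direct_translate(sentence, dictionary):
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Limited, local reordering: Spanish "X de Y" becomes English "Y X".
        if i + 2 < len(words) and words[i + 1] == "de":
            reordered.extend([words[i + 2], words[i]])
            i += 3
        else:
            reordered.append(words[i])
            i += 1
    # Word-for-word replacement; unknown words pass through unchanged.
    return " ".join(dictionary.get(w, w) for w in reordered)


print(direct_translate("ajá quiero usar mi tarjeta de crédito", ES_EN))
# yeah I wanna use my credit card
```

The single reordering rule works here because Spanish and English word order are close. For the English-Japanese pair the verb moves across the whole clause, which is beyond this kind of local rule.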

Example-based MT
Translation-by-analogy:
a. A collection of source/target text pairs
b. A matching metric
c. A word- or phrase-level alignment
d. A method for recombination
ATR EBMT System (E. Sumita, H. Iida, 1991); CMU Pangloss EBMT (R. Brown, 1996)
[Figure: translation pyramid for EBMT. Source → MATCHING (analysis) → ALIGNMENT (transfer) → RECOMBINATION (generation) → Target, with Exact match (direct translation) as the shortcut along the base]

Example run of EBMT
English-Japanese examples in the corpus:
1. He buys a notebook → Kare wa noto o kau
2. I read a book on international politics → Watashi wa kokusai seiji nitsuite kakareta hon o yomu
Translation input: He buys a book on international politics
Translation output: Kare wa kokusai seiji nitsuite kakareta hon o kau
Challenge: finding a good matching metric
– He bought a notebook
– A book was bought
– I read a book on world politics
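To make the matching-metric challenge concrete, here is a sketch that uses word-level edit distance as the metric. This is one simple choice made for illustration, not the metric of the ATR or CMU systems. It scores the three challenge inputs against the two corpus examples.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over word tokens rather than characters."""
    a, b = a.lower().split(), b.lower().split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Source sides of the two corpus examples from the slide.
EXAMPLES = [
    "He buys a notebook",
    "I read a book on international politics",
]


def best_match(query, examples):
    """Return (closest example, distance) under word-level edit distance."""
    scored = [(ex, word_edit_distance(query, ex)) for ex in examples]
    return min(scored, key=lambda pair: pair[1])


for query in ["He bought a notebook",
              "A book was bought",
              "I read a book on world politics"]:
    print(query, "->", best_match(query, EXAMPLES))
```

The first and third inputs each land one substitution away from a corpus example. "A book was bought" is four edits from its nearest example, because a surface metric treats "bought" and "buys", or "book" and "notebook", as unrelated and cannot see through the passive.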

NLP Pipeline: Beads on a String
Tokenization
Sentence Segmentation
Part-of-speech tagging
Named Entity Detection
Noun/Verb Chunking
Syntactic Parsing
Semantic Role Labeling
Word Sense Disambiguation
Co-reference resolution

NLP Pipeline: Sentence Segmentation
Input: U.S. President lives in Washington D.C. He will travel to Florida this week.
Output:
– U.S. President lives in Washington D.C.
– He will travel to Florida this week.
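The example sentence is hard for a rule-based splitter because "U.S." and "D.C." both end in a period followed by a capitalized word, yet only the second marks a sentence boundary. The sketch below shows a naive regular-expression splitter over-segmenting, then a patched version with a hand-picked exception list. The list is an assumption made for this one example and would fail whenever "U.S." really does end a sentence.

```python
import re


def naive_split(text):
    """Split after any period that is followed by whitespace and a capital."""
    return re.split(r"(?<=\.)\s+(?=[A-Z])", text)


def split_with_exceptions(text, non_final=("U.S.",)):
    """Re-join pieces whose last token is assumed never to end a sentence."""
    merged = []
    for piece in naive_split(text):
        if merged and merged[-1].split()[-1] in non_final:
            merged[-1] = merged[-1] + " " + piece
        else:
            merged.append(piece)
    return merged


text = ("U.S. President lives in Washington D.C. "
        "He will travel to Florida this week.")

print(naive_split(text))
# ['U.S.', 'President lives in Washington D.C.', 'He will travel to Florida this week.']
print(split_with_exceptions(text))
# ['U.S. President lives in Washington D.C.', 'He will travel to Florida this week.']
```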

NLP Pipeline: Part-of-speech Tagging
Input: He will travel to Florida this week.
Output: He/PRP will/MD travel/VB to/TO Florida/NNP this/DT week/NN ./.

NLP Pipeline: Named Entity Detection
President Bush will travel to Florida on February to meet with the CEO of AT&T

NLP Pipeline: Noun/Verb Chunking
President Bush will travel to Florida on February to meet with the CEO of AT&T

NLP Pipeline: Syntactic Parsing
$PERSON will travel to $PLACE on $DATE to meet with the $JOB of $ORG
[Figure: parse tree of the sentence above, rooted at “will travel”]

NLP Pipeline: Semantic Role Labeling
[Figure: parse tree for “$PERSON will travel to $PLACE on $DATE”, built on the part-of-speech tagging and named entity detection output, with the arguments of “will travel” labeled ARG0 ($PERSON), ARGM-loc (to $PLACE) and ARGM-tmp (on $DATE)]

NLP Pipeline: Word Sense Disambiguation
The man went to the bank to get some money
The man went to the bank to get some flowers

NLP Pipeline: Co-reference resolution
The U.S. President lives in Washington D.C. He will return to the capital this week.

Syntactic Transfer-based Machine Translation
Direct and Example-based approaches
– Two ends of a spectrum
– Recombination of fragments for better coverage.
What if the matching/transfer is done at the syntactic parse level?
Three steps
– Parse: syntactic parse of the source language sentence (a hierarchical representation of the sentence)
– Transfer: rules to transform the source parse tree into the target parse tree (Subject-Verb-Object → Subject-Object-Verb)
– Generation: regenerating the target language sentence from the parse tree (morphology of the target language)
Tree structure provides better matching and longer-distance transformations than is possible in string-based EBMT.
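The transfer step can be sketched on a toy tree. The `Node` class, the labels and the single reordering rule below are illustrative assumptions. Lexical transfer and target morphology are left out, so the output keeps English words and shows only the Subject-Object-Verb order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # e.g. "S", "NP", "VP", "V", "N"
    children: list = field(default_factory=list)
    word: str = ""                                # set on leaves only


def transfer_svo_to_sov(node):
    """Transfer rule: inside every VP, move the verb after its other children."""
    kids = [transfer_svo_to_sov(k) for k in node.children]
    if node.label == "VP":
        verbs = [k for k in kids if k.label == "V"]
        others = [k for k in kids if k.label != "V"]
        kids = others + verbs
    return Node(node.label, kids, node.word)


def generate(node):
    """Generation: read the leaves off the tree from left to right."""
    if not node.children:
        return [node.word]
    return [w for k in node.children for w in generate(k)]


# Parse of "John eats bananas": [S [NP John] [VP [V eats] [NP bananas]]]
source_tree = Node("S", [
    Node("NP", [Node("N", word="John")]),
    Node("VP", [
        Node("V", word="eats"),
        Node("NP", [Node("N", word="bananas")]),
    ]),
])

print(" ".join(generate(source_tree)))                       # John eats bananas
print(" ".join(generate(transfer_svo_to_sov(source_tree))))  # John bananas eats
```

Because the rule operates on the VP node rather than on adjacent words, it moves the verb past an object of any length, which is the longer-distance transformation that string-based matching lacks.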

Examples of SynTran-MT
[Figure: parallel dependency trees for Spanish “ajá quiero usar mi tarjeta de crédito” and English “yeah I wanna use my credit card”]
Mostly parallel parse structures
Might have to insert words: pronouns, morphological particles

Example of SynTran MT - 2
[Figure: dependency trees for English “I need to make a collect call” and Japanese 必要があります (need), 私は (I), かける (make), コールを (call), コレクト (collect)]
Pros:
– Allows for structure transfer
– Re-orderings are typically restricted to the parent-child nodes.
Cons:
– Transfer rules are for each language pair (N² sets of rules)
– Hard to reuse rules when one of the languages is changed

Interlingua-based Machine Translation
Syntactic transfer-based MT
– Couples the syntax of the two languages
What if we abstract away the syntax?
– All that remains is meaning
– Meaning is the same across languages
– Simplicity: only N components needed to translate among N languages
Two “small” problems:
– What is meaning?
– How do we represent meaning?
[Figure: the translation pyramid (Direct MT, Transfer-based MT, Interlingua), alongside English, Spanish and Japanese analyzers feeding a shared interlingual representation that drives English, Spanish and Japanese generators]

Example of Interlingua Machine Translation
[Figure: English “I need to make a collect call” and Japanese 必要があります (need), 私は (I), かける (make), コールを (call), コレクト (collect), both linked to a shared interlingua representation]

Probabilistic Direct Machine Translation
Starting in the early 1990s, full circle back to the code-breaking paradigm of machine translation
– With a probabilistic twist
What is it: if you want to translate from English to Japanese
– assume that the English text started out as a Japanese text
– but went through a noisy channel which changed it into English
Goal is to recover the best (most probable) Japanese text
– J* = argmax_J P(J|E) = argmax_J P(E|J) * P(J)
– P(E|J): translation faithfulness; P(J): translation fluency
Popular approach due to:
– Availability of large amounts of bilingual data (parallel data)
– Large memory and high speed computers
[Figure: 私は コレクト コールを かける 必要があります → Noisy Channel/Encryption P(E|J) → I need to make a collect call]
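The second equality in the decoding rule follows from Bayes' rule: P(J|E) = P(E|J) * P(J) / P(E), and P(E) is the same for every candidate J, so it drops out of the argmax. The sketch below applies the rule to three candidate translations. All probabilities are invented for illustration; a real system would estimate them from parallel and monolingual data.

```python
import math

# Hypothetical scores for E = "I need to make a collect call".
# p_e_given_j is the translation model P(E|J) (faithfulness);
# p_j is the language model P(J) (fluency). All numbers are made up.
CANDIDATES = {
    # faithful and fluent
    "私は コレクト コールを かける 必要があります": {"p_e_given_j": 0.20, "p_j": 0.010},
    # faithful word by word, but in English word order (not fluent)
    "私は 必要があります かける コレクト コールを": {"p_e_given_j": 0.20, "p_j": 0.0001},
    # fluent, but drops "need" and "collect" (not faithful)
    "私は コールを かける": {"p_e_given_j": 0.02, "p_j": 0.020},
}


def noisy_channel_decode(candidates):
    """Return argmax_J P(E|J) * P(J), computed in log space for stability."""
    def log_score(j):
        scores = candidates[j]
        return math.log(scores["p_e_given_j"]) + math.log(scores["p_j"])
    return max(candidates, key=log_score)


print(noisy_channel_decode(CANDIDATES))
# 私は コレクト コールを かける 必要があります
```

Neither model wins alone: the third candidate has the highest P(J) and the second ties for the highest P(E|J), but only the first scores well on both.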

Probabilistic Direct Machine Translation
Learn pattern mappings (words and sequences of words) between pairs of sentences in the two languages.
– Use the result of translation, not the process of translation
– Infer a process that produces a similar result.
Example 1:
– English: I need to make a collect call
– Japanese: 私は コレクト コールを かける 必要があります
– Alignment: [Figure: word alignment]
Example 2:
– Spanish: ajá quiero usar mi tarjeta de crédito
– English: yeah I wanna use my credit card
– Alignment: [Figure: word alignment]
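One well-known way to learn word mappings from sentence pairs alone is expectation-maximization in the style of IBM Model 1. The slides do not say which model is meant, so this is a swapped-in illustration. The three-pair Spanish-English corpus is hypothetical and the NULL word is omitted for brevity.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(tgt, src)], an approximation of P(tgt word | src word), by EM."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)              # start from uniform probabilities
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for w in tgt:
                # E-step: share one count for w among the source words of the pair.
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts per source word.
        t = {(w, s): c / total[s] for (w, s), c in count.items()}
    return t


def best_translation(src_word, t):
    options = {w: p for (w, s), p in t.items() if s == src_word}
    return max(options, key=options.get)


# Hypothetical parallel corpus: (Spanish tokens, English tokens).
CORPUS = [
    (["mi", "tarjeta"], ["my", "card"]),
    (["mi", "casa"], ["my", "house"]),
    (["la", "casa"], ["the", "house"]),
]

t = train_model1(CORPUS)
for word in ["mi", "tarjeta", "casa", "la"]:
    print(word, "->", best_translation(word, t))
```

No pair is ever labeled word by word. "tarjeta" ends up linked to "card" only because "mi" is better explained by "my", which it co-occurs with twice. That is the sense in which the model uses the result of translation to infer a process.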

Applications of Machine Translation

Sector: Consumer
– Example applications: Call Center; Web Search
– Translation needs: Multilingual dialog; Web page translation
– AT&T MT prototypes: Multilingual customer care
Sector: Business
– Example applications: Call Center; Collaborative Workspace
– Translation needs: Localization; Document translation; E-mail/Chat translation
– AT&T MT prototypes: Multilingual Instant Messaging
Sector: Government
– Example applications: Surveillance; Information Dissemination
– Translation needs: Speech/text translation
– AT&T MT prototypes: Speech/Text Instant Messaging

Multilingual Customer Care

Making Travel Arrangements using Multilingual Chat

Large Vocabulary Speech Recognition and Translation

Evaluation of Machine Translation

Machine Translation Evaluation
What is a good translation?
– Meaning-preserving and (social, cultural, conversational) context-appropriate rendering of the source language sentence
Bilingual human annotators
– Mark the output of a translation system on a 5-point scale.
– Expensive!
– Too coarse to arrive at a feedback signal to improve the translation system
Objective metrics: approximations to the real thing!
– Lexical Accuracy (LA): bag of words.
– Translation Accuracy (TA): based on string alignment
Application-driven evaluation
– “How May I Help You?”: spoken dialog for call routing
– Classification based on salient phrase detection
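The slide gives only one-line descriptions of the two objective metrics, so the formulas below are assumptions: Lexical Accuracy is taken as the bag-of-words overlap with a reference translation divided by the reference length, and Translation Accuracy as one minus the word-level edit distance divided by the reference length. The sketch shows how the two can disagree on the same output.

```python
from collections import Counter


def lexical_accuracy(hyp, ref):
    """Assumed definition: multiset word overlap divided by reference length."""
    hyp_bag, ref_bag = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((hyp_bag & ref_bag).values())
    return overlap / max(len(ref.split()), 1)


def translation_accuracy(hyp, ref):
    """Assumed definition: 1 - (word edit distance / reference length), floored at 0."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, wh in enumerate(h, start=1):
        curr = [i]
        for j, wr in enumerate(r, start=1):
            cost = 0 if wh == wr else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return max(0.0, 1.0 - prev[-1] / max(len(r), 1))


reference = "yeah I wanna use my credit card"
hypothesis = "yeah I wanna use my card credit"   # right words, wrong order

print(lexical_accuracy(hypothesis, reference))      # 1.0
print(translation_accuracy(hypothesis, reference))  # about 0.714 (5/7)
```

The bag-of-words score ignores order and rates the hypothesis as perfect, while the alignment-based score penalizes the two misplaced words.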

Machine Translation Evaluation for call routing

Summary
Fully automatic Machine Translation in its full complexity is a very hard task
Pragmatic approaches to Machine Translation have been successful
– Limited domain/vocabulary
– Human-assisted machine translation
– Machine-assisted human translation
A range of applications for “rough” machine translation
Machine Translation will improve as we better understand how people communicate.

Spoken Language Translation
[Figure: speech translation pipeline. Stages: FEATURE EXTRACTION, RECOGNITION SEARCH, MACHINE TRANSLATION, PHONETIC ANALYSIS, AUDIO SYNTHESIS. Intermediate data: ENGLISH SPEECH, ACOUSTIC SEGMENT FEATURE VALUES, ENGLISH WORD LATTICE (book, the, flies, please, this, flight, three), CHINESE TEXT (請預訂這班機), PRONUNCIATION (qing3 yU4ding4 zhe4 ban1ji1), CHINESE SPEECH]