Fall 2004 Lecture Notes #7 EECS 595 / LING 541 / SI 661 Natural Language Processing

Machine Translation

Example (from the Hansards corpus)

English: I would like the government and the Postmaster General to agree that we place the union and the Postmaster General under trusteeship so that we can look at his books and records, including those of his management people and all the memos he has received from them, some of which must have shocked him rigid. If the minister would like to propose that, I for one would be prepared to support him.

French: Je voudrais que le gouvernement et le ministre des Postes conviennent de placer le syndicat et le ministre des Postes sous tutelle afin que nous puissions examiner ses livres et ses dossiers, y compris ceux de ses collaborateurs, et tous les mémoires qu'il a reçus d'eux, dont certains l'ont sidéré. Si le ministre voulait proposer cela, je serais pour ma part disposé à l'appuyer.

Example
"These lies are like their father that begets them; gross as a mountain, open, palpable." (Henry IV, Part 1, Act 2, Scene 2)

Language similarities and differences
– Word order: SVO (English, Mandarin), VSO (Irish, Classical Arabic), SOV (Hindi, Japanese)
– Postpositions rather than prepositions (Japanese): "to Mariko" = "Mariko-ni"
– Lexical distinctions (Spanish): "the bottle floated out" = "la botella salió flotando"
– "Brother" (Japanese): otooto (younger) vs. oniisan (older)
– "They" (French): elles (feminine) vs. ils (masculine)

Why is Machine Translation Hard?
[Diagram: INPUT passes through Analysis, then Transfer/Interlingua, then Generation, yielding OUTPUT 1, OUTPUT 2, OUTPUT 3]

Basic Strategies of MT
– Direct approach: 1950s and 60s; naïve
– Indirect, via interlingua: no looking back; language-neutral; no influence on the target language
– Indirect, via transfer: preferred
[Diagram: F, E, and I, i.e., the two languages and the interlingua]

Levels of Linguistic Processing
– Phonology
– Orthography
– Morphology (inflectional, derivational)
– Syntax (e.g., agreement)
– Semantics (e.g., concrete vs. abstract terms)
– Discourse (e.g., use of pronouns)
– Pragmatics (world knowledge)

Category Ambiguity
– Morphological ambiguity (e.g., German "Wachtraum": "Wach+Traum", a daydream, vs. "Wacht+Raum", a guard room)
– Part-of-speech (category) ambiguity (e.g., "round")
– Some help comes from morphology (the "-ing" in "rounding" narrows the category)
– With syntax, some ambiguities disappear: context dictates the category

Homography and Polysemy
– Homographs: "light", "club", "bank"
– Polysemous words: "channel", "crane"
– Ambiguity across different categories: resolved with syntax
– Ambiguity within the same category: requires semantics

Structural Ambiguity
– Humans can assign multiple interpretations (parses) to the same sentence
– Example: prepositional phrase attachment (see the sketch below)
– Context is used to disambiguate
– For machine translation, context can be hard to define
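To see the attachment ambiguity concretely, here is a minimal sketch using NLTK's chart parser. The toy grammar and the sentence are illustrative inventions, not material from the lecture; the classic "telescope" sentence receives two parses, one attaching the prepositional phrase to the noun phrase and one to the verb phrase.

```python
import nltk

# Toy grammar that licenses both PP attachments.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Pronoun | Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Pronoun -> 'I'
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()
for tree in parser.parse(sentence):
    print(tree)  # two trees: PP attached to the NP vs. to the VP
```

One parse means "the man who has the telescope"; the other means "saw by means of the telescope". A translation system may need to commit to one reading before it can translate.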

Use of Linguistic Knowledge
– Subcategorization frames
– Semantic features (e.g., is an object "readable"?)

Contextual Knowledge
– In practice, very few sentences are truly ambiguous
– Context resolves ambiguity for humans (the "telescope" example) but not for machines: there is no clear definition of what counts as context

Other Strategies
– Pick the most natural interpretation
– Ask the author
– Make a guess
– Hope for a free ride
– Direct transfer

Anaphora Resolution
– Use of pronouns ("it", "him", "himself", "her")
– Definite anaphora ("the young man")
– Finding antecedents
– Same problems as for ambiguity resolution
– Similar solutions (e.g., subcategorization)

When does MT work?
– Machine-Aided Translation (MAT)
– Restricted domains (e.g., technical manuals)
– Restricted languages (sublanguages)
– When the goal is only to give the reader an idea of what the text is about

The Noisy Channel Model
– Source-channel model of communication
– Parametric probabilistic models of language and translation
– Training such models

Statistics
Given f, guess e.

[Diagram: e → encoder (E → F) → f → decoder (F → E) → e']

e' = argmax_e P(e|f) = argmax_e P(f|e) P(e)

where P(f|e) is the translation model and P(e) is the language model.
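A minimal sketch of this decision rule, with an invented candidate set and made-up probabilities (a real system searches a huge candidate space rather than a fixed list):

```python
import math

# Source-channel decoding: choose e maximizing P(f|e) * P(e),
# equivalently log P(f|e) + log P(e). All strings and numbers
# below are illustrative inventions.
candidates = {
    "he has the book":  {"tm": 0.020, "lm": 0.00100},
    "he have the book": {"tm": 0.030, "lm": 0.00010},  # better TM score, worse LM score
    "the book has him": {"tm": 0.005, "lm": 0.00080},
}

def decode(cands):
    return max(cands, key=lambda e: math.log(cands[e]["tm"])
                                  + math.log(cands[e]["lm"]))

print(decode(candidates))  # -> "he has the book"
```

Note how the language model vetoes the ungrammatical candidate even though its translation-model score is higher; this division of labor is the point of the factorization.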

Parametric probabilistic models
– Language model (LM), smoothed with deleted interpolation:
  P(e) = P(e_1, e_2, ..., e_L) = P(e_1) P(e_2|e_1) ... P(e_L|e_1 ... e_{L-1})
  Trigram approximation: P(e_L|e_1 ... e_{L-1}) ≈ P(e_L|e_{L-2}, e_{L-1})
– Translation model (TM), defined over alignments: P(f, a|e)
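As a sketch of the language-model side, here is a trigram model that interpolates trigram, bigram, and unigram maximum-likelihood estimates. The corpus and the lambda weights are illustrative inventions; in deleted interpolation proper, the weights are tuned on held-out data rather than fixed by hand.

```python
from collections import Counter

# Interpolated trigram LM:
# P(w3|w1,w2) ≈ l3*P_ML(w3|w1,w2) + l2*P_ML(w3|w2) + l1*P_ML(w3)
# Backing off to bigram/unigram estimates keeps unseen trigrams
# from receiving probability zero.
corpus = "the cat sat on the mat the cat ate".split()  # toy data

unigrams = Counter(corpus)
bigrams  = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
N = len(corpus)

def p_interp(w1, w2, w3, l1=0.1, l2=0.3, l3=0.6):
    p_uni = unigrams[w3] / N
    p_bi  = bigrams[(w2, w3)] / unigrams[w2] if unigrams[w2] else 0.0
    p_tri = (trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]
             if bigrams[(w1, w2)] else 0.0)
    return l3 * p_tri + l2 * p_bi + l1 * p_uni

print(p_interp("the", "cat", "sat"))
```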

IBM's EM-trained models
– Word translation (Model 1)
– Local alignment (Model 2)
– Fertilities (Model 3)
– Class-based alignment (Model 4)
– Non-deficient algorithm, avoiding overlaps and overflow (Model 5)
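The following is a minimal sketch of the EM loop for the simplest of these, Model 1, which learns word-translation probabilities only (fertility, distortion, and classes belong to the later models). The three-sentence parallel corpus is invented for illustration.

```python
from collections import defaultdict

# EM training of IBM Model 1 translation probabilities t(f|e),
# without a NULL source word, on a toy corpus.
corpus = [("the house".split(), "la maison".split()),
          ("the book".split(),  "le livre".split()),
          ("a house".split(),   "une maison".split())]

e_vocab = {e for es, _ in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))    # uniform initialization

for _ in range(10):                            # EM iterations
    count = defaultdict(float)                 # expected counts c(f,e)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:                           # E-step: fractional alignment counts
            z = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / z
                total[e]      += t[(f, e)] / z
    for (f, e), c in count.items():            # M-step: renormalize per source word
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 3))        # climbs toward 1.0 as EM iterates
```

Because "maison" co-occurs with "house" in two different contexts while its competitors can each be explained by some other English word, EM concentrates t(maison|house) near 1; this pigeonhole effect is what makes Model 1 training work.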

Evaluation
– Human judgments: adequacy, grammaticality
– Automatic methods: BLEU, ROUGE
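To make the BLEU idea concrete, here is a minimal sentence-level sketch with uniform n-gram weights and no smoothing; the example sentences are invented, and real evaluations use a corpus-level score from a standard implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(zip(*(tokens[i:] for i in range(n))))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c, r = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(cnt, r[g]) for g, cnt in c.items())  # clipped matches
        if clipped == 0:
            return 0.0         # unsmoothed: one zero precision zeroes the score
        log_prec += math.log(clipped / sum(c.values())) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_prec)

print(round(bleu("the cat sat on a mat",
                 "the cat sat on the mat"), 3))   # ≈ 0.537
```

The score is the geometric mean of clipped 1- to 4-gram precisions times the brevity penalty; clipping stops a candidate from earning credit by repeating a matching word.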

Readings for next time: J&M Chapters 18, 21