Albert Gatt LIN3022 Natural Language Processing Lecture 11.


In this lecture We consider the task of Machine Translation: some history; rule-based approaches; contemporary statistical approaches.

Part 1 The Challenge of MT

Lost in translation (excerpt from script) [Scene: Bob is shooting an advert in Japan] The Director (with blue contact lenses) utters a long stream of sentences in Japanese. TRANSLATOR: He wants you to turn, look in camera and say the lines. BOB: That's all he said? TRANSLATOR: Yes, turn to camera.

Literary: Chinese to English Chinese vs English: very different word order; no tense/aspect marking in Chinese (notice the many adverbials in English...); cultural conventions on literary prose differ; metaphor, etc...

Non-literary: English to French Still challenging: Word order differences etc But easier than literary translation...

What computational MT is good for Non-literary (factual etc) prose. Typical contemporary uses: producing a rough translation when this suffices (e.g. running a web page through Google Translate); Computer-Aided Human Translation (CAHT): facilitating human translation by post-editing an automatic draft; carrying out Fully-Automatic, High-Quality Translation (FAHQT) in small, well-defined sublanguages, e.g. weather forecasts, software manuals etc.

Why it’s hard: typological differences Morphology: isolating vs polysynthetic (and the gradations in between); agglutinative vs fusional morphology. Syntax: word order; argument structure & linking differences; prepositions vs postpositions; premodification vs postmodification...

Why it’s hard: lexical differences Languages exhibit different degrees of polysemy, homonymy etc. Homonyms: English bass (fish, instrument) → Spanish lubina (fish) or bajo (instrument). Polysemy: English know a person, know a fact → French connaître vs savoir; English ceiling (inside) vs roof (outside) → Maltese saqaf; English corner → Maltese kantuniera (outside), rokna (inside).

Lexemes map in many-to-many fashion One word in L1 → several different words in L2. Several words in L2 → one word in L1. L1 may have lexical gaps (e.g. no English word for Japanese oyakoko, meaning ‘filial piety’).

Part 2 A brief history of MT

The Babel problem Governments and companies have had to deal with the problem of multilinguality for decades: the EU Commission employs hundreds of translators; the US Defence Department needs continuous translation of documents in scores of different languages. For businesses, translation is a key to competitiveness. In everyday life, (partial) access to documents in a foreign language may be beneficial.

Automatic Machine Translation: history One of the oldest applications of Language Engineering: interest in MT goes back at least to the late 1940s, in the years immediately after WWII. Theorists such as Shannon and Weaver proposed using statistical techniques; the approach was abandoned because computers weren’t powerful enough to process the huge amounts of data required. Georgetown experiment (1954): jointly developed by IBM and Georgetown University; successful translation of more than 60 Russian sentences into English; the authors claimed that MT would be a solved problem in a few years. The prospect of automatic translation was very attractive from a defence point of view (Cold War); government funding began in earnest.

Automatic Machine Translation: history ALPAC Report 1966: the Automatic Language Processing Advisory Committee (a team of 7 scientists) was commissioned by the US Government; it evaluated efforts in Language Engineering, especially MT; concluded that MT was slower than human translation, and less accurate; recommended investment in machine aids to translation, rather than automatic MT systems.

Automatic Machine Translation: history 1980s: MT research took off again, usually relying on rule-based technologies. Basic strategy: 1. Natural Language Understanding: translate the source language into a language-independent semantic representation; 2. Natural Language Generation: translate the semantic representation into the target language. Problem: such systems were highly restricted, due to the huge effort required in encoding rules; the rules need to encode a lot of knowledge.

The underlying model in knowledge-based MT Language as the “surface manifestation” of an underlying meaning; translation as the rendering into a target language of the same underlying meaning. (Diagram: Natural Language utterance (in source language) → deep semantics (language-independent) → Natural Language utterance (in target language).)

The Knowledge Bottleneck Codifying all human knowledge relevant to “understanding” an utterance seems like an impossible task. The Cyc Project, founded by Doug Lenat: aim: codify human knowledge in all areas (common sense reasoning, mathematics, physics…); ongoing since 1984. First public release (2001): ca. 6,000 concepts; 60,000 facts. Latest public release of the knowledge base (2006): hundreds of thousands of terms; millions of assertions relating terms to each other; still incomplete; no objective criterion for determining coverage or correctness.

Automatic Machine Translation: history 1990s – present: growing interest in the use of very large, parallel corpora for translation. Many systems radically departed from the traditional “rule-based” methodology; proposals include completely statistical systems (no linguistic knowledge at all). Current efforts are usually hybrid, combining linguistic knowledge and statistical techniques.

MT Today Many systems are deployed in real-world applications. SYSTRAN: one of the oldest companies in the MT business; supplier to the EU Commission and the US Government; partly underlies the online Babelfish (AltaVista) and Google Translate. Many statistical MT systems.

Part 3 Doing MT: classical approaches

The Vauquois Triangle (Figure: MT approaches arranged by increasing depth of analysis.)

Direct translation Incrementally transform the source sentence into the target sentence. Main knowledge source: a large, bilingual dictionary. Each entry in the dictionary is viewed as a “program” (set of instructions) for going from source expression to target expression.
1. Proceed through the source text word by word. For each word/phrase:
   a. perform shallow morphological analysis;
   b. look up the corresponding target word/phrase.
2. Final step:
   a. reorder the target language sentence;
   b. perform morphological generation.

Direct Translation: example
Input: Maria didn’t slap the green witch
Morphology: Maria DO-PAST not slap the green witch
Lexical transfer: Maria PAST no dar una bofetada a la verde bruja (NB: dar una bofetada is assumed to be the dictionary entry for slap)
Local reordering: Maria no dar una bofetada a la bruja verde
Morphology: Maria no dió una bofetada a la bruja verde
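To make the pipeline concrete, here is a minimal sketch of the word-by-word strategy on this very example. The tiny dictionary, the DO-PAST trick and the single reordering rule are illustrative stand-ins for what a real direct system encodes in its dictionary “programs”, not any actual system’s format:

```python
# Minimal sketch of direct (dictionary-driven) MT, English -> Spanish.
# The dictionary, morphology and reordering rules are toy stand-ins.

BILINGUAL_DICT = {
    "maria": ["Maria"],
    "not": ["no"],
    "slap": ["dar", "una", "bofetada", "a"],  # one word -> multiword entry
    "the": ["la"],
    "green": ["verde"],
    "witch": ["bruja"],
}

def direct_translate(sentence):
    # 1. Shallow morphological analysis: didn't -> DO-PAST not
    tokens = []
    for tok in sentence.lower().split():
        tokens += ["DO-PAST", "not"] if tok == "didn't" else [tok]
    # 2. Word-by-word lexical transfer via the bilingual dictionary
    target = []
    for tok in tokens:
        if tok == "DO-PAST":
            continue  # realised later as verb morphology
        target += BILINGUAL_DICT.get(tok, [tok])
    # 3. Local reordering: the Spanish adjective follows the noun
    for i in range(len(target) - 1):
        if (target[i], target[i + 1]) == ("verde", "bruja"):
            target[i], target[i + 1] = target[i + 1], target[i]
    # 4. Morphological generation: realise past tense on the verb
    return " ".join("dió" if w == "dar" else w for w in target)

print(direct_translate("Maria didn't slap the green witch"))
# -> Maria no dió una bofetada a la bruja verde
```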

Direct translation: Problems This approach performs no parsing of the input and has little or no knowledge of grammar. It is difficult to deal with word order variation and with ordering dependencies over long distances. E.g. in German, adverbials can vary their position fairly flexibly; in Chinese, goal PPs (send X to Y) often occur before the verb.

Direct Translation: Problems Different word orders across languages. Maltese OVS: Il-mużika jħobb Pawlu (def-music like-3SgM Paul; this order is used for emphasis on the music). Direct translation into English would be: The music likes Paul. More appropriate: It’s music that Paul likes. But to do this, we would need to syntactically analyse the input, to identify the roles of Paul and the music and to identify the topicalisation.

Syntactic Transfer approaches Systems based on the transfer model apply contrastive knowledge of languages: 1. analyse/parse the source language input; 2. transfer the constituents to the target language; 3. generate the syntactically correct sentence. NB: we may not need to solve all parsing problems. E.g. PP-attachment ambiguity in the source often translates into PP-attachment ambiguity in the target: Jack saw the girl with the glasses / Jack a vu la fille avec les lunettes.

Syntactic transfer Additional knowledge sources: morphology, as in direct MT; syntactic transfer rules to go from source to target syntax. E.g. adjective-noun reordering: NP → N Adj (Maltese, Italian, Spanish: kelb kbir, cane grande) vs NP → Adj N (English: large dog).

Syntactic transfer (Figure: the original English-Spanish example, after application of transfer rules and after bilingual dictionary lookup.)

Syntactic transfer Dealing with word order variation (e.g. English SVO → Japanese SOV). Example transfer rules: VP → V NP := VP → NP V; PP → P NP := PP → NP P
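A sketch of how such rules can be applied recursively to a parse tree (the tuple-based tree encoding and the rule table are invented for this example, not any particular system’s representation):

```python
# Sketch of syntactic transfer as recursive tree rewriting: each rule maps
# a node label plus the labels of its children to a new child order.
# Trees are (label, children) tuples; leaves are plain strings.
# Here: English -> Japanese-style order (verb-final, postpositions).

TRANSFER_RULES = {
    ("VP", ("V", "NP")): (1, 0),   # VP -> V NP  becomes  VP -> NP V
    ("PP", ("P", "NP")): (1, 0),   # PP -> P NP  becomes  PP -> NP P
}

def label(node):
    return node[0] if isinstance(node, tuple) else node

def transfer(node):
    if not isinstance(node, tuple):
        return node                                # leaf word
    lab, children = node
    children = [transfer(c) for c in children]     # rewrite bottom-up
    order = TRANSFER_RULES.get((lab, tuple(label(c) for c in children)))
    if order is not None:
        children = [children[i] for i in order]
    return (lab, children)

# "saw the dog" -> verb-final "the dog saw"
tree = ("VP", [("V", ["saw"]), ("NP", [("Det", ["the"]), ("N", ["dog"])])])
print(transfer(tree))
```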

Syntactic transfer: other components These systems also need: lexical transfer rules (based on a bilingual dictionary), including phrases; in some cases, semantic transfer rules as well. E.g. the Chinese thematic role GOAL tends to be pre-verbal: English: I went to the store; Chinese: I to the store went. Note: this kind of re-ordering needs sensitivity to semantic info.

Direct + transfer combinations Commercial MT systems often combine direct and transfer approaches. Example: Systran (first documented 1992, still in use!):
1. Shallow analysis: morphology; POS tagging; chunking of NPs, PPs etc.
2. Transfer: translation of idioms; word sense disambiguation; preposition assignment.
3. Synthesis: lexical translation (dictionary-based); reordering; morphological generation.

Interlingua Transfer models require a distinct set of rules for each language pair. Not feasible if we have a many-to-many translation problem, e.g. translating between all pairs of the EU’s official languages. Rather than direct transformation from source to target, interlingua-based approaches go: from source to a “language-neutral” meaning representation (the interlingua); from the interlingua to the target language.

Interlingua An interlingua should represent all sentences that mean the same thing in the same way, regardless of the original language. Common approach: a simple, event-based representation (others are possible, including the use of logic etc).

Event-based interlingua example Mary did not slap the green witch NB: requires deep semantic analysis for: identification of thematic roles (AGENT, THEME etc); polarity identification; definiteness, attributes etc. E.g. in this example, we need to identify that green is a colour, rather than something else (e.g. “naive”).
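As an illustration, the event-based frame for this sentence might be encoded along the following lines (the slot names and values here are invented for the example, not a standard interlingua inventory):

```python
# Sketch of an event-based interlingua frame for
# "Mary did not slap the green witch". Slot names are illustrative.
frame = {
    "event": "SLAPPING",
    "polarity": "negative",
    "tense": "past",
    "agent": {"concept": "MARY"},
    "theme": {
        "concept": "WITCH",
        "definiteness": "definite",
        "attributes": [{"attribute": "COLOUR", "value": "GREEN"}],
    },
}
```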

Interlingua: Pros No syntactic transformation rules necessary: all languages are “generated” from the common interlingua; no specific English-Spanish, Spanish-Italian etc rules. No lexical transfer rules: French connaître vs savoir arise from different interlingua semantic representations. The same solution can apply to other languages that make the distinction.

Interlingua: Cons Large burden on deep semantic analysis. Problematic in “open domain translation”: a lot of semantics involves “world knowledge”. The interlingua is assumed to be language neutral, but is this realistic? Do we ever have a language-neutral semantics? Example: Chinese/Japanese will require concepts for ELDER-BROTHER and YOUNGER-BROTHER; English doesn’t. If the interlingua is “universal”, then analysing and generating English will have to make use of these concepts, involving a lot of extra work: brother disambiguated into ELDER or YOUNGER; ELDER and YOUNGER both mapped to brother.

Part 4 Statistical MT: overview

Alignment In order to be of any use for MT, parallel corpora must be: very large; aligned, usually at sentence and word level. Uses of aligned corpora: 1. example-based MT 2. statistical MT 3. cross-lingual information retrieval 4. …

Automatic alignment Automatic tools for alignment are now common. They typically use: statistical heuristics (sentence 1 in corpus A is probably a translation of sentence 2 in corpus B); linguistic rules; or a combination of both.

Automatic sentence alignment Simple quantitative measures can give surprisingly good results, e.g. using sentence length: long sentences in corpus A are likely to be translations of long sentences in corpus B, based on the observation that relative sentence length remains roughly constant in translation. The alternative is to use linguistically motivated methods: pairing of lexical units; pairing of sentences based on functional dependencies (relations between lexical units).
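Below is a toy dynamic-programming sketch of length-based alignment in this spirit (cf. Gale and Church’s classic method, which models length ratios probabilistically and also allows 1:2 and 2:1 matches; this simplification only pairs or skips single sentences, and the cost function is made up for the example):

```python
# Toy length-based sentence alignment: prefer pairing sentences of
# similar character length; allow skipping a sentence at a fixed cost.

def length_cost(s, t):
    # 0 when lengths are equal, approaching 1 as they diverge.
    return abs(len(s) - len(t)) / max(len(s), len(t))

def align(src, tgt, skip_cost=0.8):
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            # Pair src[i] with tgt[j]
            if i < n and j < m and cost[i][j] + length_cost(src[i], tgt[j]) < cost[i + 1][j + 1]:
                cost[i + 1][j + 1] = cost[i][j] + length_cost(src[i], tgt[j])
                back[i + 1][j + 1] = (i, j, "pair")
            # Leave src[i] or tgt[j] unaligned
            if i < n and cost[i][j] + skip_cost < cost[i + 1][j]:
                cost[i + 1][j] = cost[i][j] + skip_cost
                back[i + 1][j] = (i, j, "skip")
            if j < m and cost[i][j] + skip_cost < cost[i][j + 1]:
                cost[i][j + 1] = cost[i][j] + skip_cost
                back[i][j + 1] = (i, j, "skip")
    pairs, i, j = [], n, m            # trace back the cheapest path
    while back[i][j]:
        pi, pj, op = back[i][j]
        if op == "pair":
            pairs.append((src[pi], tgt[pj]))
        i, j = pi, pj
    return pairs[::-1]

print(align(["Hello.", "How are you today?"],
            ["Bonjour.", "Comment allez-vous aujourd'hui ?"]))
```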

Automatic sentence alignment Results from McEnery and Oakes 1996, using statistical heuristics:

Languages   Domain      No. pairs   % correct
En-Pl       fiction     89          100
En-Fr       telecom     100         98
En-Sp       telecom     —           —
En-De       economics   36          75
Ch-En       news        —           —

Automatic sentence alignment A lot depends on the languages compared: we can expect sentences in some languages to be more similar.

Automatic word alignment (Figure: word alignment example. Source: Brown et al. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2): 79–85.)

Automatic word alignment Simple technique: compare the similarity between two words, e.g. using the Dice coefficient (a score of 1 means “identical”). This technique yields quite good results, though again probably dependent on whether the languages are historically related. Other techniques rely on positional information.
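One common instantiation compares the character-bigram sets of the two words; the cognate pair below is just an illustration:

```python
# Dice coefficient over character bigrams:
# 2 * |shared bigrams| / (|bigrams of w1| + |bigrams of w2|); 1 = identical sets.

def bigrams(word):
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice(w1, w2):
    b1, b2 = bigrams(w1), bigrams(w2)
    if not b1 or not b2:
        return 0.0
    return 2 * len(b1 & b2) / (len(b1) + len(b2))

print(dice("parliament", "parlement"))  # ~0.71: cognates share most bigrams
print(dice("dog", "chien"))             # 0.0: unrelated forms share none
```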

Aligning multiword expressions McEnery et al 1996: attempt to align nominal compounds in English and Spanish; combine rules and statistical heuristics; use the Dice coefficient to compare the extracted compounds; over 80% accuracy for compounds with a Dice score over .85.

Statistical MT Brown et al (1990): argued that the time was ripe for a return to the statistical paradigm in MT (abandoned in the 1950s). Given a sentence S in the source language, and a sentence T in the target language: what is the probability that S is a translation of T? If we know this for many candidate translations, then we can choose the sentence T which is most probable, given S.
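In the usual noisy-channel formulation (the standard way of writing Brown et al.’s idea), the chosen translation is:

```latex
\hat{T} = \arg\max_{T} P(T \mid S)
        = \arg\max_{T} \frac{P(T)\, P(S \mid T)}{P(S)}
        = \arg\max_{T} \underbrace{P(T)}_{\text{language model}}\;
                       \underbrace{P(S \mid T)}_{\text{translation model}}
```

P(S) is constant across candidate translations, so it drops out of the maximisation.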

Statistical MT Brown et al focused on English-French translation “for simple sentences it is reasonable to think of the French translation of an English sentence as being generated from the English sentence word by word” i.e. every English word maps to a French word with some probability, with no intervening semantics

Statistical MT: alignment For more complicated sentences: suppose English S is made up of several words w1, w2, …, wn. Given an alignment model, each word in S maps to one or more words in French; probabilities differ for each mapping; some words in English won’t map to anything in French; some words may map to more than one word in French.
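The word-mapping probabilities themselves can be estimated from an aligned corpus by expectation-maximisation. Below is a compact sketch in the spirit of IBM Model 1 (the simplest of Brown et al.’s models); the three-sentence corpus is invented, and fertility, NULL words and distortion are all omitted:

```python
# EM sketch in the spirit of IBM Model 1: learn t(f|e), the probability
# that English word e translates as French word f, from a toy corpus.
from collections import defaultdict

corpus = [
    ("the house".split(), "la maison".split()),
    ("the book".split(), "le livre".split()),
    ("a house".split(), "une maison".split()),
]

e_vocab = {e for es, _ in corpus for e in es}
f_vocab = {f for _, fs in corpus for f in fs}
t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}  # uniform init

for _ in range(20):
    count = defaultdict(float)   # expected co-occurrence counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    for es, fs in corpus:
        for f in fs:             # E-step: spread each f over the English words
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                count[(f, e)] += t[(f, e)] / norm
                total[e] += t[(f, e)] / norm
    for (f, e), c in count.items():   # M-step: re-normalise per English word
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 2))  # rises towards 1.0 over iterations
```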

Automatic word alignment (Figure repeated: the word alignment example from Brown et al. (1990).)

Brown et al’s alignment probabilities (Figures: translation probability tables for English “not” and English “hear”.) NB: the probabilities are learned automatically from aligned corpora using statistical heuristics.

Statistical MT: distortion Brown et al (1990) observation: words in English will tend to align with words in French in the same sentence position, e.g. an English word at the beginning of a sentence maps to a French word at the beginning of the sentence. But this is not always the case: sometimes the position of the French word is quite far from that of the English one. This is called distortion, and is captured using probabilities: what is the likelihood that word w in English maps to a word w’ in French which is in a different position?

Statistical MT: search Brown et al’s alignment model is then used to translate new sentences. Given an English sentence S: 1. search for the most likely French mappings for the words in S; 2. check whether there is distortion (and move the French words accordingly); 3. output the French sentence.
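In its crudest form (ignoring distortion and any language model), step 1 is just a table lookup per word; the probability table below is invented for illustration:

```python
# Crudest possible "decoding": pick the most probable French word for each
# English word from a learned table (values invented for illustration).
t = {
    "the": {"la": 0.5, "le": 0.45, "les": 0.05},
    "house": {"maison": 0.9, "chambre": 0.1},
}

def greedy_decode(sentence):
    return " ".join(max(t[w], key=t[w].get) for w in sentence.split() if w in t)

print(greedy_decode("the house"))  # -> "la maison"
```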

Some observations Highly domain-dependent! E.g. the most probable translation for hear is Bravo! Reason: the aligned corpus used was Canadian parliamentary transcripts, where “Hear, hear!” was often translated as “Bravo!”. The model had no notion of context: the assumption was that every word in a sentence was independent of every other word.

Example-based translation Again uses parallel aligned corpora. Simplest method: given sentence S in the source language, try to find S in the parallel corpus, and retrieve the aligned translation sentence. More elaborate methods exist, e.g. parsed bilingual corpora, aligned at the dependency level.
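A sketch of the simplest retrieval strategy (difflib’s string similarity is just a stand-in for a proper matching function, and the two-pair corpus is invented):

```python
# Example-based MT at its simplest: return the stored translation of the
# most similar source sentence in the aligned corpus.
import difflib

parallel = [
    ("how are you ?", "comment allez-vous ?"),
    ("where is the station ?", "où est la gare ?"),
]

def ebmt_translate(sentence):
    source, translation = max(
        parallel,
        key=lambda pair: difflib.SequenceMatcher(None, sentence, pair[0]).ratio())
    return translation

print(ebmt_translate("where is the bank ?"))  # nearest example: "où est la gare ?"
```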

Contemporary approaches Most corpus-based MT approaches recognise the limitations of using corpora alone. Many hybrid approaches are now developing, e.g. using statistical approaches combined with grammatical rules.

Summary MT is one of the fastest-growing applications in Language Engineering; millions are invested in research every year. Still far from perfect. Contemporary approaches rely strongly on aligned corpora, but are now becoming more hybrid.