Machine Translation Introduction

Jan Odijk, LOT Winterschool, Amsterdam, January 2011

Overview
MT: What is it?
MT: What is not possible (yet?)
MT: Why is it so difficult?
MT: Can we make it possible?
MT: Evaluation
MT: What is (perhaps) possible
Conclusions

MT: What is it?
Input: text in the source language.
Output: text in the target language that is a translation of the input text.

MT: What is it?
Diagram (levels of translation): direct translation from Input to Output; transfer from Analyzed input to Analyzed output; Interlingua at the top.

MT: System Types
Direct: the earliest systems (1950s), direct word-to-word translation; recent statistical MT systems.
Transfer: almost all research and commercial systems up to 1990.
Interlingual.

MT: System Types (continued)
Interlingual: a few research systems in the 1980s: Rosetta (Philips), based on Montague Grammar, with semantic derivation trees of attuned grammars as interlingua; Distributed Language Translation (DLT, BSO), with (enriched) Esperanto as interlingua; sometimes logical representations.
Hybrid interlingual/transfer: transfer for lexicons; interlingua (IL) for rules.

Rule-Based Systems
Most systems: an explicit source language grammar; a parser yields an analysis of the source language input; a transfer component turns it into a target language structure; there is no explicit grammar of the target language (except morphology).

Rule-Based Systems
Some systems (Eurotra): explicit source and target language grammars, sometimes reversible; a parser yields an analysis of the source language input; a transfer component turns it into a target language structure; the translation is generated by the target language grammar.

Rule-Based Systems
Some systems (Rosetta, DLT): explicit source and target language grammars, in some cases reversible; a parser yields an interlingual representation; the translation is generated by the target language grammar from the interlingual representation.

MT: Is it difficult?
FAHQT: Fully Automatic High Quality Translation.
Fully automatic: no human intervention. High quality: close or equal to human translation.
Even acceptable quality is difficult to achieve.

MT: Why is it so difficult?
Ambiguity (real and temporary); computational complexity; complexity of language; divergences between languages; language competence v. language use; the need for large and rich lexicons.

De jongen sloeg het meisje met de gitaar ('The boy hit the girl with the guitar')
Hij heeft boeken gelezen: He has been reading books (*He has been reading for books)
Hij heeft uren gelezen: He has been reading for hours (*He has been reading hours)

Uren ('hours'): not only uren, but also dagen ('days'), de hele dag ('the whole day'), weken ('weeks'), … (words expressing units of time). But also: de hele vergadering ('the whole meeting'), bijeenkomst, les, … (words expressing events).

Hij draagt een bruin pak ('He wears/carries a brown suit/package')
Dragen: wear or carry. Pak: suit or package.
Hij draagt een bruin pak en zwarte schoenen ('... and black shoes')
Hij draagt een bruin pak onder zijn arm ('... under his arm')

Voert uw bedrijf sloten uit? ('Does your company export locks?') Uitvoeren: execute, or export? Bedrijf: act, or company? Sloten: ditches, or locks?

Temporary Ambiguity
Hij heeft boeken gelezen: heeft: main or auxiliary verb? boeken: noun or verb?
Voert uw bedrijf sloten uit?: voert: form of voeren or of uitvoeren? bedrijf: noun or verb form? sloten uit: noun + particle, or PP 'out of ditches/locks'?

Why is MT difficult? Summary
Ambiguity of natural language requires modeling knowledge of the world/situation, by rule systems and/or by statistics.

Computational complexity: high demands on processing capacity and on memory.
Complexity of language: many different construction types, all interacting with each other.

Why is MT difficult?
Divergences between languages require deep syntactic analysis or very sophisticated statistical techniques.

Divergences: Category mismatches
Simple category mismatches: woonachtig (zijn) v. reside (Adj–Verb); zich ergeren v. (be) annoyed (Verb–Adj); verliefd v. in love (Adj–Prep+Noun); kunnen v. (be) able; kunnen v. (be) possible; door- v. continue (to).

More complex category mismatches: graag vs. like (Adv vs. Verb): hij zwemt graag vs. he likes to swim; toevallig vs. happen: hij viel toevallig vs. he happened to fall.

Phrasal category mismatches: de zieke vrouw vs. the woman who is ill (*the ill woman); I expect her to leave vs. ik verwacht dat zij vertrekt; She is likely to come vs. het is waarschijnlijk dat zij komt.

Conflational Divergences
Prepositional complements: houden van vs. love. Existential er vs. Ø: er passeerde een auto vs. a car passed. Verbal particles: blow (something) up vs. volar.

Conflational Divergences (continued)
Reflexive verbs: zich scheren vs. shave. Composed vs. simple tense forms: he will do it vs. lo hará. Split negatives vs. composed negatives: he does not see anyone vs. hij ziet niemand.

Functional Divergences
I like these apples vs. me gustan estas manzanas; se venden manzanas aquí vs. hier verkoopt men appels; er werd door de toeschouwers gejuicht vs. the spectators were cheering.

Divergences: MWEs
Semi-fixed MWEs: nuclear power plant vs. kerncentrale.
Flexible idioms: de plaat poetsen vs. bolt; de pijp uit gaan v. to kick the bucket.

Divergences: MWEs (continued)
Semi-idioms (collocations): zware shag vs. strong tobacco.
Semi-idioms (support verbs): aandacht besteden aan vs. pay attention to.

Language Competence v. Language Use: earlier systems implemented an idealized reality, not the language use that actually occurs. In some cases they focused on theoretically interesting, difficult constructions that do occur in reality, but other constructions are more important to deal with in practical systems.

Large and rich lexicons: existing human-oriented dictionaries are not suited as such. All information must be available in a formalized way, and much more information is needed than in a traditional dictionary.

Multi-word Expressions (MWEs): they appear in current dictionaries only in a very informal way; there are no standards on how to represent them lexically; there are many different types requiring different treatment in the grammar; their numbers are huge; domain- and company-specific terminology often consists of MWEs.

MT: Can we make it possible?
Probably not, but we can still improve significantly: lexicons, selection restrictions, approximating analyses, statistical MT.

Large and rich lexicons: widely accepted and used (de facto) standards; methods and tools to quickly adapt to domain- or company-specific vocabulary; better treatment of MWEs and standards for the lexical representation of MWEs.

Selection restrictions with a type system to approach modeling of world knowledge (requires sophisticated syntactic analysis):
Boek: info (legible)
Uur: time unit → duration
Vergadering: event → duration
Lezen: subject = human; object = info (legible)
A durational adjunct must be a duration phrase.

Selection restrictions
Pak (1) (suit): clothes
Pak (2) (package): entity
Dragen (1) (wear): subj = animate; object = clothes
Dragen (2) (carry): subj = animate; object = entity
Schoen: clothes
Entity > clothes
Identity preferred over subsumption
Homogeneous object preferred over heterogeneous one

Selection restrictions
Hij draagt een bruin pak:
He wears a brown suit (1: clothes = clothes)
He carries a brown package (1: entity = entity)
He carries a brown suit (2: entity > clothes)
*He wears a brown package (clothes ¬> entity)
Hij draagt een bruin pak en zwarte schoenen:
He wears a brown suit and black shoes (1: homogeneous and clothes = clothes)
He carries a brown suit and black shoes (2: homogeneous but entity > clothes)
He carries a brown package and black shoes (2: heterogeneous but entity = entity)
*He wears a brown package and black shoes (clothes ¬> entity)
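The ranking on these slides can be reproduced mechanically. Below is a minimal Python sketch, assuming a hypothetical two-type hierarchy (clothes below entity) and only the senses of pak, schoen and dragen listed on the slides; the subject restriction (animate) is not modelled. A reading is rejected when an object's type is not subsumed by the type the verb sense requires; the surviving readings are ranked first by homogeneity of the conjoined objects and then by the number of identity (rather than subsumption) matches. That ordering of the two preferences is an assumption: the slides state both preferences but not how they interact, and they give both 'carry' readings of the coordinated example the same rank.

```python
import itertools

# Hypothetical toy type hierarchy: each type points to its direct supertype.
SUPERTYPE = {"clothes": "entity", "entity": None}

# Senses from the slides: noun sense -> semantic type, verb sense -> required object type.
PAK = {"suit": "clothes", "package": "entity"}
SCHOEN = {"shoes": "clothes"}
DRAGEN = {"wear": "clothes", "carry": "entity"}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` itself or one of its supertypes."""
    t = specific
    while t is not None:
        if t == general:
            return True
        t = SUPERTYPE[t]
    return False


def rank_readings(objects: list[dict[str, str]]) -> list[tuple[str, list[str]]]:
    """Rank (verb sense, object senses) readings of 'dragen' + conjoined objects, best first."""
    scored = []
    for verb, required in DRAGEN.items():
        # Try every combination of senses for the conjoined object nouns.
        for combo in itertools.product(*(o.items() for o in objects)):
            types = [t for _, t in combo]
            if not all(subsumes(required, t) for t in types):
                continue  # restriction violated, e.g. *wear a package
            homogeneous = len(set(types)) == 1             # homogeneous object preferred
            identical = sum(t == required for t in types)  # identity preferred over subsumption
            scored.append(((homogeneous, identical), verb, [w for w, _ in combo]))
    # Stable sort: readings with equal scores keep their generation order.
    scored.sort(key=lambda r: r[0], reverse=True)
    return [(verb, words) for _, verb, words in scored]


print(rank_readings([PAK]))
# → [('wear', ['suit']), ('carry', ['package']), ('carry', ['suit'])]
print(rank_readings([PAK, SCHOEN]))
# → [('wear', ['suit', 'shoes']), ('carry', ['suit', 'shoes']), ('carry', ['package', 'shoes'])]
```

The starred readings never appear in the output because the subsumption check filters them out before any ranking takes place.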

Approximating analyses: ignore certain ambiguities to begin with; use only a limited amount of relevant information; cut off the analysis when there are too many alternatives. This is actually done in all practical systems today; new ways of doing this are needed that do not affect quality too seriously.

Statistical MT: derives the MT system automatically from statistics taken from aligned parallel corpora (→ translation model) and monolingual target language corpora (→ language model). Worked on since the early 1990s.
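One common way to write down this division of labour is the noisy-channel formulation of early statistical MT (a sketch; the slide itself gives no formula). For a source sentence s, the system searches for the target sentence t̂ with the highest probability; by Bayes' rule, and because P(s) is the same for every candidate t, this splits into a language model P(t) estimated from monolingual target text and a translation model P(s | t) estimated from the aligned parallel corpus:

```latex
\hat{t} = \arg\max_{t} P(t \mid s)
        = \arg\max_{t} \frac{P(t)\,P(s \mid t)}{P(s)}
        = \arg\max_{t} \underbrace{P(t)}_{\text{language model}}\;\underbrace{P(s \mid t)}_{\text{translation model}}
```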

Plus: no or very limited grammar development; includes language and world knowledge automatically (but implicitly); based on actually occurring data; currently many experimental and commercial systems.
Minus: requires large aligned parallel corpora; unclear how much linguistics will be needed anyway; probably restricted to very limited domains only.

Google Translate (statistical MT)
Hij draagt een pak. → √ He wears a suit.
Hij draagt schoenen. → √ He wears shoes.
Hij draagt bruine schoenen en een pak. → √ He wears a suit and brown shoes. (!!)
Hij draagt het pakket → √ He carries the package
Hij heeft een pak aan. → *He has a suit.
Voert uw bedrijf sloten uit? → *Does your company locks out?

Euromatrix
Esp. “the Euromatrix”, which lists data and tools for European language pairs.
Goals: translation systems for all pairs of EU languages; organization, analysis and interpretation of a competitive annual international evaluation of machine translation; the provision of open source machine translation technology, including research tools, software and data; a systematically compiled and constantly updated detailed survey of the state of MT technology for all EU language pairs; efficient inclusion of linguistic knowledge into statistical machine translation; the development and testing of hybrid architectures for the integration of rule-based and statistical approaches.
Successor project: EuromatrixPlus.

META-NET (EU-funded): building a community with a shared vision and strategic research agenda; building META-SHARE, an open resource exchange facility; building bridges to neighbouring technology fields: Bringing more Semantics into Translation; Optimising the Division of Labour in Hybrid MT; Exploiting the Context for Translation; Empirical Base for Machine Translation.

PACO-MT investigates a hybrid approach to MT (rule-based and statistical): it uses an existing parser for source language analysis, statistical n-gram language models for generation, and a statistical approach to transfer.

MT Evaluation
Evaluation depends on the purpose of MT and how it is used (application, domain, controlled language). Many aspects can be evaluated: functionality, efficiency, usability, reliability, maintainability, portability, translation quality, embedding in the workflow, post-editing options/tools.

MT Evaluation
Focus here: does the system yield good translations according to human judgement, in the context of developing a system? Again, many aspects: fidelity (how close), correctness, adequacy, informativeness, intelligibility, fluency, and many ways to measure these aspects.

MT Evaluation: Test suite
Reference = a list of (carefully selected) sentences with their translations (ordered by score), judged correct by a human (usually the developer). Upon every update of the system, the output of the new system is compared to the reference; if it differs, either the system or the reference has to be adapted.
Advantages: focus on specific translation problems is possible; excellent for regression testing; manual judgement is needed only once for each new output, and other comparisons are automatic.
Disadvantages: not really independent; particularly suited to pure rule-based systems; human judgement is needed if the output differs from the reference.

MT Evaluation: Comparison against a translation corpus
A translation corpus created independently by human translators, possibly with multiple equivalently correct translations of a sentence.
Advantages: truly independent; also suited for data-driven systems.
Disadvantages: requires human judgement every time there is a system update; high effort by highly skilled people, high costs, requires a lot of time; human judgement is not easy (unless there is a perfect match).
Useful for a one-time evaluation of a stable system, not for evaluation during development.

MT Evaluation: Edit distance (Word Accuracy)
A metric to determine the closeness of translations automatically: the least number of edit operations needed to turn the translated sentence into the reference sentence (Alshawi et al. 1998).

MT Evaluation
$WA = 1 - \frac{d + s + i}{\max(r, c)}$
where d = number of deletions, s = number of substitutions, i = number of insertions, r = reference sentence length, c = candidate sentence length.
Easy to calculate using the Levenshtein distance algorithm (dynamic programming); various extensions have been proposed.

MT Evaluation
Advantages: fully automatic, given a reference set.
Disadvantages: penalizes candidates if a synonym is used; penalizes swaps of words and blocks of words too much.

MT Evaluation: BLEU (a method to automate MT evaluation)
BiLingual Evaluation Understudy. Premise: the closer a machine translation is to a professional human translation, the better it is.
Required: a corpus of good-quality human reference translations, and a “closeness” metric.

MT Evaluation: two candidate translations from a Chinese source
C1: It is a guide to action which ensures that the military always obeys the commands of the party
C2: It is to insure the troops forever hearing the activity guidebook that party direct
Intuitively, C1 is better than C2.

MT Evaluation: three reference translations
R1: It is a guide to action that ensures that the military will forever heed Party commands
R2: It is the guiding principle which guarantees the military forces always being under the command of the Party
R3: It is the practical guide for the army always to heed the directions of the party

MT Evaluation: basic idea
A good candidate translation shares many words and phrases with the reference translations, so comparing n-gram matches can be used to rank candidate translations. An n-gram is a sequence of n word occurrences; BLEU uses n = 1, 2, 3, 4. Unigrams give a measure of adequacy; longer n-grams give a measure of fluency.

MT Evaluation
For unigrams: count the number of candidate unigrams that occur in any of the references, and divide by the total number of unigrams in the candidate sentence.

MT Evaluation: problem
C1: the the the the the the the (= 7/7 = 1)
R1: the cat is on the mat
Solution: clip the matching count (7) by the maximum reference count (2) → 2 (Count_clip) → modified unigram precision = 2/7 = 0.29.

MT Evaluation: example (unigrams)
C1: It is a guide to action which ensures that the military always obeys the commands of the party (17/18 = 0.94), compared against references R1–R3 as before.

MT Evaluation: example (unigrams)
C2: It is to insure the troops forever hearing the activity guidebook that party direct (8/14 = 0.57), compared against references R1–R3 as before.

MT Evaluation: example (bigrams)
C1: It is a guide to action which ensures that the military always obeys the commands of the party (10/17 = 0.59), compared against references R1–R3 as before.

MT Evaluation: example (bigrams)
C2: It is to insure the troops forever hearing the activity guidebook that party direct (1/13 = 0.08), compared against references R1–R3 as before.

MT Evaluation: extend to a full multi-sentence corpus
Compute n-gram matches sentence by sentence, sum the clipped n-gram counts for all candidates, and divide by the number of candidate n-grams in the test corpus:
$p_n = \dfrac{\sum_{C \in \{Candidates\}} \sum_{n\text{-gram} \in C} Count_{clip}(n\text{-gram})}{\sum_{C' \in \{Candidates\}} \sum_{n\text{-gram}' \in C'} Count(n\text{-gram}')}$

MT Evaluation: combining n-gram precision scores
A weighted linear average $\sum_{n=1}^{N} w_n p_n$ works reasonably, but n-gram precision decays exponentially with n, so logarithms are used to compensate: $\exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$. Weights in BLEU: $w_n = 1/N$.

MT Evaluation
BLEU is a precision measure: $\#(C \cap R) / \#C$.
Recall is difficult to define because of the multiple reference translations; e.g. $\#(C \cap R_s) / \#R_s$ where $R_s = \bigcup_i R_i$ will not work.

MT Evaluation
C1: I always invariably perpetually do
C2: I always do
R1: I always do
R2: I invariably do
R3: I perpetually do
The recall of C1 over R1–R3 is better than that of C2, but C2 is the better translation.

MT Evaluation
But without recall: C1: of the, compared with the references R1–R3 for the Chinese source as before: modified unigram precision = 2/2, modified bigram precision = 1/1, which is the wrong result.

MT Evaluation: length
n-gram precision penalizes translations longer than the reference, but not translations shorter than the reference → add a Brevity Penalty (BP).

MT Evaluation
$b_i$ = best match length = the reference sentence length closest to candidate sentence i's length (e.g. r: 12, 15, 17; c: 12 → 12)
$r$ = test corpus effective reference length = $\sum_i b_i$
$c$ = total length of the candidate translation corpus

MT Evaluation
$BP = 1$ if $c > r$; $BP = e^{(1 - r/c)}$ if $c \le r$
$BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$
computed over the corpus, not sentence by sentence and averaged.

MT Evaluation: BLEU
Claim: BLEU closely matches human judgement when averaged over a test corpus, not necessarily on individual sentences; shown extensively in Papineni et al. 2001.
→ Multiple reference translations are desirable to cancel out the translation styles of individual translators (e.g. East Asian economy v. economy of East Asia).

MT Evaluation: variants on BLEU
NIST: different weights, different BP.
ROUGE (Lin and Hovy 2003): for text summarization; Recall-Oriented Understudy for Gisting Evaluation.

MT Evaluation
Main advantage of BLEU: automatic evaluation; good for use during development; particularly useful for data-based systems.
Disadvantages: defined for a whole test corpus, not for individual sentences; just measures the difference with the reference.

MT: What is (perhaps) possible
Cross-language information retrieval; low-quality MT for gist extraction; MT and speech technology; controlled language; limited domain; interaction with the author; combinations of the above; computer-aided translation.

Cross-Language Information Retrieval (CLIR): the input query is in the user's own language; it is translated into the target languages; the search is performed in target-language documents; the results are in the target language. Translation of individual words only. Growing need (growing multilingual Web); no perfect translation required.
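The word-by-word query translation step can be sketched as follows. This is a minimal Python example, assuming a hypothetical toy Dutch–English lexicon and the design choice (an assumption, not stated on the slide) of keeping every translation of an ambiguous word instead of disambiguating it; since no perfect translation is required, the extra query terms only broaden the search.

```python
# Hypothetical toy Dutch -> English word lexicon; real CLIR would use a large bilingual dictionary.
NL_EN = {
    "bruin": ["brown"],
    "pak": ["suit", "package"],
    "schoenen": ["shoes"],
}


def translate_query(query: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Word-by-word query translation.

    No disambiguation: every translation of an ambiguous word is kept and
    passed to the search engine; unknown words (e.g. names) pass through as-is.
    """
    terms: list[str] = []
    for word in query.lower().split():
        terms.extend(lexicon.get(word, [word]))
    return terms


print(translate_query("bruin pak Philips", NL_EN))  # ['brown', 'suit', 'package', 'philips']
```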

Low-quality MT for gist extraction: low quality, but still useful. If a document is interesting, a high-quality human translation can be requested (which has to be paid for).

CLIR fills a growing need in the market and is technically feasible. It creates a need for translation of the documents found, which is solved partially by low-quality MT; it potentially creates a need for more human translation; and it stimulates (funds) research into more sophisticated MT.

Combine MT (statistical or rule-based) with OCR technology: take a picture of a text with your phone; the text is OCR-ed; the text (usually short and simple) is translated. Examples: Linguatec Shoot & Translate, Word Lens.

Combine MT (statistical or rule-based) with speech technology. On the one hand this complicates the problem, but speech technology (ASR) is currently limited to very limited domains (which makes MT simpler). Many useful speech technology applications are currently on the market (directory assistance, tourist information, tourist communication, call centers, navigation, hotel reservations), and some will profit from built-in automatic translation.

Large EC FP6 project TC-STAR (2004–): research into improved speech technology (ASR and TTS), research into statistical MT, and research into combining both (speech-to-speech translation), in a few selected limited domains.

Commercial speech-to-speech translation: Jibbigo speech-to-speech translation (iPhone, Android); Talk to Me (Android phones).

Controlled language: an authoring system limits the vocabulary and syntax of document authors. This is often desirable in companies to get consistent documentation (e.g. aircraft maintenance manuals): AECMA Simplified English, GIFAS Rationalized French. It makes MT easier (the language is well-defined).

Limited domain: translation of weather reports (TAUM-Meteo, Canada) and avalanche warnings (Switzerland); fast adaptation to domain- or company-specific vocabulary and terminology.

Interaction with the author: no fully automatic translation. The document author resolves ambiguities left unresolved by the system, in a dialogue between the author and the system in the source language. This approach was taken in the Rosetta project (Philips). It will only work if the number of unresolved ambiguities is low and the questions to resolve ambiguity are clear.

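Hij droeg een bruin pak ('He wore/carried a brown suit/package')
Wat bedoelt u met “pak”? ('What do you mean by “pak”?') (1) kostuum ('suit') (2) pakket ('package')
Wat bedoelt u met “dragen (droeg)”? ('What do you mean by “dragen (droeg)”?') (1) aan of op hebben (kleding) ('have on, of clothing') (2) bij zich hebben (bijv. in de hand) ('have with you, e.g. in your hand')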

Combinations of the above

Computer-aided translation: for end-users, and for professional translators and the localization industry. Limited functionality; specific terminology; bootstrap the translation automatically, followed by human revision and correction (post-editing). This is only useful if the MT quality is such that it reduces effort and the system is fully integrated into the workflow system.

Conclusions
FAHQT is not possible (yet?): MT is really very difficult! Several constrained versions do yield usable technology with state-of-the-art MT, and in some cases even potentially create additional needs for MT and human translation.

Conclusions (continued)
Statistical MT yields practical systems that are relatively quick to produce (but of low quality). More research and lots of hard work are needed to get better systems; this will probably require hybrid systems (mixed statistically based/knowledge-based), which is where the focus of research lies (PACO-MT, META-NET, …). This needs to be financed by niches where current state-of-the-art MT yields usable technology and there is a market.
