
1 Wrapper Syntax for Example-Based Machine Translation
Karolina Owczarzak, Bart Mellebeek, Declan Groves, Josef Van Genabith, Andy Way
National Centre for Language Technology
School of Computing
Dublin City University

2 Overview
TransBooster – wrapper technology for MT
–motivation
–decomposition process
–variables and template contexts
–recomposition
Example-Based Machine Translation
–marker-based EBMT
Experiment
–English-Spanish
–Europarl, Wall Street Journal section of Penn II Treebank
–automatic and manual evaluation
Comparison with previous experiments

3 TransBooster – wrapper technology for MT
Assumption: MT systems perform better at translating short sentences than long ones.
Decompose long sentences into shorter and syntactically simpler chunks, send the chunks to translation, and recompose the output.
Decomposition is linguistically guided by a syntactic parse of the sentence.

4 TransBooster – wrapper technology for MT
TransBooster technology is universal and can be applied to any MT system
Experiments to date:
–TB and Rule-Based MT (Mellebeek et al., 2005a,b)
–TB and Statistical MT (Mellebeek et al., 2006a)
–TB and Multi-Engine MT (Mellebeek et al., 2006b)
TransBooster outperforms baseline MT systems

5 TransBooster – decomposition
Input – syntactically parsed sentence (Penn II format)
Decompose into pivot and satellites
–pivot: usually main predicate (plus additional material)
–satellites: arguments and adjuncts
Recursively decompose satellites if longer than x leaves
Replace satellites around pivot with variables
–static: simple same-type phrases with known translation
–dynamic: simplified version of original satellites
–send off to translation
Insert each satellite into a template context
–static: simple predicate with known translation
–dynamic: simpler version of original clause (pivot + simplified arguments, no adjuncts)
–send off to translation

6 TransBooster – decomposition example
Parsed input:
(S (NP (NP (DT the) (NN chairman)) (, ,) (NP (NP (DT a) (JJ long-time) (NN rival)) (PP (IN of) (NP (NNP Bill) (NNP Gates)))) (, ,)) (VP (VBZ likes) (NP (ADJP (JJ fast) (CC and) (JJ confidential)) (NNS deals))) (. .))
Pivot and satellites:
[The chairman, a long-time rival of Bill Gates,] ARG1 [likes] pivot [fast and confidential deals] ARG2.
Dynamic variables and contexts:
[The chairman] V1 [likes] pivot [deals] V2.
[The chairman, a long-time rival of Bill Gates,] ARG1 [likes deals] V1.
[The chairman likes] V1 [fast and confidential deals] ARG2.
Static variables and contexts:
[The man] V1 [likes] pivot [cars] V2.
[The chairman, a long-time rival of Bill Gates,] ARG1 [is sleeping] V1.
[The man sees] V1 [fast and confidential deals] ARG2.
All of the above are sent to the MT engine.
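The decomposition step can be illustrated with a minimal sketch. It is not the TransBooster implementation: the helper names (`parse`, `leaves`, `decompose`, `skeleton`) are hypothetical, and the pivot rule is a simplifying assumption (the verb-tagged child of the top-level VP is the pivot, every other non-punctuation constituent is a satellite). It only covers the single flat clause from the example slide, without recursion or dynamic variables.

```python
import re

# Penn II style parse of the example sentence from the slide.
TREE = ("(S (NP (NP (DT the) (NN chairman)) (, ,) (NP (NP (DT a) (JJ long-time) "
        "(NN rival)) (PP (IN of) (NP (NNP Bill) (NNP Gates)))) (, ,)) "
        "(VP (VBZ likes) (NP (ADJP (JJ fast) (CC and) (JJ confidential)) "
        "(NNS deals))) (. .))")

def parse(text):
    """Read a bracketed parse into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def leaves(tree):
    """Collect the words under a node, left to right."""
    words = []
    for child in tree[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words

def decompose(tree):
    """Split a clause into ordered ("pivot" | "satellite", text) parts.
    Assumption: the verb-tagged child of the VP is the pivot."""
    parts = []
    for child in tree[1]:
        if child[0] == "VP":
            for sub in child[1]:
                kind = "pivot" if sub[0].startswith("VB") else "satellite"
                parts.append((kind, " ".join(leaves(sub))))
        elif child[0][0].isalpha():   # skip punctuation nodes such as (. .)
            parts.append(("satellite", " ".join(leaves(child))))
    return parts

def skeleton(parts, variables):
    """Replace each satellite with a substitution variable, keeping the pivot."""
    subs = iter(variables)
    return " ".join(t if k == "pivot" else next(subs) for k, t in parts) + "."

parts = decompose(parse(TREE))
print(parts)
print(skeleton(parts, ["The man", "cars"]))   # static variables from the slide
```

With the static variables "The man" and "cars", the skeleton reproduces the slide's string "The man likes cars."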

7 TransBooster – recomposition
MT output: a set of translations with dynamic and static variables and contexts for a sentence S
Remove translations of dynamic variables and contexts from translation of S
If unsuccessful, back off to translation with static variables and contexts, remove those
Recombine translated pivot and satellites into output sentence

8 TransBooster – recomposition example
Source: The chairman, a long-time rival of Bill Gates, likes fast and confidential deals.
Dynamic variables and contexts:
[The chairman] V1 [likes] pivot [deals] V2. -> El presidente tiene gusto de repartos.
[The chairman, a long-time rival of Bill Gates,] ARG1 [likes deals] V1. -> El presidente, un rival de largo plazo de Bill Gates, tiene gusto de repartos.
[The chairman likes] V1 [fast and confidential deals] ARG2. -> El presidente tiene gusto de repartos rápidos y confidenciales.
Static variables and contexts:
[The man] V1 [likes] pivot [cars] V2. -> El hombre tiene gusto de automóviles.
[The chairman, a long-time rival of Bill Gates,] ARG1 [is sleeping] V1. -> El presidente, un rival de largo plazo de Bill Gates, está durmiendo.
[The man sees] V1 [fast and confidential deals] ARG2. -> El hombre ve repartos rápidos y confidenciales.
Recomposed translation: [El presidente, un rival de largo plazo de Bill Gates,] [tiene gusto de] [repartos rápidos y confidenciales].
Original translation: El presidente, rival de largo plazo de Bill Gates, gustos ayuna y los repartos confidenciales.
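The recomposition idea can be sketched as plain string matching, assuming that the translation of each variable or context is already known when its containing translation comes back. The function names `strip_context` and `recover` are hypothetical, and real recomposition would need to be more robust than exact prefix/suffix matching. The Spanish strings are the ones shown on the slide.

```python
def strip_context(translation, prefix="", suffix=""):
    """Remove a known prefix/suffix translation; return None if they do not match."""
    text = translation.strip().rstrip(".").strip()
    if len(prefix) + len(suffix) > len(text):
        return None
    if not (text.startswith(prefix) and text.endswith(suffix)):
        return None
    return text[len(prefix):len(text) - len(suffix)].strip()

def recover(candidates):
    """Try (translation, prefix, suffix) candidates in order:
    the dynamic one first, the static one as backoff."""
    for translation, prefix, suffix in candidates:
        piece = strip_context(translation, prefix, suffix)
        if piece:
            return piece
    return None

arg1 = recover([
    ("El presidente, un rival de largo plazo de Bill Gates, tiene gusto de repartos.",
     "", "tiene gusto de repartos"),                      # dynamic context
    ("El presidente, un rival de largo plazo de Bill Gates, está durmiendo.",
     "", "está durmiendo"),                               # static context
])
pivot = recover([
    ("El presidente tiene gusto de repartos.", "El presidente", "repartos"),
    ("El hombre tiene gusto de automóviles.", "El hombre", "automóviles"),
])
arg2 = recover([
    ("El presidente tiene gusto de repartos rápidos y confidenciales.",
     "El presidente tiene gusto de", ""),
    ("El hombre ve repartos rápidos y confidenciales.", "El hombre ve", ""),
])

output = f"{arg1} {pivot} {arg2}."
print(output)
```

On the slide's data all three dynamic candidates match, so the static backoff is never reached. The backoff fires only when the expected dynamic material cannot be found in the MT output.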

9 EBMT – Overview
An aligned bilingual corpus
Input text is matched against this corpus
The best match is found and a translation is produced
[Figure: the English input EX is searched against the aligned English (E1–E4) and French (F1–F4) corpus; matches F2 and F4 produce the French output FX]
Given in corpus
John went to school -> Jean est allé à l’école
The butcher’s is next to the baker’s -> La boucherie est à côté de la boulangerie
Isolate useful fragments
John went to -> Jean est allé à
the baker’s -> la boulangerie
We can now translate
John went to the baker’s -> Jean est allé à la boulangerie
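The fragment-reuse idea can be shown with a toy sketch. It assumes the useful fragments have already been isolated and aligned, and it uses a greedy longest-match lookup, which is a simplification chosen for illustration rather than the matching strategy of any particular EBMT system. `FRAGMENTS` and `translate` are hypothetical names.

```python
# Aligned fragments isolated from the corpus examples on the slide.
FRAGMENTS = {
    ("John", "went", "to"): "Jean est allé à",
    ("the", "baker’s"): "la boulangerie",
}

def translate(sentence, fragments):
    """Cover the input left to right with the longest known fragments."""
    words = sentence.split()
    output, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):       # longest span first
            key = tuple(words[i:j])
            if key in fragments:
                output.append(fragments[key])
                i = j
                break
        else:
            output.append(words[i])              # unknown word: copy through
            i += 1
    return " ".join(output)

print(translate("John went to the baker’s", FRAGMENTS))
```

Neither corpus sentence contains the full input, but the two fragments together cover it.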

10 EBMT – Marker-Based Chunking
English determiners = {the, a, these, ...}
French determiners = {le, la, l’, une, un, ces, ...}
English prepositions = {on, of, ...}
French prepositions = {sur, d’, ...}
English phrase: on virtually all uses of asbestos
French translation: sur virtuellement tous usages d’asbeste
Marker Chunks:
on virtually : sur virtuellement
all uses : tous usages
of asbestos : d’asbeste
Lexical Chunks:
on : sur
virtually : virtuellement
all : tous
uses : usages
of : d’
asbestos : asbeste
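A minimal sketch of marker-based segmentation follows. The rule is an assumption, since the slide does not spell it out: a new chunk starts at each marker word, provided the current chunk already contains at least one non-marker word. The marker sets include "all" and "tous" as further assumptions, because the slide's chunks begin with them although its sets are elided. Chunk alignment is not shown.

```python
EN_MARKERS = {"the", "a", "these", "all", "on", "of"}                     # "all" is assumed
FR_MARKERS = {"le", "la", "l’", "une", "un", "ces", "tous", "sur", "d’"}  # "tous" is assumed

def marker_chunks(words, markers):
    """Start a new chunk at a marker word once the current chunk has content."""
    chunks, current = [], []
    for word in words:
        has_content = any(w not in markers for w in current)
        if word in markers and has_content:
            chunks.append(current)
            current = []
        current.append(word)
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]

en = marker_chunks("on virtually all uses of asbestos".split(), EN_MARKERS)
fr = marker_chunks("sur virtuellement tous usages d’ asbeste".split(), FR_MARKERS)
print(en)
print(fr)
```

Both sides yield three chunks in the same order, matching the marker chunks listed on the slide.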

11 EBMT – System Overview

12 Experiment
English -> Spanish
Two test sets:
–Wall Street Journal section of Penn II Treebank: 800 sentences
–Europarl: 800 sentences
“Out-of-domain” factor:
–TransBooster developed on perfect Penn II trees
–EBMT trained on 958K English-Spanish Europarl sentences

13 Experiment – Results
Automatic evaluation

Europarl               BLEU      NIST
EBMT                   0.2111    5.9243
TransBooster           0.2134    5.9342
Percent of Baseline    101%      100.2%
Results for EBMT vs TransBooster on 741-sentence test set from Europarl.

Wall Street Journal    BLEU      NIST
EBMT                   0.1098    4.9081
TransBooster           0.1140    4.9321
Percent of Baseline    103.8%    100.5%
Results for EBMT vs TransBooster on 800-sentence test set from Penn II Treebank.
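The "Percent of Baseline" rows can be recomputed from the scores as the TransBooster score divided by the EBMT baseline score. The function name below is hypothetical. The recomputed Europarl BLEU ratio is about 101.1%, which the slide shows as 101%; the other three values agree to one decimal place.

```python
def percent_of_baseline(system_score, baseline_score):
    """Express a system score as a percentage of the baseline score."""
    return 100.0 * system_score / baseline_score

scores = {
    "Europarl BLEU": (0.2134, 0.2111),
    "Europarl NIST": (5.9342, 5.9243),
    "WSJ BLEU": (0.1140, 0.1098),
    "WSJ NIST": (4.9321, 4.9081),
}
for name, (transbooster, ebmt) in scores.items():
    print(f"{name}: {percent_of_baseline(transbooster, ebmt):.1f}%")
```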

14 Experiment – Results
Manual evaluation
100 randomly selected sentences from EP test set:
–source English sentence
–EBMT translation
–EBMT + TransBooster translation
3 judges, native speakers of Spanish fluent in English
Accuracy and fluency: relative scale for comparing the two translations
Inter-judge agreement (Kappa): Fluency > 0.948, Accuracy > 0.926

             Fluency    Accuracy
TB > EBMT    35.33%     35%
EBMT > TB    16%        19.33%

Absolute quality gain when using TransBooster:
Fluency 19.33% of sentences
Accuracy 15.67% of sentences
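The absolute quality gain is the share of judgments preferring TransBooster minus the share preferring plain EBMT. A small check of that arithmetic, using the percentages from the slide (the function name is hypothetical):

```python
def net_gain(tb_better_pct, ebmt_better_pct):
    """Net share of judgments in which TransBooster was preferred."""
    return round(tb_better_pct - ebmt_better_pct, 2)

fluency_gain = net_gain(35.33, 16.0)
accuracy_gain = net_gain(35.0, 19.33)
print(fluency_gain, accuracy_gain)
```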

15 Experiment – Results
TB improvements:
Example 1
Source: women have decided that they wish to work, that they wish to make their work compatible with their family life.
EBMT: hemos decidido su deseo de trabajar, su deseo de hacer su trabajo compatible con su vida familiar. empresarias
TB: mujeres han decidido su deseo de trabajar, su deseo de hacer su trabajo compatible con su vida familiar.
Example 2
Source: if this global warming continues, then part of the territory of the eu member states will become sea or desert.
EBMT: si esto continúa calentamiento global, tanto dentro del territorio de los estados miembros tendrán tornarse altamar o desértico
TB: si esto calentamiento global perdurará, entonces parte del territorio de los estados miembros de la unión europea tendrán tornarse altamar o desértico

16 Previous experiments

TB vs. SMT: EP      BLEU      NIST
SMT                 0.1986    5.8393
TransBooster        0.2052    5.8766
% of Baseline       103.3%    100.6%
TransBooster vs. SMT on 800-sentence test set from Europarl.

TB vs. RBMT: WSJ    BLEU      NIST
Rule-Based MT       0.3108    7.3428
TransBooster        0.3163    7.3901
% of Baseline       101.7%    100.6%
Results for TransBooster vs. Rule-Based MT on 800-sentence test set from Penn II Treebank.

TB vs. SMT: WSJ     BLEU      NIST
SMT                 0.1343    5.1432
TransBooster        0.1379    5.1259
% of Baseline       102.7%    99.7%
TransBooster vs. SMT on 800-sentence test set from Penn II Treebank.

TB vs. EBMT: EP     BLEU      NIST
EBMT                0.2111    5.9243
TransBooster        0.2134    5.9342
% of Baseline       101%      100.2%
TransBooster vs. EBMT on 800-sentence test set from Europarl.

TB vs. EBMT: WSJ    BLEU      NIST
EBMT                0.1098    4.9081
TransBooster        0.1140    4.9321
% of Baseline       103.8%    100.5%
TransBooster vs. EBMT on 800-sentence test set from Penn II Treebank.

19 Summary
TransBooster is a universal technology to decompose and recompose MT text
Net improvement in translation quality against EBMT:
–Fluency 19.33% of sentences
–Accuracy 15.67% of sentences
Successful experiments to date: rule-based MT, phrase-based SMT, multi-engine MT, EBMT
Journal article in preparation

20 Thank You

