Exploding the Myth the gerund in machine translation Nora Aranberri.


1 Exploding the Myth the gerund in machine translation Nora Aranberri

2 Background
Nora Aranberri
–PhD student at CTTS (Dublin City University)
–Funded by Enterprise Ireland and Symantec (Innovation Partnerships Programme)
Symantec
–Software publisher
–Localisation requirements
  Translation: rule-based machine translation system (Systran)
  Documentation authoring: controlled language (CL checker: acrocheck)
–Project: CL checker rule refinement

3 The Myth
The gerund is handled badly by MT systems and should be avoided
Sources: translators, post-editors, scholars
–Considered a translation issue for MT due to its ambiguity (Bernth & McCord, 2000; Bernth & Gdaniec, 2001)
–Addressed by CLs (Adriaens & Schreurs, 1992; Wells Akis, 2003; O'Brien, 2003; Roturier, 2004)

4 What is a gerund?
An -ing form can be a gerund, a participle, or part of a continuous tense; the form is the same in all three.
Examples
–GERUND: Steps for auditing SQL Server instances.
–PARTICIPLE: When the job completes, BACKINT saves a copy of the Backup Exec restore logs for auditing purposes.
–CONTINUOUS TENSE: Server is auditing and logging.
Conclusion: gerunds and participles can be difficult to differentiate for MT.

5 Methodology: creating the corpus
Initial corpus
–Risk management components texts
–494,618 words
–uncontrolled
Structure of study
–Preposition or subordinate conjunction + -ing
Extraction of relevant segments
–acrocheck: CL checker asked to flag the patterns of the structure IN + VBG|NN|JJ -ing
–1,857 sentences isolated
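The extraction pattern named on this slide (an IN token followed by an -ing word tagged VBG, NN or JJ) can be illustrated with a small sketch. It assumes the text has already been tagged with Penn Treebank part-of-speech tags; the function name `find_prep_ing` and the tagged sample are hypothetical, and the slides do not show how the rule is actually written in acrocheck.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag); tagging is assumed done upstream

ING_TAGS = {"VBG", "NN", "JJ"}  # tags the slide allows on the -ing word


def find_prep_ing(tagged: List[Token]) -> List[Tuple[str, str]]:
    """Return (preposition/conjunction, -ing word) pairs where an IN token
    is immediately followed by an -ing word tagged VBG, NN or JJ."""
    hits = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "IN" and t2 in ING_TAGS and w2.lower().endswith("ing"):
            hits.append((w1.lower(), w2.lower()))
    return hits


# Hypothetical tagging of the gerund example from the "What is a gerund?" slide.
sample = [("Steps", "NNS"), ("for", "IN"), ("auditing", "VBG"),
          ("SQL", "NNP"), ("Server", "NNP"), ("instances", "NNS")]
print(find_prep_ing(sample))  # [('for', 'auditing')]
```

Allowing NN and JJ alongside VBG reflects the slide's pattern: taggers often label -ing words as nouns or adjectives, so restricting the match to VBG would miss relevant segments.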

6 Methodology: translation
Apply machine translation for target language
–MT used: Systran Server 5.05
–Dictionaries
  No specific dictionaries created for the project
  Systran in-built computer science dictionary applied
–Languages
  Source language: English
  Target languages: Spanish, French, German and Japanese

7 Methodology: evaluation (1)
Evaluators
–one evaluator per target language only
–native speakers of the target languages
–translators / MA students with experience in MT
Evaluation format

8 Methodology: evaluation (2)
Analysis of the relevant structure only
Questions:
–Q1: is the structure correct?
–Q2: is the error due to the misinterpretation of the source or because the target is poorly generated?
Both are yes/no questions.

9 Results: prepositions / subordinate conjunctions
preposition        examples
by + ing                377
for + ing               339
when + ing              256
before + ing            163
after + ing             122
about + ing              96
on + ing                 89
without + ing            75
of + ing                 71
from + ing               68
while + ing              54
in + ing                 36
if + ing                 19
rather than + ing        14
such as + ing            13
TOTAL                  1857

10 Results: correctness for Spanish
                              Spanish
preposition        examples   correct incorrect
by + ing                377       351        26
for + ing               339       243        96
when + ing              256       205        51
before + ing            163       145        18
after + ing             122       107        15
about + ing              96        82        14
on + ing                 89        38        51
without + ing            75        47        28
of + ing                 71        65         6
from + ing               68        30        38
while + ing              54        35         1
in + ing                 36        27         9
if + ing                 19        15         4
rather than + ing        14                   0
such as + ing            13         9         4
TOTAL                  1857      1393       464
%                              75.01%    24.99%

11 Results: correctness for French
                              Spanish             French
preposition        examples   correct incorrect   correct incorrect
by + ing                377       351        26       358        19
for + ing               339       243        96       284        55
when + ing              256       205        51         2       254
before + ing            163       145        18       146        17
after + ing             122       107        15       117         5
about + ing              96        82        14        82        14
on + ing                 89        38        51        80         9
without + ing            75        47        28        65        10
of + ing                 71        65         6                   6
from + ing               68        30        38        31        37
while + ing              54        35         1        45         9
in + ing                 36        27         9                   9
if + ing                 19        15         4        10         9
rather than + ing        14                   0                   0
such as + ing            13         9         4         9         4
TOTAL                  1857      1393       464      1341       516
%                              75.01%    24.99%    72.21%    27.79%

12 Results: correctness for German
                              Spanish             French              German
preposition        examples   correct incorrect   correct incorrect   correct incorrect
by + ing                377       351        26       358        19       364        13
for + ing               339       243        96       284        55       262        77
when + ing              256       205        51         2       254       213        43
before + ing            163       145        18       146        17       145        18
after + ing             122       107        15       117         5       114         8
about + ing              96        82        14        82        14        88         8
on + ing                 89        38        51        80         9        58        31
without + ing            75        47        28        65        10        71         4
of + ing                 71        65         6                   6        60        11
from + ing               68        30        38        31        37        24        44
while + ing              54        35         1        45         9        27
in + ing                 36        27         9                   9        23        13
if + ing                 19        15         4        10         9        17         2
rather than + ing        14                   0                   0                   0
such as + ing            13         9         4         9         4         9         4
TOTAL                  1857      1393       464      1341       516      1514       343
%                              75.01%    24.99%    72.21%    27.79%    81.53%    18.47%

13 Results: correctness for Japanese
                              Spanish             French              German              Japanese
preposition        examples   correct incorrect   correct incorrect   correct incorrect   correct incorrect
by + ing                377       351        26       358        19       364        13       301        76
for + ing               339       243        96       284        55       262        77       224       115
when + ing              256       205        51         2       254       213        43       161        95
before + ing            163       145        18       146        17       145        18       134        29
after + ing             122       107        15       117         5       114         8       108        14
about + ing              96        82        14        82        14        88         8                   8
on + ing                 89        38        51        80         9        58        31        29        60
without + ing            75        47        28        65        10        71         4        66         9
of + ing                 71        65         6                   6        60        11        57        14
from + ing               68        30        38        31        37        24        44        33        35
while + ing              54        35         1        45         9        27                  44        10
in + ing                 36        27         9                   9        23        13         9        27
if + ing                 19        15         4        10         9        17         2                   2
rather than + ing        14                   0                   0                   0         1        13
such as + ing            13         9         4         9         4         9         4         8         5
TOTAL                  1857      1393       464      1341       516      1514       343      1303       554
%                              75.01%    24.99%    72.21%    27.79%    81.53%    18.47%    70.17%    29.83%
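The percentage row of the correctness table follows directly from its TOTAL row. As a quick check, the sketch below recomputes the success rates from the totals reported on the slide (1,857 examples; 1,393, 1,341, 1,514 and 1,303 correct). The variable and function names are illustrative only.

```python
TOTAL_EXAMPLES = 1857  # sentences evaluated per target language

# Structures judged correct, from the TOTAL row of the slide.
correct = {"Spanish": 1393, "French": 1341, "German": 1514, "Japanese": 1303}


def success_rate(n_correct: int, n_total: int = TOTAL_EXAMPLES) -> float:
    """Share of correctly translated structures, as a percentage with 2 decimals."""
    return round(100 * n_correct / n_total, 2)


rates = {lang: success_rate(n) for lang, n in correct.items()}
for lang, rate in rates.items():
    print(f"{lang}: {rate:.2f}% correct, {100 - rate:.2f}% incorrect")
```

The results (75.01%, 72.21%, 81.53% and 70.17%) match the percentages shown on the slide.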

14 Significant results
                              Spanish             French              German              Japanese
preposition        examples   correct incorrect   correct incorrect   correct incorrect   correct incorrect
by + ing                377       351        26       358        19       364        13       301        76
for + ing               339       243        96       284        55       262        77       224       115
when + ing              256       205        51         2       254       213        43       161        95
before + ing            163       145        18       146        17       145        18       134        29
after + ing             122       107        15       117         5       114         8       108        14
about + ing              96        82        14        82        14        88         8                   8
on + ing                 89        38        51        80         9        58        31        29        60
without + ing            75        47        28        65        10        71         4        66         9
of + ing                 71        65         6                   6        60        11        57        14
from + ing               68        30        38        31        37        24        44        33        35
while + ing              54        35         1        45         9        27                  44        10
in + ing                 36        27         9                   9        23        13         9        27
if + ing                 19        15         4        10         9        17         2                   2
rather than + ing        14                   0                   0                   0         1        13
such as + ing            13         9         4         9         4         9         4         8         5
TOTAL                  1857      1393       464      1341       516      1514       343      1303       554
%                              75.01%    24.99%    72.21%    27.79%    81.53%    18.47%    70.17%    29.83%

15 Results: correlation of problematic structures
The most problematic structures seem to strongly correlate across languages
Top 6 prep/conj account for >65% of errors
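The slides do not say how the cross-language correlation was measured. As one possible illustration, the sketch below computes a Pearson coefficient between Spanish and German error rates for eight of the structures, using the example and incorrect counts from the correctness results. The choice of Pearson, the choice of these eight structures and all names in the code are assumptions, not the author's method.

```python
from math import sqrt

# Counts from the correctness results (examples and incorrect structures).
examples = {"by": 377, "for": 339, "when": 256, "before": 163,
            "after": 122, "on": 89, "without": 75, "from": 68}
incorrect_spanish = {"by": 26, "for": 96, "when": 51, "before": 18,
                     "after": 15, "on": 51, "without": 28, "from": 38}
incorrect_german = {"by": 13, "for": 77, "when": 43, "before": 18,
                    "after": 8, "on": 31, "without": 4, "from": 44}


def error_rates(incorrect, totals):
    """Per-structure error rate, in the key order of `totals`."""
    return [incorrect[k] / totals[k] for k in totals]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(error_rates(incorrect_spanish, examples),
            error_rates(incorrect_german, examples))
print(f"Spanish vs German error-rate correlation: r = {r:.2f}")
```

Working with error rates instead of raw counts keeps frequent structures such as by + ing from dominating the comparison.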

16 Analysis and generation errors
Columns: preposition, examples, then Source-error and Target-error for Spanish, French, German and Japanese
by + ing 377 4271013491658
for + ing 339 37120375533473082
when + ing 256 134902561038393
before + ing 163 4274174148 22
after + ing 122 51255174 11
about + ing 96 7511013534 1
on + ing 89 35109130257
without + ing 75 32628221 8
of + ing 71 4437487 11
from + ing 68 536137143833
while + ing 54 25028326010
in + ing 36 5762721312 18
if + ing 19 1319200 2
rather than + ing 14 0 0 0 0 13
such as + ing 13 3814223 2
TOTAL 1857 106523835148526798459
% 0.60% 0.63% 0.54% 0.74% 0.61% 0.72% 0.60% 0.72%


22 Source and target error distribution
Target errors seem to be more important across languages
The prep/conj with the highest error rate and common to 3 or 4 target languages cover 43-54% of source errors and 48-59% of target errors
              Spanish                     French                      German                      Japanese
              Source-error  Target-error  Source-error  Target-error  Source-error  Target-error  Source-error  Target-error
for + ing               37           120            37            55            33            47            30            82
when + ing              13            49             0           256            10            38             3            93
from + ing               5            36             1            37             1            43             8            33
on + ing                 3            51             0             9             1            30             2            57
SUM                     58           256            38           357            45           158            43           265
Total                  106           523            83           514            85           267            98           459
%                   54.72%        48.95%        45.78%        69.45%        52.94%        59.18%        43.88%        57.73%
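The percentage row on this slide is the SUM row divided by the Total row, column by column. The sketch below recomputes it from the counts on the slide; the dictionary and function names are illustrative only.

```python
# (source errors, target errors) per language, from the SUM and Total rows.
subset_sum = {"Spanish": (58, 256), "French": (38, 357),
              "German": (45, 158), "Japanese": (43, 265)}
all_errors = {"Spanish": (106, 523), "French": (83, 514),
              "German": (85, 267), "Japanese": (98, 459)}


def share(part: int, whole: int) -> float:
    """Percentage of `whole` accounted for by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


shares = {
    lang: (share(subset_sum[lang][0], all_errors[lang][0]),
           share(subset_sum[lang][1], all_errors[lang][1]))
    for lang in subset_sum
}
for lang, (src, tgt) in shares.items():
    print(f"{lang}: {src:.2f}% of source errors, {tgt:.2f}% of target errors")
```

The recomputed shares agree with the slide to within 0.01 of a percentage point.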

23 Conclusions
Overall success rate between 70-80% for all languages.
Target language generation errors are higher than the errors due to the misinterpretation of the source.
Great diversity of prepositions/subordinate conjunctions with varying appearance rates.
Strong correlation of results across languages.

24 Next steps
Further evaluations to consolidate results
–4 evaluators per language
–Present sentences to the evaluators out of alphabetical order by preposition/conjunction
–Note the results for the French when.
Make these findings available to the writing teams
Take our prominent issues
–Source issues: controlled language or pre-processing
–Formulate more specific rules in acrocheck to handle the most problematic structures/prepositions and reduce false positives
  Standardise structures with low frequencies
–Target issues: post-processing or MT improvements

25 References
Adriaens, G. and Schreurs, D. (1992) From COGRAM to ALCOGRAM: Toward a Controlled English Grammar Checker, 14th International Conference on Computational Linguistics, COLING-92, Nantes, France, 23-28 August, 1992, 595-601.
Bernth, A. and Gdaniec, C. (2001) MTranslatability, Machine Translation 16: 175-218.
Bernth, A. and McCord, M. (2000) The Effect of Source Analysis on Translation Confidence, in White, J. S., ed., Envisioning Machine Translation in the Information Future: 4th Conference of the Association for Machine Translation in the Americas, AMTA 2000, Cuernavaca, Mexico, 10-14 October, 2000, Springer: Berlin, 89-99.
O'Brien, S. (2003) Controlling Controlled English: An Analysis of Several Controlled Language Rule Sets, in Proceedings of the 4th Controlled Language Applications Workshop (CLAW 2003), Dublin, Ireland, 15-17 May, 2003, 105-114.
Roturier, J. (2004) Assessing a set of Controlled Language rules: Can they improve the performance of commercial Machine Translation systems?, in ASLIB Conference Proceedings, Translating and the Computer 26, London, 18-19 November, 2004, 1-14.
Wells Akis, J. and Sisson, R. (2003) Authoring translation-ready documents: is software the answer?, in Proceedings of the 21st annual international conference on Documentation, SIGDOC 2003, San Francisco, CA, USA, October 12-15, 2003, 38-44.

26 Thank you!
e-mail: nora.aranberrimonasterio AT dcu.ie

