1 Exploding the Myth: the gerund in machine translation
Nora Aranberri

2 Background
Nora Aranberri
- PhD student at CTTS (Dublin City University)
- Funded by Enterprise Ireland and Symantec (Innovation Partnerships Programme)
Symantec
- Software publisher
- Localisation requirements:
  - Translation: rule-based machine translation system (Systran)
  - Documentation authoring: controlled language (CL checker: acrocheck™)
Project: CL checker rule refinement

3 The Myth
"The gerund is handled badly by MT systems and should be avoided."
- Sources: translators, post-editors, scholars
- Considered a translation issue for MT due to its ambiguity (Bernth & McCord, 2000; Bernth & Gdaniec, 2001)
- Addressed by controlled languages (Adriaens & Schreurs, 1992; Wells Akis, 2003; O'Brien, 2003; Roturier, 2004)

4 What is a gerund?
An -ing form can be a gerund, a participle, or part of a continuous tense while keeping the same surface form.
Examples:
- GERUND: Steps for auditing SQL Server instances.
- PARTICIPLE: When the job completes, BACKINT saves a copy of the Backup Exec restore logs for auditing purposes.
- CONTINUOUS TENSE: Server is auditing and logging.
Conclusion: gerunds and participles can be difficult for an MT system to differentiate.
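As a quick illustration of this ambiguity (not part of the original study), a part-of-speech tagger such as NLTK's must assign a tag to each -ing form purely from context, and the same word can receive different analyses in different sentences:

```python
import nltk

# one-time setup: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

sentences = [
    "Steps for auditing SQL Server instances.",               # gerund
    "BACKINT saves the restore logs for auditing purposes.",  # participle/modifier use
    "Server is auditing and logging.",                        # continuous tense
]

for s in sentences:
    tags = nltk.pos_tag(nltk.word_tokenize(s))
    # print only the -ing tokens; the tag (VBG, NN, JJ, ...) depends entirely on context
    print([(word, tag) for word, tag in tags if word.lower().endswith("ing")])
```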

5 Methodology: creating the corpus
Initial corpus
- Risk management components texts
- 494,618 words, uncontrolled
Structure of study: preposition or subordinate conjunction + -ing
Extraction of relevant segments (see the sketch below)
- acrocheck™ (the CL checker) was asked to flag the patterns of the structure IN + VBG|NN|JJ "-ing"
- 1,857 sentences isolated
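acrocheck™ rules are written in the tool's own formalism, which the slides do not show; a rough NLTK-based equivalent of the IN + VBG|NN|JJ pattern (a minimal sketch, not the original implementation) might look like this:

```python
import nltk

ING_TAGS = {"VBG", "NN", "JJ"}  # the -ing form may be tagged as verb, noun or adjective

def has_prep_plus_ing(sentence):
    """Flag sentences containing a preposition/subordinating conjunction (IN)
    immediately followed by a word ending in -ing tagged VBG, NN or JJ."""
    tags = nltk.pos_tag(nltk.word_tokenize(sentence))
    return any(
        t1 == "IN" and w2.lower().endswith("ing") and t2 in ING_TAGS
        for (w1, t1), (w2, t2) in zip(tags, tags[1:])
    )

corpus = [
    "Steps for auditing SQL Server instances.",
    "The server restarts after the update.",
]
relevant = [s for s in corpus if has_prep_plus_ing(s)]  # isolates the first sentence
```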

6 Methodology: translation
Apply machine translation for each target language
- MT system used: Systran Server 5.05
Dictionaries
- No project-specific dictionaries created
- Systran's built-in computer science dictionary applied
Languages
- Source language: English
- Target languages: Spanish, French, German and Japanese
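The batch translation step amounts to sending every isolated sentence through the same engine once per target language. Systran Server 5.05 has its own interface, which the slides do not describe; in the shape sketch below, `systran_translate` is a hypothetical client function standing in for whatever API the server exposes:

```python
TARGET_LANGS = ["es", "fr", "de", "ja"]  # Spanish, French, German, Japanese

def systran_translate(sentence, target_lang):
    """Hypothetical wrapper around the MT server; stands in for the real
    Systran Server 5.05 interface, which is not shown in the slides."""
    raise NotImplementedError

def translate_corpus(sentences):
    # one translation per sentence per target language, with the same engine
    # and dictionaries throughout, so results are comparable across languages
    return {
        lang: [systran_translate(s, lang) for s in sentences]
        for lang in TARGET_LANGS
    }
```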

7 Methodology: evaluation (1)
Evaluators
- one evaluator per target language
- only native speakers of the target languages
- translators / MA students with experience in MT
Evaluation format

8 Methodology: evaluation (2)
Analysis of the relevant structure only
Questions:
- Q1: Is the structure translated correctly?
- Q2: Is the error due to misinterpretation of the source, or is the target poorly generated?
Both are "yes/no" questions.
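One simple way to record such judgements (an illustrative structure, not the study's actual evaluation sheet) is a flat record per sentence and language, with Q2 filled in only when Q1 is answered "no":

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Judgement:
    sentence_id: int
    target_lang: str              # "es", "fr", "de" or "ja"
    structure_correct: bool       # Q1: is the structure translated correctly?
    source_error: Optional[bool]  # Q2, only when Q1 is False:
                                  # True = source misanalysed, False = target poorly generated

judgements = [
    Judgement(1, "es", True, None),
    Judgement(2, "es", False, False),  # incorrect, caused by poor target generation
]
incorrect = [j for j in judgements if not j.structure_correct]
```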

9 Results: prepositions / subordinate conjunctions

preposition         examples
by + ing            377
for + ing           339
when + ing          256
before + ing        163
after + ing         122
about + ing         96
on + ing            89
without + ing       75
of + ing            71
from + ing          68
while + ing         54
in + ing            36
if + ing            19
rather than + ing   14
such as + ing       13
TOTAL               1857

10 Results: correctness for Spanish

preposition         examples   correct   incorrect
by + ing            377        351       26
for + ing           339        243       96
when + ing          256        205       51
before + ing        163        145       18
after + ing         122        107       15
about + ing         96         82        14
on + ing            89         38        51
without + ing       75         47        28
of + ing            71         65        6
from + ing          68         —         —
while + ing         54         —         —
in + ing            36         27        9
if + ing            19         —         —
rather than + ing   14         —         —
such as + ing       13         —         —
TOTAL               1857       1393      464
%                              75.01%    24.99%
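The percentage row is simple row arithmetic; per-preposition correctness rates can be recomputed the same way (a small sketch over cells that survived transcription):

```python
# Spanish correct/incorrect counts per preposition, from the table above
spanish = {
    "by":     (351, 26),
    "for":    (243, 96),
    "when":   (205, 51),
    "before": (145, 18),
    "after":  (107, 15),
}

for prep, (correct, incorrect) in spanish.items():
    rate = 100 * correct / (correct + incorrect)
    print(f"{prep} + ing: {rate:.2f}% correct")

total_correct, total_incorrect = 1393, 464
print(f"overall: {100 * total_correct / (total_correct + total_incorrect):.2f}%")  # 75.01%
```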

11 Results: correctness for French

                               Spanish             French
preposition         examples   correct/incorrect   correct/incorrect
by + ing            377        351/26              358/19
for + ing           339        243/96              284/55
when + ing          256        205/51              2/254
before + ing        163        145/18              146/17
after + ing         122        107/15              117/5
about + ing         96         82/14               —
on + ing            89         38/51               80/9
without + ing       75         47/28               65/10
of + ing            71         65/6                —
from + ing          68         —                   31/37
while + ing         54         —                   —
in + ing            36         27/9                —
if + ing            19         —                   —
rather than + ing   14         —                   —
such as + ing       13         —                   —
TOTAL               1857       1393/464            1341/516
%                              75.01%/24.99%       72.21%/27.79%

12 Results: correctness for German

                               Spanish             French              German
preposition         examples   correct/incorrect   correct/incorrect   correct/incorrect
by + ing            377        351/26              358/19              364/13
for + ing           339        243/96              284/55              262/77
when + ing          256        205/51              2/254               213/43
before + ing        163        145/18              146/17              —
after + ing         122        107/15              117/5               114/8
about + ing         96         82/14               —                   88/8
on + ing            89         38/51               80/9                58/31
without + ing       75         47/28               65/10               71/4
of + ing            71         65/6                —                   60/11
from + ing          68         —                   31/37               24/44
while + ing         54         —                   —                   27/27
in + ing            36         27/9                —                   —
if + ing            19         —                   —                   —
rather than + ing   14         —                   —                   —
such as + ing       13         —                   —                   —
TOTAL               1857       1393/464            1341/516            1514/343
%                              75.01%/24.99%       72.21%/27.79%       81.53%/18.47%

13 Results: correctness for Japanese

                               Spanish             French              German              Japanese
preposition         examples   correct/incorrect   correct/incorrect   correct/incorrect   correct/incorrect
by + ing            377        351/26              358/19              364/13              301/76
for + ing           339        243/96              284/55              262/77              224/115
when + ing          256        205/51              2/254               213/43              161/95
before + ing        163        145/18              146/17              —                   134/29
after + ing         122        107/15              117/5               114/8               108/14
about + ing         96         82/14               —                   88/8                —
on + ing            89         38/51               80/9                58/31               —
without + ing       75         47/28               65/10               71/4                —
of + ing            71         65/6                —                   60/11               —
from + ing          68         —                   31/37               24/44               33/35
while + ing         54         —                   —                   27/27               —
in + ing            36         27/9                —                   —                   —
if + ing            19         —                   —                   —                   —
rather than + ing   14         —                   —                   —                   —
such as + ing       13         —                   —                   —                   —
TOTAL               1857       1393/464            1341/516            1514/343            1303/554
%                              75.01%/24.99%       72.21%/27.79%       81.53%/18.47%       70.17%/29.83%

14 Significant results
(table repeated from slide 13)

15 Results: correlation of problematic structures
The most problematic structures correlate strongly across languages: the top 6 prepositions/conjunctions account for more than 65% of all errors.
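The cross-language claim can be checked numerically. A minimal sketch (illustrative only, using the error counts recoverable in all four languages from the correctness tables above) ranks prepositions by incorrect translations and correlates the rankings between language pairs:

```python
from scipy.stats import spearmanr

# incorrect translations per preposition, for the rows fully recoverable in
# all four languages (slides 10-13): by, for, when, after
errors = {
    "Spanish":  [26, 96, 51, 15],
    "French":   [19, 55, 254, 5],
    "German":   [13, 77, 43, 8],
    "Japanese": [76, 115, 95, 14],
}

langs = list(errors)
for i, a in enumerate(langs):
    for b in langs[i + 1:]:
        rho, _ = spearmanr(errors[a], errors[b])  # rank correlation of error counts
        print(f"{a} vs {b}: rho = {rho:.2f}")
```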

16 Analysis and generation errors
Each error was classed as a source (analysis) error or a target (generation) error.

                               Spanish         French          German          Japanese
preposition         examples   source/target   source/target   source/target   source/target
by + ing            377        —               —               —               —
for + ing           339        37/120          —               —               —
when + ing          256        13/49           —               —               —
before + ing        163        —               —               —               —
after + ing         122        —               —               —               —
about + ing         96         —               —               —               —
on + ing            89         3/51            —               —               —
without + ing       75         —               —               —               —
of + ing            71         —               —               —               —
from + ing          68         5/36            —               —               —
while + ing         54         —               —               —               —
in + ing            36         —               —               —               —
if + ing            19         —               —               —               —
rather than + ing   14         —               —               —               —
such as + ing       13         —               —               —               —
TOTAL               1857       106/523         83/514          85/267          98/459

22 Source and target error distribution
Target errors predominate across languages. The prepositions/conjunctions with the highest error rates that are common to 3 or 4 target languages cover 43-54% of source errors and 48-59% of target errors.

                Spanish         French          German          Japanese
                source/target   source/target   source/target   source/target
for + ing       37/120          —               —               —
when + ing      13/49           —               —               —
from + ing      5/36            —               —               —
on + ing        3/51            —               —               —
SUM             58/256          38/357          45/158          43/265
Total           106/523         83/514          85/267          98/459
%               54.72%/48.95%   45.78%/69.45%   52.94%/59.18%   43.88%/57.73%
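A quick arithmetic check of the percentage row, using the SUM and Total figures above (note that the non-Spanish SUM cells are back-computed from the slide's percentages):

```python
# (source, target) error totals per language, and the portion attributable
# to the shared problem set {for, when, from, on}
totals   = {"Spanish": (106, 523), "French": (83, 514),
            "German": (85, 267),   "Japanese": (98, 459)}
top_four = {"Spanish": (58, 256),  "French": (38, 357),
            "German": (45, 158),   "Japanese": (43, 265)}

for lang, (src_all, tgt_all) in totals.items():
    src_top, tgt_top = top_four[lang]
    print(f"{lang}: {100 * src_top / src_all:.2f}% of source errors, "
          f"{100 * tgt_top / tgt_all:.2f}% of target errors")
# Spanish: 54.72% of source errors, 48.95% of target errors -- matching the % row
```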

23 Conclusions
- Overall success rate between 70% and 80% for all languages.
- Target-language generation errors outnumber errors due to misinterpretation of the source.
- Great diversity of prepositions/subordinate conjunctions, with widely varying frequencies.
- Strong correlation of results across languages.

24 Next steps
Further evaluations to consolidate results
- 4 evaluators per language
- Present sentences to the evaluators out of alphabetical order by preposition/conjunction
- Note the results for the French "when"
Make these findings available to the writing teams
Tackle the prominent issues
- Source issues: controlled language or pre-processing
  - Formulate more specific rules in acrocheck™ to handle the most problematic structures/prepositions and reduce false positives (see the sketch below)
  - Standardise structures with low frequencies
- Target issues: post-processing or MT improvements
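acrocheck™ rules are authored in the product's own formalism, which the slides do not show; as a rough sketch of the idea, a pre-processing check could target only the prepositions/conjunctions that proved most problematic, rather than flagging every -ing form:

```python
import re

# the structures found most problematic across 3-4 target languages
PROBLEM_PREPS = ("for", "when", "from", "on")

# crude approximation: preposition/conjunction directly followed by an -ing
# word; a real rule would use POS tags to skip words like "thing" or "string"
PATTERN = re.compile(
    r"\b(" + "|".join(PROBLEM_PREPS) + r")\s+\w+ing\b",
    re.IGNORECASE,
)

def flag_for_rewriting(sentence):
    """Return the problematic match, if any, so authors can rephrase it."""
    m = PATTERN.search(sentence)
    return m.group(0) if m else None

print(flag_for_rewriting("Close the file before saving the report."))   # None
print(flag_for_rewriting("When auditing the server, enable logging."))  # "When auditing"
```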

25 References
Adriaens, G. and Schreurs, D. (1992) 'From COGRAM to ALCOGRAM: Toward a Controlled English Grammar Checker', in Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), Nantes, France, August 1992.
Bernth, A. and Gdaniec, C. (2001) 'MTranslatability', Machine Translation 16.
Bernth, A. and McCord, M. (2000) 'The Effect of Source Analysis on Translation Confidence', in White, J. S., ed., Envisioning Machine Translation in the Information Future: 4th Conference of the Association for Machine Translation in the Americas (AMTA 2000), Cuernavaca, Mexico, October 2000, Berlin: Springer.
O'Brien, S. (2003) 'Controlling Controlled English: An Analysis of Several Controlled Language Rule Sets', in Proceedings of the 4th Controlled Language Applications Workshop (CLAW 2003), Dublin, Ireland, May 2003.
Roturier, J. (2004) 'Assessing a set of Controlled Language rules: Can they improve the performance of commercial Machine Translation systems?', in Translating and the Computer 26: Proceedings of the ASLIB Conference, London, November 2004, 1-14.
Wells Akis, J. and Sisson, R. (2003) 'Authoring translation-ready documents: is software the answer?', in Proceedings of the 21st Annual International Conference on Documentation (SIGDOC 2003), San Francisco, CA, USA, October 12-15, 2003.

26 Thank you!
e-mail: nora.aranberrimonasterioATdcu.ie

