The Link between Controlled Language and Post-Editing: An Empirical Investigation of Technical, Temporal and Cognitive Effort Sharon O'Brien, CTTS/SALIS.

2 Overview: Research Parameters; Temporal Effort; Technical Effort; Cognitive Effort; Conclusions

3 Definition Controlled language (CL): "an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style." (Huijsen, 1998: 2)

4 Motivation – In a Nutshell Can the introduction of CL rules really improve MT output such that post-editing effort is reduced?

5 Machine Translatability One of the main goals of CL. "The notion of translatability is based on so-called 'translatability indicators' where the occurrence of such an indicator in the text is considered to have a negative effect on the quality of machine translation. The fewer translatability indicators, the better suited the text is to translation using MT." (Underwood and Jongejan 2001: 363)

6 Machine Translatability Negative Translatability Indicators –NTIs for short –Examples (for English as SL): long noun phrases, passive voice, ungrammatical constructs, use of slang… –Use of NTI list (Bernth/Gdaniec 2001) –Use of term "minimal NTI"

7 Research Design SL: English; TL: German. Text Type: User Manual (1,777 words). Users: 12 Professional Translators. Tools: IBM WebSphere, Translog, IBM's EasyEnglishAnalyzer, Sun Microsystems' Sunproof. Place of Data Capture: IBM Stuttgart

8 Methodology
Edit SL text to create two sentence types:
–S(nti) = sentences with known negative translatability indicators
–S(min-nti) = sentences where all listed NTIs had been removed
9 subjects: post-editing (P1-P9)
3 subjects: translating (T1-T3)
First-pass exercise, no QA

9 Temporal Effort Post-Editing vs. Translation –median words per minute
Subject Type    Median words per minute
Post-Editors
Translators     13.63

10 Temporal Effort (2) Post-Editing vs. Translation –median processing speed Processing speed is the total number of source words in each segment divided by the total processing time for that segment –i.e. words processed per second
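The processing-speed definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic (source words divided by processing time in seconds, then a median over segments); the function names and the sample word counts and timings are hypothetical and are not data from the study.

```python
from statistics import median


def processing_speed(source_words: int, seconds: float) -> float:
    """Words processed per second for one segment."""
    if seconds <= 0:
        raise ValueError("processing time must be positive")
    return source_words / seconds


def median_speed(segments):
    """Median processing speed over (source_words, seconds) pairs."""
    return median(processing_speed(words, secs) for words, secs in segments)


# Hypothetical segments (word count, processing time in seconds),
# invented for illustration only.
example_segments = [(12, 40.0), (8, 20.0), (15, 30.0)]
print(median_speed(example_segments))
```

Computing one median per segment type, S(nti) and S(min-nti), would give the kind of comparison reported on the next slide.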

11 Median Processing Speed S(ntis) vs. S(min-ntis)
Segment Type    Median Processing Speed (words per second)
S(nti)          0.350
S(min-nti)      0.435

12 Temporal Effort: Conclusions The post-editing task was completed faster than the translation task. –First-pass exercise/No QA The median processing speeds for S(min-nti) segments were significantly higher than those for S(nti) segments. So, from a temporal point of view, it seems that the introduction of CL benefits turnaround times.

13 Technical Effort Measured using Translog: –Keyboarding: deletions, insertions, cuts, pastes –Dictionary look-up activity

14 Translog

15 Sample Linear Repetition File

16 Keyboarding Median Measurements
Segment Type    Median Deletions    Median Insertions    Median Cuts    Median Pastes
S(nti)
S(min-nti)

17 Keyboarding Median Measurements Small difference between the two segment types, but statistically significant for insertions/deletions Cutting and pasting: very limited even though post-editors recycled whole chunks of text

18 Use of the Translog Dictionary Training and practice prior to task All users reported being comfortable with the feature

19 Data on Dictionary Usage
Subject    Successful Dictionary Look-Up    Unsuccessful Dictionary Look-Up
P1         0                                0
P2         0                                1
P3         0                                5
P4         0                                0
P5         0                                1
P6         1                                0
P7         0                                1
P8         0                                5
P9         0                                1

20 Possible Explanations?
–Subjects not as familiar with feature as they reported
–Subjects felt it was unnecessary to use dictionary
–Subjects used to having terms suggested on-screen with TM/Terminology tool
–Subjects lost faith in the feature when they encountered problems

21 Conclusions on Technical Effort S(min-nti) segments require significantly fewer deletions and insertions than S(nti) segments. Cutting and pasting was a very rare activity for both segment types. Dictionary searches were uncommon during this study. When they were carried out, the search facility was frequently used incorrectly.

22 Technical/Temporal Combined Results on technical post-editing effort add to the evidence presented above on temporal post-editing effort and further support the claim that the elimination of NTIs from a segment can reduce post-editing effort.

23 Cognitive Effort Potential Methodologies –TAP (rejected) –Pause Analysis –Choice Network Analysis –Eye tracking (unavailable at the time)

24 Pause Behaviour No discernible correlations between pause behaviour and post-editing activity –Pause analysis rejected

25 Cognitive Effort Choice Network Analysis

26 …Choice Network Analysis compares the renditions of a single string of translation by multiple translators in order to propose a network of choices that theoretically represents the cognitive model available to any translator for translating that string. The technique is favoured over the think-aloud method, which is acknowledged as not being able to access automaticized processes. (Campbell, 2000: 215)

27 Example – Sentence with NTIs ST: –Save the document(s). Raw MT output: –Sichern Sie das Dokument(s). NTIs for this sentence: –Short segment –Use of (s) for plural

28
MT    Sichern Sie das Dokument (s.)
P1    Sichern Sie das Dokument/die Dokumente.
P2    Sichern Sie das bzw. die Dokumente.
P3    Sichern Sie das/die Dokument/e.
P4    Sichern Sie das/die Dokument (e).
P6    Sichern Sie das Dokument/die Dokumente
P7    Sichern Sie das Dokument.
P8    Speichern Sie das/die Dokument(e).
P9    Sichern Sie das Dokument.

29 Example – Sentence with minimal NTIs ST: –The editor contains a menu and a toolbar. Raw MT output: –Der Editor enthält ein Menü und eine Symbolleiste.

30
MT    Der Editor enthält ein Menü und eine Symbolleiste.
P1    Der Editor enthält ein Menü und eine Symbolleiste.
P2    Der Editor enthält ein Menü und eine Symbolleiste.
P3    Der Editor enthält ein Menü und eine Symbolleiste.
P4    Der Editor enthält ein Menü und eine Symbolleiste.
P5    Der Editor enthält ein Menü und eine Symbolleiste.
P6    Der Editor enthält ein Menü und eine Symbolleiste.
P7    Der Editor enthält ein Menü und eine Symbolleiste.
P8    Der Editor enthält ein Menü und eine Symbolleiste.
P9    Der Editor enthält ein Menü und eine Symbolleiste.
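As a rough illustration of how the two examples above differ, the sketch below counts the distinct renditions produced by the post-editors and how many of them departed from the raw MT output. This is a deliberate simplification and not Choice Network Analysis itself, which builds a network of choices rather than a simple count. The function name, the normalisation (ignoring a trailing full stop) and the word spacing of the transcribed strings are assumptions.

```python
def rendition_summary(mt_output, renditions):
    """Return (distinct renditions, renditions differing from raw MT).

    Assumption: a trailing full stop is ignored when comparing strings.
    """
    def normalise(text):
        return text.strip().rstrip(".")

    normalised = [normalise(r) for r in renditions.values()]
    distinct = len(set(normalised))
    changed = sum(1 for r in normalised if r != normalise(mt_output))
    return distinct, changed


# Renditions transcribed from the slide for the sentence with NTIs
# (P5 does not appear on that slide; spacing is reconstructed).
nti_renditions = {
    "P1": "Sichern Sie das Dokument/die Dokumente.",
    "P2": "Sichern Sie das bzw. die Dokumente.",
    "P3": "Sichern Sie das/die Dokument/e.",
    "P4": "Sichern Sie das/die Dokument (e).",
    "P6": "Sichern Sie das Dokument/die Dokumente",
    "P7": "Sichern Sie das Dokument.",
    "P8": "Speichern Sie das/die Dokument(e).",
    "P9": "Sichern Sie das Dokument.",
}
nti_summary = rendition_summary("Sichern Sie das Dokument (s.)", nti_renditions)

# Sentence with minimal NTIs: all nine post-editors kept the raw MT output.
min_nti_mt = "Der Editor enthält ein Menü und eine Symbolleiste."
min_nti_renditions = {f"P{i}": min_nti_mt for i in range(1, 10)}
min_nti_summary = rendition_summary(min_nti_mt, min_nti_renditions)

print(nti_summary, min_nti_summary)
```

A larger number of distinct renditions suggests more choices open to the post-editor, which is the intuition CNA uses as an indicator of cognitive effort.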

31 NTIs and Cognitive Effort
Using CNA as a guide, NTIs categorised into:
High impact on post-editing effort
–50% or more of the occurrences of the NTI resulted in post-editing by two or more post-editors
Moderate impact on post-editing effort
–Between 31% and 49% of occurrences
Low impact on post-editing effort
–30% or fewer occurrences
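The thresholds above can be expressed as a small classification function. This is a minimal sketch, not the author's procedure: the function name and input format are hypothetical, and it assumes that the moderate and low bands use the same criterion as the high band (an occurrence counts when two or more post-editors edited it). The slide leaves shares between 30% and 31% and between 49% and 50% unspecified; the sketch treats everything above 30% and below 50% as moderate.

```python
def classify_nti_impact(editors_per_occurrence):
    """Classify one NTI as 'high', 'moderate' or 'low' impact.

    editors_per_occurrence: for each occurrence of the NTI, the number
    of post-editors who edited that occurrence (hypothetical input format).
    """
    if not editors_per_occurrence:
        raise ValueError("at least one occurrence is required")
    # An occurrence counts if two or more post-editors edited it.
    edited = sum(1 for n in editors_per_occurrence if n >= 2)
    share = edited / len(editors_per_occurrence)
    if share >= 0.50:
        return "high"
    if share > 0.30:  # assumption: covers the gap between 30% and 31%
        return "moderate"
    return "low"


# Hypothetical counts, invented for illustration only.
print(classify_nti_impact([3, 2, 0, 1]))     # 2 of 4 occurrences count
print(classify_nti_impact([2, 0, 0]))        # 1 of 3 occurrences counts
print(classify_nti_impact([2, 0, 0, 0, 0]))  # 1 of 5 occurrences counts
```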

32 Correlating Measurements By combining data on temporal, technical and cognitive effort: High Impact NTIs –Use of the gerund –Proper nouns –Problematic punctuation –Ungrammatical constructs –Use of (s) for plural –Non-finite verbs –Incomplete syntactic unit –Long NP –Short segment

33 Correlating Measurements Moderate impact NTIs: –Multiple coordinators –Passive voice –Personal pronouns –Use of a slash as a separator –Ambiguous scope in coordination –Parentheses

34 Correlating Measurements Low impact NTIs: –Abbreviations –Demonstrative pronouns –Missing "in order to" –Contractions

35 Conclusion Within the limited scope of this research, we now have empirical evidence to support the assertion that controlling the input to MT leads to lower post-editing effort. Eliminating some NTIs can have a higher impact than eliminating others. –Is it worth having a relatively high number of CL rules? Even if we remove known NTIs, MT engines are still likely to produce some errors and post-editors are still likely to post-edit.

