Assessing the Usability of Machine Translated Content: A User-Centred Study using Eye Tracking
  Dr. Stephen Doherty & Dr. Sharon O’Brien
  Centre for Next Generation Localisation
  School of Applied Language & Intercultural Studies
  Dublin City University
Outline
  Introduction
  Research Aims
  Methods
  Results
  Conclusions
Introduction
  Increased need for translation
  Diversity of content and users
  Rise in prevalence of machine translation [MT] both off- and online
  Mixed reports of quality – attitudes and expectations
  Divergence in R&D – translation studies/computer science
  Evaluation metrics – human and automatic
  Our focus here is on usability
Research Aims
  To investigate whether there are differences in usability between the English [source language] content and the unedited machine translated content in the target languages [FR, DE, SP, JP].
  In other words: how usable is machine translated content?
  Adoption of the ISO/TR 16982 definition of usability
  Importance of ecological validity: real materials and users
Methods
  User-centred approach [n = 30]; task-driven – ‘new user’ scenario
  Eye tracking [Tobii 1750]:
    Fixation count and average duration
    Attentional shifts; percentage time in each window
    Textual regressions
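The eye-tracking measures listed above can be illustrated with a minimal Python sketch. It is not the study's analysis pipeline. The `Fixation` record, its field names, the window labels and the rule that a regression is a backward move within the same window are all assumptions made for illustration. The slides do not give the exact definitions or thresholds used (for example, what counts as a 'long' regression).

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    window: str         # hypothetical window label, e.g. "instructions" or "task"
    duration_ms: float  # fixation duration in milliseconds
    word_index: int     # hypothetical position of the fixated word within its window


def summarise(fixations):
    """Compute simple eye-tracking summaries from an ordered list of fixations."""
    count = len(fixations)
    total = sum(f.duration_ms for f in fixations)
    mean_duration = total / count if count else 0.0

    pairs = list(zip(fixations, fixations[1:]))

    # Attentional shift (assumed definition): two consecutive fixations
    # that land in different windows.
    shifts = sum(1 for a, b in pairs if a.window != b.window)

    # Percentage of total fixation time spent in each window.
    time_per_window = {}
    for f in fixations:
        time_per_window[f.window] = time_per_window.get(f.window, 0.0) + f.duration_ms
    pct_time = {w: 100 * t / total for w, t in time_per_window.items()} if total else {}

    # Textual regression (assumed definition): a backward move within the
    # same window; the distance is the number of words skipped back.
    regression_distances = [
        a.word_index - b.word_index
        for a, b in pairs
        if a.window == b.window and b.word_index < a.word_index
    ]

    return {
        "fixation_count": count,
        "mean_duration_ms": mean_duration,
        "attentional_shifts": shifts,
        "pct_time_per_window": pct_time,
        "regression_count": len(regression_distances),
        "regression_distances": regression_distances,
    }
```

Called on one participant's ordered fixation list, `summarise` returns one dictionary of measures, which could then be compared across language groups.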
Methods
  Post-task questionnaire; five-point Likert
    Comprehension
    Task completion
    Potential improvement
    Future reuse
    Recommendation
  Recall
Methods
  Usability
    Satisfaction
    Efficiency [task success/task time]
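The efficiency measure, task success divided by task time, can be written out as a short Python sketch. The slides do not state how task success was scored or which time unit was used. The function name, the 0-to-1 success score and the use of minutes are therefore assumptions for illustration only.

```python
def efficiency(task_success, task_time):
    """Efficiency as task success divided by task time.

    task_success: hypothetical success score, e.g. fraction of the task completed (0 to 1)
    task_time: time spent on the task, in any consistent unit (minutes assumed here)
    """
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    return task_success / task_time


# Example: a participant who fully completes the task in 4 minutes
# is twice as efficient as one who takes 8 minutes.
fast = efficiency(1.0, 4.0)
slow = efficiency(1.0, 8.0)
```

With this definition, a higher score means more was achieved per unit of time, so both lower task success and longer task time reduce efficiency.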
Eye Tracking
  Task time
  Fixation count and average duration
    Lowest for EN [sig. JP] for both
  Attentional shifts; percentage time in each window
    EN and FR spent most time in task window
    EN fewest shifts of attention [sig. JP]
  Textual regressions
    Raw number and distance: EN and SP [sig. JP]
    ‘Long’ regressions: JP [sig. all others]
Questionnaire Results
  Comprehension
    EN rated highest [sig. for FR and JP]
  Task completion
    EN rated highest [sig. for JP]
  Potential improvement
    SP & EN rated as needing least improvement, but could still be improved upon
  Future reuse
    FR & EN rated highest
  Recommendation
    EN rated highest [sig. for JP and DE]
  Recall
    EN scored highest [sig. for JP and DE]
Usability Results
  Satisfaction
    EN rated highest [sig. for FR, DE, and JP]
  Task completion
    EN and SP more successful [sig. JP]
  Efficiency
    EN most efficient [sig. JP and DE]
Conclusions
  So, just how usable is raw MT?
    Similar results for EN, SP, and FR
    DE and JP more problematic [MT system]
    Functionally usable [more than just ‘gisting’]
    UX best for EN users
  MT viable for certain pairs
  Human intervention necessary to ensure best UX
Questions?
  stephen.doherty@dcu.ie
  sharon.obrien@dcu.ie
  This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at Dublin City University.
Predictors of Positive UX
  Satisfied users: comprehension & task time
  Satisfied users: recommend to others
  Task completion: textual regressions
  Cognitive effort: instructions aiding task completion