Published by Kelley Wilkerson. Modified over 9 years ago.
1
AUTOMATIC TRANSLATION UTILITY
Fostering language diversity and participation
Juan Dolio, DR, 11-14 November 2008
Stéphane Bruno, AHTIC/CONSORTIUM CARISNET
sbruno@websystems.ht
2
LANGUAGE STATS
4
FACTS
English is the dominant language in CIVIC discussions.
Members who are not fluent in English, or do not speak it at all, are reluctant to contribute.
Manual (human) translation of all email and forum communications is impractical and far too costly.
Systematic human translation would also delay interactions.
5
CIVIC APPROACH TO LANGUAGE DIVERSITY
Three official languages: English, French, Spanish.
All documents and "official" communications are translated into all three languages (the original-language document being the legally binding one?).
Simultaneous translation is provided for plenary sessions at face-to-face meetings when the size of a language group and its needs justify the cost.
Automatic translation of emails is provided to facilitate comprehension and contribution by all language groups.
6
OBJECTIVES OF THE AUTOMATIC TRANSLATION
Give all members the opportunity to get the essence of all communications in all three official CIVIC languages.
Make the translation non-disruptive, and as seamless and user-friendly as possible.
Allow the translation to improve over time.
Build a contextual terminology and linguistic environment for CIVIC in its field of intervention.
7
HOW IT WORKS
8
THE TRANSLATION MECHANISMS
When a mail arrives, the software breaks the email into paragraphs.
The software tries to guess the language of each paragraph.
If it cannot guess the language, it assumes the paragraph is in English.
The software then preprocesses the paragraph through the knowledgebase.
Each paragraph is then sent to the translation service (Babelfish), and the result is retrieved for each language pair.
The resulting paragraph is post-processed.
Finally, the email is reconstructed and sent to the mailing list manager.
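
The steps on this slide can be sketched roughly as follows. This is only an illustration, not the utility's actual code: the stopword-counting language guesser, the function names, and the `translate` callable (a stand-in for the call to the translation service) are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List

LANGS = ("en", "fr", "es")  # the three official CIVIC languages

# Tiny illustrative stopword lists (assumption); a real guesser needs far more evidence.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "for"},
    "fr": {"le", "les", "et", "est", "des", "une", "pour", "dans"},
    "es": {"el", "los", "las", "y", "es", "una", "para", "con"},
}


def split_paragraphs(body: str) -> List[str]:
    """Break the email body into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]


def guess_language(paragraph: str, default: str = "en") -> str:
    """Guess the language by counting stopwords; fall back to English."""
    words = re.findall(r"[^\W\d_]+", paragraph.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def translate_email(
    body: str,
    translate: Callable[[str, str, str], str],  # (text, source, target) -> text
    preprocess: Callable[[str], str] = lambda p: p,
    postprocess: Callable[[str], str] = lambda p: p,
) -> Dict[str, str]:
    """Return one reconstructed email body per target language."""
    paragraphs = split_paragraphs(body)
    result = {}
    for target in LANGS:
        out = []
        for paragraph in paragraphs:
            source = guess_language(paragraph)
            if source == target:
                out.append(paragraph)  # already in the target language
            else:
                raw = translate(preprocess(paragraph), source, target)
                out.append(postprocess(raw))
        result[target] = "\n\n".join(out)
    return result
```

Passing the translation step in as a callable keeps the sketch independent of any particular service, which matters here because the service itself may change without notice.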
9
INPUT REQUIREMENTS
Use simple language constructs.
Use complete sentences and correct grammar and syntax.
Avoid abbreviations, metaphors and idiomatic expressions.
Avoid proverbs and sayings.
Do not mix languages in the same paragraph (translation is done paragraph by paragraph, and the language is guessed).
10
OTHER FEATURES
If you want some words not to be translated, enclose them in "*", like *CIVIC*.
The knowledgebase is a database that records how particular words must be translated, overriding the output of the translation service. For example, it can state that ICT is translated as TIC in French and Spanish, and vice versa.
This makes it possible to build a lexicon, or linguistic construct, in the context of CIVIC and ICT4D.
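
One possible way to implement both features is to swap the affected words for opaque placeholders before translation and to restore them afterwards. The sketch below assumes that approach; the slides do not say how the utility actually does it. The `KNOWLEDGEBASE` dictionary, the `XZ<n>ZX` placeholder format, and the choice to drop the asterisks in the output are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory knowledgebase: (source, target) -> {term: forced translation}
KNOWLEDGEBASE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "fr"): {"ICT": "TIC"},
    ("en", "es"): {"ICT": "TIC"},
    ("fr", "en"): {"TIC": "ICT"},
    ("es", "en"): {"TIC": "ICT"},
}


def preprocess(paragraph: str, source: str, target: str) -> Tuple[str, List[str]]:
    """Replace protected words and knowledgebase terms with placeholders.

    Returns the masked paragraph and the list of texts to restore later.
    """
    kept: List[str] = []

    def stash(text: str) -> str:
        kept.append(text)
        return f"XZ{len(kept) - 1}ZX"  # placeholder the service should leave alone

    # Words enclosed in asterisks must not be translated at all.
    masked = re.sub(r"\*([^*\n]+)\*", lambda m: stash(m.group(1)), paragraph)

    # Knowledgebase terms are replaced by their forced translation.
    for term, forced in KNOWLEDGEBASE.get((source, target), {}).items():
        pattern = rf"\b{re.escape(term)}\b"
        masked = re.sub(pattern, lambda m, f=forced: stash(f), masked)

    return masked, kept


def postprocess(translated: str, kept: List[str]) -> str:
    """Put the stashed texts back in place of their placeholders."""
    return re.sub(r"XZ(\d+)ZX", lambda m: kept[int(m.group(1))], translated)
```

For instance, `preprocess("*CIVIC* promotes ICT policy.", "en", "fr")` yields the masked text `XZ0ZX promotes XZ1ZX policy.` together with the list `["CIVIC", "TIC"]`; once the service returns its translation, `postprocess` reinserts those two words.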
11
LIMITATIONS
The shorter a paragraph is, the less accurately its language can be guessed. Introductory paragraphs such as greetings or openings, and single-word texts, will therefore usually be translated wrongly or not at all.
The current version works only with plain-text email messages. The final version will try to convert HTML-formatted emails to plain text before processing them.
The utility relies on Babelfish without a formal agreement (since the service is free) and uses it in a way it was not designed for. It is therefore vulnerable to the slightest change to the Babelfish web site.
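
The planned HTML-to-plain-text conversion could look something like the sketch below, which uses only Python's standard `html.parser` module. It is an assumption about how such a step might work, not a description of the final version; the class name, the function name, and the list of tags treated as paragraph breaks are invented for the example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, turning block-level tags into paragraph breaks."""

    BLOCK_TAGS = {"p", "div", "li", "tr", "h1", "h2", "h3", "blockquote"}
    SKIP_TAGS = {"script", "style"}  # their content is never shown to the reader

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")  # line break inside a paragraph
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Convert an HTML email body to plain text with blank-line paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Normalise whitespace inside each paragraph and drop empty paragraphs.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    return "\n\n".join(p for p in paragraphs if p)
```

Emitting blank lines between block elements matters because the translation is done paragraph by paragraph, so the converted text has to preserve paragraph boundaries.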
12
THINGS TO RESOLVE
The character encoding issues.
Who will manage the knowledgebase?
How are words entered into the database?
How are those entries decided?