Machine Translation II
How MT works
Modes of use
2/20 How MT works
Distinguish between generic “translation software” (algorithms) and language-pair-specific linguistic data
–Software engineers ~ linguists
Idea (from computer science) of modularity
–Break down problem into manageable subproblems, essentially independent though linked to each other
–Modules usually linguistically motivated
Linguistic formalisms for lexicons and grammars
–May be more or less like formal linguistic theories
–Usually “less”!
3/20 Modularity
[Diagram: possible sequence of modules (fictitious). Modules shown: Morphological analysis, Dictionary lookup, Syntactic parse, Attachment disambiguation, Semantic roles, TL syntax, TL lexical choice, Text normalisation, TL morphology, Text reconstitution]
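The modular idea can be sketched as a chain of small functions, each consuming the output of the one before it. This is a toy illustration only: the function names, the three-entry English-to-French lexicon and the plural rule are all assumptions invented for the example, not taken from any real system.

```python
from typing import List, Tuple

# Hypothetical toy lexicon of base forms (English -> French).
LEXICON = {"the": "le", "cat": "chat", "dog": "chien"}

def normalise(text: str) -> List[str]:
    """Text normalisation: lower-case and split on whitespace."""
    return text.lower().split()

def analyse_morphology(tokens: List[str]) -> List[Tuple[str, str]]:
    """Morphological analysis: strip a plural -s (a deliberately crude rule)."""
    analysed = []
    for tok in tokens:
        if tok.endswith("s") and tok[:-1] in LEXICON:
            analysed.append((tok[:-1], "pl"))
        else:
            analysed.append((tok, "sg"))
    return analysed

def lookup(analysed: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Dictionary lookup: replace each base form; unknown words pass through."""
    return [(LEXICON.get(base, base), number) for base, number in analysed]

def generate_morphology(items: List[Tuple[str, str]]) -> List[str]:
    """TL morphology: re-attach a plural ending to the target word."""
    return [word + "s" if number == "pl" else word for word, number in items]

def reconstitute(tokens: List[str]) -> str:
    """Text reconstitution: join the tokens back into a string."""
    return " ".join(tokens)

# Each module is independent; the pipeline only fixes the order of application.
PIPELINE = [normalise, analyse_morphology, lookup, generate_morphology, reconstitute]

def translate(text: str) -> str:
    data = text
    for module in PIPELINE:
        data = module(data)
    return data

print(translate("The cats"))
```

The output here is "le chats" rather than the correct "les chats": no module handles article agreement. Fixing that would mean adding or changing one module, which is the practical appeal of the modular design.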
4/20 Depth of analysis
[Diagram: the “Vauquois triangle”. Analysis rises from the source text and generation descends to the target text; the levels, from base to apex, are direct translation, transfer and interlingua]
5/20 Depth of analysis
[Diagram: the “Vauquois triangle”, with analysis rising from the source text, generation descending to the target text, and each level annotated]
direct translation: word-for-word
transfer: some syntactic awareness
interlingua: full meaning representation
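The gap between the two lowest levels of the triangle can be sketched with a toy English-to-French example. Everything here (the lexicon, the word classes, and the single reordering rule) is an assumption made up for illustration; real transfer systems operate on parse trees rather than flat word lists.

```python
# Hypothetical toy data for illustration only.
LEXICON = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}
ADJECTIVES = {"black"}
NOUNS = {"cat"}

def direct(sentence: str) -> str:
    """Direct translation: replace words one by one, keeping source order."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

def transfer(sentence: str) -> str:
    """Transfer with some syntactic awareness: reorder, then replace words."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        # Structural rule: English adjective + noun becomes noun + adjective.
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered += [words[i + 1], words[i]]
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON.get(w, w) for w in reordered)

print(direct("The black cat sleeps"))    # le noir chat dort
print(transfer("The black cat sleeps"))  # le chat noir dort
```

An interlingua approach would go one step further and map the sentence to a language-neutral meaning representation before generating any target words; that is beyond what a short sketch can show.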
6/20 Modes of use
[Diagram]
fully automatic
–high quality on unrestricted texts: impractical
–restricted input
–low quality
interactive
7/20 Different scenarios for MT
Assimilation          Dissemination
many SLs, one TL      one SL, many TLs
any style             controlled style
any topic             single topic
partial analysis      full analysis
post-editing          no post-editing
user is reader        user is author
8/20 Restricted input
Restrictions may be natural (sublanguage) or imposed (controlled language)
Related terms: special language, jargon, register, LSP
For human: (usually) more readable, less ambiguous, more “focussed”
For MT:
–fewer syntactic constructions
–closed vocabulary with fewer homonyms
–greater certainty about interpretation
9/20 Features of sublanguage
Lexicon
–smaller size: fewer concepts to cover
–finite/closed: innovation is controlled
–nature: less homonymy, some synonyms (dis)favoured
–grammatical use: fewer category ambiguities
Syntax
–reduced range of structures
–some structures (dis)favoured
–less flexibility in choice of structure
–some deviance from “standard” grammar
10/20 Controlled languages
Widely used in technical authoring
Promotes consistency and readability
Similar features to sublanguage
Can be coupled with grammar checker
Permits “multilingual authoring”
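A checker for a controlled language can be sketched in a few lines. The approved word list and the sentence-length limit below are hypothetical rules invented for this example; real controlled languages define far richer vocabularies and grammar constraints.

```python
import re
from typing import List

# Hypothetical rule set: a closed vocabulary and a maximum sentence length.
APPROVED = {"close", "open", "turn", "the", "valve", "pump", "before", "you", "start"}
MAX_WORDS = 8

def check(sentence: str) -> List[str]:
    """Return a list of rule violations; an empty list means the sentence passes."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = []
    if len(words) > MAX_WORDS:
        problems.append(f"too long: {len(words)} words (limit {MAX_WORDS})")
    for word in words:
        if word not in APPROVED:
            problems.append(f"unapproved word: {word}")
    return problems

print(check("Close the valve."))  # no violations
print(check("Shut the valve."))   # flags the disfavoured synonym "shut"
```

Rejecting "shut" in favour of "close" is the kind of restriction that gives an MT system a closed vocabulary with one approved term per concept.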
11/20 Use of low-quality output
To get a rough idea of content, and to identify which parts need to be translated “properly”
… especially with “exotic” languages
Widely used on the Internet for browsing, chat-rooms and …
Despite low quality, users seem satisfied
Task is especially difficult due to odd grammar, spelling, punctuation (GIGO), and wide variety of subject matter, often mixed
Most MT systems now customized for web-page translation (take HTML mark-up into account)
12/20 Interactive translation
Tools for translators
“Translator’s workstation”
Humans and computers cooperate
Which takes the initiative?
–MAHT (machine-aided human translation): human translation using translation tools
–HAMT (human-aided machine translation): MT with human assistance
13/20 Machine-readable version of dictionary for human users
14/20 Pre-translation: terminology look-up
15/20 Translation Memory
Database of previous translations
More or less sophisticated matching algorithm (“fuzzy match”, simple pattern-matching which may incorporate linguistic “knowledge”)
But the user must decide what to do with the matches
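A fuzzy-match lookup can be sketched with the Python standard library. This is a minimal illustration, assuming character-level similarity from `difflib.SequenceMatcher` as the matching algorithm; the two stored segments and the 0.7 threshold are made up for the example, and commercial tools use their own scoring methods.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical translation memory: (source segment, stored translation).
MEMORY = [
    ("Press the red button.", "Appuyez sur le bouton rouge."),
    ("Close the door.", "Fermez la porte."),
]

def fuzzy_matches(query: str,
                  memory: List[Tuple[str, str]],
                  threshold: float = 0.7) -> List[Tuple[float, str, str]]:
    """Return (score, source, translation) for stored segments similar to the query."""
    scored = []
    for source, translation in memory:
        score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, translation))
    # Best match first.
    return sorted(scored, reverse=True)

for score, source, translation in fuzzy_matches("Press the green button.", MEMORY):
    print(f"{score:.2f}  {source}  ->  {translation}")
```

The retrieved translation still says "rouge"; the system only proposes the match, and the translator has to notice and correct the difference.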
16/20 MT system’s dictionary
17/20 Bilingual concordance
Source: TransSearch, Laboratoire de Recherche Appliquée en Linguistique Informatique, Université de Montréal
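The idea behind a bilingual concordance can be sketched as a search over sentence-aligned text. This is not how TransSearch itself is implemented; the aligned pairs and the function name below are hypothetical, chosen only to show the principle.

```python
import re
from typing import List, Tuple

# Hypothetical sentence-aligned corpus: (English, French).
ALIGNED = [
    ("He took the floor.", "Il a pris la parole."),
    ("The floor is wet.", "Le plancher est mouillé."),
    ("We left early.", "Nous sommes partis tôt."),
]

def concordance(term: str, corpus: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every aligned pair whose source side contains the term as a whole word."""
    term = term.lower()
    hits = []
    for source, target in corpus:
        if term in re.findall(r"\w+", source.lower()):
            hits.append((source, target))
    return hits

for source, target in concordance("floor", ALIGNED):
    print(f"{source}  |  {target}")
```

Seeing "floor" rendered once as "parole" and once as "plancher" lets the translator pick the equivalent that fits the context, which a plain dictionary entry would not show.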
18/20 Parallel scrolling screens
19/20 Interactive translation
20/20 Conclusion
Translation is really hard, but lay-people don’t understand this
–Example: evaluating systems by use of round-trip translation, often of idioms, jokes, or set phrases
Current MT systems are quite crude, and likely to remain so
But useful nevertheless in appropriate scenarios, under certain conditions of use
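One reason round-trip translation is a poor evaluation method can be shown with a toy word-for-word system. The dictionary below is a made-up assumption: it deliberately maps "bank" to "rive" (river bank) where the financial sense "banque" is wanted, yet the round trip comes back perfect.

```python
# Hypothetical one-to-one dictionary with a wrong sense choice for "bank".
EN_FR = {"the": "la", "bank": "rive", "is": "est", "closed": "fermée"}
# The reverse dictionary simply inverts the forward one.
FR_EN = {fr: en for en, fr in EN_FR.items()}

def word_for_word(sentence: str, table: dict) -> str:
    """Replace each word using the table; unknown words pass through."""
    return " ".join(table.get(word, word) for word in sentence.split())

source = "the bank is closed"
forward = word_for_word(source, EN_FR)
back = word_for_word(forward, FR_EN)

print(forward)          # la rive est fermée
print(back == source)   # True
```

The round trip hides the error because the backward step undoes the same mistake the forward step made. The reverse can also happen: a good forward translation may come back garbled, so the round-trip result says little about the quality of either direction.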