What a professional translator should know about Machine Translation Harold Somers Professor Emeritus University of Manchester.


1 What a professional translator should know about Machine Translation
Harold Somers, Professor Emeritus, University of Manchester

2 Background
Machine Translation (MT): 60-year-old technology, firmly established (esp. free online MT) as viable, though flawed.
Professional translators’ reservations
–fears (misplaced as it turns out) that it would take work away from them
–disgust at bad image MT gives to the profession
Recent developments suggest need for more reconciliatory approach
–MT as a “colleague” rather than a rival
Need better understanding of what MT can and, more importantly, cannot do.

3 Overview
Focusing on full text MT …
How MT works
Strengths and weaknesses
What should translators say about MT?
Assumption: we’re mostly talking about free online MT here

4 History of MT
1945-65: Crude early attempts with unsophisticated computers and naïve linguistic approach
–mainly word-for-word
1966-90: Linguistic rule-based programs
–some successes, especially with “sublanguage”
–requires much effort to build
1991-…: Statistics-based programs, learning translation patterns from large amounts of data
–quick to develop if data is available
–surprisingly good quality (but see later)

5 How does (S)MT work?
Requires huge amounts of parallel (bilingual) data, i.e. texts and their translations
Programs automatically align the texts (sentence-by-sentence where possible), then extract (or “learn”) translation probabilities (“models”)
At run time, probabilities are juggled to get the highest scoring result
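As a rough illustration of "learning" translation probabilities from aligned text, the sketch below counts how often each source word co-occurs with each target word in a tiny sentence-aligned corpus. The corpus, the function name `cooccurrence_probs` and the counting method are all assumptions made for illustration: real statistical systems use iterative word-alignment procedures over millions of sentence pairs, not raw co-occurrence counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus: (English, Italian) sentence pairs,
# already aligned sentence-by-sentence.
PARALLEL = [
    ("the house", "la casa"),
    ("the door", "la porta"),
    ("a house", "una casa"),
    ("a door", "una porta"),
]

def cooccurrence_probs(pairs):
    """Estimate p(target word | source word) by relative co-occurrence frequency.

    This is a crude stand-in for proper word alignment: every target word in an
    aligned sentence counts as a possible translation of every source word.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s in src.split():
            for t in tgt.split():
                counts[s][t] += 1
    probs = {}
    for s, c in counts.items():
        total = sum(c.values())
        probs[s] = {t: n / total for t, n in c.items()}
    return probs

probs = cooccurrence_probs(PARALLEL)
# Most likely translation found for each source word in this toy corpus.
best = {s: max(p, key=p.get) for s, p in probs.items()}
print(best)
```

Even with four sentence pairs, the words that consistently appear together rise to the top, which is the basic intuition behind extracting a translation model from parallel data.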

6 A little more detail
Actually, two models are learned from the data:
–Translation model: given words and word sequences in the source language (SL), what are the most likely corresponding words in the target language (TL)?
–Target-language model: given these corresponding TL words, what is the most likely way in which they will be combined?
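A minimal sketch of how the two models might be combined at run time is given below. All probabilities in `TM` and `LM` are invented for illustration, and the search simply tries every word choice in every order, which only works for very short inputs; real systems use far larger models and heuristic search. The function assumes every source word is present in `TM`.

```python
import itertools
import math

# Hypothetical translation model: p(TL word | SL word). Values are invented.
TM = {
    "the": {"la": 0.7, "il": 0.3},
    "white": {"bianca": 0.6, "bianco": 0.4},
    "house": {"casa": 0.9, "abitazione": 0.1},
}

# Hypothetical target-language bigram model: p(word | previous word).
LM = {
    ("<s>", "la"): 0.5,
    ("la", "casa"): 0.4,
    ("casa", "bianca"): 0.3,
    ("la", "bianca"): 0.05,
    ("bianca", "casa"): 0.01,
}
UNSEEN = 1e-4  # small probability for word pairs never seen in training

def lm_logprob(words):
    """Log-probability of a TL word sequence under the bigram model."""
    prev, total = "<s>", 0.0
    for w in words:
        total += math.log(LM.get((prev, w), UNSEEN))
        prev = w
    return total

def translate(source):
    """Return the candidate with the highest combined TM + LM score."""
    options = [TM[w].items() for w in source.split()]
    best_score, best_text = -math.inf, None
    for choice in itertools.product(*options):
        tm_score = sum(math.log(p) for _, p in choice)
        for order in itertools.permutations(t for t, _ in choice):
            score = tm_score + lm_logprob(order)
            if score > best_score:
                best_score, best_text = score, " ".join(order)
    return best_text

print(translate("the white house"))
```

Here the translation model prefers "la", "bianca" and "casa" as individual words, while the target-language model is what puts them in the order "la casa bianca" rather than following the English word order.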

7 So how well does that work?
Let’s look first at what makes translation hard for a computer …
… then see how well SMT handles these difficulties …
… and what we can conclude from that

8 Why is translation hard for a computer?
Language is highly ambiguous
–You may not have realised that, but for a computer it is true
Translation largely requires genuine understanding
–Debatable in some cases, but undoubtedly often true
Translation is all about style
–Well, sometimes it is!

9 Language is difficult
Individual words
–ambiguous morphology (numb:number, tow:tower)
–homonymy (round, bank, last, flush)
–polysemy (report, range)
–translation divergences (wall = muro/parete)
Sequences of words
–local ambiguity (car boot sale, He shot the man with a gun)
–global ambiguity (Time flies like an arrow)
–“Dependencies” (He left the passage that had taken him so long to compose out)
–TL grammar (This bed has been slept in)
Humans use their general understanding of context or plausibility, and so often don’t even notice the ambiguity
“Contrastive” knowledge of languages is a big part of what translation is about

10 How does MT cope?
Ambiguity
–“Translation models” handle not only individual words, but word sequences
–If the model has the wrong interpretation, the system is likely to reproduce it
–Also, dependencies between words (which can be arbitrarily distant from each other) are more difficult to capture
–Target-language models may also help here

11 How does MT cope?
Style and nuance
–Both translation and TL models can only reflect the data on which they have been trained
–Probability data is generally not fine-grained enough to capture niceties
–Again, anything that depends on long-distance dependencies is unlikely to shine through

12 What are MT’s strengths?
Impact of training data is paramount:
–MT performs best when translating the kind of text it has been trained on
–This was also true of rule-based systems
–Somewhat true of (specialised) human translators too
Tension between
–need to use as much material as possible for training
–desire (eg Google) to provide a generic translation service
–trade-off between coverage and translation quality

13 What are MT’s strengths?
MT in general performs well with
–simple grammatical source text free of ambiguities, colloquialisms, etc., for which style and nuance are not so important
Happily these are the kinds of texts that human translators find least engaging
However well MT manages, it is not as reliable as a human

14 What you should say about MT
It’s good (even preferable) for some things
Mainly translation into the client’s language (“assimilation”)
–Reading a document in a foreign language to see what it’s about and whether they need a proper translation, or which bits need translation
–They may feel able (if they know the source language) to tidy it up (“post-editing”, “revision”) themselves, though they should always be aware of the risk involved
Rough and ready translation into a foreign language
–eg for informal communication with someone who can tolerate a rough translation
–Again, the risks must be emphasised
–Possible use (even by translators) of MT as a first draft: post-editing

15 What you should say about MT
But for other things MT might be quite unsuitable, and human translation (HT) is still a better bet
–Certainly any document (eg for publication) where the quality of the translation will reflect on your client
–Any document where style and presentation are important
–Any document where accuracy is crucial
–Translation into a target language that the client does not know at all carries a major risk

16 A final word of warning
Clients might like to evaluate an MT system for themselves
A common method is back-and-forth (“round trip”, RT) translation
This has some major drawbacks:
–A bad RT may be caused by a bad outward trip or a bad return trip … hard to know which
–A good RT may hide a bad translation, eg word-for-word nonsense in the TL, which comes back as the same original source text
So round-trip translation (RTT) on a single sentence won’t tell you much … so test it with a longer text: if it does OK it may be a fair result; if it does badly you can never be sure why
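The second drawback can be demonstrated with a deliberately naive word-for-word "translator". The lexicon below is a made-up toy, and `word_for_word` is a hypothetical helper, not how any real MT system works; the point is only that a perfect round trip says nothing about the quality of the outward translation.

```python
# Hypothetical one-to-one toy lexicon (English -> Italian), for illustration only.
EN_IT = {
    "it": "esso",
    "is": "è",
    "raining": "piovendo",
    "cats": "gatti",
    "and": "e",
    "dogs": "cani",
}
# The return trip simply inverts the same lexicon.
IT_EN = {it: en for en, it in EN_IT.items()}

def word_for_word(sentence, lexicon):
    """Replace each word by its lexicon entry, leaving unknown words unchanged."""
    return " ".join(lexicon.get(word, word) for word in sentence.split())

source = "it is raining cats and dogs"
outward = word_for_word(source, EN_IT)   # word-for-word nonsense in the TL
back = word_for_word(outward, IT_EN)     # return trip

print(outward)
print(back == source)
```

The outward result, "esso è piovendo gatti e cani", is not how an Italian speaker would express the idiom (something like "piove a catinelle" would be expected), yet the return trip reproduces the English source exactly, so the round trip looks flawless.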

17 Thank you

