
1 Machine Translation III
Empirical approaches to MT: Example-based MT, Statistical MT
http://personalpages.manchester.ac.uk/staff/harold.somers/LELA30431/chapter50.pdf
http://www.statmt.org/

2 Introduction
Empirical approaches: what does that mean?
– Empirical vs rationalist
– Data-driven vs rule-driven
Pure empiricism: statistical MT
Hybrid empiricism: Example-based MT

3 Empirical approaches
Approaches based on pure data
Contrast with the “rationalist” approach: rule-based systems of the “2nd generation”
Larger storage, faster processors, and availability of textual data in huge quantities suggest a data-driven approach may be possible
“Data” here means just raw text

4 Flashback
Early thoughts on MT (Warren Weaver 1949) included the possibility that translation was like code-breaking (cryptanalysis)
Weaver – with Claude Shannon – invented “information theory”
Given enough data, patterns could be identified and applied to new text

5 Back to the future
Data-driven approach encouraged by availability of machine-readable parallel text, notably at first the Canadian and Hong Kong Hansards, then EU documents, and dual-language web pages
Two basic approaches:
– Statistical MT
– Example-based MT

6 Example-based MT
“Translation by analogy”
First proposed by Nagao (1984) but not implemented until the early 1990s
Very intuitive: translate text on the basis of recognising bits that have been previously translated, and sticking them together
– Cf. the tourist phrasebook approach

7 Example-based MT
Like an extension of Translation Memory
Based on a database of translation examples
The system finds closely matching previous example(s),
(unlike TM) identifies the corresponding fragments in the target text(s) (alignment),
and recombines them to give the target text

8 Example (Sato & Nagao 1990)
Input: He buys a book on international politics
Matches:
He buys a notebook. – Kare wa nōto o kau.
I read a book on international politics. – Watashi wa kokusai seiji nitsuite kakareta hon o yomu.
Result: Kare wa [kokusai seiji nitsuite kakareta hon] o kau.

9 Learning templates
The monkey ate a peach. → saru wa momo o tabeta.
The man ate a peach. → hito wa momo o tabeta.
⇒ monkey → saru; man → hito; The … ate a peach. → … wa momo o tabeta.
The dog ate a rabbit. → inu wa usagi o tabeta.
⇒ dog → inu; rabbit → usagi; The … ate a …. → … wa … o tabeta.
The dog ate a peach. → inu wa momo o tabeta.
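The template-learning step above amounts to diffing two example pairs and punching a variable slot where they differ. A minimal Python sketch of that idea (my own illustration, not the course's code), assuming whitespace-tokenised sentences and exactly one differing word on each side:

def learn_template(pair1, pair2):
    (src1, tgt1), (src2, tgt2) = pair1, pair2
    s1, s2 = src1.split(), src2.split()
    t1, t2 = tgt1.split(), tgt2.split()
    if len(s1) != len(s2) or len(t1) != len(t2):
        return None                              # only same-length examples handled here
    s_diff = [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]
    t_diff = [j for j, (a, b) in enumerate(zip(t1, t2)) if a != b]
    if len(s_diff) != 1 or len(t_diff) != 1:
        return None                              # more than one slot: out of scope here
    i, j = s_diff[0], t_diff[0]
    lexicon = {s1[i]: t1[j], s2[i]: t2[j]}       # word correspondences learned
    s1[i], t1[j] = "...", "..."                  # punch a variable slot
    return " ".join(s1), " ".join(t1), lexicon

print(learn_template(
    ("The monkey ate a peach .", "saru wa momo o tabeta ."),
    ("The man ate a peach .",    "hito wa momo o tabeta .")))
# ('The ... ate a peach .', '... wa momo o tabeta .', {'monkey': 'saru', 'man': 'hito'})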

10 Some problems include …
Source of examples
– Genuine text or hand-crafted?
Identifying matching fragments
– Preprocessed → storage implication; prejudge what will be useful
– “On the fly” → needs a dictionary
Partial matching
Sticking fragments together (boundary friction)
Conflicting/multiple examples

11 Partial matching
Input: The operation was interrupted because the file was hidden.
a. The operation was interrupted because the Ctrl-c key was pressed.
b. The specified method failed because the file is hidden.
c. The operation was interrupted by the application.
d. The requested operation cannot be completed because the disk is full.
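The slides do not say how “closely matching” is scored; as a rough illustration only, a simple word-overlap (Dice) measure ranks the stored examples above against the input. Real EBMT systems use richer similarity measures (e.g. edit distance or thesaurus-based word similarity).

def dice(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return 2 * len(a & b) / (len(a) + len(b))

query = "The operation was interrupted because the file was hidden."
examples = [
    "The operation was interrupted because the Ctrl-c key was pressed.",
    "The specified method failed because the file is hidden.",
    "The operation was interrupted by the application.",
    "The requested operation cannot be completed because the disk is full.",
]
# Rank stored examples by overlap with the input, best first.
for score, example in sorted(((dice(query, e), e) for e in examples), reverse=True):
    print(f"{score:.2f}  {example}")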

12 Boundary friction (1)
Consider again: He buys a book on politics
Matches:
He buys a notebook. – Kare wa nōto o kau.
I read a book on politics. – Watashi wa seiji nitsuite kakareta hon o yomu.
He buys a pen. – Kare wa pen o kau.
She wrote a book on politics. – Kanojo wa seiji nitsuite kakareta hon o kaita.
Result: the frame Kare wa … o kau comes from two different matches, and the extracted fragment wa seiji nitsuite kakareta hon o overlaps it at the particles wa and o, so the pieces do not join cleanly.

13 Boundary friction (2)
Input: The handsome boy entered the room
Matches:
The handsome boy ate his breakfast. – Der schöne Junge aß sein Frühstück.
I saw the handsome boy. – Ich sah den schönen Jungen.
The two matches give “the handsome boy” in different German cases (nominative der schöne Junge vs accusative den schönen Jungen), so taking the fragment from the wrong match yields an ungrammatical result.

14 Competing examples
In closing, I will say that I am sad for workers in the airline industry. – En terminant, je dirai que c’est triste pour les travailleurs et les travailleuses du secteur de l’aviation.
My colleague spoke about the airline industry. – Mon collègue a parlé de l’industrie du transport aérien.
People in the airline industry have become unemployed. – Des gens de l’industrie aérienne sont devenus chômeurs.
This tax will cripple some of the small companies in the airline industry. – Cette surtaxe va nuire aux petits transporteurs aériens.
Results from the Canadian Hansard using TransSearch

15 Statistical MT
Pioneered by IBM in the early 1990s
Spurred on by the greater success in speech recognition of statistical over linguistic rule-based approaches
Idea that translation can be modelled as a statistical process
Seems to work best in a limited domain where the given data is a good model of future translations

16 Translation as a probabilistic problem
For a given SL sentence S_i, there is an unbounded number of possible “translations” T, of varying probability
The task is to find for S_i the sentence T_j for which the probability P(T_j | S_i) is highest

17 Two models
P(T_j | S_i) is a function of two models:
– The probabilities of the individual words that make up T_j given the individual words in S_i – the “translation model”
– The probability that the individual words that make up T_j are in the appropriate order – the “language model”

18 Expressed in mathematical terms:
argmax_T P(T | S) = argmax_T P(S | T) × P(T) / P(S)   (by Bayes’ rule)
Since S is a given, and P(S) is constant, this can be simplified as
argmax_T P(S | T) × P(T)
Translation model: P(S | T); Language model: P(T)

19 So how do we translate?
For a given input sentence S_i we have to have a practical way to find the T_j that maximizes the formula
We have to start somewhere, so we start with the translation model: which words look most likely to help us?
In a systematic way we can keep trying different combinations together with the language model until we stop getting improvements

20 [Diagram: input sentence → translation model → bag of possible words → language model → most probable translation; seek improvement by trying other combinations]
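A toy rendering of that loop (my own sketch with made-up numbers, not the slides’ system): take the most probable target word for each source word from a small translation table, then try different orderings and keep the one the bigram language model scores highest. Real decoders use beam search over partial hypotheses and combine translation-model and language-model scores throughout.

import itertools

translation_table = {                  # hypothetical P(target word | source word)
    "je":    {"i": 0.9},
    "aime":  {"like": 0.6, "love": 0.4},
    "marie": {"mary": 0.95},
}
bigram_lm = {                          # hypothetical P(w2 | w1)
    ("<s>", "i"): 0.4, ("i", "like"): 0.25, ("i", "love"): 0.3,
    ("like", "mary"): 0.15, ("love", "mary"): 0.2, ("mary", "</s>"): 0.5,
}

def lm_score(words):
    score = 1.0
    for w1, w2 in zip(["<s>"] + words, words + ["</s>"]):
        score *= bigram_lm.get((w1, w2), 1e-6)     # small floor for unseen bigrams
    return score

def decode(source):
    # "Bag of possible words": the best option per source word from the translation model.
    bag = [max(translation_table[w], key=translation_table[w].get) for w in source.split()]
    # "Trying other combinations": reorder the bag and keep the best LM score.
    best = max(itertools.permutations(bag), key=lambda p: lm_score(list(p)))
    return " ".join(best)

print(decode("je aime marie"))         # -> "i like mary" under these toy numbers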

21 Where do the models come from?
All the statistical parameters are pre-computed, based on a parallel corpus
The language model is probabilities of word sequences (n-grams)
The translation model is derived from an aligned parallel corpus

22 The translation model
Take a sentence-aligned parallel corpus
Extract the entire vocabulary for both languages
For every word pair, calculate the probability that they correspond – e.g. by comparing distributions
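One crude way to “compare distributions”, sketched here with made-up toy data (my own illustration; the IBM models refine this by re-estimating the probabilities iteratively with EM): count how often each source word co-occurs with each target word across aligned sentence pairs.

from collections import Counter, defaultdict

corpus = [                            # hypothetical sentence-aligned pairs
    ("the house", "la maison"),
    ("the book",  "le livre"),
    ("a book",    "un livre"),
]

cooc = defaultdict(Counter)           # cooc[src][tgt] = co-occurrence count
src_count = Counter()
for src, tgt in corpus:
    for s in src.split():
        src_count[s] += 1
        for t in tgt.split():
            cooc[s][t] += 1

def p_correspond(s, t):
    # Crude estimate of P(t | s): co-occurrences over source-word frequency.
    return cooc[s][t] / src_count[s]

print(p_correspond("book", "livre"))  # 1.0: they always co-occur
print(p_correspond("the", "la"))      # 0.5: 'the' also co-occurs with 'le'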

23 Some obvious problems
“Fertility”: not all word correspondences are 1:1
– Some words have multiple possible translations, e.g. the → {le, la, l’, les}
– Some words have no translation, e.g. in il se rase ‘he shaves’, se → ∅
– Some words are translated by several words, e.g. cheap → peu cher
– Not always obvious how to align

24 The proposal will not now be implemented
Les propositions ne seront pas mises en application maintenant
The ~ Les
proposal ~ propositions
will ~ seront
now ~ maintenant
implemented ~ mises en application
be ~ ∅
not ~ ne…pas / will not ~ ne seront pas: many:many not allowed; only 1:n (n ≥ 0), and in practice n < 3

25 Some word-pair probabilities from the Canadian Hansard

‘the’      French    P      fertility  P
           le        .610   1          .871
           la        .178   0          .124
           l’        .083   2          .004
           les       .023
           ce        .013
           il        .012
           de        .009
           à         .007
           que       .007

‘not’      French    P      fertility  P
           pas       .469   2          .758
           ne        .460   0          .133
           non       .024   1          .106
           faux      .006
           plus      .002
           ce        .002
           que       .002
           jamais    .002

‘hear’     French    P      fertility  P
           bravo     .992   0          .584
           entendre  .005   1          .416
           entendu   .002
           entende   .001

26 Another problem: distortion
Notice that corresponding words do not appear in the same order. The translation model includes probabilities for “distortion”
– e.g. P(5|2): the probability that w_s in position 2 will produce a w_t in position 5
– can be more complex: P(5|2,4,6): the probability that w_s in position 2 will produce a w_t in position 5 when S has 4 words and T has 6
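For illustration only (assumed word-aligned toy data, not from the slides): a distortion table of the simplest kind, P(target position | source position), can be read off aligned position pairs by counting.

from collections import Counter, defaultdict

# Hypothetical word alignments: (source position, target position) links per sentence.
aligned_sentences = [
    [(1, 1), (2, 2), (3, 3), (4, 6), (5, 4)],
    [(1, 1), (2, 3), (3, 2)],
]

counts = defaultdict(Counter)
for links in aligned_sentences:
    for s_pos, t_pos in links:
        counts[s_pos][t_pos] += 1

def p_distortion(t_pos, s_pos):
    # P(a source word in position s_pos produces a target word in position t_pos)
    total = sum(counts[s_pos].values())
    return counts[s_pos][t_pos] / total if total else 0.0

print(p_distortion(3, 2))   # 0.5: source position 2 went to target position 3 in one of two sentences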

27 The language model
Impractical to calculate the probability of every word sequence:
– Many will be very improbable …
– because they are ungrammatical
– or because they happen not to occur in the data
Probabilities of sequences of n words (“n-grams”) are more practical
– Bigram model: P(w_i | w_i–1) ≈ f(w_i–1, w_i) / f(w_i–1)

28 Sparse data
Relying on n-grams with a large n risks zero probabilities
Bigrams are less risky but sometimes not discriminatory enough
– e.g. I hire men who is good pilots
3- or 4-grams allow a nice compromise, and if a 3-gram is previously unseen, we can give it a score based on the component bigrams (“smoothing”)
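A minimal sketch of the n-gram idea with that back-off-style smoothing, over assumed toy data (the back-off weight is arbitrary; real systems use principled schemes such as Kneser-Ney): score a trigram from counts if it has been seen, otherwise fall back to the component bigram.

from collections import Counter

# Toy token stream; sentence boundaries are handled naively with a single </s> marker.
tokens = "i hire men who are good pilots </s> who is he </s>".split()

unigrams = Counter(tokens)
bigrams  = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

def p_bigram(w1, w2):
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

def p_trigram(w1, w2, w3, backoff_weight=0.4):
    if bigrams[(w1, w2)] and trigrams[(w1, w2, w3)]:
        return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]   # seen trigram: relative frequency
    return backoff_weight * p_bigram(w2, w3)                 # unseen: back off to the bigram

print(p_trigram("men", "who", "are"))   # 1.0: seen in the toy data
print(p_trigram("men", "who", "is"))    # 0.2: unseen trigram, scored via the bigram "who is"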

29 Put it all together and …?
To build a statistical MT system we need:
– An aligned bilingual corpus
– “Training programs” which will extract from the corpora all the statistical data for the models
– A “decoder” which takes a given input and seeks the output that maximizes the magic argmax formula – based on a heuristic search algorithm
Software for this purpose is freely available – e.g.
The claim is that an MT system for a new language pair can be built in a matter of hours

30 SMT latest developments
Nevertheless, quality is limited
SMT researchers quickly learned (just as in the 1960s) that this crude approach can get them only so far (quite far, actually), but that to go the extra distance you need linguistic knowledge (e.g. morphology, “phrases”, constituents)
Latest developments aim to incorporate this
The big difference is that it too can be LEARNED (automatically) from corpora
So SMT still contrasts with traditional RBMT, where rules are “hand coded” by linguists

