
1 Extraction of Bilingual Information from Parallel Texts
Mike Rosner
CSAW 2004, September 2004

2 Outline
Machine Translation
Traditional vs. Statistical Architectures
Experimental Results
Conclusions

3 Translational Equivalence: a many-to-many relation between SOURCE and TARGET

4 Traditional Machine Translation

5 Remarks
Character of system:
– Knowledge based.
– High-quality results if the domain is well delimited.
– Knowledge takes the form of specialised rules (analysis; synthesis; transfer).
Problems:
– Limited coverage.
– Knowledge acquisition bottleneck.
– Extensibility.

6 Statistical Translation
Robust
Domain independent
Extensible
Does not require language specialists
Uses the noisy channel model of translation

7 Noisy Channel Model of Sentence Translation (Brown et al. 1990)
[Diagram: source sentence → channel → target sentence]

8 The Problem of Translation
Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T).
By Bayes' theorem, P(S|T) = P(S) * P(T|S) / P(T), whose denominator is independent of S.
Hence it suffices to maximise P(S) * P(T|S).
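A minimal sketch (in Python, with hypothetical candidate sentences and probability functions) of the maximisation this implies:

def best_source(target, candidates, lm_prob, tm_prob):
    # Pick the candidate source sentence S maximising P(S) * P(T|S);
    # lm_prob(s) plays the role of P(S) and tm_prob(t, s) of P(T|S).
    return max(candidates, key=lambda s: lm_prob(s) * tm_prob(target, s))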

9 A Statistical MT System
Source language model P(S) and translation model P(T|S), combined as P(S) * P(T|S) = P(S,T); a decoder recovers the most probable source sentence S from the observed target sentence T.

10 The Three Components of a Statistical MT Model
1. A method for computing language model probabilities P(S)
2. A method for computing translation probabilities P(T|S)
3. A method for searching amongst source sentences for the one that maximises P(S) * P(T|S)

11 Probabilistic Language Models
General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1))
Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1,s2) * ... * P(sn|s(n-2),s(n-1))
Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1))
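As an illustration of the bigram case, a small sketch (names are my own) that estimates the probabilities by relative frequency over tokenised sentences, with no smoothing:

from collections import Counter

def bigram_model(sentences):
    # Estimate P(w_i | w_(i-1)) = count(w_(i-1), w_i) / count(w_(i-1)).
    # `sentences` is a list of token lists; "<s>" marks sentence start.
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

def bigram_sentence_prob(words, probs):
    # P(s1 ... sn) = P(s1|<s>) * P(s2|s1) * ... * P(sn|s(n-1)).
    p, prev = 1.0, "<s>"
    for w in words:
        p *= probs.get((prev, w), 0.0)
        prev = w
    return p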

12 A Simple Alignment-Based Translation Model
Assumption: the target sentence is generated from the source sentence word by word.
S: John loves Mary
T: Jean aime Marie

13 Sentence Translation Probability
According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words:
P(T|S) = P(Jean aime Marie | John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary)
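A minimal sketch of that product, assuming a word-for-word alignment in sentence order and a hypothetical table t_prob of word translation probabilities P(t|s):

import math

def word_by_word_prob(target, source, t_prob):
    # P(T|S) as the product of P(t|s) over aligned word pairs,
    # e.g. P(Jean|John) * P(aime|loves) * P(Marie|Mary).
    return math.prod(t_prob.get((t, s), 0.0) for t, s in zip(target, source))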

14 A More Realistic Example
English: The proposal will not now be implemented
French: Les propositions ne seront pas mises en application maintenant

15 Some Further Parameters
Word translation probability: P(t|s)
Fertility: the number of target words paired with each source word (0 – N)
Distortion: the difference in sentence position between the source word and the target word: P(i|j,l)
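As a concrete illustration, these parameter families can be held in simple lookup tables; the entries below use the figures for English "not" reported later in the talk, and the distortion key is only a schematic placeholder:

translation = {("pas", "not"): 0.469, ("ne", "not"): 0.460}             # P(t|s)
fertility = {("not", 2): 0.758, ("not", 0): 0.133, ("not", 1): 0.106}   # P(n|s)
distortion = {}  # P(i|j,l): target position i given source position j, target length l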

16 Searching
Maintain a list of hypotheses.
Initial hypothesis: (Jean aime Marie | *)
Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
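A simplified sketch of one such iteration (my own formulation: hypotheses are reduced to (translated-prefix, score) pairs, and `candidates` and `score_word` are hypothetical placeholders; only a fixed beam of extensions is kept):

import heapq

def extend_hypotheses(hypotheses, target_words, candidates, score_word, beam=5):
    # Extend each partial hypothesis with every candidate source word for the
    # next target position, then keep only the `beam` most promising results.
    extended = []
    for prefix, score in hypotheses:
        next_target = target_words[len(prefix)]
        for src in candidates(next_target):
            extended.append((prefix + [src], score * score_word(next_target, src)))
    return heapq.nlargest(beam, extended, key=lambda h: h[1])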

17 Parameter Estimation
In general, large quantities of data are needed.
For the language model, we need only source language text.
For the translation model, we need pairs of sentences that are translations of each other.
Use the EM algorithm (Baum 1972) to optimise the model parameters.
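To make the EM step concrete, here is a compact sketch in the style of IBM Model 1 (my own simplification: only the word translation probabilities are trained; fertility and distortion are ignored):

from collections import defaultdict

def train_word_translation(sentence_pairs, iterations=10):
    # `sentence_pairs` is a list of (source_tokens, target_tokens) pairs.
    # Start from a uniform P(t|s): every value 1.0, normalised in the E-step.
    t_prob = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        for source, target in sentence_pairs:
            for t in target:
                # E-step: share each target word's count among the source
                # words in proportion to the current P(t|s).
                norm = sum(t_prob[(t, s)] for s in source)
                for s in source:
                    c = t_prob[(t, s)] / norm
                    counts[(t, s)] += c
                    totals[s] += c
        # M-step: re-estimate P(t|s) from the expected counts.
        t_prob = defaultdict(float, {(t, s): counts[(t, s)] / totals[s]
                                     for (t, s) in counts})
    return t_prob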

18 Experiment (Brown et al. 1990)
Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language.
Considered the 9,000 most common words in each language.
Assumptions (initial parameter values):
– each of the 9,000 target words equally likely as a translation of each of the source words
– each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words
– each target position equally likely given each source position and target length

19 English: not
French         Probability
pas            .469
ne             .460
non            .024
pas du tout    .003
faux           .003
plus           .002
ce             .002
que            .002
jamais         .002

Fertility      Probability
2              .758
0              .133
1              .106

20 English: hear
French         Probability
bravo          .992
entendre       .005
entendu        .002
entends        .001

Fertility      Probability
0              .584
1              .416

21 Bajada 2003/4
400 sentence pairs from the Malta/EU accession treaty.
Three different types of alignment:
– Paragraph (precision 97%, recall 97%)
– Sentence (precision 91%, recall 95%)
– Word: 2 translation models
   Model 1: distortion independent
   Model 2: distortion dependent

22 Bajada 2003/4 (word alignment results)
                        Model 1   Model 2
word pairs present      244       244
word pairs identified   145       145
correct                 58        77
incorrect               87        68
precision               40%       53%
recall                  24%       32%
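For reference, the precision and recall rows follow directly from the counts, e.g. for Model 2:

precision = 77 / (77 + 68)  # correct / word pairs identified = 77/145 ≈ 0.53
recall = 77 / 244           # correct / word pairs present ≈ 0.32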

23 Conclusion / Future Work
Larger data sets.
Finer models of word-to-word translation probabilities, taking into account:
– fertility
– morphological variants of the same word
Role and tools for a bilingual informant (not a linguistic specialist).

