Introduction to SMT – Word-based Translation Models
Jan Niehues, Lehrstuhl Prof. Alex Waibel, Institut für Anthropomatik, 14.02.2014

1 Introduction to SMT – Word-based Translation Models

2 Overview
- Introduction
- Lexica
- Alignment
- IBM Model 1
- EM Algorithm
- Higher IBM Models
- Word Alignment

3 Introduction – Notation
Source:
- f_i: source (foreign) word
- I: length of the source (foreign) sentence
- i: position in the source sentence
- f_1 ... f_I: the source (foreign) sentence
Target:
- e_j: target (English) word
- J: length of the English sentence
- j: position in the English sentence
- e_1 ... e_J: the English sentence

4 Introduction
Statistical Machine Translation: find the most probable translation e for a given source sentence f:
    e* = argmax_e p(e | f)
Using Bayes' rule:
    e* = argmax_e p(f | e) * p(e)

5 System overview

6 Word-based Translation Models
- Word-based models were introduced by Brown et al. in the early 1990s
- Directly translate source words to target words
- Model word-by-word translation probabilities
- First statistical approach to machine translation
- No longer state of the art
- Still used to generate the word alignment for phrase extraction in phrase-based models

7 Lexica
- A lexicon stores the translations of the source words
- One word can have several translations
- Example: Haus – house, building, home, household, shell
- Some translations are more likely; others are only used in certain circumstances
- How to decide which one to use in the translation? Use statistics

8 Lexica
Collect counts of the different translations and approximate the probability distribution:

Translation   Counts   Probability
house         8000     0.8
building      1600     0.16
home          200      0.02
household     150      0.015
shell         50       0.005
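The maximum-likelihood estimate behind the table is just relative frequency; a minimal sketch using the counts from the slide:

```python
# Estimate a probabilistic lexicon entry from translation counts
# (counts for the German word "Haus" as given on the slide).
counts = {"house": 8000, "building": 1600, "home": 200,
          "household": 150, "shell": 50}

total = sum(counts.values())
# Maximum-likelihood estimate: p(e | Haus) = count(e) / total
lexicon = {e: c / total for e, c in counts.items()}

print(lexicon["house"])     # 0.8
print(lexicon["building"])  # 0.16
```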

9 Alignment
- Mapping between source and target words that are translations of each other
- Example input: das Haus ist klein
- With a probabilistic lexicon, a possible word-by-word translation is: the house is small
- This implies an alignment between the source and target sentence

10 Alignment
- Formalized as a function a: j -> i
- Maps each target word position j to a source word position i
- Example: a(1)=1, a(2)=2, a(3)=3, a(4)=4 for "das Haus ist klein" -> "the house is small"
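The alignment function can be represented as a plain mapping; a small sketch for the monotone example above:

```python
# Alignment function for "das Haus ist klein" -> "the house is small":
# a[j] gives the source position for target position j.
# Positions are 1-based, as in the slides.
a = {1: 1, 2: 2, 3: 3, 4: 4}

source = ["das", "Haus", "ist", "klein"]
target = ["the", "house", "is", "small"]

for j, e in enumerate(target, start=1):
    print(f"{e} <- {source[a[j] - 1]}")
```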

11 Alignment – Difficulties
Word reordering: leads to a non-monotone alignment

12 Alignment – Difficulties
Many-to-one alignments: one word of the input language is translated into several words

13 Alignment – Difficulties
Deletion: for some source words there is no equivalent in the translation

14 Alignment – Difficulties
Insertion: some words of the target sentence have no equivalent in the source sentence. A NULL word is added to the source sentence so that the alignment function is still fully defined.

15 Alignment – Remarks
- Many-to-one alignments are possible, but not one-to-many alignments
- In these models, alignments are represented by a function
- This leads to problems with language pairs like Chinese-English
- In phrase-based systems this is solved by looking at the translation process from both directions

16 IBM Model 1
- A model that generates the different translations of a sentence, each with an associated probability
- Generative model: break the modeling of sentence translations into smaller steps of word-to-word translations with a coherent story
- Probability of the English sentence e and alignment a given the foreign sentence f:
    p(e, a | f) = eps / (I + 1)^J * prod_{j=1..J} t(e_j | f_a(j))
- Number of possible alignments: (I + 1)^J
- Normalization constant: eps
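The sentence probability can be evaluated directly once a lexicon is given; a minimal sketch with a hypothetical toy lexicon t (values invented for illustration):

```python
# p(e, a | f) = eps / (I+1)^J * prod_j t(e_j | f_a(j))
# Toy hypothetical lexicon; f includes the NULL word at position 0.
eps = 1.0  # normalization constant

t = {  # t[(e, f)] = translation probability (made-up values)
    ("the", "das"): 0.7, ("house", "Haus"): 0.8,
    ("is", "ist"): 0.9, ("small", "klein"): 0.8,
}

def ibm1_prob(e, f, a):
    """e: target words; f: source words with f[0] == 'NULL';
    a: list with a[j] = source position for target position j."""
    I = len(f) - 1          # source length without the NULL word
    J = len(e)
    p = eps / (I + 1) ** J
    for j in range(J):
        p *= t.get((e[j], f[a[j]]), 0.0)
    return p

f = ["NULL", "das", "Haus", "ist", "klein"]
e = ["the", "house", "is", "small"]
a = [1, 2, 3, 4]            # monotone alignment
print(ibm1_prob(e, f, a))   # (0.7*0.8*0.9*0.8) / 5**4
```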

17 IBM 1 Example

18 IBM 1 Training
- Learn the translation probability distributions
- Problem: incomplete data
  - Only large amounts of sentence-aligned parallel text are available
  - The alignment information is missing
- Consider the alignment as a hidden variable
- Approach: expectation maximization (EM) algorithm

19 EM Algorithm
1. Initialize the model: use a uniform distribution
2. Apply the model to the data (expectation step): compute the alignment probabilities. At first all alignments are equally likely, but in later iterations Haus will most likely be translated to house.
3. Learn the model from the data (maximization step): learn the translation probabilities from the guessed alignments. Either use only the best alignment, or use all alignments weighted according to their probability.
4. Iterate steps 2 and 3 until convergence
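The four steps can be sketched for IBM Model 1 on a toy corpus; this is a simplified illustration (uniform initialization, fractional counts over all alignment choices), not the full GIZA++ training loop:

```python
from collections import defaultdict

# EM training for IBM Model 1 on a toy parallel corpus (hypothetical data).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

# Step 1: initialize t(e|f) uniformly over the target vocabulary.
e_vocab = {e for _, es in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))

for _ in range(50):
    count = defaultdict(float)   # expected counts c(e, f)
    total = defaultdict(float)   # expected counts c(f)
    # Step 2 (expectation): collect fractional counts under the current model.
    for fs, es in corpus:
        for e in es:
            norm = sum(t[(e, f)] for f in fs)   # sum over alignment choices
            for f in fs:
                count[(e, f)] += t[(e, f)] / norm
                total[f] += t[(e, f)] / norm
    # Step 3 (maximization): re-estimate t(e|f) from the expected counts.
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

print(round(t[("house", "Haus")], 3))   # approaches 1.0 as EM converges
```

With this classic three-sentence corpus, the co-occurrence statistics are enough for EM to pull Haus -> house and Buch -> book toward probability 1.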

20 Step 2
- Calculate the probability of an alignment:
    p(a | e, f) = p(e, a | f) / p(e | f)
- p(e | f) is the sum of p(e, a | f) over all alignments a
- Because the alignment decisions are independent, this sum factorizes, reducing the complexity from exponential to quadratic in the sentence length:
    p(e | f) = eps / (I + 1)^J * prod_{j=1..J} sum_{i=0..I} t(e_j | f_i)

21 Step 2
Putting both equations together:
    p(a | e, f) = prod_{j=1..J} t(e_j | f_a(j)) / sum_{i=0..I} t(e_j | f_i)

22 Step 3
Collect counts from every sentence pair (e, f):
    c(e | f; e, f) = t(e | f) / sum_{i=0..I} t(e | f_i) * count(e in e) * count(f in f)
Calculate the translation probabilities:
    t(e | f) = sum over sentence pairs of c(e | f; e, f), normalized over all target words e

23 Pseudo-code

24 Example

25 Convergence
- Goal: find the model that best fits the data
- Measure: how well does it translate unseen sentences?
  - At this point there is no test data
- Instead: how well does it model the training data?

26 Convergence
From the initial model over the first iteration to the final model, the probability of the training sentences increases.

27 Convergence
- Perplexity of the model:
    log2 PP = - sum over sentence pairs s of log2 p(e_s | f_s)
- The perplexity is guaranteed to decrease or stay the same at each iteration
- EM converges to a local minimum
- For IBM Model 1 this is also the global minimum
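The perplexity can be tracked during training; a minimal sketch, assuming the sentence probabilities p(e_s | f_s) have already been computed by the model (the values here are hypothetical):

```python
import math

# log2 PP = - sum_s log2 p(e_s | f_s); hypothetical sentence probabilities
sentence_probs = [0.125, 0.25, 0.5]

log2_pp = -sum(math.log2(p) for p in sentence_probs)
print(log2_pp)        # 3 + 2 + 1 = 6.0
perplexity = 2 ** log2_pp
print(perplexity)     # 2**6 = 64.0
```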

28 Higher IBM Models
- IBM Model 1 is very simple: no treatment of reordering or of adding and dropping words
- Five models of increasing complexity were proposed by Brown et al.:

IBM Model 1: lexical translations
IBM Model 2: adds absolute positions
IBM Model 3: adds a fertility model
IBM Model 4: relative alignment positions
IBM Model 5: fixes deficiency
HMM: lexicon plus relative positions

29 Higher IBM Models
- The complexity of training grows, but the general principle stays the same
- During training: first train IBM Model 1, use it to initialize IBM Model 2, and so on
- All models are implemented in the GIZA++ toolkit
  - Used by many groups
  - A parallel version was developed at CMU

30 IBM Model 2
- Problem of IBM Model 1: it assigns the same probability to both sentence pairs (word order is not modelled)
- IBM Model 2 adds a model of the alignment based on the positions of the input and output words

31 IBM Model 2
Two-step procedure: a lexical translation step followed by an alignment step.

32 IBM Model 2
Lexical translation step: t(e_j | f_a(j)), as in IBM Model 1.

33 IBM Model 2
Alignment step in IBM Model 1: all alignment positions are equally likely, p(a(j) = i | j, I, J) = 1 / (I + 1).

34 IBM Model 2
Alignment step in IBM Model 2: an explicit alignment distribution a(i | j, I, J), the probability that target position j aligns to source position i, given the sentence lengths I and J.

35 IBM Model 2
Training: similar to IBM Model 1 training.
Initialization:
- Initialize t(e | f) with the values from IBM Model 1 training
- Initialize the alignment probability a(i | j, I, J) uniformly
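With both distributions in place, the IBM Model 2 sentence probability multiplies a lexical and an alignment factor per target word; a sketch with made-up parameter values:

```python
# IBM Model 2: p(e, a | f) = eps * prod_j t(e_j | f_a(j)) * align(a(j) | j, I, J)
# Hypothetical t and align values; positions are 1-based, 0 = NULL.
eps = 1.0
t = {("the", "das"): 0.7, ("house", "Haus"): 0.8}
align = {(1, 1): 0.6, (2, 2): 0.6}   # align[(i, j)] for fixed I=2, J=2

def ibm2_prob(e, f, a):
    p = eps
    for j, i in enumerate(a, start=1):
        p *= t[(e[j - 1], f[i])] * align[(i, j)]
    return p

f = ["NULL", "das", "Haus"]
e = ["the", "house"]
print(ibm2_prob(e, f, [1, 2]))  # 0.7*0.6 * 0.8*0.6 = 0.2016
```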

36 IBM Model 2

37 IBM Model 2

38 IBM Model 3
- So far we did not model how many words are generated by an input word
- Model fertility by a probability distribution n(phi | f)
- Add an additional fertility step to the model

39 IBM Model 3
- Word deletion: modelled by fertility 0
- Word insertion: could be modelled by the fertility of the NULL word, but the fertility should depend on the sentence length
- Instead, add a NULL insertion step: after every generated word, insert a NULL token with probability p1, or not with probability p0 = 1 - p1

40 IBM Model 3

41 IBM Model 3

42 IBM Model 3
- Distortion model instead of an alignment model
- Different distortions can produce the same alignment
- The two models work in different directions: the alignment model predicts a source position for each target position, the distortion model predicts a target position for each source position

43 IBM Model 3 – Mathematical Formulation
Fertility step:
- With fertility greater than one, different tableaus exist for the same alignment
- All tableaus generating the same alignment have the same probability
- Number of different tableaus generating the same alignment: prod_{i=1..I} phi_i!

44 IBM Model 3 – Mathematical Formulation
Fertility step (continued)

45 IBM Model 3 – Mathematical Formulation
NULL word insertion:
- The number of generated NULL words depends on the number of output words generated from the input words
- After each generated word, a NULL word may be inserted
- With s words generated from foreign input words, the maximal number of generated NULL words is s
- Probability: P(phi_0) = C(s, phi_0) * p1^phi_0 * p0^(s - phi_0), with p1 the insertion probability and p0 = 1 - p1

46 IBM Model 3 – Mathematical Formulation
NULL word insertion (continued)

47 IBM Model 3 – Mathematical Formulation
Combine the fertility, lexical translation, and distortion probabilities.

48 IBM Model 3 – Training
- Problem: exponential number of alignments
- IBM Models 1/2: dynamic programming; for IBM Model 3 this is no longer possible
- Instead, sample from the space of possible alignments:
  - Find the most probable alignments
  - Add additional similar alignments
  - Use only these alignments for normalization

49 IBM Model 3 – Training
Finding the most probable alignment:
- Exponential number of alignments, so testing all possible alignments is too complex
- Use a hill climbing algorithm:
  - Evaluate all points in the neighborhood
  - Go to the highest point
  - Iterate
- Problem: may end in a local maximum
  - Start at various locations

50 IBM Model 3 – Training
Initialization: since hill climbing may end in a local maximum, start at various locations, generated by pegging.

51 Pegging
For all source indices i:
  For all target indices j:
    - Set the alignment a(j) = i
    - Find the most probable alignment under this condition
    - Add it to the set of starting points

52 Hill Climbing
Find the most probable alignment in the neighborhood. The neighborhood consists of:
- Alignments that differ by a move: two alignments a1 and a2 differ by a move if they differ only in the alignment of one word j
- Alignments that differ by a swap: two alignments a1 and a2 differ by a swap if they agree in the alignments of all words except two, for which the alignment points are switched
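The move/swap neighborhood can be enumerated directly; a sketch, with the alignment represented as a list a where a[j] is the source position for target word j (source positions 0..I, 0 = NULL):

```python
def neighbors(a, I):
    """All alignments reachable from a by one move or one swap."""
    result = []
    J = len(a)
    # Moves: change the alignment of a single target word j.
    for j in range(J):
        for i in range(I + 1):          # source positions 0..I (0 = NULL)
            if i != a[j]:
                b = list(a)
                b[j] = i
                result.append(b)
    # Swaps: exchange the alignment points of two target words.
    for j1 in range(J):
        for j2 in range(j1 + 1, J):
            if a[j1] != a[j2]:
                b = list(a)
                b[j1], b[j2] = b[j2], b[j1]
                result.append(b)
    return result

print(len(neighbors([1, 2], I=2)))  # 2*2 moves + 1 swap = 5
```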

53 IBM3 Training
Summary of IBM3 training:
- Sampling the alignments (pegging)
- Collecting counts
- Estimating probabilities

54 IBM Model 4
- Distortion model: absolute positions in IBM Model 3
  - Long sentences are relatively rare
  - The distortion probability cannot be approximated well
  - Use relative positions instead
- Problems: added words, dropped words, one-to-many alignments

55 IBM Model 4
Cept: each input word f_j that is aligned to at least one output word forms a cept

56 IBM Model 4
Center of a cept: the ceiling of the average of its output word positions

57 IBM Model 4
Relative distortion, defined for each output word:
1. Target words generated by the NULL token: uniform distribution
2. First word of a cept: word position j relative to the center of the preceding cept i-1
3. Subsequent words in a cept: word position j relative to the position of the previous word in the cept
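The cept center used in case 2 can be computed directly; a small sketch:

```python
import math

def cept_center(output_positions):
    """Center of a cept: ceiling of the average of its output word positions."""
    return math.ceil(sum(output_positions) / len(output_positions))

print(cept_center([3, 4]))   # ceil(3.5) = 4
print(cept_center([2]))      # 2
```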

58 IBM Model 4

59 IBM Model 4
Word classes: richer conditioning of the distortion:
- Some words are reordered more often, e.g. adjectives when translating from English to French
- There are not sufficient statistics to estimate the probabilities per word
- Group words into word classes
- Possible classes: POS tags; originally, automatically clustered words

60 IBM Model 5
- Deficiency: according to IBM Models 3 and 4, multiple output words can be placed at the same position
  - Positive probability is assigned to impossible alignments
- IBM Model 5 prevents this:
  - No longer multiple tableaux with the same alignment
  - Words are placed only into vacant word positions
  - For every word position, track how many untranslated words remain up to this word
- No improvement in alignment quality; not used in most state-of-the-art systems

61 HMM Alignment Model
- HMMs are successfully used in speech recognition
- Introduced for word alignment by Vogel et al.
- Idea: use relative positions instead of absolute ones
  - Entire word groups (phrases) are moved with respect to the source position
- GIZA toolkit: replaces IBM Model 2 by the HMM model

62 HMM Alignment Model

63 HMM Alignment Model
- First-order model: the target position depends on the previous target position (captures movement of entire phrases)
- Alignment probability: p(a_j | a_{j-1}, I), with a_j the target position aligned to source position j
- Maximum approximation: use only the most probable (Viterbi) alignment path

64 Viterbi Training

# Accumulation (over the corpus)
# Find the Viterbi path
For each sentence pair:
    For each source position j:
        For each target position i:
            P_best = 0
            t = p(f_j | e_i)
            For each previous target position i_prev:
                P_new = P(j-1, i_prev) * t * p(i | i_prev, I, J)
                if P_new > P_best:
                    P_best = P_new
                    BackPointer(j, i) = i_prev
            P(j, i) = P_best
    # Update counts along the Viterbi path
    i = argmax_i P(J, i)
    For each j from J downto 1:
        Count(f_j, e_i)++
        i_prev = BackPointer(j, i)
        Count(i, i_prev, I, J)++
        i = i_prev
# Renormalize
Normalize Count(f, e) to get p(f | e)
Normalize Count(i, i_prev, I, J) to get p(i | i_prev, I, J)

65 HMM Forward-Backward Training
- Gamma: probability of emitting f_j in state i for sentence s
- Sum over all paths through (j, i)

66 HMM Forward-Backward Training
- Epsilon: probability of the transition from state i_prev into state i
- Sum over all paths through (j-1, i_prev) and (j, i) that emit f_j
(11-731 Machine Translation, 2009)

67 Forward Probabilities
- Defined as: alpha_j(i) = p(f_1 ... f_j, a_j = i | e)
- Recursion: alpha_j(i) = ( sum_{i_prev} alpha_{j-1}(i_prev) * p(i | i_prev, I) ) * p(f_j | e_i)
- Initial condition: alpha_1(i) = p(i) * p(f_1 | e_i)

68 Backward Probabilities
- Defined as: beta_j(i) = p(f_{j+1} ... f_J | a_j = i, e)
- Recursion: beta_j(i) = sum_{i_next} p(i_next | i, I) * p(f_{j+1} | e_{i_next}) * beta_{j+1}(i_next)
- Initial condition: beta_J(i) = 1

69 Forward-Backward
Calculate gamma and epsilon from alpha and beta:
- Gamma: gamma_j(i) = alpha_j(i) * beta_j(i) / p(f | e), with p(f | e) = sum_i alpha_J(i)
- Epsilon: epsilon_j(i, i_prev) = alpha_{j-1}(i_prev) * p(i | i_prev, I) * p(f_j | e_i) * beta_j(i) / p(f | e)

70 Parameter Re-Estimation
- Lexicon probabilities: p(f | e) is obtained by summing gamma_j(i) over all positions where f_j = f and e_i = e, then normalizing
- Alignment probabilities: p(i | i_prev) is obtained by summing epsilon_j(i, i_prev) over j, then normalizing
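The forward and backward recursions combine into the E-step of HMM training; a compact sketch, assuming a uniform initial state distribution, no NULL state, and hypothetical toy parameters:

```python
def forward_backward(f, e, t, a):
    """One E-step for the HMM alignment model.
    f: source words (emissions), e: target words (states),
    t[(fw, ew)]: lexicon probability, a[(i, i_prev)]: transition probability.
    Returns the state posteriors gamma[j][i] and p(f | e)."""
    J, I = len(f), len(e)
    # Forward: alpha[j][i] = p(f_1..f_j, state i)
    alpha = [[0.0] * I for _ in range(J)]
    for i in range(I):
        alpha[0][i] = (1.0 / I) * t[(f[0], e[i])]   # uniform start
    for j in range(1, J):
        for i in range(I):
            s = sum(alpha[j - 1][ip] * a[(i, ip)] for ip in range(I))
            alpha[j][i] = s * t[(f[j], e[i])]
    # Backward: beta[j][i] = p(f_{j+1}..f_J | state i)
    beta = [[0.0] * I for _ in range(J)]
    for i in range(I):
        beta[J - 1][i] = 1.0
    for j in range(J - 2, -1, -1):
        for i in range(I):
            beta[j][i] = sum(a[(ip, i)] * t[(f[j + 1], e[ip])] * beta[j + 1][ip]
                             for ip in range(I))
    total = sum(alpha[J - 1][i] for i in range(I))   # p(f | e)
    gamma = [[alpha[j][i] * beta[j][i] / total for i in range(I)]
             for j in range(J)]
    return gamma, total

# Hypothetical toy parameters for a two-word sentence pair.
t = {("das", "the"): 0.7, ("Haus", "the"): 0.1,
     ("das", "house"): 0.1, ("Haus", "house"): 0.7}
a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.3, (1, 1): 0.7}
gamma, total = forward_backward(["das", "Haus"], ["the", "house"], t, a)
print(round(sum(gamma[0]), 6))   # 1.0 — posteriors per position sum to one
```

The epsilon counts are accumulated analogously from alpha, the transition and emission probabilities, and beta, as in the formulas on the previous slide.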

71 Forward-Backward Training – Pseudo Code

# Accumulation
For each sentence-pair {
    Forward. (Calculate Alphas)
    Backward. (Calculate Betas)
    Calculate Epsilons and Gammas.
    For each source word {
        Increase LexiconCount(f_j|e_i) by Gamma(j,i).
        Increase AlignCount(i|i_prev) by Epsilon(j,i,i_prev).
    }
}
# Update
Normalize LexiconCount to get P(f_j|e_i).
Normalize AlignCount to get P(i|i_prev).

72 Example HMM Training

73 IBM Models
- Phrase-based systems outperform these word-based translation models
- The IBM models can still be used to generate a word alignment, using the Viterbi path
- Problem: no 1-to-many alignments, but many-to-1 alignments can be generated
- Solution: use the alignments from both translation directions and combine them with a heuristic
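One common way to combine the two directional alignments is to start from their intersection (high precision) or their union (high recall); a minimal sketch with hypothetical link sets:

```python
# Combine the two directional Viterbi alignments (sets of (src, tgt) links).
src_to_tgt = {(1, 1), (2, 2), (3, 3)}          # e.g. German -> English
tgt_to_src = {(1, 1), (2, 2), (2, 3), (4, 4)}  # e.g. English -> German, flipped

intersection = src_to_tgt & tgt_to_src   # high precision
union = src_to_tgt | tgt_to_src          # high recall

print(sorted(intersection))  # [(1, 1), (2, 2)]
print(len(union))            # 5
```

Heuristics such as grow-diag-final start from the intersection and selectively add links from the union.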

74 Word alignment

75 Word alignment

76 Word alignment
Evaluation: given some manually aligned data (ref) and automatically aligned data (hyp), links can be:
- Correct, i.e. a link in hyp matches a link in ref: true positive (tp)
- Wrong, i.e. a link in hyp but not in ref: false positive (fp)
- Missing, i.e. a link in ref but not in hyp: false negative (fn)

77 Word alignment – Measures
- Precision: number of correct links / number of links in hyp
  - Problem: fewer links -> higher precision
- Recall: number of correct links / number of links in the reference
  - Problem: all possible links in the alignment -> recall = 1

79 Word alignment – Measures
- F-score: F = 2 * precision * recall / (precision + recall)
- Alignment error rate (AER)
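The measures can be computed directly from the link sets; a small sketch using the tp/fp/fn definitions from the previous slides (example data invented for illustration):

```python
# hyp and ref are sets of alignment links (src, tgt); hypothetical example data.
hyp = {(1, 1), (2, 2), (3, 4)}
ref = {(1, 1), (2, 2), (3, 3), (4, 4)}

tp = len(hyp & ref)                 # correct links
precision = tp / len(hyp)           # correct / links in hyp
recall = tp / len(ref)              # correct / links in ref
f_score = 2 * precision * recall / (precision + recall)

print(precision)          # 2/3
print(recall)             # 0.5
print(round(f_score, 4))  # 0.5714
```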

80 Reference
- Sometimes it is difficult for human annotators to decide on a link
- Differentiate between sure and possible links
- Sets:
  - A: generated links
  - S: sure links (not finding a sure link is an error)
  - P: possible links (putting a link which is not possible is an error)
- Alignment error rate:
    AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)

81 Conclusion
- Word-based translation models
- Word alignment as a hidden variable
- Only 1-to-n alignments are possible

