
Slide 1: Introduction to SMT – Word-based Translation Models
Jan Niehues, Lehrstuhl Prof. Alex Waibel, Institut für Anthropomatik

Slide 2: Overview
- Introduction
- Lexica
- Alignment
- IBM Model 1
- EM Algorithm
- Higher IBM Models
- Word Alignment

Slide 3: Introduction – Notation
- Source side: f denotes a source (foreign) word, I the length of the source sentence, i a position in the source sentence.
- Target side: e denotes a target (English) word, J the length of the English sentence, j a position in the English sentence.

Slide 4: Introduction
- Statistical machine translation: find the most probable translation e for a given source sentence f.
- Use Bayes' rule to decompose the problem.

Slide 5: System Overview

Slide 6: Word-based Translation Models
- Word-based models were introduced by Brown et al. in the early 90s.
- They translate source words directly into target words, modelling word-by-word translation probabilities.
- First statistical approach to machine translation; no longer state of the art.
- Still used to generate the word alignments from which phrase-based models extract phrases.

Slide 7: Lexica
- A lexicon stores the translations of the source words.
- One word can have several translations. Example: Haus – house, building, home, household, shell.
- Some translations are more likely; others are only used in certain circumstances.
- How to decide which one to use in a translation? Use statistics.

Slide 8: Lexica
- Table on the slide: translations of Haus (house, building, home, household, shell) with their counts and probabilities.
- Collect counts of the different translations and approximate a probability distribution from them.
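The count-to-probability step is a maximum-likelihood estimate; the counts below are invented for illustration:

```python
# Hypothetical counts of English translations of "Haus" in a parallel corpus.
counts = {"house": 8000, "building": 1600, "home": 200, "household": 150, "shell": 50}

# Maximum-likelihood estimate: p(e | Haus) = count(e) / total count.
total = sum(counts.values())
probs = {e: c / total for e, c in counts.items()}
# The most frequent translation receives the highest probability.
```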

Slide 9: Alignment
- Mapping between source and target words that are translations of each other.
- Example input: "das Haus ist klein". With a probabilistic lexicon, a possible word-by-word translation is "the house is small".
- This defines an implicit alignment between source and target sentence.

Slide 10: Alignment
- Formalized as a function that maps each target word position to a source word position.
- Written a(j) = i: target position j is aligned to source position i.

Slide 11: Alignment Difficulties
- Word reordering leads to non-monotone alignments.

Slide 12: Alignment Difficulties
- Many-to-one alignments: one word of the input language is translated into several target words.

Slide 13: Alignment Difficulties
- Deletion: some source words have no equivalent in the translation.

Slide 14: Alignment Difficulties
- Insertion: some words of the target sentence have no equivalent in the source sentence.
- A NULL word is added to the source sentence so that the alignment function stays fully defined.

Slide 15: Alignment Remarks
- Many-to-one alignments are possible, but not one-to-many alignments: in these models an alignment is represented by a function.
- This causes problems for language pairs like Chinese-English.
- In phrase-based systems this is solved by looking at the translation process from both directions.

Slide 16: IBM Model 1
- A model that generates the different translations of a sentence with associated probabilities.
- Generative model: break the modelling of a sentence translation into smaller word-to-word translation steps with a coherent story.
- Models the probability of the English sentence e and an alignment a given the foreign sentence f.
- Number of possible alignments: (I+1)^J, since each of the J target positions can align to any of the I source words or to NULL.
- Normalization constant: ε.
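In the standard notation of Brown et al., as presented in Koehn's textbook, the generative story above reads:

```latex
% IBM Model 1: probability of English sentence e and alignment a given f.
% a(j) = 0 denotes alignment to the NULL word.
p(e, a \mid f) \;=\; \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} t\big(e_j \mid f_{a(j)}\big)
```

Each of the J target positions independently picks one of the I+1 source positions (including NULL), which gives the (I+1)^J possible alignments; ε is the normalization constant over target sentence lengths.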

Slide 17: IBM Model 1 Example

Slide 18: IBM Model 1 Training
- Goal: learn the translation probability distributions.
- Problem: incomplete data. Only large amounts of sentence-aligned parallel text are available, and they lack alignment information.
- Consider the alignment a hidden variable.
- Approach: expectation-maximization (EM) algorithm.

Slide 19: EM Algorithm
1. Initialize the model, e.g. with a uniform distribution.
2. Apply the model to the data (expectation step): compute alignment probabilities. At first all alignments are equally likely; later, e.g., "Haus" will most likely be aligned to "house".
3. Learn the model from the data (maximization step): learn translation probabilities from the guessed alignments, using either the best alignment or all alignments weighted by their probability.
4. Iterate steps 2 and 3 until convergence.
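The four steps above can be sketched for IBM Model 1 as follows; this is a minimal implementation in the spirit of Koehn's pseudo-code, where `corpus` is an assumed list of (source word list, target word list) pairs:

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(e|f).

    corpus: list of (f_words, e_words) sentence pairs; add a NULL token
    to the f side if insertions are to be modelled.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # step 1: uniform init
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(e, f)
        total = defaultdict(float)   # expected counts c(f)
        for fs, es in corpus:
            for e in es:             # step 2 (E): expected alignment counts
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        for (e, f), c in count.items():  # step 3 (M): re-estimate t(e|f)
            t[(e, f)] = c / total[f]
    return t                             # step 4: iterated above
```

On the classic two-sentence example (das Haus / the house, das Buch / the book), the co-occurrence of "das" with "the" in both pairs pulls t(the|das) up over the iterations.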

Slide 20: Step 2
- Calculate the probability of an alignment.
- Using dynamic programming, the complexity drops from exponential to quadratic in the sentence length.
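The trick behind the quadratic computation, standard for IBM Model 1, is to swap the sum over alignments with the product over target positions:

```latex
p(e \mid f) \;=\; \sum_{a} p(e, a \mid f)
          \;=\; \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(e_j \mid f_i),
\qquad
p(a \mid e, f) \;=\; \frac{p(e, a \mid f)}{p(e \mid f)}
```

The sum over (I+1)^J alignments collapses into J independent sums of I+1 terms each.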

Slide 21: Step 2
- Put the two equations together to obtain the alignment probability.

Slide 22: Step 3
- Collect counts from every sentence pair (e, f).
- Calculate the translation probabilities from the counts.

Slide 23: Pseudo-Code

Slide 24: Example

Slide 25: Convergence
- Goal: find the model that best fits the data.
- Ideal measure: how well does it translate unseen sentences? At this point there is no test data.
- Instead: how well does the model fit the training data?

Slide 26: Convergence
- Initial model, first iteration, final model: the probability of the training sentences increases with each iteration.

Slide 27: Convergence
- The perplexity of the model is guaranteed to decrease or stay the same at each iteration.
- EM converges to a local minimum; for IBM Model 1 this is the global minimum.
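Perplexity here can be taken, as in Koehn's presentation, from the log-probability that the model assigns to the training sentence pairs:

```latex
\log_2 PP \;=\; -\sum_{s} \log_2 p(e_s \mid f_s)
```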

Slide 28: Higher IBM Models
- IBM Model 1 is very simple: no treatment of reordering or of adding and dropping words.
- Five models of increasing complexity were proposed by Brown et al.:
  - IBM Model 1: lexical translations
  - IBM Model 2: adds absolute positions
  - IBM Model 3: adds a fertility model
  - IBM Model 4: relative alignment positions
  - IBM Model 5: fixes deficiency
  - HMM: lexicon plus relative positions

Slide 29: Higher IBM Models
- The complexity of training grows, but the general principle stays the same.
- During training, first train IBM Model 1, use it to initialize IBM Model 2, and so on.
- All models are implemented in the GIZA++ toolkit, which is used by many groups; a parallel version was developed at CMU.

Slide 30: IBM Model 2
- Problem of IBM Model 1: it assigns the same probability to the two example sentence pairs on the slide, since it ignores word positions.
- IBM Model 2 adds a model of the alignment based on the positions of the input and output words.

Slide 31: IBM Model 2
- Two-step procedure: a lexical translation step followed by an alignment step.

Slide 32: IBM Model 2
- Lexical translation step, as in IBM Model 1.

Slide 33: IBM Model 2
- Alignment step as formulated in IBM Model 1: all alignments are equally likely.

Slide 34: IBM Model 2
- Alignment step as formulated in IBM Model 2: an explicit probability a(i | j, I, J) that target position j aligns to source position i, given the sentence lengths.
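In the standard formulation, IBM Model 2 multiplies the lexical translation probability with this alignment probability:

```latex
p(e, a \mid f) \;=\; \epsilon \prod_{j=1}^{J} t\big(e_j \mid f_{a(j)}\big)\; a\big(a(j) \mid j, I, J\big)
```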

Slide 35: IBM Model 2 Training
- Similar to IBM Model 1 training.
- Initialization: initialize the lexicon with the values from IBM Model 1 training; initialize the alignment probabilities (typically uniformly).

Slide 36: IBM Model 2

Slide 37: IBM Model 2

Slide 38: IBM Model 3
- IBM Models 1 and 2 do not model how many words are generated by an input word.
- Model fertility by a probability distribution n(φ | f).
- IBM Model 3 adds this as an additional step to the model.

Slide 39: IBM Model 3
- Word deletion is modelled by fertility 0.
- Word insertion could be modelled by the fertility of the NULL word, but the fertility should depend on the sentence length.
- Instead, add a NULL-insertion step: after every generated word, insert a NULL token with probability p1 or do not with probability p0.

Slide 40: IBM Model 3

Slide 41: IBM Model 3

Slide 42: IBM Model 3
- Uses a distortion model instead of an alignment model.
- The two models point in different directions: the alignment model predicts a source position for each target position, the distortion model a target position for each source position.
- The same alignment can therefore correspond to different distortions in the two productions.

Slide 43: IBM Model 3 – Fertility Step
- With fertility greater than one, different tableaus can generate the same alignment.
- All tableaus generating the same alignment have the same probability.
- Number of different tableaus generating the same alignment: the product of the factorials of the fertilities.
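In the standard Model 3 formulation, with φ_i the fertility of source word f_i, the fertility step and the tableau count above read:

```latex
p(\text{fertilities}) \;=\; \prod_{i=1}^{I} n(\phi_i \mid f_i),
\qquad
\#\{\text{tableaus per alignment}\} \;=\; \prod_{i=1}^{I} \phi_i!
```

The factorial counts the orderings in which the φ_i output words of a source word can be generated; since all of them yield the same alignment with the same probability, the factor multiplies into the alignment probability.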

Slide 44: IBM Model 3 – Fertility Step

Slide 45: IBM Model 3 – NULL Word Insertion
- The number of generated NULL words depends on the number of output words generated from real input words.
- After each generated word, a NULL word may be inserted.
- With s words generated from foreign input words, s is also the maximal number of generated NULL words.
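With s output words generated from real input words, insertion probability p1 and non-insertion probability p0 = 1 − p1, the standard Model 3 NULL-insertion probability is binomial:

```latex
p(\phi_0 \mid s) \;=\; \binom{s}{\phi_0}\, p_1^{\phi_0}\, p_0^{\,s-\phi_0}
```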

Slide 46: IBM Model 3 – NULL Word Insertion

Slide 47: IBM Model 3 – Combined Model
- Combine the fertility, lexical translation and distortion probabilities.

Slide 48: IBM Model 3 Training
- Problem: an exponential number of alignments.
- In IBM Models 1 and 2, dynamic programming handles this; in IBM Model 3 it is no longer possible.
- Instead, sample from the space of possible alignments: find the most probable alignments, add similar neighboring alignments, and use only these alignments for normalization.

Slide 49: IBM Model 3 Training
- Finding the most probable alignment: testing all possible alignments is too complex (there are exponentially many).
- Use a hill-climbing algorithm: evaluate all points in the neighborhood, move to the highest one, iterate.
- Problem: this may end in a local maximum, so start from various locations.

Slide 50: IBM Model 3 Training
- Initialization of the hill climbing: the various starting locations are obtained by pegging.

Slide 51: Pegging
- For all source indices i and all target indices j: fix the alignment a(j) = i, find the most probable alignment under this condition, and add it to the set of starting points.

Slide 52: Hill Climbing
- Find the most probable alignment in the neighborhood of the current one.
- Neighborhood:
  - Move: two alignments a1 and a2 differ by a move if they differ only in the alignment of one word j.
  - Swap: two alignments a1 and a2 differ by a swap if they agree in the alignments of all words except two, whose alignment points are switched.
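The move/swap neighborhood can be sketched as follows; the alignment is stored as a list with a[j] = i, 0 denoting NULL, and the function name is illustrative:

```python
def neighbors(a, I):
    """Generate the hill-climbing neighborhood of alignment a.

    a: list where a[j] = i aligns target position j to source position i
       (0 = NULL); I: source sentence length, so i ranges over 0..I.
    """
    J = len(a)
    for j in range(J):                 # moves: change a single link a[j]
        for i in range(I + 1):
            if i != a[j]:
                b = a.copy()
                b[j] = i
                yield b
    for j1 in range(J):                # swaps: exchange two links
        for j2 in range(j1 + 1, J):
            if a[j1] != a[j2]:
                b = a.copy()
                b[j1], b[j2] = a[j2], a[j1]
                yield b
```

Hill climbing then scores each neighbor under the model, moves to the best one, and repeats until no neighbor improves.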

Slide 53: IBM Model 3 Training Summary
- Sample the alignments (hill climbing, pegging).
- Collect counts.
- Estimate probabilities.

Slide 54: IBM Model 4
- Distortion model: IBM Model 3 uses absolute positions.
- Long sentences are relatively rare, so the distortion probabilities cannot be approximated well; use relative positions instead.
- Problems to handle: added words, dropped words, one-to-many alignments.

Slide 55: IBM Model 4
- Cept: each input word f_j that is aligned to at least one output word forms a cept.

Slide 56: IBM Model 4
- Center of a cept: the ceiling of the average of its output word positions.

Slide 57: IBM Model 4
- Relative distortion is defined for each output word:
  1. Target words generated by the NULL token: uniform distribution.
  2. First word of a cept: word position j relative to the center of the preceding cept.
  3. Subsequent words of a cept: word position relative to the position of the previous word in the cept.

Slide 58: IBM Model 4

Slide 59: IBM Model 4
- Word classes allow richer conditioning of the distortion model.
- Some words are reordered more often, e.g. adjectives when translating from English to French.
- There are not sufficient statistics to estimate probabilities per word, so words are grouped into word classes.
- Possible classes: POS tags; originally, automatically clustered word classes were used.

Slide 60: IBM Model 5
- Deficiency: according to IBM Models 3 and 4, multiple output words can be placed at the same position, so impossible alignments receive positive probability.
- IBM Model 5 prevents this: no more multiple tableaus for the same alignment; words are placed only into vacant word positions, tracking for each position how many positions are still untranslated.
- It brings no improvement in alignment quality and is not used in most state-of-the-art systems.

Slide 61: HMM Alignment Model
- HMMs were successfully used in speech recognition; the HMM alignment model was introduced by Vogel et al.
- Idea: use relative instead of absolute positions; entire word groups (phrases) are moved together with respect to the source position.
- In the GIZA++ toolkit, the HMM model replaces IBM Model 2.

Slide 62: HMM Alignment Model

Slide 63: HMM Alignment Model
- First-order model: the aligned target position depends on the previous aligned target position (captures movement of entire phrases).
- The alignment probability can be computed exactly or with the maximum approximation.
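In Vogel et al.'s formulation (note that here, as in the pseudo-code on the following slides, j runs over the emitted source words and a_j is the aligned target position), the model is:

```latex
p(f, a \mid e) \;=\; \prod_{j=1}^{J} p\big(a_j \mid a_{j-1}, I\big)\; p\big(f_j \mid e_{a_j}\big),
\qquad
p(f \mid e) \;=\; \sum_{a} p(f, a \mid e) \;\approx\; \max_{a} p(f, a \mid e)
```

The right-hand approximation (keeping only the best path) is the maximum approximation used in Viterbi training.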

Slide 64: Viterbi Training

# Accumulation (over corpus)
# find the Viterbi path
For each sentence pair (f, e)
    For each source position j
        For each target position i
            P_best = 0; t = p(f_j | e_i)
            For each previous target position i'
                P_prev = P(j-1, i')
                a = p(i | i', I, J)
                P_new = P_prev * t * a
                if (P_new > P_best)
                    P_best = P_new
                    BackPointer(j, i) = i'
            P(j, i) = P_best
    # update counts along the best path
    i = argmax_i' P(J, i')
    For each j from J downto 1
        i_prev = BackPointer(j, i)
        Count(f_j, e_i)++
        Count(i, i_prev, I, J)++
        i = i_prev
# renormalize the counts into probabilities

Slide 65: HMM Forward-Backward Training
- Gamma: the probability of emitting f_j from state i in sentence s, summed over all paths through (j, i).

Slide 66: HMM Forward-Backward Training
- Epsilon: the probability of transitioning from state i' into state i while emitting f_j, summed over all paths through (j-1, i') and (j, i).

Slide 67: Forward Probabilities
- Defined as the probability of the prefix f_1 .. f_j with state i at position j; computed by a recursion from an initial condition.

Slide 68: Backward Probabilities
- Defined as the probability of the suffix f_{j+1} .. f_J given state i at position j; computed by a backward recursion from an initial condition.
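The standard HMM recursions, with α_j(i) the forward and β_j(i) the backward probability and π(i) the initial alignment distribution, are:

```latex
\alpha_1(i) = \pi(i)\, p(f_1 \mid e_i), \qquad
\alpha_j(i) = \Big[\sum_{i'} \alpha_{j-1}(i')\, p(i \mid i', I)\Big]\, p(f_j \mid e_i)
```

```latex
\beta_J(i) = 1, \qquad
\beta_j(i) = \sum_{i'} p(i' \mid i, I)\, p(f_{j+1} \mid e_{i'})\, \beta_{j+1}(i')
```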

Slide 69: Forward-Backward
- Calculate gamma and epsilon from alpha and beta.
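In terms of α and β these are the standard forward-backward identities:

```latex
\gamma_j(i) = \frac{\alpha_j(i)\,\beta_j(i)}{\sum_{i'} \alpha_j(i')\,\beta_j(i')},
\qquad
\epsilon_j(i', i) = \frac{\alpha_{j-1}(i')\; p(i \mid i', I)\; p(f_j \mid e_i)\; \beta_j(i)}{\sum_{i''} \alpha_J(i'')}
```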

Slide 70: Parameter Re-Estimation
- Lexicon probabilities and alignment probabilities are re-estimated from the expected counts.

Slide 71: Forward-Backward Training Pseudo-Code

# Accumulation
For each sentence pair {
    Forward pass (calculate alphas)
    Backward pass (calculate betas)
    Calculate epsilons and gammas
    For each source position j and target position i {
        Increase LexiconCount(f_j | e_i) by Gamma(j, i)
        Increase AlignCount(i | i') by Epsilon(j, i', i)
    }
}
# Update
Normalize LexiconCount to get p(f_j | e_i)
Normalize AlignCount to get p(i | i')

Slide 72: Example HMM Training

Slide 73: IBM Models
- Phrase-based systems outperform these word-based translation models.
- The IBM models can still be used to generate a word alignment from the Viterbi path.
- Problem: one-to-many alignments cannot be produced, only many-to-one.
- Solution: use the alignments from both translation directions and combine them with a heuristic.

Slide 74: Word Alignment

Slide 75: Word Alignment

Slide 76: Word Alignment Evaluation
- Given manually aligned data (ref) and automatically aligned data (hyp), each link can be:
  - Correct, i.e. a link in hyp matches a link in ref: true positive (tp)
  - Wrong, i.e. a link in hyp but not in ref: false positive (fp)
  - Missing, i.e. a link in ref but not in hyp: false negative (fn)

Slide 77: Word Alignment Measures
- Precision: number of correct links / number of links in hyp. Problem: producing fewer links improves precision.
- Recall: number of correct links / number of links in the reference. Problem: producing all possible links yields recall = 1.

Slide 79: Word Alignment Measures
- F-score: harmonic mean of precision and recall.
- Alignment error rate (AER).
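These measures can be sketched as follows, with links represented as sets of (source position, target position) pairs; when the reference contains only sure links, AER reduces to 1 minus the balanced F-score:

```python
def alignment_scores(hyp, ref):
    """Precision, recall, F-score and AER for word alignments.

    hyp, ref: sets of (source_pos, target_pos) links; ref is assumed to
    contain only sure links, in which case AER = 1 - F-score.
    """
    tp = len(hyp & ref)                      # correct links
    precision = tp / len(hyp)                # correct / links in hyp
    recall = tp / len(ref)                   # correct / links in ref
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score, 1.0 - f_score
```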

Slide 80: Reference Alignments
- Sometimes it is difficult for human annotators to decide, so they differentiate between sure and possible links.
- Sets: A: generated links; S: sure links (not finding a sure link is an error); P: possible links (producing a link that is not possible is an error).
- The alignment error rate is computed from these sets.
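With sure links S and possible links P (S ⊆ P), the alignment error rate of Och and Ney is:

```latex
AER(A; S, P) \;=\; 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
```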

Slide 81: Conclusion
- Word-based translation models.
- Word alignment as a hidden variable.
- Only 1-to-n alignments are possible.
