
1 LING 575: Seminar on statistical machine translation. Spring 2010, Lecture 4. Kristina Toutanova (MSR & UW). With slides borrowed from Philipp Koehn and Hwee Tou Ng.

2 Overview
• Assignments: some notes about HW1; final projects
• Decoding for machine translation: brief review of search in AI; search space; costs; search for phrase-based translation; representing hypotheses; pruning; multi-stack decoding

3 Notes about Homework 1
• Updated version with clarifications on how to run the Moses experiments
• Do we need an extension of the due date?

4 Final projects  Proposals due 4/27  Updates due 5/11  Final reports due June 1  Presentation due June 1

5 Final project scope
• About four times the work of one homework assignment, per person
• Project proposal: a short (one or two paragraph) description of what the problem and general approach are, who is in the group and who will do what, and what data you are using
• Project update: a one-page description of what you have done so far, with some preliminary results if possible
• Final report: a four- to eight-page description of the problem and results
• Final presentation: depending on the number of groups, an x-minute presentation

6 What this lecture is about
A language model and a translation model define scores for candidate translations. A decoder searches for the best translation using a search algorithm, finding the (approximately) highest-scoring candidate.

7 A review from introductory AI classes Search

8 Map of Romania with step costs in km Slide copied from Hwee Tou Ng's AI course slides

9 Search problem formulation
A problem is defined by four items:
1. Initial state, e.g., "at Arad"
2. Actions or successor function S(x) = set of action–state pairs, e.g., S(Arad) = { ⟨Arad → Zerind, Zerind⟩, … }
3. Goal test, e.g., x = "at Bucharest"
4. Path cost (additive), e.g., sum of distances, number of actions executed, etc.; c(x,a,y) is the step cost, assumed to be ≥ 0 normally in AI search but not guaranteed in MT
A solution is a sequence of actions leading from the initial state to a goal state. We need to find the lowest-cost solution. For MT, we can define cost as the negative score (max score = min cost).
Slide copied from Hwee Tou Ng's AI course slides
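As a concrete illustration, here is a minimal Python sketch of this four-part formulation for the Romania route-finding example; the distance dictionary covers only a few roads from the map, and the function names are illustrative rather than taken from the slides.

```python
# A minimal sketch of the four-part problem formulation, using a small piece
# of the Romania map (standard textbook distances in km).
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Bucharest": 211},
    "Rimnicu Vilcea": {"Pitesti": 97},
    "Pitesti": {"Bucharest": 101},
}

initial_state = "Arad"                       # 1. initial state

def successors(state):
    """2. Successor function S(x): (action, next state) pairs."""
    return [(f"go to {city}", city) for city in ROADS.get(state, {})]

def is_goal(state):
    """3. Goal test."""
    return state == "Bucharest"

def step_cost(state, action, next_state):
    """4. Step cost c(x, a, y): distance in km, always >= 0 here."""
    return ROADS[state][next_state]
```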

10 How to search: tree search algorithms
Basic idea: offline, simulated exploration of the state space, generating successors of already-explored states (a.k.a. expanding states) and building up a tree of explored states.

11 Best-first search
• Idea: use an evaluation function f(n) for each node, an estimate of "desirability"; expand the most desirable unexpanded node
• Implementation: order the nodes in the fringe in decreasing order of desirability
• Special cases: greedy best-first search, A* search
Slide copied from Hwee Tou Ng's AI course slides

12 Romania with step costs in km Slide copied from Hwee Tou Ng's AI course slides

13 A* best-first search
• Evaluation function f(n) = g(n) + h(n)
• g(n) = cost of the path from the start to node n
• h(n) = heuristic estimate of the cost from n to the closest goal, e.g., h_SLD(n) = straight-line distance from n to Bucharest
• A* uses an admissible heuristic: one that never overestimates the true cost from n to a goal. This property makes A* optimal.
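A compact Python sketch of A* over the problem interface sketched earlier (initial state, successors, goal test, step cost); this is one standard priority-queue implementation, not the exact pseudocode from the course slides.

```python
import heapq

def a_star(initial_state, successors, is_goal, step_cost, h):
    """A* search sketch: always expand the unexpanded node with the lowest
    f(n) = g(n) + h(n). With an admissible h, A* returns a lowest-cost solution."""
    fringe = [(h(initial_state), 0.0, initial_state, [])]   # (f, g, state, path)
    best_g = {initial_state: 0.0}
    while fringe:
        f, g, state, path = heapq.heappop(fringe)
        if g > best_g.get(state, float("inf")):
            continue  # stale queue entry; a cheaper path to this state was found later
        if is_goal(state):
            return path, g
        for action, nxt in successors(state):
            g2 = g + step_cost(state, action, nxt)
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(fringe, (g2 + h(nxt), g2, nxt, path + [action]))
    return None, float("inf")
```

Setting h(n) = 0 reduces this to uniform-cost search; prioritizing by h(n) alone gives greedy best-first search.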

14–19 A* search example (figure-only slides stepping through the expansion; images not included in the transcript)

20 Search in phrase-based translation

21 Basic phrase-based translation
• Decide the target sentence, segmentation, and alignment, given the source sentence
• The source sentence is segmented into source phrases (the segmentation is not linguistically motivated)
• Each source phrase is translated into a target phrase, independently of the other source phrases and their translations
• The resulting target phrases are re-ordered to form the output

22 Translation as sequence of actions: decoding process  Build translation from left to right  Select foreign sequence of words to be translated

23 Decoding process  Build translation from left to right  Select foreign sequence of words to be translated  Select English phrasal translation from phrase table  Append English words to the end of the partial translation

24 Decoding process  Build translation from left to right  Select foreign sequence of words to be translated  Select English phrasal translation from phrase table  Append English words to the end of the partial translation  Mark foreign words as translated, so we know not to select them again later on

25 Decoding process  One-to-many translation

26 Decoding process  Many-to-one translation

27 Decoding process  Many-to-one translation

28 Decoding process  Reordering

29 Decoding process  Reordering  Translation finished (reached a goal)

30 Phrase translation options for source sentence  Many different phrase-translation options available for a sentence  Can look them all up before starting decoding

31 Decoding organization
• Each sequence of actions we explore defines a partial translation hypothesis
• Translation hypotheses are the analogue of search nodes in general search
• The data we keep in each translation hypothesis should be sufficient to:
  - tell us which actions are applicable (we need to know which foreign words have been translated)
  - tell us the cost so far (in the examples we will use probabilities instead, multiplying probabilities rather than adding costs)
  - allow us to compute the cost of each possible next action
  - allow us to read off the target translation
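A minimal sketch of such a hypothesis record in Python; the field names (coverage, last_two_english, last_src_end, logprob, parent) are illustrative, but they correspond one-to-one to the requirements listed above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Hypothesis:
    """One partial translation (a search node)."""
    coverage: frozenset                     # indices of foreign words already translated
    last_two_english: Tuple[str, str]       # LM state for a trigram language model
    last_src_end: int                       # end position of the last source phrase (for distortion)
    logprob: float                          # score so far (log probability; cost = -logprob)
    phrase: Tuple[str, ...] = ()            # English words added by the most recent action
    parent: Optional["Hypothesis"] = None   # back-pointer for reading off the translation

def read_off_translation(hyp):
    """Follow parent links back to the initial hypothesis and collect the output."""
    words = []
    while hyp is not None:
        words[:0] = hyp.phrase
        hyp = hyp.parent
    return " ".join(words)
```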

32 Search hypotheses and initial hypothesis
• Start with an initial empty hypothesis
  - e: no English words have been output
  - f: no foreign words have been translated
  - p: the probability so far is 1 (we will multiply in the probability of each next action)
  - also tracked (not shown here): the end position of the previous source phrase, needed for the distortion model, and the previous two English words, needed for the language model

33 Hypothesis expansion
• Pick a translation option
• Create the next hypothesis using this action
  - e: Mary is in the output
  - f: Maria has been translated
  - p: the probability of the partial translation so far

34 Computing the probability of actions
• The probability of an action depends on the models used
• Translation models: phrasal probabilities in both directions, lexical weighting probabilities, word count, phrase count
• Reordering model probability: can be computed given the current phrase pair and the source positions of the current and previous phrases
• Language model probability: can be computed given the English side of the current phrase pair and the last 2 previous English words (for a trigram LM)
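A hedged sketch of how one expansion could be scored from these pieces, assuming hypothetical model interfaces tm.logprob(...) and lm.logprob(...) and a simple distance-based reordering penalty; a real decoder such as Moses combines weighted feature scores in a log-linear model rather than the uniform sum shown here.

```python
def expansion_logprob(hyp, option, tm, lm, distortion_penalty=-1.0):
    """Log-probability added by one action (choosing one phrase pair).
    `option` is a hypothetical translation option with src_phrase, tgt_phrase,
    and src_start; `hyp` is the Hypothesis being expanded."""
    # Translation-model features: phrasal and lexical probabilities, counts, etc.
    score = tm.logprob(option.src_phrase, option.tgt_phrase)

    # Distance-based reordering: penalize the jump from the end of the previous
    # source phrase to the start of the current one.
    jump = abs(option.src_start - (hyp.last_src_end + 1))
    score += distortion_penalty * jump

    # Trigram language model over the newly appended English words.
    context = hyp.last_two_english
    for word in option.tgt_phrase:
        score += lm.logprob(word, context)
        context = (context[1], word)
    return score
```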

35 Hypothesis expansion  Add another hypothesis using another translation option

36 Hypothesis expansion  Further expansion

37 Hypothesis expansion  Until all foreign words are translated  Trace the parent links back to the beginning to collect full translation

38 Hypothesis expansion  The search space explodes: it grows exponentially with sentence length

39 Explosion of search space
• The search graph grows exponentially with sentence length
• Due to the number of possible re-orderings, the problem is NP-complete [Knight, 1999]
• We need to reduce the search space
• We can recombine equivalent hypotheses (lossless, risk-free pruning)
• We can apply other kinds of pruning: histogram pruning, threshold pruning
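The two lossy pruning strategies might look like the following sketch; h.score is assumed to hold the combined score used for pruning (log-probability so far plus estimated future cost, as introduced later in the lecture), and the default limits are illustrative.

```python
import math

def histogram_prune(stack, max_size=100):
    """Histogram pruning: keep at most max_size hypotheses per stack,
    ranked by combined score."""
    return sorted(stack, key=lambda h: h.score, reverse=True)[:max_size]

def threshold_prune(stack, alpha=0.01):
    """Threshold (beam) pruning: drop hypotheses whose combined score is worse
    than the best hypothesis in the stack by more than log(alpha)."""
    if not stack:
        return stack
    best = max(h.score for h in stack)
    return [h for h in stack if h.score >= best + math.log(alpha)]
```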

40 Hypothesis recombination  Example: different paths to the same English output in partial hypotheses  Correspond to different phrasal segmentation

41 Hypothesis recombination  Combine equivalent hypotheses  Drop the weaker hypothesis  The weaker path is still available for lattice generation

42 Hypothesis recombination
• The merged hypotheses do not need to match completely
• We just need them to have the same best path to completion: the same applicable future expansions with the same scores, i.e., the same last 2 English words, coverage vector, and source position of the last phrase
• Since any path that goes through the worse hypothesis can be changed to use the path to the better hypothesis and then the same path to the end, we are not losing anything
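In code, recombination amounts to grouping hypotheses by the state that determines their future; a sketch using the illustrative Hypothesis fields from the earlier data-structure sketch:

```python
def recombination_key(hyp):
    """Two hypotheses can be recombined when they share the same future:
    same coverage vector, same last two English words (trigram LM state),
    and same end position of the last source phrase (distortion state)."""
    return (hyp.coverage, hyp.last_two_english, hyp.last_src_end)

def recombine(hypotheses):
    """Keep only the best-scoring hypothesis per recombination key; the weaker
    hypothesis could be kept as an alternative arc for lattice generation."""
    best = {}
    for h in hypotheses:
        key = recombination_key(h)
        if key not in best or h.logprob > best[key].logprob:
            best[key] = h
    return list(best.values())
```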

43 Pruning

44 Hypothesis stacks  The i-th stack contains hypotheses for which i source words have been translated  Process stacks in order  Expand all hypotheses from a stack  Place expanded hypotheses on corresponding stacks
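A self-contained sketch of this stack-organized search loop; expansions(hyp) is a hypothetical function yielding all successor hypotheses, and for brevity the sketch prunes on the log-probability so far rather than the combined score (with future cost) that a real decoder would use.

```python
def stack_decode(src_len, initial_hyp, expansions, max_stack_size=100):
    """Multi-stack decoding sketch: stacks[i] holds hypotheses that have
    translated i source words. Process the stacks in order, expand every
    surviving hypothesis, and place each new hypothesis on the stack that
    matches its coverage size."""
    stacks = [[] for _ in range(src_len + 1)]
    stacks[0].append(initial_hyp)
    for i in range(src_len):
        # Histogram pruning: keep only the best max_stack_size hypotheses.
        stacks[i] = sorted(stacks[i], key=lambda h: h.logprob, reverse=True)[:max_stack_size]
        for hyp in stacks[i]:
            for new_hyp in expansions(hyp):
                stacks[len(new_hyp.coverage)].append(new_hyp)
    # Best complete hypothesis: all source words covered.
    return max(stacks[src_len], key=lambda h: h.logprob, default=None)
```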

45 How to compare hypotheses  So far we only have the probability (cost) of each hypothesis  Comparing hypotheses that have translated the same number of words makes these costs more comparable  Can we do better than comparing based on cost so far?

46 Comparing hypotheses  Comparing two hypotheses translating the same number of words  The one translating an easier part of the sentence is preferred  Can do better by considering future cost of translating the rest of the source words

47 Estimating future cost
• The closer we get to the correct future cost, the better for our search
• But computing the future cost should not take too long
• A future cost estimate that is less than or equal to the true cost (optimistic) guarantees optimality in A* search
• This has usually been too slow in practice, so we do not use A* with admissible heuristics

48 Estimating future cost
• The future cost will be the sum of the costs of the actions (translations) we will take in the future
• We can estimate the cost of each translation option for the sentence
  - Translation probabilities: context independent
  - Language model: context dependent, so we approximate, e.g., P(to)P(the|to)
  - Reordering model cost: ignored; it cannot be estimated without context
• Probability for an option = LM × TM

49 Future cost estimation  Find the cost of the cheapest translation for a given source phrase (highest probability)

50 Future cost estimation  For each span of the source sentence (each contiguous sequence of words) compute the cost of the cheapest combination of translation options  Can be done efficiently using dynamic programming
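A sketch of that dynamic program, together with the span-summing step described on the next slide; option_logprob(i, j) is a hypothetical function returning the best single-option log-probability (LM times TM, in log space) for source span i..j-1, or minus infinity if no option covers it.

```python
def future_cost_table(src_len, option_logprob):
    """Dynamic program over spans: best[i][j] is the cheapest (highest log-prob)
    way to cover source words i..j-1, either with a single translation option
    or by splitting the span into two cheaper sub-spans."""
    NEG_INF = float("-inf")
    best = [[NEG_INF] * (src_len + 1) for _ in range(src_len + 1)]
    for length in range(1, src_len + 1):
        for i in range(src_len - length + 1):
            j = i + length
            best[i][j] = option_logprob(i, j)
            for k in range(i + 1, j):
                if best[i][k] + best[k][j] > best[i][j]:
                    best[i][j] = best[i][k] + best[k][j]
    return best

def future_cost(hyp, best, src_len):
    """Sum (in log space) the cheapest cost of every contiguous span of
    still-untranslated source words in the hypothesis."""
    total, i = 0.0, 0
    while i < src_len:
        if i in hyp.coverage:
            i += 1
            continue
        j = i
        while j < src_len and j not in hyp.coverage:
            j += 1
        total += best[i][j]
        i = j
    return total
```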

51 Estimation of combined score of hypotheses  Add up the costs of contiguous spans in the un-translated sequence of words to compute future cost  Add future cost to cost so far to compute combined score used for pruning

52 Limits on reordering
• Limits on reordering can reduce the search space dramatically
• Monotone decoding: target phrases follow the same order as source phrases
• Reordering limit n (used in Moses): forbid jumps with distance greater than n; results in polynomial-time inference
• In addition to the speed-up, reordering limits often lead to improved translations, because the reordering models are weak
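A tiny sketch of how a reordering-limit check could be applied before expanding a hypothesis; the jump definition here (distance from the end of the previous source phrase to the start of the next) is one common choice and only approximates the exact Moses distortion-limit semantics.

```python
def within_reordering_limit(hyp, option, limit=6):
    """Return True if expanding `hyp` with `option` respects the reordering
    limit. With this jump definition, limit=0 forces monotone decoding."""
    jump = abs(option.src_start - (hyp.last_src_end + 1))
    return jump <= limit
```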

53 Word lattices  Can easily extract a word lattice from search graph  Can extract n-best translation hypotheses  n-best lists are used for discriminative re-ranking and training of log-linear model parameters

54 Summary
• Search in phrase-based translation
• Decomposing translation into a sequence of actions
• Building partial translation hypotheses from left to right
• Computing cost by adding up the costs of actions
• Recombining hypotheses for lossless memory and time savings
• Pruning based on estimated score for risky (lossy) search-space reduction
• Organizing hypotheses in comparable stacks
• Estimating future cost

55 Readings and next time  For this week  Chapter 6, SMT Decoding  Optional: Germann et al 03 (nice paper comparing optimal and other decoders for IBM model 4)  For next week  Starting on tree-based models

