LING 575: Seminar on Statistical Machine Translation. Spring 2010, Lecture 4. Kristina Toutanova, MSR & UW. With slides borrowed from Philipp Koehn and Hwee Tou Ng.

Overview
- Assignments
  - Some notes about HW1
  - Final projects
- Decoding for machine translation
  - Brief review of search in AI: search space, costs
  - Search for phrase-based translation
  - Representing hypotheses
  - Pruning
  - Multi-stack decoding

Notes about Homework 1
- Updated version with clarifications on how to run the Moses experiments
- Do we need an extension of the due date?

Final projects
- Proposals due 4/27
- Updates due 5/11
- Final reports due June 1
- Presentations due June 1

Final project scope
- About four times the work of one homework assignment, per person
- Project proposal: a short (one or two paragraph) description of
  - what the problem and general approach are
  - who is in the group and who will do what
  - what data you are using
- Project update: a one-page description of what you have done so far, with some preliminary results if possible
- Final report: a four- to eight-page description of the problem and results
- Final presentation: an x-minute presentation, depending on the number of groups

What this lecture is about
- A language model and a translation model define scores for candidate translations
- A decoder finds the (approximately) highest-scoring translation
- A decoder searches for the best translation using a search algorithm

Search: a review from introductory AI classes

Map of Romania with step costs in km (slide copied from Hwee Tou Ng's AI course slides)

Search problem formulation
A problem is defined by four items:
1. Initial state, e.g., "at Arad"
2. Actions, or a successor function S(x) = set of action–state pairs, e.g., S(Arad) = {⟨Arad → Zerind, Zerind⟩, …}
3. Goal test, e.g., x = "at Bucharest"
4. Path cost (additive), e.g., sum of distances, number of actions executed, etc.; c(x,a,y) is the step cost, normally assumed to be ≥ 0 in AI search, but this is not guaranteed in MT
- A solution is a sequence of actions leading from the initial state to a goal state
- We need to find the lowest-cost solution
- For MT, we can define cost as the negative score (max score = min cost)
(Slide copied from Hwee Tou Ng's AI course slides)
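
To make the four items concrete, here is a minimal Python sketch for the Romania route-finding example. The city names and road distances are the standard textbook values; the function names are illustrative and not code from the lecture or from Moses.

```python
# A minimal sketch of the four components of a search problem, for the
# Romania route-finding example (distances from the AIMA textbook).

ROAD_MAP = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97, "Craiova": 146},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Pitesti": {"Rimnicu Vilcea": 97, "Craiova": 138, "Bucharest": 101},
}

initial_state = "Arad"                      # 1. initial state

def successors(state):                      # 2. successor function S(x)
    """Set of (action, resulting state) pairs, here 'drive to <city>'."""
    return [("drive to " + city, city) for city in ROAD_MAP.get(state, {})]

def goal_test(state):                       # 3. goal test
    return state == "Bucharest"

def step_cost(state, action, next_state):   # 4. step cost c(x, a, y), >= 0 here
    """Road distance in km; the path cost is the sum of step costs."""
    return ROAD_MAP[state][next_state]
```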

How to search: tree search algorithms
- Basic idea: offline, simulated exploration of the state space by generating successors of already-explored states (a.k.a. expanding states), building up a tree of explored states

Best-first search
- Idea: use an evaluation function f(n) for each node, an estimate of "desirability"
- Expand the most desirable unexpanded node
- Implementation: order the nodes in the fringe in decreasing order of desirability
- Special cases: greedy best-first search, A* search
(Slide copied from Hwee Tou Ng's AI course slides)
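
A compact, self-contained sketch of best-first search over the Romania example, with the evaluation function f passed in, so that the two special cases from this slide and the next, greedy best-first (f = h) and A* (f = g + h), are just different choices of f. The graph edges and straight-line distances are the textbook values; the function and variable names are illustrative, not the lecture's code.

```python
import heapq

GRAPH = {  # road distances in km (a subset of the Romania map)
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
    "Rimnicu Vilcea": {"Pitesti": 97},
    "Fagaras": {"Bucharest": 211},
    "Pitesti": {"Bucharest": 101},
}
H_SLD = {  # straight-line distance to Bucharest, an admissible heuristic
    "Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253,
    "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}

def best_first_search(start, goal, f):
    """Expand the most desirable node first; returns (path cost, path)."""
    fringe = [(f(0, start), 0, start, [start])]          # priority queue ordered by f
    cheapest = {}                                        # best g seen per state
    while fringe:
        _, g, state, path = heapq.heappop(fringe)
        if state == goal:
            return g, path
        if state in cheapest and cheapest[state] <= g:
            continue                                     # reached more cheaply before
        cheapest[state] = g
        for nxt, dist in GRAPH.get(state, {}).items():
            heapq.heappush(fringe, (f(g + dist, nxt), g + dist, nxt, path + [nxt]))
    return None

greedy = lambda g, n: H_SLD[n]        # greedy best-first search: f(n) = h(n)
a_star = lambda g, n: g + H_SLD[n]    # A* search: f(n) = g(n) + h(n)

print(best_first_search("Arad", "Bucharest", a_star))
# (418, ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'])
```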

Romania with step costs in km (slide copied from Hwee Tou Ng's AI course slides)

A* best-first search
- Evaluation function f(n) = g(n) + h(n)
  - g(n) = cost of the path to node n
  - h(n) = estimated cost from n to the closest goal (the heuristic)
  - e.g., h_SLD(n) = straight-line distance from n to Bucharest
- A* uses an admissible heuristic: one that does not overestimate the cost of the best path to a goal state
- This property makes A* optimal

A* search example

Search in phrase-based translation

Basic phrase-based translation
- Given the source sentence, decide on the target sentence, the segmentation, and the alignment
- The source sentence is segmented into source phrases
  - The segmentation is not linguistically motivated
- Each source phrase is translated into a target phrase
  - Independently of other source phrases and their translations
- The resulting target phrases are reordered to form the output

Translation as a sequence of actions: the decoding process
- Build the translation from left to right
- Select a sequence of foreign words to be translated

Decoding process
- Build the translation from left to right
- Select a sequence of foreign words to be translated
- Select an English phrasal translation from the phrase table
- Append the English words to the end of the partial translation

Decoding process
- Build the translation from left to right
- Select a sequence of foreign words to be translated
- Select an English phrasal translation from the phrase table
- Append the English words to the end of the partial translation
- Mark the foreign words as translated, so we know not to select them again later on

Decoding process: one-to-many translation

Decoding process: many-to-one translation

Decoding process: many-to-one translation

Decoding process: reordering

Decoding process: reordering; translation finished (a goal state is reached)

Phrase translation options for the source sentence
- Many different phrase-translation options are available for a sentence
- We can look them all up before decoding starts
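
A small sketch of this lookup step: enumerate every contiguous source span and collect the matching phrase-table entries. The toy phrase table, the example sentence, and the default maximum phrase length are invented for illustration.

```python
# Collect all applicable phrase-translation options per source span, once,
# before decoding starts. The phrase table here is a toy example.

TOY_PHRASE_TABLE = {             # source phrase -> [(target phrase, probability)]
    ("Maria",): [("Mary", 0.9), ("Maria", 0.1)],
    ("no",): [("not", 0.6), ("did not", 0.4)],
    ("dio",): [("gave", 0.8)],
    ("dio", "una", "bofetada"): [("slapped", 0.7)],
}

def collect_options(source_words, phrase_table, max_phrase_len=7):
    """Return {(i, j): [(target, prob), ...]} for every covered span [i, j)."""
    options = {}
    for i in range(len(source_words)):
        for j in range(i + 1, min(i + max_phrase_len, len(source_words)) + 1):
            phrase = tuple(source_words[i:j])
            if phrase in phrase_table:
                options[(i, j)] = phrase_table[phrase]
    return options

print(collect_options("Maria no dio una bofetada".split(), TOY_PHRASE_TABLE))
```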

Decoding organization
- Each sequence of actions we explore defines a partial translation hypothesis
- Translation hypotheses are the analogue of search nodes in general search
- The data we keep in each translation hypothesis should be sufficient to:
  - tell us which actions are applicable (we need to know which foreign words have been translated)
  - tell us what the cost so far is (in the examples we will use probabilities instead, multiplying probabilities rather than adding costs)
  - allow us to compute the cost of each possible next action
  - allow us to read off the target translation
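
Below is a sketch, not Moses' actual data structure, of a hypothesis record carrying exactly the information listed above: a coverage set, the language-model context, the end position of the last source phrase, the score so far, and a parent back-pointer. The back-pointer also lets us read off the translation at the end, as a later slide notes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Hypothesis:
    coverage: frozenset                 # which source positions are translated
    last_two_words: Tuple[str, str]     # LM context (for a trigram LM)
    last_src_end: int                   # end of previous source phrase (distortion)
    logprob: float                      # score so far (log probability)
    target_phrase: str = ""             # English words added by the last action
    parent: Optional["Hypothesis"] = None   # back-pointer for reading off output

def initial_hypothesis():
    """Empty hypothesis: nothing covered, probability 1 (log-prob 0)."""
    return Hypothesis(coverage=frozenset(), last_two_words=("<s>", "<s>"),
                      last_src_end=0, logprob=0.0)

def read_off_translation(hyp):
    """Trace parent links back to the start and collect the target words."""
    phrases = []
    while hyp is not None:
        if hyp.target_phrase:
            phrases.append(hyp.target_phrase)
        hyp = hyp.parent
    return " ".join(reversed(phrases))
```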

Search hypotheses and the initial hypothesis
- Start with an initial empty hypothesis:
  - e: no English words have been output
  - f: no foreign words have been translated
  - p: the probability so far is 1 (we will multiply in the probability of each next action)
  - end_prev = 0: not shown here, but we need the end position of the previous phrase in f for the distortion model computation
  - previous 2 English words: not shown, but we need them for the language model computation

Hypothesis expansion
- Pick a translation option
- Create the next hypothesis using this action:
  - e: Mary is in the output
  - f: Maria has been translated
  - p: the probability of the partial translation so far

Computing the probability of actions
- The probability of an action depends on the models used
- Translation models:
  - phrasal probabilities in both directions
  - lexical weighting probabilities
  - word count, phrase count
- Reordering model probability:
  - can be computed given the current phrase pair and the source positions of the current and previous phrases
- Language model probability:
  - can be computed given the English side of the current phrase pair and the last 2 previous English words (for a trigram LM)
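
A hedged sketch of how one expansion might be scored from these components. The weight names, the option and hypothesis attributes, and the lm.log_prob call are placeholders rather than Moses' API; real systems combine these features in a log-linear model with tuned weights.

```python
def expansion_logprob(option, hyp, weights, lm, distortion_logprob):
    """Score one action: translating a source span with a given target phrase."""
    score = 0.0
    # Translation model features: phrase probabilities in both directions,
    # lexical weights, and word/phrase count penalties, stored with the option.
    for name, value in option.tm_log_features.items():
        score += weights[name] * value
    # Reordering/distortion: depends on where the previous phrase ended and
    # where the current source span starts.
    score += weights["distortion"] * distortion_logprob(hyp.last_src_end,
                                                        option.src_start)
    # Language model: score each new English word given the last two words
    # of the hypothesis (trigram LM), updating the context as we go.
    context = hyp.last_two_words
    for word in option.target_words:
        score += weights["lm"] * lm.log_prob(word, context)
        context = (context[-1], word)
    return score
```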

Hypothesis expansion: add another hypothesis using another translation option

Hypothesis expansion: further expansion

Hypothesis expansion
- Continue until all foreign words are translated
- Trace the parent links back to the beginning to collect the full translation

Hypothesis expansion
- The search space explodes: it grows exponentially with sentence length

Explosion of the search space
- The search graph grows exponentially with sentence length
- Due to the number of possible reorderings, the problem is NP-complete [Knight, 1999]
- We need to reduce the search space:
  - recombine equivalent hypotheses (lossless, risk-free pruning)
  - apply other kinds of pruning: histogram pruning, threshold pruning
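
A small sketch of the two lossy pruning methods named above, applied to one stack of hypotheses. The combined_score attribute (cost so far plus estimated future cost, introduced later in the lecture) and the default limits are assumptions for illustration.

```python
def prune(stack, max_size=100, threshold=10.0):
    """Histogram pruning: keep at most max_size hypotheses.
    Threshold (beam) pruning: drop hypotheses whose score is worse than the
    best one by more than threshold. Scores are log-probs, higher is better."""
    if not stack:
        return stack
    stack = sorted(stack, key=lambda h: h.combined_score, reverse=True)
    best = stack[0].combined_score
    stack = [h for h in stack if best - h.combined_score <= threshold]
    return stack[:max_size]
```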

Hypothesis recombination
- Example: different paths lead to the same English output in partial hypotheses
- They correspond to different phrasal segmentations

Hypothesis recombination
- Combine equivalent hypotheses
- Drop the weaker hypothesis
- The weaker path is still available for lattice generation

Hypothesis recombination
- The merged hypotheses do not need to match completely
- We just need them to have the same best path to completion:
  - the same applicable future expansions, with the same scores
  - the same last 2 English words, coverage vector, and source position of the last phrase
- Since any path that goes through the worse hypothesis can be rerouted through the better hypothesis and then follow the same path to the end, we lose nothing.
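
A sketch of recombination using the hypothesis fields from the earlier sketch: hypotheses that agree on the coverage vector, the last two English words, and the end position of the last source phrase have identical possible futures, so only the better-scoring one needs to be kept for search (the weaker one can be retained separately for lattice generation).

```python
def recombination_key(hyp):
    """Hypotheses with equal keys have the same possible futures."""
    return (hyp.coverage,          # same untranslated source words
            hyp.last_two_words,    # same trigram LM context
            hyp.last_src_end)      # same distortion context

def recombine(hypotheses):
    """Keep only the best-scoring hypothesis in each equivalence class."""
    best = {}
    for hyp in hypotheses:
        key = recombination_key(hyp)
        if key not in best or hyp.logprob > best[key].logprob:
            best[key] = hyp
    return list(best.values())
```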

Pruning

Hypothesis stacks
- The i-th stack contains hypotheses in which i source words have been translated
- Process the stacks in order:
  - expand all hypotheses from a stack
  - place the expanded hypotheses on their corresponding stacks
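
A high-level sketch of this multi-stack loop. The expand and prune arguments stand in for the expansion and pruning steps sketched earlier; this is not Moses' implementation.

```python
def stack_decode(source_len, initial_hyp, expand, prune):
    """Stack i holds hypotheses covering i source words; process stacks in order."""
    stacks = [[] for _ in range(source_len + 1)]
    stacks[0].append(initial_hyp)
    for i in range(source_len):
        for hyp in prune(stacks[i]):
            # expand() yields (new hypothesis, number of covered source words),
            # with i < covered <= source_len.
            for new_hyp, covered in expand(hyp):
                stacks[covered].append(new_hyp)
    # Completed hypotheses cover all source words; return the best one.
    return max(stacks[source_len], key=lambda h: h.logprob, default=None)
```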

How to compare hypotheses
- So far we only have the probability (cost) of each hypothesis
- Comparing hypotheses that have translated the same number of words makes these costs more comparable
- Can we do better than comparing based on cost so far?

Comparing hypotheses
- When comparing two hypotheses that have translated the same number of words, the one that translated an easier part of the sentence is preferred
- We can do better by also considering the future cost of translating the rest of the source words

Estimating future cost
- The closer we get to the correct future cost, the better for our search
- But computing the future cost should not take too long
- A future cost estimate that is less than or equal to the true cost (optimistic) guarantees optimality in A* search
- Admissible heuristics have usually been too slow in practice, so we do not use A* here

Estimating future cost
- The future cost will be the sum of the costs of the actions (translations) we will take in the future
- We can estimate the cost of each translation option for the sentence:
  - translation probabilities: context independent
  - language model: context dependent, so we approximate, e.g., for an option whose English side is "to the", use P(to) · P(the | to)
  - reordering model cost: ignored, since it cannot be estimated without context
- Probability estimate for an option = LM estimate × TM probability

Future cost estimation
- Find the cost of the cheapest translation option for a given source phrase (the one with the highest probability)

Future cost estimation
- For each span of the source sentence (each contiguous sequence of words), compute the cost of the cheapest combination of translation options
- This can be done efficiently using dynamic programming
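
A sketch of that dynamic program: for each span, take the cheaper of (a) the best single translation option covering the span exactly and (b) the best split into two smaller spans already computed. The options dictionary has the shape produced by the collect_options sketch earlier, and the per-option probability stands in for the LM-times-TM estimate from the previous slide.

```python
import math

def future_cost_table(options, n):
    """Return cost[(i, j)] = estimated cost (negative log-prob) of span [i, j)."""
    INF = float("inf")
    cost = {}
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Best single translation option covering exactly [i, j) ...
            best = min((-math.log(p) for _, p in options.get((i, j), [])),
                       default=INF)
            # ... or the best split into two already-computed sub-spans.
            for k in range(i + 1, j):
                best = min(best, cost[(i, k)] + cost[(k, j)])
            cost[(i, j)] = best
    return cost
```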

Estimating the combined score of a hypothesis
- Add up the precomputed costs of the contiguous spans of untranslated words to compute the future cost
- Add the future cost to the cost so far to compute the combined score used for pruning
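
A sketch of using that table at decoding time, with the coverage and logprob fields from the hypothesis sketch earlier: sum the precomputed costs of the maximal contiguous uncovered spans and combine them with the score so far.

```python
def future_cost(coverage, n, cost_table):
    """Sum span costs over maximal contiguous runs of untranslated positions."""
    total, i = 0.0, 0
    while i < n:
        if i in coverage:
            i += 1
            continue
        j = i
        while j < n and j not in coverage:
            j += 1
        total += cost_table[(i, j)]
        i = j
    return total

def combined_score(hyp, n, cost_table):
    """Score used for pruning = log-prob so far minus estimated future cost."""
    return hyp.logprob - future_cost(hyp.coverage, n, cost_table)
```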

Limits on reordering
- Limits on reordering can reduce the search space dramatically
- Monotone decoding: target phrases follow the same order as the source phrases
- Reordering limit n (used in Moses):
  - forbid jumps with distance greater than n
  - results in polynomial-time inference
- In addition to speed-ups, reordering limits often lead to improved translations, because the reordering models are weak
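
As a tiny illustration, one common form of the check during expansion (the exact definition of the jump distance varies between systems; this is an assumption, not necessarily Moses' definition):

```python
def reordering_allowed(last_src_end, next_start, limit):
    """Allow an expansion starting at source position next_start only if the
    jump from the end of the previously translated phrase is at most limit."""
    return abs(next_start - last_src_end) <= limit
```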

Word lattices
- We can easily extract a word lattice from the search graph
- We can also extract n-best translation hypotheses
- n-best lists are used for discriminative re-ranking and for training the log-linear model parameters

Summary
- Search in phrase-based translation
- Decomposing translation into a sequence of actions
- Building partial translation hypotheses from left to right
- Computing cost by adding up the costs of actions
- Recombining hypotheses for lossless memory and time savings
- Pruning based on estimated score for risky search-space reduction
- Organizing hypotheses in comparable stacks
- Estimating future cost

Readings and next time
- For this week:
  - Chapter 6 of the SMT book: Decoding
  - Optional: Germann et al. (2003), a nice paper comparing optimal and other decoders for IBM Model 4
- For next week:
  - starting on tree-based models