
1 Local Search with Very Large-Scale Neighborhoods for Optimal Permutations in Machine Translation. Jason Eisner and Roy Tromble. HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP, June 2006.

2 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 2 Motivation MT is really easy! Just use a finite-state transducer! Phrases, morphology, the works!

3 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 3 Permutation search in MT 1 4 23 56 initial order (French) 1 5 42 63 best order (French’) NNP Marie NEG ne PRP m’ AUX a NEG pas VBN vu NNP Marie NEG ne PRP m’ AUX a NEG pas VBN vu Mary hasn’t seenme easy transduction

4 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 4 Motivation MT is really easy! Just use a finite-state transducer! Phrases, morphology, the works! Have just to fix that pesky word order.

5 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 5 Often want to find an optimal permutation … Machine translation: Reorder French to French-prime (Brown et al. 1992) So it’s easier to align or translate MT eval: How much do you need to rearrange MT output so it scores well under an LM derived from ref translations? Discourse generation, e.g., multi-doc summarization: Order the output sentences (Lapata 2003) So they flow nicely Reconstruct temporal order of events after info extraction Learn rule ordering or constraint ranking for phonology? Multi-word anagrams that score well under a LM

6 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 6 How can we find this needle in the haystack of N! possible permutations? Permutation search: The problem 1 4 23 56 initial order 1 5 42 63 best order according to some cost function

7 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 7 Cost models initial order 1 4 23 56 1 5 42 63 cost of this order: 1.Does my favorite WFSA like it as a string? 2.Non-local pair order ok? 3.Non-local triple order ok? Add these all up … 4 before 3 …?1…2…3? These costs are enough to encode Traveling Salesperson Many other NP-complete problems IBM Model 4 and more …

8 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 8 Traditional approach: Beam search Approx. best path through a really big FSA N! paths: one for each permutation only 2 N states arc weight = cost of picking 5 next if we’ve seen {1,2,4} so far state remembers what we’ve generated so far (but not in what order)

9 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 9 An alternative: Local search (Germann et al. 2001) The “swap” neighborhood 1 2 3 4 5 6 cost=22 2 1 3 4 5 6 cost=26 1 2 3 4 5 6 cost=22 1 2 3 5 4 6 cost=25 1 3 2 4 5 6 cost=20 1 2 4 3 5 6 cost=19

10 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 10 An alternative: Local search (Germann et al. 2001) The “swap” neighborhood 1 2 3 4 5 6 cost=22 1 2 4 3 5 6 cost=19

11 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 11 An alternative: Local search 1 4 23 56 cost=22 The “swap” neighborhood cost=19cost=17cost=16... Why are the costs always going down? How long does it take to pick your swap? How many swaps might you need to reach answer? What if you get stuck in a local min? we pick best swap O(N)*O(1)? O(N 2 ) random restarts

12 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 12 Hill-climbing vs. random walks 1 2 3 4 5 6 cost=22 2 1 3 4 5 6 cost=26 1 2 3 4 5 6 cost=22 1 2 3 5 4 6 cost=25 1 3 2 4 5 6 cost=20 1 2 4 3 5 6 cost=19

13 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 13 Larger neighborhoods – fewer local mins? (Germann et al. 2001, Germann 2003) Now we can get to our destination in O(N) steps instead of O(N 2 ) But each step has to consider O(N 2 ) neighbors instead of O(N)  Push the runtime down here, it pops up there … Can we do better? 1 23 456 cost=22 cost=17 “Jump” neighborhood Yes! Consider exponentially many neighbors by dynamic programming

14 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 14 Let’s define each neighbor by a tree 1 4 23 56 = swap children

15 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 15 Let’s define each neighbor by a tree 1 4 23 56 = swap children

16 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 16 Let’s define each neighbor by a tree 1 4 23 56 = swap children

17 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 17 If that was the optimal neighbor … 1 456 23 … now look for its optimal neighbor new tree!

18 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 18 If that was the optimal neighbor … 56 1 4 2 3 … now look for its optimal neighbor new tree!

19 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 19 If that was the optimal neighbor … 56 1 4 23 … now look for its optimal neighbor … repeat till reach local optimum At each step, consider all possible trees by dynamic programming (CKY parsing)

20 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 20 Dynamic program must pick the tree that leads to the lowest-cost permutation initial order 1 4 23 56 1 5 42 63 cost of this order: 1.Does my favorite WFSA like it as a string?

21 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 21 A bigram model as a WFSA After you read 1, you’re in state 1 After you read 2, you’re in state 2 After you read 3, you’re in state 3 … and this state determines the cost of the next symbol you read

22 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 22 Including WFSA costs via nonterminals 1 4 23 56 6161424223231414I5I55656 A possible preterminal for word 2 is an arc in A that’s labeled with 2. The preterminal 4  2 rewrites as word 2 with a cost equal to the arc’s cost. 42 2

23 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 23 I3I3I3I3 Including WFSA costs via nonterminals 1 4 23 6161424223231414 4343 1313 6363 56 I5I55656 I6I6 6363 I6I6 6363 I6I6 I3I3 1 4 23 56 6161424223231414I5I55656 This constituent’s total cost is the total cost of the best 6  3 path. 61 1 423 4 23. 16 1 423 4 23 5 6 I 5 cost of the new permutation

24 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 24 Dynamic program must pick the tree that leads to the lowest-cost permutation initial order 1 4 23 56 1 5 42 63 cost of this order: 1.Does my favorite WFSA like it as a string? 2.Non-local pair order ok? 4 before 3 …?

25 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 25 Incorporating the pairwise ordering costs So this hypothesis must add costs 5 < 1, 5 < 2, 5 < 3, 5 < 4, 6 < 1, 6 < 2, 6 < 3, 6 < 4, 7 < 1, 7 < 2, 7 < 3, 7 < 4 Uh-oh! So now it takes O(N 2 ) time to combine two subtrees, instead of O(1) time? Nope – dynamic programming to the rescue again! 1 4 23 56 7 This puts {5,6,7} before {1,2,3,4}.

26 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 26 Incorporating the pairwise ordering costs 1 4 23 56 7 1234 5 6 7 1234 5 6 7 1234 5 6 7 1234 5 6 7 1234 5 6 7 So this hypothesis must add costs This puts {5,6,7} before {1,2,3,4}. =+-+ already computed at earlier steps of parsing

27 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 27 Incorporating 3-way ordering costs See the paper … A little tricky, but  comes “for free” if you’re willing to accept a certain restriction on these costs  more expensive without that restriction, but possible

28 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 28 How many steps to get from here to there? 84 6 2 53 7 1 45 1 2 36 7 8 One twisted-tree step? Not always … (Dekai Wu) initial order best order

29 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 29 Can you get to the answer in one step? German-English, Giza++ alignment often (yay, big neighborhood) not always (yay, local search)

30 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 30 84 6 2 53 7 1 How many steps to the answer in the worst case? (what is diameter of the search space?) 45 1 2 36 7 8 claim: only log 2 N steps at worst (if you know where to step) Let’s sketch the proof!

31 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 31 Quicksort anything into, e.g., 1 2 3 4 5 6 7 8 84 6 2 53 7 1  5  4 right-branching tree

32 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 32 Quicksort anything into, e.g., 1 2 3 4 5 6 7 8 17 2 4 38 5 6  6  5  4  7  2  3 sequence of right-branching trees Only log 2 N steps to get to 1 2 3 4 5 6 7 8 … … or to anywhere!

33 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 33 Speedups (read the paper!) We’re just parsing the current permutation as a string – and we know how to speed up parsers!  pruning  A*  best-first  coarse-to-fine Can restrict to a subset of parse trees  Gives us smaller neighborhoods, quicker to search, but still exponentially large  Right-branching trees, asymmetric trees … Note: Even w/o any of this, super-fast and effective on the LOP (no WFSA  no grammar const).

34 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 34 More on modeling (read the paper!) Encoding classical NP-complete problems Encoding translation decoding in general  Encoding IBM Model 4  Encoding soft phrasal constraints via hidden bracket symbols Costs that depend on features of source sentence  Training the feature weights

35 Eisner & Tromble - HLT-NAACL Workshop on Computationally Hard Methods and Joint Inference in NLP - June 2006 35 Summary Local search is fun and easy  Popular elsewhere in AI  Closely related to MCMC sampling Probably useful for translation Can efficiently use huge local neighborhoods  Algorithms are closely related to parsing and FSMs  We know that stuff better than anyone!

