A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009.




1 A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009

2 Introduction Machine Translation –Chinese to English –Chinese: 把 7 月 11 日 設立 為 航海 節 –An ideal case: to establish July 11 as Sailing Festival day

3 Wrong Linguistic Structure 航海 節 is a syntactic constituent, but the translation splits it apart: 把 7 月 11 日 設立 為 航海 節 → to set up for navigation on July 11 knots

4 A Naive Solution Employ syntactic constraints –Fully respect linguistic structures

5 A Naive Solution (2) Unfortunately, it damages the performance –Non-syntactic translations are sometimes useful –Example: 把 今天 設立 為 → establish today as; 航海 節 → Sailing Festival day

6 Syntax-Driven Bracketing Model (SDB model) Whether a phrase is a translation unit is more important –Whether it is syntactic or non-syntactic Includes, but is not limited to, constituent matching/violation Protects the strength of the phrase-based system

7 Translation Unit A bracketable source phrase and its corresponding translation Bracketable –A source phrase is bracketable if its translation is contiguous –A pair of neighboring phrases is bracketable if their translations are contiguous after being combined

8 Translation Unit Examples: Bracketable 把 今天 設立 為 → establish today as –把 今天 設立 and 為 are bracketable –把 今天 設立 為 is bracketable

9 Translation Unit Examples: Unbracketable 把 今天 設立 為 → establish today as –設立 and 為 are unbracketable –設立 為 is unbracketable

10 Bracketing Instances Extraction Extract bracketable and unbracketable instances from training data –Aligned sentence pair + parsed source sentence Estimate whether a source phrase is bracketable at run time
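The bracketable test from the preceding slides and the extraction step can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes word alignments are given as (source index, target index) pairs, that 把 is unaligned, that a span with no aligned words counts as unbracketable, and that every span up to a hypothetical `max_len` is enumerated. A pair of neighboring phrases is treated as bracketable when their combined span passes the same test.

```python
def is_bracketable(span, alignment):
    """Return True if the source span [start, end] has a contiguous translation.

    The translation counts as contiguous when no target word inside the
    covered target range is aligned to a source word outside the span.
    """
    start, end = span
    targets = {t for s, t in alignment if start <= s <= end}
    if not targets:
        return False  # assumption: fully unaligned spans are unbracketable
    lo, hi = min(targets), max(targets)
    return all(start <= s <= end for s, t in alignment if lo <= t <= hi)


def extract_instances(n_source, alignment, max_len=4):
    """Label every source span of up to max_len words (max_len is hypothetical)."""
    instances = []
    for start in range(n_source):
        for end in range(start, min(start + max_len, n_source)):
            span = (start, end)
            instances.append((span, is_bracketable(span, alignment)))
    return instances


# Example from the slides; the alignment itself is an assumption.
SOURCE = ["把", "今天", "設立", "為"]
TARGET = ["establish", "today", "as"]
ALIGNMENT = [(1, 1), (2, 0), (3, 2)]  # 今天-today, 設立-establish, 為-as

LABELS = dict(extract_instances(len(SOURCE), ALIGNMENT))
```

Here `LABELS[(2, 3)]` covers 設立 為, whose translations "establish" and "as" are separated by "today", so the span is labeled unbracketable.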

11

12 SDB Features

13 Rule Features Rule Features (RF) –CFG rule –Horizontal context

14 Rule Features (2) S1: ADVP → AD S2: VP → VV AS NP S: VP → ADVP VP
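Reading the CFG rule off a parse-tree node can be sketched as below. The tuple-based tree encoding, the placeholder words `w1` to `w4`, and the `NN` child under `NP` are assumptions made for illustration; only the three rules come from the slide.

```python
def cfg_rule(node):
    """Return the CFG rule that expands a (label, children) node."""
    label, children = node
    # A child is either a subtree tuple or a leaf word (plain string).
    rhs = [child[0] if isinstance(child, tuple) else child for child in children]
    return f"{label} -> {' '.join(rhs)}"


# Subtrees for the two neighboring phrases S1 and S2 and their combination S.
S1 = ("ADVP", [("AD", ["w1"])])
S2 = ("VP", [("VV", ["w2"]), ("AS", ["w3"]), ("NP", [("NN", ["w4"])])])
S = ("VP", [S1, S2])

RULES = {"S1": cfg_rule(S1), "S2": cfg_rule(S2), "S": cfg_rule(S)}
```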

15 Path Features Path features (PF) –Path to roots: S1 to the root of S; S2 to the root of S; S to the root of the whole tree –Vertical context

16 Path Features (2) S1: ADVP VP S2: VP VP S: VP IP
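The paths above can be reproduced with a small sketch that walks parent pointers. The `Node` class and the tree shape (IP directly dominating the VP of S) are assumptions chosen to match the slide's example, not the authors' data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    parent: Optional["Node"] = None


def path_to_ancestor(node, ancestor):
    """Collect labels from node up to (and including) ancestor."""
    labels = [node.label]
    while node is not ancestor:
        node = node.parent
        if node is None:
            raise ValueError("ancestor does not dominate node")
        labels.append(node.label)
    return labels


# Assumed tree: IP dominates VP (S), which dominates ADVP (S1) and VP (S2).
root = Node("IP")
s = Node("VP", parent=root)
s1 = Node("ADVP", parent=s)
s2 = Node("VP", parent=s)

PATHS = {
    "S1": path_to_ancestor(s1, s),
    "S2": path_to_ancestor(s2, s),
    "S": path_to_ancestor(s, root),
}
```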

17 Constituent Boundary Matching Features Constituent Boundary Matching Features (CBMF) –Exact match: source phrase covers the boundaries of its tree –Inside match: source phrase covers a sequence of its tree –Crossing match: source phrase crosses the subtree of its tree

18 Constituent Boundary Matching Features (3) Exact match Inside match Crossing match
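One way to classify a source span into the three match types is sketched below. It assumes the parse is given as a set of (start, end) word spans, one per constituent, including word-level nodes. Treating every non-exact, non-crossing span as an inside match is an interpretation of the slide, not a definition taken from it. The example spans correspond to a hypothetical four-word VP with the structure VP → ADVP VP, VP → VV AS NP.

```python
def boundary_match(span, constituents):
    """Classify a source span as 'exact', 'crossing' or 'inside'."""
    start, end = span
    if span in constituents:
        return "exact"
    for a, b in constituents:
        # Partial overlap: the span and the constituent each contain
        # words that the other one lacks.
        if a < start <= b < end or start < a <= end < b:
            return "crossing"
    return "inside"


# Words 0..3 tagged AD VV AS NP (hypothetical sentence).
CONSTITUENTS = {
    (0, 3),  # VP (whole phrase)
    (0, 0),  # ADVP / AD
    (1, 3),  # inner VP
    (1, 1),  # VV
    (2, 2),  # AS
    (3, 3),  # NP
}

MATCHES = {span: boundary_match(span, CONSTITUENTS) for span in [(1, 3), (1, 2), (0, 1)]}
```

The span (1, 2) covers the sibling sequence VV AS and is an inside match, while (0, 1) takes ADVP plus only part of the inner VP and is a crossing match.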

19 Integration into Phrase-based MT The SDB model estimates the probability that a source phrase is bracketable –Whether it can be translated as a unit Integrated into a BTG MT system –Bracketing Transduction Grammar (Wu, 1997) –Straight: 把 今天 設立 + 為 → establish today as –Inverted: 把 今天 設立 + 為 → as establish today
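The two BTG merge orders, and one plausible way a bracketing probability could enter the decoder score, can be sketched as follows. The function names are hypothetical, and summing log-probabilities over the spans of a derivation is an assumption about the feature's form rather than something stated on the slide.

```python
import math


def merge(left, right, order):
    """Combine the translations of two neighboring source phrases."""
    if order == "straight":
        return left + right   # target order follows source order
    if order == "inverted":
        return right + left   # target order is swapped
    raise ValueError(f"unknown order: {order}")


def sdb_feature(bracket_probs):
    """Assumed feature value: sum of log P(bracketable) over merged spans."""
    return sum(math.log(p) for p in bracket_probs)


LEFT = ["establish", "today"]   # translation of 把 今天 設立
RIGHT = ["as"]                  # translation of 為

STRAIGHT = merge(LEFT, RIGHT, "straight")
INVERTED = merge(LEFT, RIGHT, "inverted")
```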

20 Experiment Comparing models –Baseline: BTG system –XP+ (Marton and Resnik, 2008): NP, VP, PP, ADVP, …; penalizes a hypothesis each time it violates a syntactic boundary (soft constraint) –UniSDB: only S features –BiSDB: S1, S2 and S features

21 Experiment (2) Chinese parser –Lexicalized PCFG parser (Xiong et al., 2005) Parallel corpus –FBIS corpus Word alignment –GIZA++ Four-gram language model –Built with SRILM –Xinhua section of the English Gigaword corpus Maximum Entropy (ME) Trainer –Zhang, 2004
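For readers unfamiliar with maximum entropy classifiers, the general form of such a model over the two labels can be sketched as below. This is the textbook formulation, not the cited trainer's API. The feature string and the weight value are made up for illustration.

```python
import math


def maxent_prob(active_features, weights, labels=("bracketable", "unbracketable")):
    """Return P(label | features) under a maximum entropy model.

    weights maps (feature, label) pairs to real values; missing pairs count as 0.
    """
    scores = {
        label: sum(weights.get((f, label), 0.0) for f in active_features)
        for label in labels
    }
    z = sum(math.exp(s) for s in scores.values())  # normalization constant
    return {label: math.exp(s) / z for label, s in scores.items()}


# Hypothetical weight: this rule feature favors the bracketable label.
WEIGHTS = {("rule=VP -> ADVP VP", "bracketable"): 1.0}
PROBS = maxent_prob(["rule=VP -> ADVP VP"], WEIGHTS)
```

With no informative features the two labels receive equal probability. Each active feature shifts mass toward the label it carries weight for.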

22 Result SDB receives the largest feature weight –This implies its impact on the decoder. Baseline features (common to phrase-based systems) XP+ and SDB

23 Result (2) NIST MT-05 test set –Improvement of 1.67 BLEU over baseline –Improvement of 0.59 BLEU over XP+

24 Result (3) On top of CBMF, adding rule and path features achieves further improvement BiSDB is consistently better than UniSDB –Inner contexts (S1 and S2) are useful

25 XP+ and SDB Same –Both consider syntactic constituents Different –XP+ only punishes non-syntactic source phrases –SDB is able to encourage a non-syntactic phrase if it is bracketable

26 XP+ and SDB

27 Conclusion The SDB model predicts whether a source phrase can be translated as a unit. Appropriate constituent violations are helpful –Because the model better inherits the strength of the phrase-based approach

