1 Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words
Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences

2 Motivation
Spoken language translation suffers from a serious problem: content words go missing in the output.
Example: no, you need 10 minutes to go to the main street, (the bus) comes every 10 minutes

3 Motivation
Further investigation shows that this happens because incorrect MT rules are used.
source: 我 想 买 茶叶 送给 家人 做 礼物 。
rule: #X1# 茶叶 #X2# -> #X1# #X2#
#X1#: 我 想 买 -> I would like to buy
#X2#: 送给 家人 做 礼物 。 -> souvenir for my family.
result: I would like to buy souvenir for my family.

4 Motivation
The classic SMT framework has no specific feature for distinguishing bad rules from good ones.
An obvious way to tackle this problem is to find a way to tell those bad MT rules apart from the good ones.

5 Two rules
R1: 推荐 的 茶 -> tea recommended (a good rule)
R2: 推荐 的 茶 -> tea (a bad rule that misses the translation of the content word “推荐”)

6 Two rules
R1: 推荐 的 茶 -> tea recommended
R2: 推荐 的 茶 -> tea
R2 may be favored by a classic MT system, since it generates a shorter translation result.

7 Our Model

8 R1: 推荐 的 茶 -> tea recommended
R2: 推荐 的 茶 -> tea

9 Training
bilingual corpus with word alignment info:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……

10 Training
bilingual corpus with word alignment info:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
aligned pairs shown on the slide:
推荐 -> recommended
茶 -> tea
日本 -> japanese
日本 茶 -> japanese tea
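The step from an aligned sentence pair to aligned word and phrase pairs can be sketched roughly as follows. This is a minimal sketch of standard consistency-based phrase-pair extraction, not necessarily the procedure the authors used. The function name `extract_phrase_pairs`, the `max_len` limit, and the alignment links are assumptions made for illustration; the transcript does not preserve the actual alignment.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         links: Set[Link], max_len: int = 4) -> Set[Tuple[str, str]]:
    """Collect phrase pairs whose words are aligned only to each other."""
    aligned_src = {s for s, _ in links}
    pairs = set()
    for i in range(len(src)):
        for j in range(i, min(i + max_len, len(src))):
            # Tightness: both boundary source words must be aligned.
            if i not in aligned_src or j not in aligned_src:
                continue
            # Target positions linked to the source span [i, j].
            tgt_pos = [t for s, t in links if i <= s <= j]
            lo, hi = min(tgt_pos), max(tgt_pos)
            # Consistency: nothing inside [lo, hi] may link outside [i, j].
            if any(lo <= t <= hi and not i <= s <= j for s, t in links):
                continue
            pairs.add((" ".join(src[i:j + 1]), " ".join(tgt[lo:hi + 1])))
    return pairs


src = "这里 有 推荐 的 日本 茶 吗".split()
tgt = "do you have any japanese tea recommended".split()
# Assumed alignment links: 推荐-recommended, 日本-japanese, 茶-tea.
links = {(2, 6), (4, 4), (5, 5)}
pairs = extract_phrase_pairs(src, tgt, links)
```

Under these assumed links, the four pairs shown on the slide are among the extracted pairs; the sketch also yields the longer pair 推荐 的 日本 茶 -> japanese tea recommended.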

11 Training
bilingual corpus with word alignment info:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
stoplist: 么 吗 的 …
labels on the slide: "content phrase" and "isn’t content phrase"
Content words are labeled in bold face.
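A stoplist-based content-word test might look like the sketch below. It assumes that a word counts as a content word exactly when it is not in the stoplist. Only the three stoplist entries visible on the slide are included, so the result is illustrative; the names `STOPLIST`, `is_content_word` and `content_words` are hypothetical.

```python
from typing import Iterable, List, Set

# Only these three entries are visible in the transcript; the full list is longer.
STOPLIST: Set[str] = {"么", "吗", "的"}


def is_content_word(token: str, stoplist: Set[str] = STOPLIST) -> bool:
    """Assumption: every token outside the stoplist is a content word."""
    return token not in stoplist


def content_words(tokens: Iterable[str], stoplist: Set[str] = STOPLIST) -> List[str]:
    """Keep the content words of a tokenized phrase, in order."""
    return [t for t in tokens if is_content_word(t, stoplist)]


rule_source = "推荐 的 茶".split()
kept = content_words(rule_source)  # 的 is dropped, 推荐 and 茶 remain
```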

12 Training
bilingual corpus with word alignment information:
这里 有 推荐 的 日本 茶 吗
do you have any japanese tea recommended
……
aligned pairs:
推荐 -> recommended
茶 -> tea
日本 -> japanese
日本 茶 -> japanese tea
…
Co-relation table: 茶 tea 茶 Japanese tea 4.89 …

13 Two penalties
– Source Unaligned Penalty: the number of unaligned source content words in a rule
– Target Unaligned Penalty: the number of unaligned target content words in a rule
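The two counts can be illustrated with a short sketch. It assumes that content words are the tokens outside a stoplist, that non-terminals such as #X1# are never counted, and that a rule carries word alignment links as (source index, target index) pairs. The target-side stoplist, the function name `unaligned_penalties`, and the links in the examples are assumptions, not taken from the slides.

```python
import re
from typing import List, Set, Tuple

NONTERMINAL = re.compile(r"#X\d+#")
SRC_STOPLIST = {"么", "吗", "的"}        # entries visible on the slides
TGT_STOPLIST = {"the", "a", "an", "of"}  # assumed; not given in the slides


def unaligned_penalties(src: List[str], tgt: List[str],
                        links: Set[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (source unaligned penalty, target unaligned penalty) for a rule."""
    aligned_src = {s for s, _ in links}
    aligned_tgt = {t for _, t in links}

    def count(tokens: List[str], aligned: Set[int], stop: Set[str]) -> int:
        # Count content words (not stopwords, not non-terminals) with no link.
        return sum(1 for k, w in enumerate(tokens)
                   if k not in aligned and w not in stop
                   and not NONTERMINAL.fullmatch(w))

    return (count(src, aligned_src, SRC_STOPLIST),
            count(tgt, aligned_tgt, TGT_STOPLIST))


# R1: 推荐 的 茶 -> tea recommended (links assumed: 推荐-recommended, 茶-tea)
r1 = unaligned_penalties(["推荐", "的", "茶"], ["tea", "recommended"], {(0, 1), (2, 0)})
# R2: 推荐 的 茶 -> tea (link assumed: 茶-tea), so 推荐 is left unaligned
r2 = unaligned_penalties(["推荐", "的", "茶"], ["tea"], {(2, 0)})
# Rule from slide 3: #X1# 茶叶 #X2# -> #X1# #X2#, with 茶叶 unaligned
r3 = unaligned_penalties(["#X1#", "茶叶", "#X2#"], ["#X1#", "#X2#"], set())
```

With these assumed alignments, R1 has no unaligned content words, while R2 and the slide 3 rule each leave one source content word unaligned, which is the signal a decoder feature could use to tell them apart.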

14 Experiment
Data Sets
– training: 280K CH-EN spoken language sentences
– tuning: DEVSET2 of IWSLT 2010
– test: DEVSET3 ~ DEVSET6 of IWSLT 2010
– the training set is also used to train our model

15 Experiment

16 Thanks
Q & A

