Enriched translation model using morphology in MT
Luong Minh Thang
WING group meeting – 07 July, 2009

1 Enriched translation model using morphology in MT
Luong Minh Thang
WING group meeting – 07 July, 2009

2 Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion

3 SMT overview – alignment
Parallel data:
English: These are, first and foremost, messages of concern at the economic and social problems that we are experiencing, in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens.
Finnish: Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista, vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista, kaikkien kansalaistemme taholta.
Alignment: one-to-many (1-M)
Example sentence pair (figure rows labeled Source and Target):
Maria no daba una bofetada a la bruja verde
NULL Mary did not slap the green witch

4 SMT overview – translation model
Intersect alignments: 1-M + M-1 → M-M
Extract phrases from the M-M alignment → translation model (phrase table).
Phrase-table entries (English e ||| Foreign f ||| scores; score columns not shown):
problems ||| ongelmat |||
problems ||| ongelmasta |||
…
problems ||| vaikeuksista |||
problems ||| vaikeuksien |||
Scores per entry: translation probabilities, lexical probabilities, phrase penalty
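
As a rough illustration of the phrase-table layout above, the following sketch reads `|||`-delimited entries and groups them by English phrase. The function name `parse_phrase_table` and all score values are hypothetical; the slide does not give the actual numbers or the exact column order.

```python
from collections import defaultdict


def parse_phrase_table(lines):
    """Group ' ||| '-delimited entries by their source (English) phrase."""
    table = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        source, target = fields[0], fields[1]
        # A third field, when present, holds whitespace-separated scores.
        scores = [float(x) for x in fields[2].split()] if len(fields) > 2 else []
        table[source].append((target, scores))
    return dict(table)


# Made-up scores, for illustration only.
entries = [
    "problems ||| ongelmat ||| 0.40 0.30",
    "problems ||| ongelmasta ||| 0.10 0.05",
    "problems ||| vaikeuksista ||| 0.20 0.10",
    "problems ||| vaikeuksien |||",
]
table = parse_phrase_table(entries)
for target, scores in table["problems"]:
    print(target, scores)
```

Grouping by source phrase makes the multiple Finnish word forms competing for one English phrase visible at a glance.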

5 Recap – morphological analysis
Morpheme: minimal meaning-bearing unit
English: machine + s, present + ed, etc.
Finnish: oppositio + kansa + n + edusta + ja = member of parliament of the opposition
Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner, e.g. un/PRE + fortunate/STM + ly/SUF
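
The tagged segmentation notation above (surface/TAG joined by "+") can be read mechanically. This is a minimal sketch of such a reader; `parse_segmentation` is a hypothetical helper, not part of Morfessor, and it only handles the notation as written on the slide.

```python
def parse_segmentation(tagged):
    """Split 'un/PRE + fortunate/STM + ly/SUF' into (surface, tag) pairs."""
    morphemes = []
    for part in tagged.split("+"):
        part = part.strip()
        if not part:
            continue
        surface, sep, tag = part.rpartition("/")
        if not sep:
            # No tag present: keep the whole piece as the surface form.
            surface, tag = part, None
        morphemes.append((surface, tag))
    return morphemes


segments = parse_segmentation("un/PRE + fortunate/STM + ly/SUF")
word = "".join(surface for surface, _ in segments)
print(segments, word)
```

Concatenating the surface forms recovers the original word, which is what lets a morpheme-level output be mapped back to words.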

6 Motivation
Problem:
– Morphologically complex languages have many word forms, e.g. ongelmat, ongelmasta, etc.
– Rare words occur often and are hard to align → incorrect entries in the normal (word-aligned) phrase table.
Solution:
– Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
– Combine the word- and morpheme-aligned PTs in a proper way to produce an even better translation model.

7 Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion

8 Twin phrase-table (PT) construction
[Figure: pipeline diagram. Labeled components: Word, Morpheme, Morphological segmentation, GIZA++ (word alignment, morpheme alignment), Phrase Extraction, PT w, PT wm, PT m, PT merging, Decoding.]
Example entries:
problems ||| vaikeuksista
problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
problem/STM+ s/SUF ||| ongelma/STM+ t/SUF
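
The example entries suggest how a word-level entry relates to a morpheme-level one. The sketch below assumes, based only on the diagram labels, that PT wm is obtained by segmenting each word of a PT w entry. The `SEGMENTATION` lookup is a hypothetical stand-in for a real segmenter such as Morfessor, and the function names are made up.

```python
# Hypothetical segmentation lookup standing in for a trained segmenter.
SEGMENTATION = {
    "problems": [("problem", "STM"), ("s", "SUF")],
    "vaikeuksista": [("vaikeu", "STM"), ("ksi", "SUF"), ("sta", "SUF")],
    "ongelmat": [("ongelma", "STM"), ("t", "SUF")],
}


def segment_phrase(phrase):
    """Rewrite each word as tagged morphemes; '+' marks a word-internal join."""
    tokens = []
    for word in phrase.split():
        morphemes = SEGMENTATION.get(word, [(word, "STM")])
        for k, (surface, tag) in enumerate(morphemes):
            marker = "+" if k < len(morphemes) - 1 else ""
            tokens.append(f"{surface}/{tag}{marker}")
    return " ".join(tokens)


def word_entry_to_morpheme_entry(entry):
    """Convert 'source ||| target' from word level to morpheme level."""
    source, target = [side.strip() for side in entry.split("|||")[:2]]
    return f"{segment_phrase(source)} ||| {segment_phrase(target)}"


converted = word_entry_to_morpheme_entry("problems ||| vaikeuksista")
print(converted)
```

Once both tables use the same morpheme-level keys, their entries can be matched and merged.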

9 Existing PT-merging methods
Add-feature (Nakov, 2008; Chen et al., 2009) → heuristic-driven
– F1 = 1 if from 1st PT, 0.5 otherwise
– F2 = 1 if from 2nd PT, 0.5 otherwise
– F3 = 1 if from both PTs, 0.5 otherwise
Interpolation (Wu & Wang, 2007) → does not consider the "meaning" of the scores
– tran(f|e) = α * tran1(f|e) + (1 - α) * tran2(f|e)
– lex(f|e) = β * lex1(f|e) + (1 - β) * lex2(f|e)
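
The two existing methods can be sketched in a few lines. This is an illustrative reading of the slide, not the cited authors' code: `add_features` and `interpolate` are hypothetical names, and the mapping of F1, F2, F3 to the three conditions follows the order in which they appear on the slide.

```python
def add_features(in_first, in_second):
    """Add-feature merging: three indicator-style features per entry."""
    f1 = 1.0 if in_first else 0.5
    f2 = 1.0 if in_second else 0.5
    f3 = 1.0 if (in_first and in_second) else 0.5
    return (f1, f2, f3)


def interpolate(score_1, score_2, weight):
    """Linear interpolation: weight * score_1 + (1 - weight) * score_2."""
    return weight * score_1 + (1 - weight) * score_2


# An entry found only in the first table, and one found in both.
print(add_features(True, False))
print(add_features(True, True))
# Interpolating two translation probabilities with alpha = 0.5.
print(interpolate(0.5, 0.25, 0.5))
```

Neither function looks at how the scores were estimated, which is the limitation the slide points out.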

10 Our merging method – normalizing translation probabilities
MLE estimate from each table:
tran1(e|f) = count1(e, f) / ∑_e count1(e, f)
tran2(e|f) = count2(e, f) / ∑_e count2(e, f)
PT wm entries:
problem + s ||| vaikeu + ksi + sta
problem + s ||| ongelma + sta
PT m entries:
problem + s ||| ongelma + t
problem + s ||| vaikeu + ksi + sta

11 Our merging method – normalizing translation probabilities
PT wm: problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta
PT m: problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta
MLE from PT wm:
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5
MLE from PT m:
tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25
Interpolation (ratio = 0.5):
tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25
Undesired translation! vaikeuksista scores as high as ongelmat.

12 Our merging method – normalizing translation probabilities
PT wm: problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta
PT m: problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta
MLE estimate from each table:
tran1(e|f) = count1(e, f) / ∑_e count1(e, f)
tran2(e|f) = count2(e, f) / ∑_e count2(e, f)
Normalization:
tran(e|f) = [count1(e, f) + count2(e, f)] / [∑_e count1(e, f) + ∑_e count2(e, f)]

13 Our merging method – normalizing translation probabilities
PT wm: problem + s ||| vaikeu + ksi + sta; problem + s ||| ongelma + sta
PT m: problem + s ||| ongelma + t; problem + s ||| vaikeu + ksi + sta
MLE from PT wm:
tran(vaikeuksista | problems) = 1/2 = 0.5
tran(ongelmasta | problems) = 1/2 = 0.5
MLE from PT m:
tran(ongelmat | problems) = 3/4 = 0.75
tran(vaikeuksista | problems) = 1/4 = 0.25
Normalization:
tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17
Desired translation!
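
The contrast between interpolating probabilities and normalizing over pooled counts can be checked with the counts implied by the fractions on the slides (PT wm: 1 and 1; PT m: 3 and 1). The sketch below is one way to reproduce the arithmetic; the function names are hypothetical.

```python
from collections import Counter

# Counts for source phrase "problems", read off the slide's fractions.
count_wm = Counter({"vaikeuksista": 1, "ongelmasta": 1})
count_m = Counter({"ongelmat": 3, "vaikeuksista": 1})


def mle(counts):
    """Relative-frequency estimate within a single table."""
    total = sum(counts.values())
    return {target: c / total for target, c in counts.items()}


def interpolate(p1, p2, ratio=0.5):
    """Linear interpolation; an entry missing from a table contributes 0."""
    targets = set(p1) | set(p2)
    return {t: ratio * p1.get(t, 0.0) + (1 - ratio) * p2.get(t, 0.0) for t in targets}


def normalize_pooled(c1, c2):
    """Pool the raw counts of both tables, then renormalize once."""
    return mle(c1 + c2)


interpolated = interpolate(mle(count_wm), mle(count_m))
pooled = normalize_pooled(count_wm, count_m)
for target in ("vaikeuksista", "ongelmat", "ongelmasta"):
    print(target, round(interpolated[target], 3), round(pooled[target], 2))
```

Pooling counts lets the table with more evidence (four observations against two) carry more weight, so ongelmat comes out on top instead of tying with vaikeuksista.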

14 Our merging method – full lexical probability interpolation
Scores from PT w:
lex(vaikeuksista | problems) = w1
lex(ongelmasta | problems) = w2
Scores from PT m:
lex(vaikeu + ksi + sta | problem + s) = m1
lex(ongelma + t | problem + s) = m3
Normal interpolation (ratio = 0.5):
lex(vaikeuksista | problems) = (w1 + m1)/2
lex(ongelmasta | problems) = (w2 + 0)/2
lex(ongelmat | problems) = (0 + m3)/2
Missing interpolated probabilities!
PT w lexical model: P(vaikeuksista|problems), P(ongelmasta|problems)
PT m lexical model: P(vaikeu|problem), P(ongelma|problem), P(t|s), P(ksi|s), P(sta|s)
Full interpolation:
– Estimate lex(ongelma + sta | problem + s) using the PT m lexical model → m2
– Estimate lex(ongelmat | problems) using the PT w lexical model → w3
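
To make the idea of estimating a missing score concrete, here is a minimal sketch. It assumes the missing lexical score is computed with the standard phrase-level lexical weighting (a product over target tokens of the average word probability of their aligned source tokens); the slide does not say which formula is used. All probabilities, the alignment, and the function names are hypothetical.

```python
def lexical_weight(target_tokens, source_tokens, alignment, word_prob):
    """Product over target tokens of the mean word_prob of aligned source tokens."""
    weight = 1.0
    for i, target in enumerate(target_tokens):
        linked = [source_tokens[j] for (t, j) in alignment if t == i]
        if not linked:
            linked = ["NULL"]  # unaligned tokens are scored against NULL
        weight *= sum(word_prob.get((target, s), 0.0) for s in linked) / len(linked)
    return weight


def interpolate_lex(score_w, score_m, ratio=0.5):
    return ratio * score_w + (1 - ratio) * score_m


# Made-up morpheme-level lexical model (PT m side).
morpheme_prob = {("ongelma", "problem"): 0.6, ("sta", "s"): 0.2}

w2 = 0.3  # made-up lex(ongelmasta | problems) from the word-level table
# m2: estimated score for the entry that PT m lacks (assumed 1-to-1 alignment).
m2 = lexical_weight(["ongelma", "sta"], ["problem", "s"], [(0, 0), (1, 1)], morpheme_prob)

normal = interpolate_lex(w2, 0.0)  # missing score treated as 0
full = interpolate_lex(w2, m2)     # missing score estimated from the lexical model
print(m2, normal, full)
```

With the estimated m2 in place, the entry is no longer penalized merely for being absent from one table.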

15 Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
– Twin phrase-table construction
– Merging phrase tables
Experiments
Conclusion

16 Experiments – dataset
2005 ACL shared task (Koehn & Monz, 2005)

17 Experiments – baselines
w-system: uses PT w, translates at the word level
m-system: uses PT m, translates at the morpheme level
m-BLEU: BLEU where each token unit is a morpheme
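
The difference between word-level and morpheme-level scoring can be illustrated with the clipped n-gram precision that BLEU is built on. This sketch covers only that precision component (full BLEU also combines several n-gram orders and applies a brevity penalty). The `SEGMENTS` lookup is a hypothetical stand-in for a real segmenter.

```python
from collections import Counter

# Hypothetical segmentation lookup, using segmentations shown on earlier slides.
SEGMENTS = {
    "ongelmat": ["ongelma", "t"],
    "ongelmasta": ["ongelma", "sta"],
    "vaikeuksista": ["vaikeu", "ksi", "sta"],
}


def to_morphemes(words):
    """Replace each word by its morphemes; unknown words stay whole."""
    tokens = []
    for word in words:
        tokens.extend(SEGMENTS.get(word, [word]))
    return tokens


def ngram_precision(hypothesis, reference, n=1):
    """Clipped n-gram precision of hypothesis tokens against one reference."""
    hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


hyp_words, ref_words = ["ongelmat"], ["ongelmasta"]
word_level = ngram_precision(hyp_words, ref_words)
morpheme_level = ngram_precision(to_morphemes(hyp_words), to_morphemes(ref_words))
print(word_level, morpheme_level)
```

A wrong inflection of the right stem earns no credit at the word level but partial credit at the morpheme level.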

18 Experiments – our system
Improvements over the m-system and the w-system are statistically significant according to the sign test of Collins et al. (2005).
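
For readers unfamiliar with sign tests, the following is a generic two-sided sign test over per-sentence wins. It may differ in detail from the procedure in Collins et al. (2005), and the win counts in the example are made up, not results from this work.

```python
from math import comb


def sign_test_p_value(wins_a, wins_b):
    """Two-sided sign test; ties are assumed to be dropped beforehand."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins for the weaker side under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: system A better on 60 sentences, system B on 35.
p = sign_test_p_value(60, 35)
print(p, p < 0.05)
```

The test only asks which system wins on each sentence, so it makes no assumption about the distribution of score differences.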

19 Conclusion
Our contributions:
– Enrich the translation model without using additional data.
– Propose a principled way to merge phrase tables generated at different granularities.

20 Q & A
Thank you!

