
1 Discriminative Word Alignment with Conditional Random Fields Phil Blunsom & Trevor Cohn [ACL2006] Eiji ARAMAKI

2

3 Philipp Koehn

4 Outline: Motivation / How to use a CRF for the alignment task / Features / Experiments / Conclusion

5 After TMI & MT-Summit: no Machine Learning (ML)-junkie papers – except for Rens Bod's unsupervised parsing. Why is Machine Translation (MT) far from ML? – BECAUSE its standard components are unsupervised and EM-based: the translation model (alignment) is an IBM model, the language model is an n-gram model, plus a decoder. – BUT: few studies have challenged supervised alignment ("few" means 3-4 papers).

6 What is Alignment? Estimating the word correspondences between two sentences. Usually word-to-word [IBM models]: "His friend runs the company" ↔ 彼の (his) / 友人は (friend) / 会社を (company) / 経営している (runs). Simple! BUT it suffers from different word orders.

7 What is Alignment? Another style of alignment: tree-to-tree [MSR-MT], matching the syntax trees of "His friend runs the company" and 彼の / 友人は / 会社を / 経営している. Promising! BUT it suffers from parsing errors.

8 Points of this paper (1) Alignment ≠ labeling task – how to use a CRF for the alignment task – (c.f.) shift-reduce for dependency parsing – (c.f.) IOB representation for NER (2) What kind of features?

9 Outline: Motivation / How to use a CRF for the alignment task / Features / Experiments / Conclusion

10 Conditional Random Fields (CRF) Log-linear model: for a sentence x = (x_1, …, x_T) and a label sequence y = (y_1, …, y_T),

P(y|x) = (1/Z) exp( ∑_k λ_k f_k(x, y) ),  Z = ∑_y' exp( ∑_k λ_k f_k(x, y') )

– λ_k : parameter (feature weight)
– f_k : feature function
– Z : partition function
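To make the log-linear form concrete, here is a minimal Python sketch (my own illustration, not code from the paper; the toy feature, labels, and data are invented) that scores a label sequence and computes the partition function Z by brute-force enumeration (real CRF toolkits use forward-backward instead):

```python
import itertools
import math

def crf_probability(x, y, labels, feature_funcs, weights):
    """P(y|x) = exp(sum_k lambda_k f_k(x, y)) / Z, where Z sums over
    every label sequence of length len(x) -- toy sizes only."""
    def score(seq):
        return sum(w * f(x, seq) for f, w in zip(feature_funcs, weights))

    Z = sum(math.exp(score(seq))
            for seq in itertools.product(labels, repeat=len(x)))
    return math.exp(score(y)) / Z

# Toy feature: count positions where capitalisation agrees with an NNP tag
f_cap = lambda x, y: sum(1.0 for w, t in zip(x, y)
                         if w[0].isupper() == (t == "NNP"))
x = ["His", "friend", "runs", "the", "company"]
y = ["NNP", "NN", "VBZ", "DT", "NN"]
print(crf_probability(x, y, ["NNP", "NN", "VBZ", "DT"], [f_cap], [1.0]))
```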

11 CRF for POS tagging [Figure: the sentence x = "His friend runs the company" with a label sequence y = DT NN …, scored by transition features (遷移素性) between adjacent labels plus observation features (観測素性) between each word and its label.]

12 CRF for Alignment [Figure: the English sentence "His friend runs the company" (English IDs 1-5) and the Japanese sentence 彼の / 友人は / 会社を / 経営している (Japanese IDs 1-4); the label sequence y is the alignment a1=1, a2=2, a3=4, a4=3, a5=4, i.e. each position carries the ID of the word it aligns to in the other sentence. Part of the observation features (観測素性の一部) depends on the alignment.]

13 Feature example (dictionary feature) Given a word pair (J_t, E_{a_t}): found in the dictionary → F_dic(J_t, E_{a_t}) = 1; otherwise → F_dic(J_t, E_{a_t}) = 0. [Figure: aligning "His" to 彼の, a dictionary pair, gives F_dic(J_1, E_{a_1}) = 1; aligning it to 友人は instead gives F_dic(J_1, E_{a_1}) = 0.]
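A tiny sketch of this feature (my own code; the dictionary entries are illustrative, not from the paper):

```python
def f_dic(j_word, e_word, dictionary):
    """Dictionary feature: 1 if (Japanese word, English word) is a
    known translation pair, else 0."""
    return 1.0 if (j_word, e_word) in dictionary else 0.0

dictionary = {("彼の", "His"), ("友人は", "friend")}  # toy entries
print(f_dic("彼の", "His", dictionary))    # 1.0: alignment supported by the dictionary
print(f_dic("友人は", "His", dictionary))  # 0.0: pair not in the dictionary
```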

14 (I'm afraid…) Doesn't allowing no alignment (= NULL) change the number of labels? No! NULL is also a label (align to position 0, φ). Example: aligning "His friend runs the company" to 彼の / 家は / 遠い ("his house is far", IDs 1-3 plus 0 = φ) gives a1=1, a2=0, a3=0, a4=0, a5=0: everything except "His" aligns to NULL.

15 Formalization Conventional CRF:

P(y|x) = (1/Z) exp( ∑_t ∑_k λ_k f_k(t, y_{t-1}, y_t, x) ),  Z = ∑_y' exp( ∑_t ∑_k λ_k f_k(t, y'_{t-1}, y'_t, x) )

– t: ranges over the indices of the SRC sentence (f)
– k: ranges over the features
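Extending the earlier sketch to the alignment case (again my own illustration, not the paper's code: label 0 encodes NULL as slide 14 describes, and the single Markov-style feature is invented for demonstration):

```python
import itertools
import math

def alignment_probability(a, e, f, feats, weights):
    """P(a|e,f): a[t] is the position in e (0 = NULL) that f[t] aligns to.
    Each feature sees (t, a_prev, a_t, e, f); Z enumerates all
    (|e|+1)^|f| alignments, so this is for toy sizes only."""
    labels = range(len(e) + 1)  # position 0 is the NULL label
    def score(seq):
        return sum(w * h(t, seq[t - 1] if t > 0 else 0, seq[t], e, f)
                   for t in range(len(f))
                   for h, w in zip(feats, weights))
    Z = sum(math.exp(score(seq))
            for seq in itertools.product(labels, repeat=len(f)))
    return math.exp(score(a)) / Z

# Markov-style feature: negated jump width, so monotone steps score higher
jump = lambda t, a_prev, a_t, e, f: (-abs(a_t - a_prev - 1)
                                     if a_prev and a_t else 0.0)
e = ["His", "friend", "runs", "the", "company"]
f = ["彼の", "友人は", "会社を", "経営している"]
print(alignment_probability((1, 2, 4, 3), e, f, [jump], [1.0]))
```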

16 Outline: Motivation / How to use a CRF for the alignment task / Features / Experiments / Conclusion

17 Features (1/4) [Section 3.1] Dice:

Dice(e, f) = 2 C_EF(e, f) / ( C_E(e) + C_F(f) )

– C_E(e) and C_F(f) are the occurrence counts of word e on the English side and word f on the foreign side; C_EF(e, f) is their co-occurrence count
– Suffers from singletons – (c.f.) is a pair seen 1/1 times as reliable as one seen 100/100 times?
IBM Model 1 (see the Knight workbook @ JHU)
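A short sketch of the Dice score and its singleton problem (my own code; all counts are fabricated for illustration):

```python
from collections import Counter

def dice(e, f, cooc, count_e, count_f):
    """Dice(e, f) = 2 * C_EF(e, f) / (C_E(e) + C_F(f))."""
    denom = count_e[e] + count_f[f]
    return 2.0 * cooc[(e, f)] / denom if denom else 0.0

cooc = Counter({("company", "会社を"): 100, ("runs", "経営している"): 1})
count_e = Counter({"company": 100, "runs": 1})
count_f = Counter({"会社を": 100, "経営している": 1})
print(dice("company", "会社を", cooc, count_e, count_f))    # 1.0
print(dice("runs", "経営している", cooc, count_e, count_f))  # 1.0: the singleton scores just as high
```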

18 Features (2/4) Orthographic features – BASIC IDEA: in European languages, similar spelling often means translation – edit distance between the terms, with and without vowels. POS tags. Bilingual dictionary – whether the pair is a dictionary match or not.
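A sketch of such orthographic features (my own code; the paper's exact feature definitions may differ): Levenshtein edit distance on the raw strings and on vowel-stripped strings:

```python
def edit_distance(a, b):
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def strip_vowels(word):
    return "".join(c for c in word if c.lower() not in "aeiou")

# An English-French cognate pair (example of mine)
print(edit_distance("responsible", "responsable"))   # 1
print(edit_distance(strip_vowels("responsible"),
                    strip_vowels("responsable")))    # 0
```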

19 Features (3/4) Markov features – BASIC IDEA: monotone alignment is preferred. JumpWidth(t-1, t) = abs(a_t - a_{t-1} - 1). Example: "His" → 彼の (a1=1), "friend" → 友人は (a2=2): JumpWidth = abs(2 - 1 - 1) = 0.

20 Features (3/4) Markov features, continued: "runs" → 経営している (a3=4), "the" → 会社を (a4=3): JumpWidth = abs(3 - 4 - 1) = 2, so this non-monotone step gets a larger jump width.
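The jump-width computation is a one-liner; a sketch with the numbers from slides 19-20:

```python
def jump_width(a_prev, a_cur):
    """Markov feature: abs(a_t - a_{t-1} - 1). 0 for a perfectly
    monotone step; larger values for bigger jumps."""
    return abs(a_cur - a_prev - 1)

print(jump_width(1, 2))  # 0: monotone step (His -> 彼の, friend -> 友人は)
print(jump_width(4, 3))  # 2: backward jump (runs -> 経営している, the -> 会社を)
```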

21 Features (4/4) Relative sentence position – BASIC IDEA: words at the same relative positions in both sentences are preferred. RSP(t) = abs( a_t / |e| - t / |f| ).
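A sketch with the slides' 5-word English and 4-word Japanese sentences (my own numbers; positions are 1-based as on the slides):

```python
def rsp(t, a_t, len_e, len_f):
    """Relative sentence position: abs(a_t/|e| - t/|f|); small when
    the two aligned words sit at similar relative positions."""
    return abs(a_t / len_e - t / len_f)

print(rsp(t=1, a_t=1, len_e=5, len_f=4))  # 0.05: near the diagonal
print(rsp(t=1, a_t=5, len_e=5, len_f=4))  # 0.75: far off the diagonal
```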

22

23 Outline: Motivation / How to use a CRF for the alignment task / Features / Experiments / Conclusion

24 Experimental Setting Corpus – NAACL 2004 WS shared task [Mihalcea & Pedersen] – ACL 2005 shared task [Martin+]. Evaluation:

AER(A, S, P) = 1 - ( |A∩S| + |A∩P| ) / ( |A| + |S| )

– A: system output
– S: gold standard, sure links (E→F ∩ F→E)
– P: gold standard, possible links (E→F ∪ F→E)
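A sketch of the AER computation on fabricated link sets (links are invented; note that S ⊆ P by construction):

```python
def aer(a, s, p):
    """Alignment Error Rate:
    AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|)."""
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

s = {(1, 1), (2, 2)}           # sure links (annotator intersection)
p = s | {(3, 4)}               # possible links (annotator union)
a = {(1, 1), (3, 4), (4, 3)}   # system output
print(aer(a, s, p))            # 1 - (1 + 2) / (3 + 2) = 0.4
```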

25 Results

26 Walking through output examples, with and without the Markov features.

27 Outline: Motivation / How to use a CRF for the alignment task / Features / Experiments / Conclusion

28 A discriminative model with only a few hundred training sentences outperforms the generative IBM Model 4. MAYBE this method is helpful only between similar languages – BECAUSE its features prefer monotone alignment – BUT: … (from here on, our future task)

29 Reference "A Discriminative Matching Approach to Word Alignment" [Ben Taskar+, HLT-EMNLP 2005] – another discriminative alignment approach – BUT: deals only with 1-to-1/φ alignments – motivated this paper. "Acronym Extraction Using Conditional Random Fields" (Conditional Random Fields を用いた略語抽出) [Okazaki+, YANS 2007] – aligns acronyms to their full spellings – MAYBE a similar task.

30 Three promising future tasks in MT (in my opinion) (1) Discriminative alignment [this paper] (2) Monolingual machine translation – context-based MT (CBMT) [Grassiany+, AAMT 2006] (3) How to use comparable corpora – [Quirk+, MT-Summit 2007]

31

