1 Hitting The Right Paraphrases In Good Time Stanley Kok Dept. of Comp. Sci. & Eng. Univ. of Washington Seattle, USA Chris Brockett NLP Group Microsoft.


1 Hitting The Right Paraphrases In Good Time
Stanley Kok, Dept. of Comp. Sci. & Eng., Univ. of Washington, Seattle, USA
Chris Brockett, NLP Group, Microsoft Research, Redmond, USA

2 Overview: Motivation, Background, Hitting Time Paraphraser, Experiments, Future Work

3 Overview: Motivation, Background, Hitting Time Paraphraser, Experiments, Future Work

4 What’s a paraphrase of…
“is on good terms with” → Paraphrase System → “is friendly with”, “is a friend of”, …
Applications: query expansion, document summarization, natural language generation, question answering, etc.

5 What’s a paraphrase of…
“is on good terms with” → Paraphrase System → “is friendly with”, “is a friend of”, …
Input: bilingual parallel corpora

6 Bilingual Parallel Corpus
  …the cost dynamic is under control…   |  …die kostenentwicklung unter kontrolle…
  …keep the cost in check…              |  …die kosten unter kontrolle…
  …                                     |  …
Phrase Table
  English Phrase (E) | German Phrase (G) | P(G|E) | P(E|G)
  under control      | unter kontrolle   | 0.75   | 0.40
  in check           | unter kontrolle   | 0.60   | 0.20
  …                  | …                 | …      | …

7 State of the Art
BCB system [Bannard & Callison-Burch, ACL’05]: $P(E_2 \mid E_1) \approx \sum_{C} \sum_{G} P(E_2 \mid G)\, P(G \mid E_1)$
SBP system [Callison-Burch, EMNLP’08]: $P(E_2 \mid E_1) \approx \sum_{C} \sum_{G} P(E_2 \mid G, \mathrm{syn}(E_1))\, P(G \mid E_1, \mathrm{syn}(E_1))$
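
The BCB-style sum over pivot phrases G can be illustrated with the two phrase-table rows shown earlier in the slides. This is only a minimal sketch: the dictionary layout and the function name `pivot_paraphrase_probs` are assumptions made for illustration, and the outer sum over corpora C is omitted because a single phrase table is used.

```python
from collections import defaultdict

# Example rows from the phrase-table slide:
# (English phrase, German phrase) -> (P(G|E), P(E|G)).
# The dictionary layout is an assumption made for this sketch.
phrase_table = {
    ("under control", "unter kontrolle"): (0.75, 0.40),
    ("in check", "unter kontrolle"): (0.60, 0.20),
}

def pivot_paraphrase_probs(query, table):
    """Approximate P(E2 | query) as the sum over pivots G of P(E2|G) * P(G|query)."""
    scores = defaultdict(float)
    for (e1, g), (p_g_given_e1, _) in table.items():
        if e1 != query:
            continue
        # Every other English phrase aligned to the same pivot G is a candidate.
        for (e2, g2), (_, p_e2_given_g) in table.items():
            if g2 == g and e2 != query:
                scores[e2] += p_e2_given_g * p_g_given_e1
    return dict(scores)

probs = pivot_paraphrase_probs("under control", phrase_table)
# "in check" scores P(in check | unter kontrolle) * P(unter kontrolle | under control)
# = 0.20 * 0.75 = 0.15 (up to floating-point rounding).
```

With only one pivot phrase the sum has a single term; with more phrase-table rows, each shared pivot would add its own product to the score.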

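8 Graphical View
[Figure: graph with English phrase nodes E1–E4 (including “in check” and “under control”) and foreign phrase nodes G1–G3 (including “unter kontrolle”) and F1–F2. Edges carry translation probabilities such as P(G1|E1), P(E2|G1), P(F2|E1), P(E2|F2).]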

9 Random Walks, Hitting Times
Path lengths > 2
General graph
Add nodes to represent domain knowledge
[Figure: graph over phrase nodes E1–E4, G1–G3, F1–F2.]

10 Overview: Motivation, Background, Hitting Time Paraphraser, Experiments, Future Work

11 Random Walk
Begin at node A
Randomly pick neighbor n
[Figure: graph with nodes A–F; the walk starts at A.]

12 Random Walk
Begin at node A
Randomly pick neighbor n
Move to node n
[Figure: graph with nodes A–F.]

13 Random Walk
Begin at node A
Randomly pick neighbor n
Move to node n
Repeat
[Figure: graph with nodes A–F.]
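
The random-walk procedure on these slides can be sketched in a few lines of Python. The edge set below is an assumption: the transcript names the nodes A–F but does not list the edges, so this adjacency was chosen only so that walks such as A, D, E, D, F, E are possible.

```python
import random

# Hypothetical undirected graph on nodes A-F (edges are an assumption;
# the slides only name the nodes).
graph = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["A", "E", "F"],
    "E": ["D", "F"],
    "F": ["D", "E"],
}

def random_walk(graph, start, steps, rng):
    """Begin at `start`; repeatedly pick a uniformly random neighbor and move to it."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(graph[path[-1]]))
    return path

# A seeded generator keeps the example reproducible.
path = random_walk(graph, "A", steps=5, rng=random.Random(0))
```

Here neighbors are picked uniformly; a weighted graph would instead pick neighbors in proportion to edge weight.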

14 Hitting Time from node i to j
Expected number of steps, starting from node i, before node j is visited for the first time
Smaller hitting time → closer to start node i
Truncated Hitting Time [Sarkar & Moore, UAI’07]: random walks are limited to T steps
Computed efficiently and with high probability by sampling random walks [Sarkar, Moore & Prakash, ICML’08]

15 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk so far: A
[Figure: graph with nodes A–F.]

16 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk so far: A, D

17 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk so far: A, D, E

18 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk so far: A, D, E, D

19 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk so far: A, D, E, D, F

20 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk so far: A, D, E, D, F, E

21 Finding Truncated Hitting Time By Sampling
T = 5. Sampled walk: A, D, E, D, F, E
Hitting times from A: h_AA = 0, h_AD = 1, h_AE = 2, h_AF = 4, h_AB = 5, h_AC = 5
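
The hitting times read off the sampled walk can be reproduced with a short sketch. The function name `first_hit_times` is an assumption for illustration. It records the first step at which each node appears in one walk and assigns T to nodes that never appear.

```python
def first_hit_times(walk, nodes, T):
    """Truncated hitting times implied by a single walk of at most T steps.

    A node's time is the index of its first appearance in the walk;
    nodes not seen within T steps get the truncation value T.
    """
    times = {node: T for node in nodes}
    for step, node in enumerate(walk[: T + 1]):
        times[node] = min(times[node], step)
    return times

# The walk and T from the slides.
walk = ["A", "D", "E", "D", "F", "E"]
times = first_hit_times(walk, nodes="ABCDEF", T=5)
print(times)  # {'A': 0, 'B': 5, 'C': 5, 'D': 1, 'E': 2, 'F': 4}
```

An estimate of the truncated hitting time would average these per-walk values over many sampled walks.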

22 Overview: Motivation, Background, Hitting Time Paraphraser, Experiments, Future Work

23 Hitting Time Paraphraser (HTP)
Phrase → HTP → Paraphrases, e.g. “is on good terms with” → “is friendly with”, “is a friend of”, …
Input phrase tables: English-German, English-French, German-French, etc.

24 Graph Construction

25 Graph Construction

26 Graph Construction
BFS from query phrase up to depth d or up to a maximum number n of nodes
d = 6, n = 50,000
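
A breadth-first search with both a depth limit and a node limit might look like the sketch below. The `neighbors` mapping stands in for phrase-table lookups and is an assumption, as is the placeholder entry `hypothetical-foreign-phrase`. Only “under control”, “in check” and “unter kontrolle” come from the slides.

```python
from collections import deque

def bfs_subgraph(neighbors, query, max_depth, max_nodes):
    """Collect nodes reachable from `query`, stopping at max_depth or max_nodes.

    Returns a dict mapping each collected node to its BFS depth.
    """
    depth = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand nodes at the depth limit
        for nxt in neighbors.get(node, ()):
            if nxt not in depth:
                if len(depth) >= max_nodes:
                    return depth  # node budget exhausted
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth

# Toy translation links (assumed for illustration).
neighbors = {
    "under control": ["unter kontrolle"],
    "unter kontrolle": ["under control", "in check"],
    "in check": ["unter kontrolle", "hypothetical-foreign-phrase"],
    "hypothetical-foreign-phrase": ["in check"],
}
nodes = bfs_subgraph(neighbors, "under control", max_depth=2, max_nodes=50_000)
```

With `max_depth=2` the search reaches “in check” but does not expand it, so the placeholder phrase is left out. The slide's settings would be `max_depth=6` and `max_nodes=50_000`.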

27 Graph Construction
[Figure: graph with edge weights 0.25 and 0.35 shown.]

28 Graph Construction
[Figure: graph with edge weight 0.6 shown.]

29 Graph Construction
[Figure: graph with edge weight 0.5 shown.]

30 Estimate Truncated Hitting Times
Run m truncated random walks to estimate the truncated hitting time of each node
T = 10, m = 1,000,000
Prune nodes with hitting times = T
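
The estimate-then-prune step can be sketched as follows. This is a simplified illustration under stated assumptions: the chain graph, the uniform neighbor choice, and the function names are hypothetical, and m is kept small here, whereas the slide uses T = 10 and m = 1,000,000.

```python
import random

def estimate_truncated_hitting_times(graph, start, T, m, rng):
    """Average, over m sampled walks of T steps, the first step at which each node is hit.

    Nodes not hit within a walk contribute T for that walk.
    """
    totals = {node: 0 for node in graph}
    for _ in range(m):
        first_hit = {start: 0}
        current = start
        for step in range(1, T + 1):
            current = rng.choice(graph[current])
            first_hit.setdefault(current, step)  # keep only the first visit
        for node in graph:
            totals[node] += first_hit.get(node, T)
    return {node: totals[node] / m for node in graph}

def prune_unreached(graph, hitting_times, T):
    """Drop nodes whose estimated truncated hitting time equals T."""
    keep = {node for node, h in hitting_times.items() if h < T}
    return {n: [v for v in nbrs if v in keep] for n, nbrs in graph.items() if n in keep}

# Hypothetical chain A - B - C - D; with T = 2 only A and B can be hit before step T.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
h = estimate_truncated_hitting_times(chain, "A", T=2, m=1000, rng=random.Random(0))
pruned = prune_unreached(chain, h, T=2)
```

On this chain the result does not depend on the random seed: B is always hit at step 1, while C and D always score exactly T, so only A and B survive pruning.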

31 Add Ngram Nodes
Phrase nodes: “achieve the goal”, “achieve the aim”, “reach the objective”
Ngram nodes: “the”, “achieve the”, “the aim”, “reach”, “objective”, …
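
Linking phrase nodes to shared n-gram nodes can be sketched as below. The maximum n-gram length (`max_n=2`) and the function names are assumptions; the slide shows unigram and bigram nodes but does not state the limit used.

```python
def ngrams(phrase, max_n):
    """All word n-grams of the phrase for n = 1 .. max_n."""
    words = phrase.split()
    return {
        " ".join(words[i : i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }

def ngram_node_edges(phrases, max_n=2):
    """Map each n-gram node to the set of phrase nodes that contain it."""
    edges = {}
    for phrase in phrases:
        for gram in ngrams(phrase, max_n):
            edges.setdefault(gram, set()).add(phrase)
    return edges

# Phrase nodes from the slide.
phrases = ["achieve the goal", "achieve the aim", "reach the objective"]
edges = ngram_node_edges(phrases)
```

In this sketch the node “the” connects all three phrases, while “achieve the” connects only the first two, so a random walk can move between phrases that share words even when no translation edge links them.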

32 Add “Syntax” Nodes
Phrase nodes: “whose goal is”, “the aim is”, “the objective is”, “what goal”
“Syntax” nodes: start with article, end with be, start with interrogatives

33 Add Not-Substring-Of Nodes
Phrase nodes: “reach the”, “reach the aim”, “reach the objective”, “objective”
Feature node: not-substring-of

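34 Feature Nodes
Node types: ngram nodes, “syntax” nodes, not-substring nodes, phrase nodes
[Figure: labels p1, p2, p3, p4 with values 0.4, 0.1, 0.4, 0.1; the pairing of labels to values is not recoverable from the transcript.]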

35 Re-estimate Truncated Hitting Times
Run m truncated random walks again
Rank paraphrases in increasing order of hitting times

36 Overview: Motivation, Background, Hitting Time Paraphraser, Experiments, Future Work

37 Data
Europarl dataset [Koehn, MT-Summit’05]
Use 6 of 11 languages: English, Danish, German, Spanish, Finnish, Dutch
About a million sentences per language
English−Foreign phrasal alignments by GIZA++ [Callison-Burch, EMNLP’08]
Foreign−Foreign phrasal alignments by MSR aligner

38 Comparison Systems
SBP system [Callison-Burch, EMNLP’08]
HTP with no feature nodes
HTP with bipartite graph

39 Evaluation Methodology
NIST dataset: 4 English translations per Chinese sentence; 33,216 English translations
Randomly selected 100 English phrases from the 1- to 4-grams that appear in both the NIST and Europarl datasets
Excluded stop words, numbers, and phrases containing periods or commas

40 Methodology
For each phrase, randomly selected a sentence from the NIST dataset containing it
Substituted the top 1 to 10 paraphrases for the phrase

41 Methodology
Manually evaluated the resulting sentences:
0: Clearly wrong; grammatically incorrect or does not preserve meaning
1: Minor grammatical errors (e.g., subject-verb disagreement, wrong tenses), or meaning largely but not completely preserved
2: Totally correct; grammatically correct and meaning is preserved
Correct: 1 and 2; Wrong: 0
Two evaluators; Kappa = 0.62 (substantial agreement)
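
For readers unfamiliar with the agreement statistic, Cohen's kappa for two evaluators can be computed as in the sketch below. The labels are made up for illustration and are not the evaluators' data. The slide also does not say whether kappa was computed on the three-level scores or on the binary correct/wrong labels, so the binary case here is an assumption.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).

    Assumes chance agreement is below 1, i.e. the raters do not both
    use a single identical label for every item.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up binary judgments (1 = correct, 0 = wrong) for ten sentences.
rater_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)
# Observed agreement 0.8, chance agreement 0.5, so kappa is about 0.6.
```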

42 HTP vs. SBP
[Figure: table of query phrases q1 … q100, each with its ranked paraphrase lists under HTP and SBP.]
HTP: 0.71; SBP: 0.53

43 HTP vs. SBP
[Figure: same table of query phrases and ranked paraphrase lists.]
373 paraphrases per system
HTP: 0.56; SBP: 0.39

44 HTP vs. SBP
[Figure: same table of query phrases and ranked paraphrase lists.]
483 paraphrases; 0.54

45 HTP vs. SBP
[Figure: same table of query phrases and ranked paraphrase lists.]
Values shown: 0.53, 0.50, 0.71, 0.61

46 HTP vs. SBP
[Figure: same table of query phrases and ranked paraphrase lists.]
Values shown: 0.54, 0.39, 0.32, 0.43
975 paraphrases; 373 paraphrases; 492 paraphrases
420 correct paraphrases; 145 correct paraphrases
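
The correct-paraphrase counts on the last comparison slide are consistent with two of the ratios shown there. The check below is plain arithmetic. Pairing 420 with 975 and 145 with 373 is inferred from the numbers working out, since the transcript does not state which count belongs to which total.

```python
def precision(correct, total):
    """Fraction of evaluated paraphrases judged correct."""
    return correct / total

# Pairings are an inference from the arithmetic, not stated in the transcript.
larger_set = precision(420, 975)
smaller_set = precision(145, 373)
print(f"{larger_set:.2f} {smaller_set:.2f}")  # 0.43 0.39
```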

47 Timings
  System | Timing (secs/phrase)
  HTP    | 48
  SBP    | 468

48 Overview: Motivation, Background, Hitting Time Paraphraser, Experiments, Future Work

49 Future Work
Apply HTP to languages other than English
Evaluate HTP impact on applications, e.g., improve performance of resource-sparse machine translation systems
Add more features
etc.

50 Conclusion
HTP: a paraphrase system based on random walks
Good paraphrases have smaller hitting times
General graph
Path length > 2
Incorporate domain knowledge
HTP outperforms state-of-the-art

