LREC 2012, May 24th, 2012. Carnegie Mellon. John killed Mary. Can a machine recognize the meaning similarity?


1 LREC 2012, May 24th, 2012. Carnegie Mellon

2 Can a machine recognize the meaning similarity?
John killed Mary.

3 Can a machine recognize the meaning similarity?
John killed Mary.
Mary was killed by John. (passivization)

4 Can a machine recognize the meaning similarity?
John killed Mary.
Mary was killed by John. (passivization)
John is the killer of Mary. (nominalization)

5 Can a machine recognize the meaning similarity?
John killed Mary.
Mary was killed by John. (passivization)
John is the killer of Mary. (nominalization)
John assassinated Mary. (entailment)

6 Can a machine recognize the meaning similarity?
John killed Mary.
Mary was killed by John. (passivization)
John is the killer of Mary. (nominalization)
John assassinated Mary. (entailment)
John is the 187 suspect of Mary. (slang)
187 means: “California penal code for murder, made popular in west coast gangsta rap”. (From The Urban Dictionary dot com)
Usage: “This is Gavilan. In pursuit of possible 187 suspects.” (From the movie Hollywood Homicide)

7 Can a machine recognize the meaning similarity?
John killed Mary.
Mary was killed by John. (passivization)
John is the killer of Mary. (nominalization)
John assassinated Mary. (entailment)
John is the 187 suspect of Mary. (slang)
John terminated Mary with extreme prejudice. (euphemism)
“In military and other covert operations, terminate with extreme prejudice is a euphemism for execution” (Wikipedia)

8 Can a machine recognize the meaning similarity?
John killed Mary.
Mary was killed by John. (passivization)
John is the killer of Mary. (nominalization)
John assassinated Mary. (entailment)
John is the 187 suspect of Mary. (slang)
John terminated Mary with extreme prejudice. (euphemism)
Humans use various expressions to convey the same or similar meaning, which makes it difficult for machines to “read” text.

9 Can a machine recognize the meaning similarity?
X killed Y.
Y was killed by X. (passivization)
X is the killer of Y. (nominalization)
X assassinated Y. (entailment)
X is the 187 suspect of Y. (slang)
X terminated Y with extreme prejudice. (euphemism)
Goal: automatically acquire paraphrase patterns that are lexically diverse.

10 Paraphrase Recognition / Generation is a common need in various applications
Automatic Evaluation
– In Machine Translation [Kauchak & Barzilay, 2006][Padó et al., 2009]
– In Text Summarization [Zhou et al., 2006]
– In Question Answering [Ibrahim et al., 2003][Dalmas, 2007]
Text Summarization [Lloret et al., 2008][Tatar et al., 2009]
Information Retrieval [Parapar et al., 2005][Riezler et al., 2007]
Information Extraction [Romano et al., 2006]
Question Answering [Harabagiu & Hickl, 2006][Dogdan et al., 2008]
Collocation Error Correction [Dahlmeier and Ng, 2011]

11 Outline
Motivation
Method: Diversifiable Bootstrapping
Experiment
Related Works
Conclusion

12 Bootstrap Paraphrase Learning
INPUT: monolingual plain corpus, seed instances
BOOTSTRAP LEARNING ALGORITHM
OUTPUT: more instances, patterns

13 Bootstrap Paraphrase Learning
INPUT: seed instances

X (killer)            Y (victim)
John Wilkes Booth     Abraham Lincoln
Mark David Chapman    John Lennon
Nathuram Godse        Mahatma Gandhi
Yigal Amir            Yitzhak Rabin
John Bellingham       Spencer Perceval
Mohammed Bouyeri      Theo van Gogh
Dan White             Mayor George Moscone
Sirhan Sirhan         Robert F. Kennedy
El Sayyid Nosair      Meir Kahane
Mijailo Mijailovic    Anna Lindh

14 Bootstrap Paraphrase Learning
OUTPUT: patterns
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
X assassinated Y in
...
Unlike many other bootstrapping works, the goal is to acquire patterns, not instances.

16 Bootstrap Learning Algorithm
1st iteration: Seed Instances → Sentences → Extracted Patterns → Ranked Patterns → Sentences → Extracted Instances → Ranked Instances → 2nd iteration ...
This framework is based on ESPRESSO [Pantel & Pennacchiotti, 2006].

17 Bootstrap Learning Algorithm: Search sentences by instances
Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.
John Wilkes Booth, the assassin of Abraham Lincoln, was inspired by Brutus.
In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.
...

18 Bootstrap Learning Algorithm: Search sentences by instances
Edwin Booth was brother of X, the assassin of Y.
X, the assassin of Y, was inspired by Brutus.
In 1969 Berman was part of the defense team of X, the assassin of Y.
...
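The step from slide 17 to slide 18 (replacing the seed names with the slots X and Y) could look roughly like the sketch below. The function name `generalize` and the plain substring matching are assumptions made for illustration; the slides do not say how mentions are matched in the corpus.

```python
def generalize(sentence, x, y):
    """Replace the seed pair (x, y) in a sentence with the slots X and Y.

    Returns None when the sentence does not mention both arguments.
    Exact substring matching is an assumption of this sketch.
    """
    if x not in sentence or y not in sentence:
        return None
    # Replace the longer name first so a name nested inside the other
    # one is not partially overwritten.
    pairs = sorted([(x, "X"), (y, "Y")], key=lambda pair: len(pair[0]), reverse=True)
    for name, slot in pairs:
        sentence = sentence.replace(name, slot)
    return sentence


seeds = [
    ("John Wilkes Booth", "Abraham Lincoln"),
    ("Sirhan Sirhan", "Robert F. Kennedy"),
]
sentences = [
    "Edwin Booth was brother of John Wilkes Booth, the assassin of Abraham Lincoln.",
    "In 1969 Berman was part of the defense team of Sirhan Sirhan, the assassin of Robert F. Kennedy.",
]

for sentence in sentences:
    for x, y in seeds:
        generalized = generalize(sentence, x, y)
        if generalized is not None:
            print(generalized)
```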

19 Bootstrap Learning Algorithm: Extract patterns from sentences
… brother of X, the assassin of Y.
X, the assassin of Y, was
… team of X, the assassin of Y.

20 Bootstrap Learning Algorithm: Extract patterns from sentences
… brother of X, the assassin of Y.
X, the assassin of Y, was
… team of X, the assassin of Y.
Extracted Pattern: Longest Common Substring among retrieved sentences
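A minimal sketch of the longest-common-substring step, assuming it operates on tokens and is applied to pairs of generalized sentences. The tokenizer, the pairwise application, and the helper names are assumptions; the slide only states that the extracted pattern is the longest common substring among retrieved sentences.

```python
import re


def tokenize(sentence):
    # Crude tokenizer (an assumption): words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def longest_common_substring(a, b):
    """Return the longest contiguous token run shared by token lists a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def detokenize(tokens):
    # Re-attach commas and periods to the preceding word.
    return re.sub(r" ([,.])", r"\1", " ".join(tokens))


s1 = tokenize("Edwin Booth was brother of X, the assassin of Y.")
s2 = tokenize("X, the assassin of Y, was inspired by Brutus.")
print(detokenize(longest_common_substring(s1, s2)))
```

In a fuller version one would presumably keep only substrings that contain both X and Y, and count how often each candidate recurs across sentence pairs.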

21 Bootstrap Learning Algorithm: Score and rank patterns
Rank by reliability of pattern: r(p).
r(p) is based on an association measure with each instance in the corpus.
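The slide does not spell out r(p). As a rough illustration, the sketch below assumes the ESPRESSO-style definition, in which a pattern's reliability is the average, over instances, of the pointwise mutual information between instance and pattern (normalized by the maximum PMI) weighted by the instance's own reliability. The PMI values and names below are made-up toy inputs, not results from the talk.

```python
def pattern_reliability(pattern, instances, pmi, instance_reliability):
    """ESPRESSO-style reliability of a pattern (assumed form, see lead-in).

    pmi maps (instance, pattern) pairs to an association score; pairs that
    never co-occur are treated as 0. instance_reliability maps each
    instance to a score in [0, 1] (seed instances would typically get 1.0).
    """
    max_pmi = max(pmi.values())
    total = sum(
        pmi.get((instance, pattern), 0.0) / max_pmi * instance_reliability[instance]
        for instance in instances
    )
    return total / len(instances)


# Hypothetical toy association scores for two seed pairs and two patterns.
instances = ["booth-lincoln", "chapman-lennon"]
instance_reliability = {"booth-lincoln": 1.0, "chapman-lennon": 1.0}
pmi = {
    ("booth-lincoln", "X, the assassin of Y"): 2.0,
    ("chapman-lennon", "X, the assassin of Y"): 1.0,
    ("booth-lincoln", "X tells his version of Y"): 0.5,
}

for p in ["X, the assassin of Y", "X tells his version of Y"]:
    print(p, pattern_reliability(p, instances, pmi, instance_reliability))
```

Instance reliability r(i) would be computed the same way with the roles of patterns and instances swapped.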

22 Bootstrap Learning Algorithm: Score and rank patterns
Ranked Patterns:
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
...

23 Bootstrap Learning Algorithm: Search sentences by pattern(s)
Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.

24 Bootstrap Learning Algorithm: Extract instances from sentences
Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.
Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.
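Extracting new (X, Y) instances with a ranked pattern can be sketched as a regular-expression match. Treating a run of capitalized words as a name is an assumption standing in for whatever argument detection the actual system uses, and the function names are hypothetical.

```python
import re

# Assumption: an argument is one or more capitalized words.
NAME = r"\b[A-Z]\w*(?: [A-Z]\w*)*"


def pattern_to_regex(pattern):
    """Turn a surface pattern such as 'X, the assassin of Y' into a regex."""
    pieces = []
    for part in re.split(r"\b([XY])\b", pattern):
        if part in ("X", "Y"):
            pieces.append(f"(?P<{part}>{NAME})")  # slot becomes a named group
        else:
            pieces.append(re.escape(part))  # literal text of the pattern
    return re.compile("".join(pieces))


def extract_instances(pattern, sentences):
    regex = pattern_to_regex(pattern)
    found = []
    for sentence in sentences:
        for match in regex.finditer(sentence):
            found.append((match.group("X"), match.group("Y")))
    return found


sentences = [
    "Still shot from the CCTV video footage showing Oguen Samast, the assassin of Hrant Dink.",
    "Henry Bellingham is a descendant of John Bellingham, the assassin of Spencer Perceval.",
]
print(extract_instances("X, the assassin of Y", sentences))
```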

25 Bootstrap Learning Algorithm: Score and rank instances
Rank instances by reliability: r(i) (similar to pattern reliability scoring).
The ranked instances feed the 2nd iteration.

26 Issue: Lack of Lexical Diversity
X, the assassin of Y
assassination of Y by X
X assassinated Y
the assassination of Y by X
of X, the assassin of Y
X assassinated Y in
Words participating in patterns are skewed.
As a solution, we propose Diversifiable Bootstrapping.

27 Diversifiable Bootstrapping
– Original reliability score of a pattern
– How is a pattern lexically different from other patterns originally ranked higher than this?

28 Diversifiable Bootstrapping
– Original reliability score of a pattern
– Interpolation parameter: λ
– How is a pattern lexically different from other patterns originally ranked higher than this?

29 Diversifiable Bootstrapping
– Original reliability score of a pattern
– Interpolation parameter: λ
– How is this pattern lexically different from other patterns originally ranked higher than this?
Key contribution: by adjusting the parameter λ, one can control the degree to which the acquired patterns are diversified.
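The scoring formula itself is not preserved in this transcript. The sketch below is one possible reading of the three labeled terms: a linear interpolation λ·r(p) + (1 − λ)·div(p), where div(p) is taken to be one minus the largest Jaccard word overlap with any pattern originally ranked higher. Both the form of the interpolation and the diversity function are assumptions, and the reliability scores are made-up toy values.

```python
import re


def content_words(pattern):
    # Words of the pattern, ignoring the slots X and Y.
    return {w.lower() for w in re.findall(r"[A-Za-z']+", pattern) if w not in ("X", "Y")}


def diversity(pattern, higher_ranked):
    """1 minus the maximum Jaccard overlap with higher-ranked patterns (assumed measure)."""
    words = content_words(pattern)
    overlaps = []
    for other in higher_ranked:
        other_words = content_words(other)
        union = words | other_words
        overlaps.append(len(words & other_words) / len(union) if union else 0.0)
    return 1.0 - max(overlaps, default=0.0)


def diversified_ranking(reliability, lam):
    """Re-rank patterns by lam * r(p) + (1 - lam) * div(p) (assumed interpolation)."""
    original = sorted(reliability, key=reliability.get, reverse=True)
    rescored = {
        p: lam * reliability[p] + (1.0 - lam) * diversity(p, original[:rank])
        for rank, p in enumerate(original)
    }
    return sorted(rescored, key=rescored.get, reverse=True)


# Hypothetical reliability scores for illustration only.
reliability = {
    "X assassinated Y": 0.9,
    "X assassinated Y in": 0.8,
    "Y was assassinated by X": 0.7,
    "X shot and killed Y": 0.4,
}
print(diversified_ranking(reliability, lam=1.0))  # original order
print(diversified_ranking(reliability, lam=0.5))  # lexically different pattern moves up
```

With λ = 1 the ranking is the undiversified one; lowering λ promotes patterns such as "X shot and killed Y" that share no words with the patterns above them.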

30 Experimental Settings
Bootstrapping Algorithm
– Based on ESPRESSO framework [Pantel & Pennacchiotti, 2006]
– Unlike ESPRESSO, we aim to obtain patterns, not instances
Lexical diversity scoring function
– Based on Shima & Mitamura [2011]
Seed instances: Schlaefer et al. [2006]
Corpus: English Wikipedia

31 Acquired Paraphrases: killed
(no diversification)
X, the assassin of Y; assassination of Y by X; X assassinated Y; the assassination of Y by X; of X, the assassin of Y; X assassinated Y in; X, the man who assassinated Y; Y's assassin, X; of Y's assassin X; of the assassination of Y by X; X shot and killed Y; Y was assassinated by X; named X assassinated Y; Y was shot by X; X to assassinate Y

32 Acquired Paraphrases: killed
First list (no diversification): X, the assassin of Y; assassination of Y by X; X assassinated Y; the assassination of Y by X; of X, the assassin of Y; X assassinated Y in; X, the man who assassinated Y; Y's assassin, X; of Y's assassin X; of the assassination of Y by X; X shot and killed Y; Y was assassinated by X; named X assassinated Y; Y was shot by X; X to assassinate Y
Second list: X, the assassin of Y; X assassinated Y; assassination of Y by X; Y was shot by X; X, who killed Y; the assassination of Y by X; X assassinated Y in; X tells his version of Y; X shoot Y; X murdered Y; Y's killer, X; Y, at the theatre after X; Y, push X to his breaking point; X to assassinate Y; of X, the assassin of Y
Third list: X, the assassin of Y; X, who killed Y; Y was shot by X; X tells his version of Y; X shoot Y; X murdered Y; Y's killer, X; Y, at the theatre after X; Y, push X to his breaking point; X assassinated Y; assassination of Y by X; X to assassinate Y; X kills Y; of X shooting Y; X assassinated Y in

34 Acquired Paraphrases: died-of
First list: X died of Y; X died of Y in; X died of Y on; X died of lung Y; X died of lung Y in; X died of lung Y on; X died of Y in the; X died of Y at; X died of stomach Y; X died of natural Y; X died of breast Y in; X died of a Y; X died of Y in his; X passed away from Y; X died of a Y in
Second list: X died of Y in; X died of Y; X's death from Y; X passed away from Y; Y of X, news; Y of X, a former; that X was suffering from Y; the suspected Y of X; X to breast Y in; X was diagnosed with ovarian Y; X dies of Y; X was dying of Y; X died of lung Y; X died of Y on; X died of lung Y in
Third list: X died of Y in; X's death from Y; X passed away from Y; Y of X, news; Y of X, a former; that X was suffering from Y; the suspected Y of X; X succumbed to lung Y; X to breast Y in; X was diagnosed with ovarian Y; X dies of Y; X was dying of Y; X died of Y; X's death from Y in; X died of lung Y

35 Acquired Paraphrases: was-led-by
First list: Y came to power in X in; Y came to power in X; Y to power in X; Y came to power in X in the; when Y came to power in X in; when Y came to power in X; Y took power in X; Y rose to power in X; after Y came to power in X; Y became chancellor of X; Y came to power in X and; Y seized power in X; Y gained power in X; to power of Y in X; Y's rise to power in X
Second list: Y came to power in X; Y to power in X; regime of Y in X; Y came to power in X in; Y to power in X in; Y became chancellor of X; the rise of Y in X; X's dictator Y; X's president Y; Y took control of X; Y, who ruled X; Y's success and X's; saviour Y declared that X had; X's leader Y; government of Y in X
Third list: Y came to power in X in; regime of Y in X; X's dictator Y; Y became chancellor of X; X's president Y; the rise of Y in X; X's leader Y; Y, who ruled X; Y took control of X; government of Y in X; X, led by Y; quisling had visited Y in X; to flee X after Y; Y in X the year before; X, under the leadership of Y

36 Related Works – Use of Thesaurus
E.g., WordNet [Miller, 1995], FrameNet [Baker et al., 1998], Nomlex [Macleod et al., 1998], VerbNet [Kipper et al., 2006]
Synonyms of “lead (v)” in WordNet:

ID   Words                               Definition
S1   lead, take, direct, conduct, guide  take somebody somewhere
S2   leave, result, lead                 produce as a result or residue
...
S6   run, go, pass, lead, extend         stretch out over a distance, space, time, or scope
...
S14  moderate, chair, lead               preside over

37 Related Works – Use of Thesaurus
E.g., WordNet [Miller, 1995], FrameNet [Baker et al., 1998], Nomlex [Macleod et al., 1998], VerbNet [Kipper et al., 2006]
Synonyms of “lead (v)” in WordNet:

ID   Words                               Definition
S1   lead, take, direct, conduct, guide  take somebody somewhere
S2   leave, result, lead                 produce as a result or residue
...
S6   run, go, pass, lead, extend         stretch out over a distance, space, time, or scope
...
S14  moderate, chair, lead               preside over

WEAKNESS: Need WSD or contexts to avoid false positives.

38 Related Works – Paraphrase Acquisition
Alignment Approach
– Monolingual Comparable Corpus [Shinyama et al., 2002]
– Bilingual Parallel Corpus [Barzilay & McKeown, 2001][Bannard & Callison-Burch, 2005][Callison-Burch, 2008]
Distributional Approach
– Context as Vector Space [Pasca & Dienes, 2005][Bhagat & Ravichandran, 2008]
– Context as Surface Pattern [Lin & Pantel, 2001][Ravichandran & Hovy, 2002]

39 Related Works – Paraphrase Acquisition
Paraphrases acquired by Metzler et al. [2011]
Columns: [Bannard & Callison-Burch, 2005], [Callison-Burch, 2008], [Bhagat & Ravichandran, 2008], [Pasca & Dienes, 2005]
murdered killed inused dieddeadkilled,made beatendeaththat killedinvolved been killeddeathskilled NN peoplefound arediedkilled NNborn lostvictimskilled bydone were killedkillingwere wounded ininjured killbeen killedand woundingseen have dieddead, includingtaken, hundredsreleased

40 Differences from Related Works
Our work requires just a plain non-parallel corpus
– Language portability: good news for resource/tool-scarce languages
– There is potential to learn words used in a closed community (slang, technical terms, etc.) by providing a domain-specific corpus
Bootstrapping works iteratively with minimum supervision
– Less human effort is required than with heavily supervised learning methods or with relying on domain experts to hand-craft patterns

41 Conclusion
We proposed Diversifiable Bootstrapping, which can acquire lexically diverse paraphrase patterns.
We gave initial experimental results on a few relations, which look promising.
As future work, we hope to conduct formal evaluations on larger relations in different languages.

42 Acknowledgment
We also gratefully acknowledge the support of Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA C Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the view of the DARPA, AFRL, or the US government.
This publication was made possible in part by a NPRP grant (No: ) from the Qatar National Research Fund (a member of The Qatar Foundation). The statements made herein are solely the responsibility of the authors.

43 Questions?

