Stanford: Rajat Raina, Aria Haghighi, Christopher Cox, Jenny Finkel, Jeff Michels, Kristina Toutanova, Bill MacCartney, Marie-Catherine de Marneffe, Christopher D. Manning and Andrew Y. Ng
PASCAL Challenges Workshop, April 12, 2005

Our approach
- Represent sentences using syntactic dependencies, but also use semantic annotations, to try to handle language variability.
- Perform semantic inference over this representation, using linguistic knowledge sources.
- Compute a "cost" for inferring the hypothesis from the text: low cost → the hypothesis is entailed.

Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

Sentence processing
- Parse with a standard PCFG parser [Klein & Manning, 2003], trained with some extra sentences from recent news.
- Pattern for name variants, e.g. Al Qaeda: [Aa]l[ -]Qa’?[ie]da
- Used a high-performing Named Entity Recognizer (next slide); force the parse tree to be consistent with certain NE tags.
- Example: American Ministry of Foreign Affairs announced that Russia called the United States...
  (S (NP (NNP American_Ministry_of_Foreign_Affairs)) (VP (VBD announced) (...)))

Named Entity Recognizer
- Trained a robust conditional random field model. [Finkel et al., 2003]
- Interpretation of numeric quantity statements (sketch below). Example:
  T: Kessler's team conducted 60,643 face-to-face interviews with adults in 14 countries.
  H: Kessler's team interviewed more than 60,000 adults in 14 countries.  TRUE
- Annotate numerical values implied by MONEY/DATE named entities: "6.2 bn", "more than 60000", "around 10", ...
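
As a rough illustration of the numeric-quantity idea, here is a minimal sketch; the regular expressions, scale table, and function name are illustrative assumptions, not the system's actual rules.

```python
import re

# Illustrative scale words; the real system's lexicon is not given in the talk.
SCALES = {"bn": 1e9, "billion": 1e9, "m": 1e6, "million": 1e6, "k": 1e3, "thousand": 1e3}

def normalize_quantity(text):
    """Map strings like '6.2 bn' or 'more than 60,000' to (comparator, value)."""
    comparator = "="
    if re.search(r"\bmore than\b|\bover\b", text, re.I):
        comparator = ">"
    elif re.search(r"\bless than\b|\bunder\b", text, re.I):
        comparator = "<"
    elif re.search(r"\baround\b|\babout\b|\bapproximately\b", text, re.I):
        comparator = "~"
    m = re.search(r"([\d.,]+)\s*([a-zA-Z]+)?", text)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    unit = (m.group(2) or "").lower()
    return comparator, value * SCALES.get(unit, 1)

# '60,643' entails 'more than 60,000', since 60643 > 60000 holds.
print(normalize_quantity("60,643 face-to-face interviews"))  # ('=', 60643.0)
print(normalize_quantity("more than 60,000 adults"))         # ('>', 60000.0)
```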

Parse tree post-processing
- Recognize collocations using WordNet.
- Example: Shrek 2 rang up $92 million.
  (S (NP (NNP Shrek) (CD 2)) (VP (VBD rang_up) (NP (QP ($ $) (CD 92) (CD million)))) (. .))
  ("$92 million" is tagged as MONEY.)

Parse tree → Dependencies
- Find syntactic dependencies: transform parse tree representations into typed syntactic dependencies, including a certain amount of collapsing and normalization.
- Example: Bill's mother walked to the grocery store.
  subj(walked, mother)  poss(mother, Bill)  to(walked, store)  nn(store, grocery)
- Dependencies can also be written as a logical formula (sketch below):
  mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
- These are the basic representations.
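
A minimal sketch of how grouped dependencies could be flattened into the logical formula above; the entity grouping and the verb's argument order are supplied by hand here, and build_formula and its inputs are assumptions for illustration.

```python
def build_formula(entities, verb, verb_args, binary_deps):
    """Flatten grouped dependencies into a conjunction of predicates.

    entities: {constant: [word predicates]} -- one constant per real-world entity
              (an 'nn' compound such as 'grocery store' shares a single constant);
    verb: the main verb; verb_args: its argument constants, after an event variable E;
    binary_deps: remaining typed dependencies as (relation, constant, constant).
    """
    literals = [f"{pred}({const})" for const, preds in entities.items() for pred in preds]
    literals.append(f"{verb}(E, {', '.join(verb_args)})")
    literals += [f"{rel}({x}, {y})" for rel, x, y in binary_deps]
    return " ∧ ".join(literals)

# "Bill's mother walked to the grocery store."
print(build_formula(
    entities={"A": ["mother"], "B": ["Bill"], "C": ["grocery", "store"]},
    verb="walked", verb_args=["A", "C"],
    binary_deps=[("poss", "B", "A")]))
# -> mother(A) ∧ Bill(B) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C) ∧ poss(B, A)
```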

Representations
- Dependency graph (figure): nodes walked, mother, Bill, store, grocery; edges subj, poss, to, nn.
- Logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧ store(C) ∧ walked(E, A, C)
- The representation can be made richer with annotations:
  "walked" is a verb (VBD); "Bill" is a PERSON (named entity); "store" is the location/destination of "walked" (ARGM-LOC); ...

Annotations
- Parts-of-speech, named entities: already computed.
- Semantic roles. Example:
  T: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  H1: C and D Technologies acquired Datel Inc.  TRUE
  H2: Datel acquired C and D Technologies.  FALSE
- Use a state-of-the-art semantic role classifier to label verb arguments. [Toutanova et al., 2005]

More annotations
- Coreference. Example:
  T: Since its formation in 1948, Israel ...
  H: Israel was established in 1948.  TRUE
  Use a conditional random field model for coreference detection.
- Note: appositive "references" were previously detected.
  T: Bush, the President of USA, went to Florida.
  H: Bush is the President of USA.  TRUE
- Other annotations: word stems (very useful); word senses (no performance gain in our system).

Event nouns
- Use a heuristic to find event nouns.
- Augment the text representation using WordNet derivational links (NOUN "murder" → VERB "murder"; sketch below).
- Example:
  T: ... witnessed the murder of police commander ...
  H: Police officer killed.  TRUE
  Text logical formula: murder(M) ∧ police_commander(P) ∧ of(M, P)
  Augmented with: murder(E, M, P)
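
A hedged sketch of looking up such derivational links with NLTK's WordNet interface; whether a given noun maps to a verb this way depends on the WordNet version, so the printed output is only indicative.

```python
from nltk.corpus import wordnet as wn

def derived_verbs(noun):
    """Collect verb lemmas reachable from a noun via WordNet derivational links."""
    verbs = set()
    for lemma in wn.lemmas(noun, pos=wn.NOUN):
        for related in lemma.derivationally_related_forms():
            if related.synset().pos() == wn.VERB:
                verbs.add(related.name())
    return verbs

# For an event noun found in the text, add a verb-style predicate:
# murder(M) ∧ police_commander(P) ∧ of(M, P)  is augmented with  murder(E, M, P)
print(derived_verbs("murder"))   # e.g. {'murder'} on typical WordNet versions
```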

Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

Graph Matching Approach
- Why graph matching? The dependency tree has a natural graphical interpretation, and graph matching has been successful in other domains (e.g., lossy image matching).
- Input: hypothesis (H) and text (T) graphs. Vertices are words and phrases; edges are labeled dependencies.
- Output: cost of matching H to T (next slide).
- Toy example: T: John bought a BMW.  H: John purchased a car.  TRUE
  T graph: "bought" has ARG0(Agent) John (PERSON) and ARG1(Theme) BMW.
  H graph: "purchased" has ARG0(Agent) John (PERSON) and ARG1(Theme) car.

Graph Matching: Idea
- Idea: align H to T so that vertices are similar and relations are preserved (as in machine translation).
- A matching M is a mapping from vertices of H to vertices of T; thus, for each vertex v in H, M(v) is a vertex in T.
- (Figure: the H graph for "John purchased a car" matched onto the T graph for "John bought a BMW".)

Graph Matching: Costs
- The cost of a matching, MatchCost(M), measures the "quality" of the matching M.
- VertexCost(M): compare vertices in H with matched vertices in T.
- RelationCost(M): compare edges (relations) in H with corresponding edges (relations) in T.
- MatchCost(M) = (1 - β) VertexCost(M) + β RelationCost(M)  (sketch below)
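
A minimal sketch of this combination, assuming the two component costs are supplied as plain functions; β = 0.45 matches the worked example a few slides later.

```python
def match_cost(matching, vertex_cost, relation_cost, beta=0.45):
    """MatchCost(M) = (1 - beta) * VertexCost(M) + beta * RelationCost(M).

    matching: dict mapping each hypothesis vertex to a text vertex;
    vertex_cost, relation_cost: callables returning a cost in [0, 1] for the matching.
    """
    return (1 - beta) * vertex_cost(matching) + beta * relation_cost(matching)

# With VertexCost = 0.2 and RelationCost = 0.0, as in the toy example later in the talk:
print(match_cost({"John": "John", "purchased": "bought", "car": "BMW"},
                 vertex_cost=lambda m: 0.2, relation_cost=lambda m: 0.0))  # ≈ 0.11
```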

Graph Matching: Costs
- VertexCost(M): for each vertex v in H and the matched vertex M(v) in T (sketch below):
  Do the vertex heads share the same stem and/or POS?
  Is the T vertex head a hypernym of the H vertex head?
  Are the vertex heads "similar" phrases? (next slide)
- RelationCost(M): for each edge (v, v') in H and the corresponding edge (M(v), M(v')) in T:
  Are parent/child pairs in H parent/child in T?
  Are parent/child pairs in H ancestor/descendant in T?
  Do parent/child pairs in H share a common ancestor in T?
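
One way these per-vertex checks might be scored, sketched with NLTK: the tiered costs (0.0 exact, 0.2 synonym, 0.4 hypernym) are taken from the example slide, while the Porter stemmer and the specific WordNet tests are assumptions, not the system's exact implementation.

```python
from nltk.corpus import wordnet as wn
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def vertex_pair_cost(h_word, t_word):
    """Cost of matching one hypothesis vertex head onto a text vertex head."""
    if stemmer.stem(h_word.lower()) == stemmer.stem(t_word.lower()):
        return 0.0                                    # exact / same-stem match
    h_syns, t_syns = set(wn.synsets(h_word)), set(wn.synsets(t_word))
    if h_syns & t_syns:
        return 0.2                                    # synonym (shared synset)
    t_ancestors = {a for s in t_syns for path in s.hypernym_paths() for a in path}
    h_ancestors = {a for s in h_syns for path in s.hypernym_paths() for a in path}
    if (h_syns & t_ancestors) or (t_syns & h_ancestors):
        return 0.4                                    # hypernym relation between heads
    return 1.0                                        # no evidence of similarity

def vertex_cost(matching):
    """Average per-pair cost over all matched vertices."""
    return sum(vertex_pair_cost(h, t) for h, t in matching.items()) / len(matching)

# John/John is an exact match (0.0); purchased/bought typically share a synset (0.2);
# car/BMW scores 0.4 only if the installed WordNet knows BMW as a kind of car.
print(vertex_cost({"John": "John", "purchased": "bought", "car": "BMW"}))
```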

Digression: Phrase similarity
- Measures based on WordNet (Resnik/Lesk).
- Distributional similarity. Example: "run" and "marathon" are related.
  Latent Semantic Analysis to discover words that are distributionally similar (i.e., have common neighbors).
  Used a web-search based measure: query google.com for all pages with "run", with "marathon", and with both "run" and "marathon" (sketch below).
- Learning paraphrases. [Similar to DIRT: Lin and Pantel, 2001]
- "World knowledge" (labor intensive): CEO = Chief_Executive_Officer; Philippines → Filipino.
  [Can add common facts: "Paris is the capital of France", ...]
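
A hedged sketch of the web-count idea as a pointwise-mutual-information style association score; the talk does not spell out the exact formula, the hit counts are passed in by hand here (the original queried google.com live), and the numbers in the usage line are made up to show the shape of the computation.

```python
import math

def web_pmi(hits_x, hits_y, hits_xy, total_pages):
    """PMI-style association from page counts: log P(x, y) / (P(x) P(y)).

    hits_x, hits_y: pages containing each phrase alone;
    hits_xy: pages containing both; total_pages: rough size of the index.
    """
    p_x = hits_x / total_pages
    p_y = hits_y / total_pages
    p_xy = hits_xy / total_pages
    if p_xy == 0:
        return float("-inf")
    return math.log(p_xy / (p_x * p_y))

# Hypothetical counts for "run" / "marathon", just to show the computation:
print(web_pmi(hits_x=5e8, hits_y=2e7, hits_xy=4e6, total_pages=1e10))
# ≈ 1.39: positive, so the phrases co-occur more often than chance.
```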

Graph Matching: Costs (continued)
- VertexCost(M): for each vertex v in H and the matched vertex M(v) in T:
  Do the vertex heads share the same stem and/or POS?
  Is the T vertex head a hypernym of the H vertex head?
  Are the vertex heads "similar" phrases? (previous slide)
- RelationCost(M): for each edge (v, v') in H and the corresponding edge (M(v), M(v')) in T:
  Are parent/child pairs in H parent/child in T?
  Are parent/child pairs in H ancestor/descendant in T?
  Do parent/child pairs in H share a common ancestor in T?

Graph Matching: Example
- Per-vertex costs: exact match (John → John) 0.0; synonym match (purchased → bought) 0.2; hypernym match (car → BMW) 0.4.
- VertexCost: (0.0 + 0.2 + 0.4) / 3 = 0.2
- RelationCost: 0 (graphs isomorphic)
- With β = 0.45 (say): MatchCost = 0.55 * 0.2 + 0.45 * 0.0 = 0.11

Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

Abductive inference
- Idea: represent text and hypothesis as logical formulae. The hypothesis can be inferred from the text if and only if the hypothesis formula can be proved from the text formula.
- Toy example: T: John bought a BMW.  H: John purchased a car.  TRUE
  Text: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  Hypothesis: John(x) ∧ car(y) ∧ purchased(z, x, y)
  Prove the hypothesis from the text?
- Allow assumptions at various "costs":
  BMW(t) + $2 => car(t)
  bought(p, q, r) + $1 => purchased(p, q, r)

Abductive assumptions
- Assign costs to all assumptions of the form: P(p1, p2, ..., pm) => Q(q1, q2, ..., qn)
- Build an assumption cost model:
  Predicate match cost: phrase similarity cost? same named entity type? ...
  Argument match cost: same semantic role? constant similarity cost? ...

Abductive theorem proving
- Each assumption provides a potential proof step.
- Find the proof with the minimum total cost (uniform cost search; simplified sketch below). If there is a low-cost proof, the hypothesis is entailed.
- Example: T: John(A) ∧ BMW(B) ∧ bought(E, A, B)   H: John(x) ∧ car(y) ∧ purchased(z, x, y)
  A possible proof by resolution refutation (for the earlier costs):
  $0  -John(x) ∨ -car(y) ∨ -purchased(z, x, y)   [Given: negation of hypothesis]
  $0  -car(y) ∨ -purchased(z, A, y)              [Unify with John(A)]
  $2  -purchased(z, A, B)                        [Unify with BMW(B)]
  $3  NULL                                       [Unify with purchased(E, A, B)]
  Proof cost = 3
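
A much-simplified sketch of the minimum-cost proof search: predicates are reduced to bare names, assumptions become weighted rewrite steps, and a uniform-cost (Dijkstra-style) search finds the cheapest way to account for every hypothesis predicate. The real system performs resolution over full first-order literals; this is only an illustration of the search.

```python
import heapq

def min_cost_proof(text_preds, hyp_preds, assumption_cost):
    """Uniform-cost search for the cheapest way to 'prove' every hypothesis
    predicate from the text predicates.  assumption_cost[(t, h)] is the cost of
    assuming text predicate t implies hypothesis predicate h; identical
    predicates cost 0.  Returns the total proof cost, or None if no proof exists."""
    start = tuple(sorted(hyp_preds))              # unproved hypothesis predicates
    frontier = [(0.0, start)]
    best = {start: 0.0}
    while frontier:
        cost, remaining = heapq.heappop(frontier)
        if not remaining:
            return cost                           # everything proved
        if cost > best.get(remaining, float("inf")):
            continue                              # stale queue entry
        h, rest = remaining[0], remaining[1:]
        for t in text_preds:
            step = 0.0 if t == h else assumption_cost.get((t, h), float("inf"))
            if step == float("inf"):
                continue
            new_cost = cost + step
            if new_cost < best.get(rest, float("inf")):
                best[rest] = new_cost
                heapq.heappush(frontier, (new_cost, rest))
    return None

# Toy example from the slide: BMW => car costs $2, bought => purchased costs $1.
costs = {("BMW", "car"): 2.0, ("bought", "purchased"): 1.0}
print(min_cost_proof({"John", "BMW", "bought"},
                     {"John", "car", "purchased"}, costs))   # 3.0
```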

Abductive theorem proving
- Can automatically learn good assumption costs.
- Start from a labeled dataset (e.g., the PASCAL development set).
- Intuition: find assumptions that are used in the proofs for TRUE examples and lower their costs (by framing a log-linear model). Iterate.
- [Details: Raina et al., in submission]

Some interesting features
Examples of handling "complex" constructions in graph matching/abductive inference.
- Antonyms/Negation: high cost for matching verbs if they are antonyms or one is negated and the other is not (sketch below).
  T: Stocks fell.  H: Stocks rose.  FALSE
  T: Clinton's book was not a hit.  H: Clinton's book was a hit.  FALSE
- Non-factive verbs:
  T: John was charged for doing X.  H: John did X.  FALSE
  Detected because "doing" in the text has the non-factive "charged" as a parent, but "did" does not have such a parent.
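
A hedged sketch of the antonym part of this check using NLTK's WordNet antonym links; the negation flags and the blocking cost of 10.0 are illustrative assumptions.

```python
from nltk.corpus import wordnet as wn

def are_antonyms(word1, word2):
    """True if any WordNet sense of either word lists the other as an antonym."""
    for a, b in ((word1, word2), (word2, word1)):
        for lemma in (l for s in wn.synsets(a) for l in s.lemmas()):
            if b in {ant.name() for ant in lemma.antonyms()}:
                return True
    return False

def verb_match_cost(t_verb, h_verb, t_negated=False, h_negated=False, base_cost=0.0):
    """Assign a high cost if the verbs are antonyms or differ in negation."""
    if are_antonyms(t_verb, h_verb) or (t_negated != h_negated):
        return 10.0          # effectively blocks the match
    return base_cost

print(are_antonyms("fall", "rise"))      # True on typical WordNet versions
print(verb_match_cost("fall", "rise"))   # 10.0
```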

Some interesting features
- "Superlative check":
  T: This is the tallest tower in western Japan.  H: This is the tallest tower in Japan.  FALSE

Outline of this talk
- Representation of sentences: syntax (parsing and post-processing); adding annotations on the representation (e.g., semantic roles)
- Inference by graph matching
- Inference by abductive theorem proving
- A combined system
- Results and error analysis

Results
- Combine inference methods: each system produces a score; separately normalize each system's score variance.
- Suppose the normalized scores are s1 and s2. Final score: S = w1 s1 + w2 s2.
- Learn the classifier weights w1 and w2 on the development set using logistic regression (see the sketch below).
- Two submissions:
  Train one set of classifier weights for all RTE tasks. (General)
  Train different classifier weights for each RTE task. (ByTask)
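
A minimal sketch of this combination step with NumPy and scikit-learn; the scores and labels in the usage line are placeholders, not the actual development-set values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def combine_scores(graph_scores, proof_scores, labels):
    """Normalize each system's scores to unit variance, then learn the weights
    w1, w2 (plus a bias) by logistic regression on the development set."""
    s1 = np.asarray(graph_scores, dtype=float) / np.std(graph_scores)
    s2 = np.asarray(proof_scores, dtype=float) / np.std(proof_scores)
    X = np.column_stack([s1, s2])
    return LogisticRegression().fit(X, labels)

# Placeholder development-set costs and TRUE/FALSE labels, just to show usage
# (lower matching/proof cost should correspond to TRUE = 1):
clf = combine_scores([0.11, 0.80, 0.25, 0.95], [3.0, 9.0, 2.5, 8.0], [1, 0, 1, 0])
print(clf.coef_, clf.intercept_)
```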

Results

                      General             ByTask
                      Accuracy   CWS      Accuracy   CWS
DevSet 1              64.8%      -        -          0.805
DevSet 2              52.1%      -        -          0.661
DevSet 1 + DevSet 2   -          -        -          0.743
Test Set              56.3%      -        -          0.686

Balanced predictions. [55.4%, 51.2% predicted TRUE on test set.]
Best other results: Accuracy = 58.6%, CWS = 0.617

Results by task

       General             ByTask
       Accuracy   CWS      Accuracy   CWS
CD     79.3%      -        -          0.926
IE     47.5%      -        -          0.590
IR     56.7%      -        -          0.604
MT     46.7%      -        -          0.479
PP     58.0%      -        -          0.535
QA     48.5%      -        -          0.466
RC     52.9%      -        -          0.480

Partial coverage results
- Can also draw coverage-CWS curves for the General and ByTask systems; for example, CWS at 50% coverage and at 25% coverage (values shown on the curves).
- Task-specific optimization seems better!

Some interesting issues
- Phrase similarity:
  away from the coast → farther inland
  won victory in presidential election → became President
  stocks get a lift → stocks rise
  life threatening → fatal
- Dictionary definitions:
  believe there is only one God → are monotheistic
- "World knowledge":
  K Club, venue of the Ryder Cup, ... → K Club will host the Ryder Cup

Future directions
- Need more NLP components: better treatment of frequent nominalizations, parenthesized material, etc.
- Need much more ability to do inference: fine distinctions between meanings, and fine similarities (e.g., "reach a higher level" and "rise").
  We need a high-recall, reasonable-precision similarity measure! Other resources (e.g., antonyms) are also very sparse.
- More task-specific optimization.

Thanks!