Knowledge and Tree-Edits in Learnable Entailment Proofs
Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan
TAC, November 2011, NIST, Gaithersburg, Maryland, USA
Download at:

RTE (Recognizing Textual Entailment): classify a (T, H) pair as ENTAILING or NON-ENTAILING.
Example:
T: The boy was located by the police.
H: Eventually, the police found the child.

Matching vs. Transformations
Matching
Sequence of transformations (a proof): T = T0 → T1 → T2 → … → Tn = H
– Tree edits: complete proofs; estimate confidence
– Knowledge-based entailment rules: linguistically motivated; formalize many types of knowledge

Transformation-based RTE – Example
T = T0 → T1 → T2 → … → Tn = H
Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.

BIUTEE Goals
Tree edits:
1. Complete proofs
2. Estimate confidence
Entailment rules:
3. Linguistically motivated
4. Formalize many types of knowledge
BIUTEE integrates the benefits of both worlds.

Challenges / System Components
How to:
1. Generate linguistically motivated complete proofs?
2. Estimate proof confidence?
3. Find the best proof?
4. Learn the model parameters?

1. Generate linguistically motivated complete proofs

Entailment Rules
Rule types: lexical (e.g., boy → child), lexical-syntactic, syntactic, and generic.
(Bar-Haim et al. 2007, Semantic inference at the lexical-syntactic level)
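To make the rule types concrete, here is a minimal sketch in which rule sides are plain strings; the real system matches dependency-tree templates, and the names and scores below are illustrative, not BIUTEE's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntailmentRule:
    lhs: str            # pattern matched in the current tree, e.g. "boy"
    rhs: str            # replacement the rule licenses, e.g. "child"
    kind: str           # "lexical", "lexical-syntactic", "syntactic", or "generic"
    score: float = 1.0  # confidence assigned by the knowledge resource

# A lexical rule and a generic syntactic template (shown as strings here).
BOY_CHILD = EntailmentRule("boy", "child", kind="lexical", score=0.8)
PASSIVE_TO_ACTIVE = EntailmentRule("Y was Ved by X", "X Ved Y", kind="syntactic")
```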

Extended Tree Edits (On-the-Fly Operations)
Predefined custom tree edits:
– Insert node on the fly
– Move node / move sub-tree on the fly
– Flip part of speech
– …
Heuristically capture linguistic phenomena:
– Operation definition
– Feature definition
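As a rough illustration of one on-the-fly operation, the sketch below inserts a hypothesis word that is missing from the text and reports the feature the edit fires; a parse tree is simplified to a token list, and the feature name is hypothetical.

```python
def insert_on_the_fly(tokens, word, position, pos_tag):
    """Insert a missing hypothesis word and name the feature the edit fires."""
    new_tokens = tokens[:position] + [word] + tokens[position:]
    feature = f"Insert-{pos_tag}"  # e.g. Insert-Verb; cost can differ by POS
    return new_tokens, feature

tokens, feature = insert_on_the_fly(
    ["the", "police", "found", "the", "child"], "eventually", 0, "Adverb")
print(tokens)   # ['eventually', 'the', 'police', 'found', 'the', 'child']
print(feature)  # Insert-Adverb
```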

Proof over Parse Trees – Example
T = T0 → T1 → T2 → … → Tn = H
Text: The boy was located by the police.
(Passive to active) The police located the boy.
(X locate Y → X find Y) The police found the boy.
(boy → child) The police found the child.
(Insertion on the fly) Hypothesis: Eventually, the police found the child.
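The same proof, replayed as data; plain string rewriting stands in for the real parse-tree transformations.

```python
text = "The boy was located by the police."
hypothesis = "Eventually, the police found the child."

proof = [  # (operation, resulting sentence)
    ("passive to active",               "The police located the boy."),
    ("DIRT: X locate Y -> X find Y",    "The police found the boy."),
    ("lexical: boy -> child",           "The police found the child."),
    ("insert 'Eventually' on the fly",  "Eventually, the police found the child."),
]

print("T:", text)
for operation, result in proof:
    print(f"  --[{operation}]--> {result}")
assert proof[-1][1] == hypothesis  # the last step reaches H
```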

2. Estimate proof confidence

Cost-based Model
Define operation cost:
– Assesses the operation's validity
– Represent each operation as a feature vector
– Cost is a linear combination of feature values
Define proof cost as the sum of the operations' costs.
Classify as ENTAILING if and only if the proof cost is smaller than a threshold.

Feature Vector Representation
Define operation cost – represent each operation as a feature vector.
Features: (Insert-Named-Entity, Insert-Verb, …, WordNet, Lin, DIRT, …)
Example: applying DIRT: X locate Y → X find Y (score = 0.9) to "The police located the boy." yields "The police found the boy."; the operation is represented by the feature vector (0, 0, …, 0.457, …, 0), where the non-zero entry is a decreasing function of the rule score.
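A minimal sketch of this mapping, assuming one illustrative choice of decreasing function (negative log of the rule score); the slide's value 0.457 evidently comes from a different, unspecified function, and the feature list here is abbreviated.

```python
import math

FEATURES = ["Insert-Named-Entity", "Insert-Verb", "WordNet", "Lin", "DIRT"]

def operation_vector(feature_name, rule_score=None):
    """One operation -> a feature vector with a single non-zero entry."""
    vec = [0.0] * len(FEATURES)
    # Higher-confidence rules should cost less, hence a decreasing function.
    vec[FEATURES.index(feature_name)] = (
        1.0 if rule_score is None else -math.log(rule_score))
    return vec

print(operation_vector("DIRT", rule_score=0.9))
# [0.0, 0.0, 0.0, 0.0, 0.105...]
```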

Cost-based Model
Cost is a linear combination of feature values:
cost(o) = weight-vector · feature-vector = w · f(o)
The weight vector is learned automatically.

Confidence Model
Represent each operation oᵢ as a feature vector f(oᵢ), and define the proof cost as the sum of the operations' costs:
cost(p) = Σᵢ w · f(oᵢ) = w · Σᵢ f(oᵢ) = w · f(p)
where w is the weight vector and f(p) = Σᵢ f(oᵢ) is the vector that represents the proof.
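In code, the definitions above amount to a vector sum and a dot product; the weights and operation vectors below are invented for the example.

```python
def proof_vector(operation_vectors):
    """f(p) = sum of the operations' feature vectors."""
    return [sum(column) for column in zip(*operation_vectors)]

def proof_cost(weights, operation_vectors):
    """cost(p) = w . f(p)"""
    return sum(w * f for w, f in zip(weights, proof_vector(operation_vectors)))

w = [1.2, 0.4, 0.3, 0.5, 0.6]    # illustrative weight vector
ops = [[0, 0, 0, 0, 0.105],      # a DIRT rule application
       [0, 1.0, 0, 0, 0]]        # an Insert-Verb edit on the fly
print(proof_cost(w, ops))        # about 0.463
```

Classification then reduces to comparing this number against the learned threshold.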

Feature Vector Representation – Example
T = T0 → T1 → T2 → … → Tn = H
Text: The boy was located by the police.
(Passive to active) → (0, 0, …, 1, 0)
(X locate Y → X find Y) → (0, 0, …, 0.457, …, 0, 0)
(boy → child) → (0, 0, …, 0.5, …, 0, 0)
(Insertion on the fly) → (0, 0, 1, …, 0, 0)
Sum (the proof vector): (0, 0, …, 0.457, …, 1, 0)
Hypothesis: Eventually, the police found the child.

Cost-based Model
Define proof cost as the sum of the operations' costs.
Classify as ENTAILING if and only if the proof cost is smaller than a threshold b.
Learn w and b.

3. Find the best proof

Search the Best Proof
(Diagram: many alternative proofs, Proof #1 … Proof #4, lead from T to H.)

Search the Best Proof
– Need to find the "best" proof: the proof with the lowest cost, assuming a weight vector is given.
– The search space is exponential, so an AI-style search algorithm is used.
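A toy uniform-cost search over partial proofs, as one concrete reading of "AI-style search"; the real system searches over parse trees with a more elaborate algorithm, and applicable_ops (which enumerates the rule applications and on-the-fly edits available from a state) is an assumed callback, not part of the system.

```python
import heapq

def search_best_proof(text, hypothesis, applicable_ops, max_expansions=100000):
    """Return (cost, proof) for the cheapest proof found, or None."""
    frontier = [(0.0, text, [])]   # (cost so far, current state, operations used)
    expanded = set()
    for _ in range(max_expansions):
        if not frontier:
            return None
        cost, state, proof = heapq.heappop(frontier)
        if state == hypothesis:
            return cost, proof     # the first goal popped is the cheapest
        if state in expanded:
            continue
        expanded.add(state)
        for op_name, op_cost, next_state in applicable_ops(state):
            heapq.heappush(frontier,
                           (cost + op_cost, next_state, proof + [op_name]))
    return None                    # budget exhausted without reaching H
```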

4. Learn model parameters

Learning
Goal: learn the parameters (w, b).
Use a linear learning algorithm – logistic regression, SVM, etc.
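A sketch of this step with scikit-learn's LogisticRegression, one of the linear learners the slide mentions; X holds one summed proof feature vector per training pair, y the gold labels, and all numbers are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One proof feature vector per training pair (synthetic numbers),
# labeled 1 for ENTAILING and 0 for NON-ENTAILING.
X = np.array([[0.1, 0.0, 0.2],
              [1.5, 1.0, 0.9],
              [0.0, 0.3, 0.1],
              [2.0, 0.8, 1.4]])
y = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]  # w weighs features; b plays the threshold role
print(w, b)
```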

Inference vs. Learning
(Diagram: training samples → best proofs → feature extraction → vector representation → learning algorithm → (w, b).)

Iterative Learning Scheme
1. w = reasonable guess
2. Find the best proofs
3. Learn new w and b
4. Repeat from step 2
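The scheme above as a sketch; find_best_proof and fit_linear_model stand in for the search and learning components sketched earlier, and the proof objects are assumed to expose their summed feature vector.

```python
def iterative_training(pairs, labels, find_best_proof, fit_linear_model,
                       w_init, iterations=5):
    """Alternate between proof search and linear learning (steps 2-4)."""
    w, b = w_init, 0.0                                        # step 1: reasonable guess
    for _ in range(iterations):
        proofs = [find_best_proof(t, h, w) for t, h in pairs]  # step 2
        X = [proof.feature_vector for proof in proofs]         # assumed attribute
        w, b = fit_linear_model(X, labels)                     # step 3
    return w, b                                               # step 4: loop above
```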

Summary – System Components
How to:
1. Generate syntactically motivated complete proofs? – Entailment rules; on-the-fly operations (extended tree edits)
2. Estimate proof validity? – Confidence model
3. Find the best proof? – Search algorithm
4. Learn the model parameters? – Iterative learning scheme

Results

RTE-7:
ID   | Knowledge Resources                                                         | Precision % | Recall % | F1 %
BIU1 | WordNet, Directional Similarity                                             |             |          |
BIU2 | WordNet, Directional Similarity, Wikipedia                                  |             |          |
BIU3 | WordNet, Directional Similarity, Wikipedia, FrameNet, geographical database |             |          |

BIUTEE 2011 on RTE-6 (F1 %):
Baseline (IR top-5 relevance)   34.63
Median (September 2010)         36.14
Best (September 2010)           48.01
Our system                      49.54

Conclusions
Inference via a sequence of transformations:
– Knowledge (entailment rules)
– Extended tree edits
Proof confidence estimation
Results:
– Better than the median on RTE-7
– Best on RTE-6
Open source

Thank You