Presentation transcript:

2010/2/4 Yi-Ting Huang. Pennacchiotti, M., & Zanzotto, F. M. (2007). Learning Shallow Semantic Rules for Textual Entailment. Recent Advances in Natural Language Processing (RANLP 2007). Zanzotto, F. M., & Moschitti, A. (2006). Automatic Learning of Textual Entailments with Cross-pair Similarities. ACL 2006.

Recognizing Textual Entailment (RTE)
- What is RTE: to determine whether or not a text T entails a hypothesis H.
- Example:
  T1: "At the end of the year, all solid companies pay dividends."
  H1: "At the end of the year, all solid insurance companies pay dividends."
  H2: "At the end of the year, all solid companies pay cash dividends."
- Why RTE is important: it allows us to model more accurate semantic theories of natural languages and to design important applications (QA, IE, etc.).

Idea… (1/2)
- Does T3 entail H3?
  T3: "All wild animals eat plants that have scientifically proven medicinal properties."
  H3: "All wild mountain animals eat plants that have scientifically proven medicinal properties."
- Recall:
  T1: "At the end of the year, all solid companies pay dividends."
  H1: "At the end of the year, all solid insurance companies pay dividends."
- Yes! T3 is structurally (and somewhat lexically) similar to T1, and H3 is more similar to H1 than to H2. Thus, from T1 ⇒ H1 we may extract rules to derive that T3 ⇒ H3.

Idea… (2/2)
- We should rely not only on an intra-pair similarity between T and H, but also on a cross-pair similarity between two pairs (T', H') and (T'', H'').
[Figure: intra-pair similarity within a (T, H) pair vs. cross-pair similarity between two pairs.]

Research purpose
- In this paper, we define a new cross-pair similarity measure based on the syntactic trees of the text and the hypothesis, and we combine it with traditional intra-pair similarities to define a novel semantic kernel function.
- We experimented with this kernel using Support Vector Machines on the test sets of the Recognizing Textual Entailment (RTE) challenges.

Term definition
- Given a word w_t in the text T and a word w_h in the hypothesis H, an anchor is a pair (w_t, w_h) of highly similar words, e.g., (companies, companies).
- The co-indexed markers that record anchors in the syntactic trees are called placeholders.

Intra-pair similarity (1/3)
- Goal: anchor the content words of the hypothesis, W_H, to words of the text, W_T.
- Each word w_h in W_H is connected to all words w_t in W_T that have the highest similarity sim_w(w_t, w_h). As a result, we have a set of anchors and the subset of words in T connected with a word in H.
- We select the final anchor set as the bijective relation between W_H and a subset of W_T that best satisfies a locality criterion: whenever possible, words of a constituent in H should be related to words of a single constituent in T.

Intra-pair similarity (2/3)
The word similarity sim_w(w_t, w_h) is computed as a cascade (l_w denotes the lemmatized form of a word w):
1. Two words are maximally similar if they have the same surface form.
2. Otherwise, we use one of the WordNet (Miller, 1995) similarities, indicated with d(l_w, l_w'), as in (Corley and Mihalcea, 2005). We adopted the wn::similarity package (Pedersen et al., 2004) to compute the Jiang & Conrath (J&C) distance (Jiang and Conrath, 1997).
3. We also use WordNet 2.0 (Miller, 1995) to extract further relations between words, such as lexical entailment between verbs (Ent) and derivational relations between words (Der).
4. Finally, we use the edit distance measure lev(w_t, w_h) to capture similarities between words that are missed by the previous steps because of misspellings or derivational forms not coded in WordNet.

Intra-pair similarity (3/3)
- The above word similarity measure can be used to compute the similarity between T and H. In line with (Corley and Mihalcea, 2005), it is the idf-weighted combination of the best word-to-word similarities:
  sim(T, H) = [ Σ_{w_h ∈ W_H} max_{w_t ∈ W_T} sim_w(w_t, w_h) · idf(w_h) ] / [ Σ_{w_h ∈ W_H} idf(w_h) ]
  where idf(w) is the inverse document frequency of the word w.
- We used a selected portion of the British National Corpus (BNC) to compute the inverse document frequency, assigning the maximum idf to words not found in the BNC.
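To make the cascade and the weighted combination concrete, here is a minimal Python sketch. It assumes NLTK's WordNet interface and an idf dictionary; step 3 (the Ent/Der relations) and the BNC-based idf are not reproduced, and all names are illustrative, not the authors' code.

# Requires: pip install nltk, then nltk.download('wordnet') and nltk.download('wordnet_ic')
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
MAX_IDF = 13.0  # assumed cap for words unseen in the idf corpus

def lev(a, b):
    """Levenshtein edit distance (step 4 of the cascade)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def sim_w(wt, wh):
    """Word-similarity cascade: surface form, then WordNet J&C, then edit distance."""
    if wt == wh:                                      # 1. same surface form
        return 1.0
    pairs = [(a, b) for a in wn.synsets(wt) for b in wn.synsets(wh)
             if a.pos() == b.pos() and a.pos() in ('n', 'v')]
    if pairs:                                         # 2. Jiang & Conrath over WordNet
        return min(1.0, max(a.jcn_similarity(b, brown_ic) for a, b in pairs))
    return 1.0 - lev(wt, wh) / max(len(wt), len(wh))  # 4. edit-distance fallback

def sim_TH(text_words, hyp_words, idf):
    """idf-weighted similarity of T and H, in line with Corley & Mihalcea (2005)."""
    weights = [idf.get(wh, MAX_IDF) for wh in hyp_words]
    matched = [max(sim_w(wt, wh) for wt in text_words) for wh in hyp_words]
    return sum(m * w for m, w in zip(matched, weights)) / sum(weights)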

Cross-pair syntactic kernels
- Goal: capture the number of common subtrees between the texts (T', T'') and between the hypotheses (H', H'') that share the same anchoring scheme.
- Two steps: derive the best mapping between the two placeholder sets, then compute a cross-pair similarity over the resulting trees.

The best mapping
- Let A' and A'' be the placeholder sets of (T', H') and (T'', H''), with |A'| ≥ |A''|; we align a subset of A' to A''.
- Let C be the set of all bijective mappings from A'' to A'; an element c ∈ C is a substitution function.
- The best alignment defines the cross-pair similarity:
  K_s((T', H'), (T'', H'')) = max_{c ∈ C} [ K_T(t(H', c), t(H'', i)) + K_T(t(T', c), t(T'', i)) ]
  where (i) t(S, c) returns the syntactic tree of the text or hypothesis S with placeholders replaced by means of the substitution c, (ii) i is the identity substitution, and (iii) K_T(t_1, t_2) is a function that measures the similarity between the two trees t_1 and t_2.
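Since placeholder sets are small in practice, the maximization over C can simply enumerate all alignments. A sketch, assuming hypothetical helpers K_T (the tree kernel) and relabel(tree, c) (placeholder renaming):

from itertools import permutations

def K_s(pair1, pair2, K_T, relabel):
    """Cross-pair similarity: maximize the summed tree similarity over all
    bijective alignments of the smaller placeholder set A'' into the larger A'.
    Each pair is (tree_T, tree_H, placeholders)."""
    (t1, h1, a1), (t2, h2, a2) = pair1, pair2
    if len(a1) < len(a2):                        # ensure |A'| >= |A''|
        (t1, h1, a1), (t2, h2, a2) = (t2, h2, a2), (t1, h1, a1)
    best = 0.0
    for image in permutations(a1, len(a2)):      # each injective c: A'' -> A'
        c = dict(zip(a2, image))
        best = max(best, K_T(relabel(h2, c), h1) + K_T(relabel(t2, c), t1))
    return best

Enumeration is factorial in |A''|, which is workable only because a pair typically contains few placeholders.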

Example
[Figure: two (T, H) pairs whose placeholders are aligned by the best substitution c; t(S, c), i, and K_T as defined on the previous slide.]

Cross-pair similarity
- As K_T(t_1, t_2) we use the tree kernel function defined in (Collins and Duffy, 2002):
  K_T(t_1, t_2) = Σ_{n_1 ∈ N_{t_1}} Σ_{n_2 ∈ N_{t_2}} Δ(n_1, n_2)
  where N_{t_1} and N_{t_2} are the sets of nodes of t_1 and t_2, respectively.
- Given a subtree space F = {f_1, f_2, ...}, the indicator function I_i(n) is equal to 1 if the target f_i is rooted at node n and 0 otherwise. Then
  Δ(n_1, n_2) = Σ_i λ^{l(f_i)} I_i(n_1) I_i(n_2)
  where l(f_i) is the number of levels of the subtree f_i.
- Thus λ^{l(f_i)} assigns a lower weight to larger fragments; when λ = 1, Δ(n_1, n_2) is equal to the number of common fragments rooted at nodes n_1 and n_2.
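The Δ sum over F is never computed by enumerating fragments; the usual recursion over matching productions gives the same value. A self-contained sketch over (label, children) tuples (a simplification; the paper applies K_T to parse trees decorated with placeholders):

def tree_kernel(t1, t2, lam=0.4):
    """Collins & Duffy (2002) subset-tree kernel via the Delta recursion.
    A tree is a (label, [children]) tuple; words are leaves with no children.
    lam plays the role of the lambda decay that down-weights large fragments."""
    def nodes(t):
        """All internal (non-leaf) nodes of t."""
        found = [t] if t[1] else []
        for c in t[1]:
            found.extend(nodes(c))
        return found

    def production(n):
        return (n[0], tuple(c[0] for c in n[1]))

    def delta(n1, n2):
        """lam-weighted count of common fragments rooted at n1 and n2."""
        if production(n1) != production(n2):
            return 0.0
        result = lam
        for c1, c2 in zip(n1[1], n2[1]):
            if c1[1] and c2[1]:          # only nonterminal children extend fragments
                result *= 1.0 + delta(c1, c2)
        return result

    return sum(delta(n1, n2) for n1 in nodes(t1) for n2 in nodes(t2))

# With lam = 1 the kernel counts shared fragments exactly:
t1 = ('S', [('NP', [('NNP', [('Mary', [])])]), ('VP', [('VBZ', [('runs', [])])])])
t2 = ('S', [('NP', [('NNP', [('John', [])])]), ('VP', [('VBZ', [('runs', [])])])])
print(tree_kernel(t1, t2, lam=1.0))  # fragments rooted at matching S, NP, VP, VBZ nodes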

Example
[Figure: the tree fragments shared by two syntactic trees, i.e., the f_i with I_i(n) = 1 in both, as counted by K_T(t_1, t_2).]

Kernel function in SVM
- The K_T function has been proven to be a valid kernel, i.e., its associated Gram matrix is positive semidefinite.
- Some basic operations on kernel functions, e.g., the sum, are closed with respect to the set of valid kernels.
- The cross-pair similarity would then be a valid kernel, and we could use it in kernel-based machines like SVMs.
- We developed SVM-light-TK (Moschitti, 2006), which encodes the basic tree kernel function K_T in SVM-light (Joachims, 1999). We used this software to implement the K_s, K_1, K_2, and K_s + K_i kernels (i ∈ {1, 2}).
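The experiments used SVM-light-TK directly; purely as an illustration, any SVM that accepts a precomputed Gram matrix can consume such a summed kernel. A scikit-learn sketch, where cross_pair_k and intra_pair_k stand for the K_s and intra-pair similarity functions sketched earlier, and train_pairs, test_pairs, y_train are assumed RTE data:

import numpy as np
from sklearn.svm import SVC

def combined(p1, p2):
    # K_s + K_1: a sum of valid kernels is itself a valid kernel
    return cross_pair_k(p1, p2) + intra_pair_k(p1, p2)

def gram(rows, cols):
    return np.array([[combined(r, c) for c in cols] for r in rows])

clf = SVC(kernel='precomputed')
clf.fit(gram(train_pairs, train_pairs), y_train)          # train on the Gram matrix
predictions = clf.predict(gram(test_pairs, train_pairs))  # test rows vs. training columns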

Limit
- The limitation of the cross-pair similarity measure is that placeholders do not convey the semantic knowledge needed in cases such as the above, where the semantic relation between the connected verbs is essential.

Adding semantic information
- We define anchor types; a valuable source of relation types among words is WordNet:
- similarity: the words are similar according to the WordNet similarity measure; this captures synonymy and hypernymy.
- surface matching: the words or their lemmas match; this captures semantically equivalent words.

Augmenting placeholders with anchor types (1/2)
- Typed anchor model (ta): anchor types augment only the pre-terminal nodes of the syntactic tree.
- Propagated typed anchor model (tap): anchor types climb up the syntactic tree according to specific climbing-up rules, similarly to what is done for placeholders.
- Climbing-up rules: constituent nodes in the syntactic trees take the typed anchor of their semantic heads.

Propagated typed anchor model (tap)
- If two fragments have the same syntactic structure S(NP, VP(AUX, NP)), and there is a semantic equivalence (=) on all constituents, then entailment holds.

Augmenting placeholders with anchor types (2/2)
- New rule: if two typed anchors climb up to the same node, give precedence to the one with the highest ranking in the ordered set of types.
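A minimal sketch of climbing with the precedence rule. The node structure and the type ordering are assumptions, and for brevity any child's anchor may climb, whereas the paper climbs via semantic heads:

TYPE_RANK = ['=', 'wn-sim']   # assumed ordering: surface match outranks WordNet similarity

def climb(node):
    """tap sketch: typed anchors climb from the pre-terminals up the tree; when
    several typed anchors reach the same node, the highest-ranking type wins.
    Nodes are dicts: {'label': str, 'anchor': str or None, 'children': [...]}."""
    for child in node['children']:
        climb(child)
    carried = [c['anchor'] for c in node['children'] if c.get('anchor')]
    if carried and node.get('anchor') is None:
        node['anchor'] = min(carried, key=TYPE_RANK.index)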

Experiment I
- Data sets: (D1, T1) and (D2, T2) are the development and test sets of the first (Dagan et al., 2005) and second (Bar-Haim et al., 2006) RTE challenges.
- Positive examples constitute 50% of the data.
- ALL is the union of D1, D2, and T1, which we also split 70%-30%.
- D2(50%)′ and D2(50%)′′ are a random, homogeneous split of D2.
- Tools: the Charniak parser (Charniak, 2000) and the morpha lemmatizer (Minnen et al., 2001) carry out the syntactic and morphological analysis.

Results in Experiment I
[Table: accuracy of each method for each combination of training and test sets; see the findings on the next slide.]

Findings in Experiment I
- The dramatic improvement observed in (Corley and Mihalcea, 2005) on the "Train: D1 / Test: T1" dataset is due to the idf rather than to the use of the J&C similarity (second vs. third column).
- Our approach (last column) is significantly better than all the other methods, as it provides the best result for each combination of training and test sets.
- Comparing the averages over all datasets, our system improves on all the methods by at least 3 absolute percentage points.
- The accuracy produced by syntactic trees with placeholders is higher than the one obtained with syntactic trees alone.

Experiment II
- We compare our ta and tap approaches with standard strategies for RTE: lexical overlap, syntactic matching, and entailment triggering.
- Data set: D2 and T2 of the RTE-2 challenge (Bar-Haim et al., 2006).
- We adopt 4-fold cross-validation.

Experiment II
Variables:
- tree: the first algorithm (cross-pair similarity over syntactic trees with placeholders).
- lex: lexical overlap similarity (Corley and Mihalcea, 2005).
- synt: syntactic matching. The score synt(T, H) compares all the substructures of the dependency trees of T and H: synt(T, H) = K_T(T, H) / |H|, where |H| is the number of subtrees in H.
- lex + trig: lexical overlap plus entailment triggers: SVO tests whether T and H share a similar subject-verb-object construct; Apposition tests whether H is a sentence headed by the verb "to be" and T contains an apposition that states H; Anaphora tests whether the SVO sentence in H has a similar wh-sentence in T whose wh-pronoun can be resolved in T to a word similar to the subject or object of H.
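The synt score can be read off the tree_kernel sketch above with lam = 1, dividing by a fragment count for H; here |H| is computed with the same Delta recursion applied node by node (a sketch, and over constituency-style tuples rather than the paper's dependency trees):

def fragments_rooted_at(node):
    """Number of fragments rooted at `node` (the lam = 1 value of Delta(n, n))."""
    count = 1
    for child in node[1]:
        if child[1]:
            count *= 1 + fragments_rooted_at(child)
    return count

def num_fragments(tree):
    """|H|: total number of fragments in the tree, summed over internal nodes."""
    if not tree[1]:
        return 0
    return fragments_rooted_at(tree) + sum(num_fragments(c) for c in tree[1])

def synt(tree_T, tree_H):
    """synt(T, H) = K_T(T, H) / |H|, using the tree_kernel sketch with lam = 1."""
    return tree_kernel(tree_T, tree_H, lam=1.0) / num_fragments(tree_H)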

Results in Experiment II
[Table: 4-fold cross-validation accuracy of lex, synt, lex + trig, tree, ta, tap, and lex + tree; see the findings on the next slides.]

Findings in Experiment II (1/2)
- Syntactic structure alone is not enough: lexical-semantic knowledge, and in particular the explicit representation of word-level relations, plays a key role in RTE (+4.19%).

Finding in Experiment II 28  Also, tap outperforms lex, supporting a complementary conclusion: lexical-semantic knowledge does not cover alone the entailment phenomenon, but needs some syntactic evidence.  the use of cross-pair similarity together with lexical overlap (lex + tree) is successful, as accuracy improves +1.87% and +2.33% over the related basic methods (respectively lex and tree) %

Conclusions
- We have presented a model for the automatic learning of rewrite rules for textual entailment from examples.
- For this purpose, we devised a novel, powerful kernel based on cross-pair similarities.
- The model effectively integrates semantic knowledge into textual entailment recognition.
- We experimented with this kernel using Support Vector Machines on the RTE test sets.

More information
- Tree Kernels in SVM-light (SVM-light-TK), which implements the tree kernel used by the first algorithm, has been released on the web.