2010/2/4 Yi-Ting Huang. Pennacchiotti, M., & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment. Recent Advances in Natural Language Processing (RANLP 2007).

1 Pennacchiotti, M., & Zanzotto, F. M. Learning Shallow Semantic Rules for Textual Entailment. Recent Advances in Natural Language Processing (RANLP 2007). Zanzotto, F. M., & Moschitti, A. Automatic Learning of Textual Entailments with Cross-pair Similarities. ACL 2006.

2 Recognizing Textual Entailment (RTE)  What is RTE:  To determine whether or not a text T entails a hypothesis H.  Example:  T1: “At the end of the year, all solid companies pay dividends.” H1: “At the end of the year, all solid insurance companies pay dividends.” H2: “At the end of the year, all solid companies pay cash dividends.”  Why RTE is important:  It allows us to model more accurate semantic theories of natural language and to design important applications (QA, IE, etc.).

3 Idea… (1/2)  T3 → H3?  T3: “All wild animals eat plants that have scientifically proven medicinal properties.” H3: “All wild mountain animals eat plants that have scientifically proven medicinal properties.”  T1: “At the end of the year, all solid companies pay dividends.” H1: “At the end of the year, all solid insurance companies pay dividends.”  Yes! T3 is structurally (and somewhat lexically) similar to T1, and H3 is more similar to H1 than to H2. Thus, from T1 → H1 we may extract rules to derive that T3 → H3.

4 Idea… (2/2)  We should rely not only on an intra-pair similarity between T and H but also on a cross-pair similarity between two pairs (T’, H’) and (T’’, H’’). (Figure: intra-pair similarity vs. cross-pair similarity.)

5 Research purpose  In this paper, we define a new cross-pair similarity measure based on text and hypothesis syntactic trees, and we use this similarity together with traditional intra-pair similarities to define a novel semantic kernel function.  We experimented with this kernel using Support Vector Machines on the test sets of the Recognizing Textual Entailment (RTE) challenges.

6 Term definition  Given a word w_t in the text T and a word w_h in the hypothesis H, an anchor is a pair (w_t, w_h), e.g. (companies, companies).  The markers that record anchors in the syntactic trees are called placeholders.

7 Intra-pair similarity (1/3)  Intra-pair similarity: to anchor the content words in the hypothesis W_H to words in the text W_T.  Each word w_h in W_H is connected to all words w_t in W_T that have the highest similarity sim_w(w_t, w_h). As a result, we have a set of anchors and the subset of words in T connected with a word in H.  We select the final anchor set as the bijective relation between W_T and W_H that best satisfies a locality criterion: whenever possible, words of a constituent in H should be related to words of a constituent in T.

8 Intra-pair similarity (2/3) 1. Two words are maximally similar if they have the same surface form. 2. A WordNet (Miller, 1995) similarity d(l_w, l_w’) between the lemmatized forms l_w of the words (Corley and Mihalcea, 2005); we adopted the wn::similarity package (Pedersen et al., 2004) to compute the Jiang & Conrath (J&C) distance (Jiang and Conrath, 1997). 3. We can use WordNet 2.0 (Miller, 1995) to extract different relations between words, such as lexical entailment between verbs (Ent) and the derivational relation between words (Der). 4. We use the edit-distance measure lev(w_t, w_h) to capture the similarity between words that are missed by the previous analyses because of misspelling errors or derivational forms not coded in WordNet.
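A minimal sketch of this cascade, assuming only the surface-match and edit-distance steps (the WordNet-based measures J&C, Ent and Der are omitted to keep it self-contained, and the normalization of lev into a similarity is an illustrative choice):

```python
def lev(a: str, b: str) -> int:
    """Levenshtein edit distance lev(w_t, w_h) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def word_sim(wt: str, wh: str) -> float:
    """Toy similarity cascade: surface match first, edit distance last."""
    if wt == wh:                      # step 1: identical surface forms
        return 1.0
    # steps 2-3 (J&C distance, Ent/Der relations) would go here
    return 1.0 - lev(wt, wh) / max(len(wt), len(wh))  # step 4

print(lev("kitten", "sitting"))                     # 3
print(round(word_sim("company", "companies"), 3))   # 0.667
```

The edit-distance fallback is what lets misspellings and uncoded derivational forms (company/companies) still receive a non-zero similarity.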

9 Intra-pair similarity (3/3)  The above word similarity measure can be used to compute the similarity between T and H. In line with (Corley and Mihalcea, 2005):

sim(T, H) = Σ_{w_h ∈ W_H} max_{w_t ∈ W_T} sim_w(w_t, w_h) · idf(w_h) / Σ_{w_h ∈ W_H} idf(w_h)

 where idf(w) is the inverse document frequency of the word w, computed on a selected portion of the British National Corpus (BNC). We assigned the maximum idf to words not found in the BNC.
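A sketch of this idf-weighted score (function names, the stand-in word similarity, and the MAX_IDF cap are illustrative, not the paper's exact implementation):

```python
MAX_IDF = 10.0  # assumed cap assigned to words missing from the corpus

def sim_th(text_words, hyp_words, word_sim, idf):
    """Idf-weighted similarity of H given T in the style of
    Corley & Mihalcea (2005): each hypothesis word contributes its
    best match in the text, weighted by its idf."""
    num = sum(max(word_sim(wt, wh) for wt in text_words) * idf.get(wh, MAX_IDF)
              for wh in hyp_words)
    den = sum(idf.get(wh, MAX_IDF) for wh in hyp_words)
    return num / den if den else 0.0

exact = lambda a, b: float(a == b)   # stand-in word similarity
T = "all solid companies pay dividends".split()
H = "companies pay cash dividends".split()
print(round(sim_th(T, H, exact, {"cash": 5.0}), 3))   # 0.857
```

Note the asymmetry: the sum runs over hypothesis words only, matching the directional nature of entailment.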

10 Cross-pair syntactic kernels  To capture the number of common subtrees between the texts (T’, T’’) and between the hypotheses (H’, H’’) that share the same anchoring scheme, we need:  to derive the best mapping between placeholder sets;  to compute a cross-pair similarity over the aligned trees.

11 The best mapping  Let A’ and A’’ be the placeholder sets of (T’, H’) and (T’’, H’’), with |A’| ≥ |A’’|; we align a subset of A’ to A’’.  Let C be the set of all bijective mappings from a subset of A’ of size |A’’| to A’’; an element c ∈ C is a substitution function.  The best alignment:

c_max = argmax_{c ∈ C} ( K_T(t(H’, c), t(H’’, i)) + K_T(t(T’, c), t(T’’, i)) )

 where (i) t(S, c) returns the syntactic tree of the text S with placeholders replaced by means of the substitution c, (ii) i is the identity substitution, and (iii) K_T(t_1, t_2) is a function that measures the similarity between the two trees t_1 and t_2.
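The exhaustive search over substitutions can be sketched as follows; here trees are flattened to token strings and the similarity is a toy token-overlap count standing in for K_T (all names are illustrative):

```python
from itertools import permutations

def rename(tree: str, c: dict) -> str:
    """Apply a substitution c to placeholder tokens in a flat tree string."""
    return " ".join(c.get(tok, tok) for tok in tree.split())

def token_overlap(a: str, b: str) -> int:
    """Toy stand-in for the tree similarity K_T: count aligned equal tokens."""
    return sum(x == y for x, y in zip(a.split(), b.split()))

def best_alignment(A1, A2, sim, t1, t2):
    """Search all injective mappings of the smaller placeholder set A2
    into A1 and keep the one maximizing sim(t1, t2-under-substitution)."""
    assert len(A1) >= len(A2)
    best, best_score = None, float("-inf")
    for image in permutations(A1, len(A2)):
        c = dict(zip(A2, image))
        score = sim(t1, rename(t2, c))
        if score > best_score:
            best, best_score = c, score
    return best, best_score

t1 = "S NP 1 VP pay NP 2"
t2 = "S NP a VP pay NP b"
c, score = best_alignment(["1", "2"], ["a", "b"], token_overlap, t1, t2)
print(c, score)   # {'a': '1', 'b': '2'} 7
```

Enumerating permutations is exponential in the number of placeholders; this is tolerable only because anchor sets are small.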

12 Example  where (i) t(S, c) returns the syntactic tree of the text S with placeholders replaced by means of the substitution c, (ii) i is the identity substitution, and (iii) K_T(t_1, t_2) is a function that measures the similarity between the two trees t_1 and t_2.

13 Cross-pair similarity  A tree kernel function over t_1 and t_2 is

K_T(t_1, t_2) = Σ_{n_1 ∈ N_{t_1}} Σ_{n_2 ∈ N_{t_2}} Δ(n_1, n_2)

 where N_{t_1} and N_{t_2} are the sets of t_1’s and t_2’s nodes, respectively.  Given a subtree space F = {f_1, f_2, …}, the indicator function I_i(n) is equal to 1 if the target f_i is rooted at node n and equal to 0 otherwise, so that

Δ(n_1, n_2) = Σ_i λ^{l(f_i)} I_i(n_1) I_i(n_2)

 In turn, l(f_i) is the number of levels of the subtree f_i.  Thus λ^{l(f_i)} assigns a lower weight to larger fragments; when λ = 1, Δ(n_1, n_2) is equal to the number of common fragments rooted at nodes n_1 and n_2.  As K_T(t_1, t_2) we use the tree kernel function defined in (Collins and Duffy, 2002).
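A compact implementation of this kernel uses the usual Δ recursion from Collins & Duffy (2002) rather than the explicit sum over F; the `Node` tree type below is an illustrative sketch:

```python
class Node:
    def __init__(self, label, *children):
        self.label, self.children = label, children
    def production(self):
        return (self.label, tuple(c.label for c in self.children))

def delta(n1, n2, lam=1.0):
    """Delta(n1, n2): weighted count of common fragments rooted at n1, n2."""
    if not n1.children or n1.production() != n2.production():
        return 0.0                      # leaf, or differing productions
    if all(not c.children for c in n1.children):
        return lam                      # matching pre-terminal production
    out = lam
    for c1, c2 in zip(n1.children, n2.children):
        out *= 1.0 + delta(c1, c2, lam)
    return out

def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)

def tree_kernel(t1, t2, lam=1.0):
    """K_T(t1, t2) = sum of Delta over all node pairs."""
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

t = Node("S",
         Node("NP", Node("D", Node("the")), Node("N", Node("cat"))),
         Node("VP", Node("V", Node("sleeps"))))
print(tree_kernel(t, t))   # 24.0 common fragments with lambda = 1
```

With λ = 1 the kernel counts shared fragments exactly; λ < 1 downweights large fragments so they do not dominate the sum.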

14 Example  I_i(n) is equal to 1 if the target f_i is rooted at node n and equal to 0 otherwise; K_T(t_1, t_2) sums Δ(n_1, n_2) over all node pairs.

15 Kernel function in SVM  The K_T function has been proven to be a valid kernel, i.e. its associated Gram matrix is positive semidefinite.  Some basic operations on kernel functions, e.g. the sum, are closed with respect to the set of valid kernels.  The cross-pair similarity is therefore a valid kernel, and we can use it in kernel-based machines like SVMs.  We developed SVM-light-TK (Moschitti, 2006), which encodes the basic tree kernel function K_T in SVM-light (Joachims, 1999). We used this software to implement the K_s, K_1, K_2 and K_s + K_i kernels (i ∈ {1, 2}).
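Closure under addition can be spot-checked numerically on toy Gram matrices: since v^T(K_1 + K_2)v = v^T K_1 v + v^T K_2 v, quadratic forms of the sum stay non-negative whenever both summands are PSD (a pure-Python sketch, not part of the paper):

```python
import random

def gram(xs, k):
    """Gram matrix of kernel k over points xs."""
    return [[k(x, y) for y in xs] for x in xs]

def quad_form(K, v):
    """v^T K v: non-negative for every v iff K is PSD."""
    return sum(v[i] * K[i][j] * v[j]
               for i in range(len(v)) for j in range(len(v)))

dot = lambda x, y: sum(a * b for a, b in zip(x, y))    # linear kernel
poly = lambda x, y: (1.0 + dot(x, y)) ** 2             # polynomial kernel

xs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
K1, K2 = gram(xs, dot), gram(xs, poly)
Ksum = [[K1[i][j] + K2[i][j] for j in range(len(xs))] for i in range(len(xs))]

rng = random.Random(0)
for _ in range(200):
    v = [rng.uniform(-1.0, 1.0) for _ in range(len(xs))]
    assert quad_form(Ksum, v) >= -1e-9   # the sum behaves like a valid kernel
print("sum-of-kernels PSD spot-check passed")
```

This is why intra-pair and cross-pair similarities can simply be added (K_s + K_i) and handed to SVM-light unchanged.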

16 Limit  The limitation of the cross-pair similarity measure is that placeholders do not convey the semantic knowledge needed in cases where the semantic relation between connected verbs is essential.

17 Adding semantic information  Defining anchor types: a valuable source of relation types among words is WordNet.  similarity: words similar according to the WordNet similarity measure; captures synonymy and hyperonymy.  surface matching: words or lemmas match; captures semantically equivalent words.

18 Augmenting placeholders with anchor types (1/2)  typed anchor model (ta): anchor types augment only the pre-terminal nodes of the syntactic tree.  propagated typed anchor model (tap): anchors climb up the syntactic tree according to specific climbing-up rules, similarly to what is done for placeholders.  Climbing-up rules: constituent nodes in the syntactic tree take the type of their semantic heads.

19 Propagated typed anchor model (tap)  If two fragments have the same syntactic structure S(NP, VP(AUX, NP)), and there is a semantic equivalence (=) on all constituents, then the entailment holds.

20 Augmenting placeholders with anchor types (2/2)  New rule: if two typed anchors climb up to the same node, give precedence to the one with the highest ranking in the ordered set of types.

21 Experiment I  Data set: D1, T1 and D2, T2 are the development and test sets of the first (Dagan et al., 2005) and second (Bar-Haim et al., 2006) challenges.  Positive examples constitute 50% of the data.  ALL is the union of D1, D2, and T1, which we also split 70%-30%.  D2(50%)’ and D2(50%)’’ are a random, homogeneous split of D2.  Tools: the Charniak parser (Charniak, 2000) and the morpha lemmatiser (Minnen et al., 2001) carry out the syntactic and morphological analysis.

22 Results in Experiment I

23 Findings in Experiment I  The dramatic improvement observed in (Corley and Mihalcea, 2005) on the dataset “Train:D1-Test:T1” is given by the idf rather than by the use of the J&C similarity (second vs. third columns).  Our approach (last column) is significantly better than all the other methods, as it provides the best result for each combination of training and test sets.  Comparing the averages over all datasets, our system improves on all the methods by at least 3 absolute percentage points.  The accuracy produced by Synt Trees with placeholders is higher than the one obtained with Only Synt Trees.

24 Experiment II  We compare our ta and tap approaches with other strategies for RTE: lexical overlap, syntactic matching and entailment triggering.  Data set: D2, T2 of the RTE-2 challenge (Bar-Haim et al., 2006).  We adopt 4-fold cross-validation.

25 Experiment II  Variables:  tree: the first algorithm.  lex: lexical overlap similarity (Corley and Mihalcea, 2005).  synt: syntactic matching. synt(T, H) is used to compute the score by comparing all the substructures of the dependency trees of T and H: synt(T, H) = K_T(T, H)/|H|, where |H| is the number of subtrees in H.  lex+trig: SVO tests whether T and H share a similar subject-verb-object construct; Apposition tests whether H is a sentence headed by the verb to be and T contains an apposition that states H; Anaphora tests whether the SVO sentence in H has a similar wh- sentence in T whose wh-pronoun may be resolved in T to a word similar to the object or the subject of H.

26 Results in Experiment II

27 Findings in Experiment II  Syntactic structure: this demonstrates that syntax alone is not enough, and that lexical-semantic knowledge, in particular the explicit representation of word-level relations, plays a key role in RTE (+1.12%, +4.19%).

28 Findings in Experiment II  tap also outperforms lex, supporting a complementary conclusion: lexical-semantic knowledge alone does not cover the entailment phenomenon, but needs some syntactic evidence (+0.66%).  The use of cross-pair similarity together with lexical overlap (lex + tree) is successful, as accuracy improves by +1.87% and +2.33% over the related basic methods (lex and tree, respectively).

29 Conclusions  We have presented a model for the automatic learning of rewrite rules for textual entailment from examples.  For this purpose, we devised a novel, powerful kernel based on cross-pair similarities.  The model effectively integrates semantic knowledge in textual entailment recognition.  We experimented with this kernel using Support Vector Machines on the RTE test sets.

30 More information  TREE KERNELS IN SVM-LIGHT, which implements the first algorithm, has been released on the website.

