Using Maximal Embedded Subtrees for Textual Entailment Recognition Sophia Katrenko & Pieter Adriaans Adaptive Information Disclosure project Human Computer Studies Laboratory, IvI, University of Amsterdam firstname.lastname@example.org
Outline Task statement Tree mining: methods Experiments Discussion
Why trees? What do these two pictures have in common? Complex structure! (Scottish handwriting, 17th century)
Motivation Idea: trees can be compared in order to find highly similar structures. Tree mining is an intermediate step that allows for frequent subtree discovery. When looking for the most frequent subtrees, we can relax the restrictions on how similar two subtrees should be.
What type of trees? (1) In tree mining, the following kinds of subtrees are distinguished: bottom-up subtrees, induced subtrees, and embedded subtrees. We use embedded tree mining as described in (M. Zaki, 2005, “Efficiently Mining Frequent Trees in a Forest: Algorithms and Applications”).
What type of trees? (2) [Figure: two example trees, Tree 1 and Tree 2, with labelled nodes; RED marks embedded subtrees, YELLOW marks bottom-up subtrees]
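The distinction between embedded and induced subtrees can be made concrete on a toy tree. The sketch below (an illustration, not part of the original system) enumerates node subsets of a small tree: a subset forms an embedded subtree when exactly one selected node has no selected ancestor (ancestor-descendant relations are kept), and an induced subtree when exactly one selected node lacks its direct parent in the selection (parent-child edges are kept). The tree and labels are made up for the example.

```python
from itertools import combinations

# Toy tree as a parent map; node "A" is the root.
#        A
#       / \
#      B   C
#     / \
#    D   E
parent = {"B": "A", "C": "A", "D": "B", "E": "B"}
nodes = ["A", "B", "C", "D", "E"]

def ancestors(n):
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out

def is_embedded_subtree(sel):
    # Embedded: ancestor-descendant relations are preserved; the subset is
    # a single tree iff exactly one selected node has no selected ancestor.
    tops = [n for n in sel if not any(a in sel for a in ancestors(n))]
    return len(tops) == 1

def is_induced_subtree(sel):
    # Induced: direct parent-child edges are preserved; exactly one
    # selected node may lack its direct parent in the selection.
    tops = [n for n in sel if parent.get(n) not in sel]
    return len(tops) == 1

embedded = induced = 0
for k in range(1, len(nodes) + 1):
    for subset in map(set, combinations(nodes, k)):
        embedded += is_embedded_subtree(subset)
        induced += is_induced_subtree(subset)

print(embedded, induced)  # prints "23 17"
```

Every induced subtree is also embedded, so the embedded count is always at least as large; this is why embedded tree mining gives the more flexible matching the slides rely on.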
Methodology Data Dependency parsing Depth first search (DFS, preorder) Rooted ordered emb. tree mining Setting thresholds Evaluation
Data preprocessing Each pair of sentences was parsed with Minipar (Dekang Lin). Each dependency tree was transformed by incorporating edge labels into node labels. Each transformed tree was then represented by its preorder (DFS) traversal.
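The two preprocessing steps above can be sketched as follows. The dependency triples and labels here are hypothetical (this is not Minipar's actual output format): each incoming edge label is merged into the dependent's node label, and the tree is then linearized in preorder.

```python
from collections import defaultdict

# Hypothetical dependency parse of "John eats apples" as
# (head, relation, dependent) triples.
edges = [("eats", "subj", "John"), ("eats", "obj", "apples")]

children = defaultdict(list)
dependents = set()
for head, rel, dep in edges:
    children[head].append((rel, dep))
    dependents.add(dep)
# The root is the one head that is never a dependent.
root = next(h for h, _, _ in edges if h not in dependents)

def preorder(node, rel=None):
    # Merge the incoming edge label into the node label,
    # then recurse over children depth-first (preorder).
    label = f"{rel}:{node}" if rel else node
    out = [label]
    for r, child in children[node]:
        out.extend(preorder(child, r))
    return out

print(preorder(root))  # prints "['eats', 'subj:John', 'obj:apples']"
```

After this transformation, two sentences match on a node only when both the word and its syntactic function agree, which is exactly what distinguishes Run 1 from Run 3 below.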
Syntactic matching Given two sentences S1 and S2 (and, consequently, their trees), let n1 = |S1| and n2 = |S2|, and let m be the size of their maximal rooted embedded subtree. We define the similarity score as the ratio score(S1, S2) = 2m / (n1 + n2).
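A minimal sketch of the scoring and the thresholding step mentioned in the methodology. The slides only say the score is "a ratio"; the Dice-style normalization 2m / (n1 + n2) is an assumption here, and the threshold value is hypothetical.

```python
def similarity(n1: int, n2: int, m: int) -> float:
    # n1, n2: sizes of the two sentence trees;
    # m: size of their maximal rooted embedded subtree.
    # Dice-style normalization is an assumption, not the paper's exact formula.
    return 2 * m / (n1 + n2)

def entails(n1: int, n2: int, m: int, threshold: float = 0.6) -> bool:
    # threshold is a hypothetical value standing in for the tuned one.
    return similarity(n1, n2, m) >= threshold

print(similarity(10, 10, 8))  # prints "0.8"
```

The score is 1.0 only when both trees are identical to their shared embedded subtree, and degrades gracefully as the trees diverge, which is what lets the threshold trade precision against recall.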
Runs Run 1: syntactic matching (syntactic functions being incorporated into the node labels) & lemmas overlap Run 2: lemmas overlap (baseline) Run 3: syntactic matching (without syntactic functions) & lemmas overlap
Official results (accuracy) Run 1: 59% overall. By task: IE 44.00%, IR 62.00%, QA 60.50%, SUM 69.50%.
Conclusions: Does it work? Syntactic matching improves precision! But… in some cases it is too flexible, which leads to false positives. We used ordered trees, so pairs such as the one below do not receive high matching scores: (h) The currency used in China is the Renminbi Yuan. (t) The Renminbi Yuan is the currency used in China.
Possible extensions Use synonyms/antonyms from WordNet. Handle situations where there are several maximal subtrees. Use weighting for the tree nodes. Use deep semantic analysis.
H: The author expressed his gratitude to the audience. T: Thank you! False? True!