Textual Entailment
Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text is entailed by (can be inferred from) the other.
Task Definition
Given pairs of short text snippets, referred to as Text-Hypothesis (T-H) pairs, build a system that decides for each T-H pair whether T entails H. The results are compared against a manual gold standard produced by human annotators.
Example:
T: Kurdistan Regional Government Prime Minister Dr. Barham Salih was unharmed after an assassination attempt.
H: Prime minister targeted for assassination.
Dataset Collection and Application Settings
The dataset of Text-Hypothesis pairs was collected by human annotators. It consists of seven subsets:
- Information Retrieval (IR)
- Comparable Documents (CD)
- Reading Comprehension (RC)
- Question Answering (QA)
- Information Extraction (IE)
- Machine Translation (MT)
- Paraphrase Acquisition (PP)
Approaching textual entailment recognition
Solution approaches can be categorized as follows:
1. Deep analysis or "understanding": using various types of linguistic knowledge and resources to recognize textual entailment accurately, including
- Patterns of entailment (e.g. lexical relations, syntactic alternations)
- Processing technology (word co-occurrence statistics, thesaurus, parsing, etc.)
2. Shallow approach
Baseline
Since half of the pairs are FALSE, the simplest baseline labels every pair FALSE, which achieves 50% accuracy.
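A minimal sketch of how accuracy against the gold standard is computed, and why the all-FALSE baseline lands at 50% on a balanced set. The function names and the four toy gold labels are hypothetical, not taken from the RTE data.

```python
def accuracy(predictions: list[bool], gold: list[bool]) -> float:
    """Fraction of T-H pairs whose predicted label matches the gold-standard label."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty prediction and gold lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def all_false_baseline(num_pairs: int) -> list[bool]:
    """Label every pair FALSE (no entailment)."""
    return [False] * num_pairs


# Hypothetical balanced gold standard: half TRUE, half FALSE.
gold = [True, False, True, False]
print(accuracy(all_false_baseline(len(gold)), gold))  # 0.5
```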
Application of the BLEU (BiLingual Evaluation Understudy) algorithm
A shallow approach operating at the lexical level. BLEU computes the percentage of n-grams in a candidate translation that also occur in a human reference translation, for typical values of N (1, 2, 3 and 4). Each n-gram's count is clipped to a maximum frequency (its count in the reference), the per-N results are combined, and a penalty is applied to short texts.
Scored 54% on the development set and 50% on the test set, with good results on CD and poor results on IE and IR.
Problem: it captures neither syntax nor semantics, such as synonyms and antonyms.
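The n-gram scoring described above can be sketched as a sentence-level BLEU against a single reference, assuming lowercased whitespace tokenization and no smoothing. Which of T and H plays the candidate and which the reference, and how the score is turned into a TRUE/FALSE label (for example a tuned threshold), are assumptions left to the caller rather than settings of the reported system.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU of `candidate` against a single `reference`."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count to its frequency in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # the geometric mean is zero if any n-gram precision is zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision_sum / max_n)


print(bleu("the cat eats the mouse", "the cat eats the mouse"))  # 1.0
```

The all-or-nothing behaviour on short hypotheses (any zero n-gram precision gives 0) is one reason plain BLEU is a weak entailment signal.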
Syntactic similarities
Human annotators were asked to divide the dataset into four classes:
- True by syntax
- False by syntax
- Not syntax
- Cannot decide
A robust parser was then used to establish the results. Only a partial submission was provided, and humans were used for the test.
Tree edit distance
Both the text and the hypothesis are transformed into trees, using a sentence splitter and a parser to create the syntactic representation. A matching module finds the best sequence of editing operations that obtains H from T. Each editing operation (deletion, insertion and substitution) is given a relative score. Finally, the total score is evaluated: if it exceeds a certain limit, the pair is labeled TRUE.
High accuracy on CD, but 55% overall accuracy. The approach should be enriched with resources such as WordNet and other libraries.
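The edit-distance idea can be sketched with the classic recursive definition of ordered tree edit distance (rightmost-root decomposition, memoized). The tree encoding, the relative costs `DEL_COST`, `INS_COST`, `SUB_COST` and the `threshold` are hypothetical, and this is not the matching module of the system above. Because the sketch computes a distance (lower means closer) rather than a score, a pair is labeled TRUE when the distance stays at or below the cut-off.

```python
from functools import lru_cache

# A tree is (label, children) where children is a tuple of trees; a forest is a tuple of trees.
DEL_COST = 0.5  # hypothetical: removing material from T is relatively cheap
INS_COST = 1.0  # hypothetical: adding material that H needs but T lacks
SUB_COST = 1.0  # hypothetical: relabeling a node (identical labels cost 0)


def leaf(label: str) -> tuple:
    return (label, ())


@lru_cache(maxsize=None)
def forest_distance(f1: tuple, f2: tuple) -> float:
    """Ordered forest edit distance via rightmost-root decomposition."""
    if not f1 and not f2:
        return 0.0
    if not f2:
        _, kids = f1[-1]
        return forest_distance(f1[:-1] + kids, f2) + DEL_COST
    if not f1:
        _, kids = f2[-1]
        return forest_distance(f1, f2[:-1] + kids) + INS_COST
    (t_label, t_kids), (h_label, h_kids) = f1[-1], f2[-1]
    return min(
        forest_distance(f1[:-1] + t_kids, f2) + DEL_COST,  # delete T's rightmost root
        forest_distance(f1, f2[:-1] + h_kids) + INS_COST,  # insert H's rightmost root
        # map the two rightmost roots onto each other (substitution)
        forest_distance(f1[:-1], f2[:-1]) + forest_distance(t_kids, h_kids)
        + (0.0 if t_label == h_label else SUB_COST),
    )


def tree_edit_distance(t_tree: tuple, h_tree: tuple) -> float:
    """Minimum total cost of edit operations that turn T's tree into H's tree."""
    return forest_distance((t_tree,), (h_tree,))


def entails(t_tree: tuple, h_tree: tuple, threshold: float = 1.5) -> bool:
    return tree_edit_distance(t_tree, h_tree) <= threshold


t = ("eats", (leaf("cat"), leaf("mouse"), ("in", (leaf("garden"),))))
h = ("eats", (leaf("cat"), leaf("mouse")))
print(tree_edit_distance(t, h))  # 1.0
print(tree_edit_distance(h, t))  # 2.0
```

Making deletion cheaper than insertion reflects the asymmetry of entailment: T may carry extra material, while material in H that T lacks is suspicious.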
Dependency Analysis and WordNet
A dependency parser is used to normalize the data into an appropriate tree representation. A lexical entailment module then decides whether sub-branches of T and H can be entailed from one another, using:
- Synonymy and similarity
- Hyponymy and WordNet entailment, e.g. kill entails death
- Multiwords, e.g. melanoma entails skin-cancer
- Negation and antonymy, where negation is propagated through the tree leaves
Finally, a matching algorithm searches for matching branches between the dependency trees of T and H.
Results show a high score on CD and between 42% and 55% on the other subsets.
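A minimal sketch of branch matching over dependency triples, assuming lemmatized `(head, relation, dependent)` triples. The `SYNONYMS` and `HYPERNYMS` tables are a hypothetical, hand-written stand-in for WordNet (not real WordNet data, and assumed acyclic), and negation is reduced to requiring the same polarity on matched heads; antonymy is left out.

```python
# Toy stand-in for WordNet: hypothetical hand-written relations, not real WordNet data.
SYNONYMS = {frozenset({"buy", "purchase"})}
HYPERNYMS = {"devour": "eat", "melanoma": "skin-cancer", "skin-cancer": "cancer"}

Triple = tuple[str, str, str]  # (head, dependency relation, dependent), lemmatized


def lexically_entails(t_word: str, h_word: str) -> bool:
    """True if t_word equals, is a synonym of, or is a (transitive) hyponym of h_word."""
    if t_word == h_word or frozenset({t_word, h_word}) in SYNONYMS:
        return True
    word = t_word
    while word in HYPERNYMS:  # walk up toward more general concepts
        word = HYPERNYMS[word]
        if word == h_word:
            return True
    return False


def entails(t_triples: list[Triple], h_triples: list[Triple]) -> bool:
    """Every non-negation branch of H must be matched by a T branch with the same
    relation, lexically entailed words, and the same polarity of the head."""
    t_negated = {head for head, rel, _ in t_triples if rel == "neg"}
    h_negated = {head for head, rel, _ in h_triples if rel == "neg"}
    for head, rel, dep in h_triples:
        if rel == "neg":
            continue
        if not any(
            t_rel == rel
            and lexically_entails(t_head, head)
            and lexically_entails(t_dep, dep)
            and (t_head in t_negated) == (head in h_negated)
            for t_head, t_rel, t_dep in t_triples
        ):
            return False
    return True


t = [("devour", "nsubj", "cat"), ("devour", "dobj", "mouse")]
h = [("eat", "nsubj", "cat"), ("eat", "dobj", "mouse")]
print(entails(t, h))                             # True
print(entails(t, h + [("eat", "neg", "not")]))   # False
```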
Syntactic Graph Distance: a rule-based and an SVM-based approach
Graph distance theory is used, with a graph representing each of T and H. Similarity measures determine entailment in three cases:
- T semantically subsumes H, e.g. H: [The cat eats the mouse] and T: [the cat devours the mouse] (eat generalizes devour).
- T syntactically subsumes H, e.g. H: [The cat eats the mouse] and T: [the cat eats the mouse in the garden] (T contains a specializing prepositional phrase).
- T directly implies H, e.g. H: [The cat killed the mouse] and T: [the cat devours the mouse].
Cont.
A rule-based system computes the following:
- Node similarity
- Syntactic similarity
- Semantic similarity
A machine learning technique is then applied to estimate the parameters and make the final decision.
Results: high for CD (0.76) and 0.44 to 0.59 for the other subsets.
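The three similarity measures can be sketched as coverage ratios over lemmatized graphs, combined by a rule with fixed weights. `WEIGHTS`, `THRESHOLD` and the `generalizes` pairs are hypothetical; the SVM-based variant would instead learn the decision from the same three features rather than using hand-set weights.

```python
# A graph is (nodes, edges): nodes are lemmatized words, edges are (head, relation, dependent).
Graph = tuple[frozenset, frozenset]

WEIGHTS = (0.2, 0.3, 0.5)  # hypothetical weights for node, syntactic, semantic similarity
THRESHOLD = 0.6            # hypothetical decision threshold


def coverage(h_items: frozenset, t_items: frozenset) -> float:
    """Fraction of H's items that also appear in T (1.0 for an empty H)."""
    return len(h_items & t_items) / len(h_items) if h_items else 1.0


def similarity_features(t: Graph, h: Graph, generalizes: set) -> tuple[float, float, float]:
    t_nodes, t_edges = t
    h_nodes, h_edges = h
    node_sim = coverage(h_nodes, t_nodes)
    syntactic_sim = coverage(h_edges, t_edges)
    # Semantic: an H word counts if T has it, or has a word it generalizes,
    # where (tw, hw) in generalizes means hw generalizes tw.
    hits = sum(1 for hw in h_nodes if hw in t_nodes or any((tw, hw) in generalizes for tw in t_nodes))
    semantic_sim = hits / len(h_nodes) if h_nodes else 1.0
    return node_sim, syntactic_sim, semantic_sim


def rule_based_entails(t: Graph, h: Graph, generalizes: set) -> bool:
    score = sum(w * f for w, f in zip(WEIGHTS, similarity_features(t, h, generalizes)))
    return score >= THRESHOLD


# T: the cat devours the mouse in the garden / H: the cat eats the mouse
t = (frozenset({"cat", "devour", "mouse", "in", "garden"}),
     frozenset({("devour", "nsubj", "cat"), ("devour", "dobj", "mouse"),
                ("devour", "prep", "in"), ("in", "pobj", "garden")}))
h = (frozenset({"cat", "eat", "mouse"}),
     frozenset({("eat", "nsubj", "cat"), ("eat", "dobj", "mouse")}))
print(rule_based_entails(t, h, {("devour", "eat")}))  # True
```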
Hierarchical knowledge representation
A hierarchical, logic-based representation of the T-H pairs is built using a description-logic-inspired language, Extended Feature Description Logic (EFDL), which is similar to concept graphs. Nodes in the graph represent words or phrases. Manually written rewriting rules produce the semantic and syntactic representations, so a sentence in the text can have several alternative representations. Entailment holds if any of the sentence representations can infer H.
Results: 64.8% on the development set and 56.1% on the test set; highest on CD, lowest on QA (50%).
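The "any alternative can infer H" check can be sketched as a closure under rewrite rules followed by a containment test. Representations here are flat sets of `(head, role, filler)` triples, a heavy simplification of EFDL graphs, containment stands in for subsumption, and `passive_to_active` is a hypothetical rule, not one of the system's manual rules.

```python
Rep = frozenset  # set of (head, role, filler) triples: a flat stand-in for EFDL graphs


def passive_to_active(rep: Rep) -> Rep:
    """Hypothetical rewrite rule: map passive agent/patient roles to active subj/obj."""
    mapping = {"agent": "subj", "patient": "obj"}
    return frozenset((head, mapping.get(role, role), filler) for head, role, filler in rep)


REWRITE_RULES = [passive_to_active]  # the real system used manually written rules


def alternatives(rep: Rep) -> set:
    """All representations reachable from rep by applying the rewrite rules."""
    seen, frontier = {rep}, [rep]
    while frontier:
        current = frontier.pop()
        for rule in REWRITE_RULES:
            new = rule(current)
            if new not in seen:
                seen.add(new)
                frontier.append(new)
    return seen


def entails(t_rep: Rep, h_rep: Rep) -> bool:
    """T entails H if any alternative of T contains every triple of H."""
    return any(h_rep <= alt for alt in alternatives(t_rep))


# T: the minister was targeted by a gunman / H: a gunman targeted the minister
t = frozenset({("target", "patient", "minister"), ("target", "agent", "gunman")})
h = frozenset({("target", "subj", "gunman"), ("target", "obj", "minister")})
print(entails(t, h))  # True
```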
Logic-like formula representation
A parser transforms T and H into graphs of logical phrases, where the nodes are the words and the links are the relations between them. A matching score is assigned to each pair of terms, and a theorem prover searches for the proof with the lowest cost. If the final cost is below a threshold, entailment is proved.
Highest result on CD (79%), lowest on MT (47%), 55% on average.
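The cost-and-threshold decision can be sketched by replacing the prover's search with an independent per-literal minimum: each H literal is either matched by its cheapest T literal or assumed at a fixed cost. All costs, `PRED_RELAX`, and the assumption that T and H share variable names after parsing are hypothetical.

```python
Literal = tuple[str, tuple[str, ...]]  # (predicate, arguments)

ASSUME_COST = 1.0                      # hypothetical cost of assuming an unsupported H literal
ARG_MISMATCH_COST = 0.5                # hypothetical cost per mismatched argument
PRED_RELAX = {("devour", "eat"): 0.2}  # hypothetical cost of matching related predicates


def literal_cost(t_lit: Literal, h_lit: Literal) -> float:
    """Cost of using T literal t_lit to prove H literal h_lit."""
    (t_pred, t_args), (h_pred, h_args) = t_lit, h_lit
    if len(t_args) != len(h_args):
        return ASSUME_COST
    if t_pred == h_pred:
        cost = 0.0
    elif (t_pred, h_pred) in PRED_RELAX:
        cost = PRED_RELAX[(t_pred, h_pred)]
    else:
        return ASSUME_COST
    cost += ARG_MISMATCH_COST * sum(a != b for a, b in zip(t_args, h_args))
    return min(cost, ASSUME_COST)


def proof_cost(t_lits: list[Literal], h_lits: list[Literal]) -> float:
    """Each H literal is proved by its cheapest T literal, or assumed at ASSUME_COST."""
    return sum(min([ASSUME_COST] + [literal_cost(t, h) for t in t_lits]) for h in h_lits)


def entails(t_lits: list[Literal], h_lits: list[Literal], threshold: float = 0.5) -> bool:
    return proof_cost(t_lits, h_lits) < threshold


t = [("cat", ("x",)), ("mouse", ("y",)), ("devour", ("x", "y"))]
h = [("cat", ("x",)), ("mouse", ("y",)), ("eat", ("x", "y"))]
print(entails(t, h))  # True
```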
Atomic Propositions
Entailment is found by comparing the atomic propositions contained in T and H. The atomic propositions are extracted from the text using a parser, and a semantic analyzer transforms the parser output into first-order logic. WordNet supplies word relations, and the comparison of atomic propositions is done with the deduction system OTTER.
Low accuracy (0.5188), especially for QA (47%).
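The comparison of atomic propositions can be sketched with forward chaining over ground atoms: T's atoms are closed under simple axioms, and H is entailed if all its atoms appear in the closure. This swaps OTTER's resolution for forward chaining, and the `AXIOMS` (of the form P(args) implies Q(args)) are hypothetical examples of what WordNet relations might contribute.

```python
Atom = tuple[str, tuple[str, ...]]  # (predicate, arguments)

# Hypothetical axioms P(args) -> Q(args), e.g. as could be derived from WordNet relations.
AXIOMS = [("assassinate", "kill"), ("kill", "harm"), ("devour", "eat")]


def closure(facts: set[Atom], axioms=AXIOMS) -> set[Atom]:
    """Forward-chain the axioms over the facts until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for pred, args in list(derived):
            for premise, conclusion in axioms:
                new_atom = (conclusion, args)
                if pred == premise and new_atom not in derived:
                    derived.add(new_atom)
                    changed = True
    return derived


def entails(t_atoms: set[Atom], h_atoms: set[Atom]) -> bool:
    """T entails H if every atomic proposition of H is derivable from T."""
    return set(h_atoms) <= closure(t_atoms)


t = {("assassinate", ("gunman", "minister"))}
h = {("harm", ("gunman", "minister"))}
print(entails(t, h))  # True
```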
Combining a shallow overlapping technique with deep theorem proving
In the shallow stage, a simple frequency test of overlapping words is used. In the deep stage, a CCG parser generates DRSs (Discourse Representation Structures, from Discourse Representation Theory), which are transformed into first-order logic. The Vampire theorem prover and the Paradox model builder were used for the entailment proof. A knowledge base was used to validate results against the real world:
- WordNet
- Geographical knowledge from the CIA
- Generic axioms, for instance for the semantics of possessives, active-passive alternations, and locations
The combined system has an accuracy of 0.562, while the shallow approach alone has an accuracy of 0.55.
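The shallow stage and one possible way of combining it with a deep stage can be sketched as follows. The back-off rule (trust the prover when it reaches a verdict, otherwise use word overlap), the `overlap_threshold`, and the `deep_prover` callable are assumptions for illustration; the slide does not describe how the two stages were actually combined.

```python
from typing import Callable, Optional

# A deep prover returns True/False when it reaches a verdict, or None when it gives up.
Prover = Callable[[str, str], Optional[bool]]


def word_overlap(text: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the text."""
    t_words = set(text.lower().split())
    h_words = hypothesis.lower().split()
    return sum(w in t_words for w in h_words) / len(h_words) if h_words else 1.0


def combined_entails(text: str, hypothesis: str,
                     deep_prover: Optional[Prover] = None,
                     overlap_threshold: float = 0.7) -> bool:
    """Trust the deep stage when it decides; otherwise back off to word overlap."""
    if deep_prover is not None:
        verdict = deep_prover(text, hypothesis)
        if verdict is not None:
            return verdict
    return word_overlap(text, hypothesis) >= overlap_threshold


print(combined_entails("the cat eats the mouse", "the cat eats"))  # True
```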
Applying the COGEX logic prover
A parser first converts T and H into logic form. COGEX, a modified version of OTTER, is then applied. The prover requires a set of clauses called the "set of support", which is used to initiate the search for inferences; it is loaded with the negated form of the hypothesis as well as the predicates that make up the text passage. A second list, called the usable list, contains the clauses OTTER uses to generate inferences. It consists of all the axioms, generated either automatically or by hand:
- World knowledge axioms (manual)
- NLP axioms (SS and SM)
- WordNet lexical chains
Overall accuracy 0.551, with many errors originating in the parsing stage.
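The set-of-support strategy can be sketched as a propositional given-clause resolution loop: the set of support holds the text facts and the negated hypothesis, the usable list holds the axioms, and deriving the empty clause proves entailment. This is a propositional simplification of what COGEX and OTTER do in first-order logic, and the proposition names in the example are hypothetical.

```python
Clause = frozenset  # a disjunction of literals; "~p" is the negation of "p"


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1: Clause, c2: Clause) -> list[Clause]:
    """All propositional resolvents of two clauses."""
    return [(c1 - {lit}) | (c2 - {negate(lit)}) for lit in c1 if negate(lit) in c2]


def refute(set_of_support: list[Clause], usable: list[Clause]) -> bool:
    """Given-clause loop: each inference uses a clause taken from the set of support."""
    sos = list(set_of_support)
    usable = set(usable)
    while sos:
        given = sos.pop(0)
        usable.add(given)  # the given clause joins the usable list
        for partner in list(usable):
            for r in resolvents(given, partner):
                if not r:
                    return True  # empty clause: contradiction, so the proof succeeds
                if r not in usable and r not in sos:
                    sos.append(r)
    return False


def entails(text_facts: list[str], hypothesis: list[str], axioms: list[list[str]]) -> bool:
    """Load the SoS with the text facts and the negated (conjunctive) hypothesis."""
    sos = [frozenset({f}) for f in text_facts] + [frozenset(negate(h) for h in hypothesis)]
    return refute(sos, [frozenset(a) for a in axioms])


axioms = [["~attempt_on_pm", "pm_targeted"]]  # attempt_on_pm -> pm_targeted
print(entails(["attempt_on_pm"], ["pm_targeted"], axioms))  # True
```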
Comparing task accuracy
CD: Comparable Documents
IE: Information Extraction
QA: Question Answering
IR: Information Retrieval
MT: Machine Translation
RC: Reading Comprehension
PP: Paraphrasing
Future work
- Search for a suitable parser to transform natural language into first-order logic.
- Use the largest available knowledge bases (KBs) to capture similarity.
- Search for a robust theorem prover.
References
The first PASCAL Recognising Textual Entailment Challenge (RTE I):
Ido Dagan, Oren Glickman and Bernardo Magnini. The PASCAL Recognising Textual Entailment Challenge. In Proceedings of the PASCAL Challenges Workshop on Recognising Textual Entailment, 2005.
http://www.cs.biu.ac.il/~glikmao/rte05/index.html