Contradictions in text. Mohammed Nazeem (08305038), Swaroop Joshi (08305013). CS 626 - Course Seminar. Friday, April 10, 2009.
Contents: Introduction & Motivation; Contradictions; What is a contradiction; Typology of contradictions; Contradiction corpora; System overview; Features of contradiction detection; Results; Error Analysis; Conclusion.
Introduction & Motivation. Detecting conflicting statements is a fundamental text understanding task with applications in information analysis. Applications: applying a contradiction detection system to political candidate debates; intelligence reports; bioinformatics (protein-protein interaction).
Example: (1) Police specializing in explosives defused the rockets. Some 100 people were working inside the plant. (2) 100 people were injured. Defused rockets cannot go off, and thus cannot injure anyone, so (1) does not entail (2); the two statements contradict each other.
Contradictions require deeper inferences and model building. Mismatching information is a cue for non-entailment, but it is not sufficient for detecting a contradiction. Assessing event coreference is essential: for texts to contradict, they must refer to the same event.
Contradictions. What is a contradiction? Strict definition: sentences A and B are contradictory if there is no possible world in which A and B are both true. A looser definition is needed: a contradiction occurs when two sentences are extremely unlikely to be true simultaneously. Example: "Sally sold a boat to John" and "John sold a boat to Sally" count as contradictory under the looser definition, even though each could have sold a boat to the other.
For texts to be contradictory, they must involve the same event. Two phenomena have to be handled: implied coreference and embedded texts.
Examples. Implied coreference: (1) Passions surrounding Germany’s final match turned violent when a woman stabbed her partner because she didn’t want to watch the game. (2) A woman passionately wanted to watch the game. Embedded texts: (1) Eyewitnesses said de Menezes had jumped over the turnstile at Stockwell subway station. (2) The documents leaked to ITV News suggest that Menezes walked casually into the subway station.
Typology of contradictions. Antonym: Capital punishment is a catalyst for more crime. / Capital punishment is a deterrent to crime. Negation: A closely divided Supreme Court said that juries and not judges must impose a death sentence. / The Supreme Court decided that only judges can impose the death sentence. Numeric: The tragedy of the explosion in Qana that killed more than 50 civilians has presented Israel with a dilemma. / An investigation into the strike in Qana found 28 confirmed dead thus far.
Factive: Prime Minister John Howard says he will not be swayed by a warning that Australia faces more terrorism attacks unless it withdraws its troops from Iraq. / Australia withdraws from Iraq. Structure: The Channel Tunnel stretches from England to France. It is the second-longest rail tunnel in the world, the longest being a tunnel in Japan. / The Channel Tunnel connects France and Japan.
Lexical: The Canadian parliament’s Ethics Commission said former immigration minister, Judy Sgro, did nothing wrong and her staff had put her into a conflict of interest. / The Canadian parliament’s Ethics Commission accuses Judy Sgro. WK (world knowledge): Microsoft Israel, one of the first Microsoft branches outside the USA, was founded in [...]. / Microsoft was established in 1989.
Two categories. Category 1: contradictions occurring via antonymy, negation, and date/number mismatch. Category 2: contradictions arising from the use of factive or modal words, structural and subtle lexical contrasts, as well as world knowledge (WK).
Previous structural example: The Channel Tunnel stretches from England to France. It is the second-longest rail tunnel in the world, the longest being a tunnel in Japan. / The Channel Tunnel connects France and Japan. Non-contradiction: The CFAP purchases food stamps from the government and distributes them to eligible recipients. / A government purchases food.
Contradiction corpora. De Marneffe et al. annotated the RTE datasets for contradiction. These datasets contain pairs consisting of a short text and a one-sentence hypothesis.
In contrast, the corpora of Harabagiu et al. (2006) were constructed using negation and paraphrase. H: Bill has finished his math. Neg-H: Bill hasn’t finished his math. Para-Neg-H: Bill is still working on his math. Because they rely on rewriting, both the negated and the paraphrased corpora lie mostly in the space of ‘easy’ contradictions.
The LCC negation corpus was simulated by adding negative markers to the RTE2 test data (Neg test). A development set (Neg dev) was constructed by randomly sampling 50 pairs of entailments and 50 pairs of non-entailments from the RTE2 development set.
[Table: Number of contradictions in the RTE datasets. Source: Marie-Catherine de Marneffe, Anna N. Rafferty, and Christopher D. Manning]
Collected contradictions ‘in the wild’: 131 contradictory pairs. 19 from newswire (found by looking at related articles in Google News), 51 from Wikipedia, 10 from the Lexis Nexis database, and 51 from the data prepared by LDC for the distillation task (DARPA GALE program).
[Table: Percentages of contradiction types in the RTE3 dev dataset and the real contradiction corpus. Source: Marie-Catherine de Marneffe, Anna N. Rafferty, and Christopher D. Manning]
This supports the intuition that contradictions primarily occur for two reasons: information is updated as knowledge of an event is acquired over time (e.g. a rising death toll), and various parties have divergent views of an event.
System Overview: (1) Linguistic analysis; (2) Alignment between graphs; (3) Filtering non-coreferent events; (4) Extraction of contradiction features.
Linguistic Analysis (1/2). Computes linguistic representations containing information about the semantic content of the passages. The text and hypothesis are converted to typed dependency graphs, produced by the Stanford Parser. Collocations in WordNet and named entities are collapsed, so that entities and multiword relations become single nodes.
Linguistic Analysis (2/2) The typed dependency graph of “Mitsubishi sales rose 46 percent”
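The figure for this slide is not part of the transcript. As a rough stand-in, the sketch below writes the graph as (relation, head, dependent) triples. The relation labels are the ones a Stanford-style parser would plausibly assign to this sentence; they are an assumption for illustration, not verified parser output, and the helper names are hypothetical.

```python
from collections import defaultdict

# Assumed Stanford-style typed dependencies for
# "Mitsubishi sales rose 46 percent" (illustrative, not parser output).
DEPENDENCIES = [
    ("nsubj", "rose", "sales"),
    ("nn", "sales", "Mitsubishi"),
    ("dobj", "rose", "percent"),
    ("num", "percent", "46"),
]


def children_by_head(triples):
    """Group (relation, dependent) pairs under their head word."""
    graph = defaultdict(list)
    for relation, head, dependent in triples:
        graph[head].append((relation, dependent))
    return dict(graph)


def find_root(triples):
    """The root is the only head that never appears as a dependent."""
    heads = {head for _, head, _ in triples}
    dependents = {dependent for _, _, dependent in triples}
    (root,) = heads - dependents
    return root


GRAPH = children_by_head(DEPENDENCIES)
ROOT = find_root(DEPENDENCIES)
```

Here the verb "rose" comes out as the root, with "sales" and "percent" as its direct dependents.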
Alignment. A mapping from each node in the hypothesis to a unique node in the text or to null. Example: alignment between [T] Mitsubishi Motors Corp.’s new vehicle sales in the US fell 46 percent in June. and [H] Mitsubishi sales rose 46 percent.
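To make the shape of an alignment concrete, the sketch below stores one plausible alignment for this pair as a dictionary and checks the property stated on the slide: every hypothesis node has exactly one target, which is either a text node or null (`None`). The node lists and the alignment itself are written by hand here as an assumption; the actual system scores candidate alignments automatically.

```python
# Hand-written nodes after collapsing the named entity (an assumption).
TEXT_NODES = ["Mitsubishi Motors Corp.", "new", "vehicle", "sales",
              "US", "fell", "46", "percent", "June"]
HYPOTHESIS_NODES = ["Mitsubishi", "sales", "rose", "46", "percent"]

# One plausible alignment: note that "rose" is aligned with "fell".
ALIGNMENT = {
    "Mitsubishi": "Mitsubishi Motors Corp.",
    "sales": "sales",
    "rose": "fell",
    "46": "46",
    "percent": "percent",
}


def is_valid_alignment(alignment, hypothesis_nodes, text_nodes):
    """Each hypothesis node maps to one text node or to None."""
    covers_hypothesis = set(alignment) == set(hypothesis_nodes)
    targets_ok = all(t is None or t in text_nodes for t in alignment.values())
    return covers_hypothesis and targets_ok


def unaligned_text_nodes(alignment, text_nodes):
    """Text nodes that no hypothesis node points to."""
    targets = {t for t in alignment.values() if t is not None}
    return [node for node in text_nodes if node not in targets]
```

Aligning "rose" with "fell" is what later lets the antonymy feature fire on this pair.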
Filtering non-coreferent events. Contradiction features are extracted based on mismatches between the Text and the Hypothesis. Pairs which do not describe the same event are removed. Example: [T] In Bihar, thousands die due to the flooding Ganga. [H] 43 die as a boat sinks in Ganga near Kashi.
Extraction of contradiction features. Contradiction features are extracted based on mismatches between the Text and the Hypothesis. Logistic regression is then applied to classify the pair as contradictory or not. (Logistic regression predicts the probability of occurrence of an event by fitting data to a logistic curve; it is a generalized linear model used for binomial regression [Wiki].) The feature weights are hand-set, guided by linguistic information.
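A minimal sketch of how hand-set weights and a logistic curve combine into a decision: the probability is 1 / (1 + exp(-(b + sum of w_i * x_i))). The feature names, weights, bias, and threshold below are hypothetical values picked only for illustration; the talk does not report the actual ones.

```python
import math

# Hypothetical feature names and hand-set weights (not from the paper).
WEIGHTS = {
    "polarity_mismatch": 2.0,
    "number_mismatch": 1.5,
    "antonym_aligned": 1.5,
}
BIAS = -2.0


def contradiction_probability(features):
    """Apply the logistic function to a weighted sum of feature values."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def is_contradiction(features, threshold=0.5):
    """Classify a pair as contradictory when the probability reaches the threshold."""
    return contradiction_probability(features) >= threshold
```

With these made-up weights, a pair with no firing features stays well below the threshold, while a polarity mismatch plus an aligned antonym pushes it above.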
Features for contradiction detection: Polarity; Number, date and time; Antonymy; Structural; Factivity; Modality; Relational.
Feature-1 : Polarity (1/2). Polarity difference is estimated for a word in the hypothesis and its aligned word in the text. If one word is negated and the other is not, there might be a polarity difference. This difference is confirmed by checking that the two words are not antonyms and that they lack unaligned prepositions or other context suggesting that they do not refer to the same thing.
Feature-1 : Polarity (2/2) Example [T] A closely divided Supreme Court said that juries and not judges must impose a death sentence. [H] The Supreme Court decided that judges can impose death sentence
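The first step of the polarity check can be sketched as follows. This is a toy version under strong assumptions: a word counts as negated only when a negation marker directly precedes it, words are aligned by exact match, and the confirmation steps (antonym check, unaligned prepositions) are left out. The marker list and function names are hypothetical.

```python
NEGATION_MARKERS = {"not", "no", "never", "n't"}


def negated_words(tokens):
    """Words directly preceded by a negation marker (toy scope rule)."""
    return {
        tokens[i + 1].lower()
        for i in range(len(tokens) - 1)
        if tokens[i].lower() in NEGATION_MARKERS
    }


def polarity_mismatches(text_tokens, hyp_tokens):
    """Shared words that are negated on one side only."""
    text_negated = negated_words(text_tokens)
    hyp_negated = negated_words(hyp_tokens)
    shared = {t.lower() for t in text_tokens} & {h.lower() for h in hyp_tokens}
    return sorted(w for w in shared if (w in text_negated) != (w in hyp_negated))


TEXT = ("A closely divided Supreme Court said that juries and not judges "
        "must impose a death sentence").split()
HYPOTHESIS = "The Supreme Court decided that judges can impose death sentence".split()
```

On the example pair, the only word with differing polarity is "judges", which is negated in the text but not in the hypothesis.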
Feature-2 : Numbers, date and time (1/2). Normalize date and time expressions. Represent numbers as ranges. Expression matching: “over 100” and “200” do not contradict. Aligned numbers are marked as mismatches when they are incompatible and the surrounding words match well.
Feature-2 : Numbers, date and time (2/2). Example: [T] Over 50% Kashmiris used their ballot power. [H] 45 pc voting in Kashmir.
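Representing numbers as ranges can be sketched like this: each expression becomes a (low, high) interval, and two expressions are compatible when the intervals overlap. The small set of modifiers handled here is an assumption, interval ends are treated as inclusive for simplicity, and the second condition from the slide (surrounding words must match well) is not modelled.

```python
import math
import re


def to_range(expression):
    """Turn a numeric expression into a (low, high) interval."""
    match = re.search(r"\d+(?:\.\d+)?", expression)
    if match is None:
        raise ValueError(f"no number in {expression!r}")
    value = float(match.group())
    lowered = expression.lower()
    if re.search(r"\b(over|more than|above)\b", lowered):
        return (value, math.inf)   # open-ended upwards
    if re.search(r"\b(under|less than|below)\b", lowered):
        return (-math.inf, value)  # open-ended downwards
    return (value, value)          # exact number


def compatible(range_a, range_b):
    """Two ranges are compatible when they overlap."""
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return low <= high
```

With this representation, "over 100" and "200" overlap and do not contradict, while "Over 50%" and "45 pc" from the example above do not overlap and would be flagged as a mismatch.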
Feature-3 : Antonymy. Antonyms and contrasting words → WordNet. Oppositional verbs → VerbOcean. Check for common antonym prefixes, like anti-, un-, etc. The polarity of the context is used to determine if the antonyms create a contradiction.
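The prefix check is the one part of this feature that needs no external resource, so it is easy to sketch. The prefix tuple contains only the two prefixes named on the slide; the function name is hypothetical, and the WordNet and VerbOcean lookups are not shown.

```python
ANTONYM_PREFIXES = ("anti", "un")  # the two prefixes named on the slide


def differ_by_antonym_prefix(word_a, word_b, prefixes=ANTONYM_PREFIXES):
    """True when one word is the other plus an antonym prefix."""
    a = word_a.lower().replace("-", "")
    b = word_b.lower().replace("-", "")
    return any(a == prefix + b or b == prefix + a for prefix in prefixes)
```

A purely string-based test like this one produces false positives (for example "unit" and "it"), so in practice it would need to be combined with a lexicon check.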
Feature-4 : Structural (1/2). Determine whether the syntactic structures of the text and the hypothesis create a contradiction. Compare the subject and object of each aligned verb: if the subject in the Text overlaps with the object in the Hypothesis (or vice versa) → contradiction.
Feature-4 : Structural (2/2). Example: [T] Kalam succeeded Narayanan as the President in 2002. [H] Narayanan succeeded Kalam as the Indian President. Contradictions arising from a modified superlative: [T] Sachin is the highest test scorer. [H] Sachin is the second highest test scorer.
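The subject/object comparison can be sketched on hand-built verb frames. The frame format (a dictionary with `verb`, `subject`, and `object`) is an assumption made for illustration, and "overlap" is reduced to string equality; the superlative case is not covered.

```python
def structural_mismatch(text_frame, hyp_frame):
    """Flag aligned verbs whose arguments appear in swapped roles."""
    if text_frame["verb"] != hyp_frame["verb"]:
        return False  # only compare frames of aligned (here: identical) verbs
    return (
        text_frame["subject"] == hyp_frame["object"]
        or text_frame["object"] == hyp_frame["subject"]
    )


TEXT_FRAME = {"verb": "succeed", "subject": "Kalam", "object": "Narayanan"}
HYP_FRAME = {"verb": "succeed", "subject": "Narayanan", "object": "Kalam"}
```

For the Kalam/Narayanan pair the roles are swapped, so the feature fires. Symmetric verbs such as "meet" would need to be excluded, which this sketch does not attempt.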
Feature-5 : Factivity. The context in which a verb phrase is embedded may lead to contradiction. Negation influences some factivity patterns. Example: [T1] He forgot to close the door. [T2] He intended to close the door. [H] He did not close the door.
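One way to picture this feature is a small signature table that records what an embedding verb implies about its complement, with negation flipping the implication. The table below is a hypothetical three-entry illustration, not the system's actual lexicon, and the function names are made up for the sketch.

```python
# Hypothetical signatures: what the embedding verb implies about its
# complement in a positive context (True, False, or None for unknown).
SIGNATURES = {"managed": True, "forgot": False, "intended": None}


def complement_truth(embedding_verb, negated=False):
    """Truth of the embedded event, flipped when the verb is negated."""
    value = SIGNATURES.get(embedding_verb)
    if value is None:
        return None
    return (not value) if negated else value


def relation_to_hypothesis(embedding_verb, hypothesis_asserts, negated=False):
    """Compare the implied truth with what the hypothesis asserts."""
    truth = complement_truth(embedding_verb, negated)
    if truth is None:
        return "unknown"
    return "entailment" if truth == hypothesis_asserts else "contradiction"
```

For the slide's hypothesis "He did not close the door" (the event is asserted not to have happened, so `hypothesis_asserts=False`), "forgot" agrees with it, "intended" leaves it open, and "did not forget" contradicts it.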
Feature-6 : Modality (1/2). Simple patterns of modal reasoning are captured by mapping the text and hypothesis to one of six modalities: possible, not_possible, actual, not_actual, necessary, not_necessary. Example (possible, not_possible): [T] The trial court may allow the prevailing party reasonable attorney fees as part of costs. [H] The prevailing party may not recover attorney fees.
Feature-6 : Modality (2/2) Example (possible, actual) [T] India may beat New Zealand in the last test [H] The last test between India and New Zealand ended in a draw
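The mapping to modalities can be sketched with a short marker list. Both the markers and the rule used here (a modality contradicts only its own negation) are simplifying assumptions for illustration; the talk does not list the system's markers or its full table of modality pairs.

```python
# Hypothetical markers; multiword markers come first so "may not"
# is matched before "may".
MODALITY_MARKERS = [
    ("may not", "not_possible"),
    ("cannot", "not_possible"),
    ("must", "necessary"),
    ("may", "possible"),
    ("can", "possible"),
]


def modality_of(sentence):
    """Return the modality of the first matching marker, else 'actual'."""
    padded = f" {sentence.lower()} "
    for marker, modality in MODALITY_MARKERS:
        if f" {marker} " in padded:
            return modality
    return "actual"


def negate(modality):
    """possible <-> not_possible, actual <-> not_actual, and so on."""
    return modality[4:] if modality.startswith("not_") else "not_" + modality


def modalities_contradict(text_modality, hyp_modality):
    """Simplified rule: only a modality and its negation contradict."""
    return negate(text_modality) == hyp_modality
```

Under this rule the attorney-fees pair maps to (possible, not_possible) and is flagged, while the cricket pair maps to (possible, actual) and is not flagged by modality alone.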
Feature-7 : Relational. The hypothesis captures a relation between elements in the text. This involves finding such relations (using Semgrex) and ensuring that the arguments between the text and the hypothesis match. Example (not a contradiction): [T] Dutt, of SP, wishes to contest from Lucknow. [H] Dutt is a member of SP.
Results (1/2). Reasonably good in Category 1: 63% precision for Negation. Single-word antonymy is handled with good precision (78.9%). Improvement is possible with more sophisticated techniques to extract opposite terms. Source: de Marneffe et al. (2008).
Results (2/2). High precision is difficult for the Numeric category. Example: [T] Tata Steel's quarterly profit grows by 32%. [H] Tata Steel's orders cross Rs. 400 crore, including a 12% rise in international orders. System: flags a mismatch between 12% and 32%, although one refers to profits and the other to orders. In Category 2, results are good only for (a) factive/modal and (b) structural contradictions.
Conclusion. Detecting category 1 contradictions is feasible in current systems. Detecting them would solve about half of the problem in real-world applications. Even then, category 2 must be targeted to detect the interesting examples. Some of these (lexical and world knowledge) are beyond our grasp, but progress has been made on factive and structural contradictions. Contradiction is foundational in text comprehension.
References. (1) “Finding Contradictions in Text”, Marie-Catherine de Marneffe, Anna N. Rafferty, and Christopher D. Manning. Proceedings of ACL-08: HLT, June 2008. (2) “Learning to recognize features of valid textual entailments”, Bill MacCartney et al. Proceedings of HLT-NAACL 2006. (3) “The second PASCAL recognising textual entailment challenge”, Ido Dagan et al. Proceedings of the Second PASCAL Challenges Workshop, 2006. (4) http://en.wikipedia.org/wiki/Logistic_regression
Extra In the election, Bush called for U.S. troops to be withdrawn from the peacekeeping mission in the Balkans. He cites such missions as an example of how America must “stay the course.”
Extra. PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning) is a Network of Excellence funded by the European Union. It has established a distributed institute that brings together researchers and students across Europe, and is now reaching out to countries all over the world. Language Computer Corporation (LCC) is a natural language processing research company based in Richardson, Texas. The company develops software for question answering, information extraction, and automatic summarization.