Toward Dependency Path based Entailment Rodney Nielsen, Wayne Ward, and James Martin.


1 Toward Dependency Path based Entailment Rodney Nielsen, Wayne Ward, and James Martin

2 Dependency Path-based Entailment
- DIRT (Lin and Pantel, 2001): an unsupervised method to discover inference rules
  - “X is author of Y ≈ X wrote Y”
  - “X solved Y ≈ X found a solution to Y”
- If two dependency paths tend to link the same sets of words, Lin and Pantel hypothesize that their meanings are similar
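The idea that two paths are similar when they link the same sets of words can be sketched roughly as follows. This is a simplification, not the DIRT algorithm itself: plain Jaccard overlap of slot fillers stands in for the mutual-information-weighted similarity of the original method, and the triples, `slot_fillers`, and `path_similarity` are hypothetical names and toy data introduced only for illustration.

```python
from collections import defaultdict

def slot_fillers(triples):
    """Map each path to the sets of words seen in its X and Y slots."""
    fillers = defaultdict(lambda: {"X": set(), "Y": set()})
    for x, path, y in triples:
        fillers[path]["X"].add(x)
        fillers[path]["Y"].add(y)
    return fillers

def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed stand-in for DIRT's weighted measure."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def path_similarity(fillers, p1, p2):
    """Geometric mean of the X-slot and Y-slot overlaps of two paths."""
    sx = jaccard(fillers[p1]["X"], fillers[p2]["X"])
    sy = jaccard(fillers[p1]["Y"], fillers[p2]["Y"])
    return (sx * sy) ** 0.5

# Toy (X, path, Y) observations, invented for this example.
TRIPLES = [
    ("Austen", "X wrote Y", "Emma"),
    ("Austen", "X is author of Y", "Emma"),
    ("Melville", "X wrote Y", "Moby-Dick"),
    ("Melville", "X is author of Y", "Moby-Dick"),
    ("Wiles", "X solved Y", "the conjecture"),
]
FILLERS = slot_fillers(TRIPLES)
```

With this toy data, "X wrote Y" and "X is author of Y" share all of their slot fillers and score as highly similar, while "X wrote Y" and "X solved Y" share none.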

3 ML Classification Approach
- Features derived from corpus statistics
  - Unigram co-occurrence
  - Surface form bigram co-occurrence
  - Dependency-derived bigram co-occurrence
- Mixture of experts
  - About 18 ML classifiers from the Weka toolkit
  - Classify by majority vote or average probability
[Diagram labels: Bag of Words; Graph Matching; Dependency Path Based Entailment]
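The two combination rules named on this slide can be sketched as below. This is a minimal illustration, not the authors' Weka setup: it assumes each expert emits a probability that the text entails the hypothesis, that 0.5 is the decision threshold, and that a tied vote counts as "not entailed". The function names and the sample probabilities are hypothetical.

```python
def majority_vote(probabilities, threshold=0.5):
    """True if strictly more than half of the experts vote 'entailed'."""
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes * 2 > len(probabilities)  # assumption: ties count as False

def average_probability(probabilities, threshold=0.5):
    """True if the mean expert probability reaches the threshold."""
    return sum(probabilities) / len(probabilities) >= threshold

# Hypothetical outputs of five experts for one text/hypothesis pair.
EXPERT_PROBS = [0.9, 0.45, 0.4, 0.95, 0.3]
```

The sample shows that the rules can disagree: only two of five experts vote "entailed", but those two are confident enough to pull the average above the threshold.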

4 Corpora
- 7.4M articles, 2.5B words, 347 words/doc
  - Gigaword (Graff, 2003): 77% of documents
  - Reuters Corpus (Lewis et al., 2004)
  - TIPSTER
- Lucene IR engine
  - Two indices: word surface form; Porter stem filter
  - Stop words = {a, an, the}

5 Core Features
Core Repeated Features
- Product of MLEs
- Average of MLEs
- Geometric Mean of MLEs
- Worst Non-Zero MLE
- Entailing Ngrams for the Lowest Non-Zero MLE
- Largest Entailing Ngram Count with a Zero MLE
- Smallest Entailing Ngram Count with a Non-Zero MLE
- Count of Ngrams in h that do not Co-occur with any Ngrams from t
- Count of Ngrams in h that do Co-occur with Ngrams in t
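Several of the core repeated features above are simple aggregates over the per-n-gram MLE values and can be sketched as follows. This assumes one MLE per hypothesis n-gram, and that an MLE of zero means the n-gram co-occurs with nothing in t. The three features that report raw n-gram counts are omitted because they need the underlying corpus counts. The function name and dictionary keys are hypothetical.

```python
import math

def core_repeated_features(mles):
    """Aggregate a non-empty list of per-n-gram MLEs into summary features."""
    if not mles:
        raise ValueError("need at least one MLE value")
    nonzero = [m for m in mles if m > 0]
    n = len(mles)
    product = math.prod(mles)
    return {
        "product": product,
        "average": sum(mles) / n,
        "geometric_mean": product ** (1 / n),
        # Lowest MLE that is still above zero; 0.0 if every MLE is zero.
        "worst_nonzero": min(nonzero) if nonzero else 0.0,
        # Assumption: MLE == 0 means no co-occurrence with any n-gram in t.
        "count_no_cooccurrence": n - len(nonzero),
        "count_cooccurrence": len(nonzero),
    }
```

A single zero MLE drives both the product and the geometric mean to zero, which may be one reason the deck also tracks the average and the worst non-zero value separately.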

6 Dependency Features
- Dependency bigram features
[Figure: dependency trees for Hypothesis h ("The cost of paper is rising") and Text t ("Newspapers choke on rising paper costs and falling revenues")]

7-10 Dependency Features
- Descendent relation statistics
[Figure, repeated across these four slides: dependency trees for Hypothesis h ("The cost of paper is rising") and Text t ("Newspapers choke on rising paper costs and falling revenues")]
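The dependency bigrams and descendent relations on slides 6 to 10 can be illustrated on the hypothesis tree. The head assignments below are read off the figure residue (rising → cost, is; cost → The, of; of → paper) and should be treated as an assumption, as should the function names. Using words as dictionary keys only works because no word repeats in this sentence.

```python
# child -> head for "The cost of paper is rising" (root: "rising").
# Assumed from the slide's tree layout, not produced by a parser.
HYPOTHESIS_HEADS = {
    "cost": "rising",
    "is": "rising",
    "The": "cost",
    "of": "cost",
    "paper": "of",
}

def dependency_bigrams(heads):
    """(head, child) pairs: one per edge of the dependency tree."""
    return sorted((head, child) for child, head in heads.items())

def descendent_pairs(heads):
    """(ancestor, descendent) pairs: every node paired with each ancestor."""
    pairs = []
    for word in heads:
        ancestor = heads.get(word)
        while ancestor is not None:
            pairs.append((ancestor, word))
            ancestor = heads.get(ancestor)  # climb toward the root
    return sorted(pairs)
```

Here the tree has five edges, so five dependency bigrams, but nine ancestor-descendent pairs; for example ("rising", "paper") is a descendent relation without being a dependency bigram.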

11 Verb Dependency Features
- Combined verb descendent relation features
- Worst verb descendent relation features
[Figure: dependency trees for Hypothesis h and Text t]

12 Subject Dependency Features
- Combined and worst subject descendent relations
- Combined and worst subject-to-verb paths
[Figure: dependency trees for Hypothesis h and Text t]

13 Other Dependency Features
Repeat these same features for:
- Object
- pcomp-n
- Other descendent relations

14 Results
RTE2 by Task (columns: IE, IR, QA, SUM, Overall)
- Accuracy
- Average Precision
RTE2 Accuracy (columns: SUM, NonSUM, Overall)
- Test Set
- Training Set CV
RTE1 Accuracy (columns: CD, NonCD, Overall)
- Test Set (Best submission): 83.3 (83.3), 56.8 (52.8), 61.8 (58.6)
- Training Set CV

15 Feature Analysis
- All feature sets contribute, according to cross-validation on the training set
- Most significant feature set: unigram stem-based word alignment
- Most significant core repeated feature: Average MLE

16 Questions
- Mixture of experts classifier using corpus co-occurrence statistics
- Moving in the direction of DIRT
- Domain of interest: student response analysis in intelligent tutoring systems
[Slide repeats the results tables from slide 14 and the dependency-tree figure from slide 6]

17 Why Entailment
- Intelligent Tutoring Systems
- Student Interaction Analysis
  - Are all aspects of the student’s answer entailed by the text and the gold standard answer?
  - Are all aspects of the desired answer entailed by the student’s response?

18 Word Alignment Features
- Unigram word alignment

19 Word Alignment Features
- Bigram word alignment
Example:
  Text t: Newspapers choke on rising paper costs and falling revenue.
  Hypothesis h: The cost of paper is rising.
  MLE(cost, t) = n_{cost of, costs of} / n_{costs of} = 6086/35800 = 0.17
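The arithmetic in this example can be reproduced with a small helper. The sketch assumes that the numerator counts indexed documents in which the hypothesis n-gram and the text n-gram co-occur, and that the denominator counts documents containing the text n-gram; the slide gives only the two numbers, so that reading, the function name `mle`, and the zero-count convention are assumptions.

```python
def mle(cooccurrence_count, text_ngram_count):
    """Maximum likelihood estimate: joint count over the text n-gram's count."""
    if text_ngram_count == 0:
        return 0.0  # assumption: an unseen text n-gram yields an MLE of zero
    return cooccurrence_count / text_ngram_count

# Counts taken from the slide's example.
example = mle(6086, 35800)
print(round(example, 2))
```

Rounded to two places, the result matches the 0.17 reported on the slide.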

20 Word Alignment Features
- Average unigram and bigram
- Stem-based tokens

21 Corpora
- 7.4M articles/docs and 2.5B words, 347 words/doc
- Gigaword (Graff, 2003): 5.7M articles, 2.1B words, 375 words/article; 77% of documents and 83% of indexed words
- Reuters Corpus (Lewis et al., 2004): 0.8M articles, 0.17B words, 213 words/article
- TIPSTER: 0.9M articles, 0.26B words, 291 words/article

