LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules. Advisor: Hsin-Hsi Chen. Reporter: Chi-Hsin Yu. Date: 2007.12.11. From EMNLP-CoNLL 2007.


1 LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules. Advisor: Hsin-Hsi Chen. Reporter: Chi-Hsin Yu. Date: 2007.12.11. From EMNLP-CoNLL 2007 (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning).

2 Outline: Introduction, Related Work, Learning Directionality of Inference Rules, Experimental Setup, Experimental Results, Conclusion.

3 Introduction (1) Inference rule: X eats Y ⇔ X likes Y. Examples: "I eat spicy food." ⇒ "I like spicy food." (YES); "I like rollerblading." ⇒ "I eat rollerblading." (NO). Preference: X eats Y ⇒ X likes Y (asymmetric). Each candidate rule p_i ⇔ p_j falls into one of four cases: (1) p_i ⇒ p_j, (2) p_j ⇒ p_i, (3) p_i ⇔ p_j, (4) no plausible inference. Plausibility splits these into two sets, {1, 2, 3} vs. {4}; directionality splits the plausible cases into three sets, {1}, {2}, {3}.

4 Introduction (2) Applications (for improving the performance of): QA (Harabagiu and Hickl, 2006), Multi-Document Summarization (Barzilay et al., 1999), IR (Anick and Tipirneni, 1999). Proposed algorithm: LEDIR (LEarning Directionality of Inference Rules, pronounced "leader"), which filters incorrect rules (case 4) and identifies the directionality of the correct ones (case 1, 2, or 3).

5 Related Work. Learning inference rules: Barzilay and McKeown (2001) for paraphrases; DIRT (Lin and Pantel, 2001) and TEASE (Szpektor et al., 2004) for inference rules; these yield low precision and bidirectional rules only. Learning directionality: Chklovski and Pantel (2004); Zanzotto et al. (2006); Torisawa (2006); Geffet and Dagan (2005).

6 Learning Directionality of Inference Rules (1) – Formal Definition p is a binary semantic relation between two entities x and y, written ⟨x, p, y⟩; the relation can be a verb or another relation. Plausibility separates the four cases into two sets, {1, 2, 3} vs. {4}; directionality separates the plausible cases into three sets, {1}, {2}, {3}.

7 Learning Directionality of Inference Rules (2) – Underlying Assumptions Distributional hypothesis (Harris 1954): words that appear in the same contexts tend to have similar meanings; it is used here for modeling lexical semantics. Directionality hypothesis: if two binary semantic relations tend to occur in similar contexts and the first one occurs in significantly more contexts than the second, then the second most likely implies the first and not vice versa. Generality example: if "X eats Y" occurs 3,000 times while "X likes Y" occurs 8,000 times, the rule should be X eats Y ⇒ X likes Y.

8 Learning Directionality of Inference Rules (3) – Underlying Assumptions (cont.) Concepts in a semantic space are much richer for reasoning about inferences than simple surface words. The context of a relation p of the form ⟨x, p, y⟩ is modeled using the semantic classes c_x and c_y of the words that can be instantiated for x and y, respectively. The context similarity of two relations is measured by the overlap coefficient: |X ∩ Y| / min(|X|, |Y|).
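A minimal sketch of the overlap coefficient, with hypothetical class names standing in for real context sets:

```python
def overlap_coefficient(a: set, b: set) -> float:
    """Overlap coefficient |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Toy context sets for two relations (class names are illustrative).
contexts_eat = {"individual", "animal", "food"}
contexts_like = {"individual", "animal", "food", "activity"}
print(overlap_coefficient(contexts_eat, contexts_like))  # 1.0: eat's contexts are a subset of like's
```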

9 Learning Directionality of Inference Rules (4) – Selectional Preferences The relational selectional preferences (RSPs) of a binary relation p are the sets C(x) and C(y) of semantic classes of the words x and y seen in its instances: C(x) = {c_x : x occurs in an instance, c_x is the class of x}, and C(y) = {c_y : y occurs in an instance, c_y is the class of y}. Example, for "x likes y" with semantic classes from WordNet: C(x) = {individual, social_group, ...}, C(y) = {individual, food, activity, ...}.
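A sketch of collecting RSPs from corpus instances; the word-to-class lookup and the instances below are hypothetical stand-ins for CBC clusters or a WordNet cut:

```python
# Hypothetical word-to-semantic-class lookup.
WORD_CLASSES = {
    "john": {"individual"},
    "crowd": {"social_group"},
    "pizza": {"food"},
    "tennis": {"activity"},
}

def relational_selectional_preferences(instances):
    """Collect C(x) and C(y) for one relation p from its instances (x, y)."""
    c_x, c_y = set(), set()
    for x, y in instances:
        c_x |= WORD_CLASSES.get(x, set())
        c_y |= WORD_CLASSES.get(y, set())
    return c_x, c_y

# Instances of "x likes y" seen in a corpus (toy data).
cx, cy = relational_selectional_preferences([("john", "pizza"), ("crowd", "tennis")])
print(cx, cy)  # {'individual', 'social_group'} {'food', 'activity'}
```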

10 Learning Directionality of Inference Rules (5) – Inference Plausibility and Directionality The context similarity of two relations p_i and p_j is the overlap coefficient of their context sets: sim(p_i, p_j) = |C(p_i) ∩ C(p_j)| / min(|C(p_i)|, |C(p_j)|).

11 Learning Directionality of Inference Rules (6) – Inference Plausibility and Directionality (cont.) A rule is kept as plausible only if the context similarity of its two relations is high enough (threshold α), and a direction is assigned when one relation occurs in sufficiently more contexts than the other (ratio threshold β); α and β are determined by experiments.
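The transcript does not preserve the exact decision rule, so the sketch below is one plausible reading of the hypothesis, with α as the similarity cut-off for plausibility and β as the generality-ratio cut-off for directionality (both readings are assumptions):

```python
def classify_rule(ctx_i: set, ctx_j: set, alpha: float, beta: float) -> str:
    """Classify a candidate rule p_i <=> p_j into one of the four tags.

    Assumed rule: low context similarity means no plausible inference;
    otherwise the relation with beta times more contexts is taken as the
    more general one and placed on the right-hand side of the arrow.
    """
    if not ctx_i or not ctx_j:
        return "no plausible inference"
    sim = len(ctx_i & ctx_j) / min(len(ctx_i), len(ctx_j))
    if sim < alpha:
        return "no plausible inference"
    if len(ctx_i) >= beta * len(ctx_j):
        return "p_j => p_i"  # p_i occurs in far more contexts, so p_i is more general
    if len(ctx_j) >= beta * len(ctx_i):
        return "p_i => p_j"  # p_j is more general
    return "p_i <=> p_j"     # comparable generality: keep the rule bidirectional
```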

12 Learning Directionality of Inference Rules (7) – Two Models (JRM and IRM) Model 1: Joint Relational Model (JRM) counts the actual joint occurrences of relation p with class pairs in the corpus. Model 2: Independent Relational Model (IRM) collects the classes of x and y independently and models the contexts of p as the Cartesian product C(x) × C(y), from which the context similarity of two relations is computed.
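A sketch of the IRM context construction; the class sets are toy values:

```python
from itertools import product

def irm_contexts(c_x: set, c_y: set) -> set:
    """IRM: treat the two argument slots independently and model the
    contexts of p as the Cartesian product C(x) x C(y)."""
    return set(product(c_x, c_y))

# Toy class sets for "x likes y".
print(irm_contexts({"individual"}, {"food", "activity"}))
# {('individual', 'food'), ('individual', 'activity')}
```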

13 Learning Directionality of Inference Rules (8) – Model 1: Joint Relational Model (JRM) The context similarity of two relations p_i and p_j is again their overlap coefficient, with the frequency of each class-pair context estimated from the corpus counts of the instances ⟨x, p, y⟩ whose words fall into those classes.
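The slide's estimation formula did not survive the transcript; the sketch below assumes one common scheme, splitting each instance count evenly across the class pairs its words belong to:

```python
from collections import Counter

def jrm_context_counts(instance_freqs, word_classes):
    """JRM: estimate counts of (c_x, c_y) contexts for one relation p.

    instance_freqs maps an instance (x, y) to its corpus frequency; the
    even split across class pairs is an assumed normalization.
    """
    counts = Counter()
    for (x, y), freq in instance_freqs.items():
        cxs = word_classes.get(x, set())
        cys = word_classes.get(y, set())
        if not cxs or not cys:
            continue
        share = freq / (len(cxs) * len(cys))
        for cx in cxs:
            for cy in cys:
                counts[(cx, cy)] += share
    return counts
```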

14 Experiment Setup (1) Inference rules: chosen from the DIRT resource (Lin and Pantel 2001); DIRT consists of 12 million rules extracted from 1 GB of newspaper text.

15 Experiment Setup (2) Semantic classes must strike the right balance between abstraction and discrimination. The first set of semantic classes was obtained by running the CBC clustering algorithm (Pantel and Lin, 2002) on the TREC-9 and TREC-2002 newswire collections, consisting of over 600 million words; this resulted in 1,628 clusters, each representing a semantic class. The second set was obtained from WordNet 2.1 (Fellbaum 1998): a cut of the WordNet noun hierarchy at depth four resulted in 1,287 semantic classes.

16 Experiment Setup (3) Implementation: parsed the 1999 AP newswire collection, consisting of 31 million words, with Minipar (Lin 1993). Gold standard construction: randomly sampled 160 inference rules of the form p_i ⇔ p_j from DIRT; removing 3 nominalization rules left 157 rules. Two annotators were used: 57 rules served as a training set to train the annotators, and the remaining 100 rules formed a blind test set. Inter-annotator agreement on the test set: kappa = 0.63. The annotators revised their disagreements together to produce the final gold standard.

17 Experiment Setup (4) Baselines. B-random: randomly assigns one of the four possible tags to each candidate inference rule. B-frequent: assigns the most frequently occurring tag in the gold standard to each candidate inference rule. B-DIRT: assumes each inference rule is bidirectional and assigns the bidirectional tag to each candidate inference rule.
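A sketch of the three baselines; the tag strings are illustrative:

```python
import random
from collections import Counter

TAGS = ["p_i => p_j", "p_j => p_i", "p_i <=> p_j", "no plausible inference"]

def b_random(rules):
    """B-random: one of the four tags, uniformly at random, per rule."""
    return [random.choice(TAGS) for _ in rules]

def b_frequent(rules, gold_tags):
    """B-frequent: the most frequent tag in the gold standard, for every rule."""
    most_frequent = Counter(gold_tags).most_common(1)[0][0]
    return [most_frequent for _ in rules]

def b_dirt(rules):
    """B-DIRT: every rule is assumed bidirectional."""
    return ["p_i <=> p_j" for _ in rules]
```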

18 Experimental Results (1) Evaluation criterion: ran all the algorithms with different parameter combinations on the development set (the 57 DIRT rules), for a total of 420 experiments; used accuracy to find the best parameter combination for each of the four systems, then used those parameter values to obtain the corresponding percentage accuracies on the test set for each system.
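A sketch of the parameter sweep, reusing the hypothetical classify_rule above; the actual grids of α and β values searched in the 420 experiments are not given in the transcript:

```python
from itertools import product

def accuracy(predicted, gold):
    """Fraction of rules whose predicted tag matches the gold tag."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def grid_search(dev_rules, dev_gold, alphas, betas):
    """Pick the (alpha, beta) pair maximizing accuracy on the development set.

    dev_rules is a list of (ctx_i, ctx_j) context-set pairs; classify_rule
    is the assumed decision rule sketched earlier.
    """
    def score(params):
        alpha, beta = params
        preds = [classify_rule(ci, cj, alpha, beta) for ci, cj in dev_rules]
        return accuracy(preds, dev_gold)
    return max(product(alphas, betas), key=score)
```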

19 Experimental Results (2) [Slide shows a table of result counts; the table's structure is not recoverable from the transcript.]

20 Experimental Results (3) [Slide shows accuracy charts with baselines at 66% and 48.48%.]

21 Conclusion The problem of semantic inference is fundamental to understanding natural language and is an integral part of many natural language applications. The Directionality Hypothesis can indeed be used to filter incorrect inference rules and to learn the directionality of the correct ones; this result is one step in the direction of solving the basic problem of semantic inference.

22 Thanks!!

