1 LEILA – Learning to Extract Information by Linguistic Analysis
Presented at the 2nd Workshop on Ontology Learning and Population (OLP2)
Fabian M. Suchanek, Georgiana Ifrim, Gerhard Weikum (Max-Planck Institute for Computer Science, Saarbrücken, Germany)

2 Overview
- Motivation
- The LEILA System
- Plan of Attack
- System Architecture
- Experiments
- Conclusion

3 Motivation
[Figure: mock search for "Meat dish" (buttons: "Google Search", "I'm feeling hungry"). Page snippet: "This page has been created to enlighten the public about the Wiener Schnitzel. [...]" ?]

4 Motivation
To know that a Schnitzel is a meat dish, we need an ontology.
- Use hand-crafted ontologies (like WordNet) (but: low coverage, high cost, fast aging)
- Or: Gather ontological data from Web documents

5 Goal
Given
- a binary target relation (e.g. subclassOf)
- a set of Web documents
extract all pairs of entities that are in the target relation.

6 Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
Pattern: X is a Y
Sentence: A Schnitzel is a meat dish from Austria.

7 Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
Pattern: X is a Y
Sentence: A Schnitzel, also called Wiener Schnitzel, is a meat dish.

8 Related Work
Learn text patterns (e.g. Soderland, Chakrabarti, KnowItAll)
┌──────Subject───────────┐┌Obj─┐
A Schnitzel, also called Wiener Schnitzel, is a meat dish.
Idea: Learn linguistic patterns!

9 Plan of Attack
[Diagram: (Web documents) and the (Target relation) subclassOf are the input; the (Output pairs) are e.g. (Schnitzel, meat dish), (Koala, mammal), …]

10 Preprocessing
[Diagram as on slide 9]
The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu.

11 Preprocessing
[Diagram as on slide 9]
The Schnitzel (200g) is best enjoyed with Ösibräu.

12 Preprocessing
[Diagram as on slide 9]
The Schnitzel (200g) is best enjoyed with Oesibraeu.

13 Preprocessing
[Diagram as on slide 9]
The Schnitzel is best enjoyed with Oesibraeu.
The Schnitzel ( 200 g )
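The example sentence on slides 10 to 13 changes in three ways: the weight is converted to grams, the umlauts are transliterated, and the parenthetical is split off into its own fragment. The following is a minimal Python sketch of such a pipeline, not LEILA's actual code. The function names, the regular expressions, the conversion factor, and the transliteration table are all assumptions made for illustration.

```python
import re

# Assumed conversion factor: 1 stone = 6350.29318 g.
GRAMS_PER_STONE = 6350.29318

# Assumed transliteration table for German umlauts and sharp s.
TRANSLITERATION = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})


def normalize_units(text):
    """Rewrite '<number> stones' as a rounded value in grams."""
    def to_grams(match):
        grams = float(match.group(1)) * GRAMS_PER_STONE
        return f"{round(grams)}g"
    return re.sub(r"(\d+(?:\.\d+)?)\s*stones?\b", to_grams, text)


def transliterate(text):
    """Replace umlauts by their two-letter ASCII spelling."""
    return text.translate(TRANSLITERATION)


def split_parentheticals(sentence):
    """Return the sentence without parentheticals, then one fragment per parenthetical.

    Each fragment is the text before the parenthesis plus the tokenized
    content of the parenthesis (a simplification for this sketch).
    """
    fragments = []
    for match in re.finditer(r"\s*\(([^()]*)\)", sentence):
        prefix = sentence[:match.start()].strip()
        # Separate a number from a directly attached unit: "200g" -> "200 g".
        inner = re.sub(r"(\d)([A-Za-z])", r"\1 \2", match.group(1).strip())
        fragments.append(f"{prefix} ( {inner} )")
    main = re.sub(r"\s*\([^()]*\)", "", sentence)
    return [main] + fragments


def preprocess(text):
    """Apply the three steps in the order the slides show them."""
    return split_parentheticals(transliterate(normalize_units(text)))


result = preprocess(
    "The Schnitzel (0.0314946089 stones) is best enjoyed with Ösibräu."
)
```

For the slide's sentence, `result` holds the two strings shown on slide 13. Parsing the resulting sentences with a link grammar parser is not covered by this sketch.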

14 Preprocessing
[Linkage diagram over the two sentences below, with the target relation subclassOf and the output pairs (Schnitzel, meat dish), (Koala, mammal), …]
The Schnitzel is best enjoyed with Oesibraeu.
Link labels: det, subj, participle, adv, mod, comp
The Schnitzel ( 200 g )
Link labels: adj, adj, adj, adj, adj

15 Preprocessing
[Diagram: (Web documents) and the (Target relation) subclassOf are the input; the (Output pairs) are e.g. (Schnitzel, meat dish), (Koala, mammal), …]

16 Algorithm
[Diagram: (Web documents), (Seed pairs), (Output pairs): (Schnitzel, meat dish), (Koala, mammal), …]
Seed pairs: + (dog, mammal) ...  - (dog, nag) ...
Sentence: A dog is a mammal.

17 Algorithm
[Diagram: (Web documents), (Seed pairs), (Output pairs): (Schnitzel, meat dish), (Koala, mammal), …]
Seed pairs: + (dog, mammal) ...  - (dog, nag) ...
(Positive patterns): A X is a Y.
Sentence: This dog is a nag.

18 Algorithm
[Diagram: (Web documents), (Seed pairs), (Output pairs): (Schnitzel, meat dish), (Koala, mammal), …]
Seed pairs: + (dog, mammal) ...  - (dog, nag) ...
(Positive patterns): A X is a Y.
(Negative patterns): This X is a Y.

19 Algorithm
[Diagram: (Web documents), (Seed pairs), (Output pairs): (Schnitzel, meat dish), (Koala, mammal), …]
Seed pairs: + (dog, mammal) ...  - (dog, nag) ...
(Generalized positive patterns): A X is a Y.
Sentence: A Schnitzel is a meat dish.
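The algorithm on slides 16 to 19 can be illustrated with a toy version that works on plain strings. This is only a sketch under strong simplifying assumptions: LEILA learns patterns over linguistic structures and generalizes them with a learner, whereas the code below uses exact surface strings and no generalization. All function names are hypothetical.

```python
import re


def make_pattern(sentence, pair):
    """Replace the two entities of `pair` by X and Y; None if either is absent."""
    first, second = pair
    first_re = rf"\b{re.escape(first)}\b"
    second_re = rf"\b{re.escape(second)}\b"
    if not (re.search(first_re, sentence) and re.search(second_re, sentence)):
        return None
    return re.sub(second_re, "Y", re.sub(first_re, "X", sentence))


def learn_patterns(sentences, examples, counterexamples):
    """Collect positive patterns from examples, negative ones from counterexamples."""
    positive, negative = set(), set()
    for sentence in sentences:
        for pair in examples:
            pattern = make_pattern(sentence, pair)
            if pattern is not None:
                positive.add(pattern)
        for pair in counterexamples:
            pattern = make_pattern(sentence, pair)
            if pattern is not None:
                negative.add(pattern)
    return positive - negative, negative


def pattern_to_regex(pattern):
    """Turn 'A X is a Y.' into a regex with the named groups X and Y."""
    regex = ""
    for part in re.split(r"\b(X|Y)\b", pattern):
        if part == "X":
            regex += r"(?P<X>\w+(?: \w+)*?)"
        elif part == "Y":
            regex += r"(?P<Y>\w+(?: \w+)*)"
        else:
            regex += re.escape(part)
    return re.compile(regex)


def extract_pairs(sentences, positive, negative):
    """Return the pairs of all sentences that match a positive pattern only."""
    negative_regexes = [pattern_to_regex(p) for p in negative]
    pairs = set()
    for sentence in sentences:
        if any(r.fullmatch(sentence) for r in negative_regexes):
            continue
        for pattern in positive:
            match = pattern_to_regex(pattern).fullmatch(sentence)
            if match:
                pairs.add((match.group("X"), match.group("Y")))
    return pairs


sentences = [
    "A dog is a mammal.",
    "This dog is a nag.",
    "A Schnitzel is a meat dish.",
]
positive, negative = learn_patterns(
    sentences,
    examples={("dog", "mammal")},
    counterexamples={("dog", "nag")},
)
pairs = extract_pairs(sentences, positive, negative)
```

On the slides' sentences this yields the positive pattern "A X is a Y.", the negative pattern "This X is a Y.", and the pair (Schnitzel, meat dish) in addition to the seed pair (dog, mammal).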

20 LEILA: System Architecture
[Diagram: (Web documents), (Seed pairs): (dog, mammal) ..., (dog, nag) ..., (Output pairs): (Schnitzel, meat dish), (Koala, mammal), …]
Components:
- Seed pair data sets
- LEILA
- LinkParser (Sleator, CMU)
- Preprocessing, stemming
- kNN Learner
- SVMLight (Joachims, Cornell U)

21 Gold Standard for Evaluation
[Diagram: (Web documents), (Target relation), (Output pairs): (Schnitzel, meat dish), (Koala, mammal), …, (Ideal pairs)]
Sentence: A Schnitzel is practically vitamin-free and thus the meat dish is extremely popular in Europe.
Ideal pair: (Schnitzel, meat dish)
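Comparing the output pairs with the ideal pairs gives the precision and recall figures reported on the result slides. A minimal sketch of that comparison follows, assuming both sets are available as Python sets of tuples. The pairs in the usage lines are toy data, not experimental data, and the function name is hypothetical.

```python
def precision_recall(output_pairs, ideal_pairs):
    """Precision: share of output pairs that are ideal pairs.
    Recall: share of ideal pairs that appear in the output."""
    output, ideal = set(output_pairs), set(ideal_pairs)
    correct = output & ideal
    precision = len(correct) / len(output) if output else 0.0
    recall = len(correct) / len(ideal) if ideal else 0.0
    return precision, recall


# Toy data for illustration only.
output_pairs = {("Schnitzel", "meat dish"), ("Koala", "mammal"), ("dog", "nag")}
ideal_pairs = {
    ("Schnitzel", "meat dish"),
    ("Koala", "mammal"),
    ("cat", "mammal"),
    ("oak", "tree"),
}
precision, recall = precision_recall(output_pairs, ideal_pairs)
```

With this toy data two of the three output pairs are correct and two of the four ideal pairs are found.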

22 Results with different relations
birthDate
Seed pairs are given by a function that decides whether a word pair is
- an example (here: list of birth dates from www.famousbirthdays.com)
- a counterexample (here: can be deduced from examples)
- a candidate (here: all pairs of a name and a date)
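A seed-pair function of this kind could look like the sketch below for birthDate. It rests on several assumptions that are not on the slide: dates are reduced to four-digit years, names are recognized by a crude capitalization pattern, the lookup table stands in for the list of known birth dates, and a counterexample is deduced as "a known person paired with a different date". All identifiers are hypothetical.

```python
import re
from enum import Enum


class Label(Enum):
    EXAMPLE = "example"
    COUNTEREXAMPLE = "counterexample"
    CANDIDATE = "candidate"
    NONE = "none"


# Stand-in for the list of known birth dates (years only, for brevity).
KNOWN_BIRTH_YEARS = {
    "Johann Sebastian Bach": "1685",
    "Wolfgang Amadeus Mozart": "1756",
}


def looks_like_name(text):
    """Crude assumption: two or more capitalized words form a name."""
    return re.fullmatch(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", text) is not None


def looks_like_date(text):
    """Crude assumption: a date is a four-digit year."""
    return re.fullmatch(r"\d{4}", text) is not None


def classify_pair(name, date):
    """Decide whether (name, date) is an example, a counterexample, or a candidate."""
    if not (looks_like_name(name) and looks_like_date(date)):
        return Label.NONE
    if name in KNOWN_BIRTH_YEARS:
        if KNOWN_BIRTH_YEARS[name] == date:
            return Label.EXAMPLE
        # Assumed deduction: a person has one birth date, so any other
        # date paired with a known person is a counterexample.
        return Label.COUNTEREXAMPLE
    return Label.CANDIDATE
```

Keeping this decision in one function means the rest of the pipeline does not need to know where the examples for a given relation come from.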

23 Results with different relations
birthDate
Patterns: X (born in Y), X was born in Y, ...

Target Relation | Corpus          | Precision | Recall
birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9%

(see paper for details on the experiments)

24 Results with different relations
synonymy
Examples: all WordNet synsets
Counterexamples: all words that are not in a synset
Candidates: all pairs of proper names
Patterns: X or Y, X (or Y), ...

Target Relation | Corpus          | Precision | Recall
birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9%
synonymy        | Wikip geography | 73% ± 7%  | 64% ± 7%

25 Results with different relations
instanceOf
Examples: all direct WordNet hyponyms
Counterexamples: all words that are not hyponyms of each other
Candidates: all pairs of a proper name and a WordNet concept
Patterns: an X is a Y, X is unusual among the Y, ...

Target Relation | Corpus          | Precision | Recall
birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9%
synonymy        | Wikip geography | 73% ± 7%  | 64% ± 7%
instanceOf      | Wikip composers | 58% ± 3%  | 41% ± 3%

26 Results with different relations

Target Relation | Corpus          | Precision | Recall
birthDate       | Wikip composers | 79% ± 8%  | 70% ± 9%
synonymy        | Wikip geography | 73% ± 7%  | 64% ± 7%
instanceOf      | Wikip composers | 58% ± 3%  | 41% ± 3%

Further corpora: Wikip random, Google composers. Values as extracted: 28% ± 3%, 17% ± 2%, 33% ± 3% (the transcript does not preserve which value belongs to which cell).

(see paper for details on the experiments)

27 Results with different competitors
(see paper for explanations, conditions and details!)
[Bar chart: Precision and Recall, results in %, LEILA in red. Groups: Snowball (headquarters, Snowball’s corpus); TextToOnto, Text2Onto (instanceOf, Wikip composers); CV-System (instanceOf, CV’s corpus); CV-System (instanceOf, Wikip composers).]

28 Conclusion
Our system LEILA
- can learn arbitrary binary relations from Web documents
- uses a deep linguistic analysis
- compares favorably with other systems
See http://www.mpi-inf.de/~suchanek

29 Results with different competitors

System     | Relation     | Corpus          | Precision | Recall
Snowball   | headquarters | Snowball’s      | 34% ± 8%  | 30% ± 7%
LEILA      | headquarters | Snowball’s      | 90% ± 6%  | 50% ± 7%
TextToOnto | instanceOf   | Wikip composers | 39% ± 9%  | 4% ± 1%
Text2Onto  | instanceOf   | Wikip composers | 50%       | 2% ± 1%
LEILA      | instanceOf   | Wikip composers | 58% ± 3%  | 41% ± 3%
CV-System  | instanceOf   | CV’s            | 32% ± 5% (only one value preserved; its column is unclear)
LEILA      | instanceOf   | CV’s            | 26% ± 7%  | 15% ± 4%
CV-System  | instanceOf   | Wikip composers | 22%       | 4% ± 2%
LEILA      | instanceOf   | Wikip composers | 58% ± 3%  | 41% ± 3%

(see paper for explanations, conditions and details!)

30 Pattern Generalization – kNN
[Diagram: labeled patterns and one new pattern]
Labeled patterns: This X is a Y. (-)   X such as Y (+)   A X is a Y. (+)
New pattern: A X is a big Y
(See our paper at KDD for details)
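The kNN idea on this slide can be illustrated by labeling a new pattern with the label of its most similar known patterns. The slide does not say how similarity is measured, so the sketch below assumes Jaccard similarity over the word sets of surface patterns, with a similarity-weighted vote among the k nearest neighbors. This is an illustrative stand-in, not the measure LEILA uses, and the function names are hypothetical.

```python
import re


def tokens(pattern):
    """Word set of a pattern; punctuation is dropped, case is kept."""
    return set(re.findall(r"\w+", pattern))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_label(pattern, labeled_patterns, k=1):
    """Return '+' or '-' by a similarity-weighted vote of the k nearest patterns."""
    query = tokens(pattern)
    scored = sorted(
        ((jaccard(query, tokens(known)), label)
         for known, label in labeled_patterns.items()),
        reverse=True,
    )
    votes = {"+": 0.0, "-": 0.0}
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return "+" if votes["+"] > votes["-"] else "-"


labeled_patterns = {
    "This X is a Y.": "-",
    "X such as Y": "+",
    "A X is a Y.": "+",
}
label = knn_label("A X is a big Y", labeled_patterns)
```

Under these assumptions the new pattern "A X is a big Y" is closest to "A X is a Y." and is therefore labeled positive.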

31 Pattern Generalization – SVM
[Diagram: labeled patterns and one new pattern, with a second set of labels (- + +)]
Labeled patterns: This X is a Y. (-)   X such as Y (+)   A X is a Y. (+)
New pattern: A X is a big Y
(See our paper at KDD for details)

