Gaby Nativ, SDBI 2007.  Motivation  Other Ontologies  System overview  YAGO Dive IN  LEILA  NAGA  Conclusion.

1 Gaby Nativ, SDBI 2007

2  Motivation  Other Ontologies  System overview  YAGO Dive IN  LEILA  NAGA  Conclusion

3  Which NASA astronaut was born when Elvis was born?

4  Problem: Web pages are designed to be read by people, not machines  Solution: the Semantic Web  The meaning of information and services is defined  Both people and machines can use web content

5  Knowledge representation language  Individuals: instances or objects  Classes: concepts or types of objects  Relations: ways that classes and objects can be related to one another  Facts: instances of a relation between individuals, classes or relations, e.g. (Elvis Presley, Isa, Singer)

6  Directed labeled multi-graph G = (V, E, Lv, Le)  V is a set of vertices  E ⊆ V × V is a multi-set of edges  Lv is a set of individual and class labels  Le is a set of relation labels  With each edge we associate a confidence value
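
The graph model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KnowledgeGraph`, its methods, and the confidence values are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Directed labeled multi-graph G = (V, E, Lv, Le) with edge confidences."""
    vertices: set = field(default_factory=set)         # V, labeled from Lv
    relation_labels: set = field(default_factory=set)  # Le
    edges: list = field(default_factory=list)          # multi-set E: a list allows parallel edges

    def add_edge(self, source, relation, target, confidence=1.0):
        """Add a labeled edge and register its endpoints and relation label."""
        self.vertices.update((source, target))
        self.relation_labels.add(relation)
        self.edges.append((source, relation, target, confidence))

    def outgoing(self, source):
        """Return (relation, target, confidence) for every edge leaving source."""
        return [(rel, tgt, conf) for (src, rel, tgt, conf) in self.edges if src == source]


graph = KnowledgeGraph()
# The confidence values below are made up for illustration.
graph.add_edge("Elvis Presley", "Isa", "Singer", confidence=0.95)
graph.add_edge("Elvis Presley", "born", "1935", confidence=0.97)
```

Storing the edges in a list rather than a set is what makes this a multi-graph: two edges between the same pair of vertices can coexist.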

7 [Figure: example knowledge graph with the layers Words, Individuals, Classes and Relations. Node labels: "Elvis Presley", "The King", 1935, ?, astronaut, person, entity. Edge labels: means, born, type, subclass.]

8   Motivation  Other Ontologies  System overview  YAGO Dive IN  LEILA  NAGA  Conclusion

9 Assemble the ontology manually:  WordNet  SUMO  GeneOntology  etc.  Problem: usually low coverage

10  WordNet: semantic lexicon for the English language  Developed at Princeton University since 1985  Groups English words into synsets  Provides short, general definitions  Records various semantic relations  Contains about 150,000 words organized in over 115,000 synsets

11

12  SUMO: concerns itself with meta-level concepts  First released in December 2000  Maintained by Articulate Software

13  Gene Ontology (GO): part of a large effort, the Open Biomedical Ontologies  Constructed in 1998 with 3 models:  biological processes  cellular components  molecular function  As of 2005, GO contained over 19,000 terms

14 Automated extraction of ontology  KnowItAll (University of Washington)  TextToOnto (University of Karlsruhe)  Use pattern matching & machine learning techniques  Problem: usually low accuracy (50%-92%)

15   Motivation   Other Ontologies  System overview  YAGO Dive IN  LEILA  NAGA  Conclusion

16 [Figure: system architecture. Backend: Web, LEILA (Knowledge Acquisition Tools), YAGO KB, NAGA (Query Processing & Ranking). Interface: Browser (Query Input and Output, Tunable Parameters), User.]

17  Based on a decidable and simple model  Extensible ontology  High coverage: YAGO knows over 1.7M entities and 14M facts  High quality: empirical evaluation shows 95% accuracy

18  Assemble the ontology from Wikipedia  Good Coverage, 7.83 M entities in all languages

19  Good Accuracy

20  LEILA: uses deep linguistic analysis  Machine learning techniques (SVM)  Input:  a binary target relation  a set of Web documents  Extracts:  all pairs of entities that are in the target relation

21

22 [Figure: Elvis linked to Wikipedia categories as classes. Node labels: 1935, American_singer, People_by_occupation, Business, Social_group, ?. Edge labels: born, type.]

23  Each synset of WordNet becomes a class of YAGO  Extract only Wikipedia's leaf categories  Exclude known individuals in WordNet, e.g. Albert Einstein will be excluded  About 15,000 cases appear in both WordNet and Wikipedia  On a conflict in meaning, prefer WordNet: "time exposure" is a common noun for WordNet, but an album title for Wikipedia

24 [Figure: Wikipedia article about Elvis; placeholder body text omitted.]  Categories: 1935_births  Exploit relational categories (bornInYear, diedInYear, EstablishedIn): Elvis bornInYear 1935

25 [Figure: Wikipedia article about Elvis; placeholder body text omitted.]  Categories: American_singers  Exploit conceptual categories (type, subClassOf): Elvis type American_singer

26 [Figure: Wikipedia article about Elvis; placeholder body text omitted.]  Categories: Rock'n_Roll_Music  Avoid thematic categories: Rock'n_Roll_Music does not become a type of Elvis

27 Shallow linguistic noun phrase parsing: "American singers of German origin" = premodifier (American) + head (singers) + postmodifier (of German origin)  Heuristic: if the head is a plural word, the category is conceptual.
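
The heuristic above can be sketched roughly as follows. This is a deliberately naive illustration: the function names, the preposition list and the suffix-based plural test are assumptions made for this sketch, and they stand in for the real noun-phrase parser and the Pling stemmer used in the talk. Relational categories such as 1935_births are assumed to have been handled before this step.

```python
# Hypothetical, simplified stand-ins for a noun-phrase parser and a stemmer.
POSTMODIFIER_STARTS = {"of", "in", "by", "from", "with", "at", "for"}
SINGULAR_S_ENDINGS = ("ss", "us", "is")


def split_category(name):
    """Split a category name into (premodifier, head, postmodifier)."""
    words = name.replace("_", " ").split()
    for i, word in enumerate(words):
        # The first preposition after the first word starts the postmodifier.
        if i > 0 and word.lower() in POSTMODIFIER_STARTS:
            pre_head, post = words[:i], words[i:]
            break
    else:
        pre_head, post = words, []
    # The head is the last word before the postmodifier.
    return " ".join(pre_head[:-1]), pre_head[-1], " ".join(post)


def looks_plural(word):
    """Crude plural test: ends in 's' but not in a typical singular ending."""
    lowered = word.lower()
    return lowered.endswith("s") and not lowered.endswith(SINGULAR_S_ENDINGS)


def is_conceptual(name):
    """Apply the slide's heuristic: a plural head marks a conceptual category."""
    _, head, _ = split_category(name)
    return looks_plural(head)
```

With these definitions, `is_conceptual("American_singers")` is true because the head "singers" is plural, while `is_conceptual("Rock'n_Roll_Music")` is false because the head "Music" is not.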

28 Pling stemmer

29 [Figure: resulting graph for Elvis. Node labels: 1935, American_singer, Singer, Person, "singer", "Elvis Presley". Edge labels: born, type, subclass, means.]

30  Storing witnesses  Storing for each individual the URL of the corresponding Wikipedia page  Storing confidence

31 [Figure: graph for Elvis with witnesses. Node labels: 1935, American_singer, Singer#1, Person#3, "singer", "Elvis Presley", wiki/Elvis_Presley, LEILA. Edge labels: born, type, subclass, means, FoundIn, ExtractedBy.]

32 Fact: (Elvis, is_a, singer)  But only from 1953 to 1977  We know this from Wikipedia

33  #1 (Elvis, is_a, singer)  #2 (#1, time, 1953-1977)  #3 (#1, source, Wikipedia)  [Figure: the type edge to singer annotated with time 1953-1977, source Wikipedia, LEILA and the value 0.93.]
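
A minimal sketch of this idea in Python, assuming facts are stored in a dictionary keyed by their identifier. The `Fact` class and the helper `statements_about` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Fact:
    subject: str   # an entity or the identifier of another fact
    relation: str
    obj: str


# Each fact has an identifier, so other facts can refer to it.
facts: Dict[str, Fact] = {
    "#1": Fact("Elvis", "is_a", "singer"),
    "#2": Fact("#1", "time", "1953-1977"),
    "#3": Fact("#1", "source", "Wikipedia"),
}


def statements_about(fact_id: str, store: Dict[str, Fact]) -> Dict[str, str]:
    """Collect every fact whose subject is the given fact identifier."""
    return {f.relation: f.obj for f in store.values() if f.subject == fact_id}
```

Here `statements_about("#1", facts)` gathers the time interval and the source attached to fact #1, which is how a plain triple store can carry additional arguments.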

34  A YAGO ontology is defined over  a set of relations R (type, subClassOf)  a set of common entities C (entity, class, relation)  a set of fact identifiers I  Y : I → (R ∪ C ∪ I) × R × (R ∪ I ∪ C)  We can talk about:  facts (#1, source, Wikipedia)  additional arguments (#1, time, 1953-1977)  relations (time, hasRange, time_interval)

35 Axioms & Rules: (subClassOf, type, acyclicTransitiveRelation)  (x, is_a, y), (y, subclass, z) => (x, is_a, z) ...  [Figure: node labels singer, person; edge labels subClassOf, type.]

36 Types Relations

37  {(r1, subRelationOf, r2), (x, r1, y)} -> (x, r2, y)  {(r, type, acyclicTransitiveRelation), (x, r, y), (y, r, z)} -> (x, r, z)  {(r, domain, c), (x, r, y)} -> (x, type, c)  {(r, range, c), (x, r, y)} -> (y, type, c)  {(x, type, c1), (c1, subClassOf, c2)} -> (x, type, c2)
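
One way to apply such rewrite rules is a simple fixed-point loop over a set of triples. The sketch below is an assumption about how this could be coded, not the YAGO implementation; the function name `derive` and the toy facts are hypothetical, and the domain rule is read as taking a fact (x, r, y).

```python
def derive(facts):
    """Apply the five rewrite rules until no new fact can be added."""
    derived = set(facts)
    while True:
        new = set()
        for (a, rel, b) in derived:
            if rel == "subRelationOf":
                # (r1, subRelationOf, r2), (x, r1, y) -> (x, r2, y)
                new.update((x, b, y) for (x, r, y) in derived if r == a)
            elif rel == "domain":
                # (r, domain, c), (x, r, y) -> (x, type, c)
                new.update((x, "type", b) for (x, r, y) in derived if r == a)
            elif rel == "range":
                # (r, range, c), (x, r, y) -> (y, type, c)
                new.update((y, "type", b) for (x, r, y) in derived if r == a)
            elif rel == "type":
                if b == "acyclicTransitiveRelation":
                    # (r, type, acyclicTransitiveRelation), (x, r, y), (y, r, z) -> (x, r, z)
                    new.update(
                        (x, a, z)
                        for (x, r1, y) in derived if r1 == a
                        for (y2, r2, z) in derived if r2 == a and y2 == y
                    )
                # (x, type, c1), (c1, subClassOf, c2) -> (x, type, c2)
                new.update(
                    (a, "type", c2)
                    for (c1, r, c2) in derived if r == "subClassOf" and c1 == b
                )
        if new <= derived:
            return derived
        derived |= new


toy_facts = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("Elvis", "type", "singer"),
    ("singer", "subClassOf", "person"),
    ("person", "subClassOf", "entity"),
    ("bornInYear", "range", "year"),
    ("Elvis", "bornInYear", "1935"),
}
closure = derive(toy_facts)
```

The loop terminates because the rules only combine existing entities and relations, so the set of derivable triples over a finite input is finite.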

38 Axioms: (x, is_a, y), (y, subclass, z) => (x, is_a, z) ...  Derive facts: from {f1, f2, f3, f4, f5} to {f1, f2, f3, f4, f5, f6, f7, f8, f9, f10}  Eliminate facts: from {f1, f2, f3, f4, f5} to {f1, f2, f3}  Both results are finite and unique

39  Consistency: a YAGO ontology y is consistent iff ∄ x, r : (r, type, acyclicTransitiveRelation) ∈ D(y) ∧ (x, r, x) ∈ D(y), where D(y) is the set of facts derivable from y  Since D(y) is finite, the consistency of a YAGO ontology is decidable.
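
Because D(y) is finite, the condition can be checked by a direct scan. The sketch below assumes the derived fact set has already been computed and is passed in as a set of (subject, relation, object) triples; the function name `is_consistent` and the toy data are hypothetical.

```python
def is_consistent(derived_facts):
    """Return False if some acyclic transitive relation relates an entity to itself."""
    acyclic = {
        r for (r, rel, c) in derived_facts
        if rel == "type" and c == "acyclicTransitiveRelation"
    }
    return not any(r in acyclic and x == y for (x, r, y) in derived_facts)


# A consistent toy example: no self-loop on subClassOf.
good = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("singer", "subClassOf", "person"),
}

# An inconsistent toy example: a subClassOf cycle whose closure contains self-loops.
bad = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("a", "subClassOf", "b"),
    ("b", "subClassOf", "a"),
    ("a", "subClassOf", "a"),
    ("b", "subClassOf", "b"),
}
```

The check only detects cycles that show up as self-loops, so it relies on the input being the full derived set rather than the raw facts.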

40 Is Lake Victoria "locatedIn" Tanzania?  When should an entity be an individual and when a class? E.g. Physics as an individual of science

41 Number of facts:  KnowItAll 30,000  SUMO 60,000  WordNet 200,000  OpenCyc 300,000  Cyc 2,000,000  YAGO 14,000,000

42  http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/  Which astronaut was born in the same year as Elvis?  "Elvis Presley" bornInYear $year  $astro bornInYear $year  $astro isa astronaut  20 Results
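
The query above is a conjunction of triple patterns with shared variables. A minimal sketch of how such patterns could be matched against a small in-memory fact list follows; the `match` function and the toy facts are assumptions for illustration and say nothing about how the actual YAGO query interface works.

```python
def match(patterns, facts, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, fact):
            if term.startswith("$"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, candidate)


# Toy facts for illustration only.
toy_kb = [
    ("Elvis Presley", "bornInYear", "1935"),
    ("Roger Chaffee", "bornInYear", "1935"),
    ("Roger Chaffee", "isa", "astronaut"),
    ("Max Planck", "bornInYear", "1858"),
    ("Max Planck", "isa", "physicist"),
]

query = [
    ("Elvis Presley", "bornInYear", "$year"),
    ("$astro", "bornInYear", "$year"),
    ("$astro", "isa", "astronaut"),
]
results = list(match(query, toy_kb))
```

On this toy data the only binding found maps `$year` to 1935 and `$astro` to Roger Chaffee; Elvis himself matches the second pattern but fails the third.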

43  Roger Bruce Chaffee (born February 15, 1935) was a U.S. Navy pilot who became an American astronaut in the Apollo program. He died during training in the Apollo 1 fire.

44   Motivation   Other Ontologies  System overview   YAGO Dive IN   LEILA overview  NAGA overview  Conclusion

45 [Figure: system architecture. Backend: Web, LEILA (Knowledge Acquisition Tools), YAGO KB, NAGA (Query Processing & Ranking). Interface: Browser (Query Input and Output, Tunable Parameters), User.]

46  EVIDENCE QUERY: search for evidence for a certain hypothesis  [Query graph: nodes Max Planck, Physicist, Kiel; edges IsA, bornIn.]  DISCOVERY QUERY: discover pieces of missing information  [Query graph: nodes Max Planck, Physicist, $X, $Y; edges IsA, bornInYear.]

47  REGULAR EXPRESSION QUERY: the user might be interested in a certain path of relations between pieces of information  [Query graphs: (1) nodes Liu, $X, scientist; edges GivenNameOf|familyNameOf, IsA. (2) nodes $X, river, Africa; edges IsA, locatedIn*.]

48  RELATEDNESS QUERY: find a broad relation between pieces of information  [Query graph: Bohr connect Einstein.]  Both are physicists and both are scientists  There are Moon craters and asteroid belts named after them  Tom Cruise connects them by being a vegetarian

49 The answer to a query Q is a subgraph A of the knowledge graph that matches Q.  Q: [nodes Physicist, Max Planck, $X, $Y; edges type, bornInYear]  A: [nodes Physicist, Max Planck, 1858, Mihajlo Pupin; edges type, bornInYear; edge confidences 0.98, 0.95, 0.96, 0.97]

50  Combines three measures:  Extraction confidence  The informativeness of a fact (e.g. the fact Albert_Einstein isA physicist is more informative than Albert_Einstein isA person)  Compactness of the answer graph (e.g. for "How are Einstein and Bohr related?", "both won the Nobel Prize" is preferred over a connection via Tom Cruise)

51  55 queries from TREC 2005/2006  12 queries from the work on SphereSearch  18 regular expression queries  The queries were posed to Google, Yahoo! Answers, and NAGA at the same time

52  Semantic Web vision  System overview  YAGO  is based on a logically clean model  has an accuracy of around 95%  is 7 times larger than the largest competitor  Investigate the relationship between OWL 1.1 and the YAGO model

53  "YAGO - A Core of Semantic Knowledge"  "NAGA: Harvesting, Searching and Ranking Knowledge"  "LEILA: Learning to Extract Information by Linguistic Analysis" (Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum …) Available at http://www.mpii.mpg.de/~suchanek

54 Questions ?

