Gaby Nativ, SDBI 2007. Outline: Motivation, Other Ontologies, System overview, YAGO Dive In, LEILA, NAGA, Conclusion.

Presentation transcript:

Gaby Nativ, SDBI 2007

▪ Motivation
▪ Other Ontologies
▪ System overview
▪ YAGO Dive In
▪ LEILA
▪ NAGA
▪ Conclusion

▪ Which NASA astronaut was born when Elvis was born?

▪ Problem: Web pages are designed to be read by people, not machines
▪ Solution: the Semantic Web
  ▪ The meaning of information and services is defined
  ▪ Both people and machines can use web content

▪ Knowledge representation language
▪ Individuals: instances or objects
▪ Classes: concepts or types of objects
▪ Relations: ways that classes and objects can be related to one another
▪ Facts: instances of a relation between individuals, classes or relations, e.g. (Elvis Presley, isA, Singer)

▪ Directed labeled multigraph G = (V, E, L_V, L_E)
▪ V is a set of vertices
▪ E ⊆ V × V is a multiset of edges
▪ L_V is a set of individual and class labels
▪ L_E is a set of relation labels
▪ With each edge we associate a confidence value
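A minimal sketch of such a graph, assuming plain strings as labels and a default confidence of 1.0. The class and method names (`KnowledgeGraph`, `add_fact`, `outgoing`) are illustrative and not taken from YAGO or NAGA.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source: str        # vertex label (individual or class)
    relation: str      # relation label
    target: str        # vertex label (individual or class)
    confidence: float  # confidence value attached to this edge


class KnowledgeGraph:
    """Directed labeled multigraph: parallel edges between two vertices are allowed."""

    def __init__(self):
        self.vertices = set()
        self.edges = []                 # a list, so repeated edges are kept
        self._out = defaultdict(list)   # source vertex -> outgoing edges

    def add_fact(self, source, relation, target, confidence=1.0):
        edge = Edge(source, relation, target, confidence)
        self.vertices.update((source, target))
        self.edges.append(edge)
        self._out[source].append(edge)
        return edge

    def outgoing(self, source, relation=None):
        """Edges leaving `source`, optionally restricted to one relation label."""
        return [e for e in self._out.get(source, [])
                if relation is None or e.relation == relation]


graph = KnowledgeGraph()
graph.add_fact("Elvis Presley", "type", "singer", 0.93)
graph.add_fact("Elvis Presley", "bornInYear", "1935")
graph.add_fact("singer", "subClassOf", "person")
```

Keeping the edges in a list rather than a set is what makes this a multigraph: the same (source, relation, target) triple can be stored twice with different confidence values.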

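[Figure: example knowledge graph. Words ("Elvis Presley", "The King") are linked by "means" edges to individuals; individuals are linked by "type" edges to classes (astronaut, person, entity), which are linked by "subclass" edges. Elvis is "born" in 1935, and so is the unknown astronaut "?".]

✓ Motivation
▪ Other Ontologies
▪ System overview
▪ YAGO Dive In
▪ LEILA
▪ NAGA
▪ Conclusion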

Assemble the ontology manually:
▪ WordNet
▪ SUMO
▪ GeneOntology
▪ etc.
Problem: usually low coverage

WordNet:
▪ Semantic lexicon for the English language
▪ Developed at Princeton University since 1985
▪ Groups English words into synsets
▪ Provides short, general definitions
▪ Records various semantic relations
▪ Contains about 150,000 words organized in over 115,000 synsets

SUMO:
▪ Concerns itself with meta-level concepts
▪ First released in December 2000
▪ Maintained by Articulate Software

GeneOntology (GO):
▪ Part of a larger effort, the Open Biomedical Ontologies
▪ Constructed in 1998 with 3 models
  ▪ biological processes
  ▪ cellular components
  ▪ molecular functions
▪ As of 2005, GO contained over 19,000 terms

Automated extraction of ontologies:
▪ KnowItAll (University of Washington)
▪ TextToOnto (University of Karlsruhe)
Both use pattern matching and machine learning techniques.
Problem: usually low accuracy (50%-92%)

✓ Motivation
✓ Other Ontologies
▪ System overview
▪ YAGO Dive In
▪ LEILA
▪ NAGA
▪ Conclusion

[Figure: system overview. Backend: the Web, LEILA (knowledge acquisition tools), the YAGO KB, NAGA (query processing and ranking). Interface: a browser for query input and output with tunable parameters, operated by the user.]

▪ Based on a simple, decidable model
▪ Extensible ontology
▪ High coverage: YAGO knows over 1.7 M entities and 14 M facts
▪ High quality: empirical evaluation shows 95% accuracy

▪ Assemble the ontology from Wikipedia
▪ Good coverage: 7.83 M entities in all languages

▪ Good accuracy

▪ Uses deep linguistic analysis
▪ Machine learning techniques (SVM)
▪ Input
  ▪ A binary target relation
  ▪ A set of Web documents
▪ Output
  ▪ All pairs of entities that are in the target relation

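[Figure: the example individual with the facts (born, 1935) and (type, American_singer); the Wikipedia categories People_by_occupation, Business and Social_group are shown as candidate classes, marked with "?".]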

▪ Each synset of WordNet becomes a class of YAGO
▪ Extract only Wikipedia's leaf categories
▪ Exclude individuals already known to WordNet, e.g. Albert Einstein
▪ 15,000 cases where WordNet and Wikipedia overlap
▪ When the meanings conflict, prefer WordNet: "Time exposure" is a common noun in WordNet but an album title in Wikipedia

[Figure: mock Wikipedia article on Elvis Presley with placeholder text. Categories: 1935_births.]
Exploit relational categories: the category 1935_births yields the fact (Elvis, bornInYear, 1935). Other relations obtained this way: diedInYear, establishedIn.

[Figure: the same mock article. Categories: American_singers.]
Exploit conceptual categories: the category American_singers yields the fact (Elvis, type, American_singer); conceptual categories also feed the subClassOf hierarchy.

[Figure: the same mock article. Categories: Rock'n_Roll_Music.]
Avoid thematic categories: Rock'n_Roll_Music names a topic, not a class, so it must not produce a type fact.
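A small sketch of how relational categories could be turned into facts with regular expressions. The three patterns and the function name `facts_from_categories` are assumptions made for illustration; the slides only name the relations bornInYear, diedInYear and establishedIn.

```python
import re

# Hypothetical patterns: category-name regex -> relation name.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{4})_deaths$"), "diedInYear"),
    (re.compile(r"^(\d{4})_establishments$"), "establishedIn"),
]


def facts_from_categories(entity, categories):
    """Return (entity, relation, value) triples for every relational category."""
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_PATTERNS:
            match = pattern.match(category)
            if match:
                facts.append((entity, relation, match.group(1)))
                break  # a category matches at most one pattern
    return facts


facts = facts_from_categories(
    "Elvis_Presley",
    ["1935_births", "American_singers", "Rock'n_Roll_Music"],
)
```

Only 1935_births matches a pattern here; the conceptual and thematic categories are ignored by this step and would be handled separately.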

Shallow linguistic noun phrase parsing:
  American (premodifier) singers (head) of German origin (postmodifier)
Heuristic: if the head is a plural word, the category is conceptual.

Pling stemmer
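The plural-head heuristic can be sketched as follows. This is a rough illustration only: the preposition list and the suffix-based `looks_plural` test are crude stand-ins chosen here, not the Pling stemmer, and they will misjudge words such as "physics".

```python
# Assumed list of prepositions that start a postmodifier.
PREPOSITIONS = {"of", "in", "from", "by", "for", "at", "with"}


def split_category(name):
    """Split a category name into (premodifier words, head, postmodifier words)."""
    words = name.replace("_", " ").split()
    # Everything from the first preposition onward is treated as the postmodifier.
    cut = next((i for i, w in enumerate(words) if w.lower() in PREPOSITIONS),
               len(words))
    before, post = words[:cut], words[cut:]
    if not before:
        return [], None, post
    return before[:-1], before[-1], post


def looks_plural(word):
    """Naive plural test; a real system would use a proper stemmer."""
    w = word.lower()
    return len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us", "is"))


def is_conceptual(name):
    """Apply the heuristic: a category with a plural head is conceptual."""
    _, head, _ = split_category(name)
    return head is not None and looks_plural(head)


parts = split_category("American singers of German origin")
```

With this sketch, American_singers is classified as conceptual and Rock'n_Roll_Music is not. Relational categories such as 1935_births also have a plural head, so they would need to be filtered out before this test is applied.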

[Figure: the example graph so far. Elvis is born in 1935 and has type American_singer; American_singer is a subclass of Singer, which is a subclass of Person; the words "singer" and "Elvis Presley" are attached by "means" edges.]

▪ Storing witnesses: for each individual, store the URL of the corresponding Wikipedia page
▪ Storing confidence

[Figure: the example graph extended with word senses (Singer#1, Person#3) and with witness facts: foundIn wiki/Elvis_Presly and extractedBy LEILA.]

Fact: (Elvis, is_a, singer)
▪ But only from 1953 to 1977
▪ We know this from Wikipedia

▪ #1 (Elvis, is_a, singer)
▪ #2 (#1, time, )
▪ #3 (#1, source, Wikipedia)
[Figure: the same facts drawn as a graph, with "time" and "source" edges attached to the type fact, plus LEILA and the confidence value 0.93.]

▪ A YAGO ontology is defined over
  ▪ a set of relations R (type, subClassOf)
  ▪ a set of common entities C (entity, class, relation)
  ▪ a set of fact identifiers I
  as a function y : I → (R ∪ C ∪ I) × R × (R ∪ I ∪ C)
We can talk about:
▪ facts: (#1, source, Wikipedia)
▪ additional arguments: (#1, time, )
▪ relations: (time, hasRange, time_interval)
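One way to picture fact identifiers is a store that maps each identifier to a triple, so that an identifier can itself appear as an argument of another fact. The `FactStore` class below is a hypothetical sketch, not YAGO's storage format.

```python
class FactStore:
    """Maps fact identifiers to (arg1, relation, arg2) triples."""

    def __init__(self):
        self.facts = {}      # fact id -> triple
        self._next_id = 1

    def add(self, arg1, relation, arg2):
        fact_id = f"#{self._next_id}"
        self._next_id += 1
        self.facts[fact_id] = (arg1, relation, arg2)
        return fact_id

    def about(self, fact_id):
        """Facts whose first argument is the given fact identifier."""
        return {fid: triple for fid, triple in self.facts.items()
                if triple[0] == fact_id}


store = FactStore()
f1 = store.add("Elvis", "is_a", "singer")           # a plain fact
f2 = store.add(f1, "source", "Wikipedia")           # a fact about fact #1
f3 = store.add("time", "hasRange", "time_interval") # a fact about a relation
```

Because identifiers, entities and relations all live in the same argument positions, statements about facts and statements about relations need no separate mechanism.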

Axioms and rules:
▪ (subClassOf, type, acyclicTransitiveRelation)
▪ (x, is_a, y), (y, subclass, z) => (x, is_a, z)
▪ ...
[Figure: singer subClassOf person, with a type edge.]

Types Relations

▪ {(r1, subRelationOf, r2), (x, r1, y)} -> (x, r2, y)
▪ {(r, type, acyclicTransitiveRelation), (x, r, y), (y, r, z)} -> (x, r, z)
▪ {(r, domain, c), (x, r, y)} -> (x, type, c)
▪ {(r, range, c), (x, r, y)} -> (y, type, c)
▪ {(x, type, c1), (c1, subClassOf, c2)} -> (x, type, c2)

Axioms: (x, is_a, y), (y, subclass, z) => (x, is_a, z), ...
[Figure: deriving facts expands the set {f1, ..., f5} to {f1, ..., f10}; eliminating derivable facts reduces it to {f1, f2, f3}. Both results are finite and unique.]
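The five rules can be applied repeatedly until nothing new is derived. The sketch below is a naive fixpoint loop over a set of triples, written for readability rather than speed. It reads the domain rule as {(r, domain, c), (x, r, y)} -> (x, type, c), and the function name `closure` is an assumption.

```python
def closure(facts):
    """Apply the rewrite rules to a set of (arg1, relation, arg2) triples until a fixpoint."""
    derived = set(facts)
    while True:
        new = set()
        for (x, r, y) in derived:
            transitive = (r, "type", "acyclicTransitiveRelation") in derived
            for (a, p, b) in derived:
                if a == r and p == "subRelationOf":
                    new.add((x, b, y))        # sub-relation rule
                if a == r and p == "domain":
                    new.add((x, "type", b))   # domain rule
                if a == r and p == "range":
                    new.add((y, "type", b))   # range rule
                if r == "type" and p == "subClassOf" and a == y:
                    new.add((x, "type", b))   # type inheritance rule
                if transitive and p == r and a == y:
                    new.add((x, r, b))        # transitivity rule
        if new <= derived:
            return derived
        derived |= new


base = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("Elvis", "type", "singer"),
    ("singer", "subClassOf", "person"),
    ("person", "subClassOf", "entity"),
}
all_facts = closure(base)
```

The loop terminates because every derived triple is built from labels that already occur in the input, so only finitely many triples exist.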

▪ Consistency: a YAGO ontology y is consistent iff
  ∄ x, r : (r, type, acyclicTransitiveRelation) ∈ D(y) ∧ (x, r, x) ∈ D(y)
  where D(y) is the set of facts derivable from y
▪ Since D(y) is finite, the consistency of a YAGO ontology is decidable

▪ Is Lake Victoria "locatedIn" Tanzania?
▪ When should an entity be an individual, and when a class? E.g. physics as an individual of the class science

[Chart: size comparison of ontologies. KnowItAll 30,000; SUMO, WordNet and OpenCyc (values not recoverable from the transcript); Cyc 2,000,000; YAGO 14,000,000.]
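As a rough illustration of why the check is decidable: for every relation declared acyclic and transitive, it is enough to look for a cycle among its edges, since a cycle through x is exactly what lets transitivity derive (x, r, x). The sketch below considers only the transitivity rule, not the other rewrite rules, and the function name `is_consistent` is an assumption.

```python
def is_consistent(facts):
    """Return False if some acyclic transitive relation contains a cycle."""
    acyclic = {s for (s, p, o) in facts
               if p == "type" and o == "acyclicTransitiveRelation"}
    for rel in acyclic:
        # Successor map restricted to edges labeled with this relation.
        succ = {}
        for (s, p, o) in facts:
            if p == rel:
                succ.setdefault(s, set()).add(o)
        # If a vertex can reach itself, (x, rel, x) is derivable by transitivity.
        for start in succ:
            stack, seen = list(succ[start]), set()
            while stack:
                node = stack.pop()
                if node == start:
                    return False
                if node not in seen:
                    seen.add(node)
                    stack.extend(succ.get(node, ()))
    return True


good = {
    ("subClassOf", "type", "acyclicTransitiveRelation"),
    ("singer", "subClassOf", "person"),
    ("person", "subClassOf", "entity"),
}
bad = good | {("entity", "subClassOf", "singer")}
```

The search visits each vertex at most once per starting point, so it always terminates on a finite fact set.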

▪ inf.mpg.de/~suchanek/downloads/yago/
▪ Which astronaut was born in the same year as Elvis?
    "Elvis Presley" bornInYear $year
    $astro bornInYear $year
    $astro isa astronaut
▪ 20 results

▪ Roger Bruce Chaffee (born February 15, 1935) was a U.S. Navy pilot who became an American astronaut in the Apollo program. He died during training in the Apollo 1 fire.

✓ Motivation
✓ Other Ontologies
▪ System overview
✓ YAGO Dive In
✓ LEILA overview
▪ NAGA overview
▪ Conclusion

[Figure: system overview, repeated. Backend: the Web, LEILA (knowledge acquisition tools), the YAGO KB, NAGA (query processing and ranking). Interface: a browser for query input and output with tunable parameters, operated by the user.]
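The astronaut query is a conjunction of triple patterns with shared variables. A minimal backtracking matcher, assuming facts are plain triples and variables start with "$", could look like this; the tiny fact list is made up for the example and is not an excerpt of YAGO.

```python
def match(patterns, facts, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, fact):
            if term.startswith("$"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, candidate)


facts = [
    ("Elvis Presley", "bornInYear", "1935"),
    ("Roger Bruce Chaffee", "bornInYear", "1935"),
    ("Roger Bruce Chaffee", "isa", "astronaut"),
    ("Neil Armstrong", "bornInYear", "1930"),
    ("Neil Armstrong", "isa", "astronaut"),
]
query = [
    ("Elvis Presley", "bornInYear", "$year"),
    ("$astro", "bornInYear", "$year"),
    ("$astro", "isa", "astronaut"),
]
results = list(match(query, facts))
```

Elvis himself satisfies the second pattern but is dropped by the third, which leaves a single binding for `$astro` in this toy data.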

▪ Evidence query: search for evidence for a certain hypothesis
  [Query graph: Max Planck, Physicist and Kiel, connected by IsA and bornIn edges.]
▪ Discovery query: discover pieces of missing information
  [Query graph: Max Planck, Physicist and the variables $X and $Y, connected by IsA and bornInYear edges.]

▪ Regular expression query: the user might be interested in a certain path of relations between pieces of information
    Liu (givenNameOf|familyNameOf) $X
    $X IsA scientist

    $X IsA river
    $X locatedIn* Africa
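The starred relation locatedIn* means "zero or more locatedIn steps". A breadth-first search over the fact triples is one simple way to evaluate it; the helper names and the sample facts below are illustrative assumptions, not NAGA's query processor.

```python
from collections import deque


def reachable(facts, start, relation):
    """All nodes reachable from `start` via zero or more `relation` edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for (s, p, o) in facts:
            if s == node and p == relation and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def rivers_in(facts, region):
    """Evaluate: $X IsA river, $X locatedIn* region."""
    return sorted(s for (s, p, o) in facts
                  if p == "isa" and o == "river"
                  and region in reachable(facts, s, "locatedIn"))


facts = [
    ("Nile", "isa", "river"),
    ("Nile", "locatedIn", "Egypt"),
    ("Egypt", "locatedIn", "Africa"),
    ("Rhine", "isa", "river"),
    ("Rhine", "locatedIn", "Germany"),
    ("Germany", "locatedIn", "Europe"),
]
african_rivers = rivers_in(facts, "Africa")
```

The Nile is found even though no single fact places it in Africa directly, which is the point of allowing a path expression in the query.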

▪ Relatedness query: find broad relations between pieces of information
    Einstein connect Bohr
  → Both are physicists and both are scientists
  → There are Moon craters and asteroid belts named after them
  → Tom Cruise connects them by being a vegetarian

The answer to a query Q is a subgraph A of the knowledge graph that matches Q.
[Figure: Q has the nodes Physicist, Max Planck, $X and $Y with type and bornInYear edges; in A the variables are bound to Mihajlo Pupin and 1858.]

▪ Combines three measures:
  ▪ Extraction confidence
  ▪ Informativeness of a fact (e.g. the fact Albert_Einstein isA physicist is more informative than Albert_Einstein isA person)
  ▪ Compactness of the answer graph (e.g. for "How are Einstein and Bohr related?", the answer that both won the Nobel Prize is more compact than a connection via Tom Cruise)

▪ 55 queries from TREC 2005/2006
▪ 12 queries from the work on SphereSearch
▪ 18 regular expression queries
▪ The queries were posed to Google, Yahoo! Answers, and NAGA at the same time

▪ Semantic Web vision
▪ System overview
▪ YAGO
  → is based on a logically clean model
  → has an accuracy of around 95%
▪ YAGO is 7 times larger than the largest competitor
▪ Investigate the relationship between OWL 1.1 and the YAGO model

▪ "YAGO – A Core of Semantic Knowledge"
▪ "NAGA: Harvesting, Searching and Ranking Knowledge"
▪ "LEILA: Learning to Extract Information by Linguistic Analysis"
(Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum …) Available at

Questions ?