Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni.

Presentation transcript:

Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni Qatar Computing Research Institute Maya Ramanath Dept. of CSE, IIT-Delhi, India Volker Tresp Siemens AG, Corporate Technology, Munich, Germany EMNLP 2012

Introduction
• SPARQL query: ?x hasGender female . ?x isa actor . ?x actedIn Casablanca (film) . ?x marriedTo ?w . ?w isa writer . ?w bornIn Rome
• Natural language question qNL: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
Problem: SPARQL is too difficult for most users to write. Target: convert qNL to SPARQL.
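To make the target of the translation concrete, the triple patterns on this slide can be assembled into a query string. This is only a minimal sketch: the helper name `to_sparql` is hypothetical, the entity is written as `Casablanca_(film)` so it forms a single token, and namespace prefixes are omitted, so the result is simplified notation rather than a query ready to send to an endpoint.

```python
# Triple patterns copied from the slide (entity written with an underscore).
PATTERNS = [
    ("?x", "hasGender", "female"),
    ("?x", "isa", "actor"),
    ("?x", "actedIn", "Casablanca_(film)"),
    ("?x", "marriedTo", "?w"),
    ("?w", "isa", "writer"),
    ("?w", "bornIn", "Rome"),
]


def to_sparql(patterns, projection="?x"):
    """Join triple patterns into a SELECT query string (no prefixes or IRIs)."""
    body = " .\n  ".join(" ".join(triple) for triple in patterns)
    return f"SELECT {projection} WHERE {{\n  {body} .\n}}"


query = to_sparql(PATTERNS)
print(query)
```

The two variables ?x and ?w are what make the question hard to phrase as keywords: the writer is never named, only constrained.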

Knowledge Base: Yago2
• Yago2 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames.

Framework
• DEANNA (DEep Answers for maNy Naturally Asked questions)

Framework: Phrase Detection → Phrase Mapping → Q-Unit Generation → Disambiguation of Phrase Mappings → Query Generation

Phrase Detection Framework
• A detected phrase p is a pair <Toks, l>, where Toks is a phrase and l is a label, l ∈ {concept, relation}.
• Example qNL: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
Concept phrase: Relation phrase:

Phrase Detection Framework: Concept detection
• Using the Yago2 knowledge base.
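Concept detection can be pictured as a dictionary lookup over the token n-grams of the question. The sketch below assumes a tiny hand-made dictionary in place of the one derived from Yago2; the `Phrase` class, the function name and every dictionary entry are illustrative, not taken from the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    toks: tuple  # token sequence of the phrase
    label: str   # "concept" or "relation"


# Toy stand-in for a phrase-concept dictionary; entries are invented.
TOY_DICTIONARY = {
    ("actor",): ["actor"],
    ("casablanca",): ["Casablanca_(film)"],
    ("writer",): ["writer"],
    ("rome",): ["Rome"],
}


def detect_concepts(question, dictionary, max_len=3):
    """Return every n-gram (n <= max_len) that appears in the dictionary."""
    tokens = [t.strip("?.,").lower() for t in question.split()]
    found = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in dictionary:
                found.append(Phrase(gram, "concept"))
    return found


QUESTION = ("Which female actor played in Casablanca and is married "
            "to a writer who was born in Rome?")
concepts = detect_concepts(QUESTION, TOY_DICTIONARY)
```

Overlapping matches are all kept at this stage; choosing among them is left to the later disambiguation step.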

Phrase Detection Framework: Relation detection
• Uses ReVerb (Fader et al., 2011), a relation detector.
qNL: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Phrase Detection Framework

Phrase Mapping Framework
• The mapping of concept phrases also relies on the phrase-concept dictionary built from the Yago2 knowledge base.
• The mapping of relation phrases relies on a corpus of mappings from textual patterns to relations.
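Both kinds of mapping amount to looking a phrase up in a table that may return several candidates. A minimal sketch, assuming two toy tables: the relation names `actedIn` and `bornIn` appear in the example query, while `playsFor`, `Casablanca_(city)` and the function name `map_phrase` are hypothetical and only serve to show ambiguity.

```python
# Toy phrase-concept dictionary and pattern-to-relation table.
# All entries are invented for illustration; they are not Yago2 data.
PHRASE_TO_CONCEPTS = {
    "casablanca": ["Casablanca_(film)", "Casablanca_(city)"],
    "writer": ["writer"],
}
PATTERN_TO_RELATIONS = {
    "played in": ["actedIn", "playsFor"],
    "born in": ["bornIn"],
}


def map_phrase(text, label):
    """Return the candidate semantic items for a detected phrase."""
    table = PHRASE_TO_CONCEPTS if label == "concept" else PATTERN_TO_RELATIONS
    return list(table.get(text.lower(), []))


print(map_phrase("Casablanca", "concept"))
print(map_phrase("played in", "relation"))
```

A phrase with more than one candidate is exactly the case that the disambiguation step has to resolve.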

Phrase Mapping Framework

Dependency Parsing & Q-Unit Generation Framework: Dependency Parsing
• Dependency parsing identifies triples of tokens, or triploids, <trel, targ1, targ2>, where trel, targ1, targ2 ∈ qNL are seeds for phrases.

Dependency Parsing & Q-Unit Generation Framework
• qNL: “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
actor played / played in Casablanca
• Triploid:

Dependency Parsing & Q-Unit Generation Framework: Q-Unit Generation
• A q-unit is a triple of sets of phrases, <{prel}, {parg1}, {parg2}>, where trel ∈ prel, and similarly for arg1 and arg2.
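The step from a triploid to a q-unit can be sketched as collecting, for each seed token, every detected phrase that contains it. The triploid and the phrase set below are assumptions made up for the running example, and `make_q_unit` is a hypothetical name.

```python
def make_q_unit(triploid, phrases):
    """Group detected phrases by the triploid token they contain."""
    t_rel, t_arg1, t_arg2 = triploid

    def containing(token):
        # A phrase is a tuple of tokens; keep it if the seed token is in it.
        return {p for p in phrases if token in p}

    return containing(t_rel), containing(t_arg1), containing(t_arg2)


# Illustrative detected phrases, each a tuple of tokens.
phrases = {
    ("played",), ("played", "in"),
    ("actor",), ("female", "actor"),
    ("Casablanca",),
}
q_unit = make_q_unit(("played", "actor", "Casablanca"), phrases)
print(q_unit)
```

Each position of the q-unit can hold several competing phrases, which is why it is a triple of sets rather than a triple of single phrases.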

Dependency Parsing & Q-Unit Generation Framework

Dependency Parsing & Q-Unit Generation Framework

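Disambiguation of Phrase Mappings Framework: Disambiguation Graph
• Esim ⊆ Vp × Vs: similarity edges between phrase nodes and semantic-item nodes.
• Ecoh ⊆ Vs × Vs: coherence edges between semantic-item nodes.
• Eq ⊆ Vq × Vp × d, where d ∈ {rel, arg1, arg2}: q-edges between q-nodes and phrase nodes.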

Disambiguation of Phrase Mappings Framework: Disambiguation Graph (Cohsem)
• For Yago2, an entity e is characterized by its inlinks InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity. InLinks(Taipei_zoo):

Disambiguation of Phrase Mappings Framework: Disambiguation Graph (Cohsem)
• For a class c with entities e, its inlinks are defined as follows: InLinks(Taiwan):

Disambiguation of Phrase Mappings Framework: Disambiguation Graph (Cohsem)
• For a relation r with entity pairs (e1, e2), its inlinks are defined as follows:
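The inlink definitions did not survive in this transcript, so the following is one plausible reading rather than the slide's formulas. It assumes that a class inherits the union of its members' inlinks, that a relation collects the inlinks shared by both entities of each of its pairs, and that coherence is the Jaccard overlap of two inlink sets. All data and function names are invented.

```python
def class_inlinks(members, entity_inlinks):
    """Assumed: union of the inlink sets of the class members."""
    return set().union(*(entity_inlinks[e] for e in members))


def relation_inlinks(pairs, entity_inlinks):
    """Assumed: union over pairs of the inlinks shared by both entities."""
    return set().union(*(entity_inlinks[a] & entity_inlinks[b] for a, b in pairs))


def coherence(inlinks_a, inlinks_b):
    """Assumed: Jaccard coefficient of two inlink sets."""
    union = inlinks_a | inlinks_b
    return len(inlinks_a & inlinks_b) / len(union) if union else 0.0


# Toy inlink sets; the integers stand for linking pages.
toy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5}}
print(class_inlinks(["A", "B"], toy))
print(relation_inlinks([("A", "B")], toy))
print(coherence(toy["A"], toy["B"]))
```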

Disambiguation of Phrase Mappings Framework: Disambiguation Graph (Simsem)
• For entities: how often a phrase refers to a certain entity in Wikipedia.
• For classes: a normalized prior that reflects the number of members in the class.
• For relations: the maximum n-gram similarity between the phrase and any of the relation’s surface forms.
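The relation case can be sketched as follows. The slide does not say which n-gram similarity is used, so this example assumes Jaccard overlap of character trigrams; the function names and the surface forms are illustrative.

```python
def ngrams(text, n=3):
    """Character n-grams of the lowercased text, padded with one space."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Assumed measure: Jaccard overlap of character n-gram sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def relation_similarity(phrase, surface_forms):
    """Maximum similarity between the phrase and any surface form."""
    return max((ngram_similarity(phrase, s) for s in surface_forms), default=0.0)


print(relation_similarity("played in", ["acted in", "played in", "starred in"]))
```

Taking the maximum means a relation is scored by its best-matching surface form, so a long list of unrelated paraphrases does not lower the score.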

Disambiguation of Phrase Mappings Framework
• Objective function:

Disambiguation of Phrase Mappings Framework Definitions:

Disambiguation of Phrase Mappings Framework Definitions:

Disambiguation of Phrase Mappings Framework Constraints:

Disambiguation of Phrase Mappings Framework Constraints:

Disambiguation of Phrase Mappings Framework Constraints:
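The objective, definitions and constraints were on slide images and are not in this transcript. As a rough illustration of the idea only, the sketch below picks one semantic item per phrase so that a weighted sum of similarity and pairwise coherence is largest. It is a brute-force stand-in for the ILP: it keeps only the one-mapping-per-phrase constraint, drops the q-edge term weighted by γ, and uses invented weights and entity names.

```python
from itertools import product


def best_assignment(candidates, sim, coh, alpha=1.0, beta=1.0):
    """Exhaustively choose one semantic item per phrase.

    score = alpha * sum of similarities + beta * sum of pairwise coherences
    """
    phrases = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[p] for p in phrases)):
        score = alpha * sum(sim[(p, s)] for p, s in zip(phrases, choice))
        score += beta * sum(
            coh.get(frozenset((a, b)), 0.0)
            for i, a in enumerate(choice)
            for b in choice[i + 1:]
        )
        if score > best_score:
            best, best_score = dict(zip(phrases, choice)), score
    return best, best_score


# Invented candidates and weights for the running example.
candidates = {
    "casablanca": ["Casablanca_(film)", "Casablanca_(city)"],
    "played in": ["actedIn", "playsFor"],
}
sim = {
    ("casablanca", "Casablanca_(film)"): 0.4,
    ("casablanca", "Casablanca_(city)"): 0.6,
    ("played in", "actedIn"): 0.5,
    ("played in", "playsFor"): 0.5,
}
coh = {
    frozenset(("Casablanca_(film)", "actedIn")): 0.9,
    frozenset(("Casablanca_(city)", "playsFor")): 0.1,
}
mapping, score = best_assignment(candidates, sim, coh)
print(mapping, score)
```

In this toy setting the city is the more likely reading of "Casablanca" on its own, but coherence with actedIn tips the joint choice towards the film, which is the point of disambiguating all phrases together.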

Query Generation Framework

Evaluation
• Experiments are based on two collections of questions:
  QALD-1 (27 questions out of 50)
  NAGA (44 questions out of 87)
• 19 questions from the QALD-1 test set were used to tune the hyperparameters (α, β, γ) in the ILP objective function.

Evaluation
• The output of DEANNA was evaluated at three stages in the processing pipeline:
  a) Disambiguation
  b) Query Generation
  c) Question Answering
• At each of the three stages, the output was shown to two human assessors. If the two disagreed, a third person resolved the judgment.

Evaluation
• Coverage and precision are defined as follows:
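The definitions themselves are missing from this transcript. A common reading, assumed here, is that coverage is the share of the ideal output that was produced correctly and precision is the share of the produced output that is correct. The function names and the counts are illustrative and are not results from the evaluation.

```python
def coverage(num_correct, num_ideal):
    """Assumed: correct items divided by items in the ideal output."""
    return num_correct / num_ideal if num_ideal else 0.0


def precision(num_correct, num_retrieved):
    """Assumed: correct items divided by items the system produced."""
    return num_correct / num_retrieved if num_retrieved else 0.0


# Made-up counts for one question at one pipeline stage.
print(coverage(num_correct=3, num_ideal=4))
print(precision(num_correct=3, num_retrieved=5))
```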

Evaluation a) Disambiguation

Evaluation b) Query Generation

Evaluation c) Question Answering

Evaluation

Conclusions A method for translating natural-language questions into structured queries.