Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni.

Presentation transcript:

Natural Language Questions for the Web of Data Mohamed Yahya, Klaus Berberich, Gerhard Weikum Max Planck Institute for Informatics, Germany Shady Elbassuoni Qatar Computing Research Institute Maya Ramanath Dept. of CSE, IIT-Delhi, India Volker Tresp Siemens AG, Corporate Technology, Munich, Germany EMNLP 2012

Translation of q_NL to q_FL
q_NL: natural language question, e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
q_FL: SPARQL 1.0 query:
?x hasGender female
?x marriedTo ?w
?x isa actor
?w isa writer
?x actedIn Casablanca_(film)
?w bornIn Rome
Problem: writing such a complex query is difficult for the user.
Solution: automatically translate q_NL to q_FL.

Knowledge base
YAGO2 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames.
[Figure: knowledge base with relations, classes and entities]

Architecture of System DEANNA (DEep Answers for maNy Naturally Asked questions)

Phrase detection
A detected phrase p is a pair <Toks, l>:
– Toks: the phrase
– l: a label, l ∈ {concept, relation}
Phrase detection turns q_NL into a set of relation phrases P_r and a set of concept phrases P_c.

Phrase detection: concept phrases
Concept phrase detection uses a detector that works against a phrase-concept dictionary.
Phrase-concept dictionary: instances of the means relation in Yago2.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
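
A minimal sketch of dictionary-based concept phrase detection, to make the step above concrete. The dictionary contents, the function name detect_concept_phrases and the max_len window are assumptions made for illustration; the real dictionary is built from the means relation in Yago2 and is far larger.

```python
# Hypothetical toy phrase-concept dictionary (phrase -> candidate semantic items).
# In the slides this dictionary comes from the "means" relation in Yago2.
PHRASE_CONCEPT_DICT = {
    "female": {"female"},
    "actor": {"actor"},
    "casablanca": {"Casablanca", "Casablanca_(film)"},
    "writer": {"writer"},
    "rome": {"Rome"},
}


def detect_concept_phrases(tokens, dictionary, max_len=4):
    """Return (start, end, phrase) for every token span found in the dictionary.

    Spans are half-open token ranges [start, end); matching is case-insensitive.
    """
    phrases = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            text = " ".join(tokens[start:end]).lower()
            if text in dictionary:
                phrases.append((start, end, text))
    return phrases


question = ("Which female actor played in Casablanca and is married "
            "to a writer who was born in Rome")
tokens = question.split()
concept_phrases = detect_concept_phrases(tokens, PHRASE_CONCEPT_DICT)
```

Each detected phrase keeps all of its dictionary candidates (for example both Casablanca and Casablanca_(film)); choosing among them is left to the later disambiguation step.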

Phrase detection: relation phrases
Relation phrase detection relies on a relation detector based on ReVerb (Fader et al., 2011) with additional POS tag patterns.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Phrase Mapping
Two kinds of phrase mapping:
– The mapping of concept phrases
– The mapping of relation phrases

Phrase Mapping: concept phrases
The mapping of concept phrases also uses the phrase-concept dictionary that the detector works against.
Phrase-concept dictionary: instances of the means relation in Yago2.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Phrase Mapping: relation phrases
The mapping of relation phrases relies on a corpus of mappings from textual patterns to relations.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Q-Unit Generation
Two parts of the q-unit generation step:
– Dependency parsing
– Q-units: a q-unit is a triple of sets of phrases

Q-Unit Generation: dependency parsing
Dependency parsing identifies triples of tokens <t_rel, t_arg1, t_arg2>, where t_rel, t_arg1, t_arg2 ∈ q_NL.
e.g. “who was born in Rome?”
nsubjpass(born-3, who-1)
auxpass(born-3, was-2)
root(ROOT-0, born-3)
prep_in(born-3, Rome-5)
Resulting triple: t_rel = born, t_arg1 = who, t_arg2 = Rome
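
The example above can be sketched in code. The sketch below parses the dependency lines as plain strings and pairs a verb's subject with its prepositional or direct object. This pairing rule is a simplifying assumption chosen to reproduce the slide's example; it is not claimed to be the exact pattern set used by DEANNA, and the function names are hypothetical.

```python
import re

# Matches lines such as "prep_in(born-3, Rome-5)".
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(lines):
    """Turn dependency strings into (relation, head_token, dependent_token)."""
    deps = []
    for line in lines:
        match = DEP_RE.fullmatch(line.strip())
        if match:
            rel, head, _head_idx, dep, _dep_idx = match.groups()
            deps.append((rel, head, dep))
    return deps


def extract_token_triples(deps):
    """Build (t_rel, t_arg1, t_arg2) triples.

    Assumed heuristic: t_arg1 is the (passive) subject of a head token and
    t_arg2 is any prepositional or direct object of the same head.
    """
    subjects = {}
    objects = {}
    for rel, head, dep in deps:
        if rel in ("nsubj", "nsubjpass"):
            subjects[head] = dep
        elif rel.startswith("prep_") or rel == "dobj":
            objects.setdefault(head, []).append(dep)
    return [(head, subjects[head], obj)
            for head in subjects
            for obj in objects.get(head, [])]


dependency_lines = [
    "nsubjpass(born-3, who-1)",
    "auxpass(born-3, was-2)",
    "root(ROOT-0, born-3)",
    "prep_in(born-3, Rome-5)",
]
token_triples = extract_token_triples(parse_dependencies(dependency_lines))
```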

Q-Unit Generation: q-units
A q-unit is a triple of sets of phrases <{p_rel ∈ P_r}, {p_arg1 ∈ P_c}, {p_arg2 ∈ P_c}>, where t_rel ∈ p_rel, t_arg1 ∈ p_arg1, and t_arg2 ∈ p_arg2.
e.g. <{born, was born}, {a writer}, {Rome}>

Joint Disambiguation
Rule 1: resolves the phrase boundary ambiguity (only nonoverlapping phrases are mapped)
Rule 2: each phrase is assigned to at most one semantic item
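
The two rules can be read as a validity check on a candidate set of phrase-to-item assignments. The following is a minimal sketch under the assumption that a phrase is represented by its half-open token span (start, end); the function names and the sample assignments are hypothetical.

```python
def spans_overlap(a, b):
    """True if two half-open token spans (start, end) share a token."""
    return a[0] < b[1] and b[0] < a[1]


def is_valid_mapping(assignments):
    """Check a list of ((start, end), semantic_item) pairs against both rules."""
    spans = [span for span, _item in assignments]
    # Rule 2: each phrase is assigned to at most one semantic item.
    if len(spans) != len(set(spans)):
        return False
    # Rule 1: only nonoverlapping phrases may be mapped.
    for i, first in enumerate(spans):
        for second in spans[i + 1:]:
            if spans_overlap(first, second):
                return False
    return True


# "was born" covers tokens 13-14 and "born" covers token 14, so they overlap.
ok = is_valid_mapping([((5, 6), "Casablanca_(film)"), ((13, 15), "bornIn")])
overlapping = is_valid_mapping([((13, 15), "bornIn"), ((14, 15), "bornIn")])
two_items = is_valid_mapping([((5, 6), "Casablanca"), ((5, 6), "Casablanca_(film)")])
```

In the slides these rules are enforced as constraints inside the ILP rather than as a separate check; the sketch only illustrates what the rules exclude.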

Joint Disambiguation: Disambiguation Graph
Joint disambiguation takes place over a disambiguation graph DG = (V, E):
– V = V_s ∪ V_p ∪ V_q
– E = E_sim ∪ E_coh ∪ E_q

Joint Disambiguation: Disambiguation Graph Vertices
– V_s: the set of s-nodes
– V_p: the set of p-nodes
– V_rp: the set of relation phrases
– V_rc: the set of concept phrases
– V_q: a set of placeholder nodes for q-units

Disambiguation Graph: Edges
– E_sim ⊆ V_p × V_s: a set of weighted similarity edges (sim-edges)
– E_coh ⊆ V_s × V_s: a set of weighted coherence edges
– E_q ⊆ V_q × V_p × d, where d ∈ {rel, arg1, arg2}: the q-edges

Disambiguation Graph: Edge Weights
Coh_sem (semantic coherence): defined between two semantic items s1 and s2 as the Jaccard coefficient of their sets of inlinks.
Three kinds of inlinks:
– InLinks(e)
– InLinks(c)
– InLinks(r)

Disambiguation Graph: Edge Weights
Coh_sem: inlinks of an entity
InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity.
e.g. InLinks(Casablanca) = {Marwan_al-Shehhi, Ingrid_Bergman, …, Morocco, …}
[Screenshot: InLinks(Casablanca), sb.mpg.de/webyagospo/Browser]

Disambiguation Graph: Edge Weights
Coh_sem: inlinks of a class
InLinks(c) = ∪_{e ∈ c} InLinks(e)
e.g. InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ … ∪ InLinks(Rabat)

Disambiguation Graph: Edge Weights
Coh_sem: inlinks of a relation
InLinks(r) = ∪_{(e1, e2) ∈ r} (InLinks(e1) ∩ InLinks(e2))
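
The three inlink definitions and the Jaccard-based coherence can be sketched with plain Python sets. The inlink sets below are tiny hypothetical stand-ins (real sets come from Wikipedia links between Yago2 entities), and the function names are illustrative.

```python
def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inlinks_class(members, entity_inlinks):
    """InLinks(c): union of InLinks(e) over all entities e in class c."""
    result = set()
    for entity in members:
        result |= entity_inlinks.get(entity, set())
    return result


def inlinks_relation(pairs, entity_inlinks):
    """InLinks(r): union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    result = set()
    for e1, e2 in pairs:
        result |= entity_inlinks.get(e1, set()) & entity_inlinks.get(e2, set())
    return result


# Hypothetical, heavily truncated inlink sets for illustration only.
ENTITY_INLINKS = {
    "Casablanca": {"Ingrid_Bergman", "Morocco"},
    "Marrakech": {"Morocco", "Atlas_Mountains"},
    "Rabat": {"Morocco"},
}

class_inlinks = inlinks_class(["Casablanca", "Marrakech", "Rabat"], ENTITY_INLINKS)
relation_inlinks = inlinks_relation(
    [("Casablanca", "Marrakech"), ("Casablanca", "Rabat")], ENTITY_INLINKS)
coherence = jaccard(ENTITY_INLINKS["Casablanca"], ENTITY_INLINKS["Marrakech"])
```

Because all three kinds of semantic item reduce to a set of entities, the same jaccard function can weight a coherence edge between any pair of s-nodes, whether entity, class or relation.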

Similarity Weights
– Entities: how often a phrase refers to a certain entity in Wikipedia.
– Classes: reflects the number of members in a class.
– Relations: reflects the maximum n-gram similarity between the phrase and any of the relation’s surface forms.

Joint Disambiguation: Disambiguation Graph Processing
The result of disambiguation is a subgraph of the disambiguation graph, yielding the most coherent mappings. An ILP (integer linear program) is employed to this end.

Joint Disambiguation: ILP
Definitions: [formulas not preserved in the transcript]

Joint Disambiguation: ILP
Objective function: [formula not preserved in the transcript]

Joint Disambiguation: ILP
Constraints: [formulas not preserved in the transcript]

Joint Disambiguation: ILP
[Figure: resulting subgraph]

Query Generation
– Subject/object roles are not assigned in triploids and q-units.
– Each semantic class is replaced with a distinct type-constrained variable.
Example: “Which singer is married to a singer?”
?x type singer, ?x marriedTo ?y, and ?y type singer
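
A minimal sketch of the class-to-variable replacement, assuming each disambiguated q-unit is given as (arg1, relation, arg2) and each argument as a (kind, item, mention_id) tuple. This representation, the function name and the variable naming scheme are assumptions for illustration, not the system's actual data model.

```python
def generate_triple_patterns(q_units):
    """Replace every class mention with its own type-constrained variable.

    q_units: list of (arg1, relation, arg2); each argument is a tuple
    (kind, item, mention_id) with kind in {"class", "entity"}.
    Two mentions of the same class get two distinct variables.
    """
    # Only six names are provided; enough for a sketch, not for general use.
    variable_names = iter(["?x", "?y", "?z", "?u", "?v", "?w"])
    variable_for_mention = {}
    patterns = []

    def term(argument):
        kind, item, mention_id = argument
        if kind != "class":
            return item  # entities appear in the query as constants
        if mention_id not in variable_for_mention:
            variable = next(variable_names)
            variable_for_mention[mention_id] = variable
            patterns.append((variable, "type", item))
        return variable_for_mention[mention_id]

    for arg1, relation, arg2 in q_units:
        subject = term(arg1)
        obj = term(arg2)
        patterns.append((subject, relation, obj))
    return patterns


# "Which singer is married to a singer?"
singer_query = generate_triple_patterns(
    [(("class", "singer", 0), "marriedTo", ("class", "singer", 1))])
```

Keying variables on the mention rather than on the class name is what keeps the two singers apart: keying on the class would collapse the query into ?x marriedTo ?x.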

Query Generation
e.g. each semantic class is replaced with a variable (?x, ?y), and each q-unit (arg1, rel, arg2) generates a triple pattern:
?x type writer
?y type person
?x bornIn Rome
?y actedIn Casablanca
?y married ?x

Evaluation
Three parts of the evaluation:
– Datasets
– Evaluation Metrics
– Results & Discussion

Datasets
Experiments are based on two datasets:
– QALD-1: 1st Workshop on Question Answering over Linked Data (QALD-1)
– NAGA collection: from the context of the NAGA project. The NAGA collection is based on linking data from the Yago2 knowledge base.
Training set:
– 23 QALD-1 questions
– 43 NAGA questions
Test set:
– 27 QALD-1 questions
– 44 NAGA questions
Hyperparameters (α, β, γ) in the ILP objective function: 19 QALD-1 questions in the test set.

Evaluation Metrics
The output of DEANNA was evaluated at three stages:
– after the disambiguation of phrases
– after the generation of the SPARQL query
– after obtaining answers from the underlying linked-data sources
Judgement:
– Two human assessors judged each item.
– If they disagreed, a third person resolved the judgment.

Evaluation Metrics: disambiguation stage
The judges looked at each q-node/s-node pair and assessed:
– whether the mapping was correct or not
– whether any expected mappings were missing

Evaluation Metrics: query-generation stage
The judges looked at each triple pattern and assessed:
– whether the pattern was meaningful for the question or not
– whether any expected triple pattern was missing
e.g. (triple patterns)
?x bornIn Rome
?y actedIn Casablanca
?y married ?x

Evaluation Metrics: query-answering stage
The judges were asked to identify whether the result sets for the generated queries are satisfactory.

Results
For a question q and item set s in one of the stages of evaluation:
– correct(q, s): the number of correct items in s
– ideal(q): the size of the ideal item set
– retrieved(q, s): the number of retrieved items
Coverage and precision are defined as follows:
– cov(q, s) = correct(q, s) / ideal(q)
– prec(q, s) = correct(q, s) / retrieved(q, s)
Micro-averaging aggregates over all assessed items regardless of the questions to which they belong.
Macro-averaging first aggregates the items for the same question, and then averages the quality measure over all questions.
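
The difference between the two averaging schemes is easiest to see on numbers. The sketch below uses two made-up questions; the counts are illustrative only and are not results from the paper, and the function names are hypothetical. It assumes every question has a non-empty ideal set and at least one retrieved item.

```python
def micro_average(per_question, numerator, denominator):
    """Pool the counts of all questions, then take a single ratio."""
    total_num = sum(q[numerator] for q in per_question)
    total_den = sum(q[denominator] for q in per_question)
    return total_num / total_den


def macro_average(per_question, numerator, denominator):
    """Compute the ratio per question, then average the ratios."""
    ratios = [q[numerator] / q[denominator] for q in per_question]
    return sum(ratios) / len(ratios)


# Made-up counts for two questions (illustration only).
judged = [
    {"correct": 2, "ideal": 4, "retrieved": 2},
    {"correct": 3, "ideal": 3, "retrieved": 4},
]

micro_cov = micro_average(judged, "correct", "ideal")       # 5 / 7
macro_cov = macro_average(judged, "correct", "ideal")       # (0.5 + 1.0) / 2
micro_prec = micro_average(judged, "correct", "retrieved")  # 5 / 6
macro_prec = macro_average(judged, "correct", "retrieved")  # (1.0 + 0.75) / 2
```

Micro-averaging lets questions with many items dominate the score, while macro-averaging gives every question the same weight.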

Results
Example questions, the generated SPARQL queries and their answers. [Table not preserved in the transcript]
Note: in Yago2, the relation bornIn relates people to cities and not countries.

Results
Relaxation: uses (Elbassuoni et al., 2009)


Conclusions
– The authors presented a method for translating natural language questions into structured queries.
– Although the authors’ model, in principle, leads to high combinatorial complexity, they observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.
– The authors’ experimental studies showed very high precision and good coverage of the query translation, and good results in the actual question answers.