Natural Language Questions for the Web of Data
Mohamed Yahya, Klaus Berberich, Gerhard Weikum (Max Planck Institute for Informatics, Germany), Shady Elbassuoni

Presentation transcript:

Natural Language Questions for the Web of Data
Mohamed Yahya, Klaus Berberich, Gerhard Weikum (Max Planck Institute for Informatics, Germany)
Shady Elbassuoni (Qatar Computing Research Institute)
Maya Ramanath (Dept. of CSE, IIT-Delhi, India)
Volker Tresp (Siemens AG, Corporate Technology, Munich, Germany)
EMNLP 2012

Translation from Q_NL to Q_FL
Q_NL: natural language question
“Which female actor played in Casablanca and is married to a writer who was born in Rome?”
Q_FL: SPARQL 1.0 query
– ?x hasGender female
– ?x isa actor
– ?x actedIn Casablanca_(film)
– ?x marriedTo ?w
– ?w isa writer
– ?w bornIn Rome
Characteristics of SPARQL:
– Complex queries give good results
– Difficult for the user to write
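The target of the translation can be sketched as follows: the six triple patterns from the slide are joined into one SPARQL SELECT string. The helper name `to_sparql` is an assumption made for illustration, and the predicates are left as bare names as on the slide; a real endpoint would need prefixes or full IRIs.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def to_sparql(select_var: str, triples: List[Triple]) -> str:
    """Render triple patterns as a single SPARQL SELECT query string.

    Hypothetical helper: terms are emitted verbatim, without prefixes.
    """
    body = " .\n  ".join(" ".join(t) for t in triples)
    return f"SELECT {select_var} WHERE {{\n  {body} .\n}}"


# Triple patterns copied from the slide's example question.
patterns = [
    ("?x", "hasGender", "female"),
    ("?x", "isa", "actor"),
    ("?x", "actedIn", "Casablanca_(film)"),
    ("?x", "marriedTo", "?w"),
    ("?w", "isa", "writer"),
    ("?w", "bornIn", "Rome"),
]

query = to_sparql("?x", patterns)
print(query)
```

The two variables ?x and ?w are what make the query hard to write by hand: the user has to know that the writer needs a variable of its own and which relation names the knowledge base uses.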

Yago2
YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames.
Its semantic items are relations, classes and entities.

Architecture of DEANNA.

Phrase detection
A detected phrase p is a pair <Toks, l>
– Toks: the phrase
– l: a label (l ∈ {concept, relation})
Phrase detection takes Q_NL as input and produces two sets of phrases: P_r (relation phrases) and P_c (concept phrases).

Phrase detection
Concept phrase detection: search the instances of the means relation in Yago2.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Phrase detection
Relation phrase detection: relies on a relation detector based on ReVerb (Fader et al., 2011) with additional POS tag patterns.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Phrase Mapping
– To map concept phrases: again search the instances of the means relation in Yago2.
– To map relation phrases: rely on a corpus of mappings from textual patterns to relations.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”

Q-Unit Generation
– Dependency parsing identifies triples of tokens.
– A q-unit is a triple of sets of phrases.

Q-Unit Generation
Dependency parsing identifies triples of tokens <t_rel, t_arg1, t_arg2>, where t_rel, t_arg1, t_arg2 ∈ q_NL.
e.g. “who was born in Rome?”
– nsubjpass(born-3, who-1)
– auxpass(born-3, was-2)
– root(ROOT-0, born-3)
– prep_in(born-3, Rome-5)
Resulting triple: t_rel = born, t_arg1 = who, t_arg2 = Rome
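The step from the dependency output to a token triple can be sketched on the slide's own example. The extraction rule used here (root verb, its nsubj* dependent, and its prep_* or dobj dependent) is an assumption chosen to fit this one sentence; the function names are hypothetical and the actual system may use a different pattern set.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "nsubjpass(born-3, who-1)".
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_deps(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Parse typed dependencies into (relation, head, dependent) tuples."""
    deps = []
    for line in lines:
        match = DEP_RE.fullmatch(line.strip())
        if match:
            rel, head, _, dep, _ = match.groups()
            deps.append((rel, head, dep))
    return deps


def token_triple(deps: List[Tuple[str, str, str]]) -> Optional[Tuple[str, str, str]]:
    """Return (t_rel, t_arg1, t_arg2) using a simplified, assumed rule."""
    root = next((d for r, _, d in deps if r == "root"), None)
    if root is None:
        return None
    arg1 = next((d for r, h, d in deps
                 if h == root and r.startswith("nsubj")), None)
    arg2 = next((d for r, h, d in deps
                 if h == root and (r.startswith("prep_") or r == "dobj")), None)
    if arg1 is None or arg2 is None:
        return None
    return (root, arg1, arg2)


dep_lines = [
    "nsubjpass(born-3, who-1)",
    "auxpass(born-3, was-2)",
    "root(ROOT-0, born-3)",
    "prep_in(born-3, Rome-5)",
]
triple = token_triple(parse_deps(dep_lines))
print(triple)
```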

Q-Unit Generation
A q-unit is a triple of sets of phrases <{p_rel}, {p_arg1}, {p_arg2}>, where t_rel ∈ p_rel, t_arg1 ∈ p_arg1, and t_arg2 ∈ p_arg2.
Each triple of tokens is thus lifted to the phrases that contain its tokens.

Joint Disambiguation
Rules:
1. Each phrase is assigned to at most one semantic item.
2. Phrase boundary ambiguity is resolved (only non-overlapping phrases are mapped).

Joint Disambiguation: Disambiguation Graph
Joint disambiguation takes place over a disambiguation graph DG = (V, E), where
– V = V_s ∪ V_p ∪ V_q
– E = E_sim ∪ E_coh ∪ E_q

Joint Disambiguation: Disambiguation Graph
V = V_s ∪ V_p ∪ V_q
– V_s: the set of s-nodes (an s-node is a semantic item)
– V_p: the set of p-nodes (a p-node is a phrase)
– V_rp: the set of relation phrases
– V_rc: the set of concept phrases
– V_q: a set of placeholder nodes for q-units

Disambiguation Graph
E = E_sim ∪ E_coh ∪ E_q
– E_sim ⊆ V_p × V_s: a set of weighted similarity edges
– E_coh ⊆ V_s × V_s: a set of weighted coherence edges
– E_q ⊆ V_q × V_p × d, d ∈ {rel, arg1, arg2}: the q-edges
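One way to hold the three node sets and three edge sets in memory is sketched below. The class and method names are assumptions for illustration, the node labels are plain strings, and the weights in the usage lines are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

ROLES = {"rel", "arg1", "arg2"}  # the allowed values of d on a q-edge


@dataclass
class DisambiguationGraph:
    """Hypothetical container for DG = (V, E)."""
    s_nodes: Set[str] = field(default_factory=set)  # V_s: semantic items
    p_nodes: Set[str] = field(default_factory=set)  # V_p: phrases
    q_nodes: Set[str] = field(default_factory=set)  # V_q: q-unit placeholders
    sim_edges: Dict[Tuple[str, str], float] = field(default_factory=dict)  # E_sim
    coh_edges: Dict[FrozenSet[str], float] = field(default_factory=dict)   # E_coh
    q_edges: Set[Tuple[str, str, str]] = field(default_factory=set)        # E_q

    def add_similarity(self, phrase: str, item: str, weight: float) -> None:
        self.p_nodes.add(phrase)
        self.s_nodes.add(item)
        self.sim_edges[(phrase, item)] = weight

    def add_coherence(self, item1: str, item2: str, weight: float) -> None:
        self.s_nodes.update((item1, item2))
        # Coherence is symmetric, so the edge key is an unordered pair.
        self.coh_edges[frozenset((item1, item2))] = weight

    def add_q_edge(self, q_unit: str, phrase: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.q_nodes.add(q_unit)
        self.p_nodes.add(phrase)
        self.q_edges.add((q_unit, phrase, role))


# Usage with made-up weights.
dg = DisambiguationGraph()
dg.add_similarity("Casablanca", "Casablanca_(film)", 0.4)
dg.add_similarity("played in", "actedIn", 0.8)
dg.add_coherence("Casablanca_(film)", "actedIn", 0.5)
dg.add_q_edge("q1", "played in", "rel")
dg.add_q_edge("q1", "Casablanca", "arg2")
```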

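Edge Weights
Coh_sem (semantic coherence) between two semantic items s1 and s2 is the Jaccard coefficient of their sets of inlinks:
Coh_sem(s1, s2) = |InLinks(s1) ∩ InLinks(s2)| / |InLinks(s1) ∪ InLinks(s2)|
Inlinks are defined for three kinds of semantic items:
– InLinks(e) for an entity e
– InLinks(c) for a class c
– InLinks(r) for a relation r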

InLinks(e)
InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity e.
e.g.
– Let e = Casablanca
– InLinks(Casablanca) = {Marwan_al-Shehhi, Ingrid_Bergman, …, Morocco, …}

InLinks(c)
InLinks(c) = ∪_{e ∈ c} InLinks(e)
e.g.
– Let c = wikicategory_Metropolitan_areas_of_Morocco
– InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ InLinks(Fes) ∪ InLinks(Agadir) ∪ InLinks(Safi,_Morocco) ∪ InLinks(Oujda) ∪ InLinks(Tangier) ∪ InLinks(Rabat)

InLinks(r)
InLinks(r) = ∪_{(e1, e2) ∈ r} (InLinks(e1) ∩ InLinks(e2))
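The three inlink definitions and the Jaccard-based coherence can be sketched with plain Python sets. The inlink table below is toy data: only the three Casablanca entries come from the slide, and the names Rabat, e1, e2, p1, p2, p3 are placeholders with made-up links.

```python
from typing import Dict, Iterable, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a ∩ b| / |a ∪ b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def inlinks_class(members: Iterable[str], inlinks: Dict[str, Set[str]]) -> Set[str]:
    """InLinks(c): union of the inlinks of all entities e in class c."""
    result: Set[str] = set()
    for entity in members:
        result |= inlinks.get(entity, set())
    return result


def inlinks_relation(pairs: Iterable[Tuple[str, str]],
                     inlinks: Dict[str, Set[str]]) -> Set[str]:
    """InLinks(r): union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    result: Set[str] = set()
    for e1, e2 in pairs:
        result |= inlinks.get(e1, set()) & inlinks.get(e2, set())
    return result


# Toy inlink table (placeholder data, not real Wikipedia links).
INLINKS = {
    "Casablanca": {"Marwan_al-Shehhi", "Ingrid_Bergman", "Morocco"},
    "Rabat": {"Morocco", "p1"},
    "e1": {"p1", "p2"},
    "e2": {"p2", "p3"},
}

class_links = inlinks_class(["Casablanca", "Rabat"], INLINKS)
relation_links = inlinks_relation([("e1", "e2")], INLINKS)
coherence = jaccard(INLINKS["Casablanca"], INLINKS["Rabat"])
print(sorted(class_links), sorted(relation_links), coherence)
```

Because every kind of semantic item is reduced to a set of inlinking entities, the same `jaccard` call covers entity-entity, entity-class and class-relation pairs.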

Similarity Weights
– For entities: how often a phrase refers to a certain entity in Wikipedia.
– For classes: reflects the number of members in a class.
– For relations: reflects the maximum n-gram similarity between the phrase and any of the relation’s surface forms.
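The relation case can be illustrated with a small sketch. The slide does not say which n-gram similarity is used, so the measure below (Jaccard overlap of character trigrams on lower-cased, space-padded strings) is purely an assumption, as are the function names and the surface forms in the usage line.

```python
from typing import Iterable, Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lower-cased text, padded with one space per side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Assumed measure: Jaccard overlap of the two character n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def relation_similarity(phrase: str, surface_forms: Iterable[str], n: int = 3) -> float:
    """Maximum n-gram similarity between the phrase and any surface form."""
    return max((ngram_similarity(phrase, form, n) for form in surface_forms),
               default=0.0)


# Hypothetical surface forms for a bornIn-like relation.
weight = relation_similarity("born in", ["was born in", "born in", "birthplace"])
print(weight)
```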

Disambiguation Graph Processing
The result of disambiguation is a subgraph of the disambiguation graph that yields the most coherent mappings. An integer linear program (ILP) is employed to this end.
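To make the idea of selecting the most coherent subgraph concrete, here is a toy sketch that replaces the ILP with exhaustive search. It enforces only the two rules stated earlier (at most one semantic item per phrase, no overlapping phrases) and maximizes a weighted sum of similarity and coherence. The weights `alpha` and `beta`, all numbers, and the item name `someOtherRelation` are assumptions; the real ILP has further variables and constraints and is handed to a solver.

```python
from itertools import product
from typing import Dict, FrozenSet, Optional, Tuple

Span = Tuple[int, int]  # token span [start, end)


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def best_mapping(phrases: Dict[str, Span],
                 sim: Dict[Tuple[str, str], float],
                 coh: Dict[FrozenSet[str], float],
                 alpha: float = 1.0, beta: float = 1.0):
    """Exhaustively pick phrase -> semantic item assignments (toy stand-in for the ILP)."""
    names = sorted(phrases)
    # Each phrase may stay unmapped (None) or take one candidate item.
    options = [[None] + [s for (p, s) in sim if p == name] for name in names]
    best, best_score = {}, 0.0
    for choice in product(*options):
        mapping: Dict[str, Optional[str]] = {
            p: s for p, s in zip(names, choice) if s is not None}
        chosen = list(mapping)
        # Rule 2: only non-overlapping phrases may be mapped.
        if any(overlaps(phrases[a], phrases[b])
               for i, a in enumerate(chosen) for b in chosen[i + 1:]):
            continue
        items = list(mapping.values())
        score = alpha * sum(sim[(p, s)] for p, s in mapping.items())
        score += beta * sum(coh.get(frozenset((a, b)), 0.0)
                            for i, a in enumerate(items) for b in items[i + 1:])
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Made-up spans and weights for part of the running example.
PHRASES = {"played": (3, 4), "played in": (3, 5), "Casablanca": (5, 6)}
SIM = {
    ("Casablanca", "Casablanca_(film)"): 0.4,
    ("Casablanca", "Casablanca,_Morocco"): 0.6,
    ("played in", "actedIn"): 0.8,
    ("played", "someOtherRelation"): 0.3,
}
COH = {
    frozenset(("Casablanca_(film)", "actedIn")): 0.5,
    frozenset(("Casablanca,_Morocco", "actedIn")): 0.1,
}

mapping, score = best_mapping(PHRASES, SIM, COH)
print(mapping, score)
```

With these numbers the city has the higher similarity weight on its own, but the coherence edge to actedIn tips the joint decision toward the film, which is the effect joint disambiguation is meant to capture.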

Definitions (part 1)

Definitions (part 2)

Objective function

Constraints (1~3)

Constraints (4~7)

Constraints (8~9)
This is not invoked for existential questions.

Resulting subgraph for the disambiguation graph of Figure 3

Query Generation
Subject/object roles are not assigned in triploids and q-units.
Example:
– “Which singer is married to a singer?”
– ?x type singer, ?x marriedTo ?y, and ?y type singer
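The singer example can be reproduced with a small sketch that turns disambiguated q-units into triple patterns: every class phrase gets its own variable plus a type pattern, and entities are used directly. The input format, the phrase identifiers, and the function name are assumptions, and the argument order is fixed arbitrarily here, which sidesteps the role question raised on the slide.

```python
from typing import Dict, List, Tuple

Arg = Tuple[str, str, str]       # (phrase id, "class" or "entity", semantic item)
QUnit = Tuple[str, Arg, Arg]     # (relation, arg1, arg2)
Pattern = Tuple[str, str, str]


def triple_patterns(q_units: List[QUnit]) -> List[Pattern]:
    """Emit triple patterns; one variable per distinct class phrase."""
    variables: Dict[str, str] = {}
    patterns: List[Pattern] = []

    def term(arg: Arg) -> str:
        phrase_id, kind, name = arg
        if kind == "entity":
            return name
        if phrase_id not in variables:
            variables[phrase_id] = f"?v{len(variables)}"
            patterns.append((variables[phrase_id], "type", name))
        return variables[phrase_id]

    for relation, arg1, arg2 in q_units:
        subject, obj = term(arg1), term(arg2)
        patterns.append((subject, relation, obj))
    return patterns


# "Which singer is married to a singer?" with two distinct singer phrases.
units = [("marriedTo", ("singer@1", "class", "singer"),
          ("singer@6", "class", "singer"))]
result = triple_patterns(units)
print(result)
```

Keying variables by phrase rather than by class name is what keeps the two singers apart; keying by class name would collapse them into a single variable married to itself.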

5 Evaluation
– Datasets
– Evaluation Metrics
– Results & Discussion

Datasets
The authors' experiments are based on two collections of questions:
– QALD-1: from the 1st Workshop on Question Answering over Linked Data (QALD-1)
– NAGA collection: created in the context of the NAGA project; it is based on linking data from the Yago2 knowledge base
Training set (used to obtain the hyperparameters (α, β, γ) in the ILP objective function)
– 23 QALD-1 questions
– 43 NAGA questions
Test set
– 27 QALD-1 questions
– 44 NAGA questions
– 19 QALD-1 questions in the test set

Evaluation Metrics
The authors evaluated the output of DEANNA at three stages:
– 1. after the disambiguation of phrases
– 2. after the generation of the SPARQL query
– 3. after obtaining answers from the underlying linked-data sources
Judgement
– Two human assessors judged whether an output item was good or not.
– If the two disagreed, a third person resolved the judgment.

Disambiguation stage
The task of the judges:
– looked at each q-node/s-node pair, in the context of the question and the underlying data schemas
– determined whether the mapping was correct or not
– determined whether any expected mappings were missing

Query-generation stage
The task of the judges:
– looked at each triple pattern
– determined whether the pattern was meaningful for the question or not
– determined whether any expected triple pattern was missing

Query-answering stage
The judges were asked to decide whether the result sets for the generated queries were satisfactory.

Micro-averaging aggregates over all assessed items regardless of the questions to which they belong.
Macro-averaging first aggregates the items for the same question, and then averages the quality measure over all questions.
For a question q and item set s in one of the stages of evaluation:
– correct(q, s): the number of correct items in s
– ideal(q): the size of the ideal item set
– retrieved(q, s): the number of retrieved items
Coverage and precision are defined as follows:
cov(q, s) = correct(q, s) / ideal(q)
prec(q, s) = correct(q, s) / retrieved(q, s)
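The difference between the two averaging schemes can be sketched in a few lines. The per-question counts below are made up, the function name is hypothetical, and treating a zero denominator as 0.0 is an assumption, since the slide does not cover that case.

```python
from typing import Dict, List, Tuple

# One (correct, ideal, retrieved) tuple per question.
Counts = Tuple[int, int, int]


def ratio(numerator: float, denominator: float) -> float:
    """Safe division; a zero denominator yields 0.0 (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def micro_macro(per_question: List[Counts]) -> Dict[str, float]:
    total_correct = sum(c for c, _, _ in per_question)
    total_ideal = sum(i for _, i, _ in per_question)
    total_retrieved = sum(r for _, _, r in per_question)
    n = len(per_question)
    return {
        # Micro: pool all items first, then compute one ratio.
        "micro_cov": ratio(total_correct, total_ideal),
        "micro_prec": ratio(total_correct, total_retrieved),
        # Macro: compute cov/prec per question, then average over questions.
        "macro_cov": ratio(sum(ratio(c, i) for c, i, _ in per_question), n),
        "macro_prec": ratio(sum(ratio(c, r) for c, _, r in per_question), n),
    }


# Two made-up questions: (correct, ideal, retrieved).
scores = micro_macro([(2, 4, 2), (1, 1, 2)])
print(scores)
```

In this toy input the micro coverage (3/5) is lower than the macro coverage (0.75) because the question with the larger ideal set dominates the pooled counts.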


Conclusions
– The authors presented a method for translating natural language questions into structured queries.
– Although the model, in principle, leads to high combinatorial complexity, the authors observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.
– The experimental studies showed very high precision and good coverage of the query translation, and good results in the actual question answers.