Template-based Question Answering over RDF Data

Christina Unger, Lorenz Bühmann, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Daniel Gerber, Philipp Cimiano
Yanan Zhang

Background
Intuitive ways of accessing RDF data are becoming more and more important. Question answering approaches have been proposed as a good compromise between intuitiveness and expressivity.
The general approach is a triple-based representation, e.g. for "Who wrote The Neverending Story?" (PowerAqua):
<[person,organization], wrote, Neverending Story>
<Writer, IS_A, Person>
<Writer, author, The Neverending Story>

1. (a) Which cities have more than three universities?
(b) <[cities], more than, universities three>
(c) SELECT ?y WHERE { ?x rdf:type onto:University . ?x onto:city ?y . } HAVING (COUNT(?x) > 3)
2. (a) Who produced the most films?
(b) <[person,organization], produced, most films>
(c) SELECT ?y WHERE { ?x rdf:type onto:Film . ?x onto:producer ?y . } ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1
The original semantic structure of the question cannot be faithfully captured using triples.

Contribution
A domain-independent question answering approach:
1. Parse the question to produce a SPARQL template.
2. Identify domain-specific entities to fill the template.
SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 OFFSET 0

Parsing and template generation

Who produced the most films?
(a) POS tagger: who/WP produced/VBD the/DT most/JJS films/NNS
(b) Domain-independent lexicon: 107 entries covering light verbs, question words, determiners, negation words, coordination and the like. Covered tokens: who, the most, the, most
(c) Domain-dependent lexicon: built on the fly; the POS tag determines the syntactic and semantic properties of each entry. Building entries for: produced/VBD, films/NNS

POS tag -> syntactic and semantic properties
Heuristics:
Named entities: resources.
Nouns: classes or properties.
Verbs: properties. If the verb makes no contribution, the property comes from a noun instead (Which cities have more than 2 million inhabitants?).
Each entry consists of a syntactic representation and a semantic representation.
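
The heuristics above can be sketched in Python as a lookup from Penn Treebank POS tags to candidate slot types. This is only an illustration: the function name `slot_types`, the tag groupings and the `LIGHT_VERBS` set are assumptions, not the authors' lexicon construction.

```python
# Hypothetical sketch: map a POS-tagged token to the slot types it may fill.
# LIGHT_VERBS is an assumed stand-in for verbs that contribute no property.
LIGHT_VERBS = {"have", "has", "had", "be", "is", "are", "do", "does"}


def slot_types(token, pos):
    """Return candidate slot types for a token, following the slide's heuristics."""
    if pos in ("NNP", "NNPS"):       # named entities -> resources
        return ["resource"]
    if pos in ("NN", "NNS"):         # nouns -> classes or properties
        return ["class", "property"]
    if pos.startswith("VB"):         # verbs -> properties, unless they add nothing
        if token.lower() in LIGHT_VERBS:
            return []
        return ["property"]
    return []                        # other tags are left to the domain-independent lexicon


tagged = [("who", "WP"), ("produced", "VBD"), ("the", "DT"),
          ("most", "JJS"), ("films", "NNS")]
# Keep only the tokens for which a domain-dependent entry would be built.
entries = {tok: slot_types(tok, pos) for tok, pos in tagged if slot_types(tok, pos)}
```

For the tagged example question, entries are built only for "produced" and "films", matching step (c) of the parsing slide.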

SPARQL templates
Who produced the most films?
(a) SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 OFFSET 0
Slots: <?c, class, films> <?p, property, produced>
(b) SELECT ?x WHERE { <?p, property, films>
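
A minimal sketch of how a template and its slots might be represented and filled, assuming a simple textual substitution. The `Slot` class, the `instantiate` function and the candidate URIs (in particular `prop:producer`) are hypothetical names for illustration; in the approach itself the candidates come from entity identification.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass
class Slot:
    var: str      # template variable, e.g. "?p"
    kind: str     # "class", "property" or "resource"
    phrase: str   # natural language expression taken from the question


TEMPLATE = ("SELECT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } "
            "ORDER BY DESC(COUNT(?y)) LIMIT 1 OFFSET 0")
SLOTS = [Slot("?c", "class", "films"), Slot("?p", "property", "produced")]


def instantiate(template, slots, candidates):
    """Yield one query per combination of candidate URIs for the slots."""
    options = [candidates[s.phrase] for s in slots]
    for combo in product(*options):
        query = template
        for slot, uri in zip(slots, combo):
            # The word boundary keeps "?p" from matching inside a longer variable name.
            query = re.sub(re.escape(slot.var) + r"\b", lambda m, u=uri: u, query)
        yield query


# Assumed candidate URIs, for illustration only.
CANDIDATES = {"films": ["onto:Film"], "produced": ["onto:producer", "prop:producer"]}
queries = list(instantiate(TEMPLATE, SLOTS, CANDIDATES))
```

Each combination of candidates yields one query, so ambiguity in a slot multiplies the number of queries that later have to be ranked.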

Entity identification
Given a string s and a knowledge base K, find the entities in K that are similar to s.
Generic approach: retrieve S(s), the WordNet synonyms of s, then compare label(e) of each entity e with S(s).
Property detection: a large number of expressions can be used to denote the same predicate (e.g. "X, the creator of Y" and "Y is a book by X"), so the BOA pattern library is used as well.
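
The generic approach can be sketched as follows, under loud assumptions: the synonym list is hand-written in place of a WordNet lookup, `difflib.SequenceMatcher` stands in for whatever string similarity the system uses, and `rank_entities`, `LABELS` and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_entities(s, synonyms, labels, threshold=0.6):
    """Score each entity by its best similarity to s or one of its synonyms."""
    expressions = {s} | set(synonyms)        # plays the role of S(s)
    scored = []
    for entity, label in labels.items():     # label plays the role of label(e)
        score = max(similarity(label, e) for e in expressions)
        if score >= threshold:
            scored.append((entity, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy knowledge base labels, assumed for illustration.
LABELS = {"onto:Film": "film", "onto:producer": "producer", "onto:City": "city"}
ranked = rank_entities("films", ["movie", "picture"], LABELS)
```

With these toy labels only `onto:Film` passes the threshold for the slot phrase "films".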

BOA patterns
Pairs: I(p) = {(x, y) : (x p y) ∈ K}
Sentences: …"label(x) * label(y)" or "label(y) * label(x)"…
NLE θ: of the form "?D? representation ?R?" or "?R? representation ?D?"
Scores that distinguish the patterns specific to a property p: support, typicity, specificity.
Pipeline: pairs (x p y) -> sentences -> NLE θ -> pairs (p, θ) = BOA patterns
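
The extraction step can be illustrated with a small sketch that searches sentences for the labels of a pair and keeps the text in between as an NLE. The function `extract_patterns` and the example sentences are made up for illustration, and the raw counts below are not the support, typicity or specificity scores.

```python
import re
from collections import Counter


def extract_patterns(pairs, sentences):
    """Count NLEs found between label(x) and label(y), in either order."""
    counts = Counter()
    for x, y in pairs:
        # ?D? marks the subject label(x), ?R? the object label(y).
        for first, second, form in ((x, y, "?D? {} ?R?"), (y, x, "?R? {} ?D?")):
            regex = re.compile(re.escape(first) + r"\s+(.+?)\s+" + re.escape(second))
            for sentence in sentences:
                match = regex.search(sentence)
                if match:
                    counts[form.format(match.group(1))] += 1
    return counts


# I(p) for an author-like property p, written as (label(x), label(y)).
pairs = [("The Neverending Story", "Michael Ende"), ("Momo", "Michael Ende")]
sentences = [
    "Michael Ende is the author of The Neverending Story.",
    "Michael Ende is the author of Momo.",
    "Momo was written by Michael Ende.",
]
patterns = extract_patterns(pairs, sentences)
```

Here "?R? is the author of ?D?" is found for both pairs and "?D? was written by ?R?" for one, which is the kind of evidence the scores are meant to weigh.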

Query ranking and selection
A query is scored using string similarity, the prominence of entities and the schema of the knowledge base.
Entity scores; type checks on queries (triples of the form ?x p e or e p ?x).
Return: the highest-scored query with a non-empty result.
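
The selection rule can be sketched as a loop over queries in descending score order that stops at the first non-empty result. The `execute` callback and the toy endpoint are assumptions; how the scores are computed is left out.

```python
def select_query(scored_queries, execute):
    """Return (query, result) for the best-scored query with a non-empty result."""
    for score, query in sorted(scored_queries, key=lambda sq: sq[0], reverse=True):
        result = execute(query)
        if result:                 # skip queries that return nothing
            return query, result
    return None, []                # no candidate query produced an answer


# Toy stand-in for a SPARQL endpoint: query string -> list of answers.
FAKE_ENDPOINT = {"Q_high": [], "Q_mid": ["answer_1"], "Q_low": ["answer_2"]}
best = select_query(
    [(0.4, "Q_low"), (0.9, "Q_high"), (0.7, "Q_mid")],
    lambda q: FAKE_ENDPOINT.get(q, []),
)
```

The top-scored query is skipped because its result is empty, so the second-best query is returned.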

Evaluation and discussion
Benchmark: QALD on DBpedia, 50 questions annotated with SPARQL queries and answers.
Metrics: precision and recall.
Preliminary remark: erroneous POS tags were manually corrected in seven questions.
11 questions rely on namespaces which we did not incorporate for predicate detection: FOAF, YAGO.
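
For reference, per-question precision and recall against the annotated answers can be computed as in this sketch. The function name and the convention of returning 0.0 for an empty answer set are assumptions; the benchmark's own evaluation script may differ.

```python
def precision_recall(returned, gold):
    """Precision and recall of a returned answer set against the gold answers."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    precision = correct / len(returned) if returned else 0.0   # assumed convention
    recall = correct / len(gold) if gold else 0.0              # assumed convention
    return precision, recall


# Toy answer sets: 3 of 4 returned answers are correct, 3 of 5 gold answers are found.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"})
```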

Unknown domain-independent expressions
Results:
19 questions: precision 1.0, recall 1.0
Precision > 0.8, recall > 0.8
Mean precision, mean recall

The key advantage
The semantic structure of the natural language input is faithfully captured, e.g. for complex questions containing quantifiers, comparatives and superlatives.
No user feedback is needed.
Incorrect templates
No sensible template is constructed: Is there a video game called Battle Chess? (property slot: title or name -> rdfs:label)
The structure of the templates is sometimes too rigid: join the EU -> prop:accessioneudate
Named entity recognition fails sporadically: Battle of Gettysburg

Entity identification
The class or property cannot be found on the basis of the slot:
Give me all soccer clubs in the Premier League. -> onto:league
Give me all movies with Tom Cruise. -> onto:starring
Hard to match:
Which cities have more than inhabitants -> prop:populationTotal
Who owns Aldi? -> onto:keyPerson
Which mountains are higher than the Nanga Parbat? -> prop:elevation

Query selection
A query with the wrong entity instantiating the slot is picked. The slot contains too little information to decide among candidates.
founded: prop:foundation, prop:foundingYear, prop:foundingDate, onto:foundationPerson, onto:foundationPlace
Which organizations were founded in 1950?
When was Capcom founded?
Which software has been developed by organizations founded in California?
Others
Namespaces overlap, and choosing one over the other often leads to results of different quality.
…….

Future work
Rigid templates: a preprocessing step; a more flexible fallback strategy.
Provide robust question answering for large-scale heterogeneous knowledge bases.

Thanks for listening!
