FREyA: an Interactive Way of Querying Linked Data using Natural Language
  Danica Damljanović, Milan Agatonović, Hamish Cunningham
NATURAL LANGUAGE INTERFACES AND LINKED OPEN DATA
  What are NLIs?
  Challenges: NL understanding/grammar ambiguity/expressiveness knowledge structure/portability
  small ontologies, large ontologies, multiple ontologies, Linked Open Data?
  30 MAY 2011 QALD-1
SHIFT IN CHALLENGES
  Closed-domain | Open-domain
  Find the answer | Disambiguate the answer
  Language ambiguity | Data ambiguity
  Increase recall (WordNet, etc.) | Increase precision
  Portability | Heterogeneity, incompleteness, redundancy
  5 years ago vs. today? Can one system support both?
FREyA WORKFLOW
FINDING POCs
  03 JUNE 2010 ESWC
FINDING OCs
MAPPING POCs TO OCs
  [Figure labels: new york, geo:City, geo:State; POC population, geo:cityPopulation, geo:State]
NEW YORK IS A CITY
NEW YORK IS A STATE
LEARNING
  Columns: POC, OC (context), candidate OC, function (IF ... THEN)
  new york, geo:State, -
  new york, geo:City, -
  population, geo:State, geo:statePopulation, -
  population, geo:City, geo:cityPopulation, -
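The IF/THEN rows on this slide can be read as a lookup from a (POC, context OC) pair to the OC chosen in a dialog. The following is a minimal sketch of that reading, assuming a plain dictionary as the store. The names `LearnedRules`, `learn`, and `suggest` are hypothetical and are not FREyA's API, and the sketch omits the function column and any ranking of candidates.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (POC text, context OC) -> candidate OC chosen in a dialog.
RuleKey = Tuple[str, str]


class LearnedRules:
    """Illustrative rule store; not the data structure used by FREyA."""

    def __init__(self) -> None:
        self._rules: Dict[RuleKey, str] = {}

    def learn(self, poc: str, context_oc: str, chosen_oc: str) -> None:
        """Record a choice: IF (poc, context_oc) THEN chosen_oc."""
        self._rules[(poc.lower(), context_oc)] = chosen_oc

    def suggest(self, poc: str, context_oc: str) -> Optional[str]:
        """Return the learned OC, or None if nothing has been learned yet."""
        return self._rules.get((poc.lower(), context_oc))


rules = LearnedRules()
# The two "population" rows from the slide.
rules.learn("population", "geo:State", "geo:statePopulation")
rules.learn("population", "geo:City", "geo:cityPopulation")

print(rules.suggest("population", "geo:City"))  # geo:cityPopulation
print(rules.suggest("area", "geo:City"))        # None
```

Keying on the context OC is what lets the same POC ("population") resolve to different properties depending on whether "new york" was taken as a city or a state.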
LEARNING
QUERYING LINKED DATA WITH FREyA: THE USUAL CYCLE
  for i = 1 to n {
    Initialise the system using dataset A_i (forceDialog)
    Train the system by asking questions
    Save learningModel_i
  }
  Initialise the system (automatic mode) by loading learningModel_i, i = 1..n
  Connect to the repository containing all A_i datasets, where i = 1..n
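The cycle on this slide (train one model per dataset, then load all models for automatic mode) might be sketched as follows. Everything here is an assumption made for illustration: the JSON file layout, the `train` and `run_cycle` helpers, the question dictionaries, and the `pick_first` stand-in for a user are hypothetical, and the step of connecting to a repository is left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical model: maps "poc|context" to the OC chosen by the user.
Model = Dict[str, str]


def train(questions: List[dict], ask_user: Callable[[dict], str]) -> Model:
    """Forced-dialog mode: every question's POC is resolved by asking the user."""
    model: Model = {}
    for q in questions:
        model[f"{q['poc']}|{q['context']}"] = ask_user(q)
    return model


def run_cycle(datasets: Dict[str, List[dict]],
              ask_user: Callable[[dict], str],
              model_dir: Path) -> Model:
    # Step 1: one training pass and one saved model per dataset A_i.
    for name, questions in datasets.items():
        model = train(questions, ask_user)
        (model_dir / f"{name}.json").write_text(json.dumps(model))
    # Step 2: automatic mode loads every saved model into one combined model.
    combined: Model = {}
    for path in sorted(model_dir.glob("*.json")):
        combined.update(json.loads(path.read_text()))
    return combined


datasets = {
    "geo": [{"poc": "population", "context": "geo:City",
             "candidates": ["geo:cityPopulation", "geo:statePopulation"]}],
}


def pick_first(question: dict) -> str:
    """Stand-in for a real user: always picks the first candidate."""
    return question["candidates"][0]


with tempfile.TemporaryDirectory() as tmp:
    combined = run_cycle(datasets, pick_first, Path(tmp))

print(combined)  # {'population|geo:City': 'geo:cityPopulation'}
```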
EVALUATION
  Initialisation
  System performance
  Precision, recall, F-measure using MusicBrainz and DBpedia datasets
  Mean Reciprocal Rank to assess the effect of learning mechanism
  Analysis of failures
INITIALISATION AND THE DATASETS SIZE
                        MusicBrainz      DBpedia
  explicit statements
  statements
  entities
  SPARQL queries
  initialisation time   1380s (0.38h)    182779s (50.77h)
RESULTS: F-MEASURE STATISTICS
  Columns: MusicBrainz, DBpedia (Training, Testing)
  Precision: 0.75/ / / /0.63
  Recall: 0.66/ / / /0.54
  F-measure: 0.70/ / / /0.58
  not supported questions
  reformulated questions
  avg #dialogs per question
  partially correct questions
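The precision, recall, and F-measure values that survive on this slide are consistent with the balanced F-measure (the harmonic mean of precision and recall). The sketch below assumes that definition; the slide itself does not state which variant was used, and the function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First and last value pairs shown on the slide.
print(f"{f_measure(0.75, 0.66):.2f}")  # 0.70
print(f"{f_measure(0.63, 0.54):.2f}")  # 0.58
```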
LEARNING
  MRR (untrained/trained), MusicBrainz and DBpedia: 0.63/ /
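Mean Reciprocal Rank averages 1/rank of the correct item over all cases, counting a case with no correct item as 0. A minimal sketch of the standard definition follows; the ranks in the example are invented for illustration and are not taken from the experiments.

```python
from typing import List, Optional


def mean_reciprocal_rank(ranks: List[Optional[int]]) -> float:
    """ranks[i] is the 1-based rank of the correct item, or None if absent."""
    if not ranks:
        return 0.0
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)


# Illustrative ranks only: (1 + 1/2 + 0 + 1/4) / 4
print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```

A higher MRR after training means the correct suggestion tends to appear nearer the top of the list.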
CONCLUSION
  Output: correct answer OR identifying the flaws in the data?
  Ranking/disambiguation algorithms to improve MRR
  Closed-domain | Open-domain
  End-users | Application developer
THANK YOU FOR YOUR ATTENTION! QUESTIONS?
  Thanks to Ivan Peikov from Ontotext, who helped with the configuration of OWLIM, which was necessary for performing the experiments reported in this paper.
DEMO
  United States geography: ml
  MusicBrainz: tml
  DBpedia: ya.html