N2R-Part: Identity Link Discovery using Partially Aligned Ontologies by Nathalie Pernelle (1), Fatiha Saïs (1), Brigitte Safar (1), Maria Koutraki (2) and Tushar Ghosh (1,3)

Presentation transcript:

1 N2R-Part: Identity Link Discovery using Partially Aligned Ontologies, by Nathalie Pernelle (1), Fatiha Saïs (1), Brigitte Safar (1), Maria Koutraki (2) and Tushar Ghosh (1,3). Affiliations: (1) LRI, Paris-Sud University & CNRS; (2) ETIS, Cergy-Pontoise; (3) INRIA Saclay Île-de-France. WOD

2 Context: Discover identity links (same restaurant, same laboratory, …) between data items in RDF data sources structured by distinct OWL ontologies. Existing data linking tools [Silk, LDIF] exploit the mapped entities (classes, properties) of the ontologies to define linking rules, e.g. A = {HumanBeing ≡ Person, foaf:name ≡ name, …}. Some of these mappings can be declared or discovered by (semi-)automatic alignment tools [Shvaiko et al. 2012]. But the set of mappings can be incomplete, in particular the set of property mappings.

3 Two simple ontologies, O1 and O2. [Figure: the classes and properties of O1 and O2; the legend distinguishes unmapped properties, mapped properties and subsumption.] O1: classes Restaurant, Person, Chief and Address; properties hasOwner, hasChief, hasLocation, title, cuisineType, phone, creditCard, smoking, name, street and city. O2: classes Restaurant, Address and Person; properties location, rname, food, phonenum, acceptedCard, name, own, hasCook, street and city. Datatype properties have range String. Class mappings (complete set): {Restaurant ≡ Restaurant, …}. Property mappings: {street ≡ street, rname ≡ title, city ≡ city, hasLocation ≡ location}.
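
The partial alignment on this slide can be written down as plain data. The Python sketch below is illustrative only: it keeps a small subset of the restaurant properties and assumes, from the figure layout, that title, cuisineType, creditCard and hasLocation belong to O1 while rname, food, acceptedCard and location belong to O2. The helper `unmapped_properties` is hypothetical; it returns the properties that no mapping covers, which are the ones N2R-Part sets out to exploit.

```python
from typing import Set, Tuple

def unmapped_properties(props_o1: Set[str], props_o2: Set[str],
                        mappings: Set[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Return the properties of O1 and of O2 that appear in no mapping."""
    mapped_o1 = {p1 for p1, _ in mappings}
    mapped_o2 = {p2 for _, p2 in mappings}
    return props_o1 - mapped_o1, props_o2 - mapped_o2

# Assumed subset of the example ontologies (ownership of each property inferred).
O1 = {"title", "cuisineType", "creditCard", "hasLocation"}
O2 = {"rname", "food", "acceptedCard", "location"}
# Mappings from the slide, written as (O1 property, O2 property) pairs.
A = {("title", "rname"), ("hasLocation", "location")}

un1, un2 = unmapped_properties(O1, O2, A)
print(sorted(un1), sorted(un2))  # ['creditCard', 'cuisineType'] ['acceptedCard', 'food']
```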

4 Two restaurants to compare: r1 (in O1) and r2 (in O2), both named "Lotus bleu". [Figure: r1 has cuisineType {thai, asian}, creditCard {Visa card}, smoking "Only at bar", a phone, hasLocation a1 and hasOwner p1; r2 has food {thai, asian, chinese}, acceptedCard {Visa card, Master card}, a phonenum, location a2, and is linked to its owner p2 by own.]

5 Aim: The values of the mapped properties can be very heterogeneous, or even unknown for some instances. For example, the street values "downing St, London, SW1A 2AA" and "10 Downing street". How can recall be improved in such a context?
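
To see why mapped values alone can miss a link, the sketch below scores the two street values with a token-based Jaccard similarity. This measure is an assumption chosen for illustration; the slides do not say which elementary measures are used. Only one token out of seven is shared, so a rule relying on this mapped property alone would likely fail to link the two addresses.

```python
import re

def tokens(value: str) -> set:
    """Lower-case the value and split it on any non-alphanumeric character."""
    return {t for t in re.split(r"[^0-9a-z]+", value.lower()) if t}

def jaccard(a: str, b: str) -> float:
    """Hypothetical elementary measure: shared tokens / all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

s = jaccard("downing St, London, SW1A 2AA", "10 Downing street")
print(round(s, 3))  # 0.143
```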

6 Main ideas: Exploit unmapped properties to increase the similarity scores. Exploit the ontology semantics and the property values to select the best comparable properties for two compared class instances. Combine the similarities of mapped properties and of selected unmapped properties. Propagate the similarities with a graph-based data linking approach (same Restaurant, same Address, same City, same Country). Focus on data sources that can be replicated locally. Extend an existing graph-based data linking tool (N2R [Sais et al 09]).

7 N2R Linking Tool: A knowledge-based approach (i.e. it relies on keys). The common mapped keys of O1/O2 are built by a cartesian product: with the key name in O1 and the key (birthDate, deathDate) in O2, the combined key is name+birthDate+deathDate. A non-linear equation system: each equation expresses how a similarity score $x_i$ is computed from related similarity scores, $x_i = f_i(X) = \max\big(f_{i\text{-}df}(X),\ f_{i\text{-}ndf}(X)\big)$. The system is solved with an iterative method.
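
The two ingredients of this slide can be sketched in Python: the cartesian product that combines the keys of O1 and O2, and one equation of the form fi(X) = max(fi-df(X), fi-ndf(X)). The slides do not define fi-df and fi-ndf, so the sketch assumes that a composite key scores the minimum similarity of its properties and that fi-ndf is a weighted average over non-key properties. The property `nationality` and all scores are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set

def combined_keys(keys_o1: Iterable[Set[str]],
                  keys_o2: Iterable[Set[str]]) -> List[FrozenSet[str]]:
    """Join every O1 key with every O2 key (cartesian product). Mapped
    properties are assumed to share one name, as in the slide's example."""
    return [frozenset(k1 | k2) for k1, k2 in product(keys_o1, keys_o2)]

def f_i(sims: Dict[str, float], keys: List[FrozenSet[str]],
        weights: Dict[str, float]) -> float:
    """x_i = max(f_df, f_ndf) under the assumptions stated above."""
    # f_df: best key; a composite key only counts as much as its weakest property.
    f_df = max((min(sims.get(p, 0.0) for p in key) for key in keys), default=0.0)
    # f_ndf: weighted average over the weighted properties outside every key.
    key_props = set().union(*keys)
    others = {p: w for p, w in weights.items() if p not in key_props}
    total = sum(others.values())
    f_ndf = sum(w * sims.get(p, 0.0) for p, w in others.items()) / total if total else 0.0
    return max(f_df, f_ndf)

keys = combined_keys([{"name"}], [{"birthDate", "deathDate"}])
print(sorted(keys[0]))  # ['birthDate', 'deathDate', 'name']

sims = {"name": 1.0, "birthDate": 0.9, "deathDate": 0.8, "nationality": 0.5}
print(f_i(sims, keys, {"nationality": 1.0}))  # 0.8
```

Taking the minimum over a composite key means a key only identifies a pair when all of its properties agree; a softer aggregation would be an equally valid assumption.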

8 Impacts and propagation. [Figure: similarity graph linking the pairs (r1, r2), (p1, p2) and (a1, a2).] (r1, r2): mapped key property, {Le lotus bleu} vs {le lotus bleu}; best comparable datatype properties, {thai, asian} vs {thai, asian, chinese}, {…} vs {…, 33888…}, and {Visacard} vs {Mastercard, Visacard}; best comparable object properties lead to (p1, p2) and (a1, a2). (p1, p2): mapped, {Chang Lee} vs {Chang lee}. (a1, a2): mapped, {17 rue Polar} vs {rue Polar} and {Paris} vs {Paris}.
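
The propagation shown here can be illustrated with a small Jacobi-style fixed-point iteration over the three pairs (r1, r2), (p1, p2) and (a1, a2). This is a sketch under explicit assumptions, not N2R itself: the elementary measure is a hypothetical token Jaccard, the restaurant name is treated as a key, and hasLocation is assumed functional so that the similarity of (r1, r2) flows to (a1, a2). The literal values come from the slide; the phone values are left out because they are incomplete here.

```python
import re
from typing import Callable, Dict, Tuple

def tok_sim(a: str, b: str) -> float:
    """Hypothetical elementary measure: Jaccard overlap of lower-cased tokens."""
    ta = set(re.findall(r"[0-9a-z]+", a.lower()))
    tb = set(re.findall(r"[0-9a-z]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

# Elementary similarities of the mapped literal values shown on the slide.
name_r = tok_sim("Le lotus bleu", "le lotus bleu")  # restaurant name, used as a key
name_p = tok_sim("Chang Lee", "Chang lee")          # owner name
street = tok_sim("17 rue Polar", "rue Polar")       # 2 shared tokens out of 3
city = tok_sim("Paris", "Paris")

Scores = Dict[str, float]
R, P, A = "r1,r2", "p1,p2", "a1,a2"

# One equation per pair; each reads the score vector of the previous iteration.
EQUATIONS: Dict[str, Callable[[Scores], float]] = {
    # key on the name, otherwise the average of the neighbouring pairs (assumption)
    R: lambda x: max(name_r, (x[P] + x[A]) / 2),
    P: lambda x: name_p,
    # hasLocation assumed functional: same restaurant implies same address
    A: lambda x: max(x[R], (street + city) / 2),
}

def solve(eqs: Dict[str, Callable[[Scores], float]], eps: float = 1e-9,
          max_iter: int = 50) -> Tuple[Scores, int]:
    """Start from all-zero scores and recompute every equation until no
    score changes by more than eps; return the scores and the iteration count."""
    x = {k: 0.0 for k in eqs}
    for it in range(1, max_iter + 1):
        new = {k: f(x) for k, f in eqs.items()}
        if max(abs(new[k] - x[k]) for k in eqs) <= eps:
            return new, it
        x = new
    return x, max_iter

scores, iterations = solve(EQUATIONS)
print(scores, iterations)  # {'r1,r2': 1.0, 'p1,p2': 1.0, 'a1,a2': 1.0} 3
```

Without propagation, (a1, a2) would only reach (2/3 + 1)/2 ≈ 0.83 from its own street and city values; the key match on the restaurants lifts it to 1.0 by the second iteration.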

9 Comparable properties: Exploit the ontology to select comparable properties. Comparable object properties: there exist a compatible (more specific or equivalent) domain and a compatible range; inverse properties are also considered. For example, own (domain Person, range Restaurant) is comparable to Inverse(hasOwner), where hasOwner has domain Restaurant and range Person, and to Inverse(hasChief), where hasChief has domain Restaurant and range Chief. Comparable datatype properties: compatible w.r.t. the XML Schema datatypes; e.g. cuisineType is comparable to food, acceptedCard, … (domain Restaurant, range string).
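
The comparability test can be sketched with (domain, range) signatures and a tiny class hierarchy. Assumptions: Chief is taken to be a subclass of Person (the subsumption link in the figure), compatibility is read as equal-or-subsumed in either direction, and datatype comparability is reduced to equality of XSD datatypes. All function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumed hierarchy (child -> parent): the figure's subsumption read as Chief subClassOf Person.
PARENT: Dict[str, Optional[str]] = {"Chief": "Person", "Person": None,
                                    "Restaurant": None, "Address": None}

Signature = Tuple[str, str]  # (domain, range) of a property

def is_subclass(c: Optional[str], d: str) -> bool:
    """True if c equals d or is a (transitive) subclass of d."""
    while c is not None:
        if c == d:
            return True
        c = PARENT.get(c)
    return False

def compatible(c: str, d: str) -> bool:
    """Equivalent or more specific, read here in either direction (assumption)."""
    return is_subclass(c, d) or is_subclass(d, c)

def inverse(sig: Signature) -> Signature:
    """Inverse property: domain and range are swapped."""
    return (sig[1], sig[0])

def comparable(p: Signature, q: Signature) -> bool:
    """Object properties: compatible domains and compatible ranges."""
    return compatible(p[0], q[0]) and compatible(p[1], q[1])

def comparable_datatype(p: Signature, q: Signature) -> bool:
    """Datatype properties: compatible domains and the same XSD datatype
    (a simplification that ignores derived XSD types)."""
    return compatible(p[0], q[0]) and p[1] == q[1]

own = ("Person", "Restaurant")
has_owner = ("Restaurant", "Person")
has_chief = ("Restaurant", "Chief")
cuisine_type = ("Restaurant", "xsd:string")
food = ("Restaurant", "xsd:string")

print(comparable(own, inverse(has_owner)))      # True
print(comparable(own, inverse(has_chief)))      # True, since Chief is assumed to be a Person
print(comparable(own, has_owner))               # False: domains Person and Restaurant clash
print(comparable_datatype(cuisine_type, food))  # True
```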

10 Similarity of best comparable properties: Exploit the property values to select the best comparable properties for two compared class instances. For two datatype property values, use elementary similarity measures, e.g. sim("asian", "asian") = 1. Sum the similarities above a given threshold and record (i1, i2, prop1, prop2, sum, maxNumberOfPropertyInstances), e.g. (r1, r2, cuisineType, food, 2, 3). Finally, the similarity of (r1, r2) based on unmapped datatype properties is $\mathrm{sim}_{NAP}(r_1, r_2) = \frac{1+2+1}{2+3+2} = \frac{4}{7} \approx 0.57$. The same process applies to object property values, but with propagation.
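
The scoring steps can be sketched as follows. The elementary measure (equality ignoring case and spaces), the threshold 0.5, the best-match-per-value reading of "sum above a threshold" and the sum/max ratio used to pick the best candidate are all assumptions; the last line only reproduces the slide's arithmetic from its sums and maxima.

```python
from typing import Callable, Dict, List, Set, Tuple

def norm_eq(a: str, b: str) -> float:
    """Hypothetical elementary measure: 1.0 if equal ignoring case and spaces."""
    return 1.0 if a.lower().replace(" ", "") == b.lower().replace(" ", "") else 0.0

def pair_score(v1: Set[str], v2: Set[str],
               sim: Callable[[str, str], float] = norm_eq,
               threshold: float = 0.5) -> Tuple[float, int]:
    """(sum, maxNumberOfPropertyInstances) for one pair of comparable properties.
    Each value of v1 contributes its best similarity in v2 if above threshold."""
    total = 0.0
    for a in v1:
        best = max((sim(a, b) for b in v2), default=0.0)
        if best > threshold:
            total += best
    return total, max(len(v1), len(v2))

def best_candidate(v1: Set[str], candidates: Dict[str, Set[str]]) -> str:
    """Keep the comparable property whose values give the highest sum/max ratio."""
    def ratio(name: str) -> float:
        s, m = pair_score(v1, candidates[name])
        return s / m if m else 0.0
    return max(candidates, key=ratio)

def sim_nap(scores: List[Tuple[float, int]]) -> float:
    """Sum of the sums divided by the sum of the maxima, over all selected pairs."""
    denom = sum(m for _, m in scores)
    return sum(s for s, _ in scores) / denom if denom else 0.0

r1_cuisine = {"thai", "asian"}
r2_props = {"food": {"thai", "asian", "chinese"},
            "acceptedCard": {"Visa card", "Master card"}}
print(best_candidate(r1_cuisine, r2_props))                 # food
print(pair_score(r1_cuisine, r2_props["food"]))             # (2.0, 3)
print(pair_score({"Visa card"}, r2_props["acceptedCard"]))  # (1.0, 2)
print(round(sim_nap([(1, 2), (2, 3), (1, 2)]), 2))          # 0.57
```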

11 Extension of N2R: Keep the importance of keys in the equation, and give more importance to the mapped properties than to the unmapped ones: $f_i(X) = \max\big(f_{i\text{-}df}(X),\ f_{i\text{-}map}(X) + \alpha\, f_{i\text{-}unmap}(X)\big)$.
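
A direct transcription of the extended equation, with hypothetical scores, shows how the unmapped part can raise a score without overriding a matching key. The slide does not give a value for α or say how the sum is normalised; α = 0.5 below is an arbitrary choice under 1, so that unmapped evidence weighs less than mapped evidence.

```python
def f_ext(f_df: float, f_map: float, f_unmap: float, alpha: float) -> float:
    """f_i(X) = max(f_i-df(X), f_i-map(X) + alpha * f_i-unmap(X)), as on the slide.
    No normalisation is applied, since the slide does not specify one."""
    return max(f_df, f_map + alpha * f_unmap)

# Hypothetical scores: the key values disagree, the mapped properties are
# only partly similar, and the unmapped ones agree fairly well.
f_df, f_map, f_unmap = 0.3, 0.5, 0.6
print(f_ext(f_df, f_map, f_unmap, alpha=0.0))            # 0.5, mapped properties only
print(round(f_ext(f_df, f_map, f_unmap, alpha=0.5), 3))  # 0.8
print(f_ext(1.0, f_map, f_unmap, alpha=0.5))             # 1.0, a matching key still wins
```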

12 Conclusions and Future Work. Conclusions: Extension of a graph-based data linking tool to take unmapped properties into account. Future work: Evaluate this strategy on real data sets. Focus on declared (or learned) unmapped keys and unmapped discriminative properties [symeonidou11, atencia12], e.g. select phone but not creditCard. Discover new mappings between properties using the discovered links.

13 Thank you for your attention! Questions?

