Afraz Jaffri, Hugh Glaser, Ian Millard Electronics and Computer Science University of Southampton.


1 Afraz Jaffri, Hugh Glaser, Ian Millard Electronics and Computer Science University of Southampton

2 SSWS07 - Vilamoura, Portugal
URI Identity Management for Semantic Web Data Integration and Linkage
1. Linked Data
2. URI Multiplicity
3. The Problem of Coreference
4. URI Identity Management Approaches
5. The Problem with owl:sameAs
6. The Consistent Reference Service (CRS)
7. CRS Architecture
8. A CRS Application: The RKB Explorer
9. Summary and Future Work

3 DBpedia has URIs for approximately 2 million entities
Linked datasets contain many overlapping entities
A single entity can have a number of URIs
Entities are linked using owl:sameAs

4 http://www.rkbexplorer.com
Contains URIs for more than 10 million entities
Data relating to people, projects, papers and institutions
A single entity has a number of URIs (even within the same repository)
Entities are linked using CRSs

5 URIs for Spain:
http://dbpedia.org/resource/Spain
http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
http://sws.geonames.org/2510769
http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a
URIs for Hugh Glaser:
http://acm.rkbexplorer.com/rdf/resource-P112732
http://citeseer.rkbexplorer.com/rdf/resource-CSP109020
http://citeseer.rkbexplorer.com/rdf/resource-CSP109013
http://citeseer.rkbexplorer.com/rdf/resource-CSP109011
http://citeseer.rkbexplorer.com/rdf/resource-CSP109002
http://dblp.rkbexplorer.com/rdf/resource-27de9959
http://europa.eu/People/#person-0ff816fa
http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser
http://www.ecs.soton.ac.uk/info/#person-00021

6 Tom Anderson – http://www4.wiwiss.fu-berlin.de/dblp/resource/person/109074
[Figure: this one URI is shown as dc:creator of many publications and dblp:editor of another, labelled with the following affiliations]
Vice President, O-in Design Automation Inc., USA
Professor, University of Newcastle
Professor, Heriot Watt University
University of Washington
University of California, Berkeley
Tom Andersen - University of Denmark
Lucent Technologies, Illinois

7 The problem of coreference has existed for many years
Physical libraries disambiguate authors through date of birth
Digital libraries still have the problem of author disambiguation
Problems are caused by variations in naming schemes, e.g.
Glaser, H.
H. Glaser
Glaser, Hugh
H. Glazer

8 In the database world the coreference problem is referred to as Record Linkage
Matching entities between records is similar to matching entities between datasets
Database linkage is easier because of the imposed schema
A formal theory of Record Linkage was proposed by Fellegi & Sunter (1969)
It uses coded agreements between each field (property) to give the probability of record (instance) equivalence
It can be adapted for use on the Semantic Web
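
As a rough illustration of the Fellegi & Sunter idea mentioned above, the sketch below scores a pair of records by summing per-field log-likelihood ratios. It is a minimal sketch, not the method used by the authors: the field names, the m/u probabilities and the two decision thresholds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the slides):
#   m = P(field agrees | the two records describe the same entity)
#   u = P(field agrees | the two records describe different entities)
FIELD_PARAMS = {
    "family_name": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "affiliation": (0.80, 0.10),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios over the fields.

    An agreeing field adds log2(m / u); a disagreeing field adds
    log2((1 - m) / (1 - u)), which is negative when m > u.
    """
    total = 0.0
    for field_name, (m, u) in params.items():
        if record_a.get(field_name) == record_b.get(field_name):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=5.0, lower=0.0):
    """Three-way decision; the thresholds are assumptions for this sketch."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


a = {"family_name": "Glaser", "given_name": "Hugh", "affiliation": "Southampton"}
b = {"family_name": "Glaser", "given_name": "Hugh", "affiliation": "Southampton"}
c = {"family_name": "Glaser", "given_name": "H.", "affiliation": "Imperial"}

print(classify(match_weight(a, b)))
print(classify(match_weight(a, c)))
```

On the Semantic Web the "fields" would be RDF properties of two instances, and the middle band ("possible match") is where a human or a further algorithm would have to decide.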

9 Coreference on the Semantic Web is defined as the situation where two or more URIs are used for a single non-information resource
URI usage can change with context
Non-information resources are hard to define precisely
Examples:
Hugh Glaser at Southampton vs. Hugh Glaser at Imperial
Harry Potter and the Order of the Phoenix in hardback vs. softback (ISBN: 978-0747561071, 978-0747551003)

10 Use a centralised naming authority to issue URIs for every entity in the world
Let everyone create their own URIs and link them to official URIs (using owl:sameAs)
Let everyone create their own URIs and register them at a centralised repository
Let everyone create their own URIs and let them be managed by many decentralised repositories
In all of the above, encourage reuse and linking as far as possible

11 owl:sameAs was designed for a specific purpose
Resources linked with owl:sameAs have the same identity, i.e. the subject and object are exactly the same resource
owl:sameAs has been misused for Linking Open Data
Linking can occur between two very different resources, e.g. Tom Anderson
Reasoning with LOD will have unintended consequences

12 Hugh Glaser, hg@ecs.soton.ac.uk, Reader
Hugh Glaser, hg1@soton.ac.uk, Lecturer
Assert
SELECT ?x WHERE { vcard:EMAIL ?x }
Returns
hg1@soton.ac.uk
hg@ecs.soton.ac.uk
Which email belongs to which role?
Using owl:sameAs means that both URIs become indistinguishable, even though they may refer to different entities according to the context in which they are used.
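
The effect described on this slide can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the two URIs are hypothetical placeholders (the originals are missing from the transcript), the pairing of each email with a role follows the order in which they appear on the slide, and the tiny in-memory triple list stands in for a real RDF store with OWL reasoning.

```python
from collections import defaultdict

VCARD_EMAIL = "vcard:EMAIL"
OWL_SAMEAS = "owl:sameAs"

# Hypothetical URIs for the two descriptions of Hugh Glaser.
READER = "http://example.org/id/hugh-glaser-reader"
LECTURER = "http://example.org/id/hugh-glaser-lecturer"

triples = [
    (READER, VCARD_EMAIL, "hg@ecs.soton.ac.uk"),
    (LECTURER, VCARD_EMAIL, "hg1@soton.ac.uk"),
    (READER, OWL_SAMEAS, LECTURER),
]


def same_as_closure(triples):
    """Map every URI in an owl:sameAs statement to its full identity set.

    A small union-find makes the relation symmetric and transitive,
    which is what owl:sameAs semantics require.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAMEAS:
            parent[find(s)] = find(o)

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return {uri: groups[find(uri)] for uri in list(parent)}


def emails(subject, triples, reason=False):
    """Return the emails of `subject`, optionally after sameAs reasoning."""
    aliases = {subject}
    if reason:
        aliases = same_as_closure(triples).get(subject, {subject})
    return sorted(o for s, p, o in triples if p == VCARD_EMAIL and s in aliases)


print(emails(READER, triples))               # one email, tied to one role
print(emails(READER, triples, reason=True))  # both emails, roles no longer separable
```

Without reasoning each URI keeps its own email; once owl:sameAs is applied, the query returns both addresses for either URI and the link between an address and a role is lost.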

13 Data (knowledge) providers publish data (knowledge)
Resources from one provider cannot be guaranteed to be the same as resources from another provider
Knowledge will be published and made dereferenceable at the domain that the publisher has control over
URIs will be constructed from the domain name of the publisher's site
An intermediate service groups URIs of resources that may be the same
This knowledge is made available upon dereferencing the URI of a resource

14 Can be seen as a conventional Knowledge Base
Contains knowledge about the URIs in a repository
URIs referring to the same resource are grouped together in Bundles
A Bundle has properties:
coref:hasEquivalentReference: the URIs in a bundle are grouped together using this predicate
coref:hasCanonicalReference: one URI in a bundle can be made the canonical representation, i.e. the preferred URI
coref:updatedOn: the date of the last update to the bundle

15 @prefix coref:.
@prefix rdf:.
a coref:Bundle ;
coref:hasCanonicalReference ;
coref:hasEquivalentReference,,,.
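
The URIs in the Turtle listing on this slide did not survive in the transcript, so the sketch below shows what a bundle with the three properties from the previous slide might look like when built and serialised in Python. The `Bundle` class, the bundle URI and the `coref:` namespace URI are assumptions made for illustration; the member URIs are taken from the "URIs for Hugh Glaser" slide. Treating the canonical URI as a member of the bundle is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Assumed namespace for the coref: prefix (not given in the transcript).
COREF_NS = "http://www.rkbexplorer.com/ontologies/coref#"


@dataclass
class Bundle:
    """A group of URIs believed to refer to the same resource."""
    uri: str                       # URI of the bundle itself (hypothetical)
    canonical: str                 # preferred URI
    equivalents: List[str] = field(default_factory=list)
    updated_on: Optional[date] = None

    def __post_init__(self):
        # Sketch assumption: the canonical URI is always part of the bundle.
        if self.canonical not in self.equivalents:
            self.equivalents.insert(0, self.canonical)

    def to_turtle(self) -> str:
        refs = ",\n        ".join(f"<{u}>" for u in self.equivalents)
        lines = [
            f"@prefix coref: <{COREF_NS}> .",
            "",
            f"<{self.uri}> a coref:Bundle ;",
            f"    coref:hasCanonicalReference <{self.canonical}> ;",
        ]
        if self.updated_on is not None:
            lines.append(f'    coref:updatedOn "{self.updated_on.isoformat()}" ;')
        lines.append(f"    coref:hasEquivalentReference {refs} .")
        return "\n".join(lines)


bundle = Bundle(
    uri="http://example.org/crs/bundle-1",  # hypothetical
    canonical="http://www.ecs.soton.ac.uk/info/#person-00021",
    equivalents=[
        "http://acm.rkbexplorer.com/rdf/resource-P112732",
        "http://dblp.rkbexplorer.com/rdf/resource-27de9959",
    ],
)
print(bundle.to_turtle())
```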

16 [Architecture diagram]
Components: Application, KB, CRS
Actions: RESOLVE, RETRIEVE RDF
Non-Information Resource: http://southampton.rkbexplorer.com/id/person-00021
Information Resources: http://southampton.rkbexplorer.com/data/person-00021, http://southampton.rkbexplorer.com/description/person-00021
Formats: Text/Html, RDF/XML
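
One plausible reading of the diagram is ordinary Linked Data content negotiation: the `/id/` URI names the thing itself, and a request for it is redirected to a document about it. The sketch below assumes that `/data/` serves RDF/XML and `/description/` serves HTML; the transcript lists the URIs and formats but does not pair them, so that mapping, like the function name `negotiate`, is an assumption.

```python
def negotiate(id_uri: str, accept: str) -> str:
    """Return the information-resource URI a redirect might point to.

    Assumed mapping (not confirmed by the slides):
      Accept contains application/rdf+xml -> /data/...
      anything else                       -> /description/...
    """
    if "/id/" not in id_uri:
        raise ValueError("expected a non-information resource URI containing /id/")
    wants_rdf = "application/rdf+xml" in accept
    target = "/data/" if wants_rdf else "/description/"
    # Replace only the first /id/ segment so the local name is left untouched.
    return id_uri.replace("/id/", target, 1)


person = "http://southampton.rkbexplorer.com/id/person-00021"
print(negotiate(person, "application/rdf+xml"))
print(negotiate(person, "text/html"))
```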

17 Finding all equivalences (bundles) is up to the application
This is a separate activity from coreferencing a single data source
Services such as Sindice can perform this function for free
To perform the equivalence closure, just follow the crs:hasCRS links
Scalability is ensured by not including all possible bundles in every CRS
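
The equivalence closure left to the application can be sketched as a breadth-first walk over several CRSs. This is a minimal sketch: each CRS is modelled as an in-memory lookup function rather than a service reached through a crs:hasCRS link, and the helper names and example URIs are hypothetical.

```python
from collections import deque


def make_crs(bundles):
    """Build a lookup function for one CRS from its bundles (iterables of URIs)."""
    index = {}
    for bundle in bundles:
        members = frozenset(bundle)
        for uri in members:
            index[uri] = members
    return lambda uri: index.get(uri, frozenset())


def equivalence_closure(start_uri, crs_lookups):
    """Collect every URI reachable from start_uri through any CRS bundle."""
    seen = {start_uri}
    queue = deque([start_uri])
    while queue:
        uri = queue.popleft()
        for lookup in crs_lookups:
            for other in lookup(uri):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


# Hypothetical URIs: two CRSs that each know only part of the picture.
acm_crs = make_crs([{"http://example.org/acm/p1", "http://example.org/dblp/p1"}])
dblp_crs = make_crs([
    {"http://example.org/dblp/p1", "http://example.org/citeseer/p1"},
    {"http://example.org/dblp/p2", "http://example.org/citeseer/p2"},
])

closure = equivalence_closure("http://example.org/acm/p1", [acm_crs, dblp_crs])
print(sorted(closure))
```

No single CRS holds the whole bundle here; the application assembles it by asking each one in turn, which is the trade-off the slide describes for scalability.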

18 The Resilience Knowledge Base Explorer displays communities of practice for people, projects and publications from the RKB
Uses multiple CRSs to disambiguate people and publications
One CRS per knowledge base ensures scalability
Multiple SPARQL queries
Look yourself up! www.rkbexplorer.com/explorer

19 Equivalence mining is a difficult task that requires multiple algorithms
Adding policies to determine the trust level of a CRS
Establishing the authority of a CRS over a KB
Establishing performance metrics
Collaborating with the LOD community for wide-scale deployment
Formalising the linking methodology

20 Coreference exists in many disciplines and will exist on the Semantic Web
The equivalence of non-information resources depends on context
The semantics of owl:sameAs do not fit with its current usage in Linked Data
The CRS is a solution that is being deployed on a large knowledge-based infrastructure
It's my knowledge, so let me name it!

21 Questions?

