KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association Institute AIFB www.kit.edu Linked Data and Services.


1 Linked Data and Services
Andreas Harth and Barry Norton
KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association, Institute AIFB, www.kit.edu

2 Outline
- Motivation
- Linked Data Principles
- Query Processing over Linked Data
- Linked Data Services (LIDS) and Linked Open Services (LOS)
- Conclusion

3 Motivation
Semantic Web/Linked Data technologies are well-suited for data integration.
Diagram labels: Data Integration, Interactive Data Exploration, Common Data Format/Access Protocol
(Taking the LIDS off Data Silos, Andreas Harth, 28.08.2015)

4 Linked Data Principles*
1. Use URIs to name things; not only documents, but also people, locations, concepts, etc.
2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs.
3. When someone looks up a URI, we provide useful information; by 'useful' in the strict sense we usually mean structured data in RDF.
4. Include links to other URIs, allowing agents (machines and humans) to discover more things.
(*) http://www.w3.org/DesignIssues/LinkedData.html

5 Correspondence between thing-URI and source-URI
Thing-URI: http://www.polleres.net/foaf.rdf#me
Source-URI: http://www.polleres.net/foaf.rdf
The user agent sends an HTTP GET for the source-URI to the web server, which returns RDF.

6 Correspondence between thing-URI and source-URI
Thing-URI: http://dbpedia.org/resource/Gordon_Brown
Source-URI: http://dbpedia.org/data/Gordon_Brown (RDF); the HTML page is http://dbpedia.org/page/Gordon_Brown
The user agent sends an HTTP GET for the thing-URI, the web server answers with a 303 redirect, and a second HTTP GET on the redirect target returns RDF.
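The hash-URI case from slide 5 can be sketched in a few lines of Python. This is only an illustration: the function names `source_uri_for` and `rdf_request` are hypothetical, and the request is built but never sent. For slash URIs such as the DBpedia example, the source-URI is only known after the 303 redirect, which `urllib.request.urlopen` follows by default.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def source_uri_for(thing_uri: str) -> str:
    """Return the document URI to dereference for a (hash) thing-URI."""
    # urldefrag splits off the fragment; URIs without a fragment are unchanged.
    source, _fragment = urldefrag(thing_uri)
    return source


def rdf_request(thing_uri: str) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF/XML."""
    return Request(source_uri_for(thing_uri),
                   headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://www.polleres.net/foaf.rdf#me")
print(req.full_url)  # the source-URI that would be fetched
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the sketch runs without network access.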


8 Queries over Linked Data
Query 1 (result columns ?f, ?n):
  SELECT ?f ?n WHERE {
    an:f#ah foaf:knows ?f.
    ?f foaf:name ?n.
  }
Query 2:
  SELECT ?x1 ?x2 WHERE {
    dblppub:HoganHP08 dc:creator ?a1.
    ?x1 owl:sameAs ?a1.
    ?x2 foaf:knows ?x1.
  }

9 Querying Data Across Sources
- Data warehousing or materialisation-based approaches (MAT). Diagram labels: CRAWL, INDEX, SERVE, SELECT * FROM…, RS
- Distributed query processing approaches (DQP). Diagram label: RS
(Andreas Harth, Data Summaries for On-Demand Queries over Linked Data, 15.03.2010)

10 DQP on Linked Data
Diagram labels: SELECT * FROM…, RS, SELECT ?s WHERE…, TP, HTTP GET, ODBC

11 Query Processing Overview
Query: SELECT ?f ?n WHERE { an:f#ah foaf:knows ?f. ?f foaf:name ?n. }
Triple patterns: TP (an:f#ah foaf:knows ?f), TP (?f foaf:name ?n)
For each triple pattern: select source(s), then HTTP GET to retrieve RDF
Result: ?f = http://danbri.org/foaf.rdf#danbri, ?n = Dan Brickley
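As a rough illustration of how the two triple patterns are joined once the RDF has been retrieved, here is a minimal Python sketch. It assumes the fetched data is already available as in-memory `(subject, predicate, object)` tuples; the helper names `match` and `evaluate` and the `ex:` triple are hypothetical, and no real SPARQL engine is involved.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):               # variable position
            if result.get(p_term, t_term) != t_term:
                return None                      # conflicts with earlier binding
            result[p_term] = t_term
        elif p_term != t_term:                   # constant position must be equal
            return None
    return result


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Nested-loop join of all triple patterns over the given triples."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                new_binding = match(pattern, triple, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        bindings = extended
    return bindings


# Data as it might look after the HTTP GETs; the last triple is made up.
data = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
    ("ex:someone", "foaf:name", "Someone Else"),
]
query = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
results = evaluate(query, data)
print(results)
```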

12 Barry

13 Problem: Source Selection for Triple Patterns
Given a triple pattern, which source can contribute bindings for the triple pattern?
Triple patterns: (?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)

14 Schema-Level Indices [Stuckenschmidt et al. 2004]
- Keep index of properties and/or classes contained in sources
- Supported patterns: (?s #p ?o), (?s rdf:type #o)
- Covers only queries containing schema-level elements
- Commonly used properties select potentially too many sources
Example queries: the two queries from slide 8
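A schema-level index of this kind might be sketched as follows. This is not the implementation from the cited work: the source URLs and `ex:` terms are invented, and falling back to all sources when a pattern has no schema-level element is an assumption made here to show what the index cannot prune.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Index = Dict[Tuple[str, str], Set[str]]


def build_schema_index(sources: Dict[str, List[Triple]]) -> Index:
    """Map each property and each class to the sources that mention it."""
    index: Index = defaultdict(set)
    for source, triples in sources.items():
        for _s, p, o in triples:
            index[("property", p)].add(source)
            if p == "rdf:type":
                index[("class", o)].add(source)
    return index


def select_sources(index: Index, pattern: Triple, all_sources: Iterable[str]) -> Set[str]:
    """Pick candidate sources for one triple pattern."""
    _s, p, o = pattern
    if p == "rdf:type" and not o.startswith("?"):
        return set(index.get(("class", o), set()))      # (?s rdf:type #o)
    if not p.startswith("?"):
        return set(index.get(("property", p), set()))   # (?s #p ?o)
    return set(all_sources)  # assumption: no schema element, nothing to prune


# Hypothetical sources and contents, for illustration only.
SOURCES = {
    "http://example.org/a.rdf": [("ex:a", "foaf:knows", "ex:b"), ("ex:a", "foaf:name", "A")],
    "http://example.org/b.rdf": [("ex:b", "foaf:name", "B"), ("ex:b", "rdf:type", "foaf:Person")],
    "http://example.org/c.rdf": [("ex:pub", "dc:creator", "ex:a")],
}
index = build_schema_index(SOURCES)
print(select_sources(index, ("?f", "foaf:name", "?n"), SOURCES))
```

The `foaf:name` lookup returns two of the three toy sources, which hints at the slide's point that commonly used properties select many sources.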

15 Direct Lookup (DL) [Hartig et al. 2009]
- Exploits correspondence between thing-URI and source-URI
- Linked Data sources (aka RDF files) typically return triples with a subject corresponding to the source: (#s ?p ?o), (#s #p ?o), (#s #p #o)
- Sometimes the sources return triples with an object corresponding to the source: (?s ?p #o), (?s #p #o)
- Incomplete wrt. patterns but also wrt. URI reuse across sources
- Limited parallelism, unclear how to schedule lookups
Example queries: the two queries from slide 8
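A minimal sketch of direct lookup, assuming hash URIs so that the source-URI can be derived without a network round trip (slash URIs would need the 303 redirect instead). The function name `direct_lookup_sources` is hypothetical and this is not code from the cited work.

```python
from typing import List, Tuple
from urllib.parse import urldefrag

Triple = Tuple[str, str, str]


def is_http_uri(term: str) -> bool:
    """True for constant terms that can be dereferenced over HTTP."""
    return term.startswith(("http://", "https://"))


def direct_lookup_sources(pattern: Triple) -> List[str]:
    """Derive source-URIs from the constant subject and object of a pattern."""
    subject, _predicate, obj = pattern
    sources: List[str] = []
    for term in (subject, obj):          # subject first: the common case
        if is_http_uri(term):
            source = urldefrag(term)[0]  # strip the fragment of a hash URI
            if source not in sources:
                sources.append(source)
    return sources


print(direct_lookup_sources(("http://danbri.org/foaf.rdf#danbri", "foaf:name", "?n")))
print(direct_lookup_sources(("?s", "foaf:knows", "?o")))  # nothing to look up
```

The empty result for `(?s foaf:knows ?o)` shows the incompleteness with respect to patterns: without a constant subject or object there is no URI to dereference.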

16 Approximate Data Summaries
- Combined description of schema-level and instance-level
- Use approximation to reduce index size (incurs false positives)
- Possible to use entire query for source selection
- Parallel lookups since sources can be determined for the entire query
- Supported patterns: (?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o) and combinations of triple patterns
Example queries: the two queries from slide 8
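The slide does not say which approximation is used, so the following Python sketch substitutes a simple one as an assumption: every term is hashed into a small number of buckets, and a source is summarised by the set of bucket triples it contains. Collisions cause false positives but never false negatives. All names (`NUM_BUCKETS`, `summarise`, `may_match`, `select_sources`) are hypothetical.

```python
import zlib
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Summary = Set[Tuple[int, int, int]]

NUM_BUCKETS = 64  # smaller values shrink the summary but add false positives


def bucket(term: str) -> int:
    """Deterministically hash a term into one of NUM_BUCKETS buckets."""
    return zlib.crc32(term.encode("utf-8")) % NUM_BUCKETS


def summarise(triples: Iterable[Triple]) -> Summary:
    """Approximate a source by the bucket coordinates of its triples."""
    return {(bucket(s), bucket(p), bucket(o)) for s, p, o in triples}


def may_match(summary: Summary, pattern: Triple) -> bool:
    """True if the source might contain a triple matching the pattern."""
    wanted = [None if term.startswith("?") else bucket(term) for term in pattern]
    return any(
        all(w is None or w == b for w, b in zip(wanted, entry))
        for entry in summary
    )


def select_sources(summaries: Dict[str, Summary], patterns: List[Triple]) -> Set[str]:
    """Sources that may contribute to at least one pattern of the whole query."""
    return {
        source
        for source, summary in summaries.items()
        if any(may_match(summary, pattern) for pattern in patterns)
    }


# Hypothetical source for illustration.
summaries = {
    "http://danbri.org/foaf.rdf": summarise(
        [("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley")]
    ),
}
query = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
print(select_sources(summaries, query))
```

Because the candidate sources are computed for the entire query up front, the lookups could then be issued in parallel, which is the advantage the slide names over direct lookup.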

17 Implementation
Deploy wrappers "in the cloud"
Google App Engine: hosting of Java and Python webapps on Google's Cloud infrastructure
- Limited amount of processing time (6hrs/day)
- Single-threaded applications
- Suited for deploying wrappers
E.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF

18 Linking Open Data Cloud 2007

19 Linking Open Data Cloud 2008

20 Linking Open Data Cloud 2009

21 Linking Open Data Cloud 2010

22 Geonames Services

23 Geonames Services

24 Geonames Services
{"weatherObservation": {"clouds":"broken clouds", "weatherCondition":"drizzle", "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010", "windDirection":30, "ICAO":"LESO",...

25 Geonames Services
{"weatherObservation": {"clouds":"broken clouds", "weatherCondition":"drizzle", "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010", "windDirection":30, "ICAO":"LESO",...

26 Linked Open Service Principles
REST Principles
1. Application state and functionality is divided into resources
2. Every resource is uniquely addressable
3. All resources share a uniform interface:
   a) A constrained set of well-defined operations
   b) A constrained set of content types
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
Linked Open Service Principles
1. Describe services as LOD prosumers with input and output descriptions as SPARQL graph patterns
2. Communicate RDF by RESTful content negotiation
3. The output should make explicit its relation with the input

27 LOS Weather Service

28 LOS Geo Resources

29 Resource-Based Linked Open Services
Linked Data:
  GET, Accept: text/html -> 303 REDIRECT /page
  GET, Accept: application/rdf+xml (or text/n3) -> 303 REDIRECT /data
Linked Service:
  GET /weather, Accept: application/rdf+xml (or text/n3) -> 200
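The two interaction styles can be contrasted in a small routing sketch. It is a toy model rather than a real server: the `/resource/` path prefix is borrowed from the DBpedia example on slide 6, and the 406 and 404 fallbacks are assumptions not shown on the slide.

```python
from typing import Tuple

RDF_TYPES = ("application/rdf+xml", "text/n3")


def respond(path: str, accept: str) -> Tuple[int, str]:
    """Return (status code, path served or redirect target) for a GET."""
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    wants_html = "text/html" in accept

    if path == "/weather":
        # Linked Service: the RDF representation is returned directly.
        return (200, path) if wants_rdf else (406, path)

    if path.startswith("/resource/"):
        # Linked Data: redirect from the thing to a document about it.
        name = path[len("/resource/"):]
        if wants_rdf:
            return (303, "/data/" + name)
        if wants_html:
            return (303, "/page/" + name)
        return (406, path)  # assumption: no acceptable representation

    return (404, path)      # assumption: unknown resource


print(respond("/resource/Gordon_Brown", "text/html"))
print(respond("/resource/Gordon_Brown", "application/rdf+xml"))
print(respond("/weather", "text/n3"))
```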

30 Interlinking Data with Data from Services?

31 Data Services
- Given input, provide output
- Input and output are related in a service-specific way
- Do not change the state of the world
E.g. GeoNames findNearbyWikipedia service
- Input: lat/lon
- Output: places
- Relation: output places that are nearby input place
Diagram labels: Input, Output, Service, relation, defines

32 Linked Data Services
We'd like to integrate data services with Linked Data
1. LIDS need to adhere to Linked Data principles
We'd like to use data services in software programs
2. LIDS need machine-readable descriptions of input and output
Compared to naïve approach (assign URI to service output):
- Relationship between input and output is explicitly described
- Dynamicity is supported
- Multiple or no output resources can be linked to input

33 1. Data Services as Linked Data
Input is given as URI:
  http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point
URI parts: service endpoint (http://geowrap.openlids.org/findNearbyWikipedia), parameters (?lat=37.416&lng=-122.152), input identifier (#point)
Resolving the URI yields RDF:
  @prefix dbp: <…> .
  @prefix : <…> .
  :point foaf:based_near dbp:Palo_Alto%2C_California ;
         foaf:based_near dbp:Packard%27s_garage .
Triple parts: input (:point), relation (foaf:based_near), output (the dbp: resources)
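Assembling such an input URI from its three parts is straightforward; the sketch below uses Python's standard `urllib.parse.urlencode`. The helper name `lids_uri` is hypothetical, and the parameter values are passed as strings so that they appear in the URI exactly as given.

```python
from typing import Dict
from urllib.parse import urlencode


def lids_uri(endpoint: str, params: Dict[str, str], input_id: str) -> str:
    """Combine service endpoint, parameters and input identifier into one URI."""
    # urlencode keeps the insertion order of the dict and percent-encodes values.
    return f"{endpoint}?{urlencode(params)}#{input_id}"


uri = lids_uri(
    "http://geowrap.openlids.org/findNearbyWikipedia",
    {"lat": "37.416", "lng": "-122.152"},
    "point",
)
print(uri)
```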

34 2. LIDS Descriptions
LIDS characterised by:
- Endpoint URI ep, which is the base for all input entities
- Local identifier i of input entity
- List of parameters X_i
- Basic graph pattern T_i describing conditions on parameters
- Basic graph pattern T_o describing minimum output data
Example:
  ep = (URI not preserved in the transcript)
  i = point
  X_i = {?lat, ?lng}
  T_i = ?point a Point. ?point geo:lat ?lat. ?point geo:long ?lng
  T_o = ?point foaf:based_near ?feature

35 Interlink LIDS and Linked Data
Generate service URIs with input bindings, obtained by evaluating: select X_i where T_i
sameAs: binding for i
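The interlinking step can be sketched for the input pattern from slide 34 (`?point a Point. ?point geo:lat ?lat. ?point geo:long ?lng`). The Python below hard-codes that pattern instead of evaluating arbitrary graph patterns, assumes one latitude and longitude per point, and uses an invented `example.org` entity; the endpoint is the one from slide 33.

```python
from typing import List, Tuple
from urllib.parse import urlencode

Triple = Tuple[str, str, str]

ENDPOINT = "http://geowrap.openlids.org/findNearbyWikipedia"


def interlink(triples: List[Triple], endpoint: str = ENDPOINT) -> List[Triple]:
    """Emit owl:sameAs links from matching entities to generated service URIs."""
    # Evaluate the input pattern: typed points that have both coordinates.
    lat = {s: o for s, p, o in triples if p == "geo:lat"}
    lng = {s: o for s, p, o in triples if p == "geo:long"}
    points = [s for s, p, o in triples if p == "rdf:type" and o == "Point"]

    links: List[Triple] = []
    for point in points:
        if point in lat and point in lng:
            query = urlencode({"lat": lat[point], "lng": lng[point]})
            service_uri = f"{endpoint}?{query}#point"
            links.append((point, "owl:sameAs", service_uri))
    return links


# Hypothetical Linked Data about one place.
data = [
    ("http://example.org/place/1", "rdf:type", "Point"),
    ("http://example.org/place/1", "geo:lat", "37.416"),
    ("http://example.org/place/1", "geo:long", "-122.152"),
    ("http://example.org/place/2", "rdf:type", "Point"),  # no coordinates
]
links = interlink(data)
print(links)
```

Only the first place yields a link; the second lacks coordinates, so the input pattern does not match it.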

36 Scale-Up Experiment: Link BTC to GeoNames
3 billion triples from the Billion Triple Challenge (BTC) 2010 data set, annotated with the LIDS wrapper of the GeoNames findNearby service
- Annotation time: < 12 hours on a laptop
- ~ 12 hours for uncompressing the data set, cleaning results, and gathering statistics
- Original BTC data: 74 different domains linked to GeoNames URIs
- The interlinking process added 891 new domains that are now linked to LIDS geowrap
- In total, 2,448,160 new links were added

37 Query Answering using LIDS and Linked Data
1. Query execution resolves URIs => enlarges data set
2. LIDS are interlinked
3. Query is executed again on new data set
4. Repeat until no new links or no new data
5. Combine results
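The loop above can be sketched as a fixpoint iteration. This is a simplification under stated assumptions: query execution, URI resolution and interlinking are passed in as callbacks, every HTTP URI in the data set is resolved (rather than only those the query touches), and the "web" is a hypothetical in-memory dict so the example runs offline.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]


def answer(seed: Set[Triple],
           run_query: Callable[[Set[Triple]], set],
           resolve: Callable[[str], Set[Triple]],
           interlink: Callable[[Set[Triple]], Set[Triple]],
           max_rounds: int = 10) -> set:
    """Alternate query execution, URI resolution and interlinking to a fixpoint."""
    data, resolved, results = set(seed), set(), set()
    for _ in range(max_rounds):
        size_before = len(data)
        results |= run_query(data)                  # execute and combine results
        uris = {t for s, _p, o in data for t in (s, o) if t.startswith("http://")}
        for uri in sorted(uris - resolved):         # resolve newly seen URIs
            data |= resolve(uri)
            resolved.add(uri)
        data |= interlink(data)                     # add links to LIDS URIs
        if len(data) == size_before:                # no new links, no new data
            break
    return results


# Hypothetical toy setup.
PLACE = "http://example.org/place/1"
SERVICE = "http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point"
WEB = {SERVICE: {(SERVICE, "foaf:based_near", "dbp:Palo_Alto%2C_California")}}


def nearby(data):
    """Features near any entity that is sameAs-linked to a service URI."""
    return {(s, o2) for s, p, o in data if p == "owl:sameAs"
            for s2, p2, o2 in data if s2 == o and p2 == "foaf:based_near"}


results = answer(
    seed={(PLACE, "rdf:type", "Point")},
    run_query=nearby,
    resolve=lambda uri: WEB.get(uri, set()),
    interlink=lambda data: {(PLACE, "owl:sameAs", SERVICE)},
)
print(results)
```

In this toy run the answer only appears in the third round: the first adds the sameAs link, the second resolves the service URI, and the third executes the query on the enlarged data set and detects that nothing new was added.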

38 Experiment: Query Answering
Input: List of 562 (potential) universities from Facebook Graph API
Output: Facebook fans and DBpedia student numbers for 104 universities
Query:
  PREFIX u: <…>
  SELECT ?n ?f ?s WHERE {
    u:list foaf:topic ?u.
    ?u foaf:name ?n.
    ?u og:fan_count ?f.
    ?u d:numberOfStudents ?s
  }

39 Linked Services and PlanetData
Several areas seem likely to produce services:
- Stream, inc. Sensor, resources (latest values)
- Any others exposing dynamic resources
- Dynamic computations, inc. on-the-fly quality assessments
Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions:
- Access control (OpenID, OAuth, etc.)
Finally, remaining areas could serve to complement LIDS/LOS alignment:
- Provenance

