Representation and Querying of Linked Geospatial Data (ReQuLGD)


1 Representation and Querying of Linked Geospatial Data (ReQuLGD)
Manolis Koubarakis and Konstantina Bereta ISWC 2017 Tutorial October 21, 2017

2 Outline
Introduction
Previous related research in other areas
Motivation
Extensions of RDF and SPARQL for the representation and querying of geospatial data
Geospatial description logics and geospatial ontology-based data access systems
Implemented systems, evaluation and comparison
Open questions for future research

3 Why Spatial (and Temporal) Data?
Spatial and temporal data are very important in reality: everything that happens, happens sometime, somewhere.
Decision making can be substantially improved if we know when and where things take place.
This tutorial is for the “where”.
[Figure: Snow cover frequency in Northeast China (Journal of Remote Sensing)]

4 Previous Research

5 Geographic Information Systems (GIS) Research
Lots of interesting theoretical and practical work by GIS researchers.
Topics covered:
Geographic data and their representation
Geographic data modelling and geographic databases
GIS software
Cartography and map production
Spatial data analysis and decision making
Geospatial data on the Web

6 Geographic Information Systems Research (cont’d)
Industrial impact:
Lots of relevant standards by the Open Geospatial Consortium (OGC).
State-of-the-art GIS software, e.g., ArcGIS or QGIS.

7 Geographic Information Systems Research (cont’d)
Educational impact: Relevant ideas have found their way into well-known GIS books.

8 Spatial Database Research
Lots of interesting theoretical and practical work by database researchers.
Topics covered:
Data models and query languages
Storage structures and indexing techniques
Query processing
User interfaces
Implemented systems
Applications

9 Spatial Database Research (cont’d)
Industrial impact:
The OGC standard “OpenGIS Simple Feature Access - Part 2: SQL option” introduced geospatial data in SQL in 2010.
Spatially enabled databases (e.g., PostGIS, Oracle Spatial, SpatiaLite).

10 Spatial Database Research (cont’d)
Educational impact: Relevant ideas have found their way into well-known database textbooks.

11 Spatial AI Research
Lots of interesting theoretical and practical work by Artificial Intelligence researchers.
Topics covered:
Spatial logics
Spatial constraint networks
Implemented systems
Applications
Contributions to other AI research areas like planning and commonsense reasoning

12 Spatial AI Research (cont’d)
Industrial impact: RCC in the GeoSPARQL standard (more later)
Educational impact: Relevant ideas have found their way into well-known AI textbooks

13 Motivation

14 Geospatial Data on the Web
Very popular and useful map software.

15 Open Government Data

16 Linked Geospatial Data - Ordnance Survey (United Kingdom)

17 Linked Geospatial Data - Kadaster (The Netherlands)

18 OpenStreetMap

19 http://www.linkedopendata.gr

20 LOD Cloud (Aug. 2014): Lots of Geospatial Data

21 LOD Cloud (Aug. 2017): Lots of Geography Data

22 Geospatial Extensions of RDF and SPARQL

23 Overview
Early papers:
Kolas (2007)
Perry’s PhD dissertation (2008)
Koubarakis and Kyzirakos (2010)
More recent proposals:
The OGC standard GeoSPARQL (2012)
The data model stRDF/stSPARQL (2012)
The framework RDFi (2013)

24 GeoSPARQL and stRDF/stSPARQL
The two proposals offer constructs for:
Developing ontologies for spatial and temporal data
Encoding spatial and temporal data that use these ontologies in RDF
Extending SPARQL to query spatial and temporal data
Temporal data is covered only by stRDF/stSPARQL.

25 GeoSPARQL [Perry and Herring, 2012]
GeoSPARQL is an OGC standard. Main functionalities:
Geospatial information is represented using high-level ontologies inspired by GIS terminology.
Geometries are represented using literals of spatial datatypes.
Literals are serialized using the OGC standards WKT and GML.
Families of functions are offered for querying geometries.

26 GeoSPARQL Components
Parameters:
Serialization: WKT, GML
Relation family: Simple Features, RCC-8, Egenhofer
Components and their parameters:
Core
Topology Vocabulary Extension (relation family)
Geometry Extension (serialization, version)
Geometry Topology Extension (serialization, version, relation family)
Query Rewrite Extension (serialization, version, relation family)
RDFS Entailment Extension (serialization, version, relation family)

27 GeoSPARQL Core
Defines two top-level classes (geo:SpatialObject and geo:Feature) that can be used to organize geospatial data.

28 GeoSPARQL Geometry Extension
Provides vocabulary for asserting and querying data about the geometric attributes of a feature.

29 Example
gag:Olympia rdf:type gag:MunicipalCommunity;
    gag:name "Ancient Olympia";
    gag:population "184"^^xsd:int;
    geo:hasGeometry ex:polygon1.
ex:polygon1 rdf:type geo:Geometry;
    geo:asWKT "POLYGON(( , , , , ))"^^geo:wktLiteral.
[Figure label: Ancient Olympia, geonames:264637]

30 GeoSPARQL Geometry Extension (cont’d)
The following non-topological query functions from the “OpenGIS Simple Feature Access” standard are also offered:
geof:distance
geof:buffer
geof:convexHull
geof:intersection
geof:union
geof:difference
geof:symDifference
geof:envelope
geof:boundary

31 Example Query
Find forests near municipal communities.
SELECT ?r ?c
WHERE {
  ?r rdf:type clc:Region;
     geo:hasGeometry ?rGeom;
     clc:hasCorineLandCover ?f.
  ?f rdfs:subClassOf clc:Forest.
  ?c rdf:type gag:MunicipalCommunity;
     geo:hasGeometry ?cGeom.
  FILTER(geof:distance(?rGeom,?cGeom,uom:metre) < 1000)
}
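
The FILTER in this query keeps only pairs of geometries that are less than 1000 metres apart. A minimal Python sketch of that check follows, under simplifying assumptions that are not in the slides: geometries are reduced to single WGS84 points, distance is the great-circle (haversine) distance on a spherical Earth, and the identifiers and coordinates are hypothetical. A GeoSPARQL engine would instead compute geof:distance between arbitrary geometries.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_pairs(forests, communities, max_metres=1000):
    """Return (forest, community) pairs whose points are closer than max_metres."""
    return [
        (f, c)
        for f, (flon, flat) in forests.items()
        for c, (clon, clat) in communities.items()
        if haversine_m(flon, flat, clon, clat) < max_metres
    ]


# Hypothetical identifiers and (lon, lat) coordinates, for illustration only.
forests = {"ex:forest1": (21.630, 37.640)}
communities = {"ex:communityA": (21.630, 37.645), "ex:communityB": (22.630, 37.640)}
print(near_pairs(forests, communities))
```

The nested comprehension mirrors the join in the query: every forest is compared with every community, which is why spatial indexes matter for real workloads.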

32 GeoSPARQL Geometry Topology Extension (cont’d)
The following topological query functions from the “OpenGIS Simple Feature Access” standard are offered:
geof:sfEquals
geof:sfDisjoint
geof:sfIntersects
geof:sfTouches
geof:sfCrosses
geof:sfWithin
geof:sfContains
geof:sfOverlaps

33 Example Query
Find forests that border municipal communities.
SELECT ?r ?c
WHERE {
  ?r rdf:type clc:Region;
     geo:hasGeometry ?rGeom;
     clc:hasCorineLandCover ?f.
  ?f rdfs:subClassOf clc:Forest.
  ?c rdf:type gag:MunicipalCommunity;
     geo:hasGeometry ?cGeom.
  FILTER(geof:sfTouches(?rGeom,?cGeom))
}

34 GeoSPARQL Geometry Topology Extension (cont’d)
The previous family of functions is based on the DE-9IM model studied by Clementini and Di Felice.
Similarly, the families of functions of the Egenhofer and RCC-8 frameworks are offered.
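
To make the DE-9IM connection concrete, here is a small Python sketch of how a Simple Features relation can be decided from a 9-character intersection matrix by pattern matching. The helper names are hypothetical and the example matrices are hand-written, not computed from geometries. The patterns are the ones commonly given for the Simple Features relations; sfCrosses and sfOverlaps are left out because their patterns depend on the dimensions of the two geometries.

```python
def matches(matrix, pattern):
    """True if a 9-character DE-9IM matrix satisfies a 9-character pattern.

    Pattern symbols: 'T' = non-empty intersection (dimension 0, 1 or 2),
    'F' = empty intersection, '*' = do not care, a digit = exactly that dimension.
    """
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM matrices and patterns have exactly 9 characters")
    for m, p in zip(matrix, pattern):
        if p == "*":
            continue
        if p == "T":
            if m not in "012":
                return False
        elif m != p:
            return False
    return True


# Patterns for some Simple Features relations (a relation holds if any pattern matches).
SF_PATTERNS = {
    "sfEquals": ["TFFFTFFFT"],
    "sfDisjoint": ["FF*FF****"],
    "sfIntersects": ["T********", "*T*******", "***T*****", "****T****"],
    "sfTouches": ["FT*******", "F**T*****", "F***T****"],
    "sfWithin": ["T*F**F***"],
    "sfContains": ["T*****FF*"],
}


def holds(relation, matrix):
    """Check a Simple Features relation against a DE-9IM matrix."""
    return any(matches(matrix, p) for p in SF_PATTERNS[relation])


# Hand-written matrices: two polygons sharing an edge, and a polygon strictly inside another.
EDGE_ADJACENT = "FF2F11212"
STRICTLY_INSIDE = "2FF1FF212"
print(holds("sfTouches", EDGE_ADJACENT), holds("sfWithin", STRICTLY_INSIDE))
```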

35 GeoSPARQL Topology Vocabulary Extension
This extension is used for representing topological information about features.
Topological information is inherently qualitative and is expressed in terms of topological relations (e.g., containment, adjacency, overlap).
Topological information can be derived from geometric information, or it can be captured by explicitly asserting the topological relations between features.

36 Topological Relations
The topological relations of the “OpenGIS Simple Feature Access” standard are offered.

37 Topological Relations (cont’d)
Similarly, the topological relations of the Egenhofer and the RCC-8 frameworks are offered.
GeoSPARQL offers vocabulary for expressing these topological relations in the database and in queries.

38 Example
gag:Olympia rdf:type gag:MunicipalCommunity.
gag:OlympiaMunicipality rdf:type gag:Municipality.
gag:WesternGreece rdf:type gag:Region.
gag:Olympia geo:sfWithin gag:OlympiaMunicipality.
gag:OlympiaMunicipality geo:sfWithin gag:WesternGreece.

39 Query I
Find the municipality that contains the community of Ancient Olympia.
SELECT ?m
WHERE {
  ?m rdf:type gag:Municipality.
  ?m geo:sfContains gag:Olympia.
}

40 Answer
gag:OlympiaMunicipality

41 Query II
Find the region of Greece that contains the community of Ancient Olympia.
SELECT ?m
WHERE {
  ?m rdf:type gag:Region.
  ?m geo:sfContains gag:Olympia.
}

42 Answer
gag:WesternGreece

43 Query II (cont’d)
The answer to Query II can be computed by reasoning about the transitivity of the relation geo:sfContains.
The GeoSPARQL standard does not cover such entailed topological relations between spatial objects.
Is this a problem?
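
The reasoning needed for Query II can be sketched in a few lines of Python. This is only an illustration over the asserted triples of the Olympia example, with a hypothetical function name; it treats geo:sfContains as the inverse of geo:sfWithin and closes it under transitivity, which is exactly the entailment the standard leaves out.

```python
def contains_closure(within_facts):
    """Return all (container, contained) pairs entailed by asserted sfWithin facts.

    Uses two rules: sfContains is the inverse of sfWithin, and sfContains is transitive.
    """
    contains = {(outer, inner) for inner, outer in within_facts}
    changed = True
    while changed:
        changed = False
        for a, b in list(contains):
            for c, d in list(contains):
                if b == c and (a, d) not in contains:
                    contains.add((a, d))
                    changed = True
    return contains


# The asserted triples from the example: (subject, object) of geo:sfWithin.
WITHIN = [
    ("gag:Olympia", "gag:OlympiaMunicipality"),
    ("gag:OlympiaMunicipality", "gag:WesternGreece"),
]
REGIONS = {"gag:WesternGreece"}

closure = contains_closure(WITHIN)
answer = [m for (m, x) in closure if x == "gag:Olympia" and m in REGIONS]
print(answer)
```

The naive fixpoint loop is fine for a handful of triples; for the dataset sizes shown next it would need an indexed or incremental closure.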

44 Administrative Geography of Great Britain
73,546,231 triples

45 Global Administrative Areas (GADM)
9,896,532 triples

46 Nomenclature of Territorial Units for Statistics (NUTS)
316,246 triples

47 Pending Extension in schema.org
Topological relations between places in schema.org

48 Knowledge Graphs with Geospatial Information
Current extension of Yago2 with geospatial information, including topological relations between geo-entities.

49 The Query Rewrite Extension
Enables the translation of qualitative topological information appearing in a query to quantitative. This is done by rewriting of queries with triple patterns involving topological relations into queries with topological functions on geometries. The rewriting is based on a set of RIF rules defined in the standard.
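
As a rough illustration of the rewriting idea, the Python sketch below turns one qualitative triple pattern into a pattern with a topological function over geometries. It is a simplification and not the RIF rule set of the standard: it assumes both terms denote features that have a default geometry serialized as WKT, it ignores the feature-geometry and geometry-geometry cases, and the function name and fixed variable names (?g1, ?wkt1, ...) are hypothetical.

```python
def rewrite_topological_pattern(subject, relation, obj):
    """Rewrite `subject geo:<relation> obj` into a pattern over geometries.

    Simplified sketch: assumes both terms are features with a default geometry
    serialized as WKT; fresh variable names are fixed, so it handles one pattern.
    """
    return (
        f"{subject} geo:hasDefaultGeometry ?g1 . "
        f"{obj} geo:hasDefaultGeometry ?g2 . "
        "?g1 geo:asWKT ?wkt1 . "
        "?g2 geo:asWKT ?wkt2 . "
        f"FILTER(geof:{relation}(?wkt1, ?wkt2))"
    )


# The triple pattern of Query I, rewritten into its quantitative form.
print(rewrite_topological_pattern("?m", "sfContains", "gag:Olympia"))
```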

50 The RDFS Extension
Enables standard RDFS reasoning for GeoSPARQL classes and properties.

51 The Data Model stRDF [Kyzirakos, Karpathiotakis & Koubarakis, ISWC 2012]
An extension of RDF for the representation of geospatial information that changes over time.
Geospatial dimension:
Spatial data types are introduced.
Geospatial information is represented using spatial literals of these datatypes.
OGC standards WKT and GML are used for the serialization of spatial literals.
Temporal dimension
Proposed independently and around the same time as GeoSPARQL (starting with an ESWC 2010 paper by Koubarakis and Kyzirakos).

52 The Query Language stSPARQL
It is an extension of SPARQL 1.1.
It offers families of functions for querying geometries: the same functions as in the Geometry Extension and Geometry Topology Extension of GeoSPARQL.
In addition, the following spatial aggregate functions are offered:
strdf:geometry strdf:union(set of strdf:geometry A)
strdf:geometry strdf:intersection(set of strdf:geometry A)
strdf:geometry strdf:extent(set of strdf:geometry A)
Temporal dimension (not covered in this tutorial; see the ESWC 2013 paper by Bereta, Smeros and Koubarakis).

53 GeoSPARQL vs. stSPARQL
Core
Topology Vocabulary Extension (relation family)
Geometry Extension (serialization, version)
Geometry Topology Extension (serialization, version, relation family)
[Diagram label: stSPARQL]
Query Rewrite Extension (serialization, version, relation family)
RDFS Entailment Extension (serialization, version, relation family)

54 Example of stSPARQL
Compute the parts of burnt areas that lie in coniferous forests.
SELECT ?burntArea (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest)
WHERE {
  ?burntArea rdf:type noa:BurntArea;
             strdf:hasGeometry ?baGeom.
  ?forest rdf:type clc:Region;
          clc:hasLandCover clc:ConiferousForest;
          strdf:hasGeometry ?fGeom.
  FILTER (strdf:intersects(?baGeom,?fGeom))
}
GROUP BY ?burntArea ?baGeom

55 Incomplete Geospatial Information in RDF: The Framework RDFi
A hotspot corresponds to a fire.
The fire occurred in a region.
The region (point) is inside the hotspot.
We could represent this kind of information using the stRDF data model that Kostis has already demonstrated in the previous session: the geometry of the point is encoded as a typed literal with datatype strdf:WKT, according to the OGC specification of the WKT format.

56 Beyond the Topology Vocabulary Extension of GeoSPARQL
Triples (not following GeoSPARQL exactly):
ex:region1 geo:hasGeometry "POLYGON(A)".
ex:region2 geo:hasGeometry "POLYGON(B)".
ex:region3 geo:sfWithin ex:region2.
ex:region4 geo:sfWithin ex:region3.
Query: Is region4 contained in region1?
[Figure labels: B, A, ex:region3, ex:region4]

57 Beyond the Topology Vocabulary Extension of GeoSPARQL (cont’d)
Triples:
ex:region1 geo:hasGeometry "POLYGON(A)".
ex:region2 geo:hasGeometry "POLYGON(B)".
ex:region3 geo:sfWithin ex:region2.
ex:region4 geo:sfWithin ex:region3.
Query: Is region4 contained in region1?
Answer: Yes, using the transitivity of geo:sfWithin and a geometric computation.
[Figure labels: B, A, ex:region3, ex:region4]
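
The combination of qualitative and geometric reasoning used in this answer can be sketched in Python as follows. This is an illustrative sketch, not the authors' reasoner: polygons A and B are replaced by hypothetical axis-aligned boxes (min x, min y, max x, max y), and the function names are made up. The search follows asserted geo:sfWithin links upward until it reaches a region with a known geometry, then falls back to a geometric containment test.

```python
def box_within(inner, outer):
    """True if box `inner` lies inside box `outer`; boxes are (minx, miny, maxx, maxy)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def entailed_within(region, target, within_facts, geometries):
    """Decide whether `region` is inside `target`.

    Follows asserted sfWithin links (transitivity) and, whenever a region with a
    known geometry is reached, compares it geometrically with the target geometry.
    """
    seen, frontier = set(), [region]
    while frontier:
        current = frontier.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        if (current in geometries and target in geometries
                and box_within(geometries[current], geometries[target])):
            return True
        frontier.extend(outer for inner, outer in within_facts if inner == current)
    return False


# Asserted qualitative facts from the slide; the box coordinates are hypothetical.
WITHIN = [("ex:region3", "ex:region2"), ("ex:region4", "ex:region3")]
GEOMETRIES = {"ex:region1": (0, 0, 10, 10), "ex:region2": (2, 2, 5, 5)}  # A and B

print(entailed_within("ex:region4", "ex:region1", WITHIN, GEOMETRIES))
```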

58 The Framework RDFi (Nikolaou and Koubarakis; RR2013, AIJ 2016)
Extension of RDF with incomplete information.
New kind of literals (e-literals) for each datatype: property values that exist but are unknown or partially known.
Blank nodes, understood as in the RDF model theory, could also have been used, with some difficulties.
Partial knowledge is captured by constraints in an appropriate constraint language L.
RDF graphs are extended to RDFi databases, i.e., pairs (G, φ):
G: RDF graph with e-literals
φ: quantifier-free formula of L

59 Example
G:
ex:region1 geo:hasGeometry ex:polygon1.
ex:polygon1 geo:asWKT _g1.
ex:region2 ex:hasColor "blue".
ex:region2 geo:hasGeometry ex:geom2.
ex:geom2 geo:asWKT _g2.
ex:region3 ex:hasColor "red".
ex:region3 geo:hasGeometry ex:geom3.
ex:geom3 geo:asWKT _g3.
φ:
_g1 geo:sfEquals "POLYGON (…)".
_g2 geo:sfOverlaps _g1.
_g3 geo:sfWithin _g1.
[Figure labels: ex:region1, ex:region2, ex:region3]

60 Query
Is there a red region inside a blue region?
[Figure label: ex:region2]

61 Answer
Yes, under the condition that _g3 geo:sfWithin _g2.
[Figure label: ex:region2]

62 The Framework RDFi (cont’d)
Answers to queries in RDFi can be conditional, as in the conditional table model of Imielinski and Lipski (1984).
Therefore it makes sense to consider possibility and certainty queries:
Is it possible that there is a red region inside a blue region?
Is it certain that there is a red region inside a blue region?

63 Semantics of RDFi
An RDFi database D represents a set of RDF graphs Rep(D).
Evaluating a query in an RDFi database D should be like evaluating it in all RDF graphs in Rep(D) and returning a database that represents all these answers.
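
The possible-worlds reading of Rep(D) can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: regions are one-dimensional intervals instead of polygons, the unknown e-literals _g2 and _g3 range over a small hypothetical candidate set (the actual semantics quantifies over all geometries), and the constraints are the ones from the red/blue example. A query is possible if it holds in some world and certain if it holds in every world.

```python
from itertools import product


def within(a, b):
    """Interval a = (lo, hi) lies inside interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


def overlaps(a, b):
    """Intervals share interior points and neither contains the other."""
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


G1 = (0, 10)  # the known geometry _g1 (hypothetical values)
CANDIDATES = [(5, 15), (8, 12), (6, 9), (1, 3), (20, 30)]  # hypothetical domain


def worlds():
    """Enumerate assignments to _g2, _g3 that satisfy the constraint formula phi."""
    for g2, g3 in product(CANDIDATES, repeat=2):
        if overlaps(g2, G1) and within(g3, G1):
            yield {"_g2": g2, "_g3": g3}


def red_inside_blue(world):
    """The query: region3 (red, geometry _g3) is inside region2 (blue, geometry _g2)."""
    return within(world["_g3"], world["_g2"])


all_worlds = list(worlds())
possible = any(red_inside_blue(w) for w in all_worlds)
certain = all(red_inside_blue(w) for w in all_worlds)
print(len(all_worlds), possible, certain)
```

The outcome (possible but not certain) matches the conditional answer of the example: the query holds exactly in the worlds where _g3 is within _g2.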

64 Evaluating SPARQL Queries over RDFi Databases
We make the following extensions to standard SPARQL query evaluation:
Mappings are extended to e-mappings and conditional mappings.
The answer to a SELECT query is defined as a pair (Ω, φ), where Ω is a set of conditional mappings and φ is a quantifier-free formula in the constraint language L.
The answer to a CONSTRUCT query is defined as a pair (T, φ), where T is a set of conditional triples and φ is a quantifier-free formula in the constraint language L.

65 Representation Systems for SPARQL and RDFi
The concept of a representation system (from the literature on incomplete information in relational databases) essentially guarantees that we are evaluating queries correctly in an incomplete information setting.
We have shown the existence of the following two representation systems for SPARQL and RDFi, both for CONSTRUCT queries without blank nodes in their templates:
With monotone graph patterns (using only the operators AND, UNION and FILTER).
With well-designed graph patterns (graph patterns using only AND, FILTER and OPT plus some intuitive conditions).

66 Complexity of SPARQL Query Answering in RDFi Databases
Data complexity: Query answering is coNP-complete for certainty queries and various interesting classes of spatial constraints (like the ones in our earlier examples). Query answering is not worse than for other simpler cases of constraints (e.g., equality). Compare this with the LOGSPACE complexity bound for the standard SPARQL case.

67 OGC/W3C Spatial Data on the Web Working Group
This working group developed two interesting working notes:
Spatial Data on the Web Use Cases & Requirements
Spatial Data on the Web Best Practices
The working group now continues as the Spatial Data on the Web Interest Group.

68 Geospatial description logics and geospatial ontology-based data access

69 Motivation
Publishing data as RDF and correlating them with linked geospatial data is useful.
Challenges:
Domain experts (earth scientists, geologists, etc.) and domain-specific applications rely heavily on geospatial databases.
Original data are stored in them.
Conversion to RDF is not always practical:
Frequent updates
Large datasets
Different tools need to be used
So do we need an OBDA system with geospatial support? The answer is yes, and this need comes from geospatial data practitioners. A lot of geospatial data is being produced and stored in geospatial databases, and decades of research have been devoted to making them efficient. Although there are tools for converting data from geospatial relational databases to RDF, this is not always practical, as the data might be updated frequently or might be big.

70 Representing Geospatial Information in Description Logics
Use OWL-DL [Katz et al. 2005]
Define a spatial concrete-domain DL [Lutz and Milisic, 2007] [Özçep and Möller, 2012]
Use OWL and a spatial ABox:
RacerPro [Wessel-Möller, 2009]
PelletSpatial [Stocker and Sirin, 2009]
[Grütter et al., 2008]
Keyword queries over spatial OBDA sources [Eiter et al. 2013]
Spatial ontology-mediated query answering over mobility streams [Eiter et al. 2017]

71 Geospatial Ontology-based Data Access
Geospatial RDB2RDF systems: GeoTriples, TripleGeo
Mapping languages: R2RML (W3C standard), OBD
OBDA systems:
Ontop [Rodriguez-Muro et al., JWS’15]
Ultrawrap [Sequeda et al., JWS’13]
OpenLink Virtuoso RDF Views [Erling, CSSW’07]
D2RQ
Geospatial OBDA systems:
Ontop-spatial [Bereta & Koubarakis, ISWC’16]
Oracle Spatial and Graph 12c Release 2
The need for representing and querying geospatial RDF data soon emerged. There are data models and query languages that extend RDF and SPARQL with geospatial capabilities, such as stRDF and stSPARQL, as well as the OGC standard GeoSPARQL. There is also a wide variety of geospatial RDF stores and query engines, but there was no OBDA system with GeoSPARQL support.

72 OBDA Mappings
[MappingDeclaration] [[
mappingId gag_geometry
target    gag:geometry/{gid}/ gag:asWKT {geom}^^geo:wktLiteral .
source    select distinct gid, geom from gag
mappingId clc_geometry
target    clc:/{gid}/ clc:hasGeometry clc:/geometry/{gid}/ . clc:/{gid}/ clc:asWKT {geom}^^geo:wktLiteral .
source    select distinct gid, geom from clc
mappingId clc_id
target    clc:/{gid}/ clc:hasID {gid} . clc:/{gid}/ clc:hasLandUse {code_00} .
source    select distinct gid, geom, code_00 from clc
mappingId clc_type
target    clc:/{gid}/ clc:type clc:type . clc:/{gid}/ rdf:type clc:Area .
source    select distinct gid, geom from clc
]]
But first, we have the mappings. Mappings encode how relational data, like the table shown here, are mapped into RDF. Various mapping languages exist, but R2RML became a standard. This minimal mapping maps columns of the table to RDF terms; you can see how the geometries, originally stored in WKB format, are mapped to GeoSPARQL WKT literals. The R2RML mapping is equivalent to the mapping in the Ontop native mapping language and encodes the same information.

73 R2RML example
[ a rr:TriplesMap ;
  rr:logicalTable [ a rr:R2RMLView ;
      rr:sqlQuery "select distinct gid,geom from gag" ] ;
  rr:predicateObjectMap [ a rr:PredicateObjectMap ;
      rr:objectMap [ a rr:ObjectMap , rr:TermMap ;
          rr:column "geom" ;
          rr:termType rr:Literal ] ;
      rr:predicate clc:asWKT ] ;
  rr:subjectMap [ a rr:TermMap , rr:SubjectMap ;
      rr:template gag:{gid} ;
      rr:termType rr:IRI ]] .

74 Virtual Triples
clc:20440 rdf:type geo:Geometry;
    geo:asWKT "POLYGON(…)"^^geo:wktLiteral .
clc:20512 rdf:type geo:Geometry;
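
How a mapping target yields virtual triples like these can be sketched in Python. The sketch below is illustrative only: the templates, the row and its WKT value are hypothetical, and the function names are made up. It also materializes the triples for readability, whereas an OBDA system keeps them virtual and rewrites queries instead.

```python
import re


def instantiate(template, row):
    """Fill {column} placeholders in a mapping target with values from one result row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def virtual_triples(target_templates, rows):
    """Apply every target template to every row of the mapping's source query."""
    return [instantiate(t, row) for row in rows for t in target_templates]


# Hypothetical target templates in the spirit of the mappings above.
TEMPLATES = [
    "clc:{gid} rdf:type geo:Geometry .",
    'clc:{gid} geo:asWKT "{geom}"^^geo:wktLiteral .',
]
# Hypothetical result row of a source query such as `select gid, geom from clc`.
ROWS = [{"gid": 20440, "geom": "POLYGON((0 0, 1 0, 1 1, 0 0))"}]

for triple in virtual_triples(TEMPLATES, ROWS):
    print(triple)
```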

75 Example GeoSPARQL query
Select CORINE areas, their land use and the administrative division they belong to.
PREFIX geo: <…>
PREFIX gag: <…>
PREFIX clc: <…>
SELECT DISTINCT ?x1 ?x2 ?lu
WHERE {
  ?x1 geo:asWKT ?g1 .
  ?x2 geo:asWKT ?g2 .
  ?x2 clc:hasLandUse ?lu .
  FILTER (geof:sfIntersects(?g1,?g2))
}

76 GeoSPARQL-to-SQL translation
GeoSPARQL query (quantitative form):
SELECT ?x1 ?x2
WHERE {
  ?x1 geo:asWKT ?g1 .
  ?x2 geo:asWKT ?g2 .
  FILTER (geof:sfOverlaps(?g1,?g2))
}
OR (qualitative form):
SELECT ?x1 ?x2
WHERE { ?x1 geo:sfOverlaps ?x2 }
Pipeline: GeoSPARQL query → Datalog spatially-enabled program → Spatial SQL query → Geospatial DBMS
Given the mappings above, let us see an example of how GeoSPARQL-to-SQL translation is performed. The GeoSPARQL query can be either quantitative or qualitative.

77 GeoSPARQL-to-SQL translation
Pipeline: GeoSPARQL query → Datalog spatially-enabled program → Spatial SQL query → Geospatial DBMS
Datalog spatially-enabled program:
ans1(x1,g1) :- ans2(g1,g2,x1,x2)
ans2(g1,g2,x1,x2) :- ans4(g1,g2,x1,x2), SF-OVERLAPS(g1,g2)
ans4(g1,g2,x1,x2) :- ans8(g1,x1), ans9(g2,x2)
ans8(g1,x1) :-
ans9(g2,x2) :- ….
Ontop creates a Datalog program from a SPARQL query. The geospatial operators of GeoSPARQL are likewise translated into distinguished Datalog predicates; in this case, SF-OVERLAPS is a distinguished Datalog predicate.

78 GeoSPARQL-to-SQL translation
Pipeline: GeoSPARQL query → Datalog spatially-enabled program → Spatial SQL query → Geospatial DBMS
Spatial SQL query:
SELECT * FROM (
  SELECT DISTINCT
    1 AS "x1QuestType", ...
    10 AS "g1QuestType",
    NULL AS "g1Lang",
    CONCAT('< ' , ST_AsText(QVIEW2."strdfgeo")) AS "g1"
  FROM geo_values QVIEW1, geo_values QVIEW2
  WHERE (ST_Overlaps(QVIEW2."strdfgeo",QVIEW1."strdfgeo"))
    AND QVIEW1."id" IS NOT NULL
    AND QVIEW1."strdfgeo" IS NOT NULL
    AND QVIEW2."id" IS NOT NULL
    AND QVIEW2."strdfgeo" IS NOT NULL
) SUB_QVIEW
And then the Datalog program is translated into SQL. Note the SQL operator ST_Overlaps, which is supported by spatially enabled databases.
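
The last translation step can be pictured with a much-reduced Python sketch. It is not Ontop's translator: it only maps a GeoSPARQL Simple Features filter function to the SQL spatial function of the same name and emits a self-join. The table and column names (geo_values, id, strdfgeo) are taken from the SQL shown above; the function name and the output column aliases are hypothetical.

```python
# GeoSPARQL Simple Features functions and their SQL counterparts.
SPATIAL_SQL = {
    "geof:sfEquals": "ST_Equals",
    "geof:sfDisjoint": "ST_Disjoint",
    "geof:sfIntersects": "ST_Intersects",
    "geof:sfTouches": "ST_Touches",
    "geof:sfCrosses": "ST_Crosses",
    "geof:sfWithin": "ST_Within",
    "geof:sfContains": "ST_Contains",
    "geof:sfOverlaps": "ST_Overlaps",
}


def spatial_join_sql(function, table="geo_values", geom_col="strdfgeo", id_col="id"):
    """Build a spatial self-join in SQL for a two-geometry GeoSPARQL filter."""
    if function not in SPATIAL_SQL:
        raise ValueError(f"no SQL translation known for {function}")
    op = SPATIAL_SQL[function]
    return (
        f'SELECT v1."{id_col}" AS x1, v2."{id_col}" AS x2 '
        f"FROM {table} v1, {table} v2 "
        f'WHERE {op}(v1."{geom_col}", v2."{geom_col}")'
    )


print(spatial_join_sql("geof:sfOverlaps"))
```

Pushing the spatial predicate into SQL this way lets the DBMS use its own spatial index and optimizer, which is the point of the OBDA approach.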

79 Example GeoSPARQL Query
PREFIX gag: <…>
PREFIX clc: <…>
SELECT DISTINCT ?x1 ?lu
WHERE {
  ?x1 geo:asWKT ?g1 .
  ?x2 geo:asWKT ?g2 .
  ?x2 clc:hasLandUse ?lu .
  FILTER(geof:sfIntersects(?g1,?g2))
}

80 Raster Data Sources
[[
mappingId chicago
target    :{rid} rdf:type :rasterCell ; :hasGeometry {rast} .
source    select rid,rast from chicago;
mappingId gadm
target    :{id_0} rdf:type :AdministrativeDivision; geo:hasGeometry :{gid} . :{gid} geo:asWKT {geom}^^geo:wktLiteral .
source    select * from usa_adm2
]]
Data sources:
CHICAGO[rid | rast]: GeoTIFF image of Chicago imported in PostGIS as a table (raster geometries)
USA_ADM2[gid | id_0 | iso | name_0 | id_1 | name_1 | id_2 | name_2 | geom]: Shapefile describing USA administrative divisions and boundaries (vector geometries)

81 Example Query
Retrieve administrative divisions that intersect with raster cells of the GeoTIFF image of Chicago.
SELECT ?adm
WHERE {
  ?r rdf:type :rasterCell .
  ?r :hasGeometry ?rast .
  ?adm rdf:type :AdministrativeDivision .
  ?adm geo:hasGeometry ?g .
  ?g geo:asWKT ?geom .
  FILTER(geof:sfIntersects(?geom,?rast))
}
It is exactly the same query we would pose if we had only vector geometries: ?geom will be bound to vector geometries and ?rast to raster geometries.

82 Implemented systems, evaluation and comparison

83 Strabon (ISWC 2012, ESWC 2013)
Find more at: http://strabon.di.uoa.gr
[Figure labels: WKT, GML, stRDF graphs, stSPARQL/GeoSPARQL queries]

84 Strabon - Geospatial features
Support for:
stRDF and stSPARQL
GeoSPARQL (core, geometry extension, geometry topology extension)
Multiple Coordinate Reference Systems (CRS)
Builds on Sesame RDBMS
Geospatial relational database as back-end (PostGIS, MonetDB)
R-tree index

85 Parliament
Developed by Raytheon BBN Technologies (Dave Kolas).
First GeoSPARQL implementation.
Supports:
Core
Topology vocabulary
Geometry
Geometry Topology
RDF entailment
Multiple CRS
R-tree index

86 uSeekM
Spatial plugin for Sesame by OpenSahara.
Supports GeoSPARQL:
Core
Topology Vocabulary
Geometry
Geometry Topology
RDFS entailment
No multiple CRS; only WGS84.
Open source (Apache v2.0).

87 GraphDB
Developed by Ontotext. Formerly OWLIM.
GeoSPARQL support
Apache Lucene index
Closed source.

88 AllegroGraph
Quad store developed by Franz Inc.
Closed source.
No GeoSPARQL support
Supports only points
Only a few spatial operations supported (Buffer, Bounding Box, Distance)

89 OpenLink Virtuoso
Developed by OpenLink.
No GeoSPARQL support
Points only, serialized as typed literals
Spatial operations (subset of SQL/MM)
Multiple CRS
R-tree

90 Stardog
Limited GeoSPARQL support:
WKT literals. Native support for points. Use of the JTS library for polygons.
Operators geof:relate, geof:distance, geof:within, geof:nearby, geof:area
Geospatial features only offered in the enterprise edition

91 Brodt et al.
Built on top of RDF-3X by the University of Stuttgart
No GeoSPARQL support
Geometries represented as typed WKT literals
Only WGS84 supported
OGC-SFA spatial operations as SPARQL filter functions
R-tree supported (but only used for spatial selections)

92 Perry PhD thesis
Implementation on top of Oracle 10g by Wright State University
Support for SPARQL-ST
GeoRSS GML serialization of geometries
Spatial and temporal variables
Spatial and temporal filters (RCC8, Allen)
R-tree support

93 Oracle Spatial and Graph
Developed by Oracle
GeoSPARQL support
CRS support
Recently added support for virtual RDF graphs (as of Oracle Spatial and Graph 12c Release 2)

94 [ESWC2016, ISWC2016]
So what do we mean by the term “spatially enabled OBDA”? We mean enabling users to query their geospatial databases using GeoSPARQL queries, which are translated on the fly into SQL and evaluated by the geospatial DBMS. The translation is based on ontologies and mappings. To the best of our knowledge, we have implemented the first geospatially enabled OBDA system with GeoSPARQL support.

95 Architecture overview (extending Ontop)

96 Evaluation

97 Geographica Benchmark
Evaluation of the state-of-the-art geospatial RDF stores [Garbis et al., ISWC 2013]
Real workload
Synthetic workload
Stresses all recent systems with heavily spatial queries and workloads.
Available as an open source Java framework.

98 Real workload
Dataset    Size    #triples  #geometries  #points per geometry (avg)
Geonames   45MB    400K      22000        1
DBpedia    89MB    430K      8000
LGD        29MB    150K      12000
GAG        33MB    4K        325          400
CLC        401MB   630K      45000        140
Hotspots   90MB    450K      37000        4

99 Execution times in real workload
[Chart panels: Non-topological construct functions; Spatial selections; Spatial joins; Aggregate functions]

100 Adding systems with limited geospatial functionalities

101 Geospatial RDF stores vs OBDA

102 Datasets
Dataset                                 Table size  No. of rows/geometries  Avg #points/geometry
Corine Land Cover (CLC)                 283 MB      44834                   187.84
Hotspots                                35 MB       37048                   5
Global Administrative Geography (GAG)   24 MB       326
OSM-Buildings                           42 MB       155474                  6.5
OSM-Landuse                             20 MB       40220                   19.4
OSM-places                              2.4 MB      13043                   1
OSM-points                              12 MB       61664
OSM-railways                            2 MB        4996                    13.3
OSM-roads                               250 MB      514403                  19
OSM-waterways                           16 MB       20565                   39.84
This is the workload that we used. Our methodology is the following: we took the original datasets, available in Shapefile format (a well-known geospatial format), and imported them into a PostGIS database. In Ontop-spatial, every Shapefile corresponds to a table in the database; the table above gives the size and the number of rows of each table. Each table has a geometry column.

103 Queries
Highly selective query
Poorly selective query
These are the two sets of queries that we used: spatial selections and spatial joins. We used queries with different geospatial operators, different numbers of triple patterns and varying selectivity. Poorly selective queries produce more results and thus more intermediate results between joins.

104 Evaluation
Cold cache experiments.
Spatial selections and spatial joins.
If a system has no measurement for a query, it timed out on that query.
[Chart legend: Ontop-spatial, Strabon, System O]

105 Translated queries
Strabon query
Ontop-spatial query
For example, this is how spatial join query 2 is translated in Strabon and in Ontop-spatial. We can see that Strabon performs more joins due to the star schema of the database.

106 Spatial join query 6: Strabon query execution
Result of explain query visualized using PgAdmin III

107 DB statistics for spatial join 6
Strabon db statistics Ontop db statistics Tables produced using

108 Performance Evaluation: Strabon vs. Ontop-spatial on a 30 GB dataset
Table columns: Operation (geof:intersects); Selectivity; Geometry types; Strabon; Ontop-spatial; Remarks
Spatial Selection high * (irrelevant) 100 msecs low Point-Polygon Polygon-Polygon 500 msecs msecs
Spatial Join Point - Polygon < 1000 msecs msecs >40 mins 10 mins Sometimes the difference here is order(s) of magnitude
* We can say that both systems can scale up to 100 GB if the geometries are points and/or the queries are highly selective.
* The differences between Strabon and Ontop-spatial are bigger when the selectivity is low, which forces a lot of intermediate results; this is the weak point of Strabon (while in Ontop-spatial the data is partitioned). In easier queries (selections, highly selective joins or joins with non-complex geometries) the differences are small.

109 Performance Evaluation and Scalability of Strabon and Ontop-spatial
We defined and used the benchmark Geographica.
Strabon has better performance and functionality than Parliament, uSeekM, System X, Virtuoso, System O, and System Y (longer version of the ISWC 2013 paper).
Ontop-spatial has better performance than Strabon and System O (long version of the ISWC 2016 paper).

110 Scalability Strabon and Ontop-spatial can scale to 100GB of data and answer queries in milliseconds if the geometries are points and/or the selectivity of the query is high. More complex geometries have an impact on performance.

111 How do we Implement Query Processing for RDFi?
We could use a DL reasoner which offers topological reasoning, e.g., for RCC-8, using path consistency:
RacerPro (Haarslev and Möller, 2001)
PelletSpatial (Stocker and Sirin, 2009)
We could use rules (Batsakis, 2012).
None of these approaches is expressive enough or scales to the size of the datasets we have considered (i.e., hundreds of thousands to millions of constraints).
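
For readers unfamiliar with path consistency, the Python sketch below shows the basic algorithm on a qualitative constraint network. It is a textbook-style illustration, not the code of any system named above, and it uses the smaller RCC-5 calculus (the one named later in the tutorial) instead of RCC-8 so that the composition table stays short. Relations are DR (discrete), PO (partial overlap), EQ (equal), PP (proper part) and PPi (its inverse); the node names are hypothetical.

```python
from itertools import product

BASE = frozenset({"DR", "PO", "EQ", "PP", "PPi"})
CONVERSE = {"DR": "DR", "PO": "PO", "EQ": "EQ", "PP": "PPi", "PPi": "PP"}

# RCC-5 composition table for the non-EQ base relations (EQ acts as identity).
COMP = {
    ("DR", "DR"): BASE, ("DR", "PO"): {"DR", "PO", "PP"},
    ("DR", "PP"): {"DR", "PO", "PP"}, ("DR", "PPi"): {"DR"},
    ("PO", "DR"): {"DR", "PO", "PPi"}, ("PO", "PO"): BASE,
    ("PO", "PP"): {"PO", "PP"}, ("PO", "PPi"): {"DR", "PO", "PPi"},
    ("PP", "DR"): {"DR"}, ("PP", "PO"): {"DR", "PO", "PP"},
    ("PP", "PP"): {"PP"}, ("PP", "PPi"): BASE,
    ("PPi", "DR"): {"DR", "PO", "PPi"}, ("PPi", "PO"): {"PO", "PPi"},
    ("PPi", "PP"): {"PO", "EQ", "PP", "PPi"}, ("PPi", "PPi"): {"PPi"},
}


def compose(r, s):
    """Compose two sets of base relations using the table above."""
    out = set()
    for a, b in product(r, s):
        if a == "EQ":
            out.add(b)
        elif b == "EQ":
            out.add(a)
        else:
            out |= COMP[(a, b)]
    return out


def path_consistency(nodes, constraints):
    """Refine a network {(i, j): set of base relations}; return None if inconsistent."""
    net = {(i, j): ({"EQ"} if i == j else set(BASE)) for i, j in product(nodes, repeat=2)}
    for (i, j), rels in constraints.items():
        net[(i, j)] &= set(rels)
        net[(j, i)] &= {CONVERSE[r] for r in rels}
    if any(not rels for rels in net.values()):
        return None
    changed = True
    while changed:
        changed = False
        for i, k, j in product(nodes, repeat=3):
            refined = net[(i, j)] & compose(net[(i, k)], net[(k, j)])
            if refined != net[(i, j)]:
                if not refined:
                    return None  # empty relation: the network is inconsistent
                net[(i, j)] = refined
                changed = True
    return net


# region4 is a proper part of region3, which is a proper part of region2.
net = path_consistency(
    ["r2", "r3", "r4"], {("r4", "r3"): {"PP"}, ("r3", "r2"): {"PP"}}
)
print(net[("r4", "r2")])
```

The triple loop is cubic in the number of regions per pass, which hints at why plain path consistency struggles with networks of millions of constraints.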

112 Administrative Geography of Great Britain
73,546,231 triples

113 Global Administrative Areas (GADM)
9,896,532 triples

114 Nomenclature of Territorial Units for Statistics (NUTS)
316,246 triples

115 How do we Implement Query Processing for RDFi? (cont’d)
We have implemented our own reasoner:
For RCC-5 networks with landmarks/polygons (Giannakopoulou, Nikolaou and Koubarakis, AAAI 2014)
Lessons learned:
Standard path consistency algorithms plus some geometric computations suffice (Li, Liu and Wang, 2013)
For the real-world datasets we experimented with (GADM, GAG, GAU, NUTS), geometric computations dominate
We could handle a few hundred thousand constraints but not millions
Simpler geometries and parallelization help a lot
We have not implemented a complete query processing engine for RDFi. This is still an open problem.

116 Open Problems

117 Open Questions (Theory)
The data complexity of query processing for stSPARQL and GeoSPARQL has not been studied so far. We have done so only for the original stSPARQL proposal based on constraints (ESWC 2010). Similarly for other interesting problems such as query containment etc. The foundations of Ontop-spatial deserve further study (comparison with the theory of spatial description logics).

118 Open Questions (Practice)
How can we build an (even more) scalable geospatial RDF store like Strabon on top of Apache big data technologies?
We analyzed the pros and cons of using GeoSpark, SIMBA and SpatialSpark. GeoSpark seems to be the most mature and is continuously being improved by its developers.
How can we handle 10^12 triples with 10^8 polygons (the volume of geospatial data owned by a national cartographic agency)?

119 Open Questions (cont’d)
How do we represent and query raster data on the Semantic Web? Raster extension of Ontop-spatial. Array database extension of Ontop-spatial. Work on “Coverages in Linked Data” by the OGC/W3C Spatial Data on the Web working group.

120 Other relevant research topics of interest
Natural language query processing for large geospatial knowledge bases (e.g., Yago2).
Example queries:
Find parks in Bonn close to Fraunhofer IAIS.
Find rivers that cross cities of Greece and are more than 20 km long.
Current work in the context of WDAqua (Answering Questions using Web Data, a Marie Sklodowska-Curie Innovative Training Network).

121 Thanks! Questions?
For more, see the web page of our group.

