Quality issues in Spatial Databases M. Mostafavi, G. Edwards, R. Jeansoulin CRG & GEOIDE & REVIGIS Victoria, May 2003
Contents: Introduction, Problems, Objective, Methodology, Results, Discussion, Conclusions and perspectives
Introduction: data fusion and data quality. Multi-source spatial data: vector data (BNDT, BDTQ, …); raster data (satellite images, aerial images, …). Need for better quality: logical consistency, completeness, semantic accuracy, temporal accuracy, positional accuracy, and more … Decision making (effective crisis management (MSPQ)).
A real case problem. BNDT: good geometry. Statistics Canada database (SC), Canada election database (EC): rich descriptive information but weak geometry. How to reconcile these two data sets?
Context. Fusion: SDB1, SDB2, SDB3. Information of greater quality. User vision (fitness for use); producer vision (product ontology).
Logical consistency. Logical consistency is an important element of data quality. It defines the degree of consistency of the data with respect to its specifications. Integrity constraints: explicit rules stated in the data specifications (e.g. connectivity between two objects); implicit rules (e.g. a river always flows downstream). Ontology vs. specifications.
Project definition. Diagram labels: NTDB ontology, BDTQ ontology, NTDB data, BDTQ data, new data set; ontology consistency, data consistency, data consistency vs. BNDT; Steps 1 to 5; integrated ontology, mapping the ontologies, ontology fusion, data fusion. Does this help? Lack of explicit rules. Yes? … No?
Consistency in NTDB. NTDB ontologies; Delphi interface; Delphi interface; Prolog; dataset. Studying the logical consistency of the dataset. Step 1, Step 2.
Formalizing the ontology. Knowledge base: facts, rules, queries. BNDT ontology.
Spatial relations in NTDB. Spatial relations in NTDB are: 1. Connection relations 2. Sharing relations 3. Adjacency relations 4. Superposition relations
Logical approach: facts. For NTDB the facts consist of: the taxonomy of NTDB (themes, entities, allowed combinations, code (NTDB identity code), geometric representations); spatial relations (connection, sharing, superposition/adjacency); minimal values (e.g. distance constraints between objects).
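A minimal sketch of how such facts could be held and queried. The project itself uses Prolog; Python is used here only for illustration. The class names, field names, and the reading of the geometry codes (P, L, A as point, line, area) are assumptions, not the NTDB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity fact: a name plus a geometric representation."""
    name: str
    geometry: str  # assumed codes: "P" point, "L" line, "A" area


@dataclass(frozen=True)
class Relation:
    """Hypothetical spatial-relation fact between two entities."""
    kind: str  # "connection", "sharing", "superposition" or "adjacency"
    source: Entity
    target: Entity


railway = Entity("Railway", "L")
road = Entity("Road", "L")

# A tiny fact base: only the Railway -> Road connection is declared.
facts = {Relation("connection", railway, road)}


def related(facts, kind, source):
    """Query: names of the entities that `source` relates to through `kind`."""
    return sorted(r.target.name for r in facts
                  if r.kind == kind and r.source == source)


print(related(facts, "connection", railway))
print(related(facts, "connection", road))
```

The second query comes back empty because no Road to Railway fact was declared, which is the kind of gap the inconsistency rules look for.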
Logical approach: facts. Table (columns: types of facts, total groups, facts; rows: taxonomy, connection, sharing, adjacency/superposition, total). There are about 350,000 facts describing the NTDB. Remark: regrouping of objects for programming purposes has created some inconsistencies.
Logical approach: rules. Several rules are defined to analyze the ontological consistency of the NTDB. Inconsistency rules by relation. Connection: (a) object A is connected to object B and the inverse relation is not defined; (b) the connection is illegal (C1 = 0-0) and for the same objects we have C1 ≠ 0-0. Sharing: (a) object A shares with object B but the inverse relation is not defined; (b) the same objects share with different values of C2. Superposition/adjacency: (a) two objects are superposed and are adjacent at the same time.
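The inconsistency rules above can be sketched as simple set checks. This is an illustration in Python, not the Prolog program used in the project. The fact tuples and every cardinality value other than 0-0 are made up for the example.

```python
from collections import defaultdict

ILLEGAL = "0-0"  # cardinality marking an illegal connection

# Illustrative facts only: (relation, object A, object B, cardinality).
facts = [
    ("connection", "Railway", "Road", "1-N"),  # inverse is missing
    ("connection", "Gas and oil facilities", "Building", "0-0"),
    ("connection", "Gas and oil facilities", "Building", "1-N"),
    ("connection", "Building", "Gas and oil facilities", "1-N"),
    ("sharing", "Road", "Water body", "1-1"),
    ("sharing", "Water body", "Road", "1-1"),
    ("superposition", "Building", "Vegetation", None),
    ("adjacency", "Vegetation", "Building", None),
]


def values_by_pair(facts, kind):
    """Collect the cardinality values declared for each ordered pair."""
    table = defaultdict(set)
    for relation, a, b, value in facts:
        if relation == kind:
            table[(a, b)].add(value)
    return table


def missing_inverse(facts, kind):
    """Rule (a): A relates to B but B to A is not defined."""
    pairs = values_by_pair(facts, kind)
    return sorted(p for p in pairs if (p[1], p[0]) not in pairs)


def illegal_yet_defined(facts):
    """Connection rule (b): C1 is 0-0 and also something else."""
    pairs = values_by_pair(facts, "connection")
    return sorted(p for p, vals in pairs.items()
                  if ILLEGAL in vals and len(vals) > 1)


def conflicting_values(facts, kind):
    """Sharing rule (b): the same pair carries different values."""
    pairs = values_by_pair(facts, kind)
    return sorted(p for p, vals in pairs.items() if len(vals) > 1)


def superposed_and_adjacent(facts):
    """Two objects are superposed and adjacent at the same time."""
    sup = {tuple(sorted(p)) for p in values_by_pair(facts, "superposition")}
    adj = {tuple(sorted(p)) for p in values_by_pair(facts, "adjacency")}
    return sorted(sup & adj)


print(missing_inverse(facts, "connection"))
print(illegal_yet_defined(facts))
print(conflicting_values(facts, "sharing"))
print(superposed_and_adjacent(facts))
```

Superposition and adjacency are compared as unordered pairs here, on the assumption that both relations are symmetric.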
Results (1/2). Inconsistency (inverse connection). Data dictionary (generic relation): between themes, Railway (L) connected to Road (L); between themes, Road (L) connected to Railway (L). Table of connection and cardinalities (columns: Code, Entity, Combination, C1, Code): 3002; Railway; Standard, Ground level, Operational, Multiple; Road; Secondary, Ground level, Hard surface; Not verified?
Results (2/2). Inconsistency (different values for cardinality one). Data dictionary (generic relation): Gas and oil facilities (P) is connected to Building (P). Table of connection and cardinalities (columns: Code, Entity, Combination, C1, Code): 788; Gas and oil facilities; Generic/unknown; Gas and oil facilities; Generic/unknown; - (Not verified); 147; ?
Consistency in data. NTDB ontologies; Delphi interface; Prolog; VB interface; dataset. Studying the logical consistency of the dataset. Step 1, Step 2.
GeoMedia Professional spatial operations: Meet; Entirely contained; Entirely contained by; Contains and contained by; Spatially equal; Touch; Meet; Overlap.
Mapping: polygon-polygon relations. Table columns (relations): Disjoint, Meet, Equal, Inside, Contains, Covered by, Covers, Overlap. Rows with marked cells: Connection; Sharing: xxxx; Superposition: xxxxxx; Adjacent: x.
Mapping problems. Several problems: confusions in spatial relations; unique mapping is not possible; cardinalities cannot be considered.
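One way to see why a unique mapping is not possible is to invert the mapping table and look for topological relations that correspond to more than one NTDB relation. The sketch below does that in Python. The cell assignments are placeholders chosen for illustration and are not the cells of the mapping table.

```python
# Placeholder mapping from NTDB relations to topological relations.
# These assignments are hypothetical, not the project's actual table.
mapping = {
    "sharing": {"Meet", "Equal", "Covers", "Covered by"},
    "superposition": {"Equal", "Inside", "Contains",
                      "Covered by", "Covers", "Overlap"},
    "adjacency": {"Meet"},
}


def invert(mapping):
    """Map each topological relation to the NTDB relations that use it."""
    inverse = {}
    for ntdb_relation, topo_relations in mapping.items():
        for topo in topo_relations:
            inverse.setdefault(topo, set()).add(ntdb_relation)
    return inverse


def ambiguous(mapping):
    """Topological relations that map to more than one NTDB relation."""
    return {topo: sorted(rels)
            for topo, rels in invert(mapping).items() if len(rels) > 1}


for topo, rels in sorted(ambiguous(mapping).items()):
    print(topo, "->", rels)
```

With these placeholder cells, a detected Meet could be either sharing or adjacency, so the topological operator alone does not decide which NTDB relation holds.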
Step 2: BNDT Data vs its ontology
Data vs. ontology. File: 21E05. Region: Sherbrooke. 68 entities; 23,283 objects. Analyzed binary relations: contours vs. water bodies; buildings vs. roads; water bodies vs. buildings; liquid depot vs. liquid depot; roads vs. water bodies; …
Results: liquid depot vs. liquid depot. Spatial representations: point, area. Spatial relations: ontology/specification (superposition is illegal); data (a superposition case is found).
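The comparison of data against the ontology can be sketched as a lookup of each observed relation in a set of relations the ontology declares illegal. This Python sketch is illustrative only: the object identifiers and the second observed relation are hypothetical, and the project itself performs the check with Prolog.

```python
# Relations the ontology declares illegal: (relation, entity type, entity type).
illegal = {("superposition", "Liquid depot", "Liquid depot")}

# Relations observed in the dataset, with hypothetical object identifiers.
observed = [
    ("superposition", ("Liquid depot", "obj-1"), ("Liquid depot", "obj-2")),
    ("adjacency", ("Building", "obj-3"), ("Road", "obj-4")),
]


def violations(observed, illegal):
    """Return the observed relations that the ontology forbids."""
    found = []
    for relation, (type_a, id_a), (type_b, id_b) in observed:
        # Check both orders so the rule applies regardless of direction.
        if ((relation, type_a, type_b) in illegal
                or (relation, type_b, type_a) in illegal):
            found.append((relation, id_a, id_b))
    return found


print(violations(observed, illegal))
```

A relation that is semantically wrong but absent from the `illegal` set passes this check unnoticed, which corresponds to the incomplete-ontology cases.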
Results. Problem: road crosses a water body. Illegal relation with respect to the semantics of the objects. Incomplete ontology.
Results. Problem: cut line crosses a water body. Illegal relation with respect to the semantic definition of the objects. Incomplete ontology.
Results. Problem: contour crosses water body. Illegal relation with respect to the ontology. Inconsistent data.
Problem: road crosses water body. Illegal relation with respect to the ontology. Inconsistent data.
Results. Problem: road crosses building. Illegal relation with respect to the semantics of the objects. Incomplete ontology.
Results. Problem: water body (L) superposed on vegetation (A). Illegal relation with respect to the ontology. Inconsistent data. Control system problem.
Results. Problem: buildings (S) superposed on water body (A). Illegal relation with respect to the semantics of the objects. Inconsistent data.
Results. Problem: building (A) overlaps vegetation (A). Illegal relation with respect to the semantics of the objects. Inconsistent data.
Suggestions, solutions. Adding new rules: building (A) and vegetation (A) (illegal superposition); road (L) and building (conditional superposition). A better control system is needed. Find exceptions.
Current situation. Product ontology is analyzed. Mapping of topological relations to binary relations. Ontology translation into Prolog (Delphi program). Consistency study of spatial relations: connection (table C); sharing (table D); superposition and adjacency (table E). Consistency between different relations (fusion of facts): connection and sharing; connection and superposition/adjacency; sharing and superposition/adjacency. Consistency of data vs. specifications is studied.
Future work. Logical consistency of other available datasets. Mapping of ontologies. Fusion of ontologies. Fusion of data. Consistency of the newly created data set.