Efficient Storage And Retrieval for Large Spatial Data Set in a Relational DBMS (Andrew U. Frank, Dept. of Geoinformation; Dutch Cadastre, 24 April 1998)


1 24 April 1998, Dutch Cadastre. Efficient Storage And Retrieval for Large Spatial Data Set in a Relational DBMS. Andrew U. Frank, Dept. of Geoinformation, Technical University Vienna, frank@geoinfo.tuwien.ac.at

2 Overview: Why a Database; Two Database Issues: Modeling and Implementation; Base assumptions about spatio-temporal databases; Implementation of spatial access; Modeling; Interoperability; Interaction: Multi-Agency Databases; Open GIS

3 Why a Database. A database achieves in an agency: integration; consistency; sharing (reduction of redundancy, but not of storage).

4 Two Issues. Modeling: what are the things to represent, and how do we logically structure them? Implementation: how is this solved with a computer? Usually only modeling is important.

5 Implementation is crucial for a DBMS because performance is critical: GIS data sets are too large to be stored completely in main memory. Access to disk takes 10 msec; access in main memory takes 100 nsec, a ratio of 1 : 10^5, or like 3 sec to about 3.5 days! We must therefore start with the performance of the most often used GIS operation.

6 Base Assumptions about Spatio-Temporal Databases. Objects: exist independently, have some properties, and enter into some relations with other objects. Spatial objects: have a location and a spatial extent (expressed in a global coordinate system).

7 Base Assumptions about Spatio-Temporal Databases. Temporal objects: change their properties over time; questions about past (or future) states can be asked (valid time). Administrative database: questions about when a change became known can be asked (transaction time).
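
The distinction between valid time and transaction time can be illustrated with a small sketch. Everything in it is hypothetical (the `OwnerFact` record, the parcel and owner names, years as integers); it only shows how the two time axes answer different questions, and is not taken from the presentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OwnerFact:
    parcel_id: str
    owner: str
    valid_from: int    # valid time: year from which the fact holds in reality
    recorded_at: int   # transaction time: year the database learned of it

def owner_as_of(facts, parcel_id, valid_at, known_at):
    """Owner of the parcel at `valid_at`, using only facts recorded by `known_at`."""
    candidates = [
        f for f in facts
        if f.parcel_id == parcel_id
        and f.valid_from <= valid_at
        and f.recorded_at <= known_at
    ]
    if not candidates:
        return None
    # The fact with the latest start of validity wins; ties go to the latest recording.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_at)).owner

# Hypothetical data: the sale to Bob took effect in 1995 but was registered in 1997.
FACTS = [
    OwnerFact("P1", "Alice", valid_from=1990, recorded_at=1990),
    OwnerFact("P1", "Bob", valid_from=1995, recorded_at=1997),
]
```

Asking for the owner in 1996 gives a different answer depending on what the database knew at the time of asking, which is exactly the question an administrative database must be able to answer.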

8 Most often used operation: access to spatial data. Spatial data must be retrieved quickly based on location. This is missing in commercial DBMS: SQL can be used, but performance is insufficient. Spatial clustering is absolutely required.

9 Databases for Spatial Data. Classical architecture: a DBMS for the administrative data, a specialized file system for the other data. Research goal: an integrated database for both attribute and geometric data, with a spatial access method. (The field tree is a method for spatial access designed for cadastral applications.)

10 Field tree explanations 1: A regular grid can cluster point objects with equal density.

11 Field tree explanations 2: A quadtree grid can cluster point objects with irregular distribution.

12 Field tree explanations 3: But a quadtree cannot cluster extended objects.

13 Field tree explanations 4: The field tree can do it:

14 Field tree explanations 5: Add a next level, half as large and shifted:

15 Field tree explanations 6: Add another level (blue):

16 Field tree explanations 7: Add another level (green):

17 Why Field Trees? Extended objects (represented by their minimal bounding box) are stored with a field. Fields cover the area multiple times. Guarantee: every object is on a page which is at most 4 times the size of the object. (A quadtree-based method cannot achieve this.) Access times depend on the amount of data retrieved, not on the amount of data stored.
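
A minimal sketch of how an extended object might be assigned to a field. The slides state only that each level is half as large as the previous one and shifted; the exact shift used here (half a field at every level below the root), the square world of size `world`, and the function name `field_for_box` are assumptions for illustration. The sketch does not attempt to reproduce the stated size guarantee.

```python
import math

def field_for_box(xmin, ymin, xmax, ymax, world=1024.0, max_level=8):
    """Return (level, ix, iy) of the smallest field that fully contains the box.

    Level 0 is one unshifted field covering the whole world. Each further
    level halves the field size and (by assumption) shifts the grid origin
    by half a field, so grid lines of different levels never coincide.
    """
    for level in range(max_level, -1, -1):      # try the finest level first
        size = world / 2 ** level
        shift = 0.0 if level == 0 else size / 2  # assumed shift rule
        ix0 = math.floor((xmin + shift) / size)
        ix1 = math.floor((xmax + shift) / size)
        iy0 = math.floor((ymin + shift) / size)
        iy1 = math.floor((ymax + shift) / size)
        if ix0 == ix1 and iy0 == iy1:            # box lies inside a single field
            return level, ix0, iy0
    return 0, 0, 0                               # box exceeds the world: fall back to the root

# A 4 x 4 box straddling the centre of the world. In a plain quadtree it
# crosses the first split line and would have to be stored at the root.
small_box_field = field_for_box(510, 510, 514, 514)
```

Under these assumptions the box lands in a level-7 field of size 8, because the shifted grid line that cuts it at level 8 is not a grid line at level 7.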

18 Query in field tree: determine which fields overlap the query window; search all of these (but only these) for objects of interest.
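
The query step can be sketched as an enumeration of candidate fields per level. As before, the grid layout (halving per level, half-field shift below the root), the parameter names, and the function `fields_overlapping` are assumptions made for this illustration, not the presented implementation.

```python
import math

def fields_overlapping(xmin, ymin, xmax, ymax, world=1024.0, max_level=2):
    """List (level, ix, iy) for every field that overlaps the query window."""
    hits = []
    for level in range(max_level + 1):
        size = world / 2 ** level
        shift = 0.0 if level == 0 else size / 2  # assumed shift rule
        ix0 = math.floor((xmin + shift) / size)
        ix1 = math.floor((xmax + shift) / size)
        iy0 = math.floor((ymin + shift) / size)
        iy1 = math.floor((ymax + shift) / size)
        for ix in range(ix0, ix1 + 1):
            for iy in range(iy0, iy1 + 1):
                hits.append((level, ix, iy))
    return hits

candidate_fields = fields_overlapping(100, 100, 200, 200)
```

Only the objects stored with these fields need to be tested against the window; a real implementation would presumably also skip fields that hold no objects.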

19 Spatial access research. The focus was on the implementation of spatial access. Results: mostly complex methods that are difficult to implement.

20 Conclusion from spatial access research. Samet: "use a spatial clustering, use any". The problem was that it was unclear which criteria to optimize for (nearest neighbor, range query) and what the properties of the data are. Identify the problem before you optimize. Identifying the details of the problem was not possible for lack of spatial statistics methods. The field tree has previously been used for cadastral and similar problems in commercial environments and performed well.

21 The Issue is Integration. The integration of spatial access with the other DBMS services (especially transaction management) is extremely difficult. The commercial DBMS vendors are neither willing nor able to provide spatial access built into the core of the DBMS engine; the difficulty is the integration with the transaction management system.

22 Practical solution: spatial access/spatial clustering must be built on top of standard DB functionality (e.g., a commercial relational DB). How to cluster if one cannot access the low-level storage subsystem? One must exploit the B-tree data structure, which uses physical clustering in most commercial DBMS.

23 Concept. 1. Assign to each object a single number based on a spatial encoding. 2. When storing the object, use this number to achieve physical clustering by spatial location in the DBMS. 3. When searching: determine all spatial codes which fall into the query window; search these codes using the DBMS’s built-in B-tree.
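
The three steps can be sketched end to end. In this illustration a sorted Python list searched with `bisect` stands in for the DBMS's B-tree, objects are points on an integer grid, and the spatial code is a simple bit-interleaved cell number; all of these choices, and the names `cell_code` and `ClusteredTable`, are assumptions rather than the presented system.

```python
import bisect

def cell_code(col, row, bits=8):
    """Step 1: one number per grid cell, built by interleaving the bits of col and row."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code

class ClusteredTable:
    """Rows kept sorted by spatial code, standing in for a B-tree-clustered table."""

    def __init__(self):
        self._rows = []  # sorted list of (code, object_id)

    def insert(self, col, row, object_id):
        # Step 2: storing by code keeps spatially close objects close together.
        bisect.insort(self._rows, (cell_code(col, row), object_id))

    def window_query(self, col_min, row_min, col_max, row_max):
        # Step 3: compute the codes inside the window, then look each one up.
        result = []
        for col in range(col_min, col_max + 1):
            for row in range(row_min, row_max + 1):
                code = cell_code(col, row)
                lo = bisect.bisect_left(self._rows, (code,))
                hi = bisect.bisect_left(self._rows, (code + 1,))
                result.extend(obj for _, obj in self._rows[lo:hi])
        return sorted(result)

table = ClusteredTable()
table.insert(1, 1, "a")
table.insert(2, 3, "b")
table.insert(7, 7, "c")
found = table.window_query(0, 0, 3, 3)
```

Looking up every cell code separately is the naive form of step 3; grouping consecutive codes into intervals reduces the number of index probes.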

24 How to encode spatial location: cluster by Morton numbers. Proposal by Abel (CSIRO), based on the quadtree. Disadvantage: small objects may end up in a very large cell (or multiple keys are necessary; multiple keys cannot be used for clustering in most B-trees).
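
A short sketch of Morton numbering and of the disadvantage named above. The bit order (column bits in even positions, row bits in odd positions), the grid resolution, and the helper `enclosing_cell` are illustrative assumptions; the slides do not specify these details.

```python
def morton_code(col, row, bits=16):
    """Interleave the bits of col (even positions) and row (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code

def enclosing_cell(col_min, row_min, col_max, row_max, bits=16):
    """Smallest quadtree cell containing the box, as (level, code prefix).

    Level 0 is the root; level `bits` is a single grid cell. The cell is
    found as the common prefix of the two corner codes, two bits per level.
    """
    a = morton_code(col_min, row_min, bits)
    b = morton_code(col_max, row_max, bits)
    level = bits
    while a != b:
        a >>= 2
        b >>= 2
        level -= 1
    return level, a

# On a 16 x 16 grid (bits=4), two boxes of the same 2 x 2 size:
aligned = enclosing_cell(2, 2, 3, 3, bits=4)     # fits a level-3 cell
straddling = enclosing_cell(7, 7, 8, 8, bits=4)  # crosses the centre lines
```

The second box is no larger than the first, yet it ends up at the root because it crosses the first subdivision lines, which is the situation the field tree is designed to avoid.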

25 Field Code. Field-tree numbers encode the spatial location and extent of an object (van Oosterom’s idea) in a single number: the Spatial Location Code. Store spatial objects with this code (exploiting physical clustering). Search: determine the fields which may contain the objects; search these.

26 Practical results. Search is based on intervals of field codes, with heuristics to reduce the number of intervals submitted for range queries. Tests with real data demonstrate a speed-up of search by factors of 10 to 100.
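
The slides do not say which heuristics were used. One plausible heuristic, shown here purely as an assumption, is to merge code intervals that touch or are separated by only a small gap, trading a few extra rows read for fewer range queries. The function name `merge_intervals` and the `max_gap` parameter are hypothetical.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals whose gap is at most `max_gap` codes.

    With max_gap=0 only overlapping or directly adjacent intervals are
    merged; a larger value also bridges small gaps, so the query reads a
    few codes outside the window in exchange for fewer range scans.
    """
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(iv) for iv in merged]

exact = merge_intervals([(12, 15), (3, 3), (6, 7), (4, 5)])
coarse = merge_intervals([(12, 15), (3, 3), (6, 7), (4, 5)], max_gap=4)
```

Each resulting interval would correspond to one range condition on the code column; any rows picked up from bridged gaps still have to be filtered against the actual query window.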

27 Modeling Issues. Assumption: relational DBMS, today's standard for implementation. Data model: relations, consisting of tuples of attribute values; relational calculus; SQL as 'universal data speak' (not really useful as a user query language). This is a data (value) oriented concept.

28 Object-orientation necessary. The OO concept is necessary for spatial and temporal databases, especially the cadastre: objects have identity in time (the parcel ID is the classical example); objects have attribute values; objects enter into relations.

29 Object ID centered. Object-oriented data models create a data model clash, similar to the clash between relational DBMS and sequential processing in conventional languages.

30 Object ID centered. The relations between objects and attribute values are functions from ID to attribute value; relations between objects are functions from ID to ID (of the related object). A concept of a representation of an object as a contiguous data space is not necessary, but may be useful for clustering using Spatial Location Codes. This approach seems to solve most of the OO model problems discussed in the literature.
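
The ID-centered view can be made concrete with plain mappings. In this sketch each attribute is a function (here a dictionary) from object ID to value, and each relation is a function from object ID to the ID of the related object; the identifiers, attribute names, and values are all invented for illustration.

```python
# Attribute functions: object ID -> attribute value (hypothetical data)
area_m2 = {"parcel:17": 640.0, "parcel:18": 410.5}
name = {"person:3": "A. Jansen"}

# Relation function: object ID -> ID of the related object
owned_by = {"parcel:17": "person:3", "parcel:18": "person:3"}

def owner_name(parcel_id):
    """Compose the relation with an attribute function."""
    return name[owned_by[parcel_id]]

def parcels_of(person_id):
    """Follow the relation in the inverse direction."""
    return sorted(p for p, owner in owned_by.items() if owner == person_id)

def total_area(person_id):
    """Combine the inverse relation with another attribute function."""
    return sum(area_m2[p] for p in parcels_of(person_id))
```

No object is ever materialized as one contiguous record: each mapping corresponds naturally to a two-column relational table keyed by ID, which is why the approach fits a relational DBMS.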

31 Future. What we can realize now: a spatio-temporal multi-user database for a single agency. How to deal with cooperating agencies? (Your achievements demonstrate the need for this.) What are the next questions?

32 The multi-agency database: data is shared; responsibility for the data is clearly identified; data is not centralized.

33 The multi-agency database: this is more than a distributed DB, because it requires a new transaction concept: the classical discussion of the 'long transaction', including distributed responsibility for data changes within a transaction. Concept: agencies send update proposals for data they cannot change themselves to the agency which is responsible.

34 Interoperability. Agencies must cooperate. So far, we exchange data; updates are not propagated! Future: interoperability, independent of the vendor of the software (the so-called Open GIS).

35 Interoperability as a technical problem. Computer network: agreement on base cooperation (network standard). GIS cooperation: data model and related concepts.

36 Interoperability as a semantics problem. What does the data mean? How to describe the data? How to describe the meaning of data in a formal language to be used in a computer?

37 Formal Language. Describing natural language with formal tools is not likely to be achieved soon. Sufficient for GIS: definitions for restricted user communities, e.g., agencies within a town.

38 Open GIS. Standards: development of industry-accepted standards in step with the rapid development of base technology. Cooperation of all GIS vendors. Goal: open systems.

39 Open GIS. Interoperability independent of vendor: storage of data under one system, analysis tools from another system.

40 Open GIS 2. Interoperability independent of agency. Needs cooperation of user communities. Major users are already working in the Open GIS Consortium to ensure that their application concepts are standardized.

41 GIS user organizations gain from Open GIS: standardized environments to solve application problems; accumulation of knowledge of the application domain; cooperation of agencies in Europe (and export of knowledge). A cadastral special interest group is being discussed in OGC.

42 GIPSIE Project. EU project (DG III: Information Technology) to promote Open GIS within the GI industry and user community in Europe, to bring European issues into the OGC process, and to contribute with research to the Open GIS standards. Participation by European companies and agencies is required. Contacts: Andrew Frank (TU Vienna), Werner Kuhn (U Muenster).

