1 Introduction to Spatial Databases
Donghui Zhang, CCIS, Northeastern University

2 What is a spatial database?
A database system that is optimized to store and query spatial objects:
–Point: a hotel, a car
–Line: a road segment
–Polygon: landmarks, layout of VLSI
[Figures: VLSI Layout, Road Network, Satellite Image]

3 Are spatial databases useful?
Geographical Information Systems
–e.g. data: road network and places of interest.
–e.g. usage: driving directions, emergency calls, standalone applications.
Environmental Systems
–e.g. data: land cover, climate, rainfall, and forest fire.
–e.g. usage: find the total rainfall.
Corporate Decision-Support Systems
–e.g. data: store locations and customer locations.
–e.g. usage: determine the optimal location for a new store.
Battlefield Soldier Monitoring Systems
–e.g. data: locations of soldiers (with or without medical equipment).
–e.g. usage: for each soldier with medical equipment, monitor the soldiers who may need help from that soldier.

4 MapQuest.com: shortest-path query, fastest-path query

5 Driving directions as you go. Find the nearest Wal-Mart or hospital. (NN query)

6 ArcGIS 9.2, ESRI: range query

8 Aggregation query

10 Optimal-location query

12 [Figure: Bob, John, George, Bill, Mike] NN(Bob) = George

13 [Figure: Bob, John, George, Bill, Mike] RNN query: who will seek help from me? RNN(Bob) = {John, Mike}

14 And beyond the “space” …
2004 NBA dataset*: each player has 17 attributes.
“Spatial data”: an object is a point in a 17-dimensional space.
Who are the best players?
–i.e. those not “dominated” by any other player. (Skyline query)
Name              Points  Rebounds  Assists  Steals  …
Tracy McGrady       2003       484      448     135  …
Kobe Bryant         1819       392      398      86  …
Shaquille O'Neal    1669       760      200      36  …
Yao Ming            1465       669       61      34  …
Dwyane Wade         1854       397      520     121  …
Steve Nash          1165       249      861      74  …
* www.databaseBasketball.com
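As a minimal sketch of the dominance test behind the skyline query, the following uses only the four attributes and six players listed on the slide. It assumes the usual definition of dominance (at least as good in every attribute, strictly better in at least one) and that larger values are better; the names `dominates` and `skyline` are illustrative, not from the slides.

```python
from typing import Dict, List, Tuple

Stats = Tuple[int, ...]

# Four of the 17 attributes, as listed on the slide:
# (points, rebounds, assists, steals).
PLAYERS: Dict[str, Stats] = {
    "Tracy McGrady": (2003, 484, 448, 135),
    "Kobe Bryant": (1819, 392, 398, 86),
    "Shaquille O'Neal": (1669, 760, 200, 36),
    "Yao Ming": (1465, 669, 61, 34),
    "Dwyane Wade": (1854, 397, 520, 121),
    "Steve Nash": (1165, 249, 861, 74),
}


def dominates(a: Stats, b: Stats) -> bool:
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(players: Dict[str, Stats]) -> List[str]:
    """Quadratic scan: keep each player that no other player dominates."""
    return [
        name
        for name, stats in players.items()
        if not any(
            dominates(other, stats)
            for other_name, other in players.items()
            if other_name != name
        )
    ]


if __name__ == "__main__":
    print(skyline(PLAYERS))
```

On these four attributes, Kobe Bryant is dominated by Tracy McGrady and Yao Ming by Shaquille O'Neal, so the other four players form the skyline. The result over all 17 attributes and the full dataset may differ.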

20 Research goals in spatial databases
Support spatial database queries efficiently!
–range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query, …
Which statement is the best in a large spatial database?
(a) Both an O(n²) algorithm and an O(n) algorithm are efficient.
(b) An O(n²) algorithm is not efficient, but an O(n) algorithm is.
(c) Neither an O(n²) algorithm nor an O(n) algorithm is efficient.
Answer: (c)! Even a linear algorithm is not efficient!

21 Research goals in spatial databases
Example of a linear algorithm: to find my nearest Wal-Mart, compare my location with all Wal-Marts in the world.
Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see if he is dominated).
Sample scenario:
–Disk page size: 8KB.
–Database size: 1GB = 131,072 disk pages.
–Let each disk I/O take 10⁻³ seconds.
O(n): 131 seconds ≈ 2 minutes. (Not efficient!)
O(n²): ≈ 200 days! (Out of the question!)
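The arithmetic in the sample scenario can be checked with a few lines. This sketch assumes, as the slide appears to, that n is the number of disk pages, that a linear algorithm costs n page reads, and that a quadratic one costs n² page reads; the constant names are illustrative.

```python
PAGE_SIZE_BYTES = 8 * 1024        # 8KB disk page
DATABASE_SIZE_BYTES = 1024 ** 3   # 1GB database
IO_TIME_SECONDS = 1e-3            # assumed cost of one disk I/O

# Number of disk pages in the database.
pages = DATABASE_SIZE_BYTES // PAGE_SIZE_BYTES

# Linear algorithm: one I/O per page.
linear_seconds = pages * IO_TIME_SECONDS

# Quadratic algorithm: one I/O per pair of pages.
quadratic_seconds = pages ** 2 * IO_TIME_SECONDS
quadratic_days = quadratic_seconds / (24 * 60 * 60)

if __name__ == "__main__":
    print(f"pages: {pages}")
    print(f"O(n): {linear_seconds:.0f} seconds")
    print(f"O(n^2): {quadratic_days:.0f} days")
```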

22 How can you do better than O(n)?
Answer: use (disk-based) index structures!
However, 1-dim index structures, e.g. the B+-tree, are not efficient.
E.g. to search for hotels in Boston…

23 A 1-dim index is not good enough
Suppose a B+-tree exists on X.

24 B+-tree
[Figure: example B+-tree with root key 17, index keys 5, 13, 24, 30, and leaf entries 2*, 3*, 5*, 7*, 8*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*]
disk-based: stored on disk; only the needed part is loaded into memory.
paginated: every node is a disk page of fixed size (e.g. 8KB).
balanced: all leaf nodes are at the same distance from the root.
dynamically updateable: supports dynamic insertion/deletion.
leaf-storage: all records are stored in leaf nodes.
min-capacity: every node (except the root) is at least half full.

25 B+-tree
[Figure: the same B+-tree as on slide 24]
Exact match query: find the record with key=15.
Load into memory the nodes along a single path from root to leaf.

26 B+-tree
[Figure: the same B+-tree as on slide 24]
Range search query: find the records with keys in [15, 25].
Note that leaf nodes are linked together. So a range search = exact match + horizontal scan.
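The "exact match + horizontal scan" idea can be sketched in memory as follows. This is a simplified, static stand-in: the `build` helper bulk-loads a tree from sorted keys and does not implement insertion, deletion, disk pages, or the min-capacity rule, and all class and function names are assumptions for illustration. The keys are the leaf entries from the slide's figure, but the tree shape built here is not necessarily the one drawn.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


@dataclass
class Internal:
    keys: List[int]                          # separator keys
    children: List[Union["Internal", Leaf]]  # len(children) == len(keys) + 1


Node = Union[Internal, Leaf]


def build(sorted_keys: List[int], capacity: int = 4) -> Node:
    """Bulk-load a static tree bottom-up (hypothetical helper)."""
    if not sorted_keys:
        return Leaf([])
    leaves = [Leaf(sorted_keys[i:i + capacity])
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level: List[Node] = list(leaves)
    mins = [leaf.keys[0] for leaf in leaves]  # smallest key under each node
    while len(level) > 1:
        parents: List[Node] = []
        parent_mins: List[int] = []
        for i in range(0, len(level), capacity):
            group_mins = mins[i:i + capacity]
            parents.append(Internal(group_mins[1:], level[i:i + capacity]))
            parent_mins.append(group_mins[0])
        level, mins = parents, parent_mins
    return level[0]


def find_leaf(node: Node, key: int) -> Leaf:
    """Exact-match descent: follow a single root-to-leaf path."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_search(root: Node, lo: int, hi: int) -> List[int]:
    """Descend to the leaf for lo, then scan right along the leaf links."""
    leaf: Optional[Leaf] = find_leaf(root, lo)
    result: List[int] = []
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return result
            if key >= lo:
                result.append(key)
        leaf = leaf.next
    return result


if __name__ == "__main__":
    keys = [2, 3, 5, 7, 8, 14, 16, 19, 20, 22, 24, 27, 29, 33, 34, 38, 39]
    root = build(keys)
    print(range_search(root, 15, 25))
```

The point of the sketch is that only one root-to-leaf path plus the leaves overlapping the range are touched, rather than every key.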

29 A 1-dim index is not good enough
Suppose a B+-tree exists on Y.
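To illustrate why an index on X alone falls short, the sketch below uses a sorted list with `bisect` as a stand-in for a B+-tree on X. The data are hypothetical (uniform random points in the unit square, not from the slides), and the query rectangle plays the role of "Boston". The index can only narrow the search to a vertical strip, so most candidates it returns fail the Y test.

```python
import random
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def make_points(n: int, seed: int = 0) -> List[Point]:
    """Hypothetical data: n uniform random points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def build_x_index(points: List[Point]) -> Tuple[List[float], List[Point]]:
    """Stand-in for a B+-tree on X: points sorted by x plus the x keys."""
    by_x = sorted(points)
    return [p[0] for p in by_x], by_x


def search_with_x_index(index: Tuple[List[float], List[Point]],
                        query: Rect) -> Tuple[List[Point], List[Point]]:
    """Return (candidates fetched via the X index, true hits)."""
    xs, by_x = index
    x_lo, y_lo, x_hi, y_hi = query
    # The index narrows the search to the strip x_lo <= x <= x_hi only.
    candidates = by_x[bisect_left(xs, x_lo):bisect_right(xs, x_hi)]
    # Every candidate must still be checked against the Y range.
    hits = [p for p in candidates if y_lo <= p[1] <= y_hi]
    return candidates, hits


if __name__ == "__main__":
    index = build_x_index(make_points(10_000))
    candidates, hits = search_with_x_index(index, (0.45, 0.45, 0.55, 0.55))
    print(f"fetched {len(candidates)} candidates for {len(hits)} hits")
```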

30 Solution: spatial index!
E.g. the R-tree, the hB-tree.
Similar to the B+-tree: disk-based, paginated, balanced, dynamically updateable, leaf-storage, min-capacity.
Different from the B+-tree: clusters objects which are close to each other in multiple dimensions (vs. one).

31 A leaf node in the B+-tree

32 A leaf node in the R-tree
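The effect of clustering in multiple dimensions can be sketched with leaf-level bounding rectangles. The code below is not the R-tree's dynamic insertion algorithm: it packs hypothetical random points into leaves with a simple sort-and-tile scheme, computes each leaf's minimum bounding rectangle (MBR), and answers a range query by opening only the leaves whose MBR intersects the query. A real R-tree would also prune at the upper levels instead of checking every leaf MBR; all names here are illustrative.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def make_points(n: int, seed: int = 0) -> List[Point]:
    """Hypothetical data: n uniform random points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(rect: Rect, p: Point) -> bool:
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def pack_leaves(points: List[Point], capacity: int) -> List[List[Point]]:
    """Sort-and-tile packing: vertical slices by x, then leaves by y."""
    if not points:
        return []
    n_leaves = math.ceil(len(points) / capacity)
    slice_size = math.ceil(math.sqrt(n_leaves)) * capacity
    by_x = sorted(points)
    leaves: List[List[Point]] = []
    for i in range(0, len(by_x), slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        leaves.extend(strip[j:j + capacity]
                      for j in range(0, len(strip), capacity))
    return leaves


def range_query(leaves: List[List[Point]],
                query: Rect) -> Tuple[List[Point], int]:
    """Return (hits, number of leaves opened)."""
    hits: List[Point] = []
    visited = 0
    for leaf in leaves:
        if intersects(mbr(leaf), query):  # skip leaves that cannot match
            visited += 1
            hits.extend(p for p in leaf if contains(query, p))
    return hits, visited


if __name__ == "__main__":
    leaves = pack_leaves(make_points(10_000), capacity=100)
    hits, visited = range_query(leaves, (0.45, 0.45, 0.55, 0.55))
    print(f"opened {visited} of {len(leaves)} leaves for {len(hits)} hits")
```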

33 Selected spatial queries (I)
Range query: find the objects in a given range.
Aggregation query: find some aggregate value (e.g. COUNT) of the objects in a given range.
NN query: find the nearest neighbor of a query location.
RNN query: find the objects that are closer to a given location than to any other object.
Shortest/fastest-path query: find the shortest/fastest path in a road network.
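The first four definitions can be made concrete with brute-force reference versions. These are the linear and quadratic baselines that an index is meant to beat, not efficient algorithms. The soldier coordinates are hypothetical, chosen only so that the results match the earlier example (NN(Bob) = George, RNN(Bob) = {John, Mike}); the function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)

# Hypothetical coordinates; the slides give only the names.
SOLDIERS: Dict[str, Point] = {
    "Bob": (0.0, 0.0),
    "John": (-1.2, 0.0),
    "George": (1.0, 0.0),
    "Bill": (1.5, 0.0),
    "Mike": (0.0, 1.3),
}


def range_query(objects: Dict[str, Point], rect: Rect) -> List[str]:
    """Names of the objects that lie inside the rectangle."""
    x_lo, y_lo, x_hi, y_hi = rect
    return [name for name, (x, y) in objects.items()
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]


def count_query(objects: Dict[str, Point], rect: Rect) -> int:
    """COUNT aggregate over the objects in the rectangle."""
    return len(range_query(objects, rect))


def nearest_neighbor(objects: Dict[str, Point], name: str) -> str:
    """The other object closest to the named one (linear scan)."""
    here = objects[name]
    others = [other for other in objects if other != name]
    return min(others, key=lambda other: math.dist(here, objects[other]))


def reverse_nearest_neighbors(objects: Dict[str, Point], name: str) -> List[str]:
    """Objects whose own nearest neighbor is the named one (quadratic)."""
    return [other for other in objects
            if other != name and nearest_neighbor(objects, other) == name]


if __name__ == "__main__":
    print(nearest_neighbor(SOLDIERS, "Bob"))
    print(reverse_nearest_neighbors(SOLDIERS, "Bob"))
    print(count_query(SOLDIERS, (-0.5, -0.5, 1.2, 1.5)))
```

Note that NN and RNN are not symmetric: George is Bob's nearest neighbor, but George's own nearest neighbor is Bill, so George is not in RNN(Bob).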

34 Selected spatial queries (II)
Optimal-location query: find the optimal location in a given region to build a new franchise store.
Skyline query: find the objects not dominated by any other object (an object is dominated if another object is at least as good in every dimension and strictly better in at least one).
Join query: find all pairs of intersecting objects, one from each dataset (e.g. find cities near lakes).

