Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Introduction to Spatial Databases Donghui Zhang CCIS Northeastern University.

Similar presentations


Presentation on theme: "1 Introduction to Spatial Databases Donghui Zhang CCIS Northeastern University."— Presentation transcript:

1 1 Introduction to Spatial Databases Donghui Zhang CCIS Northeastern University

2 2 What is spatial database? A database system that is optimized to store and query spatial objects: –Point: a hotel, a car –Line: a road segment –Polygon: landmarks, layout of VLSI VLSI LayoutRoad NetworkSatellite Image

3 3 Are spatial databases useful? Geographical Information Systems –e.g. data: road network and places of interest. –e.g. usage: driving directions, emergency calls, standalone applications. Environmental Systems –e.g. data: land cover, climate, rainfall, and forest fire. –e.g. usage: find total rainfall precipitation. Corporate Decision-Support Systems –e.g. data: store locations and customer locations. –e.g. usage: determine the optimal location for a new store. Battlefield Soldier Monitoring Systems –e.g. data: locations of soldiers (w/wo medical equipments). –e.g. usage: monitor soldiers that may need help from each one with medical equipment.

4 4 MapQuest.com Shortest-Path Query Fastest-Path Query

5 5 Driving directions as you go. Find nearest Wal-Mart or hospital. NN Query

6 6 ArcGIS 9.2, ESRI Range query

7 7 Are spatial databases useful? Geographical Information Systems –e.g. data: road network and places of interest. –e.g. usage: driving directions, emergency calls, standalone applications. Environmental Systems –e.g. data: land cover, climate, rainfall, and forest fire. –e.g. usage: find total rainfall precipitation. Corporate Decision-Support Systems –e.g. data: store locations and customer locations. –e.g. usage: determine the optimal location for a new store. Battlefield Soldier Monitoring Systems –e.g. data: locations of soldiers (w/wo medical equipments). –e.g. usage: monitor soldiers that may need help from each one with medical equipment.

8 8 Aggregation query

9 9 Are spatial databases useful? Geographical Information Systems –e.g. data: road network and places of interest. –e.g. usage: driving directions, emergency calls, standalone applications. Environmental Systems –e.g. data: land cover, climate, rainfall, and forest fire. –e.g. usage: find total rainfall precipitation. Corporate Decision-Support Systems –e.g. data: store locations and customer locations. –e.g. usage: determine the optimal location for a new store. Battlefield Soldier Monitoring Systems –e.g. data: locations of soldiers (w/wo medical equipments). –e.g. usage: monitor soldiers that may need help from each one with medical equipment.

10 10 Optimal Location query

11 11 Are spatial databases useful? Geographical Information Systems –e.g. data: road network and places of interest. –e.g. usage: driving directions, emergency calls, standalone applications. Environmental Systems –e.g. data: land cover, climate, rainfall, and forest fire. –e.g. usage: find total rainfall precipitation. Corporate Decision-Support Systems –e.g. data: store locations and customer locations. –e.g. usage: determine the optimal location for a new store. Battlefield Soldier Monitoring Systems –e.g. data: locations of soldiers (w/wo medical equipments). –e.g. usage: monitor soldiers that may need help from each one with medical equipment.

12 12 Bob John George Bill Mike NN(Bob) = George

13 13 Bob John George Bill Mike RNN query Who will seek help from me? RNN(Bob) = {John, Mike}

14 14 And beyond the “space” … 2004 NBA dataset*: each player has 17 attributes “Spatial Data”: an object is a point in a 17-dimensional space Who are the best players? –i.e. not “ dominated ” by any other player. NamePointsReboundsAssistsSteals …… Tracy McGrady2003484448135 …… Kobe Bryant181939239886 …… Shaquille O'Neal166976020036 …… Yao Ming14656696134 …… Dwyane Wade1854397520121 …… Steve Nash116524986174 …… * www.databaseBasketball.com Skyline query

15 15 And beyond the “space” … 2004 NBA dataset*: each player has 17 attributes “Spatial Data”: an object is a point in a 17-dimensional space Who are the best players? –i.e. not “ dominated ” by any other player. NamePointsReboundsAssistsSteals …… Tracy McGrady2003484448135 …… Kobe Bryant181939239886 …… Shaquille O'Neal166976020036 …… Yao Ming14656696134 …… Dwyane Wade1854397520121 …… Steve Nash116524986174 …… * www.databaseBasketball.com Skyline query

16 16 And beyond the “space” … 2004 NBA dataset*: each player has 17 attributes “Spatial Data”: an object is a point in a 17-dimensional space Who are the best players? –i.e. not “ dominated ” by any other player. NamePointsReboundsAssistsSteals …… Tracy McGrady2003484448135 …… Kobe Bryant181939239886 …… Shaquille O'Neal166976020036 …… Yao Ming14656696134 …… Dwyane Wade1854397520121 …… Steve Nash116524986174 …… * www.databaseBasketball.com Skyline query

17 17 And beyond the “space” … 2004 NBA dataset*: each player has 17 attributes “Spatial Data”: an object is a point in a 17-dimensional space Who are the best players? –i.e. not “ dominated ” by any other player. NamePointsReboundsAssistsSteals …… Tracy McGrady2003484448135 …… Kobe Bryant181939239886 …… Shaquille O'Neal166976020036 …… Yao Ming14656696134 …… Dwyane Wade1854397520121 …… Steve Nash116524986174 …… * www.databaseBasketball.com Skyline query

18 18 And beyond the “space” … 2004 NBA dataset*: each player has 17 attributes “Spatial Data”: an object is a point in a 17-dimensional space Who are the best players? –i.e. not “ dominated ” by any other player. NamePointsReboundsAssistsSteals …… Tracy McGrady2003484448135 …… Kobe Bryant181939239886 …… Shaquille O'Neal166976020036 …… Yao Ming14656696134 …… Dwyane Wade1854397520121 …… Steve Nash116524986174 …… * www.databaseBasketball.com Skyline query

19 19 And beyond the “space” … 2004 NBA dataset*: each player has 17 attributes “Spatial Data”: an object is a point in a 17-dimensional space Who are the best players? –i.e. not “ dominated ” by any other player. NamePointsReboundsAssistsSteals …… Tracy McGrady2003484448135 …… Kobe Bryant181939239886 …… Shaquille O'Neal166976020036 …… Yao Ming14656696134 …… Dwyane Wade1854397520121 …… Steve Nash116524986174 …… * www.databaseBasketball.com Skyline query

20 20 Research goals in spatial databases Support spatial database queries efficiently! –range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query, … Which statement is the best in a large spatial database? (a) Both an O(n 2 ) algorithm and an O(n) algorithm are efficient. (b) An O(n 2 ) algorithm is not efficient, but an O(n) algorithm is. (c) Neither an O(n 2 ) algorithm nor an O(n) algorithm is efficient. Answer: (c)! Even a linear algorithm is not efficient!

21 21 Research goals in spatial databases Example of a linear algorithm: to find my nearest Wal-mart, compare my location with all Wal-marts in the world. Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see if it is dominated). Sample scenario: –Disk page size: 8KB. –Database size: 1GB = 131,072 disk page. –Let each disk I/O be 10 -3 second. O(n): 131 seconds  2 minutes. (Not efficient!) O(n 2 ):  200 days! (Out of the question!)

22 22 How can you do better than O(n)? Answer: use (disk-based) index structures! However, 1-dim index structures, e.g. the B+- tree, are not efficient. E.g. to search for hotels in Boston…

23 23 A 1-dim index is not good enough Suppose a B+-tree exists on X.

24 24 Suppose a B+-tree exists on X. A 1-dim index is not good enough

25 25 Content The R-tree –Range Query –Aggregation Query NN Query Skyline Query Highlights of Our Research

26 26 R-Tree Motivation 2 0 46 8 10 2 4 6 8 x axis y axis b c a d e f g h i j k l m Range query: find the objects in a given range. E.g. find all hotels in Boston. No index: scan through all objects. NOT EFFICIENT!

27 27 R-Tree: Clustering by Proximity

28 28 R-Tree

29 29 R-Tree

30 30 Range Query 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 1 d e f g h i j k l m E 2 a b cd e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7

31 31 Range Query 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 1 d e f g h i j k l m E 2 a b cd e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7

32 32 Aggregation Query Given a range, find some aggregate value of objects in this range. COUNT, SUM, AVG, MIN, MAX E.g. find the total number of hotels in Massachusetts. Straightforward approach: reduce to a range query. Better approach: along with each index entry, store aggregate of the sub-tree.

33 33 Aggregation Query 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 1 d e f g h i j k l m E 2 a b cd e E :8 1 E :5 2 E :3 3 E :2 4 E :3 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E :3 6 E :2 7

34 34 Aggregation Query 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 1 d e f g h i j k l m E 2 a b cd e E :8 1 E :5 2 E :3 3 E :2 4 E :3 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E :3 6 E :2 7 Subtree pruned!

35 35 Content The R-tree –Range Query –Aggregation Query NN Query Skyline Query Highlights of Our Research

36 36 Nearest Neighbor (NN) Query Given a query location q, find the nearest object. E.g.: given a hotel, find its nearest bar. q a

37 37 Minimum distance between q and an MBR. It is an lower bound of d(o, q) for every object o in E 1. A Useful Metric: MINDIST E1E1 q MINDIST(q, E 1 )

38 38 NN Basic Algorithm Keep a heap H of index entries and objects, ordered by MINDIST. Initially, H contains the root. While H   –Extract the element with minimum MINDIST –If it is an index entry, insert its children into H. –If it is an object, return it as NN. End while E1E1 q

39 39 NN Query Example E 1 1 E 2 2 Visit Root ActionHeap 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 3 d e f g h i j k l m query E 4 E 5 E 1 E 2 E 6 E 7 12 5 95 2 13 a b c d e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7 2 10 13

40 40 NN Query Example E 1 1 E 2 2 Visit Root follow E 1 E 2 2 E 5 3 E 5 5 E 9 4 ActionHeap 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 3 d e f g h i j k l m query E 4 E 5 E 1 E 2 E 6 E 7 12 5 95 2 13 a b c d e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7 2 10 13

41 41 NN Query Example E 1 1 E 2 2 Visit Root follow E 1 E 2 2 E 5 3 E 5 5 E 9 4 ActionHeap follow E 2 E 2 6 E 5 3 E 5 5 E 9 4 E 13 7 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 3 d e f g h i j k l m query E 4 E 5 E 1 E 2 E 6 E 7 12 5 95 2 13 a b c d e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7 2 10 13

42 42 NN Query Example E 1 1 E 2 2 Visit Root follow E 1 E 2 2 E 5 3 E 5 5 E 9 4 ActionHeap follow E 2 E 2 6 E 5 3 E 5 5 E 9 4 E 13 7 follow E 6 j 10 i 2 E 5 3 E 5 5 E 9 4 E 13 7 k 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 3 d e f g h i j k l m query E 4 E 5 E 1 E 2 E 6 E 7 12 5 95 2 13 a b c d e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7 2 10 13

43 43 NN Query Example E 1 1 E 2 2 Visit Root follow E 1 E 2 2 E 5 3 E 5 5 E 9 4 ActionHeap follow E 2 E 2 6 E 5 3 E 5 5 E 9 4 E 13 7 follow E 6 Report i and terminate j 10 i 2 E 5 3 E 5 5 E 9 4 E 13 7 k 2 0 46 8 10 2 4 6 8 x axis y axis b c a E 3 d e f g h i j k l m query E 4 E 5 E 1 E 2 E 6 E 7 12 5 95 2 13 a b c d e E 1 E 2 E 3 E 4 E 5 Root E 1 E 2 E 3 E 4 f g h E 5 l m E 7 i j k E 6 E 6 E 7 2 10 13

44 44 Content The R-tree –Range Query –Aggregation Query NN Query Skyline Query Highlights of Our Research

45 45 Skyline of Manhattan Which buildings can we see? –not dominated (further away and shorter)

46 46 Which one is better? –i or h? (i, because its price and distance dominate those of h) –i or k? A skyline example: best hotels

47 47 The skyline: a, i, k. A skyline example: best hotels

48 48 Branched and Bound Skyline (BBS) Assume all points are indexed in an R-tree. mindist(MBR) = the L 1 distance between its lower-left corner and the origin.

49 49 Each heap entry keeps the mindist of the MBR. Branched and Bound Skyline (BBS)

50 50 Example of BBS Process entries in ascending order of their mindists.

51 51 Example of BBS

52 52 Example of BBS

53 53 Example of BBS

54 54 Example of BBS

55 55 Example of BBS

56 56 Content The R-tree –Range Query –Aggregation Query NN Query Skyline Query Highlights of Our Research

57 57 The Compressed Skycube [SIGMOD’06] Goal: support skyline queries for an arbitrary subset of dimensions. Pre-computing all skylines: –too much space –expensive update The Compressed Skycube is a very compact representation of all skylines, with efficient query and update support.

58 58 The Optimal-Location Query [SSTD’05, VLDB’06] The optimal location, of a potential new store, can be defined as a location which –maximizes the number of customers who will be “attracted”, or –maximizes the combined saving for the customers in their traveling distance to the nearest store. There seem to have infinite number of candidate locations to check. Efficient algorithms to find exact answers.

59 59 Continuous RNN Monitoring [ICDE’06, ICDE’07] In a battlefield, the RNNs of a soldier with medical equipment are the soldiers that may need to receive help from him. To continuously monitor the RNNs in real time while all objects are moving is challenging. We proposed solution to the monochromatic case. Cooperated with Univ. of Minnesota to solve the bichromatic case.

60 60 Fastest-path computation [ICDE’06] MapQuest provides driving directions without asking leaving time. During rush hour, the best route should be different. Suppose each road segment has a speed pattern. We provide solutions for finding the fastest path, with a leaving time INTERVAL. “I may leave for work some time between 7 and 9. Suggest all fastest paths, e.g. if leaving during [7:43, 8:06], take route A, otherwise take route B”.

61 61 Summary Spatial database has many practical applications. Spatial database research aims to design efficient algorithms for various queries. The talk mentioned a few (range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, and skyline query). There are much more -- an on-going research field.


Download ppt "1 Introduction to Spatial Databases Donghui Zhang CCIS Northeastern University."

Similar presentations


Ads by Google