Spatial Databases - Indexing


1 Spatial Databases - Indexing
Spring, 2015 Ki-Joune Li

2 What is Indexing?
Indexing: a fight against time.
Example: suppose that you have a copy of Hamlet and want to know the name of Hamlet’s father.
Without an index: full (sequential) scan of the book.
With an index: direct access to the page.

3 Some Constraints
Modern databases: very large volume, e.g. several petabytes.
Storage on disk: inevitable, even in a main-memory database system, but slow compared with main memory (milliseconds vs. nanoseconds).
What should we do? Minimize the number of disk accesses.

4 The Objective of Indexing
Query condition -> indexing -> disk address (block number) -> database on disk

5 Classification of Indexing
According to the type of query and data: alphanumeric, image, spatial.
Example of a query with a spatial predicate: What is the nearest post office to the Louvre Museum?
Spatial query -> spatial index -> disk address (block number) -> database on disk

6 Spatial Query
Sophisticated types of spatial query:
One-scan queries: region query (containment, intersection), k-nearest-neighbor query.
Multi-scan queries (joins): spatial join, distance join.
Spatial query processing is tightly coupled with the spatial indexing method.

7 Spatial Processing Strategy
Filtering and refinement strategy:
Filtering: the spatial query is evaluated against an index built on simplified geometry, which yields candidates.
Refinement: the geometry of each candidate is verified against the complete data, which yields the result.
Benefits: 1. A lighter index, e.g. < 1 MB 2. Unnecessary disk accesses are removed
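The two phases can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slide: the "index" is a plain dictionary from object id to MBR, the "complete data" is a dictionary of point sets, and the names `MBR`, `mbr_of`, and `region_query` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "MBR") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains_point(self, p) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def mbr_of(points) -> MBR:
    """Simplification of geometry: bounding rectangle of a point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return MBR(min(xs), min(ys), max(xs), max(ys))


def region_query(index, database, window):
    # Filtering: consult only the light index (object id -> MBR).
    candidates = [oid for oid, box in index.items() if box.intersects(window)]
    # Refinement: fetch the complete geometry of the candidates only.
    result = [oid for oid in candidates
              if any(window.contains_point(p) for p in database[oid])]
    return candidates, result


# Hypothetical data: each object is a set of points (its exact geometry).
database = {"a": [(1, 1), (2, 2)], "b": [(0, 9), (9, 0)], "c": [(20, 20), (21, 22)]}
index = {oid: mbr_of(pts) for oid, pts in database.items()}
candidates, result = region_query(index, database, MBR(0, 0, 5, 5))
print(candidates, result)
```

Object "b" passes the filter because its MBR intersects the window, but refinement rejects it; object "c" is never fetched at all.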

8 Classification of Spatial Indexing Methods
Hashing and indexing: index (in the wide sense) = hashing + indexing (in the narrow sense).
Space decomposition vs. MBR: decomposition covers the whole space; a bounding rectangle covers only the area of interest.
Dimensionality: no transformation; transformation to a higher dimension; transformation to a lower dimension (linearization).

9 Indexing vs. Hashing
Hashing: 1. b = h(r.key) 2. Store(r, b)
The block number is determined by the hashing function or mechanism. Only for a primary index. Search by the hashing function.
Indexing (in the narrow sense): 1. b = Store(r) 2. Insert(B, (r.key, b))
The block number is independent of the indexing mechanism. For a primary or secondary index. Search by a data structure called an index.
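The two store procedures can be contrasted in a small sketch. Everything concrete here is an assumption for illustration: blocks are Python lists, `h` is a toy modulo hash, a dictionary stands in for the index structure B (a real system would use something like a B-tree), and hash-block overflow is ignored.

```python
BLOCK_CAPACITY = 2  # records per block (hypothetical value)


def h(key: int, n_blocks: int) -> int:
    """Toy hashing function: the key alone decides the block number."""
    return key % n_blocks


def hash_store(blocks, record):
    b = h(record["key"], len(blocks))   # 1. b = h(r.key)
    blocks[b].append(record)            # 2. Store(r, b)
    return b


def hash_search(blocks, key):
    return [r for r in blocks[h(key, len(blocks))] if r["key"] == key]


def heap_store(blocks, record):
    """Store(r): append to the last block with free space, return its number."""
    if not blocks or len(blocks[-1]) >= BLOCK_CAPACITY:
        blocks.append([])
    blocks[-1].append(record)
    return len(blocks) - 1


def index_store(blocks, index, record):
    b = heap_store(blocks, record)                 # 1. b = Store(r)
    entry = index.setdefault(record["key"], [])    # 2. Insert(B, (r.key, b))
    if b not in entry:
        entry.append(b)
    return b


def index_search(blocks, index, key):
    return [r for b in index.get(key, []) for r in blocks[b] if r["key"] == key]


hashed_blocks = [[] for _ in range(5)]
hashed_at = hash_store(hashed_blocks, {"key": 12, "name": "x"})

heap_blocks, index = [], {}
stored_at = [index_store(heap_blocks, index, {"key": k}) for k in (12, 7, 30)]
print(hashed_at, stored_at)
```

In the hashed file the key fixes the placement, so only one such organization (a primary one) is possible; in the indexed file the placement is free, so further indexes on other keys could point into the same blocks.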

10 Decomposition vs. Bounding Region

11 Decomposition Methods
Grid file: an extension of hashing to 2-D. Variations: fixed grid, grid file, multi-level grid file.
Hierarchical data structures: KD-tree, quadtree, skd-tree, etc.

12 Fixed Grid
The simplest method: minimum data for hashing, with one disk page per grid cell.
Processing a query window: 1. Find intersecting grids 2. Find corresponding blocks 3. Read objects from the blocks 4. Refinement
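The four query steps can be sketched as follows. This is a minimal sketch under assumptions: square cells of a fixed size, point objects only, a dictionary entry per cell standing in for one disk page, and a query window that lies inside the data space. The class name `FixedGrid` and the sample points are hypothetical.

```python
import math


class FixedGrid:
    def __init__(self, cell_size, width, height):
        self.cell = cell_size
        self.nx = math.ceil(width / cell_size)
        self.ny = math.ceil(height / cell_size)
        self.blocks = {}  # (i, j) -> list of points; stands in for one disk page

    def _cell_of(self, x, y):
        # The cell number is computed directly from the coordinates (hashing).
        i = min(max(int(x // self.cell), 0), self.nx - 1)
        j = min(max(int(y // self.cell), 0), self.ny - 1)
        return i, j

    def insert(self, p):
        self.blocks.setdefault(self._cell_of(*p), []).append(p)

    def window_query(self, xmin, ymin, xmax, ymax):
        i0, j0 = self._cell_of(xmin, ymin)          # 1. find intersecting grids
        i1, j1 = self._cell_of(xmax, ymax)
        result, pages_read = [], 0
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                pages_read += 1                      # 2. corresponding block
                for x, y in self.blocks.get((i, j), []):   # 3. read objects
                    if xmin <= x <= xmax and ymin <= y <= ymax:  # 4. refinement
                        result.append((x, y))
        return result, pages_read


grid = FixedGrid(cell_size=10, width=50, height=40)
for p in [(5, 5), (12, 18), (25, 25), (44, 33), (18, 12)]:
    grid.insert(p)
found, pages = grid.window_query(10, 10, 22, 20)
print(found, pages)
```

Here three of the four pages read contribute nothing to the result, which illustrates how dead space turns into unnecessary disk accesses.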

13 Problems of Fixed Grid
Only for point objects: an object with measure (extent) needs duplicated storage, which degrades performance.
Large dead space: causes unnecessary disk accesses.
Not very flexible with respect to the data distribution.

14 Grid File
To overcome the problems of the fixed grid: reduce dead space within a cell; increase the blocking factor.
Directory:
Grid   Boundary            Block#
A      (0,0),(15,20)       Page 0
B      (15,0),(30,20)      Page 1
. . .
I      (30,28),(50,40)     Page 15

15 Blocking Factor
Blocking factor (Bf): the number of objects in a disk block. It is a key factor in performance because it determines the number of disk accesses.
How to increase Bf? Increase the block size (not always possible), or use packing.
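As a rough arithmetic illustration (the object counts are made up, and the model assumes fully packed blocks), the number of blocks a full scan must read falls in proportion to Bf:

```python
def blocks_needed(n_objects: int, bf: int) -> int:
    """Blocks read by a full scan when every block holds bf objects."""
    return (n_objects + bf - 1) // bf  # integer ceiling of n_objects / bf


print(blocks_needed(1_000_000, 50))   # 20000
print(blocks_needed(1_000_000, 100))  # 10000
```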

16 Problems of Grid File
Only for point objects.
Still large dead space.
Large size of directory:
Grid   Boundary            Block#
A      (0,0),(15,20)       Page 0
B      (15,0),(30,20)      Page 1
. . .
I      (30,28),(50,40)     Page 15

17 Hierarchical Decomposition
To overcome the size of the directory in the grid file: a hierarchical structure of the directory, which accelerates search.

18 KD-tree : Index
Extension of the binary tree to K dimensions (K = 2 for us).
Example: suppose Bf = 3. Each leaf node points to a disk page.
[Figure: KD-tree directory with split nodes x=20, y=20, y=10, x=30 and leaf pages A, B, C, D, E]

19 KD-tree : Search
[Figure: a search descending the KD-tree directory (split nodes x=20, y=20, y=10, x=30) to a leaf page]
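A KD-tree directory whose leaves act as disk pages can be sketched as below. The details are assumptions, not taken from the slides: an overflowing leaf is split at the median along an axis that alternates with depth, values smaller than the split value go left and all others go right, and the names `Leaf`, `Split`, `insert`, and `search` are hypothetical.

```python
BF = 3  # blocking factor: maximum number of points per leaf page


class Leaf:
    def __init__(self, points=None):
        self.points = points or []


class Split:
    def __init__(self, axis, value, left, right):
        self.axis, self.value, self.left, self.right = axis, value, left, right


def insert(node, p, depth=0):
    """Insert point p and return the (possibly new) root of this subtree."""
    if isinstance(node, Split):
        if p[node.axis] < node.value:
            node.left = insert(node.left, p, depth + 1)
        else:
            node.right = insert(node.right, p, depth + 1)
        return node
    node.points.append(p)
    if len(node.points) <= BF:
        return node
    axis = depth % 2  # alternate x (0) and y (1)
    pts = sorted(node.points, key=lambda q: q[axis])
    value = pts[len(pts) // 2][axis]
    left = [q for q in pts if q[axis] < value]
    right = [q for q in pts if q[axis] >= value]
    if not left:  # all points share this coordinate: keep the overfull leaf
        return node
    return Split(axis, value, Leaf(left), Leaf(right))


def search(node, p):
    """Exact-match search; returns (found, number of nodes visited)."""
    visited = 1
    while isinstance(node, Split):
        node = node.left if p[node.axis] < node.value else node.right
        visited += 1
    return p in node.points, visited


root = Leaf()
for point in [(5, 5), (25, 5), (10, 25), (30, 30), (15, 15), (22, 12)]:
    root = insert(root, point)
print(search(root, (10, 25)), search(root, (11, 11)))
```

Only one root-to-leaf path is followed per point search, so only one leaf page is read. Nothing in this sketch keeps the tree balanced, which is the skew problem a concentrated data distribution causes.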

20 Weak Points of KD-tree
Only for point objects. Dead space.
How to store the tree structure on disk: the blocking problem. Widely used as a main-memory index, rarely used as a disk-resident index.
Unbalanced tree: by Zipf’s law (or the 80/20 law) most events are concentrated, which leads to a highly skewed tree.

21 Quadtree
Extension of the KD-tree: the KD-tree uses a binary split; the quadtree uses a 4-way equi-split instead.
Example: Bf = 3. Each leaf node points to a disk page.
[Figure: quadtree with nodes A to J]

22 Weak Points of Quadtree
Same problems as the KD-tree, in addition to the lack of flexibility:
Only for point objects. Dead space.
How to store the tree structure on disk: the blocking problem. Widely used as a main-memory index, rarely used as a disk-resident index.
Unbalanced tree: by Zipf’s law (or the 80/20 law) most events are concentrated, which leads to a highly skewed tree.

23 Point Quadtree
A simple variation of the quadtree: a partition point is specified instead of an equi-split.
More adaptive to the distribution of objects; less skewed.
[Figure: point quadtree with partition points (10,20), (5,25), (35,10) and nodes A to J]

24 Linear Quadtree : Space-Filling Curve
A quadtree, but in another representation: linearization by a space-filling curve.
Curve types: N-order, Hilbert, column-wise.
Points (or cells) are linearized by their Peano key.

25 Linear Quadtree
Example: N-order curve. Computation of the Peano key by bit-interleaving:
1. Binary representation of the coordinates: (x, y) = (10, 01)
2. Bit-interleaving of x = 10 and y = 01: Peano key = 1001 (binary) = 9
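Bit-interleaving is short enough to write out. The sketch below assumes the convention implied by the slide's example, where the x bit comes before the y bit at each position; the function name `peano_key` and the `bits` parameter are hypothetical.

```python
def peano_key(x: int, y: int, bits: int) -> int:
    """Interleave the lowest `bits` bits of x and y, x bit first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key


print(peano_key(0b10, 0b01, bits=2))  # 9, i.e. binary 1001

# Sorting the four cells of a 2 x 2 grid by key traces the N shape.
cells = sorted(((x, y) for x in range(2) for y in range(2)),
               key=lambda c: peano_key(c[0], c[1], bits=1))
print(cells)
```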

26 MBR Methods
MBR (Minimum Bounding Rectangle): a two-dimensional geometric simplification of objects, given by the corners (X1min, X2min) and (X1max, X2max).
Covers only the region occupied by objects, not the whole space.
R-tree and its variants.

27 R-tree
Construction of an R-tree: a sequence of insertions, with upward splits.
Each leaf node points to a disk page.
[Figure: 2-D objects A to K and the corresponding R-tree]

28 Splitting in R-tree
Split the MBR in the case of overflow.
Line sweeping: compare Cost-X and Cost-Y to choose the splitting line; each side gets a new MBR.
Cost measures: area, perimeter, overlapping area.
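One way to read the line-sweeping split is sketched below: sort the entries along each axis, try every cut position, and keep the cheapest cut over both axes. The cost used here (overlapping area first, then total area) is an assumption chosen for illustration; the slide only lists the candidate measures. Boxes are `(xmin, ymin, xmax, ymax)` tuples and all function names are hypothetical.

```python
def union(boxes):
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def best_split_on_axis(boxes, axis, min_fill=1):
    """Sweep along one axis (0 = x, 1 = y) and return (cost, group1, group2)."""
    order = sorted(boxes, key=lambda b: b[axis])
    best = None
    for k in range(min_fill, len(order) - min_fill + 1):
        g1, g2 = order[:k], order[k:]
        m1, m2 = union(g1), union(g2)          # the two new MBRs
        cost = (overlap_area(m1, m2), area(m1) + area(m2))
        if best is None or cost < best[0]:
            best = (cost, g1, g2)
    return best


def split(boxes):
    # Compare Cost-X and Cost-Y and keep the cheaper splitting line.
    return min(best_split_on_axis(boxes, 0), best_split_on_axis(boxes, 1),
               key=lambda s: s[0])


overflowing = [(0, 0, 1, 1), (0, 2, 1, 3), (5, 0, 6, 1), (5, 2, 6, 3)]
cost, left_group, right_group = split(overflowing)
print(cost, left_group, right_group)
```

For this input a vertical splitting line gives two compact, non-overlapping MBRs of area 3 each, while the best horizontal line gives two of area 6 each.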

29 R-tree : Query Processing
[Figure: query region W over the R-tree with nodes A to K; entries whose MBR intersects W become candidates]
Refinement: read the exact geometry of each candidate from the database.
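The descent for a region query can be sketched on a small hand-built tree. Assumptions: boxes are `(xmin, ymin, xmax, ymax)` tuples, the tree is constructed directly rather than by insertion, and the class `RNode` and the object ids are hypothetical. The function returns candidates only; their exact geometry would still have to be checked.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class RNode:
    def __init__(self, entries, leaf):
        # leaf node: [(mbr, object_id)]; internal node: [(mbr, child RNode)]
        self.entries = entries
        self.leaf = leaf

    def mbr(self):
        boxes = [m for m, _ in self.entries]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))


def search(node, window, visited):
    """Collect ids of objects whose MBR intersects the window."""
    visited.append(node)
    candidates = []
    for mbr, item in node.entries:
        if intersects(mbr, window):
            if node.leaf:
                candidates.append(item)
            else:
                candidates.extend(search(item, window, visited))
    return candidates


leaf1 = RNode([((0, 0, 2, 2), "D"), ((3, 3, 4, 4), "E")], leaf=True)
leaf2 = RNode([((10, 10, 12, 12), "F"), ((11, 14, 13, 15), "G")], leaf=True)
root = RNode([(leaf1.mbr(), leaf1), (leaf2.mbr(), leaf2)], leaf=False)

visited = []
print(search(root, (1, 1, 3.5, 3.5), visited), len(visited))
```

The second leaf is never visited because its MBR does not intersect the window.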

30 Strength of R-tree
For point and non-point objects.
Good for non-uniform distributions.
Paged tree: a hierarchical structure, but balanced.
Less dead space than decomposition methods.

31 Weak Points of R-tree : Overlapping Area
Overlapping causes false matching: the search visits unnecessary nodes, which degrades performance.
[Figure: query region over the overlapping MBRs of nodes A to M]

32 Weak Points of R-tree : Dead Space
[Figure: query region over the MBRs of nodes A to M]
Node K is visited at least once even though it contains nothing in the query region.

33 Weak Points of R-tree : Bad Split
[Figure: a good split vs. a bad split]
1. Make the MBRs as compact as possible 2. Preserve spatial proximity as far as possible

34 Improvement of R-tree
Minimize: overlapping area, dead space.
Or: make it more compact; preserve spatial proximity.
Two approaches: packing (or bulk loading); good split or insertion strategies.

35 R*-tree : An Improvement of R-tree
Re-insertion strategy on overflow: when a newly inserted object causes an overflow, an entry is deleted from the node and re-inserted.

36 R*-tree : An Improvement of R-tree
Re-insertion strategy on overflow: after the object is re-inserted, the node is more compact.

37 R*-tree : An Improvement of R-tree
Compact: small overlapping area, small sum of MBR areas or perimeters, small dead space.
Stable: not much affected by the order of insertions.
The most widely used spatial indexing method.

38 Packing R-tree : Improvement of R-tree
Preprocessing that makes the R-tree more compact, instead of sequential insertions.
Methods: Hilbert R-tree, STR (Sort-Tile Recursive), uniformization.

39 Hilbert Packing
Hilbert curve: a space-filling curve.
Linearize spatial objects by their Peano key.
[Figure: N-order, Hilbert, and column-wise curves]

40 Hilbert Packing
Sort objects by Hilbert key, then pack them in a round-robin way.
Maximizes storage utilization; minimizes dead space and the sum of MBR areas.
Example: Bf = 3
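A sketch of the idea, under stated assumptions: objects are points with integer coordinates on a 2^order by 2^order grid, the Hilbert key is computed with the well-known iterative rotate-and-flip conversion, and "packing" is read as filling leaf pages one after another in key order. The names `hilbert_key` and `hilbert_pack` are hypothetical.

```python
def hilbert_key(x: int, y: int, order: int) -> int:
    """Position of cell (x, y) along the Hilbert curve on a 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve lines up
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_pack(points, order, bf=3):
    """Sort by Hilbert key and fill leaf pages with bf points each."""
    ordered = sorted(points, key=lambda p: hilbert_key(p[0], p[1], order))
    return [ordered[i:i + bf] for i in range(0, len(ordered), bf)]


points = [(0, 0), (3, 0), (1, 1), (0, 3), (2, 2), (3, 3), (1, 0)]
leaves = hilbert_pack(points, order=2, bf=3)
print(leaves)
```

Every page except possibly the last is full, which is where the high storage utilization comes from; neighbors along the curve are neighbors in space, which keeps the leaf MBRs small.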

41 STR (Sort-Tile Recursive)
Basic idea: “tile” the data space using vertical slices.
r: number of rectangles; n: blocking factor; P (number of leaf node pages) = ⌈r / n⌉
Example: suppose r = 25, n = 3. Then nTile = 9, nV = 3, nH = 3.
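The leaf level of STR can be sketched as follows. It assumes the usual formulation of the method: P = ceil(r / n) leaf pages, ceil(sqrt(P)) vertical slices ordered by x-center, and runs of n rectangles ordered by y-center inside each slice. That is consistent with the slide's example (r = 25, n = 3 gives 9 tiles in 3 slices), but the function name `str_pack` and the test data are hypothetical.

```python
import math


def str_pack(rects, n):
    """Group (xmin, ymin, xmax, ymax) rectangles into leaf pages of size n."""
    r = len(rects)
    p = math.ceil(r / n)                  # number of leaf pages (tiles)
    s = math.ceil(math.sqrt(p))           # number of vertical slices
    by_x = sorted(rects, key=lambda b: (b[0] + b[2]) / 2)
    slice_size = s * n                    # rectangles per vertical slice
    leaves = []
    for i in range(0, r, slice_size):
        vertical = sorted(by_x[i:i + slice_size], key=lambda b: (b[1] + b[3]) / 2)
        leaves.extend(vertical[j:j + n] for j in range(0, len(vertical), n))
    return p, s, leaves


# 25 unit squares on a 5 x 5 lattice (made-up data).
rects = [(2 * i, 2 * j, 2 * i + 1, 2 * j + 1) for i in range(5) for j in range(5)]
p, s, leaves = str_pack(rects, n=3)
print(p, s, len(leaves))
```

Only the leaf level is built here; the full method applies the same step recursively to the MBRs of the leaves to build the upper levels.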

42 Comparison : Hilbert Packing vs. STR
[Figure: Hilbert packing (HP) vs. STR, for large objects and for points]

43 Uniformization
Non-uniform distribution: has a negative effect on performance, but data in real applications is non-uniform.
Uniformization technique:
Step 1: Transform non-uniform data to uniform by STR
Step 2: Apply R-tree (or fixed grid)
Step 3: Transform the query region
Strengths: high storage utilization; very simple, with good performance.

44 Uniformization
[Figure: non-equi-width cells transformed to equi-width cells]
1. Area of each cell: identical 2. Number of objects within each cell: almost identical

45 Uniformization : Example
By Delaunay Triangulation By STR Original

46 Uniformization : Example
Original By STR

47 Query Processing by R-tree : Nearest Neighbor
[Figure: query point and searching space; minimum and 2nd minimum distances in 2-D]

48 Query Processing by R-tree : Nearest Neighbor
[Figure: branching into subtrees and pruning them by minimum distance]
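Branching and pruning by minimum distance can be sketched as a depth-first branch-and-bound search. The slides do not name a specific algorithm, so this is one common formulation under assumptions: objects are points (so the minimum distance to a leaf entry is exact), nodes are `(kind, entries)` tuples built by hand, and `mindist` and `nearest` are hypothetical names.

```python
import math


def mindist(q, mbr):
    """Smallest possible distance from point q to anything inside mbr."""
    dx = max(mbr[0] - q[0], 0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0, q[1] - mbr[3])
    return math.hypot(dx, dy)


def nearest(node, q, best=(math.inf, None)):
    """Return (distance, object_id) of the nearest object to q."""
    kind, entries = node  # ("leaf", [(mbr, id)]) or ("node", [(mbr, child)])
    # Branching: try the most promising entries first.
    for mbr, item in sorted(entries, key=lambda e: mindist(q, e[0])):
        d = mindist(q, mbr)
        if d >= best[0]:
            break  # Pruning: this entry and all later ones cannot be closer.
        best = (d, item) if kind == "leaf" else nearest(item, q, best)
    return best


leaf_a = ("leaf", [((1, 1, 1, 1), "a"), ((2, 5, 2, 5), "b")])
leaf_b = ("leaf", [((8, 8, 8, 8), "c"), ((9, 2, 9, 2), "d")])
root = ("node", [((1, 1, 2, 5), leaf_a), ((8, 2, 9, 8), leaf_b)])

print(nearest(root, (7, 7)))
```

For the query point (7, 7) the subtree of `leaf_a` is pruned without being read, because its minimum distance already exceeds the distance to "c".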

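49 Transformation to Higher Space
Transformation to a higher dimension: transform a non-point object into a point object.
This allows spatial indexing methods that are applicable only to point objects (e.g. the grid file) to be reused for non-point objects.
Example: a 1-D interval A = [Amin, Amax] becomes the 2-D point (Amin, Amax).
[Figure: intervals A, B, C plotted as points on Min/Max axes]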

50 Corner Transformation
From 2-D to 4-D:
1. Simplification by MBR
2. MBR ((Xmin, Ymin), (Xmax, Ymax)) to point (Xmin, Ymin, Xmax, Ymax)

51 Query Processing for Corner Transformation : 1-D Example
W Query : Find Contained Objects Max VI IV III A V II A I Min Amin Amax Region I : Wmax < Amin Region II : W  A Region III : Amax < Wmin Region IV : Amin < Wmin, Amax < Wmax Region V : Wmin < Amin, Wmax < Amax Region VI : A  W
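For the 1-D case, the containment query becomes an ordinary range query on the transformed points: an interval A lies inside W exactly when Wmin <= Amin and Amax <= Wmax. The sketch below assumes closed intervals and scans a dictionary where a real system would query a point index such as a grid file; all names and the sample intervals are hypothetical.

```python
def corner_transform(interval):
    """Map a 1-D interval [amin, amax] to the 2-D point (amin, amax)."""
    amin, amax = interval
    return (amin, amax)


def containment_range(window):
    """Rectangle in (Amin, Amax) space holding all intervals inside window."""
    wmin, wmax = window
    return (wmin, wmax), (wmin, wmax)  # range for Amin, range for Amax


def range_query(points, ranges):
    (lo0, hi0), (lo1, hi1) = ranges
    return [oid for oid, (p0, p1) in points.items()
            if lo0 <= p0 <= hi0 and lo1 <= p1 <= hi1]


intervals = {"A": (2, 4), "B": (1, 6), "C": (5, 9)}
points = {oid: corner_transform(iv) for oid, iv in intervals.items()}
contained = range_query(points, containment_range((1.5, 7)))
print(contained)
```

Interval B starts before the window and C ends after it, so only A is returned.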

52 Transformation to Lower Dimension : Linear Quadtree
1. Simplification of geometry
2. Compute the Peano key with the lower-left corner
3. If necessary, divide it and give a Peano key to each piece
4. Define the size of each piece according to the number of quadrants
5. Insert them into a B-tree
6. Query processing by the B-tree
[Figure labels: (22, 0), (28, 1), (23, 0), (0, 2)]

