Published by Emily Raspberry. Modified over 2 years ago.

1
Trees for spatial indexing Part 2 : SAMs

2
SAMs: R-Tree, X-Tree, TV-Tree, R*-Tree

3
Answering a question
The kd-trie is similar to the kd-tree; in the article the term was used for the kd-tree. The split position is not the middle of the region but is chosen at the median point. Because we work with points, we have no problem separating the elements.
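The median split described above can be sketched as follows. This is a minimal illustration, not the article's code: the names `KdNode` and `build_kdtree` are hypothetical, and cycling through the axes by depth is an assumption the slide does not state.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class KdNode:
    """One kd-tree node: a split point, the split axis, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["KdNode"], right: Optional["KdNode"]) -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KdNode]:
    """Build a kd-tree by splitting at the median point on each level."""
    if not points:
        return None
    # Assumption: the split axis cycles through the dimensions with depth.
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median point, not the middle of the region
    return KdNode(
        ordered[mid],
        axis,
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

Because every element is a point, it falls cleanly on one side of the split or is the split point itself, which is why separating the elements is unproblematic.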

4
UB-Tree range queries
The algorithm is:
– Find every region that intersects the query box q.
– If the region is a page, all objects in it that intersect q belong to the answer.
– Then find the last subcube in this region and move to its brother; if the brother intersects q, repeat the same loop on it.
– Then go up to the father of B and search again.

5
R-Tree
A special B+-Tree for spatial indexing. The performance of the R*-Tree decreases as the dimensionality grows. The R-tree access method is prohibitively slow for dimensions higher than 5.

6
Problems of (R-Tree based) index structures
It has been shown that overlap grows as the dimensionality increases. Intuitively, overlap means that some point queries have to follow multiple search paths.
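To make "multiple paths" concrete, the sketch below lists which directory rectangles a point query has to descend into. The rectangle representation (one `(low, high)` pair per dimension) and the function names are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

# Assumed representation: one (low, high) interval per dimension.
Rect = Tuple[Tuple[float, float], ...]


def contains(rect: Rect, point: Sequence[float]) -> bool:
    """True if the point lies inside the rectangle (borders included)."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, point))


def paths_to_follow(directory: List[Rect], point: Sequence[float]) -> List[int]:
    """Indices of all directory entries a point query must descend into."""
    return [i for i, rect in enumerate(directory) if contains(rect, point)]


directory = [((0, 2), (0, 2)), ((1, 3), (1, 3))]
single = paths_to_follow(directory, (0.5, 0.5))   # point in one rectangle only
multiple = paths_to_follow(directory, (1.5, 1.5))  # point in the overlap area
```

A query point inside the overlap region forces the search to follow both subtrees, which is the cost that overlap adds.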

7
Definition of overlap
Intuitively, overlap is the percentage of the volume that is covered by more than one directory hyperrectangle. This intuitive definition is directly correlated with query performance, because overlap implies multiple search paths.

8
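Definition of the overlap (2)
$$\mathrm{Overlap} = \frac{\left\| \bigcup_{i,j,\; i \neq j} (R_i \cap R_j) \right\|}{\left\| \bigcup_{i} R_i \right\|}$$
We take the volume of the union of all pairwise intersections of the MBRs and divide it by the volume of the union of all MBRs. But overlap in highly populated areas is much more critical than overlap in sparsely populated areas, so we count data points p instead of volume:
$$\mathrm{WeightedOverlap} = \frac{\left| \{ p \mid p \in \bigcup_{i,j,\; i \neq j} (R_i \cap R_j) \} \right|}{\left| \{ p \mid p \in \bigcup_{i} R_i \} \right|}$$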
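Both measures can be sketched in a few lines. This is an illustration under stated assumptions: rectangles are tuples of `(low, high)` intervals, the volume-based measure is computed only for the two-rectangle case (where the union of pairwise intersections is just one intersection), and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[Tuple[float, float], ...]  # assumed: one (low, high) per dimension


def volume(rect: Rect) -> float:
    """Product of the side lengths."""
    v = 1.0
    for lo, hi in rect:
        v *= hi - lo
    return v


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection rectangle, or None if a and b do not overlap."""
    sides = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo >= hi:
            return None
        sides.append((lo, hi))
    return tuple(sides)


def overlap_two(a: Rect, b: Rect) -> float:
    """Overlap for exactly two MBRs: ||a ∩ b|| / ||a ∪ b||."""
    inter = intersection(a, b)
    shared = volume(inter) if inter is not None else 0.0
    return shared / (volume(a) + volume(b) - shared)


def contains(rect: Rect, point: Sequence[float]) -> bool:
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, point))


def weighted_overlap(rects: List[Rect], points: List[Sequence[float]]) -> float:
    """Share of covered data points that lie in two or more MBRs."""
    covered = 0
    shared = 0
    for p in points:
        hits = sum(1 for r in rects if contains(r, p))
        if hits >= 1:
            covered += 1
        if hits >= 2:
            shared += 1
    return shared / covered if covered else 0.0


a = ((0.0, 2.0), (0.0, 2.0))
b = ((1.0, 3.0), (1.0, 3.0))
pts = [(0.5, 0.5), (1.5, 1.5), (1.2, 1.8), (2.5, 2.5), (0.5, 1.5), (2.5, 1.5)]
vol_based = overlap_two(a, b)               # intersection 1, union 7
point_based = weighted_overlap([a, b], pts)  # 2 of 6 points are shared
```

With clustered points the two values can differ considerably, since the point-based measure only reacts to overlap where data actually lies.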

9
Example (figure labels: 1, 1)
Overlap = (1/4) / 2 = 1/8 = 12.5 %
WeightedOverlap = 2 / 6 = 1/3 ≈ 33 %

10
Overlap / WeightedOverlap
The appropriate measure depends on the kind of data. For uniformly distributed data points, the Overlap measure can be used. Real data is often clustered, so WeightedOverlap is more accurate in that case.

11
X-Tree
Goal: avoid overlap in the directory. The X-Tree is a hybrid of a linear, array-like directory and a hierarchical, R-Tree-like directory. In low dimensions, a hierarchical organization of the directory is the most efficient. In high dimensions, a linear organization is more efficient.

12
X-Tree
The X-Tree has 3 types of nodes: data nodes, normal directory nodes, and supernodes. Supernodes avoid splits in the directory, so searching is faster. This is not the same as an R*-Tree with larger blocks, because the X-Tree creates larger nodes only when necessary.

13
X-Tree
[Figure: X-Tree structure showing supernodes, normal directory nodes, and data nodes]

14
Creation of supernodes
Supernodes are created only if there is no other way to avoid overlap during insertion.

15
TV-Tree (Telescopic-Vector tree)
The basic idea of the TV-tree is to use feature vectors that contract and extend dynamically (as in classification).

16
TV-Tree
The m-contraction of x is the vector $A_m x$, where $A_m$ is a contraction matrix. A natural choice of $A_m$ is
$$A_m = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \ddots & & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}$$
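A small sketch of the m-contraction follows. It assumes that the natural matrix is a truncation, that is, an m × n matrix with ones on its diagonal that keeps the first m features of an n-dimensional vector; this reading and the function names are assumptions, not taken from the slide.

```python
from typing import List


def contraction_matrix(m: int, n: int) -> List[List[float]]:
    """Assumed natural A_m: m x n, ones on the diagonal, zeros elsewhere."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(m)]


def contract(x: List[float], m: int) -> List[float]:
    """Return the m-contraction A_m x as a plain matrix-vector product."""
    a_m = contraction_matrix(m, len(x))
    return [sum(a * v for a, v in zip(row, x)) for row in a_m]


features = [3.0, 1.0, 4.0, 1.0, 5.0]
short = contract(features, 2)   # keeps the first two features
longer = contract(features, 4)  # "extending" the vector adds more features
```

Under this assumption, contracting and extending the vector simply changes how many leading features are used.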

17
Multiple shapes
We can use a sphere, for example, because it needs only a center and a radius r. It represents the set of points within Euclidean distance r of the center. The Euclidean distance is the special case of the L_p metrics with p = 2. For the L_1 metric (Manhattan distance), the sphere becomes a diamond shape. The TV-tree works with any L_p-sphere.
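The effect of p on the shape of the sphere can be checked with a short sketch. The function names are hypothetical, and the membership test assumes the sphere includes its border.

```python
from typing import Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """L_p distance; p = 2 is Euclidean, p = 1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def in_sphere(center: Sequence[float], radius: float,
              point: Sequence[float], p: float = 2.0) -> bool:
    """True if the point lies inside the L_p-sphere (border included)."""
    return lp_distance(center, point, p) <= radius


center = (0.0, 0.0)
corner = (0.6, 0.6)
inside_l2 = in_sphere(center, 1.0, corner, p=2.0)  # inside the circle
inside_l1 = in_sphere(center, 1.0, corner, p=1.0)  # outside the diamond
```

The same point is inside the Euclidean sphere but outside the L_1 diamond of equal radius, because the diamond cuts off the diagonal directions.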

18
TV-Tree principle
The TV-Tree treats the attributes asymmetrically, favoring the first few features over the rest. It can use any type of MBR (minimum bounding region): rectangle, cube, sphere, etc. In particular, it can use any L_p-sphere.

19
TV-Tree node structure
Each node represents the MBR of all its descendants (say, an L_p-sphere). Each region is represented by a center, which is a telescopic vector, and a radius. Hence the name TMBR (telescopic MBR).

20
TV-1-Tree example

21
TV-2-Tree example

22
TMBR
[Figure: example TMBRs with active dimensions y; x; z; x, z; x, y]

23
What is the best number of active dimensions?
The authors found that the best number of active dimensions was two.

24
TV-Tree conclusion
The TV-Tree accepts overlap, and therefore also multiple search paths. The branch for a new point is chosen according to the following criteria:
