
Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl A Cost Model for Interval Intersection Queries on RI-Trees Institute for Computer Science.


1 Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl: A Cost Model for Interval Intersection Queries on RI-Trees. Institute for Computer Science, University of Munich, Germany, Database Group. SSDBM 2002, Edinburgh.

2 07/25/02 Martin Pfeifle Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

3 Extended Objects in Databases
  1D objects: temporal data, approximate values, interval constraints, …
  2D objects: geographic data, VLSI design, bitemporal data, …
  3D objects: CAD documents, digital mockup, haptic rendering, …
  Queries: interval query, box query, window query

4 Integration of Access Methods
  Declarative Embedding: object-relational DML and DDL
  Extensible Indexing Framework
    query processing: index_open(), index_fetch(), index_close()
    maintenance: index_create(), index_drop(), index_insert(), index_delete(), index_update()

5 Integration of Access Methods
  Declarative Embedding: object-relational DML and DDL
  User-defined Index Structure
    Extensible Indexing Framework: object-relational interface for index maintenance and querying functions
    Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing
  Physical Implementation: Block-Manager, Caches, Locking, Logging, …

6 Integration of Access Methods
  Declarative Embedding: object-relational DML and DDL
  User-defined Index Structure
    Extensible Indexing Framework: object-relational interface for index maintenance and querying functions
    Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing
  Extensible Optimization Framework
    optimization: stats_collect(), stats_delete(), predicate_sel(), index_io_cost()
  Physical Implementation: Block-Manager, Caches, Locking, Logging, …

7 Integration of Access Methods
  Declarative Embedding: object-relational DML and DDL
  User-defined Index Structure
    Extensible Indexing Framework: object-relational interface for index maintenance and querying functions
    Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing
  User-defined Cost Model
    Extensible Optimization Framework: object-relational interface for selectivity estimation and cost prediction functions
    Relational Implementation: mapping to built-in statistics facilities; SQL-based evaluation of cost model
  Physical Implementation: Block-Manager, Caches, Locking, Logging, …

8 Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

9 Relational Interval Tree (RI-Tree)
  [Figure: interval tree with root 8 over the domain 1..15, holding the example intervals alice, bob, chris and dave]
  Foundation: Interval Tree [Edelsbrunner 1980]
    primary structure: binary search tree on possible endpoints
    secondary structure: sorted lists of stored endpoints
    each interval is registered at exactly one node
  [Kriegel, Pötke, Seidl VLDB 2000]

10 RI-Tree: Virtual Primary Structure
  [Figure: binary tree with root 8 over the nodes 1..15; in general root = 2^(h-1) over 1..2^h - 1]
  first step: virtualize the primary structure
  no materialization of the binary tree
  storage cost O(1): parameter root
  fixed data space: root = 2^(h-1) covers [1..2^h - 1]
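Because the primary structure is virtual, the node at which an interval is registered can be found with integer arithmetic alone. The following is a minimal Python sketch, assuming a plain bisection descent from root; the function name fork_node is hypothetical and does not appear in the slides. The checks use the example intervals alice = [3, 15], bob = [1, 7], chris = [5, 12] and dave = [13, 13] with root = 8.

```python
def fork_node(lower: int, upper: int, root: int) -> int:
    """Return the topmost node of the virtual tree that lies inside [lower, upper].

    Assumes root = 2**(h-1) and 1 <= lower <= upper <= 2*root - 1.
    """
    node, step = root, root // 2
    while not (lower <= node <= upper):
        # The interval lies entirely below or above the node: descend accordingly.
        node = node - step if upper < node else node + step
        step //= 2
    return node


# Example intervals from the talk, registered in a tree with root = 8.
examples = {"alice": (3, 15), "bob": (1, 7), "chris": (5, 12), "dave": (13, 13)}
nodes = {name: fork_node(lo, hi, 8) for name, (lo, hi) in examples.items()}
```

Only the parameter root has to be stored; every other node is derived on the fly during the descent.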

11 RI-Tree: Relational Secondary Structure
  second step: manage secondary structure by two B+-trees
  storage of n intervals: O(n/b) disk blocks of size b
  insert and delete: O(log_b n) disk block accesses in the indexes
  lowerIndex (node, lower, id):
    node  lower  id
    4     1      b
    8     3      a
    8     5      c
    13    13     d
  upperIndex (node, upper, id):
    node  upper  id
    4     7      b
    8     12     c
    8     15     a
    13    13     d
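The two index relations can be reproduced from the example intervals. This is a small Python sketch, assuming that sorted lists of tuples stand in for the composite B+-tree indexes; build_indexes and fork_node are hypothetical helper names, not part of the talk.

```python
def fork_node(lower, upper, root):
    """Topmost node of the virtual tree inside [lower, upper] (bisection descent)."""
    node, step = root, root // 2
    while not (lower <= node <= upper):
        node = node - step if upper < node else node + step
        step //= 2
    return node


def build_indexes(intervals, root):
    """Register each interval at its fork node and return both sorted relations."""
    lower_index, upper_index = [], []
    for ident, (lower, upper) in intervals.items():
        node = fork_node(lower, upper, root)
        lower_index.append((node, lower, ident))
        upper_index.append((node, upper, ident))
    # Sorting by (node, bound, id) mimics the order of a composite index.
    return sorted(lower_index), sorted(upper_index)


intervals = {"a": (3, 15), "b": (1, 7), "c": (5, 12), "d": (13, 13)}
lower_index, upper_index = build_indexes(intervals, root=8)
```

Each interval contributes exactly one tuple to each relation, which is where the O(n/b) storage bound comes from.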

12 RI-Tree: Interval Intersection Query
  [Figure: virtual tree of height h = 5 with root = 16; query interval lower = 22, upper = 25; fork = 24; highlighted nodes 16, 20, 22, 23, 24, 25, 26, 28]
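The node sets used by the query can be collected by walking the virtual tree from root towards the two query bounds. This is a Python sketch under the assumption that leftNodes holds the path nodes smaller than lower and rightNodes the path nodes larger than upper; path_to and query_nodes are hypothetical names.

```python
def path_to(target, root):
    """Nodes visited on the way from root down to target in the virtual tree.

    Assumes root = 2**(h-1) and 1 <= target <= 2*root - 1.
    """
    node, step = root, root // 2
    path = [node]
    while node != target:
        node = node - step if target < node else node + step
        step //= 2
        path.append(node)
    return path


def query_nodes(lower, upper, root):
    """Return (leftNodes, rightNodes) for the query interval [lower, upper]."""
    left_nodes = [n for n in path_to(lower, root) if n < lower]
    right_nodes = [n for n in path_to(upper, root) if n > upper]
    return left_nodes, right_nodes


left_nodes, right_nodes = query_nodes(22, 25, root=16)
```

For the query in the figure this yields the left nodes 16 and 20 and the right nodes 28 and 26; the nodes 22 to 25 are covered by a range scan instead.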

13 RI-Tree: Interval Intersection Query
  [Figure: left nodes 16 (root) and 20 for the query lower = 22, upper = 25, h = 5]
  select id from upperIndex i, leftNodes left
  where i.node = left.node and i.upper >= .lower

14 RI-Tree: Interval Intersection Query
  [Figure: nodes 22, 23, 24 (fork) and 25 between lower = 22 and upper = 25, h = 5]
  select id from upperIndex i, leftNodes left
  where i.node = left.node and i.upper >= .lower
  union all
  select id from upperIndex i
  where i.node between .lower and .upper

15 RI-Tree: Interval Intersection Query
  [Figure: right nodes 28 and 26 for the query lower = 22, upper = 25, h = 5]
  select id from upperIndex i, leftNodes left
  where i.node = left.node and i.upper >= .lower
  union all
  select id from upperIndex i
  where i.node between .lower and .upper
  union all
  select id from lowerIndex i, rightNodes right
  where i.node = right.node and i.lower <= .upper

16 RI-Tree: Interval Intersection Query
  [Figure: root = 16, fork = 24, query lower = 22, upper = 25, h = 5; highlighted nodes 16, 20, 22, 23, 24, 25, 26, 28]
  I/O complexity: O(h·log_b n + r/b)
  select id from upperIndex i, leftNodes left
  where i.node = left.node and i.upper >= .lower
  union all
  select id from upperIndex i
  where i.node between .lower and .upper
  union all
  select id from lowerIndex i, rightNodes right
  where i.node = right.node and i.lower <= .upper
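The three-branch query can be tried out end to end with Python's built-in sqlite3 module. This is a sketch under several assumptions: SQLite stands in for the ORDBMS, the query bounds are passed as the named parameters :lower and :upper (the prefix of the query variable is lost in the transcript), the table aliases are shortened to l and r to avoid keyword clashes, and the helper names path_to and intersect are hypothetical. The data are the four example intervals stored under root = 8.

```python
import sqlite3


def path_to(target, root):
    """Nodes visited on the way from root down to target in the virtual tree."""
    node, step = root, root // 2
    path = [node]
    while node != target:
        node = node - step if target < node else node + step
        step //= 2
        path.append(node)
    return path


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table lowerIndex (node integer, lower integer, id text);
    create table upperIndex (node integer, upper integer, id text);
    create index lowerIdx on lowerIndex (node, lower, id);
    create index upperIdx on upperIndex (node, upper, id);
    create table leftNodes (node integer);
    create table rightNodes (node integer);
""")
conn.executemany("insert into lowerIndex values (?, ?, ?)",
                 [(4, 1, "b"), (8, 3, "a"), (8, 5, "c"), (13, 13, "d")])
conn.executemany("insert into upperIndex values (?, ?, ?)",
                 [(4, 7, "b"), (8, 12, "c"), (8, 15, "a"), (13, 13, "d")])


def intersect(conn, lower, upper, root):
    """Return the sorted ids of all stored intervals intersecting [lower, upper]."""
    # Refill the transient node tables for this query.
    conn.execute("delete from leftNodes")
    conn.execute("delete from rightNodes")
    conn.executemany("insert into leftNodes values (?)",
                     [(n,) for n in path_to(lower, root) if n < lower])
    conn.executemany("insert into rightNodes values (?)",
                     [(n,) for n in path_to(upper, root) if n > upper])
    rows = conn.execute("""
        select id from upperIndex i, leftNodes l
         where i.node = l.node and i.upper >= :lower
        union all
        select id from upperIndex i
         where i.node between :lower and :upper
        union all
        select id from lowerIndex i, rightNodes r
         where i.node = r.node and i.lower <= :upper
    """, {"lower": lower, "upper": upper}).fetchall()
    return sorted(row[0] for row in rows)


result = intersect(conn, 13, 14, root=8)
```

The three branches touch disjoint node sets, so union all returns each matching interval once without a duplicate-elimination step.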

17 Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

18 I/O Cost Model for Interval Intersections
  [Figure: virtual tree T of height h = 5 with root; indexes upperIndex(node, upper, id) and lowerIndex(node, lower, id)]
  join I/O (T,  ) = Gaps left (  ) Gaps right (  )  root
  B  root
  output I/O (T,  ) =  (T,  )·B
  O( h·log_b n + r/b )

19 Selectivity Estimation
  Histogram-based: (equi-width histogram)
    – replication of intervals intersecting multiple buckets
    – statistics management requires user-defined code
  Quantile-based: (equi-count histogram)
    + better adaptation to the data distribution
    + exploits built-in statistics of the ORDBMS
  […] analogously to r_left
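To illustrate the quantile-based idea, here is a generic Python sketch of an equi-count estimator. It is not necessarily the estimator used in the talk. It relies only on the fact that an interval misses the query exactly when its upper bound lies below the query's lower bound or its lower bound lies above the query's upper bound, so the selectivity is roughly F_lower(q_upper) - F_upper(q_lower). The function names, the bucket count and the synthetic data are all assumptions made for the example.

```python
from bisect import bisect_right


def quantile_bounds(values, buckets):
    """Boundaries of an equi-count histogram: buckets + 1 quantile values."""
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, i * n // buckets)] for i in range(buckets + 1)]


def cdf(bounds, x):
    """Estimated fraction of values <= x, interpolating linearly inside a bucket."""
    if x < bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    i = bisect_right(bounds, x) - 1  # bounds[i] <= x < bounds[i + 1]
    lo, hi = bounds[i], bounds[i + 1]
    return (i + (x - lo) / (hi - lo)) / (len(bounds) - 1)


def selectivity(lower_bounds, upper_bounds, q_lower, q_upper):
    """Estimated fraction of intervals intersecting [q_lower, q_upper]."""
    est = cdf(lower_bounds, q_upper) - cdf(upper_bounds, q_lower)
    return min(1.0, max(0.0, est))


# Synthetic data (an assumption): 1000 intervals [l, l + 9] for l = 0..999.
lowers = list(range(1000))
uppers = [l + 9 for l in lowers]
estimate = selectivity(quantile_bounds(lowers, 10), quantile_bounds(uppers, 10), 500, 599)
exact = sum(1 for l, u in zip(lowers, uppers) if l <= 599 and u >= 500) / len(lowers)
```

Only the quantiles of the lower and upper bounds are needed, which is the kind of per-column statistic a database system typically maintains on its own.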

20 Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

21 Experimental Evaluation: Datasets UNIREAL

22 Experimental Evaluation: Computation of Statistics

23 Experimental Evaluation: Selectivity Estimation UNIREAL

24 Experimental Evaluation: Selectivity Estimation

25 Experimental Evaluation: Cost Estimation UNIREAL

26 Experimental Evaluation: Cost Estimation UNIREAL

27 Outline of the Talk 1. Introduction 2. RI-Tree 3. Cost Model 4. Evaluation 5. Conclusions and Future Work

28 Conclusions and Future Work
  Conclusions:
    Relational access methods:
      – employ an ORDBMS as virtual machine
      – extensible indexing and optimizing framework
    Indexing extended objects:
      – Relational Interval Tree
    Development of cost models:
      – estimation of selectivity and I/O cost
  Future Work:
    Cost models:
      – general interval relationships
      – interval sequences

29 Any questions?

