Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases Jost Enderle, Nicole Schneider, Thomas Seidl RWTH Aachen University,


1 Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases
Jost Enderle, Nicole Schneider, Thomas Seidl
RWTH Aachen University, Germany
VLDB 2005, Trondheim
Data Management and Exploration, Prof. Dr. Thomas Seidl

2 Outline
Interval-and-Value (IaV) Data and Applications
Relational Interval Tree (RI-tree)
Managing Interval-and-Value Tuples Using RI-tree
Experimental Results
(Slide footer: Enderle, Schneider, Seidl, Queries on Interval-and-Value Tuples in RDBs, VLDB 05)

3 Interval-and-Value Data: Example
Contracts table: storing period and budget of contracts

    CREATE TABLE contracts (
      -- key:
      c_no     VARCHAR(10),
      -- simple-valued attribute:
      c_budget DECIMAL(10,2),
      -- interval:
      c_period ROW (c_start DATE, c_end DATE)
    )

    No.  Budget (k€)  Period Start  Period End
    C1        250     2005-03-01    2005-07-31
    C2       5300     2002-02-17    2003-05-06
    C3      10700     1999-05-27    2001-12-17
    C4       1600     2001-02-28    2002-11-02
    C5        870     2002-06-25    2002-08-12

4 Interval-and-Value Data: Query
Sample query on contracts table

    -- Find all contracts
    SELECT c_no FROM contracts
    -- within certain budget range
    WHERE c_budget BETWEEN 500 AND 2000
    -- running during certain time interval
    AND c_period OVERLAPS (DATE '2003-03-01', DATE '2004-01-31')

Special cases of this general Range-Interval query:
  – Value-Interval Query: value range is a single point
  – Range-Stabbing Query: query interval is a single point
  – Value-Stabbing Query: both restrictions hold
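The four query types can be illustrated with a minimal in-memory sketch over the five example contracts. This is only a reading aid, not the authors' method: the function names are hypothetical, intervals are assumed to be closed on both ends (SQL's OVERLAPS may treat endpoints differently), and the end date of C1 is taken to be 2005-07-31.

```python
from datetime import date

# Example contracts from the slide: (no, budget in k EUR, start, end).
# Assumption: C1 ends on 2005-07-31 (the transcript shows a garbled date).
CONTRACTS = [
    ("C1", 250, date(2005, 3, 1), date(2005, 7, 31)),
    ("C2", 5300, date(2002, 2, 17), date(2003, 5, 6)),
    ("C3", 10700, date(1999, 5, 27), date(2001, 12, 17)),
    ("C4", 1600, date(2001, 2, 28), date(2002, 11, 2)),
    ("C5", 870, date(2002, 6, 25), date(2002, 8, 12)),
]


def range_interval_query(rows, v_lo, v_hi, q_start, q_end):
    """General case: value range [v_lo, v_hi] and query interval [q_start, q_end]."""
    return [
        no
        for no, budget, start, end in rows
        # Two closed intervals overlap iff each starts before the other ends.
        if v_lo <= budget <= v_hi and start <= q_end and end >= q_start
    ]


def value_interval_query(rows, value, q_start, q_end):
    """Value range collapses to a single point."""
    return range_interval_query(rows, value, value, q_start, q_end)


def range_stabbing_query(rows, v_lo, v_hi, point):
    """Query interval collapses to a single point."""
    return range_interval_query(rows, v_lo, v_hi, point, point)


def value_stabbing_query(rows, value, point):
    """Both restrictions hold."""
    return range_interval_query(rows, value, value, point, point)
```

On this data the sample query from the slide (budget 500 to 2000, interval 2003-03-01 to 2004-01-31) returns no contract, because C4 and C5 both end in 2002; widening the budget range to 500 to 6000 returns C2.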

5 Motivation of Relational Indexing
Main Memory Structures
  – no persistency, no disk block structure
Secondary Storage Structures
  + persistency, high block-oriented efficiency
  – integration into DBMS kernel typically not supported (GiST?)
Relational Storage Structures
  + basic idea: don't extend, just use the RDBMS (virtual storage machine)
  + sound formal foundation, little implementation effort
  + immediate industrial strength (availability, robustness, ACID, …)
  + high efficiency by exploiting built-in indexing structures (B+-tree)
[Figure labels: Disk, No DB, SQL]

6 Relational Interval Tree
Two relational indexes (B+-trees) store the interval bounds
  – lowerIndex (node, start, id)
  – upperIndex (node, end, id)
Supported by any RDBMS: no modification of built-in B+-trees
Optimal complexities for space, updates, and intersection queries
[Figure: virtual backbone tree over the values 1 to 15 with root = 2^(h-1); the intervals C1 to C4 are registered at nodes of the tree]
Index contents after inserting C1 to C4:
    lowerIndex (node, start, id): (4, 1, C3), (8, 3, C4), (8, 5, C2), (12, 10, C1)
    upperIndex (node, end, id):   (4, 7, C3), (8, 13, C2), (8, 15, C4), (12, 15, C1)
[Kriegel, Pötke, Seidl: VLDB 2000] based on [Edelsbrunner 1980]
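The registration of intervals at tree nodes can be sketched as a descent from the root of the virtual backbone tree. The code below is an illustrative reconstruction under stated assumptions, not the authors' implementation: the names `fork_node` and `build_indexes` are hypothetical, and the intervals C1 = [10, 15], C2 = [5, 13], C3 = [1, 7], C4 = [3, 15] are read off the index entries shown on the slide, with root 8 for the value range 1 to 15.

```python
def fork_node(start, end, root):
    """Descend from the root until a node lies inside [start, end]."""
    node = root
    step = root // 2
    while step >= 1:
        if end < node:        # interval lies entirely to the left
            node -= step
        elif start > node:    # interval lies entirely to the right
            node += step
        else:                 # start <= node <= end: register here
            break
        step //= 2
    return node


# Intervals inferred from the slide's index entries (assumption).
INTERVALS = {"C1": (10, 15), "C2": (5, 13), "C3": (1, 7), "C4": (3, 15)}


def build_indexes(intervals, root):
    """Return sorted (node, bound, id) lists standing in for the two B+-trees."""
    lower = sorted((fork_node(s, e, root), s, i) for i, (s, e) in intervals.items())
    upper = sorted((fork_node(s, e, root), e, i) for i, (s, e) in intervals.items())
    return lower, upper


lower_index, upper_index = build_indexes(INTERVALS, root=8)
```

With root 8, the sketch places C3 at node 4, C2 and C4 at node 8, and C1 at node 12, which matches the entries listed for lowerIndex and upperIndex.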

7 Single Interval Query Processing
Two steps to process an interval query
1. Transform the interval query into a set of range queries
  – the generated queries are collected in transient tables (no I/Os)
2. Perform a single SQL query
  – join the transient query tables with the relational indexes

8 Preprocessing: Generate Query Ranges
Generate a set of range queries for lowerIndex and upperIndex
  – At nodes left of start: report entries i with i.end ≥ start (example: nodes 32, 48, 52)
  – At nodes right of end: report entries i with i.start ≤ end (example: node 56)
  – For nodes between start and end: report all entries (example: nodes 54 to 55)
[Figure: backbone tree over the values 1 to 63 with root 32; for the marked query, upperIndex is probed at nodes 32, 48, 52 and lowerIndex at node 56 and in the node range 54 to 55]
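The node lists in the example can be reproduced by walking the backbone tree from the root towards each query bound. The following is a sketch under assumptions: `path_to` and `query_nodes` are hypothetical names, and the query interval [54, 55] with root 32 is inferred from the node lists on the slide.

```python
def path_to(target, root):
    """Nodes visited on the way from the root down to `target` (exclusive)."""
    assert 1 <= target < 2 * root, "target must lie inside the tree"
    node, step = root, root // 2
    path = []
    while node != target:
        path.append(node)
        node += step if target > node else -step
        step //= 2
    return path


def query_nodes(start, end, root):
    """Return (left_nodes, right_nodes) for the transient query tables.

    left_nodes:  nodes left of start  -> probe upperIndex with end >= start
    right_nodes: nodes right of end   -> probe lowerIndex with start <= end
    Nodes between start and end are covered by one range scan.
    """
    left = [n for n in path_to(start, root) if n < start]
    right = [n for n in path_to(end, root) if n > end]
    return left, right


left_nodes, right_nodes = query_nodes(54, 55, root=32)
```

For the query [54, 55] the descent visits 32, 48, 56, 52 and 54, so the left list is 32, 48, 52 and the right list is 56. Both lists are at most as long as the tree height, which is why they fit into small transient tables.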

9 Processing by a Single SQL Query
Join transient query tables with B+-tree indexes

    SELECT id FROM upperIndex AS i JOIN :leftQueries USING (node)
      WHERE i.end >= :start
    UNION ALL
    SELECT id FROM lowerIndex AS i JOIN :rightQueries USING (node)
      WHERE i.start <= :end
    UNION ALL
    SELECT id FROM lowerIndex -- or upperIndex
      WHERE node BETWEEN :start AND :end

No duplicates are produced → UNION ALL
Blocked output of index range scans is guaranteed
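To see the three-part query run end to end, here is a small sketch using Python's built-in sqlite3 module in place of a production RDBMS. Several details are assumptions made for the illustration: the bound columns are renamed to `lower_bound` and `upper_bound` (since `end` is a keyword in SQLite), the transient tables are ordinary temporary tables, the data are the four intervals from the RI-tree slide, and the node lists for the query [6, 7] with root 8 (left node 4, right node 8) are supplied by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lowerIndex (node INTEGER, lower_bound INTEGER, id TEXT);
CREATE TABLE upperIndex (node INTEGER, upper_bound INTEGER, id TEXT);
CREATE INDEX lower_idx ON lowerIndex (node, lower_bound, id);
CREATE INDEX upper_idx ON upperIndex (node, upper_bound, id);
CREATE TEMP TABLE leftQueries (node INTEGER);
CREATE TEMP TABLE rightQueries (node INTEGER);
""")
conn.executemany(
    "INSERT INTO lowerIndex VALUES (?, ?, ?)",
    [(4, 1, "C3"), (8, 3, "C4"), (8, 5, "C2"), (12, 10, "C1")],
)
conn.executemany(
    "INSERT INTO upperIndex VALUES (?, ?, ?)",
    [(4, 7, "C3"), (8, 13, "C2"), (8, 15, "C4"), (12, 15, "C1")],
)

QUERY = """
SELECT id FROM upperIndex AS i JOIN leftQueries USING (node)
  WHERE i.upper_bound >= :q_start
UNION ALL
SELECT id FROM lowerIndex AS i JOIN rightQueries USING (node)
  WHERE i.lower_bound <= :q_end
UNION ALL
SELECT id FROM lowerIndex
  WHERE node BETWEEN :q_start AND :q_end
"""


def intersecting_ids(conn, q_start, q_end, left_nodes, right_nodes):
    """Fill the transient tables, then run the single three-part query."""
    conn.execute("DELETE FROM leftQueries")
    conn.execute("DELETE FROM rightQueries")
    conn.executemany("INSERT INTO leftQueries VALUES (?)", [(n,) for n in left_nodes])
    conn.executemany("INSERT INTO rightQueries VALUES (?)", [(n,) for n in right_nodes])
    rows = conn.execute(QUERY, {"q_start": q_start, "q_end": q_end}).fetchall()
    return sorted(r[0] for r in rows)


result = intersecting_ids(conn, 6, 7, left_nodes=[4], right_nodes=[8])
```

For the query [6, 7] the first branch finds C3 at node 4 (end 7), the second finds C2 and C4 at node 8 (starts 5 and 3), and the third finds nothing because no interval is registered at nodes 6 or 7. Each id is reported by exactly one branch, so no duplicate elimination is needed.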

10 Extending the RI-tree for IaV Support (1)
Add value predicate to RI-tree query

    SELECT id -- lower subquery
      FROM upperIndex AS i JOIN :leftQueries USING (node)
      WHERE i.end >= :start
      AND i.value BETWEEN :Value1 AND :Value2
    UNION ALL
    ... -- upper subquery
    UNION ALL
    SELECT id -- inner subquery
      FROM lowerIndex -- or upperIndex
      WHERE node BETWEEN :start AND :end
      AND value BETWEEN :Value1 AND :Value2

Integrate simple value attribute into lower-/upperIndex
  – old schema: (node, bound, id)
  – new schema: ? → depends on type of query to support

11 Extending the RI-tree for IaV Support (2)
Viable schemas for new lower-/upperIndexes (estimate access cost for each query type)
  – (value, node, bound, id)
  – (node, value, bound, id)
  – (node, bound, value, id)
Observations (see paper for details):
  – Value Queries: best supported by the (value, node, bound, id) index
      simple attribute predicates = point queries
      evaluation requires the same number of disk accesses as the original procedure
  – Range Queries: choice of index not obvious
      inner subquery of Range-Stabbing Queries best supported by (node, value, bound, id)
      otherwise: depends on stored data and values of query variables
Question: Can Range Queries be further enhanced?
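Why the attribute order matters can be illustrated with sorted Python lists standing in for composite B+-tree indexes. This is a toy sketch and not the paper's cost model: the `scan` helper is hypothetical, the values attached to C1 to C4 are made up, and "entries scanned" is only a rough proxy for disk accesses. It evaluates the inner subquery of a Value-Interval query (node between 5 and 12, value equal to 7) under two of the schemas.

```python
import bisect

# Hypothetical lowerIndex entries: (node, value, bound, id); values are invented.
ROWS = [(4, 2, 1, "C3"), (8, 5, 3, "C4"), (8, 7, 5, "C2"), (12, 7, 10, "C1")]

INF = float("inf")


def scan(index, lo_key, hi_key, predicate):
    """Scan one contiguous key range; return (entries scanned, matching ids)."""
    i = bisect.bisect_left(index, lo_key)
    j = bisect.bisect_right(index, hi_key)
    scanned = index[i:j]
    return len(scanned), [e[-1] for e in scanned if predicate(e)]


# Schema (value, node, bound, id): value = 7 and node in [5, 12] form one range.
by_value = sorted((v, n, b, i) for n, v, b, i in ROWS)
scanned_a, ids_a = scan(by_value, (7, 5), (7, 12, INF), lambda e: True)

# Schema (node, value, bound, id): only the node range is contiguous,
# so the value predicate must be checked on every scanned entry.
by_node = sorted(ROWS)
scanned_b, ids_b = scan(by_node, (5,), (12, INF), lambda e: e[1] == 7)
```

Both schemas return C2 and C1, but the value-first index touches only the two matching entries while the node-first index also reads the C4 entry and filters it out. On real data the gap depends on how selective each predicate is, which is consistent with the observation that the choice is not obvious for Range Queries.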

12 Improving Range Query Processing (1)
Problem of composite indexes for multiple attributes
  – queries may contain range predicates on two or more of the indexed attributes
  – tuples satisfying the first predicate lie in a contiguous disk area
  – tuples satisfying both/all predicates are scattered within this area
Common solution: using space-filling curves
  – mapping multi-dimensional data to one-dimensional values
  – similar values of original data are mapped to similar index data
  – ranges of indexed attributes will be found in adjacent disk areas
Application on RI-tree scenario
  – combining some attributes of lower-/upperIndex
  – depends on type of query to support
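As an illustration of the space-filling-curve idea, the sketch below computes a Z-order (Morton) key by interleaving the bits of two attributes. The function name `z_order`, the bit width, and the choice of putting the first attribute in the higher bit of each pair are assumptions made for this example; the transcript does not state which bit order or normalization the authors use, so the numbers here should not be expected to match the Z values shown on the later example slide.

```python
def z_order(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into a single Z-order key.

    Bit b of x goes to position 2*b + 1, bit b of y to position 2*b.
    """
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b + 1)
        z |= ((y >> b) & 1) << (2 * b)
    return z


# All points of an aligned 2x2 block map to four consecutive keys,
# so a small two-dimensional range becomes one short index range.
block_keys = sorted(z_order(x, y, bits=2) for x in (2, 3) for y in (0, 1))
```

Here `block_keys` is the run 8, 9, 10, 11. Query rectangles that are not aligned with the curve's recursive blocks decompose into several such runs, which is one reason the benefit depends on the data and on the query.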

13 Improving Range Query Processing (2)
Identifying viable schemas for new lower-/upperIndexes
  – find subqueries containing several range predicates
      for Range Queries: lower and upper subqueries (bound, value)
      for Range-Interval Queries: inner subquery (node, value)
  – combine respective attributes (x, y) within space-filling curve {x, y}
  – useful combinations for lower-/upperIndex:
      (node, {value, bound})
      ({node, value}, bound)

14 Improving Range Query Processing (3)
Observations:
  – lower and upper subqueries of Range Queries benefit from a (node, {value, bound}) index
  – inner subquery of Range-Interval Queries benefits from a ({node, value}, bound) index
  – Value Queries do not benefit from "space-filling indexes"
Intermediate result
  – space-filling indexes can reduce disk accesses in certain cases
  – there is no "universal" index supporting all queries to the same extent
  – different subqueries benefit from different indexes

15 Employing Index Mixes
Identifying best indexes for each query type
  – Value Queries: best supported by (value, node, bound, id) index
  – Range Queries: depends on data and space-filling curve (if used)
      different subqueries best supported by different indexes
      subqueries may be evaluated separately using best index
      drawback: higher cost for index updates and storage requirements

    Queries          Lower/Upper Subquery     Inner Subquery
    Value-Stabbing   (value, node, bound)     (same index)
    Value-Interval   (value, node, bound)     (same index)
    Range-Stabbing   (node, {value, bound})   (node, value, bound)
    Range-Interval   (node, {value, bound})   ({node, value})

16 Adapting the RI-tree Algorithms (1)
Example: Evaluate a contracts query using a "space-filling index"
Contracts table:
  – Node and Z-order value calculated for each tuple
  – B-tree index on (node, Z(budget, start), no)

    No.  Budget (k€)  Period Start  Period End  Node  Z(budget, start)
    C1        2             1            5        4          4
    C2        5             2            9        8         50
    C3       10             8           17       16        221
    C4        6            14           19       16        149
    C5        8            21           26       24        186

17 Adapting the RI-tree Algorithms (2)
Range-Interval Query: value range (1,12); interval (3,6)
Evaluation of upper subquery with Z-order index

18 Access Cost with Varying Table Sizes
[Figure: two panels, Value-Stabbing Queries and Value-Interval Queries]

19 Access Cost with Varying Table Sizes
[Figure: two panels, Range-Stabbing Queries and Range-Interval Queries]

20 Access Cost for Varying Length of Ranges
[Figure: two panels, Stabbing Queries and Interval Queries]

21 Access Cost for Varying Length of Ranges
[Figure: Range Queries]

22 Comparison with Competing Techniques

23 Conclusions
Processing Interval-and-Value Tuples in SQL databases
Extensions of the Relational Interval Tree
Various types of queries
  – Range vs. Value Queries
  – Interval vs. Stabbing Queries
Experiments demonstrate high performance
Future work:
  – Extend proposed techniques to more complex queries (joins)
  – Cost models to predict benefits for evolving query workload

