Published by Alberta Dawson. Modified over 8 years ago.
1
Rate-Based Query Optimization for Streaming Information Sources
  Stratis D. Viglas, Jeffrey F. Naughton
2
Outline
  Introduction
  Estimation of Output Rates
  Using Estimates to Optimize Queries
  Experimental Validation
  Conclusion
3
Introduction
  Cost-Based Query Optimization
    Decides between execution plans by minimizing the execution cost of evaluating the query
  Cardinality Estimation
    Uses the cardinalities of the tables at the leaves of the query tree
    Very effective in most traditional DBMS applications
4
Introduction
  Drawback of Cardinality Estimation
    With streaming data, the cardinality is often not known
    With infinite streams, it may not even be well defined
  Rate-Based Optimization
    Uses estimates of the rates of the streams in the query evaluation tree
5
Introduction
  Motivating Example
    Two selection operators, each with selectivity 0.1
    Input rate = 500 per second
    Operator 1 processes 50 inputs per second
    Operator 2 processes data as fast as it arrives
    Run a query that applies the two selections in series on an infinite stream
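The arithmetic behind this example can be sketched as follows. The sketch rests on an assumption that the slide does not state: a selection emits at min(input rate, processing capacity) × selectivity. The function name `selection_output_rate` and the constant names are hypothetical, chosen only for illustration.

```python
def selection_output_rate(input_rate, capacity, selectivity):
    """Outputs per second leaving a selection operator.

    Assumption (not from the slides): the operator consumes inputs at
    min(input_rate, capacity); capacity=None means it keeps up with any input.
    """
    processed = input_rate if capacity is None else min(input_rate, capacity)
    return processed * selectivity


INPUT_RATE = 500.0     # inputs per second arriving from the stream
SELECTIVITY = 0.1      # both selections pass 10% of their inputs
SLOW_CAPACITY = 50.0   # Operator 1 handles at most 50 inputs per second

# Plan A: Operator 1 (slow) first, then Operator 2 (fast).
a_mid = selection_output_rate(INPUT_RATE, SLOW_CAPACITY, SELECTIVITY)
plan_a = selection_output_rate(a_mid, None, SELECTIVITY)

# Plan B: Operator 2 (fast) first, then Operator 1 (slow).
b_mid = selection_output_rate(INPUT_RATE, None, SELECTIVITY)
plan_b = selection_output_rate(b_mid, SLOW_CAPACITY, SELECTIVITY)

print(f"Plan A (slow first): {plan_a:.2f} outputs/sec")
print(f"Plan B (fast first): {plan_b:.2f} outputs/sec")
```

Under this assumption, running the fast operator first yields roughly ten times the output rate (about 5 outputs per second versus about 0.5), because the slow operator then sees only 50 inputs per second and is no longer a bottleneck.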
6
Introduction
8
Rate-Based Query Optimization
  Focus: what is the expected output rate of this query plan?
  Advantages
    Can optimize for the most outputs produced in the first n seconds
    Can optimize for the time to produce the first k results
9
Estimation of Output Rates
  Output Rate = (Number of outputs transmitted) / (Time needed to make the transmission)
10
Projections
  Two cases
    Cost of one projection is <= inter-arrival time
    Cost of one projection is > inter-arrival time
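A minimal sketch of how the two cases could translate into an output rate. The slide lists only the cases, so the outcomes below are an assumption: a projection emits one output per input, keeps pace with the input when it is fast enough, and is otherwise limited to one output per `cost_per_tuple` seconds. The function and parameter names are hypothetical.

```python
def projection_output_rate(input_rate, cost_per_tuple):
    """Outputs per second from a projection (assumed model, see lead-in).

    input_rate      -- arriving inputs per second
    cost_per_tuple  -- seconds needed to project one input
    """
    inter_arrival = 1.0 / input_rate
    if cost_per_tuple <= inter_arrival:
        # Case 1: the operator finishes before the next input arrives,
        # so the output rate equals the input rate.
        return input_rate
    # Case 2: the operator is the bottleneck and emits one output
    # every cost_per_tuple seconds.
    return 1.0 / cost_per_tuple


# 100 inputs/sec means an inter-arrival time of 0.01 s.
fast = projection_output_rate(100.0, 0.005)  # cost below inter-arrival time
slow = projection_output_rate(100.0, 0.02)   # cost above inter-arrival time
print(fast, slow)
```

In this model the first call is bounded by the input and the second by the operator itself, which is the same as taking the smaller of `input_rate` and `1 / cost_per_tuple`.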
11
Selections Output Rate,
12
Joins and Cartesian Products
  If r_l is the left input rate and r_r is the right input rate, what is the output rate for results generated by the arrival of these tuples?
13
Cost Model for Join Operator Implementations
15
Using Estimates to Optimize Queries
  General framework for rate-based optimization
  Optimize for a specific time point in the execution process
    Which plan will produce the most results by time t_0?
  Optimize for output production size
    Which plan is the first to reach N results?
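The two optimization targets can be illustrated with a deliberately simplified plan model. Everything here is an assumption for illustration, not the framework from the talk: each plan is described only by a hypothetical `startup_delay` and a constant `rate` once it starts producing, and the names `Plan`, `best_by_time`, and `first_to_reach` are invented.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    startup_delay: float  # seconds before the first result (hypothetical)
    rate: float           # results per second after startup (hypothetical)

    def results_by(self, t: float) -> float:
        """Number of results produced by time t under the constant-rate model."""
        return max(0.0, self.rate * (t - self.startup_delay))

    def time_to_reach(self, n: int) -> float:
        """Time at which the plan has produced n results."""
        return self.startup_delay + n / self.rate


def best_by_time(plans, t):
    """Target 1: the plan with the most results by a given time point."""
    return max(plans, key=lambda p: p.results_by(t))


def first_to_reach(plans, n):
    """Target 2: the plan that reaches n results first."""
    return min(plans, key=lambda p: p.time_to_reach(n))


plans = [
    Plan("pipelined", startup_delay=0.0, rate=10.0),
    Plan("blocking", startup_delay=20.0, rate=50.0),
]

for t in (10.0, 60.0):
    print(f"most results by t={t}: {best_by_time(plans, t).name}")
for n in (100, 1000):
    print(f"first to reach N={n}: {first_to_reach(plans, n).name}")
```

Even in this toy model the preferred plan changes with the target: the plan that starts immediately is preferred for early time points and small N, while the plan with the higher sustained rate is preferred later.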
16
Examples of Heuristics
  Local Rate Maximization
17
Examples of Heuristics
  Local Time Minimization
18
Grammar
19
Algorithm
20
Experimental Validation
  Focus
    Does the cost model correctly estimate individual plan performance?
    Can the framework correctly choose the best plan among a set of candidate plans?
21
Experimental Validation
  Experimental Setup
    Synthetically generated XML data set
    To investigate streaming behavior, files are viewed as prefixes of streams
22
Experimental Validation
23
Validation of the Cost Model
28
Complex Plans
33
Comparison to the Traditional Cost Model
34
Conclusions
  Rate-based optimization enables query optimization when working with infinite input streams
  It allows optimization for specific points in time during query evaluation
  Experiments indicate that rate-based optimization is a potentially viable approach