Adaptive Query Processing (Background) Advisor: Elke A. Rundensteiner Luping Ding Brad Pielech 5/21/2019 DSRG TALK
Contents
- Motivation
- Issues to consider when building an adaptive query system
- Categories of adaptivity and related issues
- Related work
- Our initial ideas thus far (to be continued…)
Motivation
- New environment and applications
  - Internet and web-based query systems
  - Sample applications
    - Network monitoring systems
    - Financial applications: stock trading, …
- Characteristics
  - Distributed, heterogeneous, autonomous data sources
  - Unpredictable, variable data volume and transfer rate
Adaptive Query Processor
[Figure: a user query is posed against an XML view over data sources DS1, DS2, …, DSn and evaluated by the Adaptive Query Processor. Operator labels in the diagram: N, S, J, T.]
Motivation II
Requirements
- Ability to process streaming data using non-blocking operators
- Dynamic inter- and intra-operator scheduling to adapt to the data transfer rate
- Sharing and reuse of sub-plans across multiple queries
- Ability to output partial/approximate results according to user preferences (discussed later)
Traditional vs. Adaptive
Traditional                         | Adaptive
Ready data                          | Streaming data
One-time query                      | May be a continuous query
Blocking operators                  | Non-blocking operators
Query optimization before execution | Query optimization before and during execution
Exact answer                        | Partial/approximate answer
Challenges and Possible Solutions
- Challenge: data arrive at very high speed
  - Sample the data and compute an approximate answer
- Challenge: unpredictable changes in data transfer rate, caused by sources drying up or network congestion
  - Interleave query execution and optimization, reworking the query plan to minimize execution downtime
- Challenge: blocking operators appear in the query plan because of GroupBy, OrderBy, and Join clauses
  - Implement non-blocking alternatives for the blocking operators
- Challenge: unbounded or huge data streams need unbounded or huge intermediate storage
  - Compute an approximate answer
  - Switch between memory and disk
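One well-known non-blocking alternative to a blocking join is the symmetric hash join, which builds a hash table on both inputs and emits matches as soon as either side delivers a tuple. The slides do not name a specific algorithm, so the following is only a minimal sketch under assumptions: the stream encoding (`"L"`/`"R"` tags), the function name, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key_left, key_right):
    """Non-blocking equi-join over an interleaved stream of (side, row) items.

    side is "L" or "R" (an assumed encoding). Each arriving row is inserted
    into its own side's hash table and immediately probed against the other
    side's table, so results are produced without waiting for either input
    to finish.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, row in stream:
        if side == "L":
            k = key_left(row)
            left_table[k].append(row)
            for other in right_table[k]:
                yield (row, other)
        else:
            k = key_right(row)
            right_table[k].append(row)
            for other in left_table[k]:
                yield (other, row)


# Hypothetical interleaved arrivals from two sources.
arrivals = [
    ("L", {"id": 1, "title": "TCP/IP Illustrated"}),
    ("R", {"book_id": 2, "price": 39.95}),
    ("R", {"book_id": 1, "price": 65.95}),
    ("L", {"id": 2, "title": "Data on the Web"}),
]
joined = list(
    symmetric_hash_join(arrivals, lambda r: r["id"], lambda r: r["book_id"])
)
```

Both hash tables grow with the input, which is exactly the storage problem named in the last challenge; a real system would have to spill to disk or expire old entries.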
General Issues I
- Decide the granularity of stream data
  - Each token
  - Individual element
  - Decided by the XPath specified by the query
Example query:

    for $b in document("bib.xml")/bib/book
    return
      <result>
        { $b/title }
        { $b/author }
      </result>

Example data (bib.xml, as shown on the slide):

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author>W. Stevens</author>
        <price> 65.95</price>
      </book>
      <book year="2000">
        <title>Data on the Web</title>
        <author>Serge Abiteboul</author>
        <author>Peter Buneman</author>
        <author>Dan Suciu</author>
        <price> 39.95</price>
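To make the "individual element" granularity concrete, the sketch below streams one `book` element at a time from the sample document using Python's incremental XML parser. It is an illustration only, not the query engine discussed in the talk: the closing `</book>` and `</bib>` tags are an assumption (the slide's sample is cut off), the function name `stream_books` is hypothetical, and matching on the tag name alone is a simplification of the path `/bib/book`.

```python
import io
import xml.etree.ElementTree as ET

# Sample data from the slide; the final closing tags are assumed.
BIB_XML = """<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author>W. Stevens</author>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author>Serge Abiteboul</author>
    <author>Peter Buneman</author>
    <author>Dan Suciu</author>
    <price>39.95</price>
  </book>
</bib>"""


def stream_books(source):
    """Yield one result per <book> element as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield {
                "title": elem.findtext("title"),
                "authors": [a.text for a in elem.findall("author")],
            }
            elem.clear()  # drop the element's children to bound memory use


books = list(stream_books(io.BytesIO(BIB_XML.encode("utf-8"))))
```

Each yielded dictionary corresponds to one `<result>` of the query above, produced without first loading the whole document.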
General Issues II
- Give order-sensitive results
  - Assign a unique ID to each data unit (sequence number or timestamp)
  - Option 1: each algebra node preserves the order of the data
  - Option 2: algebra nodes do not preserve order, but the top node sorts the result
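The second option (tag at the source, restore order at the top node) can be sketched as follows. This is a minimal illustration under a strong assumption: every sequence number eventually reaches the top node, so no operator in between drops tuples. The function names are hypothetical.

```python
import heapq
from itertools import count


def tag_with_sequence(items):
    """Attach a unique, increasing sequence number to each data unit."""
    seq = count()
    for item in items:
        yield (next(seq), item)


def restore_order(tagged):
    """Top node: re-emit items in sequence order.

    Only out-of-order items are buffered. Assumes no sequence number is
    missing; a gap would stall the output in this simplified version.
    """
    heap = []
    expected = 0
    for seq, item in tagged:
        heapq.heappush(heap, (seq, item))
        while heap and heap[0][0] == expected:
            yield heapq.heappop(heap)[1]
            expected += 1


tagged = list(tag_with_sequence(["a", "b", "c", "d"]))
# Simulate intermediate nodes that do not preserve order.
shuffled = [tagged[1], tagged[0], tagged[3], tagged[2]]
ordered = list(restore_order(shuffled))
```

If operators may filter tuples out, the top node would need extra information (for example a control message announcing dropped IDs) before it could safely advance.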
General Issues III
- Generate approximate results
  - Answers to aggregate queries may change as new tuples arrive, so the results are approximate
- Generate partial results
  - New tuples will not change the validity of existing results
- Both require non-blocking operator implementations to provide the answer so far
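The difference between the two kinds of "answer so far" can be shown with two small generators. This is only a sketch; the function names and the use of the sample prices are illustrative assumptions.

```python
def running_average(values):
    """Approximate result: the average so far, revised as each value arrives."""
    total = 0.0
    n = 0
    for v in values:
        total += v
        n += 1
        yield total / n


def streaming_select(rows, predicate):
    """Partial result: each emitted row stays valid whatever arrives later."""
    for row in rows:
        if predicate(row):
            yield row


prices = [65.95, 39.95]
approximate = list(running_average(prices))              # each entry supersedes the last
partial = list(streaming_select(prices, lambda p: p < 50))  # entries only accumulate
```

An aggregate keeps replacing its previous answer, while a selection only ever adds to the answer it has already given.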
General Issues IV
- Compute statistics
  - Data arrival speed
  - Selectivity of an operator
  - Execution cost of an operator
- Introduce control messages for synchronization
  - Within an algebra node
  - Along with the data stream
[Figure: data stream drawn as * * * * * P * * * * * * * * * P * * * *]
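A selection operator that gathers its own selectivity while forwarding control messages embedded in the stream might look like the sketch below. Reading the `P` items in the figure as control messages is an assumption, and the `Punctuation` and `StatsSelect` names are hypothetical.

```python
class Punctuation:
    """Marker for a control message travelling in the data stream (assumed)."""


class StatsSelect:
    """Selection operator that records how many tuples it saw and passed."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def process(self, stream):
        for item in stream:
            if isinstance(item, Punctuation):
                yield item  # forward control messages untouched
                continue
            self.seen += 1
            if self.predicate(item):
                self.passed += 1
                yield item

    @property
    def selectivity(self):
        """Fraction of data tuples that passed, or None before any input."""
        return self.passed / self.seen if self.seen else None


p = Punctuation()
op = StatsSelect(lambda x: x % 2 == 0)
out = list(op.process([1, 2, 3, p, 4, 5, 6]))
```

A downstream component could treat each forwarded `Punctuation` as a safe point at which to read `op.selectivity` and decide whether the plan needs attention.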
General Issues V
- Design mechanisms for query plan re-optimization
  - When to re-optimize
    - Event-condition-action rules (Tukwila)
    - Signal in the stream (Niagara)
  - How to re-optimize
    - Reorder joins based on statistics
    - Possibly find alternative sources for data that would otherwise come from slow sources
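The "when" and "how" can be tied together with a rule of the event-condition-action kind. The sketch below is not Tukwila's rule language; the event name `fragment_done`, the factor-of-two threshold, and the heuristic of ordering joins by observed selectivity are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                          # which event the rule listens for
    condition: Callable[[dict], bool]   # checked when the event occurs
    action: Callable[[dict], None]      # run only if the condition holds


def fire(rules, event, state):
    """Run the action of every rule whose event matches and condition holds."""
    for rule in rules:
        if rule.event == event and rule.condition(state):
            rule.action(state)


def estimates_are_stale(state):
    # Assumed trigger: some join is over twice as unselective as estimated.
    return any(
        state["observed"][j] > 2 * state["estimated"][j]
        for j in state["join_order"]
    )


def reorder_joins(state):
    # Simplified heuristic: run the most selective join first.
    state["join_order"] = sorted(
        state["join_order"], key=lambda j: state["observed"][j]
    )


rules = [Rule("fragment_done", estimates_are_stale, reorder_joins)]
state = {
    "join_order": ["J1", "J2", "J3"],
    "estimated": {"J1": 0.2, "J2": 0.1, "J3": 0.5},
    "observed": {"J1": 0.9, "J2": 0.1, "J3": 0.5},
}
fire(rules, "fragment_done", state)
```

A real optimizer would weigh join costs and the work already done, not selectivity alone, before changing the plan.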
Categories of Adaptivity
A system can be adaptive at many different levels, including:
- Batch: adapt query plans after X units of time
- Per query: adapt after every query
- Inter-operator: adapt after several operators
- Intra-operator: adapt within an operator
- Per tuple: adapt after one or more tuples
Per Query Adaptivity Illustration
- Adapt after every query has been executed
- Share execution of common subexpressions between similar queries
- Reuse optimized sub-plans
[Figure labels: XML View, Data Sources, operators N, S, J, T]
Inter-Operator Adaptivity Illustration
- Adapt after one or more operators have been executed
- Modify query execution plans on the fly when delays are encountered at runtime
- Operator scheduling for CPU and memory allocation
- Alternative source selection
[Figure labels: XML View, Data Sources, operators N, S, J, T]
Intra-Operator Adaptivity Illustration
- Adapt during the execution of one operator
- Change the execution of one operator to another semantically correct implementation
- Input stream scheduling
[Figure labels: XML View, Data Sources, operators J, N, S]
Per Tuple Adaptivity Illustration
- Adapt some operator's execution on a tuple-by-tuple basis
- Each tuple can be routed to a different join in the query plan so that each join is busy at all times
- Uses timestamps to keep track of which tuples have run through which joins
[Figure labels: Tuple Router, XML View, Data Sources, operators T, J, N, S]
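The routing idea can be sketched in a few lines. To stay short, this illustration departs from the slide in two labelled ways: the operators are simple filters rather than joins, and each tuple carries a "done" set instead of timestamps. The routing policy (try the operator with the lowest observed pass rate first) and all names are assumptions.

```python
def route(tuples, operators):
    """Send each tuple through every operator in a per-tuple chosen order.

    operators maps a name to a predicate. A tuple is emitted only if all
    predicates accept it; the order in which they are tried is picked per
    tuple from the pass rates observed so far.
    """
    stats = {name: [0, 0] for name in operators}  # name -> [seen, passed]

    def pass_rate(name):
        seen, passed = stats[name]
        return passed / seen if seen else 0.0

    for t in tuples:
        done = set()  # operators this tuple has already been through
        alive = True
        while alive and len(done) < len(operators):
            # Route to the remaining operator most likely to reject the tuple.
            name = min((n for n in operators if n not in done), key=pass_rate)
            stats[name][0] += 1
            if operators[name](t):
                stats[name][1] += 1
                done.add(name)
            else:
                alive = False
        if alive:
            yield t


operators = {
    "even": lambda x: x % 2 == 0,
    "big": lambda x: x > 10,
}
routed = list(route(range(20), operators))
```

Whatever order the router picks, the output is the same set of tuples; only the amount of work per tuple changes, which is the point of adapting at this granularity.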
Related Work
- Tukwila project at U. of Washington
  - Pure XML AQP through the integration of query planning and execution
  - Optimizes for time to first tuple first, then for the whole result later
  - Dynamic scheduling of operators to adjust to I/O delays and flow rates
  - Breaks the query into execution groups or fragments and can re-optimize the plan after each group has been executed
  - Uses event-condition-action rules to determine whether re-optimization should take place
Related Work II
- Havasu project at Arizona State U.
  - User-preference-driven query optimization
- Niagara project at U. of Wisconsin
  - User doesn't have to specify the sources for a query
  - Allows the user to ask "give me results so far" even in the presence of aggregation operators
- MIX system at UC San Diego / San Diego Supercomputer Center
  - Information integration system using XML as the intermediate data model
  - Lazy navigation into the result, controlled by the user
  - Doesn't adapt the query plan during execution
Related Work III
- Aurora project at Brown/MIT/Brandeis
- Telegraph project at UC Berkeley
- STREAM project at Stanford Univ.
To be continued…