The Raindrop Engine: Continuous Query Processing Elke A. Rundensteiner Database Systems Research Lab, WPI 2003.


1 The Raindrop Engine: Continuous Query Processing Elke A. Rundensteiner Database Systems Research Lab, WPI 2003

2 Monitoring Applications. Monitor troop movements during combat and warn when soldiers veer off course. Send an alert when a patient's vital signs begin to deteriorate. Monitor incoming news feeds for stories on "Iraq". Scour network traffic logs looking for intruders.

3 Properties of Monitoring Applications. Queries and monitors run continuously, possibly without end. Applications have varying service preferences: patient monitoring wants only the freshest data; remote sensors have limited memory; a news service wants maximal throughput; taking 60 seconds to process vital signs and sound an alert may be too long.

4 Properties of Streaming Data. Possibly never-ending stream of data. Unpredictable arrival patterns: network congestion; weather (for external sensors); a sensor moving out of range.

5 DBMS Approach to Continuous Queries. Insert each new tuple into the database and encode queries as triggers [MWA03]. Problems: high overhead for inserts [CCC02]; triggers do not scale well [CCC02]; static optimization and execution strategies cannot adapt to unpredictable streams; the system is under-utilized if data streams arrive slowly; there is no means to input application service requirements.

6 New Class of Query Systems. CQ systems emerged recently (Aurora, STREAM, NiagaraCQ, Telegraph, et al.). They generally work as follows: the system subscribes to some streams; end users issue continuous queries against the streams; the system returns results to the user as a stream. All CQ systems use some adaptive techniques to cope with unpredictable streams.

7 Overview of Adaptive Techniques in CQ Systems
Aurora [CCC02] -- Techniques: load shedding; batch tuple processing to reduce context switching. Goal: maintain high quality of service.
STREAM [MWA03] -- Techniques: adaptive scheduling algorithm (Chain) [BBM03]; load shedding. Goal: minimize memory requirements during periods of bursty arrival.
NiagaraCQ [CDT00] -- Technique: generate near-optimal query plans for multiple queries. Goal: efficiently share computation between multiple queries; highly scalable system; maximize output rate.
Eddies [AH00] (Telegraph) -- Technique: dynamically route tuples among joins. Goal: keep the system constantly busy; improve throughput.
XJoin [UF00] -- Technique: break the join into 3 stages and use both memory and disk storage. Goal: keep the join and the system running at full capacity at all times.
[UF01] -- Technique: schedule the streams with the highest rate. Goal: maximize throughput to clients.
Tukwila [IFF99] [UF98] -- Technique: reorganize query plans on the fly, using synchronization packets to tell operators to finish their current work. Goal: improve ill-performing query plans.

8 The WPI Stream Project: Raindrop. The CAPE Runtime Engine comprises an Execution Engine, Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Storage Manager, Stream Receiver, Distribution Manager, and Query Plan Generator, plus a Stream/Query Registration GUI. (Architecture diagram: stream providers and queries come in; results go out.)

9 Topics Studied in the Raindrop Project: bringing XML into a stream engine; scalable query operators (punctuations); cooperative plan optimization; adaptive operator scheduling; on-line query plan migration; distributed plan execution.

10 PART I: XQueries on XML Streams (Automaton Meets Algebra) Based on CIKM’03 Joint work with Hong Su and Jinhui Jian

11 What's Special about XML Stream Processing? Data arrives in a token-by-token access manner along a timeline, and a token is not a direct counterpart of a tuple. The query combines pattern retrieval, filtering, and restructuring:
FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t
(Figure: pattern retrieval on a token stream over a sample book element with title, author, publisher, price, and year.)

12 Two Computation Paradigms: automata-based [yfilter02, xscan01, xsm02, xsq03, xpush03, ...] and algebraic [niagara00, ...]. The Raindrop framework intends to integrate both paradigms into one.

13 Automata-Based Paradigm. For the query FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t, an automaton (states 1-4, with a * self-loop) encodes the path expressions //book, //book/title, and //book/price, augmented with auxiliary structures for: 1. buffering data; 2. filtering; 3. restructuring.
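The automaton's behavior on a token stream can be sketched with a stack of open tags standing in for the automaton state. The SAX-like token format and the function name below are illustrative, not Raindrop's actual interfaces:

```python
def match_patterns(tokens):
    """Stack-based sketch of automaton-style pattern retrieval:
    report matches of //book, //book/title, and //book/price
    as start-tag tokens stream in."""
    path, matches = [], []
    for event, value in tokens:
        if event == "start":
            path.append(value)
            if value == "book":
                matches.append("//book")
            elif "book" in path[:-1] and value in ("title", "price"):
                matches.append("//book/" + value)
        elif event == "end":
            path.pop()
        # "text" tokens would be routed to the buffering structures
    return matches

# A token stream for one <book> element, mimicking SAX events:
tokens = [("start", "bib"), ("start", "book"),
          ("start", "title"), ("text", "Dream Catcher"), ("end", "title"),
          ("start", "price"), ("text", "30"), ("end", "price"),
          ("end", "book"), ("end", "bib")]
```

Running `match_patterns(tokens)` reports the three pattern matches in stream order, which is exactly what the auxiliary filtering and restructuring structures consume.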

14 Algebraic Computation. For the same query, a logic plan navigates Navigate $b, /price->$p and Navigate $b, /title->$t, applies Select $p < 30, and finishes with a Tagger. Rewriting by "pushing down selection" yields a rewritten logic plan in which Select $p < 30 follows Navigate $b, /price->$p directly. Choosing low-level implementation alternatives (e.g., Navigate-Index $b, /price->$p versus Navigate-Scan $b, /title->$t) yields a physical plan.

15 Observations. Each paradigm has deficiencies, and the two complement each other:
Automata paradigm: good for pattern retrieval on tokens; needs patches for filtering and restructuring; presents all details at the same low level; little studied as a query processing paradigm.
Algebra paradigm: does not support token inputs; good for filtering and restructuring; supports multiple descriptive levels (declarative -> procedural); well studied as a query processing paradigm.

16 How to Integrate the Two Paradigms?

17 How to Integrate the Two Models? Design choices: extend the algebraic paradigm to support automata; extend the automata paradigm to support algebra; or come up with a completely new paradigm. We extend the algebraic paradigm to support automata. Practical: reuse and extend existing algebraic query processing engines. Natural: present the details of automata computation at a low level and the semantics of automata computation (target patterns) at a high level.

18 Raindrop: Four-Level Framework. From high abstraction level (declarative) to low (procedural): Semantics-focused Plan, Stream Logical Plan, Stream Physical Plan, Stream Execution Plan.

19 Level I: Semantics-focused Plan [Rainbow-ZPR02]. Expresses query semantics regardless of stored or stream input sources. Reuses existing techniques for stored XML processing: query parser; initial plan constructor; rewriting optimizations (decorrelation, selection push-down, ...).

20 Example Semantics-focused Plan. For the query FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t, the plan is: NavUnnest $S1, //book ->$b; NavNest $b, /price/text() ->$p; NavNest $b, /title ->$t; Select $p<30; Tagger "Inexpensive", $t->$r. (Figure: the intermediate tuples produced at each operator over a sample stream.)

21 Level II: Stream Logical Plan. Extends the semantics-focused plan to accommodate tokenized stream inputs: a new input data format (contextualized tokens); new operators (StreamSource, Nav, ExtractUnnest, ExtractNest, StructuralJoin); new rewrite rules (push-into-automata).

22 One Uniform Algebraic View. The algebraic stream logical plan composes a token-based plan (the automata plan), which consumes the XML data stream, with a tuple-based plan, which consumes the resulting tuple stream and produces the query answer.

23 Modeling the Automata in the Algebraic Plan: Black Box [XScan01] vs. White Box. Black box: a single XScan operator retrieves $b := //book, $p := $b/price, $t := $b/title. White box: the automata computation is exposed as algebraic operators: Navigate $S1, //book ->$b; Navigate $b, /price->$p; Navigate $b, /title->$t; ExtractNest $b, $p; ExtractNest $b, $t; StructuralJoin $b.

24 24 Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return $t Tuple-based plan Token-based plan (automata plan)

25 25 Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return $t StructuralJoin $b ExtractNest $b, $p ExtractNest $b, $t Navigate $b, /price->$p Navigate $b, /title->$t Navigate $S1, //book ->$b Tuple-based plan

26 26 Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return $t StructuralJoin $b ExtractNest $b, $p ExtractNest $b, $t Navigate $b, /price->$p Navigate $b, /title->$t Navigate $S1, //book ->$b Select $p<30 Tagger “Inexpensive”, $t->$r

27 From Semantics-focused Plan to Stream Logical Plan. Applying the "push into automata" rewrite turns NavUnnest $S1, //book ->$b; NavNest $b, /price/text() ->$p; NavNest $b, /title ->$t into automaton-side operators Nav $S1, //book ->$b; Nav $b, /price/text()->$p; Nav $b, /title->$t feeding ExtractNest $b, $p; ExtractNest $b, $t; StructuralJoin $b, followed by Select $p<30 and Tagger "Inexpensive", $t->$r.

28 Level III: Stream Physical Plan. For each stream logical operator, defines how to generate outputs given some inputs. Multiple physical implementations may be provided for a single logical operator, and the automata details of some physical implementations (Nav, ExtractNest, ExtractUnnest, StructuralJoin) are exposed at this level.

29 One Implementation of Extract/StructuralJoin. (Diagram: the automaton with states 1-4 for //book, title, and price drives Nav $b, /price->$p and Nav $b, /title->$t; ExtractNest $b, $t and ExtractNest $b, $p feed SJoin //book as tokens are consumed.)

30 Level IV: Stream Execution Plan. Describes the coordination between operators regarding when to fetch inputs: when the input operator generates one output tuple; when the input operator generates a batch; when a time period has elapsed; etc. The potentially unstable data arrival rate of a stream makes a fixed scheduling strategy unsuitable: delayed data on a scheduled stream may stall the engine, while bursty data on an unscheduled stream may cause overflow.

31 Raindrop: Four-Level Framework (Recap). Semantics-focused Plan: expresses query semantics regardless of input sources. Stream Logical Plan: accommodates tokenized input streams. Stream Physical Plan: describes how operators manipulate given data. Stream Execution Plan: decides the coordination among operators.

32 32 Optimization Opportunities

33 Optimization Opportunities. Semantics-focused Plan: general rewriting (e.g., selection push-down). Stream Logical Plan: break-linear-navigation rewriting. Stream Physical Plan: choice of physical implementations. Stream Execution Plan: choice of execution strategy.

34 From Semantics-focused to Stream Logical Plan: In or Out? Should pattern retrieval stay in the semantics-focused (tuple-based) plan, or be pushed into the automata (token-based) plan via the "push into automata" rewrite?

35 Plan Alternatives. "Out": retrieve only //book in the automaton (Nav $S1, //book->$b; ExtractNest $S1, $b), then evaluate Navigate /price, Select price<30, Navigate book/title, and Tagger on tuples. "In": push all pattern retrieval into the automaton (Nav $b, /price->$p; Nav $b, /title->$t; ExtractNest $b, $p; ExtractNest $b, $t; SJoin //book), then Select price < 30 and Tagger. The fully tuple-based plan (NavUnnest $S1, //book ->$b; NavNest $b, /price ->$p; NavNest $b, /title ->$t; Select $p<30; Tagger "Inexpensive", $t->$r) is the starting point.

36 Experimental Results

37 Contributions Thus Far. Combined the automata-based and algebra-based paradigms into one uniform algebraic paradigm. Provided four layers in the algebraic paradigm: query semantics expressed at the high layers; automata computation on streams hidden at the low layers. Supported optimization in an iterative manner (from high abstraction level to low abstraction level). Illustrated the enriched optimization opportunities by experiments.

38 On-Going Issues To Be Tackled: exploiting XML schema constraints for query optimization; costing/query optimization of plans; on-the-fly migration into/out of the automaton; physical implementation strategies for operators; load shedding from an automaton.

39 PART II: On-line Query Plan Migration Joint work with Yali Zhu

40 Motivation for Migration. An initially good query plan may become less effective over time due to: changes in stream data distributions (selectivity); changes in data arrival rates (operator overload); addition of new queries or de-registration of existing queries; changes in the resources available for query evaluation; changes in quality-of-service requirements.

41 41 A Simple Motivating Example

42 On-line Plan Optimization: detection of suboptimality in the plan; query optimization via plan rewriting; on-line migration of the sub-plan.

43 Related Work. "Efficient mid-query re-optimization of sub-optimal query execution plans", N. Kabra and D. DeWitt, SIGMOD 1998 (only re-optimizes the part of the query that has not yet started executing). "On reconfiguring query execution plans in distributed object-relational DBMS", K. Ng, ICPADS 1998 (plan cloning). "Continuously Adaptive Continuous Queries over Streams", S. Madden and J. Hellerstein (UC Berkeley), SIGMOD 2002.

44 Focus on Dynamic Migration. Given a better plan or sub-plan, dynamically migrate the running plan to the given plan, guaranteeing correctness of results: no missing results, no duplicates, no incorrect results.

45 Join Algorithm: a Stateful Operator (Symmetric NLJ). For each new A tuple: purge State B using the time-based window constraint W; join with the tuples remaining in State B; output the results to the output queue; insert the new tuple into State A. B tuples are handled symmetrically. (Trace: a1 joins b1 and b2 as they arrive; a2 joins b1 and b2; a3 joins only b2 after b1 has been purged.)
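The four steps above can be sketched directly; the class and method names are illustrative, not CAPE's actual operator API, and the join here is the windowed cross product shown in the trace:

```python
from collections import deque

class SymmetricWindowJoin:
    """Sketch of the symmetric NLJ described above. Tuples are
    (timestamp, value) pairs; w is the time-based window constraint."""
    def __init__(self, w):
        self.w = w
        self.state = {"A": deque(), "B": deque()}

    def insert(self, side, ts, val):
        other = "B" if side == "A" else "A"
        # 1. purge opposite-state tuples that fell out of the window
        while self.state[other] and self.state[other][0][0] < ts - self.w:
            self.state[other].popleft()
        # 2. join the new tuple with every surviving opposite tuple
        out = [(val, v) if side == "A" else (v, val)
               for _, v in self.state[other]]
        # 3. add the new tuple to its own state for future joins
        self.state[side].append((ts, val))
        return out
```

Replaying the trace with W = 3: a2 arriving at time 3 still joins with both b1 and b2, while a tuple arriving much later would first purge them in step 1.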

46 So what's the problem with migration? Old states in the old plan still need to join with future incoming tuples, so they cannot be discarded, while new tuples arrive randomly and continuously in a streaming system. Example: the old plan (A join B) join C holds States A, B, C and the intermediate State AB; the new plan A join (B join C) needs States A, B, C and an intermediate State BC that does not exist in the old plan.

47 Box Concept. The migration unit is a box: the old box contains an old plan or sub-plan, and the new box contains a new plan or sub-plan. Two boxes are equivalent if they have the same input queues, have the same output queues, and contain semantically equivalent sub-plans; equivalent boxes can be migrated from one to the other.

48 But How? We propose two migration strategies, the moving state strategy and the parallel track strategy, compare them via cost models, and evaluate them experimentally.

49 Parallel Track Strategy. The new plan and the old plan co-exist during migration: they run in parallel and share input and output queues. Window constraints eventually time out the old states; at that point the migration stage is over, so we discard the old plan and run only the new plan.

50 A Running Example. (Trace: with W = 3, the old plan (A join B) join C and the new plan A join (B join C) run side by side on tuples a1-a3, b1-b2, c1-c2 until the old states time out.)

51 Pros and Cons. Pro: we do not need to halt the system to migrate, so there is low delay in generating results. Con (overhead): in the old-plan part, a combination consisting entirely of new tuples is discarded only at the last node, after work has already been spent on it.

52 Moving State Strategy. First, freeze the inputs and "drain" out the old plan. Then establish and connect the new plan, and move all old states over to the new states. Lastly, direct all new input data to the new plan only and resume processing.

53 Moving State Strategy. State matching: compare the states of the old and new plans. State moving: if two states match, move the state over. State re-computation: if there is no match, recompute the state.
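The three steps can be sketched as follows; the function name and the recompute callback are illustrative, and the caller is assumed to list input states before the states built from them:

```python
def migrate_states(old_states, new_state_ids, recompute):
    """Sketch of moving-state migration: match states by id, move
    matches, and recompute the rest. recompute is a hypothetical
    callback that rebuilds an unmatched state (e.g. State BC) from
    the states already moved over."""
    new_states = {}
    for sid in new_state_ids:
        if sid in old_states:                    # state matching
            new_states[sid] = old_states[sid]    # state moving
        else:                                    # state re-computation
            new_states[sid] = recompute(sid, new_states)
    return new_states
```

For the running example, migrating (A join B) join C to A join (B join C) moves States A, B, and C unchanged and recomputes State BC by joining the moved B and C.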

54 Abstract Description. (Diagram: States A, B, C, and AB of the old plan (A join B) join C being mapped onto States A, B, C, and BC of the new plan A join (B join C).)

55 Intermediate State Sharing. We can share intermediate state BC only if the inputs to both plans are exactly the same and tuples arriving at the shared state have passed exactly the same predicates; both conditions must hold for any sharing to be possible.

56 Moving State Strategy. (Trace: with W = 3, States A, B, and C are moved to the new plan, State BC is recomputed from B and C, and processing resumes, producing a2b2c2, a3b2c2, a3b1c1, ...)

57 Why Two Pointers Are Needed. Two nodes sharing the same state may see different contents, so each node keeps two pointers per associated state: First points to the first tuple of the state visible to that node, and Last points to the last. Example: for a shared State B holding b1-b4, node AB may see b1, b2 while node BC sees b2, b3, b4.
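The pointer scheme can be sketched with one shared tuple list and a per-node [first, last) window over it; the class and method names are illustrative:

```python
class SharedState:
    """Sketch of the two-pointer scheme: one shared tuple list,
    each consuming node keeping its own [first, last) view.
    Purging would advance a node's first pointer; appends that a
    node should see advance its last pointer."""
    def __init__(self):
        self.tuples = []
        self.ptrs = {}                       # node -> [first, last)

    def register(self, node):
        # a node attached now starts past tuples it never saw
        self.ptrs[node] = [len(self.tuples), len(self.tuples)]

    def append(self, tup, seen_by):
        self.tuples.append(tup)
        for node in seen_by:                 # advance each viewer's last pointer
            self.ptrs[node][1] = len(self.tuples)

    def view(self, node):
        first, last = self.ptrs[node]
        return self.tuples[first:last]
```

Registering AB before b1 and BC just before b2 reproduces the slide's picture: AB's view is b1, b2 and BC's view is b2, b3, b4, over a single physical copy of State B.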

58 Comparing the Two Migration Algorithms. The algorithms distribute work differently between the old and new plans. Writing each result case as the ages of its joining tuples from A, B, and C (O = old, arrived before migration start; N = new, arrived after):
OOO: computed by the old plan in both algorithms.
OON, ONO, NOO, NNO, NON, ONN: old plan in Algorithm 1, new plan in Algorithm 2.
NNN: computed by the new plan in both algorithms.

59 Which algorithm performs better? Which one is faster and more cost-saving? The new query plan should outperform the old one; in Algorithm 1 the old-plan part handles 7 of the 8 cases, while in Algorithm 2 the new-plan part handles 7 of the 8. Algorithm 2 thus seems to win; however, it needs extra cost to re-compute intermediate states. So cost models are needed!

60 Cost Model Assumptions. All joins are binary NL joins. We assume the statistics of each node in the query plan are already known: input arrival rates and join-node selectivities. We compute the processing power needed over the period during which migration happens, not the real power the system uses; the two differ because of resource limitations in a real system.

61 Cost Model Assumptions (cont.). Tuple processing time is assumed to be the same for tuples of different sizes. When migration starts, the old query plan is assumed to have passed its start-up stage and to be fully running, so its states are at their maximum size, as controlled by the window constraints. Window constraints are time-based and the same over all streams in a join.

62 Running Example Revisited. (Diagram: the old query plan (A join B) join C with States A, B, C, and AB, and the new query plan A join (B join C) with States A, B, C, and BC.)

63 Symbol Definitions.
lambda_a, lambda_b, lambda_c: average arrival rate (tuples/time unit) on streams A, B, and C.
sigma_AB, sigma_ABC_old: selectivities of nodes AB and ABC in the old query plan.
sigma_BC, sigma_ABC_new: selectivities of nodes BC and ABC in the new query plan.
W: the time-based window constraint over all joins (for example, 5 time units).
t: any number of time units after migration has started.
C_j: cost of joining one pair of tuples, including accessing the two tuples, comparing their values, and so on.
C_s: cost of inserting/deleting a tuple to/from a state.
C_t: cost of accessing a tuple, for example checking its timestamp.
|input_name|: size of the state for one input of a node; for example, |A| is the size of State A in node AB, and |A| = lambda_a * W.

64 Cost Model for Algorithm I (old-plan part). Cost for node AB in the old plan, whose state starts full:
C_AB = cost of purge + cost of insert + cost of join
     = [C_s*lambda_a + C_s*lambda_b + C_s*lambda_b + C_s*lambda_a + C_j*(lambda_a*|B| + lambda_b*|A|)] * t
     = [2*C_j*lambda_a*lambda_b*W + 2*C_s*(lambda_a + lambda_b)] * t   --- formula (1)
Output rate of AB: lambda_AB = (lambda_a*|B| + lambda_b*|A|)*sigma_AB = 2*lambda_a*lambda_b*W*sigma_AB   --- formula (2), so N_AB = 2*lambda_a*lambda_b*W*sigma_AB*t.
Apply the same formulas to each node in the old plan; summing the per-node costs gives the total cost.
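Formulas (1) and (2) are easy to mechanize as a sanity check; the function names and the sample numbers below are illustrative, not taken from the slides:

```python
def cost_old_node(la, lb, W, Cj, Cs, t):
    """Formula (1): purge + insert + join cost over t time units for a
    join node whose state is already full. la, lb are arrival rates,
    W the time-based window, Cj/Cs the per-tuple join/state costs."""
    return (2 * Cj * la * lb * W + 2 * Cs * (la + lb)) * t

def output_old_node(la, lb, W, sel, t):
    """Formula (2) accumulated over t: N_AB, tuples emitted by the node."""
    return 2 * la * lb * W * sel * t
```

For example, with arrival rates 1, window 5, Cj = 1, Cs = 0.5, the node costs 12 units of processing power per time unit.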

65 Cost Model for Algorithm I (new-plan part). Cost for node BC in the new plan, whose state starts empty (no purging needed while t <= W):
C_BC = join cost + insert cost = t^2*lambda_b*lambda_c*C_j + (lambda_b*t + lambda_c*t)*C_s, where t <= W   --- formula (3)
In the i-th time unit after migration starts, the output rate is lambda_i = (2i - 1)*lambda_b*lambda_c*sigma_BC, so the total number of tuples generated by any time t (t <= W) is N_BC = sum over i=1..t of lambda_i = t^2*lambda_b*lambda_c*sigma_BC   --- formula (4)
The same formulas apply to each node in the new plan.
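The collapse of the per-unit sum in formula (4) to the closed form t^2 can be checked numerically; the helper names below are illustrative:

```python
def cost_new_node(lb, lc, Cj, Cs, t):
    """Formula (3): cost of a join node that starts with empty states.
    Valid only while t <= W, i.e. before any purging begins."""
    return t * t * lb * lc * Cj + (lb * t + lc * t) * Cs

def output_new_node(lb, lc, sel, t):
    """Formula (4), written as the per-time-unit sum: the i-th unit
    contributes (2i - 1)*lb*lc*sel, which totals t^2*lb*lc*sel."""
    return sum((2 * i - 1) * lb * lc * sel for i in range(1, t + 1))
```

Since 1 + 3 + ... + (2t - 1) = t^2, the summed form and the closed form agree for every t.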

66 Cost Model for Algorithm II. Algorithm II needs extra cost for computing the new states; in the running example we must compute State BC:
C_stateBC = C_j*|B|*|C| = C_j*(lambda_b*W)*(lambda_c*W) = W^2*lambda_b*lambda_c*C_j
Formulas (1) and (2) apply to each node in the old plan, because all states are full. Adding the old-plan cost and the state re-computation cost gives the total cost of Algorithm II.
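The re-computation term instantiates directly; the function name is illustrative:

```python
def recompute_cost(lb, lc, W, Cj):
    """Extra cost of Algorithm II: rebuilding State BC by joining the
    full states B and C, where |B| = lb*W and |C| = lc*W."""
    return Cj * (lb * W) * (lc * W)   # = W^2 * lb * lc * Cj
```

This term grows quadratically with the window size W, which is why the cost-model comparison between the two algorithms hinges on W.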

67 Analysis of the Cost Models. Several parameters control the performance of the two migration algorithms: arrival rates, join-node selectivities, window size, and time. Costs may not be linear in time (the new-plan part of Algorithm I, for instance, is quadratic in t). The total migration time depends largely on the window size. We design experiments by varying these parameters.

70 Some Remaining Challenges: alternate migration strategies; selection of box sizes; dynamic optimization and migration; a comparison study with Eddies.

71 PART III: Adaptive Scheduler Selection Framework Joint work with Brad Pielech and Tim Sutherland

72 Idea. We propose a novel adaptive selection of scheduling strategies. Observations: (1) the scheduling algorithm has a large impact on the behavior of the system; (2) utilizing a single scheduling algorithm to execute a continuous query is not sufficient, because all scheduling algorithms have inherent flaws or tradeoffs. Hypothesis: adaptively choosing the next scheduling strategy can leverage the strengths of each, avoid their weaknesses, and outperform any single strategy.

73 Continuous Query Issues. The arrival rate of data is unpredictable, the volume of data may be extremely high, and certain domains may have different service requirements. A scheduling strategy such as Round Robin or FIFO is designed to resolve one particular scheduling problem (minimize memory, maximize throughput, etc.). What happens if we have multiple problems to solve?

74 Scheduling Example. sigma = tuples output / tuples input (selectivity); C = operator processing cost in time units. The pipeline is Stream -> Operator 1 (sigma = 0.9, C = 1) -> Operator 2 (sigma = 0.1, C = 0.25) -> Operator 3 (sigma = 1, C = 0.75). Operator 2 is the quickest and most selective; Operator 1 is the slowest and least selective. Every 1 time unit a new tuple arrives in Q1, starting at t0. When told to run, an operator processes at most 1 tuple; any extra is left in its queue for a later run. An operator takes C time units to process its input regardless of input size, and its output size = sigma x the number of input tuples (so with 1 input tuple and sigma = 0.9 it outputs 0.9 tuples). We assume zero time for context switches and that all tuples are the same size.

75 Scheduling Example II. Two scheduling strategies: (1) FIFO starts at the leaf and processes the newest tuple until completion: 1, 2, 3, 1, 2, 3, etc. (2) Greedy schedules the operator with the most tuples in its input buffer: 1, 1, 1, 2, 1, 1, 1, 2, ..., 3. We compare throughput and total queue sizes for both algorithms over the first few time units.

76 Scheduling Example: FIFO. FIFO starts at the leaf and processes the newest tuple until completion. (Trace table omitted.) FIFO's queue sizes grow very quickly: it spends 1 time unit processing Operator 1, then 1 time unit processing Operators 2 and 3; during these 2 time units, 2 tuples arrive in Operator 1's queue. FIFO outputs 0.09 tuples to the end user every 2 time units.

77 Scheduling Example: Greedy. Greedy schedules the operator with the largest input queue. (Trace table omitted.) Greedy's queue sizes grow at a slower rate because Operator 1 runs more often, but tuples remain queued for long periods in Operators 2 and 3 until their queue sizes exceed Operator 1's. Greedy finally outputs its first tuple at about t = 16; by then, FIFO has output 0.72 (0.09 x 8) tuples.

78 Scheduling Example Wrap-up. FIFO: + outputs tuples at regular intervals; - Q1 grows very quickly; - the output rate is low (0.045 tuples/unit); - it does not utilize operators fully (per run, O1 processes 1 tuple, O2 0.9, O3 0.09, out of a maximum of 1). Greedy: + queue sizes grow less quickly than FIFO's; + the output rate is higher (0.0625 tuples/unit); + it utilizes operators more fully (each operator runs with 1 tuple each time); - some tuples stay in the system for a long time; - there is a long delay before any tuple is output.
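FIFO's steady-state output rate follows directly from the pipeline parameters: one end-to-end pass costs the sum of the Cs and yields the product of the sigmas. A quick sanity check under the example's assumptions (Greedy's rate depends on the full trace, so it is not derived here):

```python
costs = [1.0, 0.25, 0.75]   # C for operators 1, 2, 3
sels = [0.9, 0.1, 1.0]      # selectivity sigma for operators 1, 2, 3

pass_cost = sum(costs)       # time units for one full pass down the pipeline
pass_yield = 1.0
for s in sels:
    pass_yield *= s          # tuples reaching the end user per pass

rate = pass_yield / pass_cost  # FIFO output rate, tuples per time unit
```

The pass costs 2 time units and yields 0.09 tuples, matching the 0.045 tuples/unit quoted above.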

79 So, what is our point? A single scheduling strategy is NOT sufficient when dealing with varying input stream rates, data volumes, and service requirements!

80 New Adaptive Framework. In response to this need, we propose a novel technique that selects among several scheduling strategies based on current system conditions and quality-of-service requirements.

81 Adaptive Framework. Choosing among multiple scheduling strategies can leverage the strengths of each strategy and minimize the use of a strategy when it would not perform well. Allowing the user to input service requirements means the CQ system can adapt to the user's needs rather than to a static set of needs fixed by the CQ system.

82 Quality of Service Preferences. Each application can specify its service requirements as a weighted list of behaviors to be maximized or minimized as desired. Assumptions/restrictions: one (global) preference set at a time; preferences can change during execution; only relative behavior can be specified.

83 Service Preferences II. A preference has three parameters: Metric (any statistic calculated by the system), Quantifier (maximize or minimize the given metric), and Weight (the relative importance of the metric; all weights sum to exactly 1). Example: minimize Queue Sizes with weight 0.40; maximize Throughput with weight 0.60. Currently supported metrics: throughput, queue sizes, delay.

84 84 Adaptive Selection Overview Given a table of service preferences and a set of candidate scheduling algorithms: 1. Initially run all candidate algorithms once in order to gather some statistics about their performance 2. Assign a score to each algorithm based on how well they have met the preferences relative to the other algorithms ( score formulas on next slides ) 3. Choose scheduling algorithm that will best meet the preferences based on how algorithms performed thus far 4. Run that algorithm for a period of time, record statistics 5. Repeat Steps 2-4 until query or streams are over
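The five steps above can be sketched as a loop; the function names and callback signatures are illustrative, not CAPE's actual API:

```python
def adaptive_scheduler_loop(candidates, run_for, score, pick, rounds):
    """Sketch of the adaptive selection loop. run_for executes one
    strategy for a period and returns its statistics; score rates a
    strategy's recorded statistics against the service preferences;
    pick chooses the next strategy from the scores."""
    # Step 1: run every candidate once to gather baseline statistics
    stats = {name: run_for(alg) for name, alg in candidates.items()}
    for _ in range(rounds):   # stand-in for "until query/streams end"
        # Steps 2-3: score all candidates, choose the most promising
        scores = {name: score(st) for name, st in stats.items()}
        choice = pick(scores)
        # Step 4: run the chosen strategy and record fresh statistics
        stats[choice] = run_for(candidates[choice])
    return stats
```

With a greedy `pick` (take the best score) the loop degenerates to exploitation only; the roulette-wheel selection described two slides down is one way to keep exploring.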

85 Adaptive Formulas. Z_i: the normalized statistic for preference i; I: the number of preferences; H: the historical category; decay: the decay factor. A scheduler's score is computed by summing, over the user-defined statistics, the normalized statistic score times the weight of that statistic, combined with a decayed historical score.
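The slide names the ingredients but not the exact blend, so the following sketch assumes a simple exponential-decay combination of the historical score with the current weighted sum; the function name and default decay are illustrative:

```python
def scheduler_score(norm_stats, weights, history, decay=0.8):
    """Score = sum over preferences of (normalized statistic * weight),
    blended with a decayed historical score. The exponential blend and
    the decay value are assumptions; the slide only names a decay factor."""
    current = sum(z * w for z, w in zip(norm_stats, weights))
    return decay * history + (1 - decay) * current
```

Because the statistics are normalized relative to the other candidates and the weights sum to 1, scores stay comparable across schedulers.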

86 Choosing the Next Scheduling Strategy. Once the schedulers' scores have been calculated, the next strategy has to be chosen. The framework should explore all strategies initially so it can learn how each will perform, and should periodically rotate strategies, because a strategy that did poorly before could be viable now. Note that the score of the last-run algorithm is not updated; only the other candidates have their scores updated. Roulette wheel [MIT99]: choose the next algorithm with probability proportional to its score, and assign an initial score to each strategy so that each has a chance to run. This favors the better-scoring algorithms but will still pick others.
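Roulette-wheel selection is a standard technique; a minimal sketch (the function name is illustrative):

```python
import random

def roulette_pick(scores, rng=random):
    """Roulette-wheel selection: each strategy is chosen with
    probability proportional to its score, so better performers are
    favored but every candidate keeps a nonzero chance to run."""
    total = sum(scores.values())
    spin = rng.uniform(0, total)
    acc = 0.0
    for name, s in scores.items():
        acc += s
        if spin <= acc:
            return name
    return name   # guard against floating-point round-off at the top end
```

Over many picks, a strategy scoring three times higher than another is chosen roughly three times as often, which realizes the explore/exploit balance described above.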

87 Experiments. Three parameters: (1) number of streams; (2) arrival pattern; (3) number of service preferences.

88 Experiment Setup. Five different query plans with Select and Window-Join operators. Incoming streams use simulated data with a Poisson arrival pattern; the mean arrival time is altered to control burstiness. We want to show that the adaptive framework meets the preferences better than any single algorithm; if not, the technique is not worthwhile.

89 89 Single Stream Result (2 Requirements) The adaptive strategy performs as well as PTT in this environment

90 90 Single Stream Result (3 Requirements)

91 Multi Stream Result (2 Requirements). The adaptive framework performs as well as, if not better than, both individual scheduling algorithms under differing service requirements.

92 92 Multi Stream Result (3 Requirements)

93 Related Work Comparison.
Aurora [CCC02]: more complex QoS model; makes use of alternate adaptive techniques.
STREAM [MWA03]: only meets the memory requirement.
NiagaraCQ [CDT00]: only adapts prior to query execution; concerned more with generating optimal query plans.
Eddies [AH00] (Telegraph): finer-grained adaptive strategy; we look to incorporate it in the future.
XJoin [UF00]: finer-grained technique; we look to incorporate it in the future.
[UF01]: only focuses on maximizing rate; uses a single adaptive strategy.
Tukwila [IFF99] [UF98]: reorganizes plans on the fly; we look to incorporate this in the future.

94 Conclusions. We identified a gap in existing CQ research and proposed a novel adaptive technique to address the problem; it draws on genetic algorithms and AI research, and alters the scheduling algorithm based on how well the execution is meeting the service preferences. The adaptive strategy is showing promising experimental results: it never performs worse than any single strategy, often performs as well as the best strategy and often outperforms it, and adapts to varying user environments without manual changes of scheduling strategy.

95 Overall. Many interesting problems arise in this new stream context; there is room for lots of fun research.

96 96 Email: raindrop@cs.wpi.edu

