Dynamic Plan Migration for Continuous Queries over Data Streams Yali Zhu, Elke Rundensteiner and George Heineman Database System Research Group, WPI. Massachusetts,

1 Dynamic Plan Migration for Continuous Queries over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group, WPI, Massachusetts, USA
SIGMOD 2004
*Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond."

2 Stream Query Optimization
Differences from traditional query optimization?

3 Stream Query Optimization
- More dynamic fluctuations in statistics → compile-time optimization not possible
- Global optimization not practical because of huge query networks → adaptive optimization
- Need to take CPU processing and main memory into account → other cost models

4 Motivation of 'Query Migration'
Continuous queries over streams
- Statistics unknown before start
- Statistics changing during execution: stream rates, arrival pattern, distribution, etc.
Need for dynamic adaptation
- Plan re-optimization: change the shape of the query plan tree

5 Run-time Plan Re-Optimization
- Step 1: Decide when to optimize → statistics monitoring
- Step 2: Generate new query plan → query optimization
- Step 3: Replace current plan by new plan → plan migration

6 Naïve Plan Migration Strategy
Migration steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan
[Figure: old and new join plans over inputs A, B and C]
Problem: works for stateless operators only

7 Stateful Operator in CQ
Why stateful
- Need non-blocking operators in CQ
- Operator needs to output partial results
- State data structure keeps received tuples
Example: symmetric NL join with window constraints
[Figure: join AB with State A and State B, tuple ax and tuples b1 to b5]
Observation: the purge of tuples in states typically relies on the processing of new tuples.
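The purge-on-arrival behaviour of a symmetric window join can be sketched as follows. This is a minimal illustration and not the CAPE operator: the class name, the `process` method, the `(timestamp, value)` state layout and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import deque


class SymmetricWindowJoin:
    """Hypothetical symmetric nested-loop join with a time window."""

    def __init__(self, window, predicate):
        self.window = window
        self.predicate = predicate
        # One state per input; each entry is a (timestamp, value) pair.
        self.state = {"A": deque(), "B": deque()}

    def process(self, side, ts, value):
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Purge: only a new arrival expires tuples in the opposite state.
        while opposite and ts - opposite[0][0] > self.window:
            opposite.popleft()
        # Probe: join the new tuple with what is left in the opposite state.
        results = []
        for _, other_value in opposite:
            pair = (value, other_value) if side == "A" else (other_value, value)
            if self.predicate(*pair):
                results.append(pair)
        # Insert: keep the new tuple for later arrivals on the other input.
        self.state[side].append((ts, value))
        return results


join = SymmetricWindowJoin(window=10, predicate=lambda a, b: a == b)
first = join.process("A", 0, 1)    # State B is still empty
second = join.process("B", 5, 1)   # matches the A tuple inside the window
third = join.process("B", 20, 1)   # purges the expired A tuple before probing
```

In this sketch the B tuple with timestamp 5 stays in State B until some A tuple arrives, which illustrates the observation that purging depends on new input.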

8 Naïve Migration Strategy Revisited
Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
[Figure: old and new join plans over A, B and C, annotated "(2) All tuples drained", "(3) Old replaced by new", "(4) Processing resumed"]
Problem: deadlock waiting

9 Concept of Migration Boxes
Two exchangeable migration boxes
- One contains old plan or sub-plan
- One contains new plan or sub-plan
- Two plans are semantically equivalent
- Same input queues and output queues
Migration abstracted as replacing old box by new box.

10 Problem Definition
Dynamic plan migration
- Input (two migration boxes)
  - One contains old plan
  - One contains new plan
  - Have same input and output queues
- Result
  - Old box is replaced by new box
Valid migration
- No missing tuples
- No duplicates
Key points
- Involved plans contain stateful operators
- Need to migrate yet still retain useful states and discard useless states

11 State of the Art
"Efficient mid-query re-optimization of sub-optimal query execution plans" [Kabra, DeWitt 1998]
- Only migrates unprocessed portion
Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]
- Generate several candidate query plans before start
- Execute all, choose one after a while

12 Outline
- Problem Motivation and Definition
- Dynamic Migration Strategies
  - Moving State Strategy
  - Parallel Track Strategy
- Experimental Results

13 Moving State Strategy
Basic idea
- Share common states between two boxes
Key steps
- Identify common states: state matching
- Share common states: state moving
- Recompute unmatched states: state recomputing

14 Moving State Strategy
State matching
- Each state in old box has a unique ID
- During rewriting, a new ID is given to each newly generated state in new box
- When rewriting is done, match states based on IDs
State moving
- Between matched states
- On same machine, creates new pointers for matched states in new box
What's left?
- Unmatched states in new box
[Figure: old box with states S_A, S_B, S_AB, S_C, S_ABC, S_D and new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD; both boxes read input queues Q_A, Q_B, Q_C, Q_D and write output queue Q_ABCD]

15 Moving State Strategy
Basic idea
- Share common states between two migration boxes
Key steps
- State matching: match states based on IDs
- State moving: create new pointers for matched states in new box
- What's left? Unmatched states in new box
[Figure: old box with states S_A, S_B, S_AB, S_C, S_ABC, S_D and new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD; both boxes read input queues Q_A, Q_B, Q_C, Q_D and write output queue Q_ABCD]
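State matching and state moving can be sketched as a dictionary lookup by state ID, with matched states shared by reference rather than copied. This is only an illustration under assumptions: the function name `match_and_move`, the use of Python lists as states and the string IDs are hypothetical, and the state names are taken from the figure.

```python
def match_and_move(old_states, new_state_ids):
    """Return the new box's states plus the IDs that found no match.

    old_states: dict mapping state ID to the state object in the old box.
    new_state_ids: IDs of the states required by the new box.
    """
    new_states, unmatched = {}, []
    for state_id in new_state_ids:
        if state_id in old_states:
            # State moving: point at the existing state, no copying.
            new_states[state_id] = old_states[state_id]
        else:
            # Left for the recomputing step; starts empty.
            new_states[state_id] = []
            unmatched.append(state_id)
    return new_states, unmatched


old_box = {name: [] for name in ("S_A", "S_B", "S_C", "S_D", "S_AB", "S_ABC")}
new_box, unmatched = match_and_move(
    old_box, ["S_A", "S_B", "S_C", "S_D", "S_BC", "S_BCD"]
)
```

With these inputs, `unmatched` holds `S_BC` and `S_BCD`, the two states that exist only in the new box.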

16 Unmatched States
State recomputing
- Recursively recompute unmatched S_BC and S_BCD from bottom up
Why always possible?
- Old and new boxes have same input queues
- The states associated with input queues always match
Why necessary?
[Figure: new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD, input queues Q_A, Q_B, Q_C, Q_D and output queue Q_ABCD]
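Bottom-up recomputation can be sketched as a recursion over the new plan: a missing state is rebuilt by joining its two child states, which are themselves rebuilt first if needed. The sketch below is a simplification under assumptions: tuples are plain Python tuples, the join predicate is a made-up equality test, and window constraints are ignored. The names `recompute` and `children` are hypothetical.

```python
def recompute(state_id, states, children, predicate):
    """Fill states[state_id] from its child states, recursing bottom-up."""
    if state_id in states:
        # Matched (or already recomputed) state: reuse it.
        return states[state_id]
    left_id, right_id = children[state_id]
    left = recompute(left_id, states, children, predicate)
    right = recompute(right_id, states, children, predicate)
    # Join every pair of child tuples that satisfies the predicate.
    states[state_id] = [l + r for l in left for r in right if predicate(l, r)]
    return states[state_id]


# New plan shape from the slides: S_BC is built from S_B and S_C,
# and S_BCD is built from S_BC and S_D.
children = {"S_BC": ("S_B", "S_C"), "S_BCD": ("S_BC", "S_D")}
states = {"S_B": [(1,), (2,)], "S_C": [(1,), (3,)], "S_D": [(1,)]}
recompute("S_BCD", states, children, lambda l, r: l[-1] == r[0])
```

The recursion always terminates at states fed directly by input queues, which matches the argument that those states always match.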

17 Terms on Tuples
New/old tuples
- Old: tuples already in old box when migration starts
- New: tuples that do not exist in old box when migration starts
Sub-tuples
- Tuple ABCD is the result of joining tuples A, B, C and D
- Tuples A, B, C and D are sub-tuples of tuple ABCD
- Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
[Figure: old box with states S_A, S_B, S_AB, S_C, S_ABC, S_D, input queues Q_A, Q_B, Q_C, Q_D and output queue Q_ABCD]

18 Why Recompute Unmatched States
To get the complete results of ABCD, we need all 16 old/new combinations.
If S_BC is not recomputed, we will miss results with both B and C as OLD.
[Figure: new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD and input queues Q_A, Q_B, Q_C, Q_D; legend distinguishes old tuples from new tuples in example A, B, C, D combinations]
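The counting argument can be checked by enumerating the old/new labels of the four sub-tuples. The snippet below is a small illustration; the variable names are hypothetical.

```python
from itertools import product

STREAMS = "ABCD"

# Every way to label the sub-tuples A, B, C, D as old or new.
combos = list(product(("old", "new"), repeat=len(STREAMS)))

# Results that need an old B joined with an old C; these are the ones
# lost if S_BC starts empty instead of being recomputed.
missed = [c for c in combos if c[1] == "old" and c[2] == "old"]
```

Here `combos` has 16 entries and `missed` has 4, one for each old/new choice of A and D.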

19 Cost Estimation of MS Migration
Cost of MS consists of
- Cost of state matching: ID comparison (negligible)
- Cost of state moving: create pointers (negligible)
- Cost of state recomputing: majority of cost
Affecting parameters
- Operator selectivities
- # of tuples in states, estimated as (input rate x window size)
See paper for detailed cost models
Cost model conclusion: cost of MS has a polynomial relationship to window size

20 Cost Estimation of MS Migration
$T_{MS} = T_{match} + T_{move} + T_{recompute}$
$T_{MS} \approx T_{recompute}(S_{BC}) + T_{recompute}(S_{BCD}) = \lambda_B \lambda_C W^2 (T_j + T_s \sigma_{BC}) + 2 \lambda_B \lambda_C \lambda_D W^3 (T_j \sigma_{BC} + T_s \sigma_{BC} \sigma_{BCD})$
Symbols
- $T_m$: time spent for each string comparison
- $T_c$: time spent to create a new cursor
- $T_j$: time spent to join a pair of tuples
- $T_s$: time spent to insert one tuple into a state
- $\lambda_A$: average tuple input rate from Q_A
- $\lambda_B$: average tuple input rate from Q_B
- $\sigma_{AB}$: reduction factor of join operator AB
- $W$: global time window constraint
[Figure: new box with states S_B, S_C, S_BC, S_D, S_BCD and input queues Q_A, Q_B, Q_C, Q_D]
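The recomputation estimate can be evaluated directly. The function below simply transcribes the two terms of the formula on this slide; the function name and parameter names are hypothetical, and rates, window and per-tuple times are assumed to use consistent time units.

```python
def ms_recompute_cost(lam_b, lam_c, lam_d, w, t_j, t_s, sigma_bc, sigma_bcd):
    """Estimated recomputation time for S_BC plus S_BCD."""
    # First term: rebuild S_BC from S_B and S_C.
    cost_s_bc = lam_b * lam_c * w**2 * (t_j + t_s * sigma_bc)
    # Second term: rebuild S_BCD from S_BC and S_D.
    cost_s_bcd = (
        2 * lam_b * lam_c * lam_d * w**3
        * (t_j * sigma_bc + t_s * sigma_bc * sigma_bcd)
    )
    return cost_s_bc + cost_s_bcd


# Made-up numbers, chosen only to make the arithmetic easy to follow.
example = ms_recompute_cost(
    lam_b=1, lam_c=1, lam_d=1, w=2, t_j=1, t_s=1, sigma_bc=0.5, sigma_bcd=0.5
)
```

With these illustrative inputs the first term is 4 x 1.5 = 6 and the second is 16 x 0.75 = 12. The W^2 and W^3 factors are what make the cost polynomial in the window size.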

21 MS Migration Pros and Cons
Pros
- Fast when # of tuples in states is small: low input rates, low selectivity or small window
Cons
- Output silence during entire migration stage
Can the query produce output even during migration?
- Motivation for Parallel Track Strategy

22 Parallel Track Strategy
Basic idea
- Execute both old and new plans in parallel
- Gradually "push" old tuples out of old box by purging
Key steps
- Connect new box
- Execute both boxes in parallel
- Remove old box once "expired": it contains only new tuples, no old tuples or sub-tuples

23 Parallel Track Strategy
Key steps
- Connect boxes
- Execute in parallel until old box is "expired" (no old tuple or sub-tuple)
- Disconnect old box
- Start executing new box only
[Figure: old box with states S_A, S_B, S_AB, S_C, S_ABC, S_D and new box with states S_A, S_B, S_C, S_BC, S_D, S_BCD; both boxes read input queues Q_A, Q_B, Q_C, Q_D and write output queue Q_ABCD]
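The "expired" test can be sketched as a scan over the old box's states. This sketch assumes, hypothetically, that every stored tuple carries the arrival timestamps of its sub-tuples and that a sub-tuple counts as new when it arrived at or after the migration start. The slides do not say how CAPE tracks this.

```python
def old_box_expired(states, migration_start):
    """True when no state in the old box holds an old tuple or sub-tuple.

    states: dict mapping a state name to a list of tuples, where each
    tuple is a sequence of sub-tuple arrival timestamps.
    """
    return all(
        ts >= migration_start
        for tuples in states.values()
        for tup in tuples
        for ts in tup
    )


still_old = {"S_A": [(12,)], "S_AB": [(12, 9)]}    # B sub-tuple predates start
all_new = {"S_A": [(12,)], "S_AB": [(12, 11)]}
expired_before = old_box_expired(still_old, migration_start=10)
expired_after = old_box_expired(all_new, migration_start=10)
```

Once the check returns true, the old box can be disconnected and the new box runs alone.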

24 Potential Duplicates
Tuple ABCD
- 2^4 = 16 possible old/new sub-tuple combinations
- The same case must not be generated by both boxes, otherwise we have duplicates
In new box
- All states start empty
- Only generates ABCD as (new, new, new, new)
In old box
- May generate all 16 cases
- Duplicates the case of (new, new, new, new)

25 Duplicate Elimination
At root op in old box: if both to-be-joined tuples have all-new sub-tuples, don't join.
Other ops in old box: proceed as normal.
[Figure: old box with states S_A, S_B, S_AB, S_C, S_ABC, S_D, input queues Q_A, Q_B, Q_C, Q_D and output queue Q_ABCD]
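The root-operator rule can be written as a one-line predicate. The sketch assumes, hypothetically, that each tuple reaching the root carries one boolean flag per sub-tuple, set to true when that sub-tuple is new.

```python
def root_should_join(left_is_new, right_is_new):
    """Decide whether the old box's root operator joins two tuples.

    Each argument is a sequence of booleans, one per sub-tuple,
    True meaning that sub-tuple is new.
    """
    # Skip only the (new, ..., new) case: the new box produces it.
    return not (all(left_is_new) and all(right_is_new))


# Root of the old box joins an ABC tuple with a D tuple.
skip = root_should_join((True, True, True), (True,))    # all new
keep = root_should_join((True, False, True), (True,))   # B is old
```

Only the all-new combination is suppressed, so the old box still contributes the other 15 cases.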

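26 Estimation of PT Migration Duration
Estimation formula, where h is the height of the query tree:
$$T_{PT} = \begin{cases} W & \text{if } h = 0 \\ 2W & \text{if } h > 0 \end{cases}$$
[Figure: timeline from T_M-start to T_M-end split into a 1st W and a 2nd W, showing old and new tuples; old box with h=2 (states S_A, S_B, S_AB, S_C, S_ABC, S_D over input queues Q_A, Q_B, Q_C, Q_D) and old box with h=0 (single join AB with states S_A, S_B over Q_A, Q_B)]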

27 PT Migration Duration
Given enough system computing resources
- New tuples processed right away
- PT migration duration ≈ 2W
If not enough system resources
- New tuples accumulate in queues
- PT migration duration > 2W

28 Cost Estimation of PT Migration
Cost of PT = cost of processing 2W's new tuples in old box + cost of processing 2W's new tuples in new box
Parameters: input rates, window size, selectivity
Similar to MS strategy

29 Cost Estimation of PT Migration
Costs of processing 2W's new tuples in both boxes
For old box
- T_AB = cost of purge + cost of insert + cost of join
For new box
- Differentiate first and second W
- T_BC = cost for the first W + cost for the second W

30 PT Migration Pros and Cons
Pros
- Keeps producing results even during migration (no results during MS migration)
Cons
- Migration duration is at least 2W (MS may be faster depending on # of tuples in states)

31 Outline
- Problem Definition and Motivation
- Dynamic Migration Strategies
  - Moving State Strategy
  - Parallel Track Strategy
- Experimental Results

32 Experimental Setup
Embedded in the CAPE system
- CAPE = Continuous Adaptive Processing Engine
- A streaming query engine developed at DSRG, WPI
- VLDB'04 demo
- Layers of adaptations: punctuation exploring, adaptive scheduling, query migration, dynamic distribution

33 Experimental Setup (II)
Experiments on migration duration
- Vary window size
- Vary input rates
Experiments on migration effects
- Changes of output rates
Arrival streams
- Generated by stream generator in CAPE
- Poisson arrival pattern (exponential inter-arrival time)
Machine
- Windows 2000, Pentium III processor
- 500 MHz CPU, 384 MB RAM
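A Poisson arrival pattern can be generated by drawing exponential inter-arrival times. The sketch below is not the CAPE stream generator; the function name, its parameters and the fixed seed are hypothetical, and it uses only `random.expovariate` from the Python standard library.

```python
import random


def poisson_arrivals(mean_interarrival_ms, duration_ms, seed=0):
    """Return arrival timestamps (ms) of a Poisson stream."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    now = 0.0
    arrivals = []
    while True:
        # expovariate takes the rate, i.e. 1 / mean inter-arrival time.
        now += rng.expovariate(1.0 / mean_interarrival_ms)
        if now > duration_ms:
            return arrivals
        arrivals.append(now)


stream_a = poisson_arrivals(mean_interarrival_ms=100, duration_ms=10_000)
```

On average such a stream delivers about `duration_ms / mean_interarrival_ms` tuples, though any single run will deviate from that.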

34 Experimental Setup (II)
Experiments on migration duration
- Vary window size
- Vary input rates
Experiments on migration effects
- Changes of intermediate results
- Changes of output rates
- Data sets: enough system resources (low config), not enough system resources (high config)
Machine
- Windows 2000, Pentium III processor
- 500 MHz CPU, 384 MB RAM
Data sets (set1 to set3: migration duration; set4 (L) and set5 (H): migration effects)

           set1   set2   set3   set4 (L)   set5 (H)
W (ms)     vary   1000   vary   1000       2000
I_B (ms)   100    vary   12     100        50
I_C (ms)   100    50     12     100        50
I_D (ms)   100    50     12     100        50

Rows with merged cells, values in left-to-right order: I_A (ms): 100, 50, 100, 50; σ_AB: 0.1, 0.2; σ_BC: 0.05, 0.1, 0.02, 0.05; σ_CD: 0.02, 0.1, 0.02, 0.05

35 Migration Duration vs. Window Size

36 Migration Duration vs. Input Rates
- T_MS almost constant
- T_PT increases with λ_B

37 Migration Effects
Migration starts at 10000 ms
Four lines
- New: run the new (better) query plan alone
- Old: run the old (worse) query plan alone
- MS: start with old plan, migrate to new plan by MS strategy
- PT: start with old plan, migrate to new plan by PT strategy

38 Experimental Results: High Config
Migration starts at 10000 ms
- New: run the new (better) query plan alone
- Old: run the old (worse) query plan alone
- MS: start with old plan, migrate to new plan by MS strategy
- PT: start with old plan, migrate to new plan by PT strategy

39 Conclusions
- Identify problem of migration for stateful operators
- First solutions for continuous query migration
  - Moving state strategy
  - Parallel track strategy
- Embed both strategies into stream system
- Cost model and experimental evaluation
  - Cost model confirmed by experiments
  - Identify performance trade-off of two strategies

40 Conclusions
Migration duration
- Consistent with prior analysis
- Moving State Strategy: affected by arrival rates and window size
- Parallel Track Strategy: 2W given enough system resources, otherwise affected by arrival rates and window size
Output during migration
- No output during MS migration
- Still output during PT migration

41 Future Work
General migration framework
- All stateful operator types
Cost analysis
- Effects on optimization choices

42 CAPE website: http://davis.wpi.edu/~dsrg/CAPE/

