Continuous Stream Monitoring Technology Elke A. Rundensteiner Database Systems Research Laboratory Department of Computer Science Worcester Polytechnic.

1 Continuous Stream Monitoring Technology
Elke A. Rundensteiner
Database Systems Research Laboratory, Department of Computer Science
Worcester Polytechnic Institute, USA
rundenst @ cs.wpi.edu
November 2006

2 A Database...
Vast amounts of electronic information in organisations, companies, and scientific institutes need to be organized, stored securely, and accessed efficiently and easily.
Three common steps:
- Design the schema
- Load the database
- Query the static database
[Figure: a query is posed to a DBMS over a stored database]
Select name from employee;

3 So what next?
[Figure: a query is posed to a DBMS over a stored database]
Select name from employee;

4 A Look at Modern Data: Streams!
- Digital radio telescopes
- Network traffic flow
- Stock tickers/feeds
- Sensor networks
- Web usage transactions
- Outpatient care
- Environmental instruments
[Figure: the streams feed a DSMS that filters and transforms them]
select fft(s) from radiosignal s where source(s) = "Antenna1";

5 Databases: Everything is Upside Down!
- Traditional database: static data, one-time queries
- Data streams: streams of data, standing queries
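The contrast on this slide can be made concrete with a minimal Python sketch. It is an illustration only, not code from the presentation: the function names, the `employees` table, and the `radio_signal` generator are all hypothetical, and a real stream source would never terminate.

```python
def one_time_query(table, predicate):
    """Scan stored data once and return the complete answer."""
    return [row for row in table if predicate(row)]


def standing_query(stream, predicate):
    """Stay registered and emit each matching tuple as it arrives."""
    for row in stream:
        if predicate(row):
            yield row


# Hypothetical stored table, queried once.
employees = [{"name": "Ann"}, {"name": "Bob"}]
print(one_time_query(employees, lambda r: True))


def radio_signal():
    # Hypothetical stream source; finite here only so the example ends.
    yield {"source": "Antenna1", "value": 0.4}
    yield {"source": "Antenna2", "value": 0.9}
    yield {"source": "Antenna1", "value": 0.7}


# The standing query produces results incrementally, one per arrival.
for reading in standing_query(radio_signal(), lambda r: r["source"] == "Antenna1"):
    print(reading)
```

The one-time query sees all of its data before answering. The standing query never sees "all" of its data, so it has to answer tuple by tuple.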

6 Continuous Queries on Data Streams
Online Stream Monitoring

7 Motivating Applications Everywhere
- Traffic Management: Streams of Cars and Mobile Requests
- Market Analysis: Streams of Stock Exchange Data
- Critical Care: Streams of Vital Sign Measurements
- Physical Plant Monitoring: Streams of RFID/Environmental Readings
- Emergency Response: Streams of Sensors and People Tracking

8 Mobile Traffic-Related Streams
- moving objects
- dynamic range query
- dynamic kNN query

9 Spatio-Temporal Continuous Tracking
- Monitor the traffic in the red areas
- Continuously return the area covered by the herd during the migration

10 FireEngine Project: Sensors in Rooms

11 Fire Monitoring Queries
- Track smoke and heat clouds (moving clusters) in terms of their sizes and speeds.
- Is there an outlier (prank), or an actual fire?
- Match sensor readings of the fire with a fire stream simulation to determine similarity.
- Are any sensors faulty, and should they thus be ignored?

12 Dynamicity in Stream Query Processing
[Figure: continuous queries are registered with a scalable stream query engine, which consumes streaming data (push-based paradigm) and emits streaming results]
- Streaming data may have time-varying rates and high volumes.
- Real-time and accurate responses are required.
- High workload of queries (continuous evaluation).
- Memory and CPU resource limitations.
- Available resources for executing each operator may vary over time.
- New query processing technology required.

13 Execution of Queries
[Figure: a query plan graph of operator boxes, including boxes labeled Slide and Tumble, delivering results to applications, each with its own QoS specification]
- Queries = Graph = Query Plan
- Boxes = Query operators such as Filter or Join
- Arcs = Streams with time-stamped tuples
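The Slide and Tumble labels in the plan presumably refer to sliding and tumbling window operators over time-stamped tuples. The slide does not define them, so the following Python sketch is an assumption about their usual meaning, with hypothetical function names and tuples modelled as `(timestamp, value)` pairs arriving in timestamp order.

```python
from collections import deque


def tumbling_windows(tuples, size):
    """Yield non-overlapping windows covering [0, size), [size, 2*size), ...

    A window with no tuples is yielded as an empty list.
    """
    window, window_end = [], None
    for ts, value in tuples:
        if window_end is None:
            # Align the first window to a multiple of the window size.
            window_end = (ts // size + 1) * size
        while ts >= window_end:
            yield window
            window, window_end = [], window_end + size
        window.append((ts, value))
    if window:
        yield window


def sliding_windows(tuples, size):
    """After each arrival at time ts, yield the tuples with timestamp in (ts - size, ts]."""
    window = deque()
    for ts, value in tuples:
        window.append((ts, value))
        # Expire tuples that have fallen out of the window.
        while window[0][0] <= ts - size:
            window.popleft()
        yield list(window)


stream = [(0, "a"), (1, "b"), (3, "c"), (4, "d"), (7, "e")]
print(list(tumbling_windows(stream, 3)))
print(list(sliding_windows(stream, 3)))
```

A tumbling window emits each tuple in exactly one window, while a sliding window re-emits a tuple for as long as it stays within `size` time units of the newest arrival.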

14 Execution of Queries
[Figure: the query plan graph from the previous slide]
Execution via Operator Scheduling

15 Adaptation Techniques in CAPE
- On-Line Query Plan Reshaping (with Yali Zhu and G. Heineman)
  Published in ACM SIGMOD 2004; in submission to the TODS journal, 2006

16 Query Optimization
[Figure: two equivalent join plans over streams A, B, and C: (A ⋈ B) ⋈ C and A ⋈ (B ⋈ C)]
How do we optimize if the query is continuously running?

17 Run-time Plan Re-Optimization
- Step 1: Decide when to optimize (statistics monitoring)
- Step 2: Generate new query plan (query optimization)
- Step 3: Replace current plan by new plan (plan migration)

18 Naïve Plan Migration Strategy
- Migration Steps
  - Pause execution of old plan
  - Drain out all tuples inside old plan
  - Replace old plan by new plan
  - Resume execution of new plan
[Figure: old join plan (A ⋈ B) ⋈ C and new join plan A ⋈ (B ⋈ C)]
Problem: Works for stateless operators only

19 Stateful Operator in Streaming
- Why stateful
  - Need non-blocking operators
  - Operator needs to output partial results
- Symmetric hash join: for each new tuple from A, purge state B, join with state B, insert into state A
[Figure: join A ⋈ B with State A and State B]
Key Observation: The purge of tuples in states relies on processing of new tuples.
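The purge/join/insert rule for the symmetric hash join can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CAPE's implementation: the class names, the time-based window, the equality join on a single `key`, and the purge condition (`ts <= now - window`) are all hypothetical choices, and timestamps are assumed to be non-decreasing on each input.

```python
from collections import defaultdict, deque


class JoinState:
    """One side's state: tuples in arrival order plus a hash index on the join key."""

    def __init__(self):
        self.order = deque()             # (ts, key), oldest first
        self.index = defaultdict(deque)  # key -> deque of (ts, payload)

    def insert(self, ts, key, payload):
        self.order.append((ts, key))
        self.index[key].append((ts, payload))

    def purge(self, now, window):
        # Expire tuples that can no longer join with anything arriving at `now`.
        while self.order and self.order[0][0] <= now - window:
            _, key = self.order.popleft()
            self.index[key].popleft()
            if not self.index[key]:
                del self.index[key]

    def probe(self, key):
        return list(self.index.get(key, ()))

    def __len__(self):
        return len(self.order)


class SymmetricHashJoin:
    def __init__(self, window):
        self.window = window
        self.states = {"A": JoinState(), "B": JoinState()}

    def process(self, side, ts, key, payload):
        """For a new tuple on `side`: purge the opposite state, probe it, insert into own state."""
        other = "B" if side == "A" else "A"
        opposite = self.states[other]
        opposite.purge(ts, self.window)
        matches = opposite.probe(key)
        self.states[side].insert(ts, key, payload)
        if side == "A":
            return [(payload, p) for _, p in matches]
        return [(p, payload) for _, p in matches]


join = SymmetricHashJoin(window=10)
print(join.process("A", 1, "k", "a1"))   # [] (state B is still empty)
print(join.process("B", 5, "k", "b1"))   # [('a1', 'b1')]
print(join.process("B", 20, "k", "b2"))  # [] (a1 was purged from state A)
print(len(join.states["B"]))             # 2 (b1 lingers until a new A tuple arrives)
```

The last line shows the key observation in action: `b1` is long past the window, but it stays in state B because only an arriving A tuple triggers a purge of state B.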

20 Naïve Migration Strategy Revisited
- Steps
  (1) Pause execution of old plan
  (2) Drain out all tuples inside old plan
  (3) Replace old plan by new plan
  (4) Resume execution of new plan
[Figure: join plan annotated with (2) all tuples drained, (3) old replaced by new, (4) processing resumed]
Problem: Deadlock waiting. Draining the states in step (2) relies on new tuples to trigger purging, but step (1) has paused the input.

21 Proposed Dynamic Migration Strategies
- Moving State Strategy
- Parallel Track Strategy

22 Moving State Strategy
- Basic idea
  - Share common states between two boxes
- Key Steps
  - Identify common states: state matching
  - Share common states: state moving
  - Recompute unmatched states: state recomputing

23 Moving State Strategy
- State Matching
  - Each state in the old box has a unique ID
  - During rewriting, a new ID is given to each new state in the new box
  - When rewriting is done, match states based on IDs
- State Moving
  - Between matched states
  - On the same machine, create new pointers for matched states in the new box
- What's left?
  - Unmatched states in new box
[Figure: inputs Q_A, Q_B, Q_C, Q_D and output Q_ABCD. Old box: ((A ⋈ B) ⋈ C) ⋈ D with states S_A, S_B, S_AB, S_C, S_ABC, S_D. New box: A ⋈ ((B ⋈ C) ⋈ D) with states S_A, S_B, S_C, S_BC, S_D, S_BCD.]

24 Unmatched States
- State Recomputing
  - Recursively recompute unmatched S_BC and S_BCD by joining matched states
[Figure: new box A ⋈ ((B ⋈ C) ⋈ D) with matched states S_A, S_B, S_C, S_D and unmatched states S_BC, S_BCD; inputs Q_A, Q_B, Q_C, Q_D; output Q_ABCD]
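The three Moving State steps (matching, moving, recomputing) can be sketched in Python for the four-stream example. This is an illustration under assumptions, not the published algorithm: state IDs are taken to be the names of the streams a state covers, every join is an equality join on a hypothetical `key` attribute, window constraints are ignored during recomputation, and all base states are assumed to be matched.

```python
def join(left, right):
    """Equi-join two states on the hypothetical 'key' attribute."""
    return [{**l, **r} for l in left for r in right if l["key"] == r["key"]]


def migrate_moving_state(old_states, new_plan):
    """Build the new box's states from the old box's states.

    old_states: dict mapping state ID -> list of tuples.
    new_plan:   list of (state_id, left_child_id, right_child_id), bottom-up;
                base states have None for both children.
    """
    new_states = {}
    for state_id, left_id, right_id in new_plan:
        if state_id in old_states:
            # State matching + moving: share the same object, no copying.
            new_states[state_id] = old_states[state_id]
        else:
            # State recomputing: join the (already available) child states.
            new_states[state_id] = join(new_states[left_id], new_states[right_id])
    return new_states


# Old box ((A join B) join C) join D holds S_A, S_B, S_AB, S_C, S_ABC, S_D.
old = {
    "A": [{"key": 1, "A": "a1"}],
    "B": [{"key": 1, "B": "b1"}, {"key": 2, "B": "b2"}],
    "C": [{"key": 1, "C": "c1"}],
    "D": [{"key": 1, "D": "d1"}],
    "AB": [{"key": 1, "A": "a1", "B": "b1"}],
    "ABC": [{"key": 1, "A": "a1", "B": "b1", "C": "c1"}],
}
# New box A join ((B join C) join D) needs S_A, S_B, S_C, S_D, S_BC, S_BCD.
new_plan = [
    ("A", None, None), ("B", None, None), ("C", None, None), ("D", None, None),
    ("BC", "B", "C"), ("BCD", "BC", "D"),
]
new = migrate_moving_state(old, new_plan)
print(new["BC"])
print(new["BCD"])
```

Listing the new plan bottom-up is what makes the recomputation recursive in effect: S_BCD is built from the freshly recomputed S_BC and the matched S_D. The old states S_AB and S_ABC have no counterpart in the new box and are simply not carried over.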

25 MS Migration Pros and Cons
- Pros
  - Fast when # of tuples in states is small
    - Low input rates or small window size
- Cons
  - Output silence during entire migration stage
    - Can we output results even during migration?
    - Motivation for Parallel Track Strategy

26 Parallel Track Strategy
- Basic idea
  - Execute both old and new plans in parallel
  - Gradually "push" old tuples out of old box by purging
- Key Steps
  - Connect new box
  - Execute both boxes in parallel
  - Remove old box once "expired"
    - Contains only new tuples
    - No old tuples or sub-tuples

27 Parallel Track Strategy
- Connect boxes
- Execute in parallel
  - Until all old tuples purged
- Disconnect old box
[Figure: old box ((A ⋈ B) ⋈ C) ⋈ D and new box A ⋈ ((B ⋈ C) ⋈ D) both connected to inputs Q_A, Q_B, Q_C, Q_D and output Q_ABCD; a tuple ABC is shown in state S_ABC of the old box]
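The "expired" test that decides when the old box can be disconnected can be sketched in Python. This covers only that one check, not the parallel execution itself, and it rests on assumptions: each state entry is modelled as a tuple of the arrival timestamps of its component tuples, and an entry is purged once its oldest component leaves a time-based window. All names here are hypothetical.

```python
def is_old(entry, migration_start):
    """An entry is 'old' if any of its component tuples arrived before migration began."""
    return any(ts < migration_start for ts in entry)


def purge(state, now, window):
    """Drop entries whose oldest component has left the window (assumed purge rule)."""
    return [entry for entry in state if min(entry) > now - window]


def old_box_expired(states, migration_start):
    """True once no state in the old box holds an old tuple or sub-tuple."""
    return not any(
        is_old(entry, migration_start)
        for state in states.values()
        for entry in state
    )


migration_start, window = 100, 10
old_box = {
    "S_A": [(95,), (101,)],            # base tuples, by arrival timestamp
    "S_AB": [(95, 97), (101, 102)],    # sub-tuples: timestamps of the A and B parts
}
print(old_box_expired(old_box, migration_start))  # False

# As time advances, purging pushes the old entries out of the old box.
old_box = {name: purge(state, now=106, window=window) for name, state in old_box.items()}
print(old_box_expired(old_box, migration_start))  # True
```

Once the check returns `True`, everything left in the old box was built purely from tuples that the new box has also seen, so the old box can be removed.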

28 PT Migration Pros and Cons
- Pros
  - Keeps producing results even during migration
    - No results during MS migration
- Cons
  - Migration duration is at least 2W
    - MS may be faster, depending on # of tuples in states

29 Summary: Stream Plan Migration
- Our central theme: Optimization via Adaptation
- First run-time solution for stateful operators
- Two migration methods:
  - Moving State Strategy
  - Parallel Track Strategy
- Cost Models for Comparative Analysis
- System Implementation in CAPE
- Experimental Evaluations

30 Overall Summary: So Much Left to Do!
- Large variety of challenging stream applications
- Generic core technology for stream processing engines
- Startups starting to pop up: StreamBase for the stock market
- Major DBMS players like IBM, Oracle, etc. joining in
- Cool open research, great potential for real impact!

31 Questions?
The End
http://davis.wpi.edu.edu/~dsrg

32 Subset of CAPE Publications
[RDZ04] E. A. Rundensteiner, L. Ding, Y. Zhu, T. Sutherland and B. Pielech, "CAPE: A Constraint-Aware Adaptive Stream Processing Engine". Invited Book Chapter. http://www.cs.uno.edu/~nauman/streamBook/. July 2004.
[ZRH04] Y. Zhu, E. A. Rundensteiner and G. T. Heineman, "Dynamic Plan Migration for Continuous Queries Over Data Streams". SIGMOD 2004, pages 431-442.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman, "Joining Punctuated Streams". EDBT 2004, pages 587-604.
[DR04] L. Ding and E. A. Rundensteiner, "Evaluating Window Joins over Punctuated Streams". CIKM 2004, to appear.
[DRH03] L. Ding, E. A. Rundensteiner and G. T. Heineman, "MJoin: A Metadata-Aware Stream Join Operator". DEBS 2003.
[RDSZBM04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta, "CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity". Demonstration Paper. VLDB 2004.
[SR04] T. Sutherland and E. A. Rundensteiner, "D-CAPE: A Self-Tuning Continuous Query Plan Distribution Architecture". Tech Report, WPI-CS-TR-04-18, 2004.
[SPR04] T. Sutherland, B. Pielech, Y. Zhu, L. Ding and E. A. Rundensteiner, "Adaptive Multi-Objective Scheduling Selection Framework for Continuous Query Processing". IDEAS 2005.
[SLJR05] T. Sutherland, B. Liu, M. Jbantova and E. A. Rundensteiner, "D-CAPE: Distributed and Self-Tuned Continuous Query Processing". CIKM, Bremen, Germany, Nov. 2005.
[LR05] B. Liu and E. A. Rundensteiner, "Revisiting Pipelined Parallelism in Multi-Join Query Processing". VLDB 2005.
[B05] B. Liu, Y. Zhu and E. A. Rundensteiner, "Spill Policies for Long-Running Queries". ACM SIGMOD 2006, to appear.
CAPE Project: http://davis.wpi.edu/dsrg/CAPE/index.html

