1 SIGMOD'06 1 Run-Time Operator State Spilling for Memory Intensive Long-Running Queries Bin Liu, Yali Zhu and Elke A. Rundensteiner Database Systems Research Laboratory Worcester Polytechnic Institute

2 SIGMOD'06 2 Motivating Example Real-Time Data Integration Server... Stock Prices, Volumes,... Reviews, External Reports, News,... Produce as many results as possible at run-time (e.g., 9:00am-4:00pm) Require complete query results (e.g., for offline analysis after 4:00pm) Decision Support System... Decision-Making Applications Analyze relationships among stock prices, reports, and news? Complex queries such as multi-joins are common! An equi-join of stock prices, reports, and news on stock symbols

3 SIGMOD'06 3 Challenges As Many Run-Time Results As Possible: demands main-memory-based query processing [figure: symmetric join of streams A and B, each side maintaining an in-memory State A / State B] Push-Based Processing with Complex Queries: demands main memory space to store operator states; operator states may monotonically increase over time. Run-Time Main Memory Overflow?
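
The push-based processing on this slide can be made concrete with a symmetric hash join, the operator type used throughout the talk. A minimal Python sketch, assuming single-column equi-join keys; the class and method names are illustrative and not part of any system described here:

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Toy symmetric hash join: each input keeps an in-memory hash table
    (its operator state) keyed on the join attribute. A new tuple first
    probes the opposite state, then is added to its own state, so states
    grow monotonically as the streams keep arriving."""

    def __init__(self):
        self.state_a = defaultdict(list)   # State A: tuples seen from input A
        self.state_b = defaultdict(list)   # State B: tuples seen from input B

    def insert(self, side, key, tup):
        own, other = (self.state_a, self.state_b) if side == "A" else (self.state_b, self.state_a)
        matches = other.get(key, [])                                # probe the opposite state
        results = [(tup, m) if side == "A" else (m, tup) for m in matches]
        own[key].append(tup)                                        # then store in own state
        return results

join = SymmetricHashJoin()
join.insert("A", 42, ("stock price", 95.2))
print(join.insert("B", 42, ("news", "earnings report")))   # -> one joined result
```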

4 SIGMOD'06 4 Problem: Memory Overflow High Demand on Main Memory: high input rates and large windows result in huge states; bursty streams cause temporary accumulation of tuples; long-running queries exhibit monotonic state increases. Potential Solutions: Query Optimization, Distributed Processing, Load Shedding, Memory Management

5 SIGMOD'06 5 State Spill Push Operator States Temporarily to Disk Spilled operator states are temporarily inactive [figure: join over A, B, C with parts of its states moved to secondary storage] New incoming tuples are processed only against the partial in-memory states
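
A hedged sketch of what "pushing state to disk" might look like for one partition of one state: the in-memory tuples are serialized out and the partition becomes inactive until the later cleanup/merge stage. The file layout and bookkeeping below are assumptions for illustration, not the paper's actual implementation:

```python
import os
import pickle
import tempfile

def spill_partition(state, partition_id, spill_dir):
    """Move one partition of an in-memory state to disk.
    Appending allows the same partition to be spilled multiple times;
    the pieces are merged later during the cleanup stage."""
    path = os.path.join(spill_dir, f"partition_{partition_id}.bin")
    tuples = state.pop(partition_id, [])       # partition becomes inactive in memory
    with open(path, "ab") as f:
        pickle.dump(tuples, f)
    return path, len(tuples)

state = {1: [("IBM", 95.2)], 2: [("MSFT", 27.1), ("MSFT", 27.3)]}
spill_dir = tempfile.mkdtemp()
print(spill_partition(state, 2, spill_dir))    # partition 2 now lives on disk
print(state)                                   # only partition 1 remains active
```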

6 SIGMOD'06 6 State of the Art: State Flushing Three-staged hash processing: XJoin [UF00] Two algorithms, hash + merge: Hash-Merge Join [MLA04] Single-input, distributed environment: Flux [SHCF03] Observation: all focus on a single operator!

7 SIGMOD'06 7 Problem: What about Multi-Operator Plans? Observation: interdependency among pipelined operators; spilling at a bottom operator affects its downstream operators! [figure: two-join plan, Join 1 over A, B, C feeding Join 2 with input D] Maximizing the run-time throughput of Join 1 increases the memory consumption of Join 2: it may quickly fill main memory, may require state spill again, and causes more work downstream. But states in Join 2 may not contribute to the final output (low selectivity?)

8 SIGMOD'06 8 Outline Basics on State Spill Plan-level Spill Strategies Experimental Evaluation

9 SIGMOD'06 9 Granularity: State Partitioning Divide input streams into a large number of partitions [DNS92, SH03]: at run-time, only need to choose which partitions to spill; avoids expensive run-time repartitioning; does not affect partitions that are not spilled [figure: Split operator routing partitions of A, B, C to join instances on machines m1 and m2] Example: 300 partitions, m1 holds the odd partition IDs, m2 the even IDs
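
A minimal sketch of the partitioning idea: the join value deterministically maps to one of a fixed number of partition IDs (300 in the slide's example), and a Split operator routes odd IDs to machine m1 and even IDs to m2. The hash function and routing rule are illustrative assumptions:

```python
NUM_PARTITIONS = 300

def partition_id(join_value, num_partitions=NUM_PARTITIONS):
    """Deterministically map a join value to one of the fixed partitions,
    so no run-time repartitioning is ever needed."""
    return hash(join_value) % num_partitions

def route(join_value):
    """Split operator from the slide's example: odd partition IDs are
    processed on machine m1, even IDs on machine m2."""
    pid = partition_id(join_value)
    return ("m1", pid) if pid % 2 == 1 else ("m2", pid)

print(route(12345))   # the same join value always lands on the same machine/partition
```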

10 SIGMOD'06 10 Partition Granularity: Which States to Choose? Multiple states exist, one per input. Select states from one input only? Instead, select the states with the same partition ID from all inputs: partition group granularity! Avoids cross-machine processing, simplifies spill management, streamlines the cleanup process

11 SIGMOD'06 11 Clean Up Stage Partition groups could be pushed multiple times:
(PA_1^0, PB_1^0, PC_1^0), (PA_1^1, PB_1^1, PC_1^1), (PA_1^2, PB_1^2, PC_1^2), ..., (PA_1^k, PB_1^k, PC_1^k)
The results V_0 = PA_1^0 ⋈ PB_1^0 ⋈ PC_1^0, V_1 = PA_1^1 ⋈ PB_1^1 ⋈ PC_1^1, ..., V_k = PA_1^k ⋈ PB_1^k ⋈ PC_1^k have already been generated.
Incremental View Maintenance Algorithm [ZMH+95]: treat the multi-way join as a materialized view and the spilled partition groups as source updates

12 SIGMOD'06 12 Merge Disk Resident States To merge two partition groups with the same ID, e.g., (PA_1^0, PB_1^0, PC_1^0) and (PA_1^1, PB_1^1, PC_1^1), where
V_0 = PA_1^0 ⋈ PB_1^0 ⋈ PC_1^0 and V_1 = PA_1^1 ⋈ PB_1^1 ⋈ PC_1^1
After merge, combined states: PA_1^0 ∪ PA_1^1, PB_1^0 ∪ PB_1^1, PC_1^0 ∪ PC_1^1
Final result: V = (PA_1^0 ∪ PA_1^1) ⋈ (PB_1^0 ∪ PB_1^1) ⋈ (PC_1^0 ∪ PC_1^1)
Missing results: Δ = V - V_0 - V_1, with
V - V_0 = PA_1^1 ⋈ PB_1^0 ⋈ PC_1^0 ∪ (PA_1^0 ∪ PA_1^1) ⋈ PB_1^1 ⋈ PC_1^0 ∪ (PA_1^0 ∪ PA_1^1) ⋈ (PB_1^0 ∪ PB_1^1) ⋈ PC_1^1
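
A small sketch of how the missing results after a merge could be computed, mirroring the V - V_0 expansion above. Partition-group pieces are represented as lists of (join_key, payload) pairs; the helper names and the naive nested join are illustrative assumptions, not the paper's merge algorithm:

```python
import itertools

def equi_join(*parts):
    """Naive multi-way equi-join of lists of (key, payload) pairs on the key."""
    common = set.intersection(*({k for k, _ in p} for p in parts))
    results = []
    for k in common:
        groups = [[payload for kk, payload in p if kk == k] for p in parts]
        results.extend(itertools.product(*groups))   # all combinations sharing key k
    return results

def v_minus_v0(a0, a1, b0, b1, c0, c1):
    """The V - V0 terms from the slide:
       A1 x B0 x C0  u  (A0+A1) x B1 x C0  u  (A0+A1) x (B0+B1) x C1.
    Subtracting V1 = A1 x B1 x C1 from the last term then yields Delta = V - V0 - V1."""
    return (equi_join(a1, b0, c0)
            + equi_join(a0 + a1, b1, c0)
            + equi_join(a0 + a1, b0 + b1, c1))

a0, a1 = [(1, "a0")], [(1, "a1")]
b0, b1 = [(1, "b0")], [(1, "b1")]
c0, c1 = [(1, "c0")], [(1, "c1")]
print(len(v_minus_v0(a0, a1, b0, b1, c0, c1)))   # 1 + 2 + 4 = 7 of the 8 total results
```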

13 SIGMOD'06 13 State Spill Strategies

14 SIGMOD'06 14 Which Partitions to Push? Throughput-Oriented State Spill Productivity of a partition group: P_output = number of output tuples generated from the partition group; P_size = size of the partition group in number of tuples; Productivity = P_output / P_size
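
These two counters can be maintained per partition group as tuples flow through; a tiny illustrative sketch (field and class names are assumptions, not the system's actual statistics module):

```python
class PartitionGroupStats:
    """Per-partition-group counters behind the productivity metric."""

    def __init__(self):
        self.p_output = 0    # output tuples generated from this partition group
        self.p_size = 0      # tuples currently stored in this partition group

    def productivity(self):
        # groups that produce many results per stored tuple rank high;
        # empty groups are never useful spill candidates anyway
        return self.p_output / self.p_size if self.p_size else float("inf")

stats = PartitionGroupStats()
stats.p_size, stats.p_output = 200, 30
print(stats.productivity())   # 0.15 output tuples per stored tuple
```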

15 SIGMOD'06 15 Globally Choose Partition Groups Rank partition groups by productivity P_output / P_size and spill the globally least productive ones [figure: three-join plan over A, B, C, D, E with chosen partition groups spilled to disk] A direct extension of the local output method
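
A hedged sketch of the global selection step: gather (P_output, P_size) for every partition group in every join, rank by productivity, and spill the least productive groups until the desired amount of state is released. The tuple layout is an assumption made for the example:

```python
def choose_spill_victims(groups, tuples_to_free):
    """groups: list of (operator, partition_id, p_output, p_size).
    Rank all partition groups in the whole plan by productivity and pick
    the globally least productive ones until enough state would be freed."""
    ranked = sorted(groups, key=lambda g: g[2] / g[3])    # ascending productivity
    victims, freed = [], 0
    for op, pid, p_output, p_size in ranked:
        if freed >= tuples_to_free:
            break
        victims.append((op, pid))
        freed += p_size
    return victims

groups = [("Join1", 7, 120, 4000), ("Join2", 7, 5, 9000), ("Join3", 12, 40, 2000)]
print(choose_spill_victims(groups, 10000))   # the least productive groups are chosen first
```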

16 SIGMOD'06 16 Bottom-Up Pushing Strategy Spill States from Bottom Operators First: minimize the number of state spills, minimize global state memory [figure: three-join plan; states spilled at the bottom-most join] Mimics load shedding.

17 SIGMOD'06 17 Bottom-Up Pushing Strategy Spill States from Bottom Operators First: choose partitions from Join 1 until the threshold k% is reached; if not done, choose partitions from Join 2, and so on [figure: three-join plan] Minimizes intermediate results in upstream operators (memory) and the number of state spill processes. Partition selection: randomly or by local productivity. Fewer spill processes → higher overall query throughput?
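
For contrast, a sketch of the bottom-up alternative: partitions are taken from the bottom-most join until k% of the total state has been selected, then from the next join up, and so on. Random selection is used here; ranking by local productivity would be the other option mentioned on the slide. The plan representation is an assumption for the example:

```python
import random

def bottom_up_spill(plan, k_percent):
    """plan: list of operators from bottom to top; each operator is a dict
    {'name': str, 'partitions': {pid: size_in_tuples}}.
    Spill partitions starting at the bottom operator until k% of the total
    state (in tuples) has been chosen."""
    total = sum(sum(op['partitions'].values()) for op in plan)
    target = total * k_percent / 100.0
    chosen, spilled = [], 0
    for op in plan:                                   # bottom operator first
        pids = list(op['partitions'])
        random.shuffle(pids)                          # or rank by local productivity
        for pid in pids:
            if spilled >= target:
                return chosen
            chosen.append((op['name'], pid))
            spilled += op['partitions'][pid]
    return chosen

plan = [{'name': 'Join1', 'partitions': {1: 500, 2: 300}},
        {'name': 'Join2', 'partitions': {1: 200}},
        {'name': 'Join3', 'partitions': {1: 100}}]
print(bottom_up_spill(plan, 30))   # here the target is satisfied entirely from Join1
```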

18 SIGMOD'06 18 Partition Interdependency A smaller number of spill processes does not by itself guarantee high throughput: a partition pushed in a bottom operator may be the parent of productive partitions in its downstream operators [figure: partitions p_1^1 and p_1^2 of OP1 feeding partitions p_2^1 and p_2^2 of OP2] It may be worthwhile to push P_2^1 instead of P_1^1! Global Strategy: account for dependency relationships!

19 SIGMOD'06 19 “True” Global Output Strategy P_output: contribution to the final query output [figure: three-join plan with Split operators on each input (Split A ... Split E) and on the intermediate results (Split 1, Split 2)] Update the P_output values of partitions in Join 3; apply Split 2 to each output tuple to find the corresponding partition in Join 2 and update its P_output value; and so on. Employ a lineage tracing algorithm to update the P_output statistics
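
One way the lineage tracing could be organized: for every final output tuple, re-apply each operator's split function from the top join downward to find the contributing partition and increment its P_output. The split functions and plan shape below are illustrative assumptions, not the paper's exact algorithm:

```python
def credit_output(output_tuple, lineage):
    """lineage: list of (operator_name, split_fn, stats) from the top join down.
    split_fn maps an output tuple to the partition ID that produced it in
    that operator; stats maps partition ID -> P_output counter."""
    for op_name, split_fn, stats in lineage:
        pid = split_fn(output_tuple)
        stats[pid] = stats.get(pid, 0) + 1        # credit that partition's P_output

# Toy plan: Join3 partitions on the D.2 column, Join2 on the C.2 column.
stats_join3, stats_join2 = {}, {}
lineage = [("Join3", lambda t: t["D2"] % 300, stats_join3),
           ("Join2", lambda t: t["C2"] % 300, stats_join2)]
credit_output({"C2": 17, "D2": 601}, lineage)
print(stats_join3, stats_join2)    # {1: 1} {17: 1}
```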

20 SIGMOD'06 20 Global Output with Penalty Incorporate Intermediate Result Sizes: P_1^1: P_size = 10, P_output = 20; P_1^2: P_size = 10, P_output = 20 [figure: both partitions of OP1 look equally productive, but they feed different amounts of intermediate state into OP2] Intermediate result factor P_inter; productivity value: P_output / (P_size + P_inter)
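
The penalized metric itself is a one-liner; the example below reuses the slide's P_size = 10 and P_output = 20 figures with an assumed P_inter to show how a partition that inflates downstream state drops in the ranking:

```python
def penalized_productivity(p_output, p_size, p_inter):
    """Global-output-with-penalty metric: tuples a partition group feeds into
    downstream intermediate states (p_inter) count against it."""
    return p_output / (p_size + p_inter)

print(penalized_productivity(p_output=20, p_size=10, p_inter=0))    # 2.0
print(penalized_productivity(p_output=20, p_size=10, p_inter=20))   # ~0.67, ranked lower
```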

21 SIGMOD'06 21 Global Penalty: Tracing P_inter Penalty P_inter: contribution to intermediate result sizes. Apply a similar lineage tracing algorithm for P_inter [figure: intermediate tuples traced through OP1 → OP2 → OP3 → OP4; a partition's penalty accumulates the intermediate results it induces at each downstream operator]

22 SIGMOD'06 22 CAPE System Overview [LZ+05, TLJ+05] [architecture figure: each CAPE Continuous Query Processing Engine contains a Data Receiver, Data Distributor, Query Processor, Local Statistics Gatherer, Local Adaptation Controller, and Repository; a Distribution Manager with Runtime Monitor, Global Adaptation Controller, Query Plan Manager, Connection Manager, and Repository coordinates them; Stream Generator, Application Server, and End Users connect over the streaming data network]

23 SIGMOD'06 23 Experimental Setup: Queries and Data Inputs: data streams A, B, C, D, and E. Query: Join1: A.1 = B.1 = C.1, Join2: C.2 = D.1, Join3: D.2 = E.1. Query operators use symmetric hash join. Each input stream is partitioned into 300 partitions. The query is partitioned and run on two machines. Memory threshold for spill: 60MB. 30% of the states are pushed in each state spill. Average tuple inter-arrival time: 50ms per input

24 SIGMOD'06 24 Experimental Setup High-performance PC cluster: dual 2.4GHz CPUs, 2GB memory, gigabit network; 3 machines for the Stream Generator, Application Server, and Distribution Manager; each Query Processor on a separate machine. Generated data streams with integer join column values: a data value V appears R times for every K input tuples (Tuple Range: K, Join Ratio: R). Average Join Rate: average number of tuples with the same join value per input
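
The data generator described here can be sketched as follows: within every block of K tuples (the tuple range), each join-column value recurs R times (the join ratio), so there are K/R distinct values per block. The function name and the block-wise randomization are assumptions, not the actual generator used in the experiments:

```python
import random

def generate_stream(num_tuples, tuple_range_k, join_ratio_r, start_value=0):
    """Emit integer join-column values so that within each block of K tuples
    every value appears R times (K/R distinct values per block)."""
    assert tuple_range_k % join_ratio_r == 0
    values, block_id = [], 0
    distinct_per_block = tuple_range_k // join_ratio_r
    while len(values) < num_tuples:
        distinct = range(start_value + block_id * distinct_per_block,
                         start_value + (block_id + 1) * distinct_per_block)
        block = [v for v in distinct for _ in range(join_ratio_r)]
        random.shuffle(block)                      # spread the repeats across the block
        values.extend(block)
        block_id += 1
    return values[:num_tuples]

stream = generate_stream(num_tuples=90, tuple_range_k=30, join_ratio_r=3)
print(len(stream), len(set(stream)))   # 90 tuples, 30 distinct values (10 per block)
```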

25 SIGMOD'06 25 Percentage Spilled per Adaptation Amount of state pushed in each adaptation; percentage = # of tuples pushed / total # of tuples (input rate: 30ms/input, tuple range: 30K, join ratio: 3, adaptation threshold: 200MB) [charts: run-time query throughput and run-time main memory usage]

26 SIGMOD'06 26 Experiment: Throughput & Memory Query with average join rates: Join1: 3, Join2: 1, Join3: 1

27 SIGMOD'06 27 Experiment: Throughput Comparison Query with average join rates: Join1: 1, Join2: 3, Join3: 3. Query with average join rates: Join1: 3, Join2: 2, Join3: 3

28 SIGMOD'06 28 Experimental Summary The productivity metric improves run-time throughput. Global-output-with-penalty is the overall winner. Global output (with and without penalty) outperforms the alternatives in run-time throughput. Global output (with and without penalty) have similar (good) cleanup costs. The bottom-up strategy has the lowest number of adaptations, yet is a poor performer with high cleanup costs

29 SIGMOD'06 29 Related Work XJoin [UF00], Hash-Merge [MLA04], Flux [SH03]: only spill states of a single operator. Load Shedding [TUZC03]: drops input tuples to handle resource shortages. Continuous Query Processing [SLJ+05, XZH05, RD04, AC03, BBDM02, CF02, MSH02, CDT00]: no plan-level state spill

30 SIGMOD'06 30 Conclusions Identified the problem of plan-level state spill: state spill using "productivity" is viable. Proposed plan-level spill policies: dependencies considered for multi-operator plans. Evaluated the spill policies: global spill solutions improve throughput

31 SIGMOD'06 31 Questions? Thank You!

32 SIGMOD'06 32 Acknowledgments DSRG students contributed to the CAPE code base, including Luping Ding, Bin Liu, Tim Sutherland, Brad Pielech, Rimma Nehme, Mariana Jbantova, Brad Momberger, Song Wang, and Natasha Bogdanova. Thanks to the National Science Foundation for partial support via IDM and equipment grants, to WPI for an RDC grant, and to NEC for student support

