Toward Progress Indicators on Steroids for Big Data Systems
Jiexing Li #*, Rimma Nehme *, Jeff Naughton #*
# University of Wisconsin-Madison
* Microsoft Jim Gray Systems Lab

Explosive growth in big data systems
Explosive growth in the complexity, diversity, number of deployments, and capabilities of big data processing systems.
[Figure, after Ailamaki et al., 2011: big data system stacks, organized into layers such as programming models and runtime/execution engines. Components shown include SQL, the Map/Reduce programming model, Cascading, Pig, Hive, Jaql, PACT, AQL, Algebrix, the PDW Engine and PDW Store (SQL Server), the Azure Engine and Azure Store, the Hadoop Map/Reduce Engine, HDFS, Nephele, the Hyracks runtime, Asterix BTree, Amazon S3, MySQL indexes, SimpleDB, PNuts, and HBase.]

Big data systems
They are large and complex beasts. To operate them efficiently, we need information about what is going on in the system.
[Figure: thousands of servers (Node 1 … Node n), each running tasks (Task 1 … Task n) over its own data (Data 1 … Data n)]

Need to know the future state
Instantaneous snapshot information is important and nice, but not sufficient. We also need to know what the system will look like in the future.
[Figure: nodes 1 … n running tasks over their data, annotated with future problems: CPU overload! Bad disk! Lack of memory! …]

Need a predict, monitor, revise paradigm
Predicting the future in these systems is difficult or impossible. So don't require perfect predictions; instead, anticipate the presence of errors, detect them, and react as time progresses. Progress indicators fit this predict, monitor, revise paradigm really well.
[Figure: one-shot "predict and ignore" (unreliable) versus "predict, monitor, and revise"]

Progress indicator (PI)
A PI provides feedback to users on how much of a task has been completed or when the task will finish. It begins with a prediction of the query's progress and, while the query executes, revises that prediction based on the information it observes. But today's PIs are too weak and limited for big data systems.

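To make the predict, monitor, revise loop concrete, here is a minimal Python sketch of a PI. The class name, the unit of "work", and the revision rule (accept a new estimate of total work, but never below the work already done) are assumptions for illustration, not the authors' design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressIndicator:
    """Hypothetical predict-monitor-revise progress indicator (illustrative sketch)."""

    estimated_total: float  # predict: initial estimate of total work (e.g., bytes to process)
    done: float = 0.0       # work completed so far
    elapsed: float = 0.0    # seconds since the query started

    def observe(self, done: float, elapsed: float,
                revised_total: Optional[float] = None) -> None:
        # Monitor: record how much work has finished and how long it took.
        self.done = done
        self.elapsed = elapsed
        # Revise: adopt a better estimate of the total work when one is available,
        # and never let the total drop below the work already observed.
        if revised_total is not None:
            self.estimated_total = revised_total
        self.estimated_total = max(self.estimated_total, self.done)

    def percent_done(self) -> float:
        if self.estimated_total <= 0:
            return 100.0
        return 100.0 * self.done / self.estimated_total

    def remaining_seconds(self) -> Optional[float]:
        # Remaining time = remaining work / observed speed; undefined before any progress.
        if self.done <= 0 or self.elapsed <= 0:
            return None
        speed = self.done / self.elapsed
        return (self.estimated_total - self.done) / speed
```

For example, starting from an estimate of 1000 work units, observing 250 units done after 10 s reports 25% and about 30 s left; a later revision of the total to 1200 with 600 units done after 20 s reports 50% and 20 s left.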
Progress indicators on steroids
Our goal: to advocate for the creation of, and research into, progress indicators on steroids. Such PIs should use more practical evaluation metrics to depict their quality, expand the class of computations they can serve, expand the kinds of information they can provide, and continue to increase the accuracy of their predictions.

Our vision
Change the way we evaluate progress indicator technology.
[Figure: criteria for progress indicators: accuracy (current PIs); accurate when nothing changes; react to changes quickly; helpful for specific tasks]

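As one possible way to score a PI on more than a single accuracy number, the sketch below computes two hypothetical metrics from a trace of remaining-time predictions: mean absolute error (a proxy for "accurate when nothing changes") and the reaction time after a known change (a proxy for "react to changes quickly"). Both metric definitions and the `tolerance` parameter are assumptions, not metrics proposed in the talk.

```python
from typing import Optional, Sequence


def mean_abs_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute error of remaining-time predictions (the classic accuracy metric)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def reaction_time(times: Sequence[float], predicted: Sequence[float],
                  actual: Sequence[float], change_time: float,
                  tolerance: float) -> Optional[float]:
    """Hypothetical metric: seconds after a change at change_time until the
    prediction error stays within tolerance for every later observation."""
    for i, t in enumerate(times):
        if t < change_time:
            continue
        if all(abs(p - a) <= tolerance for p, a in zip(predicted[i:], actual[i:])):
            return t - change_time
    return None  # the PI never settled back within tolerance
```

For example, with predictions [100, 90, 80, 40, 55, 50] against actual remaining times [100, 90, 80, 70, 60, 50], sampled every 10 s with a change at t = 30, a 5 s tolerance gives a reaction time of 10 s.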
Our vision (cont.)
Expand the class of computations they can serve.
[Figure: consumers of progress indicators: user interface (current PIs), optimizer, scheduler, resource manager, straggler/skew handler, performance debugger, …]

Our vision (cont.)
Expand the kinds of information they can provide.
[Figure: information from progress indicators: p% or time (current PIs), resource availability, good/bad machines, straggling tasks, disk fragmentation, automatic failure diagnosis, …]

A promising simple example
Pig provides a progress score for a MapReduce job as follows. Divide the job into 4 phases. For each phase, the score is the percentage of its data that has been read/processed. The overall progress of the job is the average of these 4 scores. This is a very rough estimate, because it assumes that each phase contributes equally to the overall score.
[Figure: a map task (record reader, map, combine) followed by a reduce task (copy, sort, reduce)]

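The averaging rule above fits in a few lines of Python. The phase names are an assumption read off the figure (the map task as one phase, then copy, sort, and reduce); this is a sketch of the rule as described, not Pig's source code.

```python
from typing import Mapping

# Assumed phase names: the map task (record reader, map, combine) counts as one
# phase, and the reduce task contributes copy, sort, and reduce.
PHASES = ("map", "copy", "sort", "reduce")


def job_progress(phase_fraction: Mapping[str, float]) -> float:
    """Average of per-phase progress fractions, each clamped to [0, 1].

    Phases missing from phase_fraction are treated as not started (0.0).
    Every phase gets the same weight, which is exactly why the estimate is rough.
    """
    total = 0.0
    for phase in PHASES:
        total += min(max(phase_fraction.get(phase, 0.0), 0.0), 1.0)
    return total / len(PHASES)
```

For example, a job whose map and copy phases are finished and whose sort phase is half done reports 0.625, even if the reduce phase will take most of the remaining time.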
A promising simple example (cont.)
Hadoop uses these progress estimates to identify stragglers and schedule backup executions on other machines. A straggler is a task that makes much less progress than other tasks in its category. This improved execution time by 44% [Dean et al., OSDI, 2004], and further by a factor of 2 [Zaharia et al., OSDI, 2008].
[Figure: nodes 1 … n report progress P1% … Pn%; a straggler gets a backup execution on another node]
Already deployed! Simple and rough estimates, but really helpful!

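A straggler test on top of such scores can be sketched as follows. The 0.2 gap and the one-minute minimum runtime follow Zaharia et al.'s description of Hadoop's original speculative-execution rule, and the time-left estimate follows their LATE scheduler; the `Task` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Task:
    task_id: str
    progress: float   # progress score in [0, 1]
    runtime_s: float  # seconds since the task started


def find_stragglers(tasks: Sequence[Task], gap: float = 0.2,
                    min_runtime_s: float = 60.0) -> List[str]:
    """Flag tasks whose progress trails their category's average by more than gap.

    Defaults follow Zaharia et al.'s description of Hadoop's original rule;
    treat them as assumptions, not current Hadoop configuration.
    """
    if not tasks:
        return []
    average = sum(t.progress for t in tasks) / len(tasks)
    return [t.task_id for t in tasks
            if t.progress < average - gap and t.runtime_s >= min_runtime_s]


def estimated_time_left(task: Task) -> float:
    """LATE-style estimate: (1 - progress) / progress rate."""
    rate = task.progress / task.runtime_s if task.runtime_s > 0 else 0.0
    return float("inf") if rate == 0 else (1.0 - task.progress) / rate
```

For example, among tasks at 0.9, 0.85, and 0.3 progress after 120 s, plus one at 0.1 after only 30 s, only the 0.3 task is flagged (the average minus 0.2 is about 0.34, and the 0.1 task has not run for a minute yet); its estimated time left is about 280 s.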
Achieving the vision requires research
One line of research: retargeting even today's simple progress indicators to new systems can be interesting and challenging, given the complexity and diversity of different data processing systems. For example, we attempted to apply a debug run-based PI developed for MapReduce jobs to parallel database systems.

The idea of a debug run-based PI
For a query plan, the PI estimates the processing speed of each phase/pipeline using information from earlier (debug) runs:
1. Start from the original data.
2. Sample the data.
3. Execute the job on the sample [Morton et al., SIGMOD, 2010].
4. Calculate the processing speed of each phase (e.g., how many bytes can be processed per second).
5. Estimate the remaining time as RT = remaining data / speed.

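Steps 4 and 5 amount to simple arithmetic. This sketch assumes the phases run one after another, so their remaining times add up; the phase names and numbers in the usage note are invented for illustration.

```python
from typing import Dict, Mapping


def speeds_from_debug_run(data_processed: Mapping[str, float],
                          seconds: Mapping[str, float]) -> Dict[str, float]:
    """Step 4: processing speed of each phase in the debug run
    (data units per second, e.g., bytes/s or MB/s)."""
    return {phase: data_processed[phase] / seconds[phase] for phase in data_processed}


def remaining_time(remaining_data: Mapping[str, float],
                   speeds: Mapping[str, float]) -> float:
    """Step 5: RT = remaining data / speed, summed over phases that still have work.

    Summing assumes phases do not overlap; pipelined phases would need a different rule.
    """
    return sum(remaining_data[phase] / speeds[phase] for phase in remaining_data)
```

For example, a debug run that scans 10,000 MB in 20 s and joins 2,000 MB in 8 s yields speeds of 500 and 250 MB/s; with 100,000 MB left to scan and 50,000 MB left to join, the estimate is 200 + 200 = 400 s.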
Questions
This worked very well for MapReduce jobs. But what happens when we apply this debug-run approach to a parallel database system? We ran a simple experiment to find out.

Experimental setup
We implemented the progress indicator in SQL Server PDW.
Cluster: 18 nodes (1 control node, 1 data landing node, and 16 compute nodes), connected with a 1 Gbit Ethernet switch.
Each node: 2 Intel Xeon L5630 quad-core processors; 32 GB memory (at most 24 GB for the DBMS); ten 300 GB hard drives (8 disks for data).

Experimental setup (cont.)
Database: 1 TB TPC-H. Each table is either hash partitioned or replicated. When a table is hash partitioned, each compute node contains 8 horizontal data partitions (8 × 16 = 128 in total).

Table      Partition key
Customer   c_custkey
Lineitem   l_orderkey
Nation     (replicated)
Orders     o_orderkey
Part       p_partkey
Partsupp   ps_partkey
Region     (replicated)
Supplier   s_suppkey

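The partitioning layout above can be sketched as a mapping from a partition-key value to one of 128 partitions, and from a partition to one of the 16 compute nodes. The CRC32 hash and the contiguous partition-to-node assignment are assumptions for illustration; the talk does not describe how PDW computes or places partitions.

```python
import zlib
from typing import List

NUM_COMPUTE_NODES = 16
PARTITIONS_PER_NODE = 8
NUM_PARTITIONS = NUM_COMPUTE_NODES * PARTITIONS_PER_NODE  # 8 * 16 = 128


def partition_of(key: int) -> int:
    # Hypothetical hash function (CRC32 of the key's decimal text); the talk
    # does not say which hash PDW actually uses.
    return zlib.crc32(str(key).encode("ascii")) % NUM_PARTITIONS


def node_of(partition: int) -> int:
    # Assumed layout: partitions 0-7 live on node 0, 8-15 on node 1, and so on.
    return partition // PARTITIONS_PER_NODE


def nodes_holding_row(key: int, replicated: bool = False) -> List[int]:
    """Compute nodes that store a row with this partition-key value."""
    if replicated:  # e.g., Nation and Region: every compute node keeps a full copy
        return list(range(NUM_COMPUTE_NODES))
    return [node_of(partition_of(key))]
```

Under this sketch, because Lineitem and Orders are both partitioned on the order key, a given l_orderkey value and the equal o_orderkey value hash to the same partition, so matching rows sit on the same node.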
Debug run-based PI can work well
TPC-H Q1: no joins and 7 pipelines; the speed estimates are accurate.

Complex queries are more challenging
TPC-H Q4: later joins in the debug run have very few tuples to process.
[Figure annotation for the debug run on 1% of the 1 TB data: percentages 1%, 0.01%, 0.0001%, 0%, 0%]

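The slide does not say how these percentages arise. One reading that reproduces them after rounding: if every table is sampled independently at p = 1% and joined on key/foreign-key pairs, a result row survives only if all of its input rows were sampled, so each additional table multiplies the surviving fraction by p. The sketch below is that back-of-the-envelope model, with hypothetical function names, not a measurement.

```python
from typing import List


def surviving_fraction(p: float, tables_joined: int) -> float:
    """Expected fraction of the full join result seen in the debug run.

    Assumes each table is sampled independently at rate p and joined on
    key/foreign-key pairs: an output row appears only if every contributing
    input row was sampled, which happens with probability p ** tables_joined.
    """
    return p ** tables_joined


def debug_run_percentages(p: float, max_tables: int) -> List[str]:
    # Percent of the full-size result expected after joining 1..max_tables tables.
    return [f"{100 * surviving_fraction(p, k):.4f}%" for k in range(1, max_tables + 1)]


print(debug_run_percentages(0.01, 5))
# ['1.0000%', '0.0100%', '0.0001%', '0.0000%', '0.0000%']
```

Under this model the later pipelines in the debug run see essentially no tuples, so their measured speeds say little about how fast they will run on the full data.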
The optimizer also presents challenges
Cost-based optimization may yield different plans for the sampled data than for the entire dataset: only 6 of the 22 TPC-H queries used the same plans.
[Figure: on the original data, the plan combines Table Scan [l] with a Filter and Table Scan [o] using a Hash Match join and a Shuffle Move; on the sample, it uses a Nested Loop join and a Broadcast Move instead.]

Conclusion from the experiment
Even a simple task (porting a debug run-based PI from MapReduce to a parallel DBMS) is challenging, and new ideas are needed to make it work. How to build progress indicators for a variety of systems and a variety of uses is a wide-open problem.

Some specific technical challenges
Operators; work and speed estimation; pipeline definition and shape; dynamicity; statistics; parallelism; …
A promising direction, but still a really long way to go!

Conclusions
We proposed and discussed the desirability of developing progress indicators on steroids. Issues to consider include evaluation metrics, computations to serve, information to provide, and accuracy. A small case study illustrates that even small steps toward progress indicators on steroids require effort and careful thought.