Hadoop System simulation with Mumak Fei Dong, Tianyu Feng, Hong Zhang Dec 8, 2010.


1 Hadoop System simulation with Mumak Fei Dong, Tianyu Feng, Hong Zhang Dec 8, 2010

2 Agenda – Objective – Comparison between MRPerf and Mumak – Modifications to Mumak – Results and discussion – Conclusion

3 Objective A large-scale distributed system has an enormous number of parameters. The running time of a user program depends nonlinearly on these parameters. Predicting the running time under various settings helps the user choose the “optimal” setting. We start by varying the most basic parameter: cluster size.

4 MRPerf and Mumak MRPerf – Built on a network simulator – Calculates the task running time and network delay from physical parameters – Implements the Hadoop system in Tcl – Flexible in simulation

5 MRPerf and Mumak [Figure: running time vs. map slots per node and reduce slots per node; 4 nodes, double-rack data center (chunk size = 64 MB), by MRPerf]

6 MRPerf and Mumak [Figure: 4 nodes (chunk size = 64 MB), by Mumak]

7 MRPerf and Mumak Mumak – Inherits the JobTracker class from Hadoop and only defines the simulation interface – Uses a trace file to build the cluster topology / job story, then feeds it into the simulator – Can only reproduce previously finished experiments – Designed to verify/debug Hadoop system design – Only simulates the Map/Reduce tasks; the sort and shuffle phases are not simulated
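
The trace-driven idea on this slide can be illustrated with a minimal sketch. This is not Mumak's code: the function name `replay_makespan`, the flat list of task durations, and the greedy "earliest free slot" assignment are all assumptions made for illustration. The sketch replays recorded task runtimes onto a given number of slots and reports when the last task finishes.

```python
import heapq


def replay_makespan(task_durations, num_nodes, slots_per_node):
    """Replay recorded task durations (seconds) on a hypothetical cluster.

    Tasks are taken in trace order and each one is placed on whichever
    slot becomes free first. Returns the finish time of the last task.
    """
    total_slots = num_nodes * slots_per_node
    if total_slots <= 0:
        raise ValueError("cluster must have at least one slot")

    # Min-heap holding the time at which each slot becomes free.
    free_at = [0.0] * total_slots
    heapq.heapify(free_at)

    finish = 0.0
    for duration in task_durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + duration
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Hypothetical trace: eight map tasks of 10 s each, 2 slots per node.
trace = [10.0] * 8
for nodes in (2, 4, 6):
    print(nodes, replay_makespan(trace, nodes, slots_per_node=2))
```

In this made-up trace, going from 2 to 4 nodes halves the finish time, while going from 4 to 6 nodes changes nothing because all eight tasks already fit in a single wave. That is a small instance of running time depending nonlinearly on cluster size.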

8 MRPerf and Mumak The approach taken by MRPerf is better – Takes in parameters to estimate running time – Can make predictions – But MRPerf simulates its own implementation of Hadoop. The design of Mumak is better – Inherits source code from Hadoop – Easy to understand and to extend. We decided to take the good parts of MRPerf and implement them in the framework of Mumak – Modify the Rumen log to change the parameters – Modify the Mumak source code to add a network simulator

9 Implementation Simulate a different cluster size – Edit the Rumen log to change the data replication factor / locality – Modify the topology by adding or deleting nodes, for example, going from 2 slave nodes to 6 slave nodes – The job tracker then assigns the tasks to different nodes.
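
The topology change described here can be sketched as a small transformation on a nested tree. The schema below (a root with rack children, each rack with host children, each host with `"children": None`) is an assumption about the topology file layout, and `resize_single_rack` and the `node` host prefix are hypothetical names, not part of Rumen or Mumak. The sketch only resizes the first rack; the replication and locality edits to the job trace mentioned on the slide would be separate changes.

```python
import copy


def resize_single_rack(topology, num_slaves, host_prefix="node"):
    """Return a copy of a topology tree whose first rack has num_slaves hosts.

    Assumed (hypothetical) schema:
      {"name": ..., "children": [rack, ...]}
      rack = {"name": ..., "children": [host, ...]}
      host = {"name": ..., "children": None}
    """
    if num_slaves < 1:
        raise ValueError("need at least one slave node")

    resized = copy.deepcopy(topology)  # leave the input untouched
    rack = resized["children"][0]

    # Keep existing hosts first, truncating if the cluster shrinks.
    hosts = rack["children"][:num_slaves]
    existing = {host["name"] for host in hosts}

    # Add new hosts with unused names until the target size is reached.
    index = 0
    while len(hosts) < num_slaves:
        name = f"{host_prefix}{index}"
        index += 1
        if name not in existing:
            hosts.append({"name": name, "children": None})
            existing.add(name)

    rack["children"] = hosts
    return resized


# Hypothetical 2-slave, single-rack topology grown to 6 slaves.
two_node = {
    "name": "<root>",
    "children": [
        {
            "name": "rack0",
            "children": [
                {"name": "node0", "children": None},
                {"name": "node1", "children": None},
            ],
        }
    ],
}
six_node = resize_single_rack(two_node, 6)
print([host["name"] for host in six_node["children"][0]["children"]])
```

Working on a deep copy keeps the original topology available, so several cluster sizes can be generated from the same recorded baseline.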

10 Implementation Simulate network delay – We defined a simple network simulator interface – We modified the source code of Mumak to add in the network delay – In practice, the network delay is small enough to be ignored
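
The slides do not show the network simulator interface, so the following is only a sketch of what a simple one could look like. Everything in it is an assumption: the class name `SimpleNetworkModel`, the "fixed latency plus size over bandwidth" formula, the 1 ms latency, and the 1 Gbit/s link are illustrative choices, not the authors' design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleNetworkModel:
    """Hypothetical delay model: fixed latency plus size over bandwidth."""

    latency_s: float = 0.001                       # assumed per-transfer latency
    bandwidth_bytes_per_s: float = 125_000_000.0   # assumed 1 Gbit/s link

    def transfer_delay(self, num_bytes, same_node=False):
        """Seconds needed to move num_bytes; zero for node-local data."""
        if num_bytes < 0:
            raise ValueError("num_bytes must be non-negative")
        if same_node or num_bytes == 0:
            return 0.0
        return self.latency_s + num_bytes / self.bandwidth_bytes_per_s


def task_time_with_network(compute_s, input_bytes, model, data_local):
    """Add the modeled transfer delay to a task's recorded compute time."""
    return compute_s + model.transfer_delay(input_bytes, same_node=data_local)


# Hypothetical 30 s map task reading one 64 MB chunk from a remote node.
model = SimpleNetworkModel()
chunk_bytes = 64 * 1024 * 1024
print(task_time_with_network(30.0, chunk_bytes, model, data_local=False))
print(task_time_with_network(30.0, chunk_bytes, model, data_local=True))
```

Under these assumed numbers, a remote 64 MB read adds roughly half a second to a 30 s task, which is small relative to the task time. Different link speeds or task lengths would change that ratio.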

11 Results and Discussion

14 Limitations and future work – Sort phase time is not included – Only a single-rack topology was used – Predictions are not always consistent for the same job with the same configuration

15 Conclusion Our objective is to predict the running time under different parameters. We took the methods of MRPerf and implemented them on Mumak. More flexible and accurate prediction requires further modification of Mumak – Make it independent of the trace file – Fix the inconsistent (unstable) predictions

16 Questions?

