
DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation. Alexandru Iosup, Ozan Sonmez, and Dick Epema. Euro-Par 2008, Las Palmas, 27 August 2008.


1 DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation
Alexandru Iosup, Ozan Sonmez, and Dick Epema
PDS Group, Delft University of Technology, The Netherlands
Euro-Par 2008, Las Palmas, 27 August 2008

2 A Grid Research Toolbox
Hypothesis: (a) is better than (b).
[Figure: research toolbox with DGSim, steps 1, 2, 3; caption: "For scenario 1, …"]


4 The Problem with Grid Simulations
Three decades of writing simulators in computer science → writing the simulator is not the problem
The problem: getting from solution design to experimental results with an automated simulation tool
- Experimental setup
  - Tool to generate realistic experimental setups
- Experiment support for grid resource management
  - Tool to manage large numbers of related simulations
- Performance
  - Not the simulation time (decades of optimizations there)
  - Tool proved to work with large simulations (number of resources, workload size, etc.)

5 Outline
1. Problem Statement
2. The DGSim Framework
3. DGSim Validation
4. DGSim Examples
5. Future Work

6 2. The DGSim Framework: Name, Goal, and Challenges
DGSim = Delft Grid Simulator
Simulate various grid resource management architectures
- Multi-cluster grids
- Grids of grids (THE grid)
Challenges
- Many types of architectures
- Generating and replaying grid workloads
- Management of the simulations
  - Many repetitions of a simulation for statistical relevance
  - Simulations with many parameters
  - Managing results (e.g., analysis tools)
  - Enabling collaborative experiments
[Callout: Two GRM architectures]
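The simulation-management challenge above (many parameters, many repetitions per setting) can be illustrated with a small sketch. This is not DGSim's interface: `run_simulation`, `sweep`, and the parameter names are hypothetical, and the "simulation" is a placeholder that returns a noisy number.

```python
import itertools
import random
import statistics

def run_simulation(params, seed):
    """Hypothetical stand-in for one simulation run; returns a made-up metric."""
    rng = random.Random(seed)
    # Placeholder result: depends on the load parameter plus seeded noise.
    return params["load"] * 100.0 + rng.gauss(0.0, 1.0)

def sweep(grid, repetitions):
    """Run every parameter combination `repetitions` times and summarize."""
    names = sorted(grid)
    results = {}
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        # One seed per repetition keeps each run reproducible.
        samples = [run_simulation(params, seed) for seed in range(repetitions)]
        results[values] = (statistics.mean(samples), statistics.stdev(samples))
    return names, results

# Assumed example design space: 2 architectures x 2 load levels.
grid = {"architecture": ["centralized", "decentralized"], "load": [0.5, 0.9]}
names, results = sweep(grid, repetitions=10)
for values, (mean, sd) in results.items():
    print(dict(zip(names, values)), round(mean, 2), round(sd, 2))
```

Reporting a mean and a standard deviation per design point is one simple way to account for run-to-run variation; the actual analysis tools used with DGSim are not described on the slide.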

7 2. The DGSim Framework: Overview
[Figure: framework overview; labeled component: Discrete-Event Simulator]
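For readers unfamiliar with the term, a discrete-event simulator keeps a time-ordered queue of events and jumps the clock from one event to the next. The sketch below is a generic minimal event loop, not DGSim's code; the `Simulator` class, the event functions, and the 5-second service time are assumptions made for illustration.

```python
import heapq
import itertools

class Simulator:
    """Minimal discrete-event loop: a priority queue ordered by event time."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal timestamps

    def schedule(self, delay, action):
        """Schedule `action(sim)` to run `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run(self):
        """Process events in time order until the queue is empty."""
        while self._queue:
            self.now, _, action = heapq.heappop(self._queue)
            action(self)

log = []

def job_arrival(sim):
    log.append(("arrival", sim.now))
    sim.schedule(5.0, job_completion)  # assumed fixed service time of 5 s

def job_completion(sim):
    log.append(("completion", sim.now))

sim = Simulator()
sim.schedule(1.0, job_arrival)
sim.schedule(2.0, job_arrival)
sim.run()
print(log)
```

The counter in each queue entry means two events with the same timestamp are processed in the order they were scheduled, without ever comparing the callables themselves.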

8 2. The DGSim Framework: Model Details: Inter-Operation Architectures
[Figure: inter-operation architectures: independent, centralized, hierarchical, decentralized, and hybrid hierarchical/decentralized]

9 2. The DGSim Framework: Model Details: Resource Dynamics & Evolution
- Resource dynamics: short-term changes in resource availability status
- Resource evolution: long-term changes in number & … of resources
Reference: A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.

10 2. The DGSim Framework: Workloads: Generation and Model(s)
- Parallel jobs: adapting the Lublin-Feitelson model to grids
- Bags-of-Tasks: groups of independent single-processor tasks
- Validated with seven long-term grid traces
Workload generation
- Generate synthetic workload with realistic characteristics
- Iterative workload generation: incur specified load on a grid
References:
- A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
- A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
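One possible reading of "incur specified load on a grid" is to keep adding jobs until the offered work reaches a target fraction of the system's capacity. The sketch below follows that reading only. It does not use the Lublin-Feitelson model: the exponential runtimes (mean 600 s), the job sizes, the uniform arrivals, and the name `generate_workload` are all assumptions chosen to keep the example short.

```python
import random

def generate_workload(target_load, processors, duration, seed=0):
    """Add jobs until offered work reaches target_load * processors * duration."""
    rng = random.Random(seed)
    target_work = target_load * processors * duration  # CPU-seconds to offer
    jobs, work = [], 0.0
    while work < target_work:
        runtime = rng.expovariate(1.0 / 600.0)  # assumed mean runtime: 600 s
        procs = rng.choice([1, 2, 4, 8])        # assumed job sizes
        arrival = rng.uniform(0.0, duration)    # assumed uniform arrivals
        jobs.append((arrival, procs, runtime))
        work += procs * runtime
    jobs.sort()  # order by arrival time, as a trace replayer would expect
    return jobs

# Assumed example: 70% load on a 64-processor system over one day.
jobs = generate_workload(target_load=0.7, processors=64, duration=86400.0)
offered = sum(p * r for _, p, r in jobs) / (64 * 86400.0)
print(len(jobs), round(offered, 3))
```

The loop stops at the first job that pushes the total past the target, so the offered load slightly exceeds the requested value.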


12 3. DGSim Validation: Functional Validation
Functional validation (simple scenario)
- Workload: 100 jobs of constant size 10,000, all arriving at t=0
- System: grid scheduler over one 10-resource cluster; each resource processes 1 work unit/second; information delay = 0-3600 s
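The scenario above is simple enough to check by hand, which is what makes it useful for functional validation. Assuming each job occupies exactly one resource and the information delay is zero, each of the 10 resources runs 10 jobs of 10,000 s back to back, so the expected makespan is 100,000 s. The sketch below reproduces that arithmetic with a greedy "next free resource" assignment; the function name and the single-resource-per-job assumption are illustrative, not taken from the slide.

```python
import heapq

def makespan(num_jobs, job_size, num_resources, speed):
    """Assign each job to the earliest-free resource; return the last finish time."""
    free_at = [0.0] * num_resources  # time at which each resource becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)   # earliest available resource
        end = start + job_size / speed   # runtime = work units / (units per second)
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

# Scenario from the slide: 100 jobs of size 10,000, 10 resources at 1 unit/s.
result = makespan(num_jobs=100, job_size=10_000, num_resources=10, speed=1.0)
print(result)  # 100000.0
```

A simulator run with zero information delay that deviates from this value would point to a bug in the scheduling or event-handling logic rather than in the models.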

13 3. DGSim Validation: Real vs. Simulated DAS-3 Multi-Cluster Grid
Simulator setup
- Application: synthetic parallel, communication-intensive (all-gather)
- Measured: runtime for various configurations (co-allocation)
- System: heterogeneous clusters, Koala co-allocating scheduler
- Workload: 300 jobs, submitted over a period of 6 hours
- All jobs submitted through central cluster gateways
Results
- Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends
- Under-estimation of waiting time (failures lead to more contention)


15 4. DGSim Examples: Sample 1/3
Investigate mechanisms for inter-operating grids
- New mechanism: DMM (Delegated MatchMaking)
- Trace-based performance evaluation through simulations
  - Real and model-based traces
  - Largest trace: 1.4M jobs
  - Simulate Grid’5000+DAS-2
- Explored a design space of over 1 million design points
Reference: A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.

16 4. DGSim Examples: Sample 2/3
What is the performance impact of the dynamic grid resource availability?
- Four models for grid resource availability information
- Trace-based performance evaluation through simulations
  - Real traces
  - Simulate Grid’5000
- Result: KA = AMA > HMA >> SA

Resource availability   Availability information delay   Model
Static                  -                                SA
Dynamic                 On-time (0)                      KA
Dynamic                 Short period                     AMA
Dynamic                 Long period                      HMA

[Figure: average normalized goodput [cpuseconds/day/proc] per model: SA, KA, AMA 60s, AMA 1h, HMA 1w, HMA 1mo, HMA Never. Goodput decreases with intervention delay.]
Reference: A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.

17 4. DGSim Examples: Sample 3/3
Analyze performance of bag-of-tasks scheduling algorithms
- Information availability framework: Known, Unknown, Historical records
- Trace-based performance evaluation through simulations
  - Real and model-based traces
  - Simulate Grid’5000+DAS
- Evaluated 8 scheduling algorithms
- Explored a design space of over 2 million design points
[Table: scheduling algorithms classified by task information (K, H, U) and resource information (K, H, U): ECT, FPLT, FPF, ECT-P, DFPLT, MQD, STFR, RR, WQR]
Reference: A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.


19 Conclusion and Future Work
The DGSim framework
- Tool to generate realistic experimental setups
- Tool to manage large numbers of grouped simulations
- Tool proved to work with large simulations
Validated underlying models and assumptions
- Resource dynamics and evolution model
- Workload model
Comparing grid resource management architectures
- Proven in various settings
Future work
- More scenarios
- Library of ready-to-use scenarios

20 Thank you! Questions? Remarks? Observations?
Contact: A.Iosup@gmail.com [google "Iosup"]
Web sites:
- http://www.vl-e.nl : VL-e project
- http://www.pds.ewi.tudelft.nl : PDS group articles & software

