1 Challenge the future KOALA-C: A Task Allocator for Integrated Multicluster and Multicloud Environments Presenter: Lipu Fei Authors: Lipu Fei, Bogdan.

Slides:

Advertisements

Similar presentations

Challenge the future Delft University of Technology Overprovisioning for Performance Consistency in Grids Nezih Yigitbasi and Dick Epema Parallel.

Advertisements

Evaluating the Cost-Benefit of Using Cloud Computing to Extend the Capacity of Clusters Presenter: Xiaoyu Sun.

The First 16 Years of the Distributed ASCI Supercomputer Henri Bal Vrije Universiteit Amsterdam COMMIT/

7 april SP3.1: High-Performance Distributed Computing The KOALA grid scheduler and the Ibis Java-centric grid middleware Dick Epema Catalin Dumitrescu,

LIBRA: Lightweight Data Skew Mitigation in MapReduce

SLA-Oriented Resource Provisioning for Cloud Computing

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng.

19 November 2013 Exploring Portfolio Scheduling for Long-term Execution of Scientific Workloads in IaaS Clouds Alexandru Iosup Delft University of Technology.

A system Performance Model Instructor: Dr. Yanqing Zhang Presented by: Rajapaksage Jayampthi S.

Scheduling of parallel jobs in a heterogeneous grid environment Scheduling of parallel jobs in a heterogeneous grid environment Each site has a homogeneous.

June 1, Inter-Operating Grids through Delegated MatchMaking Alexandru Iosup, Dick Epema PDS Group, TU Delft, NL Todd Tannenbaum, Matt Farrellee,

June 3, 2015 Synthetic Grid Workloads with Ibis, K OALA, and GrenchMark CoreGRID Integration Workshop, Pisa A. Iosup, D.H.J. Epema Jason Maassen, Rob van.

The Performance of Bags-Of-Tasks in Large-Scale Distributed Computing Systems Alexandru Iosup, Ozan Sonmez, Shanny Anoep, and Dick Epema ACM/IEEE Int’l.

Inter-Operating Grids through Delegated MatchMaking Alexandru Iosup, Dick Epema, Hashim Mohamed,Mathieu Jan, Ozan Sonmez 3 rd Grid Initiative Summer School,

Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs John Oleszkiewicz, Li Xiao, Yunhao Liu IEEE TRASACTION ON PARALLEL.

DAS-3/Grid’5000 meeting: 4th December The KOALA Grid Scheduler over DAS-3 and Grid’5000 Processor and data co-allocation in grids Dick Epema, Alexandru.

1 A Performance Study of Grid Workflow Engines Alexandru Iosup and Dick Epema PDS Group Delft University of Technology The Netherlands Corina Stratan Parallel.

Dynamic Load Balancing Experiments in a Grid Vrije Universiteit Amsterdam, The Netherlands CWI Amsterdam, The

1 Trace-Based Characteristics of Grid Workflows Alexandru Iosup and Dick Epema PDS Group Delft University of Technology The Netherlands Simon Ostermann,

Security-Driven Heuristics and A Fast Genetic Algorithm for Trusted Grid Job Scheduling Shanshan Song, Ricky Kwok, and Kai Hwang University of Southern.

Present by Chen, Ting-Wei Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids Maria Chtepen, Filip H.A. Claeys, Bart Dhoedt,

June 25, GrenchMark: Synthetic workloads for Grids First Demo at TU Delft A. Iosup, D.H.J. Epema PDS Group, ST/EWI, TU Delft.

Grid Load Balancing Scheduling Algorithm Based on Statistics Thinking The 9th International Conference for Young Computer Scientists Bin Lu, Hongbin Zhang.

June 28, Resource and Test Management in Grids Rapid Prototyping in e-Science VL-e Workshop, Amsterdam, NL Dick Epema, Catalin Dumitrescu, Hashim.

4 december, The Distributed ASCI Supercomputer The third generation Dick Epema (TUD) (with many slides from Henri Bal) Parallel and Distributed.

June 6, 2002D.H.J. Epema/PDS/TUD1 Processor Co-Allocation in Multicluster Systems DAS-2 Workshop Amsterdam June 6, 2002 Anca Bucur and Dick Epema Parallel.

July 13, “How are Real Grids Used?” The Analysis of Four Grid Traces and Its Implications IEEE Grid 2006 Alexandru Iosup, Catalin Dumitrescu, and.

Euro-Par 2008, Las Palmas, 27 August DGSim : Comparing Grid Resource Management Architectures Through Trace-Based Simulation Alexandru Iosup, Ozan.

Dynamic Load Sharing and Balancing Sig Freund. Outline Introduction Distributed vs. Traditional scheduling Process Interaction models Distributed Systems.

Euro-Par 2007, Rennes, 29th August 1 The Characteristics and Performance of Groups of Jobs in Grids Alexandru Iosup, Mathieu Jan *, Ozan Sonmez and Dick.

Authors: Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn [Systems Technology Lab, Intel Corporation] Source: 2007 ACM/IEEE conference on Supercomputing.

Cloud MapReduce ： a MapReduce Implementation on top of a Cloud Operating System Speaker : 童耀民 MA1G Authors: Huan Liu, Dan Orban Accenture.

: Massivizing Online Games using Cloud Computing Alexandru Iosup Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands.

MobSched: An Optimizable Scheduler for Mobile Cloud Computing S. SindiaS. GaoB. Black A.LimV. D. AgrawalP. Agrawal Auburn University, Auburn, AL 45 th.

Efficient Scheduling of Heterogeneous Continuous Queries Mohamed A. Sharaf Panos K. Chrysanthis Alexandros Labrinidis Kirk Pruhs Advanced Data Management.

COLLABORATIVE EXECUTION ENVIRONMENT FOR HETEROGENEOUS PARALLEL SYSTEMS Aleksandar Ili´c, Leonel Sousa 2010 IEEE International Symposium on Parallel & Distributed.

1 TUD-PDS A Periodic Portfolio Scheduler for Scientific Computing in the Data Center Kefeng Deng, Ruben Verboon, Kaijun Ren, and Alexandru Iosup Parallel.

1 Cloud Computing Research at TU Delft – A. Iosup Alexandru Iosup Parallel and Distributed Systems Group Delft University of Technology The Netherlands.

Marcos Dias de Assunção 1,2, Alexandre di Costanzo 1 and Rajkumar Buyya 1 1 Department of Computer Science and Software Engineering 2 National ICT Australia.

Chapter 7 Scheduling Chien-Chung Shen CIS, UD

Resource Provisioning based on Lease Preemption in InterGrid Mohsen Amini Salehi, Bahman Javadi, Rajkumar Buyya Cloud Computing and Distributed Systems.

Meta Scheduling Sathish Vadhiyar Sources/Credits/Taken from: Papers listed in “References” slide.

Young Suk Moon Chair: Dr. Hans-Peter Bischof Reader: Dr. Gregor von Laszewski Observer: Dr. Minseok Kwon 1.

1 Time & Cost Sensitive Data-Intensive Computing on Hybrid Clouds Tekin Bicer David ChiuGagan Agrawal Department of Compute Science and Engineering The.

1 Optimal Resource Placement in Structured Peer-to-Peer Networks Authors: W. Rao, L. Chen, A.W.-C. Fu, G. Wang Source: IEEE Transactions on Parallel and.

Euro-Par, A Resource Allocation Approach for Supporting Time-Critical Applications in Grid Environments Qian Zhu and Gagan Agrawal Department of.

1 ROIA 2009 – CAMEO: Continuous Analytics for Massively Multiplayer Online Games CAMEO: Continuous Analytics for Massively Multiplayer Online Games Alexandru.

BOF: Megajobs Gracie: Grid Resource Virtualization and Customization Infrastructure How to execute hundreds of thousands tasks concurrently on distributed.

Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison IEEE Transactions on Parallel and Distributed Systems, Vol.

Autonomic scheduling of tasks from data parallel patterns to CPU/GPU core mixes Published in: High Performance Computing and Simulation (HPCS), 2013 International.

Copyright © 2011, Performance Evaluation of a Green Scheduling Algorithm for Energy Savings in Cloud Computing Truong Vinh Truong Duy; Sato,

MC 2 : Map Concurrency Characterization for MapReduce on the Cloud Mohammad Hammoud and Majd Sakr 1.

Rassul Ayani 1 Performance of parallel and distributed systems  What is the purpose of measurement?  To evaluate a system (or an architecture)  To compare.

Performance Analysis of Preemption-aware Scheduling in Multi-Cluster Grid Environments Mohsen Amini Salehi, Bahman Javadi, Rajkumar Buyya Cloud Computing.

Design Issues of Prefetching Strategies for Heterogeneous Software DSM Author :Ssu-Hsuan Lu, Chien-Lung Chou, Kuang-Jui Wang, Hsiao-Hsi Wang, and Kuan-Ching.

Massive Semantic Web data compression with MapReduce Jacopo Urbani, Jason Maassen, Henri Bal Vrije Universiteit, Amsterdam HPDC ( High Performance Distributed.

Xi He Golisano College of Computing and Information Sciences Rochester Institute of Technology Rochester, NY THERMAL-AWARE RESOURCE.

Group # 14 Dhairya Gala Priyank Shah. Introduction to Grid Appliance The Grid appliance is a plug-and-play virtual machine appliance intended for Grid.

Chapter 4 CPU Scheduling. 2 Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation.

1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.

Chapter 7 Scheduling Chien-Chung Shen CIS/UD

Spark on Entropy : A Reliable & Efficient Scheduler for Low-latency Parallel Jobs in Heterogeneous Cloud Huankai Chen PhD Student at University of Kent.

Cloud Benchmarking, Tools, and Challenges

(Parallel and) Distributed Systems Group

PA an Coordinated Memory Caching for Parallel Jobs

Process Scheduling B.Ramamurthy 9/16/2018.

Understanding and Exploiting Amazon EC2 Spot Instances

A Characterization of Approaches to Parrallel Job Scheduling

Hawk: Hybrid Datacenter Scheduling

CPU SCHEDULING SIMULATION

Presentation transcript:

1 Challenge the future KOALA-C: A Task Allocator for Integrated Multicluster and Multicloud Environments Presenter: Lipu Fei Authors: Lipu Fei, Bogdan Ghiţ, Alexandru Iosup, Dick Epema Institute: TU Delft, the Netherlands

2 Challenge the future A Multicluster and Multicloud Environment Heterogeneous resources Efficient resource management

3 Challenge the future A Multicluster and Multicloud Environment Increasing number of short jobs[1][2] Goal: optimize global job slowdown by optimizing short jobs [1] Iosup, A; Epema, D., “Grid Computing Workloads”,Internet Computing, IEEE, vol.15, no.2, pp.19,26, March-April 2011 [2] Yanpei Chen; Ganapathi, A; Griffith, R.; Katz, R., “The Case for Evaluating MapReduce Performance Using Workload Suites”, MASCOTS, 2011 IEEE 19 th International Symposium, July 2011

4 Challenge the future Introduction KOALA-C scheduler for integrated multicluster and multicloud environments. Two new scheduling policies for the new environment. A comprehensive experimental evaluation of our architecture and policies. Contribution

5 Challenge the future KOALA: A Co-allocating Grid Scheduler Original goals: 1.processor co-allocation: parallel applications 2.data co-allocation: job affinity based on data locations 3.load sharing: in the absence of co-allocation Additional goals: research vehicle for grid and cloud research support for (other) popular application types (e.g. MapReduce) KOALA: is written in Java is middleware independent (initially Globus-based) has been deployed on the DAS2 – DAS4 since Sept. 2004

6 Challenge the future DAS-4: The Distributed ASCI Supercomputer 4 System purely for CS research OpenNebula installed Operational since Oct Institutions and organizations: VU University, Amsterdam (VU) Leiden University (LU) University of Amsterdam (UvA) Delft University of Technology (TUD) The MultimediaN Consortium (UvA-MN) Netherlands Institute for Radio Astronomy (ASTRON)

7 Challenge the future System Model Global and Local resource managers (GRM and LRMs).

8 Challenge the future System Model Global and Local resource managers.

9 Challenge the future System Model Global and Local resource managers. Compute-intensive jobs, sequential or parallel high-performance applications. Rigid jobs that do not change size.

10 Challenge the future System Model Global and Local resource managers. Compute-intensive jobs, sequential or parallel high-performance applications. Rigid jobs that do not change size. Preemptive jobs (restart instead of resume).

11 Challenge the future Outline 1.A Multicluster and Multicloud Environment 2.KOALA-C: a System for MC + MC Environments 3.Design for Policies without prediction 4.Simulation and Real-World Experiments 5.Conclusion

12 Challenge the future KOALA-C

13 Challenge the future KOALA-C

14 Challenge the future KOALA-C

15 Challenge the future KOALA-C

16 Challenge the future KOALA-C

17 Challenge the future KOALA-C Logical Partitioning

18 Challenge the future Outline 1.A Multicluster and Multicloud Environment 2.KOALA-C: a System for MC + MC Environments 3.Design for Policies without prediction 4.Simulation and Real-World Experiments 5.Conclusion

19 Challenge the future Policy Design Space PredictionSlowdownPreemptionMC+MC support Uninformed (FF, RR) Informed (SJF, SJF-ideal, HSDF) Traditional TAGS TAGS extensions (TAGS-chain, TAGS-sets) new We propose TAGS-chain and TAGS-sets, which are new in this work.

20 Challenge the future Task Assignment by Guessing Size (TAGS)[3]: use no prior information and use no prediction jobs can be aborted optimize short jobs Traditional TAGS Policy [3] Mor Harchol-Balter, “Task Assignment with Unknown Duration”, ACM 49, 2002

21 Challenge the future TAGS-based Policy Design Goal: to achieve low slowdown without prediction Method: to partition the sites into sets to serve jobs of different runtime ranges A number of sets of sites Set i allows jobs to run for T i amount of time (T i < T i+1 ) The last set has a T of ∞ (all jobs will finish without being killed) T 1 < T 2 < T 3 T 3 = ∞

22 Challenge the future TAGS-based Policy Design

23 Challenge the future Policy Design TAGS-based Policies in General

24 Challenge the future Policy Design TAGS-based Policies in General

25 Challenge the future Policy Design TAGS- chain and TAGS- sets TAGS-chain: TAGS-sets:

26 Challenge the future Outline A Multicluster and Multicloud Environment KOALA-C: a System for MC + MC Environments Design for Policies without Prediction Simulation and Real-World Experiments Simulation Results Real-World Experiment Results Conclusion

27 Challenge the future Simulation Results Resources: 3 cluster sites (32 nodes each) 1 private cloud site (≤64 VMs) and 1 public cloud site (≤128 VMs) 6 policies: 5 baselines and TAGS-sets Workload: 8 traces job size limited to 32 nodes 6 Policies and 2 Metrics: FF, RR, SJF, SJF-ideal, and HSDF as baselines + TAGS-sets slowdown, total cost Simulation results agree with real-world experiment results * We have numerous experimental results in the paper Experimental Setup Parallel Workload Archive

28 Challenge the future Simulation Results Configuration of TAGS- sets logscale

29 Challenge the future Simulation Results Configuration of TAGS- sets More effect on slowdown

30 Challenge the future Simulation Results Configuration of TAGS- sets More effect on slowdown No common optimal runtime limit, but 40min seems to be suitable for most traces.

31 Challenge the future Simulation Results Policy Evaluation

32 Challenge the future Simulation Results Policy Evaluation SJF-ideal has lowest slowdowns, but it cannot be achieved in reality.

33 Challenge the future Simulation Results Policy Evaluation TAGS-sets has the lowest slowdowns overall PWA-1 PWA-2 PWA-3 PWA-4 PWA-5 GWA-1 GWA-2 GWA-3

34 Challenge the future Simulation Results Policy Evaluation >10x improvement over SJF >25x improvement over SJF PWA-1 PWA-2 PWA-3 PWA-4 PWA-5 GWA-1 GWA-2 GWA-3

35 Challenge the future Simulation Results Policy Evaluation TAGS-sets can achieve good performance in multicluster and multicloud environments. PWA-1 PWA-2 PWA-3 PWA-4 PWA-5 GWA-1 GWA-2 GWA-3

36 Challenge the future Simulation Results Policy Evaluation logscale PWA-1 PWA-2 PWA-3 PWA-4 PWA-5 GWA-1 GWA-2 GWA-3

37 Challenge the future Simulation Results Policy Evaluation TAGS-sets incurs 1.3 to 4.2x higher costs than the others PWA-1 PWA-2 PWA-3 PWA-4 PWA-5 GWA-1 GWA-2 GWA-3

38 Challenge the future Simulation Results Policy Evaluation TAGS-sets can achieve a good performance-cost tradeoff in an integrated multicluster and multicloud environment. PWA-1 PWA-2 PWA-3 PWA-4 PWA-5 GWA-1 GWA-2 GWA-3

39 Challenge the future Real-World Experiment Results Resources: 2 cluster sites of the DAS-4 system (32 nodes each) OpenNebula-based private cloud of DAS-4 (up to 32 VMs) Amazon EC2 as public cloud (up to 64 VMs) Workload: A part of the CTC-SP2 workload (≈12 hours) 70% average utilization on the system (with the max cloud size) Policies: FF, SJF, and TAGS-sets TAGS-sets setup: Experimental Setup ≤10min [4] James Patton Jones and Bill Nitzberg, “Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization”, 1999

40 Challenge the future Experimental Results

41 Challenge the future Experimental Results Lower and more balanced slowdowns

42 Challenge the future Experimental Results

43 Challenge the future Experimental Results Short jobs have shorter wait times Long jobs have longer wait times

44 Challenge the future Experimental Results TAGS-sets has better optimization for short jobs than the other policies.

45 Challenge the future Outline A Multicluster and Multicloud Environment KOALA-C: a System for MC + MC Environments Design for Policies without Prediction Simulation and Real-World Experiments Conclusion

46 Challenge the future Conclusion We designed KOALA-C, a task scheduler for integrated multicluster and multicloud environments. We designed two new scheduling policies without prediction for multicluster and multicloud environments. Through experiments, we show that two new policies can achieve significant improvement in performance. For the future, we plan to extend our policy set.

47 Challenge the future More Information PDS group: Parallel and Distributed Systems TU Delft Publications: see PDS publication database at: Home pages: Web sites: KOALA: DAS-4: GWA:gwa.ewi.tudelft.nl

48 Challenge the future Thank You Questions?