Processor Co-Allocation in Multicluster Systems. DAS-2 Workshop, Amsterdam, June 6, 2002. Anca Bucur and Dick Epema (D.H.J. Epema/PDS/TUD).

Presentation transcript:

June 6, 2002 · D.H.J. Epema/PDS/TUD
Slide 1: Processor Co-Allocation in Multicluster Systems
DAS-2 Workshop, Amsterdam, June 6, 2002
Anca Bucur and Dick Epema
Parallel and Distributed Systems Group, Delft University of Technology

Slide 2: Introduction (1)
In multicluster systems (like the DAS, or in grids), jobs may use co-allocation (i.e., span multiple clusters):
– to use available capacity
– to process geographically spread data
Single-application performance issues:
– application restructuring
– wide-area runtime systems (e.g., optimize collective communication operations)
Multiple-application performance issues:
– design/analyze scheduling policies
– minimize response time, maximize maximal utilization

Slide 3: Introduction (2): Example
In April 2001, the Cactus Computational Toolkit was used for four-hour astrophysics simulations involving Einstein's General Relativity equations
Equipment:
– At NCSA: 480 CPUs of three SGI Origin2000 systems
– At SDSC: 1020 CPUs of Blue Horizon
– OC Mbit/s network

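Slide 4: Introduction (3): Problems
[Figure: processors versus time for cluster 1, cluster 2, and cluster 3, with idle processors shown as a pattern, and a job with components 1, 2, 3. Annotations mark a placement that fits if the request is flexible and a placement that fits if the request is unordered.]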

Slide 5: System Model
Multicluster system consisting of clusters of processors of equal speed
Communication speed ratio: the ratio of the wide-area and local message transfer times

Slide 6: Job Components
A job consists of job components that each go to a single cluster, one task per processor
Distributions of job-component sizes:
– Uniform: U[a,b]
– Truncated and adapted geometric (favors small sizes and powers of 2): D(q) on [1,b]
[Figure: a job and its components mapped onto the system.]
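The slide names the D(q) distribution but does not give its formula. The sketch below is one plausible reading, not the authors' definition: it assumes a geometric weight q^i on each size i, truncated to [lo, hi], with sizes that are powers of 2 multiplied by a hypothetical factor `boost`. The function names, the default `boost=3.0`, and the choice to treat size 1 as a power of 2 are all assumptions made for illustration.

```python
import random


def d_pmf(q, lo, hi, boost=3.0):
    """Probability mass function of an assumed D(q) on [lo, hi].

    Assumption: weight q**size, multiplied by `boost` when size is a
    power of 2 (including 1 = 2**0). The slides do not state this formula.
    """
    weights = {}
    for size in range(lo, hi + 1):
        w = q ** size
        if (size & (size - 1)) == 0:  # true for 1, 2, 4, 8, ...
            w *= boost
        weights[size] = w
    total = sum(weights.values())
    return {size: w / total for size, w in weights.items()}


def sample_component_size(rng, pmf):
    """Draw one job-component size from the given pmf."""
    sizes = list(pmf)
    return rng.choices(sizes, weights=[pmf[s] for s in sizes], k=1)[0]


pmf = d_pmf(0.9, 1, 8)
size = sample_component_size(random.Random(0), pmf)
```

With q < 1 the weights decrease with size, so small components dominate, and the boost makes each power of 2 more likely than its non-power neighbours.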

Slide 7: Job Request Types (1)
Ordered and unordered requests specify their job-component sizes
[Figure: diagrams of an ordered request and an unordered request.]

Slide 8: Job Request Types (2)
Flexible and total requests only specify the total number of processors needed
[Figure: diagrams of a flexible request and a total request.]

Slide 9: Fitting a Job (1)
It is clear when an ordered or a total request fits
For an unordered request:
– order components according to decreasing sizes
– use First-Fit (FF) or Worst-Fit (WF)
[Figure: a job placed on the system with WF, showing processors in use and idle.]
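The two placement rules for unordered requests can be sketched as follows. This is a minimal illustration, not the authors' simulator: `place_unordered` is a hypothetical name, clusters are identified by their index in `idle`, and the sketch assumes that the components of one job go to distinct clusters.

```python
def place_unordered(components, idle, rule="WF"):
    """Try to place an unordered request.

    components: job-component sizes; idle: idle processors per cluster.
    Returns {cluster_index: component_size}, or None if the job does not fit.
    Assumption: each component of a job goes to a different cluster.
    """
    free = dict(enumerate(idle))  # clusters not yet used by this job
    placement = {}
    # Handle the largest components first, as on the slide.
    for size in sorted(components, reverse=True):
        candidates = [c for c, n in free.items() if n >= size]
        if not candidates:
            return None
        if rule == "FF":
            chosen = min(candidates)  # first cluster in fixed order that fits
        else:
            chosen = max(candidates, key=lambda c: free[c])  # most idle cluster
        placement[chosen] = size
        del free[chosen]
    return placement


idle = [10, 20, 5, 30]
ff = place_unordered([4, 12, 8], idle, rule="FF")
wf = place_unordered([4, 12, 8], idle, rule="WF")
```

For this input, FF places the component of size 12 on cluster 1, while WF places it on cluster 3, the cluster with the most idle processors.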

Slide 10: Fitting a Job (2)
For a flexible request:
– determine minimal number of clusters needed
– fill least-loaded clusters (CF) completely, or balance load (LB) (variation: LB-A)
[Figure: the same job placed with CF and with LB, showing processors in use and idle.]
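A minimal sketch of the two flexible-request placements follows. The slide gives only the outline, so the details are assumptions: `place_flexible` is a hypothetical name, the minimal set of clusters is taken to be the least-loaded ones, and LB is interpreted as handing out processors one at a time to the chosen cluster with the most idle processors left. The LB-A variation is not covered.

```python
def place_flexible(total, idle, rule="CF"):
    """Split a flexible request for `total` processors over clusters.

    idle: idle processors per cluster.
    Returns a list with the processors taken per cluster, or None if no fit.
    """
    if total > sum(idle):
        return None
    # Least-loaded clusters first (most idle processors first).
    order = sorted(range(len(idle)), key=lambda c: idle[c], reverse=True)
    # Minimal number of clusters whose idle processors cover the request.
    chosen, capacity = [], 0
    for c in order:
        if capacity >= total:
            break
        chosen.append(c)
        capacity += idle[c]
    taken = [0] * len(idle)
    if rule == "CF":
        # Fill the chosen clusters completely; the last one gets the rest.
        remaining = total
        for c in chosen:
            taken[c] = min(idle[c], remaining)
            remaining -= taken[c]
    else:
        # LB (assumed reading): repeatedly take one processor from the
        # chosen cluster that currently has the most idle processors.
        left = {c: idle[c] for c in chosen}
        for _ in range(total):
            c = max(chosen, key=lambda k: left[k])
            left[c] -= 1
            taken[c] += 1
    return taken


cf = place_flexible(14, [10, 6, 3, 1], rule="CF")
lb = place_flexible(14, [10, 6, 3, 1], rule="LB")
```

In this example both rules use two clusters; CF empties the first one, while LB leaves one idle processor in each.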

Slide 11: Scheduling Policies
First Come First Served
Fit Processors First Served: search queue for jobs that fit
[Figure: a job queue feeding the system.]
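The difference between the two policies can be sketched on a queue of ordered requests. This is an illustration under stated assumptions: `schedule` and `fits_ordered` are hypothetical names, a job is a tuple with one component size per cluster, and the policy labels "FCFS" and "FPFS" are shorthand for the two policies on the slide.

```python
def fits_ordered(job, idle):
    """An ordered request fits if every component fits in its own cluster."""
    return all(size <= free for size, free in zip(job, idle))


def schedule(queue, idle, policy="FCFS"):
    """One scheduling pass over the queue.

    FCFS stops at the first job that does not fit; FPFS keeps searching
    the queue for later jobs that do fit.
    Returns (started jobs, jobs still waiting, idle processors afterwards).
    """
    idle = list(idle)
    started, waiting = [], []
    blocked = False
    for job in queue:
        if not blocked and fits_ordered(job, idle):
            idle = [free - size for free, size in zip(idle, job)]
            started.append(job)
        else:
            waiting.append(job)
            if policy == "FCFS":
                blocked = True  # the head of the queue blocks all later jobs
    return started, waiting, idle


queue = [(4, 4), (8, 8), (2, 2)]
fcfs = schedule(queue, [8, 8], policy="FCFS")
fpfs = schedule(queue, [8, 8], policy="FPFS")
```

Here FCFS starts only the first job because (8, 8) blocks the queue, whereas FPFS also starts (2, 2).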

Slide 12: Interarrival/Service Times
Poisson arrival process in simulations
All tasks in a job have the same service time
Service-time distributions used:
– Deterministic (mean 1)
– Exponential (mean 1)
– Hyperexponential (mean 1, coeff. of var. 3)
– Derived from the DAS
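The slide fixes only the mean and coefficient of variation of the hyperexponential distribution, which leaves one degree of freedom. The sketch below assumes a two-phase hyperexponential with balanced means, a common textbook choice that may differ from what the authors used; all function names are hypothetical.

```python
import math
import random


def h2_balanced_means(mean=1.0, cv=3.0):
    """Parameters (p, rate1, rate2) of a two-phase hyperexponential.

    Assumption: balanced means, i.e. p/rate1 == (1-p)/rate2. Requires cv >= 1.
    """
    c2 = cv * cv
    p = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    return p, 2.0 * p / mean, 2.0 * (1.0 - p) / mean


def h2_moments(p, rate1, rate2):
    """Analytic mean and coefficient of variation of the mixture."""
    m1 = p / rate1 + (1.0 - p) / rate2
    m2 = 2.0 * p / rate1 ** 2 + 2.0 * (1.0 - p) / rate2 ** 2
    return m1, math.sqrt(m2 - m1 * m1) / m1


def sample_service_time(rng, p, rate1, rate2):
    """Pick a phase with probability p, then draw an exponential time."""
    rate = rate1 if rng.random() < p else rate2
    return rng.expovariate(rate)


params = h2_balanced_means(mean=1.0, cv=3.0)
mean, cv = h2_moments(*params)
service = sample_service_time(random.Random(0), *params)
```

The `h2_moments` check confirms analytically that these parameters give mean 1 and coefficient of variation 3.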

Slide 13: Communication
We model jobs without and with communication
With communication:
– tasks alternate between compute and communication phases
– communication phase: all-to-all personalized communication
– time for a single local synchronous message send operation:
– communication speed ratios considered: 1-100

Slide 14: Single-cluster DAS Statistics
[Figure: histograms of the number of jobs by service time and by nodes requested. Statistics that survive from the slide: coeff. of var. 1.11; mean (62.66), coeff. of var. 5.37.]

Slide 15: Performance Evaluation
Parameters we vary:
– job request structure
– job-component-size distribution
– service-time distribution
– number and sizes of clusters (base case: 4x32)
– placement of unordered and flexible jobs
– scheduling policy
– communication speed ratio
– co-allocation versus no co-allocation
– queueing structure (global/local)
Performance metrics:
– mean response time (only simulation)
– maximal utilization (analysis and simulation)

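Slide 16: Influence of Structure and Size
[Figure: response time versus utilization for total, ordered, and unordered requests.]
[Table: mean and coeff. of var. per job-component-size distribution; the values are not recoverable. Distributions listed: U[1,7], D(0.9) on [1,8], D(0.768) on [1,32], U[1,14], D(0.894) on [1,32].]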

Slide 17: Influence of Communication Speed Ratio
[Figure: response time versus utilization. Curves, right to left: total, flexible, unordered, ordered.]

Slide 18: Co-Allocation versus No Co-Allocation (1)
No communication, unordered jobs
Job size: 4xD(0.9) on [1,8] (fits on a single cluster)
[Figure: response time versus utilization. Curves: flexible, 1 component, 2 components, 4 components.]

Slide 19: Co-Allocation versus No Co-Allocation (2)
Communication, flexible jobs
Job size: 4xD(0.9) on [1,8]
[Figure: response time versus utilization. Curves: LB-A, ratio 5; LB-A, ratio 50; no co-allocation, FF.]

Slide 20: An Application on the DAS (1)
Solves the Poisson equation with a red-black Gauss-Seidel scheme
Measurements on the DAS (times in ms):
[Table columns: configuration on unit square, number of iterations, update, exchange borders (single cluster), exchange borders (multicluster). Of the row values only "4x x" survives.]
Time for diffusing local errors and computing the global error: 14 ms
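For readers unfamiliar with the scheme, a sequential red-black Gauss-Seidel sweep for the Poisson equation u_xx + u_yy = f on the unit square can be sketched as below. This is not the application measured on the DAS: the function name, the constant Dirichlet boundary value, and the fixed number of sweeps are assumptions, and the parallel parts named on the slide (border exchange, global error computation) are only indicated in comments.

```python
def red_black_gauss_seidel(f, boundary, n, sweeps):
    """Solve u_xx + u_yy = f on the unit square with u = boundary on the edge.

    n: number of interior grid points per dimension; f: function f(x, y).
    Returns the (n+2) x (n+2) grid including the boundary points.
    """
    h = 1.0 / (n + 1)
    u = [[boundary] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            u[i][j] = 0.0  # initial guess in the interior
    for _ in range(sweeps):
        # Colour 0 ("red") points depend only on colour 1 ("black")
        # neighbours and vice versa, so each half-sweep can run in parallel.
        for colour in (0, 1):
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 == colour:
                        u[i][j] = 0.25 * (
                            u[i - 1][j] + u[i + 1][j]
                            + u[i][j - 1] + u[i][j + 1]
                            - h * h * f(i * h, j * h)
                        )
            # A parallel version would exchange subdomain borders here.
    return u


grid = red_black_gauss_seidel(lambda x, y: 0.0, boundary=1.0, n=8, sweeps=200)
```

With f = 0 and boundary value 1 the exact solution is u = 1 everywhere, which gives a simple convergence check.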

Slide 21: An Application on the DAS (2)
Equal mix of jobs of sizes (2,2,2,2) and (4,4,4,4)
[Figure: response time versus utilization. Curves: total, ordered.]

Slide 22: Maximal Utilization (1)
Assume: constant backlog, ordered jobs, exponential service (no communication)
Consider: the joint probability distribution of the sizes of jobs in the system
Result: this distribution is the same
– when the system runs for a long time
– when the system is filled from the empty state
Use the convolution of the job-size distribution to determine the distribution of the numbers of jobs in the system
Compute the maximal utilization
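Under the same assumptions (constant backlog, ordered jobs, exponential service, no communication), the maximal utilization can also be estimated by simulation. The sketch below is a simulation estimate, not the analytical convolution method of the slide. The function name and defaults are hypothetical, FCFS scheduling is assumed, and component sizes are drawn from U[a,b] with b no larger than the cluster size.

```python
import heapq
import random


def simulate_max_utilization(num_clusters=4, cluster_size=32, a=1, b=8,
                             departures=20000, seed=1):
    """Estimate maximal utilization with an always non-empty FCFS queue."""
    rng = random.Random(seed)
    capacity = num_clusters * cluster_size
    idle = [cluster_size] * num_clusters
    running = []  # heap of (finish_time, sequence_number, job)
    now, busy_area, seq = 0.0, 0.0, 0

    def new_job():
        # Ordered request: one U[a,b] component per cluster.
        return tuple(rng.randint(a, b) for _ in range(num_clusters))

    head = new_job()
    for _ in range(departures):
        # Constant backlog: start jobs until the head of the queue does not fit.
        while all(size <= free for size, free in zip(head, idle)):
            idle = [free - size for free, size in zip(idle, head)]
            # All tasks of a job share one exponential service time (mean 1).
            heapq.heappush(running, (now + rng.expovariate(1.0), seq, head))
            seq += 1
            head = new_job()
        finish, _, job = heapq.heappop(running)
        busy_area += (capacity - sum(idle)) * (finish - now)
        now = finish
        idle = [free + size for free, size in zip(idle, job)]
    return busy_area / (now * capacity)


utilization = simulate_max_utilization()
capacity_loss = 1.0 - utilization
```

The returned value is the time-averaged fraction of busy processors; one minus this value is the capacity loss for the chosen cluster configuration and size range.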

Slide 23: Maximal Utilization (2)
We have an approximation for the maximal utilization for unordered jobs with WF
We use simulations to validate this approximation
Capacity loss (1-max. util.) for 4 clusters of size 32, uniform job-component sizes:
[Table columns: a, b, ordered (exact), unordered (approx.), unordered (simul.), total (exact). The values are not recoverable.]