
1 Grid Resource Brokering and Cost-based Scheduling With Nimrod-G and Gridbus Case Studies
Rajkumar Buyya
Cloud Computing and Distributed Systems (CLOUDS) Lab, The University of Melbourne, Melbourne, Australia
www.cloudbus.org

2 Agenda
- Introduction to Grid Scheduling
- Application Models and Deployment Approaches
- Economy-based "Computational" Grid Scheduling: the Nimrod-G Grid Resource Broker; scheduling algorithms and experiments on the World Wide Grid testbed
- Economy-based "Data-Intensive" Grid Scheduling: the Gridbus Grid Service Broker; scheduling algorithms and experiments on the Australian Belle Data Grid testbed
[Venn diagram: Scheduling, Economics, Grid -> Grid Economy]

3 Grid Scheduling: Introduction

4 Grid Resources and Scheduling
[Diagram: a user application submits work to a Grid Resource Broker, which consults a Grid Information Service and dispatches jobs via each resource's Local Resource Manager to single-CPU machines (time-shared allocation), SMPs (time-shared allocation), and clusters (space-shared allocation).]

5 Grid Scheduling
Grid scheduling, with resources distributed over multiple administrative domains, involves:
- selecting one or more suitable resources (which may involve co-scheduling), and
- assigning tasks to the selected resources and monitoring their execution.
Grid schedulers are global schedulers:
- They have no ownership or control over resources.
- Jobs are submitted to local resource managers (LRMs) as the user.
- LRMs take care of the actual execution of jobs.

6 Example Grid Schedulers
- Nimrod-G (Monash University): computational Grid, economy-based
- Condor-G (University of Wisconsin): computational Grid, system-centric
- AppLeS (University of California, San Diego): computational Grid, system-centric
- Gridbus Broker (University of Melbourne): Data Grid, economy-based

7 Key Steps in Grid Scheduling
Phase I - Resource Discovery: 1. Authorization filtering; 2. Application definition; 3. Minimal-requirement filtering.
Phase II - Resource Selection: 4. Information gathering; 5. System selection.
Phase III - Job Execution: 6. Advance reservation; 7. Job submission; 8. Preparation tasks; 9. Monitoring progress; 10. Job completion; 11. Clean-up tasks.
A minimal sketch of these phases follows below.
Source: J. Schopf, Ten Actions When SuperScheduling, OGF Document, 2003.
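A minimal, self-contained Python sketch of the three phases. Every record, field name, and selection rule here is invented for illustration; nothing below is a real Grid middleware API.

    # Phases I-III in miniature; plain dictionaries stand in for resource records.
    user_vos = {"belle"}                                  # VOs the user belongs to
    job = {"min_cpus": 2, "runtime": 300}                 # job requirements

    resources = [
        {"name": "siteA", "vos": {"belle"}, "cpus": 4, "price": 2, "load": 0.5},
        {"name": "siteB", "vos": {"atlas"}, "cpus": 8, "price": 1, "load": 0.1},
        {"name": "siteC", "vos": {"belle"}, "cpus": 1, "price": 1, "load": 0.0},
    ]

    # Phase I: authorization filtering and minimal-requirement filtering (steps 1-3).
    candidates = [r for r in resources
                  if r["vos"] & user_vos and r["cpus"] >= job["min_cpus"]]

    # Phase II: gather dynamic information and select a system (steps 4-5);
    # "best" here simply means cheapest, breaking ties on current load.
    best = min(candidates, key=lambda r: (r["price"], r["load"]))

    # Phase III: job execution (steps 6-11) - reservation, submission, staging,
    # monitoring, and clean-up - would now proceed against `best` via its LRM.
    print("selected:", best["name"])                      # -> siteA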

8 Movement of Jobs Between the Scheduler and a Resource
- Push model: the manager pushes jobs from its queue to a resource. Used in clusters and Grids.
- Pull model: an agent requests jobs from a job pool for processing. Commonly used in P2P-style systems such as Alchemi and SETI@home. A runnable illustration follows below.
- Hybrid model (both push and pull): the broker deploys an agent on each resource, and the agent then pulls jobs from the broker (e.g., the Nimrod-G system). The broker may also pull data from the user host or from separate data hosts holding distributed datasets (e.g., the Gridbus Broker).
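A tiny runnable illustration of the pull model, in generic Python (not Alchemi or SETI@home code): worker agents drain a shared job pool at their own pace.

    import queue
    import threading

    job_pool: "queue.Queue[int]" = queue.Queue()
    for job_id in range(8):                   # fill the pool with 8 dummy jobs
        job_pool.put(job_id)

    def agent(name: str) -> None:
        # Each agent pulls the next available job until the pool is empty.
        while True:
            try:
                job = job_pool.get_nowait()
            except queue.Empty:
                return
            print(f"{name} processed job {job}")

    workers = [threading.Thread(target=agent, args=(f"agent-{i}",)) for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()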

9 Example Systems by Job Dispatch Architecture
Centralised + push: PBS, SGE, Condor, Alchemi (when in dedicated mode)
Centralised + pull: Windmill from CERN (used in the ATLAS physics experiment)
Centralised + hybrid: Condor (as it supports non-dedicated, owner-specified policies)
Decentralised + push: Nimrod-G, AppLeS, Condor-G, Gridbus Broker
Decentralised + pull: Alchemi, SETI@home, United Devices, P2P systems, Aneka
Decentralised + hybrid: Nimrod-G (pushes a Grid Agent, which then pulls jobs)

10 Application Models and their Deployment on Global Grids

11 Grid Applications and Parametric Computing
- Bioinformatics: drug design / protein modelling
- Sensitivity experiments on smog formation
- Natural language engineering
- Ecological modelling: control strategies for cattle tick
- Electronic CAD: field-programmable gate arrays
- Computer graphics: ray tracing
- High-energy physics: searching for rare events
- Finance: investment risk analysis
- VLSI design: SPICE simulations
- Aerospace: wing design
- Network simulation
- Automobile: crash simulation
- Data mining
- Civil engineering: building design
- Astrophysics

12 How to Construct and Deploy Applications on Global Grids?
Three options/solutions:
- Manual scheduling: use pure Globus commands.
- Application-level scheduling: build your own distributed application and scheduler.
- Application-independent scheduling: Grid brokers decouple application construction from scheduling.
Goal: perform a parameter sweep (bag of tasks) utilising distributed resources within "T" hours or earlier, at a cost not exceeding $M.

13 Using Pure Globus Commands
Do it all yourself, manually. Total cost: $???

14 Build a Distributed Application & Application-Level Scheduler
Build the application and its scheduler on a case-by-case basis (e.g., the MPI approach). Total cost: $???

15 Compose and Deploy Using Brokers: the Nimrod-G and Gridbus Approach
Compose applications and submit them to the broker, define QoS requirements, and get an aggregate view of execution. Compose, submit & play!

16 The Nimrod-G Grid Resource Broker and Economy-based Grid Scheduling [Buyya, Abramson, Giddy, 1999-2001]
Deadline and Budget Constrained Algorithms for Scheduling Applications on "Computational" Grids

17 Nimrod-G: A Grid Resource Broker
A resource broker (implemented in Python) for managing, steering, and executing task-farming (parameter-sweep) applications on global Grids. It allows dynamic leasing of resources at runtime based on their quality, cost, and availability, and on users' QoS requirements (deadline, budget, etc.).
Key features:
- A declarative parameter programming language
- A single window to manage and control an experiment
- Persistent and programmable task-farming engine
- Resource discovery
- Resource trading
- (User-level) scheduling and predictions
- Generic dispatcher and Grid agents
- Transportation of data and results
- Steering and data management
- Accounting

18 A Glance at the Nimrod-G Broker
[Architecture diagram: the Nimrod/G client talks to the Nimrod/G engine, which coordinates a schedule advisor, trading manager, grid explorer, and grid dispatcher over Grid middleware (Globus, Legion, Condor, etc.); the grid explorer queries Grid Information Server(s), and the dispatcher submits to Globus-, Legion-, and Condor-enabled nodes, each running a local resource manager (RM) and trade server (TS). See the HPC Asia 2000 paper.]

19 Nimrod/G Grid Broker Architecture
[Layered diagram. Nimrod-G clients: P-Tools (GUI/scripting, parameter modelling), legacy applications, customised apps (Active Sheet), monitoring and steering portals. Nimrod-G broker: farming engine; dispatcher and actuators (Globus-A, Legion-A, Condor-A, P2P-A); schedule advisor (pluggable Algorithm1..AlgorithmN); trading manager; grid explorer; programmable entities for managing resources, jobs, tasks, agents, and channels (agent scheduler, job server, meta-scheduler, database). Middleware (the "IP hourglass"): Globus, Legion, Condor, GMD, P2P, GTS, G-Bank, databases. Fabric: computers (PCs/workstations/clusters), storage, networks, instruments (e.g., radio telescope), local schedulers (Condor/LL/NQS).]

20 A Nimrod/G Monitor
[Screenshot: the Nimrod/G monitor showing cost and deadline controls, Legion hosts, and Globus hosts; the host "bezek" is in both the Globus and Legion domains.]

21 User Requirements: Deadline/Budget

22 Nimrod/G Interactions
[Diagram: on the user node, Grid tools and applications feed the Nimrod-G Grid broker (task-farming engine, grid scheduler, grid dispatcher). The scheduler consults the Grid Info Server and negotiates with the Grid Trade Server ("Do this in 30 min. for $10?"). The dispatcher starts a Nimrod agent on each compute node via the local resource manager; the agent runs user processes through a process server and accesses files via a file server on the user node.]

23 Adaptive Scheduling Steps
1. Discover resources.
2. Establish rates.
3. Compose & schedule.
4. Distribute jobs.
5. Evaluate & reschedule: are requirements being met, given the remaining jobs, deadline, and budget?
6. If not, discover more resources and repeat.
A toy rendering of this cycle follows below.
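A toy, self-contained Python rendering of the adaptive cycle; all site names, rates, and turnaround times are invented, and each loop iteration stands in for one scheduling event in a real broker.

    import random

    random.seed(0)
    jobs_left, deadline, budget = 20, 100.0, 60.0
    rates = {"siteA": 2.0, "siteB": 3.5}     # established rates, G$ per job (invented)
    clock = 0.0

    while jobs_left and clock < deadline and budget > 0:
        # Compose & schedule: pick the cheapest known site.
        site = min(rates, key=rates.get)
        if rates[site] > budget:
            if "siteC" not in rates:
                rates["siteC"] = 1.5         # discover more (cheaper) resources
                continue
            break                            # nothing affordable remains
        # Distribute one job, then evaluate the outcome and loop (reschedule).
        clock += random.uniform(2.0, 6.0)    # invented job turnaround time
        budget -= rates[site]
        jobs_left -= 1

    print(f"jobs left: {jobs_left}, time used: {clock:.0f}, budget left: {budget:.1f}")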

24 Deadline and Budget Constrained Scheduling Algorithms
- Cost Opt: execution time limited by deadline D; execution cost minimized.
- Cost-Time Opt: execution time minimized when possible; execution cost minimized.
- Time Opt: execution time minimized; execution cost limited by budget B.
- Conservative-Time Opt: execution time minimized; execution cost limited by B, but every unprocessed job is guaranteed a minimum budget.

25 Deadline and Budget-based Cost Minimization Scheduling
1. Sort resources by increasing cost.
2. For each resource in that order, assign as many jobs as possible to it without exceeding the deadline.
3. Repeat all steps until all jobs are processed.
A sketch of this strategy follows below.
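A minimal, self-contained Python sketch of this strategy. It is not Nimrod-G source code: the resource records, per-job runtimes, CPU counts, and the assumption that each CPU runs one job at a time are all illustrative.

    from dataclasses import dataclass

    @dataclass
    class Resource:
        name: str
        price: float        # G$ per CPU-second
        job_runtime: float  # estimated seconds per job on this resource
        cpus: int           # CPUs usable in parallel

    def cost_min_schedule(n_jobs: int, deadline: float,
                          resources: list[Resource]) -> dict[str, int]:
        assignment: dict[str, int] = {}
        remaining = n_jobs
        # Step 1: cheapest resources first.
        for r in sorted(resources, key=lambda r: r.price):
            if remaining == 0:
                break
            # Step 2: as many jobs as this resource can finish by the deadline.
            capacity = r.cpus * int(deadline // r.job_runtime)
            take = min(capacity, remaining)
            if take:
                assignment[r.name] = take
                remaining -= take
        # Step 3: the real broker re-runs this at every scheduling event;
        # here, any leftover jobs simply mean the deadline cannot be met.
        return assignment

    # Example with figures echoing the experiment below: 165 five-minute jobs
    # and a 2-hour deadline (the CPU counts are invented).
    testbed = [Resource("monash-cluster", 2, 300, 8),
               Resource("anl-sun", 7, 300, 4)]
    print(cost_min_schedule(165, 7200, testbed))   # -> {'monash-cluster': 165}

As in the cost-optimised experiment below, the cheapest resource absorbs as many jobs as the deadline allows before any more expensive one is touched.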

26 Scheduling Algorithms and Experiments

27 World Wide Grid (WWG)
[World map of the WWG testbed.]
- Australia (Nimrod-G + Gridbus; Globus + Legion; GRACE_TS): Melbourne U. cluster; VPAC Alpha; Solaris workstations.
- Europe (Globus + GRACE_TS): ZIB T3E/Onyx; AEI Onyx; Paderborn HPCLine; Lecce Compaq SC; CNR cluster; Calabria cluster; CERN cluster; CUNI/CZ Onyx; Poznan SGI/SP2; Vrije U. cluster; Cardiff Sun E6500; Portsmouth Linux PC; Manchester O3K.
- Asia (Globus + GRACE_TS): Tokyo I-Tech Ultra workstation; AIST, Japan Solaris cluster; Kasetsart, Thailand cluster; NUS, Singapore O2K.
- North America (Globus/Legion; GRACE_TS): ANL SGI/Sun/SP2; USC-ISI SGI; UVa Linux cluster; UD Linux cluster; UTK Linux cluster; UCSD Linux PCs; BU SGI IRIX.
- South America (Globus + GRACE_TS): Chile cluster.

28 Application Composition Using the Nimrod Parameter Specification Language

    #Parameters Declaration
    parameter X integer range from 1 to 165 step 1;
    parameter Y integer default 5;

    #Task Definition
    task main
        #Copy necessary executables depending on node type
        copy calc.$OS node:calc
        #Execute program with parameter values on remote node
        node:execute ./calc $X $Y
        #Copy results file to user home node with jobname as extension
        copy node:output ./output.$jobname
    endtask

This plan expands into 165 independent jobs:
calc 1 5 -> output.j1
calc 2 5 -> output.j2
calc 3 5 -> output.j3
...
calc 165 5 -> output.j165
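A few lines of Python mimicking (not reproducing) how such a plan expands into a bag of job command lines:

    # Expand the sweep: X ranges over 1..165, Y keeps its default of 5.
    for x in range(1, 166):
        jobname = f"j{x}"
        print(f"./calc {x} 5 > output.{jobname}")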

29 Experiment Setup
- Workload: 165 jobs, each needing 5 minutes of CPU time.
- Deadline: 2 hrs. Budget: 396,000 G$.
- Strategies: (1) minimise cost; (2) minimise time.
Execution results:
- Optimise cost: 115,200 G$ (finished in 2 hrs).
- Optimise time: 237,000 G$ (finished in 1.25 hrs).
In this experiment, the time-optimised run cost roughly double the cost-optimised run; users can now trade off time versus cost.

30 Resources Selected & Price per CPU-second
Resource & location | Grid services & fabric | Cost (G$/CPU-sec) | Jobs executed (Time_Opt) | Jobs executed (Cost_Opt)
Linux cluster, Monash, Melbourne, Australia | Globus, GTS, Condor | 2 | 64 | 153
Linux-Prosecco-CNR, Pisa, Italy | Globus, GTS, Fork | 3 | 7 | 1
Linux-Barbera-CNR, Pisa, Italy | Globus, GTS, Fork | 4 | 6 | 1
Solaris/Ultra2, TITech, Tokyo, Japan | Globus, GTS, Fork | 3 | 9 | 1
SGI-ISI, LA, US | Globus, GTS, Fork | 8 | 37 | 5
Sun-ANL, Chicago, US | Globus, GTS, Fork | 7 | 42 | 4
Total experiment cost (G$): 237,000 (Time_Opt) | 115,200 (Cost_Opt)
Time to complete experiment (min): 70 (Time_Opt) | 119 (Cost_Opt)
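A quick arithmetic check of those totals: each job consumes 5 min = 300 CPU-seconds, so cost = jobs x 300 x price per CPU-second.

    # (price in G$/CPU-sec, jobs executed) pairs taken from the table above.
    time_opt = [(2, 64), (3, 7), (4, 6), (3, 9), (8, 37), (7, 42)]
    cost_opt = [(2, 153), (3, 1), (4, 1), (3, 1), (8, 5), (7, 4)]
    for label, runs in (("time-opt", time_opt), ("cost-opt", cost_opt)):
        total = sum(price * jobs * 300 for price, jobs in runs)
        print(label, total)   # -> time-opt 237000, cost-opt 115200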

31 Deadline and Budget Constraint (DBC) Time Minimization Scheduling
1. For each resource, calculate the next completion time for an assigned job, taking into account previously assigned jobs.
2. Sort resources by next completion time.
3. Assign one job to the first resource for which the cost per job is less than the remaining budget per job.
4. Repeat all steps until all jobs are processed. (This is performed periodically or at each scheduling event.)
A sketch of this strategy follows below.
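A minimal, self-contained Python sketch of this strategy (again not Nimrod-G source; the per-job costs and runtimes in the example are invented):

    from dataclasses import dataclass

    @dataclass
    class Resource:
        name: str
        cost_per_job: float          # G$ to run one job here
        job_runtime: float           # seconds per job
        next_completion: float = 0.0 # when one more job would finish

    def time_min_schedule(n_jobs: int, budget: float,
                          resources: list[Resource]) -> dict[str, int]:
        assignment = {r.name: 0 for r in resources}
        remaining_budget = budget
        for jobs_left in range(n_jobs, 0, -1):
            # Steps 1-2: order resources by when they could finish one more job.
            resources.sort(key=lambda r: r.next_completion + r.job_runtime)
            # Step 3: earliest-finishing resource whose per-job cost fits
            # within the remaining budget per job.
            per_job_budget = remaining_budget / jobs_left
            chosen = next((r for r in resources
                           if r.cost_per_job <= per_job_budget), None)
            if chosen is None:
                break  # nothing affordable; remaining jobs stay unscheduled
            chosen.next_completion += chosen.job_runtime
            remaining_budget -= chosen.cost_per_job
            assignment[chosen.name] += 1
        return assignment

    # Example: a fast-but-expensive site vs. a slow-but-cheap one.
    print(time_min_schedule(10, 5000,
                            [Resource("fast-site", 600, 100),
                             Resource("cheap-site", 300, 400)]))
    # -> {'fast-site': 6, 'cheap-site': 4}: the budget is spent on speed
    #    wherever the per-job budget allows it.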

32 Resource Scheduling for DBC Time Optimization

33 Resource Scheduling for DBC Cost Optimization

34 Nimrod-G Summary
One of the "first" and most successful Grid resource brokers worldwide! The project continues to be active and is used in many e-Science applications. For recent developments, see: http://messagelab.monash.edu.au/Nimrod

35 Gridbus Broker: "Distributed" Data-Intensive Application Scheduling

36 Gridbus Grid Service Broker (GSB)
A Java-based resource broker for Data Grids (whereas Nimrod-G focused on computational Grids). It uses the computational-economy paradigm for optimal selection of computational and data services, depending on their quality, cost, and availability, and on users' QoS requirements (deadline, budget, and time/cost optimisation).
Key features:
- A single window to manage and control an experiment
- Programmable task-farming engine
- Resource discovery and resource trading
- Optimal data-source discovery
- Scheduling and predictions
- Generic dispatcher and Grid agents
- Transportation of data and sharing of results
- Accounting

37 Gridbus Broker
[Architecture diagram: a user console/portal/application interface submits the application workload with time (T), budget ($), and optimisation preferences to the Gridbus farming engine. A schedule advisor, trading manager, record keeper, and grid explorer (backed by GIS/NWS information services and a data catalog) drive the grid dispatcher, which runs jobs on Globus-enabled nodes, data nodes, and Amazon EC2/S3 cloud resources, each with a local resource manager (RM) and trade server (TS). Built over core Grid middleware.]

38 Gridbus Broker: Separating "Applications" from "Different" Remote Service-Access Enablers and Schedulers
[Diagram: the home node/portal runs the Gridbus Broker with an application-development interface, scheduling interfaces (Algorithm1..AlgorithmN), and plugin actuators. Under single-sign-on security, the actuators reach resources through: the Globus job manager (fork()/batch() to PBS, Condor, SGE) with a Gridbus agent; SSH (fork()/batch() to PBS, Condor, SGE, XGrid) with a Gridbus agent; Aneka; and Amazon EC2 via an AMI. Data access goes through GridFTP and SRB against a data catalog and data store.]

39 Gridbus Services for eScience Applications
Application development environment:
- XML-based language for composing task-farming (legacy) applications as parameter-sweep applications
- Task-farming APIs for new applications
- Web APIs (e.g., portlets) for Grid portal development
- Threads-based programming interface
- Workflow interface and a Gridbus-enabled workflow engine
- Grid Superscalar, in cooperation with BSC/UPC
Resource allocation and scheduling:
- Dynamic discovery of optimal computational and data nodes that meet user QoS requirements
Hides low-level Grid middleware interfaces:
- Globus (v2, v4), SRB, Aneka, Unicore, and SSH-based access to local/remote resources managed by XGrid, PBS, Condor, SGE

40 Drug Design Made Easy!
[Demo screenshot.]

41 [Screenshot; no recoverable text.]

42 Case Study: High-Energy Physics and Data Grid
The Belle experiment, KEK B-Factory, Japan: investigating fundamental violation of symmetry in nature (charge-parity violation), which may help explain the imbalance of matter and antimatter in the universe.
Collaboration: 1000 people, 50 institutes, hundreds of TB of data currently.

43 Case Study: Event Simulation and Analysis
B0 -> D*+ D*- Ks simulation and analysis, using the Belle Analysis Software Framework (BASF).
The experiment ran in two parts: generation of simulated data, and analysis of the distributed data. The analysis processed 100 data files (30 MB each) distributed among the five nodes of the Australian Belle Data Grid platform.

44 Australian Belle Data Grid Testbed
[Map of the testbed nodes, including VPAC Melbourne.]

45 Belle Data Grid (GSP CPU Service Price: G$/sec)
[Map: per-node CPU service prices of G$2, G$4, and G$6; one node serves as the data node, and some nodes are not available as compute services (N.A.). Exact per-node prices appear in the observation table below.]

46 Belle Data Grid (Bandwidth Price: G$/MB)
[Map: inter-node link prices ranging from 30 to 38 G$/MB (values shown: 34, 31, 38, 31, 30, 33, 36, 32).]

47 Deploying the Application: Scenario
A Data Grid scenario with 100 jobs, each accessing ~30 MB of remote data.
Deadline: 3 hrs. Budget: 60,000 G$.
Scheduling optimisation scenarios: minimise time; minimise cost.
Results follow (a cost sketch appears below).
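In a data-aware economy broker, a job's price combines the compute cost at a resource with the cost of moving its input data from the data node. A tiny sketch under assumed figures: the helper function and the 90-second CPU time are invented, while the prices echo the maps above.

    def job_cost(cpu_price: float, cpu_secs: float,
                 bw_price: float, data_mb: float) -> float:
        # Total G$ for one job: CPU time at the resource plus the transfer
        # of its input data from the data node.
        return cpu_price * cpu_secs + bw_price * data_mb

    # e.g. a G$2/CPU-sec node reached over a G$31/MB link, with a 30 MB input:
    print(job_cost(cpu_price=2, cpu_secs=90, bw_price=31, data_mb=30))  # 1110.0

A time-minimising broker ranks resources by expected completion (including transfer time), while a cost-minimising one ranks them by this combined price.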

48 Time Minimization in Data Grids
[Graph: number of jobs completed (0-80) vs. time (0-42 min) for each resource: fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, brecca-2.vpac.org.]

49 Results: Cost Minimization in Data Grids
[Graph: number of jobs completed (0-100) vs. time (0-63 min) for each resource: fleagle.ph.unimelb.edu.au, belle.anu.edu.au, belle.physics.usyd.edu.au, brecca-2.vpac.org.]

50 Observation
Organization & node | Node details | Cost (G$/CPU-sec) | Jobs executed (Time min.) | Jobs executed (Cost min.)
CS, UniMelb: belle.cs.mu.oz.au | 4 CPU, 2 GB RAM, 40 GB HD, Linux | N.A. (not used as a compute resource) | - | -
Physics, UniMelb: fleagle.ph.unimelb.edu.au | 1 CPU, 512 MB RAM, 40 GB HD, Linux | 2 | 3 | 94
CS, University of Adelaide: belle.cs.adelaide.edu.au | 4 CPU (only 1 available), 2 GB RAM, 40 GB HD, Linux | N.A. (not used as a compute resource) | - | -
ANU, Canberra: belle.anu.edu.au | 4 CPU, 2 GB RAM, 40 GB HD, Linux | 4 | 2 | 2
Dept of Physics, USyd: belle.physics.usyd.edu.au | 4 CPU (only 1 available), 2 GB RAM, 40 GB HD, Linux | 4 | 72 | 2
VPAC, Melbourne: brecca-2.vpac.org | 180-node cluster (only head node used), Linux | 6 | 23 | 2

51 Summary and Conclusion
- Application scheduling on global Grids is a complex undertaking, as systems need to be adaptive, scalable, competitive, and driven by QoS.
- Nimrod-G is one of the most popular Grid resource brokers for scheduling parameter-sweep applications on global Grids.
- Scheduling experiments on the World Wide Grid demonstrate the Nimrod-G broker's ability to dynamically lease services at runtime based on their quality, cost, and availability, depending on consumers' QoS requirements.
- Easy-to-use tools for creating Grid applications are essential to the success of Grid computing.

52 References
- Rajkumar Buyya, David Abramson, and Jonathan Giddy, "Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid", Proceedings of the 4th International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2000), Beijing, China, IEEE Computer Society Press, USA, 2000.
- David Abramson, Rajkumar Buyya, and Jonathan Giddy, "A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker", Future Generation Computer Systems (FGCS), Volume 18, Issue 8, Pages 1061-1074, Elsevier Science, The Netherlands, October 2002.
- Jennifer Schopf, "Ten Actions When SuperScheduling", Global Grid Forum Document GFD.04, 2003.
- Srikumar Venugopal, Rajkumar Buyya, and Lyle Winton, "A Grid Service Broker for Scheduling e-Science Applications on Global Data Grids", Concurrency and Computation: Practice and Experience, Volume 18, Issue 6, Pages 685-699, Wiley Press, New York, USA, May 2006.

