
Slide 1: From PC Clusters to a Global Computational Grid
David Abramson, Head of School, Computer Science and Software Engineering, Monash University
Thanks to: Jon Giddy (DSTC), Rok Sosic (Active Tools), Andrew Lewis (QPSF), Ian Foster (ANL), Rajkumar Buyya (Monash), Tom Peachy (Monash)

Slide 2: Research Model
– Nimrod ('94-'98)
– Nimrod/G ('98-, DSTC)
– Nimrod/O ('97-'99, ARC)
– ActiveSheets ('00-, DSTC)
– Commercialisation ('97-)
– Applications drive all of the above

Slide 3: Parametrised Modelling - Killer App for the Grid?
– Study the behaviour of some of the output variables against a range of different input scenarios
– Computations are uncoupled (file transfer)
– Allows real-time analysis for many applications
– More realistic simulations
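
The uncoupled structure is easy to see in code. A minimal sketch, assuming a hypothetical ./model binary and two invented parameter names; every combination is an independent job, which is why sweeps of this kind parallelise so naturally:

```python
import itertools
import subprocess

# Minimal sketch of an uncoupled parametric sweep; "./model" and the two
# parameter names are invented. Every combination is an independent job,
# so the whole sweep parallelises trivially across cluster or grid nodes.
pressures = [0.5, 1.0, 1.5]
temperatures = [280, 300, 320]

for p, t in itertools.product(pressures, temperatures):
    # Each run reads only its own inputs and writes its own output file:
    # no coupling between jobs beyond file transfer.
    subprocess.run(["./model", f"--pressure={p}", f"--temperature={t}",
                    "-o", f"out_p{p}_t{t}.dat"], check=True)
```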

Slide 4: Working with Small Clusters
– Nimrod (1994-): DSTC-funded project; designed for department-level clusters; proof of concept
– Clustor (www.activetools.com) (1997-): commercial, re-engineered version of Nimrod
– Features: workstation orientation; access to idle workstations; random allocation policy; password security

Slide 5: Execution Architecture
(Diagram: the root machine substitutes parameters into input files, ships them to the computational nodes, and collects the output files.)
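
As a rough illustration of the substitution step in that diagram, here is how the root machine might expand a template input file once per parameter setting before shipping it to a node; the file names and the $pressure key are assumptions:

```python
from string import Template

# Sketch of the "Substitution" box: the root machine expands a template
# input file once per parameter setting. File names and keys are invented.
template = Template(open("model.inp.template").read())

for run_id, params in enumerate([{"pressure": 0.5}, {"pressure": 1.0}]):
    with open(f"run{run_id}.inp", "w") as out:
        out.write(template.substitute(params))  # replaces $pressure in the text
```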

Slide 6: Clustor Tools

Slide 7: Clustor by Example - Physical Model
(Figure: physical model under load f, showing the time to crack in this position. Courtesy Prof Rhys Jones, Dept Mechanical Engineering, Monash University.)

Slide 8: Dispatch cycle using Clustor...

Slide 9: Sample Applications of Clustor
– Bioinformatics: protein modelling
– Sensitivity experiments on smog formation
– Combinatorial optimization: meta-heuristic parameter estimation
– Ecological modelling: control strategies for cattle tick
– Electronic CAD: field programmable gate arrays
– Computer graphics: ray tracing
– High energy physics: searching for rare events
– Physics: laser-atom collisions
– VLSI design: SPICE simulations
– Fuzzy logic parameter setting
– ATM network design

Slide 10: SMOG Sensitivity Experiments
(Figure: smog outcomes for different ROC and NOx control settings, and the cost of each strategy.)

Slide 11: Physics - Laser Interaction

Slide 12: Electronic CAD

Slide 13: Current Application Drivers
– Public health policy: Dr Dinelli Mather, Monash University & MacFarlane Burnett
– Health standards: Lew Kotler, Australian Radiation Protection and Nuclear Safety Agency
– Airframe simulation: Dr Shane Dunn, AMRL, DSTO
– Network simulation: Dr Mahbub Hassan, Monash

Slide 14: Evolution of the Global Grid
Desktop → Department Clusters → Enterprise-Wide Clusters → Shared Supercomputer → Global Clusters

Slide 15: The Nimrod Vision...
"Can we make it 10% smaller? We need the answer by 5 o'clock."

Slide 16: Towards Grid Computing... The GUSTO Testbed
(Source: www.globus.org, updated)

Slide 17: What does the Grid have to offer?
"Dependable, consistent, pervasive access to [high-end] resources"
– Dependable: can provide performance and functionality guarantees
– Consistent: uniform interfaces to a wide variety of resources
– Pervasive: ability to "plug in" from anywhere
(Source: www.globus.org)

Slide 18: Challenges for the Global Grid
– Security
– Resource allocation & scheduling
– Data locality
– Network management
– System management
– Resource location
– Uniform access

Slide 19: Nimrod on Enterprise-Wide Networks and the Global Grid
– Manual resource location: a static file of machine names
– No resource scheduling: first come, first served
– No cost model: all machines/users cost alike
– Homogeneous access mechanism

Slide 20: Requirements
Users & system managers want to know:
– where a job will run
– when it will run
– how much it will cost
– that access is secure
– that a range of access mechanisms is supported

Slide 21: The Globus Project
– Basic research in grid-related technologies: resource management, QoS, networking, storage, security, adaptation, policy, etc.
– Development of the Globus toolkit: core services for grid-enabled tools & applications
– Construction of a large grid testbed, GUSTO: the largest grid testbed in terms of sites & applications
– Application experiments: tele-immersion, distributed computing, etc.
(Source: www.globus.org)

Slide 22: Layered Globus Architecture
– Applications
– High-level services and tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status
– Core services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS
– Local services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX
(Source: www.globus.org)

Slide 23: Some issues for Nimrod/G

Slide 24: Resource Location
– Need to locate suitable machines for an experiment: speed, number of processors, cost, availability, user account
– Available resources will vary across an experiment
– Supported through a directory server (the Globus MDS); a toy filter is sketched below
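
A toy version of this filtering, with invented machine records and thresholds standing in for what the Globus MDS would actually return:

```python
# Toy filter over a directory snapshot; in Nimrod/G this information comes
# from the Globus MDS, but the fields and thresholds here are invented.
machines = [
    {"name": "mon1", "cpus": 16, "cost": 2, "up": True,  "has_account": True},
    {"name": "anl3", "cpus": 64, "cost": 5, "up": True,  "has_account": True},
    {"name": "qut2", "cpus": 8,  "cost": 1, "up": False, "has_account": False},
]

def suitable(m, min_cpus=8, max_cost=4):
    # A machine qualifies if it is up, we hold an account on it, it is big
    # enough, and it is affordable; availability changes during an
    # experiment, so this predicate is re-evaluated as the run proceeds.
    return (m["up"] and m["has_account"]
            and m["cpus"] >= min_cpus and m["cost"] <= max_cost)

candidates = [m["name"] for m in machines if suitable(m)]
print(candidates)  # ['mon1'] under these thresholds
```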

Slide 25: Resource Scheduling
– User view: solve the problem in minimum time
– System view: spread the load across machines
– A soft real-time problem, driven by deadlines: complete by the deadline; resource provision is unreliable; machine load may change at any time; multiple machine queues

Slide 26: Resource Scheduling...
– Need to establish the rate at which each machine can consume jobs
– Use the deadline as the metric for machine performance
– Move jobs to machines that are performing well
– Remove jobs from machines that are falling behind (see the rate sketch below)
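
The rate test can be stated in a few lines. A hedged sketch, with the argument names and the pessimistic zero-rate rule as assumptions:

```python
import time

# Sketch of the rate projection behind "use deadline as metric": measure how
# fast a machine has consumed its jobs so far, then ask whether the jobs
# still queued on it can finish before the experiment deadline.
def on_schedule(jobs_done, started_at, jobs_queued, deadline):
    elapsed = time.time() - started_at
    if jobs_done == 0:
        return False                   # no evidence yet: assume it is behind
    rate = jobs_done / elapsed         # observed jobs per second
    finish = time.time() + jobs_queued / rate
    return finish <= deadline          # behind-schedule machines lose jobs
```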

Slide 27: Computational Economy
– Resource selection based on real money and market mechanisms
– A large number of sellers and buyers (resources may be dedicated or shared)
– Negotiation: call for tenders/bids and select the offers that meet the requirements (a toy round follows below)
– Trading and advance resource reservation
– Schedule computations on those resources that meet all requirements
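
A toy tender-selection round in the spirit of these bullets; the bid fields and numbers are illustrative, not Nimrod/G's actual negotiation protocol:

```python
# Toy negotiation round: keep only offers that meet the deadline, cheapest
# first, then reserve the winner in advance. All values are illustrative.
bids = [
    {"resource": "cluster-a", "price": 10, "finish_by": 1600},
    {"resource": "cluster-b", "price": 25, "finish_by": 1200},
    {"resource": "cluster-c", "price": 15, "finish_by": 1400},
]
deadline = 1500

acceptable = sorted((b for b in bids if b["finish_by"] <= deadline),
                    key=lambda b: b["price"])
winner = acceptable[0] if acceptable else None  # reserve, then schedule on it
print(winner)  # cluster-c: the cheapest offer that still meets the deadline
```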

Slide 28: Cost Model
– Without cost, ANY shared system becomes unmanageable
– Charge users more for remote facilities than for their own
– Choose cheaper resources before more expensive ones
– Cost units may be dollars or shares in a global facility, stored in a bank

Slide 29: Cost Model...
– Non-uniform costing
– Encourages use of local resources first
– A real accounting system can control machine usage
(Table: example costs; User 1 pays 1 on Machine 1 and 3 on Machine 5, User 5 pays 2 on Machine 1 and 1 on Machine 5.)
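
Encoding that matrix makes the incentive visible: cost-aware selection picks each user's own machine first. The numbers follow the recovered example above; everything else is illustrative:

```python
# Illustrative encoding of the slide's matrix: each user's own machine is
# cheapest, so cost-aware selection naturally prefers local resources.
cost = {
    ("user1", "machine1"): 1, ("user1", "machine5"): 3,
    ("user5", "machine1"): 2, ("user5", "machine5"): 1,
}

def cheapest(user, machines):
    # Pick the lowest-cost machine this user may charge against.
    return min(machines, key=lambda m: cost[(user, m)])

assert cheapest("user1", ["machine1", "machine5"]) == "machine1"
assert cheapest("user5", ["machine1", "machine5"]) == "machine5"
```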

Slide 30: Security
– Uses the Globus security layer
– Generic Security Service API over an implementation of SSL, the Secure Sockets Layer
– RSA encryption, employing both public and private keys
– X.509 certificates consisting of: the duration of the permissions, the RSA public key, and the signature of the Certificate Authority (CA)

Slide 31: Uniform Access
– The Resource Allocation Module (GRAM) provides an interface to a range of schemes: fork; queue (Easy, LoadLeveler, Condor, LSF) - see the dispatch sketch below
– Multiple pathways to the same machine (if supported)
– Integrated with the security scheme
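
The uniformity idea can be pictured as one submit() entry point dispatching to different local mechanisms. A sketch, using the stock condor_submit and bsub command lines as stand-ins rather than the real GRAM interface:

```python
import subprocess

# Sketch of uniform access: one submit() call, several local mechanisms
# behind it. The command lines are stand-ins (condor_submit and bsub are
# the usual Condor/LSF front ends), not the actual GRAM interface.
def submit(job_script: str, scheduler: str):
    commands = {
        "fork":   ["sh", job_script],             # start a plain process
        "condor": ["condor_submit", job_script],  # hand to the Condor queue
        "lsf":    ["bsub", job_script],           # hand to the LSF queue
    }
    return subprocess.run(commands[scheduler], check=True)
```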

Slide 32: Nimrod/G Architecture
(Diagram: the Nimrod/G client drives a parametric engine backed by persistent info; a schedule advisor and resource discovery consult the grid directory services, and a dispatcher runs jobs on the GUSTO testbed through the grid middleware services.)

Slide 33: Nimrod/G Interactions
(Diagram: on the root node the parametric engine, scheduler, and dispatcher cooperate. The scheduler locates resources through the MDS server; the dispatcher performs local resource allocation through the GRAM server on the gatekeeper node; a job wrapper on each computational node runs the user process, with file access through the GASS server. Additional services used implicitly: GSI for authentication & authorization, Nexus for communication.)

Slide 34: A Nimrod/G Client
(Screenshot: the client displays the cost, the deadline, and the available machines.)

Slide 35: Nimrod/G Scheduling Algorithm
1. Find a set of machines (MDS search)
2. Distribute jobs from the root to the machines
3. Establish the job consumption rate for each machine
4. For each machine, ask: can we meet the deadline?
   – If not, return some jobs to the root
   – If yes, distribute more jobs to the resource
5. If the deadline cannot be met with the current resources, find additional resources
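
A self-contained toy rendering of these steps in Python. The three-machine pool, the hidden random rates, and the one-second ticks are modelling assumptions, not the real scheduler:

```python
import random

# Toy simulation of the slide's loop: machines consume jobs at hidden rates;
# each tick the scheduler observes progress, shifts backlog from the slowest
# machine to the fastest, and grows the pool if the projected finish time
# slips past the deadline. All constants are illustrative assumptions.
def schedule(total_jobs=100, deadline=60.0):
    rates = [random.uniform(0.5, 3.0) for _ in range(3)]  # jobs/sec, hidden
    queues = [total_jobs // 3] * 3
    queues[0] += total_jobs - sum(queues)                 # remainder jobs
    t = 0.0
    while sum(queues) > 0 and t < deadline:
        t += 1.0
        for i in range(len(rates)):                       # machines consume
            queues[i] -= min(queues[i], round(rates[i]))
        slow = min(range(len(rates)), key=lambda i: rates[i])
        fast = max(range(len(rates)), key=lambda i: rates[i])
        if queues[slow] > queues[fast]:                   # re-balance backlog
            moved = (queues[slow] - queues[fast]) // 2
            queues[slow] -= moved
            queues[fast] += moved
        if sum(queues) / sum(rates) > deadline - t:       # projected overrun:
            rates.append(random.uniform(0.5, 3.0))        # locate one more
            queues.append(0)                              # machine
    return t, sum(queues)                                 # time used, jobs left

print(schedule())
```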

Slide 36: Nimrod/G Scheduling Algorithm...
(Flow chart: Locate Machines → Distribute Jobs → Establish Rates → Meet Deadlines? → Re-distribute Jobs / Locate More Machines, then around the loop again.)

Slide 44: Some experimental results

Slide 48: Optimal Design Using Computation - Nimrod/O
– Clustor allows exploration of design scenarios: search by enumeration
– Search for local/global minima based on an objective function: how do I minimise the cost of this design? how do I maximise the life of this object?
– Objective function evaluated by a computational model: computationally expensive
– Driven by applications

Slide 49: Application Drivers
– Complex industrial design problems: air quality, antenna design, business simulation, mechanical optimisation

Slide 50: Cost Function Minimization
– Continuous functions: gradient descent
– Quasi-Newton BFGS algorithm: find the gradient using a finite difference approximation; line search using a bound-constrained, parallel method (sketched below)
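
For flavour, the same ingredients on a cheap analytic stand-in for the expensive model, using SciPy's BFGS, which estimates the gradient by finite differences when no jac is supplied; in Nimrod/O each difference point is a whole simulation, evaluated in parallel on the cluster:

```python
import numpy as np
from scipy.optimize import minimize

# Cheap analytic stand-in for the expensive computational model.
def objective(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

# method="BFGS" with no jac argument: SciPy builds the gradient from finite
# differences, the same combination the slide describes (serially here,
# whereas Nimrod/O farms the difference points out in parallel).
result = minimize(objective, x0=np.array([5.0, 5.0]), method="BFGS")
print(result.x)  # converges near [1.0, -2.0]
```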

Slide 51: Implementation
– Master-slave parallelisation
– Gradient determination & line searching: tasks queued via IBM LoadLeveler, adapting to the number of CPUs allocated by the resource manager
– Interfaced to existing dispatchers: Clustor and Nimrod/G

Slide 52: Architecture
(Diagram: the BFGS or meta-heuristic search issues function evaluations; these become jobs described by a Clustor plan file, which the Clustor dispatcher runs on a supercomputer or cluster pool.)

Slide 53: Ongoing Research
– Increased parallelism: multi-start for better coverage; high-dimensional problems; addition of other search algorithms, e.g. the simplex algorithm
– Mixed integer problems: BFGS modified to support mixed integers; mixed search/enumeration; meta-heuristic based search, e.g. Adaptive Simulated Annealing (ASA)

Slide 54: Further Information
– Nimrod: www.csse.monash.edu.au/~davida/nimrod.html
– DSTC: www.dstc.edu.au
– Globus: www.globus.org
– Active Tools: www.activetools.com
– Our cluster: hathor.csse.monash.edu.au

