All-in-one graphical tool for the management of DIET a GridRPC middleware Eddy Caron, Frédéric Desprez, David Loureiro, Benjamin Depardon, Aurélien Cedeyn.


1 All-in-one graphical tool for the management of DIET, a GridRPC middleware
Eddy Caron, Frédéric Desprez, David Loureiro, Benjamin Depardon, Aurélien Cedeyn
LIP ENS Lyon, INRIA Rhône-Alpes, GRAAL project
David.Loureiro@ens-lyon.fr

2 DIET Dashboard -- Motivations
- DIET hierarchies are designed to be deployed on grids or clusters of nodes.
- Users need several complex tools to manage resources and client/server applications.
- A distributed middleware deployment is not easily manageable if you:
  - deal with a large number of nodes
  - manage your resource reservations by hand
  - write each configuration file of your middleware by hand
  - launch each component of your middleware by hand

3 DIET Experiment workflow
- Resources Reservation
- DIET Platform Design
- Resources Mapping
- DIET Platform Generation
- DIET Platform Deployment
- Experiment Workflow Design
- Workflow Execution
- Results Retrieval

4 DIET Dashboard
- Extensible set of tools for the DIET community
- Based on seven tools:
  - DIET designer
  - DIET Mapping tool
  - DIET Deployment tool
  - XML GoDIET Generator
  - Workflow designer
  - Workflow log service
  - DIET resource tool, aka GRUDU
- The tools fall into three groups: DIET Tools, Workflow Tools, Grid Tools.
- The DIET DashBoard is written in Java.
- It provides DIET end users with user-friendly interfaces to design, deploy and monitor the execution of client/server applications.
- It also provides grid users with tools for allocating and monitoring resources on Grid'5000.

5 DIET Resources Tool
- Manages the grid resources used by the application.
- Currently only used for the Grid'5000 platform.
- Provides several operations to facilitate access to this platform.
- Main goals:
  - Displaying the status of the platform (grid/site/job level)
  - Resource allocation through OAR (v1 and v2 are supported)
  - Resource monitoring through Ganglia (site/job nodes)
  - Deployment management with a GUI for KaDeploy (multiple sites at a time)
  - A terminal emulator (connection to the access frontend, a site frontend, or a job's main node)
  - A file transfer manager (local/remote and synchronization features)

6 Grid'5000 Reservation Utility for Deployment Usage (GRUDU)
Web: http://grudu.gforge.inria.fr

7 GRUDU – Resources Allocation
We are able to reserve resources (OAR1 & OAR2):
- Time parameters: date and reservation walltime
- Queue
- OARGrid sub-behaviour / script to launch

8 GRUDU – Monitoring
- We are able to monitor the status of the grid, a site, or a job.
- We are able to get instantaneous and historical data with Ganglia.

9 GRUDU - KaDeploy/JFTP
- GUI for KaDeploy job deployment
- File transfer interface (local/remote, rsync on Grid'5000)

10 DIET Designer/Mapping
- Allows the user to graphically design a DIET hierarchy.
- Only the application characteristics are defined (agent type: Master or Local, and SeD parameters).
- Allows the user to map DIET components onto the allocated Grid'5000 resources.
- The mapping is done interactively by selecting the site, then the DIET agents or SeDs.

11 XML GoDIET Generator
- Helps the end user create hierarchies from existing frameworks, based on the reserved resources.
- The user is asked to choose an experiment (a hierarchy framework) from those available (personal hierarchies can be added).
- For each hierarchy, the user specifies the required elements involved (MA/LA/SeD).
- Finally, a platform is generated and the user can deploy it through the DIET deployment tool.

12 DIET Deployment Tool
- This tool is a graphical interface to GoDIET.
- It provides the basic GoDIET operations: open, launch, stop.
- It also provides a monitoring mechanism to check whether DIET application elements are still alive (three states are available: unknown, dead and running).

13 Workflow Designer/Log Service
- Compose services into a complete application workflow in a drag-and-drop fashion.
- Monitor workflow execution by displaying the DAG nodes of each workflow and their states.

14 Monitoring DIET experiment
- Online/offline experiment monitoring
- DIET Data Management monitoring
- DIET Services use/selection/etc. monitoring
- DIET Platform performance evaluation

15 Prototype Cosmo – DIET : Gantt

16 Prototype Cosmo – DIET : impact DIET

17 Large scale experiment: the DIET/Ramses case
- Validation of the DIET architecture at large scale over different administrative domains, in the framework of the LEGO project (ANR CICG05-11)
- Platform: Grid’5000
- Goal: launch as many Ramses executions as possible (Ramses is a grid-based hydro solver application developed at DAPNIA/CEA for cosmological simulations)
- Stress DIET over a large number of machines and over a long period of time
- But also stress Grid'5000...
- KaDeploy image with DIET and all the mandatory tools
- 12 clusters on 7 sites: 979 machines for 48 hours
- 1 MA, 12 LA, 29 SeDs
- 1824 processors dedicated to Ramses

18 Large scale experiment on Grid’5000
- Requests submitted via DIET
- 1824 processors dedicated to Ramses
- 59 simulations (33 complete, 26 partial)
- Equivalent to 368 days on 1 processor
- GalaxyMaker & MoMaF: web interface for submission of parameter sweep jobs
- Workload modeling for scheduling predictions
- Workflow / data management

19 Ongoing Work
- Deploy DIET across many sites
- Improve data management
- Write a plug-in scheduler

20 Workflow

21 GalaxyMaker execution time model

22 GalaxyMaker output size model

23 MoMaF execution time model

24 Large scale experiment: the DIET/Ramses case
- Use of the DIET DashBoard:
  - 20 seconds for the reservation of 979 nodes
  - 25 minutes for the deployment with KaDeploy
  - 23 seconds for the deployment of the DIET platform
- Main difficulties:
  - Disk space on NFS storage
  - OmniORB not available on Itanium2
  - Sites not available for deployment

25 Conclusion
- DIET is a grid middleware designed for scheduling application tasks with a hierarchical architecture.
- The DIET DashBoard provides DIET users with:
  - A full-featured framework for experiments
  - An easy way to manage Grid'5000
- The DIET Resources Tool provides the Grid'5000 community with a powerful tool dedicated to interaction with the grid:
  - Monitoring
  - Reservation
  - Deployment
  - etc.
- The DIET Resources Tool also exists in a standalone version known as GRUDU, dedicated to the Grid'5000 community.

26 Future Work
- Web-based version of the DIET DashBoard, used on the Decrypthon project: WebBoard
- GUI for client/server application design
- DIET Data Management interface
- Support for other batch schedulers (such as LoadLeveler or SGE)
- Plugin-based architecture

27 Introduction - Context
- Climate evolution
- Global warming effect
- Two problems:
  - Long-term evolution (needs a supercomputer)
  - Climate model parametrization (needs numerous simulations)

28 Introduction - Motivations
- The project aims to study the parametrization sensitivity of a climate model.
- A better understanding of parametrization will provide better simulations.
- Once good parameters have been found, it will be possible to simulate the climate further into the future.
- This requires numerous independent simulations.
- The focus of this talk is minimizing the execution time of these independent simulations.

29 Outline
- Introduction
- Framework
  - Ocean-Atmosphere Application
  - Grid’5000
  - Diet
- Scheduling Strategies
- Experimental Results
- Conclusion & Future Work

30 Ocean-Atmosphere scenarios
- Climate simulation over the 21st century
- An experiment is composed of several scenarios.
- A scenario is a chain of 1800 monthly simulations (150 years).
- The input of the (n+1)th monthly simulation is the output of the nth one.
- The scenarios are independent.
[Figure: a scenario, shown as the chain Month 1, Month 2, ..., Month 1799, Month 1800]
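
The scenario structure on this slide can be sketched in a few lines of Python. This is only an illustration of the dependency pattern: `simulate_month`, `run_scenario` and `run_experiment` are hypothetical names, and the real monthly simulation is of course far more than a function call.

```python
MONTHS_PER_SCENARIO = 1800            # from the slide: 1800 monthly simulations
YEARS_PER_SCENARIO = MONTHS_PER_SCENARIO // 12   # 150 years


def run_scenario(initial_state, simulate_month, months=MONTHS_PER_SCENARIO):
    """Run one scenario: month n+1 takes the output of month n as its input."""
    state = initial_state
    for month in range(1, months + 1):
        state = simulate_month(month, state)
    return state


def run_experiment(initial_states, simulate_month, months=MONTHS_PER_SCENARIO):
    """Run several scenarios. They are independent, so this loop could be
    distributed over different groups of processors or clusters."""
    return [run_scenario(s, simulate_month, months) for s in initial_states]
```

Inside a scenario the months are strictly sequential, so parallelism can only come from running several scenarios at once or from parallelizing a single monthly task.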

31 Ocean-Atmosphere running
[Figure: a monthly simulation, made of a main-task (parallel task, 4 to 11 processors) and a post-processing task]

32 Software environment
E. Caron - Ocean-Atmosphere scheduling within DIET - APDCT-08
- GridRPC compliant for interoperability
- Client/Agent/Server paradigm
- Middleware with a hierarchical architecture designed to provide scalability
- Resource finding for the client
- Plug-in scheduler with hierarchical behavior
- Data management with replication
- Easy to deploy
- Easy to use

33 Platform environment: Grid’5000
- Congregation of resources
- Composed of numerous clusters distributed over 9 sites all over France
- All nodes of a cluster have access to an NFS share to store data
- Possibility to deploy one's own system image on nodes
- Well suited to executing our independent scenarios

34 Outline
- Introduction
- Framework
- Scheduling Strategies
  - Cluster Level Scheduling
  - Grid Level Scheduling
- Experimental Results
- Conclusion & Future Work

35 Scheduling Strategies
- We use Grid’5000 as an experiment platform.
- The platform is composed of several heterogeneous clusters.
- Each cluster is homogeneous internally.
- We use Diet to perform the scheduling.
[Figure: a client and a Diet hierarchy spanning Cluster 1, Cluster 2 and Cluster 3. Steps shown: send request, performance prediction (makespan), distribution of scenarios, computation, experiment end]

36 Cluster Level Scheduling (1/5)
- We consider a homogeneous platform composed of R resources (processors).
- We have NS scenarios.
- Execution times take into account the time to get the data, perform the computation and store the results.
- T[i] is the time needed to execute a main-task on i processors.
- All post-processing tasks are left to the end of the execution because main-tasks have a good speedup.
- If there are too many resources, the post-processing tasks will be executed at the same time.

37 Cluster Level Scheduling (2/5)
Cluster name   Processor type          Node number   Core number   Memory size
Capricorne     AMD Opt. 246 2.0 GHz    56            112           2 GB
Sagittaire     AMD Opt. 250 2.4 GHz    70            140           2 GB
Chicon         AMD Opt. 285 2.6 GHz    26            104           4 GB
Chti           AMD Opt. 252 2.6 GHz    20            40            4 GB
Grelon         Intel Xeon 1.6 GHz      120           480           2 GB
- Clusters are heterogeneous.
[Figure: T[i] on 5 clusters of Grid’5000]

38 Cluster Level Scheduling (3/5)
- We need to find the grouping of processors leading to the best makespan.
- Find n_i (the number of groups with i resources) such that:
  - the portion of code executed at each time step is maximized
  - we have no more than NS groups and use at most R resources
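
One way to read this grouping problem is as a small knapsack: each group of size i contributes 1/T[i] main-tasks per unit of time, and we want to maximize the sum of these contributions with at most NS groups and at most R processors. The sketch below solves that reading by dynamic programming. The objective sum(1/T[i]) is an assumption about what "portion of code executed at each time step" means, the function name `best_grouping` is hypothetical, and the timing values in the usage note are invented for illustration.

```python
def best_grouping(T, R, NS):
    """Return (throughput, sizes) for the best grouping of processors.

    T maps a group size i to T[i], the main-task time on i processors.
    At most NS groups are formed and at most R processors are used.
    The throughput of a grouping is the sum of 1 / T[i] over its groups.
    """
    # best[g][r] = (throughput, sizes) with at most g groups and at most r processors
    best = [[(0.0, ())] * (R + 1) for _ in range(NS + 1)]
    for g in range(1, NS + 1):
        for r in range(R + 1):
            candidate = best[g - 1][r]          # option: do not add another group
            for size, time in T.items():
                if size <= r:
                    prev_tp, prev_sizes = best[g - 1][r - size]
                    tp = prev_tp + 1.0 / time
                    if tp > candidate[0]:
                        candidate = (tp, prev_sizes + (size,))
            best[g][r] = candidate
    throughput, sizes = best[NS][R]
    return throughput, sorted(sizes)
```

With hypothetical timings `T = {4: 10.0, 7: 6.0, 11: 5.0}`, `R = 15` and `NS = 3`, this returns the groups `[4, 4, 7]`, the same shape of grouping as the example on the next slide.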

39 Cluster Level Scheduling (4/5)
[Figure: cluster c, with resources (processors) against time, showing Scenario 1, Scenario 2, Scenario 3 and Scenario 4]
- Example of grouping: 3 groups (4, 4 and 7 processors)
- Fairness among scenarios: when a group becomes idle, the next task of the least advanced scenario is scheduled on it.
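
The fairness rule can be illustrated with a small event-driven simulation. This is a sketch under simplifying assumptions: only main-tasks are modeled, each group has a fixed time per monthly task, and a scenario never runs on two groups at once because its months form a chain. The function name `simulate_fair_schedule` is hypothetical.

```python
import heapq


def simulate_fair_schedule(group_times, n_scenarios, n_months):
    """Return the finish time of the last main-task.

    group_times[g] is the time group g needs for one monthly main-task.
    Whenever a group is idle, it receives the least advanced scenario
    that is not currently running on another group.
    """
    done = [0] * n_scenarios            # months completed per scenario
    running = [False] * n_scenarios     # True while a month of the scenario runs
    idle = list(range(len(group_times)))
    events = []                         # heap of (finish_time, group, scenario)
    now = 0.0

    def dispatch(current_time):
        idle.sort(key=lambda g: group_times[g])   # hand work to the fastest idle group first
        while idle:
            ready = [s for s in range(n_scenarios)
                     if not running[s] and done[s] < n_months]
            if not ready:
                break
            s = min(ready, key=lambda k: done[k])  # least advanced scenario
            g = idle.pop(0)
            running[s] = True
            heapq.heappush(events, (current_time + group_times[g], g, s))

    dispatch(now)
    while events:
        now, g, s = heapq.heappop(events)
        running[s] = False
        done[s] += 1
        idle.append(g)
        dispatch(now)
    return now
```

For instance, `simulate_fair_schedule([2.0], 2, 2)` alternates between the two scenarios on the single group and finishes at time 8.0.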

40 Cluster Level Scheduling (5/5)
- Every resource is taken into account.
- The makespan strictly decreases as resources are added.
- The rate of decrease of the makespan diminishes.

41 Grid Level Scheduling (1/2)
- Aim: reduce the makespan by distributing the NS scenarios among nbClusters clusters.
- When performance prediction is performed, the makespans for 1 to NS scenarios on cluster C are sent to the client (performance[C]).
- Algorithm complexity: O(NS × nbClusters)
- One experiment: NS = 10, and nbClusters is small on Grid’5000 (≈20)
Algorithm:
    makespan = 0
    initialize the number of scenarios on each cluster to 0
    while there are scenarios to schedule do
        find the cluster C where the makespan increases the least
        increment NSC, the number of scenarios on C
        update makespan with performance[C][NSC]
    endwhile
    send scenarios to SeDs
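
The greedy distribution above translates almost directly into Python. The sketch below is one possible reading, not the DIET implementation: `performance[c][k]` is assumed to hold the predicted makespan of k scenarios on cluster c, with index 0 padded with 0, and ties between clusters that do not raise the global makespan are broken in favour of the cluster that would finish earliest, a choice the slide does not specify.

```python
def distribute_scenarios(performance, ns):
    """Greedily assign ns scenarios to clusters.

    performance maps a cluster name to a list where entry k is the
    predicted makespan of k scenarios on that cluster (entry 0 is 0).
    Returns (number of scenarios per cluster, resulting makespan).
    """
    assigned = {c: 0 for c in performance}
    makespan = 0.0
    for _ in range(ns):
        def cost(c):
            after = performance[c][assigned[c] + 1]
            # First criterion: global makespan after adding one scenario to c.
            # Second criterion (assumed tie-break): makespan of c itself.
            return (max(makespan, after), after)

        best = min(performance, key=cost)
        assigned[best] += 1
        makespan = max(makespan, performance[best][assigned[best]])
    return assigned, makespan
```

Each of the NS iterations scans every cluster once, which matches the O(NS × nbClusters) complexity stated on the slide. With hypothetical predictions `{"fast": [0, 10, 20, 30], "slow": [0, 15, 30, 45]}` and three scenarios, two go to `fast` and one to `slow`, for a makespan of 20.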

42 Grid Level Scheduling (2/2)
- Comparison with Round Robin on 5 clusters
- Maximum speedup (25%): equal to the speedup between executing one main-task on the slowest and on the fastest cluster
- With a higher load, the algorithm behaves better with few resources
- Convergence of the gains
- Gain of 25% ≈ 230h on a ≈ 822h long experiment

43 Outline
- Introduction
- Framework
- Scheduling Strategies
- Experimental Results
- Conclusion & Future Work

44 Experimental Results (1/2)
- Because of technical limitations, no more than one scenario can be executed on a single node.
- All nodes on Grid’5000 have two or four cores.
- New constraint: the size of a group has to be divisible by the number of cores per node of the cluster.
- Possibility to make groups of 12 processors to reduce the loss.
- Loss due to this technical difficulty:
  - Few resources: loss between 1% and 13%
  - More resources: loss between 1% and 5%
  - Many resources: no more loss
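
To make the divisibility constraint concrete, the helper below lists the group sizes that remain usable for a given number of cores per node. It assumes the 4 to 11 processor range quoted earlier for a main-task; `allowed_group_sizes` is a hypothetical name.

```python
def allowed_group_sizes(cores_per_node, min_size=4, max_size=11):
    """Group sizes in [min_size, max_size] that fill whole nodes."""
    return [size for size in range(min_size, max_size + 1)
            if size % cores_per_node == 0]
```

On two-core nodes this leaves sizes 4, 6, 8 and 10; on four-core nodes only 4 and 8. Raising `max_size` to 12 adds one more option in both cases, which is why groups of 12 processors help reduce the loss.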

45 Experimental Results (2/2)
- Accuracy of simulations on 7 experiments:
  - Bad with all post-processing tasks at the end (20.8% difference)
  - Good if we consider only main-tasks (6.3% difference)
- Keeping a resource to execute post-processing tasks during the experiment removes the simulation inaccuracy.
- A positive difference means the real execution was slower than expected.

46 Outline
- Introduction
- Framework
- Scheduling Strategies
- Experimental Results
- Conclusion & Future Work

47 Conclusion
- Improve the performance of a climate prediction application
- Modeling of the application
- Proof of usage of Grid’5000 and Diet
- Scheduling on a real application
- Scheduling done at two levels:
  - Groups of processors at cluster level
  - Distribution of scenarios at grid level
- The real implementation suffered from technical limitations.
- Simulations are quite precise, but we need to keep one resource for post-processing tasks.

48 Future Work
- Extension of this work to generic independent chains of DAGs composed of moldable tasks
- Resource reservation is done manually, so we want to use tools such as SimGrid/SimBatch to determine how many resources to reserve, and then use the SeDBatch to make the reservation automatically

