Published by Juniper Griffin. Modified over 9 years ago.
1
Abstractions: Programming and deploying apps. on Grids
Franck Cappello, INRIA* (*this is my own opinion!)
CCGRID’08 - Panel
2
Grid’5000
[Diagram labels: Application; Programming Environments; Application Runtime; Grid or P2P Middleware; Operating System; Networking; Experimental conditions injector; Measurement tools]
A fully reconfigurable and controllable environment (resource “dedication”)
>400 experiments in total, >100 experiments on apps.
3
What are the main Distributed Apps. in your project?
Application domains:
– Life science (mammogram comparison, protein sequencing, gene prediction, virtual screening, conformation sampling, etc.)
– Physics (seismic imaging, parallel solvers, hydrogeology, self-propelled solids, seismic tomography, etc.)
– Applied Mathematics (sparse matrix computation, combinatorial optimization, parallel model checkers, PDE problem solvers, etc.)
– Chemistry (molecular simulation, estimation of thickness on thin films)
– Industrial processes, Financial computing
Main usage of Grid’5000 for these applications:
– Evaluate the performance of applications ported to the Grid
– Test alternatives
– Design new algorithms and new methods
4
What programming difficulties and abstraction opportunities?
– Organizing the computation
– Tolerating performance variations and HW & SW failures
– Scheduling computation & communications
– Implementing computing codes
– Synchronizing task executions
– Implementing global operations
– Selecting the communication protocols
– Dealing with resources (data, computers, etc.)
– Dealing with administration domains
5
Current infrastructures: how they mask complexity
Solution 1) ask the “user” to conform to a certain abstraction of the execution platform
--> developing applications following standard interfaces (HPC centers, most deployed Grids)
Solution 2) ask the execution platform to conform to “users” abstractions
--> users keep their apps. and environment unchanged and need a reconfiguration of the platform (Grid’5000, Amazon Elastic Compute Cloud)
Solution 3) ask the user to choose from a variety of predefined execution environments
6
What are the common patterns – programming?
Programming models tested on Grid’5000:
– Rule reduction (e.g. Chemical Computing) --> soon
– Graph of components (Data & Workflow) --> OpenWP
– Specific control graph controlled by data (e.g. Divide & Conquer, B&B) --> Proactive, PARADISEO
– Components (code coupling) --> Grid Corba component model
– Components with Control Graphs (Workflow) --> DagMan & Condor
– Global operations (MAP-Reduce) --> not aware of
– SPMD (MPI for Grids) --> QcGOpenMPI, MPICH-G2, etc.
– Client-server (Grid-RPC) --> DIET, XtremWeb, etc.
– Assembly languages (set of scripts) …
7
Example 1: Combinatorial Optimization Problems
Flow-shop (one of the hardest challenge problems in combinatorial optimization): schedule a set of jobs on a set of machines, minimizing the makespan.
Exhaustive enumeration of all combinations would take several years. The challenge is thus to reduce the number of explored solutions.
New Grid exact method based on Branch-and-Bound, combining new approaches in combinatorial algorithms, grid computing, load balancing and fault tolerance.
Problem: 50 jobs on 20 machines, optimally solved for the first time, with 1245 CPUs (peak).
Involved Grid’5000 sites (6): Bordeaux, Lille, Orsay, Rennes, Sophia-Antipolis and Toulouse.
The optimal solution required a wall-clock time of 25 days.
Many success stories in combinatorial optimization. One of the most promising, in 2008: Grid’5000 was used to design and improve the algorithm (MOGO) used in the first computer victory against a professional Go player (5 Dan) on a 9x9 board in the last Paris tournament! (it’s close to the Dan!)
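To make the flow-shop slide concrete, here is a minimal sequential sketch of Branch-and-Bound for the permutation flow shop. It is not the Grid method from the talk: the function names, the depth-first exploration and the simple last-machine lower bound are all assumptions chosen for illustration. It only shows how a bound lets the search skip parts of the enumeration.

```python
from itertools import permutations


def makespan(order, proc):
    """Completion time of the last job on the last machine.

    proc[j][m] is the processing time of job j on machine m; every job
    visits the machines in the same order (permutation flow shop).
    """
    finish = [0] * len(proc[0])  # finish[m]: time at which machine m is free
    for job in order:
        for m in range(len(finish)):
            # A job starts on machine m once the machine is free and the
            # job has left machine m - 1.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc[job][m]
    return finish[-1]


def brute_force(proc):
    """Exhaustive enumeration, usable only on tiny instances."""
    best = min(permutations(range(len(proc))), key=lambda p: makespan(p, proc))
    return list(best), makespan(best, proc)


def branch_and_bound(proc):
    """Depth-first Branch-and-Bound (hypothetical, illustrative bound)."""
    n_jobs, last = len(proc), len(proc[0]) - 1
    best_order = list(range(n_jobs))          # initial incumbent: identity order
    best_cost = makespan(best_order, proc)
    explored = 0

    def visit(prefix, remaining):
        nonlocal best_order, best_cost, explored
        explored += 1
        cost_so_far = makespan(prefix, proc)
        if not remaining:
            if cost_so_far < best_cost:
                best_order, best_cost = list(prefix), cost_so_far
            return
        # Lower bound: the last machine still has to process every
        # remaining job after finishing the scheduled prefix.
        bound = cost_so_far + sum(proc[j][last] for j in remaining)
        if bound >= best_cost:
            return  # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):
            visit(prefix + [j], remaining - {j})

    visit([], frozenset(range(n_jobs)))
    return best_order, best_cost, explored
```

On a small instance, `branch_and_bound(proc)` and `brute_force(proc)` should return the same makespan, and the `explored` counter shows how many nodes the bound left to visit. A Grid version would additionally have to distribute subtrees, balance load and survive failures, which this sketch does not attempt.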
8
Example 2: OpenWP
OpenWP: a directive-based language and runtime for coarse-grain distributed executions
– Expresses dependencies between computing blocks + work distribution
– For existing codes
– Uses a virtual shared memory model
– Runs over existing workflow engines
[Figure annotations: Linear speedup; Non-parallel region; Workflow engine overhead; Negligible cost; Effect of optimizations]
AMIBES (EADS): mesher module of jCAE (CAD environment in Java)
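The idea of expressing dependencies between computing blocks can be sketched in a few lines. This is not OpenWP syntax or its runtime: the `run_blocks` helper, the block names and the dictionary format are hypothetical, and the blocks run one after another in a dependency-respecting order rather than being distributed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_blocks(blocks, deps):
    """Run each block after the blocks it depends on.

    blocks: name -> callable taking the results of its dependencies
    deps:   name -> list of names it depends on (hypothetical format)
    """
    results, order = {}, []
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = blocks[name](*inputs)
        order.append(name)
    return results, order


# Hypothetical coarse-grain blocks: one producer, two independent
# consumers, and a final merge.
blocks = {
    "mesh": lambda: 4,
    "solve_a": lambda m: m * 2,
    "solve_b": lambda m: m + 1,
    "merge": lambda a, b: a + b,
}
deps = {
    "solve_a": ["mesh"],
    "solve_b": ["mesh"],
    "merge": ["solve_a", "solve_b"],
}

results, order = run_blocks(blocks, deps)
```

Here `solve_a` and `solve_b` do not depend on each other, so a workflow engine would be free to run them concurrently on different nodes; the sketch only makes that independence explicit.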
9
What are the common patterns – Deployment?
Applications “deployment” on Grid’5000:
Site level:
– Node selection --> OAR
– Node reservation (ISOLATION) --> OAR (batch scheduler)
– Reconfiguration --> Kadeploy
Grid level --> GRUDU (Grid Reservation Utility)
Application configuration and launch --> Adage
10
Deployment: Grudu (G5K Reservation Utility)
All-in-one GUI client-side tool for the monitoring of the Grid'5000 platform.
Main goals:
– Displaying the status of the platform
– Resources allocation through the use of OAR
– Resources monitoring through Ganglia
– Deployment management with a GUI for KaDeploy
– A terminal emulator and a file transfer manager
11
ADAGE: Automatic deployment of large-scale applications that need one or multiple middleware systems: MPI, CCM, JXTA, Jobs, GFarm, P2P overlays, DIET
[Diagram labels: MPI Application; CCM Application; LEGO Application; Resource Description; Generic Application Description; Control Parameters; Deployment Planning; Deployment Execution; Application Configuration]
Jxta scalability test:
– Evaluation of the peerview and discovery protocols
– Deployment of 1000s of Jxta peers
– Run the scalability test
[Figure: application deployment with rendez-vous peers and JXTA edge peers. Plot: “rendez vous” peers known by one of the “rendez vous” peers; X axis: time; Y axis: “rendez vous” peer ID]
12
Resource Dedication: G5K vs. EGEE
[Two plots: X axis: number of images; Y axis: execution time (seconds); curves: Data Parallelism + Pipelining, Data Parallelism and, in one plot, Naive execution]
Bronze Standard: method addressing the issue of medical image algo. evaluation.
Application to the estimation of the spatial rigid transformation between two images (convenient to align two different images of the same patient acquired separately).
Complex workflow of computations on a large number of data sets. Typically requires 10s to 100s of 3D image pairs, at 15 minutes per image pair.
The method is executed with MOTEUR (workflow engine).
Several degrees of parallelism are tested:
– Naive execution: only the workflow intrinsic parallelism
– Data Parallelism: data sets are processed concurrently
– Data Parallelism + Pipelining: services in sequential branches are pipelined and data sets are processed concurrently
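The difference between these degrees of parallelism can be illustrated with a toy time model. This is not MOTEUR's scheduler or the measured results: it assumes a single linear chain of services, unlimited resources, no transfer or middleware overhead, and hypothetical per-data-set service times `times[i][s]`.

```python
def sequential(times):
    """One data set at a time, one service at a time."""
    return sum(sum(row) for row in times)


def pipelined(times):
    """Each service handles one data set at a time, but different
    services work on different data sets simultaneously."""
    finish = [0.0] * len(times[0])  # finish[s]: time at which service s is free
    for row in times:
        for s, t in enumerate(row):
            ready = finish[s - 1] if s > 0 else 0.0
            finish[s] = max(finish[s], ready) + t
    return finish[-1]


def data_parallel(times):
    """Each service processes all data sets concurrently, but the next
    service starts only once the slowest data set is done."""
    n_services = len(times[0])
    return sum(max(row[s] for row in times) for s in range(n_services))


def data_parallel_pipelined(times):
    """Every data set flows through the chain independently."""
    return max(sum(row) for row in times)


# Hypothetical times (seconds): 2 data sets, 2 services.
times = [[2.0, 1.0],
         [1.0, 3.0]]
print(sequential(times), pipelined(times),
      data_parallel(times), data_parallel_pipelined(times))
# prints: 7.0 6.0 5.0 4.0
```

In this model, adding pipelining to data parallelism only helps when service times vary between data sets: with identical times per service, `data_parallel` and `data_parallel_pipelined` give the same value.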
13
Gap Analysis
Are the patterns (applications) well supported?
--> Thanks to the node reconfiguration model, many patterns are well supported
What further abstractions should be considered?
--> Node configuration and deployment are still difficult and require too much effort from the users
--> The network resources should be reserved and isolated
What abstractions have worked for you?
--> Reservation, Isolation, Reconfiguration and Deployment
What abstractions do you feel you need?
--> Reservation, Configuration and Deployment issues
How well will abstractions work with the next generation of infrastructure that your project will use?
--> Reservation, Isolation, Reconfiguration and Deployment will be required for “transparent” Cloud Computing
14
The notion of energy “conservation”
[Diagram: two stacks with the same layers: Programming interface; Compile-time operations & optimizations; Runtime operations & optimizations; Grid Infrastructure. In the second stack the programming interface is annotated: less abstraction but more optimization opportunities]
15
“programming” models & Abstractions
Programming models: Chemical Computing; Data & Workflow; Divide & Conquer; Workflow; MAP-Reduce; MPI for Grids; Grid-RPC; Set of scripts
Abstractions:
– Organizing the computation
– Tolerating variations
– Scheduling computation & communications
– Implementing computing codes
– Synchronizing task executions
– Implementing global operations
– Selecting the communication protocols
– Dealing with resources (data, computers, etc.)
– Dealing with administration domains
16
I didn’t know that Grid had to be programmed (??)
Is there anything so different on Grids that it justifies programming them in a specific way?
What was the promise? An infrastructure providing resources (data, storage, computing) as the power Grid provides electricity --> Transparently
So, why should we care about “programming Grids”? Because the “abstracting job” is not finished:
– Moving data and programs rapidly (protocols)
– Dealing with several (many) administration domains (VO)
– Dealing with several (many) batch schedulers (interfaces)
– Moving data and jobs in a smart way (control)
– Tolerating the performance variations & failures of resources
– Providing QoS
– Etc.
Even the “good” software layer(s) in which to implement the abstraction is (are) not stabilized (Middleware, OS, Network?)
So YES, I still have to program Grids
17
That’s not a problem
Why should I care about Grid at all?
There is a new very promising solution…
It is cleaner (environment friendly, more abstract, etc.)
It does not compare with Electricity distribution (the power Grid) BUT with Water distribution…
It’s