NDGF CO2 Community Grid
Olli Tourunen
NORDUnet 2008, Espoo, Finland, April 10th 2008

Topics
  Project overview
  First use case
  Requirements and architecture
  Implementation
  Experiences
  Statistics
  Future

CO2-CG overview
  The NDGF Community Grid (CO2-CG) project aims to build an application environment for scientists studying CO2 sequestration
  CO2-CG was selected in the NDGF call for community projects, along with BioGrid
  NDGF provides the project coordinator and half an FTE for application grid integration, plus funding for a full FTE for community software development
  One-year project, started in fall 2007
  Project coordinator: Michael Gronager (NDGF)
  Project leader: Klaus Johannsen (BCCS, Bergen)
  Science specialist: Philip Binning (DTU, Copenhagen)
  Software developer: Csaba Anderlik (BCCS, Bergen)
  Grid specialist: Olli Tourunen (NDGF)

First use case for CO2-CG
  Parameter study of different attributes of potential CO2 sequestration reservoirs
  Software: MUFTE-UG, a general-purpose simulator for multi-phase, multi-component flow in porous media
  Pilot user: Andreas Kopp (University of Stuttgart)

First use case (contd.)
  On the order of hundreds of 32 to 64 processor parallel simulations, computationally bound (not data intensive)
  One simulation covers a time frame of approximately 50 years, starting from CO2 injection into the reservoir
  Why parallel? Isn't this a parameter study after all?
    A single 32-process run typically takes 3-4 days to complete
    With 16 processes, a run might take over a week
  Resources for these simulations are provided mainly by NOTUR, the Norwegian national infrastructure for computational science

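As a rough consistency check on the runtimes quoted above, the sketch below scales the 32-process figure down to 16 processes. It assumes ideal strong scaling (runtime inversely proportional to the process count), which the slides do not claim, and the function name is hypothetical.

```python
def ideal_runtime_days(days, processes, new_processes):
    """Scale a runtime to a new process count, assuming ideal strong scaling."""
    return days * processes / new_processes


# Runtimes quoted on the slide for a single 32-process run (3 to 4 days).
for days_at_32 in (3, 4):
    print(days_at_32, "days on 32 processes ->",
          ideal_runtime_days(days_at_32, 32, 16), "days on 16 processes")
```

Under that assumption a 16-process run would take roughly 6 to 8 days, which is in line with the "over a week" estimate.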
Requirements
  Main target: Provide scientists with transparent access to computational resources in the grid
  Input: User's working directory containing MUFTE-UG source code and the input files for the simulation
  Output: Simulation results returned to the user in a user-specified directory
  Support for NorduGrid ARC middleware
  Standard grid credential handling, to avoid the need for custom security policies with participating sites

Architecture overview
  [Figure: architecture diagram. Labels: Application server, Command Line UI, DB, Grid Job Manager, Job descr 1-3, ARC, Cluster A, Cluster B, Supercomputer C (each resource with a MUFTE Runtime Environment). Legend: S = Software, R = Results.]

Architecture
  Command line UI (application server)
    Introduces one keyword, grid, which can be invoked with different options à la openssl
    Example: the user prepares the source code and input files for a simulation in a directory of her choice
    The user issues a command like grid submit –np 32
    The submit module packages the simulation directory into the spool directory and inserts the parameters into the database
    The user tracks the progress by running grid status
    The results are made available to the user when the job finishes

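The submit and status steps above could be sketched roughly as follows. This is not the project's code: it uses an SQLite database as a stand-in for the real one, and the table layout, column names, archive format, and function signatures are all assumptions made for illustration.

```python
import sqlite3
import tarfile
import uuid
from pathlib import Path

# Hypothetical single-table schema; the real database layout is not shown in the slides.
SCHEMA = """CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY, archive TEXT, processes INTEGER, state TEXT)"""


def submit(db, sim_dir, spool_dir, processes):
    """Package sim_dir into spool_dir and register the job; return its id."""
    job_id = uuid.uuid4().hex
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)
    archive = spool / (job_id + ".tar.gz")
    # Pack the whole simulation directory (sources and input files) into one archive.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(sim_dir), arcname="simulation")
    db.execute(SCHEMA)
    db.execute("INSERT INTO jobs VALUES (?, ?, ?, 'new')",
               (job_id, str(archive), processes))
    db.commit()
    return job_id


def status(db):
    """Return (id, processes, state) for every known job."""
    db.execute(SCHEMA)
    return db.execute("SELECT id, processes, state FROM jobs").fetchall()
```

Calling `submit(sqlite3.connect("jobs.db"), "my-simulation", "spool", 32)` would leave one archive in the spool directory and one row in state `new`, ready to be picked up by a separate job manager.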
Architecture (contd.)
  Grid Job Manager (application server)
    Scans the database for new jobs
    Prepares the new jobs for the grid based on job parameters
    Submits the jobs into the grid
    Keeps track of the grid jobs
    Downloads the results when a job is ready
    Downloads the evidence for autopsy when a job fails
  MUFTE Runtime Environment (grid resource)
    Standard ARC Runtime Environment
    Compiles the software based on local configuration and environment
    Runs the simulation

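The Grid Job Manager duties listed above amount to a small state machine per job. The sketch below shows one pass over the jobs; the state names, the job dictionaries, and the `FakeGrid` class (a scripted stand-in for the middleware) are hypothetical and only illustrate the control flow.

```python
class FakeGrid:
    """Stand-in for the grid middleware; replies are scripted for the demo."""

    def __init__(self, outcomes):
        self.outcomes = outcomes   # grid id -> "running", "finished" or "failed"
        self.downloads = []        # record of (grid id, what was fetched)

    def submit(self, job):
        return "grid-" + job["id"]

    def poll(self, grid_id):
        return self.outcomes.get(grid_id, "running")

    def download(self, grid_id, what):
        self.downloads.append((grid_id, what))


def sweep(jobs, grid):
    """Advance every job by at most one step in a single pass."""
    for job in jobs:
        if job["state"] == "new":
            job["grid_id"] = grid.submit(job)
            job["state"] = "submitted"
        elif job["state"] == "submitted":
            outcome = grid.poll(job["grid_id"])
            if outcome == "finished":
                grid.download(job["grid_id"], "results")
                job["state"] = "done"
            elif outcome == "failed":
                # Keep whatever the failed job left behind for later autopsy.
                grid.download(job["grid_id"], "evidence")
                job["state"] = "failed"


jobs = [{"id": "a", "state": "new"}, {"id": "b", "state": "new"}]
grid = FakeGrid({"grid-a": "finished", "grid-b": "failed"})
sweep(jobs, grid)   # first pass: both jobs are submitted
sweep(jobs, grid)   # second pass: results or evidence are fetched
print([job["state"] for job in jobs], grid.downloads)
```

Keeping each pass short and stateless, with all state held in the job records, is what lets a pass be re-run periodically without a long-lived daemon.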
Implementation
  Grid Job Manager (GJM)
    There is one GJM instance per user
    One-sweep-at-a-time job, intended to be launched from cron
    Runs under user credentials
    Spools active jobs in /var/spool/co2-cg/
    Written in Python
    Uses an object-relational mapper called SQLAlchemy
    Interacts with ARC grid middleware through standard user commands
    A Python API for ARC is also available; might use that in the future

Implementation (contd.)
  Database
    Standard PostgreSQL relational database
    3 main tables plus some auxiliary ones
  Runtime Environment
    Compilation is done on the ARC server host before the job is submitted using the user's credentials
    Compilation and execution parameters are based on the job attributes in the DB
    Supported levels of parallelism are encoded in the RE name (e.g. MUFTE-MPI-64-1.0)

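To illustrate how a level of parallelism can be carried in an RE name, the sketch below parses and builds names shaped like the example above. The layout MUFTE-MPI-<processes>-<version> is an assumption inferred from that single example, and both function names are hypothetical.

```python
import re

# Assumed layout: MUFTE-MPI-<processes>-<version>, inferred from one example.
RE_NAME = re.compile(r"^MUFTE-MPI-(\d+)-(\d+(?:\.\d+)*)$")


def parse_re_name(name):
    """Return (processes, version), or None if the name does not match."""
    match = RE_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def re_name_for(processes, version="1.0"):
    """Build the RE name a job with this process count would request."""
    return "MUFTE-MPI-{}-{}".format(processes, version)


print(parse_re_name("MUFTE-MPI-64-1.0"))
print(re_name_for(32))
```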
Challenges
  Transparent grid credential handling
    Balance the security policies and ease of use
  Parallel run parameterization
    User needs vs. types of resources vs. available resources
    No explicit brokering support for this in ARC
    This can be done with clever RE naming
  Database access right management (not really an issue until this goes to a bigger scale)
    Lots of different possibilities to solve this if needed (DB-level access rights, per-user tablespaces, row change staging, n-layer architecture outside the DB…). So far applied KISS.

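One way to read "clever RE naming": if the RE a job requests has the process count in its name, then only resources advertising that exact RE can accept the job. The sketch below models just that matching step and is not ARC's broker; the resource names and RE lists are made up for illustration.

```python
def eligible_clusters(required_re, advertised):
    """Return, sorted, the resources that advertise the required RE name."""
    return sorted(name for name, res in advertised.items() if required_re in res)


# Hypothetical resources and the REs they advertise.
advertised = {
    "cluster-a": {"MUFTE-MPI-32-1.0"},
    "cluster-b": {"MUFTE-MPI-32-1.0", "MUFTE-MPI-64-1.0"},
    "supercomputer-c": {"MUFTE-MPI-64-1.0"},
}
print(eligible_clusters("MUFTE-MPI-64-1.0", advertised))
```

In this model a site opts in to a given job size simply by installing the RE with the corresponding name.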
Experiences: User side
  User can access a significant number of distributed resources in a transparent manner
    Peak so far: 512 cores simultaneously in use
  Problems
    Memory specifications for the jobs
    Walltime specifications for the jobs
    Getting all the information to debug the jobs that have crashed
    Non-converging jobs

Experiences: Operator side
  It takes around a day to set up the MUFTE RE in a new cluster
    If the site has experience in running MPI jobs through ARC, the process is quite straightforward
    In one case we also had to set up a cross-compiling facility
  AA is easy to configure since the users are managed in NDGF VOMS
  Since not that many parallel jobs are run in the grid, the ARC LRMS interface needed some tweaks in some clusters
  Thanks to all the sysadmins who have helped us along the way!

Statistics
  Since February 12th 2008, over 400 simulations of 16 to 64 processors have been run
  Total compute time around 230000 hours
  Disclaimer: measurements done from the application server side, not from resource accounting.

Future developments
  Switching focus to operation
  Software and application server hardening
  Automated tests for the runtime environments + blacklisting
  Cleanup procedures
  Integrate CO2-CG into the NDGF accounting system
  Track the simulations that are not converging
  Easier certificate handling
  Possibly a web portal for job tracking and collaboration
  Include the new Cray XT4 in Bergen

Conclusions
  With moderate effort, simple tools and an application-specific user interface, grid resource usage can be made easy for the end users
  On-demand compilation works for selected applications
  Parallel jobs can be run on a large scale in the grid with little effort