EGEE-III INFSO-RI-222667, Enabling Grids for E-sciencE, www.eu-egee.org. EGEE and gLite are registered trademarks.


MPI on the grid: operational issues
Fokke Dijkstra
Donald Smits Center for Information Technology, University of Groningen
EGEE Conference, Barcelona, 22 September 2009

Introduction
2nd MPI WG started in 2009
Questionnaires for users and site administrators
Site administrator perspective
  – Experiences with installing a site according to the guidelines from the previous WG
Note that some things are already fixed!
Local site: RUG-CIT
  – Torque/Maui
  – YAIM

YAIM
Special YAIM functions exist to configure a site for MPI
  – Information system
  – Environment variables
  – Submit filter to adapt jobs
  – Dummy mpirun
Installation of packages is up to the site admin

WMS support
MPICH jobtype
  – Confuses implementation with jobtype
  – Broader parallel job support needed
Hardcoded call to mpirun in job script
  – Still there!
  – MPICH mpirun can start a script
  – WG suggestion to install a wrapper script instead
    • Installed by YAIM
Only pbs and lsf supported by WMS
  – Publish that you use pbs if using torque
MPICH jobtype abandoned; Normal jobtype can also be used!
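A submission with the Normal jobtype could look roughly like the JDL produced by the sketch below. This is only an illustration: the wrapper script name, the sandbox file names, the use of the `CpuNumber` attribute, and the helper `build_mpi_jdl` are assumptions made for the example and are not taken from the slides.

```python
def build_mpi_jdl(executable, flavour, cpu_number,
                  wrapper="mpi-start-wrapper.sh"):
    """Return JDL text for a parallel job that uses the Normal job type.

    `wrapper` is a hypothetical script name; the job runs the wrapper
    instead of relying on a hardcoded mpirun call.
    """
    glue_attr = "other.GlueHostApplicationSoftwareRunTimeEnvironment"
    lines = [
        'JobType = "Normal";',
        f"CpuNumber = {cpu_number};",
        f'Executable = "{wrapper}";',
        f'Arguments = "{executable} {flavour}";',
        f'InputSandbox = {{"{wrapper}", "{executable}"}};',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        'OutputSandbox = {"std.out", "std.err"};',
        # Only match sites that publish the requested MPI flavour.
        f'Requirements = Member("{flavour}", {glue_attr});',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_mpi_jdl("hello", "OPENMPI", 4))
```

Generating the text from a function keeps the flavour tag in the requirement and in the wrapper arguments identical, which avoids one easy source of mismatch.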

Information system
Available MPI implementations published in IS
  – GlueHostApplicationSoftwareRunTimeEnvironment
    • MPICH, OPENMPI, etc.
    • MPICH-1.2.7, etc.
Environment variables set to point to location on file system
Version requirements hard to use
  – Current regexp matching in JDL makes this very hard
    • Exact or partial match only
Consistency not checked
Information about interconnects and shared file system also published
  – e.g. MPI-Infiniband
  – Needs to be consistent across sites
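To illustrate why a "minimum version" requirement is awkward to express with exact or partial matching, the following sketch does the comparison outside the JDL, on a list of published tags. The tag format `FLAVOUR-x.y.z` follows the examples on the slide; the helper names `parse_tag` and `satisfies` are hypothetical.

```python
import re

# A versioned tag is an upper-case flavour, a dash, and a dotted version,
# e.g. "MPICH-1.2.7". Tags such as "MPICH" or "MPI-Infiniband" do not match.
TAG_RE = re.compile(r"^(?P<flavour>[A-Z0-9]+)-(?P<version>\d+(?:\.\d+)*)$")


def parse_tag(tag):
    """Return (flavour, version tuple) for a versioned tag, else None."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("flavour"), version


def satisfies(tags, flavour, minimum):
    """True if any published tag offers `flavour` at version >= `minimum`."""
    for tag in tags:
        parsed = parse_tag(tag)
        if parsed and parsed[0] == flavour and parsed[1] >= minimum:
            return True
    return False


if __name__ == "__main__":
    published = ["MPICH", "MPICH-1.2.7", "OPENMPI-1.3.2", "MPI-Infiniband"]
    print(satisfies(published, "OPENMPI", (1, 3)))
    print(satisfies(published, "OPENMPI", (1, 4)))
```

The version is compared as a tuple of integers, which is exactly the kind of ordering that a string or regexp match on the tag cannot provide.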

MPI packages
Lack of rpm packages for common MPI implementations in standard gLite repository (only MPICH-1)
Inconsistent packaging (e.g. OpenMPI supplied in CentOS repository does not install into /opt)
Support for mpiexec dropped because of switch to newer torque version
This makes it hard for most site administrators to install MPI!
MPI versions should also be compiled with support for local hardware and software (scheduler)
  – E.g. OpenMPI can use torque scheduling daemons
  – Support for local interconnects hard to provide

mpi-start / mpiexec
mpi-start recommended as a tool to help users start MPI jobs
  – Seems to work fine
  – Depends heavily on setting environment variables to correct values
Depends on mpiexec for MPICH-1
  – mpiexec support dropped from gLite!
  – MPICH-1 only available implementation in gLite repository
Is it still maintained?
MPICH support dropped
MPICH2 and OpenMPI added
mpi-start still maintained

Shared file system
Shared file system between WNs recommended
  – Easier for applications
Existence has to be published
Performance?
  – NFS
  – Moving sequential jobs to local temporary directory
  – What if a job does not need it for I/O?
  – Fast shared file system may be good for everyone
Configuration not supported by YAIM
  – Is this a good idea anyway?

Starting up tasks using ssh
Most MPI implementations support ssh for starting up remote tasks
Passwordless ssh set up by YAIM for torque job manager
Site administrators may not like this
  – For torque a PAM module exists to prevent unwanted logins
  – Correct CPU time accounting problematic
  – Remote tasks not under control of scheduler
Gives user a lot of freedom to perform other parallel work without using MPI
Some MPI implementations can use scheduler daemons
  – OpenMPI (has to be compiled with support for this)
  – mpiexec for MPICH2
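Whichever start-up mechanism is used, the launcher has to know which hosts the scheduler allocated to the job. A minimal sketch, assuming the Torque convention of a node file (referenced by `PBS_NODEFILE`) that contains one line per allocated slot; the function name and the host names are made up for the example.

```python
from collections import Counter


def slots_per_host(nodefile_text):
    """Count allocated slots per host from node-file text.

    Assumes one host name per line, repeated once for every slot
    allocated on that host. Blank lines are ignored.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return dict(Counter(hosts))


if __name__ == "__main__":
    example = "wn01\nwn01\nwn02\nwn02\nwn02\n"
    for host, slots in slots_per_host(example).items():
        print(host, slots)
```

An ssh-based launcher would log in to each of these hosts itself, whereas a launcher built with scheduler support asks the scheduler daemons to start the tasks, which keeps them under the scheduler's control.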

SAM tests
Sites that support MPI may still not work
Lack of (enforced) SAM test for MPI
What should be tested anyway?
  – Consistency
    • Published information
    • Environment variables
    • Installed packages
  – Simple test program
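A consistency test of this kind could be sketched as follows. The sketch assumes that each published flavour has an environment variable named `MPI_<FLAVOUR>_PATH` pointing at its installation directory; that naming scheme, the function name, and the example paths are assumptions for illustration, not something stated on the slides.

```python
import os


def check_mpi_consistency(published_flavours, environment,
                          path_exists=os.path.isdir):
    """Return a list of problems found for the published MPI flavours.

    For every flavour the (assumed) variable MPI_<FLAVOUR>_PATH must be
    set and must point at an existing location. `path_exists` can be
    replaced to test the logic without touching the file system.
    """
    problems = []
    for flavour in sorted(published_flavours):
        variable = f"MPI_{flavour}_PATH"
        location = environment.get(variable)
        if not location:
            problems.append(f"{flavour}: published but {variable} is not set")
        elif not path_exists(location):
            problems.append(f"{flavour}: {variable}={location} does not exist")
    return problems


if __name__ == "__main__":
    # On a worker node the real environment would be passed in.
    for problem in check_mpi_consistency({"OPENMPI"}, dict(os.environ)):
        print(problem)
```

An empty result only shows that the three sources of information agree; compiling and running a simple test program is still needed to show that MPI jobs actually work.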

Improved scheduling of cores
Use job requirement from the user
  – Schedule full nodes
  – I don’t care (site default applied)
    • Scheduling first available cores improves scheduling time dramatically on small sites
Sites have a different number of cores per node
  – Always schedule complete nodes
  – Schedule just the requested cores
  – This probably should be another job requirement
Use of torque submit filter may break/remove site script
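The difference between the two policies is simple arithmetic, sketched below. The function names and the example core counts are illustrative only.

```python
def nodes_needed(requested_cores, cores_per_node):
    """Smallest number of nodes that can hold the requested cores."""
    return (requested_cores + cores_per_node - 1) // cores_per_node


def allocated_cores(requested_cores, cores_per_node, full_nodes):
    """Cores actually reserved under each policy.

    full_nodes=True  -> round up to complete nodes
    full_nodes=False -> reserve exactly what was requested
    """
    if full_nodes:
        return nodes_needed(requested_cores, cores_per_node) * cores_per_node
    return requested_cores


if __name__ == "__main__":
    # The same 6-core request on sites with 4 and 8 cores per node.
    for per_node in (4, 8):
        print(per_node, allocated_cores(6, per_node, full_nodes=True))
```

Because sites differ in cores per node, the same request reserves a different number of cores under the full-node policy, which is one reason to let the user state the preference explicitly.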

Other problems
Topics not covered by the previous WG
  – Support for other parallel job types
  – Scheduling of cores to the jobs
CPU time limits do not work with parallel jobs
  – Best solution: only use wall-clock time
  – Other option: make them very high
    • Some large VOs make requirements on CPU time (why?)
  – Local solution: publish wall-clock time limit as CPU time limit
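One way to see the problem with CPU time limits is the arithmetic below. It assumes that the batch system sums CPU time over all processes of a job and that every core is fully busy; both are simplifying assumptions, and the numbers are made up.

```python
def cpu_hours_used(wallclock_hours, cores):
    """CPU time accumulated by a job that keeps all its cores busy."""
    return wallclock_hours * cores


def wallclock_until_cpu_limit(cpu_limit_hours, cores):
    """Wall-clock time after which the summed CPU time hits the limit."""
    return cpu_limit_hours / cores


if __name__ == "__main__":
    # A queue with a 48-hour CPU limit and an 8-core parallel job.
    print(wallclock_until_cpu_limit(48, 8))
    print(cpu_hours_used(6, 8))
```

Under these assumptions a limit that is generous for a sequential job cuts a parallel job short by a factor equal to its core count, which is why a wall-clock limit is the more natural bound.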

My personal recommendations
Improve WMS
  – Remove call to mpirun √
  – Remove requirement to publish pbs or lsf √
Improve repository
  – Make rpms available of commonly used implementations
    • OpenMPI
    • MPICH2
  – mpiexec for MPICH2
  – Support mpi-start
  – Support for torque
  – Source RPMS for recompilation
Make use of SAM tests to check working MPI support
Improve documentation
  – General MPI
  – Guidelines for setting up:
    • Shared file system
    • Passwordless ssh
    • Recompiling MPI implementations with support for local hardware/software