SA1 / Operation & support - Enabling Grids for E-sciencE

Integration of heterogeneous computational resources in the EGEE infrastructure: a live demo

A. Santoro, G. Bracco, S. Migliori, S. Podda, A. Quintiliani, A. Rocchi, C. Sciò (*)
ENEA-FIM, ENEA C.R. Frascati, Frascati (Roma), Italy; (*) Esse3Esse

Summary
The SPAGO (Shared Proxy Approach for Grid Objects) architecture enables the EGEE user to submit jobs to worker nodes that are not necessarily based on the x86 or x86_64 Linux architectures, thus allowing a wider array of scientific software to be run on the EGEE Grid and a wider segment of the research community to participate in the project. It also provides a simple way for local resource managers to join the EGEE infrastructure, and the procedure shown in this demo further reduces the complexity involved in implementing the SPAGO approach. This can significantly widen the penetration of the gLite middleware outside its traditional domain of distributed, capacity-focused computation. For example, the world of High Performance Computing, which often requires dedicated system software, can find in SPAGO an easy way to join the large EGEE community. SPAGO will be used to connect the ENEA CRESCO HPC system (#125 in top500/2008) to the EGEE infrastructure.
The aim of this demo is to show how a computational platform not supported by gLite (such as AIX, Altix, IRIX or Mac OS X) may still be used as a gLite Worker Node, and thus be integrated into EGEE, by employing the SPAGO methodology. All the machines required to support the demo (both the gLite infrastructure machines and the non-standard Worker Nodes) reside on the ENEA-GRID infrastructure. Specifically, the demo makes use of two shared filesystems (NFS and AFS), Worker Nodes belonging to five different architectures (AIX, Linux, Altix, Cray, IRIX), one resource manager (LSF), five Computing Elements, two gLite worker nodes (which act as proxies) and a machine acting as BDII. All these resources are integrated into the ENEA-GRID infrastructure, which offers a uniform interface to access them. The case of a multi-platform user application (POLY-SPAN) which takes advantage of the infrastructure is also shown.

ENEA [Italian National Agency for New Technologies, Energy and Environment]
12 research sites and a Central Computer and Network Service (ENEA-INFO) with 6 computer centres managing multi-platform resources for serial and parallel computation and graphical post-processing.

ENEA-GRID computational resources
Hardware: ~400 hosts and ~3400 CPUs: IBM SP; SGI Altix & Onyx; Linux clusters 32/ia64/x86_64; Apple cluster; Windows servers.
Most relevant resources: CRESCO, 2700 CPUs, mostly dual 4-core Xeon Clovertown; IBM SP5, 256 CPUs; 3+ frames of IBM SP4, 105 CPUs.

ENEA-GRID mission [started 1999]:
- provide a unified user environment and a homogeneous access method for all ENEA researchers, irrespective of their location;
- optimize the utilization of the available resources.

ENEA-GRID
GRID functionalities (unique authentication, authorization, resource access and resource discovery) are provided using "mature", multi-platform components:
- Distributed file system: OpenAFS
- Resource manager: LSF Multicluster
- Unified user interface: Java & Citrix technologies
These components constitute the ENEA-GRID middleware.

ENEA-GRID architecture
OpenAFS: user homes, software and data distribution; integration with LSF; user authentication/authorization (Kerberos V).

CRESCO HPC Centre
CRESCO (Computational Research Center for Complex Systems) is an ENEA project, co-funded by the Italian Ministry of University and Research (MIUR). The project is functionally built around an HPC platform and three scientific thematic laboratories:
- the Computing Science Laboratory, hosting activities on HW and SW design, GRID technology and HPC platform management;
- the Computational Systems Biology Laboratory, with activities in the Life Science domain, ranging from the "post-omic" sciences (genomics, interactomics, metabolomics) to Systems Biology;
- the Complex Networks Systems Laboratory, hosting activities on complex technological infrastructures, for the analysis of Large National Critical Infrastructures.
The HPC system consists of a ~2700-core (x86_64) resource (~17.1 Tflops HPL benchmark, #125 in top500/2008), connected via InfiniBand to a 120 TB storage area (GPFS). A fraction of the resource, part of ENEA-GRID, will be made available to the EGEE GRID using the gLite middleware through the SPAGO approach.

The issues of the SPAGO approach
The gateway implementation has some limitations, due to the unavailability of the middleware on the Worker Nodes: the Worker Node APIs are not available, and monitoring is only partially implemented. As a result, R-GMA is not available, nor are the Worker Node GridICE components. A workaround can be found for GridICE by collecting the required information directly with a dedicated script on the information-collecting machine, by means of native LSF commands.
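The poster mentions this workaround but does not show the collector script itself. The following is only a minimal sketch of the idea, assuming plain native LSF commands (bhosts, lshosts, bjobs); the output file and its layout are hypothetical placeholders, not the real GridICE sensor interface.

```bash
#!/bin/bash
# Minimal sketch of an LSF-based information collector (illustrative only).
# bhosts, lshosts and bjobs are standard LSF commands; the output file and its
# plain-text layout are placeholders, not the actual GridICE interface.

OUT=/tmp/wn_status.txt          # hypothetical drop point for the collected data

{
  echo "== Batch state of the execution hosts =="
  bhosts -w                     # per-host batch status as seen by LSF
  echo "== Static host resources =="
  lshosts -w                    # model, CPU count and memory of every host
  echo "== Jobs of all users =="
  bjobs -u all -w               # running and pending jobs, wide format
} > "$OUT" 2>&1
```

A cron entry on the information-collecting machine could run such a script periodically and feed its output to whatever component publishes the monitoring data.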
SPAGO in the EGEE production GRID
GOC/GSTAT page with AIX WN information.

The SPAGO approach
The Computing Element (CE) used in a standard gLite installation, and its relation with the Worker Nodes (WN) and the rest of the EGEE GRID, is shown in Figure 1. When the Workload Management Service (WMS) sends a job to the CE, the gLite software on the CE employs the resource manager (LSF for ENEA-INFO) to schedule jobs on the various Worker Nodes. When the job is dispatched to the proper worker node (WN1), but before it is actually executed, the worker node employs the gLite software installed on it to set up the job environment (it loads from the WMS storage the files needed to run, known as the InputSandbox). Analogously, after the job execution the Worker Node employs gLite software to store the output of the computation (the OutputSandbox) on the WMS storage. The problem is that this architecture is based on the assumption, underlying the EGEE design, that all the machines, CE and WN alike, share the same architecture: in the current version of gLite (3.1) the software is written for Intel-compatible hardware running Scientific Linux.
The basic design principle of the ENEA-INFO gateway to EGEE is outlined in Figure 2, and it exploits the possibility to use a shared file system. When the CE receives a job from the WMS, the gLite software on the CE employs the resource manager to schedule jobs on the various Worker Nodes, as in the standard gLite architecture. However, the worker node is not capable of running the gLite software that recovers the InputSandbox. To solve this problem, the LSF configuration has been modified so that any attempt to execute gLite software on a Worker Node actually executes the command on a specific machine, labeled Proxy Worker Node, which is able to run standard gLite. By redirecting the gLite command to the Proxy WN, the command is executed and the InputSandbox is downloaded into the working directory of the Proxy WN. The working directory of each grid user is kept on the shared filesystem and is shared among all the Worker Nodes and the Proxy WN, so downloading a file into the working directory of the Proxy WN makes it available to all the other Worker Nodes as well. Now the job on WN1 can run, since its InputSandbox has been correctly downloaded into its working directory. When the job generates output files, the OutputSandbox is sent back to the WMS storage by the same method.
In this architecture the Proxy WN may become a bottleneck, since its task is to serve requests coming from many Worker Nodes. In that case a pool of Proxy WNs can be allocated to distribute the load equally among them.
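Neither the modified LSF configuration nor the wrapper scripts (see "Modifications on the WN" below) are reproduced on the poster. Purely as an illustration of the mechanism just described, a wrapper placed on the shared filesystem in front of a gLite command might look like the sketch below; the proxy host names, the gLite path and the use of ssh as the remote-execution transport are assumptions, not the actual ENEA implementation.

```bash
#!/bin/bash
# Illustrative SPAGO-style wrapper shadowing a gLite command on a non-standard
# Worker Node. Everything host- and path-specific here is a placeholder.

PROXIES=(proxy-wn01 proxy-wn02)             # hypothetical pool of Proxy WNs
PROXY=${PROXIES[RANDOM % ${#PROXIES[@]}]}   # spread requests across the pool

CMD=$(basename "$0")                        # gLite command this wrapper stands in for
GLITE_BIN=/opt/glite/bin                    # assumed location of the real binary on the Proxy WN

# The job's working directory lives on the shared filesystem (NFS/AFS), so it is
# identical on the Proxy WN: anything the real command stages there, such as the
# InputSandbox, immediately becomes visible to this Worker Node as well.
# (Argument quoting is simplified for brevity.)
exec ssh "$PROXY" "cd '$PWD' && '$GLITE_BIN/$CMD' $*"
```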
The SPAGO approach: no middleware on the WN

Modifications on the CE
- YAIM: config_nfs_sw_dir_server, config_nfs_sw_dir_client, config_users
- Gatekeeper: lsf.pm, cleanup-grid-accounts.sh
- Information system: lcg-info-dynamic-lsf

Modifications on the WN
Worker nodes: the commands that should have been executed on the WN have been substituted by wrappers on the shared filesystem that invoke a remote execution on the Proxy Worker Node.

Figure 1: CE & WN layout for the standard site.
Figure 2: CE & WN layout for the SPAGO architecture.

Tested implementations
- Shared filesystems: NFS; AFS (requires an additional modification of the CE due to authentication issues); GPFS (in progress)
- Resource dispatchers: LSF Multicluster (v6.2 and 7.0); script over SSH; PBS (under investigation)
- Worker Node architectures: non-standard Linux; AIX 5.3 (in production); IRIX; Altix 350 (RH 3, 32 CPUs); Cray XD1 (SuSE 9, 24 CPUs); Mac OS X 10.4

FOCUS OF THE DEMO
1) We show how a Worker Node whose architecture and operating system are not explicitly supported by gLite can still be integrated into EGEE. The demo summarizes the steps to integrate a generic UNIX machine into the grid, and job submission is demonstrated to AIX, Altix, IRIX, Cray (SuSE) and Mac OS X worker nodes.
2) We show how jobs submitted by users for a specific, non-standard platform are automatically redirected to the proper Worker Nodes.
3) We present a user application (POLY-SPAN), compatible with many different platforms not supported by gLite, and show how it can run on the non-standard worker nodes presented above.

The activity has been supported by the ENEA-GRID and CRESCO team: P. D'Angelo, D. Giammattei, M. De Rosa, S. Pierattini, G. Furini, R. Guadagni, F. Simoni, A. Perozziello, A. De Gaetano, S. Pecoraro, A. Funel, S. Raia, G. Aprea, U. Ferrara, F. Prota, D. Novi, G. Guarnieri.