INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org CREAM: a WebService based CE Massimo Sgaravatto INFN Padova On behalf of the JRA1 IT-CZ Padova group.


INFSO-RI Enabling Grids for E-sciencE CREAM: a WebService based CE Massimo Sgaravatto INFN Padova On behalf of the JRA1 IT-CZ Padova group

Enabling Grids for E-sciencE INFSO-RI Catania, January NA4 generic application meeting

CREAM
CREAM service: Computing Resource Execution And Management service
Simple, lightweight service for job management operations at the Computing Element (CE) level
Web service interface
  – WS-I compliant
Goals
  – Trying to address some of the problems/missing functionality of the current implementations, based on input and feedback from users and admins
  – Easy support and maintenance
  – Trying to stick to emerging standards
    • Service-oriented architecture
Implemented and maintained by the Padova group of the EGEE JRA1 IT-CZ cluster
  – Same team developing and maintaining the CEMon service

CREAM usage scenario
CREAM can be invoked:
  – By a generic client (e.g. an end user wishing to interact directly with the CE)
  – Through the gLite WMS
[Diagram labels: WMS; direct job submission; job submission through the WMS; CREAM]

CREAM: functionality
Job submission
  – Submission of jobs to a CREAM based CE
  – Also includes staging of input sandbox files
    • Support for ‘scattered’ sandboxes, as in the WMProxy
  – The operation is actually split into two operations (job register and job start), as done in the WMProxy
  – Job characteristics described via a JDL (Job Description Language) expression
    • CREAM JDL is basically the same JDL used by the gLite WMS
  – Supported job types
    • Simple, batch jobs
    • MPI jobs, as supported by the gLite WMS
      ◦ Open to revising the current approach if it is not considered satisfactory, e.g. if always using mpirun is not considered appropriate
    • Bulk jobs (parametric jobs, job collections, as defined in Fabrizio’s presentation) planned for the near future
      ◦ Efficient transfer and management of the sandbox: the jobs of a collection very often share some sandbox files, and it doesn’t make sense to transfer these files multiple times
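The point about shared sandbox files in job collections can be illustrated with a minimal Python sketch. It is not CREAM code: the function name `plan_sandbox_transfers`, the job identifiers and the file paths are all hypothetical, and the sketch only shows the idea of transferring each distinct file once.

```python
def plan_sandbox_transfers(collection):
    """Return every distinct sandbox file once, in first-seen order.

    `collection` maps a (hypothetical) job id to its list of input
    sandbox files.
    """
    seen = set()
    transfers = []
    for files in collection.values():
        for path in files:
            if path not in seen:
                seen.add(path)
                transfers.append(path)
    return transfers


# Hypothetical collection: three jobs sharing one common archive.
collection = {
    "job1": ["common/app.tar.gz", "job1/input.dat"],
    "job2": ["common/app.tar.gz", "job2/input.dat"],
    "job3": ["common/app.tar.gz", "job3/input.dat"],
}

naive_count = sum(len(files) for files in collection.values())
planned = plan_sandbox_transfers(collection)
print(naive_count, len(planned))
```

With per-job staging the shared archive would be sent three times; the planned list contains it once.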

CREAM functionality
Proxy delegation
  – Possibility to automatically delegate a proxy for each job submission
  – Possibility to delegate a proxy once and then use it for multiple job submissions
    • Recommended approach with respect to performance, since proxy delegation can be “expensive”
  – Same approach used in WMProxy
Job cancellation
  – To cancel previously submitted jobs
Job status
  – To get the status and other info (e.g. creation/submission/start of execution/job completion times, worker node, failure reason, etc.) of submitted jobs
  – Also possible to apply filters on submission time and/or job status
Job list
  – To get the identifiers of all your jobs
Job suspension and job resume
  – To hold and then restart jobs
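The job status filters mentioned above (on submission time and/or job status) could look roughly like the following Python sketch. The `JobInfo` record, the `filter_jobs` function and the status names used here are assumptions made for illustration; they are not the CREAM interface.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobInfo:
    # Hypothetical record; the real service returns more fields.
    job_id: str
    status: str
    submitted: datetime


def filter_jobs(jobs, statuses=None, since=None, until=None):
    """Return ids of jobs matching the optional status/time filters."""
    selected = []
    for job in jobs:
        if statuses is not None and job.status not in statuses:
            continue
        if since is not None and job.submitted < since:
            continue
        if until is not None and job.submitted > until:
            continue
        selected.append(job.job_id)
    return selected


# Illustrative data only; status names are placeholders.
jobs = [
    JobInfo("j1", "RUNNING", datetime(2006, 1, 10, 9, 0)),
    JobInfo("j2", "DONE", datetime(2006, 1, 10, 12, 0)),
    JobInfo("j3", "RUNNING", datetime(2006, 1, 11, 8, 0)),
]

running = filter_jobs(jobs, statuses={"RUNNING"})
recent_running = filter_jobs(
    jobs, statuses={"RUNNING"}, since=datetime(2006, 1, 11)
)
```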

CREAM functionality
Job purge
  – To clear a job from a CREAM based CE
  – Can be explicitly called by the client, or can be called via a cron job (e.g. to clean old jobs)
Operations planned but not yet implemented
  – Job signal
    • To send a signal to a currently running job
  – Job assess
    • To assess how “good” a specific CE is for a given job
      ◦ E.g. how many/which resources on that CE are good for that job
      ◦ E.g. estimated time before the job starts its execution
    • To be further discussed

CREAM interfaces
C++ CLI user interface
  – Very similar to, and “homogeneous” with, the WMProxy CLI
    • We try to stay synchronized, also at WSDL level
Java client also available
  – Implemented as requested by the EU funded GRIDCC project
    • Integrated in their portal
CREAM C++ API available
  – Used by the C++ CLI and by ICE (see later)
  – We will make these APIs public soon
    • When we feel that they are stable enough
Of course it is possible to implement a “custom” user interface, using the CREAM WSDL

CREAM: some internal details
Web service application
  – Implemented in Java
  – Use of Axis
  – Use of Tomcat as application server
Job management requests are saved in a ‘journal manager’
A pool of threads serves in parallel the requests saved in this journal manager
  – The number of threads is configurable
Interface with the underlying resource management system
  – “Hidden” by a proper “abstraction layer”
  – Implemented via BLAH
    • Already used in the current EGEE Condor based CE
  – BLAH manages job management operations on behalf of CREAM
  – BLAH notifies CREAM about job status changes
    • Very good performance in promptly detecting job status changes
  – All batch systems supported by BLAH (currently LSF and PBS/Torque) are automatically supported by CREAM
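The journal-plus-thread-pool arrangement described above can be sketched as follows. CREAM itself is implemented in Java; this Python sketch only illustrates the pattern, with an in-memory queue standing in for the persistent journal. The function `serve_journal`, the request tuples and the handler are hypothetical.

```python
import queue
import threading


def serve_journal(requests, handler, num_threads=4):
    """Serve journal entries with a configurable pool of worker threads."""
    journal = queue.Queue()  # stand-in for the persistent journal
    for request in requests:
        journal.put(request)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                request = journal.get_nowait()
            except queue.Empty:
                return  # nothing left to serve
            outcome = handler(request)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Hypothetical requests: (operation, job id) pairs.
requests = [("submit", "j1"), ("cancel", "j2"), ("status", "j3")]
served = serve_journal(requests, lambda r: f"{r[0]}:{r[1]}", num_threads=2)
print(sorted(served))
```

The completion order depends on thread scheduling, which is why the usage line sorts the results before printing.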

CREAM: some internal details
Security
  – CREAM security architecture follows the guidelines of the global EGEE security architecture, relying on the official tools provided by the EGEE security group (JRA3)
  – Authentication
    • PKI based infrastructure
    • X.509 certificates
  – Authorization
    • Ban/allow (grid-mapfile) PDP currently used
    • VOMS PDP (recently released) being integrated
    • Integration with G-PBOX planned
      ◦ For policy enforcement
    • A user can manage (e.g. cancel, monitor) only her own jobs
      ◦ But it is possible to define CE admins, who can also manage jobs submitted by other users
  – Proxy delegation
    • Using the official EGEE port delegation stuff
  – Credential mapping
    • To map Grid credentials to local accounts
    • Implemented via glexec (JRA3 tool)
      ◦ glexec uses LCMAPS and LCAS
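The authorization rules listed above (ban/allow lists, users limited to their own jobs, CE admins able to manage any job) can be written down as a small decision function. This is a sketch of the rules as stated on the slide, not the actual PDP implementation; the function `may_manage_job`, the distinguished names, and the assumption that admins must also be on the allow list are all illustrative.

```python
def may_manage_job(user_dn, job_owner_dn, banned, allowed, admins):
    """Decide whether `user_dn` may manage (e.g. cancel) a given job."""
    if user_dn in banned:
        return False  # banned users are always rejected
    if user_dn not in allowed:
        return False  # only users on the allow list get in
    # Owners manage their own jobs; CE admins manage anyone's.
    return user_dn == job_owner_dn or user_dn in admins


# Hypothetical distinguished names.
alice = "/C=IT/O=Example/CN=Alice"
bob = "/C=IT/O=Example/CN=Bob"
carol = "/C=IT/O=Example/CN=Carol (CE admin)"
mallory = "/C=IT/O=Example/CN=Mallory"

banned = {mallory}
allowed = {alice, bob, carol, mallory}
admins = {carol}

print(may_manage_job(alice, alice, banned, allowed, admins))
print(may_manage_job(bob, alice, banned, allowed, admins))
print(may_manage_job(carol, alice, banned, allowed, admins))
```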

CREAM: some internal details
LCAS/LCMAPS-enabled GridFTP server deployed on the CE node
  – Used for the input sandbox transfer
  – DIME mechanisms were also explored but not considered satisfactory
    • Performance problems
    • Problems with interoperability with gSOAP
Paying attention to robustness and fault tolerance
  – E.g. trying to be crash proof, persistently saving vital information
    • E.g. job information data (job repository) saved on permanent storage
    • E.g. job management requests to be served saved on permanent storage as well
  – E.g. trying to be resilient to BLAH parser crashes

ICE
ICE: Interface to Cream Environment
Software component acting as an interface between the WMS and CREAM CEs
Daemon running on the WMS node
  – It will be investigated whether it can become a WM thread in the future
Basically has the role played by JC+LM+Condor in the submission to non-CREAM CEs
Isn’t “ICE-CREAM integration” nice?

WMS-CREAM integration
ICE takes the job management requests from its filelist
The Submitter handles job submissions, job removals and proxy renewals
  – Also job suspend and job resume, when implemented in the WMS
Failed submissions are reinserted into the WM’s filelist, as in the current implementation (JC+LM)
[Diagram labels: NS, WMProxy, FileList, WM, MM, JA, ICE FileList, CREAM, Helpers, JC+LM, Condor, Submitter, Job Status Handler, CEMon]

Job Status Changes
A thread of ICE receives notifications about job status changes from the CEMon closely coupled with the CREAM CE
  – CEMon: already used in the current gLite CE to provide CE information
As a fail-safe mechanism, another thread is needed to poll the status of jobs still alive
  – If the relevant notifications are not received via CEMon
[Diagram labels: ICE, Job Status Handler, CREAM, CEMon; notifications sent by CEMon; periodic status polling]
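The combination of pushed notifications and fail-safe polling can be sketched in Python as below. ICE is written in C++ and uses separate threads; this single-threaded sketch only shows the bookkeeping. The class layout, the `max_silence` threshold, the status names and the `poll_status` callback are assumptions, not ICE internals.

```python
FINAL_STATES = {"DONE", "CANCELLED", "ABORTED"}  # placeholder names


class JobStatusHandler:
    def __init__(self, poll_status, max_silence):
        self.poll_status = poll_status  # callable: job_id -> status
        self.max_silence = max_silence  # seconds without news before polling
        self.status = {}
        self.last_update = {}

    def track(self, job_id, now):
        self.status[job_id] = "REGISTERED"
        self.last_update[job_id] = now

    def on_notification(self, job_id, status, now):
        """Normal path: a status change pushed by the notifier."""
        self.status[job_id] = status
        self.last_update[job_id] = now

    def poll_stale(self, now):
        """Fail-safe path: poll live jobs that have been silent too long."""
        polled = []
        for job_id, seen in list(self.last_update.items()):
            if self.status[job_id] in FINAL_STATES:
                continue
            if now - seen >= self.max_silence:
                self.status[job_id] = self.poll_status(job_id)
                self.last_update[job_id] = now
                polled.append(job_id)
        return polled


# The lambda stands in for a direct status query to the CE.
handler = JobStatusHandler(poll_status=lambda job_id: "DONE", max_silence=60)
handler.track("a", now=0)
handler.track("b", now=0)
handler.on_notification("a", "RUNNING", now=50)  # "a" was heard from
polled = handler.poll_stale(now=100)             # only "b" was silent
```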

ICE: some internal details
Written in C++
Multithreaded
Fault tolerance and robustness
  – Persistent saving of vital information
  – Reliable lease based mechanisms in the submission protocol
    • General idea
      ◦ Each job has an attribute (the lease) which is basically the time to live of the job
      ◦ When the lease expires, the job is removed on both sides: CREAM and WMS
      ◦ Leases are renewed by ICE as long as ICE and CREAM can talk to each other
    • To handle failure scenarios and avoid “losing” jobs (zombies)
      ◦ ICE failures
      ◦ CREAM failures
      ◦ WMS  CREAM connection failures
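A minimal Python sketch of the lease idea follows: each side keeps an expiry time per job, renewals push it forward while the two sides can talk, and each side independently purges jobs whose lease has run out. The `LeaseTable` class, the 100-second lease and the timestamps are hypothetical and only illustrate the general idea stated above.

```python
class LeaseTable:
    """Per-side bookkeeping of job leases (times in seconds)."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expiry = {}

    def register(self, job_id, now):
        self.expiry[job_id] = now + self.lease_seconds

    def renew_all(self, now):
        for job_id in self.expiry:
            self.expiry[job_id] = now + self.lease_seconds

    def purge_expired(self, now):
        expired = sorted(j for j, t in self.expiry.items() if t <= now)
        for job_id in expired:
            del self.expiry[job_id]
        return expired


# Both sides register the same job at t=0.
cream_side = LeaseTable(lease_seconds=100)
wms_side = LeaseTable(lease_seconds=100)
for side in (cream_side, wms_side):
    side.register("job-1", now=0)

# t=50: the two sides can talk, so the lease is renewed (expiry 150).
for side in (cream_side, wms_side):
    side.renew_all(now=50)

# After t=50 the connection is lost and no further renewal happens.
early = cream_side.purge_expired(now=120)       # lease still valid
late_cream = cream_side.purge_expired(now=150)  # lease expired
late_wms = wms_side.purge_expired(now=150)      # removed on this side too
```

Because both sides apply the same rule to the same expiry time, the job disappears on both without any message being exchanged, which is what prevents zombie jobs when one side or the connection fails.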

Job Lease
[Diagram: lease handling between ICE and CREAM when CREAM crashes]
  – CREAM crashes
  – ICE tries to renew the lease
  – The lease expires and the job is removed from the WMS
  – CREAM restarts, sees that the lease has expired and purges the jobs

Status
CREAM, as described (including the known problems/missing functionality mentioned), is ready and continues to evolve
  – A CREAM certification testbed has just been deployed, where tests in a clean environment are being performed
    • See CREAM web site
  – Also stress tests and performance measurements
  – This testbed will be opened to interested users in the near future
A working ICE prototype with core functionality exists
  – Missing functionality
    • Logging to LB
    • Lease protocol
    • Proxy renewal
  – Integrated tests with the WMS needed

Some next steps
Complete ongoing developments
  – In particular support of bulk jobs
Finalize WMS-CREAM integration (ICE)
Address problems raised during the testing and certification process
JSDL support
  – JSDL: GGF based language to describe job characteristics
  – Emerging as a standard
Follow interface standardization efforts
  – Several initiatives: GGF BES, CRM initiative, Multi-Grids interoperation, OMII-Europe, …
Support of non-homogeneous CEs
  – Using the functionality that BLAH is going to offer to pass some directives to the underlying batch systems
  – Exploring matchmaking within the CREAM CE
Interactive access to jobs
  – Interactive read-only access to a running job’s environment
  – Remote ps, top, ls, cat and tail-like functionality on the Worker Node
  – Intelligent browsing of remote files: client-side hex viewer and view-like functionality that only transfers the chunks of the remote file that are needed
  – Some work already done

Other info
Please visit CREAM – ICE web site:
Software
  – CREAM and CREAM CLI sw
  – Installation and configuration guides
  – Release notes
Documentation
  – User’s guide
  – JDL specification

Conclusions
We have tried, and keep trying, to pay attention to users’ and admins’ requirements
Open to suggestions and recommendations
We think that some of the achieved/planned functionality can be really useful and can address some current problems
  – E.g. bulk job submission at CE level
First results are really encouraging
  – E.g. with respect to performance
Having full control of the software, without many external dependencies, is facilitating the process