Biomed tutorial 1 Enabling Grids for E-sciencE INFSO-RI-508833 EGEE is a project funded by the European Union under contract IST-2003-508833 JDL Flavia.

Slides:



Advertisements
Similar presentations
Workload Management David Colling Imperial College London.
Advertisements

EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Hands on Job Services.
EU 2nd Year Review – Jan – Title – n° 1 WP1 Speaker name (Speaker function and WP ) Presentation address e.g.
Workload management Owen Maroney, Imperial College London (with a little help from David Colling)
INFSO-RI Enabling Grids for E-sciencE Workload Management System and Job Description Language.
The Grid Constantinos Kourouyiannis Ξ Architecture Group.
Job Submission The European DataGrid Project Team
EGEE is funded by the European Union under contract IST Elena Slabospitskaya IHEP NA3 manager for Russia An inroduction to services provided.
INFSO-RI Enabling Grids for E-sciencE Architecture of the gLite Workload Management System Giuseppe Andronico INFN EGEE Tutorial.
E-infrastructure shared between Europe and Latin America 12th EELA Tutorial for Users and System Administrators Architecture of the gLite.
INFSO-RI Enabling Grids for E-sciencE EGEE Middleware The Resource Broker EGEE project members.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
The EDG Workload Management System – n° 1 The EDG Workload Management System.
Basic Grid Job Submission Alessandra Forti 28 March 2006.
Glite WMS overview Alessandra Forti Computing Seminar Manchester 20th November 2008.
Job Submission The European DataGrid Project Team
The gLite API – PART I Giuseppe LA ROCCA INFN Catania ACGRID-II School 2-14 November 2009 Kuala Lumpur - Malaysia.
INFSO-RI Enabling Grids for E-sciencE GILDA Praticals GILDA Tutors INFN Catania ICTP/INFM-Democritos Workshop on Porting Scientific.
INFSO-RI Enabling Grids for E-sciencE Logging and Bookkeeping and Job Provenance Services Ludek Matyska (CESNET) on behalf of the.
Computational grids and grids projects DSS,
Enabling Grids for E-sciencE Workload Management System on gLite middleware Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi (Vietnam)
DataGrid WP1 Massimo Sgaravatto INFN Padova. WP1 (Grid Workload Management) Objective of the first DataGrid workpackage is (according to the project "Technical.
INFSO-RI Enabling Grids for E-sciencE Workload Management System Mike Mineter
1 Esther Montes Prado CIEMAT 10th EELA Tutorial Madrid, Hands-on on WMS (Review and Summary)
- Distributed Analysis (07may02 - USA Grid SW BNL) Distributed Processing Craig E. Tull HCG/NERSC/LBNL (US) ATLAS Grid Software.
Job Submission The European DataGrid Project Team
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite job submission Fokke Dijkstra Donald.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Using gLite API Vladimir Dimitrov IPP-BAS “gLite middleware Application Developers.
INFSO-RI Enabling Grids for E-sciencE The gLite Workload Management System Elisabetta Molinari (INFN-Milan) on behalf of the JRA1.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Feb. 06, Introduction to High Performance and Grid Computing Faculty of Sciences,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Job Submission Fokke Dijkstra RuG/SARA Grid.
EGEE is a project funded by the European Union under contract IST Job Description Language - more control over your Job Assaf Gottlieb University.
EGEE is a project funded by the European Union under contract IST EGEE Tutorial Turin, January Job Services Emidio.
Job Management DIRAC Project. Overview  DIRAC JDL  DIRAC Commands  Tutorial Exercises  What do you have learned? KEK 10/2012DIRAC Tutorial.
WP1 WMS rel. 2.0 Some issues Massimo Sgaravatto INFN Padova.
E-infrastructure shared between Europe and Latin America 1 Workload Management System-WMS Luciano Diaz Universidad Nacional Autónoma de México - UNAM Mexico.
INFSO-RI Enabling Grids for E-sciencE Αthanasia Asiki Computing Systems Laboratory, National Technical.
Enabling Grids for E-sciencE Workload Management System on gLite middleware - commands Matthieu Reichstadt CNRS/IN2P3 ACGRID School, Hanoi.
High-Performance Computing Lab Overview: Job Submission in EDG & Globus November 2002 Wei Xing.
EGEE-0 / LCG-2 middleware Practical.
EGEE is a project funded by the European Union under contract IST WS-Based Advance Reservation and Co-allocation Architecture Proposal T.Ferrari,
INFSO-RI Enabling Grids for E-sciencE Job Submission Tutorial (material from INFN Catania)
Workload Management System Jason Shih WLCG T2 Asia Workshop Dec 2, 2006: TIFR.
1 DIRAC Job submission A.Tsaregorodtsev, CPPM, Marseille LHCb-ATLAS GANGA Workshop, 21 April 2004.
INFSO-RI Enabling Grids for E-sciencE Job Description Language (JDL) Giuseppe La Rocca INFN First gLite tutorial on GILDA Catania,
INFSO-RI Enabling Grids for E-sciencE GILDA Praticals Giuseppe La Rocca INFN – Catania gLite Tutorial at the EGEE User Forum CERN.
Further aspects of EGEE middleware components INFN, Catania EGEE is funded by the European Union under contract IST
EGEE is a project funded by the European Union under contract IST Job Description Language – How to control your Job Nadav Grossaug IsraGrid.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Practical using WMProxy advanced job submission.
EGEE 3 rd conference - Athens – 20/04/2005 CREAM JDL vs JSDL Massimo Sgaravatto INFN - Padova.
Job Submission The European DataGrid Project Team
User Interface UI TP: UI User Interface installation & configuration.
LCG2 Tutorial Viet Tran Institute of Informatics Slovakia.
Istituto Nazionale di Astrofisica Information Technology Unit INAF-SI Job with data management Giuliano Taffoni.
EGEE is a project funded by the European Union under contract IST GENIUS and GILDA Guy Warner NeSC Training Team Induction to Grid Computing.
GRID commands lines Original presentation from David Bouvet CC/IN2P3/CNRS.
Introduction to Computing Element HsiKai Wang Academia Sinica Grid Computing Center, Taiwan.
Introduction to Job Description Language (JDL) Alessandro Costa INAF Catania Corso di Calcolo Parallelo Grid Computing Catania - ITALY September.
Enabling Grids for E-sciencE Work Load Management & Simple Job Submission Practical Shu-Ting Liao APROC, ASGC EGEE Tutorial.
EGEE is a project funded by the European Union under contract IST Job Submission Giuseppe La Rocca EGEE NA4 Generic Applications INFN Catania.
Architecture of the gLite WMS
Workload Management System on gLite middleware
Workload Management System ( WMS )
EGEE tutorial, Job Description Language - more control over your Job Assaf Gottlieb Tel-Aviv University EGEE is a project.
Workload Management System
5. Job Submission Grid Computing.
The EU DataGrid Job Submission Services
Job Description Language
Job Description Language (JDL)
Job Submission M. Jouvin (LAL-Orsay)
Presentation transcript:

Biomed tutorial 1 Enabling Grids for E-sciencE INFSO-RI EGEE is a project funded by the European Union under contract IST JDL Flavia Donno

Biomed tutorial 2 Enabling Grids for E-sciencE INFSO-RI Possible job states

Biomed tutorial 3 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (1/9) Submitted: job is entered by the user to the User Interface but not yet transferred to Network Server for processing

Biomed tutorial 4 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (2/9) Waiting: job accepted by NS and waiting for Workload Manager processing or being processed by WMHelper modules.

Biomed tutorial 5 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (3/9) Ready: job processed by WM and its Helper modules (CE found) but not yet transferred to the CE (local batch system queue) via JC and CondorC. This state does not exists for a DAG as it is not subjected to matchmaking (the nodes are) but passed directly to DAGMan.

Biomed tutorial 6 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (4/9) Scheduled: job waiting in the queue on the CE. This state also does not exists for a DAG as it is not directly sent to a CE (the node are).

Biomed tutorial 7 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (5/9) Running: job is running. For a DAG this means that DAGMan has started processing it.

Biomed tutorial 8 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (6/9) Done: job exited or considered to be in a terminal state by CondorC (e.g., submission to CE has failed in an unrecoverable way).

Biomed tutorial 9 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (7/9) Aborted: job processing was aborted by WMS (waiting in the WM queue or CE for too long, over-use of quotas, expiration of user credentials).

Biomed tutorial 10 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (8/9) Cancelled: job has been successfully canceled on user request.

Biomed tutorial 11 Enabling Grids for E-sciencE INFSO-RI Jobs State Machine (9/9) Cleared: output sandbox was transferred to the user or removed due to the timeout.

Biomed tutorial 12 Enabling Grids for E-sciencE INFSO-RI Contents The EDG Workload Management System Job Preparation –Job Description Language Job submission and job status monitoring WMS Matchmaking Different job types –Normal jobs –Interactive jobs –Checkpointable jobs –Parallel jobs APIs Overview

Biomed tutorial 13 Enabling Grids for E-sciencE INFSO-RI LCG Workload Management System Workload Management System (WMS)The user interacts with Grid via a Workload Management System (WMS) The Goal of WMS is the distributed scheduling and resource management in a Grid environment. What does it allow Grid users to do? –To submit their jobs –To execute them on the “best resources”  The WMS tries to optimize the usage of resources –To get information about their status –To retrieve their output

Biomed tutorial 14 Enabling Grids for E-sciencE INFSO-RI Job Preparation Information to be specified when a job has to be submitted: –Job characteristics –Job requirements and preferences on the computing resources  Also including software dependencies –Job data requirements Information specified using a Job Description Language (JDL) –Based upon Condor’s CLASSified ADvertisement language (ClassAd)  Fully extensible language  A ClassAd Constructed with the classad construction operator [] It is a sequence of attributes separated by semi-colons. An attribute is a pair (key, value), where value can be a Boolean, an Integer, a list of strings, … o = ; So, the JDL allows definition of a set of attribute, the WMS takes into account when making its scheduling decision

Biomed tutorial 15 Enabling Grids for E-sciencE INFSO-RI Job Description Language The supported attributes are grouped in two categories: –Job Attributes  Define the job itself –Resources  Taken into account by the RB for carrying out the matchmaking algorithm (to choose the “best” resource where to submit the job)  Computing Resource Used to build expressions of Requirements and/or Rank attributes by the user Have to be prefixed with “other.”  Data and Storage resources Input data to process, SE where to store output data, protocols spoken by application when accessing SEs

Biomed tutorial 16 Enabling Grids for E-sciencE INFSO-RI JDL: Relevant attributes JobType –Normal (simple, sequential job), Interactive, MPICH, Checkpointable –Or combination of them Executable (mandatory) –The command name Arguments (optional) –Job command line arguments StdInput, StdOutput, StdError (optional) –Standard input/output/error of the job Environment –List of environment settings InputSandbox (optional) –List of files on the UI local disk needed by the job for running –The listed files will automatically staged to the remote resource OutputSandbox (optional) –List of files, generated by the job, which have to be retrieved

Biomed tutorial 17 Enabling Grids for E-sciencE INFSO-RI JDL: Relevant attributes Requirements –Job requirements on computing resources –Specified using attributes of resources published in the Information Service –If not specified, default value defined in UI configuration file is considered  Default: other.GlueCEStateStatus == "Production" (the resource has to be able to accept jobs and dispatch them on WNs) Rank –Expresses preference (how to rank resources that have already met the Requirements expression) –Specified using attributes of resources published in the Information Service –If not specified, default value defined in the UI configuration file is considered  Default: - other.GlueCEStateEstimatedResponseTime (the lowest estimated traversal time)  Default: other.GlueCEStateFreeCPUs (the highest number of free CPUs) for parallel jobs (see later)

Biomed tutorial 18 Enabling Grids for E-sciencE INFSO-RI JDL: Relevant attributes InputData –Refers to data used as input by the job: these data are published in the Replica Location Service (RLS) and stored in the SEs) –LFNs and/or GUIDs DataAccessProtocol (mandatory if InputData has been specified) –The protocol or the list of protocols which the application is able to speak with for accessing InputData on a given SE OutputSE –The Uniform Resource Identifier of the output SE –RB uses it to choose a CE that is compatible with the job and is close to SE

Biomed tutorial 19 Enabling Grids for E-sciencE INFSO-RI Example of JDL file [ JobType=“Normal”; Executable = “gridTest”; StdError = “stderr.log”; StdOutput = “stdout.log”; InputSandbox = {“home/joda/test/gridTest”}; OutputSandbox = {“stderr.log”, “stdout.log”}; InputData = {“lfn:green”, “guid:red”}; DataAccessProtocol = “gridftp”; Requirements = other.GlueHostOperatingSystemNameOpSys == “LINUX” && other.GlueCEStateFreeCPUs>=4; Rank = other.GlueCEPolicyMaxCPUTime; ]

Biomed tutorial 20 Enabling Grids for E-sciencE INFSO-RI Job Submission edg-job-submit [–r ] [-c ] [-vo ] [-o ] -r the job is submitted directly to the computing element identified by -c the configuration file is pointed by the UI instead of the standard configuration file -vo the Virtual Organization (if user is not happy with the one specified in the UI configuration file) -o the generated edg_jobId is written in the Useful for other commands, e.g.: edg-job-status –i (or edg_jobId) -i the status information about edg_jobId contained in the are displayed

Biomed tutorial 21 Enabling Grids for E-sciencE INFSO-RI Job resubmission If something goes wrong, the WMS tries to reschedule and resubmit the job (possibly on a different resource satisfying all the requirements) Maximum number of resubmissions: min(RetryCount, MaxRetryCount) –RetryCount: JDL attribute –MaxRetryCount: attribute in the “RB” configuration file E.g., to disable job resubmission for a particular job: RetryCount=0; in the JDL file

Biomed tutorial 22 Enabling Grids for E-sciencE INFSO-RI Other (most relevant) UI commands edg-job-list-match –Lists resources matching a job description –Performs the matchmaking without submitting the job edg-job-cancel –Cancels a given job edg-job-status –Displays the status of the job edg-job-get-output –Returns the job-output (the OutputSandbox files) to the user edg-job-get-logging-info –Displays logging information about submitted jobs (all the events “pushed” by the various components of the WMS) –Very useful for debug purposes

Biomed tutorial 23 Enabling Grids for E-sciencE INFSO-RI WMS APIs The WMS makes C++ and Java APIs available for UI, LB consumer and client. In the following document: details about the rpms containing the APIs are given. Correspondent doxigen documentation can be found in share/doc area. Ex.: $EDG_LOCATION/share/doc/edg-wl-ui-api-cpp-lcg2.1.49/html BrokerInfo CLI and APIs are described:

Biomed tutorial 24 Enabling Grids for E-sciencE INFSO-RI WMS APIs #include #include "edg/workload/logging/client/JobStatus.h" #include "edg/workload/common/utilities/Exceptions.h" #include "edg/workload/common/requestad/JobAd.h" #include "edg/workload/userinterface/client/Job.h" using namespace std ; using namespace edg::workload::common::utilities ; using namespace edg::workload::logging::client ; /* ************************************************************************** * Example based on edg-wl-job-submit.cpp, edg-wl-job-status.cpp * for further examples see also: bin/cvsweb.cgi/workload/userinterface/test/?cvsroot=lcgware * * author: * * Example usage on GILDA: *./workload Hello.jdl grid004.ct.infn.it 7772 grid004.ct.infn.it 9000 * */ %./workload Hello.jdl lxb0704.cern.ch 7772 lxb0704.cern.ch 9000

Biomed tutorial 25 Enabling Grids for E-sciencE INFSO-RI WMS APIs int main (int argc,char *argv[]) { cout << "Workload Management API Example " << endl; try{ if (argc < 6 || strcmp(argv[1],"--help") == 0) { cout << "Usage : " << argv[0] [ ]" << endl; return -1; } edg::workload::common::requestad::JobAd jab; jab.fromFile ( argv[1] ) ; edg::workload::userinterface::Job job(jab); job.setLoggerLevel (6) ; cout << "Submit job to " << argv[2] << ":" << argv[3] << endl; cout << "LB address: "<< argv[4] << ":" << argv[5] << endl; cout << "Please wait..." << endl; // We now submit the job. If a CE is given (argv[6]), we send it directly // to the specified CE // if (argc ==6) job.submit (argv[2], atoi(argv[3]), argv[4], atoi(argv[5]), "") ; else job.submit (argv[2], atoi(argv[3]), argv[4], atoi(argv[5]), argv[6] ) ; cout << "Job Submission OK; JobID= " toString() << endl << flush ; The JobAd class provides users with management operations on JDL files We instantiate a Job object that corresponds to our JDL file and handles our job

Biomed tutorial 26 Enabling Grids for E-sciencE INFSO-RI WMS APIs // Print some detailed error information in case the job did not // succeed. // if ((status.status == 8) || (status.status == 9)) { printStatus(status); exit(-1); } // Now that the job has successfully finished, we retrieve the output // string outputDir = "/tmp"; job.getOutput(outputDir); cout << "\nThe output has been retrieved and stored in the directory " << outputDir << endl; return 0; } catch (Exception &exc){ cerr << "\nWMS Error\n"; cerr << exc.printStackTrace(); } return -1; } The job finished successfully. We can retrieve the output. The job finished successfully. We can retrieve the output.

Biomed tutorial 27 Enabling Grids for E-sciencE INFSO-RI WMS APIs C = gcc GLOBUS_FLAVOR = gcc32 ARES_LIBS = -lares BOOST_LIBS = -L/opt/boost/gcc-3.2.2/lib/release -lboost_fs \ -lboost_thread -lpthread -lboost_regex CLASSAD_LIBS = -L/opt/classads/gcc-3.2.2/lib -lclassad EXPAT_LIBS = -lexpat GLOBUS_THR_LIBS = -L/opt/globus/lib -lglobus_gass_copy_gcc32dbgpthr \ -lglobus_ftp_client_gcc32dbgpthr -lglobus_gass_transfer_gcc32dbgpthr \ -lglobus_ftp_control_gcc32dbgpthr -lglobus_io_gcc32dbgpthr \ -lglobus_gss_assist_gcc32dbgpthr -lglobus_gssapi_gsi_gcc32dbgpthr \ -lglobus_gsi_proxy_core_gcc32dbgpthr \ -lglobus_gsi_credential_gcc32dbgpthr \ -lglobus_gsi_callback_gcc32dbgpthr -lglobus_oldgaa_gcc32dbgpthr \ -lglobus_gsi_sysconfig_gcc32dbgpthr \ -lglobus_gsi_cert_utils_gcc32dbgpthr \ -lglobus_openssl_gcc32dbgpthr -lglobus_proxy_ssl_gcc32dbgpthr \ -lglobus_openssl_error_gcc32dbgpthr -lssl_gcc32dbgpthr \ -lcrypto_gcc32dbgpthr -lglobus_common_gcc32dbgpthr GLOBUS_COMMON_THR_LIBS = -L/opt/globus/lib -L/opt/globus/lib \ -lglobus_common_gcc32dbgpthr GLOBUS_SSL_THR_LIBS = -L/opt/globus/lib -L/opt/globus/lib \ -lssl_gcc32dbgpthr -lcrypto_gcc32dbgpthr VOMS_CPP_LIBS = -L/opt/edg/lib -lvomsapi_gcc32dbgpthr all: workload workload: workload.o $(CC) -o workload \ -L${EDG_LOCATION}/lib -ledg_wl_common_requestad \ –lpthread \ -ledg_wl_userinterface_client \ -ledg_wl_exceptions -ledg_wl_logging \ -ledg_wl_loggingpp \ -ledg_wl_globus_ftp_util -ledg_wl_util \ -ledg_wl_common_requestad \ -ledg_wl_jobid -ledg_wl_logger -ledg_wl_gsisocket_pp \ -ledg_wl_checkpointing -ledg_wl_ssl_helpers \ -ledg_wl_ssl_pthr_helpers \ $(VOMS_CPP_LIBS) \ $(CLASSAD_LIBS) $(EXPAT_LIBS) $(ARES_LIBS) \ $(BOOST_LIBS) \ $(GLOBUS_THR_LIBS) \ $(GLOBUS_COMMON_THR_LIBS) \ $(GLOBUS_SSL_THR_LIBS) \ workload.o workload.o: workload.cpp $(CC) -I ${EDG_LOCATION}/include \ -I/opt/classads/gcc-3.2.2/include -c workload.cpp clean: rm -rf workload workload.o Makefile