1
gLite Job Management Amina KHEDIMI (a.khedimi@dtri.cerist.dz) CERIST
18/09/2018 Africa 6 - Joint CHAIN/EPIKH/EUMEDGRID Support event in School on Application Porting, Rabat, June 6, 2011
2
Outline
- Key Elements
- Job Description Language
- Job Life Cycle
- Submission and Management of a Job
- Hands-on
3
Key Elements
To submit a job to the gLite infrastructure, users contact the Workload Management System (WMS).
The WMS is the gLite component that lets users submit jobs and performs all the tasks required to execute them, without exposing the user to the complexity of the Grid.
WMProxy is the service that provides access to the WMS through Web Services based interfaces. It can be accessed through its published WSDL (Web Services Description Language) and follows a SOA (Service-Oriented Architecture) design.
4
Key Elements
The Job Description Language (JDL) is the language used to describe a job. Users have to describe their jobs and the jobs' requirements, and retrieve the output when the jobs are finished.
The Command Line Interface (CLI) is a suite of gLite commands used to interact with the WMS.
5
gLite Architecture
6
Job Flow
7
Job Flow
Job management requests (submission, cancellation) are expressed via a Job Description Language (JDL).
8
Job Flow
Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, and utilization policies on resources.
9
Job Flow
Keeps submission requests. Requests are kept for a while if no resources are immediately available.
10
Job Flow
Repository of resource information available to the matchmaker. Updated via notifications and/or active polling on resources.
11
Job Flow
The Logging and Bookkeeping (LB) service:
- stores the events generated by the various components of the WMS;
- can be queried by the user to retrieve information about the status of a job.
12
Job Description Language
The Job Description Language (JDL) is a high-level language based on the Classified Advertisement (ClassAd) language, used to describe jobs and aggregates of jobs with arbitrary dependency relations.
A job description is a file (called a JDL file) consisting of lines of the form:
attribute = expression;
An expression can span several lines, but only the last line is terminated by a semicolon.
Literal strings are enclosed in double quotes. If a string itself contains double quotes, they must be escaped with a backslash (e.g. Arguments = "\"hello\" 10";).
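The syntax rules above (one "attribute = expression;" entry per attribute, strings in double quotes, embedded double quotes escaped with a backslash) can be illustrated with a small generator. This is only a sketch: the helper names jdl_value and render_jdl are hypothetical, and the mapping from Python lists to JDL "{...}" lists and from Python booleans to ClassAd true/false is an assumption made for the example, not part of any gLite tool.

```python
def jdl_value(value):
    """Render a Python value as a JDL expression (assumed mapping)."""
    if isinstance(value, bool):
        # ClassAd boolean literals
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        # Lists are written between braces, e.g. InputSandbox = {"a", "b"};
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    # Strings: wrap in double quotes and escape embedded double quotes.
    return '"' + str(value).replace('"', '\\"') + '"'


def render_jdl(attributes):
    """Return JDL text with one 'attribute = expression;' line per entry."""
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


if __name__ == "__main__":
    print(render_jdl({
        "Executable": "my_exe",
        "Arguments": '"hello" 10',
        "InputSandbox": ["/home/amina/my_exe"],
    }))
```

The Arguments entry reproduces the escaping example given on the slide.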
13
Simple example
Type = "Job";
JobType = "Normal";
Executable = "my_exe";
StdInput = "myinput.txt";
StdOutput = "message.txt";
StdError = "error.txt";
InputSandbox = {"myinput.txt", "/home/user/example/my_exe"};
OutputSandbox = {"message.txt", "error.txt"};
14
Type
The JDL allows the description of the following types of requests (the ones supported by the WMS):
- Job, a simple job (default)
- DAG, a Directed Acyclic Graph of dependent jobs
- Collection, a set of independent jobs
Although DAGs and collections represent sets of jobs, they are described through a single JDL file and submitted in one shot to the WMS.
15
JobType
This attribute is a string representing the type of the job described by the JDL. Possible values are:
- Normal (default)
- MPICH (deprecated)
- Parametric
This attribute only makes sense when the Type attribute is equal to "Job".
16
Executable
The Executable attribute specifies the command to be run by the job.
If the executable already resides on the worker nodes of the remote CE, it must be expressed as an absolute path:
Executable = "/usr/local/java/bin/java";
or
Executable = "$JAVA_HOME/bin/java";
If it has to be copied from the UI, only the file name must be specified, and the path of the command on the UI should be given in the InputSandbox attribute:
Executable = "my_exe";
InputSandbox = {"/home/amina/my_exe"};
17
Arguments
The Arguments attribute contains a string value, which is taken as the argument list for the executable.
Special characters, such as &, |, \, < and >, should be preceded by a triple \ or specified inside quoted strings.
Executable = "my_exe";
Arguments = "-i args_input -o file_out";
InputSandbox = {"/home/of/UI/my_exe", "args_input"};
18
StdInput
This attribute is a string representing the standard input of the executable. This means that the job is run using bash redirection, as follows:
job < standard_input
StdInput = "my_job_input";
19
StdOutput and StdError
The StdOutput and StdError attributes define the names of the files containing the standard output and standard error of the executable, once the job output is retrieved.
StdOutput = "std.out";
StdError = "std.err";
20
InputSandbox
This attribute specifies the list of files on the UI local file system (or on an accessible GridFTP server) that the job needs in order to run. These files are transferred from the UI to the WMS, and then downloaded to the WN.
The InputSandbox cannot contain two files with the same name, even if they have different absolute paths, because they would overwrite each other when transferred.
InputSandbox = {"file_1", ..., "file_N"};
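The rule that two InputSandbox entries may not share a file name can be checked before submission. The following is a minimal sketch, assuming POSIX-style paths; duplicate_sandbox_names is a hypothetical helper written for this example and is not part of the gLite CLI.

```python
import posixpath
from collections import Counter


def duplicate_sandbox_names(paths):
    """Return the file names that occur more than once in a sandbox list.

    Only the last path component is compared, because files with the same
    name would overwrite each other once transferred.
    """
    counts = Counter(posixpath.basename(path) for path in paths)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    sandbox = ["/home/amina/my_exe", "/tmp/build/my_exe", "args_input"]
    print(duplicate_sandbox_names(sandbox))
```

An empty result means that the list satisfies the rule; the same check applies to the OutputSandbox list.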
21
InputSandboxBaseURI
A new feature introduced by the gLite WMS is the possibility to indicate input sandbox files that are not stored on the UI but on a GridFTP server and, similarly, to specify that output files should be transferred to a GridFTP server when the job finishes.
InputSandbox = {"file_1", ..., "file_N"};
InputSandboxBaseURI = "gsiftp://ipaddress.of.gsiFT.server:432/tmp";
InputSandboxBaseURI represents the URI on a GridFTP server where the InputSandbox files (given as absolute or relative paths) are available for transfer to the WNs.
22
OutputSandbox
This attribute represents the list of output files, generated at runtime by the executable, to be transferred back to the UI after the job has finished.
File names can be given as simple file names or as paths relative to the current working directory on the WN.
The list should NOT contain two or more files with the same name, because they would overwrite each other when transferred to the WMS machine.
OutputSandbox = {"out_1", ..., "out_N"};
23
OutputSandboxDestURI
To store output sandbox files on a GridFTP server at job completion, the OutputSandboxDestURI attribute must be used together with the usual OutputSandbox attribute:
OutputSandbox = {"fileA", "data/fileB", "fileC"};
OutputSandboxDestURI = {"gsiftp://lxb0707.cern.ch/cms/doe/fileA", "gsiftp://lxb0707.cern.ch/cms/doe/fileB", "fileC"};
Here the first two files are copied to a GridFTP server, while the third file is copied back to the WMS with the usual mechanism. Consequently, glite-wms-job-output retrieves only the third file.
24
OutputSandboxBaseDestURI
Another possibility is to use the OutputSandboxBaseDestURI attribute to specify a base URI on a GridFTP server where the files listed in OutputSandbox will be copied:
OutputSandbox = {"fileA", "fileB"};
OutputSandboxBaseDestURI = "gsiftp://ipaddress.of.the.server:5432/tmp";
This copies both files under the specified GridFTP URI.
Note: the directory on the GridFTP server where the files are to be copied must already exist.
25
OutputSandbox notes
- The OutputSandboxDestURI and OutputSandboxBaseDestURI attributes cannot be specified together in the same JDL.
- The OutputSandboxDestURI list must have the same cardinality as the OutputSandbox list.
- If neither OutputSandboxDestURI nor OutputSandboxBaseDestURI is specified, all the files listed in the OutputSandbox will be available on the WMS node for retrieval.
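The first two notes are mechanical consistency rules, so they can be checked on a job description before it is submitted. The sketch below assumes the job description is held in a plain Python dict keyed by attribute name; check_output_sandbox is a hypothetical helper written for this example, not a gLite function.

```python
def check_output_sandbox(jdl):
    """Return a list of problems found in the output sandbox attributes."""
    errors = []
    has_dest = "OutputSandboxDestURI" in jdl
    has_base = "OutputSandboxBaseDestURI" in jdl

    # The two destination attributes are mutually exclusive.
    if has_dest and has_base:
        errors.append(
            "OutputSandboxDestURI and OutputSandboxBaseDestURI "
            "cannot be specified together"
        )

    # One destination entry is required per OutputSandbox entry.
    if has_dest:
        n_files = len(jdl.get("OutputSandbox", []))
        n_dests = len(jdl["OutputSandboxDestURI"])
        if n_files != n_dests:
            errors.append(
                f"OutputSandboxDestURI has {n_dests} entries, "
                f"OutputSandbox has {n_files}"
            )
    return errors


if __name__ == "__main__":
    job = {
        "OutputSandbox": ["fileA", "data/fileB", "fileC"],
        "OutputSandboxDestURI": [
            "gsiftp://lxb0707.cern.ch/cms/doe/fileA",
            "gsiftp://lxb0707.cern.ch/cms/doe/fileB",
            "fileC",
        ],
    }
    print(check_output_sandbox(job))
```

An empty list means that the attributes are consistent with the notes above.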
26
Environment and VirtualOrganisation
The Environment attribute is a list of strings representing environment settings on the WN that the job needs in order to run.
The VirtualOrganisation attribute can be used to specify the VO of the user explicitly.
Environment = {"JOB_LOG_FILE=/tmp/job.log", "JAVABIN=/usr/local/bin/java"};
VirtualOrganisation = "gilda";
27
RetryCount
It is possible to have the WMS automatically resubmit jobs which, for some reason, are aborted by the Grid. The user can limit the number of times the WMS should resubmit a job by using the JDL attribute RetryCount.
RetryCount = 5;
28
Requirements
The Requirements attribute can be used to express constraints on the resources where the job should run. Its value is a Boolean expression that must evaluate to true for a job to run on a specific CE.
Note: only one Requirements attribute can be specified (if there is more than one, only the last one is considered). If several conditions must be applied to the job, they must all be combined in a single Requirements attribute.
For example, suppose that the user wants to run on a CE that uses PBS as its batch system and whose WNs have at least two CPUs. The job description file would then contain:
Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 1;
29
Requirements
The WMS can also be asked to send a job to a particular queue on a CE with the following expression:
Requirements = other.GlueCEUniqueID == "lxshare0286.cern.ch:2119/jobmanager-pbs-short";
It is also possible to use regular expressions when expressing a requirement. Suppose, for example, that the user wants all jobs to run on any CE in the domain cern.ch. This can be achieved by putting the following expression in the JDL file:
Requirements = RegExp("cern.ch", other.GlueCEUniqueID);
The opposite can be required by using:
Requirements = (!RegExp("cern.ch", other.GlueCEUniqueID));
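Because only one Requirements attribute is taken into account, several conditions have to be merged into a single Boolean expression. A minimal sketch of that merge is shown here; combine_requirements is a hypothetical helper, and wrapping each condition in parentheses is a choice made for this example to keep operator precedence unambiguous.

```python
def combine_requirements(conditions):
    """Join several Boolean conditions into one Requirements attribute."""
    if not conditions:
        raise ValueError("at least one condition is required")
    # Parenthesise each condition so that '&&' binds as intended.
    expression = " && ".join(f"({condition})" for condition in conditions)
    return f"Requirements = {expression};"


if __name__ == "__main__":
    print(combine_requirements([
        'other.GlueCEInfoLRMSType == "PBS"',
        "other.GlueCEInfoTotalCPUs > 1",
        'RegExp("cern.ch", other.GlueCEUniqueID)',
    ]))
```

The conditions in the usage example are the ones shown on the Requirements slides.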
30
Job Description Language
- The single quote character (') cannot be used in the JDL.
- Comments must be preceded by a sharp character (#) or a double slash (//) at the beginning of each line.
- Multi-line comments must be enclosed between "/*" and "*/".
Attention! The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.
31
Job Life Cycle
32
Job submission
Initialisation of the proxy:
Before using the glite-wms-job-* commands, the user must set up a short-term proxy.
~]$ voms-proxy-init --voms eumed
Enter GRID pass phrase:
Your identity: /C=IT/O=INFN/OU=Personal Certificate/L=DZ-eScience/CN=Khedimi Amina
Creating temporary proxy Done
Contacting voms2.cnaf.infn.it:15016 [/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it] "eumed" Done
Creating proxy Done
Your proxy is valid until Mon Jun 6 00:16:
33
Delegating a proxy to WMProxy
Each job submitted to WMProxy must be associated with a proxy credential previously delegated by the owner of the job to the WMProxy server. This proxy is then used whenever WMProxy needs to interact with other services for job-related operations (e.g. submission to the CE, a GridFTP file transfer, etc.).
There are two possible mechanisms for delegating the user credentials:
- asking for "automatic" delegation of the credentials during the submission operation;
- asking for an "explicit" delegation.
34
Delegating a proxy to WMProxy
To delegate a user proxy to WMProxy automatically, the command to use is:
glite-wms-job-delegate-proxy -a
To delegate a user proxy to WMProxy explicitly, the command to use is:
glite-wms-job-delegate-proxy -d <delegID>
where <delegID> is a string chosen by the user. For example, to delegate a proxy:
~]$ glite-wms-job-delegate-proxy -d amina
Connecting to the service
================== glite-wms-job-delegate-proxy Success ==================
Your proxy has been successfully delegated to the WMProxy(s):
with the delegation identifier: amina
==========================================================================
Instead of creating a delegation ID with -d, the -a option can be used. This causes a delegated proxy to be established automatically, so there is no delegation identifier to remember. However, repeated use of this option is not recommended, since it delegates a new proxy each time the commands are issued. Delegation is a time-consuming operation, so it is better to run glite-wms-job-delegate-proxy once and reuse the delegation ID when submitting jobs.
35
Matching computing elements
It is possible to see which CEs are suitable to run a job:
glite-wms-job-list-match <JDL file>
Options:
-a Automatic delegation
-d <dID> Use a proxy that was explicitly delegated beforehand
-o <file> Store the CE list in a file
[ ~]$ glite-wms-job-list-match -a hostname.jdl
Connecting to the service
==========================================================================
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
- cccreamceli09.in2p3.fr:8443/cream-sge-long
- cccreamceli09.in2p3.fr:8443/cream-sge-medium
- cccreamceli09.in2p3.fr:8443/cream-sge-short
- ce0.m3pec.u-bordeaux1.fr:2119/jobmanager-pbs-eumed
- ce01.isabella.grnet.gr:2119/jobmanager-pbs-eumed
- cream.sns.it:8443/cream-pbs-grid
-
==========================================================================
36
Submitting a simple job
Starting from a simple JDL file, we can submit it via WMProxy with:
$ glite-wms-job-submit -d mydelegID -r <CEId> test.jdl
Options:
-a Automatic delegation
-d <dID> Use a proxy that was explicitly delegated beforehand (either -a or -d must be used)
-o <file> Append the jobId to the specified file (creating it if necessary)
-r <CE> Send the job directly to a particular CE, without checking the CE for suitability or creating a BrokerInfo file
~]$ glite-wms-job-submit -d amina hostname.jdl
Connecting to the service
====================== glite-wms-job-submit Success ======================
The job has been successfully submitted to the WMProxy
Your job identifier is:
==========================================================================
With automatic delegation:
glite-wms-job-submit -a test.jdl
The command returns the job identifier (jobID), which uniquely identifies the job and can be used to perform further operations on it, such as querying the system about its status or cancelling it.
The format of the jobID is:
37
Retrieving the status of a job
~]$ glite-wms-job-status
======================= glite-wms-job-status Success =====================
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Done (Success)
Exit code:
Status Reason: Job terminated successfully
Destination: lpsc-ce.in2p3.fr:2119/jobmanager-pbs-eumed
Submitted: Sun Jun 5 14:14: CET
==========================================================================
The verbosity level controls the amount of information provided. The value of the -v option ranges from 0 to 3.
The commands to get the job status can take several jobIDs as arguments, or the -i <file path> option can be used:
glite-wms-job-status -i jobid
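When the status is checked from a script, the interesting field is the "Current Status" entry of the report. The sketch below extracts it from captured text. It assumes that each field is printed on its own line, as in the listing above; the sample text is modelled on that listing and is not real command output, and parse_current_status is a hypothetical helper.

```python
import re

# Sample modelled on the slide's listing (an assumption, not captured output).
SAMPLE_OUTPUT = """\
BOOKKEEPING INFORMATION:
Current Status:     Done (Success)
Status Reason:      Job terminated successfully
Destination:        lpsc-ce.in2p3.fr:2119/jobmanager-pbs-eumed
"""


def parse_current_status(output):
    """Return the text after 'Current Status:', or None if it is absent."""
    match = re.search(
        r"^[ \t]*Current Status:[ \t]*(.+?)[ \t]*$", output, re.MULTILINE
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    print(parse_current_status(SAMPLE_OUTPUT))
```

A value of None signals that the report did not contain the field, for example because the command failed.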
38
Cancelling a job
~]$ glite-wms-job-cancel
Are you sure you want to remove specified job(s) [y/n]y : y
Connecting to the service
============================= glite-wms-job-cancel Success =============================
The cancellation request has been successfully submitted for the following job(s):
-
===================================================================================
If the cancellation is successful, the job terminates in status CANCELLED.
39
Retrieving the output(s)
~]$ glite-wms-job-output
Connecting to the service
================================================================================
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/amina_P1wT0EB-b-6opI0BHuOWkQ
The default location for storing the outputs (normally /tmp) is defined in the UI configuration, but the directory in which to save the output can be specified with the --dir <path name> option:
glite-wms-job-output -i jobId --dir /path
40
Jobs State Machine (1/9)
Submitted: the job has been entered by the user at the User Interface.
41
Jobs State Machine (2/9)
Waiting: the job has been accepted and is waiting for Workload Manager processing.
42
Jobs State Machine (3/9)
Ready: the job has been processed by the WM but not yet transferred to the CE (local batch system queue).
43
Jobs State Machine (4/9)
Scheduled: the job is waiting in the queue on the CE.
44
Jobs State Machine (5/9)
Running: the job is running.
45
Jobs State Machine (6/9)
Done: the job has exited, or is considered to be in a terminal state by CondorC (e.g. submission to the CE has failed in an unrecoverable way).
46
Jobs State Machine (7/9)
Aborted: job processing was aborted by the WMS (waiting in the WM queue or on the CE for too long, expiration of user credentials).
47
Jobs State Machine (8/9)
Cancelled: the job has been successfully cancelled on user request.
48
Jobs State Machine (9/9)
Cleared: the output sandbox was transferred to the user or removed due to the timeout.
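The nine states can be collected in an enumeration, which is convenient when a script has to decide what to do after a status check. This is a sketch under stated assumptions: the JobState and next_action names are hypothetical, and treating Aborted, Cancelled and Cleared as the states after which nothing further happens is an interpretation of the descriptions above, not a transition table taken from the slides.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "Submitted"
    WAITING = "Waiting"
    READY = "Ready"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    DONE = "Done"
    ABORTED = "Aborted"
    CANCELLED = "Cancelled"
    CLEARED = "Cleared"


# Assumption: no further change is expected once a job is in one of these.
STOP_STATES = {JobState.ABORTED, JobState.CANCELLED, JobState.CLEARED}


def next_action(state):
    """Suggest what a polling script should do for a given job state."""
    if state is JobState.DONE:
        return "retrieve output"
    if state in STOP_STATES:
        return "stop"
    return "keep polling"


if __name__ == "__main__":
    for state in JobState:
        print(f"{state.value}: {next_action(state)}")
```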
49
...a useful reminder
50
Summary
Jobs run in batch mode on traditional gLite grids.
Steps in running a job on a gLite grid with WMS:
- Create a text file in the Job Description Language (JDL)
- Create a proxy
- Optional check: list the computing elements that match your requirements ("glite-wms-job-list-match myfile.jdl" command)
- Submit the job ("glite-wms-job-submit myfile.jdl" command). Submission is non-blocking; each job is given an ID.
- Occasionally check the status of your job ("glite-wms-job-status" command)
- When the status is "Done", retrieve the output ("glite-wms-job-output" command)
- Or just cancel the job ("glite-wms-job-cancel" command)
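The steps above can be sketched as an ordered list of command lines. The function only builds the argument lists and prints them; it does not run anything, so it can be tried without a Grid UI. The commands and options are the ones shown in this presentation; workflow_commands and its parameters are hypothetical names for this example, the default VO "eumed" is taken from the proxy slide and may differ at your site, and shlex.join assumes Python 3.8 or later.

```python
import shlex


def workflow_commands(jdl_file, deleg_id, jobid_file, output_dir, vo="eumed"):
    """Return the CLI invocations for one job, in the order they are used."""
    return [
        # 1. Create a short-term proxy for the VO.
        ["voms-proxy-init", "--voms", vo],
        # 2. Delegate the proxy once and reuse the delegation ID afterwards.
        ["glite-wms-job-delegate-proxy", "-d", deleg_id],
        # 3. Optional: list the CEs matching the job requirements.
        ["glite-wms-job-list-match", "-d", deleg_id, jdl_file],
        # 4. Submit; -o appends the jobID to a file for later commands.
        ["glite-wms-job-submit", "-d", deleg_id, "-o", jobid_file, jdl_file],
        # 5. Check the status using the stored jobID.
        ["glite-wms-job-status", "-i", jobid_file],
        # 6. Retrieve the output sandbox into a chosen directory.
        ["glite-wms-job-output", "-i", jobid_file, "--dir", output_dir],
    ]


if __name__ == "__main__":
    for argv in workflow_commands("myfile.jdl", "amina", "jobid", "/path"):
        print(shlex.join(argv))
```

Storing the jobID in a file with -o avoids copying the identifier by hand between the submit, status and output commands.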
51
References
- WMProxy User's Guide
- JDL Attributes Specification
- gLite User's Guide
52
Questions …
53
Hands-on
https://grid.ct.infn.it/twiki/bin/view/GILDA/SimpleJobSubmission
54
Questions …