FESR Consorzio COMETA - Progetto PI2S2 The gLite Workload Management System Annamaria Muoio INFN Catania Italy


1 FESR Consorzio COMETA - Progetto PI2S2 (www.consorzio-cometa.it)
The gLite Workload Management System
Annamaria Muoio, INFN Catania, Italy (Annamaria.muoio@ct.infn.it)
Tutorial for users and Grid application development, 16-20 July 2007, Catania

2 Outline
This presentation covers the following topics:
- Overview of the gLite WMS architecture
- Job Description Language overview: principal attributes
- References and hands-on

3 Overview of gLite Middleware

4 Workload Management System
The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources.
The purpose of the Workload Manager (WM) is to accept and satisfy requests for job management coming from its clients. A submission request passes responsibility for the job to the WM.
The WM passes the job to an appropriate Computing Element (CE) for execution, taking into account the requirements and preferences expressed in the job description.
The decision of which resource should be used is the outcome of a matchmaking process.

5 [Diagram: WMS architecture. Components: UI, Network Server, Job Controller (Condor), Logging & Book-keeping, Information System, LFC, Computing Element, Storage Element. Labels: CE characteristics & status, SE characteristics & status, job submission.]

6 [Diagram: WMS architecture, same components as slide 5. Label: job status.]
UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs).
WMS: Workload Management System.

7 [Diagram: WMS architecture, same components as slide 5, with the RB node marked. Job status: submitted.]
The Job Description Language (JDL) is used to specify job characteristics and requirements. The job is submitted from the UI with:
    edg-job-submit myjob.jdl
myjob.jdl:
    JobType = "Normal";
    Executable = "$(CMS)/exe/sum.exe";
    InputSandbox = {"/home/user/WP1testC", "/home/file*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueCEPolicyMaxCPUTime > 10000;
    Rank = other.GlueCEStateFreeCPUs;

8 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Labels: Job, Input Sandbox files. Job status: submitted.]
NS: network daemon responsible for accepting incoming requests.

9 [Diagram: WMS architecture, same components as slide 5, plus WMS storage and the Matchmaker/Broker. Labels: Job, "Where must this job be executed?". Job status: submitted → waiting.]
WM: responsible for taking the appropriate actions to satisfy the request.

10 [Diagram: WMS architecture, same components as slide 5, plus WMS storage and the Matchmaker/Broker. Labels: "Where are the needed data (which SEs)?", "What is the status of the Grid?". Job status: submitted → waiting.]
Matchmaker: responsible for finding the "best" CE to which a job should be submitted.

11 [Diagram: WMS architecture, same components as slide 5, plus WMS storage and the Matchmaker/Broker. Label: CE choice. Job status: submitted → waiting.]

12 [Diagram: WMS architecture, same components as slide 5, plus WMS storage and the Job Adapter. Job status: submitted → waiting.]
JA: responsible for the final "touches" to the job before it is passed to Condor (e.g. creation of the wrapper script).

13 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Label: Job. Job status: submitted → waiting → ready.]
JC: responsible for the actual job management operations (done via CondorG).

14 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Labels: Job, Input Sandbox files. Job status: submitted → waiting → ready → scheduled.]

15 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Labels: Job, Input Sandbox, "Grid enabled" data transfers/accesses. Job status: submitted → waiting → ready → scheduled → running.]

16 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Label: Output Sandbox files. Job status: submitted → waiting → ready → scheduled → running → done.]

17 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Labels: Output Sandbox, edg-job-get-output. Job status: submitted → waiting → ready → scheduled → running → done.]

18 [Diagram: WMS architecture, same components as slide 5, plus WMS storage. Label: Output Sandbox files. Job status: submitted → waiting → ready → scheduled → running → done → cleared.]

19 Possible job states
Flag        Meaning
SUBMITTED   submission logged in the LB
WAIT        job matchmaking for resources
READY       job being sent to executing CE
SCHEDULED   job scheduled in the CE queue manager
RUNNING     job executing on a WN of the selected CE queue
DONE        job terminated without grid errors
CLEARED     job output retrieved
ABORT       job aborted by middleware, check reason
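
The state table can be sketched as a small Python model. This is only an illustration: the order of the normal progression follows the architecture walkthrough on the earlier slides, while the rule that a job may be aborted from any state before DONE is an assumption of this sketch, not something the slides specify. The names `HAPPY_PATH` and `next_states` are hypothetical.

```python
from enum import Enum


class JobState(Enum):
    """Job states and meanings as listed on the slide."""
    SUBMITTED = "submission logged in the LB"
    WAIT = "job matchmaking for resources"
    READY = "job being sent to executing CE"
    SCHEDULED = "job scheduled in the CE queue manager"
    RUNNING = "job executing on a WN of the selected CE queue"
    DONE = "job terminated without grid errors"
    CLEARED = "job output retrieved"
    ABORT = "job aborted by middleware, check reason"


# Normal progression, in the order shown by the architecture walkthrough.
HAPPY_PATH = [
    JobState.SUBMITTED, JobState.WAIT, JobState.READY, JobState.SCHEDULED,
    JobState.RUNNING, JobState.DONE, JobState.CLEARED,
]


def next_states(state):
    """Return the states this sketch treats as reachable from `state`."""
    if state in (JobState.CLEARED, JobState.ABORT):
        return []  # terminal states in this model
    following = HAPPY_PATH[HAPPY_PATH.index(state) + 1]
    if state is JobState.DONE:
        return [following]
    # Assumption: the middleware may abort the job at any earlier point.
    return [following, JobState.ABORT]
```

A monitoring script could use such a model to decide whether to keep polling the job status or to stop because a terminal state was reached.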

20 Workload Management System
[Diagram: overall job flow. Components: "User interface", Resource Broker (Workload Manager), LFC Catalogue, Logging & Book-keeping, Information Service, Computing Element, Storage Element, Authorization & Authentication. Flows: job submit event, input "sandbox", input "sandbox" + broker info, output "sandbox", job query, job status, datasets info, publish SE & CE info.]

21 Command Line Interface
Job submission:
    $ edg-job-submit [options]
Options:
    --vo            perform the submission with a VO different from the UI default one
    --output, -o    save the jobId to a file
    --resource, -r  specify the resource for execution
    --nomsgi        neither messages nor errors are displayed on stdout

22 If the request has been correctly submitted, this is the typical output:
    edg-job-submit test.jdl
    ==================== edg-job-submit Success =====================
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier (edg_jobId) is:
    - https://lxshare0234.cern.ch:9000/rIBubkFFKhnSQ6CjiLUY8Q
    ==============================================================
In case of failure, an error message is displayed instead, and a non-zero exit status is returned.
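
When the submission output is captured by a script, the job identifier can be pulled out of the text. The following Python sketch assumes the identifier always has the `https://host:port/id` shape shown in the sample output; the pattern and the function name `extract_job_ids` are illustrative, not part of the middleware. Saving the jobId to a file with the --output option of edg-job-submit is the simpler route when it is available.

```python
import re

# Assumed shape of a job identifier, based only on the sample output:
# https://<host>:<port>/<unique string of letters, digits, '-' and '_'>
JOB_ID_PATTERN = re.compile(r"https://[\w.-]+:\d+/[\w-]+")


def extract_job_ids(text):
    """Return every job identifier found in captured command output."""
    return JOB_ID_PATTERN.findall(text)


sample_output = """\
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://lxshare0234.cern.ch:9000/rIBubkFFKhnSQ6CjiLUY8Q
"""

job_ids = extract_job_ids(sample_output)
```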

23 If the command returns the following error message:
    **** Error: API_NATIVE_ERROR ****
    Error while calling the "NSClient::multi" native api
    AuthenticationException: Failed to establish security context...
    **** Error: UI_NO_NS_CONTACT ****
    Unable to contact any Network Server
it means that there are authentication problems between the UI and the Network Server (check your proxy or contact the site administrator).

24 It is possible to see which CEs are eligible to run a job specified by a given JDL file using the command edg-job-list-match:
    edg-job-list-match test.jdl
    Connecting to host lxshare0380.cern.ch, port 7772
    Selected Virtual Organisation name (from UI conf file): dteam
    *********************************************************************
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite
    adc0015.cern.ch:2119/jobmanager-lcgpbs-long
    adc0015.cern.ch:2119/jobmanager-lcgpbs-short
    *********************************************************************
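
The CE identifiers in this listing appear to follow the pattern `host:port/jobmanager-LRMS-queue`. The Python sketch below splits an identifier along those lines; the pattern is inferred from the three sample identifiers only, and the function name `parse_ce_id` is hypothetical.

```python
def parse_ce_id(ce_id):
    """Split a CE identifier of the assumed form host:port/jobmanager-lrms-queue."""
    endpoint, _, manager = ce_id.partition("/")
    host, _, port = endpoint.partition(":")
    # maxsplit=2 keeps any further hyphens inside the queue name.
    _, lrms, queue = manager.split("-", 2)
    return {"host": host, "port": int(port), "lrms": lrms, "queue": queue}


parsed = parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite")
```

Splitting the identifier this way makes it easy to group the matching queues by host before choosing one to pass to the --resource option.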

25 After a job is submitted, it is possible to see its status using the edg-job-status command:
    edg-job-status https://lxshare0234.cern.ch:9000/X-ehTxfdlXxSoIdVLS0L0w
    *************************************************************
    BOOKKEEPING INFORMATION:
    Printing status info for the Job: https://lxshare0234.cern.ch:9000/X-ehTxfdlXxSoIdVLS0L0w
    Current Status: Scheduled
    Status Reason: Job successfully submitted to Globus
    Destination: lxshare0277.cern.ch:2119/jobmanager-pbs-infinite
    reached on: Fri Aug 1 12:21:35 2003
    *************************************************************

26 After the job has finished (it reaches the DONE status), its output can be copied to the UI:
    edg-job-get-output https://lxshare0234.cern.ch:9000/snPegp1YMJcnS22yF5pFlg
    Retrieving files from host lxshare0234.cern.ch
    *****************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    - https://lxshare0234.cern.ch:9000/snPegp1YMJcnS22yF5pFlg
    have been successfully retrieved and stored in the directory:
    /tmp/jobOutput/larocca_snPegp1YMJcnS22yF5pFlg
    *****************************************************************
By default, the output is stored under /tmp/jobOutput, but the directory in which to save the output can be specified with the --dir option.

27 A job can be cancelled before it ends using the command edg-job-cancel:
    edg-job-cancel https://lxshare0234.cern.ch:9000/dAE162is6EStca0VqhVkog
    Are you sure you want to remove specified job(s)? [y/n]n :y
    =================== edg-job-cancel Success ====================
    The cancellation request has been successfully submitted for the following job(s)
    - https://lxshare0234.cern.ch:9000/dAE162is6EStca0VqhVkog
    ===========================================================

28 Job Description Language (JDL)
In gLite, the Job Description Language (JDL) is used to describe jobs for execution on the Grid.
The JDL adopted within the gLite middleware is based upon Condor's CLASSified Advertisement language (ClassAd).
- A ClassAd is a record-like structure composed of a finite number of attributes separated by a semicolon (;)
- A ClassAd is highly flexible and can be used to represent arbitrary services
The JDL is used in gLite to specify the job's characteristics and constraints, which are used during the match-making process to select the best resources that satisfy the job's requirements.

29 JDL syntax
The JDL syntax consists of statements like:
    Attribute = value;
Comments must be preceded by a sharp character (#) or follow the C++ syntax.
WARNING: The JDL is sensitive to blank characters and tabs. No blank characters or tabs should follow the semicolon at the end of a line.

30 In a JDL, some attributes are mandatory while others are optional. An "essential" JDL is the following:
    Executable = "test.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"test.sh", "input.dat"};
    OutputSandbox = {"std.out", "std.err"};
If needed, arguments can be passed to the executable:
    Arguments = "Hello World!";

31 If the argument contains quoted strings, the quotes must be escaped with a backslash, e.g.
    Arguments = "\"Hello World!\" 10";
Special characters such as &, |, >, < are only allowed if specified inside a quoted string or preceded by a triple backslash, e.g.
    Arguments = "-f file1\\\&file2";
The backtick character ` cannot be specified in the JDL.
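
The quoting rule for embedded double quotes can be applied mechanically when a JDL file is generated from a script. The Python sketch below is a minimal illustration under stated assumptions: the helper names (`jdl_string`, `jdl_value`, `render_jdl`) are hypothetical, only string and list-of-string attributes are handled, and neither the triple-backslash rule for special characters nor the ban on backticks is enforced.

```python
def jdl_string(value):
    """Quote a Python string as a JDL string, escaping embedded double quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def jdl_value(value):
    """Render a string, or a list of strings as a {...} list."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_string(item) for item in value) + "}"
    return jdl_string(value)


def render_jdl(attributes):
    """Render 'Attribute = value;' statements, one per line.

    Each line ends exactly at the semicolon, since no blanks or tabs
    should follow it.
    """
    return "\n".join(
        f"{name} = {jdl_value(value)};" for name, value in attributes.items()
    )


job = {
    "Executable": "test.sh",
    "Arguments": '"Hello World!" 10',
    "InputSandbox": ["test.sh", "input.dat"],
}
jdl_text = render_jdl(job)
```

Generating the file this way avoids the typographic ("curly") quotes that word processors tend to introduce when JDL is copied from slides or documents.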

32 JDL: Relevant Attributes

33 JobType (optional)
- Normal (simple, sequential job), Interactive, MPICH, Checkpointable, Partitionable, Parametric
- Or a combination of them:
  - Checkpointable, Interactive
  - Checkpointable, MPI
E.g.
    JobType = "Interactive";
    JobType = {"Interactive", "Checkpointable"};
"Interactive" + "MPI" is not yet permitted.

34 Executable (mandatory)
This is a string representing the executable/command name.
The user can specify an executable which is already on the remote CE:
    Executable = "/opt/EGEODE/GCT/egeode.sh";
The user can provide a local executable name, which will be staged from the UI to the WN:
    Executable = "egeode.sh";
    InputSandbox = {"/home/larocca/egeode/egeode.sh"};

35 Arguments (optional)
This is a string containing all the job command line arguments.
E.g., if your executable sum has to be started as:
    $ sum N1 N2 –out result.out
then:
    Executable = "sum";
    Arguments = "N1 N2 –out result.out";

36 Environment (optional)
List of environment settings needed by the job to run properly.
E.g.
    Environment = {"JAVABIN=/usr/local/java"};
InputSandbox (optional)
List of files on the UI local disk needed by the job for running. The listed files will be automatically staged to the remote resource.
E.g.
    InputSandbox = {"myscript.sh", "/tmp/cc.sh"};

37 OutputSandbox (optional)
List of files, generated by the job, which have to be retrieved.
E.g.
    OutputSandbox = {"std.out", "std.err", "image.png"};

38 Requirements (optional)
Job requirements on computing resources, specified using attributes of resources published in the Information Service.
If not specified, the default value defined in the UI configuration file is used.
Default:
    Requirements = other.GlueCEStateStatus == "Production";
E.g.
    Requirements = other.GlueCEInfoLRMSType == "PBS" && other.GlueCEInfoTotalCPUs > 2 && Member("ALICE-2.1.7", other.GlueHostApplicationSoftwareRunTimeEnvironment);

39 Rank (optional)
Floating-point expression used to rank the CEs that already fulfil the Requirements expression.
The Rank expression can contain attributes that describe the CE in the Information System (IS).
The rank expression is evaluated by the Resource Broker (RB) during the match-making phase.
A higher numeric value means a better rank.
E.g.
    Rank = other.GlueCEStateFreeCPUs;
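
The interplay between Requirements and Rank can be sketched in a few lines of Python: first discard the CEs that do not satisfy the requirements, then order the rest by decreasing rank. This is a simplified model and not the Resource Broker's actual implementation; the function `matchmake`, the CE records and the `example.org` host names are all made up for illustration, with dictionary keys borrowed from the Glue attribute names used in the JDL examples.

```python
def matchmake(ces, requirements, rank):
    """Filter CEs by a requirements predicate, then sort by rank (highest first)."""
    eligible = [ce for ce in ces if requirements(ce)]
    return sorted(eligible, key=rank, reverse=True)


# Hypothetical CE records, as if read from the Information System.
ces = [
    {"id": "ce-a.example.org", "GlueCEStateStatus": "Production", "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b.example.org", "GlueCEStateStatus": "Production", "GlueCEStateFreeCPUs": 12},
    {"id": "ce-c.example.org", "GlueCEStateStatus": "Closed", "GlueCEStateFreeCPUs": 50},
]

# Requirements = other.GlueCEStateStatus == "Production";
# Rank = other.GlueCEStateFreeCPUs;
ranked = matchmake(
    ces,
    requirements=lambda ce: ce["GlueCEStateStatus"] == "Production",
    rank=lambda ce: ce["GlueCEStateFreeCPUs"],
)
best_ce = ranked[0]["id"] if ranked else None
```

Note that ce-c has the most free CPUs but is never considered, because ranking applies only to CEs that passed the requirements filter.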

40 InputData (optional)
This is a string or a list of strings representing the Logical File Names (LFN) or Grid Unique Identifiers (GUID) needed by the job as input. The RB uses the list to find the CE from which the specified files can best be accessed, and schedules the job to run there.
E.g.
    InputData = {"lfn:cmstestfile", "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70"};

41 DataAccessProtocol (mandatory if InputData has been specified)
The protocol, or the list of protocols, which the application is able to "speak" for accessing the files listed in InputData on a given SE.
Supported protocols in gLite are currently gsiftp and file.
E.g.
    DataAccessProtocol = {"file", "gsiftp"};

42 OutputSE (optional)
This is a string representing the URI of the Storage Element (SE) where the user wants to store the output data. The Resource Broker uses this attribute to find the best CE "close" to this SE and schedule the job there.
E.g.
    OutputSE = "aliserv6.ct.infn.it";

43 OutputData (optional)
This attribute allows the user to ask for the automatic upload and registration of datasets produced by the job on the Worker Node (WN).
This attribute contains the following three attributes:
- OutputFile
- StorageElement
- LogicalFileName

44 OutputFile (mandatory if OutputData has been specified)
This is a string attribute representing the name of the output file, generated by the job on the WN, which has to be automatically uploaded and registered by the WMS.
StorageElement (optional)
This is a string representing the URI of the Storage Element where the output file specified in OutputFile has to be uploaded by the WMS.
LogicalFileName (optional)
This is a string representing the LFN that the user wants to associate with the output file when registering it in the Catalogue.

45 NodeNumber (mandatory if JobType=MPICH)
The NodeNumber attribute is an integer specifying the number of nodes needed for an MPI job. The RB uses this attribute during matchmaking to select those CEs having a number of CPUs equal to or greater than the one specified in NodeNumber.
E.g.
    NodeNumber = 5;

46 JobSteps (mandatory for checkpointable or partitionable jobs)
The JobSteps attribute can be either an integer representing the number of steps for a checkpointable or partitionable job, e.g.
    JobSteps = 100000;
or a list of strings representing labels associated with the steps of a checkpointable or partitionable job, e.g.
    JobSteps = {"d0", "d1", "gmos"};

47 CurrentStep (mandatory for checkpointable or partitionable jobs)
The CurrentStep attribute indicates the initial step when submitting a checkpointable or partitionable job.
E.g.
    CurrentStep = 2;

48 References & Hands-on
JDL (submission via WMS Network Server):
- https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-7.doc
- https://grid.ct.infn.it/twiki/bin/view/GILDA/SimpleJobSubmissionWithRB
- https://grid.ct.infn.it/twiki/bin/view/GILDA/MoreOnJDL-withedgcommands
Remember to initialize the proxy before interacting with the WMS!

49 Thank you for your attention!

