
1 The plan for this morning: Description of the EDG WP1 software: how it works, basic commands, how to get started, etc. Example of how to submit jobs: from “hello world” to writing to and from storage elements. Experience of using the testbed: Alex actually uses the testbed for LISA and UKDMC. Job submission in SAM: SAM has a different, data-centric view of the world. These are meant to be interactive, so please ask questions!

2 Job Submission and Resource Brokering, WP1 (lots of slides, but most are just for reference)

3 Contents: the components; what (should) work now, and configuration; how to submit jobs: the UI and JDL; planned future functionality; what is in place at the moment; how to get started. Documentation is available from http://server11.infn.it/workload-grid/documents.htm (linked from marianne.in2p3.fr). Particularly useful are the “JDL HowTo”, the “User Interface man pages” and the “User and Administrator Guide”.


5 The User Interface (UI): All user interactions go through the UI, which is installed on the submitting machine. It communicates with both the Resource Broker (RB) and the Logging and Bookkeeping (LB) service. On job submission the UI assigns a unique job identifier (dg_jobId) to the job and sends the executable, the Job Description File and the Input Sandbox to the RB. It also notifies the LB of the submission.
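The three submission steps above can be mimicked with a few stand-in objects. This is a minimal sketch only. ResourceBroker, LoggingService and submit are hypothetical names. The uuid4-based identifier and the "SUBMITTED" event label are assumptions made for illustration. Real dg_jobIds have a different format, and the real UI talks to remote RB and LB servers.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ResourceBroker:
    """Hypothetical stand-in for the RB: just records what it receives."""
    received: dict = field(default_factory=dict)

    def accept(self, job_id, executable, jdl_text, input_sandbox):
        self.received[job_id] = {
            "executable": executable,
            "jdl": jdl_text,
            "input_sandbox": list(input_sandbox),
        }


@dataclass
class LoggingService:
    """Hypothetical stand-in for the LB: an append-only event log."""
    events: list = field(default_factory=list)

    def log(self, job_id, event):
        self.events.append((job_id, event))


def submit(rb, lb, executable, jdl_text, input_sandbox):
    # 1. The UI assigns a unique job identifier (uuid4 is an illustrative choice).
    job_id = uuid.uuid4().hex
    # 2. Executable, job description and input sandbox are sent to the RB.
    rb.accept(job_id, executable, jdl_text, input_sandbox)
    # 3. The LB is notified of the submission ("SUBMITTED" is a hypothetical label).
    lb.log(job_id, "SUBMITTED")
    return job_id


rb, lb = ResourceBroker(), LoggingService()
jid = submit(rb, lb, "sim.exe", 'Executable = "sim.exe";', ["DATA/input1"])
```

The identifier is fixed before anything is sent, so the RB and the LB can both refer to the same job.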

6 The User Interface (UI): The UI can also be used to query the status of a job, which it does by interrogating the LB. Configuration: the UI configuration is contained in /etc/UI_ConfigEnv.cfg, which holds the following information: the address and port of each accessible RB; the address and port of each accessible LB; the default location of the local storage areas for the Input/Output Sandbox files; default values for the mandatory JDL attributes; the default number of retries on fatal errors when connecting to the LB.

7
#################################################
# LB addresses are in the format                #
# <address>:<port> of all LBs accessible by the UI #
#################################################
%beginLB%
https://gm04.hep.ph.ic.ac.uk:7846
%endLB%
###################################################
# RB addresses are in the format                  #
# <address>:<port> of all RBs accessible by the UI #
###################################################
%beginRB%
gm04.hep.ph.ic.ac.uk:7771
%endRB%

8
###############################################
# UI needed environment settings              #
# and corresponding default values            #
# Format is always <variable> = <value>.      #
###############################################
## Stage IN/OUT Storage Paths ##
DEFAULT_STORAGE_AREA_IN = /tmp
## Default values for Mandatory Attributes ##
requirements = TRUE
rank = - other.EstimatedTraversalTime
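The UI configuration file mixes two constructs:
- %beginXX% ... %endXX% blocks that list one address per line;
- # or ## comment and banner lines, plus NAME = value settings.

Below is a minimal Python sketch of a reader for that layout. parse_ui_config is a hypothetical helper, and the grammar is inferred from the two excerpts above. It is not taken from the real UI parser.

```python
import re


def parse_ui_config(text):
    """Parse a UI_ConfigEnv.cfg-style file (layout assumed from the slides)."""
    blocks = {}      # e.g. {"LB": [...], "RB": [...]}
    settings = {}    # e.g. {"rank": "- other.EstimatedTraversalTime"}
    current = None   # name of the %begin...% block we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and #/## comment or banner lines
        begin = re.fullmatch(r"%begin(\w+)%", line)
        end = re.fullmatch(r"%end(\w+)%", line)
        if begin:
            current = begin.group(1)
            blocks.setdefault(current, [])
        elif end:
            current = None
        elif current is not None:
            blocks[current].append(line)  # one address per line
        elif "=" in line:
            key, value = line.split("=", 1)  # split on the first '=' only
            settings[key.strip()] = value.strip()
    return blocks, settings


sample = """
%beginLB%
https://gm04.hep.ph.ic.ac.uk:7846
%endLB%
%beginRB%
gm04.hep.ph.ic.ac.uk:7771
%endRB%
## Stage IN/OUT Storage Paths ##
DEFAULT_STORAGE_AREA_IN = /tmp
requirements = TRUE
rank = - other.EstimatedTraversalTime
"""
blocks, settings = parse_ui_config(sample)
```

Address lines are checked before the "=" test, so a URL inside a block is never mistaken for a setting.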

9 The User Interface (UI): Users concurrently using the same submitting machine share the same configuration files. Users (or groups of users) with particular needs can “customise” the UI configuration through the -config option, which every UI command supports.

10 The Resource Broker (RB): The RB sits at a central location (not on your local machine). There is expected to be one per VO. Currently there is one at CERN for EDG (with a backup at CNAF) and one at IC for GridPP. Jobs are queued locally (stored in a PostgreSQL database). The RB interrogates the replica catalogue and the information services and attempts to match the job to an available resource. Matching is based on the Condor ClassAd library. If a suitable match is found, the RB submits the job to the Job Submission Service (JSS). All events and status information are, of course, sent to the LB.
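The matching step works in two stages: filter resources by the job's Requirements, then order the survivors by its Rank. This is a toy Python sketch, not the Condor ClassAd library. Requirements and Rank are plain lambdas here, and the self.Rank > 10 clause of the JDL example in these slides is left out for brevity. The resource names and attribute values are invented for illustration. Only the attribute names come from that JDL example.

```python
def match(job, resources):
    """Toy matchmaking: keep resources that satisfy the job's Requirements,
    then order them by the job's Rank (highest first)."""
    candidates = [r for r in resources if job["requirements"](r)]
    return sorted(candidates, key=job["rank"], reverse=True)


# Hypothetical resource ads; the values are invented for illustration.
resources = [
    {"name": "ce-a", "LRMSType": "PBS", "OpSys": "Linux RH 6.2",
     "FreeCPUs": 4, "TotalCPUs": 16, "AverageSI00": 300},
    {"name": "ce-b", "LRMSType": "LSF", "OpSys": "Linux RH 6.2",
     "FreeCPUs": 10, "TotalCPUs": 64, "AverageSI00": 400},
    {"name": "ce-c", "LRMSType": "PBS", "OpSys": "Linux RH 6.1",
     "FreeCPUs": 2, "TotalCPUs": 32, "AverageSI00": 350},
]

job = {
    # Requirements from the JDL example, minus the self.Rank > 10 clause
    "requirements": lambda o: (
        o["LRMSType"] == "PBS"
        and o["OpSys"] in ("Linux RH 6.1", "Linux RH 6.2")
        and o["FreeCPUs"] > 1
    ),
    # Rank = other.TotalCPUs * other.AverageSI00
    "rank": lambda o: o["TotalCPUs"] * o["AverageSI00"],
}

ranked = [r["name"] for r in match(job, resources)]  # ["ce-c", "ce-a"]
```

ce-b is excluded by Requirements because it does not run PBS. ce-c (32 × 350 = 11200) outranks ce-a (16 × 300 = 4800).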

11 The Resource Broker (RB): Configuration: Most people will never need to configure their own RB. For completeness, however, the configuration file is /etc/rb.conf, which contains entries for the replica catalogue, the MDS, etc. For more detailed information see the “User and Administrator Guide”. Input/Output Sandboxes are stored on the machine hosting the RB, so that machine needs a reasonable amount of disk space.

12 The Job Submission Service (JSS): If the RB has successfully matched a job to a resource, the job is passed to the JSS (which usually runs on the same machine). The JSS queues the job internally in a PostgreSQL database. Job submission is performed using Condor-G. The JSS also monitors jobs until completion, notifying the LB of any significant events.

13 The Job Submission Service (JSS): Configuration: Again, most people will never need to configure a JSS server. The configuration file is /etc/jss.conf.

14 The Logging and Bookkeeping service (LB): All events throughout the job submission, execution and output retrieval processes are logged by the LB in a MySQL database. All information is timestamped, and it is through this logged information that users discover the state of their jobs.
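The LB collects events from several machines (RB, JSS, CE gatekeepers), so the order in which events arrive need not match the order in which they happened. The timestamps are what let a current state be derived. The sketch below shows that idea in Python. current_status is a hypothetical helper. Apart from RUNNING, which appears in the dg-job-status example in these slides, the status names and times are invented.

```python
from datetime import datetime


def current_status(events):
    """Return the status carried by the most recent timestamped event."""
    if not events:
        return None
    latest = max(events, key=lambda e: e[0])  # e = (timestamp, status)
    return latest[1]


# Events listed in arrival order, which differs from time order.
events = [
    (datetime(2001, 6, 5, 10, 24, 32), "RUNNING"),
    (datetime(2001, 6, 5, 10, 20, 0), "SUBMITTED"),   # hypothetical event
    (datetime(2001, 6, 5, 10, 22, 10), "SCHEDULED"),  # hypothetical event
]
```

Picking the maximum timestamp, rather than the last event received, keeps the answer correct when a slow logger reports late.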

15 The Logging and Bookkeeping service (LB): Configuration: A local logger must be installed on every machine that pushes information into the LB system (the RB and JSS machines and the gatekeeper machine of each CE). The exception is the job submission machine, on which a local logger is optional. The LB server itself needs to be installed only on a server machine.

16 The Logging and Bookkeeping service (LB): Configuration: The local logger requires no configuration, and the server is configured when the database is created using /etc/server.sql. No further configuration is required.

17 Submitting a job: First you have to describe your job in a JDL (Job Description Language) file. JDL is based on Condor ClassAds. ClassAds are (statements from the manual): Declarative rather than procedural: they describe notions of compatibility rather than specifying a procedure to determine compatibility. Simple, both syntactically and semantically, and therefore easy to use. Portable: nothing is used that requires features specific to a given architecture.

18 Submitting a job: ClassAds are dynamically typed, so only values have types (expressions do not). As well as the usual types (numeric, string, Boolean), values can have types such as time intervals and timestamps, and esoteric values such as undefined and error. ClassAds can be nested. ClassAds have the usual set of operators (see the JDL HowTo).

19 Submitting a job: An example:
Executable = "WP1testF";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
OutputSandbox = {"sim.out", "sim.err", "testD.out"};
Rank = other.TotalCPUs * other.AverageSI00;
Requirements = other.LRMSType == "PBS" && \
    (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2") && \
    self.Rank > 10 && other.FreeCPUs > 1;
RetryCount = 2;
Arguments = "file1";
InputData = "LF:test10099-1001";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
OutputSE = "grid001.cnaf.infn.it";
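A JDL file is plain text, so it can also be generated from a script. The Python sketch below renders attributes in four ways:
- strings in double quotes;
- numbers bare;
- lists in braces;
- expressions such as Rank or Requirements verbatim.

to_jdl and Expr are hypothetical names invented here and are not part of the EDG software. The "Hello World" job in the usage, with /bin/echo as its executable, is likewise an illustrative assumption.

```python
class Expr(str):
    """Marks a value as a raw ClassAd expression that must not be quoted."""


def to_jdl(attrs):
    """Render a dict of attributes as JDL text, one 'Name = value;' per line."""
    def render(value):
        if isinstance(value, Expr):          # check before str: Expr is a str
            return str(value)
        if isinstance(value, (int, float)):
            return str(value)
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(render(v) for v in value) + "}"
        return '"' + str(value).replace('"', '\\"') + '"'

    return "\n".join(f"{key} = {render(value)};" for key, value in attrs.items())


hello = to_jdl({
    "Executable": "/bin/echo",
    "Arguments": "Hello World",
    "StdOutput": "hello.out",
    "StdError": "hello.err",
    "OutputSandbox": ["hello.out", "hello.err"],
})
```

Expr is checked first because it subclasses str. Without that, expressions would be quoted as ordinary strings.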

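20 Submitting a job: truth tables for the ClassAd logical operators (F = false, T = true, U = undefined, E = error):

AND | F T U E      OR | F T U E      NOT |
----+--------      ---+--------      ----+--
 F  | F F F E       F | F T U E       F  | T
 T  | F T U E       T | T T T E       T  | F
 U  | F U U E       U | U T U E       U  | U
 E  | E E E E       E | E E E E       E  | E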
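The table follows a simple precedence that is easier to remember than the grid itself:
- error propagates through every operator;
- for AND, false then dominates undefined;
- for OR, true then dominates undefined.

The Python sketch below encodes exactly that rule and reproduces the table as given on the slide. The function names are hypothetical, and the values are represented as single-letter strings for brevity.

```python
# Four logical values used by ClassAd expressions.
F, T, U, E = "F", "T", "U", "E"   # false, true, undefined, error


def c_not(a):
    return {F: T, T: F, U: U, E: E}[a]


def c_and(a, b):
    if E in (a, b):
        return E      # error propagates
    if F in (a, b):
        return F      # false dominates undefined
    if U in (a, b):
        return U
    return T


def c_or(a, b):
    if E in (a, b):
        return E      # error propagates
    if T in (a, b):
        return T      # true dominates undefined
    if U in (a, b):
        return U
    return F
```

For example, other.FreeCPUs > 1 evaluated against a resource that does not publish FreeCPUs yields U. With the AND table above, U && F is F, so the resource is rejected.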

21 Submitting a job:
– dg-job-submit
Allows the user to submit a job for execution on remote resources in a grid.
SYNOPSIS
dg-job-submit [-help]
dg-job-submit [-version]
dg-job-submit [-template]
dg-job-submit [-input input_file | -resource res_id] [-notify e_mail_address] [-config group_name] [-output out_file] [-noint] [-debug] <job_description_file>
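Scripts that submit many jobs usually assemble the command line rather than typing it. The Python sketch below only builds the argument list; it does not run anything. It uses the options shown in the synopsis. submit_command is a hypothetical helper. Passing the JDL file as the final positional argument is an assumption.

```python
import shlex


def submit_command(jdl_file, notify=None, config=None, output=None,
                   noint=False, debug=False):
    """Build (not run) a dg-job-submit argument list from the synopsis options."""
    argv = ["dg-job-submit"]
    if notify:
        argv += ["-notify", notify]
    if config:
        argv += ["-config", config]
    if output:
        argv += ["-output", output]
    if noint:
        argv.append("-noint")   # non-interactive: no confirmation prompts
    if debug:
        argv.append("-debug")
    argv.append(jdl_file)       # assumed positional JDL argument
    return argv


cmd = submit_command("hello.jdl", output="jobids.txt", noint=True)
print(shlex.join(cmd))  # prints: dg-job-submit -output jobids.txt -noint hello.jdl
```

On a UI machine the list could be passed directly to subprocess.run(cmd). Using a list avoids shell quoting problems with file names.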

22 – dg-list-job-match
Returns the list of resources fulfilling the job requirements.
SYNOPSIS
dg-list-job-match [-help]
dg-list-job-match [-version]
dg-list-job-match [-verbose] [-config group_name] [-output output_file] [-noint] [-debug] <job_description_file>
– dg-job-cancel
Cancels one or more submitted jobs.
SYNOPSIS
dg-job-cancel [-help]
dg-job-cancel [-version]
dg-job-cancel [-notify e_mail_address] [-config group_name] [-output output_file] [-noint] [-debug] <dg_jobId> ...

23 – dg-get-job-output
Requests the job output files (specified by the OutputSandbox attribute of the job-ad) from the RB and stores them on the local disk of the submitting machine.
SYNOPSIS
dg-get-job-output [-help]
dg-get-job-output [-version]
dg-get-job-output [-dir directory_path] [-config group_name] [-noint] [-debug] <dg_jobId>
Examples
Consider the following command:
$> dg-get-job-output https://grid004.it:2234/124.75.74.12/12354732109721?www.rb.com:4577 -dir /home/data
It retrieves the files listed in the OutputSandbox attribute from the RB and stores them locally in /home/data/12354732109721.
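In the example, the local directory name is the last path segment of the job identifier. The Python sketch below reproduces that mapping, which is handy for scripts that need to find the retrieved files afterwards. output_dir is a hypothetical helper, and the "last path segment" rule is an assumption generalised from this single example, not documented behaviour.

```python
import posixpath
from urllib.parse import urlsplit


def output_dir(job_id, base_dir):
    """Directory where the output is expected, assuming the name is the
    last path segment of the job ID (inferred from one example only)."""
    unique_part = urlsplit(job_id).path.rstrip("/").rsplit("/", 1)[-1]
    return posixpath.join(base_dir, unique_part)


job_id = "https://grid004.it:2234/124.75.74.12/12354732109721?www.rb.com:4577"
target = output_dir(job_id, "/home/data")  # "/home/data/12354732109721"
```

urlsplit separates off the ?www.rb.com:4577 part as the query, so it does not leak into the directory name.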

24 – dg-job-status
Displays bookkeeping information about submitted jobs.
SYNOPSIS
dg-job-status [-help]
dg-job-status [-version]
dg-job-status [-full] [-config group_name] [-output output_file] [-noint] [-debug] <dg_jobId> ...

25 Examples
$> dg-job-status dg_jobId2
displays the following lines:
********************************************************************
BOOKKEEPING INFORMATION
Printing status for the job: dg_jobId2
---
dg_JobId= firefox.esrin.esa.it__20010514_163007_21833_RB1_LB3
Job Owner =/C=IT/O=ESA/OU=ESRIN/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Status= RUNNING
Location= firefox.esa.it:2119/jobmanager-condor
Job Destination = http://ramses.esrin.esa.it/rams/dataset1
Status Enter Time = 10:24:32 05-06-2001 GMT
Last Update Time = 10:25:45 05-06-2001 GMT
CpuTime= 1
********************************************************************
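To use this output in a script, for instance to poll until a job leaves RUNNING, each "Key = value" line can be turned into a dictionary entry. The Python sketch below assumes the output prints one field per line, as in the example. parse_status is a hypothetical helper. It splits on the first "=" only, because values such as the certificate subject in Job Owner contain "=" themselves.

```python
def parse_status(text):
    """Turn the 'Key = value' lines of dg-job-status output into a dict."""
    info = {}
    for line in text.splitlines():
        if "=" not in line or line.lstrip().startswith("*"):
            continue  # skip banners and headings
        key, value = line.split("=", 1)  # first '=' only
        info[key.strip()] = value.strip()
    return info


sample = """\
********************************************************************
BOOKKEEPING INFORMATION
Printing status for the job: dg_jobId2
---
dg_JobId= firefox.esrin.esa.it__20010514_163007_21833_RB1_LB3
Job Owner =/C=IT/O=ESA/OU=ESRIN/CN=Fabrizio Pacini/Email=fpacini@datamat.it
Status= RUNNING
Location= firefox.esa.it:2119/jobmanager-condor
Status Enter Time = 10:24:32 05-06-2001 GMT
CpuTime= 1
"""
info = parse_status(sample)
```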

26 – dg-get-logging-info
Displays logging information about submitted jobs.
SYNOPSIS
dg-get-logging-info [-help]
dg-get-logging-info [-version]
dg-get-logging-info [-from T1] [-to T2] [-level logLevel] [-config group_name] [-output output_file] [-noint] [-debug] <dg_jobId> ...

27 Job Submission: There is a GUI


34 Things to come over the next year:

Release | Item
--------+-----------------------------------
1.4     | Interactive jobs
2       | Job partitioning
1.3     | MPI jobs
1.3     | APIs for applications
2       | Accounting
2       | Checkpointing and job partitioning
1.4     | Advanced reservation
1.4     | Dependent jobs
1.2     | Proxy renewal

35 What is in place in the UK testbed? (an RB-centric view of the world)
Only the GridPP and BaBar VOs.
Imperial: RB, JSS and LB
Bristol: Replica Catalogue
Imperial: CE, SE, UI
RAL: CE, SE, UI
Birmingham: CE, UI
Liverpool: CE, UI
Bristol: CE, UI
QMUL: CE, UI
RHUL: CE, UI
IN2P3-BaBar: UI

36 There is also the main EDG testbed. The current sites are RAL, IC, IN2P3, CERN, CNAF, Catania, Padova, Torino, NIKHEF and MSU. This will grow soon to include Bristol, Liverpool, Manchester and UCL(?).

37 Current problems: By far the biggest is stability, particularly between the RB and the JSS. If the RB/JSS go down, jobs can be left hanging around. There are some bugs associated with file storage. GDMP 2 is pretty useless. Hopes and expectations: Release 1.2 should be a lot more stable and have these bugs fixed. GDMP 3 is much better, and is out.

38 How to get started: You need a machine with the UI installed (gppui.gridpp.rl.ac.uk, for example). You need to get a certificate (grid-cert-request). Register with a suitable VO: either an experiment-specific one, or GridPP for UK sites / WP6 for EDG. For EDG this also involves signing the usage rules; follow the “register” link from marianne.in2p3.fr. Once you have a certificate, try the “Hello World” example.

