
1 Long term job submission and monitoring using grid services
Riccardo Bruno, INFN, Sez. CT
23/07/2007, Meeting sull'uso di applicazioni parallele in PI2S2 (meeting on the use of parallel applications in PI2S2)

2 Outline
• Long term job submission
  – MyProxyServer
  – Renewal
  – The renewal process and JDL tag
• Long term job monitoring
  – Middleware tools
  – How to do monitoring efficiently
  – The Watchdog
  – Watchdog use example
  – The main script
  – The watchdog flow
  – The main script code
  – Some outputs
  – The future …
• References
Catania, Meeting sull'uso di applicazioni parallele in PI2S2

3 Long term job submission

4 MyProxyServer
• A proxy has a limited lifetime (the default is 12 h)
  – Creating a longer-lived proxy is a bad idea
• MyProxy server commands:
  – myproxy-init --voms <voname> -s <host_name>
    Creates and stores a long term proxy certificate; -s <host_name> specifies the hostname of the MyProxy server
  – myproxy-info
    Gets information about the stored long-lived proxy
  – myproxy-get-delegation
    Gets a new proxy from the MyProxy server
  – myproxy-destroy
    Removes the stored proxy from the server

5 Renewal
• A dedicated service on the RB can renew the proxy automatically:
  – [edg-wl-renewd] - /etc/init.d/edg-wl-proxyrenewal
• Some dedicated flags are required when creating the long term proxy credential with myproxy-init:
  – -d : use the proxy certificate subject (DN) as the default username, instead of the LOGNAME environment variable
  – -n : don't prompt for a passphrase

bash-2.05b$ myproxy-init --voms cometa -d -n
Your identity: /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo Bruno/
Enter GRID pass phrase for this identity:
Creating proxy Done
Proxy Verify OK
Your proxy is valid until: Fri Jul 23 09:30:
A proxy valid for 168 hours (7.0 days) for user /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo now exists on grid001.ct.infn.it.

6 The renewal process and JDL tag
• 5 or 10 minutes before the proxy expires, the RB proxy renewal daemon performs the following steps:
  – contacts the MyProxyServer indicated in the JDL and asks for a new delegation
  – contacts the VOMS server to add the ACs
  – transfers the new VOMS-enabled proxy to the WNs running the job
• An additional attribute has to be added to the JDL:
  MyProxyServer = "grid001.ct.infn.it";
• This attribute tells the RB which MyProxyServer to contact to renew the credentials. Otherwise a default one is taken from the UI VO configuration settings: glite_wmsui.conf
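As a minimal sketch, a JDL carrying the MyProxyServer attribute could be generated as follows. Only the MyProxyServer line and the file name testmyproxy.jdl come from the slides; the executable, its argument and the sandbox entries are illustrative placeholders.

```shell
#!/bin/bash
# Sketch: write a JDL that names the MyProxy server to use for renewal.
# Everything except the MyProxyServer line is an illustrative placeholder.
jdl_file="${JDL_FILE:-testmyproxy.jdl}"

cat > "$jdl_file" <<'EOF'
Type = "Job";
JobType = "Normal";
Executable = "/bin/sleep";
Arguments = "3600";
StdOutput = "file.out";
StdError = "file.err";
OutputSandbox = {"file.out", "file.err"};
MyProxyServer = "grid001.ct.infn.it";
EOF

# Show which server the RB will be asked to contact for this job.
grep '^MyProxyServer' "$jdl_file"
```

If the attribute is left out, the default from the UI VO configuration is used instead, as noted on the slide.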

7 Long term job submission
• Create the long term proxy on the MyProxy server
  myproxy-init --voms cometa -d -n
• Create a new proxy or get the delegation from the MyProxy server
  voms-proxy-init --voms cometa
  myproxy-get-delegation -d -a $X509_USER_PROXY
  (Please notice that you must already have a valid proxy on the UI)
• Submit the job normally
  edg-job-submit -o jid testmyproxy.jdl

bash-2.05b$ myproxy-init --voms cometa -d -n
Your identity: /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo Bruno/
Enter GRID pass phrase for this identity:
Creating proxy Done
Proxy Verify OK
Your proxy is valid until: Fri Jul 23 09:30:
A proxy valid for 168 hours (7.0 days) for user /C=IT/O=GILDA/L=INFN Catania/CN=Riccardo now exists on grid001.ct.infn.it.
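The three submission steps can be wrapped in a small script. The sketch below is an assumption about how one might chain them, not part of the original talk: the run helper and the DRY_RUN switch are hypothetical, and by default the commands are only printed so the script can be tried outside a configured UI.

```shell
#!/bin/bash
# Sketch of the submission steps. By default the commands are only printed;
# set DRY_RUN=0 on a configured UI to really execute them.
DRY_RUN="${DRY_RUN:-1}"
VO="${VO:-cometa}"

# Hypothetical helper: echo a command and, unless in dry-run mode, run it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

submit_long_term_job() {     # $1 = JDL file
    local jdl="$1"
    # 1. Store a long term proxy on the MyProxy server.
    run myproxy-init --voms "$VO" -d -n || return 1
    # 2. Create a fresh VOMS proxy on the UI. The alternative from the slide
    #    is: myproxy-get-delegation -d -a $X509_USER_PROXY
    run voms-proxy-init --voms "$VO" || return 1
    # 3. Submit the job as usual, saving the job identifier in the file "jid".
    run edg-job-submit -o jid "$jdl"
}

submit_long_term_job testmyproxy.jdl
```

Stopping at the first failing step avoids submitting a job whose proxy could not be stored or created.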

8 Renewal feedback
Starting at:
subject : /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
type : limited proxy
strength : 512 bits
path : /tmp/globus-tmp.unime-wn
timeleft : 0:56:58
=== VO cometa extension information ===
VO : cometa
subject : /C=IT/O=INFN/OU=Personal Certificate/L=Catania/CN=Riccardo Bruno
issuer : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
attribute : /cometa/Role=NULL/Capability=NULL
timeleft : 11:56:01
…
Other output from the job's core execution (just a sleep)
…
subject : /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
timeleft : 8:45:18
timeleft : 10:26:00
Ending at:

• This job was executed with a delegated proxy 1 hour long (myproxy-get-delegation -d -t 1:00 -a $X509_USER_PROXY)
• The 1st call to voms-proxy-info returns 0:56:58 as time left
• After the job core execution, the 2nd call to voms-proxy-info gives 8:45:18 as time left, so the proxy has been renewed
• Please notice also the different subjects:
  /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
  /C=IT/O=INFN/…/CN=proxy/CN=proxy/CN=proxy/CN=limited proxy
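A job script can make the same check on its own by comparing the two time-left readings. The sketch below is illustrative only: the to_seconds helper is hypothetical, and the two values are the ones quoted on the slide rather than read from a live proxy.

```shell
#!/bin/bash
# Convert an H:MM:SS "timeleft" value into seconds.
# The 10# prefix forces base 10 so that fields like "08" are not read as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

before=$(to_seconds "0:56:58")   # 1st voms-proxy-info call, at job start
after=$(to_seconds "8:45:18")    # 2nd call, after the job core execution

# A proxy that was never renewed can only lose time between the two calls.
if [ "$after" -gt "$before" ]; then
    echo "proxy was renewed: time left grew by $(( after - before )) seconds"
else
    echo "proxy was not renewed"
fi
```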

9 Long term job monitoring

10 Middleware tools
Currently gLite offers the following services for monitoring job execution:
• Interactive jobs, or direct use of X server communication via SSH tunneling
  – The user is forced to use an interactive JDL
  – The X client must be kept open for the whole job duration
• Use of RGMA
  – The use of dedicated producers requires code changes, which are not always possible
  – Code changes are error prone and need to be tested
• Use of AMGA
  – The use of AMGA APIs requires code changes as well

11 How to do monitoring efficiently
IDEA: Perform the job monitoring still using grid services, in the least invasive way possible.
Observations:
• Almost all jobs submitted on the grid are piloted by shell scripts
  – Shell scripting allows getting precious information in case of faults
  – Shell scripting can pilot more complex batch processing
• Both the SE and the file catalog can be used as the simplest IS on the grid
  – lfc-* and lcg-* tools are already available for file creation and retrieval
  – The latency of the CLI tools for storage is very low compared to the duration of long term jobs
Requirements:
• It would be useful to configure the monitoring tool according to the user's needs
  – A few shell environment variables can be used to configure the monitoring tool
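Configuration through environment variables could look like the sketch below. The variable names and default values are assumptions made for illustration (only WD_USER_PATH appears elsewhere in the slides); each value can be overridden from the environment before the script starts.

```shell
#!/bin/bash
# Hypothetical configuration variables for a monitoring script.
# ${VAR:-default} keeps a value already set in the environment and
# falls back to the default otherwise.
WD_VO="${WD_VO:-cometa}"                                  # virtual organisation
WD_USER_PATH="${WD_USER_PATH:-/grid/$WD_VO/$(whoami)}"    # folder in the file catalog
WD_DELAY="${WD_DELAY:-60}"                                # seconds between snapshots
WD_FILES="${WD_FILES:-file.out file.err}"                 # files to watch

echo "VO=$WD_VO USERPATH=$WD_USER_PATH DELAY=$WD_DELAY"
for f in $WD_FILES; do
    echo "watching: $f"
done
```

A user who wants more frequent snapshots would then only need something like WD_DELAY=10 in the environment of the job, without touching the script.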

12 The Watchdog
The Watchdog is a shell script to be included in the main script.
Some watchdog features:
• It starts in the background before the long term job runs
• The watchdog runs as long as the main job
• The main script can stop and wait until the watchdog has finished
• Easily and highly configurable
• The watchdog does not compromise the CPU power of the WN
• The watchdog is really simple and its behavior can be extended by the user
The best way to explain the watchdog is through a usage example …

13 Watchdog use example
The simplest use case involves the following files:
• The JDL: script.jdl
• The main script file: script.sh
• The watchdog script file: watchdog.sh

script.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/bash";
StdOutput = "file.out";
StdError = "file.err";
InputSandbox = {"watchdog.sh", "script.sh"};
OutputSandbox = {"file.out", "file.err", "watchdog.out"};
Arguments = "script.sh";

InputSandbox: script.sh, watchdog.sh
OutputSandbox: file.out, file.err, watchdog.out

14 The main script
It is good practice to give the main script a structure like the following:
• Get information about the WN
• Start the watchdog
• Execute and control the main job
• Stop the watchdog
• Collect information about the job execution

15 The watchdog flow
• Initialization, from the configuration: VO, USERPATH, File Catalog, SE, DELAY, list of files, CTRL file, NTFY file
• Enter the loop
  – For each file in the list, take a snapshot (only the increments are copied)
  – Snapshots are stored in the File Catalog/SE under USERPATH/JobId as <timestamp>_<file_1>, <timestamp>_<file_2>, … <timestamp>_<file_n>
  – Check whether the CTRL file exists: the loop repeats as long as it does (the main script removes watchdog.ctrl to stop the watchdog)
• Create the notification (NTFY) file
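The flow can be sketched in a few lines of shell. This is not the original watchdog.sh: a local directory stands in for the file catalog/SE (on the grid the copy would go through the lcg-* tools), the function names are hypothetical, and it is an assumption that the watchdog itself creates watchdog.ctrl at start-up.

```shell
#!/bin/bash
# Copy only the bytes appended to a file since the previous snapshot.
store_increment() {      # $1 = watched file, $2 = store directory
    local src="$1" store="$2" name offset size stamp
    [ -f "$src" ] || return 0
    name=$(basename "$src")
    offset=$(cat "$store/.$name.offset" 2>/dev/null || echo 0)
    size=$(( $(wc -c < "$src") ))
    [ "$size" -gt "$offset" ] || return 0          # nothing new to copy
    stamp=$(date +%y%m%d%H%M%S)
    tail -c "+$((offset + 1))" "$src" | head -c "$((size - offset))" \
        >> "$store/${stamp}_$name"
    echo "$size" > "$store/.$name.offset"
}

watchdog() {             # $1 = store directory, $2 = delay (s), rest = files
    local store="$1" delay="$2" f
    shift 2
    mkdir -p "$store"
    touch watchdog.ctrl                  # removed by the main script to stop us
    echo "Starting watchdog at: $(date +%y%m%d%H%M%S)"
    while [ -e watchdog.ctrl ]; do
        for f in "$@"; do store_increment "$f" "$store"; done
        sleep "$delay"
    done
    # Final pass, so output written just before the stop request is not lost.
    for f in "$@"; do store_increment "$f" "$store"; done
    touch watchdog.done                  # notification file for the main script
    echo "Ending watchdog at: $(date +%y%m%d%H%M%S)"
}

# Small demonstration in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
watchdog "$work/store" 1 file.out > watchdog.out &
echo "first line" >> file.out
sleep 2
echo "second line" >> file.out
rm -f watchdog.ctrl                      # ask the watchdog to stop
while [ ! -e watchdog.done ]; do sleep 1; done
cat store/*_file.out                     # increments, in timestamp order
```

Keeping the last copied offset in a small side file is one simple way to copy increments only; concatenating the snapshots in timestamp order rebuilds the watched file.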

16 The main script code
#
# watchdog – Riccardo Bruno 200707
#
echo "Starting at: $(date +%y%m%d%H%M%S)"
HOSTNAME=$(hostname -f)
USER=$(whoami)
ARG1=$1
LOCALDIR=$(pwd)
echo "*****************************"
echo "HOST: $HOSTNAME"
echo "USER: $USER"
echo "ARGS: $ARG1"
echo "LOCALDIR is: $LOCALDIR"
echo "HOMEDIR is: $HOME"
echo "Content of home:"
ls -l "$HOME"
echo "Content of current dir:"
ls -l .
echo "******************************"
# start the watchdog in the background
chmod +x watchdog.sh
./watchdog.sh > watchdog.out &
# the main job: perform 8 iterations, 15 seconds each (2 minutes)
for i in $(seq 1 8)
do
  echo "This is my output at: $(date +%y%m%d%H%M%S)"
  echo "This is my error at: $(date +%y%m%d%H%M%S)" 1>&2
  sleep 15
done
# stop the watchdog and wait until it has finished
rm -f watchdog.ctrl
while [ ! -e watchdog.done ]
do
  sleep 1
  echo "Waiting for watchdog: $(date +%y%m%d%H%M%S)"
done
echo "Watchdog closed"
echo "done"
echo "done" 1>&2
echo "Ending at: $(date +%y%m%d%H%M%S)"

17 Some outputs
tmp]$ lfc-ls -l /grid/gilda/brunor/2DFfQYycd5guISZSU3ZdOQ
-rw-rw-r Jul 18 16: _testmyproxy.out
-rw-rw-r Jul 18 16: _testmyproxy.err

brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat file.out
Starting at:
****************************************
<WN INFO …>
This is my output at:
This is my output at:
done
Ending at:

brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat file.err
This is my error at:

brunor_2DFfQYycd5guISZSU3ZdOQ]$ cat watchdog.out
Starting watchdog at:
guid:205a e0-4c68-b963-2facf30efb6f
guid:a21f30b4-46cf-4e63-919b-ceb911bfe710
Ending watchdog at:

18 The future …
The watchdog can be easily improved:
• Use a special folder in the catalog as a virtual UI on the WN, allowing the user to issue shell commands:
  WD_USER_PATH/<JobId>/
    <timestamp>_file_1
    <timestamp>_file_2
    <timestamp>_file_n
    UI/
      commands
      <timestamp>_cmdresult_1
• Use AMGA/RGMA CLI tools instead of the catalog
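One way the virtual UI idea might work is sketched below. It is purely illustrative: the process_commands function is hypothetical, and a local folder stands in for the UI/ folder in the catalog, which a real watchdog would have to download from and upload to.

```shell
#!/bin/bash
# Sketch: run each command found in UI/commands and store its output
# as <timestamp>_cmdresult_<n> in the same folder.
process_commands() {     # $1 = UI folder (local stand-in for the catalog)
    local ui="$1" n=0 cmd stamp
    [ -f "$ui/commands" ] || return 0
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue                 # skip empty lines
        n=$((n + 1))
        stamp=$(date +%y%m%d%H%M%S)
        # stdin is redirected so the command cannot eat the commands file
        bash -c "$cmd" > "$ui/${stamp}_cmdresult_$n" 2>&1 < /dev/null
    done < "$ui/commands"
    rm -f "$ui/commands"                          # run each request only once
}

# Small demonstration in a scratch folder.
ui=$(mktemp -d)
printf '%s\n' 'hostname' 'ls -l .' > "$ui/commands"
process_commands "$ui"
ls "$ui"
```

Called once per watchdog iteration, such a function would let the user inspect the WN while the job is still running, at the cost of one extra catalog lookup per cycle.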

19 References
• The watchdog wiki

20 Questions…

