1 www.eu-eela.eu E-science grid facility for Europe and Latin America Watchdog: A job monitoring solution inside the EELA-2 Infrastructure Riccardo Bruno, Roberto Barbera, Elisa Ingrà INFN Sez. Catania (Italy) 2nd EELA-2 Conference Choroni (Venezuela), 25-27.11.2009

2 Job Monitoring in gLite
Before gLite v3.1, no job monitoring system was available:
– Jobs running on the WNs were treated as black boxes
– No prompt retrieval of the job status (Done/Abort/…)
– The Output Sandbox became available only after the WMS recognized job completion
This was a poor fit for jobs requiring very long computation times.
[Diagram: Jobs → WMS → CE → WNs; the state of a job on a WN is unknown (?) until the Output Sandbox is returned]

3 Analysis
Need
– Get in touch with the jobs running on the WN (especially long-running jobs), monitoring and controlling their execution.
How
– Perform job control and monitoring using grid services in the least invasive way for the application.
Observations
– Almost all Grid jobs are piloted by a main shell script:
    • Get valuable info in case of faults
    • Pilot complex batch workflows
– Both AMGA and SE+LFC can be used as a basic Grid Info System:
    • lfc-* and lcg-* tools are already available for Grid file management
    • The mdcli AMGA command can be used by jobs on the WNs
    • The cp command can be used in case of a shared file system on the WN
    • The latency of the CLI tools is very low compared to the duration of long-running jobs

4 Requirements
Monitor job execution in a timely manner by watching the files the job produces while it executes on the WN
– File snapshots will be reported to LFC+SE, AMGA servers or mounted shared FSs
The monitoring tool should be configurable according to the user's needs
– The monitoring tool will consist only of bash script files
– A few shell environment variables can be used to configure the monitoring behavior
Control the job execution by accessing the WN directly
– It is possible to send user commands to the WN
– It is possible to change the monitoring while the Grid job runs
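As an illustration of configuring a tool through a few shell environment variables, the sketch below applies defaults and validates the reporting mode. All names in it (WD_MODE, WD_INTERVAL, WD_FILES, wd_load_config) are hypothetical; the actual watchdog.conf variables are not listed in these slides.

```shell
# Hypothetical configuration sketch; these are NOT the real watchdog.conf variables.
wd_load_config() {
    # Keep values already set by the user, otherwise fall back to defaults.
    WD_MODE=${WD_MODE:-FS}                      # reporting target: LFC, AMGA or FS (shared FS)
    WD_INTERVAL=${WD_INTERVAL:-60}              # seconds between two snapshots
    WD_FILES=${WD_FILES:-"file.out file.err"}   # space-separated list of files to watch

    # Reject reporting modes other than the three mentioned in the slides.
    case "$WD_MODE" in
        LFC|AMGA|FS) ;;
        *) echo "unsupported WD_MODE: $WD_MODE" >&2; return 1 ;;
    esac
    # The interval must consist of digits only.
    case "$WD_INTERVAL" in
        *[!0-9]*) echo "WD_INTERVAL must be a non-negative integer" >&2; return 1 ;;
    esac
    export WD_MODE WD_INTERVAL WD_FILES
}
```

A pilot script could export WD_INTERVAL=30 before calling wd_load_config to shorten the watch cycle while leaving the other defaults untouched.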

5 The Watchdog
The Watchdog consists of a set of shell scripts to be included in the JDL InputSandbox and then called by the pilot script.
Watchdog features:
– It starts in the background before the Grid job runs
– It runs as long as the main job
– The monitoring process can be controlled until the pilot script finishes
– Easily configurable and customizable
– It does not compromise the CPU power of the WN
– It can be used with MPI jobs
– Files may be reported fully or partially (only the latest changes)
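The background loop and the partial reporting described above can be sketched as follows. This is not the actual watchdog.sh: the function names (report_partial, watch_loop), the offset state file and the use of a local directory (REPORT_DIR) in place of LFC+SE, AMGA or a shared FS are all assumptions made for illustration.

```shell
# Hypothetical sketch of partial reporting; REPORT_DIR stands in for the real target.
REPORT_DIR=${REPORT_DIR:-$(mktemp -d)}

# report_partial FILE STATEFILE
# Writes the bytes appended to FILE since the offset saved in STATEFILE
# to a timestamped snapshot, then saves the new offset.
report_partial() {
    local file=$1 state=$2 offset=0 size snap
    [ -f "$state" ] && offset=$(cat "$state")
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$offset" ]; then
        snap="$REPORT_DIR/$(date +%y%m%d%H%M%S)_$$_$(basename "$file")"
        # tail -c +N starts at byte N (1-based), so skip the bytes already reported.
        tail -c +"$((offset + 1))" "$file" >> "$snap"
        echo "$size" > "$state"
    fi
}

# watch_loop FILE INTERVAL MAIN_PID
# Polls FILE every INTERVAL seconds for as long as process MAIN_PID is alive.
watch_loop() {
    local file=$1 interval=$2 main_pid=$3
    local state="$REPORT_DIR/.offset_$(basename "$file")"
    while kill -0 "$main_pid" 2>/dev/null; do
        [ -f "$file" ] && report_partial "$file" "$state"
        sleep "$interval"
    done
    # Final pass so the last lines written by the job are not lost.
    [ -f "$file" ] && report_partial "$file" "$state"
    return 0
}
```

In a pilot script, something like `watch_loop file.out 60 $$ &` before launching the application would keep the loop alive for as long as the pilot itself. Because the loop sleeps between polls, its CPU cost stays small, which is in line with the feature list above.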

6 WD Main Components
watchdog.sh
– The WD core script; it is responsible for job monitoring, file snapshot reporting and user command execution
watchdog.ctrl
– This script controls the execution of the WD core script; it can start, stop, pause and resume the WD. It can also be used to alter the time interval, add/remove files to watch and change the reporting strategy (full/partial)
watchdog.conf
– This script contains all the environment variables needed to configure the WD
– The use of AMGA reporting requires more files

7 WD Additional Components
getinfo.sh / setinfo.sh, getcontent.sh / setcontent.sh (AMGA)
– Utilities to set/get WD reported information to/from the AMGA metadata catalog
uuencode / uudecode (sharutils) (AMGA)
– Executables needed by the WD to encode binaries and multiline text content into the AMGA metadata catalog in Base64 text format
– In EELA-2 (prod VO) available in:
    • $VO_PROD_VO_EU_EELA_EU_SW_DIR
wdcli
– CLI application that lets the user interact with the WD
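The Base64 step can be illustrated with a short round trip. The sketch below uses the coreutils `base64` command as a stand-in for uuencode/uudecode, an assumption made only so that the example runs where sharutils is not installed; the function names are hypothetical and no AMGA call is shown.

```shell
# Hypothetical helpers; base64 (coreutils) replaces uuencode/uudecode here.

# encode_content FILE
# Prints FILE as one line of Base64 text, suitable for a single text attribute.
encode_content() {
    base64 < "$1" | tr -d '\n'
}

# decode_content TEXT
# Restores the original bytes from the Base64 text.
decode_content() {
    printf '%s' "$1" | base64 -d
}
```

Stripping the line breaks from the encoder output yields a single-line value, so multiline text and binary data can both be stored and later restored byte for byte.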

8 WD Usage
1. Configure the Watchdog by editing the watchdog.conf file
2. Applications using the Watchdog MUST include the files watchdog.sh, watchdog.ctrl, watchdog.conf and, in case of AMGA reporting, uuencode and uudecode (or add VO_PROD_VO_EU_EELA_EU_SW_DIR to the PATH on the WN)
3. Call watchdog.ctrl in the pilot script

App JDL:

    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/bash";
    StdOutput = "file.out";
    StdError = "file.err";
    InputSandbox = {"watchdog.sh", "watchdog.ctrl", "watchdog.conf", "uuencode", "uudecode", "AppPilotScript.sh"};
    OutputSandbox = {"MyApp.out", "MyApp.err", "watchdog.log", "watchdog.err"};
    Arguments = "AppPilotScript.sh";

AppPilotScript.sh:

    #!/bin/sh
    …
    # Prepare and start the watchdog
    PATH=${VO_PROD_VO_EU_EELA_EU_SW_DIR}/:${PATH}:.
    chmod +x watchdog.*
    ./watchdog.ctrl start
    # Run the application
    …
    # Use ./watchdog.ctrl to control the WD at any time
    # Stop the watchdog and wait until it completes
    ./watchdog.ctrl stop

9 WD Interaction
[Diagram: on the WN, watchdog.sh (configured by watchdog.conf) exchanges flags, commands (CMD) with their output (OUT, ERR) and file snapshots with the reporting target (LFC/AMGA or mounted shared FS). Diagram labels: Flags, WD Control DIR, WD CMD Exe DIR, File snapshots.]
Example listing for job 6-tPC2d2knO7m6GP2XC7-Q, in the order shown on the slide:
/6-tPC2d2knO7m6GP2XC7-Q
_watchdog/
091002232421_wdcli_cmd1.cmd
091002232421_wdcli_cmd1.err
091002232421_wdcli_cmd1.out
...
091002232729_wdcli_cmd7.cmd
091002232729_wdcli_cmd7.err
091002232729_wdcli_cmd7.out
WDEND or WDPID
WDENV
WDHST
cmdlist/
wdcli_cmd8
091002231841_13156_file.err
091002231853_13156_file.out
091002231904_13156_watchdog.err
…
091002232836_13156_watchdog.log
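The file snapshot names in the listing appear to follow the pattern timestamp_number_filename, with a 12-digit timestamp. The sketch below splits such a name into its parts, assuming the timestamp is YYMMDDhhmmss in the 2000s and the numeric field is a process ID; neither is stated on the slide, and parse_snapshot is a hypothetical helper. It targets file snapshots only, not the wdcli command files.

```shell
# parse_snapshot NAME
# Splits a snapshot name such as 091002231841_13156_file.err into
# a readable timestamp, the numeric field and the original file name.
parse_snapshot() {
    local name=$1 stamp rest id file when
    stamp=${name%%_*}      # leading timestamp (assumed YYMMDDhhmmss)
    rest=${name#*_}
    id=${rest%%_*}         # numeric field (assumed to be a process ID)
    file=${rest#*_}        # original file name, may itself contain underscores
    when=$(printf '%s\n' "$stamp" |
        sed -E 's/^([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$/20\1-\2-\3 \4:\5:\6/')
    printf '%s %s %s\n' "$when" "$id" "$file"
}
```

Because the timestamp is fixed-width and comes first, sorting snapshot names lexically also sorts them chronologically.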

10 wdcli
CLI to ease the user interaction with the WD
– Prompt: 20091124164201 wd>
Uses the watchdog.conf file to get the user configuration
Principal commands:
– set: set the MODE (LFC/AMGA/mounted shared FS)
– show jobs: get the list of monitored jobs
– Attach to a monitored job
– show snapshots: get the list of file snapshots
– View the snapshot content
– Get generic info: ENV, PID, CE, WN, Proxy …
– exec: execute a given command
    • Interactive commands are not allowed
    • It is possible to call the watchdog.ctrl command (use the -n option!)

11 WD in EELA-2
Presented for the first time at E2GRIS1 in Itacuruca (Brazil)
– G-HMMER/G-InterProScan → Bioinformatics: get semi-real-time info to be published on the Web
– CrossFire → Civil Protection: get semi-real-time info to view the simulation output
Presented for the second time at E2GRIS2 in Queretaro (Mexico)
– HeMoLab → Bioinformatics: long-running jobs, check output files while running
– AeroVANT → Engineering: long-running jobs, get data while running
– BioMD → Bioinformatics: long-running job, monitor the simulation
– Seismic Sensors (planned) → Earth Science: monitor the job execution
– Cinefilia → Recommender Systems: monitor the computation

12 Conclusions
WD mainly used for:
– Job monitoring (long-running jobs)
– Checking/getting job-produced data
WD used as:
– A debugging helper tool
– An application component (CrossFire)
WD is easy to integrate but needs a precise configuration
– EELA-2 has 2 different AMGA servers using different access rights (EU and LA)
– EELA-2 does not have the sharutils (uuencode/uudecode) package installed on the WNs. These tools are available under the WN path VO_PROD_VO_EU_EELA_EU_SW_DIR; alternatively, put the uuencode/uudecode commands in the InputSandbox
– Several EELA-2 WNs were using a different BDII, so some users were unable to easily retrieve the snapshot content (LFC)

13 Future
Improve the user interaction
– Improve wdcli (following its success at E2GRIS2)
– Create tools to easily build web-based front ends
– Provide tools to reconstruct a file monitored incrementally
Ease the application integration (AMGA)
– Remove the dependency on uuencode/uudecode
– Provide watchdog.conf file templates for VOs
Improve the monitoring
– Provide independent watch cycles for each file
– Provide a sandboxing mechanism for file I/O from/to the WN
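One possible approach to reconstructing an incrementally monitored file is to concatenate its partial snapshots in timestamp order. The sketch below is not part of the Watchdog; it assumes that each partial snapshot holds only the bytes added since the previous one, that snapshots are named timestamp_number_filename with a fixed-width timestamp, and that they have already been downloaded to a local directory. The reconstruct function is hypothetical.

```shell
# reconstruct SNAPSHOT_DIR FILE_NAME
# Prints the partial snapshots of FILE_NAME found in SNAPSHOT_DIR, oldest first.
reconstruct() {
    local dir=$1 name=$2 part
    # Glob results are sorted by name, so the leading timestamp gives chronological order.
    for part in "$dir"/*_"$name"; do
        [ -f "$part" ] && cat "$part"
    done
    return 0
}
```

For example, `reconstruct ./snapshots file.out > file.out` would rebuild the output file. The glob also matches any other file whose name ends in `_file.out`, so a real tool would need a stricter pattern.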

14 Questions?

