A proposal for standardizing the working environment for an LCG/EGEE job
David Bouvet - Grid Computing team - CCIN2P3
HEPIX Karlsruhe 13/05/2005
Motivation
Problem raised some months ago by Jeff Templon:
–D0 jobs encountered problems at Lyon because sites use different environment variables to point to scratch/temp disk space
A standard is defined for:
–Environment variables, « IEEE Std , 2004 POSIX Part 1: Base definitions, Amendment 8 », among which: HOME, PATH, PWD, SHELL, TMPDIR, USER
–Batch environment services, « IEEE Std , 2004 POSIX Part 2: Shell and Utilities, Amendment 1 »: PBS_ENVIRONMENT, PBS_JOBID, PBS_JOBNAME, PBS_QUEUE, PBS_O_HOME, PBS_O_HOST, PBS_O_LOGNAME, PBS_O_PATH, PBS_O_QUEUE, PBS_O_SHELL, PBS_O_WORKDIR (these variables are not directly used by the jobs)
There is no standard definition of environment variables for grid batch jobs.
Proposal: a common definition, for LCG/EGEE sites, of a minimal set of environment variables for grid batch jobs.
Current status across the batch systems used on the grid
Environment variables for grid batch jobs have been checked on several LCG/EGEE sites (among which all the LCG T1s).
Test conditions: ATLAS VO, short queue

Batch system   CEs distribution   # CEs checked
BQS            3                  2
CONDOR         4                  3
TORQUE         72                 11
PBS            36                 13
LSF            5                  4
Current status: POSIX variables
Not all these variables are defined on the various batch systems.
[Table: status of each variable per batch system (BQS, CONDOR, TORQUE, PBS, LSF); legend: defined / not defined on some sites]
–POSIX basic: HOME, PATH, PWD, SHELL, TMPDIR, USER
–POSIX batch
Current status (cont.)
Even for Globus, not all the sites define the same set of environment variables.
[Table: status of each variable per batch system (BQS, CONDOR, TORQUE, PBS, LSF); legend: defined / not defined on some sites]
GLOBUS variables:
–GLOBUS_LOCATION
–GLOBUS_PATH
–GLOBUS_TCP_PORT_RANGE
–X509_USER_PROXY
–MYPROXY_SERVER (useful for proxy renewal)
Current status: LCG environment variables (middleware related)
(list from the LCG Users Guide)
[Table: status of each variable per batch system (BQS, CONDOR, TORQUE, PBS, LSF)]
–EDG_LOCATION: base of the installed EDG software
–LCG_LOCATION: base of the installed LCG software
–EDG_WL_JOBID: job ID (for a running job) in a WN
–EDG_WL_LOCATION: base of the EDG’s WMS software
–EDG_WL_PATH: path for EDG’s WMS commands
–EDG_WL_RB_BROKERINFO: location of the .BrokerInfo file in a WN
–LCG_GFAL_INFOSYS: location of the BDII for lcg-utils and GFAL
–LCG_CATALOG_TYPE: type of file catalog used (edg or lfc) for lcg-utils and GFAL
–LFC_HOST: location of the LFC catalog (only for catalog type lfc)
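The dependency between the last two variables (LFC_HOST only matters when LCG_CATALOG_TYPE is lfc) is easy to check from inside a job. The following is a minimal sketch, not part of the LCG tools: the function name `check_catalog_settings` and the host name `lfc.example.org` are hypothetical, and the accepted values are only the two named on the slide.

```python
def check_catalog_settings(env):
    """Return a list of problems found in the catalog-related variables.

    `env` is any mapping of variable names to values (for a real job,
    pass os.environ). An empty list means nothing suspicious was found.
    """
    problems = []
    catalog_type = env.get("LCG_CATALOG_TYPE")
    if catalog_type is None:
        problems.append("LCG_CATALOG_TYPE is not set")
    elif catalog_type not in ("edg", "lfc"):
        # Only the two values named on the slide are accepted here.
        problems.append("unexpected LCG_CATALOG_TYPE: %r" % catalog_type)
    elif catalog_type == "lfc" and not env.get("LFC_HOST"):
        # LFC_HOST is only required for catalog type lfc.
        problems.append("LFC_HOST must be set when LCG_CATALOG_TYPE is lfc")
    return problems


# Hypothetical values, for illustration only.
ok_env = {"LCG_CATALOG_TYPE": "lfc", "LFC_HOST": "lfc.example.org"}
bad_env = {"LCG_CATALOG_TYPE": "lfc"}
print(check_catalog_settings(ok_env))
print(check_catalog_settings(bad_env))
```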
Current status: LCG environment variables (job related)
(list from the LCG Users Guide)
[Table: status of each variable per batch system (BQS, CONDOR, TORQUE, PBS, LSF)]
–EDG_TMP: temp directory
–LCG_TMP: temp directory
–VO_ _DEFAULT_SE: default SE defined for a CE in a WN
–VO_ _SW_DIR: base directory of the VO’s software in a WN
Possible unification of the temp directory variables under the POSIX name TMPDIR?
Current status: gLite environment variables
gLite environment variables on WN (in config. files and scripts), from the gLite installation guide:
–GLITE_LOCATION /opt/glite
–GLITE_LOCATION_VAR /var/glite
–GLITE_LOCATION_LOG /var/log/glite
–GLITE_LOCATION_TMP /tmp/glite
GLITE_LOCATION_TMP: another tmp directory to clean!
Proposal for standardization
POSIX:
–HOME: home directory of the job user on the WN
–TMPDIR: temp directory (currently LCG_TMP, EDG_TMP, GLITE_LOCATION_TMP)
–PWD
–SHELL
–PATH
Grid batch jobs:
–GRID_WORKDIR: job working directory on the WN
–GRID_SITENAME: name of the site on which the job runs (same as siteName in Information Provider)
–GRID_HOSTNAME: hostname of the WN on which the job runs
–GRID_CEID: CE and queue names on which the job runs (same as GlueCEUniqueID in Information Provider)
–GRID_LOCAL_JOBID: job ID in the local batch system
–GRID_GLOBAL_JOBID: job ID on the grid (currently EDG_WL_JOBID)
–GRID_USERID: DN of the user’s certificate
Proposal for standardization (cont.)
Use the POSIX variable when one exists:
–TMPDIR: POSIX variable which can replace LCG_TMP, EDG_TMP, GLITE_LOCATION_TMP
–HOME: MPI jobs need a home directory
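Until TMPDIR is defined everywhere, a job can look for it first and fall back to the middleware-specific names. The sketch below illustrates that lookup. The helper name `resolve_tmpdir` and the order in which the legacy variables are tried are assumptions made for illustration, not part of the proposal.

```python
import os
import tempfile

# Middleware-specific names that TMPDIR is proposed to replace.
# The order of this tuple is an arbitrary choice for this sketch.
LEGACY_TMP_VARS = ("LCG_TMP", "EDG_TMP", "GLITE_LOCATION_TMP")


def resolve_tmpdir(env=None):
    """Return (variable_name, path) for the job's temp directory.

    TMPDIR wins when it is set and non-empty; otherwise the legacy
    variables are tried in turn. If none is set, the name is None and
    the path is Python's own default temp directory.
    """
    env = os.environ if env is None else env
    for name in ("TMPDIR",) + LEGACY_TMP_VARS:
        value = env.get(name)
        if value:
            return name, value
    return None, tempfile.gettempdir()


name, path = resolve_tmpdir()
print("temp directory taken from", name, "->", path)
```

Returning the variable name alongside the path makes it easy to log which sites still rely on a legacy name.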
Proposal for standardization (cont.)
Minimal set of environment variables (not related to middleware). The naming convention must be independent of any grid middleware name, for grid job portability.
–GRID_WORKDIR: work directory specific to the job (Unix permissions 700), e.g.: /scratch/atlas ccwl0092
–GRID_SITENAME: the site on which the job runs (same as siteName in the Information System), e.g.: IN2P3-CC
–GRID_HOSTNAME: the WN hostname, could be useful for problem tracking (and parallel jobs?), e.g.: ccwl0006.in2p3.fr
–GRID_CEID: CE and queue names on which the job runs (same as GlueCEUniqueID in the Information System), e.g.: heplnx201.pp.rl.ac.uk:2119/jobmanager-torque-short
–GRID_LOCAL_JOBID: useful for problem tracking (and parallel jobs?), e.g.: lcg
–GRID_GLOBAL_JOBID: same as EDG_WL_JOBID for LCG
–GRID_USERID: DN of the user’s certificate (already exists on some sites), e.g.: /O=GRID-FR/C=FR/O=CNRS/OU=CC-LYON/CN=David
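A job wrapper could use this minimal set roughly as sketched below: first report which of the proposed variables are missing, then split GRID_CEID into its parts. The function names are hypothetical, and the `host:port/jobmanager-<batch system>-<queue>` layout is inferred from the single example above, so other CE identifiers may not follow it.

```python
import os

# The seven variables of the proposed minimal set.
PROPOSED_VARS = (
    "GRID_WORKDIR", "GRID_SITENAME", "GRID_HOSTNAME", "GRID_CEID",
    "GRID_LOCAL_JOBID", "GRID_GLOBAL_JOBID", "GRID_USERID",
)


def missing_grid_vars(env=None):
    """List the proposed variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in PROPOSED_VARS if not env.get(name)]


def split_ceid(ceid):
    """Split 'host:port/jobmanager-<batch system>-<queue>' into a dict.

    The layout is assumed from the example on the slide.
    """
    endpoint, _, manager = ceid.partition("/")
    host, _, port = endpoint.partition(":")
    _, _, rest = manager.partition("-")      # drops the 'jobmanager' prefix
    batch_system, _, queue = rest.partition("-")
    return {
        "host": host,
        "port": int(port) if port else None,
        "batch_system": batch_system,
        "queue": queue,
    }


print(missing_grid_vars())
print(split_ceid("heplnx201.pp.rl.ac.uk:2119/jobmanager-torque-short"))
```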
Proposal for standardization (cont.)
Once a set of variables and a naming convention are agreed on, this standard should be implemented on all LCG/EGEE CEs.
Based on today’s discussion, a document will be distributed to site administrators and applications.
A possible deadline for discussion and the beginning of deployment: end of June
Proposal for standardization (discussion)
Agreement on:
POSIX:
–HOME: home directory of the job user on the WN
–TMPDIR: temp directory (currently LCG_TMP, EDG_TMP, GLITE_LOCATION_TMP)
Grid batch jobs:
–GRID_WORKDIR: job working directory on the WN
–GRID_SITENAME: name of the site on which the job runs (same as siteName in Information Provider)
–GRID_HOSTNAME: hostname of the WN on which the job runs
–GRID_CEID: CE and queue names on which the job runs (same as GlueCEUniqueID in Information Provider)
–GRID_LOCAL_JOBID: job ID in the local batch system
–GRID_GLOBAL_JOBID: job ID on the grid (currently EDG_WL_JOBID)
–GRID_USERID: DN of the user’s certificate