CE: compute element TP: CE & WN Compute Element Worker Node Installation configuration.


1 CE: Compute Element. TP: CE & WN (Compute Element and Worker Node) installation and configuration

2 CE presentation
The Computing Element is the central service of a site. Its main functions are:
- manage the jobs (job submission, job control)
- report the status of the jobs to the WMS
- publish all site information (site, queues, number of total and free CPUs)
It can run several kinds of batch system:
- Torque + MAUI
- LSF
- Condor

3 TORQUE server presentation
The Torque server is composed of:
- pbs_server, which provides the basic batch services such as receiving/creating a batch job.
The Torque client is composed of:
- pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
The MAUI system is composed of:
- job_scheduler, which contains the site's policy to decide which job must be executed.

4 CE: site-info.def variables (1)
Main variables of the site configuration file for the CE:
    CE_HOST=ce1.$MY_DOMAIN
    # Jobmanager specific settings
    JOB_MANAGER=lcgpbs
    CE_BATCH_SYS=torque
    BATCH_BIN_DIR=/usr/bin
    BATCH_VERSION=torque-1.0.1b
    BATCH_LOG_DIR=/var/spool/pbs/server_priv/accounting
    # Architecture and environment specific settings
    CE_CPU_MODEL=PIV
    CE_CPU_VENDOR=intel
    CE_CPU_SPEED=1001
    CE_OS="Scientific Linux SL"
    CE_OS_RELEASE="SL"
    CE_OS_VERSION=3.0.5
    CE_MINPHYSMEM=1024

5 CE: site-info.def variables (2)
    CE_MINVIRTMEM=2048
    CE_SMPSIZE=1
    CE_SI00=381
    CE_SF00=0
    CE_OUTBOUNDIP=TRUE
    CE_INBOUNDIP=FALSE
    CE_RUNTIMEENV=" LCG-2 LCG-2_1_0 … GLITE-3_0_0 R-GMA "
    # TORQUE - Change this if your torque server is not on the CE
    TORQUE_SERVER=$CE_HOST
Worker Node list defined for the site "private.gridprototype":
    WN_LIST=/opt/glite/yaim/travail/wn-list.conf
Content of wn-list.conf:
    ce1.private.gridprototype
    se1.private.gridprototype
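Since site-info.def uses plain shell KEY=value syntax, a quick sanity check before running yaim can catch variables that were left empty. The following is a minimal sketch, not part of yaim: the function name missing_vars is hypothetical, and the list of variables to check is whatever you pass on the command line.

```shell
#!/bin/sh
# Hypothetical helper (not part of yaim): print the name of every listed
# variable that is unset or empty after sourcing the given site-info.def.
# The function body is a subshell, so the caller's environment is untouched.
missing_vars() (
    file=$1
    shift
    # Use a path containing a slash, otherwise "." searches $PATH for the file.
    . "$file" || exit 1
    for name in "$@"; do
        # Indirect lookup: copy the value of the variable named by $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
)
```

For example, `missing_vars ./site-info.def CE_HOST JOB_MANAGER CE_BATCH_SYS TORQUE_SERVER WN_LIST` prints nothing when all five variables have a value. Note that sourcing the file executes it, so only run this on a configuration file you trust.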

6 WN: worker node & Torque client presentation
The Worker Node is a service where the jobs run. Its main functions are:
- execute the jobs
- report the status of the jobs to the Computing Element
It can run several kinds of client batch system:
- Torque
- LSF
The Torque client is composed of:
- pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.

7 CE certification
Get the certificates from the BEINGRID CA Certification Authority: http://voms.beingrid.fr.cgg.com/ca/
Back up the certificate as a .p12 file, then extract the public certificate and the private key:
    openssl pkcs12 -nokeys -in ce1.p12 -out ce1….cert
    openssl pkcs12 -nocerts -in ce1.p12 -out ce1….key
For the CE1 machine, the certificate files are named:
    ce1.private.gridprototype.crt
    ce1.private.gridprototype.key
Install the certificates in the /etc/grid-security directory on the CE:
    cd /etc/grid-security/
    ln -s ce1.private.gridprototype.crt hostcert.pem
    ln -s ce1.private.gridprototype.key hostkey.pem
    chmod 644 hostcert.pem
    chmod 400 hostkey.pem
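The permissions set above (644 on the certificate, 400 on the key) are easy to get wrong when files are copied around, so it can help to verify them after installation. The sketch below is one possible check, assuming only POSIX ls and cut; the function names mode_of and check_host_credentials are hypothetical and not part of any grid middleware.

```shell
#!/bin/sh
# Print the symbolic mode (for example -rw-r--r--) of a file.
# -L follows symlinks, so hostcert.pem reports the mode of the file it points to.
mode_of() {
    ls -ldL "$1" | cut -c1-10
}

# Hypothetical check: succeed only if hostcert.pem is 644 and hostkey.pem is 400
# in the given directory (normally /etc/grid-security).
check_host_credentials() {
    dir=$1
    rc=0
    cert_mode=$(mode_of "$dir/hostcert.pem")
    key_mode=$(mode_of "$dir/hostkey.pem")
    if [ "$cert_mode" != "-rw-r--r--" ]; then
        echo "hostcert.pem: expected mode 644, found '$cert_mode'"
        rc=1
    fi
    if [ "$key_mode" != "-r--------" ]; then
        echo "hostkey.pem: expected mode 400, found '$key_mode'"
        rc=1
    fi
    return $rc
}
```

Running `check_host_credentials /etc/grid-security` prints one line per problem and returns a non-zero status if either file is missing or has a different mode.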

8 List of mandatory configuration files
The WN list defined for the site "private.gridprototype":
    WN_LIST=/opt/glite/yaim/travail/wn-list.conf
The mapped-users list defined for the site "private.gridprototype":
    /opt/glite/yaim/travail/users.conf
The mapped-groups list defined for the site "private.gridprototype":
    /opt/glite/yaim/travail/groups.conf

9 CE installation and configuration
gLite-yaim generic commands:
    install_node site-info.def lcg-CE_torque glite-WN
The CE is a certified machine: install the certificates in the directory /etc/grid-security/, then run:
    configure_node site-info.def CE_torque WN_torque BDII_site

10 CE publication test
The CE should publish information to the BDII:
    lcg-infosites --vo egeode ce
    valor del bdii: rb1.private.gridprototype:2170
    #CPU  Free  Total Jobs  Running  Waiting  ComputingElement
    -------------------------------------------------------
       2     2           0        0        0  ce1.private.gridprototype:2119/jobmanager-lcgpbs-egeode
The CE should also publish the status of the job queues. Run the following locally as the egeode005 user; the nodes listed should match the WN list defined in /opt/glite/…/wn-list.conf:
    pbsnodes -a
    se1.private.gridprototype
    ce1.private.gridprototype
        state = free
        np = 1
        properties = lcgpro
        ntype = cluster
        etc…
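Comparing the pbsnodes output with wn-list.conf by eye gets tedious once a site has more than a few nodes. The sketch below automates that comparison on a saved copy of the output. It assumes the usual pbsnodes -a layout, where node names start in the first column and attribute lines are indented; the function names pbs_node_names and missing_from_torque are hypothetical.

```shell
#!/bin/sh
# Extract the node names from a file holding `pbsnodes -a` output.
# Assumption: node names start in column 0, attribute lines are indented.
pbs_node_names() {
    grep -v '^[[:space:]]' "$1" | grep -v '^$' | sort
}

# Print every host from the WN list that Torque does not report.
# $1: file with saved `pbsnodes -a` output, $2: wn-list.conf
missing_from_torque() {
    names=$(pbs_node_names "$1")
    grep -v '^[[:space:]]*$' "$2" | sort | while read -r host; do
        # -x matches the whole line, -F treats the host name as a fixed string.
        printf '%s\n' "$names" | grep -qxF "$host" || echo "$host"
    done
}
```

A typical use would be `pbsnodes -a > /tmp/nodes.txt` followed by `missing_from_torque /tmp/nodes.txt /opt/glite/yaim/travail/wn-list.conf`; empty output means every host in the WN list is known to the Torque server.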

11 Local job submission on the CE
To submit jobs locally, you must be logged in as the mapped user egeode005 on the newly installed CE machine.
    cat test.sh
    #!/bin/sh
    /bin/hostname
    /bin/sleep 300
    qsub -q egeode test.sh
    35.ce1.private.gridprototype
    qstat -a
    ce1.private.gridprototype:
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time
    --------------- -------- -------- ---------- ------ --- --- ------ -----
    35.ce1.private. egeode00 egeode   test.sh     11239  --  --     -- 48:00 R
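When scripting this test, it helps to pull the job state out of the qstat listing instead of reading the table. The following sketch assumes the qstat -a column layout shown above, where the state letter (R for running, Q for queued) is the tenth whitespace-separated field; the function names job_number and job_state are hypothetical.

```shell
#!/bin/sh
# Reduce a full job identifier as printed by qsub
# (for example 35.ce1.private.gridprototype) to its numeric part.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Read `qstat -a` output on stdin and print the state letter of the job
# whose identifier starts with "<number>.".
# Assumption: the state is the 10th field, as in the listing above.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $10 }'
}
```

For instance, `qstat -a | job_state "$(job_number 35.ce1.private.gridprototype)"` prints the state of job 35, or nothing once the job has left the queue. Matching on the numeric prefix is deliberate, because qstat -a truncates the host part of the job identifier.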

12 UI/GUI
Java graphical interface commands:
    edj-wl-ui-jobmonitor.sh
    edj-wl-ui-jdleditor.sh
    …

13 CE Torque/Maui documentation
- TORQUE ADMIN GUIDE: http://www.clusterresources.com/wiki/doku.php?id=torque:torque_wiki
- MAUI ADMIN GUIDE: http://www.clusterresources.com/products/maui/docs/mauiadmin.shtml

14 Questions on the CE?

