South African Grid Training COMPUTING ELEMENT Albert van Eck UFS - ICTS 18 November 2009 Slides by: GIUSEPPE PLATANIA.


1 South African Grid Training COMPUTING ELEMENT Albert van Eck UFS - ICTS 18 November 2009 Slides by: GIUSEPPE PLATANIA

2 18 Nov 2009, Cape Town South African Grid Training
OUTLINE
– OVERVIEW
– INSTALLATION & CONFIGURATION
– TESTING
– FIREWALL SETUP
– TROUBLESHOOTING

3 OVERVIEW
The Computing Element is the central service of a site. Its main functions are:
– managing jobs (job submission, job control)
– reporting the status of the jobs to the WMS
– publishing all site information (site location, queues, CPU status, and so on) via LDAP (site BDII service)
It can run on several kinds of batch systems:
– Torque + MAUI
– LSF
– SGE
– Condor

4 TORQUE + MAUI
The Torque server is composed of:
– pbs_server, which provides the basic batch services such as receiving and creating batch jobs.
The Torque client is composed of:
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
The MAUI system is composed of:
– job_scheduler, which contains the site's policy for deciding which job must be executed and when.

5 Site BDII**
– By default it is installed on the CE
– It collects all site GRISes* (for example SE, RB, LFC, etc.)
– The name of the service is bdii
– Log file: /opt/bdii/var/bdii.log
*GRIS = Grid Resource Information Service
**BDII = Berkeley Database Information Index

6 Computing Element installation & configuration using YAIM

7 WHAT KIND OF CE?
There are several kinds of metapackages to install:
– ig_CE: LCG ComputingElement without batch system packages.
– ig_CE_LSF: LCG ComputingElement with LSF. IMPORTANT: provided for consistency; it does not install LSF, but it applies some fixes via ig_configure_node.
– ig_CE_torque: LCG ComputingElement with Torque+MAUI.

8 HOW TO GET A HOST CERTIFICATE
Host certificate for CE.
– Please request it from your RA
For this tutorial:
    HOST=$(hostname -f)
    mkdir /etc/grid-security
    cp /root/$HOST/${HOST}-cert.pem /etc/grid-security/hostcert.pem
    cp /root/$HOST/${HOST}-key.pem /etc/grid-security/hostkey.pem
Install host certificates
– (hostcert.pem and hostkey.pem) in /etc/grid-security
– mkdir /etc/grid-security
– cd /etc/grid-security
– chmod 644 hostcert.pem
– chmod 400 hostkey.pem
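The copy and permission steps on this slide can be wrapped in one small function. This is only a sketch and not part of the original slides: the function name install_hostcert and its arguments are hypothetical, and the destination directory is a parameter so the function can be tried on a scratch directory before it touches /etc/grid-security.

```shell
# Hypothetical helper: install a host certificate/key pair with the
# permissions listed on the slide (644 for the cert, 400 for the key).
# Usage: install_hostcert CERT KEY [DEST_DIR]
install_hostcert() {
    cert=$1
    key=$2
    dest=${3:-/etc/grid-security}

    # Refuse to continue if either input file is missing.
    [ -f "$cert" ] && [ -f "$key" ] || {
        echo "missing certificate or key file" >&2
        return 1
    }

    mkdir -p "$dest" || return 1
    cp "$cert" "$dest/hostcert.pem" || return 1
    cp "$key" "$dest/hostkey.pem" || return 1
    chmod 644 "$dest/hostcert.pem" || return 1
    chmod 400 "$dest/hostkey.pem"
}
```

For the tutorial layout it could be called as install_hostcert /root/$HOST/${HOST}-cert.pem /root/$HOST/${HOST}-key.pem.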

9 Repository settings
    REPOS="ca dag glite-lcg_ce ig jpackage gilda"
Download and save the repo files:
    for name in $REPOS; do
        wget http://grid018.ct.infn.it/mrepo/repos/$name.repo -O /etc/yum.repos.d/$name.repo
    done
Repository location: http://grid018.ct.infn.it/mrepo/repos
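Before downloading anything, it can help to print the commands the loop would run. The following is a sketch under stated assumptions: the repository names and base URL come from the slide, while the helper names repo_url and print_repo_downloads are hypothetical.

```shell
# Repository names and base URL are taken from the slide.
REPOS="ca dag glite-lcg_ce ig jpackage gilda"
REPO_BASE="http://grid018.ct.infn.it/mrepo/repos"

# Hypothetical helper: print the download URL for one repository name.
repo_url() {
    echo "$REPO_BASE/$1.repo"
}

# Hypothetical helper: print the wget commands without running them,
# so the list can be reviewed first. The target directory defaults to
# /etc/yum.repos.d.
print_repo_downloads() {
    dest=${1:-/etc/yum.repos.d}
    for name in $REPOS; do
        echo "wget $(repo_url "$name") -O $dest/$name.repo"
    done
}

print_repo_downloads
```

Once the list looks right, the printed lines can be run as they are, or the loop from the slide can be used directly.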

10 INSTALLATION
    yum remove jdk
    yum install xml-commons-resolver12
    yum install jdk java-1.6.0-sun-compat
    yum install maui-3.2.6p19_20.snap.1182974819-5.slc4 \
        maui-server-3.2.6p19_20.snap.1182974819-5.slc4
    yum install ig_CE_torque
    yum install lcg-CA
Gilda rpms:
    yum install gilda_utils
If it's also the site BDII collector:
    yum install ig_BDII

11 Customize ig-site-info.def
Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:
    cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/
Open /opt/glite/yaim/etc/gilda/ file using a text editor and set the following values according to your grid environment:
    CE_HOST=
    TORQUE_SERVER=$CE_HOST

12 Customize ig-site-info.def
    JOB_MANAGER=lcgpbs
    BATCH_BIN_DIR=/usr/bin
    BATCH_VERSION=torque-2.1.9-4
    CE_BATCH_SYS=pbs
    CE_CPU_MODEL=Opteron
    CE_CPU_VENDOR=AMD
    CE_CPU_SPEED=3000
    CE_OS="ScientificSL"
    CE_OS_RELEASE=4.8
    CE_OS_VERSION="SL"
    CE_MINPHYSMEM=2048
    CE_MINVIRTMEM=4096
    CE_SMPSIZE=2
    CE_SI00=1000
    CE_SF00=1200
    CE_OUTBOUNDIP=TRUE
    CE_INBOUNDIP=TRUE
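Some of these values describe the worker node hardware, so they can be read from a worker node instead of being typed by hand. The sketch below assumes a Linux worker node with /proc/cpuinfo and /proc/meminfo, and assumes that CE_SMPSIZE should be the number of CPUs per node and CE_MINPHYSMEM the memory in MB; the helper names count_cpus and total_mem_mb are hypothetical.

```shell
# Hypothetical helpers: read values from a worker node that could be
# used for CE_SMPSIZE and CE_MINPHYSMEM in ig-site-info.def.

# Count the CPUs listed in a cpuinfo-style file.
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Report total memory in MB from a meminfo-style file (MemTotal is in kB).
total_mem_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Print suggested settings when the /proc files are available.
if [ -r /proc/cpuinfo ] && [ -r /proc/meminfo ]; then
    echo "CE_SMPSIZE=$(count_cpus)"
    echo "CE_MINPHYSMEM=$(total_mem_mb)"
fi
```

The printed values are only suggestions; a site with mixed hardware would still need to choose the values that describe its smallest nodes.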

13 Customize ig-site-info.def
    GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
    USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
    JAVA_LOCATION="/usr/java/latest"
    SITE_EMAIL="grid-prod@ "
    SITE_NAME=GILDA-54..58 #Your Number (eg. GILDA-60)
    SITE_LOC="Cape Town, SOUTH AFRICA"
    SITE_LAT=37.5
    SITE_LONG=15.152
    SITE_WEB="https://gilda.ct.infn.it"
    SITE_SUPPORT_SITE="grid-prod@ "
REMOVE the following, if it exists:
    SITE_TIER="xxxxxxxx"

14 Customize ig-site-info.def
    QUEUES="short long infinite gilda"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
If you configure a queue for a single VO:
    QUEUES="short long infinite gilda"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
    GILDA_GROUP_ENABLE="gilda"
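The naming pattern on this slide pairs each queue in QUEUES with a variable named after the queue in upper case plus _GROUP_ENABLE. A short check along those lines can catch a queue that has no matching variable. This is a sketch only: the function name check_queue_vars is hypothetical, and it assumes queue names contain only letters, digits and underscores.

```shell
# Hypothetical check: every queue in QUEUES should have a matching
# <QUEUE>_GROUP_ENABLE variable, with the queue name in upper case.
# Prints the names of missing variables; returns non-zero if any are missing.
check_queue_vars() {
    missing=0
    for q in $QUEUES; do
        var=$(echo "$q" | tr '[:lower:]' '[:upper:]')_GROUP_ENABLE
        # Skip names that would not be valid shell variable names.
        case $var in
            *[!A-Z0-9_]*)
                echo "cannot check queue $q"
                missing=1
                continue ;;
        esac
        # Indirect lookup of the variable named in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "$var is not set"
            missing=1
        fi
    done
    return $missing
}
```

After sourcing the site-info file in a shell, running check_queue_vars lists any queue that lacks its variable, for example a gilda queue without GILDA_GROUP_ENABLE.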

15 Customize ig-site-info.def
    DPM_HOST="aliserv6.ct.infn.it"
    SE_LIST="$DPM_HOST"
    VOS="gilda " #If you have more than one: "gilda my_other_vo"
    ALL_VOMS="gilda"
    WMS_HOST="egee-wms-01.cnaf.infn.it"
    SE_MOUNT_INFO_LIST="none"
    CE_OTHERDESCR="Cores=8,Benchmark=$CE_SI00-HEP-SPEC06"
    CE_RUNTIMEENV="LCG-2 LCG-2_1_0 LCG-2_1_1 LCG-2_2_0 GLITE-3_0_0 GLITE-3_1_0 R-GMA"
    CE_CAPABILITY="CPUScalingReferenceSI00=$CE_SI00"
    BATCH_SERVER=$CE_HOST
    BDII_HOST=gilda-bdii.ct.infn.it
    SITE_BDII_HOST=$CE_HOST
    BDII_REGIONS="CE SE"
    BDII_CE_URL="ldap://$CE_HOST:2170/mds-vo-name=resource,o=grid"
    BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"

16 Customize ig-site-info.def
    VO_GILDA_SW_DIR=$VO_SW_DIR/gilda
    VO_GILDA_DEFAULT_SE=$CLASSIC_HOST
    VO_GILDA_STORAGE_DIR=$CLASSIC_STORAGE_DIR/gilda
    VO_GILDA_QUEUES="gilda"
    VO_GILDA_VOMS_SERVERS="vomss://voms.ct.infn.it:8443/voms/gilda?/gilda"
    VO_GILDA_VOMSES="'gilda voms.ct.infn.it 15001 /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it gilda'"
    VO_GILDA_VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA' '/C=IT/O=INFN/CN=INFN CA'"

17 Customize ig-site-info.def
    WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
The file specified in WN_LIST must list the full hostname of every WN.
WARNING: It is important to fill in the WN file (/opt/glite/yaim/etc/gilda/wn-list.conf) before you run the yaim configure command.
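Since the WN file has to contain full hostnames, a quick check before running yaim can catch short names. The sketch below is an assumption about what a reasonable check looks like, not a YAIM feature: the function name check_wn_list is hypothetical, and a line counts as a full hostname here if it has at least one dot and only letters, digits, dots and hyphens.

```shell
# Hypothetical check for the file named in WN_LIST: every non-empty line
# should look like a fully qualified hostname (at least one dot, no spaces).
# Prints offending lines; returns non-zero if any are found or the file
# cannot be read.
check_wn_list() {
    file=${1:-/opt/glite/yaim/etc/gilda/wn-list.conf}
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    bad=$(grep -v '^$' "$file" | grep -Ev '^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$' || true)
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
    return 0
}
```

Running check_wn_list with no argument inspects the default path from the slide.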

18 Customize ig-site-info.def
Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
    cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
    cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
Append the gilda users and groups definitions to /opt/glite/yaim/etc/gilda/ig-users.conf and ig-groups.conf:
    cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
    cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf
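Appending with cat duplicates the gilda entries if the step is run twice. One way to guard against that is sketched below; the function name append_once is hypothetical, and it uses the first line of the source file as a marker, which assumes that line is unique to the gilda definitions.

```shell
# Hypothetical helper: append SRC to DST only if DST does not already
# contain the first line of SRC, so rerunning the step does not
# duplicate the gilda entries.
append_once() {
    src=$1
    dst=$2
    first=$(head -n 1 "$src")
    if [ -n "$first" ] && grep -Fxq -- "$first" "$dst" 2>/dev/null; then
        echo "already present in $dst, nothing appended"
    else
        cat "$src" >> "$dst"
    fi
}
```

For example: append_once /opt/glite/yaim/etc/gilda/gilda_ig-users.conf /opt/glite/yaim/etc/gilda/ig-users.conf.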

19 CE Torque Configuration
Now we can configure the node:
    /opt/glite/yaim/bin/ig_yaim -c \
        -s /opt/glite/yaim/etc/gilda/ \
        -n ig_CE_torque \
        -n BDII_site
* Note that there are two different node type (-n) parameters.
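Because ig_yaim reads the files prepared on the previous slides, it can be worth confirming they exist first. This is a sketch, not part of YAIM: the function name yaim_preflight is hypothetical, and the file names are the ones mentioned earlier in this deck.

```shell
# Hypothetical pre-flight check: confirm the files referenced on the
# previous slides exist and are non-empty in the configuration
# directory before ig_yaim is run.
yaim_preflight() {
    dir=${1:-/opt/glite/yaim/etc/gilda}
    status=0
    for f in ig-site-info.def ig-users.conf ig-groups.conf wn-list.conf; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return $status
}
```

A possible use is yaim_preflight && followed by the ig_yaim command shown on the slide, so configuration only starts when all four files are in place.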

20 Computing Element Testing

21 Testing
Check that the local GRIS and the site BDII are running on the CE and are publishing the right information (CPU, site name and so on):
    ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=resource,o=grid
    ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=your_site_name,o=grid
The second ldapsearch will return nothing; see the next slide.

22 Testing
    ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=your_site_name,o=grid
This ldapsearch won't return anything.
Solution: edit the following file:
    /opt/glite/yaim/etc/gilda/services/glite-bdii_site
Comment out the following entries, or set the correct values for them, and rerun ig_yaim:
    BDII_REGIONS=...
    BDII_host-id-1_URL=...
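The two queries differ only in the name placed in the search base, so they can share one helper that also counts the entries returned. This is a sketch assuming ldapsearch is installed and accepts the options used on the slides; the function names bdii_base and bdii_entry_count are hypothetical.

```shell
# Hypothetical helper: build the LDAP search base used on the slides.
# "resource" queries the local GRIS; a site name queries the site BDII.
bdii_base() {
    echo "mds-vo-name=$1,o=grid"
}

# Hypothetical helper: count the entries returned for a given host and
# name. A count of 0 matches the empty result described above.
# Requires ldapsearch; port 2170 is the one used on the slides.
bdii_entry_count() {
    ldapsearch -x -h "$1" -p 2170 -b "$(bdii_base "$2")" 2>/dev/null | grep -c '^dn:'
}
```

For example, bdii_entry_count your_ce_hostname resource and bdii_entry_count your_ce_hostname your_site_name can be compared before and after rerunning ig_yaim.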

23 Testing
Become a gilda user:
    # su - gilda001
Create a file (test.sh) and add the following:
    #!/bin/sh
    sleep 20    # useful for seeing the job status
    hostname
Save it and make the file executable:
    chmod 700 test.sh

24 Testing
[gilda001@ce gilda001]$ qsub -q short test.sh
[gilda001@ce gilda001]$ qstat -a

ce.localdomain:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
3.wn.localdo    gilda001 short    test.sh      5839  --  --     -- 00:15 R   --
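In the listing above the job state is the single letter in the S column, which is the tenth whitespace-separated field of the job line. A small filter can pull it out for scripting. This is a sketch based only on the output shown on the slide; the function name job_state is hypothetical.

```shell
# Hypothetical helper: read "qstat -a" output on stdin and print the
# state letter (the S column, field 10) of the job whose ID starts
# with the given number followed by a dot.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $10 }'
}
```

For the job on the slide, qstat -a | job_state 3 would print the state letter while the job is listed, and print nothing once it has left the queue.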

25 Testing
[gilda001@ce gilda001]$ qstat -a
[gilda001@ce gilda001]$
The job has finished executing, so we list the output files:
[gilda001@ce gilda001]$ ls
test.sh.e3 test.sh.o3
And show the results:
[gilda001@ce gilda001]$ cat test.sh.e3    (error file)
[gilda001@ce gilda001]$
[gilda001@ce gilda001]$ cat test.sh.o3    (output file)
wn.localdomain

26 Testing
Log onto the UI:
    Hostname -> glite-tutor.ct.infn.it
    Username -> capetown01..06
    Password -> GridCAP01..06
    Grid passphrase -> CAPETOWN

27 Testing
[plt@glite-tutor plt]$ voms-proxy-init --voms gilda
[plt@glite-tutor plt]$ globus-job-run :2119/jobmanager-lcgpbs -q short /bin/hostname
wn.localdomain
[plt@glite-tutor plt]$ glite-wms-job-submit -a -r your-ce-hostname:2119/jobmanager-lcgpbs-gilda hostname.jdl
Selected Virtual Organisation name (from proxy certificate extension): gilda
Connecting to host glite-rb.ct.infn.it, port 7772
Logging to host glite-rb.ct.infn.it, port 9002
********************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
- https://glite-rb.ct.infn.it:9000/Vo-4Ih1s-iDbBPr3rs69GQ
********************************************************************************
[plt@glite-tutor plt]$ glite-wms-job-status https://glite-rb.ct.infn.it:9000/Vo-4Ih1s-iDbBPr3rs69GQ
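The slide submits hostname.jdl without showing its contents. The sketch below writes one plausible version: the attribute values are assumptions, chosen so that the job runs /bin/hostname like the globus-job-run test and returns its standard output and error. The function name write_hostname_jdl is hypothetical.

```shell
# Hypothetical contents for hostname.jdl; the slides do not show the file.
# The job runs /bin/hostname and brings back its stdout and stderr.
# Usage: write_hostname_jdl [PATH]   (default: ./hostname.jdl)
write_hostname_jdl() {
    cat > "${1:-hostname.jdl}" <<'EOF'
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
EOF
}
```

Calling write_hostname_jdl in the working directory on the UI creates the file that the glite-wms-job-submit command above refers to.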

28 FIREWALL SETUP

29 /etc/sysconfig/iptables (1/2)
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :RH-Firewall-1-INPUT - [0:0]
    -A INPUT -j RH-Firewall-1-INPUT
    -A FORWARD -j RH-Firewall-1-INPUT
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2135 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2119 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2170 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2811 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport maui -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_mom -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_resmon -j ACCEPT

30 /etc/sysconfig/iptables (2/2)
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3878:3879 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 3879 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3882 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 1020:1023 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 20000:25000 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 32768:65535 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 32768:65535 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
    -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
    COMMIT

31 IPTABLES STARTUP
    /sbin/chkconfig iptables on
    /etc/init.d/iptables start
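After editing the rules file, a compact list of what it accepts makes it easier to compare against the ports the CE needs (for example 2119 and 2170). The sketch below only parses the file and assumes rules written in the same form as on the two slides above; the function name list_accepted_ports is hypothetical.

```shell
# Hypothetical helper: list the destination ports (or service names)
# accepted by an iptables rules file, one "protocol port" pair per line.
# Only rules of the form "-p PROTO --dport PORT -j ACCEPT" are reported.
list_accepted_ports() {
    file=${1:-/etc/sysconfig/iptables}
    sed -n 's/.*-p \([a-z]*\) --dport \([^ ]*\) -j ACCEPT.*/\1 \2/p' "$file"
}
```

For example, list_accepted_ports | grep -x 'tcp 2170' confirms that the BDII port is open in the saved rules.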

32 Troubleshooting

33 Troubleshooting
[plt@ui plt]$ globus-job-run your_ce_hostname:2119/jobmanager-lcgpbs -q short /bin/hostname
GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
Solution: check whether the globus-gatekeeper daemon is up and running on the CE.

[plt@ui plt]$ globus-job-run :2119/jobmanager-lcgpbs -q short /bin/hostname
GRAM Job submission failed because authentication failed: GSS Major Status: Authentication Failed
GSS Minor Status Error Chain:
init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
globus_i_gsi_gss_utils.c:888: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
globus_i_gsi_gss_utils.c:847: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials: Couldn't verify the remote certificate
OpenSSL Error: s3_pkt.c:1046: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate
(error code 7)
Solution: the GILDA CA rpm is probably not installed on the CE.

34 Troubleshooting
[plt@ui plt]$ edg-gridftp-ls gsiftp:// /
error the server sent an error response: 530 530 LCMAPS credential mapping NOT successful
Solution: check the VO mapping on the CE:
    /opt/edg/etc/lcmaps/gridmapfile
    /opt/edg/etc/lcmaps/groupmapfile
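The three failures covered so far each map to one check, which can be written down as a small lookup. This is a sketch that only restates the solutions from the troubleshooting slides; the function name suggest_fix is hypothetical, and it matches on the message fragments quoted there.

```shell
# Hypothetical helper: read an error message on stdin and print the
# check suggested on the troubleshooting slides for that message.
# Returns non-zero if the message is not one of the three covered cases.
suggest_fix() {
    msg=$(cat)
    case $msg in
        *"(error code 12)"*)
            echo "check that globus-gatekeeper is running on the CE" ;;
        *"(error code 7)"*)
            echo "check that the GILDA CA rpm is installed on the CE" ;;
        *"LCMAPS credential mapping NOT successful"*)
            echo "check the VO mapping in gridmapfile and groupmapfile on the CE" ;;
        *)
            echo "no suggestion for this message"
            return 1 ;;
    esac
}
```

A failing command's output can be piped in directly, for example by appending 2>&1 | suggest_fix to the globus-job-run test.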

35 Troubleshooting
The CE is publishing incorrect information such as:
    GlueCEStateFreeCPUs: 0
    GlueCEStateRunningJobs: 0
    GlueCEStateStatus: Production
    GlueCEStateTotalJobs: 0
    GlueCEStateWaitingJobs: 4444
Run the script:
    /opt/glite/etc/gip/plugin/glite-info-dynamic-scheduler-wrapper
and check whether it reports any errors. It often fails because the batch system is down or in a locked state. If that is the case, restart the Torque server service:
    /etc/init.d/pbs_server restart
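The symptom shown above can be detected automatically by looking for the 4444 value in the published data. This is a minimal sketch: the function name has_bad_waiting_jobs is hypothetical, and it simply searches LDIF text on standard input for the exact line from the slide.

```shell
# Hypothetical check: read LDIF text on stdin and succeed if any CE
# publishes "GlueCEStateWaitingJobs: 4444", the value shown above as a
# symptom of incorrect information.
has_bad_waiting_jobs() {
    grep -q '^GlueCEStateWaitingJobs: 4444$'
}
```

It can be fed from the resource query used in the testing section, for example ldapsearch -x -h your_ce_hostname -p 2170 -b mds-vo-name=resource,o=grid | has_bad_waiting_jobs && echo "check the dynamic scheduler plugin".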

36 Troubleshooting
If a query to the site BDII doesn't show the information about a site, look at the BDII log file: /opt/bdii/var/bdii.log
For example:
    GILDA: ldap_bind: Can't contact LDAP server
Check that:
– the BDII is up and running (ps aux | grep bdii)
– the resource URL is in the list file /opt/glite/etc/gip/site-urls.conf
– the firewall is set up correctly
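Searching the log for the example message is quicker than reading it from the top. The sketch below assumes the log path given on the slide and the exact message text shown above; the function name bdii_ldap_errors is hypothetical.

```shell
# Hypothetical helper: print the most recent lines in the BDII log that
# report an unreachable LDAP server, as in the example above.
bdii_ldap_errors() {
    log=${1:-/opt/bdii/var/bdii.log}
    grep "Can't contact LDAP server" "$log" | tail -n 5
}
```

The name at the start of each reported line (GILDA in the example) identifies the entry to check against the list above.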
