www.eu-eela.org E-science grid facility for Europe and Latin America COMPUTING ELEMENT GIUSEPPE PLATANIA INFN Catania 30 June - 4 July, 2008.


1 www.eu-eela.org E-science grid facility for Europe and Latin America COMPUTING ELEMENT GIUSEPPE PLATANIA INFN Catania 30 June - 4 July, 2008

2 www.eu-eela.eu Catania (Italy), Joint EELA/EGEEIII Tutorial for Trainers, 30.06.2008 – 04.07.2008
OUTLINE
– OVERVIEW
– INSTALLATION & CONFIGURATION
– TESTING
– FIREWALL SETUP
– TROUBLESHOOTING

3 OVERVIEW
The Computing Element (CE) is the central service of a site. Its main functions are:
– managing jobs (job submission, job control)
– reporting the status of the jobs to the WMS
– publishing all site information (site location, queues, CPU status, and so on) via LDAP (site BDII service)
It can run on top of several kinds of batch system:
– Torque + MAUI
– LSF
– SGE
– Condor

4 TORQUE + MAUI
The Torque server consists of:
– pbs_server, which provides the basic batch services such as receiving and creating batch jobs.
The Torque client consists of:
– pbs_mom, which places the job into execution. It is also responsible for returning the job's output to the user.
The MAUI system consists of:
– job_scheduler, which holds the site's policies and uses them to choose which job is executed next.

5 Site BDII**
– By default it is installed on the CE
– It collects the information from all site GRISes* (for example SE, RB, LFC, etc.)
– The name of the service is bdii
– The list of GRISes you want to publish is in: /opt/glite/etc/gip/site-urls.conf
– Log file: /opt/bdii/var/bdii.log
*GRIS = Grid Resource Information Service
**BDII = Berkeley Database Information Index

6 Computing Element installation & configuration using YAIM

7 WHAT KIND OF CE?
There are several metapackages to choose from:
– ig_CE: LCG ComputingElement without batch system packages.
– ig_CE_LSF: LCG ComputingElement with LSF. IMPORTANT: provided for consistency; it does not install LSF, but it applies some fixes via ig_configure_node.
– ig_CE_torque: LCG ComputingElement with Torque+MAUI.

8 HOW TO GET A HOST CERTIFICATE
Host certificate for the CE:
– Please request it from your RA.
Install the host certificate (hostcert.pem and hostkey.pem) in /etc/grid-security:
    mkdir /etc/grid-security
    chmod 644 hostcert.pem
    chmod 400 hostkey.pem
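The slide lists the directory and the permissions but not the copy step. A minimal sketch of the whole sequence follows, assuming the two files received from the RA sit together in one source directory. The function name install_host_cert and its optional second argument are hypothetical, introduced only so the steps can be tried in a scratch directory first.

```shell
# Sketch: copy a host certificate pair into place and apply the permissions
# listed on the slide. install_host_cert is a hypothetical helper name.
# Usage: install_host_cert SRC_DIR [DEST_DIR]
install_host_cert() {
    src_dir=$1
    dest_dir=${2:-/etc/grid-security}   # default is the location named on the slide

    # Stop early if either file is missing from the source directory.
    for f in hostcert.pem hostkey.pem; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "missing $src_dir/$f" >&2
            return 1
        fi
    done

    mkdir -p "$dest_dir" || return 1
    cp "$src_dir/hostcert.pem" "$src_dir/hostkey.pem" "$dest_dir/" || return 1
    chmod 644 "$dest_dir/hostcert.pem" || return 1   # certificate: world-readable
    chmod 400 "$dest_dir/hostkey.pem"                # private key: owner read-only
}
```

On a real CE this would be run as root with only the source directory given, for example install_host_cert /root/certs, where /root/certs is a placeholder path.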

9 Repository settings
    REPOS="ca dag ig jpackage gilda glite-lcg_ce_torque glite-bdii"
Download and store the repo files:
    for name in $REPOS; do
        wget http://grid018.ct.infn.it/mrepo/repos/$name.repo -O /etc/yum.repos.d/$name.repo
    done

10 INSTALLATION
    yum install jdk java-1.5.0-sun-compat
    yum install lcg-CA
    yum install ig_CE_torque
If the CE is also the site BDII collector:
    yum install ig_BDII
GILDA rpms:
    yum install gilda_utils

11 Customize ig-site-info.def
Copy the ig-site-info.def template file provided by ig_yaim into the gilda directory and customize it:
    cp /opt/glite/yaim/examples/siteinfo/ig-site-info.def /opt/glite/yaim/etc/gilda/
Open the /opt/glite/yaim/etc/gilda/ file using a text editor and set the following values according to your grid environment:
    CE_HOST=
    BATCH_SERVER=$CE_HOST

12 Customize ig-site-info.def
    WN_LIST=/opt/glite/yaim/etc/gilda/wn-list.conf
The file specified in WN_LIST must contain the hostnames of all your WNs.
WARNING: It is important to set it up before running the configure command.
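The WN list can be written by hand, but a small helper keeps it free of duplicates. The sketch below assumes the file holds one worker node hostname per line. The function name write_wn_list and the example.org hostnames in the usage note are hypothetical.

```shell
# Sketch: write the worker node list, one hostname per line.
# write_wn_list is a hypothetical helper name.
# Usage: write_wn_list FILE HOST...
write_wn_list() {
    list_file=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "no worker nodes given" >&2
        return 1
    fi
    # Print each hostname on its own line; awk drops repeated names
    # while keeping the original order.
    printf '%s\n' "$@" | awk '!seen[$0]++' > "$list_file"
}
```

For example, write_wn_list /opt/glite/yaim/etc/gilda/wn-list.conf wn01.example.org wn02.example.org would produce a two-line file.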

13 Customize ig-site-info.def
Copy the users and groups example files to /opt/glite/yaim/etc/gilda/:
    cp /opt/glite/yaim/examples/ig-groups.conf /opt/glite/yaim/etc/gilda/
    cp /opt/glite/yaim/examples/ig-users.conf /opt/glite/yaim/etc/gilda/
Append the GILDA users and groups definitions to the copied files:
    cat /opt/glite/yaim/etc/gilda/gilda_ig-users.conf >> /opt/glite/yaim/etc/gilda/ig-users.conf
    cat /opt/glite/yaim/etc/gilda/gilda_ig-groups.conf >> /opt/glite/yaim/etc/gilda/ig-groups.conf

14 Customize ig-site-info.def
    GROUPS_CONF=/opt/glite/yaim/etc/gilda/ig-groups.conf
    USERS_CONF=/opt/glite/yaim/etc/gilda/ig-users.conf
    JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"
    SITE_EMAIL=grid-prod@
    SITE_NAME=GILDA-01..05
    SITE_LOC="Catania, ITALY"
    SITE_LAT=37.5
    SITE_LONG=15.152
    SITE_WEB="https://gilda.ct.infn.it"
    SITE_TIER="GILDA Testbed"
    SITE_SUPPORT_SITE="grid-prod@ "

15 Customize ig-site-info.def
    JOB_MANAGER=lcgpbs
    CE_BATCH_SYS=pbs
    BATCH_BIN_DIR=/usr/bin
    BATCH_VERSION=torque-2.1.9-4
    CE_CPU_MODEL=Opteron
    CE_CPU_VENDOR=AMD
    CE_CPU_SPEED=3000
    CE_OS="Scientific Linux"
    CE_OS_RELEASE=4.5
    CE_OS_VERSION="SL"
    CE_MINPHYSMEM=2048
    CE_MINVIRTMEM=4096
    CE_SMPSIZE=2
    CE_SI00=1000
    CE_SF00=1200
    CE_OUTBOUNDIP=TRUE
    CE_INBOUNDIP=TRUE

16 Customize ig-site-info.def
    DPM_HOST="dpm_hostname"
    SE_LIST="$DPM_HOST"
    SITE_BDII_HOST=$CE_HOST
    BDII_REGIONS="CE SE"
    BDII_CE_URL="ldap://$CE_HOST:2170/mds-vo-name=resource,o=grid"
    BDII_SE_URL="ldap://$DPM_HOST:2170/mds-vo-name=resource,o=grid"
    VOS="gilda"
    ALL_VOMS="gilda"

17 Customize ig-site-info.def
    QUEUES="short long infinite"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
To configure a queue for a single VO:
    QUEUES="short long infinite gilda"
    SHORT_GROUP_ENABLE=$VOS
    LONG_GROUP_ENABLE=$VOS
    INFINITE_GROUP_ENABLE=$VOS
    GILDA_GROUP_ENABLE="gilda"
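The settings above follow a naming pattern: each queue listed in QUEUES has a matching variable made of the queue name in upper case plus _GROUP_ENABLE. A minimal sketch of a pre-flight check built on that pattern is shown below. The function names are hypothetical, and the mapping of dots and dashes to underscores is an assumption about how such names would be handled, not something the slide states.

```shell
# Sketch: derive the *_GROUP_ENABLE variable name for a queue and check
# that every queue in $QUEUES has one. Both function names are hypothetical.

# queue_enable_var QUEUE: print the variable name for QUEUE.
# Assumption: dots and dashes in a queue name become underscores.
queue_enable_var() {
    printf '%s_GROUP_ENABLE\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}

# check_queue_vars: return non-zero if any queue in $QUEUES has an unset
# or empty *_GROUP_ENABLE variable, naming each offender on stderr.
check_queue_vars() {
    status=0
    for q in $QUEUES; do
        var=$(queue_enable_var "$q")
        eval "value=\${$var:-}"   # indirect lookup of the variable named in $var
        if [ -z "$value" ]; then
            echo "queue '$q' has no $var" >&2
            status=1
        fi
    done
    return $status
}
```

Sourcing the customized site-info file and then calling check_queue_vars would flag, for instance, a gilda queue added to QUEUES without a GILDA_GROUP_ENABLE line.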

18 CE Torque CONFIGURATION
Now we can configure the node:
    /opt/glite/yaim/bin/ig_yaim -c -s /opt/glite/yaim/etc/gilda/ -n ig_CE_torque -n BDII_site

19 Computing Element testing

20 Testing
Check that the local GRIS and the site BDII are running on the CE and are publishing the right information (CPU, site name and so on):
    ldapsearch -x -h -p 2170 -b mds-vo-name=resource,o=grid
    ldapsearch -x -h -p 2170 -b mds-vo-name=,o=grid

21 Testing
Become a gilda user:
    # su - gilda001
Create a file named test.sh containing:
    #!/bin/sh
    sleep 20    # (useful for seeing the job status)
    hostname
Save it and make it executable:
    chmod 700 test.sh

22 Testing
    [gilda001@ce gilda001]$ qsub -q short test.sh
    [gilda001@ce gilda001]$ qstat -a
    ce.localdomain:
                                                                Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - ----
    3.wn.localdo    gilda001 short    test.sh      5839  --  --     -- 00:15 R   --

23 Testing
    [gilda001@ce gilda001]$ qstat -a
    [gilda001@ce gilda001]$
The job has finished, so we can list the output files:
    [gilda001@ce gilda001]$ ls
    test.sh.e3 test.sh.o3
And show them:
    [gilda001@ce gilda001]$ cat test.sh.e3    (error file)
    [gilda001@ce gilda001]$
    [gilda001@ce gilda001]$ cat test.sh.o3    (output file)
    wn.localdomain

24 Testing
Log on to the UI:
– Hostname -> glite-tutor.ct.infn.it
– Username -> catania01..30
– Password -> GridCAT01..30
– Grid passphrase -> CATANIA

25 Testing
    [plt@glite-tutor plt]$ voms-proxy-init --voms gilda
    [plt@glite-tutor plt]$ globus-job-run grid006.ct.infn.it:2119/jobmanager-lcgpbs -q short /bin/hostname
    wn.localdomain
    [plt@glite-tutor plt]$ edg-job-submit -r grid006.ct.infn.it:2119/jobmanager-lcgpbs-short hostname.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb.ct.infn.it, port 7772
    Logging to host glite-rb.ct.infn.it, port 9002
    ********************************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:
    - https://glite-rb.ct.infn.it:9000/Vo-4Ih1s-iDbBPr3rs69GQ
    ********************************************************************************
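The slide submits hostname.jdl without showing its contents. The sketch below writes one plausible version of such a file, assuming the job only needs to run /bin/hostname and return its output. The attribute values (std.out, std.err) and the function name write_hostname_jdl are assumptions, not the file used in the tutorial.

```shell
# Sketch: write a minimal JDL file that runs /bin/hostname on the worker node.
# write_hostname_jdl is a hypothetical helper; the file contents are an
# assumed example, since the original hostname.jdl is not shown.
# Usage: write_hostname_jdl FILE
write_hostname_jdl() {
    cat > "$1" <<'EOF'
Executable = "/bin/hostname";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
EOF
}
```

After write_hostname_jdl hostname.jdl, the file can be passed to edg-job-submit as on the slide. The OutputSandbox line asks for both output files to be returned with the job output.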

26 FIREWALL SETUP

27 /etc/sysconfig/iptables (1/2)
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :RH-Firewall-1-INPUT - [0:0]
    -A INPUT -j RH-Firewall-1-INPUT
    -A FORWARD -j RH-Firewall-1-INPUT
    -A RH-Firewall-1-INPUT -i lo -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2135 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2119 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2170 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2811 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport maui -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_mom -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs_resmom -j ACCEPT

28 /etc/sysconfig/iptables (2/2)
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport pbs -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3878:3879 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 3879 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 3882 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 1020:1023 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 20000:25000 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 32768:65535 -j ACCEPT
    -A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp --dport 32768:65535 -j ACCEPT
    -A RH-Firewall-1-INPUT -p tcp -m tcp --syn -j REJECT
    -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
    COMMIT
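With a rules file this long it is easy to drop a line while editing. The sketch below reports which of a given set of TCP ports has no ACCEPT rule in a rules file. It only matches rules written exactly in the "--dport PORT -j ACCEPT" form used above, so a port covered by a range such as 3878:3879 is found only when the range itself is passed as the argument. The function name missing_tcp_ports is hypothetical.

```shell
# Sketch: list TCP ports that have no ACCEPT rule in an iptables rules file.
# missing_tcp_ports is a hypothetical helper name.
# Usage: missing_tcp_ports RULES_FILE PORT...
missing_tcp_ports() {
    rules_file=$1
    shift
    for port in "$@"; do
        # "--" stops option parsing, because the pattern starts with a dash.
        if ! grep -Eq -- "-p tcp .*--dport $port -j ACCEPT" "$rules_file"; then
            echo "$port"
        fi
    done
}
```

For the CE, a call such as missing_tcp_ports /etc/sysconfig/iptables 2119 2170 2811 should print nothing when the rules from the two slides are in place.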

29 IPTABLES STARTUP
    /sbin/chkconfig iptables on
    /etc/init.d/iptables start

30 Troubleshooting

31 Troubleshooting
    [plt@ui plt]$ globus-job-run :2119/jobmanager-lcgpbs -q short /bin/hostname
    GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
Solution: check that the globus-gatekeeper daemon is up and running on the CE.
    [plt@ui plt]$ globus-job-run :2119/jobmanager-lcgpbs -q short /bin/hostname
    GRAM Job submission failed because authentication failed: GSS Major Status: Authentication Failed
    GSS Minor Status Error Chain:
    init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context.c:171: gss_init_sec_context: SSLv3 handshake problems
    globus_i_gsi_gss_utils.c:888: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials
    globus_i_gsi_gss_utils.c:847: globus_i_gsi_gss_handshake: Unable to verify remote side's credentials: Couldn't verify the remote certificate
    OpenSSL Error: s3_pkt.c:1046: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate (error code 7)
Solution: the GILDA CA rpm is probably not installed on the CE.

32 Troubleshooting
    [plt@ui plt]$ edg-gridftp-ls gsiftp:// /
    error the server sent an error response: 530 530 LCMAPS credential mapping NOT successful
Solution: on the CE, check the VO mapping in:
    /opt/edg/etc/lcmaps/gridmapfile
    /opt/edg/etc/lcmaps/groupmapfile

33 Troubleshooting
The CE is publishing wrong information, such as:
    GlueCEStateFreeCPUs: 0
    GlueCEStateRunningJobs: 0
    GlueCEStateStatus: Production
    GlueCEStateTotalJobs: 0
    GlueCEStateWaitingJobs: 4444
Run the script:
    /opt/glite/etc/gip/plugin/glite-info-dynamic-scheduler-wrapper
and check whether it reports any errors. It often fails because the batch system is down or locked. In that case, restart the Torque service:
    /etc/init.d/pbs_server restart
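The symptom on the slide can be detected automatically by scanning the published LDIF for the suspicious value. The sketch below assumes, as the slide's example suggests, that GlueCEStateWaitingJobs: 4444 marks an entry whose dynamic values were not refreshed. The function name has_default_waiting_jobs is hypothetical.

```shell
# Sketch: succeed if the LDIF read from stdin contains an entry that still
# publishes the value shown on the slide, GlueCEStateWaitingJobs: 4444.
# has_default_waiting_jobs is a hypothetical helper name.
has_default_waiting_jobs() {
    grep -q '^GlueCEStateWaitingJobs: 4444$'
}
```

The function reads standard input, so the output of an ldapsearch query against the CE can be piped into it, followed by something like && echo "dynamic scheduler information is not being updated".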

34 Troubleshooting
If a query to the site BDII does not show the information about a site, look at the BDII log file /opt/bdii/var/bdii.log. For example:
    GILDA: ldap_bind: Can't contact LDAP server
Check that:
– bdii is up and running (ps aux | grep bdii)
– the resource URL is in the list file /opt/glite/etc/gip/site-urls.conf
– the firewall is set up correctly
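The second check in the list can be scripted. The sketch below looks for an ldap:// URL naming a given host in the list file and skips comment lines. It assumes only that each resource appears with a URL of the form ldap://HOST:PORT/..., as in the BDII_CE_URL setting shown earlier. The function name resource_listed is hypothetical.

```shell
# Sketch: succeed if HOST appears in an ldap:// URL in the site URL list.
# resource_listed is a hypothetical helper name.
# Usage: resource_listed HOST [URLS_FILE]
resource_listed() {
    host=$1
    urls_file=${2:-/opt/glite/etc/gip/site-urls.conf}   # path from the slide
    # Drop comment lines, then search for the host as a fixed string.
    grep -v '^[[:space:]]*#' "$urls_file" | grep -Fq "ldap://$host:"
}
```

A call such as resource_listed se.example.org || echo "SE is not in site-urls.conf" (with a placeholder hostname) gives a quick answer before digging into the BDII log.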
