The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
CE+WN Installation and Configuration
Riccardo Rotondo, National Institute of Nuclear Physics
Asia Joint CHAIN/EU-IndiaGrid2/EPIKH School for Grid Site Administrators, Kolkata

Outline
– Computing Element overview
– Worker Node overview
– CE CREAM overview
– gLite stack overview
– gLite CE CREAM and site BDII
    – Installation on CE and WN (wiki)
    – Configuration on CE and WN (wiki)

gLite stack overview

gLite overview: worker node

gLite overview
– User Interface (UI): the point of access for users to gLite grid services.
– WMS: the component that optimizes resource usage.
– CE: the machine that manages the worker nodes.
– WN: the machines that actually execute applications.
– SE: the machines where files are stored.
– LFC: used to locate files on the grid.
– BDII: the services responsible for publishing all information about your site.
– Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished.

Computing Element Overview
The Computing Element provides some of the main services of a site.
Main functionalities:
– job management (job submission, job control)
– job status updates for the WMS
It is usually installed together with the site BDII service, which publishes all information about the computing element.
It can run several kinds of batch system:
– Torque + MAUI
– LSF
– SGE
– Condor

Torque + MAUI
– Torque server service: pbs_server provides basic batch services such as receiving and creating batch jobs.
– Torque client service: pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
– MAUI system service: job_scheduler contains the site's policy for deciding which job is executed and when.

Site BDII*
– It used to be installed on the CE by default, but it is now better to install it on a dedicated server, physical or virtual.
– It collects information from all the site GRISes** (for example SE, RB, LFC, etc.).
– The service is named bdii.
– Log file: /opt/bdii/var/bdii.log
*BDII = Berkeley Database Information Index
**GRIS = Grid Resource Information Service

Worker Node Element Overview
– Worker nodes are the machines that actually execute your jobs.
– Users can access their services only through a Computing Element.
– Their characteristics are collected by the Computing Element, which publishes all information through the BDII service.

CE CREAM overview
CREAM: Computing Resource Execution And Management
– It accepts job submission requests coming from a WMS, as well as other job management requests.
– It exposes a web service interface.

Requirements
– Three or more machines:
    – one will be used for the CE installation;
    – the others will be used for the WN installation.
– Architecture: 64 bit
– Operating system: Scientific Linux 5
– The CE machine needs a public IP address, forward and reverse name resolution in the DNS, and an X.509 certificate.

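The forward and reverse DNS requirement can be checked before starting the installation. The following is a minimal sketch in Python, not part of the original procedure: the function name check_forward_reverse and the host ce.example.org are hypothetical, and the resolver functions are passed as parameters only so the check can be tried without network access.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare the names.

    Returns (ip, resolved_name, matches). The comparison ignores case and a
    trailing dot, which is an assumption about how strict the check should be.
    """
    ip = forward(hostname)              # forward lookup: name -> address
    resolved_name = reverse(ip)[0]      # reverse lookup: address -> primary name
    matches = resolved_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, resolved_name, matches


if __name__ == "__main__":
    # Hypothetical host name; replace it with the FQDN of your CE.
    # print(check_forward_reverse("ce.example.org"))
    pass
```

If matches is False, the forward and reverse records disagree and should be fixed before the CE is configured.
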
CE CREAM and WN Installation & Configuration (on Torque/PBS)

Wiki
– Follow the steps here for CE CREAM: https://grid.ct.infn.it/twiki/bin/view/EPIKH/CECreamEpikh
– Follow the steps here for WN: https://grid.ct.infn.it/twiki/bin/view/EPIKH/WNEpikh

A few words on benchmarks
How do you set CE_SI00, CE_SF00, CE_CAPABILITY and CE_OTHERDESCR?
Look up the value for your CPU at this link:
For example, for an Intel XEON GHz with no Hyper-Threading, the table at that link gives a value of 95 and a conversion factor of 1 HS06 = 40, so:
    CE_SI00=3800
    CE_SF00=3800
    CE_CAPABILITY="CPUScalingReferenceSI00=3800"
    CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
where 95 x 40 = 3800 and (3800/40)/4 = 23.75.

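The arithmetic of the benchmark example can be written down as a small helper. This is only a sketch in Python that reproduces the numbers on the slide: the function name benchmark_settings is hypothetical, and the factor of 40 is the one quoted in the example, so use the factor given in the table for your own CPU.

```python
def benchmark_settings(hs06_total, cores, si00_per_hs06=40):
    """Derive the four benchmark variables from a HEP-SPEC06 table value.

    hs06_total    -- value read from the benchmark table (95 in the example)
    cores         -- number of cores (4 in the example)
    si00_per_hs06 -- conversion factor quoted in the example (assumption: 40)
    """
    si00 = int(hs06_total * si00_per_hs06)   # 95 * 40 = 3800
    per_core = hs06_total / cores            # (3800 / 40) / 4 = 23.75
    return {
        "CE_SI00": str(si00),
        "CE_SF00": str(si00),
        "CE_CAPABILITY": f"CPUScalingReferenceSI00={si00}",
        "CE_OTHERDESCR": f"Cores={cores},Benchmark={per_core:g}-HEP-SPEC06",
    }


if __name__ == "__main__":
    # Print the settings in the KEY="value" form used by the site configuration file.
    for name, value in benchmark_settings(95, 4).items():
        print(f'{name}="{value}"')
```
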
Adding a VO
    # vim my-ig-site-info.def
    VOS="euindia infngrid ops dteam"
    QUEUES="cert grid"
    CERT_GROUP_ENABLE="euindia ops dteam"
    GRID_GROUP_ENABLE="infngrid"

Adding a VO/2
Queue names listed in QUEUES: q1 q2 q3

Adding a VO/3
One <QUEUE>_GROUP_ENABLE variable per queue: Q1_GROUP_ENABLE, Q2_GROUP_ENABLE, Q3_GROUP_ENABLE

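The relation between VOS, QUEUES and the per-queue GROUP_ENABLE variables can be checked mechanically before running the configuration. The sketch below is in Python and is not part of YAIM: the function name check_queue_config is hypothetical, and it assumes that the variable name is simply the upper-cased queue name followed by _GROUP_ENABLE.

```python
def check_queue_config(settings):
    """Return a list of problems found in a dict of site-info variables.

    Checks that every queue in QUEUES has a <QUEUE>_GROUP_ENABLE variable and
    that every VO (or the VO part of an FQAN such as /euindia/ROLE=...) it
    enables is also listed in VOS.
    """
    vos = set(settings.get("VOS", "").split())
    problems = []
    for queue in settings.get("QUEUES", "").split():
        var = queue.upper() + "_GROUP_ENABLE"   # assumed naming rule
        if var not in settings:
            problems.append(f"missing {var} for queue {queue!r}")
            continue
        for entry in settings[var].split():
            vo = entry.lstrip("/").split("/")[0]  # "/euindia/ROLE=x" -> "euindia"
            if vo not in vos:
                problems.append(f"{var} enables {entry!r}, which is not in VOS")
    return problems


if __name__ == "__main__":
    site = {
        "VOS": "euindia infngrid ops dteam",
        "QUEUES": "cert grid",
        "CERT_GROUP_ENABLE": "euindia ops dteam",
        "GRID_GROUP_ENABLE": "infngrid",
    }
    print(check_queue_config(site) or "configuration looks consistent")
```
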
Adding a VO/4
Here are some settings to support the euindia VO:
    # vim vo.d/euindia
    SW_DIR=$VO_SW_DIR/euindia
    DEFAULT_SE=$SE_HOST
    STORAGE_DIR=$CLASSIC_STORAGE_DIR/euindia
    VOMS_SERVERS="'vomss://voms.ct.infn.it:8443/voms/euindia?/euindia'"
    VOMSES="'euindia voms.ct.infn.it /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it euindia'"
    VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA'"
Then install the VO VOMS certificates with:
    wget i386/RPMS.app/cometa-vomscert noarch.rpm
    rpm -ivh cometa-vomscert noarch.rpm

Adding a VO/5
Now you have to provide a group and some users for the euindia VO by modifying these two files:
– ig-groups.conf
– ig-users.conf
    # vim ig-groups.conf
    # Append the following lines to the end of the file
    "/euindia/ROLE=SoftwareManager":::sgm:
    "/euindia"::::

Adding a VO/6
    # vim ig-users.conf
    # Append these lines to the end of the file
    39001:euindia001:3900:euindia:euindia::
    39002:euindia002:3900:euindia:euindia::
    39003:euindia003:3900:euindia:euindia::
    39004:euindia004:3900:euindia:euindia::
    39005:euindia005:3900:euindia:euindia::
    39006:euindia006:3900:euindia:euindia::
    39007:euindia007:3900:euindia:euindia::
    39008:euindia008:3900:euindia:euindia::
    39009:euindia009:3900:euindia:euindia::
    39010:euindia010:3900:euindia:euindia::
    39011:euindia011:3900:euindia:euindia::
    39012:euindia012:3900:euindia:euindia::
    39013:euindia013:3900:euindia:euindia::
    39014:euindia014:3900:euindia:euindia::
    39015:euindia015:3900:euindia:euindia::
    39016:euindia016:3900:euindia:euindia::
    39017:euindia017:3900:euindia:euindia::
    39018:euindia018:3900:euindia:euindia::
    39019:euindia019:3900:euindia:euindia::
    39020:euindia020:3900:euindia:euindia::
    39101:sgmeuindia001:3910,3900:sgmeuindia,euindia:euindia:sgm:
    39102:sgmeuindia002:3910,3900:sgmeuindia,euindia:euindia:sgm:
    39103:sgmeuindia003:3910,3900:sgmeuindia,euindia:euindia:sgm:

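Pool account lines like the ones above follow a regular pattern (UID, login, GIDs, groups, VO, flag), so they can be generated instead of typed by hand. The following Python sketch reproduces the entries listed for euindia; the function name pool_account_lines and its parameters are hypothetical, and the UID and GID ranges are simply the ones used in this example.

```python
def pool_account_lines(vo, first_uid, count, gids, groups,
                       login_prefix="", flag=""):
    """Build ig-users.conf style lines: UID:LOGIN:GIDs:GROUPs:VO:FLAG:"""
    lines = []
    for i in range(1, count + 1):
        uid = first_uid + i - 1
        login = f"{login_prefix}{vo}{i:03d}"          # e.g. euindia001
        gid_field = ",".join(str(g) for g in gids)     # e.g. 3910,3900
        group_field = ",".join(groups)                 # e.g. sgmeuindia,euindia
        lines.append(f"{uid}:{login}:{gid_field}:{group_field}:{vo}:{flag}:")
    return lines


if __name__ == "__main__":
    # 20 ordinary pool accounts followed by 3 software manager (sgm) accounts.
    users = pool_account_lines("euindia", 39001, 20, [3900], ["euindia"])
    sgm = pool_account_lines("euindia", 39101, 3, [3910, 3900],
                             ["sgmeuindia", "euindia"],
                             login_prefix="sgm", flag="sgm")
    print("\n".join(users + sgm))
```
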
Testing installation

Tests on CE
SSH into the CE to check that the CE can see the WNs and that all the main services are up and running:
    # pbsnodes
    Your-ip-hostname
         state = free
         np = 2
         properties = lcgpro
         ntype = cluster
         status = opsys=linux,uname=Linux grid-test-63.trigrid.it el5 #1 [cut]
    # /etc/init.d/gLite status
    *** tomcat5:
    /opt/glite/etc/init.d/tomcat5 is already running (1514)
    *** glite-lb-locallogger:
    glite-lb-logd running
    glite-lb-interlogd running
    # /etc/init.d/globus-gridftp status
    globus-gridftp-server (pid 25452) is running...

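When there are many worker nodes, the pbsnodes listing is easier to check with a small script. The Python sketch below parses text in the layout shown above (node name at the start of a line, indented key = value attributes) and lists the nodes whose state is free. The function names are hypothetical, and the sample text is the slide's own output with a made-up host name.

```python
def parse_pbsnodes(text):
    """Parse pbsnodes-style output into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue                      # blank lines separate nodes
        if not line[0].isspace():
            current = line.strip()        # unindented line: a new node name
            nodes[current] = {}
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            nodes[current][key] = value
    return nodes


def free_nodes(nodes):
    """Return the names of the nodes whose state is exactly 'free'."""
    return [name for name, attrs in nodes.items() if attrs.get("state") == "free"]


if __name__ == "__main__":
    sample = (
        "wn01.example.org\n"              # hypothetical host name
        "     state = free\n"
        "     np = 2\n"
        "     properties = lcgpro\n"
        "     ntype = cluster\n"
    )
    print(free_nodes(parse_pbsnodes(sample)))
```
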
Tests on CE
SSH into the CE and then become a pool user of the VO:
    # su - euindia001
Create a file and add the following:
    $ vi test.sh
    #!/bin/sh
    sleep 20    # useful to observe the job status
    hostname
Set the right permissions to make it executable:
    $ chmod 700 test.sh

Tests on CE
Launch the job locally on the CE:
    $ qsub -q euindia test.sh
Then check the list of jobs in execution on the CE:
    $ qstat -a
    ce.localdomain:
                                                     Req'd  Req'd   Elap
    Job ID     Username Queue Jobname SessID NDS TSK Memory Time  S Time
    wn.localdo gilda001 short test.sh                       :15   R --
To abort a job:
    $ qdel 3    # 3 is the job ID
To get more information about a job:
    $ qstat -f 3

Tests on CE
If the "qstat -a" command prints no output, no jobs are being executed on the CE. This means your previous job has terminated, so you can now list its output:
    $ ls
    test.sh.e3  test.sh.o3
    $ cat test.sh.e3    # error file
    $
    $ cat test.sh.o3    # output file
    wn.localdomain

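The names of the two files follow Torque's default convention: the job name, then .o or .e, then the numeric part of the job ID. A minimal Python sketch of that rule, with a hypothetical function name, assuming the default output paths were not overridden at submission time:

```python
def torque_output_files(job_name, job_id):
    """Return (stdout_file, stderr_file) for a Torque job.

    job_id may be the bare sequence number (3) or the full identifier
    printed by qsub (for example "3.ce.localdomain").
    """
    sequence = str(job_id).split(".", 1)[0]   # keep only the numeric part
    return f"{job_name}.o{sequence}", f"{job_name}.e{sequence}"


if __name__ == "__main__":
    out_file, err_file = torque_output_files("test.sh", "3.ce.localdomain")
    print(out_file, err_file)
```
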
JDL example
    $ vim hostname-cream.jdl
    Type = "Job";
    JobType = "Normal";
    Executable = "/bin/hostname";
    StdOutput = "hostname.out";
    StdError = "hostname.err";
    OutputSandbox = {"hostname.err","hostname.out"};
    Arguments = "-f";
    OutputSandboxBaseDestUri = "gsiftp://localhost";
    ShallowRetryCount = 3;

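If several test jobs are needed, the JDL text can be produced from a dictionary instead of being edited by hand. The sketch below is in Python and only mirrors the attribute syntax used in the example (quoted strings, brace-enclosed lists, bare integers). The function name render_jdl is hypothetical, and it does not escape quotes inside values.

```python
def render_jdl(attributes):
    """Render a dict as 'Name = value;' lines in the style of the example JDL."""
    def fmt(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return str(value)                                   # ShallowRetryCount = 3;
        if isinstance(value, (list, tuple)):
            return "{" + ",".join(fmt(v) for v in value) + "}"  # OutputSandbox = {...};
        return '"' + str(value) + '"'                           # quoted string

    return "\n".join(f"{name} = {fmt(value)};" for name, value in attributes.items())


if __name__ == "__main__":
    job = {
        "Type": "Job",
        "JobType": "Normal",
        "Executable": "/bin/hostname",
        "StdOutput": "hostname.out",
        "StdError": "hostname.err",
        "OutputSandbox": ["hostname.err", "hostname.out"],
        "Arguments": "-f",
        "OutputSandboxBaseDestUri": "gsiftp://localhost",
        "ShallowRetryCount": 3,
    }
    print(render_jdl(job))
```
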
Working test
SSH into the UI to test whether the CE can receive and execute a simple job:
    $ ssh    # password: XXXXXXX
    $ voms-proxy-init --voms euindia
    [cut]
    ~]$ glite-ce-delegate-proxy -e grid-test-33.trigrid.it riccardo
    :36:21,683 WARN - No configuration file suitable for loading. Using built-in configuration
    :36:26,389 NOTICE - Proxy with delegation id [riccardo] succesfully delegated to endpoint [ cream/services/gridsite-delegation]
    ~]$ glite-ce-job-submit -r grid-test-33.trigrid.it:8443/cream-pbs-cert -D riccardo hostname-cream.jdl
    :39:06,444 WARN - No configuration file suitable for loading. Using built-in configuration
    $ glite-ce-job-status
    JobID=[
    Status = [DONE-OK]
    ExitCode = [0]

Troubleshooting
Which logs should you look at if something goes wrong?
– /var/log/messages, for general errors
– /opt/glite/var/log (especially glite-ce-cream.log)
– /var/spool/pbs/server_priv/accounting/, if even local submission to the batch system does not work

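A quick way to start with any of these logs is to pull out the most recent lines that mention an error. The Python sketch below does that for a single file. The function name tail_matches is hypothetical, and the keywords ERROR and FATAL are an assumption about what the log lines contain, so adjust them to what you actually see in your logs.

```python
from collections import deque


def tail_matches(path, keywords=("ERROR", "FATAL"), limit=20):
    """Return the last `limit` lines of `path` that contain any keyword."""
    hits = deque(maxlen=limit)            # keeps only the most recent matches
    with open(path, errors="replace") as handle:
        for line in handle:
            if any(keyword in line for keyword in keywords):
                hits.append(line.rstrip("\n"))
    return list(hits)


if __name__ == "__main__":
    # Path taken from the list above; reading it may require root privileges.
    # for line in tail_matches("/var/log/messages"):
    #     print(line)
    pass
```
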
Thank you for your kind attention! Any questions?