CE+WN+siteBDII Installation and configuration


1 CE+WN+siteBDII Installation and configuration
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
Riccardo Rotondo, National Institute of Nuclear Physics
Africa Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators, Cairo

2 Outline
Computing Element overview
Worker Node overview
CE CREAM overview
gLite stack overview
gLite CE CREAM and site BDII
  Installation on CE and WN
  Configuration on CE and WN

3 gLite stack overview

4 gLite overview worker node

5 gLite overview
User Interface (UI): the point of access for users to the gLite grid services.
WMS: the component that optimizes resource usage.
CE: the machine that manages the worker nodes.
WN: the machines that actually execute applications.
SE: the machines where files are stored.
LFC: used to "find" files on the grid.
BDII: the service responsible for publishing all the information about your site.
Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished.

6 Computing Element Overview
The Computing Element provides some of the main services of a site.
Main functionalities:
  job management (job submission, job control)
  job status updates for the WMS
It is usually installed together with the site BDII service, which publishes all the information about the Computing Element.
It can run several kinds of batch system:
  Torque + MAUI
  LSF
  SGE
  Condor

7 Torque + MAUI
Torque server service:
  pbs_server provides basic batch services such as receiving and creating a batch job.
Torque client service:
  pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
MAUI system service:
  the job scheduler contains the site's policy, which decides which job is going to be executed and when.

8 Site BDII*
By default it used to be installed on the CE, but now it is better to install it on a dedicated server, physical or virtual.
It collects all the site GRISes** (for example SE, RB, LFC, etc.).
The service is named bdii.
Log file: /opt/bdii/var/bdii.log
*BDII = Berkeley Database Information Index
**GRIS = Grid Resource Information Service

9 Worker Node Element Overview
Worker Nodes are the machines that actually execute your jobs.
Users can access their services only through a Computing Element.
Their characteristics are collected by the Computing Element, which publishes all the information through the BDII service.

10 CE CREAM overview
CREAM: Computing Resource Execution And Management
It accepts job submission requests coming from a WMS, as well as other job management requests.
It exposes a web services interface.

11 Requirements
Three or more machines:
  one will be used for the CE installation;
  one will be used for the site BDII installation;
  the others will be used for the WN installation.
Architecture: 64 bit
Operating System: Scientific Linux 5
Two machines (the CE and the site BDII) need a public IP address, forward and reverse name resolution in the DNS, and an X509 certificate.

12 CE Cream and BDII Installation (on Torque/PBS)

13 Network Time Protocol
Check that the machine's date is correct:
# date
If the date isn't correct, check the NTP service and synchronize the clock:
# /etc/init.d/ntpd status
# ntpdate ntp-1.infn.it
If the service is not running, configure it, start it and make it start on boot:
# /etc/init.d/ntpd start
# chkconfig ntpd on

14 Repository set up
Add the repositories specific to the middleware to the system repositories:
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# REPO="dag ig glite-generic lcg-ca glite-cream_torque glite-bdii"
# for rep_name in $REPO; do wget done

15 Which metapackages are we going to install?
There are several metapackages to install:
lcg-CA
  LHC Computing Grid rpm collection to support external Certification Authorities.
ig_CREAM_torque
  INFNGRID Computing Element CREAM and Torque services rpm.
ig_BDII
  INFNGRID BDII services rpm.

16 Middleware component installation
Use yum to install the needed packages:
# yum clean all
# yum install -y lcg-CA
# yum install -y ig_CREAM_torque
# yum install -y ig_BDII
Sometimes it is necessary to manually install additional packages required by the middleware components:
# yum install -y xml-commons-apis

17 Before configuration
Some preliminary steps before configuration:
Copy the host certificate and key to the default path:
# cd
# mv SRVXX.eun.eg/SRVXX.eun.eg-cert.pem /etc/grid-security/hostcert.pem
# mv SRVXX.eun.eg/SRVXX.eun.eg-key.pem /etc/grid-security/hostkey.pem
# chmod 400 /etc/grid-security/hostkey.pem
# chmod 600 /etc/grid-security/hostcert.pem
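A quick way to double-check the permissions set above is sketched here. The helper names has_mode and check_host_credentials are hypothetical (not part of gLite or YAIM), the expected modes are simply the ones used on this slide, and the demo runs on a scratch directory rather than /etc/grid-security. It assumes GNU stat.

```shell
#!/bin/sh
# Return 0 if FILE ($1) has exactly the octal MODE ($2), non-zero otherwise.
# Assumes GNU stat (-c %a prints the octal permission bits).
has_mode() {
    [ "$(stat -c %a "$1")" = "$2" ]
}

# Hypothetical helper: check the two files with the modes used on the slide.
check_host_credentials() {
    dir=$1
    has_mode "$dir/hostkey.pem" 400 || { echo "hostkey.pem: unexpected mode"; return 1; }
    has_mode "$dir/hostcert.pem" 600 || { echo "hostcert.pem: unexpected mode"; return 1; }
    echo "host credentials look fine in $dir"
}

# Demo on a scratch directory so nothing under /etc is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/hostkey.pem" "$demo_dir/hostcert.pem"
chmod 400 "$demo_dir/hostkey.pem"
chmod 600 "$demo_dir/hostcert.pem"
check_host_credentials "$demo_dir"
```

On a real CE the same check would be run as check_host_credentials /etc/grid-security.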

18 Before configuration/2
Generate the configuration files from the YAIM templates:
# cd /opt/glite/yaim/examples/; mkdir mysite-conf
# cp -r wn-list.conf ig-users.conf ig-groups.conf siteinfo/vo.d/ siteinfo/services/ siteinfo/ig-site-info.def mysite-conf/
# cd mysite-conf/
# mv ig-site-info.def my-ig-site-info.def

19 YAIM configuration
The main file to edit is my-ig-site-info.def, where you specify some general settings and the parameters of the other components (CE CREAM).
The other files to be edited are: wn-list.conf, ig-groups.conf, services/glite-creamce, services/ig-bdii_site
Set the variables to the correct values, replacing the example ones.
# vim services/glite-creamce
CEMON_HOST=${CE_HOST}
CREAM_DB_USER="cream_db_user"
CREAM_DB_PASSWORD="cream_pass"
BLPARSER_HOST=${CE_HOST}

20 YAIM configuration/2
# vim services/glite-bdii_site
SITE_DESC="Egypt Grid Site"
SITE_LOC="Cairo, Egypt"
SITE_OTHER_GRID="WLCG|EGEE|EUMED"
BDII_REGIONS="CE"
BDII_CE_URL="ldap://$CE_HOST:2170/mds-vo-name=resource,o=grid"

21 YAIM configuration/3
# vim wn-list.conf
Delete all the example values present and insert the worker node hostnames, one per line:
SRVXX.eun.eg
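A repeated hostname in wn-list.conf is easy to miss when the list grows. The sketch below prints the hostnames that appear more than once; find_duplicate_hosts is a hypothetical helper name and the wnNN.example.org hostnames are placeholders, not values from this site.

```shell
#!/bin/sh
# Print the hostnames that appear more than once in a wn-list.conf-style file.
# Blank lines are ignored.
find_duplicate_hosts() {
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -d
}

# Example with a throwaway file (placeholder hostnames).
tmp_list=$(mktemp)
printf '%s\n' wn01.example.org wn02.example.org wn01.example.org > "$tmp_list"
duplicates=$(find_duplicate_hosts "$tmp_list")
echo "duplicates: $duplicates"
rm -f "$tmp_list"
```

An empty result means every worker node is listed only once.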

22 YAIM configuration/4
Here are some settings to support the AFRICACERT VO:
# vim vo.d/africacert
SW_DIR=$VO_SW_DIR/africacert
DEFAULT_SE=$SE_HOST
STORAGE_DIR=$CLASSIC_STORAGE_DIR/africacert
VOMS_SERVERS="'vomss://voms.ct.infn.it:8443/voms/africacert?/africacert'"
VOMSES="'africacert voms.ct.infn.it /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it africacert'"
VOMS_CA_DN="'/C=IT/O=INFN/CN=INFN CA'"
Then install the VO VOMS certificates with:
wget
rpm -ivh cometa-vomscert noarch.rpm

23 YAIM configuration/5
Now you have to provide a group and some users for the AFRICACERT VO by modifying these two files:
  ig-groups.conf
  ig-users.conf
# vim ig-groups.conf
# Append the following lines to the end of the file
"/africacert/ROLE=SoftwareManager":::sgm:
"/africacert"::::-

24 YAIM configuration/6
# vim ig-users.conf
# Append these lines at the end of the file
39001:africacert001:3900:africacert:africacert::
39002:africacert002:3900:africacert:africacert::
39003:africacert003:3900:africacert:africacert::
39004:africacert004:3900:africacert:africacert::
39005:africacert005:3900:africacert:africacert::
39006:africacert006:3900:africacert:africacert::
39007:africacert007:3900:africacert:africacert::
39008:africacert008:3900:africacert:africacert::
39009:africacert009:3900:africacert:africacert::
39010:africacert010:3900:africacert:africacert::
39011:africacert011:3900:africacert:africacert::
39012:africacert012:3900:africacert:africacert::
39013:africacert013:3900:africacert:africacert::
39014:africacert014:3900:africacert:africacert::
39015:africacert015:3900:africacert:africacert::
39016:africacert016:3900:africacert:africacert::
39017:africacert017:3900:africacert:africacert::
39018:africacert018:3900:africacert:africacert::
39019:africacert019:3900:africacert:africacert::
39020:africacert020:3900:africacert:africacert::
39101:sgmafricacert001:3910,3900:sgmafricacert,africacert:africacert:sgm:
39102:sgmafricacert002:3910,3900:sgmafricacert,africacert:africacert:sgm:
39103:sgmafricacert003:3910,3900:sgmafricacert,africacert:africacert:sgm:
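Typing the pool accounts by hand is error-prone, so the lines above can also be generated. This is only a sketch: gen_pool_accounts is a hypothetical helper, and the UID and GID ranges (39001-39020 with GID 3900, 39101-39103 with GIDs 3910,3900) are copied from this slide and should be adapted to your site.

```shell
#!/bin/sh
# Generate ig-users.conf pool-account lines in the layout shown on the slide.
vo=africacert

gen_pool_accounts() {
    # 20 ordinary pool accounts: UIDs 39001..39020, GID 3900.
    i=1
    while [ "$i" -le 20 ]; do
        printf '%d:%s%03d:3900:%s:%s::\n' $((39000 + i)) "$vo" "$i" "$vo" "$vo"
        i=$((i + 1))
    done
    # 3 software manager (sgm) accounts: UIDs 39101..39103, GIDs 3910,3900.
    i=1
    while [ "$i" -le 3 ]; do
        printf '%d:sgm%s%03d:3910,3900:sgm%s,%s:%s:sgm:\n' \
            $((39100 + i)) "$vo" "$i" "$vo" "$vo" "$vo"
        i=$((i + 1))
    done
}

users_conf_lines=$(gen_pool_accounts)
echo "$users_conf_lines"
```

The output can be reviewed first and then appended to ig-users.conf with a shell redirection.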

25 YAIM configuration/7
In my-ig-site-info.def there are many variables to set:
# vim my-ig-site-info.def
WN_LIST=/opt/glite/yaim/examples/mysite-conf/wn-list.conf
USERS_CONF=/opt/glite/yaim/examples/mysite-conf/ig-users.conf
GROUPS_CONF=/opt/glite/yaim/examples/mysite-conf/ig-groups.conf
MYSQL_PASSWORD=good_mysql_pass # any password you want
SITE_NAME=EG-01-ERI
SITE_LAT=30.04
SITE_LONG=31.21

26 YAIM configuration/8
# vim my-ig-site-info.def
CE_HOST=SRVXX.eun.eg # replace with the machine's hostname
CE_CPU_MODEL=XEON # cat /proc/cpuinfo
CE_CPU_VENDOR=Intel
CE_CPU_SPEED=2230
CE_OS=ScientificSL
CE_OS_RELEASE= # cat /etc/redhat-release
CE_OS_VERSION="Boron"
CE_OS_ARCH=x86_64
CE_MINPHYSMEM=512 # cat /proc/meminfo on WN
CE_MINVIRTMEM=512
CE_PHYSCPU= # total cpu in site (dual dual core)
CE_LOGCPU=4
CE_SMPSIZE=4
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=FALSE

27 YAIM configuration/9
# vim my-ig-site-info.def
CE_RUNTIMEENV="
  LCG-2
  LCG-2_1_0
  LCG-2_1_1
  LCG-2_2_0
  LCG-2_3_0
  LCG-2_3_1
  LCG-2_4_0
  LCG-2_5_0
  LCG-2_6_0
  LCG-2_7_0
  GLITE-3_0_0
  GLITE-3_1_0
  GLITE-3_2_0
  R-GMA
  SI00MeanPerCPU_3800
  SF00MeanPerCPU_3800
"
CE_SI00=3800
CE_SF00=3800

28 YAIM configuration/10
# vim my-ig-site-info.def
CE_CAPABILITY="CPUScalingReferenceSI00=23.75"
CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
SE_MOUNT_INFO_LIST="${INT_HOST_SW_DIR}:/opt/exp_soft,/opt/exp_soft"

29 YAIM configuration/11
How to set CE_SI00, CE_SF00, CE_CAPABILITY, CE_OTHERDESCR?
Try to look up your value at this link: SPEC06 12
For example, if you have an Intel XEON GHz with no Hyper Threading, you will find in the table at the previous link a value of 95 and a conversion factor of 1HS06=40, so:
CE_SI00=3800
CE_SF00=3800
CE_CAPABILITY="CPUScalingReferenceSI00=3800"
CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
where (3800/40)/4 = 23.75
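The arithmetic of this example can be reproduced with a few lines of shell. This is a sketch only: the three input values (95, 40 and 4 cores) are the ones quoted on this slide, the lowercase variable names are made up for the illustration, and you should replace the inputs with the figures for your own hardware.

```shell
#!/bin/sh
# Worked example of the slide's arithmetic (inputs taken from the slide).
hs06_total=95      # benchmark value found in the table for the whole machine
si00_per_hs06=40   # conversion factor quoted on the slide
cores=4            # cores per machine

# 95 * 40 = 3800
ce_si00=$((hs06_total * si00_per_hs06))

# (3800 / 40) / 4, computed with awk because sh has no floating point
hs06_per_core=$(awk -v si="$ce_si00" -v f="$si00_per_hs06" -v c="$cores" \
    'BEGIN { printf "%.2f", (si / f) / c }')

echo "CE_SI00=$ce_si00"
echo "CE_SF00=$ce_si00"
echo "CE_CAPABILITY=\"CPUScalingReferenceSI00=$ce_si00\""
echo "CE_OTHERDESCR=\"Cores=$cores,Benchmark=$hs06_per_core-HEP-SPEC06\""
```

The four printed lines match the values shown on the slide and can be pasted into my-ig-site-info.def.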

30 YAIM configuration/12
# vim my-ig-site-info.def
BATCH_SERVER=SRVXX.eun.eg
JOB_MANAGER=pbs
CE_BATCH_SYS=pbs
BATCH_LOG_DIR=/var/spool/pbs
APEL_DB_PASSWORD="anything"
DGAS_ACCT_DIR=/var/spool/pbs/server_priv/accounting
VOS="africacert eumed infngrid ops dteam"
QUEUES="cert eumed grid"
CERT_GROUP_ENABLE="ops dteam africacert"
EUMED_GROUP_ENABLE="eumed"
GRID_GROUP_ENABLE="infngrid"

31 YAIM configuration/14
After editing, you can launch the configuration commands:
# /opt/glite/yaim/bin/ig_yaim -c -s my-ig-site-info.def -n ig_CREAM_torque
# /opt/glite/yaim/bin/ig_yaim -c -s my-ig-site-info.def -n ig_BDII_site

32 Fixing errors
Check that tomcat is running after the configuration. If you get this message:
# /etc/init.d/tomcat5 status
/etc/init.d/tomcat5 is stopped
ONLY IF TOMCAT IS STOPPED, TRY THIS SOLUTION:
# rm -fr /var/lib/tomcat5/common/lib/jakarta*
# /etc/init.d/tomcat5 start
Starting tomcat5: /usr/bin/rebuild-jar-repository: error: Could not find log4j Java extension for this JVM
/usr/bin/rebuild-jar-repository: error: Some detected jars were not found for this jvm
[ OK ]

33 WN Cream Installation (on Torque/PBS)

34 WN - Network Time Protocol
Check that the machine's date is correct:
# date
If the date isn't correct, check the NTP service and synchronize the clock:
# /etc/init.d/ntpd status
# ntpdate ntp-1.infn.it
If the service is not running, configure it, start it and make it start on boot:
# /etc/init.d/ntpd start
# chkconfig ntpd on

35 WN - Repository set up (by CNAF repo)
Add the repositories specific to the middleware to the system repositories:
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# REPO="dag ig glite-generic lcg-ca glite-wn_torque"
# for rep_name in $REPO; do wget done

36 Which metapackages are we going to install?
There are several metapackages to install:
lcg-CA
  LHC Computing Grid rpm collection to support external Certification Authorities.
ig_WN_torque_noafs
  INFNGRID Worker Node with the Torque client, needed to talk to the Torque server. We chose not to install the AFS file system. This metapackage is installed with the groupinstall option.

37 WN - Middleware component installation
Use yum to install the needed packages:
# yum clean all
# yum install -y lcg-CA
# yum groupinstall -y ig_WN_torque_noafs

38 WN - YAIM Configuration
You can use the same configuration files edited on the CE. This can be done on all the worker nodes of a site, so you don't need to re-edit anything!
Copy the files from the CE machine:
# cd /opt/glite/yaim/examples/
# scp -r .
# cd mysite-conf
Now you are ready to configure:
# /opt/glite/yaim/bin/ig_yaim -c -s my-ig-site-info.def -n ig_WN_torque_noafs

39 Testing installation

40 Tests on CE
SSH to the CE to test whether the CE can see the WN and whether all the main services are up and running:
# pbsnodes
SRVXX.eun.eg
  state = free
  np =
  properties = lcgpro
  ntype = cluster
  status = opsys=linux,uname=Linux grid-test-63.trigrid.it el5 #1 [cut]
# /etc/init.d/gLite status
*** tomcat5:
/opt/glite/etc/init.d/tomcat5 is already running (1514)
*** glite-lb-locallogger:
glite-lb-logd running
glite-lb-interlogd running
# /etc/init.d/globus-gridftp status
globus-gridftp-server (pid 25452) is running...

41 Tests on CE
SSH to the CE and then become a gilda user:
# su - eumed001
Create a file and add the following:
$ vi test.sh
#!/bin/sh
sleep 20 # (it's useful to see the job status)
hostname
Set the right permissions to make it executable:
$ chmod 700 test.sh

42 Tests on CE
Launch the job locally on the CE:
$ qsub -q eumed test.sh
Then check the list of jobs in execution on the CE:
$ qstat -a
ce.localdomain:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
wn.localdo gilda001 short test.sh :15 R --
In case you want more information:
$ qstat -f 3
In case you want to abort a job execution:
$ qdel 3 # 3 is the job ID

43 Tests on CE
If the "qstat -a" command returns no output, no jobs are being executed on the CE. This means your previous job has terminated, so now you can list its output.
$ ls
test.sh.e3 test.sh.o3
$ cat test.sh.e3 # error file
$ cat test.sh.o3 # output file
wn.localdomain

44 JDL example
$ vim hostname-cream.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
OutputSandboxBaseDestUri = "gsiftp://localhost";
ShallowRetryCount = 3;

45 Working test
SSH to the UI to test whether the CE can receive and execute a simple job:
$ ssh #password: ceristuserXX
$ voms-proxy-init --voms eumed
[cut]
$ glite-ce-delegate-proxy -e SRVXX.eun.eg riccardo
:36:21,683 WARN - No configuration file suitable for loading. Using built-in configuration
:36:26,389 NOTICE - Proxy with delegation id [riccardo] succesfully delegated to endpoint [
$ glite-ce-job-submit -r SRVXX.eun.eg:8443/cream-pbs-cert -D riccardo hostname-cream.jdl
:39:06,444 WARN - No configuration file suitable for loading. Using built-in configuration
$ glite-ce-job-status
JobID=[
Status = [DONE-OK]
ExitCode = [0]

46 Test on BDII
ldapsearch is a very useful command to query the index:
$ ldapsearch -x -h sirius-ce.ct.infn.it -p 2170 -b "mds-vo-name=local, o=grid"
# extended LDIF
#
# LDAPv3
# base <mds-vo-name=local, o=grid> with scope sub
# filter: (objectclass=*)
# requesting: ALL
#
# local, grid
dn: Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: Mds
Mds-Vo-name: local
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
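When the output gets long, counting the published entries is a quick sanity check. The sketch below counts the lines starting with "dn:" in LDIF text; count_ldif_entries is a hypothetical helper, and the sample is copied from the output above so that the example runs without a BDII.

```shell
#!/bin/sh
# Count the entries in LDIF text read from standard input:
# every entry starts with a "dn:" line.
count_ldif_entries() {
    grep -c '^dn:'
}

# Sample taken from the ldapsearch output shown on the slide.
sample_ldif='# local, grid
dn: Mds-Vo-name=local,o=grid
objectClass: GlueTop
objectClass: Mds
Mds-Vo-name: local'

entries=$(printf '%s\n' "$sample_ldif" | count_ldif_entries)
echo "entries: $entries"
```

Against a live BDII, the output of the ldapsearch command can be piped into count_ldif_entries in the same way.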

47 Test on BDII
If you want much more information (with this base a lot of information will be printed):
$ ldapsearch -x -LLL -h sirius-se.ct.infn.it -p 2170 -b "mds-vo-name=resource, o=grid"

48 Troubleshooting
Which logs should you open if something goes wrong?
/var/log/messages, for general errors
/opt/glite/var/log (especially glite-ce-cream.log)
/var/spool/pbs/server_priv/accounting/<date>, if even local submission on the batch system doesn't work.
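A small helper can print the tail of each of these logs in one go. This is only a sketch: show_recent_logs is a hypothetical function, the paths are the ones named on this slide, and files that are missing on a given machine are reported rather than treated as an error.

```shell
#!/bin/sh
# Print the last N lines of each log file that exists and is readable;
# report the ones that are missing instead of failing.
show_recent_logs() {
    lines=$1
    shift
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log (last $lines lines) =="
            tail -n "$lines" "$log"
        else
            echo "== $log: not found or not readable =="
        fi
    done
}

# Paths from the slide; adjust them to your installation.
show_recent_logs 20 /var/log/messages /opt/glite/var/log/glite-ce-cream.log
```

The Torque accounting file for a given day can be passed as an extra argument in the same call.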

49 References
INFNGRID generic installation guide:
YAIM configuration variables
CE CREAM installation guide: GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
YAIM system administrator guide:
EUMEDGRID wiki: EuMedGRID sites installation and setup tips on How To Check And Test Your CREAMCE CE

50 Thank you for your kind attention!
Any questions?

