The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) CE+WN+siteBDII Installation and configuration Bouchra

1 The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How)
CE+WN+siteBDII Installation and configuration
Bouchra RAHIM (rahim@cnrst.ma)
Africa 6 2010 - Joint EUMEDGRID-Support/EPIKH School for Grid Site Administrators
Rabat, 01.06.2011
www.epikh.eu

2 Outline
– Computing Element overview
– Worker Node overview
– CE CREAM overview
– gLite stack overview
– gLite CE siteBDII
– gLite CE CREAM and WN

3 gLite stack overview

4 gLite overview worker node

5 gLite overview
– User Interface (UI): the point of access for users to gLite grid services
– WMS: the component that optimizes resource usage
– CE: the machine that manages the worker nodes
– WN: the machines that actually execute applications
– SE: the machines where files are stored
– LFC: used to "find" files on the grid
– BDII: the service responsible for publishing all information about your site
– Logging and Bookkeeping: as its name says, it logs job events and alerts the user when a job is finished

6 Computing Element Overview
The Computing Element provides some of the main services of a site.
Main functionalities:
– job management (job submission, job control)
– job status updates for the WMS
– communication with the site BDII, which publishes all information about the Computing Element
It can run several kinds of batch system:
– Torque + MAUI
– LSF
– SGE
– Condor

7 Torque + MAUI
Torque server service:
– pbs_server provides basic batch services such as receiving and creating batch jobs.
Torque client service:
– pbs_mom places jobs into execution. It is also responsible for returning the job's output to the user.
MAUI system service:
– job_scheduler holds the site's policy and decides which job is executed and when.

8 Site BDII*
– It used to be installed on the CE by default, but it is now better to install it on a dedicated server, physical or virtual.
– It collects information from all the site GRISes** (for example SE, RB, LFC, etc.).
– The service is named bdii.
– Log file: /opt/bdii/var/bdii.log
*BDII = Berkeley Database Information Index
**GRIS = Grid Resource Information Service

9 Worker Node Element Overview
– Worker nodes are the machines that actually execute your jobs.
– Users can access their services only through a Computing Element.
– Their characteristics are collected by the Computing Element, which publishes them through the BDII services.

10 CE CREAM overview
– CREAM stands for Computing Resource Execution And Management.
– It accepts job submission requests coming from a WMS, as well as other job management requests.
– It exposes a web service interface.

11 Requirements
– Three or more machines:
  – one for the CE installation;
  – one for the site BDII installation;
  – the others for the WN installation.
– Architecture: 64 bit
– Operating System: Scientific Linux 5
– Two machines (CE and BDII) with a public IP address and direct and reverse address resolution in a DNS
– The CE machine must be equipped with an X.509 certificate

12 BDII Installation

13 Preparing the Linux machine
Network Time Protocol settings:
# yum install ntp
Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP).
Synchronize the date:
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
Start the ntpd service and configure it to start on boot:
# /etc/init.d/ntpd start
# chkconfig ntpd on

14 Preparing the Linux machine
Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
Stop iptables:
# /etc/init.d/iptables stop
# chkconfig iptables off
Check that you have a valid hostname:
# hostname -f
# cat /etc/hosts
Reboot.
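
The SELinux setting can be verified with a small shell check before rebooting. This is only a sketch: the function name selinux_disabled is hypothetical, and it inspects the configuration file only, not the running SELinux state.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given SELinux config file
# contains an uncommented SELINUX=disabled line.
selinux_disabled() {
    config_file=${1:-/etc/selinux/config}
    grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "$config_file"
}

# Example use on the node being prepared.
if selinux_disabled /etc/selinux/config 2>/dev/null; then
    echo "SELinux is disabled in the config file"
else
    echo "SELINUX=disabled not found; edit /etc/selinux/config and reboot"
fi
```

A commented-out line such as #SELINUX=disabled is deliberately not accepted, since it has no effect at boot.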

15 Repository set up-BDII
Add the repositories specific to the middleware to the system repositories:
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-BDII_site"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done
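
Before downloading anything, the repository loop can be previewed as a dry run that only prints the wget commands. The sketch below reuses MREPO and REPOS from this slide; the function name print_repo_downloads is made up for illustration.

```shell
#!/bin/sh
# Dry run: print the wget command for each repository file
# instead of executing it. Nothing is downloaded or written.
MREPO=http://repo.magrid.ma/yumrepo/glite32
REPOS="dag lcg-CA glite-BDII_site"

print_repo_downloads() {
    for name in $REPOS; do
        echo "wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo"
    done
}

print_repo_downloads
```

Note that the output option is a plain ASCII -O; a typographic dash pasted from a slide would be treated by wget as a URL rather than an option.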

16 Package installation-BDII
Use yum to install the needed packages:
# yum install lcg-CA ca-policy-egi-core ca-policy-lcg
# yum install glite-BDII_site

17 Yaim Configuration
All the sample configuration files are located in the /opt/glite/yaim/examples/siteinfo directory. It is better to work on a copy and leave the original files untouched.

18 Yaim Configuration
You can find some template files in: ftp://repo.magrid.ma/pub/CE_WN_BDII/
Edit the site-info.def file and change the following variables:
– SITE_NAME=MA-ZZ-School (name of the site)
– CE_HOST=pcXX.magrid.ma (XX: the machine that will be the CE)
– SITE_BDII_HOST=pcYY.magrid.ma (the current machine)
Edit the services/glite-bdii_site file and change the following variables:
– SITE_NAME=MA-ZZ-School
– SITE_DESC="MA-ZZ-School"

19 Yaim Configuration-BDII
Run the configuration command:
# /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-BDII_site
If everything is OK, run a basic test:
# ldapsearch -x -h pcYY.magrid.ma -p 2170 -b "mds-vo-name=local,o=grid"

20 CE Cream Installation (on Torque/PBS)

21 Preparing the Linux machine
Network Time Protocol settings:
# yum install ntp
Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP).
Synchronize the date with an NTP server:
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
Start the ntpd service and configure it to start on boot:
# /etc/init.d/ntpd start
# chkconfig ntpd on

22 Preparing the Linux machine
Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
Stop iptables:
# /etc/init.d/iptables stop
# chkconfig iptables off
Check that you have a valid hostname:
# hostname -f
# cat /etc/hosts
Reboot.

23 Repository set up-CE
Add the repositories specific to the middleware to the system repositories:
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-CREAM glite-TORQUE_server glite-TORQUE_utils"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done

24 Package installation-CE
Use yum to install the needed packages:
# yum clean all
# yum install lcg-CA ca-policy-egi-core ca-policy-lcg
Because of a dependency problem in the Tomcat distribution on SL5, install xml-commons-apis first:
# yum install xml-commons-apis
# yum install glite-CREAM
# yum install glite-TORQUE_server glite-TORQUE_utils

25 Before configuration-Host Certificates
A preliminary step before configuration: copy the host certificate and key to the default path:
# cd
# mv /root/pcXXcert.pem /etc/grid-security/hostcert.pem
# mv /root/pcXXkey.pem /etc/grid-security/hostkey.pem
# chmod 400 /etc/grid-security/hostkey.pem
# chmod 600 /etc/grid-security/hostcert.pem
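
The permissions set on this slide can be checked afterwards with a short script. This is a sketch under assumptions: perm_string and check_host_credentials are hypothetical helper names, and the check only compares the mode strings reported by ls with the modes used above (400 for the key, 600 for the certificate).

```shell
#!/bin/sh
# Hypothetical helper: print the permission string of a file,
# for example -r-------- for mode 400.
perm_string() {
    ls -ld "$1" | cut -c1-10
}

# Succeeds only if the host key and certificate in the given directory
# carry the modes set on the slide (400 and 600 respectively).
check_host_credentials() {
    dir=${1:-/etc/grid-security}
    [ "$(perm_string "$dir/hostkey.pem")" = "-r--------" ] &&
    [ "$(perm_string "$dir/hostcert.pem")" = "-rw-------" ]
}
```

On the CE it could be called as: check_host_credentials /etc/grid-security && echo "host credentials look fine". A missing file makes the check fail as well.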

26 YAIM configuration-CE
The main file to edit is site-info.def, where you specify general settings and the parameters of the other components (CREAM CE).
Other files to edit are: wn-list.conf, users.conf, groups.conf, services/glite-creamce.
Replace the example values with the correct ones for your site.
# vi services/glite-creamce
CEMON_HOST=pcXX.$MY_DOMAIN
CREAM_DB_USER=eumed
CREAM_DB_PASSWORD=grid2011
BLPARSER_HOST=pcXX.$MY_DOMAIN

27 YAIM configuration-CE
Declare the worker nodes in wn-list.conf:
# vi wn-list.conf
pcAA.magrid.ma
pcBB.magrid.ma

28 YAIM configuration-CE
CE_HOST=pcXX.magrid.ma
CE_CPU_MODEL=XEON # cat /proc/cpuinfo
CE_CPU_VENDOR=Intel
CE_CPU_SPEED=2230
CE_OS=ScientificSL
CE_OS_RELEASE=5.5 # cat /etc/redhat-release
CE_OS_VERSION="Boron"
CE_OS_ARCH=x86_64
CE_MINPHYSMEM=512 # cat /proc/meminfo on WN
CE_MINVIRTMEM=512
CE_PHYSCPU=1 # total cpu in site
CE_LOGCPU=4
CE_SMPSIZE=4
CE_OUTBOUNDIP=TRUE
CE_INBOUNDIP=FALSE
CE_OTHERDESCR="Cores=4,Benchmark=6.5-HEP-SPEC06"
Reference: http://gkswiki.fzk.de/index.php5/Configuration_of_the_CREAM_CE

29 YAIM configuration-CE
How to set CE_SI00, CE_SF00, CE_CAPABILITY and CE_OTHERDESCR?
Look up the value for your processor at these links:
http://www.italiangrid.org/grid_operations/site_manager/HEP-SPEC06
https://hepix.caspur.it/benchmarks/doku.php?id=bench:results_sl5_x86_64_gcc_412
https://hepix.caspur.it/processors/dokuwiki/doku.php?id=benchmarks:results
For example, for an Intel XEON 5520 at 2.23 GHz without Hyper-Threading, the table at the previous link gives a value of 95 and a conversion factor of 1 HS06 = 40, so 95 × 40 = 3800:
CE_SI00=3800
CE_SF00=3800
CE_CAPABILITY="CPUScalingReferenceSI00=3800"
CE_OTHERDESCR="Cores=4,Benchmark=23.75-HEP-SPEC06"
where (3800/40)/4 = 23.75 is the benchmark value per core.
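
The arithmetic of this example can be reproduced in the shell with awk. The sketch below simply replays the slide's own numbers (table value 95, conversion factor 40, 4 cores); the function names si00_from_table and hs06_per_core are illustrative and are not YAIM variables or tools.

```shell
#!/bin/sh
# Replays the slide's example: table value 95, factor 40, 4 cores.
# Function names are illustrative only.
si00_from_table() {      # table_value * factor
    awk -v v="$1" -v f="$2" 'BEGIN { printf "%d\n", v * f }'
}
hs06_per_core() {        # (si00 / factor) / cores
    awk -v s="$1" -v f="$2" -v c="$3" 'BEGIN { printf "%.2f\n", (s / f) / c }'
}

SI00=$(si00_from_table 95 40)
BENCH=$(hs06_per_core "$SI00" 40 4)

# Print the lines in the form used in site-info.def.
echo "CE_SI00=$SI00"
echo "CE_SF00=$SI00"
echo "CE_OTHERDESCR=\"Cores=4,Benchmark=${BENCH}-HEP-SPEC06\""
```

For a different processor, only the three numeric arguments would change; the values must still come from the benchmark tables linked on the slide.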

30 YAIM configuration-CE
BATCH_SERVER=$CE_HOST
JOB_MANAGER=lcgpbs
CE_BATCH_SYS=pbs
BATCH_LOG_DIR=/var/spool/pbs
APEL_DB_PASSWORD=grid2011
DGAS_ACCT_DIR=/var/spool/pbs/server_priv/accounting
VOS="eumed"
QUEUES="eumed"
EUMED_GROUP_ENABLE="eumed"

31 YAIM configuration-CE
After editing, you can launch the configuration commands:
# /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -n TORQUE_server -n TORQUE_utils
# /opt/glite/yaim/bin/yaim -r -s /opt/glite/yaim/etc/siteinfo/site-info.def -n creamCE -f config_cream_blparser
Reference: http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:devel:install-cream32

32 Check the CE
http://grid.pd.infn.it/cream/field.php?n=Main.CheckYourCREAMCEConfiguration
Download the script:
# wget http://grid.pd.infn.it/cream/CheckCreamConf/current/CheckCreamConf.pl
# chmod +x CheckCreamConf.pl
Run it:
# ./CheckCreamConf.pl
Check the output: CheckCreamConf.log

33 WN Cream Installation (on Torque/PBS)

34 Preparing the Linux machine
Network Time Protocol settings:
# yum install ntp
Copy the ntp.conf file and the ntp directory from ftp://repo.magrid.ma/pub/CE_WN_BDII/ to /etc/ (for example with WinSCP).
Synchronize the date:
# /etc/init.d/ntpd stop
# ntpdate ntp.marwan.ma
Start the ntpd service and configure it to start on boot:
# /etc/init.d/ntpd start
# chkconfig ntpd on

35 Preparing the Linux machine
Disable SELinux: make sure /etc/selinux/config contains the line:
SELINUX=disabled
Stop iptables:
# /etc/init.d/iptables stop
# chkconfig iptables off
Check that you have a valid hostname:
# hostname -f
# cat /etc/hosts
Reboot.

36 Repository set up-WN
Add the repositories specific to the middleware to the system repositories:
# cd /etc/yum.repos.d/
# mv dag.repo dag.repo.stop
# export MREPO=http://repo.magrid.ma/yumrepo/glite32
# REPOS="dag lcg-CA glite-WN glite-TORQUE_client"
# for name in $REPOS; do wget $MREPO/$name.repo -O /etc/yum.repos.d/$name.repo; done

37 Package installation-WN
Use yum to install the needed packages:
# yum clean all
# yum install -y lcg-CA ca-policy-egi-core ca-policy-lcg
# yum groupinstall glite-WN
# yum install glite-TORQUE_client

38 WN - YAIM Configuration
You can use the same configuration files edited on the CE:
– this works on all the worker nodes of a site;
– so you don't need to re-edit anything!
Copy the configuration files from the CE machine using the scp command:
# mkdir /opt/glite/yaim/etc/siteinfo/
# mkdir /opt/glite/yaim/etc/siteinfo/services
Copy site-info.def, users.conf, groups.conf and wn-list.conf from the CE (for example root@pcXX:/opt/glite/yaim/etc/siteinfo/site-info.def).
Copy the glite-wn file from examples/services.

39 WN - YAIM Configuration
Ready to configure now:
# /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/siteinfo/site-info.def -n glite-WN -n TORQUE_client
A basic test: check the status of pbs_mom:
# pbsnodes -a

41 Testing installation

42 Tests on CE
Log in to the CE via SSH to check that the CE can see the WNs and that all main services are up and running:
# pbsnodes
# /etc/init.d/gLite status

43 Tests on CE
Log in to the CE via SSH and then become an eumed user:
# su - eumed001
Create a file and add the following:
$ vi test.sh
#!/bin/sh
sleep 20 # (it's useful to see the job status)
hostname
Set the right permissions to make it executable:
$ chmod 700 test.sh

44 Tests on CE
Launch the job locally on the CE:
$ qsub -q eumed test.sh
Then check the list of jobs in execution on the CE:
$ qstat -a
ce.localdomain:
                                                             Req'd  Req'd   Elap
Job ID           Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
---------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
0.pc22.magrid.ma eumed001 short    test.sh      5839  --  --     -- 00:15 R    --
If you want to abort a job:
$ qdel 3 # 3 is the job ID
If you want more information:
$ qstat -f 3

45 Tests on CE
If "qstat -a" prints no output, no jobs are running on the CE. This means your previous job has finished, so you can now list its output:
$ ls
test.sh.e3 test.sh.o3
$ cat test.sh.e3 # error file
$
$ cat test.sh.o3 # output file
wn.localdomain
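
The output file names follow from the script name and the numeric part of the job ID. As a minimal sketch, assuming the default Torque naming of <jobname>.o<number> and <jobname>.e<number>, a hypothetical helper could derive them like this:

```shell
#!/bin/sh
# Hypothetical helper: given a script name and a Torque job ID such as
# 3.pc22.magrid.ma, print the expected stdout and stderr file names.
torque_output_files() {
    script=$1
    seq=${2%%.*}     # keep only the part before the first dot
    echo "$script.o$seq $script.e$seq"
}

torque_output_files test.sh 3.pc22.magrid.ma
```

The same call with a bare job number (for example 3) gives the same result, because there is no dot to strip.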

46 JDL example
$ vim hostname-cream.jdl
Type = "Job";
JobType = "Normal";
Executable = "/bin/hostname";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-f";
OutputSandboxBaseDestUri = "gsiftp://localhost/tmp";

47 Working test
Log in to the UI via SSH to test whether the CE can receive and execute a simple job:
$ ssh gridXX@ui01.magrid.ma # password: gridXX
Set up the certificate:
[root@ui01 ~]# mkdir /home/grid01/.globus
[root@ui01 ~]# cp /root/user_cert/usercert.pem /home/grid01/.globus/usercert.pem
[root@ui01 ~]# cp /root/user_cert/userkey.pem /home/grid01/.globus/userkey.pem
[root@ui01 ~]# chown grid01 /home/grid01/.globus/usercert.pem
[root@ui01 ~]# chown grid01 /home/grid01/.globus/userkey.pem
[root@ui01 ~]# chmod 400 /home/grid01/.globus/userkey.pem
[root@ui01 ~]# su - grid01
[grid01@ui01 ~]$ voms-proxy-init --voms eumed
Enter GRID pass phrase: [grid2011]
Submit the job and check its status:
$ glite-ce-job-submit -r pc22.magrid.ma:8443/cream-pbs-eumed -o ID hostname-cream.jdl
$ glite-ce-job-status -i ID
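
The CE identifier passed with -r has the form host:port/cream-<batch system>-<queue>. The sketch below assembles it from its parts; cream_ce_id is a hypothetical helper, with pbs and 8443 taken as defaults from the example on this slide.

```shell
#!/bin/sh
# Hypothetical helper: build a CREAM CE identifier of the form
# host:port/cream-<batch>-<queue>. Defaults follow the slide's example.
cream_ce_id() {
    host=$1
    queue=$2
    batch=${3:-pbs}
    port=${4:-8443}
    echo "${host}:${port}/cream-${batch}-${queue}"
}

cream_ce_id pc22.magrid.ma eumed
```

The printed string can then be used as the argument of -r, which avoids typing errors when testing several CEs or queues.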

48 Troubleshooting
Which logs should you check if something goes wrong?
– /var/log/messages, for general errors
– /opt/glite/var/log (especially glite-ce-cream.log)
– /var/spool/pbs/server_priv/accounting/, if even local submission to the batch system doesn't work

49 References
INFNGRID generic installation guide:
– http://igrelease.forge.cnaf.infn.it/doku.php?id=doc:guides:install-3_2
YAIM configuration variables:
– https://twiki.cern.ch/twiki/bin/view/LCG/Site-info_configuration_variables
CE Cream installation guide:
– GLITE Cream CE 3.2 SL5 Installation Guide [INFNGRID Release Wiki]
YAIM system administrator guide:
– https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide400
EUMEDGRID wiki:
– http://wiki.eumedgrid.eu/bin/view
EuMedGRID sites installation and setup tips:
– http://wiki.eumedgrid.eu/twiki/bin/view/InfrastructureStatus/EumedSiteInstallation
How To Check And Test Your CREAMCE:
– http://grid.pd.infn.it/cream/field.php?n=Main.HowToCheckAndTestYourCREAMCE

50 Thank you for your kind attention! Any questions?

