How to get started London Tier2 O. van der Aa. 16/04/2007 Running the LT2 UK HEP Grid: GridPP, One T1, Four T2 ScotGrid Durham, Edinburgh, Glasgow NorthGrid.


1 How to get started London Tier2 O. van der Aa

2 16/04/2007 Running the LT2 UK HEP Grid: GridPP, one T1, four T2s
–ScotGrid: Durham, Edinburgh, Glasgow
–NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
–SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD
–London Tier2: Brunel, Imperial, QMUL, RHUL, UCL

3 Imperial College: spread across two sites
Physics Department
–465 KSI2K (dual-core Intel Woodcrest) running SGE 6
–60 TB running dCache
Computing Department
–177 KSI2K (Opterons) running SGE 6
–Storage: uses the Physics Department's
All running RHEL4 and RHEL3 and using the LCG tarball.
Local physicists: CMS/LHCb/DZero

4 324 KSI2K across two clusters
–Two CEs running pbs/maui
–6.5 TB of storage running DPM
–Complex situation with respect to networking: the Grid is in a demilitarized zone with 200 Mb/s max.
–Local physicists are mainly from CMS.

5 Biggest cluster in London
–Mixture of Athlons, Xeons and Opterons; total of 1200 KSI2K running separate pbs/maui
–Cluster shared by Astronomy/HEP/Material Sciences
–Storage: 18 TB running poolfs and DPM
–Expect to use worker node local disk with Lustre: 400 TB
–Local community is ATLAS oriented

6 Separate pbs/maui from the CE
–160 KSI2K
–8 TB running DPM
–ATLAS/ILC community
–Running SLC3
–Will soon buy 265 KSI2K and 136 TB, to come around April

7 Situation similar to Imperial:
–Physics department: 24 KSI2K and ~1 TB
–Computing department: shared cluster with 50 KSI2K
–1.5 TB running DPM
–Running CentOS 3, SGE

8 Resource summary
–CPU: 2.5 MSI2K
–Storage: 94 TB

9 How are the resources used? Currently around 70%

10 How to contact us
Our mailing list: lt2-technical@imperial.ac.uk
–The coordinator: o.van-der-aa@ic.ac.uk
–The T2 manager: d.colling@ic.ac.uk
Via GGUS: http://www.ggus.org
–Specify UKI-LT2 and the university in the subject field
–Use it for any specific problem once you are set up
Our wiki: http://wiki.gridpp.ac.uk/wiki/London_Tier2
–Used to describe the infrastructure
–Gives links to monitoring pages

11 How to start? "The Three Steps"
1. Register for a certificate (as explained in the NGS talk): https://ca.grid-support.ac.uk/
2. With your certificate, register with the ltwo virtual organisation: https://voms.gridpp.ac.uk:8443/voms/ltwo/
3. Get access to a user interface
–Ask via the lt2-technical mailing list.
–Each university in the LT2 has a user interface.

12 Summary: main Grid components
–The User Interface (UI) is where the user sits to submit jobs.
–The Virtual Organisation Membership Service (VOMS) is involved in authorizing and authenticating users.
–The Information System (IS) publishes the individual site information (CE queue names, SE contact points, number of waiting jobs, number of running jobs, etc.).
–The Workload Management System (WMS) takes the user's job, finds a compatible site and submits the job to the site CE.
–The Computing Element (CE) is the entry point for jobs into the computing cluster.
–The Storage Element (SE) is the equivalent of the CE, but for data.

13 The main Grid components (diagram; labels include WMS and VOMS)

14 Information System
Tree structure showing all available resources in the Grid.
–Implemented as an LDAP server
–Top-level view at lcg-bdii.gridpp.ac.uk, port 2170
–Worth a look: use the JXplorer LDAP browser, http://www.jxplorer.org/
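
A command-line alternative to a graphical LDAP browser can be sketched as follows. This is only an illustration: it assumes the OpenLDAP `ldapsearch` client, the conventional `o=grid` search base, and the Glue object class `GlueCE`, none of which are stated on the slide. The helper name `bdii_query_cmd` is hypothetical. The script only prints the command line; running it needs network access to the BDII.

```shell
#!/bin/sh
# Build an ldapsearch command line for the top-level BDII named on the slide.
# Assumptions (not from the slide): OpenLDAP's ldapsearch, base DN "o=grid",
# and the Glue schema object class GlueCE.
BDII_HOST=lcg-bdii.gridpp.ac.uk
BDII_PORT=2170

bdii_query_cmd() {
    # $1: LDAP filter, $2: attribute to return
    echo "ldapsearch -x -LLL -H ldap://$BDII_HOST:$BDII_PORT -b o=grid '$1' $2"
}

cmd=$(bdii_query_cmd '(objectClass=GlueCE)' GlueCEUniqueID)
echo "$cmd"
# Run it only where the client is installed and the server is reachable:
# eval "$cmd"
```

Printing the command first makes it easy to check the host, port and filter before querying a production server.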

15 Submitting your first job
Get a login on a user interface
–In this case gfe03.hep.ph.ic.ac.uk
Initialize your proxy
–voms-proxy-init --voms ltwo
Prepare your JDL (Job Description Language)
–The name of the executable
–The files you want to transfer before the job starts
–Your constraints, for example how much CPU time you need, or which subset of resources you want to use

16 The files
Hello.jdl
    Executable = "/bin/sh";
    Arguments = "Hello.sh";
    StdOutput = "std.out";
    StdError = "std.err";
    InputSandbox = {"Hello.sh"};
    OutputSandbox = {"std.out", "std.err"};
Hello.sh
    #!/bin/sh
    echo 'Hello LT2 Workshop'
    whoami
    hostname
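
One way to create the two files on a user interface is with here-documents, as in this minimal sketch. The directory name `lt2-hello` and the `WORKDIR` variable are assumptions made for the example. The file contents are copied from the slide. The last line runs the payload locally as a sanity check, which is not a step the slides describe.

```shell
#!/bin/sh
# Create Hello.jdl and Hello.sh in a scratch directory.
# "lt2-hello" is a hypothetical directory name, not part of the slides.
workdir=${WORKDIR:-./lt2-hello}
mkdir -p "$workdir"

# Job description: run Hello.sh with /bin/sh and bring back stdout/stderr.
cat > "$workdir/Hello.jdl" <<'EOF'
Executable = "/bin/sh";
Arguments = "Hello.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"Hello.sh"};
OutputSandbox = {"std.out", "std.err"};
EOF

# Payload shipped to the worker node through the input sandbox.
cat > "$workdir/Hello.sh" <<'EOF'
#!/bin/sh
echo 'Hello LT2 Workshop'
whoami
hostname
EOF

# Local sanity check of the payload before submitting it.
sh "$workdir/Hello.sh"
```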

17 Submitting
Finding matching resources
–edg-job-list-match Hello.jdl
    ***************************************************************************
    COMPUTING ELEMENT IDs LIST
    The following CE(s) matching your job requirements have been found:
    *CEId*
    ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-30min
    ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr
    ce1.pp.rhul.ac.uk:2119/jobmanager-pbs-ltwogrid
    dgc-grid-35.brunel.ac.uk:2119/jobmanager-lcgpbs-short
    gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr
    mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr
    gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
    ***************************************************************************

18 Submitting
The actual submission
–edg-job-submit Hello.jdl
    *********************************************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier (edg_jobId) is:
    - https://gfe01.hep.ph.ic.ac.uk:9000/izz75vlTThizfJVP-7VGdQ
    *********************************************************************************************
The https:// line is your job identifier. You need to keep track of each one.
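
Keeping track of job identifiers is easier if they are pulled out of the submission output automatically. The sketch below assumes the output has been saved to a file. The names `submit.log`, `jobids.txt` and `extract_job_id` are hypothetical, and the sample text is the output shown on the slide rather than a live submission.

```shell
#!/bin/sh
# Print the first https:// URL found in a saved edg-job-submit output file.
extract_job_id() {
    grep -o 'https://[^[:space:]]*' "$1" | head -n 1
}

# Sample text copied from the slide, standing in for real command output.
cat > submit.log <<'EOF'
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://gfe01.hep.ph.ic.ac.uk:9000/izz75vlTThizfJVP-7VGdQ
EOF

# Append the identifier to a bookkeeping file (hypothetical name).
extract_job_id submit.log >> jobids.txt
cat jobids.txt
```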

19 Checking the state of your job
–edg-job-status [your job id]

20 Getting the result
edg-job-get-output [jobid]
–Will store your OutputSandbox files (std.out, std.err) in /tmp/
–Content of std.out:
    Hello LT2 Workshop
    lt2-ltwo007
    mars092.mars.lesc.doc.ic.ac.uk

21 JDL: more complex requirements
Specify a CE in a domain
–Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID);
Require some CPU time (in minutes)
–Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID) && (other.GlueCEPolicyMaxCPUTime > 600);
Require some CPU*KSI2K time
–Requirements = (other.GlueCEPolicyMaxCPUTime > (30 * 500 / other.GlueHostBenchmarkSI00));
More on how to master JDL at http://tinyurl.com/28oje9
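
The CPU*KSI2K requirement scales a reference CPU time by the speed of the matched host. As a worked example, the sketch below evaluates the same expression, 30 * 500 / SI00, for a few benchmark values. The helper name `min_cpu_minutes` and the sample SI00 values are made up for illustration, and shell arithmetic is integer-only, so results are truncated.

```shell
#!/bin/sh
# Minimum queue CPU-time limit (minutes) for a job that needs
# ref_minutes on a CPU rated ref_si00 (SpecInt2000), when run on a
# host rated host_si00. Mirrors 30 * 500 / GlueHostBenchmarkSI00.
min_cpu_minutes() {
    ref_minutes=$1
    ref_si00=$2
    host_si00=$3
    echo $(( ref_minutes * ref_si00 / host_si00 ))
}

# Illustrative benchmark values, not taken from any real site.
for si00 in 250 500 1000 1500; do
    echo "SI00=$si00: MaxCPUTime must exceed $(min_cpu_minutes 30 500 "$si00") min"
done
```

A host twice as fast as the 500 SI00 reference needs only a 15 minute limit, while a host half as fast needs 60 minutes.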

22 Data Management
In the previous example
–All files are transferred via the sandbox
–The sandbox is limited to 100 MB
Clearly something additional is required to transfer bigger datasets: the data management tools (lcg utils, gfal).

23 Catalogue services
–A "file" is identified by a GUID
–Several aliases (LFNs) can be attached to the GUID
–One "file" can be located at several places (PFNs)
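
The relationship between GUID, LFN and PFN can be pictured with a toy lookup table. This is not how the real catalogue is implemented or queried. The file `catalogue.txt`, the helper names and every LFN and PFN below are invented for the example; only the idea of one GUID with several aliases and several replicas comes from the slide.

```shell
#!/bin/sh
# Toy catalogue: one record per line, "kind guid value".
# All entries are made-up placeholders, not real catalogue content.
cat > catalogue.txt <<'EOF'
lfn ec362b1a-6f88-4860-a72b-68d4ad55eb59 /grid/example/alias-one
lfn ec362b1a-6f88-4860-a72b-68d4ad55eb59 /grid/example/alias-two
pfn ec362b1a-6f88-4860-a72b-68d4ad55eb59 srm://se-a.example.org/data/file1
pfn ec362b1a-6f88-4860-a72b-68d4ad55eb59 srm://se-b.example.org/data/file1
EOF

# Resolve an alias (LFN) to its GUID.
guid_of() {
    awk -v lfn="$1" '$1 == "lfn" && $3 == lfn { print $2 }' catalogue.txt
}

# List every physical replica (PFN) registered for a GUID.
replicas_of() {
    awk -v g="$1" '$1 == "pfn" && $2 == g { print $3 }' catalogue.txt
}

guid=$(guid_of /grid/example/alias-two)
echo "guid:$guid"
replicas_of "$guid"
```

Either alias resolves to the same GUID, and the GUID in turn resolves to both replicas.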

24 Uploading a file to a storage element (SE)
Finding the list of SEs
–lcg-infosites --vo dteam se
–If you don't specify an SE, the one closest to the cluster will be used
Uploading
–lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta
–Returns: guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59

25 GUID?
Remembering a GUID is not human friendly.
You can give an alias (LFN) to a GUID.
–lcg-aa --vo dteam guid:ec362b1a-6f88-4860-a72b-68d4ad55eb59 lfn:/grid/home/lt2wk.dta
You can also give an alias when registering the file.
–lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta -l lfn:/grid/dteam/lt2wk.dta

26 More on moving files
Copying files back to your UI
–lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta file:`pwd`/myfile.dta
Replicating files somewhere else
–lcg-rep -d se1.pp.rhul.ac.uk --vo dteam lfn:/grid/dteam/lt2wk.dta

27 Listing files
Listing replicas:
–lcg-lr --vo [yourvo] lfn:
List the GUID:
–lcg-lg --vo [yourvo] lfn:
Example
–lcg-lr --vo dteam lfn:/grid/dteam/lt2wk.dta
    srm://gfe02.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/dteam/generated/2007-04-16/filec6b6fba2-c854-4ee6-a0db-68bd6cd6e0dd
    srm://se1.pp.rhul.ac.uk/dpm/pp.rhul.ac.uk/home/dteam/generated/2007-04-16/file5642a5ea-b63f-411a-b56c-84a75137d716
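
The upload, replicate, list and copy-back commands from the last few slides can be strung together in one script. The sketch below is a dry run by default: the `run` wrapper and the `DRY_RUN` variable are assumptions added for the example, and the output file name `copy-of-myfile.dta` is made up. The lcg-* commands, hosts and LFN are the ones shown on the slides. Really executing them needs a UI with the tools installed and a valid proxy.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of running it.
# Set DRY_RUN=0 to execute for real (needs the lcg-* tools and a proxy).
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

VO=dteam
LFN=lfn:/grid/dteam/lt2wk.dta

# Upload and register with an alias, replicate, list replicas, copy back.
run lcg-cr --vo "$VO" -d gfe02.hep.ph.ic.ac.uk -l "$LFN" "file:$PWD/myfile.dta"
run lcg-rep --vo "$VO" -d se1.pp.rhul.ac.uk "$LFN"
run lcg-lr --vo "$VO" "$LFN"
run lcg-cp --vo "$VO" "$LFN" "file:$PWD/copy-of-myfile.dta"
```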

28 Sending your job where your files are
In your JDL
–InputData = {"lfn:/grid/dteam/lt2wk.dta"};
–DataAccessProtocol = {"file", "srm", "gridftp"};
Then you have to use the lcg-* commands to copy the files.
Alternatively, you can link to the gfal library and stream the data (man gfal).
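
Put together, a data-driven job might look like the sketch below. It combines the Hello.jdl attributes from the earlier slide with the two data attributes above. The file names `DataJob.jdl` and `fetch.sh` are hypothetical, and whether a given broker needs further attributes is not covered by the slides, so treat this as a starting point rather than a tested job.

```shell
#!/bin/sh
# Job description: ask to run near a replica of the input file.
cat > DataJob.jdl <<'EOF'
Executable = "/bin/sh";
Arguments = "fetch.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"fetch.sh"};
OutputSandbox = {"std.out", "std.err"};
InputData = {"lfn:/grid/dteam/lt2wk.dta"};
DataAccessProtocol = {"file", "srm", "gridftp"};
EOF

# Payload: runs on the worker node, copies the input locally, then uses it.
# The quoted 'EOF' keeps $PWD unexpanded until the job actually runs.
cat > fetch.sh <<'EOF'
#!/bin/sh
lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta "file:$PWD/lt2wk.dta"
ls -l lt2wk.dta
EOF

echo "Wrote DataJob.jdl and fetch.sh"
```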

29 Conclusions
In London you have
–Around 2500 CPUs
–94 TB
–All available through the ltwo VO
To find out more about using the Grid
–http://www.gridpp.ac.uk/deployment/users/
–Get registered with the ltwo VO.
See the GANGA talk for higher-level tools that submit jobs without having to write JDL.

30 Thanks to all of the team: M. Aggarwal, D. Colling, A. Chamberlin, S. George, K. Georgiou, M. Green, W. Hay, P. Hobson, P. Kyberd, A. Martin, G. Mazza, D. Rand, G. Rybkine, G. Sciacca, K. Septhon, B. Waugh

31 BACKUP

32 Listing the SE. Removing files
–lcg-infosites --vo ltwo se
Don't forget to remove your files
–lcg-del

33 RLS: remembers file locations

34 VOMS: Virtual Organization Membership Service
–Provides information on the user's relationship with her Virtual Organization: her groups, roles and capabilities.
–Provides the list of users for a given VO.

35 GridLoad
Tool to monitor the sites:
–Updates every 5 minutes
–Uses the RTM data and stores it in rrd files
–Shows the number of jobs in any state
–VO view: stacks the jobs by VO
–CE view: stacks the jobs by CE
https://gfe03.hep.ph.ic.ac.uk:4175/cgi-bin/load
Still a prototype. Planned:
–Add views by GOC and ROC
–Error checking
–Add usage (running CPU / total CPU)
–Improve look and feel
Could interface with Nagios for raising alarms (high abort rate).

36 GridLoad: what can it be used for?
(Plot of the number of aborted jobs, annotated "Home dir full" and "Problem solved".)
–Can be used as a single measure of the health of the system
–We can then use Nagios to find out more
–Avoids the "too many alarms" syndrome!
–You can query the CGI to get graphs for your site

37 Extracting the private and public keys
You have to create a .globus directory and extract the keys into it.
–Extract your public key:
    openssl pkcs12 -in cert.p12 -clcerts -nokeys -out usercert.pem
    chmod 644 usercert.pem
–Extract your private key:
    openssl pkcs12 -in cert.p12 -nocerts -out userkey.pem
–Protect it:
    chmod 400 userkey.pem

38 Initialize your proxy
The proxy is a temporary key pair that is signed by your private key. It lets you delegate your credentials to another machine where your job will run.
To create a proxy (which will be a file in the /tmp directory) you need to
–Run voms-proxy-init --voms ltwo
–Type the pass phrase to decrypt your private key
You should see this:
    Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
    Enter GRID pass phrase:
    Creating temporary proxy............................................... Done
    Contacting gm01.hep.ph.ic.ac.uk:15002 [/C=UK/O=eScience/OU=Imperial/L=Physics/CN=host/gm01.hep.ph.ic.ac.uk/emailAddress=o.van-der-aa@imperial.ac.uk] "ltwo" Done
    Creating proxy............................................ Done
    Your proxy is valid until Tue Dec 6 23:45:14 2005

39 Preparing for submitting jobs
–A simple job program is available in /tmp/Lecture.tar.gz. Copy it to your home directory and untar it.
–To submit a job you need to create a file that contains your requirements: the so-called JDL (Job Description Language) file.
–We will submit jobs as members of the London Tier 2 VO (LTWO), so we need to specify that they run on sites that support it. For the moment the only site that supports it is the Imperial College HEP site.

40 Submit the job
–edg-job-submit --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl
–Or: runjob.sh hello.jdl
The configuration files (gridpp_...) tell the command to use the Imperial Resource Broker, since it is the only one that knows about the ltwo VO.
    *********************************************************************************************
    JOB SUBMIT OUTCOME
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status.
    Your job identifier (edg_jobId) is:
    - https://gfe01.hep.ph.ic.ac.uk:9000/kvexAiToJyvcBvxdMBTdoA
    *********************************************************************************************
The https:// line is your job ID.

41 Check the status of your job
–edg-job-status [Your job ID] will get the status of your job

42 Managing large files
–To transfer large files you should not use the input and output sandboxes. They are limited to 9 MB.
–File replication should be used instead.
–The LTWO VO does not have a catalogue to register the files, so I will describe what can be done without one.

43 globus-url-copy
You can copy a file to our SE using the globus-url-copy command:
–globus-url-copy file:////myfile gsiftp://gw38.hep.ph.ic.ac.uk/stage2/lcg2-data/ltwo/myfilename
But this does not use the catalogue, so you have to keep track yourself of where your file really is.

44 Hello.jdl and finding matching resources
In the Lecture directory
–See file Hello.jdl
    Executable = "/bin/hostname";
    #Arguments = "none";
    StdOutput = "std.out";
    StdError = "std.err";
    OutputSandbox = {"std.out", "std.err"};
–Executable: name of the executable
–OutputSandbox: files you want to retrieve
Check which resources match your requirements
–edg-job-list-match --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl

45 Exercise
–Find out what the GridCR program does.
–Submit 5 jobs. The output of the GridCR program should be stored on the classic SE.
–Using your job's standard output, retrieve the files that have been generated.

46 Check the validity of your proxy
voms-proxy-info will tell you how many hours your delegation is valid.
    subject : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)/CN=proxy
    issuer : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
    identity : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
    type : proxy
    strength : 512 bits
    path : /tmp/x509up_u37227
    timeleft : 11:58:43
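
A script can read the `timeleft` line to decide whether the proxy needs renewing before a long submission. The sketch below parses the output format shown on the slide. The helper name `proxy_hours_left` and the two-hour threshold are assumptions for the example, and the sample text stands in for a real `voms-proxy-info` call.

```shell
#!/bin/sh
# Read voms-proxy-info style output on stdin and print the whole
# hours remaining, taken from the "timeleft : HH:MM:SS" line.
proxy_hours_left() {
    awk '$1 == "timeleft" { split($3, t, ":"); print t[1] + 0 }'
}

# Two lines copied from the slide, standing in for live output.
sample='path : /tmp/x509up_u37227
timeleft : 11:58:43'

hours=$(printf '%s\n' "$sample" | proxy_hours_left)

# The 2 hour threshold is an arbitrary choice for this example.
if [ "$hours" -lt 2 ]; then
    echo "Proxy nearly expired: run voms-proxy-init --voms ltwo"
else
    echo "Proxy OK: ${hours}h left"
fi
```

On a real UI the sample would be replaced by piping `voms-proxy-info` into `proxy_hours_left`.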

47 Finding which CEs support the ltwo VO
To get a list of CEs that support the ltwo VO, use the lcg-infosites command
–lcg-infosites --vo ltwo ce
    gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
This is the CE of the HEP group.
–If you do lcg-infosites --vo dteam ce you will get a list of the CEs in LCG.

48 lcg-cr, lcg-rep
To register a file in a catalogue and copy it to your beloved SE
–lcg-cr --vo [yourvo] file://`pwd`/ \
    -l lfn: -d yourse
–If you do not give an SE, the local one will be used.
To replicate the same file on a different SE
–lcg-rep --vo [yourvo]

