
How to get started London Tier2 O. van der Aa

16/04/2007 Running the LT2 The UK HEP Grid: GridPP, one Tier 1 and four Tier 2s
–ScotGrid: Durham, Edinburgh, Glasgow
–NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield
–SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD
–London Tier2: Brunel, Imperial, QMUL, RHUL, UCL

16/04/2007 Running the LT2 Imperial College, spread across two sites
–Physics Department: 465 KSI2K (dual-core Intel Woodcrest) running sge 6; 60 TB running dCache
–Computing Department: 177 KSI2K (Opterons) running sge 6; storage uses the Physics Department's
All running RHEL4 and RHEL3 and using the LCG tarball. Local physicists: CMS/LHCb/DZero

16/04/2007 Running the LT2 324 KSI2K across two clusters. Two CEs running pbs/maui. 6.5 TB of storage running DPM. Complex situation with respect to networking: the Grid is in a demilitarized zone with a 200 Mb/s maximum. Local physicists are mainly from CMS.

16/04/2007 Running the LT2 Biggest cluster in London. Mixture of Athlons, Xeons and Opterons, 1200 KSI2K in total, running a separate pbs/maui. Cluster shared by Astronomy/HEP/Material Sciences. Storage: 18 TB running poolfs and DPM. Expect to use worker-node local disk with Lustre: 400 TB. Local community is Atlas oriented.

16/04/2007 Running the LT2 Separate pbs/maui from the CE. 160 KSI2K. 8 TB running DPM. ATLAS/ILC community. Running slc3. Will soon buy 265 KSI2K and 136 TB, to come around April.

16/04/2007 Running the LT2 Situation similar to Imperial's:
–Physics department: 24 KSI2K and ~1 TB
–Computing department: shared cluster with 50 KSI2K; 1.5 TB running DPM
Running centos3, sge

16/04/2007 Running the LT2 Resource Summary CPU: 2.5 MSI2K Storage: 94 TB

16/04/2007 Running the LT2 How are the resources used? Currently around 70% of the capacity is used.

16/04/2007 Running the LT2 How to contact us Our mailing list: –The coordinator: –The T2 manager: Via GGUS: –Specify UKI-LT2 and your university in the subject field –Use it for any specific problem once you are set up Our wiki: –Describes the infrastructure –Gives links to monitoring pages

16/04/2007 Running the LT2 How to start? "The Three Steps" … 1. Register for a certificate (as explained in the NGS talk). 2. With your certificate, register with the ltwo virtual organisation. 3. Get access to a user interface: ask via the lt2-technical mailing list. Each university in the LT2 has a user interface.

16/04/2007 Running the LT2 Summary, main Grid components The User Interface (UI) is where the user sits to submit their jobs. The Virtual Organisation Membership Service (VOMS) is involved in authenticating and authorizing users. The Information System (IS) publishes the individual site information (CE queue names, SE contact points, #waiting jobs, #running jobs etc). The Workload Management System (WMS) takes the user's job, finds a compatible site and submits the job to the site's CE. The Computing Element (CE) is the entry point for jobs into the computing cluster. The Storage Element (SE) is the equivalent of the CE, but for data.

16/04/2007 Running the LT2 The Main Grid Components [diagram of the main Grid components, including the WMS and VOMS]

16/04/2007 Running the LT2 Information System Tree structure showing all available resources in the Grid. –Implemented in the form of an LDAP server –Top-level view at lcg-bdii.gridpp.ac.uk, port 2170 –Interesting to have a look: use the JXplorer LDAP browser
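Since the top-level BDII is an ordinary LDAP server, you can also query it from the command line. A minimal sketch, assuming the standard OpenLDAP client and the o=grid base DN and Glue 1.x schema used by LCG BDIIs:
  # List the unique IDs of all CEs published by the top-level BDII
  ldapsearch -x -H ldap://lcg-bdii.gridpp.ac.uk:2170 -b "o=grid" \
    '(objectClass=GlueCE)' GlueCEUniqueID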

16/04/2007 Running the LT2 Submitting your first job Get a login on a user interface –In this case gfe03.hep.ph.ic.ac.uk Initialize your proxy –voms-proxy-init --voms ltwo Prepare your JDL (Job Description Language) file, specifying: –The name of the executable –The files you want to transfer before the job starts –Your constraints, for example: how much CPU time you need, which subset of resources you want to use

16/04/2007 Running the LT2 The files
Hello.jdl:
  Executable = "/bin/sh";
  Arguments = "Hello.sh";
  StdOutput = "std.out";
  StdError = "std.err";
  InputSandbox = {"Hello.sh"};
  OutputSandbox = {"std.out", "std.err"};
Hello.sh:
  #!/bin/sh
  echo 'Hello LT2 Workshop'
  whoami
  hostname

16/04/2007 Running the LT2 Submitting Finding matching resources
–edg-job-list-match Hello.jdl
***************************************************************************
COMPUTING ELEMENT IDs LIST
The following CE(s) matching your job requirements have been found:
*CEId*
ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-30min
ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr
ce1.pp.rhul.ac.uk:2119/jobmanager-pbs-ltwogrid
dgc-grid-35.brunel.ac.uk:2119/jobmanager-lcgpbs-short
gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr
mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr
gw-2.ccc.ucl.ac.uk:2119/jobmanager-sge-default
***************************************************************************

16/04/2007 Running the LT2 Submitting The actual submission
–edg-job-submit Hello.jdl
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
-
*********************************************************************************************
This is your job identifier; you need to keep track of it.
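Rather than copy-pasting identifiers, you can let the commands manage them in a file. A sketch, assuming the -o/-i options of the edg-job-* commands behave as in the standard EDG user guide (the file name myjobs.txt is just an example):
  # Append the new job identifier to myjobs.txt on submission
  edg-job-submit -o myjobs.txt Hello.jdl
  # Later, query the status of every job recorded in that file
  edg-job-status -i myjobs.txt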

16/04/2007 Running the LT2 Checking the state of your job edg-job-status [your job id]
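If you want to wait for completion from a script, you can poll the status. A rough sketch, assuming the state name (Done, Aborted, Cancelled) appears verbatim in the edg-job-status output and that $JOBID (a hypothetical variable) holds the identifier returned at submission:
  # Poll every minute until the job reaches a terminal state
  while ! edg-job-status "$JOBID" | grep -qE 'Done|Aborted|Cancelled'; do
    sleep 60
  done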

16/04/2007 Running the LT2 Getting the result edg-job-get-output [jobid]
–Will store your OutputSandbox files (std.out, std.err) in /tmp/
–Content of std.out:
Hello LT2 Workshop
lt2-ltwo007
mars092.mars.lesc.doc.ic.ac.uk

16/04/2007 Running the LT2 JDL: more complex requirements Specify a CE in a domain –Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID); Require some CPU time (min) –Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID) && (other.GlueCEPolicyMaxCPUTime > 600); Require some CPU*KSI2K time –Requirements = (other.GlueCEPolicyMaxCPUTime > 30 * 500 / other.GlueHostBenchmarkSI00); More on how to master JDL at
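Putting the pieces together, a complete JDL for the Hello job that also constrains where it runs might look like the sketch below; the domain and the 600-minute cut are simply the examples from above, not recommendations:
  Executable = "/bin/sh";
  Arguments = "Hello.sh";
  StdOutput = "std.out";
  StdError = "std.err";
  InputSandbox = {"Hello.sh"};
  OutputSandbox = {"std.out", "std.err"};
  Requirements = RegExp(".*mars.lesc.doc.ic.ac.uk.*$", other.GlueCEUniqueID)
              && (other.GlueCEPolicyMaxCPUTime > 600);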

16/04/2007 Running the LT2 Data Management In the previous example –All files are transferred via the sandbox –The sandbox is limited to 100 MB Clearly something additional is required to transfer bigger datasets: the data management tools, lcg-utils and gfal

16/04/2007 Running the LT2 Catalogue services A "file" is identified by a GUID. Several aliases (LFNs) can be attached to the GUID. One "file" can be located at several places (PFNs).

16/04/2007 Running the LT2 Uploading a file to a storage element (SE) Finding the list of SEs –lcg-infosites --vo dteam se –If you don't specify an SE, the one closest to the cluster will be used Uploading –lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta –Returns: guid:ec362b1a-6f a72b-68d4ad55eb59

16/04/2007 Running the LT2 GUID? Remembering a GUID is not human friendly, so you can give an alias (LFN) to a GUID: –lcg-aa --vo dteam guid:ec362b1a-6f a72b-68d4ad55eb59 lfn:/grid/home/lt2wk.dta You can also give the alias when registering the file: –lcg-cr --vo dteam -d gfe02.hep.ph.ic.ac.uk file:myfile.dta -l lfn:/grid/dteam/lt2wk.dta

16/04/2007 Running the LT2 More on moving files Copying files back to your UI –lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta file:`pwd`/myfile.dta Replicating files somewhere else –lcg-rep -d se1.pp.rhul.ac.uk --vo dteam lfn:/grid/dteam/lt2wk.dta
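One simple way to convince yourself the copy worked is to fetch the file back under a new name and compare checksums. A sketch, reusing the dteam LFN from above (copy.dta is just an example name):
  # Copy the registered file back and check its integrity
  lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta file:`pwd`/copy.dta
  md5sum myfile.dta copy.dta   # the two checksums should match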

16/04/2007 Running the LT2 Listing files Listing replicas:
–lcg-lr --vo [yourvo] lfn:
List the GUID:
–lcg-lg --vo [yourvo] lfn:
Example
–lcg-lr --vo dteam lfn:/grid/dteam/lt2wk.dta
srm://gfe02.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/dteam/generated/ /filec6b6fba2-c854-4ee6-a0db-68bd6cd6e0dd
srm://se1.pp.rhul.ac.uk/dpm/pp.rhul.ac.uk/home/dteam/generated/ /file5642a5ea-b63f-411a-b56c-84a75137d716

16/04/2007 Running the LT2 Sending your job where your files are In your JDL: –InputData = {"lfn:/grid/dteam/lt2wk.dta"}; –DataAccessProtocol = {"file", "srm", "gridftp"}; Then you have to use the lcg- commands to copy the files. Alternatively, you can link against the gfal library and stream the data (man gfal).
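For instance, the job wrapper script could stage the registered file onto the worker node before processing it. A sketch, reusing the dteam LFN declared in InputData above (in a real LT2 job you would use your own VO, and the job's delegated proxy must still be valid):
  #!/bin/sh
  # Stage the input file onto the worker node's local disk
  lcg-cp --vo dteam lfn:/grid/dteam/lt2wk.dta file:$PWD/lt2wk.dta
  # ... process lt2wk.dta here ...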

16/04/2007 Running the LT2 Conclusions In London you have –Around 2500 CPUs –94 TB –All available through the ltwo VO To get more on how to use them – –Get registered with the ltwo VO. See the GANGA talk for higher-level tools to submit jobs without having to write JDL.

16/04/2007 Running the LT2 Thanks to all of the Team M. Aggarwal, D. Colling, A. Chamberlin, S. George, K. Georgiou, M. Green, W. Hay, P. Hobson, P. Kyberd, A. Martin, G. Mazza, D. Rand, G. Rybkine, G. Sciacca, K. Septhon, B. Waugh, LT 2

BACKUP

16/04/2007 Running the LT2 Listing the SEs. Removing files lcg-infosites --vo ltwo se Don't forget to remove your files –lcg-del
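The slide leaves the lcg-del arguments out; for reference, a hedged example in the style of the earlier lcg- commands, assuming the -a option removes all replicas and the catalogue entry as described in the standard lcg-utils documentation:
  # Remove every replica of the file and its catalogue entry
  lcg-del --vo dteam -a lfn:/grid/dteam/lt2wk.dta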

16/04/2007 Running the LT2 RLS remembers file locations

16/04/2007 Running the LT2 VOMS: Virtual Organization Membership Service. Provides information on the user's relationship with her Virtual Organization: her groups, roles and capabilities. Provides the list of users for a given VO

16/04/2007 Running the LT2 GridLoad A tool to monitor the sites: –Updates every 5 minutes –Uses the RTM data and stores it in rrd files Shows the number of jobs in any state –VO view: stacks the jobs by VO –CE view: stacks the jobs by CE Still a prototype. Will add: views by GOC and ROC; error checking; usage (running CPUs / total CPUs); improved look and feel. Could interface with NAGIOS for raising alarms (high abort rate)

16/04/2007 Running the LT2 GridLoad What can it be used for? [graph: #aborted jobs rising while a home dir was full, falling once the problem was solved] Can be used as a single measure of the health of the system; we can then use Nagios to find out more. Avoid the "too many alarms" syndrome! You can query the cgi to get graphs for your site.

16/04/2007 Running the LT2 Extracting the private and public keys You have to create a .globus directory and extract the keys into it.
–Extract your public key:
openssl pkcs12 -in cert.p12 -clcerts -nokeys -out usercert.pem
chmod 644 usercert.pem
–Extract your private key:
openssl pkcs12 -in cert.p12 -nocerts -out userkey.pem
Protect it: chmod 400 userkey.pem

16/04/2007 Running the LT2 Initialize your proxy The proxy is a temporary key pair signed by your private key. It allows you to delegate your credentials to another machine where your job will run. To create a proxy (which will be a file in the /tmp directory) you need to:
–voms-proxy-init --voms ltwo
–Type the passphrase to decrypt your private key
You should see this:
Your identity: /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)
Enter GRID pass phrase:
Creating temporary proxy Done
Contacting gm01.hep.ph.ic.ac.uk:15002 [/C=UK/O=eScience/OU=Imperial/L=Physics/CN=host/gm01.hep.ph.ic.ac.uk/ Address=o.van "ltwo" Done
Creating proxy Done
Your proxy is valid until Tue Dec 6 23:45:

16/04/2007 Running the LT2 Preparing for submitting jobs A simple job program is made available in /tmp/Lecture.tar.gz; copy it to your home dir and untar it. To submit a job you need to create a file that contains your requirements: the so-called JDL (Job Description Language) file. We will submit jobs as members of the London Tier 2 VO (LTWO), so we need to specify that we run on sites that support it. For the moment the site that supports it is the Imperial College HEP site.

16/04/2007 Running the LT2 Submit the job
edg-job-submit --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl
Or: runjob.sh hello.jdl
The configuration files (gridpp_...) tell the command to use the Imperial Resource Broker, since it is the only one that knows about the ltwo VO.
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
-
*********************************************************************************************
This is your Job ID

16/04/2007 Running the LT2 Check the status of your job edg-job-status [Your job ID] will get the status of your job

16/04/2007 Running the LT2 Managing large files To transfer large files you should not use the input and output sandboxes; they are limited to 9 MB. File replication should be used instead. The LTWO VO does not have a catalogue to register the files, so I will describe what can be done.

16/04/2007 Running the LT2 globus-url-copy You can copy a file to our SE using the globus-url-copy command:
globus-url-copy file:////myfile gsiftp://gw38.hep.ph.ic.ac.uk/stage2/lcg2-data/ltwo/myfilename
But this does not use the catalogue, so you cannot avoid knowing where your file really is.

16/04/2007 Running the LT2 Hello.jdl and finding matching resources In the Lecture directory, see the file Hello.jdl:
  Executable = "/bin/hostname";   (name of the executable)
  #Arguments = "none";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};   (files you want to retrieve)
Check which resources match your requirements:
edg-job-list-match --config-vo gridpp_wl_vo_ltwo.conf --config gridpp_wl_cmd_var.conf hello.jdl

16/04/2007 Running the LT2 Exercise Find out what the GridCR program does. Submit 5 jobs; the output of the GridCR program should be stored on the classic SE. Using your job's standard output, retrieve the files that have been generated.
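A sketch of the submission loop for the exercise; the JDL name gridcr.jdl is hypothetical, and -o is assumed to append each new identifier to the file as described earlier:
  # Submit five instances of the GridCR job, collecting the ids in one file
  for i in 1 2 3 4 5; do
    edg-job-submit --config-vo gridpp_wl_vo_ltwo.conf \
                   --config gridpp_wl_cmd_var.conf -o gridcr-jobs.txt gridcr.jdl
  done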

16/04/2007 Running the LT2 Check the validity of your proxy voms-proxy-info will tell you how many hours your delegation is valid. subject : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1)/CN=proxy issuer : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1) identity : /C=UK/O=eScience/OU=Imperial/L=Physics/CN=olivier van der aa (vo1) type : proxy strength : 512 bits path : /tmp/x509up_u37227 timeleft : 11:58:43

16/04/2007 Running the LT2 Finding which CEs support the ltwo VO To get a list of CEs that support the ltwo VO you use the lcg-infosites command:
–lcg-infosites --vo ltwo ce
gw39.hep.ph.ic.ac.uk:2119/jobmanager-lcgpbs-ltwo
This is the CE of the HEP group.
–If you do lcg-infosites --vo dteam ce you will get a list of CEs in LCG.

16/04/2007 Running the LT2 lcg-cr, lcg-rep To register a file in a catalogue and copy it to your beloved SE:
lcg-cr --vo [yourvo] file://`pwd`/ \
  -l lfn: -d yourse
If you do not give an SE, the local one will be used.
To replicate the same file on a different SE:
–lcg-rep --vo [yourvo]