UK Testbed Report GridPP 9 Steve Traylen
Topics
Current test beds.
Resources at GridPP sites. (Corrections welcome)
–ScotGrid
–LondonGrid
–NorthGrid
–SouthGrid
–Tier1/A
Future test beds and grid facilities.
EDG
No big changes within EDG 2.1.X, only bug fixes and security updates. Released last November.
19 sites in total, with 8 from the UK.
Software freeze: 4th February. New-sites freeze: 9th February.
EDG review on 19th–20th February. EDG will continue as is through March.
OS security policy based on Fedora Legacy.
LCG 1
Released in October; since then only a few security updates.
In limited use by experiments, but the deployment procedure is now well established, as are plans for the next data challenge.
Three UK sites out of 28 in total.
No support for managed or mass storage access.
Security updates provided by the CERN Linux group.
How to join LCG
The procedure is quite formal but appears to work.
–A How2Start document exists.
A questionnaire must be filled in stating your intentions:
–Number of WNs.
–Storage capacity.
–VOs you plan to support.
All install support for LCG is via your primary LCG site; for the UK this is RAL. The existing tb-support list can also be used.
LCG2 will be similar, although the GOC will collect this information via a GridSite-enabled web form.
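The questionnaire amounts to a short site description. A hypothetical sketch of the kind of answers a joining site supplies, written as shell variables purely for illustration (the site name, figures and VO list below are assumptions, not from the talk):

```shell
# Hypothetical LCG joining-questionnaire answers; none of these
# values are real, they only illustrate the information requested.
SITE_NAME="UKI-EXAMPLE"          # assumed site name
WORKER_NODES=16                  # number of WNs offered
STORAGE_CAPACITY_TB=2            # SE disk capacity, in TB
SUPPORTED_VOS="atlas lhcb babar" # VOs the site plans to support
```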
ScotGrid (Glasgow)
Previously within EDG; EDG2 being installed.
Various WP2 test machines.
New CE, SE and UI being arranged. Will join LCG2 with these and new resources.
6 IBM x335s and 29 blades, funded by eDIKT for bioinformatics, are being added.
A CDF-SAM/JIM front end exists. Possible UK eScience front end run by the NeSC Hub.
Fraser Speirs is now the technical coordinator for ScotGrid.
ScotGrid (Durham)
EDG installed. Ganglia to be added.
New GridPP hardware will be installed in EDG/LCG.
Extra front ends will be made available from ScotGrid.
NorthGrid (Lancaster)
Predominantly SAMGrid.
EDG CE, SE, MON and WNs.
A six-figure sum of hardware arrives at the end of the year. This shiny farm will appear within LCG and NorthGrid.
Ganglia is in use and liked for debugging and ease of install.
EDG 2.1.8
NorthGrid (Sheffield)
Most recent full member of the EDG testbed. More WNs will now be added.
Grid jobs could be fed to the existing 68-CPU farm. This farm will continue to grow.
Ganglia is fantastic.
Plans exist for a 400-CPU cluster for use by GridPP.
EDG
NorthGrid (Manchester)
VO services for GridPP, Babar and the experiments MICE, CALICE and DESY.
WWW for GridPP. Very active with the Babar grid. Will join LCG1/2.
Alessandra will be the NorthGrid coordinator.
Ganglia in use throughout the department; a GridPP Ganglia view is likely to be created.
A CPU farm is being commissioned this year.
NorthGrid (Liverpool)
940 Dell P4s installed. Around 80 will be committed to LCG2; this figure can vary once actual demand is known.
A bunch of nodes for grid front ends is available.
Plan to take part in the ATLAS and LHCb data challenges as well as the Babar grid.
ScotGrid (Edinburgh)
Joining EDG2 now. Plan to join LCG and/or the Babar grid.
Grid jobs could be fed to a 17-node Babar farm.
Grid access to some of the 150TB of SRIF storage is planned.
Phil Clark will be coordinating ScotGrid activities in Edinburgh.
LondonGrid (UCL)
UCL is a full member of the EDG2 testbed.
An SRIF-funded cluster with 192 CPUs for Grid and eScience projects is being installed now. A large portion will be made available within LCG2.
Will be used for ATLAS DC2.
EDG 2.1.8
LondonGrid (QMUL)
A full member of the EDG2 testbed.
Plan to front-end the SRIF farm; its CPUs are in the tender process now.
Ganglia in use.
Unhappy with ScalablePBS/Maui; will consider Sun Grid Engine v6 once released.
EDG 2.1.8
LondonGrid (Imperial College)
A full member of LCG1 and EDG2. Eagerly awaiting LCG2 for the CMS DC.
Both the CPU and the disk dedicated to LCG and EDG will be increased.
A UI for LCG2--.
Monitoring: the job-tracking map is now an applet.
EDG, LCG
SouthGrid (Bristol)
A VDT-based CE within the Babar grid, fronting an existing 40-WN Babar farm.
There is a GridPP replica catalogue. Ganglia running.
A six-month plan to get Babar SP production running lies ahead.
Lots of work for CMS DC04, such as GMcat.
VDT
SouthGrid (Birmingham)
EDG testbed installed. Possibly integrate with existing farms later.
New on-site hardware expected, not dedicated to UKHEP.
EDG 2.1.8
SouthGrid (RAL PPD)
A full member of EDG2, but will ramp down to join LCG2.
The WP3 testbed also includes 2 R-GMA Nagios nodes.
Ganglia in use; Nagios being considered.
CPUs and 5TB of disk being arranged.
Supporting SouthGrid installs of EDG and then LCG.
EDG R-GMA, EDG
SouthGrid (Oxford)
A full member of the EDG testbed.
The Oxford eScience centre has a Condor and JISC infrastructure testbed.
Ganglia in use elsewhere, though not yet on the EDG cluster.
Hardware being sourced now for the 2004 data challenges.
All resources will appear within SouthGrid.
EDG
SouthGrid (Cambridge)
20 nodes in total, including one for testing and a NM.
Plan to feed jobs to existing eScience farms.
3TB and 20 more CPUs to be deployed in 2 months' time.
Ganglia in use and liked.
EDG, LCG
RAL Tier1/a (EDG App)
RAL runs EDG core services such as the R-GMA catalogue and RLS for some VOs.
EDG
RAL Tier1/a (EDG and Others)
EDG Dev: CE, 2xSE, MON and RLS.
R-GMA: CE, SE, MON and IC.
EDG-SE: 4 x SEs.
Public access UIs for apptb and devtb.
Gatekeeper into the main production farm for the Babar grid.
Central SRB MCAT server.
RAL Tier1/a (LCG1/2--)
LCG1: UI, CE, SE, BDII, WN and West GIIS.
–Skeleton service to be terminated ASAP.
LCG2: UI, CE, WN, BDII, PROXY, RB and West GIIS.
–Babar and DZero VOs added to the standard LHC VOs.
–R-GMA added to LCG2 as a proof of concept.
–Nagios running across LCG2.
Common Problems/Requests
More reliable or informative monitoring.
Firewall requirements: one site was nearly charged because of the required complexity.
Strange Globus TCP traffic considered illegal by a couple of firewalls and blocked, rightly or wrongly.
Mistakes/omissions in the instructions.
–Too many possible errors in the system to document.
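Much of the firewall complexity comes from Globus opening callback connections on arbitrary ephemeral ports. The standard mitigation is to pin these to a fixed range via the `GLOBUS_TCP_PORT_RANGE` environment variable; a minimal sketch (the 20000–25000 range and the firewall rule are illustrative assumptions, not site policy from the talk):

```shell
# Pin Globus callback ports to a fixed range so the firewall only
# needs to open that range (20000-25000 here is illustrative).
export GLOBUS_TCP_PORT_RANGE=20000,25000
# The matching firewall rule would then be (iptables syntax, shown
# as a comment since applying it requires root):
#   iptables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT
```

This turns "strange Globus TCP" into traffic on a known, documentable range that firewall administrators can reason about.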
Arrival of LCG2
LCG2 is appearing now. With this release it is a good time to commit resources.
–Data challenges are all eager to start.
–The Tier1 will initially move 70 high-end nodes into this.
Installation by LCFG, or manual by-hand instructions exist for installed/shared resources.
Likely to be in place for some time, perhaps for the remainder of this year.
DCache SRM
What is it with respect to the LCG?
–One solution for SRM-fronted disk.
–DCache is a GridFTP server with a very flexible backend.
–An SRM implementation exists.
RAL is packaging, testing and configuring it with LCFG.
Testing has hit a brick wall due to minor, but critical, differences in the SRM implementation:
–The SRM does not create directories automagically.
–As a classic SE, gridftp-mkdir is not implemented.
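Until the implementations converge, the missing automatic directory creation can be worked around by creating the target directory explicitly before each transfer. A sketch, guarded so it only runs where the EDG GridFTP tools are installed (the host and paths are hypothetical, for illustration only):

```shell
# Workaround sketch for dCache not creating directories automatically:
# make the directory explicitly, then copy. Host and paths are made up.
SRC="file:///tmp/output.dat"
DEST_DIR="gsiftp://dcache.example.ac.uk/pnfs/example.ac.uk/data/myvo/run01"
if command -v edg-gridftp-mkdir >/dev/null 2>&1; then
    edg-gridftp-mkdir "$DEST_DIR"                  # explicit mkdir first
    globus-url-copy "$SRC" "$DEST_DIR/output.dat"  # then the transfer
fi
```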
Plans for the EDG testbed
Lots of speculation. After the review the software may move closer to LCG2++.
Officially the EDG testbed will become a development/demonstration testbed for EGEE.
From early April people inside EGEE will coordinate and run this resource.
This may become the SA1 testbed; it will be the same people running it.
A partial Quattor install of EDG is also being worked on.
Testbeds for EGEE
JRA1, Middleware Engineering and Integration.
–Similar role to the EDG dev testbed, for EGEE middleware.
–Rather more controlled and closed.
–Resources at CERN, NIKHEF and RAL, plus five testers at CERN.
–Release frequency up to 1/week.
SA1, EU Grid Operations, Support and Management.
–Testing ground for applications with EGEE middleware.
–Release frequency of around 3 months.
WP3 testbed.
–Will continue into the future for EGEE in some form.
Conclusions
GridPP sites still make up the most significant portion of EDG.
EDG will become smaller and somehow migrate to EGEE. As it stands it is probably too large for the early stages of EGEE.
Running EDG is very good practice for LCG.
Significant resources are wanted and are best suited to LCG.
Tier2s appear to be working closely on the same things.