
Ian Bird, LCG Project Leader – Project status report, WLCG LHCC Referees' meeting, 16th February 2010.


1 Ian Bird, LCG Project Leader – Project status report – WLCG LHCC Referees' meeting, 16th February 2010

2 Ian.Bird@cern.ch Agenda today

3 Project status report
- Overall status
- Resources status (installation, pledges, use)
- Planning and milestones
- Status of planning for new Tier 0
- Brief summary of situation with EGI/EMI etc. proposals
- Resource planning for 2010, 2011, 2012
  - Understanding of baseline schedule and conditions used by experiments for planning
  - Initial detailed requirements – presented by each experiment this afternoon

4 Worldwide LCG Organisation (organigram)
- Overview Board – OB
- Collaboration Board – CB: experiments and regional centres
- LHC Committee – LHCC: scientific review
- Computing Resources Review Board – C-RRB: funding agencies; Resource Scrutiny Group – C-RSG
- Management Board: management of the project; EGEE, OSG representation
- Architects Forum: coordination of common applications
- Grid Deployment Board: coordination of grid operations
- Activity areas: Physics Applications Software; Service & Support; Grid Deployment; Computing Fabric

5 OVERALL STATUS

6 WLCG Collaboration status (world map)
- Tier 0; 11 Tier 1s; 64 Tier 2 federations (124 Tier 2 sites)
- Tier 0: CERN. Tier 1s: Amsterdam/NIKHEF-SARA, Barcelona/PIC, Bologna/CNAF, Ca-TRIUMF, De-FZK, Lyon/CCIN2P3, NDGF, Taipei/ASGC, UK-RAL, US-BNL, US-FNAL
- Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep., Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.

7 Overall summary
- November: ongoing productions; cosmics data taking
- November – December: beam data and collisions; productions + analysis
- December – February: ongoing productions; cosmics
- WLCG service has been running according to the defined procedures
- Reporting and follow-up of problems at the same level
- Middleware process – updates & patches – as planned

8 2009 physics data transfers
- Periods shown: final readiness test (STEP'09); preparation for LHC startup; LHC physics data
- Nearly 1 petabyte/week
- More than 8 GB/s peak transfers from Castor fileservers at CERN
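As a sanity check, the sustained rate implied by "nearly 1 petabyte/week" can be compared against the quoted 8 GB/s peak. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes):

```python
PB = 1e15          # bytes in a decimal petabyte (assumption)
WEEK = 7 * 86_400  # seconds in a week

# Sustained rate implied by moving ~1 PB per week:
avg_gb_per_s = PB / WEEK / 1e9
print(f"average: {avg_gb_per_s:.2f} GB/s")  # ~1.65 GB/s

# The quoted 8 GB/s peak is therefore roughly 5x the weekly average:
print(f"peak/average: {8 / avg_gb_per_s:.1f}")
```

So the peaks from the Castor fileservers run several times higher than the average weekly throughput, which is typical for burst-driven physics data transfers.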

9 Reliabilities
- This is not the full picture:
  - Experiment-specific measures give a complementary view
  - They need to be used together with some understanding of the underlying issues

10 Network & Security
- LHCOPN
  - The OPN group has so far been concerned only with T0–T1 links
  - Proposed that the OPN group also coordinate the requirements for T1–T1 links, and that a group be formed to work on data flows to/from T2s as well
  - Approved by the CB and MB; representatives of experiments and non-T1 sites are being sought
- Security patching
  - Concern over delays in installing urgent OS security patches at many sites last summer
  - Many sites patched only when EGEE threatened them with suspension in October
  - A subsequent incident later in the quarter was controlled more quickly by issuing the suspension threat sooner; a new EGEE policy was adopted of suspending sites that do not patch within seven days of the Security Officer requiring it
  - Lack of such basic patching represents a risk of serious disruption

11 Support for analysis
- Prototyped in ATLAS, using suggestions/ideas from CMS and others
- Discussions with other experiments

12 RESOURCE STATUS

13 2008

14 Issues with 2009 pledges
- CPU was in general OK; many delays with disk:
  - CERN: disk servers had hardware problems
  - ASGC: 2009 pledges were delayed until December (following the fire etc.)
  - BNL: disk available as agreed with ATLAS
  - CNAF: as announced to the RRB, no tenders done in 2009
  - NL: 2009 pledges were late following a machine-room delay – catch up with 2010
  - PIC: delayed waiting for 2 TB disks
  - RAL: 50% of disk had hardware problems
  - TRIUMF: most was installed by end of year

15 2010 Pledges. NB: many footnotes associated with this information!!

16 2010 pledges status
- Agreement was that for 2010 the availability deadline is June 1
- Sites have also discussed desired installation profiles with experiments
- CA-TRIUMF: CPU purchased already, disk ongoing
- DE-KIT: expect to be on schedule
- FR-IN2P3: disk tender done, CPU planned Q1 – BUT budget reduction!
- NL-T1: 2010 pledges already purchased; disk available in Q1
- IT-CNAF: CPU Q1, disk Q2 (will cover 2009 + 2010 pledges)
- NDGF: anticipated OK for June – BUT budget reduction in Sweden (TBC)?
- ES-PIC: disk already purchased, CPU in Q1
- TW-ASGC: hope for June; procurement had not started
- UK-RAL: disk Jan/Feb, CPU April
- US-BNL: CPU by June; disk in Jan and Sep as agreed with ATLAS
- US-FNAL: 1/3 available now, rest by Q2
- CERN: delivered, commissioned by March/April

17 Installed capacity
- Automation of the collection of installed capacity is desired and promised
- But the agreement is fairly complex
  - A document describing the information to be collected – physical vs. logical CPU, how to benchmark, how to publish information, etc. – is available and agreed
- Information is being published
- Tools and tests exist to verify that the information is sensible
- But validation of the actual data takes a long time

18 PLANNING & MILESTONES

19 WLCG timeline 2010–2012 (Gantt chart, Jan 2010 – Feb 2012)
- Alternating shutdown (SU), pp running and heavy-ion (HI) periods in 2010 and 2011
- 2010 capacity commissioned; 2011 capacity commissioned
- EGEE-III ends; EGI & NGIs; EGI HEP-SSC; EMI (SA3)

20
- Now a full report each month
- glexec + SCAS services available; deployment discussion / policy ongoing
- Not all sites yet publishing; information validation in progress

21

22 Future milestones
- Actually very few formal milestones now
  - Moved from set-up to regular operations
- Not all problems are solved – and more will certainly arise
  - These can be subject to specific milestones
- However, in general we must move from tracking milestones to tracking metrics for:
  - Performance
  - Reliability
  - Scalability
- Today we have some, but we need to propose a set of useful metrics that we track
  - Accounting, reliability/availability and throughputs are published on-line
  - Operational metrics reviewed weekly
  - A lot of information in different places (SLS, dashboards, etc.)
  - Others?

23 STATUS OF PLANS FOR TIER 0

24 Tier 0 status
- Computer centre infrastructure
  - Last year the decision on construction of a new computer centre in Prévessin was suspended pending a clearer view of:
    - Costs for a container-based solution
    - Costs for hosting of services outside of CERN
    - Long-term computing requirements
  - Some of these are now clearer...
- In addition:
  - Gains in power efficiency in newer machines + aggressive replacement of older machines
  - Delay in the accelerator schedule
  - These have meant we have gained ~1–2 years before a new facility is needed; BUT the issue of critical (i.e. backed-up) power is pressing

25 Tier-0 strategy
- Consolidate critical power
  - Hosting: contract for 100 kW in Geneva – available within a few months
  - Consolidate existing building – will expand critical power to ~600 kW
  - Container option is not a quick fix – requires a service building, which requires planning permission (Prévessin)
- Absorb physics load
  - Incremental: containers
  - Longer term to be decided ~2010:
    - Norway/Finland/Switzerland
    - CC building on-site
    - Additional containers
- Risks
  - Remote operations
  - Costs – containers and commercial hosting are not cheap; yearly costs equivalent to what we anticipated paying for a new CC on site
  - Legal issues

26 SITUATION FOR EGI

27 EGI Council (J. Knobloch/CERN: European Grid Initiative – EGI)
- 22 Oct 2009: 33 NGIs + 2 EIROs (CERN, EMBL) + observers (Armenia, Belarus, Georgia, Kazakhstan, Moldova, Russia, Ukraine)
- 3 Feb 2010: Council met and agreed the statutes
- 18 posts have now been advertised to fill the EGI.eu management team
- Hope to have a basic team in place by May

28 NGIs in Europe (map; www.eu-egi.eu)

29 Status of project submissions
There were three different (sub-)calls:
1) EGI itself (project named EGI-Inspire); includes an activity (SA3) specifically focussed on support for existing large communities
   - This project was invited to a hearing
2) Middleware (project named EMI); includes support for all gLite software required by WLCG (FTS, LFC, dCache, etc.)
   - This project was invited to a hearing
3) Virtual Research Communities (ex-SSC); there were several EGEE-derived proposals, including one (ROSCOE) that contained a VRC for HEP
   - None of these has been invited to a hearing – the expectation is that they will NOT be funded
   - This raises many questions (e.g. is this a change in strategy?), but formal feedback is not expected before March
- Not yet known whether EGI-Inspire and EMI will be fully funded, and if not, where any cuts will be required
- Project funding expected to start only in June (may be backdated to May)

30 EGEE–EGI: risk for WLCG?
- This situation does not represent a major risk for WLCG
  - The EGEE → EGI transition is well planned by EGEE, and is well advanced
  - Countries representing the majority of the resources have NGIs, and the Tier 1s are well placed
  - Important operational tools (GGUS, monitoring, etc.) are assured even if project funding does not appear
  - WLCG operational procedures are well tested and are mostly independent of the existence of EGEE or EGI
- The SA3 activity contains dashboards, Ganga, and specific tasks for each experiment (~2 FTE each); the VRC had integration/analysis support; may need to re-focus within the constraints of the project & feedback
- EMI contains essential middleware support and "harmonisation" of gLite/ARC/Unicore (long-term development was not included)
- No funding for a HEP VRC means that work with other application communities will be significantly reduced
- Must wait and see what actually gets funded for operations and middleware; must in any case consider a longer-term strategy for middleware

31 RESOURCE PLANNING
Baseline assumptions used by all experiments for requirements analysis

32 Present understanding of the schedule for both 2010 and 2011 – no break

33 Preliminary schedule 2010/11/12
- 2010 + 2011:
  - Running from mid-February to end of November, with Pb–Pb in November
  - In principle stop after 1 fb^-1; plan to run for 2 years (0.2 fb^-1 in 2010, remainder in 2011)
- 2012: shutdown of the accelerator (but not of computing)
- Guidance on availability for physics is 50%
- Must agree common assumptions for number of days etc.:
  - 182 days pp and 28 days HI in each of 2010 and 2011
  - 7.9 Ms pp + 1.2 Ms HI each year, assuming 50% availability
- Experiments will also take data in March (pp setup) and October (HI setup)
  - Assume 30 + 14 days at 35% availability: 1.3 Ms each year
- Totals: 7.9 Ms pp, 1.2 Ms HI, 1.3 Ms "setup" in each of 2010 and 2011
- NB: for 2010 this is greater than the assumptions used so far for the 2010 requirements. Pledges will not change (and may decrease!) – must take care in what is said to funding agencies
  - Probably we should adjust the 2010 numbers to be not more than the existing requirements
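The megaseconds quoted above follow directly from the agreed day counts and availability factors; a minimal sketch of the arithmetic (only the day counts and availabilities come from the slide, the rest is just days × 86,400 s × availability):

```python
DAY = 86_400  # seconds per day

def live_seconds(days, availability):
    """Effective data-taking seconds for a running period."""
    return days * DAY * availability

pp    = live_seconds(182, 0.50)      # proton-proton at 50% availability
hi    = live_seconds(28, 0.50)       # heavy ion at 50% availability
setup = live_seconds(30 + 14, 0.35)  # pp + HI setup periods at 35%

print(f"pp:    {pp / 1e6:.1f} Ms")     # 7.9 Ms
print(f"HI:    {hi / 1e6:.1f} Ms")     # 1.2 Ms
print(f"setup: {setup / 1e6:.1f} Ms")  # 1.3 Ms
```

Because the totals scale linearly with both days and availability, any change in the assumed 50% availability for physics feeds straight through to the resource requirements.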

34 Process now
- Agree preliminary requirements for 2010, 2011, 2012 today
  - Including final agreement on assumptions and baseline
- Foresee an MB in March with LHCC and RSG representatives present, to finalise and agree what is presented to the RRB in April
- In principle we should say something about 2013 to the RRB
  - Not clear what can be said yet – guidance?
- NB: many sites (160!!) now depend for their planning on the schedule of the accelerator. It is important that we have some clear published schedules...

35 Summary
- Very much ongoing operations according to the agreed procedures
  - So far the service behaves similarly for simulations, cosmics and real data
- Not without problems:
  - Stability and scalability, particularly of storage interfaces, remain a concern; we have to address these this year
  - Some concerns over some Tier 1s; recovery and response to problems could improve
  - Still to see what effect many more non-expert users will have
- Resource planning for the coming years is a concern
  - Must be careful not to overstate the needs... or to underestimate them
- Transition from EGEE to EGI is a concern – must ensure that it is not disruptive
  - But it is not a major risk now


