WLCG Service Report ~~~ WLCG Management Board, 31st March 2009.

Introduction

This report covers the two weeks since the last WLCG MB.

Site | Date | Issue
RAL | 24/3 | 2 power glitches resulted in a major site outage. CASTOR up at 14:30 on 25/3, other services soon after. ATLAS replication had to be restarted due to DB corruption. Some other knock-on effects… Move to new machine room & STEP’09?
CNAF | 27/3 – 03/04 | Scheduled downtime (power supply and air conditioning) due to the interconnection of the existing services to the new infrastructure system. LFC has been replicated in Roma (CHEP).
PIC | 06/04 12h – 08/04 15h | Annual power supply maintenance, scheduled downtime. [Impact on ATLAS ES(?) cloud?]
ASGC | 27/3 (?) | Have now hired a full-time DBA (who?). Still in process of relocating services to IDC. Communication is still an issue here: for example, many days' delay in response to problems with Oracle DB & streaming for ATLAS. This is not compatible with the response times we have discussed here (MB) nor with a reliable service. Commissioning for STEP’09 needs to be understood!

More on ASGC

ASGC T1 and Taiwan Federated T2 services will be collocated at IDC from Mar. 19. All the T1 and T2 services are planned to be up and running before Mar. 23. We hope to make 2,500 cores and 1.3 Petabyte of disk space available next week. The tape library will be available about one week later, with clean tapes added gradually. During the transition, all ASGC T1 and T2 services will be shut down and restarted on the last day (Sunday, Mar. 22).

It will be difficult to bring the tape system online within one week, because cleaning the existing tape system does not fit the proposed schedule. Besides, the tape drives cannot be cleaned up, due to the complex interior components of the LTO3/4 drives. The MES procurement is expected to finish in the middle of next week, while the actual delivery term will extend another 45 days. The vendor has promised to speed up the internal bidding as well as the shipment, but there might still be a delay of another two weeks or more. Local IBM has promised to loan another LTO4 drive, but we might have limited bandwidth if two VOs request migration/staging around the same time. We are still negotiating with the local vendor to see whether there is a chance of expanding to two or three more drives. Migration of the data from the existing cartridges is another concern, as we need to confirm other technical details and procedures before taking any action. Merging/splitting of the TS3500 tape system will add a delay of another 3-4 days, depending on the labor order from the MES case. We hope for your understanding. We also hope to relocate the tape system into the collocation data center area, but we need better coordination due to the limited floor space serving the tape library.

Alessandro - is there any foreseen time when they will be ready? How could ATLAS handle the situation with a T1 down for so long? Trying to move the Australian T2 to a different cloud. Maybe use LFC and FTS in TRIUMF?

GGUS Summaries

Alarm testing scheduled for this week: alarms should be issued and analysis complete well in advance of next week’s F2F meetings!

VO concerned | USER | TEAM | ALARM | TOTAL
ALICE | 1 | 0 | 0 | 1
ATLAS
CMS | 2 | 0 | 0 | 2
LHCb
Totals

VO concerned | USER | TEAM | ALARM | TOTAL
ALICE | 2 | 0 | 0 | 2
ATLAS
CMS | 4 | 0 | 0 | 4
LHCb | 8 | 2 | 0 | 10
Totals
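In these summaries the TOTAL column is the sum of the USER, TEAM and ALARM ticket counts for each VO. As a minimal sketch of that arithmetic, the Python below recomputes the totals for the three rows of the second table whose counts survive in this transcript (ALICE, CMS, LHCb). The function name `ticket_totals` and the dictionary layout are illustrative assumptions, not part of GGUS, and the split of the digit runs into columns is a reading of the transcript.

```python
from typing import Dict, Tuple

# (USER, TEAM, ALARM) ticket counts for one VO.
Counts = Tuple[int, int, int]


def ticket_totals(rows: Dict[str, Counts]) -> Dict[str, int]:
    """Return the TOTAL column: USER + TEAM + ALARM for each VO."""
    return {vo: sum(counts) for vo, counts in rows.items()}


# Rows as read from the second summary table; the ATLAS row is left out
# because its counts are not present in the transcript.
second_table: Dict[str, Counts] = {
    "ALICE": (2, 0, 0),
    "CMS": (4, 0, 0),
    "LHCb": (8, 2, 0),
}

totals = ticket_totals(second_table)
for vo, total in totals.items():
    print(f"{vo}: {total}")
```

The computed values agree with the TOTAL entries shown for those three rows.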

[Figure, slide 5: annotated plot. Labels: VO boxes – voalice03/06 (see slide notes for some discussion); CE & SRM; SRM – chimera etc; CE; CE & SRM; Mainly SRM; CE & SRM; Mainly CE; SRM]

[Figure, slide 6: annotated plot. Labels: CE; CE & SRM; Mainly SRM; Mainly CE; SRM]

Summary

- “Transitory” problems continue, possibly at a higher rate than in recent weeks (as in “here today and gone tomorrow”)
- Masking out scheduled or understood issues leaves a relatively good service view for this period. Is this representative?
- Scale Testing for the Experiment Program, STEP’09: [Only] possible schedule: May/June, to finish by end June!
- “The priority to improve the site readiness is there NOW and not only in May/June when ATLAS and the other VOs are actively scale-testing at the same time”

ATLAS metrics for STEP09

- All T1s must be fully operational: no downtimes of >4 hours are allowed during the STEP09 week
- T2s are supposed to be fully operational; T2s may sign off until June 1
- We will size the tests such that they all fit within 1 week; we will measure what has finished and what has not
- Need to define precise tape metrics (rate, efficiency, losses, ...)
- Need to define goals for production and reconstruction rates and efficiencies
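The T1 criterion above (no downtime longer than 4 hours during the STEP09 week) can be checked mechanically. The Python below is a minimal sketch under stated assumptions: each downtime is judged on its own rather than cumulatively, only the part of a downtime that falls inside the test week counts, and the function name, the week boundaries and the sample outages are all hypothetical, since the slides do not fix the dates.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Threshold taken from the slide: downtimes longer than 4 hours are not allowed.
MAX_DOWNTIME = timedelta(hours=4)

Interval = Tuple[datetime, datetime]  # (start, end) of one downtime


def violates_downtime_limit(
    downtimes: List[Interval],
    week_start: datetime,
    week_end: datetime,
    limit: timedelta = MAX_DOWNTIME,
) -> bool:
    """Return True if any single downtime exceeds `limit` within the test week."""
    for start, end in downtimes:
        # Clip the downtime to the test week; a downtime entirely outside the
        # week gives a negative duration and therefore never exceeds the limit.
        clipped_start = max(start, week_start)
        clipped_end = min(end, week_end)
        if clipped_end - clipped_start > limit:
            return True
    return False


# Hypothetical test week and outages, for illustration only.
week_start = datetime(2009, 6, 1)
week_end = datetime(2009, 6, 8)
short_outage = [(datetime(2009, 6, 2, 9), datetime(2009, 6, 2, 12))]  # 3 hours
long_outage = [(datetime(2009, 6, 3, 9), datetime(2009, 6, 3, 14))]   # 5 hours

print(violates_downtime_limit(short_outage, week_start, week_end))
print(violates_downtime_limit(long_outage, week_start, week_end))
```

Whether the limit should instead apply to the total downtime accumulated over the week is a policy choice the slide leaves open; the sketch would need a running sum in that case.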