
1 LHCb on the Grid
Raja Nandakumar (with contributions from Greig Cowan)
GridPP21, 3rd September 2008

2 LHCb computing model
➟ CERN (Tier-0) is the hub of all activity
  • Full copy at CERN of all raw data and DSTs
  • All T1s have a full copy of the DSTs
➟ Simulation at all possible sites (CERN, T1, T2)
  • LHCb has used about 120 sites on 5 continents so far
➟ Reconstruction, Stripping and Analysis at T0 / T1 sites only
  • Some analysis may be possible at "large" T2 sites in the future
➟ Almost all the computing (except for development / tests) will be run on the grid
  • Large productions: production team
  • Ganga (DIRAC) grid user interface (a minimal job-submission sketch follows)
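For illustration only (not from the talk), a minimal, hedged sketch of what submitting a user job through the Ganga (DIRAC) interface looks like. It assumes an interactive `ganga` session, where Job, DaVinci, LHCbDataset and Dirac are predefined; the application version, options file and LFN are placeholders, and attribute names may differ slightly between Ganga versions of that era.

```python
# Hedged sketch: a user analysis job submitted via Ganga with the DIRAC
# backend.  Typed inside a `ganga` session, so no imports are needed.
# Version, options file and LFN below are illustrative placeholders.
j = Job(name='lhcb-grid-example')
j.application = DaVinci(version='v19r14')        # illustrative DaVinci version
j.application.optsfile = 'MyAnalysis.opts'       # hypothetical job options file
j.inputdata = LHCbDataset(files=['LFN:/lhcb/production/DC06/example.dst'])  # illustrative LFN
j.backend = Dirac()                              # route the job through DIRAC to the Grid
j.submit()                                       # inspect later with jobs(j.id)
```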

3 LHCb on the grid
• Small amount of activity over the past year
  ▓ DIRAC3 has been under development
  ▓ Physics groups have not asked for new productions
  ▓ The situation has changed recently...

4 LHCb on the grid
➟ DIRAC3
  • Nearing a stable production release
    ▓ Extensive experience with CCRC08 and follow-up exercises
    ▓ Used as THE production system for LHCb
  • Interfaces now being tested by the Ganga developers
➟ Generic pilot agent framework
  • Critical problems found with the gLite WMS 3.0, 3.1
    ▓ Mixing of VOMS roles under certain, reasonably common conditions (see the proxy-check sketch below)
      » Cannot have people with different VOMS roles!
    ▓ Savannah bug #39641
    ▓ Being worked on by the developers
  • Waiting for this to be solved before restarting tests
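As an aside, a hedged sketch of how the VOMS attributes (FQANs) of the current proxy can be inspected with the standard gLite command voms-proxy-info. This is not the fix for the WMS bug, only a way to see which role a pilot's proxy actually carries; the role name checked is illustrative.

```python
# Hedged sketch: list the VOMS FQANs of the current grid proxy by calling
# `voms-proxy-info -fqan` (a standard gLite UI command).  The role name
# tested below is illustrative.
import subprocess

def proxy_fqans():
    """Return the FQANs (group/role attributes) of the current proxy."""
    out = subprocess.Popen(['voms-proxy-info', '-fqan'],
                           stdout=subprocess.PIPE).communicate()[0]
    return [line.strip() for line in out.decode().splitlines() if line.strip()]

fqans = proxy_fqans()
if not any('Role=production' in f for f in fqans):   # illustrative role name
    print('This proxy does not carry a production role')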

5 DIRAC3 Production
>90,000 jobs in the past 2 months
Real production activity and testing of the gLite WMS

6 DIRAC3 Job Monitor (screenshot)

7 LHCb storage at RAL
➟ LHCb storage primarily on the Tier-1s and at CERN
➟ CASTOR used as the storage system at RAL
  • Fully moved out of dCache in May 2008
    ▓ One tape damaged and the file on it marked lost
  • Was stable (more or less) until 20 Aug 2008
    ▓ Has not been able to take great load on the servers
      • Low upper limit (8) on LSF job slots on various CASTOR disk servers
      • Too many jobs (>500) can come into the batch system; the concerned service class then hangs
      • Temporarily fixed for now; needs to be monitored (probably by the shifter on duty?)
        » Increase the limit to >100 rfio jobs per server
        » Not all hardware can handle a limit of 200 jobs (they start using swap space)
  • Problem seen many times now over the last few months
    ▓ CASTOR now in downtime
    ▓ This is worrying given how close we are to data taking

8 LHCb at RAL
➟ Move to srm-v2 by LHCb
  • Needed to retire the srm-v1 endpoints and hardware at RAL
  • When DIRAC3 becomes the baseline for user analysis
    ▓ Already used for almost all production
    ▓ Ganga working on submitting through DIRAC3
    ▓ Needs LHCb also to rename files in the LFC (a hedged renaming sketch follows below)
  • All space tokens, etc. have been set up
  • Target: turn off srm-v1 access by the end of September
➟ Currently use srm-v1 for user analysis
  ▓ DIRAC2 does not support srm-v2
➟ Batch system:
  • Pausing of jobs during downtimes?
    ▓ Not clear about the status of this
  • For now, stop the batch system from accepting LHCb jobs a few hours before scheduled downtimes
    ▓ No LHCb job should run for >24 hours
  • Announce the beginning and end of downtimes
    ▓ Problems with broadcast tools
    ▓ GGUS ticket opened by Derek Ross
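To make the LFC step concrete, a hedged sketch using the SWIG-generated `lfc` Python bindings shipped with the gLite UI. The LFC host and paths are placeholders, this is not the actual LHCb migration tooling, and it assumes the change needed is a rename of catalogue entries (re-registering replicas under the new endpoint would use other calls).

```python
# Hedged sketch: renaming an entry in the LFC with the `lfc` Python bindings
# from the gLite UI.  LFC host and paths are illustrative placeholders.
import os
import lfc

os.environ.setdefault('LFC_HOST', 'lfc-lhcb.cern.ch')    # illustrative LFC host

old_lfn = '/grid/lhcb/production/DC06/oldname.dst'       # illustrative path
new_lfn = '/grid/lhcb/production/DC06/newname.dst'       # illustrative path

# lfc_rename mirrors the C API: returns 0 on success, -1 on error
if lfc.lfc_rename(old_lfn, new_lfn) != 0:
    print('LFC rename failed for %s' % old_lfn)
```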

9 LHCb and CCRC08
➟ Planned tasks: test the LHCb computing model
  • Raw data distribution from the pit to the T0 centre
    ▓ Use of rfcp into CASTOR from the pit (T1D0)
  • Raw data distribution from T0 to the T1 centres
    ▓ Use of FTS (T1D0)
  • Reconstruction of raw data at CERN & the T1 centres
    ▓ Production of rDST data (T1D0)
    ▓ Use of SRM 2.2
  • Stripping of data at CERN & the T1 centres
    ▓ Input data: RAW & rDST (T1D0)
    ▓ Output data: DST (T1D1)
    ▓ Use of SRM 2.2
  • Distribution of DST data to all other centres
    ▓ Use of FTS
(a hedged sketch of the rfcp and FTS copy steps follows)
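A hedged sketch of the first two copy steps listed above (rfcp into CASTOR, then an FTS transfer to a Tier-1), driven from Python purely for illustration. The file name, CASTOR path, SURLs and FTS endpoint are placeholders; in reality these transfers are driven by DIRAC, not by a script like this.

```python
# Hedged sketch of the raw-data copy steps: rfcp from the pit area into
# CASTOR (T1D0), then a CERN -> Tier-1 replication with the FTS command-line
# client.  All paths, SURLs and the FTS endpoint are illustrative.
import subprocess

raw = 'run01234_0001.raw'                                           # illustrative file
castor = '/castor/cern.ch/grid/lhcb/data/2008/RAW/' + raw           # illustrative path

# Step 1: pit -> T0, plain rfcp into the CASTOR T1D0 service class
subprocess.check_call(['rfcp', raw, castor])

# Step 2: T0 -> T1 with FTS; glite-transfer-submit prints a job id that can
# later be polled with glite-transfer-status
fts = 'https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer'  # placeholder
src = 'srm://srm-lhcb.cern.ch' + castor                              # illustrative SURL
dst = 'srm://srm-lhcb.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/lhcb/RAW/' + raw  # illustrative SURL
job_id = subprocess.check_output(['glite-transfer-submit', '-s', fts, src, dst])
print('FTS job submitted: ' + job_id.decode().strip())
```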

10 LHCb and CCRC08 (plots: Reconstruction and Stripping)

11 LHCb CCRC08 Problems
➟ CCRC08 highlighted areas to be improved
  • File access problems
    ▓ Random or permanent failures to open files using gsidcap
      » Request IN2P3 and NL-T1 to allow the dcap protocol for local read access
      » Now using xroot at IN2P3; appears to be successful
    ▓ Wrong file status returned by the dCache SRM after a put
      » bringOnline was not doing anything
  • Software area access problems
    ▓ Site banned for a while until the problem is fixed
  • Application crashes
    ▓ Fixed with a new software release and deployment
  • Major issues with the LHCb bookkeeping
    ▓ Especially for stripping
➟ Lessons learned
  • Better error reporting in pilot logs and workflow
  • Alternative forms of data access needed in emergencies (a hedged fallback sketch follows)
    ▓ Downloading of files to the WN (used at IN2P3, RAL)
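To make the "alternative data access" idea concrete, a hedged PyROOT sketch that tries remote protocols (xroot, dcap) in order and, if all fail, copies the file onto the worker node before opening it locally. The TURLs, SURL and the lcg-cp invocation are illustrative; the real logic lives in the LHCb job wrappers, not in a snippet like this.

```python
# Hedged sketch of the fallback idea: try remote access protocols in order
# (xroot, dcap) and, if all fail, copy the file to the worker node with
# lcg-cp and open the local copy.  TURLs, SURL and paths are illustrative.
import os
import subprocess
import ROOT

def open_with_fallback(turls, surl, local='input.dst'):
    """Return an open TFile, falling back to a local copy on the WN."""
    for turl in turls:
        f = ROOT.TFile.Open(turl)
        if f and not f.IsZombie():
            return f
    # Emergency fallback (as used at IN2P3 and RAL): download to the WN first
    subprocess.check_call(['lcg-cp', surl, 'file:' + os.path.abspath(local)])
    return ROOT.TFile.Open(local)

f = open_with_fallback(
    ['root://xrootd.example.in2p3.fr//lhcb/data/example.dst',   # xroot TURL (illustrative)
     'dcap://dcache.example.sara.nl/pnfs/lhcb/example.dst'],    # dcap TURL (illustrative)
    'srm://srm.example.org/lhcb/data/example.dst')              # SRM SURL (illustrative)
```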

12 LHCb Grid Operations
➟ Grid Operations and Production team has been created

13 Communications
➟ LHCb sites
  • Grid operations team keeps track of problems
  • Reports to sites via GGUS and the eLogger
    ▓ All posts are reported on
    ▓ Please subscribe if you want to know what is going on
➟ LHCb users
  • Mailing lists
    ▓ All problems directed here
    ▓ Specific lists for each LHCb application and Ganga
  • Ticketing systems (Savannah, GGUS) for DIRAC, Ganga and the applications
    ▓ Used by developers and "power" users
  • Software weeks provide training sessions for using Grid tools
  • Weekly distributed analysis meetings (starts Friday)
    ▓ DIRAC, Ganga and core software developers along with some users
    ▓ Aims to identify needs and coordinate release plans
RSS feed available

14 Summary
➟ Concerned about CASTOR stability close to data taking
➟ DIRAC3 workload and data management system now online
  • Has been extensively tested when running LHCb productions
  • Now moving it into the user analysis system
    ▓ Ganga needs some additional development
➟ Grid operations team working with sites, users and developers to identify and resolve problems quickly and efficiently
➟ LHCb looking forward to the imminent switch-on of the LHC!

15 Backup - CCRC08 Throughput