LCG-LHCC mini-review: ALICE. Latchezar Betev, for the ALICE collaboration

2 Registration/replication of RAW
Data volumes (2008)
- Total of 310 TB at CERN, plus replicas at the T1s
- Includes cosmic data and detector calibration
Tools and procedures
- T0 registration -> CASTOR2 and the AliEn file catalogue
- Routine operation, high redundancy of servers and software
- Problem-free operation; all updates carried out transparently by IT/FIO
- Replication via FTD/FTS
- Tested: all goals (rate/stability) met and exceeded
- Routine operation during data taking (cosmics, calibration)

3 Registration/replication of RAW (2)
Data cleanup from tapes
- A large part of the 2006 RAW data is being removed
- Old MC productions are more difficult, as they are still being used for comparative studies
- Ongoing discussion between the detector groups and physics coordination
- Targeted for removal: ~1.2 PB of master copy + T1 replicas
- Policy for tape replicas to be discussed with the T1s
- Possible complete removal from tape and re-replication of specific important RAW datasets
- Benefit: a new scale test of FTS + re-spin of tapes

4 Registration/replication of RAW (3)
- No major issues
- Confidence in the storage and Grid tools is high
- Middleware and computing centre storage (T0 and T1s) have been fully certified

5 Offline reconstruction
Detector reconstruction parameters
- Several beam/multiplicity/luminosity conditions, taken into account on an event-by-event basis
Quasi-online reconstruction status
- All runs from the 2008 cosmics data processed, with emphasis on the 'First physics' detectors
- Selected runs already re-processed as 'Pass 2' and 'Pass 3'
- Re-processing of all cosmics data (general 'Pass 2') after completion of the alignment and calibration studies by the detectors

6 Offline reconstruction (2)
Development of the quasi-online processing framework
- Further refinement of online QA
- Speed up the launch of reconstruction jobs to assure a 'hot copy' of the RAW data
- January 2009: detector code readiness review and a new set of milestones adapted to the run plan
The middleware and fabric are fully tested for 'Pass 1' (T0) RAW data processing
- To a lesser extent at the T1s, due to limited replication of RAW to save tapes

7 Conditions data - Shuttle
- The Shuttle (publisher of subsystem DB content to the Grid conditions data) has been in operation for two years
- In production regime for the whole of 2008
- Detector algorithms (DAs) within the Shuttle have evolved significantly and are ready for standard data taking
- High stability of the primary sources of conditions data: DCS, DAQ DBs and configuration servers
- Toward the end of the last cosmics data-taking period (August), all pieces, including the DAs, were fully operational

8 Conditions data - Shuttle (2)
- Major efforts concentrated on providing the missing conditions data
- Critical LHC parameters
- Newly installed detectors and control hardware
- Some improvements in the replication of conditions data on the Grid (file replicas); so far ALICE maintains 2 full replicas
- Conditions data access is the area with the fewest problems on the Grid, both in terms of publication and of client access for processing and analysis

9 Grid batch user analysis
- High-importance task
- Predominantly MC data analysis; each new MC production cycle accelerates the trend and the user presence
- RAW (cosmics) data is still to be re-processed to qualify for massive analysis; the first pass has already been extensively analyzed by detector experts
- Ever increasing number of users on the Grid: 435 registered, ~120 active
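As an illustration of what a Grid batch analysis job does at the file level, the ROOT macro below sketches how a user job might open an ESD file registered in the AliEn file catalogue. This is a minimal sketch, not the production analysis framework: it assumes a configured AliEn client with a valid Grid token, and the catalogue path is a hypothetical example.

```cpp
// Minimal sketch, assuming a configured AliEn client and a valid Grid token.
// The logical file name below is a hypothetical example, not a real dataset path.
#include <TGrid.h>
#include <TFile.h>
#include <TError.h>

void open_esd()
{
   // Authenticate against the AliEn services
   if (!TGrid::Connect("alien://")) {
      Error("open_esd", "could not connect to AliEn");
      return;
   }

   // Open a file through the catalogue; ROOT resolves the logical file name
   // to a physical replica on an xrootd-enabled storage element
   TFile *f = TFile::Open("alien:///alice/data/2008/LHC08x/000012345/ESDs/AliESDs.root");
   if (!f || f->IsZombie()) {
      Error("open_esd", "could not open the ESD file");
      return;
   }
   f->ls();   // list the file contents (ESD tree, etc.)
}
```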

10 User job distribution
- Average of 1900 user jobs over 6 months, ~15% of the Grid resources
- Predominantly chaotic analysis

11 Grid response time - user jobs
Type 1: jobs with little input data (MC)
- Average waiting time: 11 minutes; average running time: 65 minutes
- 62% probability of a waiting time below 5 minutes
Type 2: jobs with large input data (analysis)
- Average waiting time: 37 minutes; average running time: 50 minutes
- Response time proportional to the number of replicas

12 Grid batch analysis (1)
- Internal ALICE prioritization within the common task queue works well: the production user is 'demoted' in favor of normal users in the queue
- Generic pilot jobs assure fast execution of user jobs at the sites
- Type 1 (MC) jobs can be regarded as the 'maximum Grid efficiency'

13 Grid batch analysis (2)
- Type 2 (ESDs) can be improved
- The trivial improvement, more data replication, is not an option: no storage resources
- The analysis train, grouping many analysis tasks over a common data set, is in an active development/test phase
- Allows for a better CPU/wall ratio and an even load on the storage servers
- Non-local data access through the xrootd global redirector: inter-site SE cooperation, common file namespace (see the sketch below)
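The non-local access through the xrootd global redirector can be pictured with a small sketch: the client contacts a single redirector URL and is transparently redirected to a site that actually holds the file. The redirector hostname and file path below are hypothetical placeholders; only the access pattern is the point.

```cpp
// Minimal sketch of reading through an xrootd (global) redirector.
// Hostname and path are placeholders.
#include <TFile.h>
#include <cstdio>

void open_via_redirector()
{
   // ROOT's xrootd client follows the redirection to the serving SE transparently
   TFile *f = TFile::Open("root://global-redirector.example.org//alice/sim/LHC08x/AliESDs.root");
   if (f && !f->IsZombie())
      printf("Opened %s (%lld bytes)\n", f->GetName(), f->GetSize());
}
```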

14 Prompt analysis (1)
- PROOF-enabled facilities are currently available at two sites, including Darmstadt
- Extremely popular with users: ~150 active
- Fast processing of MC ESDs, critical for first physics; tested extensively in 2008
- Calibration and alignment iterative tasks, tested with cosmics and detector calibration data

15 Prompt analysis (2)
- The facility was recently upgraded with higher CPU and local disk capacity
- ALICE encourages centres to install PROOF-enabled analysis facilities, open to all ALICE users
- Good integration with the Grid at the level of data exchange: data can be reconstructed/analysed directly from Grid storage, and results can be published on the Grid (a minimal PROOF sketch follows below)
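For context, a PROOF-based prompt analysis session typically follows the pattern sketched here: open a session on the facility master, attach a chain of ESD files, and run a user selector in parallel on the workers. The master hostname, input file URLs, tree name and selector are illustrative placeholders, not a description of any specific ALICE facility setup.

```cpp
// Minimal PROOF sketch, assuming an existing PROOF-enabled analysis facility.
// Master host, input files, tree name and selector are illustrative placeholders.
#include <TProof.h>
#include <TChain.h>

void run_on_proof()
{
   // Open a session on the facility master ("lite://" would use local cores instead)
   TProof *p = TProof::Open("proof-master.example.org");
   if (!p) return;

   // Build a chain of ESD files (they can sit on Grid storage, e.g. xrootd SEs)
   TChain chain("esdTree");
   chain.Add("root://se.example.org//data/run1/AliESDs.root");
   chain.Add("root://se.example.org//data/run2/AliESDs.root");

   // Attach the chain to the PROOF session and process a user selector in parallel
   chain.SetProof();
   chain.Process("MyEsdSelector.cxx+");
}
```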

16 Middleware updates
- Fully migrated to WMS submission
- Parallel installation of the CREAM CE at all sites
- Deployment (following the recommendations of GD): a CREAM-CE instance in parallel with the gLite CE
- Impressive service stability: the test instance was maintenance-free for a 4-month period

17 Storage
New storage at T2s
- Gradually increasing the number of sites with xrootd-enabled SEs
- Every SE is validated and tuned independently
- Storage status is monitored with MonALISA
- Emphasis on disk-based SEs for analysis, including capacity at the T0/T1s
Storage types remain unchanged
- T1D0 for RAW, production and custodial ESD/AOD storage
- T0D1 for analysis: ESD/AOD replicas

18 Storage - MSS use optimization
File size increase
- RAW: 1 GB -> 10 GB; presently at 3 GB due to the event size and processing time needed
- Secondary files: 100 MB -> 1 GB
- Aggressive use of archives containing multiple files (see the sketch below)
File access patterns
- Emphasis on prompt replication and reconstruction, while the files are still in the MSS disk buffer
- Analysis and chaotic file access moved away from the MSS SEs
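The "archives containing multiple files" point relies on ROOT being able to open a member of a zip archive directly, so many small outputs can be kept as one MSS object. The sketch below shows that access pattern; the SE hostname, archive path and member name are hypothetical.

```cpp
// Minimal sketch: open one member of an output archive without unpacking it.
// The "#member" anchor is ROOT's syntax for files inside a zip archive;
// hostname, path and member name here are placeholders.
#include <TFile.h>

void open_archive_member()
{
   // The archive is a single file on mass storage, which keeps the tape/MSS
   // file count low while the client still reads only the member it needs
   TFile *f = TFile::Open("root://se.example.org//prod/root_archive.zip#AliESDs.root");
   if (f && !f->IsZombie())
      f->ls();
}
```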

Data taking scenario
Cosmics
- Resume data taking in July 2009, ~300 TB of RAW
p+p runs
- Running at maximum DAQ bandwidth
- A short period at 0.9 GeV (October 2009), followed by running at TeV energies
- Machine parameters at P2: optimum data-taking conditions for ALICE
- Computing resources must be sufficient for quasi-online processing
- Address the genuine ALICE p+p physics program and provide baseline measurements for AA

Data taking scenario (2)
A+A run
- A standard period of Pb+Pb running in the fall
- Computing resources must be sufficient to process these data within 4 months after data taking (as foreseen in the Computing Model)
- Results to be presented at the LHC QM in Spring 2011
Monte Carlo
- Standard years for Monte Carlo production

ALICE data taking scenario
- Quarter-by-quarter overview (2009 Q1 through 2011 Q1) of data-taking and data-processing periods for cosmics, p+p, A+A and Monte Carlo production.

Resources requirements
- CPU power required per event as in the Computing TDR, validated by the cosmic data processing
- Event (RAW/ESD/AOD) sizes as in the Computing TDR, validated by the cosmic data processing
- Any Tier category (T0, T1, T2, T3) can perform any task (reconstruction, simulation, analysis)
- CPU resources are dynamically allocated depending on the current experiment activity

Resources requirements: CPU
- Table of new vs. previous CPU requirements (MSI2K) per quarter, broken down by CERN, T1 and T2, with the variation in %.

Resources requirements: Disk
- The total required disk has been shared differently among the Tiers
- Any type of data (RAW, ESD) can reside on any xrootd-enabled SE
- Table of new vs. previous disk requirements (PB) per quarter, broken down by CERN, T1 and T2, with the variation in %.

Resources requirements: Tape
- The total required mass storage (integrated) takes into account the storage already used
- Table of new vs. previous tape requirements (PB) per quarter, broken down by CERN, T1 and T2, with the variation in %.

Grid support
- ALICE is currently supported by 1.5 FTEs in CERN IT
- 50% of P. Mendez (IT/GD): gLite middleware installation and tuning, SAM suite, and ALICE-WLCG liaison
- 50% of P. Saiz (IT/GS): AliEn and WLCG dashboard development
- 50% of F. Furano (IT/DM): xrootd development
- These people are critically important for the ALICE Grid operation

27 Activities calendar
- Timeline of activities through the year: RAW deletion, replication of RAW, reprocessing Pass 2, analysis train, WMS and CREAM CE deployment, cosmics data taking, data taking.