Slide 1: LCG-LHCC mini-review ALICE
Latchezar Betev, for the ALICE collaboration

Slide 2: Registration/replication of RAW
- Data volumes (2008):
  - Total of 310 TB at CERN + 200 TB replicated at T1s
  - Includes cosmic data and detector calibration
- Tools and procedures (a conceptual sketch of the registration and replication steps follows below):
  - T0 registration -> CASTOR2 and the AliEn file catalogue
    - Routine operation, high redundancy of servers and software
    - Problem-free operation; all updates carried out transparently by IT/FIO
  - Replication via FTD/FTS
    - Tested extensively at CCRC'08; all goals (rate/stability) met and exceeded
    - Routine operation during data taking (cosmics, calibration)
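
The two bookkeeping steps on this slide can be pictured with a short, purely illustrative sketch. All names below (struct, function names, paths, the T1 endpoint) are hypothetical stubs, not the AliEn or FTS/FTD interfaces.

```cpp
// Conceptual sketch only: register a freshly written RAW file in a file
// catalogue, then queue its replication from T0 to a T1 (all names hypothetical).
#include <iostream>
#include <string>

struct RawFile {
    std::string castorPath;   // physical location in CASTOR2 at T0
    std::string logicalName;  // entry in the file catalogue
    long long   sizeBytes;
};

// Stub: record the logical-to-physical mapping in the catalogue.
bool registerInCatalogue(const RawFile& f) {
    std::cout << "catalogue: " << f.logicalName << " -> " << f.castorPath << "\n";
    return true;
}

// Stub: ask the transfer service to copy the file to a T1 storage element.
bool scheduleReplication(const RawFile& f, const std::string& t1Endpoint) {
    std::cout << "replicate " << f.logicalName << " to " << t1Endpoint << "\n";
    return true;
}

int main() {
    RawFile f{"/castor/cern.ch/alice/raw/run12345.root",     // placeholder paths
              "/alice/data/2008/raw/run12345.root", 3LL << 30};
    if (registerInCatalogue(f))
        scheduleReplication(f, "t1-storage.example.org");    // placeholder T1 endpoint
    return 0;
}
```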

Slide 3: Registration/replication of RAW (2)
- Data cleanup from tapes:
  - A large part of the 2006 RAW data is being removed
  - Old MC productions are more difficult, as they are still being used for comparative studies
  - Ongoing discussion between detector groups and physics coordination
  - Targeted for removal: ~1.2 PB of master copy + T1 replicas
- Policy for tape replicas to be discussed with T1s:
  - Possible complete removal of RAW at T1s (tapes) and re-replication of specific important RAW datasets
  - Benefit: a new scale test of FTS + a re-spin of the tapes

Slide 4: Registration/replication of RAW (3)
- No major issues
- Confidence in the storage and Grid tools is high
- Middleware and computing-centre storage (T0 and T1s) have been fully certified

Slide 5: Offline reconstruction
- Detector reconstruction parameters:
  - Several beam/multiplicity/luminosity conditions
  - Taken into account on an event-by-event basis
- Quasi-online reconstruction status:
  - All runs from the 2008 cosmics data processed
  - Emphasis on 'first physics' detectors
  - Selected runs already re-processed as 'Pass 2' and 'Pass 3'
- Re-processing of all cosmics data (general 'Pass 2') after completion of the alignment and calibration studies by the detectors

Slide 6: Offline reconstruction (2)
- Development of the quasi-online processing framework:
  - Further refinement of online QA
  - Speed up the launch of reconstruction jobs to ensure a 'hot copy' of the RAW data
- January 2009: detector code readiness review and a new set of milestones adapted to the run plan
- The middleware and fabric are fully tested for 'Pass 1' (T0) RAW data processing
  - To a lesser extent at T1s, due to the limited replication of RAW to save tapes

Slide 7: Conditions data - Shuttle
- The Shuttle (subsystem DBs to Grid conditions-data publisher) has been in operation for two years (a toy sketch of the idea follows below)
  - In production regime throughout 2008
- Detector algorithms (DAs) within the Shuttle have evolved significantly and are ready for standard data taking
- High stability of the primary sources of conditions data: DCS, DAQ DBs and configuration servers
- Toward the end of the last cosmics data-taking period (August), all pieces, including the DAs, were fully operational
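
A minimal sketch of the Shuttle idea described above: after a run, per-detector code pulls values from the online sources and publishes a conditions object for offline use. Every name here (functions, keys, run number) is a hypothetical placeholder, not the actual ALICE Shuttle or OCDB interface.

```cpp
// Illustrative-only stub of the Shuttle flow: DCS query -> detector algorithm -> publication.
#include <iostream>
#include <map>
#include <string>

// Pretend query of the detector-control-system archive for one run (stub).
std::map<std::string, double> fetchFromDCS(int run) {
    return { {"hv_setting", 1500.0}, {"temperature", 22.3} };   // placeholder values
}

// Pretend publication of the derived conditions object to the Grid catalogue (stub).
bool publishConditions(int run, const std::map<std::string, double>& obj) {
    std::cout << "registering conditions for run " << run
              << " (" << obj.size() << " entries)\n";
    return true;
}

int main() {
    const int run = 12345;                         // hypothetical run number
    auto values = fetchFromDCS(run);               // step 1: collect online values
    values["hv_setting"] *= 0.001;                 // step 2: detector algorithm derives a calibration (toy example)
    return publishConditions(run, values) ? 0 : 1; // step 3: publish for reconstruction to use
}
```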

Slide 8: Conditions data - Shuttle (2)
- Major efforts concentrated on providing the missing conditions data:
  - Critical LHC parameters
  - Newly installed detectors and control hardware
- Some improvements in the replication of conditions data on the Grid (file replicas)
  - So far, ALICE maintains 2 full replicas
- Conditions data access is the area with the fewest problems on the Grid, both in terms of publication and of client access for processing and analysis

Slide 9: Grid batch user analysis
- High-importance task
- Predominantly MC data analysis
  - Each new MC production cycle accelerates the trend and the user presence
- RAW (cosmics) data still has to be re-processed to qualify for massive analysis
  - The first pass has already been extensively analyzed by detector experts
- Ever-increasing number of users on the Grid: 435 registered, ~120 active

Slide 10: User job distribution
- An average of 1900 user jobs over 6 months
- ~15% of Grid resources
- Predominantly chaotic analysis

Slide 11: Grid response time - user jobs
- Type 1: jobs with little input data (MC)
  - Average waiting time: 11 minutes
  - Average running time: 65 minutes
  - 62% probability of a waiting time under 5 minutes
- Type 2: jobs with large input data (analysis)
  - Average waiting time: 37 minutes
  - Average running time: 50 minutes
  - Response time scales with the number of replicas

Slide 12: Grid batch analysis (1)
- Internal ALICE prioritization within the common task queue works well
  - The production user is 'demoted' in favor of normal users in the queue (a toy illustration follows below)
- Generic pilot jobs assure fast execution of user jobs at the sites
- Type 1 (MC) jobs can be regarded as the 'maximum Grid efficiency'
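
A toy model of the prioritization idea on this slide, not the actual AliEn task-queue scheduler: jobs from the production account (here assumed to be called "aliprod") rank below jobs from ordinary analysis users, and within each class older submissions go first.

```cpp
// Toy priority queue that demotes the production user in favor of normal users.
#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct Job {
    std::string user;
    long        submitTime;   // seconds since some epoch (illustrative)
    bool isProduction() const { return user == "aliprod"; }   // hypothetical production account name
};

// Comparator: returns true when a should be served *after* b.
struct Lower {
    bool operator()(const Job& a, const Job& b) const {
        if (a.isProduction() != b.isProduction())
            return a.isProduction();              // production jobs are demoted
        return a.submitTime > b.submitTime;       // otherwise first-come, first-served
    }
};

int main() {
    std::priority_queue<Job, std::vector<Job>, Lower> queue;
    queue.push({"aliprod", 100});
    queue.push({"alice_user1", 200});
    queue.push({"alice_user2", 300});
    while (!queue.empty()) {                      // user jobs drain before the production job
        std::cout << queue.top().user << "\n";
        queue.pop();
    }
    return 0;
}
```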

Slide 13: Grid batch analysis (2)
- Type 2 (ESD) jobs can be improved
  - Trivial fix: more data replication; not an option, as there are no storage resources
- Analysis train (grouping many analysis tasks over a common data set) is in an active development/test phase
  - Allows for a better CPU/wall ratio and an even load on the storage servers
- Non-local data access through the xrootd global redirector (see the ROOT sketch below)
  - Inter-site SE cooperation, common file namespace
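
The redirector-based access can be pictured with a short ROOT macro. The host name and file path below are placeholders, not real ALICE endpoints; the point is simply that TFile::Open() with a root:// URL lets an xrootd redirector forward the client to whichever storage element holds a replica, giving a common namespace across sites.

```cpp
// Minimal ROOT macro sketch of reading data through an xrootd global redirector
// (placeholder host and path).
void read_via_redirector()
{
   TFile *f = TFile::Open("root://global-redirector.example.org//alice/data/AliESDs.root");
   if (!f || f->IsZombie()) {
      Error("read_via_redirector", "could not open the file through the redirector");
      return;
   }
   f->ls();       // quick sanity check: list the objects in the file
   f->Close();
}
```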

Slide 14: Prompt analysis (1)
- PROOF-enabled facilities are currently available at two sites:
  - CAF@CERN
  - GSIAF@GSI Darmstadt
- Extremely popular with users: ~150 active
- Fast processing of MC ESDs, critical for first physics; tested extensively in 2008
- Iterative calibration and alignment tasks, tested with cosmics and detector calibration data

Slide 15: Prompt analysis (2)
- CAF@CERN was recently upgraded with higher CPU and local disk capacity
- ALICE encourages centres to install PROOF-enabled analysis facilities, open to all ALICE users
- Good integration with the Grid at the level of data exchange (see the PROOF sketch below):
  - Data can be reconstructed/analysed directly from Grid storage
  - Results can be published on the Grid
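
A sketch of what a prompt-analysis session on such a facility looks like in ROOT. The cluster URL, data path and selector name are illustrative placeholders, not the actual CAF/GSIAF configuration; the tree name "esdTree" is assumed here for ALICE ESDs.

```cpp
// Sketch of a PROOF session reading input straight from Grid storage (placeholder names).
void proof_sketch()
{
   if (!TProof::Open("proof://caf.example.cern.ch")) return;   // connect to the PROOF-enabled facility

   TChain chain("esdTree");                                    // assumed ESD tree name
   // Input can be pulled directly from Grid storage via a root:// URL (placeholder path).
   chain.Add("root://global-redirector.example.org//alice/data/AliESDs.root");
   chain.SetProof();                                           // route Process() through the PROOF workers
   chain.Process("MySelector.C+");                             // hypothetical selector, compiled on the workers
}
```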

Slide 16: Middleware updates
- Fully migrated to WMS submission
- Parallel installation of the CREAM CE at all sites
  - Deployment (following the recommendations of GD): a CREAM-CE instance in parallel with the gLite CE
  - Impressive service stability: the test instance was maintenance-free over a four-month period

Slide 17: Storage
- New storage at T2s
  - Gradually increasing the number of sites with xrootd-enabled SEs
  - Every SE is validated and tuned independently
  - Storage status is monitored with MonALISA
- Emphasis on disk-based SEs for analysis, including capacity at T0/T1s
- Storage types remain unchanged:
  - T1D0 for RAW, production, custodial ESD/AOD storage
  - T0D1 for analysis: ESD/AOD replicas

Slide 18: Storage - MSS use optimization
- File size increase:
  - RAW: 1 GB -> 10 GB; presently at 3 GB due to the event size and processing time needed
  - Secondary files: 100 MB -> 1 GB
  - Aggressive use of archives containing multiple files (see the sketch below)
- File access patterns:
  - Emphasis on prompt replication and reconstruction, while the files are still in the MSS disk buffer
  - Analysis and chaotic file access moved away from MSS SEs
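
The archive point can be illustrated with ROOT's ability to open a single member of a zip archive via the "#member" suffix; packing several outputs into one archive reduces the number of files the mass storage system has to track. The storage host, archive path and member name below are placeholders.

```cpp
// Sketch of reading one member out of a zip archive with ROOT (placeholder path).
void read_archive_member()
{
   TFile *f = TFile::Open("root://some-se.example.org//alice/output/root_archive.zip#AliESDs.root");
   if (!f || f->IsZombie()) return;
   f->ls();       // list the contents of the member file
   f->Close();
}
```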

Slide 19: Data taking scenario
- Cosmics:
  - Resume data taking in July 2009; ~300 TB of RAW
- p+p runs:
  - Running at maximum DAQ bandwidth
  - A few days @ 0.9 GeV (October 2009)
  - 11 months @ 10 TeV
  - Machine parameters at P2: optimum data-taking conditions for ALICE
  - Computing resources must be sufficient for quasi-online processing
  - Address ALICE's genuine p+p physics program and provide baseline measurements for A+A

Slide 20: Data taking scenario (2)
- A+A run:
  - Fall 2010: a standard period of Pb+Pb running
  - Computing resources must be sufficient to process these data within 4 months after data taking (as foreseen in the Computing Model)
  - Results to be presented at QM@Annecy (the LHC QM) in Spring 2011
- Monte Carlo:
  - 2009-2010 are standard years for Monte Carlo production

Slide 21: ALICE data taking scenario
Quarter-by-quarter overview table (2009 Q1 through 2011 Q1) of data taking and data processing for cosmics, p+p, A+A and Monte Carlo.

Slide 22: Resources requirements
- CPU power required per event as in the Computing TDR, validated by cosmic data processing
- Event (RAW/ESD/AOD) sizes as in the Computing TDR, validated by cosmic data processing
- Any Tier category (T0, T1, T2, T3) can perform any task (reconstruction, simulation, analysis)
- CPU resources are dynamically allocated depending on the current experiment activity

Slide 23: Resources requirements: CPU (MSI2K)

          New requirements        Previous requirements    Variation %
Year      CERN   T1     T2        CERN   T1     T2         CERN   T1    T2
2009 Q1   7.9    8.0    8.5       10.2   11.7   17.7       -12    -18   -46
2009 Q2   7.9    8.0    8.5
2009 Q3   7.9    8.0    8.5
2009 Q4   8.9    11.1   9.6
2010 Q1   9.2    11.1   9.6       10.2   23.6   26.2       +13    +9    -8
2010 Q2   9.2    11.1   9.6
2010 Q3   11.4   14.4   22.4
2010 Q4   11.4   25.6   24.2

Slide 24: Resources requirements: Disk (PB)
- The total required disk has been shared differently among the Tiers
- Any type of data (RAW, ESD) can reside on any xrootd-enabled SE

          New requirements        Previous requirements    Variation %
Year      CERN   T1     T2        CERN   T1     T2         CERN   T1    T2
2009 Q1   1.7    2.4    1.7       1.9    10.0   10.2       +29    -48   -54
2009 Q2   1.9    3.4    2.7
2009 Q3   2.2    4.3    3.7
2009 Q4   2.4    5.3    4.7
2010 Q1   2.7    6.2    5.6       4.2    10.0   10.3       +12    +15   +14
2010 Q2   2.9    7.2    6.6
2010 Q3   3.2    8.1    7.6
2010 Q4   4.7    11.5   11.7

Slide 25: Resources requirements: Tape (PB)
- The total required mass storage (integrated) takes into account the already-used storage

          New requirements   Previous requirements   Variation %
Year      CERN   T1          CERN   T1               CERN   T1
2009 Q1   3.3    2.4         4.9    12.3             -24    -51
2009 Q2   3.4    3.6
2009 Q3   3.6    4.8
2009 Q4   3.7    6.0
2010 Q1   4.2    7.2         8.5    21.4             -22    -44
2010 Q2   4.6    8.4
2010 Q3   5.0    9.6
2010 Q4   6.7    12.0

Slide 26: Grid support
- ALICE is currently supported by 1.5 FTEs in CERN IT:
  - 50% of P. Mendez (IT/GD): gLite middleware installation and tuning, SAM suite, ALICE-WLCG liaison
  - 50% of P. Saiz (IT/GS): AliEn and WLCG dashboard development
  - 50% of F. Furano (IT/DM): xrootd development
- These people are critically important for ALICE Grid operation

Slide 27: Activities calendar
Timeline from December 2008 through 2009 covering: RAW deletion, replication of RAW, reprocessing (Pass 2), analysis train, WMS and CREAM CE, cosmics data taking, and data taking.

