The ALICE Production Patricia Méndez Lorenzo (CERN, IT/PSS) On behalf of the ALICE Offline Project LCG-France Workshop Clermont, 14th March 2007.


1 The ALICE Production Patricia Méndez Lorenzo (CERN, IT/PSS) On behalf of the ALICE Offline Project LCG-France Workshop Clermont, 14th March 2007

2 Overview of the Talk
- The ALICE DC06: Goals, Phases and Results
- The AliEn Software: Workflow and Data Management
- Principle of Operations: VOBOXES
- Conclusions of DC06
- Plans for 2007
- ALICE and the French Federation

3 ALICE in the GRID: Main Goals
- Validation of the LCG/gLite workload management services
- Stability of the services is fundamental for the entire duration of the exercise
- Validation of the data transfer and storage services
- 2nd phase of the PDC06 and full DC07
- The stability and support of the services have to be assured beyond the throughput tests
- Validation of the ALICE distributed reconstruction and calibration model
- Integration of all Grid resources within one single – interfaces to different Grids (LCG, OSG, NDGF)
- End-user data analysis

4 The ALICE DC06: Phases
- First phase:
  - Production of p+p and Pb+Pb MC events
  - Conditions and samples agreed with PWGs
  - (Most of the) data migrated from all Tiers to CASTOR2@CERN (also at NIHAM)
- Second phase:
  - Reconstruction of RAW data: 1st pass reconstruction at CERN, 2nd pass at T1
  - Scheduled data transfers T0-T1
  - Scheduled data transfers T2-(supporting) T1
    o Not yet performed. The T1-T2 assignment is still pending in some cases
- Third phase:
  - End-user analysis on the GRID

5 General Results of the ALICE PDC06
- The PDC06 has been the longest running Data Challenge in ALICE
- Stress test of the ALICE Computing Model
- Running non-stop for 9 months: real data taking approach
- Some statistics:
  - 685K grid jobs in total
  - 40M events
  - 0.63 PB generated, reconstructed and stored
  - 2M files written to SEs
  - 7 MSI2K x hours = 1500 CPUs running continuously
- Stable structures in production and support have been established
- Stress period also for the WLCG structure
- 55 computing centers in 4 continents
- 6 T1, 49 T2, almost 60 VOBOXES in production
- Stress tests for FTS since Summer 2006
- We are in continuous DC mode
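A few averages follow directly from the statistics quoted on this slide. The short Python calculation below is only a back-of-the-envelope reading of those numbers, not a figure from the talk: it assumes decimal petabytes, and it treats the 0.63 PB as spread evenly over all 40M events even though that volume covers generated, reconstructed and stored data together.

```python
# Figures quoted on the slide (rounded values as presented).
jobs = 685_000
events = 40_000_000
files_written = 2_000_000
stored_bytes = 0.63e15  # assumption: 1 PB = 10**15 bytes

# Simple derived averages; these are not stated in the talk.
events_per_job = events / jobs
files_per_job = files_written / jobs
mb_per_event = stored_bytes / events / 1e6

print(f"average events per job: {events_per_job:.1f}")
print(f"average files per job:  {files_per_job:.1f}")
print(f"average MB per event:   {mb_per_event:.2f}")
```

The CPU-hour figure is left out of this calculation because the slide does not state the per-CPU rating it assumes.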

6 T1-T2 Relations
To be reviewed. Assignment fully clear and approved?

7 AliEn Implementation
- Own Task Queue and related services
- Pull model service: a server holds a master queue of jobs, and it is up to the CE that provides the CPU cycles to ask for the jobs
- Use of the WLCG-WMS for agent submission
- Several Grid infrastructures available during the PDC06
- Use of AliEn as a general front-end
- LCG, OSG, NDGF
- Lots of resources but different middleware
- Use high-level tools and APIs to access Grid resources
- Developers put a lot of abstraction effort into hiding the complexity and shielding the user from implementation changes
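The pull model described on this slide can be sketched in a few lines of Python. This is an illustration only, not AliEn code: TaskQueue, Job and run_agent are hypothetical names, and matching a job to a site is reduced here to checking that the site offers every software package the job requires.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    # Hypothetical requirement model: names of software packages.
    requirements: set = field(default_factory=set)


class TaskQueue:
    """Central master queue; it never pushes work to a site."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, job):
        self._jobs.append(job)

    def request_job(self, site_packages):
        """Called by a site agent: hand out the first job the site can run."""
        for job in list(self._jobs):
            if job.requirements <= site_packages:
                self._jobs.remove(job)
                return job
        return None  # nothing suitable for this site


def run_agent(queue, site_packages):
    """Agent on the site side: keeps asking for work until none matches."""
    executed = []
    while (job := queue.request_job(site_packages)) is not None:
        executed.append(job.job_id)  # a real agent would run the payload here
    return executed


queue = TaskQueue()
queue.submit(Job("1", {"AliRoot"}))
queue.submit(Job("2", {"AliRoot", "GEANT4"}))
first_site = run_agent(queue, {"AliRoot"})
second_site = run_agent(queue, {"AliRoot", "GEANT4"})
```

The point of the design is that the central queue only answers requests, so a site that is down or misconfigured simply never asks and never receives a job.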

8 Job Submission Structure
[Diagram: job submission flow between the user, the ALICE central services (ALICE Job Catalogue, Optimizer, ALICE File Catalogue, packman) and the site (VO-Box, LCG RB, CE, WN, Computing Agent).]
Jobs in the ALICE Job Catalogue and the sub-jobs produced by the Optimizer:
- Job 1 (lfn1, lfn2, lfn3, lfn4): Job 1.1 (lfn1), Job 1.2 (lfn2), Job 1.3 (lfn3, lfn4)
- Job 2 (lfn1, lfn2, lfn3, lfn4): Job 2.1 (lfn1, lfn3), Job 2.2 (lfn2, lfn4)
- Job 3 (lfn1, lfn2, lfn3): Job 3.1 (lfn1, lfn3), Job 3.2 (lfn2)
Labels on the flow: User submits job; Submits job agent; Sends job agent to site; Execs agent; Env OK? (Yes / No: Die with grace); Asks work-load; Close SE's & Software; Matchmakes; Receives work-load; Retrieves workload; Sends job result; Updates TQ; Registers output.
ALICE File Catalogue entries: lfn, guid, {se's}
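The splitting of a job into sub-jobs shown in this diagram can be illustrated with a small Python sketch. The slide does not state the splitting rule; the sketch assumes that input files are grouped by the storage element holding them, which would be consistent with the "Close SE's" label. The function name split_job and the SE names are hypothetical.

```python
def split_job(job_id, lfn_locations):
    """Group a job's input LFNs by storage element, one sub-job per SE.

    lfn_locations maps each LFN to the name of the SE holding it
    (an assumed, simplified view of the file catalogue).
    """
    groups = {}
    for lfn, se in lfn_locations.items():
        groups.setdefault(se, []).append(lfn)
    # Sub-jobs are numbered in the order their SE first appears.
    return {
        f"{job_id}.{index}": lfns
        for index, lfns in enumerate(groups.values(), start=1)
    }


# Hypothetical placement that reproduces "Job 1" from the slide.
job1 = split_job(
    "1",
    {"lfn1": "SE_A", "lfn2": "SE_B", "lfn3": "SE_C", "lfn4": "SE_C"},
)
```

With this placement, job1 contains the sub-jobs 1.1 (lfn1), 1.2 (lfn2) and 1.3 (lfn3, lfn4), so each sub-job can be matched to a site close to its data.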

9 File handling - read
[Diagram: reading a file from a user job. Components: UI (JDL), ALICE central services (ALICE File Catalogue: lfn, guid, {se's}), VO-Box, LCG RB, CE, WN (application, file list), SA, LFC, SRM, MSS, and xrootd at the local and at a remote site.]
Labels on the flow: Submit work; File location & GUID; GUID; SURL; TURL; xrootd://SURL; Get file; File non local.
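The chain of names in this diagram (LFN, GUID, SURL, TURL) can be read as a sequence of lookups. The Python sketch below models each service as a plain dictionary to show that chain. It is a simplified assumption about the flow, not the AliEn, LFC or SRM API, and every path, GUID and host name in it is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CatalogueEntry:
    guid: str
    storage_elements: List[str]


def resolve_for_read(
    lfn: str,
    file_catalogue: Dict[str, CatalogueEntry],
    lfc: Dict[str, str],
    srm: Dict[str, str],
    site_se: str,
) -> Tuple[str, bool]:
    """Return the transfer URL for an LFN and whether the file is local."""
    entry = file_catalogue[lfn]   # ALICE File Catalogue: LFN -> GUID, SEs
    surl = lfc[entry.guid]        # LFC: GUID -> storage URL (SURL)
    turl = srm[surl]              # SRM: SURL -> transfer URL (TURL)
    is_local = site_se in entry.storage_elements
    return turl, is_local


# Hypothetical catalogue contents.
file_catalogue = {
    "/alice/sim/example/galice.root": CatalogueEntry("guid-0001", ["SE_A"]),
}
lfc = {"guid-0001": "srm://se-a.example.org/alice/guid-0001"}
srm = {
    "srm://se-a.example.org/alice/guid-0001":
        "root://se-a.example.org//alice/guid-0001",
}

turl, is_local = resolve_for_read(
    "/alice/sim/example/galice.root", file_catalogue, lfc, srm, "SE_A"
)
```

When is_local is false, the job would correspond to the "File non local" branch of the diagram and fetch the file through the remote site's xrootd.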

10 Principles of Operation: VO-box
- VO-boxes deployed at all T0-T1-T2 sites providing resources for ALICE
- Mandatory requirement to enter the production
- Required in addition to all standard LCG Services
- Entry door to the LCG Environment
- Runs standard LCG components and ALICE specific services
- Uniform deployment
- Same behavior for T1 and T2 in terms of production
- Differences between T1 and T2 a matter of QoS only
- Installation and maintenance entirely ALICE responsibility
- Based on a regional principle
- Set of ALICE experts matched to groups of sites
- Site related problems handled by site administrators
- LCG Service problems reported via GGUS
- Not too much, ALICE has experts in almost all sites

11 Requirements for the VO-BOX
- WLCG Configuration
- The VOBOX is the integration point to the WLCG
- FULL IMPLEMENTATION OF THE WLCG-UI (ALICE requirement)
- FTS Client Services
- ALICE Configuration
- The specific requirements have been included in a public document: VOBOX Security and Operations Questionnaires v-0.5
- Distributed to all site managers before setting up
- Support for the whole Production
- Regional experts handle the VOBOXES
- "Who is who" well established at most of the sites
- Central support placed at CERN

12 Monitoring of the VOBOX
- The status of the VOBOX, ALICE and WLCG services is monitored through ML (MonALISA)
- Sites are encouraged to check the status through these pages
- Alarm system established

13 Monitoring the VOBOX through SAM
- The tool currently used in MonALISA to check the status of the VOBOX will also be included in SAM (Service Availability Monitoring)
- WLCG Monitoring Service
- It will allow ALICE to check and control its sites with its own test suites through WLCG
- Beginning to include the same test suite used in ML

14 ALICE in the World
- Status of the sites continuously monitored through MonALISA

15 MC Production up to now
- 43% usage of T1 sites
- 57% usage of T2 sites
- Fundamental role of the T2 structure during the production
- France case for ALICE: T1: 49%, T2: 51%
- Larger contribution of T2s than Lyon!!

16 Requirements to WLCG
- Improved FTS and underlying storage stability
- Continue central (CERN) and site experts pro-active follow up on problems
- xrootd interfaces to DPM and CASTOR2
- Inclusion of xrootd in the standard storage element would really help
- Not using GFAL
- Implementation of glexec
- First on the testbed and then on the LCG nodes
- We should all aim to further improve the overall stability of the services!

17 Conclusions of the PDC06 Production
- Only 50% of pledged resources used during the PDC06
- Resources not available
- Competition from other experiments and instabilities in SW/HW
- Lack of manpower to support ALICE at some centres
- Everything works with motivated competent people on site
- New centres should be brought into the picture asap
- It takes a long time to start working together effectively
- Still a lot of manual operations needed
- Establish a stable and automatic system for testing
- Tests required by the Grid Deployment team perfectly established in ALICE
  o Already tested the behavior of ALICE software in SLC4
  o New gLite middleware tests ongoing
  o Already successfully tested the new gLite-WMS

18 Plans for 2007
- AliRoot and SHUTTLE framework ready end Q1'07
- CAF and Grid MW will continue evolving, with an important test end Q3'07
- April 2007: Dress Rehearsal
- Combined tests of all steps needed to produce the ESDs from RAW
- Continuous "data challenge" mode, aiming at using all resources requested
- Physics studies
- System stability and functionality
- Stable and well defined support for all services and sites
- Integration of new centres
- Exercising of operational procedures
- Improvement of MW
- We are planning a T0 combined test in Q2'07

19 French Federation Conclusions
- Missing a dedicated ALICE person in Lyon
- Very good results during the last and current production with the sites in France
- Good support of the site managers
- CCIN2P3: By far the most stable T1 site during the whole T0-T1 exercise (Talk tomorrow)
- Resources: ALICE requests a larger share of Lyon (T1+AF) and the French T2s
- New sites to join the ALICE production: Grenoble and Strasbourg?
- Fundamental federation in 2007 and during the dress rehearsal

