
ALICE Grid Status. David Evans, The University of Birmingham. GridPP 16th Collaboration Meeting, QMUL, 27-29 June 2006.


1 ALICE Grid Status. David Evans, The University of Birmingham. GridPP 16th Collaboration Meeting, QMUL, 27-29 June 2006.

2 Outline of Talk
– The ALICE Experiment
– ALICE computing requirements
– ALICE Grid: AliEn
– Analysis using AliEn
– Status of ALICE Data Challenge 2006
– Summary and Outlook

3 The ALICE Experiment
– ALICE is one of the four main LHC experiments at CERN.
– It is the only one dedicated to heavy-ion physics: the study of QCD under extreme conditions.
– ~1000 collaborators, ~100 institutions.
– Birmingham is the only UK institute involved.

4 ALICE Requirements
– Data taking (each year): 1 month of Pb-Pb data (~1 PByte); p-p for the rest of the year (~1 PByte).
– Large-scale simulation effort: one Pb-Pb event takes ~8 hrs (3 GHz).
– Data reconstruction.
– Data analysis.
– Smaller collaboration than ATLAS or CMS, but similar computing requirements.

5 Profile of CPU requirements
[Chart: total and per-tier CPU requirements (CERN T0, CERN T1, external Tier 1s, external Tier 2s), rising to ~35 MSI2K over the period Jan 07 to Nov 09.]

6 Tier Hierarchy
– MONARC Model.
– A Cloud Model (Tier-free) is used in ALICE data challenges for native AliEn sites; for LCG sites we comply with the Tier model.
– Tier 0: RAW data master copy, data reconstruction (1st pass), prompt analysis.
– Tier 1: copy of RAW data, reconstruction, scheduled analysis.
– Tier 2: MC production, partial copy of ESD, data analysis.

7 ALICE Grid – AliEn
– AliEn (ALICE Environment) is the Grid framework developed by ALICE; it has been used in production for ~5 years.
– Based on web services and standard protocols.
– Built around open-source code: less than 5% is native AliEn code (mainly Perl).
– To date, >500,000 ALICE jobs have been run under AliEn control worldwide.

8 AliEn Pull Protocol
– One of the major differences between AliEn and the LCG grids is that AliEn uses a pull rather than a push protocol.
– EDG/Globus model (push): the user submits to a Resource Broker, which pushes jobs out to servers.
– AliEn model (pull): jobs sit in a central job list at the Resource Broker, and servers pull work from it when they are free.
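The difference between the two models can be sketched in a few lines of Python. This is a toy illustration only, not AliEn or EDG code; the class names, job names, and site structures are invented for the example:

```python
from collections import deque

class TaskQueue:
    """Central job list (AliEn-style): sites pull work when they are free."""
    def __init__(self, jobs):
        self.jobs = deque(jobs)

    def pull(self):
        """A compute site asks for the next job; returns None when the queue is empty."""
        return self.jobs.popleft() if self.jobs else None

class ResourceBroker:
    """EDG/Globus-style broker: the broker decides and pushes jobs to sites."""
    def __init__(self, sites):
        self.sites = sites  # each site is just a list of assigned jobs here

    def push(self, job):
        # Push to the least-loaded site according to the broker's own view,
        # whether or not that site is actually ready to run the job.
        site = min(self.sites, key=len)
        site.append(job)
        return site

# Pull model: only a free site takes work, so no job lands on a busy site.
queue = TaskQueue(["job1", "job2", "job3"])
free_site_jobs = []
while (job := queue.pull()) is not None:
    free_site_jobs.append(job)

# Push model: the broker assigns jobs based on its (possibly stale) load view.
broker = ResourceBroker(sites=[[], ["running"]])
broker.push("job4")
```

The practical consequence, as the slide notes, is that in the pull model the scheduling decision is made by the resource that actually has free capacity, not by a broker working from a snapshot of the grid's state.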

9 LCG / gLite
– ALICE is committed to using as many common grid applications as possible.
– Changes have been made to make AliEn work with LCG:
– e.g. changes to the File Catalogue (FC) to work with LFC (the LCG File Catalogue);
– a VO-box at each Tier 1 and Tier 2;
– Globus/GSI-compatible authentication.
– An AliEn-gLite interface is in development.
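What a file catalogue does can be illustrated with a toy logical-to-physical mapping. This is a sketch of the general idea only, not the real AliEn or LFC API; the logical file names, storage-element URLs, and helper functions below are all invented for the example:

```python
# Toy model of a grid file catalogue: it maps a logical file name (LFN)
# to the physical replicas registered at different storage elements.
catalogue = {
    "/alice/sim/2006/run123/ESD.root": [
        "root://castor.cern.ch//data/run123/ESD.root",  # Tier 0 copy
        "root://se.rl.ac.uk//alice/run123/ESD.root",    # Tier 1 replica
    ],
}

def replicas(lfn):
    """Return all physical replicas registered for a logical file name."""
    return catalogue.get(lfn, [])

def closest_replica(lfn, preferred_domain):
    """Pick a replica in the preferred domain if one exists, else any replica."""
    reps = replicas(lfn)
    for url in reps:
        if preferred_domain in url:
            return url
    return reps[0] if reps else None
```

A job running at a given site can then resolve the same logical name to whichever physical copy is nearest, which is the indirection the FC/LFC changes have to preserve.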

10 Analysis
– The core of the ALICE computing model is AliRoot, which uses the ROOT framework.
– AliEn is coupled with ROOT for Grid-based analysis, using PROOF (the Parallel ROOT Facility); to the user it's like using plain ROOT.
– 4-tier architecture: ROOT client session, API server (AliEn + PROOF), site PROOF master servers, PROOF slave servers.
– Data from DC2006 are only accessible via the Grid.
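The 4-tier chain can be sketched as a simple call hierarchy. This is an illustrative toy mirroring the data flow only; PROOF itself is a C++/ROOT system, and all names and the stand-in "analysis" below are invented:

```python
def run_on_slave(files):
    """Tier 4: a PROOF slave processes its share of files."""
    return sum(len(f) for f in files)  # stand-in for a real analysis result

def site_master(files, n_slaves=2):
    """Tier 3: a site PROOF master splits its files across local slaves
    and merges their partial results."""
    shares = [files[i::n_slaves] for i in range(n_slaves)]
    return sum(run_on_slave(share) for share in shares)

def api_server(dataset_by_site):
    """Tier 2: the API server (AliEn + PROOF) forwards each site's share of
    the dataset to that site's master and merges the site results."""
    return sum(site_master(files) for files in dataset_by_site.values())

# Tier 1: the ROOT client session just issues one query; the splitting,
# distribution, and merging all happen in the tiers below.
dataset = {"CERN": ["a.root", "bb.root"], "RAL": ["ccc.root"]}
result = api_server(dataset)
```

This is why, from the user's point of view, it looks like a single ROOT session: the client sees only the final merged result.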

11 PROOF
– Each node has a PROOF slave; each site has a PROOF master server.
– PROOF uses a pull protocol, i.e. the slaves ask the master for work packets; slower slaves get smaller work packets, etc.
[Diagram: Client, API Server, AliEn FC, list of sites with data.]
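The adaptive packet sizing can be sketched as follows. This is a toy model, not the real PROOF packetizer; the slave speeds, target time, and minimum packet size are invented numbers:

```python
def next_packet(remaining_events, slave_speed, target_seconds=10, min_packet=5):
    """Size a packet so it takes roughly target_seconds on this slave:
    fast slaves get big packets, slow slaves small ones."""
    size = max(min_packet, int(slave_speed * target_seconds))
    return min(size, remaining_events)

# Two slaves of different speed (events/second) keep pulling until the
# job's events are exhausted; no slave is ever handed more than it can
# chew in one target interval.
remaining = 1000
speeds = [40.0, 5.0]  # a fast slave and a slow slave
packets = []
turn = 0
while remaining > 0:
    size = next_packet(remaining, speeds[turn % 2])
    packets.append(size)
    remaining -= size
    turn += 1
```

Because the slow slave only ever asks for small packets, the end of the job is not held hostage to one straggler holding a huge work assignment.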

12 ALICE Data Challenge 2006 (PDC06)
– The last challenge before the start of data taking.
– A test of all Grid components: AliEn as the ALICE interface to the Grid (and much, much more); LCG/gLite baseline services (WMS, DMS).
– A test of the computing centres' infrastructure.
– A major test of the stability of all of the above.

13 Grid software deployment and running
– LCG sites are operated through the VO-box framework: all ALICE sites need one. The deployment cycle was relatively long, and many configuration and version-update issues had to be solved, but the situation is quite routine now.
– Data management: this year xrootd is the disk pool manager at all sites; the installation/configuration procedures have just been released; integration of xrootd into other storage management solutions (CASTOR, DPM, dCache) is under development.
– Data replication (FTS): used for scheduled replication of data between the computing centres (RAW from T0->T1, MC production T2->T1, etc.). Fully incorporated in the AliEn FTD, to be extensively tested in July.
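The replication policy above (RAW from T0 to the T1s, MC production from a T2 up to a T1) can be written down as a small routing function. The site names and the T2-to-T1 associations below are assumptions for illustration only; the actual transfers go through LCG FTS via the AliEn FTD:

```python
# Assumed example topology: three T1s, and each T2 associated with one T1.
T1S = ["RAL", "CNAF", "GridKa"]
T2_TO_T1 = {"Birmingham": "RAL", "Torino": "CNAF"}

def replication_targets(file_type, source_site):
    """Return the destination sites for one file under the T0/T1/T2 policy."""
    if file_type == "RAW" and source_site == "CERN-T0":
        return T1S                      # the RAW master copy fans out to every T1
    if file_type == "MC" and source_site in T2_TO_T1:
        return [T2_TO_T1[source_site]]  # MC output goes up to the home T1
    return []                           # nothing scheduled for other cases
```

A scheduler would then hand each (file, targets) pair to the transfer service, which is exactly the part FTS/FTD is meant to automate.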

14 VO-box support and operation
– In addition to the standard LCG components, the VO-box runs ALICE-specific software components.
– VO-boxes are now at the RAL Tier 1 and the Birmingham Tier 2; Birmingham ALICE students are testing AliEn for analysis purposes through the Birmingham Tier 2.
– The installation and maintenance of these is entirely our responsibility: support for the UK VO-boxes is supplied by CERN (no UK manpower available).
– Site-related problems are handled by the site admins; LCG service problems are reported to GGUS.

15 Operation status
– Running in continuous mode since 24/05.
– VO-boxes: monthly releases of AliEn (currently v.2-10) and LCG, and soon tests of gLite 3.0.
– Central ALICE services: the AliEn machinery and API service are developed, deployed, and maintained by the AliEn team.
– Site services: stability testing of both AliEn and LCG components; the AliEn-LCG/gLite interfaces are still in development; a gLite VO-box has already been provided at CERN and first tests performed.

16 Running status – one month

17 Site contributions in the past 2 months
– 60% T1, 40% T2 (almost half from 2 T2 sites!).
– RAL: 0.7%.

18 Running status – site averages
– Pledged resources: 4000 CPUs.
– Our average is at the 12% level, partly due to central and site service malfunctions, but mostly due to sites providing fewer CPUs than pledged.
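A quick back-of-the-envelope check makes the shortfall concrete (the 4000 pledged CPUs and the 12% average are the figures from the slide; the arithmetic is all that is added):

```python
pledged_cpus = 4000                       # resources pledged by the sites
used_cpus = pledged_cpus * 12 // 100      # observed 12% average -> 480 CPUs
shortfall = pledged_cpus - used_cpus      # 3520 CPUs below the pledge
```

In other words, on average the challenge ran on roughly 480 of the 4000 pledged CPUs.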

19 Stability improvements
– This is a data challenge, so there is always room for improvement:
– AliEn is undergoing gradual fixes and new features are being added;
– the LCG software will undergo a quantum leap: the move from LCG to gLite;
– the site infrastructure (VO-box, etc.) also needs solidification, especially at the T2s;
– monitoring and control: new features are continuously being added.

20 Outlook
– PDC06 has started as planned. This is the last exercise before the beam!
– It is a test of all the Grid tools/services we will use in 2007: if a tool is not in PDC06, there is a good chance it will not be ready.
– It is also a large-scale test of the computing infrastructure: computing, storage, and network performance.

21 Outlook (2)
– We have all the pieces needed to run production on the Grid (some still untested).
– The exercise started 2 months ago and will continue until the end of the year.
– At the moment we are optimising the use of resources, attempting to get from the sites the resources they promised.
– The next phase of the plan is a test of the file transfer utilities of LCG (FTS) and their integration with the AliEn FTD.
– In parallel, we will run event production as usual.

22 Summary
– AliEn is a Grid framework developed by ALICE using ~95% open-source code (e.g. SOAP) and ~5% AliEn-specific (Perl) code.
– AliEn is evolving to take account of the EGEE/gLite framework and to work with LCG: new user interfaces, PROOF for analysis, and better authentication/authorisation have been developed.
– Data Challenge 2006, running since April, is going well.
– VO-boxes are in place at the RAL T1 and the Birmingham T2.
– The lack of computing resources is a worry.

