Status of the ALICE CERN Analysis Facility – Marco MEONI (CERN/ALICE), Jan Fiete GROSSE-OETRINGHAUS (CERN/ALICE) – CHEP 2009, Prague.

1 Status of the ALICE CERN Analysis Facility – Marco MEONI (CERN/ALICE), Jan Fiete GROSSE-OETRINGHAUS (CERN/ALICE) – CHEP 2009, Prague

2 Introduction
The ALICE experiment offers its users a cluster for quick interactive parallel data processing:
– Prompt and pilot analysis
– Calibration/alignment
– Fast simulation and reconstruction
The cluster is called the CERN Analysis Facility (CAF).
The software in use is PROOF (Parallel ROOT Facility).
CAF has been operational since May 2006.

3 Outline
– PROOF Schema
– CAF Usage
– Users and Groups
– CPU Fairshare
– File Staging and Disk Quota
– Resource Monitoring
– Ongoing Development: PROOF on the Grid
– Outlook and Conclusions

4 PROOF Schema
Remote PROOF cluster: a PROOF master and PROOF workers on nodes 1..4, each worker processing the data stored on its node.
Client (local PC): a ROOT session sends the analysis macro (ana.C) to the PROOF master, which distributes it to the workers; the merged result is returned to the client (stdout/result).
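A minimal sketch of such a session from the ROOT prompt; the master host, tree name and file URL are placeholders, not the actual CAF settings:

   // Connect the local ROOT session to the PROOF master (placeholder host).
   TProof *proof = TProof::Open("alicecaf.cern.ch");
   // Build a chain of data files; PROOF resolves each file to the worker holding it.
   TChain *chain = new TChain("esdTree");             // hypothetical tree name
   chain->Add("root://server//data/run0001.root");    // hypothetical file URL
   chain->SetProof();                                  // route processing through PROOF
   // ana.C (a TSelector) is shipped to master and workers; partial results are
   // merged on the master and returned to the client.
   chain->Process("ana.C+");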

5 CERN Analysis Facility (CAF)
Each node runs a PROOF worker and an xrootd disk server with a local disk (many such nodes); the head node runs the PROOF master and the xrootd redirector (LDAP configuration).
Files are staged from the Grid SEs and from the CASTOR MSS; direct access is also possible.
Access to local disks: advantage of processing local data.
GSI authentication.
HW since Sep '08:
– Arch: AMD64
– CPU: 15 x 8-core (IT standard)
– Disk: 15 x 2.33 TB RAID5
– Workers: 2/node
– Mperf: 8570

6 CAF SW Components
ROOT, Scalla software suite (xrootd), MonALISA.
Transition from olbd to cmsd (Cluster Management Service Daemon):
– Provides dynamic load balancing of files and data name-space
– ALICE file stager plugged into cmsd
GSI (Globus Security Infrastructure) authentication:
– Uses X509 certificates and LDAP-based configuration management
– Same means of authentication for Grid and CAF
– Grid files can be directly accessed
Fast parallel reconstruction of raw data (see talk #457 from C. Cheshkov).
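Because the same GSI credentials work for the Grid, a Grid file can be opened directly from a ROOT session; a minimal sketch, assuming the ROOT AliEn plugin is available (the file path is hypothetical):

   // Authenticate to AliEn with the Grid certificate/token.
   TGrid::Connect("alien://");
   // Open a file straight from an AliEn storage element (hypothetical path).
   TFile *f = TFile::Open("alien:///alice/some/run/AliESDs.root");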

7 An Example

8 CAF Users
19 groups, 111 unique users; peak of 23 concurrent users.
Users per group: PWG0: 22, PWG1: 3, PWG2: 39, PWG3: 19, sub-detectors: 35.
Continuous history of CAF since May '06 shown on the MonALISA-based web repository (see poster #461 from C. Grigoras).
Available disks and CPUs must be fairly used.
Users are grouped into sub-detector and physics working groups (PWGs); a user can belong to several groups.
Groups:
– have a disk space (quota) which is used to stage datasets from the Grid
– have a CPU fairshare target (priority) to regulate concurrent queries

9 CPU Usage
CPU usage plots: CAF1, CAF2.
Data processed (CAF1 vs CAF2, with +/- change):
– Bytes read: 266 TB (14m) vs 83.3 TB (5.5m), -21%
– Events: 4.17G (14m) vs 1.95G (5.5m), +17%
– Queries: 9000 (2.5m) vs (5.5m), -47%

10 CPU Fairshare
Default group quotas: detectors: x, PWGs: 5x.
– Compute new priorities: measure the difference between usages and quotas
– Send the new priorities to PROOF
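A rough illustration of this step, not the actual CAF code (the names and the update rule are invented for the sketch): each group's measured CPU share is compared with its quota and the priority is corrected accordingly before being pushed to PROOF.

   #include <algorithm>
   #include <map>
   #include <string>

   // Hypothetical sketch: nudge each group's PROOF priority so that measured
   // CPU shares converge towards the configured quotas.
   std::map<std::string, double>
   UpdatePriorities(const std::map<std::string, double> &quota,   // target share per group
                    const std::map<std::string, double> &usage,   // measured share per group
                    const std::map<std::string, double> &current) // current priority per group
   {
      std::map<std::string, double> next;
      for (const auto &q : quota) {
         const std::string &group = q.first;
         double diff = q.second - usage.at(group);        // > 0: group got less than its quota
         double p = current.at(group) * (1.0 + diff);     // simple proportional correction
         next[group] = std::max(0.1, std::min(10.0, p));  // clamp to a sane range
      }
      return next;                                        // these values are then sent to PROOF
   }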

11 Datasets
Datasets:
– are used to stage files from the Grid
– are lists of files registered by users for processing with PROOF
– may share the same physical files
– allow file information to be kept consistent
– files are uniformly distributed by the xrootd data manager
The DS manager takes care of the disk quotas at file level and sends monitoring info to MonALISA:
– the overall number of files
– number of new, touched, disappeared, corrupted files
– staging requests
– disk utilization for each user and for each group
– number of files on each node and total size
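From the user's side the dataset mechanism looks roughly like the sketch below, assuming an open PROOF session (gProof); the dataset name and file URL are hypothetical:

   // Collect the Grid files to be staged and register them as a named dataset.
   TFileCollection *fc = new TFileCollection("myFiles");
   fc->Add(new TFileInfo("alien:///alice/some/run/AliESDs.root"));  // hypothetical Grid file
   gProof->RegisterDataSet("/PWG2/myuser/myDataset", fc);           // handed to the dataset manager
   // The data manager stages the files onto the cluster disks; staging status
   // can be inspected, and the dataset is later processed by name.
   gProof->ShowDataSets();
   gProof->Process("/PWG2/myuser/myDataset", "ana.C+");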

12 Dataset Staging
Schema: a PROOF master / xrootd redirector (cmsd/xrootd) and many PROOF workers / xrootd disk servers, with the AliEn SEs and the CASTOR MSS as sources; files are kept in a "Dataset".
– PROOF master: registers/removes datasets; the data manager daemon keeps datasets persistent by requesting staging, updating file info and touching files.
– PROOF workers: the file stager stages files onto the WN disk and removes files (LRU); files are staged, read, touched and deleted there.

13 Resource Monitoring
A full CAF status table is available on the MonALISA repository.
Many more parameters are available:
– Staging queue, usage of root and log partitions
– CPU nice and idle status
– Memory consumption details
– Number of network sockets

14 In Conclusion…
CAF has been operational for three years.
More than 100 users are registered and ~10 per day use CAF.
Interactive analysis with PROOF is a good addition to local analysis and batch analysis on the Grid…
…but what about PROOF + Grid?

15 PROOF on the Grid
This is an ongoing development that combines PROOF and the ALICE Grid middleware (AliEn).
Reasons:
1. Cluster size: CAF can only hold a fraction of the yearly reco and sim data (~1 PB)
2. Data store: not feasible, financially and support-wise, in a single computing centre
3. Resources: the Grid provides lots of resources
4. Data location: bring the kB to the PB and not the PB to the kB
To do:
1. Cluster connectivity: interconnection of Grid centres
2. Task and data co-location: execute tasks where the data is
3. Protected access: WNs must connect to the master
4. Dynamic scheduler: dynamic allocation of workers
5. Interactivity: hiding of Grid latency

16 Grid Schema
– A ProxyServer service starts xrootd and PROOF.
– Pilot Grid jobs are submitted to the Grid to start ProxyClients where the user data is stored.
– A ProxyClient starts an xrootd server and registers with the ProxyServer.
– The ProxyServer keeps the list of all the workers running on the WNs.
– A user PROOF session connects to the superMaster which, in turn, starts the PROOF workers.
Diagram components: user workspace (ROOT session, Grid API); VO-box 1 (xrootd manager, PROOF master/superMaster, ProxyServer); site VO-boxes 1..m with ProxyClients; WNs (xrootd server, PROOF worker); AliEn-LCG WMS.

17 As a PROOF of Concept

18 Summary
ALICE uses PROOF on a local cluster (CAF) for quick interactive parallel processing:
– Prompt and pilot analysis
– Calibration/alignment
– Fast simulation and reconstruction
CAF in production since May 2006; HW and SW upgrade at the end of 2008.
Monthly tutorials at CERN (500+ users so far).
Active collaboration with the ROOT team:
– Contribution from ALICE to PROOF development
– Implementation of the dataset concept and CPU quotas
Ongoing developments:
– Adaptation of PROOF to the Grid