GSIAF: Anar Manafov, Victor Penso, Carsten Preuss, and Kilian Schwarz, GSI Darmstadt, ALICE Offline Week, 2009-06-25, v. 0.8

GSIAF
- Present status
- Installation and configuration
- Operation
- Issues and problems
- Plans and perspectives
- PoD
- Summary

GSI Grid Cluster – present status (schematic): 1 Gbps link to CERN and GridKa; ALICE::GSI::SE::xrootd with 80 (300) TB; Grid vobox, Grid CE, LCG RB/CE; GSI batch farm with the ALICE cluster (180 nodes / 1400 cores for batch, 20 nodes for GSIAF); directly attached disk storage (112 TB) for testing purposes (PROOF/batch); Lustre cluster: 580 TB.

Present status
- ALICE::GSI::SE::xrootd: 75 TB disk on file servers (16 file servers with 4-5 TB each), currently being upgraded to 300 TB; 3U chassis with 12 x 500 GB disks in RAID 5, about 6 TB of user space per server
- Lustre cluster for local data storage, mounted directly by the batch farm nodes; capacity: 580 TB (shared by all GSI experiments)
- Nodes dedicated to ALICE (Grid and local), but also used by FAIR and Theory (fewer slots, lower priority):
  - 15 boxes, 2 x 2 cores, 8 GB RAM, 1+3 disks, funded by D-Grid
  - 40 boxes, 2 x 4 cores, 16 GB RAM, 4 disks in RAID 5, funded by ALICE
  - 25 boxes, 2 x 4 cores, 32 GB RAM, 4 disks in RAID 5, funded by D-Grid
  - 112 blades, 2 x 4 cores, 16 GB RAM, 2 disks in RAID 0, funded by ALICE
  => in total 192 computers, ca. 1500 cores
- Installed on all nodes: Debian Etch, 64 bit

GSIAF (static cluster)
- Dedicated PROOF cluster of 20 nodes (160 PROOF servers per user)
- Reads data from Lustre
- Used very heavily for code debugging and development
- Needed for fast response when using large statistics
- Performance: see next slide

GSIAF performance (static cluster): ALICE first physics analysis for pp @ 10 TeV (performance plot).

Installation
- Software is installed in a shared NFS directory, visible on all nodes: xrootd (version 2.9.0, build 20080621-0000), ROOT (all recent versions up to 5.23/04), AliRoot (including head), all compiled for 64 bit
- Reason: the software changes quickly; disadvantage: possible NFS stales
- Started to build Debian packages of the used software for local installation
- Also investigating other methods (Cfengine) to distribute the experiment software locally

Configuration (GSIAF)
- Setup: one standalone, high-end machine with 8 GB RAM as xrootd redirector and PROOF master; the cluster nodes run LSF and the PROOF workers
- So far no authentication/authorization
- Managed via Cfengine, a platform-independent computer administration system (main functionality: automatic configuration): xrootd.cf, proof.conf, access control, Debian-specific init scripts to start/stop the daemons (for the latter also Capistrano, for fast prototyping)
- All configuration files are under version control (SVN)
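
For illustration, a static PROOF cluster of this kind is typically described by a plain proof.conf listing the master and the worker nodes; the sketch below is not the actual GSIAF file, and the host names are hypothetical:

    # proof.conf (sketch) -- one master line, one line per PROOF worker
    master gsiaf-master.gsi.de
    worker node01.gsi.de
    worker node02.gsi.de
    worker node03.gsi.de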

Cfengine – configuration files in Subversion

Monitoring via MonALISA

GSI Lustre cluster: directly attached to the PROOF workers; 580 TB (shared by all GSI experiments); used for local data storage.

GSIAF usage experience
- Real-life analysis work on staged data by the GSI ALICE group (1-4 concurrent users)
- 2 user tutorials for GSI ALICE users (10 students per training)
- GSIAF and CAF were used as PROOF clusters during the PROOF course at GridKa School 2007 and 2008
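
For context, a typical interactive analysis on such a static cluster looks roughly like the ROOT macro sketched below; the connect string, data path and selector name are hypothetical examples, not taken from the slides:

    // analysis_sketch.C -- minimal PROOF usage sketch (hypothetical names)
    #include "TProof.h"
    #include "TChain.h"

    void analysis_sketch()
    {
       // connect to the PROOF master of the static cluster
       TProof::Open("user@gsiaf.gsi.de");

       // chain of ESD files previously staged to Lustre
       TChain chain("esdTree");
       chain.Add("/lustre/alice/data/run12345/AliESDs.root");

       // run a user selector on the PROOF cluster
       chain.SetProof();
       chain.Process("MySelector.C+");
    }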

PROOF users at GSIAF (static cluster)

Ideas that came true ...
- Coexistence of interactive and batch processes (PROOF analysis on staged data and Grid user/production jobs) on the same machines can be handled:
  - LSF batch processes are re-"niced" so that PROOF processes get a higher priority (an LSF parameter)
  - the number of jobs per queue can be increased/decreased
  - queues can be enabled/disabled
  - jobs can be moved from one queue to another
  - currently each PROOF worker at GSI is also an LSF batch node
- Optimised I/O: various methods of data access (local disk, file servers via xrootd, mounted Lustre cluster) have been investigated systematically. Method of choice: Lustre and an xrootd-based SE. Local disks are no longer used for PROOF at GSIAF (they do not have a good survival rate).
- PROOF nodes can be added/removed easily
- Extend the GSI T2 and GSIAF according to the promised ramp-up plan
- PoD

Cluster load

Data staging and user support
- How are data brought to GSIAF? Current method: GSI users copy data individually from AliEn to Lustre using standard tools
- User support at GSIAF: private communication, or help yourself via established procedures
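
One standard way to do such a copy is from a ROOT session via the AliEn interface (alternatively with the alien_cp command-line tool); the sketch below uses hypothetical source and target paths:

    // stage_sketch.C -- copy one file from AliEn to the locally mounted Lustre
    #include "TGrid.h"
    #include "TFile.h"

    void stage_sketch()
    {
       // authenticate against the AliEn file catalogue
       TGrid::Connect("alien://");

       // copy a catalogue entry to Lustre (paths are examples only)
       TFile::Cp("alien:///alice/data/2009/run12345/AliESDs.root",
                 "/lustre/alice/user/AliESDs.root");
    }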

General remarks
- It is not planned to use more than 160 cores for the GSIAF static cluster
- For PROOF sessions with fewer than 160 cores, users use PoD

Issues and problems
- Currently the cluster is restarted automatically every night; otherwise the system behaves stably and manual intervention is rarely needed.

Wishlist, summary and outlook

Future plans and perspectives

GSIAF -> PoD: "a PROOF cluster on the fly"

Summary
- GSI T2 resources have been extended according to plan
- GSIAF is heavily used and behaves stably
- The static cluster will be phased out
- PROOF on Demand (PoD) will provide the possibility to create dynamic, private/individual PROOF clusters "on the fly"
- First official release of the LSF plugin: April 2009
- First user training in the same month