S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept 20071 ATLAS computing in Geneva Szymon Gadomski description of the hardware the.

Presentation transcript:

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept ATLAS computing in Geneva Szymon Gadomski description of the hardware the functionality we need the current status list of projects

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept The cluster at Uni Dufour (1)

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept The cluster at Uni Dufour (2) 12 worker nodes in in 2006 and 20 in 2007!

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept The cluster at Uni Dufour (3) power and network cabling of worker nodes three nodes for services (grid, batch, storage abstraction) direct line from CERN

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept The cluster in numbers 61 computers to manage –53 workers, 5 file servers, 3 service nodes 188 CPU cores in the workers 75  5 TB of disk storage can burn up to 30 kW (power supply specs)

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept The functionality we need our local cluster computing –log in and have an environment to work with ATLAS software, both offline and trigger develop code, compile, interact with ATLAS software repository at CERN –work with nightly releases of ATLAS software, normally not distributed off-site but visible on /afs –disk space –use of final analysis tools, in particular ROOT –a convenient way to run batch jobs grid computing –tools to transfer data from CERN as well as from and to other Grid sites worldwide –ways to submit our jobs to other grid sites –a way for ATLAS colleagues to submit jobs to us

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept The system in production Description at 1 file server (+1 if needed), 3 login machines and 18 batch worker nodes 30 ATLAS people have accounts –ATLAS GE + friends and relations –people rely on the service maintenance of the system (0.3 FTE, top priority) –creation of user accounts, –web-based documentation for users, –installation of ATLAS releases, –maintenance of worker nodes, file servers and the batch system, –assistance to users executing data transfers to the cluster, –help with problems related to running of ATLAS software off-CERN-site, e,g. access to data bases at CERN, firewall issues e.t.c. –raid recovery from hardware failures.

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept Our system in the Grid Geneva is in NorduGrid since 2005 In company of Berne and Manno (out Tier 2)

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept One recent setback We used to have a system of up to 35 machines in production. Problems with power to our racks since last August –A major blackout in Plainpalais area on August 2 nd ; UPS has gave up after 10’ in the machine room; all University services went down. A major disaster. –When recovering, we lost power again the next day. No explanation from the DINF. –Slightly smaller system in use since then. Power lost again on Friday Sept 7 th. Right now only a minimal service. Need to work together with the DINF, measure power consumption of our machines under full load. Also need to understand the limits of the infrastructure. Another power line is being laid for our 20 new worker nodes, the “blades”. The power cut has nothing to do with that.

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept Things to do (and to research) a)Configuration of worker nodes: configuration of the CERN Scientific Linux system, torque batch system software, other added software, as requested by the users. b)General cluster management issues: security, a way to install the system on multiple machines (three types of worker nodes), automatic shutdown when UPS turns on, monitoring of temperature, CPU use, network use. c)Storage management: operating system for the SunFire X4500 file servers (SUN Solaris or CERN Scientific Linux), a solution for storage management (e.g. dCache or DPM). d)Grid nodes and grid software: configuration of the CERN Scientific Linux for the grid interface nodes, choice and installation of a batch system, choice and installation of grid middleware. e)Tools for interactive use of multiple machines (e.g. PROOF, Ganga). Grid job submission interfaces (e.g. Ganga, GridPilot)

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept Diagram of the system for 1 st data all the hardware is in place, (not all powered up) some open questions biggest new issue is storage management with multiple servers

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept Summary The ATLAS cluster in Geneva is a large Tier 3 –now 188 worker’s CPU cores and 75 TB –not all hardware is integrated yet A part of the system is in production –a Grid site since 2005, runs ATLAS simulation like a Tier 2, plan to continue that. –since Spring in constant interactive use by the Geneva group, plan to continue and to develop further. The group needs local computing. Busy program for several months to have all hardware integrated. With a larger scale come new issues to deal with.

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept Comments about future evolution Interactive work is vital. –Everyone needs to login somewhere. –The more we can do interactively, the better for our efficiency. –A larger fraction of the cluster will be available for login. Plan to remain a Grid site. –Bern and Geneva have been playing a role of a Tier 2 in ATLAS. We plan to continue that. Data transfer are too unreliable in ATLAS. –Need to find ways to make them work much better. –Data placement from FZK directly to Geneva would be welcome. No way to do that (LCG>NorduGrid) at the moment.. Be careful with extrapolations from present experience. Real data volume will be 200x larger then a large Monte Carlo production.