RHIC/US ATLAS Tier 1 Computing Facility Site Report
Christopher Hollowell
Physics Department, Brookhaven National Laboratory
HEPiX, Upton, NY, USA
October 18, 2004
Facility Overview
● Created in the mid-1990s to provide centralized computing services for the RHIC experiments
● Expanded our role in the late 1990s to act as the Tier 1 computing center for ATLAS in the United States
● Currently employ 28 staff members; planning to add 5 more in the next fiscal year
Facility Overview (Cont.)
● Ramping up resources provided to ATLAS: Data Challenge 2 (DC2) underway
● RHIC Run 5 scheduled to begin in late December 2004
Centralized Disk Storage
● 37 NFS servers running Solaris 9: recently upgraded from Solaris 8
● Underlying filesystems upgraded to VxFS 4.0
– Issue with quotas on filesystems larger than 1 TB
● ~220 TB of Fibre Channel SAN-based RAID5 storage available: added ~100 TB in the past year
Centralized Disk Storage (Cont.)
● Scalability issues with NFS: network-limited to ~70 MB/s per server (75-90 MB/s maximum local I/O) in our configuration; testing of new network storage models, including Panasas and IBRIX, in progress (see the throughput sketch below)
– Panasas tests look promising: 4.5 TB of storage on 10 blades available for evaluation by our user community; DirectFlow client in use on over 400 machines
– Both systems allow NFS export of data
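For reference, a minimal sketch of the kind of sequential-write measurement behind the per-server figures above. The NFS mount point and file name are hypothetical, and a real benchmark would also control client-side caching, block sizes, and concurrent clients.

```python
#!/usr/bin/env python
# Minimal sequential-write probe against an NFS-mounted directory.
# The mount point below is a hypothetical example.
import os
import time

MOUNT = "/nfs/testvolume"                     # hypothetical NFS mount
TARGET = os.path.join(MOUNT, "throughput_probe.dat")
BLOCK = "x" * (1024 * 1024)                   # 1 MB write buffer
TOTAL_MB = 2048                               # 2 GB total, to get past client caching

start = time.time()
f = open(TARGET, "w")
for _ in range(TOTAL_MB):
    f.write(BLOCK)
f.flush()
os.fsync(f.fileno())                          # force the data out to the server
f.close()
elapsed = time.time() - start

print("Sequential write: %.1f MB/s" % (TOTAL_MB / elapsed))
os.remove(TARGET)
```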
Centralized Disk Storage: AFS
● Moving servers from Transarc AFS running on AIX to OpenAFS on Solaris 9
● The move from Transarc to OpenAFS is motivated by Kerberos 4/Kerberos 5 issues and the Transarc AFS end of life
● Total of 7 fileservers and 6 DB servers: 2 DB servers and 2 fileservers already running OpenAFS
● 2 cells
Mass Tape Storage
● Four STK Powderhorn silos, each capable of holding ~6000 tapes
● 1.7 PB of data currently stored
● HPSS version 4.5.1: likely upgrade to version 6.1 or 6.2 after RHIC Run 5
● 45 tape drives available for use
● Latest STK tape technology: 200 GB/tape
● ~12 TB disk cache in front of the system
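A rough nominal-capacity check from the figures above (uncompressed, and ignoring slots used for cleaning cartridges and operations):

4 silos × ~6000 tapes/silo × 200 GB/tape ≈ 4.8 PB nominal capacity, of which 1.7 PB is currently occupied.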
Mass Tape Storage (Cont.)
● PFTP, HSI, and HTAR available as interfaces
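A hedged sketch of scripted HPSS access through the HSI and HTAR interfaces. The HPSS paths and file names are invented for illustration; options should be checked against the local HSI/HTAR documentation.

```python
#!/usr/bin/env python
# Illustrative wrappers around the hsi and htar command-line interfaces to HPSS.
# All paths below are hypothetical examples.
import os

def hpss_put(local_path, hpss_path):
    """Store a local file into HPSS via hsi ('put local : remote')."""
    return os.system('hsi "put %s : %s"' % (local_path, hpss_path))

def hpss_get(local_path, hpss_path):
    """Retrieve a file from HPSS via hsi ('get local : remote')."""
    return os.system('hsi "get %s : %s"' % (local_path, hpss_path))

def hpss_archive_dir(directory, hpss_tar_path):
    """Bundle a directory of small files into one HPSS-resident tar
    archive with htar, avoiding many tiny files on tape."""
    return os.system('htar -cvf %s %s' % (hpss_tar_path, directory))

if __name__ == "__main__":
    hpss_put("run5_calib.dat", "/hpss/user/example/run5_calib.dat")
    hpss_archive_dir("ntuples/", "/hpss/user/example/ntuples.tar")
```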
CAS/CRS Farm
● Farm of 1423 dual-CPU (Intel) systems
– Added 335 machines this year
● ~245 TB of local disk storage (SCSI and IDE)
● Upgrade of the RHIC Central Analysis Servers/Central Reconstruction Servers (CAS/CRS) to Scientific Linux (plus updates) underway: should be complete before the next RHIC run
CAS/CRS Farm (Cont.)
● LSF (5.1) and Condor (6.6.6/6.6.5) batch systems in use; upgrade to LSF 6.0 planned (see the submission sketch below)
● Kickstart used to automate node installation
● GANGLIA plus custom software used for system monitoring
● Phasing out the original RHIC CRS batch system: replacing it with a system based on Condor
● Retiring 142 VA Linux 2U PIII 450 MHz systems after the next purchase
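A minimal sketch of handing a job to the Condor pool through a submit description file. The executable name, arguments, and output file names are hypothetical placeholders.

```python
#!/usr/bin/env python
# Write a Condor submit description file and hand it to condor_submit.
import os

SUBMIT_FILE = "reco_job.submit"

submit_description = """\
universe   = vanilla
executable = run_reco.sh
arguments  = run5_segment_001
output     = reco_001.out
error      = reco_001.err
log        = reco_001.log
queue
"""

f = open(SUBMIT_FILE, "w")
f.write(submit_description)
f.close()

# condor_submit hands the job description to the local schedd
os.system("condor_submit %s" % SUBMIT_FILE)
```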
Security
● Eliminating NIS; complete transition to Kerberos 5/LDAP in progress
● Expect a K5 TGT to X.509 certificate transition in the future: KCA?
● Hardening/monitoring of all internal systems
● Growing web service issues: unknown services being accessed through port 80
Grid Activities
● Brookhaven planning to upgrade external network connectivity from OC12 (622 Mbps) to OC48 (2.488 Gbps) to support ATLAS activity
● ATLAS Data Challenge 2: jobs submitted via Grid3
● GUMS (Grid User Management System)
– Generates grid-mapfiles for gatekeeper hosts (format sketched below)
– In production since May 2004
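For illustration, a small sketch of the grid-mapfile format that GUMS produces for the gatekeepers: each line maps a quoted certificate DN to a local account. The DN and account shown are made up.

```python
#!/usr/bin/env python
# Parse grid-mapfile lines of the form: "<certificate DN>" <local account>
import re

EXAMPLE = '"/DC=org/DC=doegrids/OU=People/CN=Jane Physicist 12345" usatlas1\n'

def parse_gridmap(lines):
    """Return a dict of {certificate DN: local account}."""
    mapping = {}
    for line in lines:
        m = re.match(r'^"(?P<dn>[^"]+)"\s+(?P<account>\S+)', line)
        if m:
            mapping[m.group("dn")] = m.group("account")
    return mapping

print(parse_gridmap([EXAMPLE]))
```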
Storage Resource Manager (SRM)
● SRM: middleware providing dynamic storage allocation and data management services
– Automatically handles network/space allocation failures (see the retry sketch below)
● HRM (Hierarchical Resource Manager)-type SRM server in production
– Accessible from within and outside the facility
– 350 GB cache
– Berkeley HRM 1.2.1
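A conceptual sketch only of the retry-on-transient-failure behaviour such a layer provides on top of raw transfers; the wrapper and the wrapped command are hypothetical and are not the Berkeley HRM client interface.

```python
#!/usr/bin/env python
# Conceptual retry wrapper: re-run a transfer command until it succeeds,
# mimicking how an SRM/HRM layer recovers from network/space failures.
import os
import time

def transfer_with_retry(transfer_command, max_attempts=5, backoff_seconds=60):
    """Run transfer_command, retrying with a fixed backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        status = os.system(transfer_command)
        if status == 0:
            return True
        print("attempt %d failed (status %d), retrying in %ds"
              % (attempt, status, backoff_seconds))
        time.sleep(backoff_seconds)
    return False

if __name__ == "__main__":
    # Placeholder invocation; in practice this wraps whatever transfer
    # client the site provides for a given source and destination.
    transfer_with_retry("true", max_attempts=3, backoff_seconds=5)
```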
dCache
● Provides a global namespace over disparate storage elements
– Hot-spot detection
– Client data access through the libdcap library or the libpdcap preload library (see the sketch below)
● ATLAS and PHENIX dCache pools
– PHENIX pool expanding performance tests to production machines
– ATLAS pool interacts with HPSS using HSI: no way of throttling data transfer requests yet
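A sketch of the preload-library access path: an unmodified binary is started with libpdcap.so preloaded so that its ordinary POSIX open()/read() calls on dCache paths are routed through dcap. The library location and /pnfs path are hypothetical examples.

```python
#!/usr/bin/env python
# Run an existing tool with the dcap preload library so it can read a
# dCache-resident file by path. Paths below are hypothetical.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/lib/libpdcap.so"    # hypothetical install location

# Any existing binary can now read the dCache-resident file.
subprocess.call(
    ["md5sum", "/pnfs/usatlas.bnl.gov/data/example/file.root"],
    env=env)
```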