BNL dCache Status and Plan
CHEP07, September 2-7, 2007
Zhenping (Jane) Liu, for the BNL RACF Storage Group

Outline
- dCache system instances at BNL RACF
- PHENIX (RHIC) dCache system
- USATLAS production dCache system
  - Architecture
  - Network interface
  - Servers
  - Transfer statistics
- dCache monitoring
- Issues
- Current upgrade activities and further plans

BNL dCache system instances
- USATLAS production dCache
- PHENIX production dCache
- SRM 2.2 dCache testbed
- OSG dCache testbed

PHENIX production dCache
- 450 pools, 565 TB of storage, 720K files on disk (212 TB)
- Currently used as the end repository and archiving mechanism for the PHENIX data production stream
- dccp is the primary transfer mechanism within the PHENIX Anatrain
- SRM is used for offsite transfers, e.g., the recent data transfer to IN2P3 Lyon (see the sketch below)
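As a loose illustration of the two transfer paths above, a minimal Python sketch is shown below. The door host, port, and file paths are hypothetical placeholders, not the actual BNL or IN2P3 endpoints, and only the basic dccp/srmcp invocations are assumed.

```python
# Illustrative only: the two PHENIX transfer paths described above, driven from
# Python. Door host, port, and PNFS/SRM paths are hypothetical placeholders.
import subprocess

def dccp_copy(pnfs_path, local_path, door="dcap-door.example.bnl.gov", port=22125):
    """On-site copy with dccp, the primary mechanism inside the Anatrain."""
    src = f"dcap://{door}:{port}{pnfs_path}"
    subprocess.run(["dccp", src, local_path], check=True)

def srm_copy(local_path, remote_srm_url):
    """Offsite transfer via SRM (srmcp); assumes a valid grid proxy.
    The exact local file URL form may need adjusting for the srmcp version."""
    subprocess.run(["srmcp", "file://" + local_path, remote_srm_url], check=True)

if __name__ == "__main__":
    dccp_copy("/pnfs/example.gov/phenix/run7/file0001.root", "/tmp/file0001.root")
    srm_copy("/tmp/file0001.root",
             "srm://srm.example.in2p3.fr:8443/pnfs/in2p3.fr/data/phenix/file0001.root")
```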

USATLAS Production dCache
- The USATLAS Tier-1 dCache has been in production use since Oct. 2004 and has participated in a series of Service Challenges since then
- Large-scale, grid-enabled, distributed disk storage system
  - 582 nodes in total (15 core servers, 555 read servers, 12 write servers)
- dCache PNFS name space: 904 TB (production plus SC) as of the end of May 2007
- Disk pool space: 762 TB
- Grid-enabled (SRM, GSIFTP) Storage Element
- HPSS as the back-end tape system
  - Efficient and optimized tape data access (Oak Ridge Batch System)
- Low-cost, locally mounted disk space on the computing farm is used as read pool space
- Dedicated write pool servers
- GFTP doors act as adapters for grid traffic
  - All grid traffic should go over the GFTP doors; this does not yet work that way for all transfer scenarios

USATLAS dCache architecture (diagram): HPSS back end; write pool (12 nodes); read pool (555 nodes); SRM door (2 nodes, SRM/SRM DB); GridFTP doors (8 nodes, dual NIC); other dCache core services (5 nodes: admin, pnfs, slony, maintenance, dCap); Oak Ridge Batch System. Required bandwidths are indicated on the diagram links (~150, ~350, ~350, ~400, ~500 and ~550 MB/s, plus ~50 MB/s for other traffic) for traffic through the BNL firewall to/from CERN, the other Tier-1s and the Tier-2s.

dCache servers
- Core servers and the components running on them
  - PNFS node: PnfsManager, dir, pnfs, PNFS DB
  - Slony PNFS backup node: Slony
  - Admin node: Admin, LocationManager, PoolManager, AdminDoor
  - Maintenance node: InfoProvider, statistics
  - SRM door node: SRM, Utility
  - SRM DB node: SRM DB, Utility DB
  - GridFTP door node: GFTP door
  - DCap door node: DCap
- CPU, memory and OS
  - PNFS, Slony, admin, maintenance, SRM and SRM DB nodes (just upgraded): 4-core CPU, 8 GB memory
  - SAS disks for servers running a database (PNFS, Slony, SRM DB, maintenance); SATA for critical servers without a database (admin, SRM)
  - OS: RHEL 4, 64-bit; 32-bit PNFS, 64-bit applications for the rest

dCache servers (cont.)
- GridFTP door nodes, DCap door node
  - 2-core CPU, 4 GB memory
  - OS: RHEL 4, 32-bit; 32-bit dCache application
- Write servers
  - 2-core CPU, 4 GB memory
  - OS: RHEL 4, 32-bit; 32-bit dCache application
  - XFS file system; software RAID; SCSI disks
- Read servers
  - Run on the worker nodes; CPU and memory vary
  - OS: SL4, 32-bit; 32-bit dCache application
  - EXT3 file system
  - Read pool space varies per node
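The per-node layout above can be captured as a simple map that a monitoring script compares against what is actually running. The sketch below is a rough illustration only: the hostnames are hypothetical, the cell names are taken from the lists above, and matching cell names against Java command lines is a deliberately naive approximation of a real check.

```python
# Rough sketch: the per-node dCache cell layout from the lists above expressed
# as data, compared with the Java processes actually running on each node.
# Hostnames are hypothetical; the string match is a naive approximation.
import subprocess

EXPECTED_CELLS = {
    "pnfs01.example.bnl.gov":  ["PnfsManager", "dir", "pnfs"],
    "admin01.example.bnl.gov": ["LocationManager", "PoolManager", "AdminDoor"],
    "srm01.example.bnl.gov":   ["SRM"],
    "gftp01.example.bnl.gov":  ["GFTP"],
    "dcap01.example.bnl.gov":  ["DCap"],
}

def java_cmdlines(host):
    """Command lines of Java processes on a node, collected over ssh."""
    out = subprocess.run(["ssh", host, "ps", "-C", "java", "-o", "args="],
                         capture_output=True, text=True)
    return out.stdout.splitlines()

def missing_cells(host):
    """Cells expected on the node (per the map above) but not found running."""
    lines = [line.lower() for line in java_cmdlines(host)]
    return [cell for cell in EXPECTED_CELLS.get(host, [])
            if not any(cell.lower() in line for line in lines)]

if __name__ == "__main__":
    for node in EXPECTED_CELLS:
        print(node, "missing:", missing_cells(node) or "none")
```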

Transfer Statistics (2007 Jan-Jun)

ATLAS data volume at BNL RACF (almost all of the data is in dCache)

dCache Monitoring
- Ganglia
  - Load, network, memory usage, disk I/O, etc.
- Nagios
  - Disks full or nearly full
  - Node crashes and disk failures
  - dCache cells offline, pool space usage, restore request status
  - dCache probe (internal/external; dccp/globus-url-copy/srmcp); see the sketch below
  - Checks that dCache processes are listening on the correct ports
  - Host certificate expiration, CRL expiration
- Monitoring scripts
  - Oak Ridge Batch System monitoring tool
  - Check log files for signs of trouble
  - Monitor the dCache Java processes
  - Health monitoring and automatic service restarts when needed
- Others
  - Off-hours operation; system administrator paging
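A stripped-down version of the "dCache probe" idea above might look like the sketch below: check that each door is listening on its expected port and that a small end-to-end copy succeeds. The hosts, ports, and test file path are placeholder assumptions; the actual BNL probes are not reproduced here.

```python
# Sketch of a minimal dCache probe: verify the doors are listening and that a
# small end-to-end dccp copy works. Hosts, ports, and paths are placeholders.
import os
import socket
import subprocess
import tempfile

DOORS = {
    "srm.example.bnl.gov": 8443,       # SRM door
    "gridftp.example.bnl.gov": 2811,   # GridFTP door
    "dcap.example.bnl.gov": 22125,     # dCap door
}

def port_open(host, port, timeout=5):
    """Basic 'is the door listening on the expected port' check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def copy_probe(cmd):
    """Run one small copy (dccp / globus-url-copy / srmcp) and report success."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0

if __name__ == "__main__":
    for host, port in DOORS.items():
        status = "listening" if port_open(host, port) else "NOT listening"
        print(f"{host}:{port} {status}")
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "probe.out")
        ok = copy_probe(["dccp",
                         "dcap://dcap.example.bnl.gov:22125/pnfs/example.gov/test/probe_file",
                         dest])
        print("dccp probe:", "ok" if ok else "FAILED")
```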

Issues
- PNFS bottleneck
  - Hardware improvement; Chimera deployment
- SRM performance issue; SRM bottleneck
  - Software improvement; hardware improvement; SRM DB and SRM separated
- High load on the write pool nodes, with poor data I/O when handling concurrent reads and writes
  - Better hardware needed
- High load on the GFTP door nodes
  - More GFTP doors needed

Issues (cont.)
- Heavy maintenance workload
  - More automatic monitoring and maintenance tools needed
- The production team requires important data to stay on disk, but this is not always the case yet
  - Need to "pin" those data on read pool disk

Current upgrade activities and further plans
- System just upgraded
  - v (SRM improved)
  - DB and dCache applications remain separated
  - Maintenance components moved off the admin node
  - Slony as the PNFS replication mechanism; routine PNFS backups moved from the pnfs node to the Slony node
  - Hardware upgraded on most core servers
  - On most core servers the hardware and OS were upgraded to 64-bit and 64-bit dCache applications deployed, except for PNFS
- Further upgrade plan
  - Adding five Sun Thumpers as write pools (ongoing)
    - Based on evaluation results, we expect the write I/O rate limit on each pool node to go from 15 MB/s to at least 100 MB/s with concurrent inbound and outbound traffic (see the estimate below)
  - Adding more GFTP doors
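To put the expected improvement in perspective, a back-of-the-envelope estimate based only on the figures quoted above: the per-node rates come from the slide, the aggregates are derived here, and how the Thumpers combine with the existing 12 write-pool nodes after the upgrade is an assumption.

```python
# Back-of-the-envelope write-capacity figures from the numbers quoted above.
# Per-node rates are from the slide; the aggregates are derived here, and the
# mix of old and new write-pool nodes after the upgrade is an assumption.
EXISTING_WRITE_NODES = 12
CURRENT_MB_S_PER_NODE = 15        # observed limit with concurrent read + write
THUMPERS = 5
EXPECTED_MB_S_PER_THUMPER = 100   # "at least 100 MB/s" from the evaluation

current_aggregate = EXISTING_WRITE_NODES * CURRENT_MB_S_PER_NODE   # ~180 MB/s
thumper_aggregate = THUMPERS * EXPECTED_MB_S_PER_THUMPER           # >= 500 MB/s

print(f"existing write pools: ~{current_aggregate} MB/s aggregate")
print(f"five Thumpers alone:  >= {thumper_aggregate} MB/s aggregate")
```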

Current upgrade activities and further plans (cont.)
- Deploying the HoppingManager and a transfer pool to "pin" important production data on read pool disk (see the sketch below)
  - Tested through
- High availability for critical servers such as PNFS, the admin node, SRM and SRM DB
  - Failover and recovery of stopped or interrupted services
- Adding more monitoring packages
  - SRM watch
  - FNAL monitoring tool
  - More from OSG and other sites
- Chimera v1.8 evaluation and deployment (a must for BNL)
  - Improved file system engine
  - Performance scales with the back-end database implementation (Oracle cluster)
  - Scales to the petabyte range; estimated USATLAS Tier-1 disk capacity: Y ,556 TB; Y ,610 TB; Y ,921 TB; Y ,262 TB; Y ,427 TB
- SRM 2.2 deployment
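The HoppingManager/transfer-pool setup above is an internal dCache mechanism and is not reproduced here. Purely as a loosely related client-side illustration, the sketch below issues prestage requests with dccp -P so that selected files are brought onto pool disk; this is a stage request, not a true pin, and the door host and paths are hypothetical.

```python
# Client-side illustration only: issue prestage requests with "dccp -P" for a
# list of important files. This is NOT the HoppingManager/transfer-pool
# mechanism described above; door host and PNFS paths are hypothetical.
import subprocess

DOOR = "dcap://dcap.example.bnl.gov:22125"

def prestage(pnfs_paths):
    for path in pnfs_paths:
        # -P asks dCache to stage the file without copying it to the client
        subprocess.run(["dccp", "-P", f"{DOOR}{path}"], check=False)

prestage([
    "/pnfs/example.gov/atlas/prod/dataset1/file1.root",
    "/pnfs/example.gov/atlas/prod/dataset1/file2.root",
])
```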

“Pin” data in read pool disk

SUN Thumper Test Results
- 150 clients sequentially reading 5 random 1.4 GB files each
  - Throughput was 350 MB/s for almost 1 hour
- 75 clients sequentially writing 3 x 1.4 GB files and 75 clients sequentially reading 4 x 1.4 GB randomly selected files
  - Throughput was 200 MB/s write and 100 MB/s read
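For scale, the per-client rates implied by the aggregate figures above work out as in the sketch below; the division is done here and does not appear on the slide.

```python
# Per-client rates derived from the aggregate Thumper test figures above
# (only the aggregates appear on the slide; the division is done here).
read_only = 350 / 150     # ~2.3 MB/s per client, 150 readers
mixed_write = 200 / 75    # ~2.7 MB/s per writing client
mixed_read = 100 / 75     # ~1.3 MB/s per reading client

print(f"read-only test:   {read_only:.1f} MB/s per client")
print(f"mixed test write: {mixed_write:.1f} MB/s per client")
print(f"mixed test read:  {mixed_read:.1f} MB/s per client")
```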