
1 BNL dCache Status and Plan
CHEP07: September 2-7, 2007
Zhenping (Jane) Liu for the BNL RACF Storage Group

2 Outline
 dCache system instances at BNL RACF
 PHENIX (RHIC) dCache system
 USATLAS Production dCache
   System
   Architecture
   Network interface
   Servers
   Transfer statistics
 dCache Monitoring
 Issues
 Current upgrade activities and further plans

3 BNL dCache system instances
 USATLAS production dCache
 PHENIX production dCache
 SRM 2.2 dCache testbed
 OSG dCache testbed

4 PHENIX production dCache
 450 pools, 565 TB storage, 720K files on disk (212 TB).
 Currently used as the end repository and archiving mechanism for the PHENIX data production stream.
 dccp is the primary transfer mechanism within the PHENIX Anatrain (see the sketch after this list).
 SRM is used for offsite transfers, e.g., the recent data transfer to IN2P3 Lyon.
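As a rough illustration of the two transfer paths named above, the sketch below wraps dccp (on-site copies) and srmcp (offsite SRM transfers) with Python's subprocess module. The door hostnames, port, and PNFS paths are hypothetical placeholders, not the actual BNL or IN2P3 endpoints.

```python
#!/usr/bin/env python
"""Minimal sketch of the two transfer paths on this slide; all hostnames
and paths are hypothetical placeholders."""
import subprocess

def dccp_copy(pnfs_path, local_path, door="dcap-door.example.bnl.gov", port=22125):
    """On-site copy with dccp, the primary PHENIX transfer mechanism."""
    source = "dcap://%s:%d%s" % (door, port, pnfs_path)
    return subprocess.call(["dccp", source, local_path])

def srm_copy(source_url, dest_url):
    """Offsite transfer via SRM with srmcp, e.g. an SRM-to-SRM copy to another site."""
    return subprocess.call(["srmcp", source_url, dest_url])

if __name__ == "__main__":
    dccp_copy("/pnfs/example.gov/phenix/run7/somefile.root", "/tmp/somefile.root")
    srm_copy("srm://srm.example.bnl.gov:8443/pnfs/example.gov/phenix/somefile.root",
             "srm://srm.example-lyon.fr:8443/pnfs/example.fr/phenix/somefile.root")
```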

5 USATLAS Production dCache
 USATLAS Tier1 dCache deployed for production use since Oct. 2004; it has also participated in a series of Service Challenges since then.
 Large-scale, grid-enabled, distributed disk storage system
   582 nodes in total (15 core servers, 555 read servers, 12 write servers)
 dCache PNFS name space
   904 TB (Production: 583 TB, SC: 321 TB) as of end of May 2007
   Disk pool space: 762 TB
 Grid-enabled (SRM, GSIFTP) Storage Element
 HPSS as the back-end tape system
   Efficient and optimized tape data access (Oak Ridge Batch System)
 Low-cost, locally mounted disk space on the computing farm as read pool disk space
 Dedicated write pool servers
 GFTP doors as adapters for grid traffic
   All grid traffic should go over GFTP doors; this does not yet work that way for all transfer scenarios.

6 USATLAS dCache (architecture diagram). Components shown: HPSS; write pool (12 nodes); read pool (555 nodes); SRM door (2 nodes: SRM, SRM DB); gridFTP doors (8 nodes, dual NIC); other dCache core services (5 nodes: admin/pnfs/slony/maintenance/dCap); Oak Ridge Batch System; BNL firewall. Required bandwidths are indicated on the diagram, ranging from ~50 MB/s (traffic to/from others) up to ~550 MB/s, with traffic to/from CERN, other Tier1s, and Tier2s crossing the BNL firewall.

7

8 dCache servers
 Core servers
   Components running
     PNFS node: pnfsManager, dir, pnfs, PNFS DB
     Slony PNFS backup node: Slony
     Admin node: Admin, LocationManager, PoolManager, AdminDoor
     Maintenance node: InfoProvider, statistics
     SRM door node: SRM, Utility
     SRM DB node: SRM DB, Utility DB
     GridFTP door node: GFTP door
     DCap door node: DCap
   CPU, memory and OS
     PNFS, Slony, admin, maintenance, SRM, SRM DB nodes (just upgraded)
     4-core CPU, 8 GB memory
     SAS disks for servers running a DB (PNFS, Slony, SRM DB, maintenance); SATA for critical servers without a DB (admin, SRM)
     OS: RHEL 4, 64-bit
     32-bit PNFS; 64-bit applications for the others

9 dCache servers (Cont.)
 GridFTP door nodes, DCap door node
   2-core CPU, 4 GB memory
   OS: RHEL 4, 32-bit; 32-bit dCache application
 Write servers
   CPU, memory, OS, file system and disk
     2-core CPU, 4 GB memory
     OS: RHEL 4, 32-bit; 32-bit dCache application
     XFS file system; software RAID; SCSI disks
 Read servers
   CPU, memory, OS, file system and disk
     Running on worker nodes; CPU and memory vary
     OS: SL4, 32-bit; 32-bit dCache application
     EXT3 file system
     Read pool space varies

10 Transfer Statistics (2007 Jan-Jun)

11 ATLAS data volume at BNL RACF (almost all of the data is in dCache)

12 dCache Monitoring
 Ganglia
   Load, network, memory usage, disk I/O, etc.
 Nagios
   Disk becoming full or nearly full
   Node crashes and disk failures
   dCache cell offline, pool space usage, restore request status
   dCache probe (internal/external; dccp/globus-url-copy/srmcp); a simple port-check sketch follows after this list
     Checks whether dCache processes are listening on the correct ports
   Host certificate expiration, CRL expiration
 Monitoring scripts
   Oak Ridge Batch System monitoring tool
   Check log files for signs of trouble
   Monitor dCache Java processes
     Health monitoring and automatic service restart when needed
 Others
   Off-hour operation; system administrator paging
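A minimal sketch of the port-liveness part of such a probe, written as a Nagios-style check. The door hostnames are hypothetical and the port numbers are common dCache defaults (dcap 22125, GridFTP 2811, SRM 8443), not necessarily the ones used at BNL; this is not the actual BNL probe.

```python
#!/usr/bin/env python
"""Nagios-style sketch: verify that the dCache doors are listening."""
import socket

CHECKS = [
    ("dcap door",    "dcap-door.example.bnl.gov", 22125),
    ("GridFTP door", "gftp-door.example.bnl.gov", 2811),
    ("SRM door",     "srm-door.example.bnl.gov",  8443),
]

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error:
        return False
    finally:
        s.close()

if __name__ == "__main__":
    failures = [name for (name, host, port) in CHECKS if not port_open(host, port)]
    if failures:
        # In a Nagios plugin, exit status 2 means CRITICAL and can trigger
        # the off-hour administrator paging mentioned on the slide.
        print("CRITICAL: not listening: %s" % ", ".join(failures))
        raise SystemExit(2)
    print("OK: all monitored dCache doors are listening")
```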

13 Issues
 PNFS bottleneck
   Hardware improvement; Chimera deployment
 SRM performance issue; SRM bottleneck
   Software improvement; hardware improvement; SRM DB and SRM separated
 High load on write pool nodes, with poor data I/O when handling concurrent reads and writes
   Better hardware needed
 High load on GFTP door nodes
   More GFTP doors needed

14 Issues (Cont.)
 Heavy maintenance workload.
   More automatic monitoring and maintenance tools needed.
 The production team requires important data to stay on disk, but this is not always the case yet.
   Need to "Pin" that data in read pool disk.

15 Current upgrade activities and further plans
 System just upgraded
   v1.7.0-41 (SRM improved)
   DB and dCache applications stay separated
   Maintenance components moved off the admin node
   Slony as the PNFS replication mechanism; routine PNFS backups moved from the pnfs node to the Slony node
 Hardware upgraded on most core servers
   On most core servers, hardware and OS upgraded to 64-bit, and 64-bit dCache applications deployed except for PNFS.
 Further upgrade plan
   Adding five Sun Thumpers as write pools (ongoing)
   Based on evaluation results, we expect the write I/O rate limit on each pool node to go from 15 MB/s to at least 100 MB/s (with concurrent inbound and outbound traffic)
   Adding more GFTP doors

16 Current upgrade activities and further plans (Cont.)
 Deploying HoppingManager and transfer pools to "Pin" important production data in read pool disk.
   Tested through
 High Availability for critical servers like PNFS, the admin node, SRM, and SRM DB.
   Failover and recovery of stopped or interrupted services
 Adding more monitoring packages
   SRM watch
   FNAL monitoring tool
   More from OSG and other sites
 Chimera v1.8 evaluation and deployment (a must for BNL)
   Improved file system engine
   Performance scales with the back-end database implementation
     Oracle cluster
   Scales to the petabyte range; USATLAS Tier-1 disk capacity estimates (a quick growth calculation follows after this list):
     Y2007: 1,556 TB
     Y2008: 4,610 TB
     Y2009: 8,921 TB
     Y2010: 17,262 TB
     Y2011: 24,427 TB
 SRM 2.2 deployment
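To put those estimates in context, a trivial calculation over the figures listed above (exactly as given on the slide) shows the year-over-year growth and why the name space engine must scale to the petabyte range:

```python
# Year-over-year growth implied by the Tier-1 disk capacity estimates above (TB).
capacity_tb = {2007: 1556, 2008: 4610, 2009: 8921, 2010: 17262, 2011: 24427}

for year in sorted(capacity_tb)[1:]:
    growth = capacity_tb[year] / float(capacity_tb[year - 1])
    print("%d: %6d TB (%.1fx over %d, ~%.0f PB)" % (
        year, capacity_tb[year], growth, year - 1, capacity_tb[year] / 1024.0))
# Already by 2010 the estimate is ~17 PB of disk, hence the need for a
# name space (Chimera) that scales to the petabyte range.
```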

17 “Pin” data in read pool disk

18 SUN Thumper Test Results
 150 clients sequentially reading 5 random 1.4 GB files.
   Throughput was 350 MB/s for almost 1 hour.
 75 clients sequentially writing 3 x 1.4 GB files and 75 clients sequentially reading 4 x 1.4 GB randomly selected files.
   Throughput was 200 MB/s write and 100 MB/s read.
A sketch of this style of throughput test follows below.
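The numbers above come from the actual Thumper/dCache setup; purely as an illustration of this style of test, the sketch below spawns many client processes that each sequentially read a few randomly chosen files and reports the aggregate rate. The file paths, file count, and the use of plain local reads (rather than reads through dCache pools) are assumptions for illustration.

```python
#!/usr/bin/env python
"""Illustrative throughput-test harness: N client processes sequentially
read randomly chosen files; the parent reports aggregate MB/s. Paths and
counts are placeholders, not the real Thumper test setup."""
import random
import time
from multiprocessing import Pool

FILES = ["/thumper/testdata/file%02d.dat" % i for i in range(20)]  # hypothetical 1.4 GB files
N_CLIENTS = 150          # matches the first test case on the slide
FILES_PER_CLIENT = 5
CHUNK = 8 * 1024 * 1024  # 8 MB read buffer

def client(_):
    """One client: sequentially read a few randomly selected files, return bytes read."""
    read_bytes = 0
    for path in random.sample(FILES, FILES_PER_CLIENT):
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                read_bytes += len(chunk)
    return read_bytes

if __name__ == "__main__":
    start = time.time()
    pool = Pool(processes=N_CLIENTS)
    total = sum(pool.map(client, range(N_CLIENTS)))
    pool.close()
    pool.join()
    elapsed = time.time() - start
    print("aggregate read throughput: %.0f MB/s" % (total / (1024.0 * 1024.0) / elapsed))
```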

