USATLAS dCache System and Service Challenge at BNL Zhenping (Jane) Liu RHIC/ATLAS Computing Facility, Physics Department Brookhaven National Lab 10/13/2005 HEPIX Fall 2005 at SLAC

Outline
USATLAS dCache system at BNL: overview of the system, usage of the system, experiences, long-term plan.
Service Challenge

USATLAS dCache system at BNL
A distributed disk caching system serving as a front end for the Mass Storage System (BNL HPSS).
In production service for ATLAS users since November 2004.

Benefits of using dCache
Allows transparent access to a large number of data files distributed over disk pools or stored on the HSM (HPSS).
Provides users with one unique namespace for all the data files; a file system view of the namespace is available through an NFS 2/3 interface.
Data is distributed among a large number of inexpensive disk servers.
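
As a rough illustration of what this looks like from a client node (a minimal sketch only; the mount point and file name below are hypothetical, and the dccp tool must be installed), the namespace can be browsed through the NFS mount with ordinary filesystem calls, while the file contents are read through a dCache door:

    import os
    import subprocess

    # Hypothetical PNFS namespace path; the actual NFS mount point is site-specific.
    NAMESPACE = "/pnfs/usatlas.bnl.gov/data"

    # Browsing the single namespace works like any local filesystem via the NFS 2/3 mount.
    for entry in os.listdir(NAMESPACE):
        print(entry)

    # Reading the data itself goes through a dCache door, e.g. with the dccp client tool
    # (dccp <source> <destination>); the file name here is made up.
    subprocess.run(
        ["dccp", os.path.join(NAMESPACE, "example.dataset.root"), "/tmp/example.dataset.root"],
        check=True,
    )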

Benefits of using dCache (Cont.)
Significantly improves the efficiency of the connected tape storage system through caching, i.e. gather-and-flush and scheduled staging techniques.

Benefits of using dCache (Cont.)
Clever selection mechanism and flexible system tuning:
The system determines whether a file is already stored on one or more disks or in HPSS.
The system selects the source or destination dCache pool based on the client's storage group and network mask, the I/O direction, "CPU" load and disk space, and the configuration of the dCache pools.

Benefits of using dCache (Cont.)
Load balanced and fault tolerant:
Automatic load balancing using a cost metric and inter-pool transfers.
Files are dynamically replicated upon detection of hot spots.
Allows multiple distributed servers of each type, e.g. read pools, write pools, DCAP doors, SRM doors, GridFTP doors.
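
The cost-based pool selection can be pictured with a toy model like the one below; this is only an illustration of the idea, not dCache's actual cost formula, and the pool names and weights are made up:

    # Toy sketch of cost-based pool selection: choose the pool with the lowest combined
    # load ("performance") and space cost. Not dCache's real algorithm.
    pools = [
        {"name": "pool-r01", "active_movers": 12, "free_fraction": 0.40},
        {"name": "pool-r02", "active_movers": 3,  "free_fraction": 0.10},
        {"name": "pool-r03", "active_movers": 7,  "free_fraction": 0.55},
    ]

    def cost(pool, load_weight=1.0, space_weight=1.0):
        # More active movers -> higher cost; more free space -> lower cost.
        return load_weight * pool["active_movers"] + space_weight * (1.0 - pool["free_fraction"])

    best = min(pools, key=cost)
    print("selected:", best["name"])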

Benefits of using dCache (Cont.)
Scalability:
Distributed movers and access points (doors).
Highly distributed storage pools.
Direct client-to-pool and pool-to-HSM (HPSS) connections.

Benefits of using dCache (Cont.)
Support for various access protocols:
Local access protocol: DCAP (POSIX-like).
GsiFTP (GridFTP) data transfer protocol: secure wide-area data transfer.
Storage Resource Manager protocol (SRM), providing an SRM-based storage element: space allocation, transfer protocol negotiation, dataset pinning, checksum management.
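
For example, a transfer through the SRM door might look like the sketch below (the door host, port, and paths are hypothetical; srmcp is an SRM command-line client, and the exact local-file URL form can vary between srmcp versions):

    import subprocess

    # Hypothetical SRM door and dataset path; srmcp negotiates the underlying
    # transfer protocol (GridFTP) with the server.
    src = "srm://srm-door.example.bnl.gov:8443/pnfs/usatlas.bnl.gov/data/example.dataset.root"
    dst = "file:///tmp/example.dataset.root"

    subprocess.run(["srmcp", src, dst], check=True)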

USATLAS dCache system at BNL
Hybrid model for read pool servers:
Read pool servers (the majority of dCache servers) share resources with worker nodes; each worker node in the Linux farm acts as both a storage and a compute node.
An inexpensive Linux farm solution to achieve high-performance data I/O throughput.
Dedicated critical servers: a dedicated PNFS node, various door nodes, and write pool nodes.

USATLAS dCache system at BNL (Cont.)
Optimized backend tape prestage batch system: the Oak Ridge Batch System. Current version: v.
System architecture: see the next slide.

[Architecture diagram of the dCache system, showing DCap, GridFTP, and SRM clients; DCap doors, GridFTP doors, and an SRM door; the PnfsManager and PoolManager (control channel); read and write pools (data channel); the Oak Ridge batch system; and HPSS.]

USATLAS dCache system at BNL (Cont.)
Note: "shared" means that the servers share resources with worker nodes.

Usage of the system
Total amount of datasets (only production data counted): TB as of 10/04/2005 (123 TB in HPSS for the ATLAS archive).
Grid production jobs have used dCache as their data source, with positive feedback; globus-url-copy has been the client in the past.
The future production system will use dCache as both data source and destination, and as a repository of intermediate data; it will use SRMCP as the client, and the DCAP protocol will be selected instead of GridFTP for higher throughput when jobs and data are both at the BNL site.
The SC3 testing phase used the production dCache.

Users and use patterns
On-site clients at BNL:
Local analysis applications on the Linux farm (dccp client tool or dCap library): users write RAW data to dCache (HPSS), analyze/reanalyze it on the farm, then write results back into dCache (HPSS).
Grid production jobs submitted to the BNL Linux farm (globus-url-copy): currently use dCache only as a data source; will use it as source, intermediate repository, and destination.
Other on-site users from interactive nodes (dccp).
Off-site grid users:
GridFTP clients: grid production jobs submitted to remote sites.
SRM clients: other grid users.
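
As a sketch of the grid-production use case, a job running on a remote worker node might pull its input from the BNL GridFTP door roughly as follows (the door host and paths are hypothetical; globus-url-copy comes with the Globus toolkit and a valid grid proxy is assumed to exist in the job environment):

    import subprocess

    # Hypothetical GridFTP door and dataset path; globus-url-copy requires an
    # existing grid proxy in the environment.
    src = "gsiftp://gridftp-door.example.bnl.gov:2811/pnfs/usatlas.bnl.gov/data/input.root"
    dst = "file:///scratch/job/input.root"

    subprocess.run(["globus-url-copy", src, dst], check=True)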

Experiences and issues
Read pool servers sharing resources with worker nodes: utilizes idle disk on compute nodes; the hybrid model works fine.
Write pool servers should run on dedicated machines: they crashed frequently when sharing nodes with computing, and dedicated servers solved the problem. XFS shows better performance than EXT3. Reliable disks are needed.

Experiences and issues (Cont.)
Potential PNFS bottleneck: multiple metadata (PNFS) databases should be used for better performance; the PostgreSQL PNFS database shows better performance and stability than the GDBM database.
Issue: no quota control on the number of prestage requests that one user can submit at a time.

Experiences and issues (Cont.)
No support for the globus-url-copy client to do 3rd-party transfers. SRMCP supports 3rd-party transfers, but it is not easy to push the SRMCP client tool to every site; in any case, the next version of the USATLAS production system will use the SRMCP client.
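
A minimal sketch of the SRMCP third-party case: both endpoints are SRM URLs, so the data moves directly between the two storage elements rather than through the client host (the hosts and paths below are hypothetical):

    import subprocess

    # Both source and destination are SRM URLs, so srmcp arranges a direct
    # storage-to-storage (third-party) transfer; hosts and paths are made up.
    src = "srm://srm-source.example.cern.ch:8443/castor/cern.ch/atlas/sc3/file001"
    dst = "srm://srm-door.example.bnl.gov:8443/pnfs/usatlas.bnl.gov/data/sc3/file001"

    subprocess.run(["srmcp", src, dst], check=True)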

Experiences and issues (Cont.)
The current system is stable: continuously running since the last restart on July 21st, even with the intensive SC3 phase in the middle. One problem: on average, one read pool server develops a bad disk per week, which is still reasonable.
System administration was not easy in the early phase but is much better now: great help from the DESY and FNAL dCache project teams, more documentation, steadily improving software, and automatic monitoring scripts we developed to avoid, detect, or solve problems.

Long-term plan
To build a petabyte-scale grid-enabled storage element:
Use petabyte-scale disk space on thousands of farm nodes to hold the most recently used data on disk.
ATLAS experiment running will generate data volumes on the petabyte scale each year.
HPSS as tape backup for all data.

Long-term plan (Cont.)
dCache as the grid-enabled distributed storage element solution. Issues that need to be investigated:
Is dCache scalable to very large clusters (thousands of nodes)? We expect a higher metadata access rate, and metadata database management is currently centralized, so it is a potential bottleneck in dCache.
Many (e.g. 20) large dCache systems, or a few very large ones?
Will network I/O be a bottleneck for a very large cluster? How do we decrease internal data I/O and network I/O on the Linux farm? A file-affinity job scheduler (???); see the sketch below.
Monitoring and administration of a petabyte-scale disk storage system.
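
A purely illustrative sketch of the file-affinity idea, not an existing dCache or batch-system feature: the scheduler looks up which worker nodes already hold a replica of the job's input file on their local read pool and prefers one of those, falling back to any free node otherwise (all names below are made up):

    # Illustrative file-affinity scheduling sketch; replica map and node names are invented.
    replica_locations = {          # file -> nodes whose local read pool holds a replica
        "data/evgen.0001.root": {"node07", "node12"},
        "data/evgen.0002.root": {"node03"},
    }
    free_nodes = {"node03", "node05", "node12"}

    def pick_node(input_file):
        local = replica_locations.get(input_file, set()) & free_nodes
        # Prefer a node that already caches the file; otherwise any free node
        # (the data would then be fetched over the farm network).
        return next(iter(local or free_nodes))

    print(pick_node("data/evgen.0001.root"))   # picks node12, which has a local replica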

Service Challenge
Goal: to test the readiness of the overall computing system to provide the necessary computational and storage resources to exploit the scientific potential of the LHC machine.
SC2: disk-to-disk transfers from CERN to BNL.
SC3 testing phase: disk-to-disk transfers from CERN to BNL, disk-to-tape transfers from CERN to BNL, and disk-to-disk transfers from BNL to Tier-2 centers.

SC2 at BNL
Testbed dCache: four dCache pool servers with a 1 Gigabit WAN network connection.
SRMCP was used for transfer control; only two sites used SRM in SC2.
Met the performance/throughput challenge (disk-to-disk transfer rate of 70~80 MB/s from CERN to BNL).

One day of data transfers during SC2

SC3 testing phase
Steering: FTS. Control: SRM. Transfer protocol: GridFTP.
The production dCache system was used, with a network upgrade to 10 Gbps between the USATLAS storage system and the BNL BGP router.
Disk-to-disk transfers from CERN to BNL: achieved rate of 100~120 MB/s, with a peak rate of 150 MB/s (sustained for one week).
Disk-to-tape transfers from CERN to BNL HPSS: achieved rate of 60 MB/s (sustained for one week).
Disk-to-disk transfer testing from BNL to Tier-2 centers (BU, UC, IU, UTA): aggregate transfer rate of 30~40 MB/s. Issues: a dCache SRM problem, Tier-2 network bandwidth, and Tier-2 storage systems.
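
To put the sustained rates in perspective, a quick back-of-the-envelope conversion of the figures above into daily volumes (decimal units, 1 TB = 10^6 MB):

    # Convert the sustained SC3 transfer rates (MB/s) into approximate volume per day.
    SECONDS_PER_DAY = 86400

    for label, mb_per_s in [("disk-to-disk, low", 100),
                            ("disk-to-disk, high", 120),
                            ("disk-to-tape", 60)]:
        tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6
        print(f"{label}: ~{tb_per_day:.1f} TB/day")   # ~8.6, ~10.4, ~5.2 TB/day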

SC3

Top daily averages for dCache sites

Benefits of SC activities
Helped us identify problems and potential issues in the dCache storage system under intensive client usage, and report them to the dCache developers (e.g. the SRM pinManager crashed frequently).
Better understanding of potential system bottlenecks (PNFS).
More monitoring and maintenance tools developed; the system is more stable with the fixes.

USATLAS Tier-2 dCache deployment
USATLAS Tier-1/2 dCache Workshop, BNL, September 12-13, 2005.
UC, UTA, and OU have deployed testbed dCache systems and are ready for the SC3 service phase.

dCache development
BNL plans to contribute to dCache development; this is at a very early phase, and we are still looking for possible topics. One interesting topic: a file-affinity job scheduler (integration of dCache and the job scheduler).
Manpower increased in September; now 2 FTE.

Links
BNL dCache user guide website.
USATLAS Tier-1 & Tier-2 dCache systems.
USATLAS dCache workshop.