SRM at Brookhaven
Ofer Rind, BNL RCF/ACF
Z. Liu, S. O’Hare, R. Popescu
CHEP04, Interlaken, 27 September 2004

Outline
- Interest in Storage Resource Managers (SRM) for the RHIC and ATLAS computing facilities
- Ongoing experience with two implementations of SRM
  o Berkeley HRM (LBNL): deployment and interoperability issues
  o dCache SRM (DESY/FNAL): deployment and development of an HPSS interface
- Future directions

Overview of the RCF/ACF
- Located at the DOE's Brookhaven National Laboratory, the RHIC Computing Facility (RCF) was formed in the mid-1990s to provide computing infrastructure for the RHIC experiments.
- In the late 1990s, it was also named the US ATLAS Tier 1 computing center.

RCF/ACF Hardware Parameters
- Linux Farm: 1350 rackmounted nodes allocated among the experiments, with 230 TB of aggregate local disk storage.
- Centralized Disk: 220 TB SAN served via NFS by 39 Sun servers.
- Mass Storage: 4 StorageTek tape silos managed by HPSS; current store of 1500 TB with a small (10 TB) disk cache. Access via PFTP and HSI.
- The large size of the data stores plus the low cost of local disk is driving interest in distributed storage solutions.
- Grid methodology is pushing the need for unified, global access to data.

Why SRM?
- In an era of grid computing and large, highly distributed data stores, we need standardized, uniform access to heterogeneous storage.
- Storage Resource Managers (SRM) are grid middleware components that provide dynamic space allocation and file management on shared storage elements, which can be disk (DRM) or tape (TRM) systems.
- SRMs complement Compute Resource Managers by providing storage reservation and information on file availability, thus facilitating the data movement necessary for the scheduling and execution of grid jobs.

SRM Features
- Smooth synchronization between storage resources
  o Pinning and releasing files
  o Allocating space dynamically on an "as needed" basis
- Insulate clients from storage and network system failures
  o e.g., transient MSS or network failure during large file transfers
- Facilitate file sharing
  o Eliminate unnecessary file transfers
- Control the number of concurrent file transfers
  o From the MSS: avoid flooding and thrashing
  o From the network: avoid flooding and packet loss
- Support the "streaming model"
  o Efficient quota-based storage management allows long-running tasks to process large numbers of files
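To make the pin-and-release lifecycle above concrete, here is a minimal client-side sketch in Python. HypotheticalSRMClient and its method names are illustrative stand-ins, not the actual SRM web-service operations; the stub simply simulates a file becoming pinned after a few polls.

    # Illustrative sketch only: HypotheticalSRMClient is a stand-in, not the real
    # SRM web-service interface; it simulates the pin/release lifecycle in memory.
    import time
    from dataclasses import dataclass

    @dataclass
    class RequestStatus:
        state: str                 # "Queued", "Pinned", ...
        transfer_url: str = ""     # e.g. a gsiftp:// URL once the file is pinned

    class HypotheticalSRMClient:
        def __init__(self):
            self._polls = {}

        def request_to_get(self, surl):
            # A real SRM would queue a stage from tape (TRM) or locate the file on disk (DRM).
            self._polls[surl] = 0
            return surl

        def status(self, request_id):
            # Pretend the file becomes pinned on SRM-managed disk after two polls.
            self._polls[request_id] += 1
            if self._polls[request_id] < 3:
                return RequestStatus("Queued")
            return RequestStatus("Pinned", "gsiftp://storage.example.org/cache/file.root")

        def release(self, request_id):
            # Unpin so the SRM can reclaim the cache space for other requests.
            self._polls.pop(request_id, None)

    def fetch_with_pin(client, surl, copy_out):
        req = client.request_to_get(surl)
        while True:
            st = client.status(req)
            if st.state == "Pinned":
                copy_out(st.transfer_url)      # e.g. hand the URL to a GridFTP client
                break
            time.sleep(1)                      # streaming model: poll rather than hold resources
        client.release(req)                    # release the pin once the copy is done

    if __name__ == "__main__":
        fetch_with_pin(HypotheticalSRMClient(),
                       "srm://storage.example.org/home/user/file.root",
                       copy_out=print)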

The Berkeley SRM
- Developed by the LBNL Scientific Data Management Group
- Provides a Hierarchical Resource Manager (HRM) plus client software and a web service interface
[Architecture diagram: the SRM (HRM/DRM) sits in front of an MSS or disk; a web service gateway (WSG) provides the web service interface to clients]

Installation/Usage Experience
- Compact and easy to deploy, with strong technical support from LBNL; suitable for small sites
- A limited implementation of HRM (no GSI authentication and no WSG) had been in use by the STAR experiment for some time
- Currently a single public HRM server is running at BNL:
  o 200 GB disk cache (to be upgraded next week)
  o GSI authentication only
  o Client software deployed internally throughout the farm
  o Firewall currently open to BNL only; will open externally in 1-2 weeks
  o File Monitoring Tool available to users to track transfer progress
  o Web service gateway running, with user documentation to be available soon
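As a usage illustration, a farm user staging a file out of HPSS through the BNL HRM might do something like the sketch below. The client command name (hrm-get), the server endpoint and all paths are hypothetical placeholders; the actual LBNL client tools and options deployed on the farm may differ.

    # Hypothetical usage sketch: the command name, endpoint and paths are placeholders.
    import subprocess

    HRM_ENDPOINT = "srm://hrm.example.bnl.gov:4000"        # hypothetical HRM server
    HPSS_FILE    = "/home/star/reco/run1234/event.root"    # hypothetical HPSS path
    LOCAL_TARGET = "file:///data/scratch/event.root"

    # Ask the HRM to stage the file from HPSS into its disk cache and deliver it
    # locally (the local side needs a GridFTP service, as noted on the next slide).
    subprocess.run(
        ["hrm-get", f"{HRM_ENDPOINT}{HPSS_FILE}", LOCAL_TARGET],
        check=True,
    )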

Deployment Issues
- Installation was eased by the advent of a binary release and later improvements in documentation
- Some bugs related to GSI-enabled access to HPSS were solved and fed back into the codebase
- Tested interoperability using the dCache srmcp client for 3rd-party transfer from the LBNL SRM to the dCache SRM
  o The choice of WSDL path created an incompatibility with 3rd-party SRM transfer to dCache; LBNL relocated it
- Some limitations:
  o No performance optimization with multiple SRMs or a shared disk cache
  o Cannot back a single SRM with multiple file systems
  o Currently a local client must have a GridFTP service to transfer files out
  o Proxy expiration handling
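For reference, the third-party transfer test mentioned above boils down to a single srmcp invocation; the sketch below simply wraps it in Python. srmcp is the dCache SRM client named in the slide, but the endpoints, ports and paths here are hypothetical placeholders.

    # Sketch of the 3rd-party transfer test described above. The SURLs below are
    # hypothetical; in a real test they point at the LBNL SRM and the BNL dCache SRM.
    import subprocess

    SOURCE = "srm://hrm.example.lbl.gov:4000/garchive/user/file.root"
    DEST   = "srm://dcache.example.bnl.gov:8443/pnfs/example.bnl.gov/data/file.root"

    # In a 3rd-party transfer the client only negotiates with the two SRMs; the
    # data moves directly between the storage elements (typically over GridFTP).
    subprocess.run(["srmcp", SOURCE, DEST], check=True)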

dCache/SRM
- A joint venture between DESY and FNAL
- dCache features of interest:
  o Caching frontend to the MSS
  o In addition to multiple file transfer protocols and SRM, provides POSIX-like I/O and ROOT TDCacheFile integration
  o Dynamic distributed storage management with load balancing, hotspot handling and garbage collection
  o Global namespace covering distributed pool elements
  o Portability (JVM)
  o Already in production use within the community (scalability and robustness demonstrated)
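To illustrate the POSIX-like I/O path, here is a short PyROOT sketch, assuming a ROOT build with dCap support and a dCap door on a hypothetical host; the door host, port and PNFS path are placeholders.

    # Sketch of ROOT access through a dCache dCap door. Requires a ROOT build with
    # dCap support; the door host, port and PNFS path below are hypothetical.
    import ROOT

    url = "dcap://door.example.bnl.gov:22125/pnfs/example.bnl.gov/data/user/file.root"

    f = ROOT.TFile.Open(url)      # dcap:// URLs are handled by the TDCacheFile plugin
    if f and not f.IsZombie():
        f.ls()                    # the file then behaves like any local TFile
        f.Close()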

dCache Architecture
[Architecture diagram]

Installation/Deployment Experience
- Support from DESY/FNAL has been very helpful
- The installation and configuration process has improved greatly since the release of newly packaged RPMs
- SRM component installation is straightforward
  o Most issues involve GSI, especially in 3rd-party transfer: individual pool nodes require a host certificate (at least at one end)
- Single-file multiple-transfer rate tests look good
- Development of the HPSS interface:
  o A dCache hook (GET, PUT) is provided for a drop-in script; the pool attraction mechanism is determined by PNFS tags (a sketch of such a script follows the GET/PUT flow slides below)
  o The initial design piggybacks on the OSM interface, using HSI as the transfer mechanism; the plan is to replace HSI with a queuing system acting as a tape access optimizer
  o PNFS metadata must be updated following a successful PUT into the MSS

GET file from dCache
[Flow diagram: a client requests a file via SRM, gridFTP, dCap, etc.; the Pool Manager consults PNFS and selects a pool; if the file is cached it is served directly, otherwise the MSS I/O script fetches it from HPSS into the pool]

PUT file into dCache
[Flow diagram: a client submits a file via SRM, gridFTP, dCap, etc.; the Pool Manager selects a pool and the file is registered in PNFS; the MSS I/O script then writes it into HPSS]
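Putting the two flows together, below is a minimal sketch of what the drop-in MSS I/O script mentioned under Installation/Deployment Experience might look like, assuming the pool calls it as "<op> <hpss path> <local file>" and using HSI for the HPSS transfer. The calling convention, paths and HSI options are assumptions for illustration, not the actual dCache pool interface, which also passes storage-info arguments.

    # Sketch of a drop-in MSS I/O script for the dCache GET/PUT hook. The calling
    # convention assumed here ("<op> <hpss path> <local file>") is illustrative,
    # not the real dCache pool interface.
    import subprocess
    import sys

    def hsi(command: str) -> None:
        # HSI runs a single quoted command string against HPSS.
        subprocess.run(["hsi", command], check=True)

    def get(hpss_path: str, local_file: str) -> None:
        # Cache miss: stage the file from HPSS tape into the pool's local copy.
        hsi(f"get {local_file} : {hpss_path}")

    def put(hpss_path: str, local_file: str) -> None:
        # Flush a newly written pool file into HPSS; PNFS metadata must then be
        # updated, as noted on the Installation/Deployment Experience slide.
        hsi(f"put {local_file} : {hpss_path}")

    if __name__ == "__main__":
        op, hpss_path, local_file = sys.argv[1:4]
        {"get": get, "put": put}[op](hpss_path, local_file)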

Installation/Deployment Experience (cont.)
- Development of the HPSS interface (cont.):
  o A registration utility (hp-register.pl) has been developed to map the extant HPSS directory tree into PNFS
  o Metadata consistency is an important issue, since files may move around in HPSS. HPSS bitfile IDs seemed a promising candidate, but no feasible API is available.
  o Two scenarios:
    1. Files on HPSS owned by various users but accessible by a special dCache user: the file location in the PNFS DB must be maintained by relying on the responsible user (e.g. a production manager) and/or an automated, periodic consistency check. Both aspects have drawbacks.
    2. Files on HPSS owned by the special dCache user: consistency is maintained automatically, but this is less flexible and involves changes to the existing data store.
  o As dCache adoption increases, the plan is to move toward the latter scenario.
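As an illustration of the automated, periodic consistency check mentioned in scenario 1, here is a hedged Python sketch that compares the NFS-mounted PNFS namespace against an HSI listing of HPSS. The mount point, HPSS root and the parsing of the 'hsi ls' output are assumptions for illustration, not the facility's actual tooling.

    # Hedged sketch of a periodic PNFS/HPSS consistency check (scenario 1 above).
    # The mount point, HPSS root and the way 'hsi ls' output is parsed are
    # assumptions, not the facility's actual tooling.
    import os
    import subprocess

    PNFS_ROOT = "/pnfs/example.bnl.gov/data"   # hypothetical NFS-mounted PNFS namespace
    HPSS_ROOT = "/home/dcache/data"            # hypothetical HPSS directory tree

    def pnfs_files():
        for dirpath, _, names in os.walk(PNFS_ROOT):
            for name in names:
                yield os.path.relpath(os.path.join(dirpath, name), PNFS_ROOT)

    def hpss_files():
        # 'hsi -q ls -R -1 <dir>' is assumed here to print one full path per line.
        out = subprocess.run(["hsi", "-q", f"ls -R -1 {HPSS_ROOT}"],
                             capture_output=True, text=True, check=True)
        for line in (out.stdout + out.stderr).splitlines():
            line = line.strip()
            if line.startswith(HPSS_ROOT + "/"):
                yield os.path.relpath(line, HPSS_ROOT)

    if __name__ == "__main__":
        missing_in_hpss = set(pnfs_files()) - set(hpss_files())
        for path in sorted(missing_in_hpss):
            print("registered in PNFS but not found in HPSS:", path)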

Future Directions
- For the Berkeley HRM:
  o Extend HRM/DRM deployment to other US ATLAS sites
  o Integrate with RLS and a Grid Monitoring service
  o Continue testing interoperability with other SRM implementations
  o Encourage more user adoption
- For dCache/SRM:
  o Open it up for limited use by US ATLAS and the RHIC experiments
  o Continue performance testing on an increasing scale
  o Evaluate the feasibility of use as a distributed storage solution on dual-use (pool/analysis) farm nodes