1 SRM at Brookhaven
Ofer Rind, BNL RCF/ACF
Z. Liu, S. O’Hare, R. Popescu
CHEP04, Interlaken, 27 September 2004

2 Outline
- Interest in Storage Resource Managers (SRM) for the RHIC and ATLAS computing facilities
- Ongoing experience with two implementations of SRM
  o Berkeley HRM (LBNL): deployment and interoperability issues
  o dCache SRM (DESY/FNAL): deployment and development of an HPSS interface
- Future directions

3 Overview of the RCF/ACF
- Located at the DOE’s Brookhaven National Laboratory, the RHIC Computing Facility (RCF) was formed in the mid-90s to provide computing infrastructure for the RHIC experiments.
- In the late 90s, it was designated the US ATLAS Tier 1 computing center.

4 RCF/ACF Hardware Parameters
- Linux Farm: 1350 rackmounted nodes allocated among the experiments; 230 TB aggregate local disk storage.
- Centralized Disk: 220 TB SAN served via NFS by 39 Sun servers.
- Mass Storage: 4 StorageTek tape silos managed by HPSS; current store of 1500 TB; small (10 TB) disk cache; access via PFTP and HSI.
- The large size of the data stores plus the low cost of local disk is driving interest in distributed storage solutions.
- Grid methodology is pushing the need for unified, global access to data.

5 Why SRM?
- In an era of grid computing and large, highly distributed data stores, sites need standardized, uniform access to heterogeneous storage.
- Storage Resource Managers (SRM) are grid middleware components that provide dynamic space allocation and file management on shared storage elements, which can be disk (DRM) or tape (TRM) systems.
- SRMs complement Compute Resource Managers by providing storage reservation and information on file availability, thus facilitating the data movement necessary for scheduling and executing Grid jobs.

6 SRM Features
- Smooth synchronization between storage resources
  o Pinning and releasing files
  o Allocating space dynamically on an “as needed” basis
- Insulate clients from storage and network system failures
  o e.g. transient MSS or network failure during large file transfers
- Facilitate file sharing
  o Eliminate unnecessary file transfers
- Control the number of concurrent file transfers (a conceptual sketch follows this list)
  o From the MSS: avoid flooding and thrashing
  o From the network: avoid flooding and packet loss
- Support a “streaming model”
  o Efficient quota-based storage management allows long-running tasks to process large numbers of files
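To make the transfer-throttling point concrete, here is a minimal conceptual sketch, not SRM code and not taken from the talk, of bounding the number of concurrent transfers with a fixed pool of slots. The fetch_file function and the slot count are hypothetical placeholders.

# Conceptual sketch only: bound the number of in-flight transfers so the MSS
# and network are not flooded, as the SRM "control concurrent transfers"
# feature does. fetch_file is a hypothetical stand-in for a real staging call.
import threading

MAX_CONCURRENT_TRANSFERS = 4                     # assumed quota, tuned per site
slots = threading.BoundedSemaphore(MAX_CONCURRENT_TRANSFERS)


def fetch_file(path):
    # Placeholder for an actual transfer (e.g. staging a file from tape).
    print("transferring %s" % path)


def transfer(path):
    with slots:                                  # wait for a free transfer slot
        fetch_file(path)


def transfer_all(paths):
    threads = [threading.Thread(target=transfer, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    transfer_all(["/data/file%02d" % i for i in range(10)])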

7 The Berkeley SRM
- Developed by the LBNL Scientific Data Management Group
- Provides a Hierarchical Resource Manager (HRM) plus client software and a web service interface
[Diagram: clients reach the HRM/DRM through a web service gateway (WSG); the HRM/DRM manages the underlying MSS or disk]

8 Installation/Usage Experience
- Compact and easy to deploy, with strong technical support from LBNL; suitable for small sites
- A limited implementation of HRM (no GSI authentication and no WSG) had been in use by the STAR experiment for some time
- Currently a single public HRM server is running at BNL:
  o 200 GB disk cache (to be upgraded next week)
  o GSI authentication only
  o Client software deployed internally throughout the farm
  o Firewall currently open to BNL only; will open externally in 1-2 weeks
  o File Monitoring Tool available to users to track transfer progress
  o Web service gateway running; user documentation to be available soon
  o More details: http://www.atlasgrid.bnl.gov/srm/manuals/

9 Deployment Issues
- Installation was eased by the advent of a binary release and later improvements in documentation
- Some bugs related to GSI-enabled access to HPSS were solved and fed back into the codebase
- Tested interoperability using the dCache srmcp client for 3rd-party transfer from the LBNL SRM to a dCache SRM (an illustrative invocation is sketched below)
  o The choice of WSDL path created an incompatibility with 3rd-party SRM transfer to dCache -> LBNL relocated it
- Some limitations:
  o No performance optimization with multiple SRMs or a shared disk cache
  o Cannot back a single SRM with multiple file systems
  o Currently a local client must run a gridftp service to transfer files out
  o Proxy expiration handling
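For illustration only, a minimal sketch of driving such a third-party SRM-to-SRM copy with the dCache srmcp client from a script. The host names, ports, and paths are placeholders rather than the actual BNL/LBNL endpoints, and any additional srmcp options used in the real test are not shown.

# Sketch of a third-party srmcp copy between two SRM endpoints.
# SOURCE and DEST are placeholder URLs, not the real test endpoints.
import subprocess

SOURCE = "srm://hrm.example.bnl.gov:4000/garchive/atlas/testfile"   # placeholder HRM SRM URL
DEST = "srm://dcache.example.bnl.gov:8443/pnfs/example/testfile"    # placeholder dCache SRM URL


def third_party_copy(src, dst):
    """Invoke srmcp and return its exit status; a valid GSI proxy is assumed."""
    return subprocess.call(["srmcp", src, dst])


if __name__ == "__main__":
    raise SystemExit(third_party_copy(SOURCE, DEST))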

10 dCache/SRM
- A joint venture between DESY and FNAL
- dCache features of interest:
  o Caching frontend to the MSS
  o In addition to multiple file transfer protocols and SRM, provides POSIX-like I/O and ROOT TDCacheFile integration
  o Dynamic distributed storage management with load balancing, hotspot handling, and garbage collection
  o Global namespace covering distributed pool elements
  o Portability (JVM)
  o Already in production use within the community (scalability and robustness demonstrated)
- Details: http://www.dcache.org/

11 dCache Architecture
[Architecture diagram not reproduced in the transcript]

12 Installation/Deployment Experience
- Support from DESY/FNAL has been very helpful
- The installation and configuration process has improved greatly since the release of newly packaged RPMs
- SRM component installation is straightforward
  o Most issues involve GSI, especially in 3rd-party transfer -> individual pool nodes require a host certificate (at least at one end)
- Single-file multiple-transfer rate tests look good
- Development of the HPSS interface (a sketch of such a drop-in hook follows this list):
  o dCache hook (GET, PUT) provided for a drop-in script; pool attraction mechanism determined by PNFS tags
  o Initial design piggybacks on the OSM interface, using HSI as the transfer mechanism; plan to replace HSI with a queuing system acting as a tape access optimizer
  o PNFS metadata must be updated following a successful PUT into the MSS
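A minimal sketch of what such an MSS I/O drop-in script might look like, assuming the hook passes an operation (get/put), a PNFS ID, the local pool file path, and a destination HPSS path. The actual calling convention of the dCache hook and BNL's production script are not given in the talk, so the argument layout below is an assumption; the HSI "put/get local : remote" syntax is standard.

#!/usr/bin/env python
# Illustrative sketch of a dCache MSS I/O drop-in script backed by HPSS via HSI.
# NOT the production BNL script: the argument layout and error handling are
# assumptions made for illustration only.
import subprocess
import sys


def hsi(command):
    """Run a single HSI command string and return its exit status."""
    return subprocess.call(["hsi", command])


def main(argv):
    if len(argv) != 5:
        sys.stderr.write("usage: mss_io.py get|put <pnfsid> <poolfile> <hpsspath>\n")
        return 2

    op, pnfsid, poolfile, hpsspath = argv[1:5]

    if op == "put":
        # Copy the pool file into HPSS; HSI syntax is "put <local> : <remote>".
        status = hsi("put %s : %s" % (poolfile, hpsspath))
        # After a successful PUT the PNFS metadata (size, storage location) would
        # have to be updated, as noted on the slide; omitted in this sketch.
    elif op == "get":
        # Stage the file back from HPSS into the pool's local path.
        status = hsi("get %s : %s" % (poolfile, hpsspath))
    else:
        sys.stderr.write("unknown operation: %s\n" % op)
        return 2

    # dCache treats a non-zero exit status as a failed MSS transfer.
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv))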

13 GET file from dCache
[Flow diagram: a client requests a file via SRM, gridFTP, dCap, etc.; the Pool Manager consults PNFS and selects a pool; if the file is already cached it is served directly, otherwise the MSS I/O script first stages it in from HPSS]

14 PUT file into dCache
[Flow diagram: a client submits a file via SRM, gridFTP, dCap, etc.; the Pool Manager selects a pool, the file is registered in PNFS, and the MSS I/O script writes it out to HPSS]

15 Installation/Deployment Experience (cont.)
- Development of the HPSS interface (cont.):
  o A registration utility (hp-register.pl) has been developed to map the existing HPSS directory tree into PNFS
  o Metadata consistency is an important issue, since files may move around within HPSS. HPSS bitfile IDs seemed a promising way to track this, but no feasible API is available.
  o Two scenarios:
    1. Files on HPSS owned by various users but accessible by a special dCache user: the file location in the PNFS DB must be maintained by relying on the responsible user (e.g. a production manager) and/or an automated, periodic consistency check (a sketch of one possible check follows this list). Both approaches have drawbacks.
    2. Files on HPSS owned by a special dCache user: consistency is maintained automatically, but this is less flexible and involves changes to the existing data store.
  o As dCache adoption increases, the plan is to move toward the latter scenario.
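One possible shape for the automated, periodic consistency check mentioned in scenario 1, as a sketch only: the pnfs_registered_paths helper is hypothetical (a real check would query the PNFS database or namespace), and the HSI listing call stands in for whatever lookup the production tool would actually use.

# Sketch of a periodic HPSS/PNFS consistency check: verify that every HPSS path
# registered in PNFS can still be listed in HPSS, and report the ones that cannot.
import subprocess


def pnfs_registered_paths():
    # Hypothetical stand-in: in practice these would come from the PNFS DB.
    return ["/home/dcache/atlas/run1/file001.root",
            "/home/dcache/atlas/run1/file002.root"]


def exists_in_hpss(path):
    """Return True if HSI can list the path in HPSS (exit status 0)."""
    return subprocess.call(["hsi", "-q", "ls %s" % path]) == 0


def main():
    missing = [p for p in pnfs_registered_paths() if not exists_in_hpss(p)]
    for path in missing:
        # Files that moved or were deleted in HPSS need follow-up, e.g. by the
        # responsible production manager.
        print("inconsistent PNFS entry, not found in HPSS: %s" % path)
    return 1 if missing else 0


if __name__ == "__main__":
    raise SystemExit(main())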

16 Future Directions
- For the Berkeley HRM:
  o Extend HRM/DRM deployment to other US ATLAS sites
  o Integrate with RLS and the Grid Monitoring service
  o Continue testing interoperability with other SRM implementations
  o Encourage more user adoption
- For dCache/SRM:
  o Open up for limited use by US ATLAS and RHIC experiments
  o Continue performance testing on an increasing scale
  o Evaluate feasibility of use as a distributed storage solution on dual-use (pool/analysis) farm nodes

