Computing Sciences Directorate, L B N L 1 SC 2003 Storage Resource Managers: Essential Components for the Grid Arie Shoshani Staff: Alex Sim, Junmin Gu,


Computing Sciences Directorate, LBNL
SC 2003

Storage Resource Managers: Essential Components for the Grid
Arie Shoshani
Staff: Alex Sim, Junmin Gu, Alex Romosan, Viji Natarajan
Scientific Data Management Group
Lawrence Berkeley National Laboratory

Slide 2: Outline
- What are Storage Resource Managers: Motivation
- General Analysis Scenario and the use of SRMs
- SRM functionality
- Real examples of working SRMs
- Advantages of using SRMs
- Conclusions and Future Work

Slide 3: Motivation
- Grid architecture needs to include reservation & scheduling of:
  - Compute resources
  - Storage resources
  - Network resources
- Storage Resource Managers (SRMs) role in the data grid architecture
  - Shared storage resource allocation & scheduling
  - Especially important for data intensive applications
  - Often files are archived on a mass storage system (MSS)
  - Large scientific collaborations (100s of clients): opportunities for file sharing
  - File replication and caching may be used
  - Need to support non-blocking (asynchronous) requests

Slide 4: Types of SRMs
- Types of storage resource managers
  - Disk Resource Manager (DRM)
    - Manages one or more disk resources
  - Tape Resource Manager (TRM)
    - Manages access to a tertiary storage system (e.g. HPSS)
  - Hierarchical Resource Manager (HRM = TRM + DRM)
    - An SRM that stages files from tertiary storage into its disk cache
- SRMs and file transfers
  - SRMs DO NOT perform file transfer
  - SRMs DO invoke file transfer service if needed (GridFTP, FTP, HTTP, …)
  - SRMs DO monitor transfers and recover from failures
    - TRM: from/to MSS
    - DRM: from/to network
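
The division of labor described on this slide (the SRM does not move bytes itself, but invokes a transfer service, monitors it, and recovers from failures) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `transfer_with_recovery`, the retry count, the use of `OSError` to signal a transient failure, and the example URLs are hypothetical and are not taken from any SRM implementation.

```python
import time
from typing import Callable


def transfer_with_recovery(
    transfer: Callable[[str, str], None],
    source_url: str,
    target_url: str,
    max_attempts: int = 3,
    retry_delay_s: float = 0.0,
) -> int:
    """Invoke an external transfer service and retry on transient failure.

    Returns the number of attempts used. Re-raises the last error if every
    attempt fails. The transfer callable stands in for GridFTP, FTP, HTTP, etc.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(source_url, target_url)  # the SRM delegates the byte movement
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            time.sleep(retry_delay_s)  # back off before trying again
    raise ValueError("max_attempts must be at least 1")


# Hypothetical transfer service that fails twice, then succeeds.
call_log = {"calls": 0}


def flaky_transfer(source_url: str, target_url: str) -> None:
    call_log["calls"] += 1
    if call_log["calls"] < 3:
        raise OSError("transient network failure")


attempts_used = transfer_with_recovery(
    flaky_transfer, "gridftp://source.example/f1", "file:///cache/f1"
)
```

A real SRM would presumably distinguish failure causes (staging, network, archiving) and apply a policy per cause; the single retry loop above only shows the general shape.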

Slide 5: A multi-file request to a Disk Resource Manager
[Diagram: Client-SRM communication. Labels: clients, DRM, disk caches, file transfer services, tape system, network. Arrows: multi-file request, file access, file transfer requests.]

Slide 6: Accessing Remote Storage Resource Managers
[Diagram: SRM-SRM communication. Labels: clients, DRM, HRM, disk caches, tape system, network. Arrows: multi-file request, file access, file transfer requests.]

Slide 7: General Analysis Scenario
[Diagram. Client's site: client, logical query, Request Interpreter, request planning, Request Executer, Metadata catalog, Replica catalog, Network Weather Service. Flow labels: a set of logical files; execution plan and site-specific files; execution DAG; requests for data placement and remote computation; result files. Site 1, Site 2, ..., Site N (over the network): Storage Resource Manager, Compute Resource Manager, Compute Engine, Disk Cache, MSS. Legend: uniform SRM interface.]

Slide 8: SRM is a Service (OGSA, CORBA, C++, Java, …)
- SRM functionality
  - Manage space
    - Negotiate and assign space to users
    - Manage "lifetime" of spaces
  - Manage files on behalf of a user
    - Pin files in storage till they are released
    - Manage "lifetime" of files
    - Manage action when pins expire (depends on file types)
  - Manage file sharing
    - Policies on what should reside on a storage resource at any one time
    - Policies on what to evict when space is needed
  - Get files from remote locations when necessary
    - Purpose: to simplify client's task
  - Manage multi-file requests
    - A brokering function: queue file requests, pre-stage when possible
  - Provide grid access to/from mass storage systems
    - HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor (CERN), MSS (NCAR), …

Slide 9: SRM works with other SRMs as well as legacy systems by using GridFTP
[Diagram. Sites: Berkeley, Chicago, Livermore. Labels: client, Request Interpreter, Request Manager, DRM, HRM, GridFTP server, FTP, disk caches. Legend: logical request, data path, control path.]

Slide 10: Earth System Grid
[Diagram. Sites: LBNL, LLNL, USC-ISI, NCAR, ORNL, ANL. Components: Tomcat servlet engine; MCS (Metadata Cataloguing Services) and MCS client; RLS (Replica Location Services) and RLS client; MyProxy server and MyProxy client; GRAM gatekeeper; CAS (Community Authorization Services) and CAS client; NCAR-MSS (Mass Storage System); HPSS (High Performance Storage System); DRM and HRM (Storage Resource Management); gridFTP servers; gridFTP striped server; openDAPg server; disk. Connection labels: SOAP, RMI.]

Slide 11: Uniformity of Interface → Compatibility of SRMs
[Diagram. Layers: Client (USER/APPLICATIONS); Grid Middleware; SRM instances in front of Enstore, JASMine, DCache, CASTOR, and a Disk Cache.]

Slide 12: Where do SRMs belong in the Grid architecture?

Slide 13: SRMs provide a brokering service by supporting multi-file requests

Slide 14: DataMover: SRMs use in ESG and PPDG for Robust Multi-file Replication
[Diagram. Anywhere: DataMover (command-line interface), which gets the list of files from a directory and issues HRM-COPY (thousands of files). LBNL/ORNL: HRM (performs writes), disk cache, archive files. NCAR: HRM (performs reads), disk cache, NCAR-MSS, stage files. Between the HRMs: SRM-GET (one file at a time), GridFTP GET (pull mode), network transfer. Annotations: recovers from file transfer failures; recovers from staging failures; recovers from archiving failures; Web-based File Monitoring Tool.]

Slide 15: Concepts: Types of Files
- Volatile: temporary files with a lifetime guarantee
  - Files are "pinned" and "released"
  - Files can be removed by SRM when released or when lifetime expires
- Permanent
  - No lifetime
  - Files can only be removed by creator (owner)
- Durable: files with a lifetime that CANNOT be removed by SRM
  - Files are "pinned" and "released"
  - Files can only be removed by creator (owner)
  - If lifetime expires: invoke administrative action (e.g. notify owner, archive and release)
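
The removal rules for the three file types can be written down as a small decision function. The following Python sketch is one reading of the slide, not the SRM specification: the names `FileType`, `srm_may_remove`, and `on_lifetime_expiry`, and the returned action strings, are hypothetical.

```python
from enum import Enum


class FileType(Enum):
    VOLATILE = "volatile"
    DURABLE = "durable"
    PERMANENT = "permanent"


def srm_may_remove(file_type: FileType, pinned: bool, lifetime_expired: bool) -> bool:
    """Return True if the SRM itself (not the owner) may delete the file."""
    if file_type is FileType.VOLATILE:
        # Removable once released (no longer pinned) or once the lifetime is over.
        return (not pinned) or lifetime_expired
    # Durable and permanent files can only be removed by their creator (owner).
    return False


def on_lifetime_expiry(file_type: FileType) -> str:
    """Describe what happens when a file's lifetime runs out."""
    if file_type is FileType.VOLATILE:
        return "remove"
    if file_type is FileType.DURABLE:
        # e.g. notify owner, archive and release
        return "administrative action"
    return "not applicable"  # permanent files have no lifetime
```

For example, `srm_may_remove(FileType.DURABLE, pinned=False, lifetime_expired=True)` is `False`: an expired durable file triggers an administrative action rather than deletion by the SRM.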

Slide 16: Concepts: Types of Spaces
- Types
  - Volatile
    - Space can be reclaimed by SRM when lifetime expires
  - Durable
    - Space can be reclaimed by SRM only if it does NOT contain files
    - Can choose to archive files and release space
  - Permanent
    - Space can only be released by owner or administrator
- Assignment of files to spaces
  - Files can only be assigned to spaces of the same type
- Spaces can be reserved
  - No limit on number of spaces
  - Space reference handle is returned to client
  - Total space of each type is subject to SRM and/or VO policies
- Default spaces
  - Files can be put into SRM spaces without explicit reservation
  - Defaults are not visible to client
- Compacting space
  - Release all unused space: space that has no files, or files whose lifetime expired
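
A rough Python sketch of the space rules follows. It is illustrative only: the `Space` and `StoredFile` classes, their fields, and the byte-count model are hypothetical. Two points are assumptions beyond the slide: a durable space is treated as reclaimable only after its lifetime has expired and it is empty, and compaction is shown for a volatile space, where the SRM is allowed to drop expired files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoredFile:
    size: int
    expired: bool = False


@dataclass
class Space:
    space_type: str  # "volatile", "durable" or "permanent"
    reserved: int    # bytes currently reserved
    files: List[StoredFile] = field(default_factory=list)


def can_assign(file_type: str, space: Space) -> bool:
    """Files can only be assigned to spaces of the same type."""
    return file_type == space.space_type


def srm_may_reclaim(space: Space, lifetime_expired: bool) -> bool:
    """Return True if the SRM may take the space back on its own initiative."""
    if space.space_type == "volatile":
        return lifetime_expired
    if space.space_type == "durable":
        # Assumption: expiry is required in addition to the space being empty.
        return lifetime_expired and not space.files
    return False  # permanent: only the owner or an administrator releases it


def compact(space: Space) -> int:
    """Shrink a volatile reservation to its live files; return bytes released."""
    live = [f for f in space.files if not f.expired]
    in_use = sum(f.size for f in live)
    released = max(0, space.reserved - in_use)
    space.files = live
    space.reserved = in_use
    return released


demo_space = Space("volatile", reserved=100,
                   files=[StoredFile(30), StoredFile(20, expired=True)])
released_bytes = compact(demo_space)
```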

Slide 17: Concepts: Directory Management
- Usual Unix semantics
  - srmLs, srmMkdir, srmMv, srmRm, srmRmdir
- A single directory for all file types
  - No directories for each type
  - File assignment to types is virtual
  - Files can be placed in SRM-managed directories by maintaining a mapping to the client's directory
- Access control services
  - Support owner/group/world permission
  - Can only be assigned by owner
  - When a file is requested by a user, SRM should check permission with the source site
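
The owner/group/world permission model mentioned here is the familiar Unix one. As a minimal sketch, assuming permissions are stored as Unix mode bits (the slide does not say how SRMs represent them), a read check might look like this in Python. The function `may_read` and the user and group names are hypothetical.

```python
import stat
from typing import Set


def may_read(mode: int, owner: str, group: str, user: str, user_groups: Set[str]) -> bool:
    """Unix-style read check: the owner class is tested first, then group, then world."""
    if user == owner:
        return bool(mode & stat.S_IRUSR)
    if group in user_groups:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Mode 0o640: owner read/write, group read, world nothing.
owner_ok = may_read(0o640, "alice", "star", "alice", set())
group_ok = may_read(0o640, "alice", "star", "bob", {"star"})
world_ok = may_read(0o640, "alice", "star", "carol", {"atlas"})
```

This only covers the local check; the slide additionally asks the SRM to verify permission with the source site when a file is fetched remotely, which is not modeled here.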

Slide 18: Examples of Directory Structures (user defined)
[Diagram: two directory trees, each with directories D1, D2, D3, D4. File types: V = volatile, D = durable, P = permanent.]
(1) Mixed file types: F1 (D), F2 (P), F3 (V), F4 (P), F5 (D)
(2) By file type: F1 (V), F2 (V), F3 (V), F4 (D), F5 (D), F6 (D), F7 (P), F8 (P)
- Supported function: ChangeFileType
- Advantage of (1): no need to move files when file types are changed

Slide 19: Concepts: Space Reservations
- Negotiation
  - Client asks for space: C-guaranteed, MaxDesired
  - SRM returns: S-guaranteed <= C-guaranteed, best effort <= MaxDesired
- Type of space
  - Can be specified
  - Subject to limits per client (SRM or VO policies)
  - Default: volatile
- Lifetime
  - Negotiated: C-lifetime requested
  - SRM returns: S-lifetime <= C-lifetime
- Reference handle
  - SRM returns space reference handle
  - User can provide srmSpaceTokenDescription to recover handles
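
The negotiation inequalities on this slide (S-guaranteed <= C-guaranteed, best effort <= MaxDesired, S-lifetime <= C-lifetime) can be sketched as a function that clamps the client's request against server-side limits. The limits used below (`policy_guaranteed_cap`, `free_bytes`, `policy_lifetime_cap`) and the `SpaceGrant` type are hypothetical; the slide only states the inequalities, not how an SRM arrives at the values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpaceGrant:
    s_guaranteed: int  # bytes the SRM guarantees
    best_effort: int   # bytes the SRM will try to provide in total
    s_lifetime: int    # seconds the reservation is valid


def negotiate_space(
    c_guaranteed: int,
    max_desired: int,
    c_lifetime: int,
    *,
    policy_guaranteed_cap: int,
    free_bytes: int,
    policy_lifetime_cap: int,
) -> SpaceGrant:
    """Clamp a client's request so the slide's three inequalities hold."""
    if max_desired < c_guaranteed:
        raise ValueError("MaxDesired must be at least C-guaranteed")
    s_guaranteed = min(c_guaranteed, policy_guaranteed_cap, free_bytes)
    best_effort = min(max_desired, free_bytes)
    s_lifetime = min(c_lifetime, policy_lifetime_cap)
    return SpaceGrant(s_guaranteed, best_effort, s_lifetime)


grant = negotiate_space(
    100, 500, 3600,
    policy_guaranteed_cap=80, free_bytes=300, policy_lifetime_cap=1800,
)
```

In this example the client asks for 100 guaranteed and up to 500 desired for 3600 seconds, and receives 80 guaranteed, 300 best effort, and an 1800 second lifetime.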

Slide 20: Concepts: Transfer Protocol Negotiation
- Negotiation
  - Client provides an ordered list
  - SRM returns: highest possible protocol it supports
- Example
  - Protocols list: bbftp, gridftp, ftp
  - SRM returns: gridftp
- Advantages
  - Easy to introduce new protocols
  - User controls which protocol to use
  - Default: SRM policy choice
- How is it returned?
  - The protocol of the Transfer URL (TURL)
  - Example: bbftp://dm.slac.edu/temp/run11/File678.txt
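
Protocol negotiation as described here amounts to picking the first entry in the client's ordered list that the SRM supports. A minimal Python sketch follows, assuming the SRM in the slide's example supports gridftp and ftp but not bbftp; the function name `choose_protocol` and the `srm_default` parameter are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def choose_protocol(
    client_preferences: Sequence[str],
    srm_supported: Iterable[str],
    srm_default: str,
) -> Optional[str]:
    """Return the client's most-preferred protocol that the SRM supports.

    With no client list, fall back to the SRM's policy choice. Return None
    if the client named protocols but none of them is supported.
    """
    supported = set(srm_supported)
    if not client_preferences:
        return srm_default
    for protocol in client_preferences:  # the list is ordered by preference
        if protocol in supported:
            return protocol
    return None


chosen = choose_protocol(["bbftp", "gridftp", "ftp"], {"gridftp", "ftp"}, "gridftp")
```

The chosen protocol would then appear as the scheme of the returned TURL.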

Slide 21: Concepts: Multi-file Requests
- Can srmRequestToGet multiple files
  - Required: file URLs
  - Optional: space file type, space handle, protocol list
  - Optional: total retry time
- Provide: Site URL (SURL)
  - URL known externally, e.g. in replica catalogs
  - e.g. srm://sleepy.lbl.gov:4000/tmp/foo-123
- Get back: transfer URL (TURL)
  - Path can be different than in SURL (SRM internal mapping)
  - Protocol chosen by SRM
  - e.g. gridftp://dm.lbl.gov:4000/home/level1/foo-123
- Managing request queue
  - Allocate space according to policy, system load, etc.
  - Bring in as many files as possible
  - Provide information on each file brought in or pinned
  - Bring additional files as soon as files are released
  - Support file streaming
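
The request-queue behavior (bring in as many files as fit, then bring in more as the client releases files) can be sketched with a small in-memory model. This is a toy under stated assumptions: cache capacity is counted in file slots rather than bytes, each SURL appears once, and the class `StreamingRequest` and the example SURLs are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, List, Set


class StreamingRequest:
    """Toy model of one multi-file request served from a small disk cache."""

    def __init__(self, surls: Iterable[str], cache_slots: int) -> None:
        self._pending: Deque[str] = deque(surls)
        self._slots = cache_slots
        self.pinned: Set[str] = set()

    def fill(self) -> List[str]:
        """Bring in as many pending files as free slots allow, pinning each."""
        brought_in = []
        while self._pending and len(self.pinned) < self._slots:
            surl = self._pending.popleft()
            self.pinned.add(surl)
            brought_in.append(surl)
        return brought_in

    def release(self, surl: str) -> List[str]:
        """Client is done with a file: unpin it and stream in the next ones."""
        self.pinned.remove(surl)
        return self.fill()

    @property
    def done(self) -> bool:
        return not self._pending and not self.pinned


surls = [f"srm://host.example:4000/tmp/f{i}" for i in range(1, 5)]
request = StreamingRequest(surls, cache_slots=2)
first_batch = request.fill()               # two files fit in the cache
after_release = request.release(surls[0])  # frees one slot, third file comes in
```

With two slots and four files, the client can process all four files while the cache never holds more than two at a time, which is the point of the streaming model.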

Slide 22: SRM functionality
- Space reservation
  - Negotiate and assign space to users
  - Manage "lifetime" of spaces
  - Release and compact space
- File management
  - Assign space for putting files into SRM
  - Pin files in storage when requested till they are released
  - Manage "lifetime" of files
  - Manage action when pins expire (depends on file types)
- Get files from remote locations when necessary
  - Purpose: to simplify client's task
  - srmCopy: in "pull" and "push" modes

Slide 23: SRM functionality (Cont'd)
- Space management policies and file sharing
  - Policies on what should reside on a storage resource at any one time
  - Policies on what to evict when space is needed
  - Share files to avoid getting them from remote locations
- Manage multi-file requests
  - Queues file requests, pre-stage when possible
- Status functions
  - Files: lifetime remaining, what's available locally
  - Requests: what files are available (needed in lieu of callbacks)
  - Request summary: for progress report
  - Space metadata: space in use, space available, lifetime
- Provide grid access to/from mass storage systems
  - HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor (CERN), MSS (NCAR), SE (RAL) …

Slide 24: SRM Methods
- File movement
  - srm(Prepare)Get
  - srm(Prepare)Put
  - srmReplicate
- Lifetime management
  - srmReleaseFiles
  - srmPutDone
  - srmExtendFileLifeTime
- Terminate/resume
  - srmAbortRequest
  - srmAbortFile
  - srmSuspendRequest
  - srmResumeRequest
- Space management
  - srmReserveSpace
  - srmReleaseSpace
  - srmUpdateSpace
  - srmCompactSpace
  - srmGetCurrentSpace
- FileType management
  - srmChangeFileType
- Status/metadata
  - srmGetRequestStatus
  - srmGetFileStatus
  - srmGetRequestSummary
  - srmGetRequestID
  - srmGetFilesMetaData
  - srmGetSpaceMetaData

Slide 25: Summary: advantages of using SRMs
- Synchronization between storage resources
  - Pinning files, releasing files
  - Allocating space dynamically on an "as-needed" basis
- Insulate clients from storage and network system failures
  - Transient MSS failure
  - Network failures
  - Interruption of large file transfers
- Facilitate file sharing
  - Eliminate unnecessary file transfers
- Support "streaming model"
  - Use space allocation policies by SRMs: no reservations needed
  - Use explicit release by client for reuse of space
- Control number of concurrent file transfers
  - From/to MSS: avoid flooding MSS and thrashing
  - From/to network: avoid flooding and packet loss
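
Limiting the number of concurrent transfers, the last advantage listed, is commonly done with a bounded worker pool. The Python sketch below uses the standard library's `ThreadPoolExecutor`; it is a generic illustration and not how any particular SRM is implemented. The function `run_transfers`, the fake transfer, and the limit of 3 are assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def run_transfers(
    transfer: Callable[[str], str], jobs: Sequence[str], max_concurrent: int
) -> List[str]:
    """Run all transfers, never more than max_concurrent at the same time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(transfer, jobs))  # results keep the order of jobs


# Instrumented stand-in for a real transfer, used to observe peak concurrency.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_transfer(name: str) -> str:
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # pretend to move data
    with stats_lock:
        stats["active"] -= 1
    return name


job_names = [f"file{i}" for i in range(8)]
results = run_transfers(fake_transfer, job_names, max_concurrent=3)
```

Separate pools with different limits could be used for MSS staging and for network transfers, since the slide treats the two bottlenecks separately.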

Slide 26: Web-Based File Monitoring Tool
Shows:
- Files already transferred
- Files during transfer
- Files to be transferred
Also shows for each file:
- Source URL
- Target URL
- Transfer rate

Slide 27: File tracking helps to identify bottlenecks
Shows that archiving is the bottleneck

Slide 28: File tracking shows recovery from transient failures
Total: 45 GB

Slide 29: File tracking shows network slowdown and recovery
Total: 53 GB

Slide 30: Ongoing and Future Work
- Ongoing work
  - Developing standard SRM interfaces
    - Particle Physics Data Grid (PPDG) project: LBNL, TJNAF, FNAL
    - European Data Grid (EDG) project: WP2 (data management), WP5 (mass storage)
  - Deployment
    - LBNL, BNL, ORNL, TJNAF, FNAL, CERN, (SE-England)
  - Use of SRM by other agents
    - Storage Resource Broker (SDSC) calling HRM to stage files from HPSS
    - GridFTP invoking HRM
  - New spec completed (SRM V2.1)
    - Directory management
    - File/directory file movement
    - Dynamic space management
- Future work
  - Access authorization: community access service (CAS)
  - "On-demand" space allocation, accounting, and charging
  - Replica management: invoke SRMs and RLS as a single service
  - Request executer (e.g. DAGMAN) to invoke SRMs
  - SRMs over NeST (Network STorage)