
1 StorNet: Co-Scheduling Network and Storage with TeraPaths and SRM. Dantong Yu (BNL), Arie Shoshani (LBNL). Kickoff Meeting for 2009 DOE Network Research Projects, 28-29 September 2009.

2 Outline
- Project Overview
- Motivation
- Approach and Architecture
- Gap Between the Existing Work and Project Goals
- Required New Functionalities
- Services and Communication Flows
- Project Plans
- Backup Slides

3 Project Overview
Project Goals:
- Design and develop integrated end-to-end resource provisioning for high-performance data transfer
- Improve co-scheduling of network and storage resources, ensuring data transfer efficiency and resource utilization
- Support end-to-end data transfer with a negotiated transfer completion timeline
Project Participants:
- LBNL: Arie Shoshani, Alex Sim, Junmin Gu, Viji Natarajan
- BNL: Dantong Yu, Dimitrios Katramatos, and Xin Liu (newly hired postdoc)

4 Motivation
End-to-end scheduling of data movement requires:
- Availability of network bandwidth on the backbone wide area network (WAN)
- Availability of local area network (LAN) bandwidth from end hosts to the border routers of the WAN
But also:
- Availability of the data to be moved out at the source
- Availability of storage space at the target
- Availability of bandwidth at the source storage system (i.e., disks and network cards)
- Availability of bandwidth at the target storage system
Why is that hard?
- Source and target bandwidth must be coordinated to match each other within available time windows
- These must also be coordinated with internal and existing network bandwidth

5 Approach and Architecture
Leverage existing technologies:
- TeraPaths on top of OSCARS (network concatenation)
- Storage Resource Managers (SRMs) on top of TeraPaths
- Use the Berkeley Storage Manager (BeStMan) implementation of SRM

6 What's missing in these tools to achieve our goals
BeStMan needs to be enhanced to:
- Keep track of bandwidth commitments for multiple requests
- Coordinate between the source and target BeStMan instances for storage space and bandwidth
- Provide advance reservation for future time-window commitments
- Communicate and coordinate with the underlying TeraPaths
TeraPaths needs to be enhanced to:
- Receive bandwidth requests from BeStMan in the form (volume, max-bandwidth, max-completion-time)
- Negotiate with OSCARS for the "best" time window, where "best" can be earliest completion time or shortest transfer time
- If successful, commit the reservation and return it to BeStMan
- If not, find the closest solution to suggest to BeStMan
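To make the request form above concrete, here is a minimal sketch of how a negotiator might pick a "best" window for a (volume, max-bandwidth, max-completion-time) request. This is illustrative only: the function, the Window type, and the free_windows input are hypothetical stand-ins, not the TeraPaths or OSCARS API.

from dataclasses import dataclass

@dataclass
class Window:
    start: float      # seconds from "now"
    end: float        # seconds from "now"
    bandwidth: float  # bytes/second available throughout the window

def best_window(volume, max_bandwidth, max_completion_time, free_windows,
                policy="earliest_completion"):
    """Return a feasible (start, end, bandwidth) triple, or None."""
    candidates = []
    for w in free_windows:
        bw = min(max_bandwidth, w.bandwidth)   # cannot exceed either limit
        duration = volume / bw                 # time needed at that rate
        start = max(w.start, 0.0)
        end = start + duration
        if end <= min(w.end, max_completion_time):
            candidates.append((start, end, bw))
    if not candidates:
        return None                            # caller can counter-offer a later deadline
    if policy == "earliest_completion":
        return min(candidates, key=lambda c: c[1])
    return min(candidates, key=lambda c: c[1] - c[0])  # shortest transfer time

# Example: 4 TB at up to 5 Gb/s (6.25e8 B/s), must finish within 6 hours.
windows = [Window(0, 3 * 3600, 2.5e8), Window(3 * 3600, 12 * 3600, 6.25e8)]
print(best_window(4e12, 6.25e8, 6 * 3600, windows))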

7 Services
- SRM services: process service requests and coordinate the underlying network planes
- Network services: end-to-end circuits connecting two storage endpoints
- Service state/status: SRM data transfer progress and performance; end-to-end circuit state and performance

8 Multi-Layer Capability View
[Diagram: capability stacks for Layer 2 (VLANs, TeraPaths layer-2 control), Layer 3 (IP/MPLS QoS, TeraPaths control), and Layer 4 (TCP/UDP), each spanning AA, control, data, service, and management planes, with TeraPaths services, SRM/GridFTP applications, and application/middleware security.]
Translate the multi-layer network architecture view into our project implementation.

9 Multiple-Layer Architecture View
[Diagram: BeStMan/application plane, TeraPaths service plane, TeraPaths management plane, TeraPaths control plane, generic data plane layer, and AA plane.]

10 Specific Use Case: BeStMan in "pull" mode
1) The target BeStMan (T-BeStMan) gets a request (userID (credential, priority), files/directory, maxCompletionTime)
2) T-BeStMan checks whether it already has any of the files and pins them (until maxCompletionTime)
3) T-BeStMan contacts the source BeStMan (S-BeStMan) to get volumeOfRestOfFiles and S-maxBandwidth
4) T-BeStMan allocates space (for the volume) and determines its own T-maxBandwidth
5) T-BeStMan determines desiredMaxBandwidth = min(T-maxBandwidth, S-maxBandwidth)
6) T-BeStMan calls the local TeraPaths (TPs) for "reserve and commit" (userID, desiredBeginTime=now, volume, desiredMaxBandwidth, maxCompletionTime)
7) TPs checks the validity of the userID, priority, and authorization, and negotiates with OSCARS
8) TPs returns either (a) (reservationID, reservedBeginTime, reservedEndTime, reservedBandwidth), or (b) "can't do it by maxCompletionTime, but here is a new (longer) completion time"
9) T-BeStMan informs the user:
   case (a): "here is your reservation". OK? If yes, no action; if no, issue a cancel reservation to TPs
   case (b): "can't do it; do you wish to use the extended maxCompletionTime?" If no, cancel; if yes, accept
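Steps 2 through 9 can be restated as a single orchestration function to make the control flow easier to follow. This is a sketch only; the target_srm, source_srm, terapaths, and user objects and all of their methods are hypothetical stand-ins, not the BeStMan or TeraPaths APIs.

def pull_request(target_srm, source_srm, terapaths, user, files, max_completion_time):
    pinned = target_srm.pin_local_copies(files, until=max_completion_time)   # step 2
    remaining = [f for f in files if f not in pinned]
    volume, src_max_bw = source_srm.query(remaining)                         # step 3
    target_srm.allocate_space(volume)                                        # step 4
    desired_bw = min(target_srm.max_bandwidth(), src_max_bw)                 # step 5
    result = terapaths.reserve_and_commit(                                   # steps 6-7
        user=user, begin="now", volume=volume,
        max_bandwidth=desired_bw, max_completion_time=max_completion_time)
    if result.ok:                                                            # steps 8a / 9a
        if user.accepts(result.reservation):
            return result.reservation
        terapaths.cancel(result.reservation)
        return None
    # steps 8b / 9b: counter-offer with the extended completion time
    if user.accepts_extension(result.suggested_completion_time):
        return terapaths.reserve_and_commit(
            user=user, begin="now", volume=volume,
            max_bandwidth=desired_bw,
            max_completion_time=result.suggested_completion_time)
    return None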

11 New APIs and Functionality to be Defined, and Communication Flows to be Developed
[Diagram: pulling and pushing clients, target BeStMan (space management, bandwidth management), source BeStMan (space management, bandwidth management), and TeraPaths (bandwidth coordination and reservation), connected by data-flow and control-flow arrows.]
Notes: push and pull modes are both needed because of security limitations. Control flows: client-to-BeStMan, BeStMan-to-BeStMan, BeStMan-to-TeraPaths, and the control plane within TeraPaths.

12 The TeraPaths Project
Background of TeraPaths:
- Project objective
- View of the world (network)
- System architecture
Establishing flow-based end-to-end QoS paths:
- Domain interoperation
- Distributed reservation negotiation

13 TeraPaths Overview
- TeraPaths is a DOE Office of Science project on end-to-end QoS (BNL, Michigan, Boston University, and Stony Brook)
- It provides QoS guarantees at the individual data flow level: from end host to end host, transparently
- Because not all data flows are the same:
  - Default "best effort" network behavior treats all data flows as equal
  - Capacity is limited; congestion causes bandwidth and latency variations
  - Performance and service disruption problems, unpredictability
  - Data flows have varying priority/importance: video streams, critical data, long-duration transfers
- It schedules network utilization: regulate and classify (prioritize) traffic, accounting for policy/SLA
- It's targeted for "high-impact" domains; not intended to scale to the Internet in general

14 View of the High-Performance Network
[Diagram: sites A through D connected across WAN domains 1-3, each with its own controller (TeraPaths domain controllers, TeraPaths RN, WAN controllers), joined by MPLS tunnels and dynamic circuits.]

15 Establishing End-to-End QoS Paths
Multiple administrative domains:
- Cooperation and trust, but each domain maintains full control
- Heterogeneous environment
- Domain controller coordination through web services
Coordination models:
- Star: requires extensive information for all domains
- Daisy chain: requires a common, flexible protocol across all domains
- Hybrid (star + daisy chain, end sites first): independent protocols, direct end-site negotiation

16 L2 vs. L3 (1/2)
- An MPLS tunnel starts and ends within the WAN domain
- Packets are admitted into the tunnel based on flow ID information (src IP, src port, dst IP, dst port)
- WAN admission is performed at the first router of the tunnel (ingress)
[Diagram: WAN border routers and MPLS tunnel ingress/egress routers.]

17 L2 vs. L3 (2/2)
- A dynamic circuit appears as a VLAN connecting the end-site border routers with a single hop
- Flow ID data cannot be used directly; the flow must be directed to the proper VLAN
- WAN admission is performed within the end-site LAN
- The VLAN is selected with Policy-Based Routing (PBR) at both ends; the route can be selected on a per-flow basis
[Diagram: WAN, switches, and border routers.]

18 What is needed for a reservation?
The ReservationData data structure contains all necessary information about a reservation:
- Source and destination addresses and ports
- Start time and duration
- Required bandwidth and QoS class
- Related WAN reservation identifier
- User credentials
- Rescheduling criteria
Most TeraPaths web services use the ReservationData data structure to communicate with each other.
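As a concrete illustration of the fields listed above, here is a ReservationData-like record sketched as a Python dataclass. The field names, types, and the example values are illustrative assumptions, not the actual TeraPaths schema.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationData:
    src_address: str
    src_port: int
    dst_address: str
    dst_port: int
    start_time: float                   # epoch seconds
    duration: float                     # seconds
    bandwidth: float                    # bits/second requested
    qos_class: str                      # requested QoS class
    wan_reservation_id: Optional[str]   # related WAN (e.g. OSCARS) reservation
    user_credentials: str               # e.g. a Grid certificate subject
    rescheduling_criteria: str          # how the request may be modified if it doesn't fit

req = ReservationData("198.51.100.10", 2811, "203.0.113.20", 2811,
                      start_time=0.0, duration=3600.0, bandwidth=1e9,
                      qos_class="EF", wan_reservation_id=None,
                      user_credentials="/DC=org/DC=example/CN=user",
                      rescheduling_criteria="shift start time")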

19 Distributed Reservation Negotiation
- End-to-end paths comprise multiple segments; each segment in each domain is established by a reservation
- Domains have to agree on parameters and their ranges
- Each domain is characterized by a resource availability graph, e.g., for bandwidth
- The availability across all domains can be established by calculating the minimum availability graph
- Each new reservation has to fit in the available area; reservations that don't fit have to be modified, and if no modification makes a reservation fit, it is rejected
- TeraPaths currently modifies only the start time, on an individual-site basis, and iterates with counter-offers; OSCARS is tried if/after the end sites agree
- This will be extended to modify start time, end time, and bandwidth, using end-to-end BAGs where applicable, or a combination of BAGs plus trial and error otherwise
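The minimum availability graph and the fit test can be illustrated with a short sketch. BAGs are modeled here as step functions, i.e. sorted (time, available bandwidth) pairs; this is illustrative code and made-up numbers, not the TeraPaths implementation.

def combined_bag(bags):
    """Pointwise minimum of several step-function BAGs."""
    times = sorted({t for bag in bags for t, _ in bag})
    def value_at(bag, t):
        v = bag[0][1]
        for bt, bv in bag:
            if bt <= t:
                v = bv
            else:
                break
        return v
    return [(t, min(value_at(bag, t) for bag in bags)) for t in times]

def fits(bag, start, duration, bandwidth):
    """True if `bandwidth` is available throughout [start, start + duration)."""
    end = start + duration
    for i, (t, avail) in enumerate(bag):
        nxt = bag[i + 1][0] if i + 1 < len(bag) else float("inf")
        if t < end and nxt > start and avail < bandwidth:
            return False
    return True

domain_a = [(0, 10), (4, 6), (8, 10)]     # time -> Gb/s free in domain A
domain_b = [(0, 8), (6, 4), (10, 8)]      # time -> Gb/s free in domain B
e2e = combined_bag([domain_a, domain_b])  # [(0, 8), (4, 6), (6, 4), (8, 4), (10, 8)]
print(e2e)
print(fits(e2e, start=0, duration=4, bandwidth=6))   # True
print(fits(e2e, start=4, duration=4, bandwidth=6))   # False: only 4 Gb/s from t=6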

20 Bandwidth Reservation Requests and Bandwidth Availability Graph (BAG)
[Diagram: six bandwidth reservation requests plotted on a time-bandwidth plane, each with its own start and end time, and the resulting bandwidth availability graph showing the maximum, reserved, and available bandwidth over time.]

21 Find Resources for New Request
[Diagram: the availability graph with a new request shown (a) as originally requested and (b) modified, shifted within the allowed start-time range T_Smin to T_Smax, so that it fits under the available bandwidth.]

22 End-to-End Bandwidth Availability Graph
[Diagram: bandwidth availability graphs (a) of domain A and (b) of domain B, with per-domain ceilings max_A and max_B, and the combined end-to-end BAG obtained from them.]

23 Storage Resource Managers (SRMs): Definition
SRMs are middleware components whose function is to provide dynamic space allocation and file management for storage components.

24 Requirements
Grid architecture needs to include reservation and scheduling of:
- Compute resources
- Storage resources
- Network resources
The role of Storage Resource Managers (SRMs) in the data grid architecture: shared storage resource allocation and scheduling
- Especially important for data-intensive applications
- Files are often archived on a mass storage system (MSS)
- Wide area networks: need to minimize transfers by file sharing
- Scaling: large collaborations (100s of nodes, 1000s of clients) create opportunities for file sharing
- Controlled file replication and caching
- Need to support non-blocking (asynchronous) requests
- Storage cleanup (garbage collection) using lifetimes

25 Uniformity of Interface: Compatibility of SRMs
[Diagram: users/applications and Grid middleware talk through a uniform SRM interface to different back ends: SRM/Enstore, SRM/JASMine, SRM/dCache, SRM/CASTOR, SRM over Unix-based disks, and the CCLRC RAL SE.]

26 Want: Peer-to-Peer Uniform Interface
[Diagram: clients (command line and client programs) at the client's site and multiple sites (site 1, site 2, ..., site N), each with a Storage Resource Manager in front of disk caches and an MSS, all communicating over the network through a uniform SRM interface.]

27 Who's involved
- CERN, European Organization for Nuclear Research, Switzerland: Lana Abadie, Paolo Badino, Olof Barring, Jean-Philippe Baud, Tony Cass, Flavia Donno, Akos Frohner, Birger Koblitz, Sophie Lemaitre, Maarten Litmaath, Remi Mollon, Giuseppe Lo Presti, David Smith, Paolo Tedesco
- Deutsches Elektronen-Synchrotron (DESY), Hamburg, Germany: Patrick Fuhrmann, Tigran Mkrtchan
- Fermi National Accelerator Laboratory, Illinois, USA: Matt Crawford, Dmitry Litvinsev, Alexander Moibenko, Gene Oleynik, Timur Perelmutov, Don Petravick
- ICTP/EGRID, Italy: Ezio Corso, Massimo Sponza
- INFN/CNAF, Italy: Alberto Forti, Luca Magnoni, Riccardo Zappi
- LAL/IN2P3/CNRS, Faculté des Sciences, Orsay Cedex, France: Gilbert Grosdidier
- Lawrence Berkeley National Laboratory, California, USA: Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim
- Rutherford Appleton Laboratory, Oxfordshire, England: Shaun De Witt, Jens Jensen, Jiri Menjak
- Thomas Jefferson National Accelerator Facility (TJNAF), USA: Michael Haddox-Schatz, Bryan Hess, Andy Kowalski, Chip Watson

28 Concepts

29 Storage Resource Managers: Main Concepts
- Non-interference with local policies
- Advance space reservations
- Dynamic space management
- Pinning files in spaces
- Support for an abstract file name: the Site URL
- Temporary assignment of file names for transfer: the Transfer URL
- Directory management and ACLs
- Transfer protocol negotiation
- Peer-to-peer request support
- Support for asynchronous multi-file requests
- Support for abort, suspend, and resume operations

30 SRM Functionality
Space reservation:
- Negotiate and assign space to users
- Manage the "lifetime" of spaces
- Release and compact space
File management:
- Assign space for putting files into the SRM
- Pin files in storage when requested, until they are released
- Manage the "lifetime" of files
- Manage the action taken when pins expire (depends on file type)
Get files from remote locations when necessary:
- Purpose: to simplify the client's task
- srmCopy, in "pull" and "push" modes

31 Concepts: Space Reservations
Negotiation:
- Client asks for space: C-guaranteed, MaxDesired
- SRM returns: S-guaranteed <= C-guaranteed, best effort <= MaxDesired
Types of space:
- Specified during srmReserveSpace
- Access latency (online, nearline)
- Retention policy (replica, output, custodial)
- Subject to limits per client (SRM or VO policies)
- Default: implementation and configuration specific
Lifetime:
- Negotiated: C-lifetime requested; SRM returns S-lifetime <= C-lifetime
Space reference handle:
- SRM returns a space reference handle (space token)
- The client can assign a description
- Users can use srmGetSpaceTokens to recover handles on the basis of ownership

32 Concepts: Site URL and Transfer URL
Provide a Site URL (SURL):
- The URL known externally, e.g., in replica catalogs
- e.g., srm://ibm.cnaf.infn.it:8444/dteam/test.10193
Get back a Transfer URL (TURL):
- The path can differ from the SURL (SRM internal mapping)
- The protocol is chosen by the SRM based on the request's protocol preference
- e.g., gsiftp://ibm139.cnaf.infn.it:2811//gpfs/sto1/dteam/test.10193
One SURL can have many TURLs:
- Files can be replicated in multiple storage components
- Files may be in near-line and/or on-line storage
- In a lightweight SRM (a single file system on disk), the SURL may be the same as the TURL except for the protocol
File sharing is possible:
- The same physical file can serve many requests; this needs to be managed by the SRM

33 Concepts: Transfer Protocol Negotiation
Negotiation:
- The client provides an ordered list of desired transfer protocols
- The SRM returns the highest-preference protocol it supports
Example:
- Protocol list: bbftp, gridftp, ftp
- SRM returns: gridftp
Advantages:
- Easy to introduce new protocols
- The user controls which protocols to offer
How is it returned?
- As the protocol of the Transfer URL (TURL)
- Example: bbftp://dm.slac.edu/temp/run11/File678.txt
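A few lines are enough to illustrate the negotiation rule: the SRM picks the first protocol in the client's preference-ordered list that it also supports, and that protocol then appears in the returned TURL. The supported-protocol set and the TURL host/path below are made up for the example.

def negotiate_protocol(client_preferences, srm_supported):
    for proto in client_preferences:          # client order expresses preference
        if proto in srm_supported:
            return proto
    return None                               # no common protocol

turl_host_path = "dm.slac.edu/temp/run11/File678.txt"   # hypothetical target
proto = negotiate_protocol(["bbftp", "gridftp", "ftp"], {"gridftp", "ftp"})
print(proto)                                  # gridftp, as in the slide's example
print(f"{proto}://{turl_host_path}")          # returned as the TURL's protocol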

34 Summary: SRM Methods (partial list)
File movement: srmPrepareToGet, srmPrepareToPut, srmRemoteCopy, srmBringOnline, srmAddFilesToSpace, srmPurgeFromSpace
Lifetime management: srmReleaseFiles, srmPutDone, srmExtendFileLifeTimeInSpace
Terminate/resume: srmAbortRequest, srmAbortFile, srmSuspendRequest, srmResumeRequest
Space management: srmReserveSpace, srmReleaseSpace, srmUpdateSpace, srmGetSpaceTokens
File-type management: srmChangeFileType, srmChangeSpaceForFiles
Status/metadata: srmGetRequestStatus, srmGetFileStatus, srmGetRequestSummary, srmGetRequestID, srmGetFilesMetaData, srmGetSpaceMetaData

35 e.g., Request-to-Get Files: Functional Spec
srmPrepareToGet
In:
- TUserID userID
- TGetFileRequest[] arrayOfFileRequest
- string[] transferProtocols
- string userRequestDescription
- TStorageSystemInfo storageSystemInfo
- Boolean streamingMode
Out:
- TRequestToken requestToken
- TReturnStatus returnStatus
- TGetRequestFileStatus[] arrayOfFileStatus

36 e.g., Space Reservation: Functional Spec
srmReserveSpace
In:
- TUserID userID
- TSpaceType typeOfSpace
- String userSpaceTokenDescription
- TSizeInBytes sizeOfTotalSpaceDesired
- TSizeInBytes sizeOfGuaranteedSpaceDesired
- TLifeTimeInSeconds lifetimeOfSpaceToReserve
- TStorageSystemInfo storageSystemInfo
- int expectedFileSize[]
Out:
- TSpaceToken referenceHandleOfReservedSpace
- TSpaceType typeOfReservedSpace
- TSizeInBytes sizeOfTotalReservedSpace
- TSizeInBytes sizeOfGuaranteedReservedSpace
- TLifeTimeInSeconds lifetimeOfReservedSpace
- TReturnStatus returnStatus
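To make the In/Out parameters above concrete, here is a toy model of the space reservation negotiation described on slides 31 and 36: the SRM grants at most what was requested, bounded by what it has, and returns a space token. The helper, policy, numbers, and token value are invented for illustration and are not from any SRM release.

def reserve_space(total_desired, guaranteed_desired, lifetime_desired,
                  free_space, max_lifetime):
    guaranteed = min(guaranteed_desired, free_space)   # S-guaranteed <= C-guaranteed
    total = min(total_desired, free_space)             # best effort <= MaxDesired
    lifetime = min(lifetime_desired, max_lifetime)     # S-lifetime <= C-lifetime
    if guaranteed == 0:
        return {"returnStatus": "SRM_NO_FREE_SPACE"}
    return {"spaceToken": "ST-0001",                   # handle the client keeps
            "sizeOfGuaranteedReservedSpace": guaranteed,
            "sizeOfTotalReservedSpace": total,
            "lifetimeOfReservedSpace": lifetime,
            "returnStatus": "SRM_SUCCESS"}

print(reserve_space(total_desired=10 * 2**40, guaranteed_desired=5 * 2**40,
                    lifetime_desired=7 * 86400,
                    free_space=8 * 2**40, max_lifetime=3 * 86400))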

37 Example SRM Usage

38 Berkeley Storage Manager (BeStMan), LBNL
- Java implementation
- Designed to work with Unix-based disk systems, as well as an MSS to stage/archive from/to its own disk (currently HPSS)
- Adaptable to other file systems and storage (e.g., NCAR MSS, VU L-Store, TTU Lustre)
- Uses an in-memory database (BerkeleyDB)
- Multiple transfer protocols
- Space reservation
- Directory management (no ACLs)
- Can copy files from/to remote SRMs
- Can copy an entire directory robustly: large-scale data movement of thousands of files; recovers from transient failures (e.g., MSS maintenance, network down)
- Local policy: fair request processing, file replacement on disk, garbage collection

39 SRMs at Work
Europe: WLCG/EGEE
- 177+ deployments, managing more than 10 PB
- 116 DPM/SRM
- 54 dCache/SRM
- 7 CASTOR/SRM at CERN, CNAF, PIC, RAL, Sinica
- StoRM at ICTP/EGRID, INFN/CNAF
US
- Estimated at about 35 deployments
- OSG: dCache/SRM from FNAL; BeStMan/SRM from LBNL; BeStMan-Gateway, a skeleton SRM for local implementations; SRM-Xrootd, using BeStMan-Gateway for Xrootd
- ESG: DRM/SRM and HRM/SRM at LANL, LBNL, LLNL, NCAR, ORNL
- Others: JasMINE/SRM from TJNAF; L-Store/SRM from Vanderbilt Univ.; BeStMan/SRM adaptation on the Lustre file system at Texas Tech

40 Interoperability in SRM v2.2
[Diagram: a client user/application interoperating across SRM implementations: CASTOR, DPM (mySQL DB, disk), BeStMan, dCache, xrootd, and SRB (iRODS), at sites including CERN/EGEE, BNL, SLAC, LBNL, SDSC, and SINICA.]

41 Earth System Grid
Main ESG portal:
- 148.53 TB of data at four locations (NCAR, LBNL, ORNL, LANL); 965,551 files
- Includes the past 7 years of joint DOE/NSF climate modeling experiments
- 4,713 registered users from 28 countries
- Downloads to date: 31 TB / 99,938 files
IPCC AR4 ESG portal:
- 28 TB of data at one location; 68,400 files
- Model data from 11 countries, generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change (IPCC)
- 818 registered analysis projects from 58 countries
- Downloads to date: 123 TB / 543,500 files; 300 GB/day on average
Courtesy: http://www.earthsystemgrid.org

42 SRM works in concert with other Grid components in the Earth System Grid (ESG)
[Diagram: the ESG and IPCC AR4 portals with metadata cataloguing services (MCS), replica location services (RLS), MyProxy, monitoring and discovery services, ESG metadata and user databases, XML data catalogs, ESG CA, and Globus security infrastructure, coordinating DRM and HRM storage resource managers, GridFTP servers and services, an FTP server, OPeNDAP-g, disk, and HPSS mass storage at LBNL, LLNL, ISI, NCAR, ORNL, ANL, and LANL.]

43 STAR Experiment
Data replication from BNL to LBNL:
- 1 TB / 10K files per week on average
- In production for over 4 years
Event processing in Grid Collector:
- Prototype uses SRMs and FastBit indexing embedded in the STAR framework
STAR analysis framework: job-driven data movement
1. Use BeStMan to bring files into local disk from a remote file repository
2. Execute jobs that access the "staged in" files on local disk
3. The job creates an output file on local disk
4. The job uses BeStMan to move the output file from local storage to the remote archival location
5. The SRM cleans up the local disk when the transfer is complete
6. Any other SRM implementing v2.2 can be used
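The six numbered steps above amount to a stage-in / compute / stage-out loop, sketched below. This is a hypothetical illustration: the srm_client object, its methods, and the file names stand in for BeStMan client calls and are not a real API.

import subprocess

def run_analysis_job(srm_client, remote_surls, local_dir, archive_surl, job_cmd):
    staged = [srm_client.bring_to_local(surl, local_dir)            # step 1: stage in
              for surl in remote_surls]
    output = f"{local_dir}/output.dat"                              # step 3: job output
    subprocess.run(job_cmd + staged + ["-o", output], check=True)   # step 2: run the job
    srm_client.copy_to_remote(output, archive_surl)                 # step 4: archive output
    srm_client.release(staged + [output])                           # step 5: clean local disk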

44 DataMover in the HENP/STAR Experiment for Robust Multi-File Replication over WAN
[Diagram: DataMover (command-line interface) gets the list of files from a directory and issues SRM-COPY for thousands of files; BeStMan at LBNL (performing writes) issues SRM-GET one file at a time to BeStMan at BNL (performing reads); GridFTP GET in pull mode transfers data over the network between disk caches; files are staged from and archived to mass storage; equivalent directories are created at the target; completed transfers are registered in the RRS catalog (streaming mode).]

45 File Tracking Shows Recovery from Transient Failures
[Chart: file transfer tracking over time; total transferred: 45 GB.]

46 SRM-Tester Results: SRM v2.2 Collective View
[Chart: collective SRM v2.2 interoperability test results from SRM-Tester.]

47 Summary
- Storage resource management is essential for the Grid
- SRM is a functional definition, adaptable to different frameworks (currently web services)
- Multiple implementations interoperate: this permits special-purpose implementations for unique products and allows one SRM product to be interchanged with another
- SRM implementations exist, some in production use: Particle Physics Data Grid, Earth System Grid, medicine, fusion, and more coming
- Cumulative experience is captured in the OGF GSM-WG; the SRM v2.2 specification is now accepted

48 Project Plan

49 TeraPaths Tasks and Schedule
0) Planning and design of new TeraPaths functionality (0, 1)
1) Decide on layer 2 vs. layer 3 reservations based on source and target sites; decide on uni-directional or bi-directional reservations (1, 2)
2) Find the reservation that minimizes reservedEndTime: design, implement, test, and evaluate distributed reservation negotiation (2, 12)
3) Implement the web-service-based API and the code needed to accommodate the new interaction between SRM and TeraPaths (11, 4)
4) Design and develop the "local-null" (consultant) mode (12, 6)
5) Set up the TeraPaths testbed between BNL and Michigan and integrate it with the end storage systems (SRM/BeStMan) (16, 2)
6) Functional, reliability, and performance tests in stand-alone mode and integrated mode (18, 4)
7) Plan for STAR between BNL and NERSC (12, 12)
8) User experience feedback and final project report (21, 3)
Note: the numbers following each task are (start month, length of task in months)

50 BeStMan Tasks and Schedule
0) Planning and design of new BeStMan functionality (0, 1)
1) Add a persistent database to BeStMan to keep network reservation state (1, 6)
2) Provide a network module to plug in TeraPaths (4, 2)
3) Develop a module to register information in the DB (6, 1)
4) Develop a module to determine available bandwidth based on current commitments and policies (6, 2)
5) Develop server and client web-service APIs to communicate with the user on reservation requests and outcomes (8, 4)
6) Develop server and client web-service APIs for source-BeStMan and target-BeStMan to communicate in both "pull" and "push" modes (12, 4)
7) Set up BeStMan-TP in the BNL-Michigan testbed (16, 2)
8) Run basic tests on the testbed (18, 2)
9) Run scalability tests on the testbed (20, 2)
10) Extend BeStMan for multi-transfer coordination (15, 6)
11) Plan for the STAR setup: BNL to NERSC (18, 6)
Note: the numbers following each task are (start month, length of task in months)

51 Management Tasks and Schedule
1) Monitoring (2, 4)
   1.1) SRM transfer monitoring (design; possibly stored in the DB)
   1.2) Circuit health monitoring by TeraPaths
2) Diagnosis (6, 6)
   2.1) Dynamic bandwidth check by SRM, calling TeraPaths
   2.2) Query reservation status
3) Error handling (12, 6)
   3.1) Dynamic failover to the best-effort network and re-acquisition of a network circuit for the remaining data transfer (12, 2)
   3.2) Redundancy mode: acquire more than one circuit and fail over to the backup(s) instead of best effort (14, 2)
   3.3) Notify SRM about unrecoverable failures (16, 2)
4) Flexible authentication framework: design pluggable authentication to allow administrators to choose between Grid certificates and other alternatives, such as plain passwords (12, 6)
Note: the numbers following each task are (start month, length of task in months)

