Presentation is loading. Please wait.

Presentation is loading. Please wait.

Peter F. Couvares (based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber) Associate Researcher, Condor Team Computer Sciences Department University.

Similar presentations


Presentation on theme: "Peter F. Couvares (based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber) Associate Researcher, Condor Team Computer Sciences Department University."— Presentation transcript:

1 Peter F. Couvares (based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber) Associate Researcher, Condor Team Computer Sciences Department University of Wisconsin-Madison STORK & NeST: Making Data Placement a First Class Citizen in the Grid

2 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 2 Need to move data around..

3 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 3 While doing this.. Locate the data Access heterogeneous resources Recover form all kinds of failures Allocate and de-allocate storage Move the data Clean-up everything All of these need to be done reliably and efficiently!

4 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 4 Stork A scheduler for data placement activities in the Grid What Condor is for computational jobs, Stork is for data placement Stork’s fundamental concept: “Make data placement a first class citizen in the Grid.”

5 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 5 Outline Introduction The Concept Stork Features Big Picture Conclusions

6 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 6 The Concept Stage-in Execute the Job Stage-out Individual Jobs

7 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 7 The Concept Stage-in Execute the Job Stage-out Stage-in Execute the jobStage-outRelease input spaceRelease output space Allocate space for input & output data Individual Jobs

8 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 8 The Concept Stage-in Execute the Job Stage-out Stage-in Execute the jobStage-outRelease input spaceRelease output space Allocate space for input & output data Data Placement Jobs Computational Jobs

9 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 9 DAGMan The Concept Condor Job Queue Data A A.submit Data B B.submit Job C C.submit ….. Parent A child B Parent B child C Parent C child D, E ….. C Stork Job Queue E DAG specification ACB D E F

10 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 10 Why Stork? Stork understands the characteristics and semantics of data placement jobs. Can make smart scheduling decisions, for reliable and efficient data placement.

11 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 11 Understanding Job Characteristics & Semantics Job_type = transfer, reserve, release? Source and destination hosts, files, protocols to use? Determine concurrency level Can select alternate protocols Can select alternate routes Can tune network parameters (tcp buffer size, I/O block size, # of parallel streams) …

12 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 12 Support for Heterogeneity Protocol translation using Stork memory buffer.

13 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 13 Support for Heterogeneity Protocol translation using Stork Disk Cache.

14 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 14 Flexible Job Representation [ Type = “Transfer”; Src_Url = “srb://ghidorac.sdsc.edu/kosart.condor/x.dat”; Dest_Url = “nest://turkey.cs.wisc.edu/kosart/x.dat”; …… ]

15 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 15 Failure Recovery and Efficient Resource Utilization Fault tolerance Just submit a bunch of data placement jobs, and then go away.. Control number of concurrent transfers from/to any storage system Prevents overloading Space allocation and De-allocations Make sure space is available

16 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 16 Outline Introduction The Concept Stork Features Big Picture Conclusions

17 JOB DESCRIPTIONS USER PLANNER Abstract DAG

18 JOB DESCRIPTIONS PLANNER USER WORKFLOW MANAGER Abstract DAG Concrete DAG RLS

19 JOB DESCRIPTIONS PLANNER DATA PLACEMENT SCHEDULER COMPUTATION SCHEDULER USER GRID STORAGE SYSTEMS WORKFLOW MANAGER Abstract DAG Concrete DAG RLS / MCAT GRID COMPUTE NODES

20 JOB DESCRIPTIONS PLANNER DATA PLACEMENT SCHEDULER COMPUTATION SCHEDULER USER GRID STORAGE SYSTEMS WORKFLOW MANAGER Abstract DAG Concrete DAG RLS / MCAT GRID COMPUTE NODES D. JOB LOG FILES C. JOB LOG FILES POLICY ENFORCER

21 JOB DESCRIPTIONS PLANNER DATA PLACEMENT SCHEDULER COMPUTATION SCHEDULER USER GRID STORAGE SYSTEMS WORKFLOW MANAGER Abstract DAG Concrete DAG RLS / MCAT GRID COMPUTE NODES D. JOB LOG FILES C. JOB LOG FILES POLICY ENFORCER DATA MINER NETWORK MONITORING TOOLS FEEDBACK MECHANISM

22 JOB DESCRIPTIONS PLANNER STORK CONDOR/ CONDOR-G USER GRID STORAGE SYSTEMS DAGMAN Abstract DAG Concrete DAG RLS / MCAT GRID COMPUTE NODES D. JOB LOG FILES C. JOB LOG FILES MATCHMAKER DATA MINER NETWORK MONITORING TOOLS FEEDBACK MECHANISM

23 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 23 Conclusions Regard data placement as individual jobs. Treat computational and data placement jobs differently. Introduce a specialized scheduler for data placement. Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, reliable and efficient transfers.

24 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 24 Future work Enhanced interaction between Stork and higher level planners better coordination of CPU and I/O Interaction between multiple Stork servers and job delegation from one to another Enhanced authentication mechanisms More run-time adaptation

25 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 25 Related Publications Tevfik Kosar and Miron Livny. “Stork: Making Data Placement a First Class Citizen in the Grid”. In Proceedings of 24 th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004. George Kola, Tevfik Kosar and Miron Livny. “A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication. To appear in Proceedings of 14 th ACM Int. Workshop on etwork and Operating Systems Support for Digital Audio and Video (Nossdav 2004), Kinsale, Ireland, June 2004. Tevfik Kosar, George Kola and Miron Livny. “A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment”. In Proceedings of 2 nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003. George Kola, Tevfik Kosar and Miron Livny. “Run-time Adaptation of Grid Data Placement Jobs”. In Proceedings of Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.

26 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 26 You don’t have to FedEx your data anymore.. Stork delivers it for you! For more information: Email: condor-admin@cs.wisc.edu http://www.cs.wisc.edu/condor/stork

27 www.cs.wisc.edu/condor Reliable and Efficient Grid Data Placement 27 NeST NeST (Network Storage Technology) A lightweight, portable storage manager for data placement activities on the Grid Allocation: NeST negotiates “mini storage contracts” between users and server. Multi-protocol: Supports Chirp, GridFTP, NFS, HTTP Chirp is NeST’s internal protocol. Secure: GSI authentication Lightweight: Configuration and installation can be performed in minutes, and does not require root.

28 www.cs.wisc.edu/condor Why storage allocations ? › Users need both temporary storage, and long-term guaranteed storage. › Administrators need a storage solution with configurable limits and policy. › Administrators will benefit from NeST’s automatic reclamations of expired storage allocations.

29 www.cs.wisc.edu/condor Storage allocations in NeST › Lot – abstraction for storage allocation with an associated handle Handle is used for all subsequent operations on this lot › Client requests lot of a specified size and duration. Server accepts or rejects client request.

30 www.cs.wisc.edu/condor Lot types › User / Group User: single user (user controls ACL) Group: shared use (all members control ACL) › Best effort / Guaranteed Best effort: server may purge data if necessary. Good fit for derived data. Guaranteed: server honors request duration. › Hierarchical: Lots with lots (“sublots”)

31 www.cs.wisc.edu/condor Lot operations › Create, Delete, Update › MoveFile Moves files between lots › AddUser, RemoveUser Lot level authorization List of users allowed to request sub-lots › Attach / Detach Performs NeST lot to path binding

32 www.cs.wisc.edu/condor Functionality: GT4 GridFTP GridFTP Server Disk Module Disk Storage globus-url-copy Sample Application (GSI-FTP)

33 www.cs.wisc.edu/condor NeST Server Functionality: GridFTP +NeST GridFTP Server NeST Module Disk Storage NeST Client Chirp Handler globus-url-copy (Lot operations, etc.) (File transfers) (chirp)(GSI-FTP) (File transfer) (chirp)

34 www.cs.wisc.edu/condor NeST Server NeST with Stork GridFTP Server NeST Module Stork (Lot operations, etc.) (chirp) (File transfers) (GSI-FTP) (File transfer) (chirp) GT4NeST Disk Storage Chirp Handler

35 www.cs.wisc.edu/condor Stork NeST Sample Work DAG NeST AllocateJob Xfer InXfer OutRelease Stork Condor-G

36 www.cs.wisc.edu/condor Connection Manager › Used to control connections to NeST › Allows connection reservations: Reserve # of simultaneous connections Reserve total bytes of transfer Reservations have durations & expire Reservations are persistent

37 www.cs.wisc.edu/condor Release Status › http://www.cs.wisc.edu/condor/nest › v0.9.7 expected soon v0.9.7 Pre 2 released Just bug fixes; features frozen › v1.0 expected later this year › Currently supports Linux, will support other O/S’s in the future.

38 www.cs.wisc.edu/condor Roadmap › Performance tests with Stork › Continue hardening code base › Expand supported platforms Solaris & other UNIX-en › Bundle with Condor › Connection Manager

39 www.cs.wisc.edu/condor Questions ? › More information available at http://www.cs.wisc.edu/condor/nest


Download ppt "Peter F. Couvares (based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber) Associate Researcher, Condor Team Computer Sciences Department University."

Similar presentations


Ads by Google