STORK & NeST: Making Data Placement a First Class Citizen in the Grid
Peter F. Couvares (based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber)
Associate Researcher, Condor Team, Computer Sciences Department, University of Wisconsin-Madison

Need to move data around…

While doing this…
› Locate the data
› Access heterogeneous resources
› Recover from all kinds of failures
› Allocate and de-allocate storage
› Move the data
› Clean up everything
All of these need to be done reliably and efficiently!

Stork
› A scheduler for data placement activities in the Grid
› What Condor is for computational jobs, Stork is for data placement
› Stork’s fundamental concept: “Make data placement a first class citizen in the Grid.”

Outline: Introduction, The Concept, Stork Features, Big Picture, Conclusions

The Concept: individual jobs
Stage-in → Execute the job → Stage-out

The Concept: individual jobs, with explicit space management
Allocate space for input & output data → Stage-in → Execute the job → Release input space → Stage-out → Release output space

The Concept: the same chain, split into data placement jobs (allocate space, stage-in, release input space, stage-out, release output space) and computational jobs (execute the job)

The Concept: DAGMan drives both queues. From the DAG specification (Data A A.submit, Data B B.submit, Job C C.submit, …, Parent A child B, Parent B child C, Parent C child D, E, …), DAGMan sends data placement nodes to the Stork job queue and computational nodes to the Condor job queue; a sketch of such a DAG file follows below.
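Reconstructed as a plain DAG file, the specification sketched on this slide would look roughly as follows; the node names and submit files come from the slide itself, while the DATA keyword for Stork nodes (versus JOB for Condor nodes) reflects the DAGMan syntax of that era and should be checked against your DAGMan manual:

# data placement nodes (handled by Stork) and a computational node (handled by Condor)
DATA   A  A.submit
DATA   B  B.submit
JOB    C  C.submit
PARENT A CHILD B
PARENT B CHILD C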

Why Stork?
› Stork understands the characteristics and semantics of data placement jobs.
› It can therefore make smart scheduling decisions for reliable and efficient data placement.

Understanding Job Characteristics & Semantics
› Job_type = transfer, reserve, release?
› Source and destination hosts, files, protocols to use?
› Determine concurrency level
› Can select alternate protocols
› Can select alternate routes
› Can tune network parameters (TCP buffer size, I/O block size, # of parallel streams)
› … (see the job-description sketch below)
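A hedged sketch of a transfer job carrying such hints, written in the ClassAd-style notation of the “Flexible Job Representation” slide that follows; Type, Src_Url, and Dest_Url are taken from that slide, while Alt_Protocols, TCP_Buffer_Size, and Parallel_Streams are illustrative attribute names invented here, not Stork’s documented schema:

[
  Type             = "Transfer";
  Src_Url          = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url         = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  Alt_Protocols    = "gsiftp, http";   // hypothetical attribute: fallback protocols to try
  TCP_Buffer_Size  = 262144;           // hypothetical attribute: TCP buffer size in bytes
  Parallel_Streams = 4;                // hypothetical attribute: number of parallel streams
]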

Support for Heterogeneity: protocol translation using the Stork memory buffer.

Support for Heterogeneity: protocol translation using the Stork disk cache.

Flexible Job Representation
[
  Type     = "Transfer";
  Src_Url  = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ……
]

Failure Recovery and Efficient Resource Utilization
› Fault tolerance
   Just submit a bunch of data placement jobs, and then go away…
› Control the number of concurrent transfers from/to any storage system
   Prevents overloading
› Space allocation and de-allocation (sketched below)
   Make sure space is available
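Allocation and de-allocation are themselves data placement jobs (the “reserve” and “release” job types listed earlier), so they can be written in the same notation. A minimal sketch, assuming hypothetical attribute names: everything except Type is invented for illustration, not Stork’s actual schema:

[
  Type      = "Reserve";
  Dest_Host = "turkey.cs.wisc.edu";    // hypothetical attribute: storage system to allocate on
  Size      = "2GB";                   // hypothetical attribute: amount of space requested
]
[
  Type      = "Release";
  Dest_Host = "turkey.cs.wisc.edu";    // hypothetical attribute: release the space after stage-out
]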

Outline: Introduction, The Concept, Stork Features, Big Picture, Conclusions

The Big Picture (architecture diagram, built up over several slides):
The user hands job descriptions to a planner, which produces an abstract DAG. A workflow manager, consulting a replica catalog (RLS / MCAT), turns the abstract DAG into a concrete DAG and dispatches computational jobs to a computation scheduler (which runs them on grid compute nodes) and data placement jobs to a data placement scheduler (which moves data between grid storage systems). A policy enforcer works from the computation and data placement job log files, and a data miner plus network monitoring tools feed run-time information back to the schedulers through a feedback mechanism.
In the concrete instantiation, DAGMan is the workflow manager, Condor / Condor-G the computation scheduler, Stork the data placement scheduler, and the matchmaker the policy enforcer.

Conclusions
› Regard data placement as individual jobs.
› Treat computational and data placement jobs differently.
› Introduce a specialized scheduler for data placement.
› Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, reliable and efficient transfers.

Future work
› Enhanced interaction between Stork and higher-level planners
   Better coordination of CPU and I/O
› Interaction between multiple Stork servers and job delegation from one to another
› Enhanced authentication mechanisms
› More run-time adaptation

Related Publications
› Tevfik Kosar and Miron Livny. “Stork: Making Data Placement a First Class Citizen in the Grid”. In Proceedings of the 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
› George Kola, Tevfik Kosar and Miron Livny. “A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication”. To appear in Proceedings of the 14th ACM Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2004), Kinsale, Ireland, June 2004.
› Tevfik Kosar, George Kola and Miron Livny. “A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment”. In Proceedings of the 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.
› George Kola, Tevfik Kosar and Miron Livny. “Run-time Adaptation of Grid Data Placement Jobs”. In Proceedings of the Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.

You don’t have to FedEx your data anymore… Stork delivers it for you! For more information:

NeST (Network Storage Technology)
A lightweight, portable storage manager for data placement activities on the Grid
› Allocation: NeST negotiates “mini storage contracts” between users and the server.
› Multi-protocol: supports Chirp, GridFTP, NFS, HTTP; Chirp is NeST’s internal protocol.
› Secure: GSI authentication.
› Lightweight: configuration and installation can be performed in minutes and do not require root.

Why storage allocations?
› Users need both temporary storage and long-term guaranteed storage.
› Administrators need a storage solution with configurable limits and policy.
› Administrators will benefit from NeST’s automatic reclamation of expired storage allocations.

Storage allocations in NeST
› Lot – abstraction for storage allocation with an associated handle
   Handle is used for all subsequent operations on this lot
› Client requests lot of a specified size and duration. Server accepts or rejects client request.

Lot types
› User / Group
   User: single user (user controls ACL)
   Group: shared use (all members control ACL)
› Best effort / Guaranteed
   Best effort: server may purge data if necessary. Good fit for derived data.
   Guaranteed: server honors request duration.
› Hierarchical: lots within lots (“sublots”)

Lot operations
› Create, Delete, Update
› MoveFile
   Moves files between lots
› AddUser, RemoveUser
   Lot-level authorization
   List of users allowed to request sub-lots
› Attach / Detach
   Performs NeST lot-to-path binding

Functionality: GT4 GridFTP. A sample application uses globus-url-copy to transfer files over GSI-FTP to the GridFTP server, whose disk module writes to disk storage.

Functionality: GridFTP + NeST. globus-url-copy still transfers files over GSI-FTP to the GridFTP server, but its NeST module passes the file transfer over chirp to the NeST server’s Chirp handler, which manages the disk storage; a NeST client issues lot operations, etc. directly to the NeST server over chirp.

NeST with Stork: Stork drives both sides, issuing lot operations over chirp to the NeST server and file transfers over GSI-FTP to the GT4 GridFTP server with the NeST module, which hands the file transfer over chirp to NeST’s Chirp handler and on to disk storage.

Sample work DAG with NeST and Stork: Allocate → Xfer In → Job → Xfer Out → Release, where Stork handles the NeST allocate, transfer, and release nodes and Condor-G runs the job; a DAG-file sketch follows below.
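Written out in the same DAG-file style as the earlier sketch, this work DAG might look like the following; all node and file names are invented for illustration, with DATA nodes going to Stork and the JOB node to Condor-G:

# reserve a NeST lot, stage data in, run the job, stage data out, release the lot
DATA   Allocate  allocate.stork
DATA   XferIn    xfer-in.stork
JOB    Work      work.submit
DATA   XferOut   xfer-out.stork
DATA   Release   release.stork
PARENT Allocate CHILD XferIn
PARENT XferIn   CHILD Work
PARENT Work     CHILD XferOut
PARENT XferOut  CHILD Release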

Connection Manager
› Used to control connections to NeST
› Allows connection reservations:
   Reserve # of simultaneous connections
   Reserve total bytes of transfer
   Reservations have durations & expire
   Reservations are persistent

Release Status
› v0.9.7 expected soon
   v0.9.7 Pre 2 released
   Just bug fixes; features frozen
› v1.0 expected later this year
› Currently supports Linux, will support other O/S’s in the future.

Roadmap
› Performance tests with Stork
› Continue hardening code base
› Expand supported platforms
   Solaris & other UNIX-en
› Bundle with Condor
› Connection Manager

Questions?
› More information available at