Reliable I/O on the Grid
Douglas Thain and Miron Livny
Condor Project, University of Wisconsin

Outline
- A Practical Problem
  - Half-Interactive Jobs
  - Solution: The Grid Console
- Philosophical Musings
- A New System: Kangaroo

Problem: "Half-Interactive" Jobs
- Users want to submit batch jobs to the Grid, but still be able to monitor the output interactively.
- But network failures are expected as a matter of course, so keeping the job running takes priority over getting output.
- Examples:
  - INFN: collider event simulation and reconstruction with CMS
  - NCSA: modelling with Gaussian

Existing Tools are not Sufficient
- Installing a uniform world-wide DFS is not feasible. Even if it were:
  - NFS: disconnect causes delay
  - AFS: close() can fail?!
- Condor
  - Vanilla: dependent on file system.
  - Standard: disconnect causes rollback.
- GASS
  - Staging mode: no incremental output.
  - Append mode: no easy failure recovery.

Solution: The Grid Console
- Trap reads and writes on stdio and send them via RPCs to be executed at the home site.
- If the connection is lost, just keep writing to disk but retry the connection periodically.
- If it is re-made, send all spooled data back and then continue operation.
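The behaviour above is essentially a spool-and-replay write path. The following C sketch illustrates that idea only; the function names (gc_remote_write, gc_spool_write, gc_replay_spool), the spool file name, and the stubbed "RPC" are hypothetical assumptions, not the actual Grid Console code.

    /* Minimal, illustrative sketch of a Grid-Console-style write path.
     * The remote call is stubbed so the example is self-contained. */
    #include <stdio.h>

    /* Stand-in for the RPC to the shadow at the home site.
     * Returns 0 on success, -1 if the connection is down. */
    static int gc_remote_write(const char *buf, size_t len) {
        (void)buf; (void)len;
        return -1;                       /* pretend the network is disconnected */
    }

    /* Append data to a local spool file while disconnected. */
    static int gc_spool_write(const char *buf, size_t len) {
        FILE *spool = fopen("gc_spool.dat", "ab");
        if (!spool) return -1;
        fwrite(buf, 1, len, spool);
        fclose(spool);
        return 0;
    }

    /* Write path: try the home site first, fall back to the spool. */
    static int gc_write(const char *buf, size_t len) {
        if (gc_remote_write(buf, len) == 0)
            return 0;                    /* connected: executed at the home site */
        return gc_spool_write(buf, len); /* disconnected: keep writing to disk */
    }

    /* When the connection is re-made, replay the spool and resume. */
    static void gc_replay_spool(void) {
        char buf[4096];
        size_t n;
        FILE *spool = fopen("gc_spool.dat", "rb");
        if (!spool) return;
        while ((n = fread(buf, 1, sizeof(buf), spool)) > 0)
            gc_remote_write(buf, n);     /* the real system would retry until success */
        fclose(spool);
    }

    int main(void) {
        gc_write("output line 1\n", 14);
        gc_replay_spool();               /* periodically retried in practice */
        return 0;
    }

The point of the design is visible in gc_write: the job never sees a network error; it either reaches the home site or keeps making progress against local disk.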

Solution: The Grid Console (architecture)
[Diagram: at the execution site, the application (APP) is attached via Bypass to a GC agent with a local spool directory; the agent forwards stdin, stdout, and stderr over RPC on TCP (with Globus authentication) to a GC shadow at the storage site, which writes them into the file system there. Other files are handled by the existing storage system (NFS, AFS, GASS, etc.).]

Observations on the Grid Console
- Interfaces well with existing systems:
  - Applied to vanilla Condor(-G) jobs.
  - Works on any dynamically-linked program.
- Undesired properties:
  - Only applies to standard streams.
  - Job is blocked during recovery mode.
- Strange property:
  - Disconnected mode might be faster than connected mode! Can we have it both ways?

Philosophical Musings
- What have we done?
- Hidden errors
  - The job is not designed to deal with unusual error conditions:
    - write -> disconnected?
    - close -> host not found?
- Hidden latency
  - The job is not designed to deal with slow I/O. It assumes that I/O ops are low latency, or at least appear to be. GC could be better at this.

Philosophical Musings, #2
- These problems are one and the same:
  - Hiding errors: retry, report the error to a third party, and use another resource to satisfy the request.
  - Hiding latency: use another resource to satisfy the request in the background, but if an error occurs, there is no channel to report it.
- Reliability is not a binary property. A slow link can be just as damaging to throughput as a disconnection.

Philosophical Musings, #3
- A traditional OS deals with these same problems when it uses memory to buffer disk operations.
- Let's apply the same principle to the Grid: use memory and disk to satisfy unscheduled I/O operations in the background.

Introducing Kangaroo
- A user-level data movement system that "hops" files piecemeal from node to node on the Grid.
- A background process that will "fight" for your jobs' I/O needs.
- A "damage control" specialist that will give errors to a third party but never admit failure to the job.

Our Vision: A Grid File System
[Diagram: applications and disks at many sites, each with its own file system, joined together by a data movement system made up of Kangaroo (K) nodes.]

Kangaroo Prototype
- We have built a first-try Kangaroo that validates the central ideas of error and latency hiding.
- Emphasis on high-level reliability and throughput, not on low-level optimizations.
- First, work to improve writes, but leave room in the design to improve reads.

User Interface
- Like the GC, attach standard applications with Bypass.
  - A tool for trapping UNIX I/O operations and routing them through new code.
  - Works on any dynamically-linked, unmodified program.
- Examples:
    setenv LD_PRELOAD pfs_agent.so
    vi kangaroo://coral.cs.wisc.edu/etc/hosts
    gcc gsiftp://ftp/input.c -o kangaroo://host/out
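The slide's examples rely on library preloading to trap I/O in unmodified, dynamically linked programs. As a rough illustration of that general LD_PRELOAD technique (not the actual Bypass or pfs_agent.so code), here is a tiny interposition library in C; the pass-through behaviour and file name interpose.so are assumptions for illustration only.

    /* interpose.c -- illustrative LD_PRELOAD interposition on write().
     * Build: gcc -shared -fPIC interpose.c -o interpose.so -ldl
     * Use:   LD_PRELOAD=./interpose.so some_program                  */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <unistd.h>

    /* The real write(), looked up lazily from the next library (libc). */
    static ssize_t (*real_write)(int, const void *, size_t) = NULL;

    ssize_t write(int fd, const void *buf, size_t count)
    {
        if (!real_write)
            real_write = (ssize_t (*)(int, const void *, size_t))
                         dlsym(RTLD_NEXT, "write");

        /* A real agent would inspect fd here and, for files opened under a
         * kangaroo:// or gsiftp:// name, route the data to its own transport
         * instead of the local kernel. This sketch simply forwards the call. */
        return real_write(fd, buf, count);
    }

This is what lets ordinary tools such as vi and gcc operate on kangaroo:// and gsiftp:// names without recompilation, as shown in the examples above.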

Kangaroo Prototype (architecture)
[Diagram: at the execution site, the application (APP) is attached via Bypass to a Kangaroo agent. The agent hands writes to a local K server, which buffers them in a spool directory; a K mover pushes the spooled data to the K server at the storage site, which commits it to the file system there. Reads travel between the sites over the same chain of servers.]

Microbenchmark: File Transfer
- Create a large output file at the execution site, and send it to a storage site.
- Ideal conditions: no competition for CPU, network, or disk bandwidth.
- Three methods:
  - Stream output directly to the target.
  - Stage output to disk, then copy to the target.
  - Kangaroo.

Macrobenchmark: Image Processing
- Post-processing of satellite image data: compute various enhancements and produce an output for each.
  - Read input image
  - For I = 1 to N:
    - Compute transformation of image
    - Write output image
- Example:
  - Image size: about 5 MB
  - Compute time: about 6 sec
  - I/O-to-CPU ratio: 0.91 MB/s
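As a concrete rendering of the loop on this slide, the C sketch below reads one input image and writes N transformed outputs. The file names, the transform, and the value of N are hypothetical assumptions; only the ~5 MB image size and ~6 s compute time come from the slide.

    /* Hypothetical sketch of the batch-pipelined image job described above. */
    #include <stdio.h>
    #include <stdlib.h>

    #define IMAGE_BYTES (5 * 1024 * 1024)   /* ~5 MB image, per the slide */
    #define N_OUTPUTS   8                    /* number of enhancements: assumed */

    /* Stand-in for the ~6 seconds of computation per enhancement. */
    static void transform(unsigned char *img, size_t len, int which) {
        for (size_t i = 0; i < len; i++)
            img[i] = (unsigned char)(img[i] + which);
    }

    int main(void) {
        unsigned char *img = malloc(IMAGE_BYTES);
        if (!img) return 1;

        /* Read the input image once. */
        FILE *in = fopen("input.img", "rb");
        if (in) { fread(img, 1, IMAGE_BYTES, in); fclose(in); }

        /* For I = 1 to N: compute a transformation, write an output image.
         * Each iteration produces ~5 MB of output for ~6 s of CPU time, so the
         * job only needs roughly 1 MB/s (0.91 MB/s on the slide) of write
         * bandwidth -- which Kangaroo can absorb in the background. */
        for (int i = 1; i <= N_OUTPUTS; i++) {
            transform(img, IMAGE_BYTES, i);
            char name[64];
            snprintf(name, sizeof(name), "output-%d.img", i);
            FILE *out = fopen(name, "wb");
            if (out) { fwrite(img, 1, IMAGE_BYTES, out); fclose(out); }
        }

        free(img);
        return 0;
    }

The structure matters more than the numbers: because each write burst is followed by a long compute phase, there is ample idle time in which a background mover can drain the output.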

I/O Models for Image Processing
[Timeline diagram comparing three I/O models. Online I/O: each compute (CPU) phase is followed by a synchronous remote OUTPUT phase. Offline I/O: INPUT, CPU, and OUTPUT phases are serialized, with the transfer done afterwards. Current Kangaroo: OUTPUT is overlapped with the CPU phases and a background PUSH delivers the data to the storage site.]

Summary of Results
- At the micro level, our prototype provides reliability with reasonable performance.
- At the macro level, I/O overlap gives reliability and speedups (for some applications).
- Kangaroo allows the application to survive on its real I/O needs: 0.91 MB/s. Without it, there is "false pressure" to provide fast networks.

Research Problems
- Virtual Memory
  - A K-node has one input, one output, and a memory/disk buffer. How should we move data to maximize throughput?
- File System
  - The existing spool directory is clumsy and inefficient. Need a file system optimized for 1-write, 1-read, 1-delete.
- Fine-Grained Scheduling
  - Reads should have priority over writes. This is easy at one node, but what about multiple nodes?

Conclusion
- The Grid is BYOFS (bring your own file system).
- Error hiding and latency hiding are tightly-knit problems.
- The solution to both is to overlap I/O and computation.
- The benefits of high-level overlap can outweigh any low-level inefficiencies.

Conclusion
- Need more info?
- Demo time: Wednesday, 9-12 AM, Room 3381 CS
- Questions now?