Efficiently Sharing Common Data
HTCondor Week 2015
Zach Miller
Center for High Throughput Computing
Department of Computer Sciences, University of Wisconsin-Madison

“Problems” with HTCondor (slide 2)
› Input files are never reused
  - From one job to the next
  - Multiple slots on the same machine
› Input files are transferred serially from the machine where the job was submitted
› This results in the submit machine often transferring multiple copies of the same file simultaneously (bad!), sometimes to the same machine (even worse!)

HTCache (slide 3)
› Enter the HTCache!
› Runs on the execute machine
› Runs under the condor_master just like any other daemon
› One daemon serves all users of that machine
› Runs with the same privilege as the startd

HTCache (slide 4)
› Cache is on disk
› Persists across restarts
› Configurable size
› Configurable cache replacement policy (see the configuration sketch below)
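As a concrete illustration of how such a daemon and its knobs might be wired up, here is a hypothetical HTCondor configuration sketch. Only DAEMON_LIST and the standard macros ($(LIBEXEC), $(LOCAL_DIR)) are real HTCondor settings; every name prefixed with HTCACHE, and the plugin path, are assumptions for illustration.

# Run the HTCache under the condor_master like any other daemon
DAEMON_LIST = $(DAEMON_LIST) HTCACHE
HTCACHE = $(LIBEXEC)/condor_htcache

# Hypothetical knobs: where the on-disk cache lives and how large it may grow
HTCACHE_DIR = $(LOCAL_DIR)/htcache
HTCACHE_MAX_SIZE_MB = 20480

# Hypothetical knob: shared library implementing the cache replacement policy
HTCACHE_POLICY_PLUGIN = /usr/lib/htcache/default_policy.so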

HTCache (slide 5)
› The cache is shared
› All slots use the same local cache
› Even if the user is different (data is data!)
› Thus, the HTCache needs the ability to write files into a job’s sandbox as the user that will run the job

Preparing Job Sandbox (slide 6)
› Instead of fetching files from the shadow, the job instructs the HTCache to put specific files into the sandbox
› If the file is in the cache, the HTCache COPIES the file into the sandbox
› Each slot gets its own copy, in case the job decides to modify it (as opposed to hard- or soft-linking the file into the sandbox; see the sketch below)
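A minimal sketch of that copy-into-sandbox step, assuming a plain on-disk cache directory. The function and its behavior are illustrative only; in particular, the real HTCache would perform the copy as the job's user, which is omitted here.

// stage_from_cache.cpp -- illustrative sketch, not actual HTCache code.
// Copies a cached input file into the job sandbox so each slot gets its
// own private copy (rather than hard- or soft-linking the cached file).
#include <filesystem>
#include <iostream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

bool stage_from_cache(const fs::path& cache_dir,
                      const fs::path& sandbox_dir,
                      const std::string& filename)
{
    fs::path cached = cache_dir / filename;
    fs::path target = sandbox_dir / filename;

    if (!fs::exists(cached)) {
        return false;   // cache miss: fetch directly into the sandbox instead
    }

    // Copy, never link: the job may modify its copy without touching the cache.
    std::error_code ec;
    fs::copy_file(cached, target, fs::copy_options::overwrite_existing, ec);
    if (ec) {
        std::cerr << "copy failed: " << ec.message() << "\n";
        return false;
    }
    return true;
}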

Preparing Job Sandbox (slide 7)
› If the file is not in the cache, the HTCache fetches the file directly into the sandbox and then possibly adds it to the cache
› Wait… possibly?

Cache Policy (slide 8)
› Yes, possibly.
› Obvious case: the file is larger than the cache
› Larger question: which files are the best to keep?
› Cache policy is one of those things where one solution rarely works best in all cases

Cache Policy (slide 9)
› There are 10 problems in Computer Science:
  - Caching
  - Levels of indirection
  - Off-by-one errors
› Allow flexible caching by adding a level of indirection. Don’t use size, time, etc., but rather the “value” of a file.

Cache Policy (slide 10)
› How do we determine the value?
› Another trick: punt to the admin!
› The cache policy is implemented as a plugin, using a dynamically loaded library (a loading sketch follows below):

// Admin-supplied valuation function: a higher value means the file is
// more worth keeping in the cache.
double valuationFun(long size, long age, int stickiness,
                    long uses, long bytes_seeded, long time_since_seed)
{
    return (stickiness - age) * size;
}
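For concreteness, here is a sketch of how the HTCache could load such a plugin with dlopen/dlsym. The library path and error handling are assumptions, not HTCache specifics, and the plugin would need to export valuationFun with C linkage (extern "C") for the lookup by name to succeed.

// load_policy.cpp -- illustrative plugin-loading sketch (compile with -ldl).
#include <dlfcn.h>
#include <cstdio>

typedef double (*valuation_fn)(long size, long age, int stickiness,
                               long uses, long bytes_seeded,
                               long time_since_seed);

int main()
{
    // Hypothetical plugin path; in practice this would come from configuration.
    void* handle = dlopen("/usr/lib/htcache/default_policy.so", RTLD_NOW);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Look up the admin-supplied valuation function by symbol name.
    valuation_fn value =
        reinterpret_cast<valuation_fn>(dlsym(handle, "valuationFun"));
    if (!value) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    // Example: a 1 GB file, 600 s old, stickiness 1000, used 3 times so far.
    std::printf("file value = %g\n", value(1L << 30, 600, 1000, 3, 0, 0));

    dlclose(handle);
    return 0;
}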

Cache Policy (slide 11)
› The plugin determines the “value” of a file using the input parameters:
  - File size
  - Time the file entered the cache
  - Time last accessed
  - Number of hits
  - “Stickiness” (this is a hint provided by the submit node… more on that later)

Cache Policy (slide 12)
› When deciding whether or not to cache a file, the HTCache considers all files currently in the cache, plus the file under consideration
› Computes the “value” of each file
› Finds the “maximum value cache” that fits in the allocated size (see the sketch below)
› May or may NOT include the file just fetched
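One simple way to approximate the “maximum value cache” is a greedy pass by value per byte, as sketched below. This selection strategy is an assumption for illustration; the actual HTCache could solve the selection problem exactly or use a different heuristic.

// cache_admission.cpp -- illustrative sketch of the admission decision.
#include <algorithm>
#include <string>
#include <vector>

struct CachedFile {
    std::string name;
    long        size;    // bytes
    double      value;   // as returned by the admin's valuation plugin
};

// Chooses which files to keep, greedily by value per byte, until the
// cache budget is exhausted; the newly fetched file is just one more
// candidate and may or may not survive the cut.
std::vector<std::string> select_cache(std::vector<CachedFile> candidates,
                                      long cache_budget)
{
    std::sort(candidates.begin(), candidates.end(),
              [](const CachedFile& a, const CachedFile& b) {
                  return a.value / a.size > b.value / b.size;
              });

    std::vector<std::string> keep;
    long used = 0;
    for (const auto& f : candidates) {
        if (used + f.size <= cache_budget) {
            keep.push_back(f.name);
            used += f.size;
        }
    }
    return keep;
}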

Submit Node HTCache (slide 13)
› There is a submit-side component as well, although it has a slightly different role
  - Does not have a dedicated disk cache
  - Instead, serves all files requested by jobs
  - Periodically scans the queue, counts the number of jobs that use each input file, and broadcasts this “stickiness” value to all HTCache daemons (see the sketch below)
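A sketch of that stickiness computation, assuming the queue scan yields one comma-separated input-file list per queued job; how the queue is actually read and how the values are broadcast are not specified here.

// stickiness.cpp -- illustrative sketch of the submit-side stickiness scan.
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Each element is one queued job's input-file list, e.g. "db.fasta,params.cfg".
std::map<std::string, long>
compute_stickiness(const std::vector<std::string>& per_job_inputs)
{
    std::map<std::string, long> stickiness;
    for (const auto& input_list : per_job_inputs) {
        std::istringstream ss(input_list);
        std::string file;
        while (std::getline(ss, file, ',')) {
            ++stickiness[file];   // one more queued job wants this file
        }
    }
    // The submit-side HTCache would then broadcast this map to every
    // execute-side HTCache, which feeds it into the valuation plugin.
    return stickiness;
}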

Example (slide 14)
› Suppose I have a cluster of 25 eight-core machines
› I have a 1GB input file common to all my jobs (a common scenario for, say, BLAST)
› I submit 1000 jobs
› Old way: each time a job starts up it transfers the 1GB file to the sandbox (1TB)

Example (slide 15)
› New way: each of the 25 machines gets the file once, shares it among all 8 slots, and it persists across jobs
› Naïve calculation: 25GB transferred (as opposed to 1TB)
› Of course, this ignores competition for the cache

Example (slide 16)
› This is where “stickiness” helps
› If I submit a separate batch of 50 jobs using a different 1GB input, the HTCache can look at the stickiness and decide not to evict the first 1GB file, since 1000 jobs are scheduled to use it as opposed to 50
› It’s possible to write a cache policy tailored to your cluster’s particular workload

Success! (slide 17)
› This already has huge advantages
› Even if the cache does nothing useful and makes all the wrong choices, it can do NO WORSE than the existing method of transferring the file every time
› A huge advantage: multiple slots share the same cache! (And this advantage grows as the number of cores grows)
› Massively reduces network load on the Schedd

HTCache Results (slide 18)

However… (slide 19)
› Although the load is reduced, the Schedd is still the single source for all input files

However… (slide 20)
› What if there was a way to get the files from somewhere else?
› Maybe even bits of the files from multiple different sources?
› Peer-to-peer?
› We already have an HTCache deployed on all the execute nodes…

BitTorrent (slide 21)

Submit Node w/ BitTorrent (slide 22)
› The HTCache running on the submit node acts as a SeedServer
› It always has all pieces of any file that may be read. (If you recall, it is not managing a cache, only serving the already existing files in place.)
› When a job is submitted, its input files are then automatically added to the seed server

Execute Node w/ BitTorrent (slide 23)
› The HTCache uses BitTorrent to retrieve the file directly into the sandbox first
› It then optionally adds the file to its own cache
› Thus, BitTorrent is used to transfer files even if they won’t end up in the cache (see the sketch below)
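A sketch of that cache-miss path. fetch_via_bittorrent() and consider_for_cache() are hypothetical helper names used to tie the earlier sketches together, not real HTCache interfaces.

// fetch_miss.cpp -- illustrative sketch of the cache-miss path with BitTorrent.
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Stub: a real implementation would drive a BitTorrent client here, pulling
// pieces from the submit-node seed server and from other execute nodes.
bool fetch_via_bittorrent(const std::string& /*filename*/,
                          const fs::path& /*dest*/)
{
    return false;
}

// Stub: a real implementation would re-run the valuation plugin over the
// current cache contents plus this file, and keep only the maximum-value set.
void consider_for_cache(const fs::path& /*fetched_file*/) {}

// On a miss, the file goes straight into the sandbox over BitTorrent;
// only afterwards does the HTCache decide whether to keep a copy of it.
bool handle_cache_miss(const fs::path& sandbox_dir, const std::string& filename)
{
    fs::path target = sandbox_dir / filename;
    if (!fetch_via_bittorrent(filename, target)) {
        return false;
    }
    consider_for_cache(target);
    return true;
}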

Putting It All Together (slide 24)

Putting It All Together (slide 25)

Project Status (slide 26)
› “GradStudent-ware”
  - Was done as a class project
  - Doesn’t yet meet the exceedingly high standards for committing into our main code repository
› BitTorrent traffic is completely independent from HTCondor; as such, it doesn’t work with the shared_port daemon

Conclusion (slide 27)
› Obvious statement of the year: caching is good!
› Runner-up: using peer-to-peer file transfer can be faster than one-to-many file transfer!
› However, the nature of scientific workloads and multi-core machines creates an environment where these are especially advantageous

Conclusion (slide 28)
› Thank you!
› Questions? Comments?
› Ask now, talk to me at lunch, or email me