Integration of Singularity With Makeflow


Integration of Singularity With Makeflow. Kyle Sweeney, University of Notre Dame, ksweene3@nd.edu

Main design of an HPC Environment: In an abstract sense, an HPC cluster is a series of compute nodes, each with a small amount of local storage (small discs), all connected to a massive parallel file system that holds all of the storage for the system.

Makeflow and Work Queue with HPC: A computer running Makeflow then uses a work-management system, such as Work Queue, to create workers on the compute nodes and distribute jobs to them, while the nodes remain connected to the massive parallel file system.

Anatomy of a Node with Workers: Work Queue sets up a central cache (scratch space) on every host machine where workers exist. When a task needs an input file, the file is copied into this central cache and then symlinked into the worker's temporary work space. This lets us reuse input files and save on network traffic when multiple tasks require the same input. This is the default behavior; it can be changed when using Work Queue directly, but it is the main assumption Makeflow makes for a general Makeflow script. While Makeflow lets us specify the hardware requirements of tasks (cores, disk space, memory capacity), it does not let us specify software requirements, which is where containers come in, namely Singularity. A sketch of the resource specification is shown below.
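As an illustration only (file names and values are hypothetical), a Makeflow script declares per-task hardware requirements with resource variables that apply to the rules that follow:

    # memory and disk are in MB
    CORES=1
    MEMORY=2000
    DISK=4000

    # rule: target, its input files, then the command that produces it
    result.out: analyze input.dat
        ./analyze input.dat > result.out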

Singularity: No root daemon! Does not manage a database of images! Does not attempt any kind of scheduling! No root daemon: you only need to be root to install it. No database of images: users manage the images themselves. No scheduling: a container runs as a user process, behaving just like every other process.
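As a quick illustration (assuming an existing image, here called centos7.img), a containerized command runs under the calling user's identity, with no daemon involved:

    # prints the same uid/gid inside the container as outside
    id
    singularity exec centos7.img id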

Main Benefits of Singularity: Democratization of resource management! Everything needs to be done by the user, and Singularity gives the tools to do this: tools to create images exactly how users want them, to install packages, to go inside the images to install software and set variables, and even to import from other container tools, such as Docker.
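For illustration only (Singularity 2.x-era commands, roughly contemporary with this talk; image and definition file names are hypothetical), creating and customizing an image looks something like this:

    # create a sparse image and bootstrap it from a definition file (root required)
    singularity create --size 2048 myimage.img
    sudo singularity bootstrap myimage.img ubuntu.def

    # or pull the contents of a Docker image into it
    singularity import myimage.img docker://ubuntu:16.04

    # enter the image to install software and set variables (writable needs root)
    sudo singularity shell --writable myimage.img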

Using Singularity and Makeflow: There are two methods of using Singularity with Makeflow. Global wrapper: running "makeflow -T wq -N mytasks --singularity=<img> script.mf" will wrap every task in "script.mf" in Singularity, as "singularity exec <img> sh -c 'task1'". Per-task wrapper: inside your Makeflow script, you can simply add singularity to the start of your task and include the image file, e.g. "foo.out: singularity mytask myfile.img foo.in   singularity exec myfile.img mytask foo.in -o foo.out". When creating your workers, you need to specify that they run on machines that you know have Singularity installed; a sketch of starting such a worker follows.
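As an illustration only (the project name "mytasks" matches the global-wrapper example above; the resource values mirror the test setup later in the talk), a worker can be started by hand on a node that is known to have Singularity installed:

    # run on a node with Singularity installed; the worker advertises itself
    # under the same project name passed to makeflow with -N
    work_queue_worker -N mytasks --cores 1 --memory 2000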

Singularity and a Node with Workers: By default, when running a command in a container with Singularity, the image is read-only. This is due to how Singularity handles privilege levels inside the container: to be root inside the container, you need to be root outside of it, so Singularity only allows changes to files inside the container if the calling user is root. This works to our advantage, because every worker on a node can share the same image (symlinked from the scratch-space cache) for its tasks. That massively saves on network traffic, as images tend to be large.

Problems with Singularity: Creating and modifying an image file requires the user to be root; image files are large; and folders are linked into Singularity (covered next). Creation and modification: users need to be able to "sudo" or become root to create the image files for their jobs. Large image files: image files default to 1024MB, but not really; image files in our version of Singularity are sparse, so at their most bare-bones (just the OS) they are really only around 32MB, yet the file system and network treat them as 1GB.

Biggest issue with integration: Singularity attempts to bind three important locations into a container: /tmp, the current working directory, and $HOME/. When a user requests a worker, that worker is owned by that user, so all programs called by the worker run as that user. Singularity therefore attempted to bind the home directory of the user calling Makeflow and Work Queue into the container, and due to AFS permissions, this isn't allowed.

Solutions. First: change the config file on every node; there is an option in the config file controlling whether to bind the user's home directory, and by default it is set to true. The relevant setting is sketched below.
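As an illustration (the config file location depends on the installation prefix, and the option shown reflects our understanding of the Singularity 2.x singularity.conf; treat it as a sketch), the node-by-node fix looks like:

    # in <prefix>/etc/singularity/singularity.conf
    # disable automatic binding of the calling user's home directory
    mount home = no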

Solution. Second: set the home directory in Singularity itself. We modified Singularity to add a switch, and set the Makeflow global wrapper to use "--change-home `pwd`". This is better because it puts less work on the sysadmins when installing Singularity, it allows for more future flexibility, and the developers of Singularity wanted this as a feature anyway. The effective wrapped command is sketched below.
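For illustration (the image name and task are placeholders; --change-home is the switch added in our modified Singularity, not a stock option), each wrapped task then effectively runs as:

    # home inside the container is set to the task's working directory,
    # avoiding the AFS-backed $HOME bind
    singularity exec --change-home `pwd` myimage.img sh -c 'task1'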

Testing Singularity

CPU Measurement: We modified the shrimp workflow, a CPU-heavy bioinformatics workflow, and performed the test three different ways: natively, using Docker, and using Singularity. Each worker was requested to have 1 core and 2000MB of memory, and the test was performed around 150 times. The workflow was reduced to the first setup task (done locally), the first 50 tasks, and the final combination step (also done locally); we only looked at the results of the 50 tasks, as those ran on the "disc" cluster.

The x-axis is the test number, and the y-axis is how much time each run took.

CPU Measurement: This is a close-up examination of the data, around the middle of the range. It shows that Singularity and native are basically the same, each sometimes running slightly higher than the other but generally equal. Docker, on the other hand, is always higher.

Disc I/O Measurement: We modified a test that measured the overhead of different file systems. The test nests Singularity calls (e.g. singularity exec img1.img singularity exec img2.img my_task.sh); the images were made ahead of time, all existing flat, not nested inside each other. It required sudo to work, so it ran on an Ubuntu VM on my laptop. The modified test nests the calls and reads and writes a certain amount of data; we tested it 10 levels deep (and got it to go 20 deep), and the amount of data read/written formed its own category of tests, at 2GB, 4GB, and 8GB. A sketch of the nesting pattern is shown below.
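For illustration only (this is not the authors' test script; image and task names are placeholders), the nesting pattern can be generated for any depth like this:

    # build an N-deep nested invocation around an I/O task, assuming
    # images img1.img ... imgN.img already exist in the current directory
    DEPTH=10
    CMD="./my_task.sh"
    for i in $(seq $DEPTH -1 1); do
        CMD="singularity exec img$i.img $CMD"
    done
    echo "$CMD"   # inspect the nested command
    eval "$CMD"   # run it; my_task.sh reads and writes a fixed amount of data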

Disc I/O Overhead: this chart shows the results for the 2GB-file case.

Conclusion: Singularity fits into our Makeflow + Work Queue system very well, thanks to Singularity's democratization of resource management. It requires users to create their own images to match their work, and it adds negligible CPU overhead and only minor I/O overhead.

Singularity: http://singularity.lbl.gov Makeflow + Singularity – out soon! CCL Website: http://ccl.cse.nd.edu Email: ksweene3@nd.edu