AWS Batch Overview: A highly efficient, dynamically scaled batch computing service (May 2017)


Intro to Batch Computing

What is batch computing? Run jobs asynchronously and automatically across one or more computers. Jobs may have dependencies, making the sequencing and scheduling of multiple jobs complex and challenging.

[Slide graphic: jobs in a queue matched by resource need (RAM, I/O, GPU, storage, CPU, FPGA) to EC2 instance families (R4, I3, P2, D2, C5, F1)]

AWS Batch Overview & Concepts

Introducing AWS Batch Fully managed batch primitives Focus on your applications (shell scripts, Linux executables, Docker images) and their resource requirements We take care of the rest!

AWS Batch Concepts Jobs Job definitions Job queue Compute environments Scheduler

Job Definitions Similar to ECS task definitions, AWS Batch job definitions specify how jobs are to be run. While each job must reference a job definition, many parameters can be overridden. Some of the attributes specified in a job definition: IAM role associated with the job vCPU and memory requirements Mount points Container properties Environment variables Retry strategy $ aws batch register-job-definition --job-definition-name gatk --container-properties ...
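
As a sketch, the register-job-definition call above might carry a payload like the following. This is an illustration only: the image name, role ARN, and resource numbers are assumptions, not values from the source.

```python
# Hypothetical payload for `aws batch register-job-definition`,
# mirroring the attributes a job definition can specify.
def gatk_job_definition():
    return {
        "jobDefinitionName": "gatk",      # name later referenced by submit-job
        "type": "container",
        "containerProperties": {
            "image": "my-registry/gatk:latest",  # illustrative container image
            "vcpus": 4,                          # vCPU requirement
            "memory": 8192,                      # memory requirement, MiB
            "command": ["./run-gatk.sh", "Ref::sample"],
            "jobRoleArn": "arn:aws:iam::123456789012:role/batch-job-role",  # illustrative IAM role
            "environment": [{"name": "STAGE", "value": "prod"}],
            "mountPoints": [],               # host/container mount points, if any
        },
        "retryStrategy": {"attempts": 3},    # retry failed jobs up to 3 times
    }

payload = gatk_job_definition()
```

Jobs that reference this definition can still override many of these parameters at submission time.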

Jobs Jobs are the unit of work executed by AWS Batch as containerized applications running on Amazon EC2. Containerized jobs can reference a container image, command, and parameters. Alternatively, users can fetch a .zip containing their application and run it in an Amazon Linux container. $ aws batch submit-job --job-name variant-calling --job-definition gatk --job-queue genomics

Easily run massively parallel jobs Today, users can submit a large number of independent “simple jobs.” Soon, we will add support for “array jobs” that run many copies of an application against an array of elements. Array jobs are an efficient way to run: Parametric sweeps Monte Carlo simulations Processing a large collection of objects These use cases are still possible today. Simply submit more jobs.
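
Until array jobs arrive, a parametric sweep is simply many submit-job calls, one per element. A minimal sketch of that fan-out (the queue and definition names are assumptions carried over from the earlier examples):

```python
# Build one submit-job payload per parameter value -- the pattern
# that array jobs will later automate in a single submission.
def sweep_payloads(values, job_queue="genomics", job_definition="gatk"):
    payloads = []
    for i, v in enumerate(values):
        payloads.append({
            "jobName": f"sweep-{i}",
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            # "Ref::value" placeholders in the job definition's command
            # are substituted from this parameters map.
            "parameters": {"value": str(v)},
        })
    return payloads

jobs = sweep_payloads(range(100))
print(len(jobs))  # → 100 independent jobs, one per sweep value
```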

Workflows, Pipelines, and Job Dependencies Jobs can express a dependency on the successful completion of other jobs or specific elements of an array job. Use your preferred workflow engine and language to submit jobs. Flow-based systems simply submit jobs serially, while DAG-based systems submit many jobs at one time, identifying inter-job dependencies. $ aws batch submit-job --depends-on 606b3ad1-aa31-48d8-92ec-f154bfc8215f ...
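
A DAG-based submitter can topologically order its jobs and pass each job's parent IDs via depends-on. A sketch under assumed names (the `fake_submit` function stands in for a real call to `aws batch submit-job`, and the job IDs it returns are simulated):

```python
# Submit a DAG of jobs in dependency order, wiring each job's
# dependsOn list to the job IDs of its already-submitted parents.
from graphlib import TopologicalSorter  # Python 3.9+

def submit_dag(dag, submit):
    """dag: {job_name: [parent_names]}; submit: callable returning a job ID."""
    ids = {}
    for name in TopologicalSorter(dag).static_order():
        depends_on = [{"jobId": ids[p]} for p in dag[name]]
        ids[name] = submit(name, depends_on)
    return ids

# Simulated submitter; a real one would invoke the AWS Batch API.
counter = iter(range(1000))
order = []
def fake_submit(name, depends_on):
    order.append(name)
    return f"job-{next(counter)}"

dag = {"align": [], "call-variants": ["align"], "report": ["call-variants"]}
ids = submit_dag(dag, fake_submit)
print(order)  # parents always precede children
```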

Workflows and Job Dependencies

Compute Environments Job queues are mapped to one or more compute environments containing the EC2 instances used to run containerized batch jobs. Managed compute environments enable you to describe your business requirements (instance types, min/max/desired vCPUs, and Spot Instance bid as a % of the On-Demand price) and we launch and scale resources on your behalf. You can choose specific instance types (e.g., c4.8xlarge) or instance families (e.g., C4, M4, R4). Or, choose “optimal” and AWS Batch launches appropriately sized instances from our more modern instance families.
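
A managed compute environment request might look like this sketch. The subnet, security group, role names, and the 40% Spot bid are illustrative assumptions, not values from the source:

```python
# Hypothetical payload for `aws batch create-compute-environment`
# describing a managed, Spot-backed environment.
def managed_spot_ce():
    return {
        "computeEnvironmentName": "genomics-spot",
        "type": "MANAGED",
        "computeResources": {
            "type": "SPOT",
            "bidPercentage": 40,           # Spot bid as a % of On-Demand price
            "minvCpus": 0,                 # scale to zero when idle
            "desiredvCpus": 0,
            "maxvCpus": 256,
            "instanceTypes": ["optimal"],  # or specific types/families, e.g. c4.8xlarge, C4
            "subnets": ["subnet-aaaa1111"],        # illustrative
            "securityGroupIds": ["sg-bbbb2222"],   # illustrative
            "instanceRole": "ecsInstanceRole",
            "spotIamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",
        },
        "serviceRole": "arn:aws:iam::123456789012:role/AWSBatchServiceRole",
    }

ce = managed_spot_ce()
```

With "optimal" instance types, AWS Batch picks appropriately sized instances on your behalf as described above.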

Customer Provided AMIs Customer Provided AMIs let you set the AMI that is launched as part of a managed compute environment. Makes it possible to configure Docker settings, mount EBS/EFS volumes, and configure drivers for GPU jobs. AMIs must be Linux-based, HVM and have a working ECS agent installation.

Compute Environments Alternatively, you can launch and manage your own resources within an unmanaged compute environment. Your instances need to include the ECS agent and run supported operating systems and Docker versions. AWS Batch then creates an ECS cluster which can accept the instances you launch. Jobs can be scheduled to your compute environment as soon as your instances are healthy and register with the ECS agent. $ aws batch create-compute-environment --compute-environment-name unmanagedce --type UNMANAGED ...

Job Queues Jobs are submitted to a job queue, where they reside until they are able to be scheduled to a compute resource. Information related to completed jobs persists in the queue for 24 hours. Job queues support priorities and multiple queues can schedule work to the same compute environment. $ aws batch create-job-queue --job-queue-name genomics --priority 500 --compute-environment-order ...
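
Multiple queues scheduling work to the same compute environment can be sketched as two create-job-queue payloads (the queue and environment names here are assumptions):

```python
# Hypothetical payloads for two job queues at different priorities
# sharing one compute environment.
def job_queue(name, priority, ce_name):
    return {
        "jobQueueName": name,
        "priority": priority,  # higher number is scheduled ahead of lower
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": ce_name},
        ],
    }

urgent = job_queue("genomics-urgent", 900, "genomics-ce")
bulk = job_queue("genomics-bulk", 100, "genomics-ce")
```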

Scheduler The scheduler evaluates when, where, and how to run jobs that have been submitted to a job queue. Jobs run in approximately the order in which they are submitted, as long as all dependencies on other jobs have been met.

Common AWS Batch Configurations You can achieve different objectives with AWS Batch through service configuration and solution architecture. Business priorities drive the configuration: Cost-optimized – cost is the primary concern and the work is not overly time-sensitive. Minimize operational overhead; work can happen any time over a multi-hour period (or a weekend). Examples: Monte Carlo simulations, bulk loan application processing. Resource-optimized – the customer is already paying for compute that is available or underutilized (RIs, Spot Fleet, etc.), and AWS Batch can help ensure it gets fully utilized. Suits budget constraints; use multiple job queues with priorities sharing compute environments. Time-optimized (SLA-oriented) – workloads with firm deadlines. Use a queue whose primary compute environment runs on RIs with fixed capacity, backed by a secondary Spot compute environment. Example: financial settlement.
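
The time-optimized pattern, a primary fixed-capacity environment backed by RIs plus a secondary Spot environment, maps directly onto a queue's compute environment order. A sketch with assumed names:

```python
# Hypothetical queue payload: jobs land on the fixed-capacity (RI)
# environment first and overflow to the Spot environment.
def time_optimized_queue():
    return {
        "jobQueueName": "settlement",
        "priority": 1000,
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": "ri-fixed-ce"},    # reserved capacity first
            {"order": 2, "computeEnvironment": "spot-burst-ce"},  # overflow to Spot
        ],
    }

q = time_optimized_queue()
```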

Any questions?