Elastic Applications in the Cloud
Dinesh Rajan, University of Notre Dame
CCL Workshop, June 2012


Application

Scenarios of Interest
• High performance computing
  - On-demand
  - Inexpensive
  - Dedicated

Elastic Application Candidates
• High-performance applications
  - Protein folding
  - Genome sequencing
  - Genetic algorithms (search)
• Employ parallel computing frameworks
  - MPI

Elastic Applications Characteristics
• Agile
• Elastic
• Robust: handle and recover from failures
• Elastic Applications: High Scalability + Reliability

Talk Overview: Elastic Applications
1. Guidelines for Software Framework
2. Choose Framework: Work Queue
3. Build Elastic Applications
4. Features in Work Queue

Building Elastic Applications
• Use Software Frameworks
  - Library to write applications
  - Abstract away low-level details
  - Lower effort to build & run on distributed systems
[Diagram: Application + Software Framework]

Guidelines for Software Framework
• Elasticity: Harness and adapt to run-time resource availability
• Fault-tolerance: Continue execution through failures
• Portability: Enable application on different cloud platforms
• Scalability: Allow application to scale in size and complexity
• Platform Independence: Independent of different platforms
• Application Independence: Not tied to any application domain
• Ease of effort: Minimal effort in deploying and running in the cloud

Work Queue Application Model
[Diagram: Application, Work Queue API, Work Queue Worker, Cloud infrastructure; stages: Build, Deploy, Execute]
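The master/worker shape of this model can be sketched with nothing but the Python standard library. This is a toy, in-process stand-in and not the Work Queue API: `worker`, `run_master`, and the use of threads in place of remote workers are all assumptions made for illustration.

```python
import queue
import threading


def worker(tasks, results):
    """Pull tasks until a None sentinel arrives; push (tag, output) pairs."""
    while True:
        item = tasks.get()
        if item is None:
            break
        tag, func, arg = item
        results.put((tag, func(arg)))


def run_master(inputs, func, num_workers):
    """Submit one task per input, then collect every result (the master loop)."""
    tasks, results = queue.Queue(), queue.Queue()
    pool = [threading.Thread(target=worker, args=(tasks, results))
            for _ in range(num_workers)]
    for w in pool:
        w.start()

    # Submit phase: each task carries a tag so results can be matched later.
    for tag, arg in enumerate(inputs):
        tasks.put((tag, func, arg))

    # Wait phase: results arrive in completion order, not submission order.
    collected = {}
    for _ in range(len(inputs)):
        tag, output = results.get()
        collected[tag] = output

    # Shut the workers down with one sentinel each.
    for _ in pool:
        tasks.put(None)
    for w in pool:
        w.join()
    return [collected[tag] for tag in range(len(inputs))]


squares = run_master([1, 2, 3, 4], lambda x: x * x, num_workers=3)
```

The master never depends on how many workers exist, which is the property that lets workers be added or removed while the application runs.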

Example Elastic Application
• Replica Exchange
  - Molecular dynamics application
  - Used in study of Protein Folding
• Convert to Elastic Replica Exchange
  - Using Work Queue

Elastic Replica Exchange
• Replica exchange steps
  - Create replicas of protein
  - Assign temperature to each replica
  - Simulate replicas for given Monte Carlo step
  - After each step: attempt exchange between 2 replicas
• Work Queue Master (input: protein molecule)
  - Create configurations for each replica
  - Transfer inputs for replicas
  - Workers running simulations (Replica 0 to Replica 6)
  - Transfer output to master
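The slide does not state how an exchange attempt is decided. A minimal sketch, assuming the standard Metropolis criterion used in replica exchange (accept with probability min(1, exp((β_i − β_j)(E_i − E_j))), with β = 1/(k_B·T)), could look like this. The function names, the dict layout, and the kcal/mol energy unit are assumptions for illustration and are not taken from replica_exchange.py.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); assumed energy unit


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(replicas, i, j, rng=random):
    """Swap the temperatures of replicas i and j if the attempt is accepted.

    replicas is a list of dicts with 'temp' (K) and 'energy' keys.
    Returns True when the exchange happened.
    """
    a, b = replicas[i], replicas[j]
    p = swap_probability(a["energy"], a["temp"], b["energy"], b["temp"])
    if rng.random() < p:
        a["temp"], b["temp"] = b["temp"], a["temp"]
        return True
    return False


replicas = [
    {"temp": 300.0, "energy": -90.0},   # colder replica, higher energy
    {"temp": 310.0, "energy": -100.0},  # hotter replica, lower energy
]
accepted = attempt_exchange(replicas, 0, 1)
```

In this example the colder replica holds the higher energy, so the exchange is always accepted; with the energies reversed it is accepted only some of the time.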

Elastic Replica Exchange
./ec2_submit_workers
./sge_submit_workers
./condor_submit_workers

Elasticity
• Start with 100 workers in Plat. B
• Add 150 workers in Plat. D
• Remove 100 in Plat. B
• Add 110 in Plat. D + 40 in Plat. A
• Remove 125 in Plat. D + 25 in Plat. A
• 400 Replicas
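The worker pool implied by these steps can be tallied directly. The sketch below assumes the steps happen in the order the transcript lists them and covers only this slide's list; `STEPS` and `tally` are illustrative names.

```python
from collections import Counter

# Each step is a list of (platform, change in worker count),
# in the order the slide lists them.
STEPS = [
    [("B", +100)],
    [("D", +150)],
    [("B", -100)],
    [("D", +110), ("A", +40)],
    [("D", -125), ("A", -25)],
]


def tally(steps):
    """Return the final per-platform pool and the total after each step."""
    pool = Counter()
    totals = []
    for step in steps:
        for platform, delta in step:
            pool[platform] += delta
        totals.append(sum(pool.values()))
    return pool, totals


pool, totals = tally(STEPS)
print(totals)      # total workers after each step
print(dict(pool))  # workers left on each platform at the end
```

Under that ordering assumption, the run ends with workers on Platforms D and A only, none of them on the platform it started on.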

Elasticity + Portability gives Scalability
[Plot labels: 100 Workers, 250 Workers, 400 Workers, 150 Workers]

Elastic Applications in CCTools
• Elastic Replica Exchange
  - replica_exchange.py
• using ProtoMol
  - Molecular Dynamics Simulation Framework
  - protomol_functions.py
• In CCTools
  - bin/ in install
  - apps/ in src

Elastic Applications in CCTools
• Asynchronous Replica Exchange
  - Synchronize only exchanging replicas
  - Lower synchronization overheads
  - Default mode in Elastic Replica Exchange
  - -b flag to use synchronous replica exchange
[Diagram: replicas R1 to R6 with barriers]

New Work Queue Features for Elastic Applications
• String Interpolation in Input Files
  - Dispatch dependencies based on worker environment
  - Cygwin vs. Linux vs. Solaris ($OS)
  - x86_64 vs. i686 vs. GPU ($ARCH)
  - Transfers a.Linux.x86_64 to workers running on Linux x86_64
  - Transfers a.Cygwin.i686 to workers running on Cygwin i686
API: task_specify_file(t, "a.$OS.$ARCH", "a", WORK_QUEUE_INPUT, WORK_QUEUE_CACHE)
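The substitution described here can be mimicked with Python's `string.Template`, which uses the same `$NAME` placeholder syntax. This only illustrates the naming rule; it is not how Work Queue implements it, and `resolve_local_name` is a hypothetical helper.

```python
from string import Template


def resolve_local_name(pattern, worker_os, worker_arch):
    """Expand $OS and $ARCH in a file name pattern for one worker."""
    return Template(pattern).substitute(OS=worker_os, ARCH=worker_arch)


# Each worker reports its own environment, so each one gets a different file,
# which would then be stored on the worker under the single remote name "a".
for worker_os, worker_arch in [("Linux", "x86_64"), ("Cygwin", "i686")]:
    print(resolve_local_name("a.$OS.$ARCH", worker_os, worker_arch))
```

One task definition therefore covers every worker type, as long as a matching file exists for each OS and architecture pair.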

New Work Queue Features for Elastic Applications
• Cancel Task
  - Cancel any submitted task
  - Immediately retrieves task & removes from Work Queue
  - Tasks to cancel identified by either taskid or tag
• Useful where there are redundant or obsolete tasks
  - Replicate tasks when there are more resources than tasks
API: cancel_by_tasktag(q, "task3")
This cancels a task with tag named 'task3'
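The replicate-then-cancel pattern can be sketched with a small in-memory queue. `ToyQueue` and its methods are hypothetical stand-ins written for this example; they are not the Work Queue library and only imitate the behaviour the slide describes.

```python
class ToyQueue:
    """In-memory stand-in for a task queue (not the Work Queue library)."""

    def __init__(self):
        self._next_id = 1
        self._pending = {}  # taskid -> (tag, command)

    def submit(self, command, tag):
        """Queue a command under a tag and return its task id."""
        taskid = self._next_id
        self._next_id += 1
        self._pending[taskid] = (tag, command)
        return taskid

    def cancel_by_taskid(self, taskid):
        """Remove one task by id; return True if it was still pending."""
        return self._pending.pop(taskid, None) is not None

    def cancel_by_tag(self, tag):
        """Remove the oldest pending task with this tag; return its id or None."""
        for taskid, (task_tag, _command) in list(self._pending.items()):
            if task_tag == tag:
                del self._pending[taskid]
                return taskid
        return None

    def pending_count(self):
        return len(self._pending)


q = ToyQueue()
first = q.submit("./simulate replica3", tag="task3")
second = q.submit("./simulate replica3", tag="task3")  # redundant replica
other = q.submit("./simulate replica4", tag="task4")

# Suppose one copy of task3 has produced a result: drop the copy still queued.
cancelled = q.cancel_by_tag("task3")
```

Replicating a task costs nothing when workers would otherwise sit idle, and cancellation keeps the redundant copy from occupying a worker once it is no longer needed.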

Upcoming Elastic Applications
1. Elastic Replica Exchange
   - GROMACS
   - Multi-dimensional Exchange – OpenMM
2. Adaptive Weighted Ensemble Method
   - Improve sampling of kinetics of MD systems
   - Large scale: ~3000 workers
3. Genetic Algorithms for Assistive Robotics
   - Search for optimal biped walking controller

Ongoing and Future Work
• For given cost & performance requirements:
  - What should be the size of resource allocation?
• For given resource allocation & its characteristics:
  - How to decompose application workflow into tasks?

Conclusions
• High Performance Applications
  - costly, constrained with parallel computing frameworks
• Build Elastic Applications
  - robust, flexible, scalable
• Software framework for Elastic Applications
  - Work Queue
• Example Elastic Applications
  - Replica Exchange
  - Genetic Algorithms in Robotics
  - Adaptive Weighted Ensemble Method

Cloud Platforms

Name        Platform         Processor        I/O              Cost
Platform A  Amazon EC2       2*2 x GHz        7.5 GB Memory    $0.34/hr
Platform B  Notre Dame SGE   2*2 x 2.6 GHz    8-12 GB Memory   $0.0/hr
Platform C  Microsoft Azure  2 x 1.6 GHz      3.5 GB Memory    $0.24/hr
Platform D  Condor           2 x 2.4 GHz      4 GB Memory      $0.0/hr