Experience with Adopting Clouds at Notre Dame. Douglas Thain, University of Notre Dame. IEEE CloudCom, November 2010.

Presentation transcript:

Experience with Adopting Clouds at Notre Dame. Douglas Thain, University of Notre Dame. IEEE CloudCom, November 2010.

Hardware is not the problem: core campus grid, 8000-core HPC clusters, private clusters. And yet: Amazon EC2/S3, Windows Azure, the campus IT cloud. (Talks by N. Regola and P. Sempolinski on Thursday at 11 AM.)

Clouds Invert Software Design
Parallel application design:
– I have a machine this big, already paid for.
– How do I write a program to use the hardware most efficiently?
Elastic application design:
– I have a problem this big today.
– How many resources do I need to solve it?
– Like grids, except there is a $ cost to inefficiency.

We haven't gotten much interest in writing Map-Reduce apps. We have run Hadoop for three years on 64 cores and 128 TB:
– Lots of education and outreach.
– Nobody really found it that useful!
Reasons why not:
– Existing apps written in C, C++, Fortran, Python, etc. use the filesystem in non-trivial ways.
– "We re-wrote the application" is a phrase with a negative connotation outside of CS.
– No good way to integrate it into existing workflows and other execution systems.
– In short, it's a self-contained world. (A CS virtue.)

[Diagram: an App running in the Campus Condor Pool reaches a Hadoop Storage Cloud through Parrot and Chirp.] Patrick Donnelly, "Attaching Cloud Storage to a Campus Grid", Thursday at 4 PM.
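In rough terms, Parrot attaches an unmodified application to remote storage by intercepting its system calls: launching the app as parrot_run ./myapp lets it open paths of the form /chirp/server.name/path as if they were local files, with Chirp serving the data out of the Hadoop storage cloud behind it. (The server name and path here are placeholders, not from the talk.)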

How do we take existing applications and data, and make them both portable and scalable?

[Diagram: Your Program, linked against the Work Queue library together with local files and programs, submits tasks to hundreds of workers in a personal cloud and collects them when done. Workers run on a personal Beowulf cluster (started via ssh), the campus Condor pool (condor_submit_workers), a private SGE cluster (sge_submit_workers), and a public cloud provider.]
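To make the program side concrete, a minimal sketch of a master built on the Work Queue Python bindings that ship with cctools might look like the following; the port number and the command and file names (mysim.exe, partN, outN, echoing the Makeflow example later in the talk) are illustrative:

from work_queue import WorkQueue, Task, WORK_QUEUE_INPUT, WORK_QUEUE_OUTPUT

# Start a master; workers connect to this port from wherever they run.
q = WorkQueue(port=9123)

# Describe each task: its command line plus the files it reads and writes.
for i in range(1, 101):
    t = Task("./mysim.exe part%d > out%d" % (i, i))
    t.specify_file("mysim.exe", "mysim.exe", WORK_QUEUE_INPUT)
    t.specify_file("part%d" % i, "part%d" % i, WORK_QUEUE_INPUT)
    t.specify_file("out%d" % i, "out%d" % i, WORK_QUEUE_OUTPUT)
    q.submit(t)

# Collect results; output files are returned to the local directory.
while not q.empty():
    t = q.wait(5)
    if t:
        print("task %d done, exit status %d" % (t.id, t.return_status))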

Example Applications. [Diagrams: Replica Exchange running over Work Queue, exchanging replicas at T = 10K, 20K, 30K, and 40K; and a Scalable Assembler over Work Queue, performing hundreds of alignments to turn raw sequence data (AGTCACACTGTACGTA...) into a fully assembled genome.]

Makeflow = Make + Workflow

part1 part2 part3: input.data split.py
    ./split.py input.data

out1: part1 mysim.exe
    ./mysim.exe part1 > out1

out2: part2 mysim.exe
    ./mysim.exe part2 > out2

out3: part3 mysim.exe
    ./mysim.exe part3 > out3

result: out1 out2 out3 join.py
    ./join.py out1 out2 out3 > result
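A workflow written this way is engine-agnostic: the cctools makeflow command runs the rules on the local machine by default, and (per recent cctools releases) the -T option selects the execution engine, so makeflow -T condor or makeflow -T sge dispatches each rule as a batch job, while makeflow -T wq hands the rules to Work Queue workers like those in the diagram above.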

Makeflow for Bioinformatics: BLAST, SHRIMP, SSAHA, BWA, Maker, ...

[Diagram: the same personal cloud of hundreds of workers, this time driven by Makeflow through the Work Queue library and local files and programs; workers on the personal Beowulf cluster (ssh), campus Condor pool (condor_submit_workers), private SGE cluster (sge_submit_workers), and a public cloud provider pull tasks and report back when done.]
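Workers themselves are ordinary processes that connect back to the master. One can be started by hand with something like work_queue_worker master.example.edu 9123 (the host and port here are placeholders for wherever the master listens), and the condor_submit_workers and sge_submit_workers scripts in the diagram submit many such workers as batch jobs, e.g. condor_submit_workers master.example.edu 9123 100 to request a hundred at once.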

Bad News: A Scalable Application is a Denial-of-Service Weapon in Disguise!

How to Shape the Application?

Observations
– Our users (and yours?) want to connect and scale existing data-intensive applications and systems.
– Adoption of a new model or concept requires that the user already be suffering and that the solution be orders of magnitude better than what exists.
– People are beginning to confront the real costs of computing (a quality TB-year is expensive!).
– End users need a lot of help in understanding when and how to scale. Apps should be self-scaling, self-adjusting, and self-throttling.