https://portal.futuregrid.org HPC in the Cloud – Clearing the Mist or Lost in the Fog. Panel at SC11, Seattle, November 17, 2011. Geoffrey Fox

Presentation transcript:

HPC in the Cloud – Clearing the Mist or Lost in the Fog
Panel at SC11, Seattle, November 17, 2011
Geoffrey Fox
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

Question for the Panel
How does the Cloud fit in the HPC landscape today, and what is its likely role in the future? More specifically:
– What advantages of HPC in the Cloud have you observed?
– What shortcomings of HPC in the Cloud have you observed, and how can they be overcome?
– Given the possible variations in cloud services, implementations, and business models, what combinations are likely to work best for HPC?

Some Observations
Distinguish HPC machines from HPC problems. Classic HPC machines used as MPI engines offer the highest possible performance on closely coupled problems.
Clouds offer, from different points of view:
– On-demand (elastic) service
– Economies of scale from sharing
– Powerful new software models such as MapReduce, which have advantages over classic HPC environments
– Plenty of jobs, making them attractive for students and curricula
– Security challenges
HPC problems that run well on clouds gain the advantages above
– Tempered by free access to some classic HPC systems

What Applications Work in Clouds
Pleasingly parallel applications of all sorts, analyzing roughly independent data or spawning independent simulations
– Long tail of science
– Integration of distributed sensors (Internet of Things)
Science gateways and portals
Workflows federating clouds and classic HPC
Commercial and science data analytics that can use MapReduce (some such applications) or its iterative variants (most analytic applications); a minimal sketch of this pattern follows below
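To make the pleasingly parallel / MapReduce pattern concrete, here is a minimal sketch in Python using only the standard library. The sample documents and the word-count "analysis" are hypothetical placeholders for illustration, not something taken from the slides; the point is only that each map task touches independent data and the combining step is a simple reduction.

    # Minimal MapReduce-style sketch of a pleasingly parallel analysis:
    # each input is processed independently (map), results are combined (reduce).
    from collections import Counter
    from multiprocessing import Pool


    def map_task(document: str) -> Counter:
        """Analyze one independent piece of data (here: count words)."""
        return Counter(document.lower().split())


    def reduce_task(partial_counts) -> Counter:
        """Combine the independent partial results."""
        total = Counter()
        for c in partial_counts:
            total.update(c)
        return total


    if __name__ == "__main__":
        documents = [
            "clouds offer elastic on demand service",
            "hpc machines offer highest performance on coupled problems",
            "clouds suit pleasingly parallel data analysis",
        ]
        with Pool() as pool:                 # independent map tasks run in parallel
            partials = pool.map(map_task, documents)
        print(reduce_task(partials).most_common(5))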

Clouds and Grids/HPC
Synchronization/communication performance: Grids > Clouds > Classic HPC systems (i.e., communication costs are highest on grids and lowest on classic HPC)
Clouds appear to execute grid workloads effectively but are not easily used for closely coupled HPC applications
Service-oriented architectures and workflow appear to work similarly in both grids and clouds
Assume that, for the immediate future, science is supported by a mixture of:
– Clouds – see application discussion
– Grids/high-throughput systems (moving to clouds as convenient)
– Supercomputers ("MPI engines") going to exascale

Smith-Waterman-Gotoh All-Pairs Sequence Alignment Performance (figure: pleasingly parallel implementations on Azure, Amazon (2 ways), HPC, MapReduce)

Performance with/without Data Caching (figures: speedup gained using data cache; scaling speedup with increasing number of iterations; number of executing map tasks histogram; strong scaling with 128M data points; weak scaling; task execution time histogram)

Kmeans Speedup (figure: speedup normalized to 32 at 32 cores, comparing Cloud and HPC)
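The slide does not define "normalized to 32 at 32 cores"; a common convention, assumed here purely for illustration, is to scale raw speedup so that the 32-core baseline run is assigned the value 32. A minimal sketch of that arithmetic:

    def normalized_speedup(t_ref: float, t_p: float, ref_cores: int = 32) -> float:
        """Speedup relative to the reference run, scaled so the reference
        core count maps to its own value (e.g. 32 at 32 cores).
        Assumes t_ref was measured on ref_cores cores and t_p on p cores."""
        return ref_cores * t_ref / t_p

    # Hypothetical timings: 32 cores take 100 s, 256 cores take 15 s
    print(normalized_speedup(100.0, 15.0))   # about 213 on the normalized scale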

Application Classification (figure)

What Can We Learn?
There are many pleasingly parallel simulations and data-analysis algorithms that are very well suited to clouds
There are interesting data-mining algorithms that need iterative parallel runtimes
There are linear algebra algorithms with unfavorable compute/communication ratios that can nevertheless be handled with reduction collectives rather than many MPI Send/Recv calls (see the sketch below)
Expectation Maximization is a good fit for Iterative MapReduce
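As an illustration of "reduction collectives rather than Send/Recv", here is a minimal mpi4py sketch of one Kmeans-style iteration in which every rank accumulates partial centroid sums locally and a single Allreduce combines them. The random data, the cluster count k=3, and the mpi4py/numpy toolchain are assumptions chosen for the sketch, not something specified on the slides.

    # One Kmeans-style iteration using a reduction collective (MPI Allreduce)
    # instead of explicit point-to-point Send/Recv.
    # Requires mpi4py and numpy; run with e.g.: mpiexec -n 4 python kmeans_allreduce.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    k, dim = 3, 2
    rng = np.random.default_rng(seed=rank)
    points = rng.random((1000, dim))          # each rank holds its own data shard

    # Rank 0 picks initial centroids; everyone receives the same copy.
    centroids = comm.bcast(rng.random((k, dim)) if rank == 0 else None, root=0)

    # Local (map-like) step: assign points to nearest centroid, accumulate sums.
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    local_sums = np.zeros((k, dim))
    local_counts = np.zeros(k)
    for j in range(k):
        mask = labels == j
        local_sums[j] = points[mask].sum(axis=0)
        local_counts[j] = mask.sum()

    # Global (reduce-like) step: one collective replaces many Send/Recv calls.
    global_sums = np.zeros_like(local_sums)
    global_counts = np.zeros_like(local_counts)
    comm.Allreduce(local_sums, global_sums, op=MPI.SUM)
    comm.Allreduce(local_counts, global_counts, op=MPI.SUM)

    new_centroids = global_sums / np.maximum(global_counts, 1)[:, None]
    if rank == 0:
        print("updated centroids:\n", new_centroids)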

Architecture of Data Repositories?
Traditionally, governments set up repositories for data associated with particular missions
– For example, EOSDIS (Earth observation), GenBank (genomics), NSIDC (polar science), IPAC (infrared astronomy)
– LHC/OSG computing grids for particle physics
This is complicated by the volume of the data deluge, by distributed instruments such as gene sequencers (maybe centralize?), and by the need for intense computing such as BLAST – i.e., repositories need HPC?

Clouds as Support for Data Repositories?
The data deluge needs cost-effective computing
– Clouds are by definition cheapest
– Need data and computing co-located
Shared resources are essential (to be cost effective and large)
– Can't have every scientist downloading petabytes to a personal cluster
Need to reconcile distributed (initial sources of) data with shared computing
– Can move data to (discipline-specific) clouds
– How do you deal with multi-disciplinary studies?
Data repositories of the future will have cheap data and elastic cloud analysis support?