CASC Spring Meeting 2012 Craig A. Stewart

Presentation transcript:

Penguin Computing / IU Partnership: HPC "cluster as a service" and Cloud Services
CASC Spring Meeting 2012
Craig A. Stewart (stewart@iu.edu), Executive Director, Pervasive Technology Institute; Associate Dean, Research Technologies, Indiana University
Matthew Jacobs, SVP Corporate Development, Penguin Computing (penguincomputing.com)

Please cite as: Stewart, C.A. and M. Jacobs. 2012. "Penguin Computing / IU Partnership HPC 'cluster as a service' and Cloud Services." Presentation. Presented at Coalition of Academic Scientific Computation, 29 February 2012, Arlington, VA. http://hdl.handle.net/2022/14441

The image on slide 1 (title slide) and slides 3-7 are © Penguin Computing, Inc., all rights reserved, and may not be reused without permission from Penguin Computing, Inc. Other slides (except where explicitly noted) are copyright 2011 by the Trustees of Indiana University, and this content is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/).

What is POD?
POD (Penguin Computing on Demand) is an on-demand HPC system:
- Compute, storage, low-latency fabrics, and GPUs; non-virtualized
- Robust software infrastructure with full automation
- User- and administration-space controls
- Secure and seamless job migration
- Extensible framework
- Complete billing infrastructure
- Internet connectivity: 150 Mb, burstable to 1 Gb
Services:
- Custom product design
- Site and workflow integration
- Managed services
- Application support
- HPC support expertise and skilled HPC administrators
- Leverages 13 years serving the HPC market

Penguin HPC Cloud Services
- Penguin Computing on Demand (POD): true HPC in the cloud on a pay-as-you-go basis. Use cases: overflow capacity, targeted workloads, targeted user sets.
- Post-Purchase Collocation: collocation services provided by Penguin. Benefits: cost reduction, budget reallocation.
- Public-Private On-Demand Partnerships: Penguin-owned and -operated PODs, hosted at academic or government facilities. Benefits: revenue sharing, augmented local resources, self-sustaining growth.
- POD Hybrid: an on-premise cluster sized for mean usage, plus POD for peaks. Benefit: save on initial capital outlay while sustaining a high service level for users.
- OEM HPC Cloud: POD distribution to internal or external customers. Benefits: augment local resources and expertise, fund growth.
- HPC SaaS Platform: a hosting platform for SaaS providers and an on-demand delivery platform for ISVs.
- Turnkey Managed Services: remote managed services for Penguin and non-Penguin clusters. Benefits: augment local expertise, reduce costs.

Scyld HPC Cloud Management System — created by POD developers and administrators. With it you can:
- Create and manage user and group hierarchies
- Simultaneously manage multiple collocated clusters
- Create customer-facing web portals
- Use web services to integrate with back-end systems
- Deploy HTML5-based cluster management tools
- Securely migrate user workloads
- Efficiently schedule and manage cluster resources
- Create and deploy virtual head nodes for user-specific clusters

12 Million Commercial Jobs and Counting…
- Current data centers: Salt Lake City, Indiana University, Mountain View
- 1,500 cores (AMD and Intel); 240 TB of on-demand storage
Customer examples:
- Replaced an in-house image-analysis cluster with POD and collocated storage
- Provides cloud analysis services on POD for worldwide bioinformatics customers
- Replaced Amazon AWS cloud usage with the PODTools workflow migration system
- Nihon ESI provides crash analyses to Honda R&D during Japan's brown-outs

The POD Advantage
- Persistent, customized user environment
- High-speed Intel and AMD compute nodes (physical, not virtualized)
- Fast access to local storage (data guaranteed to be local)
- Highly secure (https, shared-key authentication, IP matching, VPN)
- Billed by the fractional core hour
- HPC expertise included (Penguin's core business for many years)
- Cluster software stack included
- Troubleshooting included in support
- Collocated storage options available
- Highly dependable and dynamically scalable
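Billing "by the fractional core hour" means a job is charged for exactly the core-time it consumes, not rounded up to whole node-hours. A minimal sketch of that arithmetic (the per-core-hour rate below is a hypothetical placeholder, not Penguin's actual pricing):

```python
# Fractional core-hour billing sketch.
# RATE_PER_CORE_HOUR is a hypothetical placeholder, not Penguin's real rate.
RATE_PER_CORE_HOUR = 0.10  # dollars per core-hour (assumed for illustration)

def job_cost(cores: int, wall_minutes: float,
             rate: float = RATE_PER_CORE_HOUR) -> float:
    """Cost of a job billed by the fractional core hour."""
    core_hours = cores * (wall_minutes / 60.0)
    return core_hours * rate

# Example: a 16-core job running 90 minutes consumes 24 core-hours,
# so it costs about $2.40 at the assumed rate.
example = job_cost(16, 90)
```

The point of the model is that a 90-minute job is billed as 1.5 hours per core, not rounded up to 2.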

Clouds look serene enough, but is ignorance bliss? In the cloud, do you know:
- Where your data are?
- What laws prevail over the physical location of your data?
- What license you really agreed to?
- What the security (electronic and physical) around your data is?
- How exactly you get to that cloud, or get things out of it?
- How financially secure your provider is? (The fact that something seems unimaginable, like cloud provider such-and-such going out of business abruptly, does not mean it is impossible!)
Photo by http://www.flickr.com/photos/mnsc/ (http://www.flickr.com/photos/mnsc/2768391365/sizes/z/in/photostream/), http://creativecommons.org/licenses/by/2.0/

Penguin Computing & IU partner for "Cluster as a Service" — just what it says: a cluster as a service, physically located on IU's campus, in IU's Data Center, and available to anyone at a .edu institution or an FFRDC (Federally Funded Research and Development Center). To use it:
1. Go to podiu.penguincomputing.com
2. Fill out the registration form
3. Verify via your email
4. Get out your credit card
5. Go computing
This builds on Penguin's experience: Penguin currently hosts Life Technologies' BioScope and LifeScope in the cloud (http://lifescopecloud.com).

We know where the data are … and they are secure

An example of NET+ Services / Campus Bridging
"We are seeing the early emergence of a meta-university — a transcendent, accessible, empowering, dynamic, communally constructed framework of open materials and platforms on which much of higher education worldwide can be constructed or enhanced." — Charles Vest, president emeritus of MIT, 2006
NET+ goal: achieve economies of scale while retaining a reasonable measure of control. See: Brad Wheeler and Shelton Waggener. 2009. Above-Campus Services: Shaping the Promise of Cloud Computing for Higher Education. EDUCAUSE Review, vol. 44, no. 6 (November/December 2009): 52-67.
Campus Bridging goal: make it all feel like it's just a peripheral to your laptop (see pti.iu.edu/campusbridging).

IU POD – Innovation Through Partnership
- True on-demand HPC for Internet2
- A creative public/private model to address the HPC shortfall
- Turns dollars lost to EC2 into central IT expansion
- Tiered channel strategy for expansion into the EDU sector
- Program- and discipline-specific enhancements under way
- An objective third-party resource for collaboration across EDU, federal, and commercial sectors

POD IU (Rockhopper) specifications
Server information:
- Architecture: Penguin Computing Altus 1804
- Peak performance: 4.4 TFLOPS
- Clock speed: 2.1 GHz
- Nodes: 11 compute; 2 login; 4 management; 3 servers
- CPUs: 4 x 2.1 GHz 12-core AMD Opteron 6172 processors per compute node
- Memory type: distributed and shared
- Total memory: 1,408 GB
- Memory per node: 128 GB 1333 MHz DDR3 ECC
- Local scratch storage: 6 TB locally attached SATA2
- Cluster scratch: 100 TB Lustre
Further details:
- OS: CentOS 5
- Network: QDR (40 Gb/s) InfiniBand; 1 Gb/s Ethernet
- Job management software: SGE
- Job scheduling policy: Fair Share
- Access: key-based ssh login to head nodes; remote job control via Penguin's PODShell
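The 4.4 TFLOPS figure is consistent with the node counts above: 11 compute nodes x 4 sockets x 12 cores x 2.1 GHz, assuming 4 double-precision floating-point operations per core per cycle (an assumption about the Opteron 6172's SSE units; the slide does not state the FLOPs-per-cycle figure). A quick sanity check:

```python
# Sanity-check Rockhopper's theoretical peak from the published specs.
compute_nodes = 11
sockets_per_node = 4
cores_per_socket = 12
clock_ghz = 2.1
flops_per_cycle = 4  # assumption: 4 DP FLOPs/cycle/core (SSE, Opteron 6172)

total_cores = compute_nodes * sockets_per_node * cores_per_socket  # 528 cores
peak_gflops = total_cores * clock_ghz * flops_per_cycle

print(f"{peak_gflops / 1000:.1f} TFLOPS")  # prints 4.4 TFLOPS
```

Only the 528 compute cores enter the calculation; login, management, and server nodes do not count toward peak.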

Available applications at POD IU (Rockhopper)
- COAMPS — coupled ocean/atmosphere mesoscale prediction system.
- Desmond — a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters.
- GAMESS — a program for ab initio molecular quantum chemistry.
- Galaxy — an open, web-based platform for data-intensive biomedical research.
- GROMACS — a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles.
- HMMER — used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments.
- Intel compilers and libraries
- LAMMPS — a classical molecular dynamics code; an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.
- MM5 — the PSU/NCAR mesoscale model: a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, referred to collectively as the MM5 modeling system.
- mpiBLAST — a freely available, open-source, parallel implementation of NCBI BLAST.
- NAMD — a parallel molecular dynamics code for large biomolecular systems.

Available applications at POD IU (Rockhopper), continued
- NCBI-BLAST — the Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
- OpenAtom — a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab initio Molecular Dynamics (CPAIMD) method.
- OpenFOAM — the OpenFOAM® (Open Field Operation and Manipulation) CFD Toolbox is a free, open-source CFD software package produced by OpenCFD Ltd. It has a large user base across most areas of engineering and science, from both commercial and academic organisations, and an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence, and heat transfer, to solid dynamics and electromagnetics.
- OpenMPI — an InfiniBand-based Message Passing Interface 2 (MPI-2) implementation.
- POP — an ocean circulation model derived from earlier models of Bryan, Cox, Semtner, and Chervin, in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations.
- Portland Group compilers
- R — a language and environment for statistical computing and graphics.
- WRF — the Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. It features multiple dynamical cores, a 3-dimensional variational (3DVAR) data assimilation system, and a software architecture allowing for computational parallelism and system extensibility.
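Since Rockhopper schedules jobs with SGE (see the specifications above), users run these packages by submitting a batch script from a login node. The sketch below is a hypothetical SGE submission script: the parallel-environment name, module name, binary name, and input file prefix are placeholder assumptions, not Rockhopper's documented configuration.

```shell
#!/bin/bash
# Hypothetical SGE batch script for a GROMACS run on Rockhopper.
# The "mpi" parallel environment, the module name, and the md_run
# input prefix are illustrative assumptions.
#$ -N gromacs-test      # job name
#$ -pe mpi 48           # request 48 slots (one compute node's cores)
#$ -cwd                 # run from the submission directory
#$ -j y                 # merge stdout and stderr into one file

# Load the application environment (exact mechanism may differ per site).
module load gromacs

# Run GROMACS molecular dynamics over MPI on the granted slots.
mpirun -np "$NSLOTS" mdrun_mpi -deffnm md_run
```

Such a script would typically be submitted with `qsub script.sh` and monitored with `qstat`; per the specifications slide, PODShell additionally allows remote job control without an interactive login.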