Scheduling Strategies for HPC as a Service (HPCaaS) for Bio-Science Applications
High Performance Interconnects for Distributed Computing (HPI-DC), Sep 2009
Gilad Shainer, Tong Liu (Mellanox); Jeffrey Layton (Dell); Joshua Mora (AMD)

2 Note
The following research was performed under the HPC Advisory Council activities
– Special thanks to AMD, Dell, and Mellanox Technologies
– For more information, please refer to the HPC Advisory Council
Testing center: HPC Advisory Council HPC Center

3 HPC Advisory Council
A worldwide HPC organization (90 members)
– Bridges the gap between HPC usage and its potential
– Provides best practices and a support/development center
– Explores future technologies and developments
– Explores advanced topics such as HPC in a cloud and HPCaaS
– Leading-edge solutions and technology demonstrations
For more info: see the HPC Advisory Council website

4 HPC Advisory Council

5 HPC Centers Evolution
– From equipment warehouse to service provider
– From dedicated HW per application to application services
– Higher productivity, simplicity, and efficiency

6 Research Objectives
– Investigate HPC as a Service for bio-science applications, in particular NAMD and CPMD
– Performance and productivity impact
– Cluster interconnect effects on application performance
For specific information on NAMD and CPMD, please refer to additional studies from the HPC Advisory Council

7 Car-Parrinello Molecular Dynamics (CPMD)
– A parallelized implementation of density functional theory (DFT)
– Particularly designed for ab initio molecular dynamics
– Brings together methods from classical molecular dynamics, solid-state physics, and quantum chemistry
– Supports MPI and mixed MPI/SMP
– Distributed and developed by the CPMD consortium

8 NAMD
– A parallel, object-oriented molecular dynamics code
– Designed for high-performance simulation of large biomolecular systems (millions of atoms)
– Developed jointly by the Theoretical and Computational Biophysics Group (TCB) and the Parallel Programming Laboratory (PPL) at the University of Illinois at Urbana-Champaign
– Distributed free of charge with source code

9 Test Cluster Configuration
– Dell™ PowerEdge™ SC 1435 24-node cluster
– Quad-Core AMD Opteron™ 2382 (“Shanghai”) CPUs
– Mellanox® InfiniBand ConnectX® DDR HCAs and InfiniBand DDR switch
– Memory: 16GB DDR2 800MHz per node
– OS: RHEL5U2, OFED 1.3 InfiniBand SW stack
– MPI: Open MPI 1.3, Platform MPI
– Compiler: GCC
– Benchmark applications:
  – CPMD 3.13; benchmark dataset: C120
  – NAMD 2.6 with FFTW3 libraries and Charm++; benchmark dataset: ApoA1 (92,224 atoms, 12Å cutoff)

10 InfiniBand Roadmap
Industry standard
– Hardware, software, cabling, management
– Designed for clustering and storage interconnect
Performance
– 40Gb/s node-to-node
– 120Gb/s switch-to-switch
– 1us application latency
– Most aggressive roadmap in the industry
Reliable, with congestion management
Efficient
– RDMA and transport offload
– Kernel bypass
– CPU focuses on application processing
Scalable for Petascale computing and beyond
End-to-end quality of service
Virtualization acceleration
I/O consolidation, including storage
[Chart: “The InfiniBand Performance Gap is Increasing” – InfiniBand 4X (80Gb/s) and 12X (240Gb/s) roadmap vs. Fibre Channel and Ethernet]

11 Quad-Core AMD Opteron™ Processor
Performance
– Quad-core: enhanced CPU IPC, 4x 512K L2 cache, 6MB L3 cache
– Direct Connect Architecture: HyperTransport™ technology, up to 24 GB/s peak per processor
– Floating point: 128-bit FPU per core, 4 FLOPS/clk peak per core
– Integrated memory controller: up to 12.8 GB/s, DDR2-800 MHz or DDR2-667 MHz
Scalability
– 48-bit physical addressing
Compatibility
– Same power/thermal envelopes as 2nd/3rd generation AMD Opteron™ CPUs
[Diagram: processor block diagram – PCI-E® bridges, I/O hub, USB/PCI, dual-channel registered DDR2 at 8 GB/s]

12 Dell PowerEdge Servers Helping Simplify IT
System structure and sizing guidelines
– 24-node cluster built with Dell PowerEdge™ SC 1435 servers
– Servers optimized for High Performance Computing environments
– Building-block foundations for best price/performance and performance/watt
Dell HPC solutions
– Scalable architectures for high performance and productivity
– Dell's comprehensive HPC services help manage lifecycle requirements
– Integrated, tested, and validated architectures
Workload modeling
– Optimized system size, configuration, and workloads
– Test-bed benchmarks
– ISV application characterization
– Best practices and usage analysis

13 NAMD Benchmark Results – Productivity
Case 1: Dedicated hardware resource for NAMD
Input data: ApoA1
– Benchmark comprises 92K atoms of lipid, protein, and water
– Models a bloodstream lipoprotein particle
– One of the most widely used datasets for benchmarking NAMD
Increasing the number of concurrent jobs increases cluster productivity
InfiniBand enables higher performance and productivity
[Chart: NAMD productivity – higher is better]

14 CPMD Benchmark Results – Productivity
Case 2: Dedicated hardware resource for CPMD
Benchmark data: C120 (120 carbon atoms)
Running two jobs in parallel increases cluster productivity
InfiniBand enables higher performance and scalability than GigE
[Chart: CPMD productivity – higher is better]

15 MPI Collectives in CPMD
MPI_Alltoall is the key collective function in CPMD
– The number of Alltoall messages increases dramatically with cluster size (see the sketch below)
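
To make the scaling behavior concrete, the following minimal sketch (not taken from the original study) issues a single MPI_Alltoall across all ranks. Because every rank exchanges a buffer with every other rank, one collective generates on the order of N² point-to-point transfers on an N-process job, which is why the message count grows so quickly with cluster size. The buffer size is illustrative only.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative all-to-all exchange: each rank sends COUNT doubles to every
 * other rank, the collective pattern the slide identifies as dominant in
 * CPMD. The buffer size is arbitrary and chosen only for the sketch. */
#define COUNT 1024

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = malloc((size_t)size * COUNT * sizeof(double));
    double *recvbuf = malloc((size_t)size * COUNT * sizeof(double));
    for (int i = 0; i < size * COUNT; i++)
        sendbuf[i] = (double)rank;

    /* A single collective call: every rank sends a COUNT-element block to
     * every other rank and receives one back. */
    MPI_Alltoall(sendbuf, COUNT, MPI_DOUBLE,
                 recvbuf, COUNT, MPI_DOUBLE, MPI_COMM_WORLD);

    if (rank == 0)
        printf("all-to-all complete on %d ranks\n", size);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}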

16 CPMD MPI Message Size Distribution
The majority of messages are medium-sized

17 NAMD Message Distribution
– As the number of nodes scales, the percentage of small messages increases
– The percentage of 1KB-256KB messages is relatively consistent for cluster sizes greater than 8 nodes
– The majority of messages are in the range of 128B-1KB for cluster sizes greater than 8 nodes
(one way to collect such a distribution is sketched below)
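
Message-size distributions such as the ones summarized above are normally gathered with an MPI profiling tool; the sketch below, which is not part of the original study, shows the underlying idea using the standard PMPI interposition interface to bucket the sizes of MPI_Send calls. The bucket boundaries mirror the slide (under 128B, 128B-1KB, 1KB-256KB, over 256KB); a real tool would also wrap non-blocking sends and collectives.

#include <mpi.h>
#include <stdio.h>

/* Minimal PMPI wrapper: intercepts MPI_Send, records the payload size in one
 * of four buckets, and prints each rank's histogram at MPI_Finalize. Link
 * this file into the application (or preload it as a shared library) so it
 * overrides the default MPI_Send symbol. The prototype below follows MPI-3;
 * drop the const for older MPI-2 era implementations. */
static long buckets[4];   /* <128B, 128B-1KB, 1KB-256KB, >256KB */

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int type_size;
    PMPI_Type_size(datatype, &type_size);
    long bytes = (long)count * type_size;

    if (bytes < 128)             buckets[0]++;
    else if (bytes < 1024)       buckets[1]++;
    else if (bytes < 256 * 1024) buckets[2]++;
    else                         buckets[3]++;

    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}

int MPI_Finalize(void)
{
    int rank;
    PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: <128B=%ld 128B-1KB=%ld 1KB-256KB=%ld >256KB=%ld\n",
           rank, buckets[0], buckets[1], buckets[2], buckets[3]);
    return PMPI_Finalize();
}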

18 Multiple Applications – CPMD and NAMD
Case 3 – HPC as a Service (HPCaaS): one HW platform serves multiple applications (CPMD and NAMD) at the same time
Multiple test scenarios are presented in the following slides
– Each describes a different allocation of the HW system per service; a service refers to a single application
– At least 2 applications are served at any given time
– Each scenario is compared to a dedicated-HW-per-application approach
Evaluation metric: productivity (number of jobs per day); see the note below on how this metric can be read
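
The slides do not spell out how jobs per day is computed. A reasonable reading, stated here as an assumption rather than as the authors' exact method, is: if an allocation lets J jobs of an application run concurrently and each job completes in T wall-clock hours, that application contributes roughly J × 24 / T jobs per day. In the dedicated-HW baselines each application only accumulates jobs during its share of the day, so a half-day window contributes about half of that figure, which is what the shared (HPCaaS) scenarios are measured against.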

19 Multiple Applications – CPMD and NAMD
Test scenario:
– Single-application approach: two NAMD jobs in parallel for half a day, then two CPMD jobs for the other half of the day
– Multiple-applications approach: one CPMD job and one NAMD job simultaneously on the cluster for a full day
  – Case I: 4 cores for each application (2 cores on each CPU)
  – Case II: one application per CPU socket
– Running CPMD and NAMD in parallel improves CPMD productivity
– Distributing CPMD processes across two sockets performs better than giving each application an entire CPU (socket)
– NAMD shows negligible productivity difference under the three scenarios
(a core-pinning sketch follows this slide)
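
How layouts such as "2 cores on each CPU" versus "one application per socket" can be realized is not shown in the slides; the sketch below is a hypothetical illustration that pins each MPI rank of an application to an explicit core using the Linux sched_setaffinity call. It assumes a two-socket, eight-core node where cores 0-3 sit on socket 0 and cores 4-7 on socket 1, and that ranks fill each node consecutively (by-slot mapping); actual core numbering depends on the BIOS and kernel. In practice the same effect is usually achieved through the job launcher's binding options rather than in application code.

#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical Case I layout on an 8-core, two-socket node:
 * application A (e.g. CPMD) uses cores {0,1,4,5}: two cores on each socket;
 * application B (e.g. NAMD) uses cores {2,3,6,7}.
 * Each local MPI rank of an application picks one core from its list. */
static const int app_a_cores[4] = {0, 1, 4, 5};
static const int app_b_cores[4] = {2, 3, 6, 7};

static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* A command-line flag stands in for whatever the scheduler would pass
     * down to tell the process which application slice it belongs to. */
    const int *cores = (argc > 1 && argv[1][0] == 'B') ? app_b_cores
                                                       : app_a_cores;
    int local_rank = rank % 4;   /* assumes 4 ranks of this app per node,
                                    placed consecutively (by-slot mapping) */
    pin_to_core(cores[local_rank]);

    printf("rank %d pinned to core %d\n", rank, cores[local_rank]);

    MPI_Finalize();
    return 0;
}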

20 Multiple Applications – CPMD + NAMD
Test scenario:
– Single application: two NAMD jobs in parallel for 3/4 of a day, then two CPMD jobs for 1/4 of a day
– Multiple applications: one CPMD job and one NAMD job simultaneously on the cluster for a full day
  – 6 cores for NAMD (3 cores on each CPU), 2 cores for CPMD (1 core on each CPU)
– Running CPMD and NAMD in parallel improves CPMD productivity by up to 61%
– NAMD shows negligible productivity difference under the two scenarios
[Chart: productivity over InfiniBand DDR – higher is better]

21 Multiple Applications – CPMD + NAMD
Test scenario:
– Single application: two NAMD jobs in parallel for 1/4 of a day, then two CPMD jobs for 3/4 of a day
– Multiple applications: one CPMD job and one NAMD job simultaneously on the cluster for a full day
  – 2 cores for NAMD (1 core on each CPU), 6 cores for CPMD (3 cores on each CPU)
– Running CPMD with fewer cores decreases CPMD productivity
– NAMD shows negligible productivity difference under the two scenarios
[Chart: productivity over InfiniBand DDR – higher is better]

22 Results Summary
NAMD
– Increasing the number of jobs running on each node improves productivity
– InfiniBand provides nearly double the performance of GigE
– GigE does not scale beyond 20 nodes
CPMD
– Higher productivity is gained with 2 parallel CPMD jobs on the cluster
– InfiniBand delivers up to 300% higher productivity vs. GigE
CPMD and NAMD simultaneously – HPC as a Service (HPCaaS)
– It is feasible and productive to run CPMD and NAMD simultaneously on a single system
– When enough cores were allocated, CPMD productivity increased
– NAMD demonstrates the same level of productivity
– NAMD consumes a large portion of the system's resources; running more than a single NAMD and a single CPMD job will not increase productivity

23 Conclusions
HPC as a Service enables greater system flexibility
– Eliminates the need for dedicated HW resources per application
– Simplifies usage models
– Enables dynamic allocation per given task
An effective model needs to take into consideration
– Application sensitivity points and bottlenecks
– Minimum HW resource requirements per application
– Matching up applications with different hardware requirements
HPC as a Service for bio-science applications (CPMD and NAMD)
– Enables increased or equal productivity versus dedicated HW resources
– Method: allocate 4 cores or less to CPMD and 4 cores or more to NAMD (an illustrative allocation helper follows below)
  – Per-application core allocation using both sockets demonstrates higher productivity
  – Better allocation of cores, memory, and interconnect resources minimizes contention
  – NAMD requires equal or greater compute resources than CPMD
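
Purely as an illustration (this helper does not appear in the slides), the allocation rule above could be expressed as a small scheduling function that splits the eight cores of a two-socket node between one CPMD job and one NAMD job, capping CPMD at four cores and spreading both applications across both sockets. The core numbering (socket 0 = cores 0-3, socket 1 = cores 4-7) is an assumption.

#include <stdio.h>

/* Hypothetical per-node core split for one CPMD job and one NAMD job:
 * CPMD gets at most 4 cores, NAMD gets the remainder (at least 4), and
 * each application's cores are spread across both sockets. */
typedef struct {
    int cpmd_cores[8], namd_cores[8];
    int cpmd_count, namd_count;
} node_split;

static node_split split_node(int cpmd_cores_wanted)
{
    node_split s = { .cpmd_count = 0, .namd_count = 0 };
    if (cpmd_cores_wanted > 4)
        cpmd_cores_wanted = 4;                 /* cap CPMD at 4 cores */

    for (int core = 0; core < 8; core++) {
        int socket = core / 4;                 /* cores 0-3 on socket 0, 4-7 on socket 1 */
        int slot_on_socket = core % 4;
        /* CPMD claims the first slots on each socket, split as evenly as
         * possible between the two sockets; NAMD takes everything else. */
        int cpmd_per_socket = (cpmd_cores_wanted + 1 - socket) / 2;
        if (slot_on_socket < cpmd_per_socket)
            s.cpmd_cores[s.cpmd_count++] = core;
        else
            s.namd_cores[s.namd_count++] = core;
    }
    return s;
}

int main(void)
{
    node_split s = split_node(2);              /* e.g. 2 cores for CPMD, 6 for NAMD */
    printf("CPMD cores:");
    for (int i = 0; i < s.cpmd_count; i++) printf(" %d", s.cpmd_cores[i]);
    printf("\nNAMD cores:");
    for (int i = 0; i < s.namd_count; i++) printf(" %d", s.namd_cores[i]);
    printf("\n");
    return 0;
}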

24 Thank You
HPC Advisory Council
All trademarks are property of their respective owners. All information is provided “As-Is” without any kind of warranty. The HPC Advisory Council makes no representation as to the accuracy and completeness of the information contained herein. Mellanox undertakes no duty and assumes no obligation to update or correct any information presented herein.