Principles of Scalable HPC System Design
March 6, 2012
Sue Kelly, Sandia National Laboratories

Abstract: Sandia National Laboratories has a long history of successfully applying high performance computing (HPC) technology to solve scientific problems. We drew upon our experiences with numerous architectural and design features when planning our most recent computer systems. This talk will present the key issues that were considered. Important principles are performance balance between the hardware components and scalability of the system software. The talk will conclude with lessons learned from the system deployments.

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Outline
A definition of HPC for scientific applications
Design Principles
– Partition Model
– Network Topology
– Balance of Hardware Components
– Scalable System Software
Lessons Learned

What is High Performance Computing? (n.) A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors. This talk will not cover embarrassingly parallel applications. The idea/premise of scientific parallel processing is not new.
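The decomposition idea above can be illustrated with a toy example: a minimal sketch (not from the talk) that splits one large sum into pieces and computes each piece on a separate processor. The four-way chunking and the use of Python's multiprocessing module are illustrative assumptions only.

```python
# Minimal sketch: a problem divided into "little pieces" that separate
# processors execute simultaneously -- here, a sum over chunks of an array.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker process computes the sum of its own piece.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]          # split into four pieces
    with Pool(processes=4) as pool:
        total = sum(pool.map(partial_sum, chunks))   # pieces computed in parallel
    print(total == sum(data))                        # True: same answer as the serial sum
```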

The Partition Model: Match the hardware & software to its function
Applies to both hardware and software
Physically and logically divide the system into functional units
Compute hardware uses a different configuration than the service & I/O hardware
Run only the software necessary to perform each partition's function

Usage Model: Partitions cooperate to appear as one system
(Diagram: Linux login/service nodes, the compute resource, and the I/O partition working together as a single system)

Mesh/Torus topologies are scalable
(Diagram: a 12,960-node compute mesh, X=27 by Y=20 by Z=24, with a torus interconnect in the Z dimension, plus 310 service & I/O nodes)
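As a concrete illustration of how nodes are wired in such a topology, here is a minimal sketch (not from the talk) that computes a node's nearest neighbors in a 27x20x24 machine that is a mesh in X and Y and a torus (wraparound) in Z, matching the figure above. The linear-rank convention is an assumption for illustration.

```python
# Minimal sketch: neighbor computation in a 3-D mesh with a torus link in Z.
DIMS = (27, 20, 24)  # (X, Y, Z), matching the example system above

def rank(x, y, z):
    """Map 3-D coordinates to a linear node rank (x varies fastest; assumed)."""
    X, Y, Z = DIMS
    return x + X * (y + Y * z)

def neighbors(x, y, z):
    """Ranks of up to six nearest neighbors: X and Y are mesh dimensions
    (no wraparound); Z is a torus (wraps around)."""
    X, Y, Z = DIMS
    result = []
    for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
        nx, ny, nz = x + dx, y + dy, z + dz
        if not (0 <= nx < X and 0 <= ny < Y):
            continue                         # falls off the edge of the mesh in X or Y
        result.append(rank(nx, ny, nz % Z))  # wrap around the torus in Z
    return result

print(neighbors(0, 0, 0))   # a corner node still has a Z neighbor "behind" it
```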

Minimize communication interference
Jobs occupy disjoint regions simultaneously
(Example diagram: red, green, and blue jobs each occupying their own region of the 12,960-node, X=27 by Y=20 by Z=24 compute mesh)

Hardware Performance Characteristics that Lead to a Balanced System
Network bandwidth
must balance with processor speed and operations per second,
which must balance with memory bandwidth and capacity,
which must balance with file system I/O bytes per second
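One way to check this balance on paper is to compute bytes-moved-per-flop ratios for a candidate node. The sketch below is a back-of-the-envelope helper with entirely hypothetical hardware numbers; they are not the figures for any specific Sandia system.

```python
# Minimal sketch: bytes-per-flop balance check for a hypothetical node.
peak_flops        = 200e9   # 200 GF/s per node (assumed)
memory_bandwidth  = 50e9    # 50 GB/s per node (assumed)
network_bandwidth = 10e9    # 10 GB/s injection bandwidth per node (assumed)
io_bandwidth      = 0.5e9   # 0.5 GB/s file-system share per node (assumed)

# Balance metrics: how many bytes each subsystem can move per floating-point op.
print(f"memory  bytes/flop: {memory_bandwidth / peak_flops:.3f}")
print(f"network bytes/flop: {network_bandwidth / peak_flops:.3f}")
print(f"I/O     bytes/flop: {io_bandwidth / peak_flops:.4f}")
# An application's own ratios (bytes it must move per flop it performs)
# indicate which of these subsystems will be its bottleneck.
```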

In Addition to Balanced Hardware, System Software must be Scalable

Scalable System Software Concept #1: Do things in a hierarchical fashion

Job Launch is Hierarchical
(Diagram: the user logs in to a Linux login node and starts the application; the batch server, batch mom, and scheduler on the scheduler node manage the job queues; the compute node allocator consults the CPU inventory database on a database node; and the job launch mechanism fans the application out across the compute nodes)
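The "fan out" at the end of that chain is what makes launch scale: each node that has already received the launch request forwards it to a handful of children, rather than the login node contacting every compute node itself. A minimal sketch of the arithmetic, with an assumed fan-out of 32 and the 12,960-node machine from the earlier figure:

```python
# Minimal sketch: tree fan-out reaches the whole machine in O(log N) rounds
# instead of N serial messages from the login node.
import math

FANOUT = 32      # children contacted per forwarding node (assumed)
NODES = 12_960   # compute nodes, matching the earlier example system

def fanout_rounds(nodes, fanout):
    """Forwarding rounds needed until every compute node has the launch request."""
    reached, rounds = 1, 0
    while reached < nodes:
        reached += reached * fanout   # every reached node forwards to `fanout` new ones
        rounds += 1
    return rounds

print(fanout_rounds(NODES, FANOUT))              # 3 rounds
print(math.ceil(math.log(NODES, FANOUT + 1)))    # same answer in closed form
```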

System monitoring is hierarchical

Scalable System Software Concept #2: Minimize Compute Node Operating System Overhead

Operating System Interruptions Impede Progress of the Application

System monitoring is out of band and non-invasive

Scalable System Software Concept #3: Minimize Compute Node Interdependencies

Calculating Weather Minute by Minute
(Timeline: Calc 1 starts at 0 min, Calc 2 at 1 min, Calc 3 at 2 min, Calc 4 at 3 min; the work is finished at 4 min)

Calculation with Breaks / Calculation with Asynchronous Breaks
(Timeline: Calc 1 starts at 0 min, a wait at 1 min, Calc 2 at 2 min, Calc 3 at 3 min, a wait from 4 to 5 min, and Calc 4 not until 6 min; the interruptions stretch out the same four calculations)

Run Time Impact of Linux System Services (aka Daemons)
Say breaks take 50 µs and occur once per second:
– On one CPU, wasted time is 50 µs every second: a negligible 0.005% impact
– On 100 CPUs, wasted time is 5 ms every second: a negligible 0.5% impact
– On 10,000 CPUs, wasted time is 500 ms every second: a significant 50% impact
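The arithmetic behind those three bullets is shown below. The sketch follows the slide's worst-case assumption that the daemons' interruptions are not synchronized across CPUs, so at each synchronization point the application effectively absorbs every CPU's 50 µs break in turn.

```python
# Minimal sketch: OS-noise amplification under the slide's worst-case assumption
# that unsynchronized 50-microsecond interruptions serialize across CPUs.
noise_per_cpu = 50e-6   # 50 microseconds of interruption per CPU, once per second

for cpus in (1, 100, 10_000):
    wasted = noise_per_cpu * cpus   # seconds lost out of every second of run time
    print(f"{cpus:>6} CPUs: {wasted * 1e3:8.3f} ms lost per second "
          f"({wasted * 100:.3f}% impact)")
```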

Scalable System Software Concept #4: Avoid linear scaling of buffer requirements

Connection-oriented protocols have to reserve buffers for the worst case. If each node reserves a 100 KB buffer for each of its peers, that is 1 GB of memory per node on a 10,000-processor system. Instead, nodes need to communicate using collective algorithms.
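The sketch below redoes that arithmetic and contrasts it with a tree-based collective, where a node only buffers for its parent and children. The 100 KB buffer size comes from the slide; the tree fan-out of 32 is an assumption.

```python
# Minimal sketch: per-peer buffering grows linearly with machine size,
# while a tree-based collective needs buffers only for a node's tree neighbors.
buffer_per_peer = 100 * 1000   # 100 KB reserved per connection (from the slide)
peers = 10_000
fanout = 32                    # tree degree for a collective (assumed)

all_to_all = buffer_per_peer * peers          # one buffer per peer
tree_based = buffer_per_peer * (fanout + 1)   # parent plus children only

print(f"per-peer buffering:    {all_to_all / 1e9:.1f} GB per node")   # 1.0 GB
print(f"tree-based collective: {tree_based / 1e6:.1f} MB per node")   # 3.3 MB
```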

Scalable System Software Concept #5: Parallelize wherever possible

Use parallel techniques for I/O
(Diagram: compute nodes reach the I/O nodes over the high-speed network; the I/O nodes connect over 10 Gbit and 1 Gbit Ethernet to 190 parallel file system servers plus a metadata server (MDS), 50 10.0 GigE servers, and 10 login servers on 1.0 GigE; the file system servers drive the RAIDs over dual Fibre Channel links. Aggregate bandwidths: 140 MB/s per FC link × 2 links × 190 servers = 53 GB/s to the RAIDs, and 500 MB/s × 50 servers = 25 GB/s over 10 GigE.)
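Notice that the aggregate numbers in the figure are simply per-link rates multiplied by the degree of parallelism; a minimal sketch of that arithmetic, using only the figure's own numbers, is below.

```python
# Minimal sketch: aggregate I/O bandwidth = per-link rate x number of parallel paths.
fc_MBps_per_link = 140    # Fibre Channel rate per link (from the figure)
fc_links_per_fs  = 2      # FC links per file system server
fs_servers       = 190    # parallel file system servers

tenGigE_MBps     = 500    # sustained rate per 10 GigE server (from the figure)
tenGigE_servers  = 50

raid_GBps     = fc_MBps_per_link * fc_links_per_fs * fs_servers / 1000
ethernet_GBps = tenGigE_MBps * tenGigE_servers / 1000

print(f"RAID path:    {raid_GBps:.0f} GB/s")      # ~53 GB/s
print(f"10 GigE path: {ethernet_GBps:.0f} GB/s")  # 25 GB/s
```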

Summary of Principles
Partition the hardware and software
Hardware
– For scalability and upgradability, use a mesh network topology
– Determine the right balance of processor speed, memory bandwidth, network bandwidth, and I/O bandwidth for your applications
System Software
– Do things in a hierarchical fashion
– Minimize compute node OS overhead
– Minimize compute node interdependencies
– Avoid linear scaling of buffer requirements
– Parallelize wherever possible

Lessons Learned
Seek first to emulate
– Learn from the past
– Simulate the future
Need technology philosophers, tilt meters, historians
– Even Tiger Woods has a coach
The big bang only worked once
– Deploy test platforms early and often
Build de-scalable, scalable systems
– Don't forget that you have to get it running first!
– Leave the support structures (even non-scalable development tools) in working condition; you'll need to debug some day
Only dead systems never change
– Nobody ever built just one system, even when successfully deploying just one system
– Nothing is ever done just once
Build scaffolding that meets the structure
– Is the build and test infrastructure in place FIRST?
– Will it effectively support both the team and the project?