
The Red Storm High Performance Computer March 19, 2008 Sue Kelly Sandia National Laboratories Abstract: Sandia National.


1 The Red Storm High Performance Computer
March 19, 2008
Sue Kelly, Sandia National Laboratories
http://www.sandia.gov/~smkelly
Abstract: Sandia National Laboratories has a long history of successfully applying massively parallel processing (MPP) technology to solve problems in the national interest for the US Department of Energy. We drew upon our experiences with numerous architectural and design features when planning the Red Storm computer system. This talk will present the key issues that were considered. Important principles are performance balance between the hardware components and scalability of the system software.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

2 What is High Performance Computing?
High performance computing (n.): a branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors. (http://www.webopedia.com/TERM/H/High_Performance_Computing.html)
The idea of parallel processing is not new (http://www.sandia.gov/ASC/news/stories.html#nineteen-twenty-two).
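A minimal Python sketch of the "divide into little pieces" idea: a sum over a large list is split into nearly equal chunks, each chunk is handed to a separate worker, and the partial results are combined. The function names `split` and `parallel_sum` are hypothetical. Threads are used only to keep the example self-contained; CPython threads do not execute Python bytecode simultaneously, so real CPU parallelism would need separate processes or, on a machine like Red Storm, separate processors communicating over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, pieces):
    """Divide `data` into `pieces` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, workers=4):
    """Give each worker one chunk, then combine the partial sums."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


if __name__ == "__main__":
    values = list(range(1_000_000))
    print(parallel_sum(values) == sum(values))
```

The combine step (adding the partial sums) is what makes this decomposition correct: each piece is independent, so the order in which workers finish does not matter.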

3 Red Storm – a First Look
Sandia/Cray Inc. partnership:
– Sandia architecture
– Sandia & Cray system software
– Cray engineering and manufacturing
– Sandia systems HW/SW expertise

4 Red Storm is a Massively Parallel Processor Supercomputer
– 12,960 2.4 GHz dual-core Opterons for computation (called nodes)
– 2 GB of memory per core (in progress)
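The totals implied by these figures can be checked with a few lines of Python. This is a sketch under one stated assumption: every compute node has the 2 GB-per-core configuration, which the slide lists as still in progress. The constant and function names are illustrative only.

```python
# Figures from the slide. The memory total assumes every compute node has
# the 2 GB-per-core configuration (listed on the slide as in progress).
NODES = 12_960            # dual-core 2.4 GHz Opterons used for computation
CORES_PER_NODE = 2
MEMORY_PER_CORE_GB = 2


def total_cores(nodes=NODES, cores_per_node=CORES_PER_NODE):
    return nodes * cores_per_node


def total_memory_gb(nodes=NODES, cores_per_node=CORES_PER_NODE,
                    gb_per_core=MEMORY_PER_CORE_GB):
    return total_cores(nodes, cores_per_node) * gb_per_core


if __name__ == "__main__":
    print(total_cores())       # 25920 cores
    print(total_memory_gb())   # 51840 GB, roughly 51.8 TB
```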

5 Usage Model
[Diagram: a Linux login (service) node, the compute resource, and I/O]

6 Key Performance Characteristics that Lead to a Balanced System
– 124.42 TeraFLOPS (trillion floating point operations per second)
– Aggregate system memory bandwidth of 83 TB/s
– Sustained aggregate interconnect bandwidth of 120 TB/s
– High-performance I/O subsystem (minimum sustained file system bandwidth of 100 GB/s to 340 TB of parallel disk storage, and sustained external network bandwidth of 50 GB/s)
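One way to make "balanced" concrete is to divide each bandwidth by the floating point rate. The sketch below assumes each Opteron core completes 2 double-precision floating point operations per clock cycle; with 25,920 cores at 2.4 GHz that assumption reproduces the slide's 124.42 TeraFLOPS. The bytes-per-flop ratio is a common way of expressing balance, not a figure taken from the talk.

```python
CORES = 12_960 * 2        # dual-core Opterons from slide 4
CLOCK_HZ = 2.4e9
FLOPS_PER_CYCLE = 2       # assumption: 2 double-precision flops per core per cycle


def peak_flops(cores=CORES, clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak rate: every core issues its maximum flops every cycle."""
    return cores * clock_hz * flops_per_cycle


def bytes_per_flop(bandwidth_bytes_per_s, flops):
    """Balance ratio: bytes per second available for each flop per second."""
    return bandwidth_bytes_per_s / flops


if __name__ == "__main__":
    peak = peak_flops()
    print(f"peak: {peak / 1e12:.2f} TFLOPS")
    print(f"memory: {bytes_per_flop(83e12, peak):.2f} bytes/flop")
    print(f"interconnect: {bytes_per_flop(120e12, peak):.2f} bytes/flop")
    print(f"memory bandwidth per node: {83e12 / 12_960 / 1e9:.1f} GB/s")
```

Ratios of roughly two thirds of a byte per flop for memory and about one byte per flop for the interconnect show that neither path is starved relative to the processors, which is the sense in which the slide calls the system balanced.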

7 Additional Architectural Features
– Scalability: Red Storm’s hardware and system software scale from a single-cabinet system to a 32,000-node system.
– Functional Partitioning: Hardware and system software are carefully engineered to optimize the scalability and performance of the system.
– Reliability: Full-system Reliability, Availability, and Serviceability (RAS) is designed into the architecture.
– Upgradeability: There is a designed-in path for system upgrades.
– Custom Packaging: Red Storm is designed to be a high-density, relatively low-power system.
– Price/Performance: It achieves excellent performance per dollar by using high-volume commodity parts where feasible.

8 In Addition to Balanced Hardware, System Software must be Scalable

9 Scalable System Software Concept #1 Do things in a hierarchical fashion

10 Job Launch is Hierarchical
[Diagram: the user logs in to a Linux login node and starts the application; the PBS server, scheduler, and PBS mom (on the PBS node) manage the job queues; the compute node allocator uses the CPU inventory database on the database node; the job launcher (Yod) fans the application out to the Red Storm compute nodes.]
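Why fanning out matters can be seen by counting tree levels. The sketch below is a hypothetical model, not Yod's actual algorithm: the slide does not say what fan-out is used. It assumes each node that already has the application forwards it to `fanout` new nodes at the next level, and it counts the starting node as one of the nodes reached.

```python
def launch_depth(num_nodes: int, fanout: int) -> int:
    """Levels a fanout-ary launch tree needs to reach num_nodes nodes.

    Level 0 is the starting node; every node reached so far forwards the
    application to `fanout` new nodes at the next level (hypothetical model).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2 for a tree")
    reached, frontier, depth = 1, 1, 0
    while reached < num_nodes:
        frontier *= fanout      # nodes newly reached at this level
        reached += frontier
        depth += 1
    return depth


if __name__ == "__main__":
    nodes = 12_960   # compute nodes from slide 4
    print(f"serial launch: {nodes} sequential sends from one launcher")
    for k in (2, 8, 32):
        print(f"fanout {k}: {launch_depth(nodes, k)} levels")
```

Under this model, fan-outs of 2, 8, and 32 need 13, 5, and 3 levels respectively. If each parent sends to its children one at a time, the launch costs about depth × fanout sequential sends (26, 40, or 96) instead of 12,960, which is the payoff of doing things hierarchically.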

11 RAS monitoring is hierarchical
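The slide gives no details of Red Storm's RAS hierarchy, so the following is only a hypothetical illustration of the principle: node-level reports are condensed into one summary per cabinet, and only those summaries travel to the system level. The report format, function names, and cabinet sizes are all assumptions.

```python
from collections import defaultdict


def cabinet_summaries(node_reports):
    """First level of a hypothetical RAS hierarchy.

    node_reports: iterable of (cabinet_id, node_id, healthy) tuples, one per node.
    Returns one summary per cabinet, so the level above receives one message
    per cabinet instead of one per node.
    """
    summaries = defaultdict(lambda: {"nodes": 0, "failed": []})
    for cabinet, node, healthy in node_reports:
        entry = summaries[cabinet]
        entry["nodes"] += 1
        if not healthy:
            entry["failed"].append(node)
    return dict(summaries)


def system_summary(summaries):
    """Top level: combine cabinet summaries into a system-wide view."""
    total = sum(s["nodes"] for s in summaries.values())
    failed = sorted(n for s in summaries.values() for n in s["failed"])
    return {"cabinets": len(summaries), "nodes": total, "failed": failed}


if __name__ == "__main__":
    # Hypothetical system: 3 cabinets of 4 nodes each; node 102 is unhealthy.
    reports = [(c, c * 100 + n, (c, n) != (1, 2)) for c in range(3) for n in range(4)]
    print(system_summary(cabinet_summaries(reports)))
```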

12 Scalable System Software Concept #2 Minimize Interruptions to the Application

13 Calculating Weather Minute by Minute
[Timeline: Calc 1 at 0 min, Calc 2 at 1 min, Calc 3 at 2 min, Calc 4 at 3 min; finished at 4 min]

14 Calculation with Asynchronous Breaks
[Timeline: Calc 1 at 0 min, wait at 1 min, Calc 2 at 2 min, Calc 3 at 3 min, wait at 4 min, Calc 4 at 5 min; finished at 6 min]

15 Operating System Interruptions Impede Progress of the Application
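The two weather timelines can be generalized with a simple probabilistic model. It assumes a bulk-synchronous application (no process starts the next step until all have finished the current one), that each process is interrupted independently with probability `p_interrupt` during a step, and that an interruption costs a fixed `wait_time`. These parameters are illustrative, not measurements from Red Storm.

```python
def prob_step_delayed(num_procs: int, p_interrupt: float) -> float:
    """Chance that at least one of num_procs processes is interrupted in a step,
    assuming independent interruptions with probability p_interrupt each."""
    return 1.0 - (1.0 - p_interrupt) ** num_procs


def expected_runtime(steps: int, step_time: float, wait_time: float,
                     num_procs: int, p_interrupt: float) -> float:
    """Expected run time when every step waits for the slowest process,
    so a single interrupted process delays all the others."""
    return steps * (step_time + wait_time * prob_step_delayed(num_procs, p_interrupt))


if __name__ == "__main__":
    for procs in (1, 100, 10_000):
        t = expected_runtime(steps=1000, step_time=1.0, wait_time=1.0,
                             num_procs=procs, p_interrupt=0.001)
        print(f"{procs:>6} processes: {t:.0f} time units")
```

With a 0.1% chance of interruption per process per step, a single process loses almost nothing (about 1001 time units for 1000 steps), but with 10,000 processes nearly every step contains some interrupted process and the run takes about twice as long. Interruptions that are harmless on one machine become expensive at scale.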

16 Scalable System Software Concept #3 Avoid linear scaling of buffer requirements

17 Connection-oriented protocols have to reserve buffers for the worst case
If each node reserves a 100 KB buffer for each of its peers, a 10,000-processor system needs 1 GB of buffer memory per node.
Instead, nodes need to communicate using collective algorithms.
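The arithmetic behind the 1 GB figure, and the contrast with a tree-structured collective, fits in a short sketch. It assumes decimal units (100 KB = 100,000 bytes) and uses a binomial-tree or recursive-doubling pattern, in which each node exchanges data with at most about ceil(log2 N) partners, as one representative collective algorithm; the slide does not say which collective algorithms Red Storm uses.

```python
import math


def linear_buffer_bytes(num_procs: int, buffer_per_peer: int) -> int:
    """Worst case for connection-oriented protocols: one buffer per other node."""
    return (num_procs - 1) * buffer_per_peer


def tree_buffer_bytes(num_procs: int, buffer_per_peer: int) -> int:
    """Tree-structured collective: each node talks to at most
    ceil(log2(num_procs)) partners, so only that many buffers are needed."""
    return math.ceil(math.log2(num_procs)) * buffer_per_peer


if __name__ == "__main__":
    KB = 1_000   # decimal kilobytes, matching the slide's round numbers
    n = 10_000
    print(f"per-peer buffers: {linear_buffer_bytes(n, 100 * KB) / 1e9:.2f} GB per node")
    print(f"tree collective:  {tree_buffer_bytes(n, 100 * KB) / 1e6:.1f} MB per node")
```

Per-peer buffering grows linearly with the processor count (about 1 GB at 10,000 processors), while the tree pattern needs 14 buffers, about 1.4 MB, and grows only logarithmically.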

18 Scalable System Software Concept #4 Parallelize wherever possible

19 Use parallel techniques for I/O
[Diagram: compute nodes and I/O nodes connected by the high-speed network, with the I/O nodes attached to:
– Parallel file system servers (190 + MDS) and their RAIDs: 140 MB/s per FC × 2 × 190 = 53 GB/s
– 10.0 GigE servers (50) on 10 Gbit Ethernet: 500 MB/s × 50 = 25 GB/s
– Login servers (10) on 1 Gbit Ethernet: 1.0 GigE × 10]
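The aggregate figures on this slide follow from multiplying per-link bandwidth by the number of links and servers working concurrently. The sketch assumes the "× 2" means two Fibre Channel links per file system server and uses decimal units (1 GB = 1000 MB); the function name is illustrative.

```python
def aggregate_mb_per_s(mb_per_link: float, links_per_server: int, servers: int) -> float:
    """Aggregate bandwidth when every server drives all of its links at once."""
    return mb_per_link * links_per_server * servers


if __name__ == "__main__":
    # Assumption: "x 2" on the slide is two FC links per file system server.
    file_system = aggregate_mb_per_s(140, 2, 190)
    external = aggregate_mb_per_s(500, 1, 50)   # one 10 GigE path per server
    print(f"parallel file system: {file_system / 1000:.1f} GB/s")
    print(f"10 GigE external network: {external / 1000:.1f} GB/s")
```

No single server is fast; the aggregate bandwidth comes from spreading the traffic across many servers and links in parallel, which is the point of Concept #4.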

20 Conclusion
Hardware, system software, and application software are all important participants in achieving a high-performing system. Although Red Storm was originally designed to address the needs of a specific project, its design has become a very popular commercial product around the world.

