
High-End Computing Systems
EE380 State-of-the-Art Lecture
Hank Dietz, Professor & Hardymon Chair in Networking
Electrical & Computer Engineering Dept., University of Kentucky, Lexington, KY

What Is A Supercomputer?
- One of the most expensive computers? A very fast computer?
- Really two key characteristics:
  - A computer that solves big problems: stuff that wouldn't fit on a PC, or stuff that would take too long to run
  - Performance can scale: more money buys a faster machine
- A supercomputer can be cheap!

The Key Is Parallel Processing
- Process N "pieces" simultaneously, get up to a factor of N speedup
- Modular hardware designs:
  - Relatively easy to scale: add modules
  - Higher availability (if not reliability)
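The "up to" matters: any work that stays serial caps the achievable speedup. A minimal sketch using Amdahl's law (the standard formula, not something from the slides themselves):

```python
def amdahl_speedup(n, serial_fraction):
    """Speedup on n processors when serial_fraction of the work
    cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With a perfectly parallel job, n processors give n-fold speedup;
# even 5% serial work caps 1000 processors below 20x.
```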

The Evolution Of Supercomputers
- The fittest survives, even if it's ugly
- Rodents outlast dinosaurs... and bugs will outlast us all!

When Does Supercomputing Make Sense?
- When you need results NOW! The Top500 speeds up 1.4X every 6 months, so just waiting might work... but optimizing your code helps a lot; do that first!
- When your application takes enough time per run to justify the effort and expense
- Our technologies don't change the basics; they mostly improve price/performance
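The "just waiting" option is easy to quantify. Under the slide's growth-rate assumption (1.4X every 6 months), the wait for machines k times faster is a one-line formula; this sketch is illustrative only:

```python
import math

def months_to_speedup(k, rate=1.4, period_months=6.0):
    """Months until commodity machines are k times faster, assuming
    performance multiplies by `rate` every `period_months` months."""
    return period_months * math.log(k) / math.log(rate)

# A 2x speedup arrives in roughly a year on this trend line.
```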

What Is A Cluster Supercomputer?
- Not a "traditional" supercomputer? Is The Grid a cluster? Is a Farm a cluster? A Beowulf?
- A supercomputer made from interchangeable parts (mostly from PCs)
- Some PC parts you don't need or want
- Often, Linux PC "nodes"

Parts... Vs. In A Traditional Supercomputer
- Processors: AMD Athlon, Opteron; Intel Pentium 4, Itanium; Apple G5... within 2X of very low cost
- Motherboards, memory, disks, network, video, audio, physical packaging... lots of choices, but parts tuned for PC use, not for cluster supercomputing

AMD Athlon XP

Types Of Hardware Parallelism
- Pipeline
- Superscalar, VLIW, EPIC
- SWAR (SIMD Within A Register)
- SMP (Symmetric MultiProcessor)
- Cluster
- Farm
- Grid
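Of these, SWAR is the least familiar: treat one ordinary register as several narrow lanes and operate on all lanes at once. A minimal illustration of the classic masking trick, shown in Python for clarity (real SWAR code would use C on machine words; the function name is mine):

```python
LOW7 = 0x7F7F7F7F  # low 7 bits of each of four byte lanes
HIGH = 0x80808080  # the top bit of each lane

def swar_add_u8x4(a, b):
    """Lane-wise add of four unsigned bytes packed in one 32-bit word,
    without letting a carry spill from one lane into the next."""
    partial = (a & LOW7) + (b & LOW7)   # low-7-bit sums cannot cross lanes
    return (partial ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF  # restore top bits mod 2
```

One 32-bit add plus a few logic ops replaces four separate byte adds, which is exactly why SWAR was attractive on commodity processors.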

Engineer To Meet Application Needs
- Know your application(s)
- Tune your application(s)
- Know your budget: money, power, cooling, space
- Hardware configuration options
- Software configuration options

Engineering A Cluster
- This is a systems problem: optimize the integrated effects of computer architecture, compiler optimization/parallelization, the operating system, and the application program
- The payoff for good engineering can be HUGE! (and the penalty for bad engineering is HUGE!)

One Aspect: Interconnection Network
- Parallel supercomputer nodes must interact
- Bandwidth: bits transmitted per second; bisection bandwidth is most important
- Latency: time to send something from here to there; harder to improve than bandwidth...

Latency Determines Smallest Useful Parallel Grain Size
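That relationship can be made concrete with a back-of-the-envelope model (the helper names and numbers are illustrative; the 30 µs figure echoes the Ethernet latency quoted on the Switch Networks slide):

```python
def message_time(nbytes, latency_s, bandwidth_bps):
    """Linear cost model: fixed latency plus transfer time
    (bandwidth given in bits per second)."""
    return latency_s + 8 * nbytes / bandwidth_bps

def min_grain_flops(latency_s, flop_rate):
    """Work per message must at least cover the latency, or
    communication dominates and parallelism stops paying off."""
    return latency_s * flop_rate

# 30 us of latency on a 1 GFLOPS node: parcels of work much
# smaller than ~30,000 FLOPs are not worth distributing.
```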

Network Design Assumptions
- Links are bidirectional
- Bounded number of network interfaces per node
- Point-to-point message communications
- Topology: hardware and software

No Network

Direct Fully Connected

Toroidal 1D Mesh (Ring)

Physical Layout Of Ring

Non-Toroidal 2D Mesh

3-Cube (AKA 3D Mesh)

Switch Networks
- An ideal switch connects N things such that bisection bandwidth = number of ports, and latency is low (~30 µs for Ethernet)
- Other switch-like units: hubs, FDRs (Full Duplex Repeaters), managed switches, routers
- Not enough ports? Build a switch fabric

Simple Switch (8-Port)

Channel Bonding (2-Way)

Tree (4-Port Switches)

A Better Tree

Fat Tree

Our Insights
- Want a "flat" single-level network: the top level determines bisection bandwidth, and multiple levels multiply latency
- Connect each node to multiple switches; nodes only talk with nodes "in the same neighborhood"
- Use a wiring pattern such that each node pair has at least one switch in common
- The design is an open problem in graph theory; a Genetic Algorithm can evolve a solution!
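The "at least one switch in common" condition is easy to state as code. Below is a checker plus a small hand-built instance (8 nodes, six 4-port switches, exactly 3 NICs per node); this toy wiring is mine for illustration, not KLAT2's actual design, which came from the GA search the slide mentions:

```python
from itertools import combinations

def is_fnn(num_nodes, switches):
    """Flat Neighborhood property: every pair of nodes shares a switch."""
    return all(
        any(a in sw and b in sw for sw in switches)
        for a, b in combinations(range(num_nodes), 2)
    )

# 8 nodes, six 4-port switches, each node wired to exactly 3 switches.
switches = [
    {0, 1, 2, 3}, {0, 1, 4, 5}, {0, 1, 6, 7},
    {2, 3, 4, 6}, {2, 3, 5, 7}, {4, 5, 6, 7},
]
```

A GA would score candidate wirings with a check like `is_fnn` (plus bandwidth terms) and evolve the switch sets instead of building them by hand.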

Flat Neighborhood Network

Flat Vs. Fat
- Latency (switch delays): 8 nodes, 4-port: 1.0 vs. 2.7; 64 nodes, 32-port: 1.0 vs. 2.5
- Pairwise bisection bandwidth: 8 nodes, 4-port: 1.29 vs. 1.0 units; 64 nodes, 32-port: 1.48 vs. 1.0
- Cost: more network interfaces vs. smart routers
- Summary: Flat Neighborhood wins!

KLAT2, Gort, & Klaatu

Behind KLAT2

KLAT2 Changed Everything
KLAT2 (Kentucky Linux Athlon Testbed 2):
- 1st network designed by computer
- 1st deliberately asymmetric network
- 1st supercomputer to break $1K/GFLOPS
- 160+ news stories about KLAT2
- Various awards: 2000 Gordon Bell Award (price/performance); 2001 Computerworld Smithsonian, among the 6 most advancing science

Cool, But What Have You Done Recently? LOTS!
- Nanocontrollers (programmable nanotech)
- GPUs for supercomputing
- Warewulf & cAos systems software
- etc.; see Aggregate.Org

Did I Mention SFNNs?
- Real parallel applications don't actually have every node talk to every other node
- Design the network to be "sparse": FNN properties only for the node pairs that will actually talk to each other
- Network complexity apparently grows as O(N^2), but sparseness makes it O(N log N)!
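The sparse version demands the shared-switch property only for pairs that actually communicate. A sketch with a hypothetical ring-style workload (switch sets chosen by hand for illustration, not from any real SFNN):

```python
def is_sparse_fnn(pairs, switches):
    """SFNN property: every *communicating* pair shares a switch."""
    return all(any(a in sw and b in sw for sw in switches)
               for a, b in pairs)

# An 8-node nearest-neighbor ring needs only three 4-port switches,
# far fewer than a full FNN covering all 28 pairs would.
ring = [(i, (i + 1) % 8) for i in range(8)]
sparse_switches = [{0, 1, 2, 3}, {3, 4, 5, 6}, {6, 7, 0, 1}]
```

Non-communicating pairs, like nodes 1 and 5 here, simply share no switch, and that is fine because the application never sends between them.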

June 2003, KASY0

KASY0
A 128-node system using 24-port switches!
KASY0 (Kentucky ASYmmetric zero):
- 1st Sparse FNN
- 1st physical layout optimized by a GA
- 1st TFLOPS-capable supercomputer in KY
- 1st supercomputer to break $100/GFLOPS
- World-record fastest POVRay 3.5 render

POVRay 3.5 Benchmark

Supercomputers R Us
- We make supercomputing cheap! You can help...
- Build parties
- Weekly research group meetings
- Projects
- Everything's at: Aggregate.Org