Parallel Architectures: Topologies Heiko Schröder, 2003.

Types of sequential processors (SISD)
A processor connected directly to memory, and a processor with a cache memory between it and main memory. In both, the single processor-memory path is the von Neumann bottleneck.

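SIMD vs. MIMD
SIMD: a global control unit drives all processing elements (PEs), which communicate through an interconnection network.
MIMD: every PE has its own control unit, and the PEs communicate through an interconnection network. Running one program on all PEs of an MIMD machine gives SPMD (single program, multiple data).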

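Message passing vs. shared address space
Message passing: each PE has its own memory (PE + M) and control unit, and the nodes exchange messages through the interconnection network.
Shared address space: processors (P) reach memory modules (M) through the interconnection network; a node may also combine a processor with local memory (P/M).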

Outline
Various communication networks; state-of-the-art technology; important aspects of routing schemes; known results (theory); the internet.

Desirable features of a network
1. Algorithmic: low diameter (1 for the complete graph) and high bisection width (complete graph). But the complete graph on n nodes needs n(n-1)/2 edges and degree n-1.
2. Technical: low degree (pin limitations; a constant degree allows a modular design, as in the mesh), short wires (mesh), small area (mesh), and regular structure (mesh).
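To make the trade-off concrete, here is a minimal Python sketch that measures diameter by breadth-first search from every node. The helper names are our own. On the complete graph it confirms diameter 1, n(n-1)/2 edges and degree n-1.

```python
from collections import deque

def eccentricity(adj, src):
    """Largest hop count from src to any reachable node (adj: node -> neighbours)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Maximum eccentricity over all nodes (assumes a connected network)."""
    return max(eccentricity(adj, u) for u in adj)

def complete_graph(n):
    """Every node is wired directly to every other node."""
    return {u: [v for v in range(n) if v != u] for u in range(n)}

n = 8
k8 = complete_graph(n)
edges = sum(len(nbrs) for nbrs in k8.values()) // 2
print(diameter(k8), edges, len(k8[0]))   # 1 28 7: diameter 1, n(n-1)/2 edges, degree n-1
```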

Connection networks I: 1-D mesh (linear array)
Diameter: n-1. Bisection width: 1.
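Both values can be checked with a small set of hypothetical helpers (the names are ours). The end nodes are n-1 hops apart, and cutting the array in the middle severs a single link.

```python
from collections import deque

def linear_array(n):
    """1-D mesh: node i is linked to i-1 and i+1 where those exist."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def hops(adj, src, dst):
    """Shortest-path length by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[dst]

def cut_edges(adj, side):
    """Number of links with exactly one endpoint in the node set `side`."""
    return sum(1 for u in side for v in adj[u] if v not in side)

n = 16
arr = linear_array(n)
print(hops(arr, 0, n - 1))                  # 15 = n - 1, between the two end nodes
print(cut_edges(arr, set(range(n // 2))))   # 1 link joins the two halves
```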

Tree (complete binary tree)
Diameter: about 2 log n. Bisection width: 1.
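This sketch assumes heap numbering, where node i has children 2i and 2i+1. It confirms that a complete binary tree of height h has diameter 2h, roughly 2 log n, because the longest path runs from a leaf to the root and down to another leaf. Removing one root link already splits the tree into parts of 2^h - 1 and 2^h nodes, hence the bisection width of 1.

```python
from collections import deque

def complete_binary_tree(height):
    """Heap numbering: node i has children 2i and 2i+1; nodes 1 .. 2^(height+1)-1."""
    n = 2 ** (height + 1) - 1
    adj = {i: [] for i in range(1, n + 1)}
    for i in range(2, n + 1):
        adj[i].append(i // 2)   # link to parent
        adj[i // 2].append(i)   # link from parent
    return adj

def eccentricity(adj, src):
    """Largest breadth-first distance from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

for h in range(1, 6):
    tree = complete_binary_tree(h)
    diam = max(eccentricity(tree, u) for u in tree)
    print(len(tree), diam)   # diameter 2h: leaf -> root -> leaf
```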

H-tree
Area: O(n). Longest wire: O(√n). Used for clock distribution.
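The layout sketch below uses the classic recursive H shape, with unit spacing between leaves in our own parametrization. It places n = 4^d leaves. Both the longest wire (√n/2) and the side of the bounding box (√n - 1) grow as √n, so the area stays O(n).

```python
def h_tree(x, y, length, depth, segments, leaves):
    """Lay out an H centred at (x, y), then recurse on its four tips.

    Each H covers two levels of a complete binary tree: the horizontal bar
    joins the centre node's two children, and each vertical bar joins a
    child to its two grandchildren at the tips.
    """
    if depth == 0:
        leaves.append((x, y))
        return
    half = length / 2
    segments.append(((x - half, y), (x + half, y)))                 # horizontal bar
    for dx in (-half, half):
        segments.append(((x + dx, y - half), (x + dx, y + half)))   # vertical bar
        for dy in (-half, half):
            h_tree(x + dx, y + dy, half, depth - 1, segments, leaves)

def layout_stats(depth):
    """Leaf count, longest wire and bounding-box side of an H-tree of the given depth."""
    segments, leaves = [], []
    h_tree(0.0, 0.0, 2.0 ** (depth - 1), depth, segments, leaves)  # unit leaf spacing
    longest = max(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in segments)
    xs = [p[0] for seg in segments for p in seg]
    return len(leaves), longest, max(xs) - min(xs)

for d in range(1, 6):
    n, longest, side = layout_stats(d)
    print(n, longest, side, side * side)   # longest = sqrt(n)/2, area < n
```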

2-D mesh (√n × √n)
Diameter: 2(√n - 1). Bisection width: √n.

Torus (mesh with wrap-around links)
Compared with the mesh: reduced diameter, increased bisection width, and all nodes are equivalent. Drawback: long wrap-around wires?
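The next sketch compares an m × m mesh with the m × m torus for m = 8, that is n = 64 nodes. The helpers are our own. Breadth-first search gives diameters 2(m-1) and 2⌊m/2⌋. A straight cut down the middle crosses m links in the mesh but 2m in the torus.

```python
from collections import deque

def grid(m, wrap):
    """m x m 2-D mesh; wrap=True adds the wrap-around links of the torus."""
    adj = {}
    for r in range(m):
        for c in range(m):
            nbrs = set()                     # a set avoids duplicate links when m = 2
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if wrap:
                    rr, cc = rr % m, cc % m
                if 0 <= rr < m and 0 <= cc < m and (rr, cc) != (r, c):
                    nbrs.add((rr, cc))
            adj[(r, c)] = nbrs
    return adj

def diameter(adj):
    """Largest breadth-first distance over all source nodes."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def middle_cut(adj, m):
    """Links crossing the cut between columns m/2 - 1 and m/2 (m even)."""
    left = {(r, c) for r in range(m) for c in range(m // 2)}
    return sum(1 for u in left for v in adj[u] if v not in left)

m = 8
for wrap in (False, True):
    g = grid(m, wrap)
    print("torus" if wrap else "mesh", diameter(g), middle_cut(g, m))
# mesh: diameter 14, cut 8; torus: diameter 8, cut 16
```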

3-D mesh (n^(1/3) × n^(1/3) × n^(1/3))
Diameter: 3(n^(1/3) - 1). Bisection width: n^(2/3).
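The same breadth-first check extends to three dimensions. This sketch uses m = 4 (n = 64 nodes) and our own helper names. It gives diameter 3(m-1) = 9, and m^2 = 16 links cross the middle plane.

```python
from collections import deque
from itertools import product

def mesh3d(m):
    """m x m x m mesh: nodes are (x, y, z); links join nodes one step apart on one axis."""
    adj = {}
    for p in product(range(m), repeat=3):
        nbrs = []
        for axis in range(3):
            for step in (-1, 1):
                q = list(p)
                q[axis] += step
                if 0 <= q[axis] < m:
                    nbrs.append(tuple(q))
        adj[p] = nbrs
    return adj

def eccentricity(adj, src):
    """Largest breadth-first distance from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def middle_plane_cut(adj, m):
    """Links crossing the plane between x = m/2 - 1 and x = m/2 (m even)."""
    half = {p for p in adj if p[0] < m // 2}
    return sum(1 for u in half for v in adj[u] if v not in half)

m = 4
g = mesh3d(m)
print(max(eccentricity(g, p) for p in g), middle_plane_cut(g, m))   # 9 16
```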

Hypercube
(Figure: 0-D, 1-D, 2-D, 3-D and 4-D hypercubes, node labels 0 and 1.)
Diameter: log n. Bisection width: n/2.
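In a hypercube with n = 2^d nodes, neighbours differ in exactly one bit, so the distance between two nodes equals their Hamming distance. The sketch below uses our own helper names. It confirms diameter d = log n and counts the n/2 links between the two (d-1)-dimensional halves.

```python
from collections import deque

def hypercube(d):
    """n = 2^d nodes; node u is linked to u XOR 2^i for each dimension i."""
    return {u: [u ^ (1 << i) for i in range(d)] for u in range(2 ** d)}

def eccentricity(adj, src):
    """Largest breadth-first distance from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def halves_cut(adj, d):
    """Links between the sub-cubes with top bit 0 and top bit 1."""
    low = set(range(2 ** (d - 1)))
    return sum(1 for u in low for v in adj[u] if v not in low)

for d in range(1, 7):
    cube = hypercube(d)
    diam = max(eccentricity(cube, u) for u in cube)
    print(len(cube), diam, halves_cut(cube, d))   # n, log n, n/2
```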

Cube-connected cycles (CCC)
Each node of a k-D hypercube is replaced by a cycle of k nodes, giving k·2^k nodes, each of degree 3. Diameter: Θ(log n). Bisection width: Θ(n / log n).
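This construction sketch assumes the usual labelling (w, i), where w is a k-bit cube label and i a position on the cycle (our notation). It checks the k·2^k node count and degree 3. For small k it prints the diameter found by breadth-first search rather than asserting a formula.

```python
from collections import deque

def ccc(k):
    """Cube-connected cycles of dimension k (k >= 3).

    Node (w, i): w is a k-bit hypercube label, i a position on w's cycle.
    Links: the two cycle neighbours (w, i+1 mod k) and (w, i-1 mod k), plus
    one cube link (w XOR 2^i, i) along dimension i.
    """
    return {
        (w, i): {(w, (i + 1) % k), (w, (i - 1) % k), (w ^ (1 << i), i)}
        for w in range(2 ** k)
        for i in range(k)
    }

def diameter(adj):
    """Largest breadth-first distance over all source nodes."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        best = max(best, max(dist.values()))
    return best

for k in range(3, 6):
    g = ccc(k)
    degrees = {len(nbrs) for nbrs in g.values()}
    print(k, len(g), degrees, diameter(g))   # nodes k*2^k, degree {3}
```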

Shuffle-exchange graph
Exchange: flip the least significant bit (lsb). Shuffle: rotate the label (left or right).
Degree: 3. Diameter: 2 log n - 1 (at most log n - 1 shuffles plus log n exchanges). Bisection width: Θ(n / log n).

16-node shuffle-exchange graph: routing
To go from u = u_1 u_2 … u_k to v = v_1 v_2 … v_k, an exchange (ex) first sets the lsb to v_1, giving u_1 u_2 … u_{k-1} v_1. A left shuffle then gives u_2 … u_{k-1} v_1 u_1. Each further left shuffle plus exchange (ls+ex) shifts in the next bit of v, until the label reads v_1 v_2 … v_k. The route uses at most log n - 1 shuffles and log n exchanges, which is the source of the 2 log n - 1 diameter bound.
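The routing scheme fits in a few lines of Python. The function names are ours, and "shuffle" here is the left rotation. Each round fixes the lsb to the next bit of v and then rotates, so a route never uses more than k exchanges and k - 1 shuffles.

```python
def exchange(u):
    """Exchange link: flip the least significant bit."""
    return u ^ 1

def shuffle(u, k):
    """Shuffle link: rotate the k-bit label one position to the left."""
    return ((u << 1) | (u >> (k - 1))) & ((1 << k) - 1)

def route(u, v, k):
    """Walk from u to v: set the lsb to the next bit of v (msb first), then shuffle."""
    path = [u]
    for i in range(k):
        want = (v >> (k - 1 - i)) & 1          # v_1 is the most significant bit
        if (path[-1] & 1) != want:
            path.append(exchange(path[-1]))
        if i < k - 1:                          # no shuffle after the last bit
            path.append(shuffle(path[-1], k))
    return path

print([format(x, "04b") for x in route(0b0110, 0b1011, 4)])
# ['0110', '0111', '1110', '1101', '1011']
```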

3-dimensional de Bruijn graph
Edges: u_1 u_2 … u_k → u_2 u_3 … u_k 0 and u_1 u_2 … u_k → u_2 u_3 … u_k 1. In-degree = out-degree = 2. Diameter: log n. Bisection width: Θ(n / log n).
Each Eulerian tour yields a de Bruijn sequence, which contains each possible substring of length 4 exactly once.
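The Eulerian-tour statement can be checked with a short Python sketch (the function names are ours). It runs Hierholzer's algorithm on the k-dimensional binary de Bruijn graph, where edge u_1…u_k → u_2…u_k b carries the label u_1…u_k b, and reads off the bit b of each edge. For k = 3 the cyclic sequence has length 16, and every 4-bit string appears in it exactly once.

```python
def de_bruijn_sequence(k):
    """Cyclic binary sequence read from an Eulerian circuit of the
    k-dimensional de Bruijn graph (2^k nodes, 2^(k+1) edges)."""
    mask = (1 << k) - 1
    unused = {u: [0, 1] for u in range(2 ** k)}   # outgoing edge bits not yet used
    stack, circuit = [(0, None)], []
    while stack:                                  # Hierholzer's algorithm
        u, bit = stack[-1]
        if unused[u]:
            b = unused[u].pop()
            stack.append((((u << 1) | b) & mask, b))   # edge u1..uk -> u2..uk b
        else:
            stack.pop()
            if bit is not None:
                circuit.append(bit)               # edges come off in reverse order
    circuit.reverse()
    return circuit

def windows_unique(seq, length):
    """True if every binary string of `length` occurs exactly once as a cyclic window."""
    n = len(seq)
    seen = {tuple(seq[(i + j) % n] for j in range(length)) for i in range(n)}
    return len(seen) == n == 2 ** length

seq = de_bruijn_sequence(3)          # 3-dimensional graph: 8 nodes, 16 edges
print("".join(map(str, seq)), windows_unique(seq, 4))
```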

Butterfly network
Unique path between every input and every output. Applications: FFT, routing, sorting.
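This sketch shows the unique-path property under one common labelling, which is ours and not necessarily the slide's. Nodes are (row, level), and the edge leaving level l either keeps the row or flips bit k-1-l. Counting paths level by level shows exactly one path from every input to every output, and the greedy bit-fixing route is that path.

```python
def butterfly_paths(k, src, dst):
    """Count level-0 -> level-k paths from row src to row dst in a k-dimensional butterfly.

    From (r, l) there is a straight edge to (r, l+1) and a cross edge to
    (r XOR 2^(k-1-l), l+1).
    """
    counts = {src: 1}
    for level in range(k):
        bit = 1 << (k - 1 - level)
        nxt = {}
        for row, c in counts.items():
            for r2 in (row, row ^ bit):        # straight edge, cross edge
                nxt[r2] = nxt.get(r2, 0) + c
        counts = nxt
    return counts.get(dst, 0)

def route(k, src, dst):
    """The unique path: at level l, cross iff bit k-1-l of the current row and dst differ."""
    row, path = src, [(src, 0)]
    for level in range(k):
        bit = 1 << (k - 1 - level)
        if (row ^ dst) & bit:
            row ^= bit
        path.append((row, level + 1))
    return path

print(route(3, 0b010, 0b111))   # [(2, 0), (6, 1), (6, 2), (7, 3)]
```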

Benes network

Mesh of trees
Diameter: Θ(log n). Bisection width: Θ(√n).

The power of hypercubes (figure: 4-D hypercube)
Hamiltonian cycle (Gray codes); contains k-D meshes (tori) with N nodes; simulates the mesh of trees; simulates hypercubic networks; contains the complete binary tree, almost; normal algorithms.

Hamiltonian cycle
A hypercube contains a Hamiltonian cycle (proof by induction). Each Hamiltonian cycle corresponds to a Gray code: exactly one bit changes per link.

Gray code by reflection
The (k+1)-bit code lists the k-bit code prefixed with 0, followed by the k-bit code in reverse order prefixed with 1.
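The reflection construction and its Hamiltonian-cycle property can be sketched in Python (the names are ours). The code builds the d-bit list from the (d-1)-bit one, then checks two things. The list must visit all 2^d hypercube nodes, and consecutive codes, including last to first, must differ in exactly one bit.

```python
def reflected_gray(d):
    """d-bit reflected Gray code: previous list, then its reverse with the new top bit set."""
    codes = [0]
    for i in range(d):
        codes = codes + [(1 << i) | c for c in reversed(codes)]
    return codes

def is_hamiltonian_cycle(codes, d):
    """True if codes visits every d-cube node once and each step (cyclically) flips one bit."""
    n = 2 ** d
    if sorted(codes) != list(range(n)):
        return False
    return all(bin(codes[i] ^ codes[(i + 1) % n]).count("1") == 1 for i in range(n))

print([format(c, "03b") for c in reflected_gray(3)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```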

Hypercube contains meshes/tori (with wrap-arounds)
Theorem: Any n_1 × n_2 × … × n_k mesh (with or without wrap-arounds) is a subgraph of an n-D hypercube if n_1 · n_2 · … · n_k = 2^n.
Proof: see Leighton; each sub-cube has a Hamiltonian cycle.
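The proof idea can be checked mechanically with a sketch that assumes every n_i is a power of two. It maps each coordinate through its own reflected Gray code, which is a Hamiltonian cycle of a sub-cube, and concatenates the codes. Every mesh link and every wrap-around link then becomes a hypercube edge, and the map is one-to-one onto the (log n_1 + … + log n_k)-dimensional cube. The helper names are ours.

```python
from itertools import product

def gray(i):
    """Reflected Gray code of i."""
    return i ^ (i >> 1)

def embed(coords, bits):
    """Concatenate gray(x_j), each using bits[j] bits, into one hypercube label."""
    label = 0
    for x, b in zip(coords, bits):
        label = (label << b) | gray(x)
    return label

def check_torus_embedding(sizes):
    """True if every mesh and wrap-around link of the torus maps to a hypercube edge."""
    bits = [s.bit_length() - 1 for s in sizes]
    if any(1 << b != s for b, s in zip(bits, sizes)):
        raise ValueError("all side lengths must be powers of two")
    labels = set()
    for p in product(*(range(s) for s in sizes)):
        u = embed(p, bits)
        labels.add(u)
        for axis, s in enumerate(sizes):
            if s == 1:
                continue
            q = list(p)
            q[axis] = (q[axis] + 1) % s    # mesh link, or wrap-around when p[axis] = s-1
            if bin(u ^ embed(q, bits)).count("1") != 1:
                return False
    return len(labels) == 2 ** sum(bits)   # one-to-one onto the sum(bits)-D cube

print(check_torus_embedding([4, 8]), check_torus_embedding([2, 4, 4]))   # True True
```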

Hypercube contains double-rooted trees
A double-rooted complete binary tree, in which the root is replaced by two adjacent roots (joined along a different dimension), is a subgraph of the hypercube. Hence a hypercube can implement all tree algorithms and also all mesh-of-trees algorithms (possibly with a minor delay).

Normal algorithms
A hypercube algorithm is normal if only one dimension of hypercube edges is used at any step and consecutive dimensions are used in consecutive steps. Most hypercube algorithms are normal, and normal algorithms can be embedded efficiently on hypercubic networks.
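A sum all-reduce illustrates a normal algorithm (the example is ours, not from the slides). Step i uses only the dimension-i links, and the dimensions are used in the order 0, 1, …, d-1. This sketch simulates the 2^d nodes as a Python list.

```python
def hypercube_allreduce(values):
    """Sum-reduce on a 2^d-node hypercube, using dimensions 0, 1, ..., d-1 in
    consecutive steps (a normal algorithm); each node adds its partner's value."""
    n = len(values)
    d = n.bit_length() - 1
    if 1 << d != n:
        raise ValueError("number of nodes must be a power of two")
    vals = list(values)
    for dim in range(d):                                   # one dimension per step, in order
        vals = [vals[u] + vals[u ^ (1 << dim)] for u in range(n)]
    return vals

print(hypercube_allreduce([3, 1, 4, 1, 5, 9, 2, 6]))   # [31, 31, 31, 31, 31, 31, 31, 31]
```

After step i, every node holds the sum over its sub-cube spanned by dimensions 0..i, so after d steps every node holds the total.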

Josephus graph
Every even node k is connected to k + 2^i - 3. Diameter: about (log n)/…

Star graph
Nodes: the k! permutations of k elements, each of degree k-1.
Edges: exchange the first element with one of the others.
Small degree and small diameter: ⌊3(k-1)/2⌋.
Open problems: e.g., are there (k-1)/2 edge-disjoint Hamiltonian cycles?
Number of nodes versus degree (star graph / hypercube):
degree        3     4     5      6      7
star graph   24   120   720   5040  40320
hypercube     8    16    32     64    128
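A breadth-first search over permutations, using our own helper names, checks the node count k!, the degree k-1 and the diameter ⌊3(k-1)/2⌋ for small k. Because the star graph is vertex-symmetric, one search from the identity permutation is enough.

```python
from collections import deque
from math import factorial

def star_neighbours(p):
    """Swap the first symbol with the symbol in position i, for i = 1 .. k-1."""
    out = []
    for i in range(1, len(p)):
        q = list(p)
        q[0], q[i] = q[i], q[0]
        out.append(tuple(q))
    return out

def star_diameter(k):
    """Eccentricity of the identity permutation, which equals the diameter by symmetry."""
    start = tuple(range(k))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in star_neighbours(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != factorial(k):
        raise RuntimeError("not all k! permutations were reached")
    return max(dist.values())

for k in range(3, 8):
    print(k, factorial(k), k - 1, star_diameter(k))   # k, nodes, degree, diameter
```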

Pin limitations (figure: 4-D hypercube)

Wiring limitations (figure: 4-D hypercube; labels include nodes, bisection width in K, 25 cm and 32 m)

Improve the topology? The internet

Arguments against parallelism
cost(large) < cost(2 small), i.e. one large machine costs less than two small ones; all the existing FORTRAN/C software; let's stick to pipelining; let's wait for faster machines; Amdahl's Law.
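Amdahl's Law is only named on the slide. In its usual form, if a fraction f of the work can be parallelized, p processors give speedup S(p) = 1 / ((1 - f) + f/p), which can never exceed 1/(1 - f). A minimal sketch:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: S(p) = 1 / ((1 - f) + f / p) for parallelizable fraction f."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / processors)

for p in (2, 16, 256, 10**6):
    print(p, round(amdahl_speedup(0.9, p), 2))
# with f = 0.9 the speedup stays below 1 / (1 - f) = 10 however many processors are used
```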