Interconnection Network
The PRAM model is too simple: physically, PEs communicate through a network (either buses or switching networks), so communication cost depends on the network topology.
Questions:
–Should the user exploit the interconnection network topology?
–Does the user have the freedom to exploit the topology?

Mesh with Wraparound

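Example: multiplying two n by n matrices on a mesh with wraparound
Initially, PE[i,j] has x[i,j] = a[i,j], y[i,j] = b[i,j] and c[i,j] = 0 (1 <= i, j <= n).
Row i shifts its x data left (i-1) times; column j shifts its y data up (j-1) times. Afterwards PE[i,j] holds a[i,k] and b[k,j] for the same k = i+j-1, reduced into the range 1..n by wraparound.
Then, for n steps, every PE[i,j] does { c[i,j] = c[i,j] + x*y; send x to the left (with wraparound); send y up (with wraparound) }
How about transitive closure?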
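The procedure above (commonly known as Cannon's algorithm) can be checked with a sequential simulation of the torus. This is a minimal sketch, assuming 0-indexed PEs, so row i is shifted i times instead of i-1; `cannon_multiply` is a hypothetical name, and each list comprehension stands for one parallel step of all PEs.

```python
def cannon_multiply(a, b):
    """Simulate the wraparound-mesh algorithm; PE[i][j] holds x, y and c.

    Hypothetical helper for illustration; uses 0-indexed rows and columns.
    """
    n = len(a)
    # Initial skew (0-indexed): row i shifts x left i times, column j shifts
    # y up j times, so PE[i][j] holds a[i][k] and b[k][j] with k = (i+j) % n.
    x = [[a[i][(i + j) % n] for j in range(n)] for i in range(n)]
    y = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE multiplies and accumulates in parallel.
        for i in range(n):
            for j in range(n):
                c[i][j] += x[i][j] * y[i][j]
        # Send x one PE to the left and y one PE up, both with wraparound:
        # PE[i][j] receives x from PE[i][j+1] and y from PE[i+1][j].
        x = [[x[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        y = [[y[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

For example, `cannon_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives `[[19, 22], [43, 50]]`, the ordinary matrix product.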

Matrix Multiplication, Step 1 (figure, n = 4): initial layout, PE[i,j] holds a[i,j] and b[i,j]

Step 2: Rearrange Data (figure, n = 4): row i of A is rotated left i-1 positions (row 2 becomes a22 a23 a24 a21) and column j of B is rotated up j-1 positions (row 1 becomes b11 b22 b33 b44)

Step 3: Multiply, Add and Move Data (figure, n = 4): at each cell, c[i,j] accumulates a[i,k]*b[k,j], then a moves left and b moves up. For example, PE[2,1] computes
$c_{21} = a_{22}b_{21} + a_{23}b_{31} + a_{24}b_{41} + a_{21}b_{11}$

Systolic Array Algorithm (figure, n = 4): rows of A are fed into the array from the left and columns of B from the top, a11 and b11 first
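The systolic data flow can be sketched as a cycle-by-cycle simulation on a mesh without wraparound links. This sketch assumes the usual skewed feeding, which the figure only suggests: row i of A enters i cycles late and column j of B enters j cycles late (0-indexed), so a[i][k] and b[k][j] meet at cell (i, j) at cycle i+j+k. `systolic_multiply` is a hypothetical name.

```python
def systolic_multiply(a, b):
    """Simulate an n x n systolic array (no wraparound).

    Hypothetical helper; assumes skewed input: row i of A enters the left
    edge i cycles late, column j of B enters the top edge j cycles late.
    Each cell multiplies its two inputs, adds the product to its c, then
    passes a to the right and b downward.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # a value currently held by each cell
    b_reg = [[0] * n for _ in range(n)]  # b value currently held by each cell
    for t in range(3 * n - 2):  # last pair meets at cell (n-1, n-1) at t = 3n-3
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:  # left edge: feed skewed row i of A (0 = bubble)
                    k = t - i
                    new_a[i][j] = a[i][k] if 0 <= k < n else 0
                else:       # otherwise take a from the left neighbour
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # top edge: feed skewed column j of B (0 = bubble)
                    k = t - j
                    new_b[i][j] = b[k][j] if 0 <= k < n else 0
                else:       # otherwise take b from the upper neighbour
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(n):
            for j in range(n):
                c[i][j] += a_reg[i][j] * b_reg[i][j]
    return c
```

Unlike the wraparound algorithm, no initial rotation is needed; the skew in the input streams plays that role, at the cost of 3n-2 cycles instead of n steps.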

How can a mesh with wraparound be simulated on a regular mesh with at most a constant-factor slowdown?
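The slide leaves this as a question. One common answer, offered here as an assumption rather than the slide's own solution, is to fold each ring: put the first half of the ring on the even positions of a linear array and bring the second half back along the odd positions, so ring neighbours end up at most two links apart. Folding rows and columns independently maps the wraparound mesh onto a regular mesh. `fold_position`, `ring_dilation` and `torus_to_mesh` are hypothetical helpers.

```python
def fold_position(k, n):
    """Linear-array position of ring node k (0 <= k < n) in the folded layout.

    Hypothetical helper: the first half of the ring goes to even positions
    left to right, the second half comes back on odd positions right to left.
    """
    half = (n + 1) // 2
    return 2 * k if k < half else 2 * (n - 1 - k) + 1


def ring_dilation(n):
    """Largest linear-array distance between two ring neighbours after folding."""
    return max(abs(fold_position(k, n) - fold_position((k + 1) % n, n))
               for k in range(n))


def torus_to_mesh(i, j, rows, cols):
    """Fold rows and columns independently: torus PE[i][j] -> mesh PE."""
    return fold_position(i, rows), fold_position(j, cols)
```

Since the dilation is at most 2 for every ring size, each wraparound-mesh link becomes a mesh path of at most two links, which is where the constant factor comes from.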

Tree Architecture
Applications: census functions, databases, queues, stacks

Tree Computation
Census function: a[1] + a[2] + ... + a[n] (more generally, any associative operation over all the data)
Applications: can you compute s[i] = a[1] + a[2] + ... + a[i] for i = 1, ..., n? This is the parallel prefix computation.
Bottleneck: every data item goes through the root.
Solution: make the channels thicker toward the top of the tree => fat tree

Example: Parallel Prefix Computation
Step 1: Upward phase
Each leaf sends its data to its parent.
When an internal node receives data from its left and right children, it keeps left, computes sum = left + right and, if it is not the root, sends sum to its parent.
When the root receives data from its left and right children {
  send 0 to its left child
  send left to its right child
}

Step 2: Downward phase
When a nonleaf node receives sum from its parent {
  send sum to its left child
  send left + sum to its right child
}
When a leaf node receives sum from its parent, prefix = sum + data
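The two phases can be simulated sequentially to check that every leaf ends with its inclusive prefix sum. This is a sketch under stated assumptions: a complete binary tree with a power-of-two number of leaves, stored in heap order (node v has children 2v and 2v+1); the layout and the name `tree_prefix_sums` are illustrative. Treating the root as if it received sum = 0 reproduces its special rule.

```python
def tree_prefix_sums(data):
    """Inclusive prefix sums using the two-phase tree algorithm.

    Hypothetical helper. Heap layout: node 1 is the root, node v has
    children 2v and 2v+1, and the n leaves are nodes n .. 2n-1.
    Assumes len(data) is a power of two.
    """
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two leaf count"
    up = [0] * (2 * n)    # value each node sends to its parent (subtree sum)
    left = [0] * (2 * n)  # value each internal node received from its left child
    for i, x in enumerate(data):
        up[n + i] = x     # leaves send their data upward
    # Upward phase: children have larger indices, so go from n-1 down to 1.
    for v in range(n - 1, 0, -1):
        left[v] = up[2 * v]
        up[v] = up[2 * v] + up[2 * v + 1]
    # Downward phase: the root behaves as if its parent sent sum = 0.
    down = [0] * (2 * n)
    for v in range(1, n):
        down[2 * v] = down[v]                # left child gets sum
        down[2 * v + 1] = left[v] + down[v]  # right child gets left + sum
    # Each leaf adds its own data to the sum it received.
    return [down[n + i] + data[i] for i in range(n)]
```

For example, `tree_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6])` returns `[3, 4, 8, 9, 14, 23, 25, 31]`. Each loop iteration corresponds to one node's work; on the tree, all nodes of one level act in parallel, so each phase takes O(log n) steps.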

Disadvantages of Trees: small bisection width; the root can be a bottleneck

Properties of Interconnection Networks
–Small diameter: $\mathrm{diam} = \max_{u,v \in V} d(u,v)$, where d(u,v) is the length of a shortest path from u to v
–Large bisection width: the smallest number of edges whose removal divides G into two halves of equal size
–Fixed node degree
–Uniformity (symmetry): the graph looks the same from whichever vertex you view it
–Incremental extendability: allows any size
–Scalability: larger graphs are easy to construct, i.e., a smaller one can be obtained from a larger one by removing some nodes and edges
  –hypercube, mesh
  –shuffle-exchange network, De Bruijn graph
–Routing and collective communication (one-to-all, all-to-all)
–Embeddability
–Simple layout (a simple layout implies a small bisection width, which conflicts with the goal of a large bisection width)
–Fault tolerance
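Diameter and bisection width can be checked by brute force on small instances. This is a minimal sketch, assuming an adjacency-dict representation and a connected graph with an even number of nodes; the exhaustive bisection search is exponential and only suitable for toy sizes such as a 4 x 4 mesh or a 4-dimensional hypercube. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def diameter(adj):
    """Max over all pairs (u, v) of the BFS hop distance d(u, v)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        best = max(best, max(dist.values()))
    return best


def bisection_width(adj):
    """Brute force (toy sizes only): fewest edges cut by any equal split."""
    nodes = sorted(adj)
    best = None
    for side in combinations(nodes, len(nodes) // 2):
        s = set(side)
        cut = sum(1 for u in s for w in adj[u] if w not in s)
        best = cut if best is None else min(best, cut)
    return best


def mesh(rows, cols):
    """Hypothetical builder: 2D mesh without wraparound."""
    adj = {(i, j): [] for i in range(rows) for j in range(cols)}
    for i, j in adj:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (i + di, j + dj) in adj:
                adj[(i, j)].append((i + di, j + dj))
    return adj


def hypercube(k):
    """Hypothetical builder: nodes differing in exactly one bit are linked."""
    return {v: [v ^ (1 << d) for d in range(k)] for v in range(1 << k)}
```

On these examples the 4 x 4 mesh has diameter 6 and bisection width 4, while the 16-node hypercube has diameter 4 and bisection width 8, which illustrates the trade-off against node degree (4 in both cases here, but the hypercube degree keeps growing with size).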

Fat Tree (figure: CM5 data link)

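Hypercube
One way of solving both the bottleneck of the tree and the large diameter of the mesh.
Recursively defined as follows: $H_n$ consists of two copies of $H_{n-1}$, with each node of one copy connected to the corresponding node of the other ($H_0$ is a single node).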

Hypercube Interconnection
Large bisection width, small radius, high fault tolerance
But the node degree is too high: it equals the dimension, log2 of the number of nodes, so it grows with the network
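The recursive definition translates directly into code. This sketch assumes nodes are labelled with n-bit integers, the second copy of $H_{n-1}$ getting bit n-1 set; `hypercube_recursive` is a hypothetical name. A consequence worth checking is that two nodes are adjacent exactly when their labels differ in one bit, so every node has degree n.

```python
def hypercube_recursive(n):
    """Build H_n as adjacency sets, following the recursive definition.

    Hypothetical helper. H_0 is a single node labelled 0. H_n takes two
    copies of H_{n-1}, labels the second copy with bit n-1 set, and links
    each node v of the first copy to its counterpart v | (1 << (n-1)).
    """
    if n == 0:
        return {0: set()}
    sub = hypercube_recursive(n - 1)
    high = 1 << (n - 1)
    adj = {}
    for v, nbrs in sub.items():
        adj[v] = set(nbrs) | {v | high}                 # copy 0 plus new link
        adj[v | high] = {w | high for w in nbrs} | {v}  # copy 1 plus new link
    return adj
```

For example, `hypercube_recursive(4)` has 16 nodes, each with 4 neighbours.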

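Mapping a Mesh onto a Hypercube
A[i,j] on the mesh -> A(gray(i)·gray(j)) on the hypercube, where gray() is the binary-reflected Gray code and · denotes concatenation of bit strings
A[i,j+1] on the mesh -> A(gray(i)·gray(j+1)) on the hypercube, which is connected to A(gray(i)·gray(j)) because gray(j) and gray(j+1) differ in exactly one bit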
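The Gray-code mapping can be written and checked in a few lines. This sketch assumes a mesh whose side lengths are powers of two and the binary-reflected Gray code `i ^ (i >> 1)`; `mesh_to_hypercube` and its `col_bits` parameter are hypothetical names.

```python
def gray(i):
    """Binary-reflected Gray code: consecutive values differ in one bit."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, col_bits):
    """Hypercube label for mesh PE A[i, j]: gray(i) concatenated with gray(j).

    Hypothetical helper; col_bits is the number of bits for the column
    index, i.e. the mesh has 2**col_bits columns.
    """
    return (gray(i) << col_bits) | gray(j)
```

Because gray(0) and gray(2^k - 1) also differ in only one bit, the same labels embed a mesh with wraparound, provided both side lengths are powers of two.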

Mapping a binary tree on a hypercube

Hypercube Data Move Example
Reversing a list (n = 2^k PEs)
Before: PE[i] has A[i]
After: PE[i] has A[n-i-1], 0 <= i <= n-1
Reverse(H_k) {
  swap A between neighbors across the highest-order bit
  Reverse the two H_{k-1} subcubes in parallel
}
Matrix transpose: A[i,j] -> A[j,i]
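Unrolling the recursion gives one swap per dimension, from the highest bit down to bit 0. Since n - i - 1 is the bitwise complement of i when n = 2^k, flipping every bit once moves A[i] to PE[n-i-1]. A minimal sketch assuming one element per PE and a power-of-two n; `hypercube_reverse` is a hypothetical name.

```python
def hypercube_reverse(a):
    """Reverse a list held one element per PE on a hypercube.

    Hypothetical helper; len(a) must be 2**k. Swap across the highest
    dimension first, then (in effect) recurse on both halves in parallel:
    one swap per dimension, from the highest bit down to bit 0.
    """
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes n = 2**k"
    vals = list(a)
    d = n >> 1
    while d >= 1:
        # Every PE i exchanges with its neighbour i ^ d simultaneously.
        vals = [vals[i ^ d] for i in range(n)]
        d >>= 1
    return vals
```

The result does not depend on the order of the dimensions; the recursive formulation simply fixes one order, and the whole reversal takes k = log2 n exchange steps.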

Shuffle Exchange Network

Mesh of Trees (figure: 2D, 16 nodes)

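Cube Connected Cycles (figure: a hypercube node of dimension 4 and the corresponding CCC cycle)
Each hypercube node is replaced by a cycle of r nodes, one per dimension, and each cycle node keeps the single hypercube edge of its dimension, so every node has degree 3.
A hypercube with N = $2^r$ nodes becomes a CCC with $r \cdot 2^r$ nodes, where r = log N.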
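The construction can be sketched as follows, assuming node (x, p) denotes position p on the cycle that replaces hypercube node x, and r >= 3 so that the two cycle neighbours are distinct. The labelling and the name `ccc` are hypothetical.

```python
def ccc(r):
    """Cube-connected cycles of dimension r (r >= 3): r * 2**r nodes.

    Hypothetical labelling: node (x, p) is position p on the cycle that
    replaces hypercube node x; it keeps only the hypercube edge in
    dimension p.
    """
    adj = {}
    for x in range(1 << r):
        for p in range(r):
            adj[(x, p)] = {
                (x, (p + 1) % r),   # next node on the same cycle
                (x, (p - 1) % r),   # previous node on the same cycle
                (x ^ (1 << p), p),  # hypercube edge in dimension p
            }
    return adj
```

For example, `ccc(3)` has 24 nodes, each of degree 3, compared with degree 3 and 8 nodes for the 3-dimensional hypercube it replaces; the difference shows up at larger r, where the CCC degree stays 3.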

Cube Connected Cycles

CCC: large bisection width, scalable, small diameter, can simulate the hypercube

Simulation of Hypercube using CCC
Communication pattern of divide-and-conquer algorithms (d = distance between communicating PEs, n = number of PEs):
–Ascend: d = 1, d = 2, d = 4, d = 8, ..., d = n/2
–Descend: d = n/2, ..., d = 4, d = 2, d = 1
Examples:
–Merging
–Sorting
–FFT
For this type of data movement, the CCC can simulate hypercube data movement without any penalty.
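The ascend and descend patterns can be expressed as generic drivers over a simulated n-PE hypercube (n a power of two), where in the round with distance d every PE i pairs with PE i XOR d. This sketch models only the hypercube pattern that the CCC is said to emulate, not the CCC itself. The two example operations, an all-reduce sum under ascend and a bitonic merge under descend (one way to do the merging example), are illustrative choices; all names are hypothetical.

```python
def ascend(vals, op):
    """Rounds with d = 1, 2, 4, ..., n/2 (n = len(vals) = 2**k).

    Hypothetical driver: in each round every PE i talks to PE i ^ d
    simultaneously; op(i, d, mine, partner) returns PE i's new value.
    """
    n = len(vals)
    d = 1
    while d < n:
        vals = [op(i, d, vals[i], vals[i ^ d]) for i in range(n)]
        d <<= 1
    return vals


def descend(vals, op):
    """Same as ascend, but with d = n/2, ..., 4, 2, 1."""
    n = len(vals)
    d = n >> 1
    while d >= 1:
        vals = [op(i, d, vals[i], vals[i ^ d]) for i in range(n)]
        d >>= 1
    return vals


def all_sum(i, d, mine, partner):
    # Both partners end the round holding the sum of their two values.
    return mine + partner


def bitonic_step(i, d, mine, partner):
    # Compare-exchange: the PE whose bit d is 0 keeps the minimum.
    return min(mine, partner) if (i & d) == 0 else max(mine, partner)
```

For instance, `ascend([1, 2, 3, 4], all_sum)` leaves 10 on every PE, and `descend` with `bitonic_step` sorts the bitonic sequence `[1, 4, 6, 8, 7, 5, 3, 2]` into ascending order. Each round uses a single dimension, which is why a structure that reaches one dimension at a time, such as a CCC cycle, can keep up.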

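De Bruijn Graph
$(x_{n-1}, x_{n-2}, \ldots, x_0) \to (x_{n-2}, \ldots, x_0, 0)$ and $(x_{n-1}, x_{n-2}, \ldots, x_0) \to (x_{n-2}, \ldots, x_0, 1)$
Highly recursive
Related to linear shift registers and combination locks
(figure: De Bruijn graphs for D = 0, D = 1, D = 2)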
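The edge rule is one step of a shift register: shift left, drop the top bit, append a new bit. A minimal sketch for binary labels, with `de_bruijn` and `route` as hypothetical names; the routing idea (shift in the destination's bits, most significant first) is an illustration, not taken from the slide. It reaches any node from any other in at most n steps, so the diameter is at most n while every node has out-degree 2.

```python
def de_bruijn(n):
    """Binary De Bruijn graph on n-bit labels (hypothetical helper).

    Node x = (x_{n-1}, ..., x_0) has edges to (x_{n-2}, ..., x_0, 0) and
    (x_{n-2}, ..., x_0, 1): shift left, drop the top bit, append a bit.
    """
    mask = (1 << n) - 1
    return {x: [((x << 1) & mask) | b for b in (0, 1)] for x in range(1 << n)}


def route(src, dst, n):
    """Directed path src -> dst: shift in dst's bits, most significant first."""
    mask = (1 << n) - 1
    path = [src]
    x = src
    for k in range(n - 1, -1, -1):
        x = ((x << 1) & mask) | ((dst >> k) & 1)
        path.append(x)
    return path
```

For example, `route(0b101, 0b010, 3)` returns `[5, 2, 5, 2]`: three shifts, each following an edge of `de_bruijn(3)`.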

Multistage Interconnection Network (MIN)
Blocking networks
–Unidirectional MIN
–Bidirectional MIN
Non-blocking networks: any input port can be connected to any free output port without affecting the existing connections.
–2D mesh crossbar
–Time-division bus
–Clos network