Interconnection Network
PRAM model is too simple.
Physically, PEs communicate through the network (either buses or switching networks).
Cost depends on the network topology.
Questions:
–Should the user exploit the interconnection network topology?
–Does the user have the freedom to exploit the topology?

Mesh with Wraparound

Example: multiplying two n by n matrices on a mesh
Initially, PE[i,j] has x[i,j] = a[i,j] and y[i,j] = b[i,j].
Row i shifts its x data left i-1 times; column j shifts its y data up j-1 times.
At each of the n steps, PE[i,j] does {
  c[i,j] = c[i,j] + x[i,j]*y[i,j]
  send x to the left neighbor (with wraparound); send y to the upper neighbor (with wraparound)
}
How about transitive closure?
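The mesh algorithm above can be sketched as a sequential simulation. This is a minimal sketch, not the slides' own code: the function name cannon_multiply is hypothetical, indices are 0-based (so row i shifts i times rather than i-1), and the whole mesh is held in ordinary Python lists.

```python
def cannon_multiply(a, b):
    """Simulate the wraparound-mesh multiplication; a and b are n x n lists of lists."""
    n = len(a)
    # Initial skew (0-based): row i of x rotates left i times,
    # column j of y rotates up j times.
    x = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    y = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE multiplies its local pair and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += x[i][j] * y[i][j]
        # x moves one step left and y one step up, both with wraparound.
        x = [[x[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        y = [[y[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

After the skew, PE[i,j] holds a[i,k] and b[k,j] for the same k = (i+j) mod n, and each shift advances k by one, so n steps cover every term of c[i,j].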

Matrix Multiplication
Step 1
a11 a12 a13 a14    b11 b12 b13 b14
a21 a22 a23 a24    b21 b22 b23 b24
a31 a32 a33 a34    b31 b32 b33 b34
a41 a42 a43 a44    b41 b42 b43 b44

Step 2: Rearrange Data
a11 a12 a13 a14    b11 b22 b33 b44
a22 a23 a24 a21    b21 b32 b43 b14
a33 a34 a31 a32    b31 b42 b13 b24
a44 a41 a42 a43    b41 b12 b23 b34

Step 3: Multiply, Add, and Move Data
a11 a12 a13 a14    b11 b22 b33 b44
a22 a23 a24 a21    b21 b32 b43 b14
a33 a34 a31 a32    b31 b42 b13 b24
a44 a41 a42 a43    b41 b12 b23 b34
[Figure: data move at cell ik, with inputs aij and bjk]
c21 = a22b21 + a23b31 + a24b41 + a21b11

Systolic Array Algorithm
[Figure: data streams entering the systolic array]
A rows: a14 a13 a12 a11 | a24 a23 a22 a21 | a34 a33 a32 a31 | a44 a43 a42 a41
B columns: b41 b31 b21 b11 | b42 b32 b22 b12 | b43 b33 b23 b13 | b44 b34 b24 b14

How can a wraparound mesh be simulated on a regular mesh without losing more than a constant factor in speed?

Tree Architecture
Applications: census functions, database, queue, stack

Tree Computation
Census function: a[1] + ... + a[n]
Application: can you compute s[i] = a[1] + a[2] + ... + a[i] for i = 1, ..., n? This is the parallel prefix computation.
Bottleneck: all data goes through the root.
How to solve: make the channels thicker toward the top of the tree => fat tree

Example: Parallel Prefix Computation
Step 1: Upward phase
For each node, when it receives data from its left and right children:
  sum = left + right
  if the node is not the root, send sum to its parent
When the root receives data from its left and right children {
  send 0 to its left child
  send left to its right child
}
Leaf data: 1 2 3 4 5 6 7 8

Step 2: Downward phase
When a nonleaf node receives sum from its parent {
  send sum to its left child
  send left + sum to its right child
}
When a leaf node receives sum from its parent: prefix = sum + data
Leaf data: 1 2 3 4 5 6 7 8
[Figure: tree annotated with the subtree totals (36 at the root) and the values passed downward]
Prefix sums at the leaves: 1 3 6 10 15 21 28 36
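The two phases can be traced with a small sequential simulation. This is a sketch under stated assumptions: the name tree_prefix and the heap-style array layout (node k has children 2k and 2k+1) are illustrative choices, not part of the slides, and the number of leaves is assumed to be a power of two.

```python
def tree_prefix(data):
    """Inclusive prefix sums via the upward and downward tree phases.

    Assumes len(data) is a power of two (one value per leaf).
    """
    n = len(data)
    # Heap layout: node k has children 2k and 2k+1; leaves are n .. 2n-1.
    total = [0] * (2 * n)
    total[n:] = data
    # Upward phase: each internal node stores left + right.
    for k in range(n - 1, 0, -1):
        total[k] = total[2 * k] + total[2 * k + 1]
    # Downward phase: recv[k] is the value node k receives from its parent.
    recv = [0] * (2 * n)
    recv[1] = 0  # the root acts as if it had received 0
    for k in range(1, n):
        recv[2 * k] = recv[k]                     # send sum to the left child
        recv[2 * k + 1] = recv[k] + total[2 * k]  # send left + sum to the right child
    # Leaves: prefix = received sum + own data.
    return [recv[n + i] + data[i] for i in range(n)]
```

With the leaf data 1 through 8 from the slide, the root's children receive 0 and 10, and the leaves end with the prefix sums listed above.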

Disadvantages of Trees:
–Small bisection width
–Root can be the bottleneck

Properties of Interconnection Networks
–Small diameter: diam = max over u, v in V of d(u,v), where d(u,v) is the shortest-path distance
–Large bisection width: the smallest number of edges whose removal divides G into two equal-sized parts
–Fixed node degree
–Uniformity (symmetry): the graph looks the same from every vertex
–Incremental extendability: allows any size
–Scalability (graph): a larger one can be constructed easily, i.e., a smaller one can be obtained from the larger one by removing some nodes and edges
–hypercube, mesh
–shuffle-exchange network, de Bruijn graph
–Routing and collective communication: one to all, all to all
–Embeddability
–Simple layout complexity (=> small bisection width, which conflicts with the large-bisection-width goal)
–Fault tolerance
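To make the diameter definition concrete, the sketch below computes it by breadth-first search from every node and compares a 4 x 4 mesh without wraparound against a 16-node hypercube. The helper names diameter, hypercube_neighbors, and mesh_neighbors are hypothetical, introduced only for this illustration.

```python
from collections import deque

def diameter(nodes, neighbors):
    """Largest shortest-path distance over all node pairs (BFS from every node)."""
    best = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

def hypercube_neighbors(k):
    # Nodes are integers 0 .. 2**k - 1; neighbors differ in exactly one bit.
    return lambda u: [u ^ (1 << b) for b in range(k)]

def mesh_neighbors(n):
    # n x n mesh without wraparound; nodes are (row, col) tuples.
    def nbrs(u):
        i, j = u
        cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return [(r, c) for r, c in cand if 0 <= r < n and 0 <= c < n]
    return nbrs

mesh_nodes = [(i, j) for i in range(4) for j in range(4)]
mesh_diam = diameter(mesh_nodes, mesh_neighbors(4))       # corner to corner
cube_diam = diameter(range(16), hypercube_neighbors(4))   # all four bits differ
```

Both networks have 16 nodes, but the mesh pays for its fixed degree of at most 4 with the larger diameter.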

Fat Tree CM5 Data Link

Hypercube
One way of solving the bottleneck of the tree and the large diameter of the mesh.
Recursively defined as follows: H_n consists of two copies of H_(n-1), with corresponding nodes connected.
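The recursive definition can be sketched by building the edge set of H_k from two copies of H_(k-1). The function name hypercube_edges and the integer node labels (the top bit selects the copy) are assumptions made for this illustration.

```python
def hypercube_edges(k):
    """Edge set of H_k built recursively from two copies of H_(k-1)."""
    if k == 0:
        return set()  # H_0 is a single node with no edges
    sub = hypercube_edges(k - 1)
    half = 1 << (k - 1)
    edges = set(sub)                                  # copy with top bit 0
    edges |= {(u + half, v + half) for u, v in sub}   # copy with top bit 1
    edges |= {(u, u + half) for u in range(half)}     # join corresponding nodes
    return edges
```

Each level doubles the edges of the smaller cube and adds one edge per node pair, which gives k * 2^(k-1) edges in total.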

Hypercube Interconnection
–Large bisection width
–Small radius
–High fault tolerance
–But node degree is too high

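Mapping a Mesh onto a Hypercube
A[i,j] on the mesh -> A(gray(i)·gray(j)) on the hypercube, where · denotes concatenation of the two Gray codes
A[i,j+1] on the mesh -> A(gray(i)·gray(j+1)) on the hypercube, which is connected to A(gray(i)·gray(j)) because gray(j) and gray(j+1) differ in exactly one bit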
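A short check of the Gray-code mapping follows. It assumes the binary-reflected Gray code and a mesh whose side is a power of two; mesh_to_hypercube and its bits parameter are hypothetical names for this sketch.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)

def mesh_to_hypercube(i, j, bits):
    """Hypercube address of mesh cell (i, j): gray(i) concatenated with gray(j).

    bits is the number of address bits per mesh dimension (mesh side = 2**bits).
    """
    return (gray(i) << bits) | gray(j)

def hamming(u, v):
    """Number of bit positions in which u and v differ."""
    return bin(u ^ v).count("1")
```

For example, on a 4 x 4 mesh (bits = 2), cells (1,1) and (1,2) map to addresses 0101 and 0111, which differ in one bit and are therefore hypercube neighbors. Because this Gray code is cyclic, the wraparound links of the mesh are preserved as well.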

Mapping a binary tree on a hypercube

Hypercube Data Move Example
Reversing a list
Before: PE[i] has A[i]
After: PE[i] has A[n-i-1], 0 <= i <= n-1
Reverse(H_k) {
  swap A across the highest-bit dimension
  reverse the two H_(k-1) in parallel
}
Matrix transpose: A[i,j] -> A[j,i]
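The recursive reversal unrolls into one exchange per hypercube dimension, highest bit first. The sketch below simulates that sequentially; hypercube_reverse is a hypothetical name, and the list length is assumed to be a power of two with one element per PE.

```python
def hypercube_reverse(a):
    """Reverse a list held one element per PE; len(a) must be a power of two."""
    a = list(a)
    n = len(a)
    d = n >> 1
    while d >= 1:
        # Every PE i swaps with its neighbor i XOR d across one dimension.
        a = [a[i ^ d] for i in range(n)]
        d >>= 1
    return a
```

After all dimensions have been used, PE[i] holds the element that started at i XOR (n-1), which equals n-i-1, so log n exchange steps suffice.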

Shuffle Exchange Network
[Figure: 8-node network with nodes 000 001 010 011 100 101 110 111]

Mesh of Trees 2D with 16 nodes

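Cube Connected Cycles
Hypercube node with dimension 4 -> CCC node
[Figure: a hypercube node with edges e1 e2 e3 e4 e5 and the corresponding CCC node carrying the same edges]
Hypercube: 2^n nodes
CCC: r·2^n nodes, r = log n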

Cube Connected Cycles
[Figure: CCC with node labels from 0000 to 1111]

CCC
–Large bisection width
–Scalable
–Small diameter
–Can simulate hypercube

Simulation of Hypercube using CCC
Divide-and-conquer algorithm communication pattern:
–Ascend: d=1, d=2, d=4, d=8, ..., d=n/2
–Descend: d=n/2, ..., d=4, d=2, d=1
Examples:
–Merging
–Sorting
–FFT
For this type of data movement, CCC can simulate the hypercube data move without any penalty.
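As an illustration of the ascend pattern, the sketch below performs a global sum in which every PE exchanges with its partner at distance d = 1, 2, 4, ..., n/2. It simulates the hypercube pattern only, not the CCC mapping; ascend_sum is a hypothetical name and n is assumed to be a power of two.

```python
def ascend_sum(values):
    """Every PE ends with the total; len(values) must be a power of two."""
    v = list(values)
    n = len(v)
    d = 1
    while d < n:
        # Each PE i combines its value with that of its partner i XOR d.
        v = [v[i] + v[i ^ d] for i in range(n)]
        d *= 2
    return v
```

Each step uses a single hypercube dimension, and the dimensions are visited in order, which is the property the CCC simulation relies on.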

De Bruijn Graph
(x_(n-1), x_(n-2), ..., x_0) -> (x_(n-2), ..., x_0, 0) and -> (x_(n-2), ..., x_0, 1)
–Highly recursive
–Linear shift register
–Lock combination
[Figure: graphs for D=0, D=1, D=2]
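The edge rule above is a left shift that drops the top bit and appends 0 or 1. A minimal sketch, assuming nodes are stored as n-bit integers; de_bruijn_successors is a hypothetical name.

```python
def de_bruijn_successors(x, n):
    """Two successors of the n-bit node x: shift left, drop the top bit, append 0 or 1."""
    mask = (1 << n) - 1
    shifted = (x << 1) & mask
    return shifted, shifted | 1
```

For n = 3, node 101 has successors 010 and 011. Every node has out-degree 2 and in-degree 2, so the node degree stays fixed as the graph grows.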

Multistage Interconnection Network
Blocking networks
–Unidirectional MIN
–Bidirectional MIN
Nonblocking networks: any input port can be connected to any free output port without affecting the existing connections.
–2D mesh crossbar
–Time-division bus
–Clos network
