1 Multiprocessors and the Interconnect

2 Scope
- Taxonomy
- Metrics
- Topologies
- Characteristics
  - cost
  - performance

3 Interconnection
- Carry data between processors and to memory
- Interconnect components
  - switches
  - links (wires, fiber)
- Interconnection network flavors
  - static networks: point-to-point communication links
    - also known as direct networks
  - dynamic networks: switches and communication links
    - also known as indirect networks

4 Static vs. Dynamic

5 Dynamic Networks
- Switch: maps a fixed number of inputs to outputs
- Number of ports on a switch = degree of the switch
- Switch cost
  - grows as the square of switch degree
  - peripheral hardware grows linearly with switch degree
  - packaging cost grows linearly with the number of pins
- Key property: blocking vs. non-blocking
  - blocking
    - path from p to q may conflict with path from r to s, for independent p, q, r, s
  - non-blocking
    - disjoint paths between each pair of independent sources and sinks

6 Network Interface
- Processor nodes link to the interconnect
- Network interface responsibilities
  - packetizing communication data
  - computing routing information
  - buffering incoming/outgoing data
- Network interface connection
  - I/O bus: PCI or PCIx on many modern systems
  - memory bus: e.g. AMD HyperTransport, Intel QuickPath
    - higher bandwidth and tighter coupling than I/O bus
- Network performance
  - depends on relative speeds of I/O and memory buses

7 Topologies
- Many network topologies
- Tradeoff: performance vs. cost
- Machines often implement hybrids of multiple topologies, driven by
  - packaging
  - cost
  - available components

8 Metrics
- Degree
  - number of links per node
- Diameter
  - longest distance between two nodes in the network
- Bisection width
  - minimum number of wire cuts needed to divide the network into two halves
- Cost
  - number of links or switches
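The degree and diameter metrics above can be checked by brute force on small networks. The following is a minimal sketch, assuming a network is given as an adjacency list; the helper names (`ring`, `linear_array`, `degree`, `diameter`) are illustrative and not part of the original slides.

```python
from collections import deque


def linear_array(p):
    """Adjacency list of a p-node linear array (no wraparound link)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}


def ring(p):
    """Adjacency list of a p-node ring (linear array plus a wraparound link)."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}


def degree(adj):
    """Largest number of distinct neighbors of any node."""
    return max(len(set(nbrs)) for nbrs in adj.values())


def diameter(adj):
    """Longest shortest-path distance between any two nodes (BFS from every node)."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    return max(eccentricity(s) for s in adj)


if __name__ == "__main__":
    for name, net in (("linear array", linear_array(8)), ("ring", ring(8))):
        print(name, "degree:", degree(net), "diameter:", diameter(net))
```

Adding the wraparound link leaves the degree at 2 but roughly halves the diameter, which is the kind of cost/performance tradeoff the metrics are meant to expose.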

9 Topologies: Bus
- All processors access a common bus for exchanging data
- Used in the simplest and earliest parallel machines
- Advantages
  - distance between any two nodes is O(1)
  - provides a convenient broadcast medium
- Disadvantages
  - bus bandwidth is a performance bottleneck

10 Bus Systems
- A bus system is a hierarchy of buses connecting various system and subsystem components
  - has a complement of control, signal, and power lines
- A variety of buses in a system:
  - Local bus: (usually integral to a system board) connects various major system components (chips)
  - Memory bus: used within a memory board to connect the interface, the controller, and the memory cells
  - Data bus: might be used on an I/O board or VLSI chip to connect various components
  - Backplane: like a local bus, but with connectors to which other boards can be attached

11 Bridges
- The term bridge denotes a device that connects two (or possibly more) buses
- The interconnected buses may use the same standards, or they may be different (e.g. PCI in a modern PC)
- Bridge functions include
  - communication protocol conversion
  - interrupt handling
  - serving as cache and memory agents

12 Bus
- Since much of the data accessed by processors is local to the processor, cache is critical for the performance of bus-based machines

13 Bus Replacement: Direct Connect
- Intel QuickPath Interconnect ( present)

14 Direct Connect: 4 Node Configurations
Figure credit: The Opteron CMP NorthBridge Architecture, Now and in the Future, AMD, Pat Conway, Bill Hughes, HOT CHIPS
- N SQ: XFIRE BW 14.9 GB/s, Diam 2, Avg 1
- 4N FC: XFIRE BW 29.9 GB/s, Diam 1, Avg 0.75

15 Direct Connect: 8 Node Configurations

16 Crossbar Network
- A crossbar network uses a p×m grid of switches to connect p inputs to m outputs in a non-blocking manner
- Figure: a non-blocking crossbar network connecting p processors to b memory banks
- Cost of a crossbar: O(p^2)
  - generally difficult to scale for large values of p
  - Earth Simulator: custom 640-way single-stage crossbar

17 Assessing Network Alternatives
- Buses
  - excellent cost scalability
  - poor performance scalability
- Crossbars
  - excellent performance scalability
  - poor cost scalability
- Multistage interconnects
  - compromise between these extremes

18 Multistage Network

19 Multistage Omega Network
- Organization
  - log p stages
  - p inputs/outputs
- At each stage, input i is connected to output j if:
  - j = 2i, for 0 <= i <= p/2 - 1
  - j = 2i + 1 - p, for p/2 <= i <= p - 1

20 Omega Network Stage
- Each Omega stage is connected in a perfect shuffle
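A perfect shuffle is commonly defined as sending input i to output 2i when i < p/2 and to 2i + 1 - p otherwise, which is the same as rotating the log p-bit binary label of i left by one position. The sketch below assumes p is a power of two and checks that the two forms agree; the function names are illustrative.

```python
def perfect_shuffle(i, p):
    """Output index for input i in a p-way perfect shuffle (p a power of two)."""
    if i < p // 2:
        return 2 * i
    return 2 * i + 1 - p


def rotate_left(i, p):
    """Same mapping, written as a one-bit left rotation of the log2(p)-bit label."""
    bits = p.bit_length() - 1          # log2(p) for a power of two
    return ((i << 1) | (i >> (bits - 1))) & (p - 1)


if __name__ == "__main__":
    p = 8
    for i in range(p):
        j = perfect_shuffle(i, p)
        print(f"{i:03b} -> {j:03b}")
```

Viewing the shuffle as a bit rotation is what makes the bit-by-bit routing rule on the later routing slide work: each stage brings the next bit of the label into the position that the switch controls.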

21 Omega Network Switches
- 2×2 switches connect perfect shuffles
- Each switch operates in one of two modes: pass-through or crossover

22 Multistage Omega Network
- Cost: p/2 × log p switching nodes, i.e. O(p log p)
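To make the gap between the O(p^2) crossbar and the O(p log p) Omega network concrete, the following sketch tabulates switch counts for a few sizes. It assumes p is a power of two and a square p×p crossbar; the function names are illustrative.

```python
def crossbar_switches(p):
    """Switch count of a p x p crossbar: one switch per grid point."""
    return p * p


def omega_switches(p):
    """Switch count of a p-input Omega network: p/2 two-by-two switches per stage, log2(p) stages."""
    stages = p.bit_length() - 1        # log2(p) for a power of two
    return (p // 2) * stages


if __name__ == "__main__":
    print(f"{'p':>6} {'crossbar':>10} {'omega':>8}")
    for p in (8, 64, 1024):
        print(f"{p:>6} {crossbar_switches(p):>10} {omega_switches(p):>8}")
```

The lower switch count comes at the price of blocking: unlike the crossbar, an Omega network cannot route every permutation without conflicts.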

23 Omega Network Routing
- Let
  - s = binary representation of the source processor
  - d = binary representation of the destination processor or memory
- The data traverses the link to the first switching node
  - if the most significant bits of s and d are the same
    - the switch routes the data in pass-through mode
  - else
    - the switch uses the crossover path
- Strip off the leftmost bit of s and d
- Repeat for each of the log p switching stages
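The routing rule above can be simulated directly. This is a minimal sketch, assuming p is a power of two, that each stage is a perfect shuffle followed by a column of 2×2 switches, and that crossover flips the lowest bit of the current line number; `omega_route` is a hypothetical helper name.

```python
def omega_route(s, d, p):
    """Trace a message from source s to destination d through a p-input Omega network.

    Returns one (stage, mode, line) tuple per stage, where line is the
    output line the message occupies after that stage's switch.
    """
    n = p.bit_length() - 1                 # number of stages, log2(p)
    pos = s
    path = []
    for stage in range(n):
        bit = n - 1 - stage                # compare bits from most to least significant
        # Perfect shuffle: rotate the n-bit line number left by one.
        pos = ((pos << 1) | (pos >> (n - 1))) & (p - 1)
        if ((s >> bit) & 1) == ((d >> bit) & 1):
            mode = "pass-through"
        else:
            mode = "crossover"
            pos ^= 1                       # crossover moves to the switch's other output
        path.append((stage, mode, pos))
    return path


if __name__ == "__main__":
    for stage, mode, line in omega_route(0b010, 0b111, 8):
        print(f"stage {stage}: {mode:12s} -> line {line:03b}")
```

After log p stages every source bit has been replaced by the matching destination bit, so the message arrives on line d. Two messages block each other when they need the same output line of the same stage.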

24 Omega Network Routing

25 Blocking in an Omega Network

26 Clos Network (non-blocking)

27 Star Connected Network
- Static counterpart of buses
- Every node is connected only to a common node at the center
- Distance between any pair of nodes is O(1)

28 Completely Connected Network
- Each processor is connected to every other processor
  - static counterpart of crossbars
  - number of links in the network scales as O(p^2)

29 Linear Array
- Each node has two neighbors: left and right
- If the nodes at the two ends are also connected: 1D torus (ring)

30 Meshes and k-d Meshes
- Mesh: generalization of linear array to 2D
  - nodes have 4 neighbors: north, south, east, and west
- k-d mesh
  - d-dimensional mesh
  - nodes have 2d neighbors
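The 2d-neighbor count can be seen by enumerating neighbors one axis at a time. The sketch below assumes nodes are identified by coordinate tuples; `mesh_neighbors` and its `wrap` flag (for torus-style wraparound links) are illustrative names, not from the slides.

```python
def mesh_neighbors(coord, shape, wrap=False):
    """Neighbors of coord in a d-dimensional mesh with the given per-axis sizes.

    With wrap=False, nodes on the boundary have fewer than 2d neighbors.
    With wrap=True, coordinates wrap around as in a torus.
    """
    nbrs = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            c = coord[axis] + step
            if wrap:
                c %= size
            elif not 0 <= c < size:
                continue                   # stepped off the edge of the mesh
            nbrs.append(coord[:axis] + (c,) + coord[axis + 1:])
    return nbrs


if __name__ == "__main__":
    print(mesh_neighbors((1, 1), (3, 3)))            # interior node of a 2D mesh
    print(mesh_neighbors((0, 0), (3, 3)))            # corner node, no wraparound
    print(len(mesh_neighbors((1, 1, 1), (3, 3, 3)))) # interior node of a 3D mesh
```

Strictly, only interior nodes of a mesh without wraparound have the full 2d neighbors; with wraparound every node does, provided each axis has at least three nodes.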

31 Hypercubes
- Special d-dimensional mesh: p nodes, d = log p

32 Hypercube Properties
- Distance between any two nodes is at most log p
- Each node has log p neighbors
- Distance between two nodes = number of bit positions that differ between node numbers
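These properties follow from labeling the p = 2^d nodes with d-bit numbers and linking nodes whose labels differ in exactly one bit. A minimal sketch under that labeling assumption, with illustrative function names:

```python
def hypercube_neighbors(node, d):
    """Neighbors of a node in a d-dimensional hypercube: flip each of the d bits in turn."""
    return [node ^ (1 << k) for k in range(d)]


def hypercube_distance(a, b):
    """Hop count between nodes a and b: the number of bit positions in which they differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    d = 3
    print([f"{n:03b}" for n in hypercube_neighbors(0b000, d)])
    print(hypercube_distance(0b000, 0b111))
    print(hypercube_distance(0b101, 0b100))
```

Since two d-bit labels can differ in at most d positions, the maximum distance is d = log p, which matches the diameter bound on the slide.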

33 Trees

34 Tree Properties
- Distance between any two nodes is no more than 2 log p
- Trees can be laid out in 2D with no wire crossings
- Problem
  - links closer to the root carry more traffic than those at lower levels
- Solution: fat tree
  - widen links as depth gets shallower
  - copes with higher traffic on links near the root
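The 2 log p bound can be checked on a complete binary tree. The sketch below assumes heap-style numbering (root is 1, children of node k are 2k and 2k+1) with the p processors at the leaves p..2p-1; both the numbering and the name `tree_distance` are assumptions made for illustration.

```python
def tree_distance(a, b):
    """Number of links on the path between nodes a and b in a heap-numbered binary tree."""
    hops = 0
    while a != b:
        # The larger index is never shallower than the other, so move it toward the root.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops


if __name__ == "__main__":
    p = 8                                   # leaves are nodes 8..15
    leaves = range(p, 2 * p)
    print(tree_distance(8, 9))              # siblings: up one link, down one link
    print(max(tree_distance(a, b) for a in leaves for b in leaves))
```

The worst case is two leaves in different halves of the tree: the path climbs log p links to the root and descends log p links, and every such path crosses the links near the root, which is the congestion the fat tree addresses.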

35 Fat Tree Network
- Figure: fat tree network for 16 processing nodes
- Can judiciously choose fatness of links
  - take full advantage of technology and packaging constraints

36 Metrics for Interconnection Networks

37 Metrics for Dynamic Interconnection Networks
