
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.


1 Interconnect Networks Basics

2 Generic parallel/distributed system architecture: on-chip interconnects (manycore processor); off-chip interconnects (clusters of servers)

3 Interconnection network performance. Latency: how much time passes between the moment a send of 1 byte is issued and the moment the receive of that data completes? – Signal propagation delay + router queuing delay. Bandwidth: how fast can a large amount of data (e.g. 1MB) be sent? Examples: – Ethernet: Bandwidth: 100Mbps, 1Gbps, 10Gbps, 100Gbps; Latency: 25us-100us (user level, single hop, try ping between linprog’s). – InfiniBand: Bandwidth: 20Gbps, 40Gbps, 54Gbps, 80Gbps, ...; Latency: 1-3us (user level, single hop).
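The two metrics combine into a common first-order estimate of message delivery time: fixed latency plus message size divided by bandwidth. The sketch below is a minimal illustration under that assumption. The function name `transfer_time_s` is hypothetical, the latency and bandwidth values are single picks from the ranges on the slide, and 1MB is taken as 1,000,000 bytes.

```python
def transfer_time_s(size_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Estimated delivery time: fixed latency plus serialization time (size / bandwidth)."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


# Illustrative values picked from the ranges quoted on the slide.
ETHERNET_1G = {"latency_s": 50e-6, "bandwidth_bps": 1e9}
INFINIBAND_40G = {"latency_s": 2e-6, "bandwidth_bps": 40e9}

for name, net in [("1Gbps Ethernet", ETHERNET_1G), ("40Gbps InfiniBand", INFINIBAND_40G)]:
    small = transfer_time_s(1, **net)          # dominated by latency
    large = transfer_time_s(1_000_000, **net)  # dominated by bandwidth
    print(f"{name}: 1 B in {small * 1e6:.2f} us, 1 MB in {large * 1e6:.2f} us")
```

Under this model, small messages are dominated by the latency term and large messages by the bandwidth term, which is why the slide measures each with a different message size.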

4 Interconnection network performance. Latency and bandwidth – different levels: user level (the performance that users feel), systems level, device level. Which level will have the highest bandwidth? – Example: 1Gbps Ethernet delivers 800Mbps at the system level and 650Mbps at the user level. 1Gbps Ethernet, which level? 0.115ms ping latency, which level? – A measurement trap: single pair vs. multiple pairs.

5 Network components. Network interface (card): communication between a node and the network. Link: bundle of wires or fibers that carries signals. Switch: connects a fixed number of input channels to a fixed number of output channels. In this community, switches may also perform router functions.

6 Switch. The cross-bar can realize a communication from any input port to any output port. The simplest form is a dedicated computer with memory (e.g. a Linux router).

7 Most expensive form: cross-bar. Cross-bar functionality – all permutations can be realized simultaneously. [Figure: a 4x4 cross-bar with inputs 1-4 and outputs 1-4] Permutation: (1, 2, 3, 4) -> (3, 1, 2, 4). A communication pattern where each source happens once, each destination happens once. The input registers send control signals to the control, routing, scheduling module indicating the pattern; the control module computes and sets the dots. (1, 2, 3, 4) -> (4, 3, 2, 2): only (1, 2, 3, 4) -> (4, 3, 2, -) can be realized.
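The difference between a permutation and a pattern with output contention can be sketched in a few lines. This is a hypothetical model, not the slide's control module: the function names are invented for illustration, and the tie-break (the lowest-numbered input wins a contended output) is an assumption chosen to reproduce the slide's (4, 3, 2, -) example.

```python
def is_permutation(requests):
    """True if every output port 1..N is requested exactly once."""
    return sorted(requests) == list(range(1, len(requests) + 1))


def realizable_connections(requests):
    """requests[i] is the output port wanted by input i+1.

    Each output is granted to at most one input. Assumption: on contention the
    lowest-numbered input wins; the losing input gets None and must wait.
    """
    taken = set()
    granted = []
    for out in requests:
        if out in taken:
            granted.append(None)  # output already in use this cycle
        else:
            taken.add(out)
            granted.append(out)
    return granted


print(is_permutation([3, 1, 2, 4]), realizable_connections([3, 1, 2, 4]))
print(is_permutation([4, 3, 2, 2]), realizable_connections([4, 3, 2, 2]))
```

For the second pattern, inputs 3 and 4 both want output 2, so only one of them can be connected in the same cycle.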

8 Switch example: 24-port 1Gbps Ethernet switch. 24 input ports and 24 output ports – each Ethernet jack has one input port and one output port. All 24 machines can send and receive simultaneously. [Figure: machines, each with an Ethernet card, connected to the switch]

9 Alternatives to cross-bars. A question: why buffers when we can always do permutation? An N x N cross-bar has O(N^2) cross points (on/off switches). – Not scalable, expensive. An alternative for low end switches: bus and memory – When the bus and memory are fast enough, moving data between input and output ports is like a memory copy in a typical computer.

10 Bus and memory alternative to crossbar Realizing (1, 2, 3, 4) -> (4, 3, 2, 1) – Read from input port 1 to memory A – Read from input port 2 to memory B – Read from input port 3 to memory C – Read from input port 4 to memory D – Run forwarding logic (find out the output ports) – Write A to output port 4 – Write B to output port 3 – Write C to output port 2 – Write D to output port 1
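The read, forward, write sequence above can be sketched as a toy model. Everything here is illustrative: `forward_via_memory` is a hypothetical name, a packet is assumed to be a `(destination_port, payload)` pair, and the bus-transfer counter simply records that each packet crosses the shared bus twice (once into memory, once out).

```python
def forward_via_memory(input_packets):
    """input_packets: dict mapping input port -> (destination_port, payload).

    Returns (outputs, bus_transfers), where outputs maps output port -> payload.
    """
    memory = {}
    bus_transfers = 0

    # Step 1: read each input port into its own memory buffer.
    for in_port, packet in input_packets.items():
        memory[in_port] = packet
        bus_transfers += 1

    # Step 2: run the forwarding logic, then write each buffer to its output port.
    outputs = {}
    for in_port, (dest_port, payload) in memory.items():
        outputs[dest_port] = payload
        bus_transfers += 1

    return outputs, bus_transfers


# The slide's pattern (1, 2, 3, 4) -> (4, 3, 2, 1), with payloads A-D.
packets = {1: (4, "A"), 2: (3, "B"), 3: (2, "C"), 4: (1, "D")}
outputs, transfers = forward_via_memory(packets)
print(outputs, transfers)
```

The transfers are sequential, so the shared bus and memory, not the ports, bound the aggregate throughput of this design.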

11 Bus and memory alternative to crossbar. A typical northbridge bandwidth is a few GBps. Let us assume the bandwidth is 4GBps: how many ports can the northbridge support in a 100Mbps Ethernet switch?
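One way to approach the question is a back-of-the-envelope calculation. The sketch below rests on assumptions that the slide does not state: 4GBps means 4 * 10^9 bytes per second, every port receives at full line rate, protocol overhead is ignored, and each packet crosses the bus twice (input port to memory, memory to output port). The function name and the `bus_crossings_per_packet` parameter are hypothetical.

```python
def max_ports(bus_bytes_per_s: int, port_bits_per_s: int, bus_crossings_per_packet: int = 2) -> int:
    """Upper bound on the number of ports the shared bus can keep at line rate."""
    bus_bits_per_s = bus_bytes_per_s * 8
    return bus_bits_per_s // (port_bits_per_s * bus_crossings_per_packet)


BUS = 4_000_000_000   # 4 GBps, in bytes per second
PORT = 100_000_000    # 100 Mbps, in bits per second

print(max_ports(BUS, PORT))                              # each packet crosses the bus twice
print(max_ports(BUS, PORT, bus_crossings_per_packet=1))  # optimistic: one crossing per packet
```

Under the two-crossing assumption the bound is 160 ports; counting each packet only once doubles it to 320. Either way the result is an upper bound, since real forwarding logic and memory contention consume part of the bandwidth.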

12 Another alternative: multistage interconnection network. Realize all permutations without controlling O(N^2) cross-points. – Clos networks, Benes networks. Each dot is a 2x2 switch with two states. [Figure: multistage network of 2x2 switches; labels 0 and 1] How to realize 0000->0000, 0001->0001, 0010->1011?
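The hardware saving can be made concrete by counting elements. The sketch below compares the N^2 cross points of a cross-bar with the number of 2x2 switches in a Benes network, which has 2*log2(N) - 1 stages of N/2 switches each when N is a power of two. The function names are hypothetical, and the two counts are in different units (cross points vs. 2x2 switch elements), so this is only a rough comparison.

```python
def crossbar_crosspoints(n: int) -> int:
    """An N x N cross-bar needs one on/off cross point per (input, output) pair."""
    return n * n


def benes_switch_count(n: int) -> int:
    """Number of 2x2 switches in an N-input Benes network (N must be a power of two)."""
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 2")
    stages = 2 * (n.bit_length() - 1) - 1  # 2*log2(n) - 1
    return stages * (n // 2)


for n in (4, 16, 64, 1024):
    print(n, crossbar_crosspoints(n), benes_switch_count(n))
```

For the 16-port case suggested by the slide's 4-bit addresses, that is 256 cross points against 56 two-by-two switches, and the gap widens as N grows because the Benes count grows as N log N.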

13 Switch. All switches approximate crossbars. – High end ones are equivalent to or close to crossbars: all permutations can happen simultaneously. – Low end ones have limited total bandwidth (aggregate bandwidth). Example: high end and low end 24-port 1Gbps switches connecting 24 computers. – With one source/destination pair, the throughput will be about 800Mbps for both (no difference). – When 24 pairs send/receive at the same time, the high end one will get 24*800Mbps; the low end one will get a total of X Mbps, X < 24*800Mbps (X can sometimes be about 5*800Mbps). – Different pairs may also have different throughput depending on the scheduling algorithm.

14 Network level components Topology (what) – Physical interconnection structure of the network graph. – Physically limits the performance of the networks. Routing algorithm (which) – Restricts the set of paths that messages can follow. Switching strategy (how) – How data in a message traverses a route (passing routers) Flow control mechanism (when) – When a message or portions of it traverse a route – What happens when traffic collides

15 Topology: how the components are connected. Important properties: Diameter: maximum distance between any two nodes in the network (hop count, or # of links). Nodal degree: how many links connect to each node. Bisection bandwidth: the smallest bandwidth between one half of the nodes and the other half. A good topology: small diameter, small nodal degree, large bisection bandwidth.
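Diameter and nodal degree can be computed directly from an adjacency list. The sketch below is a minimal illustration using breadth-first search from every node; the helper names and the 5-node ring used as sample input are assumptions made for the example, and every link is treated as one hop.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Largest shortest-path distance over all node pairs (connected graph assumed)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def max_degree(adj):
    """Largest number of links attached to any single node."""
    return max(len(neighbors) for neighbors in adj.values())


def ring(n):
    """Adjacency list of an n-node ring, n >= 3."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


print(diameter(ring(5)), max_degree(ring(5)))
```

Running BFS from every node costs O(N * (N + E)), which is fine for studying small topologies by hand.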

16 Topology. Regular topologies – Nodes are connected in some kind of pattern. The graph has a structure. – Nodes are identified by coordinates. – Routing can usually be pre-determined by the coordinates of the nodes. Irregular topologies – Nodes are connected arbitrarily. The graph does not have a structure, e.g. the Internet. More extensible in comparison to regular topologies. – Usually use variations of shortest path routing.

17 Example regular topology: complete binary tree Nodal degree = ? Diameter = ? Bisection bandwidth = ?

18 Example regular topology: ring topology. Nodal degree = ? Diameter = ? Bisection bandwidth = ? [Figure: ring of nodes 0, 1, 2, 3, 4]
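For small examples such as the binary tree and the ring, the bisection question can be checked by brute force. The sketch below tries every way of putting half of the nodes (rounded down) on one side and counts the links that cross the cut. It assumes every link has the same bandwidth, so the result is a link count to be multiplied by the link bandwidth. The function name and the 7-node tree numbering are choices made for this example.

```python
from itertools import combinations


def bisection_width(nodes, edges):
    """Minimum number of links crossing any split with len(nodes)//2 nodes on one side."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        side = set(group)
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


# 5-node ring: 0-1-2-3-4-0
ring_edges = [(i, (i + 1) % 5) for i in range(5)]

# 7-node complete binary tree: node i has children 2i+1 and 2i+2
tree_edges = [(i, 2 * i + 1) for i in range(3)] + [(i, 2 * i + 2) for i in range(3)]

print(bisection_width(range(5), ring_edges))
print(bisection_width(range(7), tree_edges))
```

The enumeration is exponential in the number of nodes, so it is only a checking aid for toy topologies, not a general algorithm.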

19 Routing: deciding which path to take from a source to a destination. 0 to 1: 0->1 or 0->4->3->2->1. Which path to use? This is a routing issue. Routing objectives: – Minimize resources used: shortest path routing. – The load on all links is as balanced as possible (load balancing): ??? [Figure: ring of nodes 0, 1, 2, 3, 4]

20 Classification of routing schemes. 0 to 1: 0->1 or 0->4->3->2->1. Deterministic vs. adaptive – Deterministic: always the same route. – Adaptive: choose the route depending on traffic conditions. Minimal routing: always use a shortest path. Source routing: the source node supplies the path. Destination routing: routing based on destination ID. [Figure: ring of nodes 0, 1, 2, 3, 4]
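The 0-to-1 example on the 5-node ring can be sketched to contrast deterministic minimal routing with adaptive routing. This is an illustrative model only: the function names are hypothetical, and the adaptive policy shown (pick the direction whose most heavily loaded link is lighter) is one assumed policy among many.

```python
def ring_paths(src, dst, n):
    """Both candidate paths on an n-node ring: increasing and decreasing node order."""
    up = [src]
    while up[-1] != dst:
        up.append((up[-1] + 1) % n)
    down = [src]
    while down[-1] != dst:
        down.append((down[-1] - 1) % n)
    return up, down


def deterministic_minimal(src, dst, n):
    """Always the same route: the shorter direction (ties go to increasing order)."""
    up, down = ring_paths(src, dst, n)
    return up if len(up) <= len(down) else down


def adaptive(src, dst, n, link_load):
    """link_load maps frozenset({u, v}) -> current load; unlisted links count as 0.

    Assumed policy: choose the path whose busiest link carries the least load.
    """
    def busiest(path):
        return max((link_load.get(frozenset(e), 0) for e in zip(path, path[1:])), default=0)

    up, down = ring_paths(src, dst, n)
    return up if busiest(up) <= busiest(down) else down


print(deterministic_minimal(0, 1, 5))
print(adaptive(0, 1, 5, {frozenset({0, 1}): 10}))
```

With link 0-1 congested, the adaptive policy takes the long way around (0->4->3->2->1), trading extra hops for a less loaded path, so it is adaptive but not minimal.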

21 Switching. Communication data units: – Message – Packet – Flit. Switching: how a packet passes a switch. Circuit switching: a circuit is set up, then all data passes through it. Packet switching: the whole packet is stored in a switch, and then forwarded to the next hop.
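A first-order latency model shows the cost of storing the whole packet at every switch. The sketch below makes several assumptions that are not on the slide: propagation and routing delays are ignored, all links have the same bandwidth, `hops` counts the links traversed, and the circuit setup time is a free parameter chosen arbitrarily for the example.

```python
def store_and_forward_latency_s(packet_bytes: int, bandwidth_bps: float, hops: int) -> float:
    """Each switch receives the whole packet before forwarding, so the
    serialization time is paid once per link."""
    return hops * (packet_bytes * 8) / bandwidth_bps


def circuit_switching_latency_s(packet_bytes: int, bandwidth_bps: float, setup_s: float) -> float:
    """After a one-time setup, data streams through without per-hop storage."""
    return setup_s + (packet_bytes * 8) / bandwidth_bps


# 1000-byte packet over three 1 Gbps links; 10 us setup is an assumed value.
sf = store_and_forward_latency_s(1000, 1e9, hops=3)
cs = circuit_switching_latency_s(1000, 1e9, setup_s=10e-6)
print(f"store-and-forward: {sf * 1e6:.1f} us, circuit: {cs * 1e6:.1f} us")
```

Which approach wins in this model depends on the setup cost relative to the per-hop serialization time, so circuits pay off mainly for long transfers over many hops.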

22 Flow-control Used between hops to make sure that when data is sent, there is available buffer for the data. Built into switching mechanism sometimes.
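One common way to guarantee buffer space before sending is credit-based flow control, sketched below. The slide does not name a specific scheme, so this is an assumed example: the sender holds one credit per free slot in the next hop's buffer, spends a credit on each send, and gets it back when the receiver frees the slot. The class and method names are hypothetical.

```python
from collections import deque


class CreditLink:
    """One hop with a fixed-size receive buffer, guarded by sender-side credits."""

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots    # free slots known to the sender
        self.receiver_buffer = deque()

    def try_send(self, flit) -> bool:
        """Send only if a buffer slot is guaranteed; otherwise the sender must wait."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.receiver_buffer.append(flit)
        return True

    def receiver_drain(self):
        """Receiver forwards one flit onward and returns a credit to the sender."""
        flit = self.receiver_buffer.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
results = [link.try_send(f) for f in ("f0", "f1", "f2")]  # third send is refused
drained = link.receiver_drain()                           # frees one slot
retry = link.try_send("f2")                               # now succeeds
print(results, drained, retry)
```

Because the sender never transmits without a credit, the receive buffer cannot overflow and no data is dropped between hops.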

