Computer Science and Engineering Advanced Computer Architecture CSE 8383 April 24, 2008 Session 12.


1 Computer Science and Engineering Advanced Computer Architecture CSE 8383 April 24, 2008 Session 12

2 Computer Science and Engineering Contents Message Passing Systems (Chapters 5 & 7) Communication Patterns Network Computing Client/Server System Clusters Grid Interconnection Networks

3 Computer Science and Engineering Message Passing Mechanisms Message Format Message → arbitrary number of fixed-length packets. Packet → basic unit containing the destination address; a sequence number is needed. A packet can be further divided into flits (flow control digits). Routing and sequence information occupy the header flit.
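As a rough sketch of the message format above, the following C helpers count how many fixed-length packets a message needs and how many flits each packet occupies. The function and parameter names are hypothetical, and the layout of exactly one header flit per packet followed by data flits is an assumption of this sketch rather than something the slide specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling division for sizes; the divisor must be positive. */
static size_t ceil_div(size_t a, size_t b)
{
    assert(b > 0);
    return (a + b - 1) / b;
}

/* Number of fixed-length packets needed to carry msg_bits of data,
 * assuming each packet carries payload_bits of message data. */
size_t packets_per_message(size_t msg_bits, size_t payload_bits)
{
    return ceil_div(msg_bits, payload_bits);
}

/* Flits per packet: one header flit (routing and sequence information,
 * an assumption of this sketch) plus enough data flits for the payload. */
size_t flits_per_packet(size_t payload_bits, size_t flit_bits)
{
    return 1 + ceil_div(payload_bits, flit_bits);
}
```

For example, a 1000-bit message with 256-bit packet payloads needs 4 packets, and with 32-bit flits each packet occupies 9 flits (1 header + 8 data).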

4 Computer Science and Engineering Message, Packets, Flits [Figure labels: Message, Packet, Data flit, Destination, Sequence]

5 Computer Science and Engineering Store and Forward Routing Packets are the basic units of information flow. Each node uses a packet buffer. A packet is transferred from S to D through a sequence of intermediate nodes. Channel and buffer must be available.

6 Computer Science and Engineering Wormhole Routing Flits are the basic units of information flow. Each node uses a flit buffer. Flits are transferred from S to D through a sequence of intermediate routers, in order (pipelined). This can be visualized as a railroad train. Flits from different packets cannot be mixed up.

7 Computer Science and Engineering Latency Analysis L → packet length (in bits); W → channel bandwidth (bits/sec); D → distance (number of hops); F → flit length (in bits)

8 Computer Science and Engineering Store and Forward Latency D

9 Computer Science and Engineering WH Latency D

10 Computer Science and Engineering Latency Analysis L → packet length (in bits); W → channel bandwidth (bits/sec); D → distance (number of hops); F → flit length (in bits). T_SF = D * L/W. T_WH = L/W + D * F/W ≈ L/W if L >> F (independent of D).
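The two latency formulas can be compared numerically with a minimal C sketch. The function names are hypothetical; the arguments follow the slide's symbols (L, W, D, F).

```c
#include <assert.h>

/* Store-and-forward: the whole packet is retransmitted on every hop.
 * T_SF = D * L / W */
double sf_latency(double L, double W, int D)
{
    assert(W > 0.0 && D >= 0);
    return D * (L / W);
}

/* Wormhole: the packet is pipelined flit by flit across the hops.
 * T_WH = L / W + D * F / W */
double wh_latency(double L, double W, int D, double F)
{
    assert(W > 0.0 && D >= 0);
    return L / W + D * (F / W);
}
```

With L = 1024 bits, W = 1024 bits/sec, F = 16 bits and D = 4 hops, store-and-forward takes 4 seconds while wormhole takes 1.0625 seconds; the wormhole figure barely changes as D grows because L >> F.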

11 Computer Science and Engineering Communication Patterns Point to Point → 1 - 1; Multicast → 1 - n; Broadcast → 1 - all; Conference → n - n

12 Computer Science and Engineering Routing: potential problems Deadlock: when two messages each hold the resources the other needs in order to move, both messages are blocked (a cyclic dependency for resources). A straightforward but inefficient solution is rerouting. Another solution is to avoid deadlock altogether by imposing a strict monotonic order on network resources. The channel dependency graph (CDG) is a technique for developing a deadlock-free routing algorithm.
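One way to use a channel dependency graph is to check it for cycles: if the graph is acyclic, no cyclic wait for channels can arise. The sketch below is a plain depth-first search in C. The adjacency-matrix representation, the MAX_CH bound, and the function names are assumptions made for illustration, not part of the slides.

```c
#include <assert.h>

#define MAX_CH 16  /* upper bound on channels in this sketch */

/* DFS colors: 0 = unvisited, 1 = on the current DFS path, 2 = finished. */
static int dfs_finds_cycle(int u, int n, int dep[MAX_CH][MAX_CH], int color[])
{
    color[u] = 1;
    for (int v = 0; v < n; v++) {
        if (!dep[u][v])
            continue;
        if (color[v] == 1)
            return 1;   /* back edge: cyclic dependency */
        if (color[v] == 0 && dfs_finds_cycle(v, n, dep, color))
            return 1;
    }
    color[u] = 2;
    return 0;
}

/* dep[i][j] != 0 means a message holding channel i may request channel j.
 * Returns 1 if the channel dependency graph contains a cycle, else 0. */
int cdg_has_cycle(int n, int dep[MAX_CH][MAX_CH])
{
    int color[MAX_CH] = {0};
    assert(n >= 0 && n <= MAX_CH);
    for (int u = 0; u < n; u++)
        if (color[u] == 0 && dfs_finds_cycle(u, n, dep, color))
            return 1;
    return 0;
}
```

For a hypothetical ring of four channels where each channel depends on the next, the function reports a cycle; removing any one dependency, which is what a restricted routing function does, makes the graph acyclic.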

13 Computer Science and Engineering [Figure: A 4-node network and its CDGs. (a) A 4-node network with channels c1 to c8; (b) channel dependency graph (CDG); (c) CDG for a deadlock-free version of the network]

14 Computer Science and Engineering Livelock: a message goes around the network and never reaches its destination. It results from using adaptive routing algorithms with dynamic injection, where nodes inject their messages into the network at arbitrary times. Policies to avoid livelock are based on assigning a priority to each message injected into the network: messages are routed according to their priorities, and once a message is injected, only a finite number of messages will be injected with higher or equal priority.

15 Computer Science and Engineering Starvation: A node suffers from starvation if it has a message to inject into the network but is never allowed to do so. The simplest policy to avoid starvation is to allow each node to have an injection queue that competes with the queues of the incoming links to the same node. The main disadvantage is that a node with a high message injection rate can slow down all the other nodes in the network.

16 Computer Science and Engineering Routing Efficiency Two parameters: channel traffic (number of channels used to deliver the message involved) and communication latency (distance).

17 Computer Science and Engineering Multicast on a mesh (5 unicasts) Traffic ? Latency ?
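When a multicast is carried out as separate unicasts, the two efficiency parameters can be estimated as the sum of the path lengths (traffic) and the longest path (latency). The C sketch below assumes a 2D mesh with minimal dimension-order (XY) paths; the struct, the function names, and any coordinates passed in are hypothetical and are not taken from the slide's figure.

```c
#include <assert.h>
#include <stdlib.h>

struct node { int x, y; };

/* Hops between two mesh nodes on a minimal (XY) path. */
int mesh_hops(struct node a, struct node b)
{
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Multicast realized as n separate unicasts from src.
 * traffic: total number of channels used (sum of path lengths)
 * latency: distance to the farthest destination (longest path) */
void unicast_multicast_cost(struct node src, const struct node dst[], int n,
                            int *traffic, int *latency)
{
    assert(n >= 0 && traffic && latency);
    *traffic = 0;
    *latency = 0;
    for (int i = 0; i < n; i++) {
        int h = mesh_hops(src, dst[i]);
        *traffic += h;
        if (h > *latency)
            *latency = h;
    }
}
```

For a source at (0,0) and destinations (1,0), (2,2) and (0,3), the paths are 1, 4 and 3 hops long, so the traffic is 8 channels and the latency is 4 hops. A tree-shaped multicast that shares channels among destinations would lower the traffic figure.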

18 Computer Science and Engineering Multicast on a mesh (multicast pattern 1) Traffic ? Latency ?

19 Computer Science and Engineering Multicast on a mesh (multicast pattern 2) Traffic ? Latency ?

20 Computer Science and Engineering Broadcast (tree structure) [Figure: broadcast tree with numbered nodes]

21 Computer Science and Engineering Message Passing in PVM (Revisit) [Figure: a sending task and a receiving task, each with user application, library, and daemon layers; points 1 to 4 are marked on the sending task and points 5 to 8 on the receiving task]

22 Computer Science and Engineering Standard PVM asynchronous communication A sending task issues a send command (point 1). The message is transferred to the daemon (point 2). Control is returned to the user application (points 3 & 4). The daemon will transmit the message on the physical wire sometime after returning control to the user application (point 3).

23 Computer Science and Engineering Standard PVM asynchronous communication (cont.) The receiving task issues a receive command (point 5) at some other time. In the case of a blocking receive, the receiving task blocks on the daemon waiting for a message (point 6); after the message arrives, control is returned to the user application (points 7 & 8). In the case of a non-blocking receive, control is returned to the user application immediately (points 7 & 8).

24 Computer Science and Engineering Send (3 steps) 1. A send buffer must be initialized 2. The message is packed into the buffer 3. The completed message is sent to its destination(s)

25 Computer Science and Engineering Receive (2 steps) 1. The message is received 2. The received items are unpacked

26 Computer Science and Engineering Message Buffers Buffer creation (before packing): bufid = pvm_initsend(encoding_option); bufid = pvm_mkbuf(encoding_option). Encoding options: 0 = XDR; 1 = no encoding; 2 = leave data in place.

27 Computer Science and Engineering Message Buffers (cont.) Data packing: pvm_pk*(). pvm_pkstr() takes one argument: pvm_pkstr("This is my data"); The others take three arguments: (1) pointer to the first item, (2) number of items to be packed, (3) stride, for example pvm_pkint(my_array, n, 1); Packing functions can be called multiple times to pack data into a single message.

28 Computer Science and Engineering Sending a message Point to point (one receiver): info = pvm_send(tid, tag). Broadcast (multiple receivers): info = pvm_mcast(tids, n, tag); info = pvm_bcast(group_name, tag). Pack and send (one step): info = pvm_psend(tid, tag, my_array, length, data_type).

29 Computer Science and Engineering Receiving a message Blocking: bufid = pvm_recv(tid, tag); -1 → wildcard in either tid or tag. Nonblocking: bufid = pvm_nrecv(tid, tag); bufid = 0 (no message was received). Timeout: bufid = pvm_trecv(tid, tag, timeout); bufid = 0 (no message was received).

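30 Computer Science and Engineering Different Receives in PVM [Figure: timelines for the three receive calls, marking when the function is called, when the message arrives, and when the timeout expires] pvm_recv() (blocking): waits from the call until the message arrives. pvm_nrecv() (non-blocking): continues execution immediately. pvm_trecv() (timeout): waits until the message arrives or the timeout expires, then resumes execution.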

31 Computer Science and Engineering Data unpacking: pvm_upk*(). pvm_upkstr() takes one argument: pvm_upkstr(string); The others take three arguments: (1) pointer to the first item, (2) number of items to be unpacked, (3) stride, for example pvm_upkint(my_array, n, 1);
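To make the three-argument form (pointer, count, stride) concrete, here is a toy model in plain C of packing and unpacking integers with a stride. It is not PVM code and says nothing about PVM's actual buffer layout or encoding; toy_buf, toy_pkint, and toy_upkint are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CAP 64  /* capacity of the toy buffer, in ints */

struct toy_buf {
    int data[TOY_CAP];
    size_t len;   /* items packed so far */
    size_t pos;   /* next item to unpack */
};

/* Pack n items starting at src, taking every stride-th element.
 * Returns 0 on success, -1 if the buffer would overflow. */
int toy_pkint(struct toy_buf *b, const int *src, size_t n, size_t stride)
{
    assert(b && src && stride >= 1);
    if (b->len + n > TOY_CAP)
        return -1;
    for (size_t i = 0; i < n; i++)
        b->data[b->len++] = src[i * stride];
    return 0;
}

/* Unpack n items into dst, writing every stride-th element, in the
 * same order they were packed. Returns 0 on success, -1 on underflow. */
int toy_upkint(struct toy_buf *b, int *dst, size_t n, size_t stride)
{
    assert(b && dst && stride >= 1);
    if (b->pos + n > b->len)
        return -1;
    for (size_t i = 0; i < n; i++)
        dst[i * stride] = b->data[b->pos++];
    return 0;
}
```

Packing 3 items with stride 2 from the array {10, 11, 12, 13, 14, 15} stores 10, 12, 14; unpacking them with stride 1 fills a 3-element array contiguously. As on the slides, items must be unpacked in the order in which they were packed.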

32 Computer Science and Engineering Network Computing Four categories: WAN, MAN, LAN, SAN. Internet: TCP/IP.

33 Computer Science and Engineering Other Network technologies Fast Ethernet and Gigabit Ethernet; the Fiber Distributed Data Interface (FDDI); High-Performance Parallel Interface (HIPPI); Asynchronous Transfer Mode (ATM); Scalable Coherent Interface (SCI)

34 Computer Science and Engineering [Figure: A representation of network technologies. Technologies shown: 10 Base T, 100 Base T, 1000 Base T, FDDI, HiPPI, ATM, SCI; categories: SAN, LAN, MAN, WAN; speeds: 10 Mbps, 100 Mbps, 1000 Mbps, 10 Gbps]

35 Computer Science and Engineering Client/Server Systems [Figure labels: Client, Server, Server Threads, Interconnection Network]

36 Computer Science and Engineering Sockets Sockets are used to provide the capability of making connections from one application running on one machine to another running on a different machine. Once a socket is created, it can be used to wait for an incoming connection (passive socket) or to initiate a connection (active socket). [Figure: a socket connection between a client and a server]
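The passive/active distinction can be illustrated with the POSIX sockets API. The sketch below creates both sockets inside one process over the loopback address purely to keep the example self-contained; in a real client/server system they would live in different programs, usually on different machines. The function name loopback_roundtrip is hypothetical, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends msg (non-empty) from an active socket to a passive socket on
 * 127.0.0.1 and copies what the server side received into out
 * (NUL-terminated). Returns the bytes received, or -1 on any error. */
long loopback_roundtrip(const char *msg, char *out, size_t outsz)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    long got = -1;
    int lfd = -1, cfd = -1, sfd = -1;

    assert(msg && msg[0] != '\0' && out && outsz > 0);

    /* Passive socket: bind to an ephemeral port and listen. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the OS choose a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;
    if (listen(lfd, 1) < 0) goto done;
    if (getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) goto done;

    /* Active socket: initiate the connection to the passive one. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) goto done;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0) goto done;

    /* The passive side accepts, yielding a connected socket. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0) goto done;

    if (send(cfd, msg, strlen(msg), 0) < 0) goto done;
    got = (long)recv(sfd, out, outsz - 1, 0);
    if (got >= 0) out[got] = '\0';

done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return got;
}
```

Note that the listening socket never carries data itself: accept() returns a new connected socket for each client, which is how a server can keep waiting for further connections.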

37 Computer Science and Engineering A Client Server Framework for Parallel Applications [Figure labels: Client; Master (Supervisor); Interconnection Network; Slaves (Workers): Server 1, Server 2, Server 3, ..., Server n]

38 Computer Science and Engineering Computer Clusters Advances in commodity processors and network technology. A network of PCs and workstations connected via LAN or WAN forms a parallel system. Clusters compete favorably on cost/performance.

39 Computer Science and Engineering Cluster Architecture [Figure: home cluster of nodes, each labeled M, C, P, I/O, and OS, connected by an interconnection network, with middleware and a programming environment on top]

40 Computer Science and Engineering Internet Grids Dependable, consistent, pervasive, and inexpensive access to high end computing. Geographically distributed platforms.

41 Computer Science and Engineering Interconnection Networks Ethernet A packet-switched LAN technology. All hosts connected to an Ethernet receive every transmission, making it possible to broadcast a packet to all hosts at the same time. Ethernet uses a distributed access control scheme called Carrier Sense Multiple Access with Collision Detection (CSMA/CD). Each computer connected to an Ethernet network is assigned a unique 48-bit address known as its Ethernet address, also called the media access control (MAC) address.

42 Computer Science and Engineering Switches An n1 x n2 switch consists of: n1 input ports; n2 output ports; links connecting each input to every output; control logic to select a specific connection; internal buffers. The connections between input ports and output ports may be: one-to-one (point-to-point); one-to-many (multicast or broadcast); many-to-one, which may cause conflicts at the output ports and needs arbitration.

43 Computer Science and Engineering When only one-to-one connections are allowed, the switch is called a crossbar. An n x n crossbar switch can establish n! connections. If we allow both one-to-one and one-to-many connections in an n x n switch, the number of connections that can be established is n^n. (We discussed this before, remember?)
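The gap between n! and n^n grows quickly, which a short C sketch makes visible. The function names are hypothetical; the overflow limits in the assertions apply to 64-bit unsigned arithmetic.

```c
#include <assert.h>

/* n! : one-to-one connection patterns (permutations) of an n x n switch. */
unsigned long long one_to_one_patterns(unsigned n)
{
    assert(n <= 20);  /* 21! overflows 64 bits */
    unsigned long long r = 1;
    for (unsigned i = 2; i <= n; i++)
        r *= i;
    return r;
}

/* n^n : patterns when each output may select any of the n inputs,
 * which covers one-to-one as well as one-to-many connections. */
unsigned long long one_to_many_patterns(unsigned n)
{
    assert(n <= 15);  /* 16^16 = 2^64 overflows 64 bits */
    unsigned long long r = 1;
    for (unsigned i = 0; i < n; i++)
        r *= n;
    return r;
}
```

For a 4 x 4 switch this gives 24 one-to-one patterns against 256 patterns once multicast and broadcast are allowed.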

44 Computer Science and Engineering Routing can be achieved using 2 mechanisms. Source-path: the entire path to the destination is stored in the packet header at the source location. Table-based: the switch must have a complete routing table that determines the corresponding port for each destination. [Figure: Source-path routing versus table-based routing, each shown on an 8-port switch (Port 0 to Port 7); the table-based switch holds a routing table indexed by dest-id]
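The difference between the two mechanisms is where the forwarding decision is stored. The C sketch below contrasts them; the struct layouts, sizes, and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define NUM_PORTS 8
#define MAX_HOPS  8

/* Source-path routing: the header carries the output port for every
 * switch along the path; each switch consumes the next entry. */
struct sp_header {
    int ports[MAX_HOPS];
    int len;    /* number of hops in the path */
    int next;   /* index of the entry the next switch will use */
};

int source_path_next_port(struct sp_header *h)
{
    assert(h && h->next < h->len);
    return h->ports[h->next++];
}

/* Table-based routing: the header carries only a destination id and
 * each switch looks the output port up in its own table. */
struct switch_table {
    int port_for_dest[16];  /* index: destination id, value: output port */
    int ndest;              /* number of valid entries */
};

int table_next_port(const struct switch_table *t, int dest_id)
{
    assert(t && dest_id >= 0 && dest_id < t->ndest);
    int port = t->port_for_dest[dest_id];
    assert(port >= 0 && port < NUM_PORTS);
    return port;
}
```

With source-path routing the switches stay simple but the header grows with the path length; with table-based routing the header is small but every switch needs a table entry for every destination.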

45 Computer Science and Engineering Myrinet Clos network Myrinet is a high-performance, packet communication and switching technology. Myrinet switches are multiple-port components that route a packet entering on an input channel of a port to the output channel of the port selected by the packet.

46 Computer Science and Engineering Myrinet Clos network [Figure: 128-host Clos network using 16-port Myrinet switches. A Clos “spreader” network connects the network spine (upper 8 switches) to the leaves (16 lower switches), which attach the 128 hosts]

47 Computer Science and Engineering Myrinet Clos network [Figure: 64-host Clos network using 16-port Myrinet switches; each line between the network spine and the leaves represents 2 links]

48 Computer Science and Engineering Myrinet Clos network [Figure: 32-host Clos network using 16-port Myrinet switches; each line between the network spine and the leaves represents 4 links]
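The three Clos configurations follow a simple pattern that can be reproduced with a small C sketch. It assumes each leaf switch uses half of its ports for hosts and half for uplinks, and that each spine switch uses all of its ports for leaves. That assumption matches the 8 spine and 16 leaf switches stated for the 128-host case and the 2 and 4 links per line stated for the smaller cases, but the spine counts it yields for 64 and 32 hosts are inferred, not taken from the slides. The struct and function names are hypothetical.

```c
#include <assert.h>

struct clos_size {
    int leaves;          /* leaf switches, each attaching ports/2 hosts */
    int spines;          /* spine switches */
    int links_per_pair;  /* parallel links between each leaf and each spine */
};

/* Sizes a two-level Clos network built from identical switches, assuming
 * each leaf splits its ports evenly between hosts and uplinks and each
 * spine uses all of its ports for leaves. */
struct clos_size clos_two_level(int ports, int hosts)
{
    struct clos_size s;
    int down = ports / 2;
    assert(ports > 0 && ports % 2 == 0);
    assert(hosts > 0 && hosts % ports == 0);
    s.leaves = hosts / down;
    s.spines = hosts / ports;
    assert(down % s.spines == 0);
    s.links_per_pair = down / s.spines;
    return s;
}
```

With 16-port switches, 128 hosts give 16 leaves, 8 spines and 1 link per leaf-spine pair; 64 hosts give 8 leaves, 4 spines and 2 links per pair; 32 hosts give 4 leaves, 2 spines and 4 links per pair.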

49 Computer Science and Engineering The Quadrics network (QsNet) Consists of 2 hardware building blocks. (1) A programmable network interface called Elan, which connects the Quadrics network to a processing node containing one or more CPUs. Elan provides substantial local processing power to implement high-level message passing protocols (e.g., MPI). (2) A high-bandwidth, low-latency communication switch called Elite. QsNet connects Elite switches in a quaternary fat-tree topology.

50 Computer Science and Engineering The Quadrics network (QsNet) Processing Nodes

