1 Message passing architectures and routing CEG 4131 Computer Architecture III Miodrag Bolic Material for these slides is taken from the book: W. Dally,

Presentation transcript:

1 Message passing architectures and routing CEG 4131 Computer Architecture III Miodrag Bolic Material for these slides is taken from the book: W. Dally, B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004

2 Definitions [1]
A network channel c = (x, y) is characterized by:
– width w_c: the number of parallel signals it contains,
– frequency f_c: the rate at which bits are transported on each signal,
– latency t_c: the time required for a bit to travel from x to y.
The bandwidth of a channel is W = w_c · f_c.
The throughput Θ of a network is the data rate, in bits per second, that the network accepts per input port.
Under a particular traffic pattern, the channel that carries the largest fraction of the traffic determines the maximum channel load γ.
The traffic carried by a channel cannot exceed the channel bandwidth, so the throughput is limited to Θ = W/γ.
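
As a rough illustration of the two formulas on this slide, the sketch below computes W = w_c · f_c and Θ = W/γ. The function names and the numbers (16 signals, 1 Gbit/s per signal, γ = 3) are hypothetical and not taken from the slides.

```python
from fractions import Fraction


def channel_bandwidth(width_signals, frequency_bps):
    """W = w_c * f_c, in bits per second."""
    return width_signals * frequency_bps


def throughput(bandwidth_bps, max_channel_load):
    """Theta = W / gamma, in bits per second per input port."""
    return Fraction(bandwidth_bps) / Fraction(max_channel_load)


# Hypothetical channel: 16 parallel signals, each carrying 1 Gbit/s.
W = channel_bandwidth(16, 1_000_000_000)
# Hypothetical traffic pattern whose busiest channel has load gamma = 3.
theta = throughput(W, 3)
print(f"W = {W} bit/s, Theta = {float(theta):.3e} bit/s per port")
```

Exact fractions are used only so that the ratio W/γ is not rounded before printing.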

3 Taxonomy of Routing Algorithms [1]
Deterministic: The simplest class. For each source-destination pair there is a single path. Deterministic routing usually performs poorly because it cannot use alternative routes and concentrates traffic on one set of channels.
Oblivious: So named because it ignores the state of the network when determining a path. Unlike deterministic routing, it considers a set of paths from a source to a destination and chooses among them.
Adaptive: The choice of path depends on the state of the network.

4 Routing algorithms [1]
Four ways to route a packet on an 8-node ring:
– Greedy: Always send the packet in the shortest direction around the ring. For example, always route from 0 to 3 in the clockwise direction and from 0 to 5 in the counterclockwise direction. If the distance is the same in both directions, pick a direction randomly.
– Uniform random: Randomly pick a direction for each packet, with equal probability of picking either direction.
– Weighted random: Randomly pick a direction for each packet, but weight the short direction with probability 1 - Δ/8 and the long direction with Δ/8, where Δ is the (minimum) distance between the source and destination.
– Adaptive: Send the packet in the direction for which the local channel has the lowest load. We may approximate load either by measuring the length of the queue serving this channel or by recording how many packets it has transmitted over the last T slots.
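
The four policies can be sketched as small direction-choosing functions. This is only an illustrative sketch: the ring size of 8 is inferred from the Δ/8 weighting, the names `CW`, `CCW`, and `ring_distances` are hypothetical, and the adaptive policy approximates load by queue length and breaks ties toward the clockwise direction, which the slide does not specify.

```python
import random

N = 8              # ring size, inferred from the Delta/8 weighting
CW, CCW = +1, -1   # clockwise / counterclockwise


def ring_distances(src, dst, n=N):
    """Hop counts in the clockwise and counterclockwise directions (src != dst)."""
    cw = (dst - src) % n
    return cw, n - cw


def greedy(src, dst, rng=random, n=N):
    """Shortest direction; random choice when both distances are equal."""
    cw, ccw = ring_distances(src, dst, n)
    if cw == ccw:
        return rng.choice((CW, CCW))
    return CW if cw < ccw else CCW


def uniform_random(src, dst, rng=random, n=N):
    """Either direction with equal probability."""
    return rng.choice((CW, CCW))


def weighted_random(src, dst, rng=random, n=N):
    """Short direction with probability 1 - Delta/n, long direction with Delta/n."""
    cw, ccw = ring_distances(src, dst, n)
    delta = min(cw, ccw)
    short = CW if cw <= ccw else CCW
    return short if rng.random() < 1 - delta / n else -short


def adaptive(queue_len_cw, queue_len_ccw):
    """Direction whose local queue is shorter (tie goes clockwise: an assumption)."""
    return CW if queue_len_cw <= queue_len_ccw else CCW
```

For example, `greedy(0, 3)` returns `CW` and `greedy(0, 5)` returns `CCW`, matching the example on the slide.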

5 Example [1] Consider a tornado traffic pattern in which each node i sends a packet to node (i + 3) mod 8. Which algorithm gives the best worst-case throughput?

6

7 Explanation [1]
With the greedy routing algorithm, all of the traffic routes in the clockwise direction around the ring. The counterclockwise channels stay idle and each clockwise channel carries 3 units of traffic, that is, γ = 3, which gives every terminal a throughput of Θ = W/3.
With uniform random routing, the counterclockwise links become the bottleneck with a load of γ = 5/2, since half of the traffic traverses 5 links in the counterclockwise direction. This gives a throughput of 2W/5.
Weighting the random decision sends 5/8 of the traffic over 3 links and 3/8 of the traffic over 5 links, for a load of γ = 15/8 in both directions and a throughput of 8W/15.
Adaptive routing, with some assumptions on how the adaptivity is implemented, will match this perfect load balance in the steady state, giving the same throughput as weighted random routing.
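
The channel loads quoted above can be checked by counting how much traffic crosses each channel. The sketch below is one way to do that under stated assumptions: every node injects one unit of traffic, the 8-node ring has one channel per direction between neighbours, and the function names are hypothetical.

```python
from fractions import Fraction

N = 8        # ring size, from the "mod 8" in the example
SHIFT = 3    # tornado pattern: node i sends to (i + 3) mod 8


def tornado_loads(p_clockwise, n=N, shift=SHIFT):
    """Return (max clockwise load, max counterclockwise load) per channel.

    p_clockwise is the fraction of each node's traffic sent clockwise.
    """
    cw = [Fraction(0)] * n    # cw[i]: channel from node i to node i + 1
    ccw = [Fraction(0)] * n   # ccw[i]: channel from node i to node i - 1
    for src in range(n):
        # The clockwise path crosses `shift` channels.
        for hop in range(shift):
            cw[(src + hop) % n] += p_clockwise
        # The counterclockwise path crosses n - shift channels.
        for hop in range(n - shift):
            ccw[(src - hop) % n] += 1 - p_clockwise
    return max(cw), max(ccw)


def throughput_over_w(p_clockwise):
    """Theta / W = 1 / gamma, where gamma is the load of the busiest channel."""
    return 1 / max(tornado_loads(p_clockwise))


policies = {
    "greedy": Fraction(1),              # all traffic takes the 3-hop direction
    "uniform random": Fraction(1, 2),
    "weighted random": 1 - Fraction(SHIFT, N),
}
for name, p in policies.items():
    print(name, tornado_loads(p), throughput_over_w(p))
```

The computed throughputs are W/3, 2W/5, and 8W/15, in agreement with the explanation. Adaptive routing is not modelled here because its result depends on how the adaptivity is implemented.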

8 Message Formats [2]
– Message: logical unit for internode communication
– Packet: basic unit containing the destination address for routing
– Packets carry a sequence number for reassembly
– Flits: flow control digits of packets
– Store-and-forward routing transfers whole packets
– Wormhole routing transfers flits

9 Packets and Flits [2]
– Header flits contain routing information and the sequence number
– Flit length is affected by network size
– Packet length is determined by the routing scheme and the network implementation
– Lengths also depend on channel bandwidth, router design, network traffic, etc.
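
The message, packet, and flit hierarchy can be sketched in a few lines. This is a simplified illustration, not the format used by any particular machine: the `Packet` fields, the tuple layout of a flit, and the sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    dest: int        # destination address used for routing
    seq: int         # sequence number used for reassembly
    payload: bytes


def packetize(message: bytes, dest: int, payload_size: int) -> List[Packet]:
    """Split a message into packets that each carry a sequence number."""
    return [
        Packet(dest, seq, message[start:start + payload_size])
        for seq, start in enumerate(range(0, len(message), payload_size))
    ]


def to_flits(packet: Packet, flit_size: int) -> List[Tuple]:
    """Split a packet into a header flit followed by data flits."""
    header = ("HEAD", packet.dest, packet.seq)   # routing info + sequence number
    data = [
        ("DATA", packet.payload[start:start + flit_size])
        for start in range(0, len(packet.payload), flit_size)
    ]
    return [header] + data


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message from packets that may have arrived out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


message = b"hello, wormhole!"
packets = packetize(message, dest=3, payload_size=6)
flits = to_flits(packets[0], flit_size=2)
```

Only the header flit carries the destination, which is why the data flits of a packet have to follow it through the network.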

10 Message Format [2]

11 Latency Analysis [2]
L = packet length (bits), W = channel bandwidth (bits/s), D = distance, F = flit length (bits)
Store-and-forward: T_SF = (D + 1) L/W
Wormhole: T_WH = L/W + D·F/W
Store-and-forward: controlled by software
Wormhole: controlled by hardware
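
To get a feel for the difference between the two formulas, the sketch below evaluates both for one set of values. The numbers (1024-bit packets, 32-bit flits, a 1 Gbit/s channel, distance 5) are hypothetical and chosen only for illustration.

```python
def store_and_forward_latency(L, W, D):
    """T_SF = (D + 1) * L / W: the whole packet is retransmitted at each step."""
    return (D + 1) * L / W


def wormhole_latency(L, W, D, F):
    """T_WH = L / W + D * F / W: only one flit time is added per unit of distance."""
    return L / W + D * F / W


L = 1024            # packet length in bits (hypothetical)
F = 32              # flit length in bits (hypothetical)
W = 1_000_000_000   # channel bandwidth in bits/s (hypothetical)
D = 5               # distance, as defined on the slide (hypothetical value)

t_sf = store_and_forward_latency(L, W, D)
t_wh = wormhole_latency(L, W, D, F)
print(f"T_SF = {t_sf * 1e6:.3f} us, T_WH = {t_wh * 1e6:.3f} us")
```

When F is much smaller than L, the wormhole latency is dominated by L/W and is nearly independent of the distance, whereas the store-and-forward latency grows in proportion to D + 1.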

12 From [3]