The Alpha 21364 Network Architecture, by Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb, Compaq Computer Corporation.

Presentation transcript:

The Alpha 21364 Network Architecture
By Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb
Compaq Computer Corporation
Presented by Luis Alfredo Campos

Alpha 21364 Goals
Support communication-intensive server applications
– High-performance technical computing
– Database servers
– Web servers
– Telecommunication applications
Achieve:
– Extremely low latency
– Enormous bandwidth
– Support for directory-based cache coherence
Improve:
– Reliability
– Availability

Overview
Alpha core with enhancements
Tightly coupled multiprocessor network
– Connects up to 128 processors
– Two-dimensional torus network
Integrated L2 cache
Integrated memory controller
Router
– Directory-based cache coherence (CC)
– Separate virtual channels
– Packet classes

Network Packet Classes
Seven packet classes
– Request (3 flits)
– Forward (3 flits)
– Block Response (18 or 19 flits)
– Non-Block Response (2 or 3 flits)
– Write I/O (19 flits)
– Read I/O (3 flits)
– Special (1 or 3 flits)
Each flit is 32 bits of data plus 7 bits of ECC
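As a rough check on the packet sizes above, the sketch below tabulates the seven classes and converts flit counts into bits, using the slide's figure of 32 data bits plus 7 ECC bits per flit. The names `PACKET_CLASSES` and `packet_bits` are illustrative and do not come from the presentation.

```python
# Flit counts per packet class, as listed on the slide.
# Classes that come in two sizes are stored as (min_flits, max_flits).
DATA_BITS_PER_FLIT = 32
ECC_BITS_PER_FLIT = 7
FLIT_BITS = DATA_BITS_PER_FLIT + ECC_BITS_PER_FLIT  # 39 bits per flit

PACKET_CLASSES = {
    "Request": (3, 3),
    "Forward": (3, 3),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19, 19),
    "Read I/O": (3, 3),
    "Special": (1, 3),
}


def packet_bits(flits):
    """Return (data_bits, total_bits_including_ecc) for a packet of `flits` flits."""
    return flits * DATA_BITS_PER_FLIT, flits * FLIT_BITS


if __name__ == "__main__":
    for name, (min_flits, max_flits) in PACKET_CLASSES.items():
        data_bits, total_bits = packet_bits(max_flits)
        print(f"{name}: up to {max_flits} flits, "
              f"{data_bits} data bits, {total_bits} bits with ECC")
```

Under these figures, a 19-flit packet carries 608 data bits (76 bytes) and occupies 741 bits once ECC is included.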

Network Architecture
Two-dimensional torus
– Limited support for imperfect tori
– Allows fault remapping
Virtual cut-through routing
– Buffer space for 316 packets
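To make the torus topology concrete, here is a minimal sketch of how the four neighbors of a node wrap around at the edges. The 16 x 8 grid is an assumption chosen only because it yields 128 nodes, and the function name and the direction labels are hypothetical.

```python
def torus_neighbors(x, y, cols, rows):
    """Return the four neighbors of node (x, y) on a cols x rows 2D torus.

    Direction names are arbitrary labels; "north" is taken to mean
    decreasing y. The modulo implements the wraparound links.
    """
    return {
        "east": ((x + 1) % cols, y),
        "west": ((x - 1) % cols, y),
        "north": (x, (y - 1) % rows),
        "south": (x, (y + 1) % rows),
    }


if __name__ == "__main__":
    # Assumed layout: 16 x 8 = 128 nodes, matching the stated maximum.
    print(torus_neighbors(0, 0, cols=16, rows=8))
```

Every node has exactly four neighbors, including the corner nodes, which is what distinguishes a torus from a plain mesh.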

Adaptive Routing
Four rectangles have the current and destination nodes at diagonally opposite corners
Packets route within the minimum rectangle
Maximizes the bandwidth between source and destination
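The minimum-rectangle idea can be sketched as follows: on a torus each dimension can be traversed in either direction, which gives four candidate rectangles, and picking the shorter way around in each dimension selects the smallest one. The code below is an illustration under that reading, not the router's actual logic; the function names, the direction labels, and the tie-breaking rule are assumptions.

```python
def minimal_offset(src, dst, size):
    """Signed shortest displacement from src to dst on a ring of `size` nodes.

    Positive means increasing coordinate; ties go to the positive direction
    (an arbitrary choice made for this sketch).
    """
    forward = (dst - src) % size
    backward = forward - size
    return forward if forward <= -backward else backward


def productive_directions(cur, dst, cols, rows):
    """Directions that keep a packet inside the minimum rectangle."""
    dx = minimal_offset(cur[0], dst[0], cols)
    dy = minimal_offset(cur[1], dst[1], rows)
    directions = []
    if dx > 0:
        directions.append("east")
    elif dx < 0:
        directions.append("west")
    if dy > 0:
        directions.append("south")
    elif dy < 0:
        directions.append("north")
    return directions


if __name__ == "__main__":
    # On an assumed 16 x 8 torus, (0, 0) -> (15, 1) is one hop west via wraparound.
    print(productive_directions((0, 0), (15, 1), cols=16, rows=8))
```

An adaptive router may choose any of the returned directions at each hop, typically the less congested one; an empty list means the packet has arrived.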

Avoiding Deadlocks in Adaptive Routing
“Adaptive routing will not deadlock a network as long as packets can drain via a deadlock-free path”
19 virtual channels
– Three virtual channels per packet class, except the Special class, which has only one
Adaptive, VC0, and VC1
– Adaptive is the first choice
– VC0 and VC1 together form a deadlock-free network
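The count of 19 follows from the packet classes: six classes with three channels each, plus one channel for the Special class. The short sketch below enumerates them; the class names are taken from the packet-class slide, while the `"single"` label for the Special channel is a placeholder invented for this example.

```python
# Six classes get three virtual channels each; Special gets one.
CLASSES_WITH_THREE_VCS = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
]
VC_KINDS = ["Adaptive", "VC0", "VC1"]


def virtual_channels():
    """Enumerate (packet_class, channel_kind) pairs for every virtual channel."""
    channels = [(cls, kind) for cls in CLASSES_WITH_THREE_VCS for kind in VC_KINDS]
    channels.append(("Special", "single"))  # placeholder label, not from the slides
    return channels


if __name__ == "__main__":
    print(len(virtual_channels()))  # 6 * 3 + 1
```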

Router Architecture
9 pipeline types
– Input and output: local, interprocessor, and I/O
Pin-to-pin latency of 13 cycles
– Running at 1.2 GHz
Network links run 33% slower
– Running at 0.8 GHz
– Synchronous with outgoing links
– Asynchronous with incoming links
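The clock figures above translate into wall-clock numbers as sketched below. The latency and slowdown follow directly from the slide; the per-link bandwidth additionally assumes that a link moves one flit (32 data bits) per link clock, which this slide does not state.

```python
CORE_CLOCK_HZ = 1.2e9      # router clock, from the slide
LINK_CLOCK_HZ = 0.8e9      # network link clock, from the slide
PIN_TO_PIN_CYCLES = 13     # from the slide
DATA_BITS_PER_FLIT = 32    # from the packet-class slide


def pin_to_pin_latency_ns():
    """Pin-to-pin router latency in nanoseconds."""
    return PIN_TO_PIN_CYCLES / CORE_CLOCK_HZ * 1e9


def link_slowdown():
    """Fraction by which the link clock is slower than the router clock."""
    return 1 - LINK_CLOCK_HZ / CORE_CLOCK_HZ


def link_data_bandwidth_gbytes():
    """Data bandwidth per link in GB/s, ASSUMING one flit per link clock."""
    return DATA_BITS_PER_FLIT * LINK_CLOCK_HZ / 8 / 1e9


if __name__ == "__main__":
    print(f"latency: {pin_to_pin_latency_ns():.2f} ns")
    print(f"links slower by: {link_slowdown():.0%}")
    print(f"per-link data bandwidth: {link_data_bandwidth_gbytes():.1f} GB/s")
```

Thirteen cycles at 1.2 GHz comes to roughly 10.8 ns, and 0.8 GHz is one third slower than 1.2 GHz, which matches the 33% on the slide.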

Arbitration
Needs to avoid a central bottleneck
– 16 local arbiters
– 7 global arbiters
Least Recently Selected (LRS) scheme
– Local arbiters: select among classes and virtual channels
– Global arbiters: select among input ports
Rotary Rule mode
– Priority to oldest packets
Coherence Dependence Priority (CDP) Rule mode
– Priority depending on class ordering
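A least-recently-selected policy can be sketched in software as an ordered list: the arbiter grants the requester that was selected longest ago and then moves it to the back. This is only a behavioral illustration, not the hardware design; the class name, the method name, and the arbitrary initial order are assumptions.

```python
class LeastRecentlySelectedArbiter:
    """Behavioral model of a least-recently-selected (LRS) arbiter."""

    def __init__(self, requesters):
        # Ordered from least recently selected (front) to most recent (back).
        # The initial order is arbitrary.
        self._order = list(requesters)

    def grant(self, requesting):
        """Grant the least recently selected member of `requesting`, or None."""
        requesting = set(requesting)
        for candidate in self._order:
            if candidate in requesting:
                # The winner becomes the most recently selected.
                self._order.remove(candidate)
                self._order.append(candidate)
                return candidate
        return None


if __name__ == "__main__":
    arbiter = LeastRecentlySelectedArbiter(["port0", "port1", "port2"])
    print(arbiter.grant({"port0", "port1"}))
    print(arbiter.grant({"port0", "port1"}))
```

With two requesters that ask repeatedly, grants alternate between them, so neither can starve the other.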

Questions
How is the 1.2 GHz internal / 800 MHz external clock split OK?
Why a 2D torus?
– What limitations does it impose?