The Alpha Network Architecture
By Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb
Compaq Computer Corporation
Presented by Luis Alfredo Campos
Alpha Goals
Support communication-intensive server applications
– High-performance technical computing
– Database servers
– Web servers
– Telecommunication applications
Achieve:
– Extremely low latency
– Enormous bandwidth
– Support for directory-based cache coherence
Improve:
– Reliability
– Availability
Overview
Alpha core with enhancements
Tightly coupled multiprocessor network
– Connects up to 128 processors
– Two-dimensional torus network
Integrated L2 cache
Integrated memory controller
Router
– Directory-based cache coherence (CC)
– Separate virtual channels
– Packet classes
Network Packet Classes
Seven packet classes
– Request (3 flits)
– Forward (3 flits)
– Block Response (18 or 19 flits)
– Non-Block Response (2 or 3 flits)
– Write I/O (19 flits)
– Read I/O (3 flits)
– Special (1 or 3 flits)
Flits are 32 bits of data plus 7 bits of ECC (39 bits in total)
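The flit counts above translate directly into bits per packet. The following is a minimal sketch of that arithmetic; the names `PACKET_CLASSES`, `packet_bits`, and `ecc_overhead` are illustrative choices and do not come from the slides.

```python
# Flit layout as stated on the slide: 32 data bits plus 7 ECC bits.
FLIT_DATA_BITS = 32
FLIT_ECC_BITS = 7
FLIT_BITS = FLIT_DATA_BITS + FLIT_ECC_BITS

# Possible packet lengths (in flits) for each of the seven classes.
PACKET_CLASSES = {
    "Request": (3,),
    "Forward": (3,),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19,),
    "Read I/O": (3,),
    "Special": (1, 3),
}


def packet_bits(flits):
    """Total bits (data plus ECC) occupied by a packet of `flits` flits."""
    return flits * FLIT_BITS


def ecc_overhead():
    """Fraction of every flit that is spent on ECC."""
    return FLIT_ECC_BITS / FLIT_BITS


if __name__ == "__main__":
    for name, lengths in PACKET_CLASSES.items():
        sizes = ", ".join(str(packet_bits(n)) for n in lengths)
        print(f"{name}: {sizes} bits")
    print(f"ECC overhead: {ecc_overhead():.1%}")
```

The long classes (Block Response and Write I/O) carry data payloads, while the 1-to-3-flit classes carry only headers or short control information.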
Network Architecture
Two-dimensional torus
– Limited support for imperfect tori, which allows fault remapping
Virtual cut-through routing
– Buffer space for 316 packets
Adaptive Routing
Four rectangles with the current and destination nodes at diagonal corners
Packets route within the minimum rectangle
Maximizes the bandwidth between source and destination
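Routing within the minimum rectangle can be illustrated by computing, at each hop, which directions move a packet closer to its destination on a torus with wraparound links. This is only a sketch under assumptions: the function names, the (x, y) coordinate convention, and the tie-break toward the "+" direction when both ways around a ring are equally short are illustrative and not taken from the slides.

```python
def productive_directions(cur, dst, width, height):
    """Return the directions that stay inside the minimum rectangle.

    cur and dst are (x, y) node coordinates on a width x height torus.
    A direction is productive if it shortens the remaining distance,
    taking the wraparound links into account.
    """
    dirs = []
    axes = (("x", cur[0], dst[0], width), ("y", cur[1], dst[1], height))
    for axis, c, d, size in axes:
        forward = (d - c) % size      # hops needed going in the + direction
        if forward == 0:
            continue                  # already aligned in this dimension
        backward = size - forward     # hops needed going in the - direction
        # Assumed tie-break: prefer + when both ways are equally short.
        dirs.append(axis + ("+" if forward <= backward else "-"))
    return dirs


def minimal_hops(cur, dst, width, height):
    """Length of any path that stays inside the minimum rectangle."""
    dx = (dst[0] - cur[0]) % width
    dy = (dst[1] - cur[1]) % height
    return min(dx, width - dx) + min(dy, height - dy)


if __name__ == "__main__":
    # On a 4 x 4 torus, going from (0, 0) to (3, 1) is shortest via wraparound in x.
    print(productive_directions((0, 0), (3, 1), 4, 4))
    print(minimal_hops((0, 0), (3, 1), 4, 4))
```

When two directions are productive, an adaptive router is free to pick whichever output is less congested, which is where the bandwidth benefit comes from.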
Avoiding Deadlocks in Adaptive Routing
“Adaptive routing will not deadlock a network as long as packets can drain via a deadlock-free path”
19 virtual channels
– 3 virtual channels per packet class, except for the Special class (only one channel)
Adaptive, VC0, and VC1
– Adaptive is the first choice
– The VC0 and VC1 combination creates a deadlock-free network
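The count of 19 follows from six packet classes with three channels each plus one channel for the Special class. A short sketch that enumerates them; the label `"Single"` for the Special channel and the function name are assumptions made for illustration.

```python
PACKET_CLASSES = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
    "Special",
]

# Channel kinds available to every class except Special.
CHANNEL_KINDS = ("Adaptive", "VC0", "VC1")


def virtual_channels():
    """List every (packet class, channel) pair described on the slide."""
    channels = []
    for cls in PACKET_CLASSES:
        if cls == "Special":
            # The Special class has only one channel; the label is hypothetical.
            channels.append((cls, "Single"))
        else:
            channels.extend((cls, kind) for kind in CHANNEL_KINDS)
    return channels


if __name__ == "__main__":
    print(len(virtual_channels()))  # 6 classes * 3 channels + 1 = 19
```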
Router Architecture
9 pipeline types
– Input and output: local, interprocessor, and I/O
Pin-to-pin latency of 13 cycles
– Running at 1.2 GHz
Network links run 33% slower
– Running at 0.8 GHz
– Synchronous with outgoing links
– Asynchronous with incoming links
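The clock figures above can be checked with a little arithmetic: 13 cycles at 1.2 GHz is just under 11 ns, and 0.8 GHz is one third slower than 1.2 GHz. The helper names below are illustrative only.

```python
ROUTER_CLOCK_HZ = 1.2e9   # internal router clock from the slide
LINK_CLOCK_HZ = 0.8e9     # network link clock from the slide
PIN_TO_PIN_CYCLES = 13


def pin_to_pin_latency_ns():
    """Pin-to-pin router latency in nanoseconds."""
    return PIN_TO_PIN_CYCLES / ROUTER_CLOCK_HZ * 1e9


def link_slowdown():
    """Fraction by which the links run slower than the router core."""
    return 1 - LINK_CLOCK_HZ / ROUTER_CLOCK_HZ


if __name__ == "__main__":
    print(f"{pin_to_pin_latency_ns():.2f} ns")
    print(f"{link_slowdown():.0%} slower")
```

The 3:2 ratio between the two clocks means that three router cycles elapse for every two link cycles.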
Arbitration
Needs to avoid a central bottleneck
– 16 local arbiters
– 7 global arbiters
Least Recently Selected (LRS) scheme
– Local arbiters: among classes and virtual channels
– Global arbiters: among input ports
Rotary Rule mode
– Priority to oldest packets
Coherence Dependence Priority (CDP) Rule mode
– Priority depending on class ordering
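A least-recently-selected policy can be sketched as a list ordered from least to most recently selected: the arbiter grants the first requester it finds in that list and moves it to the back. This is a software illustration of the policy only, not the router's hardware implementation; the class name, method name, and port labels are assumptions.

```python
class LRSArbiter:
    """Grant the requester that was selected least recently."""

    def __init__(self, candidates):
        # Front of the list is the least recently selected candidate.
        self._order = list(candidates)

    def select(self, requesting):
        """Return the winning candidate, or None if nobody is requesting."""
        requesting = set(requesting)
        for candidate in self._order:
            if candidate in requesting:
                # Move the winner to the back so it has lowest priority next time.
                self._order.remove(candidate)
                self._order.append(candidate)
                return candidate
        return None


if __name__ == "__main__":
    arbiter = LRSArbiter(["north", "south", "east", "west"])
    for _ in range(3):
        print(arbiter.select({"south", "west"}))
```

With two ports requesting continuously, the grants alternate between them, so neither port can starve the other.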
Questions
How is the 1.2 GHz internal / 800 MHz external clock OK?
Why a 2-D torus?
– What are the limitations imposed?