Published by Cynthia Kilbourn. Modified over 10 years ago.
1 Keith D. Underwood, Eric Borch
May 16, 2011
A Unified Algorithm for both Randomized Deterministic and Adaptive Routing in Torus Networks
2 Legal Disclaimer
Notice: This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. Contact your local Intel sales office or your distributor to obtain the latest specification before placing your product order.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice. All products, dates, and figures are preliminary for planning purposes and are subject to change without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The (Intel products discussed herein) may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
Intel® Itanium®, Xeon, Pentium®, Intel SpeedStep® and Intel NetBurst® are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright © 2009, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
3 Motivation
Torus topologies are popular because they are simple and easy to build
–Enable short links for power efficiency
–Easy to integrate (e.g. BlueGene)
Torus networks are particularly susceptible to routing congestion
–Some traffic patterns are extremely bad with deterministic routing
Existing adaptive routing approaches have some limitations
–Only work for virtual cut-through (VCT) flow control
–Do not provide ordering for the programming models that require it
4 Current Adaptive Routing Algorithms for Torus Networks
VCT adaptive routing
–Used on the T3E
–For each message class, use two deterministic channels plus one adaptive channel
–If a deterministic channel is busy, can enter the adaptive channel
–If the adaptive channel blocks, must re-enter the deterministic channel (and stay there)
–Only works for VCT flow control
Bubble adaptive routing
–Used in BlueGene/L
–Requires one packet's worth of space to turn in the deterministic direction
–Requires two packets' worth of space to turn in the adaptive direction
5 Related Adaptive Routing Algorithms
Turn model adaptive routing
–Prevents deadlocks by making a specific subset of turns illegal
–Example: the negative-first algorithm makes it illegal for a packet moving in any positive direction to turn into a negative direction
Dimension reversal
–Start with dimension-ordered routing
–Any time the preferred adaptive route would cross from a lower-numbered to a higher-numbered dimension (e.g. moving in Y to moving in X), increment the VC
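The negative-first rule quoted above can be written as a small legality check. This is a minimal sketch, not the presenters' implementation: the `Hop` type, the function names, and the convention that dimension 0 is X and dimension 1 is Y are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Hop(NamedTuple):
    """One hop of a route (hypothetical representation)."""
    dim: int   # assumed numbering: 0 = X, 1 = Y, 2 = Z
    sign: int  # +1 for the positive direction, -1 for the negative direction


def negative_first_allows(prev: Optional[Hop], nxt: Hop) -> bool:
    """Return True if moving from `prev` to `nxt` is legal under negative-first.

    The only forbidden transition is positive -> negative, in any dimensions.
    """
    if prev is None:  # first hop after injection: nothing to violate yet
        return True
    return not (prev.sign > 0 and nxt.sign < 0)


def path_is_legal(path: Sequence[Hop]) -> bool:
    """Check every consecutive pair of hops in a route."""
    prev: Optional[Hop] = None
    for hop in path:
        if not negative_first_allows(prev, hop):
            return False
        prev = hop
    return True
```

Under this rule a route that takes all of its negative hops first, such as `[Hop(0, -1), Hop(1, -1), Hop(0, +1)]`, is legal, while `[Hop(0, +1), Hop(1, -1)]` is rejected.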
6 Limitations of Adaptive Routing Algorithms
The two most recently deployed versions only work for virtual cut-through flow control
Turn-model-based routing does not handle the torus link well
Adaptive routing creates a specific challenge at the end-point: requests are not in order
Virtual cut-through flow control creates a specific challenge at the end-point: messages are interleaved at packet-level granularity
7 New Algorithm Objectives
Allow wormhole flow control
–Minimize message interleaving to simplify the network end-point
–Simplify router buffer control
Provide a deterministic variant that increases throughput
–Eliminate the requirement to use adaptive routing to achieve high throughput
–Maintain compatibility with adaptive routing (e.g. of response messages)
Make the route computation algorithmic, even if the end implementation might not do it that way
8 New Algorithm Overview (1)
Based on turn model
–Use turn model rules, except when crossing the torus link
–Any of the turn model algorithms can be used
Leverage concept from dimension reversal
–Treat crossing the torus link as an illegal turn
–Increment the virtual channel when crossing the torus link
–Allows torus links to be treated as any other link in turn model
Virtual channel requirement: number of dimensions plus one
–Inject on VC0
–Can cross up to N torus links, where N is the number of dimensions
–Four VCs per message class for a 3D torus compares reasonably to 3 VCs per message class for other schemes
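The virtual channel rule on this slide can be illustrated on a single ring of a torus. The sketch below is only an illustration under stated assumptions: the wraparound ("torus") link is taken to be the one between coordinates `k - 1` and `0`, and the helper names are hypothetical rather than taken from the presentation.

```python
def crosses_torus_link(coord: int, sign: int, k: int) -> bool:
    """True if a hop from `coord` in direction `sign` uses the wraparound link.

    Assumption: in a ring of k routers, the torus link joins k-1 and 0.
    """
    return (sign > 0 and coord == k - 1) or (sign < 0 and coord == 0)


def step(coord: int, sign: int, k: int, vc: int) -> tuple:
    """Take one hop; increment the VC only when the torus link is crossed."""
    new_vc = vc + 1 if crosses_torus_link(coord, sign, k) else vc
    return (coord + sign) % k, new_vc


def walk_ring(src: int, dst: int, sign: int, k: int, vc: int = 0) -> int:
    """Walk from src to dst in one direction and return the final VC."""
    coord = src
    while coord != dst:
        coord, vc = step(coord, sign, k, vc)
    return vc


def vcs_required(num_dims: int) -> int:
    """Slide rule: number of dimensions plus one VCs per message class."""
    return num_dims + 1
```

In a ring of 8 routers, `walk_ring(6, 1, +1, 8)` crosses the wraparound once and finishes on VC1, while `walk_ring(1, 6, +1, 8)` never crosses it and stays on VC0. A route that crosses at most one torus link in each of three dimensions can therefore end no higher than VC3, which matches `vcs_required(3) == 4`.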
9 New Algorithm Overview (2)
Both adaptive routing and deterministic routing use the same rules for which links are legal
–Adaptive routing chooses the next link based on load
–Deterministic routing chooses the next link based on a hash of the source, destination, and current router
All VCs can be used for traffic injection
–Leverage the T3E approach to VC spreading
–Spread over all VCs based on destination
–E.g. if a message only needs to cross 2 torus links, it can inject on VC0 or VC1
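The two selection policies and the injection-time VC spreading can be sketched as follows. The slides do not specify the hash function, the load metric, or how the destination maps to a VC, so the use of CRC32, a per-link load dictionary, and `dst % usable` are all assumptions chosen only to make the idea concrete.

```python
import zlib
from typing import Dict, Sequence


def choose_adaptive(legal_links: Sequence[str], load: Dict[str, int]) -> str:
    """Adaptive variant: pick the legal link with the lowest load (assumed metric)."""
    return min(legal_links, key=lambda link: load[link])


def choose_deterministic(legal_links: Sequence[str],
                         src: int, dst: int, current: int) -> str:
    """Deterministic variant: hash (source, destination, current router).

    CRC32 stands in for whatever hash a real router would use. The same
    inputs always select the same link, so packets of a flow stay in order.
    """
    key = f"{src}:{dst}:{current}".encode()
    return legal_links[zlib.crc32(key) % len(legal_links)]


def injection_vc(dst: int, torus_crossings_needed: int, num_dims: int) -> int:
    """Spread injections over the VCs that still leave room for the route.

    With num_dims + 1 VCs and one increment per torus crossing, a message
    needing c crossings may start on VC0 .. VC(num_dims - c).
    """
    usable = (num_dims + 1) - torus_crossings_needed
    return dst % usable  # hypothetical destination-based spreading
```

For a 3D torus, a message that must cross 2 torus links has `usable == 2`, so it is injected on VC0 or VC1, as in the slide's example.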
10 Methodology
Created a router model in ASIM
Evaluated system size of 4K nodes/4K routers
Simulated traditional traffic patterns (random, bit reverse, bit complement, transpose, shuffle)
Traffic modeled as simple request/response pattern
–Single Flit version: one flit request generates one flit response
–VCT version:
  –16 flit write request / 2 flit write response
  –4 flit read request / 16 flit read response
  –50/50 mix of reads and writes
–Long message version: 4 flit read request / 64K flit read response
Simulated 500K cycles (results did not change from 250K cycle intermediate drop)
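The permutation traffic patterns named above are usually defined as bit manipulations of the source node ID. The sketch below uses the conventional textbook definitions on 12-bit IDs (4K nodes); whether the simulations used exactly these definitions, and how node IDs map to torus coordinates, is not stated in the slides and is assumed here.

```python
B = 12                 # bits per node ID: 2**12 = 4096 nodes
MASK = (1 << B) - 1


def bit_complement(src: int) -> int:
    """Destination is the source with every bit inverted."""
    return ~src & MASK


def bit_reverse(src: int) -> int:
    """Destination is the source with its bit order reversed."""
    return int(format(src, f"0{B}b")[::-1], 2)


def transpose(src: int) -> int:
    """Destination swaps the upper and lower halves of the source bits."""
    half = B // 2
    return ((src >> half) | (src << half)) & MASK


def shuffle(src: int) -> int:
    """Destination is the source rotated left by one bit."""
    return ((src << 1) | (src >> (B - 1))) & MASK
```

Each function is a permutation of the 4096 node IDs, so every node sends to exactly one destination and receives from exactly one source.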
11 Router Architecture Modeled
12 Single Flit Throughput: Transpose (Bit Reverse is similar)
13 Single Flit Throughput: Random
14 Single Flit Throughput: Shuffle
15 Virtual Cut Through Throughput: Transpose (Bit Reverse is similar)
16 Virtual Cut Through Throughput: Random
17 Long Message Throughput: Transpose (Bit Reverse is similar)
18 Long Message Throughput: Random
19 Long Message Throughput: Bit Complement
20 Conclusions
Current adaptive routing algorithms have two basic shortcomings
–Only virtual cut-through flow control is supported
–Improving throughput requires message reordering
Introduced an adaptive routing variant that:
–Allows wormhole flow control
–Includes a deterministic spreading variant
–Achieves a high fraction of adaptive routing advantages with deterministic routing
Future work: address limitations of deterministic spreading for some traffic patterns