Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips Hiroki Matsutani Michihiro Koibuchi Yutaka Yamada Jouraku Akiya Hideharu Amano.

Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips
Hiroki Matsutani, Michihiro Koibuchi, Yutaka Yamada, Jouraku Akiya, Hideharu Amano
Keio Univ., National Institute of Informatics, Toshiba RDC, Keio Univ.

Network-on-Chip (NoC)
Tile-based multi-core (tile: RISC, RAM, I/O)
– Core: execution
– Router: packet delivery
Examples: RAW – 2D mesh; ACM – tree; aSoC – 2D mesh [Taylor, Micro2002] [Liang, TVLSI2004] [Furtek, FPL2004]


Network-on-Chip (NoC)
SoC is growing! NoC is one of the scalable on-chip interconnects.
○ Advantages
– Better wiring delay: global wiring is replaced by limited-length links
– Improved modularity: standard network interface
× Drawback
– Overhead
(Tile: RISC, RAM, I/O)

Stream Processing ~ Simulation ~
Stream applications (MPEG, JPEG, Viterbi) are built by system-level design, moving from high abstraction to detailed design:
– UnTimed Functional (UTF) model: data passes from Module(a) to Module(b) with no clock for execution
– Bus Cycle Accurate (BCA) model: communication between Module(a) and Module(b) is cycle accurate (clocked)
– RTL model
The application is divided into tasks based on this simulation.

Stream Processing ~ Map, Route ~
A task flow graph (tasks (1) to (4)) is mapped onto the physical tiles of the NoC.
Optimization (in general):
– Mapping: minimum communication length
– Routing: minimal paths
Shared links cause link congestion, which degrades throughput.
Strong access locality!! The paths are too short for minimal routing to distribute path congestion.
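
The congestion argument can be made concrete with dimension-order (XY) routing, the DOR baseline used in the results. This is an illustrative sketch, not code from the authors: nodes are assumed to be (x, y) mesh coordinates, and `xy_route` and `link_loads` are hypothetical helper names. Each flow is routed along X first, then Y, and the traffic on each directed link is summed.

```python
from collections import Counter

def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh: move along X first, then along Y.
    Nodes are (x, y) tuples; returns every node visited, src and dst included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def link_loads(flows):
    """Sum the traffic of every flow on each directed link it traverses.
    flows: iterable of (src, dst, amount) tuples."""
    load = Counter()
    for src, dst, amount in flows:
        path = xy_route(src, dst)
        for u, v in zip(path, path[1:]):
            load[(u, v)] += amount
    return load

loads = link_loads([((0, 0), (2, 0), 100), ((1, 0), (2, 1), 100)])
print(loads.most_common(1))  # [(((1, 0), (2, 0)), 200)]
```

In this example the two flows meet on the link (1, 0) → (2, 0). The second flow has another minimal path through (1, 1), but the first flow's only minimal path is the straight row, so minimal routing alone cannot move it off the shared link.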

Existing Routing ~ Is a non-minimal path useful? ~
Packet delivery common to SANs and NoCs:
– WH (wormhole) switching
– Deadlock freedom: turn model, …
Feature of SANs [Ho, HPCA2003]: various applications and various traffic patterns, so non-minimal paths make the network state unstable.
Feature of NoCs: a fixed application with fixed traffic patterns, analyzed by system-level simulation. Communication is predictable, so non-minimal paths can be used for load balancing.
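
The slide lists the turn model as a way to keep wormhole routing deadlock-free. As one example of a turn model (west-first, from Glass and Ni), the sketch below checks whether a mesh path uses a prohibited turn. It is an illustration only: the slides do not say which turn model Flee relies on, and `hop_direction` and `obeys_west_first` are hypothetical names.

```python
# Direction of one hop between adjacent mesh nodes (x grows East, y grows North).
def hop_direction(u, v):
    dx, dy = v[0] - u[0], v[1] - u[1]
    return {(1, 0): "E", (-1, 0): "W", (0, 1): "N", (0, -1): "S"}[(dx, dy)]

def obeys_west_first(path):
    """True if every West hop comes before any other hop.
    Besides the two turns west-first prohibits (N->W and S->W),
    this also rejects an E->W reversal."""
    dirs = [hop_direction(u, v) for u, v in zip(path, path[1:])]
    for prev, nxt in zip(dirs, dirs[1:]):
        if nxt == "W" and prev != "W":
            return False
    return True

print(obeys_west_first([(2, 0), (1, 0), (1, 1), (2, 1)]))  # True: W, then N, then E
print(obeys_west_first([(1, 0), (1, 1), (0, 1)]))          # False: N then W
```

A non-minimal path is acceptable under this rule as long as all of its westward hops are taken first, which is why a turn model still leaves room for detours.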

Flee ~ Non-Minimal Routing Strategy ~
Stream processing in NoCs:
– Strong access locality!!
– Paths are too short to distribute path congestion
→ Increase the number of alternative paths by introducing non-minimal paths.
However, non-minimal paths are basically inefficient…
→ Use only partially non-minimal paths, established according to traffic amount:
– Heavy-traffic communication: minimal path
– Light-traffic communication: path avoiding congestion

Flee ~ Traffic Pattern Analysis ~
Traffic pattern: a trace with fields "# time, src, dst, size" (figure: communication among tasks (0) to (3))
Traffic analysis:
1. For each src-dst pair, total the packet sizes (e.g., src-dst pair (0,1)).
2. Sort the pairs in descending order of TotalSize.
Analysis record (heavy pairs first):
# src → dst, TotalSize
(0) → (1) 8192
(1) → (2) 8192
(2) → (3) 8192
(0) → (2) 1024
(0) → (3) 1024
…
The src-dst pair with the largest TotalSize is on the first line, and each src-dst pair gets a path in the order of the analysis record.
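
The two analysis steps (total per src-dst pair, then sort in descending order) map onto a dictionary accumulation followed by a sort. This is a minimal sketch that assumes a comma-separated trace with the `time, src, dst, size` fields shown on the slide; the exact file format and the name `analyze_trace` are assumptions.

```python
from collections import defaultdict

def analyze_trace(lines):
    """Build the analysis record from trace lines 'time, src, dst, size'.
    Returns [((src, dst), total_size), ...] sorted by total size, largest first."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# time, src, dst, size' header
        _time, src, dst, size = (int(field) for field in line.split(","))
        totals[(src, dst)] += size  # step 1: total the packet sizes per pair
    # step 2: descending order of TotalSize, so the heaviest pair is routed first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

trace = ["# time, src, dst, size", "0, 0, 1, 4096", "5, 0, 2, 1024", "9, 0, 1, 4096"]
print(analyze_trace(trace))  # [((0, 1), 8192), ((0, 2), 1024)]
```

The time field is only parsed and discarded here: the ordering depends on the total amount per pair, not on when the packets are sent.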

Flee ~ Establishing Paths ~
Each link has a "cost". In order of traffic amount (following the analysis record):
– Search for the lowest-cost path
– Increase the cost of the selected links
A link with a high cost is a hotspot, so paths are assigned without disturbing previously established paths; there will be several alternative paths…
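
The path-establishment loop can be sketched as a Dijkstra search repeated in analysis-record order. Several details are assumptions not stated on the slides: every directed link starts at cost 1, a routed pair adds its TotalSize to the cost of each link it uses, nodes are (x, y) mesh coordinates, and deadlock freedom (for example via a turn model) is not enforced. The function names are hypothetical.

```python
import heapq
from collections import defaultdict

def mesh_adjacency(width, height):
    """Neighbours of every (x, y) node in a width x height 2D mesh."""
    adj = defaultdict(list)
    for x in range(width):
        for y in range(height):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    adj[(x, y)].append((nx, ny))
    return adj

def lowest_cost_path(adj, cost, src, dst):
    """Dijkstra over directed links; cost[(u, v)] is the current cost of u -> v.
    Non-minimal detours are taken whenever they are cheaper."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + cost[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def establish_paths(width, height, record, base_cost=1):
    """Route each pair of the (already sorted) record, heaviest first.
    Assumed update rule: each used link's cost grows by the pair's TotalSize."""
    adj = mesh_adjacency(width, height)
    cost = defaultdict(lambda: base_cost)
    routes = {}
    for (src, dst), total in record:
        path = lowest_cost_path(adj, cost, src, dst)
        for u, v in zip(path, path[1:]):
            cost[(u, v)] += total
        routes[(src, dst)] = path
    return routes

record = [(((0, 0), (1, 0)), 8192), (((0, 0), (2, 0)), 1024)]
routes = establish_paths(3, 2, record)
print(len(routes[((0, 0), (2, 0))]) - 1)  # 4 hops, although the minimal path has 2
```

With this update rule the heavy pair (0, 0) → (1, 0) keeps its one-hop minimal path, while the lighter pair (0, 0) → (2, 0) gets a 4-hop non-minimal path around the now expensive link, in line with the heavy → minimal, light → avoid congestion policy. A practical version would also restrict the search to turns that keep wormhole routing deadlock-free.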

Simulation Environments
Router model:
– 4 ports for adjacent routers
– 1 port for the core
Network topology (16 nodes, each a router plus a core):
– 4×4 mesh
– 4×4 torus
Packet size: 259 flits (2-flit header)
Switching method: wormhole switching
# of virtual channels: mesh 1, torus 2
Simulation time: 1,000,000 cycles
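
For the two 4×4 topologies, the effect of the torus wraparound links can be quantified by the average minimal hop count under uniform traffic. This sketch computes it over all ordered pairs of distinct nodes; it is a generic property of the topologies rather than a figure from the slides, and the helper names are hypothetical.

```python
from itertools import product

def min_hops(a, b, k, torus):
    """Minimal hop count between nodes a = (x, y) and b = (x, y) of a k x k mesh or torus."""
    hops = 0
    for ai, bi in zip(a, b):
        d = abs(ai - bi)
        # In a torus the wraparound link may give a shorter way around the ring.
        hops += min(d, k - d) if torus else d
    return hops

def mean_hops_uniform(k, torus):
    """Average minimal hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    dists = [min_hops(a, b, k, torus) for a in nodes for b in nodes if a != b]
    return sum(dists) / len(dists)

print(mean_hops_uniform(4, torus=False))  # 4x4 mesh
print(mean_hops_uniform(4, torus=True))   # 4x4 torus
```

This gives 8/3 ≈ 2.67 hops for the mesh and 32/15 ≈ 2.13 hops for the torus. The application traces have strong locality, so their hop counts are lower than these uniform-traffic values.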

Applications for Evaluation
App. traces:
– Viterbi decoder
– JPEG codec
– IPsec
– Uniform
Tile mapping example of the JPEG codec (decoder and encoder tasks): (0) header analysis, (1) Huffman decode, (2) inverse quant., (3) I-DCT for row, (4)(5) YUV-RGB convert, (6) MCU mapping, (7) I-DCT for col, (8) RGB-YUV convert, (9) MCU sampling, (10) I-DCT for col, (11) I-DCT for row, (12)(13) stream gen., (14) Huffman code, (15) quant.

Results ~ 2D Mesh ~ (Viterbi trace)
– Flee: avg hop count 2.52
– DOR (Dimension-Order Routing): avg hop count 1.84
Flee: 14.2% improved.
Communication in the Viterbi trace includes fork and join.
(Graph: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle])
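
The slides do not say whether the average hop count is a plain mean over src-dst pairs or weighted by traffic amount. The hypothetical helper below computes both, which shows why the distinction matters for Flee: heavy pairs stay on minimal paths, so a traffic-weighted average can remain low even when some light pairs take long detours.

```python
def average_hop_count(routes, traffic=None):
    """Mean hop count of routes {(src, dst): [node, ..., node]}.
    If traffic {(src, dst): amount} is given, each route is weighted by its amount."""
    hops = {pair: len(path) - 1 for pair, path in routes.items()}
    if traffic is None:
        return sum(hops.values()) / len(hops)
    total = sum(traffic[pair] for pair in hops)
    return sum(hops[pair] * traffic[pair] for pair in hops) / total

routes = {(0, 1): [0, 1], (0, 2): [0, 3, 4, 5, 2]}
print(average_hop_count(routes))  # 2.5
```

With traffic amounts of 8192 on the 1-hop route and 1024 on the 4-hop route, the weighted average drops to about 1.33 hops.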

Results ~ 2D Torus ~ (Viterbi trace)
– Flee: avg hop count 1.87
– DOR (Dimension-Order Routing), avg hop count:
Flee improves throughput by 22.2% with non-minimal paths.
Communication in the Viterbi trace includes fork and join.
(Graph: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle])

Results ~ 2D Mesh ~ (JPEG trace)
– Flee: avg hop count 1.01
– DOR (Dimension-Order Routing): avg hop count 1.00
No difference. In the JPEG trace, data is processed sequentially with no fork-and-join pattern; communication is between neighbors, so non-minimal paths are not needed.
(Graph: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle])

Results ~ Effect of Traffic Analysis ~ (2D Mesh)
– Flee: known data amount
– Flee (Incomplete): unknown data amount; all data transfer sizes are set to "1"
Incomplete Flee: not improved.
(Graph: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle])

Results ~ Effect of Traffic Analysis ~ (2D Torus)
– Flee: known data amount
– Flee (Incomplete): unknown data amount; all data transfer sizes are set to "1"
Incomplete Flee: partially improved.
Communication size is a key factor in improving performance.
(Graph: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle])

Summary ~ Non-Minimal Routing Strategy ~
Stream processing in NoCs:
– Strong access locality!!
– Paths are too short to distribute path congestion
Flee, a non-minimal routing strategy, increases the number of alternative paths by introducing non-minimal paths:
– Heavy-traffic communication: minimal paths
– Light-traffic communication: paths avoiding congestion
Flee improves throughput by 22.2%.

Thank you for listening.