Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips
Hiroki Matsutani, Michihiro Koibuchi, Yutaka Yamada, Jouraku Akiya, Hideharu Amano


1 Non-Minimal Routing Strategy for Application-Specific Networks-on-Chips
Hiroki Matsutani, Michihiro Koibuchi, Yutaka Yamada, Jouraku Akiya, Hideharu Amano
Affiliations (as listed on the slide): Keio Univ., National Institute of Informatics, Toshiba RDC, Keio Univ.

2 Network-on-Chip (NoC)
Tile-based multi-core
- Core: execution
- Router: packet delivery
Examples
- RAW: 2D mesh
- ACM: tree
- aSoC: 2D mesh
[Taylor, Micro2002] [Liang, TVLSI2004] [Furtek, FPL2004]
Figure: 3×3 array of tiles numbered 0 to 8; each tile is a RISC core, RAM, or I/O.

3 Network-on-Chip (NoC)
Tile-based multi-core
- Core: execution
- Router: packet delivery
Examples
- RAW: 2D mesh
- ACM: tree
- aSoC: 2D mesh
[Taylor, Micro2002] [Liang, TVLSI2004] [Furtek, FPL2004]
Figure: one tile, consisting of a MIPS core, memory, and a router.

4 Network-on-Chip (NoC)
SoCs are growing -> the NoC is one of the scalable on-chip interconnects.
Advantages (○)
- Better wiring delay
  - Global wiring
  - Limited-length links
- Improved modularity
  - Standard network I/F
Drawback (×)
- Overhead
Figure: 3×3 array of tiles numbered 0 to 8; each tile is a RISC core, RAM, or I/O.

5 Stream Processing ~ Simulation ~
Target applications: MPEG, JPEG, Viterbi
- System-level design
Models, from high abstraction to detailed design
- UTF (UnTimed Functional) model: Module(a) passes data to Module(b); no clock for execution
- BCA (Bus Cycle Accurate) model: Module(a) passes data to Module(b); communication is cycle accurate (clocked)
- RTL model
The application is divided into tasks based on simulation.

6 Stream Processing ~ Map, Route ~
Shared links
- Link congestion -> throughput is degraded
Optimization (in general)
- Mapping: minimum communication length
- Routing: minimal paths
Strong access locality!
Paths are too short for minimal routing to distribute congestion.
Figure: task flow graph of tasks (1) to (4), mapped onto the physical tiles of the NoC.

7 Existing Routing ~ Is a non-minimal path useful? ~
Common features of SAN and NoC
- Packet delivery: wormhole (WH) switching
- Deadlock freedom: Turn-Model, …
Feature of SAN
- Various applications, various traffic patterns
- Non-minimal paths lead to an unstable state
Feature of NoC
- Fixed application, fixed traffic patterns
- System-level simulation
Predictable communication -> load balancing with non-minimal paths
[Ho, HPCA2003]

8 Flee ~ Non-minimal routing strategy ~
Stream processing in NoCs
- Strong access locality!
- Paths are too short to distribute congestion
Increase the number of alternative paths by introducing non-minimal paths.
Non-minimal paths are basically inefficient…
Partially non-minimal paths
Path establishment based on traffic amount
- Heavy-traffic communication -> minimal path
- Light-traffic communication -> avoid congestion

9 Flee ~ Traffic Pattern Analysis ~
Traffic pattern
# time, src, dst, size
10000 (0) (1) 32
10000 (0) (2) 4
10000 (0) (3) 4
10010 (1) (2) 32
10010 (0) (1) 32
10010 (0) (2) 4
10010 (0) (3) 4
10020 (2) (3) 32
10020 (1) (2) 32
10030 (2) (3) 4
Traffic analysis
1. For each src-dst pair, total the packet sizes. E.g., src-dst pair (0,1): 32 + 32 -> 64
2. Sort in descending order of TotalSize.
Analysis record
# src -> dst, TotalSize
(0) -> (1) 8192
(1) -> (2) 8192
(2) -> (3) 8192
(0) -> (2) 1024
(0) -> (3) 1024
…
The src-dst pair with the largest TotalSize (the heaviest) is on the first line.
Each src-dst pair gets a path in the order of the analysis record.
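The two analysis steps above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the authors' tool: the function name `analyze_traffic`, the tuple layout of the trace, and the tie-breaking rule (pairs with equal totals are ordered by node numbers) are all hypothetical. The input is the ten-line trace fragment shown on the slide.

```python
from collections import defaultdict

def analyze_traffic(trace):
    """Total the packet sizes per (src, dst) pair, then sort heaviest first.

    `trace` is a list of (time, src, dst, size) tuples (assumed layout).
    Ties are broken by (src, dst); the slides do not specify a tie rule.
    """
    totals = defaultdict(int)
    for _time, src, dst, size in trace:
        totals[(src, dst)] += size
    # Descending TotalSize; equal totals fall back to ascending (src, dst).
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Trace fragment from the slide: (time, src, dst, size)
trace = [
    (10000, 0, 1, 32), (10000, 0, 2, 4), (10000, 0, 3, 4),
    (10010, 1, 2, 32), (10010, 0, 1, 32), (10010, 0, 2, 4),
    (10010, 0, 3, 4), (10020, 2, 3, 32), (10020, 1, 2, 32),
    (10030, 2, 3, 4),
]

record = analyze_traffic(trace)
for (src, dst), total in record:
    print(f"({src}) -> ({dst}) {total}")
```

For this fragment the pair (0,1) totals 32 + 32 = 64, matching the example on the slide; the larger totals in the slide's analysis record (8192, 1024) come from a longer trace.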

10 Flee ~ Establishing Paths ~
Analysis record (analysis result)
# src -> dst, TotalSize
(0) -> (1) 8192
(1) -> (2) 8192
(2) -> (3) 8192
(0) -> (2) 1024
(0) -> (3) 1024
…
Each link has a "cost". In order of traffic amount, for each src-dst pair:
- Search for the lowest-cost path
- Increase the cost of the links selected
A link with a high cost is a hotspot…
Paths are assigned so as not to disturb previously established paths.
There will be several alternative paths…
Figure: tiles (0), (1), (2), (3).
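The path-establishment loop can be sketched as follows. The slides state only that each link has a cost, that the lowest-cost path is searched in order of traffic amount, and that the cost of the selected links is increased. Everything else here is an assumption made for illustration: the initial cost of 1 per link, adding the pair's TotalSize to each selected link, the use of Dijkstra's algorithm for the search, the 3×2 mesh, and the function names. Deadlock avoidance (e.g. a turn model) is not handled in this sketch.

```python
import heapq

def mesh_links(width, height):
    """Directed links of a width x height mesh, each with initial cost 1 (assumed)."""
    cost = {}
    for y in range(height):
        for x in range(width):
            u = y * width + x
            if x + 1 < width:
                cost[(u, u + 1)] = cost[(u + 1, u)] = 1
            if y + 1 < height:
                cost[(u, u + width)] = cost[(u + width, u)] = 1
    return cost

def lowest_cost_path(cost, src, dst):
    """Dijkstra's algorithm over the current link costs."""
    adj = {}
    for u, v in cost:
        adj.setdefault(u, []).append(v)
    best, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > best[u]:
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + cost[(u, v)]
            if nd < best.get(v, float("inf")):
                best[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def establish_paths(cost, record):
    """Assign paths heaviest pair first, raising the cost of every link used."""
    paths = {}
    for (src, dst), total in record:
        path = lowest_cost_path(cost, src, dst)
        for u, v in zip(path, path[1:]):
            cost[(u, v)] += total  # assumed increment: the pair's TotalSize
        paths[(src, dst)] = path
    return paths

# 3x2 mesh: tiles 0 1 2 on the top row, 3 4 5 on the bottom row.
record = [((0, 1), 8192), ((1, 2), 8192), ((0, 2), 1024)]
paths = establish_paths(mesh_links(3, 2), record)
print(paths)
```

In this toy setup the two heavy pairs take their one-hop minimal paths, while the light pair (0) -> (2) is steered around the two loaded links via the bottom row (0, 3, 4, 5, 2), a four-hop non-minimal path instead of the two-hop minimal one.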

11 Simulation Environments
Router model
- 4 ports for adjacent routers
- 1 port for the core
Network topology
- 4×4 mesh
- 4×4 torus
Simulation parameters
- Packet size: 259 flits (2-flit header)
- Switching method: wormhole switching
- Number of virtual channels: mesh 1, torus 2
- Simulation time: 1,000,000 cycles
Figure: 16-node 2D mesh, nodes numbered 0 to 15, each with a router and a core.

12 Applications for Evaluation
Application traces
- Viterbi decoder
- JPEG codec
- IPsec
- Uniform
Tile mapping example of the JPEG codec (the original figure marks decoder tiles and encoder tiles differently)
(0) Header Analysis
(1) Huffman Decode
(2) Inverse Quant.
(3) I-DCT for Row
(4) (unlabeled)
(5) Yuv-rgb Convert
(6) MCU Mapping
(7) I-DCT for Col
(8) Rgb-yuv Convert
(9) MCU Sampling
(10) I-DCT for Col
(11) I-DCT for Row
(12) (unlabeled)
(13) Stream Gen.
(14) Huffman Code
(15) Quant.

13 Results ~ Viterbi @ 2D Mesh ~
Plot: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle]
- Flee: avg. hop count 2.52
- DOR (Dimension-Order Routing): avg. hop count 1.84
14.2% improved
Communication in the Viterbi trace includes fork and join.

14 Results ~ Viterbi @ 2D Torus ~
Plot: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle]
- Flee: avg. hop count 1.87
- DOR (Dimension-Order Routing): avg. hop count 1.48
22.2% improved: Flee improves throughput by 22.2% with non-minimal paths.
Communication in the Viterbi trace includes fork and join.

15 Results ~ JPEG @ 2D Mesh ~
Plot: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle]
- Flee: avg. hop count 1.01
- DOR (Dimension-Order Routing): avg. hop count 1.00
No difference
In the JPEG trace, data is processed sequentially; there is no fork-and-join pattern.
Communication is between neighbors -> no need for non-minimal paths

16 Results ~ Effect of Traffic Analysis ~ (Viterbi @ 2D Mesh)
Plot: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle]
- Flee: data amount known
- Flee (Incomplete): data amount unknown -> every data transfer size is "1"
Incomplete Flee: not improved

17 Results ~ Effect of Traffic Analysis ~ (Viterbi @ 2D Torus)
Plot: X-axis: accepted traffic [flit/cycle/node]; Y-axis: latency [cycle]
- Flee: data amount known
- Flee (Incomplete): data amount unknown -> every data transfer size is "1"
Incomplete Flee: partially improved
Communication size is the key factor in improving performance.

18 Summary ~ Non-minimal routing strategy ~
Stream processing in NoCs
- Strong access locality!
- Paths are too short to distribute congestion
Flee: non-minimal routing strategy
- Heavy-traffic communication -> minimal paths
- Light-traffic communication -> avoid congestion
Increases the number of alternative paths by introducing non-minimal paths
Improves throughput by up to 22.2% (Viterbi on 2D torus)

19 Thank you for listening

