George Michelogiannakis, Prof. William J. Dally Concurrent architecture & VLSI group Stanford University Elastic Buffer Flow Control for On-chip Networks.


1 Elastic Buffer Flow Control for On-chip Networks
George Michelogiannakis, Prof. William J. Dally
Concurrent Architecture & VLSI Group, Stanford University

2 The PPL Vision
[Figure: layered diagram of the PPL stack. Recoverable labels:]
• Applications: Virtual Worlds, Personal Robotics, Data Informatics, Scientific Engineering
• Domain Specific Languages: Physics (Liszt), Scripting, Probabilistic (RandomT), Machine Learning (OptiML), Rendering
• DSL Infrastructure:
  - Domain Embedding Language (Scala): Polymorphic Embedding, Staging, Static Domain Specific Opt.
  - Parallel Runtime (Delite, Sequoia, GRAMPS): Dynamic Domain Spec. Opt., Task & Data Parallelism, Locality Aware Scheduling
• Heterogeneous Hardware:
  - Hardware Architecture: OOO Cores, SIMD Cores, Threaded Cores, Specialized Cores
  - Programmable Hierarchies, Scalable Coherence, Isolation & Atomicity, On-chip Networks, Pervasive Monitoring

3 In a Nutshell
• Elastic-buffer (EB) flow control uses the channels as distributed FIFOs
  - Input buffers at routers are not needed
• Compared to VC routers:
  - Reduces cycle time by up to 67%
  - Provides 43% more throughput per unit power and 22% more throughput per unit area
  - Makes for a simpler network
• EB uses duplicate subnetworks for traffic isolation
  - For many classes, a hybrid EB-VC router is used instead
  - The hybrid uses buffers only to alleviate severe contention and deadlocks, which increases power efficiency

4 Outline
• Building EB channels
  - The basic building blocks of EB networks
• EB router design
• Deadlock avoidance & congestion sensing
• Evaluation results

5 The Idea
• Use the network channels as distributed FIFOs
• Use that storage instead of input buffers at routers
  - To remove input buffer area and power costs
[Figure: pipelined channel vs. channel as FIFO]

6 Building an Elastic Buffer
• To build an EB in a pipelined channel with master-slave flip-flops (FFs):
  - Use latches for storage by driving their enables independently
[Figure: master-slave FF vs. elastic buffer]

7 How Elastic Buffer Channels Work
• Ready/valid handshake between elastic buffers
  - Ready: At least one free storage slot
  - Valid: Non-empty (driving valid data)
[Figure: channel state shown over cycles 1 to 6]
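The handshake above can be sketched as a small cycle-level simulation. This is a minimal sketch under stated assumptions, not the authors' circuit: each EB is modelled as a two-slot FIFO (standing in for the master and slave latches), all ready/valid signals are sampled at the start of a cycle, and the names `ElasticBuffer`, `step`, and `simulate` are hypothetical.

```python
from collections import deque


class ElasticBuffer:
    """Two-slot FIFO standing in for the master and slave latches of one EB."""
    CAPACITY = 2  # assumption: one slot per latch

    def __init__(self):
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.slots) < self.CAPACITY

    @property
    def valid(self):
        # Valid: non-empty, i.e. driving valid data.
        return len(self.slots) > 0


def step(channel, source, sink_ready):
    """Advance a chain of EBs by one cycle; return the flit delivered, if any."""
    # Sample the handshake signals from the state at the start of the cycle.
    valid = [eb.valid for eb in channel]
    ready = [eb.ready for eb in channel]
    delivered = None
    # The last EB hands its oldest flit to the sink when the sink is ready.
    if valid[-1] and sink_ready:
        delivered = channel[-1].slots.popleft()
    # A flit moves between neighbours when upstream is valid and downstream ready.
    for i in range(len(channel) - 2, -1, -1):
        if valid[i] and ready[i + 1]:
            channel[i + 1].slots.append(channel[i].slots.popleft())
    # The source injects into the first EB under the same rule.
    if source and ready[0]:
        channel[0].slots.append(source.popleft())
    return delivered


def simulate(num_ebs, flits, blocked_cycles, total_cycles):
    """Run the channel with the sink blocked for the first `blocked_cycles` cycles."""
    channel = [ElasticBuffer() for _ in range(num_ebs)]
    source = deque(flits)
    delivered = []
    for cycle in range(total_cycles):
        out = step(channel, source, sink_ready=cycle >= blocked_cycles)
        if out is not None:
            delivered.append(out)
    return delivered
```

While the sink is blocked, the channel fills up to its total capacity and then back-pressures the source; once the sink becomes ready, flits drain in their original order.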

8 Outline
• Building EB channels
• EB router design
  - The implications in router design
• Deadlock avoidance & congestion sensing
• Evaluation results

9 Use EB Flow-Control Through the Router
[Figure: VC input-buffered router vs. EB router]
• Input buffer replaced by input EB
• VC & SW allocators removed. Per-output arbiters instead.
• Three-slot output EB to cover for arbitration done one cycle in advance.
• LA routing also applicable to EB networks.
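To illustrate what a per-output arbiter does in place of the VC and switch allocators, here is a minimal sketch. The slide does not state the arbitration policy, so round-robin is an assumption chosen for illustration, and the class name `RoundRobinArbiter` is hypothetical. One such arbiter would sit at each output and pick among the inputs requesting it.

```python
class RoundRobinArbiter:
    """One arbiter per output port; grants one requesting input per cycle.

    Assumption: round-robin priority. The slides only say "per-output arbiters".
    """

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        # Start so that input 0 has the highest priority on the first grant.
        self.last = num_inputs - 1

    def grant(self, requests):
        """Return the index of the granted input, or None if nobody requests."""
        for offset in range(1, self.num_inputs + 1):
            i = (self.last + offset) % self.num_inputs
            if requests[i]:
                self.last = i  # the winner gets the lowest priority next time
                return i
        return None
```

Because each output decides independently, no global matching between inputs and outputs is computed, which is the simplification the slide points to.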

10 Two Improved Router Designs
• Enhanced two-stage
  - Fixes baseline design's main inefficiencies
  - Prioritizes cycle time
• Single-stage
  - Removes pipelining overhead
  - Prioritizes latency

11 Outline
• Building EB channels
• EB router design
• Deadlock avoidance & congestion sensing
  - How to provide traffic classes
• Evaluation results

12 Deadlock Avoidance
• No input buffers, so no virtual channels
• Can provide traffic isolation with duplicate physical channels
  - Duplicating subnetworks is most efficient due to the crossbar's quadratic cost
  - That is only true for up to a certain number of classes
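The quadratic-cost argument can be made concrete with a short worked comparison. This is a sketch under assumptions not in the slides: $P$ is the number of router ports, $N$ the number of traffic classes, $c$ a technology constant, and only the crossbar term is counted. It compares $N$ duplicated subnetworks against a single router whose crossbar has $N$ physical channels per port.

```latex
\begin{align*}
C_{\text{xbar}}(P) &= c\,P^{2} \\
C_{\text{duplicated subnetworks}} &= N \cdot c\,P^{2} \\
C_{\text{one wide router}} &= c\,(N P)^{2} = N^{2} \cdot c\,P^{2} \\
\frac{C_{\text{one wide router}}}{C_{\text{duplicated subnetworks}}} &= N
\end{align*}
```

Under these assumptions the single wide crossbar costs $N$ times more than the duplicated subnetworks. Costs outside the crossbar are ignored here, which is one reason the comparison need not hold for every number of classes.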

13 Hybrid EB-VC Router
• For many classes, have an input buffer to drain flits after a predefined number of blocking cycles
• Thus, buffer is used only to alleviate heavy contention and resolve deadlocks
  - In the common case, as energy efficient as EB networks
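The drain rule described above can be sketched as a per-input counter. This is a minimal sketch, not the authors' design: the class name `HybridInput`, the `threshold` and `buffer_slots` parameters, and the choice to reset the counter after a drain are all assumptions made for illustration.

```python
from collections import deque


class HybridInput:
    """Input port that uses its buffer only after prolonged blocking."""

    def __init__(self, threshold, buffer_slots):
        self.threshold = threshold        # predefined number of blocking cycles
        self.buffer_slots = buffer_slots  # hypothetical input buffer size
        self.blocked_cycles = 0
        self.buffer = deque()

    def tick(self, flit, can_advance):
        """Process one cycle; return 'idle', 'forward', 'wait' or 'drain'."""
        if flit is None:
            self.blocked_cycles = 0
            return "idle"
        if can_advance:
            # Common case: the flit moves on as in a plain EB network.
            self.blocked_cycles = 0
            return "forward"
        self.blocked_cycles += 1
        has_space = len(self.buffer) < self.buffer_slots
        if self.blocked_cycles >= self.threshold and has_space:
            # Heavy contention or a potential deadlock: drain into the buffer.
            self.buffer.append(flit)
            self.blocked_cycles = 0  # assumption: counting restarts after a drain
            return "drain"
        return "wait"
```

With `threshold=3`, a flit that is blocked for three consecutive cycles is drained on the third one, while a flit that can advance never touches the buffer.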

14 Output Channel Occupancy Load Metric
• Flit-buffered networks use credit count
• EB networks measure output channel occupancy
  - At a certain segment of the output channel (shown in red)
  - Occupancy decremented when flits leave that segment
  - Incremented by a packet's length when routing decision is made. Packets see other decisions in same cycle
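The bookkeeping behind this metric can be sketched with one counter per output. This is a minimal sketch under assumptions: the class name `OutputOccupancy`, its method names, and the least-loaded selection rule are hypothetical and only illustrate how the counters could feed a routing decision.

```python
class OutputOccupancy:
    """Per-output occupancy counters used as a congestion estimate."""

    def __init__(self, num_outputs):
        self.count = [0] * num_outputs

    def on_route(self, output, packet_length):
        # Incremented by the packet's length when the routing decision is made.
        self.count[output] += packet_length

    def on_flit_leaves_segment(self, output):
        # Decremented when a flit leaves the monitored channel segment.
        if self.count[output] > 0:
            self.count[output] -= 1

    def least_loaded(self, candidates):
        # Hypothetical selection rule: pick the candidate with the lowest count.
        return min(candidates, key=lambda o: self.count[o])


def route_packets(occupancy, packets):
    """Route packets one after another within the same cycle.

    Each packet is a (candidate_outputs, length) pair. Counters are updated
    immediately, so later packets see the earlier decisions of the same cycle.
    """
    choices = []
    for candidates, length in packets:
        out = occupancy.least_loaded(candidates)
        occupancy.on_route(out, length)
        choices.append(out)
    return choices
```

For example, two four-flit packets that can both use outputs 0 and 1 are spread across the two outputs, because the second packet already sees the first packet's increment.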

15 Outline
• Building EB channels
• EB router design
• Deadlock avoidance & congestion sensing
• Evaluation results
  - Let's talk numbers

16 Throughput-Power Mesh (Baseline Router)
[Figure: throughput vs. power plot, annotated "Throughput gain"]
• EB network improvement:
  - Same power: 10% increased throughput
  - Same throughput: 12% reduced power
• EB: 18% lower cycle time. Not taken into account.

17 Router RTL Implementation
• No buffers, VCs, allocators, credits
  - VC router had look-ahead routing
• Buffers: FF arrays. 2 VCs, 8 slots each

Aspect        VC router   EB router   Savings
Area (μm²)    63,515      14,730      77%
Clock (ns)    3.3         2.7         18%
Power (mW)    2.59        0.12        95%

45nm, LP-CMOS, worst-case. Mesh 5x5 routers. DOR. 64-bit datapath.
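The savings column follows directly from the two router columns. The short check below recomputes it from the values on the slide; the function name `savings_percent` and the dictionary layout are illustrative choices.

```python
def savings_percent(vc_value, eb_value):
    """Relative reduction of the EB router versus the VC router, in percent."""
    return round((vc_value - eb_value) / vc_value * 100)


# Values taken from the slide: (VC router, EB router).
rtl_results = {
    "Area (um^2)": (63515, 14730),
    "Clock (ns)": (3.3, 2.7),
    "Power (mW)": (2.59, 0.12),
}

savings = {aspect: savings_percent(vc, eb) for aspect, (vc, eb) in rtl_results.items()}
for aspect, pct in savings.items():
    print(f"{aspect}: {pct}%")
```

The recomputed percentages match the 77%, 18%, and 95% reported in the table.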

18 Router Comparison
• Baseline: 9% less energy than single-stage, 35% less than enhanced
• Enhanced: 26% lower cycle time than single-stage, 42% lower than baseline

19 Hybrid EB-VC Comparison
• Cycle time comparable to VC, not EB routers
• Hybrid offers 21% more throughput per unit power than VC, and 12% more than EB
• The VC network offers 41% more throughput per unit area; the EB network, 49%

20 Conclusions
• EB flow control uses channels as distributed FIFOs
  - Uses the pipeline flip-flops that are required anyway
  - Removes input buffers from routers
• Provides 43% more throughput per unit power and 22% more throughput per unit area
  - Depends on what fraction of the cost the input buffers are
• Reduces cycle time by up to 67%
• Hybrid EB-VC router provides a large number of classes
  - Input buffer is used only when it has to be
  - 21% more throughput per unit power than VC
• Remove buffers, keep buffering. Elastic buffers!

21 Questions?

