Efficient Microarchitecture for Network-on-Chip Routers


1 Efficient Microarchitecture for Network-on-Chip Routers
Concurrent VLSI Architecture Group
Daniel U. Becker
PhD Oral Examination
8/21/2012

2 Outline
INTRODUCTION
Allocator Implementations
Buffer Management
Infrastructure
Conclusions
8/21/12 Efficient Microarchitecture for NoC Routers

3 Networks-on-Chip
Moore’s Law alive & well
Many cores per chip
Must work together
Networks-on-Chip (NoCs) aim to provide scalable, efficient communication fabric
[Figure labels: Chip, Core]

4 Why Does the Network Matter?
Performance: latency, throughput, fairness and QoS
Cost: die area, wiring resources, design complexity, power & energy efficiency
[Harting et al., “Energy and Performance Benefits of Active Messages”]

5 Optimizing the Network
Applications & programming models
Communication primitives
Topologies & Routing
Flow control
Router microarchitecture
Circuit design

6 Router Microarchitecture Overview
[Figure labels: Part 1, Part 2]
[Peh and Dally: “A Delay Model for Router Microarchitectures”]

7 Outline
Introduction
ALLOCATOR IMPLEMENTATIONS
Buffer Management
Infrastructure
Conclusions
[Becker and Dally: “Allocator Implementations for Network-on-Chip Routers,” SC’09]

8 Allocators
Fundamental part of router control logic
Manage access to network resources
Orchestrate flow of packets through router
Affect network utilization
Potentially affect cycle time

9 Virtual Channel Allocation
Virtual channels (VCs) allow multiple packets to be interleaved on physical channels
Similar to lanes on a highway, allow traffic blocks to be bypassed
Before packets can use network channel, need to claim ownership of a VC
VC allocator assigns output VCs to waiting packets

10 Sparse VC Allocation
[Figure: requests from input VCs (IVC) to output VCs (OVC); single input port shown]
8 VCs: 64 requests; P×8 requests
2×4 VCs (REQ, REP): 32 requests; P×4 requests each
2×2×2 VCs (NM and MIN within REQ and REP): 24 requests; NM P×2 requests, MIN P×4 requests
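The request counts on this slide (64, 32, 24) can be reproduced with a small counting sketch. It rests on assumptions that the slide does not spell out: REQ and REP are two message classes that never mix, NM and MIN are two resource classes, a packet in NM may request an output VC in NM or MIN, and a packet in MIN may only request MIN. These transition rules and all names below are illustrative; they were chosen because they match the figure's numbers.

```python
from itertools import product

def count_requests(msg_classes, res_classes, vcs_per_class, allowed):
    """Count legal (input VC, output VC) pairs for one input/output port pair.

    A VC is identified by (message class, resource class, index in class).
    """
    vcs = list(product(range(msg_classes), range(res_classes), range(vcs_per_class)))
    return sum(1 for ivc, ovc in product(vcs, vcs) if allowed(ivc, ovc))

def any_to_any(ivc, ovc):
    # Canonical allocator: every input VC may request every output VC.
    return True

def same_message_class(ivc, ovc):
    # Assumed rule: a packet stays within its message class (REQ or REP).
    return ivc[0] == ovc[0]

def sparse(ivc, ovc):
    # Assumed rule: resource class 0 (NM) may move to 0 or 1 (MIN); MIN stays MIN.
    return ivc[0] == ovc[0] and ovc[1] >= ivc[1]

canonical = count_requests(1, 1, 8, any_to_any)          # 8 VCs
by_message = count_requests(2, 1, 4, same_message_class) # 2x4 VCs
by_resource = count_requests(2, 2, 2, sparse)            # 2x2x2 VCs

print(canonical, by_message, by_resource)
```

Under these assumptions the three configurations yield 64, 32 and 24 request pairs, which is the reduction the sparse allocator exploits.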

11 VC Allocator Delay
[Figure: VC allocator delay for 5 ports, 2x1 VCs and 5 ports, 2x2 VCs; annotations: -58%, Canonical design, -30%, -40%, -30%]

12 VC Allocator Area
[Figure: VC allocator area for 5 ports, 2x1 VCs and 5 ports, 2x2 VCs; annotations: -78%, 31800, -50%, -78%, -60%]

13 Switch Allocation
Once a VC is allocated, packet can be forwarded
Broken down into flits
For each flit, must request crossbar access
Switch allocator generates crossbar schedule
[Figure labels: inputs, outputs]
[Enright Jerger and Peh, “On-Chip Networks”]

14 Speculative Switch Allocation
Reduce pipeline latency by attempting switch allocation in parallel with VC allocation
Speculate that VC will be assigned!
But mis-speculation wastes crossbar bandwidth
Must prioritize non-speculative requests

15 Pessimistic Speculation
Speculation matters most when network is lightly loaded
At low network load, most requests are granted
Idea: Assume all non-spec. requests will be granted!
[Figure labels: non-spec. requests, nonspec. allocator, nonspec. grants, conflict detection, spec. requests, spec. allocator, spec. grants, mask]
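A minimal sketch of the masking idea follows. It assumes that a conflict means sharing an input port or an output port with any non-speculative request, and it represents requests and grants as sets of (input port, output port) pairs; the function name and data layout are hypothetical and not taken from the slides.

```python
def pessimistic_mask(nonspec_requests, spec_grants):
    """Drop every speculative grant that could collide with a non-speculative request.

    Pessimistic assumption: each non-speculative request will be granted, so the
    input and output ports it names are treated as taken before the
    non-speculative allocator has produced its grants.
    """
    busy_inputs = {i for i, _ in nonspec_requests}
    busy_outputs = {o for _, o in nonspec_requests}
    return {
        (i, o)
        for (i, o) in spec_grants
        if i not in busy_inputs and o not in busy_outputs
    }

# Two non-speculative requests compete for output 1.
nonspec_requests = {(0, 1), (2, 1)}
# Grants produced independently by the speculative allocator.
spec_grants = {(1, 1), (3, 2), (0, 3)}

surviving = pessimistic_mask(nonspec_requests, spec_grants)
print(surviving)
```

In this example only the grant (3, 2) survives: (1, 1) shares output 1 and (0, 3) shares input 0 with a non-speculative request. Because the mask depends only on requests, it can be computed without waiting for the non-speculative grants, at the price of discarding some speculative grants that would have been safe.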

16 Performance with Speculation
[Figure: Mesh, 2 VCs; UR traffic; annotations: <2%, -21% zero-load latency]

17 Area and Delay Impact
[Figure: Full router; Mesh, 2 VCs; TSMC 45nm GP; annotations: +16% max. clock freq., -13%, 1.2 GHz, -5%, 1 GHz]

18 Additional Contributions
Fast loop-free wavefront allocators
Priority-based speculation
Practical combined VC and switch allocation
Details in thesis

19 Summary
Sparse VC allocation exploits traffic classes to reduce VC allocator complexity
  Reduces delay by 30-60%, area by 50-80%
  No change in functionality
Pessimistic speculation reduces overhead for speculative switch allocation
  Reduces overall router area by up to 13%
  Reduces critical path delay by up to 14%
  Trade for some throughput loss near saturation

20 Outline
Introduction
Allocator Implementations
BUFFER MANAGEMENT
Infrastructure
Conclusions
[Becker et al.: “Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks,” to appear in ICCD’12]

21 Buffer Cost
[Wang et al.: “Power-driven Design of Router Microarchitectures in On-chip Networks”]

22 Buffer Management
Many designs divide buffer statically among VCs
  Assign each VC its fair share
But optimal buffer organization depends on load
  Low load favors deep VCs
  High load favors many VCs
  For fixed buffer size, static schemes must pick one or the other
Improve utilization by allowing buffer space to be shared among VCs

23 Buffer Management Performance
[Figure: linked-list based scheme; harmonic mean across traffic patterns; annotations: -18%, +8%, -28%]

24 Buffer Monopolization
Congestion leads to buffer monopolization
Uncongested traffic sees reduced buffer space
Increases latency, reduces throughput
Congestion spreads across VCs!

25 Adaptive Backpressure
Avoid unproductive use of buffer space
Impose quotas on outstanding credits
Share freely under benign conditions
Limit sharing to avoid performance pathologies
Vary backpressure based on demand

26 Buffer Quota Heuristic
Goal: Set quota values just high enough to support observed throughput for each VC
  Allow credit stalls that overlap with other stalls
  Drain unproductive buffer occupancy
Difficult to measure throughput directly
  Instead, infer from credit round trip times
  In absence of congestion, set quota to RTT
  For each downstream stall cycle, reduce by one
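The quota rule can be written down as a short sketch. It assumes that the number of downstream stall cycles is inferred as the difference between the observed credit round trip time and the congestion-free one (the Tcrt,0 and Tcrt,0+Tstall labels in the motivation slides suggest this). The function name, the parameter names and the lower bound `min_quota` are assumptions made for illustration only.

```python
def buffer_quota(base_rtt, observed_rtt, min_quota=1):
    """Return the credit quota for one VC, in flits.

    base_rtt     -- credit round trip time without congestion, in cycles
    observed_rtt -- most recently measured credit round trip time, in cycles
    min_quota    -- assumed floor so the VC can always make progress
    """
    # Extra round trip cycles are attributed to downstream stalls.
    stall_cycles = max(0, observed_rtt - base_rtt)
    # Start from the congestion-free quota and give up one slot per stall cycle.
    return max(min_quota, base_rtt - stall_cycles)

print(buffer_quota(6, 6))   # no congestion: quota equals the round trip time
print(buffer_quota(6, 9))   # three stall cycles: quota shrinks by three
print(buffer_quota(6, 20))  # heavy congestion: quota clamps at the assumed floor
```

With no stalls the VC may keep a full round trip's worth of credits outstanding, which is what it needs to sustain full throughput; each observed stall cycle indicates one buffer slot that would otherwise sit idle downstream.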

27 Buffer Quota Motivation (1)
[Figure: Router 0 and Router 1, shown twice; labels: Tcrt,0, Tcrt,0+Tstall, Tstall, Excess flits]
Congestion causes downstream stall and unproductive buffer occupancy
Full throughput is achieved in steady state

28 Buffer Quota Motivation (2)
[Figure: Router 0 and Router 1, shown twice; labels: Tstall, Excess flit drained, Tidle]
Insufficient credit supply causes idle cycle downstream
Credit stall resolves unproductive buffer occupancy

29 Network Stability
[Figure: tornado traffic; annotation: 6.3x]

30 Traffic Isolation
[Measure zero-load latency increase with background traffic]
[Figure: uniform random foreground traffic; panels: uniform random background traffic, hotspot background traffic; annotations: -38%, -33%]

31 Zero-load Latency with Background
[Figure: 50% uniform random background traffic; annotations: -31%, w/o background]

32 Throughput with Background
[Figure: 50% uniform random background traffic; annotations: -13%, w/o background, 3.3x]

33 Application Performance Setup
Model traffic in heterogeneous CMP
Each node generates two types of traffic:
  PARSEC application traffic models latency-optimized core
  Streaming traffic to memory controllers models array of throughput-optimized cores

34 Application Performance
[Figure: 12.5% injection rate for streaming traffic; annotations: -31%, w/o background]

35 Summary
Sharing improves buffer utilization, but can lead to pathological performance
Adaptive Backpressure minimizes unproductive use of shared buffer space
  Mitigates performance degradation in presence of adversarial traffic
  But maintains key benefits of buffer sharing under benign conditions

36 Infrastructure
Open source NoC router RTL
State-of-the-art router implementation
Highly parameterized
  Topology, routing, allocators, buffers, …
Pervasive clock gating
Fully synthesizable
100 files, >22k LOC of Verilog-2001
Used in research efforts both inside and outside our research group

37 Conclusions
Future large-scale chip multiprocessors will require efficient on-chip networks
Router microarchitecture is one of many aspects that need to be optimized
Allocation has direct impact on router delay and throughput
  By exploiting higher-level properties, we can reduce cost and delay without degrading performance
Input buffers are attractive candidates for optimization
  However, care must be taken to avoid performance pathologies
  By avoiding unproductive use of buffer space, Adaptive Backpressure mitigates undesired interference effects

38 Acknowledgements
Bill
Christos and Kunle
Prof. Nishi
George, Ted, Curt & the rest of the CVA gang


41 Efficient Microarchitecture for NoC Routers
That’s it for today. Thank You!

