International Symposium on Low Power Electronics and Design NoC Frequency Scaling with Flexible-Pipeline Routers Pingqiang Zhou, Jieming Yin, Antonia.


1 NoC Frequency Scaling with Flexible-Pipeline Routers
International Symposium on Low Power Electronics and Design
Pingqiang Zhou, Jieming Yin, Antonia Zhai, and Sachin S. Sapatnekar
University of Minnesota – Twin Cities

2 NoC dissipates substantial system energy
[Figure: Tile-Based Multicore System; tile labels C, L1, L2, R, with MEM]
RAW – 36%; Intel 80-tile – 28% [Vangal et al. 2008]

3 VFS and Its Limitations
NoC is
– A potential performance bottleneck
– A source of energy consumption
Designed for diverse traffic patterns
VFS (voltage and frequency scaling) to reduce energy
Limitations of aggressive VFS
– Reduces throughput
– Increases latency
Works only for a limited set of traffic patterns
Can we make VFS work for other important traffic patterns?
[Figure: workload space with axes Latency (Sensitive / Insensitive) and Throughput (High / Low); labels MEM, Superscalar Machine]

4 Frequency Scaling
[Animation: router pipeline timing at Frequency = F and at Frequency = 0.5F]
Frequency scaling harms performance

5 Reconfigure Pipeline
[Animation: pipeline stages 1–4 reconfigured at Frequency = 0.5F]
Flexible pipeline can reduce router pipeline delay
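
The effect shown in these two animations can be put into numbers with a small sketch. It assumes an idealized router whose per-hop latency is the number of pipeline stages times the clock period, with T the clock period at frequency F. The function name and the choice of a 2-stage configuration at 0.5F are illustrative assumptions, not details taken from the slides.

```python
def router_latency(num_stages: int, slowdown: float, base_period: float = 1.0) -> float:
    """Per-hop router latency in units of the base clock period T.

    Idealized assumption: every stage takes exactly one clock cycle, and
    slowing the clock by `slowdown` stretches each cycle to slowdown * T.
    """
    return num_stages * slowdown * base_period


baseline = router_latency(num_stages=4, slowdown=1.0)  # 4 stages at F
scaled = router_latency(num_stages=4, slowdown=2.0)    # 4 stages at 0.5F
flexible = router_latency(num_stages=2, slowdown=2.0)  # 2 merged stages at 0.5F

print(baseline, scaled, flexible)
```

Under these assumptions, halving the frequency doubles the latency of the fixed 4-stage pipeline, while the merged 2-stage pipeline at 0.5F matches the baseline latency.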

6 Flexible Pipeline Routers
+ Reduce NoC energy
+ Negligible performance degradation
Reduce frequency without increasing router latency
Target application
– Low throughput
– Latency sensitive
[Figure: workload space with axes Latency (Sensitive / Insensitive) and Throughput (High / Low)]

7 Outline
– Background/Motivation
– Router Design
– Experimental Results
– Related Work
– Conclusion

8 Baseline Router Architecture
[Figure: router block diagram with input ports, Input Controller (BW/RC), Route Computation, VC Allocator (VA), Switch Allocator (SA), Crossbar Switch (ST), output ports, and buffers labeled MC 1, VC 1 through MC n, VC 1]
Pipeline stages: BW, RC, VA, SA, ST
How to reconfigure pipeline?

9 Pipeline Stage Delay

Stage    Delay
BW+RC    100 τ
VA       65.5 τ
SA       77.7 τ
ST       45 τ

Delay of 4-stage pipeline: T_clk = 72.1 τ
Time-borrowing
– Boost pipeline frequency
– Average out stage delays
τ: inverter delay
The router delay model is presented in [Peh et al., HPCA 2001].
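
A minimal sketch of the arithmetic behind T_clk = 72.1 τ. It assumes that ideal time-borrowing lets the clock period equal the mean of the stage delays, which is consistent with the slide's numbers; the function names are illustrative.

```python
# Stage delays from the slide, in units of the inverter delay tau.
stage_delays = {"BW+RC": 100.0, "VA": 65.5, "SA": 77.7, "ST": 45.0}


def clock_period_without_borrowing(delays):
    """Without time-borrowing, the slowest stage sets the clock period."""
    return max(delays.values())


def clock_period_with_borrowing(delays):
    """Assumed ideal time-borrowing: the period is the mean stage delay."""
    return sum(delays.values()) / len(delays)


t_worst = clock_period_without_borrowing(stage_delays)
t_avg = clock_period_with_borrowing(stage_delays)
print(f"slowest stage: {t_worst:.1f} tau, averaged: {t_avg:.2f} tau")
```

The averaged period comes out at about 72.05 τ, which the slide reports as 72.1 τ, against 100 τ if the slowest stage (BW+RC) had to set the clock.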

10 Pipeline Reconfiguration
Flex Router: pipeline reconfiguration

Configuration      V_dd     Stage delays                                              T_clk
4-stage pipeline   1.2 V    BW+RC 100 τ_4, VA 65.5 τ_4, SA 77.7 τ_4, ST 45 τ_4        72.1 τ_4
3-stage pipeline   1.0 V    BW+RC 100 τ_3, VA 65.5 τ_3, SA+ST 113.7 τ_3               93.1 τ_3 = 102.1 τ_4
2-stage pipeline   1.0 V    BW+RC 100 τ_2, VA+SA+ST 170.2 τ_2                         135.1 τ_2 = 148.7 τ_4
1-stage pipeline   0.8 V    BW+RC+VA+SA+ST 270.2 τ_1                                  270.2 τ_1 = 337.7 τ_4

How much hardware overhead?
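
The merged-stage delays on this slide can be reproduced with a short sketch. It rests on two assumptions that are inferred here rather than stated in the talk: each stage delay splits into a logic latency t_i plus a setup overhead h (9 τ for VA and SA, 0 for BW/RC and ST, as listed on the Router Delay Model backup slide), and a merged stage pays only the setup overhead of its last sub-stage. The clock period is again taken as the mean stage delay under ideal time-borrowing. All names in the code are illustrative.

```python
# (logic latency t_i, setup overhead h) per stage, in units of tau.
# t_i is back-computed as (slide delay - h); this split is an assumption.
STAGES = {
    "BW+RC": (100.0, 0.0),
    "VA": (65.5 - 9.0, 9.0),
    "SA": (77.7 - 9.0, 9.0),
    "ST": (45.0, 0.0),
}

# Which baseline stages are merged in each pipeline configuration.
CONFIGS = {
    4: [["BW+RC"], ["VA"], ["SA"], ["ST"]],
    3: [["BW+RC"], ["VA"], ["SA", "ST"]],
    2: [["BW+RC"], ["VA", "SA", "ST"]],
    1: [["BW+RC", "VA", "SA", "ST"]],
}


def merged_delay(group):
    """Delay of one merged stage: all logic latencies plus the last stage's setup."""
    logic = sum(STAGES[name][0] for name in group)
    return logic + STAGES[group[-1]][1]


def clock_period(config):
    """Assumed ideal time-borrowing: mean of the merged stage delays."""
    delays = [merged_delay(group) for group in config]
    return sum(delays) / len(delays)


for n, config in CONFIGS.items():
    delays = [round(merged_delay(group), 1) for group in config]
    print(f"{n}-stage: delays {delays}, T_clk = {clock_period(config):.2f} tau_{n}")
```

The periods are expressed in each configuration's own τ_n. Converting them to τ_4 depends on how the inverter delay changes with V_dd, which this sketch does not model.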

11 Architecture Support
[Figure: router datapath (Flits in, Input Controller with buffers, Route Computation, VC Allocator, Switch Allocator, Flits out) and the 4-stage pipeline BW+RC, VA, SA, ST with three blocks labeled R between the stages]

12 Architecture Support
[Figure: the same router datapath with MUXes added alongside the blocks labeled R. Per configuration: 4-stage pipeline R R R; 3-stage pipeline R R MUX; 2-stage pipeline R MUX; 1-stage pipeline MUX]
Less than 2% overhead in router area
+ Control logic

13 Outline
– Background/Motivation
– Router Design
– Experimental Results
– Related Work
– Conclusion

14 Experimental Platform
Simulator
– Full-system simulator: GEMS
– Power module: Wattch & Orion 2.0
– Infrastructure: 8 cores, single-issue, in-order
Benchmarks
– From SPEC OMP2001, NU-Mine and PARSEC
[Figure: tile with MEM, C, L1, L2, R; labeled 1.5 GHz]

15 Efficacy in Network Energy Saving
Base: Baseline Router
Base-2: VFS, Slowdown Factor of 2
Flex-2: VFS + Flexible-Pipeline Router
[Chart: network energy; annotations 41% and 2%]
– Dynamic energy decreases quadratically as voltage goes down
– Clock energy reduction is significant (65%)
– Changes in static energy are minimal
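
The quadratic dependence of dynamic energy on voltage can be illustrated with the standard CMOS switching-energy relation E ∝ C·V_dd². The sketch below assumes unchanged capacitance and switching activity and uses the V_dd values from the Pipeline Reconfiguration slide. The resulting ratios are per-switching-event estimates, not the measured network energy savings shown on the chart.

```python
def dynamic_energy_ratio(vdd: float, vdd_ref: float = 1.2) -> float:
    """Switching energy relative to the reference supply voltage.

    Assumes E ~ C * Vdd^2 with unchanged capacitance and switching activity.
    """
    return (vdd / vdd_ref) ** 2


for vdd in (1.2, 1.0, 0.8):
    ratio = dynamic_energy_ratio(vdd)
    print(f"Vdd = {vdd:.1f} V -> {ratio:.3f} of baseline switching energy")
```

Under this assumption, dropping from 1.2 V to 1.0 V leaves roughly 69% of the switching energy, and dropping to 0.8 V leaves roughly 44%.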

16 Efficacy in Execution Time
Base: Baseline Router
Base-2: VFS
Flex-2: VFS + Flexible-Pipeline Router

Workload        L1 data cache (misses/K instructions)    L2 cache (misses/K instructions)
ammp            13.7                                     4.4
art             40.8                                     18.1
blackscholes    8.1                                      0.9
equake          2.8                                      2.6
fkmeans         1.9                                      1.7
kmeans          2.4                                      1.9

[Chart: execution time; annotation 1.5%. Inset: workload space with axes Latency (Sensitive / Insensitive) and Throughput (High / Low)]
Average system performance degradation is reduced

17 System-Level Evaluation
[Figure: Network Energy / Network Delay tradeoff feeding into System Energy and System Delay]
System-level ED^2 product
– Cores, caches and the interconnection networks
– E: System Energy
– D: System Delay
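
A short sketch of how the ED^2 metric weighs energy against delay. The ratios used below are hypothetical values chosen for illustration and are not results from the talk; the function name is likewise an assumption.

```python
def ed2(energy: float, delay: float) -> float:
    """Energy-delay-squared product E * D^2 (lower is better)."""
    return energy * delay ** 2


# Hypothetical ratios relative to a baseline with E = 1 and D = 1:
# 5% less system energy at the cost of 2% longer execution time.
small_slowdown = ed2(0.95, 1.02)

# The same energy saving with 3% longer execution time.
larger_slowdown = ed2(0.95, 1.03)

print(small_slowdown, larger_slowdown)
```

Because delay enters squared, a 5% energy saving is cancelled once execution time grows by a little under 3%, so small slowdowns can outweigh network energy savings at the system level.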

18 Efficacy in System ED^2 Product
Base: Baseline Router
Base-2: VFS
Flex-2: VFS + Flexible-Pipeline Router
[Chart: system ED^2 product; annotation: ED^2 increase]
Frequency tuning should be based on workloads

19 More Aggressive VFS: Network Energy Saving
Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4
[Chart: network energy; annotations 43% and 39%]
Flexible-Pipeline Router is scalable in reducing network energy

20 More Aggressive VFS: Execution Time
Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4
[Chart: execution time]
Performance degradation increases with more aggressive scaling

21 Limits of VFS: System ED^2 Product
Base: Baseline Router
Flex-2: Flexible-Pipeline Router + Slowdown Factor of 2
Flex-4: Flexible-Pipeline Router + Slowdown Factor of 4
[Chart: system ED^2 product]
– Diminishing returns when pushing the frequency scaling limit
– Workload-dependent

22 Related Work
"A case for dynamic frequency tuning in on-chip networks" [Mishra '09]
Dynamic router VFS for reducing network power consumption
– Flexible-pipeline routers enable more drastic scaling
"A variable-pipeline on-chip router optimized to traffic pattern" [Hirata '10]
Dynamic router VFS + variable-pipeline routers
– Flexible-pipeline routers have lower hardware overhead
– Our work presents system-level evaluation

23 Conclusions
Flexible-Pipeline Router
– Minimal hardware overhead
– Enables aggressive VFS
System-Level Implications
– Considerable energy saving
– Negligible performance degradation
[Figure labels: Energy, Performance]

24 Thank you! Q & A

25 Router Delay Model*
[Figure: baseline router block diagram with input ports, Input Controller (BW/RC), Route Computation, VC Allocator (VA), Switch Allocator (SA), Crossbar Switch (ST), output ports, and buffers labeled MC 1, VC 1 through MC n, VC 1]
Parameters
– p: # of input/output ports
– c: # of message classes
– v: # of VCs/message class
– ω: flit size in bits
– t_i: sequential logic latency
– h: setup delay
– τ: inverter delay
Router stage delay:

Stage    t_i           h
BW/RC    constant      0
VA       f(p, v)       9 τ
SA       f(p, c, v)    9 τ
ST       f(p, ω)       0

*This model is presented in [Peh et al., HPCA 2001].

26 System Energy Breakdown

