Presentation transcript:

1 Mohamed ABDELFATTAH Vaughn BETZ

2 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Comparison Against Buses

3 1. Why NoCs on FPGAs? Interconnect. Figure labels: Logic Blocks, Switch Blocks, Wires.

4 1. Why NoCs on FPGAs? Figure labels: Logic Blocks, Switch Blocks, Wires; Hard Blocks: Memory, Multiplier, Processor.

5 1. Why NoCs on FPGAs? Figure labels: Logic Blocks, Switch Blocks, Wires; Hard Blocks: Memory, Multiplier, Processor; Hard Interfaces: DDR/PCIe; 1600 MHz, 200 MHz, 800 MHz. Interconnect still the same.

6 1. Why NoCs on FPGAs? Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet; 1600 MHz, 200 MHz, 800 MHz. Problems: 1. Bandwidth requirements for hard logic/interfaces; 2. Timing closure.

7 1. Why NoCs on FPGAs? Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: 1. Bandwidth requirements for hard logic/interfaces; 2. Timing closure; 3. High interconnect utilization (huge CAD problem, slow compilation, power/area utilization); 4. Wire speed not scaling (delay is interconnect-dominated).

8 Barcelona, Los Angeles: Keep the roads, but add freeways. Figure labels: Hard Blocks, Logic Cluster. Source: Google Earth.

9 1. Why NoCs on FPGAs? Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet; NoC: Routers, Links. Problems: 1. Bandwidth requirements for hard logic/interfaces; 2. Timing closure; 3. High interconnect utilization (huge CAD problem, slow compilation, power/area utilization); 4. Wire speed not scaling (delay is interconnect-dominated). Router forwards data packet. Router moves data to local interconnect.

10 1. Why NoCs on FPGAs? Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: 1. Bandwidth requirements for hard logic/interfaces; 2. Timing closure; 3. High interconnect utilization (huge CAD problem, slow compilation, power/area utilization); 4. Wire speed not scaling (delay is interconnect-dominated); 5. Abstraction favors modularity (parallel compilation, partial reconfiguration, multi-chip interconnect). NoC: high-bandwidth endpoints known; pre-design NoC to requirements; NoC links are re-usable; NoC is heavily pipelined; NoC abstraction favors modularity.

11 1. Why NoCs on FPGAs? Figure labels: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet. Problems: 1. Bandwidth requirements for hard logic/interfaces; 2. Timing closure; 3. High interconnect utilization (huge CAD problem, slow compilation, power/area utilization); 4. Wire speed not scaling (delay is interconnect-dominated); 5. Abstraction favors modularity (parallel compilation, partial reconfiguration, multi-chip interconnect). NoC: latency-tolerant communication; NoC abstraction favors modularity.

12 1. Why NoCs on FPGAs? Maxeler: Geoscience (14x, 70x), Financial analysis (5x, 163x). Altera OpenCL: Video compression (3x, 114x), Information filtering (5.5x). Figure labels: GPU, CPU.

13 1. Why NoCs on FPGAs?

14 1. Why NoCs on FPGAs?

15 1. Why NoCs on FPGAs? NoC

16 1. Why NoCs on FPGAs? 2. Embedded NoCs (Mixed NoCs, Hard NoCs) 3. Comparison Against Buses

17 2. Embedded NoCs. Soft NoC = Soft Routers + Soft Links. Mixed NoC = Hard Routers + Soft Links. Hard NoC = Hard Routers + Hard Links.

18 Soft: FPGA CAD Tools. Hard: ASIC CAD Tools (Design Compiler). Mixed: HSPICE. Area, Speed, Power. Power: toggle rates, gate-level simulation.

19 Embedded NoCs: Mixed NoC = Hard Routers + Soft Links. Figure labels: FPGA, Router, Logic blocks, Programmable soft interconnect. Baseline Router: Width, VCs, Ports, Buffer/VC.

20 Embedded NoCs: Mixed NoC = Hard Routers + Soft Links. Figure labels: FPGA, Router.

21 2. Embedded NoCs. Special feature: configurable topology. Assumed a mesh; can form any topology. Figure labels: FPGA, Router.

22 Embedded NoCs: Hard NoC = Hard Routers + Hard Links. Figure labels: FPGA, Router, Logic blocks, Dedicated hard interconnect, Programmable soft interconnect.

23 Embedded NoCs: Hard NoC = Hard Routers + Hard Links. Figure labels: FPGA, Router.

24 Embedded NoCs: Hard NoC = Hard Routers + Hard Links. Special feature: Low-V mode (1.1 V to 0.9 V) saves 33% dynamic power, ~15% slower. Figure labels: FPGA, Router.
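
The 33% figure for Low-V mode is consistent with the usual first-order model in which dynamic power scales with the square of the supply voltage at a fixed clock frequency. The sketch below is only a sanity check under that assumption (P proportional to C·V²·f with unchanged capacitance); the function name and the optional frequency argument are illustrative and do not come from the slides.

```python
def dynamic_power_ratio(v_new: float, v_old: float, f_ratio: float = 1.0) -> float:
    """Return P_new / P_old assuming P is proportional to C * V^2 * f.

    f_ratio is f_new / f_old; the default of 1.0 keeps the clock unchanged.
    The switched capacitance C is assumed to be the same in both modes.
    """
    return (v_new / v_old) ** 2 * f_ratio


# Voltages from the slide: nominal 1.1 V, Low-V mode 0.9 V.
ratio = dynamic_power_ratio(0.9, 1.1)
saving = 1.0 - ratio

print(f"power ratio: {ratio:.3f}, saving: {saving:.1%}")
```

With the clock held constant the model gives a saving of about 33%, matching the slide. If the clock were also lowered by the quoted ~15%, the same model would predict a larger saving, so the slide's figure appears to reflect the voltage change alone.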

25 Hard Router vs. Soft Router: 30X smaller, 6X faster, 14X lower power. Hard Links vs. Soft Links: 9X smaller, 2.4X faster, 1.4X lower power.

26 3. Area/Power Analysis. Average gap relative to Soft (1X), Mixed – Hard: Area Gap 20X – 23X smaller; Speed Gap 5X – 6X faster; Power Gap 9X – 11X less (15X in Low-V mode).

27 3. Area/Power Analysis: 64-node NoC on Stratix III [65 nm]
              Soft           Mixed      Hard
Area          ~12,500 LBs    576 LBs    448 LBs
              33% of FPGA    ~1.5% of FPGA
Speed         166 MHz        730 – 940 MHz
Bisection BW  ~10 GB/s       ~50 GB/s

28 3. Area/Power Analysis: 64-node NoC on Stratix III [65 nm]. Soft: ~12,500 LBs (33% of FPGA), 166 MHz, ~10 GB/s bisection BW. Mixed: 576 LBs; Hard (Low-V): 448 LBs; ~1.5% of FPGA, 730 – 940 MHz, ~50 GB/s bisection BW. Provides ~50 GB/s peak bisection bandwidth. Very cheap! Less than the cost of 3 soft nodes.
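
The "less than cost of 3 soft nodes" claim can be checked from the area figures on this slide. The sketch below is a back-of-the-envelope check, not the authors' methodology: it assumes the ~12,500 LB soft NoC is split evenly over its 64 nodes, that 576 LBs is the mixed NoC and 448 LBs the hard NoC, and that "33% of FPGA" refers to the soft NoC. All variable names are illustrative.

```python
# Area figures quoted on the slide, in logic blocks (LBs).
SOFT_NOC_LBS = 12_500   # approximate, per the slide
MIXED_NOC_LBS = 576     # assumed to be the mixed NoC
HARD_NOC_LBS = 448      # assumed to be the hard NoC
NODES = 64
SOFT_NOC_FPGA_FRACTION = 0.33  # "33% of FPGA", assumed to describe the soft NoC

# Even split of the soft NoC area across its nodes (an assumption).
soft_lbs_per_node = SOFT_NOC_LBS / NODES
three_soft_nodes_lbs = 3 * soft_lbs_per_node

# Device size implied by "12,500 LBs is 33% of the FPGA".
fpga_lbs = SOFT_NOC_LBS / SOFT_NOC_FPGA_FRACTION
mixed_pct = 100 * MIXED_NOC_LBS / fpga_lbs
hard_pct = 100 * HARD_NOC_LBS / fpga_lbs

print(f"3 soft nodes: {three_soft_nodes_lbs:.0f} LBs")
print(f"mixed NoC: {MIXED_NOC_LBS} LBs ({mixed_pct:.2f}% of FPGA)")
print(f"hard NoC:  {HARD_NOC_LBS} LBs ({hard_pct:.2f}% of FPGA)")
```

Under these assumptions three soft nodes cost roughly 586 LBs, so both the 576 LB and the 448 LB embedded NoCs fit under that budget, and the mixed NoC works out to about 1.5% of the device, in line with the slide.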

29 3. Area/Power Analysis. Largest Stratix-III device: 17.4 W typical FPGA dynamic power. How much is used for system-level communication? At 250 GB/s total bandwidth: Soft NoC 123%.

30 3. Area/Power Analysis. 17.4 W typical FPGA dynamic power; 250 GB/s total bandwidth. NoC share: Soft NoC 123%, Mixed NoC 15%.

31 3. Area/Power Analysis. 17.4 W typical FPGA dynamic power; 250 GB/s total bandwidth. NoC share: Soft NoC 123%, Mixed NoC 15%, Hard NoC 11%.

32 3. Area/Power Analysis. 17.4 W typical FPGA dynamic power; 250 GB/s total bandwidth. NoC share: Soft NoC 123%, Mixed NoC 15%, Hard NoC 11%, Hard NoC (Low-V) 7%.

33 3. Area/Power Analysis. Figure labels: GB/s, 17 GB/s; DDR3, Module 1, PCIe, Module 2. Full theoretical BW, crossing the whole chip: 126 GB/s aggregate bandwidth for a 3.5% NoC power budget.
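
The power percentages on slides 29 to 33 can be turned into watts and rescaled to other bandwidths. The sketch below is only an illustration: it assumes the percentages map to the NoC types in the order the slides reveal them (soft 123%, mixed 15%, hard 11%, hard Low-V 7%) and that NoC power scales linearly with the bandwidth carried, which the slides do not state explicitly. Function and dictionary names are hypothetical.

```python
FPGA_DYNAMIC_POWER_W = 17.4   # typical dynamic power, largest Stratix-III device
REFERENCE_BW_GBPS = 250.0     # total bandwidth at which the percentages are quoted

# Share of FPGA dynamic power at 250 GB/s (assignment to NoC types is assumed).
NOC_POWER_FRACTION = {
    "soft": 1.23,
    "mixed": 0.15,
    "hard": 0.11,
    "hard_low_v": 0.07,
}


def noc_power_watts(kind: str) -> float:
    """NoC power in watts at the 250 GB/s reference bandwidth."""
    return NOC_POWER_FRACTION[kind] * FPGA_DYNAMIC_POWER_W


def scaled_fraction(kind: str, bw_gbps: float) -> float:
    """Share of FPGA dynamic power at bw_gbps, assuming linear scaling."""
    return NOC_POWER_FRACTION[kind] * bw_gbps / REFERENCE_BW_GBPS


for kind in NOC_POWER_FRACTION:
    print(
        f"{kind:>10}: {noc_power_watts(kind):6.2f} W at 250 GB/s, "
        f"{scaled_fraction(kind, 126.0):.1%} at 126 GB/s, "
        f"{scaled_fraction(kind, 100.0):.1%} at 100 GB/s"
    )
```

Under the linear-scaling assumption, the hard Low-V NoC at 126 GB/s comes out at about 3.5% of the dynamic power budget, which agrees with the figure on slide 33, and the mixed and hard NoCs at 100 GB/s land between roughly 3% and 6%.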

34 1. Why NoCs on FPGAs? 2. Embedded NoCs 3. Comparison Against Buses (Design Effort, Area/Power Efficiency)

35 4. Comparison. Qsys bus: build logical bus from fabric. Embedded NoC: 16 nodes, hard routers & links.

36 4. Comparison. Qsys bus: build logical bus from fabric. Embedded NoC: 16 nodes, hard routers & links. Reference: "The Case for Embedded Networks-on-Chip on FPGAs", to appear in IEEE Micro Magazine (February).

37 4. Comparison. Steps to close timing using Qsys. Figure labels: close, FPGA.

38 4. Comparison. Steps to close timing using Qsys. Figure labels: far, FPGA.

39 4. Comparison. Steps to close timing using Qsys. Figure labels: far, FPGA. Timing closure can be simplified with an embedded NoC.

40 4. Comparison

41 4. Comparison

42 4. Comparison. Entire NoC smaller than bus for 3 modules!

43 4. Comparison. With 1/8 of the hard NoC BW used, it already takes less area for most systems.

44 4. Comparison. Hard NoC saves power for even the simplest systems.

45 1. Why NoCs on FPGAs? Big city needs freeways to handle traffic.
2. Embedded NoCs: Mixed & Hard. Area: 20-23X. Speed: 5-6X. Power: 9-15X. Area budget for 64 nodes: ~1%. Power budget for 100 GB/s: 3-7%.
3. Comparison Against P2P/Buses. Raw efficiency close to simplest P2P links. NoC more efficient & lower design effort.
