The Future of FPGA Interconnect Guy Lemieux The University of British Columbia Tuesday, December 8, 2009 FPT 2009 Workshop Getting the LUT-heads to work…

1 The Future of FPGA Interconnect
Guy Lemieux, The University of British Columbia
Tuesday, December 8, 2009, FPT 2009 Workshop
Getting the LUT-heads to work…

2 Layman’s viewpoint
How do I explain FPGA interconnect to mom?
Imagine planning a city on a grid
  – Maximum of 100,000 people, “LUT-heads”
  – For every LUT-head, given two things
      Home location
      Work location (often multiple work locations…)
Problem: Getting the LUT-heads to work!
  – Design a fixed road network
  – Every LUT-head drives in own lane (no time-sharing or bus)
  – Very expensive, lots of infrastructure
“logic family”

3 Layman’s viewpoint (2)
Problem, Version 2
  – After 25 years, every LUT-head changes home & work
      LUT-head population may grow or shrink
  – Same road network must still be used
      Can only ‘reconfigure lanes’ by changing road paint
Problem, Version 3
  – Start over, assuming 1,000,000 LUT-heads
  – New issues when the problem scales?
      Average trip length?
      Average number of lanes in road?

4 Overview
What’s in FPGA interconnect?
  – Review of typical design
What are the main application areas?
  – Driving the future of interconnect design
What are the interconnect metrics?
  – Pushing the envelope, then becoming practical
Open research problems?
  – Driving the future of interconnect design

5 Overview
What’s in FPGA interconnect?
  – Review of typical design
What are the main application areas?
  – Driving the future of interconnect design
What are the interconnect metrics?
  – Pushing the envelope, then becoming practical
Open research problems?
  – Driving the future of interconnect design

6 Input connections
[Figure: Altera Stratix interconnect, showing S Block, C Block, and CLB (aka LAB)]

7 Input connections
IIB: input interconnect block
[Figure: Altera Stratix interconnect]

8 Input connections, neighbours 1
Connections in CLB grow bigger
[Figure: S Block, C Block]

9 Input connections, neighbours 2
Connections in C Block grow bigger
[Figure: S Block, C Block]

10 Output connections, local
Single-driver: LUT outputs must only feed muxes
[Figure: Altera Stratix interconnect, showing S Block and C Block]

11 Output connections, global
Single-driver: LUT outputs must only feed muxes
Muxes extended to include LUT outputs
[Figure: Altera Stratix interconnect, showing S Block and C Block]

12 Design considerations
Design of C Block / IIB
  – Selects LUT inputs
  – Overall function: ‘M’ choose ‘kN’
      M = 100..500 wires (H + V)
      N = 8..16 LUTs
      k = 4..6 inputs/LUT
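To get a feel for the size of the ‘M’ choose ‘kN’ problem, the sketch below counts switchpoints for a fully populated crossbar over the parameter ranges on the slide. The function names and the `population` fraction used for the sparse case are assumptions made for illustration; they do not come from the talk.

```python
def full_crossbar_switchpoints(m_wires: int, n_luts: int, k_inputs: int) -> int:
    """Switchpoints if every one of the M wires could reach every LUT input pin."""
    return m_wires * n_luts * k_inputs


def sparse_switchpoints(m_wires: int, n_luts: int, k_inputs: int,
                        population: float) -> int:
    """Switchpoints if each LUT input pin sees only a fraction of the M wires.

    `population` is a hypothetical knob (0..1), not a figure from the slides.
    """
    wires_per_pin = round(m_wires * population)
    return wires_per_pin * n_luts * k_inputs


if __name__ == "__main__":
    # Smallest and largest corners of the ranges quoted on the slide.
    print(full_crossbar_switchpoints(100, 8, 4))
    print(full_crossbar_switchpoints(500, 16, 6))
    # A quarter-populated version of the small corner.
    print(sparse_switchpoints(100, 8, 4, 0.25))
```

The gap between the two corners is one way to see why a sparse pattern matters as channels get wider.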

13 Design considerations
Design of S Block
  – Steers M signals throughout array (turns)
      Also accepts N LUT outputs
  – Topologically simple
      Fs = 3: each wire connects to only 3 outgoing wires
      Exception: LUT outputs connect to > 3 wires
  – Strongly influenced by circuit implementation
      Bidirectional vs directional
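As a rough illustration of Fs = 3, the sketch below models one common textbook pattern, the "disjoint" switch block, in which track t on one side connects only to track t on the other three sides. The slides do not say which pattern is meant, so this choice, and all names in the code, are assumptions.

```python
SIDES = ("north", "east", "south", "west")


def disjoint_switch_block(side: str, track: int) -> list[tuple[str, int]]:
    """Destinations reachable from (side, track) in a disjoint pattern.

    Each wire end connects to the same track number on the three other
    sides, which gives Fs = 3.
    """
    return [(other, track) for other in SIDES if other != side]


def unique_connections(channel_width: int) -> int:
    """Count distinct wire-to-wire connections in one switch block."""
    pairs = set()
    for side in SIDES:
        for track in range(channel_width):
            for dest in disjoint_switch_block(side, track):
                pairs.add(frozenset([(side, track), dest]))
    return len(pairs)


if __name__ == "__main__":
    print(disjoint_switch_block("west", 5))
    # Connection count grows linearly with channel width in this pattern.
    print([unique_connections(w) for w in (4, 8, 16)])
```

In this pattern the connection count is 6 per track, so it scales linearly with channel width.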

14 Array of CLBs, C and S Blocks

15 Bidirectional vs. Directional Wiring
bidir/dir == S Block Design
+ single-driver == C Block Design

16 Bidirectional Wires
[Figure: Logic, C Block, S Block]

17 Bidirectional Wires
Problem
  – Half of tristate buffers left unused
  – Buffers + input muxes dominate interconnect area

18 Bidirectional vs Directional

19 Bidirectional vs Directional

20 Bidirectional vs Directional

21 Bidirectional vs Directional

22 Bidirectional Switch Block

23 Directional Switch Block

24 Bidirectional vs Directional
Switch Element
  – Same quantity and type of circuit elements, twice the wiring
Switch Block
  – Directional: half as many Switch Elements

25 Quantization of Channel Width
Bidirectional (Q=1)
  – 4 Switch Elements
  – Ch. Width = 4 * Q = 4 * 1
Directional (Q=2)
  – 2 Switch Elements
  – Ch. Width = 2 * Q = 2 * 2
No “partial” switch elements with < Q wires

26 S Blocks with Long Wires
Long wires, span L tiles
  – Example L = 3
Changes Q
  – Q = L for bidirectional
  – Q = 2L for directional
[Figure labels: tiles 1, 2, 3; CLB]
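The quantization rule on these two slides can be written out as a small sketch: the quantum Q is L for bidirectional and 2L for directional wiring, and a channel can only be built from whole switch elements of Q wires each. The function names are illustrative assumptions, not terms from the talk.

```python
def quantum(wire_length: int, directional: bool) -> int:
    """Wires per switch element: Q = L (bidirectional) or Q = 2L (directional)."""
    return 2 * wire_length if directional else wire_length


def legal_channel_width(requested: int, wire_length: int, directional: bool) -> int:
    """Round a requested channel width up to a whole number of switch elements."""
    q = quantum(wire_length, directional)
    elements = -(-requested // q)  # ceiling division: no partial switch elements
    return elements * q


if __name__ == "__main__":
    # Slide 25: both styles reach width 4 with L = 1.
    print(4 * quantum(1, directional=False), 2 * quantum(1, directional=True))
    # Slide 26: L = 3 directional gives Q = 6, so a request for 10 wires becomes 12.
    print(quantum(3, directional=True), legal_channel_width(10, 3, directional=True))
```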

27 Building up Long Wires
Start with One Switch Element
  – Wire ends for straight connections.

28 Building up Long Wires
Connect MUX Inputs
  – Extend MUX inputs

29 Building up Long Wires
Connect MUX Inputs
  – TURN UP from wire-ends to mux

30 Building up Long Wires
Connect MUX Inputs
  – TURN DOWN from wire-ends to mux

31 Building up Long Wires
Add +2 More Wires (4 total)
  – Add LONG WIRES, turning UP and DOWN.

32 Building up Long Wires
Add +2 More Wires (6 total)
  – Add LONG WIRES, turning UP and DOWN

33 Building up Long Wires
Twisting to Next Tile
  – Add wire twisting

34 Full S Block with Long Wires
Using One L=3 Switch Element (Q = 2L = 6)

35 Scaling Channel Width
Using L=3 Switch Element
  – 1 Switch Element: channel width = Q = 6
  – 2 Switch Elements: channel width = 2Q = 12
VERY IMPORTANT: Area growth is linear with channel width

36 Long Wires → Changes Quantum
Long wires, span L tiles
  – Example L = 3
  – Q = L for bidirectional
  – Q = 2L for directional
[Figure labels: tiles 1, 2, 3; CLB]

37 Multi-driver Wiring
Logic outputs use tristate buffers (C Block)
Directional & multi-driver wiring
[Figure: C Block, S Block, CLB]

38 Single-driver Wiring
Logic outputs use muxes (S Block)
Directional & single-driver wiring
New connectivity constraint
[Figure: S Block, CLB]

39 Directional, Single-driver Benefits
Average improvements
  – 0% channel width
  – 9% delay
  – 14% tile length of physical layout
  – 25% transistor count
  – 32% area-delay product
  – 37% wiring capacitance
Any reason to use bidir?
  – Important implications on future interconnect!
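The percentages on this slide can be cross-checked against each other with a little arithmetic. The sketch below assumes that area scales with transistor count and that the tile is square; both are assumptions made here for illustration, not statements from the talk.

```python
def combined_improvement(*reductions: float) -> float:
    """Combine fractional reductions multiplicatively into one overall reduction."""
    remaining = 1.0
    for r in reductions:
        remaining *= (1.0 - r)
    return 1.0 - remaining


if __name__ == "__main__":
    # Area-delay product, taking area ~ transistor count (25%) and delay (9%).
    area_delay = combined_improvement(0.25, 0.09)
    print(f"area-delay improvement ~ {100 * area_delay:.1f}%")

    # Tile area from a 14% shorter tile edge, assuming a square tile.
    tile_area = combined_improvement(0.14, 0.14)
    print(f"tile area improvement ~ {100 * tile_area:.1f}%")
```

Under those assumptions the 25% and 9% figures multiply out to roughly the quoted 32% area-delay improvement, and a 14% shorter tile edge lands close to the 25% transistor-count reduction.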

40 C Block design
[Figure: C Block]

41 C Block design
M inputs (100 … 500)
Up to kN outputs (4*8 … 8*10)

42 C Block design

43 C Block design
Sparse crossbar
Similar # switchpoints
  – On inputs
  – On outputs
Spread out pattern
  – Two columns have maximum Hamming distance (most # of different switch points)
  – True for all pairs of columns
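The "spread out" criterion can be made concrete by measuring the Hamming distance between every pair of columns in a candidate switch pattern. The sketch below uses a hypothetical staggered pattern as the candidate; it is not the pattern from the talk, and the function names are assumptions.

```python
from itertools import combinations


def hamming(col_a: list[int], col_b: list[int]) -> int:
    """Number of input positions where two columns differ."""
    return sum(a != b for a, b in zip(col_a, col_b))


def min_pairwise_hamming(columns: list[list[int]]) -> int:
    """Worst-case (smallest) Hamming distance over all pairs of columns."""
    return min(hamming(a, b) for a, b in combinations(columns, 2))


def staggered_pattern(m_inputs: int, n_outputs: int, fanin: int) -> list[list[int]]:
    """Hypothetical pattern: column j taps `fanin` consecutive inputs,
    starting at an offset spread evenly across the M inputs."""
    columns = []
    for j in range(n_outputs):
        start = (j * m_inputs) // n_outputs
        col = [0] * m_inputs
        for i in range(fanin):
            col[(start + i) % m_inputs] = 1
        columns.append(col)
    return columns


if __name__ == "__main__":
    spread = staggered_pattern(m_inputs=8, n_outputs=4, fanin=2)
    clustered = [[1, 1, 0, 0, 0, 0, 0, 0] for _ in range(4)]
    print(min_pairwise_hamming(spread), min_pairwise_hamming(clustered))
```

With a fan-in of f per column, two columns can differ in at most 2f positions, which happens when they share no inputs; the staggered example reaches that bound while the clustered one scores zero.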

44 Overview
What’s in FPGA interconnect?
  – Review of typical design
What are the main application areas?
  – Driving the future of interconnect design
What are the interconnect metrics?
  – Pushing the envelope, then becoming practical
Open research problems?
  – Driving the future of interconnect design

45 What are the main application areas?
What are FPGAs used for?
  – A long long time ago… small glue logic
Modern…
  – Internet routers (table lookups, multiplexing)
  – Embedded systems design (NIOS II, MicroBlaze)
  – Cell phone basestations (communications DSP)
  – HDTV sets / set-top boxes (video/image DSP)
Future?

46 Application drivers
What we know
  – FPGAs increasingly more powerful, constant cost
  – ASIC design costs escalating wildly
      Most ASICs use older technology (0.18/0.13 µm)
      Increasingly, ASICs implemented as FPGAs instead
  – FPGAs only in low-volume
      E.g., being designed-out of HDTV sets
Extrapolate to find new emerging markets …

47 Application drivers (2)
Extrapolating…
  – Industrial/scientific instruments: low volume, high margin
      Medical sensing, imaging (ultrasound, PET, …)
      Electronics test & measurement (router tester, …)
      Physics (neutrino detection, …)
  – Computation: mixed volume, mixed margin
      Computer system simulation (RAMP, …)
      Molecular dynamics, financial modeling, seismic / oil & gas
  – Portable/handheld: mixed volume, mixed margin
      Consumer
      Industrial/Medical

48 Application drivers (3)
Problems with FPGAs
  – Expensive for high-volume markets
      Need cost-reduction strategy
  – Insufficient capacity
      Could just wait for Moore’s Law to catch up
      Capture emerging markets early: ultra-capacity FPGA
  – Hard to program
      Particularly important when used for computation
      Domain-specific languages help
  – Power
  – Slow

49 Overview
What’s in FPGA interconnect?
  – Review of typical design
What are the main application areas?
  – Driving the future of interconnect design
What are the interconnect metrics?
  – Pushing the envelope, then becoming practical
Open research problems?
  – Driving the future of interconnect design

50 Interconnect metrics
Typical
  – Area
  – Delay (latency)
  – Power
Obscure, but important!
  – Co$t
  – Bandwidth
  – Programmability/Ease of use
  – Reliability/Integrity
  – Flexibility/Runtime reconfigurability

51 Pushing the envelope
Research is about discovery, ideas, exploration
  – Also evaluation, limit studies, potential uses
One general research strategy
  – Pick a metric
  – Push the envelope
      How far did you get?
  – Back off until practical
  – Re-integrate with reality

52 Pushing the envelope (2)
Example: Area
  – Cyclone/Spartan are low-cost (low-area) FPGAs
Push area to the limits?
  – Reduce every routing buffer to 1x inverter
  – Extensive use of pass transistor switches
  – Reduce connectivity, force sparse logic
  – Bit-serial logic + routing for datapath
How small can we get?
  – Is this practical? Is there a market? Is it cost-effective?
  – Increased parallelism? Prototype future FPGA designs now?

53 Pushing the envelope (3)
Example: Bandwidth
  – Virtex/Stratix are high-performance FPGAs
Push bandwidth to the limits?
  – E.g., pipeline every routing wire / switch
  – Use registers or wave-pipeline
How much throughput can we get?
  – Wave-pipelining ~5Gbps in 65nm [FPGA2009]
  – Is this practical? Is there a market?

54 Pushing the envelope (4)
Example: Flexibility/Runtime reconfigurability
  – Limited reconfigurability on Xilinx, not on Altera
Push flexibility/RTR to the limits?
  – Note: not a naïve “fully connected” graph
  – Every switch is dynamically addressable, reconfigurable
  – Every route has an alternative/backup
What can we gain?
  – Choose-your-own adventure routing [FPGA2009]
  – Improved NoC-on-FPGA (?)
  – Is this practical? Is there a market?

55 Pushing the envelope (5)
Pushing envelope for other metrics
  – Power [Kaptanoglu, keynote FPT2007]
  – Co$t (area?)
  – Programmability/Ease of use (a CPU?)
  – Reliability/Integrity (built-in TMR & Razor?)

56 Pushing the envelope (5’)
Pushing envelope for other metrics
  – Power [Kaptanoglu, keynote FPT2007]
      Portable/handheld
  – Co$t (area?)
      Portable/handheld, computation
  – Programmability/Ease of use (a CPU?)
      Computation
  – Reliability/Integrity (built-in TMR & Razor?)
      Scientific/industrial instruments
Markets exist for these metrics!

57 Overview
What’s in FPGA interconnect?
  – Review of typical design
What are the main application areas?
  – Driving the future of interconnect design
What are the interconnect metrics?
  – Pushing the envelope, then becoming practical
Open research problems?
  – Driving the future of interconnect design

58 Open research problems
Defect tolerance
IIB design
  – Hard core integration
Memory-footprint / Runtime optimized
Performance guarantees
Layout-aware methods
Efficient datapaths
Expose the muxes
Low-latency, area-efficient repeaters/switches

59 Open research problems (2)
Defect tolerance
  – Future semiconductor technologies expected to be less reliable
  – Interconnect has built-in redundancy (by design)
Issues
  – Defect localization
  – Delay-oriented defects
  – Abstraction suitable for CAD or bitstream-load
  – Intentional redundancy: how, where, quantity

60 Open research problems (3)
IIB (input interconnect block) design
  – Function: ‘M’ choose ‘kN’
  – Conserve ‘switchpoints’, area (# muxes, mux size), delay (levels)
  – Maximize ‘entropy’ == # of unique functional configurations
      Are some configurations more important than others?
      How to count # of configurations?
  – Generally, difficult topological design problem
      Most promising ‘type 3’ IIB [TRETS2008] ≈ Clos network?
[Figure: IIB (input interconnect block) with M inputs and kN outputs]

61 Open research problems (4)
Hard core integration
  – Heterogeneous instance of IIB design problem
Issues
  – Each hard core has different # inputs, # outputs
      Complicates uniformity
  – Some have large # inputs, outputs
      Creates congestion ‘pinch points’
Need to design for ‘worst case’ routability
  – Would prefer ‘average case’

62 Open research problems (5)
Memory-footprint / Runtime optimized
  – Architecture graph
  – Netlist search graph
Issues
  – Entire architecture graph is huge, static
  – Netlist search graph dynamic, alloc/dealloc
  – Random pointer-chasing
  – Cache-unfriendly, cache-DRAM bandwidth
  – Can architecture changes make improvements?

63 Open research problems (6)
Performance guarantees
  – FPGA routers work well, nobody complains
      Thank you, PathFinder [McMurchie & Ebeling]
Issues
  – Not guaranteed to find a solution (no detection!)
      Want ‘Just (suboptimally) route it!’ algorithm
  – No performance bounds on metrics
      Within X% tracks, Y% delay from minimum
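The "not guaranteed to find a solution" point is easier to see with a toy router. The sketch below is a heavily simplified negotiated-congestion loop in the spirit of PathFinder: node cost is (base + history) × present-congestion penalty, nets are ripped up and rerouted each iteration, and the loop simply gives up after a fixed number of iterations. The graph encoding, the two-terminal nets, the unit base cost, and the penalty schedule are all assumptions made for illustration; this is not the published algorithm in full.

```python
import heapq


def shortest_path(graph, src, dst, cost):
    """Dijkstra over node costs; returns the node list src..dst, or None."""
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in graph[node]:
            new = dist + cost(nxt)
            if new < best.get(nxt, float("inf")):
                best[nxt] = new
                prev[nxt] = node
                heapq.heappush(heap, (new, nxt))
    return None


def negotiated_route(graph, nets, capacity=1, max_iters=30):
    """Route all nets, or return None if congestion is never resolved."""
    history = {n: 0.0 for n in graph}   # accumulated past overuse per node
    occupancy = {n: 0 for n in graph}   # nets currently using each node
    routes = {}
    pres_fac = 0.5                      # present-congestion weight (assumed schedule)
    for _ in range(max_iters):
        for name, (src, dst) in nets.items():
            for n in routes.pop(name, []):      # rip up the old route
                occupancy[n] -= 1

            def cost(n):
                over = max(0, occupancy[n] + 1 - capacity)
                return (1.0 + history[n]) * (1.0 + pres_fac * over)

            path = shortest_path(graph, src, dst, cost)
            if path is None:
                return None                      # net is disconnected
            routes[name] = path
            for n in path:
                occupancy[n] += 1
        overused = [n for n in graph if occupancy[n] > capacity]
        if not overused:
            return routes
        for n in overused:
            history[n] += occupancy[n] - capacity
        pres_fac *= 2.0
    return None  # gave up: no proof of unroutability, just an iteration limit


if __name__ == "__main__":
    # Net B can detour around the shared node "m"; net A cannot.
    graph = {"sa": ["m"], "sb": ["m", "p"], "m": ["ta", "tb"],
             "p": ["q"], "q": ["tb"], "ta": [], "tb": []}
    print(negotiated_route(graph, {"A": ("sa", "ta"), "B": ("sb", "tb")}))
```

Note that the `None` result on failure only says the iteration limit was hit; the loop has no way to tell an unroutable design from one that needed more iterations, which is the detection gap the slide points at.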

64 Open research problems (7)
Layout-aware methods
  – Altera, Xilinx know how to lay out interconnect
  – 10+ levels of metal, metal-over-switches, integration of switches and logic
Issues
  – Arbitrary ‘topology’ graphs not practical to build
  – “One size fits all” FPGA diminishing
      “Application-specific” FPGA likely to arrive
  – Automated layout, automated circuit design tools
      Aware of FPGA architecture / structure

65 Open research problems (8)
Efficient datapaths
  – Multi-bit connections; same source, same sink
  – Datapath connections coherent, seemingly simple
  – Very common in computation designs
Issues
  – No successful datapath circuit-switched architecture
      Dedicated datapath interconnect only 5-10% smaller
      Abandon circuit switching? → power
  – How wide? 4b, 8b, 32b?
  – How to build?

66 Open research problems (9)
Expose the muxes (1)
  – LUTs terrible for implementing multiplexers
      2 x 4LUTs = 1 x 6LUT = 4:1 mux
      Imagine 54b barrel shifter (IEEE double-precision)
      1 CLB ≈ 8 x 6LUTs ≈ 2 x 16:1 muxes
  – Interconnect is full of muxes
      1 CLB ≈ 60 x 16:1 muxes
Issues
  – How to ‘expose’ interconnect muxes to users?
  – Put routing mux select bits under user control
  – How to guarantee signal ordering?
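The "1 x 6LUT = 4:1 mux" line follows from pin counting: four data inputs plus two select inputs use all six LUT inputs. The sketch below builds the 64-entry truth table for that mapping and gives a rough LUT estimate for a barrel shifter built from 4:1 mux stages. The input-bit ordering and the barrel-shifter estimate are assumptions made here, not figures from the talk.

```python
def mux4_lut6_table() -> list[int]:
    """64-entry truth table of a 4:1 mux packed into one 6-input LUT.

    Assumed address layout: bits 0..3 are data inputs d0..d3,
    bits 4..5 are the select value.
    """
    table = []
    for addr in range(64):
        data = [(addr >> i) & 1 for i in range(4)]
        sel = (addr >> 4) & 0b11
        table.append(data[sel])
    return table


def lut6(table: list[int], data: list[int], sel: int) -> int:
    """Evaluate the LUT for given data bits and select value."""
    addr = sum(bit << i for i, bit in enumerate(data)) | (sel << 4)
    return table[addr]


def barrel_shifter_luts(width: int) -> int:
    """Rough 6-LUT count for a barrel shifter made of 4:1 mux stages.

    Assumes one 6-LUT per bit per stage and two shift-amount bits
    consumed per stage; real mappings may differ.
    """
    shift_bits = (width - 1).bit_length()
    stages = -(-shift_bits // 2)  # ceiling division
    return width * stages


if __name__ == "__main__":
    table = mux4_lut6_table()
    print(lut6(table, [0, 1, 0, 0], sel=1))
    print(barrel_shifter_luts(54))
```

Under these assumptions a 54-bit shifter consumes well over a hundred 6-LUTs, which gives some context for the slide's interest in reusing muxes that already exist in the interconnect.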

67 Open research problems (9’)
Expose the muxes (2)
  – Many systems use lots of 32b muxes
      NIOS, MicroBlaze, NoC, Compute engines
  – Can we use fast run-time reconfiguration instead of building muxes?
Issues
  – How to expose programming bits to user?
  – How to enumerate & pre-P&R all configurations?

68 Summary
Interconnect design is fun and challenging
Many ‘practical’ issues solved
  – Lots of ‘academically interesting’ problems remain
  – Can still ‘push the envelope’
Promising open problems
Final thoughts…
  – Circuit design ↔ Topology ↔ Layout → CAD
  – Architectural models (C block, S block) restrictive

69 EOF
