1 FPGAs and Structured ASICs Overview & Research Challenges
Vaughn Betz Director, Software Engineering

2 Agenda What is an FPGA? FPGA & ASIC market dynamics FPGA technology
Structured ASIC technology Research Challenges Power Scalable CAD CAD to raise abstraction level Structured ASIC total cost

3 What is an FPGA?

4 What is an FPGA? Field Programmable Gate Array Gate Array
Two-dimensional array of logic gates Traditionally connected with customized metal Every logic circuit (customer) needs a custom-manufactured chip Field Programmable Customized by programming after manufacture One FPGA can serve every customer FPGA: re-programmable hardware

5 Basic Internals of an FPGA

6 Embedding a circuit in an FPGA
All done by CAD system (e.g. Quartus) Chop up circuit into little pieces of logic Each piece goes in a separate logic element (LE) Hook them together with the programmable routing

7 FPGA Logic Element Look-Up Table (LUT) + register + extra …
FPGAs typically use 4-input or larger LUTs Cyclone family (low cost): 4-inputs Stratix II: Adaptive Logic Module implements 4 – 6 input LUTs efficiently Virtex 5: 6 inputs
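
To make the look-up table idea concrete, here is a minimal Python sketch of a k-input LUT modelled as a 2^k-entry truth table addressed by the input bits. The names `Lut` and `program_lut` and the bit ordering are illustrative assumptions, not vendor conventions.

```python
class Lut:
    """Toy model of a k-input look-up table (illustrative, not a vendor API)."""

    def __init__(self, k, truth_table):
        # A k-input LUT stores one output bit per input combination.
        if len(truth_table) != 2 ** k:
            raise ValueError("truth table must have 2**k entries")
        self.k = k
        self.truth_table = list(truth_table)

    def evaluate(self, inputs):
        # Pack the input bits into an address; inputs[0] is the least significant bit.
        if len(inputs) != self.k:
            raise ValueError("expected %d inputs" % self.k)
        address = 0
        for position, bit in enumerate(inputs):
            address |= (bit & 1) << position
        return self.truth_table[address]


def program_lut(k, function):
    # "Programming" the LUT means filling the table from any Boolean function.
    table = []
    for address in range(2 ** k):
        bits = [(address >> position) & 1 for position in range(k)]
        table.append(function(*bits) & 1)
    return Lut(k, table)


# A 4-input LUT configured as a 4-input XOR (parity).
parity4 = program_lut(4, lambda a, b, c, d: a ^ b ^ c ^ d)
```

Any other 4-input function can be loaded into the same structure by passing a different function to `program_lut`, which is the sense in which the hardware is re-programmable.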

8 Connecting the Logic Logic elements implement the pieces of the circuit Now hook them up with the programmable routing

9 Programmable Routing Programmable switches connect fixed metal wires
Choose pattern so any logic element can connect to any other

10 Modern, mid-size FPGA – 2S60
90nm Stratix II 2S60

11 FPGA and ASIC Market Dynamics

12 FPGAs vs. Standard Cell ASICs
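Parameter: FPGA vs. Standard Cell
CAD Tool Cost: FPGA $2000; Standard Cell $Millions
Mask Cost: Standard Cell $1.4M (90 nm)
Bug Fix: FPGA 1 hour; Standard Cell ~10 weeks
Electrical & Optical Check & Debug: FPGA Vendor’s Problem; Standard Cell Your Problem!
Time to Market: FPGA Fast; Standard Cell Slow
Die Size: FPGA 2X to 20X; Standard Cell 1X
Volume Cost: FPGA 1X to 20X
Speed: FPGA 0.3X to 0.6X
Power: FPGA 2X to 5X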

13 CMOS Semiconductor Market

14 Traditional FPGA Users

15 Std Cell ASIC Development Cost Trend
[Chart: Total Development Costs ($M), axis 0 to 45, by process node: 0.18 µm, 0.15 µm, 0.13 µm, 90 nm, 65 nm, 45 nm. Cost categories: Masks & Wafers; Test & Product Engineering; Software; Design/Verification & Layout.] Note: Conservative estimate; does not include re-spins.

16 Result: Declining ASIC Starts
[Chart: Standard Cell/Gate Array design starts per year, 1996 to 2007; vertical axis 2000 to 12000.] Source: Dataquest/Gartner

17 Today’s “typical design”
EE Times, Aug 23, 2004

18 New FPGA Users & Products
Designers “Priced Out” of ASIC; Start-Ups / Risk-Averse; Replacement for DSP; Consumer & Industrial

19 Broadcast/Audio/Video

20 Wireless

21 Industrial, Test & Measurement

22 Consumer: Displays

23 Consumer Gadgets

24 FPGA Technology

25 FPGAs Need Vertical Integration
Silicon process models & expertise FPGA architecture Complete CAD system Intellectual Property cores Including soft processors Embedded software development tools

26 Silicon Process Knowledge
FPGAs move to latest process very early Helps close speed, area gap with ASICs in older processes High volume covers development costs Foundries use FPGAs as process drivers Large dies Both logic and RAM Regular structures help shake out systematic fab issues Need good silicon expertise to stay on the bleeding edge of process

27 90nm 9-layer Interconnect
Transistor

28 90nm Transistor Cross-section
Poly Spacer Contact Diffusion Isolation Dielectric Salicide

29 FPGA Architecture Want to improve speed, area & power to
Close gap with ASICs Stay ahead of competition Need to ensure device Is routable Has right mix of features Huge problem space Routing wires, switch pattern, LUT size, RAM types, logic block size, …

30 Architecting via Virtual Prototyping
Customer Designs IP, Reference Designs FPGA Arch. Spec (150 pages) FMT FMT Synthesis Params FPGA Database (300M) Timing, Area Models FMT Place&Route Analysis: Speed & Area Routability, Power

31 Parallel design: Carefully Manage Risk vs. Reward
Can’t Do This Sequentially: Process Technology, Circuit Design, FPGA Architecture and Software need Concurrent Design
Risk: 130nm: X went with lowK, new process, new FAB, new IP, all in the same architecture
Process: tuned by TSMC
Circuit design: multiple Vt, non-minimal L
Architecture: hand in hand with tools, 150K experiment runs
Software: entire CAD flow directly supporting our chips; 300 software engineers
Process Technology is one key aspect to get right, but Circuit Design, FPGA Architecture, and CAD Software all need to be developed. It’s the combination of these aspects that is key to building a successful FPGA. Circuit design includes managing power with multiple threshold voltages and transistor sizing. In particular, deciding when to use fast, leaky transistors as opposed to slower, more power-efficient transistors is key to balancing performance and power. In terms of the FPGA architecture itself, the key innovation is to shift the balance between Logic Element delays and routing delays. Since routing delay increases disproportionately to cell delay, it makes more sense to increase the logic cell capabilities in order to reduce logic depth and therefore routing delay. The design tools also need to be developed hand-in-hand with these changes, so that correct optimisation choices can be made throughout and the CAD optimisations can be rolled out quickly to designers. To link all these optimisations together we have run over 150,000 experiments on a set of representative circuits to tune the exact details.

32 Complete Design Flow: Quartus II
IP Cores Verilog, VHDL Synthesis 3rd Party or Altera Placement & Routing Physical Synthesis Timing & Power Analysis Assembler Report Over 10 Million Lines of Code!

33 IP Core: Nios II Soft Processor
Three CPU Choices: Nios II/f Fast: Optimized for Performance Nios II/s Standard: Faster and Smaller than Nios Nios II/e Economy: Smallest FPGA Footprint Choose peripherals you want SoPC Builder software builds bus interfaces, arbitration etc. Nios II/e Nios II/s Nios II/f Smaller Faster

34 Soft Processors are Affordable
Largest Stratix II: 180,000 LEs. Nios II/f “fast”: 1800 LEs, 1% of FPGA.
Small Cyclone II: Nios II/e “economy”: 600 LEs, 13% of FPGA; 35¢ in lowest cost FPGA.

35 © 2005 Altera Corporation - Massively Parallel Nios II
Barco Media & Entertainment Olite 510 LED Display System: Modular LED Display System; 100 Nios II Processors per square meter! For more information on this slide, go to the Success Stories section of Molson: Or contact Martin Won FPGA Used:

36 Structured ASIC Technology

37 What is a Structured ASIC?
Use fixed masks for most layers Use customer-specific masks for a few via & metal layers To customize the logic cells and to route signals between logic cells Has characteristics between an FPGA and a standard cell ASIC Faster and smaller than an FPGA But lower development cost & time than a standard cell ASIC

38 FPGA to Structured ASIC
Common Base Die: LEs, Memory, PLLs, DSP Blocks, Internal Routing, Flip-Chip Bumps. Two Metal Layers for Customization: Configuration Routing, Signal Routing.
Stratix FPGA forms the basis of our HardCopy I line. We create the base arrays by removing the programmability. The base array layers are architecturally equivalent to the FPGA, with two layers for customization. No one other than the FPGA vendor has the ability to design their silicon like this. Other solutions are conversions, not migrations.

39 Stratix HardCopy Base Array
Start With The FPGA Die: Logic Elements, Configuration Memory & Logic, Memory, Interconnect. Remove Interconnect System. Remove Configuration, Logic & Memory Programmability. Resulting Base Die Up to 70% Smaller.
As the design moves into a structured ASIC, the programmability is no longer necessary; the design elements are now fixed in function. Programmable interconnects are replaced by design-specific interconnects obtained from the customer design’s netlist. Configurability is no longer necessary because the design no longer needs to be programmable. Removing the programmability and configurability shrinks the die by ~70%, resulting in significant cost, performance and power consumption benefits.

40 Development Cost and Risk
Mask cost reduced vs. Std. Cell ~5 masks instead of ~30 Verification of crosstalk, electromigration etc. much easier than Std. Cell Since most layers are standard Same PLLs, I/Os, RAMs and packages as FPGA Debug your system with an FPGA, then do a drop-in replacement with HardCopy Can ship systems with FPGA until volume merits going to HardCopy Can get customer feedback on systems with FPGAs and tweak before going to HardCopy

41 Identical Operation Key selling point for Altera HardCopy
EP1S80F1020 (FPGA) and HC1S80F1020 (HardCopy), both at 105C, VCC-5%; Data Rate 840 Mbps, LVDS.
HardCopy devices fully support “drop-in” replacement of the FPGAs through an identical package (same pinout and footprint), thus facilitating “total” seamless migration. Cell internals like the buffers and other components are designed to meet the FPGA I/O electrical specifications.

42 FPGA to HardCopy CAD Flow
FPGA Constraints Stratix II POF Quartus Stratix II Flow Equality Checker HDL Code Same HDL source Handoff Design Files Quartus HardCopy II Flow HardCopy Constraints HardCopy Design Center Same CAD flow & guaranteed equivalence

43 2nd Generation: HardCopy II
First generation HardCopy Removed programmability from FPGA Second generation HardCopy II Removes programmability Re-maps logic and DSP blocks to a fabric that is more efficient in a structured ASIC Larger die size reduction But more complex CAD flow Typical results vs. Stratix II 70% die size reduction 60% power reduction 50% speed increase

44 HardCopy II Logic Remapping
Section of HardCopy II Floorplan Section of Stratix II Floorplan HCell Macro Implementations of ALMs M4K Block Logic ALMs This is just a sample section of the respective device floorplan. This color code is used in the following slides as well. Green – Registers Blue – Adders, Comb. Logic & Buffers Not Drawn to Scale Illustration Only, Not Actual Quartus II Floorplan View

45 DSP Block Remapping
Stratix II Floorplan (only DSP Blocks Shown). HardCopy II Floorplan: Built as Needed using HCell Macros; Can be Placed Anywhere in the Floorplan where HCells Exist. Similar to logic placement flexibility, DSP blocks can also be placed anywhere in the silicon. This flexibility further aids high performance designs. Lack of this flexibility could seriously impact how optimal the placement can be, especially when you factor in the large HCell Macros of the DSP block implementation.

46 Research Challenges

47 Power

48 Power Scaling
130 nm and above: FPGAs scaled without regard to power; got full performance boost of process
90 nm and below: Power-constrained scaling; Low-cost FPGA power budget: ¼ W to 3 W; High-speed FPGA: 2 W to 20 W; Maximum performance within power budget

49 Process Scaling & Power
Dynamic Power drops per LE But reduction is less than 50% / LE Doubling LE count increases power budget Static Power tends to increase Use higher Vt, thicker Tox, longer L on non-timing-critical circuitry If still too high, sacrifice speed by increasing Vt, Tox, L on timing-critical circuitry Can compensate by making architecture faster E.g. Larger LUT
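
The arithmetic behind the first point can be sketched as follows. The 40% per-LE reduction is a made-up figure chosen only to satisfy the slide's "less than 50%" condition, and the function name is hypothetical.

```python
def relative_dynamic_power(per_le_reduction, le_count_ratio):
    """Total dynamic power of the new device relative to the old one.

    per_le_reduction: fractional drop in dynamic power per LE (0.4 means 40%).
    le_count_ratio:   new LE count divided by old LE count (2.0 means doubled).
    """
    return (1.0 - per_le_reduction) * le_count_ratio


# Hypothetical numbers: 40% less power per LE, twice as many LEs.
ratio = relative_dynamic_power(0.40, 2.0)
```

With these assumed numbers the total dynamic power comes out above the previous generation's; only a per-LE reduction of 50% or more would hold the budget flat when the LE count doubles.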

50 Controlling Power
90 nm: Process parameters; FPGA CAD tools optimize for power (20% dynamic power reduction); Innovate on performance, then trade for Pstatic (e.g. Stratix II ALM: larger LUT)
65 & 45 nm: Innovation needed!
32 nm: Process will likely have better static power (double-gate FETs, high-K gate dielectric)

51 CAD for Power Optimization
Timing-Driven Compiler: Timing Critical? Yes: Min Delay. No: Min Area.
Power-Driven Compiler: Timing Critical? Yes: Min Delay. No: Power Critical? Yes: Min Power. No: Min Area.
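
One reading of this slide's flowchart, written as a small Python sketch: the power-driven compiler adds a second question for logic that is not timing critical. The function names and the string labels are illustrative assumptions, and a real compiler makes these choices per net or per block with far more nuance.

```python
def timing_driven_goal(timing_critical):
    # Timing-driven compiler: critical logic gets minimum delay, the rest minimum area.
    return "min_delay" if timing_critical else "min_area"


def power_driven_goal(timing_critical, power_critical):
    # Power-driven compiler: timing still wins, power is considered next, area last.
    if timing_critical:
        return "min_delay"
    if power_critical:
        return "min_power"
    return "min_area"
```

The ordering of the two tests is the point: power is optimized only where it cannot cost timing.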

52 E.g. Power-Optimized RAM Mapping
1K X 16 RAM
Default Option: 4 1Kx4 M4K RAMs
Power Efficient Option: 4 256x16 M4K RAMs, selected by a 2:4 Decoder
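
The difference between the two mappings can be sketched by counting how many M4K blocks are enabled per access. This is a minimal model under stated assumptions: it ignores the cost of the decoder and output multiplexer, and the function names are hypothetical.

```python
DEPTH = 1024   # words in the logical RAM
WIDTH = 16     # bits per word


def default_mapping(address):
    # Four 1Kx4 blocks: each block holds a 4-bit slice of every word,
    # so all four blocks must be enabled on every access.
    assert 0 <= address < DEPTH
    return [(block, address) for block in range(4)]


def power_efficient_mapping(address):
    # Four 256x16 blocks: the top two address bits feed a 2:4 decoder
    # that enables exactly one block; the low eight bits index inside it.
    assert 0 <= address < DEPTH
    block = address >> 8
    return [(block, address & 0xFF)]


active_default = len(default_mapping(700))
active_efficient = len(power_efficient_mapping(700))
```

Both mappings use four M4K blocks and store the same 16 Kbits; the second one simply touches one block per access instead of four.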

53 E.g. Power-Driven Place & Route
Minimize capacitance of high-toggling signals Without violating timing constraints 20 Million Toggle/s 100 Million Toggle/s Power Optimize
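
A toy version of this objective: dynamic power on a net is proportional to toggle rate times switched capacitance, so the high-toggling net should get the lower-capacitance route. The toggle rates come from the slide; the capacitance values and function names are hypothetical, and the timing constraint is deliberately left out of the sketch.

```python
from itertools import permutations


def switching_cost(toggle_rates, capacitances):
    # Dynamic power is proportional to toggle rate times switched capacitance
    # (the supply-voltage term is common to all nets and is omitted here).
    return sum(rate * cap for rate, cap in zip(toggle_rates, capacitances))


def best_assignment(toggle_rates, route_capacitances):
    # Brute force over route assignments; fine for a toy example only.
    return min(
        permutations(route_capacitances),
        key=lambda caps: switching_cost(toggle_rates, caps),
    )


toggle_rates = [20e6, 100e6]          # toggles per second, from the slide
route_capacitances = [2.0, 1.0]       # hypothetical relative wire capacitances
chosen = best_assignment(toggle_rates, route_capacitances)
```

A real placer and router would fold this cost into its existing timing-driven cost function rather than enumerate assignments.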

54 CAD Scalability

55 FPGA Logic & Memory Growth
[Chart: Logic Elements (K), axis 100 to 700, and Memory Bits (Mbits), axis 10 to 40, plotted by year (1998 to 2009) and process node (250 nm to 45 nm). Devices shown: EPF10K200E, EP20K600E, EP20K1500E, EP2A70, EP1S80, EP2S180.]

56 FPGA Capacity vs. CPU Speed
30X logic growth from 1998 to 2006 Over 30X memory bits growth ~8X CPU speed increase from 1998 to 2006 FPGA CAD problem growing more rapidly than CPU speed But productivity of FPGA designers depends on many compiles To iteratively debug, add features, close timing
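
The gap can be put in numbers with a two-line calculation. The assumption that compile effort grows only linearly with design size is a simplification made here for illustration, not a claim from the slides.

```python
logic_growth = 30.0   # logic capacity growth, 1998 to 2006 (from the slide)
cpu_speedup = 8.0     # approximate CPU speed increase over the same period

# If compile effort grew only linearly with design size, compile time
# for the largest device would still grow by this factor.
relative_compile_time = logic_growth / cpu_speedup
```

Under that assumption a full compile of the largest device takes several times longer than it did in 1998, before any superlinear algorithm behaviour is considered.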

57 Compile Time Need to find highly scalable algorithms
For placement, routing, synthesis Do not sacrifice result quality Future: single processor speed-up will fall further behind FPGA capacity growth But more cores per chip Today: 2 2007: 4 Parallel CAD tools, with same result quality? Need sequentially consistent algorithms, or debugging is a nightmare
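
One common way to keep a parallel tool's results identical regardless of core count is to make every task's output depend only on its inputs and to combine results in a fixed order. The sketch below shows that idea with Python's standard library; `optimize_region` is a hypothetical stand-in for a randomized CAD step, and threads are used only to keep the example short.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def optimize_region(region_id):
    # Stand-in for a randomized CAD step (e.g. annealing one region).
    # The random stream is seeded by the task, not by the worker or the clock.
    rng = random.Random(region_id)
    return min(rng.random() for _ in range(1000))


def run(region_ids, workers):
    # Executor.map returns results in input order, whatever order tasks finish in.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(optimize_region, region_ids))


serial_result = run(range(8), workers=1)
parallel_result = run(range(8), workers=4)
```

Because the one-worker and four-worker runs produce the same list, a bug seen on a multi-core machine can be reproduced on a single core.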

58 Increasing Design Abstraction

59 FPGA Usage FPGA design is usually done in Hardware Description Language (HDL) Limits FPGA use to hardware designers FPGAs can: Outperform DSPs Create custom hardware / software systems that outperform fixed microcontrollers Usage in these fields limited by unfamiliarity with HDL design

60 Efficiency vs. Development Cost
[Chart: Power & System Cost* (high to low) plotted against Development Difficulty & Cost for: Processor, DSP, FPGA, Struct. ASIC, Std. Cell, Full Custom.] *For applications with significant parallelism

61 Raising Design Abstraction
Ideal: software engineers can design hardware C to gates Not achievable in general Practical: domain-specific higher-level tools SoPC builder: Build a custom microcontroller Integrate IP cores C-HAC, Impulse, Celoxica: Hardware accelerator for targeted C code, soft processor for rest DSP Builder: Convert DSP block diagrams to hardware Other tools?

62 Modern FPGA RTL Design Flow
IP Cores Third-Party Software Hardware/Software Debug Design Verification Timing Verification & Debug Place-&-Route & Physical Synth. RTL Logic Synthesis Functional Custom RTL Development Specification Compilation & Optimization Power Analysis Product

63 Extending the Design Flow
RTL Design Flow Back-end Flow Hardware/Software Debug Product

64 Extending the Design Flow To System Level
HW/SW Interface Generation Higher Level Languages Embedded Soft Processors IP Core Reuse System Integration Interface Synthesis RTL Design Flow Back-end Flow Hardware/Software Debug Product

65 Structured ASIC Architecture

66 Structured ASIC Architecture
Many questions similar to FPGA Logic cell, RAM types, structure of custom metal routing layers for best speed, area, power Metal programmed, so the answers differ from FPGA How to keep non-recurring engineering cost low Few masks? Cheap masks? Make custom layers easy to electrically and optically verify? Clever tricks? Still have to beat FPGA speed, area, power And device must be routable

