FPGA Devices & FPGA Design Flow


1 FPGA Devices & FPGA Design Flow
ECE 545 Lecture 2 FPGA Devices & FPGA Design Flow

2 Two competing implementation approaches
FPGA (Field Programmable Gate Array): no physical layout design; design ends with a bitstream used to configure a device; bought off the shelf and reconfigured by designers themselves.
ASIC (Application Specific Integrated Circuit): designed all the way from behavioral description to physical layout; designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry.

3 What is an FPGA? Configurable Logic Blocks I/O Blocks Block RAMs

4 Which Way to Go?
ASICs: high performance; low power; low cost in high volumes.
FPGAs: off-the-shelf; low development cost; short time to market; reconfigurability.

5 Other FPGA Advantages
The manufacturing cycle for an ASIC is very costly and lengthy, and engages a lot of manpower. Mistakes not detected at design time have a large impact on development time and cost. FPGAs are perfect for rapid prototyping of digital circuits. Upgrades are easy, as in the case of software. Unique applications: reconfigurable computing.

6 Major FPGA Vendors
SRAM-based FPGAs: Xilinx, Inc.; Altera Corp.; Atmel; Lattice Semiconductor.
Flash & antifuse FPGAs: Actel Corp.; QuickLogic Corp.
Share about 90% of the market

7 The Programmable Marketplace Q1 Calendar Year 2005
[Figure: two pie charts of market share, "PLD Segment" and "FPGA Sub-Segment", covering Xilinx, Altera, Lattice, Actel, QuickLogic and others; the per-vendor percentages cannot be matched to vendors in this transcript.]
Two dominant suppliers, indicating a maturing market. Source: company reports. Latest information available; computed on a 4-quarter rolling basis. Note: Atmel and Cypress numbers (each less than 1%) are not included in this calculation.
Notes: It is clear from these two charts that Xilinx is not only the clear leader in programmable logic products, but is also the leader in FPGA market share. This is due primarily to the fact that we produce products that meet the requirements of our customers. We understand the problems facing our customers and we make it our business to provide solutions to those problems.

8 ISE Alliance and Foundation Series Design Software
Xilinx
Primary products: FPGAs and the associated CAD software (Programmable Logic Devices; ISE Alliance and Foundation Series Design Software).
Main headquarters in San Jose, CA.
Fabless* semiconductor and software company. Foundries: UMC (Taiwan), Seiko Epson (Japan), TSMC (Taiwan), Samsung (Korea).
*Xilinx acquired an equity stake in UMC in 1996.

9 Xilinx FPGA Families
Old families: XC3000, XC4000, XC5200. Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
High-performance families: Virtex (220 nm); Virtex-E, Virtex-EM (180 nm); Virtex-II (130 nm); Virtex-II PRO (130 nm); Virtex-4 (90 nm); Virtex-5 (65 nm); Virtex-6 (40 nm).
Low-cost families: Spartan/XL, derived from XC4000; Spartan-II, derived from Virtex; Spartan-IIE, derived from Virtex-E; Spartan-3 (90 nm); Spartan-3E (90 nm), logic optimized; Spartan-3A (90 nm), I/O optimized; Spartan-3AN (90 nm), non-volatile; Spartan-3A DSP (90 nm), DSP optimized; Spartan-6 (45 nm).

10

11 CLB Structure

12 General structure of an FPGA
The Design Warrior’s Guide to FPGAs Devices, Tools, and Flows. ISBN Copyright © 2004 Mentor Graphics Corp. (

13 Xilinx CLB

14 CLB Structure
The configurable logic block (CLB) contains two slices. Each slice contains two 4-input look-up tables (LUTs), carry & control logic, and two registers. There are two 3-state buffers associated with each CLB, which can be accessed by all the outputs of a CLB. Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase the performance and lower the CLB utilization for wide multiplex functions. The Xilinx internal bus can also be extended off chip.

15 Xilinx CLB Slice

16 CLB Slice Structure
Each slice contains two sets of the following:
Four-input LUT: any 4-input logic function, or 16-bit x 1 sync RAM (SLICEM only), or 16-bit shift register (SLICEM only).
Carry & control: fast arithmetic logic; multiplier logic; multiplexer logic.
Storage element: latch or flip-flop; set and reset; true or inverted inputs; sync. or async. control.
Notes: Two slices form a CLB. These slices can be used independently or together for wider logic functions. Within each slice, the LUT and the flip-flop can likewise be used for the same function or for independent functions. The flip-flops do not restrict designers to only a set or only a clear. For more ASIC-like flows, the flip-flop can be used as a latch, so designers do not need to re-code the design for the device architecture.

17 LUT (Look-Up Table) Functionality
Look-up tables are the primary elements for logic implementation. Each LUT can implement any function of 4 inputs.
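To make the "any function of 4 inputs" claim concrete, here is a minimal Python sketch that models a 4-input LUT as a 16-entry truth table. The function names and the bit ordering of the address are assumptions made for illustration; they are not taken from the slides or from any vendor tool.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    table = []
    for addr in range(16):
        # Assumed bit order: a is the most significant address bit, d the least.
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs simply address the stored table."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def example_func(a, b, c, d):
    # Arbitrary example function; any other 4-input function fits just as well.
    return (a & b) | (c ^ d)


EXAMPLE_TABLE = make_lut4(example_func)
```

Because the table stores one bit per input combination, changing the logic function only changes the 16 stored bits, not the structure of the LUT.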

18 5-Input Functions implemented using two LUTs
One CLB slice can implement any function of 5 inputs. The logic function is partitioned between two LUTs. The F5 multiplexer selects the LUT.
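The partitioning described on this slide can be sketched in Python as follows. The sketch assumes that the fifth input drives the select line of the F5 multiplexer and that each LUT holds the function for one value of that input; the helper names are hypothetical.

```python
from itertools import product


def build_lut4(func4):
    """Truth table of a 4-input function, indexed by (a, b, c, d) as a 4-bit number."""
    return [func4(*bits) & 1 for bits in product((0, 1), repeat=4)]


def split_5_input(func5):
    """Partition a 5-input function into two LUT4 tables, one per value of e."""
    lut_low = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))
    lut_high = build_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))
    return lut_low, lut_high


def eval_slice(lut_low, lut_high, a, b, c, d, e):
    """Both LUTs see a..d; the F5 multiplexer picks one output using e."""
    addr = (a << 3) | (b << 2) | (c << 1) | d
    return lut_high[addr] if e else lut_low[addr]


def parity5(a, b, c, d, e):
    # Example 5-input function: odd parity.
    return a ^ b ^ c ^ d ^ e


LUT_LOW, LUT_HIGH = split_5_input(parity5)
```

Any 5-input function can be split this way, since fixing one input leaves two 4-input functions.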

19 5-Input Functions implemented using two LUTs
[Figure: diagram with the labels LUT and OUT.]

20 Xilinx Multipurpose LUT

21 Simplified view of a Xilinx Logic Cell

22 Distributed RAM
CLB LUT configurable as Distributed RAM
[Figure: a LUT shown as equivalent to the RAM primitives RAM16X1S (ports D, WE, WCLK, A0-A3, O), RAM32X1S (adds A4), RAM16X2S (D0, D1, O0, O1) and RAM16X1D (adds DPRA0-DPRA3, SPO, DPO).]
A single LUT equals a 16x1 RAM. Two LUTs implement single- and dual-port RAMs. Cascade LUTs to increase RAM size. Synchronous write. Synchronous/asynchronous read; the accompanying flip-flops are used for synchronous read.
Notes: When the CLB LUT is configured as memory, it can implement a 16x1 synchronous RAM. One LUT can implement a 16x1 single-port RAM. Two LUTs are used to implement a 16x1 dual-port RAM. The LUTs can be cascaded for the desired memory depth and width. The write operation is synchronous. The read operation is asynchronous and can be made synchronous by using the accompanying flip-flops of the CLB LUT. The distributed RAM is compact and fast, which makes it ideal for small RAM-based functions.
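The write and read behavior described above can be sketched as a small Python model. This is an illustration only: the class and method names are hypothetical, a clock edge is modeled as a method call, and only the single-port 16x1 case is covered.

```python
class Ram16x1S:
    """Behavioral sketch of one LUT used as a 16x1 single-port RAM."""

    def __init__(self):
        self.cells = [0] * 16      # the 16 LUT configuration bits act as storage
        self.registered_out = 0    # models the accompanying flip-flop

    def read_async(self, addr):
        """Asynchronous read: the output follows the address immediately."""
        return self.cells[addr & 0xF]

    def clock_edge(self, addr, data_in, write_enable):
        """Synchronous write, plus an optional registered (synchronous) read."""
        # The flip-flop captures the value that was visible before the edge.
        self.registered_out = self.cells[addr & 0xF]
        if write_enable:
            self.cells[addr & 0xF] = data_in & 1


RAM = Ram16x1S()
RAM.clock_edge(addr=5, data_in=1, write_enable=1)
```

After the write edge, `RAM.read_async(5)` returns the new bit at once, while `RAM.registered_out` still holds the old value until the next edge, which is the difference between the asynchronous and the registered read.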

23 Shift Register
Each LUT can be configured as a shift register
[Figure: LUT as shift register with ports IN, CE, CLK, DEPTH[3:0] and OUT.]
Serial in, serial out. Dynamically addressable delay of up to 16 cycles, for programmable pipelines. Cascade for greater cycle delays. Use CLB flip-flops to add depth.
Notes: The LUT can be configured as a shift register (serial in, serial out) with a length programmable from 1 to 16 bits. For example, DEPTH[3:0] = 0010 (binary) means that the shift register is 3 bits long. In the simplest case, a 16-bit shift register can be implemented in a single LUT, eliminating the need for 16 flip-flops and the extra routing resources that would otherwise have lowered performance.
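A behavioral sketch of the addressable shift register may help here. It assumes, as in the DEPTH example above, that a DEPTH value of n taps the bit that entered n + 1 clock cycles earlier; the class name and methods are hypothetical.

```python
class LutShiftRegister:
    """Sketch of a LUT configured as a 16-stage, serial-in serial-out shift register."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recently shifted-in bit

    def clock_edge(self, data_in, clock_enable=1):
        """Shift by one position on a clock edge when CE is high."""
        if clock_enable:
            self.stages = [data_in & 1] + self.stages[:-1]

    def out(self, depth):
        """DEPTH[3:0] selects which stage drives OUT (0 selects a 1-cycle delay)."""
        return self.stages[depth & 0xF]


SRL = LutShiftRegister()
SRL.clock_edge(1)   # the bit of interest enters
SRL.clock_edge(0)
OUT_AFTER_TWO = SRL.out(0b0010)
SRL.clock_edge(0)
OUT_AFTER_THREE = SRL.out(0b0010)
```

With DEPTH = 0010 the bit appears at OUT only after the third clock edge, matching the 3-bit length in the example.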

24 Shift Register
[Figure: a 64-bit data path with two branches: Operation A (4 cycles) followed by Operation B (8 cycles), 12 cycles in total, and Operation C (3 cycles), giving a 9-cycle imbalance.]
Register-rich FPGA: allows for the addition of pipeline stages to increase throughput. Data paths must be balanced to keep the desired functionality.
Notes: In this example, there is a cycle imbalance, which must be fixed. Let's think of how the shift register can fix the imbalanced cycles. As seen from the slide, the logic will be off by nine clock cycles.
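The balancing arithmetic for this example can be sketched as below. The sketch assumes that every bit of the 64-bit bus needs its own delay line, that one LUT-based shift register covers up to 16 cycles (as stated on the shift register slide), and that the flip-flop alternative costs one flip-flop per bit per cycle. The function name is hypothetical.

```python
import math


def balance_cost(long_path_cycles, short_path_cycles, bus_width, srl_depth=16):
    """Resources needed to delay the short path until it matches the long one."""
    imbalance = abs(long_path_cycles - short_path_cycles)
    srls_per_bit = math.ceil(imbalance / srl_depth)
    return {
        "imbalance_cycles": imbalance,
        "luts_as_shift_registers": srls_per_bit * bus_width,
        "flip_flops_instead": imbalance * bus_width,
    }


# Operation A (4 cycles) + Operation B (8 cycles) versus Operation C (3 cycles).
COST = balance_cost(4 + 8, 3, bus_width=64)
```

For the figures on this slide the imbalance is 9 cycles, so one LUT per bus bit suffices, compared with 9 flip-flops per bit for a register-based delay.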

25 Carry & Control Logic
[Figure: slice diagram with two look-up tables (inputs G1-G4 and F1-F4), each followed by carry & control logic and a D flip-flop (D, Q, CK, EC, S, R); slice-level signals CIN, COUT, CLK, CE, SR, BY and F5IN; outputs X, XB, Y and YB.]

26 Fast Carry Logic
Each CLB contains separate logic and routing for the fast generation of sum & carry signals. This increases the efficiency and performance of adders, subtractors, accumulators, comparators, and counters. Carry logic is independent of normal logic and routing resources.
[Figure: carry logic routing running from LSB to MSB.]
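As an illustration of how a carry chain separates sum and carry generation, here is a minimal bit-serial adder model in Python. The propagate/select formulation is an assumption about how such chains are commonly organized (a LUT computes the propagate signal, a dedicated multiplexer forwards the carry); the slides do not give this level of detail, and the function name is hypothetical.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two unsigned numbers bit by bit, LSB first, as a carry chain would."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        propagate = a_bit ^ b_bit          # computed in the LUT
        sum_bit = propagate ^ carry        # dedicated XOR produces the sum
        # Dedicated mux: pass the incoming carry on, or generate a new one.
        carry = carry if propagate else a_bit
        total |= sum_bit << i
    return total, carry


SUM_8BIT, CARRY_OUT = carry_chain_add(200, 100, width=8)
```

Only the propagate signal uses general logic; the carry itself travels on the dedicated path from LSB to MSB.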

27 Accessing Carry Logic
All major synthesis tools can infer carry logic for arithmetic functions:
Addition (SUM <= A + B)
Subtraction (DIFF <= A - B)
Comparators (if A < B then…)
Counters (count <= count + 1)

28 Input/Output Blocks (IOBs)

29 Basic I/O Block Structure
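[Figure: I/O block diagram with three paths, each containing a D flip-flop with clock enable (EC) and set/reset (SR): the three-state path (three-state control), the output path, and the input path, which provides both a direct input and a registered input. Control signals: Clock, Enable, Set/Reset.]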

30 IOB Functionality
The IOB provides the interface between the package pins and the CLBs. Each IOB can work as a uni- or bi-directional I/O. Outputs can be forced into high impedance. Inputs and outputs can be registered (advised for high-performance I/O). Inputs can be delayed.

31 Other Components of Spartan 3 FPGAs

32 RAM Blocks and Multipliers in Xilinx FPGAs

33 A simple clock tree

34 Digital Clock Manager (DCM)

35 Spartan-3 Family Attributes

36 Spartan-3 FPGA Family Members

37 FPGA Nomenclature

38 FPGA device present on the RC10 board
XC3S1500-4FG320
XC3S: Spartan 3 family
1500: 1500 k = 1.5 M equivalent logic gates
-4: speed grade -4 = standard performance
FG: package type
320: 320 pins
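The decoding shown on this slide can be sketched as a small parser. The regular expression is an assumption that covers only names shaped like the one on the slide (XC3S, gate count in thousands, speed grade, package letters, pin count); it is not a complete description of the vendor's naming scheme, and the function name is hypothetical.

```python
import re

# Assumed pattern for plain Spartan-3 part names such as XC3S1500-4FG320.
PART_PATTERN = re.compile(
    r"^XC3S(?P<kgates>\d+)-(?P<speed>\d)(?P<package>[A-Z]+)(?P<pins>\d+)$"
)


def decode_part_name(name):
    """Split a part name into the fields listed on the slide."""
    match = PART_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized part name: {name!r}")
    return {
        "family": "Spartan 3",
        "equivalent_gates": int(match.group("kgates")) * 1000,
        "speed_grade": -int(match.group("speed")),
        "package_type": match.group("package"),
        "pins": int(match.group("pins")),
    }


RC10_DEVICE = decode_part_name("XC3S1500-4FG320")
```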

39 FPGA Design Flow

40 Design flow (1)
Flow: Specification (Lab Experiments), VHDL description (Your Source Files), Functional simulation, Synthesis, Post-synthesis simulation.
Specification (Lab Experiments): Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in Experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
VHDL description (Your Source Files):
    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port(
        clock, reset, encr_decr: in std_logic;
        data_input: in std_logic_vector(31 downto 0);
        data_output: out std_logic_vector(31 downto 0);
        out_full: in std_logic;
        key_input: in std_logic_vector(31 downto 0);
        key_read: out std_logic  -- no semicolon after the last port
      );
    end RC5_core;  -- must match the entity name

41 Design flow (2) Implementation Timing simulation Configuration
On chip testing

42 Tools used in FPGA Design Flow
Design -> VHDL code -> Functionally verified VHDL code -> Synthesis (Synplicity Synplify Pro or Xilinx XST) -> Netlist -> Implementation (Xilinx ISE) -> Bitstream

43 Synthesis

44 Synthesis Tools Xilinx XST Synplify Pro … and others

45 Logic Synthesis VHDL description Circuit netlist
    architecture MLU_DATAFLOW of MLU is
      signal A1: STD_LOGIC;
      signal B1: STD_LOGIC;
      signal Y1: STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;
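To check what the synthesized netlist should compute, the MLU dataflow description above can be mirrored in a short Python reference model. This is only a behavioral sketch for single-bit values 0 and 1; the function name and keyword defaults are assumptions, and other STD_LOGIC states are not modeled.

```python
def mlu(a, b, l1, l0, neg_a=0, neg_b=0, neg_y=0):
    """Bit-level model of the MLU: optional input/output inversion around a 4-way mux."""
    a1 = a ^ neg_a                      # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b                      # B1 <= B when NEG_B = '0' else not B
    mux = [
        a1 & b1,                        # MUX_0: and
        a1 | b1,                        # MUX_1: or
        a1 ^ b1,                        # MUX_2: xor
        1 ^ (a1 ^ b1),                  # MUX_3: xnor
    ]
    y1 = mux[(l1 << 1) | l0]            # with (L1 & L0) select
    return y1 ^ neg_y                   # Y <= Y1 when NEG_Y = '0' else not Y1


NAND_OF_ONES = mlu(1, 1, l1=0, l0=0, neg_y=1)
```

Such a model can serve as a reference when comparing functional and post-synthesis simulation results.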

46 Circuit netlist (RTL view)

47 Mapping
[Figure: netlist mapped onto device primitives LUT0 to LUT5, FF1 and FF2.]

48 RTL view in Synplify Pro
General logic structures can be recognized in the RTL view: comparator, incrementer, MUX.

49 Crossprobing between RTL view and code
Each port, net or block can be selected by mouse click from the browser or directly from the RTL View. Double-clicking an element shows its source code. Reverse crossprobing is also possible: if a section of code is marked, the corresponding element of the RTL View is marked too.

50 Technology View in Synplify Pro
Technology view is a mapped RTL view, presented using device primitives. It can be opened by pressing the corresponding toolbar button or by double-clicking the “.srm” file. As in the RTL View, the ports, nets and blocks browser and the same buttons can be used here. Two additional buttons are enabled: show critical path and open timing analyst. Pay attention: the technology view is usually large and is presented on a number of sheets.

51 Viewing critical path
The critical path can be viewed by pressing the “show critical path” button. Delay values are written near each component of the path.

52 Timing Analyst
The timing analyst is opened by pressing the “open timing analyst” button. It makes it possible to analyze different paths in the design. The timing analyst can be opened only from the Technology View.

53 Implementation

54 Implementation After synthesis the entire implementation process is performed by FPGA vendor tools

55

56 Translation
Synthesis produces the circuit netlist in EDIF (Electronic Design Interchange Format) and timing constraints in an NCF (Native Constraint File). The Constraint Editor or a text editor produces the UCF (User Constraint File). Translation takes the EDIF, NCF and UCF files and produces an NGD (Native Generic Database) file.

57 Mapping
[Figure: netlist mapped onto device primitives LUT0 to LUT5, FF1 and FF2.]

58 Placing FPGA CLB SLICES

59 Routing FPGA Programmable Connections

60 Configuration
Once a design is implemented, you must create a file that the FPGA can understand. This file is called a bitstream: a BIT file (.bit extension). The BIT file can be downloaded directly to the FPGA, or it can be converted into a PROM file that stores the programming information.

61 Two main stages of the FPGA Design Flow
Synthesis: RTL Synthesis (technology independent), then Map (technology dependent). Implementation: Place & Route, then Configure.
RTL Synthesis: code analysis; derivation of main logic constructions; technology independent optimization; creation of “RTL View”.
Map: mapping of extracted logic structures to device primitives; technology dependent optimization; application of “synthesis constraints”; netlist generation; creation of “Technology View”.
Place & Route: placement of generated netlist onto the device; choosing best interconnect structure for the placed design; application of “physical constraints”.
Configure: bitstream generation; burning device.

62 Report files ECE 448 – FPGA and ASIC Design with VHDL

63 Map report header
Release 8.1i Map I.24
Xilinx Mapping Report File for Design 'Lab3Demo'
Design Information
Command Line   : c:\Xilinx\bin\nt\map.exe -p 3S1500FG o map.ncd -pr b -k 4 -cm area -c 100 Lab3Demo.ngd Lab3Demo.pcf
Target Device  : xc3s1500
Target Package : fg320
Target Speed   : -4
Mapper Version : spartan3 -- $Revision: 1.34 $
Mapped Date    : Tue Feb 13 17:04:

64 Map report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
  Number of Slice Flip Flops: out of 26, %
  Number of 4 input LUTs: out of 26, %
Logic Distribution:
  Number of occupied Slices: out of 13, %
  Number of Slices containing only related logic: out of %
  Number of Slices containing unrelated logic: out of %
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: out of 26, %
  Number used as logic:
  Number used as a route-thru:
Number of bonded IOBs: out of %
  IOB Flip Flops:
Number of GCLKs: out of %

65 Place & route report
Asterisk (*) preceding a constraint indicates it was not met. This may be due to a setup or hold violation.
Constraint | Requested | Actual | Logic Levels | Absolute Slack | Number of errors
* TS_CLOCK = PERIOD TIMEGRP "CLOCK" 5 ns HIGH 50% | 5.000ns | 5.140ns | | ns | 5
TS_gen1Hz_Clock1Hz = PERIOD TIMEGRP "gen1Hz_Clock1Hz" 5 ns HIGH 50% | 5.000ns | 4.137ns | | 0.863ns | 0
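The relationship between the Requested, Actual and Slack columns can be sketched as follows. The sketch assumes that slack is simply the requested period minus the achieved period, which agrees with the second row of the report (5.000 ns - 4.137 ns = 0.863 ns), and that the maximum clock frequency is the reciprocal of the achieved period. The function names are hypothetical and are not part of any Xilinx tool.

```python
def slack_ns(requested_ns, actual_ns):
    """Slack of a PERIOD constraint; a negative value means the constraint was not met."""
    return round(requested_ns - actual_ns, 3)


def constraint_met(requested_ns, actual_ns):
    """True when the achieved period is no longer than the requested one."""
    return slack_ns(requested_ns, actual_ns) >= 0


def max_frequency_mhz(period_ns):
    """Highest clock frequency supported by a given minimum period."""
    return 1000.0 / period_ns


# Rows from the report above: (constraint name, requested ns, actual ns).
REPORT_ROWS = [
    ("TS_CLOCK", 5.000, 5.140),
    ("TS_gen1Hz_Clock1Hz", 5.000, 4.137),
]
RESULTS = {name: constraint_met(req, act) for name, req, act in REPORT_ROWS}
```

Under these assumptions TS_CLOCK fails, which is consistent with the asterisk in front of it, while TS_gen1Hz_Clock1Hz passes with positive slack.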

66 Post layout timing report
Clock to Setup on destination clock CLOCK
Source Clock | Src:Rise Dest:Rise | Src:Fall Dest:Rise | Src:Rise Dest:Fall | Src:Fall Dest:Fall
CLOCK | | | |
Timing summary:
Timing errors: 9  Score: 543
Constraints cover 574 paths, 0 nets, and 187 connections
Design statistics:
Minimum period: ns (Maximum frequency: MHz)

