George Mason University FPGA Devices & FPGA Design Flow ECE 545 Lecture 8.


1 George Mason University FPGA Devices & FPGA Design Flow ECE 545 Lecture 8

2 Required Reading: Xilinx, Inc., Spartan-3 FPGA Family Data Sheet
- Module 1: Introduction (Features, Architectural Overview, Package Marking)
- Module 2: CLB Overview

3 Required Reading: Xilinx, Inc., Spartan-3 Generation FPGA User Guide
- Chapter 5: Using Configurable Logic Blocks (CLBs)
- Chapter 6: Using Look-Up Tables as Distributed RAM
- Chapter 7: Using Look-Up Tables as Shift Registers (SRL16)
- Chapter 9: Using Carry and Arithmetic Logic

4 Two competing implementation approaches
ASIC (Application Specific Integrated Circuit):
- designed all the way from behavioral description to physical layout
- designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
FPGA (Field Programmable Gate Array):
- no physical layout design; design ends with a bitstream used to configure a device
- bought off the shelf and reconfigured by designers themselves

5 What is an FPGA? [Figure: FPGA structure with Configurable Logic Blocks, I/O Blocks, and Block RAMs]

6 Which Way to Go?
ASICs:
- High performance
- Low power
- Low cost in high volumes
FPGAs:
- Off-the-shelf
- Low development cost
- Short time to market
- Reconfigurability

7 Other FPGA Advantages
- Manufacturing cycle for ASIC is very costly, lengthy and engages lots of manpower
- Mistakes not detected at design time have large impact on development time and cost
- FPGAs are perfect for rapid prototyping of digital circuits
- Easy upgrades, as in the case of software
- Unique applications: reconfigurable computing

8 Major FPGA Vendors
SRAM-based FPGAs:
- Xilinx, Inc.
- Altera Corp.
  (Xilinx and Altera together share about 85% of the market)
- Atmel
- Lattice Semiconductor
Flash & antifuse FPGAs:
- Microsemi SoC Products Group (formerly Actel Corp.)
- QuickLogic Corp.

9 Xilinx
- Primary products: FPGAs and the associated CAD software (Programmable Logic Devices; ISE Alliance and Foundation Series Design Software)
- Main headquarters in San Jose, CA
- Fabless* Semiconductor and Software Company; foundries:
  - UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  - Seiko Epson (Japan)
  - TSMC (Taiwan)
  - Samsung (Korea)

10 Xilinx FPGA Families
Old families: XC3000, XC4000, XC5200. Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
High-performance families:
- Virtex (220 nm)
- Virtex-E, Virtex-EM (180 nm)
- Virtex-II (130 nm)
- Virtex-II PRO (130 nm)
- Virtex-4 (90 nm)
- Virtex-5 (65 nm)
- Virtex-6 (40 nm)
Low-cost families:
- Spartan/XL: derived from XC4000
- Spartan-II: derived from Virtex
- Spartan-IIE: derived from Virtex-E
- Spartan-3 (90 nm)
- Spartan-3E (90 nm): logic optimized
- Spartan-3A (90 nm): I/O optimized
- Spartan-3AN (90 nm): non-volatile
- Spartan-3A DSP (90 nm): DSP optimized
- Spartan-6 (45 nm)

11 11

12 CLB Structure

13 General structure of an FPGA [Figure from The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)]

14 Xilinx Spartan 3 CLB [Figure from The Design Warrior’s Guide to FPGAs]

15 CLB Slice = 2 Logic Cells [Figure: one slice containing two look-up tables (inputs G1-G4 and F1-F4, output O), two blocks of carry & control logic (CIN, COUT), and two flip-flops (D, Q, CK, EC, S, R); slice signals F5IN, BY, SR, CLK, CE; outputs Y, YB, X, XB]

16 Xilinx Multipurpose LUT (MLUT): 16 x 1 ROM (logic) [Figure from The Design Warrior’s Guide to FPGAs]

17 Spartan 3 CLB Structure

18 CLB Slice Structure
Each slice contains two sets of the following:
- Four-input LUT: any 4-input logic function (16x1 ROM), or 16-bit x 1 sync RAM (SLICEM only), or 16-bit shift register (SLICEM only)
- Carry & Control: fast arithmetic logic, multiplier logic, multiplexer logic
- Storage element: latch or flip-flop; set and reset; true or inverted inputs; sync. or async. control

19 Multipurpose Look-Up Table (MLUT) [Figure: slice diagram with two look-up tables, carry & control logic, and two flip-flops]

20 MLUT as 16x1 ROM [Figure from The Design Warrior’s Guide to FPGAs]

21 LUT (Look-Up Table) Functionality
- Look-up tables are the primary elements for logic implementation
- Each LUT can implement any function of 4 inputs

22 5-Input Functions implemented using two LUTs
- One CLB slice can implement any function of 5 inputs
- The logic function is partitioned between two LUTs
- The F5 multiplexer selects the LUT
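The partitioning described on this slide can be sketched in VHDL as two 16x1 truth tables and a 2-to-1 multiplexer. This is a minimal behavioral sketch, not the vendor primitive: the entity name func5, the port names, and the truth-table contents (5-input odd parity) are all illustrative assumptions, and whether a synthesis tool maps it to two LUTs plus the F5 multiplexer depends on the tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a 5-input function built from two 16x1 tables
-- and a 2-to-1 multiplexer, mirroring the two-LUT + F5 multiplexer scheme.
entity func5 is
  port (
    a : in  std_logic_vector(4 downto 0);  -- a(4) acts as the multiplexer select
    y : out std_logic
  );
end entity func5;

architecture lut_pair of func5 is
  -- Bit i of each constant is the function value for a(3 downto 0) = i.
  -- Assumed contents: odd parity of the five inputs.
  constant LUT_LOW  : std_logic_vector(15 downto 0) := x"6996";  -- used when a(4) = '0'
  constant LUT_HIGH : std_logic_vector(15 downto 0) := x"9669";  -- used when a(4) = '1'
  signal f_low, f_high : std_logic;
begin
  -- Each look-up is a function of the same four inputs
  f_low  <= LUT_LOW(to_integer(unsigned(a(3 downto 0))));
  f_high <= LUT_HIGH(to_integer(unsigned(a(3 downto 0))));

  -- The fifth input selects which table drives the output
  y <= f_high when a(4) = '1' else f_low;
end architecture lut_pair;
```

Any other 5-input function fits the same structure by changing only the two 16-bit constants.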

23 5-Input Functions implemented using two LUTs [Figure: LUTs driving the output OUT]

24 MLUT as 16x1 RAM [Figure from The Design Warrior’s Guide to FPGAs]

25 Distributed RAM
- CLB LUT configurable as distributed RAM
  - A single LUT equals a 16x1 RAM
  - Two LUTs implement single- and dual-port RAMs
  - Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/asynchronous read
  - Accompanying flip-flops used for synchronous read
[Figure: primitives RAM16X1S (D, WE, WCLK, A0-A3, O), RAM32X1S (D, WE, WCLK, A0-A4, O), RAM16X2S (D0, D1, WE, WCLK, A0-A3, O0, O1), and RAM16X1D (D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO)]
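A behavioral description with the properties listed on this slide (synchronous write, asynchronous read) might look like the sketch below. The entity name ram16x1 and its port names are assumptions chosen to resemble the RAM16X1S pins; whether a given synthesis tool maps this to a LUT-based RAM is tool dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16x1 single-port memory in the style of distributed RAM
entity ram16x1 is
  port (
    wclk : in  std_logic;
    we   : in  std_logic;
    a    : in  std_logic_vector(3 downto 0);
    d    : in  std_logic;
    o    : out std_logic
  );
end entity ram16x1;

architecture behavioral of ram16x1 is
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Synchronous write: data is stored on the rising clock edge when WE is high
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        mem(to_integer(unsigned(a))) <= d;
      end if;
    end if;
  end process;

  -- Asynchronous read: the addressed bit appears without waiting for a clock edge
  o <= mem(to_integer(unsigned(a)));
end architecture behavioral;
```

Registering the output o in a clocked process would give the synchronous read variant that uses the accompanying flip-flop.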

26 MLUT as 16-bit Shift Register (SRL16) [Figure from The Design Warrior’s Guide to FPGAs]

27 Shift Register (LUT = Shift Register)
- Each LUT can be configured as a shift register: serial in, serial out
- Dynamically addressable delay of up to 16 cycles
- For programmable pipeline
- Cascade for greater cycle delays
- Use CLB flip-flops to add depth
[Figure: LUT drawn as a chain of flip-flops with inputs IN, CE, CLK, DEPTH[3:0] and output OUT]

28 Using Multipurpose Look-Up Tables in the Shift Register Mode (SRL16)
Inferred from behavioral description in VHDL for shift registers with
- one serial input, one serial output
- no reset, no set
ECE 448 – FPGA and ASIC Design with VHDL
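A behavioral shift register that satisfies the conditions on this slide (one serial input, one serial output, no reset, no set) could be written as in the following sketch. The entity and port names are hypothetical, and the actual mapping to SRL16 remains a decision of the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 16-stage serial-in, serial-out shift register
entity shift16 is
  port (
    clk  : in  std_logic;
    ce   : in  std_logic;
    sin  : in  std_logic;
    sout : out std_logic
  );
end entity shift16;

architecture behavioral of shift16 is
  -- The initial value is not a reset; no set or reset input is described
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        -- Shift toward the most significant bit, inserting the serial input
        sr <= sr(14 downto 0) & sin;
      end if;
    end if;
  end process;

  -- Only the last stage is visible, so no parallel access to the bits is needed
  sout <= sr(15);
end architecture behavioral;
```

Adding a reset, or reading intermediate bits in parallel, would describe hardware that a single LUT in shift register mode cannot provide.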

29 Cascading LUT Shift Registers into Shift Registers Longer than 16 bits

30 Shift Register
- Register-rich FPGA: allows for addition of pipeline stages to increase throughput
- Data paths must be balanced to keep desired functionality
[Figure: 64-bit data paths. One path passes through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total; the other passes through Operation C (3 cycles), leaving a 9-cycle imbalance]
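The imbalance in the figure follows from the cycle counts shown, and the cost of correcting it can be estimated. The sketch below assumes that Operations A and B are in series on one path, that Operation C is alone on the other, and that one LUT in shift register mode is used per data bit; the last two lines are an illustration, not figures from the slide.

```latex
\begin{align*}
\text{latency}_{A \rightarrow B} &= 4 + 8 = 12 \text{ cycles} \\
\text{latency}_{C} &= 3 \text{ cycles} \\
\text{imbalance} &= 12 - 3 = 9 \text{ cycles} \\
\text{flip-flops for a 9-stage, 64-bit delay} &= 9 \times 64 = 576 \\
\text{LUTs in shift register mode (depth } 9 \le 16\text{)} &= 64
\end{align*}
```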

31 Carry & Control Logic [Figure: slice diagram with two look-up tables, carry & control logic, and two flip-flops]

32 Full-adder: x + y + c_in = (c_out s)_2
x  y  c_in | c_out  s
0  0  0    | 0      0
0  0  1    | 0      1
0  1  0    | 0      1
0  1  1    | 1      0
1  0  0    | 0      1
1  0  1    | 1      0
1  1  0    | 1      0
1  1  1    | 1      1

33 Full-adder: alternative implementations
x  y | c_out  s
0  0 | 0      c_in
0  1 | c_in   NOT c_in
1  0 | c_in   NOT c_in
1  1 | 1      c_in

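34 Full-adder: alternative implementations. Implementation used to generate fast carry logic in Xilinx FPGAs
x  y | c_out
0  0 | y
0  1 | c_in
1  0 | c_in
1  1 | y
p = x ⊕ y
g = y
s = p ⊕ c_in = x ⊕ y ⊕ c_in
[Figure: LUT with inputs A1, A2 (x, y) producing p on output D; a 0/1 multiplexer controlled by p producing C_out from g and C_in; an XOR gate producing S]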
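The equations on this slide translate directly into a small dataflow description. This is a sketch with assumed entity and signal names; it shows the logic only and does not instantiate the dedicated carry primitives.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical full adder written in propagate/generate form
entity full_adder_pg is
  port (
    x, y, cin : in  std_logic;
    s, cout   : out std_logic
  );
end entity full_adder_pg;

architecture dataflow of full_adder_pg is
  signal p, g : std_logic;
begin
  p <= x xor y;    -- propagate: the part computed in the LUT
  g <= y;          -- generate: when p = '0', x equals y, so either input gives the carry
  s <= p xor cin;  -- sum: XOR of the propagate signal and the incoming carry

  -- Carry multiplexer: pass the incoming carry when p = '1', otherwise use g
  cout <= cin when p = '1' else g;
end architecture dataflow;
```

The truth table above can be checked against this code: for x = y the output carry equals y, and for x different from y it equals cin.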

35 Carry & Control Logic in Spartan 3 FPGAs LUT Hardwired (fast) logic

36 Simplified View of Spartan-3 FPGA Carry and Arithmetic Logic in One Logic Cell

37 Simplified View of Carry Logic in One Spartan 3 Slice

38

39 Critical Path for an Adder Implemented Using Xilinx Spartan 3/Spartan 3E FPGAs

40 Number and Length of Carry Chains for Spartan 3 FPGAs

41 Bottom Operand Input to Carry Out Delay T_OPCYF: 0.9 ns for Spartan 3

42 Carry Propagation Delay t_BYP: 0.2 ns for Spartan 3

43 Carry Input to Top Sum Combinational Output Delay T_CINY: 1.2 ns for Spartan 3

44 Critical Path Delays and Maximum Clock Frequencies (taking into account surrounding registers)
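The three delays quoted on slides 41 to 43 can be combined into a rough estimate of the carry-chain delay of an n-bit adder. The model below is an assumption, not a formula taken from the slides: it supposes two adder bits per slice, an even n of at least 4, a carry that enters the chain in the first slice (T_OPCYF), ripples through the n/2 - 2 intermediate slices (t_BYP each), and leaves through the top sum output of the last slice (T_CINY). It ignores routing, clock-to-output, and setup delays, so it does not account for the surrounding registers mentioned on slide 44.

```latex
\begin{align*}
T_{\text{adder}}(n) &\approx T_{OPCYF} + \left(\frac{n}{2} - 2\right) t_{BYP} + T_{CINY} \\
T_{\text{adder}}(32) &\approx 0.9 + 14 \times 0.2 + 1.2 = 4.9\ \text{ns}
\end{align*}
```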

45 Fast Carry Logic
- Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  - Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
- Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing running from LSB to MSB]

46 Accessing Carry Logic
All major synthesis tools can infer carry logic for arithmetic functions:
- Addition (SUM <= A + B)
- Subtraction (DIFF <= A - B)
- Comparators (if A < B then…)
- Counters (count <= count + 1)
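The four kinds of operations listed on this slide can be collected in one small design to see what a synthesis tool infers. The sketch below is illustrative: the entity name, the generic N, and the port names are assumptions, and it uses ieee.numeric_std rather than the std_logic_unsigned package that appears elsewhere in these slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical design exercising the arithmetic operators from the slide
entity arith_demo is
  generic (N : positive := 8);
  port (
    clk, reset : in  std_logic;
    a, b       : in  unsigned(N-1 downto 0);
    sum, diff  : out unsigned(N-1 downto 0);
    a_lt_b     : out std_logic;
    count      : out unsigned(N-1 downto 0)
  );
end entity arith_demo;

architecture rtl of arith_demo is
  signal count_reg : unsigned(N-1 downto 0) := (others => '0');
begin
  sum    <= a + b;                       -- addition (result truncated to N bits)
  diff   <= a - b;                       -- subtraction (wraps around modulo 2**N)
  a_lt_b <= '1' when a < b else '0';     -- comparator

  -- Counter with synchronous reset
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count_reg <= (others => '0');
      else
        count_reg <= count_reg + 1;
      end if;
    end if;
  end process;

  count <= count_reg;
end architecture rtl;
```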

47 Logic Cell = ½ of a CLB Slice

48 CLB Slice = 2 Logic Cells

49 Examples: Determine the amount of Spartan 3 resources needed to implement a given circuit

50 Circuit 1: Top level

51 Circuit 1: F – function [Figure: schematic of F with inputs x3-x0 and y3-y0 and internal signals a-h, built from multiplexers, a <<< 3 block, a 2-to-4 decoder (w1, w0, En, y3-y0), and a full adder (x, y, cin, cout, s)]

52 Circuit 2: Top level [Figure: registers R0-R15, function block F, and signals a-e, d, y, z, clk, and run]

53 Circuit 2: F – function [Figure: schematic of F with inputs x3-x0 and y3-y0 and internal signals a-i, built from multiplexers, a >> 2 block, a priority encoder (w3-w0, y1, y0, z), and a half adder (x, y, cout, s)]

54 Circuit 3: Top level

55 Circuit 4: Top level

56 Other Components of Spartan 3 FPGAs

57 RAM Blocks and Multipliers in Xilinx FPGAs [Figure from The Design Warrior’s Guide to FPGAs]

58 Combinational and Registered Multiplier

59 Dedicated Multiplier Block

60 Block RAM
- Most efficient memory implementation
  - Dedicated blocks of memory
- Ideal for most memory requirements
  - 4 to 36 memory blocks in Spartan 3
  - 18 kbits = 18,432 bits per block (16 k without parity bits)
  - Use multiple blocks for larger memories
- Builds both single and true dual-port RAMs
- Synchronous write and read (different from distributed RAM)
[Figure: Spartan-3 dual-port block RAM with Port A and Port B]
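The key difference from distributed RAM noted on this slide, a synchronous read, shows up in a behavioral description as a read inside the clocked process. The sketch below assumes a single-port 2k x 9 organization; the entity and port names are hypothetical, and whether the array is placed in a block RAM depends on the synthesis tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port 2k x 9 memory with synchronous write and read
entity bram_2kx9 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(10 downto 0);
    din  : in  std_logic_vector(8 downto 0);
    dout : out std_logic_vector(8 downto 0)
  );
end entity bram_2kx9;

architecture behavioral of bram_2kx9 is
  type ram_type is array (0 to 2047) of std_logic_vector(8 downto 0);
  signal ram : ram_type := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;   -- synchronous write
      end if;
      -- Synchronous read: the output changes only on a clock edge.
      -- During a write, the previously stored word is read.
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture behavioral;
```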

61 Block RAM can have various configurations (port aspect ratios)
Configuration     Addresses      Data width (bits)
16k x 1           0 to 16,383    1
8k x 2            0 to 8,191     2
4k x 4            0 to 4,095     4
2k x (8+1)        0 to 2,047     8+1
1024 x (16+2)     0 to 1,023     16+2
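The listed configurations can be checked against the block sizes given on slide 60 (16 k bits without parity, 18,432 bits with parity). The arithmetic below uses only the numbers from these two slides.

```latex
\begin{align*}
16384 \times 1 = 8192 \times 2 = 4096 \times 4 &= 16384 \text{ bits (no parity)} \\
2048 \times (8 + 1) = 1024 \times (16 + 2) &= 18432 \text{ bits (with parity)} \\
18432 - 16384 &= 2048 \text{ parity bits}
\end{align*}
```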

62 Block RAM Port Aspect Ratios

63 Single-Port Block RAM [Figure: data input DI[w-p-1:0] and data output DO[w-p-1:0]]

64 Dual-Port Block RAM [Figure: Port A with DIA[w_A-p_A-1:0] and DOA[w_A-p_A-1:0]; Port B with DIB[w_B-p_B-1:0] and DOB[w_B-p_B-1:0]]

65 Input/Output Blocks (IOBs)

66 Basic I/O Block Structure [Figure: three flip-flops (D, EC, Q, SR) serving the three-state control, the output path, and the input path; signals: Three-State, Output, Clock, Set/Reset, FF Enable, Direct Input, Registered Input]

67 IOB Functionality
- The IOB provides the interface between the package pins and CLBs
- Each IOB can work as uni- or bi-directional I/O
- Outputs can be forced into high impedance
- Inputs and outputs can be registered (advised for high-performance I/O)
- Inputs can be delayed
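A bidirectional pin with registered input, output, and three-state control, as described on this slide, might be written as follows. The entity and port names are assumptions, and placing the three registers inside the IOB rather than in CLB flip-flops is a decision of the implementation tools and their constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bidirectional I/O with registered data and output enable
entity bidir_io is
  port (
    clk      : in    std_logic;
    oe       : in    std_logic;   -- '1' drives the pin, '0' releases it
    data_out : in    std_logic;   -- value to drive onto the pin
    data_in  : out   std_logic;   -- registered value read from the pin
    pad      : inout std_logic    -- connection to the package pin
  );
end entity bidir_io;

architecture rtl of bidir_io is
  signal out_reg : std_logic := '0';
  signal oe_reg  : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      out_reg <= data_out;  -- registered output
      oe_reg  <= oe;        -- registered three-state control
      data_in <= pad;       -- registered input
    end if;
  end process;

  -- Output forced into high impedance when the enable is low
  pad <= out_reg when oe_reg = '1' else 'Z';
end architecture rtl;
```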

68 Spartan-3 Family Attributes

69 Spartan-3 FPGA Family Members

70 FPGA Nomenclature

71 FPGA Nomenclature Example: XC3S1500-4FG320
- XC3S: Spartan 3 family
- 1500: 1500 k = 1.5 M equivalent logic gates
- -4: speed grade (-4 = standard performance)
- FG: package type
- 320: number of pins

72 FPGA Design Flow

73 FPGA Design process (1)
Specification / Pseudocode:
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
On-paper hardware design (Block diagram & ASM chart)
VHDL description (Your Source Files):
library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
entity RC5_core is
  port(
    clock, reset, encr_decr: in std_logic;
    data_input: in std_logic_vector(31 downto 0);
    data_output: out std_logic_vector(31 downto 0);
    out_full: in std_logic;
    key_input: in std_logic_vector(31 downto 0);
    key_read: out std_logic
  );
end RC5_core;
Functional simulation
Synthesis
Post-synthesis simulation

74 FPGA Design process (2)
- Implementation
- Configuration
- Timing simulation
- On chip testing

75 Tools used in FPGA Design Flow [Figure: Design → VHDL code → functionally verified VHDL code → Synthesis (Xilinx XST, Synplify Premier) → netlist → Implementation (Xilinx ISE) → bitstream]

76 Synthesis

77 Synthesis Tools: Synplify Premier, Xilinx XST, … and others

78 Logic Synthesis: VHDL description → circuit netlist
architecture MLU_DATAFLOW of MLU is
  signal A1: STD_LOGIC;
  signal B1: STD_LOGIC;
  signal Y1: STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
  -- Named selector: a bare concatenation in a "with ... select" is rejected by some tools
  signal L: STD_LOGIC_VECTOR(1 downto 0);
begin
  -- Optional inversion of the inputs and of the output
  A1 <= A when (NEG_A='0') else not A;
  B1 <= B when (NEG_B='0') else not B;
  Y  <= Y1 when (NEG_Y='0') else not Y1;
  -- Candidate logic operations
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;
  -- 4-to-1 multiplexer selecting the operation
  L <= L1 & L0;
  with L select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
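The architecture on this slide refers to an entity MLU that is not shown in the transcript. A declaration consistent with the signals it uses might look like the sketch below; the port names are taken from the architecture body, while their order and grouping are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed entity for the MLU_DATAFLOW architecture; port list inferred from its body
entity MLU is
  port (
    NEG_A : in  STD_LOGIC;  -- invert input A when '1'
    NEG_B : in  STD_LOGIC;  -- invert input B when '1'
    NEG_Y : in  STD_LOGIC;  -- invert the result when '1'
    A     : in  STD_LOGIC;
    B     : in  STD_LOGIC;
    L1    : in  STD_LOGIC;  -- operation select, upper bit
    L0    : in  STD_LOGIC;  -- operation select, lower bit
    Y     : out STD_LOGIC
  );
end MLU;
```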

79 Circuit netlist (RTL view)

80 Mapping [Figure: netlist mapped onto LUT0-LUT5 and flip-flops FF1, FF2]

81 RTL view in Synplify Premier
General logic structures can be recognized in RTL view [Figure labels: incrementer, comparator, MUX]

82 Crossprobing between RTL view and code
- Each port, net or block can be chosen by mouse click from the browser or directly from the RTL View
- By double-clicking on an element, its source code can be seen
- Reverse crossprobing is also possible: if a section of code is marked, the corresponding element of the RTL View is marked too

83 Technology View in Synplify Pro
- Technology view is a mapped RTL view. It can be seen by pressing the corresponding button or by double-clicking on the “.srm” file
- As in the case of “RTL View”, the same buttons can be used here
- Two additional buttons are enabled: show critical path, open timing analyst
- Technology view is presented using device primitives
- Ports, nets and blocks browser
- Pay attention: technology view is usually large and presented on a number of sheets

84 Viewing critical path
- Critical path can be viewed by pressing the show critical path button
- Delay values are written near each component of the path

85 Timing Analyst
- Timing analyst is opened by pressing the open timing analyst button
- Timing analyst makes it possible to analyze different paths in the design
- Timing analyst can be opened only from Technology View

86 Implementation

87 Implementation
After synthesis, the entire implementation process is performed by FPGA vendor tools

88

89 Translation
Inputs:
- EDIF (Electronic Design Interchange Format): circuit netlist from synthesis
- NCF (Native Constraint File): timing constraints from synthesis
- UCF (User Constraint File): created with the Constraint Editor or a text editor
Output:
- NGD (Native Generic Database file)

90 Mapping [Figure: netlist mapped onto LUT0-LUT5 and flip-flops FF1, FF2]

91 Placing [Figure: CLB slices placed on the FPGA]

92 Routing [Figure: programmable connections on the FPGA]

93 Configuration
- Once a design is implemented, you must create a file that the FPGA can understand
- This file is called a bitstream: a BIT file (.bit extension)
- The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information

94 Two main stages of the FPGA Design Flow
Technology independent: RTL Synthesis. Technology dependent: Map, Place & Route, Configure.
Synthesis
- RTL Synthesis
  - Code analysis
  - Derivation of main logic constructions
  - Technology independent optimization
  - Creation of “RTL View”
- Map
  - Mapping of extracted logic structures to device primitives
  - Technology dependent optimization
  - Application of “synthesis constraints”
  - Netlist generation
  - Creation of “Technology View”
Implementation
- Place & Route
  - Placement of generated netlist onto the device
  - Choosing best interconnect structure for the placed design
  - Application of “physical constraints”
- Configure
  - Bitstream generation
  - Burning device

95 Report files

96 Map report header
Release 8.1i Map I.24
Xilinx Mapping Report File for Design 'Lab3Demo'
Design Information
------------------
Command Line   : c:\Xilinx\bin\nt\map.exe -p 3S1500FG320-4 -o map.ncd -pr b -k 4 -cm area -c 100 Lab3Demo.ngd Lab3Demo.pcf
Target Device  : xc3s1500
Target Package : fg320
Target Speed   : -4
Mapper Version : spartan3 -- $Revision: 1.34 $
Mapped Date    : Tue Feb 13 17:04:54 2007

97 Map report
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of Slice Flip Flops:                        30 out of 26,624    1%
  Number of 4 input LUTs:                            38 out of 26,624    1%
Logic Distribution:
  Number of occupied Slices:                         33 out of 13,312    1%
  Number of Slices containing only related logic:    33 out of     33  100%
  Number of Slices containing unrelated logic:        0 out of     33    0%
  *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                           62 out of 26,624    1%
  Number used as logic:                              38
  Number used as a route-thru:                       24
Number of bonded IOBs:                               10 out of    221    4%
  IOB Flip Flops:                                     7
Number of GCLKs:                                      1 out of      8   12%
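The figures in this report can be cross-checked with a little arithmetic. The device totals agree with two LUTs and two flip-flops per slice, the LUT total is the sum of the two usage categories, and the exact ratios show that the percentage column holds whole numbers only. The ratios below are computed here from the report's counts and are not printed in the report itself.

```latex
\begin{align*}
2 \times 13{,}312 &= 26{,}624 \quad \text{(LUTs, and flip-flops, in the device)} \\
38 + 24 &= 62 \quad \text{(logic + route-thru = total 4-input LUTs)} \\
33 / 13{,}312 &\approx 0.25\% \quad \text{(reported as 1\%)} \\
10 / 221 &\approx 4.5\% \quad \text{(reported as 4\%)} \\
1 / 8 &= 12.5\% \quad \text{(reported as 12\%)}
\end{align*}
```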

98 Related and Unrelated Logic
Related logic is defined as being logic that shares connectivity – e.g. two LUTs are "related" if they share common inputs. When assembling slices, Map gives priority to combine logic that is related. Doing so results in the best timing performance. Unrelated logic shares no connectivity. Map will only begin packing unrelated logic into a slice once 99% of the slices are occupied through related logic packing. Note that once logic distribution reaches the 99% level through related logic packing, this does not mean the device is completely utilized. Unrelated logic packing will then begin, continuing until all usable LUTs and FFs are occupied. Depending on your timing budget, increased levels of unrelated logic packing may adversely affect the overall timing performance of your design.

99 Place & route report
Asterisk (*) preceding a constraint indicates it was not met. This may be due to a setup or hold violation.
-----------------------------------------------------------------------------------------------
  Constraint                                | Requested | Actual  | Logic  | Absolute | Number of
                                            |           |         | Levels | Slack    | errors
-----------------------------------------------------------------------------------------------
* TS_CLOCK = PERIOD TIMEGRP "CLOCK" 5 ns    | 5.000ns   | 5.140ns | 4      | -0.140ns | 5
  HIGH 50%                                  |           |         |        |          |
-----------------------------------------------------------------------------------------------
  TS_gen1Hz_Clock1Hz = PERIOD TIMEGRP "gen1 | 5.000ns   | 4.137ns | 2      | 0.863ns  | 0
  "gen1Hz_Clock1Hz" 5 ns HIGH 50%           |           |         |        |          |
-----------------------------------------------------------------------------------------------

100 Post layout timing report
Clock to Setup on destination clock CLOCK
---------------+---------+---------+---------+---------+
               | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
Source Clock   |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
---------------+---------+---------+---------+---------+
CLOCK          |    5.140|         |         |         |
---------------+---------+---------+---------+---------+
Timing summary:
---------------
Timing errors: 9  Score: 543
Constraints cover 574 paths, 0 nets, and 187 connections
Design statistics:
   Minimum period:   5.140ns (Maximum frequency: 194.553MHz)
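The slack and frequency values in the two timing reports are related by simple arithmetic, shown below using only the numbers printed in the reports. The 200 MHz figure is the frequency implied by the requested 5 ns period and is computed here, not quoted from the report.

```latex
\begin{align*}
\text{slack} &= T_{\text{requested}} - T_{\text{actual}} = 5.000 - 5.140 = -0.140\ \text{ns} \\
f_{\max} &= \frac{1}{T_{\min}} = \frac{1}{5.140\ \text{ns}} \approx 194.553\ \text{MHz} \\
f_{\text{requested}} &= \frac{1}{5.000\ \text{ns}} = 200\ \text{MHz}
\end{align*}
```

A negative slack therefore means the design falls short of the requested clock frequency, here by roughly 5.4 MHz.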

101 Xilinx FPGA Devices
Technology    Low-cost     High-performance
120/150 nm                 Virtex 2, 2 Pro
90 nm         Spartan 3    Virtex 4
65 nm                      Virtex 5
45 nm         Spartan 6
40 nm                      Virtex 6

102 Altera FPGA Devices
Technology    Low-cost       Mid-range    High-performance
130 nm        Cyclone                     Stratix
90 nm         Cyclone II                  Stratix II
65 nm         Cyclone III    Arria I      Stratix III
40 nm         Cyclone IV     Arria II     Stratix IV

