
1. Xilinx FPGA Architecture Overview

2. Virtex/Spartan-II Top-level Architecture
• Gate-array-like architecture
• Configurable logic blocks
  - Implement logic here!
• I/O blocks
  - 16 signal standards
• Block RAM
  - On-chip memory for higher performance
• Clocks and delay-locked loops
• Interconnect resources
  - Three-state internal buses

3. Logic Cell Capacity
• A better first-order alternative to gate counting
• Allows better comparisons among different FPGAs
• Logic cell definition:
  - One 4-input look-up table plus a dedicated flip-flop
• Logic cells per CLB:
  - XC4000/Spartan: 2.375 (two 4-LUTs, one 3-LUT, two FFs)
  - Virtex/Spartan-II: 4.5 (four 4-LUTs, one F5MUX, four FFs)

4. Configurable Logic Block (CLB)
[Figure: CLB inputs feed a combinational logic function (LUT) and a flip-flop, which drive the CLB outputs]
• Combinational logic is generated in a lookup table (LUT)
  - Any function of the available inputs
• The LUT output feeds the CLB output or the D input of the flip-flop

5. Virtex/Spartan-II Function Generators
[Figure: a CLB with two slices; in each slice a MUXF5 combines two LUTs, and a MUXF6 combines the two slices]
• Four 4-input function generators
  - Independent inputs (four functions of four inputs each)
• MUXF5 combines 2 LUTs to form
  - A 4x1 multiplexer
  - Or any 5-input function
• MUXF6 combines 2 slices to form
  - An 8x1 multiplexer
  - Or any 6-input function
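Why two 4-input LUTs plus a 2:1 mux can implement any 5-input function: the fifth input is handled by Shannon expansion. Each LUT holds one cofactor of the function (fifth input fixed at 0 or at 1), and the mux, driven by the fifth input, picks between them. The sketch below checks this exhaustively for a 5-input majority function; the helper names are hypothetical and it models the logic only, not the silicon.

```python
from itertools import product
from typing import Callable, List

Func4 = Callable[[int, int, int, int], int]
Func5 = Callable[[int, int, int, int, int], int]


def tabulate4(f: Func4) -> List[int]:
    """Truth table of a 4-input function; entry i holds f(a, b, c, d) with a = bit 0 of i."""
    return [f(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1) & 1 for i in range(16)]


def read_lut(table: List[int], a: int, b: int, c: int, d: int) -> int:
    """A 4-LUT read: the four inputs simply form the table address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


def build_f5(f: Func5) -> Func5:
    """Two 4-LUTs plus a 2:1 mux, via Shannon expansion on the fifth input e."""
    lut_e0 = tabulate4(lambda a, b, c, d: f(a, b, c, d, 0))  # cofactor of f with e = 0
    lut_e1 = tabulate4(lambda a, b, c, d: f(a, b, c, d, 1))  # cofactor of f with e = 1

    def f5(a: int, b: int, c: int, d: int, e: int) -> int:
        # The fifth input drives the mux select and picks one LUT's output.
        return read_lut(lut_e1 if e else lut_e0, a, b, c, d)

    return f5


def majority5(a: int, b: int, c: int, d: int, e: int) -> int:
    return int(a + b + c + d + e >= 3)


f5 = build_f5(majority5)
# Exhaustively compare the two-LUT construction with the original function.
assert all(f5(*bits) == majority5(*bits) for bits in product((0, 1), repeat=5))
```

Applying the same step once more (two 5-input results selected by a sixth input) is the idea behind combining two slices for a 6-input function.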

6. LUT (Lookup Table)
• Generates any function of its inputs
  - Typically 4 inputs
• Logically equivalent to a 16 x 1 ROM

Inputs | Output (first 4 of 16 rows)
0000   | 0
0001   | 1
0010   | 1
0011   | 0
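To make the "16 x 1 ROM" view concrete, the sketch below packs a 4-input function into a 16-bit value (one bit per input combination) and evaluates the LUT by reading the bit at the address formed by the inputs. It assumes input I0 is the least significant address bit, and the function names are hypothetical. The four rows listed on the slide are consistent with 4-input parity (XOR). The second example shows why inverters are free: an AND with one inverted input (assumed here to be I0, as in an AND2B1-style gate) still fits in one LUT; only the stored bits change.

```python
from typing import Callable


def lut_init(f: Callable[[int, int, int, int], int]) -> int:
    """Pack a 4-input function into a 16-bit ROM image: bit `addr` holds f at that address."""
    init = 0
    for addr in range(16):
        i0, i1, i2, i3 = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        if f(i0, i1, i2, i3) & 1:
            init |= 1 << addr
    return init


def lut_read(init: int, i0: int, i1: int, i2: int, i3: int) -> int:
    """Evaluating the LUT is a ROM read: the inputs form the address."""
    addr = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3)
    return (init >> addr) & 1


# 4-input parity: addresses 0..3 give 0, 1, 1, 0, as in the slide's rows.
init_xor4 = lut_init(lambda i0, i1, i2, i3: i0 ^ i1 ^ i2 ^ i3)
print(hex(init_xor4))  # 0x6996

# An AND with one inverted input costs the same single LUT; only the contents differ.
init_and2b1 = lut_init(lambda i0, i1, i2, i3: (1 - i0) & i1)
print(hex(init_and2b1))  # 0x4444
```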

7. Targeting LUT-based Logic
• The LUT limit is on the number of inputs, not on complexity
  - Reducing the inputs per function (fan-in) to fit CLBs improves density and speed
  - This is done automatically by the Xilinx synthesis and implementation tools
• Inverters are free (an inversion is absorbed into the LUT contents)

8. Duplicating Logic Can Improve Results
• Collapsing logic into CLBs affects the number of logic levels required, and therefore speed
• The gates you use determine the mapping
  - Nets with a fanout > 1 may end up outside a CLB (routed between CLBs instead of collapsed into one LUT)
[Figure: gate I1 drives net N1, which feeds output logic O1. Because N1 must go to two places, O1 may require a second level of logic. Duplicating the first gate (N1A, N1B) allows N1A always to be collapsed inside a single lookup table.]

9. Defining Lookup Tables With Gate Primitives
• Example of a gate primitive: AND2
• Up to five inputs, with all combinations of inversion
  - AND2B1 indicates one "bubbled" (inverted) input
• Up to nine inputs non-inverted
  - Add external INV primitives if desired

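10. Flip-Flops
[Figure: flip-flop symbol with inputs D, K, C, CE and output Q]
• Stores data (D) on the rising edge of the clock (K)
  - Clock enable (CE)
  - Asynchronous clear (C)

K | CE | C | D | Q
X | X  | 1 | X | 0
↑ | 1  | 0 | d | d
0 | X  | 0 | X | q (no change)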
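The truth table above can be read as a priority order: the asynchronous clear overrides everything, otherwise D is captured only on a rising clock edge with CE high, and in every other case Q holds. The behavioural sketch below encodes exactly that order; the class name is hypothetical and time is modelled as discrete samples of the inputs.

```python
class DFlipFlopCEClr:
    """Behavioural sketch of a D flip-flop with clock enable (CE) and asynchronous clear (C).

    Inputs are sampled one time step at a time; a rising edge is detected when K goes 0 -> 1.
    """

    def __init__(self) -> None:
        self.q = 0
        self._prev_k = 0

    def step(self, k: int, ce: int, c: int, d: int) -> int:
        rising_edge = self._prev_k == 0 and k == 1
        self._prev_k = k
        if c:
            self.q = 0            # asynchronous clear wins regardless of K and CE
        elif rising_edge and ce:
            self.q = d & 1        # capture D on a rising clock edge when enabled
        # otherwise Q holds its previous value
        return self.q


ff = DFlipFlopCEClr()
trace = [
    ff.step(k=0, ce=1, c=0, d=1),  # no edge: Q stays 0
    ff.step(k=1, ce=1, c=0, d=1),  # rising edge, enabled: Q <- 1
    ff.step(k=0, ce=1, c=0, d=0),  # no edge: Q holds 1
    ff.step(k=1, ce=0, c=0, d=0),  # rising edge but CE = 0: Q holds 1
    ff.step(k=1, ce=1, c=1, d=1),  # clear asserted: Q <- 0
]
print(trace)  # [0, 1, 1, 1, 0]
```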

11. Additional Flip-Flop Controls
• Reset (clear) and/or set
• Global initialization (GSR)
  - Use to initialize all flip-flops
• Programmable clock polarity
• Clock enable can be left unconnected

12. Virtex/Spartan-II CLB Slice
• One CLB holds two slices
• Each slice has two sets of:
  - Four-input LUT
    - Any 4-input logic function
    - Or 16-bit x 1 RAM
    - Or 16-bit shift register
  - Carry and control
    - Fast arithmetic logic
    - Multiplier logic
    - Multiplexer logic
  - Storage element
    - Latch or flip-flop
    - Set and reset
    - True or inverted inputs
    - Synchronous or asynchronous control
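The "16-bit shift register" mode reuses the LUT's 16 storage bits as a chain: each enabled clock shifts a new bit in, and the four LUT inputs act as an address selecting which stage drives the output, which gives a programmable delay of 1 to 16 clocks. The sketch below models that behaviour (Xilinx exposes this mode as the SRL16 primitive); the class and method names are hypothetical.

```python
from typing import List


class ShiftRegisterLUT:
    """Sketch of a 4-input LUT used as a 16-bit shift register.

    On each enabled clock edge a new bit enters at stage 0 and every stored bit moves
    one place along; the four LUT inputs form an address selecting which stage drives
    the output, so address n gives a delay of n + 1 clocks.
    """

    DEPTH = 16

    def __init__(self) -> None:
        self.stages: List[int] = [0] * self.DEPTH

    def clock(self, d: int, ce: int = 1) -> None:
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def q(self, addr: int) -> int:
        if not 0 <= addr < self.DEPTH:
            raise ValueError("address must be 0..15")
        return self.stages[addr]


sr = ShiftRegisterLUT()
for bit in (1, 0, 0, 0):
    sr.clock(bit)
print(sr.q(3))  # 1: the first bit, seen through a 4-clock delay
```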

13. Dedicated Multiplier Logic
• Highly efficient "shift and add" implementation
  - For a 16x16 multiplier:
    - 30% reduction in area
    - One less logic level
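For reference, "shift and add" means that each set bit i of the multiplier contributes one partial product (the multiplicand shifted left by i), and the partial products are summed. In hardware each partial-product bit is the AND of one bit from each operand, summed in adder chains. The sketch below shows only the arithmetic for unsigned 16-bit operands; it does not model the Xilinx-specific gate structure behind the area and logic-level figures above.

```python
def shift_add_multiply(a: int, b: int, width: int = 16) -> int:
    """Unsigned shift-and-add multiplication of two `width`-bit operands.

    Each set bit i of the multiplier b contributes one partial product, a shifted
    left by i; the partial products are then summed.
    """
    mask = (1 << width) - 1
    if a & ~mask or b & ~mask:
        raise ValueError(f"operands must fit in {width} bits")
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            product += a << i   # partial product for multiplier bit i
    return product


print(shift_add_multiply(0xFFFF, 0xFFFF) == 0xFFFF * 0xFFFF)  # True
```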

14. On-chip RAM
• All Xilinx FPGAs use RAM-based programming
• Adding a write enable to the LUT creates on-chip SelectRAM memory

15. SelectRAM Benefits
[Figure: single-port RAM (data, write enable, write clock, address → output) and dual-port RAM (data, write enable, write clock, write address/single-port read address → single-port output; dual-port read address → dual-port output)]
• Single-port
  - Synchronous
  - Simple timing
• Dual-port
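The port structure in the figure can be summarised behaviourally: a write happens only on the write clock edge with write enable high, the single-port output reads the location at the shared write/read address, and the dual-port output reads an independent address. The sketch below models a 16x1 LUT RAM that way, with reads treated as combinational (no clock). The class and method names are hypothetical.

```python
from typing import List


class DualPortLutRam16x1:
    """Sketch of a 16x1 dual-port LUT RAM: one synchronous write port, two unclocked reads.

    Method names are hypothetical, chosen for this illustration.
    """

    def __init__(self) -> None:
        self.mem: List[int] = [0] * 16

    def write_clock_edge(self, we: int, addr: int, d: int) -> None:
        """Rising edge of the write clock: store D at `addr` only when write enable is high."""
        if we:
            self.mem[addr & 0xF] = d & 1

    def single_port_output(self, addr: int) -> int:
        """Read through the shared write/read address; no clock needed."""
        return self.mem[addr & 0xF]

    def dual_port_output(self, read_addr: int) -> int:
        """Read through the independent dual-port read address; no clock needed."""
        return self.mem[read_addr & 0xF]


ram = DualPortLutRam16x1()
ram.write_clock_edge(we=1, addr=5, d=1)
ram.write_clock_edge(we=0, addr=6, d=1)   # ignored: write enable low
print(ram.single_port_output(5), ram.dual_port_output(6))  # 1 0
```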

16. Memory Bandwidth and Flexibility: Virtex/Spartan-II On-Chip SelectRAM+ Memory
[Figure: memory continuum from bytes to megabytes]

Memory          | Capacity  | Configurations / devices                  | Typical uses
Distributed RAM | bytes     | 16x1                                      | DSP coefficients, small FIFOs, shallow/wide memories
Block RAM       | kilobytes | 4Kx1, 2Kx2, 1Kx4, 512x8, 256x16 (200 MHz) | Large FIFOs, packet buffers, video line buffers, cache tag memory, deep/wide memories
External RAM    | megabytes | SDRAM, ZBTRAM, SSRAM, SGRAM               |

17. Spartan-II Memory
[Figure: Spartan-II dual read/write port block RAM, with Port A and Port B each able to read and write]
• CLB LUTs provide small distributed RAM (16 bits per LUT)
• Block RAM provides 4K bits per block
  - Dual read/write ports; each port has:
    - Independent clock, R/W, and enable
    - Independently configurable data width, from 4K x 1 to 256 x 16
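Because each block holds a fixed 4096 bits, choosing a port width fixes its depth and address width: depth = 4096 / width, and the address needs log2(depth) bits. The sketch below tabulates the power-of-two widths from 1 to 16 bits listed for block RAM; the function name is hypothetical. Since each port is configured independently, Port A and Port B may use different rows.

```python
from typing import Tuple

BLOCK_RAM_BITS = 4096           # 4K bits per Spartan-II block RAM (from the slide)
PORT_WIDTHS = (1, 2, 4, 8, 16)  # 4K x 1 through 256 x 16


def port_geometry(width: int) -> Tuple[int, int]:
    """Return (depth, address bits) for one port configured `width` bits wide."""
    if width not in PORT_WIDTHS:
        raise ValueError(f"unsupported width {width}")
    depth = BLOCK_RAM_BITS // width
    address_bits = depth.bit_length() - 1   # depth is a power of two
    return depth, address_bits


for w in PORT_WIDTHS:
    depth, abits = port_geometry(w)
    print(f"{depth} x {w}: {abits} address bits")
# 4096 x 1 needs 12 address bits; 256 x 16 needs 8
```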

18. I/O Block (IOB)
[Figure: IOB with O, TS and I signals and clocks, connected to a pad bonded to a package pin]
• A periphery of identical I/O blocks
  - Input, output, or bidirectional
  - Direct or registered (or latched input)
  - Pullup/pulldown
  - Programmable slew rate
  - Three-state output
  - Programmable thresholds

19. Use Special IOB Primitives
• The user explicitly defines which IOB resources are used
• I/Os are defined with:
  - One pad primitive (e.g., IPAD)
  - At least one function primitive (e.g., IBUF)
    - One input element, one output element, or both
    - Inverters may also be pulled into IOBs

20. Locking Down I/O Locations
• The LOC=Pxx attribute defines I/O pad location(s)
• Avoid locking IOBs early
  - It makes routing more difficult
• Use IOB LOC= to lock pins late in the design cycle, once the PCB is built
  - IOBs can be locked earlier if the connected CLBs are floorplanned

21. Use Pullups/Pulldowns
• A pullup is automatically connected on unused IOBs
• The user can specify a PULLUP or PULLDOWN primitive on used IOBs
• Inputs should not be left floating
  - Add a pullup to design inputs that may be left floating, to reduce power and noise

22. Faster Setup With NODELAY
[Figure: example IOB with pad, input buffer, delay element and flip-flop; external data and external clock compared with the delayed data and the routed clock]
• A delay element is included in the input path by default
  - It compensates for the clock routing delay, so there is no hold-time requirement
• The NODELAY attribute removes the delay element
  - Setup time improves, but a hold-time requirement is created
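The trade-off can be checked with simple timing arithmetic. Measured at the pins relative to the external clock edge, the required setup is roughly (flip-flop setup + data path delay - clock path delay) and the required hold is (flip-flop hold + clock path delay - data path delay). Adding the delay element lengthens the data path, pushing the hold requirement to zero or below at the cost of setup; NODELAY does the opposite. All delay values below are hypothetical, not data-sheet figures, and the model ignores clock-to-pad and buffer variations.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class InputPath:
    """Hypothetical delays (ns) for one registered input; not data-sheet values."""
    data_delay_ns: float   # pad -> input buffer -> flip-flop D, excluding the delay element
    clock_delay_ns: float  # external clock pin -> flip-flop clock (routed clock)
    ff_setup_ns: float
    ff_hold_ns: float


def pin_requirements(p: InputPath, delay_element_ns: float) -> Tuple[float, float]:
    """(setup, hold) required at the pins, relative to the external clock edge.

    A hold value of zero or less means no hold-time requirement at the pins.
    """
    data = p.data_delay_ns + delay_element_ns
    setup = p.ff_setup_ns + data - p.clock_delay_ns
    hold = p.ff_hold_ns + p.clock_delay_ns - data
    return setup, hold


example = InputPath(data_delay_ns=1.0, clock_delay_ns=3.0, ff_setup_ns=0.5, ff_hold_ns=0.0)
print(pin_requirements(example, delay_element_ns=2.5))  # default: (1.0, -0.5), no hold time
print(pin_requirements(example, delay_element_ns=0.0))  # NODELAY: (-1.5, 2.0), hold time created
```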

23. Slew Rate Control
• Slew rate controls output speed
• The default slow slew rate reduces noise and ground bounce
• Use a fast slew rate wherever speed is important
  - FAST parameter on the output logic primitive (e.g., an OBUF driving an OPAD)

24. Output Three-State Control
• Free inverter on the output buffer control
  - Use the OBUFE macro for an active-high enable (OE)
  - Use the OBUFT primitive for an active-low enable (T)

25. Global Three-State
• Three-state control can be local and/or via a dedicated global net
  - Global three-state (GTS) is controlled by the STARTUP... primitive, which also provides the GSR input

26. Virtex/Spartan-II I/O Block (Simplified)

27. Multiple I/O Interface Standards
• 16 to 20 I/O interface standards supported
  - CMOS, HSTL, SSTL, GTL, CTT, PCI
• As many as eight banks on a device
  - Package dependent
• Different banks can support different standards at the same time
  - Logic-level translation
  - Boards with mixed standards

28. High Performance Routing
[Figure: CLB array annotated "2 ns"]
• Hierarchical routing
  - Singles, hexes, longs
• Sparse connections on longer interconnects for high speed
• Routing delay depends primarily on distance
  - Direction independent
  - Device-size independent
• Predictable for early design analysis

29. Flexible General-Purpose Interconnect
• Flexible, but slow if a net crosses many channels
  - A programmable switch matrix sits at each channel crossing
  - It connects straight across, changes direction, or fans out

30. Switch Matrix
• Bidirectional pass transistors
• High routing flexibility

31. Reduce Fanout
• Higher-fanout nets (> 16 loads) are harder to route and slower
• Consider duplicating the source in the schematic to improve routability or speed

32. CLB Long Lines for High-Fanout Nets
• Metal lines that traverse the length and width of the chip
• Lowest skew
• Ideal for high-fanout signals
• Ideal for clocking
• Requires vertical or horizontal alignment of loads

33. Internal Three-State Buses
• Two three-state drivers per CLB
• OR-AND logic implementation in place of three-state drivers
  - With no drivers enabled, the bus is a logic 1
• Low power
  - No danger of contention when multiple BUFTs are enabled
  - No physical pullups or large capacitance to drive
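One way to see why an OR-AND bus gives a logic 1 with no drivers and cannot suffer contention: each driver contributes (data OR NOT enable), so a disabled driver contributes 1, and the bus is the AND of all contributions. Two enabled drivers with different values simply resolve to their AND. The sketch below is a logic-level model of that description; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def or_and_bus(drivers: Iterable[Tuple[int, int]]) -> int:
    """Value of an internal bus built from OR-AND logic instead of real three-state drivers.

    Each (enable, data) driver contributes (data OR NOT enable), so a disabled driver
    contributes 1; the bus is the AND of all contributions.
    """
    bus = 1
    for enable, data in drivers:
        bus &= (data | (1 - enable)) & 1
    return bus


print(or_and_bus([(0, 0), (0, 1)]))  # 1: no driver enabled
print(or_and_bus([(1, 0), (0, 1)]))  # 0: the enabled driver's data
print(or_and_bus([(1, 1), (1, 0)]))  # 0: two enabled drivers resolve as an AND, no contention
```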

34. General Clock Support
• Use clock buffers for the highest-fanout clocks
  - Drive high-speed long-line resources
    - Lowest skew across a device
    - No internal hold times
  - Use the generic BUFG primitive
    - Allows the software to choose the best type of buffer
    - Allows easy migration across families
• Four dedicated global low-skew buffers
  - Dedicated input pin (clock distribution only)
• Additional shared resources (e.g., long lines)
  - Distribute low-skew/high-fanout signals (10 ns max.)
• Four delay-locked loops on each device
  - All-digital implementation
  - Two global buffers associated with each DLL pair

35. Configuration
• A schematic or HDL description is converted to a configuration file by the Xilinx development system
• The configuration file is loaded into the FPGA on power-up
  - Stored in configuration latches
  - Controls CLBs, IOBs, interconnect, etc.

36. Configuration Bitstream
• Binary programming file
• Length depends only on the device, not on utilization
  - Loading typically takes about 1 µs per bit (in total, from a few ms to under 1 s)
• The FPGA can load its configuration automatically on power-up, or under microprocessor control
• Can be loaded directly into the device or into a configuration PROM
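Since the bitstream length is fixed per device, configuration time is just arithmetic: bits divided by bits transferred per configuration clock, divided by the clock rate. The sketch below assumes a 1 MHz configuration clock (matching roughly 1 µs per bit in serial mode) and uses hypothetical bitstream sizes; the byte-wide case corresponds to the byte-parallel configuration mode, assuming one byte per clock.

```python
import math


def configuration_time_s(bitstream_bits: int, cclk_hz: float = 1e6, bits_per_cycle: int = 1) -> float:
    """Time to load a bitstream: one configuration clock per `bits_per_cycle` bits.

    The 1 MHz default (1 us per serial bit) is an assumption for illustration.
    """
    cycles = math.ceil(bitstream_bits / bits_per_cycle)
    return cycles / cclk_hz


# Hypothetical bitstream sizes; the real length depends only on the device.
print(configuration_time_s(5_000))                      # 0.005 s: a few ms
print(configuration_time_s(500_000))                    # 0.5 s: under 1 s
print(configuration_time_s(500_000, bits_per_cycle=8))  # 0.0625 s in a byte-wide mode
```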

37. Configuration Modes
• Bit-serial configuration
  - Simple; uses few device pins
  - Controlled by the FPGA (Master) or externally (Slave)
  - Xilinx serial PROMs available
• Byte-parallel configuration
  - Can drive PROM addresses (Master)
  - Can be microprocessor-controlled

38. Configuration Pins
• Configuration starts on power-up
• Mode pin(s) are checked to determine the configuration method
  - Usable as extra I/O after configuration
• All I/O not used for configuration are disabled during configuration
• Reconfiguration is possible by pulling the PROGRAM pin low

39. Readback
• Configuration data can be read back serially
  - Allows verification of programming
• Readback data can include user-register values
  - Allows in-circuit functional verification
  - Requires the READBACK... symbol (pins CLK, TRIG, DATA, RIP)

40. Boundary Scan
• IEEE 1149.1-compatible boundary scan (JTAG)
• Available before configuration
• Configuration and readback possible via the boundary-scan logic

41. Power Consumption
• CMOS SRAM technology provides low standby power
• Operating power is mostly dynamic
  - Proportional to the transition frequency of internal nodes
  - The Xilinx segmented interconnect minimizes the metal capacitance to be switched, which minimizes power
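The two claims above follow from the standard first-order CMOS model, P = activity × C × Vdd² × f: power scales linearly with how often nodes switch and with how much capacitance each transition charges. The sketch below applies that textbook formula; the capacitance, voltage, frequency and activity values are hypothetical, not Xilinx data.

```python
def dynamic_power_w(switched_capacitance_f: float, vdd_v: float, freq_hz: float,
                    activity: float = 1.0) -> float:
    """First-order CMOS dynamic power, P = activity * C * Vdd^2 * f (textbook model)."""
    return activity * switched_capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical values: 1 nF of switched capacitance, 2.5 V supply, 50 MHz clock.
base = dynamic_power_w(1e-9, 2.5, 50e6, activity=0.125)
faster = dynamic_power_w(1e-9, 2.5, 100e6, activity=0.125)       # twice the transition rate
less_metal = dynamic_power_w(0.5e-9, 2.5, 50e6, activity=0.125)  # half the capacitance
print(faster / base, less_metal / base)
```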
