4 FUSE
- Fuses are the basic storage element in TTL (bipolar) programmable circuits.
- Passing a large current through a fuse link blows it, opening the connection.
- The IC stores data by having selected fuses blown. Programming is one-time only.
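The one-way nature of fuse programming can be modeled in a few lines. This is a minimal sketch under stated assumptions. The class `FuseArray` is hypothetical. An intact fuse is assumed to read as 1 and a blown fuse as 0, though real devices may use the opposite polarity. Once a fuse is blown, that bit can never return to 1.

```python
class FuseArray:
    """Hypothetical model of a one-time-programmable fuse array.

    Assumption: an intact fuse reads as 1 and a blown fuse reads as 0;
    real devices may use the opposite polarity.
    """

    def __init__(self, size: int) -> None:
        self.intact = [True] * size  # every fuse starts closed (unprogrammed)

    def read(self, index: int) -> int:
        return 1 if self.intact[index] else 0

    def program(self, bits: list[int]) -> None:
        """Blow each fuse whose target bit is 0.

        Programming is one-way: a 1 cannot be stored over a fuse that is
        already blown, so the whole pattern is checked before any fuse is blown.
        """
        for i, bit in enumerate(bits):
            if bit == 1 and not self.intact[i]:
                raise ValueError(f"fuse {i} is already blown; cannot store 1")
        for i, bit in enumerate(bits):
            if bit == 0:
                self.intact[i] = False  # high current opens the link permanently


rom = FuseArray(4)
rom.program([1, 0, 1, 0])
print([rom.read(i) for i in range(4)])  # [1, 0, 1, 0]
```

A later `program` call can only clear more bits. Any pattern that needs a blown fuse back at 1 is rejected.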
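5 EPROM
- In CMOS, the metal fuse is replaced by a FAMOS (floating-gate avalanche-injection MOS) transistor.
- Hot-electron injection places charge on the floating gate, which determines whether the transistor acts as a closed or open switch.
- Erased by exposure to UV light.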
6 EEPROM and SRAM
EEPROM
- Electrically erasable floating gate
- No UV exposure needed
SRAM
- Configuration data is loaded into memory cells that control the logic and interconnect (i.e., pass transistors)
- Erased by turning the power off (volatile)
7 Programming Technologies
1) Bipolar fusible link
  - Closed device, burned open by high current
2) SRAM based
  - Uses pass transistors controlled by SRAM
  - CMOS based
3) EPROM/EEPROM based
  - Floating gate
  - CMOS based
15 XC7300 Dual Block Architecture
- Universal Interconnect Matrix (UIM) built with SMARTswitch
- Fast Function Blocks: 5 ns pin-to-pin, fCLK = 167 MHz
- Fast I/O: tSU = 4.0 ns, tCO = 5.5 ns
- High Density Function Blocks (PAL-like function blocks)
- Input registers
- High-drive (24 mA) 3.3 V/5 V I/O
[Figure: XC7300 block diagram with Fast Function Blocks and High Density Function Blocks connected through the UIM to the I/O]
16 XC9500 - Flexible Architecture
[Figure: XC9500 block diagram. Function Blocks 1 to n, each with its I/O, are connected through the FastCONNECT Switch Matrix. Also shown: I/O Blocks, JTAG Port with JTAG Controller, In-System Programming Controller, and the global signals (3 Global Clocks, 1 Global Set/Reset, 2 or 4 Global Tri-States).]
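17 XC9500 Function Block
[Figure: 36 inputs from the FastCONNECT switch matrix feed an AND array. A Product-Term Allocator distributes product terms to Macrocells 1 to 18, whose outputs go to FastCONNECT and to I/O. 3 Global Clocks and 2 or 4 Global Tri-States also enter the block.]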
18 XC9500 Architectural Features
- Uniform, PAL-like architecture
- Flexible function block
  - 36 inputs with 18 outputs
  - Expandable to 90 product terms per macrocell
  - Product-term and global 3-state enables
  - Product-term and global clocks
- 3.3 V/5 V I/O operation
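19 XC9500 Optimizes Pin-Locking
[Figure: a function block with 36 inputs from the FastCONNECT Switch Matrix, function block logic, and a D/T flip-flop driving a fixed output pin. Annotations show design changes that keep the pin fixed: add more logic, add another FB input, add another pin or FB output.]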
23 XC4000 Configurable Logic Blocks
- 2 four-input function generators (look-up tables)
  - Each acts as a 16x1 RAM or a logic function
- 2 registers
  - Each can be configured as a flip-flop or latch
  - Independent clock polarity
  - Synchronous and asynchronous set/reset
24 Look Up Tables
- Capacity is limited by the number of inputs, not by complexity
- Each function generator can be used as 4-input logic (LUT) or as high-speed synchronous dual-port RAM
- Combinatorial logic is stored in 16x1 SRAM look-up tables (LUTs) in a CLB
Example truth table:
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
...
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
[Figure: function generator G with inputs G4, G3, G2, G1 forming the 4-bit address, plus write enable WE]
- A 4-bit address gives 2^(2^4) = 65,536 (64K) possible functions
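Because the LUT is just a 16x1 memory, any 4-input function costs the same: one read at the address formed by the inputs. The following is a minimal sketch. The helper names and the bit order (A as the most significant address bit, matching the table) are assumptions. The example function, 4-input parity, is a hypothetical choice and is not the function in the slide's table.

```python
from itertools import product

NUM_INPUTS = 4
NUM_ENTRIES = 2 ** NUM_INPUTS          # 16 SRAM bits per LUT (16x1)
NUM_FUNCTIONS = 2 ** NUM_ENTRIES       # 2^(2^4) = 65,536 possible LUT contents


def address(a: int, b: int, c: int, d: int) -> int:
    """Treat A as the most significant address bit (assumed ordering)."""
    return (a << 3) | (b << 2) | (c << 1) | d


def build_lut(func) -> list[int]:
    """Fill the 16x1 LUT by evaluating func once per input combination."""
    lut = [0] * NUM_ENTRIES
    for a, b, c, d in product((0, 1), repeat=NUM_INPUTS):
        lut[address(a, b, c, d)] = func(a, b, c, d) & 1
    return lut


def lut_read(lut: list[int], a: int, b: int, c: int, d: int) -> int:
    """Evaluating the logic is a single memory read, whatever its complexity."""
    return lut[address(a, b, c, d)]


# Hypothetical example: 4-input parity would need several gate levels as XORs
parity = build_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(lut_read(parity, 1, 0, 1, 1))  # 1
print(NUM_FUNCTIONS)                 # 65536
```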
25 ROM is Equivalent to Logic
- Using ROM simply defines logic functions in look-up-table format
  - Memory might be an easier way to define logic
  - Xilinx provides ROM library cells
- FPGA look-up tables are essentially blocks of RAM
  - Data is written during configuration
  - Data is read after configuration
  - Effectively, they operate as a ROM
[Figure: O = I1*I2 implemented as gates and as a ROM. Both versions have inputs F1, F2 and output X. The ROM version uses address lines A0, A1, contents DATA(0)=0, DATA(1)=0, DATA(2)=0, DATA(3)=1, and output DOUT.]
26 RAM Provides 16X the Storage of Flip-Flops
- 32 bits versus 2 bits of storage
  - Two 16x1 RAMs or one 32x1 single-port RAM fit in one CLB
  - One 16x1 dual-port RAM fits in one CLB
- 32x8 shift register with RAM = 11 CLBs
  - Using flip-flops takes 128 CLBs for the data alone
  - Address decoders not included
[Figure: one CLB used as a 32x1 RAM (address A0 to A4, output O1, WE, 32 bits) versus the same CLB used as two flip-flops (D1, D2 to Q1, Q2, CLK, 2 bits)]
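The flip-flop figure follows from the arithmetic: 32 x 8 = 256 bits, at 2 flip-flops per CLB, is 256 / 2 = 128 CLBs. A RAM-based shift register leaves the data in place and advances an address counter instead. The following is a behavioral sketch under our own assumptions. The class name is hypothetical. The model reads the oldest word and overwrites it in the same cycle and does not model CLB timing. The slide does not break down its 11-CLB figure. Presumably eight CLBs hold one 32x1 RAM per data bit and the rest hold the address counter and control logic.

```python
class RamShiftRegister:
    """Hypothetical behavioral model of a 32-word x 8-bit shift register in RAM.

    Assumption: each data bit lives in its own 32x1 RAM, and one shared
    address counter steps through all of them together.
    """

    def __init__(self, depth: int = 32, width: int = 8) -> None:
        self.depth = depth
        self.mask = (1 << width) - 1
        self.mem = [0] * depth
        self.addr = 0  # address counter wraps around the RAM

    def clock(self, data_in: int) -> int:
        """One clock cycle: read the oldest word, then overwrite it with the newest."""
        data_out = self.mem[self.addr]
        self.mem[self.addr] = data_in & self.mask
        self.addr = (self.addr + 1) % self.depth
        return data_out


sr = RamShiftRegister()
outputs = [sr.clock(value) for value in range(40)]
print(outputs[32:])  # [0, 1, 2, 3, 4, 5, 6, 7]: inputs reappear 32 clocks later
```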
28 RAM Guidelines
- Fewer than 32 words is best
  - 32x1 or 16x2 per RAM requires only one CLB
  - Delays are short (one level of logic)
  - Data and output MUXes are required to expand depth
- Fewer than 256 words recommended per RAM
  - Use external memory for 256 words or more
- Width is easily expanded
  - Connect the address lines to multiple blocks
- Recommendation: use less than 1/2 of the maximum memory resources
  - Maximum memory uses all logic resources of the CLBs
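The difference between depth and width expansion shows up in a small behavioral model. This is a sketch with hypothetical class and function names, not Xilinx library cells. Depth expansion needs a decoder on the extra address bit to steer WE, plus an output MUX. Width expansion only shares the address lines.

```python
class Ram32x1:
    """Behavioral stand-in for one CLB configured as a 32x1 RAM (hypothetical)."""

    def __init__(self) -> None:
        self.bits = [0] * 32

    def write(self, addr: int, data: int) -> None:
        self.bits[addr] = data & 1

    def read(self, addr: int) -> int:
        return self.bits[addr]


class Ram64x1:
    """Depth expansion: two 32x1 RAMs, a write-enable decoder, and an output MUX."""

    def __init__(self) -> None:
        self.banks = [Ram32x1(), Ram32x1()]

    def write(self, addr: int, data: int) -> None:
        bank = (addr >> 5) & 1                     # A5 decodes which bank gets WE
        self.banks[bank].write(addr & 0x1F, data)  # A0-A4 go to both banks

    def read(self, addr: int) -> int:
        low = self.banks[0].read(addr & 0x1F)
        high = self.banks[1].read(addr & 0x1F)
        return high if (addr >> 5) & 1 else low    # output MUX selected by A5


def read_word(rams: list[Ram64x1], addr: int) -> int:
    """Width expansion: the same address drives every RAM; each supplies one bit."""
    return sum(ram.read(addr) << bit for bit, ram in enumerate(rams))


ram = Ram64x1()
ram.write(40, 1)
print(ram.read(40), ram.read(8))  # 1 0  (same low address bits, different banks)
```

The extra decoder and MUX in `Ram64x1` are the added logic levels that make deeper RAMs slower and larger. That is why the slide favors staying at 32 words or fewer.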
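29 XC4000E I/O Block Diagram
[Figure: XC4000E IOB. Output path: D-Q flip-flop clocked by OK (Output Clock), slew rate control, and output buffer enabled by T/OE, driving the pad. Input path: input buffer, optional delay, and D-Q flip-flop clocked by IK (Input Clock), producing I1 and I2. Also shown: passive pull-up/pull-down on the pad and clock enable CE.]
Elements in blue are not in the XC3000 family.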
30 Xilinx FPGA Routing
- Fast direct interconnect: CLB to CLB
- General-purpose interconnect: uses switch matrix
- Long lines
  - Segmented across chip
  - Global clocks, lowest skew
  - 2 tri-states per CLB for buses
[Figure: CLBs connected through switch matrices]
31 Fast Direct Interconnect
- Direct connections from a CLB to an adjacent CLB or IOB
- Fastest interconnect
  - Less than 1 ns delay
32 Flexible General-Purpose Interconnect
- Flexible, but slow if a signal crosses many channels
- XC3000
  - 5 lines per channel
- XC4000
  - 8 Single-Length lines, similar to the XC3000 lines
  - 4 Double-Length lines skip every other switch matrix
  - 4 Quad-Length lines skip three switch matrices
[Figure: CLBs connected through switch matrices]
33 Use Long Lines for High Fanout Nets
- Single metal lines that traverse the length and width of the chip
- Lowest skew
- Ideal for high fan-out signals
- Ideal for clocking
- Internal three-state buffers for buses and wide functions
34 CPLD or FPGA?
CPLD
- Non-volatile
- Wide fan-in
- Fast counters, state machines
- Combinational logic
FPGA
- SRAM reconfiguration
- Excellent for computer architecture, DSP, registered designs
- PROM required for non-volatile operation
36 Avoiding Metastability
- Metastability is caused by violating timing specifications such as setup time
- The in-between state takes an unknown time to resolve
  - Two destinations could respond to different values
- The error rate decreases by a factor of 40 for every additional 1 ns of delay before the destinations respond to the signal
- Be aware but not paranoid!
[Figure: D flip-flop whose data and clock change simultaneously, producing a metastable output]
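The factor-of-40 rule compounds exponentially, which is why a little extra settling time goes a long way. The following is a small worked sketch. The function name is hypothetical. The 40-per-ns factor is the slide's rule of thumb, and the real factor depends on the device and process.

```python
def error_rate_reduction(extra_settle_ns: float, factor_per_ns: float = 40.0) -> float:
    """How many times fewer metastability errors occur with extra settling time.

    Assumes the slide's rule of thumb: the error rate falls by factor_per_ns
    for each additional nanosecond before the destinations use the signal.
    """
    return factor_per_ns ** extra_settle_ns


for ns in (1, 2, 3):
    print(ns, error_rate_reduction(ns))
# 1 40.0
# 2 1600.0
# 3 64000.0
```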
37 Use Synchronous Design
- The internal timing of synchronous designs is easy to analyze
- Hold time is not an issue
  - Clock skew is guaranteed to be much shorter than the minimum clock-to-Q of any CLB
- Use global clock distribution networks
  - If not, check for clock skew problems
[Figure: two D flip-flops with timing annotations of 3.0 ns, 3.1 ns, and 2.5 ns]
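A simple hold check shows why small skew makes hold time a non-issue. New data from the source flip-flop cannot reach the destination before the minimum clock-to-Q plus routing delay. The destination's clock edge arrives later by the skew, and the data must remain stable for the hold time after that edge. The function below is a hypothetical sketch. The hold time defaults to 0, which is an assumption, and the numbers in the example are hypothetical values, not taken from the figure.

```python
def hold_slack_ns(tco_min: float, route_min: float, clock_skew: float,
                  t_hold: float = 0.0) -> float:
    """Hold slack between two flip-flops on the same clock net.

    Positive slack: the source's new data arrives only after the destination
    has safely captured the old data. t_hold defaults to 0 (an assumption).
    """
    earliest_new_data = tco_min + route_min
    required_stable_until = clock_skew + t_hold
    return earliest_new_data - required_stable_until


# Hypothetical numbers: skew well below the minimum clock-to-Q leaves positive slack
print(hold_slack_ns(tco_min=1.0, route_min=0.5, clock_skew=0.3))
```

With a global clock network, the skew term stays small, so the slack is positive for any flip-flop pair. Clocks on general interconnect can make the skew large enough to drive the slack negative.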
38 Avoid Gated Clock or Asynchronous Reset
- Move gating to a non-clock pin to prevent a glitch from affecting logic
- Or separate input signal changes by at least one CLB delay to minimize the likelihood of a glitch
[Figure: two 3-bit counter circuits (outputs Q0, Q1, Q2) with D flip-flops, one using Carry and the other using Carry-1]
39 Pipeline for Speed
- Register-rich FPGAs encourage pipelining
- Pipelining improves speed
  - Consider it wherever latency is not an issue
  - Use for terminal counts, carry lookahead, etc.
- The clock period will be approximately
  - 2 x (number of combinatorial levels) x (speed grade)
  - XC3100A-3: 3 levels x 2 x 3 ns = 18 ns clock period
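The rule of thumb above is easy to turn into numbers. The following sketch uses hypothetical function names and applies only the slide's approximation. It shows the XC3100A-3 example and a hypothetical 6-level path before and after being split into two 3-level pipeline stages.

```python
def estimated_period_ns(levels: int, speed_grade_ns: float) -> float:
    """Slide's rule of thumb: period is about 2 x levels x speed grade."""
    return 2 * levels * speed_grade_ns


def fmax_mhz(period_ns: float) -> float:
    return 1000.0 / period_ns


print(estimated_period_ns(3, 3))   # 18  (XC3100A-3, 3 levels)
print(round(fmax_mhz(18), 1))      # 55.6
# Hypothetical 6-level path: 36 ns unpipelined; two 3-level stages run at 18 ns
print(estimated_period_ns(6, 3))   # 36
```

Splitting the path halves the clock period at the cost of one extra cycle of latency. That trade is why the slide suggests pipelining wherever latency is not an issue.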
40 Use Dedicated Carry for Large Counters
- Use XC4000/XC5000 carry logic to improve counter speed and density
  - Especially for counters of more than 5 bits
[Figure: adder and register stages in a counter path, with delays tADDER, tCO, and tNET]
41 Use One-Hot Encoding for State Machines
- A shift register is always fast and dense
  - "One-hot" uses one flip-flop for each count
  - Useful for state machine encoding
- Use Moore-type state machines
[Figure: five D flip-flops chained as a shift register]
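With one-hot encoding, each flip-flop's D input is just the OR of the transitions into its state. This keeps the next-state logic shallow, which suits register-rich FPGAs. The following is a minimal sketch of a Moore machine. The four states (IDLE, LOAD, RUN, DONE), the inputs `start` and `last`, and the output `busy` are all hypothetical.

```python
# Hypothetical 4-state controller; bit positions in the one-hot state vector
IDLE, LOAD, RUN, DONE = 0, 1, 2, 3


def next_state(state: int, start: int, last: int) -> int:
    """One flip-flop per state; each D input ORs the transitions into that state."""
    idle = (state >> IDLE) & 1
    load = (state >> LOAD) & 1
    run = (state >> RUN) & 1
    done = (state >> DONE) & 1
    d_idle = (idle & (1 - start)) | done      # stay idle, or return after DONE
    d_load = idle & start
    d_run = load | (run & (1 - last))         # enter from LOAD, or keep running
    d_done = run & last
    return (d_idle << IDLE) | (d_load << LOAD) | (d_run << RUN) | (d_done << DONE)


def busy(state: int) -> int:
    """Moore output: depends only on the state flip-flops, never on the inputs."""
    return ((state >> LOAD) | (state >> RUN)) & 1


state = 1 << IDLE
trace = []
for start, last in [(0, 0), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]:
    state = next_state(state, start, last)
    trace.append(state)
print(trace)  # [1, 2, 4, 4, 8, 1]
```

Because `busy` reads only state bits, it changes only at clock edges and cannot glitch when the inputs change. This is one reason to prefer Moore-type machines.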
42 Use LFSRs for Fixed Count
- Consider a linear feedback shift register (LFSR) for speed when the terminal count is all that is needed
  - Or when any regular sequence is acceptable (e.g., FIFO)
- Maximal-length sequence of 2^n - 1 states
- Use XNOR feedback to make the lockup state all 1s
[Figure: 10-bit shift register, Q1 to Q10, with feedback from Q10 and Q7 into D1]
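The following is a sketch of the 10-bit LFSR in the figure. It assumes the feedback taps are Q10 and Q7 as labeled and the bits shift from Q1 toward Q10. The bit-to-integer mapping and function names are hypothetical. XNOR feedback lets the register start from the all-zeros reset state and cycle through 2^10 - 1 = 1023 states. All 1s is the single lockup state.

```python
WIDTH = 10
MASK = (1 << WIDTH) - 1


def lfsr_step(state: int) -> int:
    """Shift Q1 toward Q10 and feed XNOR(Q10, Q7) back into D1.

    Assumed mapping: bit 0 holds Q1 and bit 9 holds Q10.
    """
    q10 = (state >> 9) & 1
    q7 = (state >> 6) & 1
    feedback = 1 ^ (q10 ^ q7)              # XNOR feedback
    return ((state << 1) & MASK) | feedback


def cycle_length(start: int) -> int:
    """Count clocks until the register returns to its starting state."""
    state, steps = lfsr_step(start), 1
    while state != start:
        state, steps = lfsr_step(state), steps + 1
    return steps


print(cycle_length(0))     # 1023 = 2**10 - 1: the all-zeros reset state is on the long cycle
print(cycle_length(MASK))  # 1: all ones is the XNOR lockup state
```

A fixed count then only needs a compare against the single state the register reaches after the desired number of clocks. No carry chain is involved, which is where the speed comes from.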
43 Use Global Clock Buffers
- Use clock buffers for the highest-fanout clocks
  - They drive low-skew, high-speed long-line resources
  - Use the BUFG primitive to be family-independent
- Limit the number of clocks to ease placement issues
  - XC3000: 2 (GCLK, ACLK)
  - XC4000/XC5000: 4 (BUFGP/BUFG)
- Additional clocks might be routable on long lines
  - Otherwise they are routed on general interconnect, which is slower and has higher skew
44 Using a Clock Generated Off-Chip
- Connect the IPAD directly to the clock buffer primitive
  - Required for BUFGP
- Provides higher speed and uses fewer routing resources
[Figure: IPAD driving a BUFG, which clocks a D flip-flop]
46 Use Clock Enables Instead of Gating Clock
- Use a clock enable when using most or all of the logic inputs
  - Gating the clock signal directly is not recommended
- Use muxed data when using only 1-2 logic inputs
  - Easier to route
- Some macros use logic for the clock enable, while others use the CE pin
  - If CE is unused, make sure it is always connected to VCC
[Figure: two D flip-flop implementations, one using the CE pin (FDxE)]
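Instead of deriving a slower clock by dividing or gating, every register can stay on the one global clock while a decoded terminal count drives CE. The following is a cycle-by-cycle sketch with hypothetical names (`simulate`, `prescale`, `count`). Holding the value when CE is low is equivalent to muxing Q back to D.

```python
def simulate(cycles: int, divide_by: int = 4) -> tuple[int, list[bool]]:
    """All registers share one clock; a prescaler drives CE instead of
    producing a divided (or gated) clock. Names here are hypothetical."""
    prescale = 0      # free-running prescaler register
    count = 0         # 8-bit counter whose flip-flops use the CE pin
    ce_trace = []
    for _ in range(cycles):
        ce = prescale == divide_by - 1        # decoded terminal count drives CE
        # Clock edge: every register updates together on the same clock.
        if ce:
            count = (count + 1) & 0xFF        # CE high: load the new value
        # CE low: the counter holds (same as muxing Q back into D)
        prescale = (prescale + 1) % divide_by
        ce_trace.append(ce)
    return count, ce_trace


count, ce_trace = simulate(16)
print(count)          # 4
print(ce_trace[:8])   # [False, False, False, True, False, False, False, True]
```

The decoded `ce` signal may glitch between clock edges, but on a CE pin that is harmless. The flip-flop only samples CE at the clock edge, whereas the same glitch on a gated clock would create a spurious edge.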