Presentation is loading. Please wait.

Presentation is loading. Please wait.

Field Programmable Gate Arrays

Similar presentations


Presentation on theme: "Field Programmable Gate Arrays"— Presentation transcript:

1 Field Programmable Gate Arrays
EE141 Field Programmable Gate Arrays 刘鹏 Dept. ISEE Zhejiang University source: EECS150 April. 24, 2012 Winter ZDMC – Lec. #1 – 1

2 Static RAM Cell Read: Write: Why does this work? 1. Select row
EE141 Static RAM Cell 6-Transistor SRAM Cell word (row select) 1 1 Read: 1. Select row 2. Cell pulls one line low and one high 3. Sense output on bit and bit Write: 1. Drive bit lines (e.g, bit=1, bit=0) 2. Select row Why does this work? When one bit-line is low, it will force output high; that will set new state bit bit The classical SRAM cell looks like this. It consists of two back-to-back inverters that serves as a flip-flop. Here is an expanded view of this cell, you can see it consists of 6 transistors. In order to write a value into this cell, you need to drive from both sides. For example, if you want to write a 1, you will drive “bit” to 1 while at the same time, drive “bit bar” to zero. Once the bit lines are driven to their desired values, you will turn on these two transistors by setting the word line to high so the values on the bit lines will be written into the cell. Remember now these are very very tiny transistors so we cannot rely on them to drive these long bit lines effectively during read. Also, the pull down devices are usually much stronger than the pull up devices. So the first thing we need to do on read is to charge these two bit lines to a high values. Once these bit lines are charged to high, we will turn on these two transistors so one of these inverters (the lower one in our example) will start pulling one of the bit line low while the other bit line will remain at HI. It will take this small inverter a long time to drive this long bit line to low but we don’t have to wait that long since all we need to detect the difference between these two bit lines. And if you ask any circuit designer, they will tell you it is much easier to detect a “differential signal” (point to bit and bit bar) than to detect an absolute signal. +2 = 30 min. (Y:10) Winter ZDMC – Lec. #1 – 2

3 Typical SRAM Organization: 16-word x 4-bit
EE141 Typical SRAM Organization: 16-word x 4-bit Din 3 Din 2 Din 1 Din 0 WrEn - + Wr Driver - + Wr Driver - + Wr Driver - + Wr Driver Word 0 A0 SRAM Cell SRAM Cell SRAM Cell SRAM Cell A1 Word 1 Address Decoder A2 SRAM Cell SRAM Cell SRAM Cell SRAM Cell A3 This picture shows you how to connect the SRAM cells into a 15-word by 4-bit SRAM array. The word lines are connected horizontally to the address decoder while the bit lines are connected vertically to the sense amplifier and write driver. **** What do you think is longer? Word line or bit line **** Since a typical SRAM will have thousands if not millions of words (vertical) and usually be less than 10s of bits, the bit line will be much much much longer than the word line. This is bad because if we have a large load on the word line (large capacitance), we can always build a bigger address decoder to drive them no sweat. But for the bit lines, we still have to rely on the tiny transistors (SRAM cell). That’s why we need to precharge them to high and use sense amp to detect the differences. Read enable is not needed here because if Write Enable is not asserted, read is by default.(Don’t need Read Enable) The internal logic will detect an address changes and precharge the bit lines. Once the bit lines are precharged, the values of the new address will appear at the Dout pin. +2 = 32 min. (Y:12) : : : : Word 15 SRAM Cell SRAM Cell SRAM Cell SRAM Cell - + Sense Amp - + Sense Amp - + Sense Amp - + Sense Amp Dout 3 Winter ZDMC – Lec. #1 – 3 Dout 2 Dout 1 Dout 0

4 Simplified SRAM timing diagram
Read: Valid address, then Chip Select Access Time: address good to data valid even if not visible on out Cycle Time: min between subsequent mem operations Write: Valid address and data with WE_l, then CS Address must be stable a setup time before WE and CS go low And hold time after one goes high When do you drive, sample, or Z the data bus? Winter ZDMC – Lec. #1 – 4

5 Logic Diagram of a Typical SRAM
EE141 Logic Diagram of a Typical SRAM A D OE_L 2 N “words” x M bit SRAM M WE_L Write Enable is usually active low (WE_L) Din and Dout are combined to save pins: A new control signal, Output Enable (OE_L) WE_L is asserted (Low), OE_L is unasserted (High) D serves as the data input pin WE_L is unasserted (High), OE_L is asserted (Low) D is the data output pin Neither WE_L and OE_L are asserted? Chip is disconneted Never both asserted! Here is the logic diagram of a typical SRAM. In order to save pins, Din and Dout are combined into a set of bidirectional pins so you need a new control signal: Output Enable. Both write enable and output enable are usually asserted low. When Write Enable is asserted, the D pins serve as the data input pin. When Output Enable is asserted, the D pins serve as the data output pin. +1 = 33 min. (Y:13) or chipSelect (CS) + WE Winter ZDMC – Lec. #1 – 5

6 Example: ST microelectronics M68AW256M
Winter ZDMC – Lec. #1 – 6

7 EE141 Typical SRAM Timing OE determines direction Hi = Write, Lo = Read Writes are dangerous! Be careful! Double signaling: OE Hi, WE Lo A D OE_L 2 N words x M bit SRAM M WE_L Write Timing: Read Timing: Write Setup Time Write Hold Time High Z D Data In Data Out Data Out Read Access Time Read Access Time Junk For write, you set up your address and data on the A and D pins and then you generate a write pulse that is long enough for the write access time. For simplicity, I have assumed the Write setup time for address and data to be the same. In real life, they can be different. For read operation, you have disasserted Wr Enable and assert Output Enable. Since you are supplying garbage address here so as soon as you assert OE_L, you will get garbage out. If you then present an valid address to the SRAM, valid data will be available at the output after a delay of the Write Access Time. SRAM’s timing is much simpler than the DRAM timing which I will show you later. +1 = 34 min. (Y:14) A Write Address Read Address Read Address OE_L WE_L Winter ZDMC – Lec. #1 – 7

8 Read Series with CS Rate determined by cycle time
Data valid: max(addr + Access time, CS + Tco) Remain valid TOHA after addr changes Return to tri-state after read sequence Winter ZDMC – Lec. #1 – 8

9 Example 1 continued: read
Winter ZDMC – Lec. #1 – 9

10 Example 1 continued: write
Winter ZDMC – Lec. #1 – 10

11 Programmable Logic Regular logic Field Programmable Gate Arrays
EE141 Programmable Logic Regular logic Programmable Logic Arrays Multiplexers/Decoders ROMs Field Programmable Gate Arrays Xilinx Vertex “Random Logic” Full Custom Design “Regular Logic” Structured Design Winter ZDMC – Lec. #1 – 11

12 Programmable Logic Arrays (PLAs)
EE141 Programmable Logic Arrays (PLAs) Pre-fabricated building block of many AND/OR gates Actually NOR or NAND ”Personalized" by making or breaking connections among gates Programmable array block diagram for sum of products form • • • inputs AND array • • • outputs OR array product terms Winter ZDMC – Lec. #1 – 12

13 Enabling Concept Shared product terms among outputs F0 = A + B' C'
EE141 Enabling Concept Shared product terms among outputs F0 = A + B' C' F1 = A C' + A B F2 = B' C' + A B F3 = B' C + A example: input side: personality matrix 1 = uncomplemented in term 0 = complemented in term – = does not participate product inputs outputs term A B C F0 F1 F2 F3 AB 1 1 – B'C – AC' 1 – B'C' – A 1 – – output side: reuse of terms 1 = term connected to output 0 = no connection to output Winter ZDMC – Lec. #1 – 13

14 EE141 Before Programming All possible connections available before "programming" In reality, all AND and OR gates are NANDs Winter ZDMC – Lec. #1 – 14

15 After Programming Unwanted connections are "blown"
EE141 After Programming Unwanted connections are "blown" Fuse (normally connected, break unwanted ones) Anti-fuse (normally disconnected, make wanted connections) A B C F1 F2 F3 F0 AB B'C AC' B'C' Winter ZDMC – Lec. #1 – 15

16 Alternate Representation for High Fan-in Structures
EE141 Alternate Representation for High Fan-in Structures Short-hand notation--don't have to draw all the wires Signifies a connection is present and perpendicular signal is an input to gate notation for implementing F0 = A B + A' B' F1 = C D' + C' D A B C D AB AB+A'B' A'B' CD' CD'+C'D C'D Winter ZDMC – Lec. #1 – 16

17 Programmable Logic Array Example
EE141 Programmable Logic Array Example Multiple functions of A, B, C F1 = A B C F2 = A + B + C F3 = A' B' C' F4 = A' + B' + C' F5 = A xor B xor C F6 = (A xnor B xnor C)’ full decoder as for memory address bits stored in memory A'B'C' A'B'C A'BC' A'BC AB'C' AB'C ABC' ABC A B C F1 F2 F3 F4 F5 F6 A B C F1 F2 F3 F4 F5 F6 Winter ZDMC – Lec. #1 – 17

18 PLA Design Example BCD to Gray code converter
EE141 PLA Design Example BCD to Gray code converter A B C D W X Y Z – – – – – 1 1 – – – – – – X 1 X X X X X D A B C X 0 X X X X X D A B C K-map for W K-map for X X 0 X X X X X D A B C X 1 X X X X X D A B C minimized functions: W = A + B D + B C X = B C' Y = B + C Z = A'B'C'D + B C D + A D' + B' C D' K-map for Y K-map for Z Winter ZDMC – Lec. #1 – 18

19 PLA Design Example (cont’d)
EE141 PLA Design Example (cont’d) Code converter: programmed PLA A B C D W X Y Z A BD BC BC' B C A'B'C'D BCD AD' BCD' minimized functions: W = A + B D + B C X = B C' Y = B + C Z = A'B'C'D + B C D + A D' + B' C D' not a particularly good candidate for PLA implementation since no terms are shared among outputs however, much more compact and regular implementation when compared with discrete AND and OR gates Winter ZDMC – Lec. #1 – 19

20 PLA Design Example BCD to Gray code converter
EE141 PLA Design Example BCD to Gray code converter A B C D W X Y Z – – – – – 1 1 – – – – – – A A X 1 X X X X X X 0 X X X X X D D C C B B K-map for W K-map for X A A X 0 X X X X X X 1 X X X X X minimized functions: W = X = Y = Z = D D C C B B K-map for Y K-map for Z Winter ZDMC – Lec. #1 – 20

21 PLA Design Example #1 BCD to Gray code converter BC’
EE141 PLA Design Example #1 BCD to Gray code converter A B C D W X Y Z – – – – – 1 1 – – – – – – A A X 1 X X X X X X 0 X X X X X D D C C B B K-map for W K-map for X BC’ A A X 0 X X X X X X 1 X X X X X minimized functions: W = X = Y = Z = D D C C B B K-map for Y K-map for Z Winter ZDMC – Lec. #1 – 21

22 Multiplexer/Demultiplexer: Making Connections
EE141 Multiplexer/Demultiplexer: Making Connections Direct point-to-point connections between gates Multiplexer: route one of many inputs to a single output Demultiplexer: route single input to one of many outputs control control multiplexer demultiplexer 4x4 switch Winter ZDMC – Lec. #1 – 23

23 Multiplexers/Selectors
EE141 Multiplexers/Selectors Multiplexers/Selectors: general concept 2n data inputs, n control inputs (called "selects"), 1 output Used to connect 2n points to a single point Control signal pattern forms binary index of input connected to output I1 I0 A Z A Z 0 I0 1 I1 Z = A' I0 + A I1 functional form logical form two alternative forms for a 2:1 Mux truth table Winter ZDMC – Lec. #1 – 24

24 Multiplexers/Selectors (cont'd)
EE141 Multiplexers/Selectors (cont'd) 2:1 mux: Z = A' I0 + A I1 4:1 mux: Z = A' B' I0 + A' B I1 + A B' I2 + A B I3 8:1 mux: Z = A'B'C'I0 + A'B'CI1 + A'BC'I2 + A'BCI AB'C'I4 + AB'CI5 + ABC'I6 + ABCI7 In general, Z =  (mkIk) in minterm shorthand form for a 2n:1 Mux n 2 -1 k=0 I0 I1 I2 I3 I4 I5 I6 I7 A B C 8:1 mux Z I0 I1 I2 I3 A B 4:1 mux Z I0 I1 A 2:1 mux Z Winter ZDMC – Lec. #1 – 25

25 Cascading Multiplexers
EE141 Cascading Multiplexers Large multiplexers implemented by cascading smaller ones Z I0 I1 I2 I3 A I4 I5 I6 I7 B C 4:1 mux 2:1 mux 8:1 mux alternative implementation C Z A B 4:1 mux 2:1 mux I4 I5 I2 I3 I0 I1 I6 I7 8:1 mux control signals B and C simultaneously choose one of I0, I1, I2, I3 and one of I4, I5, I6, I7 control signal A chooses which of the upper or lower mux's output to gate to Z Winter ZDMC – Lec. #1 – 26

26 Multiplexers as Lookup Tables (LUTs)
EE141 Multiplexers as Lookup Tables (LUTs) 2n:1 multiplexer implements any function of n variables With the variables used as control inputs and Data inputs tied to 0 or 1 In essence, a lookup table Example: F(A,B,C) = m0 + m2 + m6 + m = A'B'C' + A'BC' + ABC' + ABC = A'B'(C') + A'B(C') + AB'(0) + AB(1) C A B S2 8:1 MUX S1 S0 F Winter ZDMC – Lec. #1 – 27

27 Multiplexers as LUTs (cont’d)
EE141 Multiplexers as LUTs (cont’d) 2n-1:1 mux can implement any function of n variables With n-1 variables used as control inputs and Data inputs tied to the last variable or its complement Example: F(A,B,C) = m0 + m2 + m6 + m = A'B'C' + A'BC' + ABC' + ABC = A'B'(C') + A'B(C') + AB'(0) + AB(1) C A B S2 8:1 MUX S1 S0 A B C F C' C' A B S1 S0 F 4:1 MUX C' C' 0 1 F Winter ZDMC – Lec. #1 – 28

28 Multiplexers as LUTs (cont’d)
EE141 Multiplexers as LUTs (cont’d) Generalization Example: F(A,B,C,D) implemented by an 8:1 MUX I0 I In-1 In F In In' 1 four possible configurations of truth table rows can be expressed as a function of In n-1 mux control variables single mux data variable C A B 1 D D’ D D’ D’ S2 8:1 MUX S1 S0 1 0 1 1 0 0 D A 0 1 B C choose A,B,C as control variables multiplexer implementation Winter ZDMC – Lec. #1 – 29

29 Demultiplexers/Decoders
EE141 Demultiplexers/Decoders Decoders/demultiplexers: general concept Single data input, n control inputs, 2n outputs Control inputs (called “selects” (S)) represent binary index of output to which the input is connected Data input usually called “enable” (G) 1:2 Decoder: O0 = G  S’ O1 = G  S 3:8 Decoder: O0 = G  S2’  S1’  S0’ O1 = G  S2’  S1’  S0 O2 = G  S2’  S1  S0’ O3 = G  S2’  S1  S0 O4 = G  S2  S1’  S0’ O5 = G  S2  S1’  S0 O6 = G  S2  S1  S0’ O7 = G  S2  S1  S0 2:4 Decoder: O0 = G  S1’  S0’ O1 = G  S1’  S0 O2 = G  S1  S0’ O3 = G  S1  S0 Winter ZDMC – Lec. #1 – 30

30 Demultiplexers as General-Purpose Logic
EE141 Demultiplexers as General-Purpose Logic n:2n decoder implements any function of n variables With the variables used as control inputs Enable inputs tied to 1 and Appropriate minterms summed to form the function A'B'C' A'B'C A'BC' A'BC AB'C' AB'C ABC' ABC C A B S2 3:8 DEC S1 S0 “1” demultiplexer generates appropriate minterm based on control signals (it "decodes" control signals) Winter ZDMC – Lec. #1 – 31

31 Demultiplexers as General-Purpose Logic (cont’d)
EE141 Demultiplexers as General-Purpose Logic (cont’d) F1 = A' B C' D + A' B' C D + A B C D F2 = A B C' D’ + A B C F3 = (A' + B' + C' + D') A B 0 A'B'C'D' 1 A'B'C'D 2 A'B'CD' 3 A'B'CD 4 A'BC'D' 5 A'BC'D 6 A'BCD' 7 A'BCD 8 AB'C'D' 9 AB'C'D 10 AB'CD' 11 AB'CD 12 ABC'D' 13 ABC'D 14 ABCD' 15 ABCD 4:16 DEC Enable C D F1 F2 F3 Winter ZDMC – Lec. #1 – 32

32 Cascading Decoders 5:32 decoder 1x2:4 decoder 4x3:8 decoders
EE141 Cascading Decoders 5:32 decoder 1x2:4 decoder 4x3:8 decoders 0 A'B'C'D'E' A'BC'DE' 3:8 DEC 3:8 DEC S2 S1 S0 S2 S1 S0 F 2:4 DEC S1 S0 ABCDE 0 AB'C'D'E' AB'CDE A B 3:8 DEC 3:8 DEC S2 S1 S0 S2 S1 S0 C D E C D E Winter ZDMC – Lec. #1 – 33

33 Read-only Memories Two dimensional array of 1s and 0s
EE141 Read-only Memories Two dimensional array of 1s and 0s Entry (row) is called a "word" Width of row = word-size Index is called an "address" Address is input Selected word is output word lines (only one is active – decoder is just right for this) 1 1 1 1 n 2 -1 i word[i] = word[j] = 1010 decoder j internal organization 0 n-1 Address bit lines (normally pulled to 1 through resistor – selectively connected to 0 by word line controlled switches) Winter ZDMC – Lec. #1 – 34

34 ROMs and Combinational Logic
EE141 ROMs and Combinational Logic Combinational logic implementation (two-level canonical form) using a ROM F0 = A' B' C + A B' C' + A B' C F1 = A' B' C + A' B C' + A B C F2 = A' B' C' + A' B' C + A B' C' F3 = A' B C + A B' C' + A B C' truth table A B C F0 F1 F2 F3 block diagram ROM 8 words x 4 bits/word address outputs A B C F0 F1 F2 F3 Winter ZDMC – Lec. #1 – 35

35 memory array (2n words by m bits)
EE141 ROM Structure Similar to a PLA structure but with a fully decoded AND array Completely flexible OR array (unlike PAL) n address lines • • • inputs decoder • • • outputs memory array (2n words by m bits) m data lines 2n word lines Winter ZDMC – Lec. #1 – 36

36 EE141 ROM vs. PLA ROM Design time is short (no need to minimize output functions) Most input combinations are needed (e.g., code converters) Little sharing of product terms among output functions Size doubles for each additional input Can't exploit don't cares Cheap (high-volume component) Can implement any function of n inputs Medium speed PLA Design tools are available for multi-output minimization There are relatively few unique minterm combinations Many minterms are shared among the output functions Most complex in design, need more sophisticated tools Can implement any function up to a product term limit Slow (two programmable planes) Winter ZDMC – Lec. #1 – 37

37 FPGA Overview Simplified version of FPGA internal architecture:
EE141 FPGA Overview Basic idea: two-dimensional array of logic blocks and flip-flops with a means for the user to configure: 1. the interconnection between the logic blocks, 2. the function of each block. Simplified version of FPGA internal architecture: Winter ZDMC – Lec. #1 – 38

38 EE141 Why FPGAs? By the early 1980’s most of the logic circuits in typical systems where absorbed by a handful of standard large scale integrated circuits (LSI). Microprocessors, bus/IO controllers, system timers, ... Every system still had the need for random “glue logic” to help connect the large ICs: generating global control signals (for resets etc.) data formatting (serial to parallel, multiplexing, etc.) Systems had a few LSI components and lots of small low density SSI (small scale IC) and MSI (medium scale IC) components. Winter ZDMC – Lec. #1 – 39

39 EE141 Why FPGAs? Custom ICs sometimes designed to replace the large amount of glue logic: reduced system complexity and manufacturing cost, improved performance. However, custom ICs are very expensive to develop, and delay introduction of product to market (time to market) because of increased design time. Note: need to worry about two kinds of costs: 1. cost of development, sometimes called non-recurring engineering (NRE) 2. cost of manufacture A tradeoff usually exists between NRE cost and manufacturing costs Winter ZDMC – Lec. #1 – 40

40 Why FPGAs? Custom IC approach viable for products that are …
EE141 Why FPGAs? Custom IC approach viable for products that are … very high volume (where NRE could be amortized), not time-to-market sensitive. FPGAs introduced as an alternative to custom ICs for implementing glue logic: improved density relative to discrete SSI/MSI components (within around 10x of custom ICs) with the aid of computer aided design (CAD) tools circuits could be implemented in a short amount of time (no physical layout process, no mask making, no IC manufacturing), relative to ASICs. lowers NREs shortens TTM Because of Moore’s law the density (gates/area) of FPGAs continued to grow through the 80’s and 90’s to the point where major data processing functions can be implemented on a single FPGA. Winter ZDMC – Lec. #1 – 41

41 Field-Programmable Gate Arrays
PLAs: 100s of gate equivalents FPGAs: s gates Logic blocks Implement combinational and sequential logic Interconnect Wires to connect inputs and outputs to logic blocks I/O blocks Special logic blocks at periphery of device for external connections Key questions: How to make logic blocks programmable? How to connect the wires? After the chip has been fabbed Winter ZDMC – Lec. #1 – 42

42 EE141 Tradeoffs in FPGAs Logic block - how are functions implemented: fixed functions (manipulate inputs) or programmable? Support complex functions, need fewer blocks, but they are bigger so less of them on chip Support simple functions, need more blocks, but they are smaller so more of them on chip Interconnect How are logic blocks arranged? How many wires will be needed between them? Are wires evenly distributed across chip? Programmability slows wires down – are some wires specialized to long distances? How many inputs/outputs must be routed to/from each logic block? What utilization are we willing to accept? 50%? 20%? 90%? Winter ZDMC – Lec. #1 – 43

43 Why are FPGAs Interesting?
Technical viewpoint: For hardware/system-designers, like ASICs only better! “Tape-out” new design every few minutes/hours. Does the “reconfigurability” or “reprogrammability” offer other advantages over fixed logic? Dynamic reconfiguration? In-field reprogramming? Selfmodifying hardware,evolvable hardware? FPGAs have tracked Moore’s Law better than any other programmable device. Staggering logic capacity growth (10000x): Winter ZDMC – Lec. #1 – 44

44 Why are FPGAs Interesting?
Logic capacity now only part of the story: on-chip RAM, high-speed I/Os, “hard” function blocks, ... Modern FPGAs are “reconfigurable systems” Have been an archetype for the semiconductor industry as a whole: But, the heterogeneity erodes the “purity”argument. Mapping is more difficult. Introduces uncertainty in efficiency of solution. Winter ZDMC – Lec. #1 – 45

45 Why are FPGAs Interesting?
Have attracted an huge amount of investment for new ventures: Most startups have failed. Why? Business dominated by Xilinx and Altera FPGAs at the leading edge of IC processing: Xilinx V7 out next year with 28nm TSMC processing Foundaries like FPGAs - regularity help get process up the “learning curve” High-volume commitment gets interest of foundry (Gives FPGAs a competitive edge over ASICs, which usually are built on an older process.) FPGAs have been wildly successful even though they are inefficient in silicon area, energy, and performance : “Measuring the Gap Between FPGAs and ASICs”, Versus ASICs: area 40X, delay 3-4X, power 12X How can this be? Is there something more important than silicon efficiency? Winter ZDMC – Lec. #1 – 46

46 FPGA Variations Families of FPGA’s differ in:
EE141 FPGA Variations Families of FPGA’s differ in: physical means of implementing user programmability, arrangement of interconnection wires, and the basic functionality of the logic blocks. Most significant difference is in the method for providing flexible blocks and connections: Anti-fuse based (ex: Actel) Non-volatile, relatively small fixed (non-reprogrammable) Winter ZDMC – Lec. #1 – 47

47 User Programmability Latches are used to:
EE141 User Programmability Latches are used to: 1. make or break cross-point connections in the interconnect 2. define the function of the logic blocks 3. set user options: within the logic blocks in the input/output blocks global reset/clock “Configuration bit stream” can be loaded under user control: All latches are strung together in a shift chain: Latch-based (Xilinx, Altera, …) reconfigurable volatile relatively large. Winter ZDMC – Lec. #1 – 48

48 Idealized FPGA Logic Block
EE141 Idealized FPGA Logic Block 4-input look up table (LUT) implements combinational logic functions Register optionally stores output of LUT what's a LUT? Winter ZDMC – Lec. #1 – 49

49 LUT as general logic gate
EE141 LUT as general logic gate Example: 4-lut An n-lut as a direct implementation of a function truth-table. Each latch location holds the value of the function corresponding to one input combination. Example: 2-lut Implements any function of 2 inputs. How many of these are there? How many functions of n inputs? Winter ZDMC – Lec. #1 – 50

50 4-LUT Implementation n-bit LUT is implemented as a 2n x 1 memory:
EE141 4-LUT Implementation n-bit LUT is implemented as a 2n x 1 memory: inputs choose one of 2n memory locations. memory locations (latches) are normally loaded with values from user’s configuration bit stream. Inputs to mux control are the CLB inputs. Result is a general purpose “logic gate”. n-LUT can implement any function of n inputs! Winter ZDMC – Lec. #1 – 51

51 Xilinx 4000 Series Programmable Gate Arrays
CLB - Configurable Logic Block 5-input, 1 output function or 2 4-input, 1 output functions optional register on outputs Built-in fast carry logic Can be used as memory Three types of routing direct general-purpose long lines of various lengths RAM-programmable can be reconfigured Winter ZDMC – Lec. #1 – 52

52 The Xilinx 4000 CLB Winter ZDMC – Lec. #1 – 53

53 Two 4-Input Functions, Registered Output
Winter ZDMC – Lec. #1 – 54

54 5-Input Function, Combinational Output
Winter ZDMC – Lec. #1 – 55

55 CLB Used as RAM Winter ZDMC – Lec. #1 – 56

56 Xilinx 4000 Interconnect Winter ZDMC – Lec. #1 – 57

57 Xilinx FPGA Combinational Logic Examples
EE141 Xilinx FPGA Combinational Logic Examples Key: General functions are limited to 5 inputs (4 even better - 1/2 CLB) No limitation on function complexity Example 2-bit comparator: A B = C D and A B > C D implemented with 1 CLB (GT) F = A C' + A B D' + B C' D' (EQ) G = A'B'C'D'+ A'B C'D + A B'C D'+ A B CD Can implement some functions of > 5 input Winter ZDMC – Lec. #1 – 58

58 Xilinx FPGA Combinational Logic
Examples N-input majority function: 1 whenever n/2 or more inputs are 1 N-input parity functions: 5 input/1 CLB; 2 levels yield 25 inputs! 5-input Majority Circuit 9 Input Parity Logic CLB CLB 7-input Majority Circuit CLB CLB CLB CLB Winter ZDMC – Lec. #1 – 59

59 Xilinx FPGA Adder Example
2-bit binary adder - inputs: A1, A0, B1, B0, CIN outputs: S0, S1, Cout Full Adder, 4 CLB delays to final carry out 2 x Two-bit Adders (3 CLBs each) yields 2 CLBs to final carry out Winter ZDMC – Lec. #1 – 60

60 FPGA Generic Design Flow
EE141 FPGA Generic Design Flow Design Entry: Create your design files using: schematic editor or hardware description language (Verilog, VHDL) Design “implementation” on FPGA: Partition, place, and route to create bit-stream file Design verification: Use Simulator to check function, other software determines max clock frequency. Load onto FPGA device (cable connects PC to development board) check operation at full speed in real environment. Winter ZDMC – Lec. #1 – 61

61 Example Partition, Placement, and Route
EE141 Example Partition, Placement, and Route Idealized FPGA structure: Example Circuit: collection of gates and flip-flops Circuit combinational logic must be “covered” by 4-input 1-output “gates”. Flip-flops from circuit must map to FPGA flip-flops. (Best to preserve “closeness” to CL to minimize wiring.) Placement in general attempts to minimize wiring. Winter ZDMC – Lec. #1 – 62

62 Xilinx Virtex-E Floorplan
EE141 Xilinx Virtex-E Floorplan Configurable Logic Blocks 4-input function gens buffers flipflop Input/Output Blocks combinational, latch, and flipflop output sampled inputs Block RAM 4096 bits each every 12 CLB columns Winter ZDMC – Lec. #1 – 63

63 Virtex-E Configurable Logic Block (CLB)
EE141 Virtex-E Configurable Logic Block (CLB) CLB = 4 logic cells (LC) in two slices LC: 4-input function generator, carry logic, storage ele’t 80 x 120 CLB array on 2000E FF or latch 16x1 synchronous RAM Winter ZDMC – Lec. #1 – 64

64 Details of Virtex-E Slice
EE141 Details of Virtex-E Slice LUT 4-input fun 16x1 sram 32x1 or 16x2 in slice 16 bit shift register Storage element D flipflip latch Combinational outputs 5 and 6 input functions Carry chain arithmetic along row or col Winter ZDMC – Lec. #1 – 65

65 Xilinx FPGAs (interconnect detail)
EE141 Xilinx FPGAs (interconnect detail) Winter ZDMC – Lec. #1 – 66

66 Virtex-E Input/Output block (IOB) detail
EE141 Virtex-E Input/Output block (IOB) detail Many I/O signaling stds D FF or latch 3-state output buf Winter ZDMC – Lec. #1 – 67

67 Virtex-E Family of Parts
EE141 Virtex-E Family of Parts Winter ZDMC – Lec. #1 – 68

68 Xilinx FPGAs How they differ from idealized array:
EE141 Xilinx FPGAs How they differ from idealized array: In addition to their use as general logic “gates”, LUTs can alternatively be used as general purpose RAM. Each 4-lut can become a 16x1-bit RAM array. Special circuitry to speed up “ripple carry” in adders and counters. Therefore adders assembled by the CAD tools operate much faster than adders built from gates and luts alone. Many more wires, including tri-state capabilities. Winter ZDMC – Lec. #1 – 69

69 Combinational Logic Implementation
EE141 Combinational Logic Implementation Regular Logic Structures Programmable Logic Arrays Programmable connections: AND-OR (NOR-NOR) Arrays Multiplexers/decoders Multipoint connections for signal routing Lookup Tables ROMs Truth table in hardware Field Programmable Gate Arrays (FPGAs) Programmable logic (LUTs, Truth Tables) and connections Advantages/disadvantages of each Winter ZDMC – Lec. #1 – 70

70 EE141 Summary Logic design process influenced by available technology AND economic drivers Volume, Time to Market, Costs, Power Fundamental understanding of digital design techniques carry over Specifics on design trade-offs and implementation differ FPGA offer a valuable new sweet spot Low TTM, medium cost, tremendous flexibility Fundamentally tied to powerful CAD tools Build everything (simple or complex) from one set of building blocks LUTs + FF + routing + storage + IOs Winter ZDMC – Lec. #1 – 71


Download ppt "Field Programmable Gate Arrays"

Similar presentations


Ads by Google