1
Field Programmable Gate Arrays (FPGA)
EE446 Embedded Architectures
2
What is an FPGA?
It is primarily a semiconductor device that can be configured by the user (customer or designer) after the manufacturing process has been completed.
The term "field-programmable" means the device is programmed by the customer, not the manufacturer.
It can be programmed using a logic circuit diagram or source code in VHDL or Verilog.
It supports partial reconfiguration of a portion of the design.
3
What is an FPGA? An FPGA (Field Programmable Gate Array) is a reprogrammable chip that contains hundreds of thousands of logic gates, which are connected together internally to build complex digital circuitry. 4/20/2017
4
Benefits of FPGAs
Real-time analysis of high-rate data streams (Performance)
Deterministic hardware dedicated to every task (Reliability)
Lower nonrecurring engineering expenses (Reconfigurability)
Radiation hardening and program integrity (Durability)
Flexible and rapid prototyping (Development)
The shortcomings of using common commercial electronics for space applications include extensive radiation-induced damage, unique parts acquisition problems, and lengthy timelines for acquiring custom parts. Meanwhile, the nonrecurring engineering costs of redesigning and fabricating specially hardened parts are often prohibitive. AFRL's (Air Force Research Lab) Spacecraft Electronics Branch is working to solve problems of affordability and scheduling for custom electronic functions, and Xilinx Corporation's complete circuit-by-circuit, block-by-block redesign effort, involving about a billion transistors, translates to what is by far the most complex and advanced chip ever hardened for space. Xilinx plans to start beta testing with seven early adopters representing major satellite and missile prime contractors, followed by an off-the-shelf product announcement thereafter. With this radiation-hardened FPGA development, Air Force space and missile avionics programs will be able to develop the next-generation electronics capabilities they need at a fraction of the cost of custom logic chips.
5
FPGA Performance
FPGAs excel at computing non-data-dependent algorithms in parallel.
A customizable data path and ALU allow very large amounts of data to be transferred and processed within several clock cycles.
Despite lower clock frequencies, FPGAs can outperform conventional CPUs on certain data processing tasks.
“Parallel processing is beneficial for images/video, since the calculations involved are independent of each other, so you can do entire matrix operations at once.”
6
Enabling Technology
Cheap/fast fuse connections (one-time programmable):
  small area (can fit lots of them)
  low-resistance wires (fast even if in multiple segments)
  very high resistance when not connected
  small capacitance (wires can be longer)
Antifuse: one-time programmable
Pass transistors (switches) used to connect wires:
  bi-directional
  EEPROM
  SRAM
Multiplexors used to connect one of a set of possible sources to an input:
  can be used to implement logic functions
An antifuse is an electrical device that performs the opposite function to a fuse. Whereas a fuse starts with a low resistance and is designed to permanently break an electrically conductive path (typically when the current through the path exceeds a specified limit), an antifuse starts with a high resistance and is designed to permanently create an electrically conductive path (typically when the voltage across the antifuse exceeds a certain level).
7
Comparable Technology
Historically, FPGAs have been slower, consumed more energy, and offered less functionality than ASICs.
Fabrication enhancements and greater R&D have narrowed the performance gap considerably, although ASICs still hold an advantage in power, speed, and density.
An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed solely to run a cell phone is an ASIC.
8
Comparable Technology
Advantages of FPGAs over ASICs:
Shorter time to market
Can be re-programmed in the field to fix bugs
Lower nonrecurring engineering costs
Hardware can be developed on ordinary FPGAs and then manufactured as a fixed final version that can no longer be modified once the design has been committed
9
The reasons they may NOT fit your design are:
Power consumption - FPGAs fundamentally use a lot more power than ASICs
Price - they also fundamentally cost more
Speed - ASICs can still blow any FPGA away in speed, although design techniques can help with this issue
Density - ASICs can still pack a lot more logic into a single chip than an FPGA
IP - modern, complex IP (a complete PCI Express or HyperTransport core, for example) may take up most or all of an FPGA but only 10% of an ASIC
10
Architectural Features of FPGAs
Common FPGA architecture involves:
Configurable Logic Blocks (CLBs)
I/O pads
Routing paths, usually of the same width (number of wires)
A lookup table is a data structure in the form of an array, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than performing an expensive computation.
The flip-flop synchronizes the output with the clock and is also used for feedback purposes.
Each input is accessible from one side of the logic block, while the output pin can connect to routing wires in both the channel to the right and the channel below the logic block. Each logic block output pin can connect to any of the wiring segments in the channels adjacent to it. Similarly, an I/O pad can connect to any one of the wiring segments in the channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires (where W is the channel width) in the horizontal channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed. For higher-speed interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In this architecture, when a wire enters a switch box, there are three programmable switches that allow it to connect to three other wires in adjacent channel segments. The pattern, or topology, of switches used in this architecture is the planar or domain-based switch box topology. In this switch box topology, a wire in track number 1 connects only to wires in track number 1 in adjacent channel segments, wires in track number 2 connect only to other wires in track number 2, and so on.
Figures: Standard Logic Block; Logic Block Pin Assignment
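The planar (domain-based) switch box rule described above can be sketched in a few lines of Python. This is only an illustrative model: the side names and the function `planar_switch_targets` are hypothetical and do not come from any vendor tool.

```python
# Hypothetical model of a planar (domain-based) switch box.
# A wire is identified by the side it enters from and its track number.
SIDES = ("north", "east", "south", "west")


def planar_switch_targets(side, track):
    """Return the wires reachable from (side, track) through the switch box.

    In the planar topology a wire can only reach the same track number on
    each of the three other sides, so there are exactly three switches.
    """
    if side not in SIDES:
        raise ValueError("unknown side: " + side)
    return [(other, track) for other in SIDES if other != side]


# A wire entering from the west on track 1 can reach track 1 on the
# north, east and south sides, and nothing on any other track.
targets = planar_switch_targets("west", 1)
```

Because a signal can never change track number inside such a switch box, the routing fabric splits into independent "domains", one per track, which is where the name domain-based comes from.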
11
CAD process to implement a circuit in an FPGA
Logic optimization. Performs two-level or multi-level minimization of the Boolean equations to optimize area, delay, or a combination of both. Technology mapping. Transforms the Boolean equations into a circuit of FPGA logic blocks. This step also optimizes the total number of logic blocks required (area optimization) or the number of logic blocks in time-critical paths (delay optimization).
12
CAD process to implement a circuit in an FPGA
Placement. Selects the specific location for each logic block in the FPGA, while trying to minimize the total length of interconnect required.
Routing. Uses the FPGA’s available routing resources to connect the logic blocks positioned by the placement tool, carrying signals from where they are generated to where they are used.
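As a rough illustration of what "minimizing the total length of interconnect" can mean during placement, the sketch below scores a placement with the half-perimeter wirelength estimate. The choice of this estimate is an assumption made for illustration (the slides do not name a cost function), and the block names and coordinates are made up.

```python
def half_perimeter_wirelength(placement, nets):
    """Estimate total interconnect length for a placement.

    placement: dict mapping block name -> (x, y) grid location
    nets: iterable of tuples of block names that must be connected
    For each net, the estimate is half the perimeter of the bounding box
    that encloses all blocks on that net.
    """
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical example: three logic blocks and two nets.
nets = [("a", "b"), ("a", "b", "c")]
spread_out = {"a": (0, 0), "b": (3, 0), "c": (0, 2)}
compact = {"a": (0, 0), "b": (1, 0), "c": (0, 1)}

cost_spread = half_perimeter_wirelength(spread_out, nets)
cost_compact = half_perimeter_wirelength(compact, nets)
```

A placement tool would try many candidate placements and prefer the ones with the lower estimate, here the compact one.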
13
Programming Technologies
Fuse and anti-fuse:
  fuse makes or breaks link between two wires
  one-time programmable
Flash:
  high density
  process issues
RAM-based:
  memory bit controls a switch that connects/disconnects two wires
  can be programmed and re-programmed easily (tested at factory)
14
Tradeoffs in FPGAs
Logic block - how are functions implemented: fixed functions (manipulate inputs) or programmable?
  support complex functions: need fewer blocks, but they are bigger, so fewer of them fit on the chip
  support simple functions: need more blocks, but they are smaller, so more of them fit on the chip
15
Tradeoffs in FPGAs
Interconnect:
  how are logic blocks arranged?
  how many wires will be needed between them?
  are wires evenly distributed across the chip?
  programmability slows wires down; are some wires specialized for long distances?
  how many inputs/outputs must be routed to/from each logic block?
  what utilization are we willing to accept? 50%? 20%? 90%?
16
Xilinx Programmable Gate Arrays
CLB - Configurable Logic Block:
  one 5-input, 1-output function or two 4-input, 1-output functions
  optional register on outputs
  built-in fast carry logic
  can be used as memory
Three types of routing:
  direct
  general-purpose
  long lines of various lengths
RAM-programmable:
  can be reconfigured
Note that the LUT is just RAM
17
In electronics, a delay-locked loop (DLL) is a digital circuit similar to a phase-locked loop (PLL), with the main difference being the absence of an internal voltage-controlled oscillator, replaced by a delay line. A DLL can be used to change the phase of a clock signal (a signal with a periodic waveform), usually to enhance the clock rise-to-data output valid timing characteristics of integrated circuits (such as DRAM devices). DLLs can also be used for clock recovery (CDR). From the outside, a DLL can be seen as a negative-delay gate placed in the clock path of a digital circuit. The DLL circuitry allows for very precise synchronization of external and internal clocks.
18
The Virtex CLB
19
Details of One Virtex Slice
20
CLB Slice Structure Each slice contains two sets of the following:
Four-input LUT:
  any 4-input logic function,
  or 16-bit x 1 sync RAM (SLICEM only),
  or 16-bit shift register (SLICEM only)
Carry & Control:
  fast arithmetic logic
  multiplier logic
  multiplexer logic
Storage element:
  latch or flip-flop
  set and reset
  true or inverted inputs
  sync. or async. control
Two slices form a CLB. These slices can be used independently or together for wider logic functions. Within each slice, the LUT and the flip-flop can be used for the same function or for independent functions. The flip-flops do not restrict designers to having only a set or only a clear. For more ASIC-like flows, the flip-flop can be used as a latch, so designers do not need to re-code the design for the device architecture.
21
Implements any Two 4-input Functions
registered
22
Implement Some Larger Functions
e.g. 9-input parity
23
LUT (Look-Up Table) Functionality
Look-Up tables are primary elements for logic implementation Each LUT can implement any function of 4 inputs
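To see why a LUT can implement any function of 4 inputs, it helps to model it as a 16-entry truth table addressed by the inputs. The following Python sketch is purely illustrative; `make_lut4` and `lut4_read` are hypothetical helper names, and the bit ordering of the address is an assumption.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table for a Boolean function of 4 inputs."""
    table = []
    for index in range(16):
        # Unpack the table index into four input bits (bit 0 is input a).
        a, b, c, d = [(index >> bit) & 1 for bit in range(4)]
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs simply form a 4-bit address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


# "Configuring" the LUT means choosing the 16 stored bits.
xor_lut = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)  # 4-input parity
and_lut = make_lut4(lambda a, b, c, d: a & b & c & d)  # 4-input AND
```

Since there are 16 stored bits, one LUT can hold any of the 2^16 possible 4-input functions, and evaluating it costs one memory read regardless of how complex the function is.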
24
Carry & Control Logic
[Figure: one slice, showing two look-up tables (inputs G1-G4 and F1-F4), each with its carry & control logic and a storage element (D, Q, CK, EC, R), plus the slice signals CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, YB]
The configurable logic block (CLB) contains two slices. Each slice contains two 4-input look-up tables (LUTs), carry & control logic, and two registers. There are two 3-state buffers associated with each CLB, which can be accessed by all the outputs of a CLB. Xilinx is the only major FPGA vendor that provides dedicated resources for on-chip 3-state bussing. This feature can increase the performance and lower the CLB utilization for wide multiplex functions. The Xilinx internal bus can also be extended off chip.
25
Carry & Control Logic in Xilinx FPGAs
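[Figure: carry multiplexer driving COUT, with data inputs y and CIN and inputs x, y to the propagate logic]
$\text{Propagate} = x \oplus y$
$\text{Generate} = y$
$\text{Sum} = \text{Propagate} \oplus \text{CIN} = x \oplus y \oplus \text{CIN}$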
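The carry logic above can be checked with a small bit-level model. In this sketch Propagate is x XOR y, Generate is y, Sum is Propagate XOR CIN, and the carry multiplexer is assumed to pass CIN when Propagate is 1 and Generate otherwise; the function names are hypothetical.

```python
def carry_cell(x, y, cin):
    """One bit of the carry chain: returns (sum_bit, cout)."""
    propagate = x ^ y        # computed in the LUT
    generate = y             # forwarded when the carry is not propagated
    sum_bit = propagate ^ cin
    # Dedicated carry multiplexer (assumed behavior): when propagate is 0,
    # x equals y, so forwarding y gives the same result as x AND y.
    cout = cin if propagate else generate
    return sum_bit, cout


def ripple_add(a, b, width):
    """Add two unsigned integers by chaining carry cells; returns (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        sum_bit, carry = carry_cell((a >> i) & 1, (b >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry
```

For example, `ripple_add(5, 3, 4)` walks the carry through four cells; only the multiplexer sits on the carry path, which is why the hardwired chain is fast compared with routing the carry through general interconnect.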
26
Carry & Control Logic LUT Hardwired (fast) logic
28
Critical Path for an Adder Implemented Using Xilinx Spartan 3 FPGAs
29
Xilinx Routing
The general architecture of Xilinx FPGAs consists of a two-dimensional array of programmable blocks, called Configurable Logic Blocks (CLBs), with horizontal and vertical routing channels between the rows and columns of CLBs.
30
Island Style Architecture
31
Connection Boxes
Flexibility of connection, Fc = 2: can A connect to B?
Connection boxes: The C boxes connect the channel wires with the input and output pins of the CLBs. A C box has two major properties that can affect the routability of a design: its flexibility, Fc, which is the number of wires that each logic block pin can connect to; and its topology, which is the pattern of switches that make the connections (especially important if Fc is low). For example, in figure 1, for a C box with Fc = 2, topology 1 cannot connect pin A with pin B, whereas topology 2 can.
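The effect of C box topology at a fixed Fc can be sketched as a set-intersection check. The figure is not reproduced here, so the two track assignments below are invented examples with Fc = 2 on a four-track channel, and `can_connect` is a hypothetical helper that only considers a direct connection along one shared track.

```python
def can_connect(pin_tracks, pin_a, pin_b):
    """Two pins on the same channel can be joined directly if their
    connection boxes attach them to at least one common track."""
    return bool(pin_tracks[pin_a] & pin_tracks[pin_b])


# Both topologies have Fc = 2: every pin can reach exactly two tracks.
topology_1 = {"A": {0, 1}, "B": {2, 3}}  # no track in common
topology_2 = {"A": {0, 1}, "B": {1, 2}}  # track 1 is shared

reachable_1 = can_connect(topology_1, "A", "B")
reachable_2 = can_connect(topology_2, "A", "B")
```

The number of switches is identical in both cases; only the pattern differs, which is why topology matters most when Fc is low.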
32
Switch Boxes
The S boxes allow wires to switch between vertical and horizontal wires. The flexibility of an S box, Fs, defines for a wiring segment entering the S block the number of other wiring segments it can be connected to. The topology of the S blocks is very important, since it is possible to choose two different topologies with the same flexibility Fs that result in very different routabilities. For example, this figure shows that while topology 1 cannot connect wire A with B, topology 2 can.
33
Routings using C and S Boxes
34
Single-Length Wire
Single-length lines. They are intended for relatively short connections among CLBs, and each one spans only one CLB.
35
Double-length and Long Lines
Double-length lines. They are similar to the single-length lines, except that each one spans two CLBs, offering lower routing delays for moderately long connections.
Long lines. They are appropriate for connections that must reach several CLBs with low skew.
36
Routing Algorithms
Maze Router
A* Search Routing
The Pathfinder
37
Example Maze Router
The maze routing algorithm is based on a wavefront expansion technique that attempts to find the shortest path between two points while avoiding any used routing resources. This algorithm is an iterative process that rips up and re-routes some of the routes to eliminate congested routing channels. The principal drawback of maze routing is that it routes each net without taking into account that the path found can block the routing of subsequent nets. This means that the performance of the algorithm depends on net ordering, and different orderings will yield different results. For example, if the order in which the two nets are routed in figure 5 is reversed, a better solution is found.
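The wavefront expansion at the heart of maze routing can be sketched as a breadth-first search on a grid. This is a simplified model under stated assumptions: routing resources are grid cells (0 = free, 1 = used), nets are routed strictly in the order given, and the rip-up and re-route step is omitted. The function names are hypothetical.

```python
from collections import deque


def maze_route(grid, source, target):
    """Return a shortest path of free cells from source to target, or None."""
    rows, cols = len(grid), len(grid[0])
    previous = {source: None}
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            # Walk back along the recorded predecessors to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in previous:
                previous[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None


def route_nets(grid, nets):
    """Route nets in the given order, marking each routed path as used."""
    routed = {}
    for name, (source, target) in nets.items():
        path = maze_route(grid, source, target)
        routed[name] = path
        if path is not None:
            for r, c in path:
                grid[r][c] = 1  # these cells now block later nets
    return routed
```

Because `route_nets` commits each path before looking at the next net, reordering the `nets` dictionary can change which nets succeed, which is the net-ordering dependence described above.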
38
Xilinx Virtex-II Pro Development System
Has two physical PowerPC cores
39
Xilinx Virtex-II Pro Development System Logic and FPGA Interaction
Top: block diagram of the Virtex-II system
Left side: ways to configure the FPGA, power the FPGA, and manage clocking of the FPGA
Right side: components/peripherals and outputs that can be utilized by the FPGA
Bottom: I/O pad connections to the peripheral devices
40
Xilinx Virtex 5 Development System (Front)
Unlike the Virtex 2 system, which has physical PowerPC cores, the Virtex 5 system uses a MicroBlaze processor, a soft-core processor designed by Xilinx for Xilinx FPGAs. As a soft-core processor, MicroBlaze is implemented entirely in the general-purpose memory and logic fabric of Xilinx FPGAs.
41
Xilinx Virtex 5 Development System (Back)
42
Xilinx Virtex 5 Development System Diagram
Shows the input and output associations with the Virtex-5 FPGA
43
Virtex 5 Development System Components (FPGA)
In comparison to the Virtex 2:
Virtex 2
  Configurable Logic Blocks array (rows x columns): 80 x 46
  Slices: 13,969
  Max distributed RAM (Kb): 428
  Block RAM, max (Kb): 2,448
Virtex 5
  Configurable Logic Blocks array (rows x columns): 160 x 54
  Slices: 17,280
  Max distributed RAM (Kb): 1,120
  Block RAM blocks: 296 (18 Kb), 148 (36 Kb); max (Kb): 5,328
  DSP48E slices: 64
  CMTs: 6
  PowerPC processor blocks: 0
Notes:
1. Virtex-5 FPGA slices are organized differently from previous generations. Each Virtex-5 FPGA slice contains four LUTs and four flip-flops (previously it was two LUTs and two flip-flops).
2. Each DSP48E slice contains a 25 x 18 multiplier, an adder, and an accumulator.
3. Block RAMs are fundamentally 36 Kbits in size. Each block can also be used as two independent 18-Kbit blocks.
4. Each Clock Management Tile (CMT) contains two DCMs and one PLL.
5. This table lists separate Ethernet MACs per device.
6. RocketIO GTP transceivers are designed to run from 100 Mb/s to 3.75 Gb/s. RocketIO GTX transceivers are designed to run from 150 Mb/s to 6.5 Gb/s.
7. This number does not include RocketIO transceivers.
8. Includes configuration Bank 0.
DCM: Digital Clock Manager
A phase-locked loop or phase lock loop (PLL) is a control system that generates a signal that has a fixed relation to the phase of a "reference" signal. A phase-locked loop circuit responds to both the frequency and the phase of the input signals, automatically raising or lowering the frequency of a controlled oscillator until it is matched to the reference in both frequency and phase. A phase-locked loop is an example of a control system using negative feedback.
In simpler terms, a PLL compares the frequencies of two signals and produces an error signal which is proportional to the difference between the input frequencies. The error signal is then low-pass filtered and used to drive a voltage-controlled oscillator (VCO) which creates an output frequency. The output frequency is fed through a frequency divider back to the input of the system, producing a negative feedback loop. If the output frequency drifts, the error signal will increase, driving the frequency in the opposite direction so as to reduce the error. Thus the output is locked to the frequency at the other input. This input is called the reference and is often derived from a crystal oscillator, which is very stable in frequency.
44
Virtex 5 Development System (FPGA I/O Blocks)
I/O blocks provide the interface between package pins and the internal configurable logic.
Most popular and leading-edge I/O standards are supported by programmable I/O blocks (IOBs).
On the board, the I/O blocks connect the FPGA to interfaces such as USB 2.0, serial, and RS-232.
45
Virtex 5 Development System (FPGA Configurable Logic Blocks)
The basic logic elements for Xilinx® FPGAs, providing combinatorial and synchronous sequential logic as well as distributed memory and shift register capability.
Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and provide superior capabilities and performance.
46
Virtex 5 Development System (FPGA Block RAM)
Block RAM modules provide flexible 36 Kbit true dual-port RAM that is cascadable; this allows for the formation of larger memory blocks.
Virtex-5 FPGA block RAMs possess programmable FIFO logic for increased device utilization.
Each block RAM can also be configured as two independent 18 Kbit true dual-port RAM blocks, providing for designs needing smaller RAM blocks.
Inside each small logic block is a configurable lookup table. It is normally used for logic functions, but you can reconfigure it as a few bits of RAM. You can combine several (or many) of them into a larger RAM. This is distributed RAM.
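How several lookup tables combine into a larger distributed RAM can be sketched as follows. The model assumes each LUT stores 16 x 1 bits (the SLICEM figure quoted earlier for 4-input LUTs; Virtex-5 LUTs are larger), and the class names are hypothetical.

```python
class LutRam16x1:
    """One LUT used as a 16-entry, 1-bit-wide RAM."""

    def __init__(self):
        self.bits = [0] * 16

    def write(self, addr, bit):
        self.bits[addr & 0xF] = bit & 1

    def read(self, addr):
        return self.bits[addr & 0xF]


class DistributedRam:
    """A depth x width RAM assembled from 16x1 LUT RAMs.

    LUTs side by side provide the data width; the upper address bits
    select which bank of 16 words is accessed.
    """

    def __init__(self, depth, width):
        self.width = width
        banks = (depth + 15) // 16
        self.luts = [[LutRam16x1() for _ in range(width)] for _ in range(banks)]

    def write(self, addr, value):
        bank, offset = divmod(addr, 16)
        for bit in range(self.width):
            self.luts[bank][bit].write(offset, (value >> bit) & 1)

    def read(self, addr):
        bank, offset = divmod(addr, 16)
        return sum(self.luts[bank][bit].read(offset) << bit
                   for bit in range(self.width))

    def lut_count(self):
        return len(self.luts) * self.width


ram = DistributedRam(depth=64, width=8)
ram.write(37, 0xA5)
```

Under these assumptions a 64 x 8 memory consumes 32 LUTs, which shows why distributed RAM suits small memories while block RAM is preferred for large ones.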
47
Virtex 5 Development System (FPGA DSP48E Slices & CMT)
Cascadable embedded DSP48E slices with 25 x 18 two’s complement multipliers and a 48-bit adder/subtracter/accumulator provide massively parallel DSP algorithm support. In addition, each DSP48E slice can be used to perform bitwise logical functions.
Clock Management Tile (CMT) blocks provide the most flexible, highest-performance clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks (self-calibrating, fully digital) and one PLL block (self-calibrating, analog) for clock distribution delay compensation, clock multiplication/division, coarse-/fine-grained clock phase shifting, and input clock jitter filtering.
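The arithmetic of a 25 x 18 two's complement multiplier feeding a 48-bit accumulator can be modeled in a few lines. This is a behavioral sketch only: it ignores pipelining, cascading, and the slice's other operating modes, and the helper names `to_signed` and `mac` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b.

    a is truncated to 25 bits, b to 18 bits, and the result wraps
    around at 48 bits, as a fixed-width accumulator would.
    """
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, 48)


# Dot product of two short vectors, one MAC step per element.
acc = 0
for a, b in zip([3, -4, 5], [2, 6, -1]):
    acc = mac(acc, a, b)
```

Each element of the dot product takes one step here; on the FPGA many such slices can run side by side, which is the source of the parallelism mentioned above.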
48
Virtex 5 Development System Components (Continued)
16-character x 2-line LCD
256 MB SODIMM
Compact Flash card
The Xilinx System ACE Compact Flash (CF) configuration controller allows a Type I Compact Flash card to program the FPGA through the JTAG port. Both hardware and software data can be downloaded through the JTAG port. The System ACE controller supports up to eight configuration images on a single Compact Flash card. The configuration address switches allow the user to choose which of the eight configuration images to use.
A DIMM, or dual in-line memory module, comprises a series of dynamic random-access memory integrated circuits. SO-DIMMs (also written SODIMMs) are a smaller alternative to a DIMM, being roughly half the size of regular DIMMs. SO-DIMMs are often used in systems which have space restrictions.
The LCD module has a connector that allows the LCD to be removed from the board to give access to the components below it.
49
Virtex 5 Development System Components (Continued)
Eight general-purpose (active-High) DIP switches are connected to the user I/O pins of the FPGA.
15 LEDs are controllable by the FPGA: 8 green general-purpose LEDs arranged in a row, 5 green LEDs positioned next to the pushbuttons, and 2 red LEDs intended for error conditions but not limited to that purpose. Some LEDs are buffered through the CPLD to allow the LED signals to be used as higher-performance I/O by way of the XGI expansion connector.
Ethernet port: 10/100/1000 Mb/s.
Audio jacks for Microphone, Line In, Line Out, and Headphone. The Analog Devices AD1981 Audio Codec supports stereo 16-bit audio with up to 48-kHz sampling. The sampling rate for record and playback can be different. All jacks are stereo except for Microphone. The Headphone jack is driven by the audio codec's internal 50-mW amplifier. The SPDIF jack supplies digital audio output from the codec.
50
Virtex 5 Development System Components (Continued)
USB Controller with Host and Peripheral Ports
A Cypress CY7C67300 embedded USB host controller provides USB connectivity for the board. The USB controller supports host and peripheral modes of operation. The USB controller has two serial interface engines (SIE) that can be used independently. SIE1 is connected to the USB Host connector (P18). SIE2 is connected only to the USB Peripheral connector (P17). The USB controller has an internal microprocessor to assist in processing USB commands. The firmware for this processor can be stored in its own dedicated IIC EEPROM (U28) or can be downloaded from a host computer via a peripheral connector. The USB controller's serial port is connected to J30 through an RS-232 transceiver to assist with debug. Jumper J50 can be installed to prevent the USB controller from executing firmware stored in the IIC EEPROM.
The JTAG configuration port allows for programming the FPGA and provides debugging support. The JTAG port supports the Xilinx Parallel Cable III, Parallel Cable IV, or Platform USB cable products. Third-party configuration products might also be available. The JTAG chain can also be extended to an expansion board by setting jumper J21 accordingly.
51
Programming Environment (Terminology)
52
Programming Environment (ISE Simulator)
ISE Foundation (Project Navigator) is the starting point of the FPGA design process.
It runs in the background to maintain the operation and flow of the design by managing the chain of tools involved, including but not limited to the Embedded Development Kit (EDK), ChipScope Pro, and AccelDSP.
The EDK includes XPS, which can be run independently to begin a project; however, using Project Navigator provides a more organized design process for an embedded system.
53
Programming Environment (EDK)
XPS (Xilinx Platform Studio) and the XPS SDK (Software Development Kit) are the main components of the EDK.
The EDK provides the Base System Builder (BSB), which can be used when developing for an existing board, including its layout and pin connections.
If you have a supported embedded processor development board available from Xilinx, the BSB lets you pick from the peripherals available on that board, automatically matches the FPGA pinout to the board, and creates a completed platform and test application ready to download and run on the board.
54
Programming Environment (Base System Builder)
The Base System Builder allows for the selection of the following system attributes:
  Processor type (MicroBlaze or PowerPC, depending on your selected target FPGA device)
  Reference and processor-bus clock frequency (BSB automatically infers and configures a Digital Clock Manager (DCM) primitive when needed)
  Standard processor buses (all peripherals are automatically connected via appropriate buses)
  Debug interface
  Cache configuration
  Memory size and type (both on-chip block RAM and controllers for off-chip memory devices)
  Common peripherals (such as general-purpose I/O, Universal Asynchronous Receiver-Transmitter (UART), and timer)
It also provides:
  Automatic selection of the on-board FPGA
  Selection of clock rates supported by the on-board oscillators
  Automatic setting of reset polarity
  Automatic generation of the FPGA pinout to match the board connections, for the selected set of peripherals
55
Programming Environment (EDK) (Continued)
Upon completion of the BSB, a Microprocessor Hardware Specification (MHS) file is created and loaded into the XPS project.
XPS can then be used to develop the embedded subsystem that was established through the BSB, which acts as a wizard/template for the overall board capabilities.
The next course of action is to define the system's constraints and related settings.
When the project is started from the ISE Project Navigator, add the embedded system as a sub-module to a top-level Xilinx® ISE® project in Project Navigator; then declare, instantiate, and interconnect the embedded sub-module in your top-level FPGA design.
56
Programming Environment (Project Flow)
57
Applications