An Introduction to FPGA Design

Avnet SpeedWay Workshops
FPGA Architecture

Xilinx FPGA Architecture
- Logic fabric: gates and flip-flops
- Embedded blocks: memory, DSP/multipliers, clock management, high-speed serial I/O, soft/hard processors
- Programmable I/Os
- In-system programmable
Presenter notes: This presentation mainly targets the Spartan-3E (the demo board used in the lab), but it also discusses other families and architectures. Some slides are more generic than others, depending on the topic. Make sure the audience recognizes this, so that nobody concludes, for example, that PPC405s are part of the Spartan-3E architecture.
Ver 1.1a

Logic Fabric
(Diagram labels: LUT inputs I0 to I3 and output O; flip-flop pins D, Q, SET, RST, CE)
- Logic cell: lookup table (LUT), flip-flop, carry logic, muxes (not shown)
- Slice: two logic cells
- Spartan-3E FPGAs: 2K to 33K logic cells
Presenter notes: Explain the basic LUT/slice architecture. Since this slide is generic, you may decide to mention the CLB, but don't muddy the waters. The main idea is to explain the composition of the logic fabric and the typical FPGA sizes in terms of logic cells. The benefits and operation of the F5MUX and FiMUX are covered in XAPP466. In short, the additional muxes can implement any function of 5 inputs (F5), 6 inputs (F6), 7 inputs (F7), or 8 inputs (F8) without leaving the CLB. Because the routes stay inside the CLB, this does not introduce another level of logic delay. As for mux functionality, the F5, F6, F7, and F8 muxes can be used to create 4:1, 8:1, 16:1, and 32:1 muxes.
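As a rough illustration of the lookup-table idea, the sketch below models a 4-input LUT as a 16-bit truth table and uses a 2:1 mux, standing in for the F5MUX, to combine two LUTs into a 5-input function. The names `make_lut4`, `lut4`, `f5mux`, and `and5` are hypothetical helpers for this sketch, not Xilinx primitives or tool APIs.

```python
from itertools import product

def make_lut4(func):
    """Build the 16-bit truth table for a 4-input Boolean function."""
    table = 0
    for i3, i2, i1, i0 in product((0, 1), repeat=4):
        address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
        if func(i3, i2, i1, i0):
            table |= 1 << address
    return table

def lut4(table, i3, i2, i1, i0):
    """Evaluate a LUT: the inputs simply address one bit of the table."""
    address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return (table >> address) & 1

def f5mux(f_out, g_out, select):
    """2:1 mux standing in for the F5MUX: pick one of two LUT outputs."""
    return g_out if select else f_out

AND4 = make_lut4(lambda a, b, c, d: a & b & c & d)
ZERO = make_lut4(lambda a, b, c, d: 0)

def and5(i4, i3, i2, i1, i0):
    """5-input AND built from two LUT4s and one mux (illustrative only)."""
    return f5mux(lut4(ZERO, i3, i2, i1, i0), lut4(AND4, i3, i2, i1, i0), i4)
```

The point of the sketch is that a LUT does no computation at run time: any 4-input function costs the same, because only the table contents change.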

Memory
- Block RAM
- RAM or ROM
- True dual port: separate read and write ports; independent port size; data width translation
- Excellent for FIFOs
(Diagram labels: port A signals DIA, DIPA, ADDRA, CLKA, DOA, DOPA; port B signals DIB, DIPB, ADDRB, CLKB, DOB, DOPB)
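Data width translation can be pictured with a small behavioral model: both ports address the same bits, but each port slices them at its own width. The sketch below is a simplification under stated assumptions. The class `DualPortRam` is hypothetical, parity bits are ignored, and the ordering (lower narrow address maps to the least significant bits of the wide word) is assumed rather than taken from the slides.

```python
class DualPortRam:
    """Bit store shared by ports of different widths (behavioral sketch)."""

    def __init__(self, total_bits):
        self.total_bits = total_bits
        self.contents = 0  # all memory bits packed into one integer

    def _shift(self, width, address):
        shift = address * width
        if address < 0 or shift + width > self.total_bits:
            raise IndexError("address out of range for this port width")
        return shift

    def write(self, width, address, value):
        """Write one word of the given port width."""
        shift = self._shift(width, address)
        mask = (1 << width) - 1
        self.contents = (self.contents & ~(mask << shift)) | ((value & mask) << shift)

    def read(self, width, address):
        """Read one word of the given port width."""
        shift = self._shift(width, address)
        return (self.contents >> shift) & ((1 << width) - 1)

# Port A writes bytes, port B reads 32-bit words from the same storage.
ram = DualPortRam(total_bits=1024)  # size chosen only for the example
for address, byte in enumerate((0x11, 0x22, 0x33, 0x44)):
    ram.write(8, address, byte)
word = ram.read(32, 0)
```

A FIFO that accepts bytes and delivers 32-bit words is one use of this arrangement; the model omits clocks and read-during-write behavior.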

Multipliers
- 18 x 18 multipliers
- Signed or unsigned
- Optional pipeline stage
- Cascadable
(Diagram labels: two 18-bit inputs, one 36-bit product)
Presenter notes: Pipelining in Spartan-3E means using the registers at the input and output of the multiplier. Unlike in Spartan-3, these registers are part of the multiplier block and are not taken from the fabric.
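The 36-bit product width follows directly from the operand widths: the largest magnitude a signed 18 x 18 multiply can produce is 2^17 x 2^17 = 2^34, which fits in 36 bits of two's complement. A minimal sketch, with the helper names `to_signed` and `multiply_18x18` invented for illustration:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def multiply_18x18(a, b):
    """Signed 18 x 18 multiply; the result always fits in 36 signed bits."""
    product = to_signed(a, 18) * to_signed(b, 18)
    assert -(1 << 35) <= product < (1 << 35)
    return product

most_negative = 0x20000          # bit pattern of -2**17 in 18 bits
largest_product = multiply_18x18(most_negative, most_negative)
```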

Clock Management
- Digital Clock Managers (DCMs)
- Clock de-skew
- Phase shifting
- Clock multiplication
- Clock division
- Frequency synthesis
(Diagram labels: input CLKIN; outputs CLK0, CLK90, CLKFX)
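Frequency synthesis on the CLKFX output can be sketched as choosing an integer multiplier M and divider D so that CLKIN x M / D lands on the desired frequency. The function name `clkfx_settings` is hypothetical, and the ranges used here (M from 2 to 32, D from 1 to 32) are assumptions for the example; the data sheet for the target device is the authority on legal values and on input/output frequency limits, which this sketch does not check.

```python
from fractions import Fraction

def clkfx_settings(clkin_hz, target_hz, m_range=range(2, 33), d_range=range(1, 33)):
    """Return (M, D, actual_hz) with CLKIN * M / D closest to the target.

    The default ranges are assumptions for this sketch, not verified limits.
    """
    best = None
    for m in m_range:
        for d in d_range:
            actual = Fraction(clkin_hz) * m / d
            error = abs(actual - target_hz)
            if best is None or error < best[0]:
                best = (error, m, d, actual)
    _, m, d, actual = best
    return m, d, float(actual)

# Example: derive 75 MHz from a 50 MHz input clock.
settings = clkfx_settings(50_000_000, 75_000_000)
```

Exact fractions avoid rounding surprises when several M/D pairs give nearly the same frequency; the first exact match found (smallest M) is kept.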

Programmable I/Os
- Single-ended
- Differential / LVDS
- Programmable I/O standards
- Multiple I/O banks
- DDR I/O registers
- On-chip termination
(Diagram labels: Reg, DDR mux, 3-State, Input, Output, PAD, I/O Banks)
Presenter notes: The list of standards on the left is taken from the Spartan-3 data sheet and is meant only as an example. Point this out during the presentation so that the audience doesn't take it for the complete list of electrical standards supported by Xilinx FPGAs. DDR I/O registers allow data to be transmitted and/or received on both edges of the clock. This type of I/O is used in DDR memory interfaces and other high-speed I/O schemes. Example: with DDR data transfers, a 311 MHz clock can achieve a 622 Mb/s interface.
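The DDR arithmetic in the notes is simply two bits per pin per clock cycle. The short sketch below restates it and models an output DDR mux as interleaving a rising-edge stream with a falling-edge stream; both function names are made up for this example.

```python
def ddr_bit_rate(clock_hz):
    """Bits per second per pin when data moves on both clock edges."""
    return 2 * clock_hz

def ddr_output(rising_bits, falling_bits):
    """Interleave two data streams the way an output DDR mux would."""
    stream = []
    for rise, fall in zip(rising_bits, falling_bits):
        stream.extend((rise, fall))
    return stream

rate = ddr_bit_rate(311_000_000)   # the 311 MHz example from the notes
```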

Xilinx Spartan-3E Family
Device                  3S100E   3S250E   3S500E   3S1200E  3S1600E
Gates                   100K     250K     500K     1.2M     1.6M
Logic cells             2,160    5,508    10,476   19,512   33,192
Distributed RAM bits    15K      38K      73K      136K     231K
Block RAM bits          72K      216K     360K     504K     648K
18x18 multipliers       4        12       20       28       36
DCMs                    2        4        4        8        8
Maximum I/O             108      172      232      304      376
Presenter notes: This Spartan-3E family chart is provided because the presentation and labs target the Spartan-3E. There is no Virtex-4 slide because the presentation needs to fit in 50 minutes and this is meant to be a fundamentals-of-design class rather than an FPGA family presentation.

Why Do DSP in an FPGA?

High-Speed DSP Challenges
High-performance digital communication and video imaging designs challenge existing DSP solutions:
- Need higher performance
- Need lower costs
- Need lower power
Compromises are often made:
- Performance is sacrificed
- Time is spent designing substitute implementations

FPGAs Enable Massively Parallel DSP
Example: 256-tap filter implementation
- Programmable DSP (sequential): a single MAC unit steps through coefficients C0 to C255, so 256 clock cycles are needed per output sample. 1 GHz / 256 clock cycles = about 4 MSPS
- FPGA (fully parallel implementation): one multiplier per coefficient, so all 256 operations complete in 1 clock cycle. 500 MHz / 1 clock cycle = 500 MSPS
(Diagram labels: Data In, Reg, Coefficients, MAC Unit, Data Out)
"... the unprecedented signal processing requirements of next-generation wireless devices threaten to outpace the capabilities of DSP processors, creating opportunities for massively parallel and highly customized devices." BDTI, 2004
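The throughput comparison on this slide reduces to clock rate divided by clock cycles per output sample. The sketch below works through both cases and shows that the sequential and parallel forms compute the same sum; only the cycle count differs. All function names are illustrative.

```python
def sample_rate(clock_hz, cycles_per_sample):
    """Output samples per second for a given clock and cycles per sample."""
    return clock_hz / cycles_per_sample

def fir_sequential(coeffs, window):
    """One MAC unit: one multiply-accumulate per clock cycle."""
    acc, cycles = 0, 0
    for c, x in zip(coeffs, window):
        acc += c * x
        cycles += 1
    return acc, cycles

def fir_parallel(coeffs, window):
    """One multiplier per tap: all products formed in the same cycle."""
    return sum(c * x for c, x in zip(coeffs, window)), 1

coeffs = list(range(1, 257))        # 256 made-up coefficients
window = [1] * 256                  # 256 most recent input samples
dsp_rate = sample_rate(1e9, fir_sequential(coeffs, window)[1])
fpga_rate = sample_rate(500e6, fir_parallel(coeffs, window)[1])
```

With these numbers the sequential case gives 3,906,250 samples per second, which the slide rounds to 4 MSPS.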

Usual Parallel Adder Tree Implementation
(Diagram labels: Data In, register chain, coefficient multipliers C0 to C31, tree of adders, Data Out)
- Consumes logic to implement adders
- Variable latency
- A 32-tap filter implementation will consume 1,461 logic cells to implement adders in fabric
- Fabric and routing may reduce performance
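The "variable latency" point can be made concrete by counting: a binary adder tree over N products needs N - 1 adders and about log2(N) levels, so both grow with the filter size. The following is a minimal sketch with a hypothetical helper `adder_tree`; it does not attempt to reproduce the 1,461 logic cell figure, which depends on operand bit widths.

```python
def adder_tree(products):
    """Sum values pairwise; return (total, adder_count, tree_depth).

    Expects at least one value.
    """
    level = list(products)
    adders = depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] + level[i + 1])
            adders += 1
        if len(level) % 2:
            next_level.append(level[-1])  # odd value passes through
        level = next_level
        depth += 1
    return level[0], adders, depth

total, adders, depth = adder_tree(range(32))   # a 32-tap example
```

For 32 taps this yields 31 adders in 5 levels; for 256 taps, 255 adders in 8 levels.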

Virtex-4 Parallel Implementation
Parallel adder cascade implementation
(Diagram labels: Data In, register pairs between taps, coefficient multipliers C0 to C31, a chain of registered adders, Data Out)
- Filters implemented entirely within the XtremeDSP slice
- Guaranteed 500 MHz performance regardless of filter size
- 32-tap filter implementation using 32 XtremeDSP slices
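One way to see why the cascade keeps its speed as the filter grows is a clock-by-clock model: every adder feeds a register, so the longest combinational path is always one multiply and one add, and only the latency grows with the tap count. The register arrangement below (two data registers per tap, one register per adder) is an assumption read off the slide's diagram, not a verified model of the XtremeDSP slice, and the function names are illustrative.

```python
def direct_fir(coeffs, samples):
    """Reference FIR: y[n] = sum of coeffs[k] * x[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]

def cascade_fir(coeffs, samples):
    """Clock-by-clock model of a registered adder cascade."""
    taps = len(coeffs)
    a = [0] * taps  # first data register of each tap
    b = [0] * taps  # second data register of each tap
    p = [0] * taps  # registered partial sum of each tap
    outputs = []
    for x in samples:
        # Compute every next-state value from the current state, then
        # update all registers together, as a clock edge would.
        next_a = [x] + b[:-1]
        next_b = a[:]
        next_p = [(p[k - 1] if k else 0) + coeffs[k] * b[k] for k in range(taps)]
        a, b, p = next_a, next_b, next_p
        outputs.append(p[-1])
    return outputs

coeffs = [1, 2, 3, 4]
samples = [5, -1, 2, 7, 0, 3]
latency = len(coeffs) + 1                      # clocks before y[0] appears
delayed = cascade_fir(coeffs, samples + [0] * latency)
```

In this model the output is the ordinary FIR result delayed by `taps + 1` clocks, so `delayed[latency:]` matches `direct_fir(coeffs, samples)`.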

Xilinx 4th Generation XtremeDSP
Virtex-4 XtremeDSP: highest DSP bandwidth available
(Bar chart: DSP bandwidth in GMACs/s by generation)
- 1st generation (Virtex-E): 11 GMACs/s
- 2nd generation (Virtex-II): 32 GMACs/s
- 3rd generation (Virtex-II Pro): 111 GMACs/s
- 4th generation (Virtex-4): 256 GMACs/s

Development Flow

Xilinx Design Process
(Flow diagram labels)
- Main flow: Design Entry, Synthesis, Implementation (Translate, Map, Place & Route), Silicon
- Supporting steps: Constraints, Behavioral Simulation, Timing Simulation, Timing Analysis, Floor-Planning

Xilinx ISE Software
ISE Foundation ($2,495 USD)
- Windows & Linux support
- ISE Simulator Lite
- All Virtex series FPGAs
- All Spartan-II/3 series FPGAs
- All CPLDs
ISE WebPACK (FREE, web download or CD)
- Windows & Linux support
- ISE Simulator Lite
- Limited Virtex series FPGAs
- All Spartan-II/3 series FPGAs
- All CPLDs
Optional software accessories (evaluation versions available)
- ChipScope Pro: FPGA real-time debug, $695
- Full ISE Simulator: HDL simulator for ISE Foundation, $995
- MXE-III: ModelTech HDL simulator, $945
- PlanAhead: hierarchical floorplanner, $4,995

Project Navigator
(Screenshot labels: viewing area, sources in project, processes for source, message console)

ISE Tools and Processes
- Design entry
- Synthesis
- Implementation
- Configuration
Note: Simulation processes appear only when a simulation testbench is the selected source.

HDL Basics
- Coding style affects how logic is inferred: asynchronous vs. synchronous reset; flip-flop initial value
- See the Language Templates for coding style examples
- Do not gate clocks! Gating introduces skew and has negative effects on performance
- Instead, use the CE function on the flip-flop, or use a BUFGMUX
- Optimizing HDL for design performance is covered in a separate training class
(Diagram labels: flip-flop pins D, CE, Q, R, S)

Don’t Re-Invent the Wheel!!
(Diagram: 64-tap FIR filter)

Core Generator & Architecture Wizard
- Generate customized IP
- Extensive library of macros (parameterized blocks)
- Core Generator output files: HDL black box declaration, HDL instantiation template, black box netlist
- Access Core Generator via Project > New Source > IP . . .
- Example: DSP core, Finite Impulse Response (FIR) filter
Presenter notes: The Fundamentals of FPGA Design course includes more information (including a lab) on the Architecture Wizard.

Timing Constraints

Timing Constraints
- Timing constraints give the tools a performance goal
- Place & Route (PAR) uses timing constraints: PAR runs the timing analysis tool in the background for real-time analysis of current results against the performance goals
- Without constraints, PAR tries to reduce run-time: it finishes quickly and makes only a modest effort to optimize performance
- With constraints, PAR tries to meet the performance goals: run-time may be longer, and aggressive timing constraints and higher effort levels can significantly increase run-time

Basic Timing Constraints
- PERIOD: target clock period for internal sequential paths
- OFFSET IN BEFORE: target "input setup" time (Tsu), referenced between the external INPUT pin and the CLK pin
- OFFSET OUT AFTER: target "clock to out" time (Tco), referenced between the external CLK pin and the OUTPUT pin

PERIOD Constraint
    NET "MYCLK" TNM_NET = "MYCLK";
    TIMESPEC "TS_MYCLK" = PERIOD "MYCLK" 10 ns HIGH 50 %;
- Covers data paths between synchronous elements only
- Does not cover paths that cross clock domains between unrelated clocks
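What the tools check against a PERIOD constraint can be sketched as a slack calculation on each register-to-register path: the period minus the sum of clock-to-Q, logic, routing, and setup delays. The delay values below are invented for illustration, and the sketch ignores clock skew and jitter, which a real timing analyzer accounts for.

```python
def period_to_mhz(period_ns):
    """Clock frequency implied by a PERIOD constraint."""
    return 1000.0 / period_ns

def period_slack_ns(period_ns, clock_to_q_ns, logic_ns, routing_ns, setup_ns):
    """Slack of one register-to-register path against a PERIOD target."""
    path_delay = clock_to_q_ns + logic_ns + routing_ns + setup_ns
    return period_ns - path_delay

# A 10 ns PERIOD, as in the constraint above, with made-up path delays.
frequency = period_to_mhz(10)
slack = period_slack_ns(10, clock_to_q_ns=0.5, logic_ns=4.0,
                        routing_ns=3.0, setup_ns=0.5)
```

Positive slack means the path meets the constraint; negative slack is a timing violation on that path.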

OFFSET IN BEFORE Constraint
    OFFSET = IN 5 ns BEFORE "MYCLK";
"The signal will be valid at the pad X nanoseconds before the clock appears at the clock pad..."
- Covers the first path to a synchronous element
- Recommendation for high performance: use the IOB registers

OFFSET OUT AFTER Constraint
    OFFSET = OUT 7 ns AFTER "MYCLK";
"The signal will be valid at the pad X nanoseconds after the clock appears at the clock pad..."
- Covers the last path from a synchronous element
- Recommendation for high performance: use the IOB registers
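The values for the two OFFSET constraints usually come from a board-level budget rather than from the FPGA itself. A minimal sketch, assuming a single clock shared by the FPGA and its neighbors with no skew between them; the function names and all device numbers are hypothetical.

```python
def offset_in_before_ns(period_ns, upstream_clock_to_out_ns, board_delay_ns):
    """Time the input is valid at the FPGA pad before the next clock edge."""
    return period_ns - upstream_clock_to_out_ns - board_delay_ns

def offset_out_after_ns(period_ns, downstream_setup_ns, board_delay_ns):
    """Latest time after the clock edge that the FPGA output may settle."""
    return period_ns - downstream_setup_ns - board_delay_ns

# 20 ns clock; the upstream device drives data 10 ns after the clock,
# the downstream device needs 13 ns of setup, and traces add 2 ns.
offset_in = offset_in_before_ns(20, 10, 2)
offset_out = offset_out_after_ns(20, 13, 2)
```

With these invented numbers the budget works out to 8 ns IN BEFORE and 5 ns OUT AFTER.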

UCF – User Constraints File
    #
    # Global clock constraint
    # constrain net on external pin
    NET "CLK" TNM_NET = "CLK";
    TIMESPEC "TS_CLK" = PERIOD "CLK" 20 ns HIGH 50%;

    # Input timing
    OFFSET = IN 8 ns BEFORE "CLK";

    # Output timing
    OFFSET = OUT 5 ns AFTER "CLK";

    # Pad-to-pad combinatorial timing
    TIMESPEC "TS_P2P" = FROM "PADS" TO "PADS" 15 ns;

    # Input timing exception from global input constraint
    NET "STRTSTOP" OFFSET = IN 3 ns BEFORE "CLK";

Timing Constraints Editor
(Screenshot labels: PERIOD, OFFSET IN, OFFSET OUT)
Presenter notes: All of the clocks in the design appear in the "Clock Net Name" list. If unexpected signals appear there, the synthesis tool has inferred clocks that the designer did not intend. The designer should go back to the HDL, analyze the coding style, and make any necessary changes.

Timing Analysis
(Flow diagram labels, in order)
- Synthesis: estimated timing report
- User timing constraints
- Translate
- Map: "Post Map Static Timing" (not used for most designs)
- PAR: Timing Analyzer, "Post Route Timing"

Thank you!
