Presentation on theme: "Basic FPGA Architecture (Virtex-6)"— Presentation transcript:
1 Basic FPGA Architecture (Virtex-6): Slice and I/O Resources
2 Objectives
After completing this module, you will be able to:
- Describe the CLB and slice resources available in Virtex-6 FPGAs
- Describe flip-flop functionality
- Anticipate building proper HDL code for Virtex-6 FPGAs
3 Virtex-6 CLB
- CLB contains two slices
- Connected to a switch matrix for routing to other FPGA resources
- Carry chain runs vertically in a column from one slice to the one above
  - The Virtex-6 FPGA has a separate carry chain for each slice
[Figure: CLB with two slices, CIN and COUT carry connections, and the switch matrix]
4 Routing
- The Virtex-6 FPGAs use a diagonally symmetric interconnect pattern
  - A rich set of programmable interconnections exists between one switch matrix and the switch matrices nearby
  - Many CLBs can be reached with only a few “hops”
  - A hop is a connection through an active connection point
- The mapping of logical connections to these physical routing resources is entirely managed by the router (PAR)
  - The place and route solution is directed by your use of timing constraints (very important)
- With the exception of the carry chain, all slice connections are done through the switch matrix
[Figure: CLB array shaded by routing distance from one CLB: Direct, 1 Hop, 2 Hops, 3 Hops]
This diagram graphically describes the “pipulation” from one CLB to another. In this case, there is one direct hop to a particular neighboring CLB. There are also several more routing solutions to a neighboring CLB that only require one hop (these will have a slightly longer routing delay). Likewise, there are more ways to route that require two and three hops.
The goal of routing is to ensure that there are sufficient routing opportunities for a design to be routed to completion and meet timing. However, this will depend on your timing objective (the timing constraints used). One of the best things is that the implementation tools will manage the routing of your design for you.
5 6-Input LUT with Dual Output
- 6-input LUT with 1 output or…
- …it can be two 5-input LUTs (using common inputs) with 2 outputs
  - Minimal speed impact for either configuration
- One or two outputs
- Any function of six variables or two independent functions of five variables
LUTs can perform any combinatorial function, limited only by the number of inputs. LUTs are the primary combinatorial logic resource and are the industry standard. The look-up table is essentially a small memory containing the desired output value for each combination of input values. The truth table for the desired function is stored in this memory, and the inputs to the function act as the address to be read from it.
The values for the storage elements are generated by the ISE® software tools and downloaded to all LUTs during configuration. Each 6-input LUT can be configured as two 5-input LUTs. This gives the device a great deal of flexibility to build an efficient design. Thus, each LUT can be used to build any function of six variables or two independent functions of five variables that share the same inputs.
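The "small memory addressed by the inputs" idea can be sketched in a few lines of Python. This is a behavioral illustration only, not tool output: the class name `Lut6`, the helper `init_from_function`, and the choice to hold the two 5-input functions in the lower and upper halves of the 64-bit contents (with the sixth input held high) are assumptions of this sketch.

```python
MASK64 = (1 << 64) - 1


def init_from_function(func, n_inputs):
    """Build LUT contents (a truth table packed into an int) for a Boolean function."""
    init = 0
    for addr in range(1 << n_inputs):
        # Bit i of the address is the value of input i.
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        if func(*bits):
            init |= 1 << addr
    return init


class Lut6:
    """Hypothetical model of a 6-input LUT with two outputs (O6 and O5)."""

    def __init__(self, init):
        self.init = init & MASK64  # 64 stored bits

    def o6(self, addr):
        # All six inputs address one of the 64 stored bits.
        return (self.init >> (addr & 0x3F)) & 1

    def o5(self, addr):
        # The five shared inputs address the lower 32 stored bits.
        return (self.init >> (addr & 0x1F)) & 1


# One function of six variables: a 6-input AND.
and6 = Lut6(init_from_function(lambda a, b, c, d, e, f: a & b & c & d & e & f, 6))

# Two functions of five shared variables: parity on O6, AND on O5.
xor5 = init_from_function(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e, 5)
and5 = init_from_function(lambda a, b, c, d, e: a & b & c & d & e, 5)
dual = Lut6((xor5 << 32) | and5)
```

In the dual configuration, reading `dual.o6(0b100000 | x)` and `dual.o5(x)` for the same five-bit `x` returns the parity and the AND of those five inputs respectively.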
6 FPGA Slice Resources
- Four six-input look-up tables (LUTs)
- Four additional flip-flops
  - These are the new flip-flops
- Four flip-flop/latches
- Carry chain
  - This is supported on four of the eight flip-flops
- Wide multiplexers
- The implementation tools (MAP) will choose the packing of the design
[Figure: simplified slice diagram with four LUT/RAM/SRL blocks]
Here is a simplified view of the full slice. The SRL cascade paths are not shown.
7 Wide Multiplexers
- Each F7MUX combines the outputs of two LUTs together
  - Can implement an arbitrary 7-input function
  - Can implement an 8-1 multiplexer
- The F8MUX combines the outputs of the two F7MUXes
  - Can implement an arbitrary 8-input function
  - Can implement a 16-1 multiplexer
- MUX output can drive or bypass the flip-flop/latch
- MUX is controlled by the BX/CX/DX slice input
[Figure: slice diagram with four LUT/RAM/SRL blocks and the wide multiplexers]
The synthesis and implementation tools will automatically map logic to the F7MUX and F8MUX when the designer uses a CASE statement to infer the appropriate behavior.
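The way the wide multiplexers build 8-1 and 16-1 multiplexers out of LUTs can be sketched as follows. This is a behavioral illustration under the assumption that each 6-input LUT implements a 4-1 multiplexer (four data inputs plus two select inputs); the function names are hypothetical and do not correspond to primitive names.

```python
def lut_mux4(data, sel):
    """4-1 multiplexer as one 6-input LUT could implement it (4 data + 2 select)."""
    return data[sel & 0b11]


def f7mux(from_lut_0, from_lut_1, select):
    """F7MUX: chooses between the outputs of two LUTs."""
    return from_lut_1 if select else from_lut_0


def f8mux(from_f7_0, from_f7_1, select):
    """F8MUX: chooses between the outputs of the two F7MUXes."""
    return from_f7_1 if select else from_f7_0


def mux8(data, sel):
    """8-1 multiplexer: two LUTs plus one F7MUX."""
    low = lut_mux4(data[0:4], sel)
    high = lut_mux4(data[4:8], sel)
    return f7mux(low, high, (sel >> 2) & 1)


def mux16(data, sel):
    """16-1 multiplexer: four LUTs, two F7MUXes, and the F8MUX (one full slice)."""
    low = mux8(data[0:8], sel)
    high = mux8(data[8:16], sel)
    return f8mux(low, high, (sel >> 3) & 1)
```

A 16-1 multiplexer therefore fits in the four LUTs of a single slice, with the two upper select bits driving the dedicated multiplexers rather than consuming extra LUTs.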
8 Carry Logic
- Carry logic can implement fast arithmetic addition and subtraction
  - Carry out is propagated vertically through the four LUTs in a slice
  - The carry logic propagates from one slice to the slice in the CLB above
  - Requires bit ordering
- Carry look-ahead
  - Combinatorial carry look-ahead over the four LUTs in a slice
  - Implements faster carry cascading from slice to slice
[Figure: slice diagram with four LUT/RAM/SRL blocks and the carry chain]
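A minimal sketch of how an adder maps onto the carry chain, four bits per slice, is shown below. It assumes the common arrangement in which each LUT computes the propagate signal (a XOR b), a carry multiplexer selects between the incoming carry and the operand bit, and an XOR gate forms the sum. The sketch models the logical result as a simple ripple and does not model the look-ahead timing; the function names are hypothetical.

```python
def carry4(a_bits, b_bits, carry_in):
    """One slice worth of carry logic: four bit positions, least significant first."""
    sums = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                    # computed in the LUT
        sums.append(propagate ^ carry)       # dedicated XOR forms the sum bit
        carry = carry if propagate else a    # carry mux: pass carry or generate
    return sums, carry


def add(a, b, width=16):
    """Add two unsigned values by stacking carry4 blocks, lowest bits first."""
    assert width % 4 == 0
    result = 0
    carry = 0
    for base in range(0, width, 4):
        a_bits = [(a >> (base + i)) & 1 for i in range(4)]
        b_bits = [(b >> (base + i)) & 1 for i in range(4)]
        sums, carry = carry4(a_bits, b_bits, carry)
        for i, s in enumerate(sums):
            result |= s << (base + i)
    return result, carry
```

The loop over `base` mirrors the bit-ordering requirement on the slide: the least significant bits sit in the lowest slice and the carry moves upward from slice to slice.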
9 Flip-Flops and Latches
- Each slice has four flip-flop/latches (FF/L)
  - Can be configured as either flip-flops or latches
  - The D input can come from the O6 LUT output, the carry chain, the wide multiplexer, or the AX/BX/CX/DX slice input
- Each slice also has four flip-flops (FF)
  - D input can come from the O5 output or the AX/BX/CX/DX input
  - These don’t have access to the carry chain or the wide multiplexers
- If any of the FF/L are configured as latches, the four FFs are not available
[Figure: slice diagram with four LUT/RAM/SRL blocks, each feeding one FF and one FF/L]
The four original storage elements are referred to as “flip-flop/latch” elements. These correspond to the storage elements that existed in previous FPGA families. They are named AFF/LATCH, BFF/LATCH, CFF/LATCH, and DFF/LATCH.
The four new storage elements are referred to simply as “flip-flop” elements. They are named AFF, BFF, CFF, and DFF.
10 CLB Control Signals
- All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals
  - This is referred to as the “control set” of the flip-flops
  - CE and SR are active high
  - CLK can be inverted at the slice boundary
- Set/Reset (SR) signal can be configured as synchronous or asynchronous
  - All four flip-flop/latches are configured the same
  - All four flip-flops are configured the same
- SR will cause the flip-flop to be set to the state specified by the SRVAL attribute
- FFs in the Virtex-6 FPGA have an additional INITVAL
[Figure: AFF through DFF and AFF/LATCH through DFF/LATCH, each with D, CE, SR, and CK inputs and a Q output, all driven by the shared CE, CK, and SR signals]
The SRVAL of a flip-flop is set by the software depending on the reset state of the flip-flop. The SRVAL will be set to SRLOW if the flip-flop is set to 0 during the reset condition, or SRHIGH if the flip-flop is set to 1.
Virtex-6 FPGA flip-flops also have a separate INITVAL, which determines the state of the flip-flop after configuration, or when the Global Set Reset (GSR) is asserted. Some synthesis tools extract the INITVAL from the initial state of the underlying reg/signal from the RTL code.
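The difference between SRVAL (the value loaded by the SR input) and INITVAL (the value after configuration or GSR) can be illustrated with a small behavioral model. This is a sketch only: the class and method names are hypothetical, and giving SR priority over CE on a clock edge is an assumption of the model.

```python
class SliceFlipFlop:
    """Hypothetical behavioral model of one slice flip-flop."""

    def __init__(self, srval=0, initval=0, sync_sr=True):
        self.srval = srval      # state forced by SR (SRLOW = 0, SRHIGH = 1)
        self.initval = initval  # state after configuration or GSR
        self.sync_sr = sync_sr  # True: SR sampled on the clock edge
        self.q = initval        # device has just been configured

    def global_set_reset(self):
        """GSR returns the flip-flop to INITVAL, not SRVAL."""
        self.q = self.initval

    def async_sr(self, sr):
        """An asynchronous SR acts immediately, without a clock edge."""
        if not self.sync_sr and sr:
            self.q = self.srval

    def clock_edge(self, d, ce=1, sr=0):
        """Active-high SR and CE; SR is assumed to override CE."""
        if sr:
            self.q = self.srval
        elif ce:
            self.q = d
        return self.q


ff = SliceFlipFlop(srval=1, initval=0)
after_config = ff.q                        # INITVAL
held = ff.clock_edge(d=1, ce=0)            # CE low: state is held
loaded = ff.clock_edge(d=1, ce=1)          # CE high: D is captured
forced = ff.clock_edge(d=0, ce=1, sr=1)    # SR high: SRVAL is loaded
ff.global_set_reset()
after_gsr = ff.q                           # back to INITVAL
```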
11 SLICEM as Distributed RAM
- Uses the same storage that is used for the look-up table function
- Synchronous write, asynchronous read
  - Can be converted to synchronous read using the flip-flops available in the slice
- Various configurations
  - Single port
    - One LUT6 = 64x1 or 32x2 RAM
    - Cascadable up to 256x1 RAM
  - Dual port (D)
    - 1 read / write port + 1 read-only port
  - Simple dual port (SDP)
    - 1 write-only port + 1 read-only port
  - Quad-port (Q)
    - 1 read / write port + 3 read-only ports
- Each port has independent address inputs
Available configurations:
  Single Port:      32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
  Dual Port:        32x2D, 32x4D, 64x1D, 64x2D, 128x1D
  Simple Dual Port: 32x6SDP, 64x3SDP
  Quad Port:        32x2Q, 64x1Q
By allowing these storage elements to be modified using FPGA fabric resources, the LUT can be used for the implementation of a small distributed memory. Each LUT can be a single-ported 64-bit RAM with synchronous write and asynchronous read.
LUTs in slices can be combined to create small dual-port and multi-port RAMs.
In the Virtex-6 FPGAs, approximately one quarter of slices are SLICEMs in which the LUTs can be programmed as distributed RAMs (this varies with family).
Simple dual-port configurations can be used to implement LUT FIFOs and MicroBlaze™ processor register files.
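The "synchronous write, asynchronous read" behavior of a 64x1 distributed RAM can be sketched as below. The class name is hypothetical, and the optional registered read (capturing the value visible just before the clock edge) is an assumption about how a slice flip-flop would be used to make the read synchronous.

```python
class DistributedRam64x1:
    """Hypothetical model of one LUT used as a 64x1 single-port RAM."""

    def __init__(self):
        self.mem = [0] * 64   # the same 64 bits the LUT uses as its truth table
        self.read_reg = 0     # optional slice flip-flop for synchronous read

    def read(self, addr):
        """Asynchronous read: the output follows the address with no clock."""
        return self.mem[addr & 0x3F]

    def clock_edge(self, addr, d, we):
        """Synchronous write: memory changes only on a clock edge with WE high."""
        # The flip-flop captures the value that was visible before this edge.
        self.read_reg = self.read(addr)
        if we:
            self.mem[addr & 0x3F] = d & 1


ram = DistributedRam64x1()
ram.clock_edge(addr=5, d=1, we=1)   # write 1 to address 5
async_value = ram.read(5)           # visible immediately after the write edge
registered_before = ram.read_reg    # registered output still shows the old value
ram.clock_edge(addr=5, d=0, we=0)   # no write; the register now captures the 1
registered_after = ram.read_reg
```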
12 SLICEM as 32-bit Shift Register
- Versatile SRL-type shift registers
  - Variable-length shift register
  - Synchronous FIFOs
  - Content-Addressable Memory (CAM)
  - Pattern generator
  - Compensate for delay / latency
- Shift register length is determined by the address
  - Constant value giving fixed delay line
  - Dynamic addressing for elastic buffer
- Cascadable up to 128x1 shift register in one slice
- SRL is not loadable, has no reset, and only supports serial in/serial out
[Figure: LUT as a 32-bit shift register with D and CLK inputs, a 5-bit address A selecting the output Q through a 32-input MUX, and the Q31 cascade output]
In the SLICEM slices, the LUT can also be configured as a dynamically addressable shift register. This component basically acts as a programmable pipeline delay element. The SRL has no set or reset capabilities, it is not loadable, and data can only be read serially. To ensure that software can map pipeline delays to the SRL, be sure to code them with these restrictions in mind.
Each LUT6 can implement a maximum delay of 32 clock cycles. The SRLs within a slice can be cascaded for longer shift registers (up to 128). The shift register length can also be changed asynchronously by changing the value applied to the address pins (A). This means that you can dynamically change the pipeline delay associated with an SRL.
SRL configurations in one slice (4 LUTs):
  16x1, 16x2, 16x4, 16x6, 16x8
  32x1, 32x2, 32x3, 32x4
  64x1, 64x2
  96x1
  128x1
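The addressable shift register described on this slide can be modeled in a few lines. This is a behavioral sketch with a hypothetical class name; it assumes that address value A selects the bit that entered A + 1 clock edges ago, so a constant address gives a fixed delay line and a changing address gives an elastic buffer.

```python
class Srl32:
    """Hypothetical model of one LUT configured as a 32-bit shift register."""

    def __init__(self):
        # No reset and no parallel load: contents change only by shifting.
        self.bits = [0] * 32

    def clock_edge(self, d, ce=1):
        """Shift one position on a clock edge when CE is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr):
        """Addressable output: reads the bit shifted in addr + 1 edges ago."""
        return self.bits[addr & 0x1F]

    def q31(self):
        """Last stage, used to cascade into the next SRL."""
        return self.bits[31]


srl = Srl32()
srl.clock_edge(1)            # a single 1 enters the shift register
for _ in range(16):
    srl.clock_edge(0)        # 16 more edges: 17 edges in total
tap_17_cycles = srl.q(16)    # address 16 selects a 17-cycle delay
tap_16_cycles = srl.q(15)    # address 15 selects a 16-cycle delay
```

Changing the argument of `q` between clock edges moves the tap without disturbing the stored data, which is the dynamic addressing the slide refers to.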
13 Shift Register LUT Example
- Operation D - NOP must add 17 pipeline stages of 64 bits each
  - 1,088 flip-flops (136 slices) or
  - 64 SRLs (16 slices)
[Figure: 64-bit datapath feeding a multiplexer. Upper path: Operation A (8 cycles) followed by Operation B (12 cycles), 20 cycles in total. Lower path: Operation C (3 cycles) followed by Operation D - NOP (17 cycles), 20 cycles in total. Paths are statically balanced.]
Because there are so many SRLs in FPGAs, pipelining is an effective way to increase design performance. Since pipelines can sometimes become unbalanced, it may be necessary to delay branches of the pipeline. SRLs are ideal for this purpose.
In this example, you see a 64-bit bus processed through operations A, B, and C. A has a delay of eight cycles, B has a delay of twelve cycles, and C has a delay of three cycles. Because the processed data is grouped at the output with a multiplexer, these datapaths must be synchronized so that the appropriate data is compared at the multiplexer. To do this, the SRL can be used to delay the output of the C operation by seventeen clock cycles.
If you were to do this with registers, it would require 1,088 registers (17 stages × 64 bits). If you use the SRL functionality instead, you only need 64 LUTs, each programmed for seventeen clock cycles of delay.
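The resource figures in this example follow from simple arithmetic, which can be checked with a short calculation. The function name and its parameters are hypothetical; the per-slice figures (eight flip-flops, four LUTs, 32 stages per SRL) are the ones given elsewhere in this presentation.

```python
from math import ceil


def balancing_cost(long_path, short_path, bus_width,
                   ffs_per_slice=8, luts_per_slice=4, srl_depth=32):
    """Resources needed to pad the shorter path up to the longer one."""
    pad_cycles = long_path - short_path
    flip_flops = pad_cycles * bus_width                # one register per bit per stage
    srls = ceil(pad_cycles / srl_depth) * bus_width    # one SRL per bit covers up to 32 stages
    return {
        "pad_cycles": pad_cycles,
        "flip_flops": flip_flops,
        "ff_slices": ceil(flip_flops / ffs_per_slice),
        "srls": srls,
        "srl_slices": ceil(srls / luts_per_slice),
    }


# Operations A (8 cycles) and B (12 cycles) against operation C (3 cycles), 64-bit bus.
cost = balancing_cost(long_path=8 + 12, short_path=3, bus_width=64)
print(cost)
```

The result reproduces the numbers on the slide: 17 padding cycles, 1,088 flip-flops in 136 slices, or 64 SRLs in 16 slices.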
14 Two Types of Slices
- Two types of slices
  - SLICEM: Full slice (25%)
    - LUT can be used for logic and memory/SRL
    - Has wide multiplexers and carry chain
  - SLICEL: Logic and arithmetic only (75%)
    - LUT can only be used for logic (not memory)
[Figure: Virtex-6 FPGA CLBs, each containing either a SLICEL and a SLICEM, or two SLICELs]
In the Virtex-6 FPGA, approximately ¼ of slices are SLICEM; the remainder are SLICEL. CLB columns on both sides of the block RAM columns have SLICEM/SLICEL CLBs, resulting in slightly more than ¼ of SLICEM.
15 I/O Bank Structure
- I/Os are grouped into banks
  - All I/O banks are in columns
  - 9 – 30 I/O banks, depending on chip type
  - 40 I/Os per bank
  - Used to clock data in and clock data out of the device
- Voltage translation only allows compatible I/O standards in one bank (they share a common power supply)
  - These are called the I/O banking rules
  - Based on common VCCO, VREF
  - More I/O banks allow a greater mixture of standards across the chip
- Clocking resources specific to each bank
  - Global and/or regional clocking resources
[Figure: Virtex-6 FPGA die with I/O banks arranged in columns]
The Virtex-6 T subfamily will have a fifth column of IOBs on the right edge of the die.
16 I/O Versatility
- Each I/O supports 40+ voltage and protocol standards, including
  - LVCMOS
  - LVDS, Bus LVDS
  - LVPECL
  - SSTL
  - HSTL
  - RSDS_25 (point-to-point)
  - Based on banking rules (some standards not compatible within the same bank)
- Each pin can be input and output (including 3-state)
- Each pin can be individually configured
  - IODELAY, drive strength, input threshold, termination, weak pull-up or pull-down
I/O standards will vary slightly by device family, so be sure to check your device data sheet.
There is also a 3-state buffer available for each I/O pin. This typically implements 3-state outputs or bi-directional I/O.
17 I/O Electrical Resources
- P and N pins can be configured as single-ended…
- …or differential pair
  - This example shows a differential pair that is coupling two neighboring (and pre-assigned) pins
- Receiver available in all banks
- Receiver termination available in all banks
[Figure: P and N pins, each with a transmitter (Tx) and receiver (Rx), joined by LVDS termination]
18 IOB Element
- Input path
  - Two DDR registers
- Output path
  - Two DDR registers
  - Two 3-state enable DDR registers
- Separate clocks and clock enables for input and output
- Set and reset signals are shared
To clock the DDR registers, remember that you can use any pair of the PLL outputs that are 180 degrees out of phase (such as the CLK90 and CLK270 outputs; likewise CLK2X and CLK2X180, or CLKFX and CLKFX180).
19 I/O Logical Resources
- Two IOLOGIC blocks per I/O pair
  - Master and slave
  - Can operate independently or concatenated
- Each IOLOGIC contains…
  - IOSERDES
    - Parallel to serial converter (serializer)
    - Serial to parallel converter (de-serializer)
  - IODELAY
    - Selectable fine-grained delay
  - SDR and DDR resources
[Figure: Master IOLOGIC and Slave IOLOGIC, each containing IOSERDES and IODELAY, with interconnect to the FPGA fabric]
20 Flip-Flop Details
- All eight flip-flops share the same control signals
  - CK – clock
  - CE – clock enable
  - SR – set/reset
- Each flip-flop has four input signals
  - D – data input
  - CK – clock
  - CE – clock enable (active high)
  - SR – async/sync set/reset (active high)
    - Either Set or Reset can be implemented (not both)
[Figure: flip-flop (FF) with D, CE, SR, and CK inputs and a Q output]
21 Design Tips
- Suggestions for faster and smaller designs
- Design synchronously
  - Use a synchronous Set/Reset whenever possible
  - Don’t gate your clock (use the CE)
  - Manage your clock skew (use global or regional clock routing resources)
- Leverage FPGA Global Reset whenever possible
  - Requires instantiation of the Startup component
- Save routing resources
  - Use active-high CE and Set/Reset (no local inverter)
[Figure: flip-flops FF1 through FF8, each with D, CE, CK, and SR inputs and a Q output]
22 Software Packs Logic for Optimum Performance
- Software intelligently packs logic
  - Related logic and flip-flops are coded
  - Software places the logic and flip-flop in the same slice
[Figure: a LUT and flip-flop in the design (Design) mapped into one slice (FPGA)]
This process is called “related packing,” and is a function of MAP. It will only be possible if the control signals associated with the FFs are identical.
You can see the amount of related and unrelated packing by looking at the MAP report (map.mrp).
23 Control Signals
- Different flip-flop configurations
  - If coded registers do not map cleanly to the flip-flops, the software tools will automatically implement the missing functionality by using additional slice resources
  - Can increase overall LUT utilization
- Software uses logic to map extra control functions
[Figure: two cases, each shown as coded (Design) and as implemented (FPGA). Case 1: CE active low, implemented with an inverter ahead of the active-high CE. Case 2: both synchronous Set and Reset are used, implemented with an OR gate on the D input and the single SR port.]
In earlier architectures (Virtex-4/Spartan-3 and earlier FPGAs), the slice flip-flops had additional features, including local inversion of the control signals and the availability of dedicated Set and Reset ports.
In the Virtex-6 FPGAs, code that calls for these additional features is still supported; however, the software will automatically implement equivalent logic by using LUT resources. Both the inverter and the OR gate shown in the examples above can be implemented using LUT resources. This may increase your overall LUT usage.
For new designs, it is best to consider the capabilities of the Virtex-6 flip-flops when coding. Use active-high resets and clock enables, and avoid circuits that require both Set and Reset controls.
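The two mappings described on this slide can be checked for equivalence with a small exhaustive test. This is a sketch under stated assumptions: the function names are hypothetical, the flip-flop model has only an active-high CE and a single synchronous reset, and Reset is assumed to take priority over Set in the coded register.

```python
from itertools import product


def coded_ce_active_low(q, d, ce_n):
    """Register as coded: enabled when ce_n is 0."""
    return d if ce_n == 0 else q


def mapped_ce_active_low(q, d, ce_n):
    """As implemented: a LUT inverts ce_n ahead of the active-high CE port."""
    ce = 1 - ce_n
    return d if ce else q


def coded_set_and_reset(d, sset, sreset):
    """Register as coded: synchronous Reset (assumed priority) and Set."""
    if sreset:
        return 0
    if sset:
        return 1
    return d


def mapped_set_and_reset(d, sset, sreset):
    """As implemented: Set is ORed into D in a LUT, Reset uses the SR port."""
    d_with_set = d | sset
    return 0 if sreset else d_with_set


ce_mapping_ok = all(
    coded_ce_active_low(q, d, ce_n) == mapped_ce_active_low(q, d, ce_n)
    for q, d, ce_n in product((0, 1), repeat=3)
)
set_reset_mapping_ok = all(
    coded_set_and_reset(d, s, r) == mapped_set_and_reset(d, s, r)
    for d, s, r in product((0, 1), repeat=3)
)
```

Both checks pass for every input combination, which is why the tools can make these substitutions automatically; the cost is the extra LUT input or LUT used for the inverter and the OR gate.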
24 Control Set Reduction
- Flip-flops with different control sets cannot be packed into the same slice
  - Software can be instructed to reduce the number of control sets by mapping control logic to LUT resources
  - This results in higher LUT utilization, but a lower overall slice utilization
[Figure: three flip-flops in the design, one with a synchronous Set, one with no Set/Reset, and one with a synchronous Reset, occupy 3 slices as coded; after the Set and Reset are mapped to LUT logic they share a control set and fit in 1 slice]
This feature can be controlled using the “Reduce Control Sets” property of the synthesis process. In some instances, the increased combinatorial logic can be combined with existing logic, or placed in an unused LUT connected to the flip-flop. The overall increase in LUT utilization may be small (this will vary by design).
A design can only be implemented in a particular FPGA if the number of slices used by the design is less than or equal to the number that exist in that device. Therefore, reducing the total number of slices used can be important when trying to keep your FPGA small.
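The effect of control sets on slice count can be estimated with a short calculation. This sketch is illustrative only: the function name and signal names are hypothetical, a control set is represented as a (clock, clock enable, set/reset) tuple, and the result is a lower bound that ignores every other packing constraint.

```python
from collections import Counter
from math import ceil


def min_slices(flip_flops, ffs_per_slice=8):
    """Lower bound on slices when flip-flops with different control sets cannot share a slice.

    flip_flops is a list of (clk, ce, sr) tuples, one per flip-flop.
    """
    per_control_set = Counter(flip_flops)
    return sum(ceil(n / ffs_per_slice) for n in per_control_set.values())


# As coded: three flip-flops on the same clock, one with a synchronous Set,
# one with no Set/Reset, and one with a synchronous Reset.
as_coded = [
    ("clk", None, "sset"),
    ("clk", None, None),
    ("clk", None, "sreset"),
]

# After control set reduction: Set and Reset are folded into LUT logic on D,
# so all three flip-flops share one control set.
reduced = [("clk", None, None)] * 3

slices_before = min_slices(as_coded)
slices_after = min_slices(reduced)
```

For the three flip-flops on the slide this gives 3 slices before reduction and 1 slice after, at the cost of the LUTs that now implement the Set and Reset behavior.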
25 Using the Slice Resources
- Three primary mechanisms for using FPGA resources
  - Inference
    - Describe the behavior of the desired circuit using Register Transfer Level (RTL) code
    - The synthesis tool will analyze the described behavior and use the required FPGA resources to implement the equivalent circuit
  - Instantiation
    - Create an instance of the FPGA resource using the name of the primitive, manually connecting the ports and setting the attributes
  - CORE Generator™ tool and Architecture Wizard
    - The CORE Generator software and Architecture Wizard are graphical tools that allow you to build and customize modules with specific functionality
    - The resulting modules range from simple modules containing few FPGA resources to highly complex Intellectual Property (IP) cores
26 Inference
- All primary slice resources can be inferred by XST and Synplify
  - LUTs
    - Most combinatorial functions will map to LUTs
  - Flip-flops
    - Coding style defines the behavior
  - SRL
    - Non-loadable, serial functionality
  - Multiplexers
    - Use a CASE statement or other conditional operators
  - Carry logic
    - Use arithmetic operators (addition, subtraction, comparison)
- Inference should be used wherever possible
  - HDL code is portable, compact, and easily understood and maintained
Note that coding for an SRL with reset functionality will infer extra logic resources (depending on your synthesis tool) that will not only be significantly larger, but will also require multiple clock cycles to clear.
27 Instantiation
- For a list of primitives that can be instantiated, see the HDL library guide
  - Provides a list of primitives, their functionality, ports, and attributes
- Use instantiation when it is difficult to infer the exact resource you want
For a list of possible configurations for the sequential elements, refer to the Libraries Guide (Help → Software Manuals → Libraries Guides). The Libraries Guide contains a list of all of the possible primitives and macros that Xilinx has to offer. All primitives and macros are listed in alphabetical order and include a schematic drawing, port names (for HDL instantiation), attribute names, a functional description, and a truth table on the behavior of the component.
One of the benefits of using the Libraries Guide is that while inference of a resource can sometimes be challenging, you can always instantiate the primitive you want into your design. In fact, it is common practice to instantiate the high-end cores that are available in Virtex-6 devices. You should look at the document at least once.
Another option available to you is to use the Architecture Wizard and CORE Generator software to instantiate device primitives. These utilities allow you to customize components with GUIs and then copy the generated instantiation template into your design. The Architecture Wizard is used for adding common components, such as the Digital Clock Managers (commonly called the DCMs).
The CORE Generator software is used to add larger components, such as filters, arithmetic components, and bus interfaces. The CORE Generator software is used in the Designing for Performance course.
28 Architecture Wizard
- The CORE Generator tool and Architecture Wizard can help you create modules with the required functionality
  - Typically used for FPGA-specific resources (like clocking, memory, or I/O), or for more complex functions (like memory controllers or DSP functions)
29 Summary
- All slices contain four 6-input LUTs and eight registers
  - LUTs can perform any combinatorial function of up to six inputs or two functions of five inputs
  - Four of the eight registers can be used as flip-flops or latches; the remaining four can only be used as flip-flops
  - Flip-flops have active-high CE inputs and active-high synchronous or asynchronous Set/Reset inputs
- Both SLICEL and SLICEM slices also contain carry logic and the dedicated multiplexers
  - The MUXF7 multiplexers combine LUT outputs to create 8-input multiplexers
  - The MUXF8 multiplexers combine the MUXF7 outputs to create 16-input multiplexers
  - The carry logic can be used to implement fast arithmetic functions
- The LUTs in SLICEM slices can also provide SRL and distributed memory functionality
- Manage your control set usage to reduce the size and increase the speed of your design
30 Where Can I Learn More?
- Software Manuals
  - Start → Xilinx ISE Design Suite 13.1 → ISE Design Tools → Documentation → Software Manuals
  - This includes the Synthesis & Simulation Design Guide
    - This guide has example inferences of many architectural resources
  - XST User Guide
    - HDL language constructs and coding recommendations
  - Targeting and Retargeting Guide for Virtex-6 FPGAs, WP309
  - Virtex-6 FPGA User Guides
- Xilinx Education Services courses
  - Xilinx tools and architecture courses
  - Hardware description language courses
  - Basic FPGA Architecture, Basic HDL Coding Techniques, and other free videos
Check out the Virtex-6 FPGA user guides and data sheets at