Virtex-6 and Spartan-6 HDL Coding Techniques


1 Virtex-6 and Spartan-6 HDL Coding Techniques
Xilinx Training

2 Welcome
If you are new to FPGA design, this module will help you code properly for Spartan-6 and Virtex-6 register resources
These design techniques promote fast and efficient FPGA designs

3 Objectives
After completing this module, you will be able to:
  Code your register resources so your design will have fewer control sets and run at a higher system speed
  Avoid the most common coding mistakes that reduce device utilization and system speed
  Anticipate how your design will map to the register resources

4 Introduction
There is no single “perfect” way to create a design
  There are, however, guidelines that usually lead to improved results
The coding techniques described here are strongly recommended because they reduce device utilization, which enables the implementation tools to obtain a better place and route (PAR) solution
  A better PAR solution yields better system speed
The expert who developed this material described this content as “things that generally work for me and work in the designs I've seen.” The tips provided in this module are suggestions, and their results may vary.

5 Tactics to Meet Timing
Use as many of the dedicated resources as possible (SRLs, DSP slices, and block RAMs)
  Clever use of unused dedicated resources can improve system speed, reduce power, and improve device utilization (for example, using block RAM for an FSM)
Different tactics must be used when your device is full
  Timing does not matter if your design does not fit in the device
  The tactics that will be discussed generally work best in designs that are not full
One of the most effective ways to reduce power in FPGAs is to reduce the number of LUTs and FFs
  One of the side benefits of these techniques is that they will allow you to improve performance and reduce power
  Try turning off the Logic Replication synthesis option

6 Limiting FPGA Resources
Spartan-6 and Virtex-6 devices use a 6-input LUT, which allows you to pack more logic into each LUT
  However, synthesis tools cannot remove added pipeline registers, so if your design is migrating you will have to recode your design
Build a design that uses fewer registers
  Many designs run out of registers before other components (especially if the design is heavily pipelined)
  Registers are most often the limiting resource in Spartan-6 and Virtex-6, so try to use as much of the SRL, RAM, and DSP slice resources as possible
If your design is migrating to a 6-input LUT architecture, the wider LUT may reduce the number of LUTs required and thus the number of layers of logic. This means that fewer (or possibly no) pipeline registers may be needed; therefore, the designer may have to manually move or remove the pipeline registers.
It is ALWAYS wise to maximize the use of dedicated silicon resources, as it guarantees performance through that portion of the design. The vast majority of the time it will also reduce power consumption.
As a general rule, infer as much as possible. Inferred resources can be individually steered toward fabric or silicon resources via attributes in the code or within the constraint file. Instantiate only when there is no other way to generate the design that the user wants.

7 FPGA Registers
Why are registers sometimes a scarce resource?
  Control signal limitations that limit grouping of slice resources (this will be covered later)
  Coding for active high control signals
    Active low is not recommended for FPGAs
    Active low control signals do not save power
  Inappropriate replication of registers (logic replication)
    Careful use of synthesis options that may increase your design size is important

8 Control Signal Usage is a Concern
Eight registers per slice; all share the same control signals
  If the number of registers in a control set does not divide cleanly by eight, some registers may go unused
  This will depend on the remaining registers in the design being grouped into the same slice, which is determined by the remaining registers' use of control signals
This is of concern for designs that have several very low fanout control signals
  A design with a large number of control sets potentially can show lower device utilization (but not always)
  Designs with a small number of control sets are preferable
  The key is to evaluate slices and CLBs that have wasted registers
With the Spartan-6 and Virtex-6 FPGAs, there are eight registers per slice. These are tied to the same clock, same clock enable, same set, and same reset. If any one of the control signals differs, those registers cannot be put in the same slice.
Tip: Evaluate the control signals that go to very few flip-flops. These are the control signals that should be minimized.

9 Flip-Flop Details
Each flip-flop has four control ports
  D – data input
  CK – clock
  CE – clock enable (Active High)
  SR – async/sync set/reset (Active High)
    Either Set or Reset can be implemented (not both)
All eight flip-flops share the same control signals
  CE – Clock Enable
  SR – Set/Reset
[Figure: flip-flop symbol with ports D, CE, SR, CK, and Q]

10 Software
Software intelligently packs logic
  Related logic and flip-flops are coded
  Software places the logic and flip-flop in the same slice
  Software packs slices for optimum performance
[Figure: LUTs and flip-flops in the design mapped into a single FPGA slice]
This process is called “related packing,” and is a function of MAP. It is always enabled. It is only possible if the control signals associated with the FFs are identical. You can see the amount of related and unrelated packing by looking at the MAP report (map.mrp).

11 Flip-Flops – Control Signals
Different flip-flop configurations
  If coded registers do not map cleanly to Virtex-6/Spartan-6 FPGA flip-flops, the software tools will automatically implement the missing functionality by using additional slice resources
  Can increase overall LUT utilization
Cases (design versus FPGA implementation)
  CE active Low
  Both Synchronous Set and Reset are used
Software uses logic to map extra control functions
[Figure: an active-low CE implemented with an inverter ahead of the CE port; a synchronous Set plus synchronous Reset implemented with an OR gate and logic ahead of the D and SR ports]
In the Virtex-6 and Spartan-6 FPGAs, code that calls for additional features, such as inversion of the control signals or both Set and Reset ports, is still supported; however, the software will automatically implement equivalent logic by using LUT resources. Both the inverter and the OR gate shown in the examples can be implemented using LUT resources. This may increase your overall LUT usage.
For new designs, it is best to consider the capabilities of the Virtex-6 and Spartan-6 FPGA flip-flops when coding. Use active-high resets and clock enables. Avoid circuits that require both Set and Reset controls.

12 Control Set Reduction
Flip-flops with different control sets cannot be packed into the same slice
Software can be instructed to reduce the number of control sets by mapping control logic to LUT resources
  This results in higher LUT utilization, but a lower overall slice utilization
[Figure: three flip-flops (no set/reset, synchronous set, synchronous reset) occupying 3 slices in the design are packed into 1 slice in the FPGA once the set and reset are moved into LUT logic ahead of the D inputs]
This feature can be controlled using the “Reduce Control Sets” property of the synthesis process. In some instances, the increased combinatorial logic can be combined with existing logic, or placed in an unused LUT connected to the flip-flop. The overall increase in LUT utilization may be small.
A design can only be implemented in a particular FPGA if the number of slices used by the design is less than or equal to the number that exist in that device. Therefore, reducing the total number of slices used can be important when trying to keep your FPGA small.

13 Related logic packed into single LUT
LUT – Combining LUTs with common input signals
  Two 5-input functions with common inputs
  Two 5-input functions map to two LUTs
  Both LUTs share the same input signals
  Related logic is packed into a single LUT
LUTs are combined into the same slice
  LUTs that share common inputs will be packed into the same slice
[Figure: two LUT5 functions, f1 and f2, with shared inputs in the design are combined into one LUT6_2 in the FPGA]
This feature can be controlled using the “LUT combining” property of the synthesis process. LUT combining can reduce the overall LUT usage of your design, but can have a negative impact on timing performance.

14 Related SRL packed into single LUT
SRL – Combining SRLs with common input signals
  Two SRL16 on a bus with common inputs
  Two SRL16 are used
  Both SRL16 share the same input signals
  Related SRLs are packed into a single LUT
LUTs are combined into the same slice
  LUTs that share common inputs will be packed into the same slice
[Figure: two SRL16s with common connections in the design are combined into one LUT6_2 (two LUT5s) in the FPGA]
Only the address, clock, and write enable inputs of the SRL must be common for SRL combining to take place. The two LUT5s have independent DI and O ports.

15 Introduction to Control Sets
A control signal is
  Clock Enable / Gate Enable
  Write Enable
  Set / Reset
  Preset / Clear
  Clock / Gate
A control set is
  A group of enable, set, reset, and clock
Unique control sets are
  The number of groups of unique control signals in your design
The implementation tools cannot group flip-flops into the same slice if they do not share the same control signals
  However, the tools can pack FFs into the slice if the control signals all come from the same set (next slide)
Note that Xilinx still discourages designers from using latches. The gate enable is a control signal that is only used with latches.
Note that VCC and Gnd are connected to each unused port (set, reset, CE, etc.). So if you are not using a set, reset, or clock enable, it will be tied to VCC or ground, depending on which level enables. So VCC and Gnd are also considered control signals.
It is important to understand that a unique control set is a group that has a unique membership of elements. This means that if there are two registers and both use the same clock and reset, but different clock enables, then each register is part of a different control set. However, if one register uses a clock enable and the other does not, then both registers can be part of the same control set.
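The control-set definition above can be illustrated with a small sketch. The entity and signal names (control_set_demo, ce_a, ce_b, and so on) are hypothetical and chosen only for illustration; how the tools actually pack the result depends on the rest of the design and on synthesis settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: three registers, two control sets.
entity control_set_demo is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;  -- shared synchronous reset, active high
    ce_a : in  std_logic;  -- clock enable used by q_a and q_c
    ce_b : in  std_logic;  -- a different clock enable, used by q_b
    d    : in  std_logic;
    q_a  : out std_logic;
    q_b  : out std_logic;
    q_c  : out std_logic
  );
end entity control_set_demo;

architecture rtl of control_set_demo is
begin
  -- q_a and q_c use the same clock, reset, and enable:
  -- they belong to one control set and may share a slice.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q_a <= '0';
        q_c <= '0';
      elsif ce_a = '1' then
        q_a <= d;
        q_c <= not d;
      end if;
    end if;
  end process;

  -- q_b uses the same clock and reset but a different enable:
  -- it belongs to a second control set.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q_b <= '0';
      elsif ce_b = '1' then
        q_b <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because ce_b is a synchronous control, the synthesis tool may still choose to fold it into a LUT input, which is the flexibility discussed on the following slides.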

16 Question
Can these FFs be placed into the same slice (are they a part of the same control set)?
(Note: all control signals drive the control port of the FF)
Case 1
  FF1: CE, Set, Reset
  FF2: Set, Reset
  FF3: Reset
Case 2
  FF2: Set2, Reset
Case 3
  FF2: Set, not Reset

17 Answer
Case 1
  FF1: CE, Set, Reset
  FF2: Set, Reset
  FF3: Reset
  Yes! The tools can pack FFs into the slice if the control signals all come from the same set.
Case 2
  FF2: Set2, Reset
  No, two different sets cannot be grouped into the same slice. The same is true for CEs and the other control signals.
Case 3
  FF2: Set, not Reset
  Maybe. If the Reset is synchronous, then your synthesis tool should be able to drive the reset to a LUT input and invert the reset signal.
Note: if the control signal can be implemented as a LUT input (synchronous set, synchronous reset, or CE), then the tools have more flexibility to group those FFs into the same slice.

18 Control Port Usage Rules
Control signals are the signals that are connected to the actual control ports on the register
Clocks and asynchronous set/resets always become connected to FF control ports
  They cannot be moved to the datapath
Clock enables and synchronous set/resets sometimes become connected to FF control ports (this is decided by the synthesis tool)
  These control signals can be moved to the datapath (to a LUT input)
Asynchronous sets/resets have priority access to the control ports over synchronous sets/resets
  For example, if a global asynchronous reset and a local synchronous reset are inferred on a single register
    The asynchronous reset gets the port on the register
    The synchronous reset gets a LUT input
There is no coding style or synthesis option that allows users to control when a LUT will be used for this purpose
Clock enables can be on data paths; you can feed the register output back to a LUT input. This makes the CE part of the datapath and, at times, this is a good decision because the CE no longer uses the CE port on the register. But synthesis tools generally make the CE a control signal so that it does not burn any of your LUT inputs. Likewise, synchronous set/resets can be mapped to LUT inputs, rather than using a register's control signal input. As you will see later, having the option to switch between the two implementations is useful since this enables those registers to be part of additional control sets.
Tip: Clock enables and synchronous sets and resets can be moved to the datapath (via a LUT input)
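The priority example above (a global asynchronous reset plus a local synchronous reset on one register) might be coded as in the following sketch. The names arst and srst are hypothetical; per the rules above, one would expect arst to take the register's SR port and srst to be folded into the LUT feeding D, but the final mapping is decided by the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one register with two resets.
entity async_plus_sync_reset is
  port (
    clk  : in  std_logic;
    arst : in  std_logic;  -- global asynchronous reset, active high
    srst : in  std_logic;  -- local synchronous reset, active high
    d    : in  std_logic;
    q    : out std_logic
  );
end entity async_plus_sync_reset;

architecture rtl of async_plus_sync_reset is
begin
  process (clk, arst)
  begin
    if arst = '1' then
      -- Asynchronous: must use the flip-flop's control port.
      q <= '0';
    elsif rising_edge(clk) then
      if srst = '1' then
        -- Synchronous: can be implemented as datapath logic (a LUT input).
        q <= '0';
      else
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```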

19 Control Signals Problems
Instantiation of primitives and cores
  Gate-level connection of UNISIM and core primitives dictates control signal usage
  Be aware that some IP does not necessarily follow these guidelines
Synthesis optimization
  Synthesis may choose to build a control signal for logic optimization
Physical synthesis, design hierarchy, and incremental design practices
  Can change control sets from the original specifications (be careful)
  Global or logic optimization may choose to build a control signal for logic optimization
Remember that when you instantiate a primitive, you are going to get what you build. So if you instantiate a register (or a core that uses a register) and connect a control port, it will use that control signal. Physical synthesis is most commonly done with Synopsys Amplify software.
The FPGA architecture also dictates what types of control signals can be used. For example, block RAM has a synchronous output register. If you try to infer an asynchronous output, your synthesis tool will not give you that output register.
Tip: The cores you instantiate should share the same control signals you infer. This will minimize the number of control sets in your design.

20 Active-Low Control Signals
Problem: Active-low control signals can produce sub-optimal results
Why?
  Control ports on registers are active-high
  Hierarchical design and design re-use can propagate bad design practices
This results in
  Poor device utilization
  More LUTs
  Less dense slice packing
  More routing resources necessary
  Additional inverters required at all lower leaf levels
  Longer run times
  Prohibits hierarchical design flows (like incremental design)
  More difficult timing
  Worse timing and power
Active-low signals use more LUTs because they require inversion before they can directly drive the control port of a register. This inversion must be done with a LUT and thus takes up a LUT input. Likewise, because only a single reset can be brought to the register, if the remaining logic to be grouped does not use the same reset, CE, or set, it cannot be grouped into the same slice.
If you use hierarchical design techniques (meaning you are using partitions, setting keep hierarchy to true, using cores, using multiple netlists, or using bottom-up synthesis), it can increase the number of control signals because the flip-flops do not have a programmable inverter. With hierarchical design, each signal would have a different name, and you would still have to run each signal through an inverter, which means that you are burning a LUT. Hierarchical design necessitates this, because designers want to keep hierarchy and partition their design.
Tip: Use active-high signals for CEs, sets, and resets

21 Use Active-High Control Signals
Hierarchical design methods can proliferate LUT usage on active-low control signals
  The inverters cannot be combined into the same slice
  This consumes more power and makes timing difficult
Remember that the synthesis tools and the implementation tools cannot merge this logic into the same slice because the designer is not allowing optimization across a hierarchical boundary.

22 Design Tips
Suggestions for faster and smaller designs
  Use synchronous Set/Reset whenever possible
  Use active-high CE and Set/Reset (no local inverter for secondary control signals)
  Try to build your design with as few control signals as possible
Rules to recognize
  Clocks and asynchronous set/resets always connect to the control port of the FF
  Asynchronous sets/resets have priority access to the control ports over synchronous sets/resets
  Clock enables and synchronous set/resets can become control signals
[Figure: flip-flops FF1 through FF8 in a slice, each with D, Q, CE, CK, and SR ports]
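As a minimal sketch of these suggestions, the register below uses a synchronous, active-high reset and an active-high clock enable, with the reset given priority over the enable as in the device flip-flop. The entity name reg_ce_srst and its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: register matching the native flip-flop controls.
entity reg_ce_srst is
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- synchronous reset, active high
    ce  : in  std_logic;  -- clock enable, active high
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg_ce_srst;

architecture rtl of reg_ce_srst is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then          -- reset has priority over the clock enable
        q <= (others => '0');
      elsif ce = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```

All eight bits share one clock, one reset, and one enable, so they form a single control set and can fill one slice without extra inverters or LUT logic.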

23 Where Can I Learn More?
Software Manuals
  Start → Xilinx ISE Design Suite 12.1 → ISE Design Tools → Documentation → Software Manuals
  This includes the Synthesis & Simulation Design Guide
    This guide has example inferences of many architectural resources
  XST User Guide
    HDL language constructs and coding recommendations
  Software User Guides and software tutorials
Xilinx Education Services courses
  Xilinx tools and architecture courses
  Hardware description language courses
  Basic FPGA Architecture, Basic HDL Coding Techniques, and other free videos

24 Virtex-6 and Spartan-6 HDL Coding Techniques

25 Welcome
If you are new to FPGA design, this module will help you code properly for Spartan-6 and Virtex-6 register resources
These design techniques promote fast and efficient FPGA designs

26 Objectives
After completing this module, you will be able to:
  Code your design so you can infer more of the dedicated hardware resources
  Avoid the most common coding mistakes that hurt device utilization
  Reduce your dependence on global resets by taking advantage of the Global Set/Reset net (GSR)

27 Tactics to Meet Timing
Use as many of the dedicated resources as possible (SRLs, DSP slices, and block RAMs)
  Understanding the features of the resource is essential if you are going to infer the resource
  This is critical to reducing the number of LUTs and registers in your FPGA design and increasing the amount of dedicated hardware you infer
One of the most effective ways to reduce power in FPGAs is to reduce the number of LUTs and FFs
  One of the side benefits of these techniques is that they will allow you to improve performance and device utilization
If you barely get a design to fit into an FPGA, then the PAR timing options are very limited. So if your design barely fits, you will probably need to reduce its size first. Many of the tips discussed in this module will reduce the size of your design.

28 DSP Slice Uses a Synchronous Reset
Each DSP slice effectively has more than 250 registers
  None have an asynchronous reset
The DSP slice is more versatile than most realize
  It can be used for multipliers, add/sub, MACC, counters (with programmable terminal count), comparators, shifters, multiplexers, pattern match, and many other logic functions
  Many designs that run out of slices are not fully utilizing their DSP slice resources
Synthesis tools will infer the DSP slice resources for multipliers, but they are not smart enough to infer other functions
  Synthesis use can be controlled with attributes, but NOT if an asynchronous reset is used
It is important to remember that taking advantage of dedicated hardware resources like this frees up slice resources, which gives the tools the ability to reach a higher design speed.
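A multiply-accumulate written with only synchronous resets is the kind of description that leaves the tools free to use the DSP slice's internal registers. The sketch below assumes 18-bit signed operands and a 48-bit accumulator; the entity name macc_sync and all port names are hypothetical, and whether a given synthesis tool maps every register into the DSP slice depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: pipelined multiply-accumulate, synchronous reset only.
entity macc_sync is
  port (
    clk : in  std_logic;
    rst : in  std_logic;             -- synchronous reset, active high
    a   : in  signed(17 downto 0);
    b   : in  signed(17 downto 0);
    acc : out signed(47 downto 0)
  );
end entity macc_sync;

architecture rtl of macc_sync is
  signal a_r, b_r : signed(17 downto 0);  -- input registers
  signal m_r      : signed(35 downto 0);  -- multiplier output register
  signal acc_r    : signed(47 downto 0);  -- accumulator register
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        a_r   <= (others => '0');
        b_r   <= (others => '0');
        m_r   <= (others => '0');
        acc_r <= (others => '0');
      else
        a_r   <= a;
        b_r   <= b;
        m_r   <= a_r * b_r;                 -- 18 x 18 -> 36 bits
        acc_r <= acc_r + resize(m_r, 48);   -- sign-extend, then accumulate
      end if;
    end if;
  end process;

  acc <= acc_r;
end architecture rtl;
```

Replacing the synchronous reset with an asynchronous one would, per the slide above, prevent these registers from being absorbed into the DSP slice.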

29 BRAM Uses a Synchronous Reset
Block RAMs obtain minimum clock-to-output time by using their output register
  Output registers only have synchronous resets
Unused block RAMs can be used for many alternative purposes
  ROMs, large LUTs, complex logic, state machines, deep shift registers, etc.
  Using unused block RAMs for other purposes can free up hundreds of flip-flops
  Using the block RAM in dual-port mode allows for greater utilization of this resource
Many designs that run out of slices are not fully utilizing the block RAM resources
  Synthesis tools are not yet smart enough to infer less obvious functions
Note that inferring a block RAM requires a synchronous reset in your HDL. Also note that block RAMs give the enable precedence over the reset. Registers, however, give priority to the reset signal over the clock enable. Coding for this functionality requires anticipation by the designer. But with no local reset, the enable precedence has no consequence and special HDL coding is not necessary.
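The enable-over-reset precedence described above can be anticipated in the HDL by nesting the reset inside the enable. The following is a sketch of a single-clock memory with separate write and read ports; the entity name bram_sync_rst, the 1024 x 16 size, and all port names are assumptions made for illustration, and whether it maps to block RAM depends on the synthesis tool and its settings. Behavior when reading and writing the same address in the same cycle is not addressed here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: memory with a synchronously reset read register.
entity bram_sync_rst is
  port (
    clk   : in  std_logic;
    we    : in  std_logic;                       -- write enable
    waddr : in  std_logic_vector(9 downto 0);
    din   : in  std_logic_vector(15 downto 0);
    en    : in  std_logic;                       -- read-port enable
    rst   : in  std_logic;                       -- synchronous reset of dout
    raddr : in  std_logic_vector(9 downto 0);
    dout  : out std_logic_vector(15 downto 0)
  );
end entity bram_sync_rst;

architecture rtl of bram_sync_rst is
  type ram_t is array (0 to 1023) of std_logic_vector(15 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(waddr))) <= din;
      end if;

      -- The enable is tested first, so the reset only acts when en = '1'.
      -- This mirrors the block RAM precedence (enable over reset).
      if en = '1' then
        if rst = '1' then
          dout <= (others => '0');
        else
          dout <= ram(to_integer(unsigned(raddr)));
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```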

30 Synchronous Sets/Resets
Synchronous sets and resets give the tools more flexibility
  Can improve timing and device utilization
Synthesis could choose to move low-fanout synchronous resets from a control signal to the datapath to free up more registers
  Synthesis tools can do this, but it may depend on synthesis settings and may not be on by default
  XST has a Use Synchronous Reset synthesis option (on by default)
  The Xilinx implementation tools cannot change what is synthesized
  This could allow packing of a register into a slice where it was previously not possible
[Figure: a low-fanout synchronous set moved from the flip-flop's S port into the logic feeding D]
This could eventually be the biggest reason.

31 Synchronous Sets/Resets
Synchronous sets/resets make FPGA designs more reliable
Synchronous sets/resets are automatically timed
  Do not need any special timing constraints
  Do not need special switches or settings to analyze timing
  Synchronous reset nets are often the most critical net in a design
Synchronous sets/resets are inherently more predictable
  Less susceptible to accidentally missed timing, runt pulses, or other phenomena upsetting logical functionality
  Less prone to a race condition
  Release of an asynchronous signal may not always have predictable results
Synchronous resets are automatically timed and they are inherently more stable. Asynchronous resets are ignored by the implementation tools. If you are using them in your design, they can often lead to disastrous functionality. Synchronous resets do not need any special constraints in your design (unlike asynchronous resets). They also do not require any special switches in the timing analyzer for analysis (unlike asynchronous resets).
Chances are that if you use synchronous resets, your design will work out of the box, provided the timing analyzer says you are meeting timing and your functional simulation is working. With asynchronous resets, there is much less certainty, because there are a lot of things that can go unnoticed. Asynchronous resets are also prone to creating metastable conditions.
Tip: Synchronous resets enable your design to require minimal testing

32 Caveats to Synchronous Sets/Resets
Synchronous resets increase the number of constrained paths, and may make timing more difficult, make the design larger, and result in longer run times
Why?
  The implementation tools automatically time synchronous reset paths
This can result in
  More timing paths to analyze and meet timing
    On average, a ~5 percent increase in the number of timing paths
  More replication of design resources
  With some synthesis tools, less use of SRLs, block RAM, DSP slices, and other dedicated hardware
If you build a global asynchronous reset and do not put a FROM/TO constraint on the paths that include this net, the implementation tools will not perform any timing analysis on its paths. Likewise, if you do not assert the timing analyzer switch to report asynchronous delay paths, you may not do any analysis of these paths. The implementation tools can also do a poor job of routing these nets, because they are not constrained by default.
If the reset becomes synchronous, the paths become timed by default. This will increase the number of timed paths (sometimes dramatically). The impact is that the implementation tools will now have much more work to do, and this causes longer implementation times (run times).
There have been cases where synthesis tools simplify the synthesis result so that it does not take as much advantage of dedicated hardware when they determine that a design will be harder to meet timing. The tool basically infers very little into SRLs and block RAMs, for example. So the design gets built out of additional LUTs and flip-flops to try to get as lean as possible, which can make it harder to meet timing.

33 Changing to Synchronous Resets
All new code should use synchronous resets when a reset is necessary
For existing code, you have three choices
  Leave alone
    Acknowledge the possible critical drawbacks of asynchronous resets
  Use synthesis switch (dangerous!)
    Synplify: syn_clean_reset
    XST: -async_to_sync YES
    Not the same as changing to synchronous reset
    This can make the synthesis result different from the behavioral simulation
  Recommended: manually (or with a script) change the asynchronous reset to synchronous
Removing the top-level reset port does not get the same result as removing the reset from your code
If you have existing code and are not having any trouble meeting timing, are fitting your device, and are happy with the operation of your device, then you may choose to do nothing.
Synthesis tools do not like to give you logic that is not reflective of the code. So be careful with the synthesis switch option. Using these switches is not the same as manually changing code.
Also note that some believe that if the global set/reset is brought to a top-level port and that port is then disconnected, the optimization algorithms will rip out the reset. It does not work this way. The synthesis algorithms that do that optimization run after the tool has chosen which resources are to be mapped. So you still end up with a flip-flop that has an asynchronous reset. Likewise, tying a reset port high or low in your HDL would not remove the reset from a design that uses the reset as part of a core, since the core would not be regenerated.
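The recommended manual change can be sketched as two architectures of the same entity, one before and one after the edit. The entity name reset_style and its ports are hypothetical. Note that the two versions are not cycle-for-cycle equivalent: the synchronous version only resets on a clock edge, so simulation and any reset-generation logic should be rechecked after the change.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity shared by both architectures.
entity reset_style is
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- active high
    d   : in  std_logic;
    q   : out std_logic
  );
end entity reset_style;

-- Before: asynchronous reset (rst is in the sensitivity list and is
-- tested outside the clock edge).
architecture before_async of reset_style is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= '0';
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture before_async;

-- After: synchronous reset (only clk is in the sensitivity list and rst
-- is tested inside the clock edge).
architecture after_sync of reset_style is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= '0';
      else
        q <= d;
      end if;
    end if;
  end process;
end architecture after_sync;
```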

34 Resets
Two kinds of resets: Global and Local
Global: usually used to reset after configuration
  This is done by default after configuration of the FPGA and does not need to be coded into the design
  Access to this net is done with the GSR port from the Startup component (only necessary if you wish to perform a global reset a second time)
  Note: if you are coding a global reset into your HDL, you are actually coding in a second reset
  Some ASIC technologies require at most an initialization when they power up. But FPGAs do not require a reset.
Local: used as a standard part of some components' behavior
  FSMs, counters, etc.
Most designs do not need a global reset in an FPGA. ASIC devices usually need resets. There are some ASIC technologies that at most require an initialization when they power up. FPGAs inherently have a global set/reset (GSR) that occurs during power up, and it does not need to be coded into the design. If you are coding it in, you are actually coding in a second reset.
The Startup-Virtex design element is used to interface device pins and logic to the Global Set/Reset (GSR) signal, the Global Tristate (GTS) dedicated routing, the internal configuration signals, or the input pins for the SPI PROM if a SPI PROM is used to configure the device. This primitive can also be used to specify a different clock for the device startup sequence at the end of configuration of the device, and to give the internal logic access to the configuration clock.
The GSR input port is an active-high global set/reset signal and uses dedicated routing resources. The GSR does not use general interconnect.

35 Getting By
Some designs can get away without any resets, but many designs need some resets
  Very few designs require resets on all registers, but most designers want a global reset after initialization
  Most ASIC emulation also requires a described reset on every register
Implement this global reset with the built-in Global Set/Reset (GSR)
  GSR is good for initializing the values of your synchronous elements (FFs, block RAMs)
  Delay of GSR is slow (3 clock cycles after configuration), so use it after configuration, but don't reset again unless you can tolerate the entire design being reset
Xilinx suggests that you selectively remove resets or, even better, selectively put them in. You can tell that you have used sufficient resets when your design simulates properly. If you can functionally simulate an RTL design, it should work in the FPGA.

36 Global Reset Net
The GSR input is an active-high global set/reset net that is active at the end of configuration
It uses a dedicated routing resource for signal distribution
  Saves general interconnect
It can also be used to restore the initial state of the FFs in the FPGA at any time
  The initial state is communicated with an INIT attribute
It drives the output FFs for each block RAM, but does not affect the contents of each memory or SRL
It is connected to all synchronous elements through a wired OR gate
  This allows a local reset to also drive the FF's set/reset port
The Startup block also gives users access to the Global Tri-State net and the configuration clock.

37 Startup Instantiation (VHDL)
VHDL and Verilog instantiations are available from the Xilinx Unified Libraries Guide
    library UNISIM;
    use UNISIM.vcomponents.all;
    -- STARTUP_VIRTEX6: Virtex-6 Configuration Start-Up Sequence Interface
    --                  Virtex-6
    -- Xilinx HDL Libraries Guide, version 12.1
    STARTUP_VIRTEX6_inst : STARTUP_VIRTEX6
    generic map (
      PROG_USR => "FALSE")  -- Activate program event security feature
    port map (
      …
      CLK => CLK,  -- 1-bit User start-up clock
      GSR => GSR,  -- 1-bit Active high Global Set/Reset signal
      GTS => GTS,  -- 1-bit Active high Global 3-State signal
      …
    );
    -- End of STARTUP_VIRTEX6_inst instantiation

38 Inferring an Initialization (XST only)
If you have a reset, you can initialize all registers in VHDL / Verilog code
  SR will cause the flip-flop to be set to the state inferred here
  Inference is supported only for data types std_logic, bit_vector, bit, but NOT integer
This is helpful for RTL simulation of the design
  If it functions during simulation, it should function on the FPGA
Note: if you design without a reset in your design, you still get a free global reset
    VHDL:    signal my_register : std_logic_vector (7 downto 0) := (others => '0');
    Verilog: reg [7:0] my_register = 8'h00;
This is an example of initialization. Synthesis tools do react to this code. Most people assume that this is unsynthesizable code, but it will affect the INIT value of the register because that is tied to the register primitive. This code will reset to 0 when the GSR is released after configuration. This way, you can have a known value in your chip. This will also simulate to 0 at start-up during a functional simulation. If you do not do this and you have a counter, you will have Xs counting +1, and all you have is an X-counter. But if you use this with a counter or anything else, it will start with a known value. This is mandatory if you design without a reset.
Note that this inference is currently not supported for Spartan-6. Inferring an initial value only works for data types of std_logic, bit_vector, bit, but NOT integer. The initial value can also be set with a UCF INIT attribute for each FF. Don't forget that the SRL should not be initialized in your HDL (it will come up by default as 0).
In the VHDL example, the signal my_register will only become a register if used in conjunction with a clock. For example: my_register <= new_information when rising_edge(clk); Otherwise, my_register could just become wires, in which case the initial value will only appear for a short time during simulation and have no effect in implementation.
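The counter case mentioned in the notes can be sketched as follows: a free-running counter with no coded reset whose starting value comes only from the signal initialization. The entity name init_counter and the 8-bit width are assumptions; the counter is kept as std_logic_vector rather than integer, in line with the data-type restriction stated above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: counter with an inferred initial value and no reset.
entity init_counter is
  port (
    clk   : in  std_logic;
    count : out std_logic_vector(7 downto 0)
  );
end entity init_counter;

architecture rtl of init_counter is
  -- The initial value gives simulation a known start (no X propagation)
  -- and, where the tool supports it, sets the register INIT value.
  signal count_r : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      count_r <= std_logic_vector(unsigned(count_r) + 1);
    end if;
  end process;

  count <= count_r;
end architecture rtl;
```

Without the initializer, an RTL simulation of this counter would show count_r stuck at unknown values, which is the "X-counter" described in the notes.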

39 No Reset is Best
Synthesis can infer SRL-based shift registers
  But only if no resets are used (otherwise flip-flops are wasted)
  Or, the synthesis tool can emulate the reset
    This will use extra resources and take extra clock cycles to set up (not what you want)
There is no reset functionality built into the Shift-Register LUT. If you have a reset on the shift register you code, the synthesis tools are left with one of two choices. They either do not implement the SRL, meaning they will use a bunch of registers (not what you want), or they try to emulate the reset. Emulating the reset means that they add more logic and it becomes even slower than it should be.
In one customer design, a global set/reset ran through everything, and not all of the known shift registers could be mapped to SRLs. The design was in fact inferring 100 SRLs and doing a good job of using the dedicated hardware. By removing the global set/reset and obtaining 150 SRLs, the customer obtained even more uses of the SRL than had been recognized.
Can you infer an asynchronous reset with the SRL? Yes, but Synplify is the only tool to do that, and it adds logic to emulate the reset, which eliminates the benefit of using the SRL. Synplify can also add the glue logic to emulate a synchronous or asynchronous reset when using the DSP48. But likewise, this kills the benefit of saving registers. Note that XST cannot infer an asynchronous reset with the SRL.
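A shift register written without any reset, and without an HDL initial value, is the form the slide describes as SRL-friendly. The sketch below is a 16-stage, one-bit delay line with a clock enable; the entity name srl_delay and the depth are assumptions, and the final mapping to an SRL is up to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: 16-stage delay line with no reset.
entity srl_delay is
  port (
    clk : in  std_logic;
    ce  : in  std_logic;  -- clock enable, active high
    d   : in  std_logic;
    q   : out std_logic
  );
end entity srl_delay;

architecture rtl of srl_delay is
  -- No reset and no initial value: nothing here requires per-bit
  -- flip-flop control, so the chain can collapse into a shift-register LUT.
  signal sr : std_logic_vector(15 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        sr <= sr(14 downto 0) & d;  -- shift in at bit 0
      end if;
    end if;
  end process;

  q <= sr(15);  -- output after 16 enabled clock cycles
end architecture rtl;
```

Adding a reset branch to this process would force the tool into one of the two choices described above: sixteen flip-flops, or an SRL plus extra emulation logic.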

40 No Reset is Best Designs without resets have fewer timing paths
By an average of 18 percent fewer timing paths Results in less run time Improved performance Less memory necessary during PAR Tip: NO reset builds a faster design and saves run time Page 40

41 Use the GSR Routing can be considered one of the most valuable resources Resets compete for the same resources as the rest of the active signals of the design Including timing-critical paths More available routing gives the tools a better chance to meet your timing objectives You can see how much fanout a typical reset can have. This is probably the biggest reason to remove resets from your design. Can you migrate a reset to the global routing resources? Yes. Instantiate the STARTUP_VIRTEX6 component from the Xilinx Unified Library, add an IBUFG and connect it to the GSR input to the block, and assign the pin to a dedicated clock pin. The designer must also be aware that the tools will not automatically move resets to dedicated clock pins, and that they have to control this. Note that in the past, the implementation tools would infer the GSR signal for true global set/resets automatically. The implementation tools do not do this any more. Tip: Using the GSR saves routing and improves design speed Page 41
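
The wiring described in the notes (an IBUFG driving the GSR input of the STARTUP_VIRTEX6 primitive) might look roughly like the sketch below. The module and net names are hypothetical, and only the GSR port is connected here for brevity; the Libraries Guide instantiation template lists the complete port set and should be treated as the authority.

```verilog
// Sketch only: hypothetical module and net names.
// Requires the Xilinx primitive library (IBUFG, STARTUP_VIRTEX6).
module gsr_from_pin (
    input wire rst_pin   // assumed to be assigned to a dedicated clock pin
);
    wire rst_buf;

    // Bring the reset in through a global clock input buffer.
    IBUFG rst_ibufg (
        .I (rst_pin),
        .O (rst_buf)
    );

    // Drive the global set/reset net; the other STARTUP ports are
    // intentionally left unconnected in this sketch.
    STARTUP_VIRTEX6 startup_i (
        .GSR (rst_buf)
    );
endmodule
```

With this arrangement the user logic itself contains no reset net, so no general routing is spent on it.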

42 Block RAM Avoid "read before write" mode for fastest performance by instantiating your memory with the CORE Generator™ tool. Synplify and other third-party synthesis tools can insert bypass logic to prevent a possible mismatch between your RTL and hardware behavior. This logic is intended to force RAM outputs to a known value when read and write operations occur on the same memory cell. If you know this will never happen, you can use an attribute to prevent this logic from being added and damaging your performance: attribute syn_ramstyle of mem : signal is "no_rw_check"; Note: BRAM can effectively be inferred, and "read before write" can easily be avoided when coding. Inference gives the tools the additional advantage of choosing distributed RAM or BRAM based on synthesis options or attributes/constraints. Page 42
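
One way to avoid "read before write" behavior when inferring a memory is to code the RAM in write-first style, as in this sketch. The module name, parameters, and port names are assumptions for illustration; the synthesis tool still decides between block RAM and distributed RAM based on its options and constraints.

```verilog
// Hypothetical single-port RAM, coded write-first.
module ram_write_first #(
    parameter ADDR_WIDTH = 10,
    parameter DATA_WIDTH = 16
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1 << ADDR_WIDTH) - 1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= din;
            dout      <= din;        // write-first: output shows the new data
        end else begin
            dout      <= mem[addr];  // ordinary synchronous read
        end
    end
endmodule
```

A read-first description would instead assign dout from mem[addr] unconditionally, returning the old contents during a write.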

43 Clock Enable Control the use of clock enables from the code
Code them only when needed. If a low-fanout CE is necessary, use synthesis attributes to control the use of control signals at the signal or module level. Do not use global switches to turn off the use of CEs; this results in an average LUT increase of 25 percent. Consider using alternative coding methods for low-fanout clock enables. Xilinx is not recommending that you design asynchronously, just that you try to reduce the number of low-fanout CEs (fewer than eight registers). For basic HDL coding help, refer to the Language Templates included with the ISE software. For more information about the proper use of XST, good HDL coding techniques, and proper timing constraints, refer to the XST User Guide for Virtex-6 and Spartan-6 Devices. This is available from Help -> Software Manuals in the ISE software.
This will map the CE to a LUT input: VHDL: Q <= ((not CE) AND A) OR (CE AND Q); Verilog: Q <= (~CE & A) | (CE & Q);
This will map the CE to the control port: VHDL: if (CE = '1') then Q <= A; end if; Verilog: if (CE) Q <= A;
Page 43
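
The two coding styles on this slide can be placed side by side in one clocked module, as sketched below. All names are hypothetical. Note that the polarity here is chosen so that both registers load a when ce is high, which differs from the slide's LUT expression, and that a synthesis tool may still recognize the second form as a clock enable unless attributes steer it otherwise.

```verilog
// Hypothetical module comparing the two clock-enable coding styles.
module ce_styles (
    input  wire clk,
    input  wire ce,
    input  wire a,
    output reg  q_ctrl,  // enable coded as a control signal
    output reg  q_lut    // enable coded as data-path logic
);
    // Style 1: the if-without-else form normally maps ce to the
    // flip-flop's CE control port and adds one control set.
    always @(posedge clk)
        if (ce)
            q_ctrl <= a;

    // Style 2: the enable is written as a multiplexer in front of D,
    // so it can be absorbed into a LUT input instead of the CE port.
    always @(posedge clk)
        q_lut <= (ce & a) | (~ce & q_lut);
endmodule
```

For a low-fanout enable, the second style trades a LUT input for one fewer control set, which is the trade-off the slide describes.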

44 Global Clock Enable To gate entire clock domains for power reduction, use the clock-enabled global buffer resource BUFGCE or the BUFHCE. For applications that only pause the clock on small areas of the design, use the clock enable pin of the FPGA register. Tip: This will save general routing resources. Page 44
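
A sketch of gating a whole clock domain with the BUFGCE primitive follows. The wrapper module and net names are hypothetical, and clk_in is assumed to come from a clock-capable source; the primitive itself comes from the Xilinx library.

```verilog
// Sketch only: hypothetical wrapper around the BUFGCE primitive.
module gated_domain (
    input  wire clk_in,     // assumed to come from a clock-capable source
    input  wire domain_en,  // high = domain clock runs
    input  wire d,
    output reg  q
);
    wire clk_gated;

    // Global clock buffer with enable: stops the clock for the whole domain.
    BUFGCE bufgce_i (
        .I  (clk_in),
        .CE (domain_en),
        .O  (clk_gated)
    );

    // Every register in the gated domain is clocked from clk_gated.
    always @(posedge clk_gated)
        q <= d;
endmodule
```

Pausing only a handful of registers is better done with their own CE pins, as the slide notes.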

45 DSP Slice Use adder chains instead of adder trees
Adder Tree: Adder trees tend to have varying size. This usually makes larger adders in the last stages, which increases logic levels. Adder Chain: Spartan-6 and Virtex-6 FPGAs use adder chains, which obtain peak performance and use minimal power. Requires pipelining. Adds latency. For more information, refer to UG369: Virtex-6 FPGA DSP48E1 Slice User Guide. It is not recommended that you build this by hand. Xilinx recommends using the CORE Generator tool, MATLAB, or Simulink to build these components. Page 45
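
To make the adder-chain idea concrete, here is a small pipelined chain that sums four operands, one addition per stage. It is an illustration only (the notes recommend generating such structures with tools rather than by hand); the module name, the width parameter W, and the delay-register names are assumptions.

```verilog
// Hypothetical 4-input pipelined adder chain; result latency is 3 clocks.
module adder_chain4 #(
    parameter W = 16
) (
    input  wire         clk,
    input  wire [W-1:0] a, b, c, d,
    output reg  [W+1:0] sum
);
    reg [W:0]   s1;          // a + b
    reg [W+1:0] s2;          // a + b + c
    reg [W-1:0] c_d1;        // c delayed to line up with s1
    reg [W-1:0] d_d1, d_d2;  // d delayed to line up with s2

    always @(posedge clk) begin
        // Stage 1
        s1   <= a + b;
        c_d1 <= c;
        d_d1 <= d;
        // Stage 2: each stage adds one operand to the running sum
        s2   <= s1 + c_d1;
        d_d2 <= d_d1;
        // Stage 3
        sum  <= s2 + d_d2;
    end
endmodule
```

Each stage has the same shape (one adder plus a register), which is what lets it map onto cascaded DSP slices; the cost is the added latency and the balancing registers on the later operands.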

46 Synthesis Options Over-constraining during synthesis can significantly increase register use. Seen as an average increase of 1–5 percent. Do NOT over-constrain during synthesis. Global optimization can lead to mixed results. Can achieve ~10 percent flip-flop reduction. Gives back much of the utilization benefits (and sometimes more) due to control signals. FSM optimization: turning off FSM optimization can yield a small flip-flop savings. One-hot encoding is not as useful. Do NOT use slice or LUT compression switches. In some cases, latch-thrus are used and consume registers. Synthesis tools react to timing constraints by replicating logic and making designs bigger, which is how they improve performance. Remember that the implementation tools are already using worst-case temperature and voltage, so you already have some built-in slack. Apply only the timing constraints that you need. Global optimization widens your trees in terms of global fan-in. It also uses your set and reset signals as part of the optimization path, which is great for improving performance. Some designs benefit from this option, others do not. Most designs end up losing registers but gaining control sets. FSM optimization can yield more significant benefits if your design has many FSMs. With the Virtex-6 FPGA, the 6-LUT means one-hot encoding is not quite as useful as it used to be with the 4-LUT structure. Often, you can still meet performance by using binary or other encoding schemes and use far fewer registers. Note that designers should NEVER over-constrain during synthesis or implementation. Page 46

47 Synthesis Options Replicate registers with high fan-out
This allows high fan-out logic to be moved closer to destinations This can be determined from a timing report Manual duplication or replication constraints with the synthesis tools should be applied Retiming option should be used, especially if design has been pipelined Pipelining is still encouraged Synthesis tools react to timing constraints by replicating and making designs bigger, which is how they improve performance. Page 47

48 I/O Registers Tip: Use IOB registers when necessary to meet I/O timing
IOB registers provide fixed setup and clock-to-output times. They are the fastest way to capture input data and clock data off the device. IOB registers can make it difficult to meet internal timing, because their use can lengthen route delays to internal logic. Only use IOB registers when it is necessary to meet I/O timing. It is best to allow your synthesis tool to put registers into IOBs based on timing constraints (if your tool supports this). Otherwise, complete the following steps: disable global I/O register usage in your synthesis tool; disable the Map option to pack registers into IOBs; selectively move registers into IOBs with a UCF attribute. XST does NOT support migrating registers to the IOBs based on timing constraints. Synplify does support this. You can also assign the registers to the IOBs in your HDL or NCF (synthesis constraints file). UCF and NCF syntax: INST "instance_name" IOB={TRUE|FALSE}; where TRUE allows the flip-flop or latch to be pulled into an IOB and FALSE indicates not to pull it into an IOB. Example: the following statement instructs the mapper to place the foo/bar instance into an IOB component. INST "foo/bar" IOB=TRUE; The instance name for each register can be found in the synthesis tool's schematic viewer or in a timing report. Also note that -timing (timing-driven packing, another MAP option) does NOT move registers into the IOBs based on the timing constraints. Page 48
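
As a sketch of assigning a register to an IOB from the HDL rather than the UCF, the IOB constraint can be written as a Verilog attribute on the register. The module and signal names are hypothetical, and the attribute form shown is the one XST accepts; other synthesis tools may expect a different attribute, so treat this as an assumption to check against your tool's documentation.

```verilog
// Hypothetical input-capture register requested to be packed into the IOB.
module iob_reg_example (
    input  wire clk,
    input  wire pad_in,
    output wire data_out
);
    // Attribute form of the IOB constraint; TRUE allows the flip-flop
    // to be pulled into the IOB.
    (* IOB = "TRUE" *) reg pad_in_r;

    always @(posedge clk)
        pad_in_r <= pad_in;

    assign data_out = pad_in_r;
endmodule
```

For the packing to succeed, the register must connect directly to the pad with no logic in between, as it does here.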

49 Summary Avoid asynchronous resets on block RAMs (the block RAM’s output register only supports a synchronous reset) Avoid asynchronous resets on DSP slice resources (their flip-flops only support a synchronous reset) IOB registers can make it more difficult to meet internal timing Use IOB registers only for improving IO timing Xilinx recommends NOT using the synthesis option to convert asynchronous resets to synchronous Page 49

50 Summary Synthesis tools can move synchronous resets from control ports to the data path Avoid the use of global resets Initialize all registers from your HDL If you need a global reset use the Startup_Virtex6 or the Startup_Spartan6 primitive to access the GSR net If you can remove a global reset, you will save a lot of routing and build a faster design Avoid resets on SRLs (no reset functionality) Page 50

51 Where Can I Learn More? Software Manuals
Start -> Xilinx ISE Design Suite 12.1 -> ISE Design Tools -> Documentation -> Software Manuals. This includes the Synthesis & Simulation Design Guide; this guide has example inferences of many architectural resources. XST User Guide: HDL language constructs and coding recommendations. Software User Guides and software tutorials. Xilinx Education Services courses: Xilinx tools and architecture courses; hardware description language courses; Basic FPGA Architecture, Basic HDL Coding Techniques, and other free videos! Page 51

52 Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes. Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design. THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS. 
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT THESE LIMITATIONS OF LIABILITY. The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the Design in such High-Risk Applications is fully at your risk. © 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

