Objectives
After completing this module, you will be able to:
- Specify FPGA resources that may need to be instantiated
- Identify some basic design guidelines that successful FPGA designers follow
- Select a proper HDL coding style for fast, efficient circuits
Note: The guidelines in this module are not specific to any particular synthesis tool or Xilinx FPGA family.

Breakthrough Performance
Three steps to achieve breakthrough performance:
1. Utilize dedicated resources
   - Dedicated resources are faster than a LUT/flip-flop implementation and consume less power
   - Typically built with the CORE Generator tool and instantiated
   - DSP48E, FIFO, block RAM, ISERDES, OSERDES, EMAC, and MGT, for example
2. Write code for performance
   - Use synchronous design methodology
   - Ensure the code is written optimally for critical paths
   - Pipeline when necessary
3. Drive your synthesis tool
   - Try different optimization techniques
   - Add critical timing constraints in synthesis
   - Preserve hierarchy
   - Apply full and correct constraints
   - Use High effort
[Figure: performance meter, Virtex™-6 FPGA]

Note that "applying full and correct constraints" refers to applying constraints for all clocks in the design for use during implementation. Xilinx does not currently (this may change in the future) recommend the use of timing constraints during synthesis when synthesizing with XST. Additionally, false paths and multicycle paths should be correctly constrained, as should the I/O.
The timing closure flow chart was created to help achieve breakthrough performance.

Use Dedicated Blocks
- Dedicated block timing is correct by construction
  - Not dependent on programmable routing
  - Uses less power
- Offers as much as 3x the performance of soft implementations
- Examples
  - Block RAM and FIFO at 600 MHz
  - DSP48E at 600 MHz
[Figure: block diagrams of the DSP48E slice, the block RAM/FIFO (Smart RAM FIFO, dual-port BRAM), and the EMAC core with its PHY, host, statistics, client, and processor (DCR bus) interfaces]

Instantiation versus Inference
- Instantiate a component when you must dictate exactly which resource is needed
  - The synthesis tool is unable to infer the resource
  - The synthesis tool fails to infer the resource
- Xilinx recommends inference whenever possible
  - Inference makes your code more portable
- Xilinx recommends using the CORE Generator software to create functions such as Arithmetic Logic Units (ALUs), fast multipliers, and Finite Impulse Response (FIR) filters for instantiation
- Xilinx recommends using the Architecture Wizard utility to create DCM, PLL, and clock buffer instantiations

Instantiation: Directly referencing a library primitive or macro in your HDL.
Inference: Writing a Register Transfer Level (RTL) description of circuit behavior that the synthesis tool converts into library primitives.
Why instantiate? Instantiation is useful when you cannot infer the component. For example, inferring the DCM component of a Spartan-6 FPGA is not possible; hence, instantiating the DCM block is the only way to use it.

FPGA Resources
Can be inferred by all synthesis tools
- Shift register LUT (SRL16E/SRLC32E)
- F7 and F8 multiplexers
- Carry logic
- Multipliers and counters using the DSP48E
- Global clock buffers (BUFG)
- SelectIO™ (single-ended) interface
- I/O registers (single data rate)
- Input DDR registers
Can be inferred by some synthesis tools
- Memories
- Global clock buffers (BUFGCE, BUFGMUX, BUFGDLL)
- Some DSP functions
Cannot be inferred by any synthesis tools
- SelectIO (differential) interface
- Output DDR registers
- DCM / PLL
- Local clock buffers (BUFIO, BUFR)

Suggested Instantiation
Xilinx recommends that you instantiate the following elements:
- Memory resources
  - Block RAMs specifically (use the CORE Generator software to build large memories)
- SelectIO interface resources
- Clocking resources
  - DCM, PLL (use the Architecture Wizard)
  - IBUFG, BUFGMUX_CTRL, BUFGCE
  - BUFIO, BUFR

Suggested Instantiation
- Why does Xilinx suggest this?
  - Easier to port your HDL to other and newer technologies
  - Fewer synthesis constraints and attributes to pass on
    - Keeping most of the attributes and constraints in the Xilinx User Constraints File (UCF) keeps it simple: one file contains the critical information
- Create a separate hierarchical block for instantiating these resources
  - Above the top-level block, create a Xilinx "wrapper" with instantiations specific to Xilinx
  - Alternatively, use VHDL configuration statements or put wrappers around each instantiation
    - This maintains hierarchy and makes it easy to swap instantiations
[Figure: Xilinx "wrapper" top_xlnx surrounding the top-level block and containing the STARTUP, IBUFG, DCM, BUFG, IBUF, and OBUF instantiations]

Hierarchy Management
Synplify and XST software
- The basic settings are
  - Flatten the design: Allows total combinatorial optimization across all boundaries
  - Maintain hierarchy: Preserves hierarchy without allowing optimization of combinatorial logic across boundaries (recommended)
- If you have followed the synchronous design guidelines, use the setting -maintain hierarchy
- If you have not followed the synchronous design guidelines, use the setting -flatten the design
- Consider using the "keep" attribute to preserve nodes for testing
- Your synthesis tool may have additional settings
  - Refer to your synthesis documentation for details on these settings

To access hierarchy control:
- Synplify software: SCOPE Constraints Editor. Synplify also has an additional setting: Maintain hierarchy but allow optimization. This setting allows combinatorial logic to be optimized while maintaining hierarchy in the netlist (the setting in Synplify is "firm").
- XST: Turn on the Advanced Property Display level in the Edit > Preferences dialog box. Then look under Properties for the Synthesize process > Synthesis Options tab > Keep Hierarchy.

Hierarchy Preservation Benefits
- Easily locate problems in the code based on the hierarchical instance names contained within static timing analysis reports
- Enables floorplanning and incremental design flow
- The primary advantage of flattening is to optimize combinatorial logic across hierarchical boundaries
  - If the outputs of leaf-level blocks are registered, there is generally no need to flatten

Registering the outputs of each leaf-level block is part of the synchronous design methodology. Registering the output boundaries helps because you know the delays from one block to the next. That is, the delays do not vary with combinatorial outputs. Logic cannot be optimized across a registered boundary. Therefore, if you do register outputs, you know the delay is minimized from one hierarchical or functional block to the next, and you also know that no logic optimization can occur across hierarchical domains.
In addition to the benefits listed above, preserving hierarchy limits name changes to registers, so the element names used in a UCF will generally not change. If you flatten the design, the register and element names and the hierarchical paths and references can change from one iteration to the next. In this case, maintaining the UCF can be quite a burden.
However, preserving hierarchy can prevent register balancing (retiming) and register duplication. Nevertheless, the benefits of preserving hierarchy generally outweigh the benefits of flattening, except when you have combinatorial outputs.
In general, preserve hierarchy for large designs. For smaller designs, preserve the hierarchy if you registered leaf-level outputs; otherwise, you might consider flattening the design. If you flatten the design, remember the extra burdens of name changes (UCF and static timing analysis) from one iteration to the next and the limits on floorplanning.

Multiplexers
- Multiplexers are generated from IF and CASE statements
  - IF/THEN statements generate priority encoders
  - Use a CASE statement to generate complex encoding
- There are several issues to consider with a multiplexer
  - Delay and size
    - Affected by the number of inputs and the number of nested clauses in an IF/THEN or CASE statement
  - Unintended latches or clock enables
    - Generated when IF/THEN or CASE statements do not cover all conditions
    - Review your synthesis tool warnings
    - Check by looking at the component with a schematic viewer

IF/THEN Statement
Priority Encoder
- Most critical input listed first
- Least critical input listed last
[Figure: chain of 2:1 multiplexers selecting do_e, do_c, do_b, do_a, and do_d under cond_c, cond_b, cond_a, and crit_sig, driving oput]

    IF (crit_sig) THEN oput <= do_d;
    ELSIF cond_a THEN oput <= do_a;
    ELSIF cond_b THEN oput <= do_b;
    ELSIF cond_c THEN oput <= do_c;
    ELSE oput <= do_e;
    END IF;

Avoid Nested IF and IF/ELSE
- Nested IF or IF/THEN/ELSE statements form priority encoders
- CASE statements do not have priority
- If nested IF statements are necessary, put critical input signals on the first IF statement
  - The critical signal ends up in the last logic stage

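The IF/THEN priority example is given in VHDL only. A minimal Verilog sketch of the same structure follows; the module name priority_mux and the 1-bit port widths are assumptions made for illustration, while the signal names come from the IF/THEN slide.

```verilog
// Hypothetical module; signal names follow the IF/THEN example,
// widths (1 bit each) are assumed.
module priority_mux (
    input  wire crit_sig, cond_a, cond_b, cond_c,
    input  wire do_a, do_b, do_c, do_d, do_e,
    output reg  oput
);
    // An if / else-if chain describes a priority encoder.
    // crit_sig is tested first, so it feeds the last logic stage
    // and passes through the fewest logic levels before oput.
    always @* begin
        if (crit_sig)    oput = do_d;
        else if (cond_a) oput = do_a;
        else if (cond_b) oput = do_b;
        else if (cond_c) oput = do_c;
        else             oput = do_e;  // final else covers all cases: no latch
    end
endmodule
```

Moving a late-arriving condition toward the top of the chain is the only change needed to shorten its path; the actual number of LUT levels depends on the synthesis tool and target device.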
CASE Statements
- CASE statements in a combinatorial process (VHDL) or always statement (Verilog)
  - Latches are inferred if outputs are not defined in all branches
  - Use default assignments before the CASE statement to prevent latches
- CASE statements in a sequential process (VHDL) or always statement (Verilog)
  - Clock enables are inferred if outputs are not defined in all branches
  - This is not "wrong", but might generate a long clock enable equation
  - Use default assignments before the CASE statement to prevent clock enables

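As a minimal sketch of the default-assignment guideline for a combinatorial CASE statement, consider the following. The module name case_defaults, the port names, and the chosen default values are all hypothetical.

```verilog
// Hypothetical example; names and default values are illustrative only.
module case_defaults (
    input  wire [1:0] sel,
    input  wire       in_a, in_b,
    output reg        out1, out2
);
    always @* begin
        // Default assignments: both outputs receive a value on every
        // evaluation, so no latch is inferred even though each branch
        // below assigns only one of them.
        out1 = 1'b0;
        out2 = 1'b0;
        case (sel)
            2'b00:   out1 = in_a;
            2'b01:   out2 = in_b;
            2'b10:   out1 = in_b;
            default: ;             // keep the defaults
        endcase
    end
endmodule
```

Without the two default lines, out2 would be unassigned when sel is 2'b00 or 2'b10 and out1 would be unassigned when sel is 2'b01, which is the situation that leads to inferred latches.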
CASE Statements
- Register the select inputs if possible (pipelining)
  - Can reduce the number of logic levels between flip-flops
- Consider using one-hot select inputs
  - Eliminating the select decoding can improve performance
- Determine how your synthesis tool synthesizes the order of the select lines
  - If there is a critical select input, this input should be included "last" in the logic for fastest performance

CASE Statement
- This Verilog code describes a 6:1 multiplexer with binary-encoded select inputs
  - This uses fewer LUTs, but requires multiple LUTs in series on the timing-critical path
- The advantage of using "don't care" for the default is that the synthesizer has more flexibility to create a smaller, faster circuit
- How could the code be changed to use one-hot select inputs?

    module case_binary (clock, sel, data_out, in_a, in_b, in_d, in_c, in_e, in_f);
      input clock;
      input [2:0] sel;
      input in_a, in_b, in_c, in_d, in_e, in_f;
      output data_out;
      reg data_out;

      always @(posedge clock)
      begin
        case (sel)
          3'b000  : data_out <= in_a;
          3'b001  : data_out <= in_b;
          3'b010  : data_out <= in_c;
          3'b011  : data_out <= in_d;
          3'b100  : data_out <= in_e;
          3'b101  : data_out <= in_f;
          default : data_out <= 1'bx;
        endcase
      end
    endmodule

VHDL version (entity declaration omitted):

    process (clock)
    begin
      if (clock'event and clock = '1') then
        case (sel) is
          when "000"  => data_out <= in_a;
          when "001"  => data_out <= in_b;
          when "010"  => data_out <= in_c;
          when "011"  => data_out <= in_d;
          when "100"  => data_out <= in_e;
          when "101"  => data_out <= in_f;
          when others => data_out <= 'X';
        end case;
      end if;
    end process;

CASE Statement
- This is the same code with one-hot select inputs
  - This uses more LUTs, but requires fewer logic levels on the timing-critical path
  - This yields a greater benefit when the mux is larger
- Enumerated types allow you to quickly test different encodings
  - ...and make simulation more readable

    module case_onehot (clock, sel, data_out, in_a, in_b, in_d, in_c, in_e, in_f);
      input clock;
      input [5:0] sel;
      input in_a, in_b, in_c, in_d, in_e, in_f;
      output data_out;
      reg data_out;

      always @(posedge clock)
      begin
        case (sel)
          6'b000001 : data_out <= in_a;
          6'b000010 : data_out <= in_b;
          6'b000100 : data_out <= in_c;
          6'b001000 : data_out <= in_d;
          6'b010000 : data_out <= in_e;
          6'b100000 : data_out <= in_f;
          default   : data_out <= 1'bx;
        endcase
      end
    endmodule

VHDL version (entity declaration omitted):

    process (clock)
    begin
      if (clock'event and clock = '1') then
        case (sel) is
          when "000001" => data_out <= in_a;
          when "000010" => data_out <= in_b;
          when "000100" => data_out <= in_c;
          when "001000" => data_out <= in_d;
          when "010000" => data_out <= in_e;
          when "100000" => data_out <= in_f;
          when others   => data_out <= 'X';
        end case;
      end if;
    end process;

Other Basic Performance Tips
- Avoid high-level loop constructs
  - Synthesis tools may not produce optimal results
- Order and group arithmetic and logical functions and operators
  - A <= B + C + D + E; should be: A <= (B + C) + (D + E);
- Use a synchronous reset
  - More reliable system control

Avoiding inadvertent latch inference can easily be accomplished with default assignments.

    // Default assignments before if-then-else or case statement.
    // Now all outputs are assigned in every branch, avoiding
    // inadvertent latch inference.
    out1 = 1'b0;
    out2 = input2;
    out3 = out3_registered;
    if (a == b)
      out1 = a;
    else if (a == c)
      out2 = b;
    else if (a == d)
      out3 = c;

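The grouping tip can be sketched as follows. The module name add4, the width parameter W, and the two output names are hypothetical; the point is only the difference between the chained and the grouped expression.

```verilog
// Hypothetical module comparing the two ways of writing a four-input sum.
module add4 #(parameter W = 8) (
    input  wire [W-1:0] b, c, d, e,
    output wire [W+1:0] chained,
    output wire [W+1:0] balanced
);
    // Chained form: described as three adders in series.
    assign chained  = b + c + d + e;

    // Grouped form: (b + c) and (d + e) can be computed in parallel,
    // followed by a single final adder, giving two adder levels.
    assign balanced = (b + c) + (d + e);
endmodule
```

Both outputs carry the same value; only the described structure differs. Some synthesis tools rebalance such expressions on their own, so the benefit of explicit grouping should be checked in the timing report rather than assumed.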
Synchronous Design Rewards
- Always make your design synchronous
  - Recommended for all FPGAs
- Failure to use synchronous design can potentially
  - Waste device resources
    - Not using a synchronous element will not save silicon and it wastes money
  - Waste performance
    - Reduces capability of end products; higher speed grades cost more
  - Lead to difficult design process
    - Difficult timing specifications and tool-effort levels
  - Cause long-term reliability issues
    - Probability, race conditions, temperature, and process effects
- Synchronous designs have
  - Few clocks
  - Synchronous resets
  - No gated clocks; instead, clock enables

All FPGAs require synchronous design techniques to run reliably. Synchronous design techniques also help the software tools and lead to higher performance implementations in lower speed grade (lower cost) devices.
All the challenging silicon design problems of balanced (near zero skew) clock trees and synchronous elements have already been solved by Xilinx when it manufactures the devices. Because the clock networks exist, and every block can (or has to) be synchronous, make the most of the device and design synchronously.

Inferred Register Examples

Ex 1. D Flip-Flop

    always @(posedge CLOCK)
      Q <= D_IN;

Ex 2. D Flip-Flop with Asynch Preset

    always @(posedge CLOCK or posedge PRESET)
      if (PRESET)
        Q <= 1;
      else
        Q <= D_IN;

Ex 3. D Flip-Flop with Asynch Reset

    always @(posedge CLOCK or posedge RESET)
      if (RESET)
        Q <= 0;
      else
        Q <= D_IN;

Ex 4. D Flip-Flop with Synch Reset

    always @(posedge CLOCK)
      if (RESET)
        Q <= 0;
      else
        Q <= D_IN;

Clock Enables
- Coding style will determine if clock enables are used

VHDL

    FF_AR_CE: process (CLK)
    begin
      if (CLK'event and CLK = '1') then
        if (ENABLE = '1') then
          Q <= D_IN;
        end if;
      end if;
    end process;

Verilog

    always @(posedge CLOCK)
      if (ENABLE)
        Q <= D_IN;

Summary
- Use as much of the dedicated hardware resources as possible to ensure optimum speed and device utilization
- Plan on instantiating clocking and memory resources
- Try to use the CORE Generator tool to create optimized components that target dedicated FPGA resources (BRAM, DSP48E, and FIFO)
- Maintain your design hierarchy to make debugging, simulation, and report generation easier

Summary
- CASE and IF/THEN statements produce different types of multiplexers
  - CASE statements tend to build logic in parallel while IF/THEN statements tend to build priority encoders
- Avoid nested CASE and IF/THEN statements
- You should always build a synchronous design for your FPGA
- Inferring many types of flip-flops from HDL code is possible
  - Synchronous sets/resets are preferred

Where Can I Learn More?
- Software Manuals
  - Start > Xilinx ISE Design Suite 13.1 > ISE Design Tools > Documentation > Software Manuals
  - This includes the Synthesis & Simulation Design Guide
    - This guide has example inferences of many architectural resources
  - XST User Guide
    - HDL language constructs and coding recommendations
  - Software User Guides and software tutorials
- Xilinx Education Services courses
  - Xilinx tools and architecture courses
  - Hardware description language courses
  - Basic FPGA architecture and other topics

Objectives
After completing this module, you will be able to:
- Identify some basic design guidelines that successful FPGA designers follow
- Select a proper HDL coding style for fast, efficient finite state machines
- Easily pipeline your design
Note: The guidelines in this module are not specific to any particular synthesis tool or Xilinx FPGA family.

State Machine Design
- Put the next-state logic in one CASE statement
  - The state register can also be included here or in a separate process block or always block
- Put the state machine outputs in a separate process or always block
  - Prevents resource sharing, which can hurt performance
[Figure: state machine module with inputs to the FSM and states S1 to S5, mapped to three HDL code sections: next-state logic, state register, and state machine outputs]

Finite State Machines (FSMs) are faster when they are in separate processes because the combinatorial logic does not share resources; hence, logic can be combined into a single Look-Up Table (LUT).

The Perfect State Machine
The perfect state machine has...
- Inputs: Input signals and state jumps
- Outputs: Output states, control signals, and enable signals to the rest of the design
- NO arithmetic logic, datapaths, or combinatorial functions inside the state machine
[Figure: input signals and current-state feedback drive the next-state logic (state jumps only), which feeds the state register; the state register drives the output state and enables]

State Machine Encoding
- Use enumerated types to define state vectors (VHDL)
  - Most synthesis tools have commands to extract and re-encode state machines described in this way
- Use one-hot encoding for high-performance state machines
  - Uses more registers, but simplifies next-state logic
  - Examine trade-offs: Gray and Johnson encoding styles can also improve performance
  - Refer to the documentation of your synthesis tool to determine how it chooses the default encoding scheme
- Register state machine outputs for higher performance

One-hot: The advantage of using one-hot encoding in Xilinx FPGAs is that the next-state decoding logic can be simplified to logic equations with six inputs or fewer, which can fit into a single LUT. This maximizes the performance of the state machine.
Many synthesis tools automatically choose one-hot encoding for state machines when you target a Xilinx FPGA, so check your synthesis tool's documentation.

Benefits of FSM Encoding
- Binary
  - Smallest (fewest registers)
  - Complex FSM tends to build multiple levels of logic (slow)
  - Synthesis tools usually map to this encoding when the FSM has eight or fewer states
- One-hot
  - Largest (more registers), but simplifies next-state logic (fast)
  - Synthesis tools usually map to this encoding when the FSM has between 8 and 16 states
  - Always evaluate undefined states (you may need to cover your undefined states)
- Gray and Johnson
  - Efficient size and can have good speed
- Which is best?
  - Depends on the number of states, inputs to the FSM, and complexity of transitions
- How do you determine which is best?
  - Build your FSM, then synthesize it for each encoding and compare size and speed

By choosing an enumerated type for your FSM, you can easily experiment with each of these encoding techniques.
You can also simulate with binary encoding (easy to read) and then re-synthesize with a different encoding before implementation.

State Machine Example (Verilog)

    module STATE(signal_a, signal_b, clock, reset, usually_one, usually_zero);
      input signal_a, signal_b, clock, reset;
      output usually_one, usually_zero;
      reg [4:0] current_state, next_state;
      parameter s0 = 0, s1 = 1, s2 = 2, s3 = 3, s4 = 4;

      always @(posedge clock or posedge reset)
      begin
        if (reset)
          current_state <= s0;
        else
          current_state <= next_state;
      end

- Outputs are not defined here (good)
  - Placed in a separate always block
- Asynchronous reset (bad)

This is an example of a simple state machine. Note that the outputs are not defined in this always block.
The first always block is the synchronous portion of the state machine. On a reset, the state machine returns to s0. Otherwise, the next state is loaded on each clock edge.

State Machine Example (Verilog)

      always @(current_state or signal_a or signal_b)
      begin
        case (current_state)
          s0: if (signal_a)
                next_state = s0;
              else
                next_state = s1;
          s1: if (signal_a && ~signal_b)
                next_state = s4;
              else
                next_state = s2;
          s2: next_state = s4;
          s3: next_state = s3;
          s4: next_state = s0;
          default: next_state = 'bx;
        endcase
      end
    endmodule

- Use a default statement as part of your next-state assignments (good)

This always block shows the next-state logic.
The state machine output logic is not shown here, but it would use a CASE statement similar to this one to determine the values of the outputs usually_one and usually_zero based on the current state (and perhaps signal_a and signal_b).

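One way the omitted output logic might look is sketched below as a stand-alone block. The module name state_outputs is hypothetical, and the output values per state are assumptions borrowed from the VHDL version of this example (usually_zero asserted in s1 when signal_a is high and signal_b is low, usually_one deasserted in s3); the source does not specify them for the Verilog version.

```verilog
// Hypothetical output decoder for the STATE example.
// The per-state output values are assumed, not taken from the slide.
module state_outputs (
    input  wire [4:0] current_state,
    input  wire       signal_a, signal_b,
    output reg        usually_one, usually_zero
);
    parameter s0 = 0, s1 = 1, s2 = 2, s3 = 3, s4 = 4;

    always @* begin
        // Defaults first, so both outputs are assigned in every branch.
        usually_one  = 1'b1;
        usually_zero = 1'b0;
        case (current_state)
            s1: if (signal_a && ~signal_b) usually_zero = 1'b1;
            s3: usually_one = 1'b0;
            default: ;   // all other states keep the defaults
        endcase
    end
endmodule
```

Keeping this logic in its own always block, separate from the next-state CASE statement, follows the structure recommended on the State Machine Design slide. For higher performance the two outputs could additionally be registered.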
Binary Encoding (Verilog)
- Test different FSM encodings yourself (good)
  - Don't always trust your synthesis tool to choose the best encoding

    reg [1:0] current_state, next_state;
    parameter state1 = 2'b00, state2 = 2'b01,
              state3 = 2'b10, state4 = 2'b11;

    always @(current_state)
      case (current_state)
        state1 : next_state = state2;
        state2 : next_state = state3;
        state3 : next_state = state4;
        state4 : next_state = state1;
      endcase

    always @(posedge clock)
      current_state <= next_state;

The previous example used integers to represent the states. You may also use a PARAMETER statement to explicitly define the state values. This example shows a binary-encoded state machine.

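To try the one-hot alternative by hand, the same four-state sequence can be re-encoded by changing only the parameter values and the register width. The sketch below is a hypothetical stand-alone module (onehot_fsm); the synchronous reset input and the default branch are additions that the slide's fragment does not have.

```verilog
// Hypothetical one-hot version of the four-state example.
module onehot_fsm (
    input  wire       clock,
    input  wire       reset,          // synchronous reset (assumed)
    output reg  [3:0] current_state
);
    // One flip-flop per state: exactly one bit is set at a time.
    parameter state1 = 4'b0001,
              state2 = 4'b0010,
              state3 = 4'b0100,
              state4 = 4'b1000;

    reg [3:0] next_state;

    always @* begin
        case (current_state)
            state1:  next_state = state2;
            state2:  next_state = state3;
            state3:  next_state = state4;
            state4:  next_state = state1;
            default: next_state = state1;  // recover from undefined states
        endcase
    end

    always @(posedge clock) begin
        if (reset) current_state <= state1;
        else       current_state <= next_state;
    end
endmodule
```

Some synthesis tools extract and re-encode state machines automatically, in which case the hand-written encoding may be overridden; check the FSM encoding setting of your tool before comparing size and speed.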
State Machine Example (VHDL)

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity STATE is
      port (
        signal_a, signal_b: in STD_LOGIC;
        clock, reset: in STD_LOGIC;
        usually_zero, usually_one: out STD_LOGIC
      );
    end STATE;

    architecture STATE_arch of STATE is
      type STATE_TYPE is (s0, s1, s2, s3);
      signal current_state, next_state: STATE_TYPE;
      signal usually_zero_comb, usually_one_comb : STD_LOGIC;
    begin

This is an example of a simple state machine.

State Machine Example (VHDL)

    COMB_STATE_MACHINE: process(current_state, signal_a, signal_b)
    begin
      next_state <= s0;
      usually_zero_comb <= '0';
      usually_one_comb <= '1'; -- set default to one and reset to zero when necessary
      case current_state is
        when s0 =>
          next_state <= s1;
          if signal_a = '1' then
            next_state <= s0;
          end if;
        when s1 =>
          next_state <= s2;
          if signal_a = '1' AND signal_b = '0' then
            next_state <= s3;
            usually_zero_comb <= '1';
          end if;
        when s2 =>
          null;
        when s3 =>
          usually_one_comb <= '0';
        when others =>
          null;
      end case;
    end process;

- Default state is used to define output values (good)

This process contains the next-state logic and the state machine output logic.
You could also separate the state machine outputs into their own process, which is recommended for larger and more complex state machines.

State Machine Example (VHDL)

    SYNCH_STATE_MACHINE: process(clock, reset)
    begin
      if (reset = '1') then
        current_state <= s0;
        usually_zero <= '0';
        usually_one <= '1';
      elsif (clock'event and clock = '1') then
        current_state <= next_state;
        usually_zero <= usually_zero_comb;
        usually_one <= usually_one_comb;
      end if;
    end process;
    end STATE_arch;

- Asynchronous reset (bad, unreliable)

This is the synchronous portion of the state machine. On a reset, the state machine returns to s0. Otherwise, the next state is loaded on each clock edge.
Notice that the state machine outputs are also registered in this process. If the state machine outputs were purely combinatorial, they would not be included here.

Unspecified Encoding (VHDL)

    entity EXAMPLE is
      port( A, B, C, D, E, CLOCK: in std_logic;
            X, Y, Z: out std_logic);
    end EXAMPLE;

    architecture XILINX of EXAMPLE is
      type STATE_LIST is (S1, S2, S3, S4, S5, S6, S7);
      signal STATE: STATE_LIST;
    begin
      P1: process( CLOCK )
      begin
        if( CLOCK'event and CLOCK = '1') then
          case STATE is
            when S1 =>
              X <= '0';
              Y <= '1';
              Z <= '1';
              if( A = '1' ) then
                STATE <= S2;
              else
                STATE <= S1;

- Undefined encoding (bad, probably inefficient)

Most synthesis tools will implement an "unspecified encoding" as a binary-encoded state machine. The synthesis tool assigns state values starting with the leftmost value in the list. In this example, S1 = 000, S2 = 001, ... and S7 = 110.

One-Hot Encoding (VHDL)

    architecture one_hot_arch of one_hot is
      subtype state_type is std_logic_vector(5 downto 0);
      signal current_state, next_state: state_type;
      constant s0 : state_type := "000001";
      constant s0_bit  : integer := 0;
      constant s1_bit  : integer := 1;
      constant s2_bit  : integer := 2;
      constant s3_bit  : integer := 3;
      constant s4a_bit : integer := 4;
      constant s4b_bit : integer := 5;
      signal usually_zero_comb, usually_one_comb : std_logic;
    begin
      comb_state_machine: process(current_state, signal_a, signal_b, signal_c, signal_d)
      begin
        next_state <= state_type'(others => '0');
        if current_state(s0_bit) = '1' then
          if signal_a = '1' then
            next_state(s0_bit) <= '1';
          else
            next_state(s1_bit) <= '1';
          end if;
        end if;
        if current_state(s1_bit) = '1' then
          next_state(s4a_bit) <= '1';
        end if;

- One-hot encoding is a little harder in VHDL (use your synthesis tool's attribute, if possible)

Most synthesis tools have options to compile state machines as binary or one-hot. If your synthesis tool has this option, use the "Unspecified Encoding" example along with this synthesis option to create a one-hot state machine.
Some synthesis tools also allow for an attribute of "one-hot" on a user-defined type. This is another way to get a one-hot state machine without resorting to the more cumbersome syntax shown here.

Pipelining Concept
[Figure: a datapath with two logic levels between flip-flops, fMAX = n MHz; the same datapath with a flip-flop inserted between the two levels (one level per stage), fMAX ≈ 2n MHz]

Inserting flip-flops into a datapath is called pipelining.
Pipelining increases performance by reducing the number of logic levels (LUTs) between flip-flops.
All Xilinx FPGA device families support pipelining. The basic slice structure is a logic level (six-input LUT) followed by a flip-flop.
Adding a pipeline stage, as shown in this example, will not exactly double fMAX. The flip-flop that is added to the circuit has an input setup time and a clock-to-Q time that make the pipelined circuit run at less than double the original frequency.
You will see a more detailed example of increasing performance by pipelining later in this section.

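The idea can be sketched in Verilog with a four-input sum standing in for the two logic levels. The module name pipe_example, the width parameter W, and all signal names are hypothetical; the sketch computes the same value with and without the extra register stage so the two can be compared.

```verilog
// Hypothetical example: the same sum, unpipelined and pipelined.
module pipe_example #(parameter W = 16) (
    input  wire         clk,
    input  wire [W-1:0] a, b, c, d,
    output reg  [W+1:0] sum_flat,    // one register stage, longer logic path
    output reg  [W+1:0] sum_piped    // two register stages, shorter paths
);
    reg [W:0] ab, cd;                // added pipeline registers

    always @(posedge clk) begin
        // Unpipelined: all of the addition logic sits between flip-flops.
        sum_flat  <= a + b + c + d;

        // Pipelined: stage 1 registers the partial sums,
        // stage 2 adds them. Same value, one clock cycle later.
        ab        <= a + b;
        cd        <= c + d;
        sum_piped <= ab + cd;
    end
endmodule
```

sum_piped lags sum_flat by exactly one clock cycle, which is the latency cost that the surrounding design has to absorb. How much fMAX improves depends on the device, the routing, and the setup and clock-to-Q times mentioned above.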
Pipelining
Three situations in which to pipeline:
- Register I/O
  - Usually done by the designer from the beginning
- Register the outputs of each leaf-level block
  - Typically done after timing analysis
  - Can easily be done for purely combinatorial components
- Register high-fanout secondary control signals (Set, Reset, CEs)

These are just the most obvious suggestions. For every design, there may be more tricks or other clever things that can improve performance.
Pipelining is the one thing that helps the most, and for most systems today pipelining is an option because bandwidth, not latency, defines the system. Latency can be important, but when it is, it is usually of a different order of magnitude than the latency added by pipelining.
FPGAs have lots of registers, so retiming and clever use of arithmetic functions can yield tremendous performance. If designers need to balance the latency among different paths in the system, SRLs can be used to compensate efficiently for delay differences.

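A latency-balancing delay line of the kind mentioned above can be sketched as a plain shift register. The module name delay_line and its parameters are hypothetical, and DEPTH is assumed to be at least 2. Whether the synthesis tool maps it onto SRL primitives depends on the tool and its settings.

```verilog
// Hypothetical delay line: dout is din delayed by DEPTH enabled clocks.
// Assumes DEPTH >= 2.
module delay_line #(parameter W = 8, parameter DEPTH = 4) (
    input  wire         clk,
    input  wire         ce,
    input  wire [W-1:0] din,
    output wire [W-1:0] dout
);
    // No reset is described: SRL primitives have no reset input, so a
    // reset on this register generally prevents SRL mapping.
    reg [W*DEPTH-1:0] sr;

    always @(posedge clk)
        if (ce)
            sr <= {sr[W*(DEPTH-1)-1:0], din};  // shift one word per clock

    assign dout = sr[W*DEPTH-1 -: W];          // oldest word in the chain
endmodule
```

Setting DEPTH to the number of pipeline stages added on the slower path keeps the two paths aligned when they are recombined.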
Performance by Design
Code A
- One level of logic, but the routing can be prohibitive
- May require a higher speed grade, adding cost
Code B
- One level of logic
- Maximum time for routing of the high-fanout net
- Flip-flop adds nothing to the cost
- Data_in must also be registered
[Figure: Code A: Switch and Enable are decoded directly onto the high-fanout CE of the reg_data register bank. Code B: Switch and Enable are decoded into the reg_enable flip-flop, which drives the high-fanout CE of reg_data; data_in is registered as well]

In Code B, reg_data is enabled one clock cycle later, so it is important to ensure that data_in is still valid. This may require you to balance the latency on the input path for data_in in Code B. Recall that the data will be available one cycle later than in Code A.
Adding the pipeline flip-flop really costs nothing because it comes with the LUT. Again, you increase gate count and increase performance without adding to the cost.
In practice, this coding style leads to preparing clock enables one cycle in advance of when they are required. Note that the addition of the register adds some latency that you will have to plan for as well.

Performance by Design (Verilog)
- These two pieces of code are not functionally identical
  - Code B (on the right) forms a pipeline stage for the circuit and improves its speed; Code A does NOT
- In each case
  - reg_data and data_in are 16-bit buses
  - switch and enable are outputs from flip-flops

Code A

    always @(posedge clk)
    begin
      if (switch && enable)
        reg_data <= data_in;
    end

Code B

    always @(posedge clk)
    begin
      if (switch && enable)
        reg_enable <= 1'b1;
      else
        reg_enable <= 1'b0;
      if (reg_enable)
        reg_data <= data_in;
    end

The code on the left performs a decode as part of the enable. The code on the right performs the decode separately to form a registered signal, which is then used to enable a bank of registers.
Real designs may have a far more complicated set of rules for loading this register, which usually increases the levels of logic and the delay. Here is an example:

    if address = X"1B3D" then
      reg_data <= processor_output;
    end if;

Performance by Design (VHDL)
- These two pieces of code are not functionally identical
  - The code on the right forms a pipeline stage for the circuit and improves its speed
- In each case
  - reg_data and data_in are 16-bit buses
  - switch and enable are outputs from flip-flops

Code A

    capture: process (clk)
    begin
      if clk'event and clk = '1' then
        if switch = '1' and enable = '1' then
          reg_data <= data_in;
        end if;
      end if;
    end process;

Code B

    capture: process (clk)
    begin
      if clk'event and clk = '1' then
        if switch = '1' and enable = '1' then
          reg_enable <= '1';
        else
          reg_enable <= '0';
        end if;
        if reg_enable = '1' then
          reg_data <= data_in;
        end if;
      end if;
    end process;

The code on the left performs a decode as part of the enable. The code on the right performs the decode separately to form a registered signal, which is then used to enable a bank of registers.
Real designs may have a far more complicated set of rules for loading this register, which usually increases the levels of logic and the delay. For example:

    if address = X"1B3D" then
      reg_data <= processor_output;
    end if;

Summary
- When coding a state machine, separate the next-state logic from the state machine output equations
- Evaluate whether you need to use binary, one-hot, Gray, or Johnson encoding style for your FSM
  - This will yield a smaller and/or faster FSM
- Pipeline data paths to improve speed
