Basic HDL Coding Techniques


1 Basic HDL Coding Techniques
Part 1

2 Objectives After completing this module, you will be able to:
  Specify FPGA resources that may need to be instantiated
  Identify some basic design guidelines that successful FPGA designers follow
  Select a proper HDL coding style for fast, efficient circuits
Note: The guidelines in this module are not specific to any particular synthesis tool or Xilinx FPGA family.

3 Breakthrough Performance
Three steps to achieve breakthrough performance
1. Utilize dedicated resources
   Dedicated resources are faster than a LUT/flip-flop implementation and consume less power
   Typically built with the CORE Generator tool and instantiated
   DSP48E, FIFO, block RAM, ISERDES, OSERDES, EMAC, and MGT, for example
2. Write code for performance
   Use synchronous design methodology
   Ensure the code is written optimally for critical paths
   Pipeline when necessary
3. Drive your synthesis tool
   Try different optimization techniques
   Add critical timing constraints in synthesis
   Preserve hierarchy
   Apply full and correct constraints
   Use High effort
[Figure: performance meter, Virtex-6 FPGA]
Note that "applying full and correct constraints" refers to applying constraints for all clocks in the design for use during implementation. Xilinx does not currently (this may change in the future) recommend the use of timing constraints during synthesis when synthesizing with XST. Additionally, false paths and multicycle paths should be correctly constrained, as should the I/O. The timing closure flow chart was created to help achieve breakthrough performance.

4 Use Dedicated Blocks
Dedicated block timing is correct by construction
  Not dependent on programmable routing
  Uses less power
Offers as much as 3x the performance of soft implementations
Examples
  Block RAM and FIFO at 600 MHz
  DSP48E at 600 MHz
[Figure: block diagrams of the DSP48E slice, the Smart RAM FIFO built on dual-port BRAM, and the EMAC core with its PHY, client, host, statistics, and processor (host bus, DCR bus) interfaces]

5 Timing Closure

6 Instantiation versus Inference
Instantiate a component when
  You must dictate exactly which resource is needed
  The synthesis tool is unable to infer the resource
  The synthesis tool fails to infer the resource
Xilinx recommends inference whenever possible
  Inference makes your code more portable
Xilinx recommends using the CORE Generator software to create functions such as Arithmetic Logic Units (ALUs), fast multipliers, and Finite Impulse Response (FIR) filters for instantiation
Xilinx recommends using the Architecture Wizard utility to create DCM, PLL, and clock buffer instantiations
Instantiation: Directly referencing a library primitive or macro in your HDL.
Inference: Writing a Register Transfer Level (RTL) description of circuit behavior that the synthesis tool converts into library primitives.
Why instantiate? Instantiation is useful when you cannot infer the component. For example, the DCM component of a Spartan-6 FPGA cannot be inferred, so instantiating the DCM block is the only way to use it.

7 FPGA Resources
Can be inferred by all synthesis tools
  Shift register LUT (SRL16E/SRLC32E)
  F7 and F8 multiplexers
  Carry logic
  Multipliers and counters using the DSP48E
  Global clock buffers (BUFG)
  SelectIO™ (single-ended) interface
  I/O registers (single data rate)
  Input DDR registers
Can be inferred by some synthesis tools
  Memories
  Global clock buffers (BUFGCE, BUFGMUX, BUFGDLL)
  Some DSP functions
Cannot be inferred by any synthesis tools
  SelectIO (differential) interface
  Output DDR registers
  DCM / PLL
  Local clock buffers (BUFIO, BUFR)
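As an illustration of inference, the following minimal Verilog sketch describes a 16-bit shift register in plain RTL. The module name, port names, and depth are assumptions made for this example and do not come from the slides. Whether a given tool maps it to a shift register LUT depends on the tool and its settings, so check the synthesis report rather than assuming the mapping.

```verilog
// Hypothetical example: module name, ports, and depth are illustrative only.
module shift16 (
    input  wire clk,
    input  wire ce,    // clock enable
    input  wire din,   // serial data in
    output wire dout   // serial data out, 16 enabled clocks later
);
    reg [15:0] sr = 16'b0;

    // The shift chain has no reset. Putting a reset on every stage
    // generally forces an implementation built from individual flip-flops.
    always @(posedge clk)
        if (ce)
            sr <= {sr[14:0], din};

    assign dout = sr[15];
endmodule
```

Nothing in this description names a Xilinx primitive, which is what keeps inferred code portable across tools and device families.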

8 Suggested Instantiation
Xilinx recommends that you instantiate the following elements
  Memory resources
    Block RAMs specifically (use the CORE Generator software to build large memories)
  SelectIO interface resources
  Clocking resources
    DCM, PLL (use the Architecture Wizard)
    IBUFG, BUFGMUX_CTRL, BUFGCE
    BUFIO, BUFR

9 Suggested Instantiation
Why does Xilinx suggest this?
  Easier to port your HDL to other and newer technologies
  Fewer synthesis constraints and attributes to pass on
    Keeping most of the attributes and constraints in the Xilinx User Constraints File (UCF) keeps it simple: one file contains the critical information
Create a separate hierarchical block for instantiating these resources
  Above the top-level block, create a Xilinx "wrapper" with instantiations specific to Xilinx
  Alternatively, use VHDL configuration statements or put wrappers around each instantiation
  This maintains hierarchy and makes it easy to swap instantiations
[Figure: Xilinx "wrapper" top_xlnx containing the top-level block together with STARTUP, IBUFG, DCM, BUFG, IBUF, and OBUF instances]

10 Hierarchy Management
Synplify and XST software
The basic settings are
  Flatten the design: Allows total combinatorial optimization across all boundaries
  Maintain hierarchy: Preserves hierarchy without allowing optimization of combinatorial logic across boundaries (recommended)
If you have followed the synchronous design guidelines, use the "maintain hierarchy" setting
If you have not followed the synchronous design guidelines, use the "flatten the design" setting
  Consider using the "keep" attribute to preserve nodes for testing
Your synthesis tool may have additional settings
  Refer to your synthesis documentation for details on these settings
To access hierarchy control:
  Synplify software: SCOPE Constraints Editor. Synplify also has an additional setting: maintain hierarchy but allow optimization. This setting allows combinatorial logic to be optimized while maintaining hierarchy in the netlist (the setting in Synplify is "firm").
  XST: Turn on the Advanced Property Display level in the Edit > Preferences dialog box. Then look under Properties for the Synthesize process > Synthesis Options tab > Keep Hierarchy.

11 Hierarchy Preservation Benefits
Easily locate problems in the code based on the hierarchical instance names contained within static timing analysis reports
Enables floorplanning and incremental design flow
The primary advantage of flattening is to optimize combinatorial logic across hierarchical boundaries
  If the outputs of leaf-level blocks are registered, there is generally no need to flatten

Registering the outputs of each leaf-level block is part of the synchronous design methodology. Registering the output boundaries helps because you know the delays from one block to the next; the delays do not vary with combinatorial outputs. Logic cannot be optimized across a registered boundary. Therefore, if you register outputs, you know the delay is minimized from one hierarchical or functional block to the next, and you also know that no logic optimization can occur across hierarchical domains.

In addition to the benefits listed above, preserving hierarchy limits name changes to registers, so the element names used in a UCF will generally not change. If you flatten the design, the register and element names and the hierarchical paths and references can change from one iteration to the next. In this case, maintaining the UCF can be quite a burden.

However, preserving hierarchy can prevent register balancing (retiming) and register duplication. Nevertheless, the benefits of preserving hierarchy generally outweigh the benefits of flattening, except when you have combinatorial outputs.

In general, preserve hierarchy for large designs. For smaller designs, preserve the hierarchy if you registered leaf-level outputs; otherwise, you might consider flattening the design. If you flatten the design, remember the extra burdens of name changes (UCF and static timing analysis) from one iteration to the next and the limits on floorplanning.

12 Multiplexers
Multiplexers are generated from IF and CASE statements
  IF/THEN statements generate priority encoders
  Use a CASE statement to generate complex encoding
There are several issues to consider with a multiplexer
  Delay and size
    Affected by the number of inputs and number of nested clauses to an IF/THEN or CASE statement
  Unintended latches or clock enables
    Generated when IF/THEN or CASE statements do not cover all conditions
    Review your synthesis tool warnings
    Check by looking at the component with a schematic viewer

13 IF/THEN Statement
Priority Encoder
  Most critical input listed first
  Least critical input listed last
[Figure: priority multiplexer chain with data inputs do_a through do_e, select inputs cond_a, cond_b, cond_c, and crit_sig, and output oput]

  IF (crit_sig) THEN
    oput <= do_d;
  ELSIF cond_a THEN
    oput <= do_a;
  ELSIF cond_b THEN
    oput <= do_b;
  ELSIF cond_c THEN
    oput <= do_c;
  ELSE
    oput <= do_e;
  END IF;

14 Avoid Nested IF and IF/ELSE
Nested IF or IF/THEN/ELSE statements form priority encoders
CASE statements do not have priority
If nested IF statements are necessary, put critical input signals on the first IF statement
  The critical signal ends up in the last logic stage
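To make the contrast concrete, here is a hedged Verilog sketch that places a priority chain next to a CASE-based multiplexer. The signal names of the priority chain follow the IF/THEN slide; the module name, the 2-bit sel input, and the two output names are assumptions added for this illustration.

```verilog
// Hypothetical module: names other than crit_sig, cond_*, and do_* are illustrative.
module mux_styles (
    input  wire       crit_sig, cond_a, cond_b, cond_c,
    input  wire       do_a, do_b, do_c, do_d, do_e,
    input  wire [1:0] sel,
    output reg        oput_priority,
    output reg        oput_parallel
);
    // if / else-if chain: a branch is reached only when every earlier
    // condition is false, so the description carries an implied priority.
    always @(*) begin
        if      (crit_sig) oput_priority = do_d;
        else if (cond_a)   oput_priority = do_a;
        else if (cond_b)   oput_priority = do_b;
        else if (cond_c)   oput_priority = do_c;
        else               oput_priority = do_e;
    end

    // case on an encoded select: the branches are mutually exclusive,
    // so no ordering among them is implied.
    always @(*) begin
        case (sel)
            2'b00:   oput_parallel = do_a;
            2'b01:   oput_parallel = do_b;
            2'b10:   oput_parallel = do_c;
            default: oput_parallel = do_d;
        endcase
    end
endmodule
```

Both blocks assign their output on every path, so neither infers a latch.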

15 CASE Statements
CASE statements in a combinatorial process (VHDL) or always statement (Verilog)
  Latches are inferred if outputs are not defined in all branches
  Use default assignments before the CASE statement to prevent latches
CASE statements in a sequential process (VHDL) or always statement (Verilog)
  Clock enables are inferred if outputs are not defined in all branches
  This is not "wrong", but might generate a long clock enable equation
  Use default assignments before the CASE statement to prevent clock enables
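The default-assignment advice for a combinatorial block can be sketched in Verilog as follows. The module name, ports, and branch contents are assumptions chosen only to show the pattern.

```verilog
// Hypothetical example: all names and branch values are illustrative.
module case_defaults (
    input  wire [1:0] sel,
    input  wire       in_a, in_b,
    output reg        out1,
    output reg        out2
);
    always @(*) begin
        // Defaults first: every output has a value on every path,
        // so no branch can leave an output holding its old value.
        out1 = 1'b0;
        out2 = 1'b0;
        case (sel)
            2'b00:   out1 = in_a;            // out2 keeps its default
            2'b01:   out2 = in_b;            // out1 keeps its default
            2'b10:   begin
                         out1 = in_b;
                         out2 = in_a;
                     end
            default: ;                       // defaults above apply
        endcase
    end
endmodule
```

Without the two default lines, the 2'b00 and 2'b01 branches would each leave one output unassigned, which is the situation that leads to latch inference.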

16 CASE Statements
Register the select inputs if possible (pipelining)
  Can reduce the number of logic levels between flip-flops
Consider using one-hot select inputs
  Eliminating the select decoding can improve performance
Determine how your synthesis tool synthesizes the order of the select lines
  If there is a critical select input, this input should be included "last" in the logic for fastest performance

17 CASE Statement
This Verilog code describes a 6:1 multiplexer with binary-encoded select inputs
  This uses fewer LUTs, but requires multiple LUTs in series on the timing-critical path
The advantage of using the "don't care" for the default is that the synthesizer has more flexibility to create a smaller, faster circuit
How could the code be changed to use one-hot select inputs?

  module case_binary (clock, sel, data_out, in_a, in_b, in_d, in_c, in_e, in_f);
    input clock;
    input [2:0] sel;
    input in_a, in_b, in_c, in_d, in_e, in_f;
    output data_out;
    reg data_out;

    always @(posedge clock)
    begin
      case (sel)
        3'b000 : data_out <= in_a;
        3'b001 : data_out <= in_b;
        3'b010 : data_out <= in_c;
        3'b011 : data_out <= in_d;
        3'b100 : data_out <= in_e;
        3'b101 : data_out <= in_f;
        default : data_out <= 1'bx;
      endcase
    end
  endmodule

VHDL version (entity declaration omitted):

  process (clock)
  begin
    if (clock'event and clock = '1') then
      case (sel) is
        when "000" => data_out <= in_a;
        when "001" => data_out <= in_b;
        when "010" => data_out <= in_c;
        when "011" => data_out <= in_d;
        when "100" => data_out <= in_e;
        when "101" => data_out <= in_f;
        when others => data_out <= 'X';
      end case;
    end if;
  end process;

18 CASE Statement
This is the same code with one-hot select inputs
  This uses more LUTs, but requires fewer logic levels on the timing-critical path
  This yields a greater benefit when the mux is larger
Enumerated types allow you to quickly test different encodings and make simulation more readable

  module case_onehot (clock, sel, data_out, in_a, in_b, in_d, in_c, in_e, in_f);
    input clock;
    input [5:0] sel;
    input in_a, in_b, in_c, in_d, in_e, in_f;
    output data_out;
    reg data_out;

    always @(posedge clock)
    begin
      case (sel)
        6'b000001 : data_out <= in_a;
        6'b000010 : data_out <= in_b;
        6'b000100 : data_out <= in_c;
        6'b001000 : data_out <= in_d;
        6'b010000 : data_out <= in_e;
        6'b100000 : data_out <= in_f;
        default : data_out <= 1'bx;
      endcase
    end
  endmodule

VHDL version (entity declaration omitted):

  process (clock)
  begin
    if (clock'event and clock = '1') then
      case (sel) is
        when "000001" => data_out <= in_a;
        when "000010" => data_out <= in_b;
        when "000100" => data_out <= in_c;
        when "001000" => data_out <= in_d;
        when "010000" => data_out <= in_e;
        when "100000" => data_out <= in_f;
        when others => data_out <= 'X';
      end case;
    end if;
  end process;

19 Other Basic Performance Tips
Avoid high-level loop constructs
  Synthesis tools may not produce optimal results
Order and group arithmetic and logical functions and operators
  A <= B + C + D + E; should be: A <= (B + C) + (D + E);
Use a synchronous reset
  More reliable system control
Avoiding inadvertent latch inference can easily be accomplished with default assignments.

  // Default assignments before the if-then-else or case statement.
  // Every output now has a value on every path, which avoids
  // inadvertent latch inference.
  out1 = 1'b0;
  out2 = input2;
  out3 = out3_registered;
  if (a == b)
    out1 = a;
  else if (a == c)
    out2 = b;
  else if (a == d)
    out3 = c;

20 Synchronous Design Rewards
Always make your design synchronous
  Recommended for all FPGAs
Failure to use synchronous design can potentially
  Waste device resources
    Not using a synchronous element will not save silicon and it wastes money
  Waste performance
    Reduces capability of end products; higher speed grades cost more
  Lead to difficult design process
    Difficult timing specifications and tool-effort levels
  Cause long-term reliability issues
    Probability, race conditions, temperature, and process effects
Synchronous designs have
  Few clocks
  Synchronous resets
  No gated clocks; instead, clock enables
All FPGAs require synchronous design techniques to run reliably. Synchronous design techniques will also help the software tools and lead to higher performance implementations in lower speed grade (lower cost) devices. All the challenging silicon design problems of balanced (near zero skew) clock trees and synchronous elements have already been solved by Xilinx when it manufactures the devices. Because the clock networks exist, and every block can (or has to) be synchronous, make the most of the device and design synchronously.

21 Inferred Register Examples
Ex 1. D Flip-Flop
  always @(posedge CLOCK)
    Q <= D_IN;
Ex 2. D Flip-Flop with Asynch Preset
  always @(posedge CLOCK or posedge PRESET)
    if (PRESET)
      Q <= 1;
    else
      Q <= D_IN;
Ex 3. D Flip-Flop with Asynch Reset
  always @(posedge CLOCK or posedge RESET)
    if (RESET)
      Q <= 0;
    else
      Q <= D_IN;
Ex 4. D Flip-Flop with Synch Reset
  always @(posedge CLOCK)
    if (RESET)
      Q <= 0;
    else
      Q <= D_IN;

22 Clock Enables
Coding style will determine if clock enables are used
VHDL
  FF_AR_CE: process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      if (ENABLE = '1') then
        Q <= D_IN;
      end if;
    end if;
  end process;
Verilog
  always @(posedge CLOCK)
    if (ENABLE)
      Q <= D_IN;

23 Summary
Use as much of the dedicated hardware resources as possible to ensure optimum speed and device utilization
Plan on instantiating clocking and memory resources
Try to use the CORE Generator tool to create optimized components that target dedicated FPGA resources (BRAM, DSP48E, and FIFO)
Maintain your design hierarchy to make debugging, simulation, and report generation easier

24 Summary
CASE and IF/THEN statements produce different types of multiplexers
  CASE statements tend to build logic in parallel while IF/THEN statements tend to build priority encoders
Avoid nested CASE and IF/THEN statements
You should always build a synchronous design for your FPGA
Inferring many types of flip-flops from HDL code is possible
  Synchronous sets/resets are preferred

25 Where Can I Learn More?
Software Manuals
  Start > Xilinx ISE Design Suite 13.1 > ISE Design Tools > Documentation > Software Manuals
  This includes the Synthesis & Simulation Design Guide
    This guide has example inferences of many architectural resources
  XST User Guide
    HDL language constructs and coding recommendations
  Software User Guides and software tutorials
Xilinx Education Services courses
  Xilinx tools and architecture courses
  Hardware description language courses
  Basic FPGA architecture and other topics

26 Trademark Information
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate on, or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes. Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents, copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design. Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design. THE DESIGN IS PROVIDED “AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS. 
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT THESE LIMITATIONS OF LIABILITY. The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-safe controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or weapons systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk Applications. You represent that use of the Design in such High-Risk Applications is fully at your risk. © 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

27 Basic HDL Coding Techniques
Part 2

28 Objectives After completing this module, you will be able to:
  Identify some basic design guidelines that successful FPGA designers follow
  Select a proper HDL coding style for fast, efficient finite state machines
  Easily pipeline your design
Note: The guidelines in this module are not specific to any particular synthesis tool or Xilinx FPGA family.

29 State Machine Design
Put the next-state logic in one CASE statement
  The state register can also be included here or in a separate process block or always block
Put the state machine outputs in a separate process or always block
  Prevents resource sharing, which can hurt performance
[Figure: state diagram (S1 through S5) with inputs to the FSM, next to a state machine module whose HDL code is split into next-state logic, state register, and state machine outputs]
Finite State Machines (FSMs) are faster when they are in separate processes because the combinatorial logic does not share resources; hence, logic can be combined into a single Look-Up Table (LUT).

30 The Perfect State Machine
The perfect state machine has
  Inputs: Input signals and state jumps
  Outputs: Output states, control signals, and enable signals to the rest of the design
  NO arithmetic logic, datapaths, or combinatorial functions inside the state machine
[Figure: input signals and current-state feedback drive the next-state logic (state jumps only), which feeds the state register; the state register produces the output state and enables]

31 State Machine Encoding
Use enumerated types to define state vectors (VHDL)
  Most synthesis tools have commands to extract and re-encode state machines described in this way
Use one-hot encoding for high-performance state machines
  Uses more registers, but simplifies next-state logic
  Examine trade-offs: Gray and Johnson encoding styles can also improve performance
  Refer to the documentation of your synthesis tool to determine how it chooses the default encoding scheme
Register state machine outputs for higher performance
One-hot: The advantage of using one-hot encoding in Xilinx FPGAs is that the next-state decoding logic can be simplified to logic equations with six inputs or fewer, which can fit into a single LUT. This maximizes the performance of the state machine. Many synthesis tools automatically choose one-hot encoding for state machines when you target a Xilinx FPGA, so check your synthesis tool's documentation.

32 Benefits of FSM Encoding
Binary
  Smallest (fewest registers)
  Complex FSM tends to build multiple levels of logic (slow)
  Synthesis tools usually map to this encoding when the FSM has eight or fewer states
One-hot
  Largest (more registers), but simplifies next-state logic (fast)
  Synthesis tools usually map to this encoding when the FSM has between 8 and 16 states
  Always evaluate undefined states (you may need to cover your undefined states)
Gray and Johnson
  Efficient size and can have good speed
Which is best?
  Depends on the number of states, the inputs to the FSM, and the complexity of transitions
How do you determine which is best?
  Build your FSM, then synthesize it for each encoding and compare size and speed
By choosing an enumerated type for your FSM, you can easily experiment with each of these encoding techniques. You can also simulate with binary encoding (easy to read) and then re-synthesize with a different encoding before implementation.

33 State Machine Example (Verilog)
  module STATE (signal_a, signal_b, clock, reset, usually_one, usually_zero);
    input signal_a, signal_b, clock, reset;
    output usually_one, usually_zero;
    reg [4:0] current_state, next_state;
    parameter s0 = 0, s1 = 1, s2 = 2, s3 = 3, s4 = 4;

    always @(posedge clock or posedge reset)
    begin
      if (reset)
        current_state <= s0;
      else
        current_state <= next_state;
    end

This is an example of a simple state machine. The first always block is the synchronous portion of the state machine. On a reset, the state machine returns to s0. Otherwise, the next state is loaded on each clock edge.
Outputs are not defined here (good)
  Placed in a separate always block
Asynchronous reset (bad)

34 State Machine Example (Verilog)
    always @(current_state or signal_a or signal_b)
    begin
      case (current_state)
        s0: if (signal_a)
              next_state = s0;
            else
              next_state = s1;
        s1: if (signal_a && ~signal_b)
              next_state = s4;
            else
              next_state = s2;
        s2: next_state = s4;
        s3: next_state = s3;
        s4: next_state = s0;
        default: next_state = 'bx;
      endcase
    end
  endmodule

This always block shows the next-state logic. The state machine output logic is not shown here, but it would use a CASE statement similar to this one to determine the values of the outputs usually_one and usually_zero based on the current state (and perhaps signal_a and signal_b).
Use a default statement as part of your next-state assignments (good)
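The output logic that the slide leaves out could look something like the third always block in the sketch below. This is an assumption-laden illustration, not the original course code: the module name, the 3-bit state width, the synchronous reset, the else branch in state s1, and above all the output values assigned in each state are invented for the example.

```verilog
// Hypothetical completion of the example. The output values per state
// are assumptions; the transcript does not define them.
module state_sketch (
    input  wire clock,
    input  wire reset,         // synchronous, active high (assumption)
    input  wire signal_a,
    input  wire signal_b,
    output reg  usually_one,
    output reg  usually_zero
);
    localparam [2:0] s0 = 3'd0, s1 = 3'd1, s2 = 3'd2, s3 = 3'd3, s4 = 3'd4;

    reg [2:0] current_state, next_state;

    // State register
    always @(posedge clock)
        if (reset) current_state <= s0;
        else       current_state <= next_state;

    // Next-state logic: state jumps only
    always @(*) begin
        case (current_state)
            s0: if (signal_a) next_state = s0;
                else          next_state = s1;
            s1: if (signal_a && ~signal_b) next_state = s4;
                else                       next_state = s2;
            s2: next_state = s4;
            s3: next_state = s3;
            s4: next_state = s0;
            default: next_state = 3'bx;
        endcase
    end

    // Registered outputs in their own block. Decoding next_state keeps
    // the registered outputs aligned with current_state.
    always @(posedge clock) begin
        if (reset) begin
            usually_one  <= 1'b1;
            usually_zero <= 1'b0;
        end else begin
            usually_one  <= 1'b1;    // defaults, overridden below
            usually_zero <= 1'b0;
            case (next_state)
                s3: usually_one  <= 1'b0;   // assumed output value
                s4: usually_zero <= 1'b1;   // assumed output value
                default: ;
            endcase
        end
    end
endmodule
```

Keeping the outputs in a separate block mirrors the advice on the State Machine Design slide, and registering them follows the encoding slide's recommendation for higher performance.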

35 Binary Encoding (Verilog)
Test different FSM encodings yourself (good)
  Don't always trust your synthesis tool to choose the best encoding

  reg [1:0] current_state, next_state;
  parameter state1 = 2'b00, state2 = 2'b01,
            state3 = 2'b10, state4 = 2'b11;

  always @(current_state)
    case (current_state)
      state1 : next_state = state2;
      state2 : next_state = state3;
      state3 : next_state = state4;
      state4 : next_state = state1;
    endcase

  always @(posedge clock)
    current_state <= next_state;

The previous example used integers to represent the states. You may also use a PARAMETER statement to explicitly define the state values. This example shows a binary-encoded state machine.

36 One-Hot Encoding (Verilog)
  reg [3:0] current_state, next_state;
  parameter state1 = 4'b0001, state2 = 4'b0010,
            state3 = 4'b0100, state4 = 4'b1000;

  always @(current_state)
    case (current_state)
      state1 : next_state = state2;
      state2 : next_state = state3;
      state3 : next_state = state4;
      state4 : next_state = state1;
    endcase

  always @(posedge clock)
    current_state <= next_state;

This PARAMETER statement defines a one-hot encoded state machine.
Encoding is easily changed

37 State Machine Example (VHDL)
  library IEEE;
  use IEEE.std_logic_1164.all;

  entity STATE is
    port (
      signal_a, signal_b : in STD_LOGIC;
      clock, reset : in STD_LOGIC;
      usually_zero, usually_one : out STD_LOGIC
    );
  end STATE;

  architecture STATE_arch of STATE is
    type STATE_TYPE is (s0, s1, s2, s3);
    signal current_state, next_state : STATE_TYPE;
    signal usually_zero_comb, usually_one_comb : STD_LOGIC;
  begin

This is an example of a simple state machine.

38 State Machine Example (VHDL)
  COMB_STATE_MACHINE: process (current_state, signal_a, signal_b)
  begin
    next_state <= s0;
    usually_zero_comb <= '0';
    usually_one_comb <= '1';  -- set default to one and reset to zero when necessary
    case current_state is
      when s0 =>
        next_state <= s1;
        if signal_a = '1' then
          -- (body of this branch is missing from the transcript)
        end if;
      when s1 =>
        next_state <= s2;
        if signal_a = '1' AND signal_b = '0' then
          next_state <= s3;
          usually_zero_comb <= '1';
        end if;
      when s2 =>
        -- (statements for this state are missing from the transcript)
      when s3 =>
        usually_one_comb <= '0';
      when others =>
    end case;
  end process;

Default state is used to define output values (good)
This process contains the next-state logic and the state machine output logic. You could also separate the state machine outputs into their own process, which is recommended for larger and more complex state machines.

39 State Machine Example (VHDL)
  SYNCH_STATE_MACHINE: process (clock, reset)
  begin
    if (reset = '1') then
      current_state <= s0;
      usually_zero <= '0';
      usually_one <= '1';
    elsif (clock'event and clock = '1') then
      current_state <= next_state;
      usually_zero <= usually_zero_comb;
      usually_one <= usually_one_comb;
    end if;
  end process;
  end STATE_arch;

This is the synchronous portion of the state machine. On a reset, the state machine returns to s0. Otherwise, the next state is loaded on each clock edge. Notice that the state machine outputs are also registered in this process. If the state machine outputs were purely combinatorial, they would not be included here.
Asynchronous reset (bad, unreliable)

40 Unspecified Encoding (VHDL)
  entity EXAMPLE is
    port (
      A, B, C, D, E, CLOCK : in std_logic;
      X, Y, Z : out std_logic
    );
  end EXAMPLE;

  architecture XILINX of EXAMPLE is
    type STATE_LIST is (S1, S2, S3, S4, S5, S6, S7);
    signal STATE : STATE_LIST;
  begin
    P1: process (CLOCK)
    begin
      if (CLOCK'event and CLOCK = '1') then
        case STATE is
          when S1 =>
            X <= '0';
            Y <= '1';
            Z <= '1';
            if (A = '1') then
              STATE <= S2;
            else
              STATE <= S1;
            -- (the remainder of the example is not in the transcript)

Most synthesis tools will implement an "unspecified encoding" as a binary-encoded state machine. The synthesis tool assigns state values starting with the leftmost value in the list. In this example, S1 = 000, S2 = 001, ... and S7 = 110.
Undefined encoding (bad, probably inefficient)

41 One-Hot Encoding (VHDL)
  architecture one_hot_arch of one_hot is
    subtype state_type is std_logic_vector(5 downto 0);
    signal current_state, next_state : state_type;
    constant s0 : state_type := "000001";
    constant s0_bit : integer := 0;
    constant s1_bit : integer := 1;
    constant s2_bit : integer := 2;
    constant s3_bit : integer := 3;
    constant s4a_bit : integer := 4;
    constant s4b_bit : integer := 5;
    signal usually_zero_comb, usually_one_comb : std_logic;
  begin
    comb_state_machine: process (current_state, signal_a, signal_b, signal_c, signal_d)
    begin
      next_state <= state_type'(others => '0');
      if current_state(s0_bit) = '1' then
        if signal_a = '1' then
          next_state(s0_bit) <= '1';
        else
          next_state(s1_bit) <= '1';
        end if;
      end if;
      if current_state(s1_bit) = '1' then
        next_state(s4a_bit) <= '1';
      end if;
      -- (the remainder of the example is not in the transcript)

Most synthesis tools have options to compile state machines as binary or one-hot. If your synthesis tool has this option, use the "Unspecified Encoding" example along with this synthesis option to create a one-hot state machine. Some synthesis tools also allow for an attribute of "one-hot" on a user-defined type. This is another way to get a one-hot state machine without resorting to the more cumbersome syntax shown here.
One-hot encoding is a little harder in VHDL (using your synthesis tool's attribute is recommended, if possible)

42 Pipelining Concept
[Figure: before pipelining, two logic levels sit between flip-flops and fMAX = n MHz; after pipelining, one logic level sits between each pair of flip-flops and fMAX ≈ 2n MHz]
Inserting flip-flops into a datapath is called pipelining. Pipelining increases performance by reducing the number of logic levels (LUTs) between flip-flops. All Xilinx FPGA device families support pipelining. The basic slice structure is a logic level (six-input LUT) followed by a flip-flop.
Adding a pipeline stage, as shown in this example, will not exactly double fMAX. The flip-flop that is added to the circuit has an input setup time and a clock-to-Q time that make the pipelined circuit run at less than double the original frequency. You will see a more detailed example of increasing performance by pipelining later in this section.

43 Pipelining
Three situations in which to pipeline
  Register I/O
    Usually done by the designer from the beginning
  Register the outputs of each lower leaf-level output
    Typically done after timing analysis
    Can easily be done for purely combinatorial components
  Register high-fanout secondary control signals (Set, Reset, CEs)
These are just the most obvious suggestions. For every design, there may be more tricks or other clever things that can improve performance. Pipelining is the one thing that helps the most, and for most systems today, pipelining is always an option because bandwidth is what defines the system, not the latency. Latency can be important, but when it is, it is usually latency of a different order of magnitude than the latency caused by pipelining. FPGAs have lots of registers, so re-timing and clever use of arithmetic functions can yield tremendous performance. If designers need to balance the latency among different paths in the system, the SRLs can be used to compensate efficiently for delay differences.
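As a small illustration of registering intermediate results, the following Verilog sketch pipelines the grouped four-input addition A <= (B + C) + (D + E) from the performance tips into two stages. The module name, the WIDTH parameter, and the intermediate register names are assumptions made for this example.

```verilog
// Hypothetical example: a two-stage pipelined four-input adder.
module add4_pipelined #(
    parameter WIDTH = 16
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] b, c, d, e,
    output reg  [WIDTH+1:0] a           // two extra bits hold the carries
);
    reg [WIDTH:0] sum_bc, sum_de;       // stage 1 registers

    always @(posedge clk) begin
        // Stage 1: two independent additions, each one adder deep
        sum_bc <= b + c;
        sum_de <= d + e;
        // Stage 2: combine the partial sums registered on the previous edge
        a <= sum_bc + sum_de;
    end
endmodule
```

The result appears two clock cycles after the inputs are presented, so any data that travels alongside it needs the same two cycles of delay. That is the latency-balancing concern mentioned in the notes.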

44 Performance by Design
Code A
  [Figure: the Switch and Enable flip-flops drive one level of logic whose output is the high-fanout CE net of the reg_data register bank; data_in feeds the register data inputs]
  One level of logic, but the routing can be prohibitive
  May require higher speed grade, adding cost
Code B
  [Figure: the same circuit with a reg_enable flip-flop inserted between the logic and the high-fanout CE net, and a register added on data_in]
  One level of logic
  Maximum time for routing of high fanout net
  Flip-flop adds nothing to the cost
  Data_in must also be registered
In Code B, reg_data is enabled one clock cycle later, so it is important to ensure that data_in is still valid. This may require the user to balance the latency on the input path for data_in in Code B. Recall that the data will be available one cycle later than in Code A.
Adding the pipeline flip-flop really costs nothing because it comes with the LUT. Again, you increase gate count and increase performance without adding to the cost. In practice, the coding style leads to preparing clock enables one cycle in advance of when they are required. Note that the addition of the register adds some latency that you will have to plan for as well.

45 Performance by Design (Verilog)
These two pieces of code are not functionally identical
  Code B forms a pipeline stage for the circuit and improves its speed; Code A does not
In each case
  reg_data and data_in are 16-bit buses
  switch and enable are outputs from flip-flops
Code A
  always @(posedge clk)
  begin
    if (switch && enable)
      reg_data <= data_in;
  end
Code B
  always @(posedge clk)
  begin
    if (switch && enable)
      reg_enable <= 1'b1;
    else
      reg_enable <= 1'b0;
    if (reg_enable)
      reg_data <= data_in;
  end
Code A performs a decode as part of the enable. Code B performs the decode separately to form a registered signal, which is then used to enable a bank of registers.
Real designs may have a far more complicated set of rules for loading this register, which usually increases the levels of logic and the delay. Here is an example (in VHDL):
  if address = X"1B3D" then
    reg_data <= processor_output;
  end if;

46 Performance by Design (VHDL)
These two pieces of code are not functionally identical
Code B forms a pipeline stage for the circuit and improves its speed
In each case, reg_data and data_in are 16-bit buses, and switch and enable are outputs from flip-flops

Code A
    capture: process (clk)
    begin
        if clk'event and clk = '1' then
            if switch = '1' and enable = '1' then
                reg_data <= data_in;
            end if;
        end if;
    end process;

Code B
    capture: process (clk)
    begin
        if clk'event and clk = '1' then
            if switch = '1' and enable = '1' then
                reg_enable <= '1';
            else
                reg_enable <= '0';
            end if;
            if reg_enable = '1' then
                reg_data <= data_in;
            end if;
        end if;
    end process;

Code A performs a decode as part of the enable. Code B performs the decode separately to form a registered signal, which is then used to enable a bank of registers. Real designs may have a far more complicated set of rules for loading this register, and a more complicated rule usually increases the levels of logic and the delay. Here is an example:
    if address = X"1B3D" then
        reg_data <= processor_output;
    end if;

47 Summary
When coding a state machine, separate the next-state logic from the state machine output equations
Evaluate whether you need to use binary, one-hot, Gray, or Johnson encoding style for your FSM
  This will yield a smaller and/or faster FSM
Pipeline data paths to improve speed
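The first two summary points can be illustrated with a small state machine whose next-state logic, state register, and output equations are written as separate blocks. This is only a sketch under assumed names (simple_fsm, start, finish, busy, done) and an assumed three-state, one-hot encoding; it is not taken from the slides, and a synthesis tool may re-encode the states depending on its settings.

```verilog
// Hypothetical three-state FSM; all names and the encoding are illustrative.
module simple_fsm (
    input  wire clk,
    input  wire rst,     // synchronous reset
    input  wire start,
    input  wire finish,
    output wire busy,
    output wire done
);
    // One-hot encoding: one flip-flop per state.
    localparam [2:0] IDLE = 3'b001,
                     RUN  = 3'b010,
                     DONE = 3'b100;

    reg [2:0] state, next_state;

    // Next-state logic only.
    always @* begin
        next_state = state;
        case (state)
            IDLE:    if (start)  next_state = RUN;
            RUN:     if (finish) next_state = DONE;
            DONE:                next_state = IDLE;
            default:             next_state = IDLE;
        endcase
    end

    // State register.
    always @(posedge clk) begin
        if (rst) state <= IDLE;
        else     state <= next_state;
    end

    // Output equations, kept apart from the next-state logic.
    assign busy = (state == RUN);
    assign done = (state == DONE);
endmodule
```

Changing the encoding only requires editing the localparam values, which makes it easy to compare binary, one-hot, or Gray variants for area and speed.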

48 Where Can I Learn More?
Software Manuals
  Start > Xilinx ISE Design Suite 13.1 > ISE Design Tools > Documentation > Software Manuals
  This includes the Synthesis & Simulation Design Guide
    This guide has example inferences of many architectural resources
  XST User Guide
    HDL language constructs and coding recommendations
Software User Guides and software tutorials
Xilinx Education Services courses
  Xilinx tools and architecture courses
  Hardware description language courses
  Basic FPGA architecture and other topics

49 Trademark Information
© 2012 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

