EET 3143 Programmable Logic Devices


1 EET 3143 Programmable Logic Devices
Michigan Technological University Electrical Engineering Technology Instructor: Dr. Nasser Alaraje

2 Contact Information Name: Abdulnasser (Nasser) Alaraje Office: 417 EERC Building Phone (O): Office Hours: MWF 10:00 am – 12:00 pm (or by appointment)

3 Practical Course Course Objectives:
Upon successful completion of this course, students should: learn how to use HDL for modeling the basic building blocks of a digital system; learn FPGA technology and the impact of using FPGAs in logic design; learn the FPGA design flow using Altera’s Quartus® II development software; gain FPGA design experience by synthesizing, mapping, and placing and routing a given design on Altera’s DE2 FPGA evaluation board; work in groups of two or three and thereby learn how to cooperate in teams; gain a basic understanding of timing analysis; learn how to build SDC files for constraining FPGA designs; learn how to verify timing on a simple design using the TimeQuest analyzer.

4 Why FPGA? Respond to the Market needs of Skilled FPGA Engineers
FPGA-based reprogrammable logic design has become more attractive as a design medium during the last decade. Only 19.5% of 4-year and 16.5% of 2-year electrical and computer engineering technology programs at US academic institutions currently have a curriculum component in hardware description languages and programmable logic design. The curriculum has not yet “caught up” to industry needs. Industry must drive curriculum development.

5 What projects are FPGAs good for
Aerospace & Defense: Radiation-tolerant FPGAs along with intellectual property for image processing, waveform generation, and partial reconfiguration for SDRs.
Automotive: Automotive silicon and IP solutions for gateway and driver assistance systems, comfort, convenience, and in-vehicle infotainment.
Broadcast: Solutions enabling a vast array of broadcast chain tasks as video and audio find their way from the studio to production and transmission and then to the consumer.
Consumer: Cost-effective solutions enabling next-generation, full-featured consumer applications, such as converged handsets, digital flat panel displays, information appliances, home networking, and residential set-top boxes.
Industrial/Scientific/Medical: Industry-compliant solutions addressing market-specific needs and challenges in industrial automation, motor control, and high-end medical imaging.
Storage & Server: Data processing solutions for Network Attached Storage (NAS), Storage Area Network (SAN), servers, storage appliances, and more.
Wireless Communications: RF, baseband, connectivity, transport and networking solutions for wireless equipment, addressing standards such as WCDMA, HSDPA, WiMAX and others.
Wired Communications: End-to-end solutions for reprogrammable networking linecard packet processing, framer/MAC, serial backplanes, and more.

6 Who uses them

7 Why are they important They have the ability to revolutionize the way that prototyping is done. They allow companies to get to market more quickly and stay in the market longer.

8 Xilinx Largest manufacturer of HW Develop hardware and software
Embedded PowerPC University Program

9 Altera Second largest manufacturer Develop HW and SW
University Program

10 Which is best? It depends Time Existing resources Money
Level of effort Preference

11 Hardware/Software? Software: Quartus Software Hardware: DE2 FPGA board

12 Welcome to the Quartus II Software!
The welcome screen can be turned off; turn it back on through Tools -> Options -> General. From the welcome screen you can also start tutorials and start or open designs. The tutorial can also be started through the Help menu, as well as through Web links.

13 Altera DE2 Development Board

14 Entity Describes all inputs and outputs
Every VHDL design must have at least one entity. An entity requires identifiers for naming the entity itself as well as its inputs and outputs. entity is a keyword and is reserved in VHDL for this purpose.
Syntax:
    entity <entity identifier> is
      port (signal identifier);
    end entity <entity identifier>;
Example:
    ENTITY Or2 IS
      PORT (x: IN std_logic;
            y: IN std_logic;
            F: OUT std_logic);
    END Or2;

15 Architecture
The architecture declaration is where the operation of the logic function is specified. For each entity there must be a corresponding architecture. Each architecture must be associated by name with an entity.
Syntax:
    architecture <architecture name> of <entity name> is
    begin
      The description of the logic function goes here
    end architecture <architecture name>;
Example:
    ARCHITECTURE Or2_beh OF Or2 IS
    BEGIN
      PROCESS(x, y)
      BEGIN
        F <= x OR y;
      END PROCESS;
    END Or2_beh;

16 VHDL Processes A process is executed in sequence
The sensitivity list is a list of signals to which the process is sensitive; it is optional.
Syntax:
    Name: process (sensitivity list)
      Declarations
    Begin
      Sequential statements
    End process;
Example:
    PROCESS(x, y)
    BEGIN
      F <= x OR y;
    END PROCESS;

17 VHDL Components Predefined logic
Place in a VHDL library and use repeatedly. Any logic function can become a component and be used in large programs.
Syntax:
    component name_of_component is
      port (port definition);
    end component name_of_component;
Example:
    COMPONENT And2 IS
      PORT (x: IN std_logic;
            y: IN std_logic;
            F: OUT std_logic);
    END COMPONENT;

18 Conditional Statements
if-then if-then-else elsif case

19 If statement Causes a decision to be made
When the if condition is true, the code following the if statement is executed. When the if condition is false, the code following the if statement, up to the end if, is skipped.
    if conditional statement then
      VHDL statements
    end if;

20 If-Then-Else statement
else is an alternative path for the if statement.
    if conditional statement then
      VHDL statements
    else
      VHDL statements
    end if;

21 Elsif statement Use to allow multiple alternative paths
    if conditional statement then
      VHDL statements
    elsif conditional statement then
      VHDL statements
    end if;

22 Case statement example
    case expression is
      when choice => VHDL statement;
      when others => VHDL statements;
    end case;
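
The conditional statements above can be combined in one process. The following is a minimal sketch, not taken from the slides: the entity name mux4 and its ports (sel, a, b, c, d, en, y) are hypothetical, chosen only to show an if-then-else wrapping a case statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-to-1 multiplexer with an enable input.
entity mux4 is
  port (
    sel        : in  std_logic_vector(1 downto 0);
    a, b, c, d : in  std_logic;
    en         : in  std_logic;
    y          : out std_logic
  );
end entity mux4;

architecture behavioral of mux4 is
begin
  process (sel, a, b, c, d, en)
  begin
    if en = '0' then
      -- if-then-else: the disabled path forces the output low
      y <= '0';
    else
      -- case: exactly one choice is selected by sel
      case sel is
        when "00"   => y <= a;
        when "01"   => y <= b;
        when "10"   => y <= c;
        when others => y <= d;  -- covers "11" and all non-binary values
      end case;
    end if;
  end process;
end architecture behavioral;
```

Because every path through the process assigns y, the description stays purely combinational and no latch is implied.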

23 Processes in VHDL Processes Describe Sequential Behavior
Processes in VHDL are very powerful statements. They allow you to define arbitrary behavior that may be difficult to represent with a real circuit. Not every process can be synthesized. Use processes with caution in code that is to be synthesized. Use processes freely in testbenches.

24 Logic Operators Logic operators Logic operators precedence
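Logic operators: and, or, nand, nor, xor, xnor, not (xnor only in VHDL-93 and later).
Logic operator precedence: not has the highest precedence; and, or, nand, nor, xor, xnor all share the lowest precedence, with no order of precedence among them.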

25 Logic Operators - example
Order of evaluation: suppose we need to describe XOR using and, or, not.
    C = a and not b or not a and b
Because and and or have the same precedence, this expression has no implied grouping; VHDL does not accept a mix of different logical operators without parentheses. Read strictly left to right it would mean
    C = ((a and (not b)) or (not a)) and b
that is, C = (ab’ + a’)b, which is not correct. Use parentheses as follows:
    C = (a and not b) or (not a and b)
Associative logical operators: and, or, xor, xnor are associative.
    f <= a and b and c;        allowed
nand and nor are not associative.
    g <= a nand b nand c;      invalid
    g <= not (a and b and c);  valid
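
As a minimal sketch, the parenthesized XOR expression can be placed in a complete design unit. The entity name xor_demo and the architecture name dataflow are hypothetical, used only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: XOR built only from and, or, not.
entity xor_demo is
  port (
    a, b : in  std_logic;
    c    : out std_logic
  );
end entity xor_demo;

architecture dataflow of xor_demo is
begin
  -- Parentheses are required because and/or share one precedence level;
  -- not binds tighter, so "a and not b" needs no extra parentheses.
  c <= (a and not b) or (not a and b);
end architecture dataflow;
```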

26 Loops
A loop repeatedly executes the sequential statements contained within the loop structure.
for loop structure: entry point, iteration, terminal test.
    for identifier in starting value to stopping value loop
      VHDL statements
    end loop;

27 While loop
A for loop stops after a fixed number of iterations. A while loop continues to loop as long as its condition is true and exits when the condition becomes false.
Structure: entry point, terminal test, exit point.
    while Boolean expression loop
      VHDL statements
    end loop;
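
A short sketch of a for loop inside a process follows. The entity parity8 and its ports are hypothetical names, not from the slides; the loop XORs all eight input bits together to form a parity bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit parity generator.
entity parity8 is
  port (
    data : in  std_logic_vector(7 downto 0);
    odd  : out std_logic   -- '1' when data holds an odd number of ones
  );
end entity parity8;

architecture behavioral of parity8 is
begin
  process (data)
    variable p : std_logic;
  begin
    p := '0';
    -- Fixed iteration count: the loop is unrolled by the synthesizer
    for i in 0 to 7 loop
      p := p xor data(i);
    end loop;
    odd <= p;
  end process;
end architecture behavioral;
```

A for loop with constant bounds like this one is generally synthesizable; while loops with data-dependent exit conditions are more often restricted to testbenches.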

28 Data Types
bit, bit_vector, integer, natural, positive, boolean. All are predefined type names in VHDL. Data types define the type of data and the set of values that can be assigned to an object.

29 Integer Data Type Can contain positive and negative whole numbers
The entity declaration sets a range. In the example, the output will require 4 pins for the integer.
    entity integer_1 is
      port (A, B: in bit;
            Z: out integer range 0 to 15);
    end entity integer_1;

30 Natural data sub type A subtype of integer data
Holds whole numbers greater than or equal to zero In an application limit the range so you limit the number of pins assigned entity natural_1 is port( A: in natural range 0 to 16; X: out natural range 0 to 31); end entity natural_1;

31 Positive data sub type A subtype of integer data
Restricts integers to the range from 1 to the specified range limit. entity positive_1 is port( A, B: in bit; Z: out positive range 1 to 31); end entity positive_1;

32 Boolean Data Type Has two possible values true and false
In the example below, two variables are declared, one as false and the other as true.
    variable v1: boolean := false;
    variable v2: boolean := true;

33 User-defined enumeration types - Examples
    type state is (S0, S1);
    type alu_function is (disable, pass, add, subtract, multiply, divide);
    type octal_digit is ('0', '1', '2', '3', '4', '5', '6', '7');
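
To show where an enumeration type such as state is typically used, here is a minimal two-state machine sketch. The entity toggle_fsm, its ports (clk, reset, go, busy), and the behavior are assumptions made for illustration and do not come from the slides.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-state machine using the enumeration type "state".
entity toggle_fsm is
  port (
    clk, reset, go : in  std_logic;
    busy           : out std_logic
  );
end entity toggle_fsm;

architecture rtl of toggle_fsm is
  type state is (S0, S1);
  signal current : state := S0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        current <= S0;            -- synchronous reset to the idle state
      else
        case current is
          when S0 =>
            if go = '1' then
              current <= S1;      -- leave idle when go is asserted
            end if;
          when S1 =>
            current <= S0;        -- S1 lasts exactly one clock cycle
        end case;
      end if;
    end if;
  end process;

  busy <= '1' when current = S1 else '0';
end architecture rtl;
```

Since the case statement names every value of the enumeration, no when others branch is needed.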

34 Functions and Procedures
Functions and procedures are the two types of subprograms in VHDL. They allow for modularization and code reuse. Think of a subprogram as a process that is located outside of the architecture of a program. A function is a subprogram that operates on a set of inputs and returns an output. A procedure is a subprogram that operates on an argument list and passes values back through the argument list. Both functions and procedures require a call.

35 Function syntax FUNCTION function_name (<parameter_list>)
RETURN data_type IS [declarations] BEGIN (function statements) Return value; END function_name;

36 Function example
    function and_gate (X, Y: in std_logic) return std_logic is
    begin
      return X and Y;
    end and_gate;
To call a function: the output of a function can be assigned to an output port (same data type). Information can also be passed into the function by value.
    AND1: x <= and_gate(A, B);
    AND2: x <= and_gate('1', B);

37 Procedure syntax PROCEDURE procedure_name (<parameter_list>) IS
[declarations] BEGIN (procedure statements) END procedure_name; Procedure: similar to a function; however, the arguments in a procedure can include both inputs and outputs (function has inputs only).

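38 Procedure example
    procedure or_gate (signal X, Y: in std_logic; signal Z: out std_logic) is
    begin
      Z <= X or Y;
    end or_gate;
The parameters are declared as signals so that the signal assignment to Z is legal; an out parameter without the signal keyword is a variable and would need Z := X or Y.
To call a procedure: inputs and outputs are used to pass data in and out of a VHDL procedure (same data type). In a named association, the formal parameter name goes on the left:
    B1: or_gate (X => A, Y => B, Z => V1);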

39 Libraries, Packages and Package Bodies
They hold commonly used elements and allow them to be stored and used over and over again without having to rewrite them. Components, procedures and functions are in packages. Packages can be user defined or vendor supplied. Libraries are used to hold packages.

40 Libraries
Two types: standard libraries (like the IEEE standard library) and user-defined libraries (which hold user-defined packages).
IEEE Standard Library: the keyword library makes the IEEE library visible to the VHDL code. The keyword use tells the VHDL code what is to be used from the IEEE library. You can specify specific features from a package, or you can use the keyword all to make them all available.
VHDL library coding:
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_1164.std_logic;

41 Packages Used to hold reusable code Components Functions procedures
Package declaration package user_defined_name is package declarations end package user_defined_name;

42 Package Body
Package body syntax:
    package body user_defined_name is
      package body definitions
    end package body user_defined_name;
The package body is where items listed in the declaration are defined.

43 Package containing a function (1)
    LIBRARY IEEE;
    USE IEEE.std_logic_1164.all;
    PACKAGE specialFunctions IS
      FUNCTION AndGate (A, B: in std_logic) RETURN std_logic;
    END specialFunctions;
    PACKAGE BODY specialFunctions IS
      FUNCTION AndGate (A, B: in std_logic) RETURN std_logic IS
      BEGIN
        return A AND B;
      END AndGate;
    END specialFunctions;

44 Package containing a function (2)
The package is saved as specialFunctions in the library named work, which is the default library. Once the package is compiled, it can be used by other VHDL programs. Example:
    LIBRARY IEEE;
    USE IEEE.std_logic_1164.all;
    USE work.specialFunctions.all;
    Entity ExamplePackage is
      Port (A, B: in std_logic;
            X: out std_logic);
    End Entity ExamplePackage;
    Architecture MyGate of ExamplePackage is
    Begin
      Process (A, B)
      Begin
        A1: X <= AndGate(A, B);
      End process;
    End architecture MyGate;

45 FPGA
Field Programmable Gate Array (FPGA). Introduced by Xilinx in the mid-1980s for implementing digital logic. An FPGA can be visualized as a set of programmable logic blocks embedded in programmable interconnect. The interconnect architecture provides the connectivity between logic blocks. The programming technology determines the method of storing the configuration.

46 FPGA Re-programmable Logic Applications
When FPGAs were first introduced, they were considered just another form of gate array. The in-circuit reprogrammability of SRAM-based FPGAs provides more than a standard gate array does. FPGAs have gained rapid acceptance and growth over the past decade because they can be applied to a very wide range of applications: random logic, custom computing machines, device controllers, communication encoding and filtering. Programmable logic is becoming the dominant form of digital logic design and implementation.

47 FPGA design flow Design Flow is the step-by-step methodology to go through the process of FPGA design The design flow can be divided into 6 basic steps Design Entry Functional Verification and Simulation FPGA Synthesis FPGA Place & Route Circuit Analysis (Timing, Power …) Programming FPGA devices

48 Description of Design steps
Design Entry: describes the design that has to be implemented on the FPGA.
Functional Verification and Simulation: checks the logical correctness of the design.
FPGA Synthesis: converts the design entry into the actual gates/blocks needed.
FPGA Place & Route: selects the optimal position and minimizes the length of interconnections on the device.
Circuit Analysis (Timing): determines the speed of the circuit once it has been completely placed and routed.
Programming the FPGA: downloads the bitstream onto the FPGA device.
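
The functional verification and simulation step is usually carried out with a testbench. The sketch below exercises the Or2 entity from the Entity and Architecture slides; the testbench name Or2_tb, the signal names, and the 10 ns step are assumptions made for illustration, and Or2 is assumed to be compiled into library work.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench: a testbench entity has no ports.
entity Or2_tb is
end entity Or2_tb;

architecture sim of Or2_tb is
  signal in_x, in_y, out_f : std_logic := '0';
begin
  -- Device under test: the Or2 entity described earlier
  dut : entity work.Or2
    port map (x => in_x, y => in_y, F => out_f);

  stimulus : process
  begin
    -- Apply all four input combinations and check the output each time
    in_x <= '0'; in_y <= '0'; wait for 10 ns;
    assert out_f = '0' report "0 or 0 should be 0" severity error;

    in_x <= '0'; in_y <= '1'; wait for 10 ns;
    assert out_f = '1' report "0 or 1 should be 1" severity error;

    in_x <= '1'; in_y <= '0'; wait for 10 ns;
    assert out_f = '1' report "1 or 0 should be 1" severity error;

    in_x <= '1'; in_y <= '1'; wait for 10 ns;
    assert out_f = '1' report "1 or 1 should be 1" severity error;

    wait;  -- suspend forever so the simulation can end
  end process;
end architecture sim;
```

This process is not synthesizable (it uses wait for), which is fine: testbenches only run in the simulator.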

49 FPGA Design Flow
Let's put these design steps in order: design entry (VHDL), FPGA synthesis, FPGA place and route, download to FPGA, functional verification & simulation, circuit analysis (timing).

50 FPGA Design Flow
Implementation path: design entry (VHDL), FPGA synthesis, FPGA place and route, download to FPGA.
Analysis path: functional verification & simulation, circuit analysis (timing).

51 The origin of FPGA
The first transistor was created at Bell Labs in 1947. The first phase-shift oscillator on a single chip was fabricated by TI. Around the mid-1960s, TI introduced the 54xx and 74xx series. In 1971, Intel announced the world’s first microprocessor (uP), the 4004, which contained 2,300 transistors and could execute 60,000 operations per second. The first programmable ICs, referred to as Programmable Logic Devices (PLDs; the first were PROMs), arrived in 1970. They were simple compared to the newer devices called Complex PLDs.

52 PLD PLD (Programmable Logic Device)
Contains thousands of basic logic gates in a single package Capable of performing advanced sequential functions Must be configured to perform a specific function

53 PROMs
The first PLD. Consists of a fixed array of AND functions driving a programmable array of OR functions. Example: a 3-input, 3-output PROM with programmable OR links, where each OR has 8 inputs. Used to implement simple logic functions. Must be configured to perform a specific function.

54 PROMs – Example, 3-input, 3-output function
PROM Programmed to implement the 3 functions, W, X, and Y

55 PLAs The first became available in 1975
Both AND and OR arrays were programmable. 3-input, 3-output PLA, number of AND is independent of the number of inputs (PROM) OR array is independent of number of AND functions or number of inputs.

56 PLAs - Example PLA Programmed to implement the 3 functions, W, X, and Y

57 PALs
The first became available in the late 1970s. The exact opposite of a PROM: programmable AND array, fixed OR array. Example: a 3-input, 3-output PAL. Faster because only one array is programmable. Allows only a restricted number of product terms to be ORed.

58 CPLDs
Complex PLDs; the first became available in the early 1980s. The Mega-PAL was comprised of four standard PALs with some interconnect linking them together. Altera introduced CPLD based on a combination of EPROM. A generic device consists of a number of SPLD blocks sharing a common programmable interconnection matrix.

59 Programming PLDs
Use a device programmer. Each vendor has its own file format, which makes for a very time-consuming design flow. In 1980, a committee of the Joint Electron Device Engineering Council (JEDEC) proposed a standard format for PLD programming text files.

60 ASICs
Four main classes of ASIC (Application Specific Integrated Circuit). Full Custom: engineers have complete control over every mask layer used to fabricate the silicon chip. The ASIC vendor does not prefabricate any component on the silicon and does not provide any libraries of predefined logic gates and functions. Highly complex and time-consuming design process.

61 ASICs
Gate Arrays: based on the idea of a basic cell consisting of a collection of unconnected transistors and resistors. The ASIC vendor prefabricates silicon chips containing an array of the basic cells. Channeled gate arrays are presented as either single-column or dual-column arrays. The vendor defines a set of logic functions to be used by the design engineer (a MUX, for example), referred to as cells. The ASIC design flow is beyond the scope of this course.

62 FPGAs
Around the 1980s, there was a gap in the digital IC continuum. SPLDs and CPLDs were programmable and had fast design and modification times, but could not support large or complex functions. ASICs supported extremely large and complex functions, but were painfully expensive and time-consuming to design; once a design had been implemented, it was frozen in silicon. To address this gap, Xilinx developed a new class of IC called the Field-Programmable Gate Array (FPGA).

63 FPGAs
FPGAs are based on the concept of a programmable logic block: a simple 3-input lookup table (LUT), a register, and a MUX. Each FPGA contains a large number of these programmable logic blocks embedded in a configurable routing architecture. Every block can be configured to perform a different function, and the register can be programmed to trigger on the positive or negative clock edge. The MUX feeding the flip-flop can be configured to accept the output from the LUT or a separate input to the logic block, and the LUT can be configured to implement any 3-input logic function.

64 FPGAs Example: configure the LUT to perform Y = (A and B) OR (NOT C)
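
One way to picture the LUT configuration for Y = (A and B) OR (NOT C) is as an 8-entry table indexed by the inputs. The sketch below is a behavioral model, not the actual device primitive: the names lut3_demo, LUT_INIT, and addr are hypothetical, and the choice of A as the most significant address bit is an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 3-input LUT.
entity lut3_demo is
  port (
    a, b, c : in  std_logic;
    y       : out std_logic
  );
end entity lut3_demo;

architecture rtl of lut3_demo is
  -- Table contents for Y = (A and B) or (not C).
  -- Index is the binary number ABC (A is the MSB):
  --   ABC: 000 001 010 011 100 101 110 111
  --   Y  :  1   0   1   0   1   0   1   1
  constant LUT_INIT : std_logic_vector(0 to 7) := "10101011";
  signal addr : unsigned(2 downto 0);
begin
  addr <= a & b & c;                   -- inputs form the table address
  y    <= LUT_INIT(to_integer(addr));  -- output is the stored bit
end architecture rtl;
```

Changing only the constant LUT_INIT yields any other 3-input function, which is the sense in which the LUT is "configured".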

65 FPGAs Large number of programmable blocks (islands) surrounded by a (sea) of programmable interconnects

66 Altera’s Quartus II Tutorial
Start the Quartus II software and prepare to implement the Boolean equation X = AB +CD.

67 Altera’s Quartus II Tutorial
Create a new project Create a block design file (bdf) Draw the digital logic for the Boolean equation Make the circuit connections Compile the project

68 Altera’s Quartus II Tutorial
Create a vector waveform file (vwf) Add inputs and outputs to the waveform display Create timing waveforms for the inputs Perform a functional simulation of the x-output

69 Altera’s Quartus II Tutorial
Use the Altera development and education board to program an FPGA. Assign pins Recompile the project Program the FPGA Test the logic Use the VHDL text editor to recreate the design used in the block design.

70 FPGA Programming Technology SRAM-based FPGA
Fabric: means the underlying structure of the device. Majority of FPGA are SRAM based. They can be configured over and over again. Impact the memory R&D. SRAM cells are created exactly the same as the rest of the device. Downside: Have to be reconfigured every time the system is powered up. Configuration file is stored in external memory. Security issues with protecting your IP. Some SRAM-based FPGA supports encryption.

71 Antifuse-based FPGA Programmed offline using a special programmer.
Nonvolatile: the configuration remains when power is off. No external memory device is needed to store configuration data. Application: military and aerospace. Once programmed, it cannot be altered. No security issues with protecting your IP. Downside: they are OTP (one-time programmable); once programmed, the function is set in stone.

72 EPROM/Flash-based FPGA
Can be configured offline or using in-system programming. Nonvolatile, once programmed, the data is nonvolatile. Support protection mechanism. Application: military and Aerospace.

73 Summary

74 FPGA architectures (Fine, medium, and coarse-grained)
Reminder: large number of programmable logic blocks (islands) embedded in a (sea) of programmable interconnect. Fine-grain: each logic block can be used to implement only a very simple function such as any 3-input function. Coarse-grain: relatively larger logic block. As the granularity of the blocks increases to medium or high, the amount of connections into the blocks decreases compared to functionality they can support.

75 MUX based logic block Consider example y = (A AND B) OR C;
Each input to the block is presented with a logic 0, a logic 1, or the true or the inverse of a signal Implemented using MUX
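
As a rough sketch of the MUX-based idea, y = (A AND B) OR C can be built from two 2:1 multiplexers whose data inputs are tied to logic 0, logic 1, or a signal. This particular arrangement is an assumption and may differ from the slide's figure; the names mux_logic_demo and a_and_b are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical MUX-only implementation of y = (A and B) or C.
entity mux_logic_demo is
  port (
    a, b, c : in  std_logic;
    y       : out std_logic
  );
end entity mux_logic_demo;

architecture rtl of mux_logic_demo is
  signal a_and_b : std_logic;
begin
  -- First 2:1 MUX: select = A, data inputs = logic 0 and B.
  -- When A = '1' the output follows B, otherwise it is '0' (A and B).
  a_and_b <= b when a = '1' else '0';

  -- Second 2:1 MUX: select = C, data inputs = first MUX and logic 1.
  -- When C = '1' the output is '1', otherwise it is A and B.
  y <= '1' when c = '1' else a_and_b;
end architecture rtl;
```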

76 LUT based logic block Consider example y = (A AND B) OR C;
A group of input signals is used as an index (address) to the lookup table. Load the 3-input LUT with the appropriate values. LUT is SRAM based.

77 LUT based logic block

78 MUX versus LUT logic block?
The majority of today’s FPGA architectures are LUT based. MUX-based architectures do not provide high-speed carry logic chains, whereas LUT-based architectures lead in anything to do with arithmetic processing. The first FPGAs were based on 3-input LUTs; today’s architectures are mainly based on 4-input LUTs.

79 CLBs versus LABs? Can not LIVE by LUTs alone.
Will contain other elements such as MUX and registers.

80 Xilinx logic cell Each vendor has its own names for things.
Xilinx call it logic cell (LC), comprises: 4-input LUT MUX Register Clock can be configured rising versus falling Register can be configured as FFs or as a latch. Altera call it logic element (LE)

81 Slicing Next step up of the hierarchy is a slice.
Slice has one set of clock, clock enable, and set/reset signals common to both logic cells.

82 CLBs versus LABs? Next step up is CLB for Xilinx and LAB for Altera.
Xilinx has two or more slices in each CLB; the example shows four slices per CLB. Additionally, there is fast programmable interconnect within the CLB to connect neighboring slices.

83 CLBs versus LABs? Each 4-bit LUT can be used as 16x1 RAM. Also, the four slices per CLB, all LUTs can be configured to implement the following: Single port 16X8 bit RAM Single port 32X4 bit RAM Single port 64X2 bit RAM Single port 128X1 bit RAM Dual port 16X8 bit RAM Dual port 32X4 bit RAM Dual port 64X2 bit RAM Each 4-bit LUT can be used as 16-bit shift register
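
The 16-bit shift register use of a LUT can be described behaviorally as below. This is a sketch under assumptions: the names shift16, ce, d, q, and sr are hypothetical, and whether a given synthesis tool actually maps this description into a single LUT depends on the tool and the device family.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 16-stage, 1-bit-wide shift register with clock enable.
entity shift16 is
  port (
    clk, ce, d : in  std_logic;
    q          : out std_logic
  );
end entity shift16;

architecture rtl of shift16 is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        -- Shift left by one position; the new bit enters at index 0
        sr <= sr(14 downto 0) & d;
      end if;
    end if;
  end process;

  q <= sr(15);  -- the bit that entered 16 enabled clock edges earlier
end architecture rtl;
```

The description has no reset on the shift stages, which is usually a precondition for LUT-based shift register mapping.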

84 Embedded RAMs
Every application needs memory. FPGAs now include large chunks of embedded RAM called e-RAM or block RAM, usually organized in columns. Each block can be used independently, or multiple blocks can be combined to implement larger blocks. Useful for implementing single- and dual-port RAMs, FIFOs, state machines, and so on.

85 Embedded multiplier, adders, …
Some functions are inherently slow if they are implemented by connecting a large number of programmable logic blocks. Many FPGAs incorporate special hard-wired multiplier blocks, located in close proximity to the embedded RAM blocks.

86 Embedded multiplier, adders, …
Some FPGAs offer dedicated adder blocks (very useful in DSP applications), for example for Multiply-and-Accumulate (MAC). If the FPGA only provides multiplier blocks, you can combine a multiplier with an adder and store the results in registers.
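
A multiply-and-accumulate built from a multiplier, an adder, and a register can be sketched as follows. The entity mac8, the 8-bit operand width, and the 20-bit accumulator width are assumptions chosen for illustration; whether the multiplier lands in a hard-wired block is up to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8x8 signed multiply-and-accumulate.
entity mac8 is
  port (
    clk, clear : in  std_logic;
    a, b       : in  signed(7 downto 0);
    acc        : out signed(19 downto 0)
  );
end entity mac8;

architecture rtl of mac8 is
  signal sum : signed(19 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        sum <= (others => '0');   -- synchronous clear of the accumulator
      else
        -- a * b is 16 bits wide; resize sign-extends it to 20 bits
        -- before it is added to the running sum held in the register
        sum <= sum + resize(a * b, sum'length);
      end if;
    end if;
  end process;

  acc <= sum;
end architecture rtl;
```

The four extra accumulator bits leave headroom for 16 products before overflow is possible; a real design would size this to its own accumulation length.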

87 Embedded processor cores
Many applications make use of microprocessors in one form or another. High-end FPGAs contain one or more embedded microprocessors, referred to as microprocessor cores. Hard processor cores are dedicated, predefined blocks. One option is to locate the core in a strip to the side of the main fabric; advantages: the main FPGA fabric stays identical, which is easier for the design tools.

88 Embedded processor cores
Embed within the main fabric, design tools needs to account for the presence of these blocks in the fabric.

89 Embedded processor cores - soft
Configure a group of programmable logic blocks to act as a microprocessor, soft cores. Are simpler and slower than hard-cores. Advantages: you implement it if you need it Instantiate as many as you need.

90 Clock trees and clock managers
All of the synchronous elements need to be driven by clock signals. Clock signal originates outside the FPGA, comes to FPGA via a special clock input pin and then routed through the device. Clock Tree: the main clock signal branches. This structure ensures that all of the flip-flops see their version as close together as possible.

91 Clock trees and clock managers – cnt’d
If the clock were distributed as a single long track driving all registers, one after another, then registers closer to the clock pin would see the clock signal sooner. This is referred to as skew (avoid!). The clock tree is implemented using special tracks that are separate from the general-purpose programmable interconnect. Usually, you will have multiple clock domains and multiple clock pins.

92 Clock trees and clock managers – cnt’d
Instead of connecting clock pin into an internal clock tree, it can drive special hard-wired function (block) called clock manager. Clock manager generates a number of daughter clocks. Daughter clocks can drive internal clock trees or external output pins to provide external clock.

93 Clock trees and clock managers – cnt’d
Each FPGA family has its own type of clock manager. Clock managers support jitter removal: clock edges may arrive a little early or a little late (jitter). The FPGA clock manager can be used to detect and correct this jitter and to provide clean daughter clock signals for use inside the device.

94 Clock trees and clock managers – cnt’d
Frequency Synthesis: outside clock is not what the engineers wish for. Clock manager can be used to generate daughter clocks with frequencies derived from original clock. Example: 3 daughter clocks, 1.0, 2.0, 0.5 x original clock frequency.
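
To illustrate only the 0.5x case, a half-frequency signal can be produced by a toggle flip-flop, as sketched below. This is an assumption-laden illustration, not how a clock manager works internally: real clock managers are dedicated hard blocks, and clocks generated in general-purpose logic are usually avoided in favor of those blocks. The names clk_div2, clk_in, and clk_half are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical divide-by-2: output frequency is 0.5 x the input clock.
entity clk_div2 is
  port (
    clk_in   : in  std_logic;
    clk_half : out std_logic
  );
end entity clk_div2;

architecture rtl of clk_div2 is
  signal q : std_logic := '0';
begin
  process (clk_in)
  begin
    if rising_edge(clk_in) then
      q <= not q;  -- toggles once per input period, so two periods per cycle
    end if;
  end process;

  clk_half <= q;
end architecture rtl;
```

Multiplying the frequency (the 2.0x daughter clock) cannot be written this way in plain logic, which is one reason dedicated clock managers exist.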

95 Clock trees and clock managers – cnt’d
Phase shifting: some designs require the use of clocks that are phase shifted (delayed) with respect to each other. Clock managers allow you to select from fixed phase shifts of 90, 180, and 270 degrees, or to configure the exact amount of phase shift. Example: the 1st clock is in phase, the 2nd is shifted by 90 degrees, and so forth.

96 General Purpose I/O Today’s FPGA package can have 1,000 or more pins, arranged as array across the base of the package. Each FPGA general purpose I/O can be configured to accept and generate signals conforming to whichever standard is required. General purpose I/O splits in a number of banks, starting from 0 to 7

97 General Purpose I/O – cnt’d
Each bank can be configured to support a particular I/O standard: LVTTL, LVCMOS, PCI, LVDS. This allows the FPGA to work with multiple I/O standards, or to translate between different protocols that are based on particular electrical standards.

98 FPGA Families Many different types suited for almost every kind of application. FPGAs are grouped into categories, often referred as families or series, each with common characteristics. Some FPGAs are characterized as having high volume, low cost, high temperature, available in various sizes, packages, and speed. Manufacturers group FPGAs according to their application. (automotive, space, medical,.. Etc)

99 Altera Families
Altera refers to FPGA families as series.
Stratix: high end and high density, with on-chip transceivers.
Arria: midrange, transceiver based.
Cyclone: low cost, low power consumption.

100 A Complete Solutions Portfolio
Devices: CPLDs, low-cost FPGAs, mid-range transceiver FPGAs, high-density high-performance FPGAs, ASICs. Tools: embedded soft processors, Intellectual Property (IP), design software, development kits.
Altera is the programmable solutions company, offering a complete portfolio of products for creating programmable logic. That starts with the devices: from the low-end MAX series of CPLDs, which include onboard ROM for storing device programming information, through Cyclone devices, which offer the lowest-cost FPGAs with the largest amount of logic resources available, to the high-end Stratix series of FPGAs for creating high-performance designs that require large amounts of logic. Arria devices let you create high-speed transceiver designs in a low-cost package. Finally, HardCopy ASICs let you transfer your Stratix FPGA design to a fixed ASIC at an affordable price. Besides the actual devices, Altera offers tools to assist with the creation of your designs. We offer the Nios II embedded processor for the creation of complete system-on-a-programmable-chip solutions, as well as a number of custom, optimized IP cores. Development kits are available for most devices for early design and prototyping. But the focus for this training will be the Quartus II software, the main tool you’ll use to create programmable logic designs using Altera devices.

101 Altera – IP Many FPGA manufacturers offer a variety of what are called Intellectual property (IP) cores or functions. Allow the designer to select and customize specific desired function. Advantages: Faster code development time Reduced design risk less likelihood of errors. Better and faster compiling Some IP cores or functions are free, others are fee based, The IP cores or functions are manufacturer dependent. Altera’s IP called Megafunctions, designed for only their FPGAs.

102 Altera Megafunctions
Pre-made design blocks. Benefits: configurable, parameterized settings add flexibility & portability; “drop-in” support accelerates design entry; pre-optimized for Altera architecture. Two versions: Quartus II megafunctions and Intellectual Property (IP) megafunctions.
Megafunctions are another way of creating a design, for simple logic or for more complex logic such as DDR controllers. There are free blocks of these functions in the software, non-encrypted and written in Altera HDL. Non-standard blocks are prefixed with alt and include optional customizable functions. LPM megafunctions are industry-standard functions. Megafunctions can also be downloaded from the web.

103 Quartus II Megafunctions
Free & installed with Quartus II software Non-encrypted functions written in AHDL (Altera HDL) HDL simulation models installed in Quartus II libraries Two types Altera-specific megafunctions (begin with “ALT”) Library of parameterized modules (LPMs) Examples Arithmetic On-chip RAM/ROM PLLs DDR/QDR/RLDRAM memory controllers Removed counter/compare in Don’t see them in the MegaWizard.

104 IP Megafunctions
Logic for the IP function is encrypted. A license must be purchased (except for the IP base suite). Two types:
MegaCore® IP: developed by Altera; installed with the Quartus II software, or downloaded and installed individually from the Altera website.
Altera Megafunctions Partner Program (AMPP℠) IP: developed by 3rd-party IP vendors & certified by Altera; contact the vendor for evaluating and licensing the function.
All MegaCore functions & some AMPP functions support the OpenCore® Plus feature: develop the design using a free version of the core; HDL simulation models are provided with the IP; generate time-limited configuration/programming files. See AN320: OpenCore Plus Evaluation of Megafunctions.

105 MegaCore IP Examples
Included in IP base suite: FIR Compiler, Fast Fourier Transform, DDR/DDR2 High Performance Memory Controller.
License required: Triple-Speed Ethernet MAC, CRC Compiler, PCI Compiler.
Some IP is included in the Quartus II subscription; some requires a license. See the link shown for a complete list of Altera IP solutions.

106 MegaWizard Plug-in Manager
Eases implementation and configuration of megafunctions & IP. Available through the GUI, the command line, or both. Open it from Tools -> MegaWizard Plug-In Manager or from the Tasks window. Select the megafunction or IP, the language, and the file name. Megafunctions can be created quickly and are divided into categories. The device family will be based on the device selected in the Quartus II project. Command line: qmegawiz <-silent> <module | wizard>=<mf_name> <ports & parameters options> file_name

107 MegaWizard Example Multiply-Add megafunction
Slide callouts: updating graphical representation; customization options; locate documentation in Quartus II Help or on the web; three-step process to configure the megafunction. After you select the type of megafunction and name the output file, the MegaWizard Plug-In Manager guides you through all the options available for the selected megafunction. The wizard customizes itself depending on which megafunction is being configured and on the options selected. A graphical representation of the final megafunction is displayed along with the estimated device resource usage. Documentation about the selected megafunction is available through the documentation button, which links to the built-in Quartus II help and to the online megafunction user guides. Every megafunction has its own user guide describing all available configuration options. Navigate through the wizard quickly using the tabs in the upper-left corner. Creating a megafunction is a three-step process, starting with the parameter settings, as shown here for a multiply-adder function. The second tab, labeled EDA, displays the simulation model you would use to simulate this megafunction in a third-party simulation tool.

108 MegaWizard Output File Selection
The third tab, Summary, shows where the output files will be created. Some files are optional, depending on the options selected.

109 Programming an FPGA – configuration cells
Configuration file: contains the information that is uploaded into the FPGA in order to program it (the bit file). Programming is simple: load the configuration file into the device. Programmable interconnect: connects the device’s primary inputs and outputs to the programmable logic blocks, and the blocks to each other.

110 Programming an FPGA – configuration cells
An example of the use of SRAM-controlled switches is illustrated, showing two applications of SRAM cells: controlling the gate nodes of pass-transistor switches, and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND gate in the upper left corner) to another through two pass-transistor switches and then a multiplexer, all controlled by SRAM cells.

111 Programming an FPGA – configuration cells
A simple programmable logic block: a 4-input LUT, a MUX, and a register. The configuration cells determine the following. MUX: which input is selected. Register: edge-triggered flip-flop or latch, positive or negative clock edge, active-low or active-high enable, and whether it is initialized to 0 or 1. LUT: 16 configuration cells, one per input combination.
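The logic block described above can be sketched as a behavioral model. This is only an illustration, not how any vendor implements its logic elements: the entity name lut4_block and the generics INIT and USE_REG are hypothetical stand-ins for the SRAM configuration cells.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical model of one logic block: 4-input LUT, output MUX, register.
-- The generics play the role of the configuration cells.
ENTITY lut4_block IS
  GENERIC (
    INIT    : std_logic_vector(15 DOWNTO 0) := x"8000"; -- 16 LUT cells (default: 4-input AND)
    USE_REG : boolean := false                           -- MUX cell: registered or direct output
  );
  PORT (
    clk : IN  std_logic;
    i   : IN  std_logic_vector(3 DOWNTO 0);
    o   : OUT std_logic
  );
END ENTITY lut4_block;

ARCHITECTURE behavior OF lut4_block IS
  SIGNAL lut_out : std_logic;
  SIGNAL reg_out : std_logic := '0';
BEGIN
  -- The four inputs address one of the 16 configuration cells
  lut_out <= INIT(to_integer(unsigned(i)));

  -- Register stage (positive-edge flip-flop assumed)
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      reg_out <= lut_out;
    END IF;
  END PROCESS;

  -- Output MUX, steered by a configuration cell
  o <= reg_out WHEN USE_REG ELSE lut_out;
END ARCHITECTURE behavior;
```

With the default INIT of x"8000", only cell 15 holds a 1, so the block behaves as a 4-input AND gate; changing the 16 bits changes the logic function without changing the structure.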

112 Programming an FPGA – SRAM based
Volatile: the device has to be programmed in-system and always needs to be reprogrammed when power is first applied. All SRAM configuration cells can be viewed as one long shift register whose beginning and end are accessible from the outside world. The data-out end is only used when multiple FPGAs are configured by cascading (daisy-chaining) them together. An FPGA can contain 25 million cells, which means clocking 25 million bits of configuration data into the device.

113 Programming an FPGA – SRAM based
LUT: can be configured to act as LUT, 16x1 chunk of distributed RAM, or as 16-bit shift register. Configuration port: small dedicated group of pins used to inform the device which configuration mode is going to be used, two pins are used to provide four modes. Mode pins are hardwired to desired logic (0 or 1)

114 Programming an FPGA – SRAM based Serial load with FPGA as a master
Serial load with FPGA as a master: the simplest mode. It uses an external PROM (now flash) with a single data output pin connected to the FPGA's configuration data-in pin. The FPGA uses several pins to control the external memory device (reset, clock) and clocks the configuration data out of the memory device. The configuration data-out pin can be used to read the configuration data back from the device, or to daisy-chain several FPGAs that share a single memory device.

115 Programming an FPGA – SRAM based Parallel load with FPGA as a master
Very similar to serial mode, except that data is read from the memory device in 8-bit chunks. The FPGA also supplies the external memory with an address bus: an internal counter in the FPGA generates the address and keeps incrementing. Does it offer more speed? Not in early devices, where the data read still had to be clocked in serially; in current devices, yes. Drawbacks: signal-integrity issues with an 8-bit data bus and a 24-bit address bus. Newer external memories do not require an external address, so the FPGA no longer requires the counter.

116 Programming an FPGA – SRAM based Parallel load with FPGA as a slave
FPGA as a master is attractive because only the FPGA and the external memory are involved. Alternatively, a microprocessor can be used to load the FPGA as a slave: it informs the FPGA to start the configuration process, then repeatedly reads a byte of data from the memory device and writes it into the FPGA.

117 Using the JTAG port Today’s FPGAs are equipped with a JTAG port (Joint Test Action Group, an IEEE standard originally used for testing circuit boards). JTAG port: data input and data output; the JTAG registers are daisy-chained. Data is clocked serially into the JTAG registers, the FPGA operates on the data, and the result is ultimately clocked back out of the JTAG port.

118 Using the JTAG port JTAG can be used for more than boundary scan. The FPGA connects its SRAM configuration shift register to the JTAG scan chain, so JTAG can be used to program the FPGA. Today’s FPGAs can support five different programming modes and therefore require three mode pins.

119 Using an embedded processor
When an FPGA contains an embedded processor, the processor may have its own dedicated JTAG port. JTAG can be used to initialize the internal microprocessor core; configuration can then be handled by the processor.

120 FPGA Design Flow – Design Phase
The first development phase is Design. An FPGA design can be: converting a schematic to HDL, modifying an existing design, or a totally new design. This is a very critical phase. Goals: learn how to evaluate the design package, which decisions to make prior to creating the design, and how to create the design. [Flow diagram. Implementation path: design entry (VHDL), FPGA synthesis, FPGA place and route, download to FPGA. Analysis path: functional verification & simulation, circuit analysis (timing).]

121 Design Phase More than just create the design
Design materials must be understood. The “design package” contains the requirements that define the FPGA features and functions: what the design must do and how. Success or failure of the design largely depends on: the quality of the design inputs, making key decisions, and the development tools.

122 Design Package Usually written by system engineer, or architect.
Includes: creation of the design architecture, partitioning of the design into sections, creation of the design requirements, and creation of timing and other diagrams (supporting documents). Do not create your own requirements; always ask. You should always evaluate the package's content prior to starting the design.

123 Design Package example
Contents: timing diagrams, requirement documents, state machines, schematics, etc. Evaluate: Package analysis: be sure to have a clear understanding of what you are to design (if you have questions, always ask). Getting clarification: not all design packages are crystal clear; go directly to the source. Organize: make sure you work from the latest and most accurate information.

124 Pre-design Decisions Design format, FPGA vendor? Tools used?
Design requirements may define one or more of the pre-design decisions. Making one decision can automatically determine another: selecting Altera as the FPGA vendor determines the Quartus tool. For manufacturer-dependent designs, the manufacturer must be known in the design phase; for manufacturer-independent designs, the manufacturer and part number are not needed until the synthesis phase.

125 Design Format Prior to creating a design, you must select the design’s format: schematic capture, HDL, or a combination. Sometimes the decision has already been made by your design package; otherwise you select the manufacturer and development tools. If you are starting a new design, you may have the option to select the design format. Schematic capture: Pros: the design is drawn as a schematic, which is easier to create, read, and understand. Cons: logic symbols are proprietary, the design is manufacturer dependent and less flexible, and the choice of development tools is limited. HDL: Pros: more design and manufacturer flexibility; manufacturer independent. Cons: may be difficult to read and understand.

126 FPGA Manufacturer How to select a device: you need to know how many resources your design requires, which can be difficult at first. A good way: pick any device, synthesize the design, and review the resources required in the output report. With this information, use a datasheet to select a more appropriately sized device. Factors to consider when selecting the device: Design application: avionics, military, automotive, medical, and so forth. Environment: military, industrial, commercial. Temperature range: commercial 0 to 85 C, industrial -40 to 100 C, military -55 to 125 C. Design size: board allocated space, package.

127 Development Tools Each development phase utilizes specific tools. The design-phase development tool depends mainly on the output format: if your design is a schematic capture, then the design entry tool must support schematic capture. Cost: fees can be very expensive (license fees, yearly maintenance); know your needs. Design sharing: have a set of tools to manage and control the design and its revisions. Complete or standalone: manufacturers offer complete development tools (Altera’s Quartus, Xilinx ISE). Standalone tools perform a single function, such as synthesis or simulation; examples: Mentor Graphics' ModelSim for simulation and Synopsys’s Synplify for design synthesis.

128 Advanced VHDL – Design Phase
Writing synthesizable VHDL Inferring common logic functions Coding state machines Improving logic utilization & performance Writing parameterized code

129 Simulation vs. Synthesis
Simulation: code is executed in the exact way it is written; the user has flexibility in writing; initialization of logic is supported. Synthesis: code is interpreted and hardware is created; knowledge of the PLD architecture is important; synthesis tools require certain coding to generate correct logic; only a subset of the VHDL language is supported; coding style is important for fast and efficient logic; initialization is controlled by the device; logic implementation can be adjusted to support initialization. Pre- and post-synthesis logic should operate the same.

130 Writing Synthesizable VHDL
Synthesizable VHDL Constructs Sensitivity lists Latches vs. registers IF-THEN-ELSE structures CASE statements Variables Synthesizable subprograms Combinatorial loops Gated clocks

131 Some Synthesizable VHDL Constructs
ENTITY ARCHITECTURE CONFIGURATION PACKAGE Concurrent signal assignments PROCESS SIGNAL VARIABLE (non-shared) CONSTANT IF-ELSE CASE Loops (fixed iteration) Multi-dimensional arrays PORT GENERIC (constant) COMPONENT Component & direct instantiation GENERATE FUNCTION PROCEDURE ASSERT (constant false) WAIT (one per process) TYPE SUBTYPE Synthesis tools may place certain restrictions on supported constructs See the online help in Quartus II (or your target synthesis tool) for a complete list

132 Some Non-Synthesizable VHDL Constructs
ACCESS ASSERT DISCONNECT FILE GROUP NEW Physical delay types PROTECTED SHARED VARIABLE Signal assignment delays These are some of the constructs not supported by Quartus II synthesis See the online help in Quartus II (or your target synthesis tool) for a complete list

133 Two Types of RTL PROCESS Statements
Combinatorial PROCESS: sensitive to all signals used on the right-hand side of assignment statements. Example: PROCESS (a, b, sel); the sensitivity list includes all inputs used in the combinatorial logic. Sequential PROCESS: sensitive to a clock and control signals. Example: PROCESS (clr, clk); the sensitivity list does not include the d input, only the clock and/or control signals. [Figures: a multiplexer with inputs a, b, sel and output c; a register with inputs d, clk, clr and output q.]
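The two process styles can be sketched side by side as follows. The entity name proc_types and the active-low polarity of clr are assumptions made for this illustration; the slide only gives the sensitivity lists.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY proc_types IS
  PORT (
    a, b, sel   : IN  std_logic;
    d, clk, clr : IN  std_logic;
    c, q        : OUT std_logic
  );
END ENTITY proc_types;

ARCHITECTURE rtl OF proc_types IS
BEGIN
  -- Combinatorial process: every signal that is read appears in the list
  comb : PROCESS (a, b, sel)
  BEGIN
    IF sel = '1' THEN
      c <= a;
    ELSE
      c <= b;
    END IF;
  END PROCESS comb;

  -- Sequential process: only the clock and the asynchronous clear are listed;
  -- d is sampled on the clock edge, so it is deliberately left out
  seq : PROCESS (clr, clk)
  BEGIN
    IF clr = '0' THEN              -- assumed active-low clear
      q <= '0';
    ELSIF rising_edge(clk) THEN
      q <= d;
    END IF;
  END PROCESS seq;
END ARCHITECTURE rtl;
```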

134 Sensitivity Lists An incomplete sensitivity list in a combinatorial PROCESS block may result in differences between RTL and gate-level simulations, because the synthesis tool synthesizes as if the sensitivity list were complete. Incorrect way: PROCESS (a, b) with y <= a AND b AND c; the simulated behavior is not that of the synthesized 3-input AND gate. Correct way for the intended AND logic: PROCESS (a, b, c) with y <= a AND b AND c;

135 Common Pitfall – Missing Inputs from Sensitivity List
Pitfall: missing inputs from the sensitivity list when describing combinational behavior. This results in sequential behavior in simulation. Wrong 4x1 mux example: it has memory, produces no compiler error, and is just not a mux.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY Mux4 IS
  PORT (i3, i2, i1, i0: IN std_logic;
        s1, s0: IN std_logic;
        d: OUT std_logic);
END Mux4;
ARCHITECTURE Beh OF Mux4 IS
BEGIN
  -- Note: missing i3, i2, i1, i0
  PROCESS(s1, s0)
  BEGIN
    IF (s1='0' AND s0='0') THEN d <= i0;
    ELSIF (s1='0' AND s0='1') THEN d <= i1;
    ELSIF (s1='1' AND s0='0') THEN d <= i2;
    ELSE d <= i3;
    END IF;
  END PROCESS;
END Beh;
With i3-i0 missing from the sensitivity list, the process recomputes d if s1 or s0 changes, but fails to recompute d if i3 (or i2-i0) changes. Reminder: Combinational behavior: the output value is purely a function of the present input values. Sequential behavior: the output value is a function of present and past input values, i.e., the system has memory. [Timing diagram: i1, i3, s1, s0, d]
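For comparison, a corrected version of the mux is sketched below. The only change is the sensitivity list; the architecture name BehFixed is chosen here for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Mux4 IS
  PORT (i3, i2, i1, i0 : IN  std_logic;
        s1, s0         : IN  std_logic;
        d              : OUT std_logic);
END Mux4;

ARCHITECTURE BehFixed OF Mux4 IS
BEGIN
  -- All signals read inside the process are listed,
  -- so d is recomputed whenever any data or select input changes
  PROCESS (i3, i2, i1, i0, s1, s0)
  BEGIN
    IF (s1 = '0' AND s0 = '0') THEN
      d <= i0;
    ELSIF (s1 = '0' AND s0 = '1') THEN
      d <= i1;
    ELSIF (s1 = '1' AND s0 = '0') THEN
      d <= i2;
    ELSE
      d <= i3;
    END IF;
  END PROCESS;
END BehFixed;
```

If the tool flow supports VHDL-2008, writing PROCESS (ALL) avoids having to maintain the list by hand.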

136 Latches vs. Registers Altera devices have registers in logic elements, not latches Latches are implemented using combinatorial logic & can make timing analysis more complicated Look-up table (LUT) devices use LUTs in combinatorial loops Product-term devices use more product-terms Recommendations Design with registers (RTL) Watch out for inferred latches Latches inferred on combinatorial outputs when results not specified for set of input conditions Lead to simulation/synthesis mismatches

137 IF-ELSE Structure The IF-ELSE structure (like the WHEN-ELSE concurrent assignment) implies prioritization and dependency: the Nth clause implies that all N-1 previous clauses are not true. Logical equation: (<cond1> • A) + (<cond1>’ • <cond2> • B) + (<cond1>’ • <cond2>’ • <cond3> • C) + … Beware of needlessly “ballooning” logic. Consider restructuring IF statements, which may flatten the multiplexer and reduce logic; for example, the nested form IF <cond1> THEN IF <cond2> THEN can be written as IF <cond1> AND <cond2> THEN. If sequential statements are mutually exclusive, individual IF structures may be more efficient.

138 When Writing IF-ELSE Structures…
Cover all cases Uncovered cases in combinatorial processes result in latches For efficiency, consider Using don’t cares (‘-’ or ‘X’) for final ELSE clause (avoiding unnecessary default conditions) Synthesis tool has freedom to encode don’t cares for maximum optimization Assigning initial values and explicitly covering only those results different from initial values

139 Unwanted Latches Combinatorial processes that do not cover all possible input conditions generate latches.
PROCESS (sel, a, b, c)
BEGIN
  IF sel = "001" THEN output <= a;
  ELSIF sel = "010" THEN output <= b;
  ELSIF sel = "100" THEN output <= c;
  END IF;
END PROCESS;
[Figure: sel(2), sel(1), sel(0), A, B, C feed LOGIC, which drives a LATCH producing output.]

140 Unwanted Latches Removed
Close all IF-ELSE structures. If possible, assign don’t cares to the ELSE clause for improved logic optimization.
PROCESS (sel, a, b, c)
BEGIN
  IF sel = "001" THEN output <= a;
  ELSIF sel = "010" THEN output <= b;
  ELSIF sel = "100" THEN output <= c;
  ELSE output <= (OTHERS => 'X');
  END IF;
END PROCESS;
[Figure: sel(2), sel(1), sel(0), A, B, C feed LOGIC, which drives output directly; no latch.]

141 Common Pitfall – Output not Assigned on Every Pass
Pitfall: failing to assign every output on every pass through the process for combinational behavior. This results in sequential behavior, referred to as an inferred latch. Wrong 2x4 decoder example: it has memory, produces no compiler error, and is just not a decoder.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
ENTITY Dcd2x4 IS
  PORT (i1, i0: IN std_logic;
        d3, d2, d1, d0: OUT std_logic);
END Dcd2x4;
ARCHITECTURE Beh OF Dcd2x4 IS
BEGIN
  PROCESS(i1, i0)
  BEGIN
    IF (i1='0' AND i0='0') THEN
      d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '1';
    ELSIF (i1='0' AND i0='1') THEN
      d3 <= '0'; d2 <= '0'; d1 <= '1'; d0 <= '0';
    ELSIF (i1='1' AND i0='0') THEN
      d3 <= '0'; d2 <= '1'; d1 <= '0'; d0 <= '0';
    ELSIF (i1='1' AND i0='1') THEN
      d3 <= '1';
    END IF;
    -- Note: missing assignments to all outputs in last ELSIF
  END PROCESS;
END Beh;
The last ELSIF is missing assignments to outputs d2, d1, d0. With i1i0=10, d2=1 and the others are 0; when the inputs then change to i1i0=11, d3=1 but d2 stays the same. [Timing diagram: i1, i0, d3, d2]

142 Common Pitfall – Output not Assigned on Every Pass
Same pitfall often occurs due to not considering all possible input combinations PROCESS(i1, i0) BEGIN IF (i1='0' AND i0='0') THEN d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '1'; ELSIF (i1='0' AND i0='1') THEN d1 <= '1'; d0 <= '0'; ELSIF (i1='1' AND i0='0') THEN d3 <= '0'; d2 <= '1'; d1 <= '0'; d0 <= '0'; END IF; END PROCESS; Last "ELSE" missing, so not all input combinations are covered (i.e., i1i0=11 not covered) – no update to the outputs
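One way to avoid both forms of this pitfall is to assign defaults to every output at the top of the process, so that each branch only overrides what differs. The sketch below applies that idea to the decoder; the architecture name BehFixed is chosen here for illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Dcd2x4 IS
  PORT (i1, i0         : IN  std_logic;
        d3, d2, d1, d0 : OUT std_logic);
END Dcd2x4;

ARCHITECTURE BehFixed OF Dcd2x4 IS
BEGIN
  PROCESS (i1, i0)
  BEGIN
    -- Defaults: every output is assigned on every pass, so no latch is inferred
    d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '0';

    IF (i1 = '0' AND i0 = '0') THEN
      d0 <= '1';
    ELSIF (i1 = '0' AND i0 = '1') THEN
      d1 <= '1';
    ELSIF (i1 = '1' AND i0 = '0') THEN
      d2 <= '1';
    ELSE
      d3 <= '1';  -- closing ELSE covers i1i0=11 and any remaining input values
    END IF;
  END PROCESS;
END BehFixed;
```

Because the last signal assignment in a process wins, the defaults cost nothing in hardware; they only guarantee that no path leaves an output unassigned.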

143 Mutually Exclusive IF-ELSE Latches
Beware of building unnecessary dependencies. For example, outputs x, y, z are mutually exclusive, but the IF-ELSIF chain causes all outputs to be dependent on all tests and creates latches.
PROCESS (sel, a, b, c)
BEGIN
  IF sel = "010" THEN x <= a;
  ELSIF sel = "100" THEN y <= b;
  ELSIF sel = "001" THEN z <= c;
  ELSE x <= '0'; y <= '0'; z <= '0';
  END IF;
END PROCESS;
[Figure: for each of X, Y, and Z, sel(2), sel(1), sel(0) and the data input (A, B, or C) feed LOGIC followed by a LATCH.]

144 Mutually Exclusive Latches Removed
Separate the IF statements and close each one:
PROCESS (sel, a, b, c)
BEGIN
  IF sel = "010" THEN x <= a; ELSE x <= '0'; END IF;
  IF sel = "100" THEN y <= b; ELSE y <= '0'; END IF;
  IF sel = "001" THEN z <= c; ELSE z <= '0'; END IF;
END PROCESS;
Alternatively, assign default values first and override them:
PROCESS (sel, a, b, c)
BEGIN
  x <= '0'; y <= '0'; z <= '0';
  IF sel = "010" THEN x <= a; END IF;
  IF sel = "100" THEN y <= b; END IF;
  IF sel = "001" THEN z <= c; END IF;
END PROCESS;
[Figure: for each of X, Y, and Z, sel(0), sel(1), sel(2) and the data input (A, B, or C) feed LOGIC that drives the output directly; no latches.]

145 Nested IF Generating Unwanted Latches
Use nested IF statements with care. For example, these nested IF statements do not cover all possible conditions (open IF statements), so a latch is created.
PROCESS (ina, inb)
BEGIN
  IF ina = '1' THEN
    IF inb = '1' THEN
      y <= '1';
    END IF;
  ELSE
    y <= '0';
  END IF;
END PROCESS;
Truth table (ina, inb, out): 1 1 gives 1; 0 0 gives 0; 0 1 gives 0; 1 0 gives ? (not specified). Uncovered cases infer latches. There is no default value for objects.

146 Nested IF – Unwanted Latches Removed
PROCESS (ina, inb)
BEGIN
  y <= '0';
  IF ina = '1' THEN
    IF inb = '1' THEN
      y <= '1';
    END IF;
  END IF;
END PROCESS;
Truth table (ina, inb, out): 1 1 gives 1; 0 0 gives 0; 0 1 gives 0; 1 0 gives 0. Using initialization to cover all cases; no latch inferred.

147 Case Statements Case statements usually synthesize more efficiently when mutual exclusivity exists Define outputs for all cases Undefined outputs for any given case generate latches VHDL already requires all case conditions be covered Use WHEN OTHERS clause to close undefined cases (if any remain)

148 Case Statement Recommendations
Initialize all case outputs or ensure outputs assigned in each case Assign initialized or default values to don’t cares (X) for further optimization, if logic allows

149 Unwanted Latches - Case Statements
Conditions where the output is undetermined:
output: PROCESS (filter)
BEGIN
  CASE filter IS
    WHEN idle => nxt <= '0'; first <= '0';   -- sel missing
    WHEN tap1 => sel <= "00"; first <= '1';  -- nxt missing
    WHEN tap2 => sel <= "01";                -- nxt & first missing
    WHEN tap3 => sel <= "10";                -- nxt & first missing
    WHEN tap4 => sel <= "11"; nxt <= '1';    -- first missing
  END CASE;
END PROCESS output;
Undetermined output conditions imply memory. A latch is generated for ALL 3 outputs.

150 Latches Removed - Case Statements
Conditions where the output is determined:
output: PROCESS (filter)
BEGIN
  first <= '0'; nxt <= '0'; sel <= "00";  -- signals initialized
  CASE filter IS
    WHEN idle => NULL;
    WHEN tap1 => first <= '1';
    WHEN tap2 => sel <= "01";
    WHEN tap3 => sel <= "10";
    WHEN tap4 => sel <= "11"; nxt <= '1';
  END CASE;
END PROCESS output;
To remove latches and ensure outputs are never undetermined: use signal initialization at the beginning of the case statement (the case statement then only deals with changes); use don’t cares ('-') for the WHEN OTHERS clause, if the design allows (for better logic optimization); or manually set each output in each case.

151 Variable Declarations
Variables are declared inside a process. Variable assignments are represented by := Variable declaration: VARIABLE <name> : <DATA_TYPE> := <value>; Example: VARIABLE temp : STD_LOGIC_VECTOR (7 DOWNTO 0); Variable assignments are updated immediately and do not incur a delay. Variables serve as temporary storage.

152 Assigning Values to Variables
VARIABLE temp : STD_LOGIC_VECTOR (7 DOWNTO 0); Variable assignments are represented by := Examples All bits temp := “ ”; temp := x”aa” ; ( ) VHDL also supports ‘o’ for octal and ‘b’ for binary Bit-slicing temp (7 DOWNTO 4) := “1010”; Single bit temp(7) := ‘1’; Use double-quotes (“ “) to assign multi-bit values and single-quotes (‘ ‘) to assign single-bit values

153 Variable Assignment Variable c updated immediately and new value is available for assigning to y LIBRARY IEEE; USE IEEE.STD_LOGIC_1164.ALL; ENTITY var IS PORT ( a, b : IN STD_LOGIC; y : OUT STD_LOGIC ); END ENTITY var; ARCHITECTURE logic OF var IS BEGIN PROCESS (a, b) VARIABLE c : STD_LOGIC; BEGIN c := a AND b; y <= c; END PROCESS; END ARCHITECTURE logic; Variable declaration Variable assignment Variable is assigned to a signal to synthesize to a piece of hardware

154 Signal and Variable Scope
ARCHITECTURE Declared outside of the process statements (Visible to all process statements) {SIGNAL declarations} label1: PROCESS {VARIABLE Declarations} label2: PROCESS {VARIABLE Declarations} Declared inside the PROCESS statements (locally visible to the process statements)

155 Signals vs. Variables
Comparison of signals (<=) and variables (:=). Assign: signals use assignee <= assignment; variables use assignee := assignment. Utility: signals represent circuit interconnect; variables represent local storage. Scope: signals have architecture scope (they communicate between processes within an architecture); variables have local scope (inside processes). Behavior: signals are updated at the end of the current delta cycle (the new value is not immediately available); variables are updated immediately.
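The difference in update behavior can be seen in a small sketch. The entity sig_vs_var and its port names are made up for this illustration.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY sig_vs_var IS
  PORT (clk, d       : IN  std_logic;
        q_sig, q_var : OUT std_logic);
END ENTITY sig_vs_var;

ARCHITECTURE rtl OF sig_vs_var IS
  SIGNAL s : std_logic;           -- architecture scope
BEGIN
  PROCESS (clk)
    VARIABLE v : std_logic;       -- local to this process
  BEGIN
    IF rising_edge(clk) THEN
      s     <= d;   -- scheduled: s still holds its old value on the next line
      q_sig <= s;   -- reads the old s, so q_sig lags d by two clock edges
      v     := d;   -- updated immediately
      q_var <= v;   -- reads the new v, so q_var lags d by one clock edge
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```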

156 Variables May synthesize to hardware depending on use
Advantages vs. signals: variables are a more behavioral construct, since they do not have a direct correlation to hardware (as signals do), and may lead to more efficient logic. They simulate more efficiently because they require less memory: signals are not updated immediately, so the simulator must store two values (current and next) for every changing signal, whereas variables are updated immediately, so the simulator stores a single value. Disadvantages vs. signals: a variable must be assigned to a signal before the process ends; variables do not represent physical hardware unless equated with a signal; they must be handled with care, which requires fully understanding how values are assigned to variables and signals in the same process and how dataflow is affected.

157 Variables & Latches (Recommendations)
Assign an initial value or signal to a variable unless feedback is desired If a variable is not assigned an initial value or signal in a combinatorial process, a latch will be generated This could cause your design to not function as intended

158 Variable Uninitialized
ARCHITECTURE logic OF cmb_vari IS
BEGIN
  PROCESS(i0, i1, a)
    VARIABLE val : INTEGER RANGE 0 TO 1;
  BEGIN
    -- Variable used without initialization
    IF (a = '0') THEN
      val := val;
    ELSE
      val := val + 1;
    END IF;
    CASE val IS
      WHEN 0 => q <= i0;
      WHEN OTHERS => q <= i1;
    END CASE;
  END PROCESS;
END ARCHITECTURE logic;
[Figure: input a and the stored value of val feed the case (val) logic through a feedback path.]

159 Assign Initial Value to Variable
ARCHITECTURE logic OF cmb_vari IS
BEGIN
  PROCESS(i0, i1, a)
    VARIABLE val : INTEGER RANGE 0 TO 1;
  BEGIN
    val := 0;  -- Assign initial value or signal to variable
    IF (a = '0') THEN
      val := val;
    ELSE
      val := val + 1;
    END IF;
    CASE val IS
      WHEN 0 => q <= i0;
      WHEN OTHERS => q <= i1;
    END CASE;
  END PROCESS;
END ARCHITECTURE logic;
[Figure: input a feeds the case (val) logic directly; no feedback.]

160 Subprograms
VHDL has 2 kinds of subprograms. FUNCTION: performs a calculation and returns a value. PROCEDURE: performs a sequence of defined sequential statements. Uses: replacing repetitive code, enhancing readability, and breaking processes into executable sections. Subprograms are defined by means of a subprogram declaration (optional) and a subprogram body; the declaration is required if the subprogram is called before the subprogram body is read. They consist of sequential statements (like a process). They may be declared in a process, architecture, or package, which determines visibility. When placed in a package, the subprogram declaration goes in the package declaration and the subprogram body goes in the package body (see earlier package example). Synthesis places restrictions on the use of subprograms. The difference between subprograms and instantiating components is that subprograms, on their own, do not generate hierarchy: the subprogram code is, in effect, included at the same level as the logic that called it. Subprograms really do implement substitution, so they replace having to write the code where the subprogram is called. The sequential statements inside a subprogram might, once synthesized, generate hierarchy; for example, if you perform a multiply operation, the synthesis tools can map it to a hierarchical multiply block.

161 Subprogram Diagram
[Diagram: within an ARCHITECTURE, a FUNCTION takes PARAMETERS and produces a RETURN VALUE; a PROCEDURE takes IN PARAMETERS, OUT PARAMETERS, and INOUT PARAMETERS.]

162 Function Definition & Call
Function declaration:
FUNCTION ones_count (SIGNAL a : STD_LOGIC_VECTOR) RETURN INTEGER;
A function must return a single value based on zero or more inputs and must be called in an expression. Parameters can be of class CONSTANT (default), SIGNAL, or FILE. The class for internal objects must be VARIABLE.
Function body:
FUNCTION ones_count (SIGNAL a : STD_LOGIC_VECTOR) RETURN INTEGER IS
  VARIABLE r : INTEGER;
BEGIN
  r := 0;
  FOR i IN a'RANGE LOOP
    IF a(i) /= '0' THEN
      r := r + 1;
    END IF;
  END LOOP;
  RETURN r; -- Required
END FUNCTION ones_count;
This function performs the ones-count operation found in the LOOPs section, implemented as a function. Notice that the STD_LOGIC_VECTOR input a has no range, which means the function call can accept a vector of any width. When called, the function counts the number of non-'0' elements in the vector and returns the integer count. Since it is a function, it must be called in an expression.
Invoking a function:
total_ones <= ones_count (input) WHEN test_ones = '1';
Note: 'RANGE is a VHDL attribute which returns the range of the object it is applied to (e.g. 7 DOWNTO 0).
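To show where the declaration and the body live, here is a sketch that places the function in a package and calls it from a small design. The names count_pkg and ones_demo, and the 8-bit port width, are assumptions for illustration; the WHEN condition from the slide is left out to keep the example purely combinational.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

PACKAGE count_pkg IS
  -- Subprogram declaration goes in the package declaration
  FUNCTION ones_count (SIGNAL a : std_logic_vector) RETURN integer;
END PACKAGE count_pkg;

PACKAGE BODY count_pkg IS
  -- Subprogram body goes in the package body
  FUNCTION ones_count (SIGNAL a : std_logic_vector) RETURN integer IS
    VARIABLE r : integer;
  BEGIN
    r := 0;
    FOR i IN a'RANGE LOOP       -- works for any vector width
      IF a(i) /= '0' THEN
        r := r + 1;
      END IF;
    END LOOP;
    RETURN r;
  END FUNCTION ones_count;
END PACKAGE BODY count_pkg;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.count_pkg.ALL;

ENTITY ones_demo IS
  PORT (input      : IN  std_logic_vector(7 DOWNTO 0);
        total_ones : OUT integer RANGE 0 TO 8);
END ENTITY ones_demo;

ARCHITECTURE rtl OF ones_demo IS
BEGIN
  -- The function is called inside an expression
  total_ones <= ones_count(input);
END ARCHITECTURE rtl;
```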

163 Procedure Definition & Call
Procedure declaration:
PROCEDURE incr_comp (
  SIGNAL cnt_sig : INOUT STD_LOGIC_VECTOR;
  CONSTANT max : IN INTEGER;
  SIGNAL maxed_out : OUT BOOLEAN );
A procedure may have inputs, inouts, and outputs, and may return zero or multiple outputs. It must be called as a separate sequential statement. Parameters may be of any class: inputs are CONSTANT by default; outputs and inouts are VARIABLE by default.
Procedure body:
PROCEDURE incr_comp (
  SIGNAL cnt_sig : INOUT STD_LOGIC_VECTOR;
  CONSTANT max : IN INTEGER;
  SIGNAL maxed_out : OUT BOOLEAN ) IS
  -- declare any local objects (i.e. constants, variables,…)
BEGIN
  IF cnt_sig >= max THEN
    maxed_out <= TRUE;
  ELSE
    maxed_out <= FALSE;
    cnt_sig <= cnt_sig + 1;
  END IF;
END PROCEDURE incr_comp;
incr_comp stands for “increment compare”. There is a way to call procedures as concurrent statements, but it involves creating a corresponding process for the procedure; that process must have no sensitivity list and must contain only the procedure call and a wait statement. Again, notice that the parameter cnt_sig is a vector with no defined width, so its width is determined by the width of the vector passed to it. This procedure increments a count signal and tests whether it has reached a max count value. For example, if a design has many counters for various operations, each with a maximum count to reach before signaling a flag, then instead of typing “cnt <= cnt + 1” and the max test for each one, you can call this procedure. Pass it the count signal and the max count value; it performs the check and signals a flag (which you must also pass to it) when the max count value is reached.
Invoking a procedure:
incr_comp (err_cnt, 12, err_cnt_maxed);
incr_comp (code_cnt, 144, code_cnt_maxed);
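The comparison and addition in incr_comp need arithmetic operators for the counter type. The sketch below is one possible variant that uses the unsigned type from ieee.numeric_std instead of STD_LOGIC_VECTOR; that substitution, the entity name err_counter, and the 4-bit counter width are assumptions for illustration, not the slide's own code.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY err_counter IS
  PORT (clk           : IN  std_logic;
        err_cnt_maxed : OUT boolean);
END ENTITY err_counter;

ARCHITECTURE rtl OF err_counter IS
  SIGNAL err_cnt : unsigned(3 DOWNTO 0) := (OTHERS => '0');

  -- Unconstrained unsigned parameter: width comes from the actual signal
  PROCEDURE incr_comp (
    SIGNAL   cnt_sig   : INOUT unsigned;
    CONSTANT max       : IN    natural;
    SIGNAL   maxed_out : OUT   boolean) IS
  BEGIN
    IF cnt_sig >= max THEN
      maxed_out <= TRUE;
    ELSE
      maxed_out <= FALSE;
      cnt_sig   <= cnt_sig + 1;
    END IF;
  END PROCEDURE incr_comp;
BEGIN
  -- Called as a sequential statement inside a clocked process,
  -- so the signal assignments in the procedure infer registers
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      incr_comp(err_cnt, 12, err_cnt_maxed);
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```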

164 Functions vs. Procedures
Functions: always execute in zero time; cannot pause their execution; cannot contain any delay, event, or timing control statements; may have zero or more input arguments; inputs may not be affected by the function; arguments may not be outputs or inouts; always return a single value. Procedures: may execute in non-zero simulation time; may contain delay, event, or timing control statements; may have zero or more input, output, or inout arguments; modify zero or more values; return values by means of parameter arguments.

165 Synthesizable Subprograms
Make code more readable/reusable Two types Functions Synthesize to combinatorial logic Procedures Can synthesize to combinatorial or sequential logic Signal assignments in procedures called from clocked processes generate registers May test for clock edges May not be supported by all synthesis tools Must not contain WAIT statements Each call generates a separate block of logic No logic sharing Implement manual resource sharing, if possible (discussed later)

166 Combinational Loops Common cause of instability
The behavior of a combinational loop depends on the relative propagation delays through the logic. Propagation delays can change, and simulation tools may not match hardware behavior.
PROCESS (clk, clrn)
BEGIN
  IF clrn = '0' THEN
    q <= '0';
  ELSIF rising_edge (clk) THEN
    q <= d;
  END IF;
END PROCESS;
clrn <= (ctrl1 XOR ctrl2) AND q;
[Figure: the register output q feeds back through Logic to the register's own CLRN input.]

167 Combinational Loops
All feedback loops should include registers.
PROCESS (clk, clrn)
BEGIN
  IF clrn = '0' THEN
    q <= '0';
  ELSIF rising_edge (clk) THEN
    q <= d;
  END IF;
END PROCESS;
PROCESS (clk)
BEGIN
  IF rising_edge (clk) THEN
    clrn <= (ctrl1 XOR ctrl2) AND q;
  END IF;
END PROCESS;
[Figure: the register output q passes through Logic and a second register before driving clrn.]

168 Gated Clocks Can lead to both functional and timing problems
Clock behavior subject to both synthesis and placement & routing Can be a source of additional clock skew Glitches on clock path possible Recommendations: Use clock enables for clock gating functionality Use dedicated device resources (e.g. clock control blocks) to gate clocks synchronously and reduce power If you must build your own gating logic Use a synchronous gating structure Ensure global clock routing is used for clock signal Gate the clock at the source

169 Gated Clock Examples
Poor clock gating: active clock edges occurring near gate signal changes may result in glitches.
g_clk <= gate AND clk;
PROCESS (g_clk, clrn)
BEGIN
  IF clrn = '0' THEN
    q <= '0';
  ELSIF rising_edge (g_clk) THEN
    q <= d;
  END IF;
END PROCESS;
Better clock gating: the gate signal is clocked by the falling edge of clk, so the gate may only change on the inactive clock edge. (Use an OR gate when the falling edge is the active clock edge.)
PROCESS (clk)
BEGIN
  IF falling_edge (clk) THEN
    sgate <= gate;
  END IF;
END PROCESS;
g_clk <= sgate AND clk;
PROCESS (g_clk, clrn)
BEGIN
  IF clrn = '0' THEN
    q <= '0';
  ELSIF rising_edge (g_clk) THEN
    q <= d;
  END IF;
END PROCESS;
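The first recommendation for gated clocks, using a clock enable instead of gating the clock itself, can be sketched as follows. The entity name reg_en is made up for this illustration; the signal names follow the examples above.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY reg_en IS
  PORT (clk, clrn, gate, d : IN  std_logic;
        q                  : OUT std_logic);
END ENTITY reg_en;

ARCHITECTURE rtl OF reg_en IS
BEGIN
  -- The clock reaches the register ungated; gate acts as a clock enable
  PROCESS (clk, clrn)
  BEGIN
    IF clrn = '0' THEN
      q <= '0';
    ELSIF rising_edge(clk) THEN
      IF gate = '1' THEN
        q <= d;          -- q holds its value while gate = '0'
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```

Here the missing ELSE under the clock edge is intentional: inside a clocked branch it describes a register with enable, not a latch.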

170 How Many Registers?
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
ENTITY reg1 IS
  PORT ( d : IN STD_LOGIC;
         clk : IN STD_LOGIC;
         q : OUT STD_LOGIC );
END ENTITY reg1;
ARCHITECTURE logic OF reg1 IS
  SIGNAL a, b : STD_LOGIC;
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge (clk) THEN
      a <= d;
      b <= a;
      q <= b;
    END IF;
  END PROCESS;
END ARCHITECTURE logic;

171 How Many Registers? Signal assignments inside the IF-THEN statement that checks the clock condition infer registers, so this design infers three registers. [Figure: three registers in series, d to a, a to b, b to q, all clocked by clk.]

172 How Many Registers? Signal Assignment Moved
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
ENTITY reg2 IS
  PORT ( d : IN STD_LOGIC;
         clk : IN STD_LOGIC;
         q : OUT STD_LOGIC );
END ENTITY reg2;
ARCHITECTURE logic OF reg2 IS
  SIGNAL a, b : STD_LOGIC;
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge (clk) THEN
      a <= d;
      b <= a;
    END IF;
  END PROCESS;
  q <= b;  -- signal assignment moved outside the process
END ARCHITECTURE logic;

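173 How Many Registers? The assignment of signal b to signal q is no longer edge-sensitive because it is not inside the IF-THEN statement that checks the clock condition, so only two registers are inferred and q is driven directly by b. [Figure: two registers in series, d to a, a to q, clocked by clk.]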

174 How Many Registers? Signals changed to variables
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
ENTITY reg3 IS
  PORT (
    d   : IN STD_LOGIC;
    clk : IN STD_LOGIC;
    q   : OUT STD_LOGIC
  );
END ENTITY reg3;
ARCHITECTURE logic OF reg3 IS
BEGIN
  PROCESS (clk)
    VARIABLE a, b : STD_LOGIC;
  BEGIN
    IF rising_edge(clk) THEN
      a := d;
      b := a;
      q <= b;
    END IF;
  END PROCESS;
END ARCHITECTURE logic;

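175 How Many Registers? Variable assignments are updated immediately; signal assignments are updated on the clock edge. The variables a and b simply pass d through within the same clock edge, so only q is registered: one register is inferred.
[Figure: single D flip-flop, d to q, clocked by clk]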

176 Inferring Logic Functions
Using behavioral modeling to describe logic blocks.
Synthesis tools recognize the description & insert equivalent logic functions (e.g. megafunctions). These functions are typically pre-optimized for utilization or performance compared with general purpose, functionally equivalent logic.
Use the synthesis tool's templates (if available) as a starting point.
Use the synthesis tool's graphic display to verify logic recognition.
Makes code vendor-independent.

177 Logic Inference Example
PROCESS (clock)
BEGIN
  IF rising_edge(clock) THEN
    IF wren = '1' THEN
      mem(conv_integer(address)) <= data;
    END IF;
    q <= mem(conv_integer(address));
  END IF;
END PROCESS;
The synthesis tool recognizes this description as a memory and replaces it with an Altera megafunction and/or library cells.

178 Quartus II VHDL Templates
Insert Template (Edit menu). Preview window: edit before inserting & save as user template.

179 Quartus II Software RTL Viewer
Graphically represents results of synthesis.
Starting RTL Viewer: run Analysis & Elaboration (Processing menu or Tasks window), or any processing that performs elaboration. Then open RTL Viewer (Tools menu or Tasks window). It displays the last successful analysis.
[Figure: RTL Viewer window showing the Schematic View, Toolbar and Hierarchy List]

180 Inferring Common Functions
Latches Registers Counters Tri-states Memory

181 Latch Inference – “Wanted” Latch
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY latch IS
  PORT (
    data : IN std_logic;
    gate : IN std_logic;
    q    : OUT std_logic);
END ENTITY latch;
ARCHITECTURE behavior OF latch IS
BEGIN
  label_1: PROCESS (data, gate)
  BEGIN
    IF gate = '1' THEN
      q <= data;
    END IF;
  END PROCESS label_1;
END ARCHITECTURE behavior;
The sensitivity list includes both inputs. The process is level sensitive, not edge sensitive. What happens if gate = '0'? Nothing is assigned, so q must hold its value: implicit memory & feedback.
[Figures: latch in RTL Viewer; latch in Technology Viewer]

182 DFF Using rising_edge Function
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY dff_b IS
  PORT (
    d   : IN std_logic;
    clk : IN std_logic;
    q   : OUT std_logic);
END ENTITY dff_b;
ARCHITECTURE behavior OF dff_b IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      q <= d;
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
rising_edge is an IEEE function defined in the std_logic_1164 package. It specifies that the signal value must change from 0 to 1; an X or Z to 1 transition is not treated as an edge.
[Figure: D flip-flop, d to q, clocked by clk]

183 DFF Using clk’event and clk=‘1’
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
ENTITY dff_a IS
  PORT (
    d   : IN std_logic;
    clk : IN std_logic;
    q   : OUT std_logic);
END ENTITY dff_a;
ARCHITECTURE behavior OF dff_a IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF (clk'EVENT AND clk = '1') THEN
      q <= d;
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
In clk'EVENT AND clk = '1': clk is the signal name (any name can be used); 'EVENT is a VHDL attribute specifying that there needs to be a change in signal value; clk = '1' means positive-edge triggered.
[Figure: D flip-flop, d to q, clocked by clk]

184 Recommended DFF Inference
Use the rising_edge function for consistent simulation. An 'X' to '1' transition triggers the DFF when clk'EVENT AND clk = '1' is used, but not when rising_edge is used.
Both clk'EVENT AND clk = '1' and rising_edge produce the same synthesis result.
The std_logic_1164 package must be used for the rising_edge and falling_edge functions.
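The simulation difference between the two styles can be checked with a small self-checking testbench. This is an illustrative sketch only: the entity and signal names are hypothetical, and it assumes a simulator that follows the standard std_logic_1164 definition of rising_edge. The clock starts at 'X' and then goes to '1'; only the 'EVENT style register should capture d.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench; all names are illustrative.
ENTITY edge_compare_tb IS
END ENTITY edge_compare_tb;

ARCHITECTURE sim OF edge_compare_tb IS
  SIGNAL clk     : STD_LOGIC := 'X';  -- clock starts in an unknown state
  SIGNAL d       : STD_LOGIC := '1';
  SIGNAL q_event : STD_LOGIC := '0';
  SIGNAL q_edge  : STD_LOGIC := '0';
BEGIN
  -- Style 1: clk'EVENT AND clk = '1'
  PROCESS (clk)
  BEGIN
    IF clk'EVENT AND clk = '1' THEN
      q_event <= d;
    END IF;
  END PROCESS;

  -- Style 2: rising_edge function
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      q_edge <= d;
    END IF;
  END PROCESS;

  stimulus : PROCESS
  BEGIN
    WAIT FOR 10 ns;
    clk <= '1';              -- 'X' to '1' transition
    WAIT FOR 10 ns;
    -- The 'EVENT style sees a change that ends at '1' and captures d.
    ASSERT q_event = '1'
      REPORT "event-attribute style did not trigger" SEVERITY ERROR;
    -- rising_edge also requires the previous value to be '0', so q_edge holds.
    ASSERT q_edge = '0'
      REPORT "rising_edge triggered on X to 1" SEVERITY ERROR;
    WAIT;
  END PROCESS;
END ARCHITECTURE sim;
```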

185 Secondary Control Signals
Register control signals (clear, preset, load, clock enable, etc.) vary between FPGA & CPLD families.
Avoid using signals not available in the target architecture. The design's functionality is then supported by creating extra logic cells, which gives less efficient, possibly slower results.

186 DFF with Secondary Control Signals
ARCHITECTURE behavior OF dff IS
BEGIN
  PROCESS (clk, aclr, apre, aload, adata)
  BEGIN
    IF aclr = '1' THEN
      q <= '0';
    ELSIF apre = '1' THEN
      q <= '1';
    ELSIF aload = '1' THEN
      q <= adata;
    ELSIF rising_edge(clk) THEN
      IF ena = '1' THEN
        IF sclr = '1' THEN
          q <= '0';
        ELSIF sload = '1' THEN
          q <= sdata;
        ELSE
          q <= d;
        END IF;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
This is how to implement all asynchronous and synchronous control signals for the Altera PLD registers. Conditions outside of the rising_edge statement are asynchronous; conditions inside of the rising_edge statement are synchronous. Remove signals not required by your logic. Synchronous controls are not included in the sensitivity list.

187 Incorrect Control Signal Priority
ARCHITECTURE behavior OF dff_clr IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF sclr = '1' THEN
        q <= '0';
      ELSIF ena = '1' THEN
        q <= d;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
This code uses 2 control signals. Considerations: Do the registers in the hardware have both ports available? How does the hardware behave? Does clear or enable have priority?
In the code, sync clear has priority over enable. In silicon, enable has priority over sync clear. Additional logic is needed to force the code priority.

188 Control Signals Priority
In order of priority, highest first:
1. Asynchronous clear (aclr)
2. Asynchronous preset (apre)
3. Asynchronous load (aload)
4. Enable (ena)
5. Synchronous clear (sclr)
6. Synchronous load (sload)
The order is the same for all Altera FPGA families, but not all signals are supported by all families. Re-ordering generates extra logic.

189 Incorrect Control Logic
PROCESS (clk, clr_n)
BEGIN
  IF clr_n = '0' THEN
    x <= '0';
  ELSIF rising_edge(clk) THEN
    x <= a;
    y <= b;
  END IF;
END PROCESS;
y is not included in the clr_n condition. What is the behaviour specified for y when clr_n is asserted? While clr_n clears x, it acts like an enable for y: y can only update when clr_n is not asserted.
[Figure: x register with clr_n on CLRN; y register with clr_n on ENA]
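One way to avoid the unintended enable is to give each register only the control signals it needs, in its own process. The sketch below assumes that y should have no clear at all; the entity name ctrl_fix is hypothetical. If y should be cleared as well, adding y <= '0' to the clr_n branch of the original process is the alternative.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical entity; signal names follow the slide.
ENTITY ctrl_fix IS
  PORT (
    clk, clr_n, a, b : IN  STD_LOGIC;
    x, y             : OUT STD_LOGIC
  );
END ENTITY ctrl_fix;

ARCHITECTURE rtl OF ctrl_fix IS
BEGIN
  -- x: register with asynchronous active-low clear
  PROCESS (clk, clr_n)
  BEGIN
    IF clr_n = '0' THEN
      x <= '0';
    ELSIF rising_edge(clk) THEN
      x <= a;
    END IF;
  END PROCESS;

  -- y: plain register, no clear, so clr_n cannot act as an enable
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      y <= b;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```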

190 DFF with Clock Enable
ARCHITECTURE behavior OF dff_all IS
  SIGNAL ena : std_logic;
BEGIN
  PROCESS (clk, clr_n)
  BEGIN
    IF clr_n = '0' THEN
      q <= '0';
    ELSIF rising_edge(clk) THEN
      IF ena = '1' THEN
        q <= d;
      END IF;
    END IF;
  END PROCESS;
  ena <= (ena_a OR ena_b) XOR ena_c;
END ARCHITECTURE behavior;
To ensure that this is synthesised using DFFE primitives (DFF with enable): place the enable statement directly after the rising edge statement, and place enable expressions in a separate process or assignment. If the synthesis tool does not recognize this as an enable, it will be implemented using extra LUTs.
[Figure: D flip-flop with d, clk, ena and clr_n inputs and q output]

191 Shift Registers
Shift register with parallel output, serial input, asynchronous clear and enable, which shifts left.
ARCHITECTURE behavior OF shift IS
  SIGNAL qi : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
  PROCESS (clock, aclr)
  BEGIN
    IF aclr = '1' THEN
      qi <= (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
      IF enable = '1' THEN
        qi <= qi (6 DOWNTO 0) & shiftin;   -- shift function (& = concatenation)
      END IF;
    END IF;
  END PROCESS;
  q <= qi;
END ARCHITECTURE behavior;
Add or remove secondary controls similar to DFF.

192 Basic Counter
Binary up counter with asynchronous clear.
PROCESS (clock, aclr)
  VARIABLE cnt : std_logic_vector (7 DOWNTO 0);
BEGIN
  IF aclr = '1' THEN
    cnt := (OTHERS => '0');
  ELSIF rising_edge(clock) THEN
    cnt := cnt + 1;   -- count function
  END IF;
  q <= cnt;
END PROCESS;
Add or remove secondary controls similar to DFF.
Note: These examples use the VARIABLE class for the count value, but a SIGNAL could have been used just as easily.

193 Counter Using Integers
PROCESS (clock, aclr)
  VARIABLE cnt : INTEGER RANGE 0 TO 255;
BEGIN
  IF aclr = '1' THEN
    cnt := 0;
  ELSIF rising_edge(clock) THEN
    IF cnt = 255 THEN
      cnt := 0;
    ELSE
      cnt := cnt + 1;
    END IF;
  END IF;
  q <= conv_std_logic_vector(cnt, 8);
END PROCESS;
The range determines the bit width of the counter. If the range is left out, the counter will default to at least 32 bits.
Rollover must be accounted for manually: there is no automatic rollover for integers (unlike std_logic_vector). If the rollover check is missing, the code causes end of range errors in simulation (although it synthesizes correctly).
conv_std_logic_vector(<integer_name_or_value>, <bus_width>) converts an integer to std_logic_vector. It is found in the std_logic_arith package.

194 Up / Down Counter
PROCESS (clock, aclr)
  VARIABLE cnt : std_logic_vector(7 DOWNTO 0);
  VARIABLE direction : integer RANGE -1 TO 1;
BEGIN
  IF aclr = '1' THEN
    cnt := (OTHERS => '0');
  ELSIF rising_edge(clock) THEN
    IF updown = '1' THEN
      direction := 1;
    ELSE
      direction := -1;
    END IF;
    cnt := cnt + direction;
  END IF;
  q <= cnt;
END PROCESS;

195 Modulus 200 Counter
PROCESS (clock, aclr)
  VARIABLE cnt : std_logic_vector(7 DOWNTO 0);
  CONSTANT modulus : INTEGER := 200;
BEGIN
  IF aclr = '1' THEN
    cnt := (OTHERS => '0');
  ELSIF rising_edge(clock) THEN
    IF cnt = modulus-1 THEN
      cnt := (OTHERS => '0');
    ELSE
      cnt := cnt + 1;
    END IF;
  END IF;
  q <= cnt;
END PROCESS;

196 Modulus 200 Counter Using Integers
PROCESS (clock, aclr)
  VARIABLE cnt : INTEGER RANGE 0 TO 199;
  CONSTANT modulus : INTEGER := 200;
BEGIN
  IF aclr = '1' THEN
    cnt := 0;
  ELSIF rising_edge(clock) THEN
    IF cnt = modulus-1 THEN
      cnt := 0;
    ELSE
      cnt := cnt + 1;
    END IF;
  END IF;
  q <= conv_std_logic_vector(cnt, 8);
END PROCESS;
You cannot simply change the range to change the modulus: the same logic would result if the range was 0 to 255. The range is used by the synthesis tool to define the bit width only; it does not build decode logic for the synchronous reset. That logic must be defined explicitly.

197 Integers vs. Standard Logic Arrays
Integers:
Represent numbers only.
Are more behavioral than standard logic. Synthesis tools are more free to generate the resulting logic, and may generate less logic.
Use less storage space during processing and simulate faster.
Always use RANGE to constrain integers for synthesis; otherwise they default to 32 bits.
Use for internal calculations and describing internal logic.
Standard logic arrays:
Represent an array of signals, each taking one of 9 signal values.
Can be "sliced".
Are more structural than integers. The structure must be optimized down into efficient logic.
Can be set to bus widths wider than 32 bits.
Automatically roll over during calculations.
Use for I/O ports & data path.

198 Tri-states
IEEE defines the 'Z' value in the std_logic_1164 package. Simulation: behaves like a high-impedance state. Synthesis: converted to tri-state buffers.
Altera devices have tri-state buffers only in I/O cells. Benefits:
Eliminates possible bus contention.
Location of internal logic is a non-issue.
Cost savings: you don't pay for unused tri-state buffers, and less testing is required of devices.
Internal tri-states must be converted to combinatorial logic. A complex output enable may cause errors or inefficient logic.

199 Inferring Tri-states Correctly
Conditional signal assignment:
ARCHITECTURE behavior OF tri1 IS
BEGIN
  out_sig <= in_sig WHEN ena = '1' ELSE 'Z';
END ARCHITECTURE behavior;
Process statement:
ARCHITECTURE behavior OF tri2 IS
BEGIN
  driver1 : PROCESS (ena, in_sig)
  BEGIN
    IF (ena = '1') THEN
      out_sig <= in_sig;
    ELSE
      out_sig <= 'Z';
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
Only 1 assignment to the output signal. Uses the tri-state buffer in the I/O cell.
[Figure: device with in_sig and ena driving a tri-state buffer in the I/O cells, output out_sig]

200 Inferring Tri-states Incorrectly
ARCHITECTURE behavior OF tri3 IS
BEGIN
  out_sig <= in_sig1 WHEN ena1 = '1' ELSE 'Z';
  out_sig <= in_sig2 WHEN ena2 = '1' ELSE 'Z';
END ARCHITECTURE behavior;
2 assignments to the same signal are not allowed in synthesis unless 'Z' is used. Here the output enable logic is emulated in LEs, so simulation & synthesis do not match.
[Figure: APEX II device; in_sig1, in_sig2, ena1 and ena2 feed logic that drives out_sig through the I/O cells]

201 Bidirectional Pins
ENTITY bidir_pin IS
  PORT (
    bidir : INOUT std_logic;
    oe, clk, from_core : IN std_logic;
    to_core : OUT std_logic);
END ENTITY bidir_pin;
ARCHITECTURE behavior OF bidir_pin IS
BEGIN
  bidir <= from_core WHEN oe = '1' ELSE 'Z';   -- bidir as a tri-stated output
  to_core <= bidir;                            -- bidir as an input
END ARCHITECTURE behavior;
Declare the pin as direction INOUT. Use INOUT as both input & tri-stated output. The input side is always "on". For registered bidirectional I/O, use a separate process to infer registers.

202 Memory
Synthesis tools have different capabilities for recognizing memories. They are sensitive to certain coding styles in order to recognize memories; these are usually described in the tool documentation.
Tools and target devices may have limitations in architecture implementation: synchronous inputs only, limitations in clocking schemes, memory size limitations, read-during-write support.
You must declare an array data type to hold memory values.
Recommendation: Read Quartus II Handbook, Volume 1, Chapter 6 for more information on inferring memories and read-during-write behaviour.

203 Inferred Single-Port Memory (1)
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem: mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
    END IF;
  END PROCESS;
  q <= mem(conv_integer(address));
END ARCHITECTURE logic;
Code describes a 64 x 8 RAM with synchronous write & asynchronous read. It cannot be implemented in Altera embedded RAM due to the asynchronous read, so it uses general logic and registers.
conv_integer is a function found in the std_logic_unsigned (or signed) package. Use TO_INTEGER if using the numeric_std package.
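For readers using numeric_std instead of std_logic_unsigned, the same memory might be written as below. This is a sketch under assumptions: the slide does not show the entity, so the entity name sp_ram_ns and the port widths (6-bit address, 8-bit data) are inferred from the 64 x 8 array rather than taken from the source. The address is cast to unsigned before calling to_integer.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity; port widths are assumed from the 64 x 8 array.
ENTITY sp_ram_ns IS
  PORT (
    clock   : IN  STD_LOGIC;
    wren    : IN  STD_LOGIC;
    address : IN  STD_LOGIC_VECTOR (5 DOWNTO 0);
    data    : IN  STD_LOGIC_VECTOR (7 DOWNTO 0);
    q       : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
  );
END ENTITY sp_ram_ns;

ARCHITECTURE logic OF sp_ram_ns IS
  TYPE mem_type IS ARRAY (0 TO 63) OF STD_LOGIC_VECTOR (7 DOWNTO 0);
  SIGNAL mem : mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF wren = '1' THEN
        -- synchronous write; to_integer needs an unsigned (or signed) argument
        mem(to_integer(unsigned(address))) <= data;
      END IF;
    END IF;
  END PROCESS;

  -- asynchronous read, as in the slide
  q <= mem(to_integer(unsigned(address)));
END ARCHITECTURE logic;
```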

204 Inferred Single-Port Memory (2)
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem: mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
      q <= mem(conv_integer(address));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
Code describes a 64 x 8 RAM with synchronous write & synchronous read. It has old data read-during-write behaviour, because the memory read is in the same process/cycle as the memory write. Check the target architecture for support, as unsupported features are built using LUTs/registers.

205 Inferred Single-Port Memory (3)
ARCHITECTURE logic OF sp_ram IS
  SUBTYPE byte IS std_logic_vector (7 DOWNTO 0);   -- using subtype for vector width
  TYPE mem_type IS ARRAY (0 TO 63) OF byte;
  SIGNAL mem: mem_type;
  SIGNAL rdaddr_reg : byte;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
      rdaddr_reg <= address;
    END IF;
  END PROCESS;
  q <= mem(conv_integer(rdaddr_reg));
END ARCHITECTURE logic;
Same memory with new data read-during-write behaviour: the read is performed by a separate concurrent statement/process using the registered address. Check the target architecture for support. Use the ramstyle attribute set to "no_rw_check" to disable checking and prevent extra logic generation.

206 Simple Dual-Port, Single-Clock Memory
Code describes a simple dual-port (separate read & write addresses) 64 x 8 RAM with a single clock. The code implies old data read-during-write behaviour. New data support in simple dual-port requires additional RAM bypass logic.
ARCHITECTURE logic OF sdp_ram IS
  TYPE mem_type IS ARRAY (63 DOWNTO 0) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem: mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(wraddress)) <= data;
      END IF;
      q <= mem(conv_integer(rdaddress));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;

207 True Dual-Port, Dual-Clock Memory
ARCHITECTURE logic OF dp_dc_ram IS
  TYPE mem_type IS ARRAY (63 DOWNTO 0) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem: mem_type;
  SIGNAL addr_reg_a, addr_reg_b :
BEGIN
  PROCESS (clock_a)
  BEGIN
    IF rising_edge(clock_a) THEN
      IF (wren_a = '1') THEN
        mem(conv_integer(address_a)) <= data_a;
      END IF;
      addr_reg_a <= address_a;
      q_a <= mem(conv_integer(addr_reg_a));
    END IF;
  END PROCESS;
  PROCESS (clock_b)
  BEGIN
    IF rising_edge(clock_b) THEN
      IF (wren_b = '1') THEN
        mem(conv_integer(address_b)) <= data_b;
      END IF;
      addr_reg_b <= address_b;
      q_b <= mem(conv_integer(addr_reg_b));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
Code describes a true dual-port (two individual addresses) 64 x 8 RAM. It may not be supported in all synthesis tools. New data same-port read-during-write behaviour is shown. Mixed port behaviour is undefined with multiple clocks.

208 Initializing Memory Contents Using Files
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem: mem_type;
  ATTRIBUTE ram_init_file : STRING;
  ATTRIBUTE ram_init_file OF mem : SIGNAL IS "init_file_name.hex";
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (we = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
      q <= mem(conv_integer(address));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
Use a VHDL attribute to assign initial contents to inferred memory. Store the initialization data as .HEX or .MIF. The contents of the initialization file are downloaded into the FPGA during configuration.

209 Initializing Memory Using Default
Assign a default value when declaring the memory. This example uses a function to establish memory values. Recommendation: use a function when initializing memory with patterned data.
A constant can also be used (see ROM example). Recommendation: use a constant when initializing memory with non-patterned data or a single value (e.g. OTHERS => " ";).
A MIF file is automatically generated during synthesis due to the initialization.
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  FUNCTION init_ram RETURN mem_type IS
    VARIABLE mem_out : mem_type;
  BEGIN
    FOR i IN 0 TO 63 LOOP                            -- loop used to assign each memory address
      mem_out(i) := conv_std_logic_vector(i, 8);
    END LOOP;
    RETURN mem_out;
  END FUNCTION init_ram;
  SIGNAL mem: mem_type := init_ram;                  -- default initial value for memory

210 Unsupported Control Signals
e.g. Clearing RAM contents with reset
BEGIN
  PROCESS (clock, reset)
  BEGIN
    IF reset = '1' THEN
      mem(conv_integer(address)) <= (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
      IF (we = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
    END IF;
  END PROCESS;
  q <= mem(conv_integer(address));
END ARCHITECTURE logic;
Memory content cannot be cleared with reset, so this synthesizes to general logic resources.
Recommendations: Avoid reset checking in RAM read or write processes. Be wary of other control signals (e.g. clock enable) until validated with the target architecture.

211 Inferred ROM (Case Statement)
SIGNAL q : std_logic_vector (6 DOWNTO 0);
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      CASE address IS
        WHEN "0000" => q <= " ";
        WHEN "0001" => q <= " ";
        WHEN "0010" => q <= " ";
        WHEN "0011" => q <= " ";
        WHEN "0100" => q <= " ";
        WHEN "1101" => q <= " ";
        WHEN "1110" => q <= " ";
        WHEN "1111" => q <= " ";
        WHEN OTHERS => q <= "XXXXXXX";
      END CASE;
    END IF;
  END PROCESS;
Automatically converted to ROM: tools generate the ROM using embedded RAM & an initialization file. This requires a constant explicitly defined for each choice in the CASE statement. The romstyle synthesis attribute may be used to control implementation. Like RAMs, the address or output must be registered to implement in Altera embedded RAM.

212 Inferred ROM (Constant)
ARCHITECTURE logic OF rom16x7 IS
  TYPE rom_type IS ARRAY (0 TO 15) OF STD_LOGIC_VECTOR (6 DOWNTO 0);
  CONSTANT rom : rom_type := (
    " ", " ", " ", " ", " ", " ",
    " ", " ", " ", " ", " ", " ",
    OTHERS => " " );
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      qa <= rom(CONV_INTEGER(addr_a));
      qb <= rom(CONV_INTEGER(addr_b));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
Needs 1 constant value for each ROM address. The example shows dual-port access. The type & constant declarations may be placed in a package for re-use. Alternate: create and use an initialization function (see RAM example).

213 State Machine Coding
An enumerated data type is used to define the different states in the state machine. Using constants for states may not be recognized as a state machine.
One or two signals of the state type hold the state:
TYPE state_type IS (idle, fill, heat_w, wash, drain);
SIGNAL current_state, next_state : state_type;
Use a CASE statement for the next-state logic, instead of an IF-THEN statement. Synthesis tools recognize CASE statements for implementing state machines.
Use CASE or IF-THEN-ELSE for output logic.

214 Quartus II Software State Machine Viewer
Use to verify correct coding of the state machine. Open it from Tools menu > State Machine Viewer.
Use the drop-down to select the state machine. Highlighting a state in the state transition table highlights the corresponding state in the state flow diagram.
[Figure: State Machine Viewer showing the state flow diagram and the state transition/encoding table]

215 State Declaration
ENTITY wm IS
  PORT (
    clk, reset, door_closed, full : in std_logic;
    heat_demand, done, empty : in std_logic;
    water, spin, heat, pump : out std_logic);
END ENTITY wm;
ARCHITECTURE behave OF wm IS
  TYPE state_type IS (idle, fill, heat_w, wash, drain);
  SIGNAL current_state, next_state : state_type;
BEGIN
State diagram outputs:
State    Water  Spin  Heat  Pump
IDLE     0      0     0     0
FILL     1      0     0     0
HEAT_W   0      1     1     0
WASH     0      1     0     0
DRAIN    0      1     0     1
Transition conditions shown on the diagram: Door_closed = 1, Full = 1, Heat_demand = 0, Heat_demand = 1, Done = 1, Empty = 1.

216 Next State Logic
Sequential state transitions:
PROCESS (clk, reset)
BEGIN
  IF reset = '1' THEN
    current_state <= idle;
  ELSIF rising_edge(clk) THEN
    current_state <= next_state;
  END IF;
END PROCESS;
Combinatorial next state logic (the default next state is the current state; only the first two states are shown on this slide):
PROCESS (current_state, door_closed, full, heat_demand, done, empty)
BEGIN
  next_state <= current_state;
  CASE current_state IS
    WHEN idle =>
      IF door_closed = '1' THEN
        next_state <= fill;
      END IF;
    WHEN fill =>
      IF full = '1' THEN
        next_state <= heat_w;
      END IF;

217 Combinatorial Outputs
PROCESS (current_state)
BEGIN
  -- default output conditions
  water <= '0';
  spin <= '0';
  heat <= '0';
  pump <= '0';
  CASE current_state IS
    WHEN idle =>
      NULL;
    WHEN fill =>
      water <= '1';
    WHEN heat_w =>
      spin <= '1';
      heat <= '1';
    WHEN wash =>
      spin <= '1';
    WHEN drain =>
      spin <= '1';
      pump <= '1';
  END CASE;
END PROCESS;
The output logic is a function of the current state only.

218 State Machine Encoding Styles
State    Binary  Grey-Code  One-Hot  Custom
Idle     000     000        00001    ?
Fill     001     001        00010    ?
Heat_w   010     011        00100    ?
Wash     011     010        01000    ?
Drain    100     110        10000    ?
Quartus II default encoding styles for Altera devices:
One-hot encoding for look-up table (LUT) devices. The architecture features lesser fan-in per cell and an abundance of registers.
Binary (minimal bit) or grey-code encoding for product-term devices. The architecture features fewer registers and greater fan-in.

219 Quartus II Encoding Style
Options: One-Hot Gray Minimal Bits Sequential User-Encoded Johnson Apply Assignment to State Variable

220 Undefined States
Noise and spurious events in hardware can cause state machines to enter undefined states. If a state machine does not consider undefined states, this can cause mysterious "lock-ups" in hardware. Good engineering practice is to consider these states.
To account for undefined states: explicitly code for them (manual), or use a "safe" synthesis constraint (automatic).

221 ‘Safe’ Binary State Machine?
TYPE state_type IS (idle, fill, heat_w, wash, drain);
SIGNAL current_state, next_state : state_type;
PROCESS (current_state, door_closed, full, heat_demand, done, empty)
BEGIN
  next_state <= current_state;
  CASE current_state IS
    WHEN idle =>
      IF door_closed = '1' THEN
        next_state <= fill;
      END IF;
    WHEN fill =>
      IF full = '1' THEN
        next_state <= heat_w;
      END IF;
    WHEN heat_w =>
      IF heat_demand = '0' THEN
        next_state <= wash;
      END IF;
    WHEN wash =>
      IF heat_demand = '1' THEN
        next_state <= heat_w;
      ELSIF done = '1' THEN
        next_state <= drain;
      END IF;
    WHEN drain =>
      IF empty = '1' THEN
        next_state <= idle;
      END IF;
    WHEN others =>
      next_state <= idle;
  END CASE;
END PROCESS;
This code does not consider undefined states. The "when others" statement only considers other enumerated states. With binary encoding of five states, the state values "101", "110" & "111" are not considered.

222 Creating “Safe” State Machines
The WHEN OTHERS clause does not make state machines "safe". Once a state machine is recognized, the synthesis tool only accounts for explicitly defined states: the default clause covers the states that you have actually defined, not all values possible with the bits available. Exception: the number of states equals a power of 2 AND binary/grey encoding is enabled.
Safe state machines are created using synthesis constraints. The Quartus II software uses the SAFE STATE MACHINE assignment, applied project-wide or to individual FSMs, or a VHDL synthesis attribute. This may increase logic usage.
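As a rough illustration of the VHDL synthesis attribute route, the sketch below attaches a syn_encoding attribute with the value "safe" to the state type. The attribute declaration syntax matches the custom-encoding slide in this deck; the "safe" value is an assumption based on the Quartus II documentation and should be checked against the handbook for your tool version. The entity wm_safe and its reduced three-state machine are hypothetical, chosen so that one 2-bit code is left undefined.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical reduced washing-machine controller (3 states, 1 unused code).
ENTITY wm_safe IS
  PORT (
    clk, reset, door_closed, full, empty : IN  STD_LOGIC;
    water, pump                          : OUT STD_LOGIC
  );
END ENTITY wm_safe;

ARCHITECTURE behave OF wm_safe IS
  TYPE state_type IS (idle, fill, drain);
  -- Assumed attribute value: asks synthesis to add recovery logic
  -- for state codes that are not part of the enumerated type.
  ATTRIBUTE syn_encoding : STRING;
  ATTRIBUTE syn_encoding OF state_type : TYPE IS "safe";
  SIGNAL current_state, next_state : state_type;
BEGIN
  PROCESS (clk, reset)
  BEGIN
    IF reset = '1' THEN
      current_state <= idle;
    ELSIF rising_edge(clk) THEN
      current_state <= next_state;
    END IF;
  END PROCESS;

  PROCESS (current_state, door_closed, full, empty)
  BEGIN
    next_state <= current_state;   -- default: stay in the current state
    water <= '0';
    pump  <= '0';
    CASE current_state IS
      WHEN idle =>
        IF door_closed = '1' THEN next_state <= fill; END IF;
      WHEN fill =>
        water <= '1';
        IF full = '1' THEN next_state <= drain; END IF;
      WHEN drain =>
        pump <= '1';
        IF empty = '1' THEN next_state <= idle; END IF;
    END CASE;
  END PROCESS;
END ARCHITECTURE behave;
```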

223 Using Custom Encoding Styles
Remove glitches without output registers. Eliminate combinatorial output logic: the outputs mimic the state bits. Use additional state bits for states that do not have exclusive outputs.
State    Water  Spin  Heat  Pump  Custom Encoding
Idle     0      0     0     0     0000
Fill     1      0     0     0     1000
Heat_w   0      1     1     0     0110
Wash     0      1     0     0     0100
Drain    0      1     0     1     0101

224 Quartus II Custom State Encoding
ENTITY wm IS
  PORT (
    clk, reset, door_closed, full : in std_logic;
    heat_demand, done, empty : in std_logic;
    water, spin, heat, pump : out std_logic);
END wm;
ARCHITECTURE behave OF wm IS
  TYPE state_type IS (idle, fill, heat_w, wash, drain);
  ATTRIBUTE syn_encoding : STRING;
  ATTRIBUTE syn_encoding OF state_type : TYPE IS " ";
  SIGNAL current_state, next_state : state_type;
BEGIN
The State Machine Processing assignment must also be set to "User Encoded". Output assignments are coded per the previous examples; synthesis automatically handles the reduction of output logic. Some tools use VHDL attributes like enum_encoding or syn_enum_encoding to perform custom state encoding.

225 Writing Efficient State Machines
Remove counting, timing, arithmetic functions from state machine & implement externally Reduces overall logic & improves performance

226 VHDL Logic Optimization & Performance
Balancing operators Resource sharing Logic duplication Pipelining

227 Operators
Synthesis tools replace operators with pre-defined (pre-optimized) blocks of logic. The designer should control when & how many operators are used.
Example: dividers. Dividers are large blocks of logic. Every '/', mod and rem inserts a divider block and leaves it up to the synthesis tool to optimize. Better resource optimization usually involves cleverly using multipliers or shift operations to do the divide.
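For the common case of dividing an unsigned value by a power of two, the shift idea can be sketched as follows. This is an illustrative example, not from the slides: the entity name div_by_8 and the 8-bit width are assumptions, and it uses shift_right from the numeric_std package. Shifting right by 3 drops the three low bits, which gives the same result as truncating division by 8 for unsigned values, without describing a divider.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity: unsigned divide by 8 using a shift.
ENTITY div_by_8 IS
  PORT (
    a : IN  STD_LOGIC_VECTOR (7 DOWNTO 0);
    q : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
  );
END ENTITY div_by_8;

ARCHITECTURE rtl OF div_by_8 IS
BEGIN
  -- shift_right on an unsigned value fills the vacated bits with '0'
  q <= STD_LOGIC_VECTOR(shift_right(unsigned(a), 3));
END ARCHITECTURE rtl;
```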

228 Generating Logic from Operators
Synthesis tools break down code into logic blocks. They then assemble, optimize & map them to hardware.
IF (sel < 10) THEN
  y <= a + b;
ELSE
  y <= a + 10;
END IF;
This code generates 1 comparator (<), 2 adders (+) and 1 multiplexer.

229 Balancing Operators
Use parentheses to define logic groupings. This increases performance, but may increase utilization. It balances the delay from all inputs to the output. Circuit functionality is unchanged.
Unbalanced: z <= a * b * c * d
[Figure: three multipliers in a chain; a and b feed the first, c joins at the second, d joins at the third]
Balanced: z <= (a * b) * (c * d)
[Figure: a * b and c * d computed in parallel, then multiplied together]

230 Balancing Operators: Example
a, b, c, d: 4-bit vectors
Unbalanced: z <= a * b * c * d
A 4 x 4 multiplier feeds an 8 x 4 multiplier, which feeds a 12 x 4 multiplier, giving a 16-bit result. Delay through 3 stages of multiply.
Balanced: z <= (a * b) * (c * d)
Two 4 x 4 multipliers in parallel feed an 8 x 8 multiplier, giving a 16-bit result. Delay through 2 stages of multiply.

231 Resource Sharing
Reduces the number of operators needed, which reduces area.
Two types: sharing operators among mutually exclusive functions, and sharing common subexpressions.
Synthesis tools can perform automatic resource sharing. The feature can be enabled or disabled.

232 Mutually Exclusive Operators
Up/down counter: the 2 adders are mutually exclusive & can be shared (typically IF-THEN-ELSE with the same operator in both choices).
process(rst, clk)
  variable tmp_q : std_logic_vector(7 DOWNTO 0);
begin
  if rst = '0' then
    tmp_q := (OTHERS => '0');
  elsif rising_edge(clk) then
    if updn = '1' then
      tmp_q := tmp_q + 1;
    else
      tmp_q := tmp_q - 1;
    end if;
  end if;
  q <= tmp_q;
end process;
[Figure: separate +1 and -1 adders feeding the registers that drive q]

233 Sharing Mutually Exclusive Operators
Up/down counter: only one adder is required.
process(rst, clk)
  variable tmp_q : std_logic_vector(7 DOWNTO 0);
  variable dir : integer range -1 to 1;
begin
  if rst = '0' then
    tmp_q := (OTHERS => '0');
  elsif rising_edge(clk) then
    if updn = '1' then
      dir := 1;
    else
      dir := -1;
    end if;
    tmp_q := tmp_q + dir;
  end if;
  q <= tmp_q;
end process;
[Figure: +1 or -1 selected as the operand of a single adder feeding the registers that drive q]

234 How Many Multipliers? y <= a * b * c z <= b * c * d

235 How Many Multipliers? (Answer)
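y <= a * b * c
z <= b * c * d
4 Multipliers! Without parentheses the expressions are evaluated left to right, as (a * b) * c and (b * c) * d, so no product is common to both.
[Figure: two multipliers in series for y and two in series for z]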

236 How Many Multipliers Again?
y <= a * (b * c) z <= (b * c) * d

237 How Many Multipliers Again? (Answer)
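y <= a * (b * c)
z <= (b * c) * d
3 Multipliers! The product b * c is computed once and feeds both y and z.
This is called sharing common subexpressions. Some synthesis tools do this automatically, but some don't! Parentheses guide synthesis tools. If (b * c) is used repeatedly, assign it to a temporary signal.
[Figure: one multiplier for b * c, whose output feeds a multiplier with a (for y) and a multiplier with d (for z)]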
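Assigning the common product to a temporary signal might look like the sketch below. The entity name shared_mult, the 4-bit operand width and the signal name bc are illustrative assumptions; the numeric_std package is used so that "*" is defined for unsigned operands and returns a result whose width is the sum of the operand widths.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity: the shared product b * c is named explicitly.
ENTITY shared_mult IS
  PORT (
    a, b, c, d : IN  UNSIGNED (3 DOWNTO 0);
    y, z       : OUT UNSIGNED (11 DOWNTO 0)
  );
END ENTITY shared_mult;

ARCHITECTURE rtl OF shared_mult IS
  SIGNAL bc : UNSIGNED (7 DOWNTO 0);   -- temporary signal for b * c
BEGIN
  bc <= b * c;    -- multiplier 1: 4 x 4 bits gives 8 bits
  y  <= a * bc;   -- multiplier 2: 4 x 8 bits gives 12 bits
  z  <= bc * d;   -- multiplier 3: 8 x 4 bits gives 12 bits
END ARCHITECTURE rtl;
```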

238 Topics
PLD PROM PLA PAL CPLD Programming PLD ASIC FPGA Architecture Quartus Development software FPGA Programming Technology SRAM versus Antifuse FPGA EEPROM/Flash FPGA Xilinx FPGA Architecture FPGA basic building blocks FPGA Embedded Blocks FPGA Clocking Mechanism FPGA Family Altera Megafunctions FPGA Design flow Design phase Advanced VHDL Topics Simulation versus Synthesis Latches versus registers Common pitfalls Unwanted latches Case statement Variable versus signals Synthesizable subprograms Gated clocks Inferring Logic Functions. Control Signal Priority Tri-state Memory

239 Example - 1
Explain the problem with gated clocks. How can you implement a gated clock in your design?
Problem: gated clocks can cause functional and timing problems and are a source of additional clock skew.
To solve: use a synchronous gating structure, ensure global clock routing is used for the clock signal, and gate the clock at the source.

240 Example - 2
How many registers are inferred? Four registers (a, b, c and q):
ARCHITECTURE logic OF reg1 IS
  SIGNAL a, b, c : STD_LOGIC;
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      a <= d;
      b <= a;
      c <= b;
      q <= c;
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
Use variables, which are updated immediately, as shown. One register now!
ARCHITECTURE logic OF reg1 IS
BEGIN
  PROCESS (clk)
    VARIABLE a, b, c : STD_LOGIC;
  BEGIN
    IF rising_edge(clk) THEN
      a := d;
      b := a;
      c := b;
      q <= c;
    END IF;
  END PROCESS;
END ARCHITECTURE logic;

241 Example - 3
Explain the problem with the following code.
ARCHITECTURE beh OF example3 IS
BEGIN
  q <= d;
  q <= i;
END ARCHITECTURE beh;
Problem: two drivers drive the same signal q. Use tri-state drivers.

242 Example - 4 Explain the problem with the following VHDL model? Fix It.
LIBRARY ieee;
USE IEEE.std_logic_1164.all;
ENTITY nolatch IS
  PORT (a, b, c : IN STD_LOGIC;
        sel : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        oput : OUT STD_LOGIC);
END nolatch;
ARCHITECTURE rtl OF nolatch IS
BEGIN
  PROCESS (a, b, c, sel)
  BEGIN
    IF sel = "00000" THEN
      oput <= a;
    ELSIF sel = "00001" THEN
      oput <= b;
    ELSIF sel = "00010" THEN
      oput <= c;
    END IF;
  END PROCESS;
END rtl;

243 Example - 4
Explain the problem with the following VHDL model.
Problem: unwanted latch, because oput is not assigned for every value of sel. The code below is updated to remove the unwanted latch.
LIBRARY ieee;
USE IEEE.std_logic_1164.all;
ENTITY nolatch IS
  PORT (a, b, c : IN STD_LOGIC;
        sel : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
        oput : OUT STD_LOGIC);
END nolatch;
ARCHITECTURE rtl OF nolatch IS
BEGIN
  PROCESS (a, b, c, sel)
  BEGIN
    IF sel = "00000" THEN
      oput <= a;
    ELSIF sel = "00001" THEN
      oput <= b;
    ELSIF sel = "00010" THEN
      oput <= c;
    ELSE
      oput <= 'X';   -- prevents latch inference
    END IF;
  END PROCESS;
END rtl;

244 Pipelining
Purposefully inserting register(s) into the middle of a combinatorial data (critical) path.
Increases clocking speed.
Adds levels of latency: more clock cycles are needed to obtain the output.
Some tools perform automatic pipelining. Same advantages/disadvantages as automatic fan-out.

245 Adding Single Pipeline Stage
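[Figure: two block diagrams. 25 MHz system: counter/state machine, decode of value x, then logic, with a single 40 ns path. 50 MHz system: counter/state machine, decode of value x-1, a pipeline register, then logic, with the path split into two 20 ns stages.]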

246 Adding Single Pipeline Stage In VHDL
Pipelined
mult_pipe : PROCESS (clk, clr)
BEGIN
    IF (clr = '0') THEN
        atemp  <= (OTHERS => '0');
        btemp  <= (OTHERS => '0');
        ctemp  <= (OTHERS => '0');
        dtemp  <= (OTHERS => '0');
        int1   <= (OTHERS => '0');
        int2   <= (OTHERS => '0');
        result <= (OTHERS => '0');
    ELSIF rising_edge(clk) THEN
        atemp  <= a;
        btemp  <= b;
        ctemp  <= c;
        dtemp  <= d;
        int1   <= atemp * btemp;   -- pipeline stage
        int2   <= ctemp * dtemp;   -- pipeline stage
        result <= int1 * int2;
    END IF;
END PROCESS;
Non-Pipelined
PROCESS (clk, clr)
BEGIN
    IF (clr = '0') THEN
        atemp  <= (OTHERS => '0');
        btemp  <= (OTHERS => '0');
        ctemp  <= (OTHERS => '0');
        dtemp  <= (OTHERS => '0');
        result <= (OTHERS => '0');
    ELSIF rising_edge(clk) THEN
        atemp  <= a;
        btemp  <= b;
        ctemp  <= c;
        dtemp  <= d;
        result <= (atemp * btemp) * (ctemp * dtemp);
    END IF;
END PROCESS;

247 Pipelined 4-input Multiplier
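[Figure: pipelined 4-input multiplier. Inputs a and b feed one multiplier, c and d feed a second, and the two products feed a third multiplier that produces z.]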

248 Parameterized Code
Logic blocks that are made scalable for reuse
Code is written for flexibility
Different configurations of same model
4 constructs:
    Pre-defined attributes
    Generics
    For generate
    If generate

249 Pre-Defined Attributes
Return information regarding associated object
Object changes will automatically be reflected in returned values
Uses:
    Improving readability of code
    Creating parameterized models
    Improving flexibility of code, especially using loops
    Limiting hard-coding of logic resources
Examples:
    Array attributes
    Signal attributes (not discussed), e.g. 'EVENT, 'STABLE

250 Pre-Defined Array Attributes
a : IN STD_LOGIC_VECTOR(7 DOWNTO 0)
a'HIGH = 7                   Upper bound of array index
a'LOW = 0                    Lower bound of array index
a'RIGHT = 0                  Right-most bound of array index
a'LEFT = 7                   Left-most bound of array index
a'RANGE = 7 DOWNTO 0         Range declared for object, either TO or DOWNTO
a'REVERSE_RANGE = 0 TO 7     Reverse of the range declared for object
a'LENGTH = 8                 Number of values in range index
a'ASCENDING = FALSE          Returns TRUE if array range uses TO and FALSE if array range uses DOWNTO
These array attributes are synthesizable
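As an illustration of how these attributes keep a model free of hard-coded bounds, the sketch below reverses the bit order of a vector using only 'RANGE, 'HIGH and 'LOW. The entity name `bit_reverse` and the generic `width` are assumptions made for the example.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical example: bit reversal written only in terms of array attributes
ENTITY bit_reverse IS
    GENERIC (width : POSITIVE := 8);
    PORT (
        a : IN  STD_LOGIC_VECTOR(width - 1 DOWNTO 0);
        y : OUT STD_LOGIC_VECTOR(width - 1 DOWNTO 0)
    );
END ENTITY bit_reverse;

ARCHITECTURE rtl OF bit_reverse IS
BEGIN
    PROCESS (a)
    BEGIN
        -- a'RANGE follows the declared range, so the loop adapts to any width
        FOR i IN a'RANGE LOOP
            y(a'HIGH + a'LOW - i) <= a(i);
        END LOOP;
    END PROCESS;
END ARCHITECTURE rtl;
```

Changing `width` requires no edit to the architecture, because every bound is derived from the object itself.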

251 Generics (Review)
Used to pass information to an entity instance
    Timing values (for simulation)
    Scalable code
ENTITY reg_bank IS
    GENERIC (
        tplh, tphl : time := 5 ns;
        tphz, tplz : time := 3 ns;
        size       : integer := 1
    );
    PORT (
        clk : IN  std_logic;
        d   : IN  std_logic_vector (size - 1 DOWNTO 0);
        q   : OUT std_logic_vector (size - 1 DOWNTO 0)
    );
END ENTITY reg_bank;

252 Parameterized Counter
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;   -- provides "+" for std_logic_vector

ENTITY counter IS
    GENERIC (width : INTEGER);
    PORT (
        clk, clr, sload, cnt_en : IN std_logic;
        data : IN  std_logic_vector (width - 1 DOWNTO 0);
        q    : OUT std_logic_vector (width - 1 DOWNTO 0));
END ENTITY counter;

ARCHITECTURE logic OF counter IS
BEGIN
    PROCESS (clk, clr)
        VARIABLE cnt : std_logic_vector (width - 1 DOWNTO 0);
    BEGIN
        IF clr = '1' THEN
            cnt := (OTHERS => '0');
        ELSIF rising_edge(clk) THEN
            IF sload = '1' THEN
                cnt := data;
            ELSIF cnt_en = '1' THEN
                cnt := cnt + 1;
            END IF;
        END IF;
        q <= cnt;
    END PROCESS;
END ARCHITECTURE logic;
Generic width used to scale counter

253 Using A Parameterized Function
Must map to generics & ports
Generic & port resolution done at compile time
u1 : counter
    GENERIC MAP (width => 16)
    PORT MAP (clk => tclk, clr => tclr, sload => tsload,
              cnt_en => tcnt_en, data => tdata, q => tq);
[Figure: top_counter instantiating counter with width 16; top-level signals tclk, tsload, tclr, tdata, tcnt_en and tq connect to ports clk, sload, clr, data, cnt_en and q.]

254 Complete Code
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.std_logic_arith.all;

ENTITY top_counter IS
    PORT (
        tclk, tclr, tsload, tcnt_en : IN std_logic;
        tdata : IN  std_logic_vector (15 DOWNTO 0);
        tq    : OUT std_logic_vector (15 DOWNTO 0)
    );
END ENTITY top_counter;

ARCHITECTURE logic OF top_counter IS
    COMPONENT pcounter
        GENERIC (width : INTEGER);
        PORT (
            clk, clr, sload, cnt_en : IN std_logic;
            data : IN  std_logic_vector (width - 1 DOWNTO 0);
            q    : OUT std_logic_vector (width - 1 DOWNTO 0)
        );
    END COMPONENT;
BEGIN
    u1 : pcounter
        GENERIC MAP (width => 16)
        PORT MAP (clk => tclk, clr => tclr, sload => tsload,
                  cnt_en => tcnt_en, data => tdata, q => tq);
END ARCHITECTURE logic;

255 Generate Statements
Used to create structural blocks
Resolved at compile time
Reduce amount of code
Can be nested
For-generate
    Creates zero or a set number of duplicates of a structure
    No need to individually instantiate each duplicate
If-generate
    Conditionally selects whether zero or one structure is made

256 For-Generate Syntax
Sets the number of structures created
Similar to FOR loop
Can only use concurrent statements
Label is required
label : FOR <identifier> IN <range> GENERATE
    --concurrent statements
END GENERATE label;

257 PARITY: Block Diagram

258 PARITY: Entity Declaration
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY parity IS
    PORT (
        parity_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
        parity_out : OUT STD_LOGIC
    );
END parity;

259 PARITY: Block Diagram
[Figure: chain of XOR gates over parity_in, with intermediate signals labeled xor_out(1), xor_out(2), xor_out(3), xor_out(4).]

260 PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
    SIGNAL xor_out : std_logic_vector (6 DOWNTO 1);
BEGIN
    xor_out(1) <= parity_in(0) XOR parity_in(1);
    xor_out(2) <= xor_out(1) XOR parity_in(2);
    xor_out(3) <= xor_out(2) XOR parity_in(3);
    xor_out(4) <= xor_out(3) XOR parity_in(4);
    xor_out(5) <= xor_out(4) XOR parity_in(5);
    xor_out(6) <= xor_out(5) XOR parity_in(6);
    parity_out <= xor_out(6) XOR parity_in(7);
END parity_dataflow;

261 PARITY: Block Diagram (2)
[Figure: the same XOR chain with intermediate signals xor_out(0) through xor_out(7).]

262 PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
    SIGNAL xor_out : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
    xor_out(0) <= parity_in(0);
    xor_out(1) <= xor_out(0) XOR parity_in(1);
    xor_out(2) <= xor_out(1) XOR parity_in(2);
    xor_out(3) <= xor_out(2) XOR parity_in(3);
    xor_out(4) <= xor_out(3) XOR parity_in(4);
    xor_out(5) <= xor_out(4) XOR parity_in(5);
    xor_out(6) <= xor_out(5) XOR parity_in(6);
    xor_out(7) <= xor_out(6) XOR parity_in(7);
    parity_out <= xor_out(7);
END parity_dataflow;

263 PARITY: Architecture (2)
ARCHITECTURE parity_dataflow OF parity IS
    SIGNAL xor_out : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
    xor_out(0) <= parity_in(0);
    G2 : FOR i IN 1 TO 7 GENERATE
        xor_out(i) <= xor_out(i-1) XOR parity_in(i);
    END GENERATE G2;
    parity_out <= xor_out(7);
END parity_dataflow;

264 Example – 16X1 Mux
[Figure: 16-to-1 multiplexer built from 4-to-1 multiplexers; data inputs w0 to w15, select inputs s0 to s3, output f.]

265 A 4-to-1 Multiplexer
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY mux4to1 IS
    PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
           s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           f              : OUT STD_LOGIC ) ;
END mux4to1 ;

ARCHITECTURE Dataflow OF mux4to1 IS
BEGIN
    WITH s SELECT
        f <= w0 WHEN "00",
             w1 WHEN "01",
             w2 WHEN "10",
             w3 WHEN OTHERS ;
END Dataflow ;

266 Straightforward code for 16X1 Mux
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY Example1 IS
    PORT ( w : IN  STD_LOGIC_VECTOR(0 TO 15) ;
           s : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
           f : OUT STD_LOGIC ) ;
END Example1 ;

267 Straightforward code for 16X1 Mux
ARCHITECTURE Structure OF Example1 IS
    COMPONENT mux4to1
        PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
               s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               f              : OUT STD_LOGIC ) ;
    END COMPONENT ;
    SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
    Mux1: mux4to1 PORT MAP ( w(0), w(1), w(2), w(3), s(1 DOWNTO 0), m(0) ) ;
    Mux2: mux4to1 PORT MAP ( w(4), w(5), w(6), w(7), s(1 DOWNTO 0), m(1) ) ;
    Mux3: mux4to1 PORT MAP ( w(8), w(9), w(10), w(11), s(1 DOWNTO 0), m(2) ) ;
    Mux4: mux4to1 PORT MAP ( w(12), w(13), w(14), w(15), s(1 DOWNTO 0), m(3) ) ;
    Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;

268 Modified code for 16X1 Mux
ARCHITECTURE Structure OF Example1 IS
    COMPONENT mux4to1
        PORT ( w0, w1, w2, w3 : IN  STD_LOGIC ;
               s              : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               f              : OUT STD_LOGIC ) ;
    END COMPONENT ;
    SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
    G1: FOR i IN 0 TO 3 GENERATE
        Muxes: mux4to1 PORT MAP (
            w(4*i), w(4*i+1), w(4*i+2), w(4*i+3), s(1 DOWNTO 0), m(i) ) ;
    END GENERATE ;
    Mux5: mux4to1 PORT MAP ( m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f ) ;
END Structure ;

269 Example- 4X16 Decoder
[Figure: 4-to-16 decoder built from five 2-to-4 decoders. One decoder takes w3, w2 and En and produces the enables for four decoders that take w1 and w0 and drive outputs y0 to y15.]

270 A 2-to-4 binary decoder
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY dec2to4 IS
    PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
END dec2to4 ;

ARCHITECTURE Dataflow OF dec2to4 IS
    SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0) ;
BEGIN
    Enw <= En & w ;
    WITH Enw SELECT
        y <= "1000" WHEN "100",
             "0100" WHEN "101",
             "0010" WHEN "110",
             "0001" WHEN "111",
             "0000" WHEN OTHERS ;
END Dataflow ;

271 VHDL code for 4X16 decoder
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;

ENTITY dec4to16 IS
    PORT ( w  : IN  STD_LOGIC_VECTOR(3 DOWNTO 0) ;
           En : IN  STD_LOGIC ;
           y  : OUT STD_LOGIC_VECTOR(0 TO 15) ) ;
END dec4to16 ;

272 VHDL code for 4X16 decoder (2)
ARCHITECTURE Structure OF dec4to16 IS
    COMPONENT dec2to4
        PORT ( w  : IN  STD_LOGIC_VECTOR(1 DOWNTO 0) ;
               En : IN  STD_LOGIC ;
               y  : OUT STD_LOGIC_VECTOR(0 TO 3) ) ;
    END COMPONENT ;
    SIGNAL m : STD_LOGIC_VECTOR(0 TO 3) ;
BEGIN
    G1: FOR i IN 0 TO 3 GENERATE
        Dec_ri: dec2to4 PORT MAP ( w(1 DOWNTO 0), m(i), y(4*i TO 4*i+3) ) ;
    END GENERATE ;
    Dec_left: dec2to4 PORT MAP ( w(3 DOWNTO 2), En, m ) ;
END Structure ;

273 If Generate Syntax
Condition controls whether a structure is created
Can only use concurrent statements
Label is required
label : IF <expression> GENERATE
    --concurrent statements
END GENERATE label;

274 Example 1
Based on Lshift, either a left-shift register or a right-shift register is generated. If Lshift is true, an N-bit left-shift register is generated; if false, a right-shift register.
entity shift_reg is
    generic (N      : positive := 4;
             Lshift : Boolean  := true);   -- generic parameters used
    port (D       : in  bit_vector(N downto 1);
          Qout    : out bit_vector(N downto 1);
          CLK, Ld, Sh, Shiftin : in bit);
end shift_reg;

architecture SRN of shift_reg is
    signal Q, shifter : bit_vector(N downto 1);
begin
    Qout <= Q;
    genLS: if Lshift generate           -- conditional generate of left shift register
        shifter <= Q(N-1 downto 1) & Shiftin;
    end generate;
    genRS: if not Lshift generate       -- conditional generate of right shift register
        shifter <= Shiftin & Q(N downto 2);
    end generate;
    process (CLK)
    begin
        if CLK'event and CLK = '1' then
            if Ld = '1' then
                Q <= D;
            elsif Sh = '1' then
                Q <= shifter;
            end if;
        end if;
    end process;
end SRN;

275 If Generate Example 2
Selects rising or falling edge clock behavior
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_arith.all;      -- conv_std_logic_vector
USE ieee.std_logic_unsigned.all;   -- conv_integer

ENTITY counter IS
    GENERIC (width : INTEGER; rise_or_fall : INTEGER);
    PORT (
        clk, clr, sload, cnt_en : IN std_logic;
        data : IN  std_logic_vector (width - 1 DOWNTO 0);
        q    : OUT std_logic_vector (width - 1 DOWNTO 0)
    );
END ENTITY counter;

ARCHITECTURE logic OF counter IS
    SIGNAL clk_buf : std_logic;
BEGIN
    clock : IF (rise_or_fall > 0) GENERATE
        clk_buf <= clk;
    END GENERATE;
    not_clock : IF (rise_or_fall <= 0) GENERATE
        clk_buf <= NOT clk;
    END GENERATE;
    PROCESS (clk_buf, clr)
        VARIABLE cnt : INTEGER RANGE 0 TO (2**width)-1;
    BEGIN
        IF clr = '1' THEN
            cnt := 0;
        ELSIF rising_edge(clk_buf) THEN
            IF sload = '1' THEN
                cnt := conv_integer(data);
            ELSIF cnt_en = '1' THEN
                cnt := cnt + 1;
            END IF;
        END IF;
        q <= conv_std_logic_vector(cnt, width);
    END PROCESS;
END ARCHITECTURE logic;
Instantiation in the top-level architecture:
u1 : pcounter3
    GENERIC MAP (width => 16, rise_or_fall => 0)
    PORT MAP (clk => tclk, clr => tclr, sload => tsload,
              data => tdata, cnt_en => tcnt_en, q => tq);
One code slice can implement both a rising & falling edge counter
Different (& better) than using IF-THEN-ELSE
No clock mux is created; either clock inversion is implemented or it is not

276 FPGA Design Validation: Simulation & Design Verification

277 Design Implementation
FPGA Design Flow: Requirements, Specifications, Design Implementation, Design Verification, Product Delivery
Requirements:
Provided by customer or generated internally
May be ambiguous
Little or no implementation details
The customer (internal or external) may not know exactly what they want or what is possible

278 Design Implementation
FPGA Design Flow: Requirements, Specifications, Design Implementation, Design Verification, Product Delivery
Specification:
Identify what the requirements mean
Narrow the requirements to specifics
    Design blocks
    Components
    Input/Output
What the design should and shouldn't do

279 Design Implementation
Synthesis of specification into a workable design
Other names: Design Engineering Cycle
Initially iterative: Design, Test, Examine Results, Modify Specification

280 Design Verification Verify that your design functions according to the specification A complete specification will cover all cases A poor specification is not an excuse for a sloppy design

281 Product Delivery Output of the design process:
A product that performs according to the provided requirements
Internally/mutually developed specification
Verification of the performance to the specification
Documentation of due diligence
Documentation and customer acknowledgement of all known design faults
Assessment of risk severity (DFMEA)

282 DFMEA
Design Failure Mode and Effects Analysis
Basics:
Identify all possible design failure modes
Assign a severity to each failure mode
Assess the risk (probability) of this type of failure
For all failure modes above a certain severity/probability, develop a mitigation plan
Assign test criteria based on failure mode

283 Product Development for FPGAs
Simulation and Design Verification
Crucial part of the design process
FPGAs are neither purely hardware nor purely software
Hardware: Deterministic
    My schematic is my schematic
Software: Non-deterministic
    Different compilers may produce different operations for the same high-level program

284 Product Development for FPGAs – What is Simulation
Simulation is the process of applying stimulus or inputs that mimic actual data to the design and observing the output. Input to simulation phase: Design Synthesis netlist Implementation netlist

285 Product Development for FPGAs –Simulation Tools
Editor to create the inputs
    Text editor
    Graphical editor
Simulator: compiles and connects the test inputs to the design, causing outputs to change based on input data
Input to simulation phase:
    Design
    Synthesis netlist
    Implementation netlist
Example: Mentor Graphics ModelSim

286 VHDL Design Validation
Levels of Simulation Register Transfer Level (RTL) Functional Gate Level

287 RTL Simulation Check for logic and syntax error
Does the design work on the target hardware Will it compile? Contains no timing evaluation

288 Functional Simulation
Performed on netlist or code generated by synthesis tool Sometimes necessary to direct synthesis tool to provide netlist Initial Timing Analysis Will the synthesized design fit or work on the target hardware

289 Gate Level Simulation
Performed on the netlist generated by the implementation tool
Contains actual timing information
Representative of hardware
Most realistic
Detects design timing problems

290 Simulation in the Design Process
Complete RTL simulation
    Does the design function/compile?
Complete functional simulation
    Will it function on the target hardware?
Gate level simulation
    Will it work as expected over all operational conditions?
A failure at any of these levels requires the other steps to be revisited

291 Developing a RTL Simulation
Identify Inputs/Outputs
Identify Test Cases
For each test case develop a vector waveform
Run each test case and verify output
Should hit every area of your design
Test cases are referred to as stimulus

292 Vector Waveform Files (VWF)

293 Functional Simulation
Verify the functional operation Expand on RTL simulation Include some timing variation Looking for timing hazards VWF may include timing variations Pulse width Pulse spacing

294 Gate Level Simulation A full timing analysis including hardware effects Repeat of Functional Simulation

295 Hardware Verification
Stimuli developed in simulation can be supplied to a hardware test cases generator Build and program target hardware FPGA level Board Level System Level

296 FPGA Design Validation: Simulation & Design Verification

297 Design Implementation
FPGA Design Flow Requirements Specifications Design Implementation Design Verification Product Delivery Simulation: RTL Functional Gate Level

298 Simulation in the design process
Good practice to return at least to functional simulation before approving design changes Gate level simulation involving multiple timing cases can be time consuming

299 Stimulus Test cases/Stimulus: One test case for each condition
…and so on Test Case Input 1 Input 2 Q 1 Wide N/A Low 2 Default Short 3 High

300 Choosing a simulation tool
Hardcore: Develop HDL
    Company-specific automated script generation tools
IDE: Development toolchain
    ModelSim (Mentor Graphics)

301 Introduction to Testbenches
Purpose of testbench Three classes of traditional testbenches General testbench methods Self verification methods Arrays for stimulus & results TEXTIO for stimulus & results

302 Purpose of Testbench
Generate stimulus to test design for normal transactions, corner cases and error conditions
    Direct tests
    Random tests
Automatically verify design to spec and log all errors
    Regression tests
Log transactions in a readable format for easy debugging
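A minimal sketch of combining direct and random tests in one stimulus process follows. It assumes a hypothetical 4-bit stimulus signal `a` and arbitrary seed values; the `uniform` procedure comes from the standard `ieee.math_real` package and is meant for simulation only.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;
USE ieee.math_real.all;   -- uniform and floor, simulation only

ENTITY random_stim_tb IS   -- hypothetical testbench name
END ENTITY random_stim_tb;

ARCHITECTURE sim OF random_stim_tb IS
    SIGNAL a : STD_LOGIC_VECTOR(3 DOWNTO 0) := (OTHERS => '0');
BEGIN
    stim : PROCESS
        VARIABLE seed1 : POSITIVE := 12;   -- arbitrary seeds (assumption)
        VARIABLE seed2 : POSITIVE := 34;
        VARIABLE r     : REAL;
    BEGIN
        -- Direct tests: hit the corner cases first
        a <= "0000"; WAIT FOR 10 ns;
        a <= "1111"; WAIT FOR 10 ns;
        -- Random tests: 20 pseudo-random values in the range 0 to 15
        FOR n IN 1 TO 20 LOOP
            uniform(seed1, seed2, r);      -- r lies strictly between 0.0 and 1.0
            a <= STD_LOGIC_VECTOR(to_unsigned(INTEGER(floor(r * 16.0)), 4));
            WAIT FOR 10 ns;
        END LOOP;
        REPORT "stimulus complete";
        WAIT;                              -- stop the process
    END PROCESS stim;
END ARCHITECTURE sim;
```

Because the seeds are fixed, the same sequence is replayed on every run, which keeps regression results comparable.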

303 Three Classes of Traditional Testbenches
Class I: Test bench applies stimulus to target code and outputs are manually reviewed
Class II: Test bench applies stimulus to target code and verifies outputs functionally
    Requires static timing analysis
Class III: Test bench applies stimulus to target code and verifies outputs with timing
    Does not require full static timing analysis
    Code and test bench data more complex
    Not covered

304 Advantages/Disadvantages
Class I
    Advantages: Simple to write
    Disadvantages: Requires manual verification; takes longer for others (not the original designer) to verify; easy for others to miss errors
    Recommendation: Great for verifying simple code; not intended for re-use
Class II
    Advantages: Easy to perform verification once complete; "set it and forget it"
    Disadvantages: Takes longer to write; more difficult to debug initially
    Recommendation: Better for more complicated designs, designs with complicated stimulus/outputs and higher-level designs; promotes re-usability
Class III
    Advantages: Most in-depth; "guarantees" design operation, if successful (subject to model accuracy)
    Disadvantages: Takes longest to write; most difficult to debug; physical changes (i.e. target device, process) require changing the testbench
    Recommendation: Might be overkill for many FPGA designs; required for non-Altera ASIC designs

305 General Testbench Methods
Create "test harness" code to instantiate the device under test (DUT) or target code
Create stimulus signals to connect to DUT
[Figure: mycode_tb.vhd wraps mycode.vhd. clk_assignment drives clk, datagen_process drives in1, in2 and in3 (a single process controls each signal), and reset_assignment drives rst; the DUT outputs are out1 and out2.]

306 Test Vector Generation
Develop sequence of fixed input values Test vector development from bottom up Write basic tasks Write more complex tasks based on basic tasks Perform tests Example – memory testing Basic tasks: readmem, writemem 2nd level tasks: initmem, copymem, comparemem Generation of tests based on tasks

307 Testbench Anatomy
ENTITY my_entity_tb IS
    -- TB entity has no ports
END my_entity_tb;

ARCHITECTURE behavioral OF my_entity_tb IS
    -- Local signals and constants
    COMPONENT TestComp   -- All Design Under Test component declarations
        PORT ( );
    END COMPONENT;
BEGIN
    DUT : TestComp PORT MAP (   -- Instantiations of DUTs
    );
    testSequence : PROCESS
    BEGIN
        -- Input stimuli
    END PROCESS;
END behavioral;
Internal signals are from the DUT.
The main process may be split into several processes, i.e. one to drive clk and rst and another for test vectors.
Many architectures can be tested by inserting more "for DUT : TestComp use entity work.TestComp(archName);" statements.
"work" is the name of the library that "TestComp" is being compiled to.
The "DUT" label is required.

308 Testbench for XOR3 (1)
LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY xor3_tb IS
END xor3_tb;

ARCHITECTURE behavioral OF xor3_tb IS
    -- Component declaration of the tested unit
    COMPONENT xor3
        PORT (
            A      : IN  STD_LOGIC;
            B      : IN  STD_LOGIC;
            C      : IN  STD_LOGIC;
            Result : OUT STD_LOGIC );
    END COMPONENT;
    -- Stimulus signals - signals mapped to the input and inout ports of tested entity
    SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0);
    SIGNAL test_result : STD_LOGIC;

309 Testbench for XOR3 (2)
BEGIN
    UUT : xor3 PORT MAP (
        A      => test_vector(2),
        B      => test_vector(1),
        C      => test_vector(0),
        Result => test_result );

    Testing : PROCESS
    BEGIN
        test_vector <= "000"; WAIT FOR 10 ns;
        test_vector <= "001"; WAIT FOR 10 ns;
        test_vector <= "010"; WAIT FOR 10 ns;
        test_vector <= "011"; WAIT FOR 10 ns;
        test_vector <= "100"; WAIT FOR 10 ns;
        test_vector <= "101"; WAIT FOR 10 ns;
        test_vector <= "110"; WAIT FOR 10 ns;
        test_vector <= "111"; WAIT FOR 10 ns;
    END PROCESS;
END behavioral;

310 Generating selected values of one input
SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0);
BEGIN
    testing : PROCESS
    BEGIN
        test_vector <= "000"; WAIT FOR 10 ns;
        test_vector <= "001"; WAIT FOR 10 ns;
        test_vector <= "010"; WAIT FOR 10 ns;
        test_vector <= "011"; WAIT FOR 10 ns;
        test_vector <= "100"; WAIT FOR 10 ns;
    END PROCESS;
END behavioral;

311 Generating all values of one input
-- Requires USE ieee.std_logic_unsigned.all for "+" on STD_LOGIC_VECTOR
SIGNAL test_vector : STD_LOGIC_VECTOR(3 DOWNTO 0) := "0000";
BEGIN
    testing : PROCESS
    BEGIN
        WAIT FOR 10 ns;
        test_vector <= test_vector + 1;
    END PROCESS testing;
END behavioral;

312 Generating periodical signals, such as clocks
CONSTANT clk1_period : TIME := 20 ns;
CONSTANT clk2_period : TIME := 200 ns;
SIGNAL clk1 : STD_LOGIC;
SIGNAL clk2 : STD_LOGIC := '0';
BEGIN
    clk1_generator : PROCESS
    BEGIN
        clk1 <= '0';
        WAIT FOR clk1_period/2;
        clk1 <= '1';
        WAIT FOR clk1_period/2;
    END PROCESS;
    clk2 <= NOT clk2 AFTER clk2_period/2;
END behavioral;

313 Generating one-time signals, such as resets
CONSTANT reset1_width : TIME := 100 ns;
CONSTANT reset2_width : TIME := 150 ns;
SIGNAL reset1 : STD_LOGIC;
SIGNAL reset2 : STD_LOGIC := '1';
BEGIN
    reset1_generator : PROCESS
    BEGIN
        reset1 <= '1';
        WAIT FOR reset1_width;
        reset1 <= '0';
        WAIT;
    END PROCESS;
    reset2_generator : PROCESS
    BEGIN
        WAIT FOR reset2_width;   -- reset2 starts at its initial value '1'
        reset2 <= '0';
        WAIT;
    END PROCESS;
END behavioral;

314 Concurrent Statements
Signals with regular or limited transitions can be created with concurrent statements
These statements can begin a testbench and reside outside any processes
ARCHITECTURE logic OF test_b IS
    -- Use clkperiod constant to create 50 MHz clock
    CONSTANT clkperiod : TIME := 20 ns;
    -- clk initialized to '0'
    SIGNAL clk   : std_logic := '0';
    SIGNAL reset : std_logic;
BEGIN
    -- clock must be initialized when declared to use this notation
    clk   <= NOT clk AFTER clkperiod/2;
    reset <= '1', '0' AFTER 20 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;
[Waveform, 0 to 55 ns: CLK starts low and toggles every 10 ns; RESET is high from 0 to 20 ns, low from 20 to 40 ns, and high from 40 ns onward.]

315 Sequential Statements
More complex combinations can be created using sequential statements (i.e. LOOP, WAIT, IF-THEN, CASE)
Statements dependent on clock edges
Multiple processes & loops executing at once
clkgen : PROCESS   -- Another clock generation example
    CONSTANT clkperiod : TIME := 20 ns;
BEGIN
    clk <= '0';          -- Initialize clock
    WAIT FOR 500 ns;     -- Delay clock for 500 ns
    LOOP                 -- Infinite loop to create free-running clock
        clk <= '1';
        WAIT FOR clkperiod/2;
        clk <= '0';
        WAIT FOR clkperiod/2;
    END LOOP;
END PROCESS clkgen;

buscount : PROCESS (clk)   -- Generate counting pattern
BEGIN
    IF rising_edge (clk) THEN
        inbus <= count;
        count <= count + 1;
    END IF;
END PROCESS buscount;

316 Sequential Statements (cont.)
bus_gray : PROCESS
    CONSTANT buswidth : INTEGER := 16;
BEGIN
    inbus <= (OTHERS => '0');
    FOR n IN 0 TO ... LOOP   -- upper bound not shown in the transcript
        inbus <= TO_UNSIGNED(n, buswidth) XOR
                 shift_right(TO_UNSIGNED(n, buswidth), 1);
        WAIT UNTIL rising_edge(clk);
    END LOOP;
END PROCESS;
Example shows more complex stimulus generation
Process uses a loop and WAITs (not allowed in synthesis); a process that contains WAIT statements cannot also have a sensitivity list
(uses IEEE.numeric_std.all)

317 Sample VHDL Class I Testbench
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;

ENTITY addtest IS   -- Top-level entity with no ports
END ENTITY addtest;

ARCHITECTURE stimulus OF addtest IS
    -- Declare design being tested
    COMPONENT adder
        PORT (
            clk  : IN  std_logic;
            a, b : IN  std_logic_vector(3 DOWNTO 0);
            sum  : OUT std_logic_vector(3 DOWNTO 0) );
    END COMPONENT;
    -- Signals to assign values and observe results
    SIGNAL a, b, sum : std_logic_vector(3 DOWNTO 0);
    SIGNAL clk : std_logic := '0';
    -- Constants for timing values
    CONSTANT clkperiod : TIME := 20 ns;
BEGIN
    -- Create clock to synchronize actions
    clk <= NOT clk AFTER clkperiod/2;
    -- Instantiate design being tested
    add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);
    -- Process to generate stimulus; note operations
    -- take place on inactive clock edge
    PROCESS
        CONSTANT period : TIME := 40 ns;
        VARIABLE ina, inb : std_logic_vector(3 DOWNTO 0);
    BEGIN
        WAIT UNTIL falling_edge (clk);
        ina := (OTHERS => '0');
        inb := (OTHERS => '0');
        stim_loop : LOOP
            -- Apply generated stimulus to inputs
            a <= ina;
            b <= inb;
            WAIT FOR period;
            -- Exit loop once simulation reaches 1 us
            EXIT stim_loop WHEN NOW > 1 us;
            -- Use equations below to generate new stimulus values
            ina := ina + 2;
            inb := inb + 3;
        END LOOP stim_loop;
        -- Final wait to keep process from repeating
        WAIT;
    END PROCESS;
END ARCHITECTURE stimulus;

318 Example Results

319 Topics – Exam II State Machine Coding
VHDL Logic Optimization & Performance
    Balancing operators
    Resource Sharing
    Pipelining
Parameterized Code Constructs
    Pre-Defined Attributes
    Generics
    For Generate
    If Generate
Simulation
    RTL Simulation
    Functional Simulation
    Gate Level Simulation
Testbenches
    Classes of Testbenches
    Advantages and Disadvantages
    Test Vector Generation

320 Example - 1 Explain One-Hot Encoding used by Altera’s Quartus? Show how you can encode the following 5 states? State 0 State 1 State 2 State 3 State 4

321 Example - 1 Explain the one-hot encoding used by Altera's Quartus. Show how you can encode the following 5 states.
One-Hot Encoding: the default encoding style, requiring N bits, where N is the number of enumeration literals in the enumeration type. Each of State 0 through State 4 is therefore represented by a 5-bit code in which exactly one bit is set.
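For illustration, the five one-hot codes can be written out as constants. This is only a sketch: the package name `one_hot_pkg`, the constant names, and the choice of which bit belongs to which state are all assumptions, and a synthesis tool is free to pick a different assignment when it encodes an enumerated state type itself.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;

-- Hypothetical package spelling out one possible one-hot assignment
PACKAGE one_hot_pkg IS
    SUBTYPE state_code IS STD_LOGIC_VECTOR(4 DOWNTO 0);   -- 5 states need 5 bits
    CONSTANT STATE_0 : state_code := "00001";
    CONSTANT STATE_1 : state_code := "00010";
    CONSTANT STATE_2 : state_code := "00100";
    CONSTANT STATE_3 : state_code := "01000";
    CONSTANT STATE_4 : state_code := "10000";
END PACKAGE one_hot_pkg;
```

Decoding a state then only requires looking at a single bit, at the cost of one flip-flop per state.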

322 Example - 2 Generate the possible logic block from the given operators: IF (A > 20) THEN X <= B * C; ELSE X <= C *D; END IF;

323 Example - 2 Generate the possible logic block from the given operators:
IF (A > 20) THEN
    X <= B * C;
ELSE
    X <= C * D;
END IF;
Answer: 1 comparator (A > 20), 2 multipliers (B * C and C * D), 1 multiplexer (selects which product drives X)
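Because C is common to both products, the same function can also be described with a single multiplier by moving the multiplexer in front of it, which is the resource-sharing idea from the optimization topics. The sketch below is one possible form; the entity name `share_mult`, the 8-bit widths and the use of `numeric_std` are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;

-- Hypothetical resource-shared version: 1 comparator, 1 multiplexer, 1 multiplier
ENTITY share_mult IS
    PORT (
        a, b, c, d : IN  UNSIGNED(7 DOWNTO 0);    -- widths are assumptions
        x          : OUT UNSIGNED(15 DOWNTO 0)
    );
END ENTITY share_mult;

ARCHITECTURE rtl OF share_mult IS
    SIGNAL operand : UNSIGNED(7 DOWNTO 0);
BEGIN
    -- Multiplexer selects the operand that differs between the two branches
    operand <= b WHEN a > 20 ELSE d;
    -- Single multiplier shared by both branches
    x <= operand * c;
END ARCHITECTURE rtl;
```

The trade-off is that the multiplexer now sits in front of the multiplier, so it adds to the delay of the multiply path.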

324 Example - 3 Use parentheses to balance the following operators
Z <= a * b * c * d * e * f

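325 Example - 3 Use parentheses to balance the following operators
Z <= a * b * c * d * e * f
Unbalanced: z <= a * b * c * d * e * f
Balanced: z <= (a * b) * (c * d) * (e * f)
[Figure: the unbalanced form is a chain of five multipliers in series from a to z; the balanced form multiplies a*b, c*d and e*f in parallel and then combines the three products.]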

326 Example - 4 Draw the test waveforms generated by the following testbench.
ARCHITECTURE logic OF test_b IS
    -- Use clkperiod constant to create 50 MHz clock
    CONSTANT clkperiod : TIME := 20 ns;
    -- clk initialized to '1'
    SIGNAL clk   : std_logic := '1';
    SIGNAL reset : std_logic;
BEGIN
    -- clock must be initialized when declared to use this notation
    clk   <= NOT clk AFTER clkperiod/2;
    reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;

327 Example - 4 Draw the test waveforms generated by the following testbench.
ARCHITECTURE logic OF test_b IS
    -- Use clkperiod constant to create 50 MHz clock
    CONSTANT clkperiod : TIME := 20 ns;
    -- clk initialized to '1'
    SIGNAL clk   : std_logic := '1';
    SIGNAL reset : std_logic;
BEGIN
    -- clock must be initialized when declared to use this notation
    clk   <= NOT clk AFTER clkperiod/2;
    reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;
[Waveform, 0 to 55 ns: CLK starts high and toggles every 10 ns (low from 10 to 20 ns, high from 20 to 30 ns, and so on); RESET is high from 0 to 25 ns, low from 25 to 40 ns, and high from 40 ns onward.]

328 Example - 5 Add to the following entity interface a generic clause defining generic constants Tpw_clk_h and Tpw_clk_l that specify the minimum clock pulse width timing. Both generic constants have a default value of 3 ns.
ENTITY flipflop IS
    PORT (clk, d : IN  STD_LOGIC;
          q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;

329 Example - 5 Add to the following entity interface a generic clause defining generic constants Tpw_clk_h and Tpw_clk_l that specify the minimum clock pulse width timing. Both generic constants have a default value of 3 ns.
Original:
ENTITY flipflop IS
    PORT (clk, d : IN  STD_LOGIC;
          q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;
With generic clause:
ENTITY flipflop IS
    GENERIC (Tpw_clk_h, Tpw_clk_l : delay_length := 3 ns);
    PORT (clk, d : IN  STD_LOGIC;
          q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;

330 Class II (& III) Methods
Add a compare process so that DUT outputs can be monitored
Allows testbench to do "self-verification"
[Figure: mycode_tb.vhd wraps mycode.vhd. clk_assignment drives clk, wavegen_process drives in1, in2 and in3, a reset_assignment block drives the reset, and compare_process monitors out1 and out2.]

331 Self Verification Methods
Use “compare_process” or equivalent to check results generated by design against expected results Single simulation can use one or multiple testbench files Single testbench file containing all stimulus and all expected results Multiple testbench files based on stimulus, expected results or functionality (e.g. data generator, control stimulus) Many times signaling is too complicated to model without using vectors saved in “time-slices”

332 Simple Self Verifying Test Benches
clk <= NOT clk AFTER clkperiod/2;
add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);
stim : PROCESS
    VARIABLE error : BOOLEAN;
BEGIN
    WAIT UNTIL falling_edge(clk);
    a <= (OTHERS => '0');
    b <= (OTHERS => '0');
    WAIT FOR 40 ns;
    IF (sum /= 0) THEN      -- result checked
        error := TRUE;
    END IF;
    a <= "0010";
    b <= "0011";
    WAIT FOR 40 ns;
    IF (sum /= 5) THEN
        error := TRUE;
    END IF;
    -- repeat above varying values of a and b (code repeated for each test case)
    WAIT;
END PROCESS stim;
Simple self verifying test bench
    Each sub-block within the process assigns values to a, b and waits to compare sum to its predetermined result
Code not very efficient
    Each test case may require a lot of repeated code
    Improve this code by introducing a procedure

333 Simplifying Test Bench with Procedure
PROCEDURE test (
    SIGNAL clk : IN std_logic;
    inval_a, inval_b, result : IN INTEGER RANGE 0 TO 15;
    SIGNAL in_a, in_b : OUT std_logic_vector(3 DOWNTO 0);
    SIGNAL sum_out : IN std_logic_vector(3 DOWNTO 0);
    SIGNAL error : INOUT BOOLEAN) IS
BEGIN
    WAIT UNTIL falling_edge(clk);
    in_a <= conv_std_logic_vector(inval_a, 4);
    in_b <= conv_std_logic_vector(inval_b, 4);
    WAIT FOR 40 ns;
    IF sum_out /= result THEN
        error <= TRUE;
    ELSE
        error <= FALSE;
    END IF;
END PROCEDURE;

BEGIN   -- architecture begin
    clk <= NOT clk AFTER clkperiod/2;
    add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);
    PROCESS
    BEGIN
        test(clk, 0, 0, 0, a, b, sum, error);
        test(clk, 2, 3, 5, a, b, sum, error);
        test(clk, 4, 6, 10, a, b, sum, error);
        test(clk, 6, 9, 15, a, b, sum, error);
        test(clk, 8, 12, 4, a, b, sum, error);
        WAIT;
    END PROCESS;
END ARCHITECTURE;
Procedure used to simplify test bench
Each procedure call passes in:
    clock
    3 integers representing input stimulus and expected result
    ports connecting to adder
    error flag
Procedure improves efficiency and readability of testbench
Advantage: Easier to write
Disadvantages:
    Each procedure call (like the last example) assigns values to a, b then waits to compare sum to its predetermined result
    Very difficult to do for complicated signaling

334 "Time-Slice" Vectors
Allows you to apply input stimulus and check results at specific simulation times
Two methods for storage:
Internal arrays
    Faster simulation times
    Harder to write, creates very large VHDL file
External files
    Slower simulation times
    Easier to write
    Use TEXTIO or STD_LOGIC_TEXTIO package
        TEXTIO for reading/writing built-in data types
        STD_LOGIC_TEXTIO for reading/writing standard logic
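The external-file method can be sketched with the standard TEXTIO package as follows. The file name `vectors.txt`, its layout (two 4-bit binary fields per line, separated by a space) and the testbench name are assumptions made for illustration; BIT_VECTOR is used so that only the built-in TEXTIO `read` procedures are needed.

```vhdl
USE std.textio.all;

ENTITY textio_tb IS   -- hypothetical testbench name
END ENTITY textio_tb;

ARCHITECTURE sim OF textio_tb IS
    SIGNAL a, b : BIT_VECTOR(3 DOWNTO 0);
BEGIN
    read_vectors : PROCESS
        -- Hypothetical stimulus file, e.g. a line such as: 0010 0011
        FILE vec_file   : TEXT OPEN READ_MODE IS "vectors.txt";
        VARIABLE row    : LINE;
        VARIABLE va, vb : BIT_VECTOR(3 DOWNTO 0);
    BEGIN
        WHILE NOT endfile(vec_file) LOOP
            readline(vec_file, row);   -- fetch one time slice
            read(row, va);             -- first field
            read(row, vb);             -- second field
            a <= va;
            b <= vb;
            WAIT FOR 40 ns;            -- hold the slice before reading the next
        END LOOP;
        WAIT;                          -- end of file: stop the process
    END PROCESS read_vectors;
END ARCHITECTURE sim;
```

New test cases are then added by editing the text file, without recompiling the testbench.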

335 Additional Useful VHDL Constructs for Testbenches
Record data types Assert & report statements Type conversion to STRING TEXTIO/File operations

336 Record Data Types Declares a new data type with multiple elements
Allows grouping of related data types/objects
Each element may be of any previously defined data type, including arrays, enumerated types and even other records
Similar to a struct in C
Using in a testbench:
    Set each record to the values for one time slice
    Cycle through records to apply stimulus and check results
Examples:
    Store input and output values in different elements
    Store different inputs in different elements
TYPE test_record_type IS RECORD
    a, b : std_logic_vector(3 DOWNTO 0);   -- element names : element data types
    sum  : std_logic_vector(3 DOWNTO 0);
END RECORD;

337 Accessing Values in a Record
VARIABLE vector : test_record_type;
Use a selected name to access a single record element:
vector.a := "0010";
vector.b := "0011";
vector.sum := "0101";
Use an aggregate to access the entire record:
vector := (a => "0010", b => "0011", sum => "0101");

338 Using Internal Arrays for Stimulus & Results
Create an array to store values (e.g. an array of records), then assign values to the array.
-- Create unconstrained array so the array depth can be set
-- when an object of the array type is declared
TYPE test_array_type IS ARRAY (POSITIVE RANGE <>) OF test_record_type;
-- Constant array with 6 records
CONSTANT test_patterns : test_array_type := (
    (a => "0000", b => "0000", sum => "0000"),
    (a => "0010", b => "0011", sum => "0101"),
    (a => "0100", b => "0110", sum => "1010"),
    (a => "0110", b => "1001", sum => "1111"),
    (a => "1000", b => "1100", sum => "0100"),
    (a => "1010", b => "1111", sum => "1001")
);
* POSITIVE is a subtype of INTEGER with a range of 1 to the highest integer value.

339 Assert Statements
Checks a condition expression and executes the assertion (reports the message) if the condition evaluates to false. Use as a concurrent or sequential statement.
Syntax:
ASSERT <condition_expression> REPORT <text_string> SEVERITY <expression>;
Report (optional): displays text in the simulator window. Must be type string. Enclose character strings in " ". Other data types must be converted (discussed later).
Severity (optional): expression choices are NOTE, WARNING, ERROR, FAILURE. ERROR is the default. Results of severity depend on the simulator, e.g. by default the ModelSim tool ends simulation on FAILURE only.

340 Report Statements
Displays a message without an ASSERT statement; there is no expression to check. Sequential statement only. Text must be type string. Enclose character strings in " ". Other data types must be converted (next slide).
Syntax:
REPORT <text_string> SEVERITY <expression>;
Severity (optional): same options as ASSERT, except NOTE is the default.
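The following is a minimal sketch of the ASSERT and REPORT forms described on the two slides above. The entity name `assert_report_demo`, the signal `count`, and the message strings are hypothetical and chosen only for illustration; how each severity level is handled still depends on the simulator.

```vhdl
-- Hypothetical demo entity: not part of the course material.
ENTITY assert_report_demo IS
END ENTITY assert_report_demo;

ARCHITECTURE sim OF assert_report_demo IS
    SIGNAL count : INTEGER RANGE 0 TO 15 := 0;
BEGIN
    -- Concurrent assertion: re-evaluated whenever count changes.
    -- The message is reported only when the condition is FALSE.
    ASSERT count < 10
        REPORT "count reached 10 or more"
        SEVERITY WARNING;

    stim : PROCESS
    BEGIN
        FOR i IN 1 TO 12 LOOP
            WAIT FOR 10 ns;
            count <= i;
        END LOOP;
        WAIT FOR 10 ns;

        -- Sequential assertion with no SEVERITY clause: defaults to ERROR.
        ASSERT count = 12
            REPORT "count did not reach 12";

        -- REPORT without ASSERT: always prints; defaults to NOTE.
        REPORT "stimulus finished at " & TIME'IMAGE(NOW);

        WAIT;  -- suspend forever so the process does not restart
    END PROCESS stim;
END ARCHITECTURE sim;
```

Because the sequential assertion omits SEVERITY, a failure there would be logged as an error without necessarily stopping the run; adding SEVERITY FAILURE is the usual way to force a stop in tools that halt on failure.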

341 Type Conversions to STRING
Use to display formatted messages.
<data_type>'IMAGE(obj): type attribute that converts obj of type <data_type> to its string equivalent with no leading or trailing whitespace.
Examples: INTEGER'IMAGE(integer_variable), TIME'IMAGE(time_variable), std_logic'IMAGE(1_bit_std_logic_variable)
Conversion utilities: 'IMAGE cannot be used for vectors, because <data_type> must be a scalar type or subtype. A simple web search can provide most (if not all) required conversion utilities.
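Since 'IMAGE does not accept vectors, a small conversion utility is needed before a std_logic_vector can appear in a REPORT string. The sketch below is one possible way to write such a helper; the package name `tb_utils` is an assumption, and the function name `slv_to_string` is chosen to match the call used in the sample testbench, whose actual implementation is not shown in the slides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical utility package; the name tb_utils is an assumption.
PACKAGE tb_utils IS
    FUNCTION slv_to_string (slv : std_logic_vector) RETURN STRING;
END PACKAGE tb_utils;

PACKAGE BODY tb_utils IS
    FUNCTION slv_to_string (slv : std_logic_vector) RETURN STRING IS
        VARIABLE result : STRING(1 TO slv'LENGTH);
        VARIABLE img    : STRING(1 TO 3);
        VARIABLE pos    : POSITIVE := 1;
    BEGIN
        -- Walk the vector from its left-most element to its right-most.
        FOR i IN slv'RANGE LOOP
            -- std_logic'IMAGE yields the quoted literal, e.g. '1',
            -- so the wanted character is the middle one.
            img := std_logic'IMAGE(slv(i));
            result(pos) := img(2);
            pos := pos + 1;
        END LOOP;
        RETURN result;
    END FUNCTION slv_to_string;
END PACKAGE BODY tb_utils;
```

With `USE work.tb_utils.ALL;` in the testbench, a message can then be built as `REPORT "sum = " & slv_to_string(sum);`.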

342 Sample Testbench Using Internal Array
-- entity and some of architecture declaration not shown
SIGNAL testclk : std_logic := '0';
SIGNAL a, b : std_logic_vector (3 DOWNTO 0);
SIGNAL sum : std_logic_vector (3 DOWNTO 0);
CONSTANT clk_period : time := 20 ns;
TYPE test_record_type IS RECORD
    a, b : std_logic_vector(3 DOWNTO 0);
    sum : std_logic_vector(3 DOWNTO 0);
END RECORD;
TYPE test_array_type IS ARRAY(POSITIVE RANGE <>) OF test_record_type;
CONSTANT test_patterns : test_array_type := (
    (a => "0000", b => "0000", sum => "XXXX"),
    (a => "0010", b => "0011", sum => "0000"),
    (a => "0100", b => "0110", sum => "0101"),
    (a => "0110", b => "1001", sum => "1010"),
    (a => "1000", b => "1100", sum => "1111"),
    (a => "1000", b => "1100", sum => "0100")
);
BEGIN -- beginning of architecture body
-- instantiate unit under test (adder)
add1 : adder PORT MAP (clk => testclk, a => a, b => b, sum => sum);
-- free-running clock process
testclk <= NOT testclk AFTER clk_period/2;
test: PROCESS
    VARIABLE vector : test_record_type;
    VARIABLE found_error : BOOLEAN := FALSE;
BEGIN
    -- Loop through all the values in test_patterns
    FOR i IN test_patterns'RANGE LOOP
        vector := test_patterns(i);
        -- apply the stimulus on a falling edge clock
        WAIT UNTIL falling_edge(testclk);
        a <= vector.a;
        b <= vector.b;
        -- check result on next falling edge of clock
        IF (sum /= vector.sum) THEN
            REPORT TIME'IMAGE(NOW) & " : Calc= " & slv_to_string(sum) & ", Exp= " & slv_to_string(vector.sum);
            found_error := TRUE;
        END IF;
    END LOOP;
    -- FAILURE severity is used in both cases so the simulator stops here
    ASSERT NOT found_error REPORT "---VECTORS FAILED---" SEVERITY FAILURE;
    ASSERT found_error REPORT "---VECTORS PASSED---" SEVERITY FAILURE;
END PROCESS;
END ARCHITECTURE;

343 Example Results
Testbench fails (expected results ≠ actual results). ModelSim Transcript Window:
** Note: 72 ns : Calc = 0100, Exp= 1001
   Time: 72 ns Iteration: 0 Instance: /record_add_tb
** Failure: ---VECTORS FAILED---
   Time: 288 ns Iteration: 0 Process: /record_add_tb/test File: …
Break in Process test at record_tb.vhd line 56
Testbench passes. ModelSim Transcript Window:
** Failure: ---VECTORS PASSED---
   Time: 288 ns Iteration: 0 Process: /record_add_tb/test File: …
Break in Process test at record_tb.vhd line 59

344 TEXTIO/FILE Operations
FILE declaration: creates a file handle to represent the file. Opens the file in READ_MODE, WRITE_MODE or APPEND_MODE.
LINE declaration: creates a line variable for reading from and writing to files.
READLINE(<file_handle>,<line_variable>): reads a line from a file and stores the information in a variable of type LINE.
READ(<line_variable>,<data_object>): reads text from the line variable and writes it to the data object, depending on the size/type of the data object. Only built-in data types are supported by the TEXTIO package READ (e.g. BIT, BOOLEAN, STRING, TIME). Use the STD_LOGIC_TEXTIO package to read directly into std_logic data objects.
WRITE(<line_variable>,<data_object>): writes the data object to a variable of type LINE as text. Only built-in data types are supported by the TEXTIO package WRITE (e.g. BIT, BOOLEAN, STRING, TIME). Use the STD_LOGIC_TEXTIO package to write directly from std_logic data objects.
WRITELINE(<file_handle>,<line_variable>): writes the information from a variable of type LINE to the file.

345 Sample Testbench Using External File
-- Declare packages to enable file operations
LIBRARY ieee;
USE STD.TEXTIO.ALL;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_textio.ALL;
ENTITY file_tb IS
END ENTITY file_tb;
ARCHITECTURE stimulus OF file_tb IS
COMPONENT adder
    PORT (clk : IN std_logic;
          a, b: IN std_logic_vector(3 DOWNTO 0);
          sum: OUT std_logic_vector(3 DOWNTO 0));
END COMPONENT;
-- create file handles to access text files, one for reading vectors and
-- another to write output messages
FILE vectorfile: TEXT OPEN READ_MODE IS "vectors.txt";
FILE results: TEXT OPEN WRITE_MODE IS "results.txt";
SIGNAL a, b, sum : std_logic_vector (3 DOWNTO 0);
SIGNAL testclk : std_logic := '0';
CONSTANT clk_period : TIME := 20 ns;
BEGIN -- beginning of architecture body
-- instantiate unit under test (adder)
add1 : adder PORT MAP (clk => testclk, a => a, b => b, sum => sum);
-- free-running clock process
testclk <= NOT testclk AFTER clk_period/2;

346 Sample Testbench Using External File (cont.)
test: PROCESS
    VARIABLE found_error : BOOLEAN := FALSE;
    VARIABLE a_var, b_var, sum_var : std_logic_vector (3 DOWNTO 0);
    VARIABLE vectorline, resultsline : LINE;
BEGIN
    WHILE NOT ENDFILE (vectorfile) LOOP
        -- read file into line and line into variables
        READLINE (vectorfile, vectorline);
        READ (vectorline, a_var);
        READ (vectorline, b_var);
        READ (vectorline, sum_var);
        -- apply the stimulus on a falling edge clock
        WAIT UNTIL falling_edge(testclk);
        a <= a_var;
        b <= b_var;
        -- check result on next falling clock edge
        IF (sum /= sum_var) THEN
            -- write current simulation time to line variable
            WRITE (resultsline, NOW);
            -- write string
            WRITE (resultsline, string'(" : Calc= "));
            -- write result value
            WRITE (resultsline, sum);
            -- write string
            WRITE (resultsline, string'(", Exp= "));
            -- write expected value
            WRITE (resultsline, sum_var);
            -- write entire line to text file
            WRITELINE (results, resultsline);
            found_error := TRUE;
        END IF;
    END LOOP;
    -- FAILURE severity is used in both cases so the simulator stops here
    ASSERT NOT found_error REPORT "---VECTORS FAILED---" SEVERITY FAILURE;
    ASSERT found_error REPORT "---VECTORS PASSED---" SEVERITY FAILURE;
END PROCESS test;
END ARCHITECTURE stimulus;

347 Example Files
vectors.txt: no inherent formatting except white-space skipping. Options: use separate files for stimulus and expected results; design custom tasks to extend capabilities (e.g. support comments).
results.txt (failure example): 240 ns Calc= 0100, Exp= 1001
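One way the "support comments" option could be approached is a small helper procedure that skips blank lines and comment lines before the testbench calls READ. This is only a sketch under stated assumptions: the package name `vector_file_utils`, the procedure name `read_vector_line`, and the convention that a comment line starts with '#' in the first column are all hypothetical and not taken from the course material.

```vhdl
USE std.textio.ALL;

-- Hypothetical helper package; names and the '#' comment
-- convention are assumptions made for illustration.
PACKAGE vector_file_utils IS
    -- Fetch the next line that is neither empty nor a '#' comment.
    -- found is FALSE when the end of the file is reached first.
    PROCEDURE read_vector_line (FILE f : TEXT;
                                l      : INOUT LINE;
                                found  : OUT BOOLEAN);
END PACKAGE vector_file_utils;

PACKAGE BODY vector_file_utils IS
    PROCEDURE read_vector_line (FILE f : TEXT;
                                l      : INOUT LINE;
                                found  : OUT BOOLEAN) IS
    BEGIN
        found := FALSE;
        WHILE NOT ENDFILE(f) LOOP
            READLINE(f, l);
            -- Skip empty lines, then test the first character.
            IF l'LENGTH > 0 THEN
                IF l(l'LEFT) /= '#' THEN
                    found := TRUE;
                    RETURN;
                END IF;
            END IF;
        END LOOP;
    END PROCEDURE read_vector_line;
END PACKAGE BODY vector_file_utils;
```

In the file-based testbench, the READLINE call inside the loop could then be replaced by a call to `read_vector_line(vectorfile, vectorline, found)` followed by `EXIT WHEN NOT found;`, where `found` is a BOOLEAN variable declared in the process.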

348 Example Test Plans Develop high-level behavioral (i.e. non-synthesizable) model of design Create stimulus/test vectors to simulate model Generate expected results from behavioral model simulation Replace behavioral blocks with RTL model blocks Simulate each RTL block with other behavioral blocks to ensure functionality is the same

349 Synthesis
The first step in which the HDL (or other design format) is associated with internal logic. Input: the design. Output: a design netlist that feeds into the implementation tools. Other outputs: a functional simulation netlist, and reports that provide pertinent information about the synthesized design. Synthesis can be performed immediately following the design phase; it is mandatory, whereas simulation is optional. Netlist: connects FPGA resources to perform the same function defined by the high-level design.

350 What is Design Synthesis?
An FPGA consists of logic blocks that can be configured to perform functions. Synthesis takes the high-level design, associates it with FPGA resources, and reduces logic to make the design more efficient. The synthesis process needs information about the FPGA device, such as speed and internal resources. The FPGA is identified by selecting the family, device number, package, and speed.

351 What is Design Synthesis?
Three basic synthesis operations:
Syntax check and resource association: the design is checked for syntax and synthesis errors. Once the design is error free, it is converted into structural elements, and logic elements are inserted as replacements for arithmetic operators (×, −, ...).
Optimization: the design is first put together without concern for redundant logic, timing constraints (if provided), clock speed, or other design considerations. Next, algorithms are used to optimize the design: they check for redundant logic and clock speed, and evaluate multiple paths to ensure the fastest timing is achieved.

352 Optimization Shortest path does not mean fastest time because of resources layout and how those resources are used. Example: option 2 is longer, however, option 1 has more resource delays and therefore option 2 is faster

353 What is Design Synthesis? Cnt’d
Technology Mapping: maps the optimized design to the technology associated with the targeted FPGA. Synthesis tools use advanced techniques to make predictions about how the design will be placed and routed in the target device. Synthesis tools produce timing estimates that are near the actual post-implementation timing, but the real timing is unknown until after the design has been placed and routed.
[Figure: examples of technology view symbols]
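To make resource association concrete, the sketch below shows a small RTL description in which synthesis would have to replace the "+" operator with adder logic and map the clocked assignment onto registers. The port list mirrors the adder component declared in the sample testbench, but the body is an assumption: the course's actual adder source is not shown in these slides.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Assumed implementation of the 4-bit registered adder used by the
-- sample testbenches; only the port list comes from the slides.
ENTITY adder IS
    PORT (clk  : IN  std_logic;
          a, b : IN  std_logic_vector(3 DOWNTO 0);
          sum  : OUT std_logic_vector(3 DOWNTO 0));
END ENTITY adder;

ARCHITECTURE rtl OF adder IS
BEGIN
    PROCESS (clk)
    BEGIN
        IF rising_edge(clk) THEN
            -- "+" is the arithmetic operator that synthesis replaces
            -- with logic elements; the carry out is discarded.
            sum <= std_logic_vector(unsigned(a) + unsigned(b));
        END IF;
    END PROCESS;
END ARCHITECTURE rtl;
```

Because the result is truncated to 4 bits, 8 + 12 yields 4, which is consistent with the expected value used for that stimulus pair in the testbench vectors.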

354 Synthesis Phase Tools
Synthesis tools are available as standalone tools or as part of a complete package.
Complete package synthesis. Examples: Xilinx ISE, Altera Quartus. Advantages: single tool (need to know only one tool); faster (eliminates time to switch between tools); cheaper; the manufacturer understands the device better than a third party, so data are more accurate. Disadvantages: manufacturer dependent.
Standalone package synthesis. Examples: LeonardoSpectrum by Mentor Graphics, Synplify Pro by Synopsys. Advantages: manufacturer independent. Disadvantages: separate tools for synthesis and implementation; more expensive than the complete package; not expert on the device.

355 Synthesis Setup
Synthesis setup consists of: device information (family, device number, package, and speed); the input design; user-defined constraint file(s).
Input design: Altera's Quartus accepts AHDL (Altera Hardware Description Language), VHDL, Verilog, schematic capture, and EDIF (a vendor-independent netlist file).
Outputs: netlist (the synthesized design); status reports (utilization, timing, ...); schematic view (RTL).

356 Netlists
The design netlist is what your design looks like after it has been synthesized (optimized and connected using internal FPGA logic). The functional simulation netlist lets you verify that the synthesis process did not alter the design; you should expect the same results using the same testbenches. Functional simulation is done using a simulator (ModelSim).

357 Status Reports
Optional: reports on resource utilization, timing information, critical paths, warnings and errors. Not used as input to other development phases. They provide very helpful information and allow you to identify real or potential problems, such as a design that is not meeting timing or other constraints.

358 Schematic View
Synthesis tools generate two schematic views: RTL and technology.
RTL: shows the pre-optimized design in terms of generic symbols, such as adders, multipliers, counters, AND gates, etc. The RTL view is manufacturer independent; it is not yet associated with a manufacturer. Technology: shows gates and elements as they will look in the device.

359 RTL Schematic View RTL: how the design looks as it is converted to logic elements

360 Technology Schematic View
Technology: shows the internal technology, such as lookup table connected to create the design.

361 Key points to remember Synthesis is required and must be performed prior to implementation Tools include complete package versus standalone. Functional simulation should be performed, time permitting RTL and technology views show what logic makes up the design.

362 Quartus II Full Compilation Flow
[Figure: Quartus II full compilation flow diagram. Blocks: Design Files; Analysis & Elaboration; Synthesis; Fitter; Assembler; TimeQuest Timing Analysis; EDA Netlist Writer. Inputs and outputs: Constraints & settings; Functional Netlist (Functional Simulation); Programming & Configuration files; Post-Fit Simulation Files (Gate-Level Simulation). Annotation: executed in parallel (multi-processor or multi-core systems only).]
This is the typical flow. Other module executables may be added if additional software features are enabled. Different tools have different definitions for what it means to "compile". In general terms it means to process the input. In Quartus II that means it will analyze the input files for errors, build your design entry database, synthesize logic, run the fitter and generate output in the form of programming files and timing information. Quartus II will also generate output simulation netlist files, if set up to do so. During synthesis and fitting of the design, any constraints or settings that have been applied to the project or revision will be used to guide the design compilation. Compilation = Synthesis + Place and route. EDA writer for 3rd party tools; import.

363 Netlist Viewers RTL Viewer
RTL Viewer: schematic of the design after Analysis and Elaboration. Visually check the initial HDL before synthesis optimizations. Locate synthesized nodes for assigning constraints. Debug verification issues.
Technology Map Viewers (Post-Mapping or Post-Fitting): graphically represent the results of mapping (post-synthesis) & fitting. Analyze critical timing paths graphically. Locate nodes & node names after optimizations.
Graphical tools that let you look at the results of compilation. RTL viewer looks at synthesized results. Technology viewer allows you to look at the post-map or post-fit netlist. Constraint assignment or debug.
New in 9.0: netlist viewers support the HardCopy IV device family; new Find dialog box; dockable window for hierarchy tree navigation in the netlist.

364 RTL Viewer
Tools menu -> Netlist Viewers, or Tasks window "Compile Design" tasks. Note: must perform elaboration first (e.g. Analysis & Elaboration OR Analysis & Synthesis).
Hierarchy list on the left, schematic view on the right. Find in hierarchy. New in 9.0: Find tab in the hierarchy list. Tooltips are displayed over a block of RTL.

365 Schematic View (RTL Viewer)
Place pointer over any element in schematic to see details Name Internal resource count Represents design using logic blocks & nets I/O pins Registers Muxes Gates (AND, OR, etc.) Operators (adders, multipliers, etc.) Elements are displayed as the synthesized elements.

366 Schematic Hierarchy Navigation
Descend hierarchy Double-click on instance Right-click & select Hierarchy Down Ascend hierarchy Double-click in white space Right-click & select Hierarchy Up Middle hierarchy Double-click on instance descends Double-click in white space ascends Use the Hierarchy Down command to go down into or expand an instance’s hierarchy and open a lower-level schematic that shows the internal logic of the instance. Use the Hierarchy Up command to go up in hierarchy or collapse a lower-level hierarchy and open the parent higher-level hierarchy. Select object by clicking it once, it turns red and then right click to display contents without descending. The green ring around the object can be right clicked to hide the contents.

367 Technology Map Viewers
Tools menu -> Netlist Viewers, or Tasks window "Compile Design" tasks. Now we are looking at post-fit results. Hierarchy list. Schematic view. Note: must run synthesis and/or fitting first.

368 Schematic View (Technology Viewer)
Represents design using atoms I/O pins & cells Lcells Memory blocks MAC (DSP blocks) Place pointer over any element in schematic to see details Name Internal resource count Logic equation Now we are looking at logic cells, memory blocks, the RTL fit into the tech elements of the FPGA. RTL synthesized to library components. Logical equations; timing delays.

369 Hierarchy List Traverse between levels of design hierarchy
View logic schematic for each hierarchical level Break down each hierarchical level into netlist elements or atoms Instances Primitives Pins Nets State machines Logic clouds (if enabled) When you click any of the instance, primitive, pin, or net names in the hierarchy list, the RTL viewer performs the following action: If not currently displayed, the hierarchy and page that contain the selected item are displayed in the schematic view. If needed, changes the focus of the current schematic page to include the selected item. Highlights the selected item in red in the schematic view. You can select multiple items by pressing the Shift key while selecting with your mouse.

370 Using Hierarchy List Expanding instances shows Instances Pins, Nets
Clicking an instance can then expand the view. Highlighting a netlist element in the hierarchy list highlights/views that element in the schematic view.

371 Timing Analysis - Quartus
TimeQuest GUI. Using the TimeQuest Timing Analyzer in the Quartus II flow. Note: For more details on verifying designs for timing, please attend the course "Quartus II Software Design Series: Timing Analysis". Online training course by Altera also available: TimeQuest Timing Analyzer

372 TimeQuest Timing Analyzer
Timing engine in Quartus II software Provides timing analysis solution for all levels of experience and design complexity Features Synopsys Design Constraints (SDC) support Easy-to-use interface Scripting emphasis The TQ timing analyzer is the main timing engine found in the QII software. It provides a powerful timing analysis solution for designers with any level of timing analysis experience and for designs of all levels of complexity. It’s easy to use, providing a graphical interface for beginners and those who prefer a GUI, while completely supporting a Tcl scripting-based environment. The TimeQuest timing analyzer provides fast, on-demand and interactive data reporting to save time and to make it easy to get detailed timing analysis only on the paths of interest. As already mentioned, the timing analyzer uses synopsys design constraints or SDC, a standard method for constraining timing in the ASIC world using Synopsys’ PrimeTime timing analysis tool. Altera has adopted the SDC standard for use with PLD designs. While Tcl and sdc are command line and text-based methods of interacting with the TimeQuest timing analyzer, this training will focus on the TimeQuest GUI and show the equivalent Tcl or SDC commands.

373 Opening the TimeQuest Interface
Toolbar button Tools menu Tasks window Stand-alone mode (run w/o opening the Quartus II software) quartus_staw Multiple ways of opening TimeQuest.

374 Quartus Settings File (QSF)
SDC constraints are not stored in QSF For 90 nm and older devices, TimeQuest TA provides a script to convert QSF timing assignments to SDC One important thing to note about sdc constraints is that they are not stored in the project’s qsf file. They are only stored in the SDC file that we’ve been talking about so far. It is possible, though, to take timing constraints that were in the qsf file and convert them into sdc constraints to create a new sdc file. This is done if you have a project that was using the classic timing analyzer and you want to start using tq. When you first start tq with such a project or any project that does not include an sdc file, you’ll see the dialog box shown here. If you click yes, any timing constraints contained in the qsf file will be converted to sdc and stored in a new sdc file. If you click no, you start from scratch. You’ll need to create a new sdc file with new sdc constraints for the project. For the lab exercises and in general, you’ll want to click no unless you have a lot of timing constraints set up in the qsf for use by the classic timing analyzer. If you are moving a project from using classic to using tq, I recommend checking out the switching to tq timing analyzer chapter in the qii handbook as well as the online training linked here. These resources explain the differences between the classic timing analyzer and tq and provide more information about how timing constraints stored in the qsf file get converted and placed in an sdc file.

375 TimeQuest GUI
[Figure: TimeQuest GUI with callouts: menu access to all TimeQuest features; Report pane; View pane; Tasks pane; Console pane]
So as we saw in the demonstration, here is the TQ GUI for your reference. We'll go over each of these parts of the window in more detail and return to each of them throughout the training.

376 SDC File Editor (1) Use Quartus II editor to create and/or edit SDC
TimeQuest File menu -> New/Open SDC File. Quartus II File menu -> New -> Other Files. Features: access to GUI dialog boxes for constraint entry; syntax coloring; tooltip syntax help (command tooltip). The text editor adapts itself to whatever the source file is. When bringing up an SDC file, it has nice features for inserting constraints.

377 SDC File Editor (2)
Construct an SDC file using the TimeQuest graphical constraint creation tools. Constraints are inserted at the cursor location. As mentioned, you can use the TimeQuest graphical constraint creation tools to easily create SDC constraints. When you select a type of constraint to create from the Insert Constraints submenu from the SDC file editor's Edit menu as shown here, you are presented with a dialog box that includes text fields and options for creating the constraint. As you set or change the constraint options, the SDC command field updates on the fly to display what the final constraint will look like. When you click the Insert button, the constraint is entered into the SDC file at the cursor location. Since the constraint is entered exactly at the cursor location, remember to place your cursor correctly before accessing one of these dialog boxes for constraint entry. The GUI does not add carriage returns at the end of newly created constraints, so you need to add these and place the cursor correctly to avoid syntax errors in your constraints.

378 Using TimeQuest TA in Quartus II Flow
The flow for using TimeQuest with Quartus.

379 Steps to Using TimeQuest Tool
Four steps in TimeQuest:
1. Generate timing netlist
2. Enter SDC constraints by creating or reading in an SDC file
3. Update timing netlist
4. Generate timing reports

380 1. Generate Timing Netlist
Create a timing netlist based on compilation results Post-synthesis (mapping) or post-fit (fully compiled) Delay model (slow or fast) Netlist menu gives complete control Tasks pane uses default (post-fit, slow) Netlist menu Tasks pane Step 1 – post fit is the default so if you double click on the task pane, that’s what you get. Timing netlist is a database for TQ. Tcl equivalent of command

381 2. Create or Read in SDC File
Create SDC file using SDC file editor - Don’t enter constraints using Constraints menu Read in constraints & exceptions from existing SDC file Execution - Read SDC File (Tasks pane or Constraints menu) File precedence (if no filename specified) Files specifically added to Quartus II project <current_revision>.sdc (if it exists in project directory) Step 2 looks for a top level entity revision .sdc file. Constraints entered through the menu don’t get saved. Could add constraint file to project before bringing up TimeQuest. Tcl: read_sdc [<filename>]

382 Constraining User MUST enter constraints for all paths to fully analyze design Timing analyzer only performs slack analysis on constrained design paths Constraints guide the fitter to place & route design in order to meet timing requirements Not as difficult a task as it may sound Wildcards Single, generalized constraints cover many paths, even all paths in an entire clock domain See Altera TimeQuest Timing Analyzer online training for information about basic SDC constraints You’re an engineer, you want to completely analyze your design. It’s not that hard using wildcards.

383 3. Update Timing Netlist Apply SDC constraints/exceptions to current timing netlist Generates warnings Undefined clocks Partially defined I/O delays Combinational loops Update timing netlist after adding any new constraint Execution Update Timing Netlist (Tasks pane or Netlist menu) If you were using defaults you could just double click on update task and steps 1 and 2 would also be done. Tcl: update_timing_netlist

384 4. Generate Timing Reports
Verify timing requirements and locate violations Check for fully constrained design or ignored timing constraints Two methods Tasks pane - Shortcut: Automatically creates/updates netlist & reads default SDC file if needed Reports menu - Must have valid netlist to access Once the netlist is updated with your timing constraints, you’ll finally be able to generate timing reports. Timing reports help you verify your timing requirements and locate violations. Use diagnostic reports to check your design to make sure it is fully constrained or get a report of what SDC constraints and exceptions have been applied to the timing netlist. You can generate many different types of detailed reports by using the report_timing command or by choosing a report from the reports menu or the reports section of the tasks pane. As a shortcut, if you’ve previously created a timing netlist and selected the SDC file to read in back in step 2, you can skip steps 1 through 3 and immediately start generating timing reports. Just double-click a report-to-create in the Tasks pane and the netlist will get created, the SDC file will be read in, and the netlist will get updated with the constraints from that file before immediately creating the selected report. Double-click individual report (shortcut to skip steps 1-3)

385 Reset Design Command Located in Tasks pane or Constraints menu
Flushes all timing constraints from the current timing netlist. Functional Tcl equivalent: delete_timing_netlist command followed by create_timing_netlist. Uses: "re-starting" timing analysis on the same timing netlist while applying different constraints or a different SDC file; starting the analysis over if results seem to be unexpected.

386 Using TimeQuest TA in Quartus II Flow
Synthesize Quartus II project Use TimeQuest TA to specify timing requirements Enable TimeQuest TA in Quartus II project Perform full compilation (run Fitter) After synthesis and setting up TimeQuest with constraints, go back into Quartus and enable synthesis/fitting to use the constraints. Verify timing in TimeQuest TA

387 Enable TimeQuest TA in Quartus II Software
Tells the Quartus II software to use SDC constraints during fitting File order precedence Any SDC files manually added to Quartus II project (in order) <current_revision>.SDC located in project directory

388 Enabling in the Quartus II Software
In the Assignments -> Settings menu, look under Timing Analysis. Notes: Arria GX and newer devices only support TimeQuest TA. TimeQuest TA is enabled by default for new Stratix III and Cyclone III designs.

389 Adding SDC File to Quartus II Project
Add SDC files to the TimeQuest Timing Analyzer. Click Add to add an SDC file to the list. You can add a constraint file here. If it has the same name as the top-level hierarchy, it is added automatically. Multicorner timing analysis checks all process corners (on by default for Cyclone II, Stratix II, & newer devices): analyze fast and slow corners during compile.

390 Using TimeQuest TA in Quartus II Flow
Synthesize Quartus II project Use TimeQuest TA to specify timing requirements Enable TimeQuest TA in Quartus II project Perform full compilation (run Fitter) And now fit with constraints. Verify timing in TimeQuest TA

391 Verifying Timing Requirements
View TimeQuest summary information directly in Quartus II Compilation Report Open TimeQuest TA for more thorough analysis Follow TimeQuest flow using Post-fit netlist Run TimeQuest easy-to-use reporting capabilities (Tasks pane) Place Tcl reporting commands into script file - Easy repetition Verify whether Fitter was able to meet timing requirements You can view timing summaries directly in the compilation report. Red reports indicate timing constraint failures.

392 3rd-Party Timing Analysis Tool Support
Synopsys PrimeTime Mentor Graphics TAU

393 Design Constraints: An Example
shows an example circuit including two clocks, a PLL, and other common synchronous design elements

394 SDC - Example
# Create clock constraints
create_clock -name clockone -period <period> [get_ports {clk1}]
create_clock -name clocktwo -period <period> [get_ports {clk2}]
# Specify that clockone and clocktwo are unrelated by assigning
# them to separate exclusive groups
set_clock_groups -exclusive -group [get_clocks {clockone}] -group [get_clocks {clocktwo}]
# set input and output delays
set_input_delay -clock { clockone } -max 4 [get_ports {data1}]
set_input_delay -clock { clockone } -min -1 [get_ports {data1}]
set_input_delay -clock { clockone } -max 4 [get_ports {data2}]
set_input_delay -clock { clockone } -min -1 [get_ports {data2}]

395 SDC Example
The SDC file shown contains the following basic constraints you should include for most designs:
Definitions of clockone and clocktwo as base clocks, and assignment of those settings to nodes in the design. create_clock command:
create_clock -period 10 -name clk_sys [get_ports clk_sys]
Specification of two mutually exclusive clock groups, one containing clockone and the other containing clocktwo. This overrides the default analysis of all clocks in the design as related to each other.
set_clock_groups -exclusive -group [get_clocks {clockone}] -group [get_clocks {clocktwo}]
Specification of input delays for the design, to specify the external input delay requirement with reference to a clock.
set_input_delay -clock { clockone } -max 4 [get_ports {data1}]

396 Summary TimeQuest timing analyzer provides an easy-to-use tool to verify timing Entering timing constraints Run various timing reports

397 Implementation
Also referred to as Place and Route (PAR); the hardest job.
Input: synthesized netlist. Output: bit stream or programming file, with an optional gate-level simulation netlist. Maps the synthesized netlist to the specific or target FPGA's resources and interconnects them to the FPGA's internal logic and I/O resources. The physical layout is determined. Takes four steps to convert the mid-level netlist to a final programming file: translate, map, place and route, and generate programming file.

398 Translate Translation process takes the input netlist and merges it with the design constraints (if provided) to create a native generic database (NGD) output file. The synthesized netlist is automatically fed into the translation process. If error detected, the tool stops. Once completed, NGD output netlist is automatically fed into the mapping process.

399 Map Mapping takes the NGD netlist, the logical design, and maps it to the target FPGA. First, a logical DRC (design rule check) is performed on the NGD list. The logic is mapped to the target FPGA’s logic cells, I/O cells, and other internal resources. The output is a native circuit description (NCD) file. NCD: the physical representation of the design and mapped to the target FPGA’s internal resources and components. NCD feeds into place-and-route stage.

400 Place and Route Takes the NCD file and interconnects the design (places and routes it). The output is an NCD file, which is used to create the programming bit stream. An optional gate-level simulation provides actual gate delays based on placement and routing. If the functional simulation succeeded but the gate-level simulation did not, narrow down where the problem first occurs.

401 Generate Program File The final step is to generate the programming file. The input is the NCD output file from the place-and-route step, and the output is the FPGA’s programming file. This programming file resides on a nonvolatile device such as a PROM, or within the FPGA device. The bit stream is automatically downloaded to the FPGA at power-up; this process is called configuration. The implementation tool provides various options: the bit stream can be compressed or uncompressed, and security options are available to prevent unauthorized downloading of the bit stream. Once the bit stream is ready, the next step is to program the FPGA.

402 Implementation Tools Implementation tools are offered by the FPGA’s manufacturer and generally not by a third-party company. The tools use proprietary algorithms to process the synthesized netlist and produce the final programming file. Setup is easy: with a complete development tool package, the synthesized netlist is automatically fed into the implementation process. For a netlist from a third-party synthesis tool, the tools must be directed to the synthesized netlist. Putting the design into the FPGA and interconnecting it can be the most challenging and time-consuming part of the development process. Minimum input: the synthesized netlist, with an optional user-defined constraints file.

403 Implementation Tools – cont’d User Constraints
User-defined constraint files contain information such as timing, pin assignments, and internal placement of logic. Constraints make the tool work harder, so consider all the factors when deciding when and what to constrain. Try to keep device utilization below a reasonable percentage, and leave room for potential growth and spare pins. Pin assignment is the most used constraint because it affects board routing. Either the tool or you should assign pins. One option is to let the tools make the initial pin assignment, review the list, and make changes as necessary.

404 Implementation Phase Tips
Remember to lock pin assignments; otherwise they are subject to change. Create constraints only when necessary. The implementation processes can run back to back, without user intervention, if no errors are encountered. Consult the data sheet, user’s guide, or other manufacturer’s materials to find acceptable configuration options for your FPGA.

405 Programming Programming is the final development phase and the introduction of hardware. Programming involves transferring the bit stream into a nonvolatile or volatile memory device and configuring or programming the FPGA. Data transfer can be serial or parallel. Configuration can involve one FPGA or a series of daisy-chained or connected FPGAs. Nonvolatile devices are located on the same board as the targeted FPGA or even on another board. The FPGA may operate in either master (controlling configuration) or slave (not controlling configuration) mode.

406 Tools and Hardware If a microprocessor holds the bit stream, the bit stream is merged with the software build, and the processor configures the FPGA on power-up. For nonvolatile memory, programming options include: JTAG (Joint Test Action Group), in-system programming (ISP), and a third-party programmer.

407 JTAG - Joint Test Action Group
IEEE 1149.1, Standard Test Access Port and Boundary-Scan Architecture. Access pins on a JTAG-compatible device provide visibility inside the device. JTAG is a testing and debugging mechanism used to detect manufacturing faults on populated boards. The tools include JTAG software and a software host, and the hardware is a JTAG cable. The JTAG software is the interface used to transfer the bit stream from the host to the programmable device.

408 JTAG - Joint Test Action Group – cont’d
Over time, it was realized that JTAG ports could also be used for programming. The pins include: TDI (Test Data In), TDO (Test Data Out), TCK (Test Clock), TMS (Test Mode Select), and the optional TRST (Test Reset). JTAG programming involves transferring the bit stream from the host through the JTAG cable to a header, test pins, or a connector on the board that connects to the JTAG-compatible nonvolatile memory devices. FPGA manufacturers generally offer JTAG programming tools, cables, and any necessary supplies.

409 In-System Programming
The device can be programmed while the system is still operating. The datasheet specifies whether the device supports ISP. Tools needed: ISP software on the host and a download cable. Programming can be done by connecting test pins to ATE or to a board connector. Supported protocols are the IEEE standard for boundary-scan-based in-system configuration (IEEE 1532), JTAG, and serial peripheral interface (SPI). ISP is a better option.

410 Third Party Programming
Available from third-party manufacturers. These programmers include a GUI, a programming base that connects to a computer, and either socket adaptors or an all-in-one programmer. A socket adaptor is where the programmable device is placed to be programmed; each is designed to hold a specific package type. Example: Data I/O. Manual programming.

411 Hardware Configuration
Configuration mode table (columns: Mode, M2, M1, M0; rows: Master Serial, Slave Serial, Master Parallel, Slave Parallel, JTAG). The FPGA can be a master or a slave. The configuration pins are set to specific values to indicate whether it is a master or a slave. Always make the programming pins accessible via test points or a connector.

412 Board Design Tips Tip 1: When daisy-chaining devices, add the ability to jumper out or remove any of the devices if necessary. Tip 2: Design with a troubleshooting mindset; test points, pads, and connectors are valuable. Consider using test connectors that mate with the lab equipment hardware. Tip 3: Select the FPGA package based on the ability to upgrade to a larger device in the same package. Make sure the two devices are pin-to-pin compatible. Goal: upgrade to a larger device without re-spinning the board. Tip 4: For unused pins, consult the datasheet for the appropriate level (terminate unused pins).

413 DE2 board The DE2 board contains a serial EEPROM chip that stores configuration data for the Cyclone II FPGA. This configuration data is automatically loaded from the EEPROM chip into the FPGA each time power is applied to the board. Using the Quartus II software, it is possible to reprogram the FPGA at any time, and it is also possible to change the non-volatile data that is stored in the serial EEPROM chip. JTAG programming: In this method of programming, named after the IEEE standards Joint Test Action Group, the configuration bit stream is downloaded directly into the Cyclone II FPGA. The FPGA retains this configuration as long as power is applied to the board; the configuration is lost when the power is turned off. AS programming: In this method, called Active Serial programming, the configuration bit stream is downloaded into the Altera EPCS16 serial EEPROM chip. It provides non-volatile storage of the bit stream, so that the information is retained even when the power supply to the DE2 board is turned off. When the board's power is turned on, the configuration data in the EPCS16 device is automatically loaded into the Cyclone II FPGA.

414 Final Exam Scope – Wednesday Dec 19 @ 12:45 pm
Simulation versus Synthesis Latches versus registers Common pitfalls Unwanted latches Case statement Variable versus signals Synthesizable subprograms Gated clocks Inferring Logic Functions Control Signal Priority Tri-state Memory State Machine Coding VHDL Logic Optimization & Performance Balancing operators Resource Sharing Logic Duplication Pipelining PLD PROM PLA PAL CPLD Programming PLD ASIC FPGA Architecture Quartus Development software FPGA Programming Technology SRAM versus Antifuse FPGA EEPROM/Flash FPGA Xilinx FPGA Architecture FPGA basic building blocks FPGA Embedded Blocks FPGA Clocking Mechanism FPGA Family Altera Megafunctions FPGA Design flow Design phase Advanced VHDL Topics

415 Final Exam Scope – Wednesday Dec 19 @ 12:45 pm
Synthesis Operation Technology Mapping Synthesis Tools Netlists Status Reports Schematic View (RTL and Technology View) Timing Analysis using TimeQuest Implementation Implementation Processes Tools Programming Tools and hardware Parameterized Code Constructs Pre-Defined Attributes Generics For Generate If generate RTL Simulation Functional Simulation Gate Level simulation Testbenches Classes of Testbenches Advantages and Disadvantages Test Vector Generation Self Verifying Testbenches Useful VHDL constructs for Testbenches Synthesis Synthesis Operation Syntax Check and resource association Optimization

416 Example - 1 What is DRC, and where does it occur in the implementation phase?
State the four processes of the implementation phase. Explain the difference between functional simulation and gate-level simulation.

417 Example - 1 What is DRC, and where does it occur in the implementation phase?
DRC stands for Design Rule Check; it is performed on the NGD netlist during mapping. State the four processes of the implementation phase. Translate, Map, Place and Route, and Generate Program File.

418 Example – 1 Functional vs. Gate-Level
Functional simulation: Performed on the netlist or code generated by the synthesis tool. It is sometimes necessary to direct the synthesis tool to provide the netlist. Serves as an initial timing analysis. Answers the question: will the synthesized design fit or work on the target hardware?
Gate-level simulation: Performed on the netlist generated by the implementation tool. Contains actual timing information. Answers the question: will the design work as expected over all operational conditions? Detects design timing problems. It is representative of the hardware and is the most realistic simulation.

419 Example - 2 Given the following entity declaration of a register:
ENTITY reg IS
  GENERIC (width : positive);
  PORT (d : IN STD_LOGIC_VECTOR (0 to width - 1);
        q : OUT STD_LOGIC_VECTOR (0 to width - 1);
        clk, reset : IN STD_LOGIC);
END ENTITY reg;
Write a component instantiation that instantiates the reg entity to implement a 4-bit control register. The register data input connects to the rightmost four bits of data_out, the clk input to io_write, the reset input to io_reset, and the data output to the control signals io_en, io_int_en, io_dir, and io_mode.

420 Example - 2 Write a component instantiation that instantiates the reg entity to implement a 4-bit control register. The register data input connects to the rightmost four bits of data_out, the clk input to io_write, the reset input to io_reset, and the data output to the control signals io_en, io_int_en, io_dir, and io_mode.
Io_control_reg : reg
  GENERIC MAP (width => 4)
  PORT MAP (d => data_out (3 downto 0),
            q(0) => io_en, q(1) => io_int_en, q(2) => io_dir, q(3) => io_mode,
            clk => io_write, reset => io_reset);

421 Example - 3 Draw a diagram illustrating the circuit described by the following generate statement:
Synch_delay_line : for stage in 1 to 4 generate
  Delay_ff : component d_ff
    port map (clk => sys_clock, d => delayed_data (stage - 1), q => delayed_data (stage));
End generate synch_delay_line;

422 Example - 3 Draw a diagram illustrating the circuit described by the following generate statement:
Synch_delay_line : for stage in 1 to 4 generate
  Delay_ff : component d_ff
    port map (clk => sys_clock, d => delayed_data (stage - 1), q => delayed_data (stage));
End generate synch_delay_line;
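
The diagram for this generate statement is a chain of four D flip-flops that share sys_clock, with each q output feeding the next d input. One way to check the drawing is to unroll the loop by hand. The sketch below does that; the d_ff architecture, the delay_line wrapper entity, and its data_in and data_out ports are assumptions added so the example elaborates on its own, since the slides show only the port names of d_ff.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed D flip-flop; the slides only give its port names (clk, d, q).
entity d_ff is
  port (clk : in  std_logic;
        d   : in  std_logic;
        q   : out std_logic);
end entity d_ff;

architecture rtl of d_ff is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity (not in the slides).
entity delay_line is
  port (sys_clock : in  std_logic;
        data_in   : in  std_logic;
        data_out  : out std_logic);
end entity delay_line;

architecture unrolled of delay_line is
  component d_ff is
    port (clk : in std_logic; d : in std_logic; q : out std_logic);
  end component d_ff;

  -- Element 0 is the chain input; elements 1 to 4 are the stage outputs.
  signal delayed_data : std_logic_vector(0 to 4);
begin
  delayed_data(0) <= data_in;

  -- Unrolled equivalent of "for stage in 1 to 4 generate".
  delay_ff_1 : component d_ff
    port map (clk => sys_clock, d => delayed_data(0), q => delayed_data(1));
  delay_ff_2 : component d_ff
    port map (clk => sys_clock, d => delayed_data(1), q => delayed_data(2));
  delay_ff_3 : component d_ff
    port map (clk => sys_clock, d => delayed_data(2), q => delayed_data(3));
  delay_ff_4 : component d_ff
    port map (clk => sys_clock, d => delayed_data(3), q => delayed_data(4));

  data_out <= delayed_data(4);
end architecture unrolled;
```

With this structure, a value presented on delayed_data(0) appears on delayed_data(4) four rising clock edges later.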

423 Example - 4 Write a conditional generate statement that connects a signal external_clock directly to a signal internal_clock if a Boolean generic constant positive_clock is true. If the generic is false, the statement should connect external_clock to internal_clock via an instance of an inverter component.

424 Example - 4 Write a conditional generate statement that connects a signal external_clock directly to a signal internal_clock if a Boolean generic constant positive_clock is true. If the generic is false, the statement should connect external_clock to internal_clock via an instance of an inverter component.
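
One possible answer is sketched below using two complementary if-generate statements, which is the VHDL-93 style (VHDL-2008 also allows an else branch inside a single if-generate). The inverter entity, its port names a and y, and the clock_select wrapper are assumptions made so the example is self-contained; the slides name only the signals and the generic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed inverter component; port names a and y are hypothetical.
entity inverter is
  port (a : in  std_logic;
        y : out std_logic);
end entity inverter;

architecture rtl of inverter is
begin
  y <= not a;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper that carries the generic and the two clock signals.
entity clock_select is
  generic (positive_clock : boolean := true);
  port (external_clock : in  std_logic;
        internal_clock : out std_logic);
end entity clock_select;

architecture structural of clock_select is
  component inverter is
    port (a : in std_logic; y : out std_logic);
  end component inverter;
begin
  -- Generic is true: connect the clock straight through.
  direct_clock : if positive_clock generate
    internal_clock <= external_clock;
  end generate direct_clock;

  -- Generic is false: route the clock through an inverter instance.
  inverting_clock : if not positive_clock generate
    clock_inverter : component inverter
      port map (a => external_clock, y => internal_clock);
  end generate inverting_clock;
end architecture structural;
```

Because the two conditions are complementary, exactly one of the generate bodies is elaborated, so internal_clock always has a single driver.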

425 Logic Duplication Intentional duplication of logic to improve performance Synthesis tools can perform automatically User sets maximum fan-out of a node

426 Fan-out Problems High fan-out increases placement difficulty
High fan-out node cannot be placed close to all destinations Ex: Fan-out of 1 & 15

427 Controlling Fan-out Logic fan-out reduced by replication
Path now contains fan-out of 3 & 5

428 Logic Duplication Example
High fan-out node duplicated & placed to reduce delay

429 Automatic Fan-out Control
Most synthesis tools feature options which limit fan-out Advantage: Easy experimentation Disadvantage: Less control over results Knowing which nodes have high fan-out & their destination helps floor-planning

430 Quartus II Software Fan-out Control
Select Signal Details

431 Shift Register Example
PROCESS (clk) BEGIN
  IF rising_edge(clk) THEN
    IF sclr_cell = '1' THEN regc <= (others => '0');
    ELSE regc <= regc(62 downto 0) & regb(63);
    END IF;
    IF sclr_cell = '1' THEN regb <= (others => '0');
    ELSE regb <= regb(62 downto 0) & rega(63);
    END IF;
    IF sclr_cell = '1' THEN rega <= (others => '0');
    ELSE rega <= rega(62 downto 0) & d;
    END IF;
  END IF;
END PROCESS;
q_out <= regc(63);
sclr_cell fans out to each DFF within three 64-bit shift registers. The shift registers are cascaded to produce one 192-bit shift register. sclr_cell provides a synchronous clear function.

432 Fan-out to 192 Registers

433 Shift Reg with Reduced Fan-out
PROCESS (clk) BEGIN
  IF rising_edge(clk) THEN
    IF sclr_cell(2) = '1' THEN regc <= (others => '0');
    ELSE regc <= regc(62 downto 0) & regb(63);
    END IF;
    IF sclr_cell(1) = '1' THEN regb <= (others => '0');
    ELSE regb <= regb(62 downto 0) & rega(63);
    END IF;
    IF sclr_cell(0) = '1' THEN rega <= (others => '0');
    ELSE rega <= rega(62 downto 0) & d;
    END IF;
  END IF;
END PROCESS;
q_out <= regc(63);
sclr_cell is replicated so that it appears three times, and each copy drives only 64 registers. The fan-out of the previous cell has gone from 1 to 3, but this is insignificant.
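
The slide does not show how the three copies of sclr_cell are produced. A minimal sketch, assuming the clear request comes from a single upstream signal (called sclr here, a hypothetical name), is to register that signal into a 3-bit vector so that each bit is its own flip-flop. The entity name sclr_replicate is also an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: one clear request in, three registered copies out.
entity sclr_replicate is
  port (clk       : in  std_logic;
        sclr      : in  std_logic;  -- assumed single upstream clear request
        sclr_cell : out std_logic_vector(2 downto 0));
end entity sclr_replicate;

architecture rtl of sclr_replicate is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Each bit infers a separate flip-flop; the source now drives 3 loads
      -- and each copy can drive one 64-bit shift register.
      sclr_cell <= (others => sclr);
    end if;
  end process;
end architecture rtl;
```

Two caveats apply to this sketch. The copies are logically identical, so a synthesis tool may merge them back into one register unless it is told to keep them; the setting that prevents this is tool-specific. Registering the clear also delays it by one clock cycle relative to an unregistered sclr, which the surrounding design has to tolerate.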

434 Fan-out to 64 Registers

436 Topics – Exam II Synthesis Synthesis Operation
State Machine Coding VHDL Logic Optimization & Performance Balancing operators Resource Sharing Pipelining Parameterized Code Constructs Pre-Defined Attributes Generics For Generate If generate Simulation RTL Simulation Functional Simulation Gate Level simulation Testbenches Classes of Testbenches Advantages and Disadvantages Test Vector Generation Self Verifying Testbenches Useful VHDL constructs for Testbenches Synthesis Synthesis Operation Syntax Check and resource association Optimization Technology Mapping Synthesis Tools Netlists Status Reports Schematic View (RTL and Technology View)

