EET 3143 Programmable Logic Devices Michigan Technological University Electrical Engineering Technology Instructor: Dr. Nasser Alaraje.


1 EET 3143 Programmable Logic Devices Michigan Technological University Electrical Engineering Technology Instructor: Dr. Nasser Alaraje

2 Slide - 2 Contact Information Name: Abdulnasser (Nasser) Alaraje Office: 417 EERC Building Phone (O): Office Hours: MWF 10:00 am – 12:00 pm (or by appointment)

3 Slide - 3 Practical Course
Course Objectives: Upon successful completion of this course, students should:
– Learn how to use HDL to model the basic building blocks of digital systems
– Learn FPGA technology and the impact of using FPGAs in logic design
– Learn the FPGA design flow using Altera’s Quartus® II development software
– Gain FPGA design experience by synthesizing, mapping, and placing and routing a given design on Altera’s DE2 FPGA evaluation board
– Work in groups of two or three and thereby learn how to cooperate in teams
– Gain a basic understanding of timing analysis
– Learn how to build SDC files for constraining FPGA designs
– Learn how to verify timing on a simple design using the TimeQuest analyzer

4 Slide - 4 Why FPGA? Respond to the market need for skilled FPGA engineers. FPGA-based reprogrammable logic design became more attractive as a design medium during the last decade, yet only 19.5% of 4-year and 16.5% of 2-year electrical and computer engineering technology programs at US academic institutions currently have a curriculum component in hardware description languages and programmable logic design. The curriculum has not yet “caught up” to industry needs; industry needs should drive curriculum development.

5 Slide - What projects are FPGAs good for?
– Aerospace & Defense: radiation-tolerant FPGAs along with intellectual property for image processing, waveform generation, and partial reconfiguration for SDRs.
– Automotive: automotive silicon and IP solutions for gateway and driver assistance systems, comfort, convenience, and in-vehicle infotainment.
– Broadcast: solutions enabling a vast array of broadcast chain tasks as video and audio find their way from the studio to production and transmission and then to the consumer.
– Consumer: cost-effective solutions enabling next-generation, full-featured consumer applications, such as converged handsets, digital flat panel displays, information appliances, home networking, and residential set-top boxes.
– Industrial/Scientific/Medical: industry-compliant solutions addressing market-specific needs and challenges in industrial automation, motor control, and high-end medical imaging.
– Storage & Server: data processing solutions for Network Attached Storage (NAS), Storage Area Network (SAN), servers, storage appliances, and more.
– Wireless Communications: RF, baseband, connectivity, transport, and networking solutions for wireless equipment, addressing standards such as WCDMA, HSDPA, WiMAX, and others.
– Wired Communications: end-to-end solutions for the reprogrammable networking linecard: packet processing, framer/MAC, serial backplanes, and more.

6 Slide - Who uses them?

7 Slide - Why are they important? FPGAs have the ability to revolutionize the way prototyping is done. They allow companies to get to market sooner and stay in the market longer.

8 Slide - Xilinx: the largest FPGA manufacturer. Develops hardware and software; offers embedded PowerPC processors; runs a university program.

9 Slide - Altera: the second-largest FPGA manufacturer. Develops hardware and software; runs a university program.

10 Slide - Which is best? It depends on: –Time –Existing resources –Money –Level of effort –Preference

11 Slide - Hardware/Software? Software: Altera Quartus II. Hardware: Altera DE2 FPGA board.

12 Slide - Welcome to the Quartus II Software! (This welcome screen can be turned on or off in Tools → Options.)

13 Slide - Altera DE2 Development Board

14 Slide - 14 Entity
Describes all inputs and outputs. Every VHDL design must have at least one entity. It requires identifiers for naming the entity itself as well as its inputs and outputs. entity is a keyword reserved in VHDL for this purpose.
entity entity_name is
  port (signal_identifier : mode type);
end entity entity_name;
Example:
ENTITY Or2 IS
  PORT (x: IN std_logic; y: IN std_logic; F: OUT std_logic);
END Or2;

15 Slide - 15 Architecture
The architecture declaration is where the operation of the logic function is specified. For each entity there must be a corresponding architecture, and each architecture must be associated by name with an entity.
architecture architecture_name of entity_name is
begin
  -- the description of the logic function goes here
end architecture architecture_name;
Example:
ARCHITECTURE Or2_beh OF Or2 IS
BEGIN
  PROCESS(x, y)
  BEGIN
    F <= x OR y;
  END PROCESS;
END Or2_beh;
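Putting the entity and architecture slides together, a complete, compilable version of the Or2 example might look like the sketch below. The library and use clauses are an addition (std_logic is declared in the IEEE std_logic_1164 package), and the single concurrent assignment is offered as an alternative to the process on the slide; both describe the same OR gate.

```vhdl
-- std_logic is declared in the IEEE std_logic_1164 package, so make it visible first.
library ieee;
use ieee.std_logic_1164.all;

-- Interface: two inputs, one output.
entity Or2 is
  port (
    x : in  std_logic;
    y : in  std_logic;
    F : out std_logic
  );
end entity Or2;

-- Behavior: one concurrent signal assignment, equivalent to the
-- process(x, y) version shown on the slide.
architecture Or2_beh of Or2 is
begin
  F <= x or y;
end architecture Or2_beh;
```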

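16 Slide - 16 VHDL Processes
The statements inside a process are executed in sequence. The sensitivity list is the list of signals to which the process is sensitive (the process runs whenever one of them changes). It is optional, but a process without a sensitivity list must contain wait statements.
label: process (sensitivity_list)
  declarations
begin
  sequential statements
end process label;
Example:
PROCESS(x, y)
BEGIN
  F <= x OR y;
END PROCESS;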

17 Slide - 17 VHDL Components
Predefined logic that can be placed in a VHDL library and used repeatedly. Any logic function can become a component and be used in larger designs.
component name_of_component is
  port (port definition);
end component name_of_component;
Example:
COMPONENT And2 IS
  PORT (x: IN std_logic; y: IN std_logic; F: OUT std_logic);
END COMPONENT;
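A component declaration only describes an interface; to use the component, an architecture instantiates it with a label and a port map. The sketch below assumes an And2 entity whose ports match the slide's declaration; the top-level entity And3, its ports, and the internal signal ab are hypothetical names chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- The design unit that the And2 component refers to.
entity And2 is
  port (x : in std_logic; y : in std_logic; F : out std_logic);
end entity And2;

architecture beh of And2 is
begin
  F <= x and y;
end architecture beh;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top level: a 3-input AND built from two And2 components.
entity And3 is
  port (a, b, c : in std_logic; z : out std_logic);
end entity And3;

architecture structural of And3 is
  -- Component declaration, as on the slide.
  component And2 is
    port (x : in std_logic; y : in std_logic; F : out std_logic);
  end component And2;

  signal ab : std_logic;  -- internal wire between the two gates
begin
  -- Each instance has a label and a port map (formal => actual).
  U1 : And2 port map (x => a,  y => b, F => ab);
  U2 : And2 port map (x => ab, y => c, F => z);
end architecture structural;
```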

18 Slide - 18 Conditional Statements: if-then, if-then-else, elsif, case

19 Slide - 19 If statement
Causes a decision to be made. When the condition is true, the statements following then are executed. When the condition is false, the statements up to end if are skipped.
if condition then
  VHDL statements
end if;

20 Slide - 20 If-Then-Else statement
else provides an alternative path for the if statement.
if condition then
  VHDL statements
else
  VHDL statements
end if;

21 Slide - 21 Elsif statement
Used to allow multiple alternative paths.
if condition then
  VHDL statements
elsif condition then
  VHDL statements
elsif condition then
  VHDL statements
end if;

22 Slide - 22 Case statement example
case expression is
  when choice => VHDL statements;
  when choice => VHDL statements;
  when others => VHDL statements;
end case;
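To show if-then-else and case working together inside a process, here is a sketch of a 4-to-1 multiplexer with an enable input. The entity name mux4 and its ports are hypothetical. Note the when others branch: sel is a std_logic_vector, whose elements can also take values such as 'X' or 'Z', so a case statement over it needs a catch-all choice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-to-1 multiplexer with an active-high enable.
entity mux4 is
  port (
    d   : in  std_logic_vector(3 downto 0);
    sel : in  std_logic_vector(1 downto 0);
    en  : in  std_logic;
    y   : out std_logic
  );
end entity mux4;

architecture beh of mux4 is
begin
  process (d, sel, en)
  begin
    if en = '0' then
      y <= '0';                    -- disabled: force the output low
    else
      case sel is
        when "00"   => y <= d(0);
        when "01"   => y <= d(1);
        when "10"   => y <= d(2);
        when others => y <= d(3);  -- "11" plus any non-'0'/'1' combinations
      end case;
    end if;
  end process;
end architecture beh;
```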

23 Slide - 23 Processes in VHDL
– Processes describe sequential behavior.
– Processes in VHDL are very powerful statements: they allow you to define arbitrary behavior that may be difficult to represent with a real circuit.
– Not every process can be synthesized.
– Use processes with caution in code that is to be synthesized.
– Use processes freely in testbenches.

24 Slide - 24 Logic Operators
Logic operators: and, or, nand, nor, xor, xnor (xnor only in VHDL-93 and later), not
Precedence: not (highest); and, or, nand, nor, xor, xnor (lowest, all equal)

25 Slide - Logic Operators - example
Order of evaluation: suppose we need to describe XOR using and, or, and not:
C = a and not b or not a and b
Because and and or have equal precedence, a left-to-right reading gives:
C = ((a and (not b)) or (not a)) and b
C = (ab' + a')b, which is not XOR (not correct). In fact, VHDL does not accept a sequence of different logical operators without parentheses and reports a syntax error.
Use parentheses as follows:
C = (a and not b) or (not a and b)
Associative logical operators: and, or, xor, and xnor are associative.
f <= a and b and c; -- allowed
nand and nor are not associative.
g <= a nand b nand c; -- invalid
g <= not (a and b and c); -- valid
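A small sketch of the rules above, using hypothetical names: x is XOR written with the required parentheses, f is a legal chain of the associative and operator, and g is a 3-input NAND written as not (...) because a nand b nand c is not allowed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity illustrating operator precedence and associativity.
entity logic_ops is
  port (
    a, b, c : in  std_logic;
    x       : out std_logic;  -- XOR built from and/or/not
    f       : out std_logic;  -- 3-input AND (and is associative)
    g       : out std_logic   -- 3-input NAND (nand is not associative)
  );
end entity logic_ops;

architecture dataflow of logic_ops is
begin
  -- not binds tighter than and/or, but and and or must be separated by parentheses.
  x <= (a and not b) or (not a and b);
  -- A chain of the same associative operator needs no parentheses.
  f <= a and b and c;
  -- "a nand b nand c" would be rejected; negate the AND chain instead.
  g <= not (a and b and c);
end architecture dataflow;
```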

26 Slide - 26 Loops
A loop repeatedly executes the sequential statements contained within the loop structure.
for loop: –Entry point –Iteration –Terminal test
for identifier in starting_value to stopping_value loop
  VHDL statements
end loop;

27 Slide - 27 While loop
A for loop stops after a fixed number of iterations. A while loop keeps looping as long as its Boolean condition is true, i.e., until the condition becomes false.
Structure: –Entry point –Terminal test –Exit point
while Boolean_expression loop
  VHDL statements
end loop;
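A for loop with a fixed range is the form synthesis tools handle most readily, because they can unroll it into hardware. The sketch below assumes a hypothetical 8-bit even-parity generator named parity8: it walks the input vector and XORs each bit into a variable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit even-parity generator.
entity parity8 is
  port (d : in std_logic_vector(7 downto 0); p : out std_logic);
end entity parity8;

architecture beh of parity8 is
begin
  process (d)
    variable acc : std_logic;
  begin
    acc := '0';
    for i in d'range loop   -- i takes 7, 6, ..., 0; the range is fixed at compile time
      acc := acc xor d(i);
    end loop;
    p <= acc;               -- '1' when d has an odd number of ones
  end process;
end architecture beh;
```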

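28 Slide - 28 Data Types: bit, bit_vector, integer (with subtypes natural and positive), boolean. These are predefined types declared in the package STANDARD (they are not reserved words). Data types define the type of data and the set of values that can be assigned to an object.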

29 Slide - 29 Integer Data Type
Can contain positive and negative whole numbers. The entity declaration sets a range; in the example, the output requires 4 pins because the range 0 to 15 needs 4 bits.
entity integer_1 is
  port (A, B: in bit;
        Z: out integer range 0 to 15);
end entity integer_1;

30 Slide - 30 Natural data subtype
A subtype of integer that holds whole numbers greater than or equal to zero. In an application, limit the range so that you limit the number of pins assigned (here, 0 to 16 and 0 to 31 each need 5 bits).
entity natural_1 is
  port (A: in natural range 0 to 16;
        X: out natural range 0 to 31);
end entity natural_1;

31 Slide - 31 Positive data subtype
A subtype of integer that restricts values to the range from 1 up to the specified limit.
entity positive_1 is
  port (A, B: in bit;
        Z: out positive range 1 to 31);
end entity positive_1;

32 Slide - 32 Boolean Data Type
Has two possible values, true and false. In the example below, two variables are declared: one initialized to false and the other to true.
variable v1: boolean := false;
variable v2: boolean := true;

33 Slide - 33 User-defined enumeration types - Examples
type state is (S0, S1);
type alu_function is (disable, pass, add, subtract, multiply, divide);
type octal_digit is ('0', '1', '2', '3', '4', '5', '6', '7');
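A user-defined enumeration type is most often used for the states of a state machine. The sketch below reuses type state is (S0, S1) from the slide in a hypothetical toggle machine (toggle_fsm) that flips between S0 and S1 on each rising clock edge while t = '1'; the synchronous reset and the output encoding are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-state toggle machine.
entity toggle_fsm is
  port (clk, rst, t : in std_logic; q : out std_logic);
end entity toggle_fsm;

architecture beh of toggle_fsm is
  type state is (S0, S1);          -- enumeration type from the slide
  signal current : state := S0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        current <= S0;             -- synchronous reset
      elsif t = '1' then
        case current is
          when S0 => current <= S1;
          when S1 => current <= S0;
        end case;
      end if;
    end if;
  end process;

  -- Output decoded from the state: '1' only in S1.
  q <= '1' when current = S1 else '0';
end architecture beh;
```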

34 Slide - 34 Functions and Procedures
– Types of subprograms in VHDL; they allow for modularization and code reuse.
– Think of a subprogram as a process that is located outside the architecture of a program: it contains sequential statements.
– A function is a subprogram that operates on a set of inputs and returns a single output.
– A procedure is a subprogram that operates on an argument list and passes values back through the argument list.
– Both functions and procedures must be called to be executed.

35 Slide - 35 Function syntax
FUNCTION function_name (parameter_list) RETURN data_type IS
  [declarations]
BEGIN
  (function statements)
  RETURN value;
END function_name;

36 Slide - 36 Function example
function and_gate (X, Y: in std_logic) return std_logic is
begin
  return X and Y;
end and_gate;
To call a function: the output of a function can be assigned to an output port (of the same data type). Information can also be passed into the function by value, for example as the literal '1':
AND1: x <= and_gate(A, B);
AND2: x <= and_gate('1', B);

37 Slide - 37 Procedure syntax
PROCEDURE procedure_name (parameter_list) IS
  [declarations]
BEGIN
  (procedure statements)
END procedure_name;
A procedure is similar to a function; however, the arguments of a procedure can include both inputs and outputs (a function has inputs only).

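38 Slide - 38 Procedure example
procedure or_gate (X, Y : in std_logic; signal Z : out std_logic) is
begin
  Z <= X or Y;
end or_gate;
Z must be declared as a signal parameter because the procedure assigns it with <= (an out parameter is a variable by default).
To call a procedure: inputs and outputs are used to pass data into and out of a VHDL procedure (of the same data type). In named association, the procedure's formal parameter goes on the left of =>:
B1: or_gate (X => A, Y => B, Z => V1);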
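A self-contained sketch that declares the corrected or_gate procedure in an architecture and calls it concurrently. The entity name proc_demo is hypothetical; A, B, and V1 follow the slide's call. A concurrent procedure call such as B1 is re-executed whenever one of its input actuals (A or B) changes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper so the procedure call has ports to connect to.
entity proc_demo is
  port (A, B : in std_logic; V1 : out std_logic);
end entity proc_demo;

architecture beh of proc_demo is
  -- Z is a signal parameter so that the procedure may use "<=" on it.
  procedure or_gate (X, Y : in std_logic; signal Z : out std_logic) is
  begin
    Z <= X or Y;
  end procedure or_gate;
begin
  -- Concurrent procedure call; named association is formal => actual.
  B1 : or_gate (X => A, Y => B, Z => V1);
end architecture beh;
```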

39 Slide - 39 Libraries, Packages and Package Bodies
They hold commonly used elements and allow them to be stored and used over and over again without having to rewrite them. Components, procedures, and functions are placed in packages. Packages can be user defined or vendor supplied. Libraries are used to hold packages.

40 Slide - 40 Libraries
Two types: –Standard libraries (like the IEEE standard library) –User defined (holds user-defined packages)
IEEE standard library coding in VHDL:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_1164.std_logic;
Keyword library: makes the packages in the IEEE library visible to the VHDL code. Keyword use: tells the VHDL code what is to be used from the IEEE library. You can specify particular features of the package, or use the keyword all to make them all available.

41 Slide - 41 Packages
Used to hold reusable code: –Components –Functions –Procedures
Package declaration:
package user_defined_name is
  package declarations
end package user_defined_name;

42 Slide - 42 Package Body
The package body is where the items listed in the package declaration are defined.
Package body syntax:
package body user_defined_name is
  package body definitions
end package body user_defined_name;

43 Slide - 43 Package containing a function (1)
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
PACKAGE specialFunctions IS
  FUNCTION AndGate(A, B: in std_logic) RETURN std_logic;
END specialFunctions;
PACKAGE BODY specialFunctions IS
  FUNCTION AndGate(A, B: in std_logic) RETURN std_logic IS
  BEGIN
    RETURN A AND B;
  END AndGate;
END specialFunctions;

44 Slide - 44 Package containing a function (2)
The package is saved as specialFunctions in the library named work, which is the default library. Once the package is compiled, it can be used by other VHDL programs. Example:
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE work.specialFunctions.all;
Entity ExamplePackage is
  Port (A, B: in std_logic; X: out std_logic);
End Entity ExamplePackage;
Architecture MyGate of ExamplePackage is
Begin
  Process (A, B)
  Begin
    A1: X <= AndGate(A, B);
  End process;
End architecture MyGate;

45 Slide - 45 FPGA: Field Programmable Gate Array, introduced by Xilinx in the mid-1980s for implementing digital logic.
– An FPGA can be visualized as a set of programmable logic blocks embedded in programmable interconnect.
– The interconnect architecture provides the connectivity between logic blocks.
– The programming technology determines the method of storing the configuration.

46 Slide - 46 FPGA Re-programmable Logic Applications
When the FPGA was first introduced, it was considered another form of gate array. The in-circuit reprogrammability of SRAM-based FPGAs makes them more than just a standard gate array. FPGAs have gained rapid acceptance and growth over the past decade because they can be applied to a very wide range of applications: –random logic –custom computing machines –device controllers –communication encoding and filtering. Programmable logic is becoming the dominant form of digital logic design and implementation.

47 Slide - 47 FPGA design flow
The design flow is the step-by-step methodology for going through the process of FPGA design. It can be divided into 6 basic steps:
– Design entry
– Functional verification and simulation
– FPGA synthesis
– FPGA place & route
– Circuit analysis (timing, power, ...)
– Programming FPGA devices

48 Slide - 48 Description of design steps
– Design entry: describes the design that has to be implemented on the FPGA
– Functional verification and simulation: checks the logical correctness of the design
– FPGA synthesis: converts the design entry into the actual gates/blocks needed
– FPGA place & route: selects optimal positions and minimizes the length of interconnections on the device
– Timing analysis: determines the speed of the circuit once it has been completely placed and routed
– Programming the FPGA: downloads the configuration bitstream onto the FPGA device

49 Slide - 49 FPGA Design Flow: Let's put these design steps in order: design entry (VHDL); FPGA place and route; functional verification & simulation; FPGA synthesis; download to FPGA; circuit analysis (timing).

50 Slide - 50 FPGA Design Flow
– Implementation path: design entry (VHDL) → FPGA synthesis → FPGA place and route → download to FPGA
– Analysis path: functional verification & simulation; circuit analysis (timing)

51 Slide - The origin of FPGA
– The first transistor was created at Bell Labs in 1947.
– The first phase-shift oscillator fabricated on a single chip was built by TI in 1958.
– Around the mid-1960s, TI introduced the 54xx and 74xx series.
– In 1971, Intel announced the world’s first µP (the 4004), which contained 2,300 transistors and could execute 60,000 operations per second.
– The first programmable ICs, referred to as Programmable Logic Devices (PLDs: PROMs), arrived in 1970 (simple compared to the newer devices called complex PLDs).

52 Slide - PLD (Programmable Logic Device) –Contains thousands of basic logic gates in a single package –Capable of performing advanced sequential functions –Must be configured to perform a specific function

53 Slide - PROMs: the first PLD
–Consists of a fixed array of AND functions driving a programmable array of ORs.
–In a 3-input, 3-output PROM, the OR links are programmable and each OR has 8 inputs (one per AND term); used to implement simple logic functions.
–Must be configured to perform a specific function.
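Because the AND array of a PROM is a fixed full decoder (one AND term for every input combination) and only the OR connections are programmed, a PROM behaves like a lookup table addressed by its inputs. The slide's functions W, X, and Y appear only in the figure, so the contents below are hypothetical: w is odd parity, x is majority, and y is a 3-input AND. The entity name prom3x3 is also an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 3-input, 3-output PROM with example contents.
entity prom3x3 is
  port (a, b, c : in std_logic; w, x, y : out std_logic);
end entity prom3x3;

architecture beh of prom3x3 is
  type rom_t is array (0 to 7) of std_logic_vector(2 downto 0);
  -- One word per AND term (minterm); bits are (w, x, y).
  -- A '1' means that minterm is connected into that OR gate.
  constant ROM : rom_t := (
    "000",  -- abc = 000
    "100",  -- abc = 001
    "100",  -- abc = 010
    "010",  -- abc = 011
    "100",  -- abc = 100
    "010",  -- abc = 101
    "010",  -- abc = 110
    "111"   -- abc = 111
  );
  signal addr : std_logic_vector(2 downto 0);
  signal word : std_logic_vector(2 downto 0);
begin
  addr <= a & b & c;                         -- fixed AND array = full 3-to-8 decode
  word <= ROM(to_integer(unsigned(addr)));   -- programmed OR array
  w <= word(2);
  x <= word(1);
  y <= word(0);
end architecture beh;
```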

54 Slide - PROMs – Example: 3-input, 3-output function. PROM programmed to implement the 3 functions W, X, and Y.

55 Slide - PLAs: the first became available in 1975.
–Both the AND and OR arrays are programmable.
–In a 3-input, 3-output PLA, the number of AND terms is independent of the number of inputs (unlike in a PROM).
–The number of OR functions is independent of the number of AND functions and the number of inputs.

56 Slide - PLAs - Example: PLA programmed to implement the 3 functions W, X, and Y.

57 Slide - PALs: the first became available in the late 1970s.
–The exact opposite of a PROM: programmable AND, fixed OR.
–Example: a 3-input, 3-output PAL. PALs are faster because only one array is programmable.
–Allow only a restricted number of product terms to be ORed.

58 Slide - CPLDs (Complex PLDs): the first became available in the early 1980s.
–The Mega-PAL comprised four standard PALs with some interconnect linking them together.
–Altera introduced a CPLD based on EPROM technology.
–A generic CPLD consists of a number of SPLD blocks sharing a common programmable interconnection matrix.

59 Slide - Programming PLDs
o Devices were programmed using a device programmer, and each vendor had its own file format, which made for a very time-consuming design flow.
o In 1980, a committee of the Joint Electron Device Engineering Council (JEDEC) proposed a standard format for PLD programming text files.

60 Slide - ASICs
o There are four main classes of ASIC (Application-Specific Integrated Circuit).
o Full custom: engineers have complete control over every mask layer used to fabricate the silicon chip. The ASIC vendor does not prefabricate any components on the silicon and does not provide any libraries of predefined logic gates and functions.
o Highly complex and time-consuming design process.

61 Slide - ASICs (cont’d)
o Gate arrays: based on the idea of a basic cell consisting of a collection of unconnected transistors and resistors.
o The ASIC vendor prefabricates silicon chips containing arrays of these basic cells.
o Channeled gate arrays are presented as either single-column or dual-column arrays.
o The vendor defines a set of logic functions (a MUX, for example), referred to as cells, to be used by the design engineer.
o The ASIC design flow is beyond the scope of this course.

62 Slide - FPGAs
o In the 1980s, there was a gap in the digital IC market.
o SPLDs and CPLDs were programmable and had fast design and modification times, but could not support large or complex functions.
o ASICs supported extremely large and complex functions, but were painfully expensive and time-consuming to design, and once a design had been implemented, it was frozen in the silicon.
o To address this gap, Xilinx developed a new class of IC called the Field-Programmable Gate Array (FPGA).

63 Slide - FPGAs
o FPGAs are based on the concept of a programmable logic block; in the first devices this was simple: a 3-input lookup table (LUT), a register, and a MUX.
o Each FPGA contained a large number of these programmable logic blocks embedded in a configurable routing architecture.
o Every block could be configured to perform a different function, and the register could be programmed to trigger on the positive or negative clock edge.
o The MUX feeding the flip-flop could be configured to accept the output of the LUT or a separate input to the logic block, and the LUT could be configured to implement any 3-input logic function.

64 Slide - FPGAs o Example: configure the LUT to perform Y = (A AND B) OR (NOT C)
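Working the example through: with A as the most significant address bit, the LUT must store Y for each of the eight input combinations. Y is 0 only when C = 1 and A AND B = 0, i.e., for ABC = 001, 011, and 101, so the stored pattern (bit 7 down to bit 0) is 11010101. A VHDL sketch that models the LUT as that constant follows; the entity name lut3_example is hypothetical, and a real device loads these bits from the configuration bitstream rather than from HDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a 3-input LUT loaded for Y = (A and B) or (not C).
entity lut3_example is
  port (A, B, C : in std_logic; Y : out std_logic);
end entity lut3_example;

architecture beh of lut3_example is
  -- Bit i holds Y for the input pattern whose binary value (A B C) is i:
  --   i : 7 6 5 4 3 2 1 0
  --   Y : 1 1 0 1 0 1 0 1
  constant LUT_INIT : std_logic_vector(7 downto 0) := "11010101";
  signal idx : std_logic_vector(2 downto 0);
begin
  idx <= A & B & C;                            -- inputs form the LUT address
  Y   <= LUT_INIT(to_integer(unsigned(idx)));  -- read the stored bit
end architecture beh;
```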

65 Slide - FPGAs: a large number of programmable blocks (islands) surrounded by a (sea) of programmable interconnect.

66 Slide - Altera’s Quartus II Tutorial: Start the Quartus II software and prepare to implement the Boolean equation X = AB + CD.

67 Slide - Altera’s Quartus II Tutorial
– Create a new project
– Create a block design file (.bdf)
– Draw the digital logic for the Boolean equation
– Make the circuit connections
– Compile the project

68 Slide - Altera’s Quartus II Tutorial
– Create a vector waveform file (.vwf)
– Add inputs and outputs to the waveform display
– Create timing waveforms for the inputs
– Perform a functional simulation of the X output

69 Slide - Altera’s Quartus II Tutorial
Use the Altera development and education board to program an FPGA: –Assign pins –Recompile the project –Program the FPGA –Test the logic
Use the VHDL text editor to recreate the design used in the block design.
http://www.youtube.com/user/billkleitz#p/c/57F1D26AD6D50FA7/0/oVvmeyVMtEI
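The last step asks for the same function in the VHDL text editor. A minimal sketch of X = AB + CD follows; the entity name and_or_x is hypothetical, and pin assignments for the DE2 board are made separately in Quartus II, not in this file.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; port names follow the equation X = AB + CD.
entity and_or_x is
  port (A, B, C, D : in std_logic; X : out std_logic);
end entity and_or_x;

architecture dataflow of and_or_x is
begin
  X <= (A and B) or (C and D);  -- parentheses are required when mixing and/or
end architecture dataflow;
```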

70 Slide - FPGA Programming Technology: SRAM-based FPGA
– Fabric: the underlying structure of the device.
– The majority of FPGAs are SRAM based; they can be configured over and over again.
– They benefit from advances in memory R&D, since SRAM cells are created exactly the same way as the rest of the device.
– Downsides: they have to be reconfigured every time the system is powered up, so the configuration file is stored in external memory. There are security issues with protecting your IP; some SRAM-based FPGAs support bitstream encryption.

71 Slide - Antifuse-based FPGA
– Programmed offline using a special device programmer.
– Nonvolatile: the configuration remains when power is off, so no external memory device is needed to store configuration data.
– Applications: military and aerospace.
– Once programmed, the device cannot be altered, so there are no security issues with protecting your IP.
– Downside: they are one-time programmable (OTP); once programmed, the function is set in stone.

72 Slide - EPROM/Flash-based FPGA
– Can be configured offline or using in-system programming.
– Nonvolatile: once programmed, the configuration is retained when power is removed.
– Support IP protection mechanisms.
– Applications: military and aerospace.

73 Slide - Summary

74 Slide - FPGA architectures (fine-, medium-, and coarse-grained). Reminder: a large number of programmable logic blocks (islands) embedded in a (sea) of programmable interconnect. Fine-grained: each logic block can implement only a very simple function, such as any 3-input function. Coarse-grained: relatively large logic blocks. As the granularity of the blocks increases to medium or high, the number of connections into the blocks decreases relative to the functionality they can support.

75 Slide - MUX-based logic block. Consider the example y = (A AND B) OR C. Each input to the block is presented with a logic 0, a logic 1, or the true or inverted form of a signal, and the function is implemented using MUXes.
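One way to see how multiplexers alone can implement logic is to feed 2:1 MUXes with constants and signals, as the slide describes. This sketch does not reproduce the specific block in the figure; it only illustrates the idea. The AND is a MUX selected by A that passes B or the constant 0, and the OR is a MUX selected by C that passes the constant 1 or the AND result. The names mux_logic and mux2 are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical MUX-only implementation of y = (A and B) or C.
entity mux_logic is
  port (A, B, C : in std_logic; y : out std_logic);
end entity mux_logic;

architecture beh of mux_logic is
  -- 2:1 multiplexer: returns i1 when s = '1', otherwise i0.
  function mux2 (s, i0, i1 : std_logic) return std_logic is
  begin
    if s = '1' then
      return i1;
    else
      return i0;
    end if;
  end function mux2;

  signal ab : std_logic;
begin
  -- AND: with A as select, pass B when A = '1', else the constant '0'.
  ab <= mux2(A, '0', B);
  -- OR: with C as select, pass the constant '1' when C = '1', else (A and B).
  y  <= mux2(C, ab, '1');
end architecture beh;
```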

76 Slide - LUT-based logic block. Consider the example y = (A AND B) OR C. The group of input signals is used as an index (address) into the lookup table, so the 3-input LUT is loaded with the appropriate values (the function's truth table). The LUT is SRAM based.

77 Slide - LUT-based logic block

78 Slide - MUX versus LUT logic block? The majority of today’s FPGA architectures are LUT based. MUX-based blocks do not provide high-speed carry logic chains, whereas LUT-based architectures lead in anything to do with arithmetic processing. The first FPGAs were based on 3-input LUTs; most current architectures use 4-input LUTs.

79 Slide - CLBs versus LABs? You cannot live by LUTs alone: logic blocks also contain other elements, such as MUXes and registers.

80 Slide - Xilinx logic cell. Each vendor has its own names for things. Xilinx calls it a logic cell (LC), which comprises a 4-input LUT, a MUX, and a register. The register clock can be configured for the rising or falling edge, and the register can be configured as a flip-flop or as a latch. Altera calls it a logic element (LE).

81 Slide - Slicing. The next step up the hierarchy is a slice. A slice has one set of clock, clock enable, and set/reset signals common to both of its logic cells.

82 Slide - CLBs versus LABs? The next step up is the CLB (configurable logic block) for Xilinx and the LAB (logic array block) for Altera. Xilinx has two or more slices in each CLB (the example shows four slices per CLB); in addition, fast programmable interconnect within the CLB connects neighboring slices.

83 Slide - CLBs versus LABs? Each 4-input LUT can be used as a 16x1 RAM. With four slices per CLB, all the LUTs in the CLB can be configured together to implement the following:
– Single-port 16x8-bit RAM
– Single-port 32x4-bit RAM
– Single-port 64x2-bit RAM
– Single-port 128x1-bit RAM
– Dual-port 16x8-bit RAM
– Dual-port 32x4-bit RAM
– Dual-port 64x2-bit RAM
Each 4-input LUT can also be used as a 16-bit shift register.
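In HDL, distributed (LUT) RAM is usually not instantiated by hand but inferred from a pattern like the one below: a small array with a synchronous write and an asynchronous (combinational) read. Whether a given tool maps this onto LUT RAM depends on the device family and the tool (not every family's LUTs support a RAM mode), so treat this as a sketch; the entity name ram16x1 is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16x1 RAM: synchronous write, asynchronous read.
entity ram16x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(3 downto 0);  -- 4 address bits = 16 locations
    din  : in  std_logic;
    dout : out std_logic
  );
end entity ram16x1;

architecture beh of ram16x1 is
  signal mem : std_logic_vector(15 downto 0) := (others => '0');
begin
  -- Write on the rising clock edge when write enable is high.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Combinational read, like reading a LUT.
  dout <= mem(to_integer(unsigned(addr)));
end architecture beh;
```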

84 Slide - 84 Embedded RAMs. Every application needs memory, so FPGAs now include large chunks of embedded RAM called e-RAM or block RAM, usually organized in columns. Each block can be used independently, or multiple blocks can be combined to implement larger blocks. Useful for implementing single- and dual-port RAMs, FIFOs, state machines, ...

85 Slide - 85 Embedded multipliers, adders, ... Some functions are inherently slow if they are implemented by connecting a large number of programmable logic blocks. Many FPGAs therefore incorporate special hard-wired multiplier blocks, located in close proximity to the embedded RAM blocks.

86 Slide - 86 Embedded multipliers, adders, ... Some FPGAs offer dedicated adder blocks (very useful in DSP applications), for example for multiply-and-accumulate (MAC). If the FPGA provides only multiplier blocks, you can combine a multiplier with an adder and store the results in registers.
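A sketch of the multiply-and-accumulate pattern described above, assuming signed 8-bit operands and a 20-bit accumulator (4 guard bits above the 16-bit product). The entity name mac8 and the port names are hypothetical. Synthesis tools can map the * operator onto an embedded multiplier when the device has one and build the adder and register from logic otherwise; the actual mapping depends on the tool and device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8x8 multiply-accumulate: acc <= acc + a*b on each enabled clock edge.
entity mac8 is
  port (
    clk, rst, en : in  std_logic;
    a, b         : in  signed(7 downto 0);
    acc          : out signed(19 downto 0)
  );
end entity mac8;

architecture rtl of mac8 is
  signal acc_r : signed(19 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc_r <= (others => '0');   -- synchronous clear
      elsif en = '1' then
        -- a*b is 16 bits wide; resize sign-extends it to the accumulator width.
        acc_r <= acc_r + resize(a * b, acc_r'length);
      end if;
    end if;
  end process;

  acc <= acc_r;
end architecture rtl;
```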

87 Slide - 87 Embedded processor cores. Many applications make use of microprocessors in one form or another. High-end FPGAs contain one or more embedded microprocessors, referred to as microprocessor cores. Hard processor cores are dedicated, predefined blocks. One option is to locate the core in a strip beside the main fabric; the advantages are that the main FPGA fabric stays identical and the design tools have an easier job.

88 Slide - 88 Embedded processor cores. The alternative is to embed the core within the main fabric; the design tools then need to account for the presence of these blocks in the fabric.

89 Slide - 89 Embedded processor cores - soft. A group of programmable logic blocks can be configured to act as a microprocessor; these are called soft cores. They are simpler and slower than hard cores. Advantages: you implement one only if you need it, and you can instantiate as many as you need.

90 Slide - Clock trees and clock managers. All of the synchronous elements need to be driven by clock signals. The clock signal originates outside the FPGA, comes into the FPGA via a special clock input pin, and is then routed through the device. Clock tree: the main clock signal branches repeatedly; this structure ensures that all of the flip-flops see their version of the clock as close together in time as possible.

91 Slide - Clock trees and clock managers – cont’d. If the clock were distributed as a single long track driving all the registers one after another, the registers closer to the clock pin would see the clock signal sooner than those farther away; this is referred to as skew (avoid it!). The clock tree is implemented using special tracks, separate from the general-purpose programmable interconnect. Usually, you will have multiple clock domains and multiple clock pins.

92 Slide - 92 Clock trees and clock managers – cont’d. Instead of connecting the clock pin directly to an internal clock tree, it can drive a special hard-wired function (block) called a clock manager. The clock manager generates a number of daughter clocks, which can drive internal clock trees or external output pins to provide clocks to external devices.

93 Slide - 93 Clock trees and clock managers – cont’d. Each FPGA family has its own type of clock manager. Clock managers support jitter removal: clock edges may arrive a little early or a little late (jitter), and the FPGA clock manager can be used to detect and correct this jitter and provide clean daughter clock signals for use inside the device.

94 Slide - 94 Clock trees and clock managers – cont’d. Frequency synthesis: the external clock frequency may not be the one the engineers need. The clock manager can be used to generate daughter clocks with frequencies derived from the original clock. Example: 3 daughter clocks at 1.0, 2.0, and 0.5 × the original clock frequency.

95 Slide - 95 Clock trees and clock managers – cont’d. Phase shifting: some designs require clocks that are phase shifted (delayed) with respect to each other. Clock managers allow you to select from fixed phase shifts of 90°, 180°, and 270°, or to configure the exact amount of phase shift. Example: the 1st daughter clock is in phase, the 2nd is shifted by 90°, and so forth.

96 Slide - 96 General Purpose I/O. Today’s FPGA packages can have 1,000 or more pins, arranged as an array across the base of the package. Each FPGA general-purpose I/O can be configured to accept and generate signals conforming to whichever standard is required. The general-purpose I/Os are split into a number of banks, numbered from 0 to 7.

97 Slide - 97 General Purpose I/O – cont’d. Each bank can be configured to support a particular I/O standard, such as LVTTL, LVCMOS, PCI, or LVDS. This allows the FPGA to work with multiple I/O standards, or to translate between different protocols that are based on particular electrical standards.

98 Slide - 98 FPGA Families. There are many different types, suited to almost every kind of application. FPGAs are grouped into categories, often referred to as families or series, each with common characteristics. Some FPGAs are characterized as high volume, low cost, or high temperature, and they are available in various sizes, packages, and speeds. Manufacturers group FPGAs according to their application (automotive, space, medical, etc.).

99 Slide - 99 Altera Families. Altera refers to its FPGA families as series:
– Stratix: high end, high density, on-chip transceivers
– Arria: midrange, transceiver based
– Cyclone: low cost, low power consumption

100 Slide - A Complete Solutions Portfolio: high-density, high-performance FPGAs; mid-range transceiver FPGAs; low-cost FPGAs; CPLDs; ASICs; design software; development kits; embedded soft processors; intellectual property (IP).

101 Slide - Altera – IP. Many FPGA manufacturers offer a variety of what are called intellectual property (IP) cores or functions, which allow the designer to select and customize a specific desired function. Advantages: faster code development time; reduced design risk (less likelihood of errors); better and faster compiling. Some IP cores or functions are free and others are fee based. IP cores are manufacturer dependent: Altera’s IP, called megafunctions, is designed only for Altera FPGAs.

102 Slide - Altera Megafunctions: pre-made design blocks. Benefits –Configurable, parameterized settings add flexibility & portability –“Drop-in” support to accelerate design entry –Pre-optimized for Altera architecture. Two versions –Quartus II megafunctions –Intellectual Property (IP) megafunctions

103 Slide - Quartus II Megafunctions: free & installed with the Quartus II software –Non-encrypted functions written in AHDL (Altera HDL) –HDL simulation models installed in Quartus II libraries. Two types –Altera-specific megafunctions (names begin with “ALT”) –Library of parameterized modules (LPMs). Examples –Arithmetic –On-chip RAM/ROM –PLLs –DDR/QDR/RLDRAM memory controllers

104 Slide - IP Megafunctions: must purchase a license (except for the IP base suite) –Logic for the IP function is encrypted. Two types –MegaCore® IP: developed by Altera –Altera Megafunctions Partner Program (AMPP℠) IP. All MegaCore functions & some AMPP functions support the OpenCore® Plus feature –Develop your design using a free version of the core –HDL simulation models provided with the IP –Generate time-limited configuration/programming files –See AN320: OpenCore Plus Evaluation of Megafunctions

105 Slide - MegaCore IP Examples. Included in the IP base suite –FIR Compiler –Fast Fourier Transform –DDR/DDR2 High Performance Memory Controller. License required –Triple-Speed Ethernet MAC –CRC Compiler –PCI Compiler. See http://www.altera.com/products/ip/ipm-index.html for a complete list of Altera IP solutions.

106 Slide - MegaWizard Plug-in Manager: eases implementation and configuration of megafunctions & IP, via GUI, command line, or both. Command line: qmegawiz = file_name. GUI: Tools → MegaWizard Plug-In Manager, or the Tasks window; then choose the file name, select the megafunction or IP, and select the output language.

107 Slide - MegaWizard Example: Multiply-Add megafunction. A three-step process configures the megafunction: customization options are shown alongside an updating graphical representation, and documentation can be located in Quartus II Help or on the web.

108 Slide - MegaWizard Output File Selection

109 Slide - Programming an FPGA – configuration cells. Configuration file: contains the information that will be uploaded into the FPGA in order to program it (the bit file). Programming is simple in concept: load the configuration file into the device. Programmable interconnect: connects the device’s primary inputs and outputs to the programmable logic blocks, and the blocks to each other.

110 Slide - Programming an FPGA – configuration cells. The figure illustrates SRAM-controlled switches in two applications: controlling the gate nodes of pass-transistor switches, and controlling the select lines of multiplexers that drive logic block inputs. It shows one logic block (represented by the AND gate in the upper left corner) connected to another through two pass-transistor switches and then a multiplexer, all controlled by SRAM cells.

111 Slide - Programming an FPGA – configuration cells
A simple programmable logic block: a 4-input LUT, a MUX, and a register. Configuration cells determine:
– MUX: which input is selected
– Register: edge-triggered flip-flop or latch; positive or negative clock edge; active-low or active-high enable; initialized to 0 or 1
– LUT: 16 configuration cells, one per combination of the four inputs (2^4 = 16)
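
The logic block above can be sketched as a small VHDL model. This is a hypothetical behavioral model, not a vendor primitive: the generics LUT_INIT (standing in for the 16 LUT configuration cells) and USE_REG (standing in for the MUX configuration cell) are illustrative names, and the register's edge-polarity and enable options are left out for brevity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a simple programmable logic block:
-- 4-input LUT -> optional register -> output MUX.
-- Generics stand in for the SRAM configuration cells.
entity logic_block is
  generic (
    LUT_INIT : std_logic_vector(15 downto 0) := x"8000";  -- 16 LUT cells; x"8000" = 4-input AND
    USE_REG  : boolean := false                            -- MUX cell: true selects the registered output
  );
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    y   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal reg_out : std_logic := '0';  -- register powers up at 0 in this sketch
begin
  -- A LUT is a 16-to-1 multiplexer: the inputs select one of the
  -- 16 configuration cells, whose stored value becomes the output.
  lut_out <= LUT_INIT(to_integer(unsigned(a)));

  -- Positive-edge flip-flop (edge polarity fixed here for brevity).
  reg : process (clk)
  begin
    if rising_edge(clk) then
      reg_out <= lut_out;
    end if;
  end process reg;

  -- Output MUX controlled by a configuration cell.
  y <= reg_out when USE_REG else lut_out;
end architecture rtl;
```

With LUT_INIT => x"6996" the same block computes 4-input parity instead of AND; the function is set entirely by the configuration cells.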

112 Slide - Programming an FPGA – SRAM based
Volatile: SRAM-based FPGAs must be programmed in-system and always need to be reprogrammed when power is first applied to the system.
All SRAM configuration cells are connected as one long shift register; only the beginning and end of the register are accessible from the outside world. The data-out pin is used only when multiple FPGAs are configured by cascading (daisy-chaining) them together.
An FPGA can contain 25 million configuration cells, so 25 million bits of configuration data must be clocked into the device.
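
A minimal sketch of the configuration shift register follows, assuming a tiny chain length N instead of millions of cells; the entity and port names (config_chain, cfg_clk, cfg_in, cfg_out, cells) are hypothetical. Wiring one instance's cfg_out to the next instance's cfg_in models two daisy-chained FPGAs sharing one serial bit stream.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model: all configuration cells form one shift register.
entity config_chain is
  generic (N : positive := 8);  -- number of configuration cells (real devices: millions)
  port (
    cfg_clk : in  std_logic;
    cfg_in  : in  std_logic;                        -- configuration data in (start of chain)
    cfg_out : out std_logic;                        -- data out (end of chain), feeds the next FPGA
    cells   : out std_logic_vector(N - 1 downto 0)  -- current cell contents, exposed for checking
  );
end entity config_chain;

architecture rtl of config_chain is
  signal sr : std_logic_vector(N - 1 downto 0) := (others => '0');
begin
  process (cfg_clk)
  begin
    if rising_edge(cfg_clk) then
      -- each clock moves every bit one cell further along the chain
      sr <= sr(N - 2 downto 0) & cfg_in;
    end if;
  end process;

  cfg_out <= sr(N - 1);  -- the bit at the end of the chain
  cells   <= sr;
end architecture rtl;
```

Because the two devices form one long register, the bits meant for the far device are sent first; they pass through the near device before settling.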

113 Slide - Programming an FPGA – SRAM based
LUT: can be configured to act as a LUT, as a 16x1 chunk of distributed RAM, or as a 16-bit shift register.
Configuration port: a small dedicated group of pins that tells the device which configuration mode is going to be used. Two mode pins provide four modes; the mode pins are hardwired to the desired logic levels (0 or 1).
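
The shift-register mode can be sketched the same way. In this hypothetical model (names are illustrative, not any vendor's primitive), the 16 cells that normally hold the LUT contents shift on each enabled clock, while the four LUT inputs (addr) still select which cell drives the output, giving a tap delayed by addr + 1 clocks.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: a LUT's 16 cells reused as a 16-bit shift register.
entity lut_shift16 is
  port (
    clk  : in  std_logic;
    ce   : in  std_logic;                     -- shift enable
    d    : in  std_logic;                     -- serial data in
    addr : in  std_logic_vector(3 downto 0);  -- the normal LUT inputs now select a tap
    q    : out std_logic
  );
end entity lut_shift16;

architecture rtl of lut_shift16 is
  signal cells : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        cells <= cells(14 downto 0) & d;  -- newest bit enters at cell 0
      end if;
    end if;
  end process;

  -- Reading works exactly like a LUT: the inputs index one of the 16 cells.
  q <= cells(to_integer(unsigned(addr)));
end architecture rtl;
```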

114 Slide - Programming an FPGA – SRAM based: Serial load with FPGA as a master
The simplest mode. It uses an external PROM (now flash) whose single data-output pin is connected to the FPGA's configuration data-in pin.
The FPGA uses several bits to control the external memory device (reset, clock) and clocks the configuration data out of the memory device.
The configuration data-out pin can be used to read the configuration data back from the device for any reason. Alternatively, multiple FPGAs can be daisy-chained, sharing a single memory device.

115 Slide - Programming an FPGA – SRAM based: Parallel load with FPGA as a master
Very similar to serial mode, except that data is read from the memory device in 8-bit chunks.
The FPGA also supplies the external memory with an address bus: an internal counter generates the address and keeps incrementing.
Is it faster? Not in early devices, where the data still had to be clocked in serially; in newer devices, yes.
Drawback: signal-integrity issues with an 8-bit data bus plus a 24-bit address bus.
Newer external memories do not require an external address, so the FPGA no longer needs the counter.

116 Slide - Programming an FPGA – SRAM based: Parallel load with FPGA as a slave
Using the FPGA as a master is attractive because only the FPGA and the external memory are involved.
Alternatively, a microprocessor can load the FPGA: it tells the FPGA to start the configuration process, then reads each byte of data from the memory device and writes it into the FPGA.

117 Slide - Using the JTAG port
Today's FPGAs are equipped with a JTAG port (Joint Test Action Group, IEEE Standard 1149.1), originally used for testing circuit boards.
The JTAG port has an input data pin and an output data pin, and the JTAG registers are daisy-chained. Data is clocked serially into the JTAG register, the FPGA operates on the data, and the result is ultimately clocked back out of the JTAG port.

118 Slide - Using the JTAG port
JTAG can be used for more than boundary scan: FPGAs connect their SRAM configuration shift register to the JTAG scan chain, so JTAG can also be used to program the FPGA.
Today's FPGAs can support five different programming modes, which requires three mode pins (two pins encode only 2^2 = 4 modes; three encode up to 2^3 = 8).

119 Slide - Using an embedded processor
When an FPGA contains an embedded processor, the processor may have its own dedicated JTAG port. JTAG can be used to initialize the internal microprocessor core, and the configuration can then be handled by the processor.

120 Slide - FPGA Design Flow – Design Phase
The first development phase is design. An FPGA design can be:
– A conversion of a schematic to HDL
– A modification of an existing design
– A totally new design
This is a very critical phase. Goals:
– Learn how to evaluate the design package
– Decisions to make prior to creating the design
– How to create the design
[Figure: FPGA design flow. Implementation path: design entry (VHDL), FPGA synthesis, FPGA place and route, download to FPGA. Analysis path: functional verification & simulation, circuit analysis (timing).]

121 Slide - Design Phase
The design phase is more than just creating the design. The design materials, "the design package", must be understood: the package contains the requirements that define the FPGA's features and functions, that is, what the design must do and how.
The success or failure of the design largely depends on:
– The quality of the design inputs
– Making key decisions
– Development tools

122 Slide - Design Package
Usually written by a system engineer or architect. Includes:
– Creation of the design architecture
– Partitioning of the design into sections
– Creation of the design requirements
– Creation of timing and other diagrams (supporting documents)
Do not create your own requirements; always ask. You should always evaluate the package's content before starting the design.

123 Slide - Design Package example
Typical contents: timing diagrams, requirement documents, state machines, schematics, etc.
Evaluate:
– Package analysis: be sure you have a clear understanding of what you are to design. If in doubt, always ask.
– Getting clarification: not all design packages are crystal clear; go directly to the source.
– Organize: make sure you work from the latest and most accurate information.

124 Slide - Pre-design Decisions
Which design format? Which FPGA vendor? Which tools?
The design requirements may already define one or more of these pre-design decisions.
Making one decision can automatically determine another: selecting Altera as the FPGA vendor determines the Quartus tool.
For manufacturer-dependent designs, the manufacturer must be known in the design phase; for manufacturer-independent designs, the manufacturer and part number are needed only in the synthesis phase.

125 Slide - Design Format
Before creating a design, you must select its format: schematic capture, HDL, or a combination.
Sometimes the decision has already been made by your design package, or by the manufacturer and development tools you select. If you are starting a new design, you may have the option to select the design format.
Schematic capture:
– Pros: the design is drawn as a schematic, which is easier to create, read, and understand.
– Cons: logic symbols are proprietary, so the design is manufacturer dependent and less flexible; options for development tools are limited.
HDL:
– Pros: more design and manufacturer flexibility; manufacturer independent.
– Cons: may be difficult to read and understand.

126 Slide - FPGA Manufacturer
How to select a device: you need to know how many resources your design requires, which can be difficult at first. A good approach: pick a candidate device, synthesize the design, and review the resources required in the output report. With this information, use a datasheet to select a more appropriately sized device.
Factors to consider when selecting the device:
– Design application: avionics, military, automotive, medical, and so forth
– Environment: military, industrial, commercial
– Temperature range: commercial 0 to 85 °C, industrial -40 to 100 °C, military -55 to 125 °C
– Design size: allocated board space, package

127 Slide - Development Tools
Each development phase uses specific tools. The design-phase tool depends mainly on the design format: if your design is a schematic capture, the design-entry tool must support schematic capture.
– Cost: fees can be very expensive (license fees, yearly maintenance); know your needs.
– Design sharing: have a set of tools to manage and control the design and its revisions.
– Complete or standalone: manufacturers offer complete development tools (Altera's Quartus II, Xilinx ISE). Standalone tools perform a single function, such as simulation or synthesis; examples are Mentor Graphics' ModelSim for simulation and Synopsys' Synplify for synthesis.

128 Slide Advanced VHDL – Design Phase
– Writing synthesizable VHDL
– Inferring common logic functions
– Coding state machines
– Improving logic utilization & performance
– Writing parameterized code

129 Slide Simulation vs. Synthesis
Simulation
– Code is executed exactly as it is written
– User has flexibility in writing
– Initialization of logic is supported
Synthesis
– Code is interpreted and hardware is created; knowledge of the PLD architecture is important
– Synthesis tools require certain coding to generate correct logic: only a subset of the VHDL language is supported, and coding style is important for fast and efficient logic
– Initialization is controlled by the device; logic implementation can be adjusted to support initialization
Pre- and post-synthesis logic should operate the same.

130 Slide Writing Synthesizable VHDL
– Synthesizable VHDL constructs
– Sensitivity lists
– Latches vs. registers
– IF-THEN-ELSE structures
– CASE statements
– Variables
– Synthesizable subprograms
– Combinatorial loops
– Gated clocks

131 Slide Some Synthesizable VHDL Constructs
ENTITY, ARCHITECTURE, CONFIGURATION, PACKAGE, concurrent signal assignments, PROCESS, SIGNAL, VARIABLE (non-shared), CONSTANT, IF-ELSE, CASE, loops (fixed iteration), multi-dimensional arrays, PORT, GENERIC (constant), COMPONENT, component & direct instantiation, GENERATE, FUNCTION, PROCEDURE, ASSERT (constant false), WAIT (one per process), TYPE, SUBTYPE
– Synthesis tools may place certain restrictions on supported constructs
– See the online help in Quartus II (or your target synthesis tool) for a complete list

132 Slide Some Non-Synthesizable VHDL Constructs
ACCESS, ASSERT, DISCONNECT, FILE, GROUP, NEW, physical delay types, PROTECTED, SHARED VARIABLE, signal assignment delays
– These are some of the constructs not supported by Quartus II synthesis
– See the online help in Quartus II (or your target synthesis tool) for a complete list

133 Slide Two Types of RTL PROCESS Statements
Combinatorial PROCESS
– Sensitive to all signals used on the right-hand side of assignment statements
– Example: PROCESS (a, b, sel)
– The sensitivity list includes all inputs used in the combinatorial logic
Sequential PROCESS
– Sensitive to a clock and control signals
– Example: PROCESS (clr, clk)
– The sensitivity list does not include the d input, only the clock and/or control signals
[Figure: a multiplexer with inputs a, b and select sel driving c; a D flip-flop with input d, clock clk, clear clr, and output q]

134 Slide Sensitivity Lists
An incomplete sensitivity list in a combinatorial PROCESS block may result in differences between RTL and gate-level simulations
– The synthesis tool synthesizes as if the sensitivity list were complete
Incorrect way: the simulated behavior is not that of the synthesized 3-input AND gate (y is not recomputed when only c changes)
    PROCESS (a, b)
    BEGIN
      y <= a AND b AND c;
    END PROCESS;
Correct way for the intended AND logic:
    PROCESS (a, b, c)
    BEGIN
      y <= a AND b AND c;
    END PROCESS;

135 Slide Common Pitfall – Missing Inputs from Sensitivity List
Pitfall: missing inputs from the sensitivity list when describing combinational behavior
– Results in sequential behavior
– Wrong 4x1 mux example: it has memory, yet there is no compiler error; it is just not a mux
    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;
    ENTITY Mux4 IS
      PORT (i3, i2, i1, i0 : IN  std_logic;
            s1, s0         : IN  std_logic;
            d              : OUT std_logic);
    END Mux4;
    ARCHITECTURE Beh OF Mux4 IS
    BEGIN
      -- Note: missing i3, i2, i1, i0
      PROCESS (s1, s0)
      BEGIN
        IF (s1 = '0' AND s0 = '0') THEN
          d <= i0;
        ELSIF (s1 = '0' AND s0 = '1') THEN
          d <= i1;
        ELSIF (s1 = '1' AND s0 = '0') THEN
          d <= i2;
        ELSE
          d <= i3;
        END IF;
      END PROCESS;
    END Beh;
[Figure: waveforms of d, s1, s0, i3, i1]
Missing i3-i0 from the sensitivity list: d is recomputed if s1 or s0 changes, but is not recomputed if i3 (or i2-i0) changes.
Reminder:
– Combinational behavior: the output value is purely a function of the present input values
– Sequential behavior: the output value is a function of present and past input values, i.e., the system has memory
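
For comparison, the corrected mux only changes the sensitivity list: every signal read inside the process is listed, so d is recomputed whenever a data input changes, not only when s1 or s0 changes.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Mux4 IS
  PORT (i3, i2, i1, i0 : IN  std_logic;
        s1, s0         : IN  std_logic;
        d              : OUT std_logic);
END Mux4;

ARCHITECTURE Beh OF Mux4 IS
BEGIN
  -- All inputs read in the process are listed, so the process
  -- describes purely combinational behavior (no memory).
  PROCESS (i3, i2, i1, i0, s1, s0)
  BEGIN
    IF (s1 = '0' AND s0 = '0') THEN
      d <= i0;
    ELSIF (s1 = '0' AND s0 = '1') THEN
      d <= i1;
    ELSIF (s1 = '1' AND s0 = '0') THEN
      d <= i2;
    ELSE
      d <= i3;
    END IF;
  END PROCESS;
END Beh;
```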

136 Slide Latches vs. Registers
Altera devices have registers in logic elements, not latches.
Latches are implemented using combinatorial logic and can make timing analysis more complicated:
– Look-up table (LUT) devices use LUTs in combinatorial loops
– Product-term devices use more product terms
Recommendations:
– Design with registers (RTL)
– Watch out for inferred latches: latches are inferred on combinatorial outputs when results are not specified for a set of input conditions, and they lead to simulation/synthesis mismatches

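137 Slide IF-ELSE Structure
The IF-ELSE structure (like the WHEN-ELSE concurrent assignment) implies prioritization and dependency
– The Nth clause implies that all N-1 previous clauses are not true
    IF cond1 THEN y <= A;
    ELSIF cond2 THEN y <= B;
    ELSIF cond3 THEN y <= C;
    ...
Logical equation: $y = (\mathit{cond1} \cdot A) + (\overline{\mathit{cond1}} \cdot \mathit{cond2} \cdot B) + (\overline{\mathit{cond1}} \cdot \overline{\mathit{cond2}} \cdot \mathit{cond3} \cdot C) + \dots$
Beware of needlessly "ballooning" logic
– Consider restructuring IF statements, which may flatten the multiplexer and reduce logic
– If sequential statements are mutually exclusive, individual IF structures may be more efficient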

138 Slide When Writing IF-ELSE Structures…
Cover all cases
– Uncovered cases in combinatorial processes result in latches
For efficiency, consider:
– Using don't cares ('-' or 'X') for the final ELSE clause (avoiding unnecessary default conditions); the synthesis tool has the freedom to encode don't cares for maximum optimization
– Assigning initial values and explicitly covering only those results that differ from the initial values

139 Slide Unwanted Latches
Combinatorial processes that do not cover all possible input conditions generate latches
    PROCESS (sel, a, b, c)
    BEGIN
      IF sel = "001" THEN
        output <= a;
      ELSIF sel = "010" THEN
        output <= b;
      ELSIF sel = "100" THEN
        output <= c;
      END IF;
    END PROCESS;
[Figure: sel(2), sel(1), sel(0) and A, B, C feed logic that drives a latch on output]

140 Slide Unwanted Latches Removed
Close all IF-ELSE structures
– If possible, assign don't cares to the ELSE clause for improved logic optimization
    PROCESS (sel, a, b, c)
    BEGIN
      IF sel = "001" THEN
        output <= a;
      ELSIF sel = "010" THEN
        output <= b;
      ELSIF sel = "100" THEN
        output <= c;
      ELSE
        output <= (OTHERS => 'X');
      END IF;
    END PROCESS;
[Figure: sel(2), sel(1), sel(0) and A, B, C feed logic that drives output directly, with no latch]

141 Slide Common Pitfall – Output not Assigned on Every Pass
Pitfall: failing to assign every output on every pass through the process for combinational behavior
– Results in sequential behavior, referred to as an inferred latch
– Wrong 2x4 decoder example: it has memory, yet there is no compiler error; it is just not a decoder
    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;
    ENTITY Dcd2x4 IS
      PORT (i1, i0         : IN  std_logic;
            d3, d2, d1, d0 : OUT std_logic);
    END Dcd2x4;
    ARCHITECTURE Beh OF Dcd2x4 IS
    BEGIN
      PROCESS (i1, i0)
      BEGIN
        IF (i1 = '0' AND i0 = '0') THEN
          d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '1';
        ELSIF (i1 = '0' AND i0 = '1') THEN
          d3 <= '0'; d2 <= '0'; d1 <= '1'; d0 <= '0';
        ELSIF (i1 = '1' AND i0 = '0') THEN
          d3 <= '0'; d2 <= '1'; d1 <= '0'; d0 <= '0';
        ELSIF (i1 = '1' AND i0 = '1') THEN
          d3 <= '1';
          -- Note: missing assignments to the other outputs in the last ELSIF
        END IF;
      END PROCESS;
    END Beh;
[Figure: waveforms of d3, d2, i0, i1] When i1i0 = 10, d2 = 1 and the others are 0; when i1i0 then changes to 11, d3 = 1 but d2 stays the same, because the assignments to outputs d2, d1, d0 are missing.
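
One way to fix the decoder, shown here as a sketch, is to assign a default value to every output at the top of the process and then set only the output that differs; a final ELSE also closes the IF so every input combination is covered.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY Dcd2x4 IS
  PORT (i1, i0         : IN  std_logic;
        d3, d2, d1, d0 : OUT std_logic);
END Dcd2x4;

ARCHITECTURE Beh OF Dcd2x4 IS
BEGIN
  PROCESS (i1, i0)
  BEGIN
    -- Default assignments: every output gets a value on every pass,
    -- so no latch can be inferred.
    d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '0';
    IF (i1 = '0' AND i0 = '0') THEN
      d0 <= '1';
    ELSIF (i1 = '0' AND i0 = '1') THEN
      d1 <= '1';
    ELSIF (i1 = '1' AND i0 = '0') THEN
      d2 <= '1';
    ELSE
      d3 <= '1';  -- closing ELSE covers i1i0 = 11
    END IF;
  END PROCESS;
END Beh;
```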

142 Slide Common Pitfall – Output not Assigned on Every Pass
The same pitfall often occurs when not all possible input combinations are considered
    PROCESS (i1, i0)
    BEGIN
      IF (i1 = '0' AND i0 = '0') THEN
        d3 <= '0'; d2 <= '0'; d1 <= '0'; d0 <= '1';
      ELSIF (i1 = '0' AND i0 = '1') THEN
        d3 <= '0'; d2 <= '0'; d1 <= '1'; d0 <= '0';
      ELSIF (i1 = '1' AND i0 = '0') THEN
        d3 <= '0'; d2 <= '1'; d1 <= '0'; d0 <= '0';
      END IF;
    END PROCESS;
The last ELSE is missing, so not all input combinations are covered (i.e., i1i0 = 11 is not covered) and the outputs are not updated.

143 Slide Mutually Exclusive IF-ELSE Latches
Beware of building unnecessary dependencies
– e.g. outputs x, y, z are mutually exclusive, but the IF-ELSIF chain makes every output depend on every test and creates latches
    PROCESS (sel, a, b, c)
    BEGIN
      IF sel = "010" THEN
        x <= a;
      ELSIF sel = "100" THEN
        y <= b;
      ELSIF sel = "001" THEN
        z <= c;
      ELSE
        x <= '0'; y <= '0'; z <= '0';
      END IF;
    END PROCESS;
[Figure: three logic blocks, each fed by sel(2..0) and one of A, B, C, driving latches on X, Y, and Z]

144 Slide Mutually Exclusive Latches Removed
Separate the IF statements and close each one:
    PROCESS (sel, a, b, c)
    BEGIN
      IF sel = "010" THEN x <= a; ELSE x <= '0'; END IF;
      IF sel = "100" THEN y <= b; ELSE y <= '0'; END IF;
      IF sel = "001" THEN z <= c; ELSE z <= '0'; END IF;
    END PROCESS;
Or assign default values first:
    PROCESS (sel, a, b, c)
    BEGIN
      x <= '0'; y <= '0'; z <= '0';
      IF sel = "010" THEN x <= a; END IF;
      IF sel = "100" THEN y <= b; END IF;
      IF sel = "001" THEN z <= c; END IF;
    END PROCESS;
[Figure: three independent logic blocks, each fed by sel(2..0) and one of A, B, C, driving X, Y, and Z directly with no latches]

145 Slide Nested IF Generating Unwanted Latches
Use nested IF statements with care
– e.g. these nested IF statements do not cover all possible conditions (open IF statements), so a latch is created
    PROCESS (ina, inb)
    BEGIN
      IF ina = '1' THEN
        IF inb = '1' THEN
          y <= '1';
        END IF;
      ELSE
        y <= '0';
      END IF;
    END PROCESS;
[Figure: truth table of ina, inb, and the output; for ina = '1', inb = '0' the output is "?"]
– Uncovered cases infer latches
– No default value for objects

146 Slide Nested IF – Unwanted Latches Removed
Using initialization to cover all cases; no latch is inferred
    PROCESS (ina, inb)
    BEGIN
      y <= '0';
      IF ina = '1' THEN
        IF inb = '1' THEN
          y <= '1';
        END IF;
      END IF;
    END PROCESS;
[Figure: truth table and the resulting AND gate of ina and inb driving the output]

147 Slide Case Statements
Case statements usually synthesize more efficiently when mutual exclusivity exists
Define outputs for all cases
– Undefined outputs for any given case generate latches
VHDL already requires all case conditions to be covered
– Use a WHEN OTHERS clause to close undefined cases (if any remain)

148 Slide Case Statement Recommendations Initialize all case outputs or ensure outputs assigned in each case Assign initialized or default values to don’t cares (X) for further optimization, if logic allows

149 Slide Unwanted Latches - Case Statements
Conditions where the output is undetermined:
    output: PROCESS (filter)
    BEGIN
      CASE filter IS
        WHEN idle =>
          nxt <= '0'; first <= '0';    -- sel missing
        WHEN tap1 =>
          sel <= "00"; first <= '1';   -- nxt missing
        WHEN tap2 =>
          sel <= "01"; first <= '0';   -- nxt missing
        WHEN tap3 =>
          sel <= "10";                 -- nxt & first missing
        WHEN tap4 =>
          sel <= "11"; nxt <= '1';     -- first missing
      END CASE;
    END PROCESS output;
– Undetermined output conditions imply memory
– A latch is generated for ALL 3 outputs

150 Slide Latches Removed - Case Statements
Conditions where the output is determined (signals initialized):
    output: PROCESS (filter)
    BEGIN
      first <= '0'; nxt <= '0'; sel <= "00";
      CASE filter IS
        WHEN idle => NULL;
        WHEN tap1 => first <= '1';
        WHEN tap2 => sel <= "01";
        WHEN tap3 => sel <= "10";
        WHEN tap4 => sel <= "11"; nxt <= '1';
      END CASE;
    END PROCESS output;
To remove latches and ensure outputs are never undetermined:
– Use signal initialization at the beginning of the case statement (the case statement then only deals with changes)
– Use don't cares ('-') for the WHEN OTHERS clause, if the design allows (for better logic optimization)
– Manually set the output in each case

151 Slide Variable Declarations
Variables are declared inside a process.
Variable assignment is represented by :=
Variable declaration:
    VARIABLE <name> : <data_type> := <initial_value>;
    VARIABLE temp : STD_LOGIC_VECTOR (7 DOWNTO 0);
Variable assignments are updated immediately
– They do not incur a delay (no delay; temporary storage)

152 Slide Assigning Values to Variables
Variable assignments are represented by :=
    VARIABLE temp : STD_LOGIC_VECTOR (7 DOWNTO 0);
Examples:
– All bits:
    temp := "10101010";
    temp := x"aa";  -- hexadecimal
– VHDL also supports 'o' for octal and 'b' for binary
– Bit-slicing:
    temp (7 DOWNTO 4) := "1010";
– Single bit:
    temp(7) := '1';
Use double quotes (" ") to assign multi-bit values and single quotes (' ') to assign single-bit values.

153 Slide Variable Assignment
    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    ENTITY var IS
      PORT (a, b : IN  STD_LOGIC;
            y    : OUT STD_LOGIC);
    END ENTITY var;
    ARCHITECTURE logic OF var IS
    BEGIN
      PROCESS (a, b)
        VARIABLE c : STD_LOGIC;     -- variable declaration
      BEGIN
        c := a AND b;               -- variable assignment: c is updated immediately
        y <= c;                     -- the new value of c is available for assigning to y
      END PROCESS;
    END ARCHITECTURE logic;
The variable is assigned to a signal so that it synthesizes to a piece of hardware.

154 Slide Signal and Variable Scope
    ARCHITECTURE
      {SIGNAL declarations}       -- declared outside of the process statements (visible to all process statements)
      label1: PROCESS
        {VARIABLE declarations}   -- declared inside the PROCESS statement (locally visible to that process)
      label2: PROCESS
        {VARIABLE declarations}

155 Slide Signals vs. Variables
– Assign: signals use assignee <= assignment; variables use assignee := assignment
– Utility: signals represent circuit interconnect; variables represent local storage
– Scope: signals have architecture scope (they communicate between processes within the architecture); variables have local scope (inside processes)
– Behavior: signals are updated at the end of the current delta cycle (the new value is not immediately available); variables are updated immediately
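
The Behavior row is easiest to see in a small clocked example. This sketch (entity and port names are hypothetical) tries to swap two values on each clock edge, once with signals and once with variables.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: the same "swap" written with signals and with variables.
entity swap_demo is
  port (
    clk          : in  std_logic;
    load         : in  std_logic;  -- '1' loads a_in/b_in, '0' performs the swap
    a_in, b_in   : in  std_logic;
    sig_a, sig_b : out std_logic;  -- results using signals
    var_a, var_b : out std_logic   -- results using variables
  );
end entity swap_demo;

architecture rtl of swap_demo is
  signal sa, sb : std_logic := '0';
begin
  -- Signals: both right-hand sides read the pre-edge values,
  -- because new signal values appear only after the delta cycle.
  sig_proc : process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        sa <= a_in; sb <= b_in;
      else
        sa <= sb;
        sb <= sa;
      end if;
    end if;
  end process sig_proc;

  -- Variables: va is overwritten immediately, so vb receives the
  -- new va (the old vb) and both end up equal.
  var_proc : process (clk)
    variable va, vb : std_logic := '0';
  begin
    if rising_edge(clk) then
      if load = '1' then
        va := a_in; vb := b_in;
      else
        va := vb;
        vb := va;
      end if;
      var_a <= va;
      var_b <= vb;
    end if;
  end process var_proc;

  sig_a <= sa;
  sig_b <= sb;
end architecture rtl;
```

After loading a = 1, b = 0 and clocking once more, the signal version holds 0/1 (swapped), while the variable version holds 0/0.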

156 Slide Variables
May synthesize to hardware depending on use
Advantages vs. signals
– Variables are a more behavioral construct: they don't have a direct correlation to hardware (as signals do) and may lead to more efficient logic
– They simulate more efficiently because they require less memory: signals are not updated immediately, so the simulator must store two values (current and next) for every changing signal, whereas variables are updated immediately, so the simulator stores a single value
Disadvantages vs. signals
– Must be assigned to a signal before the process ends; variables do not represent physical hardware unless equated with a signal
– Must be handled with care: requires a full understanding of assigning values to variables and signals in the same process and of how the dataflow is affected

157 Slide Variables & Latches (Recommendations) Assign an initial value or signal to a variable unless feedback is desired If a variable is not assigned an initial value or signal in a combinatorial process, a latch will be generated –This could cause your design to not function as intended

158 Slide Variable Uninitialized
    ARCHITECTURE logic OF cmb_vari IS
    BEGIN
      PROCESS (i0, i1, a)
        VARIABLE val : INTEGER RANGE 0 TO 1;
      BEGIN
        IF (a = '0') THEN
          val := val;           -- variable used without initialization
        ELSE
          val := val + 1;
        END IF;
        CASE val IS
          WHEN 0 => q <= i0;
          WHEN OTHERS => q <= i1;
        END CASE;
      END PROCESS;
    END ARCHITECTURE logic;
[Figure: val is fed back from its own previous value (a latch) before selecting between i0 and i1]

159 Slide Assign Initial Value to Variable
    ARCHITECTURE logic OF cmb_vari IS
    BEGIN
      PROCESS (i0, i1, a)
        VARIABLE val : INTEGER RANGE 0 TO 1;
      BEGIN
        val := 0;               -- assign an initial value or signal to the variable
        IF (a = '0') THEN
          val := val;
        ELSE
          val := val + 1;
        END IF;
        CASE val IS
          WHEN 0 => q <= i0;
          WHEN OTHERS => q <= i1;
        END CASE;
      END PROCESS;
    END ARCHITECTURE logic;
[Figure: a selects between i0 and i1 with no feedback]

160 Slide Subprograms
VHDL has 2 kinds of subprograms
– FUNCTION: performs a calculation and returns a value
– PROCEDURE: performs a sequence of defined sequential statements
Uses
– Replacing repetitive code
– Enhancing readability
– Breaking processes into executable sections
Defined by means of a subprogram declaration (optional) and a subprogram body
– Subprogram declarations are required if the subprogram is called before the subprogram body is read
Consist of sequential statements (like a process)
May be declared in a process, architecture, or package
– This determines visibility
– When placed in a package, the subprogram declaration goes in the package declaration and the subprogram body goes in the package body (see earlier package example)
Synthesis places restrictions on the use of subprograms

161 Slide - Subprogram Diagram
[Figure: subprograms called from an ARCHITECTURE. A FUNCTION takes IN parameters and returns a RETURN VALUE; a PROCEDURE takes IN, OUT, and INOUT parameters.]

162 Slide Function Definition & Call
Must return a single value based on zero or more inputs
Must be called in an expression
Can be passed parameters of class CONSTANT (default), SIGNAL, or FILE
Class for internal objects must be VARIABLE
Function declaration:
    FUNCTION ones_count (SIGNAL a : STD_LOGIC_VECTOR) RETURN INTEGER;
Function body:
    FUNCTION ones_count (SIGNAL a : STD_LOGIC_VECTOR) RETURN INTEGER IS
      VARIABLE r : INTEGER;
    BEGIN
      r := 0;
      FOR i IN a'RANGE LOOP
        IF a(i) /= '0' THEN
          r := r + 1;
        END IF;
      END LOOP;
      RETURN r;  -- Required
    END FUNCTION ones_count;
Invoking a function:
    total_ones <= ones_count(input) WHEN test_ones = '1';
Note: 'RANGE is a VHDL attribute that returns the range of the object it is applied to (e.g. 7 DOWNTO 0).

163 Slide Procedure Definition & Call
May have inputs, inouts, and outputs
May return zero or multiple outputs
Must be called as a separate sequential statement
Parameters may be of any class
– Inputs are CONSTANT by default
– Outputs/inouts are VARIABLE by default
Procedure declaration:
    PROCEDURE incr_comp (SIGNAL cnt_sig   : INOUT STD_LOGIC_VECTOR;
                         CONSTANT max     : IN    INTEGER;
                         SIGNAL maxed_out : OUT   BOOLEAN);
Procedure body:
    PROCEDURE incr_comp (SIGNAL cnt_sig   : INOUT STD_LOGIC_VECTOR;
                         CONSTANT max     : IN    INTEGER;
                         SIGNAL maxed_out : OUT   BOOLEAN) IS
      -- declare any local objects (i.e. constants, variables, ...)
    BEGIN
      IF cnt_sig >= max THEN
        maxed_out <= TRUE;
      ELSE
        maxed_out <= FALSE;
        cnt_sig <= cnt_sig + 1;
      END IF;
    END PROCEDURE incr_comp;
Invoking a procedure:
    incr_comp (err_cnt, 12, err_cnt_maxed);
    incr_comp (code_cnt, 144, code_cnt_maxed);
Comparing and incrementing a STD_LOGIC_VECTOR against an INTEGER, as in this body, requires an unsigned arithmetic package such as IEEE.NUMERIC_STD_UNSIGNED (VHDL-2008).
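
As a compilable sketch, the package below recasts both subprograms with IEEE numeric_std. Several choices here are assumptions rather than the slides' code: the package name subprog_pkg is hypothetical, ones_count takes a constant-class parameter (so literals can be passed) and counts only '1' bits (the slide version counts every non-'0' value), and incr_comp operates on an unsigned counter instead of a STD_LOGIC_VECTOR.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: declarations go in the package declaration,
-- bodies go in the package body.
package subprog_pkg is
  function ones_count (a : std_logic_vector) return natural;
  procedure incr_comp (signal cnt_sig   : inout unsigned;
                       constant max     : in    natural;
                       signal maxed_out : out   boolean);
end package subprog_pkg;

package body subprog_pkg is
  -- Function: returns a single value; internal objects are variables.
  function ones_count (a : std_logic_vector) return natural is
    variable r : natural := 0;
  begin
    for i in a'range loop
      if a(i) = '1' then
        r := r + 1;
      end if;
    end loop;
    return r;
  end function ones_count;

  -- Procedure: drives two signal outputs; when called from a clocked
  -- process, these assignments infer registers.
  procedure incr_comp (signal cnt_sig   : inout unsigned;
                       constant max     : in    natural;
                       signal maxed_out : out   boolean) is
  begin
    if cnt_sig >= max then
      maxed_out <= true;
    else
      maxed_out <= false;
      cnt_sig   <= cnt_sig + 1;
    end if;
  end procedure incr_comp;
end package body subprog_pkg;
```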

164 Slide Functions vs. Procedures
Functions
– Always execute in zero time: they cannot pause their execution and cannot contain any delay, event, or timing control statements
– Take zero or more input arguments; inputs may not be modified by the function
– Arguments may not be outputs or inouts
– Always return a single value
Procedures
– May execute in non-zero simulation time: they may contain delay, event, or timing control statements
– May have zero or more input, output, or inout arguments
– Modify zero or more values
– Return values by means of parameter arguments

165 Slide Synthesizable Subprograms
Make code more readable and reusable
Two types
– Functions: synthesize to combinatorial logic
– Procedures: can synthesize to combinatorial or sequential logic. Signal assignments in procedures called from clocked processes generate registers; procedures may test for clock edges (this may not be supported by all synthesis tools); they must not contain WAIT statements
Each call generates a separate block of logic
– No logic sharing
– Implement manual resource sharing, if possible (discussed later)

166 Slide Combinational Loops
A common cause of instability
The behavior of the loop depends on the relative propagation delays through the logic
– Propagation delays can change
Simulation tools may not match hardware behavior
    PROCESS (clk, clrn)
    BEGIN
      IF clrn = '0' THEN
        q <= '0';
      ELSIF rising_edge(clk) THEN
        q <= d;
      END IF;
    END PROCESS;
    clrn <= (ctrl1 XOR ctrl2) AND q;
[Figure: register output q feeds combinatorial logic that drives the same register's asynchronous clear clrn]

167 Slide Combinational Loops
All feedback loops should include registers
    PROCESS (clk, clrn)
    BEGIN
      IF clrn = '0' THEN
        q <= '0';
      ELSIF rising_edge(clk) THEN
        q <= d;
      END IF;
    END PROCESS;
    PROCESS (clk)
    BEGIN
      IF rising_edge(clk) THEN
        clrn <= (ctrl1 XOR ctrl2) AND q;
      END IF;
    END PROCESS;
[Figure: the clear logic now drives a second register, whose output clrn clears the first register]

168 Slide Gated Clocks
Can lead to both functional and timing problems
– Clock behavior is subject to both synthesis and placement & routing
– Can be a source of additional clock skew
– Glitches on the clock path are possible
Recommendations:
– Use clock enables for clock-gating functionality
– Use dedicated device resources (e.g. clock control blocks) to gate clocks synchronously and reduce power
– If you must build your own gating logic: use a synchronous gating structure, ensure global clock routing is used for the clock signal, and gate the clock at the source
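
The first recommendation, using a clock enable instead of gating the clock, might look like this sketch (entity and port names are hypothetical): the clock reaches the register untouched, and the gate signal only decides whether the register loads d or holds its value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: "gate" used as a clock enable, not ANDed into clk.
entity dff_ce is
  port (
    clk  : in  std_logic;  -- free-running clock, never gated
    clrn : in  std_logic;  -- active-low asynchronous clear
    gate : in  std_logic;  -- clock enable
    d    : in  std_logic;
    q    : out std_logic
  );
end entity dff_ce;

architecture rtl of dff_ce is
begin
  process (clk, clrn)
  begin
    if clrn = '0' then
      q <= '0';
    elsif rising_edge(clk) then
      if gate = '1' then  -- load only when enabled; otherwise hold
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;
```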

169 Slide Gated Clock Examples
Poor clock gating: active clock edges occurring near gate signal changes may result in glitches
    g_clk <= gate AND clk;
    PROCESS (g_clk, clrn)
    BEGIN
      IF clrn = '0' THEN
        q <= '0';
      ELSIF rising_edge(g_clk) THEN
        q <= d;
      END IF;
    END PROCESS;
Better clock gating: the gate signal is clocked by the falling edge of clk, so gate may only change on the inactive clock edge (use an OR gate when the falling edge is the active clock edge)
    PROCESS (clk)
    BEGIN
      IF falling_edge(clk) THEN
        sgate <= gate;
      END IF;
    END PROCESS;
    g_clk <= sgate AND clk;
    PROCESS (g_clk, clrn)
    BEGIN
      IF clrn = '0' THEN
        q <= '0';
      ELSIF rising_edge(g_clk) THEN
        q <= d;
      END IF;
    END PROCESS;

170 Slide How Many Registers?
    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    ENTITY reg1 IS
      PORT (d   : IN  STD_LOGIC;
            clk : IN  STD_LOGIC;
            q   : OUT STD_LOGIC);
    END ENTITY reg1;
    ARCHITECTURE logic OF reg1 IS
      SIGNAL a, b : STD_LOGIC;
    BEGIN
      PROCESS (clk)
      BEGIN
        IF rising_edge(clk) THEN
          a <= d;
          b <= a;
          q <= b;
        END IF;
      END PROCESS;
    END ARCHITECTURE logic;

171 Slide How Many Registers?
Signal assignments inside the IF-THEN statement that checks the clock condition infer registers: three registers (d to a, a to b, b to q).
[Figure: three D flip-flops in series, d → a → b → q, all clocked by clk]

172 Slide How Many Registers? (Signal Assignment Moved)
    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    ENTITY reg2 IS
      PORT (d   : IN  STD_LOGIC;
            clk : IN  STD_LOGIC;
            q   : OUT STD_LOGIC);
    END ENTITY reg2;
    ARCHITECTURE logic OF reg2 IS
      SIGNAL a, b : STD_LOGIC;
    BEGIN
      PROCESS (clk)
      BEGIN
        IF rising_edge(clk) THEN
          a <= d;
          b <= a;
        END IF;
      END PROCESS;
      q <= b;
    END ARCHITECTURE logic;

173 Slide How Many Registers?
The signal b to signal q assignment is no longer edge-sensitive because it is not inside the IF-THEN statement that checks the clock condition: two registers, with b wired directly to q.
[Figure: two D flip-flops in series, d → a → b, with b driving q]

174 Slide How Many Registers? (Signals Changed to Variables)
    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    ENTITY reg3 IS
      PORT (d   : IN  STD_LOGIC;
            clk : IN  STD_LOGIC;
            q   : OUT STD_LOGIC);
    END ENTITY reg3;
    ARCHITECTURE logic OF reg3 IS
    BEGIN
      PROCESS (clk)
        VARIABLE a, b : STD_LOGIC;
      BEGIN
        IF rising_edge(clk) THEN
          a := d;
          b := a;
          q <= b;
        END IF;
      END PROCESS;
    END ARCHITECTURE logic;

175 Slide How Many Registers?
Variable assignments are updated immediately; signal assignments are updated on the clock edge. The result is a single register from d to q.
[Figure: one D flip-flop, d → q, clocked by clk]

176 Slide Inferring Logic Functions
Using behavioral modeling to describe logic blocks
Synthesis tools recognize the description and insert equivalent logic functions (e.g. megafunctions)
– Functions are typically pre-optimized for utilization or performance, compared with general-purpose, functionally equivalent logic
– Use the synthesis tool's templates (if available) as a starting point
– Use the synthesis tool's graphic display to verify logic recognition
Makes code vendor-independent

177 Slide Logic Inference Example
The synthesis tool sees:
    PROCESS (clock)
    BEGIN
      IF rising_edge(clock) THEN
        IF wren = '1' THEN
          mem(conv_integer(address)) <= data;
        END IF;
        q <= mem(conv_integer(address));
      END IF;
    END PROCESS;
and replaces it with an Altera megafunction and/or library cells.
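
A complete, self-contained version of the inferred RAM might look like the following sketch. It swaps conv_integer (from the non-standard std_logic_unsigned package) for numeric_std's to_integer(unsigned(...)); the entity name ram_sp and the generic sizes are assumptions. Whether a given tool maps it to an Altera megafunction depends on the tool and the target device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM written for inference.
entity ram_sp is
  generic (
    DATA_WIDTH : positive := 8;
    ADDR_WIDTH : positive := 5
  );
  port (
    clock   : in  std_logic;
    wren    : in  std_logic;
    address : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    data    : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    q       : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity ram_sp;

architecture rtl of ram_sp is
  type mem_t is array (0 to 2 ** ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal mem : mem_t;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      if wren = '1' then
        mem(to_integer(unsigned(address))) <= data;  -- synchronous write
      end if;
      -- Registered read; mem is a signal, so this sees the pre-edge contents.
      q <= mem(to_integer(unsigned(address)));
    end if;
  end process;
end architecture rtl;
```

Because q is read from the mem signal in the same clocked process, a read at the address being written returns the old contents.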

178 Slide Quartus II VHDL Templates
Insert Template (Edit menu)
Preview window: edit before inserting, and save as a user template

179 Slide Quartus II Software RTL Viewer
Graphically represents the results of synthesis
[Figure: RTL Viewer window with toolbar, hierarchy list, and schematic view]
Starting the RTL Viewer:
1. Run Analysis & Elaboration (Processing menu or Tasks window), or any processing that performs elaboration
2. Open the RTL Viewer (Tools menu or Tasks window); it displays the last successful analysis

180 Slide Inferring Common Functions
– Latches
– Registers
– Counters
– Tri-states
– Memory

181 Slide Latch Inference – "Wanted" Latch
The sensitivity list includes both inputs
    LIBRARY IEEE;
    USE IEEE.std_logic_1164.ALL;
    ENTITY latch IS
      PORT (data : IN  std_logic;
            gate : IN  std_logic;
            q    : OUT std_logic);
    END ENTITY latch;
    ARCHITECTURE behavior OF latch IS
    BEGIN
      label_1: PROCESS (data, gate)
      BEGIN
        IF gate = '1' THEN
          q <= data;
        END IF;
      END PROCESS label_1;
    END ARCHITECTURE behavior;
What happens if gate = '0'? Implicit memory and feedback: the latch is level sensitive, not edge sensitive.
[Figure: the latch as shown in the RTL Viewer and in the Technology Viewer]

182 Slide DFF Using rising_edge
rising_edge: an IEEE function defined in the std_logic_1164 package
– Specifies that the signal value must make a 0 to 1 transition
– X or Z to 1 transitions are not allowed
    LIBRARY IEEE;
    USE IEEE.std_logic_1164.ALL;
    ENTITY dff_b IS
      PORT (d   : IN  std_logic;
            clk : IN  std_logic;
            q   : OUT std_logic);
    END ENTITY dff_b;
    ARCHITECTURE behavior OF dff_b IS
    BEGIN
      PROCESS (clk)
      BEGIN
        IF rising_edge(clk) THEN
          q <= d;
        END IF;
      END PROCESS;
    END ARCHITECTURE behavior;
[Figure: D flip-flop with input d, clock clk, and output q]

183 Slide DFF Using clk'event and clk='1'
– clk is the signal name (any name)
– 'event is a VHDL attribute, specifying that there needs to be a change in signal value
– clk='1' means positive-edge triggered
    LIBRARY IEEE;
    USE IEEE.STD_LOGIC_1164.ALL;
    ENTITY dff_a IS
      PORT (d   : IN  std_logic;
            clk : IN  std_logic;
            q   : OUT std_logic);
    END ENTITY dff_a;
    ARCHITECTURE behavior OF dff_a IS
    BEGIN
      PROCESS (clk)
      BEGIN
        IF (clk'event AND clk = '1') THEN
          q <= d;
        END IF;
      END PROCESS;
    END ARCHITECTURE behavior;
[Figure: D flip-flop with input d, clock clk, and output q]

184 Slide Recommended DFF Inference Use the rising_edge function for consistent simulation –‘X’ to ‘1’ transitions trigger the DFF when clk’event and clk=‘1’ is used, but not when rising_edge is used Both clk’event and clk=‘1’ & rising_edge produce the same synthesis Must use std_logic_1164 package for rising_edge or falling_edge functions

185 Slide Secondary Control Signals
Register control signals vary between FPGA and CPLD families
–Clear, preset, load, clock enable, etc.
Avoid using signals not available in the target architecture
–The design's functionality is then supported by creating extra logic cells
–Less efficient, possibly slower results

186 Slide DFF with Secondary Control Signals
ARCHITECTURE behavior OF dff IS
BEGIN
  PROCESS (clk, aclr, apre, aload, adata)
  BEGIN
    IF aclr = '1' THEN
      q <= '0';
    ELSIF apre = '1' THEN
      q <= '1';
    ELSIF aload = '1' THEN
      q <= adata;
    ELSIF rising_edge(clk) THEN
      IF ena = '1' THEN
        IF sclr = '1' THEN
          q <= '0';
        ELSIF sload = '1' THEN
          q <= sdata;
        ELSE
          q <= d;
        END IF;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
–This is how to implement all asynchronous and synchronous control signals for the Altera PLD registers
–Conditions outside of the rising_edge statement are asynchronous
–Conditions inside of the rising_edge statement are synchronous
–Remove signals not required by your logic
–Synchronous controls are not included in the sensitivity list

187 Slide Incorrect Control Signal Priority
ARCHITECTURE behavior OF dff_clr IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF sclr = '1' THEN
        q <= '0';
      ELSIF ena = '1' THEN
        q <= d;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
–2 control signals
–Considerations
–Do the registers in the hardware have both ports available?
–How does the hardware behave? Does clear or enable have priority?
–Sync clear has priority over enable in the code
–Enable has priority over sync clear in silicon
–Additional logic is needed to force the code's priority
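One way to make the code's priority match the silicon (enable above synchronous clear, as in the priority list on slide 188) is to test ena first. This is a sketch; the entity name dff_ena_sclr is hypothetical, and it changes the behavior: sclr now clears q only while ena = '1'. If the design really needs clear to win over enable, keep the original code and accept the extra logic.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical DFF whose synchronous clear is qualified by the enable,
-- matching the hardware priority order ena > sclr.
ENTITY dff_ena_sclr IS
  PORT ( clk, ena, sclr, d : IN  std_logic;
         q                 : OUT std_logic);
END ENTITY dff_ena_sclr;

ARCHITECTURE behavior OF dff_ena_sclr IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF ena = '1' THEN       -- enable checked first
        IF sclr = '1' THEN    -- clear only acts while enabled
          q <= '0';
        ELSE
          q <= d;
        END IF;
      END IF;                 -- ena = '0': q holds its value
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
```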

188 Slide Control Signal Priority
1. Asynchronous clear (aclr)
2. Asynchronous preset (pre)
3. Asynchronous load (aload)
4. Enable (ena)
5. Synchronous clear (sclr)
6. Synchronous load (sload)
Same for all Altera FPGA families
–Not all signals are supported by all families
Re-ordering generates extra logic

189 Slide Incorrect Control Logic
PROCESS (clk, clr_n)
BEGIN
  IF clr_n = '0' THEN
    x <= '0';
  ELSIF rising_edge(clk) THEN
    x <= a;
    y <= b;
  END IF;
END PROCESS;
–y is not included in the clr_n condition
–What behaviour is specified for y when clr_n is asserted?
–While clr_n clears x, it acts like an enable for y
[Figure: x register with clr_n on its CLRN input; y register with clr_n driving its ENA input]
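A sketch of one fix, assuming y really is meant to be an ordinary register with no clear: give each register its own process so that clr_n only reaches x. The entity name clr_split is hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical fix: x keeps its asynchronous clear, y becomes a plain DFF.
ENTITY clr_split IS
  PORT ( clk, clr_n, a, b : IN  std_logic;
         x, y             : OUT std_logic);
END ENTITY clr_split;

ARCHITECTURE behavior OF clr_split IS
BEGIN
  reg_x : PROCESS (clk, clr_n)
  BEGIN
    IF clr_n = '0' THEN          -- asynchronous active-low clear
      x <= '0';
    ELSIF rising_edge(clk) THEN
      x <= a;
    END IF;
  END PROCESS reg_x;

  reg_y : PROCESS (clk)          -- clr_n no longer acts as an enable for y
  BEGIN
    IF rising_edge(clk) THEN
      y <= b;
    END IF;
  END PROCESS reg_y;
END ARCHITECTURE behavior;
```

If y should also be cleared, the alternative is to add y <= '0' to the clr_n branch of the original process instead.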

190 Slide DFF with Clock Enable
ARCHITECTURE behavior OF dff_all IS
  SIGNAL ena : std_logic;
BEGIN
  PROCESS (clk, clr_n)
  BEGIN
    IF clr_n = '0' THEN
      q <= '0';
    ELSIF rising_edge(clk) THEN
      IF ena = '1' THEN
        q <= d;
      END IF;
    END IF;
  END PROCESS;
  ena <= (ena_a OR ena_b) XOR ena_c;
END ARCHITECTURE behavior;
[Figure: DFF with inputs d, ena, clr_n and clk and output q]
To ensure that this is synthesised using DFFE primitives (DFF with enable):
–Place the enable statement directly after the rising-edge statement
–Place enable expressions in a separate process or concurrent assignment
–If the synthesis tool does not recognize this as an enable, it will be implemented using extra LUTs

191 Slide Shift Registers
ARCHITECTURE behavior OF shift IS
  SIGNAL qi : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
  PROCESS (clock, aclr)
  BEGIN
    IF aclr = '1' THEN
      qi <= (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
      IF enable = '1' THEN
        qi <= qi(6 DOWNTO 0) & shiftin;  -- shift function (& = concatenation)
      END IF;
    END IF;
  END PROCESS;
  q <= qi;
END ARCHITECTURE behavior;
–Shift register with serial input, parallel output, asynchronous clear and enable, which shifts left
–Add or remove secondary controls as for the DFF

192 Slide Basic Counter
PROCESS (clock, aclr)
  VARIABLE cnt : std_logic_vector (7 DOWNTO 0);
BEGIN
  IF aclr = '1' THEN
    cnt := (OTHERS => '0');
  ELSIF rising_edge(clock) THEN
    cnt := cnt + 1;  -- count function; "+" on std_logic_vector needs IEEE.std_logic_unsigned
  END IF;
  q <= cnt;
END PROCESS;
–Binary up counter with asynchronous clear
–Add or remove secondary controls as for the DFF
Note: These examples use a VARIABLE for the count, but a SIGNAL could have been used just as easily

193 Slide Counter Using Integers
PROCESS (clock, aclr)
  VARIABLE cnt : INTEGER RANGE 0 TO 255;
BEGIN
  IF aclr = '1' THEN
    cnt := 0;
  ELSIF rising_edge(clock) THEN
    IF cnt = 255 THEN
      cnt := 0;
    ELSE
      cnt := cnt + 1;
    END IF;
  END IF;
  q <= conv_std_logic_vector(cnt, 8);
END PROCESS;
–The range determines the bit width of the counter
–If the range is left out, the counter defaults to at least 32 bits
–Rollover must be handled manually
–Integers do not roll over automatically (unlike std_logic_vector)
–If the rollover code is missing, simulation stops with an out-of-range error (synthesis still produces a correct counter)
conv_std_logic_vector(value, width) converts an integer to a std_logic_vector; it is found in the std_logic_arith package
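std_logic_arith is a vendor (Synopsys) package rather than an IEEE standard. A sketch of the same integer counter using IEEE numeric_std instead, where std_logic_vector(to_unsigned(cnt, 8)) replaces conv_std_logic_vector(cnt, 8); the entity int_counter and its port declarations are assumptions added to make the process self-contained.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

-- Hypothetical wrapper around the slide's integer counter process.
ENTITY int_counter IS
  PORT ( clock, aclr : IN  std_logic;
         q           : OUT std_logic_vector(7 DOWNTO 0));
END ENTITY int_counter;

ARCHITECTURE behavior OF int_counter IS
BEGIN
  PROCESS (clock, aclr)
    VARIABLE cnt : INTEGER RANGE 0 TO 255;  -- range sets an 8-bit counter
  BEGIN
    IF aclr = '1' THEN
      cnt := 0;
    ELSIF rising_edge(clock) THEN
      IF cnt = 255 THEN       -- manual rollover: integers do not wrap
        cnt := 0;
      ELSE
        cnt := cnt + 1;
      END IF;
    END IF;
    -- numeric_std replacement for conv_std_logic_vector(cnt, 8)
    q <= std_logic_vector(to_unsigned(cnt, 8));
  END PROCESS;
END ARCHITECTURE behavior;
```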

194 Slide Up / Down Counter
PROCESS (clock, aclr)
  VARIABLE cnt : std_logic_vector (7 DOWNTO 0);
  VARIABLE direction : INTEGER RANGE -1 TO 1;
BEGIN
  IF aclr = '1' THEN
    cnt := (OTHERS => '0');
  ELSIF rising_edge(clock) THEN
    IF updown = '1' THEN
      direction := 1;
    ELSE
      direction := -1;
    END IF;
    cnt := cnt + direction;
  END IF;
  q <= cnt;
END PROCESS;

195 Slide Modulus 200 Counter
PROCESS (clock, aclr)
  VARIABLE cnt : std_logic_vector (7 DOWNTO 0);
  CONSTANT modulus : INTEGER := 200;
BEGIN
  IF aclr = '1' THEN
    cnt := (OTHERS => '0');
  ELSIF rising_edge(clock) THEN
    IF cnt = modulus - 1 THEN
      cnt := (OTHERS => '0');
    ELSE
      cnt := cnt + 1;
    END IF;
  END IF;
  q <= cnt;
END PROCESS;

196 Slide Modulus 200 Counter Using Integers
PROCESS (clock, aclr)
  VARIABLE cnt : INTEGER RANGE 0 TO 199;
  CONSTANT modulus : INTEGER := 200;
BEGIN
  IF aclr = '1' THEN
    cnt := 0;
  ELSIF rising_edge(clock) THEN
    IF cnt = modulus - 1 THEN
      cnt := 0;
    ELSE
      cnt := cnt + 1;
    END IF;
  END IF;
  q <= conv_std_logic_vector(cnt, 8);
END PROCESS;
–Changing the range alone does not make a modulus-200 counter
–A range of 0 TO 199 produces the same logic as 0 TO 255
–The synthesis tool uses the range only to define the bit width; it does not build decode logic for the synchronous wrap to 0
–The wrap logic must be defined explicitly

197 Slide Integers vs. Standard Logic Arrays
Integers
–Represent numbers only
–Are more behavioral than standard logic: synthesis tools are freer in generating the resulting logic, which may produce less logic
–Use less storage space during processing, so they simulate faster
–Always use RANGE to constrain integers for synthesis (otherwise they default to 32 bits)
–Use for internal calculations and for describing internal logic
Standard Logic Arrays
–Represent an array of signals, each of which can take any of the 9 std_logic values
–Can be "sliced"
–Are more structural than integers; the structure must be optimized down into efficient logic
–Can be set to bus widths wider than 32 bits
–Automatically roll over during calculations
–Use for I/O ports and the data path

198 Slide Tri-states
IEEE defines the 'Z' value in the STD_LOGIC package
–Simulation: behaves like a high-impedance state
–Synthesis: converted to tri-state buffers
Altera devices have tri-state buffers only in I/O cells
–Benefits: eliminates possible bus contention; location of internal logic is a non-issue; cost savings (you don't pay for unused tri-state buffers, and less testing of devices is required)
–Internal tri-states must be converted to combinatorial logic
–Complex output enables may cause errors or inefficient logic

199 Slide Inferring Tri-states Correctly
Process statement:
ARCHITECTURE behavior OF tri2 IS
BEGIN
  driver1 : PROCESS (ena, in_sig)
  BEGIN
    IF (ena = '1') THEN
      out_sig <= in_sig;
    ELSE
      out_sig <= 'Z';
    END IF;
  END PROCESS;
END ARCHITECTURE behavior;
Conditional signal assignment:
ARCHITECTURE behavior OF tri1 IS
BEGIN
  out_sig <= in_sig WHEN ena = '1' ELSE 'Z';
END ARCHITECTURE behavior;
–Only 1 assignment to the output signal
–Uses the tri-state buffer in the I/O cell
[Figure: device I/O cell; in_sig drives out_sig through a tri-state buffer controlled by ena]

200 Slide Inferring Tri-states Incorrectly
ARCHITECTURE behavior OF tri3 IS
BEGIN
  out_sig <= in_sig1 WHEN ena1 = '1' ELSE 'Z';
  out_sig <= in_sig2 WHEN ena2 = '1' ELSE 'Z';
END ARCHITECTURE behavior;
–2 assignments to the same signal are not allowed in synthesis unless 'Z' is used
–The output enable logic is emulated in LEs
–Simulation and synthesis do not match
[Figure: APEX II device; ena1, ena2, in_sig1 and in_sig2 are combined in device logic before a single I/O cell drives out_sig]
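A sketch of one way to describe this so that only one tri-state buffer is inferred, assuming at most one enable is meant to be active at a time (if both are, ena1 wins in this version): select the data in core logic and drive the pin with a single conditional assignment. The entity name tri_fixed is hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical rewrite of tri3 with one driver for out_sig.
ENTITY tri_fixed IS
  PORT ( ena1, ena2, in_sig1, in_sig2 : IN  std_logic;
         out_sig                      : OUT std_logic);
END ENTITY tri_fixed;

ARCHITECTURE behavior OF tri_fixed IS
  SIGNAL data_sel : std_logic;  -- data chosen in core logic (a multiplexer)
  SIGNAL oe       : std_logic;  -- single combined output enable
BEGIN
  data_sel <= in_sig1 WHEN ena1 = '1' ELSE in_sig2;
  oe       <= ena1 OR ena2;
  -- one assignment to out_sig, so one tri-state buffer in the I/O cell
  out_sig  <= data_sel WHEN oe = '1' ELSE 'Z';
END ARCHITECTURE behavior;
```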

201 Slide Bidirectional Pins
ENTITY bidir_pin IS
  PORT ( bidir : INOUT std_logic;
         oe, clk, from_core : IN std_logic;
         to_core : OUT std_logic);
END ENTITY bidir_pin;
ARCHITECTURE behavior OF bidir_pin IS
BEGIN
  bidir <= from_core WHEN oe = '1' ELSE 'Z';  -- bidir as a tri-stated output
  to_core <= bidir;                           -- bidir as an input
END ARCHITECTURE behavior;
–Declare the pin with direction INOUT
–Use the INOUT port as both an input and a tri-stated output
–The input side is always "on"
–For registered bidirectional I/O, use a separate process to infer the registers
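Following the last bullet, a sketch of a registered bidirectional pin. The entity name bidir_reg and the choice to register all three paths (output enable, output data and input data) are assumptions for illustration; the registers live in their own clocked process while the tri-state stays a single concurrent assignment.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical registered version of bidir_pin.
ENTITY bidir_reg IS
  PORT ( bidir              : INOUT std_logic;
         oe, clk, from_core : IN    std_logic;
         to_core            : OUT   std_logic);
END ENTITY bidir_reg;

ARCHITECTURE behavior OF bidir_reg IS
  SIGNAL oe_reg, out_reg, in_reg : std_logic := '0';
BEGIN
  -- separate clocked process infers the I/O registers
  io_regs : PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      oe_reg  <= oe;         -- registered output enable
      out_reg <= from_core;  -- registered output data
      in_reg  <= bidir;      -- registered input data
    END IF;
  END PROCESS io_regs;

  -- the tri-state itself stays a single concurrent assignment
  bidir   <= out_reg WHEN oe_reg = '1' ELSE 'Z';
  to_core <= in_reg;
END ARCHITECTURE behavior;
```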

202 Slide Memory
Synthesis tools have different capabilities for recognizing memories
Synthesis tools are sensitive to certain coding styles in order to recognize memories
–Usually described in the tool documentation
Tools and target devices may have limitations in architecture implementation
–Synchronous inputs only
–Limitations in clocking schemes
–Memory size limitations
–Read-during-write support
Must declare an array data type to hold memory values
Recommendation: Read Quartus II Handbook, Volume 1, Chapter 6 for more information on inferring memories and read-during-write behavior

203 Slide Inferred Single-Port Memory (1)
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem : mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
    END IF;
  END PROCESS;
  q <= mem(conv_integer(address));
END ARCHITECTURE logic;
–Code describes a 64 x 8 RAM with synchronous write and asynchronous read
–Cannot be implemented in Altera embedded RAM due to the asynchronous read
–Uses general logic and registers
–conv_integer is a function found in the std_logic_unsigned (or std_logic_signed) package
–Use TO_INTEGER (e.g. TO_INTEGER(unsigned(address))) if using the numeric_std package

204 Slide Inferred Single-Port Memory (2)
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem : mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
      q <= mem(conv_integer(address));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
–Code describes a 64 x 8 RAM with synchronous write and synchronous read
–Old-data read-during-write behaviour
–Memory is read in the same process/cycle as the memory write
–Check the target architecture for support, as unsupported features are built using LUTs/registers

205 Slide Inferred Single-Port Memory (3)
ARCHITECTURE logic OF sp_ram IS
  SUBTYPE byte IS std_logic_vector (7 DOWNTO 0);   -- subtype used for the vector width
  TYPE mem_type IS ARRAY (0 TO 63) OF byte;
  SIGNAL mem : mem_type;
  SIGNAL rdaddr_reg : std_logic_vector (address'RANGE);  -- same width as the address port
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
      rdaddr_reg <= address;
    END IF;
  END PROCESS;
  q <= mem(conv_integer(rdaddr_reg));
END ARCHITECTURE logic;
–Same memory with new-data read-during-write behaviour
–Read performed by a separate concurrent statement/process
–Check the target architecture for support
–Use the ramstyle attribute set to "no_rw_check" to disable checking and prevent extra logic generation

206 Slide Simple Dual-Port, Single-Clock Memory
ARCHITECTURE logic OF sdp_ram IS
  TYPE mem_type IS ARRAY (63 DOWNTO 0) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem : mem_type;
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (wren = '1') THEN
        mem(conv_integer(wraddress)) <= data;
      END IF;
      q <= mem(conv_integer(rdaddress));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
–Code describes a simple dual-port (separate read and write addresses) 64 x 8 RAM with a single clock
–Code implies old-data read-during-write behaviour
–New-data support in simple dual-port mode requires additional RAM bypass logic

207 Slide True Dual-Port, Dual-Clock Memory
ARCHITECTURE logic OF dp_dc_ram IS
  TYPE mem_type IS ARRAY (63 DOWNTO 0) OF std_logic_vector (7 DOWNTO 0);
  -- a shared variable lets both clocked processes write the same memory
  -- (a SIGNAL written from two processes would have two conflicting drivers)
  SHARED VARIABLE mem : mem_type;
BEGIN
  port_a : PROCESS (clock_a)
  BEGIN
    IF rising_edge(clock_a) THEN
      IF (wren_a = '1') THEN
        mem(conv_integer(address_a)) := data_a;
      END IF;
      q_a <= mem(conv_integer(address_a));  -- read after write: new data
    END IF;
  END PROCESS port_a;
  port_b : PROCESS (clock_b)
  BEGIN
    IF rising_edge(clock_b) THEN
      IF (wren_b = '1') THEN
        mem(conv_integer(address_b)) := data_b;
      END IF;
      q_b <= mem(conv_integer(address_b));  -- read after write: new data
    END IF;
  END PROCESS port_b;
END ARCHITECTURE logic;
–Code describes a true dual-port (two individual addresses) 64 x 8 RAM
–May not be supported in all synthesis tools
–New-data same-port read-during-write behaviour shown
–Mixed-port behaviour is undefined with multiple clocks

208 Slide Initializing Memory Contents Using Files
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  SIGNAL mem : mem_type;
  ATTRIBUTE ram_init_file : STRING;
  ATTRIBUTE ram_init_file OF mem : SIGNAL IS "init_file_name.hex";
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF (we = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
      q <= mem(conv_integer(address));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
–Use a VHDL attribute to assign initial contents to an inferred memory
–Store the initialization data as a .HEX or .MIF file
–The contents of the initialization file are downloaded into the FPGA during configuration

209 Slide Initializing Memory Using Default
ARCHITECTURE logic OF sp_ram IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector (7 DOWNTO 0);
  FUNCTION init_ram RETURN mem_type IS
    VARIABLE mem_out : mem_type;
  BEGIN
    FOR i IN 0 TO 63 LOOP  -- loop used to assign each memory address
      mem_out(i) := conv_std_logic_vector(i, 8);
    END LOOP;
    RETURN mem_out;
  END FUNCTION init_ram;
  SIGNAL mem : mem_type := init_ram;  -- default initial value for memory
BEGIN
–Assign a default value when declaring the memory
–This example uses a function to establish the memory values
–Recommendation: use a function when initializing memory with patterned data
–A constant can also be used (see ROM example)
–Recommendation: use a constant when initializing memory with non-patterned data or a single value (e.g. OTHERS => " ")
–A MIF file is automatically generated during synthesis because of the initialization

210 Slide Unsupported Control Signals
e.g. clearing RAM contents with reset
BEGIN
  PROCESS (clock, reset)
  BEGIN
    IF reset = '1' THEN
      mem(conv_integer(address)) <= (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
      IF (we = '1') THEN
        mem(conv_integer(address)) <= data;
      END IF;
    END IF;
  END PROCESS;
  q <= mem(conv_integer(address));
END ARCHITECTURE logic;
–Memory content cannot be cleared with a reset
–Synthesizes to general logic resources
–Recommendations
1. Avoid reset checking in RAM read or write processes
2. Be wary of other control signals (e.g. clock enable) until validated with the target architecture
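If the design needs a known output after reset, one sketch (assuming a cleared output is enough and the stored contents may survive the reset) keeps the RAM process free of any reset and applies the reset to a separate register after the read. This adds one clock cycle of read latency. The entity ram_rst_out, its ports and the rd_data signal are hypothetical; numeric_std's to_integer replaces conv_integer.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

-- Hypothetical 64 x 8 RAM whose reset clears only the output register.
ENTITY ram_rst_out IS
  PORT ( clock, reset, we : IN  std_logic;
         address          : IN  std_logic_vector(5 DOWNTO 0);
         data             : IN  std_logic_vector(7 DOWNTO 0);
         q                : OUT std_logic_vector(7 DOWNTO 0));
END ENTITY ram_rst_out;

ARCHITECTURE logic OF ram_rst_out IS
  TYPE mem_type IS ARRAY (0 TO 63) OF std_logic_vector(7 DOWNTO 0);
  SIGNAL mem     : mem_type;
  SIGNAL rd_data : std_logic_vector(7 DOWNTO 0);
BEGIN
  -- RAM process: no reset, synchronous write, synchronous (old-data) read
  ram_p : PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      IF we = '1' THEN
        mem(to_integer(unsigned(address))) <= data;
      END IF;
      rd_data <= mem(to_integer(unsigned(address)));
    END IF;
  END PROCESS ram_p;

  -- separate output register carries the reset instead of the memory
  out_p : PROCESS (clock, reset)
  BEGIN
    IF reset = '1' THEN
      q <= (OTHERS => '0');
    ELSIF rising_edge(clock) THEN
      q <= rd_data;
    END IF;
  END PROCESS out_p;
END ARCHITECTURE logic;
```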

211 Slide Inferred ROM (Case Statement)
SIGNAL q : std_logic_vector (6 DOWNTO 0);
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      CASE address IS
        WHEN "0000" => q <= " ";
        WHEN "0001" => q <= " ";
        WHEN "0010" => q <= " ";
        WHEN "0011" => q <= " ";
        WHEN "0100" => q <= " ";
        …
        WHEN "1101" => q <= " ";
        WHEN "1110" => q <= " ";
        WHEN "1111" => q <= " ";
        WHEN OTHERS => q <= "XXXXXXX";
      END CASE;
    END IF;
  END PROCESS;
–Automatically converted to ROM
–Tools generate the ROM using embedded RAM and an initialization file
–Requires a constant explicitly defined for each choice in the CASE statement
–May use the romstyle synthesis attribute to control implementation
–Like RAMs, the address or output must be registered to implement in Altera embedded RAM

212 Slide Inferred ROM (Constant)
ARCHITECTURE logic OF rom16x7 IS
  TYPE rom_type IS ARRAY (0 TO 15) OF STD_LOGIC_VECTOR (6 DOWNTO 0);
  CONSTANT rom : rom_type := (" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", OTHERS => " ");
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      qa <= rom(CONV_INTEGER(addr_a));
      qb <= rom(CONV_INTEGER(addr_b));
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
–Needs 1 constant value for each ROM address
–Example shows dual-port access
–The type and constant declarations may be placed in a package for re-use
–Alternative: create and use an initialization function (see RAM example)

213 Slide State Machine Coding
An enumerated data type is used to define the different states in the state machine
–Using constants for states may not be recognized as a state machine
Declare one or two signals (current state and, optionally, next state) of the enumerated type
Use a CASE statement, rather than IF-THEN, for the next-state logic
–Synthesis tools recognize CASE statements for implementing state machines
Use CASE or IF-THEN-ELSE for the output logic
TYPE state_type IS (idle, fill, heat_w, wash, drain);
SIGNAL current_state, next_state : state_type;

214 Slide Quartus II Software State Machine Viewer
Use to verify correct coding of the state machine
Tools menu → State Machine Viewer
–Use the drop-down list to select the state machine
–State flow diagram
–State transition/encoding table
–Highlighting a state in the state transition table highlights the corresponding state in the state flow diagram

215 Slide State Declaration
ENTITY wm IS
  PORT ( clk, reset, door_closed, full : IN std_logic;
         heat_demand, done, empty : IN std_logic;
         water, spin, heat, pump : OUT std_logic);
END ENTITY wm;
ARCHITECTURE behave OF wm IS
  TYPE state_type IS (idle, fill, heat_w, wash, drain);
  SIGNAL current_state, next_state : state_type;
BEGIN
[Figure: washing-machine state diagram]
–State outputs: IDLE (water = 0, spin = 0, heat = 0, pump = 0); FILL (water = 1, spin = 0, heat = 0, pump = 0); HEAT_W (water = 0, spin = 1, heat = 1, pump = 0); WASH (water = 0, spin = 1, heat = 0, pump = 0); DRAIN (water = 0, spin = 1, heat = 0, pump = 1)
–Transitions: IDLE to FILL when door_closed = 1; FILL to HEAT_W when full = 1; HEAT_W to WASH when heat_demand = 0; WASH to HEAT_W when heat_demand = 1; WASH to DRAIN when done = 1; DRAIN to IDLE when empty = 1

216 Slide Next State Logic
PROCESS (clk, reset)  -- sequential state transitions
BEGIN
  IF reset = '1' THEN
    current_state <= idle;
  ELSIF rising_edge(clk) THEN
    current_state <= next_state;
  END IF;
END PROCESS;
PROCESS (current_state, door_closed, full, heat_demand, done, empty)  -- combinatorial next-state logic
BEGIN
  next_state <= current_state;  -- default next state is the current state
  CASE current_state IS
    WHEN idle =>
      IF door_closed = '1' THEN next_state <= fill; END IF;
    WHEN fill =>
      IF full = '1' THEN next_state <= heat_w; END IF;
    WHEN heat_w =>
      IF heat_demand = '0' THEN next_state <= wash; END IF;
    WHEN wash =>
      IF heat_demand = '1' THEN next_state <= heat_w;
      ELSIF done = '1' THEN next_state <= drain;
      END IF;
    WHEN drain =>
      IF empty = '1' THEN next_state <= idle; END IF;
  END CASE;
END PROCESS;
[Figure: washing-machine state diagram, as on slide 215]

217 Slide Combinatorial Outputs
PROCESS (current_state)
BEGIN
  water <= '0';  -- default output conditions
  spin  <= '0';
  heat  <= '0';
  pump  <= '0';
  CASE current_state IS
    WHEN idle   => NULL;
    WHEN fill   => water <= '1';
    WHEN heat_w => spin <= '1'; heat <= '1';
    WHEN wash   => spin <= '1';
    WHEN drain  => spin <= '1'; pump <= '1';
  END CASE;
END PROCESS;
–The output logic is a function of the current state only
[Figure: washing-machine state diagram, as on slide 215]

218 Slide State Machine Encoding Styles
Encoding styles to compare: Binary, Gray-Code, One-Hot, Custom
States: Idle ?, Fill ?, Heat_w ?, Wash ?, Drain ?
Quartus II default encoding styles for Altera devices
–One-hot encoding for look-up table (LUT) devices: the architecture has lower fan-in per cell and an abundance of registers
–Binary (minimal-bit) or Gray-code encoding for product-term devices: the architecture has fewer registers and greater fan-in

219 Slide Quartus II Encoding Style
Apply the assignment to the state variable
Options: One-Hot, Gray, Minimal Bits, Sequential, User-Encoded, Johnson

220 Slide Undefined States
Noise and spurious events in hardware can cause state machines to enter undefined states
If state machines do not account for undefined states, they can cause mysterious "lock-ups" in hardware
Good engineering practice is to consider these states
To account for undefined states
–Explicitly code for them (manual)
–Use the "safe" synthesis constraint (automatic)
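For the manual approach, one sketch replaces the enumerated type with explicit 3-bit constants so that every code, including the unused "101", "110" and "111", has a defined next state. The entity wm_safe and its state_out port are hypothetical, and outputs are omitted to keep it short. As slide 213 notes, constant-coded states may not be recognized as a state machine; the assumption here is that the register is then treated as ordinary logic, so the WHEN OTHERS branch is kept. Whether a given tool preserves it should be confirmed in the tool's documentation.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical manually "safe" version of the washing-machine FSM.
ENTITY wm_safe IS
  PORT ( clk, reset, door_closed, full : IN  std_logic;
         heat_demand, done, empty      : IN  std_logic;
         state_out                     : OUT std_logic_vector(2 DOWNTO 0));
END ENTITY wm_safe;

ARCHITECTURE behave OF wm_safe IS
  -- explicit binary codes; "101", "110" and "111" are deliberately unused
  CONSTANT idle   : std_logic_vector(2 DOWNTO 0) := "000";
  CONSTANT fill   : std_logic_vector(2 DOWNTO 0) := "001";
  CONSTANT heat_w : std_logic_vector(2 DOWNTO 0) := "010";
  CONSTANT wash   : std_logic_vector(2 DOWNTO 0) := "011";
  CONSTANT drain  : std_logic_vector(2 DOWNTO 0) := "100";
  SIGNAL current_state, next_state : std_logic_vector(2 DOWNTO 0);
BEGIN
  state_reg : PROCESS (clk, reset)
  BEGIN
    IF reset = '1' THEN
      current_state <= idle;
    ELSIF rising_edge(clk) THEN
      current_state <= next_state;
    END IF;
  END PROCESS state_reg;

  next_logic : PROCESS (current_state, door_closed, full, heat_demand, done, empty)
  BEGIN
    next_state <= current_state;
    CASE current_state IS
      WHEN idle   => IF door_closed = '1' THEN next_state <= fill;   END IF;
      WHEN fill   => IF full = '1'        THEN next_state <= heat_w; END IF;
      WHEN heat_w => IF heat_demand = '0' THEN next_state <= wash;   END IF;
      WHEN wash   =>
        IF heat_demand = '1' THEN next_state <= heat_w;
        ELSIF done = '1'     THEN next_state <= drain;
        END IF;
      WHEN drain  => IF empty = '1'       THEN next_state <= idle;   END IF;
      -- unused codes (and non-0/1 values in simulation) recover to idle
      WHEN OTHERS => next_state <= idle;
    END CASE;
  END PROCESS next_logic;

  state_out <= current_state;
END ARCHITECTURE behave;
```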

221 Slide 'Safe' Binary State Machine?
TYPE state_type IS (idle, fill, heat_w, wash, drain);
SIGNAL current_state, next_state : state_type;
PROCESS (current_state, door_closed, full, heat_demand, done, empty)
BEGIN
  next_state <= current_state;
  CASE current_state IS
    WHEN idle =>
      IF door_closed = '1' THEN next_state <= fill; END IF;
    WHEN fill =>
      IF full = '1' THEN next_state <= heat_w; END IF;
    WHEN heat_w =>
      IF heat_demand = '0' THEN next_state <= wash; END IF;
    WHEN wash =>
      IF heat_demand = '1' THEN next_state <= heat_w;
      ELSIF done = '1' THEN next_state <= drain;
      END IF;
    WHEN drain =>
      IF empty = '1' THEN next_state <= idle; END IF;
    WHEN OTHERS =>
      next_state <= idle;
  END CASE;
END PROCESS;
–This code does not consider undefined states
–The WHEN OTHERS statement only considers other enumerated states
–With binary encoding of 5 states in 3 bits, the codes "101", "110" and "111" are not considered
[Figure: washing-machine state diagram, as on slide 215]

222 Slide Creating "Safe" State Machines
The WHEN OTHERS clause does not make state machines "safe"
–Once the state machine is recognized, the synthesis tool only accounts for explicitly defined states
–Exception: the number of states equals a power of 2 AND binary/Gray encoding is enabled
Safe state machines are created using synthesis constraints
–Quartus II software uses the Safe State Machine assignment, applied project-wide or to individual FSMs, or an equivalent VHDL synthesis attribute
–May increase logic usage

223 Slide Using Custom Encoding Styles
Remove glitches without output registers
Eliminate combinatorial output logic
Outputs mimic state bits
–Use additional state bits for states whose output patterns are not unique
State outputs (from the state diagram on slide 215):
State | Water | Spin | Heat | Pump
Idle | 0 | 0 | 0 | 0
Fill | 1 | 0 | 0 | 0
Heat_w | 0 | 1 | 1 | 0
Wash | 0 | 1 | 0 | 0
Drain | 0 | 1 | 0 | 1

224 Slide Quartus II Custom State Encoding
ENTITY wm IS
  PORT ( clk, reset, door_closed, full : IN std_logic;
         heat_demand, done, empty : IN std_logic;
         water, spin, heat, pump : OUT std_logic);
END wm;
ARCHITECTURE behave OF wm IS
  TYPE state_type IS (idle, fill, heat_w, wash, drain);
  ATTRIBUTE syn_encoding : STRING;
  ATTRIBUTE syn_encoding OF state_type : TYPE IS " ";
  SIGNAL current_state, next_state : state_type;
BEGIN
–The State Machine Processing assignment must also be set to "User-Encoded"
–Output assignments are coded as in the previous examples
–Synthesis automatically handles the reduction of output logic
–Some tools use VHDL attributes such as enum_encoding or syn_enum_encoding to perform custom state encoding
[Figure: washing-machine state diagram, as on slide 215]

225 Slide Writing Efficient State Machines
Remove counting, timing and arithmetic functions from the state machine and implement them externally
–Reduces overall logic and improves performance

226 Slide VHDL Logic Optimization & Performance
–Balancing operators
–Resource sharing
–Logic duplication
–Pipelining

227 Slide Operators
Synthesis tools replace operators with pre-defined (pre-optimized) blocks of logic
The designer should control when and how many operators are used
–e.g. dividers
Dividers are large blocks of logic
–Every '/', mod and rem inserts a divider block and leaves it up to the synthesis tool to optimize
–Better resource optimization usually involves cleverly using multipliers or shift operations to perform the division
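A sketch of both alternatives, assuming 8-bit unsigned operands from IEEE numeric_std (the entity div_const and its ports are hypothetical). Division by 8 becomes a 3-bit right shift. Division by 10 is replaced by multiplying by 205 and shifting right by 11, because 205/2048 is approximately 0.1001; the error term a × (205/2048 - 1/10) = a/10240 stays below 0.1 for a < 1024, so for every 8-bit input the result equals floor(a/10). Many tools already optimize division by constant powers of two on their own; the point is that the designer chooses the structure.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

-- Hypothetical divide-by-constant block without a '/' operator.
ENTITY div_const IS
  PORT ( a       : IN  unsigned(7 DOWNTO 0);
         q_div8  : OUT unsigned(7 DOWNTO 0);
         q_div10 : OUT unsigned(7 DOWNTO 0));
END ENTITY div_const;

ARCHITECTURE rtl OF div_const IS
  SIGNAL prod : unsigned(15 DOWNTO 0);  -- 8 x 8 product
BEGIN
  -- a / 8: a shift is only wiring
  q_div8  <= shift_right(a, 3);
  -- a / 10 as (a * 205) >> 11, exact for 0 <= a <= 255
  prod    <= a * to_unsigned(205, 8);
  q_div10 <= resize(shift_right(prod, 11), 8);
END ARCHITECTURE rtl;
```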

228 Slide Generating Logic from Operators
IF (sel < 10) THEN
  y <= a + b;
ELSE
  y <= a + 10;
END IF;
[Figure: resulting logic of 1 comparator, 2 adders and 1 multiplexer]
–Synthesis tools break down code into logic blocks
–They then assemble, optimize and map them to hardware

229 Slide Balancing Operators
Use parentheses to define logic groupings
–Increases performance
–May increase utilization
–Balances delay from all inputs to the output
–Circuit functionality is unchanged
Unbalanced: z <= a * b * c * d;  (a chain of three multipliers: ((a * b) * c) * d)
Balanced: z <= (a * b) * (c * d);  (a * b and c * d computed in parallel, then multiplied)

230 Slide Balancing Operators: Example
a, b, c, d: 4-bit vectors
Unbalanced: z <= a * b * c * d;
–4 x 4, then 8 x 4, then 12 x 4, giving a 16-bit result
–Delay through 3 stages of multiply
Balanced: z <= (a * b) * (c * d);
–Two 4 x 4 multipliers in parallel, then 8 x 8, giving a 16-bit result
–Delay through 2 stages of multiply

231 Slide Resource Sharing
Reduces the number of operators needed
–Reduces area
Two types
–Sharing operators among mutually exclusive functions
–Sharing common subexpressions
Synthesis tools can perform automatic resource sharing
–The feature can be enabled or disabled

232 Slide Mutually Exclusive Operators
process (rst, clk)
  variable tmp_q : std_logic_vector(7 DOWNTO 0);
begin
  if rst = '0' then
    tmp_q := (OTHERS => '0');
  elsif rising_edge(clk) then
    if updn = '1' then
      tmp_q := tmp_q + 1;
    else
      tmp_q := tmp_q - 1;
    end if;
  end if;
  q <= tmp_q;
end process;
–Up/down counter
–The 2 adders are mutually exclusive and can be shared (typically IF-THEN-ELSE with the same operator in both choices)
[Figure: two adders feeding the registers that drive q, clocked by clk and reset by rst]

233 Slide Sharing Mutually Exclusive Operators
process (rst, clk)
  variable tmp_q : std_logic_vector(7 DOWNTO 0);
  variable dir : integer range -1 to 1;
begin
  if rst = '0' then
    tmp_q := (OTHERS => '0');
  elsif rising_edge(clk) then
    if updn = '1' then
      dir := 1;
    else
      dir := -1;
    end if;
    tmp_q := tmp_q + dir;
  end if;
  q <= tmp_q;
end process;
–Up/down counter
–Only one adder required
[Figure: a single adder feeding the registers that drive q]

234 Slide How Many Multipliers?
y <= a * b * c;
z <= b * c * d;

235 Slide How Many Multipliers? (Answer)
y <= a * b * c;
z <= b * c * d;
[Figure: y computed as (a * b) * c and z as (b * c) * d, with no multiplier shared]
4 multipliers!

236 Slide How Many Multipliers Again?
y <= a * (b * c);
z <= (b * c) * d;

237 Slide How Many Multipliers Again? (Answer)
y <= a * (b * c);
z <= (b * c) * d;
[Figure: a single b * c multiplier feeds both the multiplier with a (for y) and the multiplier with d (for z)]
3 multipliers!
–This is called sharing common subexpressions
–Some synthesis tools do this automatically, but some don't!
–Parentheses guide synthesis tools
–If (b * c) is used repeatedly, assign it to a temporary signal
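Following the last bullet, a sketch that assigns b * c to a temporary signal once and reuses it, so only three multipliers are described regardless of the tool's automatic sharing. It assumes 4-bit unsigned operands from IEEE numeric_std; the entity share_mult and the signal bc are hypothetical.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.numeric_std.ALL;

-- Hypothetical example of explicit common-subexpression sharing.
ENTITY share_mult IS
  PORT ( a, b, c, d : IN  unsigned(3 DOWNTO 0);
         y, z       : OUT unsigned(11 DOWNTO 0));
END ENTITY share_mult;

ARCHITECTURE rtl OF share_mult IS
  SIGNAL bc : unsigned(7 DOWNTO 0);  -- shared product, 4 x 4 -> 8 bits
BEGIN
  bc <= b * c;   -- multiplier 1, computed once
  y  <= a * bc;  -- multiplier 2, 4 x 8 -> 12 bits
  z  <= bc * d;  -- multiplier 3, 8 x 4 -> 12 bits
END ARCHITECTURE rtl;
```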

238 Slide Topics
PLD
–PROM
–PLA
–PAL
–CPLD
–Programming PLD
–ASIC
FPGA Architecture
Quartus Development Software
FPGA Programming Technology
–SRAM versus Antifuse FPGA
–EEPROM/Flash FPGA
Xilinx FPGA Architecture
–FPGA basic building blocks
–FPGA embedded blocks
–FPGA clocking mechanism
–FPGA family
Altera Megafunctions
FPGA Design Flow
–Design phase
Advanced VHDL Topics
–Simulation versus synthesis
–Latches versus registers
–Common pitfalls: unwanted latches, case statement, variables versus signals, synthesizable subprograms, gated clocks
–Inferring logic functions
–Control signal priority
–Tri-state
–Memory

239 Slide Example - 1
What is the problem with gated clocks? How can you implement a gated clock in your design?
–Gated clocks cause functional and timing problems and are a source of additional clock skew
–To solve: use a synchronous gating structure (e.g. a clock enable); ensure global clock routing is used for the clock signal; gate the clock at the source
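A sketch of the synchronous alternative, assuming the gating condition can be treated as a clock enable: the register keeps the free-running global clk and loads d only when en = '1', instead of being clocked by something like clk AND en. The entity clk_en_reg is hypothetical; by the criteria on slide 190, a tool would be expected to map it to a DFF with enable.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical replacement for a gated clock.
ENTITY clk_en_reg IS
  PORT ( clk, en, d : IN  std_logic;
         q          : OUT std_logic);
END ENTITY clk_en_reg;

ARCHITECTURE rtl OF clk_en_reg IS
BEGIN
  -- Avoided style: gated_clk <= clk AND en, then clocking on gated_clk.
  -- Here clk stays on global routing and en is a synchronous enable.
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF en = '1' THEN
        q <= d;
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```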

240 Slide Example - 2
How many registers are inferred?
Using signals: four registers (a, b, c and q)
ARCHITECTURE logic OF reg1 IS
  SIGNAL a, b, c : STD_LOGIC;
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      a <= d;
      b <= a;
      c <= b;
      q <= c;
    END IF;
  END PROCESS;
END ARCHITECTURE logic;
Using variables, which are updated immediately: one register now!
ARCHITECTURE logic OF reg1 IS
BEGIN
  PROCESS (clk)
    VARIABLE a, b, c : STD_LOGIC;
  BEGIN
    IF rising_edge(clk) THEN
      a := d;
      b := a;
      c := b;
      q <= c;
    END IF;
  END PROCESS;
END ARCHITECTURE logic;

241 Slide Example - 3
What is the problem with the following code?
ARCHITECTURE beh OF example3 IS
BEGIN
  q <= d;
  q <= i;
END ARCHITECTURE beh;
–Two drivers drive the same signal; use tri-state drivers (each with its own enable) so that only one drives q at a time

242 Slide Example - 4
What is the problem with the following VHDL model? Fix it.
LIBRARY ieee;
USE IEEE.std_logic_1164.all;
ENTITY nolatch IS
  PORT ( a, b, c : IN STD_LOGIC;
         sel : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
         oput : OUT STD_LOGIC);
END nolatch;
ARCHITECTURE rtl OF nolatch IS
BEGIN
  PROCESS (a, b, c, sel)
  BEGIN
    IF sel = "00000" THEN
      oput <= a;
    ELSIF sel = "00001" THEN
      oput <= b;
    ELSIF sel = "00010" THEN
      oput <= c;
    END IF;
  END PROCESS;
END rtl;

243 Slide Example - 4 (Answer)
Unwanted latch: oput is not assigned for every value of sel. The code is updated to remove the unwanted latch.
LIBRARY ieee;
USE IEEE.std_logic_1164.all;
ENTITY nolatch IS
  PORT ( a, b, c : IN STD_LOGIC;
         sel : IN STD_LOGIC_VECTOR (4 DOWNTO 0);
         oput : OUT STD_LOGIC);
END nolatch;
ARCHITECTURE rtl OF nolatch IS
BEGIN
  PROCESS (a, b, c, sel)
  BEGIN
    IF sel = "00000" THEN
      oput <= a;
    ELSIF sel = "00001" THEN
      oput <= b;
    ELSIF sel = "00010" THEN
      oput <= c;
    ELSE
      oput <= 'X';  -- prevents latch inference
    END IF;
  END PROCESS;
END rtl;

244 Slide Pipelining
Purposefully inserting register(s) into the middle of a combinatorial data (critical) path
–Increases clocking speed
–Adds levels of latency: more clock cycles are needed to obtain the output
Some tools perform automatic pipelining
–Same advantages and disadvantages as automatic fan-out control (logic duplication)

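245 Slide Adding Single Pipeline Stage
[Figure: without pipelining, a counter/state machine feeds 40 ns of logic that decodes value x, giving a 25 MHz system (1/40 ns); with one pipeline stage, 20 ns of logic decodes value x-1 one cycle early and a register holds the result, giving a 50 MHz system (1/20 ns)]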

246 Slide Adding Single Pipeline Stage in VHDL
Non-pipelined:
mult : PROCESS (clk, clr)
BEGIN
  IF (clr = '0') THEN
    atemp  <= (OTHERS => '0');
    btemp  <= (OTHERS => '0');
    ctemp  <= (OTHERS => '0');
    dtemp  <= (OTHERS => '0');
    result <= (OTHERS => '0');
  ELSIF rising_edge(clk) THEN
    atemp <= a;
    btemp <= b;
    ctemp <= c;
    dtemp <= d;
    result <= (atemp * btemp) * (ctemp * dtemp);
  END IF;
END PROCESS;
Pipelined:
mult_pipe : PROCESS (clk, clr)
BEGIN
  IF (clr = '0') THEN
    atemp  <= (OTHERS => '0');
    btemp  <= (OTHERS => '0');
    ctemp  <= (OTHERS => '0');
    dtemp  <= (OTHERS => '0');
    int1   <= (OTHERS => '0');
    int2   <= (OTHERS => '0');
    result <= (OTHERS => '0');
  ELSIF rising_edge(clk) THEN
    atemp <= a;
    btemp <= b;
    ctemp <= c;
    dtemp <= d;
    int1 <= atemp * btemp;
    int2 <= ctemp * dtemp;
    result <= int1 * int2;
  END IF;
END PROCESS;

247 Slide Pipelined 4-input Multiplier
[Figure: a * b and c * d computed in parallel, then multiplied together to give z]

248 Slide Parameterized Code
Logic blocks that are made scalable for reuse
Code is written for flexibility
–Different configurations of the same model
4 constructs
–Pre-defined attributes
–Generics
–For-generate
–If-generate

249 Slide Pre-Defined Attributes
Return information regarding the associated object
Object changes are automatically reflected in the returned values
Uses
–Improving readability of code
–Creating parameterized models
Improve flexibility of code, especially using loops
Limit hard-coding of logic resources
Examples
–Array attributes
–Signal attributes (not discussed), e.g. 'EVENT, 'STABLE

250 Slide Pre-Defined Array Attributes
For a : IN STD_LOGIC_VECTOR(7 DOWNTO 0):
a'HIGH = 7 –Upper bound of array index
a'LOW = 0 –Lower bound of array index
a'RIGHT = 0 –Right-most bound of array index
a'LEFT = 7 –Left-most bound of array index
a'RANGE = 7 DOWNTO 0 –Range declared for the object, either TO or DOWNTO
a'REVERSE_RANGE = 0 TO 7 –Reverse of the range declared for the object
a'LENGTH = 8 –Number of values in the index range
a'ASCENDING = FALSE –TRUE if the array range uses TO, FALSE if it uses DOWNTO
These array attributes are synthesizable
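As a sketch of using these attributes to avoid hard-coded bounds, the hypothetical bit_reverse entity below reverses a vector in a loop driven by a'RANGE, computing each mirrored index from a'HIGH and a'LOW. It assumes y is declared with the same range as a; if both port widths are changed, the loop bounds follow automatically.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical bit-reversal block with no hard-coded loop bounds.
ENTITY bit_reverse IS
  PORT ( a : IN  std_logic_vector(7 DOWNTO 0);
         y : OUT std_logic_vector(7 DOWNTO 0));  -- same range as a
END ENTITY bit_reverse;

ARCHITECTURE rtl OF bit_reverse IS
BEGIN
  PROCESS (a)
  BEGIN
    -- a'RANGE is 7 DOWNTO 0 here; a'HIGH - i + a'LOW mirrors the index
    FOR i IN a'RANGE LOOP
      y(a'HIGH - i + a'LOW) <= a(i);
    END LOOP;
  END PROCESS;
END ARCHITECTURE rtl;
```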

251 Slide Generics (Review)
Used to pass information to an entity instance
–Timing values (for simulation)
–Scalable code
ENTITY reg_bank IS
  GENERIC ( tplh, tphl : time := 5 ns;
            tphz, tplz : time := 3 ns;
            size : integer := 1 );
  PORT ( clk : IN std_logic;
         d : IN std_logic_vector (size - 1 DOWNTO 0);
         q : OUT std_logic_vector (size - 1 DOWNTO 0) );
END ENTITY reg_bank;

252 Slide Parameterized Counter
ENTITY counter IS
  GENERIC (width : INTEGER);  -- generic width used to scale the counter
  PORT ( clk, clr, sload, cnt_en : IN std_logic;
         data : IN std_logic_vector (width - 1 DOWNTO 0);
         q : OUT std_logic_vector (width - 1 DOWNTO 0));
END ENTITY counter;
ARCHITECTURE logic OF counter IS
BEGIN
  PROCESS (clk, clr)
    VARIABLE cnt : std_logic_vector (width - 1 DOWNTO 0);
  BEGIN
    IF clr = '1' THEN
      cnt := (OTHERS => '0');
    ELSIF rising_edge(clk) THEN
      IF sload = '1' THEN
        cnt := data;
      ELSIF cnt_en = '1' THEN
        cnt := cnt + 1;  -- "+" on std_logic_vector needs IEEE.std_logic_unsigned
      END IF;
    END IF;
    q <= cnt;
  END PROCESS;
END ARCHITECTURE logic;

253 Slide Using a Parameterized Component
Instances must map both the generics and the ports
Generic and port resolution is done at compile time
u1 : counter
  GENERIC MAP (width => 16)
  PORT MAP (clk => tclk, clr => tclr, sload => tsload, cnt_en => tcnt_en, data => tdata, q => tq);
[Figure: top_counter containing a 16-bit counter instance; tclk, tclr, tsload, tcnt_en and tdata drive clk, clr, sload, cnt_en and data, and q drives tq]

254 Slide Complete Code
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.std_logic_arith.all;
ENTITY top_counter IS
  PORT ( tclk, tclr, tsload, tcnt_en : IN std_logic;
         tdata : IN std_logic_vector (15 DOWNTO 0);
         tq : OUT std_logic_vector (15 DOWNTO 0) );
END ENTITY top_counter;
ARCHITECTURE logic OF top_counter IS
  COMPONENT counter  -- name matches the counter entity on slide 252
    GENERIC (width : INTEGER);
    PORT ( clk, clr, sload, cnt_en : IN std_logic;
           data : IN std_logic_vector (width - 1 DOWNTO 0);
           q : OUT std_logic_vector (width - 1 DOWNTO 0) );
  END COMPONENT;
BEGIN
  u1 : counter
    GENERIC MAP (width => 16)
    PORT MAP (clk => tclk, clr => tclr, sload => tsload, cnt_en => tcnt_en, data => tdata, q => tq);
END ARCHITECTURE logic;

255 Slide Generate Statements
– Used to create structural blocks
– Resolved at compile time
– Reduce the amount of code
– Can be nested
For-generate
– Creates zero or a set number of duplicates of a structure
– No need to instantiate each duplicate individually
If-generate
– Conditionally selects whether zero or one structure is made

256 Slide For-Generate Syntax
– Sets the number of structures created
– Similar to a FOR loop, but can only contain concurrent statements
– Label is required
label : FOR <identifier> IN <range> GENERATE
  -- concurrent statements
END GENERATE label;

257 Slide - PARITY: Block Diagram

258 Slide - PARITY: Entity Declaration
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY parity IS
  PORT (
    parity_in : IN STD_LOGIC_VECTOR(7 DOWNTO 0);
    parity_out : OUT STD_LOGIC
  );
END parity;

259 Slide - PARITY: Block Diagram
[Figure: XOR chain with intermediate signals xor_out(1) to xor_out(6)]

260 Slide - PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
  SIGNAL xor_out : std_logic_vector (6 DOWNTO 1);
BEGIN
  xor_out(1) <= parity_in(0) XOR parity_in(1);
  xor_out(2) <= xor_out(1) XOR parity_in(2);
  xor_out(3) <= xor_out(2) XOR parity_in(3);
  xor_out(4) <= xor_out(3) XOR parity_in(4);
  xor_out(5) <= xor_out(4) XOR parity_in(5);
  xor_out(6) <= xor_out(5) XOR parity_in(6);
  parity_out <= xor_out(6) XOR parity_in(7);
END parity_dataflow;

261 Slide - PARITY: Block Diagram (2)
[Figure: XOR chain with intermediate signals xor_out(0) to xor_out(7)]

262 Slide - PARITY: Architecture
ARCHITECTURE parity_dataflow OF parity IS
  SIGNAL xor_out : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
  xor_out(0) <= parity_in(0);
  xor_out(1) <= xor_out(0) XOR parity_in(1);
  xor_out(2) <= xor_out(1) XOR parity_in(2);
  xor_out(3) <= xor_out(2) XOR parity_in(3);
  xor_out(4) <= xor_out(3) XOR parity_in(4);
  xor_out(5) <= xor_out(4) XOR parity_in(5);
  xor_out(6) <= xor_out(5) XOR parity_in(6);
  xor_out(7) <= xor_out(6) XOR parity_in(7);
  parity_out <= xor_out(7);
END parity_dataflow;

263 Slide - PARITY: Architecture (2)
ARCHITECTURE parity_dataflow OF parity IS
  SIGNAL xor_out : STD_LOGIC_VECTOR (7 DOWNTO 0);
BEGIN
  xor_out(0) <= parity_in(0);
  G2 : FOR i IN 1 TO 7 GENERATE
    xor_out(i) <= xor_out(i-1) XOR parity_in(i);
  END GENERATE G2;
  parity_out <= xor_out(7);
END parity_dataflow;
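The FOR-GENERATE version still hard-codes 8 bits. Combining it with a generic gives a parity generator of any width. This is a sketch only: the entity name parity_n and the generic N are assumptions, not part of the course material.

```vhdl
-- Hypothetical width-generic version of the parity entity: the XOR chain is
-- built by FOR-GENERATE over N bits instead of a fixed 8.
library ieee;
use ieee.std_logic_1164.all;

entity parity_n is
  generic (N : positive := 8);  -- number of input bits (assumed name)
  port (
    parity_in  : in  std_logic_vector(N - 1 downto 0);
    parity_out : out std_logic
  );
end entity parity_n;

architecture parity_dataflow of parity_n is
  signal xor_out : std_logic_vector(N - 1 downto 0);
begin
  xor_out(0) <= parity_in(0);
  -- One XOR stage per remaining bit; the range 1 TO 0 is null when N = 1,
  -- so zero stages are generated in that case.
  G2 : for i in 1 to N - 1 generate
    xor_out(i) <= xor_out(i - 1) xor parity_in(i);
  end generate G2;
  parity_out <= xor_out(N - 1);
end architecture parity_dataflow;
```

An instance such as `GENERIC MAP (N => 16)` then produces a 16-input chain with no further code changes.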

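264 Slide - Example: 16X1 Mux
[Figure: 16-to-1 multiplexer built from four first-level 4-to-1 multiplexers (inputs w0 to w15, selects s1 s0) feeding a second-level 4-to-1 multiplexer (selects s3 s2) that drives f]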

265 Slide - A 4-to-1 Multiplexer
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY mux4to1 IS
  PORT (
    w0, w1, w2, w3 : IN STD_LOGIC;
    s : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
    f : OUT STD_LOGIC
  );
END mux4to1;
ARCHITECTURE Dataflow OF mux4to1 IS
BEGIN
  WITH s SELECT
    f <= w0 WHEN "00",
         w1 WHEN "01",
         w2 WHEN "10",
         w3 WHEN OTHERS;
END Dataflow;

266 Slide - Straightforward code for 16X1 Mux
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY Example1 IS
  PORT (
    w : IN STD_LOGIC_VECTOR(0 TO 15);
    s : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
    f : OUT STD_LOGIC
  );
END Example1;

267 Slide - Straightforward code for 16X1 Mux
ARCHITECTURE Structure OF Example1 IS
  COMPONENT mux4to1
    PORT (
      w0, w1, w2, w3 : IN STD_LOGIC;
      s : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
      f : OUT STD_LOGIC
    );
  END COMPONENT;
  SIGNAL m : STD_LOGIC_VECTOR(0 TO 3);
BEGIN
  Mux1 : mux4to1 PORT MAP (w(0), w(1), w(2), w(3), s(1 DOWNTO 0), m(0));
  Mux2 : mux4to1 PORT MAP (w(4), w(5), w(6), w(7), s(1 DOWNTO 0), m(1));
  Mux3 : mux4to1 PORT MAP (w(8), w(9), w(10), w(11), s(1 DOWNTO 0), m(2));
  Mux4 : mux4to1 PORT MAP (w(12), w(13), w(14), w(15), s(1 DOWNTO 0), m(3));
  Mux5 : mux4to1 PORT MAP (m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f);
END Structure;

268 Slide - Modified code for 16X1 Mux
ARCHITECTURE Structure OF Example1 IS
  COMPONENT mux4to1
    PORT (
      w0, w1, w2, w3 : IN STD_LOGIC;
      s : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
      f : OUT STD_LOGIC
    );
  END COMPONENT;
  SIGNAL m : STD_LOGIC_VECTOR(0 TO 3);
BEGIN
  G1 : FOR i IN 0 TO 3 GENERATE
    Muxes : mux4to1 PORT MAP (w(4*i), w(4*i+1), w(4*i+2), w(4*i+3), s(1 DOWNTO 0), m(i));
  END GENERATE;
  Mux5 : mux4to1 PORT MAP (m(0), m(1), m(2), m(3), s(3 DOWNTO 2), f);
END Structure;

269 Slide - Example: 4X16 Decoder
[Figure: 4-to-16 decoder built from a first-level 2-to-4 decoder (inputs w3 w2, enable En) whose four outputs enable four second-level 2-to-4 decoders (inputs w1 w0) driving y0 to y15]

270 Slide - A 2-to-4 binary decoder
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY dec2to4 IS
  PORT (
    w : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
    En : IN STD_LOGIC;
    y : OUT STD_LOGIC_VECTOR(0 TO 3)
  );
END dec2to4;
ARCHITECTURE Dataflow OF dec2to4 IS
  SIGNAL Enw : STD_LOGIC_VECTOR(2 DOWNTO 0);
BEGIN
  Enw <= En & w;
  WITH Enw SELECT
    y <= "1000" WHEN "100",
         "0100" WHEN "101",
         "0010" WHEN "110",
         "0001" WHEN "111",
         "0000" WHEN OTHERS;
END Dataflow;

271 Slide - VHDL code for 4X16 decoder
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY dec4to16 IS
  PORT (
    w : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
    En : IN STD_LOGIC;
    y : OUT STD_LOGIC_VECTOR(0 TO 15)
  );
END dec4to16;

272 Slide - VHDL code for 4X16 decoder (2)
ARCHITECTURE Structure OF dec4to16 IS
  COMPONENT dec2to4
    PORT (
      w : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
      En : IN STD_LOGIC;
      y : OUT STD_LOGIC_VECTOR(0 TO 3)
    );
  END COMPONENT;
  SIGNAL m : STD_LOGIC_VECTOR(0 TO 3);
BEGIN
  G1 : FOR i IN 0 TO 3 GENERATE
    Dec_ri : dec2to4 PORT MAP (w(1 DOWNTO 0), m(i), y(4*i TO 4*i+3));
  END GENERATE;
  Dec_left : dec2to4 PORT MAP (w(3 DOWNTO 2), En, m);
END Structure;

273 Slide If-Generate Syntax
– Condition controls whether a structure is created
– Can only contain concurrent statements
– Label is required
label : IF <condition> GENERATE
  -- concurrent statements
END GENERATE label;

274 Slide - Example 1
Based on Lshift, either a left-shift register or a right-shift register is generated: if Lshift is true, an N-bit left-shift register; if false, an N-bit right-shift register.
entity shift_reg is
  generic (N : positive := 4; Lshift : boolean := true); -- generic parameters
  port (D : in bit_vector(N downto 1);
        Qout : out bit_vector(N downto 1);
        CLK, Ld, Sh, Shiftin : in bit);
end shift_reg;
architecture SRN of shift_reg is
  signal Q, shifter : bit_vector(N downto 1);
begin
  Qout <= Q;
  genLS : if Lshift generate -- conditional generate of left-shift register
    shifter <= Q(N-1 downto 1) & Shiftin;
  end generate;
  genRS : if not Lshift generate -- conditional generate of right-shift register
    shifter <= Shiftin & Q(N downto 2);
  end generate;
  process (CLK)
  begin
    if CLK'event and CLK = '1' then
      if Ld = '1' then
        Q <= D;
      elsif Sh = '1' then
        Q <= shifter;
      end if;
    end if;
  end process;
end SRN;

275 Slide If Generate Example 2
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.std_logic_arith.all;    -- conv_std_logic_vector
USE IEEE.std_logic_unsigned.all; -- conv_integer on std_logic_vector
ENTITY counter IS
  GENERIC (width : INTEGER; rise_or_fall : INTEGER);
  PORT (
    clk, clr, sload, cnt_en : IN std_logic;
    data : IN std_logic_vector (width - 1 DOWNTO 0);
    q : OUT std_logic_vector (width - 1 DOWNTO 0)
  );
END ENTITY counter;
ARCHITECTURE logic OF counter IS
  SIGNAL clk_buf : std_logic;
BEGIN
  clock : IF (rise_or_fall > 0) GENERATE
    clk_buf <= clk;
  END GENERATE;
  not_clock : IF (rise_or_fall <= 0) GENERATE
    clk_buf <= NOT clk;
  END GENERATE;
  PROCESS (clk_buf, clr)
    VARIABLE cnt : INTEGER RANGE 0 TO (2**width)-1;
  BEGIN
    IF clr = '1' THEN
      cnt := 0;
    ELSIF rising_edge(clk_buf) THEN
      IF sload = '1' THEN
        cnt := conv_integer(data);
      ELSIF cnt_en = '1' THEN
        IF cnt = (2**width)-1 THEN -- wrap explicitly; cnt + 1 would leave the declared range
          cnt := 0;
        ELSE
          cnt := cnt + 1;
        END IF;
      END IF;
    END IF;
    q <= conv_std_logic_vector(cnt, width);
  END PROCESS;
END ARCHITECTURE logic;
Instantiation in the top level (rise_or_fall => 0 selects the falling-edge counter):
u1 : counter GENERIC MAP (width => 16, rise_or_fall => 0)
  PORT MAP (clk => tclk, clr => tclr, sload => tsload, data => tdata, cnt_en => tcnt_en, q => tq);
The rise_or_fall generic selects rising- or falling-edge clock behavior:
– One code slice can implement both a rising-edge and a falling-edge counter
– Different from (and better than) using IF-THEN-ELSE
– No clock mux is created; either the clock inversion is implemented or it is not

276 FPGA Design Validation: Simulation & Design Verification

277 Slide - FPGA Design Flow
Requirements:
– Provided by the customer or generated internally
– May be ambiguous
– Little or no implementation detail
– The customer (internal or external) may not know exactly what they want or what is possible
[Figure: design flow Requirements → Specifications → Design Implementation → Design Verification → Product Delivery]

278 Slide - FPGA Design Flow
Specification:
– Identify what the requirements mean
– Narrow the requirements to specifics: design blocks, components, inputs/outputs, and what the design should and shouldn’t do
[Figure: design flow Requirements → Specifications → Design Implementation → Design Verification → Product Delivery]

279 Slide - Design Implementation
Synthesis of the specification into a workable design
Other names: Design Engineering Cycle
Initially iterative
[Figure: iterative cycle of Design, Test, Examine Results and Modify, driven by the Specification]

280 Slide - Design Verification
– Verify that your design functions according to the specification
– A complete specification will cover all cases
– A poor specification is not an excuse for a sloppy design

281 Slide - Product Delivery
Output of the design process:
– A product that performs according to the provided requirements
– An internally/mutually developed specification
– Verification of the performance to the specification
Documentation of due diligence:
– Documentation and customer acknowledgement of all known design faults
– Assessment of risk severity (DFMEA)

282 Slide - DFMEA
Design Failure Mode and Effects Analysis
Basics:
– Identify all possible design failure modes
– Assign a severity to each failure mode
– Assess the risk (probability) of this type of failure
– For all failure modes above a certain severity/probability, develop a mitigation plan
– Assign test criteria based on the failure mode

283 Slide - Product Development for FPGAs
Simulation and design verification are a crucial part of the design process.
FPGA designs are neither pure hardware nor pure software.
Hardware:
– Deterministic
– My schematic is my schematic
Software:
– Non-deterministic
– Different compilers may produce different operations for the same high-level program

284 Slide - Product Development for FPGAs: What Is Simulation?
Simulation is the process of applying stimulus (inputs that mimic actual data) to the design and observing the output.
Inputs to the simulation phase:
– Design
– Synthesis netlist
– Implementation netlist

285 Slide - Product Development for FPGAs: Simulation Tools
Editor to create the inputs:
– Text editor
– Graphical editor
Simulator: compiles or connects the test inputs to the design, causing the outputs to change based on the input data.
Inputs to the simulation phase:
– Design
– Synthesis netlist
– Implementation netlist
Example: Mentor Graphics

286 Slide - VHDL Design Validation
Levels of simulation:
– Register Transfer Level (RTL)
– Functional
– Gate Level

287 Slide - RTL Simulation
– Checks for logic and syntax errors
– Does the design work on the target hardware?
– Will it compile?
– Contains no timing evaluation

288 Slide - Functional Simulation
– Performed on the netlist or code generated by the synthesis tool
– Sometimes it is necessary to direct the synthesis tool to provide a netlist
– Initial timing analysis
– Will the synthesized design fit or work on the target hardware?

289 Slide - Gate Level Simulation
Performed on the netlist generated by the implementation tool.
Contains actual timing information:
– Representative of hardware
– Most realistic
– Detects design timing problems

290 Slide - Simulation in the Design Process
Complete RTL simulation
– Does the design function/compile?
Complete functional simulation
– Will it function on the target hardware?
Gate level simulation
– Will it work as expected over all operational conditions?
A failure at any of these levels requires the other steps to be revisited.

291 Slide - Developing an RTL Simulation
– Identify inputs/outputs
– Identify test cases
– For each test case, develop a vector waveform
– Run each test case and verify the output
– Test cases should hit every area of your design
– Test cases are referred to as stimulus

292 Slide - Vector Waveform Files (VWF)

293 Slide - Functional Simulation
– Verify the functional operation
– Expand on the RTL simulation
– Include some timing variation, looking for timing hazards
– VWF may include timing variations: pulse width, pulse spacing

294 Slide - Gate Level Simulation
A full timing analysis including hardware effects: a repeat of the functional simulation.

295 Slide - Hardware Verification
Stimuli developed in simulation can be supplied to a hardware test-case generator.
Build and program the target hardware:
– FPGA level
– Board level
– System level

296 FPGA Design Validation: Simulation & Design Verification

297 Slide - FPGA Design Flow
Simulation:
– RTL
– Functional
– Gate Level
[Figure: design flow Requirements → Specifications → Design Implementation → Design Verification → Product Delivery]

298 Slide - Simulation in the Design Process
– Good practice to return at least to functional simulation before approving design changes
– Gate level simulation involving multiple timing cases can be time consuming

299 Slide - Stimulus
Test cases/stimulus: one test case for each condition.
Test Case | Input 1 | Input 2 | Q
1 | Wide | N/A | Low
2 | Default | Short | Low
3 | Wide | | High
…and so on

300 Slide - Choosing a Simulation Tool
Hardcore:
– Develop HDL
– Company-specific automated script generation tools
IDE / development toolchain:
– ModelSim (Mentor Graphics)
– Cadence (a separate EDA vendor with its own simulators)

301 Slide Introduction to Testbenches
– Purpose of a testbench
– Three classes of traditional testbenches
– General testbench methods
– Self-verification methods
– Arrays for stimulus & results
– TEXTIO for stimulus & results

302 Slide Purpose of Testbench
– Generate stimulus to test the design for normal transactions, corner cases and error conditions
  – Directed tests
  – Random tests
– Automatically verify the design to spec and log all errors
  – Regression tests
– Log transactions in a readable format for easy debugging

303 Slide Three Classes of Traditional Testbenches
I. Testbench applies stimulus to target code, and outputs are manually reviewed
II. Testbench applies stimulus to target code and verifies outputs functionally
  – Requires static timing analysis
III. Testbench applies stimulus to target code and verifies outputs with timing
  – Does not require full static timing analysis
  – Code and testbench data more complex
  – Not covered

304 Slide Advantages/Disadvantages
Class I
– Advantages: simple to write
– Disadvantages: requires manual verification; takes longer for others (not the original designer) to verify; easy for others to miss errors
– Recommendation: great for verifying simple code; not intended for re-use
Class II
– Advantages: easy to perform verification once complete; “set and forget it”
– Disadvantages: takes longer to write; more difficult to debug initially
– Recommendation: better for more complicated designs, designs with complicated stimulus/outputs, and higher-level designs; promotes re-usability
Class III
– Advantages: most in-depth; “guarantees” design operation if successful (subject to model accuracy)
– Disadvantages: takes longest to write; most difficult to debug; physical changes (e.g. target device, process) require changing the testbench
– Recommendation: might be overkill for many FPGA designs; required for non-Altera ASIC designs

305 Slide General Testbench Methods
– Create “test harness” code to instantiate the device under test (DUT), or target code
– Create stimulus signals to connect to the DUT
[Figure: mycode_tb.vhd instantiates mycode.vhd; clk_assignment, reset_assignment and datagen_process drive clk, rst, in1, in2 and in3; the DUT drives out1 and out2. Caption: single process to control each signal]

306 Slide Test Vector Generation
Develop a sequence of fixed input values.
Test vector development from the bottom up:
– Write basic tasks
– Write more complex tasks based on the basic tasks
– Perform tests
Example: memory testing
– Basic tasks: readmem, writemem
– 2nd-level tasks: initmem, copymem, comparemem
– Generation of tests based on tasks
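The bottom-up layering can be sketched in VHDL as procedures in a package, where the 2nd-level tasks are written only in terms of the basic ones. This is an illustration under stated assumptions, not the course's implementation. A behavioral array (mem_t) stands in for the memory, and the depth, word width and procedure signatures are all assumptions. In a real testbench, readmem and writemem would drive the DUT's address, data and control signals instead.

```vhdl
-- Hypothetical layered testbench tasks. A behavioral array stands in for
-- the memory under test; DEPTH, word width and signatures are assumptions.
library ieee;
use ieee.std_logic_1164.all;

package mem_tasks is
  constant DEPTH : positive := 16;
  subtype word_t is std_logic_vector(7 downto 0);
  type mem_t is array (0 to DEPTH - 1) of word_t;

  -- Basic tasks
  procedure writemem(variable mem : inout mem_t; addr : natural; data : word_t);
  procedure readmem(variable mem : in mem_t; addr : natural; data : out word_t);
  -- 2nd-level tasks, built only from the basic tasks
  procedure initmem(variable mem : inout mem_t; value : word_t);
  procedure copymem(variable mem : inout mem_t; src, dst, count : natural);
  procedure comparemem(variable mem : in mem_t; a, b, count : natural; equal : out boolean);
end package mem_tasks;

package body mem_tasks is
  procedure writemem(variable mem : inout mem_t; addr : natural; data : word_t) is
  begin
    mem(addr) := data;
  end procedure;

  procedure readmem(variable mem : in mem_t; addr : natural; data : out word_t) is
  begin
    data := mem(addr);
  end procedure;

  -- Fill every location with one value
  procedure initmem(variable mem : inout mem_t; value : word_t) is
  begin
    for i in mem'RANGE loop
      writemem(mem, i, value);
    end loop;
  end procedure;

  -- Copy count words from src to dst, one read/write pair per word
  procedure copymem(variable mem : inout mem_t; src, dst, count : natural) is
    variable w : word_t;
  begin
    for i in 0 to count - 1 loop
      readmem(mem, src + i, w);
      writemem(mem, dst + i, w);
    end loop;
  end procedure;

  -- equal is TRUE only if all count words starting at a and b match
  procedure comparemem(variable mem : in mem_t; a, b, count : natural; equal : out boolean) is
    variable wa, wb : word_t;
  begin
    equal := true;
    for i in 0 to count - 1 loop
      readmem(mem, a + i, wa);
      readmem(mem, b + i, wb);
      if wa /= wb then
        equal := false;
      end if;
    end loop;
  end procedure;
end package body mem_tasks;
```

In this layering, a test is simply a sequence of high-level calls such as initmem, writemem, copymem and then comparemem. Only readmem and writemem would need rewriting to target real hardware.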

307 Slide - Testbench Anatomy
ENTITY my_entity_tb IS
  -- TB entity has no ports
END my_entity_tb;
ARCHITECTURE behavioral OF my_entity_tb IS
  -- Local signals and constants
  COMPONENT TestComp -- all Design Under Test component declarations
    PORT ( -- DUT ports
    );
  END COMPONENT;
BEGIN
  DUT : TestComp PORT MAP ( -- instantiations of DUTs
  );
  testSequence : PROCESS
  BEGIN
    -- input stimuli
  END PROCESS;
END behavioral;

308 Slide - Testbench for XOR3 (1)
LIBRARY ieee;
USE ieee.std_logic_1164.all;
ENTITY xor3_tb IS
END xor3_tb;
ARCHITECTURE behavioral OF xor3_tb IS
  -- Component declaration of the tested unit
  COMPONENT xor3
    PORT (
      A : IN STD_LOGIC;
      B : IN STD_LOGIC;
      C : IN STD_LOGIC;
      Result : OUT STD_LOGIC
    );
  END COMPONENT;
  -- Stimulus signals: signals mapped to the input and inout ports of the tested entity
  SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0);
  SIGNAL test_result : STD_LOGIC;

309 Slide - Testbench for XOR3 (2)
BEGIN
  UUT : xor3
    PORT MAP (
      A => test_vector(2),
      B => test_vector(1),
      C => test_vector(0),
      Result => test_result);
  Testing : PROCESS
  BEGIN
    test_vector <= "000"; WAIT FOR 10 ns;
    test_vector <= "001"; WAIT FOR 10 ns;
    test_vector <= "010"; WAIT FOR 10 ns;
    test_vector <= "011"; WAIT FOR 10 ns;
    test_vector <= "100"; WAIT FOR 10 ns;
    test_vector <= "101"; WAIT FOR 10 ns;
    test_vector <= "110"; WAIT FOR 10 ns;
    test_vector <= "111"; WAIT FOR 10 ns;
  END PROCESS;
END behavioral;

310 Slide - Generating selected values of one input
SIGNAL test_vector : STD_LOGIC_VECTOR(2 DOWNTO 0);
BEGIN
  testing : PROCESS
  BEGIN
    test_vector <= "000"; WAIT FOR 10 ns;
    test_vector <= "001"; WAIT FOR 10 ns;
    test_vector <= "010"; WAIT FOR 10 ns;
    test_vector <= "011"; WAIT FOR 10 ns;
    test_vector <= "100"; WAIT FOR 10 ns;
  END PROCESS;
END behavioral;

311 Slide - Generating all values of one input
SIGNAL test_vector : STD_LOGIC_VECTOR(3 DOWNTO 0) := "0000";
BEGIN
  testing : PROCESS
  BEGIN
    WAIT FOR 10 ns;
    test_vector <= test_vector + 1; -- "+" on std_logic_vector needs USE ieee.std_logic_unsigned.all
  END PROCESS testing;
END behavioral;

312 Slide - Generating periodical signals, such as clocks
CONSTANT clk1_period : TIME := 20 ns;
CONSTANT clk2_period : TIME := 200 ns;
SIGNAL clk1 : STD_LOGIC;
SIGNAL clk2 : STD_LOGIC := '0'; -- must be initialized for the NOT ... AFTER form
BEGIN
  clk1_generator : PROCESS
  BEGIN
    clk1 <= '0';
    WAIT FOR clk1_period/2;
    clk1 <= '1';
    WAIT FOR clk1_period/2;
  END PROCESS;
  clk2 <= NOT clk2 AFTER clk2_period/2;
END behavioral;

313 Slide - Generating one-time signals, such as resets
CONSTANT reset1_width : TIME := 100 ns;
CONSTANT reset2_width : TIME := 150 ns;
SIGNAL reset1 : STD_LOGIC;
SIGNAL reset2 : STD_LOGIC := '1';
BEGIN
  reset1_generator : PROCESS
  BEGIN
    reset1 <= '1';
    WAIT FOR reset1_width;
    reset1 <= '0';
    WAIT;
  END PROCESS;
  reset2_generator : PROCESS
  BEGIN
    WAIT FOR reset2_width;
    reset2 <= '0';
    WAIT;
  END PROCESS;
END behavioral;

314 Slide Concurrent Statements
– Signals with regular or limited transitions can be created with concurrent statements
– These statements can begin a testbench and reside outside any processes
ARCHITECTURE logic OF test_b IS
  -- Use clkperiod constant to create 50 MHz clock
  CONSTANT clkperiod : TIME := 20 ns;
  -- clk initialized to '0'
  SIGNAL clk : std_logic := '0';
  SIGNAL reset : std_logic;
BEGIN
  -- clock must be initialized when declared to use this notation
  clk <= NOT clk AFTER clkperiod/2;
  reset <= '1', '0' AFTER 20 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;
[Figure: CLK and RESET waveforms versus time (ns)]

315 Slide Sequential Statements
– More complex combinations can be created using sequential statements (e.g. LOOP, WAIT, IF-THEN, CASE)
– Statements dependent on clock edges
– Multiple processes & loops executing at once
clkgen : PROCESS -- Another clock generation example
  CONSTANT clkperiod : TIME := 20 ns;
BEGIN
  clk <= '0'; -- Initialize clock
  WAIT FOR 500 ns; -- Delay clock for 500 ns
  LOOP -- Infinite loop to create free-running clock
    clk <= '1';
    WAIT FOR clkperiod/2;
    clk <= '0';
    WAIT FOR clkperiod/2;
  END LOOP;
END PROCESS clkgen;
buscount : PROCESS (clk) -- Generate counting pattern
BEGIN
  IF rising_edge(clk) THEN
    inbus <= count;
    count <= count + 1;
  END IF;
END PROCESS buscount;

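316 Slide Sequential Statements (cont.)
– Example shows more complex stimulus generation
– The process uses WAIT statements, so it cannot have a sensitivity list (a process with a sensitivity list may not contain WAIT), and it is not synthesizable
– Uses IEEE.numeric_std.all; inbus is assumed to be UNSIGNED(buswidth-1 DOWNTO 0)
bus_gray : PROCESS
  CONSTANT buswidth : INTEGER := 16;
BEGIN
  inbus <= (OTHERS => '0');
  FOR n IN 0 TO <upper bound> LOOP -- upper bound missing from the original slide
    inbus <= TO_UNSIGNED(n, buswidth) XOR shift_right(TO_UNSIGNED(n, buswidth), 1);
    WAIT UNTIL rising_edge(clk);
  END LOOP;
END PROCESS;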
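The Gray-code expression in bus_gray can be packaged as a reusable function. The sketch below assumes a hypothetical package name (gray_utils) and function name (to_gray). The computation itself, n XOR (n shifted right by one bit), is the same one used on the slide.

```vhdl
-- Hypothetical helper wrapping the slide's Gray-code expression:
-- gray = binary XOR (binary >> 1).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package gray_utils is
  function to_gray(n : natural; width : positive) return unsigned;
end package gray_utils;

package body gray_utils is
  function to_gray(n : natural; width : positive) return unsigned is
    variable b : unsigned(width - 1 downto 0);
  begin
    b := to_unsigned(n, width);
    -- Each Gray bit is a binary bit XORed with its left-hand neighbour;
    -- the MSB is XORed with the '0' shifted in, so it is unchanged.
    return b xor shift_right(b, 1);
  end function to_gray;
end package gray_utils;
```

Inside a stimulus loop, `inbus <= to_gray(n, buswidth);` would then replace the inline expression. Successive values differ in exactly one bit.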

317 Slide Sample VHDL Class I Testbench
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;
ENTITY addtest IS -- Top-level entity with no ports
END ENTITY addtest;
ARCHITECTURE stimulus OF addtest IS
  -- Declare design being tested
  COMPONENT adder
    PORT (
      clk : IN std_logic;
      a, b : IN std_logic_vector(3 DOWNTO 0);
      sum : OUT std_logic_vector(3 DOWNTO 0)
    );
  END COMPONENT;
  -- Signals to assign values and observe results
  SIGNAL a, b, sum : std_logic_vector(3 DOWNTO 0);
  SIGNAL clk : std_logic := '0';
  -- Constants for timing values
  CONSTANT clkperiod : TIME := 20 ns;
BEGIN
  -- Create clock to synchronize actions
  clk <= NOT clk AFTER clkperiod/2;
  -- Instantiate design being tested
  add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);
  -- Process to generate stimulus; note operations take place on the inactive clock edge
  PROCESS
    CONSTANT period : TIME := 40 ns;
    VARIABLE ina, inb : std_logic_vector(3 DOWNTO 0);
  BEGIN
    WAIT UNTIL falling_edge(clk);
    ina := (OTHERS => '0');
    inb := (OTHERS => '0');
    stim_loop : LOOP
      -- Apply generated stimulus to inputs
      a <= ina;
      b <= inb;
      WAIT FOR period;
      -- Exit loop once simulation reaches 1 us
      EXIT stim_loop WHEN NOW > 1 us;
      -- Use equations below to generate new stimulus values
      WAIT UNTIL falling_edge(clk);
      ina := ina + 2;
      inb := inb + 3;
    END LOOP stim_loop;
    -- Final wait to keep process from repeating
    WAIT;
  END PROCESS;
END ARCHITECTURE stimulus;
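The slides never show the adder being tested. One assumption consistent with its port list (including the clk input) is a registered 4-bit adder whose sum wraps modulo 16. The sketch below implements that assumption. It is not the course's actual DUT.

```vhdl
-- Hypothetical DUT for the adder testbenches: a registered 4-bit adder.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder is
  port (
    clk  : in  std_logic;
    a, b : in  std_logic_vector(3 downto 0);
    sum  : out std_logic_vector(3 downto 0)
  );
end entity adder;

architecture rtl of adder is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- unsigned "+" keeps the operand width, so the carry out is dropped
      -- and the result wraps modulo 16.
      sum <= std_logic_vector(unsigned(a) + unsigned(b));
    end if;
  end process;
end architecture rtl;
```

With this model, 1000 + 1100 registers as 0100. That matches the wrap-around in the expected results used with test_patterns later in the section.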

318 Slide Example Results

319 Slide Topics – Exam II
– State machine coding
– VHDL logic optimization & performance: balancing operators, resource sharing, pipelining
– Parameterized code constructs: pre-defined attributes, generics, for-generate, if-generate
– Simulation: RTL simulation, functional simulation, gate level simulation
– Testbenches: classes of testbenches (advantages and disadvantages), test vector generation

320 Slide Example - 1
Explain the one-hot encoding used by Altera’s Quartus. Show how you can encode the following five states: State 0, State 1, State 2, State 3, State 4.

321 Slide Example - 1
Explain the one-hot encoding used by Altera’s Quartus. Show how you can encode the following five states.
One-hot encoding: the default encoding style, requiring N bits, in which N is the number of enumeration literals in the enumeration type.
States: State 0, State 1, State 2, State 3, State 4

322 Slide Example - 2
Generate the possible logic block from the given operators:
IF (A > 20) THEN
  X <= B * C;
ELSE
  X <= C * D;
END IF;

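323 Slide Example - 2
Generate the possible logic block from the given operators:
IF (A > 20) THEN
  X <= B * C;
ELSE
  X <= C * D;
END IF;
Answer: 1 comparator (A > 20), 2 multipliers (B * C and C * D) and 1 multiplexer that selects which product drives X.
[Figure: comparator output driving the select input of a multiplexer whose inputs are the two multiplier outputs; multiplexer output X]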
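C is an operand of both products, so the resource-sharing optimization listed in the exam topics can move the multiplexer in front of a single multiplier: X = C * (B if A > 20, else D). This leaves 1 comparator, 1 multiplexer and 1 multiplier. The sketch below assumes 8-bit unsigned operands and hypothetical entity and port names.

```vhdl
-- Hypothetical resource-shared version of Example 2: one multiplier
-- instead of two, with the multiplexer moved in front of it.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shared_mult is
  port (
    A       : in  unsigned(7 downto 0);
    B, C, D : in  unsigned(7 downto 0);
    X       : out unsigned(15 downto 0)
  );
end entity shared_mult;

architecture rtl of shared_mult is
  signal operand : unsigned(7 downto 0);
begin
  -- The comparator still drives the multiplexer select.
  operand <= B when A > 20 else D;
  -- A single 8x8 multiplier yields C*B or C*D, the same results as the
  -- original IF/ELSE because multiplication is commutative.
  X <= operand * C;
end architecture rtl;
```

The trade-off is that the multiplexer now sits in front of the multiplier, on the path from A to X. Sharing saves area, but it may lengthen that path.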

324 Slide Example - 3
Use parentheses to balance the following operators:
Z <= a * b * c * d * e * f

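325 Slide Example - 3
Use parentheses to balance the following operators:
Z <= a * b * c * d * e * f
Unbalanced: z <= a * b * c * d * e * f
– Evaluated left to right as a chain of five multipliers, so the path from a and b to z passes through five multipliers
Balanced: z <= (a * b) * (c * d) * (e * f)
– a * b, c * d and e * f are computed in parallel, so the longest path passes through only three multipliers
[Figure: unbalanced multiplier chain versus balanced multiplier tree]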

326 Slide Example - 4
Draw the test waveforms generated by the following testbench.
ARCHITECTURE logic OF test_b IS
  -- Use clkperiod constant to create 50 MHz clock
  CONSTANT clkperiod : TIME := 20 ns;
  -- clk initialized to '1'
  SIGNAL clk : std_logic := '1';
  SIGNAL reset : std_logic;
BEGIN
  -- clock must be initialized when declared to use this notation
  clk <= NOT clk AFTER clkperiod/2;
  reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;

327 Slide Example - 4
Draw the test waveforms generated by the following testbench.
ARCHITECTURE logic OF test_b IS
  -- Use clkperiod constant to create 50 MHz clock
  CONSTANT clkperiod : TIME := 20 ns;
  -- clk initialized to '1'
  SIGNAL clk : std_logic := '1';
  SIGNAL reset : std_logic;
BEGIN
  -- clock must be initialized when declared to use this notation
  clk <= NOT clk AFTER clkperiod/2;
  reset <= '1', '0' AFTER 25 ns, '1' AFTER 40 ns;
END ARCHITECTURE logic;
Answer: CLK starts at '1' and toggles every 10 ns (20 ns period). RESET is '1' from 0 to 25 ns, '0' from 25 ns to 40 ns, and '1' from 40 ns onward.
[Figure: CLK and RESET waveforms versus time (ns)]

328 Slide Example - 5
Add to the following entity interface a generic clause defining generic constants Tpw_clk_h and Tpw_clk_l that specify the minimum clock pulse width timing. Both generic constants have a default value of 3 ns.
ENTITY flipflop IS
  PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;

329 Slide Example - 5
Add to the following entity interface a generic clause defining generic constants Tpw_clk_h and Tpw_clk_l that specify the minimum clock pulse width timing. Both generic constants have a default value of 3 ns.
ENTITY flipflop IS
  PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;
Answer:
ENTITY flipflop IS
  GENERIC (Tpw_clk_h, Tpw_clk_l : delay_length := 3 ns);
  PORT (clk, d : IN STD_LOGIC; q, q_n : OUT STD_LOGIC);
END ENTITY flipflop;

330 Slide Class II (& III) Methods
Add a compare process so that DUT outputs can be monitored
– Allows the testbench to do “self-verification”
[Figure: mycode_tb.vhd instantiates mycode.vhd; clk_assignment, reset_assignment and wavegen_process drive the DUT inputs (clk, in1, in2, in3); compare_process monitors out1 and out2]

331 Slide Self Verification Methods
– Use “compare_process” or an equivalent to check the results generated by the design against the expected results
– A single simulation can use one or multiple testbench files:
  – A single testbench file containing all stimulus and all expected results
  – Multiple testbench files based on stimulus, expected results or functionality (e.g. data generator, control stimulus)
– Signaling is often too complicated to model without using vectors saved in “time-slices”

332 Slide Simple Self Verifying Test Benches
clk <= NOT clk AFTER clkperiod/2;
add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);
stim : PROCESS
  VARIABLE error : BOOLEAN;
BEGIN
  WAIT UNTIL falling_edge(clk);
  a <= (OTHERS => '0');
  b <= (OTHERS => '0');
  WAIT FOR 40 ns;
  IF (sum /= 0) THEN -- comparing with an integer needs ieee.std_logic_unsigned
    error := TRUE;
  END IF;
  WAIT UNTIL falling_edge(clk);
  a <= "0010";
  b <= "0011";
  WAIT FOR 40 ns;
  IF (sum /= 5) THEN
    error := TRUE;
  END IF;
  -- repeat above, varying the values of a and b
  WAIT;
END PROCESS stim;
– Simple self-verifying testbench: code is repeated for each test case, and the result is checked
– Each sub-block within the process assigns values to a and b and waits to compare sum to its predetermined result
– Code is not very efficient: each test case may require a lot of repeated code
– Improve this code by introducing a procedure

333 Slide Simplifying Test Bench with Procedure
-- Procedure declared in the architecture declarative region
-- (conv_std_logic_vector needs ieee.std_logic_arith; comparing sum_out with an integer needs ieee.std_logic_unsigned)
PROCEDURE test (
  SIGNAL clk : IN std_logic;
  inval_a, inval_b, result : IN INTEGER RANGE 0 TO 15;
  SIGNAL in_a, in_b : OUT std_logic_vector(3 DOWNTO 0);
  SIGNAL sum_out : IN std_logic_vector(3 DOWNTO 0);
  SIGNAL error : INOUT BOOLEAN) IS
BEGIN
  WAIT UNTIL falling_edge(clk);
  in_a <= conv_std_logic_vector(inval_a, 4);
  in_b <= conv_std_logic_vector(inval_b, 4);
  WAIT FOR 40 ns;
  IF sum_out /= result THEN
    error <= TRUE;
  ELSE
    error <= FALSE;
  END IF;
END PROCEDURE;
BEGIN -- architecture begin
  clk <= NOT clk AFTER clkperiod/2;
  add1 : adder PORT MAP (clk => clk, a => a, b => b, sum => sum);
  PROCESS
  BEGIN
    test(clk, 0, 0, 0, a, b, sum, error);
    test(clk, 2, 3, 5, a, b, sum, error);
    test(clk, 4, 6, 10, a, b, sum, error);
    test(clk, 6, 9, 15, a, b, sum, error);
    test(clk, 8, 12, 4, a, b, sum, error);
    WAIT;
  END PROCESS;
END ARCHITECTURE;
Each procedure call passes in:
– the clock
– 3 integers representing the input stimulus and the expected result
– the ports connecting to the adder
– the error flag
Advantage: the procedure improves the efficiency and readability of the testbench, so it is easier to write.
Disadvantages:
– Each procedure call (like the last example) assigns values to a and b, then waits to compare sum to its predetermined result
– Very difficult to do for complicated signaling

334 Slide “Time-Slice” Vectors
Allows you to apply input stimulus and check results at specific simulation times.
Two methods for storage:
– Internal arrays: faster simulation times; harder to write, and creates a very large VHDL file
– External files: slower simulation times; easier to write; use the TEXTIO or STD_LOGIC_TEXTIO package
  – TEXTIO for reading/writing built-in data types
  – STD_LOGIC_TEXTIO for reading/writing standard logic

335 Slide Add’l Useful VHDL Constructs for Testbenches Record data types Assert & report statements Type conversion to STRING TEXTIO/File operations

336 Slide Record Data Types
Declares a new data type with multiple elements
– Allows grouping of related data types/objects
– Each element may be of any previously defined data type, including arrays, enumerated types and even other records
– Similar to a struct in C
Using records in a testbench:
– Set each record to the values for one time slice; cycle through the records to apply stimulus and check results
– Examples: store input and output values in different elements; store different inputs in different elements
TYPE test_record_type IS RECORD
  a, b : std_logic_vector(3 DOWNTO 0); -- element names : element data types
  sum : std_logic_vector(3 DOWNTO 0);
END RECORD;

337 Slide Accessing Values in a Record
– Use a selected name to access a single record element
– Use an aggregate to access the entire record
VARIABLE vector : test_record_type;
vector.a := "0010";
vector.b := "0011";
vector.sum := "0101";
vector := (a => "0010", b => "0011", sum => "0101");

338 Slide Using Internal Arrays for Stimulus & Results
– Create an array to store values (e.g. an array of records)
– Assign values to the array
-- Create an unconstrained array so the array depth can be set when an object of the array type is declared
TYPE test_array_type IS ARRAY (POSITIVE RANGE <>) OF test_record_type;
-- Constant array with 6 records
CONSTANT test_patterns : test_array_type := (
  (a => "0000", b => "0000", sum => "0000"),
  (a => "0010", b => "0011", sum => "0101"),
  (a => "0100", b => "0110", sum => "1010"),
  (a => "0110", b => "1001", sum => "1111"),
  (a => "1000", b => "1100", sum => "0100"),
  (a => "1010", b => "1111", sum => "1001")
);
* POSITIVE is a subtype of INTEGER with the range 1 to INTEGER'HIGH

339 Slide Assert Statements
ASSERT <condition> REPORT <message> SEVERITY <level>;
– Checks the condition expression and executes the assertion if the condition evaluates to false
– Use as a concurrent or sequential statement
Report (optional)
– Displays text in the simulator window
– Must be of type STRING: enclose character strings in " "; other data types must be converted (discussed later)
Severity (optional)
– Expression choices: NOTE, WARNING, ERROR, FAILURE
– ERROR is the default
– The result of each severity level depends on the simulator; e.g. by default, the ModelSim tool ends the simulation on FAILURE only

340 Slide Report Statements
REPORT <message> SEVERITY <level>;
– Displays a message without an ASSERT statement
– No expression to check
– Sequential statement only
– Text must be of type STRING: enclose character strings in " "; other data types must be converted (next slide)
Severity (optional)
– Same options as ASSERT, except NOTE is the default

341 Slide Type Conversions to STRING
Use to display formatted messages.
<type>'IMAGE(obj)
– Type attribute that converts obj of type <type> to its string equivalent with no leading or trailing whitespace
– Examples: INTEGER'IMAGE(integer_variable), TIME'IMAGE(time_variable), std_logic'IMAGE(single_bit_std_logic_variable)
Conversion utilities
– 'IMAGE cannot be used for vectors: the prefix must be a scalar type or subtype
– A simple web search can provide most (if not all) required conversion utilities
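The sample testbench on the next slide calls slv_to_string, which is not a standard VHDL function. Below is a minimal sketch of such a utility, assuming that name and a single std_logic_vector argument. It converts each element through a lookup string indexed by std_ulogic'POS. This avoids std_logic'IMAGE, whose result includes the apostrophes of the character literal (for example '1').

```vhdl
-- Hypothetical conversion utility: std_logic_vector to STRING,
-- one character per element, left-most element first.
library ieee;
use ieee.std_logic_1164.all;

package string_utils is
  function slv_to_string(v : std_logic_vector) return string;
end package string_utils;

package body string_utils is
  function slv_to_string(v : std_logic_vector) return string is
    -- Characters listed in the declaration order of std_ulogic:
    -- 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-'
    constant chars  : string(1 to 9) := "UX01ZWLH-";
    variable result : string(1 to v'LENGTH);
    variable k      : positive := 1;
  begin
    -- v'RANGE iterates from the left bound, so the text reads like the literal.
    for i in v'RANGE loop
      result(k) := chars(std_ulogic'POS(v(i)) + 1);
      k := k + 1;
    end loop;
    return result;
  end function slv_to_string;
end package body string_utils;
```

A message such as `REPORT TIME'IMAGE(NOW) & " : Calc= " & slv_to_string(sum);` then prints the vector as plain text, for example 0100.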

342 Slide Sample Testbench Using Internal Array
-- entity and some of the architecture declaration not shown
SIGNAL testclk : std_logic := '0';
SIGNAL a, b : std_logic_vector (3 DOWNTO 0);
SIGNAL sum : std_logic_vector (3 DOWNTO 0);
CONSTANT clk_period : time := 20 ns;
TYPE test_record_type IS RECORD
  a, b : std_logic_vector(3 DOWNTO 0);
  sum : std_logic_vector(3 DOWNTO 0);
END RECORD;
TYPE test_array_type IS ARRAY (POSITIVE RANGE <>) OF test_record_type;
CONSTANT test_patterns : test_array_type := (
  (a => "0000", b => "0000", sum => "XXXX"),
  (a => "0010", b => "0011", sum => "0000"),
  (a => "0100", b => "0110", sum => "0101"),
  (a => "0110", b => "1001", sum => "1010"),
  (a => "1000", b => "1100", sum => "1111"),
  (a => "1000", b => "1100", sum => "0100")
);
BEGIN -- beginning of architecture body
  -- instantiate unit under test (adder)
  add1 : adder PORT MAP (clk => testclk, a => a, b => b, sum => sum);
  -- free-running clock process
  testclk <= NOT testclk AFTER clk_period/2;
  test : PROCESS
    VARIABLE vector : test_record_type;
    VARIABLE found_error : BOOLEAN := FALSE;
  BEGIN
    -- Loop through all the values in test_patterns
    FOR i IN test_patterns'RANGE LOOP
      vector := test_patterns(i);
      -- apply the stimulus on a falling edge of the clock
      WAIT UNTIL falling_edge(testclk);
      a <= vector.a;
      b <= vector.b;
      -- check the result on the next falling edge of the clock
      WAIT UNTIL falling_edge(testclk);
      IF (sum /= vector.sum) THEN
        REPORT TIME'IMAGE(NOW) & " : Calc= " & slv_to_string(sum) & ", Exp= " & slv_to_string(vector.sum);
        found_error := TRUE;
      END IF;
    END LOOP;
    ASSERT NOT found_error REPORT "---VECTORS FAILED---" SEVERITY FAILURE;
    ASSERT found_error REPORT "---VECTORS PASSED---" SEVERITY FAILURE;
  END PROCESS;
END ARCHITECTURE;

343 Slide Example Results
Testbench fails (expected results ≠ actual results), ModelSim Transcript Window:
** Note: 72 ns : Calc = 0100, Exp= 1001
   Time: 72 ns  Iteration: 0  Instance: /record_add_tb
** Failure: ---VECTORS FAILED---
   Time: 288 ns  Iteration: 0  Process: /record_add_tb/test  File: …
Break in Process test at record_tb.vhd line 56
Testbench passes, ModelSim Transcript Window:
** Failure: ---VECTORS PASSED---
   Time: 288 ns  Iteration: 0  Process: /record_add_tb/test  File: …
Break in Process test at record_tb.vhd line 59

344 Slide TEXTIO/FILE Operations
FILE declaration
– Creates a file handle to represent the file
– Opens the file in READ_MODE, WRITE_MODE or APPEND_MODE
LINE declaration
– Creates a line variable for reading and writing to files
READLINE(<file_handle>, <line_variable>)
– Reads a line from a file and stores it in a variable of type LINE
READ(<line_variable>, <data_object>)
– Reads text from the line variable and writes it to the data object, depending on the size/type of the data object
– Use the STD_LOGIC_TEXTIO package to read directly into std_logic data objects; the TEXTIO package alone supports only built-in data types (e.g. BIT, BIT_VECTOR, BOOLEAN, CHARACTER, INTEGER, REAL, STRING, TIME)
WRITE(<line_variable>, <data_object>)
– Writes a data object to a variable of type LINE as text
– Use the STD_LOGIC_TEXTIO package to write directly from std_logic data objects; the TEXTIO package alone supports only built-in data types (e.g. BIT, BIT_VECTOR, BOOLEAN, CHARACTER, INTEGER, REAL, STRING, TIME)
WRITELINE(<file_handle>, <line_variable>)
– Writes the contents of a variable of type LINE to the file

345 Slide Sample Testbench Using External File
-- Declare packages to enable file operations
LIBRARY ieee;
USE STD.TEXTIO.ALL;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_textio.ALL;
ENTITY file_tb IS
END ENTITY file_tb;
ARCHITECTURE stimulus OF file_tb IS
  COMPONENT adder
    PORT (clk  : IN  std_logic;
          a, b : IN  std_logic_vector(3 DOWNTO 0);
          sum  : OUT std_logic_vector(3 DOWNTO 0));
  END COMPONENT;
  -- create file handles to access text files, one for reading vectors and
  -- another to write output messages
  FILE vectorfile : TEXT OPEN READ_MODE IS "vectors.txt";
  FILE results    : TEXT OPEN WRITE_MODE IS "results.txt";
  SIGNAL a, b, sum : std_logic_vector (3 DOWNTO 0);
  SIGNAL testclk   : std_logic := '0';
  CONSTANT clk_period : TIME := 20 ns;
BEGIN -- beginning of architecture body
  -- instantiate unit under test (adder)
  add1 : adder PORT MAP (clk => testclk, a => a, b => b, sum => sum);
  -- free-running clock process
  testclk <= NOT testclk AFTER clk_period/2;

346 Slide Sample Testbench Using External File (cont.)
  test: PROCESS
    VARIABLE found_error : BOOLEAN := FALSE;
    VARIABLE a_var, b_var, sum_var : std_logic_vector (3 DOWNTO 0);
    VARIABLE vectorline, resultsline : LINE;
  BEGIN
    WHILE NOT ENDFILE (vectorfile) LOOP
      -- read file into line and line into variables
      READLINE (vectorfile, vectorline);
      READ (vectorline, a_var);
      READ (vectorline, b_var);
      READ (vectorline, sum_var);
      -- apply the stimulus on a falling edge clock
      WAIT UNTIL falling_edge(testclk);
      a <= a_var;
      b <= b_var;
      -- check result on next falling clock edge
      WAIT UNTIL falling_edge(testclk);
      IF (sum /= sum_var) THEN
        -- write current simulation time to line variable
        WRITE (resultsline, NOW);
        -- write string
        WRITE (resultsline, string'(" : Calc= "));
        -- write result value
        WRITE (resultsline, sum);
        -- write string
        WRITE (resultsline, string'(", Exp= "));
        -- write expected value
        WRITE (resultsline, sum_var);
        -- write entire line to text file
        WRITELINE (results, resultsline);
        found_error := TRUE;
      END IF;
    END LOOP;
    ASSERT NOT found_error REPORT "---VECTORS FAILED---" SEVERITY FAILURE;
    ASSERT found_error REPORT "---VECTORS PASSED---" SEVERITY FAILURE;
  END PROCESS test;
END ARCHITECTURE stimulus;

347 Slide Example Files
vectors.txt
– No inherent formatting, except that white space between values is skipped
– Options:
   Use separate files for stimulus and expected results
   Design custom tasks to extend capabilities (e.g. support comments)
results.txt (failure example)
… ns : Calc= 0100, Exp= 1001
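The "support comments" option could be implemented as a small procedure that the read loop calls instead of READLINE. The sketch below is one assumption of how; the names tb_file_utils and read_data_line are hypothetical, not part of STD.TEXTIO. It keeps reading until it finds a line that is neither empty nor starts with "--", and reports through found whether such a line existed before end of file. A line containing only spaces is not treated as blank.

```vhdl
use std.textio.all;

-- Hypothetical package and procedure names.
package tb_file_utils is
  -- Reads lines from f until one is neither empty nor a comment line
  -- (first two characters "--"). That line is returned in l.
  -- found is FALSE if end of file is reached first.
  procedure read_data_line (file f : text; l : inout line; found : out boolean);
end package tb_file_utils;

package body tb_file_utils is

  procedure read_data_line (file f : text; l : inout line; found : out boolean) is
  begin
    found := false;
    while not endfile(f) loop
      readline(f, l);
      if l'length = 0 then
        next;                               -- blank line: skip it
      end if;
      if l'length >= 2 then
        if l(l'low to l'low + 1) = "--" then
          next;                             -- comment line: skip it
        end if;
      end if;
      found := true;                        -- data line: hand it back in l
      return;
    end loop;
  end procedure read_data_line;

end package body tb_file_utils;
```

In the loop of slide 346, READLINE (vectorfile, vectorline); could then become read_data_line(vectorfile, vectorline, got_line); followed by EXIT WHEN NOT got_line;, where got_line is an extra BOOLEAN variable (also hypothetical).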

348 Slide Example Test Plans
Develop a high-level behavioral (i.e. non-synthesizable) model of the design
Create stimulus/test vectors to simulate the model
Generate expected results from the behavioral model simulation
Replace behavioral blocks with RTL model blocks
– Simulate each RTL block with the other behavioral blocks to ensure the functionality is the same

349 Slide - Synthesis
Synthesis is the first step in which the HDL (or other design format) is associated with internal logic.
Input: the design.
Output: a design netlist that feeds into the implementation tools.
Other outputs: a functional simulation netlist, and reports that provide pertinent information about the synthesized design.
Synthesis can be performed immediately following the design phase; it is mandatory (simulation is optional).
Netlist: connects FPGA resources to perform the same function defined by the high-level design.

350 Slide - What is Design Synthesis?
An FPGA consists of logic blocks that can be configured to perform functions.
Synthesis takes the high-level design, associates it with FPGA resources, and reduces logic to make the design more efficient.
The synthesis process needs information about the FPGA device, such as speed and internal resources.
The FPGA is identified by selecting the family, device number, package, and speed.

351 Slide - What is Design Synthesis?
Three basic synthesis operations:
Syntax check and resource association: the design is checked for syntax and synthesis errors. Once the design is error free, it is converted into structural elements; logic elements are inserted as replacements for arithmetic operators (such as *, -).
Optimization: the design is first put together without concern for redundant logic, timing constraints (if provided), clock speed, or other design considerations. Next, algorithms are used to optimize the design: they check for redundant logic, consider clock speed, and evaluate multiple paths to ensure the fastest timing is achieved.

352 Slide - Optimization
The shortest path does not necessarily give the fastest timing, because of the resource layout and how those resources are used.
Example: option 2 is longer; however, option 1 has more resource delays, so option 2 is faster.

353 Slide - What is Design Synthesis? (cont'd)
Technology mapping: map the optimized design to the technology associated with the targeted FPGA.
Synthesis tools use advanced techniques to predict how the design will be placed and routed in the target device.
Synthesis tools produce timing estimates that are close to the actual post-implementation timing; the real timing is unknown until after the design has been placed and routed.
(Figure: examples of some technology view symbols)

354 Slide - Synthesis Phase Tools
Synthesis tools are available as standalone tools or as part of a complete package.
Complete package synthesis
– Examples: Xilinx ISE, Altera Quartus
– Advantages:
   Single tool: you need to know only one tool
   Faster: eliminates the time to switch between tools
   Cheaper
   The manufacturer understands the device better than a third party, so the data are more accurate
– Disadvantages: manufacturer dependent
Standalone package synthesis
– Examples: LeonardoSpectrum by Mentor Graphics, Synplify Pro by Synopsys
– Advantages: manufacturer independent
– Disadvantages:
   Separate tools for synthesis and implementation
   More expensive than the complete package
   Less device expertise than the manufacturer's own tool

355 Slide - Synthesis Setup
Synthesis setup consists of:
– Device information (family, device number, package, and speed)
– Input design
– User-defined constraint file(s)
Input design: Altera's Quartus accepts:
– AHDL (Altera Hardware Description Language)
– VHDL
– Verilog
– Schematic capture
– EDIF: vendor-independent netlist file
Outputs:
– Netlist: the synthesized design
– Status reports: utilization, timing, ...
– Schematic view: RTL

356 Slide - Netlists
The design netlist is what your design looks like after it has been synthesized (optimized and connected using internal FPGA logic).
Functional simulation netlist: allows you to verify that the synthesis process did not alter the design; you should expect the same results using your testbenches.
Functional simulation is done using a simulator (ModelSim).

357 Slide - Status Reports
Optional: reports on resource utilization, timing information, critical paths, warnings, and errors.
Not used as input to other development phases.
Very helpful information that allows you to identify real or potential problems, such as a design not meeting timing and other constraints.

358 Slide - Schematic View
Synthesis tools generate two schematic views: RTL and technology.
RTL: shows the pre-optimized design in terms of generic symbols, such as adders, multipliers, counters, AND gates, etc. The RTL view is manufacturer independent (not yet associated with a manufacturer).
Technology: shows gates and elements as they will look in the device.

359 Slide - RTL Schematic View
RTL: how the design looks as it is converted to logic elements

360 Slide - Technology Schematic View
Technology: shows the internal technology, such as lookup tables, connected to create the design.

361 Slide - Key points to remember
Synthesis is required and must be performed prior to implementation.
Tools are available as complete packages or standalone.
Functional simulation should be performed, time permitting.
RTL and technology views show what logic makes up the design.

362 Slide - Quartus II Full Compilation Flow
(Figure: Quartus II full compilation flow. Blocks: Design Files; Constraints & settings; Analysis & Elaboration; Synthesis; Fitter; Assembler; TimeQuest Timing Analysis; EDA Netlist Writer. Outputs: Functional Netlist for Functional Simulation; Post-Fit Simulation Files for Gate-Level Simulation; Programming & Configuration files. Annotation: executed in parallel (multi-processor or multi-core systems only).)

363 Slide - Netlist Viewers
RTL Viewer
– Schematic of design after Analysis and Elaboration
– Visually check initial HDL before synthesis optimizations
– Locate synthesized nodes for assigning constraints
– Debug verification issues
Technology Map Viewers (Post-Mapping or Post-Fitting)
– Graphically represent results of mapping (post-synthesis) & fitting
– Analyze critical timing paths graphically
– Locate nodes & node names after optimizations

364 Slide - RTL Viewer
Open from Tools menu → Netlist Viewers, or from the Tasks window ("Compile Design" tasks)
Note: must perform elaboration first (e.g. Analysis & Elaboration OR Analysis & Synthesis)
(Figure labels: Schematic view, Hierarchy list, Find in hierarchy)

365 Slide - Schematic View (RTL Viewer)
Represents the design using logic blocks & nets
– I/O pins
– Registers
– Muxes
– Gates (AND, OR, etc.)
– Operators (adders, multipliers, etc.)
Place the pointer over any element in the schematic to see details (name, internal resource count)

366 Slide - Schematic Hierarchy Navigation
Descend hierarchy
– Double-click on instance
– Right-click & select Hierarchy Down
Ascend hierarchy
– Double-click in white space
– Right-click & select Hierarchy Up
Middle hierarchy
– Double-click on instance descends
– Double-click in white space ascends

367 Slide - Technology Map Viewers
Open from Tools menu → Netlist Viewers, or from the Tasks window ("Compile Design" tasks)
Note: must run synthesis and/or fitting first
(Figure labels: Schematic view, Hierarchy list)

368 Slide - Schematic View (Technology Viewer)
Represents the design using atoms
– I/O pins & cells
– Lcells
– Memory blocks
– MAC (DSP blocks)
Place the pointer over any element in the schematic to see details (name, internal resource count, logic equation)

369 Slide - Hierarchy List
Traverse between levels of design hierarchy
View logic schematic for each hierarchical level
Break down each hierarchical level into netlist elements or atoms
– Instances
– Primitives
– Pins
– Nets
– State machines
– Logic clouds (if enabled)

370 Slide - Using Hierarchy List
Expanding an instance shows its instances, pins, and nets
Highlighting a netlist element in the hierarchy list highlights/views that element in the schematic view

371 Slide - Timing Analysis - Quartus TimeQuest GUI
Using the TimeQuest Timing Analyzer in the Quartus II flow
Online training course by Altera: http://www.altera.com/customertraining/webex/TimeQuest/player.html
Note: For more details on verifying designs for timing, please attend the course "Quartus II Software Design Series: Timing Analysis"
Online training also available: TimeQuest Timing Analyzer

372 Slide - TimeQuest Timing Analyzer
Timing engine in the Quartus II software
Provides a timing analysis solution for all levels of experience and design complexity
Features
– Synopsys Design Constraints (SDC) support
– Easy-to-use interface
– Scripting emphasis

373 Slide - Opening the TimeQuest Interface
Toolbar button
Tools menu
Tasks window
Stand-alone mode (run without opening the Quartus II software)
– quartus_staw

374 Slide - Quartus Settings File (QSF)
SDC constraints are not stored in the QSF
For 90 nm and older devices, the TimeQuest TA provides a script to convert QSF timing assignments to SDC

375 Slide - TimeQuest GUI
(Figure labels: Report pane, Tasks pane, Console pane, View pane)
Menu access to all TimeQuest features

376 Slide - SDC File Editor (1)
Use the Quartus II editor to create and/or edit SDC files
– TimeQuest: File menu → New/Open SDC File
– Quartus II: File menu → New → Other Files
Features
– Access to GUI dialog boxes for constraint entry
– Syntax coloring
– Tooltip syntax help (command tooltip)

377 Slide - SDC File Editor (2)
Construct an SDC file using the TimeQuest graphical constraint creation tools
Constraints are inserted at the cursor location

378 Slide - Using TimeQuest TA in Quartus II Flow

379 Slide - Steps to Using TimeQuest Tool
1. Generate timing netlist
2. Enter SDC constraints by creating or reading in an SDC file
3. Update timing netlist
4. Generate timing reports

380 Slide - 1. Generate Timing Netlist
Create a timing netlist based on compilation results
– Post-synthesis (mapping) or post-fit (fully compiled)
– Delay model (slow or fast)
Netlist menu gives complete control
Tasks pane uses the default (post-fit, slow)
(Figure labels: Netlist menu, Tasks pane, Tcl equivalent of command)

381 Slide - 2. Create or Read in SDC File
Create an SDC file using the SDC file editor
– Don't enter constraints using the Constraints menu
Read in constraints & exceptions from an existing SDC file
Execution
– Read SDC File (Tasks pane or Constraints menu)
File precedence (if no filename is specified)
– Files specifically added to the Quartus II project
– <…>.sdc (if it exists in the project directory)
Tcl: read_sdc [<filename>]

382 Slide - Constraining
User MUST enter constraints for all paths to fully analyze the design
– The timing analyzer only performs slack analysis on constrained design paths
– Constraints guide the fitter to place & route the design in order to meet timing requirements
Not as difficult a task as it may sound
– Wildcards
– Single, generalized constraints cover many paths, even all paths in an entire clock domain
See the Altera TimeQuest Timing Analyzer online training for information about basic SDC constraints

383 Slide - 3. Update Timing Netlist
Apply SDC constraints/exceptions to the current timing netlist
Generates warnings
– Undefined clocks
– Partially defined I/O delays
– Combinational loops
Update the timing netlist after adding any new constraint
Execution
– Update Timing Netlist (Tasks pane or Netlist menu)
Tcl: update_timing_netlist

384 Slide - 4. Generate Timing Reports
Verify timing requirements and locate violations
Check for a fully constrained design or ignored timing constraints
Two methods
– Tasks pane (shortcut): automatically creates/updates the netlist & reads the default SDC file if needed; double-click an individual report (shortcut to skip steps 1-3)
– Reports menu: must have a valid netlist to access

385 Slide - Reset Design Command
Located in the Tasks pane or Constraints menu
Flushes all timing constraints from the current timing netlist
– Functional Tcl equivalent: delete_timing_netlist command followed by create_timing_netlist
Uses
– "Re-starting" timing analysis on the same timing netlist, applying different constraints or SDC file
– Starting analysis over if results seem to be unexpected

386 Slide - Using TimeQuest TA in Quartus II Flow
Enable TimeQuest TA in Quartus II project
Synthesize Quartus II project
Use TimeQuest TA to specify timing requirements
Verify timing in TimeQuest TA
Perform full compilation (run Fitter)

387 Slide - Enable TimeQuest TA in Quartus II Software
Tells the Quartus II software to use SDC constraints during fitting
File order precedence
1. Any SDC files manually added to the Quartus II project (in order)
2. <…>.sdc located in the project directory

388 Slide - Enabling in the Quartus II Software
Notes:
– Arria GX and newer devices only support the TimeQuest TA.
– The TimeQuest TA is enabled by default for new Stratix III and Cyclone III designs.

389 Slide - Adding SDC File to Quartus II Project
Add SDC files to the TimeQuest Timing Analyzer settings; click Add to add an SDC file to the list
Multicorner timing analysis checks all process corners (on by default for Cyclone II, Stratix II, & newer devices)
– Analyzes fast and slow corners during compile

390 Slide - Using TimeQuest TA in Quartus II Flow
Enable TimeQuest TA in Quartus II project
Synthesize Quartus II project
Use TimeQuest TA to specify timing requirements
Verify timing in TimeQuest TA
Perform full compilation (run Fitter)

391 Slide - Verifying Timing Requirements
View TimeQuest summary information directly in the Quartus II Compilation Report
Open the TimeQuest TA for more thorough analysis
– Follow the TimeQuest flow using the post-fit netlist
– Run TimeQuest's easy-to-use reporting capabilities (Tasks pane)
– Place Tcl reporting commands into a script file for easy repetition
Verify whether the Fitter was able to meet timing requirements

392 Slide - 3rd-Party Timing Analysis Tool Support
Synopsys
– PrimeTime
Mentor Graphics
– TAU

393 Slide - Design Constraints: An Example
The figure shows an example circuit including two clocks, a PLL, and other common synchronous design elements.

394 Slide - SDC - Example
# Create clock constraints (period values are not shown on the slide)
create_clock -name clockone -period <period> [get_ports {clk1}]
create_clock -name clocktwo -period <period> [get_ports {clk2}]
# Specify that clockone and clocktwo are unrelated by assigning
# them to separate exclusive groups
set_clock_groups -exclusive -group [get_clocks {clockone}] -group [get_clocks {clocktwo}]
# set input and output delays
set_input_delay -clock { clockone } -max 4 [get_ports {data1}]
set_input_delay -clock { clockone } -min -1 [get_ports {data1}]
set_input_delay -clock { clockone } -max 4 [get_ports {data2}]
set_input_delay -clock { clockone } -min -1 [get_ports {data2}]

395 Slide - SDC Example
The SDC file shown contains the following basic constraints you should include for most designs:
– Definitions of clockone and clocktwo as base clocks, and assignment of those settings to nodes in the design (create_clock command), e.g.
   create_clock -period 10 -name clk_sys [get_ports clk_sys]
– Specification of two mutually exclusive clock groups, one containing clockone and the other containing clocktwo. This overrides the default analysis of all clocks in the design as related to each other.
   set_clock_groups -exclusive -group [get_clocks {clockone}] -group [get_clocks {clocktwo}]
– Specification of input delays for the design, to specify the external input delay requirement with reference to a clock.
   set_input_delay -clock { clockone } -max 4 [get_ports {data1}]

396 Slide - Summary
The TimeQuest timing analyzer provides an easy-to-use tool to verify timing
– Entering timing constraints
– Running various timing reports

397 Slide - Implementation
Implementation is also referred to as Place and Route (PAR); it is the hardest job.
Input: synthesized netlist
Output: bit stream or programming file, with an optional gate-level simulation netlist
Maps the synthesized netlist to the target FPGA's resources and interconnects them to the FPGA's internal logic and I/O resources. The physical layout is determined.
Takes four steps to convert the mid-level netlist to a final programming file: translate, map, place and route, and generate programming file.

398 Slide - Translate
The translation process takes the input netlist and merges it with the design constraints (if provided) to create a native generic database (NGD) output file.
The synthesized netlist is automatically fed into the translation process. If an error is detected, the tool stops.
Once translation is complete, the NGD output netlist is automatically fed into the mapping process.

399 Slide - Map
Mapping takes the NGD netlist (the logical design) and maps it to the target FPGA.
First, a logical DRC (design rule check) is performed on the NGD netlist.
The logic is then mapped to the target FPGA's logic cells, I/O cells, and other internal resources.
The output is a native circuit description (NCD) file.
NCD: the physical representation of the design, mapped to the target FPGA's internal resources and components. The NCD file feeds into the place-and-route stage.

400 Slide - Place and Route
Takes the NCD file and interconnects the design (places and routes it).
The output is an NCD file, which is used to create the programming bit stream.
An optional gate-level simulation provides actual gate delays based on placement and routing.
If functional simulation was successful but gate-level simulation is not, you need to narrow down where the problem first occurred.

401 Slide - Generate Program File
The final step is to generate the programming file. The input is the NCD file from the place-and-route step; the output is the FPGA's programming file.
This programming file resides on a nonvolatile device, such as a PROM, or within the FPGA device.
The bit stream is automatically downloaded to the FPGA at power-up; this process is called configuration.
The implementation tool provides various options: the bit stream can be compressed or uncompressed, and security options are available to prevent unauthorized downloading of the bit stream.
Once the bit stream is ready, the next step is to program the FPGA.

402 Slide - Implementation Tools
The implementation tool is offered by the FPGA's manufacturer, generally not by a third-party company.
The tools use proprietary algorithms to process the synthesized netlist and produce the final programming file.
Setup is easy: with a complete-package development tool, the synthesized netlist is automatically fed into the implementation process. With a third party's netlist, the tools must be directed to the synthesized netlist.
Putting the design into the FPGA and interconnecting it can be the most challenging and time-consuming part of the development process.
Minimum input: synthesized netlist, with an optional user-defined constraints file.

403 Slide - Implementation Tools (cont'd): User Constraints
User-defined constraint files contain information such as timing, pin assignments, and internal placement for logic.
Constraints make the tool work harder. Make sure to consider all the factors when determining when and what should be constrained.
Try to keep the device utilization below a reasonable percentage, and consider the room needed for potential growth and spare pins.
Pin assignment is the most used constraint, since it impacts the board routing. Either the tool or you should assign pins. One approach: let the tool make the initial pin assignment, review the list, and make changes as necessary.

404 Slide - Implementation Phase Tips
Remember to lock pin assignments; otherwise they are subject to change.
Create constraints only when necessary.
The implementation processes can run back-to-back if no errors are encountered.
Consult the data sheet, user's guide, or other manufacturer's materials to find acceptable configuration options for your FPGA.

405 Slide - Programming
Programming is the final development phase and the introduction of hardware.
Programming involves transferring the bit stream into a nonvolatile or volatile memory device and configuring or programming the FPGA.
Data transfer can be serial or parallel.
Configuration can involve one FPGA or a series of daisy-chained (connected) FPGAs.
Nonvolatile devices can be located on the same board as the targeted FPGA or even on another board.
The FPGA may operate in either master (controlling configuration) or slave (not controlling configuration) mode.

406 Slide - Tools and Hardware
If a microprocessor holds the bit stream, the bit stream is merged with the software build, and the processor configures the FPGA on power-up.
For nonvolatile memory, programming options include:
– JTAG (Joint Test Action Group)
– In-system programming (ISP)
– Third-party programmer

407 Slide - JTAG - Joint Test Action Group
IEEE Standard Test Access Port and Boundary-Scan Architecture.
Uses access pins on a JTAG-compatible device that provide visibility inside the device.
A testing and debugging mechanism used to detect manufacturing faults on populated boards.
Tools include JTAG software on a host computer; the hardware is a JTAG cable.
JTAG software is the interface used to transfer the bit stream from the host to the programmable device.

408 Slide - JTAG - Joint Test Action Group (cont'd)
Over time, it was realized that JTAG ports could be used for programming.
The pins include:
– TDI (Test Data In)
– TDO (Test Data Out)
– TCK (Test Clock)
– TMS (Test Mode Select)
– Optional TRST (Test Reset)
JTAG programming involves transferring the bit stream from the host through the JTAG cable to a header, test pins, or a connector on the board that connects to the JTAG-compatible nonvolatile memory devices.
FPGA manufacturers generally offer JTAG programming tools, cables, and any necessary supplies.

409 Slide - In-System Programming
The device can be programmed while the system is still operating.
The datasheet specifies whether the device supports ISP.
Tools needed: ISP software on the host and a download cable.
Programming can be done by connecting test pins to ATE or to a board connector.
Supported protocols include the IEEE standard for boundary-scan-based in-system configuration (IEEE 1532), JTAG, and the serial peripheral interface (SPI).
ISP is a better option.

410 Slide - Third Party Programming
Available from third-party manufacturers.
They include a GUI, a programming base that connects to a computer, and socket adaptors, or an all-in-one programmer.
A socket adaptor is where the programmable device is placed to be programmed; each is designed to hold a specific package type.
Example: Data I/O
Manual programming

411 Slide - Hardware Configuration
The FPGA can be a master or a slave.
Configuration pins are set to specific values to indicate whether it is a master or a slave.
Always make the programming pins accessible via test points or a connector.
Mode              M2  M1  M0
Master serial     0   0   0
Slave serial      1   1   1
Master parallel   0   1   1
Slave parallel    1   1   0
JTAG              1   0   1

412 Slide - Board Design Tips
Tip 1: When daisy-chaining devices, make sure to add the ability to jump out or remove any of the devices if necessary.
Tip 2: Design with a troubleshooting mindset; test points, pads, and connectors are valuable. Consider using test connectors that mate with the lab equipment hardware.
Tip 3: Select the FPGA package based on the ability to upgrade to a larger device in the same package without re-spinning the board. Make sure the two devices are pin-to-pin compatible. Goal: upgrade to a larger device without having to redo the board.
Tip 4: For unused pins, consult the datasheet for the appropriate level (terminate unused pins).

413 Slide - DE2 board
The DE2 board contains a serial EEPROM chip that stores configuration data for the Cyclone II FPGA. This configuration data is automatically loaded from the EEPROM chip into the FPGA each time power is applied to the board. Using the Quartus II software, it is possible to reprogram the FPGA at any time, and it is also possible to change the non-volatile data stored in the serial EEPROM chip.
JTAG programming: in this method of programming, named after the IEEE standards group Joint Test Action Group, the configuration bit stream is downloaded directly into the Cyclone II FPGA. The FPGA retains this configuration as long as power is applied to the board; the configuration is lost when the power is turned off.
AS programming: in this method, called Active Serial programming, the configuration bit stream is downloaded into the Altera EPCS16 serial EEPROM chip. It provides non-volatile storage of the bit stream, so the information is retained even when the power supply to the DE2 board is turned off. When the board's power is turned on, the configuration data in the EPCS16 device is automatically loaded into the Cyclone II FPGA.

414 Slide Final Exam Scope – Wednesday Dec 12:45 pm
PLD
– PROM
– PLA
– PAL
– CPLD
– Programming PLD
– ASIC
FPGA Architecture
Quartus Development software
FPGA Programming Technology
– SRAM versus Antifuse FPGA
– EEPROM/Flash FPGA
Xilinx FPGA Architecture
– FPGA basic building blocks
– FPGA Embedded Blocks
– FPGA Clocking Mechanism
– FPGA Family
Altera Megafunctions
FPGA Design flow
Design phase
Advanced VHDL Topics
– Simulation versus Synthesis
– Latches versus registers
– Common pitfalls: unwanted latches, case statement, variables versus signals, synthesizable subprograms, gated clocks
Inferring Logic Functions
– Control Signal Priority
– Tri-state
– Memory
– State Machine Coding
VHDL Logic Optimization & Performance
– Balancing operators
– Resource Sharing
– Logic Duplication
– Pipelining

415 Slide Final Exam Scope – Wednesday Dec 12:45 pm (cont.)
Parameterized Code
– Constructs: Pre-Defined Attributes, Generics, For Generate, If Generate
RTL Simulation
Functional Simulation
Gate Level Simulation
Testbenches
– Classes of Testbenches: advantages and disadvantages
– Test Vector Generation
– Self Verifying Testbenches
– Useful VHDL constructs for Testbenches
Synthesis
– Synthesis Operation: syntax check and resource association, optimization, technology mapping
– Synthesis Tools
– Netlists
– Status Reports
– Schematic View (RTL and Technology View)
Timing Analysis using TimeQuest
Implementation
– Implementation Processes
– Tools
Programming
– Tools and hardware

416 Slide Example - 1
What is DRC, and where does it happen in the implementation phase?
State the four processes of the implementation phase.
Explain the difference between functional simulation and gate-level simulation.

417 Slide Example - 1
What is DRC, and where does it happen in the implementation phase?
– DRC stands for Design Rule Check; it is performed on the NGD netlist during mapping.
State the four processes of the implementation phase.
– Translate, Map, Place and Route, and Generate Program File

418 Slide Example – 1: Functional vs. Gate-Level
Functional
– Performed on the netlist or code generated by the synthesis tool
– Sometimes necessary to direct the synthesis tool to provide a netlist
– Initial timing analysis
– Answers: will the synthesized design fit or work on the target hardware?
Gate-Level
– Performed on the netlist generated by the implementation tool
– Contains actual timing information
– Answers: will it work as expected over all operational conditions?
– Detects design timing problems
– It is representative of hardware and the most realistic

419 Slide Example - 2
Given the following entity declaration of a register, write a component instantiation that instantiates the reg entity to implement a 4-bit control register. The register data input connects to the rightmost four bits of data_out, the clk input to io_write, the reset input to io_reset, and the data output to control signals io_en, io_int_en, io_dir, and io_mode.
ENTITY reg IS
  GENERIC (width : positive);
  PORT (d          : IN  STD_LOGIC_VECTOR (0 TO width - 1);
        q          : OUT STD_LOGIC_VECTOR (0 TO width - 1);
        Clk, reset : IN  STD_LOGIC);
END ENTITY reg;

420 Slide Example - 2
Write a component instantiation that instantiates the reg entity to implement a 4-bit control register. The register data input connects to the rightmost four bits of data_out, the clk input to io_write, the reset input to io_reset, and the data output to control signals io_en, io_int_en, io_dir, and io_mode.
io_control_reg : reg
  GENERIC MAP (width => 4)
  PORT MAP (d     => data_out(3 DOWNTO 0),
            q(0)  => io_en,
            q(1)  => io_int_en,
            q(2)  => io_dir,
            q(3)  => io_mode,
            clk   => io_write,
            reset => io_reset);
-- assumes a matching COMPONENT reg declaration in the enclosing architecture

421 Slide Example - 3
Draw a diagram illustrating the circuit described by the following generate statement:
Synch_delay_line : for stage in 1 to 4 generate
  Delay_ff : component d_ff
    port map (clk => sys_clock,
              d   => delayed_data(stage - 1),
              q   => delayed_data(stage));
End generate synch_delay_line;

422 Slide Example - 3
(Figure: diagram of the circuit described by the generate statement)
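The circuit is four D flip-flops in series, all clocked by sys_clock: delayed_data(0) feeds the first stage, each stage's output feeds the next, and delayed_data(4) is the input delayed by four clock cycles. To make that concrete, the sketch below wraps the generate statement in a complete design. The d_ff entity, the wrapper entity synch_delay, and its ports data_in/data_out are hypothetical, and the flip-flop is assumed to be rising-edge triggered because the question does not say.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical d_ff entity matching the component used on the slide.
entity d_ff is
  port (clk : in std_logic; d : in std_logic; q : out std_logic);
end entity d_ff;

architecture rtl of d_ff is
begin
  -- rising-edge D flip-flop (edge choice is an assumption)
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: data_in enters stage 1, data_out leaves stage 4.
entity synch_delay is
  port (sys_clock : in  std_logic;
        data_in   : in  std_logic;
        data_out  : out std_logic);
end entity synch_delay;

architecture structural of synch_delay is
  component d_ff is
    port (clk : in std_logic; d : in std_logic; q : out std_logic);
  end component d_ff;
  -- delayed_data(0) is the chain input; delayed_data(k) is stage k's output
  signal delayed_data : std_logic_vector(0 to 4);
begin
  delayed_data(0) <= data_in;

  -- the generate statement from the slide: one flip-flop per stage
  synch_delay_line : for stage in 1 to 4 generate
    delay_ff : component d_ff
      port map (clk => sys_clock,
                d   => delayed_data(stage - 1),
                q   => delayed_data(stage));
  end generate synch_delay_line;

  data_out <= delayed_data(4);
end architecture structural;
```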

423 Slide Example - 4 Write a conditional generate statement that connects a signal external_clock directly to a signal internal_clock if a Boolean generic constant positive_clock is true. If the generic is false, the statement should connect external_clock to internal_clock via an instance of an inverter component.

424 Slide Example - 4
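One possible answer, sketched as a complete design. The entity names clock_select and inverter and the inverter's port names a/y are hypothetical; external_clock, internal_clock, and positive_clock come from the question. In VHDL-93 an IF GENERATE has no ELSE branch, so two generate statements with complementary conditions are used; VHDL-2008 also allows an ELSE GENERATE branch in a single statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical inverter entity standing in for "an inverter component".
entity inverter is
  port (a : in std_logic; y : out std_logic);
end entity inverter;

architecture rtl of inverter is
begin
  y <= not a;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity clock_select is
  generic (positive_clock : boolean := true);
  port (external_clock : in  std_logic;
        internal_clock : out std_logic);
end entity clock_select;

architecture structural of clock_select is
  component inverter is
    port (a : in std_logic; y : out std_logic);
  end component inverter;
begin
  -- generic TRUE: connect the clock straight through
  direct_clock : if positive_clock generate
    internal_clock <= external_clock;
  end generate direct_clock;

  -- generic FALSE: route the clock through an inverter instance
  inverted_clock : if not positive_clock generate
    clock_inverter : component inverter
      port map (a => external_clock, y => internal_clock);
  end generate inverted_clock;
end architecture structural;
```

Exactly one of the two generate statements is elaborated for any value of positive_clock, so internal_clock always has a single driver.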

425 Slide Logic Duplication
Intentional duplication of logic to improve performance
Synthesis tools can perform it automatically
– User sets the maximum fan-out of a node

426 Slide Fan-out Problems
High fan-out increases placement difficulty
– A high fan-out node cannot be placed close to all destinations
– Ex: fan-out of 1 & 15

427 Slide Controlling Fan-out
Logic fan-out reduced by replication
– Path now contains fan-out of 3 & 5

428 Slide Logic Duplication Example
High fan-out node duplicated & placed to reduce delay

429 Slide Automatic Fan-out Control
Most synthesis tools feature options which limit fan-out
Advantage: easy experimentation
Disadvantage: less control over results
– Knowing which nodes have high fan-out & their destinations helps floor-planning
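Duplicating a node by hand keeps the control that automatic fan-out limits give up. The sketch below duplicates one enable register so that each copy drives 8 of 16 data registers instead of all 16, letting the fitter place each copy near its half of the loads. All names are hypothetical. The preserve attribute is assumed here to be the Quartus II synthesis attribute that stops synthesis from merging identical registers back into one; other tools use different attributes, and VHDL tools simply ignore user-defined attributes they do not recognize.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a registered enable gating 16 data registers.
entity dup_enable is
  port (clk   : in  std_logic;
        en_in : in  std_logic;
        d     : in  std_logic_vector(15 downto 0);
        q     : out std_logic_vector(15 downto 0));
end entity dup_enable;

architecture rtl of dup_enable is
  -- two identical copies of the enable register, duplicated on purpose
  signal en_a, en_b : std_logic := '0';
  -- Assumed Quartus II attribute: keep both copies instead of merging them
  attribute preserve : boolean;
  attribute preserve of en_a, en_b : signal is true;
  signal q_reg : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      en_a <= en_in;                          -- copy A
      en_b <= en_in;                          -- copy B (same logic)
      if en_a = '1' then
        q_reg(7 downto 0) <= d(7 downto 0);   -- copy A fans out to 8 loads
      end if;
      if en_b = '1' then
        q_reg(15 downto 8) <= d(15 downto 8); -- copy B fans out to 8 loads
      end if;
    end if;
  end process;

  q <= q_reg;
end architecture rtl;
```

Functionally the design behaves as if a single enable register drove all 16 loads; only the fan-out of each register node changes.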

430 Slide Quartus II Software Fan-out Control
(Figure labels: Select Signal, Details)