Digital Design: An Embedded Systems Approach Using Verilog

Chapter 1: Introduction and Methodology
27 January 2018

Portions of this work are from the book Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

Digital Design
- Digital: circuits that use two voltage levels to represent information
- Logic: use truth values and logic to analyze circuits
- Design: meeting functional requirements while satisfying constraints
  - Constraints: performance, size, power, cost, etc.

Design using Abstraction
- Circuits contain millions of transistors
  - How can we manage this complexity?
- Abstraction
  - Focus on the relevant aspects, ignoring other aspects
  - Don’t break the assumptions that allow an aspect to be ignored!
- Examples:
  - Transistors are on or off
  - Voltages are low or high

Digital Systems
- Electronic circuits that use discrete representations of information
  - Discrete in space and time

Embedded Systems
- Most real-world digital systems include embedded computers
  - Processor cores, memory, I/O
- Different functional requirements can be implemented
  - by the embedded software
  - by special-purpose attached circuits
- Trade-off among cost, performance, power, etc.

Binary Representation
- Basic representation for the simplest form of information, with only two states
  - a switch: open or closed
  - a light: on or off
  - a microphone: active or muted
  - a logical proposition: false or true
  - a binary (base 2) digit, or bit: 0 or 1

Binary Representation: Example
- Signal represents the state of the switch
  - high voltage => pressed, low voltage => not pressed
- Equally, it represents the state of the lamp
  - lamp_lit = switch_pressed

Binary Representation: Example
- dark: it’s night time
- lamp_enabled: turn on lamp at night
- lamp_lit: lamp_enabled AND dark
- Logically: day time => NOT lamp_lit
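The night-light equation above can be written directly as a Verilog continuous assignment. This is only a minimal sketch: the module name `night_light` and its port list are assumptions made for illustration, while the signal names come from the slide.

```verilog
// Hypothetical module name and port list; only the equation
// lamp_lit = lamp_enabled AND dark is taken from the slide.
module night_light
  ( output lamp_lit,
    input  lamp_enabled, dark );

  // The lamp is lit only when it is enabled and it is dark.
  assign lamp_lit = lamp_enabled & dark;

endmodule
```

During the day `dark` is 0, so `lamp_lit` is 0 whatever the value of `lamp_enabled`, which matches the last line of the slide.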

Basic Gate Components
- Primitive components for logic design

Combinational Circuits
- Circuit whose output values depend purely on current input values

Sequential Circuits
- Circuit whose output values depend on current and previous input values
  - Include some form of storage of values
- Nearly all digital systems are sequential
  - Mixture of gates and storage components
  - Combinational parts transform inputs and stored values

Flipflops and Clocks
- Edge-triggered D-flipflop
  - stores one bit of information at a time
- Timing diagram
  - Graph of signal values versus time
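As a rough illustration of the edge-triggered behavior described above, a D-flipflop can be modeled in Verilog as follows. The module and port names are assumptions; the slide itself shows only the symbol and a timing diagram.

```verilog
// Hypothetical names; a sketch of a positive-edge-triggered D-flipflop.
module d_flipflop
  ( output reg q,
    input      clk, d );

  // q takes the value of d only on a rising edge of clk;
  // between edges the stored bit is held.
  always @(posedge clk)
    q <= d;

endmodule
```

Changes on `d` between clock edges do not affect `q`, which is what makes the output depend on previous as well as current input values.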

Real-World Circuits
- Assumptions behind digital abstraction
  - ideal circuits, only two voltages, instantaneous transitions, no delay
  - Greatly simplify functional design
- Constraints arise from real components and real-world physics
- Meeting constraints ensures circuits are “ideal enough” to support abstractions

Integrated Circuits (ICs)
- Circuits formed on surface of silicon wafer
  - Minimum feature size reduced in each technology generation
  - Currently 90 nm, 65 nm
  - Moore’s Law: increasing transistor count
  - CMOS: complementary MOSFET circuits

Logic Levels
- Actual voltages for “low” and “high”
  - Example: 1.4 V threshold for inputs

Logic Levels
- TTL logic levels with noise margins
  - $V_{OL}$: output low voltage
  - $V_{IL}$: input low voltage
  - $V_{OH}$: output high voltage
  - $V_{IH}$: input high voltage

Static Load and Fanout
- Current flowing into or out of an output
- High: SW1 closed, SW0 open
  - Voltage drop across R1
  - Too much current: $V_O < V_{OH}$
- Low: SW0 closed, SW1 open
  - Voltage drop across R0
  - Too much current: $V_O > V_{OL}$
- Fanout: number of inputs connected to an output
  - determines static load

Capacitive Load and Prop Delay
- Inputs and wires act as capacitors
- $t_r$: rise time
- $t_f$: fall time
- $t_{pd}$: propagation delay
  - delay from input transition to output transition

Other Constraints
- Wire delay: delay for a transition to traverse an interconnecting wire
- Flipflop timing
  - delay from clk edge to Q output
  - D stable before and after clk edge
- Power
  - current through resistance => heat
  - must be dissipated, or circuit cooks!

Area and Packaging
- Circuits implemented on silicon chips
  - Larger circuit area => greater cost
- Chips in packages with connecting wires
  - More wires => greater cost
  - Package dissipates heat
- Packages interconnected on a printed circuit board (PCB)
  - Size, shape, cooling, etc., constrained by final product

Models
- Abstract representations of aspects of a system being designed
  - Allow us to analyze the system before building it
- Example: Ohm’s Law
  - $V = I \times R$
  - Represents electrical aspects of a resistor
  - Expressed as a mathematical equation
  - Ignores thermal, mechanical, materials aspects

Verilog
- Hardware Description Language
  - A computer language for modeling behavior and structure of digital systems
- Electronic Design Automation (EDA) using Verilog
  - Design entry: alternative to schematics
  - Verification: simulation, proof of properties
  - Synthesis: automatic generation of circuits

Module Ports
- Describe inputs and outputs of a circuit

Structural Module Definition
    module vat_buzzer_struct
      ( output buzzer,
        input  above_25_0, above_30_0, low_level_0,
        input  above_25_1, above_30_1, low_level_1,
        input  select_vat_1 );

      wire below_25_0, temp_bad_0, wake_up_0;
      wire below_25_1, temp_bad_1, wake_up_1;

      // components for vat 0
      not inv_0 (below_25_0, above_25_0);
      or  or_0a (temp_bad_0, above_30_0, below_25_0);
      or  or_0b (wake_up_0, temp_bad_0, low_level_0);

      // components for vat 1
      not inv_1 (below_25_1, above_25_1);
      or  or_1a (temp_bad_1, above_30_1, below_25_1);
      or  or_1b (wake_up_1, temp_bad_1, low_level_1);

      mux2 select_mux (buzzer, select_vat_1, wake_up_0, wake_up_1);

    endmodule

Behavioral Module Definition
    // Named vat_buzzer_behavior so that it does not clash with the
    // structural module vat_buzzer_struct
    module vat_buzzer_behavior
      ( output buzzer,
        input  above_25_0, above_30_0, low_level_0,
        input  above_25_1, above_30_1, low_level_1,
        input  select_vat_1 );

      // Select the wake-up condition of vat 1 or vat 0
      assign buzzer =
        select_vat_1 ? low_level_1 | (above_30_1 | ~above_25_1)
                     : low_level_0 | (above_30_0 | ~above_25_0);

    endmodule

Design Methodology
- Simple systems can be designed by one person using ad hoc methods
- Real-world systems are designed by teams
  - Require a systematic design methodology
- Specifies
  - Tasks to be undertaken
  - Information needed and produced
  - Relationships between tasks: dependencies, sequences
  - EDA tools used

A Simple Design Methodology
[Flowchart: Requirements and Constraints, Design, Functional Verification (OK?), Synthesize, Post-synthesis Verification (OK?), Physical Implementation, Physical Verification (OK?), Manufacture, Test. Each OK? decision continues on Y and loops back on N.]

Hierarchical Design
- Circuits are too complex for us to design all the detail at once
- Design subsystems for simple functions
- Compose subsystems to form the system
  - Treating subcircuits as “black box” components
  - Verify independently, then verify the composition
- Top-down/bottom-up design

Hierarchical Design
[Flowchart: within Design, Architecture Design is followed by Unit Design; within Functional Verification, Unit Verification (OK?) is followed by Integration Verification (OK?). Each OK? decision continues on Y and loops back on N.]

Synthesis
- We usually design using register-transfer-level (RTL) Verilog
  - Higher level of abstraction than gates
- Synthesis tool translates to a circuit of gates that performs the same function
- Specify to the tool
  - the target implementation fabric
  - constraints on timing, area, etc.
- Post-synthesis verification
  - synthesized circuit meets constraints

Physical Implementation
- Implementation fabrics
  - Application-specific ICs (ASICs)
  - Field-programmable gate arrays (FPGAs)
- Floor-planning: arranging the subsystems
- Placement: arranging the gates within subsystems
- Routing: joining the gates with wires
- Physical verification
  - physical circuit still meets constraints
  - use better estimates of delays

Codesign Methodology
[Flowchart: Requirements and Constraints, Partitioning, then two parallel paths: Hardware Requirements and Constraints leading to Hardware Design and Verification (OK?), and Software Requirements and Constraints leading to Software Design and Verification (OK?). Both paths lead to Manufacture and Test; an N result loops back.]

Summary
- Digital systems use discrete (binary) representations of information
- Basic components: gates and flipflops
- Combinational and sequential circuits
- Real-world constraints
  - logic levels, loads, timing, area, etc.
- Verilog models: structural, behavioral
- Design methodology

Chapter 2: Combinational Basics

Combinational Circuits
- Circuits whose outputs depend only on current input values
  - no storage of past input values
  - no state
- Can be analyzed using laws of logic
  - Boolean algebra, similar to propositional calculus

Boolean Functions
- Functions operating on two-valued inputs giving two-valued outputs
  - 0, implemented as a low voltage level
  - 1, implemented as a high voltage level
- Function defines output value for all possible combinations of input values

Truth Tables
- Tabular definition of a Boolean function

    Logical OR (OR gate)
    x  y | x + y
    0  0 |   0
    0  1 |   1
    1  0 |   1
    1  1 |   1

    Logical AND (AND gate)
    x  y | x · y
    0  0 |   0
    0  1 |   0
    1  0 |   0
    1  1 |   1

    Logical NOT (inverter)
    x | NOT x
    0 |   1
    1 |   0

Boolean Expressions
- Combination of variables, 0 and 1 literals, and operators: · (AND), + (OR), overbar (NOT)
- Parentheses for order of evaluation
- Precedence: · before +

Boolean Equations
- Equality relation between Boolean expressions
- Often, LHS is a single variable name
  - The Boolean equation then defines a function of that name
  - Implemented as a combinational circuit

Boolean Equations
- Boolean equations and truth tables are both valid ways to define a function
- Evaluate f for each combination of input values (x, y, z), and fill in the table
- Q: How many rows in a truth table for an n-input Boolean function?

Minterms
- Given a truth table with inputs x, y, z and output f
- For each row where the function value is 1, form a minterm:
  - AND of variables where input is 1
  - NOT of variables where input is 0
- Form OR of minterms

P-terms
- This is in sum-of-products form
  - logical OR of p-terms (product terms)
- Not all p-terms are minterms
  - e.g., f can also be defined using p-terms that omit some variables

Equivalence
- These expressions all represent the same Boolean function
- The expressions are equivalent
  - Consistent substitution of variable values gives the same values for the expressions

Optimization
- Equivalence allows us to optimize
  - choose a different circuit that implements the same function more cheaply
- Caution: smaller gate count is not always better
  - choice depends on constraints that apply

Complex Gates
- All Boolean functions can be implemented using AND, OR and NOT
- But other complex gates may meet constraints better in some fabrics
  - NAND, NOR, XOR, XNOR, AND-OR-INVERT

    x  y | NOR  NAND  XOR  XNOR
    0  0 |  1    1     0    1
    0  1 |  0    1     1    0
    1  0 |  0    1     1    0
    1  1 |  0    0     0    1

Complex Gate Example
- These two expressions are equivalent: [equations and circuit diagrams not recovered]
- The NAND-NOR circuit is much smaller and faster in most fabrics!

Buffers
- Identity function: output = input
- Needed for high fanout signals

Don’t Care Inputs
- Used where some inputs don’t affect the value of a function
- Example: multiplexer

    s  a  b | z
    0  0  – | 0
    0  1  – | 1
    1  –  0 | 0
    1  –  1 | 1

Don’t Care Outputs
- For input combinations that can’t arise
  - don’t care if output is 0 or 1
  - let the synthesis tool choose
- [Table: inputs a, b, c; output f with don’t-care entries (–); f1 and f2 are two possible implementations]

Boolean Algebra – Axioms
- Commutative Laws: $x + y = y + x$, $x \cdot y = y \cdot x$
- Associative Laws: $(x + y) + z = x + (y + z)$, $(x \cdot y) \cdot z = x \cdot (y \cdot z)$
- Distributive Laws: $x + (y \cdot z) = (x + y) \cdot (x + z)$, $x \cdot (y + z) = (x \cdot y) + (x \cdot z)$
- Identity Laws: $x + 0 = x$, $x \cdot 1 = x$
- Complement Laws: $x + \overline{x} = 1$, $x \cdot \overline{x} = 0$
- Dual of a Boolean equation
  - substitute 0 for 1, 1 for 0, + for ·, · for +
  - if original is valid, dual is also valid

Hardware Interpretation
- Laws imply equivalent circuits
- Example: Associative Laws

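More Useful Laws
- Idempotence Laws: $x + x = x$, $x \cdot x = x$
- Identity Laws: $x + 1 = 1$, $x \cdot 0 = 0$
- Absorption Laws: $x + (x \cdot y) = x$, $x \cdot (x + y) = x$
- DeMorgan Laws: $\overline{x + y} = \overline{x} \cdot \overline{y}$, $\overline{x \cdot y} = \overline{x} + \overline{y}$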
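Because a two-variable law has only four input combinations, it can be checked exhaustively in simulation. The following is a minimal sketch of such a check for the DeMorgan laws, written with Verilog bitwise operators; the module name `demorgan_check` and the variable names are assumptions for illustration.

```verilog
// Hypothetical self-checking module: tries all four values of (x, y)
// and counts any case where the two sides of a DeMorgan law differ.
module demorgan_check;

  reg x, y;
  integer i, errors;

  initial begin
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {x, y} = i;   // low two bits of i give the input combination
      #1;
      if (~(x | y) !== (~x & ~y)) errors = errors + 1;
      if (~(x & y) !== (~x | ~y)) errors = errors + 1;
    end
    if (errors == 0)
      $display("DeMorgan laws hold for all input combinations");
    else
      $display("%0d mismatches found", errors);
  end

endmodule
```

The same pattern extends to the other laws, although for functions with many inputs exhaustive checking quickly becomes impractical.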

Circuit Transformation

Optimization Methods
- How do we decide which law to apply?
  - What are we trying to optimize?
- Methods
  - Karnaugh maps, Quine-McCluskey: minimize gate count
  - Espresso, Espresso-II, …: multi-output minimization
- Manual methods are only tractable for small circuits
- Useful methods are embedded in EDA tools
  - We just specify constraints

Boolean Equations in Verilog
- Use logical operators in assignment statements

    module circuit
      ( output f,
        input  x, y, z );

      assign f = (x | (y & ~z)) & ~(y & z);

    endmodule

Verilog Logical Operators
    Operation | Verilog
    AND       | a & b
    OR        | a | b
    NAND      | ~(a & b)
    NOR       | ~(a | b)
    XOR       | a ^ b
    XNOR      | a ~^ b
    NOT       | ~a

- Precedence
  - not has highest
  - then &, then ^ and ~^, then |
  - use parentheses to make order of evaluation clear
- Verilog bit values: 1'b0 and 1'b1

Boolean Equation Example
- Air conditioner control logic
  - heater_on = temp_low · auto_temp + manual_heat
  - cooler_on = temp_high · auto_temp + manual_cool
  - fan_on = heater_on + cooler_on + manual_fan

    module aircon
      ( output heater_on, cooler_on, fan_on,
        input  temp_low, temp_high, auto_temp,
        input  manual_heat, manual_cool, manual_fan );

      assign heater_on = (temp_low & auto_temp) | manual_heat;
      assign cooler_on = (temp_high & auto_temp) | manual_cool;
      assign fan_on    = heater_on | cooler_on | manual_fan;

    endmodule

Binary Coding
- How do we represent information with more than two possible values?
  - e.g., numbers
- N voltage levels? No.
- Multiple binary signals (multiple bits)
  - (a1, a0): (0, 0), (0, 1), (1, 0), (1, 1)
  - This is a binary code
  - Each pair of values is a code word
  - Uses two signal wires for a1, a0

Code Word Size
- An n-bit code has $2^n$ code words
- To represent N possible values
  - Need at least $\lceil \log_2 N \rceil$ code word bits
  - More bits can be useful in some cases
- Example: code for inkjet printer
  - black, cyan, magenta, yellow, red, blue
  - six values, $\lceil \log_2 6 \rceil = 3$
  - black: (0, 0, 1), cyan: (0, 1, 0), magenta: (0, 1, 1), yellow: (1, 0, 0), red: (1, 0, 1), blue: (1, 1, 0)

One-Hot Codes
- Each code word has exactly one 1 bit
- Traffic light: red: (1, 0, 0), yellow: (0, 1, 0), green: (0, 0, 1)
  - Three signal wires: red, yellow, green
- Each bit of a one-hot code corresponds to an encoded value
  - No hardware needed to decode values

Binary Codes in Verilog
- Multiple bits represented by a vector

    wire [4:0] w;

  - This is a five-element wire: w[4], w[3], w[2], w[1], w[0]

    wire [1:3] a;

  - This is a three-element wire: a[1], a[2], a[3]

Binary Coding Example
- Traffic-light controller with 1-hot code
  - enable == 1: lights_out = lights_in
  - enable == 0: lights_out = (0, 0, 0)

    module light_controller_and_enable
      ( output [1:3] lights_out,
        input  [1:3] lights_in,
        input        enable );

      assign lights_out[1] = lights_in[1] & enable;
      assign lights_out[2] = lights_in[2] & enable;
      assign lights_out[3] = lights_in[3] & enable;

    endmodule

Binary Coding Example

    module light_controller_conditional_enable
      ( output [1:3] lights_out,
        input  [1:3] lights_in,
        input        enable );

      assign lights_out = enable ? lights_in : 3'b000;

    endmodule

Bit Errors
- Electrical noise can change logic levels
  - Bit flip: 0 → 1, 1 → 0
- If flipped signal is in a code word
  - result may be a different code word or an invalid code word
  - inkjet printer, blue: (1, 1, 0) → ?: (1, 1, 1)
- Could ignore the possibility of a bit flip
  - don’t specify behavior of circuit
  - ok if probability is low, effect isn’t disastrous, and application is cost sensitive

Fail-Safe Design
- Detect illegal code words
  - produce a safe result
- Traffic-light controller with 1-hot code
  - illegal code ⇒ red light
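One possible way to express the fail-safe rule in Verilog is sketched below. The module name is hypothetical, the bit order (red, yellow, green) follows the one-hot traffic-light code given earlier, and treating the all-off word (0, 0, 0) as illegal is an assumption of this sketch rather than something the slide states.

```verilog
// Hypothetical fail-safe filter for the one-hot traffic-light code.
// Bit order is (red, yellow, green), i.e. lights[1] is red.
module light_controller_fail_safe
  ( output reg [1:3] lights_out,
    input      [1:3] lights_in );

  always @*
    case (lights_in)
      // the three legal one-hot code words pass through unchanged
      3'b100, 3'b010, 3'b001: lights_out = lights_in;
      // any other word is illegal (assumption: this includes 000):
      // force the safe result, a red light
      default:                lights_out = 3'b100;
    endcase

endmodule
```

If an all-off state is legitimate in a given design, for example when the controller is disabled, it would need its own case item.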

Redundant Codes
- Include extra error code words
  - each differs from a valid code word by a bit flip
  - ensure no two valid code words are a bit flip apart
- Detect error code words
  - take exceptional action
  - e.g., stop, error light, etc.

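Parity
- Extend a code word with a parity bit
  - Even parity: the extended code word has an even number of 1 bits
  - Odd parity: the extended code word has an odd number of 1 bits
- To check for a bit flip, count the 1s
  - with even parity, a single bit flip leaves an odd number of 1s, so the error is detected
- What if there are two bit flips?
  - with even parity, the number of 1s is even again, so the error goes undetected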

Parity Using XOR Gates
- XOR gives even parity for two bits
  - extends to multiple bits, associatively
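In Verilog, a chain or tree of XOR gates over a vector can be written with the reduction XOR operator. The sketch below shows an even-parity generator and checker; the module names, the parameter `N`, and the port names are assumptions for illustration.

```verilog
// Hypothetical even-parity generator: parity is chosen so that
// {data, parity} contains an even number of 1 bits.
module even_parity_gen #(parameter N = 8)
  ( output         parity,
    input  [N-1:0] data );

  assign parity = ^data;   // reduction XOR: 1 if data has an odd number of 1s

endmodule

// Hypothetical even-parity checker: error is 1 when the received
// code word (data plus parity bit) has an odd number of 1 bits.
module even_parity_check #(parameter N = 8)
  ( output       error,
    input  [N:0] code_word );

  assign error = ^code_word;

endmodule
```

A single flipped bit makes `error` go to 1, whereas two flipped bits cancel out and leave `error` at 0.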

Combinational Components
- We can build complex combinational components from gates
  - Decoders, encoders
  - Multiplexers
  - …
- Use them as subcomponents of larger systems
  - Abstraction and reuse

Decoders
- A decoder derives control signals from a binary coded signal
  - One per code word
  - Control signal is 1 when input has the corresponding code word; 0 otherwise
- For an n-bit code input
  - Decoder has $2^n$ outputs
- Example: $(a_3, a_2, a_1, a_0)$
  - Output for (1, 0, 1, 1): $y_{11} = a_3 \cdot \overline{a_2} \cdot a_1 \cdot a_0$
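A full binary decoder for a 4-bit input can be sketched compactly in Verilog by shifting a single 1 bit into the position selected by the input. The module name `decoder_4_to_16` and the output vector name `y` are assumptions; this is one possible coding style rather than the only one.

```verilog
// Hypothetical 4-to-16 decoder: exactly one bit of y is 1,
// namely y[a], where a is read as an unsigned number.
module decoder_4_to_16
  ( output [15:0] y,
    input  [3:0]  a );

  assign y = 16'b1 << a;

endmodule
```

With `a = 4'b1011` only `y[11]` is 1, the output that corresponds to the code word (1, 0, 1, 1).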

Decoder Example

    Color   | Codeword (c2, c1, c0)
    black   | 0, 0, 1
    cyan    | 0, 1, 0
    magenta | 0, 1, 1
    yellow  | 1, 0, 0
    red     | 1, 0, 1
    blue    | 1, 1, 0

Decoder Example

    module ink_jet_decoder
      ( output black, cyan, magenta, yellow,
        output light_cyan, light_magenta,
        input  color2, color1, color0 );

      assign black         = ~color2 & ~color1 &  color0;  // (0, 0, 1)
      assign cyan          = ~color2 &  color1 & ~color0;  // (0, 1, 0)
      assign magenta       = ~color2 &  color1 &  color0;  // (0, 1, 1)
      assign yellow        =  color2 & ~color1 & ~color0;  // (1, 0, 0)
      assign light_cyan    =  color2 & ~color1 &  color0;  // (1, 0, 1), listed as red in the code table
      assign light_magenta =  color2 &  color1 & ~color0;  // (1, 1, 0), listed as blue in the code table

    endmodule

Encoders
- An encoder encodes which of several inputs is 1
  - Assuming (for now) at most one input is 1 at a time
- What if no input is 1?
  - Separate output to indicate this condition

Encoder Example
- Burglar alarm: encode which zone is active

    Zone   | Codeword
    Zone 1 | 0, 0, 0
    Zone 2 | 0, 0, 1
    Zone 3 | 0, 1, 0
    Zone 4 | 0, 1, 1
    Zone 5 | 1, 0, 0
    Zone 6 | 1, 0, 1
    Zone 7 | 1, 1, 0
    Zone 8 | 1, 1, 1

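Priority Encoders
- If more than one input can be 1
  - Encode the input that is 1 with highest priority

    zone                            | intruder_zone | valid
    (1) (2) (3) (4) (5) (6) (7) (8) | (2) (1) (0)   |
     1   –   –   –   –   –   –   –  |  0   0   0    |   1
     0   1   –   –   –   –   –   –  |  0   0   1    |   1
     0   0   1   –   –   –   –   –  |  0   1   0    |   1
     0   0   0   1   –   –   –   –  |  0   1   1    |   1
     0   0   0   0   1   –   –   –  |  1   0   0    |   1
     0   0   0   0   0   1   –   –  |  1   0   1    |   1
     0   0   0   0   0   0   1   –  |  1   1   0    |   1
     0   0   0   0   0   0   0   1  |  1   1   1    |   1
     0   0   0   0   0   0   0   0  |  –   –   –    |   0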

Priority Encoder Example
    module alarm_priority_1
      ( output [2:0] intruder_zone,
        output       valid,
        input  [1:8] zone );

      // zone[1] has the highest priority, zone[8] the lowest
      assign intruder_zone = zone[1] ? 3'b000 :
                             zone[2] ? 3'b001 :
                             zone[3] ? 3'b010 :
                             zone[4] ? 3'b011 :
                             zone[5] ? 3'b100 :
                             zone[6] ? 3'b101 :
                             zone[7] ? 3'b110 :
                             zone[8] ? 3'b111 :
                                       3'b000;

      assign valid = zone[1] | zone[2] | zone[3] | zone[4] |
                     zone[5] | zone[6] | zone[7] | zone[8];

    endmodule

BCD Code
- Binary coded decimal
  - 4-bit code for decimal digits

    0: 0000    5: 0101
    1: 0001    6: 0110
    2: 0010    7: 0111
    3: 0011    8: 1000
    4: 0100    9: 1001

Seven-Segment Decoder
- Decodes BCD to drive a 7-segment LED or LCD display digit
  - Segments: (g, f, e, d, c, b, a)

Seven-Segment Decoder
    module seven_seg_decoder
      ( output [7:1] seg,      // segments (g, f, e, d, c, b, a)
        input  [3:0] bcd,
        input        blank );

      reg [7:1] seg_tmp;

      // Bit patterns are assumed values: the conventional active-high
      // encodings in the order (g, f, e, d, c, b, a)
      always @*
        case (bcd)
          4'b0000: seg_tmp = 7'b0111111;  // 0
          4'b0001: seg_tmp = 7'b0000110;  // 1
          4'b0010: seg_tmp = 7'b1011011;  // 2
          4'b0011: seg_tmp = 7'b1001111;  // 3
          4'b0100: seg_tmp = 7'b1100110;  // 4
          4'b0101: seg_tmp = 7'b1101101;  // 5
          4'b0110: seg_tmp = 7'b1111101;  // 6
          4'b0111: seg_tmp = 7'b0000111;  // 7
          4'b1000: seg_tmp = 7'b1111111;  // 8
          4'b1001: seg_tmp = 7'b1101111;  // 9
          default: seg_tmp = 7'b1000000;  // "-" for invalid code
        endcase

      // All segments off when blank is 1
      assign seg = blank ? 7'b0000000 : seg_tmp;

    endmodule

Multiplexers
- Chooses between data inputs based on the select input
- 2-to-1 mux

    sel | z
     0  | a0
     1  | a1

- 4-to-1 mux: two select bits

    sel | z
    00  | a0
    01  | a1
    10  | a2
    11  | a3

- N-to-1 multiplexer needs $\lceil \log_2 N \rceil$ select bits

Multiplexer Example

    module multiplexer_4_to_1
      ( output reg   z,
        input  [3:0] a,
        input  [1:0] sel );   // two select bits for four data inputs

      // case statements must appear inside a procedural block
      always @*
        case (sel)
          2'b00: z = a[0];
          2'b01: z = a[1];
          2'b10: z = a[2];
          2'b11: z = a[3];
        endcase

    endmodule

Multi-bit Multiplexers
- To select between N m-bit codeword inputs
  - Connect m N-input multiplexers in parallel
- Abstraction
  - Treat this as a component

Multi-bit Mux Example

    module multiplexer_3bit_2_to_1
      ( output [2:0] z,
        input  [2:0] a0, a1,
        input        sel );

      assign z = sel ? a1 : a0;

    endmodule

Active-Low Logic
- We’ve been using active-high logic
  - 0 (low voltage): falsehood of a condition
  - 1 (high voltage): truth of a condition
- Active-low logic
  - 0 (low voltage): truth of a condition
  - 1 (high voltage): falsehood of a condition
  - reverses the representation, not negative voltage!
- In circuit schematics, label active-low signals with overbar notation
  - e.g., $\overline{\text{lamp\_lit}}$: low when lit, high when not lit

Active-Low Example
- Night-light circuit, lamp connected to power supply
- Overbar indicates active-low
- Match bubbles with active-low signals to preserve logic sense

Implied Negation
- Negation implied by connecting
  - An active-low signal to an active-high input/output
  - An active-high signal to an active-low input/output

Active-Low Signals and Gates
- DeMorgan’s laws suggest alternate views for gates
  - They’re the same electrical circuit!
- Use the view that best represents the logical function intended
- Match the bubbles, unless implied negation is intended

Active-Low Logic in Verilog
- Can’t draw an overbar in Verilog
  - Use _N suffix on signal or port name
- 1'b0 and 1'b1 in Verilog mean low and high
- For active-low logic
  - 1'b0 means the condition is true
  - 1'b1 means the condition is false
- Example

    assign lamp_lit_N = 1'b0;

  - turns the lamp on
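To show the `_N` convention in a complete module, here is a minimal sketch of the night-light function with an active-low lamp output. The module name and port list are assumptions; the inputs are taken to be active-high.

```verilog
// Hypothetical night light with an active-low output:
// lamp_lit_N is 0 (low) when the lamp should be lit.
module night_light_active_low
  ( output lamp_lit_N,
    input  lamp_enabled, dark );

  // The condition "lamp lit" is lamp_enabled AND dark; driving an
  // active-low signal from it requires a negation.
  assign lamp_lit_N = ~(lamp_enabled & dark);

endmodule
```

The function is a NAND of the two inputs, which corresponds to an AND gate drawn with a bubble on its output.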

Combinational Verification
- Combinational circuits: outputs are a function of inputs
- Functional verification: making sure it's the right function!
- [Figure: verification testbench containing the Design Under Verification (DUV), a block that applies test cases, and a checker]

Verification Example
- Verify operation of traffic-light controller
- Property to check
  - enable ⇒ lights_out == lights_in
  - !enable ⇒ all lights are inactive
- Represent this as an assertion in the checker

Testbench Module

    `timescale 1ms/1ms

    module light_testbench;

      wire [1:3] lights_out;
      reg  [1:3] lights_in;
      reg        enable;

      // design under verification, connected by port name
      light_controller_and_enable duv
        ( .lights_out(lights_out),
          .lights_in(lights_in),
          .enable(enable) );

Applying Test Cases

      initial begin
              enable = 0; lights_in = 3'b000;
        #1000 enable = 0; lights_in = 3'b001;
        #1000 enable = 0; lights_in = 3'b010;
        #1000 enable = 0; lights_in = 3'b100;
        #1000 enable = 1; lights_in = 3'b001;
        #1000 enable = 1; lights_in = 3'b010;
        #1000 enable = 1; lights_in = 3'b100;
        #1000 enable = 1; lights_in = 3'b000;
        #1000 enable = 1; lights_in = 3'b111;
        #1000 $finish;
      end

Checking Assertions

      always @(enable or lights_in) begin
        #10  // let the outputs settle before checking (delay value assumed)
        if (!( ( enable && lights_out == lights_in) ||
               (!enable && lights_out == 3'b000) ))
          $display("Error in light controller output");
      end

    endmodule

Functional Coverage
- Did we test all possible input cases?
  - For large designs, exhaustive testing is not tractable
  - N inputs: number of cases = $2^N$
- Functional coverage
  - Proportion of test cases covered by a testbench
  - It can be hard to decide how much testing is enough

Summary
- Combinational logic: output values depend only on current input values
- Boolean functions: defined by truth tables and Boolean equations
- Equivalence of functions ⇒ optimization
- Binary codes used to represent information with more than two values

Summary
- Combinational components
  - gates: AND, OR, inverter, 2-to-1 mux
  - complex gates: NAND, NOR, XOR, XNOR, AOI
  - decoder, encoder, priority encoder
- Active-low logic
- Verification testbench
  - apply test cases to DUV
  - checker contains assertions

Chapter 3: Numeric Basics

Numeric Basics
- Representing and processing numeric data is a common requirement
  - unsigned integers
  - signed integers
  - fixed-point real numbers
  - floating-point real numbers
  - complex numbers

Unsigned Integers
- Non-negative numbers (including 0)
  - Represent real-world data, e.g., temperature, position, time, …
  - Also used in controlling operation of a digital system, e.g., counting iterations, table indices
- Coded using unsigned binary (base 2) representation
  - analogous to decimal representation

Binary Representation
- Decimal: base 10
  - $124_{10} = 1 \times 10^2 + 2 \times 10^1 + 4 \times 10^0$
- Binary: base 2
  - $124_{10} = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 0 \times 2^0 = 1111100_2$
- In general, a number $x$ is represented using $n$ bits as $x_{n-1}, x_{n-2}, \ldots, x_0$, where
  $x = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_0 2^0$

Binary Representation
- Unsigned binary is a code for numbers
  - n bits: represent numbers from 0 to $2^n - 1$
  - 0: 0000…00; $2^n - 1$: 1111…11
- To represent x with $0 \le x \le N - 1$, need $\lceil \log_2 N \rceil$ bits
- Computers use
  - 8-bit bytes: 0, …, 255
  - 32-bit words: 0, …, ~4 billion
- Digital circuits can use whatever size is appropriate

Unsigned Integers in Verilog
Use vectors as the representation
  Can apply arithmetic operations

    module multiplexer_6bit_4_to_1
      ( output reg [5:0] z,
        input      [5:0] a0, a1, a2, a3,
        input      [1:0] sel );

      always @*              // combinational: re-evaluated when any input changes
        case (sel)
          2'b00: z = a0;
          2'b01: z = a1;
          2'b10: z = a2;
          2'b11: z = a3;
        endcase

    endmodule

Octal and Hexadecimal
  Short-hand notations for vectors of bits
  Octal (base 8)
    Each group of 3 bits represented by a digit
    0: 000, 1: 001, 2: 010, ..., 7: 111
    $253_8 = 010\,101\,011_2$
    $11001011_2 = 011\,001\,011_2 = 313_8$
  Hex (base 16)
    Each group of 4 bits represented by a digit
    0: 0000, ..., 9: 1001, A: 1010, ..., F: 1111
    $\mathrm{3CE}_{16} = 0011\,1100\,1110_2$
    $11001011_2 = 1100\,1011_2 = \mathrm{CB}_{16}$

Extending Unsigned Numbers
To extend an n-bit number to m bits
  Add leading 0 bits
  e.g., $72_{10} = 1001000_2 = 000001001000_2$ (7 bits extended to 12 bits)

    wire [3:0] x;
    wire [7:0] y;

    // Three equivalent ways to zero-extend x to 8 bits
    assign y = {4'b0000, x};
    assign y = {4'b0, x};
    assign y = x;            // implicit zero extension

Truncating Unsigned Numbers
To truncate from m bits to n bits
  Discard leftmost bits
  Value is preserved if discarded bits are 0
  Result is $x \bmod 2^n$

    assign x = y[3:0];

Unsigned Addition
  Performed in the same way as decimal
  [Figure: worked binary addition examples showing carry bits and overflow]

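Addition Circuits
  Half adder for least-significant bits
    $s_0 = x_0 \oplus y_0$
    $c_1 = x_0 \cdot y_0$
  Full adder for remaining bits
    $s_i = (x_i \oplus y_i) \oplus c_i$
    $c_{i+1} = x_i \cdot y_i + (x_i \oplus y_i) \cdot c_i$

    x_i  y_i  c_i  |  s_i  c_i+1
    0    0    0    |  0    0
    0    0    1    |  1    0
    0    1    0    |  1    0
    0    1    1    |  0    1
    1    0    0    |  1    0
    1    0    1    |  0    1
    1    1    0    |  0    1
    1    1    1    |  1    1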

Ripple-Carry Adder
  Full adder for each bit, $c_0 = 0$
  Carry out of the most-significant stage indicates overflow
  Worst-case delay from $x_0$, $y_0$ to $s_n$
    carry must ripple through intervening stages, affecting sum bits
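
The ripple-carry structure described above can be sketched structurally in Verilog. This is a minimal illustration rather than the book's own code: the module names `full_adder` and `ripple_carry_adder` and the parameter `N` are assumptions chosen for the example.

```verilog
// Hypothetical module names; a sketch of an N-bit ripple-carry adder.
module full_adder
  ( output s, c_out,
    input  x, y, c_in );

  assign s     = (x ^ y) ^ c_in;               // sum bit
  assign c_out = (x & y) | ((x ^ y) & c_in);   // carry out

endmodule

module ripple_carry_adder #( parameter N = 8 )
  ( output [N-1:0] s,
    output         c_out,    // carry out of the top stage (unsigned overflow)
    input  [N-1:0] x, y );

  wire [N:0] c;              // c[i] is the carry into stage i
  assign c[0] = 1'b0;        // no carry into the least-significant stage

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : stage
      full_adder fa ( .s(s[i]), .c_out(c[i+1]),
                      .x(x[i]), .y(y[i]), .c_in(c[i]) );
    end
  endgenerate

  assign c_out = c[N];

endmodule
```

Each stage waits for the carry from the stage below it, which is why the worst-case delay grows with the number of bits.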

Improving Adder Performance
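Carry kill: $k_i = \overline{x_i} \cdot \overline{y_i}$
Carry propagate: $p_i = x_i \oplus y_i$
Carry generate: $g_i = x_i \cdot y_i$
Adder equations
  $s_i = p_i \oplus c_i$
  $c_{i+1} = g_i + p_i \cdot c_i$
[Table: full-adder truth table with columns x_i, y_i, c_i, s_i, c_i+1, grouped into kill, propagate and generate cases]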

Fast-Carry-Chain Adder
Also called Manchester adder
Xilinx FPGAs include this structure

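Carry Lookahead
  With generate $g_i = x_i \cdot y_i$ and propagate $p_i = x_i \oplus y_i$, expand $c_{i+1} = g_i + p_i \cdot c_i$ so that each carry depends only on $c_0$:
    $c_1 = g_0 + p_0 c_0$
    $c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$
    $c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
    $c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$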

Carry-Lookahead Adder
Avoids chained carry circuit
Use multilevel lookahead for wider numbers
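
As an illustration of how lookahead avoids the chained carry, the following is a minimal 4-bit sketch. It assumes the usual definitions of generate (`g = x & y`) and propagate (`p = x ^ y`); the module name `cla_adder_4bit` is hypothetical and the code is not taken from the book.

```verilog
// Hypothetical module; a sketch of a single-level 4-bit carry-lookahead adder.
module cla_adder_4bit
  ( output [3:0] s,
    output       c_out,
    input  [3:0] x, y,
    input        c_in );

  wire [3:0] g = x & y;    // carry generate for each bit
  wire [3:0] p = x ^ y;    // carry propagate for each bit
  wire [4:0] c;            // c[i] is the carry into bit i

  // Every carry is a two-level function of g, p and c_in only
  assign c[0] = c_in;
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                     | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                     | (p[3] & p[2] & p[1] & g[0])
                     | (p[3] & p[2] & p[1] & p[0] & c[0]);

  assign s     = p ^ c[3:0];
  assign c_out = c[4];

endmodule
```

The carry expressions grow quickly with width, which is one reason wider adders use multilevel lookahead rather than a single flat level.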

Other Optimized Adders
Other adders are based on other reformulations of adder equations
Choice of adder depends on constraints
  e.g., ripple-carry has low area, so is OK for low-performance circuits
  e.g., Manchester adder is OK in FPGAs that include carry-chain circuits

Adders in Verilog
  Use arithmetic "+" operator

    wire [7:0] a, b, s;
    ...
    assign s = a + b;        // 8-bit sum, carry out discarded

  To capture the carry out, extend the operands by one bit

    wire [8:0] tmp_result;
    wire       c;
    ...
    assign tmp_result = {1'b0, a} + {1'b0, b};
    assign c = tmp_result[8];
    assign s = tmp_result[7:0];

  Equivalent shorter forms

    assign {c, s} = {1'b0, a} + {1'b0, b};
    assign {c, s} = a + b;   // operands implicitly extended to 9 bits

Unsigned Subtraction
  As in decimal
  [Figure: worked binary subtraction example showing borrow bits]

Subtraction Circuits
  For least-significant bits
    $s_0 = x_0 \oplus y_0$
    $b_1 = \overline{x_0} \cdot y_0$
  For remaining bits
    $s_i = (x_i \oplus y_i) \oplus b_i$
    $b_{i+1} = \overline{x_i} \cdot y_i + \overline{(x_i \oplus y_i)} \cdot b_i$

Adder/Subtracter Circuits
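Many systems add and subtract
Trick: use complemented borrows
  Addition
    $s_i = (x_i \oplus y_i) \oplus c_i$
    $c_{i+1} = x_i \cdot y_i + (x_i \oplus y_i) \cdot c_i$
  Subtraction
    $s_i = (x_i \oplus \overline{y_i}) \oplus \overline{b_i}$
    $\overline{b_{i+1}} = x_i \cdot \overline{y_i} + (x_i \oplus \overline{y_i}) \cdot \overline{b_i}$
Same hardware can perform both
  For subtraction: complement y, set $\overline{b_0} = 1$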

Adder/Subtracter Circuits
Adder can be any of those we have seen
  depends on constraints

Subtraction in Verilog
    module adder_subtracter
      ( output [11:0] s,
        output        ovf_unf,
        input  [11:0] x, y,
        input         mode );   // 0: add, 1: subtract

      assign {ovf_unf, s} = !mode ? (x + y) : (x - y);

    endmodule

Increment and Decrement
Adding 1: set y = 0 and $c_0 = 1$
  $s_i = x_i \oplus c_i$
  $c_{i+1} = x_i \cdot c_i$
  These are equations for a half adder
Similarly for decrementing: subtracting 1

Increment/Decrement in Verilog
Just add or subtract 1

    wire [15:0] x, s;
    ...
    assign s = x + 1;  // increment x
    assign s = x - 1;  // decrement x

Note: 1 (integer), not 1'b1 (bit)
  Automatically resized

Equality Comparison
  XNOR gate: equality of two bits
    Apply bitwise to two unsigned numbers
  In Verilog, x == y gives a bit result
    1'b0 for false, 1'b1 for true

    assign eq = x == y;

Inequality Comparison
Magnitude comparator for x > y
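
The slide's comparator circuit is not recoverable from this transcript, so the following is only one possible bit-sliced formulation, given as a sketch. The module name `magnitude_comparator`, the parameter `N`, and the internal `gt_chain` signal are assumptions. Working from the least-significant bit upward, each stage reports "greater" if its own bits decide the comparison, and otherwise passes on the result from the bits below.

```verilog
// Hypothetical module; one way to build an unsigned x > y comparator bit by bit.
module magnitude_comparator #( parameter N = 8 )
  ( output         gt,       // 1 when x > y (unsigned)
    input  [N-1:0] x, y );

  wire [N:0] gt_chain;       // gt_chain[i]: x[i-1:0] > y[i-1:0]
  assign gt_chain[0] = 1'b0; // empty prefixes are equal, so not greater

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : bit_stage
      // Bit i decides if it differs; if equal, defer to the lower bits
      assign gt_chain[i+1] = (x[i] & ~y[i])
                           | (~(x[i] ^ y[i]) & gt_chain[i]);
    end
  endgenerate

  assign gt = gt_chain[N];

endmodule
```

In RTL code the same function is normally written simply as `assign gt = x > y;`, leaving the circuit structure to the synthesis tool.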

Comparison Example in Verilog
Thermostat with target temperature
  Heater or cooler on when actual temperature is more than 5° from target

    module thermostat
      ( output       heater_on, cooler_on,
        input  [7:0] target, actual );

      assign heater_on = actual < target - 5;
      assign cooler_on = actual > target + 5;

    endmodule

Scaling by Power of 2
  Multiplying by $2^k$:
    $x \times 2^k = x_{n-1} 2^{k+n-1} + \cdots + x_0 2^k + 0 \times 2^{k-1} + \cdots + 0 \times 2^0$
  This is x shifted left k places, with k bits of 0 added on the right
    logical shift left by k places
    e.g., $00110_2 \times 2^3 = 00110000_2$
  Truncate if result must fit in n bits
    overflow if any truncated bit is not 0

Scaling by Power of 2
  Dividing by $2^k$:
    $x / 2^k = x_{n-1} 2^{n-1-k} + \cdots + x_k 2^0 + x_{k-1} 2^{-1} + \cdots + x_0 2^{-k}$
  This is x shifted right k places, with k bits truncated on the right
    logical shift right by k places
    e.g., $01110110_2 / 2^3 = 01110_2$
  Fill on the left with k bits of 0 if result must fit in n bits

Scaling in Verilog
  Shift-left (<<) and shift-right (>>) operations
    result is same size as operand

    s = 00010011 (19 in decimal)
    assign y = s << 2;    // y = 01001100 = 76
    assign y = s >> 2;    // y = 00000100 = 4

Unsigned Multiplication
$xy = x (y_{n-1} 2^{n-1} + \cdots + y_0 2^0) = y_{n-1} x 2^{n-1} + \cdots + y_0 x 2^0$
$y_i x 2^i$ is called a partial product
  if $y_i = 0$, then $y_i x 2^i = 0$
  if $y_i = 1$, then $y_i x 2^i$ is x shifted left by i
Combinational array multiplier
  AND gates form partial products
  adders form full product

Unsigned Multiplication
Adders can be any of those we have seen
Optimized multipliers combine parts of adjacent adders

Product Size
  Greatest result for n-bit operands:
    $(2^n - 1)(2^n - 1) = 2^{2n} - 2^{n+1} + 1$
  Requires 2n bits to avoid overflow
  Multiplying n-bit and m-bit operands requires n + m bits

    wire [ 7:0] x;
    wire [13:0] y;
    wire [21:0] p;
    ...
    assign p = {14'b0, x} * {8'b0, y};
    assign p = x * y;   // implicit resizing

Other Unsigned Operations
Division, remainder
  More complicated than multiplication
  Large circuit area, power
Complicated operations are often performed sequentially
  in a sequence of steps, one per clock cycle
  cost/performance/power trade-off

Gray Codes
  Important for position encoders
    Only one bit changes at a time

    Segment  Code      Segment  Code
    0        0000      8        1100
    1        0001      9        1101
    2        0011      10       1111
    3        0010      11       1110
    4        0110      12       1010
    5        0111      13       1011
    6        0101      14       1001
    7        0100      15       1000

  See book for n-bit Gray code
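
The 4-bit code in the table matches the standard reflected-binary Gray code, which can be computed by XORing a binary number with itself shifted right by one place (for example, segment 5 = 0101 gives 0101 ^ 0010 = 0111). The sketch below shows this conversion in both directions; the module name `gray_converter` and its port names are hypothetical, and the textbook's general n-bit construction may be presented differently.

```verilog
// Hypothetical module; converts between unsigned binary and reflected Gray code.
module gray_converter #( parameter N = 4 )
  ( output [N-1:0] gray,          // Gray code for the binary input
    output [N-1:0] binary_back,   // gray converted back to binary
    input  [N-1:0] binary );

  // Binary to Gray: each Gray bit is the XOR of two adjacent binary bits
  assign gray = binary ^ (binary >> 1);

  // Gray to binary: bit i is the XOR of Gray bits N-1 down to i
  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : decode
      assign binary_back[i] = ^(gray >> i);   // reduction XOR
    end
  endgenerate

endmodule
```

A position encoder would typically produce `gray` directly from its sensors, so only the Gray-to-binary half is needed in that setting.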

Signed Integers
  Positive and negative numbers (and 0)
  n-bit signed-magnitude code
    1 bit for sign: 0 ⇒ +, 1 ⇒ -
    n - 1 bits for magnitude
  Signed-magnitude rarely used for integers now
    circuits are too complex
  Use 2s-complement binary code

2s-Complement Representation
$x = -x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_0 2^0$
Most-negative number: 1000...0 = $-2^{n-1}$
Most-positive number: 0111...1 = $+2^{n-1} - 1$
$x_{n-1} = 1$ ⇒ negative, $x_{n-1} = 0$ ⇒ non-negative
  since $2^{n-2} + \cdots + 2^0 = 2^{n-1} - 1$

2s-Complement Examples
00110101 = $1 \times 2^5 + 1 \times 2^4 + 1 \times 2^2 + 1 \times 2^0$ = 53
10110101 = $-1 \times 2^7 + 1 \times 2^5 + 1 \times 2^4 + 1 \times 2^2 + 1 \times 2^0$ = -128 + 53 = -75
00000000 = 0
11111111 = -1
10000000 = -128
01111111 = +127

Signed Integers in Verilog
Use signed vectors

    wire signed [ 7:0] a;
    reg  signed [13:0] b;

Can convert between signed and unsigned interpretations

    wire        [11:0] s1;
    wire signed [11:0] s2;
    ...
    assign s2 = $signed(s1);    // s1 is known to be less than 2**11
    assign s1 = $unsigned(s2);  // s2 is known to be nonnegative

Octal and Hex Signed Integers
Don't think of signed octal or hex
  Just treat octal or hex as shorthand for a vector of bits
E.g., $844_{10}$ is 001101001100
  In hex: 0011 0100 1100 ⇒ 34C
E.g., $-42_{10}$ is 1111010110
  In octal: 1 111 010 110 ⇒ 1726 (10 bits)

Resizing Signed Integers
To extend a non-negative number
  Add leading 0 bits
  e.g., $53_{10}$ = 00110101 = 000000110101
To truncate a non-negative number
  Discard leftmost bits, provided
    discarded bits are all 0
    sign bit of result is 0
  E.g., $41_{10}$ is 00101001
    Truncating to 6 bits: 101001 (which is $-23_{10}$): error!

Resizing Signed Integers
  To extend a negative number
    Add leading 1 bits
    See textbook for proof
    e.g., $-75_{10}$ = 10110101 = 111110110101
  To truncate a negative number
    Discard leftmost bits, provided
      discarded bits are all 1
      sign bit of result is 1

Resizing Signed Integers
  In general, for 2s-complement integers
    Extend by replicating sign bit (sign extension)
    Truncate by discarding leading bits
      Discarded bits must all be the same, and the same as the sign bit of the result

    wire signed [ 7:0] x;
    wire signed [15:0] y;
    ...
    assign y = {{8{x[7]}}, x};   // explicit sign extension
    assign y = x;                // implicit sign extension
    ...
    assign x = y;                // truncation to 8 bits

Signed Negation
  Complement and add 1
    Note that $\overline{x_i} = 1 - x_i$, so $\overline{x} = -1 - x$ and therefore $\overline{x} + 1 = -x$
  E.g., 43 is 00101011, so -43 is 11010100 + 1 = 11010101

Signed Negation
  What about negating $-2^{n-1}$?
    1000...00 ⇒ 0111...11 + 1 = 1000...00
    Result is $-2^{n-1}$!
  Recall range of n-bit numbers is not symmetric
    Either check for overflow, extend by one bit, or ensure this case can't arise
  In Verilog: use - operator
    E.g., assign y = -x;
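
The "check for overflow" option can be sketched as follows. This is an illustrative module rather than code from the book; the name `negate`, the parameter `N`, and the `ovf` output are assumptions. It relies on the fact that the most-negative value is the only input whose negation is still negative.

```verilog
// Hypothetical module; 2s-complement negation with an overflow flag.
module negate #( parameter N = 8 )
  ( output signed [N-1:0] y,
    output                ovf,   // 1 only when x is the most-negative value
    input  signed [N-1:0] x );

  assign y   = ~x + 1'b1;          // complement and add 1
  assign ovf = x[N-1] & y[N-1];    // negative in, negative out: overflow

endmodule
```

With N = 8, an input of 10000000 (-128) produces 10000000 again with `ovf` set; every other input negates correctly with `ovf` clear.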

Signed Addition
  Perform addition as for unsigned
    adding bits n-2 down to 0 yields the carry $c_{n-1}$ into the sign position
    Overflow if $c_{n-1}$ differs from $c_n$
    See textbook for case analysis
  Can use the same circuit for signed and unsigned addition

Signed Addition Examples
[Figure: six worked signed-addition examples, labeled in order: no overflow, no overflow, no overflow, positive overflow, negative overflow, no overflow]

Signed Addition in Verilog
Result of + is same size as operands

    wire signed [11:0] v1, v2;
    wire signed [12:0] sum;
    ...
    assign sum = {v1[11], v1} + {v2[11], v2};
    ...
    assign sum = v1 + v2;   // implicit sign extension

To check overflow, compare signs

    wire signed [7:0] x, y, z;
    wire              ovf;
    ...
    assign z   = x + y;
    assign ovf = ~x[7] & ~y[7] & z[7] | x[7] & y[7] & ~z[7];

Signed Subtraction
  $x - y = x + (-y) = x + \overline{y} + 1$
  Use a 2s-complement adder
    Complement y and set $c_0 = 1$

Other Signed Operations
Increment, decrement
  same as unsigned
Comparison
  =: same as unsigned
  >: compare sign bits first; if they differ, x > y exactly when $x_{n-1} = 0$ and $y_{n-1} = 1$; otherwise compare as for unsigned
Multiplication
  Complicated by the need to sign extend partial products
  Refer to Further Reading

Scaling Signed Integers
Multiplying by $2^k$
  logical left shift (as for unsigned)
  truncate result using 2s-complement rules
Dividing by $2^k$
  arithmetic right shift
  discard k bits from the right, and replicate sign bit k times on the left
  e.g., s = 11110011 (-13); s >>> 2 = 11111100 (-4, that is, $-13 / 2^2$ rounded toward minus infinity)
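
In Verilog, the arithmetic right shift is written with the `>>>` operator, which replicates the sign bit only when its operand is declared `signed`. The following sketch shows both scaling directions for an 8-bit value; the module name `scale_signed` and its port names are assumptions made for the example.

```verilog
// Hypothetical module; scaling a signed 8-bit value by 4 in each direction.
module scale_signed
  ( output signed [7:0] times_4,   // s * 4, truncated to 8 bits
    output signed [7:0] div_4,     // s / 4, rounded toward minus infinity
    input  signed [7:0] s );

  assign times_4 = s <<< 2;   // left shift fills with 0 on the right
  assign div_4   = s >>> 2;   // arithmetic right shift replicates the sign bit

endmodule
```

For s = 11110011 (-13), `div_4` is 11111100 (-4). Note that `times_4` overflows silently whenever the two discarded bits differ from the sign bit of the result, so a real design would need to check or exclude that case.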

Fixed-Point Numbers
  Many applications use non-integers
    especially signal-processing apps
  Fixed-point numbers
    allow for fractional parts
    represented as integers that are implicitly scaled by a power of 2
    can be unsigned or signed

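Positional Notation
  In decimal
    $10.24_{10} = 1 \times 10^1 + 0 \times 10^0 + 2 \times 10^{-1} + 4 \times 10^{-2}$
  In binary
    $101.01_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} = 5.25_{10}$
  Represent as a bit vector: 10101
    binary point is implicit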

Unsigned Fixed-Point
  n-bit unsigned fixed-point
    m bits before and f bits after binary point
    $x = x_{m-1} 2^{m-1} + \cdots + x_0 2^0 + x_{-1} 2^{-1} + \cdots + x_{-f} 2^{-f}$
  Range: 0 to $2^m - 2^{-f}$
  Precision: $2^{-f}$
  m may be ≤ 0, giving fractions only
    e.g., m = -2: the most-significant stored bit has weight $2^{-3}$

Signed Fixed-Point
  n-bit signed 2s-complement fixed-point
    m bits before and f bits after binary point
    $x = -x_{m-1} 2^{m-1} + x_{m-2} 2^{m-2} + \cdots + x_{-f} 2^{-f}$
  Range: $-2^{m-1}$ to $2^{m-1} - 2^{-f}$
  Precision: $2^{-f}$
  E.g., 111101, signed fixed-point, m = 2
    $11.1101_2 = -2 + 1 + 0.5 + 0.25 + 0.0625 = -0.1875$

Choosing Range and Precision
Choice depends on application
Need to understand the numerical behavior of computations performed
  some operations can magnify quantization errors
In DSP
  fixed-point range affects dynamic range
  precision affects signal-to-noise ratio
Perform simulations to evaluate effects

Fixed-Point in Verilog
Use vectors with implied scaling
  Index range matches powers of weights
  Assume binary point between indices 0 and -1

    module fixed_converter
      ( input         [5:-7] in,
        output signed [7:-7] out );

      assign out = {2'b0, in};

    endmodule

Fixed-Point Operations
Just use integer hardware
  e.g., addition: $x + y = (x \times 2^f + y \times 2^f) / 2^f$
Ensure binary points are aligned
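
Aligning binary points before an integer add can be sketched as below. The formats are assumptions picked for illustration: operand `a` has 4 pre- and 4 post-binary-point bits, operand `b` has 2 pre- and 6 post-binary-point bits, and both are unsigned. The module name `fixed_add` is hypothetical.

```verilog
// Hypothetical module; adds two unsigned fixed-point values of different formats.
module fixed_add
  ( output [4:-6] sum,   // one extra integer bit so the sum cannot overflow
    input  [3:-4] a,
    input  [1:-6] b );

  // Bring both operands to the common format [3:-6]
  wire [3:-6] a_aligned = {a, 2'b00};   // add two fraction bits on the right
  wire [3:-6] b_aligned = {2'b00, b};   // add two integer bits on the left

  // With binary points aligned, a plain integer adder gives the result
  assign sum = {1'b0, a_aligned} + {1'b0, b_aligned};

endmodule
```

The index ranges follow the convention above, where the binary point lies between indices 0 and -1, so the weights of corresponding bits line up by construction.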

Floating-Point Numbers
Similar to scientific notation for decimal
  e.g., $6.022 \times 10^{23}$, $1.602 \times 10^{-19}$
Allow for larger range, with same relative precision throughout the range
  In $6.022 \times 10^{23}$: 6.022 is the mantissa, 10 is the radix, 23 is the exponent

IEEE Floating-Point Format
Fields, from left to right: sign s (1 bit), exponent (e bits), mantissa (m bits)
  $x = M \times 2^E$
  s: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
  Normalize: $1.0 \le |M| < 2.0$
    M always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  Exponent: excess representation: $E + 2^{e-1} - 1$

Floating-Point Range
  Exponents 000...0 and 111...1 reserved
  Smallest value
    exponent: 000...01 ⇒ $E = -2^{e-1} + 2$
    mantissa: 000...00 ⇒ M = 1.0
  Largest value
    exponent: 111...10 ⇒ $E = 2^{e-1} - 1$
    mantissa: 111...11 ⇒ M ≈ 2.0
  Range: $2^{-2^{e-1}+2} \le |x| < 2^{2^{e-1}}$

Floating-Point Precision
Relative precision approximately $2^{-m}$
  all mantissa bits are significant
m bits of precision
  $m \times \log_{10} 2 \approx m \times 0.3$ decimal digits

Example Formats
  IEEE single precision, 32 bits
    e = 8, m = 23
    range ≈ $\pm 1.2 \times 10^{-38}$ to $\pm 3.4 \times 10^{38}$
    precision ≈ 7 decimal digits
  Application-specific, 22 bits
    e = 5, m = 16
    range ≈ $\pm 6.1 \times 10^{-5}$ to $\pm 6.6 \times 10^{4}$
    precision ≈ 5 decimal digits

Denormal Numbers
  Exponent = 000...0 ⇒ hidden bit is 0
    Smaller than normal numbers
    allow for gradual underflow, with diminishing precision
  Mantissa = 000...0 ⇒ ±0.0

Infinities and NaNs
  Exponent = 111...1, mantissa = 000...0
    ±Infinity
    Can be used in subsequent calculations, avoiding need for overflow check
  Exponent = 111...1, mantissa ≠ 000...0
    Not-a-Number (NaN)
    Indicates illegal or undefined result, e.g., 0.0 / 0.0
    Can be used in subsequent calculations
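
The reserved exponent codes make it straightforward to classify a floating-point word with a few comparisons. The sketch below does this for the IEEE single-precision layout (1 sign bit, e = 8, m = 23); the module name `fp_classify` and its output names are assumptions for illustration.

```verilog
// Hypothetical module; classifies a 32-bit IEEE single-precision word.
module fp_classify
  ( output        sign, is_zero, is_denormal, is_infinity, is_nan,
    input  [31:0] x );

  wire [7:0]  exponent = x[30:23];
  wire [22:0] mantissa = x[22:0];

  wire exp_all_0 = (exponent == 8'h00);   // reserved: zero and denormals
  wire exp_all_1 = (exponent == 8'hFF);   // reserved: infinities and NaNs
  wire man_all_0 = (mantissa == 23'b0);

  assign sign        = x[31];
  assign is_zero     = exp_all_0 &  man_all_0;
  assign is_denormal = exp_all_0 & ~man_all_0;
  assign is_infinity = exp_all_1 &  man_all_0;
  assign is_nan      = exp_all_1 & ~man_all_0;

endmodule
```

When none of the four flags is set, the word is a normal number and the hidden bit is 1. A floating-point unit would usually perform this kind of classification as part of its unpack step.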

Floating-Point Operations
Considerably more complicated than integer operations
  E.g., addition
    unpack, align binary points, adjust exponents
    add mantissas, check for exceptions
    round and normalize result, adjust exponent
Combinational circuits not feasible
  Pipelined sequential circuits

Summary
  Unsigned: $x = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_0 2^0$
  Signed: $x = -x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_0 2^0$
  Octal and Hex short-hand
  Operations: resize, arithmetic, compare
  Arithmetic circuits trade off speed/area/power
  Fixed- and floating-point non-integers
  Gray codes for position encoding

Digital Design — Chapter 4 — Sequential Basics 27 January 2018 Digital Design: An Embedded Systems Approach Using Verilog Chapter 4 Sequential Basics Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

Sequential Basics
  Sequential circuits
    Outputs depend on current inputs and previous inputs
    Store state: an abstraction of the history of inputs
  Usually governed by a periodic clock signal

D-Flipflops
  1-bit storage element
    We will treat it as a basic component
  Other kinds of flipflops
    SR (set/reset), JK, T (toggle)

Registers
  Store a multi-bit encoded value
    One D-flipflop per bit
    Stores a new value on each clock cycle

    wire [n:0] d;
    reg  [n:0] q;
    ...
    always @(posedge clk)   // event list
      q <= d;               // nonblocking assignment

Pipelines Using Registers
Without pipeline registers
  Total delay = Delay1 + Delay2 + Delay3
  Interval between outputs > Total delay
With a pipeline register after each stage
  Clock period = max(Delay1, Delay2, Delay3)
  Total delay = 3 × clock period
  Interval between outputs = 1 clock period

Pipeline Example
  Compute the average of corresponding numbers in three input streams
    New values arrive on each clock edge

    module average_pipeline
      ( output reg signed [5:-8] avg,
        input      signed [5:-8] a, b, c,
        input                    clk );

      wire signed [5:-8] a_plus_b, sum, sum_div_3;
      reg  signed [5:-8] saved_a_plus_b, saved_c, saved_sum;
      ...

Pipeline Example (continued)

      ...
      assign a_plus_b = a + b;

      always @(posedge clk) begin   // Pipeline register
        saved_a_plus_b <= a_plus_b;
        saved_c        <= c;
      end

      assign sum = saved_a_plus_b + saved_c;

      always @(posedge clk)         // Pipeline register
        saved_sum <= sum;

      assign sum_div_3 = saved_sum * 14'b00000001010101;  // approximately 1/3 (0.01010101 in binary)

      always @(posedge clk)         // Pipeline register
        avg <= sum_div_3;

    endmodule

D-Flipflop with Enable
Storage controlled by a clock-enable
  stores only when CE = 1 on a rising edge of the clock
CE is a synchronous control input

Register with Enable
  One flipflop per bit
    clk and CE wired in common

    wire [n:0] d;
    wire       ce;
    reg  [n:0] q;
    ...
    always @(posedge clk)
      if (ce) q <= d;

Register with Synchronous Reset
Reset input forces stored value to 0
  reset input must be stable around rising edge of clk

    always @(posedge clk)
      if (reset)   q <= 0;
      else if (ce) q <= d;

Register with Asynchronous Reset
Reset input forces stored value to 0
  reset can become 1 at any time, and effect is immediate
  reset should return to 0 synchronously

Asynch Reset in Verilog
    always @(posedge clk or posedge reset)
      if (reset)   q <= 0;
      else if (ce) q <= d;

reset is an asynchronous control input here
  include it in the event list so that the process responds to changes immediately

Example: Accumulator
  Sum a sequence of signed numbers
    A new number arrives when data_en = 1
    Clear sum to 0 on synch reset

    module accumulator
      ( output reg signed [7:-12] data_out,
        input      signed [3:-12] data_in,
        input                     data_en, clk, reset );

      wire signed [7:-12] new_sum;

      assign new_sum = data_out + data_in;

      always @(posedge clk)
        if (reset)        data_out <= 20'b0;
        else if (data_en) data_out <= new_sum;

    endmodule

Flipflop and Register Variations
    module flip_flop_n
      ( output reg Q,
        output     Q_n,
        input      pre_n, clr_n, D,
        input      clk_n, CE );

      always @( negedge clk_n or negedge pre_n or negedge clr_n ) begin
        if ( !pre_n && !clr_n )
          $display("Illegal inputs: pre_n and clr_n both 0");
        if (!pre_n)      Q <= 1'b1;
        else if (!clr_n) Q <= 1'b0;
        else if (CE)     Q <= D;
      end

      assign Q_n = ~Q;

    endmodule

Shift Registers
  Performs shift operation on stored data
    Arithmetic scaling
    Serial transfer of data

Example: Sequential Multiplier
16×16 multiply over 16 clock cycles, using one adder
  Shift register for multiplier bits
  Shift register for lsb's of accumulated product
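
A shift-and-add multiplier along these lines can be sketched as follows. This is not the book's circuit: the module name `seq_multiplier`, the `start` and `done` handshake, and the choice to hold the multiplier bits and the low product bits in a single 32-bit shift register are all assumptions made to keep the example short.

```verilog
// Hypothetical module; unsigned 16x16 multiply over 16 clock cycles, one adder.
module seq_multiplier
  ( output reg [31:0] p,       // product, valid when done = 1
    output reg        done,
    input      [15:0] x, y,    // multiplicand and multiplier
    input             start, clk );

  reg  [15:0] x_reg;           // saved multiplicand
  reg  [4:0]  count;           // steps remaining

  // The single adder: add the multiplicand if the current multiplier bit is 1.
  // 17 bits wide so that the carry out is kept.
  wire [16:0] sum = p[31:16] + (p[0] ? x_reg : 16'b0);

  always @(posedge clk)
    if (start) begin
      x_reg <= x;
      p     <= {16'b0, y};     // multiplier starts in the low half
      count <= 5'd16;
      done  <= 1'b0;
    end
    else if (count != 0) begin
      p     <= {sum, p[15:1]}; // shift right, bringing in the new sum
      count <= count - 1;
      done  <= (count == 1);   // asserted together with the final product
    end

endmodule
```

Each cycle consumes one multiplier bit from the right of `p` and frees that position for one more low-order product bit, so the same register serves both shifting roles named above.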

Latches
  Level-sensitive storage
    Data transmitted while enable is 1 (transparent latch)
    Data stored while enable is 0

Feedback Latches
  Feedback in gate circuits produces latching behavior
    Example: reset/set (RS) latch
    [Figure: RS latch circuit with inputs S and R and output Q]
  Current RTL synthesis tools don't accept Verilog models with unclocked feedback

Latches in Verilog
  Latching behavior is usually an error!

    always @*
      if (~sel) begin
        z1 <= a1; z2 <= b1;
      end
      else begin
        z1 <= a2; z3 <= b2;   // Oops! Should be z2 <= ...
      end

  Values must be stored
    for z2 while sel = 1
    for z3 while sel = 0

Counters
  Stores an unsigned integer value
    increments or decrements the value
  Used to count occurrences of
    events
    repetitions of a processing step
  Used as timers
    count elapsed time intervals by incrementing periodically

Free-Running Counter
  Increments every rising edge of clk
    up to $2^n - 1$, then wraps back to 0
    i.e., counts modulo $2^n$
  This counter is synchronous
    all outputs governed by clock edge

Example: Periodic Control Signal
Count modulo 16 clock cycles
  Control output = 1 every 8th and 12th cycle
  decode count values 0111 and 1011

Example: Periodic Control Signal
    module decoded_counter
      ( output ctrl,
        input  clk );

      reg [3:0] count_value;

      always @(posedge clk)
        count_value <= count_value + 1;

      assign ctrl = count_value == 4'b0111 ||
                    count_value == 4'b1011;

    endmodule

Count Enable and Reset
  Use a register with control inputs
  Increments when CE = 1 on rising clock edge
  Reset: synch or asynch

Terminal Count
  Status signal indicating final count value
  TC is 1 for one cycle in every $2^n$ cycles
    frequency = clock frequency / $2^n$
    Called a clock divider

Divider Example
  Alarm clock beep: 500 Hz from 1 MHz clock
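
The slide's circuit is not recoverable from this transcript, so the following is only one way to derive a 500 Hz signal from a 1 MHz clock. It assumes an exact division by 2000 (1,000,000 / 500) using a modulo-2000 counter whose terminal count pulses once per period; the module name `beep_divider` and the synchronous `reset` input are hypothetical.

```verilog
// Hypothetical module; produces one terminal-count pulse every 2000 clock cycles.
module beep_divider
  ( output tc,                 // 1 for one cycle in every 2000: 500 Hz at 1 MHz
    input  clk, reset );

  reg [10:0] count;            // 11 bits hold 0 to 1999

  always @(posedge clk)
    if (reset || count == 11'd1999)
      count <= 11'd0;          // wrap after the terminal count
    else
      count <= count + 1;

  assign tc = (count == 11'd1999);

endmodule
```

If a plain power-of-two divider is acceptable, a 10-bit free-running counter divides by 1024 instead, giving roughly 977 Hz, which trades accuracy for a slightly simpler circuit.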

Divide by k
  Decode k-1 as terminal count and reset counter register
    Counter increments modulo k
  Example: decade counter
    Terminal count = 9

Decade Counter in Verilog
    module decade_counter
      ( output reg [3:0] q,
        input            clk );

      always @(posedge clk)
        q <= q == 9 ? 0 : q + 1;

    endmodule

Down Counter with Load
  Load a starting value, then decrement
    Terminal count = 0
  Useful for interval timer

Loadable Counter in Verilog
    module interval_timer_rtl
      ( output       tc,
        input  [9:0] data,
        input        load, clk );

      reg [9:0] count_value;

      always @(posedge clk)
        if (load) count_value <= data;
        else      count_value <= count_value - 1;

      assign tc = count_value == 0;

    endmodule

Reloading Counter in Verilog
    module interval_timer_repetitive
      ( output       tc,
        input  [9:0] data,
        input        load, clk );

      reg [9:0] load_value, count_value;

      always @(posedge clk)
        if (load) begin
          load_value  <= data;
          count_value <= data;
        end
        else if (count_value == 0)
          count_value <= load_value;
        else
          count_value <= count_value - 1;

      assign tc = count_value == 0;

    endmodule

Ripple Counter
  Each bit toggles between 0 and 1 when previous bit changes from 1 to 0
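
The toggling behavior can be modeled in Verilog by clocking each bit from the falling edge of the bit below it. This is a simulation-oriented sketch with assumed names (`ripple_counter`, parameter `N`, asynchronous `reset`), and it assumes the least-significant bit toggles on the rising edge of `clk`; the slide's circuit may differ in these details.

```verilog
// Hypothetical module; an N-bit ripple counter built from toggling flipflops.
module ripple_counter #( parameter N = 4 )
  ( output reg [N-1:0] q,
    input              clk, reset );

  // Bit 0 toggles on every rising clock edge
  always @(posedge clk or posedge reset)
    if (reset) q[0] <= 1'b0;
    else       q[0] <= ~q[0];

  // Bit i toggles when bit i-1 changes from 1 to 0
  genvar i;
  generate
    for (i = 1; i < N; i = i + 1) begin : stage
      always @(negedge q[i-1] or posedge reset)
        if (reset) q[i] <= 1'b0;
        else       q[i] <= ~q[i];
    end
  endgenerate

endmodule
```

Because each stage is clocked by the one before it, the outputs do not all change at the same instant, which is the source of the transient wrong values that a ripple counter's users must tolerate.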

Ripple or Synch Counter?
Ripple counter is OK if
  length is short
  clock period long relative to flipflop delay
  transient wrong values can be tolerated
  area must be minimal
E.g., alarm clock
Otherwise use a synchronous counter

Datapaths and Control
  Digital systems perform sequences of operations on encoded data
  Datapath
    Combinational circuits for operations
    Registers for storing intermediate results
  Control section: control sequencing
    Generates control signals
      Selecting operations to perform
      Enabling registers at the right times
    Uses status signals from datapath

Example: Complex Multiplier
Cartesian form, fixed-point
  $p_r = a_r b_r - a_i b_i$, $p_i = a_r b_i + a_i b_r$
  operands: 4 pre-, 12 post-binary-point bits
  result: 8 pre-, 24 post-binary-point bits
Subject to tight area constraints
4 multiplies, 1 add, 1 subtract
  Perform sequentially using 1 multiplier, 1 adder/subtracter

Complex Multiplier Datapath
[Figure: complex multiplier datapath]

Complex Multiplier in Verilog
    module multiplier
      ( output reg signed [7:-24] p_r, p_i,
        input      signed [3:-12] a_r, a_i, b_r, b_i,
        input                     clk, reset, input_rdy );

      reg a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce;
      wire signed [3:-12] a_operand, b_operand;
      wire signed [7:-24] pp, sum;
      reg  signed [7:-24] pp1, pp2;
      ...

Complex Multiplier in Verilog
      assign a_operand = ~a_sel ? a_r : a_i;
      assign b_operand = ~b_sel ? b_r : b_i;

      assign pp = {{4{a_operand[3]}}, a_operand, 12'b0} *
                  {{4{b_operand[3]}}, b_operand, 12'b0};

      always @(posedge clk)   // Partial product 1 register
        if (pp1_ce) pp1 <= pp;

      always @(posedge clk)   // Partial product 2 register
        if (pp2_ce) pp2 <= pp;

      assign sum = ~sub ? pp1 + pp2 : pp1 - pp2;

      always @(posedge clk)   // Product real-part register
        if (p_r_ce) p_r <= sum;

      always @(posedge clk)   // Product imaginary-part register
        if (p_i_ce) p_i <= sum;
      ...
    endmodule

Multiplier Control Sequence
Avoid resource conflict
First attempt
  1. a_r * b_r → pp1_reg
  2. a_i * b_i → pp2_reg
  3. pp1 - pp2 → p_r_reg
  4. a_r * b_i → pp1_reg
  5. a_i * b_r → pp2_reg
  6. pp1 + pp2 → p_i_reg
Takes 6 clock cycles

Multiplier Control Sequence
Merge steps where no resource conflict
Revised attempt
  1. a_r * b_r → pp1_reg
  2. a_i * b_i → pp2_reg
  3. pp1 - pp2 → p_r_reg; a_r * b_i → pp1_reg
  4. a_i * b_r → pp2_reg
  5. pp1 + pp2 → p_i_reg
Takes 5 clock cycles

Multiplier Control Signals
    Step  a_sel  b_sel  pp1_ce  pp2_ce  sub  p_r_ce  p_i_ce
    1     0      0      1       0       -    0       0
    2     1      1      0       1       -    0       0
    3     0      1      1       0       1    1       0
    4     1      0      0       1       -    0       0
    5     -      -      0       0       0    0       1

    (- means don't care)

Finite-State Machines
Used to implement control sequencing
Based on mathematical automaton theory
An FSM is defined by
  set of inputs: Σ
  set of outputs: Γ
  set of states: S
  initial state: $s_0 \in S$
  transition function: δ: S × Σ → S
  output function: ω: S × Σ → Γ or ω: S → Γ

FSM in Hardware
  [Figure: state register with next-state logic and output logic; the connection from the inputs to the output logic is present in a Mealy FSM only]
  Mealy FSM: ω: S × Σ → Γ
  Moore FSM: ω: S → Γ

FSM Example: Multiplier Control
Transition function
  One state per step
  Separate idle state?
    Wait for input_rdy = 1
    Then proceed to steps 1, 2, ...
    But this wastes a cycle!
  Use step 1 as idle state
    Repeat step 1 if input_rdy ≠ 1
    Proceed to step 2 otherwise
Output function
  Defined by table on slide 43
  Moore or Mealy?

    current_state  input_rdy  next_state
    step1          0          step1
    step1          1          step2
    step2          -          step3
    step3          -          step4
    step4          -          step5
    step5          -          step1

State Encoding
  Encoded in binary
    N states: use at least $\lceil \log_2 N \rceil$ bits
  Encoded value used in circuits for transition and output function
    encoding affects circuit complexity
  Optimal encoding is hard to find
    CAD tools can do this well
  One-hot works well in FPGAs
  Often use 000...0 for idle state
    reset state register to idle

FSMs in Verilog
- Use parameters for state values
    - Synthesis tool can choose an alternative encoding

    parameter [2:0] step1 = 3'b000, step2 = 3'b001, step3 = 3'b010,
                    step4 = 3'b011, step5 = 3'b100;

    reg [2:0] current_state, next_state;
    ...

Multiplier Control in Verilog
    always @(posedge clk or posedge reset) // State register
      if (reset) current_state <= step1;
      else       current_state <= next_state;

    always @* // Next-state logic
      case (current_state)
        step1: if (!input_rdy) next_state = step1;
               else            next_state = step2;
        step2: next_state = step3;
        step3: next_state = step4;
        step4: next_state = step5;
        step5: next_state = step1;
        default: next_state = step1; // unused encodings return to idle (avoids a latch)
      endcase

Multiplier Control in Verilog
    always @* begin // Output logic
      // Defaults: every control signal inactive unless a step asserts it
      a_sel = 1'b0; b_sel = 1'b0;
      pp1_ce = 1'b0; pp2_ce = 1'b0;
      sub = 1'b0; p_r_ce = 1'b0; p_i_ce = 1'b0;
      case (current_state)
        step1: begin
          pp1_ce = 1'b1;
        end
        step2: begin
          a_sel = 1'b1; b_sel = 1'b1; pp2_ce = 1'b1;
        end
        step3: begin
          b_sel = 1'b1; pp1_ce = 1'b1;
          sub = 1'b1; p_r_ce = 1'b1;
        end
        step4: begin
          a_sel = 1'b1; pp2_ce = 1'b1;
        end
        step5: begin
          p_i_ce = 1'b1;
        end
      endcase
    end

State Transition Diagrams
- Bubbles to represent states
- Arcs to represent transitions
- Example
    - S = {s1, s2, s3}
    - Inputs (a1, a2): Σ = {(0,0), (0,1), (1,0), (1,1)}
    - δ defined by diagram

State Transition Diagrams
- Annotate diagram to define output function
    - Annotate states for Moore-style outputs
    - Annotate arcs for Mealy-style outputs
- Example
    - x1, x2: Moore-style
    - y1, y2, y3: Mealy-style

Multiplier Control Diagram
- Input: input_rdy
- Outputs: a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce

Bubble Diagrams or Verilog?
- Many CAD tools provide editors for bubble diagrams
    - Automatically generate Verilog for simulation and synthesis
- Diagrams are visually appealing, but can become unwieldy for complex FSMs
- Your choice... or your manager's!

Register Transfer Level
- RTL: a level of abstraction
    - Data stored in registers
    - Transferred via circuits that operate on data

Clocked Synchronous Timing
- Registers driven by a common clock
- Combinational circuits operate during clock cycles (between rising clock edges)
- Constraint: t_co + t_pd + t_su < t_c
    - Clock-to-output delay + combinational propagation delay + setup time must be less than the clock period

Control Path Timing
- t_co + t_pd-s + t_pd-o + t_pd-c + t_su < t_c
- t_co + t_pd-s + t_pd-ns + t_su < t_c
- Ignore t_pd-s for a Moore FSM

Timing Constraints
- Inequalities must hold for all paths
- If t_co and t_su are the same for all paths
    - Combinational delays make the difference
- Critical path
    - The combinational path between registers with the longest delay
    - Determines minimum clock period for the entire system
- Focus on it to improve performance
    - Reducing delay may make another path critical
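As a worked illustration of how the critical path sets the minimum clock period, the arithmetic below plugs numbers into the constraint. The delay values (0.5 ns, 4.2 ns, 0.3 ns) are hypothetical, chosen only to make the sums easy to follow.

```latex
\begin{aligned}
t_{co} + t_{pd} + t_{su} &< t_c \\
0.5\,\text{ns} + 4.2\,\text{ns} + 0.3\,\text{ns} = 5.0\,\text{ns} &< t_c \\
f_{max} = \frac{1}{t_c} &< \frac{1}{5.0\,\text{ns}} = 200\,\text{MHz}
\end{aligned}
```

Shortening the 4.2 ns path raises the frequency limit only until some other path becomes the longest one.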

Interpretation of Constraints
1. Clock period depends on delays
    - System can operate at any frequency up to a maximum
    - OK for systems where high performance is not the main requirement
2. Delays must fit within a target clock period
    - Optimize critical paths to reduce delays if necessary
    - May require revising RTL organization

Clock Skew
- Need to ensure clock edges arrive at all registers at the same time
- Use CAD tools to insert clock buffers and route clock signal paths

Off-Chip Connections
- Delays going off-chip and inter-chip
    - Input and output pad delays, wire delays
- Same timing rules apply
    - Use input and output registers to avoid adding external delay to critical path

Asynchronous Inputs
- External inputs can change at any time
    - Might violate setup/hold time constraints
- Can induce metastable state in a flip-flop
    - Unbounded time to recover
    - May violate setup/hold time of subsequent flip-flop

Synchronizers
- If input changes outside setup/hold window
    - Change is simply delayed by one cycle
- If input changes during setup/hold window
    - First flip-flop has a whole cycle to resolve metastability
- See data sheets for metastability parameters
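The behavior described above matches the common two-flip-flop synchronizer. The following is a minimal sketch of that arrangement, not necessarily the circuit drawn on the slide; the module name and the signal names (async_in, meta, sync_out) are hypothetical.

```verilog
// Two-flip-flop synchronizer sketch (names are illustrative assumptions).
module synchronizer (
  output reg sync_out,  // safe to use within the clk domain
  input      async_in,  // may change at any time
  input      clk
);

  reg meta;             // first stage: may go metastable

  always @(posedge clk) begin
    meta     <= async_in; // first flip-flop samples the asynchronous input
    sync_out <= meta;     // second flip-flop samples it one full cycle later
  end

endmodule
```

A change on async_in reaches sync_out after two rising clock edges, which gives the first stage a whole clock period to settle.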

Switch Inputs and Debouncing
- Switches and push-buttons suffer from contact bounce
    - Takes up to 10 ms to settle
    - Need to debounce to avoid false triggering
- Requires two inputs and two resistors
- Must use a break-before-make double-throw switch

Switch Inputs and Debouncing
- Alternative
    - Use a single-throw switch
    - Sample input at intervals longer than bounce time
    - Look for two successive samples with the same value
- Assumption
    - Extra circuitry inside the chip is cheaper than extra components and connections outside

Debouncing in Verilog

    module debouncer ( output reg pb_debounced,
                       input pb,
                       input clk, reset );

      reg [18:0] count500000; // values are in the range 0 to 499999
      wire clk_100Hz;
      reg pb_sampled;

      // Divider: 500,000 clk cycles per 100 Hz sample tick (implies a 50 MHz clk)
      always @(posedge clk or posedge reset)
        if (reset)          count500000 <= 499999;
        else if (clk_100Hz) count500000 <= 499999;
        else                count500000 <= count500000 - 1;

      assign clk_100Hz = count500000 == 0;

      // Accept the input only when two successive samples agree
      always @(posedge clk)
        if (clk_100Hz) begin
          if (pb == pb_sampled) pb_debounced <= pb;
          pb_sampled <= pb;
        end

    endmodule

Verifying Sequential Circuits
- [Figure: verification testbench containing "Apply Test Cases", the Design Under Verification (DUV) and a "Checker"]
- DUV may take multiple and varying number of cycles to produce output
- Checker needs to
    - synchronize with test generator
    - ensure DUV outputs occur when expected
    - ensure DUV outputs are correct
    - ensure no spurious outputs occur

Example: Multiplier Testbench
    `timescale 1ns/1ns

    module multiplier_testbench;

      parameter t_c = 50;

      reg clk, reset;
      reg input_rdy;
      wire signed [3:-12] a_r, a_i, b_r, b_i;
      wire signed [7:-24] p_r, p_i;

      real real_a_r, real_a_i, real_b_r, real_b_i,
           real_p_r, real_p_i, err_p_r, err_p_i;

      task apply_test ( input real a_r_test, a_i_test,
                                   b_r_test, b_i_test );
        begin
          real_a_r = a_r_test; real_a_i = a_i_test;
          real_b_r = b_r_test; real_b_i = b_i_test;
          input_rdy = 1'b1;
          @(negedge clk) input_rdy = 1'b0;
          repeat (5) @(negedge clk); // wait out the 5-cycle multiplication
        end
      endtask

      multiplier duv ( .clk(clk), .reset(reset),
                       .input_rdy(input_rdy),
                       .a_r(a_r), .a_i(a_i),
                       .b_r(b_r), .b_i(b_i),
                       .p_r(p_r), .p_i(p_i) );

      always begin // Clock generator
        #(t_c/2) clk = 1'b1;
        #(t_c/2) clk = 1'b0;
      end

      initial begin // Reset generator
        reset <= 1'b1;
        #(2*t_c) reset = 1'b0;
      end

      initial begin // Apply test cases
        @(negedge reset)
        @(negedge clk)
        apply_test(0.0, 0.0, 1.0, 2.0);
        apply_test(1.0, 1.0, 1.0, 1.0);
        // further test cases
        $finish;
      end

      // Convert the real test values to fixed point with 12 fraction bits
      assign a_r = $rtoi(real_a_r * 2**12);
      assign a_i = $rtoi(real_a_i * 2**12);
      assign b_r = $rtoi(real_b_r * 2**12);
      assign b_i = $rtoi(real_b_i * 2**12);

      always @(posedge clk) // Check outputs
        if (input_rdy) begin
          real_p_r = real_a_r * real_b_r - real_a_i * real_b_i;
          real_p_i = real_a_r * real_b_i + real_a_i * real_b_r;
          repeat (5) @(negedge clk);
          // Scale the 24-fraction-bit results back to real values
          err_p_r = $itor(p_r) * 2.0**(-24) - real_p_r;
          err_p_i = $itor(p_i) * 2.0**(-24) - real_p_i;
          if (!( -(2.0**(-12)) < err_p_r && err_p_r < 2.0**(-12) &&
                 -(2.0**(-12)) < err_p_i && err_p_i < 2.0**(-12) ))
            $display("Result precision requirement not met");
        end

    endmodule

Asynchronous Timing
- Clocked synchronous timing requires
    - global clock distribution with minimal skew
    - path delay between registers < clock period
- Hard to achieve in complex multi-GHz systems
- Globally asynchronous, locally synchronous (GALS) systems
    - Divide the system into local clock domains
    - Inter-domain signals treated as asynchronous inputs
    - Simplifies clock management and constraints
    - Delays inter-domain communication
- Delay-insensitive asynchronous systems
    - No clock signals

Other Clock-Related Issues
- Inter-chip clocking
    - Distributing high-speed clocks on PCBs is hard
    - Often use slower off-chip clock, with on-chip clock a multiple of off-chip clock
    - Synchronize on-chip with phase-locked loop (PLL)
    - In multi-PCB systems, treat off-PCB signals as asynchronous inputs
- Low power design
    - Continuous clocking wastes power
    - Clock gating: turn off clock to idle subsystems

Summary
- Registers for storing data
    - synchronous and asynchronous control
    - clock enable, reset, preset
- Latches: level-sensitive
    - usually unintentional in Verilog
- Counters
    - free-running dividers, terminal count, reset, load, up/down

Summary
- RTL organization of digital systems
    - datapath and control section
- Finite-State Machine (FSM)
    - states, inputs, transition/output functions
    - Moore and Mealy FSMs
    - bubble diagrams
- Clocked synchronous timing and constraints
    - critical path and optimization
- Asynchronous inputs, switch debouncing
- Verification of sequential systems

Digital Design: An Embedded Systems Approach Using Verilog
Chapter 5: Memories
27 January 2018
Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

General Concepts
- A memory is an array of storage locations
    - Each with a unique address
    - Like a collection of registers, but with optimized implementation
- Address is unsigned-binary encoded
    - n address bits ⇒ 2^n locations
- All locations the same size
    - 2^n × m bit memory
- [Figure: array of m-bit locations at addresses 0 to 2^n – 1]

Memory Sizes
- Use power-of-2 multipliers
    - Kilo (K): 2^10 = 1,024 ≈ 10^3
    - Mega (M): 2^20 = 1,048,576 ≈ 10^6
    - Giga (G): 2^30 = 1,073,741,824 ≈ 10^9
- Example
    - 32K × 32-bit memory
    - Capacity = 1,024K bits = 1 Mbit
    - Requires 15 address bits
- Size is determined by application requirements
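The capacity and address-width figures in the 32K × 32-bit example follow directly from the power-of-2 multipliers. A short worked check:

```latex
\begin{aligned}
32\text{K} \times 32\ \text{bits} &= 2^{15} \times 2^{5}\ \text{bits} = 2^{20}\ \text{bits} = 1024\text{K bits} = 1\ \text{Mbit} \\
\text{address bits} &= \log_2(32\text{K}) = \log_2\!\left(2^{15}\right) = 15
\end{aligned}
```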

Basic Memory Operations
- a inputs: unsigned address
- d_in and d_out
    - Type depends on application
- Write operation
    - en = 1, wr = 1
    - d_in value stored in location given by address inputs
- Read operation
    - en = 1, wr = 0
    - d_out driven with value of location given by address inputs
- Idle: en = 0

Example: Audio Delay Unit
- System clock: 1 MHz
- Audio samples: 8-bit signed, at 50 kHz
    - New sample arrives when audio_in_en = 1
- Delay control: 8-bit unsigned ⇒ ms to delay
- Output: audio_out_en = 1 when output ready

Audio Delay Datapath
- Max delay = 255 ms
    - At 50 kHz there are 50 samples per ms, so need to store 255 × 50 = 12,750 samples
    - Use a 16K × 8-bit memory (14 address bits)

Audio Delay Control Section
- Step 1: (idle state)
    - audio_in_en = 0 ⇒ do nothing
    - audio_in_en = 1 ⇒ write memory using counter value as address
- Step 2:
    - Read memory using subtracter output as address, increment counter
- [Table: columns State, audio_in_en, Next state, addr_sel, mem_en, mem_wr, count_en, audio_out_en; rows for Step 1 and Step 2]

Wider Memories
- Memory components have a fixed width
    - E.g., ×1, ×4, ×8, ×16, ...
- Use memory components in parallel to make a wider memory
    - E.g., three 16K×16 components for a 16K×48 memory

More Locations
- To provide 2^n locations with 2^k-location components
    - Use 2^n / 2^k components
- Address A
    - at offset A mod 2^k: least-significant k bits of A
    - in component ⌊A / 2^k⌋: most-significant n–k bits of A, decoded to select component
- [Figure: address ranges per component: 0 to 2^k–1, 2^k to 2×2^k–1, 2×2^k to 3×2^k–1, ..., 2^n–2^k to 2^n–1]

More Locations
- Example: 64K×8 memory composed of 16K×8 components
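For the 64K×8 example, n = 16 and k = 14. The worked numbers below show how one address splits into a component select and an offset; the address 0xA123 is an arbitrary value picked for illustration.

```latex
\begin{aligned}
\frac{2^{n}}{2^{k}} &= \frac{2^{16}}{2^{14}} = 4\ \text{components} \\
A &= \mathtt{0xA123} = 41251 \\
\text{component} &= \left\lfloor A / 2^{14} \right\rfloor = 2 \quad (\text{top two address bits } 10_2) \\
\text{offset} &= A \bmod 2^{14} = 41251 - 2 \times 16384 = 8483 = \mathtt{0x2123}
\end{aligned}
```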

Tristate Drivers
- Allow multiple outputs to be connected together
    - Only one active at a time
    - Remaining outputs are high-impedance
        - Both output transistors turned off
- Allow bidirectional input/output ports
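A tristate driver on a bidirectional port can be sketched in Verilog with a conditional assignment of high-impedance (z). This is an illustrative sketch only; the module name, the WIDTH parameter and the port names (d, d_out, d_in, drive_en) are assumptions, not taken from the slides.

```verilog
// Bidirectional port with a tristate driver (names are illustrative assumptions).
module tristate_port #(parameter WIDTH = 8) (
  inout  [WIDTH-1:0] d,         // shared bidirectional connection
  input  [WIDTH-1:0] d_out,     // value to drive when enabled
  input              drive_en,  // 1: drive d, 0: leave d high-impedance
  output [WIDTH-1:0] d_in       // value sensed on d
);

  // Drive the shared connection only when enabled; otherwise release it
  assign d    = drive_en ? d_out : {WIDTH{1'bz}};
  assign d_in = d;

endmodule
```

When several such ports share one connection, at most one drive_en should be 1 at a time, which is the "only one active at a time" rule above.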

Memories with Tristate Ports
- During write
    - memory d drivers hi-Z
    - memory senses d
- During read
    - selected memory drives d
- Fewer pins and wires
    - Reduced cost of PCB
- Usually not available within ASICs or FPGAs

Memory Types
- Random-Access Memory (RAM)
    - Can read and write
    - Static RAM (SRAM)
        - Stores data so long as power is supplied
        - Asynchronous SRAM: not clocked
        - Synchronous SRAM (SSRAM): clocked
    - Dynamic RAM (DRAM)
        - Needs to be periodically refreshed
- Read-Only Memory (ROM)
    - Combinational
    - Programmable and Flash rewritable
- Volatile and non-volatile

Asynchronous SRAM
- Data stored in 1-bit latch cells
    - Address decoded to enable a given cell
- Usually use active-low control inputs
- Not available as components in ASICs or FPGAs

Asynch SRAM Timing
- Timing parameters published in data sheets
- Access time
    - From address/enable valid to data-out valid
- Cycle time
    - From start to end of access
- Data setup and hold
    - Before/after end of WE pulse
- Makes asynch SRAMs hard to use in clocked synchronous designs

Example Data Sheet

Synchronous SRAM (SSRAM)
- Clocked storage registers for inputs
    - address, data and control inputs
    - stored on a clock edge
    - held for read/write cycle
- Flow-through SSRAM
    - no register on data output

Example: Coefficient Multiplier
- Compute function y = c_i × x^2
    - Coefficient stored in flow-through SSRAM
    - 12-bit unsigned integer index for i
- x, y, c_i: 20-bit signed fixed-point
    - 8 pre- and 12 post-binary-point bits
- Use a single multiplier
    - Multiply c_i × x × x

Multiplier Datapath

Multiplier Timing and Control

Pipelined SSRAM
- Data output also has a register
    - More suitable for high-speed systems
    - Access RAM in one cycle, use the data in the next cycle

Memories in Verilog
- RAM storage represented by an array variable

    reg [15:0] data_RAM [0:4095];
    ...
    always @(posedge clk)
      if (en)
        if (wr) begin
          data_RAM[a] <= d_in; d_out <= d_in;
        end
        else
          d_out <= data_RAM[a];

Example: Coefficient Multiplier

    module scaled_square ( output reg signed [7:-12] y,
                           input signed [7:-12] c_in, x,
                           input [11:0] i,
                           input start,
                           input clk, reset );

      wire c_ram_wr;
      reg c_ram_en, x_ce, mult_sel, y_ce;
      reg signed [7:-12] c_out, x_out;

      reg signed [7:-12] c_RAM [0:4095];

      reg signed [7:-12] operand1, operand2;

      parameter [1:0] step1 = 2'b00, step2 = 2'b01, step3 = 2'b10;
      reg [1:0] current_state, next_state;

      assign c_ram_wr = 1'b0;

      always @(posedge clk) // c RAM - flow through
        if (c_ram_en)
          if (c_ram_wr) begin
            c_RAM[i] <= c_in;
            c_out <= c_in;
          end
          else
            c_out <= c_RAM[i];

      always @(posedge clk) // y register
        if (y_ce) begin
          if (!mult_sel) begin
            operand1 = c_out; operand2 = x_out;
          end
          else begin
            operand1 = x_out; operand2 = y;
          end
          y <= operand1 * operand2;
        end

      always @(posedge clk) // State register
        ...

      always @* // Next-state logic
        ...

      always @* begin // Output logic
        ...
      end

    endmodule

Pipelined SSRAM in Verilog
    reg pipelined_en;
    reg [15:0] pipelined_d_out;
    ...
    always @(posedge clk) begin
      if (pipelined_en)                 // output register
        d_out <= pipelined_d_out;
      pipelined_en <= en;
      if (en)                           // SSRAM
        if (wr) begin
          data_RAM[a] <= d_in; pipelined_d_out <= d_in;
        end
        else
          pipelined_d_out <= data_RAM[a];
    end

Generating SSRAM Components
- Variations on SSRAM behavior
    - E.g., write-first, read-first or no-change on write cycle
    - Burst accesses to successive locations
- Not all synthesis tools recognize the same templates
- Use a RAM core generator tool

Example: RAM Core Generator

Multiport Memories
- Multiple address, data and control connections to the storage locations
- Allows concurrent accesses
    - Avoids multiplexing and sequencing
- Scenario
    - Data producer and data consumer
- What if two writes to a location occur concurrently?
    - Result may be unpredictable
    - Some multi-port memories include an arbiter

FIFO Memories
- First-In/First-Out buffer
    - Connecting producer and consumer
    - Decouples rates of production/consumption
- [Figure: producer subsystem → FIFO → consumer subsystem]
- Implementation using dual-port RAM
    - Circular buffer with write and read addresses
    - Full: write-addr = read-addr
    - Empty: write-addr = read-addr

Example: FIFO Datapath
- Equal = full or empty
    - Need to distinguish between these states. How?

Example: FIFO Control
- Control FSM
    - → filling when write without concurrent read
    - → emptying when read without concurrent write
    - Unchanged when concurrent write and read
- full = filling and equal
- empty = emptying and equal
- [Figure: state diagram with transitions labeled by wr_en, rd_en]
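The filling/emptying rule above can be sketched as a two-state FSM. This is one possible reading of the slide, not the book's code: the module name, the state encoding and the choice to reset into the emptying state (so that an empty FIFO reports empty) are assumptions. The equal input is taken to be the write-addr = read-addr comparison from the datapath.

```verilog
// FIFO full/empty status sketch (module name, encoding and reset state are assumptions).
module fifo_status (
  output full, empty,
  input  wr_en, rd_en,  // a write / read is performed in this cycle
  input  equal,         // write-addr == read-addr, from the datapath
  input  clk, reset
);

  parameter emptying = 1'b0, filling = 1'b1;
  reg state;

  always @(posedge clk or posedge reset)
    if (reset)                state <= emptying; // FIFO starts out empty
    else if (wr_en && !rd_en) state <= filling;  // write without concurrent read
    else if (rd_en && !wr_en) state <= emptying; // read without concurrent write
    // concurrent write and read, or no access: state unchanged

  assign full  = (state == filling)  && equal;
  assign empty = (state == emptying) && equal;

endmodule
```

The state records whether the most recent unmatched access was a write or a read, which is enough to tell a full buffer from an empty one when the two addresses are equal.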

Multiple Clock Domains
- Need to resynchronize data that traverses clock domains
    - Use resynchronizing registers
- May overrun if sender's clock is faster than receiver's clock
- FIFO smooths out differences in data flow rates
    - Latch cells inside FIFO RAM written with sender's clock, read with receiver's clock

Dynamic RAM (DRAM)
- Data stored in a 1-transistor/1-capacitor cell
    - Smaller cell than SRAM, so more per chip
    - But longer access time
- Write operation
    - pull bit-line low or high (0 or 1)
    - activate word line
- Read operation
    - precharge bit-line to intermediate voltage
    - activate word line, and sense charge equalization
    - rewrite to restore charge

DRAM Refresh
- Charge on capacitor decays over time
    - Need to sense and rewrite periodically
    - Typically every cell every 64 ms
- Refresh each location
    - DRAMs organized into banks of rows
    - Refresh whole row at a time
- Can’t access while refreshing
    - Interleave refresh among accesses
    - Or burst refresh every 64 ms

Read-Only Memory (ROM)
- For constant data, or CPU programs
- Masked ROM
    - Data manufactured into the ROM
- Programmable ROM (PROM)
    - Use a PROM programmer
- Erasable PROM (EPROM)
    - UV erasable
    - Electrically erasable (EEPROM)
    - Flash RAM

Combinational ROM
- A ROM maps address input to data output
    - This is a combinational function!
    - Specify using a table
- Example: 7-segment decoder
- [Table: Address / Content, with rows for addresses 0 to 9, 10–15 and 16–31]

Example: ROM in Verilog
    module seven_seg_decoder ( output reg [7:1] seg,
                               input [3:0] bcd, input blank );

      // Assumption: the segment patterns below take seg[7:1] = {g,f,e,d,c,b,a},
      // active high; adjust them to match the actual display wiring.
      always @*
        case ({blank, bcd})
          5'b00000: seg = 7'b0111111; // 0
          5'b00001: seg = 7'b0000110; // 1
          5'b00010: seg = 7'b1011011; // 2
          5'b00011: seg = 7'b1001111; // 3
          5'b00100: seg = 7'b1100110; // 4
          5'b00101: seg = 7'b1101101; // 5
          5'b00110: seg = 7'b1111101; // 6
          5'b00111: seg = 7'b0000111; // 7
          5'b01000: seg = 7'b1111111; // 8
          5'b01001: seg = 7'b1101111; // 9
          5'b01010, 5'b01011, 5'b01100,
          5'b01101, 5'b01110, 5'b01111:
                    seg = 7'b1000000; // "-" for invalid code
          default:  seg = 7'b0000000; // blank
        endcase

    endmodule

Flash RAM
- Non-volatile, readable (relatively fast), writable (relatively slow)
- Storage partitioned into blocks
    - Erase a whole block at a time, then write/read
    - Once a location is written, can't rewrite until erased
- NOR Flash
    - Can write and read individual locations
    - Used for program storage, random-access data
- NAND Flash
    - Denser, but can only write and read block at a time
    - Used for bulk data, e.g., cameras, memory sticks

Memory Errors
- Bits in memory can be flipped
- Hard error
    - The chip is broken
    - E.g., manufacturing defect, wear (in Flash)
- Soft error
    - Stored data corrupted, but cell still works
    - E.g., from atmospheric neutrons
- Soft-error rate
    - Frequency of occurrence

Error Detection using Parity
- Add a parity bit to each location
- On write access
    - compute data parity and store with data
- On read access
    - check parity, take exception on error
- If we could tell which bit flipped
    - correct by flipping it back, then write back to memory location
    - Can’t do this with parity
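The write-side and read-side parity steps can be sketched with Verilog's reduction XOR operator. The sketch assumes even parity and an 8-bit data word; the module, parameter and port names are hypothetical.

```verilog
// Parity generation and checking sketch (even parity; names are assumptions).
module parity_check #(parameter WIDTH = 8) (
  input  [WIDTH-1:0] wr_data,    // data being written
  output             wr_parity,  // parity bit to store alongside wr_data
  input  [WIDTH-1:0] rd_data,    // data read back
  input              rd_parity,  // parity bit read back with rd_data
  output             error       // 1 if an odd number of bits have flipped
);

  assign wr_parity = ^wr_data;               // XOR of all data bits
  assign error     = (^rd_data) ^ rd_parity; // recomputed parity vs stored parity

endmodule
```

The error output reports only that something flipped, not which bit, so the word cannot be corrected from parity alone.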

Error-Correcting Codes (ECC)
- Allow identification of the flipped bit
- Hamming codes
    - E.g., for single-bit-error correction of N-bit word, need log2 N + 1 extra bits
- Example: 8-bit word, d1...d8
    - 12-bit ECC code, e1...e12
    - e1, e2, e4, e8 are check bits, the rest data

Hamming Code Example
- e1 = e3 ⊕ e5 ⊕ e7 ⊕ e9 ⊕ e11
- e2 = e3 ⊕ e6 ⊕ e7 ⊕ e10 ⊕ e11
- e4 = e5 ⊕ e6 ⊕ e7 ⊕ e12
- e8 = e9 ⊕ e10 ⊕ e11 ⊕ e12
- [Figure: code bits e1 to e12 and the check bits covering them]
- Every data bit covered by two or more check bits
- On write: compute check bits and store with data

Hamming Code Example
- On read:
    - Recompute check bits and XOR with read check bits
        - Result called the syndrome
    - 0000 ⇒ no error
    - If data bit flipped
        - Covering bits of syndrome are 1 = binary code of flipped ECC bit
    - If stored check bit flipped
        - That bit of syndrome is 1
    - On error, unflip bit and rewrite memory location
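The check-bit equations and the syndrome rule can be put together in a short Verilog sketch for the 12-bit code. The module names are hypothetical, and the order in which d1...d8 are placed into the data positions e3, e5, e6, e7, e9, e10, e11, e12 is an assumption, since the slides only say that the non-check positions hold data.

```verilog
// Hamming (12,8) encoder sketch. Data placement order is an assumption.
module hamming_encode (
  output [12:1] e,
  input  [8:1]  d
);
  // Assumed placement: d1..d8 -> e3, e5, e6, e7, e9, e10, e11, e12
  assign {e[3], e[5], e[6], e[7], e[9], e[10], e[11], e[12]} =
         {d[1], d[2], d[3], d[4], d[5], d[6],  d[7],  d[8]};
  // Check bits, as given by the equations above
  assign e[1] = e[3] ^ e[5] ^ e[7] ^ e[9]  ^ e[11];
  assign e[2] = e[3] ^ e[6] ^ e[7] ^ e[10] ^ e[11];
  assign e[4] = e[5] ^ e[6] ^ e[7] ^ e[12];
  assign e[8] = e[9] ^ e[10] ^ e[11] ^ e[12];
endmodule

// Syndrome computation and single-bit correction sketch.
module hamming_correct (
  output [12:1] e_fixed,   // code word with a single flipped bit restored
  output [3:0]  syndrome,  // 0000 means no error detected
  input  [12:1] e_rd       // code word as read from memory
);
  // Each syndrome bit is a recomputed check bit XORed with the stored one
  assign syndrome[0] = e_rd[1] ^ e_rd[3] ^ e_rd[5] ^ e_rd[7] ^ e_rd[9]  ^ e_rd[11];
  assign syndrome[1] = e_rd[2] ^ e_rd[3] ^ e_rd[6] ^ e_rd[7] ^ e_rd[10] ^ e_rd[11];
  assign syndrome[2] = e_rd[4] ^ e_rd[5] ^ e_rd[6] ^ e_rd[7] ^ e_rd[12];
  assign syndrome[3] = e_rd[8] ^ e_rd[9] ^ e_rd[10] ^ e_rd[11] ^ e_rd[12];
  // A syndrome of 1..12 is the position of the flipped bit; flip it back.
  // Values 13..15 do not name a code bit and are left uncorrected.
  assign e_fixed = (syndrome >= 4'd1 && syndrome <= 4'd12)
                   ? e_rd ^ (12'b1 << (syndrome - 1))
                   : e_rd;
endmodule
```

Because each check bit sits at a power-of-two position, the syndrome read as a binary number equals the index of the flipped code bit, whether that bit is data or a check bit.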

Multiple-Error Detection
- What if two bits flip?
    - Syndrome identifies wrong bit, or is invalid
- One extra check bit allows single-error correction, double-error detection

        Single-bit correction     Double-bit detection
N       Check bits   Overhead     Check bits   Overhead
8       4            50%          5            63%
16      5            31%          6            38%
32      6            19%          7            22%
64      7            11%          8            13%
128     8            6.3%         9            7.0%
256     9            3.5%         10           3.9%

Summary
- Memory: addressable storage locations
- Read and Write operations
- Asynchronous RAM
- Synchronous RAM (SSRAM)
- Dynamic RAM (DRAM)
- Read-Only Memory (ROM) and Flash
- Multiport RAM and FIFOs
- Error Detection and Correction
    - Hamming Codes

Digital Design: An Embedded Systems Approach Using Verilog
Chapter 6: Implementation Fabrics
27 January 2018
Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

Integrated Circuits
- Early digital circuits
    - Relays, vacuum tubes, discrete transistors
- Integrated circuits (ICs, or “chips”)
    - Manufacture of multiple transistors and connections on surface of silicon wafer
    - Invented in 1958: Jack Kilby at Texas Instruments (TI)
    - Rapid growth since then, and ongoing

IC Manufacture: Wafers
- Start with ingot of pure silicon
- Saw into wafers and polish
- Early wafers: 50 mm
- Now 300 mm

IC manufacture: Processing
- Chemical processing steps based on photolithography
- Ion implantation
- Etching a deposited film
    - SiO2, polysilicon, metal

IC Manufacture: Test & Packaging
- Defects cause some ICs to fail
    - Test to identify which ICs don’t work
    - Discard them when wafer is broken into chips
    - Their cost is amortized over working chips
- Yield depends (in part) on IC area
    - Constrain area to reduce final IC cost
- Working chips are packaged and tested further

Exponential Trends
- Circuit size and complexity depend on minimum feature size
    - Which depends on manufacturing process
    - Mask resolution, wavelength of light
- Process nodes (ITRS Roadmap)
    - 350nm (1995), 250nm (1998), 180nm (2000), 130nm (2002), 90nm (2004), 65nm (2007), 45nm (2010), 32nm (2013), 22nm (2016), 16nm (2019)
- Smaller feature size ⇒ denser, faster

SSI and MSI
- In 1964, TI introduced 5400/7400 family of TTL ICs
- Other manufacturers followed, making 7400 family a de facto standard
- Small-scale integrated (SSI)
    - 7400: 4 × NAND gate
    - 7427: 3 × NOR gate
    - 7474: 2 × D flip-flop
    - …
- Medium-scale integrated (MSI)
    - 7490: 4-bit counter
    - 7494: 4-bit shift register
    - …

Other Logic Families
- Variations on electrical characteristics
    - 74L… : low power
    - 74S… : Schottky diodes ⇒ fast switching
    - 74LS… : compromise between speed and power
    - 74ALS… : advanced low-power Schottky
    - 74F… : fast
- CMOS families
    - 4000 family: very low power, 3–15V
    - 74HC…, 74AHC… : TTL compatible

Large Scale Integration
- 1970s: LSI (thousands of transistors)
    - Small microprocessors became feasible
    - Custom LSI chips for high-volume applications
- SSI/MSI mainly used for glue logic
- Later additions to 74xx… families oriented toward glue logic and interfacing
    - E.g., multibit tristate drivers, registers
- Other functions supplanted by PLDs

MSI Example: Counter/Display
- 74LS390: dual decade counter
- 74LS08: glue logic
- 74LS47: 7-segment decoder

MSI Example: Counter/Display

VLSI and ASICs
- 1980s: Very Large Scale Integration
    - Then ULSI, then what?
    - VLSI now just means IC design
- Application-specific ICs (ASICs)
    - Enabled by CAD tools, foundry services
    - Often designed for a range of related products in a market segment
- Application-specific standard products (ASSPs)
    - E.g., cell phone ICs

ASIC Economics
- ASIC has lower unit cost than an FPGA
    - But more design/verification effort
    - Higher non-recurring engineering (NRE) cost
    - Amortized over production run
    - ASICs make sense for high volumes
- Full custom
    - Design each transistor and wire
    - High NRE, but best performance and least area
- Standard cell
    - Use basic components from a foundry’s library

Programmable Logic Devices (PLDs)
- PLDs can be programmed after manufacture to vary their function
    - Cf. fixed-function SSI/MSI ICs and ASICs
- Higher unit cost than ASIC
    - But lower NRE
    - Ideal for low to medium product volumes

Programmable Array Logic (PALs)
- Introduced by Monolithic Memories Inc in the 1970s
    - First widely-used PLDs
- Programmed by blowing fusible links in the circuit
    - Use a special programming instrument
- PAL16L8
    - 16 inputs, 8 active-low outputs
- PAL16R8
    - 16 inputs, 8 registered outputs

PAL16L8
- [Figure: PAL16L8 circuit, annotated with the example terms I8 · I10 and I1 · I2 + I3 · I10]

PAL16R8 Output Circuit
- Feedback path is useful for implementing FSMs

Designing with PALs
- Useful even for simple circuits
    - Single package solution lowers cost
- Describe function using Boolean equations
    - In HDL, or simple language such as ABEL
- Synthesize to fuse map file used by programming instrument
- If design doesn’t fit
    - Partition into multiple PALs or use a more complex PLD

Generic Array Logic (GALs)
- Programmable Output Logic Macrocells (OLMCs)
- Use EEPROM technology
- E.g., GAL22V10

Complex PLDs (CPLDs)
- Cramming multiple PALs into an IC
    - Programmable interconnection network
    - Use flash RAM technology to store configuration
- [Figure: CPLD structure; blocks labeled "Embedded PAL"]

FPGAs
Field Programmable Gate Arrays
  Smaller logic blocks, embedded SRAM
Thousands or millions of equivalent gates
Programmable interconnect

Logic Block Example
Xilinx FPGA Logic Blocks
  Lookup Tables (LUTs) plus flip-flops
  E.g., Spartan-II
Too complex to program LBs manually
  Let synthesis tools map HDL code to LBs and program the interconnect
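To make the "LUT plus flip-flop" idea concrete, here is a minimal behavioral sketch of a generic logic block. It is not a vendor primitive: the module name `lut4_ff`, the parameters `INIT` and `REGISTERED`, and the 4-input width are all assumptions made for illustration.

```verilog
// Sketch of a generic logic block: a 4-input lookup table
// feeding an optional flip-flop. Names are hypothetical.
module lut4_ff #(
  parameter [15:0] INIT = 16'h8000,  // truth table; 16'h8000 is a 4-input AND
  parameter        REGISTERED = 1    // 1: registered output, 0: combinational
) (
  input        clk,
  input  [3:0] a,                    // LUT inputs select one bit of the table
  output       q
);
  wire [15:0] truth   = INIT;
  wire        lut_out = truth[a];    // the lookup itself
  reg         ff;

  always @(posedge clk)
    ff <= lut_out;

  assign q = REGISTERED ? ff : lut_out;
endmodule
```

Configuring an FPGA amounts to loading values such as `INIT` and `REGISTERED` into every block and programming the interconnect between them, which is why the mapping is left to synthesis tools.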

I/O Blocks
Typically allow for registered or combinational input/output, plus tristates
Programmable logic levels, slew rate, input threshold, …

Platform FPGAs
Include embedded cores for special applications
  Processor cores
  Signal processing arithmetic cores
  Network interface cores
Embedded software can run from SRAM in the FPGA
Single-chip solution, reduces cost
Avoids high NRE of ASIC

Structured ASICs
Array of very simple logic elements
  Not programmable, no programmable interconnect
Customized by designing top metal interconnection layer(s)
  Lower NRE than full ASIC design
Performance close to full ASIC
May become popular for mid-volume applications

IC Packages
ICs are encapsulated in protective packages
  External pins for connecting to circuit board
  Bond-wires or flip-chip connections

Printed Circuit Boards (PCBs)
Layers of conducting wires (copper) between insulating material (fiberglass)
Manufactured using photolithography and etching
Wires interconnect ICs and other components
  External connections to other system components

Through-Hole PCBs
IC package pins pass through drilled holes
  Soldered to PCB wires that join the hole

Surface Mount PCB
IC package pins soldered to wires on PCB surface
Packages and PCB features are generally smaller than through-hole

Multichip Modules (MCMs)
Several ICs on a ceramic carrier
  Can also include thin-film passives and discrete components
External connections for PCB mounting
Ideal for high-density applications
  E.g., cell phones

Signal Integrity
Signals propagate over bond wires, package pins, PCB traces
  Various effects cause distortion and noise
  Signal integrity: minimizing these effects
Propagation speed in a PCB trace ≈ ½c ≈ 150 mm/ns
  If two traces differ in length
    Skew at arrival point can be significant
Careful PCB design needed
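For a sense of scale, the skew between two traces can be estimated from their length difference and the propagation speed of about 150 mm/ns quoted above. The 30 mm length difference below is a hypothetical figure chosen only to make the arithmetic concrete.

```latex
t_{\text{skew}} = \frac{\Delta L}{v}
               \approx \frac{30\ \text{mm}}{150\ \text{mm/ns}}
               = 0.2\ \text{ns}
```

At a clock period of a few nanoseconds, a skew of this size is a noticeable fraction of the cycle.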

Ground Bounce
Transient current flows when an output switches logic level
Parasitic inductance causes voltage shift on power supply & ground signals
  Spikes on other drivers
  Threshold shift on receivers

Minimizing Bounce
Bypass capacitors between ground and +V
  0.01µF – 0.1µF, close to package pins
Separate PCB planes for ground and +V
Limit output slew rate
  Trade off against propagation delay

Transmission Line Effects
Occur when rise time is comparable to path delay
Reflections interfere with transitions, resulting in under/overshoot and ringing
  Can cause false/multiple switching
Use PCB layout techniques to minimize effects

Electromagnetic Interference
Transitions cause electromagnetic fields
  Energy radiated from PCB traces
  Induces noise in other systems
  Subject to regulation
Crosstalk
  Radiation to other traces in the system
  Particularly adjacent parallel traces
PCB layout and slew-rate limiting can minimize both

Differential Signaling
Reduces susceptibility to noise
Transmit a signal (SP) and negation (SN)
At receiver, sense difference between them
  SP – SN
Noise induced on both SP and SN
  (SP + VNoise) – (SN + VNoise) = SP – SN

Summary
Exponential improvements in IC manufacturing
SSI and MSI TTL logic families
ASICs: full-custom and standard cell
PALs, CPLDs, FPGAs, platform FPGAs
IC packages for PCB assembly
  Through-hole and surface mount
Signal integrity

Digital Design — Chapter 7 — Processor Basics 27 January 2018 Digital Design: An Embedded Systems Approach Using Verilog Chapter 7 Processor Basics Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

Embedded Computers
A computer as part of a digital system
  Performs processing to implement or control the system’s function
Components
  Processor core
  Instruction and data memory
  Input, output, and input/output controllers
    For interacting with the physical world
  Accelerators
    High-performance circuit for specialized functions
  Interconnecting buses

Memory Organization
Von Neumann architecture
  Single memory for instructions and data
Harvard architecture
  Separate instruction and data memories
  Most common in embedded systems

Bus Organization
Single bus for low-cost low-performance systems
Multiple buses for higher performance

Microprocessors
Single-chip processor in a package
External connections to memory and I/O buses
Most commonly seen in general purpose computers
  E.g., Intel Pentium family, PowerPC, …

Microcontrollers
Single chip combining
  Processor
  A small amount of instruction/data memory
  I/O controllers
Microcontroller families
  Same processor, varying memory and I/O
8-bit microcontrollers
  Operate on 8-bit data
  Low cost, low performance
16-bit and 32-bit microcontrollers
  Higher performance

Processor Cores
Processor as a component in an FPGA or ASIC
In FPGA, can be a fixed-function block
  E.g., PowerPC cores in some Xilinx FPGAs
Or can be a soft core
  Implemented using programmable resources
  E.g., Xilinx MicroBlaze, Altera Nios-II
In ASIC, provided as an IP block
  E.g., ARM, PowerPC, MIPS, Tensilica cores
  Can be customized for an application

Digital Signal Processors
DSPs are processors optimized for signal processing operations
  E.g., audio, video, sensor data; wireless communication
Often combined with a conventional core for processing other data
  Heterogeneous multiprocessor

Instruction Sets
A processor executes a program
  A sequence of instructions, each performing a small step of a computation
Instruction set: the repertoire of available instructions
  Different processor types have different instruction sets
High-level languages: more abstract
  E.g., C, C++, Ada, Java
  Translated to processor instructions by a compiler

Instruction Execution
Instructions are encoded in binary
  Stored in the instruction memory
A processor executes a program by repeatedly
  Fetching the next instruction
  Decoding it to work out what to do
  Executing the operation
Program counter (PC)
  Register in the processor holding the address of the next instruction

Data and Endian-ness
Instructions operate on data from the data memory
Byte: 8-bit data
  Data memory is usually byte addressed
16-bit, 32-bit, 64-bit words of data

The Gumnut Core
A small 8-bit soft core
  Can be used in FPGA designs
Instruction set illustrates features typical of 8-bit cores and processors in general
Programs written in assembly language
  Each processor instruction written explicitly
  Translated to binary representation by an assembler
Resources available on the companion web site

Gumnut Storage

Arithmetic Instructions
Operate on register data and put result in a register
  add, addc, sub, subc
  Can have immediate value operand
Condition codes
  Z: 1 if result is zero, 0 if result is non-zero
  C: carry out of add/addc, borrow out of sub/subc
  addc and subc include C bit in operation

Arithmetic Instructions
Examples
  add r3, r4, r1
  add r5, r1, 2
  sub r4, r4, 1
Evaluate 2x + 1; x in r3, result in r4
  add r4, r3, r3 ; double x
  add r4, r4, 1  ; then add 1

Logical Instructions
Operate on register data and put result in a register
  and, or, xor, mask (and not)
  Operate bitwise on 8-bit operands
  Can have immediate value operand
Condition codes
  Z: 1 if result is zero, 0 if result is non-zero
  C: always 0

Logical Instructions
Examples
  and r3, r4, r5
  or r1, r1, 0x80  ; set r1(7)
  xor r5, r5, 0xFF ; invert r5
Set Z if least-significant 4 bits of r2 are 0101
  and r1, r2, 0x0F ; clear high bits
  sub r0, r1, 0x05 ; compare with 0101

Shift Instructions
Logical shift/rotate register data and put result in a register
  shl, shr, rol, ror
  Count specified as a literal operand
Condition codes
  Z: 1 if result is zero, 0 if result is non-zero
  C: the value of the last bit shifted/rotated past the end of the byte

Shift Instructions
Examples
  shl r4, r1, 3
  ror r2, r2, 4
Multiply r4 by 8, ignoring overflow
  shl r4, r4, 3
Multiply r4 by 10, ignoring overflow
  shl r1, r4, 1 ; multiply by 2
  shl r4, r4, 3 ; multiply by 8
  add r4, r4, r1

Memory Instructions
Transfer data between registers and data memory
Compute address by adding an offset to a base register value
Load register from memory
  ldm r1, (r2)+5
Store from register to memory
  stm r1, (r4)-2
Use r0 if base address is 0
  ldm r3, 23 is equivalent to ldm r3, (r0)+23
Condition codes not affected

Memory Instructions
Increment a 16-bit integer in memory
  Little-endian: address of lsb in r2, msb in next location
  ldm r1, (r2)   ; increment lsb
  add r1, r1, 1
  stm r1, (r2)
  ldm r1, (r2)+1 ; increment msb
  addc r1, r1, 0 ; with carry
  stm r1, (r2)+1

Input/Output Instructions
I/O controllers have registers that govern their operation
  Each has an address, like data memory
Gumnut has separate data and I/O address spaces
Input from I/O register
  inp r3, 157 is equivalent to inp r3, (r0)+157
Output to I/O register
  out r3, (r7) is equivalent to out r3, (r7)+0
Condition codes not affected
Further examples in Chapter 8

Branch Instructions
Programs can evaluate conditions and take alternate courses of action
  Condition codes (Z, C) represent outcomes of arithmetic/logical/shift instructions
Branch instructions examine Z or C
  bz, bnz, bc, bnc
  Add a displacement to PC if condition is true
  Specifies how many instructions forward or backward to skip
  Counting from instruction after branch

Branch Example
Elapsed seconds in location 100
  Increment, wrapping to 0 after 59
  ldm r1, 100
  add r1, r1, 1
  sub r0, r1, 60 ; Z set if r1 = 60
  bnz +1         ; skip to store if Z is 0
  add r1, r0, 0
  stm r1, 100

Jump Instruction
Unconditionally skips forward or backward to specified address
  Changes the PC to the address
Example: if r1 = 0, clear data location 100 to 0; otherwise clear location 200 to 0
  Assume instructions start at address 10
  10: sub r0, r1, 0
  11: bnz +2
  12: stm r0, 100
  13: jmp 15
  14: stm r0, 200
  15: ...

Subroutines
A sequence of instructions that perform some operation
Can call them from different parts of a program using a jsb instruction
Subroutine returns with a ret instruction

Subroutine Example
Subroutine to increment second count
  Address of count in r2
  ldm r1, (r2)
  add r1, r1, 1
  sub r0, r1, 60
  bnz +1
  add r1, r0, 0
  stm r1, (r2)
  ret
Call to increment locations 100 and 102
  add r2, r0, 100
  jsb 20
  add r2, r0, 102
  jsb 20

Return Address Stack
The jsb saves the return address for use by the ret
  But what if the subroutine includes a jsb?
Gumnut core includes an 8-entry push-down stack of return addresses

Miscellaneous Instructions
Instructions supporting interrupts
  See Chapter 8
  reti  Return from interrupt
  enai  Enable interrupts
  disi  Disable interrupts
  wait  Wait for an interrupt
  stby  Stand by in low power mode until an interrupt occurs

The Gumnut Assembler
Gasm: translates assembly programs
  Generates memory images for program text (binary-coded instructions) and data
  See documentation on web site
Write a program as a text file
  Instructions
  Directives
  Comments
  Use symbolic labels

Example Program
; Program to determine greater of value_1 and value_2
                 text
                 org 0x000       ; start here on reset
                 jmp main
; Data memory layout
                 data
value_1:         byte 10
value_2:         byte 20
result:          bss 1
; Main program
                 text
                 org 0x010
main:            ldm r1, value_1 ; load values
                 ldm r2, value_2
                 sub r0, r1, r2  ; compare values
                 bc value_2_greater
                 stm r1, result  ; value_1 is greater
                 jmp finish
value_2_greater: stm r2, result  ; value_2 is greater
finish:          jmp finish      ; idle loop

Gumnut Instruction Encoding
Instructions are a form of information
  Can be encoded in binary
Gumnut encoding
  18 bits per instruction
  Divided into fields representing different aspects of the instruction
    Opcodes and function codes
    Register numbers
    Addresses


Encoding Examples
Encoding for addc r3, r5, 24
  Arithmetic immediate, fn = 001
  05D18
Instruction encoded by 3ECFC
  bnc -4
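The first encoding example can be checked with a small sketch that packs the fields of an arithmetic-immediate instruction into 18 bits. The field layout used here (a leading 0, then fn, rd, rs and the 8-bit immediate) is an assumption, since the format figure is not reproduced in this transcript; it is consistent with the value 05D18 quoted for addc r3, r5, 24. The module name `encode_arith_immed` is hypothetical.

```verilog
// Sketch: pack an arithmetic-immediate instruction into 18 bits.
// Assumed layout: {0, fn[2:0], rd[2:0], rs[2:0], immed[7:0]}.
module encode_arith_immed (
  input  [2:0]  fn,      // function code, e.g. 3'b001 for addc
  input  [2:0]  rd,      // destination register number
  input  [2:0]  rs,      // source register number
  input  [7:0]  immed,   // immediate operand
  output [17:0] instr
);
  assign instr = {1'b0, fn, rd, rs, immed};
endmodule
```

With fn = 001, rd = 3, rs = 5 and immed = 24, the bit string is 0 001 011 101 00011000, which regroups into hexadecimal digits as 05D18.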

Other Instruction Sets
8-bit cores and microcontrollers
  Xilinx PicoBlaze: like Gumnut
  8051, and numerous like it
    Originated as 8-bit microprocessors
    Instructions encoded as one or more bytes
    Instruction set is more complex and irregular
    Complex instruction set computer (CISC)
    C.f. Reduced instruction set computer (RISC)
16-, 32- and 64-bit cores
  Mostly RISC
  E.g., PowerPC, ARM, MIPS, Tensilica, …

Instruction and Data Memory
In embedded systems
  Instruction memory is usually ROM, flash, SRAM, or combination
  Data memory is usually SRAM
    DRAM if large capacity needed
Processor/memory interfacing
  Gluing the signals together

Example: Gumnut Memory

  always @(posedge clk) // Instruction memory
    if (inst_cyc_o && inst_stb_o) begin
      inst_dat_i <= inst_ROM[inst_adr_o[10:0]];
      inst_ack_i <= 1'b1;
    end
    else
      inst_ack_i <= 1'b0;

  always @(posedge clk) // Data memory
    if (data_cyc_o && data_stb_o)
      if (data_we_o) begin
        // write: store the data and echo it on the read path
        data_RAM[data_adr_o] <= data_dat_o;
        data_dat_i <= data_dat_o;
        data_ack_i <= 1'b1;
      end
      else begin
        // read
        data_dat_i <= data_RAM[data_adr_o];
        data_ack_i <= 1'b1;
      end
    else
      data_ack_i <= 1'b0;

Example: Microcontroller Memory

32-bit Memory
Four bytes per memory word
  Little-endian: lsb at least address
  Big-endian: msb at least address
Partial-word read
  Read all bytes, processor selects those needed
Partial-word write
  Use byte-enable signals
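The byte-enable idea can be sketched as a small synchronous memory model. The module name `mem32_byte_en`, its port names and the default depth are assumptions made for illustration; they are not the interface of any particular processor.

```verilog
// Sketch of a 32-bit-wide memory with per-byte write enables.
// All names and the depth are hypothetical.
module mem32_byte_en #(
  parameter ADDR_BITS = 8                 // 2**ADDR_BITS words of 32 bits
) (
  input                      clk,
  input                      we,          // write strobe
  input      [3:0]           byte_en,     // byte_en[i] enables bits 8*i+7 : 8*i
  input      [ADDR_BITS-1:0] word_adr,
  input      [31:0]          dat_w,
  output reg [31:0]          dat_r
);
  reg [31:0] mem [0:(1<<ADDR_BITS)-1];

  always @(posedge clk) begin
    if (we) begin
      // Partial-word write: update only the enabled bytes
      if (byte_en[0]) mem[word_adr][ 7: 0] <= dat_w[ 7: 0];
      if (byte_en[1]) mem[word_adr][15: 8] <= dat_w[15: 8];
      if (byte_en[2]) mem[word_adr][23:16] <= dat_w[23:16];
      if (byte_en[3]) mem[word_adr][31:24] <= dat_w[31:24];
    end
    // Read returns the whole word; the processor selects the bytes it needs
    dat_r <= mem[word_adr];
  end
endmodule
```

A byte store then asserts a single bit of `byte_en`, a halfword store asserts two adjacent bits, and a word store asserts all four.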

Example: MicroBlaze Memory

Cache Memory
For high-performance processors
  Memory access time is several clock cycles
  Performance bottleneck
Cache memory
  Small fast memory attached to a processor
  Stores most frequently accessed items, plus adjacent items
  Locality: those items are most likely to be accessed again soon

Cache Memory
Memory contents divided into fixed-sized blocks (lines)
  Cache copies whole lines from memory
When processor accesses an item
  If item is in cache: hit (fast access)
    Occurs most of the time
  If item is not in cache: miss
    Line containing item is copied from memory
    Slower, but less frequent
    May need to replace a line already in cache

Fast Main Memory Access
Optimize memory for line access by cache
Wide memory
  Read a line in one access
Burst transfers
  Send starting address, then read successive locations
Pipelining
  Overlapping stages of memory access
  E.g., address transfer, memory operation, data transfer
Double data rate (DDR), Quad data rate (QDR)
  Transfer on both rising and falling clock edges

Summary
Embedded computer
  Processor, memory, I/O controllers, buses
Microprocessors, microcontrollers, and processor cores
  Soft-core processors for ASIC/FPGA
Processor instruction sets
  Binary encoding for instructions
  Assembly language programs
Memory interfacing

Digital Design — Chapter 8 — I/O Interfacing 27 January 2018 Digital Design: An Embedded Systems Approach Using Verilog Chapter 8 I/O Interfacing Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

I/O Devices and Transducers
Transducers convert between real-world effects and digital representation
Input transducers: sensors
  May require analog-to-digital converter (ADC)
Output transducers: actuators
  May require digital-to-analog converter (DAC)
Human-interface devices
  Buttons, switches, knobs, keypads, mouse
  Indicators, displays, speakers

Keypads & Keyboards
Recall switches and debouncing
Keypad: array of push-button switches

Knobs & Position Encoders
In analog circuits, use a variable resistor
In digital circuits, could use pushbuttons
  E.g., volume up/down
  Not as easy to use as knobs or sliders
Can use a position encoder attached to a knob
  Recall Gray code encoder

Incremental Encoder
If absolute position is not important, incremental encoder is simpler
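One common way to read an incremental encoder is to treat its two outputs as quadrature signals and count up or down on each rising edge of one of them, using the other to give the direction. The sketch below assumes that arrangement; the module name `incr_encoder_counter`, the signal names `a` and `b`, the 8-bit count width, and the direction convention (count up when `b` is 0) are all assumptions.

```verilog
// Sketch: up/down counter driven by a two-phase incremental encoder.
// Names, width and direction convention are hypothetical.
module incr_encoder_counter (
  input            clk, reset,
  input            a, b,          // encoder outputs (asynchronous)
  output reg [7:0] position       // relative position, wraps around
);
  reg [2:0] a_sync;   // a_sync[1] = current, a_sync[2] = previous sample
  reg [1:0] b_sync;   // b_sync[1] is aligned in time with a_sync[1]

  always @(posedge clk)
    if (reset) begin
      a_sync   <= 3'b000;
      b_sync   <= 2'b00;
      position <= 8'd0;
    end
    else begin
      // Synchronize the asynchronous inputs to the clock
      a_sync <= {a_sync[1:0], a};
      b_sync <= {b_sync[0], b};
      // On a rising edge of a, b indicates the direction of rotation
      if (a_sync[1] && !a_sync[2])
        position <= b_sync[1] ? position - 1 : position + 1;
    end
endmodule
```

Because only changes are counted, software reading `position` sees movement relative to wherever the knob was at reset, which is sufficient for controls such as a volume knob.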

Analog Inputs
Physical effect produces an analog voltage or current
Microphone
  In phones, cameras, voice recorders, …
Accelerometer
  In airbag controllers
Fluid-flow sensors
  In industrial machines, coffee machines, …
Gas detectors
  In safety equipment

Analog-to-Digital Converters
Basic element: analog comparator
Flash ADC
  Simple, fast, but uses many comparators
Resolution
  Number of output bits

Successive Approximation ADC
Initial approximation: 01111111
  Comparator output gives d7
  1 if Vin is higher than the approximation, 0 otherwise
Next approximation: d7 0111111
  Comparator output gives d6
Next approximation: d7 d6 011111, etc.
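The bit-by-bit search can be modelled behaviorally as follows. This is a simulation sketch only: the analog input is represented by a hypothetical 8-bit code `vin_code`, and the comparator and DAC are modelled as an integer comparison against the trial code. The trial pattern used (decided bits, then a 0, then all 1s) is an assumption about the scheme; the module name `sar_model` is also hypothetical.

```verilog
// Behavioral sketch of 8-bit successive approximation.
// vin_code stands in for the analog input; comparing it with the
// trial code models the DAC plus comparator.
module sar_model (
  input      [7:0] vin_code,
  output reg [7:0] d
);
  integer   i;
  reg [7:0] approx;

  always @* begin
    d = 8'b0;
    for (i = 7; i >= 0; i = i - 1) begin
      // Trial code: bits decided so far, a 0 in position i, 1s below it
      approx = d | ((8'd1 << i) - 8'd1);
      // Comparator output: 1 if the input is higher than the approximation
      d[i] = (vin_code > approx);
    end
  end
endmodule
```

Each loop iteration corresponds to one comparison step of a real converter, so an n-bit conversion takes n steps rather than the 2^n − 1 comparators of a flash ADC.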

LED Indicators
Single LED shows 1-bit state
  On/off, busy/ready, error/ok, …
Brightness depends on current
  Determined by resistor
  I = (+V – VLED – VOL) / R
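Rearranging the current formula gives the resistor value for a chosen LED current. The numbers below (a 3.3 V supply, a 2.0 V LED forward voltage, a 0.3 V output-low voltage and a 2 mA target current) are hypothetical values picked only to show the arithmetic.

```latex
R = \frac{+V - V_{LED} - V_{OL}}{I}
  = \frac{3.3\ \text{V} - 2.0\ \text{V} - 0.3\ \text{V}}{2\ \text{mA}}
  = 500\ \Omega
```

In practice the values for a given design come from the LED and driver data sheets.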

7-Segment LED Displays
Each digit has common anodes or common cathodes
  Scan: turn on one digit at a time

Example: Multiplexed Display
Four BCD inputs, 10MHz clock
Turn on decimal point of leftmost digit only
50Hz scan cycle (200Hz scan clock)

module display_mux ( output reg [3:0] anode_n,
                     output     [7:0] segment_n,
                     input      [3:0] bcd0, bcd1, bcd2, bcd3,
                     input            clk, reset );

  parameter clk_freq      = 10000000;  // 10MHz master clock
  parameter scan_clk_freq = 200;
  parameter clk_divisor   = clk_freq / scan_clk_freq;

  reg       scan_clk;
  reg [1:0] digit_sel;
  reg [3:0] bcd;
  reg [7:0] segment;
  integer   count;

  // Divide master clock to get scan clock
  always @(posedge clk)
    if (reset) begin
      count = 0;
      scan_clk <= 1'b0;
    end
    else if (count == clk_divisor - 1) begin
      count = 0;
      scan_clk <= 1'b1;
    end
    else begin
      count = count + 1;
      scan_clk <= 1'b0;
    end

  // increment digit counter once per scan clock cycle
  always @(posedge clk)
    if (reset)         digit_sel <= 2'b00;
    else if (scan_clk) digit_sel <= digit_sel + 1;

  // multiplexer to select a BCD digit
  always @*
    case (digit_sel)
      2'b00: bcd = bcd0;
      2'b01: bcd = bcd1;
      2'b10: bcd = bcd2;
      2'b11: bcd = bcd3;
    endcase

  // activate selected digit's anode
  always @*
    case (digit_sel)
      2'b00: anode_n = 4'b1110;
      2'b01: anode_n = 4'b1101;
      2'b10: anode_n = 4'b1011;
      2'b11: anode_n = 4'b0111;
    endcase

  // 7-segment decoder for selected digit
  // (bit patterns assume segment[0] = a ... segment[6] = g, 1 = segment lit)
  always @*
    case (bcd)
      4'b0000: segment[6:0] = 7'b0111111; // 0
      4'b0001: segment[6:0] = 7'b0000110; // 1
      4'b0010: segment[6:0] = 7'b1011011; // 2
      4'b0011: segment[6:0] = 7'b1001111; // 3
      4'b0100: segment[6:0] = 7'b1100110; // 4
      4'b0101: segment[6:0] = 7'b1101101; // 5
      4'b0110: segment[6:0] = 7'b1111101; // 6
      4'b0111: segment[6:0] = 7'b0000111; // 7
      4'b1000: segment[6:0] = 7'b1111111; // 8
      4'b1001: segment[6:0] = 7'b1101111; // 9
      default: segment[6:0] = 7'b1000000; // "-"
    endcase

  // decimal point is only active for digit 3
  always @*
    segment[7] = digit_sel == 2'b11;

  // segment outputs are negative logic
  assign segment_n = ~segment;

endmodule

Liquid Crystal Displays (LCDs)
Advantages
  Low power
  Readable in bright ambient light conditions
  Custom segment shapes
Disadvantages
  Require backlight for dark conditions
  Not as robust as LEDs
LCD panels
  Rectangular array of pixels
  Can be used for alphanumeric/graphical display
  Controlled by a small microcontroller

Actuators & Valves
Actuators cause a mechanical effect
Solenoid: current in coil moves armature
  Can attach rods, levers, etc. to translate the movement
Solenoid valve
  Armature controls fluid or gas valve
Relay
  Armature controls electrical contacts

Motors
Can provide angular position or speed
  Use gears, screws, etc. to convert to linear position or speed
Stepper motors
  Rotate in discrete steps

Motors
Servo-motors
  DC motor, speed controlled by varying the drive voltage
  Use feedback to control the speed or to drive to a desired position
  Requires a position sensor or tachometer
Servo-controller
  A digital circuit or an embedded processor
  Compensates for non-ideal mechanical effects

Digital-to-Analog Converters
R-string DAC
  Voltage divider and analog multiplexer
  Requires 2^n precision resistors

Digital-to-Analog Converters
R-2R ladder DAC
  Sums binary-weighted currents
  Requires 2n matched resistors

I/O Controllers
An embedded processor needs to access input/output data
I/O controller
  Circuit that connects I/O device to a processor
  Includes control circuits
  Input registers: for reading data
  Output registers: for writing data
I/O ports
  Registers accessible to embedded software

Simple I/O Controller
Just contains input and/or output registers
  Select among them using a port address

module gumnut ( input        clk_i,
                input        rst_i,
                output       port_cyc_o,
                output       port_stb_o,
                output       port_we_o,
                input        port_ack_i,
                output [7:0] port_adr_o,
                output [7:0] port_dat_o,
                input  [7:0] port_dat_i );
  // only the I/O port signals are listed; other ports and the body are not shown
endmodule

Example: Keypad Controller
Output register for row drivers
Input register for column sensing

Example: Keypad Controller
module keypad_controller ( input            clk_i,
                           input            cyc_i,
                           input            stb_i,
                           input            we_i,
                           output           ack_o,
                           input      [7:0] dat_i,
                           output reg [7:0] dat_o,
                           output reg [3:0] keypad_row,
                           input      [2:0] keypad_col );

  reg [2:0] col_synch;

  always @(posedge clk_i) // Row register
    if (cyc_i && stb_i && we_i)
      keypad_row <= dat_i[3:0];

  always @(posedge clk_i) begin // Column synchronizer
    dat_o     <= {5'b0, col_synch};
    col_synch <= keypad_col;
  end

  assign ack_o = cyc_i && stb_i;

endmodule

Control/Status Registers
Control register
  Contains bits that govern operation of the I/O device
  Written by processor
Status register
  Contains bits that reflect status of device
  Read by processor
Either or both may be needed in an input or output controller

Example: ADC Controller
Successive approximation ADC
  1 × analog input with sample/hold
  4 × analog reference voltages
Control register
  Selects reference voltage
  Hold input voltage & start ADC
Status register
  Is conversion done?
Input data register
  Converted data


Autonomous I/O Controllers
Independently sequence operation of a device
  Processor initiates actions
  Controller notifies processor of events, such as data availability, error condition, …
Processor can perform other operations concurrently
Device operation not limited by processor performance or load

Example: LCD Module
Rectangular array of pixels
  Row and column connections
  Controller scans rows, activates columns
Image or character data stored in a small memory in the controller
  Updated by an attached processor

Direct Memory Access (DMA)
For high-speed input or output
  Processor writes starting address to a control register
  Controller transfers data to/from memory autonomously
  Notifies processor on completion/error
Reduces load on processor
Common with accelerators

Parallel Buses
Interconnect components in a system
  Transfer bits of data in parallel
Conceptual structure
  All inputs and outputs connected
In practice
  Can’t tie multiple outputs together

Multiplexed Buses
Use multiplexer(s) to select among data sources
  Can partition to aid placement on chip

Example: Wishbone Bus
Non-proprietary bus spec
  OpenCores Organization
Gumnut uses simple form of Wishbone
  One bus for each of instruction memory, data memory, and I/O ports
  “…_o” denotes output
  “…_i” denotes input


Tristate Buses
Use tristate drivers for data sources
  Can “turn-off” (Hi-Z) when not supplying data
Simplified bus wiring

Tristate Bus Issues
Floating bus can cause spurious switching
  Use pull-up resistors or weak keepers
Need to avoid driver contention
  Dead cycle between turn-off and turn-on
  Or delayed enable
Not all CAD tools and implementation fabrics support tristate buses

Tristate Drivers in Verilog
Assign Z to an output to turn driver off
Example: single-bit driver
  assign d_out = d_en ? d_in : 1'bZ;
Example: multi-bit driver
  assign bus_o = dat_en ? dat : 8'bZ;
Any other driver contributing 0 or 1 overrides Z value

Example: SN74x16541
  module sn74x16541 ( output tri [7:0] y1, y2,   // tri: same as wire, but indicates tristate driver
                      input [7:0] a1, a2,
                      input en1_1, en1_2, en2_1, en2_2 );
    assign y1 = (~en1_1 & ~en1_2) ? a1 : 8'bz;
    assign y2 = (~en2_1 & ~en2_2) ? a2 : 8'bz;
  endmodule

Unknown Values in Verilog
What if two drivers are turned on?
  One driving 0, the other driving 1
  Resolved value is X (unknown)
Can test for X during simulation
  Use === and !== operators
  Cf. == and !=, which are logical equivalence and inequivalence tests
Z and X are not electrical logic levels
  Notations for simulation and synthesis
  Real logic levels are only 0 or 1
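The resolution rule above can be sketched outside a simulator. The following Python model is only an illustration, not part of the original slides: the function name `resolve` and the string encoding of values are assumptions, and drive strengths are ignored.

```python
def resolve(drivers):
    """Resolve the values driven onto one tristate net.

    Each driver contributes '0', '1', 'z' (turned off) or 'x' (unknown).
    Simplified model: drive strengths are ignored.
    """
    active = [d for d in drivers if d != "z"]
    if not active:
        return "z"                      # nobody drives: the bus floats
    if "x" in active:
        return "x"                      # an unknown driver makes the net unknown
    if all(d == active[0] for d in active):
        return active[0]                # all enabled drivers agree
    return "x"                          # contention: 0 fights 1


print(resolve(["z", "1", "z"]))   # one enabled driver wins over Z
print(resolve(["0", "1"]))        # two enabled drivers disagree
```

In a Verilog testbench the second case is what the === operator is for: comparing the bus against X with == would itself yield X rather than true.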

Open-Drain Buses
Bus is 0 if any driver pulls it low
If all drivers are off, bus is pulled high
Wired-AND
Can also use open-collector drivers

Open-Drain Drivers in Verilog
Assign 0 or 1 to model driver
Model pull-up on open-drain bus using wand net
  wand bus_sig;
Resolved value is logical AND of driver values
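As a quick check of the wired-AND behavior, here is a small Python sketch (an illustration only; `wand_resolve` is a hypothetical name). It assumes each driver contributes 0 (pulling low) or 1 (released), and that the pull-up holds the bus high when no driver pulls low.

```python
def wand_resolve(drivers):
    """Resolved level of an open-drain bus with a pull-up resistor.

    drivers: iterable of 0 (driver pulls the bus low) or 1 (driver released).
    The bus is low if any driver pulls low, otherwise the pull-up wins.
    """
    return 1 if all(d == 1 for d in drivers) else 0


print(wand_resolve([1, 1, 1]))  # all released
print(wand_resolve([1, 0, 1]))  # one driver pulls low
```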

Bus Protocols
Specification of signals, timing, and sequencing of bus operations
  Allows independent design of components
  Ensures interoperability
Standard bus protocols
  PCI, VXI, …: for connecting boards in a system
  AMBA (ARM), CoreConnect (IBM), Wishbone (OpenCores): for connecting blocks within a chip

Example: Gumnut Wishbone
Minimal 8-bit subset used for I/O ports
Signals
  port_cyc_o: “cycle” control for sequence of port operations
  port_stb_o: “strobe” control for an operation
  port_we_o: write enable
  port_ack_i: acknowledge from addressed port
  port_adr_o: 8-bit port address
  port_dat_o: 8-bit data output from Gumnut
  port_dat_i: 8-bit data input to Gumnut

Gumnut Wishbone Write
Timing diagrams: no wait cycles; one wait cycle

Gumnut Wishbone Read
Timing diagrams: no wait cycles; one wait cycle

Serial Transmission
Bits transmitted one after another on a single wire
  Can afford to optimize the wire for speed
Cf. parallel transmission, one wire per bit
  Requires more wires: cost per wire, greater area for wiring, complexity of place & route
  Requires more pins: cost of larger package
  Other effects: crosstalk, skew, delay due to increased area
Serializer/deserializer (serdes)
  Converts between parallel and serial form

Example: 64-bit Serdes
Bit order is arbitrary, provided both ends agree
  Often specified by standards

NRZ Transmission
Non-Return to Zero
  Just set signal to high or low for each bit time
  No indication of boundary between bit times
Need to synchronize transmitter and receiver separately
  E.g., by a common clock and control signals, as in previous example

Start/Stop Bit Synchronization
Hold signal high when there is no data
To transmit
  Drive signal low for one bit time (start bit)
  Then drive successive data bits
  Then drive signal high for one bit time (stop bit)
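The framing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' design: 8 data bits sent least-significant bit first (the slides note that bit order is arbitrary as long as both ends agree), one sample per bit time, and the hypothetical helper names `frame_byte` and `deframe`.

```python
def frame_byte(value):
    """Return the line levels for one byte: start bit, 8 data bits, stop bit."""
    data = [(value >> i) & 1 for i in range(8)]   # LSB first (assumed)
    return [0] + data + [1]                       # start = low, stop = high


def deframe(samples):
    """Recover one byte from line samples (one per bit time, idle = 1)."""
    start = samples.index(0)                      # first low level is the start bit
    frame = samples[start:start + 10]
    if len(frame) < 10 or frame[9] != 1:
        raise ValueError("framing error: stop bit not found")
    return sum(bit << i for i, bit in enumerate(frame[1:9]))


line = [1, 1] + frame_byte(0x41) + [1]            # idle, frame, idle
print(hex(deframe(line)))
```

A real receiver samples several times per bit time so that it can tolerate small differences between the transmit and receive clocks; that oversampling is left out here.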

UARTs
Universal Asynchronous Receiver/Transmitter
Common I/O controller for serial transmission using NRZ with start/stop bits
Relies on Tx and Rx clocks being approximately the same frequency

Manchester Encoding
Combine Tx clock with Tx data
  Ensures regular edges in the serial signal
Example: Manchester encoding
  Transition in the middle of each bit time
    0: low-to-high transition
    1: high-to-low transition
  May need a transition at the start of a bit time
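A minimal Python sketch of this encoding, using the convention stated above (0 is a low-to-high mid-bit transition, 1 is high-to-low). Each data bit becomes two half-bit levels; the function names are hypothetical and chosen for illustration.

```python
def manchester_encode(bits):
    """Encode data bits as half-bit line levels (two levels per bit)."""
    levels = []
    for b in bits:
        # 1 -> high then low; 0 -> low then high
        levels += [1, 0] if b else [0, 1]
    return levels


def manchester_decode(levels):
    """Decode half-bit levels back to data bits, checking for mid-bit edges."""
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(1 if first == 1 else 0)
    return bits


encoded = manchester_encode([1, 0, 0])
print(encoded)
print(manchester_decode(encoded))
```

Two equal bits in a row (the two 0s here) end one bit high and start the next low, which is the extra transition at the start of a bit time that the slide mentions.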

Clock Recovery
Transmitter sends preamble before data
  A sequence of encoded 1 bits
  Serial signal then matches Tx clock
Receiver uses a phase-locked loop (PLL) to match Rx clock to Tx clock

Serial Interface Standards
Connection of I/O devices to computers
Connection of computers in networks
Use of standards reduces design effort
  Reuse off-the-shelf components or IP
RS-232: NRZ, start/stop bits
  Originally for modems, now widely used for low-bandwidth I/O

Serial Interface Standards
I2C: Inter-Integrated Circuit bus
  2 wires (NRZ data, clock), open drain
  Simple protocol, low cost, 10kb/s–3.4Mb/s
USB: Universal Serial Bus
  For connecting I/O devices to computers
  Differential signaling on 2 wires
  1.5Mb/s, 12Mb/s, 480Mb/s, …, complex protocol
  IP blocks available
FireWire: IEEE Std 1394
  2 differential pairs (data, synch)
  400Mb/s, 3.2Gb/s, complex protocol

I2C Example: Temperature Sensor
Gumnut, Analog Devices AD7414
I2C controller IP from OpenCores repository

I/O Software
Use input and output instructions to access I/O controller registers
I/O devices interact with the physical world
  Software must respond to events when they occur
  It must be able to schedule activity at specific times or at regular intervals
Real-time behavior

Polling
Software repeatedly reads I/O status to see if an event has occurred
  If so, it performs the required action
Multiple controllers
  Software executes a polling loop, checking controllers in turn
Advantage: simple I/O controllers
Disadvantages
  Processor is continually busy, consuming power
  Delay in dealing with an event if processor is busy with another event

Polling Example
Safety monitor in factory automation
Gumnut core
16 alarm inputs
  One per bit in registers at addresses 16 & 17
  0 ⇒ ok, 1 ⇒ abnormal condition
Temp sensor ADC at address 20
  8-bit binary code for °C
  Above 50°C is abnormal
Alarm output at address 40
  0 ⇒ ok, 1 ⇒ ring alarm bell

Polling Example
  alarm_in_1: equ 16    ; address of alarm_in_1 input register
  alarm_in_2: equ 17    ; address of alarm_in_2 input register
  temp_in:    equ 20    ; address of temp_in input register
  alarm_out:  equ 40    ; address of alarm_out output register
  max_temp:   equ 50    ; maximum permissible temperature

  poll_loop:  inp r1, alarm_in_1
              sub r0, r1, 0
              bnz set_alarm          ; one or more alarm_in_1 bits set
              inp r1, alarm_in_2
              sub r0, r1, 0
              bnz set_alarm          ; one or more alarm_in_2 bits set
              inp r1, temp_in
              sub r0, r1, max_temp
              bnc set_alarm          ; temp_in > max_temp
              out r0, alarm_out      ; clear alarm_out
              jmp poll_loop
  set_alarm:  add r1, r0, 1
              out r1, alarm_out      ; set alarm_out bit 0 to 1
              jmp poll_loop

Interrupts
I/O controller notifies processor when an event occurs
Processor interrupts what it was doing
  Executes interrupt service routine (a.k.a. interrupt handler)
  Then resumes interrupted task
  May enter low-power standby
Some systems prioritize interrupt requests
  Allow higher priority events to interrupt service of lower priority events

Interrupt Mechanisms
Interrupt request signal
Means of disabling/enabling interrupts
  So processor can execute critical regions
Save processor state on an interrupt
  So interrupted task can be resumed
On interrupt, disable further interrupts
  Until processor has saved state
Find the handler code for the event
  Vector: address of handler, or index into table of handler addresses
Instruction to return from handler
  Restoring saved state

Gumnut Interrupt Mechanisms
int_req signal
disi and enai instructions
On interrupt, PC, Z, and C saved in special registers
On interrupt, further interrupts are disabled
Handler code starts at address 1
  Gumnut sets PC to 1
reti instruction
  Restores PC, Z, and C from special registers, re-enables interrupts

Interrupt Acknowledgment
Processor may not respond immediately
  But must tell controller when it does
  Controller then deactivates request, to avoid multiple interrupts for one event
Processor acknowledges the request
  E.g., int_ack signal on Gumnut
  Alternative: reading a status register

Example: Sensor Controller
8-bit input from sensor
Interrupt request on change of value

Example: Sensor Handler
               data
  saved_r1:    bss 1

               text
  sensor_data: equ                  ; address of sensor data input register

               org 1
               stm r1, saved_r1
               inp r1, sensor_data
               …                    ; process the data
               ldm r1, saved_r1
               reti

Timers
Real-time clock (RTC)
  Generates periodic interrupts
  Uses a counter to divide system clock
  Control register for divisor
Interrupt handler can perform periodic tasks
  E.g., activate next digit of a scanned display

Example: RTC for Gumnut
10µs timebase, divided by a down counter
Initial count loaded from a register
Interrupt triggered on count value = 0

Real-Time Executives
Control program
  A.k.a. real-time operating system (RTOS)
Timing based on a real-time clock
Schedules execution of tasks
  In response to interrupts and timer events
Can also manage other resources
  Memory allocation
  Storage (file system)
  Use of I/O controllers
  Use of accelerators

Example: Gumnut Executive
RTC based at address 16
Calls task_2ms every 2ms
  ;;; ---------------------------------------------------------
  ;;; Program reset: jump to main program
                   text
                   org 0
                   jmp main
  ;;; ---------------------------------------------------------
  ;;; Port addresses
  rtc_start_count: equ       ; data output register
  rtc_count_value: equ       ; data input register
  rtc_int_enable:  equ       ; control output register
  rtc_int_status:  equ       ; status input register

Example: Gumnut Executive (continued)
  ;;; ---------------------------------------------------------
  ;;; init_interrupts: Initialize 2ms periodic interrupt, etc.
                   data
  rtc_divisor:     equ       ; divide 100kHz down to 500Hz
  rtc_int_flag:    bss 1

                   text
  init_interrupts: add r1, r0, rtc_divisor
                   out r1, rtc_start_count
                   add r1, r0,
                   out r1, rtc_int_enable
                   stm r0, rtc_int_flag
                   …         ; other initializations
                   ret

Example: Gumnut Executive (continued)
  ;;; ---------------------------------------------------------
  ;;; Interrupt handler
               data
  int_r1:      bss           ; save location for handler registers

               text
               org 1
  int_handler: stm r1, int_r1         ; save registers
  check_rtc:   inp r1, rtc_int_status ; check for RTC interrupt
               sub r0, r1,
               bz check_next
               add r1, r0,
               stm r1, rtc_int_flag   ; tell main program
  check_next:  …
  int_end:     ldm r1, int_r1         ; restore registers
               reti

Example: Gumnut Executive (continued)
  ;;; ---------------------------------------------------------
  ;;; main program
             text
  main:      jsb init_interrupts
             enai
  main_loop: stby
             ldm r1, rtc_int_flag
             sub r0, r1,
             bnz main_next
             jsb task_2ms
             stm r0, rtc_int_flag
  main_next: …
             jmp main_loop
Note: task_2ms is not called as part of the interrupt handler
  Doing so would slow down response to other interrupts

Summary
Transducers: sensors and actuators
  Analog-to-digital and digital-to-analog converters
  Input and output devices
Controllers
  Input, output, control, and status registers
  Autonomous controllers
Buses: multiplexed, tristate, open-drain
  Bus protocols: signals, timing, operations

Summary
Serial transmission
  NRZ, embedded clock
Real-time software
  Reacting to I/O and timer events
  Polling, interrupts
  Real-time executives

Digital Design — Chapter 9 — Accelerators 27 January 2018 Digital Design: An Embedded Systems Approach Using Verilog Chapter 9 Accelerators Portions of this work are from the book, Digital Design: An Embedded Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.

Performance and Parallelism
A processor core performs steps in sequence
  Performance limited by the instruction rate
Accelerating performance
  Perform steps in parallel
  Takes less time overall to complete an operation
Instruction-level parallelism
  Within a processor core
  Pipelining, multiple-issue
Accelerators
  Custom hardware for parallel operations

Achievable Parallelism
How many steps can be performed at once?
Regularly structured data
  Independent processing steps
  Examples: video and image pixel processing, audio or sensor signal processing
Constrained by data dependencies
  Operations that depend on results of previous steps

Algorithm Kernels
Algorithm: specification of the required processing steps
  Often expressed in a programming language
Kernel: the part that involves the most intensive, repetitive processing
  “10% of operations take 90% of the time”
Accelerating a kernel with parallel hardware gives the best payback

Amdahl’s Law
Time for an algorithm is t
  Fraction f is spent on a kernel
Accelerator speeds up kernel by a factor s
  Time with accelerator: t' = (1 − f)t + ft/s
Overall speedup factor: s' = t/t' = 1 / ((1 − f) + f/s)
  For large f, s' ≈ s
  For small f, s' ≈ 1

Amdahl’s Law Example
An algorithm with two kernels
  Kernel 1: 80% of time, can be sped up 10 times
  Kernel 2: 15% of time, can be sped up 100 times
Which speedup gives best overall improvement?
  For kernel 1: s' = 1 / ((1 − 0.8) + 0.8/10) = 1/0.28 ≈ 3.57
  For kernel 2: s' = 1 / ((1 − 0.15) + 0.15/100) = 1/0.8515 ≈ 1.17
  Accelerating kernel 1 gives the better overall improvement, despite its smaller speedup factor
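The two cases in this example can be checked numerically. The sketch below is a plain Python rendering of Amdahl's Law, s' = 1 / ((1 − f) + f/s); the function name `overall_speedup` is hypothetical.

```python
def overall_speedup(f, s):
    """Amdahl's Law: overall speedup when a fraction f of the run time
    is accelerated by a factor s and the rest is unchanged."""
    return 1.0 / ((1.0 - f) + f / s)


kernel1 = overall_speedup(0.80, 10)    # 80% of time, 10x faster
kernel2 = overall_speedup(0.15, 100)   # 15% of time, 100x faster
print(round(kernel1, 2), round(kernel2, 2))
```

Even an infinitely fast accelerator for kernel 2 could not do better than 1/0.85, about 1.18, which is why the fraction f matters more than the raw factor s.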

Parallel Architectures
An architecture for an accelerator specifies
  Processing blocks
  Data flow between them
Parallelism through replication
  Multiple identical blocks operating on different data elements
  Works well when elements can be processed independently

Parallel Architectures
Parallelism through pipelining
  Break a computation into steps, perform them in assembly-line fashion
  Latency (time to complete a single operation) is not reduced
  Throughput (rate of completion of operations) is increased
    Ideally by a factor equal to the number of pipeline stages
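The latency/throughput distinction can be made concrete with a small timing model. This Python sketch assumes an idealized pipeline (equal stage times, no stalls, no register overhead); the function names are hypothetical.

```python
def unpipelined_time(n_ops, stages, stage_time):
    """Each operation passes through every step before the next one starts."""
    return n_ops * stages * stage_time


def pipelined_time(n_ops, stages, stage_time):
    """The first result appears after `stages` steps, then one per step."""
    return (stages + n_ops - 1) * stage_time


n, k, t = 1000, 4, 10                       # operations, stages, ns per stage
print(pipelined_time(1, k, t))              # latency of one operation is unchanged
print(unpipelined_time(n, k, t) / pipelined_time(n, k, t))  # approaches k for large n
```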

Direct Memory Access (DMA)
Input/output data for accelerators must be transferred at high speed
  Using the processor would be too slow
Direct memory access
  I/O controller and accelerator transfer data to and from memory autonomously
  Program supplies starting address and length

Bus Arbitration
Bus masters take turns to use bus to access slaves
Controlled by a bus arbiter
Arbitration policies
  Priority, round-robin, …

Block-Processing Accelerator
Data arranged in regular groups of contiguous memory locations
  Accelerator works block by block
  E.g., images in blocks of 8 × 8 × 16-bit pixels
Datapath comprises
  Memory access: address generation, counters
  Computation section
  Control section: finite-state machine(s)

Stream-Processing Accelerator
Streams of data from an input source
  E.g., high-speed sensors
Digital signal processing (DSP)
  Analog sensor signal converted to stream of digital sample values
  Filtering, gain/attenuation, frequency-domain conversion (Fourier transform)

Processor/Accelerator Interface
Embedded software controls an accelerator
  Providing control parameters
  Synchronizing operations
Input/output registers and interrupts
  Interact with the control sequencer

Case Study: Edge Detection
Illustration of accelerator design
Edge detection in video processing
  Identify where image intensity changes abruptly
  Typically at the boundary of objects
  First step in identifying objects in a scene
Application areas
  Video surveillance, computer vision, …
For this case study
  Monochrome images of 640 × 480 × 8-bit pixels
  Stored row-by-row in memory
  Pixel values: 0 (black) to 255 (white)

Sobel Edge Detection
Compute derivatives of intensity in x and y directions
Look for minima and maxima (where intensity changes most rapidly)

The Sobel Algorithm
Use convolution to approximate partial derivatives Dx and Dy at each position
  Weighted sum of value of a pixel and its eight nearest neighbors
  Coefficients represented using a 3×3 convolution mask
Sobel masks for x and y derivatives
  Gx:  −1   0  +1        Gy:  +1  +2  +1
       −2   0  +2              0   0   0
       −1   0  +1             −1  −2  −1

The Sobel Algorithm
Combine partial derivatives: |D| = sqrt(Dx² + Dy²)
Since we just want maxima and minima in magnitude, approximate as: |D| ≈ |Dx| + |Dy|
Edge pixels don’t have eight neighbors
  Skip computation of |D| for edges
  Just set them to 0 using software

The Algorithm in Pseudocode
  for (row = 1; row <= 478; row = row + 1) begin
    for (col = 1; col <= 638; col = col + 1) begin
      sumx = 0; sumy = 0;
      for (i = -1; i <= +1; i = i + 1) begin
        for (j = -1; j <= +1; j = j + 1) begin
          sumx = sumx + O[row+i][col+j] * Gx[i][j];
          sumy = sumy + O[row+i][col+j] * Gy[i][j];
        end
      end
      D[row][col] = abs(sumx) + abs(sumy);
    end
  end
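The pseudocode translates directly into a small software reference model, which is useful later for checking the accelerator's output. The Python sketch below is an illustration rather than the textbook's code: it works on any image given as a list of rows, uses the standard Sobel masks (assumed to match the Gx and Gy of the slides), and leaves border pixels at 0 as the slides suggest.

```python
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[ 1,  2,  1],
      [ 0,  0,  0],
      [-1, -2, -1]]


def sobel(image):
    """Return |Dx| + |Dy| for each interior pixel; border pixels stay 0."""
    rows, cols = len(image), len(image[0])
    d = [[0] * cols for _ in range(rows)]
    for row in range(1, rows - 1):
        for col in range(1, cols - 1):
            sumx = sumy = 0
            for i in (-1, 0, 1):
                for j in (-1, 0, 1):
                    pixel = image[row + i][col + j]
                    sumx += pixel * GX[i + 1][j + 1]   # masks indexed 0..2
                    sumy += pixel * GY[i + 1][j + 1]
            d[row][col] = abs(sumx) + abs(sumy)
    return d


# A vertical edge: dark on the left, white in the right-hand column.
edge = [[0, 0, 255],
        [0, 0, 255],
        [0, 0, 255]]
print(sobel(edge)[1][1])
```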

Data Formats and Rates
Pixel values: 0 to 255 (8 bits)
Coefficients are 0, ±1 and ±2
  Partial products: −510 to +510 (10 bits)
  Dx and Dy: −1020 to +1020 (11 bits)
  |D|: 0 to 2040 (11 bits)
  Final pixel value: scale back to 8 bits
Video rate: 30 frames/sec
  640 × 480 = 307,200 pixels
  307,200 × 30 ≈ 10 million pixels/sec
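These bit widths follow from the value ranges, and the arithmetic can be reproduced in a few lines. The Python sketch below is only a cross-check; `signed_bits` is a hypothetical helper that returns the smallest two's-complement width covering a range.

```python
def signed_bits(lo, hi):
    """Smallest two's-complement width that can hold every value in lo..hi."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


PIXEL_MAX = 255
partial = 2 * PIXEL_MAX                 # largest coefficient magnitude is 2
deriv = 4 * PIXEL_MAX                   # positive mask weights sum to 1 + 2 + 1
magnitude = 2 * deriv                   # |Dx| + |Dy|

print(signed_bits(-partial, partial))   # partial products
print(signed_bits(-deriv, deriv))       # Dx and Dy
print(magnitude.bit_length())           # |D| is unsigned
print(640 * 480 * 30)                   # pixels per second at 30 frames/sec
```

The exact pixel rate is 9,216,000 per second, which the slides round up to 10 million.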

Data Dependencies
Pixels can be computed independently
For each pixel: (data-dependency figure)

System Architecture
Data dependencies suggest a pipeline
Coefficient multiplies are simple shift/negate, so merge with adder stage

Memory Bandwidth
Assume memory read/write takes 20ns (2 cycles of 100MHz clock)
  Memory is 32 bits wide, byte addressable
  Bandwidth = 50M operations/sec
Camera produces 10Mpixels/sec
  Accelerator needs to process at this rate
  (8 reads + 1 write) × 10Mpixel/sec = 90M operations/sec
  Greater than memory bandwidth

Memory Bandwidth
Read 4 pixels at once from each of previous, current, and next rows
  Store in accelerator to compute multiple derivative image pixels
Produce derivative pixels row-by-row, left-to-right
  Read 3 × 32-bit words for every 4th derivative pixel computed
  Write 4 pixels at a time
  (3 reads + 1 write) / 4 × 10Mpixel/sec = 10M operations/sec = 20% of available memory bandwidth
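The two bandwidth estimates can be compared side by side. This Python sketch simply restates the slides' arithmetic; the variable names are chosen for illustration.

```python
MEM_OP_NS = 20                                   # one read or write
available = 1_000_000_000 // MEM_OP_NS           # memory operations per second
pixel_rate = 10_000_000                          # pixels per second from the camera

# Naive scheme: 8 neighbor reads and 1 write for every result pixel.
naive = (8 + 1) * pixel_rate

# Word-wide scheme: 3 word reads and 1 word write per group of 4 pixels.
wide = (3 + 1) * pixel_rate // 4

print(naive, naive > available)                  # exceeds what the memory can do
print(wide, wide / available)                    # fraction of bandwidth used
```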

Sobel Accelerator Architecture
(figure)

Accelerator Sequence
Steady state
  Write 4 result pixels
  Read 4 pixels for previous, current, next rows
  Compute for 4 cycles
  Repeat…
Start of row
  Omit writes until pipeline full
End of row
  Omit reads to drain pipeline

Memory Operation Timing
Steady state (timing diagram)

Pixel Datapath
  // Computation datapath signals
  reg [31:0] prev_row, curr_row, next_row;
  reg [7:0] O [-1:+1][-1:+1];
  reg signed [10:0] Dx, Dy, D;
  reg [7:0] abs_D;
  reg [31:0] result_row;
  ...
  // Computational datapath
  always @(posedge clk_i) // Previous row register
    if (prev_row_load) prev_row <= dat_i;
    else if (shift_en) prev_row[31:8] <= prev_row[23:0];
  ... // Current row register
  ... // Next row register
  function [10:0] abs (input signed [10:0] x);
    abs = x >= 0 ? x : -x;
  endfunction
  ...

Pixel Datapath
  always @(posedge clk_i) // Computation pipeline
    if (shift_en) begin
      D = abs(Dx) + abs(Dy);
      abs_D <= D[10:3];
      Dx <= - $signed({3'b000, O[-1][-1]})
            + $signed({3'b000, O[-1][+1]})
            - ($signed({3'b000, O[ 0][-1]}) << 1)
            + ($signed({3'b000, O[ 0][+1]}) << 1)
            - $signed({3'b000, O[+1][-1]})
            + $signed({3'b000, O[+1][+1]});
      Dy <=   $signed({3'b000, O[-1][-1]})
            + ($signed({3'b000, O[-1][ 0]}) << 1)
            + $signed({3'b000, O[-1][+1]})
            - $signed({3'b000, O[+1][-1]})
            - ($signed({3'b000, O[+1][ 0]}) << 1)
            - $signed({3'b000, O[+1][+1]});

Pixel Datapath
      O[-1][-1] <= O[-1][ 0];
      O[-1][ 0] <= O[-1][+1];
      O[-1][+1] <= prev_row[31:24];
      O[ 0][-1] <= O[ 0][ 0];
      O[ 0][ 0] <= O[ 0][+1];
      O[ 0][+1] <= curr_row[31:24];
      O[+1][-1] <= O[+1][ 0];
      O[+1][ 0] <= O[+1][+1];
      O[+1][+1] <= next_row[31:24];
    end
  always @(posedge clk_i) // Result row register
    if (shift_en) result_row <= {result_row[23:0], abs_D};

Address Generation
Given an image in memory at base address B
  Address for pixel in row r, column c is B + r × 640 + c
Base address (B) is fixed
Offset (r × 640 + c) increments by 4 for each group of 4 pixels read/written
Use word-aligned addresses
  Two least-significant bits always 00
  Increment word address by 1

Address Generation (figure)

Address Generation
  always @(posedge clk_i) // O base address register
    if (O_base_ce) O_base <= dat_i[21:2];
  always @(posedge clk_i) // O address offset counter
    if (offset_reset) O_offset <= 0;
    else if (O_offset_cnt_en) O_offset <= O_offset + 1;
  always @(posedge clk_i) // D base address register
    if (D_base_ce) D_base <= dat_i[21:2];
  always @(posedge clk_i) // D address offset counter
    if (offset_reset) D_offset <= 0;
    else if (D_offset_cnt_en) D_offset <= D_offset + 1;
  ...

Address Generation
  assign O_prev_addr = O_base + O_offset;
  assign O_curr_addr = O_prev_addr + 640/4;
  assign O_next_addr = O_prev_addr + 1280/4;
  assign D_addr = D_base + D_offset;
  assign adr_o[21:2] = prev_row_load ? O_prev_addr :
                       curr_row_load ? O_curr_addr :
                       next_row_load ? O_next_addr :
                       D_addr;
  assign adr_o[1:0] = 2'b00;
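The word-address arithmetic in these assignments can be checked against the byte-address formula B + r × 640 + c. The Python sketch below is an illustration with hypothetical function names; it assumes, as the slides do, 640 one-byte pixels per row and 4 pixels per 32-bit word.

```python
ROW_BYTES = 640          # one byte per pixel, 640 pixels per row
WORD_BYTES = 4           # 32-bit memory words


def pixel_byte_address(base, row, col):
    """Byte address of the pixel at (row, col) in a row-major image."""
    return base + row * ROW_BYTES + col


def row_word_addresses(o_base_word, o_offset):
    """Word addresses read for the previous, current, and next rows."""
    prev = o_base_word + o_offset
    return prev, prev + ROW_BYTES // WORD_BYTES, prev + 2 * ROW_BYTES // WORD_BYTES


base = 0x8000                                   # example base address (assumed)
prev, curr, nxt = row_word_addresses(base // WORD_BYTES, 5)
# Word offset 5 in row 0 holds pixels 20..23; the same columns one and two rows down:
print(prev * WORD_BYTES == pixel_byte_address(base, 0, 20))
print(curr * WORD_BYTES == pixel_byte_address(base, 1, 20))
print(nxt * WORD_BYTES == pixel_byte_address(base, 2, 20))
```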

Control/Status Registers
  Register  Offset  Read/Write  Purpose
  Int_en    0       Write-only  Interrupt enable (bit 0).
  Start     4       Write-only  Write causes image processing to start (value ignored).
  O_base    8       Write-only  Original image base address.
  D_base    12      Write-only  Derivative image base address.
  Status    0       Read-only   Processing done (bit 0). Reading clears interrupt.

Slave Bus Interface
  assign start     = cyc_i && stb_i && we_i && adr_i == 2'b01;
  assign O_base_ce = cyc_i && stb_i && we_i && adr_i == 2'b10;
  assign D_base_ce = cyc_i && stb_i && we_i && adr_i == 2'b11;
  always @(posedge clk_i) // Interrupt enable register
    if (rst_i) int_en <= 1'b0;
    else if (cyc_i && stb_i && we_i && adr_i == 2'b00)
      int_en <= dat_i[0];
  always @(posedge clk_i) // Status register
    if (rst_i) done <= 1'b0;
    else if (done_set)
      // This occurs when last write is acknowledged,
      // and so cannot coincide with a read of the status register.
      done <= 1'b1;
    else if (cyc_i && stb_i && !we_i && adr_i == 2'b00 && ack_o)
      done <= 1'b0; // reading the status register clears it
  assign int_req = int_en && done;
  ...

Slave Bus Interface
  always @(posedge clk_i) // Generate ack output
    ack_o <= cyc_i && stb_i && !ack_o;
  // Wishbone data output multiplexer
  always @*
    if (cyc_i && stb_i && !we_i)
      if (adr_i == 2'b00)
        dat_o = {31'b0, done}; // status register read
      else
        dat_o = 32'b0;         // other registers read as 0
    else
      dat_o = result_row;      // for master write

Control Sequencing
Use a finite-state machine
Counters keep track of rows (0 to 477) and columns (0 to 159)
See textbook for details of FSM output functions

State Transition Diagram
(figure)

Accelerator Verification
Simulation-based verification of each section of the accelerator
  Slave bus operations
  Computation sequencing
  Master bus operations
  Address generation
  Pixel computation
Testbench including the accelerator
  Bus functional processor model
  Simplified memory and bus arbiter models

Sobel Verification Testbench
Block diagram components: Processor BFM, Arbiter, Sobel Accelerator, Multiplexed Bus (muxes and connections), Memory Model

Processor Bus Functional Model
  initial begin // Processor bus-functional model
    cpu_adr_o <= 23'h000000;
    cpu_sel_o <= 4'b0000;
    cpu_dat_o <= 32'h ;
    cpu_cyc_o <= 1'b0;
    cpu_stb_o <= 1'b0;
    cpu_we_o  <= 1'b0;
    @(negedge rst);
    @(posedge clk);
    // Write (hex) to O_base_addr register
    bus_write(sobel_reg_base + sobel_O_base_reg_offset, 32'h );
    // Write (hex) to D_base_addr register
    bus_write(sobel_reg_base + sobel_D_base_reg_offset, 32'h );
    // Write 1 to interrupt control register (enable interrupt)
    bus_write(sobel_reg_base + sobel_int_reg_offset, 32'h );
    // Write to start register (data value ignored)
    bus_write(sobel_reg_base + sobel_start_reg_offset, 32'h );
    // End of write operations

Processor Bus Functional Model
    cpu_cyc_o = 1'b0;
    cpu_stb_o = 1'b0;
    cpu_we_o  = 1'b0;
    begin: loop
      forever begin
        #10000;
        @(posedge clk);
        // Read status register
        cpu_adr_o <= sobel_reg_base + sobel_status_reg_offset;
        cpu_sel_o <= 4'b1111;
        cpu_cyc_o <= 1'b1;
        cpu_stb_o <= 1'b1;
        cpu_we_o  <= 1'b0;
        @(posedge clk);
        while (!cpu_ack_i) @(posedge clk); // wait for acknowledge (signal name reconstructed)
        cpu_cyc_o <= 1'b0;
        cpu_stb_o <= 1'b0;
        cpu_we_o  <= 1'b0;
        if (cpu_dat_i[0]) disable loop;
      end
    end
  end

Memory Bus Functional Model
  always begin // Memory bus-functional model
    mem_ack_o <= 1'b0;
    mem_dat_o <= 32'h ;
    @(posedge clk);
    while (!(bus_cyc && bus_stb)) @(posedge clk); // wait for a bus operation (condition reconstructed)
    if (!bus_we)
      mem_dat_o <= 32'h ; // in place of read data
    mem_ack_o <= 1'b1;
    @(posedge clk);
  end

Bus Arbiter
Uses sobel_cyc_o and cpu_cyc_o as request inputs
If both request at the same time, give accelerator priority
Mealy FSM

Bus Arbiter
  always @(posedge clk) // Arbiter FSM register
    if (rst) arbiter_current_state <= sobel;
    else     arbiter_current_state <= arbiter_next_state;
  always @* // Arbiter logic
    case (arbiter_current_state)
      sobel: if (sobel_cyc_o) begin
               sobel_gnt <= 1'b1; cpu_gnt <= 1'b0; arbiter_next_state <= sobel;
             end
             else if (!sobel_cyc_o && cpu_cyc_o) begin
               sobel_gnt <= 1'b0; cpu_gnt <= 1'b1; arbiter_next_state <= cpu;
             end
             else begin
               sobel_gnt <= 1'b0; cpu_gnt <= 1'b0; arbiter_next_state <= sobel;
             end
      cpu:   if (cpu_cyc_o) begin
               sobel_gnt <= 1'b0; cpu_gnt <= 1'b1; arbiter_next_state <= cpu;
             end
             else if (sobel_cyc_o && !cpu_cyc_o) begin
               sobel_gnt <= 1'b1; cpu_gnt <= 1'b0; arbiter_next_state <= sobel;
             end
             else begin
               sobel_gnt <= 1'b0; cpu_gnt <= 1'b0; arbiter_next_state <= sobel;
             end
    endcase
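To see the arbitration policy at a glance, the case statement can be mirrored as a pure function. This Python sketch is a behavioral illustration only; the function name `arbiter_step` and the string state names are assumptions, and one call corresponds to one clock cycle of the FSM.

```python
def arbiter_step(state, sobel_req, cpu_req):
    """One cycle of the arbiter: returns (sobel_gnt, cpu_gnt, next_state)."""
    if state == "sobel":
        if sobel_req:
            return 1, 0, "sobel"        # accelerator has priority from this state
        if cpu_req:
            return 0, 1, "cpu"
        return 0, 0, "sobel"
    if state == "cpu":
        if cpu_req:
            return 0, 1, "cpu"          # processor keeps the bus until it finishes
        if sobel_req:
            return 1, 0, "sobel"
        return 0, 0, "sobel"
    raise ValueError("unknown state: " + state)


print(arbiter_step("sobel", True, True))   # simultaneous requests after reset
print(arbiter_step("cpu", True, True))     # processor already owns the bus
```

Note that the grants depend on the current inputs as well as the state, which is what makes this a Mealy machine.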

Simulation Results
See waveforms in textbook
  Demonstrates sequencing and address generation
But what about…
  Data values computed correctly
  Interactions between processor and accelerator
Need to use more sophisticated verification techniques
  Due to complexity of the design

Summary
Accelerators boost performance using parallel hardware
  Replication, pipelining, …
Amdahl’s Law
  Best payback from accelerating a kernel
DMA avoids processor overhead
Verification requires advanced techniques

Lecture 1 Introduction to Digital Logic Design
Hai Zhou EECS 303 Advanced Digital Design Fall 2011 EECS 303 Lecture 1

Outline
Class administration
Digital design methodology
Representations of Digital Design
Introduction to Mentor Graphics tools
READING: Chapter 1, Chapter 2

Class Administration
Lectures twice a week, Tuesday-Thursday 3:30-4:50PM
Instructor: Hai Zhou
  Office: L461 Tech
  PHONE:
Teaching Assistant: Peng Kang
  Office: M314 Tech
Web Page:

Class Prerequisites
EECS 203: Introduction to Computer Engineering
  Need to have basic understanding of digital systems, logic gates, combinational and sequential logic
  Need to have been exposed to UNIX since we will use the Mentor Graphics tools on SUN workstations
Class will form a background for other classes in Computer Engineering
  EECS 357: Introduction to VLSI CAD
  EECS 355: ASIC & FPGA Design
  EECS 361: Computer Architecture
  EECS 391: Introduction to VLSI Design

Class Administration
Required textbook: Mano and Kime, “Logic & Computer Design Fundamentals”, Prentice Hall
Classnotes: copies of lecture transparencies to be made available

Class Grades
5 Homeworks: 25% of grade
5 Labs
Midterm exam: 20% of grade
Final exam: 30% of grade
Homeworks and labs will be due at the beginning of class on the due date
A penalty of 10% per working day will be assigned to late assignments or labs

Lab Work
You will be introduced to the use of a commercial computer-aided design tool from Mentor Graphics
Will use the Sun workstations in the Wilkinson Lab (3rd floor, M wing of Tech)
Lab Hours: Open
There will be 5 labs
  Lab 1: Tutorial on Mentor Graphics (simple logic)
  Lab 2: Design of combinational logic (8-bit adder)
  Lab 3: Design of ALU and shifter
  Lab 4: Design of a simple 8-state finite state machine
  Lab 5: Use of VHDL for combinational and sequential design

The Process of Design Design
Initial concept: what is the function performed by the object? Constraints: How fast? How much area? How much cost? Refine abstract functional blocks into more concrete realizations Implementation Assemble primitives into more complex building blocks Composition via wiring Choose among alternatives to improve the design Debug Faulty systems: design flaws, composition flaws, component flaws Design to make debugging easier Hypothesis formation and troubleshooting skills EECS 303 Lecture 1

Digital Systems Digital vs. Analog Waveforms Digital:
only assumes discrete values Analog: values vary over a broad range continuously EECS 303 Lecture 1

Digital Hardware Systems
Boolean Algebra and Logical Operators Algebra: variables, values, operations In Boolean algebra, the values are the symbols 0 and 1 If a logic statement is false, it has value 0 If a logic statement is true, it has value 1 Operations: AND, OR, NOT EECS 303 Lecture 1
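The three operations can be tried out directly. The following is a minimal sketch, assuming the Python integers 0 and 1 stand in for the Boolean values; the function names are illustrative and do not come from the lecture.

```python
def NOT(x):
    """Complement: 0 becomes 1, 1 becomes 0."""
    return 1 - x

def AND(x, y):
    """1 only when both operands are 1."""
    return x & y

def OR(x, y):
    """1 when at least one operand is 1."""
    return x | y

def truth_value(statement):
    """Map a true statement to 1 and a false statement to 0."""
    return 1 if statement else 0
```

For example, `AND(truth_value(3 > 2), NOT(0))` evaluates to 1 because both operands are 1.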

Combinational vs. Sequential Logic Network implemented from switching elements or logic gates. The presence of feedback distinguishes between sequential and combinational networks. Combinational logic no feedback among inputs and outputs outputs are a pure function of the inputs e.g., full adder circuit: (A, B, Carry In) mapped into (Sum, Carry Out) EECS 303 Lecture 1

Sequential logic: inputs and outputs overlap; outputs depend on inputs and the entire history of execution! The network typically has only a limited number of unique configurations; these are called states, e.g., a traffic light controller sequences endlessly through four states. New component in sequential logic networks: storage elements to remember the current state. The output and new state are functions of the inputs and the old state, i.e., the fed-back inputs are the state! Synchronous systems: a periodic reference signal, the clock, causes the storage elements to accept new values and to change state. Asynchronous systems: no single indication of when to change state.
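The idea that the new state is a function of the inputs and the old state can be sketched as a small synchronous state machine. This is only an illustration: the slide says the traffic light controller has four states but does not name them, so the state names and the single `advance` input below are assumptions.

```python
# Hypothetical state names; the slide only says there are four states.
STATES = ["NS_GREEN", "NS_YELLOW", "EW_GREEN", "EW_YELLOW"]

def next_state(state, advance):
    """New state as a function of the input and the old state."""
    if not advance:
        return state
    return STATES[(STATES.index(state) + 1) % len(STATES)]

def run(inputs, state="NS_GREEN"):
    """Apply one input per clock tick; the stored state changes only on a tick."""
    history = [state]
    for advance in inputs:
        state = next_state(state, advance)
        history.append(state)
    return history
```

Calling `run([1, 1, 1, 1])` steps through all four states and returns to the starting state, which is the cyclic behavior the slide describes.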

Case Study of a Simple Logic Design: Seven Segment Display
Chip to drive digital display [Figure: seven-segment display with segments labeled L1 to L7]

Case Study (cont.) [Figure: seven-segment display with segments labeled L1 to L7]

Case Study (cont.) Implement L4: some gate-level implementation of the Boolean function for L4

Representations of Digital Design: Switches
A switch connects two points under control of a signal. Normally Open: when the control signal is 0 (false), the switch is open; when it is 1 (true), the switch is closed. Normally Closed: when control is 1 (true), the switch is open; when control is 0 (false), the switch is closed.

Switch Representations
routing inputs to outputs through a maze Examples: Floating nodes: what happens if the car is not running? outputs are floating rather than forced to be false Under all possible control signal settings (1) all outputs must be connected to some input through a path (2) no output is connected to more than one input through any path EECS 303 Lecture 1

Switch Representations
Implementation of AND and OR Functions with Switches AND function Series connection to TRUE OR function Parallel connection to TRUE EECS 303 Lecture 1
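The series and parallel connections can be modeled in a few lines. This is a simplified sketch with assumed function names: it treats a missing path to TRUE as output 0, whereas in a real switch network such an output would be floating unless another path forces it to false.

```python
def normally_open(control):
    """A normally open switch conducts only when its control signal is 1."""
    return control == 1

def series(*switches):
    """A series path conducts only if every switch on it conducts."""
    return all(switches)

def parallel(*switches):
    """Parallel paths conduct if at least one switch conducts."""
    return any(switches)

def and_function(a, b):
    """Series connection to TRUE (simplification: no path is read as 0)."""
    return int(series(normally_open(a), normally_open(b)))

def or_function(a, b):
    """Parallel connection to TRUE (simplification: no path is read as 0)."""
    return int(parallel(normally_open(a), normally_open(b)))
```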

Representations of a Digital Design
Truth Tables tabulate all possible input combinations and their associated output values Example: half adder adds two binary digits to form Sum and Carry Example: full adder adds two binary digits and Carry in to form Sum and Carry Out NOTE: 1 plus 1 is 0 with a carry of 1 in binary EECS 303 Lecture 1
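A truth table can be generated mechanically by enumerating every input combination. The sketch below assumes the adders are described by ordinary integer arithmetic; the helper name `truth_table` is illustrative.

```python
from itertools import product

def truth_table(func, n_inputs):
    """Tabulate func over every combination of n_inputs binary inputs."""
    return [(bits, func(*bits)) for bits in product((0, 1), repeat=n_inputs)]

def half_adder(a, b):
    total = a + b
    return (total % 2, total // 2)  # (Sum, Carry)

def full_adder(a, b, cin):
    total = a + b + cin
    return (total % 2, total // 2)  # (Sum, Carry Out)

# Print one row per input combination of the half adder.
for bits, outputs in truth_table(half_adder, 2):
    print(bits, outputs)
```

The last row of the half adder table pairs inputs (1, 1) with outputs (0, 1), matching the note that 1 plus 1 is 0 with a carry of 1.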

Representations of Digital Design: Boolean Algebra
values: 0, 1; variables: A, B, C, . . ., X, Y, Z; operations: NOT, AND, OR, . . . NOT X is written as $\overline{X}$; X AND Y is written as X & Y, or sometimes X Y; X OR Y is written as X + Y. Deriving Boolean equations from truth tables (half adder; rows A B | Sum Carry): 0 0 | 0 0, 0 1 | 1 0, 1 0 | 1 0, 1 1 | 0 1. $\text{Sum} = \overline{A}\,B + A\,\overline{B}$, $\text{Carry} = A\,B$. OR together the product terms for each truth table row where the function is 1; if an input variable is 0 in that row, it appears in complemented form; if 1, it appears uncomplemented.

Representations of a Digital Design: Boolean Algebra
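Another example (full adder, with inputs A, B, Cin and outputs Sum, Cout): $\text{Sum} = \overline{A}\,\overline{B}\,\text{Cin} + \overline{A}\,B\,\overline{\text{Cin}} + A\,\overline{B}\,\overline{\text{Cin}} + A\,B\,\text{Cin}$, $\text{Cout} = \overline{A}\,B\,\text{Cin} + A\,\overline{B}\,\text{Cin} + A\,B\,\overline{\text{Cin}} + A\,B\,\text{Cin}$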
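The rule for deriving an equation from a truth table (one product term per row where the function is 1, with a variable complemented when it is 0 in that row) can be sketched as follows. The function names are assumptions, and a trailing apostrophe is used here in place of the overbar to mark a complemented variable.

```python
from itertools import product

def sum_of_products(func, names):
    """Return one product term per truth table row where func is 1."""
    terms = []
    for bits in product((0, 1), repeat=len(names)):
        if func(*bits):
            # A trailing apostrophe marks a complemented variable.
            terms.append(" ".join(n if b else n + "'" for n, b in zip(names, bits)))
    return " + ".join(terms)

def full_adder_sum(a, b, cin):
    return (a + b + cin) % 2

def full_adder_cout(a, b, cin):
    return (a + b + cin) // 2
```

Calling `sum_of_products(full_adder_sum, ["A", "B", "Cin"])` yields four product terms, one for each of the input rows 001, 010, 100 and 111.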

Gate Representations of a Digital Design
most widely used primitive building block in digital system design Standard Logic Gate Representation Half Adder Schematic Net: electrically connected collection of wires Netlist: tabulation of gate inputs & outputs and the nets they are connected to EECS 303 Lecture 1
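A netlist can be written down as plain data and evaluated gate by gate. The sketch below assumes the half adder is built from one XOR gate and one AND gate, which is the usual schematic; the tuple layout and net names are illustrative choices.

```python
GATES = {
    "XOR": lambda x, y: x ^ y,
    "AND": lambda x, y: x & y,
}

# Each entry: (gate type, input nets, output net). Net names are illustrative.
HALF_ADDER_NETLIST = [
    ("XOR", ("A", "B"), "Sum"),
    ("AND", ("A", "B"), "Carry"),
]

def evaluate(netlist, inputs):
    """Evaluate gates in listed order; assumes each gate's inputs are already driven."""
    nets = dict(inputs)
    for gate, in_nets, out_net in netlist:
        nets[out_net] = GATES[gate](*(nets[n] for n in in_nets))
    return nets
```

For instance, `evaluate(HALF_ADDER_NETLIST, {"A": 1, "B": 1})` drives the `Sum` net to 0 and the `Carry` net to 1.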

Representations of a Digital Design: Gates
Full Adder Schematic Fan-in: number of inputs to a gate Fan-out: number of gate inputs an output is connected to Technology "Rules of Composition" place limits on fan-in/fan-out EECS 303 Lecture 1
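Fan-in and fan-out can be counted directly from a netlist. The full adder netlist below is an assumption (two XOR gates, two AND gates and one OR gate), since the schematic itself is not reproduced here; gate and net names are hypothetical.

```python
from collections import Counter

# Assumed full adder netlist: (gate name, gate type, input nets, output net).
FULL_ADDER = [
    ("g1", "XOR", ("A", "B"), "n1"),
    ("g2", "XOR", ("n1", "Cin"), "Sum"),
    ("g3", "AND", ("A", "B"), "n2"),
    ("g4", "AND", ("n1", "Cin"), "n3"),
    ("g5", "OR", ("n2", "n3"), "Cout"),
]

def fan_in(netlist):
    """Number of inputs of each gate."""
    return {name: len(ins) for name, _type, ins, _out in netlist}

def fan_out(netlist):
    """Number of gate inputs each net is connected to."""
    return Counter(net for _name, _type, ins, _out in netlist for net in ins)
```

In this assumed netlist the internal net `n1` has a fan-out of 2 because it feeds both `g2` and `g4`, while `Sum` and `Cout` drive no gate inputs inside the block.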

Waveform Representation
Dynamic behavior of a circuit: real circuits have non-zero delays. [Figure: timing diagram of the half adder, marking the sum propagation delay and a circuit hazard: 1 plus 0 is 1, not 0!] Output changes are delayed from input changes. The propagation delay is sensitive to paths in the circuit. Outputs may temporarily change from the correct value to the wrong value and back again to the correct value: this is called a glitch or hazard.
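A glitch of this kind can be reproduced with a toy unit-delay simulation. The sketch assumes the Sum output is built as (A AND NOT B) OR (NOT A AND B) and that every gate has a delay of exactly one time step; neither assumption comes from the slide, and real delays differ from gate to gate.

```python
def simulate_sum(a_wave, b_wave):
    """Unit-delay simulation of Sum = (A AND NOT B) OR (NOT A AND B).

    a_wave and b_wave give the input values at each time step.
    Returns the value of Sum at each time step.
    """
    a, b = a_wave[0], b_wave[0]
    # Start from the settled state for the initial inputs.
    n_a, n_b = 1 - a, 1 - b
    p, q = a & n_b, n_a & b
    s = p | q
    trace = [s]
    for t in range(1, len(a_wave)):
        # Each gate sees the values its inputs had one step earlier.
        prev_a, prev_b = a_wave[t - 1], b_wave[t - 1]
        n_a, n_b, p, q, s = (1 - prev_a, 1 - prev_b,
                             prev_a & n_b, n_a & prev_b, p | q)
        trace.append(s)
    return trace

# Inputs switch from (A=0, B=1) to (A=1, B=0): Sum should stay at 1.
print(simulate_sum([0, 1, 1, 1, 1], [1, 0, 0, 0, 0]))
```

Although the sum of the inputs is 1 both before and after the change, the simulated output drops to 0 for one time step before returning to 1, because the two AND paths do not switch at the same moment.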

Block Representation of a Digital Design
structural organization of the design black boxes with input and output connections corresponds to well defined functions concentrates on how the components are composed by wiring Full Adder realized in terms of composition of half adder blocks Block diagram representation of the Full Adder EECS 303 Lecture 1
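The block-level composition can be written as function composition. This sketch assumes the usual arrangement of two half adder blocks plus an OR gate that combines their carries; the block diagram itself is not reproduced here.

```python
def half_adder(a, b):
    """Black box: two inputs, returns (Sum, Carry)."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Full adder composed by wiring two half adder blocks and an OR gate."""
    s1, c1 = half_adder(a, b)      # first block adds A and B
    s, c2 = half_adder(s1, cin)    # second block adds the partial sum and Cin
    return s, c1 | c2              # either block's carry produces Cout
```

The composed block can be checked against plain arithmetic for all eight input combinations, which treats the half adders purely as black boxes.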

Introduction to Mentor Graphics Tools
The Mentor Graphics CAD system has many components You will use a small portion of the tools for this course Falcon Design Framework Design Architect for entering logic designs Quicksim for simulating the designs QuickHDL for entering and simulating the VHDL designs Read through and execute Lab 1: Mentor Graphics tutorial EECS 303 Lecture 1

Introduction to Mentor Graphics
Typing “source /vol/ece303/mgc.env” on Sun workstation will set up env for 303 labs Typing “dmgr” for Design Manager will create a window for running several tools Mentor Graphics is not a single tool but a series of design tools that uses object oriented data representation to simplify the design process Data created in one tool (e.g. design architect) can be shipped to another tool (e.g. quicksim) for simulation A schematic is merely a pictorial representation of a circuit EECS 303 Lecture 1

Viewpoints in Electronic Design Objects
Data created by DESIGN ARCHITECT is saved in a Component Viewpoint. A component is a collection of models used to describe the functional and graphical aspects. Component data is made of a schematic and a symbol. A symbol is a graphical model of the input and output pins. A schematic is a functional model of how outputs are related to input values. A viewpoint can be thought of as a filter that other applications use to process component data. [Figure: component viewpoint of an electronic design object, with the symbol for XOR]

Moving Design Data Students familiar with UNIX, please refrain from using UNIX commands to move directories or files You MUST move these objects using the Design Manager Failure to use Design Manager will result in data corruption Design Architect will store the absolute pathname to a design Quicksim will try to use the symbol to look for the design from that pathname EECS 303 Lecture 1

Summary Class administration Digital design methodology
Representations of Digital Design Introduction to Mentor Graphics tools NEXT LECTURE: Memory Elements READING: Chapter 4 EECS 303 Lecture 1
