EECS 150 - Components and Design Techniques for Digital Systems
Lec 23 – Optimizing State Machines
David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~culler
http://inst.eecs.berkeley.edu/~cs150

Datapath vs Control
  Datapath: storage, functional units (FUs), and interconnect sufficient to perform the desired functions
    Inputs are control points
    Outputs are signals
  Controller: state machine that orchestrates operation of the datapath, based on the desired function and the signals
  (Block diagram of datapath and controller connected by control points and signals shown as a figure.)

Control Points Discussion
  Control points on a bus?
  Control points on a register?
  Control points on a function unit?
  Signals?
  Relationship to the STG? The STT?
  (Figure: inputs A and B feeding a state/output block.)

Sequential Logic Optimization
  State minimization
    Algorithms for state minimization
  State, input, and output encodings
    Minimize the next-state and output logic
  Delay optimizations
    Retiming
    Parallelism and pipelining (time permitting)

FSM Optimization in Context
  Understand the word specification
  Draw a picture
  Derive a state diagram and symbolic state table
  Determine an implementation approach (e.g., gate logic, ROM, FPGA, etc.)
  Perform STATE MINIMIZATION
  Perform STATE ASSIGNMENT
  Map the symbolic state table to encoded state tables for implementation (INPUT and OUTPUT encodings)
You can specify a particular state assignment in your Verilog code through parameter settings.

Finite State Machine Optimization
  State minimization
    Fewer states require fewer state bits
    Fewer bits require fewer logic equations
  Encodings: state, inputs, outputs
    A state encoding with fewer bits has fewer equations to implement; however, each may be more complex
    A state encoding with more bits (e.g., one-hot) has simpler equations
    Complexity is directly related to the complexity of the state diagram
    Input/output encoding may or may not be under designer control

FSM Optimization: State Reduction
  Motivation: lower cost
    Fewer flip-flops in one-hot implementations
    Possibly fewer flip-flops in encoded implementations
    More don't cares in next-state logic
    Fewer gates in next-state logic
  It is often simpler to design with extra states and then reduce later
  Example: odd parity checker - two machines with identical behavior

Algorithmic Approach to State Minimization
  Goal: identify and combine states that have equivalent behavior
  Equivalent states:
    Same output
    For all input combinations, they transition to the same or equivalent states
  Algorithm sketch
    1. Place all states in one set
    2. Initially partition the set based on output behavior
    3. Successively partition the resulting subsets based on next-state transitions
    4. Repeat (3) until no further partitioning is required; states left in the same set are equivalent
  This is a polynomial-time procedure
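The partitioning procedure is easy to prototype in software. Below is a minimal Python sketch (not from the original slides) that runs the successive-partition algorithm on the 010/110 sequence detector from the next few slides; the state names and transition table come straight from that example.

    # Successive-partition state minimization, illustrated on the 010/110
    # sequence detector used in this lecture.
    # fsm: state -> (next state on X=0, next state on X=1, output on X=0, output on X=1)
    fsm = {
        'S0': ('S1', 'S2', 0, 0),
        'S1': ('S3', 'S4', 0, 0),
        'S2': ('S5', 'S6', 0, 0),
        'S3': ('S0', 'S0', 0, 0),
        'S4': ('S0', 'S0', 1, 0),
        'S5': ('S0', 'S0', 0, 0),
        'S6': ('S0', 'S0', 1, 0),
    }

    def minimize(fsm):
        # Steps 1-2: the initial partition groups states by their output behavior.
        by_output = {}
        for s, (n0, n1, o0, o1) in fsm.items():
            by_output.setdefault((o0, o1), set()).add(s)
        partition = list(by_output.values())

        def block_of(state):
            return next(i for i, block in enumerate(partition) if state in block)

        # Steps 3-4: split any block whose members lead to different blocks.
        changed = True
        while changed:
            changed = False
            refined = []
            for block in partition:
                splits = {}
                for s in block:
                    key = (block_of(fsm[s][0]), block_of(fsm[s][1]))
                    splits.setdefault(key, set()).add(s)
                refined.extend(splits.values())
                if len(splits) > 1:
                    changed = True
            partition = refined
        return partition

    print(minimize(fsm))
    # Four blocks: {S0}, {S1, S2}, {S3, S5}, {S4, S6} (order may vary),
    # matching the result on the Method of Successive Partitions slide.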

State Minimization Example
Sequence detector for 010 or 110 (note: Mealy machine)

  Input      Present   Next State     Output
  Sequence   State     X=0    X=1     X=0  X=1
  Reset      S0        S1     S2      0    0
  0          S1        S3     S4      0    0
  1          S2        S5     S6      0    0
  00         S3        S0     S0      0    0
  01         S4        S0     S0      1    0
  10         S5        S0     S0      0    0
  11         S6        S0     S0      1    0

(State diagram over S0-S6 shown as a figure; the table above is the alternative STT.)

Method of Successive Partitions

  Input      Present   Next State     Output
  Sequence   State     X=0    X=1     X=0  X=1
  Reset      S0        S1     S2      0    0
  0          S1        S3     S4      0    0
  1          S2        S5     S6      0    0
  00         S3        S0     S0      0    0
  01         S4        S0     S0      1    0
  10         S5        S0     S0      0    0
  11         S6        S0     S0      1    0

  ( S0 S1 S2 S3 S4 S5 S6 )
  ( S0 S1 S2 S3 S5 ) ( S4 S6 )
  ( S0 S1 S2 ) ( S3 S5 ) ( S4 S6 )
  ( S0 ) ( S1 S2 ) ( S3 S5 ) ( S4 S6 )

  S1 is equivalent to S2
  S3 is equivalent to S5
  S4 is equivalent to S6

Minimized FSM
State-minimized sequence detector for 010 or 110

  Input      Present   Next State     Output
  Sequence   State     X=0    X=1     X=0  X=1
  Reset      S0        S1'    S1'     0    0
  0 + 1      S1'       S3'    S4'     0    0
  X0         S3'       S0     S0      0    0
  X1         S4'       S0     S0      1    0

(Minimized state diagram shown as a figure.)
7 states reduced to 4 states
3-bit encoding replaced by 2-bit encoding

Another Example – Row Matching Method
4-bit sequence detector: output 1 after each 4-bit input sequence consisting of the binary string 0110 or 1010

State Transition Table
Group states with the same next states and the same outputs.
(The full state transition table is shown as a figure; the first matched rows merge into state S'10.)

Iterate the Row Matching Algorithm
(Table shown as a figure; this pass produces merged state S'7.)

Iterate One Last Time
(Table shown as a figure; this pass produces merged states S'3 and S'4.)

Final Reduced State Machine
15 states (minimum 4 FFs) reduced to 7 states (minimum 3 FFs)

More Complex State Minimization
Multiple-input example (state diagram with input combinations 00, 01, 10, 11 shown as a figure)

  Symbolic state transition table:
  present            next state             output
  state       00      01      10      11
  S0          S0      S1      S2      S3      1
  S1          S0      S3      S1      S4      0
  S2          S1      S3      S2      S4      1
  S3          S1      S0      S4      S5      0
  S4          S0      S1      S2      S5      1
  S5          S1      S4      S0      S5      0

State Reduction Limits
The "row matching" method is not guaranteed to find the optimal solution in all cases, because it only looks at pairs of rows at a time (see the example figure).
Another (more complicated) method does guarantee the optimal solution for fully specified machines: the "implication table" method (cf. Mano, chapter 9).
What "rule of thumb" heuristics apply?

Implication Chart Method
  Build a table of all pairs of states
  First, eliminate incompatible pairs based on outputs
  Fill each remaining entry with the equivalences implied by the next states (e.g., the S0-S2 cell contains S0-S1, S1-S3, S2-S2, S3-S4)
  Cross out cells whose implied entries have been crossed out; iterate until nothing changes

  Symbolic state transition table (as above):
  present            next state             output
  state       00      01      10      11
  S0          S0      S1      S2      S3      1
  S1          S0      S3      S1      S4      0
  S2          S1      S3      S2      S4      1
  S3          S1      S0      S4      S5      0
  S4          S0      S1      S2      S5      1
  S5          S1      S4      S0      S5      0

  (Implication chart shown as a figure.)

  Minimized state table (S0 == S4, S3 == S5):
  present            next state             output
  state       00      01      10      11
  S0'         S0'     S1      S2      S3'     1
  S1          S0'     S3'     S1      S3'     0
  S2          S1      S3'     S2      S0'     1
  S3'         S1      S0'     S0'     S3'     0
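As a companion to the partition sketch earlier, here is a rough Python sketch (illustrative only, not the course's tool) of the implication-chart procedure, run on the symbolic state table above.

    from itertools import combinations

    # fsm: state -> (dict mapping input combination to next state, output)
    fsm = {
        'S0': ({'00': 'S0', '01': 'S1', '10': 'S2', '11': 'S3'}, 1),
        'S1': ({'00': 'S0', '01': 'S3', '10': 'S1', '11': 'S4'}, 0),
        'S2': ({'00': 'S1', '01': 'S3', '10': 'S2', '11': 'S4'}, 1),
        'S3': ({'00': 'S1', '01': 'S0', '10': 'S4', '11': 'S5'}, 0),
        'S4': ({'00': 'S0', '01': 'S1', '10': 'S2', '11': 'S5'}, 1),
        'S5': ({'00': 'S1', '01': 'S4', '10': 'S0', '11': 'S5'}, 0),
    }

    def implication_chart(fsm):
        states = sorted(fsm)
        # chart[(a, b)] is None once the pair is crossed out; otherwise it is
        # the set of state pairs implied by merging a and b.
        chart = {}
        for a, b in combinations(states, 2):
            if fsm[a][1] != fsm[b][1]:
                chart[(a, b)] = None            # different outputs: incompatible
            else:
                implied = set()
                for x in fsm[a][0]:
                    pair = tuple(sorted((fsm[a][0][x], fsm[b][0][x])))
                    if pair[0] != pair[1]:
                        implied.add(pair)
                chart[(a, b)] = implied
        # Repeatedly cross out pairs whose implied pairs are crossed out.
        changed = True
        while changed:
            changed = False
            for pair, implied in chart.items():
                if implied is not None and any(chart[p] is None for p in implied):
                    chart[pair] = None
                    changed = True
        return [pair for pair, implied in chart.items() if implied is not None]

    print(implication_chart(fsm))   # [('S0', 'S4'), ('S3', 'S5')], i.e. S0==S4 and S3==S5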

Minimizing Incompletely Specified FSMs
  Equivalence of states is transitive when the machine is fully specified
  But it is not transitive when don't cares are present, e.g.:

      state   output
      S0      - 0      S1 is compatible with both S0 and S2,
      S1      1 -      but S0 and S2 are incompatible
      S2      - 1

  No polynomial-time algorithm exists for determining the best grouping of states into equivalent sets that will yield the smallest number of final states

Minimizing States May Not Yield the Best Circuit
Example: edge detector; outputs 1 when the last two inputs change from 0 to 1

  X  Q1  Q0    Q1+  Q0+
  0  0   0     0    0
  0  0   1     0    0
  0  1   1     0    0
  1  0   0     0    1
  1  0   1     1    1
  1  1   1     1    1
  -  1   0     0    0

  (State diagram: states 00 [0], 11 [0], 01 [1] with X'/X transitions, shown as a figure.)

  Q1+ = X (Q1 xor Q0)
  Q0+ = X Q1' Q0'

Another Implementation of Edge Detector
"Ad hoc" solution: not minimal, but cheap and fast
(State diagram: states 00 [0], 10 [0], 01 [1], 11 [0] with X'/X transitions, shown as a figure.)

Announcements
  Reading: K&B 8.1-2
  HW 9 due Wednesday
  The last HW will go out next week
  TAs will be in lab this week as much as possible, rather than holding official lab meetings
  Nov 29: bring your questions on a sheet of paper
  Down to the final stretch

State Assignment
Choose bit vectors to assign to each "symbolic" state
  With n state bits for m states there are (2^n)! / (2^n - m)! possible assignments (which requires m <= 2^n, i.e., n >= log2 m)
    2^n codes are possible for the 1st state, 2^n - 1 for the 2nd, 2^n - 2 for the 3rd, ...
    A huge number even for small values of n and m
    Intractable for state machines of any size
    Heuristics are necessary for practical solutions
  Optimize some metric for the combinational logic
    Size (amount of logic and number of FFs)
    Speed (depth of logic and fanout)
    Dependencies (decomposition)
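To get a feel for how quickly this count grows, here is a small illustrative calculation (not from the slides) of (2^n)! / (2^n - m)! for a few (m, n) pairs, including the 7-state detector from earlier in the lecture.

    from math import ceil, factorial, log2

    def num_assignments(m, n):
        # Number of ways to assign distinct n-bit codes to m states.
        assert n >= ceil(log2(m)), "need at least log2(m) state bits"
        codes = 2 ** n
        return factorial(codes) // factorial(codes - m)

    print(num_assignments(4, 2))    # 4!/0!  = 24
    print(num_assignments(7, 3))    # 8!/1!  = 40320 for the unreduced 7-state detector
    print(num_assignments(7, 4))    # 16!/9! = 57657600 with just one extra state bit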

State Assignment Strategies
Possible strategies:
  Sequential: just number states as they appear in the state table
  Random: pick random codes
  One-hot: use as many state bits as there are states (bit = 1 -> state)
  Output: use outputs to help encode states
  Heuristic: rules of thumb that seem to work in most cases
No guarantee of optimality: another intractable problem

State Maps
"K-maps" are used to help visualize good encodings.
Adjacent states in the STD should be made adjacent in the map.

  Assignment 1               Assignment 2
  State  q2 q1 q0            State  q2 q1 q0
  S0     0  0  0             S0     0  0  0
  S1     1  0  1             S1     0  0  1
  S2     1  1  1             S2     0  1  0
  S3     0  1  0             S3     0  1  1
  S4     0  1  1             S4     1  1  1

State Maps and Counting Bit Changes
Bit-change heuristic: count the state-bit changes along each transition under each candidate assignment.

  Transition    Assignment 1   Assignment 2
  S0 -> S1      2              1
  S0 -> S2      3              1
  S1 -> S3      3              1
  S2 -> S3      2              1
  S3 -> S4      1              1
  S4 -> S1      2              2
  Total:        13             7
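A hypothetical helper (not part of the course materials) that reproduces these totals from the two assignments on the State Maps slide; the bit-change count for a transition is just the Hamming distance between the two state codes.

    assignment1 = {'S0': '000', 'S1': '101', 'S2': '111', 'S3': '010', 'S4': '011'}
    assignment2 = {'S0': '000', 'S1': '001', 'S2': '010', 'S3': '011', 'S4': '111'}
    transitions = [('S0', 'S1'), ('S0', 'S2'), ('S1', 'S3'),
                   ('S2', 'S3'), ('S3', 'S4'), ('S4', 'S1')]

    def bit_changes(code_a, code_b):
        # Hamming distance between two state codes.
        return sum(a != b for a, b in zip(code_a, code_b))

    def total_bit_changes(assignment):
        return sum(bit_changes(assignment[s], assignment[t]) for s, t in transitions)

    print(total_bit_changes(assignment1), total_bit_changes(assignment2))   # 13 7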

State Assignment
Alternative heuristics are based on input and output behavior as well as transitions.
Give adjacent assignments to:
  states that share a common next state (group 1's in the next-state map)
  states that share a common ancestor state (group 1's in the next-state map)
  states that have common output behavior (group 1's in the output map)

Heuristics for State Assignment
Successor/predecessor heuristics:
  High priority: S'3 and S'4 share a common successor state (S0)
  Medium priority: S'3 and S'4 share a common predecessor state (S'1)
  Low priority (common output behavior):
    0/0: S0, S'1, S'3
    1/0: S0, S'1, S'3, S'4

Heuristics for State Assignment (continued; figure only)

Another Example
  High priority: S'3, S'4; S'7, S'10
  Medium priority: S1, S2; S'3, S'4 (twice); S'7, S'10
  Low priority:
    0/0: S0, S1, S2, S'3, S'4, S'7
    1/0: S0, S1, S2, S'3, S'4, S'7, S'10

Example Continued
Two alternative assignments are shown at the left (figure).
  Choose the assignment S0 = 000
  Place the high-priority adjacency state pairs into the state map
  Repeat for the medium-priority adjacency pairs
  Repeat for any leftover states, using the low-priority scheme

Why Do These Heuristics Work?
They attempt to maximize adjacent groupings of 1's in the next-state and output functions.

General Approach to Heuristic State Assignment
All current methods are variants of this:
  1) Determine which states "attract" each other (weighted pairs)
  2) Generate constraints on codes (which codes should be in the same cube)
  3) Place codes on the Boolean cube so as to maximize the constraints satisfied (weighted sum)
Different weights make sense depending on whether we are optimizing for two-level or multi-level forms.
We can't consider all possible embeddings of state clusters in the Boolean cube:
  use heuristics for ordering the embedding
  to prune the search for the best embedding
  expand the cube (more state bits) to satisfy more constraints

One-hot State Assignment
  Simple
    Easy to encode and debug
  Small logic functions
    Each state function requires only its predecessor state bits as input
  Good for programmable devices
    Lots of flip-flops readily available
    Simple functions with small support (the signals they depend on)
  Impractical for large machines
    Too many states require too many flip-flops
    Decompose FSMs into smaller pieces that can be one-hot encoded
  Many slight variations to one-hot
    One-hot + all-0

Output-Based Encoding
Reuse outputs as state bits: use the outputs to help distinguish states.
Why create new functions for state bits when the outputs can serve as well?
Fits in nicely with synchronous Mealy implementations.

  Inputs       Present  Next    Outputs
  C  TL TS     State    State   ST  H   F
  0  -  -      HG       HG      0   00  10
  -  0  -      HG       HG      0   00  10
  1  1  -      HG       HY      1   00  10
  -  -  0      HY       HY      0   01  10
  -  -  1      HY       FG      1   01  10
  1  0  -      FG       FG      0   10  00
  0  -  -      FG       FY      1   10  00
  -  1  -      FG       FY      1   10  00
  -  -  0      FY       FY      0   10  01
  -  -  1      FY       HG      1   10  01

  HG = ST' H1' H0' F1 F0' + ST H1 H0' F1' F0
  HY = ST H1' H0' F1 F0' + ST' H1' H0 F1 F0'
  FG = ST H1' H0 F1 F0' + ST' H1 H0' F1' F0'
  FY = ST H1 H0' F1' F0' + ST' H1 H0' F1' F0

Output patterns are unique to states, so we do not need ANY state bits: implement 5 functions (one for each output) instead of 7 (the outputs plus 2 state bits).
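As a quick illustrative check (not from the slides), the next-state equations above can be evaluated against the output encoding of each state (HG = 00/10, HY = 01/10, FG = 10/00, FY = 10/01): each state should hold when ST = 0 and advance HG -> HY -> FG -> FY -> HG when ST = 1.

    # Encode each state by its light outputs (H1, H0, F1, F0).
    enc = {'HG': (0, 0, 1, 0), 'HY': (0, 1, 1, 0), 'FG': (1, 0, 0, 0), 'FY': (1, 0, 0, 1)}

    def next_state(st, h1, h0, f1, f0):
        n = lambda b: 1 - b    # complement
        hg = n(st) & n(h1) & n(h0) & f1 & n(f0) | st & h1 & n(h0) & n(f1) & f0
        hy = st & n(h1) & n(h0) & f1 & n(f0)    | n(st) & n(h1) & h0 & f1 & n(f0)
        fg = st & n(h1) & h0 & f1 & n(f0)       | n(st) & h1 & n(h0) & n(f1) & n(f0)
        fy = st & h1 & n(h0) & n(f1) & n(f0)    | n(st) & h1 & n(h0) & n(f1) & f0
        return {'HG': hg, 'HY': hy, 'FG': fg, 'FY': fy}

    for state, bits in enc.items():
        for st in (0, 1):
            active = [s for s, v in next_state(st, *bits).items() if v]
            print(state, 'ST=%d ->' % st, active)
    # Each state holds for ST=0 and advances around the HG/HY/FG/FY cycle for ST=1.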

Current State Assignment Approaches
  For tight encodings using close to the minimum number of state bits
    Best of 10 random seems to be adequate (averages as well as heuristics)
    Heuristic approaches are not even close to optimality
    Used in custom chip design
  One-hot encoding
    Easy for small state machines
    Generates small equations with easy-to-estimate complexity
    Common in FPGAs and other programmable logic
  Output-based encoding
    Ad hoc, no tools
    Most common approach taken by human designers
    Yields very small circuits for most FSMs

Sequential Logic Implementation Summary
  Implementation of sequential logic
    State minimization
    State assignment
  Implications for programmable logic devices
    When logic is expensive and FFs are scarce, optimization is highly desirable (e.g., gate logic, PLAs, etc.)
    In Xilinx devices, logic is bountiful (4- and 5-variable truth tables) and FFs are plentiful (2 per CLB), so optimization is not as crucial an issue as in other forms of programmable logic
    This makes sparse encodings like one-hot worth considering

Improving Cycle Time
  Retiming
  Parallelism
  Pipelining

Example: Vending Machine State Machine
  Moore machine: outputs associated with states
  Mealy machine: outputs associated with transitions
(State diagrams for both machines over the states 0¢, 5¢, 10¢, 15¢ with inputs N, D, and Reset are shown as figures: the Moore machine attaches the open output [1] to state 15¢, while the Mealy machine attaches it to the transitions into 15¢, e.g., N+D/1.)

State Machine Retiming
Moore vs. (async) Mealy machine, vending machine example:
  Moore: Open is asserted only when in state 15¢
  Mealy: Open is asserted when the last coin is inserted, on the transition leading to state 15¢

State Machine Retiming
  Retiming the Moore machine: faster generation of outputs
    Push the AND gate through the state FFs and synchronize with an output FF
    This is like computing Open in the prior state and delaying it one state time
  Synchronizing the Mealy machine: add a FF, delaying the output
  These two implementations have identical timing behavior

State Machine Retiming
Effect on the timing of the Open signal (Moore case)
(Timing diagram shown as a figure: after the clock edge and FF propagation delay the state changes, then Open follows after the output propagation delay plus setup; NOTE: the retimed Open calculation overlaps with the next-state calculation.)

State Machine Retiming
Timing behavior is the same, but are the implementations really identical?
  Compare the FF input in the retimed Moore implementation with the FF input in the synchronous Mealy implementation
  The only difference is in the don't-care case of a nickel and a dime inserted at the same time

Parallelism
Doing more than one thing at a time: optimization in hardware often involves using parallelism to trade between cost and performance.
Example: student final grade calculation:
  read mt1, mt2, mt3, project;
  grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × mt3 + 0.4 × project;
  write grade;
High-performance hardware implementation: as many operations as possible are done in parallel.

Parallelism
Is there a lower-cost hardware implementation?
  Different tree organization?
  Can factor out the multiply by 0.2: grade = 0.2 × (mt1 + mt2 + mt3) + 0.4 × project
  How about sharing operators (multipliers and adders)?
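For reference, here is a small illustrative comparison (not from the slides) of the two formulations: the flat sum uses four multipliers and three adders, while the factored form needs only two multipliers and three adders, at the cost of a longer adder chain feeding the 0.2 multiply.

    def grade_flat(mt1, mt2, mt3, project):
        # Direct form: 4 multiplies, 3 adds.
        return 0.2 * mt1 + 0.2 * mt2 + 0.2 * mt3 + 0.4 * project

    def grade_factored(mt1, mt2, mt3, project):
        # Factored form: 2 multiplies, 3 adds.
        return 0.2 * (mt1 + mt2 + mt3) + 0.4 * project

    print(grade_flat(80, 90, 70, 85), grade_factored(80, 90, 70, 85))
    # Both give the same grade (82.0, up to floating-point rounding).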

Pipelining Principle
Pipelining review from CS61C, by analogy to washing clothes:
  step 1: wash (20 minutes)
  step 2: dry (20 minutes)
  step 3: fold (20 minutes)
  60 minutes x 4 loads -> 4 hours done sequentially
  Overlapped (start washing load 2 while load 1 dries, and so on), 20 minutes per step -> 2 hours

Pipelining
(Pipeline chart of wash/dry/fold for loads 1-4 shown as a figure.)
  As the number of loads increases, the average time per load approaches 20 minutes
  Latency (time from start to end) for one load = 60 min
  Throughput = 3 loads/hour
  Pipelined throughput is roughly the number of pipe stages times the un-pipelined throughput

Pipelining
Unpipelined: assume the combinational logic (CL) delay T = 8 ns and T_FF (setup + clk-to-Q) = 1 ns, so F = 1 / (8 ns + 1 ns) = 111 MHz.
General principle: cut the CL block into pieces (stages) and separate them with registers.
Pipelined into two stages, assuming T1 = T2 = 4 ns:
  T' = 4 ns + 1 ns + 4 ns + 1 ns = 10 ns total latency
  F = 1 / (4 ns + 1 ns) = 200 MHz
  The CL block produces a new result every 5 ns instead of every 9 ns
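A tiny hypothetical calculator (not from the slides) for these numbers: the achievable clock is set by the slowest stage plus the FF overhead, which is also why the unequal-stage case mentioned on the next slide hurts.

    def clock_mhz(stage_delays_ns, t_ff_ns=1.0):
        # Clock period = slowest stage delay + FF overhead (setup + clk-to-Q).
        period_ns = max(stage_delays_ns) + t_ff_ns
        return 1000.0 / period_ns        # period in ns -> frequency in MHz

    print(clock_mhz([8.0]))              # unpipelined: ~111 MHz
    print(clock_mhz([4.0, 4.0]))         # two balanced 4 ns stages: 200 MHz
    print(clock_mhz([6.0, 2.0]))         # unbalanced stages: only ~143 MHz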

Limits on Pipelining
  Without FF overhead, the throughput improvement is proportional to the number of stages
  After many stages are added, FF overhead (the setup and clk-to-Q times) begins to dominate
  Other limiters of effective pipelining:
    Clock skew contributes to clock overhead
    Unequal stages
    FFs dominate cost
    Clock distribution power consumption
    Feedback (dependencies between loop iterations)

Pipelining Example
Computation graph: F(x): yi = a xi^2 + b xi + c, where x and y are assumed to be "streams"
  Divide the graph into 3 (nearly) equal stages
  Insert pipeline registers at the dashed lines (figure)
  Can we pipeline basic operators?
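To make the staging concrete, here is a hypothetical software model (the stage boundaries are an assumption, not necessarily the slide's dashed lines) of a 3-stage pipeline for y = a·x^2 + b·x + c on a stream of x values; each loop iteration represents one clock cycle, with s1 and s2 acting as the pipeline registers.

    def pipeline(xs, a, b, c):
        s1 = s2 = None                       # pipeline registers (None = bubble)
        results = []
        for x in list(xs) + [None, None]:    # two extra cycles to flush the pipe
            if s2 is not None:               # stage 3: final add
                ax2, bx = s2
                results.append(ax2 + bx + c)
            if s1 is not None:               # stage 2: the two multiplies
                x_sq, x_in = s1
                s2 = (a * x_sq, b * x_in)
            else:
                s2 = None
            # stage 1: square the incoming x
            s1 = (x * x, x) if x is not None else None
        return results

    print(pipeline([1, 2, 3], a=2, b=3, c=1))   # [6, 15, 28], one result per cycle after the fill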

Example: Pipelined Adder
Possible, but usually not done (arithmetic units can often be made sufficiently fast without internal pipelining).

Summary
  State machine retiming (vending machine example)
    The output function is very simple in this particular case
    But if the output takes a long time to compute relative to the next-state computation, retiming can be used to "balance" the two calculations and reduce the cycle time
  Parallelism
    Tradeoffs in cost and performance
    Time-multiplexed reuse of hardware reduces cost but sacrifices performance
  Pipelining
    Introduce registers to split the computation, reducing the cycle time and allowing parallel computation
    Trade latency (number of stage delays) for cycle-time reduction

Row Matching Example
State transition table:

         NS             output
  PS     x=0   x=1    x=0   x=1
  a      a     b       0     0
  b      c     d       0     0
  c      a     d       0     0
  d      e     f       0     1
  e      a     f       0     1
  f      g     f       0     1
  g      a     f       0     1

Row Matching Example (cont)
After merging g into e (their rows are identical), with references to g rewritten as e:

         NS             output
  PS     x=0   x=1    x=0   x=1
  a      a     b       0     0
  b      c     d       0     0
  c      a     d       0     0
  d      e     f       0     1
  e      a     f       0     1
  f      e     f       0     1

Reduced state transition table (after also merging f into d):

         NS             output
  PS     x=0   x=1    x=0   x=1
  a      a     b       0     0
  b      c     d       0     0
  c      a     d       0     0
  d      e     d       0     1
  e      a     d       0     1
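Finally, a small Python sketch (illustrative, not the course's tool) of the row-matching pass on this a..g example: merge any two states whose rows (next states and outputs) are identical, rewrite references to the merged state, and repeat until no rows match.

    fsm = {
        'a': (('a', 'b'), (0, 0)),
        'b': (('c', 'd'), (0, 0)),
        'c': (('a', 'd'), (0, 0)),
        'd': (('e', 'f'), (0, 1)),
        'e': (('a', 'f'), (0, 1)),
        'f': (('g', 'f'), (0, 1)),
        'g': (('a', 'f'), (0, 1)),
    }

    def row_match(fsm):
        fsm = dict(fsm)
        changed = True
        while changed:
            changed = False
            seen = {}                          # row contents -> representative state
            for s in sorted(fsm):
                row = fsm[s]
                if row in seen:                # identical row: merge s into its representative
                    rep = seen[row]
                    del fsm[s]
                    for t, (nxt, out) in fsm.items():
                        # rewrite every reference to s as a reference to rep
                        fsm[t] = (tuple(rep if n == s else n for n in nxt), out)
                    changed = True
                    break                      # rescan from the start after each merge
                seen[row] = s
        return fsm

    for s, (nxt, out) in sorted(row_match(fsm).items()):
        print(s, nxt, out)
    # g merges into e, then f merges into d, leaving the 5-state reduced table above.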