Presentation is loading. Please wait.

Presentation is loading. Please wait.

David Culler Electrical Engineering and Computer Sciences

Similar presentations


Presentation on theme: "David Culler Electrical Engineering and Computer Sciences"— Presentation transcript:

1 EECS 150 - Components and Design Techniques for Digital Systems Lec 23 – Optimizing State Machines
David Culler Electrical Engineering and Computer Sciences University of California, Berkeley

2 Datapath vs Control Datapath Controller signals Control Points Datapath: Storage, FU, interconnect sufficient to perform the desired functions Inputs are Control Points Outputs are signals Controller: State machine to orchestrate operation on the data path Based on desired function and signals 11/16/2018 EECS 150, Fa07, Lec 23-optimize

3 Control Points Discussion
Control points on a bus? Control points on a register? Control points on a function unit? Signal? Relationship to STG? STT? Input A State / Output Input B 11/16/2018 EECS 150, Fa07, Lec 23-optimize

4 Sequential Logic Optimization
State Minimization Algorithms for State Minimization State, Input, and Output Encodings Minimize the Next State and Output logic Delay optimizations Retiming Parallelism and Pipelining (time permitting) 11/16/2018 EECS 150, Fa07, Lec 23-optimize

5 FSM Optimization in Context
Understand the word specification Draw a picture Derive a state diagram and Symbolic State Table Determine an implementation approach (e.g., gate logic, ROM, FPGA, etc.) Perform STATE MINIMIZATION Perform STATE ASSIGNMENT Map Symbolic State Table to Encoded State Tables for implementation (INPUT and OUTPUT encodings) You can specify a specific state assignment in your Verilog code through parameter settings 11/16/2018 EECS 150, Fa07, Lec 23-optimize

6 Finite State Machine Optimization
State Minimization Fewer states require fewer state bits Fewer bits require fewer logic equations Encodings: State, Inputs, Outputs State encoding with fewer bits has fewer equations to implement However, each may be more complex State encoding with more bits (e.g., one-hot) has simpler equations Complexity directly related to complexity of state diagram Input/output encoding may or may not be under designer control 11/16/2018 EECS 150, Fa07, Lec 23-optimize

7 FSM Optimization State Reduction: Example: Odd parity checker.
Motivation: lower cost fewer flip-flops in one-hot implementations possibly fewer flip-flops in encoded implementations more don’t cares in NS logic fewer gates in NS logic Simpler to design with extra states then reduce later. Example: Odd parity checker. Two machines - identical behavior. 11/16/2018 EECS 150, Fa07, Lec 23-optimize

8 Algorithmic Approach to State Minimization
Goal – identify and combine states that have equivalent behavior Equivalent States: Same output For all input combinations, states transition to same or equivalent states Algorithm Sketch 1. Place all states in one set 2. Initially partition set based on output behavior 3. Successively partition resulting subsets based on next state transitions 4. Repeat (3) until no further partitioning is required states left in the same set are equivalent Polynomial time procedure 11/16/2018 EECS 150, Fa07, Lec 23-optimize

9 State Minimization Example
Sequence Detector for 010 or 110 Input Next State Output Sequence Present State X=0 X=1 X=0 X=1 Reset S0 S1 S2 0 0 0 S1 S3 S4 0 0 1 S2 S5 S6 0 0 00 S3 S0 S0 0 0 01 S4 S0 S0 1 0 10 S5 S0 S0 0 0 11 S6 S0 S0 1 0 S0 S3 S2 S1 S5 S6 S4 1/0 0/0 0/1 Note: Mealy machine Alternative STT 11/16/2018 EECS 150, Fa07, Lec 23-optimize

10 Method of Successive Partitions
Input Next State Output Sequence Present State X=0 X=1 X=0 X=1 Reset S0 S1 S2 0 0 0 S1 S3 S4 0 0 1 S2 S5 S6 0 0 00 S3 S0 S0 0 0 01 S4 S0 S0 1 0 10 S5 S0 S0 0 0 11 S6 S0 S0 1 0 ( S0 S1 S2 S3 S4 S5 S6 ) ( S0 S1 S2 S3 S5 ) ( S4 S6 ) ( S0 S1 S2 ) ( S3 S5 ) ( S4 S6 ) ( S0 ) ( S1 S2 ) ( S3 S5 ) ( S4 S6 ) S1 is equivalent to S2 S3 is equivalent to S5 S4 is equivalent to S6 11/16/2018 EECS 150, Fa07, Lec 23-optimize

11 Minimized FSM State minimized sequence detector for 010 or 110
Input Next State Output Sequence Present State X=0 X=1 X=0 X=1 Reset S0 S1' S1' 0 0 0 + 1 S1' S3' S4' 0 0 X0 S3' S0 S0 0 0 X1 S4' S0 S0 1 0 S0 S1’ S3’ S4’ X/0 1/0 0/1 0/0 7 States reduced to 4 States 3 bit encoding replaced by 2 bit encoding 11/16/2018 EECS 150, Fa07, Lec 23-optimize

12 Another Example – Row Matching Method
4-Bit Sequence Detector: output 1 after each 4-bit input sequence consisting of the binary strings 0110 or 1010 11/16/2018 EECS 150, Fa07, Lec 23-optimize

13 State Transition Table
Group states with same next state and same outputs S’10 11/16/2018 EECS 150, Fa07, Lec 23-optimize

14 Iterate the Row Matching Algorithm
S’7 11/16/2018 EECS 150, Fa07, Lec 23-optimize

15 Iterate One Last Time S’3 S’4 11/16/2018
EECS 150, Fa07, Lec 23-optimize

16 Final Reduced State Machine
15 states (min 4 FFs) reduced to 7 states (min 3 FFs) 11/16/2018 EECS 150, Fa07, Lec 23-optimize

17 More Complex State Minimization
Multiple input example inputs here 10 01 11 00 S0 [1] S2 [1] S4 [1] S1 [0] S3 S5 present next state output state S0 S0 S1 S2 S S1 S0 S3 S1 S S2 S1 S3 S2 S S3 S1 S0 S4 S S4 S0 S1 S2 S S5 S1 S4 S0 S5 0 symbolic state transition table 11/16/2018 EECS 150, Fa07, Lec 23-optimize

18 State Reduction limits
The “row matching” method is not guaranteed to result in the optimal solution in all cases, because it only looks at pairs of states. For example: Another (more complicated) method guarantees the optimal solution: “Implication table” method: cf. Mano, chapter 9 What ‘rule of thumb” heuristics? 11/16/2018 EECS 150, Fa07, Lec 23-optimize

19 Minimized FSM Implication Chart Method Table of all pairs of stats
1st Eliminate incompatible states based on outputs Fill entry with implied equivalents based on next state Cross out cells where indexed chart entries are crossed out present next state output state S0 S0 S1 S2 S S1 S0 S3 S1 S S2 S1 S3 S2 S S3 S1 S0 S4 S S4 S0 S1 S2 S S5 S1 S4 S0 S5 0 S1 S2 S3 S4 S5 S0 S0-S1 S1-S3 S2-S2 S3-S4 S0-S1 S3-S0 S1-S4 S4-S5 minimized state table (S0==S4) (S3==S5) present next state output state S0' S0' S1 S2 S3' S1 S0' S3' S1 S3' S2 S1 S3' S2 S0' S3' S1 S0' S0' S3' 0 S0-S0 S1-S1 S2-S2 S3-S5 S1-S0 S3-S1 S2-S2 S4-S5 S0-S1 S3-S4 S1-S0 S4-S5 S4-S0 S5-S5 S1-S1 S0-S4 11/16/2018 EECS 150, Fa07, Lec 23-optimize

20 Minimizing Incompletely Specified FSMs
Equivalence of states is transitive when machine is fully specified But its not transitive when don't cares are present e.g., state output S0 – 0 S1 is compatible with both S0 and S2 S1 1 – but S0 and S2 are incompatible S2 – 1 No polynomial time algorithm exists for determining best grouping of states into equivalent sets that will yield the smallest number of final states 11/16/2018 EECS 150, Fa07, Lec 23-optimize

21 Minimizing States May Not Yield Best Circuit
Example: edge detector - outputs 1 when last two input changes from 0 to 1 X Q1 Q0 Q1+ Q – 00 [0] 11 [0] 01 [1] X’ X Q1+ = X (Q1 xor Q0) Q0+ = X Q1’ Q0’ 11/16/2018 EECS 150, Fa07, Lec 23-optimize

22 Another Implementation of Edge Detector
"Ad hoc" solution - not minimal but cheap and fast 00 [0] 10 [0] 01 [1] X’ X 11 [0] 11/16/2018 EECS 150, Fa07, Lec 23-optimize

23 Announcements Reading: K&B 8.1-2 HW 9 due wed
Last HW will go out next week TAs in lab this week as much as possible rather than official lab meetings Nov 29 Bring your question on a sheet of paper Down to the final stretch 11/16/2018 EECS 150, Fa07, Lec 23-optimize

24 State Assignment Choose bit vectors to assign to each “symbolic” state
With n state bits for m states there are 2n! / (2n – m)! [log n <= m <= 2n] 2n codes possible for 1st state, 2n–1 for 2nd, 2n–2 for 3rd, … Huge number even for small values of n and m Intractable for state machines of any size Heuristics are necessary for practical solutions Optimize some metric for the combinational logic Size (amount of logic and number of FFs) Speed (depth of logic and fanout) Dependencies (decomposition) 11/16/2018 EECS 150, Fa07, Lec 23-optimize

25 State Assignment Strategies
Possible Strategies Sequential – just number states as they appear in the state table Random – pick random codes One-hot – use as many state bits as there are states (bit=1 –> state) Output – use outputs to help encode states Heuristic – rules of thumb that seem to work in most cases No guarantee of optimality – another intractable problem 11/16/2018 EECS 150, Fa07, Lec 23-optimize

26 State Maps “K-maps” are used to help visualize good encodings.
Assignment State q2 q1 q0 S S S S S Assignment State q2 q1 q0 S S S S S “K-maps” are used to help visualize good encodings. Adjacent states in the STD should be made adjacent in the map. 11/16/2018 EECS 150, Fa07, Lec 23-optimize

27 State Maps and Counting Bit Changes
Bit Change Heuristic S0 1 S1 S2 S3 S4 S0 -> S1: S0 -> S2: S1 -> S3: S2 -> S3: S3 -> S4: S4 -> S1: Total: 11/16/2018 EECS 150, Fa07, Lec 23-optimize

28 State Assignment Alternative heuristics based on input and output behavior as well as transitions: Adjacent assignments to: states that share a common next state (group 1's in next state map) states that share a common ancestor state (group 1's in next state map) states that have common output behavior (group 1's in output map) 11/16/2018 EECS 150, Fa07, Lec 23-optimize

29 Heuristics for State Assignment
Successor/Predecessor Heuristics High Priority: S’3 and S’4 share common successor state (S0) Medium Priority: S’3 and S’4 share common predecessor state (S’1) Low Priority: 0/0: S0, S’1, S’3 1/0: S0, S’1, S’3, S’4 11/16/2018 EECS 150, Fa07, Lec 23-optimize

30 Heuristics for State Assignment
11/16/2018 EECS 150, Fa07, Lec 23-optimize

31 Another Example High Priority: S’3, S’4 S’7, S’10
Medium Priority: S1, S x S’3, S’4 S’7, S’10 Low Priority: 0/0: S0, S1, S2, S’3, S’4, S’7 1/0: S0, S1, S2, S’3, S’4, S’7, S10 11/16/2018 EECS 150, Fa07, Lec 23-optimize

32 Example Continued Two alternative assignments at the left
Choose assignment for S0 = 000 Place the high priority adjacency state pairs into the State Map Repeat for the medium adjacency pairs Repeat for any left over states, using the low priority scheme Two alternative assignments at the left 11/16/2018 EECS 150, Fa07, Lec 23-optimize

33 Why Do These Heuristics Work?
Attempt to maximize adjacent groupings of 1’s in the next state and output functions 11/16/2018 EECS 150, Fa07, Lec 23-optimize

34 General Approach to Heuristic State Assignment
All current methods are variants of this 1) Determine which states “attract” each other (weighted pairs) 2) Generate constraints on codes (which should be in same cube) 3) Place codes on Boolean cube so as to maximize constraints satisfied (weighted sum) Different weights make sense depending on whether we are optimizing for two-level or multi-level forms Can't consider all possible embeddings of state clusters in Boolean cube Heuristics for ordering embedding To prune search for best embedding Expand cube (more state bits) to satisfy more constraints 11/16/2018 EECS 150, Fa07, Lec 23-optimize

35 One-hot State Assignment
Simple Easy to encode, debug Small Logic Functions Each state function requires only predecessor state bits as input Good for Programmable Devices Lots of flip-flops readily available Simple functions with small support (signals its dependent upon) Impractical for Large Machines Too many states require too many flip-flops Decompose FSMs into smaller pieces that can be one-hot encoded Many Slight Variations to One-hot One-hot + all-0 11/16/2018 EECS 150, Fa07, Lec 23-optimize

36 Output-Based Encoding
Reuse outputs as state bits - use outputs to help distinguish states Why create new functions for state bits when output can serve as well Fits in nicely with synchronous Mealy implementations Inputs Present State Next State Outputs C TL TS ST H F 0 – – HG HG – 0 – HG HG – HG HY – – 0 HY HY – – 1 HY FG – FG FG – – FG FY – 1 – FG FY – – 0 FY FY – – 1 FY HG HG = ST’ H1’ H0’ F1 F0’ + ST H1 H0’ F1’ F0 HY = ST H1’ H0’ F1 F0’ + ST’ H1’ H0 F1 F0’ FG = ST H1’ H0 F1 F0’ + ST’ H1 H0’ F1’ F0’ HY = ST H1 H0’ F1’ F0’ + ST’ H1 H0’ F1’ F0 Output patterns are unique to states, we do not need ANY state bits – implement 5 functions (one for each output) instead of 7 (outputs plus 2 state bits) 11/16/2018 EECS 150, Fa07, Lec 23-optimize

37 Current State Assignment Approaches
For tight encodings using close to the minimum number of state bits Best of 10 random seems to be adequate (averages as well as heuristics) Heuristic approaches are not even close to optimality Used in custom chip design One-hot encoding Easy for small state machines Generates small equations with easy to estimate complexity Common in FPGAs and other programmable logic Output-based encoding Ad hoc - no tools Most common approach taken by human designers Yields very small circuits for most FSMs 11/16/2018 EECS 150, Fa07, Lec 23-optimize

38 Sequential Logic Implementation Summary
Implementation of sequential logic State minimization State assignment Implications for programmable logic devices When logic is expensive and FFs are scarce, optimization is highly desirable (e.g., gate logic, PLAs, etc.) In Xilinx devices, logic is bountiful (4 and 5 variable TTs) and FFs are many (2 per CLB), so optimization is not so crucial an issue as in other forms of programmable logic This makes sparse encodings like One-Hot worth considering 11/16/2018 EECS 150, Fa07, Lec 23-optimize

39 Improving Cycle Time Retiming Parallelism Pipelining 11/16/2018
EECS 150, Fa07, Lec 23-optimize

40 Example: Vending Machine State Machine
Moore machine outputs associated with state Mealy machine outputs associated with transitions [0] 10¢ 15¢ [1] N’ D’ + Reset D N N+D N’ D’ Reset’ Reset 10¢ 15¢ (N’ D’ + Reset)/0 D/0 D/1 N/0 N+D/1 N’ D’/0 Reset’/1 Reset/0 11/16/2018 EECS 150, Fa07, Lec 23-optimize

41 State Machine Retiming
Moore vs. (Async) Mealy Machine Vending Machine Example Open asserted only when in state 15 Open asserted when last coin inserted leading to state 15 11/16/2018 EECS 150, Fa07, Lec 23-optimize

42 State Machine Retiming
Retiming the Moore Machine: Faster generation of outputs Synchronizing the Mealy Machine: Add a FF, delaying the output These two implementations have identical timing behavior Push the AND gate through the State FFs and synchronize with an output FF Like computing open in the prior state and delaying it one state time 11/16/2018 EECS 150, Fa07, Lec 23-optimize

43 State Machine Retiming
Effect on timing of Open Signal (Moore Case) Clk FF prop delay State Open Out prop delay Out calc Plus set-up NOTE: overlaps with Next State calculation Retimed Open Open Calculation 11/16/2018 EECS 150, Fa07, Lec 23-optimize

44 State Machine Retiming
Timing behavior is the same, but are the implementations really identical? FF input in retimed Moore implementation FF input in synchronous Mealy implementation Only difference in don’t care case of nickel and dime at the same time 11/16/2018 EECS 150, Fa07, Lec 23-optimize

45 Parallelism Doing more than one thing at a time: optimization in h/w often involves using parallelism to trade between cost and performance Example, Student final grade calculation: read mt1, mt2, mt3, project; grade = 0.2  mt  mt2 + 0.2  mt  project; write grade; High performance hardware implementation: As many operations as possible are done in parallel 11/16/2018 EECS 150, Fa07, Lec 23-optimize

46 Parallelism Is there a lower cost hardware implementation? Different tree organization? Can factor out multiply by 0.2: How about sharing operators (multipliers and adders)? 11/16/2018 EECS 150, Fa07, Lec 23-optimize

47 Pipelining Principle Pipelining review from CS61C:
Analog to washing clothes: step 1: wash (20 minutes) step 2: dry (20 minutes) step 3: fold (20 minutes) 60 minutes x 4 loads  4 hours wash load1 load2 load3 load4 dry load1 load2 load3 load4 fold load1 load2 load3 load4 20 min overlapped  2 hours 11/16/2018 EECS 150, Fa07, Lec 23-optimize

48 Pipelining wash load1 load2 load3 load4 dry load1 load2 load3 load4
fold load1 load2 load3 load4 Increase number of loads, average time per load approaches 20 minutes Latency (time from start to end) for one load = 60 min Throughput = 3 loads/hour Pipelined throughput  # of pipe stages x un-pipelined throughput 11/16/2018 EECS 150, Fa07, Lec 23-optimize

49 Pipelining Assume T = 8 ns TFF(setup +clkq) = 1 ns
General principle: Cut the CL block into pieces (stages) and separate with registers: T’ = 4 ns + 1 ns + 4 ns +1 ns = 10 ns F = 1/(4 ns +1 ns) = 200 MHz CL block produces a new result every 5 ns instead of every 9 ns Assume T = 8 ns TFF(setup +clkq) = 1 ns F = 1/9 ns = 111 MHz Assume T1 = T2 = 4 ns 11/16/2018 EECS 150, Fa07, Lec 23-optimize

50 Limits on Pipelining Without FF overhead, throughput improvement proportional to # of stages After many stages are added. FF overhead begins to dominate: Other limiters to effective pipelining: Clock skew contributes to clock overhead Unequal stages FFs dominate cost Clock distribution power consumption feedback (dependencies between loop iterations) FF “overhead” is the setup and clk to Q times. 11/16/2018 EECS 150, Fa07, Lec 23-optimize

51 Pipelining Example Computation graph: F(x) = yi = a xi2 + b xi + c
x and y are assumed to be “streams” Divide into 3 (nearly) equal stages. Insert pipeline registers at dashed lines. Can we pipeline basic operators? 11/16/2018 EECS 150, Fa07, Lec 23-optimize

52 Example: Pipelined Adder
Possible, but usually not done … (arithmetic units can often be made sufficiently fast without internal pipelining) 11/16/2018 EECS 150, Fa07, Lec 23-optimize

53 State Machine Retiming Summary
Vending Machine Example Very simple output function in this particular case But if output takes a long time to compute vs. the next state computation time -- can use retiming to “balance” these calculations and reduce the cycle time Parallelism Tradeoffs in cost and performance Time reuse of hardware to reduce cost but sacrifice performance Pipelining Introduce registers to split computation to reduce cycle time and allow parallel computation Trade latency (number of stage delays) for cycle time reduction 11/16/2018 EECS 150, Fa07, Lec 23-optimize

54 Row Matching Example State Transition Table NS output
PS x=0 x=1 x=0 x=1 a a b b c d c a d d e f e a f f g f g a f 11/16/2018 EECS 150, Fa07, Lec 23-optimize

55 Row Matching Example (cont)
NS output PS x=0 x=1 x=0 x=1 a a b b c d c a d d e f e a f f e f Reduced State Transition Diagram NS output PS x=0 x=1 x=0 x=1 a a b b c d c a d d e d e a d 11/16/2018 EECS 150, Fa07, Lec 23-optimize


Download ppt "David Culler Electrical Engineering and Computer Sciences"

Similar presentations


Ads by Google