Fast Min-Register Retiming Through Binary Max-Flow

1 Fast Min-Register Retiming Through Binary Max-Flow
Aaron Hurst, Alan Mishchenko, Robert Brayton. FMCAD 2007

2 Retiming
Retiming is the structural relocation of registers such that output functionality is preserved
Registers can be relocated to one of several ends:
  Minimizing worst-case delay
  Minimizing the number of registers
  Either of the above under constraints
  Optimally or heuristically
  Other…?
In this work, we look at optimal register minimization without delay constraints: “Min-register”

3 Why Min-Register?
Register count is critical in sequential verification
“Make or break” effect
Low investment
Evidence of net utility:
  Academic investigation: Cabodi, Quer, Somenzi, DAC ’01
  Commercial sequential verification tools!
State representation: linear decrease
Total state space: exponential decrease
Reachable state space: potential decrease

4 Orientation
Consider one combinational frame of the circuit
A single directed acyclic graph of combinational logic
  Nodes: logic gates
  Edges: pair-wise net connections
  Inputs: register outputs, primary inputs
  Outputs: register inputs, primary outputs
[Figure labels: register inputs, primary outputs, primary inputs, register outputs]

5 Cuts in a Frame
Momentarily ignore all primary I/Os and their transitive fan-in/fan-out
Retiming = a complete cut of the DAG
Number of registers = width of the cut
Problem consists of finding the minimum cut

6 Max-Flow Formulation
Min-cut/Max-flow Duality
  Edges in the graph are assigned a capacity
  Min-cut width = max-flow through the graph
Compute flow
Partition the graph into {S, R}, S ∩ R = ∅
  s ∈ S: an augmenting path exists from the source to s
  r ∈ R: no augmenting path exists from the source to r
  Reachable versus unreachable in the residual graph
[Figure labels: source, sink]
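The partition described on this slide can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it uses a textbook Edmonds-Karp max-flow (an assumption; the slides do not name the max-flow routine), and the function name `max_flow_partition` and the example graph are hypothetical. After the flow is maximal, S is the set of nodes still reachable from the source in the residual graph and R is everything else.

```python
from collections import deque

def max_flow_partition(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns (flow value, S, R)."""
    # residual[u][v] = remaining capacity on arc u -> v
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)

    def reachable():
        # Breadth-first search over arcs with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            break  # no augmenting path is left: the flow is maximal
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

    S = set(parent)           # an augmenting path from the source exists
    R = set(residual) - S     # no augmenting path from the source exists
    return flow, S, R

# Hypothetical example: two paths merge at c, then a chain c -> d -> sink.
example = {("source", "a"): 1, ("source", "b"): 1, ("a", "c"): 1,
           ("b", "c"): 1, ("c", "d"): 1, ("d", "sink"): 1}
flow, S, R = max_flow_partition(example, "source", "sink")
```

In this example both c -> d and d -> sink are minimum cuts of width 1; the reachability rule selects c -> d, the one nearer the source.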

7 Closest Min-Cut
Insert registers between a node and any fan-out that lies in the other partition
Fast: remove old registers; insert new ones
Min-cut is not unique
Minimum movement of registers
[Figure labels: source, sink]
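As a small sketch of the insertion rule, assume the partition S is already known and the netlist is given as a fan-out dictionary. Both the data layout and the name `register_positions` are hypothetical. A register goes at the output of every node in S that drives at least one node outside S.

```python
def register_positions(fanout, S):
    """Map each driver in S to the fan-outs it feeds in the other partition."""
    positions = {}
    for node in sorted(S):
        crossing = [v for v in fanout.get(node, []) if v not in S]
        if crossing:
            positions[node] = crossing
    return positions

# Hypothetical netlist: c drives both d and e across the cut.
fanout = {"source": ["a", "b"], "a": ["c"], "b": ["c"],
          "c": ["d", "e"], "d": ["sink"], "e": ["sink"]}
positions = register_positions(fanout, {"source", "a", "b", "c"})
```

Here the only driver with fan-out across the cut is c, so the new registers sit between c and its fan-outs d and e.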

8 Unconstrained Flow
Width of minimum cut = capacity of crossing edges
Effect of unconstrained edges?
Restrict location of finite cut
A useful tool…
[Figure: example graph with unit-capacity and unconstrained (∞) edges]

9 Reverse Edges
Min-cut guarantees every path will be cut at least once
Retiming requires that every path is cut exactly once
A cut must be crossed by a reverse edge to have a path with more than one crossing
Solution: Use unconstrained flow to prevent reverse edges
[Figure labels: R1, R2, R3 and R1’, R2’, R3’]
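A tiny worked example may help show why reverse edges matter. The sketch below is illustrative only: it finds the minimum cut by brute-force enumeration (practical only for toy graphs, and not the method of the talk), the graph and its non-unit capacities are invented so that the plain minimum cut is crossed by a reverse edge, and all function names are hypothetical. Adding an infinite-capacity arc v -> u for every arc u -> v makes any such cut infinitely expensive.

```python
import math
from itertools import combinations

def min_cut(capacity, source, sink):
    """Brute-force minimum source/sink cut. Returns (width, S)."""
    inner = sorted({n for arc in capacity for n in arc} - {source, sink})
    best_width, best_S = math.inf, None
    for k in range(len(inner) + 1):
        for extra in combinations(inner, k):
            S = {source, *extra}
            width = sum(c for (u, v), c in capacity.items()
                        if u in S and v not in S)
            if width < best_width:
                best_width, best_S = width, S
    return best_width, best_S

def add_reverse_arcs(capacity):
    """For every arc u -> v, add an unconstrained arc v -> u."""
    guarded = dict(capacity)
    for (u, v) in capacity:
        guarded[(v, u)] = math.inf
    return guarded

def registers_on_path(path, S):
    """Count how often a path crosses from S into the other partition."""
    return sum(1 for u, v in zip(path, path[1:]) if u in S and v not in S)

# Invented capacities; larger values stand for bundles of parallel unit paths.
capacity = {("source", "a"): 1, ("source", "b"): 3, ("a", "b"): 1,
            ("a", "sink"): 3, ("b", "sink"): 1}
path = ["source", "a", "b", "sink"]

plain_width, plain_S = min_cut(capacity, "source", "sink")
guarded_width, guarded_S = min_cut(add_reverse_arcs(capacity), "source", "sink")
```

Without the reverse arcs the cheapest cut is S = {source, b}, which the path source, a, b, sink crosses twice, so that path would receive two registers while source, b, sink receives one. With the reverse arcs that cut has infinite width and the minimum becomes a cut that every path crosses exactly once.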

10 Fanout Sharing
Flow graph is composed of arcs
False model of register count
One register per hyperedge
“Fanout sharing”
Introduce a structure to simulate fanout sharing
[Figure: flow structure with unit capacities]
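One common way to model a hyperedge in a flow graph is node splitting: each node gets a unit-capacity arc to a private output node, and the fan-out arcs leaving that output node are unconstrained. Whether this matches the structure drawn on the slide is an assumption. The sketch below compares it with the plain arc model, using a simple DFS-based Ford-Fulkerson; all names and the example netlist are hypothetical.

```python
import math

def max_flow(capacity, source, sink):
    """Ford-Fulkerson with DFS augmenting paths (needs a finite cut to exist)."""
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual.setdefault(v, {}).setdefault(u, 0)
        residual[u][v] += c

    def augment(u, limit, seen):
        if u == sink:
            return limit
        seen.add(u)
        for v, c in residual[u].items():
            if c > 0 and v not in seen:
                pushed = augment(v, min(limit, c), seen)
                if pushed > 0:
                    residual[u][v] -= pushed
                    residual[v][u] += pushed
                    return pushed
        return 0

    total = 0
    while True:
        pushed = augment(source, math.inf, set())
        if pushed == 0:
            return total
        total += pushed

def arc_model(fanout):
    """One unit of capacity per arc: counts one register per fan-out arc."""
    return {(u, v): 1 for u, outs in fanout.items() for v in outs}

def hyperedge_model(fanout, source):
    """One unit of capacity per node output, shared by all of its fan-outs."""
    capacity = {}
    for u, outs in fanout.items():
        if u == source:
            driver = u                      # arcs leaving the source stay unconstrained
        else:
            driver = (u, "out")             # private output node for u
            capacity[(u, driver)] = 1       # the single shared register slot
        for v in outs:
            capacity[(driver, v)] = math.inf
    return capacity

# Hypothetical netlist: d has three fan-outs.
fanout = {"source": ["p", "q", "r"], "p": ["d"], "q": ["d"], "r": ["d"],
          "d": ["x", "y", "z"], "x": ["sink"], "y": ["sink"], "z": ["sink"]}
arc_registers = max_flow(arc_model(fanout), "source", "sink")
shared_registers = max_flow(hyperedge_model(fanout, "source"), "source", "sink")
```

In the arc model every cut of this netlist has width 3, while in the node-output model a single register at the output of d, shared by x, y, and z, is enough.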

11 Single Iteration
What does the final flow graph look like?
  Reverse edges
  Fanout sharing
  One constraint per node output (not edge)
Unitary Flow Simplification
  Binary marking scheme
  Flow computed on original graph
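The idea behind the unitary simplification can be illustrated as follows: when every capacity is 1, the flow on an arc is either 0 or 1, so a single mark per arc replaces the residual graph and the search runs directly on the original netlist. The sketch below is a simplified, arc-based variant written for illustration; it does not model the per-node-output constraint listed above, and the function name and data layout are hypothetical.

```python
def unit_max_flow(fanout, source, sink):
    """Unit-capacity max-flow using one binary mark per arc."""
    fanin = {}
    for u, outs in fanout.items():
        for v in outs:
            fanin.setdefault(v, []).append(u)

    used = set()  # arcs currently carrying one unit of flow

    def augment(u, seen):
        if u == sink:
            return True
        seen.add(u)
        # Forward step: push a unit along an unmarked arc.
        for v in fanout.get(u, []):
            if (u, v) not in used and v not in seen and augment(v, seen):
                used.add((u, v))
                return True
        # Backward step: cancel a unit on a marked incoming arc.
        for w in fanin.get(u, []):
            if (w, u) in used and w not in seen and augment(w, seen):
                used.discard((w, u))
                return True
        return False

    flow = 0
    while augment(source, set()):
        flow += 1
    return flow, used

# Hypothetical diamond with a cross arc a -> b that must be cancelled.
diamond = {"source": ["a", "b"], "a": ["b", "sink"], "b": ["sink"]}
flow, used = unit_max_flow(diamond, "source", "sink")
```

The first augmenting path found is source, a, b, sink; the second cancels the unit on a -> b, leaving two arc-disjoint paths.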

12 Primary Inputs/Outputs
Synchronization with environment is flexible
Registers can be absorbed from / donated to the environment… or not
Constrain growth of additional initialization logic
“Switching off” desynchronization: exclude TFO of primary I/Os
[Figure: logic between source and sink with PI and PO; forward retiming past the marked node increases latency through the PI by 1]

13 Multiple Frames
Thus far, we have only considered retiming within one combinational frame
  At most one register is moved across a node
Global min-cut may stretch across multiple combinational frames
  May require moving multiple registers across a node
Solution: Repeat over single frame
  Terminate when no further change
[Figure: repeated Logic blocks]

14 Forward and Backward
Backward retiming must also be considered
Solution for sequential core has already been found
Considers retiming in fan-out cone of inputs
Backward retiming is identical, with roles of PIs, POs, sources, and sinks reversed
[Figure: repeated Logic blocks]

15 Overall Algorithm
[Flowchart]
Start
Forward retiming: Block fan-out cone of PIs? Compute max-flow. Improvement? Yes: implement min-cut and repeat. No: continue with backward retiming.
Backward retiming: Block fan-in cone of POs? Compute max-flow. Improvement? Yes: implement min-cut and repeat. No: done.
Forward retiming is preferred because of initial state computation
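The control flow of this flowchart can be sketched as two fixed-point loops. Everything below is a hypothetical skeleton: `forward_step` and `backward_step` stand in for one max-flow/min-cut iteration in each direction, `count_registers` for the cost measure, and the toy functions reduce the "circuit" to a plain register count just to exercise the loop.

```python
def retime_min_registers(circuit, forward_step, backward_step, count_registers):
    """Run the forward phase to a fixed point, then the backward phase."""
    for step in (forward_step, backward_step):
        while True:
            candidate = step(circuit)      # one max-flow / min-cut iteration
            if count_registers(candidate) >= count_registers(circuit):
                break                      # no improvement: leave this phase
            circuit = candidate            # improvement: implement the min-cut
    return circuit

# Toy stand-ins: the circuit is represented only by its register count.
def toy_forward(n):
    return max(n - 2, 6)   # forward moves save 2 registers until 6 remain

def toy_backward(n):
    return max(n - 1, 5)   # backward moves save 1 register until 5 remain

result = retime_min_registers(11, toy_forward, toy_backward, lambda n: n)
```

With these toy steps the count goes 11, 9, 7, 6 in the forward phase and then 5 in the backward phase. Because each accepted iteration strictly reduces the count, either loop could also be stopped early and the intermediate circuit kept.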

16 Asymptotic Analysis
Minimum-register retiming can also be solved using other methods
  Original formulation: LP
  The competition: min-cost flow using cost scaling [Goldberg 97]
Our algorithm:
  Single iteration limited by maximum flow
  Total number of iterations is bounded by |R|
  Or, using unitary flow simplification…

17 Scalability
Each iteration is strictly better than the previous
Runtime can be bounded and any intermediate result accepted
  At the low, low cost of suboptimality
The “real” improvement over previous techniques
[Figure: Register Savings per Iteration]

18 Experimental Results
Applied to OpenCores designs
Reduction in number of registers
  Average = -11%
  Maximum = -62.5%
Cost in delay
  Average = +25.8%
Runtime is 5x faster than minimum-cost formulation
  <0.01s for 70% of benchmarks

19 Conclusions
Max-flow approach to minimizing register count
Optimal solution with minimum register movement
Handles different models of environment synchronization
Faster than existing methods
  Algorithmically and practically easier network problem
  Allows simplification via binary marking
Scalable

