Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification.


1 Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification

2 Outline
• Motivation
• Classical retiming
• Continuous retiming
• Experimental comparison

3 Motivation
• Retiming can reduce the clock cycle of the circuit
[Figure: example circuit. Original: critical path has delay 4. Retimed: critical paths have delay 2.]

4 Motivation (cont.)
• Previous algorithms for retiming require
  - Computing latch-to-latch delays
  - Solving an ILP problem
• The goal is to develop a more efficient algorithm that works directly on the circuit without ILP

5 Classical Formulation
• During retiming, registers are moved across combinational nodes: $w_r(e_{u \to v}) = r(v) + w(e_{u \to v}) - r(u)$, where the retiming lag $r(v)$ is the number of registers moved from the outputs to the inputs of $v$.
• For each path $p: u \leadsto v$, its weight $w(p)$ is the total number of registers on its edges.
• The minimum clock period is the maximum delay over all zero-weight paths: $P = \max_{p:\, w(p) = 0} \{ d(p) \}$
• Matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices that are connected by a path that does not go through the host node: $W(u,v) = \min_{p:\, u \leadsto v} \{ w(p) \}$ and $D(u,v) = \max_{p:\, u \leadsto v,\ w(p) = W(u,v)} \{ d(p) \}$
C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry", Algorithmica, vol. 6, 1991, pp. 5-35.
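The definitions of $W$ and $D$ above can be sketched in code. This is a minimal illustration, assuming an all-pairs shortest-path computation (Floyd-Warshall) on lexicographic pairs (register count, negated delay); the function name `wd_matrices`, the toy circuit, and the omission of the host node are assumptions made for the example, not details taken from the slides.

```python
from itertools import product

INF = float("inf")

def wd_matrices(delay, edges):
    """delay: {node: d(v)}; edges: list of (u, v, w), w = registers on the edge.
    Returns dicts W and D keyed by (u, v) for every connected pair."""
    nodes = list(delay)
    # dist[(u, v)] = smallest (registers, -delay of the path excluding v)
    dist = {(u, v): (INF, 0) for u, v in product(nodes, nodes)}
    for u in nodes:
        dist[(u, u)] = (0, 0)  # the empty path
    for u, v, w in edges:
        dist[(u, v)] = min(dist[(u, v)], (w, -delay[u]))
    # product() varies its first argument slowest, so k is the outer loop
    for k, i, j in product(nodes, nodes, nodes):
        a, b = dist[(i, k)], dist[(k, j)]
        if a[0] < INF and b[0] < INF:
            dist[(i, j)] = min(dist[(i, j)], (a[0] + b[0], a[1] + b[1]))
    W = {p: x for p, (x, y) in dist.items() if x < INF}
    D = {(u, v): delay[v] - y for (u, v), (x, y) in dist.items() if x < INF}
    return W, D

# Hypothetical three-node circuit used only for illustration.
delay = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b", 0), ("b", "c", 1), ("a", "c", 1), ("c", "a", 1)]
W, D = wd_matrices(delay, edges)
# Clock period: largest delay among zero-weight paths.
P = max(D[p] for p in W if W[p] == 0)
```

Minimizing the pair lexicographically picks the fewest registers first and, among those paths, the largest delay, which matches the definition of $D(u,v)$ as a maximum over minimum-weight paths.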

6 Classical Formulation (cont.)
• $W(u,v)$ is the minimum latency, in clock cycles, of data flowing from $u$ to $v$.
• $D(u,v)$ is the maximum delay from $u$ to $v$ over all paths with the minimum latency.
• The retiming labels for clock period $P$ are computed by solving a linear programming problem:
  $r(u) - r(v) \le w(e_{u \to v}), \quad \forall e_{u \to v} \in E$
  $r(u) - r(v) \le W(u,v) - 1, \quad \forall (u,v):\ D(u,v) > P$
• The constraints ensure that after retiming
  - the latency of each edge is non-negative
  - each path whose delay is larger than the clock period has at least one register on it
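Because every constraint above has the form $r(u) - r(v) \le c$, the system can be checked with a shortest-path relaxation instead of a general LP solver. The sketch below assumes a Bellman-Ford style solver; the function names, the three-node ring, and its hand-computed `W` and `D` tables are illustrative assumptions, not part of the slides.

```python
def retiming_constraints(edges, W, D, P):
    """Build (u, v, c) triples meaning r(u) - r(v) <= c."""
    cons = [(u, v, w) for u, v, w in edges]                       # edge constraints
    cons += [(u, v, W[u, v] - 1) for (u, v) in D if D[u, v] > P]  # period constraints
    return cons

def solve_difference_constraints(nodes, constraints):
    """Return {node: r} satisfying all constraints, or None if infeasible."""
    r = {v: 0 for v in nodes}  # acts as a virtual source tied to every node
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in constraints:
            if r[v] + c < r[u]:
                r[u] = r[v] + c
                changed = True
        if not changed:
            return r
    return None  # still relaxing after |V| passes: negative cycle

# Hypothetical ring a -> b -> c -> a with delays 1, 2, 3 and two registers on c -> a.
nodes = ["a", "b", "c"]
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
W = {("a", "a"): 0, ("b", "b"): 0, ("c", "c"): 0,
     ("a", "b"): 0, ("a", "c"): 0, ("b", "c"): 0,
     ("b", "a"): 2, ("c", "a"): 2, ("c", "b"): 2}
D = {("a", "a"): 1, ("b", "b"): 2, ("c", "c"): 3,
     ("a", "b"): 3, ("a", "c"): 6, ("b", "c"): 5,
     ("b", "a"): 6, ("c", "a"): 4, ("c", "b"): 6}

r = solve_difference_constraints(nodes, retiming_constraints(edges, W, D, 3))
retimed = {(u, v): r[v] + w - r[u] for u, v, w in edges}
too_fast = solve_difference_constraints(nodes, retiming_constraints(edges, W, D, 2))
```

For period 3 the solver moves one register from the `c -> a` edge to the `b -> c` edge; for period 2 it reports infeasibility because node `c` alone has delay 3.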

7 Implementations of Retiming
• Leiserson/Saxe compute the matrices, generate constraints, and then solve the LP problem
• Shenoy/Rudell compute the matrix one column at a time
  - Reduced space requirements, still prohibitive runtime
• Sapatnekar proposed a way of utilizing retiming/skew equivalence to reduce the number of constraints generated
S. S. Sapatnekar and R. B. Deokar, "Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits", IEEE Trans. CAD, vol. 15, no. 10, Oct. 1996, pp. 1237-1248.

8 Sapatnekar's Retiming Algorithm
• Find ASAP and ALAP skews for a feasible clock period
  - Use binary search to find a feasible clock period
• Perform min-delay retiming by moving latches to fit the timing window
• Perform min-area retiming under delay constraints by solving a reduced LP problem
  - The reduced set of constraints is generated using the skews
  - The LP problem is solved efficiently using a variation of the network simplex method
• Improvement: Start by finding the maximum ratio using Howard's algorithm
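The binary search step mentioned above can be sketched generically. This assumes a monotone feasibility oracle (if a period is feasible, every larger period is too); the function name `min_feasible_period` and the toy oracle are hypothetical and stand in for whatever skew-based feasibility check is actually used.

```python
def min_feasible_period(lo, hi, feasible, tol=1e-6):
    """Bisect on the clock period; `feasible(p)` must be monotone in p.
    Returns a feasible period within `tol` of the smallest feasible one."""
    if not feasible(hi):
        raise ValueError("upper bound is not feasible")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid   # mid works, try something smaller
        else:
            lo = mid   # mid fails, the answer is larger
    return hi

# Toy oracle for illustration only: pretend every period >= 3 is feasible.
best = min_feasible_period(0.0, 10.0, lambda p: p >= 3)
```

Returning the upper bound guarantees that the reported period is itself feasible, at the cost of being up to `tol` above the true minimum.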

9 Pan's Algorithm
• Definitions
• Pseudo-code
• Convergence
• Improvements
• Experiments

10 Definitions
• A circuit is an edge-weighted, node-weighted directed graph
  - Weight of a node, $d(v)$, is its combinational delay
  - Weight of an edge, $w(e)$, is its number of FFs
• Continuous retiming is a retiming in which the number of latches retimed is a continuous value (rather than an integer)
• The retimed edge weight is computed as before: $w_s(e_{u \to v}) = s(v) + w(e_{u \to v}) - s(u)$, where $s(v)$ are the continuous retiming lags.

11 Definitions
• Definition. A circuit is retimed to a clock period $\phi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) > \phi$.
• Definition. A circuit is c-retimed to a clock period of $\phi$ by a c-retiming $s$ if $w_s(e) \ge d(v)/\phi$ for each edge $u \to v$.
• The definition of c-retiming enforces
  - non-negative edge weights
  - if $d(u_1) - d(u_2) \ge \phi$, then $w_s(p) \ge 1$
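The c-retiming condition is easy to check mechanically. The following is a small sketch, assuming edges are given as `(u, v, w)` triples; the helper names and the example ring (delays 1, 2, 1 with one register, period 4) are made up for illustration.

```python
def retimed_weight(lag, u, v, w):
    """Edge weight after applying lags: lag(v) + w - lag(u)."""
    return lag[v] + w - lag[u]

def is_c_retimed(delay, edges, s, phi):
    """True if every edge u -> v satisfies w_s(e) >= d(v) / phi."""
    return all(retimed_weight(s, u, v, w) >= delay[v] / phi
               for u, v, w in edges)

# Hypothetical ring a -> b -> c -> a with a single register on c -> a.
delay = {"a": 1, "b": 2, "c": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 1)]

good = is_c_retimed(delay, edges, {"a": 0.25, "b": 0.75, "c": 1.0}, 4)
bad = is_c_retimed(delay, edges, {"a": 0.0, "b": 0.0, "c": 0.0}, 4)
```

With the fractional lags each edge carries exactly the fraction of a register that the delay of its head node requires, so `good` holds with equality on every edge, while the all-zero lags fail on the register-free edges.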

12 Pseudo-code
for each node v in N do
    if (v is a PI) s(v) = 0;
    else s(v) = -∞;
for each i = 0 to |U| + 2
    done = true;
    for each non-PI node v_j in N do
        tmp = max over edges e: u → v_j of { s(u) – w(e) + d(v_j) / φ }
        if (v_j is a PO and tmp > 1) return failure;
        if (s(v_j) < tmp)
            s(v_j) = tmp; done = false;
    if (done == true)
        return success; // c-retiming reached a fixed point
return failure;
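The pseudo-code translates almost line for line into Python. The version below is a sketch under stated assumptions: the circuit is passed as a delay dictionary plus `(u, v, w)` edge triples, nodes are visited in dictionary order (ideally a topological order of the circuit with feedback edges removed), and the iteration bound `max_iters` is supplied by the caller in place of $|U| + 2$. The function name `c_retime` and the toy circuit are not from the slides.

```python
import math

def c_retime(delay, edges, pis, pos, phi, max_iters):
    """Return the continuous lags s if period phi is feasible, else None."""
    fanin = {v: [] for v in delay}
    for u, v, w in edges:
        fanin[v].append((u, w))
    s = {v: (0.0 if v in pis else -math.inf) for v in delay}
    for _ in range(max_iters):
        done = True
        for v in delay:
            if v in pis:
                continue
            # Largest lower bound on s(v) implied by its fanin edges.
            tmp = max((s[u] - w + delay[v] / phi for u, w in fanin[v]),
                      default=-math.inf)
            if v in pos and tmp > 1:
                return None          # a primary output is pushed too far
            if s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s                 # fixed point reached
    return None                      # no convergence within the bound

# Hypothetical circuit: in -> a -> b -> c -> out, feedback c -> a with one register.
delay = {"in": 0, "a": 1, "b": 2, "c": 1, "out": 0}
edges = [("in", "a", 0), ("a", "b", 0), ("b", "c", 0),
         ("c", "a", 1), ("c", "out", 0)]

s4 = c_retime(delay, edges, {"in"}, {"out"}, 4, 3)
s2 = c_retime(delay, edges, {"in"}, {"out"}, 2, 3)
```

For period 4 the lags settle after one pass and a second pass confirms the fixed point; for period 2 the output node would need a lag above 1, so the routine reports failure.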

13 Convergence
• Theorem. If the nodes are relaxed in topological order, the algorithm stops after at most |U| + 1 relaxation iterations if there is no positive cycle, where U is a cut that breaks all the loops.

14 Reduction to Classical Retiming
• Let $s$ be a c-retiming that achieves clock period $\phi$. Let $r$ be the retiming defined as follows: [equation missing from transcript]
• Then $r$ can achieve a clock period less than $\phi + D$, where $D$ is the largest combinational delay of a node.

15 Area Minimization
• The problem of minimizing the number of (fractional) FFs subject to a given clock period $\phi$ is an LP:
  minimize $\sum_e w_s(e)$
  subject to $w_s(e) \ge d(v)/\phi$ for each edge $u \to v$
• The dual of this problem is an uncapacitated min-cost flow problem
  - The flow graph is a network
  - The flow out of each node is the difference between its fanout count and fanin count
  - The cost of an edge is $w_1(e) = -w(e) + d(v)/\phi$
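A short worked step shows where the fanout and fanin counts come from. This is a sketch that assumes the objective is the plain sum of retimed edge weights; the notation $\mathrm{fanin}(v)$ and $\mathrm{fanout}(v)$ for the number of incoming and outgoing edges of $v$ is introduced here for illustration.

```latex
\begin{align*}
\sum_{e_{u \to v} \in E} w_s(e_{u \to v})
  &= \sum_{e_{u \to v} \in E} \bigl( s(v) + w(e_{u \to v}) - s(u) \bigr) \\
  &= \sum_{e \in E} w(e)
     + \sum_{v} \bigl( \mathrm{fanin}(v) - \mathrm{fanout}(v) \bigr)\, s(v)
\end{align*}
```

Each edge contributes $+s(v)$ once for its head and $-s(u)$ once for its tail, so collecting terms per node gives the second line. The first sum is a constant of the circuit, which leaves a linear objective in the lags $s(v)$ whose coefficients are the differences between fanin and fanout counts.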

16 Improvements
• Perform a "required time" c-retiming
  - In addition to the "arrival time" c-retiming
• Retime over circuits with choice nodes
  - Combines logic synthesis and c-retiming
• Heuristically minimize area
  - Leads to faster computation than solving ILP

17 Experimental Results
• Comparing the following three algorithms
  - P. Pan (ICCD ’96)
  - Sapatnekar/Deokar (TCAD ’96)
  - Maheshwari/Sapatnekar (TVLSI ’98)

18 P. Pan (ICCD’96) CPU time is measured on Sparc 5

19 Sapatnekar/Deokar (TCAD ’96) CPU time is measured on HP 735 workstation

20 Maheshwari/Sapatnekar (TVLSI ’98) CPU time is measured on DEC AXP system 3000/900 workstation

21 Conclusions
• Presented an alternative approach to retiming
• Compared it with other methods
• Proposed several improvements

