Continuous Retiming EECS 290A Sequential Logic Synthesis and Verification.


Outline
- Motivation
- Classical retiming
- Continuous retiming
- Experimental comparison

Motivation
- Retiming can reduce the clock period of a circuit
[Figure: example circuit before retiming (critical path has delay 4) and after retiming (critical paths have delay 2)]

Motivation (cont.)
- Previous algorithms for retiming require
  - computing latch-to-latch delays
  - solving an ILP problem
- The goal is to develop a more efficient algorithm that works directly on the circuit, without ILP

Classical Formulation
- During retiming, registers are moved over combinational nodes: $w_r(e_{u \to v}) = r(v) + w(e_{u \to v}) - r(u)$, where the retiming lag $r(v)$ is the number of registers moved from the outputs to the inputs of $v$.
- For each path $p: u \to v$, its weight $w(p)$ is the total number of registers on all of its edges.
- The clock period is the maximum delay over all zero-weight paths: $P = \max_{p:\, w(p) = 0} \{ d(p) \}$
- Matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices that are connected by a path that does not go through the host node: $W(u,v) = \min_{p:\, u \to v} \{ w(p) \}$ and $D(u,v) = \max_{p:\, u \to v,\ w(p) = W(u,v)} \{ d(p) \}$
Reference: C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry," Algorithmica, vol. 6, 1991, pp.
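The definitions of W and D above can be made concrete with a small sketch. It assumes the well-known trick of running an all-pairs shortest-path pass over lexicographically ordered pairs (register count, negated delay); the function name `wd_matrices`, the dictionary-based graph encoding, and the three-node example are illustrative assumptions, and the host node is ignored for simplicity.

```python
from itertools import product

def wd_matrices(delay, edges):
    """delay: dict node -> combinational delay d(v).
    edges: list of (u, v, w) with w = number of registers on edge u -> v.
    Assumes every cycle carries at least one register."""
    nodes = list(delay)
    inf = float("inf")
    # dist[u][v] = lexicographically smallest pair
    # (registers on the path, minus the delay of every node except v)
    dist = {u: {v: (inf, 0.0) for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = (0, 0.0)
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], (w, -delay[u]))
    # Floyd-Warshall; product() varies its first element slowest, so k is outermost
    for k, i, j in product(nodes, repeat=3):
        a, b = dist[i][k], dist[k][j]
        if a[0] != inf and b[0] != inf:
            cand = (a[0] + b[0], a[1] + b[1])
            if cand < dist[i][j]:
                dist[i][j] = cand
    W, D = {}, {}
    for u in nodes:
        for v in nodes:
            w, neg = dist[u][v]
            if w != inf:
                W[u, v] = w               # minimum register count from u to v
                D[u, v] = delay[v] - neg  # maximum delay among those paths
    return W, D

# Hypothetical example circuit, not taken from the slides
delay = {"a": 1, "b": 2, "c": 3}
edges = [("a", "b", 0), ("b", "c", 1), ("a", "c", 0), ("c", "a", 1)]
W, D = wd_matrices(delay, edges)
# Clock period = largest D(u, v) over pairs connected by a zero-weight path
period = max(D[p] for p in W if W[p] == 0)
```

Minimizing the pair lexicographically picks the fewest registers first and, among those paths, the largest accumulated delay, which is exactly the pair (W, D) defined above.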

Classical Formulation (cont.)
- $W(u,v)$ denotes the minimum latency, in clock cycles, for data flowing from $u$ to $v$
- $D(u,v)$ gives the maximum delay from $u$ to $v$ over all paths with the minimum latency
- The retiming labels for a clock period $P$ are computed by solving a Linear Programming problem:
  $r(u) - r(v) \le w(e_{u \to v}), \quad \forall e_{u \to v} \in E$
  $r(u) - r(v) \le W(u,v) - 1, \quad \forall (u,v): D(u,v) > P$
- The constraints ensure that, after retiming,
  - the latency of each edge is non-negative
  - each path whose delay is larger than the clock period has at least one register on it
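Because every constraint above has the form r(u) - r(v) <= c, feasibility can be checked with Bellman-Ford style relaxation instead of a general LP solver. The sketch below is one such check, not the implementation used in any of the cited papers; the function name `solve_difference_constraints` and the three-node loop (node delays 2, two registers on the edge c -> a) are illustrative assumptions.

```python
def solve_difference_constraints(nodes, constraints):
    """constraints: list of (u, v, c), each meaning r(u) - r(v) <= c.
    Returns a dict of integer lags, or None if the system is infeasible."""
    # Starting from all zeros is equivalent to a virtual source with
    # zero-weight edges to every node.
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in constraints:
            if r[v] + c < r[u]:
                r[u] = r[v] + c
                changed = True
        if not changed:
            return r
    return None  # still changing after |V| passes: negative cycle

nodes = ["a", "b", "c"]
# Edge constraints r(u) - r(v) <= w(e) for a -> b (0), b -> c (0), c -> a (2)
edge_cons = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
# Period constraints r(u) - r(v) <= W(u,v) - 1 for pairs with D(u,v) > 4
period4 = [("a", "c", -1), ("c", "b", 1), ("b", "a", 1)]
# Additional pairs whose D(u,v) = 4 exceeds a target period of 3
period3 = period4 + [("a", "b", -1), ("b", "c", -1), ("c", "a", 1)]

r4 = solve_difference_constraints(nodes, edge_cons + period4)
r3 = solve_difference_constraints(nodes, edge_cons + period3)
```

In this example a period of 4 is feasible while 3 is not: the loop has total delay 6 and only two registers, so no legal placement can split it into three stages.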

Implementations of Retiming
- Leiserson/Saxe compute the matrices, generate the constraints, and then solve the LP problem
- Shenoy/Rudell compute the matrix one column at a time
  - Reduced space requirements, still prohibitive runtime
- Sapatnekar proposed a way of utilizing the retiming/skew equivalence to reduce the number of constraints generated
Reference: S. S. Sapatnekar and R. B. Deokar, "Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits," IEEE Trans. CAD, vol. 15(10), Oct. 1996, pp.

Sapatnekar's Retiming Algorithm
- Find ASAP and ALAP skews for a feasible clock period
  - Use binary search to find a feasible clock period
- Perform min-delay retiming by moving latches to fit the timing window
- Perform min-area retiming under delay constraints by solving a reduced LP problem
  - The reduced set of constraints is generated using the skews
  - The LP problem is solved efficiently using a variation of the network simplex method
- Improvement: start by finding the maximum ratio using Howard's algorithm

Pan's Algorithm
- Definitions
- Pseudo-code
- Convergence
- Improvements
- Experiments

Definitions
- A circuit is an edge-weighted, node-weighted directed graph
  - The weight of a node, $d(v)$, is its combinational delay
  - The weight of an edge, $w(e)$, is its number of FFs
- Continuous retiming is a retiming in which the number of latches retimed is a continuous value (rather than an integer)
- The retimed edge weight is computed as before: $w_s(e_{u \to v}) = s(v) + w(e_{u \to v}) - s(u)$, where $s(v)$ are the continuous retiming lags.

Definitions (cont.)
- Definition. A circuit is retimed to a clock period $\varphi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ for each edge $e$, and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) \ge \varphi$.
- Definition. A circuit is c-retimed to a clock period $\varphi$ by a c-retiming $s$ if $w_s(e) \ge d(v) / \varphi$ for each edge $u \to v$.
- The definition of c-retiming enforces
  - non-negative edge weights
  - if $d(u_1) - d(u_2) \ge \varphi$, then $w_s(p) \ge 1$.
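The second consequence follows from summing the c-retiming condition along a path. The short derivation below is a sketch of that step; the path notation $v_0 \to v_1 \to \dots \to v_k$ is introduced here for illustration, and reading the path delay as the sum of $d(v_1), \dots, d(v_k)$ is an assumption about what the slide intends.

```latex
% Path p = v_0 -> v_1 -> ... -> v_k with edges e_i : v_{i-1} -> v_i
\begin{align*}
w_s(p) &= \sum_{i=1}^{k} w_s(e_i)
        = \sum_{i=1}^{k} \bigl( s(v_i) + w(e_i) - s(v_{i-1}) \bigr)
        = s(v_k) + w(p) - s(v_0), \\
w_s(p) &\ge \sum_{i=1}^{k} \frac{d(v_i)}{\varphi}
        \quad \text{since } w_s(e_i) \ge \frac{d(v_i)}{\varphi}, \\
\sum_{i=1}^{k} d(v_i) \ge \varphi
        &\;\Longrightarrow\; w_s(p) \ge 1 .
\end{align*}
```

Non-negativity of every edge weight follows from the same condition, because $d(v) / \varphi \ge 0$.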

Pseudo-code

    for each node v in N do
        if (v is a PI) s(v) = 0;
        else s(v) = -∞;
    for each i = 0 to |U| + 2
        done = true;
        for each non-PI node v_j in N do
            tmp = max over e: u → v_j of { s(u) – w(e) + d(v_j) / φ };
            if (v_j is a PO and tmp > 1) return failure;
            if (s(v_j) < tmp)
                s(v_j) = tmp; done = false;
        if (done == true)
            return success;  // c-retiming reached a fixed point
    return failure;
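The pseudo-code can be transcribed almost line for line into Python. The sketch below is one possible transcription, not the original author's implementation; the function name `c_retime`, the argument layout (`order`, `fanin`, `cut_size`), and the small loop circuit are illustrative assumptions.

```python
import math

def c_retime(order, pis, pos, fanin, delay, phi, cut_size):
    """order: nodes in relaxation order (ideally topological once loops are cut).
    fanin[v]: list of (u, w) pairs, one per edge u -> v carrying w FFs.
    Returns the continuous lags s as a dict, or None on failure."""
    s = {v: (0.0 if v in pis else -math.inf) for v in order}
    for _ in range(cut_size + 3):  # i = 0 .. |U| + 2, inclusive
        done = True
        for v in order:
            if v in pis:
                continue
            tmp = max((s[u] - w + delay[v] / phi for u, w in fanin[v]),
                      default=-math.inf)
            if v in pos and tmp > 1:
                return None        # a PO would arrive later than one period
            if s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s               # c-retiming reached a fixed point
    return None

# Hypothetical circuit: PI x feeds a loop a -> b -> c -> a, and c drives PO y
order = ["x", "a", "b", "c", "y"]
delay = {"x": 0, "a": 2, "b": 2, "c": 2, "y": 0}
fanin = {"a": [("x", 0), ("c", 2)], "b": [("a", 0)],
         "c": [("b", 0)], "y": [("c", 1)]}
ok = c_retime(order, {"x"}, {"y"}, fanin, delay, phi=4, cut_size=1)
bad = c_retime(order, {"x"}, {"y"}, fanin, delay, phi=2, cut_size=1)
```

For phi = 4 the relaxation settles after two passes; for phi = 2 the loop would need three registers but has only two, so the PO check fails in the first pass.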

Convergence
- Theorem. If the nodes are relaxed in topological order, the algorithm stops after at most |U| + 1 relaxation iterations if there is no positive cycle, where U is a cut that breaks all the loops.

Reduction to Classical Retiming
- Let $s$ be a c-retiming that achieves clock period $\varphi$. Let $r$ be the retiming defined as follows:
  [equation defining $r(v)$ from $s(v)$ not preserved in the transcript]
- Then $r$ can achieve a clock period less than $\varphi + D$, where $D$ is the largest combinational delay of a node.

Area Minimization
- The problem of minimizing the number of (fractional) FFs subject to a given clock period $\varphi$ is an LP:
  minimize $\sum c \, w_s(e)$
  subject to $w_s(e) \ge d(v) / \varphi$ for each $u \to v$.
- The dual of this problem is an uncapacitated min-cost flow problem
  - The flow graph is a network
  - The flow out of each node is the difference between its fanout count and fanin count
  - The cost of an edge is $w_1(e) = -w(e) + d(v) / \varphi$
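The appearance of fanout and fanin counts in the dual can be traced back to the objective. The sketch below expands the total FF count under the assumption of unit cost per edge; $\mathrm{FI}(v)$ and $\mathrm{FO}(v)$ are notation introduced here for the fanin and fanout edge sets of $v$.

```latex
\begin{align*}
\sum_{e:\, u \to v} w_s(e)
  &= \sum_{e:\, u \to v} \bigl( s(v) + w(e) - s(u) \bigr) \\
  &= \sum_{e} w(e) + \sum_{v} \bigl( |\mathrm{FI}(v)| - |\mathrm{FO}(v)| \bigr)\, s(v), \\
w_s(e) \ge \frac{d(v)}{\varphi}
  &\iff s(u) - s(v) \le w(e) - \frac{d(v)}{\varphi} = -\,w_1(e).
\end{align*}
```

Since $\sum_e w(e)$ is a constant, only the second term depends on $s$, and each node contributes in proportion to the difference between its fanin and fanout counts. The constraints are difference constraints whose right-hand sides are the negated edge costs $w_1(e)$, which is the structure that dualizes to a min-cost flow problem.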

Improvements
- Perform a "required time" c-retiming
  - In addition to the "arrival time" c-retiming
- Retime over circuits with choice nodes
  - Combines logic synthesis and c-retiming
- Heuristically minimize area
  - Leads to faster computation than solving ILP

Experimental Results
- Comparing the following three algorithms
  - P. Pan (ICCD ’96)
  - Sapatnekar/Deokar (TCAD ’96)
  - Maheshwari/Sapatnekar (TVLSI ’98)

P. Pan (ICCD’96) CPU time is measured on Sparc 5

Sapatnekar/Deokar (TCAD ’96) CPU time is measured on HP 735 workstation

Maheshwari/Sapatnekar (TVLSI ’98) CPU time is measured on DEC AXP system 3000/900 workstation

Conclusions
- Presented an alternative approach to retiming
- Compared it with other methods
- Proposed several improvements