Fast Min-Register Retiming Through Binary Max-Flow


Fast Min-Register Retiming Through Binary Max-Flow
Aaron Hurst, Alan Mishchenko, Robert Brayton
FMCAD 2007

Retiming
- Retiming is the structural relocation of registers such that output functionality is preserved.
- Registers can be relocated to one of several ends:
  - Minimizing worst-case delay
  - Minimizing the number of registers
  - Either of the above under constraints
  - Optimally or heuristically
  - Other...?
- In this work, we look at optimal register minimization without delay constraints: "min-register" retiming.

Why Min-Register?
- Register count is critical in sequential verification: a "make or break" effect.
- Low investment
- Evidence of net utility:
  - Academic investigation: Cabodi, Quer, Somenzi, DAC '01
  - Commercial sequential verification tools!
- State representation: linear decrease
- Total state space: exponential decrease
- Reachable state space: potential decrease
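The "linear" versus "exponential" contrast above can be made concrete with a small worked formula. This is only an illustration: the symbols n (registers before retiming) and k (registers removed) are introduced here and do not appear on the slide.

```latex
% n registers give an n-bit state vector and 2^n total states.
% Removing k registers shortens the state vector by k bits (linear)
% and shrinks the total state space by a factor of 2^k (exponential).
\[
  \frac{|\text{states before}|}{|\text{states after}|}
  = \frac{2^{n}}{2^{\,n-k}}
  = 2^{k},
  \qquad \text{e.g. } k = 10 \;\Rightarrow\; 2^{10} = 1024 .
\]
```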

Orientation
- Consider one combinational frame of the circuit: a single directed acyclic graph of combinational logic.
- Nodes: logic gates
- Edges: pair-wise net connections
- Inputs: register outputs, primary inputs
- Outputs: register inputs, primary outputs

Cuts in a Frame
- Momentarily ignore all primary I/Os and their transitive fan-in/fan-out.
- Retiming = a complete cut of the DAG
- Number of registers = width of the cut
- The problem consists of finding a minimum cut.

Max-Flow Formulation
- Min-cut/max-flow duality: min-cut width = max-flow through the graph.
- Edges in the graph are assigned a capacity.
- Compute the flow from source to sink.
- Partition the graph into {S, R}, S ∩ R = ∅:
  - S: nodes s for which an augmenting path exists from the source to s
  - R: nodes r for which no augmenting path exists from the source to r
- That is, the reachable versus unreachable parts of the residual graph.
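The flow-then-partition step above can be sketched in a few lines. This is a minimal illustration, assuming a plain Edmonds-Karp max-flow (the slides do not say which max-flow routine the authors use); the function name `max_flow_min_cut` and the toy graph are hypothetical.

```python
from collections import defaultdict, deque

def max_flow_min_cut(edges, source, sink):
    """Edmonds-Karp max-flow (an assumed choice of algorithm).

    Returns (flow value, S), where S is the set of nodes still reachable
    from the source in the residual graph once no augmenting path remains.
    """
    cap = defaultdict(int)           # residual capacities
    adj = defaultdict(set)           # adjacency, including residual arcs
    for u, v, c in edges:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    def reachable():
        """BFS over arcs with positive residual capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            return flow, set(parent)     # S = residual-reachable nodes
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push

# Hypothetical frame: the source feeds gates a and b, which both feed c.
edges = [("s", "a", 1), ("s", "b", 1), ("a", "c", 1), ("b", "c", 1), ("c", "t", 1)]
flow, S = max_flow_min_cut(edges, "s", "t")
```

In this toy frame the two unit edges leaving the source are replaced by a single crossing edge after `c`, so the cut width drops from 2 to 1.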

Closest Min-Cut
- Insert registers between a node and any fan-out that lies in the other partition.
- Fast: remove old registers; insert new ones.
- The min-cut is not unique.
- Choosing the closest min-cut gives minimum movement of registers.
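Once a partition is known, locating the new registers is a simple scan over the circuit edges. The sketch below assumes the source-side partition is given as a set `S`; the helper name `registers_for_cut` and the example frame are hypothetical.

```python
def registers_for_cut(edges, S):
    """Find where registers go for a given source-side partition S.

    Returns the edges that cross from S into the other partition, and the
    distinct driving nodes (one register per driver if fan-outs share it).
    """
    crossing = sorted((u, v) for u, v in edges if u in S and v not in S)
    drivers = sorted({u for u, _ in crossing})
    return crossing, drivers

# Hypothetical frame: the source feeds gates a and b, which both feed c.
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "t")]
crossing, drivers = registers_for_cut(edges, S={"s", "a", "b", "c"})
old_crossing, old_drivers = registers_for_cut(edges, S={"s"})
```

Comparing `old_crossing` (two edges) with `crossing` (one edge) shows the register saving for this toy frame.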

Unconstrained Flow
- Width of minimum cut = capacity of crossing edges
- Effect of unconstrained (infinite-capacity) edges? They restrict the location of a finite cut.
- A useful tool...

Reverse Edges
- A min-cut guarantees that every path is cut at least once.
- Retiming requires that every path is cut exactly once.
- A cut must be crossed by a reverse edge to have a path with more than one crossing.
- Solution: use unconstrained flow to prevent reverse edges.
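A small experiment shows why the reverse-edge constraint matters. The sketch assumes one way of encoding it: for every circuit edge (u, v), add an infinite-capacity arc (v, u), so that any cut crossed backward becomes infinitely expensive. The max-flow routine, the encoding, and the example circuit are all illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict, deque

INF = float("inf")

def max_flow(arcs, source, sink):
    """Plain Edmonds-Karp on (u, v, capacity) arcs; returns the flow value."""
    cap, adj = defaultdict(int), defaultdict(set)
    for u, v, c in arcs:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push

# Hypothetical circuit in which the cheapest plain cut crosses s->x->y->t twice.
circuit = [("s", "x"), ("x", "y"), ("y", "t")]
for i in range(3):
    circuit += [("s", f"p{i}"), (f"p{i}", "y"), ("x", f"q{i}"), (f"q{i}", "t")]

unit_arcs = [(u, v, 1) for u, v in circuit]
reverse_arcs = [(v, u, INF) for u, v in circuit]   # assumed encoding of the constraint

plain = max_flow(unit_arcs, "s", "t")
constrained = max_flow(unit_arcs + reverse_arcs, "s", "t")
```

Here the plain min-cut has width 2 but cuts the path s, x, y, t twice (at s->x and at y->t), which is not a legal retiming; with the infinite reverse arcs the minimum rises to 4 and every path is cut exactly once.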

Fanout Sharing
- The flow graph is composed of arcs: a false model of register count.
- One register per hyperedge: "fanout sharing"
- Introduce a structure to simulate fanout sharing.
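One common way to charge a single unit per node output, rather than per fan-out arc, is node splitting: each node becomes an "in" and an "out" half joined by a unit-capacity arc, and fan-out arcs get infinite capacity. The slide does not spell out the structure the authors use, so the sketch below is an assumption; `edge_model`, `shared_model`, `SRC`, `SNK`, and the example circuit are hypothetical names.

```python
from collections import defaultdict, deque

INF = float("inf")

def max_flow(arcs, source, sink):
    """Plain Edmonds-Karp on (u, v, capacity) arcs; returns the flow value."""
    cap, adj = defaultdict(int), defaultdict(set)
    for u, v, c in arcs:
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        flow += push

def edge_model(circuit, sources, sinks):
    """One unit of capacity per arc: counts a register for every fan-out."""
    arcs = [(u, v, 1) for u, v in circuit]
    arcs += [("SRC", r, INF) for r in sources] + [(o, "SNK", INF) for o in sinks]
    return arcs

def shared_model(circuit, sources, sinks):
    """One unit of capacity per node output: fan-outs share a register."""
    nodes = {n for edge in circuit for n in edge}
    arcs = [((n, "in"), (n, "out"), 1) for n in nodes]
    arcs += [((u, "out"), (v, "in"), INF) for u, v in circuit]
    arcs += [("SRC", (r, "in"), INF) for r in sources]
    arcs += [((o, "out"), "SNK", INF) for o in sinks]
    return arcs

# Hypothetical frame: registers r1 and r2 feed gate g, which fans out to a, b, c.
circuit = [("r1", "g"), ("r2", "g"), ("g", "a"), ("g", "b"), ("g", "c")]
per_arc = max_flow(edge_model(circuit, ["r1", "r2"], ["a", "b", "c"]), "SRC", "SNK")
per_output = max_flow(shared_model(circuit, ["r1", "r2"], ["a", "b", "c"]), "SRC", "SNK")
```

The per-arc model sees no cut cheaper than the two existing registers, because cutting after `g` would cost three arcs; the per-output model finds the single shared register at the output of `g`.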

Single Iteration
- What does the final flow graph look like?
  - Reverse edges
  - Fanout sharing: one constraint per node output (not per edge)
- Unitary flow simplification:
  - Binary marking scheme
  - Flow computed on the original graph
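With unit capacities, the flow on each edge is either 0 or 1, so a single mark per edge can stand in for a separate residual graph. The slides do not detail the marking scheme, so the sketch below is only one plausible reading: a depth-first augmenting-path search on the original graph that records flow as a set of marked edges. It assumes no duplicate edges; `unit_max_flow` and the example graph are hypothetical.

```python
from collections import defaultdict

def unit_max_flow(edges, source, sink):
    """Max-flow on a unit-capacity graph using one binary mark per edge."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    used = set()                     # marked edges carry one unit of flow

    def augment(u, seen):
        if u == sink:
            return True
        seen.add(u)
        for v in succ[u]:            # push along an unmarked forward edge
            if (u, v) not in used and v not in seen and augment(v, seen):
                used.add((u, v))
                return True
        for w in pred[u]:            # or cancel flow on a marked incoming edge
            if (w, u) in used and w not in seen and augment(w, seen):
                used.discard((w, u))
                return True
        return False

    flow = 0
    while augment(source, set()):
        flow += 1
    return flow, used

# Hypothetical graph where the second augmenting path must cancel flow on a->b.
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
flow, used = unit_max_flow(edges, "s", "t")
```

The marks are updated only while a successful search unwinds, so a failed search leaves the flow untouched.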

Primary Inputs/Outputs
- Synchronization with the environment is flexible.
- Registers can be absorbed from / donated to the environment... or not.
- Constrain growth of additional initialization logic.
- "Switching off" desynchronization: exclude the TFO (transitive fan-out) of primary I/Os.
- Figure annotation: forward retiming past this node increases latency through the PI by 1.

Multiple Frames
- Thus far, we have only considered retiming within one combinational frame: at most one register is moved across a node.
- The global min-cut may stretch across multiple combinational frames, which may require moving multiple registers across a node.
- Solution: repeat over a single frame; terminate when there is no further change.

Forward and Backward
- Backward retiming must also be considered.
- The solution for the sequential core has already been found.
- Considers retiming in the fan-out cone of inputs.
- Backward retiming is identical, with the roles of PIs, POs, sources, and sinks reversed.

Overall Algorithm
- Start
- Forward retiming:
  - Block the fan-out cone of PIs?
  - Compute max-flow.
  - Improvement? Yes: implement the min-cut and repeat. No: continue to backward retiming.
- Backward retiming:
  - Block the fan-in cone of POs?
  - Compute max-flow.
  - Improvement? Yes: implement the min-cut and repeat. No: done.
- Forward retiming is preferred because of initial state computation.
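The control flow of the overall algorithm can be sketched as a small driver loop. Everything here is a stand-in: `forward_step` and `backward_step` represent "compute max-flow and implement the min-cut" for one frame, and the toy example models a circuit by its register count alone, which is not how a real tool would represent it.

```python
def min_register_retiming(circuit, forward_step, backward_step, num_registers):
    """Repeat single-frame steps until neither direction reduces registers.

    forward_step / backward_step are hypothetical callables that return a
    retimed circuit; num_registers returns the register count of a circuit.
    """
    for step in (forward_step, backward_step):      # forward retiming first
        while True:
            candidate = step(circuit)
            if num_registers(candidate) >= num_registers(circuit):
                break                               # no improvement: next phase
            circuit = candidate                     # accept the new min-cut
    return circuit

# Toy stand-ins: a "circuit" is just its register count.
result = min_register_retiming(
    10,
    forward_step=lambda n: max(n - 2, 7),   # forward moves bottom out at 7
    backward_step=lambda n: max(n - 1, 5),  # backward moves bottom out at 5
    num_registers=lambda n: n,
)
```

Because each accepted step strictly lowers the register count, the loop can also be stopped early and the current circuit kept.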

Asymptotic Analysis
- Minimum-register retiming can also be solved using other methods.
- Original formulation: LP
- The competition: min-cost flow using cost scaling [Goldberg 97]
- Our algorithm:
  - A single iteration is limited by maximum flow.
  - The total number of iterations is bounded by |R|.
  - Or, using unitary flow simplification...

Scalability
- Each iteration is strictly better than the previous.
- Runtime can be bounded and any intermediate result accepted, at the low, low cost of suboptimality.
- Figure: register savings per iteration.
- The "real" improvement over previous techniques.

Experimental Results
- Applied to OpenCores designs
- Reduction in number of registers: average = -11%, maximum = -62.5%
- Cost in delay: average = +25.8%
- Runtime is 5x faster than the minimum-cost formulation.
- <0.01s for 70% of benchmarks

Conclusions
- Max-flow approach to minimizing register count
- Optimal solution with minimum register movement
- Handles different models of environment synchronization
- Faster than existing methods:
  - Algorithmically and practically easier network problem
  - Allows simplification via binary marking
- Scalable