Combining Technology Mapping and Retiming EECS 290A Sequential Logic Synthesis and Verification.


Outline
• Motivation
• Technology mapping for combinational circuits
• Generalizing the concept of combinational delay to sequential circuits using the concept of l-value
• Technology mapping for sequential circuits
  - Computation of cuts
  - Search for the optimum-delay solution
  - Computation of optimum l-values
  - Constructing the solution
  - Retiming for optimum delay

Traditional Tech Mapping Approach
• Cut sequential circuit at the latch boundary
• Optimize and map the combinational part
  - Pros: Preserves latch encoding
  - Cons: Potentially suboptimal
• (Optional) Retime the mapped circuit
[Figure: sequential circuit drawn as a Logic block and a Latches block, with terminals PI, PO, LI, and LO]

Motivating Example: LUT Size = 3
[Figure: circuit with inputs i1, i2, nodes a, b, c, and output f, shown before and after transformation; labels: mapping, retiming, 2 LUTs, 1 LUT]

Basic Mapping: Overview
• Pre-compute truth tables of gates (supergates)
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts
• Map network for delay
• Recover area using heuristics
• Select final mapping

What is Mapping?
• Mapping expresses functions using gates
[Figure: network with inputs x1 to x5 and outputs z1 to z3]

Basic Mapping: AND-INV Graphs
• F(a,b,c,d) = ab + d(ac' + bc) (4 levels)
• F(a,b,c,d) = ac'(b'd')' + bc(a'd')' = ac'(b + d) + bc(a + d) (7 nodes, 3 levels)
[Figure: Karnaugh maps over ab and cd, and the two AIGs built from inputs a, b, c, d]
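As a quick sanity check, the two factorizations on this slide can be compared exhaustively over all 16 input assignments. This is a minimal sketch; the function names are illustrative and do not come from the slides.

```python
from itertools import product

def f_factored(a, b, c, d):
    # F = ab + d(ac' + bc)
    return (a and b) or (d and ((a and not c) or (b and c)))

def f_balanced(a, b, c, d):
    # F = ac'(b + d) + bc(a + d)
    return (a and not c and (b or d)) or (b and c and (a or d))

def equivalent():
    # Compare both forms on every assignment of (a, b, c, d).
    return all(
        bool(f_factored(*bits)) == bool(f_balanced(*bits))
        for bits in product([False, True], repeat=4)
    )

print(equivalent())
```

Both forms compute the same function; they differ only in structure, which is what changes the node and level counts of the resulting AIG.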

Basic Mapping: Computing AIG
• Technology-independent synthesis
  - Any synthesis flow can be used
• Constructing AIG from factored forms
  - SOPs are factored using algebraic factoring
• Balancing AIG
  - Reduces delay
[Figure: AIG with inputs x1 to x5 and outputs z1 to z3; node n is labeled Fn = x2 x3' x4]

Basic Mapping: Cuts
• Definition. A cut C for a node n is a set of nodes such that every path from the primary inputs to n passes through a node in C
  - The node itself is an elementary cut
  - k-feasible cuts are cuts containing at most k nodes
  - The average number of 5-feasible cuts in benchmarks is ~20 per node
[Figure: node n above inputs x1, x2, x3]

Basic Mapping: Computing Cuts
• All k-feasible cuts are computed in one pass over the AIG
  - Assign elementary cuts for primary inputs
  - For each internal node
    - merge the cut sets of children while removing duplicated cuts
    - add the elementary cut composed of the node itself
• Example: compute all 2-feasible cuts of node n
  - Cuts for node p = {{p}, {s, x2}, {x1, x2}}
  - Cuts for node q = {{q}, {x2, t}, {x2, x3}}
  - Cuts for node n = ({{p}, {s, x2}, {x1, x2}} ⊗ {{q}, {x2, t}, {x2, x3}}) ∪ {{n}} = {{n}, {p, q}, {p, x2, t}, {p, x2, x3}, …}, where ⊗ takes the pairwise unions of cuts
  - 2-feasible cuts for node n = {{n}, {p, q}}
[Figure: node n with fanins p and q, internal nodes s and t, and inputs x1, x2, x3]
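The one-pass procedure above can be sketched in a few lines of Python. The sketch assumes a two-input AIG given as a dictionary from node to fanin pair and a topological order; the tiny network used here (t = x2 & x3, q = x1 & t) is a hypothetical example, not the figure from the slide.

```python
def enumerate_cuts(fanins, order, k):
    """Return {node: set of k-feasible cuts}, each cut a frozenset of node names."""
    cuts = {}
    for node in order:                      # order must be topological
        if node not in fanins:              # primary input: elementary cut only
            cuts[node] = {frozenset([node])}
            continue
        left, right = fanins[node]
        # Pairwise unions of the children's cuts; the set removes duplicates.
        merged = {a | b for a in cuts[left] for b in cuts[right] if len(a | b) <= k}
        merged.add(frozenset([node]))       # elementary cut of the node itself
        cuts[node] = merged
    return cuts

# Hypothetical AIG: t = x2 & x3, q = x1 & t
example_fanins = {"t": ("x2", "x3"), "q": ("x1", "t")}
example_order = ["x1", "x2", "x3", "t", "q"]
cuts3 = enumerate_cuts(example_fanins, example_order, k=3)
cuts2 = enumerate_cuts(example_fanins, example_order, k=2)
```

With k = 3 node q keeps the cut {x1, x2, x3}; with k = 2 that cut is dropped as infeasible.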

Basic Mapping: Truth Tables
• A truth table is a bit-string representing the Boolean function of a cut
• Truth tables are computed for all cuts of all nodes
  - For each cut, assign elementary variables to cut leaves
  - Compute the truth tables for the internal nodes in topological order
[Figure: cut with leaves x1, x2, x3 and internal nodes t = x2 & x3 and q = x1 & t, each annotated with its bit-string between LSB and MSB; the bit-string values are not preserved]
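A minimal sketch of this computation, storing each truth table as a Python integer whose bit m is the function value on minterm m. The encoding (leaf i is 1 on minterms whose bit i is set) and the gate tuple layout are assumptions made for illustration.

```python
def elementary_table(i, n):
    """Truth table of variable i over n variables, as an integer bit-string."""
    table = 0
    for minterm in range(1 << n):
        if (minterm >> i) & 1:
            table |= 1 << minterm
    return table

def cut_truth_table(leaves, gates, root):
    """gates: list of (node, (fanin0, negated0), (fanin1, negated1)) in topological order."""
    n = len(leaves)
    mask = (1 << (1 << n)) - 1              # all 2^n bits set
    table = {leaf: elementary_table(i, n) for i, leaf in enumerate(leaves)}
    for node, (a, neg_a), (b, neg_b) in gates:
        ta = table[a] ^ mask if neg_a else table[a]
        tb = table[b] ^ mask if neg_b else table[b]
        table[node] = ta & tb               # every AIG node is a two-input AND
    return table[root]

# Cut {x1, x2, x3} of q, with t = x2 & x3 and q = x1 & t
gates = [("t", ("x2", False), ("x3", False)), ("q", ("x1", False), ("t", False))]
q_table = cut_truth_table(["x1", "x2", "x3"], gates, "q")
print(format(q_table, "08b"))
```

Under this encoding x1, x2, x3 are 0xAA, 0xCC, 0xF0, and q evaluates to 0x80, the three-input AND.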

Basic Mapping: Delay Optimality
• Assign the arrival times of the primary inputs
• For each node, in topological order
  - Compare the truth table of the cut with the truth tables of the gates (when they are equal, we have a match)
  - Compute the arrival times of each cut, in both phases
  - Select the best cut for each phase
  - When arrival times are equal, use area as a tie-breaker
• Example: for cuts c1, c2, c3, c4 with T_c2 < T_c3 < T_c1 < T_c4, c2 is the best cut
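The delay pass can be sketched for the simpler LUT case. This is a sketch under stated assumptions, not the gate-library flow on the slide: every non-trivial cut is taken to be implementable with a unit-delay LUT, phases and truth-table matching are ignored, and cut size stands in for area as the tie-breaker.

```python
def map_for_delay(cuts, order, pi_arrival, lut_delay=1):
    """Pick, for every internal node, the cut with the smallest arrival time."""
    arrival = dict(pi_arrival)
    best = {}
    for node in order:                       # topological order
        if node in arrival:                  # primary input
            continue
        candidates = []
        for cut in cuts[node]:
            if cut == frozenset([node]):     # the trivial cut cannot implement the node
                continue
            t = max(arrival[leaf] for leaf in cut) + lut_delay
            candidates.append((t, len(cut), sorted(cut)))
        t, _, leaves = min(candidates)       # arrival first, then size as area proxy
        arrival[node] = t
        best[node] = frozenset(leaves)
    return arrival, best

# Hypothetical cut sets for t = x2 & x3 and q = x1 & t
cuts = {
    "t": {frozenset(["t"]), frozenset(["x2", "x3"])},
    "q": {frozenset(["q"]), frozenset(["x1", "t"]), frozenset(["x1", "x2", "x3"])},
}
arrival, best = map_for_delay(cuts, ["x1", "x2", "x3", "t", "q"],
                              {"x1": 0, "x2": 0, "x3": 0})
```

Here q is covered by the cut {x1, x2, x3}, giving arrival time 1 instead of 2 for the cut {x1, t}.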

Basic Mapping: Area Recovery
• Performs three passes
  - Minimize area flow
  - Minimize exact area for best matches
  - Minimize area by phase assignment
• In each pass, for all nodes, in topological order
  - Consider matches with ArrivalTime <= RequiredTime
  - Among these matches, pick the one minimizing area (flow)
  - When areas (flows) are equal, use delay as a tie-breaker
• Example: for cuts c1, c2, c3, c4 with A_c2 < A_c3 < A_c1 < A_c4, c2 is the best cut

Basic Mapping: Area Flow
• Definition:
  - Area flow of a primary input is 0
  - Area flow of a node in the network is $AF(n) = \dfrac{Area(n) + \sum_i AF(\mathrm{fanin}_i(n))}{NumFanouts(n)}$
[Figure: example network annotated with area flows 0, 0, 1/3, (1 + 1/3) / 2 = 2/3, and 0]
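The area-flow recurrence can be evaluated in one topological pass. The sketch below uses exact fractions and a hypothetical two-gate network chosen to reproduce the 1/3 and 2/3 values from the figure annotations; the guard for nodes without fanouts is an added assumption.

```python
from fractions import Fraction

def area_flow(order, fanins, area, num_fanouts):
    """AF(n) = (Area(n) + sum of fanin area flows) / NumFanouts(n); AF(PI) = 0."""
    af = {}
    for node in order:                        # topological order
        if node not in fanins:                # primary input
            af[node] = Fraction(0)
            continue
        total = area[node] + sum(af[f] for f in fanins[node])
        # Assumption: a node with no fanouts (an output) is divided by 1.
        af[node] = Fraction(total) / max(1, num_fanouts[node])
    return af

# Hypothetical network: g has unit area and 3 fanouts, h has unit area and 2 fanouts.
fanins = {"g": ["x1", "x2"], "h": ["g", "x3"]}
af = area_flow(["x1", "x2", "x3", "g", "h"], fanins,
               area={"g": 1, "h": 1}, num_fanouts={"g": 3, "h": 2})
print(af["g"], af["h"])
```

Dividing by the fanout count spreads the cost of shared logic over its users, which is why g contributes only 1/3 to h.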

Basic Mapping: Area of a Match
• Definition. Area of a match is the sum of the areas of all the gates in the maximum fanout-free cone (MFFC) of the root gate (includes the root gate and some of the fanins)
• Example: A(M1) = A(g1) + A(g3) + A(g4) + A(g5) + A(g9)
[Figure: network of gates g1 to g13 with match M1 highlighted]

Basic Mapping: Select Final Mapping
• Extracting the final mapping from the AIG after the best matches are assigned to each node
  - Select the best match for each primary output node
  - Recursively, for each fanin of a selected match, select its best match
[Figure: network with inputs x1 to x5 and outputs z1 to z3]

Mapping for Sequential Circuits
• Represent netlist as an AND-INV graph (AIG)
• For each node, compute cuts (iteration over the circuit)
• For each node, compute l-values (iteration over the circuit)
• Map network for delay (iteration over the clock periods)
• Recover area using heuristics
• Select final mapping
Reference: P. Pan and C.-C. Lin, "A new retiming-based technology mapping algorithm for LUT-based FPGAs", Proc. FPGA '98.

l-Value: A Generalization of Combinational Delay
• Definition. For each edge e: u → v in S, we assign an l-weight equal to $-\phi d + \delta_{u \to v}$, where
  - $\phi$ is the clock period,
  - $d$ is the number of latches on the edge, and
  - $\delta_{u \to v}$ is the combinational delay of pin u of node v.
• Definition. The l-value of a node in S is the maximum weight, using the l-weights, of the paths from the PIs to the node.
• Theorem. S can be retimed to a clock period $\phi$ iff the l-value of each PO is less than or equal to $\phi$.
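Expanding the definition along a single path may make the theorem easier to read. The names delay(p) and latches(p) are introduced here only for illustration: they denote the total combinational delay and the total latch count along a path p.

```latex
\[
  w(p) \;=\; \sum_{(u \to v) \in p} \bigl( -\phi\, d_{u \to v} + \delta_{u \to v} \bigr)
       \;=\; \mathrm{delay}(p) \;-\; \phi \cdot \mathrm{latches}(p),
  \qquad
  l(v) \;=\; \max_{p \,:\, \mathrm{PI} \leadsto v} w(p).
\]
```

For instance, a path with total delay 3 that crosses one latch has weight 3 - 2·1 = 1 at clock period 2. Each latch on a path earns one clock period of slack, so the l-value measures the worst-case arrival time once latches are free to move.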

Example (D = 1)
[Figure: circuit with inputs i1, i2, nodes a, b, c, and output f]
• $\phi = 1$: infeasible. l(a) = 1, l(c) = 2, etc.
• $\phi = 2$: feasible. l(a) = 1, l(c) = 2; l(a) = 1, l(c) = 2, etc.
• $\phi = 3$: feasible. l(a) = 1, l(c) = 2; l(a) = 0, l(c) = 1, etc.

Computing Cuts

    for each non-PO node v in N
        L_v = {{v^0}};
    done = false;
    while ( done == false ) do
        done = true;
        for each node v (not PI or PO) in N do
            tmp = merge( L_u1, L_u2, …, L_ut );
            if ( tmp ⊈ L_v ) then
                L_v = tmp ∪ {{v^0}};
                done = false;
    return success; // L_v settled to C_v for each v

    merge( C_u1, C_u2, …, C_ut ) = { c = c_1^d1 ∪ c_2^d2 ∪ … ∪ c_t^dt | c_i ∈ C_ui and |c| ≤ k }

where c_i^di = { x^(d+di) | x^d ∈ c_i } and d_i is the number of latches on the edge from u_i to v.

Example
[Figure: circuit with inputs i1, i2 and nodes a, b, c]
Cuts added in each iteration (x^d denotes node x seen through d latches):
• Iteration 0
  - i1: {i1^0}
  - i2: {i2^0}
  - a: {a^0}
  - b: {b^0}
  - c: {c^0}
• Iteration 1
  - a: {i1^0, c^1}
  - b: {i2^0, c^0}
  - c: {a^0, b^1}, {a^0, i2^1, c^1}, {i1^0, c^1, b^1}, {i1^0, c^1, i2^1}
• Iteration 2
  - a: {i1^0, a^1, b^2}
  - b: {i2^0, a^0, b^1}
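The fixpoint iteration and the merge rule from the Computing Cuts slide can be sketched as follows. Cut leaves are (node, latch count) pairs. The circuit connectivity below is an assumption inferred from the cut listing in the example (a reads i1 and c through one latch, b reads i2 and c, c reads a and b through one latch), and the `max_rounds` guard is an addition, since the iteration is not guaranteed to settle on every circuit.

```python
from itertools import product

def shift(cut, latches):
    """c^d: add the edge's latch count to every leaf of the cut."""
    return frozenset((x, d + latches) for x, d in cut)

def merge(fanin_cut_sets, fanin_latches, k):
    result = set()
    for combo in product(*fanin_cut_sets):   # one cut per fanin
        cut = frozenset().union(*(shift(c, d) for c, d in zip(combo, fanin_latches)))
        if len(cut) <= k:                    # keep only k-feasible cuts
            result.add(cut)
    return result

def sequential_cuts(fanins, nodes, k, max_rounds=100):
    cuts = {v: {frozenset([(v, 0)])} for v in nodes}
    for _ in range(max_rounds):
        changed = False
        for v in nodes:
            if v not in fanins:              # primary input
                continue
            tmp = merge([cuts[u] for u, _ in fanins[v]],
                        [d for _, d in fanins[v]], k)
            if not tmp <= cuts[v]:           # tmp is not a subset of L_v
                cuts[v] = tmp | {frozenset([(v, 0)])}
                changed = True
        if not changed:
            return cuts
    raise RuntimeError("cut sets did not settle")

# Assumed connectivity: node -> list of (fanin, latches on the edge)
fanins = {"a": [("i1", 0), ("c", 1)], "b": [("i2", 0), ("c", 0)], "c": [("a", 0), ("b", 1)]}
seq_cuts = sequential_cuts(fanins, ["i1", "i2", "a", "b", "c"], k=3)
```

On this circuit the iteration settles with three cuts for a, three for b, and five for c, including the cuts listed in the example such as {i1^0, a^1, b^2}.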

Finding Minimum l-Values

    for each node v in N do
        if ( v is a PI ) l(v) = 0;
        else l(v) = -∞;
    done = false;
    while ( done == false ) do
        done = true;
        for each non-PI node v in N do
            tmp = min over cuts c of v ( max[ l(u) - φ·d + δ(u→v) | u^d ∈ c ] );
            if ( l(v) < tmp )
                l(v) = tmp; done = false;
            if ( v is a PO and l(v) > φ ) return failure;
    return success; // bounds have settled
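A minimal sketch of this iteration for LUT mapping. It assumes a unit delay for every cut, checks the period bound on the nodes that drive outputs, and adds a `max_rounds` guard that is not part of the pseudocode. The cut sets are written out by hand for a small assumed circuit (a reads i1 and c through one latch, b reads i2 and c, c reads a and b through one latch), so the resulting values are for LUT cuts and need not match a gate-level l-value example.

```python
NEG_INF = float("-inf")

def min_l_values(cuts, pis, pos, phi, delay=1, max_rounds=1000):
    """Return the settled l-values, or None if period phi is infeasible."""
    l = {v: (0 if v in pis else NEG_INF) for v in list(pis) + list(cuts)}
    for _ in range(max_rounds):
        done = True
        for v in cuts:
            # Best cut: smallest worst-case leaf l-value plus l-weight.
            tmp = min(max(l[u] - phi * d + delay for u, d in cut) for cut in cuts[v])
            if l[v] < tmp:
                l[v] = tmp
                done = False
            if v in pos and l[v] > phi:
                return None                  # failure
        if done:
            return l                         # bounds have settled
    return None                              # assumption: treat no convergence as failure

# Non-trivial 3-feasible cuts, leaves as (node, latches)
cuts = {
    "a": [[("i1", 0), ("c", 1)], [("i1", 0), ("a", 1), ("b", 2)]],
    "b": [[("i2", 0), ("c", 0)], [("i2", 0), ("a", 0), ("b", 1)]],
    "c": [[("a", 0), ("b", 1)], [("a", 0), ("i2", 1), ("c", 1)],
          [("i1", 0), ("c", 1), ("b", 1)], [("i1", 0), ("c", 1), ("i2", 1)]],
}
l_phi1 = min_l_values(cuts, pis={"i1", "i2"}, pos={"c"}, phi=1)
l_phi0 = min_l_values(cuts, pis={"i1", "i2"}, pos={"c"}, phi=0)
```

In an outer loop, one would search over candidate clock periods and keep the smallest one for which the iteration succeeds.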

Constructing Mapping Solution

    U = the set of POs
    S = { v | v is a PI or PO }
    while ( U ≠ ∅ ) do
        v = any node in U; U = U - {v};
        for each non-trivial cut c ∈ C_v do
            if ( l_opt(v) == max[ l_opt(u) - φ·d + δ(u→v) | u^d ∈ c ] )
                c_best = c;
        for each u^d ∈ c_best do
            if ( u is not in S )
                S = S ∪ {u}; U = U ∪ {u};
            create an edge in S from u to v with d FFs;
    return S;
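The construction can be sketched as a worklist starting from the outputs. The sketch assumes unit cut delay, treats the output-driving node as the PO, and takes the first cut that attains the optimal l-value (the pseudocode keeps the last one). The cut sets and l-values are hand-written inputs for a small assumed circuit at clock period 1.

```python
def construct_mapping(cuts, l_opt, pis, pos, phi, delay=1):
    """Return (selected nodes, edges) where each edge is (u, v, number of FFs)."""
    selected = set(pis) | set(pos)
    work = list(pos)
    edges = []
    while work:
        v = work.pop()
        # A cut is usable if it attains the optimal l-value of v.
        best = next(cut for cut in cuts[v]
                    if l_opt[v] == max(l_opt[u] - phi * d + delay for u, d in cut))
        for u, d in best:
            if u not in selected:
                selected.add(u)
                work.append(u)
            edges.append((u, v, d))          # edge from u to v with d flip-flops
    return selected, edges

cuts = {
    "a": [[("i1", 0), ("c", 1)], [("i1", 0), ("a", 1), ("b", 2)]],
    "b": [[("i2", 0), ("c", 0)], [("i2", 0), ("a", 0), ("b", 1)]],
    "c": [[("a", 0), ("b", 1)], [("a", 0), ("i2", 1), ("c", 1)],
          [("i1", 0), ("c", 1), ("b", 1)], [("i1", 0), ("c", 1), ("i2", 1)]],
}
l_opt = {"i1": 0, "i2": 0, "a": 1, "b": 2, "c": 1}
selected, edges = construct_mapping(cuts, l_opt, pis={"i1", "i2"}, pos={"c"}, phi=1)
```

For these inputs the result is a single LUT for c, fed by i1 directly and by i2 and c itself through one flip-flop each; nodes a and b are absorbed into the LUT.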

Performing Final Retiming
• Retime each node v with the following retiming lag: [formula not preserved in the transcript], where
  - $l_{opt}(v)$ is the optimal l-value and
  - $\phi$ is the selected clock period
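The lag formula itself is not preserved in this transcript. A form commonly given in the retiming-based mapping literature, stated here as an assumption rather than as a reconstruction of the slide, is:

```latex
\[
  r(v) \;=\; \left\lceil \frac{l_{opt}(v)}{\phi} \right\rceil - 1
\]
```

Under this assumed form, a node with $l_{opt}(v) = 1$ at $\phi = 1$ gets lag 0, while a node with $l_{opt}(v) = 2$ at the same period gets lag 1.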