Wenlong Yang Lingli Wang State Key Lab of ASIC and System Fudan University, Shanghai, China Alan Mishchenko Department of EECS University of California,


Lazy Man’s Logic Synthesis
Wenlong Yang, Lingli Wang (State Key Lab of ASIC and System, Fudan University, Shanghai, China)
Alan Mishchenko (Department of EECS, University of California, Berkeley)

• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
• Experimental Results
• Conclusion & Future Work

• Goal of logic synthesis: deriving a circuit or improving an available circuit.
• We propose a “lazy” approach, based on a precomputed library, that reuses optimal structures derived by other synthesis tools.
[Figure labels: a function with N variables; other tools; LMS; precomputed library; AIG]

• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
• Experimental Results
• Conclusion

Logic synthesis based on a precomputed library has been proposed in several papers, but they all differ from LMS:
• Previous work: precompute structures in terms of LUTs [Kennings, IWLS, 2010]. LMS: precompute structures in terms of AIGs.
• Previous work: didn't use preexisting benchmarks or tools [Bjesse, ICCAD, 2004]. LMS: use public benchmarks and existing tools.
• Previous work: look at only 4-5 input functions [Li, IWLS, 2011]. LMS: look at 6-16 input functions.
• Previous work: only compute multiple structure choices [Chatterjee, TCAD, 2006]. LMS: store many equivalent structures.

For each node:
• Compute several k-input cuts.
• Perform delay-optimal tree balancing of the SOP.
• Select the best one to replace the current structure.
An AIG subgraph found in benchmark s27.blif where SOP balancing loses to the proposed approach:
F = !c*!b + !c*a
F’ = !c*!(b*!a)
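The two expressions on this slide can be checked for equivalence by exhaustive simulation. The sketch below is only an illustration in Python (the function names are made up here, not taken from ABC). Built directly as an AIG, the two-cube SOP needs three two-input AND nodes, while the factored form needs two.

```python
from itertools import product

def f_sop(a, b, c):
    # F = !c*!b + !c*a  (two-cube SOP from the slide)
    return ((not c) and (not b)) or ((not c) and a)

def f_factored(a, b, c):
    # F' = !c*!(b*!a)  (factored form from the slide)
    return (not c) and not (b and (not a))

def equivalent():
    # Compare both forms on all 8 input assignments.
    return all(bool(f_sop(a, b, c)) == bool(f_factored(a, b, c))
               for a, b, c in product([False, True], repeat=3))

print(equivalent())  # True
```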

• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
  ▪ Equivalence Classes
  ▪ Library Representation/Construction
  ▪ Implementation
• Experimental Results
• Conclusion

• LMS is based on collecting, storing, and re-using circuit structures of Boolean functions with 6-16 input variables.
• The total number of completely specified Boolean functions of N variables is 2^(2^N).
• Experiments show that even for practical functions, this number can be very large. To reduce the number of functions and the memory needed to store them in a library, a canonical form is used to group them into equivalence classes.
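To give a feel for the 2^(2^N) growth, here is a small Python sketch. The helper names are illustrative only. It also reports the size of one truth table, which has 2^N bits.

```python
def num_functions(n):
    # Number of completely specified Boolean functions of n variables.
    return 2 ** (2 ** n)

def truth_table_bytes(n):
    # A truth table of n variables has 2**n bits; round up to whole bytes.
    return max(1, (2 ** n) // 8)

print(num_functions(4))        # 65536
print(num_functions(6))        # 18446744073709551616
print(truth_table_bytes(16))   # 8192
```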

• Two functions are NPN-equivalent if one of them can be obtained from the other by negating inputs, permuting inputs, and/or negating the output.
• Computing a complete NPN canonical form is not affordable for LMS.
• Drawbacks of NPN computation:
  ▪ Time-consuming
  ▪ Complicated

• The idea is to order the input variables and fix the polarities of the inputs and the output using the number of positive minterms and the cofactors w.r.t. each variable.
Input: truth table F
1. Determine the polarity of F by the number of 1s in the truth table.
2. Determine the polarity of each variable by the number of 1s in the negative cofactor w.r.t. each variable.
3. Sort the input variables by the number of 1s in their negative cofactors and permute the inputs accordingly.
Output: canonicized truth table F
• A reasonable trade-off between accuracy and speed.
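The three steps above can be sketched in Python on a truth table stored as a list of bits. The slide does not say which polarity is preferred or how ties are broken, so the choices below are assumptions made for illustration: the output is complemented when more than half of the minterms are 1, a variable is negated when its negative cofactor has more 1s than its positive cofactor, and variables are sorted in ascending order with a stable sort. The function name `semi_canonical` is hypothetical, not an ABC API.

```python
def semi_canonical(tt, n):
    """tt[m] is F(m) for minterm m; bit i of m is the value of variable i."""
    size = 1 << n
    tt = list(tt)

    # Step 1 (assumed convention): make the output have at most half 1s.
    if sum(tt) > size // 2:
        tt = [1 - b for b in tt]

    def neg_count(table, i):
        # Number of 1s in the negative cofactor w.r.t. variable i.
        return sum(table[m] for m in range(size) if not (m >> i) & 1)

    # Step 2 (assumed convention): negate variable i if its negative
    # cofactor has more 1s than its positive cofactor.
    for i in range(n):
        neg = neg_count(tt, i)
        pos = sum(tt) - neg
        if neg > pos:
            tt = [tt[m ^ (1 << i)] for m in range(size)]

    # Step 3: sort variables by the 1-count of their negative cofactors.
    order = sorted(range(n), key=lambda i: neg_count(tt, i))

    def old_minterm(m):
        # New variable j is old variable order[j].
        old = 0
        for j, i in enumerate(order):
            if (m >> j) & 1:
                old |= 1 << i
        return old

    return [tt[old_minterm(m)] for m in range(size)]

# a*!b and !a*b are NPN-equivalent and get the same form here.
print(semi_canonical([0, 1, 0, 0], 2))  # [0, 0, 0, 1]
print(semi_canonical([0, 0, 1, 0], 2))  # [0, 0, 0, 1]
```

Because ties in the counts are not resolved, two NPN-equivalent functions can still end up with different forms, which is why the form is only semi-canonical.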

• An N-input library contains functions of up to N variables.
• The structures of all functions are represented as a shared AIG.
• Each output of the AIG is the root node of one logic structure.
• When a library is loaded, the following actions are performed:
  ▪ A hash table is created to hash the outputs by their semi-canonical forms.
  ▪ For each structure, the area and the pin-to-output delays are computed and stored.

Example of using pin-to-output delays to compute structure delay:
Suppose the arrival times are {3, 2, 4, 5, 2, 3, 1} and the pin-to-output delays are {3, 3, 3, 5, 5, 4, 1}.
{3, 2, 4, 5, 2, 3, 1} + {3, 3, 3, 5, 5, 4, 1} = {6, 5, 7, 10, 7, 7, 2}, so the structure delay is the maximum entry, 10.
If one structure’s pin-to-output delay is worse than another’s with respect to every input, the structure is dominated.
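The arithmetic of this example, and the dominance rule, can be written down as a short Python sketch. The function names are illustrative. Reading "worse with respect to every input" as strictly larger on every pin is an assumption.

```python
def structure_delay(arrival, pin_to_output):
    # Output arrival time: worst case of (input arrival + pin-to-output delay).
    return max(a + d for a, d in zip(arrival, pin_to_output))

def is_dominated(candidate, other):
    # Assumed reading: candidate is strictly slower than other on every pin.
    return all(c > o for c, o in zip(candidate, other))

arrival = [3, 2, 4, 5, 2, 3, 1]
pin_to_output = [3, 3, 3, 5, 5, 4, 1]
print(structure_delay(arrival, pin_to_output))  # 10
print(is_dominated([2, 3], [1, 2]))             # True
print(is_dominated([2, 1], [1, 2]))             # False
```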

• The LUT mapper "if" in ABC is used as a structural cut browser to generate K-input cuts whose logic structures are added to the library.
Input: cut C
1. If cut C does not meet the requirements, return.
2. Compute the Boolean function F of cut C as a truth table.
3. Compute the semi-canonical form of F.
4. Rebuild the structure of the cut in the library.
5. If the structure already exists or is dominated, return.
6. Add a new primary output to store the structure in the hash table.
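Steps 5 and 6 of this procedure can be illustrated with a toy Python library keyed by the semi-canonical form. This is a sketch under assumptions, not the ABC data structure: the real library is a shared AIG with one primary output per structure, while here each structure is reduced to a hypothetical (area, pin-to-output delays) record, and "already exists" is approximated by an identical record. The caller is assumed to have done steps 1 to 4.

```python
from collections import defaultdict

def is_dominated(candidate, other):
    # Assumed reading: strictly slower on every input pin.
    return all(c > o for c, o in zip(candidate, other))

class ToyLibrary:
    def __init__(self):
        # semi-canonical truth table (as a string) -> list of structures
        self.table = defaultdict(list)

    def add(self, canon_tt, area, pin_delays):
        """Return True if the structure was stored, False if it was skipped."""
        new = (area, tuple(pin_delays))
        for existing in self.table[canon_tt]:
            if existing == new:
                return False  # step 5: structure already exists
            if is_dominated(new[1], existing[1]):
                return False  # step 5: structure is dominated
        self.table[canon_tt].append(new)  # step 6: store the structure
        return True

lib = ToyLibrary()
print(lib.add("0008", 1, [1, 1]))  # True
print(lib.add("0008", 1, [1, 1]))  # False
print(lib.add("0008", 3, [2, 2]))  # False
print(lib.add("0008", 2, [1, 2]))  # True
```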

Input: And-Inverter Graph
• For each node, in a topological order:
  • Compute several K-input cuts.
  • For each cut:
    ▪ Compute the truth table.
    ▪ Look up in the library.
    ▪ If there is no structure for this function, mark the cut to ensure it is not selected as the best cut.
    ▪ Else if the best structure found leads to a smaller AIG level, save the cut as the best cut.
  • If there is an improvement in level, update the AIG.
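The per-node cut selection in this loop can be sketched in Python as follows. Everything here is a simplification for illustration: a cut is a hypothetical pair (semi-canonical function key, leaf levels), the library is a plain dictionary from function key to a list of pin-to-output delay vectors measured in AIG levels, and cut enumeration and the AIG update itself are left out.

```python
def best_structure_level(leaf_levels, structures):
    # Level of the cut root when implemented by the fastest stored structure.
    return min(max(l + d for l, d in zip(leaf_levels, delays))
               for delays in structures)

def choose_cut(current_level, cuts, library):
    """Return (level, cut index) of the best improving cut, or None."""
    best = None
    for index, (canon_tt, leaf_levels) in enumerate(cuts):
        structures = library.get(canon_tt)
        if not structures:
            continue  # no structure for this function: cut cannot be selected
        level = best_structure_level(leaf_levels, structures)
        if level < current_level and (best is None or level < best[0]):
            best = (level, index)
    return best

library = {"f": [(2, 2, 1), (1, 3, 3)]}
cuts = [("g", (0, 0, 0)), ("f", (0, 1, 2))]
print(choose_cut(4, cuts, library))  # (3, 1)
print(choose_cut(3, cuts, library))  # None
```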

• The LMS algorithm is implemented in ABC. The LUT mapper "if" in ABC is used as:
  (a) A cut browser for computing the libraries
  (b) A mapper in the case study on AIG level minimization
Commands related to library construction:
  rec_start: Starts the LMS recorder.
  rec_add: Adds structures from benchmarks.
  rec_filter: Removes structures that occur less frequently.
  rec_merge: Merges two previously computed libraries.
  rec_ps: Prints statistics for the currently loaded library.
  rec_use: Transforms the internal library to the current network in ABC.
  rec_stop: Deletes the current library.
Command used to perform LMS mapping:
  if -y -K -C
  -y enables level optimization by LMS
  -K is the cut size
  -C is the number of cuts used at each node

• Introduction
• Previous Work
• Lazy Man’s Logic Synthesis (LMS)
• Experimental Results
  ▪ Library Coverage
  ▪ 6-input Library
  ▪ Optimize Delay After LUT Mapping
• Conclusion

• This experiment was performed to show that LMS has practical memory requirements for functions of up to 12 inputs.
• Semi-canonical classes of all functions appearing in the cuts of the benchmark circuits, without synthesis, were collected and the frequency of their appearance was recorded.
[Figure: occurrence frequency versus function #]
• ~2 M classes in total
• ~740 K classes for 90% of the functions
• ~400 MB for truth tables

• The goal of this experiment is to derive a 6-input library used in the following case study of AIG level minimization.
• The following ABC script is used to collect structures:
  read file; st; rec_add; dc2; rec_add; if -K 8; bidec; st; rec_add; if -K 8; mfs; st; rec_add; if -K 8; bidec; st; rec_add; if -g -K 6; st; rec_add;
[Table: statistics of the precomputed 6-input library; columns: Inputs, Classes #, Structures #, Ratio. Surviving cell fragments: ",43012, ,208471, ,148,5565,202, Total1,249,2295,687,"]
• ~77 MB AIGER file

• Two sets of benchmarks are used in this paper: 20 MCNC benchmarks and 10 large Altera benchmarks.
• LUT mapping was performed by the following scripts:
  ▪ Map: st; resyn2; if -K 4 or 6
  ▪ MapC: st; resyn2; dch -f; if -K 4 or 6
  ▪ SOPBC: st; if -gm -K 6; st; resyn2; dch -f; if -K 4 or 6
  ▪ LMSC: st; if -ym -K 6; st; resyn2; dch -f; if -K 4 or 6
• Benchmarks were run on a workstation with an Intel Xeon quad-core CPU and 256 GB of RAM (~4 GB used for the experiment).
• The resulting networks were verified by the command cec in ABC.

LMSC reduced delay by 37% with an area increase of 13%.

LMSC reduced delay by 26% with an area increase of 13%.

[Table: per-design 4-LUT level, 4-LUT count, 6-LUT level, and 6-LUT count, each for Map, MapC, SOPBC, and LMSC, with a final Ratio row. Surviving design-name fragments: alu, apex, b, b, b, b, b, b, clma, des, elliptic, ex5p, frisc, i, pdc, s, s, seq, spla, tseng.]
4-LUTs: LMSC reduced delay by 10% with an area increase of 3%.
6-LUTs: LMSC reduced delay by 12% with an area increase of 8%.

• A new method to harvest and re-use circuit structures produced by different tools on benchmark circuits.
• The “lazy” approach is made practical by:
  ▪ A semi-canonical form to reduce the number of equivalence classes
  ▪ Using AIGs to store precomputed libraries in memory and on disk
  ▪ Using truth tables to manipulate Boolean functions
• As a case study, the proposed approach was applied to improve delay after FPGA mapping.
• For industrial benchmarks, compared to SOP balancing:
  ▪ the delay was reduced by 17% (18%) for LUT 4 (LUT 6)
  ▪ the area penalty was 2% (5%)

• Improving implementation
  ▪ Reducing memory by using a low-memory AIG
  ▪ Building libraries in terms of multi-input gates
  ▪ Filtering libraries based on their performance
  ▪ Giving the user control over the area increase
• Continuing experiments
  ▪ Performing case studies with larger functions
  ▪ Evaluating delay improvements after P&R

Authors: Wenlong Yang, Lingli Wang, Alan Mishchenko

Deriving a circuit for a Boolean function and improving an available circuit are typical tasks solved by logic synthesis. Numerous algorithms in this area have been proposed and implemented over the last 50 years. This paper presents a “lazy” approach to logic synthesis based on the following observations: (a) optimal or near-optimal circuits for many practical functions are already derived by the tools, making it unnecessary to implement new algorithms or even run the old ones repeatedly; (b) larger circuits are composed of smaller ones, which are often isomorphic up to a permutation/negation of inputs/outputs. Experiments confirm these observations. Moreover, a case study shows that logic level minimization using lazy man’s synthesis improves delay after mapping into 4- and 6-input LUTs, compared to earlier work on high-effort delay optimization.