Optimality Study of Logic Synthesis for LUT-Based FPGAs. Jason Cong and Kirill Minkovich, VLSI CAD Lab, Computer Science Department, University of California, Los Angeles

Presentation transcript:

Optimality Study of Logic Synthesis for LUT-Based FPGAs
Jason Cong and Kirill Minkovich
VLSI CAD Lab, Computer Science Department, University of California, Los Angeles
Supported by Altera, Xilinx, and Magma under the California MICRO program.

UCLA VLSICAD LAB
Outline
• Motivation and background
  ◦ Existing test cases hinted that current algorithms had little room for improvement.
• LEKO
  ◦ Logic synthesis Examples with Known Optimals
  ◦ Creation, optimality, and results
• LEKU
  ◦ Logic synthesis Examples with Known Upper bounds
  ◦ Creation and results
• Conclusion

Goals of Paper
• The goal was to test the optimality of two design steps in logic synthesis:
  ◦ Technology mapping
  ◦ Logic optimization combined with technology mapping
• Definitions
  ◦ Technology Mapping
  ◦ Logic Optimization
  ◦ Logic Synthesis = Logic Optimization + Technology Mapping

Motivation
• Logic synthesis is NP-hard in general
• Combining logic optimization and mapping is much harder
  ◦ Academic tools mostly focus on mapping
• Problems with current test cases
  ◦ How far from optimal?
  ◦ Logic optimization?
• Decrease in FPGA synthesis papers
  ◦ Suggests fewer improvements are possible
• Why new test cases are needed
  ◦ To test specific properties of logic synthesis tools
  ◦ LEKO & LEKU

Construction Overview (LEKO)
• First create a small “core” graph, G5, with a known optimal mapping (and possibly a known logic synthesis) solution.
• G5 has to have the following properties:
  1. 5 inputs (x_1, x_2, …, x_5)
  2. 5 outputs (y_1, y_2, …, y_5)
  3. y_i = f(x_1, x_2, …, x_5), i.e. every output is a function of all five inputs
  4. Internal nodes have exactly two inputs.
  5. There exists an optimal (in terms of area/depth) mapping of G5 into 4-LUTs that uses only 4-LUTs (no 3-LUTs or 2-LUTs).
• Why these properties?
  ◦ Simplest G5 for a 4-LUT architecture
  ◦ Can be cascaded into larger structures
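The structural part of the G5 properties (properties 1 to 4) can be checked mechanically. The sketch below is only an illustration: the dictionary-based netlist format and the names `input_support` and `check_core_graph` are assumptions made here, not anything from the slides. It checks structural reachability rather than true functional dependence, and it does not check property 5, which would require an actual mapper.

```python
def input_support(node, fanins, primary_inputs, memo):
    """Return the set of primary inputs in the transitive fan-in of `node`."""
    if node in memo:
        return memo[node]
    if node in primary_inputs:
        result = {node}
    else:
        result = set()
        for fanin in fanins[node]:
            result |= input_support(fanin, fanins, primary_inputs, memo)
    memo[node] = result
    return result


def check_core_graph(primary_inputs, outputs, fanins, n=5):
    """Check properties 1-4 of a candidate core graph (hypothetical format).

    `fanins` maps every internal node to a tuple of its fan-in node names.
    """
    pis = set(primary_inputs)
    # Properties 1 and 2: n inputs and n outputs.
    if len(pis) != n or len(outputs) != n:
        return False
    # Property 4: every internal node has exactly two inputs.
    if any(len(f) != 2 for f in fanins.values()):
        return False
    # Property 3 (structural approximation): every output reaches all inputs.
    memo = {}
    return all(input_support(o, fanins, pis, memo) == pis for o in outputs)


# Toy graph: it satisfies properties 1-4 but is NOT claimed to satisfy
# property 5 or to resemble the G5 used by the authors.
inputs = ["x1", "x2", "x3", "x4", "x5"]
nodes = {"a": ("x1", "x2"), "b": ("x3", "x4"), "c": ("a", "b")}
for i in range(1, 6):
    nodes[f"y{i}"] = ("c", "x5")
outputs = [f"y{i}" for i in range(1, 6)]
ok = check_core_graph(inputs, outputs, nodes)

# Break property 3: y5 no longer depends on x3 or x4.
broken = dict(nodes, y5=("a", "x5"))
not_ok = check_core_graph(inputs, outputs, broken)
```

In this toy example `ok` is true and `not_ok` is false, since the modified `y5` no longer reaches `x3` and `x4`.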

G5 example (optimal: 7 4-LUTs)

Construction Overview (LEKO)
• Algorithm steps
  1. Create a G5.
  2. Duplicate it and connect the copies in such a way that there is a unique traversal of G5s from PO to PI.
• This creates a new graph with the following properties:
  ◦ There exists a known optimal mapping solution
  ◦ This also provides a tight upper bound on the optimal logic synthesis solution
• By using different G5s we can construct LEKO networks with a variety of properties.
  ◦ G5 can have different mapping and logic synthesis solutions
  ◦ G5 can be based on realistic designs (multipliers, adders, etc.)

Construction Examples (LEKO)
[Figure; surviving label: G5]

Optimality
Theorem: The optimal mapping solution of an arbitrarily sized LEKO circuit without logic optimization is achieved when every G5 in the circuit is mapped optimally without overlapping any other G5.
Proof idea: A LUT spanning two layers will not reduce the area of the solution. This can be shown by looking at what happens to the G5s at layer i and at layer i+1.
• The complete proof is in the paper.
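One consequence of the theorem is that the optimal area of a LEKO circuit is the number of G5 copies times the optimal area of one G5 (7 4-LUTs in the example core). The sketch below works this out under an assumption not stated in this transcript: that a circuit with 5^k inputs consists of k layers of 5^(k-1) G5 copies each. That assumption is at least consistent with two figures that do appear in the slides, the 3,500-LUT optimum for LEKO(G625) and the 70-LUT upper bound for the LEKU circuits built from G25. The function name is hypothetical.

```python
def leko_optimal_luts(k, luts_per_core=7, core_size=5):
    """Optimal LUT count for a LEKO circuit with core_size**k inputs.

    Assumes (not confirmed by the slides) k layers, each holding
    core_size**(k - 1) copies of the core graph, every copy mapped
    optimally and independently as the theorem allows.
    """
    num_cores = k * core_size ** (k - 1)
    return num_cores * luts_per_core


g25 = leko_optimal_luts(2)    # 2 layers x 5 cores
g125 = leko_optimal_luts(3)   # 3 layers x 25 cores
g625 = leko_optimal_luts(4)   # 4 layers x 125 cores
```

Under this assumption `g25` is 70 and `g625` is 3500, matching the surviving numbers; the value for G125 is a prediction of the sketch, not a figure taken from the slides.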

LEKO Examples
• LEKO: Logic synthesis Examples with Known Optimals
  ◦ Naming
    - G25 has 25 inputs and 25 outputs
    - Gx has x inputs and x outputs
• Tools tested
  ◦ Altera’s Quartus 5.0, Xilinx’s ISE 7.1i, UCLA’s DAOmap, and Berkeley’s ABC
  ◦ 4-LUT architecture
  ◦ Area optimization only (NP-hard)

Table columns: Circuits | # Nodes | Depth | # I/O | Optimal # LUTs | Optimal Depth
Rows: LEKO(G25), LEKO(G125), LEKO(G625). Most cell values did not survive extraction; the last row ends with an optimal solution of 3,500 LUTs at depth 8.

Results (LEKO)
• Only mapping was needed to produce optimal results.
• What do these results mean?
  ◦ The tools scaled fairly well
  ◦ Average gap = 15%
• Why Quartus and ISE did so well
  ◦ They performed extra non-mapping steps

Circuits     Metric   DAOmap   ABC     Quartus   ISE     Optimal
LEKO(G625)   Area     4,435    4,072   3,737     3,974   3,500

(The area and ratio values for LEKO(G25) and LEKO(G125), the ratios for LEKO(G625), and the average ratio row did not survive extraction.)
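The "Ratio" rows of the results table are each tool's area divided by the known optimal area. As a small worked example, the sketch below recomputes those ratios for the one row whose numbers survive, LEKO(G625). The function name `area_ratios` is made up for this illustration, and the computed values are derived here rather than copied from the slide.

```python
def area_ratios(areas, optimal):
    """Return each tool's area divided by the known optimal area."""
    return {tool: area / optimal for tool, area in areas.items()}


# LEKO(G625) areas as they appear in the results table.
g625_areas = {"DAOmap": 4435, "ABC": 4072, "Quartus": 3737, "ISE": 3974}
g625_optimal = 3500

ratios = area_ratios(g625_areas, g625_optimal)
for tool, ratio in ratios.items():
    print(f"{tool}: {ratio:.3f}")
```

On this circuit the four tools land roughly 7% to 27% above the optimum, with Quartus closest and DAOmap furthest.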

Creating LEKU
• LEKU: Logic synthesis Examples with Known Upper bounds
  ◦ Constructed from LEKO G25 (25 inputs and 25 outputs)
    - Collapse, then decompose the graph
    - This creates a much larger graph that is logically equivalent to the original
    - LEKU-CD: collapsed → decomposed into AND/OR gates
    - LEKU-CB: collapsed → balanced
  ◦ LEKU-CD’
    - LEKU-CD was too large for Xilinx as a single input
    - LEKU-CD was split into 25 separate designs, one for each PO

Table columns: Circuits | # Nodes | Depth | # I/O | Upper Bound on Optimal # LUTs | Upper Bound on Optimal Depth
Rows: LEKU-CD(G25), LEKU-CB(G25). The only surviving cell value is the start of the LEKU-CD(G25) node count, "1,166,"; the rest did not survive extraction.
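To see why collapsing and then decomposing can inflate a circuit so much, consider a deliberately naive version of the idea: collapse a function to its list of minterms, then rebuild it from 2-input AND and OR gates only. This is a sketch for intuition, not the procedure the authors used (the slides do not specify it); the helper names are hypothetical and inverters are not counted.

```python
from itertools import product


def collapse(f, n):
    """Collapse f into its minterms: the input assignments where f is 1."""
    return [bits for bits in product((0, 1), repeat=n) if f(bits)]


def two_input_gate_count(minterms, n):
    """Count 2-input AND/OR gates in a naive sum-of-minterms rebuild.

    Each minterm needs n - 1 AND gates to combine its n literals, and
    joining m minterms needs m - 1 OR gates. Inverters are ignored.
    """
    if not minterms:
        return 0
    and_gates = len(minterms) * (n - 1)
    or_gates = len(minterms) - 1
    return and_gates + or_gates


def parity(bits):
    """5-input odd parity, buildable as a chain of four 2-input XORs."""
    return sum(bits) % 2


minterms = collapse(parity, 5)
gates = two_input_gate_count(minterms, 5)
print(len(minterms), gates)  # 16 minterms, 79 AND/OR gates
```

A function that needs four 2-input XOR nodes in its original form becomes 79 AND/OR gates after this naive round trip, while remaining logically equivalent. Recovering the compact original from the inflated form is the kind of global restructuring the LEKU circuits are meant to test.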

Results on LEKU
• Both logic optimization and mapping were needed
  ◦ Academic tools were allowed to use preprocessing tools
• What does this mean?
  ◦ There exist designs on which these tools perform very badly
  ◦ Average gap = 171x
  ◦ Suggests that all of these tools lack global minimization heuristics

Circuits        Metric   DAOmap   ABC      Quartus   ISE     Upper Bound
LEKU-CD(G25)    Area     22,717   30,511   10,381    *       70
LEKU-CD(G25)’   Area     25,247   35,271   5,005     9,717   70

(The ratio rows, the LEKU-CB(G25) rows, and the "Average Ratio (last 2 designs)" and "Average Ratio (ALL)" rows did not survive extraction, apart from a "*" under ISE and a ratio of 1 under Upper Bound for LEKU-CD(G25).)

LEKO/LEKU vs Real Designs
• Limitations
  ◦ The whole circuit is combinational logic
  ◦ The original circuits contain highly repeated structures
  ◦ Doesn’t mean tools are 70x away from optimal on real designs
• Different uses than real designs
  ◦ LEKO: tests the mapping phase of the algorithm
    - Tools perform well on current LEKO benchmarks
    - Will construct larger core graphs → worse results?
  ◦ LEKU: tests the logic optimization phase of the algorithm
    - Ability to reproduce the original structure
    - Duplication removal
    - Logic identification
    - Other global heuristics

Conclusions
• Conclusions
  ◦ LEKO
    - The only circuits that test the optimality of technology mapping
    - Have an optimal mapping solution
  ◦ LEKU
    - Test global area-minimizing heuristics
    - Have a very tight upper bound on the optimal solution
  ◦ These circuits address a need for testing specific methods
• Current state of technology
  ◦ Technology mapping
    - Current tools do very well
  ◦ Overall logic synthesis
    - Current tools cannot produce good solutions when a global minimization heuristic is required.

Conclusions (continued)
• Download every test case mentioned here
  ◦ Click on “Optimality Study”
  ◦ Click on “LEKO/LEKU”
  ◦ Harder and larger LEKO and LEKU circuits will be posted soon!
• Check out the article in EE Times
  ◦ Just search EE Times for “kirill”
  ◦ Thank you, EE Times, for your interest!
• Questions?


Additional Slides

Construction Algorithm (LEKO)

Variations
• LEKO
  ◦ Using larger core graphs to create more complex designs
  ◦ Using commonly used cells as the core graphs
  ◦ Using a collection of core graphs
• LEKU
  ◦ Using LEKO and adding specific things to test
    - Duplicating some specific parts
    - Adding wires that will be removed when don’t-cares are computed

Interesting New Results
• After the results were seen, we received several responses
  ◦ ABC
    - Repeating "map 4-LUTs → don’t-care calculation" led to a 3x improvement on the largest LEKU example
  ◦ DAOmap
    - Multiple iterations of "map 5-LUTs → simplify → map 4-LUTs" showed similar improvements on the LEKU examples
  ◦ Altera
    - For the LEKO circuits, "map 5-LUT → map 4-LUT" was able to achieve near-optimal solutions
    - This result wouldn’t extend if we used a larger G5

Different G5s
• Assuming a K-LUT
• G5 has to have the following properties:
  1. It has m inputs and m outputs.
  2. Every output is a function of all m inputs.
  3. Each internal node of G5 has exactly two inputs.
  4. There exists an optimal (in terms of area/depth) mapping of G5 into a K-LUT mapping solution, denoted M5, such that M5 only has K-LUTs.
• Where
  ◦ m ≥ K + 1
  ◦ The larger m is, the harder the G5 is to map