The Optimization of High-Performance Digital Circuits. Andrew Conn (with Michael Henderson and Chandu Visweswariah), IBM Thomas J. Watson Research Center.

Presentation transcript:

1 The Optimization of High-Performance Digital Circuits. Andrew Conn (with Michael Henderson and Chandu Visweswariah), IBM Thomas J. Watson Research Center, Yorktown Heights, NY

2 Outline

3 Circuit optimization

4 Dynamic vs. static optimization. Dynamic: the nonlinear optimizer passes transistor and wire sizes to a simulator, which returns function and gradient values. Static: the nonlinear optimizer passes transistor and wire sizes to a static timing analyzer, which returns function and gradient values.

5 Dynamic vs. static optimization

6 Custom? High-performance?

7 EinsTuner: formal static optimizer. Components: embedded time-domain simulator (SPECS), static transistor-level timer (EinsTLT), nonlinear optimizer (LANCELOT). The optimizer supplies transistor and wire sizes and receives function and gradient values.

8 Components of EinsTuner. Read netlist; create timing graph (EinsTLT). Formulate pruned optimization problem. Feed problem to nonlinear optimizer (LANCELOT). Snap-to-grid; back-annotate; re-time. Solve optimization problem, calling the simulator for delays/slews and their gradients. Obtain converged solution. Fast simulation and incremental sensitivity computation (SPECS).

9 Static optimization formulation

10 Digression: minimax optimization
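
The slide body did not survive extraction, so the following is only the textbook minimax remapping, given as a reminder and not as the authors' exact formulation. The functions $f_i$ are generic placeholders; the auxiliary variable is called $z$ because a later slide mentions simple bounds on "arrival times, z".

```latex
% Standard epigraph reformulation of a minimax problem (generic f_i assumed).
\begin{align*}
\min_{x}\; \max_{i}\; f_i(x)
\quad\Longleftrightarrow\quad
&\min_{x,\,z}\; z \\
&\text{s.t.}\quad z \ge f_i(x) \quad \text{for all } i
\end{align*}
```

The nonsmooth max is traded for one extra variable and a set of smooth inequality constraints, which is the form a gradient-based nonlinear optimizer can handle.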

11 Remapped problem

12 Springs and planks analogy

13 Springs and planks analogy

14 Algorithm animation: inv3. (Figure: delay versus logic stages, with PIs ordered by criticality; wires and gates shown. Red = critical, green = non-critical, curvature = sensitivity, thickness = transistor size. One such frame per iteration.)

15 Algorithm demonstration: 1b ALU

16 Constraint generation (figure with nodes i and j)

17 Statement of the problem

18 SPECS: fast simulation. Two orders of magnitude faster than SPICE. 5% typical stage delay and slew accuracy; 20% worst-case. Event-driven algorithm. Simplified device models. Specialized integration methods. Invoked via a programming interface. Accurate gradients indispensable.

19 LANCELOT

20 LANCELOT algorithms. Uses an augmented Lagrangian for nonlinear constraints: $\Phi(x, \lambda) = f(x) + \sum_i \left[ \lambda_i c_i(x) + c_i(x)^2 / (2\mu) \right]$. Simple bounds handled explicitly. Adds slacks to inequalities. Trust-region method.
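
A minimal sketch of the augmented Lagrangian named on this slide, assuming the garbled formula reads Phi(x, lambda) = f(x) + sum_i [lambda_i c_i(x) + c_i(x)^2 / (2 mu)] with penalty parameter mu. The function names and the toy problem are hypothetical and are not LANCELOT's interface; the multiplier update shown is the usual first-order rule for this form.

```python
def augmented_lagrangian(f, constraints, x, lambdas, mu):
    """Evaluate Phi(x, lambda) = f(x) + sum_i [lambda_i*c_i(x) + c_i(x)**2 / (2*mu)]."""
    total = f(x)
    for lam, c in zip(lambdas, constraints):
        ci = c(x)
        total += lam * ci + ci * ci / (2.0 * mu)
    return total


def update_multipliers(constraints, x, lambdas, mu):
    """First-order multiplier update: lambda_i <- lambda_i + c_i(x) / mu."""
    return [lam + c(x) / mu for lam, c in zip(lambdas, constraints)]


# Toy problem (hypothetical): minimize x0^2 + x1^2 subject to x0 + x1 - 1 = 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
c = [lambda x: x[0] + x[1] - 1.0]

phi = augmented_lagrangian(f, c, [0.0, 0.0], [0.0], 0.5)
new_lambdas = update_multipliers(c, [0.0, 0.0], [0.0], 0.5)
```

In an outer loop, Phi would be minimized over x (subject to the simple bounds) for fixed multipliers, after which the multipliers or the penalty parameter are adjusted.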

21 LANCELOT algorithms continued Simple bounds Trust-region

22 Customization of LANCELOT. Cannot just use it as a black box. Non-standard options may be preferable, e.g. solve the BQP subproblem accurately. Magic steps. Noise considerations. Structured secant updates. Adjoint computations. Preprocessing (pruning). Failure recovery in conjunction with SPECS.

23 LANCELOT. State-of-the-art large-scale nonlinear optimization package. Group partial separability is heavily exploited in our formulation. Two-step updates applied to linear variables. Specialized criteria for initializations, updates, adjoint computations, stopping and dealing with numerical noise.

24 Aids to convergence. Initialization of multipliers and variables. Scaling, choice of units. Choice of simple bounds on arrival times, z. Reduction of numerical noise. Reduction of dimensionality. Treating fanout capacitances as “internal variables” of the optimization. Tuning of LANCELOT to be aggressive. Accurate solution of BQP.

25 Demonstration of degeneracy

26 Demonstration of degeneracy Degeneracy!

27 Why do we need pruning?

28 Pruning of the timing graph. The timing graph can be manipulated to reduce the number of arrival time variables, to reduce the number of timing constraints and, most of all, to reduce degeneracy. No loss in generality or accuracy. Bottom line, on average: 18.3x fewer AT variables, 33% fewer variables, 43% fewer timing constraints, 22% fewer constraints, and 1.7x to 4.1x improvement in run time on large problems.

29 Pruning strategy. During pruning, the number of fanins of any un-pruned node monotonically increases. During pruning, the number of fanouts of any un-pruned node monotonically increases. Therefore, if a node is not pruned in the first pass, it will never be pruned. Therefore, a one-pass algorithm can be used for a given pruning criterion.

30 Pruning strategy. The order of pruning provably produces different (possibly sub-optimal) results. Greedy 3-pass pruning produces a “very good” (but perhaps non-optimal) result. We have not been able to demonstrate a better result than greedy 3-pass pruning. However, the quest for a provably optimal solution continues...

31 Pruning: an example

32 Block-based vs. path-based timing Block-based Path-based

33 Block-based & path-based timing. In the timing graph, if a node has n fanins and m fanouts, eliminating it causes 2mn constraints instead of 2(m+n). Criterion: if $2mn \le 2(m+n)+2$, prune!
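
The criterion and the one-pass strategy can be sketched as follows. This assumes the comparison lost in extraction is "less than or equal", and it uses a hypothetical edge-list representation of the timing graph with nodes named "source" and "sink" that are never pruned; it is not the EinsTuner implementation.

```python
def should_prune(n_fanins, n_fanouts):
    """Slide criterion, reading the lost comparison as <= (an assumption)."""
    n, m = n_fanins, n_fanouts
    return 2 * m * n <= 2 * (m + n) + 2


def prune_once(edges, protected=("source", "sink")):
    """Single pass over the nodes; edges is a list of (src, dst) pairs.

    Pruning a node replaces its fanin and fanout edges by direct
    fanin-to-fanout edges. Parallel edges are kept because each one
    stands for its own pair of timing constraints.
    """
    edges = list(edges)
    nodes = sorted({u for e in edges for u in e} - set(protected))
    for node in nodes:
        fanins = [e for e in edges if e[1] == node]
        fanouts = [e for e in edges if e[0] == node]
        if fanins and fanouts and should_prune(len(fanins), len(fanouts)):
            edges = [e for e in edges if node not in e]
            edges += [(u, w) for (u, _) in fanins for (_, w) in fanouts]
    return edges


# A simple chain collapses to a single source-to-sink edge.
chain = [("source", "a"), ("a", "b"), ("b", "sink")]
pruned = prune_once(chain)
```

Under this reading, a node with a single fanin or a single fanout always qualifies, while a node with three fanins and three fanouts does not (18 constraints against a threshold of 14).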

34 Detailed pruning example

35 Detailed pruning example (figure: timing graph from Source to Sink). Edges = 26, Nodes = 16 (+2)

36 Detailed pruning example. Edges: 26 → 20, Nodes: 16 → 10

37 Detailed pruning example. Edges: 20 → 17, Nodes: 10 → 7

38 Detailed pruning example. Edges: 17 → 16, Nodes: 7 → 6

39 Detailed pruning example. Edges: 16 → 15, Nodes: 6 → 5

40 Detailed pruning example. Edges: 15 → 14, Nodes: 5 → 4

41 Detailed pruning example. Edges: 14 → 13, Nodes: 4 → 3

42 Detailed pruning example. Edges: 13 → 13, Nodes: 3 → 2

43 Pruning vs. no pruning

44 Adjoint Lagrangian mode. Gradient computation is the bottleneck. If the problem has m measurements and n tunable transistor/wire sizes, the traditional direct method needs n sensitivity simulations and the traditional adjoint method needs m adjoint simulations. The adjoint Lagrangian method computes all gradients in a single adjoint simulation!
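
To illustrate why a single adjoint solve suffices, here is a small static stand-in for the time-domain case. It assumes a hypothetical 2x2 linear "circuit" G(x) u = b, measurements of the form w_i . u, and a merit function that is the multiplier-weighted sum of those measurements; none of these names come from SPECS or LANCELOT.

```python
def solve2(A, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def system(x, g=1.0):
    """Hypothetical system matrix: tunable sizes x on the diagonal, coupling g."""
    return [[x[0] + g, -g], [-g, x[1] + g]]


def merit(x, lambdas, W, b):
    """Weighted sum of measurements: sum_i lambda_i * (w_i . u)."""
    u = solve2(system(x), b)
    return sum(lam * (w[0] * u[0] + w[1] * u[1]) for lam, w in zip(lambdas, W))


def adjoint_lagrangian_gradient(x, lambdas, W, b):
    """Gradient of the merit function from one forward and one adjoint solve."""
    G = system(x)
    u = solve2(G, b)  # forward solve
    # Combine all measurement vectors into one adjoint right-hand side.
    rhs = [sum(lam * w[k] for lam, w in zip(lambdas, W)) for k in range(2)]
    Gt = [[G[0][0], G[1][0]], [G[0][1], G[1][1]]]
    a = solve2(Gt, rhs)  # single adjoint solve, independent of len(W)
    # dG/dx_k has a single 1 at position (k, k), so d(merit)/dx_k = -a_k * u_k.
    return [-a[k] * u[k] for k in range(2)]


x = [2.0, 3.0]
lambdas = [0.7, 0.3]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 0.0]
grad = adjoint_lagrangian_gradient(x, lambdas, W, b)
```

The cost of `adjoint_lagrangian_gradient` does not grow with the number of measurements, because the multipliers fold them into one right-hand side before the adjoint solve.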

45 Adjoint Lagrangian mode. Useful for large circuits. Implication: additional timing/noise constraints at no extra cost! Is predicated on close software integration between the optimizer and the simulator. Gradient computation is 8% of total run time on average.

46 Noise considerations. Noise is important during tuning. Semi-infinite problem: $v(x,t) \le NM_L$ for all $t \in [t_1, t_2]$. (Figure: v versus t, with NM_L, t_1, t_2 and area = c(x) marked.)

47 Noise considerations. Trick: remap the infinite number of constraints to a single integral constraint c(x) = 0. In adjoint Lagrangian mode, any number of noise constraints come almost for free! General (constraints, objectives, minimax). Tradeoff analysis for dynamic library cells. (Figure: v versus t, with NM_L, t_1, t_2 and area = c(x) marked.)
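
A minimal sketch of the integral remapping, assuming c(x) is the area of the noise waveform above NM_L over the window [t1, t2], as the figure label "area = c(x)" suggests. The sampled-waveform interface is hypothetical, and the exact integrand used by the authors may differ (a squared excess, for example, would be smoother).

```python
def noise_violation_area(times, voltages, nm_l):
    """Trapezoidal estimate of the area of v(t) above NM_L on the sampled window.

    The result is zero exactly when no sample exceeds NM_L, so the single
    constraint c(x) = 0 stands in for v(x, t) <= NM_L at every sampled t.
    Clipping is applied at the sample points, so the estimate is approximate
    near threshold crossings.
    """
    area = 0.0
    for k in range(len(times) - 1):
        e0 = max(0.0, voltages[k] - nm_l)
        e1 = max(0.0, voltages[k + 1] - nm_l)
        area += 0.5 * (e0 + e1) * (times[k + 1] - times[k])
    return area


quiet = noise_violation_area([0.0, 1.0, 2.0, 3.0], [0.0, 0.2, 0.2, 0.0], 0.3)
noisy = noise_violation_area([0.0, 1.0, 2.0, 3.0], [0.5, 0.5, 0.5, 0.5], 0.3)
```

Here `quiet` is zero because the waveform never exceeds the margin, while `noisy` is positive and would be driven toward zero by the optimizer.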

48 Noise considerations

49 Initialization of λs (figure values: 1/2, 1/6, 1/4, 1/6)

50 Some numerical results - Dynamic

51 Some numerical results - Static

52 Lagrangian relaxation. Motivation: the tuning community loves the approach. It reduces the size of the problem and reduces redundancy and degeneracy. BUT … you never get something for nothing.

53 Lagrangian relaxation (continued) Complicating constraints Relax into objective function

54 Lagrangian relaxation (continued). Substituting the first-order optimality conditions on λ removes the dependence on z and the ATs, because of the problem’s structure!
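
The step described on this slide can be sketched for a generic static timing formulation. The notation is an assumption, not taken from the slides: edges $(i,j)$ with delays $d_{ij}(x)$, arrival times $AT_i$, the source arrival time fixed at 0, and $z$ bounding the arrival time at every node $k$ that feeds the sink.

```latex
% Assumed notation; a generic derivation, not the authors' exact formulation.
\begin{align*}
&\min_{x,\,z,\,AT}\; z
 \quad\text{s.t.}\quad AT_i + d_{ij}(x) \le AT_j \;\;\forall (i,j),
 \qquad AT_k \le z \;\;\forall k \to \text{sink} \\[4pt]
L &= z + \sum_{(i,j)} \lambda_{ij}\,\bigl(AT_i + d_{ij}(x) - AT_j\bigr)
       + \sum_{k} \lambda_{k}\,\bigl(AT_k - z\bigr), \qquad \lambda \ge 0 \\[4pt]
\frac{\partial L}{\partial z} = 0 &\;\Rightarrow\; \sum_{k} \lambda_{k} = 1 \\
\frac{\partial L}{\partial AT_j} = 0 &\;\Rightarrow\;
   \sum_{i:(i,j)} \lambda_{ij} = \sum_{l:(j,l)} \lambda_{jl}
   \;\;(+\,\lambda_j \text{ if } j \to \text{sink}) \\[4pt]
\Rightarrow\; L &= \sum_{(i,j)} \lambda_{ij}\, d_{ij}(x)
\end{align*}
```

With the multipliers restricted to satisfy these flow-conservation conditions, every $z$ and $AT$ term cancels and only the delay terms remain, so the relaxed problem depends on the sizes $x$ alone.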

55 Lagrangian Relaxation

56 Lagrangian relaxation (continued). (Results table: columns Name, FETs, Iterations and CPU (s), compared without (w/o) and with (w) Lagrangian relaxation; rows include inv, Agrph and ldder.)

57 Future work and conclusions. Elaine/IPOPT (Andreas Waechter). Handle linear constraints as before/directly. Nonlinear constraints as before/filter method. Simple bounds via primal-dual interior point method. Spherical trust region scaled appropriately/line search method.
