
The Optimization of High-Performance Digital Circuits. Andrew Conn (with Michael Henderson and Chandu Visweswariah), IBM Thomas J. Watson Research Center.


Slide 1: The Optimization of High-Performance Digital Circuits. Andrew Conn (with Michael Henderson and Chandu Visweswariah), IBM Thomas J. Watson Research Center, Yorktown Heights, NY.

Slide 2: Outline

Slide 3: Circuit optimization

Slide 4: Dynamic vs. static optimization. [Diagram: two loops. Dynamic: a nonlinear optimizer passes transistor and wire sizes to a simulator and receives function and gradient values. Static: the optimizer exchanges the same quantities with a static timing analyzer.]

Slide 5: Dynamic vs. static optimization

Slide 6: Custom? High-performance?

Slide 7: EinsTuner: formal static optimizer. [Diagram: the embedded time-domain simulator SPECS sits inside the static transistor-level timer EinsTLT; the nonlinear optimizer LANCELOT supplies transistor and wire sizes and receives function and gradient values.]

Slide 8: Components of EinsTuner.
1. Read netlist; create timing graph (EinsTLT).
2. Formulate pruned optimization problem.
3. Feed the problem to the nonlinear optimizer (LANCELOT); solve it, calling the simulator for delays/slews and gradients thereof (fast simulation and incremental sensitivity computation in SPECS), until a converged solution is obtained.
4. Snap-to-grid; back-annotate; re-time.

Slide 9: Static optimization formulation

Slide 10: Digression: minimax optimization

Slide 11: Remapped problem
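The remapping behind slides 10 and 11 is the standard minimax-to-nonlinear-program transformation; a sketch in the notation of the formulation (the auxiliary variable z is the same z whose simple bounds are discussed later in the deck):

```latex
% Minimax objective: minimize the worst delay measurement over all paths
\min_{x} \; \max_{i=1,\dots,m} f_i(x)
% Remapped to a smooth nonlinear program with one extra variable z:
\min_{x,\,z} \; z \quad \text{subject to} \quad f_i(x) \le z, \quad i = 1,\dots,m
```

The max of smooth functions is nonsmooth, so the first form is awkward for a gradient-based optimizer; the second form has a linear objective and smooth constraints, at the cost of one extra variable.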

Slide 12: Springs and planks analogy

Slide 13: Springs and planks analogy

Slide 14: Algorithm animation: inv3. [Animation: one frame per iteration. Delay vs. logic stages, with primary inputs ordered by criticality; wire and gate elements shown; red = critical, green = non-critical; curvature = sensitivity; thickness = transistor size.]

Slide 15: Algorithm demonstration: 1b ALU

Slide 16: Constraint generation. [Diagram: timing-graph edge from node i to node j.]

Slide 17: Statement of the problem

Slide 18: SPECS: fast simulation.
– Two orders of magnitude faster than SPICE.
– 5% typical stage delay and slew accuracy; 20% worst-case.
– Event-driven algorithm; simplified device models; specialized integration methods.
– Invoked via a programming interface.
– Accurate gradients indispensable.
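To illustrate what "event-driven" means here, a minimal sketch (an illustration of the idea only, not SPECS itself): events are processed in time order from a priority queue, and a stage is evaluated only when an event actually reaches it, which is where the speedup over fixed-timestep simulation comes from.

```python
import heapq

def simulate_chain(stage_delays, t0=0.0, value=1):
    """Event-driven simulation sketch: an input edge propagates through a
    chain of inverters; each stage is evaluated only when an event arrives."""
    events = [(t0, 0, value)]        # (time, stage index, logic value at input)
    trace = {}
    while events:
        t, stage, val = heapq.heappop(events)
        trace[stage] = (t, val)
        if stage == len(stage_delays):
            continue                 # event has reached the primary output
        # inverter: flip the value, schedule the output event after its delay
        heapq.heappush(events, (t + stage_delays[stage], stage + 1, 1 - val))
    return trace

trace = simulate_chain([1.0, 2.0, 0.5])
# trace[3] is the primary-output event: it arrives at t = 3.5 with value 0
```

Idle parts of the circuit never enter the queue, so the cost scales with activity rather than with circuit size times timestep count.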

Slide 19: LANCELOT

Slide 20: LANCELOT algorithms.
– Uses an augmented Lagrangian for the nonlinear constraints: Φ(x, λ) = f(x) + Σ_i [ λ_i c_i(x) + c_i(x)² / (2μ) ].
– Simple bounds handled explicitly.
– Adds slacks to inequalities.
– Trust-region method.
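A toy sketch of the augmented-Lagrangian iteration for a single equality constraint (illustrative only: LANCELOT also manages the penalty parameter μ, exploits group partial separability, and uses trust regions rather than plain gradient descent; here μ is held fixed and the inner solve is gradient descent):

```python
def augmented_lagrangian(f_grad, c, c_grad, x=0.0, lam=0.0, mu=0.5):
    """Minimize f subject to c(x) = 0 using the slide's
    Phi(x, lam) = f(x) + lam * c(x) + c(x)**2 / (2 * mu)."""
    for _ in range(20):                    # outer loop: multiplier updates
        for _ in range(2000):              # inner loop: gradient descent on Phi
            g = f_grad(x) + (lam + c(x) / mu) * c_grad(x)   # dPhi/dx
            x -= 0.01 * g
        lam += c(x) / mu                   # first-order multiplier update
    return x, lam

# toy problem: minimize x^2 subject to x - 1 = 0 (solution x = 1, lam = -2)
x, lam = augmented_lagrangian(f_grad=lambda x: 2.0 * x,
                              c=lambda x: x - 1.0,
                              c_grad=lambda x: 1.0)
# x converges to 1 and lam to -2, the KKT point (2x + lam = 0 there)
```

Each outer iteration shifts the multiplier by c(x)/μ, so the constraint can be driven to zero without sending the penalty weight to infinity, which is the practical advantage over a pure quadratic penalty.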

Slide 21: LANCELOT algorithms, continued: simple bounds; trust region.

Slide 22: Customization of LANCELOT.
– Cannot just be used as a black box; non-standard options may be preferable, e.g. solving the BQP subproblem accurately.
– Magic steps.
– Noise considerations.
– Structured secant updates.
– Adjoint computations.
– Preprocessing (pruning).
– Failure recovery in conjunction with SPECS.

Slide 23: LANCELOT.
– State-of-the-art large-scale nonlinear optimization package.
– Group partial separability is heavily exploited in our formulation.
– Two-step updates applied to linear variables.
– Specialized criteria for initializations, updates, adjoint computations, stopping, and dealing with numerical noise.

Slide 24: Aids to convergence.
– Initialization of multipliers and variables.
– Scaling; choice of units.
– Choice of simple bounds on arrival times and z.
– Reduction of numerical noise.
– Reduction of dimensionality: treating fanout capacitances as "internal variables" of the optimization.
– Tuning of LANCELOT to be aggressive; accurate solution of the BQP.

Slide 25: Demonstration of degeneracy

Slide 26: Demonstration of degeneracy. Degeneracy!

Slide 27: Why do we need pruning?

Slide 28: Pruning of the timing graph.
– The timing graph can be manipulated to reduce the number of arrival-time variables, to reduce the number of timing constraints, and, most of all, to reduce degeneracy.
– No loss in generality or accuracy.
– Bottom line: on average, an 18.3x reduction in arrival-time variables, 33% fewer variables, 43% fewer timing constraints, 22% fewer constraints, and a 1.7x to 4.1x reduction in run time on large problems.

Slide 29: Pruning strategy.
– During pruning, the number of fanins of any unpruned node monotonically increases, and so does its number of fanouts.
– Therefore, if a node is not pruned in the first pass, it will never be pruned.
– Therefore, a one-pass algorithm can be used for a given pruning criterion.

Slide 30: Pruning strategy.
– The order of pruning provably produces different (possibly sub-optimal) results.
– Greedy 3-pass pruning produces a "very good" (but perhaps non-optimal) result; we have not been able to demonstrate a better result than greedy 3-pass pruning.
– However, the quest for a provably optimal solution continues...

Slide 31: Pruning: an example. [Diagram: nodes 1-4.]

Slide 32: Block-based vs. path-based timing. [Diagram: the same six-node graph timed block-based and path-based.]

Slide 33: Block-based and path-based timing. In the timing graph, if a node has n fanins and m fanouts, eliminating it creates 2mn constraints in place of 2(m+n). Criterion: if 2mn ≤ 2(m+n) + 2, prune! [Diagram: the six-node graph before and after eliminating node 4.]
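The criterion is a one-line test (a direct transcription of the slide's inequality; the function and argument names are mine):

```python
def should_prune(n_fanins, m_fanouts):
    """Eliminating a timing-graph node with n fanins and m fanouts replaces
    the 2(m+n) constraints through it by 2mn path constraints (one rise/fall
    pair per fanin-fanout combination), so prune when 2mn <= 2(m+n) + 2."""
    return 2 * n_fanins * m_fanouts <= 2 * (n_fanins + m_fanouts) + 2

# single-fanin or single-fanout nodes always qualify:
assert should_prune(1, 7) and should_prune(9, 1)
# a 2x2 node still qualifies (8 <= 10), but a 3x3 node does not (18 > 14):
assert should_prune(2, 2) and not should_prune(3, 3)
```

Since merging never decreases a surviving node's fanin or fanout counts, a node that fails this test once fails it forever, which is exactly why the one-pass argument on slide 29 goes through.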

Slide 34: Detailed pruning example. [Diagram: timing graph with nodes 1-16.]

Slide 35: Detailed pruning example. [Diagram: source and sink nodes added.] Edges = 26, Nodes = 16 (+2).

Slide 36: Detailed pruning example. [Diagram.] Edges = 26 → 20, Nodes = 16 → 10.

Slide 37: Detailed pruning example. [Diagram.] Edges = 20 → 17, Nodes = 10 → 7.

Slide 38: Detailed pruning example. [Diagram: nodes 1, 2, 3 merged with node 7.] Edges = 17 → 16, Nodes = 7 → 6.

Slide 39: Detailed pruning example. [Diagram: nodes 11 and 14 merged.] Edges = 16 → 15, Nodes = 6 → 5.

Slide 40: Detailed pruning example. [Diagram: nodes 13 and 16 merged.] Edges = 15 → 14, Nodes = 5 → 4.

Slide 41: Detailed pruning example. [Diagram: nodes 10, 13, 16 merged.] Edges = 14 → 13, Nodes = 4 → 3.

Slide 42: Detailed pruning example. [Diagram.] Edges = 13 → 13, Nodes = 3 → 2.

Slide 43: Pruning vs. no pruning

Slide 44: Adjoint Lagrangian mode.
– Gradient computation is the bottleneck.
– If the problem has m measurements and n tunable transistor/wire sizes: the traditional direct method needs n sensitivity simulations; the traditional adjoint method needs m adjoint simulations.
– The adjoint Lagrangian method computes all gradients in a single adjoint simulation!

Slide 45: Adjoint Lagrangian mode.
– Useful for large circuits.
– Implication: additional timing/noise constraints at no extra cost!
– Predicated on close software integration between the optimizer and the simulator.
– Gradient computation is 8% of total run time on average.

Slide 46: Noise considerations.
– Noise is important during tuning.
– This is a semi-infinite problem: v(x, t) ≥ NM_L for all t in [t1, t2]. [Figure: waveform v(t) dipping below the noise margin NM_L between t1 and t2; shaded area = c(x).]

Slide 47: Noise considerations.
– Trick: remap the infinite number of constraints to a single integral constraint c(x) = 0.
– In adjoint Lagrangian mode, any number of noise constraints comes almost for free!
– General (constraints, objectives, minimax).
– Tradeoff analysis for dynamic library cells.
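A minimal numeric sketch of the remap (the waveform, margin, and interval values below are hypothetical): c(x) is the area by which the waveform dips below the noise margin, so c(x) = 0 exactly when every pointwise constraint v(x, t) ≥ NM_L holds.

```python
def noise_violation_area(v, nm_low, t1, t2, steps=2000):
    """Remap the semi-infinite constraint 'v(t) >= nm_low for all t in
    [t1, t2]' into one scalar: the area of the violation region, computed
    here by the midpoint rule."""
    h = (t2 - t1) / steps
    return sum(max(0.0, nm_low - v(t1 + (k + 0.5) * h)) * h
               for k in range(steps))

# hypothetical waveform: a triangular dip below the margin around t = 0.5
dip = lambda t: 1.0 - 0.6 * max(0.0, 1.0 - abs(t - 0.5) / 0.2)

flat_area = noise_violation_area(lambda t: 1.0, nm_low=0.8, t1=0.0, t2=1.0)
dip_area = noise_violation_area(dip, nm_low=0.8, t1=0.0, t2=1.0)
# flat_area is 0.0 (constraint satisfied everywhere); dip_area is positive
```

Because the remapped quantity is a single scalar constraint, it slots directly into the adjoint Lagrangian machinery of slides 44-45, which is why extra noise constraints are "almost for free".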

Slide 48: Noise considerations

Slide 49: Initialization of λ's. [Figure: example multiplier values 1/2, 1/6, 1/4, 1/6.]

Slide 50: Some numerical results (dynamic)

Slide 51: Some numerical results (static)

Slide 52: Lagrangian relaxation: motivation. The tuning community loves the approach: it reduces the size of the problem, and it reduces redundancy and degeneracy. BUT... you never get something for nothing.

Slide 53: Lagrangian relaxation (continued). The complicating constraints are relaxed into the objective function.

Slide 54: Lagrangian relaxation (continued). Substituting the first-order optimality conditions on λ removes the dependence on z and the arrival times, because of the problem's structure!
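One way to see the claim on this slide (an illustrative sketch; the arrival-time variables a_k, edge delays d_{jk}(x), and multipliers λ are my notation, not the slides'):

```latex
% Relax the timing constraints a_j + d_{jk}(x) \le a_k and the output
% constraints a_o \le z into the objective:
L(x, a, z, \lambda) = z
  + \sum_{(j,k)} \lambda_{jk}\,\bigl(a_j + d_{jk}(x) - a_k\bigr)
  + \sum_{o} \lambda_{o}\,(a_o - z)
% Stationarity in z:
\sum_{o} \lambda_{o} = 1
% Stationarity in each a_k ("flow conservation" on the multipliers):
\sum_{j\,:\,(j,k)} \lambda_{jk} = \sum_{l\,:\,(k,l)} \lambda_{kl}
  + \lambda_{k}\ \text{(if $k$ is an output)}
% Substituting these conditions, every a_k and z cancels, leaving a
% subproblem in the sizes x alone:
L(x, \lambda) = \sum_{(j,k)} \lambda_{jk}\, d_{jk}(x)
```

This is why the relaxed subproblem is so much smaller: the auxiliary variable z and all arrival times drop out, and only the sizing variables x remain, at the price of having to update the multipliers in an outer loop.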

Slide 55: Lagrangian relaxation

Slide 56: Lagrangian relaxation (continued).

Name    FETs   Iterations (w/o | w | w/)   CPU(s) (w/o | w)
inv3       8      61 |   2 |   36             5.1 |    5.3
a3.2      34      41 |  17 |  215            21.5 |  138.2
Agrph     34      30 |  28 |  303            38.1 |  434.7
ldder     46      41 |  16 |  182            21.5 |  121.1
s3901     72      40 |  34 |  109            87.1 |  479.7
s3902    102      67 |  61 |  471           990.5 |  842.6
s3903    154      26 |  29 |  131           236.8 |   1502
ppc-1    824      37 | 117 | 1181           18185 |   7430
s3904    882      20 | 156 |  386           532.8 |  14553
c8       584      53 | 194 | 1372           10043 |   7068

Slide 57: Future work and conclusions.
– Elaine/IPOPT (Andreas Waechter).
– Handle linear constraints as before / directly.
– Nonlinear constraints as before / via a filter method.
– Simple bounds via a primal-dual interior-point method.
– Spherical trust region scaled appropriately / line-search method.

Slide 58: References.
– C. Visweswariah, A. R. Conn, and L. Silva. Exploiting optimality conditions in accurate static circuit tuning. In High Performance Algorithms and Software for Nonlinear Optimization, G. Di Pillo and A. Murli, eds., pages 1-19, Kluwer, 2002 (to appear).
– A. R. Conn, P. K. Coulman, R. A. Haring, G. L. Morrill, and C. Visweswariah. Optimization of custom MOS circuits by transistor sizing. To appear (2002) in The Best of ICCAD: 20 Years of Excellence in Computer-Aided Design; originally in IEEE International Conference on Computer-Aided Design, pages 174-180, Nov. 1996.
– A. R. Conn and C. Visweswariah. Overview of continuous optimization advances and applications to circuit tuning. Proceedings of the International Symposium on Physical Design, pages 74-81, ACM Press, New York, 2001.

Slide 59: References.
– C. Visweswariah, R. A. Haring, and A. R. Conn. Noise considerations in circuit optimization. IEEE Transactions on Computer-Aided Design of ICs and Systems, vol. 19, pages 679-690, June 2000.
– C. Visweswariah and A. R. Conn. Formulation of static circuit optimization with reduced size, degeneracy and redundancy by timing graph manipulation. IEEE International Conference on Computer-Aided Design, pages 244-251, Nov. 1999.
– A. R. Conn, L. N. Vicente, and C. Visweswariah. Two-step algorithms for nonlinear optimization with structured applications. SIAM Journal on Optimization, vol. 9, no. 4, pages 924-947, September 1999.

Slide 60: References.
– A. R. Conn, I. M. Elfadel, W. W. Molzen, Jr., P. R. O'Brien, P. N. Strenski, C. Visweswariah, and C. B. Whan. Gradient-based optimization of custom circuits using a static-timing formulation. Proc. Design Automation Conference, pages 452-459, June 1999.
– A. R. Conn, R. A. Haring, and C. Visweswariah. Noise considerations in circuit optimization. IEEE International Conference on Computer-Aided Design, pages 220-227, Nov. 1998.
– A. R. Conn, P. K. Coulman, R. A. Haring, G. L. Morrill, C. Visweswariah, and C. W. Wu. JiffyTune: circuit optimization using time-domain sensitivities. IEEE Transactions on Computer-Aided Design of ICs and Systems, vol. 17, no. 12, pages 1292-1309, December 1998.

Slide 61: References.
– A. R. Conn, R. A. Haring, C. Visweswariah, and C. W. Wu. Circuit optimization via adjoint Lagrangians. IEEE International Conference on Computer-Aided Design, pages 281-288, Nov. 1997.
– A. R. Conn, R. A. Haring, and C. Visweswariah. Efficient time-domain simulation and optimization of digital FET circuits. Mathematical Theory of Networks and Systems, May 1996.
– A popular article on our work by Stewart Wolpin: http://www.research.ibm.com/thinkresearch/pages/2002/20020625_einstuner.shtml

