-1- Sensitivity-Guided Metaheuristics for Accurate Discrete Gate Sizing Jin Hu*, Andrew B. Kahng, Seokhyeong Kang, Myung-Chul Kim* and Igor L. Markov*


1 -1- Sensitivity-Guided Metaheuristics for Accurate Discrete Gate Sizing
Jin Hu*, Andrew B. Kahng, Seokhyeong Kang, Myung-Chul Kim* and Igor L. Markov*
UC San Diego, *University of Michigan
International Conference on Computer-Aided Design, November 5th, 2012

2 -2- Outline
Background and Motivation
Sensitivity-Guided Metaheuristics
–Global Timing Recovery
–Power Reduction with Feasible Timing
Experimental Results
Conclusions and Ongoing Work

3 -3- Gate Sizing in VLSI Design
Gate sizing
–Effective approach to power and delay optimization
–Sizing problem seen at all phases of the RTL-to-GDS flow
Energy vs. Performance Envelope in VLSI Design
[Figure: energy vs. delay plot of all possible designs; the Pareto frontier of the energy consumption vs. performance tradeoff runs between the lowest possible delay and the lowest possible energy]

4 -4- Gate Sizing in VLSI Design
Objective
–Size the library cell of each gate while minimizing total power subject to design constraints (e.g., slack, slew, capacitance)

5 -5- Gate Sizing in VLSI Design
Objective
–Size the library cell of each gate while minimizing total power subject to design constraints (e.g., slack, slew, capacitance)
Tunable parameters: gate width, gate length and Vth
–Gate width (drive strength): INVX2, INVX4, INVX8, INVX16
–Multi-Vth: HVT, NVT, LVT
–Gate-length (Lgate) bias: L=60nm, L=65nm, L=55nm
[Figure: each parameter ranges from lower power, lower speed to higher power, higher speed]

6 -6- Previous Approaches
Common heuristics/algorithms
–Continuous and discrete methods: linear programming, convex optimization, Lagrangian relaxation, dynamic programming, sensitivity-based sizing
[Figure: methods compared along optimality and scalability axes]
Limitations
–Continuous methods: industrial cell libraries offer discrete gate sizes, and rounding solutions is not easy
–Discrete methods: scalability to large circuits is an issue
–Do not account for realistic delay models and constraints (capacitance, slew)

7 -7- Stochastic Combinatorial Optimization
Hard combinatorial optimization problems are often solved using Simulated Annealing (SA) or other metaheuristics
Our work uses two newer metaheuristic frameworks: Large-Step Markov Chains and Go-With-The-Winners
SA: analogy to physical annealing and thermodynamic ensembles [Kirkpatrick, Gelatt, Vecchi 1983]
State = solution; Energy = cost
Optimal only in the limit of infinitely slow cooling and infinite runtime
–Annealing on fractal landscapes: Sorkin91
–Finite-time annealing: BoeseK93

8 -8- Stochastic Combinatorial Optimization: LSMC
Large-Step Markov Chains [Martin, Otto, Felten 1991]: iteratively perform two operations:
1. descend using a greedy search method
2. perturb the local-optimum result with a kick move
Takes advantage of an available local search heuristic; more efficient than conventional simulated annealing
LSMC is essentially greedy, but with a powerful neighborhood operator (= {kick + descent}); it always steps directly from one local minimum to a better local minimum
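The kick-then-descend loop described on this slide can be sketched as follows. This is a minimal illustration, not Trident's code: the toy cost landscape, the neighbor function and the kick move are all hypothetical, and acceptance is purely greedy (a kicked-and-descended candidate replaces the current solution only if it is better), matching the slide's description.

```python
import random

def descend(x, cost, neighbors):
    """Greedy descent: move to the best neighbor until no neighbor improves."""
    while True:
        best = min(neighbors(x), key=cost)
        if cost(best) >= cost(x):
            return x  # x is a local minimum
        x = best

def lsmc(x0, cost, neighbors, kick, iterations, rng):
    """Kick the current local minimum, descend again, keep the result if better."""
    current = descend(x0, cost, neighbors)
    for _ in range(iterations):
        candidate = descend(kick(current, rng), cost, neighbors)
        if cost(candidate) < cost(current):
            current = candidate
    return current

# Hypothetical toy landscape on integers 0..99 with a local minimum every 7 steps.
def toy_cost(x):
    return (x - 70) ** 2 / 100 + 5 * (x % 7)

def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 99]

def toy_kick(x, rng):
    return min(99, max(0, x + rng.randint(-10, 10)))

rng = random.Random(0)
start = descend(5, toy_cost, toy_neighbors)      # plain greedy result
result = lsmc(5, toy_cost, toy_neighbors, toy_kick, 200, rng)
```

Plain descent stops at the first local minimum it reaches, while the kick lets the search hop to a neighboring basin without giving up the best solution found so far.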

9 -9- Stochastic Combinatorial Optimization: GWTW
Go-With-The-Winners [Aldous, Vazirani 1994]: invoke greedy heuristics with randomized multi-starts, and explore a large space by continuing the search from a small set of best-seen solutions
Finds the global optimum with high probability under certain assumptions [AV94]
Runtime of GWTW is bounded by a polynomial in the depth of the tree and a tree imbalance parameter
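A rough sketch of the Go-With-The-Winners idea, under simplifying assumptions: every member of a population takes one randomized greedy step, and the population is then refilled with copies of the few best solutions. The function names, the toy cost and the step rule are hypothetical and only illustrate the control flow.

```python
import random

def gwtw(starts, step, cost, rounds, keep, rng):
    """Advance every solution one step, then continue only from the `keep` best."""
    population = list(starts)
    for _ in range(rounds):
        population = [step(x, rng) for x in population]
        winners = sorted(population, key=cost)[:keep]
        # Refill the population by cloning the winners round-robin.
        population = [winners[i % len(winners)] for i in range(len(population))]
    return min(population, key=cost)

# Hypothetical toy problem: find the integer closest to 42.
def toy_cost(x):
    return abs(x - 42)

def toy_step(x, rng):
    """Randomized greedy step: accept a random move only if it is not worse."""
    candidate = x + rng.randint(-3, 3)
    return candidate if toy_cost(candidate) <= toy_cost(x) else x

rng = random.Random(1)
starts = [rng.randint(0, 100) for _ in range(8)]   # randomized multi-starts
best = gwtw(starts, toy_step, toy_cost, rounds=50, keep=2, rng=rng)
```

The `keep` argument plays the role of the "small set of best-seen solutions"; a larger value preserves more diversity at the cost of spending effort on weaker trajectories.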

10 -10- Our Work: Sensitivity-Guided Metaheuristics
We apply sensitivity-guided metaheuristics based on the Go-With-The-Winners paradigm
–Define a parameterized space for gate sizing
–Explore a heuristic space using a multistart technique and efficient parallelization on a multi-core system
–Use total negative slack (TNS) as a sensitivity function (with a fast estimation technique)
Infrastructure: ISPD 2012 gate sizing contest
–Realistic benchmarks mapped into a modern discrete gate library

11 -11- Outline
Background and Motivation
Sensitivity-Guided Metaheuristics
–Global Timing Recovery
–Power Reduction with Feasible Timing
Experimental Results
Conclusions and Ongoing Work

12 -12- Trident: Sensitivity-Guided Metaheuristics
In our heuristic, the multiple tines of a trident represent multiple solution trajectories
Trident: central to the symbols of both UCSD and Ukraine

13 -13- Trident: Sensitivity-Guided Metaheuristics
Our heuristic: explore a parameterized heuristic space with multistarts, then apply Go-With-The-Winners
[Figure: solution trajectories from the initial solution to the final solution, via multistarts and go-with-the-winners]
Global Timing Recovery (GTR): find violation-free solutions with multistarts (recover feasibility)
Power Reduction with Feasible Timing (PRFT): iteratively reduce total leakage with greedy downsizing (maintain feasibility)

14 -14- Trident: Entire Flow
[Flow diagram]
Input Design (Netlist, SPEF, SDF, Cell Library)
Initial Cell Assignments
Primary Optimization (multi-threaded)
–Global Timing Recovery (GTR): multistart coarse search, then fine search, producing a violation-free solution
–Power Reduction with Feasible Timing (PRFT): sensitivity-guided greedy sizing and perturbing (upsizing) bottleneck cells
Final Cell Assignments

15 -15- Global Timing Recovery: Flow on Each Thread
GTR seeks violation-free solutions with two parameters: α, the leakage exponent, and γ, the percentage of cells to upsize
Flow:
1. Run static timing analysis
2. Calculate sensitivity (α) for cells with negative slack
3. Upsize γ% of cells in descending order of sensitivity
4. Timing met? If NO, update timing and return to step 2
TNS: total negative slack
ΔTNS: TNS reduction after cell upsizing
Δleakage: cell leakage increase after cell upsizing
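The per-thread GTR loop can be sketched on a toy timing model. Everything here is an assumption made for illustration: the delay and leakage tables, the path-based timer, and in particular the sensitivity form ΔTNS / Δleakage^α, which is inferred from α being called the leakage exponent and is not spelled out in the transcript.

```python
import math

DELAY = [8.0, 6.0, 5.0, 4.5]   # hypothetical delay per size index
LEAK = [1.0, 2.0, 4.0, 8.0]    # hypothetical leakage per size index

def tns(sizes, paths, required):
    """Total negative slack: sum of negative path slacks (0.0 when timing is met)."""
    total = 0.0
    for path in paths:
        slack = required - sum(DELAY[sizes[c]] for c in path)
        total += min(0.0, slack)
    return total

def negative_slack_cells(sizes, paths, required):
    cells = set()
    for path in paths:
        if sum(DELAY[sizes[c]] for c in path) > required:
            cells.update(path)
    return cells

def gtr(sizes, paths, required, alpha, gamma, max_iters=100):
    """Upsize the top gamma% most sensitive cells until TNS reaches zero."""
    sizes = dict(sizes)
    for _ in range(max_iters):
        base = tns(sizes, paths, required)
        if base == 0.0:
            return sizes                       # timing met
        scored = []
        for c in negative_slack_cells(sizes, paths, required):
            if sizes[c] + 1 >= len(DELAY):
                continue                       # already at the largest size
            trial = dict(sizes)
            trial[c] += 1
            d_tns = tns(trial, paths, required) - base     # TNS reduction
            d_leak = LEAK[trial[c]] - LEAK[sizes[c]]       # leakage increase
            scored.append((d_tns / d_leak ** alpha, c))    # assumed sensitivity
        if not scored:
            return None                        # nothing left to upsize
        scored.sort(reverse=True)
        count = max(1, math.ceil(len(scored) * gamma / 100.0))
        for _, c in scored[:count]:
            sizes[c] += 1
    return sizes if tns(sizes, paths, required) == 0.0 else None

paths = [["a", "b", "c"], ["b", "d"]]
initial = {"a": 0, "b": 0, "c": 0, "d": 0}
solution = gtr(initial, paths, required=18.0, alpha=1.0, gamma=34.0)
```

A larger α penalizes leakage-hungry upsizing more strongly, and a larger γ recovers timing in fewer iterations at the risk of oversizing, which is why the two are swept together.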

16 -16- Global Timing Recovery: TNS Estimation
Δdelay = delay change (old - new) from upsizing
Npaths = # of negative-slack paths through the cell
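The estimation formula itself did not survive in the transcript, so the sketch below is only one plausible reading of the two definitions: weight the delay change of a cell by the number of negative-slack paths through it. Both the product form in `estimate_delta_tns` and the path-counting scheme (paths restricted to negative-slack cells, counted by dynamic programming over the DAG) are assumptions, not the authors' stated method.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def paths_through(fanout, neg_slack):
    """Count paths through each cell, using only cells with negative slack."""
    nodes = [n for n in fanout if n in neg_slack]
    succ = {n: [m for m in fanout[n] if m in neg_slack] for n in nodes}
    pred = {n: [] for n in nodes}
    for n in nodes:
        for m in succ[n]:
            pred[m].append(n)
    order = list(TopologicalSorter(pred).static_order())  # predecessors first
    to_node = {}
    for n in order:
        to_node[n] = sum(to_node[p] for p in pred[n]) or 1
    from_node = {}
    for n in reversed(order):
        from_node[n] = sum(from_node[s] for s in succ[n]) or 1
    return {n: to_node[n] * from_node[n] for n in nodes}

def estimate_delta_tns(delta_delay, n_paths):
    """Assumed first-order estimate: each negative-slack path gains delta_delay."""
    return delta_delay * n_paths

# Hypothetical netlist: a and b drive c, which drives d and e; e meets timing.
fanout = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": [], "e": []}
counts = paths_through(fanout, neg_slack={"a", "b", "c", "d"})
estimate = estimate_delta_tns(3.0, counts["c"])
```

An estimate of this kind avoids a full timing update per candidate cell, which is the point of a fast estimation technique when many cells must be ranked.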

17 -17- Global Timing Recovery: Multistart
Multistart with different parameters and Go-With-The-Winners; GTR sweeps parameters α and γ, and chooses the best (minimum leakage) solution
[Table, symbols lost in extraction: columns Search Space, Step Size, Thres, Best solutions; rows Coarse Search (step size CGS) and Fine Search (step size FGS)]
Coarse search: sweep the initial search space and keep the best solutions (α, γ)
Fine search: search a range of half a step on either side of each best-seen value ([ - /2, + /2] in the original)
Focus on ranges around the best-seen parameters
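The coarse-then-fine sweep over (α, γ) can be sketched as below. The `evaluate` callback stands in for one GTR run and is hypothetical, as are the ranges, step sizes and the choice of a fine step equal to a quarter of the coarse step. The sketch runs sequentially, whereas the slides describe the starts as multi-threaded.

```python
def frange(lo, hi, step):
    """Inclusive list of evenly spaced values from lo to hi."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]

def coarse_to_fine(evaluate, a_range, g_range, a_step, g_step, n_best):
    """evaluate(alpha, gamma) returns leakage, or None if timing is not met."""
    def sweep(alphas, gammas):
        results = []
        for a in alphas:
            for g in gammas:
                leak = evaluate(a, g)
                if leak is not None:
                    results.append((leak, a, g))
        return results

    coarse = sorted(sweep(frange(*a_range, a_step), frange(*g_range, g_step)))
    fine = list(coarse)
    for _, a, g in coarse[:n_best]:            # continue from the winners only
        fine += sweep(frange(a - a_step / 2, a + a_step / 2, a_step / 4),
                      frange(g - g_step / 2, g + g_step / 2, g_step / 4))
    return min(fine) if fine else None

# Hypothetical stand-in for a GTR run: a smooth bowl with its minimum
# near alpha = 0.9, gamma = 24.
def toy_leakage(alpha, gamma):
    if gamma <= 0:
        return None
    return 1.0 + (alpha - 0.9) ** 2 + ((gamma - 24.0) / 50.0) ** 2

coarse_only = coarse_to_fine(toy_leakage, (0.5, 2.5), (10, 50), 0.5, 10, n_best=0)
refined = coarse_to_fine(toy_leakage, (0.5, 2.5), (10, 50), 0.5, 10, n_best=2)
```

Each result is a `(leakage, alpha, gamma)` tuple, so `refined[1:]` gives the parameters that would be handed to the next stage.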

18 -18- Power Reduction with Feasible Timing
In GTR, some cells are oversized
PRFT iteratively reduces total leakage power using sensitivity-guided greedy sizing (SGGS)
SGGS procedure:
1. Run static timing analysis
2. Calculate sensitivity for all cells
3. Downsize the cell C with maximum sensitivity
4. Run incremental STA
5. slack(C) < 0? If YES, revert the sizing; if NO, keep it
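A minimal sketch of the SGGS loop on a toy timing model. The delay and leakage tables and the path-based slack check are hypothetical, the sensitivity used is the Δleakage / Δdelay form, and the sketch checks the worst slack over all paths rather than only the slack of the resized cell.

```python
DELAY = [8.0, 6.0, 5.0, 4.5]   # hypothetical delay per size index
LEAK = [1.0, 2.0, 4.0, 8.0]    # hypothetical leakage per size index

def worst_slack(sizes, paths, required):
    return min(required - sum(DELAY[sizes[c]] for c in p) for p in paths)

def sensitivity(size):
    """Leakage saved per unit of delay added when downsizing by one step."""
    return (LEAK[size] - LEAK[size - 1]) / (DELAY[size - 1] - DELAY[size])

def sggs(sizes, paths, required):
    """Greedily downsize the most sensitive cell; revert moves that break timing."""
    sizes = dict(sizes)
    blocked = set()                      # cells whose downsizing was reverted
    while True:
        candidates = [c for c in sizes if sizes[c] > 0 and c not in blocked]
        if not candidates:
            return sizes
        cell = max(candidates, key=lambda c: sensitivity(sizes[c]))
        sizes[cell] -= 1                 # downsize
        if worst_slack(sizes, paths, required) < 0:
            sizes[cell] += 1             # revert the sizing
            blocked.add(cell)

def total_leakage(sizes):
    return sum(LEAK[s] for s in sizes.values())

paths = [["a", "b", "c"], ["b", "d"]]
oversized = {"a": 3, "b": 3, "c": 3, "d": 3}
result = sggs(oversized, paths, required=18.0)
```

In this toy model a reverted cell never becomes feasible again, because further downsizing of other cells only slows the paths, so blocking it permanently is safe; a real timer with slew effects would not give that guarantee.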

19 -19- PRFT: Sensitivity Functions
PRFT runs multiple SGGS passes with different sensitivity functions (SF1 ~ SF5)

SF1   Δleakage / Δdelay
SF2   Δleakage * slack
SF3   Δleakage / (Δdelay * #paths)
SF4   Δleakage * slack / #paths
SF5   Δleakage * slack / (Δdelay * #paths)

Each SF provides a different solution, and we select the best solution among them
Each run automatically finds the best SF for a given testcase
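The five sensitivity functions and the "pick the best" step translate directly into code. The sketch assumes that leakage and delay in the table denote the changes caused by a sizing move (Δleakage, Δdelay); `run_sggs` is a hypothetical callback standing in for one greedy sizing pass that returns the final leakage.

```python
SENSITIVITY_FUNCTIONS = {
    "SF1": lambda d_leak, d_delay, slack, n_paths: d_leak / d_delay,
    "SF2": lambda d_leak, d_delay, slack, n_paths: d_leak * slack,
    "SF3": lambda d_leak, d_delay, slack, n_paths: d_leak / (d_delay * n_paths),
    "SF4": lambda d_leak, d_delay, slack, n_paths: d_leak * slack / n_paths,
    "SF5": lambda d_leak, d_delay, slack, n_paths:
        d_leak * slack / (d_delay * n_paths),
}

def best_sensitivity_function(run_sggs):
    """Run one pass per SF and return (name, leakage) of the lowest-leakage run."""
    results = {name: run_sggs(sf) for name, sf in SENSITIVITY_FUNCTIONS.items()}
    name = min(results, key=results.get)
    return name, results[name]

# Evaluate all five on one hypothetical move: 2.0 leakage saved, 0.5 delay
# added, 4.0 slack available, 2 paths through the cell.
values = {name: sf(2.0, 0.5, 4.0, 2)
          for name, sf in SENSITIVITY_FUNCTIONS.items()}
```

The functions differ in how much they favor cells with spare slack or few paths, which is why no single one is best on every testcase.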

20 -20- PRFT: Speeding up Bottleneck Cells
Monotonic downward sizing can get stuck in a local optimum
Speed up bottleneck cells: recover timing slack with minimum power impact
Perturbation and greedy sizing recall the LSMC approach
Flow: sensitivity-guided greedy sizing with SFi gives the best solution; if it is the best seen (yes), speed up γ% of bottleneck cells (kick move) and repeat; otherwise (no), output the final solution
[Figure: progression of GTR and PRFT in the (TNS, leakage) plane]

21 -21- Incremental Static Timing Analysis
In PRFT, cell slack must be recalculated after each cell sizing; incremental STA is used to reduce runtime
To achieve further speedup, we propagate updated timing only when the change is larger than a propagation threshold (e.g., 0.1ps)
1. Update cell delay, transition time and AAT (actual arrival time)
2. Update RAT (required arrival time) and slack
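The propagation threshold can be illustrated with a worklist update of arrival times. This is a simplified sketch: it handles arrival times only (no transition times or required times), the three-cell chain is hypothetical, and stopping when the change is at most the threshold is an assumption about how the cutoff is applied.

```python
from collections import deque

def propagate_arrival(fanin, fanout, delay, arrival, changed, threshold):
    """Re-propagate arrival times from `changed`; prune updates <= threshold."""
    queue = deque([changed])
    updates = 0
    while queue:
        node = queue.popleft()
        latest_input = max((arrival[p] for p in fanin[node]), default=0.0)
        new_arrival = latest_input + delay[node]
        change = abs(new_arrival - arrival[node])
        if node != changed and change <= threshold:
            continue            # change too small: stop along this branch
        arrival[node] = new_arrival
        updates += 1
        queue.extend(fanout[node])
    return updates

def make_chain():
    """Hypothetical chain x -> y -> z with consistent initial arrival times."""
    fanin = {"x": [], "y": ["x"], "z": ["y"]}
    fanout = {"x": ["y"], "y": ["z"], "z": []}
    delay = {"x": 1.0, "y": 2.0, "z": 3.0}
    arrival = {"x": 1.0, "y": 3.0, "z": 6.0}
    return fanin, fanout, delay, arrival

# Resizing x changes its delay by 0.05; with a 0.1 threshold only x is updated.
fanin, fanout, delay, arrival = make_chain()
delay["x"] = 0.95
pruned_updates = propagate_arrival(fanin, fanout, delay, arrival, "x", 0.1)
```

With a zero threshold the same change would be propagated to every downstream cell, which is the exact but slower behaviour; the pruned version leaves a bounded error in the stale arrival times.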

22 -22- Handling Capacitance and Slew Violations
Each standard cell can drive a certain maximum capacitance, and transition time must be smaller than the maximum transition time
Trident removes max-capacitance and max-transition (slew) violations at every iteration of GTR
1. Backward traversal: visit cells in reverse order, and upsize driving cells
2. Forward traversal: downsize fanout cells
Requires one to two iterations
[Figure: max. cap. violation]

23 -23- Configurations for GWTW
Trident can configure the number of best-seen solutions used in GWTW
–GTR: coarse search
–GTR: fine search 1
–GTR: fine search 2
–PRFT: greedy sizing
–PRFT: kick-move + greedy sizing
[Figure: parameters from the N best-seen solutions are passed between stages; SF1, SF2, ...]
Which configuration is optimal in terms of runtime vs. sizing quality?
–More start points: more chances to find a near-optimum
–Runtime increases with the number of start points

24 -24- Outline
Background and Motivation
Sensitivity-Guided Metaheuristics
–Global Timing Recovery
–Power Reduction with Feasible Timing
Experimental Results
Conclusions and Ongoing Work

25 -25- ISPD 2012 Gate Sizing Contest [Ozdal et al.]
Provides benchmarks to accurately model the discrete gate-sizing problem
Netlist (Verilog), parasitics (SPEF), timing constraint (SDC)
Library: 11 different logic functions, each with 30 different cell types (three Vt options and ten different sizes), for 330 cells in total
The contest compares leakage power of violation-free solutions

26 -26- Analysis of Our Implementation
Trident is written in C++ and has a built-in static timer
–Significantly faster than Tcl access to PT in the contest environment
–Runtime is improved with an incremental STA
Runtime for all 14 ISPD 2012 benchmarks: less than 83 hours with 4 threads (Intel Xeon E31230)
Runtime breakdown (for NETCARD_slow)
–Coarse search: 6.2%
–Fine-grain search: 25.1%
–Greedy sizing: 45.4%
–Perturbing iterations: 23.0%

27 -27- Characterization of ISTA
Runtime of full-scale STA (FSTA) and incremental STA (ISTA)
–Average runtime and maximum slack error were measured after randomly sizing 1% of cells

Benchmark   FSTA (sec)   ISTA runtime (msec)          ISTA max. error (ps)
                         0ps      0.1ps    1.0ps      0ps    0.1ps   1.0ps
DMA         0.233        1.495    0.845    0.508      0.0    0.10    2.20
PCI         0.271        0.982    0.717    0.348      0.0    0.22    1.73
DES         1.108        0.7      0.508    0.422      0.0    0.10    1.28
VGA         1.729        22.75    8.108    2.069      0.0    0.33    2.59
B19         2.435        5.46     2.717    1.833      0.0    0.33    1.94
LEON3MP     6.746        43.21    2.152    0.939      0.0    0.32    2.73
NETCARD     9.751        9.612    2.299    1.675      0.0    0.32    2.57
Geomean     924X         2.86X    1.0X     0.54X      0.0X   1.0X    9.1X
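The Geomean row appears to be normalized to the 0.1ps column. As a worked check, the sketch below recomputes the runtime ratios from the per-benchmark values as read from this slide; the split of the flattened digits into columns is itself an assumption, and the recomputed ratios come out close to the 924X, 2.86X and 0.54X shown on the slide.

```python
import math

# Per benchmark: FSTA runtime in seconds, then ISTA runtime in milliseconds
# for propagation thresholds of 0 ps, 0.1 ps and 1.0 ps.
ROWS = {
    "DMA":     (0.233, 1.495, 0.845, 0.508),
    "PCI":     (0.271, 0.982, 0.717, 0.348),
    "DES":     (1.108, 0.7,   0.508, 0.422),
    "VGA":     (1.729, 22.75, 8.108, 2.069),
    "B19":     (2.435, 5.46,  2.717, 1.833),
    "LEON3MP": (6.746, 43.21, 2.152, 0.939),
    "NETCARD": (9.751, 9.612, 2.299, 1.675),
}

def geomean(values):
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Ratios relative to ISTA with the 0.1 ps threshold (FSTA converted to msec).
fsta_ratio = geomean(f * 1000.0 / t01 for f, _, t01, _ in ROWS.values())
zero_ps_ratio = geomean(t0 / t01 for _, t0, t01, _ in ROWS.values())
one_ps_ratio = geomean(t1 / t01 for _, _, t01, t1 in ROWS.values())
```

A geometric mean is the natural choice here because the per-benchmark ratios span more than an order of magnitude, and an arithmetic mean would be dominated by the largest designs.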

28 -28- Tuning of ISTA
Runtime vs. quality of sizing solutions with different propagation thresholds
Testcase: DMA (with tight timing constraint)

Prop. threshold          0.0ps    0.1ps    1.0ps
ISTA runtime (msec)      1.495    0.845    0.508
ISTA max. error (ps)     0.0      0.10     2.20
Trident runtime (min)    19.4     13.9     12.8
Final leakage (mW)       0.299             0.306

29 -29- Configurations of GWTW
Five multi-threaded stages can provide n best-seen solutions for the GWTW heuristic
Solution quality vs. runtime
Configuration: [A][B][C][D][E]
Stage
–A: GTR coarse-grain search
–B: GTR fine-grain search I
–C: GTR fine-grain search II
–D: PRFT greedy sizing
–E: PRFT speed up bottleneck
Value
–0: skip the stage
–1: keep one best solution
–2: keep two best solutions
Default configuration: 22211
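The five-digit configuration can be read as one digit per stage. The small parser below is only an illustration of that encoding: the function name is hypothetical, and accepting any digit (not just 0 to 2) is an assumption, since the slide lists only those three values.

```python
STAGES = {
    "A": "GTR coarse-grain search",
    "B": "GTR fine-grain search I",
    "C": "GTR fine-grain search II",
    "D": "PRFT greedy sizing",
    "E": "PRFT speed up bottleneck",
}

def parse_config(config):
    """Map a string such as '22211' to the number of solutions kept per stage."""
    if len(config) != len(STAGES) or not config.isdigit():
        raise ValueError("expected one digit per stage, e.g. '22211'")
    return {stage: int(digit) for stage, digit in zip(STAGES, config)}

default = parse_config("22211")
skipped = [STAGES[s] for s, n in parse_config("20211").items() if n == 0]
```

Read this way, the default keeps two best-seen solutions through the three GTR stages and a single one through the two PRFT stages.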

30 -30- Experimental Results on ISPD Benchmarks
Leakage power, runtime and parameters for GTR and PRFT
Benchmark set with tight timing constraint
Parameters are the best values found by our heuristic

Benchmark          Leakage power     Runtime   GTR param.      PRFT param.
(# of cells)       GTR      PRFT     (min)     α      γ (%)    SF     γ (%)
DMA (25K)          0.65     0.299    14        0.91   24.5     SF5    1
PCI (33K)          0.348    0.183    13        0.91   34       SF4    4
DES (111K)         7.157    1.842    83        0.85   46.5     SF5    1
VGA (165K)         0.685    0.471    46        0.71   7.5      SF5    4
B19 (219K)         1.377    0.771    207       1.33   16.5     SF2    4
LEON3 (649K)       1.989    1.487    1323      0.71   7        SF4    1
NETCARD (959K)     1.997    1.861    1097      0.57   4        SF3    1

31 -31- Experimental Results on ISPD Benchmarks
Leakage power, runtime and parameters for GTR and PRFT
Benchmark set with loose timing constraint

Benchmark          Leakage power     Runtime   GTR param.      PRFT param.
(# of cells)       GTR      PRFT     (min)     α      γ (%)    SF     γ (%)
DMA (25K)          0.211    0.145    10        1               SF5    5
PCI (33K)          0.185    0.111    10        1.11   36       SF5    4
DES (111K)         0.922    0.614    70        0.83   8.5      SF2    3
VGA (165K)         0.454    0.351    88        1      10       SF4    3
B19 (219K)         0.718    0.583    214       1.5    7.5      SF5    1
LEON3 (649K)       1.422    1.341    1274      0.89   4        SF4    2
NETCARD (959K)     1.818    1.77     300       2.67   4        SF3    1

32 -32- Leakage Comparison on ISPD Benchmarks
Contest best: best of all entries in the competition (ISPD 2012 contest)
Intel Labs (contest organizer) released five (near-optimal) results
ISPD 2012 contest: http://archive.sigda.org/ispd/contests/12/ispd2012_contest.html
On all benchmarks except one, Trident achieves the lowest leakage power: a 43% further reduction over the contest winner. We outperform the Intel results on four.

33 -33- Conclusions
Within the research-oriented infrastructure used in the ISPD 2012 Gate-Sizing Contest, we have developed a metaheuristic approach to gate sizing
Our implementation, Trident, outperforms the best reported results on all but one of the ISPD 2012 benchmarks
Compared to the 2012 contest winner, we further reduce leakage power by an average of 43%

34 -34- Ongoing Work
Extension to support a real industry library
Consider adding interconnect delay in the next version of the sizer

35 -35- Thank you

36 -36- Experimental Results on ISPD Benchmarks
Leakage power, runtime and parameters for GTR and PRFT

Benchmark       # of cells   Leakage power     Runtime   GTR param.      PRFT param.
                (K)          GTR      PRFT     (min)     α      γ (%)    SF     γ (%)
DMA_fast        25.3         0.65     0.299    14        0.91   24.5     SF5    1
DMA_slow        25.3         0.211    0.145    10        1               SF5    5
PCI_fast        33.2         0.348    0.183    13        0.91   34       SF4    4
PCI_slow        33.2         0.185    0.111    10        1.11   36       SF5    4
DES_fast        111          7.157    1.842    83        0.85   46.5     SF5    1
DES_slow        111          0.922    0.614    70        0.83   8.5      SF2    3
VGA_fast        165          0.685    0.471    46        0.71   7.5      SF5    4
VGA_slow        165          0.454    0.351    88        1      10       SF4    3
B19_fast        219          1.377    0.771    207       1.33   16.5     SF2    4
B19_slow        219          0.718    0.583    214       1.5    7.5      SF5    1
LEON3_fast      649          1.989    1.487    1323      0.71   7        SF4    1
LEON3_slow      649          1.422    1.341    1274      0.89   4        SF4    2
NETCARD_fast    959          1.997    1.861    1097      0.57   4        SF3    1
NETCARD_slow    959          1.818    1.77     300       2.67   4        SF3    1

37 -37- Experimental Results on ISPD Benchmarks
Leakage power, runtime and parameters for GTR and PRFT

Benchmark       # of cells   Leakage power     Runtime   GTR param.      PRFT param.
                (K)          GTR      PRFT     (min)     α      γ (%)    SF     γ (%)
DMA_slow        25.3         0.211    0.145    10        1               SF5    5
PCI_slow        33.2         0.185    0.111    10        1.11   36       SF5    4
DES_slow        111          0.922    0.614    70        0.83   8.5      SF2    3
VGA_slow        165          0.454    0.351    88        1      10       SF4    3
B19_slow        219          0.718    0.583    214       1.5    7.5      SF5    1
LEON3_slow      649          1.422    1.341    1274      0.89   4        SF4    2
NETCARD_slow    959          1.818    1.77     300       2.67   4        SF3    1

