
1 Optimization for Leakage Power Reduction using Multi-Threshold Voltages for High Performance Microprocessors
Jeegar Shah, Marius Evers, Jeff Trull, Alper Halbutogullari
AMD, Sunnyvale, CA
March 19, 2007, ISPD 2007, Austin

2 Agenda
- Justification for threshold voltage selection for leakage power reduction and multi-corner cycle time adjustments
- Multi-threshold voltage selection flow
- Heuristic VTH selection algorithm
- Dynamic forward traversal VTH selection algorithm
- Results
- Conclusions
- Q & A

3 Motivation
- Reduce leakage power by increasing the threshold voltages of non-critical gates
- Meet aggressive timing constraints
- Support the above constraints for multiple process corners
- Optimize extremely rigid designs at the post-route step to handle process variability
- Support multi-VTH flows (scalable as more VTH libraries are made available)
- Generate design variants with power-performance tradeoff

4 METHODOLOGY & OPTIMIZATION FLOW

5 Methodology Flow
1. Start with the unoptimized design
2. Read in constraints for multiple corners
3. Run Static Timing Analysis for each of these corners
4. Optimize first to meet aggressive timing constraints for each corner by down-swapping (selecting lower VTH cells for critical path gates)
5. Then optimize to reduce leakage power by up-swapping (selecting higher VTH cells for non-critical path gates)
6. Let multiple corners interact
7. Iterate steps 3-6
8. Static Timing Analysis check
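The loop in steps 3 to 8 can be illustrated with a toy model. This is only a sketch under stated assumptions: the delay numbers, the derate-based corner model, and the names `slacks` and `optimize` are hypothetical and are not taken from the presentation. Timing is checked over all corners at once, which stands in for letting the corners interact.

```python
# Toy model of flow steps 3-8; every number and name below is an assumption.
DELAY = {"LVT": 8, "MVT": 10, "HVT": 13}   # assumed gate delay in ps
ORDER = ["LVT", "MVT", "HVT"]              # increasing threshold voltage

def slacks(design, paths, corners):
    """Toy STA: one (slack, path) pair for every corner/path combination."""
    return [
        (period - derate * sum(DELAY[design[g]] for g in path), path)
        for derate, period in corners
        for path in paths
    ]

def optimize(design, paths, corners):
    design = dict(design)
    # Step 4: down-swap gates on the worst failing path until timing is met.
    while True:
        slack, path = min(slacks(design, paths, corners), key=lambda s: s[0])
        candidates = [g for g in path if design[g] != "LVT"]
        if slack >= 0 or not candidates:
            break
        g = candidates[0]
        design[g] = ORDER[ORDER.index(design[g]) - 1]
    # Step 5: up-swap any gate whose swap keeps every corner non-negative.
    changed = True
    while changed:
        changed = False
        for g in design:
            if design[g] == "HVT":
                continue
            old = design[g]
            design[g] = ORDER[ORDER.index(old) + 1]
            if min(s for s, _ in slacks(design, paths, corners)) >= 0:
                changed = True
            else:
                design[g] = old   # revert: the swap broke timing in a corner
    return design

design = {g: "MVT" for g in "abcde"}
paths = [["a", "b", "c", "d"], ["a", "e"]]
corners = [(1.0, 38), (1.25, 48)]          # (delay derate, clock period in ps)
print(optimize(design, paths, corners))
```

In this example the critical path a-b-c-d needs one LVT gate to meet both corners, while gate e sits only on the short path and can move to HVT.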

6 Simultaneous optimizations across multiple corners
[Diagram: STA 1 feeds optimization iteration 1 for Corner 1 and Corner 2, which exchange swaps as they are computed; the new design then goes to STA 2.]

7 Multi-Threshold VTH selection flow

8 Optimization flow: multi-corner + design variant
[Diagram: the library and the un-optimized design are optimized under mobile and desktop constraints for Corners 1 to 4, giving one optimized result per corner and, from these, an optimized mobile design and an optimized desktop design.]

9 Multi-VTH scalable: 3-VTH example
- Step 1: Meet timing constraints (down-swap). Fix critical paths by changing MVT cells to LVT.
- Step 2: Reduce leakage power (up-swap) to HVT.
[Diagram labels: Un-optimized MVT Design, Un-optimized HVT Design, Extract HVT, Final Design with LVT, MVT, and HVT cells]

10 Heuristic VTH Selection Algorithm

11 Heuristic Algorithm
- Sensitivity-analysis-based heuristic approach
- Picks instances that have the most impact on performance at reasonable leakage cost
- Instances picked affect multiple paths
- Circuit topology aware
- Works best for the first few optimization iterations
- Flexibility to choose an instance selection window size to fine-grain the optimization

12 Heuristic algorithm: Pros and Cons
Pros
- Extremely fast
- Efficiently selects instances that affect multiple critical paths. Changing only these instances to low VTH cells helps meet aggressive timing constraints at very low leakage power cost.
- Parametrizable instance selection windows
- Topology-aware algorithm

13 Cons
- Effective only in the first few iterations. Does not work well when fine-grain optimization is required.
- No timing update or analysis is done to improve results within a single iteration. Each iteration picks a window of instances for VTH selection, and timing information is not updated after each swap within the same selection group.

14 Pseudocode for heuristic algorithm
list all launching flops
foreach flop f
    do depth-first recursive forward traversal
        calculate time benefit if swapped from libraries
        determine total VTH layout width (cost)
        calculate benefit/cost score
        for each immediate o/p pin
            prorate each score based on o/p pin criticality with other relatively critical pins
            register capture flop
            [recursively get downstream scores]
            add downstream scores to current inst score
for each flop from list of capture flops
    do depth-first recursive reverse traversal
        calculate time benefit if swapped from libraries
        determine total VTH layout width (cost)
        calculate benefit/cost score
        for each immediate i/p pin
            prorate each score based on i/p pin criticality with other relatively critical pins
            [recursively get upstream scores]
            add upstream scores to current inst score
list all instances in decreasing final scores
pick top x% of instances and swap them to lower VTH
update database and perform STA
repeat
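A compact sketch of the scoring and selection steps follows. It is not the authors' implementation: the function name `rank_instances`, the dictionary-based netlist, and in particular the proration rule (each neighbour's contribution is weighted by its share of the total criticality of all neighbours) are assumptions made for illustration. The netlist between flops is assumed to be acyclic.

```python
from functools import lru_cache

def rank_instances(fanout, benefit, cost, crit, top_fraction):
    """Return (selected instances, final scores) for one heuristic iteration.

    fanout: instance -> list of fanout instances (assumed acyclic)
    benefit, cost, crit: per-instance time benefit, swap cost, criticality
    """
    fanin = {m: [] for m in fanout}
    for m, outs in fanout.items():
        for n in outs:
            fanin[n].append(m)
    own = {m: benefit[m] / cost[m] for m in fanout}   # benefit/cost score

    def cone(edges):
        @lru_cache(maxsize=None)
        def score(m):
            nbrs = edges[m]
            total = sum(crit[n] for n in nbrs)
            if total == 0:
                return 0.0
            # Assumed proration: weight each neighbour by its criticality share.
            return sum(crit[n] / total * (own[n] + score(n)) for n in nbrs)
        return score

    down, up = cone(fanout), cone(fanin)
    final = {m: own[m] + down(m) + up(m) for m in fanout}
    ranked = sorted(final, key=final.get, reverse=True)
    keep = max(1, round(top_fraction * len(ranked)))   # top x% window
    return ranked[:keep], final

fanout = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": [], "e": []}
benefit = {"a": 2, "b": 2, "c": 8, "d": 6, "e": 2}
cost = {m: 2 for m in fanout}
crit = {"a": 1, "b": 1, "c": 1, "d": 3, "e": 1}
selected, final = rank_instances(fanout, benefit, cost, crit, 0.2)
print(selected, final)
```

The selected instances would then be swapped to the lower VTH library, followed by a database update and STA before the next iteration.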

15 Definition of instance score
a: original cell
b: potential cell selection
m: instance under consideration
p: each transistor within cell ‘a’ or cell ‘b’

16 Updated topological instance score
- Individual score from sensitivity analysis
- Scores of instances downstream
- Scores of instances upstream

17 Computing DownCone scores
m: instance being considered for selection
n: fanout gate of m

18 Computing UpCone scores
m: instance being considered for selection
n: fanin gate of m

19 Upscore proration
[Figure: with proration vs. without proration]

20 Downscore proration
[Figure: with proration vs. without proration]

21 Advantage of proration
[Chart: leakage power, normalized with respect to non-prorated cones]

22 Dynamic Path Traversing VTH Swap algorithm

23 Dynamic Path Traversing
- Regular forward traversal algorithm
- Breadth-first search from flop to flop
- Works with a power and timing budget to do VTH selection
- Only forward traversal, though backward traversal could be implemented
- Stops optimizing when either the power or the timing budget is exhausted
- Budgets scaled for every path based on a linear formulation of combinational logic depth and effective fanout
- Works best for the last few iterations, where fine-grain optimization is required

24 Pros and Cons
Pros
- Simple implementation
- Constantly works with a power and timing budget
- After every VTH selection, the budgets are updated
- Timing between swaps is more up-to-date than in the Heuristic algorithm
- Timing paths can be differentiated based on combinational depth and fanout

25 Cons
- Not as fast as the Heuristic algorithm
- Complementary to the Heuristic algorithm
- Works best for fine-grain selection. Not good at selecting the most ‘influential’ instances.
- Since it traverses forward and is budget limited, it ends up selecting instances closer to the launching flop
- No circuit topology information

26 Pseudocode for dynamic algorithm
list all launching flops
decide worst slack to consider (e.g. wslk = -40 ps)
foreach launching flop f
    start with worst slack at o/p pin (path slack)
    start with an approximate swap cost budget
    do breadth-first recursive forward traversal
        for each instance failing timing
            calculate time benefit if swapped from libraries
            determine leakage delta (cost)
            swap this instance to its lower VTH version
            new timing budget = slack of path - time benefit of inst
            new power budget = budget - delta power of this inst
            update design database for new VTH cells
            exit loop if timing met (wslk)
            exit loop if path is unconstrained
            exit if receiving flop reached
            exit loop if budget exhausted
update design database
perform STA and repeat with new wslk
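The traversal for a single launching flop might look like the sketch below. The function name `dynamic_swap`, its arguments, and the data layout are hypothetical. It assumes that wslk is the slack target for the current iteration, that an instance "fails timing" when its slack is below wslk, and that a swap improves the path slack by the instance's time benefit. Per-instance slacks are not refreshed here, since a full STA only runs between iterations.

```python
from collections import deque

def dynamic_swap(flop, fanout, capture_flops, inst_slack, benefit,
                 leak_delta, path_slack, wslk, power_budget):
    """Budgeted breadth-first forward traversal from one launching flop."""
    swapped = []
    queue = deque(fanout[flop])
    seen = set(fanout[flop])
    # Stop when timing is met, the power budget is spent, or nothing is left.
    while queue and path_slack < wslk and power_budget > 0:
        inst = queue.popleft()
        if inst in capture_flops:
            continue                       # receiving flop reached
        if inst_slack[inst] < wslk and leak_delta[inst] <= power_budget:
            swapped.append(inst)           # swap to the lower VTH version
            path_slack += benefit[inst]    # timing budget update
            power_budget -= leak_delta[inst]
        for n in fanout[inst]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return swapped, path_slack, power_budget

fanout = {"f1": ["g1"], "g1": ["g2", "g3"], "g2": ["f2"], "g3": ["f2"]}
inst_slack = {"g1": -60, "g2": -60, "g3": -10}
benefit = {"g1": 15, "g2": 15, "g3": 15}
leak_delta = {"g1": 4, "g2": 4, "g3": 4}
print(dynamic_swap("f1", fanout, {"f2"}, inst_slack, benefit, leak_delta,
                   path_slack=-60, wslk=-40, power_budget=10))
```

In this example the traversal swaps g1 and g2 and then stops because the path slack reaches the -40 ps target, which matches the observation that budget-limited forward traversal favours instances near the launching flop.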

27 Flow iteration (scalable)
- Swap from MVT to LVT (11 iterations): H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0
- Swap from HVT to MVT with LVT swaps included (11 iterations): H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0
- Swap from VHVT to HVT with LVT and MVT swaps included (11 iterations): H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0
H-4 => Heuristic flow with 4% instance window
D-40 => Dynamic algorithm with worst slack of -40 ps
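To make the schedule notation concrete, the short sketch below decodes one 11-step schedule into (algorithm, parameter) pairs. The function name `parse_schedule` and the parameter keys `window_pct` and `wslk_ps` are hypothetical names chosen for illustration.

```python
def parse_schedule(text):
    """Decode tokens such as 'H-4' or 'D-40' into (algorithm, params) steps."""
    steps = []
    for token in text.split(","):
        kind, value = token.strip().split("-")
        if kind == "H":
            # Heuristic flow: value is the instance window in percent.
            steps.append(("heuristic", {"window_pct": int(value)}))
        elif kind == "D":
            # Dynamic algorithm: value is the magnitude of the negative slack.
            steps.append(("dynamic", {"wslk_ps": -int(value)}))
        else:
            raise ValueError(f"unknown step type: {token!r}")
    return steps

schedule = "H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0"
for step in parse_schedule(schedule):
    print(step)
```

Read this way, the schedule starts with small heuristic windows, interleaves dynamic passes with progressively tighter slack targets, and ends with a dynamic pass at 0 ps.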

28 Slack Distribution after optimization

29 Experiments
- Ex 1: Initial unoptimized design not meeting timing constraints
- Ex 2: Quick implementation of backward followed by forward (front-based technique [12])
- Ex 3: 6-step iteration using only the Dynamic swapper algorithm
- Ex 4: 6-step iteration using only the Heuristic swapper algorithm
- Ex 5: 6-step iteration using alternating combinations of the Dynamic and Heuristic swapper algorithms
[12] Srivastava, “Minimizing total power by simultaneous Vdd/VTH assignment,” IEEE Transactions on Computer-Aided Design, 2004

30 Results
                          Ex 1    Ex 2    Ex 3    Ex 4    Ex 5
HVT (%)                   8.5     22.9    31.2    39.9    47.1
MVT (%)                   90.4    37.1    52.3    45.7    40.2
LVT (%)                   0.3     39.2    15.7    13.6    12
Total Leakage Power (W)   2.278   6.560   3.554   3.122   2.834
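The relative savings implied by the table can be checked with a few lines of arithmetic. The helper name `reduction_pct` is hypothetical; the leakage values are copied from the results above. Ex 1 is left out of the comparison because, although its leakage is lowest, it is the unoptimized design that does not meet timing.

```python
# Total leakage power in watts, copied from the results table.
leakage_w = {"Ex 2": 6.560, "Ex 3": 3.554, "Ex 4": 3.122, "Ex 5": 2.834}

def reduction_pct(base, new):
    """Percentage by which `new` is lower than `base`."""
    return 100.0 * (base - new) / base

for name in ("Ex 2", "Ex 3", "Ex 4"):
    pct = reduction_pct(leakage_w[name], leakage_w["Ex 5"])
    print(f"Ex 5 vs {name}: {pct:.1f}% lower leakage")
# Ex 5 vs Ex 2: 56.8% lower leakage
# Ex 5 vs Ex 3: 20.3% lower leakage
# Ex 5 vs Ex 4: 9.2% lower leakage
```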

31 Conclusions
- Described here is a post-route optimization flow for VTH selection that supports multiple corners
- This iterative flow uses two complementary instance selection techniques: the Heuristic algorithm and a budget-based forward traversal algorithm
- The flow is not limited to 2-3 VTH levels but is scalable to any number of levels
- The Heuristic algorithm is a unique non-solver-based, topologically aware heuristic that optimizes over multiple paths simultaneously by including the effects of the upstream and downstream logic cones
- Can handle huge full-chip microprocessor designs with more than 5 million stdcell gates
- No extensive probabilistic stdcell characterization is required. Process corners can simulate inter-chip variations that are not currently handled by statistical methods.
- Multiple process corner optimizations occur in parallel, and optimization results are shared between different servers in real time. This reduces the number of iterations and improves the quality of the optimization.
- Solver-based techniques failed to handle full-chip, industrial-size designs. These designs were handled by this flow.

32 Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.
© 2006 Advanced Micro Devices, Inc. All rights reserved.
Thanks

33 Backup Slides

34 Solver based statistical tools
- Inaccurate sensitivity models based on delta VTH variation of transistor widths
- Difficulty in translating transistor model sensitivities of power based on variational parameters to huge libraries
- Lack of inter-chip variation and consideration of only intra-chip variations
- Virtual memory constraints for linear solvers on industrial-size designs and modeling approximations involved in non-linear solvers
- No topological information taken into consideration in path-based heuristic approaches
- Inappropriate consideration of logic fanouts
- In statistical methods, the optimization step is usually decoupled from the library characterization step

35 Downstream Score

36 Upstream Score

