High Performance, Multi-CPU Power Signoff for Mega Designs

Patrick Sproule, Director of Engineering, VLSI Methodology

2 Nvidia Power Analysis Requirements
Static and Dynamic Full-Chip Power Analysis: the tool implementation must handle both sub-chip and full-die analysis in a single session, and should ideally provide full-domain analysis in a single run for full accuracy.
Design Size Scalability: full flat design analysis must handle both the smallest and the largest production designs on existing, available compute resources.
Runtime Predictability: designs get larger, but the schedule time for power analysis must stay constant or shrink. Close-ended runtime estimates are required.
Clear Reporting: large amounts of analysis data must be condensed into clear reports.

3 Power Analysis Challenges
Device counts have grown by four orders of magnitude in less than 10 years.
More metal layers and a larger modelled device count make the computation grow faster than tool and compute-resource capability, which leads to long runtimes and/or inefficient subdivision of designs.
Designs have also become highly replicated at many levels of hierarchy, which complicates data handling and integration within the tools.
Many engineers run analysis at different hierarchy levels; recreating the database and duplicating analysis costs schedule time.
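The growth claim above can be turned into an implied annual rate. This is a minimal sketch that assumes steady exponential growth (the slide does not say growth was smooth); `implied_growth` is a hypothetical helper. Because the slide says "less than 10 years", the results are lower bounds on the rate: roughly 2.5x per year, or a doubling about every 9 months.

```python
import math


def implied_growth(orders_of_magnitude, years):
    """Annual growth factor and doubling time (years), assuming steady exponential growth."""
    annual = 10 ** (orders_of_magnitude / years)      # e.g. 10**(4/10) per year
    doubling_years = math.log(2) / math.log(annual)   # time for a 2x increase
    return annual, doubling_years


annual, doubling = implied_growth(4, 10)
print(f"{annual:.2f}x per year, doubling every {doubling * 12:.1f} months")
```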

4 Current Rail Analysis Methodology
A partition-based hierarchical methodology is planned and executed by a large design team at many levels.
Unique design technologies, especially for low power: multiple power domains, power-gating switches, …
Hierarchy and ownership (diagram): full chip (Full Chip Integration) > chiplet (Chiplet Owners) > partition (Partition Owners).

5 Typical Extraction and Rail Analysis
Power-Grid-View (PGV): physical modeling of IP.
Extraction of current signatures and rail data (RC, current, geometry).
Flow (diagram): Physical Database (PGDB) and Primitive PGVs → RC Extraction → Rail Analysis, with Current Signatures → IR Drop Results/Plots
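Rail analysis ultimately applies Ohm's law and Kirchhoff's current law across the extracted RC grid, using the current signatures as loads. As a much-simplified illustration (not how Voltus or any signoff tool is implemented), this sketch computes static IR drop along a single rail fed from one supply pad; `rail_ir_drop` and its inputs are hypothetical. Each segment carries the sum of all currents drawn beyond it, so the drop accumulates toward the far end of the rail.

```python
def rail_ir_drop(segment_ohms, sink_amps):
    """Static IR drop along one rail fed from a single pad (toy 1-D model).

    Node 0 is the pad. segment_ohms[k] connects node k to node k+1, and
    sink_amps[k] is the current drawn at node k+1. Returns the voltage drop
    at nodes 1..N relative to the pad.
    """
    if len(segment_ohms) != len(sink_amps):
        raise ValueError("need exactly one sink current per segment")
    drops = []
    drop = 0.0
    downstream = sum(sink_amps)      # the first segment carries every sink's current
    for r, i in zip(segment_ohms, sink_amps):
        drop += r * downstream       # V = I * R across this segment
        drops.append(drop)
        downstream -= i              # this node's load does not flow any further
    return drops


drops = rail_ir_drop([0.01, 0.01, 0.01], [1.0, 1.0, 1.0])
print("worst-case drop (V):", max(drops))
```

A real power grid is a 2-D/3-D mesh with many pads, so the tool must solve a large sparse linear system rather than this closed-form chain.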

6 Hierarchical Rail Analysis Method (H-PGV)
Flow (diagram): Partition 1 … Partition N → RC Extraction → H-PGV 1 … H-PGV N. The top-level database (PGDB) goes through RC Extraction, and Rail Analysis combines it with the H-PGVs, Primitive PGVs, and Current Signatures to produce IR Drop Results/Plots.
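The slides do not say how an H-PGV is computed internally. One textbook way to build a reduced boundary model of a partition's grid is Kron (Schur-complement) reduction: eliminate the internal nodes of the nodal system G·V = J and keep only the boundary ports. This sketch shows that generic technique, not Voltus's H-PGV algorithm; `kron_reduce` is a hypothetical helper.

```python
def kron_reduce(G, J, ports):
    """Eliminate non-port nodes from the nodal system G @ V = J.

    G     -- square conductance matrix as a list of lists (siemens)
    J     -- current injected at each node (amps; load sinks are negative)
    ports -- indices of boundary nodes to keep
    Returns (G_red, J_red): the equivalent boundary model, ordered like `ports`.
    """
    n = len(G)
    G = [row[:] for row in G]  # copy so the caller's matrix is untouched
    J = list(J)
    keep = set(ports)
    eliminated = set()
    for k in range(n):
        if k in keep:
            continue
        pivot = G[k][k]
        if pivot == 0:
            raise ValueError(f"internal node {k} has no conductance")
        # Gaussian elimination of node k from every remaining row.
        for i in range(n):
            if i == k or i in eliminated or G[i][k] == 0:
                continue
            factor = G[i][k] / pivot
            for j in range(n):
                G[i][j] -= factor * G[k][j]
            J[i] -= factor * J[k]
        eliminated.add(k)
    G_red = [[G[i][j] for j in ports] for i in ports]
    J_red = [J[i] for i in ports]
    return G_red, J_red


# Two ports (nodes 0 and 2) joined through internal node 1 by 1-ohm segments;
# node 1 sinks 1 A.
G = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
J = [0.0, -1.0, 0.0]
print(kron_reduce(G, J, [0, 2]))
```

The reduced model is a 2-ohm link between the ports, with the 1 A load split equally between them: the boundary behaviour of the partition is preserved without its internal nodes.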

7 H-PGV Advantages
H-PGV generation runtime is minimal compared to full-chip database setup for IR-drop analysis.
H-PGVs can be generated in parallel.
The hierarchical methodology supports both bottom-up and top-down rail analysis, including capturing H-PGV boundary conditions for ECOs at the partition level (top-down push).
Full-chip and sub-chip analysis time is greatly improved with the same accuracy.
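Parallel generation combines naturally with replication: only unique partitions need an H-PGV, and every replicated instance can reuse its master's model. This is a minimal sketch under that assumption; `generate_hpgv` is a hypothetical stand-in for one generation job, and the partition names are made up. In practice each job would be a separate tool run; a thread pool here only illustrates the fan-out.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_hpgv(partition):
    """Hypothetical stand-in for one H-PGV generation job on one partition."""
    return {"partition": partition, "model": f"hpgv_{partition}"}


def build_hpgvs(instances, max_workers=4):
    """Generate one H-PGV per unique partition in parallel, then reuse it
    for every replicated instance of that partition."""
    unique = sorted(set(instances))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so zip pairs them correctly.
        models = dict(zip(unique, pool.map(generate_hpgv, unique)))
    placed = [models[name] for name in instances]  # replicas share one model object
    return models, placed


models, placed = build_hpgvs(["part_a", "part_a", "part_a", "part_b", "part_b", "part_c"])
print(len(models), "H-PGV jobs for", len(placed), "placed instances")
```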

8 Flat vs. Hierarchical Correlation
Example analysis: sub-chip level
14.4M total primitive instances (modelled cells):
8.9M regular logic and memory cells
5.5M filler, tap, and decap cells
18 total partitions in the chiplet: 7 unique partitions; 3 partitions replicated 4 times each.
H-PGV run metrics: runtime 18 to 32 minutes; memory 40 to 45 GB.

9 Rail Analysis at Full Chip Level
Design | Metal Layers | Transistors (Billions) | RAM (GB) | CPUs | Rail Analysis Runtime (Days)
GF100 (flat) | 9 | 3.0 | 200 | 1 | 2.25
GK104 | 11 | 3.5 | 600 | 8 | 10
GK110 | | 7 | 1000+ (est.) | | 26 (est.)
GK110 (hierarchical) | | | 650 | |

10 Nvidia Scale and Runtime Issues
Design size growth is outpacing tool and resource capability.

11 Voltus on Kepler: ~380M Instances, Flat Analysis, TSMC 28nm
Main resources: ~725 GB of memory on a 1 TB, 32-CPU machine.
Static and dynamic signoff power analysis at VDD and VSS (done as parallel runs).
21-hour runtime per analysis domain.
~8x runtime improvement over the previous method, with equivalent accuracy.
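Normalizing the memory figure above by instance count gives a rough per-instance cost of about 1.9 KB. This sketch assumes decimal gigabytes and, purely for illustration, that memory scales linearly with instance count; the slides state neither, and both helper names are hypothetical.

```python
def bytes_per_instance(memory_gb, instances):
    """Average memory per modelled instance, assuming decimal GB (1e9 bytes)."""
    return memory_gb * 1e9 / instances


def projected_memory_gb(instances, per_instance_bytes):
    """Hypothetical projection that assumes memory grows linearly with instances."""
    return instances * per_instance_bytes / 1e9


per_inst = bytes_per_instance(725, 380e6)  # ~725 GB for ~380M instances
print(f"~{per_inst:.0f} bytes per instance")
print(f"~{projected_memory_gb(760e6, per_inst):.0f} GB for a design with 2x the instances")
```

Under the linear assumption, doubling the instance count would exceed the 1 TB machine, which is why memory scaling matters as much as runtime.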

12 Rail Analysis at Full Chip Level
Design | Metal Layers | Transistors (Billions) | RAM (GB) | CPUs | Rail Analysis Runtime (Days)
GF100 (flat) | 9 | 3.0 | 200 | 1 | 2.25
GK104 | 11 | 3.5 | 600 | 8 | 10
GK110 | | 7 | 1000+ (est.) | | 26 (est.)
GK110 (hierarchical) | | | 650 | |
GK110 (VOLTUS) | | | 700 | 32 | 21 hours

13 Nvidia Scale and Runtime Issues
Memory requirement

14 Summary
Voltus meets our rail-analysis needs for accuracy, with runtimes far shorter than expected.
Further testing showed that the combined VDD-GND domain can be run in a single pass in 50 hours using multi-threaded and distributed capabilities.
The ability to run both multi-threaded and distributed gives us the flexibility to manage schedule and resource requirements.
Congratulations to the Voltus team on delivering a disruptive runtime improvement.

15 Q&A
