Presentation on theme: "Chungki Oh, Jianfeng Liu, Seokhoon Kim, Kyung-Tae Do,"— Presentation transcript:
1 Critical Signal Flow for Power Estimation: The Road to Billion Gate SoC Power Verification
Chungki Oh, Jianfeng Liu, Seokhoon Kim, Kyung-Tae Do, JungYun Choi, Hyo-Sig Won, Kee Sup Kim
Design Technology Team, System LSI Division, Samsung Electronics
Jeongwon Kang, Kamlesh Madheshiya, Arti Dwivedi
Ansys Apache
2 Table of Contents
Mobile SoC Design Trend
Challenges in SoC Power Analysis
Power Critical Signal Flow in RTL/Gate power analysis
Summary
3 Mobile SoC Design Trend
The design size of mobile SoCs has been increasing rapidly.
Fierce competition in the mobile market has driven SoC designs to provide high performance and numerous functions that were previously available only in PCs and laptops.
To meet the power wall of mobile design and leverage the additional capacity from silicon process scaling, multiple cores and parallelism are popular in current SoC designs.
[Figure: Billion Gate SoC. SoC consumer portable design complexity trends (ITRS, 2011 edition)]

Let's take a look at the mobile SoC design trends. There is fierce competition to provide high performance and numerous functions in smartphones that were previously available only in PCs or laptops. The designs have a large number of operating modes, and functional as well as power verification in the different modes has become critical. To meet the power budget and leverage the additional capacity from silicon process scaling, multi-core designs and parallel processing are popular in SoC designs.
4 Challenges in SoC Power Analysis
The era of billion-gate SoC design poses significant challenges for power analysis.
Simulation is needed to analyze dynamic power accurately. However, for a billion-gate SoC, the simulation runtime is becoming too long for a reasonable design cycle.
The waveform generated by simulation can occupy hundreds of gigabytes, which puts a significant burden on power analysis tools.
[Figure labels: 10's of modes; Millions of clocks; Video streaming; GPS + Voice Call; Web +]

For accurate power analysis, a simulation vector is an essential part of the power analysis flow. As design size increases, simulating billion-gate SoC designs is becoming more and more challenging. Simulation runtimes are too long for reasonable design cycles, and simulation dump sizes are in the hundreds of GB. Such large dumps cause significant performance and memory degradation in power analysis tools, and it is impractical to perform power analysis using such large FSDB files.
5 RTL Power Estimation Flow
Basic concept of RTL power estimation
Inputs: RTL-coded design, power library, capacitance model, activity file
Elaborate: the RTL design is compiled and elaborated into an interconnection of primitive gates
Calculate Power: the design is mapped to the target technology, and average or time-based power analysis is performed based on switching activity
To obtain reasonable accuracy, simulation is needed for vector-based power estimation.
[Flow diagram: RTL (Verilog/VHDL), Power Library (.lib), Activity File (.vcd/.fsdb/.saif, from Verilog simulation), and capacitance model feed PowerArtist; Elaborate produces the micro-architectural inferred netlist; Calculate Power produces the RTL power report]

We use PowerArtist for power estimation at RTL. The key inputs required by PowerArtist are the RTL or gate-level design, power-characterized libraries (Liberty files), a capacitance model, and a simulation dump, which can be a VCD, FSDB, or SAIF file. PowerArtist first compiles and elaborates the RTL design into an inferred netlist, which is an interconnection of primitive gates. In the next step, the activity file is read and power estimation is done. During power estimation, PowerArtist performs cell selection, infers a clock tree, and performs average or time-based power calculation. A text report and an OADB-based database are generated.
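To make "power analysis based on switching activity" concrete, here is a minimal sketch of the textbook switching-power relation, in which each toggle on a net dissipates roughly 0.5 * C * Vdd^2. This is not PowerArtist's internal model: it ignores internal (short-circuit) and leakage power, and the class name, function name, capacitance values, and toggle counts are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class NetActivity:
    name: str
    toggles: int        # 0->1 and 1->0 transitions observed in the window
    cap_farads: float   # lumped load capacitance (hypothetical value)


def switching_power_watts(nets, vdd_volts, window_seconds):
    """Average switching power: sum of per-toggle energy divided by the window."""
    energy_joules = sum(
        0.5 * n.cap_farads * vdd_volts ** 2 * n.toggles for n in nets
    )
    return energy_joules / window_seconds


# Hypothetical activity for two nets over a 1 microsecond window at 1.0 V.
nets = [
    NetActivity("testbench.top_inst.clk", toggles=2000, cap_farads=5e-15),
    NetActivity("testbench.top_inst.out", toggles=300, cap_farads=2e-15),
]
power = switching_power_watts(nets, vdd_volts=1.0, window_seconds=1e-6)
print(f"average switching power: {power:.3e} W")
```

The toggle counts are what the activity file supplies and the capacitances are what the capacitance model supplies, which is why both appear among the inputs listed on this slide.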
6 Critical Signal Extraction with PowerArtist
Generate a significantly smaller, power-critical-signals-only FSDB from the emulator or simulator.
[Flow diagram, full dump: RTL + Test Bench -> Verilog Simulation -> Full FSDB]
[Flow diagram, critical-signal dump: RTL -> PowerArtist Power-Critical Signal Extraction -> Critical Signal List; Critical Signal List + Test Bench -> Verilog Simulation -> Partial FSDB]
Critical signal list (example):
testbench.top_inst.temp_out
testbench.top_inst.temp
testbench.top_inst.en
testbench.top_inst.out
testbench.top_inst.clk
testbench.top_inst.inC
testbench.top_inst.inB
testbench.top_inst.inA
Test bench addition:
initial begin
  $fsdbDumpfile("pa_extracted.fsdb");
  $fsdbDumpvarsByFile("sig_file_name");
end

PowerArtist enables us to reduce the simulation dump size using critical signal extraction. The goal of this flow is to generate a simulation dump for only the power-critical signals in the design, such as I/O ports and sequential elements. PowerArtist reads in the RTL design and generates a list of power-critical signals. We provide this list to our simulator or emulator so that an FSDB is generated only for the critical signals. This significantly reduces simulation runtime as well as FSDB file size, with an acceptable loss of accuracy. It also reduces the runtime of power analysis in PowerArtist.
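The selection step can be pictured with a toy filter over a classified signal list. This is only a sketch: the slides do not describe how PowerArtist classifies signals, and the kind labels assigned to each signal below, the CRITICAL_KINDS set, and the function name are hypothetical.

```python
# Driver kinds treated as power-critical in this sketch (an assumption,
# loosely following the categories named in the slides).
CRITICAL_KINDS = {"port", "flop", "latch", "clock_gate"}


def select_power_critical(signals):
    """Return the names whose kind is in CRITICAL_KINDS, preserving order."""
    return [name for name, kind in signals if kind in CRITICAL_KINDS]


# Signal names come from the slide; the kind labels are invented for illustration.
signals = [
    ("testbench.top_inst.temp_out", "flop"),
    ("testbench.top_inst.temp", "comb"),
    ("testbench.top_inst.en", "port"),
    ("testbench.top_inst.out", "port"),
    ("testbench.top_inst.clk", "port"),
    ("testbench.top_inst.inC", "port"),
    ("testbench.top_inst.inB", "port"),
    ("testbench.top_inst.inA", "port"),
]
critical = select_power_critical(signals)
for name in critical:
    print(name)
```

In this toy case only the combinational net is left out of the dump; on a real design the combinational nets are the large majority, which is where the reduction in dump size comes from.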
7 Power-Critical ≠ Functional-Debug Signals
(L1) Power Analysis + Debug: Identify Power-Critical Signals (Apache PowerArtist) -> Simulator/Emulator -> Reduced FSDB. Optimized for power analysis over the entire simulation duration.
(L2) Functional Debug: Identify Function-Critical Signals (Functional Debug Tools) -> Simulator/Emulator -> Reduced FSDB. Optimized for functional debug over limited clock cycles.

Some functional debug tools also generate a critical signal list. This is different from the power-critical signal list generated by PowerArtist. The power-critical signal list consists of all signals important for power analysis accuracy, such as signals connected to sequential elements and control logic. Function-critical signals are for functional debug and consist primarily of primary I/Os and sequential elements.
8 The Principle of Power-Critical Signal Flow
Power-critical signals
Activity for only a subset of signals is necessary for accurate power estimation.
Critical signals include signals such as sequential elements and module input/output ports.
Non-critical signals
Activity for the remaining signals can be derived by propagation, based on the activity propagation formulae of the various cell types.
[Figure labels: IO cells; Flip-Flops; ICGCs; Latches; PI & PO; MUX]

Power-critical signals are a subset of the signals in the design that are essential for power analysis accuracy, such as signals connected to sequential elements and primary I/Os. PowerArtist annotates the activity of critical signals from the FSDB and propagates activity through the remaining logic in the design.
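The slides do not give the activity propagation formulae, so the following is a sketch of one standard textbook approach (signal probability plus transition density, assuming the inputs of each gate are statistically independent), not PowerArtist's actual method. Each signal is described by p, the probability of being 1, and d, its toggles per cycle; the function names and input values are hypothetical.

```python
def and2(pa, da, pb, db):
    """y = a & b. A toggle on a reaches y only while b is 1, and vice versa."""
    return pa * pb, pb * da + pa * db


def or2(pa, da, pb, db):
    """y = a | b. A toggle on a reaches y only while b is 0, and vice versa."""
    return pa + pb - pa * pb, (1 - pb) * da + (1 - pa) * db


def inv(pa, da):
    """y = ~a. Every input toggle appears at the output."""
    return 1 - pa, da


def mux2(ps, ds, pa, da, pb, db):
    """y = b if s else a. A select toggle matters only when a and b differ."""
    p = (1 - ps) * pa + ps * pb
    p_inputs_differ = pa * (1 - pb) + pb * (1 - pa)
    d = (1 - ps) * da + ps * db + p_inputs_differ * ds
    return p, d


# Hypothetical annotated activity on two critical signals feeding an AND gate.
p_and, d_and = and2(pa=0.5, da=0.4, pb=0.5, db=0.2)
# The same two signals as data inputs of a MUX whose select never toggles.
p_mux, d_mux = mux2(ps=0.5, ds=0.0, pa=0.5, da=0.4, pb=0.5, db=0.2)
print(p_and, d_and, p_mux, d_mux)
```

Because the independence assumption breaks down on reconvergent logic, propagated activity is an estimate, which is consistent with the small power mismatch the flow accepts in exchange for smaller dumps.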
9 Power-Critical Signal Flow with PowerArtist
Application
Power-critical signals can be extracted for both RTL and gate-level designs.
Critical signals can be utilized in simulation as well as emulation flows.
Impact
An activity file dumped only for power-critical signals saves simulator/emulator and power analysis runtime and memory, with a small error in power analysis.
Power-critical signal flow enables power analysis of huge designs for which power estimation used to be unrealizable.
[Flow diagram: RTL/gate-level design -> PowerArtist Crit. Sig. Extraction -> Crit. sig. list; Crit. sig. list + Test Bench -> Simulation/Emulation -> partially dumped Activity File; RTL/gate-level design, Power Library, Wire Load Model, and Activity File -> PowerArtist Elaborate (micro-architecturally inferred netlist) -> Calculate Power -> RTL Power Report; annotated "Time & Memory Saving"]

Power-critical signal flow is useful for power analysis at RTL as well as gate level. This flow can be used with both simulators and emulators. When simulation data is dumped using the critical signal list, the runtime and memory usage of the simulator or emulator are reduced, and the simulation dump becomes significantly smaller. This flow also helps to reduce runtime and memory usage in PowerArtist.
10 Critical Signal Flow for RTL Power Estimation
Experimental result with Design-A in RTL
The first experiment was done with a multimedia codec IP design.
Design size is about 8 million gates, with a 32nm library.
Impact on CPU time: 69% time reduction
Impact on memory resource & power result: 46% memory saving, 58% disk saving, 5% power mismatch
[Charts: CPU time; memory, disk, and power comparison]

I would like to share some results based on this flow now. The first design is a multimedia codec IP design. This RTL design is about 8M gates in size at 32nm technology. Looking at the total runtime of simulation and power analysis, we achieved a runtime reduction of 69%. Power analysis runtime itself was reduced by about 89%. The simulation dump was 50% smaller, and PowerArtist memory usage improved by 46%. Our power analysis results were within 5% of the power numbers obtained with the full simulation dump.
11 Critical Signal Flow for RTL Power Estimation (2)
Experimental result with Design-B in RTL
The second experiment was done with a quad-core CPU block.
Design size is tens of millions of gates, with a 32nm library.
Impact on CPU time: 78% time reduction
Impact on memory resource & power result: 42% memory saving, 73% disk saving, 2% power mismatch
[Charts: CPU time [hr]; memory, disk, and power comparison]

The second design is a quad-core CPU block. This RTL design is 10M+ gates at 32nm technology. For this block, we achieved a 78% reduction in runtime. FSDB size was reduced by a significant 73%, and power correlation with the full simulation dump is within 2%. PowerArtist memory usage improved by 42%.
12 Critical Signal Flow for Gate-level Power Estimation
Experimental result with Design-A in gate level
The third experiment was done with the same design as the first one, but at gate level.
Design size is about 8 million gates, with a 32nm library.
Impact on CPU time: 69% time reduction
Impact on memory resource & power result: 87% memory saving, 97% disk saving, 9% power mismatch
[Charts: CPU time; memory, disk, and power comparison]

The third design is a gate-level netlist of the first test case, a multimedia codec IP design. For this design, we achieved a total runtime improvement of 69%. There was a large reduction in FSDB size, at 97%. The power mismatch against the full simulation dump is 9%.
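The slides report savings and mismatch as percentages without stating how they were computed. A plausible reading, sketched below, is a relative reduction against the full-dump flow and a relative error against the full-dump power number. The function names and all input values are hypothetical and are not measurements from these experiments.

```python
def reduction_pct(full, reduced):
    """Percentage saved by the reduced flow relative to the full-dump flow."""
    return 100.0 * (full - reduced) / full


def mismatch_pct(reference, estimate):
    """Absolute relative error of the estimate against the reference, in percent."""
    return 100.0 * abs(estimate - reference) / reference


# Hypothetical inputs for illustration only.
runtime_saving = reduction_pct(full=10.0, reduced=3.0)        # hours
disk_saving = reduction_pct(full=200.0, reduced=6.0)          # gigabytes
power_mismatch = mismatch_pct(reference=2.0, estimate=1.9)    # watts
print(f"runtime saving {runtime_saving:.1f}%, "
      f"disk saving {disk_saving:.1f}%, "
      f"power mismatch {power_mismatch:.1f}%")
```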
13 Summary
In the era of billion-gate SoC design, simulation runtime and the size of the generated waveform database are challenging issues for accurate power estimation.
To address this challenge, we have proposed dumping the waveform for only a subset of the signals in the design. We have introduced a methodology for choosing this subset so that power correlation is good while the subset stays small.
The PowerArtist power-critical signal flow has been verified by extensive experiments covering both RTL and gate-level power estimation flows.
Our experimental results show that the critical signal flow cuts runtime by 70-80% and simulation waveform size by 60-97%, while keeping the power mismatch below 10%.