Published by Andrew Maule. Modified about 1 year ago.

1
Copyright Agarwal & Srivaths, 2007 Low-Power Design and Test, Lecture 4 High-level Power Analysis

2
Outline
Background
■ CMOS Power Consumption Basics
■ Why Address Power Consumption Issues in High-Level Design
High-Level Power Analysis
■ RTL Power Estimation
● Fast Synthesis
● Analytical Approaches
● Characterization
■ Accelerating RTL Power Estimation
● Power Emulation (Hardware Accelerated Power Estimation)
■ Beyond RTL Power Estimation
● Power Estimation at the Cycle-accurate Behavior Level
■ Architectural Power Estimation

3
CMOS Power Consumption Basics
What are the various components of CMOS power consumption?

4
Levels of Design Abstraction
[Figure: design flow from (a) behavioral description, through scheduling and binding, to an RTL implementation (controller FSM; registers reg_c0, reg_c1, reg_y1, reg_x, reg_y; !=, <, - functional units; inputs input_x, input_y; output out), then logic synthesis, down to (d) transistor-level layout.]
(a) Behavioral description
  x = input_x; y = input_y;
  while (x != y) {
    if (x < y) { y = y - x; }
    else { x = x - y; }
  }
  out = x;
After scheduling
  ST_1: x = input_x; y = input_y; goto ST_2;
  ST_2: c0 = x!=y; c1 = x

5
Why Address Power at Higher Levels of Design Abstraction?
Design flow with high-level power analysis
■ System-level design: system-level power analysis, with power models for system-level components
■ High-level synthesis, RTL optimizations: architecture-level power analysis, with power models for macroblocks, control logic
■ Logic synthesis: logic-level power analysis, with power models for gates, cells, nets
■ Transistor-level / layout synthesis: transistor-level power analysis
[Figure: abstraction levels (system, algorithm, register-transfer, logic, layout, transistor) annotated with power reduction opportunities (10-20X, 2-5X, 20-50%) and power analysis iteration times (seconds - minutes, minutes - hours, hours - days). Arrows: increasing power savings, decreasing design iteration times.]
Benefits: Estimation
■ Early feedback about power budget
■ Faster / fewer design iterations
Benefits: Optimization
■ Large power savings possible at higher levels

7
Fast Synthesis based Power Estimation
■ Map the design through “low-effort” synthesis to a netlist for power estimation [Llopis98]
■ Use gate-level power data to perform power estimation
■ Approach followed by some commercial tools
Flow: RTL → Low-Effort Synthesis → Gate-Level Power Estimation → Power
[Figure: RTL estimates versus gate-level estimates, 15-20% deviation. Source: [Llopis98]]

8
Analytical Methods
Correlate power consumption to simple measures of design complexity
■ Logic structures: use gate count [Glaser91]
■ GE: Circuit size in NAND2 gate equivalents
■ E_typ: Typical power dissipation per MHz for a NAND2 gate
■ C_L: Estimated load capacitance per gate
■ f, V_dd: Clock frequency, supply voltage
■ A_int: Estimated activity factor per clock cycle (20-30%)
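The equation that these symbols belong to did not survive in the slide text. A form that is dimensionally consistent with the definitions listed here, and that matches how the gate-equivalent model is commonly quoted, is sketched below. The exact grouping of terms is an assumption, not a quotation from [Glaser91].

```latex
P = GE \cdot \left( E_{typ} + C_L \, V_{dd}^{2} \right) \cdot f \cdot A_{int}
```

Read this way, E_typ (power per MHz, i.e. energy per cycle) and C_L V_dd^2 are both per-gate energies, so multiplying by the clock frequency f, the activity factor A_int, and the gate count GE gives a power.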

9
Analytical Methods
■ Memories [Liu94]
Dominant component
■ 2^k: No. of memory cells, 2^(n-k): No. of rows
■ c_int: Capacitance of unit wire length
■ l_column: Column interconnect length
■ C_tr: Drain diffusion capacitance on the bit / bit-bar line

10
Analytical Methods
Entropy based approach [Nemani96]
■ Entropy: Measure of uncertainty in a random variable
■ Entropy H of a Boolean random variable x is given by $H(x) = p \log_2 \frac{1}{p} + (1 - p) \log_2 \frac{1}{1 - p}$
■ p: Probability of x being 1
Recall that average power depends on
■ D_avg: Average node switching activity
■ GE: Gate equivalents, C_avg: Average gate capacitance

11
Analytical Methods
Hypothesis
■ Can D_avg be estimated only from knowledge of input and output behavior?
Answer: Yes!
■ Entropy H is given in terms of H_i and H_o, which are respectively the input and output entropies

12
Analytical Methods
Entropy Based Power Estimation Methodology:
■ Run a structural RTL simulation to measure input/output entropies
■ Using input/output entropies, estimate P_avg for the combinational block
■ Use other techniques [Liu94] to estimate latch and clock power
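The first step of this methodology, measuring input and output entropies from a simulation trace, can be sketched as follows. This is a minimal illustration and not the procedure from [Nemani96]: the function names and the example traces are hypothetical, and summing per-bit entropies ignores correlation between bits, so it only gives an upper bound on the word entropy.

```python
import math

def bit_entropy(p):
    """Entropy in bits of a Boolean variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a constant signal carries no uncertainty
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))

def word_entropy_upper_bound(trace, width):
    """Sum of per-bit entropies over a trace of integer samples.

    Bits are treated as independent (an assumption of this sketch), so the
    result is an upper bound on the true entropy of the word.
    """
    total = 0.0
    for bit in range(width):
        ones = sum((value >> bit) & 1 for value in trace)
        total += bit_entropy(ones / len(trace))
    return total

# Hypothetical traces: a 2-bit input and the 1-bit output of an AND gate.
inputs = [0b00, 0b01, 0b10, 0b11]
outputs = [0, 0, 0, 1]
h_in = word_entropy_upper_bound(inputs, 2)
h_out = word_entropy_upper_bound(outputs, 1)
```

The values h_in and h_out play the role of H_i and H_o. Turning them into a P_avg estimate needs the entropy-to-activity relation from the slides, which is not reproduced here.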

14
Characterization Based Approaches
■ Characterization based power macro-models [Raghunathan-book, Ravi-aspdac05]
● Characterize a lower level implementation of an RTL block
● Construct a macromodel or power model: Power = f(I/O signal statistics)
● Applicable in behavioral synthesis environments
[Figure: characterization flow for an RTL component library. Macromodel template selection (complexity analysis, variable / parameter selection); pattern generation produces training sequences; a logic- / transistor-level power simulator produces power profiles; data fitting / coefficient extraction produces the power macromodels.]

15
Power Models
What does the power model implement?
Power = coeff_0
  + transition_count(in1[t], in1[t-1]) * coeff_1
  + transition_count(in2[t], in2[t-1]) * coeff_2
  + …………………….
  + transition_count(inN[t], inN[t-1]) * coeff_N
What does the power model contain?
■ Queues to store present and past values
■ Transition count function is a simple computation
■ Coefficients aggregated based on output of transition count function
[Figure: power model datapath. Component inputs/outputs in1 … inN are captured in D-Q queues clocked by POW_STROBE, passed through the transition count function, weighted by Coeff_1[31:0] … Coeff_N[31:0], and summed with Coeff_0[31:0] into Power[31:0].]
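The linear transition-count model on this slide can be sketched in software as follows. This is a minimal illustration rather than the generated power model itself: the class name, the coefficient values, and the choice to report only coeff_0 on the first strobe are assumptions.

```python
def transition_count(curr, prev):
    """Number of bit positions that differ between two integer samples."""
    return bin(curr ^ prev).count("1")

class LinearPowerModel:
    def __init__(self, coeff_0, coeffs):
        self.coeff_0 = coeff_0
        self.coeffs = coeffs      # one coefficient per monitored signal
        self.prev = None          # queue entry holding the previous sample

    def strobe(self, signals):
        """Evaluate the model on a power strobe; signals is a tuple of ints."""
        if self.prev is None:
            self.prev = signals   # first sample: no transitions to count yet
        power = self.coeff_0 + sum(
            transition_count(c, p) * k
            for c, p, k in zip(signals, self.prev, self.coeffs)
        )
        self.prev = signals       # shift the queue for the next strobe
        return power

# Hypothetical coefficients for a component with two monitored signals.
model = LinearPowerModel(0.5, [0.25, 0.125])
p_first = model.strobe((0b0000, 0b0000))
p_second = model.strobe((0b0011, 0b0001))
```

In the second strobe, in1 toggles two bits and in2 toggles one, so the estimate is 0.5 + 2 * 0.25 + 1 * 0.125.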

16
Constructing Power Models: An Example
Component: 16-bit ADDER with inputs In1[0:15], In2[0:15] and output Out[0:15]
(1) Macromodel template
Power = coeff_0
  + transition_count(in1_0[t], in1_0[t-1]) * coeff_1
  + transition_count(in1_1[t], in1_1[t-1]) * coeff_2
  + …………………….
  + transition_count(in1_15[t], in1_15[t-1]) * coeff_16
  + transition_count(in2_0[t], in2_0[t-1]) * coeff_17
  + transition_count(in2_1[t], in2_1[t-1]) * coeff_18
  + …………………….
  + transition_count(in2_15[t], in2_15[t-1]) * coeff_32
(2) Training sequence
10101011011011011010010010110010;
11101011101101101110011110001001;
11110100011111100000100100101010;
…………………

17
Constructing Power Models: An Example
(3) Gate-level power data
0.079140
0.030423
0.126169
………………
(4) Outputs from regression, with inputs (1), (2), and (3)
coeff_0 = 0.04110908
coeff_1 = 0.001006622
coeff_2 = 0.001146324
………………
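The regression step can be illustrated with an ordinary least-squares fit. The slides do not say which regression method or tool was used, so the sketch below is only one plausible instance: it solves the normal equations in pure Python, and the transition counts and power values are synthetic data generated from known coefficients.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_coefficients(transition_rows, powers):
    """Least-squares fit of power = coeff_0 + sum(tc_i * coeff_i)."""
    rows = [[1.0] + list(r) for r in transition_rows]   # leading 1 for coeff_0
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * p for r, p in zip(rows, powers)) for i in range(n)]
    return solve_linear(ata, atb)

# Synthetic example: per-cycle transition counts for two signals and the
# matching power samples, generated from known (hypothetical) coefficients.
tc = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (3, 2)]
true_coeffs = (0.04, 0.001, 0.002)
power = [true_coeffs[0] + a * true_coeffs[1] + b * true_coeffs[2] for a, b in tc]
coeffs = fit_coefficients(tc, power)
```

With noise-free data the fit recovers the generating coefficients. With real gate-level power data the residual would indicate how well the linear template matches the component.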

18
Constructing Power Models: An Example
(5) Putting it all together
The power model infers the bit-width of the RTL component, stores current and previous I/O values, computes bit-level I/O switching activity, and weighs it by the power coefficients.

entity add_power is
  port (in1        : in  std_logic_vector;
        in2        : in  std_logic_vector;
        POW_STROBE : in  std_logic;
        power      : out real);
end add_power;

architecture VHDLgen of add_power is
  type queue1 is array (1 downto 0) of std_logic_vector(0 to (in1'high - in1'low));
  type queue2 is array (1 downto 0) of std_logic_vector(0 to (in2'high - in2'low));
begin
  process (POW_STROBE)
    variable queue_in1 : queue1;
    variable queue_in2 : queue2;
    variable bw        : integer;
    variable flag      : integer;
  begin
    -- BIT-WIDTH INFERENCE: round the operand width up to a power of two
    flag := 0;
    bw := (queue_in1(1)'high - queue_in1(1)'low) + 1;
    for i in 0 to bw loop
      if (flag = 0) then
        if (bw <= 2**i) then
          bw := 2**i;
          flag := 1;
        end if;
      end if;
    end loop;
    if POW_STROBE = '1' and (POW_STROBE'event) then
      -- QUEUE MANAGEMENT: store current and previous I/O values
      queue_in1(1) := queue_in1(0);
      queue_in2(1) := queue_in2(0);
      queue_in1(0) := in1;
      queue_in2(0) := in2;
      -- MACROMODEL COMPUTATION: bit-level switching activity weighted by coefficients
      case bw is
        when 2 =>
          power <= tc(queue_in1(0),queue_in1(1),0) * 7.88452e-05
                 + tc(queue_in1(0),queue_in1(1),1) * 7.800038e-05
                 + tc(queue_in2(0),queue_in2(1),0) * 0.0002803612
                 + tc(queue_in2(0),queue_in2(1),1) * 5.245284e-05;
        when 4 =>
          power <= tc(queue_in1(0),queue_in1(1),0) * 0.0002173669
                 + tc(queue_in1(0),queue_in1(1),1) * 0.0002525756
                 + tc(queue_in1(0),queue_in1(1),2) * 0.00023067
                 + tc(queue_in1(0),queue_in1(1),3) * 0.0001498218
                 + tc(queue_in2(0),queue_in2(1),0) * 0.0001684765............
      end case;
    end if;
  end process;
end VHDLgen;

19
Improvements to Macromodels
RTL components can exhibit significantly different power behavior for different parts of the input space [Potlapally01]
See example circuit C5:
■ C5 implements part of the GCD algorithm
■ C5 also implements operand gating for the subtractor
C5 behavior: if (x>y) z=x-y else z=y

20
Improvements to Macromodels
[Figure: clustered power data, conventional approach versus proposed approach]
■ 98% of the points in the upper cluster satisfy the condition (x>y): Power Mode 1
■ All the points in the lower cluster satisfy the condition (x<=y): Power Mode 2

21
Improvements to Macromodels
■ Power mode identification function (PIF) deduces the power mode based on the input vectors
■ Appropriate macromodel gets invoked based on the identified power mode
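For the example circuit C5, mode-based macromodeling can be sketched as follows. The power mode identification function uses the condition (x > y) from the slides; the per-mode coefficients and all names are hypothetical placeholders, not values from [Potlapally01].

```python
def transition_count(curr, prev):
    """Number of bit positions that differ between two integer samples."""
    return bin(curr ^ prev).count("1")

# Hypothetical per-mode macromodels: (coeff_0, coeff_x, coeff_y).
MODE_MODELS = {
    "mode1": (0.5, 0.25, 0.25),    # x > y: subtractor active
    "mode2": (0.25, 0.125, 0.0),   # x <= y: subtractor operands gated
}

def power_mode(x, y):
    """Power mode identification function (PIF) for circuit C5."""
    return "mode1" if x > y else "mode2"

def estimate_power(curr, prev):
    """curr and prev are (x, y) input pairs for consecutive cycles."""
    c0, cx, cy = MODE_MODELS[power_mode(*curr)]
    return (c0
            + transition_count(curr[0], prev[0]) * cx
            + transition_count(curr[1], prev[1]) * cy)

p_mode1 = estimate_power((7, 2), (4, 2))   # x > y, two bits of x toggle
p_mode2 = estimate_power((1, 2), (0, 2))   # x <= y, one bit of x toggles
```

Selecting the coefficients by mode lets each macromodel be fitted on one cluster of the power data rather than a single model averaging over both.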

22
Characterization Based Power Estimation
CHARACTERIZATION FLOW
[Figure: two coupled flows.
Characterization: RTL library (synthesizable spec. for each component) and synthesis conditions (speed fast/medium/slow, output cap. load, input slew rate) → Synthesis, P&R → post-layout netlists → power characterization → power macro-model database → power model library generator → simulatable power model libraries (Powerlib.vhd, Powerlib.v, Powerlib.c).
Estimation: RTL design (HDL) → RT-level design planning / mapping → structural (macro) netlist → power model inference and estimation code generation → enhanced RTL; enhanced RTL with testbench / stimuli → RTL simulation → cycle-by-cycle power report, structural power profile.]
■ Characterization based macromodeling
■ Simulate-able power libraries
■ Tightly coupled with RTL design planning
■ Support rel. and abs. accuracy

23
Enhanced RTL: Graphical View
Main components include
■ Power models for every component: Monitor component I/O values and compute power
■ Power strobe generator: Trigger power models (statistical sampling employed for improved efficiency, since RTL simulation can also be slow for large designs)
■ Power aggregator: Compute total power consumption
[Figure: example of power model enhanced RTL. The design (controller FSM, functional units, registers, Bus 1 to Bus 3) has a power model attached to each component; the power strobe generator triggers the power models and the power aggregator outputs the total power.]

24
Enhanced RTL: An illustration
Highlighted on this slide: the RTL component (cmp_lt) and its power model (cmp_lt_power).

ENTITY gcd is
  port ( RESET : IN std_logic;
         CLOCK : IN std_logic;
         yi : IN std_logic_vector(0 to 7);
         xi : IN std_logic_vector(0 to 7);
         .... );
end gcd;

ARCHITECTURE VHDLgen of gcd is
  signal M_39 : std_logic;
  signal M_38 : std_logic;
  signal VHDLgen_fu3000 : real := 0.0;
  ....
  -- RTL COMPONENT
  component cmp_lt
    port (i1 : IN std_logic_vector(0 to 7);
          i2 : IN std_logic_vector(0 to 7);
          o1 : BUFFER std_logic);
  end component;
  -- POWER MODEL
  component cmp_lt_power
    port (in1 : in std_logic_vector;
          in2 : in std_logic_vector;
          out1 : in std_logic;
          POW_STROBE : in std_logic;
          power : out real);
  end component;
  ..
begin
  ....
  cmp_lt port map (cmp_lt1i1, cmp_lt1i2, cmp_lt1ot);
  cmp_lt_power port map (cmp_lt1i1(0 to 7), cmp_lt1i2(0 to 7), cmp_lt1ot, POW_STROBE, VHDLgen_fu3000);
  ....
  -- POWER MODEL TRIGGERING
  POW_STROBE_GEN : process( CLOCK )
  begin
    POW_STROBE <= CLOCK after POW_STROBE_DELAY;
  end process POW_STROBE_GEN;
  -- POWER AGGREGATION FOR EACH COMPONENT CLASS
  POW_TOTAL : process
  begin
    wait until (POW_STROBE='1' AND POW_STROBE'event) OR (POW_STROBE_REG='1' AND POW_STROBE_REG'event);
    ....
    FU_power <= VHDLgen_fu3000 + VHDLgen_fu3001 + .... ;
    REG_power <= VHDLgen_reg3008 + VHDLgen_reg3009 + .... ;
  end process;
  -- POWER AGGREGATION FOR COMPLETE DESIGN
  ENERGY_GEN : process
  begin
    wait until CLOCK'event OR POW_STROBE'event OR POW_STROBE_REG'event;
    if ( ( POW_STROBE = '1') and POW_STROBE'event ) then
      main_cycle_energy := (GATE_power + FU_power + MUX_power) * characterization_period;
      main_energy := main_energy + (GATE_power + FU_power + MUX_power) * characterization_period;
    end if;
    if ( CLOCK = '1' and CLOCK'event ) then
      num_clocks := num_clocks + 1;
      main_power := main_energy / (real(num_clocks) * clock_period);
    end if;
  end process energy_gen;
end VHDLgen;

25
Enhanced RTL: An illustration
Highlighted portion of the gcd listing: POWER MODEL TRIGGERING and POWER AGGREGATION
  -- POWER MODEL TRIGGERING
  POW_STROBE_GEN : process( CLOCK )
  begin
    POW_STROBE <= CLOCK after POW_STROBE_DELAY;
  end process POW_STROBE_GEN;
  -- POWER AGGREGATION FOR EACH COMPONENT CLASS
  POW_TOTAL : process
  begin
    wait until (POW_STROBE='1' AND POW_STROBE'event) OR (POW_STROBE_REG='1' AND POW_STROBE_REG'event);
    ....
    FU_power <= VHDLgen_fu3000 + VHDLgen_fu3001 + .... ;
    REG_power <= VHDLgen_reg3008 + VHDLgen_reg3009 + .... ;
  end process;

26
Enhanced RTL: An illustration
Highlighted portion of the gcd listing: POWER REPORT
  -- POWER AGGREGATION FOR COMPLETE DESIGN
  ENERGY_GEN : process
  begin
    wait until CLOCK'event OR POW_STROBE'event OR POW_STROBE_REG'event;
    if ( ( POW_STROBE = '1') and POW_STROBE'event ) then
      main_cycle_energy := (GATE_power + FU_power + MUX_power) * characterization_period;
      main_energy := main_energy + (GATE_power + FU_power + MUX_power) * characterization_period;
    end if;
    if ( CLOCK = '1' and CLOCK'event ) then
      num_clocks := num_clocks + 1;
      main_power := main_energy / (real(num_clocks) * clock_period);
    end if;
  end process energy_gen;
end VHDLgen;

27
The CPU time overheads of RTL power estimation
[Figure: LOG (time in seconds) per design, annotated with overheads of 20X, 6X, 40X, 32X, 38X, and 43X; one design is marked as 1.25 million transistors.]
■ Simulation time data obtained using ModelSim 5.3 (ModelTech)
■ Need improvements in efficiency for large designs [Ravi03]

28
Observations
■ Power estimation time depends on the HDL constructs used in the power estimation code → HDL-aware Optimizations
■ Computation can be traded off for storage to improve efficiency → Computation versus Storage Trade-offs
■ Power estimation effort should be directed where needed (significant contributors, tough to estimate portions) → Partitioned Statistical Sampling

29
Solution 1: HDL-aware optimizations
■ Convert operations with complex datatypes into operations with simpler datatypes
■ Inline HDL functions to eliminate function maintenance overheads
■ Minimize power model activations
■ Reduce workload of a power model process
EXAMPLE: BIT-WIDTH INFERENCE CODE IN POWER MODEL
Bit-width inference repeated on every activation of the process:
  flag := 0;
  bw := (queue_in1(1)'high - queue_in1(1)'low) + 1;
  for i in 0 to bw loop
    if (flag = 0) then
      if (bw <= 2**i) then
        bw := 2**i;
        flag := 1;
      end if;
    end if;
  end loop;
Bit-width inference guarded by flag, so the loop runs only while flag = 0:
  if (flag = 0) then
    bw := (queue_in1(1)'high - queue_in1(1)'low) + 1;
    for i in 0 to bw loop
      if (flag = 0) then
        if (bw <= 2**i) then
          bw := 2**i;
          flag := 1;
        end if;
      end if;
    end loop;
  end if;

30
Solution 2: Computation vs Storage Trade-offs
■ Compute average power consumption once in k cycles
■ Store observed signal bits of RTL component for k cycles
■ Compute transition counts and power consumption only in the kth cycle
[Figure: timing diagram against CLK. Original power model activity: store and compute in every cycle. Enhanced power model activity: store in every cycle, compute only in the kth cycle.]
[Figure: variations in simulation time with queue length; x-axis: queue length from 10^0 to 10^4; y-axis: simulation time.]
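The store-then-compute idea can be sketched as a power model that only appends to a queue on most cycles and evaluates the macromodel for the whole batch on every kth cycle. The class name, the single-signal model, the coefficients, and the reset value of 0 are assumptions made for this illustration.

```python
def transition_count(curr, prev):
    """Number of bit positions that differ between two integer samples."""
    return bin(curr ^ prev).count("1")

class BatchedPowerModel:
    """Stores a sample every cycle; computes average power once per k cycles."""

    def __init__(self, coeff_0, coeff_1, k):
        self.coeff_0, self.coeff_1, self.k = coeff_0, coeff_1, k
        self.queue = []   # samples observed since the last compute
        self.last = 0     # final sample of the previous batch (0 at reset, assumed)

    def clock(self, value):
        """Cheap 'store' step; returns the batch average only on the kth cycle."""
        self.queue.append(value)
        if len(self.queue) < self.k:
            return None
        prev, total = self.last, 0.0
        for sample in self.queue:             # deferred 'compute' step
            total += self.coeff_0 + transition_count(sample, prev) * self.coeff_1
            prev = sample
        self.last, self.queue = prev, []
        return total / self.k

model = BatchedPowerModel(0.5, 0.25, k=4)
results = [model.clock(v) for v in (1, 3, 3, 0)]
```

A longer queue means fewer compute activations but more storage, which is the trade-off the simulation-time plot on this slide explores.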

31
Partitioned Sampling
Motivation: Smart “allocation of effort” during power estimation
[Figure: Mean and Variance Scatter Plot (MVSP) for an example design]
■ Components with high mean power, high variance (deserve high estimation effort)
■ Components with low mean power, low variance (low impact on accuracy)

32
Partitioned Sampling Algorithm
Input: Power model enhanced design in HDL
1. HDL compilation
2. Simulate for a user-specified fraction of the overall simulation time (yields power profiles of all the RTL components)
3. Determine Mean Variance Scatter Plot (MVSP) for the RTL design
4. Apply clustering algorithm on MVSP to group components with similar mean and variance
5. Fix sampling probabilities for clustered RTL components
6. Transform HDL to incorporate sampled partitions
7. HDL compilation
8. Full simulation
Output: RTL power estimate
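The MVSP and grouping steps can be sketched as follows. The slide does not name the clustering algorithm, so the grouping below uses a simple threshold rule (split at the average mean and the average variance) purely as a stand-in; the component names and power profiles are hypothetical.

```python
from statistics import mean, pvariance

def mvsp(profiles):
    """Map each component name to its (mean, variance) power point."""
    return {name: (mean(p), pvariance(p)) for name, p in profiles.items()}

def group_components(points):
    """Group components by comparing each point with the average mean and variance.

    This threshold rule is only a stand-in for the clustering step.
    """
    mean_cut = mean(m for m, _ in points.values())
    var_cut = mean(v for _, v in points.values())
    groups = {}
    for name, (m, v) in points.items():
        key = ("high" if m > mean_cut else "low",
               "high" if v > var_cut else "low")
        groups.setdefault(key, []).append(name)
    return groups

# Hypothetical power profiles from a short initial simulation.
profiles = {
    "alu": [4.0, 8.0, 4.0, 8.0],
    "mux": [1.0, 1.0, 1.0, 1.0],
    "reg": [2.0, 2.0, 2.0, 2.0],
}
points = mvsp(profiles)
groups = group_components(points)
```

Here the ALU lands in the high-mean, high-variance group that deserves the most estimation effort, while the mux and register share the low-mean, low-variance group.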

33
Fixing the Sampling Probabilities
Objective: Determine the sampling probabilities for n clusters C_1, C_2, …, C_n
Obs. #1: The error due to sampling must be minimized
Obs. #2: The error in sampling a cluster C_i that accounts for a greater fraction (f_i) of the total power must be kept small. That is, minimize Equation 1.
Quantities used in the equations:
■ Power estimation error due to sampling for a component comp
■ Standard deviation of the power profile of a component comp
■ Number of samples for the cluster C_i

34
Fixing the Sampling Probabilities
Formulation: Minimize Equation 1 subject to the following constraints
■ Number of component-samples
■ Computational budget
Solution: The formulation is now a “linearly constrained optimization” problem; many solvers are available (Excel, AMPL)
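Equation 1 and the constraints are not legible in the slide text, so the sketch below rests on an assumed objective: each cluster contributes an error term w_i / sqrt(N_i), where w_i stands for the power fraction f_i times the standard deviation of the cluster and N_i is its number of samples, and the budget is a single linear constraint sum(c_i * N_i) = B. Under those assumptions no solver is needed, because the Lagrange condition gives a closed form. All names and numbers are hypothetical.

```python
def allocate_samples(weights, costs, budget):
    """Minimize sum(w_i / sqrt(N_i)) subject to sum(c_i * N_i) == budget.

    Setting the derivative of the Lagrangian to zero gives N_i proportional
    to (w_i / c_i) ** (2/3); the budget constraint then fixes the scale.
    """
    shares = [(w / c) ** (2.0 / 3.0) for w, c in zip(weights, costs)]
    scale = budget / sum(c * s for c, s in zip(costs, shares))
    return [scale * s for s in shares]

# Hypothetical clusters: weight = f_i * sigma_i, cost = simulation cost per sample.
weights = [8.0, 1.0]
costs = [1.0, 1.0]
samples = allocate_samples(weights, costs, budget=100.0)
```

The cluster with the larger weight receives about 80 of the 100 samples, which matches the intuition in Obs. #2 that clusters carrying more of the total power should be sampled more heavily.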

35
RTL Power Estimation: Results
■ Designs as large as 1.25 million transistors have been successfully evaluated using our RTL Power Estimator (RTL-PEST)*
■ Power estimation speed better than the best available commercially
■ RTL power estimates roughly 5 to 10% off gate-level power estimates
■ RTL power estimation 10-50X faster than gate-level power estimation
[Figure: execution time (sec), speedup, and power (mW) per design; error labels 4.1%, 3.8%, 13.9%, 1.2%, 12.2%, 2.9%, 14.8%; technology labels 0.18µ and 0.13µ.]
* For further information, please see [Ravi03]

36
RTL Power Estimation: Results
[Figure: percentage error versus CPU time trade-off for partitioned sampling and testbench reduction techniques]
[Figure: local power estimation errors for partitioned sampling and conventional sampling techniques]

38
Power Emulation Technology Overview
New paradigm for power estimation! [Coburn05]
Basic Observations
1. Power estimation uses power models for different components
2. Power models are themselves simple functions
3. Emulation is commonly used to speed up circuit simulation
[Figure: a host PC supplies the testbench to, and collects outputs from, an FPGA platform that holds the design enhanced with power models, a power strobe generator, and a power aggregator producing total power.]
2 to 3 orders of magnitude speedup possible!

39
Power Emulation: Challenges
Size of design enhanced with power models is very large!
■ Size increases by 18.2X on average for MPEG4 sub-designs
■ Enhanced version exceeds capacity of largest Xilinx Virtex-II FPGA
[Figure: size increase per sub-design, labeled 18.2X, 17.7X, 14.7X, 20.6X, 16.3X, 17.5X, 15.0X, 20.4X, shown against the capacity of the XC2V8000 FPGA]
Need to reduce the area requirements of power models!

40
Power Emulation: Challenges
Why area increase?
■ Resource-hungry power models used for every RTL component in the design
How to reduce area?
■ Optimize the number of power models used
■ Make the implementations of power models resource-efficient
■ Catch: Ensure minimum loss of estimation accuracy due to area reduction techniques

41
Area Optimization Techniques
Clustering of power models
■ Single power model servicing multiple components
Changing component granularity
■ Constructing power models for complex components that subsume several smaller components
Exploiting correlation
■ Using power correlation between components to reduce the number of monitored components
Optimizing power model implementations
■ Multi-cycling additions in power model computations
■ Using FPGA block memories for efficient storage of power model coefficients

42
Power Emulation: Results
Evaluation on various NEC designs; comparison with RTL-PEST and Comm-RTL
Columns: estimation time (sec) for RTL-PEST, Comm-RTL, and emulation, with acceleration (Acc) over each; estimated power (mW) for RTL-PEST and emulation, with error; FPGA area (LUTs) for RTL-PEST and emulation, with area overhead (AO).

CKT     | Time RTL-PEST | Time Comm-RTL | Time Emulation | Acc          | Power RTL-PEST | Power Emulation | Error | Area RTL-PEST | Area Emulation | AO
Sort    | 11.6          | 80.2          | 1.2            | 9.7X, 66.8X  | 0.33           | 0.31            | 3.53% | 1605          | 5665           | 3.53X
HVPeakF | 120.3         | 136.8         | 1.7            | 70.8X, 80.5X | 0.14           | 0.14            | 0%    | 3192          | 9016           | 2.82X
DCT     | 172.9         | 173.3         | 3.7            | 46.7X, 46.8X | 8.22           | 7.76            | 5.6%  | 6121          | 19242          | 3.14X
MPEG4   | 3300          | 2587          | 6.3            | 524X, 411X   | 4.74           | 4.51            | 4.9%  | 24907         | 72351          | 2.9X

■ Up to 500X speedup compared to RTL power estimation (MPEG4: RTL-PEST 3300 sec, Comm-RTL 2587 sec, power emulation 6.3 sec)
■ 3% loss of accuracy on average
■ Area overheads lowered to ≈3X
For further information, please see [Coburn05]

44
Cycle-Accurate Functional Descriptions (CAFDs)
(a) Behavioral description
  x = input_x; y = input_y;
  while (x != y) {
    if (x < y) { y = y - x; }
    else { x = x - y; }
  }
  out = x;
Scheduling
(b) Cycle-accurate functional description
  ST_1: x = input_x; y = input_y; goto ST_2;
  ST_2: c0 = x!=y; c1 = x
Binding
(c) RTL implementation
[Figure: controller FSM; registers reg_c0, reg_c1, reg_y1, reg_x, reg_y; functional units !=, <, - (labeled cmp, lt_cmp, sub); Bus1, Bus2; inputs input_x, input_y; output out.]

46
Overview of Power Estimation using Cycle-Accurate Functional Descriptions (CAFDs)
Objectives
■ Extract minimum RTL structural info.
■ Back-annotate RTL structural info.
[Figure: estimation flow. Input: CAFD (scheduled behavior, C++/SystemC) with resource and timing constraints. Preprocessing: RTL synthesis, RTL information extraction, virtual component instantiation, idle cycle analysis, producing a structure-aware CAFD. Cycle-accurate functional simulation of the structure-aware CAFD with the power model library and the simulation test bench. Output: power report (power vs. time).]
More information in [Zhong04]

47
Structure-aware CAFD
The structure-aware CAFD consists of the original cycle-accurate functional description plus:
■ Virtual component: stores I/O values for current & previous cycles; invokes the power macro-model (e.g. reg_power, add_power from the POWER MODEL LIBRARY) in each cycle
■ I/O mapping: traces appropriate CAFD variables to capture component I/Os in each cycle; generates idle cycle input values
■ Power aggregation & reporting code
It is simulated together with the SIMULATION TEST BENCH.

48
Example Snippet of a Structure-aware CAFD
  VC bus1, bus2;
  VC reg_y1;
  VC lt_cmp;
  …
  ST_1: …
  ST_2: bus1.RecordInput(x);
        bus2.RecordInput(y);
        c0 = x!=y;
        reg_c0.RecordInput(c0);
        eq_cmp.RecordIO(x,y,c0);
        c1 = x

49
C-based HW Power Estimation: Results
Compared accuracy, efficiency vs. gate-level and RTL-PEST
■ 50-100X speedup (or more) for various designs, less than 20% error w.r.t. POWERD

Circuit     | Average Power Error | Cycle-level Absolute Error | Speedup vs. RTL Estimation | Slowdown vs. Functional Simulation
DES         | 2.1%                | 2.2%                       | 83X                        | 1.1X
HDTV-1      | 1.7%                | 4.0%                       | 356X                       | 3.2X
JPEG        | 2.7%                | 6.6%                       | 1,143X                     | 3.3X
MPEG4-IDCT  | 3.1%                | 5.1%                       | 412X                       | 3.2X
MPEG4-ISPQ  | 1.5%                | 2.4%                       | 438X                       | 2.1X
SORT        | 1.7%                | 5.4%                       | 266X                       | 1.7X
VITERBI     | 1.4%                | 6.5%                       | 305X                       | 3.0X
WAVELET     | 2.4%                | 5.1%                       | 223X                       | 2.1X

For further information, please see [Zhong04]

51
Architectural Power Estimation
Requirements
■ Need to evaluate trade-offs in processor configuration
■ Need to evaluate trade-offs in software running on system
■ Must be very fast compared to HDL based power estimators

52
Architectural Power Estimation (Wattch)
Overall structure of an architectural power estimator: Wattch [Brooks00]
[Figure: a binary and a HW config feed a cycle-level performance simulator, which produces a performance estimate and cycle-by-cycle hardware access counts; parameterizable power models turn the access counts into a power estimate.]
Parameterized models for different CPU units
■ Can vary size or design style as needed
■ Use the fundamental equation for dynamic power consumption
● $P = C V^2 A f$
On each cycle, determine which units are accessed and accumulate energy consumption
Capacitance modeled for various critical components
Activity factors
■ Runtime measurements using a cycle-accurate performance simulator called SimpleScalar (has been ported to many simulators)
■ Assume an activity factor of 0.5 where the simulator cannot report statistics
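The per-cycle accumulation described on this slide can be sketched as follows. This is not Wattch code: the function names, the trace format, the capacitance values, and the units are hypothetical. Only the dynamic power equation P = C V^2 A f and the default activity factor of 0.5 come from the slide.

```python
def access_energy(capacitance, vdd, activity):
    """Energy of one access: C * V^2 * A (power P = C V^2 A f divided by f)."""
    return capacitance * vdd ** 2 * activity

def estimate(access_trace, unit_caps, vdd, freq, default_activity=0.5):
    """Accumulate energy over a cycle-by-cycle access trace.

    access_trace holds one dict per cycle, mapping each accessed unit to its
    measured activity factor, or None when no measurement is available.
    """
    energy = 0.0
    for cycle in access_trace:
        for unit, activity in cycle.items():
            a = default_activity if activity is None else activity
            energy += access_energy(unit_caps[unit], vdd, a)
    cycles = len(access_trace)
    avg_power = energy * freq / cycles   # energy divided by elapsed time
    return energy, avg_power

# Hypothetical unit capacitances and a two-cycle access trace (arbitrary units).
caps = {"alu": 2.0, "regfile": 4.0}
trace = [{"alu": 0.25, "regfile": None}, {"regfile": 0.5}]
energy, avg_power = estimate(trace, caps, vdd=2.0, freq=1.0)
```

Units that are not accessed in a cycle contribute nothing in this sketch, which reflects the "determine which units are accessed" step; how idle units are charged in the actual tool is not covered here.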

53
Architectural Power Estimation
■ Accuracy varies by 10-15% relative to low-level industry data (source: Brooks_hpca2001)
■ Good relative accuracy even when absolute accuracy may be off

54
Conclusions
High-level power analysis techniques are finally coming of age
■ Efficiency
■ Accuracy
What we could not cover today: using high-level power analysis for optimization
■ Power reports provide information about a design’s “hotspots”
■ Presence of power analysis in a high-level design flow makes optimization and design space exploration easy

55
References
Books/Tutorials
■ [Raghunathan-book] A. Raghunathan, N. K. Jha, and S. Dey, "High-level power analysis and optimization," Kluwer Academic Publishers, 1997
■ [Ravi-aspdac05] "Power Analysis in C-based Design" (part of tutorial entitled "C-based Design: Industrial Experience"), Asia-South Pacific Design Automation Conference (ASP-DAC), January 2005
Conference Papers
■ [Llopis98] R. Llopis, K. Goossens, "The petrol approach to high-level power estimation," ISLPED 1998: 130-132
■ [Glaser91] K. D. Glaser, K. Kirsch, and K. Neusinger, "Estimating essential design characteristics to support project planning for ASIC design management," in Proc. Int. Conf. Computer-Aided Design, pp. 148-151, Nov. 1991
■ [Liu94] D. Liu and C. Svensson, "Power consumption estimation in CMOS VLSI chips," IEEE J. Solid-State Circuits, vol. 29, pp. 663-670, June 1994
■ [Nemani96] M. Nemani and F. N. Najm, "High-level power estimation and the area complexity of Boolean functions," in Proc. Int. Symp. Low Power Electronics & Design, pp. 329-334, Aug. 1996
■ [Potlapally01] N. R. Potlapally, A. Raghunathan, G. Lakshminarayana, M. S. Hsiao, and S. T. Chakradhar, "Accurate power macro-modeling techniques for complex RTL circuits," IEEE International Conference on VLSI Design, January 2001
■ [Ravi03] S. Ravi, A. Raghunathan, and S. T. Chakradhar, "Efficient RTL Power Estimation for Large Designs," IEEE International Conference on VLSI Design, January 2003
■ [Zhong04] L. Zhong, S. Ravi, A. Raghunathan, and N. K. Jha, "Power estimation for cycle-accurate functional descriptions of hardware," IEEE/ACM International Conference on Computer-Aided Design, November 2004
■ [Coburn05] J. Coburn, S. Ravi, and A. Raghunathan, "Power emulation: A new paradigm for power estimation," ACM/IEEE Design Automation Conference, June 2005
■ [Brooks00] David Brooks, Vivek Tiwari, and Margaret Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," 27th International Symposium on Computer Architecture (ISCA), June 2000
