Parapet Research Group, Princeton University EE Workshop on Hardware Performance Monitor Design and Functionality HPCA-11 Feb 13, 2005 Hardware Performance.


1 Hardware Performance Counters for Detailed Runtime Power and Thermal Estimations: Experiences & Proposals
Canturk Isci, Gilberto Contreras, Margaret Martonosi
Parapet Research Group, Princeton University EE
Workshop on Hardware Performance Monitor Design and Functionality, HPCA-11, Feb 13, 2005

2 Hardware Performance Counters (HPCs) Go beyond Performance
- Several explored research avenues
  - Runtime power/thermal estimations
  - Dynamic management
  - Workload phases and application behavior prediction
- HPCs provide value beyond simulations
  - Long timescales
  - Real-system behavior

3 Hardware Performance Counters (HPCs) Go beyond Performance
- Runtime power
  - Isci & Martonosi [MICRO 2003]
  - Contreras & Martonosi [Submitted 2005]
- Runtime thermal
  - Lee & Skadron [HP-PAC in IPDPS 2005]
- Dynamic power management
  - Choi et al. [ISLPED 2004]
  - Weißel & Bellosa [CASES 2002]
- Dynamic thermal management
  - Bellosa et al. [COLP 2003]
- Workload phases and application behavior prediction
  - Isci & Martonosi [WWC 2003]
  - Duesterwald et al. [PACT 2003]

4 High-Performance Corner: P4 Power Estimation
- Idea: Power of component I = MaxPower[I] × ArchScaling[I] × AccessRate[I] + NonGatedPower[I]
- Motivation:
  - Fast (real-time)
  - Estimated view of on-chip detail (per physical component)
- Design:
  - Developed heuristics using 24 events to approximate access rates for 22 chip components
  - Used 15 counters with 4 rotations to collect all event data
- Validation:
  - Real-time estimates against real-time measured power
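The per-component formula on this slide can be illustrated with a minimal Python sketch. The `Component` class, the function names, and every numeric value below are hypothetical and chosen only for illustration; the slide gives the form of the equation but not the actual MaxPower, ArchScaling, or NonGatedPower values, nor the heuristics that turn counter readings into access rates.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One physical unit on the die (all field values are illustrative)."""
    name: str
    max_power_w: float        # MaxPower[I]
    arch_scaling: float       # ArchScaling[I]
    non_gated_power_w: float  # NonGatedPower[I]


def component_power(component, access_rate):
    """Power of component I = MaxPower x ArchScaling x AccessRate + NonGatedPower."""
    return (component.max_power_w * component.arch_scaling * access_rate
            + component.non_gated_power_w)


def total_power(components, access_rates):
    """Sum the per-component estimates; access_rates maps name -> rate in [0, 1]."""
    return sum(component_power(c, access_rates[c.name]) for c in components)


# Hypothetical numbers, not taken from the presentation.
units = [
    Component("trace_cache", max_power_w=4.0, arch_scaling=0.5, non_gated_power_w=0.25),
    Component("int_alu", max_power_w=2.0, arch_scaling=1.0, non_gated_power_w=0.5),
]
rates = {"trace_cache": 0.5, "int_alu": 0.25}
print(total_power(units, rates))
```

In a runtime estimator, the access rates would be recomputed from counter readings at every sampling interval, while the per-component constants stay fixed.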

5 P4 Power Estimator Results
- Average difference: ~5% among all benchmarks
  - SPEC CPU2000 & other applications
[Figure: measured vs. modeled power for gcc, gzip, vpr, vortex, gap, crafty]

6 Embedded Corner: PXA255 Power Estimation
- Idea:
  - CPU Power (n×1) = PerformanceEvents (n×5) × LinearParameters (5×1) + IdlePower
  - Mem Power (n×1) = PerformanceEvents (n×2) × LinearParameters (2×1) + IdlePower
- Motivation:
  - Runtime power optimizations under DVFS
- Design:
  - Parameter estimation (OLS) using dominant counter readings and live power measurements
  - Power estimation at various CPU configurations
- Validation:
  - Comparison between estimates and real-time measured power
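The linear model on this slide can be sketched as an ordinary least squares fit in plain Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the idle power is fitted here as the regression intercept (the slide does not say whether it was fitted or measured separately), and the training data in the usage lines is synthetic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: counter readings are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_power_model(event_rows, measured_power):
    """OLS via the normal equations; returns (linear_parameters, idle_power)."""
    x = [list(row) + [1.0] for row in event_rows]  # constant column = idle power
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, measured_power)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return beta[:-1], beta[-1]


def estimate_power(event_row, params, idle_power):
    """Power = PerformanceEvents x LinearParameters + IdlePower for one sample."""
    return sum(e * w for e, w in zip(event_row, params)) + idle_power


# Synthetic training data: two event rates per sample and the measured power.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
power = [2.5, 3.5, 5.5, 7.5]
params, idle = fit_power_model(rows, power)
print(estimate_power([1.0, 2.0], params, idle))
```

The same fit would be repeated per CPU configuration if the parameters change with voltage and frequency, which matches the slide's mention of estimation at various configurations.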

7 PXA255 Results
- 5% average error across 3 domains
  - Java CDC
  - Java CLDC
  - SPEC2000
[Figure: DB, CDC Java]

8 Proposals from Experiences
- 1. Track each physical unit individually for power & thermal:
  - Ex: Trace Cache, μCode ROM, μop Queue, Allocate, Rename, Instr-n Queue1, Instr-n Queue2, Schedulers, MEM, EXE, Dispatch Ports
    - All tracked with in-flight μops written to μop queue
  - Need individual utilization counts for each physical unit available on die for power and hotspot analyses

9 Proposals from Experiences
- 2. Need bitline activity counts
  - Utilization is not complete information; power in part depends on switching factor
  - Not necessarily fully detailed counts
    - Accumulate bitwise XOR of current and previous input/output ports
    - Sample RegFile ports/bit populations
[Figure: 30mW (10%) swing, 400MHz 1.3V PXA255 processor]

10 Proposals from Experiences
- 2. Need bitline activity counts
  - Utilization is not complete information; power in part depends on switching factor
  - Not necessarily fully detailed counts
    - Accumulate bitwise XOR of current and previous input/output ports
    - Sample RegFile ports/bit populations
[Figure: 20mW swing, 400MHz 1.3V PXA255 processor. Operand patterns combined with "+". Sequence A: 000…01 repeated. Sequence B: 111…11, 000…00, 011…11, 000…00, 001…11, 000…00, …, 000…11, 000…00, 000…01, 000…00]
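The "accumulate bitwise XOR of current and previous" idea can be modeled in software with a short Python sketch. The `ToggleCounter` class and the 32-bit port width are assumptions for illustration; the slide proposes a hardware counter and does not specify an implementation. The port is assumed to start at all zeros.

```python
class ToggleCounter:
    """Software model of a counter that accumulates bit flips on one port."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.previous = 0   # assumed initial port value
        self.toggles = 0    # running count of flipped bits

    def observe(self, value):
        """Add the number of bits that differ from the previous value."""
        value &= self.mask
        self.toggles += bin(value ^ self.previous).count("1")
        self.previous = value


def count_toggles(values, width=32):
    """Total bit flips seen while the port takes each value in order."""
    counter = ToggleCounter(width)
    for v in values:
        counter.observe(v)
    return counter.toggles


# Two streams with the same utilization (four accesses each) but very
# different switching activity, in the spirit of sequences A and B.
sequence_a = [0x00000001, 0x00000001, 0x00000001, 0x00000001]
sequence_b = [0xFFFFFFFF, 0x00000000, 0x7FFFFFFF, 0x00000000]
print(count_toggles(sequence_a), count_toggles(sequence_b))
```

A utilization counter would report the same value for both streams, which is the gap the bitline activity proposal is meant to close.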

11 Proposals from Experiences
- 3. More detailed off-chip/memory access support in the embedded domain
  - Mem power ~40% of system power
  - Tracking memory hierarchy transactions may help render better memory power estimates
    - Main memory reads/writes: core + DMA
    - Transaction length in bytes
    - Activity factors can be shared with RegFile
[Figure: REX Memory power consumption (one 16b bank)]

12 Proposals from Experiences
- 4. Metrics related to queue occupancy
  - Modern processor ≡ several queues
  - Depending on implementation, Power ∝ Queue occupancy
  - Buyuktosunoglu et al. [ISLPED'02], Tradeoffs in Power-Efficient Issue Queue Design

13 Proposals from Experiences
- 5. General/aggregate metrics in addition to specialized cases/breakdowns simplify runtime sampling for unit accesses
  - P4 ex. 1, MOB: the only event is MOB_load_replays
    - Counts replays for unknown store address/data and partial/unaligned address matches
    - No info for MOB entries/accesses/updates
  - P4 ex. 2, FPU: has 8 separate events (with 2 dedicated ESCRs)
    - Need at least 4 rotations to collect
  - P4 ex. 3, INT ALU: no dedicated event

14 Additional Comments for HPC Design
- General/aggregate metrics in addition to specialized cases/breakdowns simplify runtime sampling for unit accesses
- Metrics related to RegFile accesses vs. forwarding
- Semi-distributed implementations will always induce dependencies among simultaneously countable events
  - Higher parallelism among (power-oriented) metrics for minimal counter rotations at runtime
  - Implementations that allow counter rotations without need for intermediate logging
    - Partitioned / dual-mode / buffered counters
- Different events for different types of accesses to the same units with different-magnitude power implications
  - e.g. branch scan < BHT update < BTA update
- Different API/SW demands:
  - Lightweight implementations for runtime analyses
  - Per-thread for application profiling vs. global for real-time measurement comparisons and hotspots
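The dependency problem behind counter rotations can be sketched as a small scheduling exercise in Python. This is a simplified illustration, not the authors' tooling: the event names, the resource labels, and the `plan_rotations` function are hypothetical, and the model assumes each event needs exactly one shared selection resource that cannot be used by two events in the same rotation. First-fit grouping is used for brevity and is not guaranteed to find the minimum number of rotations.

```python
def plan_rotations(event_resources, max_counters):
    """Group events into rotations with a greedy first-fit pass.

    event_resources maps an event name to the shared resource it needs.
    Two events that need the same resource never share a rotation, and no
    rotation holds more than max_counters events.
    """
    groups = []  # each entry: {"events": [...], "used": set of resources}
    for event, resource in event_resources.items():
        for group in groups:
            if len(group["events"]) < max_counters and resource not in group["used"]:
                group["events"].append(event)
                group["used"].add(resource)
                break
        else:
            # No existing rotation can take this event, so open a new one.
            groups.append({"events": [event], "used": {resource}})
    return [group["events"] for group in groups]


# Hypothetical events: the four FP events compete for two shared resources.
events = {
    "fp_a": "res0",
    "fp_b": "res0",
    "fp_c": "res1",
    "fp_d": "res1",
    "loads": "res2",
}
for index, rotation in enumerate(plan_rotations(events, max_counters=4)):
    print(index, rotation)
```

Even with four counters free, the shared resources force a second rotation here, which is why the slide asks for higher parallelism among power-oriented events.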

15 Wishlist for Power/Thermal
- 1) For each physical unit on die, separate events to track utilization rates
  - Sub-events for different types of accesses with different power costs
- 2) Bitline activity counters for switching units
- 3) Occupancy counters for related queues
- 4) Counter support for off-core memory accesses
- 5) High parallelism among power events for minimal counter rotations

16 Conclusions
- New opportunities remain to be explored in future PMC designs for power and thermal studies
  - Direct correspondence to physical units
  - Bitline and occupancy counters
- We believe in the feasibility of these additions with the continuing emphasis given to counter design, as long as power is also considered a primary design target.

