Canturk Isci Advisor: Margaret Martonosi

1 Canturk Isci Advisor: Margaret Martonosi
Workload Adaptive Power Management with Live Phase Monitoring and Prediction. Canturk Isci. Advisor: Margaret Martonosi. Princeton University. November 19, 2018

2 Power Critical Across Computing Spectrum
Cooling Packaging Battery Lifetime Peak Performance/Utilization Reliability Cost of Ownership Computing Scale

3 Powerful Observations
[NY Times, 06/2006] The Dalles, OR. WA – 48MW; WA – 42MW; > 30,000 homes.
[Eugene Gorbatov - Intel’06] [Figure: cost savings ($1M to $5M) vs. power savings (5% to 25%), assuming a cost of power of $0.10/KWh and 1100 racks in the datacenter.]
US EPA Power Efficiency Specs: $1.8 billion savings over the next 5 years, equivalent to the annual savings of 2.7 million cars.
Power triple play: Save 1W processor power → ~1W power supply conversion → ~1W cooling power.

4 Power Management
Techniques by level (Circuits, Architecture, System, Data center, Application) and by timing (Static/Offline vs. Dynamic/Online): Dual-VT; VDD gating; Dynamic pipeline/cache reconfiguration; Positional adaptation; Compiler driven frequency scaling; Process cruise control; Energy-aware / coordinated allocation; Dedicated servers; Instruction/procedure energy accounting; Dynamic compilation. Wide application scope. Live adaptation and response.

5 My Work
Levels: Circuits, Architecture, System, Data center, Application. Timing: Static/Offline vs. Dynamic/Online. Existing techniques: Dual-VT; VDD gating; Positional adaptation; Dynamic pipeline/cache reconfiguration; Compiler driven frequency scaling; Dedicated servers; Instruction/procedure energy accounting; Process cruise control; Energy-aware / coordinated allocation; Dynamic compilation. Leverage application scope with architectural insight to infer dynamic workload behavior → Phases. Detect and predict repetitive behavior for dynamic adaptations → Adaptive power management. My work: CMP power budgeting; Phase-driven dynamic adaptation; Energy-efficient resource allocation.

6 Thesis Contributions A power-oriented phase analysis framework for real-system studies Generic runtime phase prediction methodology Phase-prediction-driven workload-adaptive power management Runtime power monitoring and estimation with hardware performance counters Phase detection approach resilient to system-induced variations

7 This Talk
PART-I: On-the-fly phase monitoring and prediction [Micro’06]. Simple, target specific phases. Application to dynamic management. PART-II: Power-oriented phase analysis [Micro’03, WWC’03, HPCA’05]. Detailed phases reflecting power behavior. Workload power phase characterization.

8 Program Phases Distinct and often-recurring regions of program behavior How can we detect recurrent execution under real-system variability? How can we predict future phase patterns? How can we leverage predicted phase behavior for workload-adaptive power management? Can we do better than simple, reactive methods? Useful for: Characterizing execution regions Use current phase/behavior to predict future behavior Managing dynamic adaptation

9 Dynamic Power Management with Live, Runtime Phase Prediction
Current common dynamic adaptation approach: Assume last/recent observed behavior will persist. [Figure: tracked characteristic vs. time t.] Great for stable execution! Inaccurate response for highly variable behavior! Key questions: How can we accurately predict future application phase behavior on all types of execution? Can predicted phase behavior be leveraged for workload-adaptive power management?

10 Phases for Dynamic Power Management
Need phases that: Represent dynamic voltage and frequency scaling (DVFS) potential. Are resilient to system variations. Are invariant to dynamic management actions.
DVFS potential ∝ Memory access rate → Memory accesses per instruction (MPI).
Resilient to variations with fixed instruction granularity tracking. MPI is invariant to DVFS power modes.
Different MPI rates → Different phases → DVFS settings [Wu et al. Micro’05]:
MPI | Phase # | DVFS Setting
< 0.005 | 1 | (1500 MHz, 1484 mV)
[0.005,0.010) | 2 | (1400 MHz, 1452 mV)
: | : | :
> 0.030 | 6 | (600 MHz, mV)

11 How Can We Predict Application Phase Behavior on All Types of Execution?
Applu execution snapshot: [Figure: MPI rate (0.000 to 0.020) and MPI phases (1 to 5) vs. cycles (2.80E+10 to 3.30E+10).] Significant variations exist! Phase patterns expose available recurrence!

12 Predicting Phases with the Global Phase History Table (GPHT) Predictor
[Figure: GPHT structure. The GPHR (depth N) holds the last observed phases from performance counters: Pt, Pt-1, Pt-2, ..., Pt-N. Each PHT entry holds a PHT tag (a past phase pattern such as Pt’, Pt’-1, ..., Pt’-N), a PHT prediction (the phase that followed it, Pt’+1) and an Age / Invalid field.] Predicted phase: from GPHR(0) if there is no matching pattern; from the corresponding PHT prediction entry if there is a matching pattern in the PHT. Similar to a global history branch predictor. Implemented in OS for on-the-fly phase prediction.
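The lookup described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OS implementation from the talk: the class name `GPHTPredictor`, the dictionary-based table, the LRU-style eviction by age, and the choice to train the table on every sample are all hypothetical.

```python
from collections import deque


class GPHTPredictor:
    """Illustrative GPHT-style predictor (structure assumed, see lead-in)."""

    def __init__(self, gphr_depth=8, pht_entries=128):
        # GPHR: most recent phase first, fixed depth.
        self.gphr = deque([0] * gphr_depth, maxlen=gphr_depth)
        self.pht_entries = pht_entries
        self.pht = {}          # tag (tuple of phases) -> [prediction, age]
        self.clock = 0         # logical time used for aging (assumption)
        self.last_tag = None   # pattern that was used for the last lookup

    def update_and_predict(self, observed_phase):
        self.clock += 1
        # Train: the previous pattern is now known to lead to observed_phase.
        if self.last_tag is not None:
            if self.last_tag not in self.pht and len(self.pht) >= self.pht_entries:
                oldest = min(self.pht, key=lambda t: self.pht[t][1])
                del self.pht[oldest]
            self.pht[self.last_tag] = [observed_phase, self.clock]
        # Shift the new observation into the history register.
        self.gphr.appendleft(observed_phase)
        tag = tuple(self.gphr)
        self.last_tag = tag
        entry = self.pht.get(tag)
        if entry is not None:
            entry[1] = self.clock
            return entry[0]        # matching pattern: use PHT prediction
        return self.gphr[0]        # no match: fall back to last value
```

On a strictly repeating phase sequence, this sketch falls back to last-value prediction until each history pattern has been seen once, and then predicts from the table.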

13 Prediction Accuracies
[Figure: prediction accuracy (%), 40 to 100, for LastValue and for GPHT with PHT:1024, PHT:128, PHT:64 and PHT:1 entries, all with GPHR:8. Benchmarks: gzip_log, mcf_inp, gcc_200, gap_ref, gcc_scilab, gcc_expr, ammp_in, gcc_166, apsi_ref, parser_ref, mgrid_in, applu_in, equake_in, wupwise_ref, gcc_integrate, bzip2_program, bzip2_source, bzip2_graphic.] Compare to reactive approaches (Last Value prediction). GPHT performs significantly better for highly varying applications. Up to 6X and on average 2.4X misprediction improvement. Similar results for the misprediction distances. Good performance down to 128 PHT entries.

14 Full-System Implementation
Application Application Binary OS Predictor State Phase History Data Acquisition System Parallel Port CPU (V,I) PMI Interrupt Handler Predict Next Phase Stop/Read Counters Check/Set DVFS State Hardware Restart Counters Performance Counters DVFS Registers Pentium-M Processor

15 Phase-Driven Dynamic Adaptation: Complete Example
[Figure, vs. instructions (1.5E+09 to 5.0E+09): MPI (GPHT) with actual and GPHT-predicted phases; power [W] for baseline and GPHT; BIPS for baseline and GPHT.] GPHT can accurately predict varying application behavior! Significant power savings compared to baseline! Small performance degradation!

16 Improvement over Reactive Methods
[Figure: Energy-Delay Product (EDP) improvement (0% to 50% axis, higher is better) and performance degradation (0% to 20% axis, lower is better) for Last Value and GPHT, relative to baseline execution, for bzip2_program, bzip2_source, bzip2_graphic, mgrid_in, applu_in, equake_in, swim_in, mcf_inp and the average. Data labels: 63%, 66%, 70%.] 7% EDP improvement over reactive approach! GPHT 2X better for variable applications! Comparable or less performance degradation!

17 This Talk
PART-I: On-the-fly phase monitoring and prediction [Micro’06]. Summary: GPHT accurately predicts phases at runtime (2.4X fewer mispredictions). Phase-driven dynamic management improves power efficiency (EDP improvement: 27% over baseline, 7% over reactive). PART-II: Power-oriented phase analysis [Micro’03, WWC’03, HPCA’05]. Detailed phases reflecting power behavior. Workload power phase characterization.

18 Power Phase Characterization
Insight into workload power behavior, and how it changes, is useful for many end goals: Identifying varying workload power demand (LAPSU / match (virtual) machine power cap allocation/cooling). Allocating power budgets (CMP core knobs / VM consolidation). Guiding thermal management and responding to thermal implications (swap hot code / cooperate hotspot shifting). Key questions: Can we leverage hardware performance counters to understand workload power behavior? How well do different application attributes characterize workload power phases?

19 Performance Counters Reflect Application Power Behavior
Composition of performance monitoring counters (PMC vectors) as a proxy for power behavior [Micro’03]. CPU Bound: High issue & exec. power. L1 Bound: High L1 cache power. L2 Bound: High L2 cache power. Memory Bound: High bus power.

20 Identifying Phases with PMC Vectors
Application execution ≡ PMC vector samples. Similar PMC vectors ≡ Same phase. How to quantify vector (dis)similarity: “Similarity Distance (SD)”, combining Absolute Manhattan (L1) Distance & Normalized Manhattan Distance.

21 Phase Classification Results
All pair-wise distances constitute a “Similarity Matrix”. Classify similarity matrix into phases [WWC’03]. A small number of phases (5-10) captures power variation within 10%.

22 Evaluating Different Features for Phase Analysis
Several studied program characteristics - Specific metrics (IPC, EPI) - Hardware performance vectors - Branch counts - Working sets - Basic block vectors - Procedures Two main approaches: Control Flow Methods Basic Block Vectors (BBVs) [Sherwood et al. ASPLOS’02] Event Monitoring Techniques Performance Monitoring Counters (PMCs) [Isci and Martonosi Micro’03] Key Question: How do these methods perform in terms of accurate representations of power phase behavior?

23 Experimental Setup Application Binary Pin
Sample basic block head addresses Instrument basic block heads Pin Application Binary Application Pintool OS OS serial device file Goal: To acquire control flow, performance metric and power behavior of workload execution at matching & controlled observation points on a real system Hardware Performance Counter Hardware External Power Measurement via Current Probe

24 A {BBV,PMC,Power} Sample
[Figure: one {BBV, PMC, Power} sample. Visited basic blocks (head addresses such as 0x804878d and 0x804879c, with counts) → 1 BBV hash → BBV32. PMCs → 1 PMC vector (e.g. 0.5, 0.02, 0.7, 1.4, 0.16). Power history (e.g. 36.5W, 37.5W, 37.2W, 36.9W, 35.9W) → 1 power number (37W). Interval labels: every 100M instructions; every 1M instructions.]

25 Evaluation Main Steps Cluster BBV samples Cluster PMC vectors
Compare each to true measured power Also compare to Oracle: classify directly for power Random: assign samples to target clusters randomly Deviation from power represents our error
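One way to picture "deviation from power represents our error" is to predict each sample's power as the mean power of its cluster and average the relative error. The sketch below is an assumption about the metric, not the evaluation code behind the slides; the function name `clustering_power_error` and the numbers in the usage note are hypothetical.

```python
from collections import defaultdict


def clustering_power_error(labels, powers):
    """Mean relative error when each sample is represented by its cluster's mean power."""
    groups = defaultdict(list)
    for label, power in zip(labels, powers):
        groups[label].append(power)
    # Representative power of each cluster: the mean of its members.
    means = {label: sum(ps) / len(ps) for label, ps in groups.items()}
    errors = [abs(p - means[label]) / p for label, p in zip(labels, powers)]
    return sum(errors) / len(errors)
```

With made-up readings of 30W, 30W, 40W, 40W, a clustering that groups equal readings together scores zero error, while one that mixes them scores roughly 15%; the Oracle and Random baselines on the slide bound this kind of error from below and above.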

26 Comparison of Techniques
[Figure: percent error w.r.t. actual power (0% to 30%) for Random, BBV, PMC and Oracle, shown as AVE(SPECint), AVE(SPECfp), AVE(OTHER) and AVE(Overall) [HPCA’05].] BBVs: 70% of Random. PMCs: 40% of Random. Oracle: 30% of BBVs. Oracle: 50% of PMCs. PMCs achieve 40% less error than BBVs. Consistent results regardless of clustering method. BBVs and PMCs both improve on the upper bound, but a significant gap remains over the lower bound. PMCs generally lead to less error than BBVs.

27 This Talk
PART-II: Power-oriented phase analysis [Micro’03, WWC’03, HPCA’05]. Summary: Event counters are useful to track the runtime workload power profile. PMC vectors better characterize power phase behavior with simpler monitoring and control (40% less error than BBVs).

28 Future Directions Broader picture: Many core/Mini core/Accelerators
Virtualization/Scalable enterprise. Intentional/Unintentional variability. DEMAND BASED EVERYTHING! Accurately projecting workload demand is key for dynamic adaptations! Configurable architectures/cores. Allocating/migrating workloads/VMs to heterogeneous resources. Dynamically tuning to schedules in data centers/real-time systems. Cooperating PMC and control-flow features → ‘action’ dependent phases → multiple actions.

29 Conclusions Event counters useful to track/predict/manage power
Runtime phase predictions with the Global Phase History Table can accurately predict future application behavior: 2.4X fewer mispredictions. Dynamic power management guided by phase predictions improves system power-performance efficiency: 27% EDP improvement over baseline, 7% over reactive. “PMC vector” similarity identifies application phases. PMC features provide a better proxy for workload power behavior than control-flow features: PMC phases achieve 40% less error than control flow.

30 THANKS!

31 Thanks! Canturk Isci Princeton University Department of Electrical Engineering Parapet Research Group Advisor: Margaret Martonosi

32 Collaborators Princeton Margaret Martonosi Gilberto Contreras Qiang Wu
Intel Eugene Gorbatov Sameer Abhinkar Rick Forand IBM Pradip Bose Alper Buyuktosunoglu Chen-Yong Cher Prabhakar Kudva Zhigang Hu Georgia Tech Ripal Nathuji

33 Research Overview Dynamic Management Power Estimation Phase Analysis
Monitor application execution via specific features Classify features into phases Detect/Predict phase behavior Apply dynamic power management guided by phase predictions Validate with real measurements Dynamic Management Power Estimation Phase Analysis Power Estimation Runtime Monitoring Hardware Performance Counters Dynamic Program Flow Application Real Measurements

34 Publications
Runtime Monitoring (Hardware Performance Counters, Dynamic Program Flow) and Power Estimation: [MICRO’03] → Runtime Power; [WHPM in HPCA’05] → Counters & Power. Phase Analysis: [WWC’03] → Power Phases; [IEEE MICRO’05] → Durations; [IISWC’05] → Detection; [HPCA’06] → PMC vs. BBV. Dynamic Management: [MICRO’06] → GPHT; [MICRO’06] → CMP Budget; [ICAC’07] → Hetero Datacenter.

35 Thesis Outline Power and Performance Measurement and Estimation on Real Systems: Methods and Basics [Micro-36’03][PMC Workshop in HPCA-11’05] Power Oriented Phase Analysis [WWC’03] [HPCA-12’06] Detecting Repetitive Phase Patterns with Real-System Variability [IISWC’05] Predicting Stable Phase Durations [IEEE MICRO’05] Runtime Phase Tracking and Phase Driven Dynamic Management [Micro-39’06]

36 Other Work
Event Counter Based Runtime Power Estimation: Estimate component power breakdowns based on access rates [Micro’03]. [Figure: measured/estimated CPU power vs. time for Gcc, Gzip, Vpr, Vortex, Gap and Crafty.] Detecting Recurrent Phase Behavior under Real-System Variability: Phase transformations due to variability effects. Transition-guided phase detection framework [IISWC’05].

37 …Other Work
Long-Term Value and Duration Prediction: Predict duration and rate of change for stable phases [IEEE MICRO’05]. Global Power Management for Chip Multiprocessors: Optimize throughput for fixed global power budget [Micro’06]. Power Management in Heterogeneous Data Centers: Allocate workloads to heterogeneous platforms [ICAC’07].

38 Future Directions Many immediate research paths:
Phase predictions and dynamic adaptations for thermally limited systems Extending to detailed ‘action-dependent’ phases via ‘across-mode’ phase predictions Dynamically tuning to service-level agreements in data centers/real-time systems Leveraging control-flow information in coordination with event-counters

39 …Future Directions Broader picture:
Many core/Mini core/Accelerators Virtualization/Scalable enterprise Intentional/Unintentional variability DEMAND BASED EVERYTHING! Accurately projecting workload demand is key for dynamic adaptations! Multiconfigurable architectures Locally adapting CMP cores to workload demand Allocating/migrating workloads/VMs to heterogeneous resources (and vice-versa)

40 What Would I do?
Real-System: Alternative mgmt → cache config. Multidimension phases → across-mode predictions (golden patterns/BF model) → better characterizing power?? Woodcrest/Sossaman: DP-CMP → workload shifting for aligning phases → opportunity study first. AMD-Barcelona(’07) → core-level mgmt benefits. OPEN: How can I use control flow granularities w/o PIN? Simulation: Multiple features → big classifier → state dependent phase tables → Next-Phase State Machine → multiple actions cooperatively. Phases to guide power gating? Phased/associativity-aware cache: Dual Vt or shutdown banks. Phase-Driven Runahead: when to runahead & when not to, in conjunction with MLP.

41 Future Many core/mini core/NoC/accelerators/helper engines
Security - LaGrande Parallelism – TM Virtualization/Platformization/Scalable Enterprise Consolidation/isolation/migration Process Variation Ultra low power/embedded/disposable/cheap computing Intentional/Unintentional heterogeneity DEMAND BASED EVERYTHING!

42 What ifs… We consider CMP+MP… SMT… Multithreaded?
This is interesting (multiple virtual cntrs, pid dependent GPHT?) Non trashing vs trashing trade-off Thermal limited Fused caches?

43 Phase Visions

44 Phase-Driven Management Vision
PC X A C S1 S3 S8 N V M S6 O Action to Controller Events PMCs Classifier History & State Table Phase State Machine I$ D$ Commit I$ Misses D$ Misses Instr-ns Completed DVS Cache Reconfig Phase State Next Phase

45 1.1) Why Care About Phases? Characterizing execution regions
Summarize execution into representative execution regions

46 1.1) Why Care About Phases? Characterizing execution regions
Managing dynamic adaptation OFF ON Dynamic/adaptive mgmt

47 1.1) Why Care About Phases? Characterizing execution regions
Managing dynamic adaptation. Use current phase/behavior to predict future behavior. [Figure: Load Refs and Store Misses (0.1 to 1) vs. time [s].]

48 1.2) Why Care About Power Phases?
Useful for: Guiding power budget / temperature limit management, e.g. Montecito/Foxton. [Figure: power [W] and temperature [°C] vs. time [s], uncontrolled T vs. enforced T, with a “Slow down!” action.]

49 1.2) Why Care About Power Phases?
Useful for: Guiding power budget / temperature limit management Power/Temperature aware scheduling Power [W] This helps in 2 ways: Reduce cooling cost/heat removal rate for a server Extend battery life for a mobile as less cooling power/time is needed Time [s] [Bellosa et al. COLP’03]

50 1.2) Why Care About Power Phases?
Useful for: Guiding power budget / temperature limit management Power/Temperature aware scheduling Power balancing for multiprocessor systems/activity migration Power Power Task1 Task2 Swap hot task Migrate hot task Or Slow down hot core Core/μP 1 Core/μP 2 Speed up! Slow down!

51 Counter Based Power Estimation

52 This Talk Application Runtime Monitoring Power Estimation
Counter-based runtime power estimation Application Runtime Monitoring Hardware Performance Counters Dynamic Program Flow Power Estimation Power Estimation Phase Analysis Dynamic Management

53 Counter-Based Power Estimation
Power of component i: Power[i] = MaxPower[i] · ArchScaling[i] · AccessRate[i] + NonGatedPower[i]. Term sources, in order: die area & stressmarks; microarch. properties; performance counters; empirical power measurements. Fast (real-time) power estimation. Offers estimated view of on-chip detail. 22 subcomponents in the Pentium4 die → 22 dimensional “Power Vector (PV)”. EX: Trace cache delivers 3 uops/cycle in deliver mode and 1 uop/cycle in build mode: Power(TC) = [Access-Rate(TC)/3 + Access-Rate(ID)] × MaxPower(TC) + Non-gated TC CLK power
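The per-component model above translates directly into code. The following is a minimal sketch of the two formulas on this slide; the function names are hypothetical, and the example numbers in the usage note are invented for illustration rather than taken from the measurements.

```python
def component_power(max_power, arch_scaling, access_rate, non_gated_power):
    """Power[i] = MaxPower[i] * ArchScaling[i] * AccessRate[i] + NonGatedPower[i]."""
    return max_power * arch_scaling * access_rate + non_gated_power


def trace_cache_power(tc_access_rate, id_access_rate, max_power_tc, non_gated_clk_power):
    """Trace cache example: 3 uops/cycle in deliver mode, 1 uop/cycle in build mode."""
    return (tc_access_rate / 3 + id_access_rate) * max_power_tc + non_gated_clk_power


def power_vector(components):
    """Evaluate the model for every component; each item is a 4-tuple of model terms."""
    return [component_power(*terms) for terms in components]
```

For instance, with made-up values `component_power(10.0, 0.5, 0.4, 1.0)` evaluates to about 3.0 W; summing `power_vector(...)` over all modeled components gives a total estimate that can be compared against the measured CPU power.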

54 Experimental Framework
1mV/Adc conversion Voltage readings via RS232 to logging machine POWER SERVER POWER CLIENT Counter based access rates over ethernet Convert voltage to measured power Convert access rates to modeled powers

55 Power Estimation Results
Gcc Gzip Vpr Vortex Gap Crafty Measured Estimated MEASURED/ESTIMATED CPU POWER vs. TIME Average Estimation Error: 3W (~6%)

56 Identifying Phases with Power Vectors

57 This Talk On-the-fly phase monitoring and prediction
Offline phase characterization Simple, target specific phases Detailed phases reflecting power behavior Application to dynamic management Workload power phase characterization We consider… for…

58 This Talk Application Runtime Monitoring Power Estimation
Counter-based runtime power estimation Identifying workload phase behavior with event counter information Application Runtime Monitoring Hardware Performance Counters Dynamic Program Flow Power Estimation Power Estimation Phase Analysis Dynamic Management

59 Power Vectors: Similarity Metrics
Absolute Distance (AD): Manhattan (L1) distance between vectors r, c. Normalized Distance (ND): Manhattan distance between normalized r, c. Similarity Distance (SD): Combination of AD & ND.
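The two distance metrics can be sketched as follows. The transcript does not preserve how vectors are normalized or how AD and ND are combined into SD, so this sketch assumes normalization by the vector's L1 sum and stops at AD and ND; all function names are hypothetical.

```python
def manhattan(r, c):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(r, c))


def normalize(v):
    """Scale a vector so its entries sum to 1 (assumed normalization)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)


def absolute_distance(r, c):
    return manhattan(r, c)


def normalized_distance(r, c):
    return manhattan(normalize(r), normalize(c))


def similarity_matrix(vectors, distance):
    """All pair-wise distances: entry (r, c) is distance(vectors[r], vectors[c])."""
    return [[distance(r, c) for c in vectors] for r in vectors]
```

Under this assumption, AD separates two vectors with the same composition but different magnitudes, while ND treats them as identical, which is presumably why the slides combine the two.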

60 Phase Classification
All pair-wise distances constitute a “Similarity Matrix”: SD(r,c) → Matrix entry (r,c). Classify execution into phases: First Pivot Clustering. Target: O(10) phases. Cumulative errors: Max: 4.7W & RMS: 3.1W (~6%). Total power error < Σ(comp. errors) [WWC’03].

61 Similarity Matrix and Phase Classification
All pair-wise distances constitute a “Similarity Matrix”: SD(r,c) → Matrix entry (r,c). Classify execution into phases: First Pivot Clustering. Cumulative errors: Max: 4.7W & RMS: 3.1W. Total power error < Σ(comp. errors) [WWC’03].

62 Evaluating Different Features for Phase Analysis

63 This Talk Application Runtime Monitoring Power Estimation
Counter-based runtime power estimation Identifying workload phase behavior with event counter information Evaluating event-counter and control-flow techniques for power phase characterization Application Runtime Monitoring Hardware Performance Counters Dynamic Program Flow Power Estimation Power Estimation Phase Analysis Dynamic Management

64 BAK-Different Target Number of Clusters
PMCs perform relatively better for the practical range of target clusters. Relative BBV error is significantly larger than PMC error for a small number of phases [1-10]. Why BBVs are worse for a small number of phases: too much granularity in control flow [IISWC].

65 Summary
Identifying phases with event counter vectors. Comparison of event counter and control flow approaches. Event counters provide a good proxy for the runtime power profile of applications. Simple similarity analysis on the composition of PMC events helps identify workload power phases. PMC-based features generally provide a better characterization of power behavior than control-flow features. Next: detecting recurrent behavior under variability, and phase-guided dynamic management.

66 Detecting Recurrent Phase Behavior under Real-System Variability

67 This Talk Application Runtime Monitoring Power Estimation
Counter-based runtime power estimation Identifying workload phase behavior with event counter information Evaluating event-counter and control-flow techniques for power phase characterization Detecting Recurrent Phase Behavior under Real-System Variability Application Runtime Monitoring Hardware Performance Counters Dynamic Program Flow Power Estimation Power Estimation Phase Analysis Dynamic Management

68 Detecting Recurrent Phase Behavior under Real-System Variability
Repetitive phases inevitably exhibit different behavior Values & durations vary Phase distributions vary Key Questions: How do phases manifest themselves with real-system effects? How can we extract recurrent behavior in spite of these variations?

69 Real-System Variability Effects on Phases
[Figure: metric vs. time t under successive variability effects. Ideal: phases A B C. Glitch: A B C D. Gradient: A B C D E. Shift: A B C D E. Mutation: A B C D E F. Time Dilation: A B C D E F.]

70 Comparing Phase Signatures
[Figure: metric vs. time t for the ideal signature (phases A B C) and the final signature (phases A B C D E F).] A direct apples-to-apples comparison of phase signatures is not very relevant in the real world!

71 Value-Based Phases
[Figure: Value Based Phases (VBP) vs. time t for the ideal signature (phases A B C) and the final signature (phases A B C D E F).] Value based phase representations do not show good correlation

72 Our Proposed Solution with Transitions
[Figure: Transition Based Phases (TBP) vs. time t for the ideal signature (phases A B C) and the final signature (phases A B C D E F): 1 at each phase transition, 00…0 elsewhere.] Shifts are intentionally ignored here; in the end, all analyses are shift invariant. Tracking phase transitions rather than phase sequences is more useful in detecting recurrent behavior. The remainder of the talk focuses on transitions and prunes the remaining effects.

73 Our Transition-Guided Detection Framework
For each of two phase signatures (Phase #1, Phase #2): Sample PMCs to form 12D vectors (Vector stream #1, #2). Identify transitions (TBPinit #1, #2). Apply glitch/gradient filtering (TBPgg #1, #2). Apply near-neighbor blurring (TBPggN #1). Apply cross correlation: Match ⇒ Peak at best alignment. Mismatch ⇒ No observable peak.
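The final cross-correlation step can be illustrated with a small sketch over 0/1 transition signatures. This is a plain sliding dot product written for clarity, not the framework's code; the function names and the lag range are assumptions.

```python
def cross_correlation(sig_a, sig_b, max_lag):
    """Score every alignment of sig_b against sig_a within +/- max_lag samples."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        scores[lag] = sum(
            a * sig_b[i + lag]
            for i, a in enumerate(sig_a)
            if 0 <= i + lag < len(sig_b)
        )
    return scores


def best_alignment(sig_a, sig_b, max_lag):
    """Return (lag, score) of the highest correlation peak."""
    scores = cross_correlation(sig_a, sig_b, max_lag)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]
```

If the second signature is the first one delayed by three samples, the peak appears at lag 3 with a score equal to the number of transitions; unrelated signatures produce no comparable peak. Because the score is searched over lags, the comparison does not depend on where each recording started.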

74 Sampling Effects: Glitches & Gradients
Glitch: Instability where before & after are the same → Spurious transitions. Gradient: Instability where before & after are different → A single true transition. Glitch/Gradient Filtering: Very simple: no consecutive transitions. [Figure: initial transitions, marked as glitches and gradients, and the refined transitions.]
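A minimal sketch of the "no consecutive transitions" rule, assuming the input is a per-sample sequence of phase labels: each run of back-to-back transitions is dropped when the phase before and after is the same (glitch) and collapsed to one transition otherwise (gradient). Placing the surviving transition at the start of the run is an arbitrary choice made here, and the function names are hypothetical.

```python
def transitions(phases):
    """1 where the phase differs from the previous sample, else 0."""
    if not phases:
        return []
    return [0] + [int(a != b) for a, b in zip(phases, phases[1:])]


def filter_glitches_gradients(phases):
    """Remove glitch transitions and collapse gradients to a single transition."""
    raw = transitions(phases)
    out = [0] * len(raw)
    i = 0
    while i < len(raw):
        if not raw[i]:
            i += 1
            continue
        start = i
        while i < len(raw) and raw[i]:
            i += 1
        end = i - 1                       # run of transitions is raw[start..end]
        before, after = phases[start - 1], phases[end]
        # Isolated transition, or gradient between two different phases: keep one.
        if end == start or before != after:
            out[start] = 1
        # Otherwise before == after: a glitch, so the whole run is dropped.
    return out
```

For the phase sequence 1 1 1 5 1 1 2 3 4 4 4, the one-sample excursion to phase 5 is discarded as a glitch and the ramp 1 → 2 → 3 → 4 is kept as a single transition.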

75 Time Dilations
Observation: Dilations exist as small jitters (few samples). Proposed Solution: “Near-Neighbor Blurring”. Blur edges slightly → Consider transitions as distributions around their actual locations. Tolerance: Spread of this distribution, [t-x, t+x] samples. Ex: Matching improvement with tolerance=2: [Figure: transition signatures of run1 and run2 vs. t, before blurring.] Mismatch!

76 Time Dilations
Observation: Dilations exist as small jitters (few samples). Proposed Solution: “Near-Neighbor Blurring”. Blur edges slightly → Consider transitions as distributions around their actual locations. Tolerance: Spread of this distribution, [t-x, t+x] samples. Ex: Matching improvement with tolerance=2: [Figure: transition signatures of run1, blurred (figure labels: .7, .3), and run2 vs. t.] Match!
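Near-neighbor blurring can be sketched by spreading each transition over the window [t-x, t+x]. The transcript does not preserve the actual weights of the distribution, so the linearly decaying weights below are an assumption, as are the function names.

```python
def blur(transitions, tolerance):
    """Spread each transition over [t - tolerance, t + tolerance] samples."""
    n = len(transitions)
    out = [0.0] * n
    for t, flag in enumerate(transitions):
        if not flag:
            continue
        for d in range(-tolerance, tolerance + 1):
            if 0 <= t + d < n:
                weight = 1.0 - abs(d) / (tolerance + 1)   # assumed weighting
                out[t + d] = max(out[t + d], weight)
    return out


def match_score(sig_a, sig_b):
    """Dot product of two aligned signatures."""
    return sum(x * y for x, y in zip(sig_a, sig_b))
```

With transitions at samples 2, 6 and 10 in one run and at 3, 6 and 11 in another, the raw signatures agree on only one transition, whereas blurring the first run with tolerance=2 lets all three contribute to the score.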

77 Receiver Operating Characteristics
[Figure: receiver operating characteristic, P{hit} vs. P{false alarm}.] Detect threshold of 0: P{hit} = 1, P{false alarm} = 1. Very high detect threshold: P{hit} = 0, P{false alarm} = 0. Desired operating point: P{hit} ~ 1, P{false alarm} ~ 0. Best detection scheme (tolerance=1) achieves 100% hit detection with <5% false alarms.

78 Improvement with Transition-Based Phases
In all cases transitions perform better In almost all cases near-neighbor blurring improves detection

79 Summary Detecting phase behavior on real systems has interesting challenges resulting from system induced variability Phase transition information improves detection capabilities TBP show 6X better detection capabilities than VBP Supporting methods, such as Glitch/Gradient Filtering and Near-Neighbor Blurring improve detectability of transition signatures Near-neighbor blurring with tolerance=1 achieves 100% recurrence detection with <5% false alarms

80 Dynamic Power Management with Live, Runtime Phase Prediction

81 This Talk
On-the-fly phase monitoring and prediction [Micro’06]: Simple, target specific phases. Application to dynamic management. Offline phase characterization [Micro’03, WWC’03, HPCA’05]: Detailed phases reflecting power behavior. Workload power phase characterization.

82 This Talk Application Runtime Monitoring Power Estimation
Counter-based runtime power estimation Identifying workload phase behavior with event counter information Evaluating event-counter and control-flow techniques for power phase characterization Detecting Recurrent Phase Behavior under Real-System Variability Workload adaptive power management with live, runtime phase predictions Application Runtime Monitoring Hardware Performance Counters Dynamic Program Flow Power Estimation Power Estimation Phase Analysis Dynamic Management

83 Design Constraints and Decisions
Target management technique: Dynamic voltage and frequency scaling (DVFS). Experimental platform: Pentium-M (Banias) → 2 PMCs. Instruction based monitoring: Eliminates timing variations. First PMC → Instructions retired. DVFS potential: ∝ Memory boundedness of application; ∝ (Available concurrent execution)^-1. Second PMC → Memory accesses per instruction (MPI). DVFS invariance: Tracked features should not change with dynamic adaptations.

84 Guiding Dynamic Power Management
Target management technique: Dynamic voltage and frequency scaling (DVFS). DVFS potential: Memory Access Rate (Low to High); CPU-Memory Overlap (Low to High). [Figure: CPU and MEM activity over time t at frequency f and at ½ f.] Track (Main) Memory accesses per instruction (MPI). Different MPI rates → Different DVFS settings.

85 Phase Definitions
Assign different MPI ranges to different phases. Higher phase number → more memory bound phase. [Based on Wu et al. Micro’05]
MPI | Phase # | DVFS Setting
< 0.005 | 1 | (1500 MHz, 1484 mV)
[0.005,0.010) | 2 | (1400 MHz, 1452 mV)
[0.010,0.015) | 3 | (1200 MHz, 1356 mV)
[0.015,0.020) | 4 | (1000 MHz, 1228 mV)
[0.020,0.030) | 5 | (800 MHz, 1116 mV)
> 0.030 | 6 | (600 MHz, mV)
Important phase properties: Resilient to system variations. Invariant to dynamic power management actions.
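The phase table maps naturally onto a sorted list of MPI boundaries. The sketch below is illustrative: the names are hypothetical, an MPI of exactly 0.030 is assigned to phase 6 as an assumption (the table lists only "> 0.030"), and the voltage for the 600 MHz setting is left as None because it is missing from the transcript.

```python
import bisect

# Upper bounds of phases 1..5, taken from the table.
MPI_BOUNDS = [0.005, 0.010, 0.015, 0.020, 0.030]

# Phase number -> (frequency in MHz, voltage in mV).
DVFS_SETTINGS = {
    1: (1500, 1484),
    2: (1400, 1452),
    3: (1200, 1356),
    4: (1000, 1228),
    5: (800, 1116),
    6: (600, None),   # voltage not preserved in the transcript
}


def mpi_to_phase(mpi):
    """Map memory accesses per instruction to a phase number (1..6)."""
    return bisect.bisect_right(MPI_BOUNDS, mpi) + 1


def mpi_to_dvfs(mpi):
    """Look up the DVFS setting for the phase that an MPI value falls into."""
    return DVFS_SETTINGS[mpi_to_phase(mpi)]
```

Since the phase mappings are described later in the talk as dynamically configurable, keeping the boundaries and settings in plain tables like this makes them easy to swap out.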

86 Phase Definitions
Need phases that: Represent dynamic voltage and frequency scaling (DVFS) potential. Are resilient to system variations. Are invariant to dynamic management actions. DVFS potential: f(Memory Access Rate, Overlapping CPU Execution). Candidate metrics: Memory accesses per instruction (MPI); Executed IPC; (ROB entries)/(RS entries). Resilient to variations with fixed instruction granularity tracking. MPI is invariant to DVFS power modes. Different MPI rates → Different phases → DVFS settings.

87 Variability and Power Savings Quadrants

88 GPHT Prediction Accuracies
[Figure: prediction accuracy (%), 40 to 100, for LastValue, GPHT (PHT:1024, GPHR:8), FixWindow_8 and VarWindow_128_0.005. Benchmarks: gzip_log, mcf_inp, gcc_200, gap_ref, gcc_scilab, gcc_expr, ammp_in, gcc_166, apsi_ref, mgrid_in, applu_in, parser_ref, equake_in, wupwise_ref, gcc_integrate, bzip2_program, bzip2_source, bzip2_graphic.] Compare to reactive approaches: Last Value / Fixed Window History / Variable Window History. GPHT performs significantly better for highly varying applications. Up to 6X and on average 2.4X misprediction improvement.

89 Impact of PHT Size 128-entry PHT is plenty
[Figure: prediction accuracy (%), 40 to 100, for LastValue and for GPHT with PHT:1024, PHT:128, PHT:64 and PHT:1 entries, all with GPHR:8, over the same benchmarks.] 128-entry PHT is plenty. Converges to last value as PHT entries → 1.

90 Impact of Phase Granularities
Average accuracy over experimented applications: N = 1 → Both 100%. N ~ O(10,000) → Both → 0%.

91 Mispredicted Distance vs. Prediction Accuracy
Average distance between actual and predicted phase numbers over whole execution. NOTE: Phases are not uniformly spaced, though! [Figure: prediction error distance (0.1 to 1) for LastValue and GPHT_8_1024 per benchmark: gzip_log, mcf_inp, gcc_200, gcc_scilab, wupwise_ref, gap_ref, gcc_integrate, gcc_expr, ammp_in, gcc_166, parser_ref, apsi_ref, bzip2_program, mgrid_in, bzip2_source, bzip2_graphic, applu_in, equake_in.]

92 DVFS Invariance Important constraint when talking “actions”
If actions change phase classifications: Obsolete past history & unreliable predictions

93 Control Flow
Dynamic adaptation control, triggered from application execution every 100 million instructions: Stop/Read performance counters. Translate to phases. Update phase predictor states. Predict next phase. Translate to DVFS setting. Same as current setting? No: Apply new DVFS setting. Yes: keep the current setting. Clear interrupt. Restart counters. Exit to program execution.
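The control flow above can be mimicked in user-level Python to make the order of steps concrete. This is a toy model, not the kernel interrupt handler from the talk: every callable passed in is a hypothetical stand-in for hardware or OS functionality, and interrupt clearing and counter restart are omitted.

```python
def on_counter_overflow(read_counters, classify, predict, phase_to_setting,
                        current_setting, apply_setting):
    """One pass of the dynamic adaptation control; returns the active setting."""
    instructions, mem_accesses = read_counters()    # stop/read counters
    phase = classify(mem_accesses / instructions)   # translate MPI to a phase
    next_phase = predict(phase)                     # update predictor, predict next
    target = phase_to_setting[next_phase]           # translate to DVFS setting
    if target != current_setting:                   # only act on a change
        apply_setting(target)
    return target


# Toy usage with stand-in callables (all values are made up).
applied = []
setting = on_counter_overflow(
    read_counters=lambda: (100_000_000, 1_200_000),  # MPI = 0.012
    classify=lambda mpi: 1 if mpi < 0.005 else 3,    # two-level toy classifier
    predict=lambda phase: phase,                     # last-value stand-in
    phase_to_setting={1: 1500, 3: 1200},             # phase -> MHz
    current_setting=1500,
    apply_setting=applied.append,
)
```

Skipping the write when the predicted setting equals the current one keeps the handler cheap on stable phases, which is consistent with the very small overhead reported later in the deck.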

94 Full-System Implementation
Application Application Binary OS PMC and Phase Log Predictor State Performance Monitoring Interrupt PMI Interrupt Handler Predict Next Phase Stop/Read Counters Check/Set DVFS State Hardware I1 Restart Counters V1 VCPU Data Acquisition System Parallel Port Voltage Regulator I2 V2 Performance Counters DVFS Registers R1,2=2mΩ Power Supply Pentium-M Processor

95 Power Savings
Averages: GPHT: 21% & Last Value: 14% → 50% Improvement

96 Bounding Performance Degradation
Phase mappings are dynamically configurable. Can limit performance degradation by sacrificing power efficiency [Micro’06].

97 GPHT Overhead Insignificant ~0.02%

98 Summary
Phase characterizations help identify repetitive application behavior under real-system variability and dynamic management actions. Runtime phase predictions with the Global Phase History Table can accurately predict future application behavior: up to 6X and on average 2.4X fewer mispredictions than reactive approaches. Dynamic power management guided by these phase predictions helps improve system power-performance efficiency: 27% EDP improvement over baseline and 7% over reactive approaches.

99 Repository

100 Full-System Implementation
V1 OS kernel R1=2mΩ VCPU I2 V2 Predictor state PMC and phase log R2=2mΩ Pentium-M Processor Performance counters Voltage regulator PMI Interrupt handler DVFS mode set registers Power supply

101 Guiding Dynamic Power Management
Target management technique: Dynamic voltage and frequency scaling (DVFS). DVFS potential: ∝ Memory boundedness of application; ∝ (Available concurrent execution)^-1. [Figure: CPU and MEM activity over time t at frequency f and at ½ f.] Track Memory accesses per instruction (MPI). Different MPI rates → Different DVFS settings.

104 CMP Management AMD Barcelona
AMD's Barcelona core, due out in Q2 '07, will have support for independent clocks per core but all sharing the same voltage

106 PHASES & Phases & phases

