Cooperative Boosting: Needy versus Greedy Power Management INDRANI PAUL 1,2, SRILATHA MANNE 1, MANISH ARORA 1,3, W. LLOYD BIRCHER 1, SUDHAKAR YALAMANCHILI.

1 Cooperative Boosting: Needy versus Greedy Power Management
INDRANI PAUL (1,2), SRILATHA MANNE (1), MANISH ARORA (1,3), W. LLOYD BIRCHER (1), SUDHAKAR YALAMANCHILI (2)
JUNE
(1) Advanced Micro Devices, Inc.
(2) Georgia Institute of Technology
(3) University of California, San Diego

2 GOAL & OUTLINE
COOPERATIVE BOOSTING: NEEDY VERSUS GREEDY POWER MANAGEMENT | JUNE, 2013
• Goal:
  – Optimize performance under power and thermal constraints in a heterogeneous architecture
• Outline:
  – State-of-the-Art Power and Thermal Management
  – Thermal Coupling
  – Performance Coupling
  – Cooperative Boosting
  – Results

3 STATE-OF-THE-ART POWER AND THERMAL MANAGEMENT

4 STATE-OF-THE-ART PROCESSOR
Accelerated processing unit (APU):
• Graphics processing unit (GPU): 384 AMD Radeon™ cores
• Multi-threaded CPU cores
• Shared Northbridge: access to overlapping CPU-GPU physical address spaces
• Many resources shared among CPU and GPU
  – For example, memory hierarchy, power, and thermal capacity

5 PROGRAMMING MODEL
• Coupled programming model
• Offload compute-intensive tasks to the GPU
[Figure: user application on top of the OpenCL™ software stack and operating system, running on the APU hardware (CPU and GPU); host tasks run on the CPU and GPU tasks on the GPU. Each OpenCL kernel is a grid of threads (an N-dimensional range), each thread operating over a data partition.]
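To make the "grid of threads, each operating over a data partition" idea concrete, here is a plain-Python emulation of a 2-D N-dimensional range in which every work-item sums its own tile of a square matrix. It is only a sketch: it does not use the OpenCL API, and the function names (`ndrange`, `block_sum_kernel`, `launch`) and the tiling scheme are hypothetical.

```python
# Plain-Python emulation of an OpenCL-style 2-D N-dimensional range.
# Not the OpenCL API: names and the tiling scheme are hypothetical.
from itertools import product


def ndrange(global_size):
    """Yield every 2-D global ID (i, j) in the range."""
    gx, gy = global_size
    yield from product(range(gx), range(gy))


def block_sum_kernel(data, gid, tile):
    """Work done by one work-item: sum only its own tile x tile block."""
    i, j = gid
    return sum(
        data[r][c]
        for r in range(i * tile, (i + 1) * tile)
        for c in range(j * tile, (j + 1) * tile)
    )


def launch(data, tile):
    """Host side: size the range so each work-item owns exactly one partition
    of a square matrix."""
    n = len(data)
    if n % tile:
        raise ValueError("matrix size must be a multiple of the tile size")
    groups = n // tile
    return {gid: block_sum_kernel(data, gid, tile) for gid in ndrange((groups, groups))}
```

For a 4x4 matrix holding 0..15 and a tile size of 2, `launch` creates four work-items, one per 2x2 block, and their partial sums add up to the sum of the whole matrix.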

6 WHAT IS THERMAL DESIGN POWER?
• Thermal design power (TDP):
  – Upper bound for the sustainable power draw
  – Determines the cooling solution and package limits
  – Usually set by determining the worst-case execution profile
• Performance depends on effective utilization of thermal headroom
[Figure: instructions/cycle over time]

7 STATE-OF-THE-ART: BI-DIRECTIONAL APPLICATION POWER MANAGEMENT (BAPM)
• The chip is divided into BAPM-controlled thermal entities (TEs): CU0 TE, CU1 TE, GPU TE
• Power management algorithm:
  1. Calculate a digital estimate of power consumption
  2. Convert power to temperature using an RC network model for heat transfer
  3. Assign new power budgets to TEs based on temperature headroom
  4. TEs locally control (boost) their own DVFS states
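The four BAPM steps can be sketched as a small control loop. This is a hypothetical illustration, not AMD's firmware: the three TEs, the lumped RC network with one lateral resistance between each pair of TEs, the activity-scaled power estimate, the steady-state budget rule, and every constant are assumptions chosen only to make the steps concrete.

```python
# Hypothetical sketch of the four BAPM steps; not AMD's firmware.
# Assumed (illustrative only): three TEs, a lumped RC network with one lateral
# resistance between every pair of TEs, an activity-scaled power estimate,
# and a steady-state budget rule. All constants are made up.
from dataclasses import dataclass

T_AMB = 45.0   # package/ambient temperature, degC (assumed)
T_MAX = 100.0  # maximum die temperature, degC (assumed)


@dataclass
class TE:
    name: str
    dvfs_power: list   # W drawn in each DVFS state, fastest state first
    r_vert: float      # thermal resistance to ambient, degC/W
    cap: float         # thermal capacitance, J/degC
    temp: float = T_AMB
    state: int = 0     # index into dvfs_power


def estimate_power(te, activity):
    # Step 1: digital power estimate (here: activity factor x state power).
    return activity * te.dvfs_power[te.state]


def step_thermal(tes, powers, r_lat, dt):
    # Step 2: forward-Euler step of the RC network. Heat leaves each TE to
    # ambient through r_vert and flows to every other TE through r_lat;
    # the lateral flow is what produces thermal coupling.
    flows = []
    for i, te in enumerate(tes):
        q = powers[i] - (te.temp - T_AMB) / te.r_vert
        q -= sum((te.temp - o.temp) / r_lat for j, o in enumerate(tes) if j != i)
        flows.append(q)
    for te, q in zip(tes, flows):
        te.temp += dt * q / te.cap


def assign_budget(te, power):
    # Step 3: current power plus the extra power that would raise this TE's
    # steady-state temperature to T_MAX, ignoring lateral coupling (assumption).
    return power + (T_MAX - te.temp) / te.r_vert


def local_boost(te, budget, activity):
    # Step 4: each TE picks its fastest DVFS state that fits its budget.
    for idx, p in enumerate(te.dvfs_power):
        if activity * p <= budget:
            te.state = idx
            return
    te.state = len(te.dvfs_power) - 1  # fall back to the slowest state


def bapm_tick(tes, activities, r_lat=4.0, dt=0.001, substeps=100):
    powers = [estimate_power(te, a) for te, a in zip(tes, activities)]
    for _ in range(substeps):
        step_thermal(tes, powers, r_lat, dt)
    budgets = [assign_budget(te, p) for te, p in zip(tes, powers)]
    for te, b, a in zip(tes, budgets, activities):
        local_boost(te, b, a)
    return budgets
```

Running `bapm_tick` repeatedly with both CUs busy and the GPU idle (activities `[1.0, 1.0, 0.0]`) heats the idle GPU TE above `T_AMB` purely through the lateral resistances, which is the thermal-coupling effect the following slides measure on real hardware.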

8 CURRENT BOOST ALGORITHMS: POWER VS. THERMAL MANAGEMENT
• Convert thermal headroom to higher performance through boost
[Figure: APU die temperature and APU performance over time; the thermal headroom below the max die temperature is consumed by HW boost states above the SW-visible states. CPU DVFS states: HW-only (boost) Pb0, Pb1; SW-visible P0, P1, …, Pmin. GPU DVFS states (HW only): High, Medium, Low.]

9 KEY TAKEAWAYS
• Power and thermals are shared resources in a heterogeneous processor → thermal coupling
• Overall application performance is a function of both the CPU and the GPU → performance coupling
• State of the practice: managing to thermal limits by locally boosting when thermal headroom is available → utilize all of the headroom!

10 THERMAL COUPLING

11 THERMAL SIGNATURES: CPU & GPU
• High-power GPU benchmark: sustained power 19.7 W
• High-power CPU benchmark, idle GPU: sustained power 18.8 W
• Higher thermal density of the CPUs → steeper thermal gradients
  – Faster consumption of thermal headroom on the CPU
[Figure: steady-state thermal fields produced by BAPM on a 19 W AMD Trinity APU]

12 THERMAL TIME CONSTANT
• Significant rise in the temperature of the idle component due to thermal coupling and heat pollution from the active components on the same die
  – Idle GPU temperature rose by ~20 °C
• The CPU consumes thermal headroom more rapidly (4× faster)
• The GPU can sustain higher power boosts for longer
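One way to see why a 4× faster consumption of headroom matters is a single-node (first-order) RC model, in which temperature under constant power approaches its steady state as T(t) = T0 + P·R·(1 − e^(−t/τ)). The sketch below assumes that model and represents "4× faster" as a 4× smaller time constant τ for the CPU; both the model and all numbers are illustrative assumptions, not measurements from the slides.

```python
import math


def time_to_limit(t0, t_limit, power, r_th, tau):
    """Seconds for a single-node RC model (r_th in degC/W, tau = R*C in s),
    starting at t0 degC under constant `power` W, to reach t_limit degC.
    Returns math.inf if the steady-state temperature never exceeds the limit."""
    rise_ss = power * r_th          # steady-state temperature rise
    needed = t_limit - t0
    if needed <= 0:
        return 0.0
    if needed >= rise_ss:
        return math.inf
    # Solve t0 + rise_ss * (1 - exp(-t / tau)) = t_limit for t.
    return -tau * math.log(1.0 - needed / rise_ss)


# Hypothetical CPU and GPU nodes: same R and power, CPU time constant 4x smaller.
t_cpu = time_to_limit(45.0, 95.0, 20.0, 4.0, tau=1.0)
t_gpu = time_to_limit(45.0, 95.0, 20.0, 4.0, tau=4.0)
```

In this model the time to reach the limit scales linearly with τ, so the hypothetical CPU node hits the limit in a quarter of the GPU node's time, while the GPU can hold the same boost four times as long.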

13 THERMAL COUPLING: THERMAL HEADROOM AVAILABILITY

14 THERMAL COUPLING: BOOST FOR CONSUMPTION OF THERMAL HEADROOM
• 6 °C rise in GPU temperature once the CPU power limit was removed and both CUs were allowed to boost

15 THERMAL COUPLING: THERMAL THROTTLING
• Minimize the detrimental effects of thermal coupling by capping the maximum CPU P-state → P-state limiting

16 RESIDENCY IN DIFFERENT POWER STATES
[Figure panels: BAPM; P2, capping the max CPU DVFS state at P2; P4, capping the max CPU DVFS state at P4]
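P-state limiting, and the residency view used to evaluate it, can be sketched as a clamp on whatever state the boost logic requests plus a histogram of the states actually granted. The state ladder below (two boost states followed by P0 through P4, fastest first) and the request trace are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical CPU DVFS ladder, fastest first: HW-only boost states, then
# SW-visible states. The exact names and count are assumptions.
PSTATES = ["Pb0", "Pb1", "P0", "P1", "P2", "P3", "P4"]


def cap_pstate(requested, limit):
    """Grant the requested state unless it is faster than the cap `limit`."""
    return PSTATES[max(PSTATES.index(requested), PSTATES.index(limit))]


def residency(samples):
    """Fraction of samples spent in each state (states never seen are omitted)."""
    counts = Counter(samples)
    return {s: counts[s] / len(samples) for s in PSTATES if counts[s]}


requests = ["Pb0", "Pb0", "P1", "P3"]          # hypothetical boost requests
granted = [cap_pstate(r, "P2") for r in requests]
```

With the cap at P2, every request faster than P2 is clamped to P2 while slower requests pass through unchanged, so residency shifts from the boost states into the capped state.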

17 KEY TAKEAWAYS
• Thermal signatures differ between the CPU and GPU
• Heterogeneity in physical properties
• High thermal density leads to faster consumption of thermal headroom in the CPU cores
• Significant thermal coupling from active to idle components
• Near the thermal limit, boosting based on available thermal headroom introduces inefficiencies
  – Reduce the CPU P-state limit

18 PERFORMANCE COUPLING

19 CPU-GPU PERFORMANCE COUPLING
• The CPU should be just fast enough to keep the GPU fully utilized
• The CPU P-state should be high enough to do so
[Figure: user application on top of the OpenCL™ software stack and operating system, running on the APU hardware; host tasks on the CPU feed GPU tasks, each OpenCL kernel being an N-dimensional range of threads, each operating over a data partition.]

20–22 MANAGING THERMALS FOR PERFORMANCE-COUPLED APPLICATIONS

23 P-STATE SENSITIVITY

24 DETERMINING CRITICAL CPU P-STATE
• Find the inflection point in performance as a function of CPU P-state → critical P-state
• The critical P-state is determined by interference between the CPU and GPU in the memory system
[Figure: performance vs. CPU P-state, marking the critical CPU P-state limit]
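Finding the inflection point can be approximated by scanning performance from the slowest to the fastest CPU P-state and stopping where one more step stops paying off. The slides do not give the exact criterion; the relative-gain threshold below is a swapped-in knee heuristic, and the measurements are hypothetical.

```python
def critical_pstate(perf_by_state, min_gain=0.02):
    """perf_by_state: (state, performance) pairs ordered from the slowest to the
    fastest CPU P-state. Returns the first state beyond which stepping up one
    more state improves performance by less than `min_gain` (relative), i.e.
    where the curve flattens or turns down."""
    for (state, perf), (_, next_perf) in zip(perf_by_state, perf_by_state[1:]):
        if (next_perf - perf) / perf < min_gain:
            return state
    return perf_by_state[-1][0]


# Hypothetical normalized performance of one application at each CPU P-state.
measured = [("P4", 0.70), ("P3", 0.85), ("P2", 0.97), ("P1", 0.98), ("P0", 0.96)]
```

Here the gain from P2 to P1 is about 1%, below the assumed 2% threshold, so P2 is reported; the drop from P1 to P0 mimics the case where a faster CPU hurts overall performance through thermal coupling and memory interference.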

25 KEY TAKEAWAYS
• Performance coupling: CPU-GPU performance dependency
• Balance the detrimental effects of thermal coupling against the needs of performance coupling
• The CPU critical P-state limit is determined by both performance coupling and thermal coupling
• GPU memory bandwidth gradients as a function of CPU frequency, together with CPU IPC, serve as a measure of performance coupling
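The last takeaway can be illustrated by fitting a least-squares slope of GPU memory bandwidth against CPU frequency and combining it with CPU IPC. This is a minimal sketch under assumptions: the sampling scheme, units, thresholds, and the decision rule in `looks_performance_coupled` are hypothetical and are not taken from the slides.

```python
def bandwidth_gradient(samples):
    """Least-squares slope of GPU memory bandwidth (GB/s) versus CPU frequency
    (MHz), from (cpu_mhz, gpu_gbps) samples at two or more CPU frequencies."""
    n = len(samples)
    mean_f = sum(f for f, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    num = sum((f - mean_f) * (b - mean_b) for f, b in samples)
    den = sum((f - mean_f) ** 2 for f, _ in samples)
    if den == 0:
        raise ValueError("need samples at two or more distinct CPU frequencies")
    return num / den


def looks_performance_coupled(samples, cpu_ipc, grad_threshold, ipc_threshold):
    # Hypothetical decision rule: GPU bandwidth climbs with CPU frequency
    # (the GPU is being fed by the CPU) while the CPU is actively executing.
    return bandwidth_gradient(samples) > grad_threshold and cpu_ipc > ipc_threshold
```

A positive gradient means a faster CPU lets the GPU pull more data, a sign that the GPU is waiting on the CPU; a flat gradient suggests extra CPU frequency would only consume thermal headroom.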

26 COOPERATIVE BOOSTING ALGORITHM

27 COOPERATIVE BOOSTING (CB)
• Overlaid on top of BAPM; invoked periodically when thermal coupling is detrimental, i.e., when the thermal limit is approached
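A CB-style overlay can be sketched as a periodic policy that leaves BAPM in charge of DVFS and only moves a cap on the fastest CPU P-state BAPM may grant. Everything here is an assumption for illustration: the state ladder, the guard band that defines "approaching the thermal limit", the one-step adjustments, and the `cpu_needed` flag (which could come from a coupling measure such as the bandwidth gradient on slide 25).

```python
# Hypothetical sketch of a CB-style overlay (not the authors' implementation).
# BAPM still picks DVFS states; this periodic policy only moves a cap on the
# fastest CPU P-state it may grant. Names, ordering, and thresholds are assumed.
PSTATES = ["Pb0", "Pb1", "P0", "P1", "P2", "P3", "P4"]  # fastest first


def cb_update(cap_idx, die_temp, t_limit, guard_band, cpu_needed):
    """One periodic invocation; returns the new cap index into PSTATES.

    - Away from the thermal limit, coupling is harmless: remove the cap.
    - Near the limit, relax the cap one step (allow a faster state) only if
      the CPU is "needy", e.g. the GPU is starved by a slow CPU; otherwise
      tighten it one step so a "greedy" CPU stops consuming headroom the
      GPU could use."""
    if die_temp < t_limit - guard_band:
        return 0
    if cpu_needed:
        return max(cap_idx - 1, 0)
    return min(cap_idx + 1, len(PSTATES) - 1)
```

Moving the cap one step per invocation keeps the policy gradual, so it settles near the state where the CPU is just fast enough to keep the GPU busy instead of oscillating between the extremes.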

28 EXPERIMENTAL SET-UP
• Trinity A8-4555M APU: 19 W TDP
• CPU: managed by HW or SW
  – P-state table with a voltage (V) and frequency (MHz) per state: HW-only (boost) Pb states and SW-visible P states
• GPU: managed by HW only
  – GPU-high: 423 MHz
  – GPU-med: 320 MHz
• Cooperative Boosting implemented as a system software policy overlaid on top of BAPM on real hardware

29 BENCHMARKS
Benchmark (Description) | Problem Size | Type
NDL (Needleman-Wunsch) | 4096x4096 data points, 1K iterations | Performance-coupled
HS (HotSpot) | 1024x1024 data points, 100K iterations | Performance-coupled
BF (BoxFilter SAT) | 1Kx1K input image, 6x6 filter, 10K iterations | Performance-coupled
FAH (Folding at Home) | Synthesis of a large protein: spectrin | Performance-coupled
BS (Binary Search) | 4096 inputs, 256 segments, 1M iterations | Performance-coupled
Viewdle (Haar facial recognition) | Image 1920x1080, 2K iterations | Performance-coupled
Lbm (CPU2006) | 4 threads, Ref input | CPU-centric
Gcc (CPU2006) | 4 threads, Ref input | CPU-centric

30 RESULTS

31 PERFORMANCE IMPROVEMENT WITH COOPERATIVE BOOSTING
• Static P-state limiting requires profiling and a priori information about the workload
• CB yields an average performance gain of 15% for performance-coupled applications

32 POWER SAVINGS
• Average power savings of 10% across performance-coupled applications
• 5 °C reduction in peak temperature for BS → large percentage of leakage power savings

33 ENERGY*DELAY^2
• Average energy-delay^2 savings of 33% across performance-coupled applications
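For reference, energy-delay^2 can be written in terms of average power P and execution time (delay) D, using E = P·D. The "base" subscript below denotes whatever baseline a comparison uses and is a labeling assumption. Because each application's ratio is computed separately and the metric is cubic in D, the averaged power savings and performance gains reported on slides 31 and 32 need not combine arithmetically into the averaged ED^2 savings.

```latex
\mathrm{ED}^2 = E\,D^2 = (P\,D)\,D^2 = P\,D^3,
\qquad
\frac{\mathrm{ED}^2_{\mathrm{CB}}}{\mathrm{ED}^2_{\mathrm{base}}}
  = \frac{P_{\mathrm{CB}}}{P_{\mathrm{base}}}
    \left(\frac{D_{\mathrm{CB}}}{D_{\mathrm{base}}}\right)^{3},
\qquad
\text{savings} = 1 - \frac{\mathrm{ED}^2_{\mathrm{CB}}}{\mathrm{ED}^2_{\mathrm{base}}}
```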

34 CONCLUSIONS
• Demonstrated the effects of thermal and performance coupling on performance
  – Applications with a high GPU compute-to-load ratio are more susceptible to the detrimental effects of thermal coupling
  – Emerging balanced workloads with split CPU-GPU computation are tightly performance-coupled
• Proposed the Cooperative Boosting (CB) technique to determine the critical CPU P-state at which the effects of thermal coupling are balanced with the needs of performance coupling
  – Shifts power to the CPU only when needed
• Demonstrated the effectiveness of CB on real hardware as a well-rounded power and thermal management scheme

35 DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.

36 BACKUP

37 VIEWDLE PERFORMANCE ANALYSIS

38 BINARY SEARCH TEMPERATURE
