Power Management for Hard Disks and Main Memory. Presented by Matthias Eiblmaier, 11/06/2008.

Presentation transcript:

Slide 1: Power Management for Hard Disks and Main Memory

Slide 2: Motivation
Power consumption is a key factor in achieving environmental and financial goals. There are several ways to save power in a computer:
- throttling CPU speed
- putting idle RAM banks and ranks into a low-power mode
- throttling disk speed

Slide 3: Outline
Several approaches have been proposed to save energy through efficient peripheral power management. The two papers discussed today:
A. Performance Directed Energy Management for Main Memory and Disks (Xiaodong Li, Zhenmin Li, Francis David, Pin Zhou, Yuanyuan Zhou, Sarita Adve, and Sanjeev Kumar), Department of Computer Science, University of Illinois; Proceedings of the Eleventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'04), October 2004
B. A Comprehensive Approach to DRAM Power Management (Ibrahim Hur and Calvin Lin), Department of Computer Sciences, The University of Texas; 14th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2008), Salt Lake City, Utah, February 2008

Slide 4: Outline
A. Performance Directed Energy Management for Main Memory and Disks
1. Introduction and background
2. Performance guarantees
3. Control algorithms
4. Disk energy management
5. Experiments
6. Conclusion
7. Critiques

Slide 5: 1. Introduction and Background
- Power can be saved on a storage device by putting it into a low-power mode.
- Low-power modes can degrade performance.
- Current (threshold-based) algorithms:
  - monitor usage (response time) and move the device into a low-power mode when its idle time exceeds certain thresholds
  - need painstaking, application-dependent manual tuning of the thresholds
  - have no performance guarantee
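A threshold-based policy of this kind can be sketched in a few lines (an illustration only; the threshold value and the mode names are made up, not taken from the papers):

```python
# Minimal sketch of a fixed-threshold power-down policy. The 10-cycle
# threshold and the mode names are invented for this example.
IDLE_THRESHOLD = 10  # idle time after which the device is moved to a low-power mode

def next_mode(current_mode, idle_time):
    """Return the power mode after the device has been idle for `idle_time`."""
    if current_mode == "active" and idle_time >= IDLE_THRESHOLD:
        return "nap"  # threshold exceeded: enter a low-power mode
    return current_mode
```

The tuning problem the slide points out is exactly the choice of IDLE_THRESHOLD, which here is fixed by hand.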

Slide 6: 1. Introduction and Background
This paper contributes:
1. a technique to guarantee performance
2. a self-tuning, threshold-based control algorithm (called PD)
3. a simpler, optimization-based, threshold-free control algorithm (called PS)
RDRAM memory power modes:
- Each chip can be activated independently.
- There are four power modes: active, standby, nap, and powerdown.
- A chip must be in active mode to serve a read/write request.
Previous control algorithms:
- Static: put the device in a fixed power mode.
- Dynamic: change the power mode after the device has been idle for a specific amount of time (the threshold).

Slide 7: 2. Performance Guarantee
- Assume the best performance is achieved without energy management.
- An acceptable slowdown is given to the control algorithm.
- The slowdown is the percentage increase in execution time.
- To estimate the slowdown, the following terms are used:
  - t = execution time using the underlying energy management up to some point P in the program
  - T_base(t) = execution time without any energy management up to the same point in the program
  - Delay(t) = absolute increase in execution time due to energy management = t - T_base(t)
  - actual percentage slowdown = Delay(t) / T_base(t) * 100
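These definitions translate directly into code (the numbers in the usage below are invented for illustration):

```python
def percentage_slowdown(t, t_base):
    """Actual percentage slowdown as defined on the slide:
    Delay(t) = t - T_base(t); slowdown = Delay(t) / T_base(t) * 100."""
    delay = t - t_base
    return delay / t_base * 100.0
```

For example, a run that takes 110 s under energy management against a 100 s baseline has a 10% slowdown.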

Slide 8: 2. Performance Guarantee
The performance guarantee is subject to Slack(t), the amount of execution time that can be lost without violating the timing constraint. Epoch-based algorithm:
- The application's execution time can be predicted.
- Estimates the available slack for the entire epoch at the start of the epoch.
- Checks the slack after each access.
- If the slack is not sufficient, the algorithm forces all devices into active mode.

Slide 9: 2. Performance Guarantee
Available slack for the next epoch (reconstructed from the slide's garbled formula), where t_epoch is the predicted execution time of the next epoch without power management:
AvailableSlack = Slowdown/100 * (t + t_epoch) - Delay(t)
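Assuming the reconstruction above, the slack computation is a one-liner (symbol names mirror the slide; the sample values are invented):

```python
def available_slack(slowdown_pct, t, t_epoch, delay_t):
    """Available slack for the next epoch: the slack budget earned over the
    elapsed time t plus the predicted epoch time t_epoch, minus the delay
    already incurred (reconstructed formula, not a verbatim quote)."""
    return slowdown_pct / 100.0 * (t + t_epoch) - delay_t
```

With a 10% slowdown budget, 100 cycles elapsed, a 20-cycle predicted epoch, and 5 cycles of delay incurred, the next epoch may spend 7 cycles of slack.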

Slide 10: 3. Control Algorithms
Two kinds of algorithms are used together with the performance guarantee:
- Performance-directed static algorithm (PS): fixes the power mode of each memory chip for the entire duration of an epoch.
- Performance-directed dynamic algorithm (PD): transitions a chip to a low-power mode after some idle time and re-tunes the thresholds based on the available slack and workload characteristics.

Slide 11: 3. Control Algorithm (PS)
The goal is to choose for every chip a configuration that maximizes the total energy savings subject to the total available slack (a knapsack formulation, with E(Ci) the predicted energy saving and D(Ci) the predicted delay of device i under configuration Ci): maximize the sum of E(Ci) over all devices, subject to the sum of D(Ci) not exceeding AvailableSlack.
The PS algorithm, called at the beginning of every epoch:
1. Predict AvailableSlack for the next epoch.
2. Predict E(Ci) and D(Ci) for each device i.
3. Solve the knapsack problem.
4. Set the power mode of each device for the next epoch.
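For a handful of chips and modes, the knapsack can even be brute-forced. The sketch below uses hypothetical per-mode saving/delay numbers in place of the predicted E(Ci) and D(Ci):

```python
from itertools import product

# Hypothetical (energy saving, delay) per power mode; in the paper these
# come from the per-epoch predictions E(Ci) and D(Ci).
MODES = {"active": (0.0, 0.0), "standby": (2.0, 1.0), "nap": (5.0, 4.0)}

def choose_modes(num_chips, slack):
    """Brute-force the small knapsack: pick one mode per chip so the summed
    energy saving is maximal while the summed delay stays within the slack."""
    best_assign, best_saving = None, -1.0
    for assign in product(MODES, repeat=num_chips):
        saving = sum(MODES[m][0] for m in assign)
        delay = sum(MODES[m][1] for m in assign)
        if delay <= slack and saving > best_saving:
            best_assign, best_saving = assign, saving
    return best_assign, best_saving
```

With 2 chips and 5 units of slack, one chip goes to standby and one to nap; putting both in nap would cost 8 units of delay and overrun the slack.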

Slide 12: 3. Control Algorithm (PS)
- Obtains the available slack from the performance-guarantee algorithm.
- The algorithm needs to predict the number and distribution of accesses in the next epoch.
- Prediction:
  - number of accesses: the same as in the last epoch
  - distribution of accesses: uniform in time
- The algorithm reclaims any unused slack from the last epoch.

Slide 13: 3. Control Algorithm (PD)
PD automatically re-tunes its thresholds at the end of each epoch, based on the available slack and workload characteristics:
1. Predict AvailableSlack for the next epoch.
2. Predict the number of accesses for the next epoch.
3. Adjust the functions Th_k(S) (1 ≤ k ≤ M-1) using the access count measured in the last epoch.
4. for k = 1, ..., M-1 do
5.   use the Th_k(S) functions to determine the values for Th_k
6. end for
7. Set the thresholds Th_1, ..., Th_M for all chips.
(Block diagram: a feedback controller compares the slack to its target; if the slack gets too low, it commands higher thresholds.)

Slide 14: 3. Control Algorithm (PD)
- When i > k: use the break-even time as the threshold, to keep the device active during short idle times.
- When 0 ≤ i ≤ k: put the device into mode k only once it has already been idle for a long period (threshold: C^(k-i) * t_k). The lower the value of i, the higher the threshold.
- The constant C is used to dynamically adjust the thresholds:
  - slack not used up: C_next = 0.95 * C_current
  - slack used up: C_next = 2 * C_current
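The C update rule is simple enough to state directly (a transcription of the two cases above):

```python
def adjust_c(c_current, slack_used_up):
    """Multiplicative adjustment of the threshold constant C: relax the
    thresholds slowly while slack remains, back off quickly (doubling C,
    hence raising the thresholds) when the slack budget runs out."""
    return 2.0 * c_current if slack_used_up else 0.95 * c_current
```

The asymmetry (slow decay, fast growth) makes the controller conservative: one exhausted epoch raises the thresholds more than many relaxed epochs lower them.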

Slide 15: 4. Disk Energy Management
Model: the DRPM multi-speed disk model.
- Can service requests at a low rotational speed.
- No transition overhead.
Performance delay comes from the period of the speed change and from servicing requests at low speed.
Performance guarantee:
- Static algorithm: the same as for memory.
- Dynamic algorithm: adjusts UT and LT (the upper and lower thresholds) based on the predicted access count and the available slack.

Slide 16: 5. Experiments
The experimental verification is done on a simulator (SimpleScalar) with an enhanced RDRAM memory model. Execution times with the original algorithms:

Slide 17: 5. Experiments
Results for memory:

Slide 18: 5. Experiments
Experiments and results for the disk:
- Simulator: DiskSim
- Disk: IBM Ultrastar 36Z15
- Rotational speeds: 3K, 6K, 9K, 12K
- Access distributions: exponential, Pareto, Cello'96

Slide 19: 6. Conclusion
- Existing power-management algorithms degrade execution time.
- The paper proposes self-tuning energy-management algorithms.

Slide 20: 7. Critiques
- PD/PS cannot guarantee real-time behavior.
- The performance-guarantee algorithm is not tested for stability.
- PD causes overhead.
- The loop variable Delay, and hence the slack, is only estimated.
- The experimental verification lacks substantial benchmarks (e.g., real server workloads).
- It is not stated exactly where and how to implement the algorithm (chip, OS).

Slide 21: Outline
B. A Comprehensive Approach to DRAM Power Management
1. Queue-Aware Power-Down Mechanism
2. Power/Performance-Aware Scheduling
3. Adaptive Memory Throttling
4. Delay Estimator Model
5. Simulation and Results
6. Conclusions
7. Critiques

Slide 22: 1. Queue-Aware Power-Down Mechanism
(Diagram: the memory controller sits between the processors/caches and the DRAM, with read/write queues, a scheduler, and a memory queue.)
1. Read/write instructions are queued in the read/write queues.
2. The scheduler (AHB) decides which instruction is preferred.
3. The instructions are then transferred into the FIFO memory queue.

Slide 23: 1. Queue-Aware Power-Down Mechanism
A rank is powered down only when all three conditions hold:
1. The rank counter is zero, so the rank is idle.
2. The rank status bit is 0, so the rank is not yet in a low-power mode.
3. There is no command in the CAQ with the same rank number, which avoids powering down a rank when an access to it is imminent.
(Animation: commands such as C:1 - R:2 - B:1 move through the read/write queue; a rank's counter is set when a command for it enters and decremented as commands are processed, and rank 1 is powered down once its conditions are met.)
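The three power-down conditions combine into a single predicate. The sketch below uses plain dictionaries and a list as simplified stand-ins for the hardware counters, status bits, and CAQ:

```python
def can_power_down(rank, counters, status_bits, caq_ranks):
    """A rank may be powered down only if all three slide conditions hold:
    (1) its counter is zero (the rank is idle),
    (2) its status bit is 0 (not already in a low-power mode), and
    (3) no command in the CAQ targets the same rank."""
    return (counters[rank] == 0
            and status_bits[rank] == 0
            and rank not in caq_ranks)
```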

Slide 24: 2. Power/Performance-Aware Scheduling
1. An adaptive history-based scheduler uses the history of recently scheduled memory commands when selecting the next memory command.
2. A finite state machine (FSM) groups same-rank commands as close together as possible, so the total number of power-down/power-up operations is reduced.
3. This FSM is combined with a performance-driven FSM and a latency-driven FSM.

Slide 25: 3. Adaptive Memory Throttling
(Diagram: the memory controller is extended with a throttling mechanism and a delay estimator. The throttling mechanism decides whether to throttle at every cycle; the delay estimator determines how much to throttle every 1 million cycles; the model builder, a software tool active only during system design/install time, sets the parameters for the delay estimator from the power target.)

Slide 26: 3. Adaptive Memory Throttling
Stall all traffic from the memory controller to the DRAM for T cycles in every 10,000-cycle interval; the controller then runs normally for the rest of the interval. How is T, the throttling delay, calculated?
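The resulting duty cycle can be sketched as follows (the 10,000-cycle interval is from the slide; the function itself is illustrative):

```python
INTERVAL = 10_000  # cycles per throttling interval, as on the slide

def is_stalled(cycle, t_delay):
    """Traffic is stalled during the first T cycles of each 10,000-cycle
    interval and allowed to proceed for the remaining cycles."""
    return cycle % INTERVAL < t_delay
```

So T directly sets the fraction of memory bandwidth withheld: T = 1,000 stalls 10% of every interval.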

Slide 27: 3. Adaptive Memory Throttling
Model building (figure comparing the throttling delay T for two applications):
- Throttling degrades performance.
- Inaccurate throttling means either that power consumption goes over the budget or that performance is lost unnecessarily.

Slide 28: 4. Delay Estimation Model
- Calculates the throttling delay T using a linear model.
  - Input: the power threshold and information about the memory access behavior of the application.
  - Output: the throttling delay.
- Calculates the delay periodically (in epochs).
  - Assumes consecutive epochs have similar behavior.
  - The epoch length is long (1 million cycles), so the overhead is small.
- What are the features and the coefficients of the linear model?
  - Step 1: Perform experiments with various memory access behaviors.
  - Step 2: Determine models and model features (needs human interaction during system design time).
  - Step 3: Compute the model coefficients (solution of a linear system of equations).

Slide 29: 4. Delay Estimation Model
Model building is an offline process performed during system design/installation:
- Step 1: Perform experiments with various memory access behaviors.
- Step 2: Determine models and model features (needs human interaction during system design time).
- Step 3: Compute the model coefficients (solution of a linear system of equations).

Slide 30: 4. Delay Estimation Model
Model features:
- power threshold
- number of reads
- number of writes
- bank-conflict information
Possible models:
- T1: uses only the power threshold
- T2: uses power, reads, and writes
- T3: uses all features

Slide 31: 4. Delay Estimation Model
- Step 1: Set up a system of equations. The known values are measurement data; the unknowns are the model coefficients.
- Step 2: Solve the system.
(Figure: fit quality of the models, with R^2 = 0.191, 0.122, and 0.003.)
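The fitting step amounts to ordinary least squares. A sketch with invented measurement data (the real model is fitted to the paper's measured power thresholds, read counts, and write counts):

```python
import numpy as np

# Each measurement row: [1 (intercept), power threshold, reads, writes].
# All numbers are made up for illustration, not the paper's data.
X = np.array([[1.0, 50.0, 100.0, 40.0],
              [1.0, 60.0, 120.0, 45.0],
              [1.0, 55.0,  90.0, 50.0],
              [1.0, 70.0, 150.0, 55.0]])
y = np.array([230.0, 272.5, 225.0, 332.5])  # measured throttling delays T

# Least-squares solution of the (generally overdetermined) system X c = y.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_delay(power, reads, writes):
    """T2-style linear model: T = c0 + c1*power + c2*reads + c3*writes."""
    return coeffs @ np.array([1.0, power, reads, writes])
```

At run time only the dot product is evaluated each epoch; the expensive fitting happens once, offline, exactly as the model-building slide describes.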

Slide 32: 5. Simulation and Results
- Used the cycle-accurate IBM Power5+ simulator that the IBM design team uses.
  - Simulated performance and DRAM power.
  - 2.1 GHz, 533-DDR2.
- Evaluated single-thread and SMT configurations with Stream, NAS, SPEC CPU2006fp, and commercial benchmarks.
(Chip overview: memory controller, 2 cores on a chip, SMT capability, ~300 million transistors, 1.6% of chip area.)

Slide 33: 5. Simulation and Results
Energy-efficiency improvements from the power-down mechanism and power-aware scheduler:
- Stream: 18.1%
- SPECfp2006: 46.1%

Slide 34: 5. Simulation and Results
(Results figure.)

Slide 35: 6. Conclusion
- Introduced three techniques for DRAM power management: Queue-Aware Power-Down, Power-Aware Scheduler, and Adaptive Memory Throttling.
- Evaluated on a highly tuned system, the IBM Power5+; the techniques are simple, accurate, and low cost.
- Results in the paper: energy-efficiency improvements from the Power-Down mechanism and Power-Aware Scheduler of 18.1% on Stream and 46.1% on SPECfp2006.

Slide 36: 7. Critiques
- The overhead is not computed or estimated.
- A relatively complicated architecture is needed.
- Throttling and queuing result in delays, so there is no real-time guarantee.
- There is a dependence on the prediction model.

Slide 37: Overall Conclusion and Comparison
PS/PD + performance guarantee (paper A) versus queue-aware mechanism + power-aware scheduling + throttling (paper B):
- Objective: minimize power and guarantee a fixed worst-case execution time (A) versus minimize power and maximize performance (B)
- Realization: experimental (A) versus based on the AHB scheduler (B)
- Real-time: no
- Implementation: memory controller or OS kernel, not specified (A) versus memory controller (B)
- Methodology: simulation with SimpleScalar (A) versus simulation with IBM Power5+ (B)
- Controller: ad hoc (A) versus open loop / open loop / ad hoc (B)

Slide 38: Thank You

Slide 39: 3. Control Algorithm (backup)
To enforce the performance guarantee, the algorithm needs to:
- apportion a part of the available slack to each chip
- keep track of the actual delay each chip incurs
- compare the actual and predicted delay for every epoch

