Power Management for Hard Disks and Main Memory 11/06/2008 Presented by Matthias Eiblmaier

Motivation
Power consumption is a key factor in achieving environmental and financial goals. There are several ways to save power in a computer:
- Throttle CPU speed
- Put idle RAM banks and ranks into a low-power mode
- Throttle disk speed

Outline
Several approaches have been proposed to save energy through efficient peripheral power management. The two papers discussed today:
A. Performance Directed Energy Management for Main Memory and Disks
(Xiaodong Li, Zhenmin Li, Francis David, Pin Zhou, Yuanyuan Zhou, Sarita Adve and Sanjeev Kumar)
Department of Computer Science, University of Illinois
Proceedings of the Eleventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'04), October 2004
B. A Comprehensive Approach to DRAM Power Management
(Ibrahim Hur and Calvin Lin)
Department of Computer Sciences, The University of Texas at Austin
14th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2008), Salt Lake City, Utah, February 2008

Outline
A. Performance Directed Energy Management for Main Memory and Disks
1. Introduction and background
2. Performance Guarantees
3. Control Algorithms
4. Disk energy management
5. Experiments
6. Conclusion
7. Critiques

1. Introduction and background
A storage device can save power by being put into a low-power mode, but low-power modes can degrade performance.
Current (threshold-based) algorithms:
- monitor usage (response time) and move the device into a low-power mode once certain thresholds are exceeded
- need painstaking, application-dependent manual tuning of thresholds
- have no performance guarantee

1. Introduction and background
This paper contributes:
1. A technique to guarantee performance
2. A self-tuning, threshold-based control algorithm (called PD)
3. A simpler, optimization-based, threshold-free control algorithm (called PS)
RDRAM memory power modes:
- Each chip can be activated independently
- There are 4 power modes: active, standby, nap and power down
- A chip needs to be in active mode to serve a read/write request
Previous control algorithms:
- Static: put the device in a fixed power mode
- Dynamic: change the power mode after the device has been idle for a specific amount of time (threshold)

2. Performance Guarantee
Assume the best performance is achieved without energy management. An acceptable slowdown is supplied to the control algorithm; slowdown is the percentage increase of the execution time. To estimate the slowdown, the following terms are used:
t = execution time using the underlying energy management until some point P in the program
T_base(t) = execution time without any energy management until the same point in the program
Delay(t) = absolute increase in execution time due to energy management = t − T_base(t)
Actual percentage slowdown = Delay(t) / T_base(t) × 100

2. Performance Guarantee
The performance guarantee is subject to Slack(t), the amount of additional execution time that can still be spent without violating the timing constraint.
Epoch-based algorithm:
- The application's execution time can be predicted.
- Estimates the available slack for the entire epoch at the start of the epoch.
- Checks the slack after each access; if the slack is not sufficient, the algorithm forces all devices into active mode.

2. Performance Guarantee
Available slack for the next epoch:
Slack(t) = Slowdown/100 × t − Delay(t)
AvailableSlack = Slack(t) + Slowdown/100 × t_epoch
where t_epoch is the predicted execution time of the next epoch without power management.
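The slack bookkeeping above can be sketched in a few lines. This is an illustrative helper, not code from the paper; the function name and the numbers in the example are made up.

```python
def available_slack(slowdown_pct, t, delay_t, t_epoch):
    """Estimate the slack available for the next epoch.

    slowdown_pct : acceptable slowdown in percent (supplied to the controller)
    t            : execution time so far with energy management
    delay_t      : absolute delay incurred so far, Delay(t) = t - T_base(t)
    t_epoch      : predicted execution time of the next epoch
                   without power management
    """
    # Slack accrued so far, plus the budget the next epoch adds.
    slack_so_far = slowdown_pct / 100.0 * t - delay_t
    return slack_so_far + slowdown_pct / 100.0 * t_epoch

# A 5% slowdown budget, 1000 ms elapsed, 30 ms delay so far,
# next epoch predicted at 200 ms: 50 - 30 + 10 = 30 ms of slack.
print(available_slack(5, 1000, 30, 200))
```

If the delay incurred so far ever exceeds the accumulated budget, the result goes negative and the guarantee logic forces all devices into active mode.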

3. Control Algorithm
Two kinds of algorithms are used with the performance guarantee:
- Performance-directed static algorithm (PS): assigns a fixed power mode to each memory chip for the entire duration of an epoch.
- Performance-directed dynamic algorithm (PD): transfers a chip to a low-power mode after some idle time and re-tunes the thresholds based on available slack and workload characteristics.

3. Control Algorithm (PS)
The goal is to choose for every chip a configuration that maximizes the total energy savings subject to the total available slack:
maximize Σ_i E(C_i)
subject to Σ_i D(C_i) ≤ AvailableSlack
where E(C_i) is the predicted energy saving and D(C_i) the predicted delay of running device i in configuration C_i.
The PS algorithm is called at the beginning of every epoch:
1. Predict AvailableSlack for the next epoch.
2. Predict E(C_i) and D(C_i) for each device i.
3. Solve the knapsack problem.
4. Set the power mode for each device for the next epoch.
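The per-chip mode choice is a multiple-choice knapsack. A brute-force sketch (feasible only for tiny instances; the paper solves it more cheaply) with made-up E/D values shows the structure:

```python
from itertools import product

def choose_configs(devices, available_slack):
    """Pick one configuration per device to maximize total energy
    savings E(Ci), subject to the total delay D(Ci) staying within
    the available slack. Returns the best tuple of option indices."""
    best, best_choice = -1.0, None
    for choice in product(*[range(len(opts)) for opts in devices]):
        energy = sum(devices[i][c][0] for i, c in enumerate(choice))
        delay = sum(devices[i][c][1] for i, c in enumerate(choice))
        if delay <= available_slack and energy > best:
            best, best_choice = energy, choice
    return best_choice

# Two chips, three modes each, as (energy saved, delay) pairs;
# mode 0 = active, 1 = standby, 2 = nap. Numbers are invented.
chips = [[(0, 0), (4, 2), (9, 6)],
         [(0, 0), (3, 1), (7, 5)]]
print(choose_configs(chips, 7))  # (2, 1): nap chip 0, standby chip 1
```

With 7 units of slack, napping chip 0 (delay 6) and putting chip 1 in standby (delay 1) saves 9 + 3 = 12 units of energy, the best feasible combination.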

3. Control Algorithm (PS)
- Obtain the available slack from the performance-guarantee algorithm.
- The algorithm needs to predict the next epoch's number and distribution of accesses.
  Prediction: number of accesses: the same as in the last epoch; distribution of accesses: uniform in time.
- The algorithm reclaims any unused slack from the last epoch.

3. Control Algorithm (PD)
PD automatically re-tunes its thresholds at the end of each epoch, based on available slack and workload characteristics:
1. Predict AvailableSlack for the next epoch.
2. Predict the number of accesses for the next epoch.
3. Adjust the Th_k(S) functions (1 ≤ k ≤ M−1) using the access count measured in the last epoch.
4. for k = 1, ..., M−1 do: use the Th_k(S) functions to determine the values for Th_k.
5. end for
6. Set thresholds Th_1, ..., Th_M for all chips.
(Diagram: a feedback controller compares the actual slack with the target slack and manipulates the thresholds; if the slack is too low, higher thresholds are set.)

3. Control Algorithm (PD)
- When i > k: keep the device active during short idle times, using the break-even time as the threshold.
- When 0 ≤ i ≤ k: put the device into mode k only if it has already been idle for a long period (threshold: C^(k−i) × t_k); the lower the value of i, the higher the threshold.
The constant C is used to dynamically adjust the threshold:
- Slack not used up: C_next = 0.95 × C_current
- Slack used up: C_next = 2 × C_current
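The C adjustment rule can be sketched as follows. This is a simplification: it scales each base idle time t_k by a single multiplier C rather than by the mode-dependent factor C^(k−i), and the function and variable names are invented for illustration.

```python
def retune_thresholds(c, base_times, slack_exhausted):
    """Re-tune PD thresholds at an epoch boundary.

    c               : multiplicative constant controlling aggressiveness
    base_times      : per-mode base idle times t_k
    slack_exhausted : True if the last epoch used up its slack
    """
    # If slack ran out, double C so thresholds rise and the policy
    # gets more conservative; otherwise shrink C by 5% to save more.
    c = 2.0 * c if slack_exhausted else 0.95 * c
    return c, [c * t for t in base_times]

c = 1.0
c, thresholds = retune_thresholds(c, [10, 40, 160], slack_exhausted=True)
print(c, thresholds)  # C doubled; all thresholds scaled up with it
```

The asymmetric update (gentle 0.95 decay, aggressive 2x back-off) lets the controller probe for savings slowly but retreat quickly when the performance guarantee is at risk.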

4. Disk energy management
Model: the DRPM disk model, a multi-speed disk.
- Can service requests at a low rotational speed.
- No transition overhead.
- Performance delay: during the period of a speed change, and while servicing at low speed.
Performance guarantee:
- Static algorithm: the same as for memory.
- Dynamic algorithm: adjusts the upper and lower thresholds (UT and LT) based on the predicted access count and the available slack.

5. Experiments
The experimental verification is done on a simulator (SimpleScalar) with an enhanced RDRAM memory model.
Execution times with the original algorithms:

5. Experiments
Results for memory:

5. Experiments
Experiments and results for disk:
- Simulator: DiskSim
- Disk: IBM Ultrastar 36Z15
- Rotational speeds: 3K, 6K, 9K, 12K RPM
- Access distributions: exponential, Pareto, Cello'96

6. Conclusion
- Existing power-management algorithms degrade execution time.
- The paper proposes self-tuning energy-management algorithms with a performance guarantee.

7. Critiques
- PD/PS cannot guarantee real-time behavior.
- The performance-guarantee algorithm is not tested for stability.
- PD causes overhead.
- The loop variable Delay, and hence the slack, is only estimated.
- The experimental verification lacks substantial benchmarks (e.g. real server workloads).
- It is not exactly stated where and how to implement the algorithm (chip, OS).

Outline
B. A Comprehensive Approach to DRAM Power Management
1. Queue-Aware Power-Down Mechanism
2. Power/Performance-Aware Scheduling
3. Adaptive Memory Throttling
4. Delay Estimator Model
5. Simulation and Results
6. Conclusions
7. Critiques

1. Queue-Aware Power-Down Mechanism
(Diagram: the memory controller sits between processors/caches and DRAM and contains the read/write queues, the scheduler, and the memory queue.)
1. Read/write instructions are queued in the read/write queues.
2. The scheduler (adaptive history-based, AHB) decides which instruction is preferred.
3. Instructions are subsequently transferred into the FIFO memory queue.

1. Queue-Aware Power-Down Mechanism
A rank is powered down when:
1. its rank counter is zero -> the rank is idle, and
2. its rank status bit is 0 -> the rank is not yet in a low-power mode, and
3. there is no command in the CAQ with the same rank number -> avoids powering down a rank whose access is imminent.
(Example walk-through: each queued command of the form C:x – R:y – B:z resets the counter of rank y to 8 while the counters of the other ranks are decremented; once rank 1's counter reaches zero with no pending rank-1 command, rank 1 is powered down.)
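The three-way condition can be sketched as a predicate. This is an illustrative model, not the hardware logic; the function name and data layout are invented.

```python
def may_power_down(rank, counters, status_bits, caq):
    """Decide whether a rank can be put into a low-power mode.

    counters    : per-rank idle counters (0 means idle long enough)
    status_bits : per-rank bits, 1 if already in a low-power mode
    caq         : pending commands, each tagged with a rank number
    """
    return (counters[rank] == 0                            # rank is idle
            and status_bits[rank] == 0                     # not yet powered down
            and all(r != rank for r in caq))               # no imminent access

counters = {1: 0, 2: 5}
status   = {1: 0, 2: 0}
caq      = [2, 2]          # two queued commands targeting rank 2
print(may_power_down(1, counters, status, caq))  # True
print(may_power_down(2, counters, status, caq))  # False
```

The third check is what makes the mechanism "queue-aware": without it, a rank could be powered down one cycle before a queued command needs it, paying the wake-up penalty for nothing.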

2. Power/Performance-Aware Scheduling
1. An adaptive history-based scheduler uses the history of recently scheduled memory commands when selecting the next memory command.
2. A finite state machine (FSM) groups same-rank commands in the memory queue as close together as possible -> the total number of power-down/up operations is reduced.
3. This FSM is combined with a performance-driven FSM and a latency-driven FSM.

3. Adaptive Memory Throttling
(Diagram: the memory controller is extended with a throttling mechanism and a delay estimator between the read/write queues and DRAM.)
- Throttling mechanism: decides whether or not to throttle, at every cycle.
- Delay estimator: determines how much to throttle, every 1 million cycles, given a power target.
- Model builder (a software tool, active only during system design/install time): sets the parameters for the delay estimator.

3. Adaptive Memory Throttling
Stall all traffic from the memory controller to DRAM for T cycles in every 10,000-cycle interval (a stall phase of T cycles followed by an active phase).
How is T, the throttling delay, calculated?
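The effect of this duty-cycle scheme is easy to see in a toy calculation (an illustrative sketch; names and numbers are not from the paper):

```python
def active_cycles(total_cycles, t_stall, interval=10_000):
    """Count cycles in which DRAM traffic may flow when the controller
    stalls for t_stall cycles at the start of every `interval` cycles."""
    intervals = total_cycles // interval
    return intervals * (interval - t_stall)

# Stalling 2,000 of every 10,000 cycles leaves 80% of cycles active:
print(active_cycles(1_000_000, 2_000))  # 800000
```

T thus directly trades memory bandwidth (and hence performance) against power, which is why an accurate estimate of T matters.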

3. Adaptive Memory Throttling
Model building:
(Plot: power consumption versus throttling delay T for two applications; the T needed to meet a given power budget differs between applications.)
- Throttling degrades performance.
- Inaccurate throttling means either power consumption over the budget or unnecessary performance loss.

4. Delay Estimation Model
Calculates the throttling delay T using a linear model:
- Input: the power threshold and information about the application's memory access behavior.
- Output: the throttling delay.
Calculates the delay periodically (in epochs):
- Assumes consecutive epochs have similar behavior.
- The epoch length is long (1 million cycles), so the overhead is small.
What are the features and the coefficients of the linear model?

4. Delay Estimation Model
Model building is an offline process performed during system design/installation:
- Step 1: Perform experiments with various memory access behaviors.
- Step 2: Determine models and model features (needs human interaction during system design time).
- Step 3: Compute the model coefficients (solution of a linear system of equations).

4. Delay Estimation Model
Model features:
- Power threshold
- Number of reads
- Number of writes
- Bank-conflict information
Possible models:
- T1: uses only the power threshold
- T2: uses power, reads, writes
- T3: uses all features

4. Delay Estimation Model
- Step 1: Set up a system of equations; the known values are measurement data, the unknowns are the model coefficients.
- Step 2: Solve the system.
(Reported fit quality: R² = 0.191, R² = 0.122, R² = …)
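The coefficient computation can be sketched end to end. This is a toy version of model T2 (intercept, power threshold, reads, writes) with four invented calibration experiments, solved exactly by Gaussian elimination; the real calibration data and solver are not given in the slides.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# One calibration experiment per row: [1, power_threshold, reads, writes],
# paired with the throttling delay T measured for it (made-up numbers).
X = [[1, 50, 200, 100],
     [1, 60, 250, 120],
     [1, 40, 160,  80],
     [1, 55, 300, 150]]
T = [1200, 900, 1800, 1000]
c0, c1, c2, c3 = solve(X, T)

# Predict T for a new epoch from its power target and access counts:
predict = lambda p, r, w: c0 + c1 * p + c2 * r + c3 * w
print(round(predict(50, 200, 100)))  # reproduces the first experiment: 1200
```

With more experiments than coefficients, the same idea becomes a least-squares fit, which is where the R² figures on this slide come from.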

5. Simulation and Results
Used the cycle-accurate IBM Power5+ simulator that the IBM design team uses:
- Simulated performance and DRAM power
- 2.1 GHz, DDR2-533
Evaluated single-thread and SMT configurations with:
- Stream
- NAS
- SPEC CPU2006fp
- Commercial benchmarks
Power5+: 2 cores on a chip, SMT capability, ~300 million transistors; the memory controller occupies 1.6% of the chip area.

5. Simulation and Results
Energy-efficiency improvements from the power-down mechanism and the power-aware scheduler:
- Stream: 18.1%
- SPECfp2006: 46.1%


6. Conclusion
Introduced three techniques for DRAM power management:
- Queue-aware power-down
- Power-aware scheduler
- Adaptive memory throttling
Evaluated on a highly tuned system, the IBM Power5+:
- Simple and accurate
- Low cost
Results in the paper: energy-efficiency improvements from the power-down mechanism and power-aware scheduler of 18.1% (Stream) and 46.1% (SPECfp2006).

7. Critiques
- The overhead is not computed or estimated.
- Needs a relatively complicated architecture.
- Throttling and queuing result in delays -> no real-time guarantees.
- Dependence on the prediction model.

Overall Conclusion and Comparison

PS/PD + performance guarantee:
- Objective: minimize power + guarantee a fixed worst-case execution time
- Realization: experimental
- Real-time: no
- Implementation: memory controller or OS kernel (not specified)
- Methodology: simulation (SimpleScalar)
- Controller: ad hoc

Queue-aware mechanism + power-aware scheduling + throttling:
- Objective: minimize power, maximize performance
- Realization: based on the AHB scheduler
- Real-time: no
- Implementation: memory controller
- Methodology: simulation (IBM Power5+)
- Controller: open loop / open loop / ad hoc

Thank You

3. Control Algorithm
To enforce the performance guarantee, the algorithm needs to:
- Apportion a part of the available slack to each chip.
- Keep track of the actual delay each chip incurs.
- Compare the actual delay with the predicted delay for every epoch.