Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning. Gaurav Dhiman, Tajana Simunic Rosing, Department of Computer Science and Engineering.

Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning
Gaurav Dhiman, Tajana Simunic Rosing
Department of Computer Science and Engineering, University of California, San Diego
ISLPED 2007

Why Dynamic Voltage Frequency Scaling?
- Power consumption is a critical issue in system design today:
  - mobile systems face battery-life issues;
  - high-performance systems face heating issues.
- Dynamic Voltage Frequency Scaling (DVFS) dynamically scales the CPU supply voltage (together with the clock frequency) to provide "just enough" circuit speed to process the workload.
  - It is an effective system-level technique for reducing power consumption.
- Dynamic Power Management (DPM) is another popular system-level technique; however, the focus of this work is DVFS.

Previous Work  Based on task level knowledge: [Yao95],[Ishihara98],[Quan02]  Based on compiler/app. support: [Azevedo02],[Hsu02],[Chung02]  Based on micro-architecture level support: [Marculescu00],[Weissel02],[Choi04], [Choi05]

Workload Characterization and Voltage-Frequency Selection  No hard task deadlines in general purpose system.  Goal: Maximize energy savings while minimizing performance delay.  Key idea: CPU-intensive tasks don’t benefit from scaling Memory intensive tasks energy efficient at low v-f settings

Workload Characterization and Voltage-Frequency Selection (contd.)
- Three tasks were run with static scaling: burn_loop (CPU-intensive), mem (memory-intensive) and combo (a mix of the two).
- burn_loop: energy-efficient at all settings.
- mem: energy-efficient at the lowest v-f setting.
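
The key idea can be illustrated with a simple execution-time model. This is a hypothetical sketch, not the model used in this work: it assumes a task's run time splits into core cycles, whose duration shrinks as frequency rises, and memory-stall time, which is taken to be independent of core frequency. The `Task` type, the function names and the cycle counts are made up for illustration, and 208 MHz and 520 MHz serve only as example frequencies.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical two-part workload model (an assumption, not from the slides)."""
    cpu_cycles: float  # core cycles: time spent here scales as 1/frequency
    mem_time_s: float  # memory-stall time: assumed independent of core frequency


def exec_time_s(task: Task, freq_hz: float) -> float:
    """Execution time of the task at core frequency freq_hz."""
    return task.cpu_cycles / freq_hz + task.mem_time_s


def slowdown_pct(task: Task, f_low_hz: float, f_high_hz: float) -> float:
    """Performance delay (%) of running at f_low_hz instead of f_high_hz."""
    t_high = exec_time_s(task, f_high_hz)
    t_low = exec_time_s(task, f_low_hz)
    return 100.0 * (t_low - t_high) / t_high


if __name__ == "__main__":
    cpu_bound = Task(cpu_cycles=520e6, mem_time_s=0.0)  # behaves like burn_loop
    mem_bound = Task(cpu_cycles=52e6, mem_time_s=0.9)   # behaves like mem
    for name, task in (("cpu_bound", cpu_bound), ("mem_bound", mem_bound)):
        print(name, round(slowdown_pct(task, 208e6, 520e6), 1))
```

With these made-up numbers, lowering the frequency from 520 MHz to 208 MHz delays the CPU-bound task by 150% but the memory-bound task by only 15%, which is why a memory-intensive task can run at a low v-f setting (saving energy through the lower voltage) at little performance cost.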

Measure CPU-intensiveness (µ)  CPI Stack CPI avg =CPI base +CPI cache +CPI tlb +CPI branch +CPI stall  Use Performance Monitoring Unit (PMU) of PXA27x to estimate CPI stack components.  µ = CPI base /CPI avg  High µ indicates high CPU-intensiveness and vice versa

Dynamic Task Characterization  Dynamically estimate µ for every scheduler quantum and feed it to the online learning algorithm.  The algorithm models the CPU- intensiveness of the task and accordingly selects the best suited v-f setting.  Theoretical guarantee on converging to the best v-f setting available.

Online Learning for Horse Racing
[Figure: for each race, the best-performing expert is selected to invest the money; the expert manages the money for the race; the performance of all experts for that race is then evaluated.]

Online Learning for DVFS
[Figure: a DVFS controller with a working set of DVFS experts (v-f setting 1, v-f setting 2, ..., v-f setting n). The controller selects the best-performing expert, the selected expert is applied to the CPU for the next scheduler quantum, and the controller then evaluates the performance of all experts.]

Controller Algorithm
Parameters: $\beta \in [0, 1]$; initial weight vector $w^1$ for the N experts such that $\sum_{i=1}^{N} w_i^1 = 1$.
Do for t = 1, 2, 3, ... (each time a scheduler tick occurs):
1. Calculate µ.
2. Update the task's weight vector: $w_i^{t+1} = w_i^t \cdot \left(1 - (1 - \beta)\, l_i^t\right)$
3. Choose the expert with the highest probability factor in $r^{t+1} = w^{t+1} / \sum_{i=1}^{N} w_i^{t+1}$
4. Apply the v-f setting corresponding to the selected expert to the CPU.
5. Reset and restart the PMU.
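
Steps 2 and 3 amount to a multiplicative weight update followed by an arg-max over the normalized weights. The sketch below assumes the per-expert losses l_i^t for the tick are already available and lie in [0, 1]; step 1 (reading µ) and steps 4 and 5 (programming the v-f setting and restarting the PMU) are hardware-specific and left out. All names are hypothetical.

```python
from typing import List, Tuple


def update_weights(weights: List[float], losses: List[float], beta: float) -> List[float]:
    """Step 2: w_i^{t+1} = w_i^t * (1 - (1 - beta) * l_i^t), losses assumed in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(weights) != len(losses):
        raise ValueError("one loss per expert is required")
    return [w * (1.0 - (1.0 - beta) * loss) for w, loss in zip(weights, losses)]


def probability_factors(weights: List[float]) -> List[float]:
    """r = w / sum(w): the normalized weight vector."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def on_sched_tick(weights: List[float], losses: List[float],
                  beta: float) -> Tuple[List[float], int]:
    """Steps 2-3 for one tick: return the new weights and the chosen expert index."""
    new_weights = update_weights(weights, losses, beta)
    r = probability_factors(new_weights)
    chosen = max(range(len(r)), key=r.__getitem__)  # highest probability factor
    return new_weights, chosen
```

Starting from uniform weights over four experts with β = 0.5, the losses [0.9, 0.1, 0.5, 0.7] scale the weights by [0.55, 0.95, 0.75, 0.65], so expert index 1 is selected. Only the ratios of the weights matter for r, so the weights can be rescaled at any time to avoid numeric underflow without changing the selection.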

Evaluation of Experts (Loss Calculation)
[Figure: Expert1 to Expert5, each associated with a µmean value on a µ axis ending at 1.0, compared against the task's measured µ.]
- Intuition: the best-suited frequency scales linearly with µ.
- Map task characteristics to the best-suited frequency using the µ-mapper, e.g., Expert1-5 = {100, 200, 300, 400, 500} MHz.
- Evaluate the experts against the best-suited frequency.
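
The exact loss formula is not given here, so the following is only a plausible sketch under stated assumptions: the µ-mapper is taken to map µ in [0, 1] linearly onto the frequency range of the working set, and an expert's loss is its distance from the best-suited frequency, normalized to [0, 1] and weighted by a hypothetical preference knob `alpha` (energy is wasted when running faster than needed, performance is lost when running slower). `alpha` stands in for the user preference the scheme is said to support; the function names are illustrative.

```python
from typing import List, Sequence


def best_suited_freq(mu: float, f_min: float, f_max: float) -> float:
    """Assumed linear µ-mapper: mu = 0 -> f_min, mu = 1 -> f_max."""
    mu = min(max(mu, 0.0), 1.0)  # clamp measurement noise into [0, 1]
    return f_min + mu * (f_max - f_min)


def expert_losses(mu: float, freqs_mhz: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Loss in [0, 1] for each expert's frequency (hypothetical formula).

    alpha weights wasted energy (running too fast) against lost performance
    (running too slow); alpha = 0.5 treats both equally.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    f_min, f_max = min(freqs_mhz), max(freqs_mhz)
    span = f_max - f_min
    if span <= 0:
        raise ValueError("need at least two distinct frequencies")
    f_best = best_suited_freq(mu, f_min, f_max)
    losses = []
    for f in freqs_mhz:
        if f >= f_best:
            losses.append(alpha * (f - f_best) / span)          # energy loss
        else:
            losses.append((1.0 - alpha) * (f_best - f) / span)  # performance loss
    return losses
```

With the example working set {100, ..., 500} MHz and µ = 0.5, the best-suited frequency is 300 MHz, so Expert3 incurs zero loss and, with alpha = 0.5, the losses grow symmetrically toward both ends. These values are in the [0, 1] range the controller's weight update expects for l_i^t.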

What about Multi-tasking Systems?
- Tasks with differing characteristics may execute together.
- The weight vector ($w^t$) characterizes the executing task.
- This information must be kept per task for accurate characterization.
- Solution: store the weight vector in a task-level structure.
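
Storing the weight vector per task can be sketched as a map from task ID to weights. This is a hypothetical user-space illustration of the idea, not the kernel data structure: `TaskWeightStore`, its methods and the use of PIDs as keys are assumptions, and the exit hook is an extra not mentioned on the slide. On each scheduler tick the controller would fetch the running task's vector, update it and store it back, so a context switch simply changes which vector is in use.

```python
from typing import Dict, List


class TaskWeightStore:
    """Hypothetical per-task storage of expert weight vectors, keyed by PID."""

    def __init__(self, n_experts: int) -> None:
        if n_experts < 1:
            raise ValueError("need at least one expert")
        self.n_experts = n_experts
        self._weights: Dict[int, List[float]] = {}

    def on_task_create(self, pid: int) -> None:
        """New tasks start with uniform weights (nothing is known about them yet)."""
        self._weights[pid] = [1.0 / self.n_experts] * self.n_experts

    def get(self, pid: int) -> List[float]:
        """Weights of the task about to run; unknown tasks are initialized lazily."""
        if pid not in self._weights:
            self.on_task_create(pid)
        return list(self._weights[pid])

    def put(self, pid: int, weights: List[float]) -> None:
        """Save the task's updated weights after a scheduler tick."""
        if len(weights) != self.n_experts:
            raise ValueError("one weight per expert is required")
        self._weights[pid] = list(weights)

    def on_task_exit(self, pid: int) -> None:
        """Drop state for a finished task (assumed hook, not on the slide)."""
        self._weights.pop(pid, None)
```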

Performance bound on Controller  If l t i is the loss incurred by expert i for the scheduler quantum t: = r t.l t  Goal to minimize net loss: L G –min i L i where, r t.l t and  Net loss bounded by  Average net loss per period decreases at the rate of Performance of the scheme converges to that of best performing expert with successive sched ticks Let N: experts in working set, T: total number of sched ticks

Implementation  Testbed Intel PXA27x Development Platform Linux Implemented as Loadable Kernel Module DVFS LKM Task Creation Scheduler Tick Linux Process Manager Intel PXA27x /proc file system Linux Kernel User PMU vf setting

Experiments  Setup 1.25 samples/sec DAQ Energy savings calculated using actual current measurements  Working set: 4 v-f setting experts  Workloads: qsort djpeg blowfish dgzip Freq (MHz) Voltage (V)

Results: Single-Task Environment
[Table: %delay and %energy for qsort, djpeg, dgzip and bf (blowfish) under three settings, ranging from low performance delay to higher energy savings]

Static setting, 208 MHz / 1.2 V:
Bench.   %delay   %energy
qsort    56       48
djpeg    34       54
dgzip    33       54
bf       40       51

Result: Frequency of Selection for qsort
[Figure: how often each v-f setting is selected for qsort, across settings ranging from higher energy savings to lower performance delay]

Results: Multi-Task Environment
[Table: %delay and %energy for the task combinations qsort+djpeg, djpeg+dgzip, qsort+djpeg and dgzip+bf under three settings, ranging from low performance delay to higher energy savings]

Advantages of the scheme  Online learning algorithm: Provides theoretical guarantee on performance converging to that of the best performing expert.  Multi-Tasking systems: Works seamlessly across context switches.  User preference: Adapts energy savings/performance delay tradeoff with changes in user preference.

Overhead  Process Creation: used lat_proc from lmbench. 0% overhead  Context Switch: used lat_ctx from lmbench 3% overhead with 20 processes (max supported by lat_ctx) [choi05] cause 100% overhead in context switch times  Extremely lightweight implementation.

Conclusion  Designed and implemented a DVFS technique for general purpose multi- tasking systems.  Based on online learning that provides theoretical guarantee on the convergence of overall performance to that of the best performing expert.  Provides user control over desired energy/performance tradeoff and is extremely lightweight.