
1 Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning
Gaurav Dhiman, Tajana Simunic Rosing
Department of Computer Science and Engineering, University of California, San Diego
ISLPED 2007

2 Why Dynamic Voltage Frequency Scaling?
- Power consumption is a critical issue in system design today.
  - Mobile systems face battery life issues.
  - High-performance systems face heating issues.
- Dynamic Voltage Frequency Scaling (DVFS):
  - Dynamically scale the supply voltage level of the CPU to provide "just enough" circuit speed to process the workload.
  - An effective system-level technique to reduce power consumption.
- Dynamic Power Management (DPM) is another popular system-level technique. However, the focus of this work is on DVFS.

3 Previous Work
- Based on task-level knowledge: [Yao95], [Ishihara98], [Quan02]
- Based on compiler/application support: [Azevedo02], [Hsu02], [Chung02]
- Based on micro-architecture-level support: [Marculescu00], [Weissel02], [Choi04], [Choi05]

4 Workload Characterization and Voltage-Frequency Selection
- No hard task deadlines in a general-purpose system.
- Goal: maximize energy savings while minimizing performance delay.
- Key idea:
  - CPU-intensive tasks don't benefit from scaling.
  - Memory-intensive tasks are energy efficient at low v-f settings.

5 Workload Characterization and Voltage-Frequency Selection (contd.)
Three tasks, burn_loop (CPU-intensive), mem (memory-intensive) and combo (mix), run with static scaling.
- burn_loop: energy efficient at all settings.
- mem: energy efficient at the lowest v-f setting.

6 Measure CPU-intensiveness (µ)
- CPI stack: $CPI_{avg} = CPI_{base} + CPI_{cache} + CPI_{tlb} + CPI_{branch} + CPI_{stall}$
- Use the Performance Monitoring Unit (PMU) of the PXA27x to estimate the CPI stack components.
- $\mu = CPI_{base} / CPI_{avg}$
- A high µ indicates high CPU-intensiveness, and vice versa.
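The ratio on this slide can be sketched in a few lines. This is only an illustration: the function name and the component values are hypothetical, and on the real platform the components would come from PMU counters rather than literals.

```python
def cpu_intensiveness(cpi_base, cpi_cache, cpi_tlb, cpi_branch, cpi_stall):
    """Return mu = CPI_base / CPI_avg, where CPI_avg is the sum of the stack components."""
    cpi_avg = cpi_base + cpi_cache + cpi_tlb + cpi_branch + cpi_stall
    if cpi_avg <= 0:
        raise ValueError("CPI components must sum to a positive value")
    return cpi_base / cpi_avg


# Hypothetical component values, chosen only to contrast the two task types.
mu_cpu_bound = cpu_intensiveness(1.0, 0.05, 0.0, 0.05, 0.0)  # almost no stalls
mu_mem_bound = cpu_intensiveness(1.0, 2.0, 0.5, 0.1, 1.4)    # dominated by memory stalls

print(f"CPU-bound task:    mu = {mu_cpu_bound:.2f}")
print(f"Memory-bound task: mu = {mu_mem_bound:.2f}")
```

With these made-up inputs the CPU-bound task lands near 1 and the memory-bound task near 0, which is the ordering the slide describes.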

7 Dynamic Task Characterization
- Dynamically estimate µ for every scheduler quantum and feed it to the online learning algorithm.
- The algorithm models the CPU-intensiveness of the task and accordingly selects the best suited v-f setting.
- Theoretical guarantee on converging to the best v-f setting available.

8 Online Learning for Horse Racing
[Figure: cycle between a set of Experts and the person betting on the race]
- Selects the best performing expert for investing his money.
- Expert manages money for the race.
- Evaluates performance of all experts for that race.

9 Online Learning for DVFS
[Figure: cycle between the DVFS Experts (Working Set: v-f setting 1, v-f setting 2, ..., v-f setting n), the DVFS Controller and the CPU]
- The DVFS Controller selects the best performing expert.
- The selected expert is applied to the CPU for the next scheduler quantum.
- The DVFS Controller evaluates the performance of all experts.

10 Controller Algorithm
Parameters: initial weight vector for experts such that [condition not preserved].
Each time a scheduler tick occurs, do for t = 1, 2, 3, ...
1. Calculate µ.
2. Update the weight vector of the task: $w_i^{t+1} = w_i^t \left(1 - (1-\beta)\, l_i^t\right)$
3. Choose the expert with the highest probability factor in $r^t$.
4. Apply the v-f setting corresponding to the selected expert to the CPU.
5. Reset and restart the PMU.
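Steps 2 and 3 of the loop can be sketched as follows. The weight update is the one given on the slide; the probability vector is assumed here to be the weights normalized to sum to 1 (the usual choice for this family of algorithms), and the function names, β value, and loss values are hypothetical.

```python
def update_weights(weights, losses, beta):
    """Step 2: w_i^{t+1} = w_i^t * (1 - (1 - beta) * l_i^t), with each loss in [0, 1]."""
    return [w * (1.0 - (1.0 - beta) * l) for w, l in zip(weights, losses)]


def probabilities(weights):
    """Assumed form of the probability vector: weights normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must stay positive")
    return [w / total for w in weights]


def select_expert(weights):
    """Step 3: index of the expert with the highest probability factor."""
    r = probabilities(weights)
    return max(range(len(r)), key=lambda i: r[i])


# One scheduler tick with four experts and illustrative losses.
weights = [0.25, 0.25, 0.25, 0.25]
losses = [0.6, 0.3, 0.0, 0.3]  # expert 2 matched the task best in this quantum
beta = 0.5

weights = update_weights(weights, losses, beta)
chosen = select_expert(weights)
print("updated weights:", weights)
print("selected expert index:", chosen)
```

Experts with low loss keep most of their weight, so repeated ticks shift probability toward the v-f setting that fits the task.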

11 Evaluation of experts (loss calculation)
[Figure: µ axis from 0 to 1.0 with ticks every 0.2; µ-mean values 0.1, 0.3, 0.5, 0.7 and 0.9 marked for Expert1 through Expert5]
- Intuition: the best suited frequency scales linearly with µ.
- Map task characteristics to the best suited frequency using the µ-mapper. E.g.: Expert1-5 = {100, 200, 300, 400, 500} MHz
- Evaluate experts against the best suited frequency.
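A minimal sketch of such a mapping, under two assumptions that the slide does not spell out: each expert's µ-mean is the midpoint of an equal-width slice of [0, 1] (which matches the 0.1 to 0.9 values in the figure), and an expert's loss is the absolute distance between the measured µ and its µ-mean. The function names are hypothetical.

```python
def mu_means(n_experts):
    """Midpoints of n equal-width intervals covering [0, 1]."""
    return [(2 * i + 1) / (2 * n_experts) for i in range(n_experts)]


def expert_losses(mu, n_experts):
    """Assumed loss: distance between the measured mu and each expert's mu-mean."""
    return [abs(mu - m) for m in mu_means(n_experts)]


freqs_mhz = [100, 200, 300, 400, 500]  # Expert1-5 from the slide's example
measured_mu = 0.72                      # illustrative value

losses = expert_losses(measured_mu, len(freqs_mhz))
best = min(range(len(losses)), key=lambda i: losses[i])

print("mu-means:", mu_means(len(freqs_mhz)))
print("losses:", [round(l, 2) for l in losses])
print("best suited frequency:", freqs_mhz[best], "MHz")
```

Because the losses stay within [0, 1], they can be fed directly into a multiplicative weight update.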

12 What about Multi-tasking systems?
- Tasks with differing characteristics may execute together.
- The weight vector ($w^t$) characterizes an executing task.
- This information needs to be kept per task for accurate characterization.
- Solution: store the weight vector as a task-level structure.
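The idea of a task-level weight vector can be sketched with a dictionary keyed by task id. This is a user-space stand-in, not the kernel structure itself; the class name, the uniform initial weights, and the loss values are all assumptions made for illustration.

```python
class TaskWeightStore:
    """Hypothetical stand-in for a per-task structure: one weight vector per task id."""

    def __init__(self, n_experts):
        self.n_experts = n_experts
        self._weights = {}

    def load(self, task_id):
        # Assumption: a newly seen task starts with uniform weights.
        if task_id not in self._weights:
            self._weights[task_id] = [1.0 / self.n_experts] * self.n_experts
        return self._weights[task_id]

    def store(self, task_id, weights):
        self._weights[task_id] = list(weights)


def on_scheduler_tick(store, task_id, losses, beta):
    """Update only the running task's weights and return its preferred expert index."""
    weights = store.load(task_id)
    updated = [w * (1.0 - (1.0 - beta) * l) for w, l in zip(weights, losses)]
    store.store(task_id, updated)
    # The largest weight is also the largest normalized probability.
    return max(range(len(updated)), key=lambda i: updated[i])


store = TaskWeightStore(n_experts=4)
# Two interleaved tasks with opposite (illustrative) loss profiles.
cpu_task_choice = on_scheduler_tick(store, task_id=101, losses=[0.8, 0.5, 0.2, 0.0], beta=0.5)
mem_task_choice = on_scheduler_tick(store, task_id=202, losses=[0.0, 0.2, 0.5, 0.8], beta=0.5)
print("task 101 prefers expert", cpu_task_choice)
print("task 202 prefers expert", mem_task_choice)
```

Keeping the vectors separate means a context switch does not let one task's history distort the setting chosen for another.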

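13 Performance bound on Controller
Let N be the number of experts in the working set and T the total number of scheduler ticks.
- If $l_i^t$ is the loss incurred by expert $i$ for scheduler quantum $t$, the loss of the controller for that quantum is $r^t \cdot l^t$.
- Goal: minimize the net loss $L_G - \min_i L_i$, where $L_G$ is defined in terms of $r^t \cdot l^t$ [full definitions of $L_G$ and $L_i$ not preserved].
- Net loss bounded by [formula not preserved].
- Average net loss per period decreases at the rate of [formula not preserved].
- Performance of the scheme converges to that of the best performing expert with successive scheduler ticks.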
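For orientation, the standard analysis of this family of multiplicative-weight algorithms (Freund and Schapire's Hedge) gives bounds of the following shape when β is tuned to the horizon T. This is a sketch that assumes the slide follows the textbook definitions of cumulative loss, so the definitions and orders of growth below are an assumption rather than a transcription of the slide.

```latex
% Assumed (textbook) definitions of cumulative loss over T scheduler ticks
L_G = \sum_{t=1}^{T} r^t \cdot l^t ,
\qquad
L_i = \sum_{t=1}^{T} l_i^t

% Standard Hedge-style bounds with N experts and a suitably tuned \beta
L_G - \min_i L_i \;\le\; O\!\left(\sqrt{T \ln N}\right),
\qquad
\frac{L_G - \min_i L_i}{T} \;\le\; O\!\left(\sqrt{\frac{\ln N}{T}}\right)
```

Under these forms, the per-tick gap to the best expert vanishes as T grows, which is consistent with the convergence statement on the slide.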

14 Implementation
- Testbed:
  - Intel PXA27x Development Platform
  - Linux 2.6.9
  - Implemented as a Loadable Kernel Module (LKM)
[Figure: block diagram with a User level and a Linux Kernel level. Labels: DVFS LKM, Linux Process Manager, Task Creation, Scheduler Tick, /proc file system, Intel PXA27x, PMU, vf setting]

15 Experiments
- Setup:
  - 1.25 samples/sec DAQ
  - Energy savings calculated using actual current measurements
- Working set: 4 v-f setting experts

  Freq (MHz)   Voltage (V)
  208          1.2
  312          1.3
  416          1.4
  520          1.5

- Workloads: qsort, djpeg, blowfish, dgzip

16 Results: Single Task Environment
Column pairs run from low performance delay (left) to higher energy savings (right).

  Bench.   %delay  %energy   %delay  %energy   %delay  %energy
  qsort       6      17        16      32        25      41
  djpeg       7      21        15      37        26      45
  dgzip      15      30        21      42        27      49
  bf          6      11        16      27        25      40

At 208MHz/1.2V:

  Bench.   %delay  %energy
  qsort      56      48
  djpeg      34      54
  dgzip      33      54
  bf         40      51

17 Result: Frequency of Selection for qsort
[Figure: frequency of selection of the v-f settings for qsort, with panels labeled "Higher energy savings" and "Lower Perf Delay"]

18 Results: Multi Task Environment
Column pairs run from low performance delay (left) to higher energy savings (right).

  Bench.         %delay  %energy   %delay  %energy   %delay  %energy
  qsort+djpeg       6      17        15      33        25      41
  djpeg+dgzip      13      24        19      39        27      48
  qsort+djpeg       7      20        18      35        26      42
  dgzip+bf         13      18        22      32        27      44

19 Advantages of the scheme
- Online learning algorithm: provides a theoretical guarantee on performance converging to that of the best performing expert.
- Multi-tasking systems: works seamlessly across context switches.
- User preference: adapts the energy savings/performance delay tradeoff to changes in user preference.

20 Overhead
- Process creation: measured with lat_proc from lmbench.
  - 0% overhead
- Context switch: measured with lat_ctx from lmbench.
  - 3% overhead with 20 processes (the maximum supported by lat_ctx)
  - [Choi05] causes 100% overhead in context switch times
- Extremely lightweight implementation.

21 Conclusion
- Designed and implemented a DVFS technique for general-purpose multi-tasking systems.
- Based on online learning that provides a theoretical guarantee on the convergence of overall performance to that of the best performing expert.
- Provides user control over the desired energy/performance tradeoff and is extremely lightweight.

