Energy-Efficient Mapping and Scheduling for DVS Enabled Distributed Embedded Systems. Marcus T. Schmitz and Bashir M. Al-Hashimi (University of Southampton, United Kingdom); Petru Eles (Linköping University, Sweden)
Contents: Motivation & Introduction; Dynamic Voltage Scaling; Co-Synthesis with DVS Consideration; DVS Optimised Scheduling; DVS Optimised Mapping; Experimental Results; Conclusions
Motivation. Low Energy: Portable Applications; Autonomous Systems; Feasibility Issues (SoC heat); Operational Cost and Environmental Reasons. System-Level Co-Design: Shrinking Time-To-Market Windows; Reducing Production Cost; High Degree of Optimisation Freedom.
Introduction: Dynamic Voltage Scaling; System-Level Co-Synthesis; Energy-Efficient Co-Synthesis for DVS Systems
Dynamic Voltage Scaling (DVS). [Figure: block diagram of a DVS processor with voltage (VR) and frequency (f Reg.) regulators; plot "Energy vs. Speed" showing energy against 1/Speed.] Available from: Transmeta, AMD, Intel
Co-Synthesis for DVS Systems. [Figure: co-synthesis flow. Inputs: System Specification, Technology Lib. Steps: Allocation, Mapping, Scheduling, Voltage Scaling, Evaluation. Annotations: Designer driven, EE-GMA, EE-GLSA.]
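DVS in Distributed Systems [23]. Input: Scheduling (mapping); Power profile. Output: scaled voltage for each DVS task. [Figure: power-over-time schedules for PE0, PE1 and CL0 with deadline d. Before voltage scaling: all tasks at Vmax, energy Emax, slack before the deadline. After voltage scaling: tasks at dynamic voltages (2.3 V, 2.4 V, 3.3 V), energy Esc < Emax.]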
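The effect of voltage scaling on one task can be sketched in a few lines. This is not the power-profile-based technique of [23]; it is a single-task illustration under the common first-order CMOS model, in which dynamic energy per cycle grows with V² and clock frequency with (V - Vt)²/V. The threshold voltage of 0.8 V, the use of 3.3 V as the maximum supply voltage, and all function names are assumptions made for this example.

```python
import math

V_MAX = 3.3   # maximum supply voltage in volts (assumed)
V_T = 0.8     # threshold voltage in volts (assumed)

def speed(v):
    """Clock frequency at supply voltage v, up to a constant factor."""
    return (v - V_T) ** 2 / v

def scaled_voltage(exec_time, slack):
    """Lowest voltage at which the task still finishes within exec_time + slack."""
    # Required speed: the nominal speed reduced by the allowed stretch factor.
    k = speed(V_MAX) * exec_time / (exec_time + slack)
    # Solve (v - V_T)^2 / v = k for v and keep the root above V_T.
    b = 2 * V_T + k
    return (b + math.sqrt(b * b - 4 * V_T ** 2)) / 2

def energy_ratio(v):
    """Dynamic energy relative to running at V_MAX (same number of cycles)."""
    return (v / V_MAX) ** 2

v = scaled_voltage(exec_time=10.0, slack=5.0)
print(f"scaled voltage: {v:.2f} V, energy: {energy_ratio(v):.0%} of nominal")
```

With zero slack the function returns V_MAX, and any positive slack yields a lower voltage and a quadratic reduction in dynamic energy, which is why the amount and placement of slack matter for the scheduler.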
Energy-Efficient Scheduling. Two objectives: Timing feasibility (guarantee deadlines); Low energy dissipation (optimise DVS usability, i.e. slack time). Traditional scheduling techniques focus mainly on timing feasibility! Problem due to power variations: simply increasing the deadline slack leads to sub-optimal solutions!
Energy-Efficient Scheduling. [Figure: two schedules of tasks t1 to t6 on PE0, PE1 and PE2, each shown before and after DVS, with slack and savings marked. First schedule: E = 71 J before DVS, E = 65.6 J after. Schedule S2: E = 71 J before DVS, E = 53.9 J after.]
Energy-Efficient Scheduling. Based on the Genetic List Scheduling Algorithm [6,10]. Task priorities are encoded into priority strings. Duties of the scheduler: select the ready task with the highest priority; schedule the selected task; update the schedule and the ready list; repeat until no unscheduled task is left. [Figure: priority string PS = (4, 3, 9, 7, 2) for tasks t0 to t4, passed to the List Scheduler, which produces the Schedule.]
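The scheduler duties listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a fixed task-to-PE mapping, ignores communication, and appends each task after the last one on its PE. The task graph, durations and mapping are hypothetical; only the priority values 4, 3, 9, 7, 2 for t0 to t4 are taken from the slide.

```python
def list_schedule(duration, preds, mapping, priority):
    """Return {task: (start, finish)} built by a priority-driven list scheduler."""
    schedule = {}
    pe_free = {}                      # time at which each PE becomes idle
    unscheduled = set(duration)
    while unscheduled:                # repeat until no unscheduled task is left
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in unscheduled
                 if all(p in schedule for p in preds.get(t, ()))]
        if not ready:
            raise ValueError("task graph contains a cycle")
        task = max(ready, key=lambda t: priority[t])   # highest priority first
        pe = mapping[task]
        # Start once the PE is idle and all predecessors have finished.
        start = max([pe_free.get(pe, 0)] +
                    [schedule[p][1] for p in preds.get(task, ())])
        schedule[task] = (start, start + duration[task])
        pe_free[pe] = schedule[task][1]                # update PE availability
        unscheduled.remove(task)
    return schedule

# Hypothetical task graph; only the priorities come from the slide.
duration = {"t0": 2, "t1": 3, "t2": 1, "t3": 2, "t4": 2}
preds = {"t1": ["t0"], "t2": ["t0"], "t3": ["t1"], "t4": ["t2"]}
mapping = {"t0": "PE0", "t1": "PE0", "t2": "PE1", "t3": "PE0", "t4": "PE1"}
priority = {"t0": 4, "t1": 3, "t2": 9, "t3": 7, "t4": 2}

schedule = list_schedule(duration, preds, mapping, priority)
for task in sorted(schedule):
    print(task, schedule[task])
```

Changing the priority values changes the order in which ready tasks are picked, and with it the slack each task ends up with. That is the degree of freedom a genetic algorithm can explore.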
EE-GLSA. [Figure: GA optimisation loop with the blocks Initial Population, List Scheduler, DVS Insertion, Assign fitness (Timing, Energy), Rank individuals (low to high), Mating, Selection, Mutation and Optimised Population; example priority strings shown as chromosomes.] No Hole Filling! No Mapping!
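The loop on this slide can be approximated by a generic genetic algorithm over priority strings. The sketch below is an assumption-laden illustration rather than EE-GLSA itself: the `evaluate` callable stands in for "list scheduler + DVS insertion + fitness assignment", and the truncation selection, one-point crossover, mutation rate and population size are all choices made for this example.

```python
import random

def evolve(evaluate, n_genes, pop_size=20, generations=30, max_prio=9, seed=0):
    """Search for a priority string with low cost; evaluate(string) returns the cost."""
    rng = random.Random(seed)
    # Initial population: random priority strings.
    pop = [[rng.randint(0, max_prio) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=evaluate)       # rank individuals, lowest cost first
        parents = ranked[: pop_size // 2]        # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)        # mating
            cut = rng.randrange(1, n_genes)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:               # mutation of a single gene
                child[rng.randrange(n_genes)] = rng.randint(0, max_prio)
            children.append(child)
        pop = parents + children                 # parents survive (elitism)
    return min(pop, key=evaluate)

def toy_cost(priorities):
    """Stand-in cost; a real fitness would combine timing and energy after DVS."""
    return sum(abs(g - i) for i, g in enumerate(priorities))

best = evolve(toy_cost, n_genes=5)
print(best, toy_cost(best))
```

Because the better half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next. The same skeleton accepts any cost function, which is what makes an arbitrarily complex fitness function possible.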
Advantages. Optimisation can be based on an arbitrarily complex fitness function, including: Timing; Energy (DVS technique). Enlarged search space (|T+C|! different schedules). Trade-off freedom: synthesis time <-> quality. Easily adaptable to computing clusters: multiple populations with immigration scheme.
Hole Filling Problem. [Figure: schedules of tasks t0 to t4 on PE0 and PE1 with deadlines d2, d3 and d4, illustrating hole filling.] Therefore, priorities decide solely upon execution order!
Task Mapping. Why separation from the list scheduling? Regardless of priorities, greedy mapping. [Figure: list scheduler (LS) example with tasks t0, t1 and t2, processing elements PE0 and PE1, and deadlines d1 and d2.]
Task Mapping. Make greedy mapping decision based on: Timing; Energy. [Figure: animation in which the list scheduler (LS) maps tasks t0, t1 and t2 one at a time onto PE0 and PE1, with deadlines d1 and d2.]
Genetic Mapping Algorithm [8]. Task mappings are encoded into mapping strings. [Figure: chromosome assigning tasks 0 to 6 to processing elements 0 to 2 (CPU, DVS-CPU, ASIC).]
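A mapping string can be read as one gene per task, where the gene value selects the processing element. The short sketch below illustrates that encoding; the PE names come from the slide, while the index order, the example chromosome and the helper names are hypothetical.

```python
# PE names from the slide; the index order 0, 1, 2 is assumed.
PES = ["CPU", "DVS-CPU", "ASIC"]

def decode(chromosome):
    """Turn a mapping string into a {task: PE name} dictionary."""
    return {f"t{task}": PES[pe] for task, pe in enumerate(chromosome)}

def tasks_on(chromosome, pe_name):
    """List the tasks that the mapping string assigns to one PE."""
    return [task for task, pe in decode(chromosome).items() if pe == pe_name]

chromosome = [0, 1, 1, 2, 0, 1, 2]   # hypothetical mapping of seven tasks t0..t6
print(decode(chromosome))
print("DVS tasks:", tasks_on(chromosome, "DVS-CPU"))
```

With this encoding, changing one gene moves one task to a different PE, so the usual genetic operators act directly on the mapping.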
EE-GMA Including DVS. [Figure: GA optimisation loop with the blocks Initial Population, EE-GLSA, Assign fitness (Timing, Energy + Area), Rank individuals (low to high), Mating, Selection, Mutation, Insertion and Optimised Population.]
Experimental Results. 4 Benchmark Sets: (1) 27 generated by TGFF [7]: 8 to 100 tasks, power variations 2.6; (2) 2 Hou examples taken from [13]: 8 to 20 tasks, power variations 11; (3) TG1 and TG2 taken from [11]: 60 examples with 30 tasks each, no power variations; (4) Measurement application taken from [3]: 12 tasks, no power profile is provided. Power and time overhead for DVS is neglected. Average results of 5 optimisation runs.
Schedule Optimisation
Mapping Optimisation
Conclusions. DVS capability can achieve high energy savings in distributed embedded systems. Proposed a new energy-efficient two-step mapping and scheduling approach. Iterative improvement provides high savings; ad hoc constructive techniques are not suitable. Optimisation times are reasonable. Additional objectives can be easily included. Consideration of power profile information leads to further energy reductions.