Research on Embedded Hypervisor Scheduler Techniques 2014/10/02

1 Research on Embedded Hypervisor Scheduler Techniques 2014/10/02 1

2 Background
Asymmetric multi-core is becoming increasingly popular over homogeneous multi-core systems.
◦ An asymmetric multi-core platform consists of cores with different capabilities.
 ARM: big.LITTLE architecture.
 Qualcomm: asynchronous Symmetric Multi-Processing (aSMP).
 Nvidia: variable Symmetric Multiprocessing (vSMP).
 …etc.

3 Motivation
Scheduling goals differ between homogeneous and asymmetric multi-core platforms.
◦ Homogeneous multi-core: load balancing.
 Distribute workloads evenly in order to obtain maximum performance.
◦ Asymmetric multi-core: maximize power efficiency with modest performance sacrifices.

4 Motivation (Cont.)
New scheduling strategies are needed for asymmetric multi-core platforms.
◦ The power and computing characteristics vary across the different types of cores.
◦ These differences must be taken into consideration while scheduling.

5 Project Goal
Survey the current scheduling algorithms for homogeneous and asymmetric multi-core architectures.
Design and implement a hypervisor scheduler for asymmetric multi-core platforms.
Assign virtual cores to physical cores for execution.
Minimize power consumption with a performance guarantee.

6 Hypervisor Architecture with VMI [diagram]
◦ Guest VMs (Android framework + OS kernel, each with its own scheduler and vCPUs) run on the hypervisor above an ARM Cortex-A15 (performance) cluster and an ARM Cortex-A7 (power-saving) cluster.
◦ The VM Introspector (VMI) gathers task information from each guest OS.
◦ The Task-to-vCPU Mapper modifies the CPU mask of each task according to the task information from the VMI, so tasks with low computing resource requirements are gathered on vCPUs treated as LITTLE cores.
◦ The b-L vCPU scheduler in the hypervisor then schedules big vCPUs to the A15 and LITTLE vCPUs to the A7.

7 Hypervisor Scheduler Assigns the virtual cores to physical cores for execution. ◦ Determines the execution order and amount of time assigned to each virtual core according to a scheduling policy. ◦ Xen - credit-based scheduler ◦ KVM - completely fair scheduler 7

8 Virtual Core Scheduling Problem For every time period, the hypervisor scheduler is given a set of virtual cores. Given the operating frequency of each virtual core, the scheduler will generate a scheduling plan, such that the power consumption is minimized, and the performance is guaranteed. 8

9 Scheduling Plan
Must satisfy three constraints.
◦ Each virtual core should run on each physical core for a certain amount of time to satisfy the workload requirement.
◦ A virtual core can run only on a single physical core at any time.
◦ A virtual core should not switch among physical cores frequently, to reduce migration overhead.

10 Example of A Scheduling Plan
◦ x: physical core idle

        t1  t2  t3  t4  t5  ...  t100
Core 0: V4  x   x   V4  x   ...
Core 1: V3  x   V3  x   x   ...
Core 2: V2  V4  V1  V2  V4  ...
Core 3: V1  V1  V2  V3  V1  ...

(each column is one execution slice)
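A plan like the one above is just a grid of (physical core, slice) → vCPU entries. Below is a minimal sketch (all names hypothetical, not the actual implementation) that checks the two hard constraints from the previous slide against such a grid:

```python
# Sketch: validate a scheduling plan against the constraints above.
# A plan maps each physical core to its list of execution slices;
# "x" marks an idle slice. All names here are illustrative only.

def validate_plan(plan, required):
    """plan: {pcpu: [vcpu-or-"x", ...]}, required: {(vcpu, pcpu): slices}."""
    # Constraint: a vCPU may run on only one pCPU in any slice.
    n_slices = len(next(iter(plan.values())))
    for t in range(n_slices):
        running = [row[t] for row in plan.values() if row[t] != "x"]
        if len(running) != len(set(running)):
            return False
    # Constraint: each vCPU gets its required time on each pCPU.
    for (vcpu, pcpu), need in required.items():
        if plan[pcpu].count(vcpu) < need:
            return False
    return True

# The first four slices of the 4-core plan from the slide:
plan = {
    0: ["V4", "x", "x", "V4"],
    1: ["V3", "x", "V3", "x"],
    2: ["V2", "V4", "V1", "V2"],
    3: ["V1", "V1", "V2", "V3"],
}
print(validate_plan(plan, {("V3", 1): 2}))  # True
```

The third constraint (few core switchings) is an optimization objective rather than a hard constraint, and is handled in phase 3.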

11 Three-phase Solution
[Phase 1] generates the amount of time each virtual core should run on each physical core.
[Phase 2] determines the execution order of the virtual cores on each physical core.
[Phase 3] exchanges the order of the execution slices in order to reduce the number of core switchings.

12 Phase 1
Given the objective function and the constraints, we can use integer programming to find a_{i,j}.
◦ a_{i,j}: the number of time slices virtual core i should run on physical core j.
 Divide the time interval into time slices.
◦ Integer programming can find a feasible solution in a short time when the numbers of vCPUs and pCPUs are small constants.

13 Phase 1 (Cont.)
If the relationship between power and load is linear:
◦ Use a greedy method instead.
◦ Assign each virtual core to the physical core with the lowest power/instruction ratio whose load stays under 100%.
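The greedy rule above can be sketched as follows. This is an illustrative stand-in only: the core names, ratios, and load numbers are hypothetical, and "ratio" stands for the power/instruction cost of each physical core.

```python
# Sketch of the greedy assignment for a linear power model.
# Cheapest (most power-efficient) cores are filled first, and a core
# is only chosen while its load stays under 100% (capacity = 1.0).

def greedy_assign(vcpu_loads, pcpu_ratios, capacity=1.0):
    """Assign each vCPU load to the cheapest pCPU that still has room."""
    assignment = {}
    used = {p: 0.0 for p in pcpu_ratios}
    # Consider physical cores in order of increasing power/instruction.
    order = sorted(pcpu_ratios, key=pcpu_ratios.get)
    for vcpu, load in sorted(vcpu_loads.items(), key=lambda kv: -kv[1]):
        for p in order:
            if used[p] + load <= capacity:   # keep load under 100%
                assignment[vcpu] = p
                used[p] += load
                break                        # unplaceable vCPUs are skipped
    return assignment

# Two LITTLE cores (cheap) and one big core (expensive):
ratios = {"little0": 1.0, "little1": 1.0, "big0": 3.0}
loads = {"v0": 0.8, "v1": 0.5, "v2": 0.4}
print(greedy_assign(loads, ratios))
```

Placing the heaviest loads first is one common tie-breaking choice; the slide does not specify the iteration order.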

14 Phase 2
With the information from phase 1, the scheduler has to determine the execution order of the virtual cores on each physical core.
◦ A virtual core cannot appear on two or more physical cores at the same time.

15 Example
Over the interval t = 0 to t = 100, each vCPU requires the following amounts of time on the four pCPUs:
◦ vCPU 0: (50, 40, 0, 0)
◦ vCPU 1: (20, 20, 20, 20)
◦ vCPU 2: (10, 10, 20, 20)
◦ vCPU 3: (10, 10, 20, 20)
◦ vCPU 4: (10, 10, 10, 10)
◦ vCPU 5: (0, 0, 10, 10)

16 Phase 2 (Cont.)
We can formulate the problem as an open-shop scheduling problem (OSSP).
◦ OSSP with preemption can be solved in polynomial time. [1]
[1] T. Gonzalez and S. Sahni. Open shop scheduling to minimize finish time. J. ACM, 23(4):665–679, Oct. 1976.
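As a rough illustration of what phase 2 produces, the sketch below builds a conflict-free execution order slice by slice using a maximum bipartite matching (Kuhn's augmenting-path algorithm) between physical cores and vCPUs with remaining work. This is a naive stand-in, not the polynomial-time Gonzalez–Sahni algorithm cited in [1], and all names are hypothetical.

```python
# Sketch of phase 2 (NOT the Gonzalez-Sahni OSSP algorithm): build the
# execution order one slice at a time; a maximum bipartite matching
# guarantees no vCPU runs on two pCPUs within the same slice.

def schedule_slices(need):
    """need[p][v]: remaining slices vCPU v owes pCPU p. Returns slice list."""
    def augment(p, match, seen):
        # Try to find (or free up) a vCPU with remaining work for pCPU p.
        for v, n in need[p].items():
            if n > 0 and v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], match, seen):
                    match[v] = p
                    return True
        return False

    slices = []
    while any(n > 0 for row in need.values() for n in row.values()):
        match = {}                      # vCPU -> pCPU for this slice
        for p in need:
            augment(p, match, set())
        for v, p in match.items():
            need[p][v] -= 1
        slices.append({p: v for v, p in match.items()})
    return slices

demand = {0: {"v0": 2, "v1": 1}, 1: {"v0": 1, "v1": 2}}
plan = schedule_slices(demand)
print(len(plan))  # 3 slices: both pCPUs stay busy every slice
```

The matching keeps every physical core busy whenever possible, but unlike the cited algorithm it does not guarantee the minimum number of slices in all cases.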

17 After Phase 1 & 2
After the first two phases, the scheduler generates a scheduling plan.
◦ x: physical core idle

        t1  t2  t3  t4  t5  ...  t100
Core 0: V4  x   x   V4  x   ...
Core 1: V3  x   V3  x   x   ...
Core 2: V2  V4  V1  V2  V4  ...
Core 3: V1  V1  V2  V3  V1  ...

18 Phase 3
Migrating tasks between cores incurs overhead.
Reduce this overhead by exchanging the order of the execution slices to minimize the number of core switchings.

19 Number of Switching Minimization Problem
Given a scheduling plan, we want to find an order of the execution slices such that the switching cost is minimized.
◦ An NP-complete problem.
 Reduction from the Hamiltonian Path Problem.
◦ We propose a greedy heuristic.

20 Example
Unordered execution slices; each column is one slice, and the rows give the vCPU running on p1/p2/p3 (x = idle):
p1: 1  3  x  x  4  2
p2: 2  1  2  x  3  1
p3: 3  2  3  1  2  3
#switching = 0

21 Example
Each slice is annotated with the number of core switchings it would cause; no slice has been placed yet, so every cost is 0:
p1:   1  3  x  x  4  2
p2:   2  1  2  x  3  1
p3:   3  2  3  1  2  3
cost: 0  0  0  0  0  0

22 Example
The slice (x, x, 1) is placed as t1; the remaining slices are:
p1: 1  3  x  4  2
p2: 2  1  2  3  1
p3: 3  2  3  2  3
t1 = (x, x, 1)

23 Example
Costs of the remaining slices given t1 = (x, x, 1):
p1:   1  3  x  4  2
p2:   2  1  2  3  1
p3:   3  2  3  2  3
cost: 1  1  0  0  1
t1 = (x, x, 1)

24 Example
(x, 2, 3) has cost 0 and is placed as t2:
p1: 1  3  4  2
p2: 2  1  3  1
p3: 3  2  2  3
t2 = (x, 2, 3), t1 = (x, x, 1)

25 Example
Costs of the remaining slices, counted against the last pCPU on which each vCPU ran:
p1:   1  3  4  2
p2:   2  1  3  1
p3:   3  2  2  3
cost: 1  3  2  2
t2 = (x, 2, 3), t1 = (x, x, 1)

26 Example
(1, 2, 3) is placed as t3 at cost 1:
p1: 3  4  2
p2: 1  3  1
p3: 2  2  3
t3 = (1, 2, 3), t2 = (x, 2, 3), t1 = (x, x, 1)
#switching = 1

27 Example
Final order after placing the remaining slices:
t1 = (x, x, 1), t2 = (x, 2, 3), t3 = (1, 2, 3), t4 = (4, 3, 2), t5 = (3, 1, 2), t6 = (2, 1, 3)
#switching = 7
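The walkthrough above can be sketched as a greedy that always appends the remaining slice incurring the fewest switches, where a switch is counted whenever a vCPU reappears on a different pCPU than it last ran on. This is an illustrative sketch; tie-breaking among equal-cost slices may differ from the slides.

```python
# Sketch of the phase-3 greedy: repeatedly append the remaining
# execution slice that causes the fewest core switchings.

def switches_caused(last_pcpu, slice_):
    """slice_: tuple of vCPUs indexed by pCPU ("x" = idle)."""
    # A vCPU never seen before causes no switch (last defaults to here).
    return sum(1 for p, v in enumerate(slice_)
               if v != "x" and last_pcpu.get(v, p) != p)

def greedy_order(slices):
    last, order, total = {}, [], 0
    remaining = list(slices)
    while remaining:
        best = min(remaining, key=lambda s: switches_caused(last, s))
        total += switches_caused(last, best)
        for p, v in enumerate(best):
            if v != "x":
                last[v] = p              # remember each vCPU's last pCPU
        order.append(best)
        remaining.remove(best)
    return order, total

# The six slices from the example (entries are p1/p2/p3 assignments):
slices = [("1", "2", "3"), ("3", "1", "2"), ("x", "2", "3"),
          ("x", "x", "1"), ("4", "3", "2"), ("2", "1", "3")]
order, total = greedy_order(slices)
print(total)  # 7, matching the #switching in the final slide
```

With first-minimum tie-breaking the greedy starts from a different slice than the slides do, yet reaches the same total of 7 switchings for this input.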

28 Evaluation Conduct simulations to compare the power consumption of our asymmetry- aware scheduler with that of a credit- based scheduler. Compare the numbers of core switching from our greedy heuristic and that from an optimal solution. 28

30 Environment
Two types of physical cores:
◦ power-hungry "big" cores
 frequency: 1600MHz
◦ power-efficient "little" cores
 frequency: 600MHz
◦ The DVFS mechanism is disabled.

31 Power Model
Relation between power consumption, core frequency, and load, measured with the bzip2 benchmark. [figure]

32 Scenario I – 2 Big and 2 Little
Dual-core VMs. Two sets of input:
◦ Case 1: Both VMs with light workloads.
 250MHz for each virtual core.
◦ Case 2: One VM with heavy workloads, the other with modest workloads.
 Heavy: 1200MHz for each virtual core.
 Modest: 600MHz for each virtual core.

33 Scenario I - Results
◦ Case 1: the asymmetry-aware method consumes about 43.2% of the power of the credit-based method.
◦ Case 2: the asymmetry-aware method uses 95.6% of the energy used by the credit-based method.

Power (Watt)
Case 1 (light-load VMs):                 Asymmetry-aware 0.295 | Credit-based 0.683
Case 2 (heavy-load + modest-load VM):    Asymmetry-aware 2.382 | Credit-based 2.491

34 Scenario 2 – 4 Big and 4 Little
Quad-core VMs. Three cases:

                      VM1           VM2           VM3
Case 1 (light-load)   all 250 MHz   all 250 MHz   all 250 MHz
Case 2 (modest-load)  all 600 MHz   all 250 MHz   all 250 MHz
Case 3 (heavy-load)   all 1600 MHz  all 1600 MHz  all 1600 MHz

35 Scenario 2 - Results
 In case 3, the load of the physical cores is 100% under both methods.
 Power cannot be saved when the computing resources are insufficient.

                      Power (Watt)                             Savings
Case 1 (light-load)   Asymmetry-aware 1.205 | Credit-based 2.049   41.2%
Case 2 (modest-load)  Asymmetry-aware 3.524 | Credit-based 3.960   11.1%
Case 3 (heavy-load)   Asymmetry-aware 6.009 | Credit-based 6.009   0%

36 Evaluation Conduct simulations to compare the power consumption of our asymmetry- aware scheduler with that of a credit- based scheduler. Compare the numbers of core switching from our greedy heuristic and that from an optimal solution. 36

37 Setting
25 sets of input.
◦ 4 physical cores, 12 virtual cores, 24 distinct execution slices.
Optimal solution:
◦ Enumerates all possible permutations of the execution slices.
◦ Uses A* search to reduce the search space.

38 Evaluation Result

                              Greedy Heuristic   A* Search
Average number of switchings  31.2               27.7
Average execution time        0.006 seconds      10+ minutes

39 XEN HYPERVISOR SCHEDULER: CODE STUDY 39

40 Xen Hypervisor Scheduler: ◦ xen/common/  schedule.c  sched_credit.c  sched_credit2.c  sched_sedf.c  sched_arinc653.c 40

41 xen/common/schedule.c
Generic CPU scheduling code.
◦ Implements support functionality for the Xen scheduler API.
◦ The scheduler defaults to the credit-based scheduler.
static void schedule(void)
◦ De-schedules the current domain.
◦ Picks a new domain.

42 xen/common/sched_credit.c
Credit-based SMP CPU scheduler.
static struct task_slice csched_schedule;
◦ Implementation of credit-based scheduling.
◦ SMP load balancing.
 If the next highest-priority local runnable VCPU has already eaten through its credits, look on other PCPUs to see if we have more urgent work.

43 xen/common/sched_credit2.c Credit-based SMP CPU scheduler ◦ Based on an earlier version. static struct task_slice csched2_schedule; ◦ Select next runnable local VCPU (i.e. top of local run queue). static void balance_load(const struct scheduler *ops, int cpu, s_time_t now); 43

44 Scheduling Steps
Xen calls do_schedule() of the current scheduler on each physical CPU (PCPU).
The scheduler selects a virtual CPU (VCPU) from the run queue and returns it to the Xen hypervisor.
The Xen hypervisor deploys the VCPU on the current PCPU.

45 Adding Our Scheduler
Our scheduler periodically generates a scheduling plan.
It organizes the run queue of each physical core according to the scheduling plan.
The Xen hypervisor then assigns VCPUs to PCPUs according to the run queues.
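The idea of plan-driven run queues can be illustrated as follows. This is plain Python with hypothetical names, not actual Xen code: in Xen the equivalent logic would live in the scheduler's do_schedule() hook in C.

```python
# Illustrative sketch (not Xen code): run queues are built per pCPU
# from the scheduling plan, and each pCPU simply pops the vCPU its
# queue prescribes for the current execution slice.
from collections import deque

def build_runqueues(plan):
    """plan: {pcpu: [vcpu-or-"x", ...]} -> {pcpu: deque of slices}."""
    return {p: deque(slices) for p, slices in plan.items()}

def pick_next(runqueues, pcpu):
    """Return the vCPU to run next on pcpu, or None if the core idles."""
    queue = runqueues[pcpu]
    if not queue:
        return None                    # plan exhausted; await a new plan
    nxt = queue.popleft()
    return None if nxt == "x" else nxt

plan = {0: ["V4", "x"], 1: ["V3", "V3"]}
rq = build_runqueues(plan)
print(pick_next(rq, 0), pick_next(rq, 1))  # V4 V3
```

When the queues run dry, the periodic planner would install the next plan; the hypervisor itself never needs to know how the plan was computed.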

46 Current Status
We propose a three-phase solution for generating a scheduling plan on asymmetric multi-core platforms.
Our simulation results show that the asymmetry-aware strategy yields potential energy savings of up to 56.8% over the credit-based method.
Ongoing: implementing the solution in the Xen hypervisor.

47 Questions or Comments? 47

