Research on Embedded Hypervisor Scheduler Techniques


1 Research on Embedded Hypervisor Scheduler Techniques
Project overview 2014/10/20

2 Background Asymmetric multi-core is becoming increasingly popular compared with homogeneous multi-core systems. An asymmetric multi-core platform consists of cores with different capabilities. ARM: big.LITTLE architecture. Qualcomm: asynchronous Symmetric Multi-Processing (aSMP). Nvidia: variable Symmetric Multiprocessing (vSMP). Etc.

3 ARM big.LITTLE Core Developed by ARM in Oct. 2011.
Combines two kinds of architecturally compatible cores to create a multi-core processor that can adjust better to dynamic computing needs and use less power than clock scaling alone. big cores are more powerful but power-hungry, while LITTLE cores are low-power but (relatively) slower.

4 Three Types of Models Cluster migration. CPU migration (In-Kernel Switcher). Heterogeneous multi-processing (global task scheduling).

5 Motivation Scheduling goals differ between homogeneous and asymmetric multi-core platforms. Homogeneous multi-core: load balancing. Distribute workloads evenly in order to obtain maximum performance. Asymmetric multi-core: maximize power efficiency with modest performance sacrifices.

6 Motivation (Cont.) New scheduling strategies are needed for asymmetric multi-core platforms. The power and computing characteristics vary across different types of cores. The scheduler should take these differences into consideration.

7 Current Hypervisor Architecture and Problem
If the Guest OS scheduler is not big.LITTLE-aware, it assigns its tasks (whether they have low or high computing resource requirements) to vCPUs evenly in order to achieve load balancing. The hypervisor vCPU scheduler, which is not big.LITTLE-aware either, then assigns those vCPUs evenly to the physical cores (ARM Cortex-A15 for performance, ARM Cortex-A7 for power saving). As a result, the system cannot take advantage of the big.LITTLE architecture.

8 Current Hypervisor Architecture and Problem(Cont.)
Now assume that the scheduler in the Guest OS is big.LITTLE-aware, so its vCPUs are designated either big or LITTLE. The hypervisor vCPU scheduler still assigns vCPUs evenly to the physical cores in order to achieve load balancing, so a LITTLE vCPU may land on a Cortex-A15 (wasted energy) and a big vCPU on a Cortex-A7 (performance degradation). Again, the system cannot take advantage of the big.LITTLE architecture.

9 Project Goal Research the current scheduling algorithms for homogeneous and asymmetric multi-core architectures. Design and implement a hypervisor scheduler for asymmetric multi-core platforms. Assign virtual cores to physical cores for execution. Minimize the power consumption with a performance guarantee.

10 Challenge The hypervisor scheduler cannot take advantage of the big.LITTLE architecture if the scheduler inside the guest OS is not big.LITTLE-aware.

11 Current Hypervisor Architecture and Problem(Cont.)
If the Guest OS scheduler is not big.LITTLE-aware, it assigns tasks to vCPUs evenly in order to achieve load balancing. Even if the hypervisor vCPU scheduler is big.LITTLE-aware, it will then schedule these vCPUs to either all big cores or all LITTLE cores, since they have the same loading: both on big cores (Cortex-A15) wastes energy, and both on LITTLE cores (Cortex-A7) degrades performance. Either way, the system cannot take advantage of the big.LITTLE architecture.

12 Possible Solution Apply VM introspection (VMI) to retrieve the process list in a VM. VMI is a technique that allows the hypervisor to inspect the contents of a VM in real time. Modify the CPU masks of tasks in the VM in order to create an illusion of a "big vCPU" and a "LITTLE vCPU". The hypervisor scheduler can then assign each vCPU to the corresponding big or LITTLE cores.

13 Hypervisor Architecture with VMI
The VM Introspector gathers task information from the Guest OS. A Task-to-vCPU Mapper modifies the CPU mask of each task according to that information, so tasks with low computing resource requirements are placed on one vCPU (treated as a LITTLE core) and tasks with high computing resource requirements on another (treated as a big core). The hypervisor b-L vCPU scheduler then schedules big vCPUs to the Cortex-A15 and LITTLE vCPUs to the Cortex-A7.

14 Hypervisor Architecture with VMI(Cont.)
Guest OS 1 has two tasks with high computing resource requirements and two tasks with low computing resource requirements; Guest OS 2 has two tasks with low computing resource requirements. As before, the VM Introspector gathers task information from each Guest OS, the Task-to-vCPU Mapper modifies the CPU mask of each task accordingly, and the hypervisor b-L vCPU scheduler schedules big vCPUs to the Cortex-A15 and LITTLE vCPUs to the Cortex-A7.

15 Hypervisor Scheduler Assigns the virtual cores to physical cores for execution. Determines the execution order and the amount of time assigned to each virtual core according to a scheduling policy. Xen: credit-based scheduler. KVM: Completely Fair Scheduler.

16 Virtual Core Scheduling Problem
For every time period, the hypervisor scheduler is given a set of virtual cores. Given the operating frequency of each virtual core, the scheduler will generate a scheduling plan, such that the power consumption is minimized, and the performance is guaranteed.

17 Scheduling Plan Must satisfy three constraints.
Each virtual core should run on each physical core for a certain amount of time to satisfy the workload requirement. A virtual core can run only on a single physical core at any time. The virtual core should not switch among physical cores frequently, so as to reduce the overheads.
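As an illustration, the first two constraints can be checked mechanically. The sketch below is an assumption for illustration only: it represents a plan as (vCPU, pCPU, start, end) slices, which is not necessarily the project's actual data structure; the third constraint (few core switches) is an optimization objective rather than a hard check.

```python
def check_plan(slices, required, interval=100):
    """Check a scheduling plan against the hard constraints.

    slices: list of (vcpu, pcpu, start, end) execution slices.
    required: dict mapping vcpu -> total time it must receive.
    """
    # Constraint 1: each virtual core receives its required time.
    assigned = {}
    for vcpu, pcpu, start, end in slices:
        assigned[vcpu] = assigned.get(vcpu, 0) + (end - start)
    if any(assigned.get(v, 0) < t for v, t in required.items()):
        return False

    # A physical core's load cannot exceed 100% of the interval.
    by_pcpu = {}
    for vcpu, pcpu, start, end in slices:
        by_pcpu[pcpu] = by_pcpu.get(pcpu, 0) + (end - start)
    if any(t > interval for t in by_pcpu.values()):
        return False

    # Constraint 2: a virtual core never runs on two physical
    # cores at the same time (no overlapping slices per vCPU).
    by_vcpu = {}
    for vcpu, pcpu, start, end in slices:
        by_vcpu.setdefault(vcpu, []).append((start, end))
    for intervals in by_vcpu.values():
        intervals.sort()
        for (s1, e1), (s2, e2) in zip(intervals, intervals[1:]):
            if s2 < e1:
                return False
    return True
```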

18 Core Models There are two types of cores: virtual cores and physical cores. v_j: frequency of virtual core j. f_i: frequency of physical core i. t_i, t_j: type of the core.

19 Power Model To decide on a power model, we conducted preliminary experiments measuring the power consumption of cores on an ODROID-XU board.

20 Result – bzip2

21 Power Model (Cont.) The power consumption of a physical core is a function of the core type, the core frequency, and the load of the core. The load of a core is the percentage of time the core spends executing virtual cores.
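A minimal sketch of such a power model, assuming (as slide 28 later allows) that power grows linearly with load at a fixed core type and frequency. The coefficient table below contains made-up placeholder numbers, not the values measured on the ODROID-XU board:

```python
# Illustrative power model: P(type, freq, load) = idle + slope * load,
# where idle and slope depend on core type and frequency.
# The coefficients are placeholders, NOT measured ODROID-XU values.
COEFFS = {
    # (core_type, freq_MHz): (idle_watts, slope_watts_at_full_load)
    ("big", 1200): (0.30, 1.50),
    ("big", 1600): (0.45, 2.40),
    ("LITTLE", 1000): (0.05, 0.35),
    ("LITTLE", 1200): (0.08, 0.50),
}

def core_power(core_type, freq, load):
    """Power (watts) of one physical core at a given load (0.0-1.0)."""
    idle, slope = COEFFS[(core_type, freq)]
    return idle + slope * load
```

With this shape, a LITTLE core at full load still draws less than a big core at the same frequency, which is what makes packing light workloads onto LITTLE cores attractive.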

22 Performance The ratio of the computing resource assigned to the computing resource requested. Ex: a virtual core running at 800 MHz runs on a 1200 MHz physical core for 60% of a time interval. The performance of this virtual core is 0.6 * 1200 / 800 = 0.9.
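The slide's definition translates directly into a one-line function; the example below reproduces the slide's numbers:

```python
def performance(vcpu_freq, pcpu_freq, share):
    """Performance ratio: resource assigned / resource requested.

    vcpu_freq: operating frequency requested by the virtual core (MHz).
    pcpu_freq: frequency of the physical core it runs on (MHz).
    share: fraction of the time interval the vCPU actually runs.
    """
    return share * pcpu_freq / vcpu_freq

# The slide's example: an 800 MHz vCPU on a 1200 MHz pCPU
# for 60% of the interval gives 0.6 * 1200 / 800 = 0.9.
ratio = performance(800, 1200, 0.6)
```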

23 Optimization Problem We can formulate the virtual core scheduling problem as an optimization problem over the n physical cores. The objective function is the sum of the power consumption of each physical core; we want a scheduling plan that minimizes it. A scheduling plan consists of the values a_{i,j}, where a_{i,j} is the amount of time physical core i spends executing virtual core j in a time interval, subject to some constraints on a_{i,j}.

24 Constraints Equal performance for each virtual core. When the physical cores can provide more resource than the total requested (resource sufficient), every virtual core should have performance 1; when they cannot (resource insufficient), the resource should be distributed among the cores so that every virtual core has the same performance, less than 1. In addition, the time assigned to a virtual core on every physical core should be less than a time interval.

25 Constraints (Cont.) A physical core has a fixed amount of computing resources in a time interval. In other words, the load of a physical core must not exceed 100%.
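Putting slides 23-25 together, the optimization problem can be written compactly. This is a reconstruction from the slides' prose (T is the interval length, r the common performance ratio, P the power model of slide 21); the project's exact formulation may differ:

```latex
\min_{a_{i,j}\,\ge\,0}\; \sum_{i=1}^{n} P\!\left(t_i,\; f_i,\; \frac{1}{T}\sum_{j} a_{i,j}\right)
\quad \text{subject to} \quad
\frac{1}{T\,v_j}\sum_{i} a_{i,j}\, f_i \;=\; r \;\;\forall j, \qquad
\sum_{i} a_{i,j} \,\le\, T \;\;\forall j, \qquad
\sum_{j} a_{i,j} \,\le\, T \;\;\forall i,
```

where r = 1 when resources are sufficient, and r < 1 (equal for all virtual cores) when they are not. The first constraint is the equal-performance requirement, the second bounds the time a virtual core receives by one interval, and the third caps each physical core's load at 100%.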

26 Three-phase Solution [Phase 1] generates the amount of time each virtual core should run on each physical core. [Phase 2] determines the execution order of each virtual core on a physical core. [Phase 3] exchanges the order of execution slices in order to reduce the number of core switches. The reason we separate the solution into three phases is that, even knowing the amount of time each virtual core should run on each physical core, it is difficult to satisfy the remaining two constraints simultaneously: a virtual core can only run on a single physical core at any time, and a virtual core should not switch among physical cores frequently. Instead, we formulate the first of these as an edge-coloring problem and solve it with an efficient algorithm, then adjust the resulting schedule to reduce the number of core switches and satisfy the second.

27 Phase 1 Given the objective function and the constraints, we can use integer programming to find the a_{i,j}. a_{i,j}: the number of time slices physical core i spends executing virtual core j, where a time interval is divided into time slices. Integer programming can find a feasible solution in a short time when the numbers of vCPUs and pCPUs are small constants.

28 Phase 1 (Cont.) If the relationship between power and load is linear, we can use a greedy algorithm instead: assign each virtual core to the physical core with the lowest power/instruction ratio whose load is under 100%.
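A hedged sketch of this greedy pass. The representation (lists of tuples) and the efficiency numbers in the test are illustrative assumptions; the real Phase 1 works on time slices and the measured power model:

```python
def greedy_assign(vcpus, pcpus, interval=100):
    """Greedy Phase-1 sketch for a linear power/load relationship.

    vcpus: list of (name, freq_MHz) virtual cores.
    pcpus: list of (name, freq_MHz, power_per_instruction) physical cores.
    Returns {(pcpu_name, vcpu_name): time_on_that_pcpu}.
    """
    # Prefer the most efficient (lowest power/instruction) cores first.
    pcpus = sorted(pcpus, key=lambda p: p[2])
    free = {p[0]: interval for p in pcpus}   # spare time per pCPU
    alloc = {}
    for vname, vfreq in vcpus:
        need = interval                      # demand, in vCPU-time units
        for pname, pfreq, _ in pcpus:
            if need <= 0:
                break
            # pCPU time needed to deliver the remaining demand,
            # capped by the core's spare time (load under 100%)
            t = min(free[pname], need * vfreq / pfreq)
            if t > 0:
                alloc[(pname, vname)] = alloc.get((pname, vname), 0) + t
                free[pname] -= t
                need -= t * pfreq / vfreq
        # if need > 0 here, resources are insufficient (performance < 1)
    return alloc
```

For example, a single 800 MHz vCPU fits entirely on a 1000 MHz LITTLE core using 80% of the interval, so the more power-hungry big core is never touched.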

29 Phase 2 With the information from Phase 1, the scheduler has to determine the execution order of each virtual core on each physical core. A virtual core cannot appear on two or more physical cores at the same time.

30 Example Per-vCPU time allocations over the interval t=0 to t=100 (one entry per physical core): vCPU0 (50,40,0,0), vCPU1 (20,20,20,20), vCPU2 (10,10,20,20), (10,10,10,10), vCPU5 (0,0,10,10).

31 Phase 2 (Cont.) We can formulate the problem as an open-shop scheduling problem (OSSP). OSSP with preemption can be solved in polynomial time. [1] T. Gonzalez and S. Sahni. Open shop scheduling to minimize finish time. J. ACM, 23(4):665–679, Oct. 1976.

32 After Phase 1 & 2 After the first two phases, the scheduler generates a scheduling plan: a table of execution slices (t1, t2, …, t100) assigning to each physical core (Core0–Core3) a virtual core (V1–V4) or an idle slot (x) in each slice.

33 Phase 3 Migrating tasks between cores incurs overhead.
We reduce this overhead by exchanging the order of execution slices to minimize the number of core switches.

34 Number of Switching Minimization Problem
Given a scheduling plan, we want to find an order of the execution slices such that the cost is minimized. This is an NP-complete problem (by reduction from the Hamiltonian Path problem), so we propose a greedy heuristic.
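A sketch of one such greedy heuristic, under the assumption (ours, for illustration) that a "switch" is a vCPU that runs in two adjacent slices but on different physical cores, and that the heuristic repeatedly appends the remaining slice that adds the fewest switches:

```python
def switches(a, b):
    """Count vCPUs that run in both slices but on different pCPUs.

    A slice is a tuple: slice[i] is the vCPU on pCPU i ('x' = idle).
    """
    pos_a = {v: i for i, v in enumerate(a) if v != "x"}
    return sum(1 for i, v in enumerate(b)
               if v != "x" and v in pos_a and pos_a[v] != i)

def order_slices(slices):
    """Greedy ordering: start from the first slice, then repeatedly
    append the remaining slice that adds the fewest core switches."""
    remaining = list(slices)
    plan = [remaining.pop(0)]
    while remaining:
        best = min(remaining, key=lambda s: switches(plan[-1], s))
        remaining.remove(best)
        plan.append(best)
    return plan
```

The heuristic is not optimal (the underlying ordering problem is NP-complete), but each step is cheap: O(k) switch counts per appended slice for k remaining slices.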

35–42 Example A step-by-step walkthrough of the greedy heuristic on three physical cores (p1–p3), with execution slices containing vCPUs 1–4 and idle slots (x). Slices are placed one at a time (t1, t2, …, t6), at each step choosing the slice that adds the fewest core switches; the finished schedule incurs 7 switches.

43 Xen Hypervisor Scheduler: Code Study

44 Xen Hypervisor Scheduler: xen/common/schedule.c, sched_credit.c, sched_sedf.c, sched_arinc653.c

45 xen/common/schedule.c
Generic CPU scheduling code that implements support functionality for the Xen scheduler API. The default scheduler is the credit-based scheduler. static void schedule(void): de-schedules the current domain and picks a new one.

46 Scheduling Steps Xen calls do_schedule() of the current scheduler on each physical CPU (PCPU). The scheduler selects a virtual CPU (VCPU) from its run queue and returns it to the Xen hypervisor. The Xen hypervisor deploys that VCPU on the current PCPU.

47 Adding Our Scheduler Our scheduler periodically generates a scheduling plan and organizes the run queue of each physical core according to that plan. The Xen hypervisor then assigns VCPUs to PCPUs according to the run queues.
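The plan-to-run-queue step can be sketched as follows. This is an illustrative Python model, not Xen code: the function names and the slice representation are assumptions, and Xen's real do_schedule() hook returns a task slice with a time budget rather than a bare vCPU:

```python
from collections import deque

def build_runqueues(plan, num_pcpus):
    """Turn an ordered scheduling plan into per-PCPU run queues.

    plan: list of execution slices; slice[i] is the vCPU to run
    on pCPU i during that slice ('x' = idle).
    """
    runq = [deque() for _ in range(num_pcpus)]
    for s in plan:
        for i, vcpu in enumerate(s):
            runq[i].append(vcpu)
    return runq

def do_schedule(runq, pcpu):
    """Pick the next vCPU for a pCPU, modeled after the role of
    Xen's do_schedule() hook (names here are illustrative)."""
    return runq[pcpu].popleft() if runq[pcpu] else "idle"
```

Because the plan already fixes which vCPU occupies which core in each slice, the per-core pick becomes a constant-time dequeue, which is what lets the hypervisor follow the precomputed schedule cheaply.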

48 Current Status Implementing our scheduler on Xen. TODOs: the scheduler itself; fetching virtual core frequencies in Xen; inserting a periodic routine in Xen; a virtual core idle and wake-up mechanism in our scheduler.

