Research on Embedded Hypervisor Scheduler Techniques 2014/10/02


Background Asymmetric multi-core is becoming increasingly popular over homogeneous multi-core systems. ◦ An asymmetric multi-core platform consists of cores with different capabilities.  ARM: big.LITTLE architecture.  Qualcomm: asynchronous Symmetrical Multi-Processing (aSMP)  Nvidia: variable Symmetric Multiprocessing (vSMP)  …etc.

Motivation Scheduling goals differ between homogeneous and asymmetric multi-core platforms. ◦ Homogeneous multi-core: load balancing.  Distribute workloads evenly in order to obtain maximum performance. ◦ Asymmetric multi-core: maximize power efficiency with modest performance sacrifices.

Motivation (Cont.) New scheduling strategies are needed for asymmetric multi-core platforms. ◦ The power and computing characteristics vary across different types of cores. ◦ Take these differences into consideration while scheduling.

Project Goal Research the current scheduling algorithms for homogeneous and asymmetric multi-core architectures. Design and implement a hypervisor scheduler for an asymmetric multi-core platform. Assign virtual cores to physical cores for execution. Minimize the power consumption with a performance guarantee.

Hypervisor Architecture with VMI [Architecture diagram] ◦ Each guest runs a Linaro Linux kernel with the Android framework; guests have different computing resource requirements. ◦ The VM Introspector (VMI) gathers task information from the guest OS. ◦ The Task-to-vCPU Mapper modifies the CPU mask of each task according to the task information from the VMI, so tasks with low computing requirements are gathered on a vCPU that is treated as a LITTLE core. ◦ The hypervisor b-L vCPU scheduler schedules big vCPUs to ARM Cortex-A15 cores and LITTLE vCPUs to Cortex-A7 cores.

Hypervisor Scheduler Assigns the virtual cores to physical cores for execution. ◦ Determines the execution order and the amount of time assigned to each virtual core according to a scheduling policy. ◦ Xen: credit-based scheduler. ◦ KVM: completely fair scheduler.

Virtual Core Scheduling Problem For every time period, the hypervisor scheduler is given a set of virtual cores. Given the operating frequency of each virtual core, the scheduler generates a scheduling plan such that the power consumption is minimized and the performance is guaranteed.

Scheduling Plan Must satisfy three constraints. ◦ Each virtual core should run on each physical core for a certain amount of time to satisfy its workload requirement. ◦ A virtual core can run on only a single physical core at any time. ◦ A virtual core should not switch among physical cores frequently, so as to reduce overhead.
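The first two constraints can be checked mechanically on a candidate plan. A minimal sketch (the plan representation below, one list of slices per physical core with None meaning idle, is an illustrative assumption, not the actual hypervisor data structure):

```python
# Check a candidate scheduling plan against the first two constraints.
# plan: {pcpu id: [vcpu id or None per execution slice]}
# required: {(vcpu, pcpu): number of slices vcpu must run on pcpu}

def satisfies_plan(plan, required):
    # Constraint 1: each virtual core gets its required time on each physical core.
    for (vcpu, pcpu), need in required.items():
        if plan[pcpu].count(vcpu) != need:
            return False
    # Constraint 2: a virtual core never runs on two physical cores in the same slice.
    n_slices = len(next(iter(plan.values())))
    for t in range(n_slices):
        running = [cores[t] for cores in plan.values() if cores[t] is not None]
        if len(running) != len(set(running)):
            return False
    return True

# The first four slices of the example plan below (x -> None).
plan = {
    0: ["V4", None, None, "V4"],
    1: ["V3", None, "V3", None],
    2: ["V2", "V4", "V1", "V2"],
    3: ["V1", "V1", "V2", "V3"],
}
required = {("V4", 0): 2, ("V4", 2): 1, ("V3", 1): 2, ("V3", 3): 1,
            ("V2", 2): 2, ("V2", 3): 1, ("V1", 2): 1, ("V1", 3): 2}
print(satisfies_plan(plan, required))  # True for this plan
```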

Example of A Scheduling Plan (x: physical core idle)

         t1  t2  t3  t4  t5  …  t100
Core 0:  V4  x   x   V4  x   …
Core 1:  V3  x   V3  x   x   …
Core 2:  V2  V4  V1  V2  V4  …
Core 3:  V1  V1  V2  V3  V1  …

Three-phase Solution [Phase 1] Generate the amount of time each virtual core should run on each physical core. [Phase 2] Determine the execution order of the virtual cores on each physical core. [Phase 3] Exchange the order of execution slices in order to reduce the number of core switches.

Phase 1 Given the objective function and the constraints, we can use integer programming to find a_{i,j}. ◦ a_{i,j}: the number of time slices virtual core i should run on physical core j.  Divide a time interval into time slices. ◦ Integer programming can find a feasible solution in a short time when the numbers of vCPUs and pCPUs are small constants.
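The slide does not spell out the objective and constraints; one plausible integer-programming formulation (the symbols r_i, P_j, T and the linear objective are assumptions for illustration) is:

```latex
\min \sum_{i}\sum_{j} P_{j}\, a_{i,j}
\quad \text{subject to} \quad
\sum_{j} a_{i,j} = r_i \;\; \forall i,
\qquad
\sum_{i} a_{i,j} \le T \;\; \forall j,
\qquad
a_{i,j} \in \mathbb{Z}_{\ge 0},
```

where r_i is the number of slices vCPU i requires per interval (derived from its operating frequency), T is the number of slices in the interval, and P_j approximates the power cost of running one slice on physical core j.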

Phase 1 (Cont.) If the relationship between power and load is linear: ◦ Use a greedy method instead. ◦ Assign each virtual core to the physical core with the least power/instruction ratio and load under 100%.
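A sketch of this greedy rule, in a simplified variant that assigns each virtual core wholly to one physical core (the core names, ratios, and demands below are made-up illustrations, not measured values):

```python
# Greedy phase-1 assignment for a linear power/load relation:
# place each virtual core on the physical core with the lowest
# power-per-instruction ratio that still has spare capacity (< 100% load).

def greedy_assign(vcores, pcores):
    """vcores: {name: demand in MHz}; pcores: {name: (ratio, capacity_mhz)}."""
    # Try the cheapest cores (lowest power/instruction ratio) first.
    order = sorted(pcores, key=lambda p: pcores[p][0])
    load = {p: 0 for p in pcores}
    plan = {}
    # Place the most demanding virtual cores first.
    for v, demand in sorted(vcores.items(), key=lambda kv: -kv[1]):
        for p in order:
            ratio, cap = pcores[p]
            if load[p] + demand <= cap:      # keep load under 100%
                load[p] += demand
                plan[v] = p
                break
        else:
            raise RuntimeError(f"no physical core can host {v}")
    return plan

# Illustrative platform: LITTLE cores are cheaper per instruction,
# big cores have more capacity (numbers are assumptions).
pcores = {"A7-0": (1.0, 600), "A7-1": (1.0, 600),
          "A15-0": (2.5, 1600), "A15-1": (2.5, 1600)}
vcores = {"v0": 1200, "v1": 600, "v2": 250, "v3": 250}
print(greedy_assign(vcores, pcores))
```

Only the heavy 1200 MHz virtual core lands on a big core; the rest fit on the cheaper LITTLE cores.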

Phase 2 With the information from phase 1, the scheduler has to determine the execution order of the virtual cores on each physical core. ◦ A virtual core cannot appear on two or more physical cores at the same time.

Example Per-physical-core time requirements over the interval t = 0 to t = 100: vCPU 0 (50, 40, 0, 0), vCPU 1 (20, 20, 20, 20), vCPU 2 (10, 10, 20, 20), vCPU 3 (10, 10, 20, 20), vCPU 4 (10, 10, 10, 10), vCPU 5 (0, 0, 10, 10).

Phase 2 (Cont.) We can formulate the problem as an open-shop scheduling problem (OSSP). ◦ OSSP with preemption can be solved in polynomial time. [1] [1] T. Gonzalez and S. Sahni. Open shop scheduling to minimize finish time. J. ACM, 23(4):665–679, Oct. 1976.

After Phase 1 & 2 After the first two phases, the scheduler generates a scheduling plan. (x: physical core idle)

         t1  t2  t3  t4  t5  …  t100
Core 0:  V4  x   x   V4  x   …
Core 1:  V3  x   V3  x   x   …
Core 2:  V2  V4  V1  V2  V4  …
Core 3:  V1  V1  V2  V3  V1  …

Phase 3 Migrating tasks between cores incurs overhead. Reduce the overhead by exchanging the order of execution slices to minimize the number of core switches.

Number of Switching Minimization Problem Given a scheduling plan, we want to find an order of the execution slices such that the cost is minimized. ◦ An NP-complete problem.  Reduction from the Hamiltonian Path Problem. ◦ We propose a greedy heuristic.

Example The greedy heuristic builds the order one execution slice at a time, always appending a remaining slice that adds as few core switches as possible (x: physical core idle). Starting from t1, adding t2 costs nothing, and adding t3 costs a single switch:

      p1  p2  p3
t1:   x   x   1
t2:   x   2   3
t3:   1   2   3      #switching = 1

Completing the order with the remaining slices gives:

      p1  p2  p3
t1:   x   x   1
t2:   x   2   3
t3:   1   2   3
t4:   4   3   2
t5:   3   1   2
t6:   2   1   3      #switching = 7
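The greedy construction illustrated above can be sketched as follows. The exact cost function is an assumption: here a switch is counted whenever a virtual core reappears on a different physical core than its previous appearance.

```python
# Greedy ordering of execution slices to reduce core switches.
# A slice is a tuple over physical cores: slice[p] = vcpu id or None (idle).

def switches(order):
    """Count how often a vCPU reappears on a different physical core."""
    last_pcpu, count = {}, 0
    for sl in order:
        for p, v in enumerate(sl):
            if v is None:
                continue
            if v in last_pcpu and last_pcpu[v] != p:
                count += 1
            last_pcpu[v] = p
    return count

def greedy_order(slices):
    """Repeatedly append the remaining slice that adds the fewest switches."""
    remaining = list(slices)
    order = [remaining.pop(0)]
    while remaining:
        best = min(remaining, key=lambda sl: switches(order + [sl]))
        remaining.remove(best)
        order.append(best)
    return order

# The six slices from the example (x -> None).
slices = [(None, None, 1), (None, 2, 3), (1, 2, 3),
          (4, 3, 2), (3, 1, 2), (2, 1, 3)]
order = greedy_order(slices)
print(switches(order), "switches after greedy ordering")
```

Under this cost definition the heuristic reproduces the example's 7 switches for these six slices.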

Evaluation Conduct simulations to compare the power consumption of our asymmetry-aware scheduler with that of a credit-based scheduler. Compare the number of core switches produced by our greedy heuristic with that of an optimal solution.

Environment Two types of physical cores: ◦ power-hungry “big” cores  frequency: 1600 MHz ◦ power-efficient “little” cores  frequency: 600 MHz ◦ The DVFS mechanism is disabled.

Power Model Relation between power consumption, core frequency, and load. ◦ Measured with the bzip2 benchmark. [power–load curve figure omitted]
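The measured bzip2 curve is not reproduced in the transcript. As an illustration of the kind of model such measurements yield, a simple affine power-versus-load model is sketched below; the coefficients are placeholders, not the slide's measurements.

```python
# Illustrative affine power model: P(freq, load) = P_idle(freq) + k(freq) * load.
# Coefficients are placeholders, NOT the bzip2 measurements from the slide.

COEFFS = {  # freq (MHz) -> (idle power in W, slope in W per unit load)
    600:  (0.10, 0.35),   # power-efficient little core
    1600: (0.45, 1.60),   # power-hungry big core
}

def power(freq_mhz, load):
    """Estimated core power in watts for a load in [0, 1]."""
    idle, slope = COEFFS[freq_mhz]
    return idle + slope * load

print(power(600, 0.5), power(1600, 0.5))
```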

Scenario I – 2 Big and 2 Little Dual-core VMs. Two sets of input: ◦ Case 1: Both VMs with light workloads.  250 MHz for each virtual core. ◦ Case 2: One VM with heavy workloads, the other with modest workloads.  Heavy: 1200 MHz for each virtual core.  Modest: 600 MHz for each virtual core.

Scenario I – Results ◦ Case 1: the power consumed by the asymmetry-aware method is about 43.2% of that of the credit-based method. ◦ Case 2: the asymmetry-aware method uses 95.6% of the energy used by the credit-based method.

                                             Power (Watt)
Case 1 (light-load VMs)          Asymmetry-aware  0.295
                                 Credit-based     0.683
Case 2 (heavy + modest-load VM)  Asymmetry-aware  2.382
                                 Credit-based     2.491

Scenario 2 – 4 Big and 4 Little Quad-core VMs (VM1, VM2, VM3). Three cases: ◦ Case 1 (light-load): all virtual cores at 250 MHz. ◦ Case 2 (modest-load): virtual cores at 600 MHz and 250 MHz. ◦ Case 3 (heavy-load): all virtual cores at 1600 MHz.

Scenario 2 – Results  In case 3, the loading of the physical cores is 100% using both methods.  Power cannot be saved if the computing resources are not sufficient.

                       Power (Watt)                          Savings
Case 1 (light-load)    Asymmetry-aware —  Credit-based 2.049    —
Case 2 (modest-load)   Asymmetry-aware —  Credit-based 3.960    —
Case 3* (heavy-load)   Asymmetry-aware —  Credit-based 6.009    —

Evaluation Compare the number of core switches produced by our greedy heuristic with that of an optimal solution.

Setting 25 sets of input. ◦ 4 physical cores, 12 virtual cores, 24 distinct execution slices. Optimal solution: ◦ Enumerate all possible permutations of the execution slices. ◦ Use A* search to reduce the search space.

Evaluation Result

                              Greedy Heuristic   A* Search
Average number of switchings        —                 —
Average execution time           seconds          10+ minutes

XEN HYPERVISOR SCHEDULER: CODE STUDY

Xen Hypervisor Scheduler: ◦ xen/common/  schedule.c  sched_credit.c  sched_credit2.c  sched_sedf.c  sched_arinc653.c

xen/common/schedule.c Generic CPU scheduling code. ◦ Implements support functionality for the Xen scheduler API. ◦ Scheduler: defaults to the credit-based scheduler. static void schedule(void) ◦ De-schedules the current domain. ◦ Picks a new domain.

xen/common/sched_credit.c Credit-based SMP CPU scheduler. static struct task_slice csched_schedule; ◦ Implementation of credit-based scheduling. ◦ SMP load balancing.  If the next highest-priority local runnable VCPU has already eaten through its credits, look on other PCPUs to see if we have more urgent work.

xen/common/sched_credit2.c Credit-based SMP CPU scheduler. ◦ Based on an earlier version. static struct task_slice csched2_schedule; ◦ Selects the next runnable local VCPU (i.e., the top of the local run queue). static void balance_load(const struct scheduler *ops, int cpu, s_time_t now);

Scheduling Steps Xen calls do_schedule() of the current scheduler on each physical CPU (PCPU). The scheduler selects a virtual CPU (VCPU) from its run queue and returns it to the Xen hypervisor. The Xen hypervisor dispatches the VCPU to the current PCPU.

Adding Our Scheduler Our scheduler periodically generates a scheduling plan. Organize the run queue of each physical core according to the scheduling plan. The Xen hypervisor then assigns VCPUs to PCPUs according to the run queues.
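The insertion point described above might be sketched as follows. This is illustrative Python, not Xen code: the real change would live in the C schedulers listed earlier, and all names here are assumptions.

```python
# Sketch: turn a scheduling plan into per-PCPU run queues, so that a
# do_schedule()-style hook can simply pop the next vCPU for its PCPU.
from collections import deque

def build_runqueues(plan):
    """plan: {pcpu: [vcpu-or-None per slice]} -> {pcpu: deque of slices}."""
    return {p: deque(slices) for p, slices in plan.items()}

def do_schedule(runqueues, pcpu):
    """Return the vCPU the given physical core should run next (None = idle)."""
    q = runqueues[pcpu]
    if not q:
        return None           # plan exhausted; wait for the next plan
    return q.popleft()

runqueues = build_runqueues({0: ["V4", None, "V4"], 1: ["V3", "V3", None]})
print(do_schedule(runqueues, 0), do_schedule(runqueues, 1))
```

The per-core deque preserves the slice order chosen in phases 2 and 3, so the hypervisor only ever needs to look at the head of each queue.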

Current Status We propose a three-phase solution for generating a scheduling plan on an asymmetric multi-core platform. Our simulation results show that the asymmetry-aware strategy yields potential energy savings of up to 56.8% against the credit-based method. Ongoing: implementing the solution in the Xen hypervisor.

Questions or Comments?