Adding a Scheduling Policy to the Linux Kernel By Juan M. Banda CS518 Advanced Operating Systems.


1 Adding a Scheduling Policy to the Linux Kernel By Juan M. Banda CS518 Advanced Operating Systems

2 Presentation Outline
- Introduction
- Project Description / Challenges
- Background Information
- Project Steps
- Achievements
- References

3 Introduction
What is Linux?
- Operating system for computers, comparable to Windows or Mac OS X
- Started in 1991 by Finnish programmer Linus Torvalds, with the assistance of developers from around the globe
- Runs on a wide variety of hardware platforms, from huge mainframes to desktop PCs to cell phones
- Licensed under the GNU General Public License, version 2 (published by the Free Software Foundation), which lets users modify and redistribute the software
- You can think of Linux as having two parts: a kernel, which is the basic interface between the hardware and other system software, and the programs that run on top of it, such as a graphical user interface (GUI) and application programs

4 Project Description / Challenges
- Idea: Implement a new scheduling policy
- Purpose: The new policy should schedule processes in the background
- Problem 1: SCHED_IDLE already does this
- Modification: The policy should schedule processes at a lower priority than SCHED_IDLE
- Problem 2: The Kernel 2.6 scheduler is considerably different from the one in Kernel 2.4

5 Background Information
Kernel 2.4 scheduler major features:
- An O(n) scheduler: it goes through the entire global runqueue to determine the next task to run, so the time taken is proportional to n, the number of active processes in the system
- A global runqueue shared by all processors in a symmetric multiprocessing (SMP) system: while one CPU was choosing a task, the other CPUs had to wait
- A single runqueue also meant a task could be scheduled on any processor, which can be good for load balancing but bad for memory caches. For example, suppose a task executed on CPU-1, and its data was in that processor's cache. If the task got rescheduled to CPU-2, its data would need to be invalidated in CPU-1 and brought into CPU-2
- This led to large performance hits during heavy workloads
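To make the O(n) cost concrete, here is a minimal user-space sketch in C. It is not kernel code: the `demo_task` struct, its `goodness` score and `pick_next_linear` are hypothetical names chosen for illustration. The only point is that every runnable task is examined on each scheduling decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task record (not the kernel's task_struct). */
struct demo_task {
    int pid;
    int goodness;   /* assumed desirability score; the highest score runs next */
};

/*
 * Walk the whole global runqueue and return the index of the task with
 * the highest score. The loop visits all n entries, so the cost of one
 * scheduling decision grows linearly with the number of runnable tasks.
 */
size_t pick_next_linear(const struct demo_task *rq, size_t n)
{
    assert(rq != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (rq[i].goodness > rq[best].goodness)
            best = i;
    }
    return best;
}
```

Doubling the number of entries in `rq` doubles the work done by `pick_next_linear`, which is the behaviour the 2.6 scheduler set out to remove.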

6 Background Information
Kernel 2.4 Scheduler Policies:
- SCHED_FIFO: a First-In, First-Out real-time process. When the scheduler assigns the CPU to the process, it leaves the process descriptor in its current position in the runqueue list. If no other higher-priority real-time process is runnable, the process will continue to use the CPU as long as it wishes, even if other real-time processes having the same priority are runnable

7 Background Information
- SCHED_RR: a Round Robin real-time process. When the scheduler assigns the CPU to the process, it puts the process descriptor at the end of the runqueue list. This policy ensures a fair assignment of CPU time to all SCHED_RR real-time processes that have the same priority
- SCHED_OTHER: a conventional, time-shared process
- The policy field also encodes a SCHED_YIELD binary flag. This flag is set when the process invokes the sched_yield() system call (a way of voluntarily relinquishing the processor without the need to start an I/O operation or go to sleep). The scheduler then puts the process descriptor at the bottom of the runqueue list

8 Background Information
Kernel 2.6
- The 2.6 scheduler was designed and implemented by Ingo Molnar. His motivation in working on the new scheduler was to create a completely O(1) scheduler for wakeup, context-switch, and timer interrupt overhead
- One of the issues that triggered the need for a new scheduler was the use of Java virtual machines (JVMs). The Java programming model uses many threads of execution, which results in lots of overhead for scheduling in an O(n) scheduler
- Each CPU has a runqueue made up of 140 priority lists that are serviced in FIFO order. Tasks that are scheduled to execute are added to the end of their respective runqueue's priority list
- Each task has a time slice that determines how much time it's permitted to execute
- The first 100 priority lists of the runqueue are reserved for real-time tasks, and the last 40 are used for user tasks (MAX_RT_PRIO=100 and MAX_PRIO=140)

9 Background Information
- In addition to the CPU's runqueue, which is called the active runqueue, there's also an expired runqueue
- When a task on the active runqueue uses all of its time slice, it's moved to the expired runqueue. During the move, its time slice is recalculated (and so is its priority)
- Once no tasks remain on the active runqueue, the pointers for the active and expired runqueues are swapped, thus making the expired runqueue the active one
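The active/expired switch can be sketched as a plain pointer swap. This is a user-space illustration under stated assumptions, not the kernel's code: `demo_prio_array`, `demo_runqueue` and both functions are hypothetical names, and the per-priority lists are reduced to a single task counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a priority array; only the task count matters here. */
struct demo_prio_array {
    unsigned int nr_active;
};

struct demo_runqueue {
    struct demo_prio_array *active;    /* tasks that still have time slice left */
    struct demo_prio_array *expired;   /* tasks that used up their time slice */
    struct demo_prio_array arrays[2];  /* backing storage for both roles */
};

void demo_rq_init(struct demo_runqueue *rq)
{
    assert(rq != NULL);
    rq->arrays[0].nr_active = 0;
    rq->arrays[1].nr_active = 0;
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

/*
 * If the active array has no tasks left, swap the two pointers so the
 * expired array becomes the active one. No task is copied, so the cost
 * does not depend on how many tasks are waiting. Returns 1 if a swap
 * happened, 0 otherwise.
 */
int demo_rq_switch_if_empty(struct demo_runqueue *rq)
{
    assert(rq != NULL);
    if (rq->active->nr_active != 0)
        return 0;
    struct demo_prio_array *tmp = rq->active;
    rq->active = rq->expired;
    rq->expired = tmp;
    return 1;
}
```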

10 Background Information
O(1) Algorithm (constant-time algorithm)
- Choose the task on the highest priority list to execute
- To make this process more efficient, a bitmap is used to define when tasks are on a given priority list
- On most architectures, a find-first-bit-set instruction is used to find the highest priority bit set in one of five 32-bit words (for the 140 priorities)
- The time it takes to find a task to execute depends not on the number of active tasks but instead on the number of priorities
- This makes the 2.6 scheduler an O(1) process because the time to schedule is both fixed and deterministic regardless of the number of active tasks
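The bitmap lookup can be sketched as follows. This is a simplified user-space illustration, assuming one bit per priority level and that a lower index means a higher priority; the `demo_` names are hypothetical, and the inner loop is a portable stand-in for the hardware find-first-bit-set instruction the slide mentions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_PRIO     140
#define DEMO_BITMAP_WORDS ((DEMO_MAX_PRIO + 31) / 32)   /* five 32-bit words */

struct demo_bitmap {
    uint32_t word[DEMO_BITMAP_WORDS];
};

/* Record that at least one task is queued at the given priority level. */
void demo_mark_runnable(struct demo_bitmap *bm, int prio)
{
    assert(prio >= 0 && prio < DEMO_MAX_PRIO);
    bm->word[prio / 32] |= (uint32_t)1 << (prio % 32);
}

/*
 * Return the lowest set bit index, which is the highest priority level
 * that has a runnable task, or -1 if no bit is set. The work is bounded
 * by the bitmap size (5 words of 32 bits), never by the task count.
 */
int demo_find_first_prio(const struct demo_bitmap *bm)
{
    for (int w = 0; w < DEMO_BITMAP_WORDS; w++) {
        uint32_t bits = bm->word[w];
        if (bits == 0)
            continue;
        int bit = 0;
        while ((bits & 1u) == 0) {   /* stand-in for a find-first-set instruction */
            bits >>= 1;
            bit++;
        }
        return w * 32 + bit;
    }
    return -1;
}
```

Whether 10 or 10,000 tasks are queued, `demo_find_first_prio` inspects the same five words, which is what makes the selection step constant time.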

11 Background Information
SMP Support:
- Even though the prior scheduler worked in SMP systems, its big-lock architecture meant that while a CPU was choosing a task to dispatch, the runqueue was locked by the CPU, and others had to wait
- The 2.6 scheduler doesn't use a single lock for scheduling; instead, it has a lock on each runqueue. This allows all CPUs to schedule tasks without contention from other CPUs
Task preemption:
- This means a lower-priority task won't execute while a higher-priority task is ready to run. The scheduler preempts the lower-priority process, places the process back on its priority list, and then reschedules

12 Background Information
Function name      Function description
schedule           The main scheduler function. Schedules the highest priority task for execution.
load_balance       Checks the CPU to see whether an imbalance exists, and attempts to move tasks if not balanced.
effective_prio     Returns the effective priority of a task (based on the static priority, but includes any rewards or penalties).
recalc_task_prio   Determines a task's bonus or penalty based on its idle time.
source_load        Conservatively calculates the load of the source CPU (from which a task could be migrated).
target_load        Liberally calculates the load of a target CPU (where a task has the potential to be migrated).
migration_thread   High-priority system thread that migrates tasks between CPUs.

13 Background Information
Kernel 2.6 Scheduler Policies:
SCHED_NORMAL: a conventional, time-shared process (used to be called SCHED_OTHER), for normal tasks
- Each task is assigned a "nice" value
- PRIO = MAX_RT_PRIO + NICE + 20
- Each task is assigned a time slice
- Tasks at the same priority are round-robined
- Ensures priority plus fairness
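As a worked example of the formula PRIO = MAX_RT_PRIO + NICE + 20, the sketch below maps the nice range -20..19 onto priority levels 100..139. The `DEMO_` constants mirror the MAX_RT_PRIO=100 and MAX_PRIO=140 values quoted earlier; the function name is hypothetical.

```c
#include <assert.h>

#define DEMO_MAX_RT_PRIO 100   /* levels 0..99 are reserved for real-time tasks */
#define DEMO_MAX_PRIO    140   /* levels 100..139 are used for normal tasks */

/* Map a nice value (-20..19) to a priority level (100..139). */
int demo_nice_to_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return DEMO_MAX_RT_PRIO + nice + 20;
}
```

A default task with nice 0 therefore lands on level 120, in the middle of the 40 levels available to normal tasks.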

14 Background Information
SCHED_FIFO: a First-In, First-Out real-time process
- Runs until it relinquishes the CPU voluntarily
- Priority levels are maintained
- Not preempted by tasks of the same or lower priority; only a higher-priority real-time task can take the CPU away
SCHED_RR: a Round Robin real-time process
- Assigned a timeslice and runs until the timeslice is exhausted
- Once all RR tasks of a given priority level exhaust their timeslices, their timeslices are refilled and they continue running
- Priority levels are maintained

15 Background Information
SCHED_BATCH: for "batch" style execution of processes
- For computing-intensive tasks
- Timeslices are long and processes are round-robin scheduled
- Lowest priority tasks are batch-processed (nice +19)
SCHED_IDLE: for running very low priority background jobs
- The nice value has no influence for this policy
- Extremely low priority (lower than nice +19)
SCHED_ISO: to be implemented

16 Background Information
Interactivity estimator
- Dynamically scales a task's priority based on its interactivity
- Interactive tasks receive a priority bonus [ -5 ], hence a larger timeslice
- CPU-bound tasks receive a priority penalty [ +5 ]
- Interactivity is estimated using a running sleep average
- Interactive tasks are I/O bound: they wait for events to occur, so tasks that sleep a lot are treated as I/O bound or interactive
- The actual bonus/penalty is determined by comparing the sleep average against a constant maximum sleep average
- Does not apply to RT tasks
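The bonus/penalty idea can be sketched as a linear scaling of the sleep average onto the range -5..+5. This is a simplified illustration, not the kernel's exact code: `DEMO_MAX_SLEEP_AVG` is an assumed constant (taken here as 1000 ms), and `demo_effective_prio` is a hypothetical name.

```c
#include <assert.h>

#define DEMO_MAX_RT_PRIO   100
#define DEMO_MAX_PRIO      140
#define DEMO_MAX_BONUS     10     /* bonus spans -5..+5 around the static priority */
#define DEMO_MAX_SLEEP_AVG 1000   /* assumed maximum sleep average, in ms */

/*
 * Scale the sleep average to a bonus in [-5, +5] and apply it to the
 * static priority. A task that sleeps a lot (interactive) gets a
 * numerically lower, i.e. better, priority; a task that never sleeps
 * (CPU bound) gets a numerically higher, i.e. worse, one.
 */
int demo_effective_prio(int static_prio, int sleep_avg)
{
    assert(static_prio >= DEMO_MAX_RT_PRIO && static_prio < DEMO_MAX_PRIO);
    assert(sleep_avg >= 0 && sleep_avg <= DEMO_MAX_SLEEP_AVG);

    int bonus = sleep_avg * DEMO_MAX_BONUS / DEMO_MAX_SLEEP_AVG - DEMO_MAX_BONUS / 2;
    int prio = static_prio - bonus;

    /* Keep the result inside the range reserved for normal tasks. */
    if (prio < DEMO_MAX_RT_PRIO)
        prio = DEMO_MAX_RT_PRIO;
    if (prio > DEMO_MAX_PRIO - 1)
        prio = DEMO_MAX_PRIO - 1;
    return prio;
}
```

With these assumed constants, a nice-0 task (static priority 120) ranges from 115 when it sleeps the maximum amount to 125 when it never sleeps.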

17 Background Information
When a task finishes its timeslice:
- Its interactivity is estimated
- Interactive tasks can be inserted into the 'active' array again
- Otherwise, the priority is recalculated and the task is inserted at its new priority level in the 'expired' array
Re-inserting interactive tasks:
- To avoid delays, interactive tasks may be re-inserted into the 'active' array after their timeslice has expired
- This is done only if tasks in the 'expired' array have run recently, which prevents starvation of those tasks
- The decision to re-insert depends on the task's priority level

18 Background Information
Timeslice distribution:
- Priority is recalculated only after a timeslice expires
- Interactive tasks may become non-interactive during their large timeslices, thus starving other processes
- To prevent this, timeslices are divided into chunks of 20 ms
- A task of equal priority may preempt the running task every 20 ms
- The preempted task is requeued and is round-robined in its priority level
- Priority recalculation also happens every 20 ms

19 Background Information
From /usr/src/linux-2.6.x/kernel/sched.c:
- void schedule(): the main scheduling function. Upon return, the highest priority process will be active
Data:
- struct runqueue: the main per-CPU runqueue data structure
- struct task_struct: the main per-process data structure

20 Background Information
Process control methods:
- void set_user_nice(...): sets the nice value of task p to the given value
- int setscheduler(...): sets the scheduling policy and parameters for a given pid
- rt_task(pid): returns true if pid is real-time, false if not
- yield(): places the current process at the end of the runqueue and calls schedule()
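The functions above are kernel-internal. From user space, the closest equivalents are the POSIX calls `sched_getscheduler()` and `sched_setscheduler()` from `<sched.h>`. The sketch below is a minimal illustration using those calls; the `demo_` wrappers are hypothetical names, and switching a process to a real-time policy normally requires elevated privileges, so the setter is expected to fail with EPERM for an ordinary user.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Translate a POSIX policy number into a printable name. */
const char *demo_policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "unknown";
    }
}

/* Return the scheduling policy of the calling process, or -1 on error. */
int demo_current_policy(void)
{
    return sched_getscheduler(0);   /* pid 0 means "the calling process" */
}

/*
 * Ask the kernel to schedule the given process as SCHED_RR at the given
 * real-time priority. Returns 0 on success, -1 with errno set on failure.
 */
int demo_make_round_robin(pid_t pid, int rt_priority)
{
    assert(rt_priority >= sched_get_priority_min(SCHED_RR));
    assert(rt_priority <= sched_get_priority_max(SCHED_RR));

    struct sched_param param = { .sched_priority = rt_priority };
    return sched_setscheduler(pid, SCHED_RR, &param);
}
```

A caller could print `demo_policy_name(demo_current_policy())` before and after `demo_make_round_robin(0, 10)` to see whether the change took effect.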

21 Background Information
Benchmark
- Each individual test runs a multiple of 25 processes, increments to the next multiple and reruns the benchmark. This continues until a max level, set by the tester, is achieved

22 Background Information
Now that we know all of this...
THEY CHANGED IT AGAIN!

23 Background Information
Kernel 2.6.23 scheduler
- Called the Completely Fair Scheduler (CFS)
- Does not use runqueues; it uses a time-ordered rbtree to build a 'timeline' of future task execution, and thus has no 'array switch' artifacts for the SCHED_NORMAL policy (or SCHED_OTHER)
- Has no notion of 'timeslices' and has no heuristics whatsoever
- sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler way than the vanilla scheduler does. It uses 100 runqueues (for all 100 RT priority levels, instead of 140 in the vanilla scheduler) and it needs no expired array
- SCHED_BATCH is handled by the CFS scheduler module too
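The 'timeline' idea can be sketched with an ordered tree in which the leftmost node is always the next task to run. The sketch below is an illustration under stated assumptions: it uses a plain, unbalanced binary search tree as a stand-in for the kernel's red-black tree, and `demo_entity`, its `key` field and both functions are hypothetical names rather than CFS's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical schedulable entity, ordered on a timeline by its key. */
struct demo_entity {
    int pid;
    unsigned long long key;            /* assumed timeline key; smaller runs sooner */
    struct demo_entity *left, *right;
};

/*
 * Insert an entity into the tree, ordered by key, and return the
 * (possibly new) root. No rebalancing is done here; a red-black tree
 * would additionally keep the tree height logarithmic.
 */
struct demo_entity *demo_timeline_insert(struct demo_entity *root,
                                         struct demo_entity *se)
{
    assert(se != NULL);
    se->left = se->right = NULL;

    struct demo_entity **link = &root;
    while (*link != NULL)
        link = (se->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    *link = se;
    return root;
}

/* The leftmost node holds the smallest key: the next task to run. */
struct demo_entity *demo_timeline_first(struct demo_entity *root)
{
    if (root == NULL)
        return NULL;
    while (root->left != NULL)
        root = root->left;
    return root;
}
```

Because tasks are kept in one ordered structure, there is no separate expired array to switch to; a task that has run simply gets a new key and is re-inserted further to the right.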

24 Project Steps
To start, we need to figure out what version of the kernel we are currently running. We'll use the uname command for that:
$ uname -r
2.6.24-3-generic
Now we need to install the Linux source for your kernel; substitute the kernel number for whatever you are running. We also need to install the curses library and some other tools to help us compile:
$ sudo apt-get install linux-source-2.6.24 kernel-package libncurses5-dev fakeroot
If you are curious where the Linux source gets installed to, you can use the dpkg command to list the files within a package:
$ dpkg -L linux-source-2.6.24

25 Project Steps
To make things easier, we'll put ourselves in root mode by using sudo to open a new shell. There are other ways to do this, but I prefer this way:
$ sudo /bin/bash
Now change directory into the source location so that we can install. Note that you may need to install the bunzip2 utility if it's not installed:
$ cd /usr/src
$ bunzip2 linux-source-2.6.24.tar.bz2
$ tar xvf linux-source-2.6.24.tar
$ ln -s linux-source-2.6.24 linux

26 Project Steps
Make a copy of your existing kernel configuration to use for the custom compile process:
$ cp /boot/config-`uname -r` /usr/src/linux/.config
First we'll do a clean, just to make sure everything is ready for the compile:
$ make-kpkg clean
Next we'll actually compile the kernel. This will take a LONG FREAKING TIME, so go find something interesting to do:
$ fakeroot make-kpkg --initrd --append-to-version=-custom kernel_image kernel_headers
This process will create two .deb files in /usr/src that contain the kernel

27 Project Steps
Please note that running these next commands will set the new kernel as the default kernel. This could break things! If your machine doesn't boot, you can hit Esc at the GRUB loading menu and select your old kernel. You can then disable the kernel in /boot/grub/menu.lst or try to compile again:
$ dpkg -i linux-image-2.6.24.3-custom_2.6.24.3-custom-10.00.Custom_i386.deb
$ dpkg -i linux-headers-2.6.24.3-custom_2.6.24.3-custom-10.00.Custom_i386.deb
Now reboot your machine. If everything works, you should be running your new custom kernel. You can check this by using uname. Note that the exact number will be different on your machine:
$ uname -r
2.6.17.14-ubuntu1-custom

28 Project Steps
Actual kernel files modified:
- sched.h
- sched.c
Auxiliary program modified:
- chrt.c

29 Project Steps
Kernel file modifications:
- Added a new policy called SCHED_JUAN
- Given a static priority value lower than SCHED_IDLE
- Code? See the attached files
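The attached files are not part of this transcript, so the following is only a hypothetical sketch of what registering an extra policy number involves. The values 0, 1, 2, 3 and 5 are the policy numbers defined in the 2.6.24 sched.h (4 is reserved for SCHED_ISO); the value 6 for SCHED_JUAN, the `DEMO_` prefix and both functions are assumptions made for illustration, not the author's actual change.

```c
#include <assert.h>

/* Policy numbers as defined in the 2.6.24 include/linux/sched.h. */
#define DEMO_SCHED_NORMAL 0
#define DEMO_SCHED_FIFO   1
#define DEMO_SCHED_RR     2
#define DEMO_SCHED_BATCH  3
/* 4 is reserved for SCHED_ISO, which is not implemented. */
#define DEMO_SCHED_IDLE   5
/* Assumption: the slides do not say which number SCHED_JUAN received. */
#define DEMO_SCHED_JUAN   6

/* Return 1 if the policy number is one the scheduler would accept. */
int demo_policy_is_valid(int policy)
{
    switch (policy) {
    case DEMO_SCHED_NORMAL:
    case DEMO_SCHED_FIFO:
    case DEMO_SCHED_RR:
    case DEMO_SCHED_BATCH:
    case DEMO_SCHED_IDLE:
    case DEMO_SCHED_JUAN:
        return 1;
    default:
        return 0;
    }
}

/*
 * Assumed relative ordering, lower value is served first: real-time
 * policies, then normal and batch tasks, then SCHED_IDLE, and last the
 * new policy, which must rank behind SCHED_IDLE.
 */
int demo_policy_rank(int policy)
{
    assert(demo_policy_is_valid(policy));
    switch (policy) {
    case DEMO_SCHED_FIFO:
    case DEMO_SCHED_RR:
        return 0;
    case DEMO_SCHED_NORMAL:
    case DEMO_SCHED_BATCH:
        return 1;
    case DEMO_SCHED_IDLE:
        return 2;
    default:
        return 3;   /* DEMO_SCHED_JUAN */
    }
}
```

Any code path that validates a policy number, including the chrt tool, has to learn about the new constant as well, which is why both the kernel files and chrt.c appear in the list of modified files.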

30 Project Steps
Auxiliary program:
- The chrt command is part of the util-linux package: low-level system utilities that are necessary for a Linux system to function. It is installed by default under Ubuntu and almost all other Linux distributions
- You can get / set attributes of running processes
- Changed the chrt source to support SCHED_JUAN
- Compile: gcc chrtJ.c -o chrtJU
- Code? See attached file (chrtJ.c)

31 Achievements
- Project Demo

32 Project Steps
- Is the policy useful?
- Improvements?

33 Questions ?

34 References
Kernel Design
- http://aplawrence.com/Linux/linux26_features.html
- http://www.linux.com/whatislinux/119700
- http://www.ibm.com/developerworks/linux/library/l-scheduler/
- http://lxr.linux.no/linux+v2.6.24/Documentation/sched-design.txt
Kernel Compiling Guide
- http://www.howtogeek.com/howto/ubuntu/how-to-customize-your-ubuntu-kernel/
SCHED_IDLE Reference
- https://kerneltrap.org/mailarchive/linux-kernel/2008/3/3/1051054
Chrt
- http://www.cyberciti.biz/faq/howto-set-real-time-scheduling-priority-process/
Benchmark
- http://devresources.linux-foundation.org/craiger/hackbench/

