Linux Process and Scheduling


1 Linux Process and Scheduling
Computer Science & Engineering Department Arizona State University Tempe, AZ 85287 Dr. Yann-Hang Lee (480)

2 Linux Tasks
task_struct (/include/linux/sched.h)
  The process control block, also called the process descriptor.
thread_info (/arch/x86/include/asm/thread_info.h)
  Low-level task data that entry.S needs immediate access to.
Why thread_info is needed
  The handlers in entry.S cover all system calls, the timer interrupt, and every other interrupt or fault that can result in a task switch.
  At the instant the CPU switches to kernel mode because of an interrupt or exception, the handler needs to find the execution status from the registers (including the stack pointer) or from a fixed memory location.
Most kernel code that deals with processes works directly with struct task_struct.
current: a pointer to the task_struct of the running process
  On PPC, current is kept in a register.
  On x86, it is computed from thread_info.
Exercise: find the size of task_struct.

3 Task IDs
Task pid in task_struct
  Each process has an associated process identifier (PID), which is what user space sees.
    typedef int __kernel_pid_t;
    typedef __kernel_pid_t pid_t;
  A pid hash table is used to find the task_struct from a pid.
Internal notion of process identifier
  struct pid (include/linux/pid.h)
  A reference to a struct pid is unique, which avoids the ambiguity that arises when pid_t values wrap around and are reused.
Various ids
  pid: process identifier
  tid: thread identifier (the pid in task_struct)
  tgid: thread group identifier (the pid of the thread that started the whole process)
  pthread_id: pthread identifier
    SYSCALL_DEFINE0(getpid)
    {
        return task_tgid_vnr(current);
    }
    SYSCALL_DEFINE0(gettid)
    {
        return task_pid_vnr(current);
    }

4 Threads, Processes, Process Groups, and Sessions
In V6 Unix, a pre-allocated array of struct proc that was indexed by PID.
In 4.3BSD (1986), a dynamic linked list of struct proc.
In 4.4BSD, secondary lists per process group, to avoid searching the whole process table.
In Linux
  process = thread group leader
  process group: the processes connected by pipes (cat file | grep "*.ko" | more)
  session: a terminal session and its foreground process group
PID / PID hash table linkage
  4 hash tables stored in the pid_hash array
    struct pid_link pids[PIDTYPE_MAX];
  PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, PIDTYPE_SID

5 Linux pid Hash Tables
    /* in task_struct */
    struct pid_link pids[PIDTYPE_MAX];

    /* include/linux/pid.h */
    struct pid {
        atomic_t count;
        unsigned int level;
        /* lists of tasks that use this pid */
        struct hlist_head tasks[PIDTYPE_MAX];
        struct rcu_head rcu;
        struct upid numbers[1];
    };

    struct pid_link {
        struct hlist_node node;
        struct pid *pid;
    };

6 Process Hierarchy
    struct list_head tasks;
    struct task_struct __rcu *real_parent;  /* actually created this child process */
    struct task_struct __rcu *parent;       /* recipient of SIGCHLD, wait4() reports */
    struct list_head children;              /* list of my children */
    struct list_head sibling;               /* linkage in my parent's children list */
Exercise: find out how a new "parent" is set from "real_parent".
All processes are descendants of the init process.
Task list: a circular doubly linked list.
Example: iterating over a process's children
    struct task_struct *task;
    struct list_head *list;
    list_for_each(list, &current->children) {
        task = list_entry(list, struct task_struct, sibling);
        /* task now points to one of current's children */
    }

7 Linux List Structure
A doubly linked list embedded in the data structure.
list_entry(p,t,m) -- Returns the address of the data structure of type t that contains the list_head field named m at address p.
    #define list_entry(ptr, type, member) \
        container_of(ptr, type, member)
list_for_each(p,h) -- Scans the elements of the list whose head is at address h; in each iteration, p points to the list_head structure of the current element.
    #define list_for_each(pos, head) \
        for (pos = (head)->next; pos != (head); pos = pos->next)

8 Linux Task (Process) State (1)
Wait queues keep the sleeping and waiting tasks.
A runqueue (list) for each priority level holds the tasks in the TASK_RUNNING state (?)

9 Linux Task (Process) State (2)
TASK_RUNNING
  The process is runnable: it is either currently running or on a run queue (?) waiting to run.
TASK_INTERRUPTIBLE
  The process is sleeping (that is, it is blocked), waiting for some condition to exist. It can wake up prematurely and become runnable if it receives a signal.
TASK_UNINTERRUPTIBLE
  Like TASK_INTERRUPTIBLE, except that the process does not wake up and become runnable if it receives a signal, so the awaited event is expected to occur quite quickly. A process in this state cannot be killed, even by SIGKILL.
TASK_ZOMBIE
  The task has terminated, but its parent has not yet issued a wait4() system call. The task's process descriptor must remain in case the parent wants to access it. Once the parent calls wait4(), the process descriptor is deallocated.
TASK_STOPPED
  Process execution has stopped; the task is neither running nor eligible to run.

10 Process Creation
3 syscalls: fork(), clone(), vfork()
  fork() is traditional: it duplicates the process and can be expensive.
  clone() is used for lightweight processes.
  vfork() is an efficiency hack.
However, fork(), vfork(), and clone() all invoke do_fork() with the requisite flags.
  do_fork() calls copy_process() and wake_up_new_task().
  Copy-on-write and child-runs-first.
Exercise: it looks like you cannot fork a new process if CONFIG_MMU is not defined (/kernel/fork.c -- SYSCALL_DEFINE0(fork)). Can a Linux system have multiple processes (threads) without an MMU or without virtual memory?

11 Fork Programming Example
If fork() succeeds, it returns twice:
  once in the child process, with return value 0;
  once in the parent process, with the child's PID as the return value.
The two processes have separate copies of the variables.
Fork/exec
  Create a child process, which then calls exec to replace the program it inherited from its parent with a new program.
  The text, data, bss, and stack of the calling process are overwritten by those of the program loaded.
Exercise: does the child process inherit any mutex locks or allocated heaps of its parent process?
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int var_glb = 0;   /* a global variable */

    int main(void)
    {
        int var_lcl = 0;
        pid_t pid = fork();

        if (pid == 0) {            /* child process */
            var_glb++;
            var_lcl++;
            printf("child process %d %d \n", var_glb, var_lcl);
        } else if (pid > 0) {      /* parent process */
            var_glb--;
            var_lcl--;
            printf("parent process %d %d \n", var_glb, var_lcl);
        } else {                   /* fork failed */
            printf("fork() failed!\n");
            return 1;
        }
        return 0;
    }

12 copy_process()
Creates a new kernel stack, thread_info structure, and task_struct for the new process. The new values are identical to those of the current task.
Checks the resource limits on the number of processes for the current user.
Initializes the values in the child's task_struct. Members that are not inherited are primarily statistical information. The bulk of the data in the process descriptor is shared.
Sets the child's state to TASK_UNINTERRUPTIBLE, so that it does not run yet.
Calls copy_flags() to update the flags member of the task_struct.
Calls get_pid() to assign an available PID to the new task.
Depending on the flags passed to clone(), either duplicates or shares open files, filesystem information, signal handlers, process address space, and namespace.
Splits the parent's remaining timeslice between the parent and its child.
Cleans up and returns to the caller a pointer to the new child.

13 Copy on Write
With a traditional fork(), all resources owned by the parent are duplicated and the copy is given to the child.
Copy on write:
  Initially, all pages are shared between the parent and child processes.
  The kernel marks the pages as read-only and intercepts write attempts.
  When a write occurs, it allocates a new physical page initialized with the copy-on-write data.
  It keeps a reference count per page; CoW is no longer needed once the count is 1.
The resulting cost of the fork operation:
  the duplication of the parent's page tables
  the creation of a unique process descriptor for the child

14 Process Switch (1)
Process switching occurs only in kernel mode.
  The contents of all registers used by a process in user mode have already been saved on the kernel-mode stack before the process switch is performed. (Why?)
Hardware context: the set of data that must be loaded into the registers before the process resumes execution on the CPU.
  Stored in task_struct (thread_struct) and on the kernel-mode stack (general registers).
Changing the stack pointer changes the process.
Switching: in switch_to() and __switch_to()
  Enter a subroutine; push registers onto the stack.
  Save other state in thread_struct.
  Change the stack pointer.
  Restore state from the new thread_struct.
  Restore registers from the new stack.
  Return to the new process's caller.
Exercise: when does execution switch to the new stack?

15 Process Switch (2)
[Figure: the kernel stacks of prev and next at successive steps of switch_to, showing the saved flags, ebp, ……, esp, the pushed next_ip, and the __switch_to frame]
    /* switch_to macro, CONFIG_X86_32 */
    "pushfl\n\t"                    /* save flags */    \
    "pushl %%ebp\n\t"               /* save EBP */      \
    "movl %%esp,%[prev_sp]\n\t"     /* save ESP */      \
    "movl %[next_sp],%%esp\n\t"     /* restore ESP */   \
    "movl $1f,%[prev_ip]\n\t"       /* save EIP */      \
    "pushl %[next_ip]\n\t"          /* restore EIP */   \
    __switch_canary                                     \
    "jmp __switch_to\n"             /* regparm call */  \
    "1:\t"                                              \
    "popl %%ebp\n\t"                /* restore EBP */   \
    ……

    /* in __switch_to() */
    return prev_p;

16 Process Scheduling
Cooperative multitasking
  A process does not stop running until it voluntarily decides to do so (yielding).
Preemptive multitasking
  Preemption: involuntarily suspending a running process.
Scheduling in Linux
  Preemption: when a process enters the TASK_RUNNING state, the kernel checks whether its priority is higher than that of the currently executing process.
  When a process's timeslice reaches zero, it is preempted and the scheduler is invoked to select a new process.
Three classes of threads for scheduling purposes: SCHED_FIFO, SCHED_RR, SCHED_NORMAL
Process priority
  Static: used for real-time processes, in the range 0-99. SCHED_FIFO processes have no time quantum; they run until they block, yield voluntarily, or are preempted by a higher-priority real-time process.
  Dynamic: used for conventional processes; it can be adjusted based on time run and priority class.
  Every real-time process has higher priority than any conventional process.

17 Basic Data Structure for Scheduling (1)
Runqueue (in Linux 2.6, from "Understanding the Linux Kernel")
  The list of runnable processes on a given processor.
  One runqueue per processor, and each runnable process is on exactly one runqueue (so the runqueues need to be re-balanced periodically).
  Processor affinity (inherited initially and can be changed).
    struct runqueue {
        spinlock_t lock;                        /* spin lock that protects this runqueue */
        unsigned long nr_running;               /* number of runnable tasks */
        unsigned long nr_switches;              /* context switch count */
        unsigned long expired_timestamp;        /* time of last array swap */
        unsigned long nr_uninterruptible;       /* uninterruptible tasks */
        unsigned long long timestamp_last_tick; /* last scheduler tick */
        struct task_struct *curr;               /* currently running task */
        struct task_struct *idle;               /* this processor's idle task */
        struct mm_struct *prev_mm;              /* mm_struct of last ran task */
        struct prio_array *active;              /* active priority array */
        struct prio_array *expired;             /* the expired priority array */
        struct prio_array arrays[2];            /* the actual priority arrays */
        struct task_struct *migration_thread;   /* migration thread */
        struct list_head migration_queue;       /* migration queue */
        atomic_t nr_iowait;                     /* number of tasks waiting on I/O */
    };

18 Basic Data Structure for Scheduling (2)
struct prio_array
  When "active" is empty, swap "active" and "expired".
    struct prio_array {
        int nr_active;                      /* number of tasks in the queues */
        unsigned long bitmap[BITMAP_SIZE];  /* priority bitmap */
        struct list_head queue[MAX_PRIO];   /* priority queues */
    };

19 When to Call Schedule()
TIF_NEED_RESCHED: a flag in thread_info
Preemption
  Switch from the running process to a higher-priority process.
  The high-priority task must have been set to runnable by a current event.
  The event can be external or internal.
Yielding
  The running process is not runnable any more.
  It is blocked, enters a sleep state, or runs out of its time quantum.
Example: an irq action calls complete() to wake up a waiting process
    complete()
      -> __wake_up_locked()
      -> __wake_up_common()
      -> call func of __wait_queue (e.g. default_wake_function)
      -> try_to_wake_up()
      -> choose a cpu, ttwu_queue()
      -> ttwu_do_activate()
      -> ttwu_activate(); ttwu_do_wakeup()
      -> sched_class->check_preempt_curr()
      -> e.g. check_preempt_curr_rt: if (p->prio < rq->curr->prio) { resched_curr(rq); }
      -> set_tsk_need_resched(curr);

20 Schedule()
    /*
     * __schedule() is the main scheduler function.
     *
     * The main means of driving the scheduler and thus entering this function are:
     *
     *   1. Explicit blocking: mutex, semaphore, waitqueue, etc.
     *
     *   2. TIF_NEED_RESCHED flag is checked on interrupt and userspace return
     *      paths. For example, see arch/x86/entry_64.S.
     *
     *   3. Wakeups don't really cause entry into schedule(). They add a
     *      task to the run-queue and that's it.
     *
     *      Now, if the new task added to the run-queue preempts the current
     *      task, then the wakeup sets TIF_NEED_RESCHED and schedule() gets
     *      called on the nearest possible occasion:
     *
     *       - If the kernel is preemptible (CONFIG_PREEMPT=y):
     *
     *         - in syscall or exception context, at the next outmost
     *           preempt_enable(). (this might be as soon as the wake_up()'s
     *           spin_unlock()!)
     *
     *         - in IRQ context, return from interrupt-handler to
     *           preemptible context
     */

21 Scheduling Call when ret_from_intr
An interrupt may wake up a process.
In IRQ handling:
    ……
    call do_IRQ
    jmp ret_from_intr
    ……
    jb resume_kernel
preempt_schedule_irq()
  The entry point to schedule() from kernel preemption off of irq context (called and returns with irqs disabled).
  Calls __schedule():
    next = pick_next_task(rq, prev);
    clear_tsk_need_resched(prev);
    clear_preempt_need_resched();
    context_switch(rq, prev, next);

    ENTRY(resume_kernel)
        DISABLE_INTERRUPTS(CLBR_ANY)
    need_resched:
        cmpl $0,PER_CPU_VAR(__preempt_count)
        jnz restore_all
        testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)   # interrupts off (exception path) ?
        jz restore_all
        call preempt_schedule_irq
        jmp need_resched
    END(resume_kernel)

22 Linux Scheduler Class (1)
Linux scheduler classes
  A modular approach that lets different algorithms schedule different types of processes.
  Each scheduler class has its own way to prioritize runnable tasks.
  The scheduler iterates over the scheduler classes in a fixed order. The highest-priority scheduler class that has a runnable process wins and selects who runs next.
Exercise: find the order in which the scheduler classes are checked for a runnable task.
Example:
  CFS (Completely Fair Scheduler) for tasks with the SCHED_NORMAL/SCHED_BATCH policies
  RT (real-time) for tasks with the SCHED_FIFO and SCHED_RR policies
  DL (deadline): since 3.14
    /*
     * Scheduling policies
     */
    #define SCHED_NORMAL
    #define SCHED_FIFO
    #define SCHED_RR
    #define SCHED_BATCH
    /* SCHED_ISO: reserved but not implemented */
    #define SCHED_IDLE
    #define SCHED_DEADLINE

23 Linux Scheduler Class (2)
sched_class: kernel/sched/sched.h
  fair_sched_class -- kernel/sched/fair.c
  rt_sched_class -- kernel/sched/rt.c
  dl_sched_class -- kernel/sched/deadline.c
Methods for each sched_class, for example pick_next_task:
    for_each_class(class) {
        p = class->pick_next_task(rq, prev);
        if (p) {
            ……
            return p;
        }
    }
For rt_sched_class, tasks are scheduled according to their priority (0 to 99).
CFS (Documentation/scheduler/sched-design-CFS.txt)
  It basically models an "ideal, precise multi-tasking CPU" on real hardware.
  The "virtual runtime" of a task specifies when its next timeslice would start execution on the ideal multi-tasking CPU.
  CFS always tries to run the task with the smallest p->se.vruntime value.
    const struct sched_class rt_sched_class = {
        .next = &fair_sched_class,
        .enqueue_task = enqueue_task_rt,
        .dequeue_task = dequeue_task_rt,
        .yield_task = yield_task_rt,
        .check_preempt_curr = check_preempt_curr_rt,
        .pick_next_task = pick_next_task_rt,
        ……
    };

24 Basic Data Structure for Scheduling (3)
struct rq (/kernel/sched/sched.h, v3.19)
  Consists of runqueues for 3 scheduler classes.
  In cfs_rq and dl_rq, runnable tasks are kept in rbtrees (red-black trees).
  Runnable tasks are sorted by their virtual runtimes (CFS) or deadlines (DL).
    /* This is the main, per-CPU runqueue data structure. */
    struct rq {
        raw_spinlock_t lock;            /* runqueue lock: */
        unsigned int nr_running;
        /* ……. cpu load… */
        u64 nr_switches;
        struct cfs_rq cfs;
        struct rt_rq rt;
        struct dl_rq dl;
        unsigned long nr_uninterruptible;
        struct task_struct *curr, *idle, *stop;
        unsigned long next_balance;
        struct mm_struct *prev_mm;
        /* …….. */
    };

