China’s Software Industry August 2006 Instructor: Hengming Zou, Ph.D.


1 Unit 3: Concurrency
Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity (求于至简,归于永恒)

2 Outline of Content
3.1. Critical Sections, Semaphores, and Monitors
3.2. Windows Trap Dispatching, Interrupts, Synchronization
3.3. Advanced Windows Synchronization
3.4. Windows APIs for Synchronization and IPC

3 Critical Sections, Semaphores, and Monitors
The Critical-Section Problem
Software Solutions
Synchronization Hardware
Semaphores
Synchronization in Windows & Linux

4 The Critical-Section Problem
n threads all compete to use a shared resource
Each thread has a code segment, called its critical section, in which the shared data is accessed
Problem: ensure that when one thread is executing in its critical section, no other thread is allowed to execute in its critical section
For simplicity, we generally refer to the concurrent execution of threads (as is typical for Windows). However, in some systems the fundamental unit of concurrency might be a process rather than a thread.

5 Solution to Critical-Section Problem
Mutual Exclusion
Only one thread at a time is allowed into its CS, among all threads that have a CS for the same resource or shared data
A thread halted in its non-critical section must not interfere with other threads
Progress
A thread remains inside its CS for a finite time only
No assumptions are made concerning the relative speed of the threads

6 Solution to Critical-Section Problem
Bounded Waiting
It must not be possible for a thread requiring access to a critical section to be delayed indefinitely
When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

7 Initial Attempts to Solve Problem
Only 2 threads, T0 and T1
General structure of thread Ti (other thread Tj): do { entry section critical section exit section remainder section } while (1);
Threads may share some common variables to synchronize their actions

8 First Attempt: Algorithm 1
Shared variables
Initialization: int turn = 0;
turn == i ⇒ Ti can enter its critical section
Thread Ti: do { while (turn != i) ; critical section turn = j; remainder section } while (1);
Satisfies mutual exclusion, but not progress

9 Second Attempt: Algorithm 2
Shared variables
Initialization: int flag[2]; flag[0] = flag[1] = 0;
flag[i] == 1 ⇒ Ti is ready to enter its critical section
Thread Ti: do { flag[i] = 1; while (flag[j] == 1) ; critical section flag[i] = 0; remainder section } while (1);
Satisfies mutual exclusion, but not the progress requirement

10 Algorithm 3 (Peterson’s Algorithm - 1981)
Shared variables of algorithms 1 and 2 - initialization: int flag[2]; flag[0] = flag[1] = 0; int turn = 0; Thread Ti do { flag[i] = 1; turn = j; while ((flag[j] == 1) && turn == j) ; critical section flag[i] = 0; remainder section } while (1); Solves the critical-section problem for two threads
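
The entry and exit protocol above can be tried out on real threads. The following is a minimal sketch, assuming C11 atomics and POSIX threads; the names peterson_enter, peterson_exit and run_peterson_demo, the shared counter and the sched_yield() call in the spin loop are illustrative additions, not part of the slides. The atomics are needed because plain int variables, as written on the slide, may be reordered by modern compilers and processors.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Shared variables from the slide, declared atomic so that stores and
   loads keep their program order (sequentially consistent by default). */
static atomic_int flag[2];
static atomic_int turn;

static long shared_counter;         /* protected by the Peterson lock */
static long iterations_per_thread;  /* hypothetical demo parameter */

static void peterson_enter(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], 1);      /* announce interest */
    atomic_store(&turn, j);         /* let the other thread go first */
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();              /* busy wait; yielding is an addition */
}

static void peterson_exit(int i) {
    atomic_store(&flag[i], 0);
}

static void *worker(void *arg) {
    int i = *(int *)arg;
    for (long k = 0; k < iterations_per_thread; k++) {
        peterson_enter(i);
        shared_counter++;           /* critical section */
        peterson_exit(i);
    }
    return NULL;
}

/* Runs threads T0 and T1 and returns the final counter value. */
long run_peterson_demo(long iterations) {
    pthread_t t[2];
    int id[2] = {0, 1};

    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    shared_counter = 0;
    iterations_per_thread = iterations;

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &id[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, run_peterson_demo(n) returns exactly 2n, because no increment of the unprotected counter is lost.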

11 Dekker’s Algorithm (1965) This is the first correct solution proposed for the two-thread (two-process) case Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra. Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested. When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

12 Dekker’s Algorithm (contd.)
Shared variables - initialization: int flag[2]; flag[0] = flag[1] = 0; int turn = 0;
Thread Ti: do { flag[i] = 1; while (flag[j]) if (turn == j) { flag[i] = 0; while (turn == j) ; flag[i] = 1; } critical section turn = j; flag[i] = 0; remainder section } while (1);

13 Bakery Algorithm (Lamport 1979)
A Solution to the Critical Section problem for n threads Before entering its CS, a thread receives a number Holder of the smallest number enters the CS. If threads Ti and Tj receive the same number, if i < j, then Ti is served first; else Tj is served first. The numbering scheme generates numbers in monotonically non-decreasing order: i.e., 1,1,1,2,3,3,3,4,4,5...

14 Bakery Algorithm
Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)
(a,b) < (c,d) if a < c, or if a == c and b < d
max(a0, …, an-1) is a number k such that k ≥ ai for i = 0, …, n – 1
Shared data: int choosing[n]; int number[n]; (number[i] is the ticket of thread Ti)
Data structures are initialized to 0

15 Bakery Algorithm
do { choosing[i] = 1; number[i] = max(number[0], number[1], ..., number[n-1]) + 1; choosing[i] = 0; for (j = 0; j < n; j++) { while (choosing[j] == 1) ; while ((number[j] != 0) && ((number[j],j) < (number[i],i))) ; } critical section number[i] = 0; remainder section } while (1);
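
The tuple comparison and the max() helper used by the bakery algorithm are easy to get wrong, so here is a small single-threaded sketch of just those two pieces. The function names ticket_less, max_ticket and next_to_enter are illustrative, and next_to_enter is an added helper that reports which waiting thread the ordering would admit first; it is not part of the algorithm itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* (a, b) < (c, d) if a < c, or if a == c and b < d. */
bool ticket_less(int a, int b, int c, int d) {
    return a < c || (a == c && b < d);
}

/* Largest ticket currently held; 0 means nobody holds a ticket. */
int max_ticket(const int number[], size_t n) {
    int max = 0;
    for (size_t k = 0; k < n; k++)
        if (number[k] > max)
            max = number[k];
    return max;
}

/* Id of the thread with the smallest (ticket, id) pair among those
   holding a ticket, or -1 if no thread is waiting. */
int next_to_enter(const int number[], size_t n) {
    int best = -1;
    for (size_t j = 0; j < n; j++) {
        if (number[j] == 0)
            continue;               /* thread j is not competing */
        if (best < 0 || ticket_less(number[j], (int)j, number[best], best))
            best = (int)j;
    }
    return best;
}
```

With tickets {3, 0, 2, 2}, threads 2 and 3 hold the same number, and the tie is broken in favor of the lower thread id.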

16 Mutual Exclusion - Hardware Support
Interrupt Disabling Concurrent threads cannot overlap on a uniprocessor Thread will run until performing a system call or interrupt happens Special Atomic Machine Instructions Test and Set Instruction - read & write a memory location Exchange Instruction - swap register and memory location Problems with Machine-Instruction Approach Busy waiting Starvation is possible Deadlock is possible

17 Synchronization Hardware
Test and modify the content of a word atomically boolean TestAndSet(boolean &target) { boolean rv = target; target = true; return rv; }

18 Mutual Exclusion with Test-and-Set
Shared data: boolean lock = false;
Thread Ti: do { while (TestAndSet(lock)) ; critical section lock = false; remainder section } while (1);
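
In portable C the test-and-set instruction is available through the C11 atomic_flag type. The sketch below assumes that mapping; the type and function names (tas_lock, tas_acquire and so on) are illustrative, and tas_try_lock is an added convenience that performs a single test-and-set without spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;   /* set = held, clear = free */
} tas_lock;

void tas_init(tas_lock *l) {
    atomic_flag_clear(&l->locked);
}

/* One TestAndSet: returns true if the lock was free and is now ours. */
bool tas_try_lock(tas_lock *l) {
    return !atomic_flag_test_and_set(&l->locked);
}

/* while (TestAndSet(lock)) ; */
void tas_acquire(tas_lock *l) {
    while (atomic_flag_test_and_set(&l->locked))
        ;                 /* busy wait */
}

/* lock = false; */
void tas_release(tas_lock *l) {
    atomic_flag_clear(&l->locked);
}
```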

19 Synchronization Hardware
Atomically swap two variables void Swap(boolean &a, boolean &b) { boolean temp = a; a = b; b = temp; }

20 Mutual Exclusion with Swap
Shared data (initialized to 0): int lock = 0;
Thread Ti: int key; do { key = 1; while (key == 1) Swap(lock,key); critical section lock = 0; remainder section } while (1);
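
The swap-based lock can be sketched with C11's atomic_exchange, which stores a new value and returns the old one in a single atomic step. This is an assumption about how one might express the slide's Swap in portable C: because key always holds 1 when the swap is attempted, exchanging 1 into the lock and reading back the old value has the same effect. The function names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* key = 1; while (key == 1) Swap(lock, key); */
void swap_acquire(atomic_int *lock) {
    int key = 1;
    while (key == 1)
        key = atomic_exchange(lock, 1);  /* old lock value lands in key */
}

/* A single swap attempt: true if the lock was free (old value 0). */
bool swap_try_acquire(atomic_int *lock) {
    return atomic_exchange(lock, 1) == 0;
}

/* lock = 0; */
void swap_release(atomic_int *lock) {
    atomic_store(lock, 0);
}
```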

21 Semaphores
Semaphore S – integer variable that can only be accessed via two atomic operations
wait(S): while (S <= 0) ; S--;
signal(S): S++;

22 Critical Section of n Threads
Shared data: semaphore mutex; //initially mutex = 1 Thread Ti: do { wait(mutex); critical section signal(mutex); remainder section } while (1);

23 Semaphore Implementation
Semaphores may suspend/resume threads Avoid busy waiting Define a semaphore as a record typedef struct { int value; struct thread *L; } semaphore; Assume two simple operations: suspend() suspends the thread that invokes it resume(T) resumes the execution of a blocked thread T

24 Implementation
Semaphore operations now defined as
wait(S): S.value--; if (S.value < 0) { add this thread to S.L; suspend(); }
signal(S): S.value++; if (S.value <= 0) { remove a thread T from S.L; resume(T); }
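
The bookkeeping in wait(S) and signal(S) can be followed with a single-threaded model that records who would be suspended and who would be resumed, without creating real threads. Everything here is a sketch under assumptions: the names model_semaphore, model_wait and model_signal are illustrative, the list S.L is modeled as a small FIFO array of thread ids, and MAX_WAITERS is an arbitrary bound that a caller must not exceed.

```c
#include <stdbool.h>

#define MAX_WAITERS 16   /* arbitrary bound for this sketch */

typedef struct {
    int value;
    int queue[MAX_WAITERS];  /* FIFO of blocked thread ids (stands in for S.L) */
    int head;
    int count;
} model_semaphore;

void model_init(model_semaphore *s, int initial) {
    s->value = initial;
    s->head = 0;
    s->count = 0;
}

/* wait(S): returns true if the calling thread would be suspended. */
bool model_wait(model_semaphore *s, int thread_id) {
    s->value--;
    if (s->value < 0) {
        s->queue[(s->head + s->count) % MAX_WAITERS] = thread_id;
        s->count++;
        return true;
    }
    return false;
}

/* signal(S): returns the id of the resumed thread, or -1 if none waited. */
int model_signal(model_semaphore *s) {
    s->value++;
    if (s->value <= 0) {
        int t = s->queue[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        return t;
    }
    return -1;
}
```

Note that whenever value is negative, its magnitude equals the number of threads on the list.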

25 Semaphore as a General Synchronization Tool
Execute B in Tj only after A executed in Ti
Use semaphore flag initialized to 0
Code:
Ti: ... A; signal(flag);
Tj: ... wait(flag); B;

26 Two Types of Semaphores
Counting semaphore: integer value can range over an unrestricted domain.
Binary semaphore: integer value can range only between 0 and 1; can be simpler to implement.
A counting semaphore S can be implemented using binary semaphores

27 Deadlock and Starvation
Deadlock: two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads
Let S and Q be two semaphores initialized to 1
T0: wait(S); wait(Q); ... signal(S); signal(Q);
T1: wait(Q); wait(S); ... signal(Q); signal(S);

28 Deadlock and Starvation
Starvation – indefinite blocking A thread may never be removed from the semaphore queue in which it is suspended. Solution – all code should acquire/release semaphores in same order
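
One common way to enforce a single acquisition order is to give every lock a rank and always take the lower-ranked lock first, no matter in which order the caller names them. The sketch below assumes POSIX mutexes in place of the slides' semaphores, and the names ranked_mutex, first_by_rank, lock_both and unlock_both are illustrative. Ranks must be distinct for the ordering to be well defined.

```c
#include <pthread.h>

typedef struct {
    int rank;              /* position in the global lock order */
    pthread_mutex_t m;
} ranked_mutex;

/* The lock that must be taken first: the one with the lower rank. */
ranked_mutex *first_by_rank(ranked_mutex *x, ranked_mutex *y) {
    return x->rank < y->rank ? x : y;
}

/* Acquire both locks in rank order, regardless of argument order. */
void lock_both(ranked_mutex *x, ranked_mutex *y) {
    ranked_mutex *first = first_by_rank(x, y);
    ranked_mutex *second = (first == x) ? y : x;
    pthread_mutex_lock(&first->m);
    pthread_mutex_lock(&second->m);
}

/* Release in the reverse order of acquisition. */
void unlock_both(ranked_mutex *x, ranked_mutex *y) {
    ranked_mutex *first = first_by_rank(x, y);
    ranked_mutex *second = (first == x) ? y : x;
    pthread_mutex_unlock(&second->m);
    pthread_mutex_unlock(&first->m);
}
```

With this helper, T0 calling lock_both(&S, &Q) and T1 calling lock_both(&Q, &S) both take S before Q, so the circular wait from the previous slide cannot form.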

29 Windows Synchronization
Uses interrupt masks to protect access to global resources on uniprocessor systems. Uses spinlocks on multiprocessor systems. Provides dispatcher objects which may act as mutexes and semaphores. Dispatcher objects may also provide events. An event acts much like a condition variable

30 Linux Synchronization
Kernel disables interrupts for synchronizing access to global data on uniprocessor systems. Uses spinlocks for multiprocessor synchronization. Uses semaphores and readers-writers locks when longer sections of code need access to data. Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

31 Further Reading
Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982
Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986
Silberschatz, A., and Galvin, P. B., Operating System Concepts, 6th Ed., John Wiley & Sons, 2003; Chapter 7 - Process Synchronization, Chapter 8 - Deadlocks

32 3.2. Trap Dispatching, Interrupts, Synchronization
Trap and Interrupt dispatching
IRQL levels & Interrupt Precedence
Spinlocks and Kernel Synchronization
Executive Synchronization

33 Kernel Mode Versus User Mode
A processor state
Controls access to memory
Each memory page is tagged to show the required mode for reading and for writing
Protects the system from the users
Protects the user (process) from themselves
System is not protected from system
Code regions are tagged “no write in any mode”
Controls ability to execute privileged instructions
A Windows abstraction
Intel: Ring 0, Ring 3

34 Kernel Mode Versus User Mode
Control flow (a thread) can change from user to kernel mode and back
Does not affect scheduling
Thread context includes info about execution mode (along with registers, etc.)
PerfMon counters: “Privileged Time” and “User Time”
4 levels of granularity: thread, process, processor, system

35 Getting Into Kernel Mode
Code is run in kernel mode for one of three reasons:
1. Requests from user mode
Via the system service dispatch mechanism
Kernel-mode code runs in the context of the requesting thread
2. Dedicated kernel-mode system threads
Some threads in the system stay in kernel mode at all times, mostly in the “System” process
Scheduled, preempted, etc., like any other threads

36 Getting Into Kernel Mode
3. Interrupts from external devices
Interrupt dispatcher invokes the interrupt service routine
ISR runs in the context of the interrupted thread (so-called “arbitrary thread context”)
ISR often requests the execution of a “DPC routine,” which also runs in kernel mode
Time not charged to interrupted thread

37 Trap dispatching
Trap: processor's mechanism to capture an executing thread
Switch from user to kernel mode
Interrupts: asynchronous
Exceptions: synchronous
Dispatch paths:
Interrupt -> interrupt dispatcher -> interrupt service routines
System service call -> system service dispatcher -> system services
HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
Virtual address exceptions -> virtual memory manager's pager

38 Interrupt Dispatching
An interrupt arrives while user or kernel mode code is running; the steps below run in kernel mode
Note: no thread or process context switch!
Interrupt dispatch routine:
Disable interrupts
Record machine state (trap frame) to allow resume
Mask equal- and lower-IRQL interrupts
Find and call appropriate ISR
Dismiss interrupt
Restore machine state (including mode and enabled interrupts)
Interrupt service routine:
Tell the device to stop interrupting
Interrogate device state, start next operation on device, etc.
Request a DPC
Return to caller

39 Interrupt Precedence via IRQLs (x86)
IRQL = Interrupt Request Level
Precedence of the interrupt with respect to other interrupts
Different interrupt sources have different IRQLs
Not the same as IRQ
IRQL is also a state of the processor
Servicing an interrupt raises processor IRQL to that interrupt’s IRQL
This masks subsequent interrupts at equal and lower IRQLs
User mode is limited to IRQL 0
No waits or page faults at IRQL >= DISPATCH_LEVEL
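
The masking rule can be modeled in a few lines: an interrupt is taken only if its IRQL is strictly higher than the processor's current IRQL, and otherwise stays pending until the IRQL is lowered. This is a toy model, not Windows code; cpu_model, request_interrupt and lower_irql are illustrative names, and a real kernel restores the previous IRQL after each ISR rather than jumping to an arbitrary level.

```c
#include <stdbool.h>

#define MAX_IRQL 31

typedef struct {
    int irql;          /* current IRQL of the processor */
    unsigned pending;  /* bit n set: an interrupt at IRQL n is waiting */
} cpu_model;

/* Returns true if the interrupt is serviced now (the processor IRQL is
   raised to the interrupt's IRQL), false if it is masked and left pending. */
bool request_interrupt(cpu_model *cpu, int irql) {
    if (irql > cpu->irql) {
        cpu->irql = irql;
        return true;
    }
    cpu->pending |= 1u << irql;   /* equal or lower IRQL: masked */
    return false;
}

/* Lowers the IRQL and returns the highest pending level that is no longer
   masked (clearing it), or -1 if nothing is deliverable. */
int lower_irql(cpu_model *cpu, int new_irql) {
    cpu->irql = new_irql;
    for (int level = MAX_IRQL; level > new_irql; level--) {
        if (cpu->pending & (1u << level)) {
            cpu->pending &= ~(1u << level);
            return level;
        }
    }
    return -1;
}
```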

40 Interrupt Precedence via IRQLs (x86)
IRQL levels on x86, highest to lowest:
31 High
30 Power fail
29 Interprocessor Interrupt
28 Clock
Profile & Synch (Srv 2003)
...
Device 1
2 Dispatch/DPC
1 APC
0 Passive/Low
Levels from Device 1 upward are hardware interrupts; Dispatch/DPC and APC are deferrable software interrupts; Passive/Low is normal thread execution

41 Interrupt processing
Interrupt dispatch table (IDT)
Links to interrupt service routines
x86:
Interrupt controller interrupts processor (single line)
Processor queries for interrupt vector; uses vector as index to IDT
Alpha:
PAL code (Privileged Architecture Library – Alpha BIOS) determines interrupt vector, calls kernel
Kernel uses vector to index IDT
After ISR execution, IRQL is lowered to initial level

42 Interrupt object
Allows device drivers to register ISRs for their devices
Contains dispatch code (initial handler)
Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)
Connecting/disconnecting interrupt objects:
Dynamic association between ISR and IDT entry
Loadable device drivers (kernel modules)
Turn on/off ISR
Interrupt objects can synchronize access to ISR data
Multiple instances of ISR may be active simultaneously (MP machine)
Multiple ISRs may be connected with the same IRQL

43 Predefined IRQLs
High: used when halting the system (via KeBugCheck())
Power fail: originated in the NT design document, but has never been used
Inter-processor interrupt: used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)
Clock: used to update the system's clock and to allocate CPU time to threads
Profile: used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)

44 Predefined IRQLs (contd.)
Device: used to prioritize device interrupts
DPC/dispatch and APC: software interrupts that kernel and device drivers generate
Passive: no interrupt level at all, normal thread execution

45 IRQLs on 64-bit Systems
15: x64 High/Profile; IA64 High/Profile/Power
14: x64 Interprocessor Interrupt/Power; IA64 Interprocessor Interrupt
13: x64 Clock; IA64 Clock
12: x64 Synch (Srv 2003); IA64 Synch (MP only)
...: x64 Device n; IA64 Device n
4: x64 (device levels continue); IA64 Device 1
3: x64 Device 1; IA64 Correctable Machine Check
2: x64 Dispatch/DPC; IA64 Dispatch/DPC & Synch (UP only)
1: x64 APC; IA64 APC
0: x64 Passive/Low; IA64 Passive/Low
Lab: how to find the IRQL of a device (from the hardware hive, devicename.Translated resource list; “Level” is the IRQL, “vector” is the offset in the IDT). May be quite different under NT5.

46 Interrupt Prioritization & Delivery
IRQLs are determined as follows:
x86 UP systems: IRQL = 27 - IRQ
x86 MP systems: bucketized (random)
x64 & IA64 systems: IRQL = IDT vector number / 16
On MP systems, which processor is chosen to deliver an interrupt?
By default, any processor can receive an interrupt from any device
Can be configured with IntFilter utility in Resource Kit
On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
Processors are assigned round robin for each interrupt vector
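
The two deterministic mappings above are plain integer arithmetic, sketched here as helper functions. The function names are illustrative; the x86 multiprocessor case is omitted because the slide describes it only as bucketized.

```c
/* x86 uniprocessor systems: IRQL = 27 - IRQ */
int irql_from_irq_x86_up(int irq) {
    return 27 - irq;
}

/* Inverse of the x86 uniprocessor mapping: IRQ = 27 - IRQL */
int irq_from_irql_x86_up(int irql) {
    return 27 - irql;
}

/* x64 and IA64 systems: IRQL = IDT vector number / 16 (integer division) */
int irql_from_vector_64bit(int vector) {
    return vector / 16;
}
```

For example, IRQ 1 on an x86 uniprocessor maps to IRQL 26 (0x1a), and on a 64-bit system all sixteen vectors 0xB0 through 0xBF share IRQL 11.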

47 Software interrupts
Initiating thread dispatching
DPCs allow for scheduling actions when the kernel is deep within many layers of code
Delayed scheduling decision, one DPC queue per processor
Handling timer expiration
Asynchronous execution of a procedure in context of a particular thread
Support for asynchronous I/O operations

48 Flow of Interrupts
EXPERIMENT: Examining Interrupt Internals
Using the kernel debugger, you can view details of an interrupt object, including its IRQL, ISR address, and custom interrupt dispatching code. First, execute the !idt command and locate the entry that includes a reference to I8042KeyboardInterruptService, the ISR routine for the PS2 keyboard device:
31: 8a39dc3c i8042prt!I8042KeyboardInterruptService (KINTERRUPT 8a39dc00)
To view the contents of the interrupt object associated with the interrupt, execute dt nt!_kinterrupt with the address following KINTERRUPT:
kd> dt nt!_kinterrupt 8a39dc00
nt!_KINTERRUPT
+0x000 Type :
+0x002 Size :
+0x004 InterruptListEntry : _LIST_ENTRY [ 0x8a39dc04 - 0x8a39dc04 ]
+0x00c ServiceRoutine : 0xba7e74a2 i8042prt!I8042KeyboardInterruptService+0
+0x010 ServiceContext : 0x8a
+0x014 SpinLock :
+0x018 TickCount : 0xffffffff
+0x01c ActualLock : 0x8a
+0x020 DispatchAddress : 0x nt!KiInterruptDispatch
+0x024 Vector : 0x31
+0x028 Irql : 0x1a ''
+0x029 SynchronizeIrql : 0x1a ''
+0x02a FloatingSave : 0 ''
…
In this example, the IRQL Windows assigned to the interrupt is 0x1a (which is 26 in decimal). Because this output is from a uniprocessor x86 system, we calculate that the IRQ is 1, because IRQLs on x86 uniprocessors are calculated by subtracting the IRQ from 27. We can verify this by opening the Device Manager, locating the PS/2 keyboard device, and viewing its resource assignments.

49 Synchronization on SMP Systems
Synchronization on MP systems uses spinlocks to coordinate among processors
Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
Spinlock is either free or is considered to be owned by a CPU
Analogous to using Windows API mutexes from user mode
A spinlock is just a data cell in memory
Accessed with a test-and-modify operation that is atomic across all processors
KSPIN_LOCK is an opaque data type, typedef’d as a ULONG
To implement synchronization, a single bit is sufficient

50 Kernel Synchronization
Processor A and Processor B each execute:
do acquire_spinlock(DPC) until (SUCCESS)
begin remove DPC from queue end
release_spinlock(DPC)
The DPC queue is the critical section guarded by the spinlock
A spinlock is a locking primitive associated with a global data structure, such as the DPC queue
The concept of mutual exclusion is crucial in operating systems development. It refers to the guarantee that one, and only one, thread can access a particular resource at a time. Mutual exclusion is necessary when a resource does not lend itself to shared access or when sharing would result in an unpredictable outcome. In general, writeable resources cannot be shared without restrictions. The issue of mutual exclusion is especially important for a tightly coupled, symmetric multiprocessing (SMP) operating system such as Windows 2000, in which the same system code runs simultaneously on more than one processor, sharing certain data structures stored in global memory. In Windows 2000, it is the kernel's job to provide mechanisms that system code can use to prevent two threads from modifying the same structure at the same time. The kernel provides mutual exclusion primitives that it and the rest of the executive use to synchronize their access to global data structures. Because the IRQL is an effective synchronization mechanism on uniprocessors, the spinlock acquisition and release functions of uniprocessor HALs do not implement spinlocks; they simply raise and lower the IRQL.

51 Queued Spinlocks
Problem: checking the status of a spinlock via a test-and-set operation creates bus contention
Queued spinlocks maintain a queue of waiting processors
First processor acquires lock; other processors wait on a processor-local flag
Thus, the busy-wait loop requires no access to the memory bus
When releasing the lock, the first waiting processor’s flag is modified
Exactly one processor is being signaled
Pre-determined wait order
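
The idea of spinning on a processor-local flag is the same one used by the well-known MCS queue lock, so an MCS-style lock in C11 atomics serves as a sketch of the technique. This is not the Windows implementation, which the slides do not show; the names qnode, queued_lock, queued_acquire and queued_release are illustrative, and each caller supplies its own queue node.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;  /* successor in the wait queue */
    atomic_bool waiting;           /* local flag this waiter spins on */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;         /* last waiter, or NULL if lock is free */
} queued_lock;

void queued_lock_init(queued_lock *l) {
    atomic_init(&l->tail, (qnode *)NULL);
}

void queued_acquire(queued_lock *l, qnode *me) {
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->waiting, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev == NULL)
        return;                                    /* lock was free */
    atomic_store(&prev->next, me);                 /* link behind predecessor */
    while (atomic_load(&me->waiting))
        ;                                          /* spin on our own flag only */
}

void queued_release(queued_lock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                      /* successor is still linking in */
    }
    atomic_store(&succ->waiting, false);           /* signal exactly one waiter */
}
```

Waiters are served in the order in which they swapped themselves into tail, which gives the pre-determined wait order mentioned above.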

52 SMP Scalability Improvements
Windows 2000: queued spinlocks
!qlocks in Kernel Debugger
Server 2003:
More spinlocks eliminated (context swap, system space, commit)
Further reduction of use of spinlocks & length of time they are held
Scheduling database now per-CPU
Allows thread state transitions in parallel

53 SMP Scalability Improvements
XP/2003:
Minimized lock contention for hot locks (PFN or Page Frame Database lock)
Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
New, more efficient locking mechanism (pushlocks)
Doesn’t use spinlocks when there is no contention
Used for object manager and address windowing extensions (AWE) related locks

54 Waiting
Flexible wait calls
Wait for one or multiple objects in one call
Wait for multiple can wait for “any” one or “all” at once
“All”: all objects must be in the signaled state concurrently to resolve the wait
All wait calls include optional timeout argument
Waiting threads consume no CPU time

55 Waiting
Waitable objects include:
Events (may be auto-reset or manual reset; may be set or “pulsed”)
Mutexes (“mutual exclusion”, one-at-a-time)
Semaphores (n-at-a-time)
Timers
Processes and Threads (signaled upon exit or terminate)
Directories (change notification)

56 Waiting
No guaranteed ordering of wait resolution
If multiple threads are waiting for an object, and only one thread is released (e.g. it’s a mutex or auto-reset event), which thread gets released is unpredictable

57 Executive Synchronization
Waiting on Dispatcher Objects – outside the kernel
Interaction with thread scheduling
Thread states shown in the diagram: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated
Diagram labels: Create and initialize thread object; Thread waits on an object handle; Wait is complete; Set object to signaled state
The focus within the process state diagram depicted here is on the ready, waiting, and running states (the states related to waiting on objects). The other states and the complete Windows approach to thread scheduling are covered in Unit OS 4.

58 Interaction between Synchronization & Dispatching
User mode thread waits on an event object's handle
Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list
Another thread sets the event
Kernel wakes up waiting threads; variable priority threads get priority boost

59 Interaction between Synchronization & Dispatching
Dispatcher reschedules the newly readied thread: it may preempt a running thread that has lower priority, and it issues a software interrupt to initiate the context switch
If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

60 What signals an object?
For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads
Mutex (kernel mode): nonsignaled -> signaled when the owning thread releases the mutex; kernel resumes one waiting thread; signaled -> nonsignaled when the resumed thread acquires the mutex
Mutex (exported to user mode): nonsignaled -> signaled when the owning thread or another thread releases the mutex; kernel resumes one waiting thread; signaled -> nonsignaled when the resumed thread acquires the mutex
Semaphore: nonsignaled -> signaled when one thread releases the semaphore, freeing a resource; kernel resumes one or more waiting threads; signaled -> nonsignaled when a thread acquires the semaphore and no more resources are available

61 What signals an object? (contd.)
For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads
Event: nonsignaled -> signaled when a thread sets the event; kernel resumes one or more waiting threads; signaled -> nonsignaled when the kernel resumes one or more threads
Event pair: nonsignaled -> signaled when a dedicated thread sets one event in the event pair; kernel resumes the waiting dedicated thread; signaled -> nonsignaled when the kernel resumes the other dedicated thread
Timer: nonsignaled -> signaled when the timer expires; kernel resumes all waiting threads; signaled -> nonsignaled when a thread (re)initializes the timer
Thread: nonsignaled -> signaled when the thread terminates; kernel resumes all waiting threads; signaled -> nonsignaled when a thread reinitializes the thread object

62 Further Reading
Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.
Chapter 3 - System Mechanisms
Trap Dispatching (pp. 85 ff.)
Synchronization (pp. 149 ff.)
Kernel Event Tracing (pp. 175 ff.)

63 3.3. Advanced Windows Synchronization
Deferred and Asynchronous Procedure Calls
IRQLs and CPU Time Accounting
Wait Queues & Dispatcher Objects

64 Deferred Procedure Calls (DPCs)
Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
Also used for quantum end and timer expiration
Driver (usually ISR) queues request
One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC level”) when all higher-IRQL work (interrupts) completed
Maximum times recommended: ISR: 10 usec, DPC: 25 usec
NOTES: Technically every CPU has its own DPC queue head and normally DPCs are queued to the CPU from which they’re requested. However the scheduler idle loop “peeks” at other CPUs’ DPC queues, so if one CPU is busy with high-IRQL code its DPCs may be processed by other CPUs. Also, KeSetTargetProcessorDpc can be used to direct the DPC object to a particular CPU. Nor is the queue always FIFO; KeSetImportanceDpc allows a DPC object to be set to high priority, which means that it goes at the head of the queue instead of the tail.
LAB: Perfmon: Interrupts/sec, %Interrupt time, %DPC time, other DPC counters
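
The queueing behavior in the notes, with normal DPCs appended at the tail and high-importance DPCs placed at the head, can be modeled with a small linked list. This is a toy model rather than kernel code: the dpc and dpc_queue types and the two functions are illustrative names, and a real per-CPU queue would also be protected by a spinlock.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct dpc {
    int id;               /* stands in for the deferred routine */
    struct dpc *next;
} dpc;

typedef struct {
    dpc *head;
    dpc *tail;
} dpc_queue;

/* Normal DPCs go to the tail (FIFO); high-importance DPCs go to the head. */
void dpc_enqueue(dpc_queue *q, dpc *d, bool high_importance) {
    d->next = NULL;
    if (q->head == NULL) {
        q->head = q->tail = d;
    } else if (high_importance) {
        d->next = q->head;
        q->head = d;
    } else {
        q->tail->next = d;
        q->tail = d;
    }
}

/* Removes and returns the DPC at the head, or NULL if the queue is empty. */
dpc *dpc_dequeue(dpc_queue *q) {
    dpc *d = q->head;
    if (d != NULL) {
        q->head = d->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return d;
}
```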

65 Deferred Procedure Calls (DPCs)
Diagram: queue head -> DPC object -> DPC object -> DPC object

66 Delivering a DPC
1. Timer expires, kernel queues DPC that will release all waiting threads; kernel requests software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue
DPC routines can't assume what process address space is currently mapped
DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects
Diagram: interrupt dispatch table with levels High, Power failure, ..., Dispatch/DPC, APC, Low; the dispatcher and its DPC queue are attached at the Dispatch/DPC level
The kernel always raises the processor's IRQL to DPC/dispatch level or above when it needs to synchronize access to shared kernel structures. This disables additional software interrupts and thread dispatching. When the kernel detects that dispatching should occur, it requests a DPC/dispatch level interrupt. But because the IRQL is at or above that level, the processor holds the interrupt in check. When the kernel completes its current activity, it sees that it's going to lower the IRQL below DPC/dispatch level and checks to see whether any dispatch interrupts are pending. If there are, the IRQL drops to DPC/dispatch level and the dispatch interrupts are processed.

67 Asynchronous Procedure Calls (APCs)
Execute code in context of a particular user thread
APC routines can acquire resources (objects), incur page faults, call system services
APC queue is thread-specific
User mode & kernel mode APCs
Permission required for user mode APCs

68 Asynchronous Procedure Calls (APCs)
Executive uses APCs to complete work in thread space
Wait for asynchronous I/O operation
Emulate delivery of POSIX signals
Make a thread suspend or terminate itself (environment subsystems)
APCs are delivered when thread is in alertable wait state
WaitForMultipleObjectsEx(), SleepEx()

69 Asynchronous Procedure Calls (APCs)
Special kernel APCs
Run in kernel mode, at IRQL 1
Always deliverable unless thread is already at IRQL 1 or above
Used for I/O completion reporting from “arbitrary thread context”
Kernel-mode interface is linkable, but not documented

70 Asynchronous Procedure Calls (APCs)
“Ordinary” kernel APCs
Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)
User mode APCs
Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also, QueueUserAPC
Only deliverable when thread is in “alertable wait”

71 Asynchronous Procedure Calls (APCs)
Diagram: a thread object with two APC queues, kernel (K) and user (U), each holding APC objects

72 IRQLs and CPU Time Accounting
Interval clock timer ISR keeps track of time
Clock ISR time accounting:
If IRQL<2, charge to thread’s user or kernel time
If IRQL=2 and processing a DPC, charge to DPC time
If IRQL=2 & not processing a DPC, charge to thread kernel time
If IRQL>2, charge to interrupt time
Note: if you’re at IRQL 2 the time appears as %DPC time, even though you might not be running a DPC. Similarly, if you’re at IRQL >2 the time appears as %interrupt time, even though you might not be servicing an interrupt. You might have just done a KeRaiseIrql.
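
The four accounting rules listed above amount to a small decision function that the clock ISR could apply on every tick. The sketch below only encodes those rules as written; the enum, the function name clock_tick_charge and its parameters are illustrative, not actual kernel identifiers.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decides where one clock tick is charged, given the IRQL that was
   interrupted, whether a DPC was running, and the processor mode. */
charge_target clock_tick_charge(int irql, bool in_dpc, bool user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;                        /* IRQL > 2 */
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;  /* IRQL < 2 */
}
```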

73 IRQLs and CPU Time Accounting
Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
Note: time at IRQL 2 or more is charged to the current thread’s quantum (to be described)

74 Interrupt Time Accounting
Task Manager includes interrupt and DPC time with the Idle process time
Interrupt activity is not charged to any thread/process
Process Explorer shows these as separate processes (not really processes)
Context switches for these are really the number of interrupts & DPCs

75 Time Accounting Quirks
Looking at total CPU time for each process may not reveal where the system has spent its time
CPU time accounting is driven by programmable interrupt timer
Normally 10 msec (15 msec on some MP Pentiums)
Thread execution and context switches between clock intervals are NOT accounted
E.g., one or more threads run and enter a wait state before the clock fires
Thus threads may run but never get charged
View context switch activity with Process Explorer
Add Context Switch Delta column

76 Looking at Waiting Threads
For waiting threads, user-mode utilities only display the wait reason
Example: pstat

77 Wait Internals 1: Dispatcher Objects
Any kernel object you can wait for is a “dispatcher object”
  Some exist exclusively for synchronization, e.g. events, mutexes (“mutants”), semaphores, queues, timers
  Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
  Non-waitable kernel objects are called “control objects”
All dispatcher objects have a common header
All dispatcher objects are in one of two states: “signaled” vs. “nonsignaled”
  When signaled, a wait on the object is satisfied
  Different object types differ in terms of what changes their state
  The wait and unwait implementation is common to all types of dispatcher objects
[Figure: dispatcher object layout: Size, Type, State, Wait listhead, object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

78 Wait Internals 2: Wait Blocks
Wait blocks represent a thread’s reference to something it’s waiting for (one per handle passed to WaitFor…)
All wait blocks from a given wait call are chained to the waiting thread
Type indicates wait for “any” or “all”
Key denotes argument list position for WaitForMultipleObjects
[Figure: thread objects, each with a WaitBlockList pointing to its chain of wait blocks (List entry, Object, Thread, Key, Type, Next link); each wait block also links to the wait list head of a dispatcher object (Size, Type, State, Wait listhead, object-type-specific data)]
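To make the roles of Type and Key concrete, here is a deliberately simplified user-space model of a wait-block chain. It is a sketch under stated assumptions, not the kernel's actual layout: the struct names, the singly linked `next` pointer, and the function `wait_satisfied` are all hypothetical, and only the field meanings (Object, Key, Type, State) come from the slide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a dispatcher object header (illustrative only). */
typedef struct {
    bool signaled;              /* State: signaled vs. nonsignaled */
} dispatcher_object;

typedef enum { WAIT_ANY, WAIT_ALL } wait_type;

/* Simplified stand-in for a wait block: one per object being waited on. */
typedef struct wait_block {
    dispatcher_object *object;  /* Object: what the thread waits for */
    int key;                    /* Key: position in the handle array */
    wait_type type;             /* Type: wait for "any" or "all" */
    struct wait_block *next;    /* Next link: next block of the same wait call */
} wait_block;

/* Returns the key of the first signaled object in the chain for a wait-any,
 * 0 for a satisfied wait-all, or -1 if the wait is not yet satisfied. */
int wait_satisfied(const wait_block *list)
{
    if (list == NULL)
        return -1;
    if (list->type == WAIT_ANY) {
        for (const wait_block *b = list; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const wait_block *b = list; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;          /* wait-all needs every object signaled */
    return 0;
}
```

A wait-any over two objects where only the second is signaled returns key 1, which mirrors how the key identifies the handle's position in the WaitForMultipleObjects argument list.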

79 3.4. Windows APIs for Synchronization and IPC
Windows API constructs for synchronization and interprocess communication
Synchronization
  Critical sections
  Mutexes
  Semaphores
  Event objects
Synchronization through interprocess communication
  Anonymous pipes
  Named pipes
  Mailslots
The Windows API provides a number of constructs for thread and process synchronization, namely critical sections, mutexes, semaphores, and events. Among them, critical sections are special as they can only be used for synchronizing threads within the same process. In addition, the Windows API provides various constructs for interprocess communication (IPC). Two primary Windows mechanisms for IPC are the anonymous pipe and the named pipe, both of which can be accessed with the familiar ReadFile() and WriteFile() functions. As such, they are well suited for redirecting the output of one program to the input of another, as is commonly done between UNIX programs. Named pipes are much more powerful. They are full-duplex and message-oriented, and they allow networked communication. There can be multiple handles open on the same pipe. These capabilities make named pipes appropriate for client/server systems. Win32 mailslots are another networked IPC mechanism, which implements one-to-many message broadcasting. Besides functioning as communication mechanisms, these constructs can also be used to achieve synchronization among cooperating processes.

80 Critical Sections
VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );
Only usable from within the same process
Critical sections are initialized and deleted but do not have handles
Only one thread at a time can be in a critical section
A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match
Leaving a critical section before entering it may cause deadlocks
No way to test whether another thread is in a critical section

81 Critical Section Example
/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection( &crit );
/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if an exception occurs */
        LeaveCriticalSection( &crit );
    }
}
DeleteCriticalSection( &crit );

82 Synchronizing Threads with Kernel Objects
DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );
The following kernel objects can be used to synchronize threads:
  Processes
  Threads
  Files
  Console input
  File change notifications
  Mutexes
  Semaphores
  Events (auto-reset + manual-reset)
  Waitable timers

83 Wait Functions - Details
WaitForSingleObject():
  hObject specifies the kernel object
  dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, check whether the object is signaled
  dwTimeout == INFINITE: wait forever
WaitForMultipleObjects():
  cObjects <= MAXIMUM_WAIT_OBJECTS (64)
  lpHandles: pointer to the array identifying these objects
  bWaitAll: whether to wait for the first signaled object or for all objects
  Function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)
Side effects:
  Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

84 Mutexes
Mutexes work across processes
First thread has to call CreateMutex()
When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()
fInitialOwner == TRUE gives the creator immediate ownership
Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()
ReleaseMutex() gives up ownership
CloseHandle() will free the mutex object

85 Mutexes
HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );
BOOL ReleaseMutex( HANDLE hMutex );

86 Mutex Example
/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );
/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

87 Comparison - POSIX mutexes
POSIX pthreads specification supports mutexes
  Synchronization among threads in same process
Five basic functions:
  pthread_mutex_init()
  pthread_mutex_destroy()
  pthread_mutex_lock()
  pthread_mutex_unlock()
  pthread_mutex_trylock()
Comparison:
  pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
  pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
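The blocking and polling variants compared above can be sketched with a short pthreads example. The function and variable names (`add_blocking`, `add_if_free`, `run_two_workers`, `counter_lock`) are illustrative assumptions; only the pthread_mutex_* calls come from the POSIX API named on the slide.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared counter protected by a mutex (names are illustrative). */
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

/* Blocking variant: pthread_mutex_lock() waits until the mutex is free. */
void add_blocking(int value)
{
    pthread_mutex_lock(&counter_lock);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling variant: returns 1 if the update was applied, 0 if the mutex was busy. */
int add_if_free(int value)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;               /* typically EBUSY: the mutex is already held */
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

/* Thread entry point: adds 1 to the counter 1000 times. */
void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        add_blocking(1);
    return NULL;
}

/* Runs two workers concurrently and returns the final counter value. */
int run_two_workers(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Because every increment happens under the mutex, the two workers together always produce 2000; `add_if_free()` plays the role of a zero-timeout wait and simply reports failure when the lock is taken.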

88 Semaphores
Semaphore objects are used for resource counting
  A semaphore is signaled when count > 0
Threads/processes use wait functions
  Each completed wait decreases the semaphore count by 1
ReleaseSemaphore() may increment the count by any value
ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

89 Semaphores
HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );
BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

90 Events
Multiple threads can be released when a single event is signaled (barrier synchronization)
Manual-reset event can signal several threads simultaneously; must be reset manually
PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
Auto-reset event signals a single thread; event is reset automatically
fInitialState == TRUE: create event in signaled state

91 Events
HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

92 Comparison - POSIX condition variables
pthread’s condition variables are comparable to events
  pthread_cond_init()
  pthread_cond_destroy()
Wait functions:
  pthread_cond_wait()
  pthread_cond_timedwait()
Signaling:
  pthread_cond_signal(): one thread
  pthread_cond_broadcast(): all waiting threads
No exact equivalent to manual-reset events
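One way to see why there is no exact equivalent: a condition variable keeps no state of its own, so the "signaled" state of a manual-reset event has to be stored in a separate flag guarded by a mutex. The sketch below emulates that behavior; `event_t` and the `event_*` functions are hypothetical helper names, not part of pthreads.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Emulates a manual-reset event with a flag, a mutex, and a condition
 * variable. Type and function names are illustrative, not a standard API. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* the state a condition variable lacks */
} event_t;

void event_init(event_t *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

/* Comparable to SetEvent(): record the state and wake all waiters. */
void event_set(event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Comparable to an infinite wait on the event. The loop re-checks the
 * flag because pthread_cond_wait() may wake up spuriously. */
void event_wait(event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(event_t *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

/* Thread entry point: blocks until the event is set, then returns its argument. */
void *waiter(void *arg)
{
    event_wait((event_t *)arg);
    return arg;
}
```

Any number of `waiter` threads started before or after `event_set()` will pass the wait, which is the manual-reset behavior; an auto-reset variant would clear the flag inside `event_wait()` and use pthread_cond_signal() instead.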

93 Anonymous pipes
Half-duplex character-based IPC
BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );
cbPipe: pipe byte size; zero == default
Read on pipe handle will block if pipe is empty
Write operation to a full pipe will block
Anonymous pipes are one-way
[Figure: main creates a pipe that connects prog1 to prog2]

94 I/O Redirection using an Anonymous Pipe
/* Create default size anonymous pipe, handles are inheritable. */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf (stderr, "Anon pipe create failed\n");
    exit (1);
}
/* Set output handle to pipe handle, create first process. */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, /* Inherit handles. */
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf (stderr, "CreateProc1 failed\n");
    exit (2);
}
CloseHandle (hWritePipe);

95 Pipe example (contd.)
/* Repeat (symmetrically) for the second process. */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles. */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf (stderr, "CreateProc2 failed\n");
    exit (3);
}
CloseHandle (hReadPipe);
/* Wait for both processes to complete. */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread);
CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread);
CloseHandle (ProcInfo2.hProcess);
return 0;

96 Named Pipes
Message oriented:
  Reading process can read varying-length messages precisely as sent by the writing process
Bi-directional:
  Two processes can exchange messages over the same pipe
Multiple, independent instances of a named pipe:
  Several clients can communicate with a single server using the same pipe name (one instance per client)
  Server can respond to a client using the same instance
Pipe can be accessed over the network (location transparency)
Convenience and connection functions

97 Using Named Pipes
HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );
lpszPipeName: \\.\pipe\[path]pipename
  Not possible to create a pipe on a remote machine (. means local machine)
fdwOpenMode:
  PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND
fdwPipeMode:
  PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
  PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
  PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)
Use the same flag settings for all instances of a named pipe

98 Named Pipes (contd.)
nMaxInstances:
  Number of instances
  PIPE_UNLIMITED_INSTANCES: OS choice based on resources
dwTimeOut:
  Default time-out period (in msec) for WaitNamedPipe()
First CreateNamedPipe() creates the named pipe
  Closing the handle to the last instance deletes the named pipe
Polling a pipe:
  Nondestructive: is there a message waiting for ReadFile?
BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

99 Named Pipe Client Connections
CreateFile with named pipe name:
  \\.\pipe\[path]pipename
  \\servername\pipe\[path]pipename
  First method gives better performance (local server)
Status functions:
  GetNamedPipeHandleState
  SetNamedPipeHandleState
  GetNamedPipeInfo

100 Convenience Functions
WriteFile / ReadFile sequence:
BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

101 Convenience Functions
CreateFile / WriteFile / ReadFile / CloseHandle sequence:
BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );
dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102 Server: eliminate the polling loop
BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );
lpo == NULL: call will return as soon as there is a client connection
Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED
Use DisconnectNamedPipe() to free the handle for connection from another client
WaitNamedPipe(): client may wait for the server’s ConnectNamedPipe()
Security rights for named pipes: GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103 Comparison with UNIX
UNIX FIFOs are similar to a named pipe
  FIFOs are half-duplex
  FIFOs are limited to a single machine
  FIFOs are still byte-oriented, so it’s easiest to use fixed-size records in client/server applications
  Individual reads/writes are atomic
A server using FIFOs must use a separate FIFO for each client’s response, although all clients can send requests via a single, well-known FIFO
mkfifo() is the UNIX counterpart to CreateNamedPipe()
Use sockets for networked client/server scenarios
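The fixed-size-record advice for FIFOs can be sketched as follows. This is a minimal POSIX example under stated assumptions: the helper name `fifo_round_trip`, the record length of 32 bytes, and the use of fork() to play the client are all illustrative choices; only mkfifo() and the byte-oriented, half-duplex behavior come from the slide.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

#define RECORD_LEN 32   /* fixed-size record, as the slide recommends */

/* Creates a FIFO at `path`, forks a child that writes one fixed-size record,
 * and reads it back in the parent. Returns 0 on success, -1 on error. */
int fifo_round_trip(const char *path, const char *msg, char out[RECORD_LEN])
{
    char record[RECORD_LEN] = {0};
    strncpy(record, msg, RECORD_LEN - 1);   /* pad the record with zeros */

    unlink(path);                           /* ignore error: may not exist */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                         /* child: the "client" */
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t written = write(wfd, record, RECORD_LEN);
        close(wfd);
        _exit(written == RECORD_LEN ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);         /* parent: the "server" */
    ssize_t n = (rfd >= 0) ? read(rfd, out, RECORD_LEN) : -1;
    if (rfd >= 0)
        close(rfd);

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (n == RECORD_LEN && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Since the FIFO carries data in one direction only, a reply to the client would need a second FIFO, which is the per-client response FIFO the slide mentions.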

104 Client Example using Named Pipe
WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE, 0, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf (stderr, "Failure to locate server.\n");
    exit (3);
}
/* Write the request. */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out. */
while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", Response.Record);
CloseHandle (hNamedPipe);
return 0;

105 Server Example Using a Named Pipe
hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf (stderr, "Connect or Read Named Pipe error\n");
        exit (4);
    }
    printf ("Request is: %s\n", Request.Record);
    /* Send the file, one line at a time, to the client. */
    fp = fopen (File, "r");
    while ((fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &Response.Record,
                (strlen (Response.Record) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation. */

106 Win32 IPC - Mailslots
Mailslots bear some nasty implementation details; they are almost never used
Broadcast mechanism:
  One-directional
  Multiple writers/multiple readers (frequently: one-to-many comm.)
  Message delivery is unreliable
  Can be located over a network domain
  Message lengths are limited (w2k: < 426 byte)
Operations on the mailslot:
  Each reader (server) creates a mailslot with CreateMailslot()
  Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
  Client’s message can be read by all servers (readers)
Client lookup: \\*\mailslot\mailslotname
  Client will connect to every server in the network domain
The Win32 API provides various constructs for interprocess communication (IPC). Two primary Win32 mechanisms for IPC are the anonymous pipe and the named pipe, both of which can be accessed with the familiar ReadFile() and WriteFile() functions. As such, they are well suited for redirecting the output of one program to the input of another, as is commonly done between UNIX programs. Win32 mailslots are another networked IPC mechanism, which implements one-to-many message broadcasting. Within a distributed client/server application, mailslots can be used to implement a naming service by periodically broadcasting the names of those named pipes which are access points to the server process.

107 Locate a server via mailslot
Mailslot servers (App client 0 … App client n), each running:
    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */
Mailslot client (App Server); the message is sent periodically:
    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo
    }

108 Creating a mailslot
HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );
lpszName points to a name of the form \\.\mailslot\[path]name
  Name must be unique; mailslot is created locally
cbMaxMsg is the maximum message size in bytes
dwReadTimeout:
  Read operation will wait for so many msec
  0: immediate return
  MAILSLOT_WAIT_FOREVER: infinite wait

109 Opening a mailslot
CreateFile with the following names:
  \\.\mailslot\[path]name: retrieve handle for local mailslot
  \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
  \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
  \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system’s primary domain; max. message length: 400 bytes
Client must specify the FILE_SHARE_READ flag
GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

110 Thoughts Change Life 意念改变生活

