COT 5611 Operating Systems Design Principles Spring 2012

1 COT 5611 Operating Systems Design Principles Spring 2012
Dan C. Marinescu
Office: HEC 304
Office hours: M, Wd 5:00-6:00 PM

2 Lecture 18 – Monday, March 19, 2012
Reading assignment: Chapter 9 from the on-line text
Last time: Error correcting codes

3 Today
All-or-nothing and before-or-after atomicity
Atomicity and processor management
Processes, threads, and address spaces
Thread coordination with a bounded buffer – the naïve approach
Thread management
Address spaces and multi-level memories
Kernel structures for the management of multiple cores/processors and threads/processes
The YIELD system call

4 All-or-nothing and before-or-after atomicity
All-or-nothing atomicity → a strategy for masking failures during the execution of programs.
Before-or-after atomicity → necessary for the coordination of concurrent activities.
Applications:
  Transaction processing systems.
  Processor virtualization and processor sharing in operating systems.
Adding multi-level memory management to pipelined processors complicates the design of the OS: a missing-page exception can occur in the "middle" of an instruction and requires an all-or-nothing interface between the processor and the OS.
The coordination of concurrent threads and scheduling require before-or-after atomicity.

5 Processes, threads, and address spaces
Abstractions:
  Process → a program in execution.
  Address space → the container in which a process runs.
  Thread → a lightweight process; multiple threads can share the same address space, and creating an additional thread does not require the overhead associated with allocating an address space.
The distinction between process and thread is somewhat blurred in the modern OS literature.
Virtualization supports the abstractions necessary to:
  1. Cope with the mismatch between processor speed and memory and I/O speed. A processor is shared among several processes/threads → a process/thread is a virtual processor.
  2. Allow processes/threads to run independently of the size of the physical memory → an address space is a virtual memory container.

6 Thread coordination with bounded buffers
Bounded buffer → the virtualization of a communication channel.
Thread coordination:
  Locks for serialization.
  Bounded buffers for communication.
Producer thread → writes data into the buffer.
Consumer thread → reads data from the buffer.
Basic assumptions:
  We have only two threads.
  Threads proceed concurrently at independent speeds/rates.
  Bounded buffer – only N buffer cells.
  Messages are of fixed size and occupy only one buffer cell.
Spin lock → a thread keeps checking a control variable/semaphore "until the light turns green." Feasible only when the threads run on different processors (otherwise, how could the spinning thread give another thread a chance to run?).

7 (figure)

8 Implicit assumptions for the correctness of the thread coordination implementation
One sending and one receiving thread.
Only one thread updates each shared variable.
Sender and receiver threads run on different processors, to allow spin locks.
in and out are implemented as integers large enough that they do not overflow (e.g., 64-bit integers).
The shared memory used for the buffer provides read/write coherence.
The memory provides before-or-after atomicity for the shared variables in and out.
The result of executing a statement becomes visible to all threads in program order, i.e., no compiler optimizations.
These assumptions are not realistic!

9 Thread and virtual memory management
The kernel supports thread and virtual memory management.
Thread management:
  Creation and destruction of threads.
  Allocation of the processor to a ready-to-run thread.
  Handling of interrupts.
  Scheduling – deciding which of the ready-to-run threads should be allocated the processor.
Virtual memory management → maps the virtual address space of a process/thread to physical memory.
Each module runs in its own address space; if one module runs multiple threads, all of them share that address space.
Thread + virtual memory → a virtual computer for each module.

10 Multi-level memory systems
Moving down the hierarchy, the amount of storage and the access time both increase:
  CPU registers
  L1 cache
  L2 cache
  Main memory
  Magnetic disk
  Mass storage systems
  Remote storage
Memory management schemes → decide where the data is placed in this hierarchy:
  Manual → left to the user.
  Automatic → based on memory virtualization; more effective and easier to use.

11 Forms of memory virtualization
Memory-mapped files → in UNIX, mmap.
Copy on write → when several threads use the same data, map the page holding the data and store the data only once in memory. This works as long as all the threads only READ the data. If one of the threads carries out a WRITE, the virtual memory handler generates an exception and the data pages are remapped so that each thread gets its own copy of the page.
On-demand zero-filled pages → instead of allocating zero-filled pages in RAM or on the disk, the VM manager maps these pages without READ or WRITE permissions. When a thread attempts to actually READ or WRITE such a page, an exception is generated and the VM manager allocates the page dynamically.
Virtual shared memory → several threads on multiple systems share the same address space. When a thread references a page that is not in its local memory, the local VM manager fetches the page over the network and the remote VM manager unmaps it.
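A minimal POSIX sketch of the first form, memory-mapped files: once mapped, the file reads like ordinary memory, and its pages are brought into RAM on demand at the first reference. The file name "data.txt" is illustrative and error handling is reduced to the essentials.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* MAP_PRIVATE is also the copy-on-write case: with write permission,
       the first WRITE to a shared page would fault and the VM manager
       would give this process its own copy of the page. Here the
       mapping is read-only. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long lines = 0;                 /* the file now reads like memory */
    for (off_t i = 0; i < st.st_size; i++)
        if (p[i] == '\n') lines++;
    printf("%ld lines\n", lines);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}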

12 Multi-level memory management and virtual memory
Two-level memory system: RAM + disk. Each page of an address space has an image on the disk. The RAM consists of blocks.
READ and WRITE from RAM → controlled by the VM manager.
GET and PUT from disk → controlled by a multi-level memory manager.
Old design philosophy: integrate the two to reduce the instruction count.
New approach – modular organization:
  Implement the VM manager (VMM) in hardware; it translates virtual addresses into physical addresses.
  Implement the multi-level memory manager (MLMM) in software, in the kernel; it transfers pages back and forth between RAM and the disk.
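A sketch of this division of labor, written as C for readability even though the real VMM is hardware. All structures and names (pte, translate, mlmm_fetch_from_disk) are hypothetical; the MLMM call is left as a stub.

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    1024u

typedef struct {
    bool     present;   /* is the page currently in a RAM block?      */
    uint32_t frame;     /* which RAM block holds the page, if present */
} pte;                  /* page-table entry */

static pte page_table[NPAGES];

/* MLMM stub: GET the page image from disk into a free RAM block and
   return the block number. Stands in for kernel software. */
extern uint32_t mlmm_fetch_from_disk(uint32_t page);

/* VMM: translate a virtual address to a physical address. */
uint32_t translate(uint32_t vaddr) {
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (!page_table[page].present) {
        /* Missing-page exception: the point where the hardware needs
           an all-or-nothing interface to the OS (slide 4), and the
           MLMM takes over. */
        page_table[page].frame   = mlmm_fetch_from_disk(page);
        page_table[page].present = true;
    }
    return page_table[page].frame * PAGE_SIZE + offset;
}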

13 (figure)

14 The price to pay: the performance of a two-level memory
The latency LP << LS:
  LP → latency of the primary device, e.g., 10 nsec for RAM.
  LS → latency of the secondary device, e.g., 10 msec for disk.
Hit ratio h → the probability that a reference will be satisfied by the primary device.
Average latency: AS = h × LP + (1 − h) × LS.
Example: LP = 10 nsec (primary device is main memory), LS = 10 msec = 10,000,000 nsec (secondary device is the disk).
  Hit ratio h = 0.90 → AS = 0.9 × 10 + 0.1 × 10,000,000 ≈ 1,000,009 nsec ≈ 1,000 microseconds = 1 msec.
  Hit ratio h = 0.99 → AS = 0.99 × 10 + 0.01 × 10,000,000 ≈ 100,010 nsec ≈ 100 microseconds = 0.1 msec.
  Hit ratio h = 0.999 → AS = 0.999 × 10 + 0.001 × 10,000,000 ≈ 10,010 nsec ≈ 10 microseconds = 0.01 msec.
  Hit ratio h = 0.9999 → AS = 0.9999 × 10 + 0.0001 × 10,000,000 ≈ 1,010 nsec ≈ 1 microsecond.
This considerable slowdown is due to the very large discrepancy (six orders of magnitude) between the latencies of the primary and the secondary device.
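A few lines of C that evaluate the formula for the hit ratios above; running it reproduces the slide's numbers (roughly 1,000,009 / 100,010 / 10,010 / 1,010 nsec).

#include <stdio.h>

int main(void) {
    const double LP = 10.0;       /* primary latency: 10 nsec (RAM)       */
    const double LS = 1.0e7;      /* secondary latency: 10 msec, in nsec  */
    const double h[] = { 0.90, 0.99, 0.999, 0.9999 };

    for (int i = 0; i < (int)(sizeof h / sizeof h[0]); i++) {
        double AS = h[i] * LP + (1.0 - h[i]) * LS;   /* average latency */
        printf("h = %.4f  ->  AS = %12.1f nsec\n", h[i], AS);
    }
    return 0;
}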

15 The performance of a two-level memory (cont'd)
Statement: if each reference occurs with equal frequency to every cell, whether in the primary or the secondary device, then the combined memory will operate at roughly the speed of the secondary device.
The size SizeP << SizeS:
  SizeP → number of cells of the primary device.
  SizeS → number of cells of the secondary device.
  SizeS = K × SizeP, with K large (1/K small).
Reason: with uniform references the hit ratio is h ≈ SizeP/SizeS = 1/K, so AS = (1/K) × LP + (1 − 1/K) × LS ≈ LS.

16 Locality of reference
References concentrate:
  Spatial locality of reference.
  Temporal locality of reference.
Reasons for locality of reference:
  Programs consist of runs of sequential instructions interrupted by branches.
  Data structures group together related data elements.
Working set → the collection of references made by an application in a given time window. If the working set is larger than the number of cells of the primary device, performance degrades significantly.

17 Threads and the TM (Thread Manager)
Thread → virtual processor; multiplexes a physical processor. A module in execution; a module may have several threads.
Sequence of operations:
  Load the module's text.
  Create a thread and launch the execution of the module in that thread.
Scheduler → the system component which chooses the thread to run next.
Thread Manager → implements the thread abstraction.
Interrupts → processed by the interrupt handler, which interacts with the thread manager.
Exceptions → interrupts caused by the running thread, processed by exception handlers.
Interrupt handlers run in the context of the OS, while exception handlers run in the context of the interrupted thread.

18 The state of a thread; kernel versus application threads
Thread state:
  Thread Id → unique identifier of a thread.
  Program Counter (PC) → the reference to the next computational step.
  Stack Pointer (SP).
  PMAR → Page-table Memory Address Register.
  Other registers.
Threads:
  User-level/application threads – threads running on behalf of users.
  Kernel-level threads – threads running on behalf of the kernel.
  Scheduler thread – the thread running the scheduler.
  Processor-level thread – the thread running when there is no thread ready to run.
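A hypothetical thread-table entry mirroring the state listed above; field names and sizes are illustrative, not a real kernel's layout.

#include <stdint.h>

typedef enum { RUNNING, RUNNABLE } thread_state;

typedef struct {
    uint32_t     thread_id;  /* unique identifier                        */
    uint32_t     pc;         /* program counter: next computational step */
    uint32_t     sp;         /* stack pointer; the rest of the state is  */
                             /* saved on the thread's own stack          */
    uint32_t     pmar;       /* page-table memory address register       */
    uint32_t     regs[16];   /* other registers                          */
    thread_state state;
} thread_table_entry;

thread_table_entry thread_table[7];  /* fixed size, as assumed below */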

19 (figure)

20 Virtual versus real; the state of a processor
Virtual objects need physical support. A system may have multiple processors, and each processor may have multiple cores.
The state of a processor or a core:
  Processor Id/Core Id → unique identifier of a processor/core.
  Program Counter (PC) → the reference to the next computational step.
  Stack Pointer (SP).
  PMAR → Page-table Memory Address Register.
  Other registers.
The OS maintains several kernel structures:
  Thread table → used by the thread manager to maintain the state of all threads.
  Processor table → used by the scheduler to maintain the state of each processor.
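A matching hypothetical processor-table entry; again the fields are invented for illustration. The topstack field reappears in the correction on slide 37.

#include <stdint.h>

typedef struct {
    uint32_t processor_id;    /* or core id                              */
    uint32_t topstack;        /* SP of this processor's processor thread */
    uint32_t running_thread;  /* id of the thread currently dispatched   */
    int      lock;            /* serializes access (see the next slide)  */
} processor_table_entry;

processor_table_entry processor_table[4];  /* e.g., a four-core system */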

21 (figure)

22 Processor and thread tables – control structures that must be locked for serialization

23 Primitives for processor virtualization
Memory:
  CREATE/DELETE_ADDRESS_SPACE
  ALLOCATE/FREE_BLOCK
  MAP/UNMAP
Interpreter:
  ALLOCATE_THREAD
  DESTROY_THREAD
  EXIT_THREAD
  YIELD
  AWAIT
  ADVANCE
  TICKET
  ACQUIRE
  RELEASE
Communication channel:
  ALLOCATE/DEALLOCATE_BOUNDED_BUFFER
  SEND/RECEIVE
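A sketch of how a few of these primitives might be wired together into the producer/consumer pattern from slide 6. The slide only names the primitives; the C signatures here are invented for illustration (ALLOCATE_THREAD's arguments follow slide 26).

typedef int thread_id;
typedef int buffer_id;

extern buffer_id ALLOCATE_BOUNDED_BUFFER(int n_cells);
extern thread_id ALLOCATE_THREAD(void (*start)(void), int address_space_id);
extern void      SEND(buffer_id b, int message);
extern int       RECEIVE(buffer_id b);

static buffer_id channel;

static void producer(void) {
    for (int i = 0; i < 100; i++)
        SEND(channel, i);            /* may wait internally if full  */
}

static void consumer(void) {
    for (int i = 0; i < 100; i++) {
        int m = RECEIVE(channel);    /* may wait internally if empty */
        (void)m;                     /* consume the message          */
    }
}

void setup(int address_space_id) {
    channel = ALLOCATE_BOUNDED_BUFFER(10);        /* N = 10 cells    */
    ALLOCATE_THREAD(producer, address_space_id);
    ALLOCATE_THREAD(consumer, address_space_id);
}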

24 Thread states and state transitions

25 The state of a thread and its associated virtual address space

26 Switching the processor from one thread to another
Thread creation:
  thread_id ← ALLOCATE_THREAD(starting_address_of_procedure, address_space_id);
YIELD → a function implemented by the kernel to allow a thread to wait for an event. It allows several threads running on the same processor to wait for a lock; it replaces the busy wait.
Actions:
  Save the state of the current thread.
  Schedule another thread.
  Start running the new thread – dispatch the processor to the new thread.
More about YIELD:
  It cannot be implemented in a high-level language; it must be implemented in machine language.
  It can be called from the environment of the thread, e.g., from C, C++, or Java.

27 YIELD
System call → executed by the kernel at the request of an application; it allows an active thread A to voluntarily release control of the processor.
YIELD invokes the ENTER_PROCESSOR_LAYER procedure, which:
  locks the thread table and unlocks it when it finishes its work;
  changes the state of thread A from RUNNING to RUNNABLE;
  invokes the SCHEDULER.
The SCHEDULER searches the thread table to find another thread B in RUNNABLE state.
The state of thread B is changed from RUNNABLE to RUNNING.
The registers of the processor are loaded with the ones saved on the stack for thread B.
Thread B becomes active.
Why is it necessary to lock the thread table?
  We may have multiple cores/processors, so another thread may be active.
  An interrupt may occur.
The pseudocode assumes that we have a fixed number of threads, 7.
The flow of control: YIELD → ENTER_PROCESSOR_LAYER → SCHEDULER → EXIT_PROCESSOR_LAYER → YIELD.
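A C sketch of this flow, mirroring the textbook's pseudocode in spirit rather than letter; ACQUIRE/RELEASE and the two layer-crossing procedures are declared here and sketched after slide 34.

#include <stdint.h>

#define NTHREADS 7    /* the fixed-size thread table assumed by the slide */

typedef enum { RUNNING, RUNNABLE } thread_state;
typedef struct { thread_state state; uint32_t sp; } thread_entry;

extern thread_entry thread_table[NTHREADS];
extern int  thread_table_lock;
extern void ACQUIRE(int *lock);
extern void RELEASE(int *lock);
extern void ENTER_PROCESSOR_LAYER(int thread, int processor);
extern void EXIT_PROCESSOR_LAYER(int thread, int processor);

/* Thread A voluntarily releases the processor. */
void YIELD(int this_thread, int processor) {
    ACQUIRE(&thread_table_lock);   /* another core, or an interrupt
                                      handler, may also touch the table */
    ENTER_PROCESSOR_LAYER(this_thread, processor);
    /* Control reaches this point only when this thread is dispatched
       again; the RELEASE below is then executed by the resuming thread,
       since every thread yields through this same code path. */
    RELEASE(&thread_table_lock);
}

/* Runs in the processor layer, on the processor thread's stack. */
void SCHEDULER(int processor) {
    for (;;) {
        for (int j = 0; j < NTHREADS; j++) {     /* search the table */
            if (thread_table[j].state == RUNNABLE) {
                thread_table[j].state = RUNNING;
                /* Dispatch thread B; control comes back here when some
                   thread re-enters the processor layer. */
                EXIT_PROCESSOR_LAYER(j, processor);
            }
        }
    }
}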

28 (figure)

29 Dynamic thread creation and termination
Until now we assumed a fixed number of threads (7); the thread table was of fixed size. We have to support two other system calls:
  EXIT_THREAD → allows a thread to self-destruct and clean up.
  DESTROY_THREAD → allows a thread to terminate another thread of the same application.

30 Important facts to remember
Each thread has a unique ThreadId.
Threads save their state on the stack. The stack pointer of a thread is stored in the thread table.
To activate a thread, the registers of the processor are loaded with information from the thread state.
What if no thread is able to run?
  Create a dummy thread for each processor, called a processor_thread, which is scheduled to run when no other thread is available.
  The processor_thread runs in the thread layer; the SCHEDULER runs in the processor layer.
We have a processor thread for each processor/core.

31 System start-up procedure
procedure RUN_PROCESSORS()
    for each processor do
        allocate stack and set up processor thread  /* allocation of the stack is done at the processor layer */
        shutdown ← FALSE
        SCHEDULER()
        deallocate processor_thread stack  /* deallocation of the stack is done at the processor layer */
        halt processor

32 Switching threads with dynamic thread creation
Switching from one user thread to another requires two steps:
  Switch from the thread releasing the processor to the processor thread.
  Switch from the processor thread to the new thread which is going to take control of the processor.
The last step requires the SCHEDULER to cycle through the thread_table until a thread ready to run is found.
The boundary between user-layer threads and the processor-layer thread is crossed twice.
Example: switch from thread 0 to thread 6 using YIELD, ENTER_PROCESSOR_LAYER, and EXIT_PROCESSOR_LAYER.

33 (figure)

34 The control flow when switching from one thread to another
The control flow is not obvious, as some of the procedures reload the stack pointer (SP). When a procedure reloads the stack pointer, the place where it transfers control when it executes a return is the procedure whose SP was saved on the stack and reloaded before the execution of the return.
ENTER_PROCESSOR_LAYER:
  Changes the state of the thread calling YIELD from RUNNING to RUNNABLE.
  Saves the state of the procedure calling it, YIELD, on the stack.
  Loads the processor's registers with the state of the processor thread, thus starting the SCHEDULER.
EXIT_PROCESSOR_LAYER:
  Saves the state of the processor thread into the corresponding entry of the PROCESSOR_TABLE, and loads the state of the thread selected by the SCHEDULER to run (in our example, thread 6) into the processor's registers.
  Loads the SP with the value saved by ENTER_PROCESSOR_LAYER.
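A sketch of the two layer-crossing procedures as described above, completing the YIELD sketch after slide 27. The register save/restore appears as helper calls (save_state_on_stack, load_state_from_stack) because, as slide 26 noted, it must really be machine code; those helpers and the table layouts are assumptions.

#include <stdint.h>

typedef enum { RUNNING, RUNNABLE } thread_state;
typedef struct { thread_state state; uint32_t sp; } thread_entry;
typedef struct { uint32_t topstack; } processor_entry;

extern thread_entry    thread_table[];
extern processor_entry processor_table[];

/* Stand-ins for machine-language code. load_state_from_stack switches
   stacks, so it does not return to its caller in the usual sense. */
extern uint32_t save_state_on_stack(void);          /* returns saved SP */
extern void     load_state_from_stack(uint32_t sp);

void ENTER_PROCESSOR_LAYER(int this_thread, int processor) {
    thread_table[this_thread].state = RUNNING ? RUNNABLE : RUNNABLE;
    thread_table[this_thread].state = RUNNABLE;            /* was RUNNING */
    thread_table[this_thread].sp = save_state_on_stack();  /* YIELD's state */
    /* Per the correction on slide 37: load SP from the processor table,
       resuming the SCHEDULER on the processor thread's stack rather
       than calling SCHEDULER() directly. */
    load_state_from_stack(processor_table[processor].topstack);
}

void EXIT_PROCESSOR_LAYER(int chosen_thread, int processor) {
    processor_table[processor].topstack = save_state_on_stack();
    /* Load the registers with the chosen thread's saved state (thread 6
       in the slide's example); control "returns" inside its YIELD. */
    load_state_from_stack(thread_table[chosen_thread].sp);
}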

35 (figure)

36 (figure)

37 Correction: in ENTER_PROCESSOR_LAYER, instead of calling SCHEDULER() directly, the code should load the stack pointer: SP ← processor_table[processor].topstack.

