
1 Pthread and POSIX.1c Threads

2 What’s A Thread? (Review) Recall that a process is a complete computational entity, including –credentials, –resources and resource quotas (e.g. memory, disk, CPU time), and –an execution environment with a single set of registers (including the program counter). In the thread model –a process is the entity with which credentials, resources and resource quotas are associated, and –a thread is a separately schedulable entity (with its own set of registers, stack, and limited private memory) that shares the execution environment and other resources with all other threads belonging to the same process.

3 Thread Implementation Models There are two major thread implementation models: –kernel threads – the kernel is aware of threads and makes independent scheduling decisions for each thread, regardless of the process to which it belongs. –user-level threads – faster to create, with faster context switching than kernel threads. A potential problem is that when a single thread in a process blocks, the entire process – and thus every other thread in that process – blocks, since the kernel isn’t aware of the threads. Most modern implementations of user-level threads attempt to solve the blocking problem by providing “wrappers” for the system calls that could block, detecting such calls before they block and diverting control to another thread in the same process.

4 Thread Implementation Models

5 When To Use Threads Increased need for throughput – threads are ideal for server applications, as individual threads can be created (perhaps in advance, as with many web servers) to handle client requests. Need for performance – especially on SMP machines, each parallelized component of an application can be executed by a separate thread, potentially in parallel. I/O and CPU can be overlapped – this can be achieved, to some extent, using things like the POSIX “aio” functions, but having one thread do I/O and another do computation is a more familiar, and possibly more manageable, paradigm, permitting the use of simpler algorithms. Processes are created frequently – in client-server applications, servers often create processes (an expensive operation) for every client that connects; threads are much cheaper to create.

6 Pitfalls of Thread Programming Added complexity – more complex algorithm design and data synchronization. Difficult to debug and test – thread-level debuggers are newer and more primitive than process-level debuggers; a multithreaded application may work fine on a single processor, but fail on an SMP system. Data synchronization and race conditions – since memory and other resources are shared, explicit synchronization must be used (hence the need for a good understanding of the classic process synchronization problems!). Potential for deadlocks – since resource locking is required, careful attention must be paid to the order in which this locking is performed (again, a good understanding of deadlocks is necessary). Non-thread-safe environments – many standard system libraries and third-party libraries are not reentrant (that is, not safe for execution by several threads at the same time). Although thread-safe libraries are now relatively common, you must be certain to use them!

7 Models of Thread Programming Master/slave model – one thread (the master) receives each request and creates slave threads to handle the requests. Worker model – a number of worker threads are created to service clients; client requests are placed on a queue and removed by worker threads as they finish earlier requests. Pipelining model – tasks are broken into smaller components, each component providing the input to the next. For example, a multithreaded compiler might have threads to preprocess, compile, assemble, and optimize code.

8 Thread Implementations Provided by many operating systems: –Mach (Carnegie Mellon) –WIN32 (Microsoft) –OS/2 (IBM) –UNIX (e.g. Sun Solaris, OSF/1) Unfortunately, each of these has a different API for thread-level operations (e.g. thread creation). POSIX.1c provides a standard API for threads, and can be implemented with either kernel or user-level threads.

9 POSIX Thread Functions

10 pthread_create #include <pthread.h> int pthread_create( pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg ); This function is used to create a new thread, which will begin execution with the function named by the start_routine argument and with the argument pointed to by arg. The “thread” argument points to a pthread_t object which can effectively be used as the thread’s identification. (“attr” will be discussed later.)

11 pthread_join #include <pthread.h> int pthread_join( pthread_t thread, void **value_ptr ); This function is used to block the calling thread until the thread specified by “thread” terminates (or has already terminated). The “value_ptr” argument is used to retrieve any exit value provided by the terminated thread (via the pthread_exit function).
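
A minimal sketch (names and values hypothetical) showing pthread_create and pthread_join together: the main thread starts a worker, blocks until it terminates, and collects its exit value. Compile with cc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    /* start routine: receives the arg passed to pthread_create */
    static void *worker(void *arg) {
        int *n = arg;
        printf("worker received %d\n", *n);
        return (void *)42;            /* becomes the value seen by pthread_join */
    }

    int main(void) {
        pthread_t tid;
        int n = 7;
        void *result;

        if (pthread_create(&tid, NULL, worker, &n) != 0)
            return 1;                 /* creation failed */

        pthread_join(tid, &result);   /* block until worker terminates */
        printf("worker returned %ld\n", (long)result);
        return 0;
    }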

12 Mutexes (or Mutices?) A mutex is similar (identical?) to a binary semaphore. That is, it is used to control a resource that can be used by at most one thread at a time. To create a mutex, use pthread_mutex_t amutex = PTHREAD_MUTEX_INITIALIZER; or pthread_mutex_t amutex = PTHREAD_RECURSIVE_MUTEX_INITIALIZER;

13 Mutex Types Determined by the behavior when the locking function is called on a mutex the calling thread already owns: Fast – the thread simply suspends (deadlocks on itself). Error checking – the call returns a deadlock indicator. Recursive – the call returns immediately and the mutex’s lock count is increased.

14 pthread_mutex_destroy #include <pthread.h> int pthread_mutex_destroy( pthread_mutex_t *mutex ); Destroys a mutex.

15 pthread_mutex_init #include <pthread.h> int pthread_mutex_init( pthread_mutex_t *mutex, const pthread_mutexattr_t *attr ); Initializes a mutex with the attributes given in the specified mutex attribute object. If attr is NULL, the default attributes are used.

16 pthread_mutex_lock #include <pthread.h> int pthread_mutex_lock( pthread_mutex_t *mutex ); Used to lock the specified “mutex.” If it’s already locked, the calling thread blocks until the mutex is unlocked.

17 pthread_mutex_unlock #include <pthread.h> int pthread_mutex_unlock( pthread_mutex_t *mutex ); Used to unlock the specified “mutex.” For a recursive mutex that has been locked multiple times, only the last unlock (the one that reduces the lock count to zero) will release the mutex for use by other threads. If other threads are blocked waiting on the mutex, the highest priority waiting thread is unblocked and becomes the owner of the mutex.
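
A short sketch (the shared counter is hypothetical) of the usual locking discipline: every access to shared data is bracketed by pthread_mutex_lock and pthread_mutex_unlock.

    #include <pthread.h>

    static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;                 /* shared data guarded by count_mutex */

    void increment(void) {
        pthread_mutex_lock(&count_mutex);    /* blocks if another thread holds it */
        counter++;                           /* critical section */
        pthread_mutex_unlock(&count_mutex);  /* wakes a waiter, if any */
    }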

18 pthread_mutex_trylock #include <pthread.h> int pthread_mutex_trylock( pthread_mutex_t *mutex ); Tries to lock a mutex. If the mutex is already locked, the calling thread returns immediately (with the error value EBUSY) instead of waiting for the mutex to be freed.

19 Recursive Mutex Locking A thread that attempts to lock a non-recursive mutex it already owns (has locked) will receive a deadlock indication, and the attempt to lock the mutex will fail. Using a recursive mutex avoids this problem, but the thread must ensure that it unlocks the mutex the appropriate number of times; otherwise no other threads will be able to lock the mutex.

20 Dynamic Mutex Initialization #include <pthread.h> int pthread_mutex_init( pthread_mutex_t *mutex, const pthread_mutexattr_t *attr ); Initializes a Pthread mutex with the specified attribute object. If attr is NULL, the default attributes are used.

21 Pthread Mutex Attribute Mutex attribute objects support only one attribute – the mutex type. #include <pthread.h> int pthread_mutexattr_init(pthread_mutexattr_t *attr); int pthread_mutexattr_destroy(pthread_mutexattr_t *attr); int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int kind); int pthread_mutexattr_gettype(const pthread_mutexattr_t *attr, int *kind); kind is one of PTHREAD_MUTEX_FAST_NP, PTHREAD_MUTEX_RECURSIVE_NP, or PTHREAD_MUTEX_ERRORCHECK_NP.
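
A sketch of building a recursive mutex with an attribute object. This uses the portable constant PTHREAD_MUTEX_RECURSIVE; the *_NP names above are the older, Linux-specific (non-portable) spellings of the same mutex types.

    #include <pthread.h>

    pthread_mutex_t rmutex;

    void make_recursive_mutex(void) {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&rmutex, &attr);
        pthread_mutexattr_destroy(&attr);  /* the mutex keeps its own copy */

        /* the same thread may now lock rmutex twice, but must also unlock it twice */
        pthread_mutex_lock(&rmutex);
        pthread_mutex_lock(&rmutex);
        pthread_mutex_unlock(&rmutex);
        pthread_mutex_unlock(&rmutex);
    }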

22 pthread_once #include <pthread.h> int pthread_once( pthread_once_t *once_block, void (*init_routine)(void) ); Ensures that init_routine will run just once, regardless of how many threads in a process call it. All threads issue calls to the routine by making identical pthread_once calls (with the same once_block and init_routine). The thread that first makes the pthread_once call succeeds in running the routine; subsequent pthread_once calls from other threads do not run the routine.
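
A sketch of the usual idiom (the table contents are hypothetical): the once-control is statically initialized with PTHREAD_ONCE_INIT, and every thread calls pthread_once before touching the lazily initialized state.

    #include <pthread.h>

    static pthread_once_t init_done = PTHREAD_ONCE_INIT;
    static int shared_table[256];      /* lazily initialized shared state */

    static void init_table(void) {     /* runs exactly once, in the first caller */
        for (int i = 0; i < 256; i++)
            shared_table[i] = i * i;
    }

    int lookup(int i) {
        pthread_once(&init_done, init_table);  /* safe to call from every thread */
        return shared_table[i];
    }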

23 pthread_sigmask #include <signal.h> int pthread_sigmask( int how, const sigset_t *set, sigset_t *oset ); Examines or changes the calling thread’s signal mask.
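
A sketch (assuming SIGINT is the signal of interest) of blocking a signal in the calling thread. A common design is to block a signal in every thread and dedicate one thread to receiving it with sigwait.

    #include <signal.h>
    #include <pthread.h>

    void block_sigint_in_this_thread(void) {
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, SIGINT);
        /* SIG_BLOCK adds to the thread's mask; SIG_UNBLOCK and SIG_SETMASK
           are the other possibilities for the "how" argument */
        pthread_sigmask(SIG_BLOCK, &set, NULL);
    }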

24 pthread_self #include <pthread.h> pthread_t pthread_self( void ); This function returns the thread ID of the calling thread.

25 pthread_equal #include <pthread.h> int pthread_equal( pthread_t t1, pthread_t t2 ); Returns zero if the two thread IDs t1 and t2 are not equal, and non-zero if they are equal.
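
A small sketch (helper name hypothetical) combining the two calls: because pthread_t is an opaque type, thread IDs must be compared with pthread_equal, never with ==.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_t main_tid;         /* recorded at program startup */

    void report(void) {
        if (pthread_equal(pthread_self(), main_tid))
            printf("running in the main thread\n");
        else
            printf("running in a worker thread\n");
    }

    int main(void) {
        main_tid = pthread_self();
        report();
        return 0;
    }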

26 pthread_exit #include <pthread.h> void pthread_exit( void *value ); Terminates the calling thread, returning the specified value to any thread that may have previously issued a pthread_join on the thread.

27 pthread_kill #include <signal.h> int pthread_kill( pthread_t thread, int sig ); Delivers a signal (sig) to the specified thread.

28 POSIX Thread Synchronization Tools We have already seen the mutex, similar to a binary semaphore, and the pthread_join and pthread_once functions. Other thread synchronization facilities in pthreads include: –counting semaphores (covered next) –condition variables (as in monitors) –barriers

29 Condition Variables Recall that a condition variable (in a monitor) is a synchronization object on which a process may wait (stepping outside the monitor) until another process signals it. With POSIX threads, a condition variable is used in conjunction with a mutex. When executing inside the “monitor,” the mutex is locked. If necessary, a thread waits on the condition variable, which unlocks the mutex, allowing other threads to enter the critical section. Later, when the conditions are right, a thread signals the condition variable, unblocking the highest priority thread waiting on the condition.

30 Initializing a Condition Variable A condition variable is created and initialized using code similar to this: #include <pthread.h> pthread_cond_t a_c_v = PTHREAD_COND_INITIALIZER; where a_c_v is an arbitrary condition variable name.

31 Waiting on a Condition Variable To wait on a condition variable, a thread must have already locked a mutex. Then it executes: #include <pthread.h> pthread_cond_wait(&a_c_v, &a_mutex); The mutex is unlocked and the calling thread blocks. On return from the function, the mutex will again have been locked, and will be owned by the calling thread. Do not use a recursive mutex with this function.

32 Signaling a Condition Variable A condition variable can be signaled in two ways: –pthread_cond_signal(pthread_cond_t *cond) will unblock the highest priority thread that has been waiting the longest. –pthread_cond_broadcast(pthread_cond_t *cond) will unblock all threads in priority order, using FIFO order for threads with the same priority. A thread may also use pthread_cond_timedwait to wait on a condition variable; an absolute time parameter is provided, and the thread is unblocked if that time passes.
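
A sketch (the “ready” flag is hypothetical) of the canonical pattern: the waiter re-tests its condition in a loop, since pthread_cond_wait may return after another thread has already consumed the state.

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t data_ready = PTHREAD_COND_INITIALIZER;
    static int ready = 0;                       /* the condition being waited on */

    void consumer(void) {
        pthread_mutex_lock(&m);
        while (!ready)                          /* always re-test after waking */
            pthread_cond_wait(&data_ready, &m); /* unlocks m while blocked */
        ready = 0;                              /* m is locked again at this point */
        pthread_mutex_unlock(&m);
    }

    void producer(void) {
        pthread_mutex_lock(&m);
        ready = 1;
        pthread_cond_signal(&data_ready);       /* wake one waiting thread */
        pthread_mutex_unlock(&m);               /* unlock promptly after signaling */
    }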

33 After the Signal’s Over… Recall the rules regarding a process (thread) that performs a signal operation on a condition variable. Either –the process must immediately exit the monitor (e.g. unlock the mutex) [Brinch Hansen’s approach], or –the process must wait while the awakened process uses the controlled resource [Hoare’s approach]. The first approach is recommended.

34 Barriers A barrier is essentially a gate at which threads must wait until a specified number of threads arrive; each is then allowed to continue. After the blocked threads continue, the barrier is effectively reinitialized to its original state, ready to block threads until a group of the appropriate size has again reassembled.

35 Creating and Initializing a Barrier To (dynamically) initialize a barrier, use code similar to this (which sets the number of threads to 3): –pthread_barrier_t b; pthread_barrier_init(&b, NULL, 3); The second argument specifies an attribute object; using NULL yields the default attributes. This barrier could have been statically initialized (in QNX) by assigning an initial value created using the macro PTHREAD_BARRIER_INITIALIZER(3).

36 Waiting at a Barrier To wait at a barrier, a thread executes: pthread_barrier_wait(&b); One of the threads continuing from the barrier will be returned the value PTHREAD_BARRIER_SERIAL_THREAD; the others will receive 0. This property can be used to allow one of the threads to execute unique code. Consider, for example, the operation of the pthread_once function. If a thread waiting at a barrier is signaled, it resumes waiting at the barrier after the signal handler returns.
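
A sketch of three threads meeting at the barrier from the previous slide; exactly one of them sees PTHREAD_BARRIER_SERIAL_THREAD and can perform the one-time work at the phase boundary.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_barrier_t b;

    static void *phase_worker(void *arg) {
        (void)arg;
        /* ... per-thread work for phase 1 ... */
        int rc = pthread_barrier_wait(&b);          /* blocks until all 3 arrive */
        if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
            printf("elected to do the phase-boundary work\n");
        /* ... phase 2 ... */
        return NULL;
    }

    int main(void) {
        pthread_t t[3];
        pthread_barrier_init(&b, NULL, 3);          /* 3 threads per cycle */
        for (int i = 0; i < 3; i++)
            pthread_create(&t[i], NULL, phase_worker, NULL);
        for (int i = 0; i < 3; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&b);
        return 0;
    }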

37 Pthread Attributes Default attributes, which are frequently acceptable, are used when a NULL parameter is supplied for the attribute parameter in many pthread functions. In some cases, however, explicit attribute settings may be required. In this case, an attribute object must be created so the attributes can be modified. Creating and initializing an attribute object is easy: pthread_attr_t my_attributes; pthread_attr_init(&my_attributes);

38 Setting Attribute Values Once an initialized attribute object exists, changes can be made. For example: –To change the stack size for a thread to 8192 (before calling pthread_create), do this: pthread_attr_setstacksize(&my_attributes, (size_t)8192); –To get the stack size, do this: size_t my_stack_size; pthread_attr_getstacksize(&my_attributes, &my_stack_size);

39 Other Attributes Detached state – set if no other thread will use pthread_join to wait for this thread (improves efficiency; see the sketch below) Guard size – used to protect against stack overflow Inherit scheduling attributes (from creating thread) – or not Scheduling parameter(s) – in particular, thread priority Scheduling policy – FIFO or round robin Contention scope – with what other threads this thread competes for a CPU Stack address – explicitly dictate where the stack is located Lazy stack allocation – allocate on demand (lazy) or all at once, “up front”
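
A sketch (worker is a hypothetical start routine) of using an attribute object to set the detached state before creating a thread:

    #include <pthread.h>

    extern void *worker(void *arg);    /* hypothetical start routine */

    int start_detached_worker(void) {
        pthread_t tid;
        pthread_attr_t attr;
        int rc;

        pthread_attr_init(&attr);
        /* no pthread_join will ever be issued, so let the system reclaim
           the thread's resources as soon as it terminates */
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

        rc = pthread_create(&tid, &attr, worker, NULL);
        pthread_attr_destroy(&attr);   /* the new thread keeps its own copy */
        return rc;
    }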

40 Understanding Pthreads Implementation Implementations fall into three categories: –pure user space implementations, –pure kernel thread implementations, or –somewhere between the two (referred to as two-level schedulers, lightweight processes, or activations). Each approach has advantages and disadvantages. –Pure user space implementations don’t provide global scheduling scope, and don’t allow multiple threads from the same process to execute in parallel on multiple CPUs. –Pure kernel thread implementations don’t scale well when a process has many threads.

41 User Threads User threads are programming abstractions that exist to be accessed by calls from within a user program. They might not rely on kernel threads, even if they are provided. A kernel thread is an abstraction for a system execution point within a process. Implementation of POSIX pthreads doesn’t require use of kernel threads, even if they exist.

42 Older User-level Thread Packages Some operating systems may provide a non-POSIX user-level thread package that is similar to the POSIX thread standard. POSIX threads may be built on top of these (sometimes easily), or an implementation may be entirely separate. It’s all up to the implementer. Of course, these older packages may be significantly different, and in any case probably don’t have the same syntax and precise semantics from one system to another.

43 User Space Implementations User space implementations include: –a “library scheduler” that runs in user mode to schedule the threads in a single process; there’s one of these in each process using threads –the operating system scheduler, which schedules each process independently. A simple-minded way to provide a user space implementation is to switch between threads using user-mode context-switching mechanisms like setjmp/longjmp and signals.
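
The slide names setjmp/longjmp, but those cannot portably give each thread its own stack; many user-level packages instead used the ucontext family. A minimal sketch (names hypothetical) of a “library scheduler” switching to one user thread and back, entirely in user mode:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t sched_ctx, thr_ctx;
    static char thr_stack[64 * 1024];        /* private stack for the user thread */

    static void thread_body(void) {
        printf("user thread: running\n");
        swapcontext(&thr_ctx, &sched_ctx);   /* yield back to the scheduler */
        printf("user thread: resumed\n");
    }                                        /* returning enters uc_link below */

    int main(void) {
        getcontext(&thr_ctx);
        thr_ctx.uc_stack.ss_sp = thr_stack;
        thr_ctx.uc_stack.ss_size = sizeof thr_stack;
        thr_ctx.uc_link = &sched_ctx;        /* where to go when the body returns */
        makecontext(&thr_ctx, thread_body, 0);

        swapcontext(&sched_ctx, &thr_ctx);   /* "schedule" the user thread */
        printf("scheduler: thread yielded\n");
        swapcontext(&sched_ctx, &thr_ctx);   /* resume it until it finishes */
        printf("scheduler: thread finished\n");
        return 0;
    }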

44 Pure User Space Advantages It doesn’t require any change to the operating system, meaning a new OS can quickly provide support for Pthreads. Context switching between threads is usually faster than if the kernel were involved, because no switches between user and kernel mode are required. New threads can be created quickly. Each thread simply shares the time slice and resources originally assigned to the process.

45 Pure User Space Disadvantages The all-to-one mapping of user threads to a single kernel-schedulable entity means that threads within the same process compete with each other for CPU cycles. Priority changes are only relative to threads within the same process, not to threads in other processes. This has significant negative implications for real-time programs. Multiple threads in the same process cannot be running in parallel on multiple CPUs.

46 Kernel Space Threads This is basically a one-to-one mapping of threads in the user program to schedulable entities in the operating system. Much information that must be maintained by the kernel for each thread is of the same size and scope as the information that was previously used for a single process. Some information, such as the open file table, is still associated only with the process.

47 Pure Kernel Space Advantages Threads compete against all other threads in the system for CPU cycles. Thread priorities are global. Multiple threads in a single process can run in parallel on multiple CPUs.

48 Pure Kernel Space Disadvantages Creating a new thread does require kernel overhead (although less than creating a new process). If the application will never run on a multiprocessor, user space threads are probably more efficient. Applications using a lot of threads (which could mean 10 or 100, depending on the system) will consume significant system resources and degrade the system’s overall performance, hurting other processes.

49 Two-level Schedulers In a two-level scheduler system, the library scheduler and the kernel scheduler cooperate to schedule user threads. Many user threads are mapped onto a smaller pool of kernel threads (a many-to-few, or M:N, mapping). A user thread may not have a unique mapping to a kernel thread; the mapping may change over time. The Pthreads library assigns user threads to run in a process’ available kernel threads, and the kernel schedules kernel threads from the collection of all processes’ runnable kernel threads.

50 Example Cases Suppose a program’s user threads frequently sleep on timers, events, or I/O completion. –It’s not logical to tie each of these to a kernel thread, since they’d see little CPU activity. –It’s better in this case to allow the library scheduler to associate these with a single kernel thread, yielding less kernel overhead and better performance. Or suppose the user threads are frequently CPU-bound. In this case, the library scheduler can simply associate each user thread with a separate kernel thread. This gives the kernel considerable flexibility in scheduling, selecting any of the runnable threads for execution.

51 The Best of Both Worlds Of course, most multithreaded programs don’t include just one type of thread. The greatest advantage of a two-level scheduler is its ability to tailor its kernel thread allocation policies based on the characteristics of the user threads. The extent to which two-level Pthreads implementations actually apportion kernel threads to user threads varies considerably.

52 The End

53 Acknowledgement These slides are taken from Stanley A. Wileman, Jr., University of Nebraska at Omaha: http://csalpha.ist.unomaha.edu/~stanw/031/csci4510/

54 POSIX Counting Semaphores A counting semaphore is not as efficient in providing mutual exclusion as a mutex, but it is a more general tool. The terms wait and post are used in POSIX to refer to the operations also called down and up, or P and V. Two types of semaphores are provided: named and unnamed. Named semaphores permit access by multiple processes; in QNX, their names appear in the /dev/sem directory. Named semaphores are slower than the unnamed variety.

55 sem_t *sem_open (const char *nm, int oflags, mode_t cmode, unsigned init_val); nm (name) must begin with '/' and must not contain any additional occurrences of '/'. oflags is not used when opening an existing semaphore. To create a semaphore, use O_CREAT, possibly or’ed with O_EXCL to force creation failure if the semaphore already exists. cmode (creation mode) specifies access permissions for a newly created semaphore (like file permissions). Usually set ALL (RWX) permissions for the desired group of users (U, G, or O). The init_val argument is used to set the value component of the semaphore. The returned value is a pointer to the semaphore, or SEM_FAILED (often (sem_t *)-1) on error.
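
A sketch (the name /mysem is hypothetical) of creating or opening a named semaphore, usable as a lock shared by several processes:

    #include <semaphore.h>
    #include <fcntl.h>         /* O_CREAT, O_EXCL */

    sem_t *open_or_create(void) {
        /* create with initial value 1; 0666 = read/write for all users */
        sem_t *s = sem_open("/mysem", O_CREAT, 0666, 1);
        if (s == SEM_FAILED)
            return NULL;       /* inspect errno for the reason */
        return s;
    }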

56 int sem_init (sem_t *sem, int pshared, unsigned value); This function creates an unnamed semaphore. If the pshared argument is non-zero, then the semaphore can be shared between processes via shared memory. The value argument is used to set the value component of the semaphore. The function returns 0 on success, and –1 on failure.

57 int sem_close (sem_t *sem); The sem_close function is used to sever the connection to a named semaphore opened with sem_open. Named semaphores are persistent; that is, the state of a semaphore persists even if no one has the semaphore open. Using a semaphore after it has been closed (except in a sem_open call) has an undefined effect.

58 int sem_destroy (sem_t *sem); The sem_destroy function is used to delete an unnamed semaphore after its use. The semaphore being destroyed must have been previously initialized with sem_init. Destroying a semaphore on which other processes are blocked causes them to become unblocked with an error (errno = EINVAL). Using a semaphore after it has been destroyed has an undefined effect.

59 int sem_getvalue (sem_t *sem, int *value); The sem_getvalue function is used to obtain the value of a named or unnamed semaphore. The returned value is positive if the resource controlled by the semaphore is unlocked. If the returned value is 0, then the resource is locked. Some implementations (but not QNX) may return a negative value –n, indicating that the resource is locked and n processes (or threads) are blocked in sem_wait operations.

60 int sem_wait (sem_t *sem); This function attempts to decrement the value of the identified semaphore. If successful, the calling process continues. If the semaphore’s value is 0, the calling process or thread blocks until it can successfully decrement the value.

61 int sem_post (sem_t *sem); This function increments the value of the identified semaphore. If any processes are blocked because of a sem_wait call on the semaphore, the highest priority process that has been waiting the longest is awakened. sem_post is reentrant with respect to signals, and can be called from a signal handler.
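
A sketch (the slot pool is hypothetical) using an unnamed semaphore as a counting resource among the threads of one process: sem_wait decrements the value, blocking at zero, and sem_post increments it.

    #include <semaphore.h>

    static sem_t slots;             /* counts free slots in some shared pool */

    void setup(void) {
        sem_init(&slots, 0, 4);     /* pshared = 0: shared between threads only */
    }

    void acquire_slot(void) {
        sem_wait(&slots);           /* blocks while the count is 0 */
        /* ... use one slot ... */
    }

    void release_slot(void) {
        sem_post(&slots);           /* wakes one blocked waiter, if any */
    }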

62 int sem_trywait (sem_t *sem); This function conditionally tries to decrement the value of the identified semaphore. If successful, the function returns 0. If unsuccessful, the function returns –1 and sets errno to EAGAIN.

63 int sem_unlink (char *name); This function destroys the named semaphore. If other processes have the semaphore open, they will continue to be allowed to use it until they close it (with sem_close).

