
1 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-1 Thread

A Thread is a separate part of a process, providing its own specific flow of execution and sharing the process data and resources with the other threads.

[Diagram: Multi-Processing – fork() copies the whole process image (Text, Initialized Read-Only Data, Initialized Read-Write Data, Uninitialized Data, Heap, Stack) from Process 1 into Process 2. Multi-Threading – Threads 1 and 2 share one process image (Text, Data segments, Heap) but each has its own Stack.]

Thread Attributes:
- Thread ID
- Set of registers (stack pointer, program counter, etc.)
- Stack (local variables, return addresses)
- errno
- Signal mask
- Priority

Multi-Processing versus Multi-Threading:
- New processes are created by means of the expensive fork() call, which copies memory and descriptors to newly allocated resources. A thread is a "light-weight" process: its creation is 10-100 times faster than process creation and does not require copying of memory and descriptors.
- Different processes have separate address spaces and resources. Multiple threads directly share memory and resources.
- Inter-process communication requires the usage of specific IPC mechanisms. The shared data segment of the process is used for inter-thread data exchange.
- Process context switching is expensive; thread context switching is cheap.
- Each process uses system calls to allocate its own resources (IPC, synchronization, other resources). A thread uses system calls only for synchronization needs; all other resources can be shared.
- A single-threaded process is executed at any moment by a single CPU. A multi-threaded process can utilize multiple CPUs for simultaneous execution of multiple threads.

2 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-2 The Particularity of Multi-Threaded Programming

All threads within a process share the same global memory. This makes sharing information between the threads easy, but along with this simplicity comes the problem of synchronization.

Reentrant Functionality
Functionality (a procedure, an object) is reentrant if all its task-unique information (such as local variables) is kept in a separate area of memory that is distinct for each thread (or process) executing this functionality simultaneously. (A sketch contrasting a non-reentrant and a reentrant function follows this slide.)

Thread-Safe Functionality
Functionality is thread-safe if:
- It is reentrant
- All its parts that must be executed by a single thread only are protected from multiple simultaneous execution. (For this purpose some form of mutual exclusion is used.)

Safety from Deadlocks, Livelocks and Starvation
The following synchronization problems can occur in a multi-threaded (multi-process) environment as a result of a Race Condition:
- A Deadlock is a situation in which two or more threads (or processes) sharing the same resources are effectively preventing each other from accessing those resources.
- A Livelock is a situation in which two or more threads (or processes) continually change their state, each in response to the state change in the other one. The result is that none of the running threads can make further progress.
- Starvation is a situation where a thread (or process) is unable to gain regular access to a shared resource, because the resource is made unavailable for long periods by "greedy" threads (processes) locking it for a long time.
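The difference can be illustrated by a minimal C sketch (the function names are illustrative, not part of any API): the first variant keeps its result in shared static storage, the second keeps all task-unique data in caller-provided storage.

#include <stdio.h>

/* Non-reentrant: the result lives in shared static storage, so two
   threads calling this simultaneously overwrite each other's result. */
char *int_to_string_unsafe(int value)
{
    static char buffer[32];              /* shared between all callers */
    snprintf(buffer, sizeof(buffer), "%d", value);
    return buffer;
}

/* Reentrant: all task-unique data lives in caller-provided storage,
   so each thread works on its own buffer. */
char *int_to_string_r(int value, char *buffer, size_t size)
{
    snprintf(buffer, size, "%d", value);
    return buffer;
}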

3 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-3 Thread Deadlock and Livelock Examples

The classic Deadlock case is where two Threads both require two shared Resources, and they try to lock them in opposite order:
- Thread A locks Resource X; Thread B locks Resource Y.
- Thread A then tries to lock Resource Y and blocks, waiting for Thread B; Thread B tries to lock Resource X and blocks, waiting for Thread A.
In more sophisticated cases the Deadlock scenario can contain a chain of multiple threads (Thread A, Thread B, ... Thread N) waiting for each other.

A Livelock may arise from attempts to avoid blocking via a try-lock:
- Thread A locks Resource X; Thread B locks Resource Y.
- Thread A try-locks Resource Y and fails; Thread B try-locks Resource X and fails.
- After the try-lock failure, both threads release their locks and no work is done. Then the same locking pattern is repeated.
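A minimal C sketch of this classic deadlock, with POSIX mutexes standing in for the shared Resources (all names are illustrative):

#include <pthread.h>

pthread_mutex_t resource_x = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t resource_y = PTHREAD_MUTEX_INITIALIZER;

void *thread_a(void *arg)
{
    pthread_mutex_lock(&resource_x);   /* A owns X */
    pthread_mutex_lock(&resource_y);   /* blocks if B already owns Y */
    /* ... use both resources ... */
    pthread_mutex_unlock(&resource_y);
    pthread_mutex_unlock(&resource_x);
    return NULL;
}

void *thread_b(void *arg)
{
    pthread_mutex_lock(&resource_y);   /* B owns Y */
    pthread_mutex_lock(&resource_x);   /* blocks if A already owns X: deadlock */
    /* ... use both resources ... */
    pthread_mutex_unlock(&resource_x);
    pthread_mutex_unlock(&resource_y);
    return NULL;
}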

4 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-4 Deadlock Resolution Example

To avoid the Deadlock in the previous example, both Threads have to follow the same Resource locking order: Thread A locks X then Y, works and unlocks them; Thread B, blocked on X meanwhile, then locks X and Y in the same order. In other words, both Threads have to establish a common Resource locking protocol.

In more sophisticated systems with a large number of lockable Resources, establishing a common Resource locking protocol may be problematic or impossible. In this case the system must provide functionality prohibiting the specific lock operation which "fastens" the Deadlock chain across the threads (Thread A, Thread B, ... Thread N).
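Continuing the previous sketch, the fix is to make Thread B follow the same locking order as Thread A:

/* Both threads now acquire X before Y, following a common locking
   protocol, so the circular wait can never form. */
void *thread_b_fixed(void *arg)
{
    pthread_mutex_lock(&resource_x);   /* same order as Thread A */
    pthread_mutex_lock(&resource_y);
    /* ... use both resources ... */
    pthread_mutex_unlock(&resource_y);
    pthread_mutex_unlock(&resource_x);
    return NULL;
}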

5 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-5 Thread Life Cycle

Thread Creation
Each process has at least one thread – the Main Thread, where the function main() of the process is executed. Any other thread can be created with a system call issued by the Main Thread or by any other running thread.

Thread Execution
The functionality of any thread is the execution of its own "thread-main" function, using its own thread stack. The "thread-main" function of the Main Thread is the function main(). The "thread-main" function of any other thread is specified during thread creation.

Thread Termination
Execution of a specific thread is terminated in the following cases:
- When its "thread-main" function returns
- When the Kernel terminates the execution of the specific thread by request from another thread or as a result of signal handling
- When the process terminates, finishing the execution of all its threads. This occurs when the function main() returns, when exit() is called by any thread, or when the process is terminated by a signal.

Thread "Post-mortem" Termination Status
The return value of the "thread-main" function is interpreted as the thread termination status and is preserved by the kernel in the scope of the process after the specific thread terminates. The thread termination status can then be extracted by request from any other running thread. This is the default behaviour, which can be changed by a specific thread if it desires to run in daemon (detached) mode.

Each Operating System which supports multi-threading provides its own specific set of system calls implementing the thread life cycle functionality. In the following slides we discuss the set of system calls provided by the POSIX standard, which is supported by most modern UNIX and Linux operating systems.

6 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-6 POSIX Thread API. PThread Basic System Calls

POSIX declares a portable API providing a set of thread-related system calls named pthread_XXX(). All these system calls require the following synopsis:

#include <pthread.h>
gcc … -D_REENTRANT [-D_THREAD_SAFE] … -lpthread

- The header file <pthread.h>, containing the declarations of all pthread_XXX() system calls, must be #include-d.
- The macro name _REENTRANT (or _THREAD_SAFE, or _POSIX_THREAD_SAFE_FUNCTIONS) must be #define-d in your code or provided via the -D compiler option at compilation time, to use the thread-safe versions of errno and the standard functions.
- The standard library /usr/lib/libpthread.a must be linked via the -l compiler option at link time.

int pthread_create(pthread_t *tid, const pthread_attr_t *attr, void *(*func) (void *), void *arg);
Creates a new Thread under the running process.
The argument attr specifies the thread attributes; default attributes are assigned if NULL is specified.
The new Thread executes the function func with the argument arg.
On success, the tid argument is filled with the ID of the newly-created Thread.
Returns 0 on success, or a positive error code in case of failure.
Attention: The parameter arg must NOT point to an automatic variable of the Calling Thread, to avoid dereferencing of arg by the Newly-created Thread after its de-allocation on the stack of the Calling Thread.

void pthread_exit (void *status);
Terminates the execution of the calling thread. Does not return to the caller.
The status must not point to a thread-local object, since that object disappears when the thread terminates.
Note: If the function running under the thread return-s, the return value is the exit status of the thread. If the main() function of the process return-s, or if any thread calls exit(), the whole process terminates, including all its threads.
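A minimal sketch of the thread life cycle built from these calls (pthread_join is described on the next slide):

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg)                 /* "thread-main" function */
{
    printf("worker started with arg %ld\n", (long)arg);
    pthread_exit((void *)42);           /* same effect as: return (void *)42; */
}

int main(void)
{
    pthread_t tid;
    void *status;

    if (pthread_create(&tid, NULL, worker, (void *)1L) != 0)
        return 1;
    pthread_join(tid, &status);         /* wait for the thread, collect status */
    printf("worker exited with status %ld\n", (long)status);
    return 0;
}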

7 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-7 PThread Basic System Calls

pthread_t pthread_self (void);
Returns the thread ID of the calling thread.

int pthread_join (pthread_t tid, void ** status);
Suspends execution of the calling thread until execution of the target thread tid is terminated (as waitpid() does for a process).
If a non-NULL status argument is specified, it is filled with the target thread termination status.
Returns 0 on success, an error code on failure.

int pthread_detach (pthread_t tid);
Changes the specified thread to be detached. A Detached Thread is like a daemon process: when it terminates, all its resources are released, and it cannot be pthread_join-ed by another thread.
Commonly called by a thread that wants to detach itself: pthread_detach(pthread_self());
Returns 0 on success, an error code on failure.

int pthread_equal (pthread_t tid1, pthread_t tid2);
Compares two thread IDs. Returns a positive value (true) if the IDs are equal, 0 otherwise.

#include <sched.h>
int sched_yield(void);
Yields the current thread's execution in favor of another thread with the same or greater priority (if such a thread exists). Returns 0 (no errors).

Note: The POSIX standard does not provide methods like "Join Any" or "Join All", supported by some thread APIs. POSIX supposes that the thread calling "join" explicitly knows which thread it joins. The "Join All" method, suspending execution of the calling thread until execution of all other non-detached threads in the current process has terminated, can be implemented at the application level in the following way (see the sketch after this slide):
- The application maintains the ID list of all active non-detached threads
- The calling thread navigates through the ID list and calls pthread_join() for each ID in this list.
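A minimal sketch of such an application-level "Join All" (the filling of the list and its own mutex protection are omitted for brevity; all names are illustrative):

#include <pthread.h>

#define MAX_THREADS 16

/* Application-maintained list of active non-detached thread IDs */
static pthread_t active_ids[MAX_THREADS];
static int active_count = 0;

/* "Join All": wait until every registered thread has terminated */
void join_all(void)
{
    int i;
    for (i = 0; i < active_count; i++)
        pthread_join(active_ids[i], NULL);
    active_count = 0;
}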

8 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-8 Thread Attributes

The PThread API provides the type pthread_attr_t, representing a container of PThread Attributes used during Thread Creation with the system call pthread_create(). The following Thread Attributes are defined:

detachstate
The thread ID and exit status of a Joinable Thread are retained for a later pthread_join() by some other thread. A Detached Thread is like a daemon process: when it terminates, all its resources are released. Possible values: PTHREAD_CREATE_JOINABLE (default), PTHREAD_CREATE_DETACHED.

schedpolicy, schedparam, inheritsched, scope
The Scheduler is the part of the Kernel that decides which runnable resource (process, thread) will be executed by the CPU next. These 4 attributes describe the Scheduling Policy and Scheduling Priority of the thread. The default scheduling policy of a thread is non-realtime. Setting specific values for these attributes is meaningful only for real-time programming (for more information see the corresponding man sections).

/* attribute container initialization / deinitialization */
int pthread_attr_init(pthread_attr_t *attr);
int pthread_attr_destroy(pthread_attr_t *attr);

/* setter and getter for the Detach State attribute */
int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);
int pthread_attr_getdetachstate(const pthread_attr_t *attr, int *detachstate);

/* realtime-related setters and getters for the Scheduling Policy attributes */
int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
int pthread_attr_getschedpolicy(const pthread_attr_t *attr, int *policy);
int pthread_attr_setschedparam(pthread_attr_t *attr, const struct sched_param *param);
int pthread_attr_getschedparam(const pthread_attr_t *attr, struct sched_param *param);
int pthread_attr_setinheritsched(pthread_attr_t *attr, int inherit);
int pthread_attr_getinheritsched(const pthread_attr_t *attr, int *inherit);
int pthread_attr_setscope(pthread_attr_t *attr, int scope);
int pthread_attr_getscope(const pthread_attr_t *attr, int *scope);

All functions return 0 on success and a non-zero error code on error. On success, the getter functions also store the current value of the requested attribute in the location pointed to by their second argument.
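A minimal sketch of creating a thread directly in the detached state via the attribute container (the worker function is assumed to exist elsewhere):

#include <pthread.h>

extern void *worker(void *arg);   /* assumed "thread-main" function */

int create_detached_worker(void)
{
    pthread_t tid;
    pthread_attr_t attr;
    int rc;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);  /* the container may be destroyed after use */
    return rc;
}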

9 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-9 Thread Cancellation

Thread Cancellation is the mechanism by which one (source) thread can send a request to another (target) thread to terminate its (the target thread's) execution. Depending on its settings, the target thread can then either:
- ignore the request,
- honor it immediately,
- or defer it till it reaches a cancellation point.

Cancellation State:
PTHREAD_CANCEL_ENABLE – (default) the thread is ready to accept cancellation requests
PTHREAD_CANCEL_DISABLE – the thread ignores cancellation requests

Cancellation Type:
PTHREAD_CANCEL_DEFERRED – (default) the cancellation request stays pending until the next cancellation point (the test for a pending cancellation request is initiated by the target thread).
PTHREAD_CANCEL_ASYNCHRONOUS – the cancellation request is executed immediately by the kernel.

[Diagram: Deferred Cancellation – the source thread sends a cancel request to the target via the kernel; the request stays pending for the deferral time until the target thread reaches a cancellation point, tests for the pending cancellation and stops.]

int pthread_cancel (pthread_t thread); - sends a cancellation request to the target thread
int pthread_setcancelstate (int state, int *oldstate); - sets the calling thread's cancellation state
int pthread_setcanceltype (int type, int *oldtype); - sets the calling thread's cancellation type
void pthread_testcancel (void); - explicitly tests for and executes a pending cancellation

Cancellation Points in thread execution are the places where a test for pending cancellation requests is performed and cancellation is executed if the test is positive. The following POSIX threads functions are cancellation points: pthread_join, pthread_cond_wait, pthread_cond_timedwait, pthread_testcancel, sem_wait, sigwait. All other POSIX threads functions are guaranteed not to be cancellation points.

Note: In high-level programming languages (C++, Java) it is recommended to terminate threads at the application level, using a user-defined cancellation flag variable with deferred testing of this flag. This allows a specific thread to safely release all system resources it allocated before its termination.
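A minimal sketch of deferred cancellation, where the target thread chooses its own explicit cancellation point:

#include <pthread.h>

void *cancellable_worker(void *arg)
{
    /* PTHREAD_CANCEL_DEFERRED is the default; set it explicitly for clarity */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);
    for (;;) {
        /* ... do one unit of work, holding no locks at this point ... */
        pthread_testcancel();   /* explicit cancellation point */
    }
    return NULL;                /* not reached */
}

/* In the source thread: pthread_cancel(target_tid); */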

10 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-10 Thread Cleanup

Cleanup Handlers are functions that get called when a thread terminates (when pthread_exit() is called or because of cancellation). They free the resources (unlock synchronization devices, close open descriptors, etc.) that a thread may hold at the time of its termination. POSIX provides system calls for installation and removal of Cleanup Handlers in stack-like order (LIFO). On thread termination all Cleanup Handlers are executed in reverse order, beginning from the most recently installed on the stack.

void pthread_cleanup_push(void (*handler) (void *), void *arg); - installs the handler with argument arg on the stack
void pthread_cleanup_pop (int execute); - removes and optionally (execute>0) executes the handler most recently installed on the stack

Thread Cleanup in C++
In C++, for safe de-allocation of resources during thread termination, the GUARD pattern is used. The Guard class:
- Provides the allocation of a specific resource in the class Constructor
- Provides the de-allocation of the already allocated resource in the class Destructor
The Guard is instantiated as an automatic variable in any function (or scope) where resource allocation is required. When the program execution leaves this scope, the destructors of all automatic variables are always executed. As a result, the resource is de-allocated automatically when the corresponding Guard instance is destructed.

Thread Cleanup in Java
In Java, for guaranteed de-allocation of resources, the try{…} catch{…} finally{…} construct can be used. The finally block always executes when the try block exits. This ensures that the finally block is executed even if an unexpected exception occurs.
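A minimal sketch: a cleanup handler guarantees that a mutex is released even if the thread is cancelled inside the locked region (the mutex name is illustrative):

#include <pthread.h>

static pthread_mutex_t data_mutex = PTHREAD_MUTEX_INITIALIZER;

static void unlock_handler(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

void *guarded_worker(void *arg)
{
    pthread_mutex_lock(&data_mutex);
    pthread_cleanup_push(unlock_handler, &data_mutex);
    /* ... work on shared data; this region may contain cancellation points ... */
    pthread_cleanup_pop(1);   /* non-zero: also execute the handler now */
    return NULL;
}

Note that pthread_cleanup_push() and pthread_cleanup_pop() are commonly implemented as macros and must be paired within the same lexical scope.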

11 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-11 Multi-Threading & Fork. POSIX Fork-One Model

The POSIX standard provides the "fork-one" model: fork() duplicates the whole memory space including lock objects, but only the one calling thread; the other threads are not running in the child process.

[Diagram: after fork() the Parent (Threads A, B, C with Text, Data, Heap and per-thread Stacks) produces a Child containing a copy of Text, Data, Heap and only the stack of the calling Thread B.]

Potential Deadlock Problem in the "Fork-One" Model
If at the time of the fork() call another thread in the Parent owns a lock, this lock will never be unlocked in the Child process, because the lock owner thread is not duplicated in the Child. If any thread in the Child process then needs to acquire this lock, a Deadlock occurs.

int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));
The pthread_atfork() function declares fork handlers to be called prior to and following fork(), within the thread that calls fork(). Any of the prepare, parent or child handlers can be specified as NULL. Returns 0 on success or an error number in case of failure.
- The prepare handlers are called in LIFO order from the Parent process just before the fork() processing begins.
- The parent handlers are called in FIFO order from the Parent process just after the fork() processing finishes.
- The child handlers are called in FIFO order from the Child process just after the fork() processing finishes.

Solution of the Potential Deadlock Problem in the "Fork-One" Model
To avoid the potential Deadlock after fork(), the following "atfork" handlers can be installed before the fork() (see the sketch after this slide):
- The prepare handler acquires the lock (waiting in blocking mode until any other thread releases the lock)
- The parent and child handlers release the lock acquired by the prepare handler
As a result, the lock object is released in the Child process and can be acquired in the future by any thread.
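A minimal sketch of these "atfork" handlers for a single lock (the lock name is illustrative):

#include <pthread.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void) { pthread_mutex_lock(&shared_lock); }    /* before fork() */
static void parent(void)  { pthread_mutex_unlock(&shared_lock); }  /* after fork(), in Parent */
static void child(void)   { pthread_mutex_unlock(&shared_lock); }  /* after fork(), in Child */

/* Called once, before any fork(): */
void install_fork_handlers(void)
{
    pthread_atfork(prepare, parent, child);
}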

12 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-12 Signal Handling in Threads

The POSIX standard provides the following system calls for handling signals in threads:

#include <signal.h>
int pthread_sigmask(int how, const sigset_t *newmask, sigset_t *oldmask);
Changes the signal mask of the calling thread as described by the arguments:
- how (SIG_SETMASK, SIG_BLOCK, SIG_UNBLOCK)
- newmask (initialized and modified by the sigXXXset() macros)
If oldmask is not NULL, the previous signal mask is stored there.
Any new thread inherits the calling thread's signal mask.
Returns 0 on success, an error code on failure.
Note: signal masks are set on a per-thread basis, but signal actions (sigaction()) are shared between all threads.

int pthread_kill(pthread_t thread, int signo);
Sends the signal number signo to the specified thread. If signo is 0, no actual signal is sent; this is used to check thread existence.
Returns 0 on success, an error code (ESRCH – thread doesn't exist, EINVAL – bad parameter) on failure.

int sigwait(const sigset_t *set, int *sig);
Suspends the calling thread until one of the signals in set becomes pending on the calling thread. Accepting the signal clears it from the pending signals mask and stores the number of the signal under sig.
The signals in set must be blocked and not ignored on entrance to sigwait(). If the delivered signal has a signal handler function attached, that function is not called. sigwait() is a cancellation point.
Returns 0 on success, an error code on failure.
On platforms supporting also the non-POSIX version of the sigwait() call, the -D_POSIX_PTHREAD_SEMANTICS compilation flag is required.

Note: On systems using SIGALRM for the implementation of the system call sleep(), it is recommended in a multi-threaded environment to use the system call nanosleep() instead.
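A minimal sketch of the common pattern built from these calls: block the signals in every thread (signal masks are inherited by new threads) and dedicate one thread to sigwait():

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

void *signal_waiter(void *arg)
{
    sigset_t set;
    int sig;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    for (;;) {
        if (sigwait(&set, &sig) == 0)     /* signals are already blocked */
            printf("got signal %d\n", sig);
    }
    return NULL;
}

int main(void)
{
    sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);  /* inherited by all new threads */
    pthread_create(&tid, NULL, signal_waiter, NULL);
    /* ... the rest of the application runs with these signals blocked ... */
    pthread_join(tid, NULL);
    return 0;
}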

13 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-13 Thread Synchronization. Mutex

A Mutex is a MUTual EXclusion device, used to serialize access to a section of non-reentrant code that must not be executed concurrently by more than one thread. A Mutex is useful for protecting shared data from concurrent modifications.

A Mutex has two possible states:
- locked (owned by one thread)
- unlocked (not owned by any thread)
A Mutex can be unlocked only by the same Thread which locked it. A Mutex can never be owned by two different threads simultaneously. A Thread attempting to lock a Mutex that is already locked by another Thread is suspended until the owning Thread unlocks the Mutex first.

[Diagram: Thread 1 locks the Mutex and becomes its owner; Thread 2's lock attempt blocks until Thread 1 unlocks; then Thread 2 becomes the owner and later unlocks.]

POSIX defines 3 types of Mutex:

Normal (Fast) Mutex (default type)
Can be locked only once by the same Thread. An attempt to lock a Normal Mutex already locked by the same Thread leads to a Deadlock.

Recursive Mutex
Can be locked repeatedly by the same Thread. To be unlocked, the number of unlock operations must be equal to the number of performed locks.

Error Checking Mutex
Can be locked only once by the same Thread. An attempt to lock an Error Checking Mutex already locked by the same Thread results in an error.

14 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-14 Mutex Initialization System Calls

int pthread_mutex_init (pthread_mutex_t *mutex, const pthread_mutexattr_t *mutexattr);
Initializes the Mutex referenced by mutex with the attributes specified by mutexattr. If mutexattr is NULL, the default attributes are used.
Returns 0 on success, an error (EBUSY - already initialized, EINVAL - bad parameter, ENOMEM - no memory) on failure.

int pthread_mutex_destroy (pthread_mutex_t *mutex);
Destroys the Mutex referenced by mutex.
Returns 0 on success, an error (EBUSY - currently locked by another thread, EINVAL - bad parameter) on failure.

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
Static Mutex initializer. Can be used to initialize a mutex with the default attributes.

The PThread API provides the type pthread_mutexattr_t, representing a Mutex Attributes container with only one attribute – the Mutex Type, which has the following predefined values:
PTHREAD_MUTEX_NORMAL - a Normal (Default, Fast) Mutex is locked only once; a repeated attempt leads to deadlock
PTHREAD_MUTEX_RECURSIVE - a Recursive Mutex permits repeated locks by the same thread; multiple locks must be followed by an equal number of unlock operations
PTHREAD_MUTEX_ERRORCHECK - an Error Check Mutex denies repeated lock attempts, returning an error

The following utilities are provided for Mutex Attributes container maintenance:
int pthread_mutexattr_init (pthread_mutexattr_t *attr); - container initializer
int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int kind); - Mutex Type setter
int pthread_mutexattr_gettype(const pthread_mutexattr_t *attr, int *kind); - Mutex Type getter
int pthread_mutexattr_destroy(pthread_mutexattr_t *attr); - container deinitializer
All these utilities return 0 on success, the error EINVAL if a bad parameter is specified.
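A minimal sketch of initializing a Recursive Mutex via the attribute container:

#include <pthread.h>

pthread_mutex_t recursive_mutex;

void init_recursive_mutex(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&recursive_mutex, &attr);
    pthread_mutexattr_destroy(&attr);   /* the container may be destroyed after use */
}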

15 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-15 Mutex Locking System Calls

int pthread_mutex_lock (pthread_mutex_t *mutex);
Locks the referenced mutex on behalf of the calling thread, which becomes the Mutex Owner. If the mutex is already locked by another thread, the calling thread blocks until the mutex becomes available.
Note: If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it had not been interrupted.
If the mutex is already locked by the same thread, the result depends on the mutex type, as follows:
PTHREAD_MUTEX_NORMAL – a repeated lock attempt leads to a Recursive Deadlock
PTHREAD_MUTEX_RECURSIVE – returns with immediate success, the internal lock count is increased
PTHREAD_MUTEX_ERRORCHECK – returns with the EDEADLK error
Returns 0 on success or an error on failure.

int pthread_mutex_trylock (pthread_mutex_t *mutex);
Tries to lock the mutex like pthread_mutex_lock() does, but in non-blocking mode. If the mutex cannot be locked immediately, returns the EBUSY error.
Returns 0 on success, an error on failure.

int pthread_mutex_unlock (pthread_mutex_t *mutex);
Releases the referenced mutex. As a result, the mutex becomes available to other waiting threads. If the mutex is of type PTHREAD_MUTEX_RECURSIVE, it becomes available for other threads only when the lock count reaches 0.
Returns 0 on success or the error EPERM if the current thread is not the owner of the mutex.

Note: The Normal (Default, Fast) Mutex is the basic mutex type, supported by most platforms and APIs. On platforms and APIs which do not support the Recursive and Error Check mutex types, these types can be implemented at the application level on top of the Normal Mutex. For this purpose, additional "owner id" and "lock count" attributes must be maintained per each Normal Mutex by the application.
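A minimal sketch of non-blocking locking with pthread_mutex_trylock() (the shared mutex is assumed to be initialized elsewhere):

#include <errno.h>
#include <pthread.h>

extern pthread_mutex_t work_mutex;   /* assumed shared mutex */

void try_to_work(void)
{
    int rc = pthread_mutex_trylock(&work_mutex);
    if (rc == 0) {
        /* ... work on the shared data ... */
        pthread_mutex_unlock(&work_mutex);
    } else if (rc == EBUSY) {
        /* mutex held by another thread: do other work, retry later */
    }
}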

16 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-16 Condition Variable

A Condition Variable is a synchronization device that allows threads to suspend execution and release the processor until the shared data is changed to some desired state. The basic operations on Condition Variables are:
- Wait for a specific state of the shared data, suspending the thread execution until another thread changes the shared data and notifies (signals) the Condition Variable that the state has changed.
- Notify one (signal) or all (broadcast) threads waiting for a specific condition that the shared data state has changed.

A Condition Variable is a stateless signaling device. Notification (signaling) does not change the state of the device; it affects only the thread(s) that are waiting on this Condition Variable at the moment of notification.

A Condition Variable must always be associated with a Mutex, to avoid the race condition where a thread prepares to wait on a Condition Variable and another thread notifies (signals) the condition just before the first thread actually waits on it.

[Diagram: Thread 1 locks the Mutex, finds the state not fitting and calls CondVar.Wait(Mutex), releasing the Mutex. Thread 2 locks the Mutex, changes the state, calls CondVar.Notify and unlocks. Thread 1 wakes up owning the Mutex again, finds the state fitting, operates on the data and unlocks.]

17 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-17 Condition Variable Initialization System Calls

int pthread_cond_init (pthread_cond_t *cond, pthread_condattr_t *cond_attr);
Initializes the Condition Variable referenced by cond with the attributes specified by cond_attr. If cond_attr is NULL, the default attributes are used.
Returns 0 on success, an error (EBUSY - already initialized, EINVAL - bad parameter, ENOMEM - no memory) on failure.

int pthread_cond_destroy (pthread_cond_t *cond);
Destroys the Condition Variable referenced by cond.
Returns 0 on success, an error (EBUSY - currently used by another thread, EINVAL - bad parameter) on failure.

pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
Static Condition Variable initializer. Can be used to initialize a Condition Variable with the default attributes.

The PThread API provides the type pthread_condattr_t, representing a Condition Variable Attributes container, and the following utilities for its maintenance:
int pthread_condattr_init(pthread_condattr_t *attr); - container initializer
int pthread_condattr_destroy(pthread_condattr_t *attr); - container deinitializer
Currently the PThread API defines only one default type of Condition Variable, so the type pthread_condattr_t and the corresponding utilities are provided only for compliance with the POSIX standard.

18 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-18 Condition Variable Utilization System Calls

int pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_timedwait (pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime);
These condition wait calls are used to block the calling thread on a Condition Variable until another thread notifies (signals) the Condition Variable. These calls must be made while the Mutex referenced by mutex is locked by the calling thread. Both condition wait system calls provide the following functionality:
- suspend the current thread
- release the locked mutex
- wait for a notification from another thread to be sent to the cond-ition variable (or for the expiration of the specified timeout – for pthread_cond_timedwait() only)
- re-acquire the lock on the mutex
- unblock the current thread
The condition wait calls can be interrupted (spuriously woken up) by delivered UNIX signals. So, if a thread blocks on a condition variable awaiting some logical condition to become true, the logical condition must be re-evaluated when the wait call finishes.
These system calls are cancellation points. If a cancellation request is acted upon during the wait call, the mutex is re-acquired before calling the first cancellation cleanup handler.
Return values: 0 on success; EINVAL - bad parameter or invalid concurrent usage of the condition variable with different mutex objects; EINTR – interrupted by a signal; ETIMEDOUT - timeout expiration.

int pthread_cond_signal (pthread_cond_t *cond);
int pthread_cond_broadcast (pthread_cond_t *cond);
These condition signaling (condition notification) calls are used to unblock (wake up) the threads currently waiting on the cond-ition variable object. The call pthread_cond_signal() wakes up one of the threads waiting on cond; pthread_cond_broadcast() wakes up all threads waiting on cond. The condition notification calls have no effect if there are no threads currently blocked on cond. To avoid a race condition, these calls should be made while the mutex is locked. The unblocking order of the waiting threads depends on the Scheduler Policy.
Return 0 on success, an error (EINVAL) on failure.

19 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-19 Condition Variable Example

Consider two shared variables x and y, protected by the mutex mut, and a condition variable cond that is to be signaled whenever x becomes greater than y.

Waiting until x is greater than y is performed as follows:

pthread_mutex_lock(&mut);
while (x <= y) {
    pthread_cond_wait(&cond, &mut);
}
/* operate on x and y */
pthread_mutex_unlock(&mut);

Modifications of x and y that may cause x to become greater than y should signal the condition:

pthread_mutex_lock(&mut);
/* modify x and y */
if (x > y) {
    pthread_cond_broadcast(&cond);
}
pthread_mutex_unlock(&mut);

To wait for x to become greater than y with a timeout of 5 seconds, do:

struct timespec timeout;
int retcode = 0;

pthread_mutex_lock(&mut);
clock_gettime(CLOCK_REALTIME, &timeout);
timeout.tv_sec += 5;
while (x <= y && retcode != ETIMEDOUT) {
    retcode = pthread_cond_timedwait(&cond, &mut, &timeout);
}
if (retcode == ETIMEDOUT) {
    /* timeout occurred */
} else {
    /* operate on x and y */
}
pthread_mutex_unlock(&mut);

Note: To avoid the mutex staying locked forever (in case of unexpected thread termination), a cleanup handler can be used.

20 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-20 Synchronization Devices in C++

In typical C++ APIs the Mutex and Condition Variable devices are represented by classes like the following:

class Mutex {
    friend class CondVar;
public:
    Mutex();
    virtual ~Mutex();
    // Locks the mutex, works in blocking mode.
    bool lock();
    // Unlocks the mutex
    bool unlock();
    // Tries to lock the mutex, works in non-blocking mode
    bool tryLock();
    …
};

class CondVar {
public:
    CondVar();
    virtual ~CondVar();
    // Unlocks the locked Mutex, waits for notification from
    // another thread, then restores the Mutex lock.
    bool wait(Mutex& mutex, long sec = 0, long nsec = 0);
    // Unblocks the first thread wait()-ing on this Condition Variable
    bool notify();
    // Unblocks all threads wait()-ing on this Condition Variable
    bool notifyAll();
    …
};

Together with a class Thread, these classes encapsulate the specifics of the Operating System calls and provide an Object-Oriented API for the development of multi-threaded applications.

The following pattern provides safe Mutex locking control:

class MutexGuard {
public:
    // Constructor: locks the specified Mutex
    MutexGuard(Mutex& mutex) : m_mutex(mutex) { m_mutex.lock(); }
    // Destructor: unlocks the Mutex
    ~MutexGuard() { m_mutex.unlock(); }
private:
    // Reference to the Mutex to be locked/unlocked
    Mutex& m_mutex;
};

Sometimes the pair (Mutex + Condition Variable) is called a Monitor. The Monitor plays the role of one global synchronization device, providing Facade functionality for the encapsulated mutex and condition variable.

21 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-21 Thread Synchronization in Java

Unlike C++, classic Java standardized the multi-threading interface as part of the language syntax. Each instance of the generic base class java.lang.Object (and, therefore, any instance of any class) is declared to have its own Monitor (which can be considered an inseparable pair: Mutex + Condition Variable).

In terms of Java, a thread owns the Monitor corresponding to a specific object in the following cases:
- by calling a synchronized instance method of this object
- by executing a synchronized static method of that class
- by using a synchronized(obj){…} block

In terms of C++, the synchronized block can be considered as:
{
    MutexGuard dummy (obj.monitor.mutex); // owns object monitor
    …
}

Suspending thread execution until a desired state of a specific object is achieved, and waking up the suspended threads, is provided in Java by the methods Object.wait(), Object.notify(), Object.notifyAll(). In terms of Java, all these methods can be called successfully only by a thread that is the owner of this object's monitor; otherwise the IllegalMonitorStateException is thrown.

In terms of C++, the call to the method Object.wait() can be considered as:
{
    MutexGuard dummy (obj.monitor.mutex); // owns object monitor
    …
    obj.monitor.cond_var.wait (obj.monitor.mutex); // waits for object monitor to be notified
}

An Object Monitor as an inseparable pair (Mutex + Condition Variable) is not always useful for optimal synchronization design. If different groups of functionality (threads) need to wait for different states of the same synchronized object, it is useful to have multiple Condition Variables (one per waiting group) associated with the same single Mutex responsible for object synchronization. Such a design was impossible in Java until release SE 1.5. Beginning from release 1.5, Java introduced the new package java.util.concurrent.locks. This package contains the class ReentrantLock, which is an analog of the recursive Mutex, and the interface Condition, providing an analogy of the Condition Variable. The method ReentrantLock.newCondition() is provided to create multiple Condition instances associated with the same instance of ReentrantLock.

22 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-22 Thread Synchronization by means of POSIX Semaphore

POSIX defines 2 forms of Semaphores: Named and Unnamed. The Unnamed semaphore is memory-based. Unlike the other types of semaphores, used for inter-process synchronization, the Unnamed semaphore can be declared as non-shared. In this case it is visible only from a single process and is used for inter-thread synchronization only.

#include <semaphore.h>
int sem_init(sem_t *sem, int pshared /* 0 = non-shared */, unsigned int value); - Initializes the unnamed Semaphore at address sem with the initial value value.
int sem_destroy(sem_t *sem); - Destroys the Semaphore sem.
int sem_wait(sem_t *sem); - Waits until the value of the Semaphore sem becomes positive, then decrements (locks) the semaphore.
int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout); - The same as sem_wait(), except that a time limit is specified for the decrement operation if it cannot be performed immediately.
int sem_post(sem_t *sem); - Increments (unlocks) the Semaphore.

In most cases the Semaphore is used in the following roles:

Semaphore Role                | Initialization Value for sem_init() | sem_wait() meaning | sem_post() meaning
Resource Counter              | Maximal Number of Free Resources    | Acquire Resource   | Release Resource
Non-Recursive Blocking Device | 1 (unblocked)                       | Lock               | Unlock
Stateful Signaling Device     | 0 (not signaled)                    | Wait               | Signal (Notify)

23 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-23 Semaphore Example. Resource Counter.

Consider we want to open simultaneously no more than 5 connections to some database. The semaphore will be used as a counter of available (not acquired) connections.

#include <semaphore.h>
…
#define MAX_CONNECTIONS 5

sem_t sem_counter;

int main() {
    /* init semaphore */
    sem_init(&sem_counter, 0, MAX_CONNECTIONS);
    …
}

The following procedure will be used by a thread to acquire a DB connection:

open_db_connection(…) {
    /* wait for at least one connection to be available */
    sem_wait(&sem_counter);
    /* now the actual DB connection can be opened */
    …
}

The following procedure will be used by a thread to release a DB connection:

close_db_connection(…) {
    /* close the actual DB connection */
    …
    /* increment the available connections counter;
       if "waiting" threads exist, one of them will be awakened */
    sem_post(&sem_counter);
}

24 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-24 Semaphore Example. Blocking and Signaling.

Consider two shared variables x and y, protected by the blocking semaphore sem_blocker, and a signaling semaphore sem_signaler that is posted whenever x becomes greater than y.

The initialization of these semaphores is performed as follows:

sem_t sem_blocker, sem_signaler;
sem_init(&sem_blocker, 0, 1);  /* initially posted (unlocked) */
sem_init(&sem_signaler, 0, 0); /* initially not posted (not signaled) */

Waiting until x is greater than y is performed as follows:

sem_wait(&sem_blocker); /* lock */
while (x <= y) {
    sem_post(&sem_blocker);  /* temporarily unlock to allow x,y to be modified */
    sem_wait(&sem_signaler); /* "signaler" is stateful: no race condition here */
    sem_wait(&sem_blocker);  /* restore the lock to continue the work */
}
/* operate on x and y */
sem_post(&sem_blocker); /* unlock */

Modifications of x and y that may cause x to become greater than y should post (signal) the "signaler":

sem_wait(&sem_blocker); /* lock */
/* modify x and y */
if (x > y) {
    sem_post(&sem_signaler);
}
sem_post(&sem_blocker); /* unlock */

Note: To avoid sem_blocker remaining "locked" forever (in case of unexpected thread termination), a cleanup handler can be used.

25 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-25 Mutex or Semaphore?

The persistent Semaphore is still the main device for inter-process synchronization. For inter-thread synchronization the memory-based non-shared Semaphore is optimal in the role of a Resource Counter. In the role of a Blocking or Signaling device the Semaphore is less universal than the Mutex + Condition Variable couple:
- as a Blocking device it is non-recursive and can be mistakenly "unlocked" without a "lock"
- as a Signaling device it does not have a "broadcast" (notify all) possibility
The Semaphore is more complicated. The Mutex and Condition Variable are more primitive and, as a result, can be used as universal "bricks" for building more complicated synchronization devices with various synchronization scenarios.

Example: Building a "Semaphore" from a Mutex and a Condition Variable. (Error codes are not checked, for simplicity of the example.)

/* constructed semaphore type */
typedef struct _app_sem_t {
    unsigned int value;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
} app_sem_t;

/* analog of sem_init() */
void app_sem_init(app_sem_t *sem, unsigned int value)
{
    sem->value = value;
    pthread_mutex_init(&sem->mutex, NULL);
    pthread_cond_init(&sem->cond, NULL);
}

/* analog of sem_destroy() */
void app_sem_destroy(app_sem_t *sem)
{
    pthread_mutex_destroy(&sem->mutex);
    pthread_cond_destroy(&sem->cond);
}

/* analog of sem_post() */
void app_sem_post(app_sem_t *sem)
{
    pthread_mutex_lock(&sem->mutex);
    sem->value++;
    pthread_cond_signal(&sem->cond);
    pthread_mutex_unlock(&sem->mutex);
}

/* analog of sem_wait() */
void app_sem_wait(app_sem_t *sem)
{
    pthread_mutex_lock(&sem->mutex);
    while (sem->value == 0) {
        pthread_cond_wait(&sem->cond, &sem->mutex);
    }
    sem->value--;
    pthread_mutex_unlock(&sem->mutex);
}

26 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-26 Mutex versus Semaphore

Semaphore (Sys V, POSIX):
- Needs logical "binding" to the shared resource; used for synchronization by means of an application protocol agreement. (Example: a traffic light)
- Can be used as: a resource counter, a non-recursive blocking device, a stateful signaling device
- Does not have any "owner"; can be modified by multiple processes / threads simultaneously
- Is a counter (or an array of counters in Sys V)
- Supported operations: increment (Sys V) / post (POSIX); wait & decrement (Sys V) / wait (POSIX); wait zero (Sys V). The System V Semaphore also supports transactions.
- Used for inter-process or inter-thread synchronization
- Can be a memory-based or persistent object; can be visible from different processes and exist independently of the process life

Mutex:
- Physically locks a section of code, preventing the access of non-owner threads to the code section. (Example: a gate or barrier on the road)
- Used as a blocking device (recursive or non-recursive); in conditional scenarios often coupled with a Condition Variable, which is a stateless signaling device
- Has a single Owner Thread; when locked, can be unlocked by the Owner only
- Has only 2 states: locked / unlocked
- Supported operations: lock (trylock), unlock
- Used mostly for inter-thread synchronization
- In most cases is a memory-based object, allocated in the scope of a single process and existing only until the process finishes

Condition Variable versus Signaling Semaphore:
- The Semaphore is a stateful device: "signaling" and "waiting" can be performed asynchronously. The Condition Variable is a stateless device: when signaled, only the currently "waiting" threads can hear the signal.
- The Condition Variable supports "broadcast" signaling, awakening all "waiting" threads. For the Semaphore, each "post" (increment) operation awakens no more than one thread.

27 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-27 Multi-threading & Once-Only Initialization

Let us have some initialization function which must be called at most once in our program:

void init_something() {…}

In a single-threaded environment, to ensure an once-only call to this function, we can do the following:

"C" example:

int flag = 0;
void f() {
    …
    if (!flag) {
        init_something();
        flag = 1;
    }
    …
}

"C++" example:

struct Dummy {
    Dummy() { init_something(); }
};
void f() {
    …
    static Dummy dummy;
    …
}

Most compilers emit object code which tests a secret compiler-generated flag to see whether the static variable is already initialized. So, actually, both examples perform the same algorithm.

In a multi-threaded environment both examples above are Thread-Unsafe. If two or more threads simultaneously enter the critical section in function f(), the function init_something() will be executed more than once. To ensure thread safety, the critical section must be protected by mutual exclusion:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int flag = 0;
void f() {
    …
    pthread_mutex_lock(&mutex);
    if (!flag) {
        init_something();
        flag = 1;
    }
    pthread_mutex_unlock(&mutex);
    …
}

Note 1: If init_something() is a cancellation point, the function f() also has to install cleanup handlers to avoid the mutex staying locked forever.
Note 2: In this thread-safe example each thread calling the function f() performs 2 system calls (2 context switches) even when the call to the function init_something() is not performed.

28 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-28 Once-Only Initialization in POSIX

pthread_once_t once_control = PTHREAD_ONCE_INIT;
int pthread_once (pthread_once_t *once_control, void (*init_routine) (void));

The system call pthread_once() ensures that a piece of initialization code is executed at most once. The once_control argument points to a global variable statically initialized to PTHREAD_ONCE_INIT.
The first time pthread_once() is called with a given once_control argument, it calls init_routine and changes the value of the once_control variable to the value "initialization performed".
A subsequent call to pthread_once() with the same once_control argument:
- blocks the calling thread if any other thread is currently executing the same call with the same once_control argument
- does not run init_routine() if the once_control variable already has the value "initialization performed".
pthread_once() is not a cancellation point. However, if the function init_routine is a cancellation point and is canceled, the effect on once_control is as if pthread_once() had never been called.
If once_control is an automatic variable on the stack or is not initialized, the behavior is undefined.
Returns 0 on success, an error code (EINVAL – bad argument) on failure.

Using the system call pthread_once(), the previous example with at most one call to the initialization function init_something() can be implemented in the following way:

pthread_once_t once_flag = PTHREAD_ONCE_INIT;
void f() {
    …
    pthread_once(&once_flag, init_something);
    …
}

29 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-29 Thread Local Storage (TLS). POSIX Thread Specific Data (TSD).

Static and global variables are normally shared by threads, because these variables are located in the one memory space shared by all threads of the same process. Variables on the stack, however, are local to threads, because each thread has its own stack, residing in a different memory location. So, with regular variables it is impossible to have global or static variables that have different values in different threads.

Thread-Local Storage (TLS) is a programming method which makes it possible to use static or global memory local to a thread. Thread-Specific Data (TSD) is the POSIX implementation of the TLS method.

Each thread possesses a private memory block, the Thread-Specific Data Area (TSD Area). The TSD Area is indexed by TSD Keys. Each TSD Key in the TSD Area has an associated value of type void *, which can be NULL or can be a pointer to some thread-specific data. TSD Keys are common to all threads, but the value associated with a given TSD Key can be different in each thread.

[Diagram: a global table of TSD Keys (key1, key2, ... up to PTHREAD_KEYS_MAX, some not allocated) indexes each thread's private TSD Area; for the same key, Thread 1's TSD Area may hold NULL or Data A while Thread 2's TSD Area holds Data B or Data C.]

When a new thread is created, its TSD Area initially associates NULL with all keys already allocated in the scope of the current process. When a new TSD Key is allocated by request from some running thread, this Key becomes known and is associated with the NULL value in all currently executing threads.

30 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-30 POSIX TSD System Calls

int pthread_key_create (pthread_key_t *key, void (*destr_function) (void *));
This system call allocates a new TSD Key. The allocated TSD Key is stored in the location pointed to by key. The value initially associated with the returned key is NULL in all currently executing threads.
Note: There is a limit of PTHREAD_KEYS_MAX on the number of keys allocated at a given time.
The destr_function argument, if not NULL, specifies a destructor function associated with the key. When a thread terminates via pthread_exit() or by cancellation, destr_function is called with the value associated with the Key in the specific thread as its argument. The destr_function is not called if that value is NULL.
Note: The order in which destructor functions are called at thread termination time is unspecified. Before a destructor function is called, the NULL value is associated with the key in the current thread.
Returns 0 on success, an error (EAGAIN – key limit reached, ENOMEM - no memory) on failure.

int pthread_key_delete (pthread_key_t key);
This system call de-allocates a TSD Key.
Note: This call does not check whether non-NULL values are associated with that Key in the currently executing threads, nor does it call the destructor function associated with the key.

int pthread_setspecific (pthread_key_t key, const void *pointer);
This system call changes the value associated with key in the calling thread, storing the given pointer instead.
Note: The pointer argument should not refer to a stack (automatic) variable.

void *pthread_getspecific (pthread_key_t key);
This system call returns the value currently associated with key in the calling thread.

Note: In Java (since SE 1.2) the Thread Local Storage pattern is represented by the class java.lang.ThreadLocal.

31 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-31 TSD Usage Example: User-Defined Thread Name Utilities

This example allocates a thread-specific array of 100 characters, with automatic reclamation at thread exit. The array is used to store a user-defined thread name, which can be extracted by any function called during the following thread execution.

/*---- Supporting "private" variables and functions ----*/
static pthread_key_t name_buffer_key;                  /* TSD Key for name buffer storage */
static pthread_once_t once_flag = PTHREAD_ONCE_INIT;   /* Once-only initializer for TSD Key allocation */

/* Frees the thread-specific name buffer */
static void destroy_name_buffer(void *name_buffer_ptr)
{
    free(name_buffer_ptr);
}

/* Allocates the TSD Key for name buffer storage */
static void allocate_name_buffer_key()
{
    pthread_key_create(&name_buffer_key, destroy_name_buffer);
}

/*---- User-defined thread name "public" utilities ----*/
#define MAX_NAME_LENGTH 100

/* Allocates the thread-specific buffer (once only) and stores there the thread-specific name */
void set_thread_name(char *name_string)
{
    char *name_buffer_ptr;
    pthread_once(&once_flag, allocate_name_buffer_key);               /* Allocate TSD Key - once only per process */
    name_buffer_ptr = (char *) pthread_getspecific(name_buffer_key); /* Get name buffer pointer */
    if (NULL == name_buffer_ptr) {                                    /* Allocate name buffer - once only per thread */
        name_buffer_ptr = (char *) malloc(MAX_NAME_LENGTH);
        pthread_setspecific(name_buffer_key, name_buffer_ptr);
        memset(name_buffer_ptr, '\0', MAX_NAME_LENGTH);
    }
    strncpy(name_buffer_ptr, name_string, MAX_NAME_LENGTH - 1);       /* Store the name into the name buffer */
}

/* Gets the thread-specific name */
char *get_thread_name(void)
{
    return (char *) pthread_getspecific(name_buffer_key);            /* Get name buffer pointer */
}

32 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-32 Singleton Pattern & Multi-threading

In software engineering, the Singleton design pattern is used to restrict instantiation of a class to one object only. Commonly the Singleton is implemented as a Class with prohibited copying (in C++ this means a private Copy Constructor and a private operator=()) and with a private Constructor, which is used to create at most one instance of the Class. The public interface of a Singleton commonly has a static method getInstance(), providing access to the single instance. The Singleton can be implemented as a Statically or Dynamically (Lazily) constructed instance.

Singleton with Static Instantiation: the instance is created during application start-up.

class Singleton {
public:
    static Singleton* getInstance() { return &m_instance; }
private:
    … // prohibited copy constructor & operator=() …
    Singleton(…) {…}             // private constructor
    static Singleton m_instance;
    … // private data members
};
// initialization of the static data member in the .cpp file
Singleton Singleton::m_instance(…);

Advantage: When the application has started, the Singleton is "ready to use".
Disadvantages: We spend resources for instance initialization even if the method getInstance() is never called. The C++ standard does not specify the order of static data member initialization during application start-up: if the constructor of the Singleton needs access to any static data in other classes, its static construction may crash.

Singleton with Dynamic (Lazy) Instantiation: the instance is created during the first call to the method getInstance().

class Singleton {
public:
    static Singleton* getInstance() {
        if (m_pInstance == NULL) {          // 1)
            m_pInstance = new Singleton(…); // 2)
        }
        return m_pInstance;
    }
private:
    … // prohibited copy constructor & operator=() …
    Singleton(…) {…}              // private constructor
    static Singleton* m_pInstance;
    … // private data members
};
// initialization of the static data member in the .cpp file
Singleton* Singleton::m_pInstance = NULL;

Advantages: Initialization is performed only when actually required; does not depend on the static data initialization order in C++.
Disadvantage: In a multi-threaded environment this implementation is Thread-Unsafe. The method getInstance() has a critical region between the m_pInstance pointer value check (1) and the assignment (2). If two threads simultaneously call the method getInstance() for the first time, the Singleton may be instantiated twice.

33 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-33 Thread-Safe Singleton & Double-Checked Locking Pattern

Thread-Safe Singleton with Dynamic (Lazy) Instantiation: a thread-safe implementation of the Lazy Singleton can be achieved by synchronization of the getInstance() method using a Mutex or a Once-Only Initializer.

class Singleton {
public:
    static Singleton* getInstance() {
        MutexGuard guard(m_mutex); // thread-safe
        if (m_pInstance == NULL) {
            m_pInstance = new Singleton(…);
        }
        return m_pInstance;
    }
private:
    … // prohibited copy constructor & operator=() …
    Singleton(…) {…}              // private constructor
    static Singleton* m_pInstance;
    static Mutex m_mutex;
    … // private data members
};
// initialization of the static data members in the .cpp file
Singleton* Singleton::m_pInstance = NULL;
Mutex Singleton::m_mutex;

Advantage: This is a reliable thread-safe implementation of the Singleton.
Disadvantage: Each access to the Singleton acquires a lock. Actually the lock is necessary only during the first-time initialization. As a result, n calls to the Singleton perform n-1 superfluous lock operations.

Double-Checked Locking Pattern (DCLP): access to the already initialized Singleton is lock-free; the lock is acquired only if m_pInstance is NULL; the 2nd check after lock acquisition ensures that another thread did not perform the initialization while the calling thread was acquiring the lock.

static Singleton* getInstance() {
    if (m_pInstance == NULL) {          // 1st check
        MutexGuard guard(m_mutex);
        if (m_pInstance == NULL) {      // 2nd check
            m_pInstance = new Singleton(…);
        }
    }
    return m_pInstance;
}

Advantage: DCLP avoids superfluous lock operations.
Disadvantage: DCLP … DOES NOT WORK with modern Optimizing Compilers and Optimizing Processors. (See the short explanation on the following slides. See also: "C++ and the Perils of Double-Checked Locking", http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf)

34 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX 4-34 DCLP and Instruction Ordering by Optimizing Compilers

In response to the code:

m_pInstance = new Singleton(…);

the compiler actually generates the following sequence of operations:

Singleton* tmp = operator new(sizeof(Singleton)); // Step 1: memory allocation
new (tmp) Singleton;                              // Step 2: memory initialization
m_pInstance = tmp;                                // Step 3: assignment

The Problem: Although the class instance life cycle in C++ begins only after the successful finish of its constructor, some Optimizing Compilers may reorder Steps 2 and 3. As a result, the Singleton instance becomes visible to other threads while it is NOT yet fully initialized.

The Explanation: As languages, neither C nor C++ has language constructs to express such ordering constraints. To define safe synchronization primitives, the system-specific libraries (like POSIX PThread) commonly use assembler instructions in their implementation. As a result, a sequence of operations accessing the data not through the system-specific library constructs can be reordered by the compiler.

The volatile keyword in C and C++: The standards of C and C++ declare the keyword volatile. It tells the compiler that the object can change at any time (by hardware, asynchronous kernel activity, etc.), and that the compiler must restrict its optimizations when working with such an object. Every reference to a volatile object must be a genuine (actual) reference, as follows:
- The system always re-reads the current value of a volatile object at the point it is requested, even if a previous instruction already extracted a value from the same object.
- The system re-writes the value of the object immediately on assignment.

The Problem: Even after full "volatilization" of the DCLP pattern (defining m_pInstance, tmp and the Singleton instance itself as volatile) the pattern does not become reliable.

The Explanation: The following two issues cannot be solved by usage of the volatile keyword:
- A "volatilized" call to the constructor, new volatile Singleton(…), first initializes the instance and only then actually declares it as volatile. As a result, reordering can still occur during the instance initialization.
- The Standard prevents compilers from reordering "read" and "write" operations on volatile data within a single thread, but it imposes no restrictions on such reordering across multiple threads.

35 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-35 DCLP & Optimizing Processors

Cache Coherency Problem
On a machine with multiple processors, each processor has its own memory cache. To modify a shared resource, each processor caches data from "main" memory, updates the data in the cache, and then flushes the updated data to "main" memory. Data updated by one processor can therefore have inconsistent copies in the caches of other processors. The Cache Coherency Problem is this inter-cache inconsistency in the value of a shared resource.

Memory Barriers
Memory Barriers are instructions for the compiler or processor that limit the reordering of reads and writes of shared memory in multiprocessor systems. Memory Barriers are used to solve the Cache Coherency Problem.
Note: A reliable solution with Memory Barriers cannot be implemented in portable C or C++. Such an implementation requires platform-specific code written in assembler.

For all that, why does Mutex still work?
As already stated, neither C nor C++ has language constructs to restrict the reordering of "read" and "write" operations across multiple threads. Safe synchronization primitives in system-specific libraries (like POSIX PThread) are implemented by calling system-specific assembler code that issues Memory Barrier instructions. Thus Mutex lock acquisition leads not only to exclusive locking of a code section by the mutex owner, but also to synchronization of the processor's memory cache with "main" memory.

The Conclusion
The only way to provide safe multi-threaded access to a shared data resource in C and C++ is to use the synchronization primitives (mutex, once-only initializer, etc.) provided by system-specific libraries (like POSIX PThread). Access to a shared resource without synchronization leads to data inconsistency because of the implicit operation reordering and data caching performed by optimizing compilers and optimizing processors.
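To make the conclusion concrete, here is a minimal sketch of a lazy Singleton built on the POSIX Once-Only Initializer, pthread_once(), which performs the required memory synchronization internally; the class layout and names are illustrative, not part of the original slides.

  #include <pthread.h>
  #include <cstddef>

  class Singleton {
  public:
      static Singleton* getInstance() {
          // pthread_once() guarantees that init() runs exactly once, and that
          // its effects become visible to every thread calling getInstance().
          pthread_once(&m_once, init);
          return m_pInstance;
      }
  private:
      Singleton() {}                                        // private constructor
      static void init() { m_pInstance = new Singleton(); }
      static pthread_once_t m_once;
      static Singleton* m_pInstance;
  };

  pthread_once_t Singleton::m_once = PTHREAD_ONCE_INIT;
  Singleton* Singleton::m_pInstance = NULL;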

36 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-36 For all that, Is It Possible to Avoid Superfluous Locking?

The possibilities to avoid superfluous locking still exist:
1) If the Singleton constructor does not access any static data of other classes, we can use Static Instantiation.
2) We can remove any locking from the getInstance() method, but call it once from the Main Thread, before any other Thread is created.
3) If we still want to use Lazy Instantiation, optimization is also possible. In DCLP we unsuccessfully tried to limit the number of lock acquisitions to a single lock; the following solution demonstrates the possibility to limit the number of locks to a single lock per accessing thread.

Thread-Safe Lazy Singleton optimization by means of Thread Local Storage:
- The synchronized (via Mutex or Once-Only Initializer) method is called by a specific Thread only once, during the first access from this Thread to the Singleton.
- The accepted reference to the Singleton is saved in Thread Local Storage (TLS).
- All subsequent calls to the Singleton go through the reference saved in Thread Local Storage, without superfluous locking.

  class Singleton {
  public:
    static Singleton* getInstance() {
      Singleton* tmp = (Singleton*) m_tlsKey.getValue(); // get from TLS
      if (tmp == NULL) {                                 // first access from this thread
        MutexGuard guard(m_mutex);                       // thread-safe
        if (m_pInstance == NULL) {
          m_pInstance = new Singleton(…);
        }
        tmp = m_pInstance;           // remember the shared instance for this thread
        m_tlsKey.setValue(tmp);      // store in TLS
      }
      return tmp;
    }
  private:
    …                                // prohibited copy constructor & operator=()
    Singleton(…) {…}                 // private constructor
    static Singleton* m_pInstance;
    static Mutex m_mutex;
    static ThreadLocalKey m_tlsKey;
    …                                // private data members
  };
  // initialization of static data members in .cpp file
  Singleton* Singleton::m_pInstance = NULL;
  Mutex Singleton::m_mutex;
  ThreadLocalKey Singleton::m_tlsKey;
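The ThreadLocalKey helper used above is assumed rather than shown on the slide; a minimal sketch of such a wrapper over the POSIX thread-specific data API could look as follows (error handling omitted).

  #include <pthread.h>

  // Minimal wrapper over POSIX thread-specific data (TLS).
  class ThreadLocalKey {
  public:
      ThreadLocalKey()  { pthread_key_create(&m_key, NULL); } // no destructor callback
      ~ThreadLocalKey() { pthread_key_delete(m_key); }
      void* getValue() const      { return pthread_getspecific(m_key); }
      void  setValue(void* value) { pthread_setspecific(m_key, value); }
  private:
      pthread_key_t m_key;
  };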

37 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-37 Singleton in Java

"Initialization On Demand Holder" Pattern:
A correct, thread-safe, lazily-loaded Java solution that works in any Java version was suggested by Bill Pugh. It is known as the "Initialization On Demand Holder" pattern:

  public class Singleton {
    private Singleton() { }                   // private constructor
    // inner holder class: loaded, and the instance created, on first access
    private static class SingletonHolder {
      private static final Singleton instance = new Singleton();
    }
    public static Singleton getInstance() {
      return SingletonHolder.instance;
    }
  }

Java took volatile a step further than C++ by guaranteeing reordering restrictions across multiple threads. Since Java 1.5 the Memory Model is standardized, and volatile has more restrictive but simpler semantics:
- Any read of a volatile is guaranteed to occur prior to any memory reference in the subsequent statements;
- Any write to a volatile is guaranteed to occur after all memory references in the preceding statements.
As a result, the DCLP is reliable, but only since Java 1.5:

  class Singleton {
    private static volatile Singleton instance = null;
    private Singleton() { }                   // private constructor
    public static Singleton getInstance() {
      if (instance == null) {
        synchronized (Singleton.class) {
          if (instance == null)
            instance = new Singleton();
        }
      }
      return instance;
    }
  }

Note: Since Java SE 1.5 the package java.util.concurrent.atomic provides "volatilized" data types and lock-free atomic operations.
(See "The 'Double-Checked Locking is Broken' Declaration": http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html)

38 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-38 Common Oversights in Multi-Threaded Programs

These are the most frequent oversights that cause bugs in multi-threaded programs:
- An argument passed to a new thread points to the stack (to an automatic variable) of the Caller Thread. As a result, the newly created Thread can dereference the argument after its de-allocation on the stack of the Caller Thread. (A sketch of this bug and its fix follows below.)
- A shared global memory location (global variable) with changeable state is accessed without exclusive lock protection by two or more threads, and at least one of the threads writes to the location. As a result, the order of accesses is non-deterministic, which leads to Data Race bugs.
- Two threads try to acquire the same pair of global resources in opposite order. As a result, a Deadlock occurs.
- A thread tries to reacquire a non-recursive lock it already holds. As a result, a Recursive Deadlock occurs.
- A protected code segment contains a call to a function that releases and reacquires the synchronization before returning to the caller. As a result, the data actually has not been protected, and the caller is not aware of this hidden gap in synchronization protection.
- Mixing UNIX signals with threads without using the sigwait() model for handling asynchronous signals.
- Long-jumping away (calling setjmp() and longjmp() in C, or throwing an exception in C++) without releasing the mutex locks.
- Failing to re-evaluate the conditions after returning from a call to pthread_cond_wait() or pthread_cond_timedwait().
- Forgetting that threads are created PTHREAD_CREATE_JOINABLE by default and must be reclaimed with pthread_join(). Note that pthread_exit() does not free up the thread's storage space.
- Making deeply nested recursive calls and using large automatic arrays, which can cause problems because multi-threaded programs have a more limited stack size than single-threaded programs.
- Specifying an inadequate stack size, or using non-default stacks.

In general, multi-threading bugs are statistical rather than deterministic. Multi-threaded programs with such bugs often behave differently in two successive runs, even with identical inputs. This behavior is caused by differences in the order in which threads are scheduled. In such cases tracing is a more effective method of bug finding than breakpoint-based debugging.
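As an illustration of the first oversight above, the following sketch contrasts the buggy variant (passing the address of an automatic variable to the new thread) with a fixed variant that passes a heap copy owned by the thread; the function names are illustrative, not part of the original slides.

  #include <pthread.h>
  #include <cstdio>

  static void* worker(void* arg) {
      int* request = static_cast<int*>(arg);
      std::printf("processing request %d\n", *request);
      delete request;                 // the thread owns and releases the heap copy
      return NULL;
  }

  pthread_t startWorker(int request) {
      pthread_t tid;
      // BUG (what not to do):
      //   pthread_create(&tid, NULL, worker, &request);
      // 'request' is an automatic variable of the caller and may already be
      // deallocated when the new thread dereferences the pointer.
      int* copy = new int(request);   // heap copy outlives the caller's stack frame
      pthread_create(&tid, NULL, worker, copy);
      return tid;
  }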

39 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-39 Design of Asynchronous Functionalities

Asynchronous Processing. Producer-Consumer Pattern
Any server usually has two different functional parts: Request Listening and Request Processing. In most cases it is very useful to separate these functionalities and to provide a mediating mechanism for data exchange between them. Such a design not only allows easy implementation of concurrent request processing, but has advantages even for an iterative server, allowing:
- To avoid loss of Requests (because of socket buffer overflow) when the processing time of a single request is long.
- To provide even CPU utilization under uneven Request traffic.
Such a solution is also useful for servers that provide several types of Request Listeners simultaneously (for example, sockets and shared memory), to ensure uniform processing of Requests accepted from different sources.

In multi-processing and multi-threading programming this solution is known as the Producer-Consumer Pattern:
- The Producer repeatedly generates a piece of data and puts it into the Mediation Buffer.
- The Consumer repeatedly extracts a piece of data from the Mediation Buffer and processes it.
- The main idea of the pattern is to make sure that the Producer won't try to add data into the Buffer if it is full, and that the Consumer won't try to remove data from the Buffer if it is empty.
[Diagram: Request Listener --push data--> Mediation Data Buffer --pop data--> Request Processor]

Repeatable Thread Utilization. Thread Pool Pattern
In multi-threaded systems the creation of each new thread requires additional system resources, so it is useful to provide repeatable utilization of already created threads. On the other hand, holding a large number of "unemployed" threads is also unjustified. For these purposes the well-known Thread Pool Pattern is usually used. A Thread Pool minimizes the unjustified usage of system resources and commonly implements the following functionalities:
- Dynamic creation of new threads
- Repeatable utilization of already created threads
- Dynamic termination of hibernated, "unemployed" threads
Note: The Java Platform (since SE 1.5) provides its own implementation of different patterns for asynchronous processing in the package java.util.concurrent.

40 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-40 Blocking Queue: Functionality and Design

The Blocking Queue is one possible implementation of the Mediation Buffer for the Producer-Consumer Pattern. It is a thread-safe FIFO (first-in-first-out) data container providing the following functionalities:
- Synchronized methods for storage and extraction of data items,
- Blocking of Consumers trying to extract data if the Queue is empty,
- Blocking of Producers trying to store new data if the Queue is full.

The implementation of such a container would provide the following methods (a minimal sketch follows this list):

Constructor: builds the Queue instance. Could provide an optional High Water Mark parameter, specifying the maximal allowed Queue capacity, to avoid memory overload of the using application.

push(), tryPush(): push a new object to the back of the Queue; the blocking and non-blocking versions of the functionality to be used by Producer(s). The push() method can block if the High Water Mark parameter is specified and the maximal allowed Queue capacity is reached. In this case the Producer waits until a Consumer decreases the used capacity of the Queue.

pop(), tryPop(): pop the next object (if any) from the front of the Queue; the blocking and non-blocking versions of the functionality to be used by Consumer(s). The pop() method can block if the Queue is empty. In this case the Consumer waits until a Producer pushes a new object to the Queue.

dispose(): because this Queue has blocking methods, waiting threads must be unblocked before Queue destruction. This method could provide a parameter specifying the disposal mode. Deferred disposal means that the objects already contained in the Queue remain until they are popped by Consumer(s); immediate disposal means discarding such objects. In both modes the pushing of new objects to the Queue is prohibited.

Destructor: performs Queue disposal, if not done yet.

Attention: To safely avoid an application crash, all Producer and Consumer threads must be disallowed to access the Queue instance (or be joined) before Queue deletion, even after the Queue has been disposed and emptied.

Note: In Java (since SE 1.5) the Blocking Queue pattern is represented by the interface java.util.concurrent.BlockingQueue.
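Here is a minimal sketch of such a Blocking Queue over POSIX primitives. It implements only push(), pop() and deferred dispose(); tryPush()/tryPop(), the immediate disposal mode and error handling are omitted, and the class layout is illustrative.

  #include <pthread.h>
  #include <cstddef>
  #include <queue>

  // Minimal blocking FIFO of void* items with an optional capacity limit.
  class BlockingQueue {
  public:
      explicit BlockingQueue(size_t highWaterMark = 0)   // 0 = unbounded
          : m_limit(highWaterMark), m_disposed(false) {
          pthread_mutex_init(&m_mutex, NULL);
          pthread_cond_init(&m_cond, NULL);
      }
      ~BlockingQueue() {
          pthread_mutex_destroy(&m_mutex);
          pthread_cond_destroy(&m_cond);
      }
      // Blocks while the queue is full; fails (returns false) after dispose().
      bool push(void* item) {
          pthread_mutex_lock(&m_mutex);
          while (!m_disposed && m_limit != 0 && m_queue.size() >= m_limit)
              pthread_cond_wait(&m_cond, &m_mutex);
          bool ok = !m_disposed;
          if (ok) { m_queue.push(item); pthread_cond_broadcast(&m_cond); }
          pthread_mutex_unlock(&m_mutex);
          return ok;
      }
      // Blocks while the queue is empty; fails once the disposed queue drains.
      bool pop(void** item) {
          pthread_mutex_lock(&m_mutex);
          while (!m_disposed && m_queue.empty())
              pthread_cond_wait(&m_cond, &m_mutex);
          bool ok = !m_queue.empty();
          if (ok) { *item = m_queue.front(); m_queue.pop(); pthread_cond_broadcast(&m_cond); }
          pthread_mutex_unlock(&m_mutex);
          return ok;
      }
      // Deferred disposal: already queued items may still be popped.
      void dispose() {
          pthread_mutex_lock(&m_mutex);
          m_disposed = true;
          pthread_cond_broadcast(&m_cond);   // unblock all waiting threads
          pthread_mutex_unlock(&m_mutex);
      }
  private:
      std::queue<void*> m_queue;
      size_t m_limit;
      bool m_disposed;
      pthread_mutex_t m_mutex;
      pthread_cond_t m_cond;
  };

A single condition variable with broadcast wake-up serves both "not full" and "not empty" waiters here; a production version might use two condition variables to avoid waking the wrong side.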

41 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-41 Synchronized Request Queue Examples

[Diagrams: in each scenario, Requests flow from the Socket through the Listener into the Request Queue and on to the Processor(s). Legend: listener thread, processor thread, dispatcher thread, request, request processing, data flow.]

Scenario 1. Iterative Server: one Listener and one Processor.
- The Listener accepts a Request from the Socket and puts it into the Request Queue.
- The Processor extracts Requests from the Queue one by one and processes them.

Scenario 2. Concurrent Server: one Listener and a static number of Processors.
- The Listener accepts a Request from the Socket and puts it into the Request Queue.
- The static set of Processors compete to extract Requests from the Queue and process them one by one.

Scenario 3. Concurrent Server: one Listener, one Dispatcher and a dynamic number of Processors.
- The Listener accepts a Request from the Socket and puts it into the Request Queue.
- The Dispatcher extracts Requests from the Queue one by one and starts a separate Processor for each Request.
- Each Processor processes a single Request and terminates.

42 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-42 Repeatable Utilization of Resources. Object Pool

The Object Pool is a standard pattern for storage of reusable objects. When an object instance is required by some functionality, it is retrieved from the Object Pool. After the end of its usage, the instance is not destroyed but is returned to the Object Pool for future reuse. The main service provided by the Object Pool is object storage; an additional optional service is dynamic object allocation and de-allocation.

Traditional Object Pool interface (a minimal sketch follows):
- Get Object: gets an object from storage. If no object exists, one could be allocated by the Pool.
- Release Object: returns the object to the Pool, where it is stored in a "passive" state until future reuse.

Traditional Object Pool functionalities:
- Dynamic Object Allocation (optional). For this purpose the Pool would contain an Object Factory. The maximal number of object instances could be restricted.
- Object Storage (mandatory).
- Dynamic Object De-allocation (optional). In this case each object has an expiration time. The Pool would have a "Garbage Collector" Thread, which periodically checks the state of the objects in the Pool. If a specific object stays in the "passive" (not used) state for longer than the specified expiration time, this object is de-allocated by the Garbage Collector.
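A minimal sketch of an Object Pool with dynamic allocation, assuming a user-supplied factory function; the expiration-based "Garbage Collector" Thread and the instance-count limit are omitted for brevity, and all names are illustrative.

  #include <pthread.h>
  #include <list>

  // Minimal Object Pool: stores reusable objects created by a user factory.
  template <class T>
  class ObjectPool {
  public:
      typedef T* (*Factory)();                 // user-supplied object factory
      explicit ObjectPool(Factory factory) : m_factory(factory) {
          pthread_mutex_init(&m_mutex, NULL);
      }
      ~ObjectPool() {
          // Caller must ensure no object is still in use at this point.
          while (!m_free.empty()) { delete m_free.front(); m_free.pop_front(); }
          pthread_mutex_destroy(&m_mutex);
      }
      // Get Object: reuse a stored ("passive") instance, or allocate a new one.
      T* getObject() {
          pthread_mutex_lock(&m_mutex);
          T* obj = NULL;
          if (!m_free.empty()) { obj = m_free.front(); m_free.pop_front(); }
          pthread_mutex_unlock(&m_mutex);
          return obj != NULL ? obj : m_factory();
      }
      // Release Object: return the instance to the Pool for future reuse.
      void releaseObject(T* obj) {
          pthread_mutex_lock(&m_mutex);
          m_free.push_back(obj);
          pthread_mutex_unlock(&m_mutex);
      }
  private:
      Factory m_factory;
      std::list<T*> m_free;
      pthread_mutex_t m_mutex;
  };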

43 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-43 Thread Pool: Functionality and Design

The Thread Pool is a pool of threads, with the following specifics:
- The service provided by the Thread Pool is not only the allocation and storage of Threads, but asynchronous simultaneous processing of scheduled tasks by means of a dynamically changing number of reusable threads.
- Instead of the methods "Get Object" / "Release Object", the Thread Pool traditionally provides the method "Add Task" for scheduling runnable tasks to be executed asynchronously by the Pool's threads.
- The Thread Pool does not require a separate "Garbage Collector" Thread. Each of the Pool's threads is able to destroy itself if it was not used during the specified expiration timeout period.

Public interface of the Thread Pool:
- Constructor: specifies the Maximal Thread Count and the Thread Expiration Time (maximal hibernation timeout).
- addTask(): adds a Runnable Task to the Pool; on demand creates a new Thread or activates a "passive" Thread if one exists.
- dispose(): unblocks all waiting Pool Threads.
- join(): waits for termination of all Pool Threads.

Private interface of the Pool's Threads:
- Constructor: visible from the Thread Pool only; creates a Daemon Thread and stores a reference to the Pool, to notify it when the Thread's processing is finished.
- run(): performs the loop of Task getting and processing, until an empty Task is accepted (when the hibernation timeout expires or the Pool is disposed).

Private interface of the Thread Pool used by its Threads:
- getTask(): marks the calling Thread as "passive" and blocks until the next Runnable Task is added; then marks the Thread as "active" and returns the Task for processing by the calling Thread.
- deregisterThread(): decreases the total thread count.

[Diagram: the Pool holds a Task Queue and its state (max / total / passive thread counts, expiration time, Mutex and CondVar, dispose flag); each worker runs the loop:]
  Thread::Run {
    while (Pool.GetTask()) { Task.Run(); }
    Pool.Deregister(this);
  }

Note: In Java (since SE 1.5) the Thread Pool pattern is represented by the class java.util.concurrent.ThreadPoolExecutor. (A minimal C++ sketch of the core of such a pool follows.)
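A minimal sketch of the core of such a pool, reusing the BlockingQueue sketch shown earlier (an assumption, not part of the slides). It demonstrates only the addTask()/worker-loop mechanics with a fixed number of threads; dynamic thread creation and the expiration timeout are omitted.

  #include <pthread.h>
  #include <cstddef>
  #include <vector>

  // Runnable task interface (assumed, for illustration).
  struct Runnable { virtual void run() = 0; virtual ~Runnable() {} };

  // Fixed-size thread pool built on the BlockingQueue sketch above.
  class ThreadPool {
  public:
      explicit ThreadPool(size_t threadCount) {
          for (size_t i = 0; i < threadCount; ++i) {
              pthread_t tid;
              pthread_create(&tid, NULL, &ThreadPool::workerMain, this);
              m_threads.push_back(tid);
          }
      }
      void addTask(Runnable* task) { m_tasks.push(task); }
      void dispose() { m_tasks.dispose(); }   // unblocks all waiting pool threads
      void join() {
          for (size_t i = 0; i < m_threads.size(); ++i)
              pthread_join(m_threads[i], NULL);
      }
  private:
      static void* workerMain(void* arg) {
          ThreadPool* pool = static_cast<ThreadPool*>(arg);
          void* item;
          // Loop of task getting and processing, until the queue is disposed
          // and drained (pop() then returns false).
          while (pool->m_tasks.pop(&item)) {
              Runnable* task = static_cast<Runnable*>(item);
              task->run();
              delete task;
          }
          return NULL;
      }
      BlockingQueue m_tasks;
      std::vector<pthread_t> m_threads;
  };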

44 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-44 Thread Pool Usage Examples

[Diagram: three variants of Request flow from the Socket into the Thread Pool. Legend: listener thread, dispatcher thread, processor thread, data flow, request, request processing; Runnable kinds: request processing, dispatching, combined.]

Scenario 1:
- From each accepted Request the Listener builds one Runnable and schedules it to the Thread Pool.
- The Runnable contains the Request Data and the algorithm of Request Processing.

Scenario 2:
- The Listener puts accepted Requests into the Request Queue and schedules one Runnable per accepted Request to the Thread Pool.
- The Runnable contains the algorithm of Request Data extraction from the Request Queue followed by Request Processing.

Scenario 3:
- The Listener only puts accepted Requests into the Request Queue.
- The Dispatcher Thread extracts Requests from the Request Queue, builds one Runnable per Request and schedules it to the Thread Pool.
- The Runnable contains the Request Data and the algorithm of Request Processing.

45 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-45 Thread Pool Usage Examples (continuation)

[Diagram: Request flow from the Socket through the Request Queue into the Thread Pool, which is started with a single Initial Task.]

Scenario 4:
- The Listener only puts accepted Requests into the Request Queue.
- The Thread Pool is initialized with one Dispatcher Runnable, which extracts Request Data from the Request Queue, builds one Request Processing Runnable per extracted Request and schedules it to the Thread Pool. As a result, one thread of the Thread Pool always performs the functionality of the Dispatcher Thread from Scenario 3.
- The Request Processing Runnable, scheduled by the Dispatcher, contains the Request Data and the algorithm of Request Processing.

Scenario 5:
- The Listener only puts accepted Requests into the Request Queue.
- The Thread Pool is initialized with one Combined Runnable, implementing part of the Dispatcher functionality plus the Request Processing algorithm. It performs the following steps: extract a single Request Data item from the Request Queue; schedule a new Combined Runnable to the Thread Pool; perform the Data Processing algorithm on the extracted Request Data.
- As a result: all Threads in the Pool execute the same algorithm of the Combined Runnable; the Dispatcher functionality is distributed in parts between all running Threads; and no more than one Runnable waits for processing in the Thread Pool's internal queue of Runnables.

46 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-46 RWLock – Multi Read / Single Write Lock Pattern

Some systems providing multiple simultaneous access to shared data need a more sophisticated kind of data locking than the binary locking logic (Locked/Unlocked) provided by Mutex. Such systems (for example, airline-ticketing stations) should be able to read the data concurrently (find an available "seat"), but only one of them should be able to change the state of the data (reserve a "seat") at a given time. The Read-Write Lock (RWLock) provides non-exclusive read-only access and exclusive write access to the shared data.

Unix (FreeBSD, Solaris) and Linux (Debian, Ubuntu) operating systems support the RWLock pattern and implement the set of RWLock-related calls defined by POSIX:

RWLock Attributes related calls:
- pthread_rwlockattr_init(), pthread_rwlockattr_destroy(): constructor and destructor of the RWLock attributes container of type pthread_rwlockattr_t.
- pthread_rwlockattr_getpshared(), pthread_rwlockattr_setpshared(): getter and setter for the RWLock Sharing Mode attribute, which accepts the following values:
  - "Process-Private" (default) mode: the lock is used for inter-thread synchronization within a single process only;
  - "Process-Shared" mode: the lock could be used for inter-process synchronization.

RWLock related calls:
- pthread_rwlock_init(), pthread_rwlock_destroy(): constructor and destructor of the RWLock object of type pthread_rwlock_t.
- pthread_rwlock_rdlock(), pthread_rwlock_tryrdlock(): acquire a non-exclusive Read Lock on the RWLock object on behalf of the calling thread, in blocking or non-blocking mode.
- pthread_rwlock_wrlock(), pthread_rwlock_trywrlock(): acquire an exclusive Write Lock on the RWLock object on behalf of the calling thread, in blocking or non-blocking mode.
- pthread_rwlock_unlock(): releases the last lock (Read or Write) acquired on the RWLock object by the calling thread.

Note: In Java (since SE 1.5) the RWLock pattern is represented by the class java.util.concurrent.locks.ReentrantReadWriteLock.
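A brief usage sketch of the POSIX API above: concurrent readers and an exclusive writer protecting a shared value. The statically initialized lock uses default ("Process-Private") attributes; the variable names are illustrative.

  #include <pthread.h>

  static pthread_rwlock_t g_rwlock = PTHREAD_RWLOCK_INITIALIZER;
  static int g_sharedValue = 0;

  int readValue() {
      pthread_rwlock_rdlock(&g_rwlock);   // non-exclusive: readers run concurrently
      int value = g_sharedValue;
      pthread_rwlock_unlock(&g_rwlock);
      return value;
  }

  void writeValue(int value) {
      pthread_rwlock_wrlock(&g_rwlock);   // exclusive: blocks readers and writers
      g_sharedValue = value;
      pthread_rwlock_unlock(&g_rwlock);
  }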

47 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-47 Trivial RWLock Implementation Example

This example demonstrates the simple C++ implementation of RWLock functionality, similar to most POSIX implementations in Unix and Linux:

  class RWLock {
  private:
    // state: 0 - unlocked, -1 - write-locked,
    // >0 - number of read locks
    int m_lockState;
    // number of pending writers
    int m_waitingWriters;
    // synchronization devices
    Mutex m_mutex;
    CondVar m_cond;
  public:
    // constructor
    RWLock() : m_lockState(0), m_waitingWriters(0) {}
    ...
    // acquires recursive non-exclusive Read Lock
    void readLock() {
      MutexGuard guard(m_mutex);
      while ((m_lockState == -1) || (m_waitingWriters != 0)) {
        m_cond.wait(m_mutex);
      }
      m_lockState++;
    }
    // acquires non-recursive exclusive Write Lock
    void writeLock() {
      MutexGuard guard(m_mutex);
      while (m_lockState != 0) {
        m_waitingWriters++;
        m_cond.wait(m_mutex);
        m_waitingWriters--;
      }
      m_lockState = -1;
    }
    // releases last acquired lock
    void unlock() {
      MutexGuard guard(m_mutex);
      if (0 == m_lockState) return;            // already unlocked
      if (m_lockState == -1) m_lockState = 0;  // write unlock
      else m_lockState--;                      // read unlock
      m_cond.notifyAll();
    }
  }; // end class RWLock
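A brief usage sketch of this class, assuming the Mutex/CondVar/MutexGuard helpers it relies on: many readers may hold the lock concurrently, while a writer gets exclusive access.

  RWLock g_lock;
  int    g_data = 0;

  void reader() {
      g_lock.readLock();      // shared: proceeds while no writer holds or waits
      int snapshot = g_data;
      g_lock.unlock();
      (void) snapshot;        // use the value read under the lock
  }

  void writer(int value) {
      g_lock.writeLock();     // exclusive: waits until all readers unlock
      g_data = value;
      g_lock.unlock();
  }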

48 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-48 RWLock Acquisition Policies: Preference Order

Common Policies
All RWLock implementations provide the following two obvious lock acquisition rules:
- A Read Lock can be acquired immediately if no Write Lock is held on the RWLock object by another thread. If some other thread already holds the Write Lock, the current Read Lock operation will block.
- A Write Lock can be acquired immediately if no Read Locks are held on the RWLock object by other threads. The Write Lock operation will block until all other threads release their Read Locks on this object.

Acquisition Preference Order
Two different threads try to acquire a Read or Write Lock on the same RWLock object. Who will be first? Different platforms provide different preference ordering of lock access:
- Random-order access (implemented by Java): the order is not specified.
- Arrival-order access (implemented by Java): the longest-waiting single writer, or the longest-waiting group of readers, acquires the lock.
- "Writer-first" order (implemented by POSIX): to avoid writer starvation, the Write Lock has higher priority than the Read Lock. This means:
  - If the Write Lock cannot be acquired immediately (a Read Lock is held by another thread or threads), the Write Lock is marked as a Pending Write Lock and then blocks until all other threads release their previously acquired Read Locks.
  - All subsequent Read Lock operations will block if a Pending Write Lock exists.
  - The acquisition order of two different Write Locks, requested by different threads, depends on the Scheduler Policy (for non-realtime threads in Linux it is random-order access).
[Diagram: "writer-first" preference order. Legend: R - read lock, W - write lock, a/b - object name; acquired, pending, released and waiting lock states are shown.]

49 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-49 RWLock Acquisition Policies: Recursivity and Reentrancy

Lock Recursivity
Recursivity means the possibility to acquire multiple concurrent locks of the same type (Read or Write) by the same thread, with a matching number of subsequent unlock operations.
- A Recursive Read Lock is supported by most RWLock implementations (POSIX, Java).
- A Recursive Write Lock is supported by Java, but not by POSIX implementations. In most POSIX implementations an attempt to take a recursive Write Lock leads to unspecified behavior of the RWLock object or to a Recursive Deadlock.
Note: To support a Recursive Write Lock, the implementation would have to explicitly maintain the ID of the thread holding the Write Lock, to permit the repeated Write Lock operation to this thread only.
[Diagram: a recursive Read Lock succeeds; a recursive Write Lock is acquired in Java but leads to Deadlock in POSIX.]

Lock Reentrancy
Reentrancy means guaranteed success of the repeated acquisition of a recursive lock (Read or Write) that is already held (at least once) by the calling thread.
- The Java implementation guarantees reentrancy of Read and Write Lock operations.
- In POSIX implementations the Write Lock is not recursive, and the Read Lock is recursive but not reentrant: if a Pending Write Lock appears between two sequential acquisitions of the Read Lock, a Deadlock results.
Note: To support reentrancy of Read and Write Locks, the implementation would have to explicitly maintain the IDs of the threads holding the Write Lock and the Read Locks.
[Diagram: a recursive Read Lock is not reentrant in POSIX; the existence of a Pending Writer can lead to Deadlock.]

50 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-50 RWLock Acquisition Policies: Downgrading and Upgrading

Lock Downgrading Ability
The ability to downgrade the RWLock means the possibility for a Write Holder Thread to become a Read Holder of the same RWLock object. A possible downgrade scenario looks as follows:
- The Thread holds a Write Lock on a specific RWLock object;
- The Thread acquires a Read Lock on the same object;
- The Thread releases the Write Lock, retaining the acquired Read Lock.
Lock Downgrading is supported in Java. Acquisition of a Read Lock by the Write Holder Thread is not permitted by POSIX and leads to a Recursive Deadlock.
Note: To support Lock Downgrading, the RWLock implementation would have to explicitly maintain the ID of the thread holding the Write Lock, to permit the Read Lock acquisition to this thread only.
[Diagram: Lock Downgrading succeeds in Java; acquisition of a Read Lock by the Write Holder Thread leads to Deadlock in POSIX.]

Lock Upgrading Ability
The ability to upgrade the RWLock means the possibility for a Read Holder Thread to become the Write Holder of the same RWLock object. A possible upgrade scenario looks as follows:
- The Thread holds a Read Lock on a specific RWLock object;
- The Thread acquires a Write Lock on the same object;
- The Thread optionally releases the Read Lock, retaining the acquired Write Lock.
Lock Upgrading is NOT supported by Java or POSIX, because it leads to inevitable Deadlocks.
Note: Unlike the other RWLock-related scenarios, Lock Upgrading leads to Deadlocks not because of a design deficiency, but because of the nature of the RWLock pattern itself. To support Lock Upgrading, a system implementing RWLock must have Deadlock Prevention functionality, providing automatic interruption of the potentially dangerous lock operation, to avoid real deadlocking of the threads using the RWLock.
[Diagram: without Deadlock Prevention functionality, the Lock Upgrading scenario leads to inevitable Deadlocks.]

51 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-51 Deadlock Prevention in Multi-Lock Systems

During development of multi-threaded applications with a limited, predictable number of simple locking resources, deadlocking scenarios can be avoided by establishing a Common Resource Locking Protocol, which fixes a predefined order of resource locking at application design time (a sketch of such an ordered-locking protocol follows below). In more sophisticated systems with a large number of locking resources, establishing a Common Resource Locking Protocol could be problematic or impossible. In systems with non-trivial locking scenarios, potential Deadlock occurrence is also inevitable because of the pattern's complexity (an example is the upgradeable RWLock). The known practice in these cases is to provide automatic interruption of the potentially dangerous lock operation.

Possible Functionality of Deadlock Prevention with Automatic Lock Interruption:
When a potential deadlock situation is recognized, one of the colliding Threads is interrupted. On accepting the interrupt (for example, an exception) the Thread must release all the Locks it holds, and only then is this Thread permitted to restart its flow from the beginning.
Explanation: if the interrupted Thread repeatedly retries the interrupted lock operation without releasing all the locks it previously held, this leads to an endless Livelock.
So, the conclusion is:
- Interruption of the potentially dangerous lock operation ensures Deadlock-Safety of the application.
- The release of all retained locks by the interrupted Thread ensures Livelock-Safety of the application.
[Diagrams: potential Deadlock interruption during a simultaneous RWLock Upgrade operation, and potential Deadlock interruptions caused by reverse locking order.]
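As a sketch of the Common Resource Locking Protocol mentioned above, the helper below always acquires two mutexes in a fixed, address-based order, so no pair of threads can ever hold them in reverse order. The helper names are illustrative, and comparing unrelated pointers, while common practice, is formally unspecified in C++.

  #include <pthread.h>

  // Acquire two mutexes in a globally consistent (address) order.
  // Every thread that needs both locks must use this helper, so the
  // reverse-order acquisition that causes deadlock can never occur.
  void lockBoth(pthread_mutex_t* a, pthread_mutex_t* b) {
      if (a > b) { pthread_mutex_t* t = a; a = b; b = t; }  // order by address
      pthread_mutex_lock(a);
      pthread_mutex_lock(b);
  }

  void unlockBoth(pthread_mutex_t* a, pthread_mutex_t* b) {
      pthread_mutex_unlock(a);   // unlock order does not affect deadlock safety
      pthread_mutex_unlock(b);
  }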

52 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-52 Potential Deadlock Recognition

The inability of a calling Thread (the Lock Pretender) to perform a specific resource locking operation immediately usually means that this resource is locked by another Thread or by multiple Threads (the Lock Holders). If the locking operation is performed in blocking mode, this situation blocks the calling Lock Pretender Thread until the lock object is released by all Lock Holder Threads. So, while waiting in blocking mode for lock acquisition, a Thread actually waits for one or more other Threads, which in turn may wait for other Threads, and so on. As a result, in multi-threaded systems with synchronized access to multiple shared resources, at each specific moment of time there are spontaneously built chains of waiting Threads:
[Diagram: a chain of waiting Threads A -> B -> … -> N]
A Deadlock can occur when such a chain of waiting Threads "fastens" and becomes a loop:
[Diagram: several chains of waiting Threads, one of which closes into a loop]

To build a deadlock prevention system we can use a Graph representing the dependencies between waiting Threads. In the Waiting Threads Dependency Graph, Threads are represented as nodes, and "wait" dependencies between these Threads are represented by arrows. As long as the Graph does not contain any loop, no Deadlock occurs. Each pending lock operation (one that cannot be completed immediately) must be verified not to become the last member of a newly created loop in the Waiting Threads Dependency Graph. This can be prevented by searching for the reverse "path" in the Graph (a sketch of this check follows). For example:
1) Thread A needs to wait for Thread B.
2) If the Waiting Threads Dependency Graph contains a path B->A, Thread A must be interrupted.
3) Otherwise the dependency A->B can be added to the Waiting Threads Dependency Graph, and Thread A begins to wait.
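A minimal sketch of the cycle check described above: before Thread A starts waiting for Thread B, the wait-for graph is searched for a path from B back to A. The thread-ID type and containers are illustrative, and in a real system the graph itself must be protected by its own mutex (omitted here).

  #include <map>
  #include <set>

  typedef unsigned long ThreadId;

  // wait-for edges: waiter -> set of threads it currently waits for
  typedef std::map<ThreadId, std::set<ThreadId> > WaitForGraph;

  // Depth-first search: is there a path 'from' -> ... -> 'to'?
  static bool pathExists(const WaitForGraph& g, ThreadId from, ThreadId to,
                         std::set<ThreadId>& visited) {
      if (from == to) return true;
      if (!visited.insert(from).second) return false;      // already explored
      WaitForGraph::const_iterator it = g.find(from);
      if (it == g.end()) return false;
      for (std::set<ThreadId>::const_iterator e = it->second.begin();
           e != it->second.end(); ++e)
          if (pathExists(g, *e, to, visited)) return true;
      return false;
  }

  // Returns false (potential deadlock: interrupt the caller) if adding
  // the edge waiter -> holder would close a loop; otherwise records it.
  bool tryAddDependency(WaitForGraph& g, ThreadId waiter, ThreadId holder) {
      std::set<ThreadId> visited;
      if (pathExists(g, holder, waiter, visited)) return false;
      g[waiter].insert(holder);
      return true;
  }

A caller that receives false must interrupt the pending lock operation instead of blocking, as described on the previous slide.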

53 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-53 Mini-Project: Database With RWLock Access to DB Instances

Requirement:
Each DB Instance has a unique ID and modifiable attribute(s). Each DB Instance behaves like an RWLock object, supporting:
- Simultaneous read access from multiple Threads by means of a non-exclusive Read Lock,
- Modification by a single Thread by means of an exclusive Write Lock,
- Reentrancy of Read and Write Locks,
- Ability to upgrade (Read->Write) and downgrade (Write->Read) the lock type.

The DB provides the storage of its instances with the following functionality:
- DB Instance creation by unique ID. The created DB Instance is Write-Locked, giving the user the possibility to initialize the Instance attributes.
- DB Instance search by unique ID. The found DB Instance is Read-Locked, to avoid asynchronous Instance deletion by another Thread.
- DB Instance removal. Can be performed on a Write-Locked instance only. Asynchronous DB Instance removal must not result in a crash of other functionalities.
- DB Instance attribute modification. Can be performed on a Write-Locked instance only. Attribute modification can be committed or rolled back at the end of the operation.

The DB would provide a friendly Transaction API for performing all the operations described above. The DB would provide a Deadlock Prevention mechanism, containing:
- Recognition of potential Deadlocks,
- Interruption of the last colliding Transaction, with release of all locks allocated by this Transaction,
- Automatic restart of a Transaction interrupted due to a collision,
- Possibility of asynchronous removal of a DB Instance with automatic termination of all Instance-related Transactions.

54 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-54 Mini-Project: Ideas For Realization

1. Safe Unlocking Mechanism
To provide safe DB Instance unlocking, the Lock Guard pattern could be implemented. Any Lock is released in the Guard destructor.

2. RWLock Reentrancy Optimization
To avoid superfluous calls to the synchronized locking/unlocking methods of the RWLock object, each Thread could hold in its Thread Local Storage (TLS) the IDs of the instances it has already locked. The actual lock operation is performed by the Guard only if the instance with the specific ID is not marked in TLS as already locked. The Guard which performed the actual Write Lock operation is responsible for performing the instance commit or roll-back functionality.

3. Deadlock Prevention Functionality
The application could implement the Waiting Threads Dependency Graph as a synchronized singleton object, containing information about the "wait" dependencies of all Threads currently blocked in pending lock operations. Each lock operation requiring Thread blocking would register its new dependencies in the Dependency Graph before actually blocking the calling thread. If the new dependencies lead to a deadlock, the registration fails with an exception.

4. Transaction Interruption, Lock Releasing, Automatic Transaction Restart
Each Transaction could be associated with a try/catch block containing a Lock Guard. When handling the Deadlock Exception, every Transaction except the deepest on the stack re-throws the Exception. The deepest Transaction is restarted, to begin the whole flow from the starting point. A Transaction is the deepest if its corresponding Lock Guard releases the last Instance locked by the current Thread.

5. Database Internal Design and Synchronization
The Database could be implemented as a Singleton providing the synchronized methods "Create Instance", "Find Instance" and "Remove Instance". The DB would contain the storage of pointers to all existing DB Instances, sorted by unique Instance ID. The DB must provide additional functionality to avoid a crash of Threads performing a pending "Find Instance" operation at the time the DB Instance is removed by another Thread. For this purpose the DB could also contain a Pending Operation Journal, where each pending "Find Instance" operation is registered. The physical deletion of a DB Instance would be performed only when neither pending Locks nor pending Operations exist for this Instance.

55 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-55 RWLock Class Diagram

56 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-56 Lock Operation Sequence Diagram

57 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-57 Unlock Operation Sequence Diagram

58 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-58 XDB Class Diagram

59 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-59 see continuation on the next slide… XDB Read Transaction Sequence Diagram (1)

60 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-60 XDB Read Transaction Sequence Diagram (2) … continuation

61 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-61 XDB Write Transaction Sequence Diagram

62 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-62 XDB Delete Transaction Sequence Diagram

63 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-63 XDB Create Transaction Sequence Diagram

64 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-64 So, what could we learn from the INTERNET?

To not remain lonely, establish a Connection.
Remember that a Connection requires the existence of a Protocol, which is mandatory for both communicating sides.
To establish a Reliable Connection, be ready to offer a hand for a Hand Shake.
To be Well-Known, become a Server, one who is ready to Serve others.
When you fall into a Deadlock, don't be afraid to start from the beginning.
The shortest Route is not always the best one.
To not miss the Signal, address your Kernel.

65 © D.Zinchin [zinchin@gmail.com] Introduction to Network Programming in UNIX & LINUX4-65 Sof Maaseh… בס"ד (From the Sabbath Prayer)

