1 FIT5174 Distributed & Parallel Systems Lecture 3 FIT5174 Parallel & Distributed Systems Dr. Ronald Pose Lecture 3 - 2013.

1 1 FIT5174 Distributed & Parallel Systems Lecture 3 FIT5174 Parallel & Distributed Systems Dr. Ronald Pose Lecture 3 - 2013

2 2 Overview Interprocess Communication (IPC) Synchronization in Distributed systems –Clock synchronization –Event Ordering –Distributed Mutual Exclusion –Deadlock

3 3 Inter-process Communication (IPC) IPC basically requires information sharing among two or more processes. Two basic methods: –Original sharing (shared-data approach): P1 and P2 both access a shared common memory area –Copy sharing (message-passing approach): P1 sends a copy of the data to P2 in a message

4 4 Message Passing System A message-passing system is a subsystem of a distributed system that provides a set of message-based IPC protocols while shielding programmers from the details of complex network protocols and multiple heterogeneous platforms. It enables processes to communicate by exchanging messages. It allows programs to be written using simple communication primitives, such as send and receive. It serves as a suitable infrastructure for building higher-level IPC systems, such as RPC (Remote Procedure Call) and DSM (Distributed Shared Memory).

5 5 Desirable features of a good message passing system Simplicity Uniform semantics –Same primitives for local and remote communication Efficiency –Reduce the number of messages as far as possible –Optimizations normally adopted for efficiency include: avoiding the cost of establishing and terminating a connection between the same pair of processes for every message exchanged between them; minimizing the cost of maintaining connections; piggybacking the acknowledgment of the previous message on the next message.

6 6 Desirable features of a good message passing system Reliability –Lost and duplicate message handling Correctness –Atomicity –Ordered delivery –Survivability Flexibility –Can drop one or more correctness properties

7 7 Desirable features of a good message passing system Security –Authentication of sender and receiver –Encryption of messages Portability –Message passing system should itself be portable –The application written by using primitives of the IPC protocol should be portable.

8 8 Issues in IPC by message passing A typical message structure –Header: addresses (sender address, receiver address); sequence number; structural information (type, number of bytes) –Message

9 9 Design of an IPC protocol Some of the main issues will be: –Identity related –Who is the sender? –Who is the receiver? –Network Topology related –1 receiver or many? –Flow control related –Guaranteed by the receiver? –Sender should wait for reply? –Error control and channel management –Node crash……what to do? –Receiver not ready….what to do? –Several outstanding messages for the receiver

10 10 Synchronization in IPC Send primitive –Blocking –Non-blocking Receive primitive –Blocking –Non-blocking (the receiver learns that a message has arrived either by polling or by an interrupt)

11 11 Synchronous & Asynchronous Communication When both the send and receive primitives of a communication between two processes use blocking semantics, the communication is said to be synchronous; otherwise it is asynchronous. [Figure: synchronous mode with blocking send and receive primitives. The receiver calls receive(message) and its execution is suspended until the message arrives; the sender's execution is suspended after the send and resumes when the acknowledgment arrives.]

12 12 Synchronous vs Asynchronous Communication Synchronous –Simple and easy to implement –Contributes to reliability –No backward error recovery needed Asynchronous –High concurrency –More flexible than synchronous –Lower deadlock risk than in synchronous communication (but beware)

13 13 Buffering Synchronous systems –Null buffer –Single message buffer Asynchronous systems –Unbounded capacity buffer –Finite message (multiple message buffer)
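The finite (multiple-message) buffer can be illustrated with a short sketch. It assumes a non-blocking send that simply reports failure when the buffer is full; the helper name `try_send` is hypothetical, and Python's standard `queue.Queue` stands in for the receiver's message buffer.

```python
import queue


def try_send(mailbox: queue.Queue, message) -> bool:
    """Non-blocking send: return False instead of waiting when the buffer is full."""
    try:
        mailbox.put_nowait(message)
        return True
    except queue.Full:
        return False


# A finite buffer that can hold two outstanding messages.
mailbox = queue.Queue(maxsize=2)
```

An unbounded-capacity buffer corresponds to `queue.Queue()` with no `maxsize`, in which case `try_send` never fails. What a sender should do on failure (retry, drop, or block) is a design choice this sketch leaves open.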

14 14 Buffering [Figure: buffering in a synchronous system and in an asynchronous system]

15 15 Multi-datagram Messages Almost all networks have an upper bound on the size of data that can be transmitted at a time. This is known as the MTU (maximum transmission unit). A message larger than the MTU therefore has to be fragmented, and each fragment is sent in a separate packet. These packets are known as datagrams. Thus messages may be single-datagram or multi-datagram messages. Fragmenting and reassembling messages is the responsibility of the message passing system.
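A minimal sketch of fragmentation and reassembly, under simplifying assumptions: the `mtu` argument counts payload bytes only (real headers also consume part of the MTU), and each datagram carries its index and the total fragment count. The function names `fragment` and `reassemble` are hypothetical.

```python
def fragment(message: bytes, mtu: int) -> list:
    """Split a message into (index, total, payload) datagrams of at most mtu bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    # An empty message still travels as one (empty) datagram.
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    total = len(chunks)
    return [(index, total, chunk) for index, chunk in enumerate(chunks)]


def reassemble(datagrams: list):
    """Rebuild the message, or return None while any fragment is still missing."""
    if not datagrams:
        return None
    total = datagrams[0][1]
    # Keying by index tolerates duplicates and out-of-order arrival.
    received = {index: payload for index, _, payload in datagrams}
    if len(received) < total:
        return None
    return b"".join(received[i] for i in range(total))
```

A receiver could use the `None` result as the trigger for requesting retransmission of the missing fragments.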

16 16 Using a Bitmap for Multidatagrams

17 17 Encoding and Decoding Encoding/decoding is needed if the sender and receiver have different architectures. Even between homogeneous machines, encoding/decoding is needed –because an absolute pointer loses its meaning in another process's address space –so that the receiver knows which object is stored where and how much storage it requires

18 18 Process Addressing Explicit Addressing –Send(process_ID, message) –Receive(process_ID, message) Implicit Addressing –Send_any(service_ID, message) –Receive_any(process_ID, message)

19 19 Failure Handling Failure classification 1. Loss of request message 2. Loss of response message 3. Unsuccessful execution of the request (the receiver crashes while processing it) [Figure: sender/receiver timelines showing a lost request, a lost response, and a receiver crash]

20 20 4-message reliable IPC protocol

21 21 3-message Reliable IPC protocol

22 22 2-message Reliable IPC protocol

23 23 Example of Fault Tolerant System

24 24 Idempotency What is the difference between the following two functions?
getSqrt(n) { return sqrt(n) }
Debit(amount) { if (balance > amount) { balance = balance - amount; return ("success", balance) } else { return ("failure", balance) } }

25 25 A Non-Idempotent Routine [Figure] (slide footer: 4/13/2015, Distributed Systems Lecture 2)

26 26 Implementation of Idempotency How to implement Idempotency? –Adding sequence number with the request message –Introduction of ‘Reply cache’
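The two ingredients named above can be combined in a short sketch that makes the Debit routine from the earlier slide safe to retry. The class name `Account`, the `(client_id, seq)` cache key, and the unbounded dictionary cache are assumptions made for illustration; a real server would also need to expire old entries.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        self.reply_cache = {}  # (client_id, seq) -> reply already sent

    def debit(self, client_id, seq, amount):
        """Execute each (client_id, seq) request at most once."""
        key = (client_id, seq)
        if key in self.reply_cache:
            # Duplicate request (e.g. a retransmission): resend the old reply
            # without touching the balance again.
            return self.reply_cache[key]
        if self.balance > amount:
            self.balance -= amount
            reply = ("success", self.balance)
        else:
            reply = ("failure", self.balance)
        self.reply_cache[key] = reply
        return reply
```

Retransmitting request number 1 then returns the cached reply, whereas a new sequence number causes a second debit.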

28 28 Group Communication Group communication may be –One to many –Many to one –Many to many One to many –Group management –Group addressing –Buffered and unbuffered multicast –Send-to-all and Bulletin-Board semantics –Flexible reliability in multicast communication –Atomic multicast

29 29 Many-to-Many Communication The issues related to one-to-many and many-to-one communication also apply here. In addition, ordered message delivery is an important issue; it is trivial in one-to-many or many-to-one communication. For example, two server processes maintain a single salary database, and two client processes send updates for the same salary record. What happens if the updates reach the two servers in different orders? (Will sequencing of messages help in this case?)

30 30 Semantics for ordered delivery in many-to-many communications No ordering –No ordering constraint [Figure: no-ordering semantics]

31 31 Semantics for ordered delivery in many-to-many communications Absolute ordering –All messages are delivered to all receiver processes in the exact order in which they were sent. –Implemented by using global timestamps as message identifiers, together with a sliding-window protocol [Figure: absolute-ordering semantics]

32 32 Semantics for ordered delivery in many-to-many communications Consistent ordering –All messages are delivered to all receivers in the same order. However, this order may differ from the order in which the messages were sent. Implementation –Make the communication appear as a combination of many-to-one and one-to-many communication [Chang and Maxemchuk] –Kernels of sending machines send messages to a single receiver (known as the sequencer), which assigns a sequence number to each message and then multicasts it. –The sequencer is a single point of failure, so this scheme has poor reliability. –A distributed algorithm: ABCAST in the ISIS system [Birman and Renesse] (self study)
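The sequencer idea can be sketched as follows. This is an illustrative single-process model, not the Chang and Maxemchuk protocol itself: the classes `Sequencer` and `Receiver` are hypothetical, and the network is replaced by direct method calls so that arrival order can be varied by hand.

```python
class Sequencer:
    """Single process that assigns one global sequence number per message."""

    def __init__(self):
        self.next_seq = 0

    def stamp(self, message):
        stamped = (self.next_seq, message)
        self.next_seq += 1
        return stamped


class Receiver:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0
        self.held = {}       # seq -> message that arrived too early
        self.delivered = []

    def on_multicast(self, seq, message):
        self.held[seq] = message
        # Deliver every message that is now contiguous with what was delivered.
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
```

Because every receiver delivers in sequence-number order, all receivers see the same order even when the network delivers the multicasts differently to each of them.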

33 33 Semantics for ordered delivery in many-to-many comm. Consistent ordering semantic

34 34 Semantics for ordered delivery in many-to-many comm. Causal ordering –If the event of sending one message is causally related to the event of sending another message, the two messages are delivered to all receivers in the correct (causal) order. –Two message-sending events are said to be causally related if they are related by the happened-before relation. [The expression a→b is read “a happens before b” and means that all processes agree that event a occurs first and event b occurs afterward. The happens-before relation can be observed directly in two situations: 1. If a and b are events in the same process, and a occurs before b, then a→b is true. 2. If a is the event of a message being sent by one process, and b is the event of that message being received by another process, then a→b is also true. Happens-before is a transitive relation, so if a→b and b→c, then a→c.]

35 35 Semantics for ordered delivery in many-to-many comm. One example of implementing causal ordering is CBCAST in the ISIS system [Birman et al]. [Figure: causal-ordering semantics; in the second round, m2 is held back]
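A common way to realise causal delivery is with vector timestamps, which is the general idea behind CBCAST. The sketch below is a simplified model and not the ISIS implementation; the class `CausalReceiver` is hypothetical, and it assumes each sender increments its own vector entry before multicasting. A message from sender j is delivered only when it is the next message from j and the receiver has already delivered everything the sender had seen.

```python
class CausalReceiver:
    def __init__(self, n_processes):
        self.clock = [0] * n_processes  # messages delivered so far, per sender
        self.pending = []               # messages held back
        self.delivered = []

    def _deliverable(self, sender, stamp):
        for k in range(len(self.clock)):
            if k == sender:
                # Must be the very next message from this sender.
                if stamp[k] != self.clock[k] + 1:
                    return False
            elif stamp[k] > self.clock[k]:
                # The sender had seen a message we have not delivered yet.
                return False
        return True

    def receive(self, sender, stamp, payload):
        self.pending.append((sender, stamp, payload))
        progress = True
        while progress:  # one delivery may unblock held-back messages
            progress = False
            for item in list(self.pending):
                s, v, p = item
                if self._deliverable(s, v):
                    self.pending.remove(item)
                    self.clock[s] = v[s]
                    self.delivered.append(p)
                    progress = True
```

If m2 (sent after its sender received m1) arrives before m1, it is held back until m1 has been delivered, which matches the behaviour described for the figure.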

36 36 Remote Procedure Call (RPC) The IPC part of a distributed application can be adequately and efficiently handled by using an IPC protocol based on message passing system. However, an independently developed IPC protocol is tailored specifically to one application and does not provide a foundation on which to build a variety of distributed applications. Therefore, a need was felt for a general IPC protocol that can be used for designing several distributed applications. The RPC facility emerged out of this need.

37 37 Remote Procedure Call RPC is not a universal panacea for all types of distributed applications, but it serves a fairly large number of them well and has become a widely accepted IPC mechanism in distributed systems. Its features: –Simple call syntax. –Familiar semantics. –Specification of a well-defined interface. –Ease of use. –Generality: “In single-machine computations procedure calls are often the most important mechanism for communication between the parts of the algorithm” [Birrell and Nelson]. –Efficiency.

38 38 RPC model The RPC model is similar to the “procedure call” model. A procedure call is the same as a function call or subroutine call. Local Procedure Call: the caller and the callee are within a single process on a given host. Remote Procedure Call (RPC): a process on the local system invokes a procedure on a remote system. We call this a “procedure call” because the intent is to make it appear to the programmer that a local procedure call is taking place.

39 39 Local and Remote Procedure Call Local Procedure Call Remote Procedure Call

40 40 A Typical Model of RPC

41 41 Implementing RPC To achieve the goal of semantic transparency, the implementation of an RPC mechanism is based on the concept of stubs Stubs provide a perfectly normal (local) procedure call abstraction To hide the distance and functional details of underlying network, an RPC communication package (known as RPCRuntime) is used. Thus RPC implementation involves five elements- –The client –The client stub –The RPCRuntime –The server stub –The server

42 42 RPC in Detail

43 43 Stubs Client and server stubs are generated from interface definition of server routines by development tools. Interface definition is similar to class definition in C++ and Java.

44 44 Parameter Passing Mechanisms When a procedure is called, parameters are passed to the procedure as the arguments. There are three methods to pass the parameters. ◦call-by-value ◦call-by-reference ◦call-by-copy/restore

45 45 Call by value The values of the arguments are copied to the stack and passed to the procedure. The called procedure may modify these, but the modifications do not affect the original value at the calling side.

46 46 Call-by-Reference The memory addresses of the variables corresponding to the arguments are put into the stack and passed to the procedure. Since these are memory addresses, the original values at the calling side are changed if modified by the called procedure.

47 47 Call-by-Copy/Restore The values of the arguments are copied to the stack and passed to the procedure. When the processing of the procedure completes, the values are copied back to the original values at the calling side. If parameter values are changed in the subprogram, the values in the calling program are also affected.

48 48 Parameter Passing in RPC Which parameter passing mechanisms are possible? ◦It is possible to implement all of the three mechanisms if you wish. Usually call-by-value and call-by-copy/restore are used. ◦Call-by-reference is difficult to implement. All data which may be referenced must be copied to the remote host and the reference to the copied data is used. Do we need to convert the values of the arguments into a standard format to transmit over the network?

49 49 Parameter Passing in RPC Reasons to convert the values of the arguments into a standard format to transmit over the network ◦Different machines use different character codes. E.g., IBM main frames use EBCDIC, while PCs use ASCII. ◦Representation of numbers may differ from machine to machine. ◦Big endian and little endian

50 50 Parameter Passing in RPC If a standard format is not used, two message conversions are necessary. ◦If format information is attached to the message, only one conversion at the receiver will suffice. ◦However, the receiver must be able to handle many different formats.
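The standard-format approach can be sketched with Python's `struct` module, where the `!` prefix selects network (big-endian) byte order with no padding. The message layout here (a 32-bit procedure number, a 64-bit float, and a length-prefixed UTF-8 string) and the function names are hypothetical, chosen only to show both machines agreeing on one wire format regardless of their native byte order or character code.

```python
import struct

HEADER = "!IdH"  # uint32 procedure id, float64 amount, uint16 name length


def marshal_call(proc_id: int, amount: float, name: str) -> bytes:
    """Encode arguments in a machine-independent, big-endian wire format."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, proc_id, amount, len(encoded)) + encoded


def unmarshal_call(data: bytes):
    """Decode the wire format back into native values on the receiving side."""
    size = struct.calcsize(HEADER)
    proc_id, amount, name_len = struct.unpack(HEADER, data[:size])
    name = data[size:size + name_len].decode("utf-8")
    return proc_id, amount, name
```

With this scheme each side converts once (native to standard on send, standard to native on receive), which is the two-conversion cost that the receiver-side alternative on the slide above avoids.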

51 51 RPC Messages Generally two types of messages –Call messages –Reply messages A typical RPC Call message format A typical RPC Reply message format (successful and unsuccessful)

52 52 Variations of RPC Asynchronous RPC –In conventional RPC, when a client calls a remote procedure, the client waits until the reply comes back. –If no result is to be returned, this wait is unnecessary overhead. –In asynchronous RPC, the server sends an acceptance message as soon as it receives a request, and the client continues without waiting for the procedure to complete.

53 53 Call-Back RPC One-way RPC In one-way RPC, the client immediately continues after sending the request to the server.

54 54 Some special types of RPC Callback RPC Broadcast RPC Batch-mode RPC Lightweight RPC

55 55 Optimizations for better Performance In Six Different Ways –Concurrent access to multiple servers –Serving multiple requests simultaneously –Reducing per-call workload of servers –Reply caching of idempotent remote procedures –Proper selection of timeout values –Proper design of RPC protocol specification

56 56 Concurrent Access to Multiple Servers One of the following three may be adopted: –Threads Use of Threads in the implementation of a client process where each thread can independently make remote procedure calls to different servers. Addressing in underlying protocol should be rich enough to provide correct routing of responses. –Early reply approach [Wilbur and Bacarisse] A call is split into two separate RPC calls- one passing parameters and other requesting the result Server must hold the result causing congestion or unnecessary overhead. –Call buffering approach [Gimson] Clients and servers do not interact directly but via a call buffer server A variant of this approach was implemented in MIT (Mercury Communication System)
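The thread-based approach can be sketched with Python's `concurrent.futures`. Here `fake_remote_call` is a hypothetical stand-in for a real RPC stub (it just sleeps to imitate network delay), and `call_all` assumes a non-empty list of servers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_remote_call(server: str, arg: int) -> str:
    """Placeholder for an RPC stub; the sleep imitates the round-trip delay."""
    time.sleep(0.05)
    return f"{server}:{arg * 2}"


def call_all(servers: list, arg: int) -> list:
    """Issue one call per server from separate threads and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(fake_remote_call, s, arg) for s in servers]
        # Each future is matched to its own request, so replies cannot be
        # confused even if they complete out of order.
        return [f.result() for f in futures]
```

The calls overlap in time, so the total wait is close to the slowest single call rather than the sum of all of them.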

57 57 Early Reply Approach

58 58 Call Buffering Approach

59 59 Serving Multiple Requests Simultaneously The following types of delay are common: –A server, during the course of a call execution, may wait for a shared resource –A server may call a remote function that involves computation or transmission delays So that this waiting time is not wasted, the server may accept and process other requests while it waits to complete a request. A multi-threaded server is one solution.

60 60 Summary of IPC (1) What is the purpose of IPC? –Information sharing among two or more processes Differences between synchronous and asynchronous communication? –When both the send and receive primitives of a communication between two processes use blocking semantics, the communication is said to be synchronous; otherwise it is asynchronous List the types of failure in IPC –Loss of request message, loss of response message, unsuccessful execution of the request

61 61 Summary of IPC (2) How to implement idempotency? –Adding a sequence number to the request message and introducing a ‘reply cache’ Three main types of group communication? –One to many, many to one, many to many One of the greatest challenges in many-to-many communication? –Ordered delivery Name an all-purpose IPC protocol? –Remote Procedure Call (RPC)

62 62 Summary of IPC (3) Name a few ways to optimise RPC? –Concurrent access to multiple servers, serving multiple requests concurrently, reducing per-call workload of servers Three different techniques for implementing concurrent access to multiple servers? –Threads, early reply, call buffering

63 63 References for IPC
1. Birman, K. P. and Renesse, R. V. Reliable Distributed Computing with the ISIS Toolkit. IEEE Computer Society Press, 1994.
2. Birrell, A. D. and Nelson, B. J. Implementing remote procedure calls. ACM Trans. Comput. Syst. 2(1), 39-59, 1984.
3. Wilbur, S. and Bacarisse, B. Building distributed systems with remote procedure call. Software Engineering Journal, 2(5), 148-159, 1987.
4. Gimson, R. Call buffering service. Technical Report 19, Programming Research Group, Oxford University, Oxford, England, 1985.

64 64 Why Study Synchronisation, MUTEX, Deadlock? Synchronisation, mutual exclusion and deadlocks are very common problems which arise when multiple entities are competing for access to shared resources; All three are usually covered in units or textbooks dealing with operating systems, where the shared resources are CPU/core time and shared main memory, and sometimes I/O devices; In a distributed / parallel system the problem is complicated by the need to allow for transmission delays in passing messages through the IPC between tasks (processes), which makes some solutions used in operating systems problematic; Failure to properly address these problems will result in often catastrophic applications failures or bugs.

65 65 What is Synchronisation? World English Dictionary: synchronize or synchronise (ˈsɪŋkrəˌnaɪz): 1. to occur or recur, or cause to occur or recur, at the same time or in unison 2. to indicate or cause to indicate the same time: synchronize your watches 3. (tr) films: to establish (the picture and soundtrack records) in their correct relative position 4. (tr) to designate (events) as simultaneous. Synchronisation is about making two or more entities achieve a known state or condition at a known or identical time; for instance, a receiver (e.g. a task) must be ready before it can accept a message from a transmitter (e.g. another task).

66 66 Synchronisation in Distributed Systems A Distributed System consists of a collection of distinct processes that are spatially separated and run concurrently; In systems with multiple concurrent processes, it is economical to share the system resources; Sharing may be cooperative or competitive; Both competitive and cooperative sharing require adherence to certain rules of behavior that guarantee that correct interaction occurs – otherwise chaotic behaviour may arise; The rules of enforcing correct interaction are implemented in the form of synchronization mechanisms; Synchronisation mechanisms may be part of an operating system, or communications mechanism like a library;

67 67 Issues in implementing synchronization In single-CPU systems, synchronization problems such as mutual exclusion can be solved using semaphores and monitors. These methods rely on the existence of shared memory, which can be accessed very quickly by all tasks; We cannot use semaphores and monitors in distributed systems, since two processes running on different machines cannot expect to have access to any shared memory; In a distributed system there are always finite time delays for messages to travel from one process to another, so any mechanism we use must account for these delays; Even simple matters such as determining whether one event happened before another require careful thought.

68 68 Issues implementing synchronization in DS/PS In distributed systems, it is usually not possible and often not desirable to collect all the information about the system in one place and synchronization among processes is difficult due to the following features of distributed systems: The relevant information is scattered among multiple machines. Processes make decisions based only on local information. Any single point of failure in the system should be avoided. No common clock or other precise global time source exists. Synchronisation in distributed/parallel systems requires unique algorithms which account for the unique behaviours in such systems, especially time delays.

69 69 Time in Distributed Systems: Why? External reasons: we often want to measure time accurately. For billing: how long was computer X used? For legal reasons: when was credit card W charged? For traceability: when did this attack occur? Who did it? –The system must be in sync with an external time reference, usually the world time reference UTC (Coordinated Universal Time) or a GPS-derived master clock; Internal reasons: many distributed algorithms use time. Kerberos (authentication server) uses time-stamps. Time-stamps can be used to serialise transactions in databases. Time-stamps can be used to minimise updates when replicating data –The system must be in sync internally; there is no need to be synchronised to an external time reference

70 70 Clock Synchronization Time is unambiguous in a centralized system – every process “sees” the same master clock in the machine. A process can simply make a system call to learn the time – the operating system then looks at the clock hardware. If process A asks for the current time, and a little later process B asks for the time, the value of B_time > A_time. In a distributed system, if process A and B are on different machines, B_time may not be greater than A_time. This is because the hardware clocks on these machines may not be precisely synchronised. Even if the clocks are synchronised, there may be a synchronisation error which is large enough to matter. Example: Recent CERN quantum physics experiment failure.

71 71 Imperfect Clocks Human-made clocks are imperfect –They run slower or faster than “real” physical time –How much faster or slower is termed “clock drift” –A drift of 1% (i.e. $1/100 = 10^{-2}$) means the clock gains or loses a second every 100 seconds Suppose that, when the real time is $t$, the time value of a clock $p$ is $C_p(t)$. If the maximum drift rate allowable is $\rho$, a clock is said to be non-faulty if the following condition holds: $1 - \rho \le \frac{dC_p(t)}{dt} \le 1 + \rho$

72 72 Clock Synchronisation [Figure: Alice and Bob compare watches. “Let’s synchronise. I’ve got 12:07. And you?” “I’ve got 12:11.” “Let’s agree on 12:09. OK?” “OK. Done.” “Done.”]

73 73 Cristian’s Algorithm This algorithm synchronizes the clocks of all other machines to the clock of one machine, the time server. If the clock of the time server is adjusted to the real time, all the other machines are synchronized to the real time. Every machine requests the current time from the time server. The time server responds to the request as soon as possible. The requesting machine sets its clock to $C_s + (T_1 - T_0 - I)/2$, where $C_s$ is the time reported by the server, $T_0$ and $T_1$ are the local times at which the request was sent and the reply was received, and $I$ is the server's interrupt-handling time. To avoid clocks moving backward, the adjustment must be introduced gradually.
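The clock-setting formula can be written as a one-line function. The name `cristian_estimate` and its parameter names are hypothetical; the sketch assumes the request and the reply take equally long to travel, which is the assumption behind halving the round-trip time.

```python
def cristian_estimate(server_time, t0, t1, interrupt_time=0.0):
    """Estimate the current time at the client when the reply arrives.

    server_time    -- C_s, the time value carried in the server's reply
    t0, t1         -- local clock readings when the request was sent and
                      when the reply was received
    interrupt_time -- I, time the server spent handling the request
    """
    one_way_delay = (t1 - t0 - interrupt_time) / 2
    return server_time + one_way_delay
```

For example, with times in milliseconds, a reply carrying 100000 that was requested at local time 10 and received at local time 30, with 4 ms of server handling, gives an estimate of 100008.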

74 74 The Berkeley Algorithm Developed by Gusella and Zatti. Unlike in Cristian’s algorithm, the server process in the Berkeley algorithm, called the master, periodically polls the other (slave) processes. Generally speaking, the algorithm is as follows: –A master is chosen with a ring-based election algorithm (the Chang and Roberts algorithm). –The master polls the slaves, which reply with their time in a similar way to Cristian's algorithm. –The master observes the round-trip time (RTT) of the messages and estimates the time of each slave and its own. –The master then averages the clock times, ignoring any values that lie far outside the values of the others. –Instead of sending the updated current time back to the other processes, the master sends out the amount (positive or negative) by which each slave must adjust its clock. This avoids further uncertainty due to RTT at the slave processes. –Everybody adjusts their time.
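The averaging and adjustment steps can be sketched as below. The slide does not say how "far outside" is decided, so this sketch makes an assumption: a reading is ignored if it differs from the median by more than a `tolerance`. The function name `berkeley_adjustments` is hypothetical, and the readings are taken to be the master's RTT-corrected estimates, with the master's own clock included.

```python
from statistics import median


def berkeley_adjustments(clock_readings: dict, tolerance: float) -> dict:
    """Return, for every machine, the amount by which to adjust its clock."""
    values = list(clock_readings.values())
    mid = median(values)
    # Assumed outlier rule: drop readings too far from the median.
    trusted = [t for t in values if abs(t - mid) <= tolerance]
    target = sum(trusted) / len(trusted) if trusted else mid
    # Send offsets rather than absolute times, so that the delay of the
    # reply message does not add further error.
    return {name: target - t for name, t in clock_readings.items()}
```

Note that a machine whose reading was ignored in the average still receives an adjustment, so it is pulled back toward the others.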

75 75 Averaging Algorithm Both Cristian’s algorithm and the Berkeley algorithm are centralized algorithms, with disadvantages such as a single point of failure and a high volume of traffic concentrated at the master clock server. The “Averaging algorithm” is a decentralized algorithm. It divides time into resynchronization intervals of fixed length R. At the beginning of each interval, every machine broadcasts the current time according to its own clock. Each machine then collects the broadcasts from all other machines for a certain interval and sets its local clock to the average of the received values.

76 76 Logical Clocks versus Physical Clocks Lamport showed that: –Clock synchronization need not be absolute –If two processes do not interact, their clocks need not be synchronized. –What matters is that they agree on the order in which events occur. For many purposes, it is sufficient that all interacting machines agree on the same time, i.e. that they share a “frame of reference”. It is not essential that this agreed time is the same as “real time”. Clocks which agree across a group of computers, but not necessarily with “real time”, are termed “logical clocks”. Clocks that must additionally stay within a certain limit (i.e. error) of “real time” are termed “physical clocks”.

77 77 Lamport’s Synchronization Algorithm This algorithm only determines event order, but does not synchronize clocks. “Happens-before” relation: –“A →B” is read “A happens before B”: This means that all processes agree that event A occurs before event B. The happens-before relation can be observed directly in two situations: –If A and B are events in the same process, and A occurs before B, then A→B. –If A is the event of a message being sent by one process, and B is the event of the message being received by another process, then A→B. “Happens-before” is a “transitive” relation – if element a is related to an element b, and b is in turn related to an element c, then a is also related to c

78 78 Lamport’s Synchronization Algorithm If two events, X and Y happen in different processes that do not exchange messages (not even indirectly via third parties), then neither X →Y nor Y →X is true. These events are then termed “concurrent” What we need is a way to assign a time value C(A) on which all processes agree for every event A. The time value must have the following properties: ◦If A →B, then C(A) < C(B). ◦Clock time must always go forward, never backward. Suppose there are three processes which run on different machines as in the following figure. Each processor has its own local clock. The rates of the local clocks are different.
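The two required properties can be met with a simple counter per process, sketched below. The class name `LamportClock` and its method names are hypothetical. The rule on receipt is to jump past the timestamp carried by the message, so that the receive event is always stamped later than the send event, and the counter never moves backward.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, message_time):
        """Correct the clock on receipt so that C(send) < C(receive)."""
        self.time = max(self.time, message_time) + 1
        return self.time
```

A fast receiver keeps its own, larger value plus one; a slow receiver is pushed forward to just past the sender's timestamp.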

79 79 MUTEX / Mutual Exclusion in Distributed Systems When multiple processes access shared resources, using the concept of critical sections is a relatively easy way to manage access to shared resources: A critical section is a section in a program that accesses shared resources. A process enters a critical section before accessing the shared resource to ensure that no other process will use the shared resource at the same time. Critical sections are protected using semaphores and monitors in single-processor systems. We cannot use either of these mechanisms in distributed systems due to the time delays in propagating messages between machines.

80 80 A Centralized Algorithm – Coordinator Process This algorithm simulates mutual exclusion in single processor systems. One process is elected as the “coordinator”. When a process wants to enter a critical section of the code, it sends a request to the coordinator stating which critical section it wants to enter. If no other process is currently in that critical section, the coordinator returns a reply granting permission. If a different process is already in the critical section, the coordinator queues the request. When the process exits the critical section, the process sends a message to the coordinator releasing its exclusive access. The coordinator takes the first item off the queue of deferred request and sends that process a grant message.
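The coordinator's bookkeeping can be sketched in a few lines. For brevity this sketch assumes a single critical section and replaces the request, grant, and release messages with method calls; the class name `Coordinator` and the return values are hypothetical.

```python
from collections import deque


class Coordinator:
    def __init__(self):
        self.holder = None    # process currently in the critical section
        self.queue = deque()  # deferred requests, first come first served

    def request(self, pid):
        """Handle a request message: grant immediately or queue it."""
        if self.holder is None:
            self.holder = pid
            return "grant"
        self.queue.append(pid)
        return "queued"

    def release(self, pid):
        """Handle a release message; return the next process granted, if any."""
        if pid != self.holder:
            raise ValueError("release from a process that does not hold the lock")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```

The FIFO queue is what gives the first-come first-served behaviour, so no waiting process is overtaken.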

81 81 A Centralized Algorithm

82 82 A Centralized Algorithm Advantages Since the service policy is first-come first-serve, it is fair and no process waits forever. It is simple and thus easy to implement. It requires only three messages, request, grant, and release, per use of a critical section. Disadvantages If the coordinator crashes, the entire system may go down. Processes cannot always distinguish a dead coordinator process from a “permission denied” message; A single coordinator may become a performance bottleneck if requests arrive at a high frequency, or the propagation delay of the messages is large.

83 83 A Distributed Algorithm The distributed algorithm proposed by Ricart and Agrawala requires ordering of all events in the system. We can use the Lamport’s algorithm for the ordering. When a process wants to enter a critical section, the process sends a request message to all other processes. The request message includes –Name of the critical section –Process number –Current time The other processes receive the request message. –If the process is not in the requested critical section and also has not sent a request message for the same critical section, it returns an OK message to the requesting process. –If the process is in the critical section, it does not return any response and puts the request to the end of a queue. –If the process has sent out a request message for the same critical section, it compares the time stamps of the sent request message and the received message.

84 84 A Distributed Algorithm If the time stamp of the received message is smaller than the one of the sent message, the process returns an OK message. If the time stamp of the received message is larger than the one of the sent message, the request message is put into the queue. The requesting process waits until all processes return OK messages. When the requesting process receives all OK messages, the process enters the critical section. When a process exits from a critical section, it returns OK messages to all requests in the queue corresponding to the critical section and removes the requests from the queue. Processes enter a critical section in time stamp order using this algorithm. This is a “consensus” mechanism as all processes must agree it is “OK to enter critical section”.
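The decision each process makes when a request arrives can be sketched as follows. This models one critical section and only the local decision logic, with no networking. The class `Process` and its method names are hypothetical, and the sketch adds one assumption the slides do not state: equal timestamps are broken by process number, by comparing `(timestamp, pid)` pairs.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.my_request = None  # (timestamp, pid) of our outstanding request
        self.deferred = []      # senders whose OK reply we are withholding

    def want(self, timestamp):
        """Record our own request before sending it to all other processes."""
        self.state = WANTED
        self.my_request = (timestamp, self.pid)

    def on_request(self, timestamp, sender):
        """Return True to reply OK now, or False if the request is queued."""
        incoming = (timestamp, sender)
        if self.state == HELD or (
            self.state == WANTED and self.my_request < incoming
        ):
            self.deferred.append(sender)
            return False
        return True

    def enter(self):
        """Called once OK messages have arrived from all other processes."""
        self.state = HELD

    def exit(self):
        """Leave the critical section; return the processes to send OK to."""
        self.state = RELEASED
        self.my_request = None
        ready, self.deferred = self.deferred, []
        return ready
```

When two processes request at the same time, the one with the smaller timestamp withholds its OK and the other grants, so exactly one of them collects all replies first.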

85 85 A Distributed Algorithm

86 86 Mutual Exclusion Algorithms: A Comparison The number of messages exchanged (i.e. messages per entry/exit of a single critical section): A.Centralized: 3 B.Distributed: 2*(n − 1) C.Ring: 2 Reliability problems which can disable the mutual exclusion mechanism: A.Centralized: coordinator crashes B.Distributed: any process crashes C.Ring: lost token, process crashes

87 87 Deadlocks in Distributed Systems A deadlock is a condition where a process cannot proceed because it needs to obtain a resource held by another process while it is itself holding a resource that the other process needs. We can consider two types of deadlock: –A communication deadlock occurs when process A is trying to send a message to process B, which is trying to send a message to process C, which is trying to send a message to A. –A resource deadlock occurs when processes are trying to get exclusive access to devices, files, locks, servers, or other resources. We will not differentiate between these types, since we can consider communication channels to be resources without loss of generality.

88 88 What is a Deadlock? (Roy 2008) Permanent blocking of a set of processes that either compete for system resources or communicate with each other No efficient solution Involve conflicting needs for resources by two or more processes

89 89 Deadlock Example (Roy 2008)

90 90 Necessary Conditions for a Deadlock Four conditions have to be met for deadlock to be present: –Mutual exclusion. A resource can be held by at most one process –Hold and wait. Processes that already hold resources can wait for another resource. –Non-preemption. A resource, once granted, cannot be taken away from a process. –Circular wait. Two or more processes are waiting for resources held by one of the other processes.

91 91 Handling Deadlocks in DS/PS Strategies: A. The “Ostrich algorithm” (ignore the problem); B. Deadlock detection and recovery; C. Deadlock prevention, by designing the system in such a way that deadlocks simply cannot occur; D. Deadlock avoidance, by careful resource allocation.
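Deadlock detection is often described in terms of a wait-for graph, where a cycle corresponds to the circular-wait condition. The sketch below assumes the whole graph has already been gathered in one place (in a distributed system, collecting it is itself the hard part) and uses a depth-first search; the function name `find_deadlock` is hypothetical.

```python
def find_deadlock(wait_for: dict):
    """Return a list of processes forming a wait cycle, or None.

    wait_for maps each process to the set of processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # Reached a process already on the current path: a cycle.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(wait_for):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

Recovery would then pick one process from the returned cycle to abort or roll back; which one to choose is a policy decision outside this sketch.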

92 92 Summary of Synchronisation, mutual exclusion and deadlock Three Real Time Clock Synchronisation Methods –Cristian’s Method –Berkeley Algorithm –Averaging Algorithm Logical (Clock) Synchronisation Techniques –Lamport Mutual Exclusion Approaches –Centralised –Distributed Main Deadlock Modeling Areas –Necessary condition for occurrence –Deadlock Detection –Deadlock Handling

