Distributed Systems Major Design Issues


1 Distributed Systems Major Design Issues
Debraj De. Presentation 09/14/2011. CS8320 – Advanced Operating Systems, Fall 2011 – Section 2.6 Presentation

2 Presentation Outline
- Introduction
- Distributed System Design Issues
  - Object Models and Naming Schemes
  - Distributed Coordination
  - Interprocess Communication
  - Distributed Resources
  - Fault Tolerance and Security
- References

3 Introduction
Management of a distributed system mainly consists of:
- Coordination of concurrent distributed processes
- Management and networking of distributed resources
- Functioning of distributed algorithms
But the network may be unreliable and components may be untrusted. These raise design and implementation issues, in particular how to support transparency.

4 Introduction
The following need to be considered when resolving design and implementation issues:
- How objects in the system are modeled and identified
- How to coordinate the interaction among objects and how they communicate with each other
- How shared/replicated objects can be managed in a controlled fashion
- Protection of objects and system security

5 Design & Implementation Issues
- Object Models and Naming Schemes
- Distributed Coordination
- Interprocess Communication
- Distributed Resources
- Fault Tolerance and Security

6 [1] Object Models and Naming Schemes
- Objects in a computer system: processes, data files, memory, devices, processors, and networks.
- Objects are encapsulated in servers: process servers, file servers, memory servers, etc.
- A client is a null server that accesses object servers.

7 [1] Object Models and Naming Schemes (cont'd)
Three possible ways to identify a server:
- Identification by name (name server)
- Identification by either physical or logical address (network server)
- Identification by the service that the server provides
The following all depend on the naming scheme for system objects: structure of the system, management of the name space, name resolution, and access methods.
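
Identification by name and by service can be illustrated with a tiny in-memory registry. This is only a sketch: the `NameServer` class, its method names, and the sample addresses are hypothetical and do not come from the slides.

```python
class NameServer:
    """Hypothetical in-memory registry mapping names and services to addresses."""

    def __init__(self):
        self._by_name = {}      # server name -> address
        self._by_service = {}   # service type -> list of server names

    def register(self, name, address, services=()):
        # Record the address under the server's name and index it by service.
        self._by_name[name] = address
        for service in services:
            self._by_service.setdefault(service, []).append(name)

    def resolve(self, name):
        # Identification by name: look the address up directly.
        return self._by_name[name]

    def find_service(self, service):
        # Identification by service: return addresses of all servers offering it.
        return [self._by_name[n] for n in self._by_service.get(service, [])]


ns = NameServer()
ns.register("fs1", ("10.0.0.5", 2049), services=["file"])
ns.register("fs2", ("10.0.0.6", 2049), services=["file"])
ns.register("ps1", ("10.0.0.7", 7000), services=["process"])
```

A client that already holds an address (the second identification method) would skip the registry entirely, which is why name resolution and access methods depend on the naming scheme chosen.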

8 [2] Distributed Coordination
Processes require coordination to achieve synchronization.
Types of synchronization requirement:
- Barrier synchronization
- Condition coordination
- Mutual exclusion

9 [2] Distributed Coordination
Types of synchronization:
- Barrier synchronization: processes must reach a common synchronization point before they can continue.
- Condition coordination: a process must wait for a condition that will be set asynchronously by other interacting processes, to maintain some ordering of execution.
- Mutual exclusion: concurrent processes must have mutual exclusion when accessing a critical shared resource.
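
The three synchronization types can be seen side by side in a single-machine analogue built on Python threads. This is a sketch under that assumption: a real distributed system has no shared memory and would implement the same ideas with messages. The function name `run_demo` and its parameters are illustrative.

```python
import threading


def run_demo(n_workers=3, increments=1000):
    barrier = threading.Barrier(n_workers)   # barrier synchronization
    cond = threading.Condition()             # condition coordination
    lock = threading.Lock()                  # mutual exclusion
    state = {"data_ready": False, "counter": 0}
    seen_after_barrier = []

    def worker():
        # Condition coordination: block until another thread sets the condition.
        with cond:
            cond.wait_for(lambda: state["data_ready"])
        # Mutual exclusion: one thread at a time updates the shared counter.
        for _ in range(increments):
            with lock:
                state["counter"] += 1
        # Barrier synchronization: nobody continues until every worker arrives.
        barrier.wait()
        with lock:
            seen_after_barrier.append(state["counter"])

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    # The main thread sets the condition asynchronously and wakes the workers.
    with cond:
        state["data_ready"] = True
        cond.notify_all()
    for t in threads:
        t.join()
    return state["counter"], seen_after_barrier
```

Because every worker finishes its increments before passing the barrier, each value recorded after the barrier equals the final counter.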

10 [2] Distributed Coordination
State information synchronization:
- No shared memory
- Time messages are inaccurate or incomplete
- Centralized coordination, shift in coordinator

11 [2] Distributed Coordination
The process deadlock problem is related to synchronization; a deadlock detection and recovery tool is needed.
Four conditions must hold for deadlock to occur:
- Exclusive use
- Hold and wait
- No preemption
- Cyclical wait

12 [2] Distributed Coordination
The problem of deadlocks can be handled in the following ways:
- Prevention: ensure that deadlock is not possible.
- Avoidance: requires decisions by the system while it is running to ensure that deadlocks will not occur.
- Detection: when a deadlock is detected, decide which process to roll back or abnormally terminate.
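
For the detection option, one common approach (an assumption here, since the slides do not name an algorithm) is to search a wait-for graph for a cycle and then pick a victim. The sketch below assumes the whole graph is available in one place; in a distributed setting it would first have to be assembled from per-site information. `find_deadlock`, `choose_victim`, and the cost table are hypothetical names.

```python
def find_deadlock(wait_for):
    """Return the processes on a cycle of the wait-for graph, or None.

    wait_for maps a process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(wait_for) | {q for qs in wait_for.values() for q in qs}
    colour = {p: WHITE for p in nodes}
    stack = []

    def visit(p):
        colour[p] = GREY              # p is on the current DFS path
        stack.append(p)
        for q in wait_for.get(p, ()):
            if colour[q] == GREY:     # back edge: q..p forms a cycle
                return stack[stack.index(q):]
            if colour[q] == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        stack.pop()
        colour[p] = BLACK             # fully explored, no cycle through p
        return None

    for p in sorted(nodes):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


def choose_victim(cycle, rollback_cost):
    # Roll back the process that is cheapest to restart (illustrative policy).
    return min(cycle, key=lambda p: rollback_cost[p])
```

For example, with P1 waiting on P2, P2 on P3, and P3 on P1, the search reports all three processes, and the victim policy then selects whichever has the lowest assumed rollback cost.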

13 [2] Distributed Coordination
If any one of the four conditions is prevented, deadlock cannot occur. For example, impose an order on the resources and require processes to request resources in increasing order. This prevents cyclical wait and thus makes deadlocks impossible.
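
The resource-ordering rule can be sketched with two locks that two threads name in opposite order. The `OrderedResource` class, the `rank` field, and the helper names are assumptions made for this illustration.

```python
import threading


class OrderedResource:
    """A lock with a fixed rank; resources are always acquired by rising rank."""

    def __init__(self, rank, name):
        self.rank = rank
        self.name = name
        self.lock = threading.Lock()


def acquire_in_order(*resources):
    # Sort by rank so every caller requests resources in increasing order.
    ordered = sorted(resources, key=lambda r: r.rank)
    for r in ordered:
        r.lock.acquire()
    return ordered


def release_all(ordered):
    for r in reversed(ordered):
        r.lock.release()


def transfer_demo(rounds=200):
    a = OrderedResource(1, "A")
    b = OrderedResource(2, "B")
    done = []

    def task(first, second, label):
        for _ in range(rounds):
            held = acquire_in_order(first, second)
            try:
                pass  # critical section would go here
            finally:
                release_all(held)
        done.append(label)

    # The two threads name the resources in opposite order, which could
    # deadlock with naive locking but cannot once the order is imposed.
    t1 = threading.Thread(target=task, args=(a, b, "t1"))
    t2 = threading.Thread(target=task, args=(b, a, "t2"))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(done)
```

Since both threads end up locking A before B, no thread can hold B while waiting for A, so a cyclical wait cannot form.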

14 [2] Distributed Coordination
Real-world example of a related synchronization failure (priority inversion rather than a true deadlock): the Mars Rover problem.
M. Jones. What really happened on Mars Rover Pathfinder. The Risks Digest, 19(49), December 1997.

15 [2] Distributed Coordination
Mars Rover frequent-reset issue:
- Data-gathering thread (low priority): lock.acquire(); write data; lock.release();
- Info-bus thread (high priority): retrieve data;
- Communication thread (medium priority, long-running)
- Information bus (shared memory)
The info-bus thread waits for the data-gathering thread to release the lock; meanwhile the communication thread preempts the data-gathering thread, so the high-priority info-bus thread stays blocked.
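
The scenario can be reproduced with a small tick-based simulation of a preemptive priority scheduler. Everything here is an illustrative assumption: the tick counts, arrival times, and the `simulate` function are invented for the sketch, and the `inherit` flag models priority inheritance, the remedy generally reported for this incident.

```python
from dataclasses import dataclass


@dataclass(eq=False)          # compare tasks by identity
class Task:
    name: str
    priority: int             # larger number = higher priority
    arrival: int              # tick at which the task becomes ready
    work: int                 # ticks of CPU time needed
    needs_lock: bool          # all of its work runs inside the critical section
    remaining: int = 0
    finished_at: int = -1


def simulate(inherit, max_ticks=50):
    tasks = [
        Task("data-gathering", 1, 0, 3, True),
        Task("info-bus", 3, 1, 2, True),
        Task("communication", 2, 2, 5, False),
    ]
    for t in tasks:
        t.remaining = t.work
    holder = None             # task currently holding the lock
    for tick in range(max_ticks):
        ready = [t for t in tasks if t.arrival <= tick and t.remaining > 0]
        if not ready:
            if all(t.remaining == 0 for t in tasks):
                break
            continue
        # A task is blocked if it needs the lock and someone else holds it.
        blocked = [t for t in ready
                   if t.needs_lock and holder is not None and holder is not t]
        runnable = [t for t in ready if t not in blocked]

        def effective(t):
            # With inheritance, the holder runs at its waiters' top priority.
            if inherit and t is holder and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        current = max(runnable, key=effective)
        if current.needs_lock and holder is None:
            holder = current
        current.remaining -= 1
        if current.remaining == 0:
            current.finished_at = tick + 1
            if holder is current:
                holder = None
    return {t.name: t.finished_at for t in tasks}
```

Without inheritance the medium-priority communication task runs to completion while the high-priority info-bus task is still waiting; with inheritance the lock holder is boosted, releases the lock promptly, and the info-bus task finishes first.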

16 [3] Interprocess Communication
- Interprocess communication can be accomplished using simple message-passing primitives.
- Higher-level logical communication methods provide transparency by hiding the physical details of message passing.
- Two important concepts: the client/server model and Remote Procedure Call (RPC).

17 [3] Interprocess Communication
The client/server model is a programming paradigm for structuring processes in distributed systems.
[Figure: client and server exchange request and reply (logical communication); the actual communication passes through each side's kernel and the network.]

18 [3] Interprocess Communication
The Remote Procedure Call (RPC) model is similar to the local procedure call model:
- The caller places the arguments to a procedure in a specific location (such as a result register).
- The caller temporarily transfers control to the procedure.
- When the caller regains control, it obtains the results of the procedure from the specified location.
- The caller then continues program execution.

19 [3] Interprocess Communication
On the server side, a process is dormant (inactive, sleeping), awaiting the arrival of a call message. When one arrives, the server process computes a reply that it then sends back to the requesting client. After this, the server process becomes dormant again.
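
The request/reply cycle described above can be sketched in-process, with queues standing in for the network and JSON standing in for the message format. This is a minimal illustration under those assumptions, not a real RPC library; `Server`, `ClientStub`, and the `add` procedure are hypothetical.

```python
import json
import queue
import threading


class Server:
    def __init__(self, procedures):
        self.procedures = procedures          # procedure name -> callable
        self.inbox = queue.Queue()            # stands in for the network
        self.thread = threading.Thread(target=self._serve, daemon=True)
        self.thread.start()

    def _serve(self):
        while True:
            # Dormant here until a call message arrives.
            raw, reply_box = self.inbox.get()
            if raw is None:                   # shutdown signal
                break
            call = json.loads(raw)
            try:
                result = self.procedures[call["proc"]](*call["args"])
                reply = {"ok": True, "result": result}
            except Exception as exc:
                reply = {"ok": False, "error": str(exc)}
            reply_box.put(json.dumps(reply))  # send the reply, then sleep again

    def stop(self):
        self.inbox.put((None, None))
        self.thread.join()


class ClientStub:
    def __init__(self, server):
        self.server = server

    def call(self, proc, *args):
        # Marshal the arguments, send the request, and block for the reply,
        # so the caller sees something that looks like a local call.
        reply_box = queue.Queue(maxsize=1)
        message = json.dumps({"proc": proc, "args": list(args)})
        self.server.inbox.put((message, reply_box))
        reply = json.loads(reply_box.get(timeout=5))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]
```

The stub hides the message passing from the caller, which is the transparency the higher-level communication methods aim for.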

20 [3] Interprocess Communication

21 [4] Distributed Resources
Data and processing capacity.
Load distribution:
- Multiprocessor scheduling (static)
- Load distribution/sharing (dynamic)

22 [4] Distributed Resources
- Distributed shared memory
- Distributed file systems
Issues: sharing and replication of data, which require maintaining data consistency and coherency.
Distributed file systems and distributed shared memory differ in implementation.

23 [5] Fault Tolerance and Security
- Distributed systems are open in their operating environment, so they are vulnerable to failures and security threats.
- Faults: failure and security violation.

24 [5] Fault Tolerance and Security
The problem of failures can be alleviated through:
- Redundancy
- Transparent handling of failures (such as removal of machines, network links, and other resources) without loss of data or functionality
- Roll-back recovery for execution states

25 [5] Fault Tolerance and Security
OS view: trustworthy communication process, confidentiality and integrity of messages and data.
Security:
- Authentication: clients, and also servers and messages, must be authenticated.
- Authorization: access control has to be performed across a physical network with heterogeneous components under different administrative units using different security models.
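
Message authentication and integrity can be illustrated with a keyed hash. HMAC-SHA256 is an assumption chosen for this sketch, since the slides do not name a mechanism; the shared key and the helper names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    # Compute an authentication tag that only holders of the key can produce.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(sign(key, message), tag)


shared_key = b"demo-key-known-to-client-and-server"   # illustrative only
request = b"read /data/report.txt"
tag = sign(shared_key, request)
```

A receiver that recomputes the tag detects both a tampered message and a sender who lacks the key. Confidentiality would need encryption on top, which this sketch does not cover.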

26 [5] Fault Tolerance and Security
Ariane 5 failure:
- A software bug caused the European Space Agency's Ariane 5 rocket to crash 40 seconds into its first flight in 1996 (cost: half a billion dollars).
- The bug came from a software component that was being reused from Ariane 4.
- A software exception occurred during a data conversion from a 64-bit floating-point value to a 16-bit signed integer.
- The value was larger than 32,767, the largest integer storable in a 16-bit signed integer, so the conversion failed and the program raised an exception.
- In the earlier version of the Ariane rocket, engineers had chosen to leave this function running for the first 40 seconds of flight to make it easy to restart the system in the event of a brief hold in the countdown.
* [Source:
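
The failing conversion can be illustrated in a few lines. Python stands in for the original flight software here, and the saturating variant is only one possible protective choice shown for contrast, not a description of what was actually adopted; both function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(x: float) -> int:
    # Mirrors the failing behaviour: an out-of-range value raises an exception.
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(x: float) -> int:
    # Alternative for illustration: clamp to the representable range instead.
    return max(INT16_MIN, min(INT16_MAX, int(x)))
```

A value that was always in range on the older vehicle passes the checked conversion silently, which is how reused code can hide such an assumption until the operating conditions change.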

27 Summary
Unique design and implementation issues include:
- Object models and naming schemes
- Distributed resources
- Interprocess communication
- Fault tolerance and security

28 References
[1] Randy Chow & Theodore Johnson, 1997, "Distributed Operating Systems & Algorithms" (Addison-Wesley), pp. 45-50, 61-63.
[2] Suresh Sridharan, 2006, "Distributed Operating Systems" (University of Wisconsin, Madison).
[3] JoAnne L. Holliday and Amr El Abbadi, "Distributed Deadlock Detection",

29 References [4] List of distributed computing projects:

30 Questions Thank You

