Distributed Systems Major Design Issues

Presentation transcript:

Distributed Systems: Major Design Issues. Debraj De. Presentation, 09/14/2011. CS8320 – Advanced Operating Systems, Fall 2011, Section 2.6 Presentation.

Presentation Outline: Introduction; Distributed System Design Issues (Object Models and Naming Schemes, Distributed Coordination, Interprocess Communication, Distributed Resources, Fault Tolerance and Security); References.

Introduction: Management of a distributed system mainly consists of coordinating concurrent distributed processes, managing and networking distributed resources, and running distributed algorithms. However, the network may be unreliable and components may be untrusted. These conditions raise design and implementation issues, in particular how to support transparency.

Introduction (cont'd): Resolving the design and implementation issues requires considering how objects in the system are modeled and identified; how the interaction among objects is coordinated and how they communicate with each other; how shared or replicated objects can be managed in a controlled fashion; and how objects are protected and the system is kept secure.

Design and Implementation Issues: [1] Object Models and Naming Schemes; [2] Distributed Coordination; [3] Interprocess Communication; [4] Distributed Resources; [5] Fault Tolerance and Security.

[1] Object Models and Naming Schemes: Objects in a computer system include processes, data files, memory, devices, processors, and networks. Objects are encapsulated in servers: process servers, file servers, memory servers, and so on. A client is a null server that accesses object servers.

[1] Object Models and Naming Schemes (cont'd): A server can be identified in three ways: by name (name server), by physical or logical address (network server), or by the service that it provides. The structure of the system, the management of the name space, name resolution, and access methods all depend on the naming scheme for system objects.
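Identification by name can be pictured as a lookup table kept by a name server. The following is a minimal sketch under stated assumptions, not a scheme taken from the slides: the class `NameServer`, its methods `register` and `resolve`, and the sample names and addresses are all hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]  # (host, port); an assumed address format


class NameServer:
    """Toy name server: maps a service name to a logical address."""

    def __init__(self) -> None:
        self._table: Dict[str, Address] = {}

    def register(self, name: str, address: Address) -> None:
        # A server announces the name under which clients can find it.
        self._table[name] = address

    def resolve(self, name: str) -> Address:
        # Name resolution: translate a name into an address.
        try:
            return self._table[name]
        except KeyError:
            raise LookupError(f"no server registered under {name!r}") from None


ns = NameServer()
ns.register("file-server", ("10.0.0.5", 2049))   # hypothetical entries
ns.register("print-server", ("10.0.0.9", 631))
```

A client would call `ns.resolve("file-server")` and then contact the returned address; a real name service would also have to handle replication, caching, and stale entries.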

[2] Distributed Coordination: Processes require coordination to achieve synchronization. There are three types of synchronization requirement: barrier synchronization, condition coordination, and mutual exclusion.

[2] Distributed Coordination: Types of Synchronization. Barrier synchronization: processes must reach a common synchronization point before they can continue. Condition coordination: a process must wait for a condition that will be set asynchronously by other interacting processes, to maintain some ordering of execution. Mutual exclusion: concurrent processes must have mutually exclusive access to a critical shared resource.
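The three synchronization types can be illustrated on a single machine with Python threads. This is only an analogy under that assumption: threads share memory, whereas distributed processes would have to achieve the same effects with messages. All names (`worker`, `producer`, `counter`) are illustrative.

```python
import threading

N = 4
barrier = threading.Barrier(N)      # barrier synchronization
ready = threading.Condition()       # condition coordination
data_ready = False
counter_lock = threading.Lock()     # mutual exclusion
counter = 0


def worker() -> None:
    global counter
    # Condition coordination: wait until another thread sets the flag.
    with ready:
        ready.wait_for(lambda: data_ready)
    # Barrier: nobody proceeds until all N workers have arrived here.
    barrier.wait()
    # Mutual exclusion: one worker at a time updates the shared counter.
    for _ in range(1000):
        with counter_lock:
            counter += 1


def producer() -> None:
    global data_ready
    with ready:
        data_ready = True
        ready.notify_all()


threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
producer()
for t in threads:
    t.join()
```

After all threads finish, `counter` equals `N * 1000`, because the lock serializes every increment.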

[2] Distributed Coordination: State information synchronization. There is no shared memory, so state is exchanged in messages that take time to arrive and may be inaccurate or incomplete. With centralized coordination, the coordinator role may shift between processes.

[2] Distributed Coordination: Process deadlock is a problem related to synchronization, so a deadlock detection and recovery mechanism is needed. Four conditions must all hold for deadlock to occur: exclusive use (mutual exclusion), hold and wait, no preemption, and cyclical (circular) wait.

[2] Distributed Coordination: Deadlocks can be handled in the following ways. Prevention: ensure that deadlock is not possible. Avoidance: the system makes decisions while it is running to ensure that deadlocks will not occur. Detection: when a deadlock is detected, decide which process to roll back or abnormally terminate.
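Detection is commonly framed as finding a cycle in a wait-for graph. The sketch below assumes the whole graph is already available in one place as a dictionary; a distributed detector would first have to assemble or probe that graph with messages, which is not shown. The function name `find_deadlock` and the process names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph, or None if there is none.

    wait_for[p] is the set of processes that p is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    colour: Dict[str, int] = {p: WHITE for p in wait_for}
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        colour[p] = GREY
        path.append(p)
        for q in sorted(wait_for.get(p, ())):
            state = colour.get(q, WHITE)
            if state == GREY:
                # q is already on the current path: a cycle was found.
                return path[path.index(q):]
            if state == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        colour[p] = BLACK
        return None

    for p in sorted(wait_for):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

For example, `find_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}})` reports the cycle through P1, P2, and P3; recovery would then pick one of those processes to roll back or terminate.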

[2] Distributed Coordination: Preventing any one of the four conditions prevents deadlock. For example, impose an order on the resources and require processes to request resources in increasing order. This prevents cyclical wait and thus makes deadlock impossible.
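Resource ordering can be sketched with ranked locks. This is a minimal single-machine illustration, assuming each resource is given a fixed integer rank; `RankedLock` and `acquire_all` are hypothetical helpers, not a standard API.

```python
import threading
from contextlib import ExitStack, contextmanager


class RankedLock:
    """A lock with a fixed rank that defines the global request order."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.lock = threading.Lock()


@contextmanager
def acquire_all(*locks: RankedLock):
    # Always take locks in increasing rank, whatever order the caller used.
    with ExitStack() as stack:
        for ranked in sorted(locks, key=lambda l: l.rank):
            stack.enter_context(ranked.lock)
        yield


a, b = RankedLock(1), RankedLock(2)
transfers = 0


def worker(first: RankedLock, second: RankedLock) -> None:
    global transfers
    for _ in range(1000):
        with acquire_all(first, second):
            transfers += 1


# The two threads name the locks in opposite orders, which could deadlock
# if each took them as named; sorting by rank removes the cyclical wait.
t1 = threading.Thread(target=worker, args=(a, b))
t2 = threading.Thread(target=worker, args=(b, a))
t1.start()
t2.start()
t1.join()
t2.join()
```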

[2] Distributed Coordination: Real-world example of a related synchronization problem (priority inversion rather than a true deadlock): the Mars Rover problem. M. Jones, "What really happened on Mars Rover Pathfinder", The Risks Digest, 19(49), December 1997.

[2] Distributed Coordination: Mars Rover frequent-reset issue. Three threads share the information bus (shared memory). Data-gathering thread (low priority): lock.acquire(); write data; lock.release(). Info-bus thread (high priority): retrieve data. Communication thread (medium priority, long-running). The info-bus thread waits for the data-gathering thread to release the lock; meanwhile the communication thread preempts the data-gathering thread, so the high-priority thread stays blocked.
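The interaction of the three threads can be reproduced with a toy tick-based scheduler. This is a sketch under strong assumptions: one CPU, strict priority scheduling, a lock held for a task's entire run, and made-up arrival times and work amounts. The `inherit` flag models priority inheritance, a standard remedy for priority inversion; none of the numbers come from the actual mission.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Task:
    name: str
    priority: int     # larger number = higher priority
    arrival: int      # tick at which the task becomes ready
    work: int         # ticks of CPU time needed
    needs_lock: bool  # holds the shared lock for its whole run (assumption)


def simulate(inherit: bool) -> Dict[str, int]:
    """Return the finish tick of each task."""
    tasks = [
        Task("data-gathering", 1, 0, 3, True),
        Task("communication", 2, 1, 5, False),
        Task("info-bus", 3, 1, 1, True),
    ]
    remaining = {t.name: t.work for t in tasks}
    finish: Dict[str, int] = {}
    holder: Optional[str] = None
    tick = 0
    while len(finish) < len(tasks):
        ready = [t for t in tasks
                 if t.arrival <= tick and t.name not in finish]
        blocked = [t for t in ready
                   if t.needs_lock and holder not in (None, t.name)]
        runnable = [t for t in ready if t not in blocked]

        def effective(t: Task) -> int:
            # With inheritance, the lock holder runs at the priority of
            # the highest-priority task it is blocking.
            if inherit and t.name == holder and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        current = max(runnable, key=effective)
        if current.needs_lock:
            holder = current.name
        remaining[current.name] -= 1
        tick += 1
        if remaining[current.name] == 0:
            finish[current.name] = tick
            if holder == current.name:
                holder = None
    return finish
```

Under these assumptions, without inheritance the high-priority info-bus task finishes last, behind the long medium-priority task; with inheritance the low-priority lock holder is boosted, releases the lock quickly, and the info-bus task finishes early.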

[3] Interprocess Communication: Interprocess communication can be accomplished with simple message-passing primitives. Higher-level logical communication methods provide transparency by hiding the physical details of message passing. Two important concepts are the client/server model and the Remote Procedure Call (RPC).

[3] Interprocess Communication: The client/server model is a programming paradigm for structuring processes in distributed systems. [Figure: the client and server exchange a request and a reply (logical communication); the actual communication passes through each side's kernel and the network.]

[3] Interprocess Communication: The Remote Procedure Call (RPC) model is similar to the local procedure call model, which works as follows. The caller places the arguments to a procedure in a specific location (such as a register). The caller temporarily transfers control to the procedure. When the caller regains control, it obtains the results of the procedure from the specified location. The caller then continues program execution.

[3] Interprocess Communication: On the server side, a process is dormant (inactive, sleeping), awaiting the arrival of a call message. When one arrives, the server process computes a reply that it then sends back to the requesting client. After this, the server process becomes dormant again.
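The request/reply cycle of RPC can be sketched with a client stub that marshals a call into a message and a server that dispatches it. This is an illustrative sketch, not any particular RPC system: the message format (JSON with `proc` and `args` fields), the classes `Server` and `ClientStub`, and the direct function call that stands in for the network are all assumptions.

```python
import json
from typing import Any, Callable, Dict


class Server:
    """Server side: does nothing until a call message arrives."""

    def __init__(self) -> None:
        self._procedures: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._procedures[name] = fn

    def handle(self, call_message: bytes) -> bytes:
        # Unmarshal the call, run the procedure, marshal the reply.
        call = json.loads(call_message)
        try:
            result = self._procedures[call["proc"]](*call["args"])
            reply = {"ok": True, "result": result}
        except Exception as exc:
            reply = {"ok": False, "error": repr(exc)}
        return json.dumps(reply).encode()


class ClientStub:
    """Client side: makes a remote call look like a local one."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def call(self, proc: str, *args: Any) -> Any:
        message = json.dumps({"proc": proc, "args": list(args)}).encode()
        reply = json.loads(self._transport(message))  # caller blocks here
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]


server = Server()
server.register("add", lambda a, b: a + b)
stub = ClientStub(server.handle)  # a direct call stands in for the network
```

`stub.call("add", 2, 3)` returns 5 to the caller as if `add` were local. A real transport would add the failure modes that a local call does not have, such as lost messages and timeouts.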

[4] Distributed Resources: The resources are data and processing capacity. Load distribution takes two forms: multiprocessor scheduling (static) and load distribution/sharing (dynamic).
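The difference between static and dynamic placement can be shown with two toy policies. Both policies are assumptions chosen for illustration (round-robin for the static case, least-loaded-node for the dynamic case), and the job costs and node names are made up.

```python
from typing import Dict, List


def static_assign(jobs: List[int], nodes: List[str]) -> Dict[str, int]:
    """Static: round-robin placement fixed in advance, ignoring load."""
    load = {n: 0 for n in nodes}
    for i, cost in enumerate(jobs):
        load[nodes[i % len(nodes)]] += cost
    return load


def dynamic_assign(jobs: List[int], nodes: List[str]) -> Dict[str, int]:
    """Dynamic: each job goes to the node with the least load right now."""
    load = {n: 0 for n in nodes}
    for cost in jobs:
        target = min(nodes, key=lambda n: load[n])
        load[target] += cost
    return load


jobs = [8, 1, 8, 1, 8, 1]   # hypothetical job costs
nodes = ["A", "B"]
```

With this input the static policy puts every expensive job on node A, while the dynamic policy ends with a much smaller gap between the two nodes, at the cost of needing current load information for every decision.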

[4] Distributed Resources: Distributed shared memory and distributed file systems. Issues: sharing and replication of data, which require maintaining data consistency and coherency. Distributed file systems and distributed shared memory differ in implementation.

[5] Fault Tolerance and Security: Distributed systems operate in an open environment, so they are vulnerable to failures and security threats. Faults: failures and security violations.

[5] Fault Tolerance and Security: The problem of failures can be alleviated through redundancy; transparent handling of failures (such as the removal of machines, network links, and other resources) without loss of data or functionality; and roll-back recovery for execution states.
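Roll-back recovery can be sketched as saving a checkpoint of the execution state and restoring it after a failure. This is a minimal in-memory illustration; `CheckpointedProcess` is a hypothetical class, and a real system would write checkpoints to stable storage and coordinate them across processes.

```python
import copy
from typing import Any, Dict


class CheckpointedProcess:
    """Keeps one saved copy of its state to roll back to."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self) -> None:
        # Save a consistent copy of the current execution state.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self) -> None:
        # Discard everything done since the last checkpoint.
        self.state = copy.deepcopy(self._checkpoint)


p = CheckpointedProcess({"balance": 100})
p.state["balance"] -= 30
p.checkpoint()                      # state is now {"balance": 70}
try:
    p.state["balance"] -= 500       # partial update before the failure
    raise RuntimeError("simulated crash")
except RuntimeError:
    p.rollback()                    # back to the last checkpoint
```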

[5] Fault Tolerance and Security: Security, from the OS view, means a trustworthy communication process and the confidentiality and integrity of messages and data. Authentication: clients, servers, and messages must all be authenticated. Authorization: access control has to be performed across a physical network with heterogeneous components under different administrative units using different security models.
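Message authentication and integrity can be illustrated with a keyed hash (HMAC) from Python's standard library. The sketch assumes the client and server already share a secret key, which is itself a hard problem in a distributed system; the key and messages shown are placeholders. HMAC does not provide confidentiality, which would need encryption on top.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical pre-shared key


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an authentication tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Check the tag in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(message, key), tag)


msg = b"read /home/alice/report.txt"
tag = sign(msg)
```

The receiver accepts the request only if `verify(msg, tag)` is true; a message altered in transit, or one signed with a different key, fails the check.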

[5] Fault Tolerance and Security: Ariane 5 failure. A software bug caused the European Space Agency's Ariane 5 rocket to crash 40 seconds into its first flight in 1996 (cost: half a billion dollars). The bug came from a software component reused from Ariane 4. A software exception occurred during the conversion of a 64-bit floating-point value to a 16-bit signed integer: the value was larger than 32,767, the largest value a 16-bit signed integer can hold, so the conversion failed and the program raised an exception. For an earlier version of the Ariane rocket, engineers had chosen to leave this function running for the first 40 seconds of flight to make it easy to restart the system in the event of a brief hold in the countdown. [Source: http://www.ima.umn.edu/~arnold/disasters/ariane5rep.html]
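The conversion that failed can be illustrated in a few lines. This sketch is not the flight software; it only shows the range check that a 64-bit float must pass before it fits in a signed 16-bit integer. The saturating variant is one possible alternative handling, included purely as an assumption for comparison, and both function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if it does not fit."""
    n = int(value)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(
            f"{value} does not fit in a signed 16-bit integer")
    return n


def to_int16_saturating(value: float) -> int:
    """Alternative handling: clamp to the nearest representable value."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

`to_int16_checked(40000.0)` raises `OverflowError`, mirroring the unhandled exception described above, whereas `to_int16_saturating(40000.0)` returns 32767. Which behavior is safe depends on what the consumer of the value expects.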

Summary: Distributed systems raise unique design and implementation issues, including object models and naming schemes, distributed coordination, interprocess communication, distributed resources, and fault tolerance and security.

References
[1] Randy Chow and Theodore Johnson, 1997, "Distributed Operating Systems & Algorithms" (Addison-Wesley), pp. 45 to 50, 61 to 63.
[2] Suresh Sridharan, 2006, "Distributed Operating Systems" (University of Wisconsin, Madison). http://pages.cs.wisc.edu/~dusseau/Classes/CS739/Writeups/Survey.pdf
[3] JoAnne L. Holliday and Amr El Abbadi, "Distributed Deadlock Detection", http://www.cse.scu.edu/~jholliday/dd_9_16.htm
[4] List of distributed computing projects: http://en.wikipedia.org/wiki/List_of_distributed_computing_projects

Questions Thank You Email: dde1@student.gsu.edu