Some Fundamentals
Computer architectures for interconnected multiprocessor systems
[Figure: two architectures. Tightly coupled: CPUs share one memory through interconnecting hardware. Loosely coupled: each processor has its own local memory, and the processors are joined by an interconnecting network.]

Coming to the Point
- Tightly coupled system = parallel processing system
  - Limitation: limited area/bandwidth
- Loosely coupled system = distributed system
  - Processors can be geographically far apart
  - Fully scalable; ideally no limits
- A distributed system is a collection of processors interconnected by a communication network
- Each processor has its own local memory and peripherals
- Processors communicate by message passing over the communication network
- For a processor, its own resources are 'local', whereas other processors and their resources are 'remote'
- A processor together with its resources is referred to as a node, site, or machine

Some History
- In the past, computers were large and costly
- Punch card days!
- Job setup, rather than the CPU, was the problem
- Batch processing was introduced
- Batching similar jobs increased CPU throughput, and offline processing increased I/O performance; from there, multiprogramming was introduced
- Time sharing was introduced to allow multiple users; this led to the concept of dumb terminals for interaction by multiple users
- Advances in h/w technology made these things common after the early 70s and gave rise to minicomputers

More on History
- Time sharing provided two basic ideas of distributed systems (DS):
  - Sharing of resources by multiple users
  - Use of computers from different locations
- Later, dumb terminals were replaced by 'intelligent' terminals
- That led to 'workstations'
- The idea was, again, sharing of resources
- The limitation of distance was eliminated by advances in LAN and WAN
- And finally the baby was born (I mean the concept of DS) in the late 70s
- And still that 'baby' is growing!

Distributed Computing Models: Minicomputer Model
- Interconnected minicomputers
- A simple extension of the centralized time-sharing model
- Several interactive terminals are connected to each minicomputer
- A user logged on to one computer can access remote resources on other computers
- Useful for resource sharing
[Figure: several minicomputers, each with its own terminals, joined by a communication network.]

Distributed Computing Models: Workstation Model
- Interconnected workstations
- Each workstation may have its own disk and resources and serves as a single-user PC
- Some may be idle while others are busy
- The idea is to use the idle ones on behalf of the busy ones
- If a user logs on to a machine and the system finds that it does not have sufficient capacity, then the load should be transferred to other machines and the result returned to the user (so simple ;-) )
[Figure: workstations (WS) joined by a communication network.]

Distributed Computing Models: Workstation Model (contd.)
But it is not so simple, because:
1. How do we find an idle workstation?
2. How will the process be transferred?
3. What happens if the idle workstation becomes busy?
The first two you will learn after a few weeks; let's talk about the 3rd. The options are:
- Allow the remote process to share the workstation's resources along with the owner's own processes
  - Easy to implement, but the workstation's own user may not get optimum performance
- Kill the remote process
  - Then what happens to the half-cooked work? It is wasted, and it can lead to inconsistency
- Migrate the remote process back to its home workstation so that execution can continue there
  - Complex to implement, because preemptive process migration comes into the picture

Distributed Computing Models: Workstation-Server Model
- A network of minicomputers and workstations (some of which may be diskless)
- Who provides the file system to diskless workstations? A diskful workstation or a minicomputer
- Other minicomputers may provide other services such as database, print, etc.
- Specialized machines for specialized services
- A user logs on to a workstation. For local needs the workstation is sufficient, but for specialized services the request goes to a server and the server sends back a response
- The benefit is that process migration is not needed, so things are simple

Distributed Computing Models: Advantages of the Workstation-Server Model vs. the Workstation Model
- Cheaper (e.g. a few machines with large disks vs. a disk on every machine)
- Maintenance and backup are easy
- Users have the flexibility to go to any workstation to access a service
- It uses a request-response protocol, so there is no need for process migration
- The client-server model is well accepted and easy to implement programmatically
- A user gets a guaranteed response time, because workstations are not used to run other users' remote processes
- On the other hand, this architecture does nothing to exploit idle workstations

Distributed Computing Models: Processor Pool Model
- The idea: a user does not need full processing power all the time, but may need it for a short duration (e.g. compilation)
- So in this model processors are pooled together and shared by users as needed
- The pool consists of a large number of microcomputers and minicomputers attached to the network
- Each processor in the pool has its own memory to load and run a program
- No processor in the pool has terminals attached directly; users access the system through special terminals (e.g. X terminals)
- A special server (the run server) handles demand for and supply of pool processors
[Figure: terminals, run server, file server, and processor pool joined by a communication network.]

Distributed Computing Models: Something More on the Processor Pool Model
- There is no concept of a home machine; a user logs on to the system as a whole
- Better utilization than the workstation-server model, because the entire processing power is available to logged-in users
- More flexible, because services can be expanded easily without adding machines. How? Ideas, please
- Not suitable for high-performance interactive applications or graphics operations, because of the heavy network communication involved
- Hybrid model: workstation-server weds processor pool

Why DCS Are Becoming Popular
- Inherently distributed applications
- Information sharing among users
- Resource sharing
- Better price-performance ratio
- Shorter response time / higher throughput
- Higher reliability
- Extensibility and incremental growth
- Better flexibility
Please read Section 1.4 of the book for the detailed story.

What Is a Distributed Operating System?
- OS = a program that controls the resources of a computer system and provides its users with an interface or virtual machine
- The two basic tasks of an OS are:
  - To present users with a virtual machine (interface) that is easier to program than the underlying hardware
  - To manage the various resources of the system
- Operating systems used in a distributed environment can be broadly classified into two types:
  - Network operating systems (NOS)
  - Distributed operating systems (DOS)
- The features that differentiate the two are system image, autonomy, and fault tolerance

System Image
- That is, how the user 'sees' the system
- With a NOS, the system is a collection of distinct machines connected by a communication subsystem, so users are aware that they are using multiple systems
- A DOS hides the existence of multiple computers and provides a single system image; the collection of machines acts as a virtual uniprocessor
- With a NOS, a user needs to 'know' the location of the resources in order to execute a job, and different 'system calls' are required for local and remote resources
- A DOS user needs none of this; the OS automatically takes care of locating resources and issuing the right system calls
- The idea is "transparency"
- Yatrik's Rule: "Better transparency = better performance, more ease"

Autonomy
- In a NOS, each computer has its own local OS (not necessarily the same one), and there is essentially no coordination among them
- The only exception: when two processes on different computers communicate, they must agree on a communication protocol
- Each computer is independent of the others in managing its local resources
- System calls on different computers may be different
- In a DOS, there is a single system-wide operating system, and each computer runs as part of that OS
- All computers are tightly interwoven and cooperate closely with each other for optimized resource utilization

More on Autonomy
- In a DOS, a globally valid set of system calls is supported by the OS and is available on all computers
- This set of system calls is implemented by a set of programs called the 'kernel'
- The kernel manages and controls the hardware in such a way that all resources are available to other programs through system calls
- Identical kernels run on all the computers, and they cooperate with each other when the need (?) arises
- Computers in a NOS have a higher degree of autonomy than computers in a DOS

Fault Tolerance Capability
- A NOS has little or no fault tolerance capability
- A DOS has high (or very high) fault tolerance capability
- A distributed computing system that uses a NOS is referred to as a 'network system', whereas one that uses a DOS is referred to as a 'true distributed system' (or simply a distributed system)

Design Issues of a DOS: Some Background
- In the design of a centralized OS, it is assumed that the OS has complete and accurate information about the environment in which it is functioning
- In that case the OS can also rely on the result of a state check being accurate
- A DOS, in contrast, must be designed on the assumption that complete information about the environment will never be available
- Resources are physically separated, there is no common clock among the processors, and messages between nodes may be delayed or lost
- Because of all this, a DOS does not have consistent, up-to-date knowledge of the state of the various components, and this makes things complex
- Even so, the ultimate objective is to present a virtual uniprocessor system to the user, and that raises some key design issues

Transparency: Access Transparency
- A user should neither need nor be able to tell whether a resource is remote or local
- A user should access a remote resource in the same way as a local one
- System calls should not distinguish between remote and local resources; it is the responsibility of the OS to locate the resource and get the job done
- In short, a well-designed set of system calls (in the 'kernel') is required
- In practice it is NOT possible to design system calls with complete access transparency, since remote operations can fail in ways that local ones cannot

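A minimal sketch of the access-transparency idea, assuming an in-memory stand-in for the network. The class names (`LocalStore`, `RemoteStore`, `TransparentFS`) are hypothetical and not taken from any real DOS; the point is only that the caller uses one `read` call and never says where the resource lives.

```python
class LocalStore:
    """Resources held on this node."""
    def __init__(self, files):
        self.files = files

    def read(self, name):
        return self.files[name]


class RemoteStore:
    """Stands in for another node; a real system would exchange messages here."""
    def __init__(self, files):
        self.files = files

    def read(self, name):
        return self.files[name]  # pretend this crossed the network


class TransparentFS:
    """Single access interface: the caller cannot tell local from remote."""
    def __init__(self, local, remotes):
        self.local = local
        self.remotes = remotes

    def read(self, name):
        # The system, not the user, locates the resource.
        if name in self.local.files:
            return self.local.read(name)
        for node in self.remotes:
            if name in node.files:
                return node.read(name)
        raise FileNotFoundError(name)


fs = TransparentFS(LocalStore({"notes.txt": "local data"}),
                   [RemoteStore({"report.txt": "remote data"})])
print(fs.read("notes.txt"))
print(fs.read("report.txt"))
```

The sketch deliberately ignores network failures, which is exactly where complete access transparency breaks down in a real system.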
Transparency: Location Transparency
- Name transparency
  - The name of a resource should be independent of the physical connectivity or topology of the system and of the resource's current location
  - Movable resources must be allowed to move without having their names changed
  - Resource names must be unique system-wide
- User mobility
  - A user should be able to access a resource by the same name from anywhere (from any system)
- In short: "Each resource in the system should be identified by a global name, and for that a global resource naming facility should be available"

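The global naming requirement can be illustrated with a toy name server. This is a sketch under stated assumptions: `NameServer`, its methods, and the node labels are hypothetical, and a real naming facility would itself be distributed rather than a single dictionary.

```python
class NameServer:
    """Maps system-wide unique names to the node currently holding the resource."""
    def __init__(self):
        self._location = {}

    def register(self, name, node):
        # Names must be unique system-wide.
        if name in self._location:
            raise ValueError(f"name already in use: {name}")
        self._location[name] = node

    def move(self, name, new_node):
        # The resource moves; its name does not change.
        if name not in self._location:
            raise KeyError(name)
        self._location[name] = new_node

    def resolve(self, name):
        return self._location[name]


ns = NameServer()
ns.register("/projects/report.txt", "node-A")
ns.move("/projects/report.txt", "node-B")
print(ns.resolve("/projects/report.txt"))
```

Users keep using `/projects/report.txt` before and after the move, from any node, which is the name-transparency and user-mobility behaviour described above.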
Transparency: Replication Transparency
- Almost all DOS have provisions to create replicas (of files/resources)
- The existence of replicated resources, and the replication process itself, should be transparent to users
- It is the responsibility of the system to name the various copies of a resource and to map them to the user-defined name
- The system also has to do replication control, e.g. how many copies to keep, where to place them, and when to create or delete a replica

Transparency: Failure Transparency
- Deals with masking partial failures from users, e.g. communication link failure, storage device failure, machine failure
- The system should continue to function in the presence of such failures (performance may decrease)
- Complete failure transparency is not achievable at present
- A 100% failure-transparent system may mean high cost and low performance because of the high degree of redundancy required
- Example: file services
- PKS says: "Theoretically possible but practically not justified" (?)

Transparency: Migration Transparency
- For better performance, reliability, and security, a movable object often moves from one node to another
- Migration transparency (MT) deals with the automatic handling of movable objects
- Important issues in achieving MT:
  - Migration decisions (what to move where) should be taken automatically by the system
  - Migration of an object should not require any change in the object's name
  - When a process migrates, the interprocess communication (IPC) mechanism should ensure that a message sent to the migrating process always reaches it

Transparency: Concurrency Transparency
- It is always economical to share system resources among concurrently executing user processes
- The number of resources is always limited, so one process will surely influence the actions of other concurrently executing processes (because of competition)
- Concurrency transparency = the above should not be felt by the user
- Each user should feel that he/she is the sole user of the resources
- Issues:
  - An 'event-ordering' property should ensure a proper order of all requests to a resource, so that all users get a consistent view
  - A 'mutual exclusion' property should ensure that at any given point in time only one process accesses a resource that must not be shared
  - A 'no-starvation' property should ensure that if every process that is granted a resource (one that must not be used simultaneously by multiple processes) eventually releases it, then every request for that resource is eventually granted
  - A 'no-deadlock' property ... you know what that is

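The mutual exclusion property can be shown on a single machine with threads. This is only an analogy, assuming Python's standard `threading.Lock`; a DOS has to provide the same guarantee across nodes with no shared memory, which is considerably harder. `SharedCounter` and `run` are hypothetical names.

```python
import threading


class SharedCounter:
    """A resource that must not be updated by two processes at once."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write.
        with self._lock:
            self.value += 1


def run(workers=4, per_worker=1000):
    counter = SharedCounter()

    def work():
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())
```

Each worker behaves as if it were the sole user of the counter, yet no update is lost.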
Transparency: Performance Transparency and Scaling Transparency
- Performance transparency
  - The system should automatically reconfigure itself to achieve optimum performance
  - The processing capability of the system should be uniformly distributed among the jobs currently in the system
- Scaling transparency
  - The system should scale (expand) without disrupting the activities of users
  - This needs an open system architecture and the use of scalable algorithms

Reliability
- Distributed systems are expected to be more reliable than centralized systems, because resources exist in multiple copies
- But "multiple existence" alone cannot do magic; the system has to be designed to make the most of it
- You know what a 'fault' is (it is what causes a system failure)
- Two types of failures: "fail-stop" and "Byzantine" (the spelling itself suggests something fishy)
- A fault-handling mechanism should:
  - Avoid faults
  - Tolerate faults
  - Detect and recover from faults
- Fault avoidance
  - The occurrence of faults should be minimized
  - Avoiding faults in h/w components completely is almost IMPOSSIBLE
  - S/w components must be tested thoroughly to avoid faults

Reliability: Fault Tolerance
- Fault tolerance is the ability of a system to continue functioning in the event of partial system failure
- A few concepts used to achieve fault tolerance:
- Redundancy techniques
  - The basic idea is to avoid a single point of failure by replicating h/w and s/w components (maintaining multiple copies)
  - If one fails, another one can be used (at least theoretically)
  - This creates additional overhead to maintain the copies and keep them consistent
  - More copies = better reliability = larger overhead. The question is how to balance: how much replication does one want?
  - To tolerate 'n' fail-stop failures, 'n+1' replicas are needed; to tolerate 'n' Byzantine failures, '2n+1' replicas are needed (Why?)
  - For s/w components, another idea is to use a virtual storage device that can withstand transient I/O faults and decay of the storage media
- Distributed control
  - A control mechanism designed to avoid a single point of failure
  - E.g. multiple independent file servers controlling multiple independent storage devices (likewise name servers / print servers)

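One way to reason about the "(Why?)" is a voting sketch. With fail-stop failures a crashed replica simply says nothing, so one survivor out of n+1 is enough. With Byzantine failures a faulty replica may answer wrongly, so the n+1 correct replicas must outvote the n faulty ones, which gives 2n+1 in total. The function names below are hypothetical, and the code assumes a simple majority-voting scheme.

```python
from collections import Counter


def replicas_needed(n, byzantine=False):
    """Replicas required to tolerate n failures under the voting assumption above."""
    return 2 * n + 1 if byzantine else n + 1


def majority_vote(answers):
    """Return the value reported by a strict majority of replicas."""
    value, count = Counter(answers).most_common(1)[0]
    if count * 2 <= len(answers):
        raise ValueError("no majority: too many faulty replicas")
    return value


# n = 2 Byzantine replicas report garbage; 2n + 1 = 5 replicas in total.
print(replicas_needed(2, byzantine=True))
print(majority_vote([42, 42, 42, 7, 99]))
```

With only four replicas and the same two faulty ones, the correct answer would hold just two of four votes and `majority_vote` would raise, which is why 2n is not enough.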
Reliability: Fault Detection and Recovery
- Commonly used techniques are atomic transactions, stateless servers, and acknowledgement-and-timeout-based mechanisms
- Atomic transactions
  - For a computation consisting of a collection of operations, either all operations are performed or none of their effects prevails
  - Other concurrent processes cannot modify or see intermediate states
  - This helps preserve the consistency of data objects
  - Crash recovery is easier (why?), because a transaction can end in only two states: either all operations are performed or none is
- Stateless servers
  - In the client-server model a server can follow one of two paradigms: 'stateful' or 'stateless'
  - A stateful server maintains a history of its transactions with each client; a stateless server does not
  - After a failure, stateless servers are better off because they keep no per-client transaction record, whereas stateful servers need complex mechanisms to recover

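The all-or-nothing behaviour of an atomic transaction can be sketched in memory as follows. This is an illustration only: `run_atomically` is a hypothetical helper that works on a private copy and installs it only if every operation succeeds. A real system would also need durable storage and isolation from concurrent processes.

```python
import copy


def run_atomically(state, operations):
    """Apply all operations to state, or leave state untouched if any fails."""
    working = copy.deepcopy(state)   # intermediate states stay private
    for op in operations:
        op(working)                  # an exception here aborts the transaction
    state.clear()                    # commit: install the new state
    state.update(working)


def debit(s):
    s["a"] -= 30


def credit(s):
    s["b"] += 30


def crash(s):
    raise RuntimeError("node crashed mid-transaction")


accounts = {"a": 100, "b": 0}
run_atomically(accounts, [debit, credit])
print(accounts)

try:
    run_atomically(accounts, [debit, crash, credit])
except RuntimeError:
    pass
print(accounts)  # unchanged by the failed transaction
```

Recovery is simple because only two outcomes are possible: the state after the whole transaction, or the state before it.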
Reliability: Acknowledgement-and-Timeout-Based Message Transmission
- A node crash or communication link failure may interrupt communication between two processes, resulting in the loss of a message
- IPC mechanisms must have 'something' to detect the loss of a message
- This "something" = time
- So if the acknowledgement does not arrive 'in time', the message should be retransmitted
- Retransmission may in turn produce duplicate messages, which the receiver must detect and discard
- Main drawback of a reliable system: POTENTIAL LOSS OF EXECUTION TIME EFFICIENCY DUE TO THE EXTRA OVERHEAD INVOLVED IN IMPLEMENTING THESE TECHNIQUES

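A deterministic sketch of acknowledgement, timeout, and duplicate suppression. Everything here is hypothetical: `Receiver`, `send_reliably`, and the `ack_lost_on` fault-injection parameter exist only for illustration, and a real timeout is replaced by simply noticing that no acknowledgement came back.

```python
class Receiver:
    """Delivers each sequence number once, but acknowledges every copy."""
    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, seq, payload):
        if seq not in self.seen:      # duplicate suppression
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq                    # the acknowledgement


def send_reliably(receiver, seq, payload, ack_lost_on=(), max_tries=5):
    """Retransmit until an ack arrives; return the number of attempts used.

    ack_lost_on: 1-based attempt numbers whose ack is lost (simulated fault).
    """
    for attempt in range(1, max_tries + 1):
        ack = receiver.receive(seq, payload)
        if attempt in ack_lost_on:
            continue                  # "timeout": no ack in time, so retransmit
        if ack == seq:
            return attempt
    raise TimeoutError(f"no acknowledgement for message {seq}")


r = Receiver()
print(send_reliably(r, 1, "hello", ack_lost_on={1, 2}))
print(r.delivered)
```

The message reaches the receiver three times because two acknowledgements were lost, but the sequence number lets the receiver deliver it only once.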
Flexibility
- What is the need for so-called 'flexibility'?
- Ease of modification
  - It should be easy to incorporate changes in a user-transparent manner, with minimum interruption
- Ease of enhancement
  - It should be easy to add new functionality/services to the system
  - If a user wants his/her own service, or wants to modify an existing service, that should be allowed
- The design of the kernel greatly influences flexibility
- The kernel is the central part of the OS and provides the basic system facilities; it operates in a separate address space that is not accessible to users (so users cannot modify it)
- Recall that in a distributed system identical kernels run on each node

Flexibility (contd.)
- In distributed systems we mainly refer to two kinds of kernel: monolithic and microkernel
[Figure: two layouts, each showing three nodes on top of the network hardware. Monolithic: every node runs user applications on a monolithic kernel that includes most OS services. Microkernel: every node runs user applications and server/manager modules on a microkernel that provides only minimal facilities.]

Flexibility (contd.)
- In the monolithic kernel model, most OS services (process, memory, device, and file management, IPC, etc.) are provided by the kernel
- So the kernel has a large monolithic structure
- In the microkernel model, the main idea is to keep the kernel ASAP (not that one! Here it means As Small As Possible)
- So the kernel provides only minimal facilities: IPC, low-level device management, low-level process management, etc.
- All other OS services, call handling, etc. are implemented by user-level server processes; each process has its own address space and can be programmed separately
- Now tell me, which is better?

Flexibility (contd.)
- Why the microkernel model is preferred in a DOS:
  - Its flexibility advantages outweigh the performance cost (why do we use C/C++/C#/VB instead of assembly?)
  - In theory the microkernel model gives poorer performance because of the extra message passing, but in practice the message-passing overhead is usually negligible compared to other factors

Performance
- Goal: performance of an application on a DOS >= its performance on a centralized system
- Design principles:
  - Batch if possible
  - Cache if possible
  - Minimize copying of data
  - Minimize network traffic
  - Take advantage of fine-grain parallelism for multiprocessing

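The "batch if possible" and "cache if possible" principles can be sketched with a client that counts network round trips. The `Client` class is hypothetical and the server is just a dictionary; cache invalidation, which a real system must handle, is left out.

```python
class Client:
    """Client-side cache in front of a (simulated) remote server."""
    def __init__(self, server):
        self.server = server      # stands in for a remote node
        self.cache = {}
        self.round_trips = 0      # how many times we "went over the network"

    def get(self, key):
        if key not in self.cache:             # cache if possible
            self.round_trips += 1
            self.cache[key] = self.server[key]
        return self.cache[key]

    def get_many(self, keys):
        missing = [k for k in keys if k not in self.cache]
        if missing:                           # batch if possible: one trip for all
            self.round_trips += 1
            for k in missing:
                self.cache[k] = self.server[k]
        return [self.cache[k] for k in keys]


c = Client({"x": 1, "y": 2, "z": 3})
c.get("x")
c.get("x")                                    # served from the cache
print(c.get_many(["x", "y", "z"]), c.round_trips)
```

Four lookups cost two round trips instead of four, which also serves the "minimize network traffic" principle.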
Scalability
- Scalability = the capability of a system to adapt to increased service load
- A DOS should be designed to cope with growth in the number of nodes; the design principles are:
- Avoid centralized entities, because:
  - Failure of the entity often brings the entire system down (fault tolerance suffers)
  - The performance of the centralized entity becomes a bottleneck
  - Even if the centralized entity has enough processing power, the capacity of the network may become the limit (contention!)
  - In WAN-based systems it is inappropriate to serve all requests from a single server
- Avoid centralized algorithms
  - Centralized algorithm = collect information from all nodes, process it at one node, and distribute the results to the other nodes (e.g. a scheduling algorithm)
  - Use decentralized algorithms instead: global state information is not collected, decisions are based on locally available information, and no global clock is assumed

Scalability (contd.)
- Perform most operations on client workstations
  - A server is a common resource for several clients, so server cycles are more precious than client cycles
  - This principle enhances the scalability of the system, as it allows graceful degradation of performance as the system grows
  - Caching is frequently used for this purpose

Heterogeneity
- Heterogeneous DOS = interconnected sets of dissimilar hardware and software systems
- There are many incompatibilities, including data formatting schemes, communication protocols, topologies, etc.
- Some form of data translation is necessary for interaction between two incompatible nodes
- The translation can be done at the sender's or the receiver's end; for that, each node must have a translator for each other format
- If there are n formats, that means n-1 translators at each node, i.e. n(n-1) translators in the entire system (!!!!)
- This need can be reduced by using an intermediate standard data format
- Then each node only needs to know how to read and write the standard format (so simple!)

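A small sketch of the translator arithmetic and of translation through a standard format. The assumptions are mine, not the slides': the standard format is taken to be a big-endian 32-bit integer, each node is counted as needing one translator to and one from the standard (2n in total), and all function names are hypothetical.

```python
import struct

STANDARD = ">i"  # assumed standard wire format: big-endian 32-bit integer


def pairwise_translators(n):
    """Every node translates directly to each of the other n-1 formats."""
    return n * (n - 1)


def standard_format_translators(n):
    """Every node only converts to and from the standard format."""
    return 2 * n


def native_to_standard(data, native_fmt):
    (value,) = struct.unpack(native_fmt, data)
    return struct.pack(STANDARD, value)


def standard_to_native(data, native_fmt):
    (value,) = struct.unpack(STANDARD, data)
    return struct.pack(native_fmt, value)


# A little-endian sender and a big-endian receiver exchange the integer 1025.
wire = native_to_standard(struct.pack("<i", 1025), "<i")
received = struct.unpack(">i", standard_to_native(wire, ">i"))[0]
print(received)
print(pairwise_translators(5), standard_format_translators(5))
```

Under this counting the standard format pays off once there are more than three formats, since n(n-1) grows quadratically while 2n grows linearly.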
Security
- Protection against unauthorized access
- More difficult in a DOS (you know why ... I thought you really knew!)
- Compared to a centralized system, a DOS has the following additional requirements:
  - It should be possible for the sender of a message to know that the message was really received by the intended receiver
  - It should be possible for the receiver of a message to know that the message was sent by the genuine sender
  - It should be possible for both the sender and the receiver to be guaranteed that the contents of the message were not changed while it was in transit
- Cryptography is the only known practical method for meeting these requirements
- The fewer entities the security of a system depends on, the more secure the system is considered to be

Emulation of Existing Operating Systems
- For commercial reasons it is important that a newly designed DOS be able to emulate popular operating systems such as Unix/Linux
- New s/w can be written against the system call interface of the new OS to take full advantage of distributed computing, while the vast amount of existing s/w can run on the same system without being rewritten
- So the new DOS allows both types of s/w to run side by side