Some Fundamentals… Computer Architectures for Interconnected Multiprocessor Systems
[Figure: tightly coupled system (several CPUs sharing a common memory through interconnecting hardware) vs. loosely coupled system (processors, each with its own local memory, linked by an interconnecting network)]
Coming to the Point…
- Tightly coupled system = parallel processing system
  - Limitations: limited area/bandwidth
- Loosely coupled system = distributed system
  - Processors can be geographically far apart
  - Fully scalable; ideally no limits
- A distributed system is a collection of processors interconnected by a communication network, where:
  - Each processor has its own local memory and peripherals
  - Processors communicate by message passing over the communication network
  - For a processor, its own resources are 'local', whereas other processors and their resources are 'remote'
  - A processor together with its resources is referred to as a node/site/machine
Some History…
- In the past, computers were large and costly: the punch-card days…!!
- Job setup, rather than the CPU, was the bottleneck
- Batch processing was introduced…!
- Batching similar jobs increased CPU throughput, off-line processing improved I/O performance, and then multiprogramming was introduced
- Time sharing was introduced to allow multiple users; this led to the concept of dumb terminals for interactive use by many users
- Advances in hardware technology made these things common after the early 70s and gave rise to minicomputers
More on History…
- Time sharing provided two basic ideas of distributed systems (DS):
  - Sharing of resources by multiple users
  - Using computers from different locations
- Later, dumb terminals were replaced by 'intelligent' terminals, which led to 'workstations'
- The idea was again the sharing of resources
- The limitation of distance was eliminated by advances in LAN and WAN technology
- And finally the baby was born (I mean the concept of DS…) in the late 70s
- And that 'baby' is still growing…!
Distributed Computing Models: Minicomputer Model
- Interconnected minicomputers
- A simple extension of the centralized time-sharing model
- Several interactive terminals are connected to each minicomputer
- A user logged on to one computer can access remote resources on other computers
- Useful for resource sharing
[Figure: several minicomputers, each with its own terminals, connected by a communication network]
Distributed Computing Models: Workstation Model
- Interconnected workstations
- Each workstation may have its own disk and resources and serves as a single PC
- Some may be idle, some may be busy…
- The idea is to use the idle ones for the busy ones
- If a user logs on to a machine and the system finds that it doesn't have sufficient capacity, then load should be transferred to other machines and the result returned to the user… (So simple… ;-) )
[Figure: workstations (WS) connected by a communication network]
Distributed Computing Models (Workstation Model)
But it is not so simple… because:
- How does the system find an idle workstation??
- How is the process transferred??
- What happens if the idle workstation becomes busy??
You will learn the first two in a few weeks; let's talk about the third. The options are:
- Allow the remote process to share the workstation's resources along with the owner's own processes
  - Easy to implement, but then the workstation's own user may not get optimum performance
- Kill the remote process
  - Then what happens to the half-cooked work? It is a waste and will lead to inconsistency…
- Migrate the remote process back to its home machine, so that execution can continue there
  - Complex to implement, because preemptive process migration comes into the picture
Distributed Computing Models: Workstation-Server Model
- A network of minicomputers and workstations (some of which may be diskless)
- Who provides the file system to diskless workstations? (A diskful workstation or a minicomputer)
- Other minicomputers may provide other services, such as database, printing, etc.
- Specialized machines for specialized services…
- A user logs on to a workstation. For local needs the workstation is sufficient, but for specialized services a request goes to the servers and the servers send back a response
- The benefit is that 'process migration' is not needed, so things are simple…
Distributed Computing Models: Advantages of the Workstation-Server Model vs. the Workstation Model
- Cheaper (e.g., a few machines with large hard disks vs. large disks on ALL machines)
- Maintenance/backup is easy
- Users have the flexibility to go to any workstation to access services
- It uses a request-response protocol, so there is no need for process migration
- The client-server model is well accepted and programmatically easy to implement
- Users get a guaranteed response time, because their workstations are not used to execute remote processes
- This architecture doesn't care about idle resources
Distributed Computing Models: Processor Pool Model
- The basic idea: a user doesn't need full processing power all the time, but may require it for a short duration (e.g., compilation)
- So in this model processors are pooled together, to be shared by users as needed
- The pool consists of a large number of microcomputers and minicomputers attached to the network
- Each processor in the pool has its own memory to load and run a program
- No processor in the pool has terminals attached directly to it; users access the system via special terminals (e.g., X terminals)
- A special server (the run server) handles the demand for and supply of pool processors
[Figure: terminals, a run server, a file server and the processor pool, all connected by a communication network]
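The run server's core job is to hand processors out of the pool on demand and take them back afterwards. The Python sketch below shows that idea under simplifying assumptions. Processors are plain string ids, a job asks for a fixed number of whole processors, and requests are served first come, first served with no queueing. `RunServer` and its methods are hypothetical names, not part of any real system.

```python
from collections import deque


class RunServer:
    """Hands out processors from a shared pool and takes them back when jobs finish."""

    def __init__(self, processor_ids):
        self.free = deque(processor_ids)   # processors currently idle
        self.assigned = {}                 # job id -> processors allocated to it

    def allocate(self, job_id, count):
        """Give `count` idle processors to a job, or None if the pool can't satisfy it now."""
        if count > len(self.free):
            return None                    # not enough idle processors at the moment
        procs = [self.free.popleft() for _ in range(count)]
        self.assigned[job_id] = procs
        return procs

    def release(self, job_id):
        """Return a finished job's processors to the pool."""
        for p in self.assigned.pop(job_id, []):
            self.free.append(p)


server = RunServer(["p1", "p2", "p3", "p4"])
print(server.allocate("compile-job", 3))   # ['p1', 'p2', 'p3']
print(server.allocate("edit-job", 2))      # None: only one processor is idle
server.release("compile-job")
print(len(server.free))                    # 4
```

A real run server would also queue requests that cannot be satisfied immediately and would need to notice processors that crash. Both are left out to keep the idea visible.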
Distributed Computing Models: Something More on the Processor Pool Model
- There is no concept of a home machine: a user logs on to the system as a whole, not to a particular machine
- Better utilization than the workstation-server model, because the entire processing power of the system is available to the currently logged-on users
- More flexible, because services can easily be expanded without adding machines… How? Ideas please
- Not suitable for high-performance interactive applications or graphics operations, because of the heavy network communication involved
- Hybrid model… Workstation-Server weds Processor Pool
Why Distributed Computing Systems Are Becoming Popular…
- Inherently distributed applications
- Information sharing among users
- Resource sharing
- Better price/performance ratio
- Shorter response time/higher throughput
- Higher reliability
- Extensibility and incremental growth
- Better flexibility
Please read Section 1.4 of the book for the detailed story…
What Is a Distributed Operating System?
- OS = a program that controls the resources of a computer system and provides its users with an interface or virtual machine
- The two basic tasks of an OS are:
  - To present users with a virtual machine (interface) that is easier to program than the underlying hardware
  - To manage the various resources of the system
- Operating systems used in distributed environments can be broadly classified into the following types:
  - Network operating systems (NOS)
  - Distributed operating systems (DOS)
- The features that differentiate them are system image, autonomy and fault tolerance
System Image… That Is, How the User 'Sees' the System
- With a NOS, the system is a collection of distinct machines connected by a communication subsystem, so users are aware that they are using multiple systems
- A DOS hides the existence of multiple computers and provides a single system image: the collection of machines acts as a virtual uniprocessor
- With a NOS, a user needs to 'know' the location of a resource to execute a job, and different 'system calls' are required for local and remote resources
- A DOS user needs neither of these; the OS automatically takes care of locating resources and of the system calls
- The idea is "transparency"…
- Yatrik's Rule: "Better transparency = better performance, more ease"
Autonomy
- Each computer in a NOS has its own local OS (which may not be the same on all computers), and there is no coordination at all
- The only exception to this rule: when two processes on different computers communicate, they must agree on a communication protocol
- Each computer is independent of the others for its local resources
- System calls may differ from one computer to another
- In a DOS, there is a single system-wide operating system, and each computer runs as a part of that OS
- All computers are tightly interwoven and cooperate closely with each other for optimized resource utilization
More on Autonomy…
- A globally valid set of system calls, supported by the OS, is available on all computers
- This set of system calls is implemented by a set of programs called the 'kernel'
- The 'kernel' manages and controls the hardware so that all resources are available to other programs through system calls
- Identical 'kernels' run on all the computers, and they cooperate with each other whenever the need (?) arises
- Computers in a NOS have a higher degree of autonomy than those in a DOS
Fault Tolerance Capability
- A NOS has little or no fault tolerance capability
- A DOS has a high (or very high) fault tolerance capability
- A distributed computing system that uses a NOS is referred to as a 'network system', whereas one that uses a DOS is referred to as a 'true distributed system' (or simply a distributed system)
Design Issues of DOS: Some Background…
- In the design of a centralized OS, it is assumed that the OS has complete and accurate information about the environment in which it is functioning
- In that case the OS can also trust the result of any state check it makes
- A DOS, in contrast, must be designed on the assumption that complete and accurate information about the environment is not available:
  - Resources are physically separated
  - There is no common clock among the multiple processors
  - Messages between processors may be delayed, or may even be lost
- Due to all of these, a DOS does not have up-to-date, consistent knowledge about the state of its various components, and this makes things complex
- Even so, the ultimate objective is to provide a virtual uniprocessor system to the user, and to achieve this there are some key design issues…
Transparency: Access Transparency
- A user should neither need nor be able to recognize whether a resource is remote or local
- A user should access a remote resource in the same way as a local one
- System calls should not distinguish between remote and local resources; it is the OS's responsibility to locate the resource and get the job done
- In short, a well-designed set of system calls (the 'kernel') is required
- It is NOT possible to design system calls that give full access transparency
Transparency…: Location Transparency
- Name transparency
  - The name of a resource should be independent of the physical connectivity or topology of the system and of the resource's current location
  - Movable resources must be allowed to move without having their names changed
  - Resource names must be unique system-wide
- User mobility
  - A user should be able to access resources by the same name from anywhere (i.e., from any machine)
- In short: "Each resource in the system should be identified by a global name, and to have that, a global resource naming facility should be available"
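The global naming idea can be illustrated with a toy name service. It maps each system-wide unique name to the node that currently holds the resource. This is a minimal sketch assuming a single in-memory table. `NameServer`, its methods and the node names are hypothetical.

```python
class NameServer:
    """Maps system-wide unique resource names to the node currently holding each resource."""

    def __init__(self):
        self.locations = {}   # global name -> current node

    def register(self, name, node):
        if name in self.locations:
            raise ValueError(f"{name!r} is already registered")  # names are unique system-wide
        self.locations[name] = node

    def move(self, name, new_node):
        if name not in self.locations:
            raise KeyError(name)
        self.locations[name] = new_node   # the location changes, the name does not

    def resolve(self, name):
        """Same call with the same name from any machine the user logs on to."""
        return self.locations[name]


ns = NameServer()
ns.register("/shared/report.txt", "node-A")
ns.move("/shared/report.txt", "node-C")
print(ns.resolve("/shared/report.txt"))  # node-C
```

Users only ever hold the global name, so moving the resource changes one table entry and nothing the user sees. A single central table is, however, exactly the kind of centralized entity that the scalability discussion warns against. Real name services are distributed and replicated.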
Transparency…: Replication Transparency
- Almost all DOSs have provisions to create replicas of files/resources
- The existence of replicated resources and the replication process should be transparent to users
- It is the system's responsibility to name the various copies of a resource and map them to user-defined names
- The system also has to perform replication control, e.g., how many copies to keep, where to place them, and when to create/delete replicas
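Replication transparency can be sketched as a mapping from the one name the user knows to the copies the system manages. The following assumptions are all hypothetical:
- Copies are placed on the first `copies_per_file` nodes offered.
- Internal copy names are generated by the system.
- `open` picks any copy on a reachable node.

Consistency between copies, the hard part of replication control, is not handled here.

```python
import random


class ReplicaManager:
    """Hides the copies of a file behind the single name the user knows."""

    def __init__(self, copies_per_file=3):
        self.copies_per_file = copies_per_file  # replication control: how many copies
        self.replicas = {}                      # user-visible name -> [(node, internal name)]

    def create(self, user_name, nodes):
        chosen = nodes[: self.copies_per_file]  # placement policy: simply the first k nodes
        # The system, not the user, names each copy and records the mapping.
        self.replicas[user_name] = [
            (node, f"{user_name}#copy{i}") for i, node in enumerate(chosen)
        ]

    def open(self, user_name, alive_nodes):
        """Return some copy on a reachable node; the user never sees which one."""
        candidates = [r for r in self.replicas[user_name] if r[0] in alive_nodes]
        if not candidates:
            raise OSError(f"no reachable replica of {user_name!r}")
        return random.choice(candidates)


rm = ReplicaManager()
rm.create("notes.txt", ["n1", "n2", "n3", "n4"])
print(rm.open("notes.txt", alive_nodes={"n2", "n4"}))  # ('n2', 'notes.txt#copy1')
```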
Transparency…: Failure Transparency
- Deals with masking partial failures from the user, such as communication link failures, storage device failures and machine failures
- The system should continue to function in the presence of such failures (performance may decrease)
- Complete failure transparency is not achievable at present
- A 100% failure-transparent system would mean high cost and low performance, because of the high degree of redundancy required
- Example: file services
- PKS says: "Theoretically possible but practically not justified" (??)
Transparency…: Migration Transparency
- For better performance, reliability and security, a movable object is often moved from one node to another
- Migration transparency deals with handling such movement of objects automatically
- Important issues in achieving migration transparency:
  - Migration decisions (what to move, and where) should be made automatically by the system
  - Migrating an object should not require any change in the object's name
  - When a process migrates, the interprocess communication (IPC) mechanism should ensure that a message sent to the migrating process always reaches it
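One known way for IPC to keep reaching a migrating process is a forwarding pointer. The old node remembers where the process went and passes messages on. The slide does not prescribe a mechanism, so treat this as one illustrative technique. `Node`, `migrate` and the process id `p7` are hypothetical, and the "network" here is just method calls.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.mailboxes = {}   # process id -> messages delivered to that process here
        self.forward_to = {}  # process id -> node it migrated to (forwarding pointer)

    def deliver(self, pid, message):
        """Deliver locally, or follow the forwarding pointer left behind on migration."""
        if pid in self.mailboxes:
            self.mailboxes[pid].append(message)
            return self.name
        if pid in self.forward_to:
            return self.forward_to[pid].deliver(pid, message)
        raise LookupError(f"unknown process {pid}")


def migrate(pid, src, dst):
    """Move a process's pending messages to dst and leave a forwarding pointer at src."""
    dst.mailboxes[pid] = src.mailboxes.pop(pid)
    src.forward_to[pid] = dst


a, b = Node("A"), Node("B")
a.mailboxes["p7"] = []
migrate("p7", a, b)
print(a.deliver("p7", "hello"))  # B: the sender still addressed node A
print(b.mailboxes["p7"])         # ['hello']
```

Long chains of forwarding pointers add delay and break if an intermediate node crashes. For that reason, systems often also update the sender's record of the process's location.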
Transparency…: Concurrency Transparency
- It is always economical to share system resources among concurrently executing user processes
- Since the number of resources is limited, one process will surely influence the actions of other concurrently executing processes (because they compete for resources)
- Concurrency transparency = this influence should not be felt
- Each user should always feel that he/she is the sole user of the resources
- Issues:
  - An 'event-ordering' property, ensuring that all requests to a resource are properly ordered so that all users get a consistent view
  - A 'mutual exclusion' property, ensuring that at any point in time only one process accesses the resource
  - A 'no-starvation' property, ensuring that if every process that is granted a resource (one that must not be used simultaneously by multiple processes) eventually releases it, then every request for that resource is eventually granted
  - A 'no-deadlock' property… you know what that is…
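A single coordinator that grants a shared resource in request order is the simplest way to see several of these properties at once:
- The FIFO queue fixes the order in which requests are served (event ordering).
- Only one holder exists at a time (mutual exclusion).
- Every queued request is granted once the earlier holders release (no starvation).

With a single resource and no process waiting while it holds something, no deadlock can arise. This is a sketch with hypothetical names, not a prescribed algorithm.

```python
from collections import deque


class ResourceCoordinator:
    """Grants one shared resource to one process at a time, in request order."""

    def __init__(self):
        self.holder = None       # process currently using the resource
        self.waiting = deque()   # FIFO queue of waiting processes

    def request(self, pid):
        """Return True if pid got the resource now, False if it must wait."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # queued in arrival order
        return False

    def release(self, pid):
        """Release the resource and hand it to the oldest waiter, if any."""
        if pid != self.holder:
            raise RuntimeError(f"{pid} does not hold the resource")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


coord = ResourceCoordinator()
print(coord.request("P1"), coord.request("P2"), coord.request("P3"))  # True False False
print(coord.release("P1"))  # P2: the oldest waiter gets the resource next
```

The coordinator is a centralized entity and a single point of failure. Distributed mutual exclusion algorithms avoid that at the cost of more messages.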
Transparency…: Performance Transparency and Scaling Transparency
- Performance transparency
  - The system should automatically reconfigure itself to achieve optimum performance
  - The processing capability of the system should be uniformly distributed among the jobs currently in the system
- Scaling transparency
  - The system should be able to scale (expand) without disrupting the activities of its users
  - This requires an open system architecture and the use of scalable algorithms
Reliability
- Distributed systems are expected to be more reliable than centralized systems, because resources exist in multiple copies
- But 'multiple existence' alone cannot do magic; the system must be designed to make the most of it
- You know what a 'fault' is (it is what causes a system failure)
- Two types of failures: 'fail-stop' (the faulty component simply stops, and the failure can be detected) and 'Byzantine' (the faulty component keeps running but may produce arbitrary, wrong results). The spelling itself suggests something fishy
- Fault-handling mechanisms should:
  - Avoid faults
  - Tolerate faults
  - Detect and recover from faults
- Fault avoidance
  - The occurrence of faults should be minimized
  - Fault avoidance for hardware components is almost IMPOSSIBLE
  - Software components must be tested thoroughly to avoid faults
Reliability…: Fault Tolerance
- Fault tolerance is the ability of a system to continue functioning in the event of partial system failure
- A few concepts used to achieve fault tolerance:
- Redundancy techniques
  - The basic idea is to avoid failure by replicating hardware and software components (i.e., maintaining multiple copies)
  - If one copy fails, another can be used (at least theoretically)
  - This creates additional overhead to maintain multiple copies and keep them consistent
  - More copies give better reliability but a larger overhead. So how do we balance this, i.e., how much replication do we want?
  - A system that tolerates 'n' fail-stop faults needs 'n+1' replicas, and a system that tolerates 'n' Byzantine faults needs '2n+1' replicas (Why?)
  - Another idea is to use stable storage: a virtual storage device that can withstand even transient I/O faults and decay of the storage media
- Distributed control
  - Control mechanisms should avoid any single point of failure
  - E.g., multiple independent file servers controlling multiple independent storage devices (likewise for name servers/print servers)
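The replica counts can be checked with a small voter in Python. The sketch assumes that a single trusted voter collects one reply per replica, and that all correct replicas return the same value. `first_reply` and `majority_vote` are hypothetical helpers.

With fail-stop faults, a crashed replica sends nothing, so one surviving replica out of n+1 is enough. With Byzantine faults, up to n replicas may all send the same wrong value. The n+1 correct replicas are needed to outvote them, giving 2n+1 in total.

```python
from collections import Counter


def first_reply(replies):
    """Fail-stop faults, n+1 replicas: crashed replicas send nothing (None),
    so any reply that does arrive is correct."""
    return next(r for r in replies if r is not None)


def majority_vote(replies, n):
    """Byzantine faults, 2n+1 replicas: up to n replies may be wrong, so accept a
    value only if at least n+1 replicas agree (at least one of them is correct)."""
    if len(replies) < 2 * n + 1:
        raise ValueError("need replies from at least 2n+1 replicas")
    value, count = Counter(replies).most_common(1)[0]
    if count < n + 1:
        raise RuntimeError("no value has a majority; more than n replicas are faulty")
    return value


print(first_reply([None, None, 42]))     # 42: 2 of 3 replicas crashed (n = 2)
print(majority_vote([42, 42, 17], n=1))  # 42: one lying replica is outvoted
```

Reaching agreement among Byzantine replicas without a trusted voter is a harder problem with stronger requirements.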
Reliability…: Fault Detection and Recovery
- Commonly used techniques are atomic transactions, stateless servers, and acknowledgement- and timeout-based mechanisms
- Atomic transactions
  - For a computation consisting of a collection of operations, either all of the operations are performed or none of their effects persist
  - Other concurrent processes cannot modify or see the intermediate states
  - This helps to preserve the consistency of data objects
  - Crash recovery becomes much easier (why…?)
  - Because a transaction can end in only two states: either all operations are performed or none is performed
- Stateless servers
  - In the client-server model, a server can follow one of two paradigms: 'stateful' or 'stateless'
  - A 'stateful' server maintains a history of its interactions with each client; a 'stateless' server does not
  - In case of failure, stateless servers are better, because they keep no per-client records that must be rebuilt, whereas stateful servers require complex recovery mechanisms
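The all-or-nothing behaviour of an atomic transaction can be sketched with a private workspace. Writes go to a private copy and are installed together only on commit. This is one textbook technique; a write-ahead log is another. The class and the account data below are hypothetical. The commit step here is a plain in-memory update standing in for a crash-safe one.

```python
class Transaction:
    """All-or-nothing updates using a private workspace (shadow copy)."""

    def __init__(self, store):
        self.store = store       # shared data, seen by everyone
        self.workspace = {}      # tentative writes, invisible to other processes

    def read(self, key):
        return self.workspace[key] if key in self.workspace else self.store[key]

    def write(self, key, value):
        self.workspace[key] = value

    def commit(self):
        # Install every tentative write at once. A real system must make this step
        # itself crash-safe (e.g. with a commit record on stable storage).
        self.store.update(self.workspace)
        self.workspace = {}

    def abort(self):
        self.workspace = {}      # discard everything: none of the effects survive


accounts = {"A": 100, "B": 50}
t = Transaction(accounts)
t.write("A", t.read("A") - 30)
t.write("B", t.read("B") + 30)
print(accounts)   # {'A': 100, 'B': 50}: the intermediate state is not visible
t.commit()
print(accounts)   # {'A': 70, 'B': 80}
```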
Reliability…: Acknowledgement- and Timeout-Based Transmission of Messages
- A node crash or communication link failure may interrupt communication between two processes, resulting in the loss of a message
- IPC mechanisms must have 'something' to detect the loss of a message
- This "something" = a timeout
- If the acknowledgement doesn't arrive 'in time', the message should be retransmitted
- Retransmission may also create duplicate messages… which must be detected and discarded
- The main drawback of a reliable system is the POTENTIAL LOSS OF EXECUTION-TIME EFFICIENCY DUE TO THE EXTRA OVERHEAD INVOLVED IN IMPLEMENTING THESE TECHNIQUES.
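The acknowledgement, timeout and duplicate-detection ideas fit in a short simulation. The assumptions are:
- The channel is simulated by a function `channel_drops(attempt)`. It says whether the message or its acknowledgement is lost on that attempt.
- A lost message or acknowledgement stands for an expired timeout.
- Messages carry sequence numbers, so the receiver can recognise duplicates.

All names are hypothetical.

```python
class Receiver:
    def __init__(self):
        self.seen = set()        # sequence numbers already processed
        self.delivered = []

    def on_message(self, seq, payload):
        if seq not in self.seen:     # first copy: process it
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq                   # always (re-)acknowledge, even a duplicate


def send_reliably(receiver, seq, payload, channel_drops, max_tries=5):
    """Retransmit until an ack arrives; channel_drops(attempt) -> (msg_lost, ack_lost)."""
    for attempt in range(max_tries):
        msg_lost, ack_lost = channel_drops(attempt)
        if msg_lost:
            continue                 # timeout expires with no ack: retransmit
        ack = receiver.on_message(seq, payload)
        if not ack_lost and ack == seq:
            return attempt + 1       # number of transmissions needed
        # the ack was lost: the timeout expires and a duplicate will be sent
    raise TimeoutError(f"no ack for message {seq} after {max_tries} tries")


# Attempt 0: message lost. Attempt 1: delivered but ack lost. Attempt 2: all fine.
pattern = {0: (True, False), 1: (False, True), 2: (False, False)}
r = Receiver()
print(send_reliably(r, 1, "debit $30", lambda a: pattern[a]))  # 3
print(r.delivered)  # ['debit $30']: the duplicate was suppressed
```

The receiver acknowledges duplicates again instead of ignoring them. Otherwise a lost acknowledgement would make the sender keep retransmitting until it gives up.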
Flexibility
What is the need for so-called 'flexibility'?
- Ease of modification
  - It should be easy to incorporate changes in a user-transparent manner, with minimum interruption
- Ease of enhancement
  - It should be easy to add new functionality/services to the system
  - If users want their own service, or want to modify an existing service, they should be allowed to do so
- The design of the kernel greatly influences flexibility
- The 'kernel' is the central part of the OS that provides the basic system facilities. It operates in a separate address space that is not accessible to users (so users can't modify it)
- You know that in a distributed system, identical kernels run on each node
Flexibility…
In distributed systems, we mainly refer to two kinds of kernel: monolithic kernels and microkernels.
[Figure: (a) Monolithic kernel model: on each node, user applications run on a monolithic kernel that includes most OS services, and the nodes are connected by network hardware. (b) Microkernel model: on each node, user applications and server/manager modules run on a microkernel with only minimal facilities, and the nodes are connected by network hardware]
Flexibility…
- In the monolithic kernel model, most OS services, such as process/memory/device/file management and IPC, are provided by the kernel
  - So in this case the kernel has a large, monolithic structure
- In the microkernel model, the main idea is to keep the kernel ASAP (not that one…! here it means As Small As Possible)
  - So the kernel provides only minimal facilities, limited to IPC, low-level device management, low-level process management, etc.
  - All other OS services, system call handling, etc. are implemented by user-level server processes. Each server has its own address space and can be programmed separately
- Now tell me which is better.
Flexibility…
- Why is the microkernel model preferred for DOS? (Why do we use C/C++/C#/VB instead of assembly?)
- The main advantage of the microkernel model is its flexibility
- Theoretically, the microkernel model gives poorer performance (because of message passing between modules), but this is not the case in practice: the message-passing overhead is usually negligible compared to other factors
Performance
- The performance of an application on a DOS should be at least as good as its performance on a centralized system
- Design principles are as follows:
  - Batch if possible
  - Cache whenever possible
  - Minimize copying of data
  - Minimize network traffic
  - Take advantage of fine-grain parallelism for multiprocessing
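"Batch if possible" can be made concrete. Instead of sending one network message per small request, requests are buffered and sent together. This is a minimal sketch assuming a fixed batch size and a `transport` callable that sends one message; both are hypothetical.

```python
class BatchingSender:
    """Buffers small requests and sends them as one network message."""

    def __init__(self, transport, batch_size=4):
        self.transport = transport     # callable that sends one message over the network
        self.batch_size = batch_size
        self.buffer = []

    def send(self, request):
        self.buffer.append(request)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.transport(list(self.buffer))  # one message carries the whole batch
            self.buffer.clear()


sent = []
sender = BatchingSender(sent.append, batch_size=4)
for i in range(10):
    sender.send(f"write block {i}")
sender.flush()     # push out the final, partial batch
print(len(sent))   # 3 messages instead of 10
```

The trade-off is latency: a request may sit in the buffer until the batch fills or `flush` is called.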
Scalability
- Scalability = the capability of a system to adapt to increased service load
- A DOS should be designed to cope with growth in the number of nodes; the design principles for this are:
- Avoid centralized entities, because:
  - Failure of the entity often brings the entire system down (fault tolerance suffers)
  - The performance of the centralized entity becomes a bottleneck
  - Even if the centralized entity has enough processing power, the capacity of the network may become the limit (contention!!)
  - In WAN-based systems, it is improper to serve all requests with a single server
- Avoid centralized algorithms
  - Centralized algorithm = collect information from all nodes, process it at one node and distribute the results to the other nodes (e.g., a scheduling algorithm)
  - Decentralized algorithms should be used instead: global state information is not collected, decisions are based on locally available information, and no global clock is assumed
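A decentralized decision can be sketched as a threshold-based, sender-initiated load-sharing policy. An overloaded node probes a few randomly chosen peers and hands a job to the first lightly loaded one. No global state is gathered and no global clock is used. The function name, threshold and probe limit are hypothetical values chosen for illustration.

```python
import random


def choose_target(my_load, probe, peers, threshold=2, probe_limit=3, rng=random):
    """Decide, using only locally gathered information, where to run a new job.

    Returns a peer to transfer the job to, or None to run it locally.
    """
    if my_load <= threshold:
        return None                          # not overloaded: keep the job here
    for peer in rng.sample(peers, min(probe_limit, len(peers))):
        if probe(peer) < threshold:          # ask only this one peer for its load
            return peer
    return None                              # no lightly loaded peer found


loads = {"n1": 5, "n2": 0, "n3": 4}           # what each peer would report if probed
print(choose_target(6, loads.get, ["n1", "n2", "n3"]))  # n2
print(choose_target(1, loads.get, ["n1", "n2", "n3"]))  # None
```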
Scalability…
- Perform most operations on client workstations
  - A server is a common resource for several clients, so server cycles are more precious than client cycles
  - This principle enhances the scalability of the system, as it allows graceful degradation as the system grows
  - Caching is frequently used to achieve this
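Client-side caching lets most repeated operations complete on the workstation; the sketch below shows the idea. It assumes that `server_read` is a hypothetical callable standing in for a request to the shared server. It also assumes that the server, or some other mechanism, calls `invalidate` when the data changes. Keeping caches consistent is the hard part in real systems and is not solved here.

```python
class CachingClient:
    """Keeps a local copy of server data so repeated reads cost no server cycles."""

    def __init__(self, server_read):
        self.server_read = server_read   # callable that contacts the shared server
        self.cache = {}
        self.server_calls = 0

    def read(self, key):
        if key not in self.cache:        # miss: only now is the server involved
            self.server_calls += 1
            self.cache[key] = self.server_read(key)
        return self.cache[key]

    def invalidate(self, key):
        # Called when the data changes at the server, so the next read refetches it.
        self.cache.pop(key, None)


data_on_server = {"config": "v1"}
client = CachingClient(data_on_server.get)
for _ in range(5):
    client.read("config")
print(client.server_calls)   # 1: four of the five reads were served locally
```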
Heterogeneity
- Heterogeneous DOS = interconnected sets of dissimilar hardware and software systems
- There are many incompatibilities, including data formats, communication protocols, topologies, etc.
- Some form of data translation is necessary for interaction between two incompatible nodes
- This translation can be done at the sender's or the receiver's end, and for this each node must have a translator for every other format
- If there are n formats, each node needs n-1 translators, giving n(n-1) translators in the entire system (!!!!)
- This can be reduced by using an intermediate standard data format. Each node then only needs to know how to convert its own format to and from the standard format (so simple…!), i.e., only 2n translators in total
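The standard-format idea can be sketched with Python's `struct` module, where the `!` prefix selects network (big-endian) byte order with standard sizes. The record layout (a 32-bit integer and a 64-bit float) is an illustrative assumption, and Python's own `int`/`float` values stand in for a node's native representation. The helper `translators_needed` compares two totals. One is the pairwise count n(n-1). The other is 2n for the standard-format approach, where each format needs one translator into and one out of the standard format.

```python
import struct

# Agreed standard format for one record: 32-bit int + 64-bit float, big-endian ("!").
# The layout is an illustrative assumption, not a real wire protocol.
CANONICAL = struct.Struct("!id")


def to_standard(node_id, load):
    """Sender side: convert from the native representation to the standard format."""
    return CANONICAL.pack(node_id, load)


def from_standard(data):
    """Receiver side: convert from the standard format to the native representation."""
    return CANONICAL.unpack(data)


def translators_needed(n):
    """Total translators: direct pairwise conversion vs. going via one standard format."""
    return {"pairwise": n * (n - 1), "standard_format": 2 * n}


wire = to_standard(7, 0.75)
print(len(wire))                 # 12 bytes, identical on every node
print(from_standard(wire))       # (7, 0.75)
print(translators_needed(5))     # {'pairwise': 20, 'standard_format': 10}
```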
Security
- Protection against unauthorized access
- More difficult in a DOS… (you know why…)
- I thought you really knew…!!!!
- Compared to a centralized system, a DOS has the following additional requirements:
  - The sender of a message should be able to know that the message was actually received by the intended receiver
  - The receiver of a message should be able to know that the message was sent by the genuine sender
  - Both the sender and the receiver should be guaranteed that the contents of the message were not changed while it was in transit
- Cryptography is the only known practical method for meeting these requirements
- The fewer the entities on which security depends, the more secure the system is considered to be
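The receiver-side requirements (genuine sender, unchanged contents) can be illustrated with a message authentication code from Python's standard `hmac` module. This assumes both parties already share a secret key, which is itself the hard part; the key and message below are made up. The first requirement, that the sender knows the message really arrived, could be met the same way: the receiver returns an authenticated acknowledgement.

```python
import hashlib
import hmac

SHARED_KEY = b"example-key-agreed-in-advance"   # assumption: distributed securely beforehand


def sign(message: bytes) -> bytes:
    """Sender attaches a MAC computed with the shared secret key."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    """Receiver checks that the message came from a key holder and was not altered."""
    return hmac.compare_digest(sign(message), tag)


msg = b"transfer 30 from A to B"
tag = sign(msg)
print(verify(msg, tag))                          # True
print(verify(b"transfer 300 from A to B", tag))  # False: contents changed in transit
```

With a shared key, any key holder could have produced the tag. Telling exactly which party sent a message needs per-party keys or digital signatures.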
Emulation of Existing Operating Systems
- For commercial reasons, it is important that a newly designed DOS be able to emulate popular OSs such as Unix/Linux
- New software can be written using the system call interface of the new OS to take full advantage of distributed computing, while the vast amount of existing software can still run on the same system without being rewritten
- So the new DOS lets both types of software run side by side…