1 Programming systems for distributed applications Seif Haridi KTH/SICS
2 A distributed system is a set of processes linked by a network
– No global information, no global time
– Unpredictable communication delays
– Concurrency and nondeterminism
– Large probability of localized faults
– Easy access by unauthorized users
What are the properties of global distributed systems?
3 Additional Properties of the Internet
– A global network that is partitioned into several protection domains (firewalls)
– Private subnetworks, with the same IP addresses reused across different networks
– Dynamic reassignment of IP addresses -- ISPs reuse a pool of IP addresses among customers
4 Middleware for distributed global applications
– The middleware abstracts the complexity of the underlying system and provides the services of the OS to the application programmer.
– The middleware should support the distribution structure, scalability, openness, failure handling and security issues.
– It provides transparency (network and location) as much as possible or as desirable.
[Figure: layer stack: the network, the operating system, the middleware, the applications]
5 The middleware as a programming system for distributed applications
– Design a programming system from the start that is suitable for distributed applications (the Oz language and its system, Mozart)
– Extend an existing programming system with libraries to support distributed computing (JAVA)
– Provide a distribution layer that is language independent (CORBA); this layer might be needed anyway for communication with foreign software.
6 Programming system for distributed applications
The programming language by design provides the abstractions necessary for distributed applications:
– Concurrency and various communication abstractions
– Mobility of code (or more generally closures) and other entities
– Mechanisms for security at the language level -- the programming language by construction supports all the concepts needed for allowing arbitrary security levels (no holes)
– Notion of sited resources, how to plug and unplug resources
– Notion of a distributed/mobile component (for mobility of applications)
– Dynamic connectivity, transfer of entities and modification of various applications at runtime
– Abstraction of the network transport media
7 Programming system for distributed applications
The programming system (runtime system) by design provides mechanisms to support:
– Network transparency
– Well-defined and extended distributed behavior for all language entities -- part of network awareness
– Mechanisms for guaranteeing security on untrusted sites (fake implementations)
– Mechanisms for limiting resource (memory and processor time) consumption by foreign computations at runtime
– Network layer that supports location transparency (mobile applications) and (multiple) IP-independent addressing
– Configurable and scalable network layer (multiple protocols: TCP, TTCP, Reliable UDP, …)
– Dynamic connectivity, fault/connectivity detection
– Firewall enabled
8 The programming system (runtime system)
[Figure: the layers of the runtime system]
– Extended engine: local execution is here
– Protocol layer: distribution protocols
– Memory management layer: shared computation space, distributed garbage collection
– Network layer: network transport level
Figure labels for the data exchanged: messages between threads and nodes, messages between distributed nodes, structured messages, raw byte sequence
9 The issues in distributed programming
Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability
10 The issues in distributed programming
Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability
[Figure labels: Part of problem, Interaction]
11 The issues in distributed programming
Functionality, Fault tolerance, Distribution, Openness, Resource Control, Security, Scalability
[Figure labels: Part of problem, Interaction; the issue names appear a second time in the order Functionality, Distribution, Openness, Security, Resource Control, Fault tolerance, Scalability]
12 Network transparency
It means:
– If you develop an application on a single machine, you can distribute the entities to different sites without changing the logical behavior (functionality/semantics) of the application
– If you connect two independent applications together, they will logically behave as if they were running on a single machine.
It does not mean that you lose control over your applications.
However, network transparency breaks when faults (network partitioning) occur. Therefore awareness is also needed.
The role of fault detection is to reflect abstracted failures so that the programmer can develop fault tolerance mechanisms.
13 Application Logic
[Figure labels: Graphic Entities, BS, DB, and two CM/WM pairs]
Legend: BS - Board Server, DB - Display Broadcaster, CM - Client Manager, WM - Window Manager
14 Transparent distribution
[Figure labels: Graphic Entities, BS, DB, and two CM/WM pairs]
Legend: BS - Board Server, DB - Display Broadcaster, CM - Client Manager, WM - Window Manager
15 Network transparency and scalability
[Figure labels: Distributed servers, Services, User, Connector, User]
16 Programming systems for distributed applications
The programming language by design provides the abstractions necessary for distributed applications:
– Concurrency and various communication abstractions
– Mobility of code (or more generally closures) and other entities
– Mechanisms for security at the language level -- the programming language by construction supports all the concepts needed for allowing arbitrary security levels (no holes)
– Notion of sited resources, how to plug and unplug resources
– Notion of a distributed/mobile component (for mobility of applications)
– Dynamic connectivity, transfer of entities and modification of various applications at runtime
– Abstraction of the network transport media
17 Language Entities
records, procedures, classes; objects; single-assignment variables; cells; ports, threads; functors, components
– Concurrency and asynchronous communication abstractions are important to hide latency in distributed applications
– Cheap lightweight threads are necessary, combined with dataflow synchronization, to overlap computation and communication
[Figure: timeline with threads T0 and T1 and an Object. T1: m(X); T0: …X…; X=Value]
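The thread-plus-dataflow pattern on this slide can be sketched in Python. This is only an illustration, not Oz or Mozart code: `DataflowVar`, `call_async` and `slow_square` are hypothetical names, and a `threading.Event` stands in for a single-assignment variable. T0 starts the invocation in a second thread, keeps computing, and suspends only at the point where it needs X.

```python
import threading
import time


class DataflowVar:
    """Hypothetical single-assignment variable: readers block until it is bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._lock = threading.Lock()
        self._value = None

    def bind(self, value):
        # A single-assignment variable may be bound at most once.
        with self._lock:
            if self._bound.is_set():
                raise ValueError("variable already bound")
            self._value = value
            self._bound.set()

    def wait(self, timeout=None):
        # Dataflow synchronization: suspend until some thread binds the value.
        if not self._bound.wait(timeout):
            raise TimeoutError("variable still unbound")
        return self._value


def call_async(method, *args):
    """Run the invocation in a new thread (T1) and return X immediately."""
    x = DataflowVar()
    threading.Thread(target=lambda: x.bind(method(*args)), daemon=True).start()
    return x


def slow_square(n):
    # Stands in for a method on a remote object; the sleep models latency.
    time.sleep(0.05)
    return n * n


x = call_async(slow_square, 7)       # T1: m(X)
local = sum(range(10))               # T0 keeps computing meanwhile
result = local + x.wait(timeout=5)   # T0: ...X... blocks only here
```

Python threads are much heavier than the lightweight threads the slide calls for, so the sketch shows the synchronization pattern rather than its cost.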
18 Language Entities
records, procedures, classes; objects; single-assignment variables; cells; ports, threads; functors, components
– Concurrency and asynchronous communication abstractions are important to hide latency in distributed applications
– Just use ports for asynchronous communication, no thread creation
[Figure: timeline with thread T0 and an Object. T0: send m(X) to P; the port receives m and invokes the object; T0: …X…; X=Value]
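The port variant can be sketched the same way in Python, again as an illustration under assumptions rather than Mozart's implementation. `Port` and `Counter` are hypothetical names, a `queue.Queue` plays the role of the port's message stream, and `concurrent.futures.Future` stands in for the single-assignment variable X carried inside each message. The sender never creates a thread and never blocks on `send`.

```python
import queue
import threading
from concurrent.futures import Future


class Port:
    """Hypothetical port: an asynchronous mailbox served by one thread."""

    def __init__(self, handler):
        self._mailbox = queue.Queue()
        self._handler = handler
        threading.Thread(target=self._serve, daemon=True).start()

    def send(self, message):
        # Asynchronous send: enqueue and return immediately.
        self._mailbox.put(message)

    def _serve(self):
        # Receive messages in arrival order and invoke the object.
        while True:
            self._handler(self._mailbox.get())


class Counter:
    """Stateful object that lives behind the port."""

    def __init__(self):
        self.total = 0

    def handle(self, message):
        name, amount, reply = message
        if name == "add":
            self.total += amount
            reply.set_result(self.total)   # plays the role of X=Value


counter = Counter()
port = Port(counter.handle)

replies = []
for amount in (1, 2, 3):
    x = Future()                      # unbound X travels inside the message
    port.send(("add", amount, x))     # T0: send m(X) to P
    replies.append(x)

# T0 waits only when it actually needs the values.
totals = [x.result(timeout=5) for x in replies]
```

Because a single thread drains the mailbox, messages from one sender are handled in the order they were sent, which is why the running totals come back in sequence.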
19 JAVA’s remote method invocation
– JAVA’s basic mechanism for communication extends sequential method invocation to invocation on remote objects
– It is possible to create a thread, but thread creation is expensive and synchronization with the main thread is awkward
– A future abstraction has to be programmed explicitly, combined with a call-back in a separate thread
[Figure: timeline with thread T0 and an Object. T0: invoke m(X) on the object; T0: …X…; X=Value]
20 Language Entities
records, procedures, classes; objects; single-assignment variables; cells; ports, threads; functors, components
Mobility of code (or more generally closures) and other entities
– Every entity in the language is a runtime value (accessed by reference, not by a string)
– In particular, procedures and classes are values that can be sent in a message or given as a parameter to a remote invocation
– No name collision at the remote site
[Figure labels: C, variable, send m(.) to O]
21 Distributed Lexical Scoping/Mobility
[Figure: Site 1 and Site 2, each shown with C, a variable, and environments E1 and E2; the transfer is labeled "send m(.) to site 2"]
22 Advantages of distributed lexical scoping
Distribution transparency under procedure/class/object mobility:
– Programs can be tested and understood on one site, and behave the same when distributed over different sites
Security, given a correct implementation:
– A procedure transferred to a site will not accidentally acquire unauthorized access to entities residing on the site.
– Capability-based security is easy to implement, allowing a subset of the possible operations on an entity. Example: a procedure referring to a file object that only allows reads
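The read-only file example can be sketched as a closure. The Python below is a minimal illustration with hypothetical names (`make_read_capability`, `foreign_code`); an in-memory `io.StringIO` stands in for the file object. Note the assumption: Python allows code to inspect closure cells, so this shows only the shape of a capability, whereas the slide's claim is about a language whose scoping enforces it by construction.

```python
import io


def make_read_capability(file_obj):
    """Return a procedure that can read file_obj and do nothing else."""

    def read(size=-1):
        # file_obj is reachable only through the lexical scope of this closure.
        return file_obj.read(size)

    return read


# On site 1: the owner holds the full file object.
secret_file = io.StringIO("hello from site 1")
read_only = make_read_capability(secret_file)


def foreign_code(read):
    # The receiver is handed the procedure, not the file: it has no name
    # through which to write, seek or close.
    return read(5)


first = foreign_code(read_only)
```

The owner can still use `secret_file` directly; the capability narrows what is handed out, not what the owner can do.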
23 Programming system for distributed applications
The programming system (runtime system) by design provides mechanisms to support:
– Network transparency
– Well-defined and extended distributed behavior for all language entities -- part of network awareness
– Mechanisms for guaranteeing security on untrusted sites (fake implementations)
– Mechanisms for limiting resource (memory and processor time) consumption by foreign computations at runtime
– Network layer that supports location transparency (mobile applications) and (multiple) IP-independent addressing
– Configurable and scalable network layer (multiple protocols: TCP, TTCP, Reliable UDP, …)
– Dynamic connectivity, fault/connectivity detection
– Firewall enabled
24 Distributed behavior
Stateless (replication)
– Eager: record, procedure, class
– Lazy: object-record
Single assignment (eager elimination)
– single-assignment variable
Stateful (localization)
– Mobile: cell, object-state
– Stationary: port, thread
– Private: resources
25 System support
– Objects cannot be copied by value, as in JAVA, because that would lose transparency
– An object can be either stationary, or mobile with a consistency protocol that is fault tolerant. Mozart supports both.
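Why copying a stateful object by value loses transparency can be seen even on a single machine. In the Python sketch below (the `Account` class is a hypothetical example), a deep copy stands in for an object serialized to another site: once copied, updates made through the copy no longer reach the original, so the distributed program would behave differently from the centralized one.

```python
import copy


class Account:
    """Stateful object: its behavior depends on shared, mutable state."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


shared = Account()

by_reference = shared              # what a transparent system must preserve
by_value = copy.deepcopy(shared)   # what copy-by-value hands to the other site

by_reference.deposit(10)   # visible through every reference to the object
by_value.deposit(5)        # updates only the detached copy
by_value.deposit(5)

balances = (shared.balance, by_value.balance)
```

A stationary object avoids the divergence by keeping one copy and forwarding operations to it; a mobile object moves the state and needs a consistency protocol, which is the choice this slide describes.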
26 Resource Security with distributed components requires
[Figure labels: Open.file; Open; New; Module Manager; "I do not trust you. You can only read/write in TEMP"; Imported to site]
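One reading of this slide is that the receiving site's module manager gives an untrusted imported component a restricted replacement for the file-opening resource. The Python sketch below illustrates that idea under this assumption; `make_restricted_open` is a hypothetical helper, not a Mozart API, and a temporary directory stands in for TEMP.

```python
import os
import tempfile


def make_restricted_open(allowed_dir):
    """Return an open() replacement confined to allowed_dir."""
    root = os.path.realpath(allowed_dir)

    def restricted_open(path, mode="r"):
        # Resolve symlinks and ".." before checking containment.
        target = os.path.realpath(os.path.join(root, path))
        if os.path.commonpath([root, target]) != root:
            raise PermissionError(f"access outside {root} denied")
        return open(target, mode)

    return restricted_open


with tempfile.TemporaryDirectory() as temp_dir:
    # The site plugs this in where the component expected the real Open.
    sandbox_open = make_restricted_open(temp_dir)

    with sandbox_open("notes.txt", "w") as f:
        f.write("ok")
    with sandbox_open("notes.txt") as f:
        contents = f.read()

    try:
        sandbox_open("../escape.txt", "w")
        escaped = True
    except PermissionError:
        escaped = False
```

The component only ever holds `sandbox_open`, so what it can reach is decided by the site that imported it and not by the component's own code.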
27 Conclusions
– I have shown that making a programming language distributed (Internet ready!) is not just taking a centralized language and extending it with a distribution layer. The design should take distribution into account from scratch.
– I have just touched upon the subject; the topics I did not cover are just as important.
– Look at the Mozart demo, where you will also see the distributed component architecture, in which new functionality can be added on the fly and made available to all participants in a Mozart instant messenger (ICQ-like) Internet service.