CS 620 Advanced Operating Systems

Lecture 7 – Communication Professor Timothy Arndt BU 331

Layered Protocols As we saw previously, network software is often structured as a layered protocol suite. We will now examine these protocols in somewhat more detail. Protocol: An agreement between communicating parties on how communication is to proceed. Error correction codes. Blocksize. Ack/Nak.

Layered Protocols Layered protocol: The protocol decisions concern very different things How many volts is 1 or zero? How wide is the pulse? (low level details) Error correction Routing Sequencing (higher level details) As a result you have many routines that work on the various aspects. They are called layered.

Layered Protocols Layer X of the sender acts as if it is directly communicating with layer X of the receiver, but in fact it is communicating with layer X-1 of the sender. Similarly, layer X of the sender acts as a virtual layer X+1 of the receiver to layer X+1 of the sender. A famous example is the ISO OSI model (International Organization for Standardization, Open Systems Interconnection Reference Model).

Layered Protocols

Layered Protocols So, for example, the network layer sends messages intended for the other network layer but in fact sends them to the data link layer. Also, the network layer must accept messages from the transport layer, which it then sends to the other network layer (really its own data link layer). What a layer really does to a message it receives is add a header (and maybe a trailer) that is to be interpreted by its corresponding layer in the receiver.

Layered Protocols So the network layer adds a header (in front of the transport layer's header) and sends to the other network layer (really its own data link layer that adds a header in front of the network layer's and a trailer). So headers get added as you go down the sender's layers (often called the Protocol Stack or Protocol Suite). They get used (and stripped off) as the message goes up the receiver's stack.
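The header handling described above can be sketched in a few lines of Python. This is only an illustration: the layer names in `LAYERS` and the bracketed header strings are made up for the example, and real headers carry binary fields rather than text tags.

```python
# Hypothetical layer names, top to bottom; the physical layer adds no header.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "datalink"]
TRAILER = "[datalink-trl]"

def send_down(payload: str) -> str:
    """Each layer prepends its header; the data link layer also appends a trailer."""
    msg = payload
    for layer in LAYERS:
        msg = f"[{layer}-hdr]" + msg
        if layer == "datalink":
            msg += TRAILER
    return msg

def receive_up(frame: str) -> str:
    """Each layer strips the header (and trailer) written by its peer layer."""
    msg = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}-hdr]"
        assert msg.startswith(header), f"{layer} header missing"
        msg = msg[len(header):]
        if layer == "datalink":
            assert msg.endswith(TRAILER), "datalink trailer missing"
            msg = msg[:-len(TRAILER)]
    return msg

frame = send_down("hello")
print(frame)               # outermost header belongs to the lowest layer
print(receive_up(frame))   # the receiving process sees only the payload
```

Because each layer touches only its own header, one layer's format can change without the others noticing, which is the independence property the slides point out.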

Layered Protocols

Layered Protocols It all starts with process A sending a message. By the time it reaches the wire it has 6 headers (the physical layer doesn't add one - Why?) and one trailer. The nice thing is that the layers are independent. You can change one layer and not change the others. Physical layer: hardware, i.e. voltages, speeds, connectors. Data link layer: Error correction and detection. "Group the bits into units called frames".

Layered Protocols Frames contain error detection (and correction) bits. This is what the pair of data link layers do when viewed as an extension of the physical. But when being used, the sending DL layer gets a packet from the network layer and breaks it into frames and adds the error detection bits.

Data Link Layer 2-3 Discussion between a receiver and a sender in the data link layer.

Layered Protocols Network layer: Routing.
Connection oriented network-layer protocol: X.25 or ATM. Send a message to destination and establish a route that will be used for further messages during this connection (a connection number is given). Like a telephone call. Connectionless: IP (Internet Protocol). Each packet (message between the network layers) is routed separately. Like the post office.

Layered Protocols Transport layer: make reliable and ordered (but not always). Break incoming message into packets and send to corresponding transport layer (really send to ...). They are sequence numbered. Header contains info as to which packets have been sent and received. These sequence numbers are for the end to end message.

Layered Protocols For example, if grail.cba.csuohio.edu sends a message to another host, the transport layer breaks the message into packets and numbers the packets. These packets may take different routes. On any one hop the data link layer keeps the frames ordered. If you use a connection-oriented network layer, there is little for the transport layer to do. If you use IP for the network layer, there is a lot to do. If we use connection-oriented TCP as the transport layer of a client-server system, it is slower than it needs to be, because connection setup and teardown add messages to every request-reply exchange. Transactional TCP can be used instead.

Client-Server TCP 2-4 Normal operation of TCP. Transactional TCP.

Layered Protocols Session Layer: dialog and synchronization.
Dialog control Synchronization facilities Presentation layer: Describes "meaning" of fields. Record definition Application layer: For specific applications (e.g. mail, news, ftp). Middleware logically resides in the application layer, but contains functionality that is quite general Authentication Authorization Multicast, etc. This leads to a slightly modified reference model

Middleware Protocols 2-5
An adapted reference model for networked communication.

Remote Procedure Call (RPC)
Developed by Birrell and Nelson (1984). Recall how different the client code for copying a file was from the normal centralized (uniprocessor) code. Let’s make the client server request-reply look like a normal procedure call and return. Notice that getchar in the centralized version turns into a read system call. The following is for Unix: read looks like a normal procedure to its caller.

read is a user mode program. read manipulates registers and then does a trap to the kernel. After the trap, the kernel manipulates registers and then calls a C-language routine, and lots of work gets done (drivers, disks, etc). After the I/O, the process gets unblocked, the kernel read manipulates registers, and returns. The user mode read manipulates registers and returns to the original caller. Let’s do something similar with request reply:

User (client) does a subroutine call to getchar (or read). Client knows nothing about messages. We link in a user mode program called the client stub (analogous to the user mode read above). This takes the parameters to read and converts them to a message (marshalls the arguments). Sends a message to machine containing the server directed to a server stub. Does a blocking receive (of the reply message).

The server stub is linked with the server. It receives the message from the client stub. Unmarshalls the arguments and calls the server (as a subroutine). The server procedure does what it does and returns (to the server stub). Server knows nothing about messages. The server stub now converts the result to a reply message (marshalls the result) and sends it to the client stub.

Client stub unblocks and receives the reply. Unmarshalls the arguments. Returns to the client. Client believes (correctly) that the routine it calls has returned just like a normal procedure does.
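The client stub / server stub sequence above might look roughly like this in Python. It is a minimal sketch under stated assumptions: JSON stands in for the marshalling format, the `transport` function stands in for the network send and blocking receive, and all names (`read_file_chunk`, `read_chunk`, `PROCEDURES`) are invented for the example.

```python
import json

# --- server side ------------------------------------------------------------
def read_file_chunk(name: str, offset: int, count: int) -> str:
    """The 'real' server procedure; it knows nothing about messages."""
    files = {"notes.txt": "hello, distributed world"}   # stand-in file system
    return files[name][offset:offset + count]

PROCEDURES = {"read_file_chunk": read_file_chunk}

def server_stub(request: bytes) -> bytes:
    """Unmarshal the request, call the server procedure, marshal the result."""
    msg = json.loads(request.decode("utf-8"))
    result = PROCEDURES[msg["proc"]](*msg["args"])
    return json.dumps({"result": result}).encode("utf-8")

# --- "network" (assumption: a direct call replaces send + blocking receive) ---
def transport(request: bytes) -> bytes:
    return server_stub(request)

# --- client side ------------------------------------------------------------
def read_chunk(name: str, offset: int, count: int) -> str:
    """Client stub: to its caller it looks like an ordinary local procedure."""
    request = json.dumps(
        {"proc": "read_file_chunk", "args": [name, offset, count]}
    ).encode("utf-8")
    reply = transport(request)            # blocks until the reply arrives
    return json.loads(reply.decode("utf-8"))["result"]

print(read_chunk("notes.txt", 7, 11))     # prints "distributed"
```

The caller of `read_chunk` never sees a message, and `read_file_chunk` never sees one either; only the two stubs deal with marshalling.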

Passing Value Parameters (1)
Steps involved in doing remote computation through RPC 2-8

Heterogeneity: Machines have different data formats. How can we handle these differences in RPC? Have conversions between all possibilities. Done during marshalling and unmarshalling. Adopt a standard and convert to/from it.

Passing Value Parameters (2)
Original message on the Pentium The message after receipt on the SPARC The message after being inverted. The little numbers in boxes indicate the address of each byte
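The byte-order problem in this figure can be reproduced with Python's standard `struct` module. The sketch below first shows an integer written little-endian being misread by a big-endian receiver, then shows the "adopt a standard" approach using network byte order. The message layout (a 4-byte integer, a 2-byte length, then an ASCII string) is an assumption chosen for the example.

```python
import struct

value = 5
little = struct.pack("<i", value)           # byte layout on a little-endian sender
misread = struct.unpack(">i", little)[0]    # big-endian receiver reads the bytes as-is
print(little.hex(), misread)                # prints: 05000000 83886080

# Adopt a standard: both stubs convert to/from network (big-endian) byte order.
HEADER = "!iH"                              # int32 value, uint16 string length

def marshal(n: int, s: str) -> bytes:
    data = s.encode("ascii")
    return struct.pack(HEADER, n, len(data)) + data

def unmarshal(msg: bytes):
    n, length = struct.unpack_from(HEADER, msg, 0)
    start = struct.calcsize(HEADER)
    return n, msg[start:start + length].decode("ascii")

print(unmarshal(marshal(5, "JILL")))        # prints: (5, 'JILL')
```

Note that the stubs must know the type of each field: an integer needs its bytes swapped, while a character string must be left in order, so blindly inverting every word would fix one and break the other.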

Pointers: Avoid them for RPC! Can put the object pointed to into the message itself (assuming you know its length). Convert call-by-reference to copyin/copyout If we have in or out parameters (instead of in out) can eliminate one of the copies Change the server to handle pointers in a special way. Callback to client stub
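Converting call-by-reference into copy-in/copy-out can be illustrated as follows. This is a sketch, not a real stub: the client and server stubs are collapsed into one hypothetical helper, `call_by_copy_restore`, and `copy.deepcopy` stands in for sending the pointed-to object inside the message.

```python
import copy

def server_sort(buf: list) -> None:
    """Remote procedure that modifies its argument in place (an in-out parameter)."""
    buf.sort()

def call_by_copy_restore(proc, local_buf: list) -> None:
    shipped = copy.deepcopy(local_buf)   # copy-in: the object travels in the request
    proc(shipped)                        # the server works on its own private copy
    local_buf[:] = shipped               # copy-out: the reply overwrites the caller's object

data = [3, 1, 2]
call_by_copy_restore(server_sort, data)
print(data)                              # prints: [1, 2, 3]
```

For a pure input parameter the copy-out step can be dropped, and for a pure output parameter the copy-in step can be dropped, which is the saving the slide mentions.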

Registering and name servers
As we said before, we can use a name server. This permits the server to move using the following process. deregister from the name server move reregister This is sometimes called dynamic binding.

Registering and name servers
The client stub calls the name server (binder) the first time to get a handle to use for the future. There is a callback from the binder to the client stub if the server deregisters or we could have the attempt to use the handle fail so that the client stub will go to the binder again.
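A rough sketch of dynamic binding follows, using the second option described above (the attempt to use a stale handle fails and the client stub goes back to the binder). The classes `Binder` and `ClientStub`, the address strings, and the `servers` dictionary that simulates reachable machines are all assumptions made for the example.

```python
class Binder:
    """Name server: maps a service name to the server's current address."""
    def __init__(self):
        self._table = {}

    def register(self, name, address):
        self._table[name] = address

    def deregister(self, name):
        self._table.pop(name, None)

    def lookup(self, name):
        return self._table[name]          # KeyError if the service is not registered

class ClientStub:
    def __init__(self, binder, name, servers):
        self.binder, self.name = binder, name
        self.servers = servers            # address -> callable, simulating the network
        self.handle = None                # cached after the first lookup
        self.lookups = 0

    def call(self, arg):
        for _ in range(2):                # one retry after re-binding
            if self.handle is None:
                self.handle = self.binder.lookup(self.name)
                self.lookups += 1
            server = self.servers.get(self.handle)
            if server is not None:
                return server(arg)
            self.handle = None            # handle is stale: ask the binder again
        raise ConnectionError(f"cannot reach {self.name}")

binder, servers = Binder(), {}
servers["hostA:9000"] = lambda x: ("A", x)
binder.register("echo", "hostA:9000")
stub = ClientStub(binder, "echo", servers)
first = stub.call(1)

# The server moves: deregister, move, reregister.
binder.deregister("echo")
del servers["hostA:9000"]
servers["hostB:9000"] = lambda x: ("B", x)
binder.register("echo", "hostB:9000")
second = stub.call(2)
print(first, second, stub.lookups)
```

The client code calling `stub.call` is unchanged by the move; only the stub notices that its cached handle stopped working.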

RPC Failures This gets hard and ugly. Can't find the server.
Need some sort of out-of-band response from the client stub to the client. Ada exceptions C signals Multithread the client and start the "exception" thread. This loses transparency (centralized systems don't have this).

RPC Failures Lost request message. Lost reply message.
This is easy if known. That is, if we are sure the request was lost. Also easy if idempotent and we think it might be lost. Simply retransmit the request. Assumes the client still knows the request. Lost reply message. If it is known the reply was lost, have server retransmit.

RPC Failures Assumes the server still has the reply. How long should the server hold the reply? Wait forever for the reply to be ack'ed? No! Discard after "enough" time. Discard after we receive another request from this client. Ask the client if the reply was received. Keep resending reply. What if we are not sure of whether we lost the request or the reply? If the server is stateless, it doesn't know and the client can't tell! If idempotent, simply retransmit the request.

RPC Failures Server crashes
What if the server is not idempotent and can't tell if we lost the request or the reply? Use sequence numbers so server can tell that this is a new request not a retransmission of a request it has already done. Doesn't work for stateless servers. Server crashes Did it crash before or after doing some nonidempotent action? Can't tell from messages.
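The sequence-number idea can be sketched as below. The `Account` class and its `deposit` operation are hypothetical; the point is that the server keeps, per client, the last sequence number and the reply it produced, so a retransmitted request is answered from the saved reply instead of being executed again.

```python
class Account:
    """Server with a non-idempotent operation, guarded by per-client sequence numbers."""
    def __init__(self):
        self.balance = 0
        self._last = {}     # client id -> (last sequence number, saved reply)

    def deposit(self, client: str, seq: int, amount: int) -> int:
        last = self._last.get(client)
        if last is not None and last[0] == seq:
            return last[1]                      # retransmission: resend the saved reply
        self.balance += amount                  # new request: perform the action once
        self._last[client] = (seq, self.balance)
        return self.balance

acct = Account()
print(acct.deposit("c1", 1, 100))   # prints: 100
print(acct.deposit("c1", 1, 100))   # same request retransmitted, prints: 100
print(acct.deposit("c1", 2, 50))    # prints: 150
```

The `_last` table is exactly the state a stateless server does not keep, which is why this technique does not work for stateless servers. Overwriting the entry on the next request also corresponds to discarding a saved reply once another request arrives from the same client.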

RPC Failures From databases, we get the idea of transactions and commits. This really does solve the problem but is not cheap. Fairly easy to get “at least once” (try request again if timer expires) or “at most once” (give up if timer expires) semantics. Hard to get “exactly once” without transactions. To be more precise: a transaction either happens exactly once or not at all (sounds like at most once) and the client knows which.
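The difference between the two cheap semantics can be shown with a small simulation. Everything here is an assumption for illustration: `LossyChannel` is a made-up channel that executes the request on the server but loses the first few replies, so the client only sees a timeout (modelled as `None`).

```python
class LossyChannel:
    """Hypothetical channel: the server runs the request, but the first
    `drops` replies are lost, so the client's timer expires."""
    def __init__(self, handler, drops):
        self.handler, self.drops = handler, drops

    def call(self, request):
        reply = self.handler(request)     # the server did execute the request
        if self.drops > 0:
            self.drops -= 1
            return None                   # reply lost: the client sees a timeout
        return reply

def at_least_once(channel, request, max_tries=5):
    for _ in range(max_tries):            # try again whenever the timer expires
        reply = channel.call(request)
        if reply is not None:
            return reply
    raise TimeoutError("no reply")

def at_most_once(channel, request):
    reply = channel.call(request)         # give up if the timer expires
    if reply is None:
        raise TimeoutError("gave up after one try")
    return reply

counter = {"n": 0}
def increment(_request):                  # a non-idempotent server operation
    counter["n"] += 1
    return counter["n"]

print(at_least_once(LossyChannel(increment, drops=2), "inc"))   # prints: 3
```

With two lost replies, the at-least-once client causes the increment to run three times. The at-most-once client would instead raise `TimeoutError` after one try, without knowing whether the increment happened.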

RPC Failures Client crashes Orphan computations exist.
Again transactions work but are expensive. We can have the rebooted client start another epoch and all computations of previous epoch are killed and clients resubmit. It is better to let old computations with owners that can be found continue. This isn’t a great solution.

RPC Failures
An orphan may hold locks or might have done something not easily undone. Serious programming is needed.

Implementation Issues
Protocol choice Existing ones like UDP are designed for harder (more general) cases and so are not efficient. Often developers of distributed systems invent their own protocol that is more efficient. But of course they are all different. On a LAN we would like large messages since they are more efficient and don't take so long considering the high data rate.

Acks One per packet vs. one per message. Called stop-and-wait and blast. In former wait for each ack. In blast keep sending packets until message finished. Could also do a hybrid. Blast but ack each packet. Blast but request only those missing instead of general nak. Called selective repeat.
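To make the trade-off concrete, the sketch below counts data transmissions and control messages for stop-and-wait and for blast with selective repeat. It is a simplified model under stated assumptions: `lost_first_try` is the set of packet indices lost on their first transmission, every retransmission is assumed to succeed, and acks are never lost.

```python
def stop_and_wait(packets, lost_first_try):
    """Send one packet, wait for its ack, resend on timeout.
    Returns (data transmissions, acks)."""
    transmissions = acks = 0
    for i, _packet in enumerate(packets):
        transmissions += 1
        if i in lost_first_try:
            transmissions += 1          # timer expired: resend (assumed to succeed)
        acks += 1                       # wait for this ack before sending the next
    return transmissions, acks

def blast_selective_repeat(packets, lost_first_try):
    """Send everything, then resend only what the receiver reports missing.
    Returns (data transmissions, control messages, reassembled message)."""
    received = {i: p for i, p in enumerate(packets) if i not in lost_first_try}
    transmissions = len(packets)
    missing = [i for i in range(len(packets)) if i not in received]
    control = 0
    if missing:
        control += 1                    # one request naming only the missing packets
        for i in missing:
            received[i] = packets[i]
            transmissions += 1
    control += 1                        # one ack for the whole message
    message = [received[i] for i in range(len(packets))]
    return transmissions, control, message

packets = list("abcdef")
print(stop_and_wait(packets, {1, 4}))            # prints: (8, 6)
print(blast_selective_repeat(packets, {1, 4}))
```

In this model both schemes send the same eight data packets, but blast needs two control messages where stop-and-wait needs six acks, and it does not idle waiting for an ack between packets.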

Flow control Buffer overrun problem. Internet worm caused by buffer overrun and rewriting non-buffer space. This is not the problem here. Can occur right at the interface chip, in which case the (later) packet is lost. More likely with blast but can occur with stop and wait if have multiple senders.

What to do If chip needs a delay to do back to back receives have sender delay that amount. If we can only buffer n packets, have sender only send n then wait for ack. The above fails when we have simultaneous sends. But hopefully that is not too common. This tuning to the specific hardware present is one reason why general protocols don't work as well as specialized ones.

Why is RPC slow? We have to...
  Call the stub
  Get a message buffer
  Marshall the parameters
  If using UDP, compute the checksum
  Fill in the headers
  Copy the message to kernel space (unless we have a special kernel)
  Put in the real destination address
  Start DMA to the communication device
  Wire time

Why is RPC slow? We have to...
  Process the interrupt (or polling delay)
  Check the packet
  Determine the relevant stub
  Copy to the stub's address space (unless we have a special kernel)
  Unmarshall
  Call the server
On the Paragon (large Intel MPP of a few years ago), a variety of the above took 30ms of which 1ms was wire time.

Eliminating copying Message transmission is essentially a copy so the minimum number of copies is 1. This requires the network device to do its DMA from the user buffer (client stub) directly into the server stub. But it is hard for the receiver to know where to put the message until it arrives and is inspected. Sounds like a copy is needed from the receiving buffer to the server stub. We can avoid this by adjusting memory maps.

Messages must then be full pages (as that is what is mapped). Normally there are two copies on the receiving side. From a hardware buffer to a kernel buffer. From the kernel buffer to user space (server stub). Often there are two on the sending side. User space (client stub) to kernel buffer. Kernel buffer to buffer on device. Then start the device. The sender ones can be reduced.

The device can do DMA from the kernel buffer thus eliminating the second. Doing DMA from the user would eliminate the first, but we would need scatter gather (just gather here) since the header must be in the kernel space since the user is not allowed to set it (for security). To eliminate the two on the receiver side is harder. We can eliminate the first if the device writes directly into a kernel buffer. To eliminate the second requires the remapping trick.

Timers and timeout values Getting a good value for the timeouts is a black art. Too small a value leads to many unneeded retransmissions. Too large causes us to wait too long when a message is lost. Should it be adaptive?? If we find that we sent an extra message then raise the timeout value for this class of transmissions. If timeout expires most of the time, lower the value for this class.

How to keep timeout values? If you know that almost all timers of this class are going to go off (alarms) and accuracy is important, then keep a list sorted by time to alarm. Only have to scan head for timer (so we can do it frequently). Additions must search for a place to add. Deletions (cancelled alarms) are presumed rare. If deletions are common and we can afford not so accurate an alarm, then sweep list of all processes (not so frequently since accuracy not required). Deletions and additions are easy since list is indexed by process number.
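The first organization (a list sorted by time to alarm) could be sketched like this. The class name `TimerList` and the use of process ids as strings are assumptions for the example; `bisect.insort` from the standard library does the "search for a place to add".

```python
import bisect

class TimerList:
    """Alarms kept sorted by expiry time; each tick only inspects the head."""
    def __init__(self):
        self._alarms = []                 # sorted list of (expiry_time, process_id)

    def add(self, expiry, pid):
        bisect.insort(self._alarms, (expiry, pid))   # search for the place to add

    def cancel(self, pid):
        # Deletions are presumed rare, so a linear scan is acceptable.
        self._alarms = [a for a in self._alarms if a[1] != pid]

    def expire(self, now):
        """Remove and return the pids whose alarms have gone off by time `now`."""
        fired = []
        while self._alarms and self._alarms[0][0] <= now:
            fired.append(self._alarms.pop(0)[1])
        return fired

timers = TimerList()
timers.add(30, "A")
timers.add(10, "B")
timers.add(20, "C")
timers.cancel("C")
print(timers.expire(15))    # prints: ['B']
print(timers.expire(40))    # prints: ['A']
```

The alternative organization, a table indexed by process number that is swept occasionally, makes `add` and `cancel` constant-time at the price of checking every entry on each sweep.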

Difficulties with RPC Global variables like errno inherently have shared-variable semantics and so they don't fit in a distributed system. One (remote) procedure sets the variable and the local procedure is supposed to see it. But the setting is a normal store so is not seen by the communication system. So transparency is violated.

Weak typing (as in C) makes marshalling hard/impossible. How big is the object we should copy? What is the conversion needed if heterogeneous system? So transparency is violated.

How does a programmer create a program with RPC?
  uuidgen generates a unique identifier for the RPC interface
  Include it in an IDL (Interface Definition Language) file and describe the interface for the RPC in the same file
  Write the client and server code
  Client and server stubs are generated automatically from the IDL file
  Link things together and run on the desired machines

Writing a Client and a Server
2-14 The steps in writing a client and a server in DCE RPC.

Binding a Client to an Object
Unlike RPC, distributed objects have systemwide object references The system may support either implicit binding or explicit binding The object reference may contain - IP address, port, object name Or use a location server so we need only address for this server plus the object name

Binding a Client to an Object
(a) Example with implicit binding using only global references:

    Distr_object* obj_ref;       // Declare a systemwide object reference
    obj_ref = …;                 // Initialize the reference to a distributed object
    obj_ref->do_something();     // Implicitly bind and invoke a method

(b) Example with explicit binding using global and local references:

    Distr_object obj_ref;        // Declare a systemwide object reference
    Local_object* obj_ptr;       // Declare a pointer to local objects
    obj_ref = …;                 // Initialize the reference to a distributed object
    obj_ptr = bind(obj_ref);     // Explicitly bind and obtain a pointer to the local proxy
    obj_ptr->do_something();     // Invoke a method on the local proxy

Parameter Passing Since we have systemwide object refs, we don’t have the same types of problems we had with RPCs and pointers However, for performance motives we may want to treat object ref parameters differently depending on where the object resides

Parameter Passing The situation when passing an object by reference or by value. 2-18

Java RMI Java offers remote objects as the only type of distributed object One difference between local and remote objects is that synchronized methods work differently on the two types Blocking applies only to the proxies of the remote objects A parameter passed to an RMI must be serializable

Message-Oriented Communication
Neither RPC nor RMI works when we can’t be sure that the receiving side is executing at the time the request is issued. We can use messaging in this case.

Message-Oriented Communication
General organization of a communication system in which hosts are connected through a network 2-20

Messaging Modes Messaging systems can be either persistent or transient Are messages retained when the senders and/or receivers stop executing? Can also be either synchronous or asynchronous Blocking vs. non-blocking

Persistent Communication
Persistent communication of letters back in the days of the Pony Express.

Persistence and Synchronicity in Communication
Persistent asynchronous communication Persistent synchronous communication 2-22.1

2-22.2 Transient asynchronous communication Receipt-based transient synchronous communication

Delivery-based transient synchronous communication at message delivery Response-based transient synchronous communication

Message-Oriented Transient Communication
Sockets are an example of message-oriented transient communication The Message-Passing Interface (MPI) is a newer set of message-oriented primitives for multicomputers MPI communication takes place within a known group of processes A (groupID, processID) pair uniquely identifies a source or destination of a message

The Message-Passing Interface (MPI)
Some of the most intuitive message-passing primitives of MPI.

Primitive      Meaning
MPI_bsend      Append outgoing message to a local send buffer
MPI_send       Send a message and wait until copied to local or remote buffer
MPI_ssend      Send a message and wait until receipt starts
MPI_sendrecv   Send a message and wait for reply
MPI_isend      Pass reference to outgoing message, and continue
MPI_issend     Pass reference to outgoing message, and wait until receipt starts
MPI_recv       Receive a message; block if there are none
MPI_irecv      Check if there is an incoming message, but do not block

Message-Oriented Persistent Communication
Known as message-queuing systems or Message-Oriented Middleware (MOM). Support persistent asynchronous communication. Generally have slow communications. Similar to e-mail systems. Basic model: applications communicate by inserting messages in specific queues.

Message-Queuing Model
Four combinations for loosely-coupled communications using queues. 2-26

Message-Queuing Model
Basic interface to a queue in a message-queuing system.

Primitive   Meaning
Put         Append a message to a specified queue
Get         Block until the specified queue is nonempty, and remove the first message
Poll        Check a specified queue for messages, and remove the first. Never block.
Notify      Install a handler to be called when a message is put into the specified queue.
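The four primitives can be mimicked in a single process with Python's standard `queue` module. This is only a sketch of the interface: the class name `MessageQueue` is invented, the queue lives in memory, and a real message-queuing system would keep messages on persistent storage so they survive while sender or receiver is not running.

```python
import queue

class MessageQueue:
    """In-memory stand-in for one queue in a message-queuing system."""
    def __init__(self):
        self._q = queue.Queue()
        self._handlers = []

    def put(self, msg):
        """Append a message to the queue, then call any installed handlers."""
        self._q.put(msg)
        for handler in self._handlers:
            handler(self)

    def get(self):
        """Block until the queue is nonempty, and remove the first message."""
        return self._q.get()

    def poll(self):
        """Remove the first message if there is one; never block."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """Install a handler to be called when a message is put into the queue."""
        self._handlers.append(handler)

orders = MessageQueue()
seen = []
orders.notify(lambda q: seen.append("new message"))
orders.put("order-1")
orders.put("order-2")
print(orders.get())     # prints: order-1
print(orders.poll())    # prints: order-2
print(orders.poll())    # prints: None
print(seen)             # prints: ['new message', 'new message']
```

The sender returns from `put` without waiting for any receiver, which is the loosely-coupled style the queuing model is built around.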

General Architecture of a Message-Queuing System
Messages are inserted into a local source queue The message contains the name of a destination queue The message-queuing system transfers messages to the destination queue Use a db which maps queue names to network locations

Queues are managed by queue managers Special queue managers act as relays which forward messages to other managers

The relationship between queue-level addressing and network-level addressing.

The general organization of a message-queuing system with routers. 2-29

Message Brokers Message-queuing systems can be used to integrate existing and new applications These diverse applications have different message formats Since we have old apps, can’t use a standard message format So use message brokers which convert messages from one format to another

Message Brokers 2-30 The general organization of a message broker in a message-queuing system.

Example: IBM MQSeries IBM WebSphere MQ (formerly MQSeries) is used to integrate old apps (generally running on IBM mainframes). Queues are managed by queue managers. Queue managers are connected through message channels. Each of the two ends of a message channel is managed by a message channel agent (MCA). Queue managers can be linked into the same process as the application using the queue. Queue managers implemented using RPC.

Example: IBM WebSphere MQ
General organization of IBM's WebSphere MQ message-queuing system. 2-31

Channels Some attributes associated with message channel agents.
Attribute           Description
Transport type      Determines the transport protocol to be used
FIFO delivery       Indicates that messages are to be delivered in the order they are sent
Message length      Maximum length of a single message
Setup retry count   Specifies maximum number of retries to start up the remote MCA
Delivery retries    Maximum times MCA will try to put received message into queue

Aliases In order to be able to change the name of a queue manager or to replace it with another without having to recompile all of the applications which send messages to it, local aliases are used for queue manager names.

Message Transfer The general organization of an MQ queuing network using routing tables and aliases.

Message Transfer
Primitives available in an IBM MQ MQI.

Primitive   Description
MQopen      Open a (possibly remote) queue
MQclose     Close a queue
MQput       Put a message into an opened queue
MQget       Get a message from a (local) queue

Other Messaging Systems
POSIX message queues are an IPC mechanism for asynchronous communication on Linux/UNIX machines (like named pipes). JMS (Java Message Service) is a Java Message-Oriented Middleware API. It is part of Java EE and is implemented by IBM, BEA and others. Microsoft Message Queueing (MSMQ) is Microsoft’s implementation of MOM. Amazon.com offers the Amazon Simple Queue Service on its servers.

Stream-Oriented Communication
Multimedia systems use stream-oriented communications The timing of the data delivery is critical in such systems Such communication is used for continuous media such as audio where the temporal relationships between different data items are meaningful as opposed to discrete media such as text

Data streams have several modes Asynchronous transmission mode places no timing constraints on the data items in a stream Synchronous transmission mode gives a maximum end-to-end delay for each item in a data stream Isochronous transmission mode gives both maximum and minimum delays Bounded jitter
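The three modes can be expressed as simple checks on measured end-to-end delays. The sketch below is illustrative only: the function names, the millisecond figures, and the definition of jitter as the spread between the largest and smallest delay are assumptions, not part of any standard.

```python
def satisfies_mode(delays_ms, mode, max_delay=None, min_delay=None):
    """Check a list of per-item end-to-end delays against a transmission mode."""
    if mode == "asynchronous":
        return True                                   # no timing constraints
    if mode == "synchronous":
        return all(d <= max_delay for d in delays_ms) # only an upper bound
    if mode == "isochronous":
        return all(min_delay <= d <= max_delay for d in delays_ms)
    raise ValueError(f"unknown mode: {mode}")

def jitter(delays_ms):
    """One simple measure of delay variance: largest delay minus smallest."""
    return max(delays_ms) - min(delays_ms)

delays = [40, 55, 48, 90]
print(satisfies_mode(delays, "synchronous", max_delay=100))                # True
print(satisfies_mode(delays, "isochronous", max_delay=100, min_delay=45))  # False
print(jitter(delays))                                                      # 50
```

In isochronous mode the jitter is bounded by `max_delay - min_delay`, which is why that mode matters for audio and video.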

Streams can be either simple or complex (with several related simple substreams). Related substreams will need to be synchronized. Streams can be seen as a channel between a source and a sink. Source could be a file or multimedia capture device. Sink could be a file or multimedia rendering device.

Data Stream Setting up a stream between two processes across a network.

Data Stream Setting up a stream directly between two devices. 2-35.2

Data Stream An example of multicasting a stream to several receivers.

Streams and QoS Time-dependent requirements are generally expressed as Quality of Service (QoS) requirements The underlying distributed system and network must ensure that these are met Required bit rate Maximum session delay Maximum end-to-end delay Maximum delay variance (jitter)

Using a Buffer to Reduce Jitter

Interleaved Transmission

Setting up a Stream Before a stream is opened between source and sink, resources through the network must be reserved in order to meet the QoS requirements: bandwidth, buffers, and processing capability. Figuring out how much of each is required is difficult since they aren’t specified directly in the QoS. RSVP is a protocol for enabling resource reservations in network routers.

Setting Up a Stream The basic organization of RSVP for resource reservation in a distributed system.

IP Streaming Protocols
RSVP is one protocol for streaming in TCP/IP networks RTP (Real-time Transport Protocol) and its control protocol RTCP (RTP Control Protocol) are designed for real-time transfer of streaming media Run on top of UDP RTSP (Real-time Streaming Protocol) enables VCR-like commands (play, pause) for streaming media players.

Stream Synchronization
An important issue is that different streams (possibly substreams of a complex stream) must be synchronized Continuous with discrete Continuous with continuous (more difficult) Different levels of granularity for syncing required depending on situation

Synchronization Mechanisms
Synchronization can be carried out by the application Can also be supplied by a middleware layer Complex streams are multiplexed according to a given synchronization specification (e.g. MPEG) Syncing can occur either at the sending or receiving end.

The principle of explicit synchronization on the level of data units.

The principle of synchronization as supported by high-level interfaces. 2-41

Multicast Communication
Much research has been done on enhancing network protocols by adding support for sending a message to multiple receivers (multicasting) No standard has yet emerged Peer-to-peer technology has been implemented with some success at the application level Multicasting has been implemented here - routers don’t need to be changed

Multicast Communication
The nodes organize themselves into an overlay network This is a logical organization, not a physical one May take the form of a tree or mesh A node which wants to start a multicast session becomes the root of a multicast tree. Other nodes may join the multicast group by becoming nodes of the logical tree
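An application-level multicast tree of this kind can be sketched as follows. The `OverlayNode` class and the node names are hypothetical, and a direct method call stands in for sending the message over a network link between two overlay neighbours.

```python
class OverlayNode:
    """A node in a logical (overlay) multicast tree."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.received = []

    def join(self, parent):
        """Join the multicast group by becoming a child of an existing tree node."""
        parent.children.append(self)

    def multicast(self, msg):
        """Deliver locally, then forward to each child; returns links used."""
        self.received.append(msg)
        forwards = 0
        for child in self.children:
            forwards += 1 + child.multicast(msg)
        return forwards

root = OverlayNode("root")      # the node that started the session
a, b, c = OverlayNode("a"), OverlayNode("b"), OverlayNode("c")
a.join(root)
b.join(root)
c.join(a)
print(root.multicast("frame-1"))                    # prints: 3
print([n.received for n in (root, a, b, c)])
```

All forwarding is done by the end hosts themselves, so no router has to be changed; the cost is that one overlay link may cross the same physical link more than once.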
