Published by Silvester Stanley. Modified over 9 years ago.
1
Remote Procedure Call: An Effective Primitive for Distributed Computing
Seth James Nielson
2
What is RPC?
Procedure calls transfer control within local memory
RPCs transfer control to remote machines
[Figure: memory layout showing Main, Proc A, Proc B, Unused]
3
Why RPC?
RPC is an effective primitive for distributed systems because of:
–Clean/simple semantics
–Communication efficiency
–Generality
4
How it Works (Idealized Example)
[Figure: a CLIENT and a SERVER exchanging a Request and a Response; the server has specialized hardware and the encryption key]
CLIENT: localCall() … c = encrypt(msg) (wait…) localCall() …
SERVER: encrypt(msg) implementation
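The idealized exchange above can be sketched in a few lines. This is a minimal in-process sketch under stated assumptions, not how Cedar RPC is built: the JSON wire format, the `Server` and `ClientStub` names, and the toy XOR cipher standing in for the server's hardware are all hypothetical.

```python
import json

class Server:
    """Holds the real implementation (and, in the slide's story, the key)."""
    def __init__(self, key: int):
        self._key = key
        # Hypothetical dispatch table: procedure name -> implementation.
        self._procs = {"encrypt": self._encrypt}

    def _encrypt(self, msg: str) -> list:
        # Toy XOR "cipher" standing in for the specialized hardware.
        return [b ^ self._key for b in msg.encode("utf-8")]

    def handle(self, request: bytes) -> bytes:
        call = json.loads(request)                              # unmarshal the request
        result = self._procs[call["proc"]](*call["args"])
        return json.dumps({"result": result}).encode("utf-8")   # marshal the response

class ClientStub:
    """Looks like a local object; each call becomes a request/response pair."""
    def __init__(self, transport):
        self._transport = transport            # any callable: bytes -> bytes

    def encrypt(self, msg: str) -> list:
        request = json.dumps({"proc": "encrypt", "args": [msg]}).encode("utf-8")
        response = self._transport(request)    # the caller waits here
        return json.loads(response)["result"]

server = Server(key=0x2A)
stub = ClientStub(transport=server.handle)     # a direct call stands in for the network
c = stub.encrypt("hi")                         # reads like c = encrypt(msg)
```

The caller's code never mentions packets or the key; swapping `transport` for a real network send would not change the call site.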
5
Early History of RPC
1976: early reference in literature
1976-1984: few full implementations
Feb 1984: Cedar RPC
–A. Birrell, B. Nelson at Xerox
–“Implementing Remote Procedure Calls”
6
Imagine our Surprise…
“In practice, … several areas [of RPC] were inadequately understood”
7
RPC Design Issues
1. Machine/communication failures
2. Address-containing arguments
3. Integration into existing systems
4. Binding
5. Suitable protocols
6. Data integrity/security
8
Birrell and Nelson Aims
Primary Aim
–Easy distributed computation
Secondary Aims
–Efficient (with powerful semantics)
–Secure
9
Fundamental Decisions
1. No shared address space among computers
2. Semantics of remote procedure calls should be as close as possible to local procedure calls
Note that the first decision partially violates the second…
10
Binding
Binds an importer to an exporter
Interface name: type/instance
Uses the Grapevine DB to locate an appropriate exporter
Bindings (based on unique ID) break if the exporter crashes and restarts
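The type/instance lookup can be sketched with a small in-memory table. The `Registry` class, its method names, and the sample interface and address strings are hypothetical stand-ins for the Grapevine database, whose real interface is not shown in the slides.

```python
class Registry:
    """Hypothetical stand-in for the Grapevine database."""
    def __init__(self):
        self._members = {}    # interface type -> set of instance names
        self._address = {}    # instance name -> network address

    def export(self, iface_type, instance, address):
        self._address[instance] = address
        self._members.setdefault(iface_type, set()).add(instance)

    def lookup(self, iface_type, instance=None):
        """Return (instance, address); pick any exporter if no instance is named."""
        candidates = self._members.get(iface_type, set())
        if instance is None:
            if not candidates:
                raise LookupError(f"no exporter for {iface_type}")
            instance = sorted(candidates)[0]   # arbitrary but deterministic choice
        elif instance not in candidates:
            raise LookupError(f"{instance} does not export {iface_type}")
        return instance, self._address[instance]

reg = Registry()
reg.export("FileAccess", "server-b", "addr-b")
reg.export("FileAccess", "server-a", "addr-a")
any_exporter = reg.lookup("FileAccess")             # importer names only the type
named = reg.lookup("FileAccess", "server-b")        # importer names type and instance
```

Naming only the type lets the importer accept any exporter of that interface; naming the instance pins the binding to one machine.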
11
Unique ID
At binding, importer learns of exported interface’s Unique ID (UID)
The UID is initialized by a real-time clock on system start-up
If the system crashes and restarts, the UID will be a new unique number
The change in UID breaks existing connections
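A minimal sketch of how a UID check makes stale bindings fail after a restart. The `Exporter` and `Binding` classes are hypothetical, and the counter used as a tiebreak is an added assumption so that two restarts within one clock tick still get distinct UIDs.

```python
import itertools
import time

class Exporter:
    """Hypothetical exporter whose UID changes on every (re)start."""
    _tiebreak = itertools.count()   # assumption: keeps UIDs distinct on a coarse clock

    def __init__(self):
        self.restart()

    def restart(self):
        # The slide says the UID is initialized from a real-time clock at start-up.
        self.uid = (time.time_ns(), next(self._tiebreak))

    def call(self, uid, proc, *args):
        if uid != self.uid:
            raise ConnectionError("stale binding: exporter has restarted")
        return proc(*args)

class Binding:
    """What the importer remembers after binding."""
    def __init__(self, exporter):
        self.exporter = exporter
        self.uid = exporter.uid     # learned once, at bind time

    def invoke(self, proc, *args):
        return self.exporter.call(self.uid, proc, *args)

exporter = Exporter()
binding = Binding(exporter)
before = binding.invoke(abs, -3)    # works: the UIDs match
exporter.restart()                  # simulated crash and restart
try:
    binding.invoke(abs, -3)
    broken = False
except ConnectionError:
    broken = True
```

Failing the call outright forces the importer to rebind, rather than letting it talk to a server that has lost all state from before the crash.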
12
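How Cedar RPC works
[Figure: sequence of events across the Caller Machine (User, User Stub, RPCRuntime), Grapevine, and the Callee Machine (RPCRuntime, Server Stub, Server)]
Export: Server export → Server Stub export → RPCRuntime record, setConnect → Grapevine update, addmember
Import: User import → User Stub import → RPCRuntime getConnect → Grapevine lookup → bind(A,B) → callee RPCRuntime lookup → return → record → return
Call: x=F(y) → User Stub F=>3 → transmit → callee RPCRuntime check 3 → Server Stub 3=>F → Server F(y)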
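The `F=>3` and `3=>F` steps in the figure suggest that the stubs exchange a small entry-point index instead of a procedure name. A sketch of that idea, assuming a hypothetical interface whose fourth procedure is `F`; the other procedure names and the body of `F` are made up.

```python
# Hypothetical interface: the procedure order fixes each entry-point index.
INTERFACE = ["open", "close", "read", "F"]    # "F" lands at index 3, as in the figure

def F(y):
    return y * 2    # placeholder body; the slides do not say what F computes

SERVER_PROCS = {"F": F}

def user_stub(name, *args):
    """Caller side: replace the procedure name with its index (F => 3)."""
    return (INTERFACE.index(name), args)

def server_stub(packet):
    """Callee side: check the index, map it back (3 => F), and call the procedure."""
    index, args = packet
    if not 0 <= index < len(INTERFACE):
        raise ValueError("unknown entry point")
    name = INTERFACE[index]
    return SERVER_PROCS[name](*args)

packet = user_stub("F", 21)
x = server_stub(packet)     # x = F(y), executed on the "callee"
```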
13
Packet-Level Transport Protocol
Primary goal: minimize time between initiating the call and getting results
NOT general – designed for RPC
Why? Possible 10X performance gain
No upper bound on waiting for results
Error Semantics: User does not know if machine crashed or network failed
14
Creating RPC-enabled Software
[Figure: the Developer writes User Code, Server Code, and Interface Modules; Lupine produces the User Stub and Server Stub]
Client Machine: Client Program = User Code, User Stub, RPCRuntime
Server Machine: Server Program = Server Code, Server Stub, RPCRuntime
15
Making it Faster
Simple Calls (common case): all of the arguments fit in a single packet
The server's reply and the client's 2nd RPC each operate as an implicit ACK
Explicit ACKs required if call lasts longer or there is a longer interval between calls
16
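Simple Calls
[Figure: packets exchanged between CLIENT and SERVER]
CLIENT → SERVER: Call
SERVER → CLIENT: Response/ACK
CLIENT → SERVER: Call/ACK
SERVER → CLIENT: Response/ACK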
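The saving from implicit ACKs can be seen by counting packets. In this sketch the `explicit_acks=True` mode is a hypothetical baseline added only for comparison; the slides do not describe such a mode.

```python
def simple_call_trace(n_calls, explicit_acks=False):
    """Return the packets exchanged for n back-to-back single-packet calls."""
    trace = []
    for i in range(n_calls):
        trace.append(("client", f"call {i}"))        # also acknowledges response i-1
        if explicit_acks:
            trace.append(("server", f"ack call {i}"))
        trace.append(("server", f"response {i}"))    # also acknowledges call i
        if explicit_acks:
            trace.append(("client", f"ack response {i}"))
    return trace

implicit = simple_call_trace(2)
explicit = simple_call_trace(2, explicit_acks=True)
```

With implicit ACKs, two calls cost four packets, matching the four arrows in the figure; the hypothetical baseline needs eight.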
17
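Complex Calls
[Figure: packets exchanged between CLIENT and SERVER]
CLIENT → SERVER: Call (pkt 0)
SERVER → CLIENT: ACK pkt 0
CLIENT → SERVER: Data (pkt 1)
SERVER → CLIENT: ACK pkt 1
CLIENT → SERVER: Data (pkt 2)
SERVER → CLIENT: Response/ACK
CLIENT → SERVER: ACK or New Call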
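A sketch of the packet sequence for a call whose arguments do not fit in one packet. The function name, the packet size, and the tuple layout of the trace are illustrative assumptions; only the order of packets follows the figure.

```python
def complex_call_trace(payload: bytes, packet_size: int):
    """Stop-and-wait trace for a call whose arguments span several packets."""
    chunks = [payload[i:i + packet_size]
              for i in range(0, len(payload), packet_size)] or [b""]
    trace = []
    last = len(chunks) - 1
    for seq, chunk in enumerate(chunks):
        kind = "call" if seq == 0 else "data"
        trace.append(("client", f"{kind} pkt {seq}", len(chunk)))
        if seq < last:
            # Every packet except the last is acknowledged explicitly.
            trace.append(("server", f"ack pkt {seq}", 0))
    # The response doubles as the acknowledgement of the last packet.
    trace.append(("server", "response/ack", 0))
    return trace

trace = complex_call_trace(b"x" * 2500, packet_size=1000)
labels = [label for _, label, _ in trace]
```

Only one packet is outstanding at a time, so the receiver needs buffer space for a single packet; the cost is one extra round trip per additional packet.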
18
Keeping it Light
A connection is just shared state
Reduce process creation/swapping
–Maintain idle server processes
–Each packet has a process identifier to reduce swap
–Full scheme results in no processes created/four process swaps per call
RPC directly on top of Ethernet
19
Elapsed Time Performance
Number of Args/Results    Time
0                         1097 µs
100                       1278 µs
100 word array            2926 µs
20
THE NEED FOR SPEED
RPC performance cost is a barrier (Cedar RPC requires about 1.1 ms, i.e. 1097 µs, for a 0 arg call!)
Peregrine RPC (about nine years later) manages a 0 arg call in 0.573 ms (573 µs)!
21
A Few Definitions
Hardware latency – Sum of call/result network penalty
Network penalty – Time to transmit (greater than…)
Network transmission time – Raw Network Speed
Network RPC – RPC between two machines
Local RPC – RPC between separate threads
22
Peregrine RPC
Supports full functionality of RPC
Network RPC performance close to HW latency
Also supports efficient local RPC
23
Messing with the Guts
Three General Optimizations
Three RPC-Specific Optimizations
24
General Optimizations
1. Transmitted arguments avoid copies
2. No conversion for client/server with the same data representation
3. Use of packet header templates that avoid recomputation per call
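The header-template idea in item 3 can be sketched as follows. The header layout here (source, destination, procedure, sequence number) is a hypothetical example and not Peregrine's actual packet format.

```python
import struct

# Hypothetical header layout (not Peregrine's): source, destination, procedure, sequence.
HEADER = struct.Struct("!HHHI")
SEQ_OFFSET = 6    # three 2-byte fields precede the 4-byte sequence number

def make_template(src: int, dst: int, proc: int) -> bytearray:
    """Build the header once per binding; only the sequence number changes later."""
    return bytearray(HEADER.pack(src, dst, proc, 0))

def stamp(template: bytearray, seq: int) -> bytes:
    """Per call: overwrite just the sequence field instead of rebuilding the header."""
    struct.pack_into("!I", template, SEQ_OFFSET, seq)
    return bytes(template)

template = make_template(src=1, dst=2, proc=3)
header = stamp(template, seq=7)
```

The fields that stay constant for the life of a binding are packed once, so the per-call work shrinks to a single 4-byte write.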
25
RPC-Specific Optimizations
1. No thread-specific state is saved between calls in the server
2. Server arguments are mapped (not copied)
3. No copying in the critical path of multi-packet arguments
26
I think this is COOL
To avoid copying arguments from a single-packet RPC, Peregrine arranges instead to use the packet buffer itself as the server thread’s stack
Any pointers are replaced with server-appropriate pointers (Cedar RPC didn’t support this…)
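Python cannot turn a packet buffer into a thread's stack, so the following is only an analogy for the no-copy idea: the arguments are read in place through a `memoryview` instead of being copied out of the received packet. The 4-byte header and the `add` procedure are hypothetical.

```python
import struct

HEADER_LEN = 4    # hypothetical 4-byte header, just to have something to skip

# A received packet: header followed by two 32-bit integer arguments.
packet = bytearray(struct.pack("!I", 0xABCD) + struct.pack("!ii", 7, 35))

# A memoryview slice refers to the same bytes as the packet buffer; nothing is copied.
args_view = memoryview(packet)[HEADER_LEN:]
a, b = struct.unpack_from("!ii", args_view)

def add(x, y):
    return x + y

result = add(a, b)

# The view aliases the buffer: a write to the packet shows through the view.
struct.pack_into("!i", packet, HEADER_LEN, 8)
aliased = struct.unpack_from("!i", args_view)[0]
```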
27
This is cool too
Multi-packet RPCs use blast protocol (selective retransmission)
Data is transmitted in parallel with data copy
Last packet is mapped into place
28
Fast Multi-Packet Receive
[Figure: packets Header0/Data 0 through Header3/Data 3 and their placement in the server's buffer relative to a page boundary]
Packets 1-3 data are copied into buffer at server
Packet 0 buffer (sent last) is remapped at server
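A sketch of a blast transfer with selective retransmission: all packets go out without per-packet ACKs, and only the ones reported missing are resent. The `deliver` callback is a hypothetical lossy channel, and the sketch ignores the copy/remap step shown in the figure.

```python
def blast_send(packets, deliver):
    """Send all packets without per-packet ACKs, then resend only what was lost.

    `deliver(seq, data, attempt)` is a hypothetical lossy channel that
    returns True when the packet arrives.
    """
    received = {}
    missing = list(range(len(packets)))
    sent = 0
    attempt = 0
    while missing:
        for seq in missing:
            sent += 1
            if deliver(seq, packets[seq], attempt):
                received[seq] = packets[seq]
        # The receiver reports which sequence numbers are still missing.
        missing = [seq for seq in missing if seq not in received]
        attempt += 1
    return b"".join(received[i] for i in range(len(packets))), sent

def drop_packet_2_once(seq, data, attempt):
    # Lose packet 2 on the first round only.
    return not (seq == 2 and attempt == 0)

data, sent = blast_send([b"aa", b"bb", b"cc", b"dd"], drop_packet_2_once)
```

One lost packet costs one extra transmission here, where a scheme that restarts the whole transfer would resend all four.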
29
Peregrine 0-Arg Performance
System             Latency (µs)    Throughput (Mbps)
Cedar              1097            2.0
Amoeba**           1100            6.4
x-kernel           1730            7.1
V-System           2540            4.4
Firefly (5 CPU)    2660            4.6
Sprite             2800            5.7
Firefly (1 CPU)    4800            2.5
SunRPC**           6700            2.7
Peregrine          573             8.9
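The latency figures above can be turned into speed-up ratios with a little arithmetic. The slides do not say what hardware each figure was measured on, so the ratios are best read as a rough guide; the variable names are illustrative.

```python
# 0-arg latencies in microseconds, copied from the table above.
latency_us = {
    "Cedar": 1097, "Amoeba": 1100, "x-kernel": 1730, "V-System": 2540,
    "Firefly (5 CPU)": 2660, "Sprite": 2800, "Firefly (1 CPU)": 4800,
    "SunRPC": 6700, "Peregrine": 573,
}

# How many times longer each system takes than Peregrine for a 0-arg call.
speedup = {
    name: t / latency_us["Peregrine"]
    for name, t in latency_us.items()
    if name != "Peregrine"
}
cedar_speedup = round(speedup["Cedar"], 2)
slowest = max(speedup, key=speedup.get)
```

By these numbers Peregrine's 0-arg call takes a little over half the time of Cedar's.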
30
Peregrine Multi-Packet Performance
Procedure                  Network Penalty (ms)    Latency (ms)    Throughput (Mbps)
3000 byte in RPC           2.71                    3.20            7.50
3000 byte in-out RPC       5.16                    6.04            7.95
48000 byte in RPC          40.96                   43.33           8.86
48000 byte in-out RPC      81.66                   86.29           8.90
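The throughput column can be cross-checked against the latency column. This sketch assumes an "in-out" call moves the stated number of bytes in each direction (twice the payload in total); the slide does not say so, but the arithmetic is consistent with that reading.

```python
# name -> (bytes moved, latency in ms), taken from the table above.
# Assumption: "in-out" moves the payload once in each direction.
rows = {
    "3000 byte in":      (3000,      3.20),
    "3000 byte in-out":  (2 * 3000,  6.04),
    "48000 byte in":     (48000,     43.33),
    "48000 byte in-out": (2 * 48000, 86.29),
}

def throughput_mbps(n_bytes, latency_ms):
    # bits divided by microseconds gives megabits per second
    return n_bytes * 8 / (latency_ms * 1000)

computed = {name: round(throughput_mbps(b, ms), 2) for name, (b, ms) in rows.items()}
```

The computed values agree with the reported 7.50, 7.95, 8.86, and 8.90 Mbps, which also confirms how the run-together digits in the table split into columns.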
31
Cedar RPC Summary
Cedar RPC introduced practical RPC
Demonstrated easy semantics
Identified major design issues
Established RPC as effective primitive
32
Peregrine RPC Summary
Same RPC semantics (with addition of pointers)
Significantly faster than Cedar RPC and others
General optimizations (e.g., pre-computed headers)
RPC-Specific (e.g., no copying in multipacket critical path)
33
Observations
RPC is a very “transparent” mechanism – it acts like a local call
However, RPC requires a deep understanding of hardware to tune
In short, RPC requires sophistication in its presentation as well as its operation to be viable