Fast Communication: Firefly RPC and Lightweight RPC  CS 614  Tuesday, March 13, 2001  Jeff Hoy

Why Remote Procedure Call? Simplify building distributed systems and applications  Looks like local procedure call  Transparent to user Balance between semantics and efficiency  Universal programming tool Secure inter-process communication

RPC Model [Diagram: a call travels Client Application -> Client Stub -> Client Runtime -> Network -> Server Runtime -> Server Stub -> Server Application; the return follows the reverse path.]

RPC In Modern Computing CORBA and Internet Inter-ORB Protocol (IIOP)  Each CORBA server object exposes a set of methods DCOM and Object RPC  Built on top of RPC Java and Java Remote Method Protocol (JRMP)  Interface exposes a set of methods XML-RPC, SOAP  RPC over HTTP and XML

Goals Firefly RPC  Inter-machine Communication  Maintain Security and Functionality  Speed Lightweight RPC  Intra-machine Communication  Maintain Security and Functionality  Speed

Firefly RPC Hardware  DEC Firefly multiprocessor  1 to 5 MicroVAX CPUs per node  Concurrency considerations  10 megabit Ethernet Takes advantage of 5 CPUs

Fast Path in an RPC Transport Mechanisms  IP / UDP  DECNet byte stream  Shared memory (intra-machine only) The transport is determined at bind time  Implemented inside the transport procedures “Starter”, “Transporter”, and “Ender”, plus “Receiver” on the server side
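
The transport procedures can be pictured as a small table of function pointers chosen once at bind time. A minimal sketch in C, assuming illustrative names (struct Transport, udp_transport, and so on); the real Firefly code is written in Modula2+ and its interfaces differ:

    #include <stddef.h>

    typedef struct Packet Packet;        /* opaque call/result packet */

    typedef struct Transport {
        Packet *(*starter)(size_t arg_bytes);   /* get a packet buffer        */
        Packet *(*transporter)(Packet *call);   /* transmit, wait for a reply */
        void    (*ender)(Packet *result);       /* free the result packet     */
    } Transport;

    /* Hypothetical concrete transports.  Which one a binding uses is fixed
       at bind time, so the stubs never test the transport kind per call. */
    extern const Transport udp_transport;       /* IP / UDP                    */
    extern const Transport decnet_transport;    /* DECNet byte stream          */
    extern const Transport shm_transport;       /* shared memory, same machine */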

Caller Stub Gets control from the calling program  Calls “Starter” to get a packet buffer  Copies arguments into the buffer  Calls “Transporter” and waits for the reply  Copies result data into the caller’s result variables  Calls “Ender” to free the result packet
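
A minimal C sketch of that caller-stub sequence for a hypothetical remote_add procedure; Starter, Transporter, and Ender are the names used above, while the Packet layout and signatures are assumptions:

    #include <string.h>

    typedef struct { char data[1500]; } Packet;

    extern Packet *Starter(void);              /* allocate a call packet        */
    extern Packet *Transporter(Packet *call);  /* transmit, block for the reply */
    extern void    Ender(Packet *result);      /* free the result packet        */

    int remote_add(int a, int b)               /* hypothetical remote procedure */
    {
        Packet *call = Starter();                    /* 1. get a packet buffer  */
        memcpy(call->data, &a, sizeof a);            /* 2. marshal arguments    */
        memcpy(call->data + sizeof a, &b, sizeof b);

        Packet *reply = Transporter(call);           /* 3. send, wait for reply */

        int sum;
        memcpy(&sum, reply->data, sizeof sum);       /* 4. copy result out      */

        Ender(reply);                                /* 5. free the result packet */
        return sum;
    }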

Server Stub Receives the incoming packet  Copies data onto the stack or into a new data block, or leaves it in the packet  Calls the server procedure  Copies the result into the call packet and transmits it
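
The matching server-stub sketch, again hypothetical C rather than the generated Firefly stubs; note how the result is written back into the same call packet:

    #include <string.h>

    typedef struct { char data[1500]; } Packet;

    extern void SendReply(Packet *p);            /* transmit the reply packet */

    static int add_impl(int a, int b) { return a + b; }  /* server procedure */

    void add_server_stub(Packet *call)
    {
        int a, b;
        memcpy(&a, call->data, sizeof a);        /* arguments read straight
                                                    out of the call packet   */
        memcpy(&b, call->data + sizeof a, sizeof b);

        int sum = add_impl(a, b);                /* call the server procedure */

        memcpy(call->data, &sum, sizeof sum);    /* result goes back into the
                                                    same call packet          */
        SendReply(call);                         /* transmit the reply        */
    }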

Transport Mechanism “Transporter” procedure  Completes the RPC header  Calls “Sender” to complete the UDP, IP, and Ethernet headers (Ethernet is the chosen means of communication)  Invokes the Ethernet driver via a kernel trap and queues the packet
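
A sketch of the transmit half of “Transporter” in C; the Frame layout and the helper names are assumptions, not the real Modula2+ runtime:

    #include <stdint.h>

    typedef struct {
        uint8_t eth_ip_udp[42];   /* Ethernet + IP + UDP headers            */
        uint8_t rpc_header[16];   /* call identifier, procedure index, ...  */
        uint8_t args[1442];       /* marshalled arguments                   */
    } Frame;

    extern void fill_rpc_header(Frame *f);       /* done by "Transporter"   */
    extern void fill_net_headers(Frame *f);      /* done by "Sender"        */
    extern void ether_enqueue_trap(Frame *f);    /* kernel trap to driver   */

    void transmit_call(Frame *f)
    {
        fill_rpc_header(f);        /* complete the RPC header               */
        fill_net_headers(f);       /* "Sender": UDP, IP, Ethernet headers   */
        ether_enqueue_trap(f);     /* invoke the Ethernet driver via a
                                      kernel trap and queue the packet      */
    }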

Transport Mechanism “Receiver” procedure  A server thread awakens in “Receiver”  “Receiver” calls the interface stub identified in the received packet, and the interface stub calls the procedure stub  The reply path is similar

Threading Client application creates the RPC thread Server application creates the call thread  Call threads operate in the server application’s address space  No need to spawn an entire process  Threads must handle locking of shared resources
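
A sketch of the idea using POSIX threads (Firefly used its own threads package, so this is only an analogy): each call runs on a thread inside the server's address space, so shared state needs explicit locking:

    #include <pthread.h>

    static long calls_served;                   /* state shared by call threads */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *handle_call(void *arg)
    {
        (void)arg;
        /* ... unmarshal, run the server procedure, marshal the reply ... */
        pthread_mutex_lock(&lock);              /* threads share the server's
                                                   address space, so shared
                                                   data must be locked        */
        calls_served++;
        pthread_mutex_unlock(&lock);
        return 0;
    }

    void dispatch_call(void)
    {
        pthread_t t;
        pthread_create(&t, 0, handle_call, 0);  /* a thread, not a whole process */
        pthread_detach(t);
    }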

Threading [diagram not included in the transcript]

Performance Enhancements Over traditional RPC  Stubs marshal arguments themselves rather than handing them to library functions  RPC procedures are called through procedure variables rather than via a lookup table  The server retains the call packet for results  Buffers reside in shared memory Sacrifices abstract structure

Performance Analysis Null() Procedure  No arguments or return value  Measures base latency of RPC mechanism Multi-threaded caller and server
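
A sketch of how such a measurement might be coded: time a batch of calls and divide by the count. The no-op Null() below stands in for the real generated RPC stub, and the clock_gettime() timing is an assumption, not the paper's method:

    #include <stdio.h>
    #include <time.h>

    static void Null(void) { }            /* stand-in for the RPC Null() stub */

    int main(void)
    {
        enum { N = 10000 };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            Null();                       /* measures base latency only:
                                             no arguments, no results       */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6
                  + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("Null(): %.2f us per call\n", us / N);
        return 0;
    }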

Time for 10,000 RPCs Base latency – 2.66 ms MaxResult latency (1500 bytes) – 6.35 ms

Send and Receive Latency [chart not included in the transcript]

With larger packets, transmission time dominates  Overhead becomes less of an issue  Good for Firefly RPC, assuming large transmissions over the network  Is the overhead acceptable for intra-machine communication?

Stub Latency Significant overhead for small packets

Fewer Processors Seconds for 1,000 Null() calls [table not included in the transcript]

Fewer Processors Why the slowdown with one processor?  The fast path can be followed only in a multiprocessor environment  Lock conflicts, scheduling problems Why little speedup past two processors?

Future Improvements Hardware  A faster network will help with larger packets  Tripling CPU speed would reduce Null() time by 52% and MaxResult time by 36% Software  Omit IP and UDP headers for Ethernet datagrams: 2-4% gain  Redesign the RPC protocol: ~5% gain  Let threads busy-wait: 10-15% gain  Write more in assembler: 5-10% gain

Other Improvements Firefly RPC handles intra-machine communication through the same mechanisms as inter-machine communication Firefly RPC also has very high overhead for small packets Does this matter?

RPC Size Distribution Majority of RPC transfers under 200 bytes

Frequency of Remote Activity Most calls are to the same machine

Traditional RPC Most calls are small messages that take place between domains of the same machine Traditional RPC contains unnecessary overhead, like  Scheduling  Copying  Access validation

Lightweight RPC (LRPC) Also written for the DEC Firefly system Mechanism for communication between different protection domains on the same system Significant performance improvements over traditional RPC

Overhead Analysis Theoretical minimum to invoke Null() across domains: a kernel trap + context change to call, and a trap + context change to return Theoretical minimum on the Firefly: 109 us Actual cost: 464 us

Sources of Overhead 355 us added (464 us actual - 109 us minimum)  Stub overhead  Message buffer overhead  Not so much in Firefly RPC  Message transfer and flow control  Scheduling and abstract threads  Context switch

Implementation of LRPC Similar to RPC The call to the server is made through a kernel trap  The kernel validates the caller  Servers export interfaces  Clients bind to server interfaces before making a call

Binding Servers export interfaces through a clerk  The clerk registers the interface  Clients bind to the interface through a call to the kernel  Server replies with an entry address and size of its A-stack  Client gets a Binding Object from the kernel
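
A rough C sketch of that binding handshake; the Binding structure, clerk_register, and kernel_bind are invented names standing in for the steps listed above:

    #include <stddef.h>

    typedef struct {
        void  *entry_addr;     /* server entry address returned at bind time  */
        size_t astack_size;    /* size of the shared argument stack (A-stack) */
        int    binding_id;     /* handle for the kernel's Binding Object      */
    } Binding;

    /* Server side: the clerk registers the exported interface. */
    extern void clerk_register(const char *interface_name,
                               void *entry_addr, size_t astack_size);

    /* Client side: bind through the kernel, which validates the request,
       sets up the shared A-stacks, and returns a Binding Object. */
    extern int kernel_bind(const char *interface_name, Binding *out);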

Calling Each procedure is represented by a stub Client makes a call through the stub  Manages A-stacks  Traps to the kernel  Kernel switches context to the server  Server returns by its own stub  No verification needed
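
A hedged sketch of the client side of one LRPC call; astack_get, kernel_lrpc_trap, and the argument layout are assumptions used only to show the order of operations:

    #include <string.h>

    typedef struct { char *base; unsigned size; } AStack;   /* shared A-stack */

    extern AStack *astack_get(int binding_id);   /* take a free A-stack        */
    extern void    astack_put(AStack *s);        /* return it after the call   */
    extern void    kernel_lrpc_trap(int binding_id, AStack *s);
                                                 /* kernel validates the caller
                                                    and switches the thread
                                                    into the server's domain   */

    int lrpc_add(int binding_id, int a, int b)   /* hypothetical client stub   */
    {
        AStack *s = astack_get(binding_id);
        memcpy(s->base, &a, sizeof a);           /* copy arguments onto the
                                                    shared A-stack             */
        memcpy(s->base + sizeof a, &b, sizeof b);

        kernel_lrpc_trap(binding_id, s);         /* runs the server's entry
                                                    stub; the server returns
                                                    through its own stub       */
        int sum;
        memcpy(&sum, s->base, sizeof sum);       /* result comes back on the
                                                    same A-stack               */
        astack_put(s);
        return sum;
    }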

Stub Generation Procedure representation  Call stub for the client  Entry stub for the server LRPC merges protocol layers The stub generator creates run-time stubs in assembly language  Portability is sacrificed for performance  Falls back on Modula2+ for complex calls

Multiple Processors LRPC caches domains on idle processors  Kernel checks for an idling processor in the server domain  If a processor is found, caller thread can execute on the idle processor without switching context
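
A sketch of that dispatch decision; find_idle_processor and the other names are illustrative, not the kernel's real interface:

    typedef struct Domain Domain;      /* a protection domain                */
    typedef struct Call   Call;        /* one in-progress LRPC               */

    extern int  find_idle_processor(const Domain *server);   /* -1 if none   */
    extern void run_on_processor(int cpu, Call *c);    /* no context switch  */
    extern void context_switch_call(Domain *server, Call *c);  /* slow path  */

    void lrpc_dispatch(Domain *server, Call *c)
    {
        int cpu = find_idle_processor(server);
        if (cpu >= 0)
            run_on_processor(cpu, c);        /* hand the call to a processor
                                                already idling in the server's
                                                domain: its memory context is
                                                loaded, so no switch is needed */
        else
            context_switch_call(server, c);  /* otherwise, normal LRPC path   */
    }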

Argument Copying Traditional RPC copies arguments four times for intra-machine calls  Client stub to RPC message to kernel’s message to server’s message to server’s stack In many cases, LRPC needs to copy the arguments only once  Client stub to A-stack

Performance Analysis LRPC is roughly three times faster than traditional RPC Null() LRPC cost: 157 us, close to the 109 us theoretical minimum  Additional overhead from stub generation and kernel execution

Single-Processor Null() LRPC [table not included in the transcript]

Performance Comparison LRPC versus traditional RPC (in us) [table not included in the transcript]

Multiprocessor Speedup [chart not included in the transcript]

Inter-machine Communication LRPC is best for messages between domains on the same machine The first instruction of the LRPC stub checks whether the call is cross-machine  If so, the stub branches to conventional RPC Larger messages are handled well; LRPC scales linearly with packet size, like traditional RPC
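
A sketch of that first-instruction test; binding_is_remote and the two call paths are invented names standing in for the generated stub code:

    extern int binding_is_remote(int binding_id);         /* set at bind time */
    extern int lrpc_call_local(int binding_id, int a, int b);   /* fast path  */
    extern int rpc_call_remote(int binding_id, int a, int b);   /* full RPC   */

    int add_stub(int binding_id, int a, int b)
    {
        if (binding_is_remote(binding_id))        /* first test in the stub   */
            return rpc_call_remote(binding_id, a, b);  /* conventional RPC    */
        return lrpc_call_local(binding_id, a, b);      /* same-machine LRPC   */
    }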

Cost LRPC avoids needless scheduling, copying, and locking by integrating the client, kernel, server, and message protocols  Abstraction is sacrificed for functionality  RPC is built into operating systems (Linux DCE RPC, MS RPC)

Conclusion Firefly RPC is fast compared to most RPC implementations. LRPC is even faster. Are they fast enough? “The performance of Firefly RPC is now good enough that programmers accept it as the standard way to communicate” (1990)  Is speed still an issue?