User-Level Interprocess Communication for Shared Memory Multiprocessors. Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy.


User-Level Interprocess Communication for Shared Memory Multiprocessors
Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy
Presented by Yahia Mahmoud (CS533)

Introduction
- IPC is central to OS design
- It encourages decomposition across address space boundaries:
  - Failure isolation: faults do not leak between address spaces
  - Extensibility: modules can be added dynamically
  - Modularity
- Slow IPC forces a trade-off between performance and decomposition

Problems
- IPC has traditionally been a kernel responsibility, which creates two problems:
  - Architectural performance barriers: the overhead of kernel-mediated cross-address-space calls accounts for about 70% of LRPC overhead
  - Poor interaction with user-level threads: partitioning communication and thread management across protection boundaries carries a high performance cost

Solution
- Eliminate the kernel from cross-address-space communication
- Use shared memory as the data transfer channel
- Avoid processor reallocation: use a processor that is already active in the target address space
- This approach results in improved performance, as the next slide details

Advantages
- Messages are sent without kernel involvement
- Avoiding processor reallocation reduces call overhead and preserves cache and TLB context
- When reallocation is unavoidable, its overhead can be amortized over several calls
- The send and receive sides of message handling can proceed in parallel

URPC
- Messages are passed through logical memory channels, created and mapped for every client/server pairing
- Thread management is done at user level: the kernel is not involved in sending messages
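A minimal sketch of such a logical channel: a pair of message queues between one client and one server. All names here are invented for illustration, and in-process deques stand in for the pairwise-mapped shared memory that real URPC would use.

```python
from collections import deque

class Channel:
    """Toy model of a URPC logical channel: a pair of message queues
    shared between one client and one server address space. Real URPC
    maps these queues into shared memory at bind time; plain deques
    stand in for that memory here."""
    def __init__(self):
        self.requests = deque()   # client -> server direction
        self.replies = deque()    # server -> client direction

    # Client side: enqueue a procedure-call message.
    def send_request(self, proc, args):
        self.requests.append((proc, args))

    # Server side: dequeue one request, execute it, post the reply.
    def serve_one(self, procedures):
        proc, args = self.requests.popleft()
        self.replies.append(procedures[proc](*args))

    # Client side: pick up the reply once it is available.
    def receive_reply(self):
        return self.replies.popleft()

# Hypothetical server procedure table, for illustration only.
procs = {"add": lambda a, b: a + b}
ch = Channel()
ch.send_request("add", (2, 3))
ch.serve_one(procs)
print(ch.receive_reply())  # -> 5
```

Because both ends operate on the queues directly, no kernel call appears anywhere on the message path; that is the property the slide is claiming.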

Software Components (diagram)

URPC
- URPC separates IPC into three components:
  1. Data transfer
  2. Thread management
  3. Processor reallocation
- Goals: move (1) and (2) to user level; (3) requires the kernel, but avoid it where possible. Why avoid it?
  - The scheduling cost of deciding which address space runs on the processor
  - Virtual memory mapping costs
  - Long-term costs from invalidated cache and TLB state

URPC
- Calls appear synchronous to the programmer but are asynchronous beneath the surface
- The calling client thread blocks and another ready thread runs
- LRPC reused the blocked client thread to execute the call in the server; URPC instead always tries to schedule another ready thread, avoiding the overhead of switching address spaces
- If load balancing is needed, the client lends its processor to the server via the kernel, and the server returns it after processing pending messages
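The synchronous-over-asynchronous structure above can be sketched with generators standing in for user-level threads (all names invented; this is a model of the idea, not URPC's implementation): the stub posts its request, then yields so the user-level scheduler can run other ready threads until the reply appears.

```python
from collections import deque

ready = deque()     # user-level ready queue
reply_box = {}      # call id -> reply; stands in for the reply channel

def urpc_call(call_id):
    """Stub body: the call looks synchronous to its caller, but under
    the hood the thread yields while waiting, letting the scheduler
    run other ready threads instead of reallocating the processor."""
    # (the request message would be enqueued on the channel here)
    while call_id not in reply_box:
        yield                       # block; scheduler picks another thread
    return reply_box.pop(call_id)

def client():
    result = yield from urpc_call(1)   # appears synchronous
    return ("client", result)

def server():
    yield                   # simulate noticing the request on the channel
    reply_box[1] = 42       # post the reply
    return ("server", None)

def scheduler():
    """Round-robin user-level scheduler over generator 'threads'."""
    results = []
    while ready:
        t = ready.popleft()
        try:
            next(t)
            ready.append(t)                # blocked or running: requeue
        except StopIteration as done:
            results.append(done.value)     # thread finished
    return results

ready.extend([client(), server()])
print(scheduler())  # -> [('server', None), ('client', 42)]
```

The client thread never busy-waits with the processor held; each failed check of the reply box hands the processor to another ready thread, which is the cheap context switch the slide contrasts with processor reallocation.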

Processor Reallocation
- Kernel-based systems use pessimistic reallocation (handoff scheduling)
- This policy does not always improve performance
- Centralized kernel data structures create performance bottlenecks: lock contention on thread run queues and message channels

Processor Reallocation
- URPC uses an optimistic reallocation policy, which assumes:
  - The client has other work to do, so delaying message processing at the server has no performance side effect: an inexpensive context switch suffices
  - The server is not underpowered: it has, or soon will have, a processor with which to process the message, so the client executes in parallel with the server
- These assumptions break down when a service is time sensitive (real-time systems, high-latency I/O operations)
- When reallocation is needed, it goes through the kernel:
  - An idle processor can donate itself to an underpowered address space
  - The identity of the donating processor is made known to the receiver
  - There is no guarantee that the processor will be returned to its donor
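The policy's two assumptions can be captured as a small decision function. This is a sketch with invented names, not the paper's kernel interface; the fallback corresponds to the kernel donation path described above.

```python
def should_donate_processor(client_ready_threads, server_has_processor):
    """Optimistic reallocation policy, sketched as a predicate.
    Keep the processor if either assumption holds; fall back to a
    kernel-mediated donation only when both fail."""
    if client_ready_threads > 0:
        # Client has other work: switch threads cheaply and let the
        # server's reply arrive whenever it arrives.
        return False
    if server_has_processor:
        # Server is not underpowered: it will process the message in
        # parallel, so no reallocation is needed.
        return False
    # Client is idle and server is underpowered: donate via the kernel.
    return True

# Both assumptions hold: no kernel involvement.
print(should_donate_processor(3, True))    # -> False
# Client idle and server underpowered: donate.
print(should_donate_processor(0, False))   # -> True
```

This also shows why the policy hurts time-sensitive services: as long as the client has runnable threads, the reply is simply allowed to wait.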

Example (diagram)

Data Transfer
- Arguments can be passed through shared memory while still guaranteeing safety
- Stubs are responsible for communication safety: on receipt of a message, they unmarshal the data into parameters, copying and checking as needed to ensure application safety
- The kernel is not required; stubs can do the copying directly:
  - Data are passed on the stack or heap, and stubs copy them directly
  - With a type-safe language, copying through the kernel would not provide type checking of the data anyway
- Shared-memory queues are controlled with test-and-set locks, with no spinning
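The "test-and-set with no spinning" discipline means a failed lock attempt reschedules another thread rather than busy-waiting. A sketch of that discipline (names invented; Python has no raw test-and-set instruction, so a simple flag stands in for the atomic operation):

```python
class NoSpinLock:
    """Queue lock in the URPC style: acquisition is a single
    test-and-set attempt, and a failed attempt never spins."""
    def __init__(self):
        self._held = False

    def try_acquire(self):
        # Stand-in for an atomic test-and-set instruction.
        if not self._held:
            self._held = True
            return True
        return False

    def release(self):
        self._held = False

def with_channel_lock(lock, critical_section, reschedule):
    """Run the critical section if the lock is free; otherwise hand
    the processor to another ready thread and retry the queue later,
    instead of spinning on the lock."""
    if lock.try_acquire():
        try:
            return critical_section()
        finally:
            lock.release()
    return reschedule()
```

Because the holder of a channel lock only does a short enqueue or dequeue, yielding on contention costs less than burning the processor in a spin loop.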

Thread Management
- Fine-grained parallel applications need high-performance thread management, which can only be achieved at user level
- Communication and thread management perform best when both are implemented at user level
- Threads block in order to:
  - Synchronize their activities within the same address space
  - Wait for external events from a different address space
- Communication implemented in the kernel forces synchronization at both user level and kernel level

Performance

Call Latency and Throughput
- Call latency is the time from when a thread calls into the stub until control returns from the stub
- Latency and throughput are load dependent, varying with:
  - the number of client processors (C)
  - the number of server processors (S)
  - the number of runnable threads in the client's address space (T)
- The graphs measure how long it takes to make 100,000 "null" procedure calls into the server in a tight loop
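The measurement method itself is simple to reproduce for any call path: time N null calls in a tight loop and divide. A sketch (a local function stands in for the cross-address-space call; names are invented):

```python
import time

def null_proc():
    """Null procedure: takes no arguments and returns no results, so
    the measured cost is purely the call machinery."""
    pass

def measure_latency(call, n=100_000):
    """Invoke `call` n times in a tight loop and report the average
    latency in microseconds per call."""
    start = time.perf_counter()
    for _ in range(n):
        call()
    elapsed = time.perf_counter() - start
    return elapsed / n * 1e6

print(f"{measure_latency(null_proc):.3f} us per null call")
```

The null procedure is the standard way to isolate per-call overhead: any real procedure's cost is this baseline plus the cost of its arguments and body.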

Call Latency and Throughput (graphs)

Conclusions
- Performance gains come from moving features out of the kernel, not the other way around
- URPC represents an appropriate division of responsibility for operating system kernels of shared memory multiprocessors
- URPC showcases a design specific to a multiprocessor, not a uniprocessor design that merely runs on multiprocessor hardware