1. Overview  Introduction  Motivations  Multikernel Model  Implementation – The Barrelfish  Performance Testing  Conclusion 2.

Slides:



Advertisements
Similar presentations
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Advertisements

Study of Hurricane and Tornado Operating Systems By Shubhanan Bakre.
Multiple Processor Systems
Cache Coherent Distributed Shared Memory. Motivations Small processor count –SMP machines –Single shared memory with multiple processors interconnected.
Threads, SMP, and Microkernels Chapter 4. Process Resource ownership - process is allocated a virtual address space to hold the process image Scheduling/execution-
Using DSVM to Implement a Distributed File System Ramon Lawrence Dept. of Computer Science
The Multikernel: A new OS architecture for scalable multicore systems Andrew Baumann et al CS530 Graduate Operating System Presented by.
Computer Systems/Operating Systems - Class 8
Presented By Srinivas Sundaravaradan. MACH µ-Kernel system based on message passing Over 5000 cycles to transfer a short message Buffering IPC L3 Similar.
Contiki A Lightweight and Flexible Operating System for Tiny Networked Sensors Presented by: Jeremy Schiff.
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Bugnion et al. Presented by: Ahmed Wafa.
Extensibility, Safety and Performance in the SPIN Operating System Brian Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Gun Sirer, Marc E. Fiuczynski,
G Robert Grimm New York University Disco.
User Level Interprocess Communication for Shared Memory Multiprocessor by Bershad, B.N. Anderson, A.E., Lazowska, E.D., and Levy, H.M.
3.5 Interprocess Communication Many operating systems provide mechanisms for interprocess communication (IPC) –Processes must communicate with one another.
3.5 Interprocess Communication
1 Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine, and Mendel Rosenblum, Stanford University, 1997.
1 Last Class: Introduction Operating system = interface between user & architecture Importance of OS OS history: Change is only constant User-level Applications.
Chapter 11 Operating Systems
Introduction Operating Systems’ Concepts and Structure Lecture 1 ~ Spring, 2008 ~ Spring, 2008TUCN. Operating Systems. Lecture 1.
User-Level Interprocess Communication for Shared Memory Multiprocessors Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy Presented.
Chapter 3 Operating Systems Introduction to CS 1 st Semester, 2015 Sanghyun Park.
Chapter 8 Windows Outline Programming Windows 2000 System structure Processes and threads in Windows 2000 Memory management The Windows 2000 file.
Computer System Architectures Computer System Software
2017/4/21 Towards Full Virtualization of Heterogeneous Noc-based Multicore Embedded Architecture 2012 IEEE 15th International Conference on Computational.
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
The Multikernel: A new OS architecture for scalable multicore systems
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.
Three fundamental concepts in computer security: Reference Monitors: An access control concept that refers to an abstract machine that mediates all accesses.
Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,
Types of Operating Systems
Operating Systems CSE 411 Multi-processor Operating Systems Multi-processor Operating Systems Dec Lecture 30 Instructor: Bhuvan Urgaonkar.
CE Operating Systems Lecture 3 Overview of OS functions and structure.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster computers –shared memory model ( access nsec) –message passing multiprocessor.
Processes Introduction to Operating Systems: Module 3.
Disco : Running commodity operating system on scalable multiprocessor Edouard et al. Presented by Vidhya Sivasankaran.
CS533 - Concepts of Operating Systems 1 The Mach System Presented by Catherine Vilhauer.
Silberschatz, Galvin and Gagne  Operating System Concepts UNIT II Operating System Services.
Types of Operating Systems 1 Computer Engineering Department Distributed Systems Course Assoc. Prof. Dr. Ahmet Sayar Kocaeli University - Fall 2015.
M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young MACH: A New Kernel Foundation for UNIX Development Presenter: Wei-Lwun.
The Mach System Silberschatz et al Presented By Anjana Venkat.
Full and Para Virtualization
Disco: Running Commodity Operating Systems on Scalable Multiprocessors Presented by: Pierre LaBorde, Jordan Deveroux, Imran Ali, Yazen Ghannam, Tzu-Wei.
Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.
The Multikernel: A New OS Architecture for Scalable Multicore Systems By (last names): Baumann, Barham, Dagand, Harris, Isaacs, Peter, Roscoe, Schupbach,
Background Computer System Architectures Computer System Software.
CMSC 611: Advanced Computer Architecture Shared Memory Most slides adapted from David Patterson. Some from Mohomed Younis.
Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine and Mendel Rosenblum Presentation by Mark Smith.
CSCI/CMPE 4334 Operating Systems Review: Exam 1 1.
1 Chapter 2: Operating-System Structures Services Interface provided to users & programmers –System calls (programmer access) –User level access to system.
1.3 Operating system services An operating system provide services to programs and to the users of the program. It provides an environment for the execution.
T HE M ULTIKERNEL : A NEW OS ARCHITECTURE FOR SCALABLE MULTICORE SYSTEMS Presented by Mohammed Mustafa Distributed Systems CIS*6000 School of Computer.
Group Members Hamza Zahid (131391) Fahad Nadeem khan Abdual Hannan AIR UNIVERSITY MULTAN CAMPUS.
Computer System Structures
The Multikernel: A New OS Architecture for Scalable Multicore Systems
Alternative system models
Introduction to Operating System (OS)
CMSC 611: Advanced Computer Architecture
The Multikernel A new OS architecture for scalable multicore systems
Page Replacement.
Chapter 2: System Structures
Multithreaded Programming
Chapter 2: Operating-System Structures
Introduction to Operating Systems
Chapter 2: Operating-System Structures
Chapter 13: I/O Systems.
Presentation transcript:

1

Overview  Introduction  Motivations  Multikernel Model  Implementation – The Barrelfish  Performance Testing  Conclusion 2

Introduction Change and diversity in computer hardware become a challenge for OS designers Number of cores, caches, interconnect links, IO devices, etc. Today’s general purpose OS is not be able to scale fast enough to keep up with the new system designs In order to adapt with this changing hardware, treat the computer as networked components using OS architecture ideas from distributed systems. Multikernel is a good idea Treating the machine as a network of independent cores No inter-core sharing at the lowest level Moving traditional OS functionality to a distributed system of processes Scalability problems for operating systems can be recast by using messages 3

Motivations Increasingly diverse systems Impossibility of optimizing general-purpose OS at design or implementation time for any particular hardware configuration In order to use modern hardware efficiently, Oses such as Window 7 are forced to adopt complex optimizations. (6000 lines of code in 58 files) Increasingly diverse cores Cores can vary within a single machine A mix of different kinds of cores becoming popular Interconnection (connection between different components) For scalability reasons, message passing hardware replaced the single shared interconnect Communication between hardware components resembles a message passing network System software has to adapt to the inter-core topology 4

Motivations Messages vs Shared memory Trend is changing from shared memory to message passing Messages cost less than shared memory When 16 cores are modifying the same data it takes almost 12,000 extra cycles to perform the update. 5

Motivations Cache coherence is not always a solution Hardware cache-coherence protocols will be increasingly expensive because of the growth in the number of cores and complexity of the interconnect Future Oses will either have to handle non-coherent memory or be able to realize substantial performance gains bypassing the cache-coherence protocol 6

The Multikernel Model Three Design Principles: Make all inter-core communication explicit Make the Operating system structure hardware-neutral View state as replicated instead of shared 7

The Multikernel Model Explicit inter-core communication: All communication is done through explicit messages Use of pipelining and batching Pipelining: Sending a number of requests at once Batching: Bundling a number of requests into one message and processing multiple messages together 8

The Multikernel Model Hardware-neutral Operating System structure Separate the OS from the hardware as much as possible Only 2 aspects that are targeted at machine architectures Interface to hardware devices (CPUs and devices) Message passing mechanisms Messaging abstraction is used to avoid extensive optimizations to achieve scalability Focus on optimization of messaging rather than hardware/cache/memory access 9

The Multikernel Model Replicated state: Maintain state through replication rather than shared memory Replicating data and updating by exchanging messages Improves system scalability Reduces: Load on system interconnect Contention for memory Overhead for synchronization Brings data closer to the cores that process it which leads to lowered access latencies. 10

Implementation Barrelfish: A substantial prototype operating system structured according to the multikernel model Goals: Perform as well as or better than existing commodity operating systems on future multicore hardware. Be re-targeted and adapted to different hardware Demonstrate evidence of scalability to large numbers of cores Be able to exploit message passing abstraction to achieve good performance (pipelining and batching messages) Exploit the modularity of the OS to place OS functionality according to hardware topology 11

Implementation 12

Implementation 13 CPU Drivers Performs authorization, time-slices user-space processes Shares no data with other cores Completely event driven, single-threaded and nonpreemptable Monitors Performs all the inter-core coordination Single core, user-space processes and schedulable Keeps replicated data structures consistent Responsible for inter-process communication setup Can put the core to sleep if no work is to be done

Implementation Process Structure: Collection of dispatcher objects Communication is done through dispatchers Scheduling done by the local CPU drivers The dispatcher runs a user-level thread scheduler Inter-core communication: Most communication done through messages For now cache-coherent memory is used Carefully tailored to the cache-coherence protocol to minimize the number of interconnect messages Uses a user-level remote procedure call between cores: Shared memory used as a channel for communication Sender writes message to cache line Receiver polls on the last word of the cache line to read message 14

Implementation Memory Management User-level applications and system services might use shared memory across multiple cores Allocation of physical memory must be consistent OS code and data is itself stored in the same memory All memory management is performed explicitly through system calls Manipulate capabilities that are user level references to kernel objects or regions of memory The CPU driver is only responsible for checking the correctness of manipulation operations 15

All virtual memory management performed by the user-level code To allocate memory it makes a request for some RAM Retypes the RAM capabilities to page table capabilities Send it to the CPU driver to insert into root page table CPU driver checks the correctness and inserts it However, authors realized that this was a mistake 16 Implementation Memory Management

Implementation Shared Address Space Barrelfish supports the traditional process model of threads sharing a single virtual address space Coordination has an effect on 3 OS components: Virtual address space: Hardware page tables are shared among dispatchers or replicated through messages Capabilities: Monitors can send capabilities between cores, guaranteeing that capability is not pending revocation Thread management Thread schedulers exchange messages to Create and unblock threads Move threads between dispatchers (cores) Barrelfish only multiplexes dispatchers on each core via CPU driver scheduler 17

Implementation Knowledge and Policy Engine System Knowledge Base to keep track of hardware Contains information gathered through hardware discovery ACPI tables, PCI buses, CPUID data, URPC latency, Bandwidth.. Allows brief expressions of optimization queries to select appropriate message transports 18

Evaluation TLB shootdown Maintains TLB consistency invalidating entries Linux/Windows(IPI) vs Barrelfish (message passing): In Linux/Windows, through IPI, a core sends an interrupt to each core so that each core traps, acks the IPI, invalidates the TLB entry and resumes. It could be disruptive when every core takes the cost of a trap (800 cycles) In Barrelfish, Local monitor broadcasts invalidate messages and waits for a reply Are able to exploit knowledge about the specific hardware platform to achieve very good TLB shootdown performance 19

TLB Comparison 20

Evaluation TLB Shootdown  Allows optimization of messaging mechanism  Multicast scales much better than unicast and broadcast  Broadcast: good for AMD/Hypertransport which is a broadcast network  Unicast: good for small number of cores  Multicast: good for shared, on-chip L3 cache  NUMA-Aware Multicast: scales very well by allocating URPC buffers from memory local to the multicast aggregation nodes and sending messages to highest latency first 21

TLB Comparison 22

23 Computation Comparisons (Shared memory, threads and scheduling )

Conclusion 24 It does not beat Linux in performance, however… Barrelfish is more lightweight and has reasonable performance on current hardware Good scalability with core count and easy adaptation to use more efficient communication patterns Advantages of pipelining and batching of request messages without reconstructing the OS code Barrelfish can be a practicable alternative to existing monolithic systems