T HE M ULTIKERNEL : A NEW OS ARCHITECTURE FOR SCALABLE MULTICORE SYSTEMS Presented by Mohammed Mustafa Distributed Systems CIS*6000 School of Computer.

Slides:



Advertisements
Similar presentations
Operating Systems Components of OS
Advertisements

Multiple Processor Systems
Processes and Threads Chapter 3 and 4 Operating Systems: Internals and Design Principles, 6/E William Stallings Patricia Roy Manatee Community College,
Threads, SMP, and Microkernels
Lecture 12 Page 1 CS 111 Online Devices and Device Drivers CS 111 On-Line MS Program Operating Systems Peter Reiher.
Multiple Processor Systems
Threads, SMP, and Microkernels Chapter 4. Process Resource ownership - process is allocated a virtual address space to hold the process image Scheduling/execution-
Chapter 4 Threads, SMP, and Microkernels Patricia Roy Manatee Community College, Venice, FL ©2008, Prentice Hall Operating Systems: Internals and Design.
Lightweight Remote Procedure Call Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy Presented by Alana Sweat.
The Multikernel: A new OS architecture for scalable multicore systems Andrew Baumann et al CS530 Graduate Operating System Presented by.
Computer Systems/Operating Systems - Class 8
Chapter 13 Embedded Systems
Presented By Srinivas Sundaravaradan. MACH µ-Kernel system based on message passing Over 5000 cycles to transfer a short message Buffering IPC L3 Similar.
1. Overview  Introduction  Motivations  Multikernel Model  Implementation – The Barrelfish  Performance Testing  Conclusion 2.
Multiple Processor Systems Chapter Multiprocessors 8.2 Multicomputers 8.3 Distributed systems.
1: Operating Systems Overview
Introduction Operating Systems’ Concepts and Structure Lecture 1 ~ Spring, 2008 ~ Spring, 2008TUCN. Operating Systems. Lecture 1.
Threads CS 416: Operating Systems Design, Spring 2001 Department of Computer Science Rutgers University
Chapter 51 Threads Chapter 5. 2 Process Characteristics  Concept of Process has two facets.  A Process is: A Unit of resource ownership:  a virtual.
Chapter 8 Windows Outline Programming Windows 2000 System structure Processes and threads in Windows 2000 Memory management The Windows 2000 file.
Computer System Architectures Computer System Software
The Multikernel: A new OS architecture for scalable multicore systems
1 Lecture 4: Threads Operating System Fall Contents Overview: Processes & Threads Benefits of Threads Thread State and Operations User Thread.
LOGO OPERATING SYSTEM Dalia AL-Dabbagh
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.
Chapter 6 Operating System Support. This chapter describes how middleware is supported by the operating system facilities at the nodes of a distributed.
E X C E E D I N G E X P E C T A T I O N S OP SYS Linux System Administration Dr. Hoganson Kennesaw State University Operating Systems Functions of an operating.
Threads, SMP, and Microkernels Chapter 4. Process Resource ownership - process is allocated a virtual address space to hold the process image Scheduling/execution-
CE Operating Systems Lecture 3 Overview of OS functions and structure.
Multiple Processor Systems. Multiprocessor Systems Continuous need for faster computers –shared memory model ( access nsec) –message passing multiprocessor.
Processes Introduction to Operating Systems: Module 3.
1 Threads, SMP, and Microkernels Chapter Multithreading Operating system supports multiple threads of execution within a single process MS-DOS.
CS533 - Concepts of Operating Systems 1 The Mach System Presented by Catherine Vilhauer.
System Components ● There are three main protected modules of the System  The Hardware Abstraction Layer ● A virtual machine to configure all devices.
Department of Computer Science and Software Engineering
The Mach System Silberschatz et al Presented By Anjana Venkat.
The Multikernel: A New OS Architecture for Scalable Multicore Systems By (last names): Baumann, Barham, Dagand, Harris, Isaacs, Peter, Roscoe, Schupbach,
Background Computer System Architectures Computer System Software.
Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine and Mendel Rosenblum Presentation by Mark Smith.
Threads, SMP, and Microkernels Chapter 4. Processes and Threads Operating systems use processes for two purposes - Resource allocation and resource ownership.
CSCI/CMPE 4334 Operating Systems Review: Exam 1 1.
1.3 Operating system services An operating system provide services to programs and to the users of the program. It provides an environment for the execution.
CT101: Computing Systems Introduction to Operating Systems.
Introduction to Operating Systems Concepts
Computer System Structures
Module 3: Operating-System Structures
Process Management Process Concept Why only the global variables?
Session 3 Memory Management
Operating Systems (CS 340 D)
The Multikernel: A New OS Architecture for Scalable Multicore Systems
Chapter 1: Introduction
Chapter 2: System Structures
Operating System Structure
Introduction to Operating System (OS)
Operating Systems (CS 340 D)
CMSC 611: Advanced Computer Architecture
The Multikernel A new OS architecture for scalable multicore systems
Chapter 4: Threads.
Threads, SMP, and Microkernels
Page Replacement.
Chapter 2: System Structures
Multiple Processor Systems
Lecture 4- Threads, SMP, and Microkernels
Fast Communication and User Level Parallelism
Threads Chapter 4.
Multithreaded Programming
Chapter 2: Operating-System Structures
Introduction to Operating Systems
Chapter 2: Operating-System Structures
Presentation transcript:

T HE M ULTIKERNEL : A NEW OS ARCHITECTURE FOR SCALABLE MULTICORE SYSTEMS Presented by Mohammed Mustafa Distributed Systems CIS*6000 School of Computer Science University of Guelph/ Fall

Introduction 1 Motivations 2 The Multikernel Model 3 Implementation 4 Evaluation 5 Conclusions 6 O VERVIEW 2

Computer hardware is changing faster than system software. There are many components changing at once number of cores, caches, interconnect links, IO devices, etc. Having so many changing components leads to difficulties with scaling operating systems. Since the hardware is changing fast, it’s better to use a general purpose design than target specific hardware components. Hardware optimizations become obsolete a few years later when new hardware arrives. Introduction 1 I NTRODUCTION 3

Introduction 1 In order to adapt with changing hardware, treat the computer as networked components using OS architecture ideas from distributed systems. Treat the machine as a network of independent cores No inter-core sharing at the lowest level Move traditional OS functionality to a distributed system of processes Communication done through message passing Goal is to be easily scalable I NTRODUCTION 4

A general purpose Operating System must perform well on an increasingly wide range of system designs. Hardware has different performance characteristics Unlike large high performance systems, you don’t know what hardware pieces you’ll be using so you cannot optimize for specific hardware at the design stage. The Sun Niagara processor uses a banked L2 cache that a reader-writer lock can exploit by trying to write to it twice in a row to see if there are any readers. M OTIVATIONS Motivations 2 5

Processor cores are also diverse Some machines have a mix of different kinds of cores. Possibly with different instruction sets. M OTIVATIONS Motivations 2 6

Interconnect is the term used to describe the connection between different components Communication between hardware components resembles a message passing network. There are already programmable peripherals like Network Interface Cards and Graphical Processing Units that do not maintain cache coherence with CPUs. M OTIVATIONS Motivations 2 7

Messages cost less than shared memory Traditionally, most communication has been done through shared memory The costs grow approximately linearly with the number of threads and how many lines are modified in the cache. When 16 cores are modiying the same data it takes almost 12,000 extra cycles to perform the update. M OTIVATIONS Motivations 2 8

Structure the Operating System as a distributed system of cores that communicate using message passing and share no memory. Three Design Principles: Make all inter-core communication explicit Make the Operating system structure hardware-neutral View state as replicated instead of shared T HE M ULTIKERNEL M ODEL The Multikernel Model 3 9

Make all inter-core communication explicit: All communication is done through messages Use pipelining and batching Pipelining: Sending a number of requests at once Batching: Bundling a number of requests into one message and processing multiple messages together Make the Operating System structure hardware-neutral Separate the OS from the hardware as much as possible Only things that are specific are: Interface to hardware devices Message passing mechanisms This lets the OS adapt to future hardware easily. Windows 7 had to change 6000 lines of code in 58 files to improve scalability. T HE M ULTIKERNEL M ODEL The Multikernel Model 3 10

View state as replicated: Operating Systems maintain state Windows dispatcher, Linux scheduler queue, etc. Must be accessible on multiple processors Information passing is usually done through shared data structures that are protected by locks Instead, replicate data and update by exchanging messages Improves system scalability Reduces: Load on system interconnect Contention for memory Overhead for synchronization Replication allows data to be shared between cores that do not support the same page table format Brings data closer to the cores that process it. T HE M ULTIKERNEL M ODEL The Multikernel Model 3 11

Barrelfish: A research operating system built from scratch Exploring OS structure for multi-core, scalability, and hardware diversity Goals: Give comparable performance to existing commodity operating systems. Demonstrate evidence of scalability to large numbers of cores Exploit message passing abstraction to achieve good performance Exploit the modularity of the OS to place OS functionality according to hardware topology Be retargeted and adapted for different hardware I MPLEMENTATION Implementation 4 12

System Structure: Put an OS instance on each core as a privileged-mode CPU driver Performs authorization Delivers hardware interrupts to user-space drivers Time-slices user-space processes Shares no data with other cores Completely event driven Single-threaded Nonpreemptable Implements an lightweight remote procedure call table Monitors Single core user-mode processes, schedulable Performs all the inter-core coordination Handles queues of messages Keeps replicated data structures consistent Memory allocation tables Address space mappings Can put the core to sleep if no work is to be done I MPLEMENTATION Implementation 4 13

I MPLEMENTATION Implementation 4 14

Processes: Collection of dispatcher objects One for each core it might execute on Communication is done through dispatchers Scheduling done by the local CPU drivers The dispatcher runs a user-level thread scheduler Implements a threads package similiar to standard POSIX threads Inter-core communication: Most communication done through messages For now it uses cache-coherent memory Was the only mechanism available on their current hardware platform Uses a user-level remote procedure call between cores Shared memory used to transfer cache-line-sized messages Poll on the last word to prevent partial messages I MPLEMENTATION Implementation 4 15

Memory Management: The operating system is distributed Must consistently manage a set of global resources User-level applications and system services can use shared memory across multiple cores Must ensure that a user-level process can’t acquire a virtual map to a region of memory that stores a hardware page table or other OS object All memory management is performed explicitly through system calls Manipulates user level references to kernel objects or regions of memory The CPU driver is responsible for checking the correctness All virtual memory management is performed by the user-level code To allocate memory it makes a request for some RAM Retypes it as a page table Send it to the CPU driver to insert into root page table CPU driver checks and inserts it This was a mistake Very complex and not more efficient that conventional OS I MPLEMENTATION Implementation 4 16

Shared Address Space: Barrelfish supports the traditional process model of threads sharing a single virtual address space Shared across dispatchers (cores) Provides coordination over: Virtual address space Hardware page tables Shared among dispatchers or replicated through messages Capabilities What the user address space has access to Monitors can send capabilities between cores Thread management Thread schedulers exchange messages Create and unblock threads Move threads between dispatchers (cores) Allows dispatchers to gang schedule threads to run together I MPLEMENTATION Implementation 4 17

Shared Address Space: Barrelfish supports the traditional process model of threads sharing a single virtual address space Shared across dispatchers (cores) Provides coordination over: Virtual address space Hardware page tables Shared among dispatchers or replicated through messages Capabilities What the user address space has access to Monitors can send capabilities between cores Thread management Thread schedulers exchange messages Create and unblock threads Move threads between dispatchers (cores) Allows dispatchers to gang schedule threads to run together I MPLEMENTATION Implementation 4 18

Knowledge and Policy Engine: Maintains a system knowledge base to keep track of hardware Populated through information discovery ACPI tables, PCI buses, CPUID data, RPC latency, Bandwidth.. Allows the system to optimize device drivers and select appropriate message transport mechanisms. I MPLEMENTATION 4. Implementation 19 4

This is just one way to use a multikernel model Separating the CPU driver and monitor is not optimal Doing so requires context switches to go back and forth Lots of overhead Barrelfish is designed to be a proving ground for experimenting with non-monolithic systems with the idea of moving away from shared data as a communications system E VALUATION Evaluation 5 20

TLB Comparison: Invalidating page entries requires global coordination Linux and Windows: The core that wants to change a page writes the operation to the shared location and sends an interrupt to all other cores that have that page. Each core traps, writes to a shared variable to notify it got the change, changes the TLB, and continues execution The first core can continue when it sees that all the other cores received it’s change. The is fast but disruptive In Barrelfish: Uses messages instead of interrupts to pass on the changes. Waits for a reply before continuing Requires waiting, messages get passed when its convient Takes advantage of known hardware platforms AMD HyperTransport provides very good performance E VALUATION Evaluation 5 21

TLB Comparison: E VALUATION Evaluation 5 22

Computation Comparisons (shared memory, threads, and scheduling): E VALUATION Evaluation 5 23

The systems they compared were very different Barrelfish performed reasonably well Many of the system components are not optimized or implemented yet The network stack they used was just a place holder Computer hardware is changing fast Multicore machines resemble complex networked systems Message based communication is a viable alternative Scalability is important to utilize future hardware C ONCLUSION 5 Conclusions 6 24

Q UESTIONS ? 25