Alternative system models


Alternative system models Tudor David

Outline
- The multi-core as a distributed system
- Case study: agreement
- The distributed system as a multi-core

The system model
Concurrency: several communicating processes executing at the same time.
Implicit communication (shared memory):
- Resources: shared between processes
- Communication: implicit, through shared resources
- Synchronization: locks, condition variables, non-blocking algorithms, etc.
Explicit communication (message passing):
- Resources: partitioned between processes
- Communication: explicit message channels
- Synchronization: through the message channels
Whatever can be expressed using shared memory can be expressed using message passing.
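To make the last point concrete, here is a minimal Go sketch (my illustration, not from the talk): the same counter is expressed once with implicit communication (a shared variable protected by a mutex) and once with explicit communication (a single owner goroutine reached only through channels).

```go
package main

import (
	"fmt"
	"sync"
)

// Shared-memory version: implicit communication through a shared variable,
// synchronized with a lock.
type SharedCounter struct {
	mu sync.Mutex
	n  int
}

func (c *SharedCounter) Inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// Message-passing version: the counter is owned by one goroutine; other
// processes communicate with it only through explicit channels.
func counterServer(inc <-chan struct{}, get chan<- int) {
	n := 0
	for {
		select {
		case <-inc:
			n++
		case get <- n:
		}
	}
}

func main() {
	inc := make(chan struct{})
	get := make(chan int)
	go counterServer(inc, get)

	inc <- struct{}{}
	inc <- struct{}{}
	fmt.Println(<-get) // 2

	c := &SharedCounter{}
	c.Inc()
	c.Inc()
	fmt.Println(c.n) // 2
}
```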

So far: the shared-memory view
- A set of registers that any process can read or write
- Communication: implicit, through these registers
- Problems: concurrency bugs (very common); scalability (poor when contention is high)

Alternative model: message passing
- More verbose, but gives better control over how information is sent across the multi-core
How do we get message passing on multi-cores?
- Dedicated hardware message-passing channels (e.g. Tilera)
- More commonly: use dedicated cache lines for message queues
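A rough sketch of the "dedicated cache lines as message queues" idea, using Go as pseudocode (real implementations, e.g. Barrelfish's inter-core channels, are written in C and tuned per architecture; the 64-byte cache-line size and the queue layout here are assumptions): a single-producer/single-consumer ring whose slots and indices are padded so the sending and receiving core touch different cache lines.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const cacheLine = 64 // assumed cache-line size in bytes

// slot is padded so adjacent slots fall on different cache lines,
// limiting coherence traffic between the sending and receiving core.
type slot struct {
	val uint64
	_   [cacheLine - 8]byte
}

// spscQueue is a single-producer/single-consumer ring buffer.
type spscQueue struct {
	head uint64 // next index to read (consumer-owned)
	_    [cacheLine - 8]byte
	tail uint64 // next index to write (producer-owned)
	_    [cacheLine - 8]byte
	buf  []slot
}

func newQueue(size int) *spscQueue { return &spscQueue{buf: make([]slot, size)} }

// Send busy-waits while the ring is full, then publishes the message.
func (q *spscQueue) Send(v uint64) {
	t := atomic.LoadUint64(&q.tail)
	for t-atomic.LoadUint64(&q.head) == uint64(len(q.buf)) {
		// spin: queue full
	}
	q.buf[t%uint64(len(q.buf))].val = v
	atomic.StoreUint64(&q.tail, t+1) // make the message visible to the consumer
}

// Recv busy-waits until a message is available.
func (q *spscQueue) Recv() uint64 {
	h := atomic.LoadUint64(&q.head)
	for atomic.LoadUint64(&q.tail) == h {
		// spin: queue empty
	}
	v := q.buf[h%uint64(len(q.buf))].val
	atomic.StoreUint64(&q.head, h+1)
	return v
}

func main() {
	q := newQueue(8)
	go func() {
		for i := uint64(0); i < 4; i++ {
			q.Send(i)
		}
	}()
	for i := 0; i < 4; i++ {
		fmt.Println(q.Recv())
	}
}
```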

Programming using message passing
- System design becomes more similar to distributed systems
- Map concepts from shared memory to message passing
A few examples:
- Synchronization and data structures: flat combining
- Programming languages: e.g. Go, Erlang
- Operating systems: the multikernel (e.g. Barrelfish)
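As an example of the first item, here is a minimal sketch of the flat-combining idea (heavily simplified from Hendler et al.'s algorithm), applied to a counter: each thread publishes its request in a per-thread slot, and whichever thread grabs the lock becomes the combiner and applies all pending requests, so the slots effectively act as message channels and the combiner as a server.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type request struct {
	arg     int
	result  int
	pending int32 // 1 while the request waits to be applied
}

type fcCounter struct {
	lock  sync.Mutex
	slots []request // one publication slot per thread
	value int       // the sequential data structure: a plain counter
}

func newFCCounter(threads int) *fcCounter {
	return &fcCounter{slots: make([]request, threads)}
}

// AddAndGet publishes (id, delta) and waits until some combiner applies it.
func (c *fcCounter) AddAndGet(id, delta int) int {
	s := &c.slots[id]
	s.arg = delta
	atomic.StoreInt32(&s.pending, 1)

	for atomic.LoadInt32(&s.pending) == 1 {
		if c.lock.TryLock() {
			// We are the combiner: apply every pending request.
			for i := range c.slots {
				r := &c.slots[i]
				if atomic.LoadInt32(&r.pending) == 1 {
					c.value += r.arg
					r.result = c.value
					atomic.StoreInt32(&r.pending, 0)
				}
			}
			c.lock.Unlock()
		}
	}
	return s.result
}

func main() {
	const threads = 4
	c := newFCCounter(threads)
	var wg sync.WaitGroup
	for id := 0; id < threads; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.AddAndGet(id, 1)
			}
		}(id)
	}
	wg.Wait()
	fmt.Println(c.value) // 4000
}
```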

Barrelfish: all communication is explicit
- Communication: exclusively message passing
- Easier reasoning: know what is accessed, when, and by whom
- Asynchronous operations eliminate wait time (pipelining, batching)
- More scalable
- Applications can still use shared memory

Barrelfish: hardware-neutral OS structure
- Separate the OS structure from the hardware
- Machine-dependent components: messaging, hardware interface
- Better layering and modularity; easier porting

Barrelfish: replicated state
- No shared memory => replicate shared OS state
- Reduces interconnect load, decreases latencies
- Asynchronous client updates
- Possibly NUMA-aware replication

Consistency of replicas: an agreement protocol
- Implicit communication (shared memory): locally cached data
- Explicit communication (message passing): replicated state
- State machine replication: a total ordering of updates, established by agreement
- Goal: high availability and high scalability
How should we do message-passing agreement in a multi-core?
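The state-machine-replication part can be made concrete with a small Go sketch (my illustration, not Barrelfish code): as long as every replica applies the same totally ordered stream of updates, all replicas converge to the same state; producing that total order is exactly the agreement problem.

```go
package main

import "fmt"

// update is one agreed-upon operation in the totally ordered log.
type update struct {
	seq int // position decided by the agreement protocol
	key string
	val int
}

// replica holds a local copy of the shared state and applies updates
// strictly in sequence order.
type replica struct {
	state   map[string]int
	nextSeq int
}

func newReplica() *replica { return &replica{state: make(map[string]int)} }

// apply consumes the agreed update stream; because every replica sees the
// same sequence, every replica ends up with the same state.
func (r *replica) apply(log <-chan update) {
	for u := range log {
		if u.seq != r.nextSeq {
			panic("gap in the agreed sequence") // agreement must deliver in order
		}
		r.state[u.key] = u.val
		r.nextSeq++
	}
}

func main() {
	log := make(chan update, 3)
	log <- update{0, "x", 1}
	log <- update{1, "y", 2}
	log <- update{2, "x", 3}
	close(log)

	r := newReplica()
	r.apply(log)
	fmt.Println(r.state) // map[x:3 y:2]
}
```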

Outline
- The multi-core as a distributed system
- Case study: agreement
- The distributed system as a multi-core

First attempt: a blocking protocol
Two-Phase Commit (2PC):
1. Broadcast Prepare
2. Wait for acks
3. Broadcast Commit/Rollback
4. Wait for acks
Blocking, and all messages go through the coordinator.
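A minimal coordinator sketch of this message flow in Go (illustrative only: no timeouts and no durable log, so it shows why 2PC blocks on every participant rather than how to build a production 2PC).

```go
package main

import "fmt"

type vote bool // true = "yes, prepared"

type participant interface {
	Prepare() vote // phase 1: can you commit?
	Commit()       // phase 2a: everyone voted yes
	Rollback()     // phase 2b: someone voted no
}

// runTwoPhaseCommit is the coordinator: every message goes through it,
// and it blocks on each participant's answer in turn.
func runTwoPhaseCommit(parts []participant) bool {
	// Phase 1: broadcast Prepare and wait for all acks.
	for _, p := range parts {
		if !p.Prepare() {
			// At least one "no" -> broadcast Rollback.
			for _, q := range parts {
				q.Rollback()
			}
			return false
		}
	}
	// Phase 2: all "yes" -> broadcast Commit and wait for acks.
	for _, p := range parts {
		p.Commit()
	}
	return true
}

// inMemoryPart is a trivial in-process participant for demonstration.
type inMemoryPart struct{ ok bool }

func (p *inMemoryPart) Prepare() vote { return vote(p.ok) }
func (p *inMemoryPart) Commit()       { fmt.Println("commit") }
func (p *inMemoryPart) Rollback()     { fmt.Println("rollback") }

func main() {
	fmt.Println(runTwoPhaseCommit([]participant{&inMemoryPart{true}, &inMemoryPart{true}}))
	fmt.Println(runTwoPhaseCommit([]participant{&inMemoryPart{true}, &inMemoryPart{false}}))
}
```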

Is a blocking protocol appropriate?
"Latency numbers every programmer should know" (source: Jeff Dean):
- L1 cache reference: 0.5 ns
- Branch mispredict: 5 ns
- L2 cache reference: 7 ns
- Mutex lock/unlock: 25 ns
- Main memory reference: 100 ns
- Compress 1K bytes: 3,000 ns
- Send 1K bytes over 1 Gbps network: 10,000 ns
- Read 4K randomly from SSD: 150,000 ns
- Read 1 MB sequentially from memory: 250,000 ns
- Round trip within datacenter: 500,000 ns
- Read 1 MB sequentially from SSD: 1,000,000 ns
- Disk seek: 10,000,000 ns
- Read 1 MB sequentially from disk: 20,000,000 ns
- Send packet CA -> Netherlands -> CA: 150,000,000 ns
A blocking agreement protocol is only as fast as its slowest participant (scheduling? I/O?). Use a non-blocking protocol instead.

Non-blocking agreement protocols
Consensus: non-blocking agreement among distributed processes on one out of possibly multiple proposed values.
Paxos:
- Tolerates non-malicious faults and unresponsive nodes (in multi-cores: slow cores)
- Needs a majority of responses to make progress (tolerates partitions)
- Two phases: Phase 1 (prepare), Phase 2 (accept)
- Roles: proposer, acceptor, learner
- Many variations and optimizations: CheapPaxos, MultiPaxos, FastPaxos, etc.
- Usually all roles run on each physical node (collapsed Paxos)
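A minimal single-decree acceptor sketch in Go (in-memory, single instance, names are mine) showing the two rules behind the prepare/accept phases: never accept below a promised proposal number, and report any already-accepted value so that a newer proposer adopts it.

```go
package main

import "fmt"

type acceptor struct {
	promised    int    // highest proposal number promised so far
	acceptedNum int    // proposal number of the accepted value, 0 if none
	acceptedVal string // the accepted value, if any
}

// Prepare handles Phase 1: promise not to accept proposals numbered <= promised.
func (a *acceptor) Prepare(n int) (ok bool, prevNum int, prevVal string) {
	if n > a.promised {
		a.promised = n
		return true, a.acceptedNum, a.acceptedVal
	}
	return false, 0, ""
}

// Accept handles Phase 2: accept (n, v) unless a higher prepare was promised.
func (a *acceptor) Accept(n int, v string) bool {
	if n >= a.promised {
		a.promised = n
		a.acceptedNum = n
		a.acceptedVal = v
		return true
	}
	return false
}

func main() {
	a := &acceptor{}
	ok, _, _ := a.Prepare(1)
	fmt.Println(ok, a.Accept(1, "x")) // true true
	ok, num, val := a.Prepare(2)      // a later proposer must adopt "x"
	fmt.Println(ok, num, val)         // true 1 x
	fmt.Println(a.Accept(1, "y"))     // false: superseded by proposal 2
}
```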

MultiPaxos: unless it fails, keep the same leader across subsequent rounds (so the prepare phase can be skipped in the common case).

Does MultiPaxos scale in a multi-core? No: it shows limited scalability in the multi-core environment.

A closer look at the multi-core environment
Where does the time go when sending a message? (Within a multi-core: < 1 us per message; across a large network: ~100 us.)
- Large networks: minimize the number of rounds per instance
- Multi-core: minimize the number of messages

Can we adapt Paxos to this scenario?
- Proposers: replicate the service (availability); advocate client commands
- Acceptors: resolve contention between proposers; short-term memory (reliability, availability)
- Learners: replicate the data (reliability); long-term memory
Using a single acceptor significantly reduces the number of messages.

1Paxos: the failure-free case
1. P2 obtains the active acceptor A1 and sends prepare_request(pn)
2. A1: if pn is the highest proposal number received, it replies to P2 with an ack
3. P2 -> A1: accept_request(pn, value)
4. A1 broadcasts the value to the learners
Common case: only steps 3 and 4.
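A sketch of the common case (steps 3 and 4) with Go channels standing in for the per-core message channels; the message types and names are illustrative, not the paper's API. The leader, having completed steps 1-2 once, sends accept requests straight to the single active acceptor, which orders them and broadcasts decisions to the learners.

```go
package main

import "fmt"

type acceptRequest struct {
	pn    int    // the proposal number obtained in the (rare) prepare phase
	value string // the client command being agreed upon
}

type decision struct {
	slot  int // position in the agreed sequence
	value string
}

// activeAcceptor is the single acceptor of 1Paxos: it assigns each accepted
// value the next slot and broadcasts the decision to every learner.
func activeAcceptor(promised int, reqs <-chan acceptRequest, learners []chan<- decision) {
	slot := 0
	for r := range reqs {
		if r.pn < promised {
			continue // stale proposer: ignore
		}
		d := decision{slot: slot, value: r.value}
		slot++
		for _, l := range learners { // step 4: broadcast to learners
			l <- d
		}
	}
	for _, l := range learners {
		close(l)
	}
}

func main() {
	reqs := make(chan acceptRequest)
	l1 := make(chan decision, 8)
	go activeAcceptor(1, reqs, []chan<- decision{l1})

	// Step 3: the leader sends accept requests directly to the acceptor.
	reqs <- acceptRequest{pn: 1, value: "cmd-A"}
	reqs <- acceptRequest{pn: 1, value: "cmd-B"}
	close(reqs)

	for d := range l1 {
		fmt.Println(d.slot, d.value) // 0 cmd-A, then 1 cmd-B
	}
}
```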

1Paxos: switching the acceptor
1. P2: is it still the leader?
2. PaxosUtility: P2 proposes A3 as the new active acceptor, together with the uncommitted proposed values
3. P2 -> A3: prepare_request

1Paxos: switching the leader
1. A1: is it still the active acceptor?
2. PaxosUtility: P3 becomes the new leader, with A1 as the active acceptor
3. P3 -> A1: prepare_request

Switching the leader and the acceptor
The trade-off: while the leader and the active acceptor are non-responsive at the same time, liveness is lost (✖) but safety is preserved (✔).
Why is this reasonable?
- It is a small-probability event
- There are no network partitions within a machine
- If nodes are slow rather than crashed, the system becomes responsive again after a while

Latency and throughput: a smaller number of messages yields lower latency and higher throughput (measurements taken at 6, 7, 13, and 45 clients).

Agreement: summary
- A multi-core is a message-passing distributed system, but distributed algorithms need different implementations there
- Agreement in multi-cores: non-blocking, with a reduced number of messages
- Using one acceptor (1Paxos): reduced latency, increased throughput

Outline
- The multi-core as a distributed system
- Case study: agreement
- The distributed system as a multi-core

Remote Direct Memory Access (RDMA)
- Read/write remote memory; the NIC performs the DMA requests
- Great performance: bypasses the kernel and bypasses the remote CPU
Source: A. Dragojevic
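Conceptually, the programming model looks like the sketch below. RemoteMemory is a hypothetical interface introduced here for illustration; real RDMA APIs (ibverbs, or FaRM's own primitives) look quite different, but the key property is the same: a one-sided Read or Write is served by the remote NIC via DMA, with no remote CPU involvement.

```go
package main

import "fmt"

// RemoteMemory is a hypothetical stand-in for an RDMA API: one-sided reads
// and writes of a registered remote memory region.
type RemoteMemory interface {
	Read(addr uint64, buf []byte) error  // one-sided read of remote memory
	Write(addr uint64, buf []byte) error // one-sided write to remote memory
}

// fetchRecord shows the usage pattern: the local CPU issues the read, and
// the remote machine's CPU never runs any code for it.
func fetchRecord(mem RemoteMemory, addr uint64, size int) ([]byte, error) {
	buf := make([]byte, size)
	if err := mem.Read(addr, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// localStub simulates the remote region in-process so the sketch runs.
type localStub struct{ region []byte }

func (s *localStub) Read(addr uint64, buf []byte) error {
	copy(buf, s.region[addr:])
	return nil
}

func (s *localStub) Write(addr uint64, buf []byte) error {
	copy(s.region[addr:], buf)
	return nil
}

func main() {
	mem := &localStub{region: make([]byte, 1024)}
	_ = mem.Write(64, []byte("hello"))
	b, _ := fetchRecord(mem, 64, 5)
	fmt.Println(string(b)) // hello
}
```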

FaRM: Fast Remote Memory (NSDI '14), Dragojevic et al. Source: A. Dragojevic