RAMCloud Overview and Update
John Ousterhout, Stanford University
SEDCL Retreat, June 2014

What is RAMCloud?
General-purpose storage system for large-scale applications:
● All data is stored in DRAM at all times
● As durable and available as disk
● Simple key-value data model
● Large scale: 1,000+ servers, 100+ TB
● Low latency: 5-10 µs remote access time
Project goal: enable a new class of data-intensive applications
(June 6, 2013, RAMCloud Update and Overview)

RAMCloud Architecture
[Diagram: 1,000 – 10,000 storage servers (each running a Master and a Backup) and 1,000 – 100,000 application servers (each running an Application linked with the client Library), connected by the datacenter network; a Coordinator manages the cluster.]
● Commodity servers, GB per server
● High-speed networking:
  ▪ 5 µs round-trip
  ▪ Full bisection bandwidth

Data Model: Key-Value Store
● Basic operations:
  ▪ read(tableId, key) => blob, version
  ▪ write(tableId, key, blob) => version
  ▪ delete(tableId, key)
● Other operations:
  ▪ cwrite(tableId, key, blob, version) => version (overwrites only if the version matches)
  ▪ Enumerate objects in a table
  ▪ Efficient multi-read, multi-write
  ▪ Atomic increment
● Not in RAMCloud 1.0:
  ▪ Atomic updates of multiple objects
  ▪ Secondary indexes
● Objects are grouped into tables; each object has a key (≤ 64 KB), a version (64 bits), and a blob (≤ 1 MB)
(Slide from RAMCloud/VMware RADIO, May 13, 2014)
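As a rough illustration of these semantics, here is a toy in-memory model of the operations above; the class and method names are invented for this sketch and do not correspond to RAMCloud's actual C++ client API.

```python
# Toy single-machine model of RAMCloud's key-value operations
# (illustration only; the real system is a distributed C++ service).

class VersionMismatch(Exception):
    pass

class KeyValueStore:
    def __init__(self):
        self.tables = {}          # tableId -> {key: (blob, version)}
        self.next_version = 1

    def write(self, table_id, key, blob):
        # write(tableId, key, blob) => version
        version = self.next_version
        self.next_version += 1
        self.tables.setdefault(table_id, {})[key] = (blob, version)
        return version

    def read(self, table_id, key):
        # read(tableId, key) => (blob, version)
        return self.tables[table_id][key]

    def delete(self, table_id, key):
        del self.tables[table_id][key]

    def cwrite(self, table_id, key, blob, version):
        # Conditional write: overwrite only if the stored version matches.
        _, current = self.tables[table_id][key]
        if current != version:
            raise VersionMismatch("expected %d, found %d" % (version, current))
        return self.write(table_id, key, blob)
```

The version returned by every write is what makes cwrite (and, later, linearizable retries) possible: a client can say "apply this update only if the object hasn't changed since I last read it."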

Status at June 2013 Retreat
● Close to 1.0 release:
  ▪ Core system becoming stable
  ▪ Coordinator not yet fault-tolerant
● Original students working on dissertations
● New students starting to think about new projects

Progress Since June 2013
● RAMCloud 1.0, January 2014:
  ▪ Key-value store
  ▪ Low-latency RPC system (4.9 µs reads, 15.3 µs durable writes)
  ▪ Log-structured storage management
  ▪ 1-2 second recovery from storage server crashes
  ▪ Coordinator crash recovery
● New projects (see below)
● Application experiments/interest:
  ▪ Graph processing: Jonathan Ellithorpe
  ▪ ONOS (operating system for software-defined networks): Open Networking Laboratory
  ▪ Various projects/experiments at Huawei
  ▪ High-energy physics (CERN): Jakob Blomer visiting for the summer
  ▪ Port to NEC Atom cluster: Satoshi Matsushita

Progress, cont'd
● PhD dissertations:
  ▪ Ryan Stutsman: "Durability and Crash Recovery in Distributed In-memory Storage Systems" (now at Microsoft Research)
  ▪ Steve Rumble: "Memory and Object Management in RAMCloud" (now at Google Zurich)
  ▪ Diego Ongaro: "Consensus: Bridging Theory and Practice" (ETA summer 2014)
● Papers published:
  ▪ "Log-Structured Memory for DRAM-Based Storage" (Best Paper Award, FAST 2014)
  ▪ "In Search of an Understandable Consensus Algorithm" (USENIX ATC 2014)

Progress, cont'd
● New papers submitted to OSDI:
  ▪ "SLIK: Scalable Low-Latency Indexes for a Key-Value Store" (Ankita, Arjun, Ashish, Zhihao)
  ▪ "Experience with Rules-Based Programming for Distributed, Concurrent, Fault-Tolerant Code" (Ryan, Collin)

Changing of the Guard
● Ryan Stutsman: graduated (PhD)
● Steve Rumble: graduated (PhD)
● Diego Ongaro: graduating soon (PhD)
● Ankita Kejriwal
● Arjun Gopalan: graduating soon (MS)
● Behnam Montazeri
● Collin Lee: new
● Henry Qin: new
● Ashish Gupta: new (but leaving with MS)
● Seo Jin Park: new
● Zhihao Jia: new (rotation only)
● Stephen Yang: rejoining Fall 2014

New Projects
● Phase I (2009 – 2013): RAMCloud 1.0
  ▪ First-generation RPC (based on Infiniband)
  ▪ Key-value store
  ▪ Log-structured storage management
  ▪ Crash recovery
● Phase II (2014 – ?):
  ▪ Higher-level data model: secondary indexes, linearizability, multi-object transactions, graph support?
  ▪ Networking infrastructure: analyze RPC latency, driver(s) for 10 GigE, clean-slate RPC redesign

Secondary Indexes
● SLIK: Scalable, Low-latency Indexes for a Key-value Store
● Requires a new object format:
  ▪ Old: key-value store (Key, Version, Value)
  ▪ New: multikey-value store (Key 0, Key 1, ..., Key N, Version, Value)
  ▪ Key 0 is the primary key, with the same role as before; the remaining keys are secondary keys

RAMCloud Operations
[Diagram: the master's data structures are a hash table plus a log (replicated to backups), with a separate index log (also backed up). A non-indexed read hashes the primary key into the hash table to locate the object in the log. An indexed read first looks up a secondary key range in the index to obtain primary key hash(es), then uses the hash table to fetch the matching object(s). A write updates the log, the hash table, and the index.]
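The indexed-read path in the diagram (secondary key range to primary key hashes to objects) can be sketched as follows. This is a hypothetical single-machine simplification, not SLIK's actual implementation; updates and removal of stale index entries are omitted for brevity.

```python
# Sketch of a SLIK-style indexed read: the index maps secondary keys to
# primary key hashes; objects live in a table keyed by primary key hash.

import bisect
import hashlib

def pkhash(key):
    # Stand-in for RAMCloud's primary key hash.
    return hashlib.md5(key.encode()).hexdigest()

class IndexedTable:
    def __init__(self):
        self.objects = {}   # primary key hash -> (primary key, value)
        self.index = []     # sorted list of (secondary key, primary key hash)

    def write(self, primary_key, secondary_key, value):
        h = pkhash(primary_key)
        self.objects[h] = (primary_key, value)
        bisect.insort(self.index, (secondary_key, h))

    def indexed_read(self, lo, hi):
        # Step 1: scan the secondary-key range in the index.
        # Step 2: fetch each object via its primary key hash.
        start = bisect.bisect_left(self.index, (lo, ""))
        results = []
        for sk, h in self.index[start:]:
            if sk > hi:
                break
            results.append(self.objects[h])
        return results
```

The two-step structure mirrors the diagram: the index returns only primary key hashes, so the (possibly remote) object fetch stays a normal hash-table lookup.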

SLIK, cont'd
● Status:
  ▪ Preliminary implementation of most mechanisms
  ▪ Initial performance measurements
● Students involved:
  ▪ Ankita Kejriwal (talk later today)
  ▪ Arjun Gopalan
  ▪ Ashish Gupta
  ▪ Zhihao Jia

Linearizability
● The Holy Grail of consistency for large-scale apps
● The system behaves as if each operation executes exactly once, instantaneously, at some point between when the client sends the request and when it receives the response

Linearizability, cont'd
[Timeline examples: if "x = 10" and "x = 5" overlap in time, a subsequent read of x may return either 5 or 10; if "x = 10" completes before "x = 5" begins, a subsequent read of x must return 5.]

Linearizability Failure
[Diagram sequence:
1. The client sends "if x.version == 22 then x.value = 20" to the master, which stores x with version 22, value 10.
2. The master applies the write (x: version 23, value 20), replicates the new value to the backups, and returns "success".
3. The master crashes before the client receives the response. Crash recovery restores x (version 23, value 20) on a recovery master; the client retries the same request and gets "error: version mismatch!", even though its write actually succeeded.]
● Must remember old results, avoid re-executing requests

Linearizability Project
● Create general-purpose infrastructure (use the log to track RPC results)
● Use it to implement linearizable RPCs:
  ▪ Conditional write
  ▪ Multi-object transactions
● Students involved:
  ▪ Seo Jin Park (talk later today)
  ▪ Collin Lee
  ▪ Ankita Kejriwal
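The idea of remembering RPC results to make retries safe can be sketched as follows. The completion-record scheme below is a hypothetical simplification: a real master would durably log these results alongside the data so they survive crash recovery, and would eventually garbage-collect them.

```python
# Sketch of exactly-once conditional writes: each RPC carries a
# (client_id, rpc_id) pair, and the master saves the result of every
# completed RPC so a retry returns the saved result instead of
# re-executing (illustration only, not RAMCloud's actual code).

class Master:
    def __init__(self):
        self.data = {}        # key -> (value, version)
        self.completed = {}   # (client_id, rpc_id) -> saved result

    def cwrite(self, client_id, rpc_id, key, value, expected_version):
        # Retry of an already-executed RPC: replay the remembered result.
        if (client_id, rpc_id) in self.completed:
            return self.completed[(client_id, rpc_id)]
        _, current = self.data.get(key, (None, 0))
        if current != expected_version:
            result = ("error", current)
        else:
            self.data[key] = (value, current + 1)
            result = ("success", current + 1)
        self.completed[(client_id, rpc_id)] = result
        return result
```

This is exactly the fix for the failure scenario above: when the client retries after a lost reply, the master recognizes the duplicate request and returns "success" rather than a spurious version-mismatch error.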

Latency Analysis
● After 4 years, still little understanding of RAMCloud latency!
  ▪ What accounts for current latency?
  ▪ How much can it be improved?
  ▪ What are the fundamental limits?
  ▪ What is the right system structure to minimize latency?
● Henry Qin is starting to answer these questions

RAMCloud Transports Today
● Infiniband reliable queue pairs:
  ▪ Highest performance; our main workhorse
  ▪ Reliable, in-order delivery implemented in hardware
  ▪ Doesn't support Ethernet-style networks
  ▪ Driver is old, thrown-together, warty ("temporary solution")
● Kernel TCP:
  ▪ Easy to use
  ▪ Too slow for real applications (50-150 µs round trips)
● FastTransport:
  ▪ Custom transport for RAMCloud
  ▪ Works over any underlying datagram protocol (e.g. kernel UDP)
  ▪ Provides reliable, in-order, flow-controlled delivery
  ▪ Not as fast as infrc, too complex, never fully debugged
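As a toy illustration of what a FastTransport-style layer must add on top of raw datagrams, here is a sequence-numbered receiver that recovers in-order delivery despite duplication and reordering. This is an invented sketch (acks only; retransmission and flow control omitted), not RAMCloud code.

```python
# Reliable in-order delivery over an unreliable datagram layer:
# buffer out-of-order datagrams, deliver contiguous runs, and
# return a cumulative ack so the sender can retransmit gaps.

class InOrderReceiver:
    def __init__(self):
        self.next_seq = 0
        self.buffer = {}      # out-of-order datagrams awaiting delivery
        self.delivered = []   # payloads handed to the application, in order

    def on_datagram(self, seq, payload):
        # Accept a possibly duplicated or reordered datagram.
        if seq >= self.next_seq:
            self.buffer[seq] = payload       # duplicates overwrite harmlessly
        while self.next_seq in self.buffer:  # deliver any contiguous run
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return self.next_seq                 # cumulative ack
```

Infiniband reliable queue pairs do this work in hardware, which is one reason infrc is so much faster; a clean-slate transport has to provide the same guarantees in software without giving that speed back.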

Transport Redesign
● Goal: clean-slate replacement for FastTransport:
  ▪ Better latency and scalability
  ▪ Replace infrc as the workhorse transport
  ▪ Separable from RAMCloud
  ▪ "RPC for future datacenters"
● First steps (Behnam Montazeri):
  ▪ Build a SolarFlare datagram driver for FastTransport (kernel bypass for 10 GigE)
  ▪ Understand FastTransport's weaknesses

Conclusion
● Several new projects in early stages
● Talks this retreat: mostly work in progress
● Should have many interesting results over the next year