RAMCloud 1.0
John Ousterhout, Stanford University
SEDCL Forum, January 21, 2014
(with Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, Stephen Rumble, and Ryan Stutsman)

Overview
● RAMCloud project in transition
● Phase 1 complete:
   ▪ RAMCloud 1.0 released
      ● Key-value store
      ● Memory management
      ● Fast crash recovery
   ▪ First PhD students graduating
● Phase 2 starting up:
   ▪ New projects:
      ● Higher-level data models
      ● Datacenter RPC revisited
   ▪ Many new students

RAMCloud Architecture
[Diagram: 1,000-100,000 application servers, each running applications linked against the RAMCloud client library, connect through the datacenter network to 1,000-10,000 storage servers (each running a master and a backup) and a coordinator.]
● Commodity servers with many GB of DRAM per server
● High-speed networking: 5 µs round-trip, full bisection bandwidth

Data Model: Key-Value Store
● Basic operations:
   ▪ read(tableId, key) => blob, version
   ▪ write(tableId, key, blob) => version
   ▪ delete(tableId, key)
● Other operations:
   ▪ cwrite(tableId, key, blob, version) => version (only overwrites if the version matches)
   ▪ Enumerate objects in a table
   ▪ Efficient multi-read, multi-write
   ▪ Atomic increment
● Not currently available:
   ▪ Atomic updates of multiple objects
   ▪ Secondary indexes
● Objects live in tables; each object is a key (≤ 64 KB), a 64-bit version, and a blob (≤ 1 MB)
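To make the calls above concrete, here is a minimal sketch of a client session using the C++ bindings. The RamCloud class and the read/write/remove methods are paraphrased from the public RAMCloud client headers (exact signatures differ between releases), the conditional write is expressed through RejectRules as the C++ API does, and the coordinator address is a made-up example.

```cpp
// Hedged sketch: approximate use of the RAMCloud C++ client bindings for the
// operations on this slide. Signatures are paraphrased and may not match any
// particular release exactly; the service locator is a hypothetical address.
#include "RamCloud.h"
#include <cstdint>

using namespace RAMCloud;

int main() {
    RamCloud cluster("infrc:host=coordinator,port=12246");   // hypothetical
    uint64_t tableId = cluster.createTable("users");

    // write(tableId, key, blob) => version
    uint64_t version;
    cluster.write(tableId, "alice", 5, "blob-contents", 13, NULL, &version);

    // read(tableId, key) => blob, version
    Buffer value;
    cluster.read(tableId, "alice", 5, &value, NULL, &version);

    // cwrite(tableId, key, blob, version): only overwrite if the stored
    // version still matches; expressed via RejectRules in the C++ bindings.
    RejectRules rules = {};
    rules.givenVersion = version;
    rules.versionNeGiven = 1;
    cluster.write(tableId, "alice", 5, "new-contents", 12, &rules, &version);

    // delete(tableId, key)
    cluster.remove(tableId, "alice", 5);
    return 0;
}
```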

Performance
● Using InfiniBand networking:
   ▪ 24 Gb/sec effective bandwidth
   ▪ Kernel bypass
   ▪ Other networking also supported, but slower
● Reads:
   ▪ 100 B objects: 5 µs
   ▪ 10 KB objects: 10 µs
   ▪ Single-server throughput (100 B objects): 700 Kops/sec
   ▪ Small-object multi-reads: 1-2 M objects/sec
● Writes:
   ▪ 100 B objects: 15 µs
   ▪ 10 KB objects: 40 µs

Steve Rumble's Dissertation
● How to manage objects in DRAM?
   ▪ High write performance
   ▪ High memory efficiency (80-90% utilization)
● Uniform log structure for all information (both DRAM and disk)
   ▪ Log cleaner => incremental, generational garbage collector
● Innovative aspects:
   ▪ Two-level cleaning (different policies for DRAM and disk)
   ▪ Parallel cleaning (hides the cost of cleaning)
   ▪ Improved LFS segment-selection formula
● Paper in FAST 2014; dissertation nearing completion; Steve is working at Google Zurich
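The following is an illustrative sketch of the log-structured memory idea behind this work, not RAMCloud's implementation: every object is appended to a log made of fixed-size segments, a hash table points to the newest version of each key, and a cleaner reclaims space by relocating the live objects out of a lightly-used segment. The two-level and parallel cleaning and the segment-selection formula mentioned above are not modeled; this sketch simply cleans the segment with the least live data.

```cpp
// Illustrative sketch only (not RAMCloud's code): log-structured memory.
// Objects live in an append-only log of fixed-size segments; overwrites
// leave dead space behind, which a cleaner later reclaims by copying the
// surviving objects to the head of the log and freeing the whole segment.
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Entry { std::string key; std::string blob; };

struct Segment {
    static constexpr size_t CAPACITY = 8 * 1024 * 1024;   // 8 MB, as in RAMCloud
    std::vector<Entry> entries;
    size_t usedBytes = 0;      // all bytes ever appended here
    size_t liveBytes = 0;      // bytes still referenced by the hash table
};

class Log {
    std::list<Segment> segments;                           // back() is the head
    std::unordered_map<std::string, std::pair<Segment*, size_t>> index;

public:
    void write(const std::string& key, const std::string& blob) {
        if (segments.empty() ||
                segments.back().usedBytes + blob.size() > Segment::CAPACITY)
            segments.emplace_back();                       // open a new head segment
        Segment& head = segments.back();
        auto old = index.find(key);
        if (old != index.end()) {                          // old version becomes dead space
            Segment* oldSeg = old->second.first;
            oldSeg->liveBytes -= oldSeg->entries[old->second.second].blob.size();
        }
        head.entries.push_back({key, blob});
        head.usedBytes += blob.size();
        head.liveBytes += blob.size();
        index[key] = {&head, head.entries.size() - 1};
    }

    // One cleaning pass: pick the closed segment with the least live data,
    // relocate its surviving objects to the head, and free the segment.
    void clean() {
        if (segments.size() < 2)
            return;                                        // never clean the open head
        auto victim = segments.begin();
        for (auto it = segments.begin(); std::next(it) != segments.end(); ++it)
            if (it->liveBytes < victim->liveBytes)
                victim = it;
        Segment* v = &*victim;
        for (size_t i = 0; i < v->entries.size(); i++) {
            auto e = index.find(v->entries[i].key);
            if (e != index.end() && e->second.first == v && e->second.second == i)
                write(v->entries[i].key, v->entries[i].blob);   // still live: relocate
        }
        segments.erase(victim);                            // reclaim the whole segment
    }
};
```

Because no object is ever freed in place, writes stay pure appends; memory efficiency then depends entirely on how and when cleaning runs, which is where the dissertation's two-level and parallel cleaning come in.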

Existing Allocators Waste Memory
● Allocators waste memory if workloads change. E.g., W2 (simulates a schema change):
   ▪ Allocate 100 B objects
   ▪ Gradually overwrite them with 130 B objects
● All existing allocators waste at least 50% of memory under some conditions
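As a point of reference, here is a sketch of the W2 access pattern described above; the runW2 name and Store interface are invented for illustration. With a non-copying allocator the 100-byte holes freed in the second phase cannot hold the new 130-byte objects, which is where the wasted memory comes from; a copying log cleaner can reuse that space. The Log sketch shown earlier could serve as the Store here.

```cpp
// Sketch of the W2 workload (schema-change simulation): fill a store with
// small objects, then gradually overwrite each one with a slightly larger
// value. Store is any key-value interface with write(key, blob), e.g. the
// Log sketch above.
#include <string>

template <typename Store>
void runW2(Store& store, int numObjects) {
    std::string small(100, 'a');       // phase 1: 100 B objects
    std::string large(130, 'b');       // phase 2: 130 B replacements

    for (int i = 0; i < numObjects; i++)
        store.write(std::to_string(i), small);

    for (int i = 0; i < numObjects; i++)        // gradual overwrite pass
        store.write(std::to_string(i), large);
}
```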

Client Write Throughput
Performance degradation at high memory utilization (the slide's chart shows three configurations whose labels did not survive in the transcript):

   Memory Utilization    Performance Degradation
   80%                   17-27%
   90%                   26-49%
   80%                   14-15%
   90%                   30-42%
   80%                   3-4%
   90%                   3-6%

Ryan Stutsman's Dissertation
Durability and availability for data in DRAM
● Fault-tolerant log for each master:
   ▪ Decentralized log management
   ▪ Finding the log after crashes
   ▪ Restoring redundancy after backup crashes
● Fast crash recovery:
   ▪ Use thousands of servers concurrently to reload data from a crashed master
   ▪ 1-2 second recovery
● Handling simultaneous master/backup failures
● Dissertation filed December 2013; Ryan is now a post-doc at Microsoft Research
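The sketch below, with invented names, only illustrates the partitioning idea behind fast crash recovery: the coordinator splits the crashed master's key-hash ranges into partitions and assigns each one to a different recovery master, so the log is replayed by many servers at once. Backup selection, segment fetching, and the replay itself are omitted.

```cpp
// Illustrative sketch (invented names, not RAMCloud's code) of partitioned
// crash recovery: split the crashed master's key-hash ranges into partitions
// and give each to a different recovery master, so hundreds of servers
// replay the log in parallel.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Partition {                  // one slice of the crashed master's data
    uint64_t tableId;
    uint64_t firstKeyHash;
    uint64_t lastKeyHash;
};

struct RecoveryTask {
    std::string recoveryMaster;     // server that will replay this partition
    Partition partition;
};

// Coordinator-side planning: spread the partitions round-robin across the
// surviving masters; each recovery master then reads the matching log data
// from the backups and rebuilds its partition concurrently with the others.
std::vector<RecoveryTask> planRecovery(
        const std::vector<Partition>& partitions,
        const std::vector<std::string>& aliveMasters) {
    assert(!aliveMasters.empty());
    std::vector<RecoveryTask> tasks;
    for (size_t i = 0; i < partitions.size(); i++)
        tasks.push_back({aliveMasters[i % aliveMasters.size()], partitions[i]});
    return tasks;
}
```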

Recovery Scalability
● Will improve with newer machines:
   ▪ More cores (our nodes: 4 cores)
   ▪ More memory bandwidth (our nodes: 11 GB/sec)
● Bottom line: recover 500 MB/sec/server
[Chart: recovery scales from 1 master, 2 backups, 2 SSDs, 500 MB recovered up to 80 masters, 160 backups, 160 SSDs, 40 GB recovered.]

Coordinator Crash Recovery
● Active/standby model
● Use replicated external storage for configuration data (ZooKeeper in the slide's diagram):
   ▪ Cluster membership
   ▪ Table metadata
● Distributed "commit" mechanism:
   ▪ Record intent in storage
   ▪ Notify relevant servers
   ▪ (Eventually) mark the storage record "completed"
   ▪ During restart, find and finish uncompleted operations
[Diagram: standby coordinator and active coordinator, both backed by ZooKeeper.]
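The following sketch illustrates the intent-record pattern in the list above, with an in-memory stand-in for the replicated external storage (ZooKeeper on the slide) and invented function names; the real coordinator code differs. The property that matters is that the intent is durable before any server is notified, so a standby coordinator that takes over can find and finish every half-completed operation.

```cpp
// Sketch of the coordinator's distributed "commit" pattern (names invented).
// ExternalStorage stands in for the replicated external store (ZooKeeper in
// the slide); the real store is durable and replicated, which is the point.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct ExternalStorage {
    std::map<std::string, std::string> data;
    void put(const std::string& k, const std::string& v) { data[k] = v; }
    void remove(const std::string& k) { data.erase(k); }
    std::vector<std::string> list(const std::string& prefix) {
        std::vector<std::string> out;
        for (const auto& kv : data)
            if (kv.first.compare(0, prefix.size(), prefix) == 0)
                out.push_back(kv.first);
        return out;
    }
};

void notifyMasters(const std::string& table) {       // placeholder side effect
    std::cout << "assigning tablets for " << table << "\n";
}

void createTableOp(ExternalStorage& storage, const std::string& name) {
    storage.put("/ops/createTable/" + name, "pending");   // 1. record intent
    notifyMasters(name);                                   // 2. notify servers
    storage.remove("/ops/createTable/" + name);            // 3. mark completed
}

void recoverCoordinator(ExternalStorage& storage) {
    // A standby coordinator that becomes active finds every operation whose
    // intent was recorded but never marked completed, and finishes it.
    for (const std::string& op : storage.list("/ops/"))
        std::cout << "finishing interrupted op: " << op << "\n";
}
```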

Missing from RAMCloud 1.0
● Table configuration management:
   ▪ Tablets are not moved after creation
   ▪ RAMCloud has mechanisms for splitting and migrating tablets
   ▪ No policy code yet
● Crude reconfiguration: crash the server!
   ▪ Its tablets get split among many other servers

New Project: Data Model
● Goal: a higher-level data model than just a key-value store:
   ▪ Secondary indexes
   ▪ Transactions spanning multiple objects and servers
   ▪ Graph-processing primitives (sets)
● Can RAMCloud support these without sacrificing latency or scalability?
● First project: secondary indexes (SLIK) (Arjun Gopalan, Ashish Gupta, Ankita Kejriwal)
   ▪ Design complete, implementation underway
   ▪ Ankita will discuss design issues
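SLIK's actual design is not described on this slide, so the fragment below is purely a hypothetical illustration of what a secondary index reduces to: a sorted mapping from secondary keys to primary keys that can be range-scanned, after which the candidate objects are read from the ordinary key-value store.

```cpp
// Hypothetical illustration only: a secondary index as a sorted multimap from
// secondary key (e.g. a "lastName" field) to primary key. A range scan yields
// candidate primary keys, which a client would then read from the key-value
// store; SLIK's real design and API may differ in every detail.
#include <map>
#include <string>
#include <vector>

struct SecondaryIndex {
    std::multimap<std::string, std::string> entries;     // indexKey -> primaryKey

    void insert(const std::string& indexKey, const std::string& primaryKey) {
        entries.insert({indexKey, primaryKey});
    }

    // Return the primary keys of all objects whose index key lies in
    // [first, last], inclusive.
    std::vector<std::string> lookupRange(const std::string& first,
                                         const std::string& last) const {
        std::vector<std::string> primaryKeys;
        for (auto it = entries.lower_bound(first);
             it != entries.end() && it->first <= last; ++it)
            primaryKeys.push_back(it->second);
        return primaryKeys;
    }
};
```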

New Work: Datacenter RPC
Complete redesign of RAMCloud RPC
● General purpose (not just RAMCloud)
● Latency:
   ▪ Analyze and reduce latency
   ▪ Explore alternative threading strategies
   ▪ Optimize for kernel bypass
● Scale:
   ▪ Support 1M clients per server (minimal state per connection)
   ▪ Congestion control: reservation based?
● Initial projects underway (Behnam Montazeri, Henry Qin):
   ▪ Analyze latency of existing RPCs
   ▪ Support for pFabric, SolarFlare NICs

Other Projects
● Raft: a new consensus protocol (Diego Ongaro)
● Graph processing on RAMCloud (Jonathan Ellithorpe)
● Using a rule-based approach for distributed, concurrent, fault-tolerant code (Collin Lee, Ryan Stutsman)

Conclusion
● We now have a usable system
● Still many open research problems
● Real usage should generate additional information and ideas