RAMCloud Overview and Status
John Ousterhout, Stanford University
DRAM in Storage Systems
(June 3, 2011, RAMCloud Overview & Status, Slide 2)
Timeline of DRAM's role in storage systems, 1970-2010: UNIX buffer cache → main-memory databases → large file caches → web indexes entirely in DRAM → memcached → main-memory DBs, again. Facebook today: 200 TB total data, 150 TB of it cache!
DRAM in Storage Systems
● DRAM usage limited/specialized
● Clumsy (must manage consistency with backing store)
● Lost performance (cache misses, slow writes)
RAMCloud
Harness the full performance potential of large-scale DRAM storage:
● General-purpose storage system
● All data always in DRAM (no cache misses)
● Durable and available (no backing store)
● Scale: 1,000+ servers, 100+ TB
● Low latency: 5-10 µs remote access
Potential impact: enable a new class of applications.
RAMCloud Architecture
(May 27, 2011, RAMCloud: GSRC Mid-Year Review, Slide 5)
● 1,000-100,000 application servers, each running an application linked against the RAMCloud client library
● 1,000-10,000 storage servers, each acting as both master and backup, with 32-64 GB of DRAM per server
● A coordinator
● All connected by the datacenter network
Data Model
Tables contain objects. Each object has a 64-bit identifier, a 64-bit version, and a blob of up to 1 MB.
● create(tableId, blob) => objectId, version
● read(tableId, objectId) => blob, version
● write(tableId, objectId, blob) => version
● cwrite(tableId, objectId, blob, version) => version (only overwrites if the version matches)
● delete(tableId, objectId)
Richer model in the future: indexes? Transactions? Graphs?
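The versioned-object semantics above, and in particular the conditional write (cwrite), can be illustrated with a toy single-table model. This is a sketch for intuition only: the real system is a distributed C++ server, and the class and exception names here are invented for illustration.

```python
class VersionMismatch(Exception):
    """Raised when a conditional write sees a stale version."""

class Table:
    """Toy in-memory model of RAMCloud's versioned-object semantics."""
    def __init__(self):
        self._objects = {}   # objectId -> (blob, version)
        self._next_id = 0

    def create(self, blob):
        object_id, version = self._next_id, 1
        self._next_id += 1
        self._objects[object_id] = (blob, version)
        return object_id, version

    def read(self, object_id):
        return self._objects[object_id]          # (blob, version)

    def write(self, object_id, blob):
        _, version = self._objects[object_id]
        self._objects[object_id] = (blob, version + 1)
        return version + 1

    def cwrite(self, object_id, blob, expected_version):
        # Conditional write: only overwrite if the caller saw the latest version.
        _, version = self._objects[object_id]
        if version != expected_version:
            raise VersionMismatch(f"have {version}, expected {expected_version}")
        self._objects[object_id] = (blob, version + 1)
        return version + 1
```

cwrite gives clients optimistic concurrency control without locks: read an object, compute, then write back only if nobody changed it in between.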
RPC Transport Architecture
Transport API (reliable request/response):
● Clients: getSession(serviceLocator), clientSend(reqBuf, respBuf), wait()
● Servers: handleRpc(reqBuf, respBuf)
Transports:
● TcpTransport: kernel TCP/IP
● InfRcTransport: Infiniband verbs, reliable queue pairs, kernel bypass, Mellanox NICs
● FastTransport: layered on a driver API for unreliable datagrams:
  - UdpDriver: kernel UDP
  - InfUdDriver: Infiniband unreliable datagrams
  - InfEthDriver: 10GigE packets via a Mellanox NIC
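The shape of the transport API can be sketched as a minimal loopback transport. Method names follow the slide (getSession / clientSend / wait on clients, handleRpc on servers); everything else, including the synchronous loopback "network", is an assumption made for illustration.

```python
class Transport:
    """Loopback sketch of a reliable request/response transport."""
    def __init__(self, handle_rpc):
        self._handle_rpc = handle_rpc        # server-side handler

    def get_session(self, service_locator):
        # Real transports parse the locator and open a connection;
        # this sketch just records it.
        return Session(self, service_locator)

class Session:
    def __init__(self, transport, service_locator):
        self._transport = transport
        self._locator = service_locator

    def client_send(self, req_buf, resp_buf):
        # A real transport sends req_buf over TCP or Infiniband and returns
        # immediately; here the server handler runs synchronously.
        self._transport._handle_rpc(req_buf, resp_buf)
        return Rpc(resp_buf)

class Rpc:
    def __init__(self, resp_buf):
        self._resp = resp_buf

    def wait(self):
        # A real client blocks until the response arrives; ours is done.
        return bytes(self._resp)
```

For example, an echo-style server handler that upper-cases requests would be wired in as `Transport(lambda req, resp: resp.extend(req.upper()))`.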
Progress over the Last Year
● Implemented a skeletal system:
  - Fast RPC
  - Log-structured data management
  - Simple servers
  - But not yet complete enough for production use
● Installed a 40-node cluster:
  - Mellanox Infiniband (32 Gb/sec; NICs bypass the kernel)
  - 10G Ethernet (Arista switch)
● Demonstrated fast recovery:
  - Why? Only one copy of data in DRAM
  - Goal: recover 64 GB from a failed server in 1-2 seconds
  - Basic recovery mechanism works, seems to scale
  - Submitted a paper to SOSP
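A back-of-envelope calculation shows why the 1-2 second recovery goal forces recovery to be spread across many machines. The bandwidth figure below is an illustrative assumption (one 10 Gb/s NIC), not a RAMCloud measurement.

```python
# Why recovering 64 GB in ~1 s cannot be done by one machine.
data_gb = 64
nic_gb_per_s = 10 / 8                 # one 10 Gb/s NIC ~= 1.25 GB/s (assumed)

one_machine_s = data_gb / nic_gb_per_s
print(f"one NIC: {one_machine_s:.1f} s")       # ~51 s, far too slow

# Scatter the dead server's data so many machines recover in parallel.
recovery_masters = 100                # illustrative cluster size
parallel_s = data_gb / (recovery_masters * nic_gb_per_s)
print(f"{recovery_masters} masters in parallel: {parallel_s:.2f} s")
```

Even ignoring disk and CPU costs, a single NIC needs close to a minute, while ~100 machines working in parallel bring the time under a second.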
Implementation Status
Components span a maturity scale from "a few ideas" through "throw-away first version" and "first real implementation" to "mature" (plus one extra level: "dissertation-ready, ask Diego"):
● RPC architecture, RPC transports, master server, backup server, cluster coordinator, threading, log cleaning
● Recovery: masters, backups, coordinator
● Performance tools, failure detection, cold start, Tub
● Higher-level data model, multi-object transactions, multi-tenancy, access control/security, split/move tablets, tablet placement, administration tools
RAMCloud Code Size
Code: 36,900 lines
Unit tests: 16,500 lines
Total: 53,400 lines
Selected Performance Metrics
● Latency for 100-byte reads (1 switch):
  - InfRc: 4.9 µs
  - TCP (1GigE): 92 µs
  - TCP (Infiniband): 47 µs
  - Fast + UDP (1GigE): 91 µs
  - Fast + UDP (Infiniband): 44 µs
  - Fast + InfUd: 4.9 µs
● Server throughput (InfRc, 100-byte reads, one core): 1.05 × 10⁶ requests/sec
● Recovery time (6.6 GB data, 11 recovery masters, 66 backups): 1.15 sec
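The recovery measurement above implies a per-master recovery rate, which can be scaled up to the 64 GB goal. The extrapolation assumes linear scaling in the number of recovery masters, which is exactly the property the recovery experiments are meant to test, so treat the final number as a rough projection.

```python
# Derived from the measured recovery: 6.6 GB, 11 recovery masters, 1.15 s.
data_gb, seconds, masters = 6.6, 1.15, 11

aggregate = data_gb / seconds            # cluster-wide recovery rate, GB/s
per_master = aggregate / masters         # rate each recovery master sustained
print(f"aggregate: {aggregate:.2f} GB/s, per master: {per_master:.2f} GB/s")

# At the same per-master rate, recovering 64 GB in 1 s would need roughly:
needed = 64 / per_master
print(f"recovery masters for 64 GB in 1 s: {needed:.0f}")
```

The measured run sustained about 5.7 GB/s in aggregate (~0.52 GB/s per master), so the 64 GB goal needs on the order of 120 recovery masters if scaling holds.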
Lessons/Conclusions (so far)
● Fast RPC is within reach
● The NIC is the biggest long-term bottleneck: it must be integrated with the CPU
● Recovery can be fast enough that replication in DRAM isn't needed for availability
● Randomized approaches are key to scalable distributed decision-making
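One concrete instance of the randomized style the last bullet alludes to is the classic "power of two choices" heuristic for placement decisions: sample a few servers at random and pick the least loaded, rather than maintaining a global view. This sketch is a generic illustration of that heuristic, not necessarily the exact policy RAMCloud uses.

```python
import random

def choose_server(load, k, rng):
    """Sample k servers at random and take the least loaded one.
    No central coordination or global load table is required."""
    candidates = rng.sample(range(len(load)), k)
    return min(candidates, key=lambda s: load[s])

def place(n_segments, n_servers, k, seed=0):
    """Place n_segments and return the max load; k=1 is purely random."""
    rng = random.Random(seed)
    load = [0] * n_servers
    for _ in range(n_segments):
        if k == 1:
            s = rng.randrange(n_servers)
        else:
            s = choose_server(load, k, rng)
        load[s] += 1
    return max(load)
```

Comparing `place(10000, 100, 1)` against `place(10000, 100, 2)` shows the effect: purely random placement leaves some servers well above the average load of 100 segments, while sampling just two candidates keeps the maximum close to the average.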
Plans for the Next Year
● Get experience with applications:
  - Joint project at Facebook over the summer
  - Finish "least usable system"
● Pick the next research question(s) to address:
  - What is the right transport protocol for the datacenter?
  - Cluster management?
  - Higher-level operations?
Upcoming RAMCloud Talks
● Performance measurements: Nandu Jayakumar
● Fast recovery: Ryan Stutsman, Diego Ongaro
● Simulating larger RAMCloud clusters: Asaf Cidon
● RAMCloud's transports: Diego Ongaro
● Multi-read operations: Ankita Kejriwal
● Tablet profiling: Steve Rumble
● Low-level latency measurements: Mario Flajslik