RAMCloud Overview and Status
John Ousterhout, Stanford University
DRAM in Storage Systems
(June 3, 2011, RAMCloud Overview & Status, Slide 2)
Timeline of DRAM's role in storage systems, 1970-2010: UNIX buffer cache → main-memory databases → large file caches → web indexes entirely in DRAM → memcached → main-memory DBs, again. Facebook today: 200 TB total data, 150 TB of it cache!
DRAM in Storage Systems
● DRAM usage limited/specialized
● Clumsy (must manage consistency with backing store)
● Lost performance (cache misses, slow writes)
RAMCloud
Harness the full performance potential of large-scale DRAM storage:
● General-purpose storage system
● All data always in DRAM (no cache misses)
● Durable and available (no backing store)
● Scale: 1,000+ servers, 100+ TB
● Low latency: 5-10 µs remote access
Potential impact: enable a new class of applications.
RAMCloud Architecture
(May 27, 2011, RAMCloud: GSRC Mid-Year Review, Slide 5)
● 1,000-100,000 application servers, each running an application linked against the RAMCloud client library
● 1,000-10,000 storage servers, each acting as both master and backup, with 32-64 GB of DRAM per server
● A coordinator
● All connected by the datacenter network
Data Model
Tables contain objects. Each object has a 64-bit identifier, a 64-bit version, and a blob of up to 1 MB.
● create(tableId, blob) => objectId, version
● read(tableId, objectId) => blob, version
● write(tableId, objectId, blob) => version
● cwrite(tableId, objectId, blob, version) => version (only overwrites if the version matches)
● delete(tableId, objectId)
Richer model in the future: indexes? Transactions? Graphs?
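The versioned-object semantics above, and in particular the conditional write (cwrite), can be illustrated with a toy single-table model. This is a sketch for intuition only: the real system is a distributed C++ server, and the class and exception names here are invented for illustration.

```python
class VersionMismatch(Exception):
    """Raised when a conditional write sees a stale version."""

class Table:
    """Toy in-memory model of RAMCloud's versioned-object semantics."""
    def __init__(self):
        self._objects = {}   # objectId -> (blob, version)
        self._next_id = 0

    def create(self, blob):
        object_id, version = self._next_id, 1
        self._next_id += 1
        self._objects[object_id] = (blob, version)
        return object_id, version

    def read(self, object_id):
        return self._objects[object_id]          # (blob, version)

    def write(self, object_id, blob):
        _, version = self._objects[object_id]
        self._objects[object_id] = (blob, version + 1)
        return version + 1

    def cwrite(self, object_id, blob, expected_version):
        # Conditional write: only overwrite if the caller saw the latest version.
        _, version = self._objects[object_id]
        if version != expected_version:
            raise VersionMismatch(f"have {version}, expected {expected_version}")
        self._objects[object_id] = (blob, version + 1)
        return version + 1
```

cwrite gives clients optimistic concurrency control without locks: read an object, compute, then write back only if nobody changed it in between.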
RPC Transport Architecture
Transport API (reliable request/response):
● Clients: getSession(serviceLocator), clientSend(reqBuf, respBuf), wait()
● Servers: handleRpc(reqBuf, respBuf)
Transports:
● TcpTransport: kernel TCP/IP
● InfRcTransport: Infiniband verbs, reliable queue pairs, kernel bypass, Mellanox NICs
● FastTransport: layered on a driver API for unreliable datagrams:
  - UdpDriver: kernel UDP
  - InfUdDriver: Infiniband unreliable datagrams
  - InfEthDriver: 10GigE packets via a Mellanox NIC
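The shape of the transport API can be sketched as a minimal loopback transport. Method names follow the slide (getSession / clientSend / wait on clients, handleRpc on servers); everything else, including the synchronous loopback "network", is an assumption made for illustration.

```python
class Transport:
    """Loopback sketch of a reliable request/response transport."""
    def __init__(self, handle_rpc):
        self._handle_rpc = handle_rpc        # server-side handler

    def get_session(self, service_locator):
        # Real transports parse the locator and open a connection;
        # this sketch just records it.
        return Session(self, service_locator)

class Session:
    def __init__(self, transport, service_locator):
        self._transport = transport
        self._locator = service_locator

    def client_send(self, req_buf, resp_buf):
        # A real transport sends req_buf over TCP or Infiniband and returns
        # immediately; here the server handler runs synchronously.
        self._transport._handle_rpc(req_buf, resp_buf)
        return Rpc(resp_buf)

class Rpc:
    def __init__(self, resp_buf):
        self._resp = resp_buf

    def wait(self):
        # A real client blocks until the response arrives; ours is done.
        return bytes(self._resp)
```

For example, an echo-style server handler that upper-cases requests would be wired in as `Transport(lambda req, resp: resp.extend(req.upper()))`.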
Progress over the Last Year
● Implemented a skeletal system:
  - Fast RPC
  - Log-structured data management
  - Simple servers
  - But not yet complete enough for production use
● Installed a 40-node cluster:
  - Mellanox Infiniband (32 Gb/sec; NICs bypass the kernel)
  - 10G Ethernet (Arista switch)
● Demonstrated fast recovery:
  - Why? Only one copy of data in DRAM
  - Goal: recover 64 GB from a failed server in 1-2 seconds
  - Basic recovery mechanism works, seems to scale
  - Submitted a paper to SOSP
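A back-of-envelope calculation shows why the 1-2 second recovery goal forces recovery to be spread across many machines. The bandwidth figure below is an illustrative assumption (one 10 Gb/s NIC), not a RAMCloud measurement.

```python
# Why recovering 64 GB in ~1 s cannot be done by one machine.
data_gb = 64
nic_gb_per_s = 10 / 8                 # one 10 Gb/s NIC ~= 1.25 GB/s (assumed)

one_machine_s = data_gb / nic_gb_per_s
print(f"one NIC: {one_machine_s:.1f} s")       # ~51 s, far too slow

# Scatter the dead server's data so many machines recover in parallel.
recovery_masters = 100                # illustrative cluster size
parallel_s = data_gb / (recovery_masters * nic_gb_per_s)
print(f"{recovery_masters} masters in parallel: {parallel_s:.2f} s")
```

Even ignoring disk and CPU costs, a single NIC needs close to a minute, while ~100 machines working in parallel bring the time under a second.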
Implementation Status
Components span a maturity scale from "a few ideas" through "throw-away first version" and "first real implementation" to "mature" (plus one extra level: "dissertation-ready, ask Diego"):
● RPC architecture, RPC transports, master server, backup server, cluster coordinator, threading, log cleaning
● Recovery: masters, backups, coordinator
● Performance tools, failure detection, cold start, Tub
● Higher-level data model, multi-object transactions, multi-tenancy, access control/security, split/move tablets, tablet placement, administration tools
RAMCloud Code Size
Code: 36,900 lines
Unit tests: 16,500 lines
Total: 53,400 lines
Selected Performance Metrics
● Latency for 100-byte reads (1 switch):
  - InfRc: 4.9 µs
  - TCP (1GigE): 92 µs
  - TCP (Infiniband): 47 µs
  - Fast + UDP (1GigE): 91 µs
  - Fast + UDP (Infiniband): 44 µs
  - Fast + InfUd: 4.9 µs
● Server throughput (InfRc, 100-byte reads, one core): 1.05 × 10⁶ requests/sec
● Recovery time (6.6 GB data, 11 recovery masters, 66 backups): 1.15 sec
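The recovery measurement above implies a per-master recovery rate, which can be scaled up to the 64 GB goal. The extrapolation assumes linear scaling in the number of recovery masters, which is exactly the property the recovery experiments are meant to test, so treat the final number as a rough projection.

```python
# Derived from the measured recovery: 6.6 GB, 11 recovery masters, 1.15 s.
data_gb, seconds, masters = 6.6, 1.15, 11

aggregate = data_gb / seconds            # cluster-wide recovery rate, GB/s
per_master = aggregate / masters         # rate each recovery master sustained
print(f"aggregate: {aggregate:.2f} GB/s, per master: {per_master:.2f} GB/s")

# At the same per-master rate, recovering 64 GB in 1 s would need roughly:
needed = 64 / per_master
print(f"recovery masters for 64 GB in 1 s: {needed:.0f}")
```

The measured run sustained about 5.7 GB/s in aggregate (~0.52 GB/s per master), so the 64 GB goal needs on the order of 120 recovery masters if scaling holds.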
Lessons/Conclusions (so far)
● Fast RPC is within reach
● The NIC is the biggest long-term bottleneck: it must be integrated with the CPU
● Recovery can be fast enough that replication in DRAM isn't needed for availability
● Randomized approaches are key to scalable distributed decision-making
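One concrete instance of the randomized style the last bullet alludes to is the classic "power of two choices" heuristic for placement decisions: sample a few servers at random and pick the least loaded, rather than maintaining a global view. This sketch is a generic illustration of that heuristic, not necessarily the exact policy RAMCloud uses.

```python
import random

def choose_server(load, k, rng):
    """Sample k servers at random and take the least loaded one.
    No central coordination or global load table is required."""
    candidates = rng.sample(range(len(load)), k)
    return min(candidates, key=lambda s: load[s])

def place(n_segments, n_servers, k, seed=0):
    """Place n_segments and return the max load; k=1 is purely random."""
    rng = random.Random(seed)
    load = [0] * n_servers
    for _ in range(n_segments):
        if k == 1:
            s = rng.randrange(n_servers)
        else:
            s = choose_server(load, k, rng)
        load[s] += 1
    return max(load)
```

Comparing `place(10000, 100, 1)` against `place(10000, 100, 2)` shows the effect: purely random placement leaves some servers well above the average load of 100 segments, while sampling just two candidates keeps the maximum close to the average.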
Plans for the Next Year
● Get experience with applications:
  - Joint project at Facebook over the summer
  - Finish "least usable system"
● Pick the next research question(s) to address:
  - What is the right transport protocol for the datacenter?
  - Cluster management?
  - Higher-level operations?
Upcoming RAMCloud Talks
● Performance measurements: Nandu Jayakumar
● Fast recovery: Ryan Stutsman, Diego Ongaro
● Simulating larger RAMCloud clusters: Asaf Cidon
● RAMCloud's transports: Diego Ongaro
● Multi-read operations: Ankita Kejriwal
● Tablet profiling: Steve Rumble
● Low-level latency measurements: Mario Flajslik