Bin Fan, David G. Andersen, Michael Kaminsky


1 MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
Bin Fan, David G. Andersen, Michael Kaminsky
MemC3: internal improvements to memcached servers
Concurrency, memory efficiency, and better performance
Presenter: Son Nguyen

2 Memcached internals
LRU caching using a chaining hash table and a doubly linked list

3 Goals
Reduce space overhead (bytes/key)
Improve throughput (queries/sec)
Target read-intensive workloads with small objects
Result: 3x throughput, 30% more objects

4 Problems with the doubly linked list
At least two pointers per item -> expensive in space
Both reads and writes change the list's structure -> threads must lock the list (no concurrency)
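The point that a read also modifies the list can be seen in a minimal sketch of strict LRU. This is not memcached's code: the class name `StrictLRU` is hypothetical, and Python's `OrderedDict` is assumed here as a stand-in for the hash table plus doubly linked list.

```python
from collections import OrderedDict


class StrictLRU:
    """Strict LRU cache; OrderedDict stands in for hash table + linked list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        # A read is also a write to the shared order:
        # the key is moved to the most-recently-used end.
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # Evict from the least-recently-used end.
            self.items.popitem(last=False)


cache = StrictLRU(capacity=3)
for k in ("ka", "kb", "kc"):
    cache.put(k, k.upper())
order_before = list(cache.items)
cache.get("ka")                      # a pure read...
order_after = list(cache.items)      # ...still reorders the structure
cache.put("kd", "KD")                # evicts kb, now the oldest
```

Because `get` reorders the shared structure, concurrent readers would have to serialize on it, which is the locking cost the slide refers to.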

5 Solution: CLOCK-based LRU
Approximate LRU
Multiple readers/single writer
Circular queue instead of a linked list -> less space overhead

6 CLOCK example
Entries in the circular queue after each step (each entry also carries a recency bit):
Originally: (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve)
Read(kd): (ka, va) (kb, vb) (kc, vc) (kd, vd) (ke, ve)
Write(kf, vf): (ka, va) (kb, vb) (kf, vf) (kd, vd) (ke, ve)
Write(kg, vg): (kg, vg) (kb, vb) (kf, vf) (kd, vd) (ke, ve)
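The CLOCK idea can be sketched as a fixed-size circular buffer with one recency bit per entry and a sweeping hand. This is an illustrative sketch, not MemC3's implementation: the class `ClockCache`, the side index `index`, and the choice to insert new entries with recency 0 are all assumptions, and the run below uses its own 3-entry example rather than reproducing the slide's.

```python
class ClockCache:
    """Approximate LRU: circular buffer, one recency bit per slot."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity    # each slot is None or (key, value)
        self.recency = [0] * capacity     # 1 = touched since the hand last passed
        self.index = {}                   # key -> slot position (assumed helper)
        self.hand = 0

    def read(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        # A read only sets a bit; the buffer layout is untouched.
        self.recency[pos] = 1
        return self.slots[pos][1]

    def write(self, key, value):
        pos = self.index.get(key)
        if pos is not None:
            self.slots[pos] = (key, value)
            self.recency[pos] = 1
            return
        # Sweep: clear recency bits until an empty or not-recently-used slot.
        while self.slots[self.hand] is not None and self.recency[self.hand] == 1:
            self.recency[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.slots[self.hand]
        if victim is not None:
            del self.index[victim[0]]
        self.slots[self.hand] = (key, value)
        self.recency[self.hand] = 0       # assumption: new entries start at 0
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity


cache = ClockCache(capacity=3)
for k, v in (("ka", "va"), ("kb", "vb"), ("kc", "vc")):
    cache.write(k, v)
cache.read("kb")            # sets kb's recency bit
cache.write("kd", "vd")     # evicts ka (recency 0)
cache.write("ke", "ve")     # skips kb (clearing its bit), evicts kc
keys = [slot[0] for slot in cache.slots]
```

In this run `kb` survives both evictions because it was read, while the untouched entries are replaced in hand order.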

7 Problems with the chaining hash table
Uses linked lists -> costly space overhead for pointers
Pointer dereferencing is slow (poor use of the CPU cache)
Reads are not constant time (chains can grow long)

8 Solution: Cuckoo Hashing
Use 2 hash tables
Each bucket has exactly 4 slots (fits in CPU cache)
Each (key, value) object can therefore reside in one of 8 possible slots

9 Cuckoo Hashing
[Diagram: (ka, va) maps to two candidate buckets, HASH1(ka) and HASH2(ka)]

10 Cuckoo Hashing
Read: at most 8 slot lookups (constant, fast)
Write: write(ka, va)
Find an empty slot among the 8 possible slots of ka
If all are full, kick out a randomly chosen occupant (kb, vb)
Now find an empty slot for (kb, vb)
Repeat up to 500 times or until an empty slot is found
If still not found, expand the table
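The read and write procedure above can be sketched as follows. This is a simplified, single-threaded illustration and not the MemC3 code: the class `CuckooTable`, the SHA-256-based bucket choice in `_bucket`, and the convention of returning the unplaced item instead of expanding the table are all assumptions made for the example.

```python
import hashlib
import random

SLOTS_PER_BUCKET = 4
MAX_KICKS = 500


class CuckooTable:
    def __init__(self, num_buckets, seed=0):
        self.num_buckets = num_buckets
        # Two tables of buckets, 4 slots each; a slot is None or (key, value).
        self.tables = [
            [[None] * SLOTS_PER_BUCKET for _ in range(num_buckets)]
            for _ in range(2)
        ]
        self.rng = random.Random(seed)

    def _bucket(self, which, key):
        # Assumed hash: SHA-256 of "<table number>:<key>", reduced modulo size.
        digest = hashlib.sha256(f"{which}:{key}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_buckets
        return self.tables[which][index]

    def read(self, key):
        # At most 2 buckets x 4 slots = 8 probes.
        for which in (0, 1):
            for slot in self._bucket(which, key):
                if slot is not None and slot[0] == key:
                    return slot[1]
        return None

    def write(self, key, value):
        """Return None on success, else the (key, value) left without a slot."""
        for which in (0, 1):
            bucket = self._bucket(which, key)
            for i, slot in enumerate(bucket):
                if slot is not None and slot[0] == key:
                    bucket[i] = (key, value)   # update in place
                    return None
        item = (key, value)
        for _ in range(MAX_KICKS):
            for which in (0, 1):
                bucket = self._bucket(which, item[0])
                for i, slot in enumerate(bucket):
                    if slot is None:
                        bucket[i] = item
                        return None
            # All 8 candidate slots are full: swap with a random occupant
            # and continue with the item that was kicked out.
            bucket = self._bucket(self.rng.randrange(2), item[0])
            i = self.rng.randrange(SLOTS_PER_BUCKET)
            bucket[i], item = item, bucket[i]
        return item   # caller would expand the table and reinsert this item


table = CuckooTable(num_buckets=16)
unplaced = []
for n in range(64):
    leftover = table.write(f"k{n}", f"v{n}")
    if leftover is not None:
        unplaced.append(leftover)
```

Note that in this kick-out order the displaced item is briefly held outside the table, so a concurrent reader could miss it.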

11 Cuckoo Hashing
[Diagram: Insert a. (ka, va) hashes to HASH1(ka) and HASH2(ka); a takes the slot held by b, so b is kicked out]

12 Cuckoo Hashing
[Diagram: Insert b. (kb, vb) hashes to HASH1(kb) and HASH2(kb); b takes the slot held by c, so c is kicked out]

13 Cuckoo Hashing
[Diagram: Insert c. (kc, vc) hashes to HASH1(kc) and HASH2(kc); c lands in an empty slot]
Done!

14 Cuckoo Hashing
Problem: after (kb, vb) is kicked out, a reader might attempt to read (kb, vb) and get a false cache miss
Solution: compute the kick-out path (cuckoo path) first, then move items backward along it
Before: (b,c,Null) -> (a,c,Null) -> (a,b,Null) -> (a,b,c)
Fixed: (b,c,Null) -> (b,c,c) -> (b,b,c) -> (a,b,c)
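The two orderings can be replayed on the slide's three-slot example. This is a toy sketch under stated assumptions: the table is flattened to a list of three slots, the cuckoo path is given as the list of slot positions `[0, 1, 2]`, and the function names `insert_forward` and `insert_backward` are hypothetical.

```python
def insert_forward(slots, path, item):
    """Kick-out order: each step leaves the displaced item outside the table."""
    states = [tuple(slots)]
    for pos in path:
        slots[pos], item = item, slots[pos]
        states.append(tuple(slots))
    return states


def insert_backward(slots, path, item):
    """Move items along a precomputed path, starting from the empty end."""
    states = [tuple(slots)]
    for dst, src in zip(reversed(path), reversed(path[:-1])):
        slots[dst] = slots[src]      # copy first, so the item is never absent
        states.append(tuple(slots))
    slots[path[0]] = item            # finally write the new item
    states.append(tuple(slots))
    return states


forward = insert_forward(["b", "c", None], [0, 1, 2], "a")
backward = insert_backward(["b", "c", None], [0, 1, 2], "a")

# Which intermediate states would give a reader a false miss for b or c?
forward_misses = [s for s in forward if "b" not in s or "c" not in s]
backward_misses = [s for s in backward if "b" not in s or "c" not in s]
```

In the forward order two intermediate states lack an existing key; in the backward order an item may briefly appear twice, but no existing key is ever missing.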

15 Cuckoo path
[Diagram: Insert a. Starting from HASH1(ka) and HASH2(ka), the cuckoo path through b and c is computed first]
Disadvantage: the hash tables are traversed twice

16 Cuckoo path backward insert
[Diagram: Insert a. Items are moved backward along the path (c first, then b), and a is written last]
Disadvantage: the hash tables are traversed twice

17 Cuckoo's advantages
Concurrency: multiple readers/single writer
Read optimized (entries fit in CPU cache)
Still O(1) amortized time for writes
30% less space overhead
95% table occupancy

18 Evaluation
68% throughput improvement in the all-hit case; 235% in the all-miss case

19 Evaluation
3x throughput on a "real" workload

20 Discussion
Writes are slower than with the chaining hash table
Chaining hash table: million keys/sec
Cuckoo: 7 million keys/sec
Idea: find the cuckoo path in parallel
Benchmark doesn't show much improvement
Can we make it write-concurrent?

