MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
Bin Fan, David G. Andersen, Michael Kaminsky
Internal improvements to memcached servers: concurrency, memory efficiency, and better performance
Presenter: Son Nguyen

Memcached internals: LRU caching using a chaining hash table (for lookup) and a doubly linked list (for eviction order).
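
A minimal sketch of that classic layout (field names are illustrative, not memcached's actual struct) to make the per-item pointer cost concrete:

```c
#include <stdint.h>

/* Sketch of the classic memcached item layout (names are
 * illustrative, not the real memcached struct). Each item carries
 * three pointers: one for the hash chain and two for the LRU list,
 * i.e. 24 bytes of overhead per item on a 64-bit machine. */
typedef struct item {
    struct item *h_next;   /* next item in the hash-table chain */
    struct item *lru_prev; /* previous item in the LRU list     */
    struct item *lru_next; /* next item in the LRU list         */
    uint32_t     keylen;
    uint32_t     vallen;
    char         data[];   /* key bytes followed by value bytes */
} item;
```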

Goals
- Reduce space overhead (bytes/key)
- Improve throughput (queries/sec)
- Target: read-intensive workloads with small objects
Result: 3x throughput, 30% more objects stored

Doubly-linked list's problems
- At least two pointers per item -> expensive in space
- Both reads and writes change the list's structure -> threads must lock the list (no concurrency)

Solution: CLOCK-based LRU
- Approximates LRU
- Allows multiple readers / single writer
- Circular queue instead of a linked list -> less space overhead
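
A minimal sketch of the CLOCK policy, assuming a single writer and one recency bit per cache slot (the names clock_touch/clock_evict are illustrative, not MemC3's API):

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 1024

static bool   recency[CACHE_SLOTS]; /* one recency bit per cached entry */
static size_t hand = 0;             /* the clock hand                   */

/* On a read hit, the reader only sets a bit -- no list surgery,
 * so concurrent readers never block each other. */
static void clock_touch(size_t slot) {
    recency[slot] = true;
}

/* On insert, the single writer advances the hand, clearing recency
 * bits until it finds a slot whose bit is already 0, and evicts it. */
static size_t clock_evict(void) {
    for (;;) {
        if (!recency[hand]) {
            size_t victim = hand;
            hand = (hand + 1) % CACHE_SLOTS;
            return victim;          /* the entry here gets replaced */
        }
        recency[hand] = false;      /* give it a second chance */
        hand = (hand + 1) % CACHE_SLOTS;
    }
}
```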

CLOCK example
Originally: entries (ka,va) (kb,vb) (kc,vc) (kd,vd) (ke,ve), each with a recency bit.
Read(kd): sets kd's recency bit to 1 -- no structural change.
Write(kf, vf): the hand sweeps past entries with recency bit 1 (clearing them) and replaces an entry whose bit is 0, here (kc,vc), giving (ka,va) (kb,vb) (kf,vf) (kd,vd) (ke,ve).
Write(kg, vg): the hand continues and replaces (ka,va), giving (kg,vg) (kb,vb) (kf,vf) (kd,vd) (ke,ve).

Chaining hash table's problems
- Uses linked lists -> costly space overhead for pointers
- Pointer dereferences are slow (no benefit from the CPU cache)
- Reads are not constant time (chains can grow long)

Solution: Cuckoo hashing
- Uses 2 hash tables
- Each bucket has exactly 4 slots (a bucket fits in a CPU cache line)
- Each (key, value) object can therefore reside in one of 8 possible slots
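
A minimal sketch of the 4-way cuckoo lookup under these assumptions: two tables as on the slide, string keys, and illustrative hash functions (MemC3 itself uses stronger hashing):

```c
#include <stdint.h>
#include <string.h>

#define SLOTS_PER_BUCKET 4
#define NUM_BUCKETS      (1u << 16)

typedef struct {
    const char *key;   /* NULL means the slot is empty */
    const char *value;
} slot_t;

typedef struct {
    slot_t slots[SLOTS_PER_BUCKET];
} bucket_t;

static bucket_t table1[NUM_BUCKETS], table2[NUM_BUCKETS];

/* Illustrative FNV-1a hash; hash2 is derived just for the sketch. */
static uint32_t fnv1a(const char *k) {
    uint32_t h = 2166136261u;
    while (*k) h = (h ^ (uint8_t)*k++) * 16777619u;
    return h;
}
static uint32_t hash1(const char *k) { return fnv1a(k) % NUM_BUCKETS; }
static uint32_t hash2(const char *k) { return (fnv1a(k) * 0x9e3779b9u) % NUM_BUCKETS; }

/* A lookup probes at most 2 buckets x 4 slots = 8 locations. */
static const char *cuckoo_lookup(const char *key) {
    bucket_t *cands[2] = { &table1[hash1(key)], &table2[hash2(key)] };
    for (int b = 0; b < 2; b++)
        for (int s = 0; s < SLOTS_PER_BUCKET; s++) {
            slot_t *sl = &cands[b]->slots[s];
            if (sl->key && strcmp(sl->key, key) == 0)
                return sl->value;
        }
    return NULL; /* miss */
}
```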

Cuckoo hashing (diagram): a key ka hashes to two candidate buckets, HASH1(ka) and HASH2(ka); (ka,va) may be stored in either.

Cuckoo hashing operations
Read: always at most 8 slot probes (constant time, fast)
Write(ka, va):
- Find an empty slot among the 8 possible slots for ka
- If all are full, randomly kick some (kb, vb) out and take its slot
- Then find an empty slot for (kb, vb); repeat up to 500 times or until an empty slot is found
- If still no empty slot is found, expand the table
A sketch of this kick-out loop follows.
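
A minimal sketch of the straightforward kick-out loop from this slide (the backward-path refinement comes two slides later). It continues the lookup sketch above, reusing bucket_t, slot_t, table1/table2, hash1/hash2, and SLOTS_PER_BUCKET; MAX_DISPLACE mirrors the 500-kick limit:

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DISPLACE 500

static bool bucket_put(bucket_t *b, const char *k, const char *v) {
    for (int s = 0; s < SLOTS_PER_BUCKET; s++)
        if (b->slots[s].key == NULL) {
            b->slots[s].key   = k;
            b->slots[s].value = v;
            return true;
        }
    return false; /* bucket full */
}

static bool cuckoo_insert(const char *key, const char *value) {
    for (int n = 0; n < MAX_DISPLACE; n++) {
        bucket_t *b1 = &table1[hash1(key)], *b2 = &table2[hash2(key)];
        if (bucket_put(b1, key, value) || bucket_put(b2, key, value))
            return true;
        /* All 8 slots are full: kick a random victim out of one of
         * the two candidate buckets and re-insert it next round. */
        bucket_t *b = (rand() & 1) ? b1 : b2;
        slot_t *victim = &b->slots[rand() % SLOTS_PER_BUCKET];
        const char *vk = victim->key, *vv = victim->value;
        victim->key = key; victim->value = value;
        key = vk; value = vv;   /* now place the displaced item */
    }
    return false; /* caller should expand the table and retry */
}
```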

Cuckoo hashing insert examples (three diagrams): a, b, and c are inserted in turn; when both candidate buckets for a key are full, an existing item is kicked to its alternate bucket until every item finds a slot. Done!

Cuckoo hashing's concurrency problem: after (kb, vb) is kicked out but before it is re-inserted, a reader looking up kb gets a false cache miss.
Solution: compute the kick-out path (the "cuckoo path") first, then move items backward along it, starting from the empty slot.
Before: (b,c,Null) -> (a,c,Null) -> (a,b,Null) -> (a,b,c) (b, then c, is briefly unreadable)
Fixed: (b,c,Null) -> (b,c,c) -> (b,b,c) -> (a,b,c) (every key stays readable throughout)
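
A minimal sketch of the backward move, assuming the cuckoo path has already been computed as an array of slot pointers ending at an empty slot, and a single writer (the helper name is hypothetical; it continues the slot_t definition above):

```c
/* path[0] is the slot the new key wants; path[len-1] is an empty
 * slot found beforehand. Moving items from the END of the path
 * backward means every displaced key stays readable in at least one
 * of its two buckets at all times -- the transient duplicate (the
 * slide's "(b,c,c)" state) is harmless to readers. */
static void cuckoo_path_move_backward(slot_t **path, int len,
                                      const char *key, const char *value) {
    for (int i = len - 1; i > 0; i--) {
        /* Copy the item one step toward the empty slot first ... */
        *path[i] = *path[i - 1];
        /* ... only then is path[i-1] free to be overwritten; a
         * reader searching for the moved key now finds it at path[i]. */
    }
    path[0]->key   = key;
    path[0]->value = value;
}
```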

Cuckoo path (diagram): inserting a first searches for a displacement path from one of ka's candidate buckets, HASH1(ka) or HASH2(ka), to an empty slot.

Cuckoo path backward insert (diagram): the items along that path are then moved backward, starting at the empty slot, before (ka,va) is written. Disadvantage: the hash tables are traversed twice (once to find the path, once to move items).

Cuckoo’s advantages Concurrency: multiple readers/single writer Read optimized (entries fit in CPU cache) Still O(1) amortized time for write 30% less space overhead 95% table occupancy

Evaluation: 68% throughput improvement in the all-hit case; 235% in the all-miss case.

Evaluation: 3x throughput on a "real" workload.

Discussion
- Writes are slower than with a chaining hash table: chaining reaches 14.38 million keys/sec vs. 7 million keys/sec for cuckoo
- Idea: find the cuckoo path in parallel; benchmarks don't show much improvement so far
- Can we make writes concurrent?