
Maintaining Large and Fast Streaming Indexes on Flash
Aditya Akella, UW-Madison
First GENI Measurement Workshop
Joint work with Ashok Anand, Steven Kappes (UW-Madison) and Suman Nath (MSR)

Memory & storage technologies
- Question: What is the role of emerging memory/storage technologies in supporting current and future measurements and applications?
- This talk: the role of flash memory in supporting applications and measurements that need large streaming indexes
  - Improving current apps and enabling future apps

Streaming stores and indexes
- Motivating apps/scenarios
  - Caching, content-based networks (DOT), WAN optimization, de-duplication
  - Large-scale and fine-grained measurements
    - E.g., IP Mon: compute per-packet queuing delays
    - Fast correlations across large collections of netflow records
- Index features
  - Streaming: data is stored in a streaming fashion; maintain an online index for fast access
    - Expire old data, update the index constantly
  - Large size: data store ~ several TB, index ~ 100s of GB
  - Need for speed (fast reads and writes)
    - Impacts usefulness of caching applications, timeliness of fine-grained TE

Index workload
- Key aspects
  - Index lookups and writes are random
  - Equal mix of reads/writes
  - New data replaces some old data: fast, constant expiry
- Index data structures
  - Tree-like (B-tree) and log structures are not suitable
    - Slow lookup (e.g., log(n) complexity in trees)
    - Poor support for flexible, fast garbage collection
  - Hash tables are ideal, but current options for large streaming hash tables are not optimal

Current options for >100GB hashtables
- DRAM: large DRAMs are expensive and can get very hot
- Disk: inexpensive, but too slow
- Flash provides a good balance between cost, performance, and power efficiency
  - Bigger and more energy efficient than DRAM
  - Comparable to disk in price
  - >2 orders of magnitude faster than disk, if used carefully
- But: appropriate data structures are needed to maximize flash effectiveness and overcome inefficiencies

Flash properties
- Flash chips
  - Layout: large number of blocks (128KB), each block has multiple pages (2KB)
  - Read/write granularity: page; erase granularity: block
  - Read page: 50us, write page: 400us, block erase: 1ms
  - Cheap: any read (including random), sequential write
  - Expensive: random writes/overwrites, sub-block deletion
    - Requires moving the valid pages out of the block to be erased
- SSDs: disk-like interface for flash
  - Sequential/random read, sequential write: 80us
  - Random write: 8ms
- Flash is good for hashtable lookups
- Insertions are hard: small random overwrites
- Expiration is hard: small random deletes
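To see why sub-block deletion is expensive, the chip timings above can be plugged into a back-of-the-envelope model. This is only a sketch: the sizes and timings are the ones quoted on the slide, while the cost model (read and rewrite every other page of the block, then erase) and all function names are assumptions made for illustration.

```python
# Sizes and timings as quoted on the slide; the cost model is an assumption.
READ_PAGE_US = 50
WRITE_PAGE_US = 400
ERASE_BLOCK_US = 1000
BLOCK_BYTES = 128 * 1024
PAGE_BYTES = 2 * 1024

PAGES_PER_BLOCK = BLOCK_BYTES // PAGE_BYTES  # 64 pages in each block


def single_page_delete_us(valid_pages: int = PAGES_PER_BLOCK - 1) -> int:
    """Cost of deleting one page when the remaining valid pages must survive.

    Assumed model: each valid page is read and rewritten elsewhere,
    then the whole block is erased.
    """
    copy_out = valid_pages * (READ_PAGE_US + WRITE_PAGE_US)
    return copy_out + ERASE_BLOCK_US


def batched_delete_us_per_page() -> float:
    """Cost per page when every page in the block expires together."""
    return ERASE_BLOCK_US / PAGES_PER_BLOCK


if __name__ == "__main__":
    print(PAGES_PER_BLOCK)               # pages per block
    print(single_page_delete_us())       # microseconds for one isolated delete
    print(batched_delete_us_per_page())  # microseconds per page when batched
```

Under this model an isolated page delete costs tens of milliseconds, while a batched block erase amortizes to a few microseconds per page, which is the gap BufferHash is designed to exploit.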

BufferHash data structure
- Batch expensive operations (random writes and deletes) on flash
- Maintain a hierarchy of small hashtables
  - Maintain upper levels in DRAM
- Efficient insertion
  - Accumulate random updates in memory
  - Flush accumulated updates to the lower level in flash (at the granularity of a flash page)
- Efficient deletion
  - Delete in batch (at flash block granularity)
  - Amortizes deletion cost

Handling small random updates
- Hash key = k bits (HT index) + N bits (HT key)
- 2^k buffers in DRAM; each table uses an N-bit key
1. Buffer small random updates in DRAM as small hashtables (buffers)
2. When a HT is full, write it to flash, without modifying existing data
- Each supertable is a collection of small hashtables: different incarnations over time of the same buffer
- How to search them? Use (bit-sliced) Bloom filters
[Figure labels: DRAM, Flash, bit-sliced Bloom filter, supertable]
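The two steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the bit widths, the buffer capacity, the choice of the high-order bits as the table index, and the names `split_key` and `buffered_write` are all assumptions, and a Python list stands in for flash.

```python
K_BITS = 2           # hypothetical: 2^2 = 4 buffers
N_BITS = 14          # hypothetical: remaining bits form the per-table key
BUFFER_CAPACITY = 2  # hypothetical, tiny so that flushes happen quickly


def split_key(hash_key: int) -> tuple[int, int]:
    """Split a (K_BITS + N_BITS)-bit hash into (table index k1, table key k2).

    Assumption: the high-order K_BITS select the buffer.
    """
    k1 = hash_key >> N_BITS
    k2 = hash_key & ((1 << N_BITS) - 1)
    return k1, k2


buffers = [{} for _ in range(1 << K_BITS)]      # small hashtables in DRAM
supertables = [[] for _ in range(1 << K_BITS)]  # stand-in for flash


def buffered_write(hash_key: int, value: str) -> None:
    """Absorb a random update in DRAM; flush the buffer once it is full."""
    k1, k2 = split_key(hash_key)
    buffers[k1][k2] = value
    if len(buffers[k1]) >= BUFFER_CAPACITY:
        # Append a new incarnation; earlier incarnations are never modified.
        supertables[k1].append(dict(buffers[k1]))
        buffers[k1].clear()
```

Each flush is a single sequential write of a whole hashtable, so the random updates never reach flash individually.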

Lookup
- Let key = (k1, k2)
- Check the k1-th hashtable in memory for the key k2
- If not found, use the Bloom filters to decide which hashtable h of the k1-th supertable may contain the key k2
- Read and check that hashtable (e.g., in the h-th page of the k1-th block of flash)
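A rough sketch of this lookup path follows. It uses one plain Bloom filter per incarnation rather than the bit-sliced variant named on the slides, and it probes incarnations newest first so that recent values win; both choices, along with the class and function names and the filter sizes, are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (a stand-in for the bit-sliced variant)."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: int):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: int) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def lookup(k1, k2, buffers, supertables, filters):
    """Return the value for (k1, k2), or None if it is not present."""
    # Step 1: check the k1-th in-memory hashtable.
    if k2 in buffers[k1]:
        return buffers[k1][k2]
    # Step 2: ask the Bloom filters which incarnations may hold k2.
    for h in reversed(range(len(supertables[k1]))):  # newest first (assumed)
        if filters[k1][h].might_contain(k2):
            table = supertables[k1][h]  # models reading one flash page
            if k2 in table:             # guards against false positives
                return table[k2]
    return None
```

Because the filters live in DRAM, a lookup that misses in memory typically touches flash only for the incarnations whose filters answer "maybe".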

Expiry of hash entries
- A supertable is a collection of hashtables
  - Expire the oldest hashtable from a supertable
- Option 1: use a flash block as a circular queue
  - Supertable = flash block, hashtable = flash page
  - Delete the oldest hashtable incarnation (page) and replace it with a new one
  - If a flash block has p pages, the supertable has the p latest hashtables
- Problem: a page cannot be deleted independently without deleting the block (requires copying the other pages)

Handle expiry of hash entries
- Interleave pages from different supertables when writing to flash or SSD
- Instead of 1111 2222 3333 4444, do this: 1234 1234 1234 1234
- Advantage: batch deletion of multiple oldest incarnations
- Other flexible expiration policies can also be supported
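The effect of interleaving can be illustrated by laying out page owners under both schemes. The sketch below assumes four supertables, four incarnations each, four pages per block, and pages written oldest first; the function names are hypothetical.

```python
def grouped_layout(num_supertables: int, incarnations: int) -> list[int]:
    """Pages of one supertable stored contiguously: 1111 2222 ..."""
    return [s for s in range(1, num_supertables + 1) for _ in range(incarnations)]


def interleaved_layout(num_supertables: int, incarnations: int) -> list[int]:
    """Same-age incarnations of all supertables stored together: 1234 1234 ..."""
    return [s for _ in range(incarnations) for s in range(1, num_supertables + 1)]


def split_into_blocks(layout: list[int], pages_per_block: int) -> list[list[int]]:
    """Group a flat page layout into erase blocks."""
    return [layout[i:i + pages_per_block]
            for i in range(0, len(layout), pages_per_block)]


if __name__ == "__main__":
    print(split_into_blocks(grouped_layout(4, 4), 4))
    print(split_into_blocks(interleaved_layout(4, 4), 4))
```

In the interleaved layout the first block holds the oldest incarnation of every supertable, so a single block erase expires all of them with no valid pages to copy. In the grouped layout each block mixes old and new incarnations of one supertable, so expiring only the oldest page would force the rest to be moved.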

Insertion
- Key = (k1, k2)
- Insert into the k1-th in-memory hashtable, using k2 as the key
- If the hashtable is full
  - Expire the tail hashtable in the k1-th supertable
    - This expires the oldest incarnation from all supertables
  - Copy the k1-th hashtable from memory to the head of the k1-th supertable
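The insertion path can be sketched with a bounded deque per supertable. This is a simplification under stated assumptions: the constants and the name `insert` are hypothetical, and each supertable expires its own tail independently, whereas the slides expire the oldest incarnation of all supertables at once through a shared block erase.

```python
from collections import deque

NUM_TABLES = 4       # hypothetical 2^k
BUFFER_CAPACITY = 2  # hypothetical entries per in-memory hashtable
INCARNATIONS = 3     # hypothetical p: incarnations kept per supertable

buffers = [{} for _ in range(NUM_TABLES)]
# With maxlen set, pushing onto a full deque discards from the opposite end.
supertables = [deque(maxlen=INCARNATIONS) for _ in range(NUM_TABLES)]


def insert(k1: int, k2: int, value) -> None:
    """Insert into the k1-th buffer; flush it to the supertable when full."""
    buffers[k1][k2] = value
    if len(buffers[k1]) >= BUFFER_CAPACITY:
        # appendleft places the new incarnation at the head; if the
        # supertable is already full, the tail (oldest) is dropped.
        supertables[k1].appendleft(dict(buffers[k1]))
        buffers[k1].clear()
```

After enough inserts each supertable holds only its `INCARNATIONS` most recent hashtables, newest at index 0.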

Benchmarks
- Prototyped BufferHash on two SSDs and a hard drive
- 99th percentile read and write latencies under 0.1ms
  - Two orders of magnitude better than disks, at roughly similar cost
- Built a WAN accelerator that is 3X better than current designs
- Theoretical results on tuning BufferHash parameters
  - Low Bloom filter false positives, low lookup cost, low deletion cost on average
  - Optimal buffer size

Conclusion
- Many emerging apps and important measurement problems need fast streaming indexes with constant read/write/eviction
- Flash provides a good hardware platform to maintain such indexes
- BufferHash helps maximize flash effectiveness and overcome inefficiencies
- Open issues:
  - Role of flash in other measurement problems/architectures?
  - Role of other emerging memory/storage technologies (e.g., PCM)?
  - How to leverage persistence?

I/O operations
- API
  - Data store/index: StoreData(data)
    - Add data to store; create/update index with data_name
  - Data store: address Lookup(data_name)
  - Data store: Data ReadData(address)
  - Data store/index: ExpireOldData()
    - Remove old data from store; clean up index
- Workload
  - data_name is a hash over data
  - Index lookups and writes are random
  - Equal mix of reads/writes
- Index data structures
  - Tree-like and log structures not suitable
  - Hash tables ideal, but current options for large streaming hash tables not optimal…
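As a rough illustration of how the four operations fit together, here is an in-memory stand-in. The operation names mirror the slide, but the class, the use of SHA-1 for data_name, the integer addresses, and the fixed-capacity FIFO expiry are all assumptions; the slides do not specify any of them.

```python
import hashlib
from collections import OrderedDict


class StreamingStore:
    """Toy data store plus index; not the authors' implementation."""

    def __init__(self, max_items: int) -> None:
        self.max_items = max_items
        self.log = OrderedDict()  # address -> data, in arrival order
        self.index = {}           # data_name -> address
        self.next_address = 0

    @staticmethod
    def data_name(data: bytes) -> str:
        # data_name is a hash over the data (SHA-1 is an assumed choice).
        return hashlib.sha1(data).hexdigest()

    def store_data(self, data: bytes) -> str:
        """StoreData: add data to the store and update the index."""
        name = self.data_name(data)
        address = self.next_address
        self.next_address += 1
        self.log[address] = data
        self.index[name] = address
        if len(self.log) > self.max_items:
            self.expire_old_data()
        return name

    def lookup(self, name: str):
        """Lookup: return the address for data_name, or None."""
        return self.index.get(name)

    def read_data(self, address: int) -> bytes:
        """ReadData: fetch the data stored at an address."""
        return self.log[address]

    def expire_old_data(self) -> None:
        """ExpireOldData: drop the oldest item and clean up the index."""
        address, data = self.log.popitem(last=False)
        name = self.data_name(data)
        # Only remove the index entry if it still points at the expired copy.
        if self.index.get(name) == address:
            del self.index[name]
```

Since data_name is a hash, consecutive index operations land on unrelated keys, which is why the index workload is random even though the data store itself is written sequentially.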
