Hash Tables With Finite Buckets Are Less Resistant to Deletions Yossi Kanizo (Technion, Israel) Joint work with David Hay (Columbia U. and Hebrew U.) and Isaac Keslassy (Technion)

Presentation transcript:

Hash Tables With Finite Buckets Are Less Resistant to Deletions
Yossi Kanizo (Technion, Israel)
Joint work with David Hay (Columbia U. and Hebrew U.) and Isaac Keslassy (Technion)

Hash Tables for Networking Devices
- Hash tables and hash-based structures are often used in high-speed devices:
  - Heavy-hitter flow identification
  - Flow state keeping
  - Flow counter management
  - Virus signature scanning
  - IP address lookup algorithms
- In many applications, elements are also deleted (a.k.a. dynamic hash tables).

Dynamic vs. Static
- Dynamic hash tables are harder to model than static ones, which see insertions only [Kirsch et al.].
- Past studies show the same asymptotic behavior with infinite buckets, whether for insertions only or for alternating insertions and deletions:
  - Traditional hashing using linked lists: maximum bucket size of approx. log n / log log n [Gonnet, 1981]
  - d-random and d-left schemes: maximum bucket size of log log n / log 2 + O(1) [Azar et al., 1994; Vöcking, 1999]
- Using the static model therefore seems natural.

High-Speed Hardware
- A bucket is a memory word that contains h elements.
  - E.g., a 128-bit memory word holds h = 4 elements of 32 bits each.
- Assumption: access cost (reading and writing a word) = 1 cycle.
- Overflows are enabled: after d memory accesses, an element goes to the overflow list.
  - The overflow list can be stored in an expensive CAM.
  - Otherwise, overflow elements are lost elements.
- Overflow fraction = γ
- [Figure: memory of buckets holding h elements each, next to a CAM]
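
The bucket-and-overflow model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `FiniteBucketTable`, the `choices` helper, and the use of a seeded pseudo-random generator as a stand-in for d hash functions are all hypothetical and do not come from the talk.

```python
import random


class FiniteBucketTable:
    """m buckets of capacity h; an element probes up to d buckets, then overflows."""

    def __init__(self, m, h, d, seed=0):
        self.m, self.h, self.d = m, h, d
        self.buckets = [[] for _ in range(m)]
        self.overflow = []      # stands in for the CAM / overflow list
        self.accesses = 0       # bucket accesses spent on insertions
        self.seed = seed

    def choices(self, key):
        # Hypothetical hash functions: d pseudo-random bucket indices per key.
        rng = random.Random(hash((self.seed, key)))
        return [rng.randrange(self.m) for _ in range(self.d)]

    def insert(self, key):
        """Try the d candidate buckets in order; return True if the key was stored."""
        for b in self.choices(key):
            self.accesses += 1              # one memory access per probed bucket
            if len(self.buckets[b]) < self.h:
                self.buckets[b].append(key)
                return True
        self.overflow.append(key)           # all d probes hit full buckets
        return False


table = FiniteBucketTable(m=1000, h=4, d=2)
n = 4000                                    # load c = n / (m * h) = 1
for key in range(n):
    table.insert(key)
print("overflow fraction:", len(table.overflow) / n)
print("average accesses:", table.accesses / n)
```

The two printed quantities correspond to the overflow fraction and the average number of memory accesses per element that the talk trades off against each other.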

Degradation with Finite Buckets
- Finite buckets are used.
- Degradation in performance
- [Figure: finite vs. infinite buckets, with H(1) = 3 and H(2) = 3, after removing element 1]
- Element “2” is not stored although its corresponding bucket is empty.

Degradation with Finite Buckets
- What we had is:
  - Insert element “1”
  - Insert element “2”
  - Remove element “1”
- Equivalent to only inserting element “2” in the static case
- [Figure: buckets 1 to 4, finite vs. infinite]
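
The three-step sequence can be replayed directly. The sketch below assumes a single hash function with H(1) = H(2) = 3, as on the earlier slide; the function name `run_example` is hypothetical, and an infinite bucket is modeled by `h = float("inf")`.

```python
def run_example(h):
    """Replay insert 1, insert 2, remove 1 with H(1) = H(2) = 3 and bucket size h."""
    H = {1: 3, 2: 3}                # both elements hash to bucket 3
    buckets = {3: []}
    overflow = []
    for key in (1, 2):
        if len(buckets[H[key]]) < h:
            buckets[H[key]].append(key)
        else:
            overflow.append(key)    # bucket full: the element overflows
    buckets[H[1]].remove(1)         # delete element 1
    return buckets[3], overflow


finite_bucket, finite_overflow = run_example(h=1)
infinite_bucket, infinite_overflow = run_example(h=float("inf"))
print("finite:  ", finite_bucket, finite_overflow)
print("infinite:", infinite_bucket, infinite_overflow)
```

With h = 1 the bucket ends up empty while element 2 sits in the overflow list; with an infinite bucket, element 2 is stored, exactly as if it had been the only insertion.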

Simulations [h=1, load=n/(mh)=1, d = 2]

Comparing Static and Dynamic
- Static setting: insertions only
  - n = number of elements
  - m = number of buckets
- Dynamic setting: alternations between element insertions and deletions of randomly chosen elements
  - Fixed load of c = n / (mh)
- Fair comparison
  - Given an average number of memory accesses a, minimize the overflow fraction γ.
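
A small Monte Carlo experiment makes the two settings concrete for the single-choice scheme (a = 1). It is a sketch under assumptions that are not in the talk: uniform random bucket choices stand in for the hash function, the function names are hypothetical, and the dynamic overflow fraction is averaged over the second half of the run to skip the warm-up.

```python
import random


def static_overflow(m, h, c, rng):
    """Insert n = c*m*h elements once (single choice); return the overflow fraction."""
    n = int(c * m * h)
    load = [0] * m
    lost = 0
    for _ in range(n):
        b = rng.randrange(m)
        if load[b] < h:
            load[b] += 1
        else:
            lost += 1
    return lost / n


def dynamic_overflow(m, h, c, rounds, rng):
    """Keep n elements alive; each step deletes a random one and inserts a new one."""
    n = int(c * m * h)
    load = [0] * m
    where = [None] * n          # bucket of each live element, None if overflowed
    lost = 0                    # live elements currently in the overflow list

    def place(i):
        nonlocal lost
        b = rng.randrange(m)
        if load[b] < h:
            load[b] += 1
            where[i] = b
        else:
            where[i] = None
            lost += 1

    for i in range(n):
        place(i)
    total, samples = 0.0, 0
    for step in range(rounds * n):
        i = rng.randrange(n)            # delete a randomly chosen element
        if where[i] is None:
            lost -= 1
        else:
            load[where[i]] -= 1
        place(i)                        # a new element takes its slot
        if step >= (rounds // 2) * n:   # average over the second half only
            total += lost / n
            samples += 1
    return total / samples


rng = random.Random(1)
print("static :", static_overflow(2000, 1, 1.0, rng))
print("dynamic:", dynamic_overflow(2000, 1, 1.0, 40, rng))
```

For h = 1 and c = 1 the two estimates should land near the 36.79% and 50% figures quoted in the numerical example later in the talk.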

Why Do We Care about the Average Number of Memory Accesses?
- On-chip memory: memory accesses → power consumption
- Off-chip memory: memory accesses → lost on/off-chip pin capacity
- Datacenters: memory accesses → network and server load
- Parallelism does not help reduce these costs.
  - d serial or parallel memory accesses have the same cost.

From Discrete to Fluid Model
- Discrete model
  - Models the system accurately but induces complex interactions between the elements
- Approximation using a fluid model
  - Based on differential equations with an infinite number of elements and buckets
  - Elements stay in the system for an exponentially distributed duration with mean 1, so a bucket's departure rate is proportional to its occupancy.
  - Upon each departure, a new element arrives, so the arrival rate is constant (fixed load in the system). Assuming uniformly distributed hash functions, the bucket arrival rate is n / m = ch.
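
One way to read the fluid description, for the single-choice scheme, is as a system of differential equations over p_i, the fraction of buckets holding i elements: mass moves from state i to i+1 at rate ch (while i < h) and from state i to i-1 at rate i. The Euler integration below is a sketch of that reading only; the equations, the function name `fluid_single_choice`, and the step size are assumptions, not the authors' formulation.

```python
def fluid_single_choice(h, c, t_end=30.0, dt=0.001):
    """Integrate the occupancy fractions p[0..h] of a single-choice table over time."""
    lam = c * h                     # per-bucket arrival rate n / m = ch
    p = [1.0] + [0.0] * h           # start with all buckets empty
    for _ in range(int(t_end / dt)):
        dp = [0.0] * (h + 1)
        for i in range(h + 1):
            if i < h:               # an arrival moves mass from state i to i + 1
                flow = lam * p[i]
                dp[i] -= flow
                dp[i + 1] += flow
            if i > 0:               # a departure moves mass from state i to i - 1
                flow = i * p[i]
                dp[i] -= flow
                dp[i - 1] += flow
        p = [x + dt * y for x, y in zip(p, dp)]
    return p


p = fluid_single_choice(h=1, c=1.0)
print("fraction of full buckets:", p[-1])
```

Because arrivals are spread uniformly over buckets, the fraction of full buckets in steady state is also the fraction of arrivals that overflow.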

Main Results
- Case study: single-choice hashing scheme
- Lower bound on the overflow fraction
- Mitigating the degradation in performance

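Case Study: Analysis of Single Choice Hashing Scheme
- Departure rate is proportional to bucket occupancy; arrival rate is constant.
- We show that the (limit of the) discrete Markov chain matches the fluid model.
  - Intuition: there is no dependency between the buckets because of the single choice, hence no “complex interaction”.
- Bucket occupancy distribution: P(i) = [(ch)^i / i!] / [Σ_{k=0..h} (ch)^k / k!], for i = 0, ..., h
- Overflow fraction (Erlang-B formula): γ = [(ch)^h / h!] / [Σ_{k=0..h} (ch)^k / k!]
- [Figure: Markov chain of bucket occupancy over states 0, 1, 2, …, h, with transition labels 1/m·(1-1/n), 1/m·(1-2/n), 1/m·(1-3/n), …, 1/m·(1-h/n) and (1-1/m)·1/n, (1-1/m)·2/n, (1-1/m)·3/n, …, h/n]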
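
A bucket with constant arrival rate ch, departure rate proportional to occupancy, and room for h elements behaves like an Erlang loss system with h servers and offered load ch. Assuming that reading, the standard truncated-Poisson occupancy distribution and the Erlang-B blocking probability can be evaluated as follows; the function names are hypothetical.

```python
from math import factorial


def occupancy_distribution(h, c):
    """Stationary probability that a bucket of size h holds i elements, i = 0..h."""
    lam = c * h                                     # offered load per bucket
    weights = [lam ** i / factorial(i) for i in range(h + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def erlang_b(h, c):
    """Overflow fraction of single-choice hashing: probability the bucket is full."""
    return occupancy_distribution(h, c)[h]


for h in (1, 2, 4, 8):
    print(h, erlang_b(h, 1.0))
```

For h = 1 the expression reduces to c / (1 + c), the value used in the numerical example.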

Case Study: Numerical Example
- For bucket size h = 1, we get γ = c/(1 + c).
- At 100% load (c = 1):
  - Dynamic: 50%
  - Static: 36.79% [Kanizo et al., INFOCOM 2009]
- At 10% load (c = 0.1):
  - Dynamic: 9.1%
  - Static: 4.84%
- As the load → 0, dynamic systems have twice the overflow fraction of static systems.
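
The figures on this slide can be checked numerically. The dynamic value is c / (1 + c) as stated; for the static value the sketch assumes the usual large-m approximation for throwing n = cm elements into m unit buckets, where a bucket is non-empty with probability 1 - e^(-c). That static expression is an assumption used here for illustration, chosen because it reproduces the quoted 36.79% and 4.84%.

```python
from math import exp


def dynamic_overflow_h1(c):
    """Dynamic single-choice overflow fraction for h = 1 (from the slide)."""
    return c / (1 + c)


def static_overflow_h1(c):
    """Static single-choice overflow fraction for h = 1, large-m approximation."""
    stored_per_bucket = 1 - exp(-c)     # probability that a bucket is non-empty
    return 1 - stored_per_bucket / c    # elements not stored, per inserted element


for c in (1.0, 0.1, 0.001):
    dyn, sta = dynamic_overflow_h1(c), static_overflow_h1(c)
    print(f"c={c}: dynamic={dyn:.4f} static={sta:.4f} ratio={dyn / sta:.3f}")
```

At low load the dynamic value behaves like c and the static one like c / 2, which is where the factor of two comes from.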

Main Results
- Case study: single-choice hashing scheme
- Lower bound on the overflow fraction
- Mitigating the degradation in performance

Overflow Lower Bound
- Objective: given any online scheme with an average of a memory accesses, find a lower bound on the overflow fraction γ.
- We use the fluid model:
  - Element arrival rate is ch = n / m.
  - Hashing rate per element is a.
  - In the best case, all memory accesses are used to store elements.

Overflow Lower Bound
- The overflow lower bound is expressed in terms of r = ach.
- It also holds for non-uniformly distributed hash functions (under some constraints).
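
The expression for the bound is not preserved in this transcript. One form that is consistent with the surrounding slides can be obtained as follows, offered only as a hedged reconstruction: if every memory access is a storing attempt, each bucket sees attempts at rate r = ach, so it behaves like an Erlang loss system with offered load r. It then stores elements at rate r(1 - B(r, h)), and dividing by the element arrival rate ch gives the stored fraction. The symbol B(r, h) is notation introduced here for the Erlang-B formula.

```latex
\gamma \;\ge\; 1 - a\,\bigl(1 - B(r,h)\bigr),
\qquad
B(r,h) = \frac{r^{h}/h!}{\sum_{k=0}^{h} r^{k}/k!},
\qquad
r = ach .
% Check for h = 1:  B(r,1) = \frac{r}{1+r} = \frac{ac}{1+ac}
% \;\Rightarrow\; \gamma \ge 1 - \frac{a}{1+ac}.
% Check for a = 1:  \gamma \ge B(ch,h), the single-choice overflow fraction.
```

Both checks agree with values stated in the talk: the h = 1 bound of 1 - a/(1 + ac), and the optimality of single-choice hashing at a = 1.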

Numerical Example
- For bucket size h = 1, the lower bound is 1 - a/(1 + ac).
- 100% load (c = 1) implies a lower bound of 1/(1 + a).
  - To get an overflow fraction of 1%, one needs at least 99 memory accesses per element.
  - Infeasible for high-speed networking devices
- Compare with the tight upper bound of e^(-a) in the static case [Kanizo et al., INFOCOM 2009]:
  - An overflow fraction of 1% needs only about 4.6 memory accesses.
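
The two access counts follow from inverting the bounds at a 1% target: 1/(1 + a) = 0.01 gives a = 99, and e^(-a) = 0.01 gives a = ln 100. A short check, with hypothetical function names:

```python
from math import exp, log


def dynamic_bound_full_load(a):
    """Dynamic lower bound on the overflow fraction for h = 1, c = 1."""
    return 1 / (1 + a)


def static_bound(a):
    """Static-case bound e^(-a) quoted on the slide."""
    return exp(-a)


def accesses_needed_dynamic(target):
    """Smallest a with 1 / (1 + a) <= target."""
    return 1 / target - 1


def accesses_needed_static(target):
    """Smallest a with e^(-a) <= target."""
    return log(1 / target)


print("dynamic:", accesses_needed_dynamic(0.01))
print("static :", accesses_needed_static(0.01))
```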

The Lower Bound is Tight
- Single-choice hashing scheme
  - Optimal for a = 1
- Multiple choice: try to insert each element greedily until it is either inserted or d trials have been made.
  - Optimal for a larger number of memory accesses, depending on system parameters
- Example:
  - h = 4, c = 1, d = 4
  - Multiple choice is optimal for a ≤ 2.19.

Main Results
- Case study: single-choice hashing scheme
- Lower bound on the overflow fraction
- Mitigating the degradation in performance

Moving Back Elements
- Recall the example from the beginning.
- [Figure: buckets 1 to 4, finite vs. infinite]
- Element “2” is not stored although its corresponding bucket is empty.

Moving Back Elements
- Overflow elements are stored in the CAM.
- Idea: move elements back from the CAM to the buckets.
- Upon a deletion, we cannot check every element in the CAM.
  - Store the hash values along with the elements in the CAM.
  - Upon a departure, check whether an element can be moved back.
- Can be combined with any hashing insertion scheme.
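
The move-back idea can be sketched for the single-choice case. The CAM is modeled here as a dictionary keyed by hash value, so that a deletion needs one lookup instead of a scan over all overflow elements. That data-structure choice, the class name `MoveBackTable`, and the `hash_fn` parameter are assumptions made for illustration, not the hardware design from the talk.

```python
from collections import defaultdict


class MoveBackTable:
    """Single-choice table whose overflow store is indexed by hash value."""

    def __init__(self, m, h, hash_fn):
        self.m, self.h, self.hash_fn = m, h, hash_fn
        self.buckets = [[] for _ in range(m)]
        self.cam = defaultdict(list)        # hash value -> overflowed elements

    def insert(self, key):
        b = self.hash_fn(key)
        if len(self.buckets[b]) < self.h:
            self.buckets[b].append(key)
        else:
            self.cam[b].append(key)         # keep the hash value with the element

    def delete(self, key):
        b = self.hash_fn(key)
        if key in self.buckets[b]:
            self.buckets[b].remove(key)
            if self.cam[b]:                 # one lookup by hash value, no full scan
                self.buckets[b].append(self.cam[b].pop(0))
        elif key in self.cam[b]:
            self.cam[b].remove(key)


# Replay the opening example: H(1) = H(2) = 3, bucket size h = 1.
table = MoveBackTable(m=4, h=1, hash_fn=lambda k: {1: 3, 2: 3}[k])
table.insert(1)
table.insert(2)
table.delete(1)
print(table.buckets[3], table.cam[3])
```

After the deletion, element 2 occupies bucket 3 and the overflow store is empty, which is the state the static case would reach by inserting element 2 alone.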

Evaluation
- Single-choice hashing scheme
  - Performance is exactly as in the static case.
- Multiple-choice hashing scheme
  - Performance is better than in the static case, albeit with more memory accesses.
- [Figure: simulation results, h=4, d=1]

Wrap-up
- Initial simulation results show degradation in performance.
- We found lower and upper bounds on the achievable overflow fraction.
  - We compared them with the upper bounds of the static case.
- Mitigating the degradation in performance
- Also in the paper:
  - Simulations with synthetic data
  - Other dynamic models
  - Trace-driven simulations

Thank you.