An Approach to Generalized Hashing
Michael Klipper, with Dan Blandford and Guy Blelloch

Hashing techniques currently available
• Many hashing algorithms out there:
   - Separate chaining
   - Cuckoo hashing
   - FKS perfect hashing
• Also many hash functions designed, including several universal families
• Good: O(1) expected amortized time for updates, and many have O(1) worst-case time for searches
• Bad: Require fixed-length keys and fixed-length data

What’s so bad about fixed length?
• Easy to waste a lot of space: every hash bucket must be as large as the largest item to be stored in the table. This is a large problem for sparsely filled tables, or tables where large items occur infrequently.
• Hash tables are often building blocks for more complicated structures, so optimizing them pays off in a lot of places.

Example: A Graph Layout Where We Store Edges in a Hash Table
Let’s say u is a vertex of degree d and v_1, …, v_d are its neighbors. Let’s say that v_0 = v_{d+1} = u by convention. Then the entry representing the edge (u, v_i) has key (u, v_i) and data (v_{i-1}, v_{i+1}).
[Figure: vertex u with neighbors v_1 through v_4 and the corresponding hash table entries. Labels: “Hash Table”; “Degree of Vertex” (4); “This extra entry ‘starts’ the list.”]
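The layout on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, a plain `dict` stands in for the hash table, the graph is assumed to have no self-loops, and the "start" entry is assumed to be keyed (u, u) with data (v_d, v_1), which is one reading of the convention v_0 = v_{d+1} = u.

```python
def build_edge_table(adj):
    """Build {(u, v_i): (v_{i-1}, v_{i+1})} from adjacency lists.

    adj maps each vertex u to its ordered neighbor list [v_1, ..., v_d].
    """
    table = {}
    for u, nbrs in adj.items():
        if not nbrs:
            continue
        ring = [u] + list(nbrs) + [u]          # v_0 = v_{d+1} = u by convention
        for i in range(1, len(nbrs) + 1):
            table[(u, ring[i])] = (ring[i - 1], ring[i + 1])
        # Assumed form of the extra entry that "starts" the list.
        table[(u, u)] = (nbrs[-1], nbrs[0])
    return table


def neighbors(table, u):
    """Walk the linked list of u's neighbors using only hash lookups."""
    out = []
    if (u, u) not in table:
        return out
    v = table[(u, u)][1]                        # first neighbor v_1
    while v != u:                               # reaching u again ends the list
        out.append(v)
        v = table[(u, v)][1]                    # follow the "next" pointer
    return out


adj = {0: [3, 1, 2], 1: [0], 2: [0], 3: [0]}
table = build_edge_table(adj)
```

With this table, `neighbors(table, 0)` walks 3, 1, 2 in order, and each step costs one hash lookup.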

An Idea for Compression
• Instead of ((u, v_i), (v_{i-1}, v_{i+1})) in the table, we will store ((u, v_i - u), (v_{i-1} - u, v_{i+1} - u)).
• With this representation, we need O(kn) space where k = Σ_{(u,v) ∈ E} log |u - v|.
• A good labeling of the vertices will make many of these differences small! But not all of them. The following paper has details: D. Blandford, G. E. Blelloch, and I. Kash. Compact Representations of Separable Graphs. In SODA, 2003, pages
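A small sketch of the difference encoding, assuming integer vertex labels. The function names and the bit-cost measure (magnitude bits plus one sign bit) are illustrative assumptions, not taken from the paper.

```python
def encode_entry(u, v, prev, nxt):
    """Store neighbors as offsets from u instead of absolute labels."""
    return ((u, v - u), (prev - u, nxt - u))


def decode_entry(entry):
    """Recover the absolute labels (u, v, prev, nxt) from an encoded entry."""
    (u, dv), (dprev, dnxt) = entry
    return u, u + dv, u + dprev, u + dnxt


def signed_bits(x):
    """Hypothetical cost of a signed value: magnitude bits plus a sign bit."""
    return abs(x).bit_length() + 1


entry = encode_entry(1000, 1003, 998, 1001)
raw_cost = sum(n.bit_length() for n in (1003, 998, 1001))
diff_cost = sum(signed_bits(d) for d in (entry[0][1], *entry[1]))
```

Here the three neighbor labels cost 30 bits as absolute values but 8 bits as offsets from u, which is the effect a good vertex labeling aims for.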

First, a simpler problem
• Variable-length data stored in arrays
• It’s like a hash table except that the indices now are in the fixed range 0…n-1 for n items in the array.
We’ll use the following data for our example in these slides:
(0, 10110) (1, 0110) (2, 11111) (3, 0101) (4, 1100) (5, 010) (6, 11011) (7, )
We’ll assume that the word size of the machine is 2 bytes.

Key Idea: BLOCKS
• Multiple data items can be crammed into a word, so let’s take advantage of that.
• Two words in a block: one with data, one marking off separations of strings
• If the first index in a block is i, we’ll label the block as b_i
[Figure: block b_0 with its 1st word and 2nd word. This is the block containing strings s_0 through s_2 from our example.]
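One way to picture a block is the sketch below, using the 16-bit word size from the example. The slide does not say how the second word marks separations, so the convention here (a 1 bit at the last position of each string, zero padding on the right) is an assumption, as are the function names.

```python
W = 16  # word size in bits (2 bytes, as in the slides' example)


def pack_block(strings):
    """Pack nonempty bit strings into (data_word, mark_word).

    Assumed convention: the mark word has a 1 at the last bit of each string;
    unused bits at the right end of both words are zero.
    """
    data = "".join(strings)
    if len(data) > W or any(not s for s in strings):
        raise ValueError("strings must be nonempty and fit in one word")
    marks = "".join("0" * (len(s) - 1) + "1" for s in strings)
    return int(data.ljust(W, "0"), 2), int(marks.ljust(W, "0"), 2)


def unpack_block(data_word, mark_word):
    """Split the data word back into strings using the mark word."""
    data = format(data_word, f"0{W}b")
    marks = format(mark_word, f"0{W}b")
    out, start = [], 0
    for pos, bit in enumerate(marks):
        if bit == "1":                       # a string ends at this position
            out.append(data[start:pos + 1])
            start = pos + 1
    return out


# Block b_0 from the example: strings s_0, s_1, s_2 (14 bits in total).
data_word, mark_word = pack_block(["10110", "0110", "11111"])
```

Because the strings are required to be nonempty, the mark bits alone are enough to tell where the last string ends and the padding begins.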

Organization of Blocks
• Index structure (regular array): A[i] = 1 if and only if string #i starts a block
• Hash table (one of the regular kind): if string #i starts a block, H(i) = address of b_i
• Note that it is easy to split and merge blocks.
Key size invariant: two adjacent blocks (like b_0 and b_3 in the example) must have their sizes sum to greater than the word size of the machine.
[Figure: index array A with blocks b_0, b_3, and b_7, reached through H(0), H(3), and H(7).]
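A lookup through the index array and hash table might look like the sketch below. It simplifies in several labeled ways: blocks are kept as Python lists rather than packed words, a `dict` stands in for H and holds the block itself instead of its address, the backward scan over A is linear (the slides rely on word-level table lookups instead), and string 7 is left out because its value is missing from the transcript. All names are hypothetical.

```python
def build_index(strings, starts):
    """Return (A, H): A[i] = 1 iff string i starts a block; H[i] is block b_i."""
    order = sorted(starts)
    if not order or order[0] != 0:
        raise ValueError("string 0 must start a block")
    A = [1 if i in starts else 0 for i in range(len(strings))]
    H = {}
    for j, nxt in zip(order, order[1:] + [len(strings)]):
        H[j] = strings[j:nxt]                # block b_j holds strings j .. nxt-1
    return A, H


def lookup(A, H, i):
    """Fetch string i: find the start j of its block, then index into b_j."""
    j = i
    while A[j] == 0:                         # linear scan, for illustration only
        j -= 1
    return H[j][i - j]


strings = ["10110", "0110", "11111", "0101", "1100", "010", "11011"]
A, H = build_index(strings, {0, 3})
```

In this example b_0 holds 14 bits and b_3 holds 16 bits, so their sizes sum to more than the 16-bit word size, as the invariant requires.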

A Rough Look at Space and Time Bounds for this Array Structure
Let’s say we have n items, and w is the word size in bits of the machine. WLOG all data strings are nonempty. Let m = Σ_i |s_i|.
• Lookup tables cut the time down to constant time for finding a block and the string inside it, since they can operate on entire words at once.
• Indexing structures + hash table use O(w) bits per block.
• Each block is on average half full due to the invariant.
• O(m + w) bits used and operations are O(1) time!
[Figure annotations: “At most w strings, <= w apart”; “On avg block is ½ full”; “O(m/w + 1) blocks!”]
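The block count can be checked with a short derivation. This is a sketch under stated assumptions: B (the number of blocks) and c_1, …, c_B (the blocks in order, with |c_j| the number of data bits in block j) are symbols introduced here, and the size invariant is read as applying to every pair of adjacent blocks.

```latex
\begin{align*}
|c_{2j-1}| + |c_{2j}| &> w
  && \text{for each of the } \lfloor B/2 \rfloor \text{ disjoint adjacent pairs} \\
m = \sum_i |s_i| &\ge \sum_{j=1}^{\lfloor B/2 \rfloor} \bigl(|c_{2j-1}| + |c_{2j}|\bigr)
  \ge w \lfloor B/2 \rfloor \\
\frac{B-1}{2} \le \lfloor B/2 \rfloor &\le \frac{m}{w}
  \;\Longrightarrow\; B \le \frac{2m}{w} + 1 = O\!\left(\frac{m}{w} + 1\right) \\
\text{total bits} &= B \cdot O(w) = O(m + w)
\end{align*}
```

The last line uses the slide's figure of O(w) bits per block for the two block words plus the indexing structures and hash table.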

Briefly, how we proceed from there
• We can finally implement our generalized hash table using an array of the type we just described as the hash table.
• There are more details; the following paper explains them: D. Blandford and G. E. Blelloch. Storing Variable-Length Keys in Arrays, Sets, and Dictionaries, with Applications. In Symposium on Discrete Algorithms (SODA), 2005 (hopefully)

Great, but if there’s a paper written on the subject already, then what do I do?
• A lot of this code isn’t yet written. We haven’t yet checked that the code we have fulfills the theoretical bounds, since we have to make sure that any corner-cutting done in the programming is theoretically safe.
• My job is to get a lot of this running and look for optimizations.
• Also, once this is running, we’ll want to run experiments to see how well it performs, especially in modeling graphs.