Data Structures Chapter 8: Hashing

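Performance Comparison of Arrays and Trees

Operation   Sorted array   Unsorted array   Unbalanced tree   Balanced tree (Chap. 10)
Insertion   O(n)           O(1)             O(h)              O(log n)
Searching   O(log n)       O(n)             O(h)              O(log n)
Deletion    O(n)           O(1)             O(h)              O(log n)

h: tree height
Deletion from an unsorted array is O(1) only once the element's position is known; locating it first costs O(n).
Is it possible to perform these operations in O(1)?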

Static Hashing
Hash function: transforms a key into a table index.
Example:
– key values: …
– hash function: h(k) = k mod 10
Hash collision: two records (keys) are mapped to the same bucket (position).
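To make the collision idea concrete, here is a minimal sketch using the slide's hash function h(k) = k mod 10. The key values are hypothetical, since the slide's own keys are not available here.

```python
def h(k: int) -> int:
    """Hash function from the slide: h(k) = k mod 10."""
    return k % 10

# Hypothetical keys, chosen only for illustration.
keys = [21, 35, 47, 55]

table = {}  # bucket index -> list of keys that hashed there
for k in keys:
    table.setdefault(h(k), []).append(k)

# Any bucket that received more than one key had a collision.
collisions = {i: ks for i, ks in table.items() if len(ks) > 1}
print(collisions)  # {5: [35, 55]}
```

Keys 35 and 55 are synonyms under this function because both end in the digit 5.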

Hash Table
In static hashing, identifiers/keys are stored in a hash table.
– Slot: space for storing one data item.
– Bucket: each bucket may consist of s slots to hold synonyms.
– k_1 and k_2 are synonyms if h(k_1) = h(k_2).
– Hash collision: the home bucket for the new data is not empty.
[Figure: hash table with buckets 0 to 25 and two slots (slot 1, slot 2) per bucket, holding identifiers such as A, A2, D, and GA.]

Hash Functions
A good hash function is easy to compute and minimizes the number of collisions.
Uniform hash function:
– The probability P(h(x) = i) is 1/b for every bucket i, where b is the number of buckets.
Four popular hash functions:
– Division
– Mid-square
– Folding
– Digit analysis

Division
h(k) = k % D (the remainder of k divided by D)
Since the bucket addresses run from 0 to b - 1 when there are b buckets, usually D = b.
If D is even, it is not a good choice, because even keys always map to even addresses.
– D = 12: 20 % 12 = 8, 30 % 12 = 6, 28 % 12 = 4
If D is a multiple of 5, it is not a good choice, because keys that are multiples of 5 always map to addresses that are multiples of 5.
– D = 15: 20 % 15 = 5, 30 % 15 = 0, 35 % 15 = 5
Prime numbers are good choices for D.
For most practical dictionaries, D is a good candidate if it has no prime factor smaller than …
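The effect of the divisor can be checked directly. The sketch below hashes a hypothetical set of even keys with an even divisor and with a prime divisor, then compares which buckets get used. The key set and the helper name `buckets_used` are assumptions for illustration.

```python
def division_hash(k: int, D: int) -> int:
    """Division method: h(k) = k % D."""
    return k % D

def buckets_used(keys, D):
    """Return the set of bucket addresses that at least one key maps to."""
    return {division_hash(k, D) for k in keys}

# Hypothetical workload: every key is even.
even_keys = range(0, 100, 2)

print(sorted(buckets_used(even_keys, 12)))  # only even addresses are hit
print(sorted(buckets_used(even_keys, 13)))  # every address 0..12 is hit
```

With D = 12, half of the table is never used for this workload. The prime divisor 13 spreads the same keys over all buckets.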

Mid-square
Square the key and then use an appropriate number of bits from the middle of the square.
Example:
– Key k = …, b = 10000, where 9999 is the largest bucket address.
– Square the key, and then extract the middle 4 digits: h(k) = …
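A minimal sketch of the mid-square method, working on decimal digits as in the example above. The key 4567 is hypothetical (the slide's key is not available), and taking the middle digits by integer-dividing the leftover length is one possible convention, not necessarily the one the slide used.

```python
def mid_square(k: int, digits: int = 4) -> int:
    """Square the key and return `digits` decimal digits from the middle."""
    square = str(k * k).zfill(digits)      # pad so short squares still work
    start = (len(square) - digits) // 2    # left edge of the middle window
    return int(square[start:start + digits])

# Hypothetical key: 4567 * 4567 = 20857489, whose middle four digits are 8574.
print(mid_square(4567))  # 8574
```

With 4 extracted digits the result always lies between 0 and 9999, which matches a table of b = 10000 buckets.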

Folding
The key k is partitioned into several parts, all of the same length (except possibly the last). These parts are then added together to obtain the hash address for k.
Two methods:
– Shift folding
– Folding at the boundaries
[Figure: parts P1 to P5 of a key, added as-is for shift folding and with alternate parts reversed for folding at the boundaries.]
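Both folding variants can be sketched in a few lines. The key and the part length of 3 digits below are hypothetical values chosen for illustration. Folding at the boundaries is implemented here by reversing every second part (P2, P4, ...) before adding.

```python
def split_parts(key: str, size: int):
    """Cut the key's digit string into consecutive parts of `size` digits."""
    return [key[i:i + size] for i in range(0, len(key), size)]

def shift_fold(key: str, size: int) -> int:
    """Shift folding: add the parts as they are."""
    return sum(int(p) for p in split_parts(key, size))

def boundary_fold(key: str, size: int) -> int:
    """Folding at the boundaries: reverse P2, P4, ... and then add."""
    parts = split_parts(key, size)
    return sum(int(p[::-1]) if i % 2 == 1 else int(p)
               for i, p in enumerate(parts))

key = "12320324111220"          # hypothetical key: parts 123, 203, 241, 112, 20
print(shift_fold(key, 3))       # 123 + 203 + 241 + 112 + 20 = 699
print(boundary_fold(key, 3))    # 123 + 302 + 241 + 211 + 20 = 897
```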

Overflow Handling
An overflow occurs when the home bucket for a new pair (key, element) is full.
Two methods for handling overflows:
Open addressing: search the hash table in some systematic fashion for a bucket that is not full.
– Linear probing (linear open addressing)
– Quadratic probing
– Rehashing
Chaining: each bucket keeps a linked list of all pairs that hash to the same bucket address.
– Linked list

Linear Probing
Also called linear open addressing.
Search for the next available bucket one by one: (h(k) + j) % D, j = 0, 1, 2, 3, …
Example:
– D (divisor) = b (number of buckets) = 17
– Bucket address = key % 17
– Insert pairs whose keys are:

Key:   6   12   34   29   28   11   23   7   0   33   …
h(x):  6   12    0   12   11   11    6   7   0   16   …
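The example can be traced with a short sketch. It inserts the ten keys listed above into a 17-bucket table with one slot per bucket. The one-slot assumption and the function name are choices made for this illustration.

```python
def linear_probe_insert(table, key):
    """Insert key at the first free bucket in (h(k) + j) % b, j = 0, 1, 2, ..."""
    b = len(table)
    home = key % b
    for j in range(b):
        i = (home + j) % b
        if table[i] is None:       # None marks an empty bucket
            table[i] = key
            return i
    raise RuntimeError("hash table is full")

table = [None] * 17
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33]:
    linear_probe_insert(table, key)

for i, key in enumerate(table):
    if key is not None:
        print(i, key)
```

Key 11 is the most displaced: its home bucket 11 and the next two buckets are taken, so it ends up in bucket 14.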

Quadratic Probing
Probe h(x), (h(x) + i^2) % b, and (h(x) - i^2) % b, for i = 1, 2, 3, …, (b - 1)/2.
Every bucket will be examined when b is a prime number of the form 4j + 3.
– For example, b = 3, 7, 11, …, 43, 59, …
Probe order:
– h(x)
– (h(x) + 1^2) % b
– (h(x) - 1^2) % b
– (h(x) + 2^2) % b
– (h(x) - 2^2) % b
– …
– (h(x) - ((b - 1)/2)^2) % b
Example: b = 7, (b - 1)/2 = 3, i = 1, 2, 3
– Offsets examined: 0, 1, -1, 4, -4, 9, -9 (mod 7)
– Equivalent to 0, 1, 6, 4, 3, 2, 5
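The probe order can be generated mechanically. The sketch below lists the buckets visited for a given home bucket. It relies on Python's `%` returning a non-negative result for a positive modulus, so negative offsets wrap around correctly.

```python
def quadratic_probe_sequence(home: int, b: int):
    """Return the buckets probed: home, home + 1^2, home - 1^2, home + 2^2, ..."""
    seq = [home % b]
    for i in range(1, (b - 1) // 2 + 1):
        seq.append((home + i * i) % b)
        seq.append((home - i * i) % b)
    return seq

print(quadratic_probe_sequence(0, 7))  # [0, 1, 6, 4, 3, 2, 5]
```

For b = 7 (a prime of the form 4j + 3) the sequence visits every bucket exactly once, in agreement with the example above.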

Rehashing
If an overflow occurs at h_i(x), try h_{i+1}(x).
Use a series of hash functions h_1, h_2, …, h_m to find an empty bucket.
Example:
– h_1(x) = x % 11
– h_2(x) = x % 251
– h_3(x) = x^2 % 251
– h_4(x) = x % 1019
– …
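A minimal sketch of rehashing with the four functions listed above. The table is modeled as a dictionary from bucket address to key with one slot per bucket, and the inserted keys are hypothetical. Both choices are assumptions made for illustration.

```python
# The series h_1 .. h_4 from the example.
HASH_FUNCTIONS = [
    lambda x: x % 11,
    lambda x: x % 251,
    lambda x: (x * x) % 251,
    lambda x: x % 1019,
]

def rehash_insert(table: dict, key: int) -> int:
    """Try h_1, h_2, ... in order and store key at the first empty bucket."""
    for h in HASH_FUNCTIONS:
        i = h(key)
        if i not in table:
            table[i] = key
            return i
    raise RuntimeError("all hash functions led to occupied buckets")

table = {}
placed = [rehash_insert(table, k) for k in (3, 5, 256)]  # hypothetical keys
print(placed)  # [3, 5, 25]
```

Key 256 collides at h_1 (bucket 3) and at h_2 (bucket 5), and finally lands in bucket 25 via h_3.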

Chaining
Disadvantage of linear probing:
– Identifiers with different hash values are compared with each other during a search.
Chaining uses linked lists to connect the identifiers that have the same hash value, which also increases the capacity of a bucket.
[Figure: chained hash table with buckets [0] to [16], bucket address = key % 17.]
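A sketch of chaining with bucket address key % 17. Python lists stand in for the linked lists, and the keys are the ones from the linear probing example. Both are assumptions made for illustration.

```python
def chain_insert(table, key):
    """Append key to the chain of its home bucket."""
    table[key % len(table)].append(key)

def chain_search(table, key) -> bool:
    """Only keys in the same chain (same hash value) are compared."""
    return key in table[key % len(table)]

table = [[] for _ in range(17)]   # one (initially empty) chain per bucket
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33]:
    chain_insert(table, key)

for i, chain in enumerate(table):
    if chain:
        print(i, chain)
```

A search for key 11 now inspects only the chain of bucket 11, which holds 28 and 11, instead of walking past keys with other hash values.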

Dynamic Hashing
Also called extendible hashing.
Limitations of static hashing:
– As the table fills up, overflows increase. As overflows increase, the overall performance decreases.
– We cannot simply copy the entries of a smaller table into the corresponding buckets of a bigger table.
Dynamic hashing allows the size of the dictionary to grow and shrink.
– The size of the hash table can be changed dynamically.
– Hash function: h( ) → h'( )
– Size of hash table: m → m'

Dynamic Hashing Using Directories
The hash function h(k) transforms a key k into a 6-bit binary integer.
[Table: keys k and their 6-bit hash values h(k); the entries did not survive extraction.]

Dynamic Hashing Using Directories
– A directory, d, of pointers to buckets is used.
– Directory size d = 2^t, where t is the number of bits used to identify all h(x). Initially t = 2, so d = 2^2 = 4.
– h(k, t) is defined as the t least significant bits (LSBs) of h(k); t is also called the directory depth.
Example:
– h(C5) = …
– h(C5, 2) = 01
– h(C5, 3) = …
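Extracting the t least significant bits is a single mask operation. In the sketch below the 6-bit value used for C5 is an assumption (the slide's value is not available); it was picked so that its two low bits are 01, as stated above.

```python
def h_t(hash_value: int, t: int) -> int:
    """h(k, t): the t least significant bits of the hash value h(k)."""
    return hash_value & ((1 << t) - 1)

H_C5 = 0b110101   # ASSUMED 6-bit hash of C5, consistent with h(C5, 2) = 01

print(format(h_t(H_C5, 2), "02b"))  # 01
print(format(h_t(H_C5, 3), "03b"))  # 101 (under the assumed value)
print(2 ** 2)                       # directory size for t = 2
```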

Extending the Directory
Suppose the following keys have already been stored. t = 2 (directory depth 2) differentiates all input keys.
[Figure: table of keys k and hash values h(k), and a directory d of pointers to buckets holding A0 and B0, A1 and B1, C2, and C3.]

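Insertion of C5
– t = 2 and h(C5, 2) = 01.
– A1 and B1 are already in the bucket at d[01], so the bucket overflows.
– Find the smallest u for which h(k, u) is not the same for all of A1, B1, and C5. Here u = 3.
– Extend d to size 2^u.
– Duplicate the existing pointers into the new half of the directory.
[Figure: table of keys and hash values, and the directory extended with u = 3.]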

Insertion of C5
– Split the overflowed bucket using h(k, 3): rehash A1, B1, and C5.
– A new bucket is created for C5.
[Figure: the directory with u = 3 before and after the split.]
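The two mechanical steps of the insertion, choosing u and doubling the directory, can be sketched as follows. The 6-bit hash values are assumptions, since the slide's table is not available; they were chosen to agree with the results stated in the slides (u = 3 for A1, B1, C5, and u = 4 for A1, B1, C1). The bucket labels are hypothetical placeholders.

```python
# ASSUMED 6-bit hash values, chosen to match the u values given in the slides.
ASSUMED_H = {"A1": 0b100001, "B1": 0b101001, "C5": 0b110101, "C1": 0b110001}

def low_bits(value: int, t: int) -> int:
    """h(k, t): the t least significant bits."""
    return value & ((1 << t) - 1)

def least_distinguishing_depth(values, max_bits: int = 6) -> int:
    """Smallest u such that h(k, u) is not the same for all the given keys."""
    for u in range(1, max_bits + 1):
        if len({low_bits(v, u) for v in values}) > 1:
            return u
    raise ValueError("all hash values are identical")

def extend_directory(directory, t: int, u: int):
    """Grow the directory from 2^t to 2^u entries by duplicating pointers.

    Entries i and i + 2^t share the same t low bits, so the new half
    starts out pointing at the same buckets as the old half.
    """
    for _ in range(u - t):
        directory = directory + directory
    return directory

u = least_distinguishing_depth([ASSUMED_H[k] for k in ("A1", "B1", "C5")])
directory = extend_directory(["b00", "b01", "b10", "b11"], t=2, u=u)
print(u, len(directory))             # 3 8
print(directory[0b001], directory[0b101])  # both still point at bucket b01
```

After the doubling, only the overflowed bucket needs to be split: the entry at index 101 is redirected to a new bucket that receives C5.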

Insertion of C1
– h(k, u) is not the same for all of A1, B1, and C1 only from u = 4.
– Rehash A1, B1, and C1 using h(k, 4).
[Figure: table of keys and hash values, and the directory after it is extended to depth 4.]

Advantages of Dynamic Hashing Using Directories
– Only the directory is doubled, rather than the whole hash table as in static hashing.
– Only the entries in the bucket that overflows are rehashed.

Directoryless Dynamic Hashing (1)
(a) r = 2, q = 0
(b) Insert C5, r = 2
Active buckets: 0 to 2^r + q - 1 (0 to 3 in (a)).
1. Bucket 01 overflows.
2. Activate bucket 2^r + q (activate 100).
3. Increment q.
4. Rehash B4 and A0 by h(k, 3) (buckets 000 and 100 use h(k, 3); buckets 001 to 011 use h(k, 2)).
5. Insert C5 in the overflow area.
[Figure: (a) four active buckets 00 to 11, with B4 and A0 in 00 and A1 and B5 in 01; (b) the table after inserting C5, with C5 in an overflow bucket and a new active bucket 100.]

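Directoryless Dynamic Hashing (2)
(c) Insert C1, r = 2
1. The home bucket of C1 overflows.
2. Activate bucket 2^r + q (activate 101).
3. Increment q.
4. Rehash A1, B5, C5, and C1 by h(k, 3) (buckets 000 to 001 and 100 to 101 use h(k, 3); buckets 010 to 011 use h(k, 2)).
[Figure: the table after inserting C1, with A1 and C1 in bucket 001 and B5 and C5 in the new active bucket 101.]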
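The addressing rule implied by the steps above can be sketched as follows: a key uses h(k, r), unless that address is below q (a bucket that has already been split), in which case it uses h(k, r + 1). The 6-bit hash values are assumptions, chosen so that the keys land in the buckets described in the slides; the function names are hypothetical.

```python
# ASSUMED 6-bit hash values, chosen to match the bucket layout in the slides.
ASSUMED_H = {
    "A0": 0b100000, "A1": 0b100001, "B4": 0b101100, "B5": 0b101101,
    "C1": 0b110001, "C2": 0b110010, "C3": 0b110011, "C5": 0b110101,
}

def low_bits(value: int, t: int) -> int:
    """h(k, t): the t least significant bits."""
    return value & ((1 << t) - 1)

def bucket_address(hash_value: int, r: int, q: int) -> int:
    """Buckets below q were already split, so they use r + 1 bits."""
    addr = low_bits(hash_value, r)
    if addr < q:
        addr = low_bits(hash_value, r + 1)
    return addr

def layout(keys, r, q):
    """Group keys by bucket address for the given r and q."""
    buckets = {}
    for k in keys:
        buckets.setdefault(bucket_address(ASSUMED_H[k], r, q), []).append(k)
    return buckets

print(layout(["B4", "A0", "A1", "B5", "C2", "C3"], r=2, q=0))
print(layout(["B4", "A0", "A1", "B5", "C2", "C3", "C5"], r=2, q=1))
print(layout(["B4", "A0", "A1", "B5", "C2", "C3", "C5", "C1"], r=2, q=2))
```

With q = 1, bucket 1 holds three keys (A1, B5, C5), which corresponds to C5 sitting in the overflow area. With q = 2 the four keys of that bucket are spread over buckets 001 and 101.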