
Part II Chapter 8 Hashing

Introduction
Consider the insertion, searching and deletion operations on a dictionary (symbol table), implemented as an array (sorted / not sorted), a linked list, or a tree (unbalanced / balanced). The costs given on the slide are:
Insertion: O(n) / O(1), O(h), O(log_k n)
Searching: O(log n) / O(1), O(h), O(log_k n)
Deletion: O(n) / O(1), O(h), O(log_k n)
Is it possible to perform these operations in O(1)?

Introduction
If we find a mapping from a key to an index, then we can locate a record quickly according to its key and access it directly (random access).
[Figure: keys S1, S2, S3, … mapped to indices 0, 1, 2, …]

Introduction
This mapping is called hashing: define a function h so that h(Key) = i, where h is called a hash function.
There are two kinds of hashing: static hashing and dynamic hashing.
[Figure: Key → h → i]

8.2 Static Hashing

Definition
In static hashing, identifiers/keys are stored in a table of fixed size, called the hash table.
Bucket: each bucket has its own address and is capable of holding a key.
The hash function h maps an identifier x to a bucket address h(x).
[Figure: hash table with buckets 0 to n, each with slot 1 and slot 2; identifier x → hash function h → bucket address h(x)]

Definition
Slot: each bucket may consist of s slots to hold synonyms. Two identifiers i1 and i2 are synonyms if h(i1) = h(i2). Distinct synonyms enter the same bucket as long as the bucket has slots available.

Example
Number of buckets: 26 (bucket 0 to bucket 25). Number of slots for each bucket: 2.
Define the hash function f(x) = {i | i is the order of the initial of x}.
A and A2 are synonyms. GA and GB are synonyms.
If “Doll” enters, it will be put at bucket _______ (according to the hash function).
[Figure: hash table with buckets 0 to 25, two slots each; bucket 0 holds A and A2, bucket 3 holds D, and GA and GB share one bucket]

Overflow and Collision
Overflow occurs when a new identifier is mapped into a full bucket.
Collision occurs when two non-identical identifiers are hashed into the same bucket.
If each bucket has only one slot, overflow and collision occur simultaneously.
Example: with two slots per bucket and A and A2 already in bucket 0, if A3 enters bucket 0, A3 collides with A and A2, and the bucket overflows as well.
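The difference between a collision and an overflow can be sketched in a few lines of Python. This is only an illustration: the names `initial_hash` and `SlottedTable` are hypothetical, the hash is the initial-letter function from the example (assuming A maps to bucket 0 and every identifier starts with a letter), and the table uses 26 buckets with 2 slots each.

```python
def initial_hash(identifier):
    """Bucket address = position of the first letter in the alphabet (A -> 0)."""
    return ord(identifier[0].upper()) - ord("A")


class SlottedTable:
    """Fixed-size hash table whose buckets hold at most `slots` identifiers."""

    def __init__(self, buckets=26, slots=2):
        self.slots = slots
        self.table = [[] for _ in range(buckets)]

    def insert(self, identifier):
        """Try to insert; return (bucket address, collided, overflowed)."""
        address = initial_hash(identifier)
        bucket = self.table[address]
        # Collision: a different identifier already lives in the home bucket.
        collided = any(other != identifier for other in bucket)
        # Overflow: the home bucket has no free slot left.
        overflowed = len(bucket) >= self.slots
        if not overflowed:
            bucket.append(identifier)
        return address, collided, overflowed


table = SlottedTable()
results = [table.insert(name) for name in ["A", "A2", "A3"]]
for name, outcome in zip(["A", "A2", "A3"], results):
    print(name, outcome)
```

A2 collides with A but still fits in the second slot; A3 both collides and overflows, which is the situation that the overflow-handling techniques later in the chapter address.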

8.2.2 Hash Functions
Ideally, we want a hash function that is one-to-one and easy to compute.
The hash function f(x) = {i | i is the order of the initial of x} can result in a lot of collisions because it only considers the initial character.
Key point: use as many characters of the identifier as possible.

Common Approaches
Division, mid-square, folding, digit analysis.

Division
The most widely used hash function. The key k is divided by some number D, and the remainder is used as the bucket address: h(k) = k % D.
Since bucket addresses run from 0 to b-1 when there are b buckets, D is usually chosen to be the number of buckets.

Selecting The Divisor
When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
15%14 = 1, 3%14 = 3, 23%14 = 9
When the divisor is an odd number, odd (even) integers may hash into any home bucket.
20%15 = 5, 30%15 = 0, 8%15 = 8
15%15 = 0, 3%15 = 3, 23%15 = 8
With an odd divisor, a bias in the keys does not result in a bias toward either the odd or the even home buckets, so there is a better chance of uniformly distributed home buckets. So do not use an even divisor.
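The parity effect can be checked directly. The sketch below reuses the six keys and the divisors 14 and 15 from the slide; the helper names `division_hash` and `parity_preserved` are made up for this illustration.

```python
def division_hash(key, divisor):
    """Division method: the remainder is the home bucket."""
    return key % divisor


def parity_preserved(keys, divisor):
    """True if every key lands in a home bucket of the same parity as the key."""
    return all(k % 2 == division_hash(k, divisor) % 2 for k in keys)


keys = [20, 30, 8, 15, 3, 23]
even_divisor = [division_hash(k, 14) for k in keys]
odd_divisor = [division_hash(k, 15) for k in keys]

print(even_divisor, parity_preserved(keys, 14))
print(odd_divisor, parity_preserved(keys, 15))
```

With divisor 14 every key keeps its parity, so a key set made up mostly of even numbers would leave the odd buckets largely unused; with divisor 15 the parities mix.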

Selecting The Divisor
In practice, a similarly biased distribution of home buckets is seen when the divisor b is a multiple of small prime numbers such as 3, 5, 7, …
The effect of each prime divisor p of b decreases as p gets larger.
Ideally, choose b so that it is a prime number. Alternatively, choose b so that it has no prime factor smaller than 20.
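One way to apply the second rule is to start from the desired table size and step upward until no prime below 20 divides the candidate. The following is a minimal sketch under that assumption; `next_good_divisor` is a hypothetical helper, not something defined in the slides.

```python
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)


def has_small_factor(b):
    """True if b is divisible by some prime smaller than 20."""
    return any(b % p == 0 for p in SMALL_PRIMES)


def next_good_divisor(minimum):
    """Smallest b >= minimum that has no prime factor smaller than 20."""
    b = max(minimum, 1)
    while has_small_factor(b):
        b += 1
    return b


print(next_good_divisor(100))
print(next_good_divisor(525))
```

For a minimum of 525 the result is 529 = 23 × 23, which is not prime but still satisfies the weaker rule, since its only prime factor is 23.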

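Mid-square
Square the key and use an appropriate number of bits from the middle of the square as the bucket address.
Example: suppose a character is represented in 6 bits and the number of buckets is 2^r; then r bits are taken from the middle of the squared key.
[Figure: squaring a key and extracting the middle r bits]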

Mid-square Example
Key = …, m = 10000, where 9999 is the largest bucket address. Squaring the key and taking the middle digits of the square gives h(x) = 1779.
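A decimal version of the mid-square method, matching m = 10000 (four-digit bucket addresses), might look like the sketch below. The function name `mid_square` and the way the middle digits are picked (centred, rounding toward the left) are assumptions, and the key 4567 is an arbitrary illustration rather than the key used on the slide.

```python
def mid_square(key, digits=4):
    """Square the key and return `digits` decimal digits from the middle."""
    squared = str(key * key).zfill(digits)   # pad so short squares still work
    start = (len(squared) - digits) // 2     # left edge of the middle window
    return int(squared[start:start + digits])


# 4567 * 4567 = 20857489, whose middle four digits are 8574.
print(mid_square(4567))
```

Because the middle digits of the square depend on every digit of the key, this tends to spread keys more evenly than taking the leading or trailing digits alone.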

Folding
The key k is partitioned into several parts P1, P2, P3, P4, P5, …, all of the same length. These partitions are then added together to obtain the hash address of k.
Two schemes: shift folding and folding at the boundaries.

Folding
[Figure: partitions P1 to P5 of a key, combined by shift folding and by folding at the boundaries]
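The two folding schemes can be sketched as follows, assuming the usual textbook definitions: shift folding adds the partitions as they are, while folding at the boundaries reverses the digits of every second partition before adding. The key 12320324111220, the partition width of 3 (the last partition may be shorter), and the function names are illustrative choices, not values recovered from the slide.

```python
def partition(key_digits, width):
    """Split a digit string into consecutive parts of `width` digits."""
    return [key_digits[i:i + width] for i in range(0, len(key_digits), width)]


def shift_fold(key_digits, width):
    """Shift folding: add the partitions as they are."""
    return sum(int(part) for part in partition(key_digits, width))


def boundary_fold(key_digits, width):
    """Folding at the boundaries: reverse every second partition, then add."""
    parts = partition(key_digits, width)
    return sum(int(part[::-1]) if i % 2 == 1 else int(part)
               for i, part in enumerate(parts))


key = "12320324111220"              # parts: 123, 203, 241, 112, 20
print(shift_fold(key, 3))           # 123 + 203 + 241 + 112 + 20
print(boundary_fold(key, 3))        # 123 + 302 + 241 + 211 + 20
```

In practice the sum would still be reduced to a valid bucket address, for example with the division method.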

Overflow Handling
An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows in two ways:
1. Search the hash table in some systematic fashion for a bucket that is not full: linear probing (linear open addressing), quadratic probing, or rehashing.
2. Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket: an array linear list or a chain.

Linear Probing
Also called linear open addressing. Search the buckets one by one until an empty slot is found.
Procedure (b denotes the number of buckets):
1. Compute h(k).
2. Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1)%b], …, ht[(h(k)+j)%b] until one of the following happens:
   - ht[(h(k)+j)%b] has a pair whose key is k; k is found.
   - ht[(h(k)+j)%b] is empty; k is not in the table.
   - The search returns to ht[h(k)]; the table is full.

Linear Probing
Divisor = b (number of buckets) = 17. Bucket address = key % 17.
Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, …
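The linear probing procedure can be tried out on this example with a short Python sketch. It stores bare keys instead of (key, element) pairs, uses one slot per bucket, and inserts only the eleven keys that are legible on the slide; the class name `LinearProbingTable` and its methods are hypothetical.

```python
class LinearProbingTable:
    """Open-addressing hash table with h(k) = k % b and linear probing."""

    def __init__(self, b=17):
        self.b = b
        self.ht = [None] * b

    def _probe(self, key):
        """Return the first bucket that holds `key` or is empty, else None."""
        home = key % self.b
        for j in range(self.b):
            index = (home + j) % self.b
            if self.ht[index] is None or self.ht[index] == key:
                return index
        return None  # probing came back to the home bucket: table is full

    def insert(self, key):
        index = self._probe(key)
        if index is None:
            raise OverflowError("hash table is full")
        self.ht[index] = key
        return index

    def search(self, key):
        """Return the bucket holding `key`, or None if it is absent."""
        index = self._probe(key)
        if index is None or self.ht[index] is None:
            return None
        return index


table = LinearProbingTable(17)
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30]:
    table.insert(key)
print(table.ht)
```

Key 29 has home bucket 12 but ends up in bucket 13, and key 11 is pushed from bucket 11 all the way to bucket 14, which already hints at the clustering discussed next.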

Linear Probing
Linear open addressing tends to create clusters of occupied buckets. These clusters become larger as more synonyms enter.

Quadratic Probing
Suppose i is used as the increment. When overflow occurs, the search is carried out by examining h(x), (h(x)+i^2)%b, and (h(x)-i^2)%b for 1 ≤ i ≤ (b-1)/2, where b is a prime number of the form 4j+3. For example, b = 3, 7, 11, …, 43, 59, …
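The probe order can be generated explicitly. The sketch below assumes the sequence alternates between the +i^2 and -i^2 offsets for each i, and the function name `quadratic_probe_sequence` is hypothetical; Python's `%` already returns a non-negative result for a negative left operand, so no extra adjustment is needed.

```python
def quadratic_probe_sequence(home, b):
    """Buckets examined for home bucket `home` in a table of b buckets."""
    sequence = [home % b]
    for i in range(1, (b - 1) // 2 + 1):
        sequence.append((home + i * i) % b)
        sequence.append((home - i * i) % b)
    return sequence


sequence = quadratic_probe_sequence(3, 7)
print(sequence)
print(sorted(sequence) == list(range(7)))
```

For b = 7 (a prime of the form 4j+3) the sequence visits every bucket exactly once, so a free bucket is found whenever one exists.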

Rehashing
Use a series of hash functions h_1, h_2, …, h_m to find an empty bucket: if overflow occurs at h_i(x), then try h_(i+1)(x).
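A minimal sketch of rehashing follows. The slides do not say which hash functions make up the series, so the three functions below are arbitrary placeholders for a 7-bucket table, and `rehash_insert` is a hypothetical helper name.

```python
def rehash_insert(table, key, hash_functions):
    """Try h_1, h_2, ..., h_m in turn; return the bucket used or None."""
    for h in hash_functions:
        index = h(key)
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
    return None  # every hash function in the series led to an occupied bucket


B = 7
# Placeholder series h_1, h_2, h_3 (assumed for illustration only).
hash_functions = [
    lambda k: k % B,
    lambda k: (k // B) % B,
    lambda k: (3 * k + 1) % B,
]

table = [None] * B
placements = [rehash_insert(table, key, hash_functions) for key in [10, 17, 24]]
print(placements)
```

Key 17 overflows at h_1 and is placed by h_2; key 24 fails with all three functions, which shows that a finite series can run out even when the table still has free buckets.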

Chaining
Disadvantage of linear probing: a search compares identifiers that have different hash values.
Chaining uses a linked list to connect the identifiers with the same hash value, which also increases the capacity of a bucket.
[Figure: hash table with buckets [0] to [16], each pointing to a chain of identifiers]
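Chaining can be sketched with the same 17-bucket table and the same keys as the linear probing example. For brevity this version uses Python lists in place of hand-built linked lists, and the class name `ChainedTable` is hypothetical.

```python
class ChainedTable:
    """Hash table where each bucket keeps a chain of all keys hashing to it."""

    def __init__(self, b=17):
        self.b = b
        self.chains = [[] for _ in range(b)]

    def insert(self, key):
        chain = self.chains[key % self.b]
        if key not in chain:          # keys in a dictionary are distinct
            chain.append(key)

    def search(self, key):
        # Only keys with the same hash value are ever compared.
        return key in self.chains[key % self.b]


table = ChainedTable(17)
for key in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30]:
    table.insert(key)

for address, chain in enumerate(table.chains):
    if chain:
        print(address, chain)
```

A search for 29 now inspects only the chain of bucket 12, whereas linear probing had to step past keys with other hash values to reach it.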