Searching
Given distinct keys k_1, k_2, …, k_n and a collection T of n records of the form (k_1, I_1), (k_2, I_2), …, (k_n, I_n).
Search problem: for key value K, locate the record (k_j, I_j) in T such that k_j = K.
Searching is a systematic method for locating the record(s) with key value k_j = K.
A successful search is one in which a record with key k_j = K is found.
An unsuccessful search is one in which no record with k_j = K is found (and none exists).

Searching Ordered Arrays
Binary search: been there, done that.
Dictionary search (interpolation search):
– Estimate how far from an endpoint the value is probably going to be.
– pos = lo + (value - A[lo]) / (A[hi] - A[lo]) * (hi - lo)
– Look at pos rather than at mid.
– Assumes the data is evenly distributed.
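
The position estimate above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name `interpolation_search` is an assumption, it assumes a sorted list of integers, and it uses integer division so that the estimate is always a valid index.

```python
def interpolation_search(a, value):
    """Return an index of value in the sorted list a, or -1 if it is absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= value <= a[hi]:
        if a[hi] == a[lo]:
            # All remaining keys are equal; avoid dividing by zero.
            return lo if a[lo] == value else -1
        # Estimate the position, assuming keys are evenly spread over a[lo..hi].
        pos = lo + (value - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == value:
            return pos
        if a[pos] < value:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On evenly spaced keys such as 10, 20, 30, 40, 50, the first estimate for 40 already lands on the right index. On skewed data the estimates can be poor, which is why the even-distribution assumption matters.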

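Lists Ordered by Frequency
Order lists by (expected) frequency of occurrence.
– Perform sequential search.
– Cost for first record: 1. Cost for second record: 2.
– Expected search cost = 1*p_1 + 2*p_2 + 3*p_3 + … + n*p_n, where p_i is the probability of accessing the record in position i.
– Worst case, when all records are equally likely: (n+1)/2.
– Best if a few items are accessed many times.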
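
As a quick check of the cost formula, the sketch below evaluates the sum of i*p_i for a given list of access probabilities. The function name `expected_search_cost` is an assumption made for illustration.

```python
def expected_search_cost(probabilities):
    """Expected comparisons for sequential search.

    probabilities[i] is the chance of accessing the record stored
    in position i + 1 of the list.
    """
    return sum(i * p for i, p in enumerate(probabilities, start=1))
```

With nine equally likely records the result is (9 + 1) / 2 = 5 comparisons, while a skewed distribution such as 0.7, 0.2, 0.1 costs about 1.4.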

Self Organizing Lists
80/20 rule: 80% of the accesses are to 20% of the records.
– Expected search cost = 0.122n.
Self organizing lists modify the order of records within the list based on the actual pattern of record accesses.
Self organizing lists use a rule called a heuristic for deciding how to reorder the list.

Self Organizing Heuristics
– Order by actual frequency (count): most frequently used first.
– When a record is found, swap it with the first item.
– When a record is found, move it to the front of the list (move-to-front).
– When a record is found, swap it with the record ahead of it (transpose).
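
Two of these heuristics, move-to-front and transpose, might look like this on a plain Python list. The helper names are assumptions for illustration, and a linked list would make the move-to-front step cheaper than it is here.

```python
def find(items, key):
    """Sequential search: return the index of key, or -1 if absent."""
    for i, item in enumerate(items):
        if item == key:
            return i
    return -1

def access_move_to_front(items, key):
    """Search for key; on success move the record to the front of the list."""
    i = find(items, key)
    if i > 0:
        items.insert(0, items.pop(i))
    return i

def access_transpose(items, key):
    """Search for key; on success swap the record with the one ahead of it."""
    i = find(items, key)
    if i > 0:
        items[i - 1], items[i] = items[i], items[i - 1]
    return i
```

Both functions return the position where the key was found before reordering, so the caller can see how much each access cost.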

Hashing
The process of mapping a key value to a position in a table.
A hash function maps key values to positions.
A hash table is an array that holds the records.
The hash table has M slots, numbered 0 to M-1.
For any value K in the key range and some hash function h, h(K) = i where 0 ≤ i < M, and key(T[i]) = K.

Hashing Situations
Hashing is appropriate for unique keys.
Good for both in-memory and disk-based applications.
Answers the question “What record, if any, has key value K?”
Example: store the n records with keys in the range 0 to n-1.
– Store the record with key i in slot i.
– Uses the hash function h(k) = k (the identity function).

Collisions
More reasonable example:
– Store about 1000 records with keys in the range 0 to 16,383.
– Impractical to keep a table of size 16,384.
– We need a hash function to map keys to a smaller range.
Given a hash function h and different keys k_1 and k_2, let β be a position in the hash table.
– If h(k_1) = h(k_2) = β, then k_1 and k_2 have a collision at β under h.

Collision Resolution
To search for the record with key K:
– Compute the table location h(K).
– Starting with slot h(K), locate the record containing key K using (if necessary) a collision resolution policy.
Collisions are inevitable in most applications.
– Example: in a group of 23 people, the probability that at least one pair shares a birthday is just over 50%.
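
The birthday example can be checked numerically. The sketch below assumes 365 equally likely birthdays and treats each person as a key hashed into a table of 365 slots; the function name is an assumption.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct
```

For 23 people the result is a little above one half, even though the "table" of 365 days is only about 6% full.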

Hash Functions
Must return a value within the table range.
Should evenly distribute the records to be stored among the table slots.
Ideally, the function should distribute records with equal probability to all the positions. In reality, how well it does this usually depends on the data.
If we know nothing about the key distribution, evenly distribute the key range among the positions.
If we know about the key distribution, use a distribution-dependent hash function.

Example Hash Functions
h(key) = key % 16: uses only the last 4 bits.
h(key) = key % 10000: uses only the last 4 digits.
Use % tablesize to make sure the result is in the table range.
Mid-square method: square the key and take the middle r bits for a table of size 2^r.
Sum up the ASCII character codes and take the result modulo tablesize (a folding technique).
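
These example functions could be written roughly as follows. The names and the `key_bits` parameter are assumptions made for illustration; in particular, the mid-square version assumes keys of at most `key_bits` bits so that "middle" is well defined.

```python
def mod_hash(key, table_size):
    """Keep only the low-order part of the key."""
    return key % table_size

def mid_square_hash(key, r, key_bits=16):
    """Square the key and return the middle r bits of the square."""
    square = key * key                 # at most 2 * key_bits bits wide
    shift = (2 * key_bits - r) // 2    # low-order bits to discard
    return (square >> shift) & ((1 << r) - 1)

def fold_string_hash(text, table_size):
    """Sum the character codes, then reduce modulo the table size."""
    return sum(ord(ch) for ch in text) % table_size
```

Note that `fold_string_hash` gives the same slot to any two strings with the same characters in a different order, which is one way collisions arise with simple folding.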

Collision Handling Categories
Open hashing: when there is a collision, put the collided item outside the table.
Closed hashing: when there is a collision, put the collided item inside the table.

Open Hashing
Look at each table element as the head of a linked list of items that hash to that position.
Can organize the linked lists in many ways:
– Ordered by key: unsuccessful searches end early.
– Ordered by frequency: if a few records are searched for frequently, then this is a good technique.
If there are N records to be stored and the table is of size M, then the average search length is O(N/M).
Good for internal memory. Linked nodes may be in different blocks on disk and cause many disk accesses.
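
A minimal sketch of open hashing follows. It is an illustration under stated assumptions: the class name `ChainedHashTable` is made up, each chain is a Python list rather than a linked list, chains are unordered, and Python's built-in `hash` stands in for the hash function.

```python
class ChainedHashTable:
    """Open hashing: each slot holds a chain of (key, value) records."""

    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]

    def _chain(self, key):
        # Records that hash to the same slot share one chain.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:               # keys are unique: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None                    # unsuccessful search
```

With N records in M slots, each chain holds N/M records on average, which is where the O(N/M) search length comes from.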

Closed Hashing - Linear Probe
If the item you are looking for is not in the hash position, look in the next position.
Do the same for insert until you find an empty location.
When you reach the bottom of the table, wrap around to the beginning.
Must have at least one empty slot, or a search for a missing key will loop forever.
Tends to have clustering, since the collision position is not uniformly distributed (i.e., if you collide at position 4, you go to position 5, then 6, independent of the key).
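
Linear probing might be sketched as below, assuming integer keys, the hash function key % M, and `None` as the empty marker (all of these are assumptions for illustration). Bounding each loop to M probes is a defensive choice that avoids the infinite loop on a full table.

```python
EMPTY = None

def linear_probe_insert(table, key):
    """Insert key, returning the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m          # wrap around at the end of the table
        if table[slot] is EMPTY or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

def linear_probe_search(table, key):
    """Return the slot holding key, or -1 if it is not in the table."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is EMPTY:       # an empty slot ends the search
            return -1
        if table[slot] == key:
            return slot
    return -1
```

Inserting 10, 17 and 24 into a table of size 7 shows the clustering effect: all three have home slot 3 and end up in slots 3, 4 and 5.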

Better Linear Probe
Instead of going to the next slot, skip by some constant c.
The table size M and c should be relatively prime. This ensures the probing will cycle through the whole table.
Still has some clustering.
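
The effect of the relatively prime requirement can be seen by counting the distinct slots a probe sequence reaches. The function names here are assumptions for illustration.

```python
from math import gcd

def slots_visited(home, c, m):
    """Set of slots reached by probing home, home + c, home + 2c, ... (mod m)."""
    return {(home + i * c) % m for i in range(m)}

def visits_every_slot(c, m):
    """True when the step c and the table size m are relatively prime."""
    return gcd(c, m) == 1
```

With M = 10, a step of 3 reaches all ten slots, while a step of 2 reaches only five of them, so half the table would never be probed.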

Quadratic Probe
Instead of adding 1 to the home position on each probe, add i^2, where i is the probe number, so add 1, 4, 9, 16, ...
Remember that we also take the result mod the table size.
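
A short sketch of the resulting probe sequence, assuming the home position has already been computed by the hash function (the function name is an assumption):

```python
def quadratic_probe_sequence(home, m, probes):
    """First `probes` slots examined: (home + i*i) mod m for i = 0, 1, 2, ..."""
    return [(home + i * i) % m for i in range(probes)]
```

For a home position of 3 in a table of size 11, the first five slots examined are 3, 4, 7, 1 and 8. Unlike linear probing, the sequence is not guaranteed to reach every slot.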

Double Hashing
After a collision, use a different hash function to choose the probe step.
Eliminates clustering to some degree.
For example, if h(k) causes a collision, then use:
– p(k, i) = i * h_2(k)
– h_2 is a different hash function.
– Keys with the same home position generate different probe sequences.
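
The sketch below shows one possible pair of hash functions. The specific choices h1(k) = k mod M and h2(k) = 1 + (k mod (M - 1)) are assumptions for illustration, not taken from the slides, and the i-th slot probed is assumed to be (h1(k) + i * h2(k)) mod M. The second function must never return 0, or the probe would not move.

```python
def h1(key, m):
    """Primary hash: the home position."""
    return key % m

def h2(key, m):
    """Secondary hash: the probe step, always in the range 1..m-1."""
    return 1 + key % (m - 1)

def double_hash_sequence(key, m, probes):
    """First `probes` slots examined for key."""
    return [(h1(key, m) + i * h2(key, m)) % m for i in range(probes)]
```

With M = 11, keys 14 and 25 both have home slot 3, but they then probe 8, 2, 7 and 9, 4, 10 respectively. Choosing a prime M keeps every step relatively prime to the table size.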

Analysis of Closed Hashing
Load factor = lf = N/M
– N is the number of records.
– M is the size of the table.
– N/M is the fraction of the table that is full.
– The larger the load factor, the greater the probability of a collision.
Average search length is O(1/(1-lf)).
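
To get a feel for how quickly the cost grows, the sketch below evaluates 1/(1-lf) directly. Treating the bound as an exact probe count is a simplifying assumption for illustration; the function names are also assumptions.

```python
def load_factor(n, m):
    """Fraction of the table in use: n records in m slots."""
    return n / m

def estimated_probes(n, m):
    """Rough probe estimate 1 / (1 - lf) for a table that is not full."""
    lf = load_factor(n, m)
    if lf >= 1:
        raise ValueError("table is full; the estimate is undefined")
    return 1 / (1 - lf)
```

A half-full table gives an estimate of 2 probes, while a table that is 90% full gives about 10, which is why closed hash tables are usually kept well below full.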

Deletions
If we delete a value and simply empty its slot, a later search may stop prematurely (the probe chain is broken).
Use a special mark to indicate that something was deleted.
When searching, continue past this mark rather than stopping as if the slot were empty.
Once we have many deleted items, we may wish to rehash everything remaining.
– Best if we rehash the most frequently accessed items first.
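
The special mark can be sketched as follows for a linear-probe table. This assumes integer keys, the hash function key % M, `None` for empty slots and a sentinel object named `DELETED`; all of these names are assumptions for illustration.

```python
EMPTY = None
DELETED = object()   # special mark left behind by a deletion

def search(table, key):
    """Return the slot holding key, or -1. Deleted marks do not stop the search."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is EMPTY:
            return -1                  # a truly empty slot ends the search
        if table[slot] is not DELETED and table[slot] == key:
            return slot
    return -1

def delete(table, key):
    """Replace key with the deleted mark; return its old slot, or -1."""
    slot = search(table, key)
    if slot != -1:
        table[slot] = DELETED
    return slot
```

If keys 10, 17 and 24 sit in slots 3, 4 and 5 of a size-7 table, deleting 17 leaves a mark in slot 4, and a later search for 24 still walks past it to slot 5. Resetting slot 4 to empty instead would make that search fail.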