Algorithms Course, Dr. Aref Rashad, February 2013. Part 4: Search Algorithms.



Search Algorithms
A search algorithm is a method of locating a specific item of information in a larger collection of data.
Why search?
- Everyday life: we are always looking for something (yellow pages, universities, hospitals, etc.)
- The World Wide Web: different searching mechanisms
- Databases: used to search for a record

- Sequential search: basic sequential search
- Sorted array search: binary search
- Hashing: hash functions
- Recursive structures search: binary search tree, multidimensional search

Linear Search
This is a very simple algorithm. It uses a loop to step sequentially through an array, starting with the first element. It compares each element with the value being searched for (the key) and stops when that value is found or the end of the array is reached.

Algorithm Pseudo Code:
Found = false
Position = -1
Index = 0
while Index < number of elements and Found = false
    if list[Index] is equal to search value
        Found = true
        Position = Index
    end if
    Index = Index + 1
end while
return Position
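The pseudocode above can be sketched as a runnable function. Python is used here for illustration; the name `linear_search` and its signature are my own, not from the slides:

```python
def linear_search(items, key):
    """Return the index of the first element equal to key, or -1 if absent."""
    for index, value in enumerate(items):
        if value == key:
            return index  # found: stop at the first match
    return -1  # reached the end of the array without finding key
```

Unlike the pseudocode's flag variable, the early `return` ends the loop directly; the behavior is the same.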

Linear Search Example

Linear Search Tradeoffs
Benefits:
- Easy algorithm to understand
- Array can be in any order
Disadvantages:
- Inefficient (slow): for an array of N elements, examines N/2 elements on average when the value is in the array, and all N elements when it is not

Efficiency of a Sequential Search of an Array
In the best case, the desired item is the first in the array: you make only one comparison, so the search is O(1). In the worst case, you search the entire array; the desired item is either found at the end of the array or not at all. In either event, you make n comparisons for an array of n elements, so sequential search in the worst case is O(n). In the average case, you examine about one half of the elements; O(n/2) is still O(n).

Binary Search
Requires array elements to be in order.
1. Divide the array into three sections: the middle element, the elements on one side of it, and the elements on the other side.
2. If the middle element is the correct value, done. Otherwise, repeat step 1 using only the half of the array that may contain the correct value.
3. Continue steps 1 and 2 until either the value is found or there are no more elements to examine.
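The three steps above can be sketched as a loop (Python used for illustration; `binary_search` is a hypothetical name, not from the slides):

```python
def binary_search(sorted_items, key):
    """Return the index of key in a sorted list, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:                    # elements remain to examine
        mid = (low + high) // 2           # middle element of current section
        if sorted_items[mid] == key:
            return mid                    # step 2: middle element is the value
        elif sorted_items[mid] < key:
            low = mid + 1                 # key can only be in the right half
        else:
            high = mid - 1                # key can only be in the left half
    return -1                             # no more elements: not found
```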

Binary Search Example
Example 1. Find 6 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1: middle element is 19 > 6, so search the left half.
Step 2: middle element is 5 < 6, so search the right half.
Step 3: middle element is 6 == 6, found.
Example 2. Find 103 in {-1, 5, 6, 18, 19, 25, 46, 78, 102, 114}.
Step 1: middle element is 19 < 103.
Step 2: middle element is 78 < 103.
Step 3: middle element is 102 < 103.
Step 4: middle element is 114 > 103.
Step 5: no elements remain; the searched value is absent.

How a Binary Search Works
Always look at the center value. Each comparison lets you discard half of the remaining list.

Complexity Analysis
The huge advantage of this algorithm is that its complexity depends on the array size logarithmically, even in the worst case. In practice this means the algorithm does at most log2(n) iterations, which is a very small number even for big arrays. On every step the size of the searched part is reduced by half; the algorithm stops when there are no elements left to search.

Binary Search Tradeoffs
Benefits:
- Much more efficient than linear search: for an array of n elements, performs at most log2(n) comparisons
Disadvantages:
- Requires that the array elements be sorted

Linear vs Binary Search
Compared to linear search, whose worst-case behavior is n iterations, binary search is substantially faster as n grows large. For example, searching a list of one million items takes as many as one million iterations with linear search, but never more than twenty iterations with binary search. However, a binary search can only be performed if the list is in sorted order.
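The "never more than twenty" figure comes directly from the logarithm; a quick check (Python, for illustration only):

```python
import math

# Worst-case iteration counts for a list of one million items
n = 1_000_000
linear_worst = n                         # linear search may scan every item
binary_worst = math.ceil(math.log2(n))   # binary search halves the range each step
print(linear_worst, binary_worst)        # 1000000 20
```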

Hashing
An important and widely useful technique for implementing dictionaries:
- Constant time per operation, on average
- Worst-case time proportional to the size of the set, for each operation

Basic Idea
Use a hash function to map keys into positions in a hash table. Ideally, if element e has key k and h is the hash function, then e is stored in position h(k) of the table. To search for e, compute h(k) to locate its position; if no element is there, the dictionary does not contain e.

Example: Student Records
The hash function maps a student ID into a distinct hash table position (bucket). Each record holds a key (the ID) plus satellite data (name, age, grade), e.g. one student with grade A and age 25, another with grade B and age 30. With a hash function such as h(k) = k mod 1000 (reconstructed; the slide's table is not fully preserved), searching for ID 951100 computes h(951100) = 100 and checks hash table bucket 100.

Analysis (Ideal Case, Unrealistic)
O(b) time to initialize the hash table (b = number of positions, or buckets, in the hash table); O(1) time to perform insert, remove, and search. This works for implementing dictionaries, but many applications have key ranges too large to allow a 1-1 mapping between buckets and keys. Example: suppose keys can take values from 0 to 65,535 (2-byte unsigned integers), but we expect only about 1,000 records at any given time. It is impractical to use a hash table with 65,536 slots!

Hash Functions
If the key range is too large, use a hash table with fewer buckets and a hash function that maps multiple keys to the same bucket. If h(k1) = h(k2), then k1 and k2 collide at that slot. A popular hash function is hashing by division: h(k) = k % D, where D is the number of buckets in the hash table (% is the mod operator: the remainder of division).
Example: hash table with 11 buckets, h(k) = k % 11:
80 → 3 (80 % 11 = 3), 40 → 7, 65 → 10. A key such as 91 also maps to slot 3 (91 % 11 = 3): collision!
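Hashing by division is one line of code; a sketch in Python reproducing the 11-bucket example (the function name `h` mirrors the slide's notation):

```python
def h(k, D=11):
    """Hashing by division: map key k into one of D buckets."""
    return k % D

# The slide's example with 11 buckets:
print(h(80), h(40), h(65))  # 3 7 10
print(h(80) == h(91))       # True: 80 and 91 collide at slot 3
```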

Hashing
(Figure: the universe of keys U contains the actual keys K = {k1, …, k5}; h maps each key to a slot in a table indexed 0 to m-1, with h(k2) = h(k5) marked as a collision.)

Collision Resolution Policies
Two classes:
- Open hashing (separate chaining)
- Closed hashing (open addressing)
The difference is whether collisions are stored outside the table (open hashing) or whether a collision results in storing one of the records at another slot in the table (closed hashing).

Methods of Resolution
Chaining (open hashing):
- Store all elements that hash to the same slot in a linked list.
- Store a pointer to the head of that linked list in the hash table slot.
Open addressing (closed hashing):
- All elements are stored in the hash table itself.
- When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table.

Open Hashing
Each bucket in the hash table is the head of a linked list. All elements that hash to a particular bucket are placed on that bucket's linked list. Records within a bucket can be ordered by order of insertion or by key value.

Collision Resolution by Chaining
(Figure: keys k1 and k4 hash to one slot, k2, k5, and k6 to another, k3 and k7 to a third; each collision is marked on the table of slots 0 to m-1.)

Collision Resolution by Chaining
(Figure: the same keys stored in the table; each slot holds a pointer to the linked list of keys that hash to it.)

Open Hashing: Analysis
Open hashing is most appropriate when the hash table is kept in main memory, implemented with a standard in-memory linked list. We hope the buckets are roughly equal in size, so that the lists are short. If there are n elements in the set, each bucket will hold roughly n/D elements, where D is the number of buckets in the hash table. If we can estimate n and choose D to be roughly as large, the average bucket will have only one or two members.

Open Hashing: Analysis (continued)
Average time per operation: with D buckets and n elements, there are on average n/D elements per bucket, so each insert, search, or remove operation takes O(1 + n/D) time. If we can choose D to be about n, this is constant time.
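A minimal chained hash table, sketched in Python under the slides' assumptions (division hashing, one linked chain per bucket; here each chain is a Python list, and all names are my own):

```python
class ChainedHashTable:
    """Open hashing: each of D buckets holds a chain of (key, value) pairs."""

    def __init__(self, buckets=11):
        self.D = buckets
        self.table = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.table[key % self.D]  # hashing by division

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # collisions just extend the chain

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None                       # not in the dictionary

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```

Each operation scans only one chain, so its cost is O(1 + n/D), matching the analysis above.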

Closed Hashing
To search for key k, examine slot h(k); examining a slot is known as a probe.
- If slot h(k) contains key k, the search is successful.
- If the slot contains NIL, the search is unsuccessful.
- There is a third possibility: slot h(k) contains a key that is not k. In that case, compute the index of some other slot, based on k and which probe we are on, and keep probing until we either find key k or find a slot holding NIL.
Advantage: avoids pointers, so the same memory can hold a larger table.

Closed Hashing (continued)
Associated with closed hashing is a rehash strategy: "If we try to place x in bucket h(x) and find it occupied, find an alternative location h1(x), h2(x), etc. Try each in order; if none is empty, the table is full."
In general, the collision resolution strategy generates a sequence of hash table slots (the probe sequence) that could hold the record; we test each slot until we find an empty one (probing).

Computing Probe Sequences
Auxiliary hash functions: linear probing, quadratic probing, double hashing.
The simplest rehash strategy is called linear probing:
h_i(x) = (h(x) + i) % D
so h1(d) = (h(d) + 1) % D, h2(d) = (h(d) + 2) % D, h3(d) = (h(d) + 3) % D, and so on.

Example: Linear (Closed) Hashing
D = 8; keys a, b, c, d have hash values h(a) = 3, h(b) = 0, h(c) = 4, h(d) = 3. With b in slot 0, a in slot 3, and c in slot 4, where do we insert d? Slot 3 is already filled. Probe sequence using linear probing:
h1(d) = (h(d) + 1) % 8 = 4 % 8 = 4 (occupied by c)
h2(d) = (h(d) + 2) % 8 = 5 % 8 = 5 (empty: d goes here)
h3(d) = (h(d) + 3) % 8 = 6 % 8 = 6, etc.
The sequence continues 7, 0, 1, 2: it wraps around the beginning of the table!
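Linear-probing insertion, including the wrap-around, can be sketched as follows (Python for illustration; `probe_insert` and the integer keys standing in for a, b, c, d are my own):

```python
def probe_insert(table, key):
    """Insert key into an open-addressed table using linear probing.

    Returns the slot used, or -1 if the table is full.
    Empty slots are represented by None."""
    m = len(table)
    home = key % m                    # h(key): the home bucket
    for i in range(m):
        slot = (home + i) % m         # h_i(key); % m wraps past the table end
        if table[slot] is None:
            table[slot] = key
            return slot
    return -1                         # probed every slot: table is full
```

Filling slots 0, 3, and 4 first (like b, a, c above) and then inserting a key whose home bucket is 3 lands it in slot 5, matching the worked example.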

Pseudo-code for Search
Hash-Search(T, k)
1. i ← 0
2. repeat j ← h(k, i)
3.     if T[j] = k
4.         then return j
5.     i ← i + 1
6. until T[j] = NIL or i = m
7. return NIL

Example: Linear Probing
h'(x) = x mod 13; h(x, i) = (h'(x) + i) mod 13.
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.

Example: Insert with h(k) = k % 11
Suppose buckets 0, 1, and 2 are already occupied. If the next element has home bucket 0, 1, or 2, probing pushes it into bucket 3; only a record with home position 3 will stay there by right. Only records hashing to 4 will end up in 4 (p = 1/11), and the same holds for 5 and 6. For key 1052:
h(1052) = 1052 % 11 = 7
h1(1052) = (7 + 1) % 11 = 8
h2(1052) = (7 + 2) % 11 = 9
h3(1052) = (7 + 3) % 11 = 10

Linear Probing
h(k, i) = (h(k) + i) mod m, where h is an auxiliary hash function, k the key, and i the probe number. The initial probe determines the entire probe sequence:
T[h(k)], T[h(k)+1], …, T[m-1], T[0], T[1], …, T[h(k)-1]
Hence only m distinct probe sequences are possible. Linear probing suffers from primary clustering:
- Long runs of occupied slots build up.
- Long runs tend to get longer, since an empty slot preceded by i full slots is filled next with probability (i+1)/m.
- Hence, average search and insertion times increase.

Quadratic Probing
h(k, i) = (h(k) + c1·i + c2·i²) mod m, with c2 ≠ 0, where h is an auxiliary hash function and i the probe number.
The initial probe position is T[h(k)]; later probe positions are offset by amounts that depend on a quadratic function of the probe number i. We must constrain c1, c2, and m to ensure that the sequence is a full permutation of ⟨0, 1, …, m-1⟩. Quadratic probing can suffer from secondary clustering: if two keys have the same initial probe position, their probe sequences are identical.
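Secondary clustering is easy to see by generating the sequences directly (a sketch in Python; the function name and the choice c1 = c2 = 1 are my own, and as the slide notes, c1, c2, and m must be constrained before the sequence covers every slot):

```python
def quadratic_probes(key, m, c1=1, c2=1):
    """Probe sequence h(k, i) = (h(k) + c1*i + c2*i**2) mod m, for i = 0..m-1."""
    home = key % m                    # h(k): division hashing as the base function
    return [(home + c1 * i + c2 * i * i) % m for i in range(m)]

# Secondary clustering: keys sharing a home bucket share the whole sequence.
print(quadratic_probes(3, 8) == quadratic_probes(11, 8))  # True
```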

Double Hashing
h(k, i) = (h1(k) + i·h2(k)) mod m, using two auxiliary hash functions: h1 gives the initial probe, h2 the offset of the remaining probes. h2(k) must be relatively prime to m, so that the probe sequence is a full permutation of ⟨0, 1, …, m-1⟩. Two common choices:
- Choose m to be a power of 2 and have h2(k) always return an odd number, or
- let m be prime and have 1 < h2(k) < m.
Double hashing yields Θ(m²) different probe sequences, one for each possible combination of h1(k) and h2(k). This is close to the ideal of uniform hashing.
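With prime m, the permutation property can be checked directly (Python sketch; the particular h2(k) = 1 + k mod (m-1) is one common textbook choice, not taken from these slides):

```python
def double_hash_probes(key, m):
    """Probe sequence h(k, i) = (h1(k) + i*h2(k)) mod m for i = 0..m-1.

    Assumes m is prime; h2(k) = 1 + k % (m - 1) guarantees 1 <= h2(k) < m,
    so h2(k) is relatively prime to m."""
    h1 = key % m
    h2 = 1 + key % (m - 1)
    return [(h1 + i * h2) % m for i in range(m)]

# With m prime, every key's probe sequence visits all m slots exactly once.
print(sorted(double_hash_probes(1052, 11)) == list(range(11)))  # True
```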

Performance Analysis: Worst Case
Initialization: O(b), where b is the number of buckets.
Insert and search: O(n), where n is the number of elements in the table, since all n key values may share the same home bucket.
In the worst case, hashing is no better than a linear list for maintaining a dictionary!

Performance Analysis: Average Case
Distinguish between successful and unsuccessful searches:
- Delete = a successful search for the record to be deleted.
- Insert = an unsuccessful search along the record's probe sequence.
The expected cost of hashing is a function of how full the table is: the load factor α = n/b. Average costs under linear probing are:
- Insertion (unsuccessful search): (1/2)(1 + 1/(1-α)²)
- Deletion (successful search): (1/2)(1 + 1/(1-α))
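These two formulas are worth evaluating at a couple of load factors to see how sharply cost grows as the table fills (a sketch in Python; the function name is my own):

```python
def expected_probes(alpha):
    """Expected probe counts under linear probing at load factor alpha = n/b.

    Returns (successful_search, unsuccessful_search), per the formulas above."""
    successful = 0.5 * (1 + 1 / (1 - alpha))        # deletion / successful search
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)  # insertion / unsuccessful search
    return successful, unsuccessful

print(expected_probes(0.5))  # (1.5, 2.5)
print(expected_probes(0.9))  # roughly (5.5, 50.5): a nearly full table is expensive
```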