Presentation is loading. Please wait.

Presentation is loading. Please wait.

hashing1 Hashing It’s not just for breakfast anymore!

Similar presentations


Presentation on theme: "hashing1 Hashing It’s not just for breakfast anymore!"— Presentation transcript:

1

2 hashing1 Hashing It’s not just for breakfast anymore!

3 hashing2 Hashing: the facts Approach that involves both storing and searching for values Behavior is linear in the worst case, but strong competitor with binary searching in the average case Hashing makes it easy to add and delete elements, an advantage over binary search (since the latter requires sorted array)

4 hashing3 Dictionary ADT Previously, we have seen a dictionary ADT implemented as a binary search tree A hash table can be used to provide an array-based dictionary implementation Abstract properties of dictionary: –every item has a key –to retrieve an item, specify key and retrieval process fetches associated data

5 hashing4 Possible structure for single dictionary item template struct RecordType { size_t key; item datarecord; }

6 hashing5 Setting up the array One approach to an array-based dictionary would be to create consecutive keys, storing the records so that each key corresponds to its index -- this is the method used in MS Access, for example An alternative would be to use an existing attribute of the data to be stored as the key value; this approach is more typical of hashing

7 hashing6 Setting up the array Use of existing key field presents challenges: –Value may be too large for indexing: e.g. social security number –No guarantee that individual values will be close enough together for effective indexing: e.g. last 4 digits of social security numbers of students in a class

8 hashing7 Solution: hashing Instead of direct use of data field, a function is applied to the original value to produce a valid index: this is called the hash function The hash function maps the key to an index that can be used to insert data into the array or to retrieve data based on a given key An array that uses hashing for indexing is called a hash table

9 hashing8 Operations on a hash table Inserting an item –calculate hash value (index) from item key –check index to determine if space is open if open, insert item if not open, collision occurs; search through array for next open slot –requires some mechanism for recognizing an empty space; can’t just start with uninitialized array

10 hashing9 Open-address hashing The insertion scheme just described uses open-address hashing In open addressing, collisions are resolved by placing a new item in the next open spot in the array Scheme requires that the key field of each array element be initialized to some known value; -1, for example

11 hashing10 Inserting a New Record In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685

12 hashing11 Inserting a New Record Typical hash function –701 is the number of items in the array –Number is the original key value [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 (Number mod 701) What is (580625685 mod 701) ? 3

13 hashing12 Inserting a New Record The hash value is used for the location of the new record. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... [3]

14 hashing13 CollisionsCollisions Here is another new record to insert, with a hash value of 2. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 My hash value is [2].

15 hashing14 CollisionsCollisions This is called a collision, because there is already another valid record at [2]. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 When a collision occurs, move forward until you find an empty spot. When a collision occurs, move forward until you find an empty spot.

16 hashing15 CollisionsCollisions [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 The new record goes in the empty spot. The new record goes in the empty spot.

17 hashing16 Operations on a hash table Retrieving an item –calculate hash value based on desired key –search array, beginning at calculated index, for desired data –search is finished when: item is found; successful search an empty index is encountered; unsuccessful search

18 hashing17 Searching for a Key The data that's attached to a key can be found fairly quickly. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868

19 hashing18 Searching for a Key Calculate the hash value. Check that location of the array for the key. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 Not me. Number 701466868 My hash value is [2].

20 hashing19 Searching for a Key Keep moving forward until you find the key, or you reach an empty spot. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 Not me. Number 701466868 My hash value is [2]. Not me.Yes!

21 hashing20 Searching for a Key When the item is found, the information can be copied to the necessary location. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 My hash value is [2].

22 hashing21 Operations on a hash table Deleting an item: –find index based on hashed key, as with insertion and retrieval –mark record at index to indicate the spot is open can’t use ordinary “empty” designation -- this could interfere with record retrieval use alternative “open” designation: indicate the slot is open for insertion, but won’t stop a search

23 hashing22 Deleting a Record Records may also be deleted from a hash table. But the location must not be left as an ordinary "empty spot" since that could interfere with searches. The location must be marked in some special way so that a search can tell that the spot used to have something in it. [ 0 ][ 1 ][ 2 ][ 3 ][ 4 ][ 5 ] [ 700] Number 506643548 Number 233667136 Number 281942902 Number 155778322... Number 580625685 Number 701466868 Please delete me.

24 hashing23 A class specification for a hashing dictionary Public functions: –constructor: creates and initializes empty dictionary –insert: inserts a new item –is_present: returns true if specified item is found in dictionary, false if not –find: returns a copy of the desired item, if found –remove: removes specified record if it exists –size: returns total number of records in dictionary

25 hashing24 Invariant for dictionary class Member variable used stores the number of records currently in dictionary Member variable data is an array of CAPACITY entries; actual records are stored here Each valid record has a non-negative key value; an unused record has its key field set to the constant NEVER_USED or the constant PREVIOUSLY_USED

26 hashing25 Code for dictionary class template class Dictionary { public: enum {CAPACITY = 811}; Dictionary( ); void insert (const RecType& entry); void remove (int key); bool is_present(int key) const; void find (int key, bool& found, RecType& result) const; size_t size( ) const {return used;}

27 hashing26 Code for dictionary class … private: const int NEVER_USED = -1; const int PREVIOUSLY_USED = -2; RecType data[CAPACITY]; size_t used; …

28 hashing27 Helper functions in dictionary class hash: calculates hash value for given key next_index: steps through array, providing wrap- around function at end of array find_index: finds array index of record with given key never_used: returns true if index has never been used is_vacant: returns true if index is not currently in use

29 hashing28 Code for dictionary class... // helper functions: size_t hash (int key) const {return key%CAPACITY;} size_t next_index (size_t index) const {return (index+1)%CAPACITY;} void find_index (int key, bool& found, size_t& index) const; bool never_used (size_t index) const {return data[index].key == NEVER_USED;} bool is_vacant(size_t index) const {return data[index].key < 0;} };

30 hashing29 Function implementations // constructor template Dictionary ::Dictionary( ) { used = 0; for (int x=0; x<CAPACITY; x++) data[x].key = NEVER_USED; }

31 hashing30 Function implementations // helper function find_index template void Dictionary ::find_index(int key, bool& found, size_t& index) { size_t count=0; index = hash(key); while ((count < CAPACITY) && (!never_used(index)) && (data[index].key != key)) { count++; index = next_index(index); } found = (data[index].key == key); }

32 hashing31 Function implementations template void Dictionary ::insert (const RecType& entry) { bool already_present;// true if entry already in table size_t index;// location of new entry find_index(entry.key, already_present, index); if (!already_present) { assert (size( ) < CAPACITY); used++; data[index] = entry; }

33 hashing32 Function implementations template void Dictionary ::remove (int key) { bool found;// true if key occurs somewhere in table size_t index;// index of key value assert (key >= 0);// must be valid key find_index(key, found, index); if (found) { data[index].key = PREVIOUSLY_USED; used--; }

34 hashing33 Function implementations template bool Dictionary ::is_present(int key) { bool found; size_t index; assert (key >= 0); find_index (key, found, index); return found; }

35 hashing34 Function implementations template void Dictionary ::find(int key, bool& found, RecType& result) const { size_t index; assert (key >= 0); find_index(key, found, index); if (found) result = data[index]; }


Download ppt "hashing1 Hashing It’s not just for breakfast anymore!"

Similar presentations


Ads by Google