LECTURE 34: MAPS & HASH CSC 212 – Data Structures.

Entry ADT
- Entry ADT represents searchable data
- Two methods declared in Entry: key() & value()
- Entry implementations need key & value fields
  - Entry instance holds single key-value pair
  - setValue() also included in most implementations
  - Does NOT define setKey()
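Here is a minimal Java sketch of the Entry ADT. It assumes the slide's method names key(), value() and setValue(). MapEntry is a hypothetical implementation class. Making the key field final is one way to honor the missing setKey().

```java
// Entry declares only the two methods from the slide.
interface Entry<K, V> {
    K key();    // the search key
    V value();  // the data associated with that key
}

// Hypothetical implementation: each instance holds exactly one key-value pair.
class MapEntry<K, V> implements Entry<K, V> {
    private final K key;  // final: there is deliberately no setKey()
    private V value;

    MapEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public K key() {
        return key;
    }

    public V value() {
        return value;
    }

    // Most implementations let the value change; returns the old value.
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}

class EntryDemo {
    public static void main(String[] args) {
        MapEntry<Integer, String> e = new MapEntry<>(9, "c");
        String old = e.setValue("cc");
        System.out.println(e.key() + " " + e.value() + " (was " + old + ")");  // 9 cc (was c)
    }
}
```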

Sequence:Element::Map:___
- Sequence is collection of elements
  - Many implementations possible for this ADT
  - All of them could hold a number of elements
- Collection of Entrys is defined by a Map
  - Possible to have many implementations of Map
  - Entrys stored in each of these implementations
[Figure: View of the Map, showing the Entrys (9, “c”), (11, “xd”), (1, “ab”), (-4, “dc”)]

Sequence:Element::Map:___ (continued)
[Figure: View of the Sequence used by the Map: Positions whose elements are the Entrys (9, “c”), (11, “xd”), (1, “ab”), (-4, “dc”)]
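The next sketch shows a Map that keeps its Entrys in a sequence. java.util.ArrayList stands in for the course's Sequence ADT. ListMap and Pair are hypothetical names. Every operation does a linear search, so each one takes O(n) time.

```java
import java.util.ArrayList;

// Hypothetical Map built on a sequence of key-value pairs.
class ListMap<K, V> {
    // One key-value pair; the key is final, so there is no setKey().
    private static class Pair<K, V> {
        final K key;
        V value;

        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final ArrayList<Pair<K, V>> seq = new ArrayList<>();

    // Linear search through the sequence: O(n) time.
    private int find(K key) {
        for (int i = 0; i < seq.size(); i++) {
            if (seq.get(i).key.equals(key)) {
                return i;
            }
        }
        return -1;
    }

    // Returns the value for key, or null when the search fails.
    V get(K key) {
        int i = find(key);
        return (i < 0) ? null : seq.get(i).value;
    }

    // Keys are unique: an existing key has its value replaced.
    // Returns the old value, or null if the key was new.
    V put(K key, V value) {
        int i = find(key);
        if (i >= 0) {
            V old = seq.get(i).value;
            seq.get(i).value = value;
            return old;
        }
        seq.add(new Pair<>(key, value));
        return null;
    }

    // Removes the pair with this key; returns its value, or null if absent.
    V remove(K key) {
        int i = find(key);
        return (i < 0) ? null : seq.remove(i).value;
    }

    int size() {
        return seq.size();
    }
}

class ListMapDemo {
    public static void main(String[] args) {
        ListMap<Integer, String> m = new ListMap<>();
        m.put(9, "c");
        m.put(11, "xd");
        m.put(1, "ab");
        m.put(-4, "dc");
        System.out.println(m.get(11));  // xd
        System.out.println(m.get(42));  // null
    }
}
```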

Lessons from Polly…
1. When searching, a key gets its value
2. Each key is unique & has at most 1 value
3. Failed search is the usual case, not an exceptional one
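The three lessons match how the standard java.util.HashMap behaves. This sketch uses that library class rather than the course's own Map ADT, and its keys and values are made up.

```java
import java.util.HashMap;
import java.util.Map;

class MapLessonsDemo {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(9, "c");

        // Lesson 1: searching with a key gets its value
        System.out.println(map.get(9));  // c

        // Lesson 2: keys are unique; put() on an existing key replaces
        // the value and returns the old one instead of adding a second pair
        String old = map.put(9, "cc");
        System.out.println(old + " " + map.get(9) + " " + map.size());  // c cc 1

        // Lesson 3: a failed search is normal, so get() returns null
        // rather than throwing an exception
        System.out.println(map.get(42));  // null
    }
}
```

HashMap permits null values, so a null from get() could also mean the key maps to null. containsKey() tells the two cases apart.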

Map Performance
- In all seriousness, can be matter of life-or-death
  - 911 operators immediately need addresses
  - Google’s search performance in TB/s
- O(log n) time too slow for these uses
- Would love to use arrays
  - Get O(1) time to add, remove, or look up data
  - But this HUGE array needs a massive RAM purchase
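The next sketch shows the "would love to use arrays" idea, often called a direct-address table. It assumes keys are small non-negative ints that can serve directly as indices. DirectTable and the 1,000-key range are hypothetical. Every operation is O(1), but the array needs one slot for every possible key, which is exactly what makes this approach impractical for, say, phone numbers.

```java
// Hypothetical direct-address table: the key itself is the array index.
class DirectTable<V> {
    private final Object[] slots;  // one slot per possible key

    DirectTable(int keyRange) {
        // Memory grows with the RANGE of keys, not the number stored
        slots = new Object[keyRange];
    }

    void put(int key, V value) {  // O(1)
        slots[key] = value;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {              // O(1); null if nothing stored
        return (V) slots[key];
    }

    void remove(int key) {        // O(1)
        slots[key] = null;
    }
}

class DirectTableDemo {
    public static void main(String[] args) {
        DirectTable<String> t = new DirectTable<>(1_000);
        t.put(911, "dispatch");
        System.out.println(t.get(911));  // dispatch
        System.out.println(t.get(912));  // null
    }
}
```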

Monster Amounts of RAM
- Java requires ints be used as array indices
  - Unfortunately, int and RAM both have limits
  - Integer.MAX_VALUE = 2,147,483,647
  - Items in Google index = ~8,200,000,000 (2005)
  - Possible phone numbers = 10,000,000,000
- Enabling O(1) array use requires we do more
  - As with all life’s problems, we turn to hash
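Here is a quick check of the slide's arithmetic. The constants are copied from the slide, and the class name is hypothetical.

```java
class IndexLimits {
    static final long GOOGLE_ITEMS_2005 = 8_200_000_000L;  // from the slide
    static final long PHONE_NUMBERS = 10_000_000_000L;     // from the slide

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);                      // 2147483647
        System.out.println(GOOGLE_ITEMS_2005 > Integer.MAX_VALUE);  // true
        System.out.println(PHONE_NUMBERS > Integer.MAX_VALUE);      // true
        // Casting to int keeps only the low 32 bits, so the value silently
        // wraps around and no longer means anything as an index
        System.out.println((int) PHONE_NUMBERS);                    // 1410065408
    }
}
```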

Hashing To The Rescue
- Hash function turns key into an int from 0 to N-1
  - Result is usable as index for an array
  - Function specific to one key type; cannot be reused for other types
- Store the Entrys in an array: a HASH TABLE
  - (Great name for shop in Amsterdam, too)
- Compute index with hash function
  - Entry stored in array at that index
- If computing the hash takes O(1) time
  - Getting an Entry could take only O(1) time
  - Adding & removing in O(1) time, too

Hash Table Example
- Table is array of Entrys
- Simple hash function is h(x) = x mod 10,000
  - Key used is x
  - h(x) is Entry’s index
  - Always mod array length
- Not all locations used
  - Holes can appear in array
  - Empty slots left null

Index | Entry (key, value)
0000  | null
0001  | 0256120001, “Jay Doe”
0002  | 9811010002, “Bob Doe”
0003  | null
0004  | 4512290004, “Jill Roe”
…     | …
9997  | null
9998  | 2007519998, “Rhi Smith”
9999  | null
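The next sketch builds the example table, assuming h(x) = x mod 10,000 over an array of 10,000 slots. ModTable is a hypothetical name. Keys are stored as long because some of them (e.g. 9811010002) exceed Integer.MAX_VALUE. The key 0256120001 is written without its leading zero, because a leading 0 makes a Java integer literal octal. Collisions are not handled here: a second key with the same index would overwrite the first.

```java
class ModTable {
    static final int SIZE = 10_000;

    // Keep the whole pair so get() can confirm the slot holds this key.
    static final class Entry {
        final long key;
        final String value;

        Entry(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry[] table = new Entry[SIZE];  // unused slots stay null

    // h(x) = x mod 10,000; keys are non-negative, so the result is 0..9999
    static int h(long x) {
        return (int) (x % SIZE);
    }

    void put(long key, String value) {
        table[h(key)] = new Entry(key, value);
    }

    String get(long key) {
        Entry e = table[h(key)];
        // The slot may be empty, or may hold a different key
        return (e != null && e.key == key) ? e.value : null;
    }
}

class ModTableDemo {
    public static void main(String[] args) {
        ModTable t = new ModTable();
        t.put(256120001L, "Jay Doe");
        t.put(9811010002L, "Bob Doe");
        t.put(4512290004L, "Jill Roe");
        t.put(2007519998L, "Rhi Smith");
        System.out.println(ModTable.h(9811010002L));  // 2
        System.out.println(t.get(2007519998L));       // Rhi Smith
        System.out.println(t.get(1234560003L));       // null (slot 0003 is empty)
    }
}
```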

What Hash Does
- Implement Map with a hash table
  - Given a key, easily look up its Entry
  - Always computes same index for that key
- Hash must be computed on each access
  - O(1) efficiency of array utilized
  - But it is wasted if the hash is slow
- Spreads out Entrys, ideally
  - Want to use entire hash table

Bad Hash
- h(x) = 0
  - Fast, repeatable, little use of table
- h(x) = random.nextInt()
  - Fast, not repeatable, uses entire table
- h(x) = current index -or- free index
  - Slow, repeatable, uses entire table
- $h(x) = x^{34} + 2x^{33} + 24x^{32} + 10x^{31} + \cdots$
  - Moderate, repeatable, but results too large to use as indices
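One rough way to compare the first bad hash against a reasonable one is to count how many distinct slots each uses for a batch of keys. The table length (10), the keys (0 to 99) and the tableUsage helper are illustrative assumptions. h(x) = x mod 10 is included only as a contrast.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongUnaryOperator;

class BadHashDemo {
    static final int N = 10;  // table length

    // Number of distinct slots the hash function maps these keys to.
    static int tableUsage(LongUnaryOperator h, long[] keys) {
        Set<Long> used = new HashSet<>();
        for (long k : keys) {
            used.add(h.applyAsLong(k));  // record the slot this key maps to
        }
        return used.size();
    }

    public static void main(String[] args) {
        long[] keys = new long[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }

        // h(x) = 0: fast and repeatable, but everything lands in slot 0
        System.out.println(tableUsage(x -> 0, keys));      // 1
        // h(x) = x mod N: fast, repeatable, and uses every slot
        System.out.println(tableUsage(x -> x % N, keys));  // 10
        // h(x) = random.nextInt(N) would also spread keys out, but the same
        // key would get a different slot on each call, so it is left out here
    }
}
```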

Really Bad Hash
- Using only part of the key
  - Inevitably, you will guess wrong
[Figure: a key with the "Portion of key that matters" marked, next to the different portion the hash is told to "Use this portion of this key"]
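The next sketch is a hypothetical illustration of guessing wrong. It uses 100 phone numbers that share the made-up prefix 716-555. Hashing on the area code uses a part of the key that never varies in this batch, so every key collides. The last four digits happen to work here, but a batch that shared its last four digits would defeat that choice instead. The method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class PartialKeyDemo {
    static long areaCodeHash(long phone) {  // first 3 of 10 digits
        return phone / 10_000_000L;
    }

    static long lastFourHash(long phone) {  // last 4 digits
        return phone % 10_000L;
    }

    public static void main(String[] args) {
        Set<Long> byArea = new HashSet<>();
        Set<Long> byLastFour = new HashSet<>();
        for (long phone = 7_165_550_000L; phone < 7_165_550_100L; phone++) {
            byArea.add(areaCodeHash(phone));
            byLastFour.add(lastFourHash(phone));
        }
        System.out.println(byArea.size());      // 1: every key collides
        System.out.println(byLastFour.size());  // 100: no collisions
    }
}
```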

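Good Hash
- Hash first turns key into int
  - Easy to do for numbers, at least
  - For a String, could add value of each character
    - Would hash “spot”, “pots”, “stop” to the same index
  - Instead use a polynomial hash code, evaluated with Horner’s method, where $x_i$ is the code of character $i$ in a $k$-character key and $a$ is a constant:
    $h = x_0 a^{k-1} + x_1 a^{k-2} + \cdots + x_{k-2} a + x_{k-1} = (\cdots((x_0 a + x_1) a + x_2) a + \cdots) a + x_{k-1}$
Example: $h(\text{"spot"}) = s \cdot a^{3} + p \cdot a^{2} + o \cdot a + t$, where each letter stands for its character code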
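The next sketch compares the character-sum hash with the polynomial hash evaluated by Horner's method, using a = 33 as in the example. The method names are hypothetical. Java's own String.hashCode() uses this same polynomial form with a = 31 and int arithmetic, so it can overflow. Here the values are kept in a long, which is enough for short strings like these.

```java
class StringHashDemo {
    // Adds the character codes: anagrams always collide.
    static long sumHash(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Horner's method: ((x0*a + x1)*a + x2)*a + ... + x(k-1)
    static long polyHash(String s, long a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spot", "pots", "stop"}) {
            // Same sum (454) for all three words, different polynomial hashes
            System.out.println(w + ": sum=" + sumHash(w) + " poly=" + polyHash(w, 33));
        }
    }
}
```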

Compression
- Hash’s only use is computing array indices
  - Useless if larger than table’s length: no index exists!
  - “spot” = 4,258,502 when a = 33
  - “triskaidekaphobia” = too big for my calculator
- Instead use modulus (%) to compress result: result = ((result % length) + length) % length
  - Remember that modulus returns the remainder
  - Adding length fixes negative results, since Java’s % keeps the sign of the dividend
  - Keeps result within array (just like array-based queue)
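Here is a minimal compression sketch. In Java, % keeps the sign of the dividend, so a hash code that has overflowed to a negative int would give a negative index. Adding length before the second % brings it back into range. Math.floorMod(hash, length) is the standard-library equivalent. CompressDemo is a hypothetical name.

```java
class CompressDemo {
    // Maps any int hash code into 0..length-1 (for positive length).
    static int compress(int hash, int length) {
        return ((hash % length) + length) % length;
    }

    public static void main(String[] args) {
        int length = 10_000;
        System.out.println(compress(4_258_502, length));  // 8502 ("spot", a = 33)
        System.out.println(compress(-7, 10));             // 3, not -7
        int h = "triskaidekaphobia".hashCode();           // overflowed; may be negative
        System.out.println(compress(h, length) == Math.floorMod(h, length));  // true
    }
}
```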

Collisions
- Occur when 2 keys hash to the same index
- Ideal hash spreads keys out evenly across table
  - As much as possible, this limits collisions
  - Small table size also important, since RAM is limited
- Unfortunately, there is no such thing as an ideal hash
  - Must handle collisions if you want it to work
  - Ultimately, this could kill our O(1) efficiency buzz
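Two quick demonstrations show that collisions cannot be avoided. First, Java's String.hashCode() maps "Aa" and "BB" to the same int, 2112. Second, compressing into a small table makes distinct ints share an index. The 10-slot table and the keys 12 and 22 are chosen only for illustration.

```java
class CollisionDemo {
    public static void main(String[] args) {
        // String.hashCode(): 'A'*31 + 'a' = 65*31 + 97 = 2112
        //                    'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());  // 2112
        System.out.println("BB".hashCode());  // 2112

        // Compression adds more: distinct keys, same index in a 10-slot table
        int length = 10;
        System.out.println(12 % length);  // 2
        System.out.println(22 % length);  // 2
    }
}
```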

Bucket Arrays
- Make hash table an array of linked list Nodes
  - Each array location aliases the first Node in a linked list
- Whenever we have collision, we “chain” Entrys
  - Create new Node that stores the Entry
  - The linked list will have new Node at its front
[Figure: bucket array with slots 0 to 5]
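The next sketch implements a bucket array, also known as separate chaining. It assumes keys provide a usable hashCode(), which is compressed with Math.floorMod. ChainedMap and its methods are hypothetical names, not a library API. As on the slide, each slot aliases the first Node of a chain, and a new Node goes at the front.

```java
class ChainedMap<K, V> {
    // One link in a bucket's chain, holding a single key-value pair.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;  // each slot aliases the first Node of its chain
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    // Hash the key, then compress into 0..buckets.length-1.
    private int index(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    V get(K key) {
        for (Node<K, V> n = buckets[index(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;  // failed search
    }

    V put(K key, V value) {
        int i = index(key);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {  // keys are unique: replace the value
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]);  // new Node at the front
        size++;
        return null;
    }

    int size() {
        return size;
    }

    // Number of Nodes chained from slot i, for inspecting collisions.
    int chainLength(int i) {
        int count = 0;
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            count++;
        }
        return count;
    }
}

class ChainedMapDemo {
    public static void main(String[] args) {
        ChainedMap<String, Integer> map = new ChainedMap<>(6);  // slots 0..5
        map.put("Aa", 1);
        map.put("BB", 2);  // same hashCode (2112) as "Aa": both chain in slot 0
        System.out.println(map.get("Aa") + " " + map.get("BB"));  // 1 2
        System.out.println(map.chainLength(0));                    // 2
    }
}
```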

Bucket Arrays
- But what if we have a really bad hash?
  - Hashes to same index in every situation
  - All Entrys now found in a single linked list
  - O(n) execution times would now be required
[Figure: bucket array with slots 0 to 5]
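The next sketch measures the damage. With n keys in a 6-slot bucket array, the longest chain bounds how far a search may have to walk. The table size, the keys 0 to 599 and the longestChain helper are illustrative assumptions.

```java
import java.util.function.IntUnaryOperator;

class ChainLengthDemo {
    // Length of the longest chain after hashing keys 0..n-1 with h.
    static int longestChain(IntUnaryOperator h, int n, int slots) {
        int[] chainLength = new int[slots];
        int longest = 0;
        for (int key = 0; key < n; key++) {
            int i = h.applyAsInt(key);
            chainLength[i]++;
            longest = Math.max(longest, chainLength[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        int slots = 6;
        int n = 600;
        // Constant hash: one chain holds every key, so a search is O(n)
        System.out.println(longestChain(k -> 0, n, slots));          // 600
        // Spread-out hash: about n / slots keys per chain
        System.out.println(longestChain(k -> k % slots, n, slots));  // 100
    }
}
```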

Before Next Lecture…
- Continue week #12 assignment
  - Due at usual time, whatever that may be
- Read sections 9.2.5 – 9.2.8 of the book
  - Examine better approaches to handling collisions
  - Consider what we should do in following situation:
