Storage and Retrieval Structures by Ron Peterson.

Overview
  Storage & Retrieval as an ADT
  Simple implementations
    – Arrays of records
    – Sorted arrays
    – Trees
  Efficiency issues
  Hash tables

Storage & Retrieval (S & R) ADT
  A container holding a collection of records
  Each record has a “key” field
  Operations:
    – Add a record
    – Remove a record by key
    – Find a record by key, retrieve a copy

Simple Implementations
  Array of records
    – Insert at end
    – Find by linear search
  Sorted array
    – Insert in key order
    – Find by binary search
  Trees and balanced trees
    – We’ll study these later
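
The two array-based implementations above can be sketched roughly as follows. This is a minimal illustration, not the presenter's code: the class names `ArrayStore` and `SortedArrayStore` and the choice of a dict with a `"key"` entry as the record type are assumptions made for the example.

```python
import bisect
import copy


class ArrayStore:
    """Unsorted array of records: insert at end, find by linear search."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)            # insert at end

    def find(self, key):
        for rec in self._records:                # linear search
            if rec["key"] == key:
                return copy.deepcopy(rec)        # retrieve a copy, per the ADT
        return None


class SortedArrayStore:
    """Array kept in key order: shifting insert, find by binary search."""

    def __init__(self):
        self._keys = []                          # sorted keys, parallel to _records
        self._records = []

    def add(self, record):
        i = bisect.bisect_left(self._keys, record["key"])
        self._keys.insert(i, record["key"])      # later items shift right
        self._records.insert(i, record)

    def find(self, key):
        i = bisect.bisect_left(self._keys, key)  # binary search
        if i < len(self._keys) and self._keys[i] == key:
            return copy.deepcopy(self._records[i])
        return None
```

Returning a deep copy means a caller who modifies the result does not change the stored record.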

Efficiency Issues
  Regular arrays – O(N) retrieval
  Sorted arrays – O(log N) retrieval, but O(N) add (insertion)
  Trees – O(log N) retrieval & add on average, but a degenerate tree degrades to O(N); backup issues
  Balanced trees – O(log N), but complex & backup issues
  Alternative: Hash table – O(1) (constant time) or close
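
To make the O(N) versus O(log N) gap concrete, the sketch below counts how many keys each search examines in the worst case for an array of 1024 keys. The function names and the array size are assumptions chosen for the illustration.

```python
def linear_search_steps(keys, target):
    """Return how many keys a left-to-right scan examines."""
    steps = 0
    for k in keys:
        steps += 1
        if k == target:
            break
    return steps


def binary_search_steps(sorted_keys, target):
    """Return how many loop iterations a binary search makes."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < target:
            lo = mid + 1                         # target is in the upper half
        else:
            hi = mid                             # target is at mid or below
    return steps


keys = list(range(1024))
worst_linear = linear_search_steps(keys, 1023)   # target is the last element
worst_binary = max(binary_search_steps(keys, t) for t in keys)
print(worst_linear, worst_binary)
```

The linear scan grows in step with N, while the binary search grows by about one step each time N doubles.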

Hash Table Motivation
  What if we used an array, but every record had a unique location?
  For example, we have an array of employee records, and the key is Employee-ID, which goes from 1 to 300
  Employee 17 gets put in location 17
  Add and retrieve are each O(1) (constant time)
  Problem: what if SSN is the key?
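
The Employee-ID example can be sketched as an array indexed directly by the key. The class name `DirectAddressTable` and the record contents are assumptions for the illustration; for brevity `find` returns the stored record itself rather than a copy.

```python
class DirectAddressTable:
    """One slot per possible key; the key itself is the array index."""

    def __init__(self, max_id):
        # IDs run 1..max_id, so index 0 is simply left unused.
        self._slots = [None] * (max_id + 1)

    def add(self, emp_id, record):
        self._slots[emp_id] = record             # no searching needed

    def find(self, emp_id):
        return self._slots[emp_id]

    def remove(self, emp_id):
        self._slots[emp_id] = None


table = DirectAddressTable(300)
table.add(17, {"name": "Employee 17"})           # employee 17 goes in location 17
```

This only works because the keys are small and dense; a 9-digit key would need a slot for every possible value.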

The Hash Table Solution
  For SSN as the Employee-ID
    – (as might be needed for Payroll)
  One slot per 9-digit ID would require an array of one billion slots; not feasible!
  Instead, let’s still have an array of 300 (or a few more) slots and then figure out:
  An easy “mapping” function:
    – LocationIndex = Hash(SSN)

Hash Table Issues
  Coming up with a Hash function
    – Easy to calculate
    – Result in correct range
    – Minimize duplicate answers
  Duplicates (“collisions”) inevitable
    – Many-to-one function (keys to location)
  Need a plan for dealing with them
    – “collision handling”

Collision Handling
  When adding a record, a record with a different key may already be in the location given by the Hash function
  When retrieving a record that collided when it was added, it will not be in that location either
  In both cases, you need to use the same process to decide where to look next

Collision Handling Methods
  Just increment the location until you find an empty slot (or the key sought)
    – Called “linear probing”
    – Often a poor choice as the table fills, because it tends to create filled-up blocks (clusters)!
  Jumps of increasing size (+ wrap-around)
    – Most common version is “quadratic probing”
  Using an overflow area with links
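
The two probing methods above differ only in where they look next, which the sketch below tries to show. It assumes integer keys hashed with `key % size`; the names `linear_probe`, `quadratic_probe`, and `ProbingTable` are hypothetical, and removal is left out because emptying a slot would cut off searches for records that probed past it.

```python
def linear_probe(home, i, size):
    """i-th slot to try: step by one, wrapping around."""
    return (home + i) % size


def quadratic_probe(home, i, size):
    """i-th slot to try: jumps of increasing size (offset i*i), wrapping around."""
    return (home + i * i) % size


class ProbingTable:
    def __init__(self, size, probe=linear_probe):
        self._size = size
        self._probe = probe
        self._slots = [None] * size              # each slot: (key, record) or None

    def add(self, key, record):
        home = key % self._size
        for i in range(self._size):
            idx = self._probe(home, i, self._size)
            if self._slots[idx] is None or self._slots[idx][0] == key:
                self._slots[idx] = (key, record)
                return idx                       # report where the record landed
        raise RuntimeError("no free slot found; the table needs rebuilding")

    def find(self, key):
        home = key % self._size
        for i in range(self._size):              # same probe sequence as add
            slot = self._slots[self._probe(home, i, self._size)]
            if slot is None:
                return None                      # an empty slot ends the search
            if slot[0] == key:
                return slot[1]
        return None


linear = ProbingTable(7, linear_probe)
quadratic = ProbingTable(7, quadratic_probe)
# Keys 3, 10, and 17 all hash to slot 3 in a table of size 7.
linear_slots = [linear.add(k, "rec%d" % k) for k in (3, 10, 17)]
quadratic_slots = [quadratic.add(k, "rec%d" % k) for k in (3, 10, 17)]
```

Note that the quadratic sequence does not visit every slot, so `add` can fail here before the table is completely full.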

Hash Function Approaches
  Numeric key: just use mod:
    – Hash(key): return key % Size
  Non-numeric key: do a weighted sum of the ASCII codes of the characters:
    – Char[1] + 5*Char[2] + 17*Char[3]
    – Then Sum % Size
  Special care is usually taken to avoid non-uniformity in the distribution of keys.
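
Both approaches can be written in a few lines. The weights 1, 5, 17 come from the slide; reusing them cyclically for keys longer than three characters is an assumption made here, as are the function names.

```python
WEIGHTS = (1, 5, 17)                             # weights from the slide


def hash_numeric(key, size):
    """Numeric key: just use mod."""
    return key % size


def hash_string(key, size):
    """Non-numeric key: weighted sum of character codes, then mod."""
    total = 0
    for i, ch in enumerate(key):
        weight = WEIGHTS[i % len(WEIGHTS)]       # assumption: cycle the weights
        total += weight * ord(ch)
    return total % size


slot_a = hash_numeric(123456789, 300)            # 9-digit key into 300 slots
slot_b = hash_numeric(123457089, 300)            # a key exactly 300 larger
slot_c = hash_string("ABC", 300)                 # (65 + 5*66 + 17*67) % 300
```

Here `slot_a` and `slot_b` are equal, since any two numeric keys that differ by a multiple of the table size collide under mod.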

Design of a Hash Table
  Choose a size that leaves room for growth and turnover (employees leaving?)
  Add, Remove, and Find all use the same
    – Hash function
    – Collision handling
  So:
    – Write a Hash function
    – Choose & implement a collision handling method

A Few Final Issues
  If you run out of slots, you might need to rebuild the whole table with a bigger size.
  The size is often chosen as a prime number so that cyclicity in the distribution of keys has the least effect.
  New approaches to collision handling are continually being studied.
  Hashing to pointers to linked lists can be very effective if the Hash function is good.
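
A rough sketch of hashing to per-slot lists, with a rebuild to a larger prime size, might look like this. Python lists stand in for linked lists, keys are assumed to be integers, and the rebuild trigger (more records than slots) and the names `ChainedTable` and `next_prime` are assumptions for the example.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


class ChainedTable:
    def __init__(self, size=7):
        self._size = size
        self._count = 0
        self._buckets = [[] for _ in range(size)]    # one chain per slot

    def add(self, key, record):
        bucket = self._buckets[key % self._size]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)            # same key: replace the record
                return
        bucket.append((key, record))
        self._count += 1
        if self._count > self._size:                 # assumed rebuild trigger
            self._rebuild(next_prime(2 * self._size))

    def find(self, key):
        for k, rec in self._buckets[key % self._size]:
            if k == key:
                return rec
        return None

    def remove(self, key):
        bucket = self._buckets[key % self._size]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._count -= 1
                return True
        return False

    def _rebuild(self, new_size):
        """Re-insert every record into a bigger table."""
        items = [item for bucket in self._buckets for item in bucket]
        self._size = new_size
        self._buckets = [[] for _ in range(new_size)]
        for k, rec in items:
            self._buckets[k % new_size].append((k, rec))
```

Every record has to be re-hashed during a rebuild because its slot depends on the table size.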
