1 Microsoft Imagine Cup
2 CompSci 105 SS 2005 Principles of Computer Science Lecture 24: Tables and Hashing
3 Tables What is a table??
4 Table ADT createTable() isEmpty() tableLength() tableInsert(item) tableDelete(searchKey) tableRetrieve(searchKey) tableTraverse()
5 Search Key It is important that the search key remain the same as long as the item is stored in the table.
    public abstract class KeyedItem {
        private Comparable searchKey;

        public KeyedItem(Comparable key) {
            searchKey = key;
        } // end constructor

        public Comparable getKey() {
            return searchKey;
        } // end getKey
    } // end KeyedItem
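One way to keep the search key fixed is to give the item no way to change it. The following is a minimal sketch, not the textbook's code: KeyedItemSketch stands in for KeyedItem, and StudentSketch is a hypothetical record type whose ID is the key.

```java
// Stand-in for the KeyedItem class above, repeated so the sketch compiles on its own.
abstract class KeyedItemSketch {
    private final Comparable<?> searchKey; // final: fixed once the item is built

    KeyedItemSketch(Comparable<?> key) {
        searchKey = key;
    }

    Comparable<?> getKey() {
        return searchKey;
    }
}

// Hypothetical record type: the ID is the search key, the surname is ordinary data.
class StudentSketch extends KeyedItemSketch {
    private String surname;

    StudentSketch(int id, String surname) {
        super(id); // the int is boxed to an Integer, which is Comparable
        this.surname = surname;
    }

    String getSurname() {
        return surname;
    }

    // Other fields may change freely; there is deliberately no setter for the key.
    void setSurname(String surname) {
        this.surname = surname;
    }
}
```

Because the key has no setter, an item cannot end up filed under a key it no longer has while it sits in the table.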
6 Implementation?? Implementations for the ADT Table
– Linear approaches
    Unsorted, array based
    Unsorted, reference based
    Sorted (by search key), array based
    Sorted (by search key), reference based
– Non-linear approach
    Binary Search Tree
The requirements of a particular application influence the selection of an implementation
– Which operations are used, and how often
7 Which to use??
8 [Figure: a program that uses a table, the ADT Table, and two implementations of it (Unsorted Array, Binary Search Tree)] Textbook, p
9 Databases Relational databases are simply a set of tables filled with data Use a variety of methods to store/retrieve that data
10 Hash Tables
11 Wouldn’t it be nice? Table ADT: key can be used as an array index
ID  Surname      First Name
1   Eksepshen    Catchda
3   Baseeks      Beegoh
0   Headnode     Dummy
2   Gettingsoon  Ayplus
12 Wouldn’t it be nice? Table ADT: key can be used as an array index
ID  Surname      First Name
1   Eksepshen    Catchda
3   Baseeks      Beegoh
0   Headnode     Dummy
2   Gettingsoon  Ayplus
Array, indexed by ID:
0   Headnode, Dummy
1   Eksepshen, Catchda
2
3   Baseeks, Beegoh
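Using the key directly as an array index can be sketched as follows. This is an illustration only: the class name DirectTableSketch is made up, and each record is stored as a plain "Surname, First Name" string to keep the example short.

```java
class DirectTableSketch {
    private final String[] slots; // slots[id] holds the record whose key is id

    DirectTableSketch(int size) {
        slots = new String[size];
    }

    // No searching at all: the key itself says where the item goes.
    void insert(int id, String name) {
        slots[id] = name;
    }

    // Returns null when nothing is stored under this ID.
    String retrieve(int id) {
        return slots[id];
    }

    public static void main(String[] args) {
        DirectTableSketch table = new DirectTableSketch(4);
        table.insert(1, "Eksepshen, Catchda");
        table.insert(3, "Baseeks, Beegoh");
        table.insert(0, "Headnode, Dummy");
        table.insert(2, "Gettingsoon, Ayplus");
        System.out.println(table.retrieve(3));
    }
}
```

Insertion and retrieval each take a single array access, but the array must be as large as the biggest possible ID, and an ID outside that range throws an ArrayIndexOutOfBoundsException.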
13 The Problem
ID  Surname      First Name
    Eksepshen    Catchda
    Baseeks      Beegoh
    Headnode     Dummy
    Gettingsoon  Ayplus
ID range is far greater than can or should be stored...
14 The Problem In the general case we have a large range of possible keys/values, but are only storing a small number of items. How do we distribute the items in a smaller space?
15 Naive Solution If we have N possible search key values and M locations, simply divide the N values into M lots: e.g. N=1-1000, M=
16 What about collisions? If we want to store two items with search keys 150 and 160, they will collide at the same array location
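The naive solution and its collision can be sketched in a few lines. The value of M is cut off on the slide, so M = 10 (lots of 100 keys each) is purely an assumption made for this illustration, as are the class and method names.

```java
class NaiveLotsSketch {
    // Maps a key in 1..n to one of m equal-sized lots, numbered 0..m-1.
    static int lotIndex(int key, int n, int m) {
        int lotSize = (n + m - 1) / m; // ceiling of n / m
        return (key - 1) / lotSize;
    }

    public static void main(String[] args) {
        int n = 1000; // keys range over 1..1000, as on the slide
        int m = 10;   // ASSUMED number of locations; the slide's value is missing
        System.out.println("150 -> " + lotIndex(150, n, m));
        System.out.println("160 -> " + lotIndex(160, n, m));
    }
}
```

Under that assumption keys 150 and 160 fall into the same lot, which is exactly the collision described above: neighbouring keys always share a location.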
17 Hash Functions ? Hash Table Hash Function
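18 A Hash Function [Figure: each record (ID, Surname: Eksepshen, Baseeks, Headnode, Gettingsoon) is passed through the hash function ID % 10 to find its place in the hash table]
19 Collision [Figure: hash function key % 10 applied to the same records; Headnode and Baseeks each get their own slot in the hash table, while Eksepshen and Gettingsoon hash to the same slot]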
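The key % 10 hash function and a collision can be reproduced with a short sketch. The ID values did not survive on the slides, so the keys below are hypothetical, and Math.floorMod is used instead of the bare % operator so that a negative key would still land inside the table.

```java
class ModHashSketch {
    static final int TABLE_SIZE = 10;

    // key % 10, written with floorMod so the result is always in 0..9.
    static int hash(int key) {
        return Math.floorMod(key, TABLE_SIZE);
    }

    public static void main(String[] args) {
        int[] keys = {150, 160, 273}; // hypothetical keys, not the slide's IDs
        for (int key : keys) {
            System.out.println(key + " -> slot " + hash(key));
        }
    }
}
```

Any two keys that share a last digit, such as 150 and 160 here, are sent to the same slot.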
20 Hash Function Tricks ?
21 Requirements of Hash Functions
– Don’t produce values outside of the array
– Distribute items as evenly as possible
– Use all available space in the array to minimise collisions
22 Selecting Digits Digits 3 and 5 Hash Function How big do we need?
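Digit selection might look like the sketch below. The slide's example number is missing, so the key is hypothetical, and counting digit positions from the left starting at 1 is an assumption of this sketch.

```java
class DigitSelectSketch {
    // Builds a hash value from the decimal digits at the given 1-based
    // positions, counted from the left. Assumes a non-negative key that
    // has at least as many digits as the largest position.
    static int selectDigits(long key, int... positions) {
        String digits = Long.toString(key);
        int result = 0;
        for (int p : positions) {
            result = result * 10 + (digits.charAt(p - 1) - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        long key = 123456789L; // hypothetical key
        System.out.println(selectDigits(key, 3, 5));
    }
}
```

Selecting two digits can produce any value from 0 to 99, so the table needs 100 locations.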
23 Folding Digits Sum of all digits Hash Function How big do we need? = 24
24 Folding Digits Group and add digits Hash Function How big do we need? = 735
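Both folding variants can be sketched as follows. The keys behind the slide results are missing, so a hypothetical key is used, and forming groups from the right-hand end of the number is an assumption of this sketch.

```java
class FoldingSketch {
    // Folding, variant 1: add up every decimal digit of a non-negative key.
    static int sumDigits(long key) {
        int sum = 0;
        for (long k = key; k > 0; k /= 10) {
            sum += (int) (k % 10);
        }
        return sum;
    }

    // Folding, variant 2: split the key into groups of groupSize digits
    // (starting from the right) and add the groups together.
    static int sumGroups(long key, int groupSize) {
        long base = 1;
        for (int i = 0; i < groupSize; i++) {
            base *= 10;
        }
        int sum = 0;
        for (long k = key; k > 0; k /= base) {
            sum += (int) (k % base);
        }
        return sum;
    }

    public static void main(String[] args) {
        long key = 123456789L; // hypothetical key
        System.out.println(sumDigits(key));    // 1+2+...+9
        System.out.println(sumGroups(key, 3)); // 123 + 456 + 789
    }
}
```

A key with d digits gives a digit sum of at most 9d, so summing single digits uses only a small table; adding larger groups spreads the values over a wider range.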
25 Handling Characters Sum of Unicodes Hash Function “Catchda” Fold these as well?
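Summing the Unicode values of a string is a short loop. A minimal sketch, assuming the sum is taken over Java char values (UTF-16 code units, which equal the Unicode code points for ordinary letters); the class name is illustrative.

```java
class CharSumSketch {
    // Adds up the numeric value of every character in the string.
    static int sumChars(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // C=67, a=97, t=116, c=99, h=104, d=100, a=97
        System.out.println(sumChars("Catchda")); // prints 680
    }
}
```

Since addition ignores order, any rearrangement of the same letters gives the same sum, which is one source of collisions with this approach.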
26 Are these any good?? Do they distribute values evenly? No mention of array size?
27 Modulo Arithmetic % tablesize Hash Function
29 Hash Functions Can combine multiple hash functions into one, e.g. combine folding with modulus
30 Solutions to Collision?? Every hash function can produce collisions. There are many solutions....
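Combining folding with modulus could look like the sketch below. The key and the table size of 101 are hypothetical values chosen for the illustration; the fold adds groups of three digits taken from the right.

```java
class FoldModSketch {
    // Step 1: fold the key by adding its groups of three decimal digits.
    static int fold(long key) {
        int sum = 0;
        for (long k = key; k > 0; k /= 1000) {
            sum += (int) (k % 1000);
        }
        return sum;
    }

    // Step 2: reduce the folded value so it fits the table.
    static int hash(long key, int tableSize) {
        return fold(key) % tableSize;
    }

    public static void main(String[] args) {
        long key = 123456789L; // hypothetical key
        int tableSize = 101;   // hypothetical table size
        System.out.println(hash(key, tableSize));
    }
}
```

The fold mixes all the digits of the key together, and the final modulus guarantees an index between 0 and tableSize - 1 for any non-negative key.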
31 Separate Chaining [Figure: hash function key % 10; each hash table slot holds a chain of items: Gettingsoon and Eksepshen share one chain, Headnode and Baseeks each have their own]
32 Separate Chaining Could use ANY of the data structures so far. Search time is reduced, but extra data structures are required. Can’t we just use the array??
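Separate chaining can be sketched with one linked list per array slot. This is an illustration under assumptions: the class and method names are invented, keys are plain ints hashed with key % table size, and duplicate keys are not checked for.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTableSketch {
    private static final class Entry {
        final int key;
        final String value;

        Entry(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // One chain per slot; colliding items simply share a chain.
    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    ChainedTableSketch(int size) {
        for (int i = 0; i < size; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    private int hash(int key) {
        return Math.floorMod(key, buckets.size());
    }

    void insert(int key, String value) {
        buckets.get(hash(key)).add(new Entry(key, value));
    }

    // Only the one chain the key hashes to is searched.
    String retrieve(int key) {
        for (Entry e : buckets.get(hash(key))) {
            if (e.key == key) {
                return e.value;
            }
        }
        return null;
    }

    int chainLength(int slot) {
        return buckets.get(slot).size();
    }
}
```

A linked list is used here only because it is the simplest choice; any of the earlier table implementations could serve as the chain.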
33 Linear Probing [Figure: hash function key % 10 applied to Eksepshen, Baseeks, Headnode, Gettingsoon. Hash table: 0 Headnode, 1 Eksepshen, Baseeks, 8, 9. Clustering]
34 Linear Probing [Figure: hash function key % 10 applied to Eksepshen, Baseeks, Headnode, Gettingsoon, Elizabeth II. Hash table: 0 Headnode, 1 Eksepshen, 2 Gettingsoon, Baseeks, 8, 9. Clustering]
35 Linear Probing [Figure: hash function key % 10 applied to Eksepshen, Baseeks, Headnode, Gettingsoon, Elizabeth II. Hash table: 0 Headnode, 1 Eksepshen, 2 Gettingsoon, 3 Elizabeth II, Baseeks, 8, 9. Clustering]
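Insertion with linear probing can be sketched as below. The keys are hypothetical (the slide's IDs are missing), the names are invented, and insert returns the slot it used so that the clustering is visible.

```java
class ProbingInsertSketch {
    private final Integer[] keys;   // null marks an empty slot
    private final String[] values;

    ProbingInsertSketch(int size) {
        keys = new Integer[size];
        values = new String[size];
    }

    // Starts at the home slot (key % size) and steps forward one slot at a
    // time, wrapping around, until an empty slot is found.
    // Returns the slot used, or -1 if the table is full.
    int insert(int key, String value) {
        int home = Math.floorMod(key, keys.length);
        for (int step = 0; step < keys.length; step++) {
            int i = (home + step) % keys.length;
            if (keys[i] == null) {
                keys[i] = key;
                values[i] = value;
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ProbingInsertSketch table = new ProbingInsertSketch(10);
        int[] hypotheticalKeys = {20, 31, 40, 50};
        for (int key : hypotheticalKeys) {
            System.out.println(key + " stored in slot " + table.insert(key, "item " + key));
        }
    }
}
```

With these keys, 40 and 50 both hash to slot 0 but end up in slots 2 and 3, behind the items already there. Each insertion lengthens the run of occupied slots, which is the clustering shown on the slides.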
36 Finding a Node
37 Finding a Node [Figure: hash function key % 10. Hash table: 0 Headnode, 1 Eksepshen, 2 Gettingsoon, 3 Elizabeth II, Baseeks, 8, 9]
Problem: Find item with key
Solution: Search as if we were ADDING the item, checking each place we come across. Stop if found or if we reach null.
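The search rule can be sketched as a single method that walks the same probe sequence an insertion would. The table is assumed to be stored as two parallel arrays with null marking an empty slot; that layout and the names are choices made for this sketch.

```java
class ProbingFindSketch {
    // Follows the probe sequence from the key's home slot.
    // Stops when the key is found or when an empty (null) slot is reached.
    static String find(Integer[] keys, String[] values, int key) {
        int home = Math.floorMod(key, keys.length);
        for (int step = 0; step < keys.length; step++) {
            int i = (home + step) % keys.length;
            if (keys[i] == null) {
                return null; // an insertion would have stopped here, so the key is absent
            }
            if (keys[i] == key) {
                return values[i];
            }
        }
        return null; // every slot was checked without a match
    }

    public static void main(String[] args) {
        Integer[] keys = new Integer[10];
        String[] values = new String[10];
        // Hypothetical contents, laid out as linear probing would place them.
        keys[0] = 20; values[0] = "a";
        keys[1] = 31; values[1] = "b";
        keys[2] = 40; values[2] = "c";
        System.out.println(find(keys, values, 40));
    }
}
```

Stopping at the first null is only safe while slots are never emptied again; deleting an item by simply clearing its slot would cut off everything stored beyond it in the same cluster.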
38 Efficiency?? What is the efficiency of these operations?? What is it dependent upon?? When is it best/worst??
39 Efficiency