Presentation is loading. Please wait.

Presentation is loading. Please wait.

Hash Functions Sections 5.1 and 5.2

Similar presentations


Presentation on theme: "Hash Functions Sections 5.1 and 5.2"— Presentation transcript:

1 Hash Functions Sections 5.1 and 5.2
Hashing - 1 Hash Functions Sections 5.1 and 5.2

2 Introduction Data items stored in an array of some fixed size.
Search performed using some part of the data item – called the key. Hash function Maps keys to integers (which represent table indices) Hash(Key) = Integer Evenly distributed index values Even if the input data is not evenly distributed A hash function maps keys to small integers (buckets). An ideal hash function maps the keys to the integers in a random-like manner, so that bucket values are evenly distributed even if there are regularities in the input data. This process can be divided into two steps: Map the key to an integer. Map the integer to a bucket. Taking things that really aren't like integers (e.g. complex record structures) and mapping them to integers is icky. We won't discuss this. Instead, we will assume that our keys are either integers, things that can be treated as integers (e.g. characters, pointers) or 1D sequences of such things (lists of integers, strings of characters).

3 Simple Hash Functions Assumptions: Goal: K: an unsigned 32-bit integer
M: the number of buckets (the number of entries in a hash table) Goal: If a bit is changed in K, all bits are equally likely to change for Hash(K)

4 A Simple Hash Function…
What if Hash(K) == K What is wrong? Values of K may not be evenly distributed But Hash(K) needs to be evenly distributed

5 Another Simple Function
If Hash(K) == K % M What is wrong? Suppose M = 10, K = 2, 20, 34, 42,76 Then K % M = 2, 0, 4, 2, 6,... Since 10 is even, all even K are hashed to even numbers ...

6 Yet Another Simple Function
If Hash(K) = K % P, with P a prime number Suppose P = 11 K = 10, 20, 30, 40 K % P = 10, 9, 8, 7 More uniform distribution…

7 Hashing a Sequence of Keys
K = {K1, K2, …, Kn) E.g., Hash(“test”) = 98157 Design Principles Use the entire key Use the ordering information The hash functions in this section take a sequence of integers k=k1,...,kn and produce a small integer bucket value h(k). m is the size of the hash table (number of buckets), which should be a prime number. The sequence of integers might be a list of integers or it might be an array of characters (a string). The specific tuning of the following algorithms assumes that the integers are all, in fact, character codes. In C++, a character is a char variable which is an 8-bit integer. ASCII uses only 7 of these 8 bits. Of those 7, the common characters (alphabetic and number) use only the low-order 6 bits. And the first of those 6 bits primarily indicates the case of characters, which is relatively insignificant. So the following algorithms concentrate on preserving as much information as possible from the last 5 bits of each number, and make less use of the first 3 bits. When using the following algorithms, the inputs ki must be unsigned integers. Feeding them signed integers may result in odd behavior.

8 Use the Entire Key Problem: Hash(“ab”)==Hash(“ba”)
unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ Key[j] //bitwise xor } return hash; Problem: Hash(“ab”)==Hash(“ba”)

9 Use the Ordering Information
unsigned int Hash(const char *Key) { unsigned int hash = 0; for (unsigned int j = 0; j < K; j++) { hash = hash ^ Key[j]; hash = hash ^ (j%32); } return hash;

10 Better Hash Function unsigned int Hash(const String& S) {
unsigned int i; long unsigned int bigval = S.Element(0); // S[0] for (i = 1; i < S.Size(); ++i) bigval = ((bigval & 65535) * 18000) // low16 * magic_number + (bigval >> 16) // high16 + S[i]; bigval = ((bigval & 65535) * 18000) + (bigval >> 16); // bigval = low16 * magic_number + high16 return bigval & 65535; // return low16 }


Download ppt "Hash Functions Sections 5.1 and 5.2"

Similar presentations


Ads by Google