Hashing – Part I CS 367 – Introduction to Data Structures.

1 Hashing – Part I CS 367 – Introduction to Data Structures

2 Searching Up to now the only way to find a key is to search through all or part of the data –linked list: O(n) –AVL tree: O(log n) –binary search of array: O(log n) If lots of data and/or searching the data very often, these times can be long –given the key, would like to get the data directly

3 Hashing The solution to this problem is to put the key through a function that says exactly where the data is (or where it should be placed) –this function is called a hash function h(key) = integer –the integer obtained from a hash function can be used as an index into an array if the hash function is perfect – always generates a unique integer for different keys – the time to place and access data is O(1)

4 Hashing [Figure: the keys A, M, and X are passed through a hashing function and placed into an array with indices 0 through 11.]

5 Hashing Functions So what is the hashing function? –the simplest hashing function is to use the division remainder assume the array is 1000 elements in size translate the data into a number, n h(n) = n % 1000

6 Hashing Functions simple example –consider a small school –each student is tracked by a 4 digit ID number –each student’s ID# begins with a digit for the year they started: 2000 -> 0, 2001 -> 1, 2002 -> 2, etc. –all student records are stored in an array maximum of 1000 students per year –let’s look at records for all sophomores assume they were freshmen in 2001

7 Hashing Functions [Figure: an array with indices 0 through 11 holding Mary’s, Pete’s, John’s, and Amy’s records.] Mary’s ID #: 1000 Pete’s ID #: 1004 John’s ID #: 1009 Amy’s ID #: 1011 To find John’s record in the array: 1009 % 1000 = 9 Go to index number 9.
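The lookup on this slide can be sketched in a few lines of Java. This is only an illustration of the division-remainder idea; the class and method names are made up for the example and do not come from the slides.

```java
// Sketch of the division-remainder hash from the student ID example.
// Class and method names are illustrative assumptions.
class DivisionHashDemo {
    // Maps a non-negative key to an index in the range [0, tableSize).
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        int[] ids = {1000, 1004, 1009, 1011}; // Mary, Pete, John, Amy
        for (int id : ids) {
            System.out.println(id + " -> index " + hash(id, 1000));
        }
    }
}
```

With a table size of 1000, John's ID 1009 lands at index 9, matching the slide.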

8 Generating n The previous example is rather simplistic in that it is hashing already unique integers –seems kind of pointless –maybe not if the integers are large consider the UW’s 10 digit ID numbers Often it is desirable to hash some other kind of data –a person’s name for example

9 Generating n How is a string converted into an integer? –the simplest method is to add all of the ASCII values for each character together –example convert amy into an integer –a = 97; m = 109; y = 121 –a + m + y = 327 –there are lots of other ways to convert strings to integers what are a few of them?
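A minimal sketch of the add-the-ASCII-values conversion described above. The name stringToNum is borrowed from the later mid-square slide; the body here is an assumption about what such a helper might do, not the course's actual code.

```java
// Sketch: convert a string to an integer by summing its character codes.
// The implementation is an assumption; the slides only describe the idea.
class AsciiSumDemo {
    static int stringToNum(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i); // 'a' = 97, 'm' = 109, 'y' = 121
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("amy -> " + stringToNum("amy"));
        System.out.println("may -> " + stringToNum("may"));
    }
}
```

Because addition ignores order, anagrams such as "amy" and "may" produce the same number, which is one reason to consider the other conversions the slide asks about.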

10 Hashing Functions There are millions of possible hashing functions –we will not be considering them all –basically, anything you can think of to generate an integer could be used as a hashing function Mathematicians have spent lots of time and effort to come up with some basic methods that work pretty well

11 Division We have already seen the division method –it involves taking the remainder of division h(key) = key % tableSize A few notes about making this work better –table size should be a prime number –usually a good method if very little is known about the keys –the remaining methods will all use division as the final step in their calculation
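One commonly cited reason for preferring a prime table size is that keys sharing a factor with the table size pile into a few slots. The sketch below illustrates this with made-up keys (multiples of 10) and two small table sizes chosen only for the demonstration.

```java
// Sketch: count how many distinct slots a set of keys occupies.
// The keys and table sizes are illustrative assumptions.
class PrimeTableSizeDemo {
    // Counts the distinct indices produced by key % tableSize
    // for non-negative keys.
    static int distinctSlots(int[] keys, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int count = 0;
        for (int key : keys) {
            int index = key % tableSize;
            if (!used[index]) {
                used[index] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        System.out.println("tableSize 10: " + distinctSlots(keys, 10) + " slot(s) used");
        System.out.println("tableSize 11: " + distinctSlots(keys, 11) + " slot(s) used");
    }
}
```

With a table size of 10 every key lands in slot 0, while the prime size 11 spreads the same ten keys over ten different slots.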

12 Folding Separate the key into various equally sized parts and then recombine them –usually with addition Two kinds of folding –shift folding just add the various parts together as they are –boundary folding reverse the order of every other part and add them together

13 Folding Consider an SSN as a key –break it into 3 parts first 3, second 3, last 3 Shift folding example –SSN = 123-45-6789 –first = 123; second = 456; third = 789 –h(key) = (first + second + third) % size h(SSN) = 1368 % tableSize Boundary folding example –h(key) = (first + R(second) + third) % size, where R reverses the digits of a part –h(key) = (123 + 654 + 789) % size = 1566 % size
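The two folding variants can be sketched as follows. The method names and the table size of 1009 used in main are assumptions for illustration; the slide leaves the table size unspecified.

```java
// Sketch of shift folding and boundary folding on three 3-digit parts.
// Names and the example table size are illustrative assumptions.
class FoldingDemo {
    // Reverses the decimal digits of a non-negative part, e.g. 456 -> 654.
    static int reverseDigits(int part) {
        int reversed = 0;
        while (part > 0) {
            reversed = reversed * 10 + part % 10;
            part /= 10;
        }
        return reversed;
    }

    // Shift folding: add the parts as they are.
    static int shiftFoldSum(int first, int second, int third) {
        return first + second + third;
    }

    // Boundary folding: reverse every other part (here the second) before adding.
    static int boundaryFoldSum(int first, int second, int third) {
        return first + reverseDigits(second) + third;
    }

    public static void main(String[] args) {
        int tableSize = 1009; // assumed prime table size
        System.out.println("shift sum:    " + shiftFoldSum(123, 456, 789));
        System.out.println("boundary sum: " + boundaryFoldSum(123, 456, 789));
        System.out.println("shift index:  " + shiftFoldSum(123, 456, 789) % tableSize);
    }
}
```

The sums match the slide's 1368 and 123 + 654 + 789 = 1566 before the final division step.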

14 Increasing Performance Consider using shifting and exclusive OR’ing to generate the key –exclusive OR parts together to generate index Example –consider the string abcdefgh –if each part is a letter, just exclusive OR them ‘a’ ^ ‘b’ ^ ‘c’ ^ ‘d’ ^ ‘e’ ^ ‘f’ ^ ‘g’ ^ ‘h’ –often, a character is represented by 8 bits what’s the problem with this? –might be better to exclusive OR chunks of the string “abcd” ^ “efgh” why were four characters chosen in this case?
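One way to explore the slide's first question is to XOR the characters one at a time and look at the range of the result. This is a small experiment rather than an official answer, and the class name is made up.

```java
// Sketch: XOR every character of a key into a single value.
// For 8-bit characters the result can never need more than 8 bits.
class XorCharsDemo {
    static int xorChars(String key) {
        int result = 0;
        for (int i = 0; i < key.length(); i++) {
            result ^= key.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("abcdefgh -> " + xorChars("abcdefgh"));
        System.out.println("hgfedcba -> " + xorChars("hgfedcba"));
    }
}
```

Since XOR never sets a bit that none of its inputs has, 8-bit characters can only reach indices 0 through 255, however large the table is. Packing four 8-bit characters into one chunk fills a 32-bit int, which is a plausible reading of the slide's second question.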

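15 Increasing Performance
int shiftFold(String key, int tableSize) {
    int result = 0;
    byte[] st = key.getBytes();
    // process the key in chunks of up to four bytes (one 32-bit int each)
    for (int i = 0; i < st.length; i += 4) {
        int chunk = 0;
        for (int j = 0; (j < 4) && (j + i < st.length); j++) {
            // shift first, then add the next byte, so no byte is pushed out;
            // the 0xFF mask prevents sign extension of negative bytes
            chunk = (chunk << 8) | (st[j + i] & 0xFF);
        }
        result = result ^ chunk;
    }
    // floorMod keeps the index non-negative even when result is negative
    return Math.floorMod(result, tableSize);
}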

16 Increasing Performance The performance could be increased even more if the table size was a power of 2 –can get rid of the modulo operation at the end –modulo is an expensive calculation –could just do a subtraction and an AND operation instead
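The subtraction-and-AND trick can be sketched as below. For a power-of-two table size and a non-negative hash value, the mask gives the same index as the modulo operation. The class and method names are illustrative.

```java
// Sketch: replace hash % tableSize with a mask when tableSize is a power of two.
class PowerOfTwoIndexDemo {
    // tableSize must be a power of two, e.g. 1024.
    static int indexFor(int hash, int tableSize) {
        return hash & (tableSize - 1); // one subtraction and one AND
    }

    public static void main(String[] args) {
        int tableSize = 1024;
        int hash = 9740641;
        System.out.println("mask:   " + indexFor(hash, tableSize));
        System.out.println("modulo: " + (hash % tableSize));
    }
}
```

A side effect in Java is that the mask also yields a non-negative index for negative hash values, whereas % would return a negative remainder.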

17 Mid-Square Function Square the number and take the middle part as the index –a string must first be converted to get the number to square The entire key gets used to generate the address –less chance for conflicts more on this later This method works best if the table size is a power of two

18 Mid-Square Function Table size equals 1024 (2^10) The key is 3121 –3121^2 = 9740641 = (100101001010000101100001)_2 –the middle 10 bits of this 24-bit value (dropping 7 bits from each end) are 0101000010 Index in array is –(0101000010)_2 = 322 This is all very quick and easy to calculate using mask and shift operations

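19 Mid-Square Function
// table size must be a power of two
int tableSize = 1024;
int mask = tableSize - 1;               // ten 1-bits for a table of 1024
int maskBits = logBase2(tableSize);     // number of index bits (10 here)
int shiftBits = 7;                      // low bits to discard; centers the window in a 24-bit square
int midSquare(String key) {
    int n = stringToNum(key);
    int square = n * n;
    // shift the middle bits down to the bottom, then mask off the index
    return (square >>> shiftBits) & mask;
}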
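The slide's code depends on helpers (stringToNum, logBase2) that are not shown. The sketch below is a self-contained variant that takes a numeric key directly. It computes the shift from the bit length of the square instead of hard-coding 7, which is an assumption about how the middle might be located in general.

```java
// Sketch of the mid-square method for a table of 2^indexBits entries.
// Centering the window via the square's bit length is an assumption.
class MidSquareDemo {
    static int midSquare(int key, int indexBits) {
        long square = (long) key * key; // long avoids int overflow
        int bitLength = 64 - Long.numberOfLeadingZeros(square);
        // discard roughly the same number of bits from each end
        int shiftBits = Math.max(0, (bitLength - indexBits) / 2);
        long mask = (1L << indexBits) - 1;
        return (int) ((square >>> shiftBits) & mask);
    }

    public static void main(String[] args) {
        // 3121^2 = 9740641, a 24-bit value, so 7 bits are dropped from each end
        System.out.println("index for 3121: " + midSquare(3121, 10));
    }
}
```

For the key 3121 and a table of 1024 entries this reproduces the index 322 from the previous slide.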

20 Extraction Simply pull out a certain part of the key and use it as the index –example SSN = 123-45-6789 index = middle of key = 456 alternative index = first, middle, last = 159 Should try to choose a part of the key that is most likely unique –consider foreign student SSN –start with 999 probably not a great idea to extract the first three numbers
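The two extraction choices from this slide can be sketched as follows. The method names are made up for the example, and the code assumes a 9-digit SSN written with or without hyphens.

```java
// Sketch of extraction: use selected digits of the key as the index.
// Assumes the SSN contains exactly 9 digits once hyphens are removed.
class ExtractionDemo {
    // Middle three digits, e.g. 123-45-6789 -> 456.
    static int middleThree(String ssn) {
        String digits = ssn.replace("-", "");
        return Integer.parseInt(digits.substring(3, 6));
    }

    // First, middle, and last digit, e.g. 123-45-6789 -> 159.
    static int firstMiddleLast(String ssn) {
        String digits = ssn.replace("-", "");
        String picked = "" + digits.charAt(0)
                + digits.charAt(digits.length() / 2)
                + digits.charAt(digits.length() - 1);
        return Integer.parseInt(picked);
    }

    public static void main(String[] args) {
        System.out.println("middle three:      " + middleThree("123-45-6789"));
        System.out.println("first/middle/last: " + firstMiddleLast("123-45-6789"));
    }
}
```

Either result would still go through a final % tableSize step if the table has fewer than 1000 entries.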
