
Slide 1: CS 332: Algorithms
Skip Lists and Hash Tables
David Luebke, 10/25/2015

Slide 2: Review: Skip Lists
- A relatively recent data structure: "a probabilistic alternative to balanced trees"
- A randomized structure with the benefits of red-black trees:
  - O(lg n) expected search time
  - O(1) time for Min, Max, Successor, Predecessor
- Much easier to code than red-black trees
- Fast!

Slide 3: Review: Skip Lists
- The basic idea: keep a doubly-linked list of elements
  - Min, Max, Successor, Predecessor: O(1) time
  - Delete is O(1) time; Insert is O(1) + search time
- Add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4)
[Figure: three-level skip list over the keys 3, 9, 12, 18, 29, 35, 37]

Slide 4: Review: Skip List Search
- To search for an element with a given key:
  - Find its location in the top list
    - The top list has O(1) elements with high probability
    - The location in this list defines a range of items in the next list
  - Drop down a level and recurse
- O(1) time per level on average
- O(lg n) levels with high probability
- Total time: O(lg n)

Slide 5: Review: Skip List Insert
- Skip list insert: analysis
  - Search for the key
  - Insert the element in the bottom-level list
  - With probability p, recurse to insert it in the next level up
  - Expected number of lists = 1 + p + p^2 + ... = 1/(1-p) = O(1) if p is constant
  - Total time = Search + O(1) = O(lg n) expected
- Skip list delete: O(1)
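The search and insert procedures from the last two slides can be sketched in Python. This is a minimal illustration, not the course's code: the level cap of 16 and p = 1/2 are choices made here for concreteness.

```python
import random

MAX_LEVEL = 16  # cap on list levels, chosen here for illustration
P = 0.5         # promotion probability p

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one pointer per level

def random_level():
    """Promote to the next level with probability P: geometric height."""
    level = 0
    while random.random() < P and level < MAX_LEVEL - 1:
        level += 1
    return level

class SkipList:
    def __init__(self):
        self.level = 0
        self.head = Node(None, MAX_LEVEL - 1)  # sentinel head

    def search(self, key):
        # Start in the top list; scan right, then drop a level and recurse.
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Same descent as search, remembering the rightmost node per level.
        update = [self.head] * MAX_LEVEL
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = random_level()              # coin flips decide the height
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl + 1):         # splice into each list up to lvl
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

Inserting the keys from the Slide 3 figure and searching behaves as the analysis predicts: expected O(lg n) per search, regardless of insertion order.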

Slide 6: Review: Skip Lists
- O(1) expected time for most operations
- O(lg n) expected time for insert
- O(n^2) time worst case
  - But the structure is randomized, so no particular insertion order evokes worst-case behavior
- O(n) expected storage
- Easy to code

Slide 7: Review: Hash Tables
- Motivation: symbol tables
  - A compiler uses a symbol table to relate symbols to associated data
    - Symbols: variable names, procedure names, etc.
    - Associated data: memory location, call graph, etc.
  - For a symbol table (also called a dictionary), we care about search, insertion, and deletion
  - We typically don't care about sorted order

Slide 8: Review: Hash Tables
- More formally, given a table T and a record x with a key (= symbol) and satellite data, we need to support:
  - Insert(T, x)
  - Delete(T, x)
  - Search(T, x)
- We want these to be fast, but we don't care about keeping the records sorted
- The structure we will use is a hash table
  - Supports all of the above in O(1) expected time!

Slide 9: Hashing: Keys
- In the following discussion we will consider all keys to be (possibly large) natural numbers
- How can we convert floats to natural numbers for hashing purposes?
- How can we convert ASCII strings to natural numbers for hashing purposes?

Slide 10: Review: Direct Addressing
- Suppose:
  - The range of keys is 0..m-1
  - Keys are distinct
- The idea: set up an array T[0..m-1] in which
  - T[i] = x if x ∈ T and key[x] = i
  - T[i] = NULL otherwise
- This is called a direct-address table
  - Operations take O(1) time!
  - So what's the problem?
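The direct-address table above is a one-line idea per operation; a minimal Python sketch (an illustration, with records represented as (key, data) pairs):

```python
class DirectAddressTable:
    """Keys are distinct and drawn from 0..m-1, so they index T directly."""

    def __init__(self, m):
        self.slots = [None] * m  # T[i] = record with key i, else None

    def insert(self, record):    # record is a (key, satellite data) pair
        self.slots[record[0]] = record

    def delete(self, record):
        self.slots[record[0]] = None

    def search(self, key):
        return self.slots[key]   # None means "not in table"
```

Each operation is a single array access, O(1), which is exactly why the table size must cover the whole key range.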

Slide 11: The Problem With Direct Addressing
- Direct addressing works well when the range m of keys is relatively small
- But what if the keys are 32-bit integers?
  - Problem 1: the direct-address table will have 2^32 entries, more than 4 billion
  - Problem 2: even if memory is not an issue, the time to initialize all entries to NULL may be
- Solution: map the keys to a smaller range 0..m-1
- This mapping is called a hash function

Slide 12: Hash Functions
- Next problem: collision
[Figure: a hash function h maps the universe U of keys (actual keys K = {k1, ..., k5}) into slots 0..m-1 of table T; h(k2) = h(k5) is a collision]

Slide 13: Resolving Collisions
- How can we solve the problem of collisions?
- Solution 1: chaining
- Solution 2: open addressing

Slide 14: Open Addressing
- Basic idea (details in Section 12.4):
  - To insert: if the slot is full, try another slot, ..., until an open slot is found (probing)
  - To search: follow the same sequence of probes as would be used when inserting the element
    - If we reach an element with the correct key, return it
    - If we reach a NULL pointer, the element is not in the table
- Good for fixed sets (insertion but no deletion)
  - Example: spell checking
- The table needn't be much bigger than n
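The slide leaves the probe sequence to Section 12.4; as a sketch, assume linear probing and the division hash h(k) = k mod m (both choices made here for concreteness, with insertion but no deletion, as the slide notes):

```python
class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key, i):
        # Probe sequence: division hash plus linear probing.
        return (key % self.m + i) % self.m

    def insert(self, key):
        for i in range(self.m):          # try slots until one is open
            j = self._probe(key, i)
            if self.slots[j] is None:
                self.slots[j] = key
                return j
        raise OverflowError("table full")

    def search(self, key):
        for i in range(self.m):          # follow the same probe sequence
            j = self._probe(key, i)
            if self.slots[j] is None:    # hit NULL: key is not in the table
                return None
            if self.slots[j] == key:
                return j
        return None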

Slide 15: Chaining
- Chaining puts elements that hash to the same slot into a linked list
[Figure: keys k1..k8 from U hashed into table T; keys that hash to the same slot are chained in a linked list at that slot, e.g., k1 and k4 share one chain]

Slide 16: Chaining
- How do we insert an element?
[Figure: same chained hash table as in Slide 15]

Slide 17: Chaining
- How do we delete an element?
[Figure: same chained hash table as in Slide 15]

Slide 18: Chaining
- How do we search for an element with a given key?
[Figure: same chained hash table as in Slide 15]
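The insert, delete, and search questions from the last three slides have one-line answers in a sketch like this (an illustration, not the course's code: a Python list per slot stands in for the linked list, and the division hash h(k) = k mod m is assumed):

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def insert(self, key):
        # Insert at the head of the chain: O(1).
        self.slots[key % self.m].insert(0, key)

    def search(self, key):
        # Walk the chain at h(key): O(1 + chain length).
        return key in self.slots[key % self.m]

    def delete(self, key):
        # With a doubly-linked list and a pointer to the element, delete
        # is O(1); this sketch pays a search first for simplicity.
        self.slots[key % self.m].remove(key)
```

With m = 4, the keys 1, 5, and 9 all land in slot 1 and form one chain, which is exactly the situation the load-factor analysis on the next slides quantifies.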

Slide 19: Analysis of Chaining
- Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
- Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
- What will be the average cost of an unsuccessful search for a key?

Slide 20: Analysis of Chaining
- Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
- Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
- What will be the average cost of an unsuccessful search for a key? A: O(1 + α)

Slide 21: Analysis of Chaining
- Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
- Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
- What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
- What will be the average cost of a successful search?

Slide 22: Analysis of Chaining
- Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
- Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
- What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
- What will be the average cost of a successful search? A: O(1 + α/2) = O(1 + α)

Slide 23: Analysis of Chaining, Continued
- So the cost of searching is O(1 + α)
- If the number of keys n is proportional to the number of slots m, what is α?
- A: α = O(1)
  - In other words, we can make the expected cost of searching constant if we make α constant

Slide 24: Choosing a Hash Function
- Clearly, choosing the hash function well is crucial
  - What will a worst-case hash function do?
  - What will the search time be in that case?
- What are desirable features of a hash function?
  - It should distribute keys uniformly into slots
  - It should not depend on patterns in the data

Slide 25: Hash Functions: The Division Method
- h(k) = k mod m
  - In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
- What happens to elements with adjacent values of k?
- What happens if m is a power of 2 (say 2^p)?
- What if m is a power of 10?
- Upshot: pick a table size m that is a prime number not too close to a power of 2 (or 10)
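The "power of 2" question above has a concrete answer: when m = 2^p, h(k) = k mod m keeps only the low p bits of k, so keys that differ only in higher bits all collide. A small demonstration (the specific keys and the prime 13 are chosen here for illustration):

```python
def h_division(k, m):
    """Division-method hash: the remainder of k divided by m."""
    return k % m

# These keys differ only above bit 3, so with m = 16 = 2**4 the hash
# sees only their (identical) low 4 bits:
keys = [16, 32, 48, 64]
slots_power_of_two = {h_division(k, 16) for k in keys}  # all collide
slots_prime = {h_division(k, 13) for k in keys}         # 13 is prime
```

With m = 16 all four keys land in slot 0; with the prime m = 13 they spread across four distinct slots.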

Slide 26: Hash Functions: The Multiplication Method
- For a constant A, 0 < A < 1:
- h(k) = ⌊m (kA - ⌊kA⌋)⌋
  - What does the term kA - ⌊kA⌋ represent?

Slide 27: Hash Functions: The Multiplication Method
- For a constant A, 0 < A < 1:
- h(k) = ⌊m (kA - ⌊kA⌋)⌋
  - The term kA - ⌊kA⌋ is the fractional part of kA
- Choose m = 2^p
- Choose A not too close to 0 or 1
- Knuth: a good choice is A = (√5 - 1)/2
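The formula above translates directly into code. A floating-point sketch (an illustration; CLRS implements the same idea with fixed-point word arithmetic, and its worked example hashes k = 123456 into m = 2^14 slots, giving 67):

```python
import math

# Knuth's suggested constant: A = (sqrt(5) - 1) / 2, the golden ratio
# conjugate.
A = (math.sqrt(5) - 1) / 2

def h_mult(k, m):
    """Multiplication method: h(k) = floor(m * frac(k * A))."""
    frac = (k * A) % 1.0        # fractional part of k*A, i.e. kA - floor(kA)
    return math.floor(m * frac)
```

Because only the fractional part of kA is used, the choice m = 2^p is harmless here, unlike in the division method.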

Slide 28: Hash Functions: Worst-Case Scenario
- Scenario:
  - You are given an assignment to implement hashing
  - You will self-grade in pairs, testing and grading your partner's implementation
  - In a blatant violation of the honor code, your partner:
    - Analyzes your hash function
    - Picks a sequence of "worst-case" keys, causing your implementation to take O(n) time to search
- What's an honest CS student to do?

Slide 29: Hash Functions: Universal Hashing
- As before, when attempting to foil a malicious adversary: randomize the algorithm
- Universal hashing: pick a hash function randomly, in a way that is independent of the keys actually being stored
  - Guarantees good performance on average, no matter what keys the adversary chooses

Slide 30: Universal Hashing
- Let H be a (finite) collection of hash functions that map a given universe U of keys into the range {0, 1, ..., m-1}
- H is said to be universal if:
  - For each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ H for which h(x) = h(y) is |H|/m
  - In other words: with a random hash function from H, the chance of a collision between x and y (x ≠ y) is exactly 1/m

Slide 31: Universal Hashing
- Theorem 12.3:
  - Choose h from a universal family of hash functions
  - Hash n keys into a table of m slots, n ≤ m
  - Then the expected number of collisions involving a particular key x is less than 1
- Proof:
  - For each pair of distinct keys y, z, let c_yz = 1 if y and z collide, 0 otherwise
  - E[c_yz] = 1/m (by the definition of universality)
  - Let C_x be the total number of collisions involving key x; by linearity of expectation,
    E[C_x] = sum over keys y ≠ x in the table of E[c_xy] = (n-1)/m
  - Since n ≤ m, we have E[C_x] < 1

Slide 32: A Universal Hash Function
- Choose the table size m to be prime
- Decompose key x into r+1 "bytes", so that x = <x_0, x_1, ..., x_r>
  - The only requirement is that the max value of a byte be < m
  - Let a = <a_0, a_1, ..., a_r> denote a sequence of r+1 elements chosen randomly from {0, 1, ..., m-1}
  - Define the corresponding hash function h_a ∈ H:
    h_a(x) = (a_0 x_0 + a_1 x_1 + ... + a_r x_r) mod m
  - With this definition, H has m^(r+1) members
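A sketch of this family in Python (an illustration: the prime m = 251 is chosen here, and "bytes" are obtained by writing x in base m, which automatically satisfies the max-byte < m requirement):

```python
import random

M = 251  # prime table size, chosen here for illustration

def decompose(x, r):
    """Digits of x in base M, padded to r+1 'bytes', each < M."""
    digits = []
    for _ in range(r + 1):
        digits.append(x % M)
        x //= M
    return digits

def make_hash(r):
    """Pick h_a at random: a_0..a_r drawn uniformly from {0,...,M-1}."""
    a = [random.randrange(M) for _ in range(r + 1)]
    def h_a(x):
        # h_a(x) = (a_0*x_0 + a_1*x_1 + ... + a_r*x_r) mod M
        return sum(ai * xi for ai, xi in zip(a, decompose(x, r))) % M
    return h_a
```

Following Slide 33's recipe: the hash function is chosen randomly once (by picking the a_i) and then used on all keys, so an adversary who does not know the a_i cannot construct a worst-case key sequence.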

Slide 33: A Universal Hash Function
- H is a universal collection of hash functions (Theorem 12.4)
- How to use it:
  - Pick r based on m and the range of keys in U
  - Pick a hash function by (randomly) picking the a_i
  - Use that hash function on all keys

Slide 34: The End

