CS 171: Introduction to Computer Science II Hashing and Priority Queues.


1 CS 171: Introduction to Computer Science II Hashing and Priority Queues

2  Prepare to be aMAZEd!  Find a path through a maze  Open path  Wall path: just use the walls  Hand-in on Gimle  Tuesday, March 31, at 11:59pm

3  A bag is a collection where removing items is not supported (a.k.a. a multiset)  Useful if you don’t need removal  Provides the client the ability to collect items and then iterate through them

4  An abstract data type like a regular queue, but each element is associated with a priority value  The element with highest priority will be removed first  Queue: first in first out  Stack: last in first out  Priority queue: highest priority first out  Priority queue applications  Task scheduling  Search and optimization

5 Implementing Priority Queue Assumes an element with a smaller value has higher priority Implementation using ordered array  Insert – insert an element to the correct position  Remove – delete the front element Implementation using unordered array  Insert?  Remove?
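The ordered-array variant above can be sketched as follows; the class name `OrderedArrayPQ` and its method names are illustrative, not the course's actual code. (For the unordered-array questions on the slide: insert can simply append in O(1), but remove must then scan all N elements for the minimum.)

```java
// Minimal ordered-array priority queue sketch: smaller value = higher priority.
// Class and method names are illustrative, not from the lecture code.
class OrderedArrayPQ {
    private final int[] a;
    private int n = 0;

    OrderedArrayPQ(int capacity) { a = new int[capacity]; }

    boolean isEmpty() { return n == 0; }

    // Insert: shift larger elements right to keep the array sorted ascending. O(N).
    void insert(int x) {
        int i = n;
        while (i > 0 && a[i - 1] > x) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = x;
        n++;
    }

    // Remove: the front element is the smallest, i.e. highest priority.
    // Shifting the remaining elements left keeps the array compact. O(N).
    int remove() {
        int front = a[0];
        for (int i = 1; i < n; i++) a[i - 1] = a[i];
        n--;
        return front;
    }
}
```

A heap-based implementation would bring both operations down to O(log N), which is why heaps are the usual choice in practice.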

6 Priority Queue vs. Ordered Array So a priority queue looks quite similar to an ordered array. What are the differences? – You can only remove elements at the front, one by one; you can’t remove arbitrary elements. – A priority queue only needs the capability of returning the highest-priority element. There are efficient implementations (such as heaps) that do not require all elements to be sorted at all times.

7 Using Priority Queues Example:
PriorityQueue s = new PriorityQueue(10);
s.insert(25);
s.insert(35);
System.out.println(s.remove());  // prints 25 (smallest value = highest priority)
s.insert(45);
s.insert(15);
System.out.println(s.remove());  // prints 15

8  Set membership: Have I seen “x” before?  One of the most common problems in practice  Same interface as BST:  search(“x”) ▪ Either returns the data or an error  insert(“y”) ▪ Inserts to data structure if not there ▪ Up to you what to do with duplicates…  We don’t need to preserve order

9  A BST takes on average O(log N) time per search.  But sometimes we have ample space, much more than N, and really want to answer searches in nearly O(1) time.  It’s okay if insertions take longer  Also, we sometimes have long and crazy keys  E-mail addresses  Full names of Icelanders  URLs  UUIDs (universally unique identifiers)  How do we do that?

10  Suppose we have an array arr of length M  For now, suppose all of our N keys are tiny  In fact, they’re distinct integers between [0,M-1]  What could we do to quickly look up if we have a given key?

11  Aha! We can use the key as an index into arr  When somebody searches for key i  We look in arr[i]  If there is nothing there, we return Not Found error  Otherwise, we return the value in arr[i]  So that’s easy. But what if keys are long?
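The lookup scheme just described can be sketched as a direct-address table; the class and the String value type here are illustrative assumptions.

```java
// Direct-addressing sketch: with distinct integer keys in [0, M-1], the key
// itself is the array index, so search is a single array access.
class DirectAddressTable {
    private final String[] arr;

    DirectAddressTable(int m) { arr = new String[m]; }

    void insert(int key, String value) { arr[key] = value; }

    // Returns the stored value, or null to signal Not Found.
    String search(int key) { return arr[key]; }
}
```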

12  Can we “convert” each key into short, distinct numbers?  Needs to be fast [Figure: the keys “Barbie” and “Turtles” pass through a “Magic!” box to slots 0–4]

13  Can we “convert” each key into short numbers?  Needs to be fast [Figure: “Barbie” and “Turtles” mapped to slots 0–4]

14  A hash function converts any given key into a number in [0, M-1].  It should be consistent  The same key should always return the same number  Moreover, we want it to distribute keys evenly across the possible numbers!  That helps keep them distinct

15  Suppose all the keys were large integers  What would be a good hash function? num % M  Caveat: ideally M should be prime [Figure: the integers 31337 and 3141592 mapped by num % M into slots 0–4]

16  What if keys are general strings?  “Flugeldufel”, “Ausgeschnitzel”, …  We could loop through the string, add the character codes up into an integer, then use the modulo operation
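That sum-then-mod idea can be sketched as below; the class name is illustrative.

```java
class StringHash {
    // Simple string hash as described: sum the character codes, then mod M.
    // Fine as a first sketch, but note that anagrams ("stop", "pots") collide.
    static int hash(String key, int m) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);   // each char promotes to its integer code
        }
        return sum % m;             // table index in [0, m-1]
    }
}
```

For example, `StringHash.hash("ab", 97)` is (97 + 98) % 97 = 1. Java's own `String.hashCode()` instead weights each character by powers of 31, which spreads anagrams apart.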

17  But what if different keys produce the same hash value?

18  23 people in a room  How likely is it that two people share the same birthday?  Answer: 50.7%!
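The arithmetic behind that 50.7% answer: the chance that n people all have distinct birthdays is the product (365/365)(364/365)…, and a shared birthday is the complement of that.

```java
class Birthday {
    // Probability that at least two of n people share a birthday,
    // assuming `days` equally likely birthdays (365 for the classic puzzle).
    static double collisionProbability(int n, int days) {
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            // the (i+1)-th person must avoid the i birthdays already taken
            allDistinct *= (double) (days - i) / days;
        }
        return 1.0 - allDistinct;
    }
}
```

`Birthday.collisionProbability(23, 365)` evaluates to about 0.507, matching the slide.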

19  Birthday paradox:  Can’t avoid collisions unless you have a ridiculous amount of memory  How many collisions do we expect to see?

20  Need to deal with these hash collisions  Several ways to deal with them  Hashing with separate chaining:

21  Put keys that collide into a list associated with an index
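A minimal sketch of separate chaining, assuming a set of String keys; the class name and use of `LinkedList` are illustrative choices, not the course's code.

```java
import java.util.LinkedList;

// Separate-chaining sketch: each table slot holds a linked list ("chain")
// of all the keys that hash to that index.
class ChainedHashSet {
    private final LinkedList<String>[] table;
    private final int m;

    @SuppressWarnings("unchecked")
    ChainedHashSet(int m) {
        this.m = m;
        table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
    }

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % m;  // strip the sign bit, then mod M
    }

    void insert(String key) {
        LinkedList<String> chain = table[hash(key)];
        if (!chain.contains(key)) chain.add(key);  // ignore duplicates
    }

    boolean search(String key) {
        return table[hash(key)].contains(key);     // scan only one chain
    }
}
```

With N keys spread over M chains, a search scans N/M keys on average, which is the load-factor term in the running time.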


24  Hashing with linear probing
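Linear probing can be sketched as follows (the slide's figures are not in the transcript); the class name is illustrative, the table is kept under half full, and deletion is omitted for brevity.

```java
// Linear-probing sketch: on a collision, step to the next slot (wrapping
// around the end of the array) until an empty slot is found.
class LinearProbingSet {
    private final String[] keys;
    private final int m;
    private int n = 0;

    LinearProbingSet(int m) { this.m = m; keys = new String[m]; }

    private int hash(String key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    void insert(String key) {
        if (n >= m / 2) throw new IllegalStateException("resize needed");
        int i = hash(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return;  // already present
            i = (i + 1) % m;                  // probe the next slot
        }
        keys[i] = key;
        n++;
    }

    boolean search(String key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) return true;
        }
        return false;  // hit an empty slot: the key cannot be in the table
    }
}
```

Keeping the table at most half full guarantees empty slots exist, so every probe sequence terminates.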


31  Average case lookup (without resize):  O(1 + N/M)  Worst case lookup:  O(N)  If the hash table is close to filling up, we can resize it and rehash every element  Usually one would double the size, or grow by x%
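Resizing a chained table can be sketched as below; the helper class is illustrative. The key point is that every element must be re-inserted, because each key's slot (hash % M) depends on the table size M.

```java
import java.util.LinkedList;
import java.util.List;

// Resize-and-rehash sketch for a separate-chaining hash table.
class Rehash {
    @SuppressWarnings("unchecked")
    static List<String>[] resize(List<String>[] old) {
        List<String>[] bigger = new List[old.length * 2];  // double the size
        for (int i = 0; i < bigger.length; i++) bigger[i] = new LinkedList<>();
        for (List<String> chain : old) {
            for (String key : chain) {
                // recompute each key's slot under the new table size
                int slot = (key.hashCode() & 0x7fffffff) % bigger.length;
                bigger[slot].add(key);
            }
        }
        return bigger;
    }
}
```

Doubling keeps the amortized cost of insertion constant: each rehash touches all N keys, but happens only after N new insertions.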


36  Hashing  In practice, most often you’ll want to use hashing ▪ Best to allocate about 150% of the space you’ll need for the table  Fast search, O(1) average case (unless close to full)  Dead simple to use for standard data types  Binary search trees  Don’t require building efficient, uniform hash functions for your data  No need to worry about fullness of the table or resizing  Guaranteed worst-case performance (“red-black” BSTs)

