Presentation is loading. Please wait.

Presentation is loading. Please wait.

Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.

Similar presentations


Presentation on theme: "Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key."— Presentation transcript:

1 Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key value K, locate the record (k j, I j ) in T such that k j =K. Searching is a systematic method for locating the record(s) with key value k j =K. A successful search is one in which a record with key k j =K is found. An unsuccessful search is one in which no record with k j =K is found (and does not exist).

2 Searching Ordered Arrays Binary Search - been there done that. Dictionary Search - interpolation search –Determine how far from an endpoint your value is probably going to be. –Pos=(value-A[lo])/(A[hi]-A[low]) * (hi-lo) –Look here rather than mid –Assumes the data is evenly distributed.

3 Lists Ordered by Frequency Order lists by (expected) frequency of occurrence. –Perform sequential search Cost for first record : 1 Cost for second record : 2 Search cost= 1p 1 + 2 p 2 + 3p 3 + … + np n Worst case (n+1)/2 Best if a few items are accessed many times

4 Self Organizing Lists 80/20 rule: 80% of the accesses are to 20% of the records –expected search cost =.122n Self organizing lists modify the order of records within the list basedon the actual pattern of record accesses. Self organizing lists use a rule called a heuristic for deciding how to reorder the list.

5 Self Organizing Heuristics Order by actual frequency - most frequently used first When a record is found, swap it with the first item When a record is found, move it to the front of the list When a record is found, swap it with the record ahead of it

6 Hashing The process of mapping a key value to a position in a table. A hash function maps key values to positions. A hash table is an array that holds the records. The hash table has M slots (0:M-1) For any value K in the key range and some hash function h, h(k) = I where 0≤ I<M, and key(T[I])=K

7 Hashing Situations Hashing is appropriate for unique keys. Good for both in-memory and disk based applications. Answers the question “What record, if any, has key value K?” Example: Store the n records with keys in range 0-(n-1). –Store the record with key i in slot i. –Uses the hash function h(k)=k. (Identity function).

8 Collisions More reasonable example –Store about 1000 records with keys in the range 0-16,383. –Impractical to keep a table of size 16,384. –We need a hash function to map keys to a smaller range. Given a hash function h and different keys k 1 and k 2. Let  be a position in the hash table. –If h(k 1 )= h(k 2 )=  then k 1 and k 2 have a collision at  under h.

9 Collision Resolution To search for the record with key K: –Compute the table location h(K). –Starting with slot h(K), locate the record containing key K using (if necessary) a collision resolution policy. Collisions are inevitable in most applications. –Example: In a group of 23 people the odds are good that at least one pair share a birthday.

10 Hash Functions Must return a value within the table range. Should evenly distribute the records to be stored among the table slots. Ideally, the function should distribute records with equal probability to all the positions. In reality, usually depends on the data. If we know nothing about the key distribution, evenly distribute the key range among the positions. If we know about the key distribution, use a distribution dependant hash function.

11 Example Hash Functions h(key)=key % 16 - uses only last 4 bits. H(key)=key % 1000 - uses last 4 digits. Use % tablesize to make sure result is in the range. Mid-square method: square the key and take the middle r bits for a table of size 2 r Sum up ASCII characters and take results modulo tablesize (a folding technique).

12 Collision Handling Categories Open hashing - when there is a collision, put collided item outside the table. Closed hashing - when there is a collision, put collided item inside the table.

13 Open Hashing Look at each table element as the head of a linked list of items that has to that position. Can organize the linked lists in many ways –ordered : unsuccessful searches are quickly found. –Ordered by frequency: if a few are searched for frequently, then this is a good technique. If there are N records to be stored and the table is of size M then the average search length is O(N/M). Good for internal memory. Linked nodes may be in different blocks on disk and cause many disk accesses.

14 Closed Hashing - Linear Probe If the item you are looking for is not in the hash position, look in the next position. Do the same for insert until you find an empty location. When you reach the bottom, go to the beginning. Must have at least one empty slot or there will be an infinite loop. Tends to have clustering since the collision position is not uniformly distributed (i.e. if collide at position 4, go to position 5, then 6, independent of key).

15 Better Linear Probe Instead of going to the next slot, skip by some constant c. The tablesize M and c should be relatively prime. This assures the probing will cycle through all the table. Still has some clustering.

16 Quadratic Probe Instead of adding 1 to the key add i 2 i is the probe sequence, so add 1, 4, 9, 16,... Remember we also mod with table size.

17 Double Hashing After a collision, use a different hash function. Eliminates clustering to some degree. For example if h(k) causes a collision then use –p(k,i)= i*h 2 (k) –h 2 is a different hash function –generates a different probe sequence

18 Analysis of Closed Hashing load factor =lf=N/M –N is the number of records –M is the size of the table –N/M is the percent full –The larger the load factor the greater the probability of a collision Average search length is O(1/(1-lf))

19 Deletions If we delete a value it may stop the search prematurely (break the chain). Use a special mark to indicate something was deleted. When searching continue if see this mark rather than stopping as if it was empty. Once we have many deleted items we may wish to rehash everything remaining –best if we rehash the most frequently accessed items first.


Download ppt "Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key."

Similar presentations


Ads by Google