Presentation is loading. Please wait.

Presentation is loading. Please wait.

Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.

Similar presentations


Presentation on theme: "Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc."— Presentation transcript:

1 Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc. Typical symbol table operations are Insert, Delete and Search It's a dictionary structure!

2 Symbol Tables What kind of information is usually stored in a symbol table? type storage class size scope stack frame offset register We also need a way to keep track of reserved words.

3 Symbol Tables Where is a symbol table stored? array/linked list
simple, but linear lookup time However, we may use a sorted array for reserved words, since they are generally few and known in advance. balanced tree O(lgn) lookup time hash table most common implementation O(1) amortized time for dictionary operations

4 Hashing Hash tables Hash functions
use array of size m to store elements given key k (the identifier name), use a function h to compute index h(k) for that key collisions are possible two keys hash into the same slot. Hash functions A good hash function is easy to compute avoids collisions (by breaking up patterns in the keys and uniformly distributing the hash values)

5 Hashing In the following slides: k is a key h(k) is the hash function
m is the size of the hash table n is the number of keys in the hash table

6 Hashing What makes a good hash function? It is easy to compute
It minimizes collisions. hash = to chop into small pieces (Merriam- Webster) = to chop any patterns in the keys so that the results are uniformly distributed (cs311)

7 Hashing When the key is a string, we generally use the ASCII values of its characters in some way: Examples for k = c1c2c3...cx h(k) = (c1128x-1+c2128x cx1280) mod m h(k) = (c1+c2+...+cx) mod m h(k) = (h1(c1)+h2(c2)+...hx(cx)) mod m, where each hi is an independent hash function.

8 Hash functions Truncation
Ignore part of the key and use the remaining part directly as the index. Example: if the keys are 8-digit numbers and the hash table has 1000 entries, then the first, fourth and eighth digit could make the hash function. Not a very good method : does not distribute keys uniformly

9 Hash functions Folding
Break up the key in parts and combine them in some way. Example : if the keys are 9 digit numbers, break up a key into three 3-digit numbers and add them up.

10 Hash functions Middle square
Compute k*k and pick some digits from the resulting number. Example : given a 9-digit key k, and a hash table of size 1000 pick three digits from the middle of the number k*k. Works fairly well in practice if the keys do not have many leading or trailing zeroes.

11 Hash functions Division h(k)=k mod m Fast
Not all values of m are suitable for this. For example powers of 2 should be avoided because then k mod m is just the least significant digits of k Good values for m are prime numbers .

12 Hash functions Multiplication
h(k)=m (k  c- k  c)  , 0<c<1 In English : Multiply the key k by a constant c, 0<c<1 Take the fractional part of k  c Multiply that by m Take the floor of the result The value of m does not make a difference Some values of c work better than others A good value is

13 Hash functions Multiplication Example: nice distribution!
Suppose the size of the table, m, is 1301. For k=1234, h(k)=850 For k=1235, h(k)=353 For k=1236, h(k)=115 For k=1237, h(k)=660 For k=1238, h(k)=164 For k=1239, h(k)=968 For k=1240, h(k)=471 nice distribution!

14 Hash functions Universal Hashing
Worst-case scenario: The chosen keys all hash to the same slot. This can be avoided if the hash function is not fixed: Start with a collection of hash functions Select one at random and use that. Good performance on average: the probability that the randomly chosen hash function exhibits the worst-case behavior is very low.

15 Load factor Given a hash table of size m, and n elements stored in it, we define the load factor of the table as =n/m The load factor gives us an indication of how full the table is. The possible values of the load factor depend on the method we use for resolving collisions.

16 Resolving collisions: Chaining
Put all the elements that collide in a chain (list) attached to the slot. The hash table is an array of linked lists The load factor indicates the average number of elements stored in a chain. It could be less than, equal to, or larger than 1. * a.k.a. closed addressing

17 Resolving collisions: Chaining
Insert/Delete/Lookup in expected O(1) time Keep the list doubly-linked to facilitate deletions Worst case of lookup time is linear. However, this assumes that the chains are kept small. If the chains start becoming too long, the table must be enlarged and all the keys rehashed.

18 Resolving collisions: Chaining
Assumption: simple uniform hashing any given key is equally likely to hash into any of the m slots Analysis of unsuccessful search: average time to search unsuccessfully for key k = the average time to search to the end of a chain. The average length of a chain is . Total (average) time required : (1+ )

19 Resolving collisions: Chaining
Analysis of successful search: Expected number e of elements examined during a successful search for key k = one more than the expected number of elements examined when k was inserted. it makes no difference whether we insert at the beginning or the end of the list. Take the average, over the n items in the table, of 1 plus the expected length of the chain to which the i th element was added:

20 Resolving collisions: Chaining
Total time : (1+ )

21 Resolving collisions: Chaining
Both types of search take (1+ ) time on average. If n=O(m), then =O(1) and the total time for Search is O(1) on average Insert : O(1) in the worst case Delete : O(1) in the worst case

22 Resolving collisions: Chaining
Storage for the elements may be allocated and deallocated within the hash table itself by linking all unused slots into a free list. Insert: if key k hashes into empty slot h(k), put it there and set a flag to indicate that this is the actual position where the element hashed. if h(k) is not empty, and the element k1 it contains has its flag set, then use a slot off the free list to store k1. Its flag should be unset. if h(k) is not empty, and the element k1 it contains has its flag unset, then move k1 to another slot and store k in h(k).


Download ppt "Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc."

Similar presentations


Ads by Google