A Knowledge Sharing Session on
Unit IV: Tables (DSPS)

Tables | Unit IV of DSPS (SE-Comp)
Syllabus:
• Symbol Tables: static and dynamic tree tables, AVL trees, AVL tree implementation, algorithms and analysis of AVL trees.
• Hash Tables: basic concepts, hash function, hashing methods, collision resolution, bucket hashing, dynamic hashing.

Part I: Symbol Tables
Static and dynamic tree tables, AVL trees, AVL tree implementation, algorithms and analysis of AVL trees.
Part II: Hash Tables
Basic concepts, hash function, hashing methods, collision resolution, bucket hashing, dynamic hashing.

Symbol Table | Why Symbol Table
What does a compiler do?
• Lexical analysis: detects inputs with illegal tokens, e.g. main$ ();
• Parsing: detects inputs with ill-formed parse trees, e.g. missing semicolons
• Semantic analysis: the last "front end" phase; catches all remaining errors

Symbol Table | Why Symbol Table
Typical semantic errors:
• Multiple declarations: a variable should be declared (in the same region) at most once.
• Undeclared variable: a variable should not be used before being declared.
• Type mismatch: the type of the left-hand side of an assignment should match the type of the right-hand side.
• Wrong arguments: methods should be called with the right number and types of arguments.

Symbol Table | Aim of Symbol Table
Purpose of the symbol table: keep track of the names declared in the program, i.e. the names of variables, classes, fields, methods, etc.

Symbol Table | Symbol Table Stores
The symbol table associates a name with a set of attributes, e.g.:
• kind of name (variable, class, field, method, etc.)
• type (int, float, etc.)
• nesting level
• memory location (i.e., where it will be found at runtime)

Symbol Table | Symbol Table Revisit
In short:
• During lexical analysis: symbols are found and added to the symbol table.
• During syntactic analysis: information about each symbol is filled in.
• During semantic analysis: the table is used for type checking.

Symbol Table | Why Is the Symbol Table Important?
Questions the symbol table answers:
• Given an identifier, which name is it?
• What information is to be associated with a name?
  -- the actual characters of the name
  -- type
  -- storage allocation info (number of bytes)
  -- line number where declared
  -- lines where referenced
  -- scope
• How do we access this information?
• How do we associate this information with a name?

Symbol Table | Reminder on Symbol Table
Note: a name can represent a variable, type, constant, parameter, record, record field, procedure, array, label, or file.

Symbol Table | Operations on Symbol Table
A symbol table must support:
• determining whether a string has already been stored
• inserting an entry for a string
• deleting a string when it goes out of scope
This requires three functions:
1. lookup(s): returns the index of the entry for string s, or 0 if there is no entry
2. insert(s): adds a new entry for string s and returns its index
3. delete(s): deletes s from the table (or, typically, hides it)
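The three operations can be sketched over a plain unsorted array. This is only an illustration, not the deck's implementation: the names st_lookup, st_insert, st_delete, the capacity ST_MAX and the hidden flag are all assumptions. Slot 0 is left unused so that 0 can mean "no entry", and delete hides the entry instead of removing it.

```c
#include <string.h>

#define ST_MAX 64        /* hypothetical capacity */
#define ST_NAME_LEN 32   /* hypothetical maximum name length, incl. '\0' */

struct st_entry {
    char name[ST_NAME_LEN];
    int  hidden;         /* 1 once the name has gone out of scope */
};

/* Slot 0 is unused so that index 0 can mean "no entry". */
struct st_entry st_table[ST_MAX + 1];
int st_count = 0;

/* Returns the index of the newest visible entry for s, or 0 if none.
   Scanning newest-first lets an inner declaration shadow an outer one. */
int st_lookup(const char *s)
{
    for (int i = st_count; i >= 1; i--)
        if (!st_table[i].hidden && strcmp(st_table[i].name, s) == 0)
            return i;
    return 0;
}

/* Adds a new entry for s and returns its index,
   or 0 if the table is full or the name is too long. */
int st_insert(const char *s)
{
    if (st_count == ST_MAX || strlen(s) >= ST_NAME_LEN)
        return 0;
    st_count++;
    strcpy(st_table[st_count].name, s);
    st_table[st_count].hidden = 0;
    return st_count;
}

/* Hides the newest visible entry for s; returns its index, or 0 if none. */
int st_delete(const char *s)
{
    int i = st_lookup(s);
    if (i != 0)
        st_table[i].hidden = 1;
    return i;
}
```

Hiding rather than erasing keeps the indices of all other entries stable, which matters if other compiler phases have already stored them.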

Symbol Table | Symbol Table Examples
01 PROGRAM Main
02 GLOBAL a,b
03 PROCEDURE P (PARAMETER x)
04 LOCAL a
05 BEGIN {P}
06 …a…
07 …b…
08 …x…
09 END {P}
10 BEGIN {Main}
11 Call P(a)
12 END {Main}

Symbol Table | Unsorted List
Entries for the example program are stored in order of declaration.
Look-up complexity: O(n), because the list has to be scanned sequentially.

                              Other Attributes
Name  Characteristic Class  Scope  Declared  Referenced  Other
Main  Program               0      Line 1
a     Variable              0      Line 2    Line 11
b     Variable              0      Line 2    Line 7
P     Procedure             0      Line 3    Line 11     1, parameter, x
x     Parameter             1      Line 3    Line 8
a     Variable              1      Line 4    Line 6

Symbol Table | Sorted List
Entries for the example program are kept sorted by name.
Look-up complexity, worst case: O(log n) when binary search is used on a sorted array.

                              Other Attributes
Name  Characteristic Class  Scope  Declared  Referenced  Other
a     Variable              0      Line 2    Line 11
a     Variable              1      Line 4    Line 6
b     Variable              0      Line 2    Line 7
Main  Program               0      Line 1
P     Procedure             0      Line 3    Line 11     1, parameter, x
x     Parameter             1      Line 3    Line 8

Two issues: 1. Interface: how to use symbol tables
2. Implementation: how to implement it.

Basic Implementation Techniques
Considerations:
• number of names
• storage space
• retrieval time

<1> Unordered list (linked list/array)
<2> Ordered list
    binary search on arrays
    expensive insertion
    (+) good for a fixed set of names (e.g. reserved words, assembly opcodes)
<3> Binary search tree
    For random input, search takes O(log(n)) time on average. However, the average search time is 38% greater than that for a balanced tree.
    In the worst case, search takes O(n) time, e.g. inserting the keys in the order ( A B C D E ) or ( A E B D C ) produces a linear list.
    Names in programs are not chosen randomly, so the worst case cannot be ignored.
<4> AVL tree
    For a balanced tree, search takes O(log(n)) time.
<5> Hash table: most common
    (+) constant time on average
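The two insertion orders mentioned for the binary search tree can be checked with a small sketch. The names bst_insert, bst_height and bst_height_for_order are hypothetical helpers written for this illustration; keys are single characters and the tree is a plain, unbalanced BST.

```c
#include <stdlib.h>

struct bst_node {
    char key;
    struct bst_node *left, *right;
};

/* Plain (unbalanced) BST insertion; duplicates are ignored.
   If malloc fails the key is silently dropped. */
struct bst_node *bst_insert(struct bst_node *root, char key)
{
    if (root == NULL) {
        struct bst_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Number of nodes on the longest root-to-leaf path (empty tree = 0). */
int bst_height(const struct bst_node *root)
{
    if (root == NULL)
        return 0;
    int l = bst_height(root->left);
    int r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}

void bst_free(struct bst_node *root)
{
    if (root == NULL)
        return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}

/* Inserts the characters of keys in order and returns the tree height. */
int bst_height_for_order(const char *keys)
{
    struct bst_node *root = NULL;
    for (; *keys != '\0'; keys++)
        root = bst_insert(root, *keys);
    int h = bst_height(root);
    bst_free(root);
    return h;
}
```

With this sketch, the orders "ABCDE" and "AEBDC" both give a tree of height 5 (every node has a single child, i.e. a linear list), while "CBDAE" gives height 3.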

Static Tree Table
If the symbols are known in advance:
• No insertion and deletion are allowed.
• The cost of searching symbols of higher frequency should be small.
• Examples: Huffman tree and OBST (optimal binary search tree).
Fig: Optimal search tree when the frequencies of the symbols are specified
Fig: Huffman tree

Dynamic Tree Tables
• Symbols are inserted as and when they come.
• Deletion is also possible.
• Examples: AVL tree, BST.
Fig: example tree on the keys 32, 60, 20, 45, 55, 68, 50
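As an illustration of a dynamic tree table, here is a minimal sketch of AVL insertion with the usual single and double rotations. It is the standard textbook scheme, not code from this deck; the names avl_node, avl_insert and the rotation helpers are assumptions, and deletion is omitted.

```c
#include <stdlib.h>

struct avl_node {
    int key;
    int height;                      /* height of this subtree; a leaf has 1 */
    struct avl_node *left, *right;
};

int avl_height(const struct avl_node *n) { return n ? n->height : 0; }

void avl_update(struct avl_node *n)
{
    int l = avl_height(n->left), r = avl_height(n->right);
    n->height = 1 + (l > r ? l : r);
}

/* Right rotation: the left child x becomes the root of the subtree. */
struct avl_node *avl_rotate_right(struct avl_node *y)
{
    struct avl_node *x = y->left;
    y->left = x->right;
    x->right = y;
    avl_update(y);
    avl_update(x);
    return x;
}

/* Left rotation: the right child y becomes the root of the subtree. */
struct avl_node *avl_rotate_left(struct avl_node *x)
{
    struct avl_node *y = x->right;
    x->right = y->left;
    y->left = x;
    avl_update(x);
    avl_update(y);
    return y;
}

/* Inserts key and returns the (possibly new) subtree root.
   Duplicates are ignored; on malloc failure the key is dropped. */
struct avl_node *avl_insert(struct avl_node *root, int key)
{
    if (root == NULL) {
        struct avl_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->height = 1;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = avl_insert(root->left, key);
    else if (key > root->key)
        root->right = avl_insert(root->right, key);
    else
        return root;

    avl_update(root);
    int balance = avl_height(root->left) - avl_height(root->right);

    if (balance > 1) {                       /* left subtree too tall */
        if (key > root->left->key)           /* left-right case */
            root->left = avl_rotate_left(root->left);
        return avl_rotate_right(root);
    }
    if (balance < -1) {                      /* right subtree too tall */
        if (key < root->right->key)          /* right-left case */
            root->right = avl_rotate_right(root->right);
        return avl_rotate_left(root);
    }
    return root;
}
```

Inserting the keys 32, 60, 20, 45, 55, 68, 50 in that order triggers one double rotation (at 60) and one single rotation (at 32) and leaves 55 at the root with height 4. Inserting 1 to 7 in sorted order, which would degenerate a plain BST into a list, gives a tree of height 3.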

Part I: Symbol Tables
Static and dynamic tree tables, AVL trees, AVL tree implementation, algorithms and analysis of AVL trees.
Part II: Hash Tables
Basic concepts, hash function, hashing methods, collision resolution, bucket hashing, dynamic hashing.

Hash Table | Motivation
Where is hashing used?
• docDict
• Databases
• Compilers
• Network routers and servers
• Substring search
• Cryptography

Hash Table | Why Hash Table
The problem: we have to store some records and perform the following operations:
• add a new record
• delete a record
• search for a record by key
Find a way to do these efficiently!

Hash Table | Unsorted Array
Use an array to store the records, in unsorted order.
1. add: add the record as the last entry. Fast, O(1).
2. delete a target: slow at finding the target, fast at filling the hole (just move the last entry into it). O(n).
3. search: sequential search. Slow, O(n).

Hash Table | Sorted Array
Use an array to store the records, keeping them in sorted order.
1. add: insert the record in its proper position. Much record movement. Slow, O(n).
2. delete a target: how to handle the hole after deletion? Much record movement. Slow, O(n).
3. search: binary search. Fast, O(log n).
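The costs quoted for the sorted array can be seen in a short sketch. The names sa_add, sa_delete, sa_search and the capacity SA_MAX are hypothetical, and records are reduced to integer keys: add and delete shift records, search halves the range at every step.

```c
#define SA_MAX 100   /* hypothetical capacity */

struct sorted_array {
    int keys[SA_MAX];
    int count;
};

/* Binary search: returns the index of key, or -1 if absent. O(log n). */
int sa_search(const struct sorted_array *a, int key)
{
    int lo = 0, hi = a->count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a->keys[mid] == key)
            return mid;
        if (a->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* Inserts key in its sorted position by shifting larger keys one slot
   to the right. O(n). Returns the index used, or -1 if the array is full. */
int sa_add(struct sorted_array *a, int key)
{
    if (a->count == SA_MAX)
        return -1;
    int i = a->count;
    while (i > 0 && a->keys[i - 1] > key) {
        a->keys[i] = a->keys[i - 1];
        i--;
    }
    a->keys[i] = key;
    a->count++;
    return i;
}

/* Deletes key and closes the hole by shifting the tail one slot to the
   left. O(n). Returns 1 if the key was found, 0 otherwise. */
int sa_delete(struct sorted_array *a, int key)
{
    int i = sa_search(a, key);
    if (i < 0)
        return 0;
    for (; i < a->count - 1; i++)
        a->keys[i] = a->keys[i + 1];
    a->count--;
    return 1;
}
```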

Hash Table | Linked List
Store the records in a linked list (unsorted).
1. add: fast if one can insert the node anywhere. O(1).
2. delete a target: fast at disposing of the node, but slow at finding the target. O(n).
3. search: sequential search. Slow, O(n). (With only a linked list, we cannot use binary search even if the list is sorted.)

Hash Table | More Approaches
What is the solution then? The following structures have better performance but are more complex:
• Hash table
• Tree (BST, heap, …)

Hash Table | Array as Table?
studid  name     score
        sandy    81.5
        bubli    90
        david    56.8
        ...
        peter    20
        manali   100
        ...
        tushar   73
        Namrata  49

Hash Table | Array as Table?
One naive way is to store the records in a huge array indexed by student id, i.e. the record of the student with studid 12345 is stored at A[12345].

index    name   score
:        :      :
12345    andy   81.5
:        :      :
33333    betty  90
:        :      :
56789    david  56.8
:        :      :
9908080  bill   49
:        :      :

Hash Table | What's Wrong Then?
Consider this problem: we want to store 1,000 student records and search them by student id. The naive approach stores the records in a huge array indexed by student id, i.e. the record of the student with studid 12345 is stored at A[12345]. Almost all of that array stays empty.

Hash Table | What's Wrong Then?
• Keys may not be nonnegative integers. Solution: prehash (map each key to a nonnegative integer).
• Gigantic memory hog. Solution: Direct Hash Table (reduce the universe of all keys to a reasonable size).

Hash Table | Direct Hashing Table
• Each slot, or position, of the table T corresponds to a key in U, the universe of possible keys.
• If there is an element x with key k, then T[k] contains a pointer to x.
• Otherwise, T[k] is empty, represented by NIL.

Hash Table | Direct Hashing Table
Store the records in a huge array where the index corresponds to the key.
1. add: very fast, O(1)
2. delete: very fast, O(1)
3. search: very fast, O(1)
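A direct table of this kind can be sketched in a few lines. The universe size UNIVERSE, the record layout and the dt_ names are assumptions made for the illustration; NULL plays the role of NIL.

```c
#include <stddef.h>

#define UNIVERSE 1000   /* hypothetical: keys are the integers 0..999 */

struct record {
    int key;
    const char *name;
    double score;
};

/* dt_table[k] points to the record with key k, or is NULL (NIL). */
struct record *dt_table[UNIVERSE];

int dt_valid(int k) { return k >= 0 && k < UNIVERSE; }

/* Each operation is a single array access: O(1). */
void dt_insert(struct record *x)
{
    if (dt_valid(x->key))
        dt_table[x->key] = x;
}

void dt_delete(int k)
{
    if (dt_valid(k))
        dt_table[k] = NULL;
}

struct record *dt_search(int k)
{
    return dt_valid(k) ? dt_table[k] : NULL;
}
```

The price for the O(1) operations is one array slot per possible key, whether or not a record with that key exists.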

Hash Table | Hash Function
function Hash(key: KeyType): integer;
Imagine that we have such a magic function Hash. It maps the keys (studid) of the 1,000 records to 1,000 distinct integers, one to one: no two different keys map to the same number.
H(studid of andy) = 134
H(studid of betty) = 67
H(studid of david) = 764
…
H(studid of bill) = 3

Hash Table | Hash Table
To store a record, we compute Hash(stud_id) for the record and store it at location Hash(stud_id) of the array. To search for a student, we only need to peek at location Hash(target stud_id).

Records to store:
stud_id  name   score
:        :      :
9908080  bill   49
:        :      :
0033333  betty  90
:        :      :

Hash table:
index  name   score
:      :      :
3      bill   49
:      :      :
67     betty  90
:      :      :
134    andy   81.5
:      :      :
764    david  56.8
:      :      :
999

Hash Table | Division Method
h(k) = k mod m, i.e. key mod table size.
Ex: 2201 mod 1000 = 201

Hash Table | Collision
A collision occurs when different keys map to the same index, i.e. h(k1) = h(k2) = i with k1 != k2.
Ex: 5 mod 11 and 27 mod 11 both give index 5.
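The division method and the collision example can be reproduced directly. The helper names hash_div and collides are hypothetical; keys are assumed to be nonnegative and m greater than 0.

```c
/* Division method: h(k) = k mod m. Assumes m > 0. */
unsigned hash_div(unsigned k, unsigned m)
{
    return k % m;
}

/* Two distinct keys collide when they hash to the same index. */
int collides(unsigned k1, unsigned k2, unsigned m)
{
    return k1 != k2 && hash_div(k1, m) == hash_div(k2, m);
}
```

Here hash_div(2201, 1000) is 201, and hash_div(5, 11) and hash_div(27, 11) are both 5, so collides(5, 27, 11) is true.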

Hashing
• Widely useful technique for implementing dictionaries
• Constant time per operation (on the average)
• Best case O(1), worst case O(n)
Fig: a key is passed through f() to obtain the address of its record in the table (slots 1 to 5)

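Characteristics of a Hash Function
• Quick computation
• It should spread keys evenly: uniform distribution
• It should avoid collisions, although a completely collision-free mapping is possible only in very rare cases (e.g. the birthday paradox)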
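The birthday paradox can be made concrete with a small calculation. Assuming each key is hashed independently and uniformly into m slots (an idealisation), the probability that n keys all land in different slots is (m/m) * ((m-1)/m) * ... * ((m-n+1)/m). The function name collision_probability is hypothetical.

```c
/* Probability that at least two of n keys share a slot when every key is
   hashed independently and uniformly into m slots. Assumes m > 0. */
double collision_probability(int n, int m)
{
    if (n > m)
        return 1.0;              /* pigeonhole: a collision is certain */
    double p_none = 1.0;         /* probability that no two keys collide */
    for (int i = 0; i < n; i++)
        p_none *= (double)(m - i) / m;
    return 1.0 - p_none;
}
```

For n = 23 and m = 365 the result is just above 0.5: with only 23 keys in 365 slots a collision is already more likely than not, which is why a collision resolution strategy is needed even for a well-spread hash function.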

Hash Functions
• Direct hashing
• Digit extraction
• Modulo-division method
• Mid-square method
• Folding method
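Three of the listed methods can be sketched as follows. Textbooks differ in the details, so the digit positions, the number of digits r and the function names digit_extract, mid_square and fold_shift are assumptions chosen for this illustration; the result of each function would normally still be reduced to the table size.

```c
/* 10 to the power r, for small r >= 0. */
unsigned long long pow10u(int r)
{
    unsigned long long p = 1;
    while (r-- > 0)
        p *= 10;
    return p;
}

/* Digit extraction: builds an address from selected decimal digits of the
   key. pos[i] is a digit position counted from the right, starting at 0. */
unsigned long long digit_extract(unsigned long long key, const int pos[], int n)
{
    unsigned long long out = 0;
    for (int i = 0; i < n; i++)
        out = out * 10 + (key / pow10u(pos[i])) % 10;
    return out;
}

/* Mid-square: squares the key and keeps the middle r digits.
   The square wraps around for keys above roughly 4 * 10^9. */
unsigned long long mid_square(unsigned long long key, int r)
{
    unsigned long long sq = key * key;
    int digits = 0;
    for (unsigned long long t = sq; t > 0; t /= 10)
        digits++;
    int drop = digits > r ? (digits - r) / 2 : 0;   /* low digits to drop */
    return (sq / pow10u(drop)) % pow10u(r);
}

/* Fold shift: cuts the key into groups of r digits (from the right),
   adds the groups and keeps the last r digits of the sum. */
unsigned long long fold_shift(unsigned long long key, int r)
{
    unsigned long long base = pow10u(r), sum = 0;
    for (; key > 0; key /= base)
        sum += key % base;
    return sum % base;
}
```

For example, extracting positions 5, 3 and 2 of 379452 gives 394; mid_square(9452, 4) gives 3403, because 9452 squared is 89340304; and fold_shift(123456789, 3) gives 368, because 789 + 456 + 123 = 1368.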

Hash Table | Collision Resolution DS
• Hashing with separate chaining (open hashing): unlimited space
• Hashing with open addressing (closed hashing)

Hash Table | Collision Resolution Strategies
• Separate chaining
• Open addressing
  -- Linear probing (LP)
     LP without chaining
     LP with chaining (LPWC)
        LPWC with replacement
        LPWC without replacement
  -- Quadratic probing
  -- Double hashing

Hash Table | Chained Hash Table
One way to handle collisions is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
Fig: array slots 1 to HASHMAX; empty slots hold nil, and an occupied slot points to a list of records (e.g. name: tom, score: 73)
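A chained table along these lines could look as follows. This is a minimal sketch: the table size HASHMAX = 11, the division-method hash, the record fields and the chain_ names are assumptions, and keys are taken to be nonnegative integers.

```c
#include <stdlib.h>

#define HASHMAX 11   /* hypothetical table size */

struct chain_node {
    int key;
    double score;
    struct chain_node *next;
};

/* Each slot points to a list of records; NULL (nil) means an empty slot. */
struct chain_node *chain_table[HASHMAX];

unsigned chain_hash(int key) { return (unsigned)key % HASHMAX; }

/* Walks the list of the slot the key hashes to. */
struct chain_node *chain_search(int key)
{
    struct chain_node *n = chain_table[chain_hash(key)];
    for (; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Inserts a record at the head of its list, or updates the score if the
   key is already present. Returns 1 on success, 0 if malloc fails. */
int chain_insert(int key, double score)
{
    struct chain_node *n = chain_search(key);
    if (n != NULL) {
        n->score = score;
        return 1;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    unsigned i = chain_hash(key);
    n->key = key;
    n->score = score;
    n->next = chain_table[i];
    chain_table[i] = n;
    return 1;
}

/* Unlinks and frees the record with the given key. Returns 1 if found. */
int chain_delete(int key)
{
    struct chain_node **link = &chain_table[chain_hash(key)];
    for (; *link != NULL; link = &(*link)->next) {
        if ((*link)->key == key) {
            struct chain_node *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

With HASHMAX = 11 the keys 5 and 27 collide, so both end up in the list at slot 5 and each can still be found or deleted on its own.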

Hash Table | Rehashing
Rehashing is required:
• when the table is completely full
• with quadratic probing, when the table is half full
• when an insertion fails due to overflow
What happens:
• The table size is doubled.
• The mod value of the hash function is changed to the new size.
* Rehashing is very costly: a new table has to be created and every entry of the old table has to be inserted again using the new hash function.

Hash Table | Rehashing
Rehashing becomes worthwhile once the load factor reaches about 70% (l >= 0.7),
where the load factor l = h / t, h is the total number of mapped (occupied) locations and t is the total number of locations.
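Rehashing driven by the load factor can be sketched as follows. The sketch assumes an open-addressing table with linear probing, nonnegative integer keys and the 70% threshold; the lp_ names and the struct layout are hypothetical, and duplicate keys are not checked.

```c
#include <stdlib.h>

struct lp_table {
    int  *keys;
    char *used;    /* used[j] == 1 marks slot j as occupied */
    int   size;    /* t: total number of locations */
    int   count;   /* h: number of occupied locations */
};

/* Load factor l = h / t. */
double load_factor(const struct lp_table *t)
{
    return (double)t->count / t->size;
}

/* Allocates an empty table with size >= 1 slots. Returns 1 on success. */
int lp_init(struct lp_table *t, int size)
{
    t->keys = malloc((size_t)size * sizeof *t->keys);
    t->used = calloc((size_t)size, 1);
    t->size = size;
    t->count = 0;
    if (t->keys == NULL || t->used == NULL) {
        free(t->keys);
        free(t->used);
        return 0;
    }
    return 1;
}

/* Stores key by linear probing. Assumes key >= 0 and a free slot exists. */
void lp_place(struct lp_table *t, int key)
{
    int j = key % t->size;
    while (t->used[j])
        j = (j + 1) % t->size;
    t->keys[j] = key;
    t->used[j] = 1;
    t->count++;
}

/* Doubles the table and re-inserts every key with the new mod value. */
int lp_rehash(struct lp_table *t)
{
    struct lp_table bigger;
    if (!lp_init(&bigger, 2 * t->size))
        return 0;
    for (int j = 0; j < t->size; j++)
        if (t->used[j])
            lp_place(&bigger, t->keys[j]);
    free(t->keys);
    free(t->used);
    *t = bigger;
    return 1;
}

/* Rehashes first when h / t >= 0.7 (tested in integers), then stores. */
int lp_insert(struct lp_table *t, int key)
{
    if (10 * t->count >= 7 * t->size && !lp_rehash(t))
        return 0;
    lp_place(t, key);
    return 1;
}
```

Starting from 10 slots, the first seven insertions fit as they are; the eighth finds the load factor at 0.7, so the table grows to 20 slots and the load factor drops to 0.4 after the insertion.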

Types of Linear Probing (with chaining: with and without replacement)
Note: try to solve all the examples that were taken in class on transparencies and on the board; you can also take them from the book.

Extendible Hashing
• All techniques so far are meant for small amounts of data.
• When the data becomes bulky, there will be too many disk accesses; in that case use extendible hashing.
• It uses the binary coding of the hash value to map keys to (disk) locations.
Ex: a hash table of size 4 has 4 slots, addressed 00, 01, 10, 11.
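The binary addressing can be illustrated with a tiny sketch. It assumes that the directory slot is taken from the low-order bits of the hash value (some presentations use the high-order bits instead); dir_index and dir_label are hypothetical names, and bucket splitting is not shown.

```c
/* Directory slot for a hash value: its `depth` low-order bits.
   Assumes depth < 32. With depth = 2 the slots are 00, 01, 10, 11. */
unsigned dir_index(unsigned hash, unsigned depth)
{
    return hash & ((1u << depth) - 1u);
}

/* Writes index as exactly `depth` binary digits into buf,
   which must have room for depth + 1 characters. */
void dir_label(unsigned index, unsigned depth, char *buf)
{
    for (unsigned b = 0; b < depth; b++)
        buf[b] = ((index >> (depth - 1 - b)) & 1u) ? '1' : '0';
    buf[depth] = '\0';
}
```

For the hash value 13 (binary 1101), a directory of depth 2 uses slot 01; when the directory doubles to depth 3, the same value is found in slot 101.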

Implementation
The following examples show how to store and search keys using a hash function:
• Linear probing (store and search)
• Double hashing
• Quadratic probing

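Linear Probe (store)
/* Assumes MAX is the table size, T[j] == 1 marks slot j as occupied,
   and key >= 0. */
int store_LP(int hashtable[], int key, int T[])
{
    int i, j;
    j = key % MAX;                 /* mapped location */
    for (i = 0; i < MAX; i++) {
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
        j = (j + 1) % MAX;         /* next location, in a circular way */
    }
    return -1;                     /* table is full */
}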

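Search in LP
The function is the same as for storing; only the condition checked inside the for loop changes:
        if (T[j] == 1 && hashtable[j] == key) {
            return j;              /* key found at slot j */
        }
A return value of -1 then means that the key is not in the table.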

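Double hashing (store)
/* Assumes MAX is the table size, f1() and f2() are the two hash functions,
   and f2(key) is nonzero and relatively prime to MAX, so that the probe
   sequence can reach every slot. */
int store_DH(int hashtable[], int key, int T[])
{
    int i, j, start, u;
    start = f1(key) % MAX;         /* first mapped location */
    u = f2(key);                   /* u is used as the increment */
    for (i = 0; i < MAX; i++) {
        j = (start + i * u) % MAX;
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                     /* no empty slot found */
}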

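Quadratic probing (store)
/* Assumes MAX is the table size and key >= 0. The probe sequence does not
   visit every slot, so -1 can be returned before the table is full. */
int store_QP(int hashtable[], int key, int T[])
{
    int i, j, start;
    start = key % MAX;             /* first mapped location */
    for (i = 0; i < MAX; i++) {
        j = (start + i * i) % MAX;
        if (T[j] == 0) {           /* found an empty slot */
            hashtable[j] = key;
            T[j] = 1;
            return j;
        }
    }
    return -1;                     /* no empty slot found */
}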
