Hash
Discrete Mathematics and Its Applications
Baojian Hua


Searching
A dictionary-like data structure:
  - contains a collection of tuple data: (k1, v1), (k2, v2), …
  - keys are comparable and pair-wise distinct
  - supports these operations:
      new ()
      insert (dict, k, v)
      lookup (dict, k)
      delete (dict, k)

Examples
Application   Purpose       Key         Value
Phone Book    phone         name        phone No.
Bank          transaction   visa        $$$
Dictionary    lookup        word        meaning
compiler      symbol        variable    type
…om           search        key words   contents
…             …             …           …

Summary So Far
op \ rep    array   sorted array   linked list   sorted linked list   binary search tree
lookup()    O(n)    O(lg n)        O(n)          O(n)                 O(n)
insert()    O(n)    O(n)           O(n)          O(n)                 O(n)
delete()    O(n)    O(n)           O(n)          O(n)                 O(n)
(worst-case bounds)

What's the Problem?
For every mapping (k, v): after we insert it into the dictionary dict, we don't know its position!
Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), … and then lookup (d, "zhang");
[Figure: an array holding ("li", 97), …, ("wang", 99), ("zhang", 100) at unknown positions]

Basic Plan
Start from the array-based approach:
  - use an array A to hold the elements (k, v)
  - for every key k, if we know its position (array index) i from k, then lookup, insert and delete are simple: just access A[i]
  - each operation is done in constant time, O(1)
[Figure: array A with (k, v) stored at index i]
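
The plan above can be sketched for the simplest case, where the key already is the array index. This is a minimal illustration, not code from the slides: the names TABLE_SIZE, Slot, directInsert, directLookup and directDelete are assumptions, and keys are restricted to the range [0, TABLE_SIZE-1].

```c
#include <assert.h>

#define TABLE_SIZE 128 /* hypothetical fixed capacity */

struct Slot {
    int used;  /* 1 if this index currently holds a value */
    int value;
};

/* The array A from the slide; static storage starts zeroed, so every slot is unused. */
static struct Slot A[TABLE_SIZE];

/* Store v at index i: one array access, O(1). */
void directInsert(int i, int v) {
    assert(i >= 0 && i < TABLE_SIZE);
    A[i].used = 1;
    A[i].value = v;
}

/* Return 1 and write the value to *out if index i is occupied, else return 0. */
int directLookup(int i, int *out) {
    assert(i >= 0 && i < TABLE_SIZE);
    if (!A[i].used)
        return 0;
    *out = A[i].value;
    return 1;
}

/* Mark index i as empty: O(1). */
void directDelete(int i) {
    assert(i >= 0 && i < TABLE_SIZE);
    A[i].used = 0;
}
```

Every operation touches exactly one slot, which is where the O(1) bound comes from. The open question is how to obtain i from an arbitrary key.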

Example
Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang");
[Figure: array with ("li", 97) stored at an unknown index "?"]
Problem #1: How to calculate the index from the given key?

Example
Ex: insert (d, "li", 97), (d, "wang", 99), (d, "zhang", 100), …; and then lookup (d, "zhang");
[Figure: array with ("li", 97) stored at an unknown index "?"]
Problem #2: How long should the array be?

Basic Plan
Save the (k, v) pairs in an array, with index i calculated from key k.
Hash function: a method for computing an index from a given key.
[Figure: hash ("li") yields the index at which ("li", 97) is stored]

Hash Function
Given any key, compute an index.
  - efficiently computable
  - ideal goals: for any key, the index is uniformly distributed; different keys map to different indexes
  - however, designing such a function is a hard, much-researched problem :-(
Next, we assume that the array is of infinite length, so the hash function has type:
    int hash (key k);
To get some idea, next we perform a "case analysis" on how different key types affect "hash".

Hash Function On "int"
    // If the key of hash is of "int" type, the hash
    // function is trivial:
    int hash (int i)
    {
      return i;
    }

Hash Function On "char"
    // If the key of hash is of "char" type, the hash
    // function comes with type conversion:
    int hash (char c)
    {
      return c;
    }

Hash Function On "float"
    // Also type conversion:
    int hash (float f)
    {
      return (int)f;
    }
    // how to deal with 0.aaa, say 0.5?
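
One possible answer to the question above, offered as a sketch rather than as the approach the slides intend: hash the bit pattern of the float instead of its truncated value, so that 0.5 and 0.25 no longer both map to 0. The name hashFloat is an assumption, and the code assumes a 32-bit IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hash a float by reinterpreting its 32 bits as an unsigned integer. */
int hashFloat(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f); /* assumption: float is 32 bits wide */
    if (f == 0.0f)
        f = 0.0f; /* -0.0 compares equal to +0.0, so give both the same bits */
    memcpy(&bits, &f, sizeof bits); /* well-defined way to read the bit pattern */
    /* Drop the sign bit so the result is a non-negative int. */
    return (int)(bits & 0x7fffffffu);
}
```

Dropping the sign bit means x and -x share a hash code. That is a collision, not an error, and it keeps the result inside the non-negative int range.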

Hash Function On "string"
    // Example: "BillG":
    // A trivial one, but not so good:
    int hash (char *s)
    {
      int i=0, sum=0;
      while (s[i]) {
        sum += s[i];
        i++;
      }
      return sum;
    }
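
The character sum is "not so good" partly because it ignores order: "ab" and "ba" get the same code. A common alternative, shown here as a sketch and not as the function the slides go on to use, is a polynomial hash with multiplier 31. The name hashString is an assumption.

```c
#include <assert.h>

/* Polynomial string hash: h = 31*h + c for each character c. */
int hashString(const char *s) {
    unsigned int h = 0; /* unsigned arithmetic wraps around instead of overflowing */
    assert(s != 0);
    while (*s) {
        h = 31u * h + (unsigned char)*s;
        s++;
    }
    /* Clear the top bit so the result fits in a non-negative int. */
    return (int)(h & 0x7fffffffu);
}
```

With this function "ab" hashes to 31*97 + 98 = 3105 and "ba" to 31*98 + 97 = 3135, whereas the plain sum gives 195 for both.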

Hash Function On "Point"
    // Suppose we have a user-defined type:
    struct Point2d
    {
      int x;
      int y;
    };
    int hash (struct Point2d pt)
    {
      // ???
    }
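
The slide leaves the body open. One way to fill it, given here only as an illustrative sketch, is to combine the two fields with the same multiply-and-add scheme often used for strings. The names hashPoint2d and point2dIndex, the seed 17 and the multiplier 31 are assumptions, not values from the slides.

```c
#include <assert.h>

struct Point2d {
    int x;
    int y;
};

/* Combine both fields so that (1, 2) and (2, 1) get different codes. */
int hashPoint2d(struct Point2d pt) {
    unsigned int h = 17u; /* arbitrary non-zero seed (assumption) */
    h = 31u * h + (unsigned int)pt.x;
    h = 31u * h + (unsigned int)pt.y;
    return (int)(h & 0x7fffffffu); /* non-negative result */
}

/* Map the hash code into [0, n-1] for an array of n buckets. */
int point2dIndex(struct Point2d pt, int n) {
    assert(n > 0);
    return hashPoint2d(pt) % n;
}
```

A simpler choice such as x + y would also compile, but it makes every pair of mirrored points collide.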

From "int" Hash to Index
Recall the type:
    int hash (T data);
Problems with the "int" return type:
  - at any time, the array is finite
  - no negative index (say -10)
Our goal: int i ==> [0, N-1]
Ok, that's easy! It's just:
    abs(i) % N

Bug!
Note that the range of a 32-bit "int" is -2^31 ~ 2^31-1, so abs(-2^31) = 2^31: overflow!
The key step is to wipe the sign bit off:
    int t = i & 0x7fffffff;
    int hc = t % N;
In summary:
    hc = (i & 0x7fffffff) % N;
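
The summary formula can be wrapped in a small helper. This is a sketch that assumes a 32-bit two's-complement int; the name toIndex is an assumption.

```c
#include <assert.h>

/* Map any int hash code i into the index range [0, n-1]. */
int toIndex(int i, int n) {
    assert(n > 0);
    /* Clearing the sign bit always yields a non-negative value, */
    /* even for the smallest int, where abs() would overflow.    */
    return (i & 0x7fffffff) % n;
}
```

Note that masking is not the same as taking the absolute value: with n = 8, the code -10 maps to index 6, not 2. That is fine for hashing, since all that matters is that equal codes always give the same index in range.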

Collision
Given two keys k1 and k2, we compute two hash codes hc1, hc2 ∈ [0, N-1].
If k1 != k2 but hc1 == hc2, then a collision occurs.
[Figure: (k1, v1) and (k2, v2) both map to index i]

Collision Resolution
  - Open Addressing
  - Re-hash
  - Chaining (Multi-map)

Chaining
For a collision at index i, we keep a separate linear list (chain) at index i.
[Figure: index i points to a chain holding (k1, v1) and (k2, v2)]
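
Chaining can be sketched with int keys and a fixed number of buckets. Everything named here (NUM_BUCKETS, Node, chainInsert, chainLookup) is an assumption made for illustration, and the sketch assumes the caller never inserts the same key twice.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_BUCKETS 8 /* hypothetical fixed table size */

struct Node {
    int key;
    int value;
    struct Node *next;
};

/* One chain head per bucket; static storage starts as all NULL. */
static struct Node *buckets[NUM_BUCKETS];

static int bucketOf(int key) {
    return (key & 0x7fffffff) % NUM_BUCKETS;
}

/* Add (key, value) at the head of the chain for its bucket. */
void chainInsert(int key, int value) {
    int hc = bucketOf(key);
    struct Node *n = malloc(sizeof *n);
    assert(n != NULL); /* a sketch: real code would report the failure */
    n->key = key;
    n->value = value;
    n->next = buckets[hc];
    buckets[hc] = n;
}

/* Walk the chain of the key's bucket; return 1 and set *out if found. */
int chainLookup(int key, int *out) {
    for (struct Node *p = buckets[bucketOf(key)]; p != NULL; p = p->next) {
        if (p->key == key) {
            *out = p->value;
            return 1;
        }
    }
    return 0;
}
```

Keys 3 and 11 both land in bucket 3 here, so they end up on the same chain and a lookup has to compare keys along it.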

General Scheme
[Figure: an array of buckets, each pointing to a chain; keys shown: k1, k2, k5, k8, k43]

Load Factor
loadFactor = numItems / numBuckets
defaultLoadFactor: default value of the load factor
[Figure: the same bucket array with chained keys k1, k2, k5, k8, k43]
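
As a worked example of the formula: the figure shows five keys, and the lookup example later reduces modulo 8, so assuming 8 buckets the load factor is 5/8 = 0.625. The helper names below (loadFactorOf, needsExtension) are assumptions used only for this sketch.

```c
#include <assert.h>

/* loadFactor = numItems / numBuckets, computed in floating point. */
double loadFactorOf(int numItems, int numBuckets) {
    assert(numBuckets > 0);
    return (double)numItems / numBuckets;
}

/* Return 1 when the table has reached the given threshold, else 0. */
int needsExtension(int numItems, int numBuckets, double threshold) {
    return loadFactorOf(numItems, numBuckets) >= threshold;
}
```

The cast to double matters: with int operands, 5 / 8 would evaluate to 0.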

"hash" ADT: interface
    #ifndef HASH_H
    #define HASH_H
    typedef void *poly;
    typedef poly key;
    typedef poly value;
    typedef struct hashStruct *hash;
    hash newHash ();
    hash newHash2 (double lf);
    void insert (hash h, key k, value v);
    poly lookup (hash h, key k);
    void delete (hash h, key k);
    #endif

Hash Implementation
    #include <stdlib.h> // malloc
    #include "hash.h"
    #define EXT_FACTOR 2
    #define INIT_BUCKETS 16
    struct hashStruct
    {
      linkedList *buckets;
      int numBuckets;
      int numItems;
      double loadFactor;
    };

In Figure
[Figure: h points to a struct with fields buckets, numBuckets, numItems and loadFactor; buckets points to the array of chains holding k1, k2, k5, k8, k43]

"newHash ()"
    hash newHash ()
    {
      hash h = (hash)malloc (sizeof (*h));
      h->buckets = malloc (INIT_BUCKETS * sizeof (linkedList));
      for (…) // init the array
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->loadFactor = 0.25;
      return h;
    }

"newHash2 ()"
    hash newHash2 (double lf)
    {
      hash h = (hash)malloc (sizeof (*h));
      h->buckets = (linkedList *)malloc (INIT_BUCKETS * sizeof (linkedList));
      for (…) // init the array
      h->numBuckets = INIT_BUCKETS;
      h->numItems = 0;
      h->loadFactor = lf;
      return h;
    }

"lookup (hash, key)"
    value lookup (hash h, key k, compTy cmp)
    {
      int i = k->hashCode (); // how to perform this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      value t = linkedListSearch ((h->buckets)[hc], k);
      return t;
    }
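
C has no method call such as k->hashCode () on a void pointer, so one possible answer to the question in the comment is to pass the hash and comparison functions explicitly as function pointers. This is a sketch under that assumption; the names hashFn, equalsFn, stringHash, stringEquals and bucketIndex are hypothetical and not part of the slides' interface.

```c
#include <assert.h>
#include <string.h>

typedef void *poly;
typedef int (*hashFn)(poly key);          /* computes a key's hash code */
typedef int (*equalsFn)(poly a, poly b);  /* returns 1 if two keys are equal */

/* Example hash for keys that are C strings (31-multiplier polynomial). */
int stringHash(poly key) {
    const unsigned char *s = key;
    unsigned int h = 0;
    while (*s)
        h = 31u * h + *s++;
    return (int)(h & 0x7fffffffu);
}

/* Example equality test for keys that are C strings. */
int stringEquals(poly a, poly b) {
    return strcmp(a, b) == 0;
}

/* The first two lines of lookup, with hashCode supplied by the caller. */
int bucketIndex(poly key, hashFn hashCode, int numBuckets) {
    assert(hashCode != NULL && numBuckets > 0);
    int i = hashCode(key);
    return (i & 0x7fffffff) % numBuckets;
}
```

For example, bucketIndex ("li", stringHash, 8) computes 31*108 + 105 = 3453 and reduces it to bucket 5. The equality function plays the role that the cmp parameter appears to have in the slide's lookup signature.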

Ex: lookup (ha, k43)
[Figure: the buckets array of ha with chained keys k1, k2, k5, k8, k43]
    hc = (hash (k43) & 0x7fffffff) % 8; // hc = 1
Step 1: compute the bucket index, hc = 1.
Step 2: compare k43 with k8: no match, move on.
Step 3: compare k43 with k43: found!

"insert"
    void insert (hash h, poly k, poly v)
    {
      if (1.0 * h->numItems / h->numBuckets >= h->loadFactor) {
        // buckets extension & items re-hash
      }
      int i = k->hashCode (); // how to perform this?
      int hc = (i & 0x7fffffff) % (h->numBuckets);
      tuple t = newTuple (k, v);
      linkedListInsertHead ((h->buckets)[hc], t);
      h->numItems++; // keep the item count in sync with the load factor test
      return;
    }
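
The "buckets extension & items re-hash" step is left as a comment on the slide. The following is one way it could look, sketched for int keys with hand-rolled chains rather than the slides' linkedList type. The names Table, Node, newTable, tableInsert and tableLookup are assumptions; EXT_FACTOR is taken from the implementation slide.

```c
#include <assert.h>
#include <stdlib.h>

#define EXT_FACTOR 2

struct Node { int key; int value; struct Node *next; };

struct Table {
    struct Node **buckets;
    int numBuckets;
    int numItems;
    double loadFactor; /* extension threshold */
};

static int indexOf(int key, int n) { return (key & 0x7fffffff) % n; }

static struct Node **newBuckets(int n) {
    struct Node **b = malloc((size_t)n * sizeof *b);
    assert(b != NULL); /* a sketch: real code would report the failure */
    for (int i = 0; i < n; i++)
        b[i] = NULL;
    return b;
}

struct Table *newTable(int numBuckets, double lf) {
    assert(numBuckets > 0 && lf > 0.0);
    struct Table *t = malloc(sizeof *t);
    assert(t != NULL);
    t->buckets = newBuckets(numBuckets);
    t->numBuckets = numBuckets;
    t->numItems = 0;
    t->loadFactor = lf;
    return t;
}

/* Grow the bucket array and move every node to its new bucket. */
static void extend(struct Table *t) {
    int newSize = t->numBuckets * EXT_FACTOR;
    struct Node **fresh = newBuckets(newSize);
    for (int b = 0; b < t->numBuckets; b++) {
        struct Node *p = t->buckets[b];
        while (p != NULL) {
            struct Node *next = p->next;
            int hc = indexOf(p->key, newSize); /* index depends on the new size */
            p->next = fresh[hc];
            fresh[hc] = p;
            p = next;
        }
    }
    free(t->buckets);
    t->buckets = fresh;
    t->numBuckets = newSize;
}

void tableInsert(struct Table *t, int key, int value) {
    if (1.0 * t->numItems / t->numBuckets >= t->loadFactor)
        extend(t);
    int hc = indexOf(key, t->numBuckets);
    struct Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = key;
    n->value = value;
    n->next = t->buckets[hc];
    t->buckets[hc] = n;
    t->numItems++;
}

int tableLookup(const struct Table *t, int key, int *out) {
    for (struct Node *p = t->buckets[indexOf(key, t->numBuckets)]; p; p = p->next)
        if (p->key == key) { *out = p->value; return 1; }
    return 0;
}
```

Re-hashing is needed because the index is computed modulo the number of buckets: once that number changes, an item's old position is no longer where lookup will search for it.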

Ex: insert (ha, k13)
    hc = (hash (k13) & 0x7fffffff) % 8; // suppose hc==4
[Figure: before the insert, the buckets of ha hold k1, k2, k5, k8, k43; after it, k13 sits at the head of the chain at index 4]

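Complexity
op \ rep    array   sorted array   linked list   sorted linked list   hash
lookup()    O(n)    O(lg n)        O(n)          O(n)                 O(1)
insert()    O(n)    O(n)           O(n)          O(n)                 O(1)
delete()    O(n)    O(n)           O(n)          O(n)                 O(1)
(the O(1) bounds for hash are expected, average-case bounds under a well-distributed hash function)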