Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Structures Hash Table (aka Dictionary) i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear,

Similar presentations


Presentation on theme: "Data Structures Hash Table (aka Dictionary) i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear,"— Presentation transcript:

1 Data Structures Hash Table (aka Dictionary) i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear, Nicolas Christin, or Ion Stoica.

2 John Chuang2 Outline  What is a data structure  Basic building blocks: arrays and linked lists  Data structures (uses, methods, performance): -List, stack, queue -Dictionary -Tree -Graph

3 John Chuang3 Dictionary  Also known as hash table, lookup table, associative array, or map  A searchable collection of key-value pairs -each key is associated with one value  Many possible applications, e.g., -Address book -Student record database -Routing tables -…

4 John Chuang4 Dictionary Methods  get(k): if the dictionary has an entry with key k, return its associated value; else return null  put(k,v): insert entry (k,v) into the dictionary; if key is not already in dictionary, return null; else return old value associated with k  remove(k): if dictionary has an entry with key k, remove it from dictionary and return its associated value; else return null  size()  isEmpty()

5 John Chuang5 Dictionary in Python  Example: >>> agents = {‘006’:’Jack Giddings’, ‘007’:’James Bond’} >>> agents[‘001’] = ’Edward Donne’ >>> agents[‘007’] ‘James Bond’

6 John Chuang6 Desirable Properties  Fast search, insert and delete (get, put, and remove)  Efficient space usage  Arrays: O(1) operations; not space efficient  Linked lists: O(n) operations; space efficient  Binary trees: O(log n) operations; space efficient  Can we do better?

7 John Chuang7 Hash Table  A generalization of an array that, if properly designed, can realize fast insert/delete/search operations with good space efficiency -average run-time of O(1) -worst case run-time of O(n)  A hash table is: -Good for storing and retrieving key-value pairs -Not good for iterating through a list of items

8 John Chuang8 Hash Table  A hash table consists of two major components: -Bucket array (for storing entries) -Hash function (for mapping keys to buckets) Obj5 key=1 obj1 key=15 Obj4 key=2 Obj2 key=36 7654321076543210 table Obj3 key=4 buckets hash value/index

9 John Chuang9 Hash Table Design  Bucket array is an array A of size N, where each cell of A is considered a “bucket”  Hash function h maps each key k to an integer in the range [0, N -1]  Store entry (k,v) in the bucket A[h(k)]  Search for entry (k,v) in the bucket A[h(k)]  Choose hash function such that -Can take arbitrary objects (keys) as input -Hash computation is fast -Key mappings evenly distributed across [0, N -1]

10 John Chuang10 Example Hash Function  h(k) = k mod N  mod stands for modulo, the remainder of the division of two numbers. For example: -8 mod 5 = 3 -9 mod 5 = 4 -10 mod 5 = 0 -15 mod 5 = 0  Observe that collisions are possible: -two different keys hash to the same value

11 John Chuang11 Example  h(k) = k mod 5 0123401234 key value Insert (2,x) 2 x 0123401234 key value Insert (21,y) 2 x 21 y 0123401234 key value Insert (34,z) 2 x 21 y 34 z Insert (54,w) There is a collision at array entry #4 ???

12 John Chuang12 Dealing with Collisions  Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash in the same entry 0123401234 key value Insert (54,w) 2 x 21 y 54 w34 z CHAIN 0123401234 Insert (101,x) 21 y 2 x 101 x 54 w34 z

13 John Chuang13 Dictionary Performance  What is the run time for insert/search/delete? -Insert: It takes O(1) time to compute the hash function and insert at head of linked list -Search: It is proportional to the length of the linked list -Delete: Same as search 0123401234 key value Insert (54,w) 2 x 21 y 54 w34 z CHAIN 0123401234 Insert (101,x) 21 y 2 x 101 x 54 w34 z

14 John Chuang14 Load Factor  Average length of the linked lists is a function of the load factor -Load factor = number of items in hash table / array size  In Python, the implementation details are transparent to the programmer (load factor kept under 2/3)  In Java, the programmer can set the initial table size (default=16) and/or the load factor (default = 0.75)

15 John Chuang15 Distributed Hash Table (DHT)  Similar to traditional hash table data structure, except data is stored in distributed peer nodes -Each node is analogous to a bucket in a hash table.  Applications: distributed search, e.g., peer-to-peer networks, content distribution networks (CDNs), etc.  DHTs are typically designed to scale to large numbers of nodes and to handle continual node arrivals and departures/failures.  Put(), Get() interface like a regular hash table: -put(id, item); -item = get(id);

16 John Chuang16 DHT Example: Chord  Nodes: n1(id=1), n2(id=2), n3(id=0), n4(id=6)  Items inserted: f1(id=7), f2(id=1) 0 1 2 3 4 5 6 7 i id+2 i succ 0 2 2 1 3 6 2 5 6 Finger Table i id+2 i succ 0 3 6 1 4 6 2 6 6 Finger Table i id+2 i succ 0 1 1 1 2 2 2 4 6 Finger Table 7 Items 1 i id+2 i succ 0 7 0 1 0 0 2 2 2 Finger Table Slide adapted from Ion Stoica, Nicolas Christin

17 John Chuang17 DHT Example: Chord  Upon receiving a query for item id, a node:  Check if the item is stored locally  If not, forwards the query to the largest node in its successor table that does not exceed id 0 1 2 3 4 5 6 7 i id+2 i succ 0 2 2 1 3 6 2 5 6 Finger Table i id+2 i succ 0 3 6 1 4 6 2 6 6 Finger Table i id+2 i succ 0 1 1 1 2 2 2 4 6 Finger Table 7 Items 1 i id+2 i succ 0 7 0 1 0 0 2 2 2 Finger Table query(7) Slide adapted from Ion Stoica, Nicolas Christin


Download ppt "Data Structures Hash Table (aka Dictionary) i206 Fall 2010 John Chuang Some slides adapted from Marti Hearst, Brian Hayes, Andreas Veneris, Glenn Brookshear,"

Similar presentations


Ads by Google