Week 9 - Monday

 What did we talk about last time?  Practiced with red-black trees  AVL trees  Balanced add

 Takes a potentially unbalanced tree and turns it into a balanced tree  Step 1  Turn the tree into a degenerate tree (backbone or vine) by right rotating any nodes with left children  Step 2  Turn the degenerate tree into a balanced tree by doing sequences of left rotations starting at the root  The analysis is not obvious, but it takes O(n) total rotations
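The two phases above can be sketched in Java. This is a minimal sketch of the DSW (Day-Stout-Warren) algorithm; the `Node` class and all names are illustrative, not taken from the course code:

```java
// Sketch of the DSW (Day-Stout-Warren) rebalancing algorithm.
class DSW {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Phase 1: right-rotate every node that has a left child, producing
    // a right-leaning "backbone" (vine). A pseudo-root lets the real
    // root be rotated like any other node.
    static Node treeToVine(Node root) {
        Node pseudo = new Node(0);
        pseudo.right = root;
        Node tail = pseudo;
        Node rest = tail.right;
        while (rest != null) {
            if (rest.left == null) {        // already on the backbone
                tail = rest;
                rest = rest.right;
            } else {                        // right-rotate rest's left child up
                Node temp = rest.left;
                rest.left = temp.right;
                temp.right = rest;
                rest = temp;
                tail.right = temp;
            }
        }
        return pseudo.right;
    }

    // Phase 2: passes of left rotations turn the vine into a balanced
    // tree; both phases together use O(n) rotations.
    static Node vineToTree(Node root, int size) {
        Node pseudo = new Node(0);
        pseudo.right = root;
        // The first pass handles the "extra" leaves so that the later
        // passes always work on a perfect tree.
        int leaves = size + 1 - Integer.highestOneBit(size + 1);
        compress(pseudo, leaves);
        size -= leaves;
        while (size > 1) {
            size /= 2;
            compress(pseudo, size);
        }
        return pseudo.right;
    }

    // Left-rotate `count` alternating nodes along the vine below root.
    static void compress(Node root, int count) {
        Node scanner = root;
        for (int i = 0; i < count; i++) {
            Node child = scanner.right;
            scanner.right = child.right;
            scanner = scanner.right;
            child.right = scanner.left;
            scanner.left = child;
        }
    }
}
```

Each call to `compress` is one pass of left rotations down the backbone, which is where the O(n) total rotation count comes from.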

 How much time does it take to insert n items with:  Red-black or AVL tree  Balanced insertion method  Unbalanced insertion + DSW rebalance  How much space does it take to insert n items with:  Red-black or AVL tree  Balanced insertion method  Unbalanced insertion + DSW rebalance

 Balanced binary trees ensure that accessing any element takes O(log n) time  But, maybe only a few elements are accessed repeatedly  In this case, it may make more sense to restructure the tree so that these elements are closer to the root

1. Single rotation  When an element is accessed, rotate its node about its parent, unless it's already the root 2. Moving to the root  Repeat the child-parent rotation until the accessed element becomes the root
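Both strategies can be sketched with one rotation routine. This sketch assumes nodes with parent pointers, an illustrative choice that keeps the rotation code short:

```java
// Sketch of rotating an accessed node up toward the root.
// Node and field names are illustrative, not from the course code.
class MoveToRoot {
    static class Node {
        int key;
        Node left, right, parent;
        Node(int key) { this.key = key; }
    }

    // Rotate child x about its parent p (a right or left rotation,
    // depending on which side x is on).
    static void rotateUp(Node x) {
        Node p = x.parent;
        if (p == null) return;            // x is already the root
        Node g = p.parent;
        if (p.left == x) {                // right rotation
            p.left = x.right;
            if (x.right != null) x.right.parent = p;
            x.right = p;
        } else {                          // left rotation
            p.right = x.left;
            if (x.left != null) x.left.parent = p;
            x.left = p;
        }
        p.parent = x;
        x.parent = g;
        if (g != null) {
            if (g.left == p) g.left = x; else g.right = x;
        }
    }

    // Strategy 2: repeat the rotation until x is the root.
    static Node moveToRoot(Node x) {
        while (x.parent != null) rotateUp(x);
        return x;
    }
}
```

A single call to `rotateUp` is strategy 1; looping it is strategy 2.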

 A concept called splaying takes the rotate-to-the-root idea further, doing special rotations depending on the relationship of the accessed node R with its parent Q and grandparent P  Cases: 1. Node R's parent is the root  Do a single rotation to make R the root 2. Node R is the left child of Q, and Q is the left child of P (respectively right and right)  Rotate Q around P  Rotate R around Q 3. Node R is the left child of Q, and Q is the right child of P (respectively right and left)  Rotate R around Q  Rotate R around P

 We are not going deeply into self-organizing trees  It's an interesting concept, and rigorous mathematical analysis shows that splay trees should perform well, in terms of Big O bounds  In practice, however, they usually do not  An AVL tree or a red-black tree is generally the better choice  If you come upon some special problem where you repeatedly access a particular element most of the time, only then should you consider splay trees

Student Lecture

 We can define a symbol table ADT with a few essential operations:  put(Key key, Value value) ▪ Put the key-value pair into the table  get(Key key) ▪ Retrieve the value associated with key  delete(Key key) ▪ Remove the value associated with key  contains(Key key) ▪ See if the table contains a key  isEmpty()  size()  It's also useful to be able to iterate over all keys
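As a sketch, the operations above might be written as a Java interface; the generic type names follow the slide's Key/Value convention:

```java
// One possible Java interface for the symbol table ADT described above.
interface SymbolTable<Key, Value> {
    void put(Key key, Value value);   // add or replace the pair for key
    Value get(Key key);               // value for key, or null if absent
    void delete(Key key);             // remove the pair for key, if any
    boolean contains(Key key);        // does the table have this key?
    boolean isEmpty();
    int size();
    Iterable<Key> keys();             // iterate over all keys
}
```

Any of the structures from this unit (balanced BSTs, hash tables) could implement this interface.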

 We have been talking a lot about trees and other ways to keep ordered symbol tables  Ordered symbol tables are great, but we may not always need that ordering  Keeping an unordered symbol table might allow us to improve our running time

 Balanced binary search trees give us:  O(log n) time to find a key  O(log n) time to do insertions and deletions  Can we do better?  What about:  O(1) time to find a key  O(1) to do an insertion or a deletion

 We make a huge array, so big that we’ll have more slots in the array than data values we expect to store  We use a hashing function that maps keys to indexes in the array  Using the hashing function, we know where to put each key and also where to look for a particular key

 Let’s make a hash table to store integer keys  Our hash table will be 13 elements long  Our hashing function will simply mod each value by 13

 Insert these keys: 3, 19, 7, 104, …

 Find these keys:  19  YES!  88  NO!  16  NO!
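The toy table from the last two slides, sketched as code (no collision handling yet). Note that find has to compare the stored key, which is why 16, which hashes to slot 3 where key 3 lives, is correctly reported as absent:

```java
// A 13-slot hash table for integer keys with h(k) = k mod 13,
// matching the slides' example. No collision handling yet.
class ToyHashTable {
    private final Integer[] table = new Integer[13];

    private int hash(int key) { return key % 13; }

    // Put the key into its hashed slot.
    void insert(int key) { table[hash(key)] = key; }

    // A key is present only if its slot holds that exact key.
    boolean find(int key) {
        Integer stored = table[hash(key)];
        return stored != null && stored == key;
    }
}
```

With the slides' keys, 3 goes to slot 3, 19 to slot 6, 7 to slot 7, and 104 to slot 0.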

 We are using a hash table for a space/time tradeoff  Lots of space means we can get down to O(1)  How much space do we need?  How do we pick a good hashing function?  What happens if two values collide (map to the same location)?

 We want a function that will map data to buckets in our hash table  Important characteristics:  Efficient: It must be quick to execute  Deterministic: The same data must always map to the same bucket  Uniform: Data should be mapped evenly across all buckets

 We want a function h(k) that computes a hash for every key k  The simplest way of guaranteeing that we hash only into legal locations is by setting h(k) to be:  h(k) = k mod N where N is the size of the hash table  To avoid crowding the low indexes, N should be prime  If it is not feasible for N to be prime, we can add another step using a prime p > N:  h(k) = (k mod p) mod N
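A minimal sketch of both forms of the division method; the sizes N = 13 and p = 31 are just example values:

```java
// The division method: h(k) = k mod N, plus the two-step variant
// h(k) = (k mod p) mod N for when N itself cannot be prime.
class DivisionHash {
    static int hash(int key, int tableSize) {
        return key % tableSize;               // h(k) = k mod N
    }

    static int hash(int key, int prime, int tableSize) {
        return (key % prime) % tableSize;     // h(k) = (k mod p) mod N
    }
}
```

One Java caveat: for negative keys, `%` can return a negative result, so real code would use `Math.floorMod` to guarantee a legal index.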

 Pros  Simple  Fast  Easy to do  Good if you know nothing about the data  Cons  Prime numbers are involved (What’s the nearest prime to the size you want?)  Uses no information about the data  If the data is strangely structured (multiples of p, for example) it could all hash to the same location

 Break the key into parts and combine those parts  Shift folding puts the parts together without transformations  SSN 123-45-6789 is broken into 123, 456, and 789, which sum to 1,368, then modded by the table size N  Boundary folding puts the parts together, reversing every other part of the key  SSN 123-45-6789 becomes 123, 654, and 789, which sum to 1,566, then modded by N
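A sketch of both folding variants. The SSN digits (123-45-6789) are inferred from the slide's sums of 1,368 and 1,566, and the 3-digit chunking is an assumption:

```java
// Shift and boundary folding of a key already split into chunks.
// The chunk sizes and example SSN are assumptions for illustration.
class Folding {
    // Shift folding: add the chunks as-is.
    static int shiftFold(int... chunks) {
        int sum = 0;
        for (int c : chunks) sum += c;
        return sum;
    }

    // Boundary folding: reverse the digits of every other chunk first.
    static int boundaryFold(int... chunks) {
        int sum = 0;
        for (int i = 0; i < chunks.length; i++) {
            sum += (i % 2 == 1) ? reverseDigits(chunks[i]) : chunks[i];
        }
        return sum;
    }

    static int reverseDigits(int n) {
        int r = 0;
        while (n > 0) { r = r * 10 + n % 10; n /= 10; }
        return r;
    }
}
```

The fold result would then be modded by the table size, as with the division method.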

 Pros  Relatively Simple and Fast  Mixes up the data more than division  Points out a way to turn strings or other non- integer data into an integer that can be hashed  Transforms the numbers so that patterns in the data are likely to be removed  Cons  Primes are still involved  Uses no special information about the data

 Square the key, then take the “middle” digits out of the result  Example: key = 3,121, then 3,121² = 9,740,641, and the hash value is 406  One nice thing about this method is that we can make the table size be a power of 2  Then, we can take the middle log₂ N bits out of the squared value using bitwise shifts and masking
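A bit-level sketch of mid-square for a table of size 2^bits. Centering the extracted bits this way is one reasonable choice, not the only one:

```java
// Mid-square hashing: square the key and extract the middle `bits`
// bits, for a table whose size is a power of 2.
class MidSquare {
    static int hash(int key, int bits) {
        long squared = (long) key * key;          // avoid int overflow
        int totalBits = 64 - Long.numberOfLeadingZeros(squared);
        int shift = Math.max(0, (totalBits - bits) / 2);
        return (int) ((squared >> shift) & ((1L << bits) - 1));
    }
}
```

The slide's 406 comes from taking middle decimal digits; the bit version above is the form you would actually implement for a power-of-2 table.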

 Pros  Randomizes the data a lot  Fast when implemented correctly  Primes are not necessary  Cons  Uses no special information about the data

 Remove part of the key, especially if it is useless  Example:  Many SSNs for Indianapolis residents begin with 313  Removing the first 3 digits will, therefore, not reduce the randomness very much, provided that you are looking at a list of SSNs for Indianapolis residents
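Extraction can be sketched as keeping only the low-order digits, dropping a low-information prefix like the 313 in the example above:

```java
// Extraction sketch: keep only the last `digits` digits of the key,
// discarding a prefix that carries little information.
class ExtractHash {
    static int extract(long key, int digits) {
        long mod = 1;
        for (int i = 0; i < digits; i++) mod *= 10;
        return (int) (key % mod);
    }
}
```

The extracted value would then be hashed normally, e.g. with the division method.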

 Pros  Uses information about the key  Can be efficient and easy to implement  Cons  Requires special knowledge  Careless extraction of digits can give poor hashing performance

 Change the number to a different base  Then, treat the digits as if they were still base 10 and use the division method  Example: 345 is 423 in base 9  If N = 100, we take 423 mod 100 and put 345 in location 23
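A sketch of radix transformation matching the slide's example (345 becomes 423 in base 9, which lands in slot 23 when N = 100):

```java
// Radix transformation: write the key in another base, read those
// digits as if they were base 10, then mod by the table size.
class RadixHash {
    static int hash(int key, int base, int tableSize) {
        int asBase10 = 0, place = 1;
        while (key > 0) {
            asBase10 += (key % base) * place;   // next base-`base` digit...
            place *= 10;                        // ...reinterpreted in base 10
            key /= base;
        }
        return asBase10 % tableSize;
    }
}
```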

 Pros  If many numbers have similar final digits or values mod N (or p), they can be randomized by this method  Cons  Choice of base can be difficult  Effects are unpredictable  Not as quick as many of the other methods  Values that didn’t collide before might now collide

 Determine if a string has any duplicate characters  Weak!  Okay, but do it in O(m) time where m is the length of the string
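One O(m)-time answer, in the spirit of this week's topic: use a presence table indexed by character (this sketch assumes ASCII-range characters):

```java
// Detect duplicate characters in O(m) time using a boolean
// "seen" table indexed directly by character code.
class Duplicates {
    static boolean hasDuplicates(String s) {
        boolean[] seen = new boolean[128];      // assumes ASCII input
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (seen[c]) return true;           // second sighting of c
            seen[c] = true;
        }
        return false;
    }
}
```

Each character is examined once and each table lookup is O(1), giving O(m) overall; the nested-loop approach would be O(m²).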

 Collisions  Chaining implementation of hash tables

 Keep working on Project 3!