Dictionaries, Hash Tables, Hashing, Collisions, Sets Svetlin Nakov Telerik Software Academy academy.telerik.com Technical Trainer www.nakov.com.



1. Dictionaries
2. Hash Tables
3. Dictionary<TKey, TValue> Class
4. Sets: HashSet<T> and SortedSet<T>

Data Structures that Map Keys to Values

- The abstract data type (ADT) "dictionary" maps keys to values
  - Also known as "map" or "associative array"
  - Contains a set of (key, value) pairs
- Dictionary ADT operations:
  - Add(key, value)
  - FindByKey(key) → value
  - Delete(key)
- Can be implemented in several ways
  - List, array, hash table, balanced tree, ...
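To make the three ADT operations concrete, here is a minimal sketch of the simplest implementation mentioned above, a plain list of pairs. The class name `ListDictionary` and the `bool`/`out` shape of `FindByKey` are assumptions made for illustration; every operation scans the list, so each costs O(n).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class name; a deliberately naive O(n) implementation of the dictionary ADT.
public class ListDictionary<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>> pairs =
        new List<KeyValuePair<TKey, TValue>>();

    // Add(key, value): rejects duplicate keys, since a dictionary holds one value per key.
    public void Add(TKey key, TValue value)
    {
        if (IndexOf(key) >= 0)
        {
            throw new ArgumentException("Key already exists.");
        }
        pairs.Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    // FindByKey(key) -> value: returns false when the key is missing.
    public bool FindByKey(TKey key, out TValue value)
    {
        int index = IndexOf(key);
        if (index < 0)
        {
            value = default(TValue);
            return false;
        }
        value = pairs[index].Value;
        return true;
    }

    // Delete(key): returns true if a pair was removed.
    public bool Delete(TKey key)
    {
        int index = IndexOf(key);
        if (index < 0) return false;
        pairs.RemoveAt(index);
        return true;
    }

    // Linear scan over all pairs: this is what makes every operation O(n).
    private int IndexOf(TKey key)
    {
        for (int i = 0; i < pairs.Count; i++)
        {
            if (EqualityComparer<TKey>.Default.Equals(pairs[i].Key, key)) return i;
        }
        return -1;
    }
}
```

The hash table described next keeps the same three operations but replaces the linear scan with a computed slot index.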

- Example dictionary:

Key      | Value
C#       | Modern object-oriented programming language for the Microsoft .NET platform
CLR      | Common Language Runtime: execution engine for .NET assemblies, integral part of .NET Framework
compiler | Software that transforms a computer program to executable machine code
…        | …

What Is a Hash Table? How Does It Work?

- A hash table is an array that holds a set of (key, value) pairs
- The process of mapping a key to a position in a table is called hashing

[Figure: hash table T of size m with slots 0 … m-1; the hash function h: k → 0 … m-1 maps a key k to the slot h(k)]

- A hash table has m slots, indexed from 0 to m-1
- A hash function h(k) maps keys to positions:
  - h: k → 0 … m-1
- For any value k in the key range and some hash function h we have h(k) = p and 0 ≤ p < m
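A small sketch of such a function h for string keys follows. The helper name `SlotIndex` and the choice of a polynomial hash with multiplier 31 are assumptions for illustration, not something the slides prescribe; the only property that matters here is that the result p always satisfies 0 ≤ p < m.

```csharp
using System;

public static class HashingSketch
{
    // Hypothetical helper: maps a string key to a slot index p with 0 <= p < m.
    public static int SlotIndex(string key, int m)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        if (m <= 0) throw new ArgumentOutOfRangeException(nameof(m));

        uint hash = 0;
        foreach (char c in key)
        {
            // Polynomial hash; unsigned arithmetic simply wraps around on overflow.
            hash = unchecked(hash * 31 + c);
        }

        // The remainder of an unsigned value is never negative, so it is a valid index.
        return (int)(hash % (uint)m);
    }

    public static void Main()
    {
        int m = 16;
        foreach (string key in new[] { "Pesho", "Kiro", "Mimi", "Ivan", "Lili" })
        {
            Console.WriteLine("h(\"{0}\") = {1}", key, SlotIndex(key, m));
        }
    }
}
```

The sketch uses its own hash rather than `string.GetHashCode()` so that the slot for a given key is the same on every run.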

- Perfect hashing function (PHF)
  - h(k): one-to-one mapping of each key k to an integer in the range [0, m-1]
  - The PHF maps each key to a distinct integer within some manageable range
- Finding a perfect hashing function is in most cases impossible, because there are usually far more possible keys than slots
- More realistically
  - Hash function h(k) that maps most of the keys onto unique integers, but not all

- A collision is the situation when different keys have the same hash value: h(k1) = h(k2) for k1 ≠ k2
- When the number of collisions is sufficiently small, hash tables work quite well (fast)
- Several collision resolution strategies exist
  - Chaining in a list
  - Using the neighboring slots (linear probing)
  - Re-hashing (second hash function)
  - ...
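As an illustration of the "neighboring slots" strategy, the following is a minimal linear-probing sketch. The class `LinearProbingTable` is hypothetical and deliberately incomplete: it has a fixed capacity, string keys, and no deletion or resizing.

```csharp
using System;

// Hypothetical class; fixed capacity, string keys, no deletion or resizing.
public class LinearProbingTable
{
    private readonly string[] keys;
    private readonly int[] values;

    public LinearProbingTable(int capacity)
    {
        keys = new string[capacity];
        values = new int[capacity];
    }

    private int Slot(string key)
    {
        // Clear the sign bit so the remainder is never negative.
        return (key.GetHashCode() & 0x7FFFFFFF) % keys.Length;
    }

    // Adds or replaces; on collision, steps to the next slot (wrapping around).
    public void Put(string key, int value)
    {
        int start = Slot(key);
        for (int i = 0; i < keys.Length; i++)
        {
            int p = (start + i) % keys.Length;
            if (keys[p] == null || keys[p] == key)
            {
                keys[p] = key;
                values[p] = value;
                return;
            }
        }
        throw new InvalidOperationException("Table is full.");
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    public bool TryGet(string key, out int value)
    {
        int start = Slot(key);
        for (int i = 0; i < keys.Length; i++)
        {
            int p = (start + i) % keys.Length;
            if (keys[p] == null) break;
            if (keys[p] == key)
            {
                value = values[p];
                return true;
            }
        }
        value = 0;
        return false;
    }
}
```

Lookups stay fast only while the table is sparsely filled; as it fills up, probe sequences get longer, which is why practical implementations resize.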

h("Pesho") = 4
h("Kiro") = 2
h("Mimi") = 1
h("Ivan") = 2 (collision with "Kiro")
h("Lili") = m-1

Chaining elements in case of collision:

T[0]   → null
T[1]   → Mimi → null
T[2]   → Kiro → Ivan → null
T[3]   → null
T[4]   → Pesho → null
…
T[m-1] → Lili → null
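The chaining picture above can be sketched in code as an array of linked lists. The class `ChainedTable` is a hypothetical illustration with a fixed number of slots and no resizing, not the implementation used by any particular library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; fixed number of slots m, no resizing.
public class ChainedTable
{
    private readonly LinkedList<KeyValuePair<string, int>>[] slots;

    public ChainedTable(int m)
    {
        slots = new LinkedList<KeyValuePair<string, int>>[m];
    }

    private int Slot(string key)
    {
        // Clear the sign bit so the remainder is never negative.
        return (key.GetHashCode() & 0x7FFFFFFF) % slots.Length;
    }

    public void Put(string key, int value)
    {
        int p = Slot(key);
        if (slots[p] == null)
        {
            slots[p] = new LinkedList<KeyValuePair<string, int>>();
        }

        // Replace the pair if the key is already in the chain.
        for (var node = slots[p].First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                node.Value = new KeyValuePair<string, int>(key, value);
                return;
            }
        }

        // Otherwise append; colliding keys simply share the chain of slot p.
        slots[p].AddLast(new KeyValuePair<string, int>(key, value));
    }

    public bool TryGet(string key, out int value)
    {
        var chain = slots[Slot(key)];
        if (chain != null)
        {
            foreach (var pair in chain)
            {
                if (pair.Key == key)
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = 0;
        return false;
    }
}
```

Creating the table with `m = 1` forces every key into the same chain, which is a quick way to see that collisions affect speed but not correctness.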

- Hash tables are typically the most efficient implementation of the ADT "dictionary"
- Add / Find / Delete take just a few primitive operations
  - On average, speed does not depend on the size of the hash table (constant time)
  - Amortized complexity O(1)
- Example: finding an element in a hash table takes just a few steps, however many elements it holds
  - Finding an element in an unsorted array of n elements takes n/2 steps on average


The Dictionary<TKey, TValue> Class

- Dictionary<TKey, TValue> implements the ADT dictionary as a hash table
  - The size is dynamically increased as needed
  - Contains a collection of key-value pairs
  - Collisions are resolved by chaining
  - Elements are enumerated in no guaranteed order
    - The order depends on the implementation, so code should not rely on it
- Dictionary<TKey, TValue> relies on
  - Object.Equals(): for comparing the keys
  - Object.GetHashCode(): for calculating the hash codes of the keys

- Major operations:
  - Add(TKey, TValue): adds an element with the specified key and value
  - Remove(TKey): removes the element by key
  - this[]: get / add / replace of element by key
  - Clear(): removes all elements
  - Count: returns the number of elements
  - Keys: returns a collection of the keys
  - Values: returns a collection of the values

- Major operations (continued):
  - ContainsKey(TKey): checks whether the dictionary contains a given key
  - ContainsValue(TValue): checks whether the dictionary contains a given value
    - Warning: slow operation, O(n)
  - TryGetValue(TKey, out TValue)
    - If the key is found, returns true and stores its value in the out parameter
    - Otherwise returns false
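A short sketch of how these operations behave in practice. The helper `MarkOrDefault` and the fallback value -1 are assumptions made for the example. Note that `Add` throws an `ArgumentException` for a key that already exists and the indexer getter throws a `KeyNotFoundException` for a missing key, which is why `TryGetValue` is the usual choice when the key may be absent.

```csharp
using System;
using System.Collections.Generic;

public static class LookupSketch
{
    // Hypothetical helper: returns the mark, or -1 when the student is unknown.
    public static int MarkOrDefault(Dictionary<string, int> marks, string name)
    {
        int mark;
        // One lookup: TryGetValue reports success and fills the out parameter.
        if (marks.TryGetValue(name, out mark))
        {
            return mark;
        }
        return -1;
    }

    public static void Main()
    {
        var marks = new Dictionary<string, int>();
        marks.Add("Ivan", 4);
        marks["Peter"] = 6;   // the indexer adds a missing key...
        marks["Peter"] = 5;   // ...and replaces the value of an existing one

        Console.WriteLine(MarkOrDefault(marks, "Peter"));  // 5
        Console.WriteLine(MarkOrDefault(marks, "Maria"));  // -1
        Console.WriteLine(marks.ContainsValue(4));         // True, but found by an O(n) scan
    }
}
```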

    Dictionary<string, int> studentsMarks = new Dictionary<string, int>();
    studentsMarks.Add("Ivan", 4);
    studentsMarks.Add("Peter", 6);
    studentsMarks.Add("Maria", 6);
    studentsMarks.Add("George", 5);

    int peterMark = studentsMarks["Peter"];
    Console.WriteLine("Peter's mark: {0}", peterMark);
    Console.WriteLine("Is Peter in the hash table: {0}",
        studentsMarks.ContainsKey("Peter"));

    Console.WriteLine("Students and grades:");
    foreach (var pair in studentsMarks)
    {
        Console.WriteLine("{0} --> {1}", pair.Key, pair.Value);
    }


    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount = new Dictionary<string, int>();

    // Drop the empty entries that ", " would otherwise produce
    string[] words = text.Split(
        new char[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
        {
            count = wordsCount[word] + 1;
        }
        wordsCount[word] = count;
    }

    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }


- Data structures can be nested, e.g. dictionary of lists: Dictionary<string, List<int>>

    static Dictionary<string, List<int>> studentGrades =
        new Dictionary<string, List<int>>();

    private static void AddGrade(string name, int grade)
    {
        if (!studentGrades.ContainsKey(name))
        {
            studentGrades[name] = new List<int>();
        }
        studentGrades[name].Add(grade);
    }


The SortedDictionary<TKey, TValue> Class

- SortedDictionary<TKey, TValue> implements the ADT "dictionary" as a self-balancing search tree
  - Elements are arranged in the tree ordered by key
  - Traversing the tree returns the elements in increasing order
  - Add / Find / Delete take O(log n) time, i.e. about log2(n) steps
- Use SortedDictionary<TKey, TValue> when you need the elements sorted by key
  - Otherwise use Dictionary<TKey, TValue>: it has better performance

    string text = "a text, some text, just some text";
    IDictionary<string, int> wordsCount = new SortedDictionary<string, int>();

    // Drop the empty entries that ", " would otherwise produce
    string[] words = text.Split(
        new char[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        int count = 1;
        if (wordsCount.ContainsKey(word))
        {
            count = wordsCount[word] + 1;
        }
        wordsCount[word] = count;
    }

    // Pairs are printed in increasing order of the key
    foreach (var pair in wordsCount)
    {
        Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
    }


Using Custom Key Classes in Dictionary<TKey, TValue> and SortedDictionary<TKey, TValue>

- Dictionary<TKey, TValue> relies on
  - Object.Equals(): for comparing the keys
  - Object.GetHashCode(): for calculating the hash codes of the keys
- SortedDictionary<TKey, TValue> relies on IComparable<T> for ordering the keys
- Built-in types like int, long, float, string and DateTime already implement Equals(), GetHashCode() and IComparable<T>
- Other types, when used as dictionary keys, should provide custom implementations

    public struct Point
    {
        public int X { get; set; }
        public int Y { get; set; }

        public override bool Equals(Object obj)
        {
            // "is" is already false for null, so no separate null check is needed
            if (!(obj is Point)) return false;
            Point p = (Point)obj;
            return (X == p.X) && (Y == p.Y);
        }

        public override int GetHashCode()
        {
            // Mix the high and low halves of X, then combine with Y
            return (X << 16 | X >> 16) ^ Y;
        }
    }

    public struct Point : IComparable<Point>
    {
        public int X { get; set; }
        public int Y { get; set; }

        public int CompareTo(Point otherPoint)
        {
            if (X != otherPoint.X)
            {
                return this.X.CompareTo(otherPoint.X);
            }
            else
            {
                return this.Y.CompareTo(otherPoint.Y);
            }
        }
    }
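The two key requirements can be combined in one type and tried against both dictionary classes. The sketch below uses a hypothetical struct named `GridPoint` (so it does not clash with the Point examples above); implementing `IEquatable<T>` in addition to overriding `Equals(object)` is an extra choice made here to avoid boxing, not something the slides require.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: equality + hashing for Dictionary, ordering for SortedDictionary.
public struct GridPoint : IEquatable<GridPoint>, IComparable<GridPoint>
{
    public int X { get; set; }
    public int Y { get; set; }

    public bool Equals(GridPoint other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is GridPoint && Equals((GridPoint)obj);
    }

    // Equal points must produce equal hash codes.
    public override int GetHashCode()
    {
        return (X << 16 | X >> 16) ^ Y;
    }

    // Order by X first, then by Y.
    public int CompareTo(GridPoint other)
    {
        return X != other.X ? X.CompareTo(other.X) : Y.CompareTo(other.Y);
    }
}

public static class CustomKeySketch
{
    public static void Main()
    {
        var names = new Dictionary<GridPoint, string>();
        names[new GridPoint { X = 1, Y = 2 }] = "A";
        // A second struct with equal fields finds the same entry.
        Console.WriteLine(names[new GridPoint { X = 1, Y = 2 }]); // A

        var sorted = new SortedDictionary<GridPoint, string>();
        sorted[new GridPoint { X = 2, Y = 0 }] = "C";
        sorted[new GridPoint { X = 1, Y = 5 }] = "B";
        sorted[new GridPoint { X = 1, Y = 2 }] = "A";
        foreach (var pair in sorted)
        {
            Console.Write(pair.Value); // visited in key order: by X, then by Y
        }
        Console.WriteLine(); // ABC
    }
}
```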

Sets of Elements

- The abstract data type (ADT) "set" keeps a set of elements with no duplicates
  - Sets with duplicates are also known as ADT "bag"
- Set operations:
  - Add(element)
  - Contains(element) → true / false
  - Delete(element)
  - Union(set) / Intersect(set)
- Sets can be implemented in several ways
  - List, array, hash table, balanced tree, ...


- HashSet<T> implements the ADT set by a hash table
  - Elements are in no particular order
- All major operations are fast:
  - Add(element): adds an element to the set
    - Does nothing if the element already exists
  - Remove(element): removes the given element
  - Count: returns the number of elements
  - UnionWith(set) / IntersectWith(set): performs union / intersection with another set

    ISet<string> firstSet = new HashSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new HashSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });
    ISet<string> union = new HashSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // SQL Java C# PHP Oracle MySQL (the order is not guaranteed)

    private static void PrintSet<T>(ISet<T> set)
    {
        foreach (var element in set)
        {
            Console.Write("{0} ", element);
        }
        Console.WriteLine();
    }
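Intersection works the same way as union. A minimal sketch follows, in which the helper `Intersection` is a hypothetical wrapper that copies the first set because `IntersectWith` modifies the set it is called on.

```csharp
using System;
using System.Collections.Generic;

public static class SetOperationsSketch
{
    // Hypothetical helper: returns a new set and leaves both inputs unchanged.
    public static HashSet<string> Intersection(ISet<string> first, ISet<string> second)
    {
        var result = new HashSet<string>(first); // copy, since IntersectWith mutates
        result.IntersectWith(second);
        return result;
    }

    public static void Main()
    {
        var first = new HashSet<string> { "SQL", "Java", "C#", "PHP" };
        var second = new HashSet<string> { "Oracle", "SQL", "MySQL" };

        var common = Intersection(first, second);
        Console.WriteLine(common.Count);            // 1
        Console.WriteLine(common.Contains("SQL"));  // True

        // Add returns false when the element is already present.
        Console.WriteLine(first.Add("Java"));       // False
        Console.WriteLine(first.Count);             // 4
    }
}
```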

- SortedSet<T> implements the ADT set by a balanced search tree (red-black tree)
  - Elements are sorted in increasing order
- Example:

    ISet<string> firstSet = new SortedSet<string>(
        new string[] { "SQL", "Java", "C#", "PHP" });
    ISet<string> secondSet = new SortedSet<string>(
        new string[] { "Oracle", "SQL", "MySQL" });
    ISet<string> union = new SortedSet<string>(firstSet);
    union.UnionWith(secondSet);
    PrintSet(union); // C# Java MySQL Oracle PHP SQL


- Dictionaries map keys to values
  - Can be implemented as a hash table or a balanced search tree
- Hash tables map keys to values
  - Rely on hash functions to distribute the keys in the table
  - Collisions need a resolution algorithm (e.g. chaining)
  - Very fast add / find / delete: O(1) on average
- Sets hold a group of elements
  - Hash table or balanced tree implementations

Questions?

1. Write a program that counts the number of occurrences of each value in a given array of double values. Use Dictionary<TKey, TValue>. Example:

   array = { 3, 4, 4, -2.5, 3, 3, 4, 3, -2.5 }
   -2.5 → 2 times
   3 → 4 times
   4 → 3 times

2. Write a program that extracts from a given sequence of strings all elements that occur in it an odd number of times. Example:

   { C#, SQL, PHP, PHP, SQL, SQL } → { C#, SQL }

3. Write a program that counts how many times each word from a given text file words.txt appears in it. Character casing differences should be ignored. The result words should be ordered by their number of occurrences in the text. Example:

   This is the TEXT. Text, text, text – THIS TEXT! Is this the text?

   is → 2
   the → 2
   this → 3
   text → 6

4. Implement the data structure "hash table" in a class HashTable<K, T>. Keep the data in an array of lists of key-value pairs (LinkedList<KeyValuePair<K, T>>[]) with an initial capacity of 16. When the hash table load exceeds 75%, resize to 2 times larger capacity. Implement the following methods and properties: Add(key, value), Find(key) → value, Remove(key), Count, Clear(), this[], Keys. Try to make the hash table support iterating over its elements with foreach.

5. Implement the data structure "set" in a class HashedSet<T>, using your class HashTable<K, T> to hold the elements. Implement all standard set operations like Add(T), Find(T), Remove(T), Count, Clear(), union and intersect.

6. * A text file phones.txt holds information about people, their town and phone number:

   Mimi Shmatkata | Plovdiv |
   Kireto | Varna |
   Daniela Ivanova Petrova | Karnobat |
   Bat Gancho | Sofia |

   Duplicates can occur in people names, towns and phone numbers. Write a program to read the phones file and execute a sequence of commands given in the file commands.txt:
   - find(name): display all matching records by given name (first, middle, last or nickname)
   - find(name, town): display all matching records by given name and town

- C# Telerik Academy: csharpfundamentals.telerik.com
- Telerik Software Academy: academy.telerik.com
- Telerik Facebook: facebook.com/TelerikAcademy
- Telerik Software Academy Forums: forums.academy.telerik.com