1 Chapter 7 Skip Lists and Hashing Part 2: Hashing.

2 Sorted Linear Lists
For formula-based implementation:
– Insert: O(n) comparisons and data moves
– Delete: O(n) comparisons and data moves
– Search: O(log n) comparisons
For chained implementation:
– Insert: O(n) comparisons
– Delete: O(n) comparisons
– Search: O(n) comparisons

3 Sorted Chain

7 Dictionary
A dictionary is a collection of elements; each element has a field called the key, and keys are unique.
Operations:
– Insert an element with a specified key value
– Search the dictionary for an element with a specified key value
– Delete an element with a specified key value
The access mode for elements in a dictionary is random access (or direct access): any element may be retrieved by performing a search on its key.

8 Dictionary

9 Ideal hashing
– Hash table: table used to store elements
– Hash function: function that maps keys to positions: k => f(k)
– Search for an element with key k: if position f(k) is not empty, found; otherwise, failed
– Insert: position f(k) must be empty
– Delete: position f(k) cannot be empty

10 Example: Student record dictionary
– Use the student ID (a 6-digit number) as the key
– ID range and f(k) = k
– Table size: 1001, i.e. ht[ ]
– ht[i].key = 0 indicates an empty entry

11 Evaluation: Ideal Hashing
– Initialize an empty dictionary: Θ(b), where b is the size of the table
– Search, insert, and delete: Θ(1)
– Property: each key maps to its own position
– Problem: the range of the keys may be very large, resulting in a large hash table; e.g. if the key is a 9-digit integer (such as an SSN), the size of the table will be 10^9
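
A minimal sketch of ideal hashing as described on the slides above, assuming keys are small non-negative integers. The class name `DirectAddressTable` is hypothetical, and `None` marks an empty entry here instead of the slides' `key = 0` convention.

```python
class DirectAddressTable:
    """Ideal hashing: the element with key k lives at position f(k)."""

    def __init__(self, size, f=lambda k: k):
        self.f = f                  # hash function; identity by default
        self.ht = [None] * size     # Theta(b) initialization, None = empty

    def search(self, k):
        """Return the element stored under k, or None if position f(k) is empty."""
        entry = self.ht[self.f(k)]
        return entry[1] if entry is not None else None

    def insert(self, k, element):
        """Store (k, element); position f(k) must be empty."""
        pos = self.f(k)
        if self.ht[pos] is not None:
            raise KeyError(f"position {pos} is already occupied")
        self.ht[pos] = (k, element)

    def delete(self, k):
        """Remove and return the element under k; position f(k) cannot be empty."""
        pos = self.f(k)
        if self.ht[pos] is None:
            raise KeyError(f"no element with key {k}")
        element = self.ht[pos][1]
        self.ht[pos] = None
        return element
```

Every operation touches exactly one position, which is where the Θ(1) bound comes from; the cost is a table as large as the key range.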

12 Hashing with linear open addressing
Used when the size of the hash table (D) is smaller than the key range
– f(k) = k % D
– Positions in the hash table are indexed 0..D-1
– bucket: a position in the hash table
– If key values are not of an integral type, they need to be converted to integers first
– collision: two different keys k1 and k2 map into the same bucket, i.e. f(k1) = f(k2)
– home bucket: the position numbered f(k) is the home bucket for k
– In general a bucket may contain space for more than one element. An overflow occurs if there is no room in the home bucket for the new element.
– If a bucket has space for only one element, collision and overflow are the same.

13 Collision, overflow and linear open addressing
80, 58, and 35 map into home bucket ht[3]. In case of collision, insert in the next available bucket in sequence.

14 Search
To search for an element with key k, begin at bucket f(k) and continue through successive buckets, regarding the table as circular, until:
– a bucket containing an element with key k is found (successful)
– an empty bucket is reached (unsuccessful)
– the search returns to the home bucket (unsuccessful)
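
The search rule above, together with "insert in the next available bucket", can be sketched as follows. This is an illustration, not the class from the slides: the names `LinearProbingTable` and `_h_search` are hypothetical (the latter loosely mirrors the `hSearch` slide title), one element per bucket is assumed, and the divisor D = 11 in the usage lines is an assumed value.

```python
class LinearProbingTable:
    """Hash table with linear open addressing, one element per bucket."""

    def __init__(self, D):
        self.D = D
        self.ht = [None] * D        # None marks an empty bucket

    def _h_search(self, k):
        """Return the bucket holding k, else the first empty bucket reached,
        else the home bucket if the scan wraps all the way around."""
        home = k % self.D
        j = home
        while True:
            if self.ht[j] is None or self.ht[j][0] == k:
                return j
            j = (j + 1) % self.D    # treat the table as circular
            if j == home:
                return j

    def search(self, k):
        """Return the element with key k, or None if it is absent."""
        entry = self.ht[self._h_search(k)]
        if entry is None or entry[0] != k:
            return None
        return entry[1]

    def insert(self, k, element):
        """Place (k, element) in the first available bucket from f(k) onward."""
        j = self._h_search(k)
        if self.ht[j] is None:
            self.ht[j] = (k, element)
        elif self.ht[j][0] == k:
            raise KeyError(f"duplicate key {k}")
        else:
            raise OverflowError("hash table is full")


table = LinearProbingTable(11)          # D = 11 is an assumed divisor
for key in (80, 40, 65, 58):
    table.insert(key, str(key))
# 80 % 11 == 58 % 11 == 3, so 58 overflows into bucket 4.
```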

15 Deletion
After a deletion, successive elements must be moved until:
– an empty bucket is reached, or
– the scan returns to the bucket from which the deletion took place
To improve performance, use a NeverUsed field. Reorganization may be needed when many buckets have their NeverUsed field set to false.
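
One way to read the NeverUsed idea is sketched below, under assumptions: every bucket carries a flag that starts out true and is set to false on first use, a deletion empties the bucket without resetting the flag, and an unsuccessful search stops only at a bucket whose flag is still true. The class and method names are hypothetical.

```python
class NeverUsedTable:
    """Linear open addressing where deletion moves no elements."""

    def __init__(self, D):
        self.D = D
        self.ht = [None] * D
        self.never_used = [True] * D    # False once a bucket has held an element

    def _find(self, k):
        """Return the bucket holding key k, or -1 if k is absent."""
        home = k % self.D
        j = home
        while True:
            if self.never_used[j]:      # nothing was ever stored past this point
                return -1
            if self.ht[j] is not None and self.ht[j][0] == k:
                return j
            j = (j + 1) % self.D
            if j == home:
                return -1

    def search(self, k):
        j = self._find(k)
        return None if j == -1 else self.ht[j][1]

    def insert(self, k, element):
        if self._find(k) != -1:
            raise KeyError(f"duplicate key {k}")
        home = k % self.D
        j = home
        while self.ht[j] is not None:   # vacated buckets can be reused
            j = (j + 1) % self.D
            if j == home:
                raise OverflowError("hash table is full")
        self.ht[j] = (k, element)
        self.never_used[j] = False

    def delete(self, k):
        j = self._find(k)
        if j == -1:
            raise KeyError(f"no element with key {k}")
        element = self.ht[j][1]
        self.ht[j] = None               # never_used[j] deliberately stays False
        return element
```

As more flags turn false, unsuccessful searches scan further before stopping, which is why the slide mentions occasional reorganization.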

16 Class definition

17 Constructor

18 hSearch

19 Search

20 Insert

21 Performance analysis
– b: the number of buckets in the hash table, b = D
– initialization: Θ(b)
– worst-case insert and search: Θ(n), where n is the number of elements in the table
– the worst case happens when all n keys have the same home bucket

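22 Performance analysis (continued)
Average performance
Let α = n/b denote the loading factor, and let U_n and S_n be the average number of buckets examined during an unsuccessful and a successful search, respectively. Then
$U_n \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$
$S_n \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$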

23 Performance analysis (continued)
The performance of hashing with linear open addressing is good while the table is not too full:
– when α = 0.5 (table is half full), U_n = 2.5 and S_n = 1.5
– when α = 0.9 (table is 90% full), U_n = 50.5 and S_n = 5.5
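
The figures above match the standard approximations for linear probing, U_n ≈ ½(1 + 1/(1 − α)²) and S_n ≈ ½(1 + 1/(1 − α)). A small sketch that evaluates them (the function name is hypothetical):

```python
def linear_probing_costs(alpha):
    """Approximate buckets examined (unsuccessful, successful) at load alpha."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    u_n = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    s_n = 0.5 * (1 + 1 / (1 - alpha))
    return u_n, s_n
```

Evaluating it at α = 0.5 and α = 0.9 reproduces the slide's values up to floating-point rounding.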

24 Determining D
D should be either a prime number or have no prime factors less than 20.
Two methods. First method:
– begin with the largest possible value for b
– then find the largest D (<= b) that is either a prime or has no factors smaller than 20
– e.g., when b = 530, then D = 23*23 = 529

25 Determining D
Second method:
– determine the acceptable U_n and S_n
– estimate n
– determine α
– determine the smallest b for the above α
– determine the smallest integer D >= b that is either prime or has no factor smaller than 20
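
Both methods end with a search for a suitable divisor. A sketch of that search, with hypothetical helper names, using plain trial division:

```python
SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)   # every prime below 20


def is_prime(d):
    """Trial-division primality test; adequate for table-sized numbers."""
    if d < 2:
        return False
    i = 2
    while i * i <= d:
        if d % i == 0:
            return False
        i += 1
    return True


def is_good_divisor(d):
    """True if d is prime or has no prime factor smaller than 20."""
    return is_prime(d) or all(d % p != 0 for p in SMALL_PRIMES)


def largest_divisor_at_most(b):
    """First method: the largest acceptable D with D <= b."""
    for d in range(b, 1, -1):
        if is_good_divisor(d):
            return d
    raise ValueError("b must be at least 2")


def smallest_divisor_at_least(b):
    """Second method: the smallest acceptable D with D >= b."""
    d = max(b, 2)
    while not is_good_divisor(d):
        d += 1
    return d
```

For b = 530 the first function returns 529 = 23*23, matching the example for the first method.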

26 Determining D
Example: n = 1000, S ≤ 4 and U ≤ 50.5
– S = 4 ==> α = 6/7
– U = 50.5 ==> α = 0.9
– α = min(6/7, 0.9) = 6/7
– b = n/α = 7000/6 ≈ 1167
– note: 1171 is the smallest prime >= 1167 ==> select D = b = 1171
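
The first steps of this example amount to inverting the linear-probing approximations S ≈ ½(1 + 1/(1 − α)) and U ≈ ½(1 + 1/(1 − α)²) for α. A sketch with hypothetical function names, covering the steps up to the bucket count b:

```python
import math


def alpha_for_successful(s_max):
    """Largest alpha with S <= s_max, from S = (1 + 1/(1 - alpha)) / 2."""
    return 1 - 1 / (2 * s_max - 1)


def alpha_for_unsuccessful(u_max):
    """Largest alpha with U <= u_max, from U = (1 + 1/(1 - alpha)**2) / 2."""
    return 1 - 1 / math.sqrt(2 * u_max - 1)


def buckets_needed(n, s_max, u_max):
    """Return (alpha, b): the tighter load limit and the smallest b meeting it."""
    alpha = min(alpha_for_successful(s_max), alpha_for_unsuccessful(u_max))
    return alpha, math.ceil(n / alpha)


alpha, b = buckets_needed(1000, 4, 50.5)
# alpha is about 6/7 and b is 1167, as in the example above.
```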

27 Hashing with Chains

28 Implementations

29 An improved implementation

30 Comparison with Linear Open Addressing
Space complexity
– Let s be the space required by an element
– Let b and n denote the number of buckets and number of elements, respectively
– Linear open addressing: b(s+2) bytes (2 bytes per bucket for its entry in the empty array)
– Chaining: 2b+2n+ns bytes
– When n < bs/(s+2), chaining takes less space
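
The break-even point follows from 2b + 2n + ns < b(s + 2), which simplifies to n(s + 2) < bs. A short sketch that checks it numerically, using the slide's byte counts (function names are hypothetical):

```python
def open_addressing_bytes(b, s):
    """Space for linear open addressing: b buckets of s + 2 bytes each."""
    return b * (s + 2)


def chaining_bytes(b, n, s):
    """Space for chaining: 2 bytes per bucket plus s + 2 bytes per element."""
    return 2 * b + 2 * n + n * s


def chaining_is_smaller(b, n, s):
    """True when chaining needs strictly less space than open addressing."""
    return chaining_bytes(b, n, s) < open_addressing_bytes(b, s)
```

With b = 1000 and s = 10 the threshold is bs/(s + 2) ≈ 833 elements: chaining is smaller for n = 800 and larger for n = 900.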

31 Search time complexity
Worst case: n nodes examined, which occurs when all elements map to the same bucket (equal to that of linear open addressing)
Average:
– the average length of a chain is α = n/b
– average number of nodes examined in an unsuccessful search: if a chain has i nodes, the search may take 1, 2, 3, …, i examinations. Assuming equal probability, the average search time is
$\frac{1}{i}\sum_{j=1}^{i} j = \frac{i+1}{2}$

32 Search time complexity (continued)
– If α = 0, U_n = 0
– If α < 1, U_n <= α
– If α >= 1, $U_n \approx \frac{1+\alpha}{2}$

33 Average time complexity for successful search
– We need to know the expected distance of each of the n elements from the head of its chain
– Without loss of generality, assume elements are inserted into the chain in increasing order
– When the ith element is inserted, the expected length of its chain is (i-1)/b, and the ith element is added at the end of the chain
– A search for this element will require examination of 1+(i-1)/b nodes
– Assuming the n elements are searched for with equal probability,
$S_n = \frac{1}{n}\sum_{i=1}^{n}\left(1 + \frac{i-1}{b}\right) = 1 + \frac{n-1}{2b} \approx 1 + \frac{\alpha}{2}$

34 Comparison with linear open addressing
The expected performance of chaining is superior, e.g., when α = 0.9:
– Chaining: U_n = 0.9, S_n = 1.45
– Linear open addressing: U_n = 50.5, S_n = 5.5
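
The chaining figures can be reproduced with the approximations S_n ≈ 1 + α/2 and, for unsuccessful searches, the bound U_n ≤ α when α < 1 and U_n ≈ (1 + α)/2 when α ≥ 1. The sketch below uses the bound itself as the estimate for α < 1, which is an assumption made to match the slide's 0.9; the function name is hypothetical.

```python
def chaining_costs(alpha):
    """Approximate nodes examined (unsuccessful, successful) for chaining."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if alpha < 1:
        u_n = alpha                 # upper bound used as the estimate
    else:
        u_n = (1 + alpha) / 2
    s_n = 1 + alpha / 2
    return u_n, s_n
```

Unlike linear open addressing, these expressions stay small as α approaches 1, and chaining keeps working for α > 1.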

35 Skip Lists

36 A sorted chain with head and tail nodes; pointers to the middle are added

37 Pointers to every second node

39 Skip List Implementation

50 An application
Text compression
– compressor: file coding
  run-length coding: 1000 xs followed by 2000 ys => 1000x2000y
  space needed: 3002 bytes (2 bytes for string ends) => 12 bytes
– decompressor: decoding
LZW Compression (Lempel, Ziv, and Welch)
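
The run-length coding example above can be sketched in a few lines. The function name is hypothetical, and the scheme assumes the text contains no digits, since otherwise the counts could not be told apart from the data.

```python
from itertools import groupby


def run_length_encode(text):
    """Replace each run of a repeated character with '<count><char>'."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))


encoded = run_length_encode("x" * 1000 + "y" * 2000)   # "1000x2000y"
```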

51 LZW Compression Try aaabbbbbbaabaaba encoded as:

52 Input/Output

53 Input/Output (continued)

54 Dictionary organization Use code to represent the prefix of key

55 Dictionary organization (continued)
– Assume each code is 12 bits long; hence there are at most 2^12 = 4096 codes
– Use a hash table with divisor D = 4099: ChainHashTable h(D)
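
A sketch of LZW compression with the dictionary organized as described above: each key is a pair (code of the prefix, next character). Several things here are assumptions rather than the slides' implementation: the function name, the use of a Python `dict` in place of `ChainHashTable`, and the initial code assignment a → 0, b → 1 for a two-letter alphabet.

```python
def lzw_compress(text, alphabet="ab", max_codes=4096):
    """Return the list of LZW codes for text.

    Keys of `table` are (prefix_code, next_char) pairs, so a string is never
    stored in full; max_codes = 4096 corresponds to 12-bit codes.
    """
    single = {ch: i for i, ch in enumerate(alphabet)}   # assumed initial codes
    table = {}
    next_code = len(alphabet)
    if not text:
        return []

    out = []
    pcode = single[text[0]]          # code of the longest prefix matched so far
    for ch in text[1:]:
        key = (pcode, ch)
        if key in table:
            pcode = table[key]       # extend the current match
        else:
            out.append(pcode)        # emit the code for the matched prefix
            if next_code < max_codes:
                table[key] = next_code
                next_code += 1
            pcode = single[ch]       # restart matching at this character
    out.append(pcode)
    return out


codes = lzw_compress("aaabbbbbbaabaaba")
```

Under these assumptions the sample string from the earlier slide compresses to the seven codes 0 2 1 4 5 3 7.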

56 Output of codes

57 Compression

58 Compression (continued)

59 Compression (continued)

60 Headers and Function main

61 Headers and Function main (continued)

62 LZW Decompression
– The dictionary is searched for an entry with a given code
– The first code in the compressed file corresponds to a single character
– For all other codes p, let q be the code that precedes p in the compressed file:
  Case 1: p is in the dictionary
  Case 2: p is not in the dictionary. This can only happen when text(p) = text(q)fc(q) and the current text segment is text(q)text(q)fc(q)
– In both cases the pair (next code, text(q)fc(p)) is entered into the dictionary, where fc(p) is the first character of text(p)
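
The two cases can be sketched as follows. As before, the function name and the initial codes a → 0, b → 1 are assumptions, and a plain `dict` from code to text stands in for the slides' dictionary class.

```python
def lzw_decompress(codes, alphabet="ab"):
    """Rebuild the text from a list of LZW codes."""
    text_of = {i: ch for i, ch in enumerate(alphabet)}  # assumed initial codes
    if not codes:
        return ""

    q = codes[0]                     # the first code is a single character
    out = [text_of[q]]
    for p in codes[1:]:
        if p in text_of:
            segment = text_of[p]                    # Case 1: p is known
        elif p == len(text_of):
            segment = text_of[q] + text_of[q][0]    # Case 2: text(q)fc(q)
        else:
            raise ValueError(f"invalid code {p}")
        out.append(segment)
        # Enter (next code, text(q)fc(p)); the next code is len(text_of).
        text_of[len(text_of)] = text_of[q] + segment[0]
        q = p
    return "".join(out)


decoded = lzw_decompress([0, 2, 1, 4, 5, 3, 7])
```

Case 2 arises three times in this input (codes 2, 4 and 5 each arrive before they have been entered), and the result is aaabbbbbbaabaaba.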

63 Try Decode the result should be aaabbbbbbaabaaba

64 Input/Output

65 Input/Output (continued)

66 Dictionary organization

67 Input of Code

68 Decompression

69 Decompression (continued)

70 Headers and Function main

71 Headers and Function main (continued)

72 End of Chapter 7