Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

1 11. Hash Tables Heejin Park College of Information and Communications Hanyang University.
1 Average Case Analysis of an Exact String Matching Algorithm Advisor: Professor R. C. T. Lee Speaker: S. C. Chen.
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 1 Chapter 2.
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Hash Tables.
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved Hash Tables,
Hashing.
HASH TABLE. HASH TABLE a group of people could be arranged in a database like this: Hashing is the transformation of a string of characters into a.
Space-for-Time Tradeoffs
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Hashing as a Dictionary Implementation
Chapter 7 Space and Time Tradeoffs. Space-for-time tradeoffs Two varieties of space-for-time algorithms: b input enhancement — preprocess the input (or.
1 CSC 421: Algorithm Design & Analysis Spring 2013 Space vs. time  space/time tradeoffs  examples: heap sort, data structure redundancy, hashing  string.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off (extra space in tables - breathing.
1 Foundations of Software Design Fall 2002 Marti Hearst Lecture 18: Hash Tables.
Liang, Introduction to Java Programming, Eighth Edition, (c) 2011 Pearson Education, Inc. All rights reserved Chapter 48 Hashing.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Design and Analysis of Algorithms - Chapter 71 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – insert.
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Chapter 7 Space and Time Tradeoffs James Gain & Sonia Berman
MA/CSSE 473 Day 24 Student questions Quadratic probing proof
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
Theory of Algorithms: Space and Time Tradeoffs James Gain and Edwin Blake {jgain | Department of Computer Science University of Cape.
Chapter 6 Transform-and-Conquer Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Application: String Matching By Rong Ge COSC3100
MA/CSSE 473 Day 27 Hash table review Intro to string searching.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
MA/CSSE 473 Day 23 Student questions Space-time tradeoffs Hash tables review String search algorithms intro.
Hashing as a Dictionary Implementation Chapter 19.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off: b extra space in tables (breathing.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
1 Chapter 7 Skip Lists and Hashing Part 2: Hashing.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
Design and Analysis of Algorithms – Chapter 71 Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu RAIK 283: Data Structures.
MA/CSSE 473 Day 25 Student questions Boyer-Moore.
CSC 143T 1 CSC 143 Highlights of Tables and Hashing [Chapter 11 p (Tables)] [Chapter 12 p (Hashing)]
CSG523/ Desain dan Analisis Algoritma
Chapter 27 Hashing Jung Soo (Sue) Lim Cal State LA.
MA/CSSE 473 Day 26 Student questions Boyer-Moore B Trees.
CSC 421: Algorithm Design & Analysis
Hash table CSC317 We have elements with key and satellite data
MA/CSSE 473 Day 24 Student questions Space-time tradeoffs
Hash functions Open addressing
Sorting.
CSCE350 Algorithms and Data Structure
Chapter 28 Hashing.
Space-for-time tradeoffs
Chapter 21 Hashing: Implementing Dictionaries and Sets
Dictionaries and Their Implementations
Chapter 7 Space and Time Tradeoffs
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Space-for-time tradeoffs
Advanced Implementation of Tables
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Space-for-time tradeoffs
Space-for-time tradeoffs
Space-for-time tradeoffs
Presentation transcript:

Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.

7-1 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Space-for-time tradeoffs Two varieties of space-for-time algorithms: b input enhancement preprocess the input (or its part) to store some info to be used later in solving the problem counting sorts (Ch. 7.1)counting sorts (Ch. 7.1) string searching algorithmsstring searching algorithms b prestructuring preprocess the input to make accessing its elements easier hashinghashing indexing schemes (e.g., B-trees)indexing schemes (e.g., B-trees)

7-2 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Review: String searching by brute force pattern: a string of m characters to search for text: a (long) string of n characters to search in Brute force algorithm Step 1Align pattern at beginning of text Step 2Moving from left to right, compare each character of pattern to the corresponding character in text until either all characters are found to match (successful search) or a mismatch is detected Step 3 While a mismatch is detected and the text is not yet exhausted, realign pattern one position to the right and repeat Step 2 Time complexity (worst-case): O(mn)

7-3 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 String searching by preprocessing Several string searching algorithms are based on the input enhancement idea of preprocessing the pattern b Knuth-Morris-Pratt (KMP) algorithm preprocesses pattern left to right to get useful information for later searching b Boyer -Moore algorithm preprocesses pattern right to left and store information into two tables b Horspools algorithm simplifies the Boyer-Moore algorithm by using just one table O(m+n) time in the worst case

7-4 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Horspools Algorithm A simplified version of Boyer-Moore algorithm: preprocesses pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurspreprocesses pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurs always makes a shift based on the texts character c aligned with the last compared (mismatched) character in the pattern according to the shift tables entry for calways makes a shift based on the texts character c aligned with the last compared (mismatched) character in the pattern according to the shift tables entry for c

7-5 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 How far to shift? Look at first (rightmost) character in text that was compared: b The character is not in the pattern.....c ( c not in pattern).....c ( c not in pattern) BAOBAB BAOBAB b The character is in the pattern (but not the rightmost).....O ( O occurs once in pattern) BAOBAB.....O ( O occurs once in pattern) BAOBAB.....A ( A occurs twice in pattern).....A ( A occurs twice in pattern) BAOBAB BAOBAB b The rightmost characters do match.....B B BAOBAB BAOBAB

7-6 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Shift table b Shift sizes can be precomputed by the formula distance from cs rightmost occurrence in pattern among its first m-1 characters to its right end distance from cs rightmost occurrence in pattern among its first m-1 characters to its right end t(c) = t(c) = patterns length m, otherwise patterns length m, otherwise by scanning pattern before search begins and stored in a table called shift table. After the shift, the right end of pattern is t(c) positions to the right of the last compared character in text. by scanning pattern before search begins and stored in a table called shift table. After the shift, the right end of pattern is t(c) positions to the right of the last compared character in text. Shift table is indexed by text and pattern alphabet Eg, for BAOBAB: Shift table is indexed by text and pattern alphabet Eg, for BAOBAB: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z {

7-7 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Example of Horspools algorithm BARD LOVED BANANAS BAOBAB BAOBAB BAOBAB BAOBAB (unsuccessful search) If k characters are matched before the mismatch, then the shift distance is d 1 = t(c) – k. Note that the shift could be negative! E.g. if text = …ABABAB... E.g. if text = …ABABAB... A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ 6 ……………..czyx………. …c.…bzyx …c….bzyx { } k t(c)

7-8 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Boyer-Moore algorithm Based on the same two ideas: comparing pattern characters to text from right to leftcomparing pattern characters to text from right to left precomputing shift sizes in two tablesprecomputing shift sizes in two tables –bad-symbol table indicates how much to shift based on texts character causing a mismatch –good-suffix table indicates how much to shift based on matched part (suffix) of the pattern (taking advantage of the periodic structure of the pattern)

7-9 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Bad-symbol shift in Boyer-Moore algorithm b If the rightmost character of the pattern doesnt match, BM algorithm acts as Horspools b If the rightmost character of the pattern does match, BM compares preceding characters right to left until either all patterns characters match or a mismatch on texts character c is encountered after k > 0 matches textpattern bad-symbol shift d 1 =max{t(c ) - k, 1} bad-symbol shift d 1 = max{t(c ) - k, 1} c k matches

7-10 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Good-suffix shift in Boyer-Moore algorithm b Good-suffix shift d 2 is applied after 0 < k < m last characters were matched d 2 (k) = the distance between (the last letter of) the matched suffix of size k and (the last letter of ) its rightmost occurrence in the pattern that is not preceded by the same character preceding the suffix Example: CABABA d 2 (1) = 4 d 2 (k) = the distance between (the last letter of) the matched suffix of size k and (the last letter of ) its rightmost occurrence in the pattern that is not preceded by the same character preceding the suffix Example: CABABA d 2 (1) = 4 If there is no such occurrence, match the longest part (tail) of the k-character suffix with corresponding prefix; if there are no such suffix-prefix matches, d 2 (k) = m Example: WOWWOW d 2 (2) = 5, d 2 (3) = 3, d 2 (4) = 3, d 2 (5) = 3 If there is no such occurrence, match the longest part (tail) of the k-character suffix with corresponding prefix; if there are no such suffix-prefix matches, d 2 (k) = m Example: WOWWOW d 2 (2) = 5, d 2 (3) = 3, d 2 (4) = 3, d 2 (5) = ………………czyx………..…azyx….bzyx ….azyx.…bzyx } d 2 (k) k { ………………czyx………. yx….bzyx yx.…bzyx } d 2 (k) k {

7-11 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Boyer-Moore Algorithm After matching successfully 0 < k < m characters, the algorithm shifts the pattern right by d = max {d 1, d 2 } d = max {d 1, d 2 } where d 1 =max{t(c) - k, 1} is bad-symbol shift where d 1 = max{t(c) - k, 1} is bad-symbol shift d 2 (k) is good-suffix shift d 2 (k) is good-suffix shift Example: Find pattern AT _ THAT in WHICH _ FINALLY _ HALTS. _ _ AT _ THAT WHICH _ FINALLY _ HALTS. _ _ AT _ THAT t A H T _ ? d | AT_THAT AT_THAT | | | | | AT_THAT AT_THAT d1 = 7-1 = 6 d1 = 7-1 = 6 | | | | | | AT_THAT AT_THAT d1 = 4 -2 = 2 d1 = 4 -2 = 2 | | | | | | | | | | | | | | AT_THAT AT_THAT

7-12 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Boyer-Moore Algorithm (cont.) Step 1 Fill in the bad-symbol shift table Step 2 Fill in the good-suffix shift table Step 3 Align the pattern against the beginning of the text Step 4 Repeat until a matching substring is found or text ends: Compare the corresponding characters right to left. Compare the corresponding characters right to left. If no characters match, retrieve entry t 1 (c) from the bad-symbol table for the texts character c causing the mismatch and shift the pattern to the right by t 1 (c). If 0 < k < m characters are matched, retrieve entry t 1 (c) from the bad-symbol table for the texts character c causing the mismatch and entry d 2 (k) from the good- suffix table and shift the pattern to the right by If no characters match, retrieve entry t 1 (c) from the bad-symbol table for the texts character c causing the mismatch and shift the pattern to the right by t 1 (c). If 0 < k < m characters are matched, retrieve entry t 1 (c) from the bad-symbol table for the texts character c causing the mismatch and entry d 2 (k) from the good- suffix table and shift the pattern to the right by d = max {d 1, d 2 } where d 1 =max{t 1 (c) - k, 1}. d = max {d 1, d 2 } where d 1 = max{t 1 (c) - k, 1}.

7-13 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Example of Boyer-Moore alg. application B E S S _ K N E W _ A B O U T _ B A O B A B S B E S S _ K N E W _ A B O U T _ B A O B A B S B A O B A B B A O B A B d 1 = t( K ) = 6 B A O B A B d 1 = t( K ) = 6 B A O B A B d 1 = t( _ )-2 = 4 d 1 = t( _ )-2 = 4 d 2 (2) = 5 d 2 (2) = 5 B A O B A B B A O B A B d 1 = t( _ )-1 = 5 d 1 = t( _ )-1 = 5 d 2 (1) = 2 d 2 (1) = 2 B A O B A B (success) B A O B A B (success) A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ 6 kpatternd2d2 1 BAOBAB Worst-case time complexity: O(n+m).

7-14 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – find – insert – delete b Based on representation-change and space-for-time tradeoff ideas b Important applications: – symbol tables – databases (extendible hashing)

7-15 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Hash tables and hash functions The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function, h: K location (cell) in the hash table h: K location (cell) in the hash table Example: student records, key = SSN. Hash function: h(K) = K mod m where m is some integer (typically, prime) If m = 1000, where is record with SSN= stored? Generally, a hash function should: be easy to computebe easy to compute distribute keys about evenly throughout the hash tabledistribute keys about evenly throughout the hash table

7-16 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Collisions If h(K 1 ) = h(K 2 ), there is a collision If h(K 1 ) = h(K 2 ), there is a collision b Good hash functions result in fewer collisions but some collisions should be expected (birthday paradox) b Two principal hashing schemes handle collisions differently : Open hashing – each cell is a header of linked list of all keys hashed to itOpen hashing – each cell is a header of linked list of all keys hashed to it Closed hashingClosed hashing –one key per cell –in case of collision, finds another cell by –linear probing: use next free bucket – double hashing: use second hash function to compute increment

7-17 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Open hashing (Separate chaining) Keys are stored in linked lists outside a hash table whose elements serve as the lists headers. Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED h(K) = sum of Ks letters positions in the alphabet MOD 13 KeyAFOOLANDHISMONEYARESOONPARTED h(K)h(K)h(K)h(K) AFOOLANDHISMONEYAREPARTED SOON Search for KID

7-18 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Open hashing (cont.) If hash function distributes keys uniformly, average length of linked list will be α = n/m. This ratio is called load factor. If hash function distributes keys uniformly, average length of linked list will be α = n/m. This ratio is called load factor. b For ideal hash functions, the average numbers of probes in successful, S, and unsuccessful searches, U: S 1+ α /2, U = α (CLRS, Ch. 11) S 1+ α /2, U = α (CLRS, Ch. 11) Load α is typically kept small (ideally, about 1) Load α is typically kept small (ideally, about 1) b Open hashing still works if n > m

7-19 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Closed hashing (Open addressing) Keys are stored inside a hash table. A AFOOL AANDFOOL AANDFOOLHIS AANDMONEYFOOLHIS AANDMONEYFOOLHISARE AANDMONEYFOOLHISARESOON PARTEDAANDMONEYFOOLHISARESOON KeyAFOOLANDHISMONEYARESOONPARTED h(K)h(K)h(K)h(K)

7-20 Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch. 7 Closed hashing (cont.) b Does not work if n > m b Avoids pointers b Deletions are not straightforward Number of probes to find/insert/delete a key depends on load factor α = n/m (hash table density) and collision resolution strategy. For linear probing: Number of probes to find/insert/delete a key depends on load factor α = n/m (hash table density) and collision resolution strategy. For linear probing: S = (½) (1+ 1/(1- α )) and U = (½) (1+ 1/(1- α )²) As the table gets filled ( α approaches 1), number of probes in linear probing increases dramatically: As the table gets filled ( α approaches 1), number of probes in linear probing increases dramatically: