Space-for-time tradeoffs

Slides:



Advertisements
Similar presentations
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Advertisements

Space-for-Time Tradeoffs
Chapter 7 Space and Time Tradeoffs. Space-for-time tradeoffs Two varieties of space-for-time algorithms: b input enhancement — preprocess the input (or.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off (extra space in tables - breathing.
Hashing CS 3358 Data Structures.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Design and Analysis of Algorithms - Chapter 71 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – insert.
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
String Matching. Problem is to find if a pattern P[1..m] occurs within text T[1..n] Simple solution: Naïve String Matching –Match each position in the.
Chapter 7 Space and Time Tradeoffs James Gain & Sonia Berman
MA/CSSE 473 Day 24 Student questions Quadratic probing proof
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
Theory of Algorithms: Space and Time Tradeoffs James Gain and Edwin Blake {jgain | Department of Computer Science University of Cape.
Chapter 6 Transform-and-Conquer Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Application: String Matching By Rong Ge COSC3100
MA/CSSE 473 Day 27 Hash table review Intro to string searching.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
MA/CSSE 473 Day 23 Student questions Space-time tradeoffs Hash tables review String search algorithms intro.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off: b extra space in tables (breathing.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Design and Analysis of Algorithms – Chapter 71 Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu RAIK 283: Data Structures.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
CSG523/ Desain dan Analisis Algoritma
Hashing, Hash Function, Collision & Deletion
Hash table CSC317 We have elements with key and satellite data
Hashing CSE 2011 Winter July 2018.
Slides by Steve Armstrong LeTourneau University Longview, TX
What is an algorithm? An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate.
What is an algorithm? An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate.
CS 332: Algorithms Hash Tables David Luebke /19/2018.
MA/CSSE 473 Day 24 Student questions Space-time tradeoffs
Review Graph Directed Graph Undirected Graph Sub-Graph
Hash functions Open addressing
Sorting.
Advanced Associative Structures
CSCE350 Algorithms and Data Structure
Space-for-time tradeoffs
What is an algorithm? An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate.
Hash Tables.
Dictionaries and Their Implementations
CSCE 3110 Data Structures & Algorithm Analysis
Chapter 7 Space and Time Tradeoffs
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Advanced Implementation of Tables
Database Design and Programming
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Space-for-time tradeoffs
EE 312 Software Design and Implementation I
CSC 380: Design and Analysis of Algorithms
Space-for-time tradeoffs
Space-for-time tradeoffs
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Hashing.
CS 3343: Analysis of Algorithms
Analysis and design of algorithm
DATA STRUCTURES-COLLISION TECHNIQUES
EE 312 Software Design and Implementation I
Collision Resolution: Open Addressing Extendible Hashing
Lecture-Hashing.
What is an algorithm? An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate.
Presentation transcript:

Space-for-time tradeoffs Two varieties of space-for-time algorithms: input enhancement — preprocess the input (or its part) to store some info to be used later in solving the problem counting sorts string searching algorithms prestructuring — preprocess the input to make accessing its elements easier hashing indexing schemes (e.g., B-trees) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Sorting by Counting - Comparison Idea: count the total number of elements smaller than the key; use this number as the position in the sorted list A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Comparison Counting Sort Analysis Time efficiency: C(n) = Advantage: make the minimum number of key moves Work productively when elements to be sorted belong to a small set of values Disadvantage: time complexity O(n2) & space complexity O(n)  i=0 n-2 1 = j=i+1 n-1  [(n-1) – (i+1) + 1] = i=0 n-2  i=0 n-2   A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Sorting by Counting – Distribution Example: Sort the array: 13 11 12 13 12 12 Idea: the distribution values indicates the proper positions in the sorted array. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Sorting by Counting – Distribution Efficiency: O(n) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Review: String searching by brute force pattern: a string of m characters to search for text: a (long) string of n characters to search in Brute force algorithm Step 1 Align pattern at beginning of text Step 2 Moving from left to right, compare each character of pattern to the corresponding character in text until either all characters are found to match (successful search) or a mismatch is detected Step 3 While a mismatch is detected and the text is not yet exhausted, realign pattern one position to the right and repeat Step 2 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Examples of Brute-Force String Matching Pattern: 001011 Text: 10010101101001100101111010 Pattern: happy Text: It is never too late to have a happy childhood. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 3 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Review: String searching by brute force Time efficiency: Worse case: O(nm) Average case: O(n + m) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

String searching by preprocessing Several string searching algorithms are based on the input enhancement idea of preprocessing the pattern Knuth-Morris-Pratt (KMP) algorithm preprocesses pattern left to right to get useful information for later searching Boyer -Moore algorithm preprocesses pattern right to left and store information into two tables Horspool’s algorithm simplifies the Boyer-Moore algorithm by using just one table A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Horspool’s Algorithm A simplified version of Boyer-Moore algorithm: preprocesses pattern to generate a shift table that determines how much to shift the pattern when a mismatch occurs always makes a shift based on the text’s character c aligned with the last character in the pattern according to the shift table’s entry for c A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

How far to shift? Look at first (rightmost) character in text that was compared: The character is not in the pattern .....c...................... (c not in pattern) BAOBAB The character is in the pattern (but not the rightmost) .....O...................... (O occurs once in pattern) BAOBAB .....A...................... (A occurs twice in pattern) The rightmost characters do match .....B...................... A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Shift table Shift sizes can be precomputed by the formula distance from c’s rightmost occurrence in pattern among its first m-1 characters to its right end t(c) = pattern’s length m, otherwise by scanning pattern before search begins and stored in a table called shift table Shift table is indexed by text and pattern alphabet Eg, for BAOBAB: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Example of Horspool’s alg. application BARD LOVED BANANAS BAOBAB BAOBAB (unsuccessful search) A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 _ 6 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Example of Horspool’s alg. application JIM_SAW_ME_IN_A_BARBERSHOP BARBER BARBER (successful search) A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 4 2 6 6 1 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 _ 6 A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Horspool’s Algorithm Time efficiency: Worse case: O(nm) Average case: O(n) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Hashing A very efficient method for implementing a dictionary, i.e., a set with the operations: find insert delete Based on representation-change and space-for-time tradeoff ideas Important applications: symbol tables databases (extendible hashing) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Hashing Symbol tables A compiler uses a symbol table to relate symbols to associated data Symbols: variable names, procedure names, etc. Associated data: memory location, call graph, etc. We typically don’t care about sorted order Hashing is based on the idea distributing keys among a one-dimensional array H[0 … m - 1] called a hash table.

Direct Addressing Suppose: The idea: The range of keys is 0..m-1; Keys are distinct The idea: Set up an array H[0..m-1] in which H[i] = x if x T and key[x] = i H[i] = NULL otherwise This is called a direct-address table Operations take O(1) time! H

Direct Addressing Direct addressing works well when the range m of keys is relatively small But what if the keys are 32-bit integers? Problem 1: direct-address table will have 232 entries, more than 4 billion Problem 2: even if memory is not an issue, actually stored maybe so small, so that most of the space allocated would be wasted Solution: map keys to smaller range 0..m-1 This mapping is called a hash function

Hash Functions H h(k1) k1 h(k4) k4 k5 h(k2) = h(k5) k2 h(k3) k3 m - 1 U (universe of keys) h(k1) k1 h(k4) K (actual keys) k4 k5 h(k2) = h(k5) k2 h(k3) k3 m - 1

Hash tables and hash functions The idea of hashing is to map keys of a given file of size n into a table of size m, called the hash table, by using a predefined function, called the hash function, h: K  location (cell) in the hash table Example: student records, key = SSN. Hash function: h(K) = K mod m where m is some integer (typically, prime) If m = 1000, where is record with SSN= 314159265 stored? Generally, a hash function should: be easy to compute distribute keys about evenly throughout the hash table A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Collisions If h(K1) = h(K2), there is a collision Good hash functions result in fewer collisions Two principal hashing schemes handle collisions differently: Open hashing – each cell is a header of linked list of all keys hashed to it Closed hashing one key per cell in case of collision, finds another cell by linear probing: use next free bucket double hashing: use second hash function to compute increment A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Open hashing (Separate chaining) Keys are stored in linked lists outside a hash table whose elements serve as the lists’ headers. Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED h(K) = sum of K ‘s letters’ positions in the alphabet MOD 13 Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 12 1 2 3 4 5 6 7 8 9 10 11 12 A AND MONEY FOOL HIS ARE PARTED SOON Search for KID A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Open hashing (cont.) If hash function distributes keys uniformly, average length of linked list will be α = n/m. This ratio is called load factor. Average number of probes in successful, S, and unsuccessful searches, U: S  1+α/2, U = α Load α is typically kept small (ideally, about 1) Time efficiency: Insertion and deletion: O(1) Search: O(1 + α) A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Closed hashing (Open addressing) Keys are stored inside a hash table. Key A FOOL AND HIS MONEY ARE SOON PARTED h(K) 1 9 6 10 7 11 12 1 2 3 4 5 6 7 8 9 10 11 12 A FOOL AND HIS MONEY ARE SOON PARTED A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Closed hashing (cont.) Does not work if n > m Deletion (lazy deletion) Mark previously occupied location by a special symbol to distinguish them from locations that have not been occupied. Number of probes to find/insert/delete a key depends on load factor α = n/m and collision resolution strategy. For linear probing: S = (½) (1+ 1/(1- α)) and U = (½) (1+ 1/(1- α)²) As the table gets filled (α approaches 1), number of probes in linear probing increases dramatically: A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.

Double hashing Use second hash function to determine a fixed increment for the probing sequence to be used after a collision Double hashing is superior to linear probing. Time complexity analysis is quite difficult. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 3rd ed., Ch. 7 ©2012 Pearson Education, Inc. Upper Saddle River, NJ. All Rights Reserved.