CSC 421: Algorithm Design & Analysis


CSC 421: Algorithm Design & Analysis
Spring 2014

Space vs. time
  space/time tradeoffs
    examples: heap sort, data structure redundancy, hashing
  string matching
    brute force, Horspool algorithm, Boyer-Moore algorithm

Space vs. time
for many applications, it is possible to trade memory for speed
  i.e., additional storage can allow for a more efficient algorithm
ArrayList wastes space when it expands (doubles in size)
  each expansion yields a list that is (slightly less than) half empty
  improves overall performance, since only O(log N) expansions are needed when adding N items
linked structures sacrifice space (reference fields) for improved performance
  e.g., a linked list takes twice the space, but provides O(1) add/remove from either end
hash tables can obtain O(1) add/remove/search if willing to waste space
  to minimize the chance of collisions, must keep the load factor low
  HashSet & HashMap resize (& rehash) if the load factor ever reaches 75%
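The "only log N expansions" claim can be checked with a small counting sketch. It assumes a pure doubling policy and a starting capacity of at least 1; the class and method names are hypothetical, and the growth policy of a real library list is an implementation detail that may differ.

```java
// Hypothetical sketch: counts how often a doubling array must expand.
class DoublingDemo {
    /**
     * Returns the number of expansions needed to add n items one at a time,
     * starting from initialCapacity (assumed >= 1) and doubling when full.
     */
    static int countExpansions(int n, int initialCapacity) {
        int capacity = initialCapacity;
        int expansions = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {   // array is full, so double it before adding
                capacity *= 2;
                expansions++;
            }
        }
        return expansions;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000, 1000000}) {
            System.out.println(n + " adds: " + countExpansions(n, 1) + " expansions");
        }
    }
}
```

Starting from capacity 1, adding 1,000,000 items triggers 20 expansions, which is about log2 of N.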

Heap sort
can utilize a heap to efficiently sort a list of values
  start with the ArrayList to be sorted
  construct a heap out of the elements
  repeatedly, remove min element and put back into the ArrayList

    public static <E extends Comparable<? super E>> void heapSort(ArrayList<E> items) {
        MinHeap<E> itemHeap = new MinHeap<E>();
        // build the heap: N insertions
        for (int i = 0; i < items.size(); i++) {
            itemHeap.add(items.get(i));
        }
        // copy back in sorted order: N removals of the minimum
        for (int i = 0; i < items.size(); i++) {
            items.set(i, itemHeap.minValue());
            itemHeap.remove();
        }
    }

N items in list, each insertion can require O(log N) swaps to reheapify
  → construct heap in O(N log N)
N items in heap, each removal can require O(log N) swaps to reheapify
  → copy back in O(N log N)
can get the same effect by copying the values into a TreeSet, then iterating (provided the values are distinct, since a TreeSet discards duplicates)
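The MinHeap class used on the slide is not shown in the transcript. As a runnable stand-in, the same two-phase idea can be sketched with java.util.PriorityQueue, which is a binary min-heap under natural ordering. The class name HeapSortDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: PriorityQueue is swapped in for the course's MinHeap class.
class HeapSortDemo {
    static <E extends Comparable<? super E>> void heapSort(List<E> items) {
        PriorityQueue<E> heap = new PriorityQueue<>();
        for (E item : items) {
            heap.add(item);                 // N insertions, O(log N) each
        }
        for (int i = 0; i < items.size(); i++) {
            items.set(i, heap.remove());    // N removals of the minimum, O(log N) each
        }
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(Arrays.asList(5, 3, 8, 1, 3));
        heapSort(values);
        System.out.println(values);
    }
}
```

Unlike the TreeSet variant, this version keeps duplicate values, because a priority queue stores every inserted element.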

Other space vs. time examples
we have made similar space/time tradeoffs often in code
  AnagramFinder (find all anagrams of a given word)
    can preprocess the dictionary & build a map of words, keyed by sorted letters
    then, each anagram lookup is O(1)
  BinaryTree (implement a binary tree & eventually extend to BST)
    kept track of the number of nodes in a field (updated on each add/remove)
    turned the O(N) size operation into O(1)
  KWIC Index (store and display titles based on keywords)
    to allow for efficient searches/displays, store a copy for each keyword
other examples?
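The anagram preprocessing idea can be sketched as follows. The course's AnagramFinder is not shown in the transcript, so the class AnagramIndex and its methods are hypothetical, and the sketch assumes words are already in a consistent letter case.

```java
import java.util.*;

// Hypothetical sketch: trades memory (a map over the whole dictionary) for fast lookups.
class AnagramIndex {
    private final Map<String, List<String>> byKey = new HashMap<>();

    /** Preprocess once: group every dictionary word under its sorted letters. */
    AnagramIndex(Collection<String> dictionary) {
        for (String word : dictionary) {
            byKey.computeIfAbsent(sortLetters(word), k -> new ArrayList<>()).add(word);
        }
    }

    /** Canonical key: the word's letters in sorted order, e.g. "silent" becomes "eilnst". */
    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    /** Expected O(1) hash lookup after sorting the (short) query word. */
    List<String> anagramsOf(String word) {
        return byKey.getOrDefault(sortLetters(word), Collections.emptyList());
    }
}
```

For example, an index built from "listen", "silent", "enlist" and "google" returns the first three words for the query "tinsel".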

String matching
many useful applications involve searching text for patterns
  word processing (e.g., global find and replace)
  data mining (e.g., looking for common parameters)
  genomics (e.g., searching for gene sequences)
    TTAAGGACCGCATGCCCTCGAATAGGCTTGAGCTTGCAATTAACGCCCAC…
in general, given a (potentially long) string S, need to find a (relatively small) pattern P
an obvious, brute force solution exists: O(|S||P|)
however, utilizing preprocessing and additional memory can reduce this to O(|S|)
  in practice, it is often < |S|   HOW CAN THIS BE?!?

Brute force
String: FOOBARBIZBAZ   Pattern: BIZ
the brute force approach would shift the pattern along, looking for a match:
  FOOBARBIZBAZ
  BIZ
   BIZ
    BIZ
     BIZ
      BIZ
       BIZ
        BIZ
at worst: |S|-|P|+1 passes through the pattern
each pass requires at most |P| comparisons → O(|S||P|)
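A minimal sketch of the brute force approach, with a counter so the number of character comparisons can be inspected. The class name and the static counter are illustrative choices, not part of the slides.

```java
// Sketch of brute force matching: try every alignment, compare left to right.
class BruteForceMatch {
    /** Number of character comparisons made by the most recent call to indexOf. */
    static int comparisons;

    /** Returns the index of the first occurrence of p in s, or -1 if there is none. */
    static int indexOf(String s, String p) {
        comparisons = 0;
        for (int i = 0; i + p.length() <= s.length(); i++) {   // at most |S|-|P|+1 alignments
            int j = 0;
            while (j < p.length()) {                           // at most |P| comparisons each
                comparisons++;
                if (s.charAt(i + j) != p.charAt(j)) break;
                j++;
            }
            if (j == p.length()) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("FOOBARBIZBAZ", "BIZ") + " after " + comparisons + " comparisons");
    }
}
```

On the slide's example the match is found at index 6 after 10 comparisons.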

Smart shifting
  FOOBARBIZBAZ
  BIZ
suppose we look at the rightmost letter in the attempted match
  here, compare Z in the pattern with O in the string
  since there is no O in the pattern, can skip the next two alignments (shift 3 spots)
  FOOBARBIZBAZ
     BIZ
again, no R in BIZ, so can skip another two alignments
  FOOBARBIZBAZ
        BIZ

Horspool algorithm
given a string S and pattern P,
Construct a shift table containing each letter of the alphabet.
  if the letter appears in P (other than in the last index) → store the distance from its rightmost such occurrence to the end of P
  otherwise, store |P|
  e.g., P = BIZ
    A  B  C  …  I  …  X  Y  Z
    3  2  3  …  1  …  3  3  3
Align the pattern with the start of the string S, i.e., with S[0] S[1] … S[|P|-1]
Repeatedly, compare from right-to-left with the aligned sequence S[i] S[i+1] … S[i+|P|-1]
  if all |P| letters match, then FOUND.
  if not, then shift P to the right by shiftTable[S[i+|P|-1]]
  if the shift falls off the end, then NOT FOUND

Horspool example 1
S = FOOBARBIZBAZ   P = BIZ
shiftTable for "BIZ":
    A  B  C  …  I  …  X  Y  Z
    3  2  3  …  1  …  3  3  3
  FOOBARBIZBAZ
  BIZ             Z and O do not match, so shift shiftTable[O] = 3 spots
     BIZ          Z and R do not match, so shift shiftTable[R] = 3 spots
        BIZ       pattern is FOUND
total number of comparisons = 5

Horspool example 2
S = "FOZIZBARBIZBAZ"   P = "BIZ"
shiftTable for "BIZ":
    A  B  C  …  I  …  X  Y  Z
    3  2  3  …  1  …  3  3  3
  FOZIZBARBIZBAZ
  BIZ               Z and Z match, but not I and O, so shift shiftTable[Z] = 3 spots
     BIZ            Z and B do not match, so shift shiftTable[B] = 2 spots
       BIZ          Z and R do not match, so shift shiftTable[R] = 3 spots
          BIZ       pattern is FOUND
total number of comparisons = 7
suppose we are looking for BIZ in a string with no Z's: how many comparisons?
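The Horspool steps and both examples can be checked with a sketch like the one below. The shift table is stored as a map in which any letter absent from the map implicitly shifts by |P|; that representation, the class name, and the comparison counter are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Horspool algorithm with a comparison counter.
class Horspool {
    /** Number of character comparisons made by the most recent call to indexOf. */
    static int comparisons;

    /** Shift table: distance from the rightmost occurrence (last index excluded) to the end of p. */
    static Map<Character, Integer> shiftTable(String p) {
        Map<Character, Integer> table = new HashMap<>();
        int m = p.length();
        for (int j = 0; j < m - 1; j++) {
            table.put(p.charAt(j), m - 1 - j);   // later occurrences overwrite earlier ones
        }
        return table;                            // letters not in the map shift by m
    }

    /** Returns the index of the first occurrence of p in s, or -1 if there is none. */
    static int indexOf(String s, String p) {
        comparisons = 0;
        int m = p.length();
        if (m == 0) return 0;
        Map<Character, Integer> table = shiftTable(p);
        int i = 0;                               // current alignment of p within s
        while (i + m <= s.length()) {
            int j = m - 1;                       // compare right to left
            while (j >= 0) {
                comparisons++;
                if (s.charAt(i + j) != p.charAt(j)) break;
                j--;
            }
            if (j < 0) return i;                 // all |P| letters matched
            // shift by the table entry of the text letter under the pattern's last index
            i += table.getOrDefault(s.charAt(i + m - 1), m);
        }
        return -1;                               // the shift fell off the end
    }
}
```

This reproduces the counts above: 5 comparisons for FOOBARBIZBAZ and 7 for FOZIZBARBIZBAZ. For a text with no Z's, every alignment fails on its first comparison, so the total is one comparison per alignment (roughly |S|/3 if the text also contains no B or I).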

Horspool analysis
space & time
  requires storing a shift table whose size is that of the alphabet
    since the alphabet is usually fixed, the table requires O(1) space
  worst case is O(|S||P|)
    this occurs when skips are infrequent & close matches to the pattern appear often
  for random data, however, only O(|S|)
Horspool algorithm is a simplification of a more complex (and well-known) algorithm: the Boyer-Moore algorithm
  in practice, Horspool is often faster
  however, Boyer-Moore has O(|S|) worst case, instead of O(|S||P|)

Boyer-Moore algorithm
based on two kinds of shifts (both compare right-to-left, find first mismatch)
the first is bad-symbol shift (based on the symbol that caused the mismatch)
  BIZFIZIBIZFIZBIZ
  FIZBIZ                F and B don't match, shift to align F
     FIZBIZ             I and Z don't match, shift to align I
      FIZBIZ            I and Z don't match, shift to align I
       FIZBIZ           F and Z don't match, shift to align F
            FIZBIZ      FOUND

Bad symbol shift
bad symbol table is |alphabet| × |P|
  kth row contains shift amount if mismatch occurred at index k
bad symbol table for FIZBIZ:
  [table: columns A B C … F I … Y Z, one row per mismatch index k]
  BIZFIZIBIZFIZBIZ
  FIZBIZ                F and B don't match → badSymbolTable(F, 2) = 3
     FIZBIZ             I and Z don't match → badSymbolTable(I, 0) = 1
      FIZBIZ            I and Z don't match → badSymbolTable(I, 3) = 1
       FIZBIZ           F and Z don't match → badSymbolTable(F, 0) = 5
            FIZBIZ      FOUND
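The four table lookups in this example are consistent with one common textbook formulation of the bad symbol shift: take the Horspool-style table entry for the mismatched text letter, subtract the number k of letters already matched, and never shift by less than 1. Whether the slide's full table was built exactly this way is an assumption; the class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a bad symbol shift computed on demand instead of stored as a 2-D table.
class BadSymbolShift {
    /** Horspool-style entries: distance from the rightmost occurrence (last index excluded) to the end. */
    static Map<Character, Integer> baseTable(String p) {
        Map<Character, Integer> table = new HashMap<>();
        for (int j = 0; j < p.length() - 1; j++) {
            table.put(p.charAt(j), p.length() - 1 - j);
        }
        return table;
    }

    /**
     * Shift when text letter c caused the mismatch after k letters
     * had already matched (counting from the right end of p).
     */
    static int shift(String p, char c, int k) {
        int base = baseTable(p).getOrDefault(c, p.length());
        return Math.max(base - k, 1);   // aligning c must never move the pattern backwards
    }
}
```

For FIZBIZ this gives shift(F, 2) = 3, shift(I, 0) = 1, shift(I, 3) = 1 and shift(F, 0) = 5, matching the walk-through.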

Good suffix shift
find the longest suffix that matches
  if that suffix appears to the left in P preceded by a different char, shift to align
  if not, then shift the entire length of the word
  BIZFIZIBIZFIZBIZ
  FIZBIZ                IZ suffix matches, IZ appears to left so shift to align
     FIZBIZ             no suffix match, so shift 1 spot
      FIZBIZ            BIZ suffix matches, doesn't appear again so full shift
            FIZBIZ      FOUND

Good suffix shift
good suffix shift table has |P| entries
  assume that suffix matches but char to the left does not
  if that suffix appears to the left preceded by a different char, shift to align
  if not, then can shift the entire length of the word
good suffix table for FIZBIZ:
    matched suffix:  IZBIZ  ZBIZ  BIZ  IZ  Z  (none)
    shift:             6      6    6    3  6    1
  BIZFIZIBIZFIZBIZ
  FIZBIZ                IZ suffix matches → goodSuffixTable(IZ) = 3
     FIZBIZ             no suffix match → goodSuffixTable() = 1
      FIZBIZ            BIZ suffix matches → goodSuffixTable(BIZ) = 6
            FIZBIZ      FOUND

Boyer-Moore string search algorithm
calculate the bad symbol and good suffix shift tables
while match not found and not off the edge
  compare pattern with string section
  shift1 = bad symbol shift of rightmost non-matching char
  shift2 = good suffix shift of longest matching suffix
  shift string section for comparison by max(shift1, shift2)
the algorithm has been proven to require at most 3*|S| comparisons, so O(|S|)
  in practice, can require fewer than |S| comparisons
requires storing an O(|P|) bad symbol shift table and an O(|P|) good suffix shift table (for a fixed alphabet)
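The loop above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not a tuned implementation: the bad symbol shift uses the textbook form max(entry - k, 1), the good suffix table is built by direct (cubic-time) search rather than the usual linear preprocessing, and it also handles the case where only a prefix of the pattern matches the end of the matched suffix, which the slides do not mention. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified Boyer-Moore sketch: bad symbol shift plus good suffix shift.
class BoyerMooreSketch {
    /** Horspool-style entries used as the basis of the bad symbol shift. */
    static Map<Character, Integer> baseTable(String p) {
        Map<Character, Integer> table = new HashMap<>();
        for (int j = 0; j < p.length() - 1; j++) {
            table.put(p.charAt(j), p.length() - 1 - j);
        }
        return table;
    }

    /** shift[k] = good suffix shift when exactly k letters (1 <= k < m) matched; shift[0] is unused. */
    static int[] goodSuffixTable(String p) {
        int m = p.length();
        int[] shift = new int[m];
        for (int k = 1; k < m; k++) {
            String suffix = p.substring(m - k);
            char before = p.charAt(m - k - 1);   // letter preceding the suffix in p
            shift[k] = m;                        // default: shift the entire length
            boolean found = false;
            // rightmost earlier occurrence of the suffix preceded by a different letter
            for (int start = m - k - 1; start >= 0 && !found; start--) {
                if (p.startsWith(suffix, start) && (start == 0 || p.charAt(start - 1) != before)) {
                    shift[k] = (m - k) - start;
                    found = true;
                }
            }
            // otherwise: longest prefix of p (shorter than k) that is also a suffix of p
            for (int len = k - 1; len >= 1 && !found; len--) {
                if (p.startsWith(p.substring(m - len))) {
                    shift[k] = m - len;
                    found = true;
                }
            }
        }
        return shift;
    }

    /** Returns the index of the first occurrence of p in s, or -1 if there is none. */
    static int indexOf(String s, String p) {
        int m = p.length();
        if (m == 0) return 0;
        Map<Character, Integer> base = baseTable(p);
        int[] goodSuffix = goodSuffixTable(p);
        int i = 0;                               // current alignment of p within s
        while (i + m <= s.length()) {
            int k = 0;                           // letters matched so far, from the right
            while (k < m && s.charAt(i + m - 1 - k) == p.charAt(m - 1 - k)) k++;
            if (k == m) return i;
            char bad = s.charAt(i + m - 1 - k);  // text letter that caused the mismatch
            int shift1 = Math.max(base.getOrDefault(bad, m) - k, 1);
            int shift2 = (k == 0) ? 1 : goodSuffix[k];
            i += Math.max(shift1, shift2);
        }
        return -1;
    }
}
```

On BIZFIZIBIZFIZBIZ with pattern FIZBIZ the sketch shifts by 3, 1 and 6 and then reports a match at index 10.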