MA/CSSE 473 Day 27 Hash table review Intro to string searching.

Slides:



Advertisements
Similar presentations
Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Advertisements

Space-for-Time Tradeoffs
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Data Structures Using C++ 2E
Chapter 7 Space and Time Tradeoffs. Space-for-time tradeoffs Two varieties of space-for-time algorithms: b input enhancement — preprocess the input (or.
1 CSC 421: Algorithm Design & Analysis Spring 2013 Space vs. time  space/time tradeoffs  examples: heap sort, data structure redundancy, hashing  string.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off (extra space in tables - breathing.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.
Quick Review of material covered Apr 8 B+-Tree Overview and some definitions –balanced tree –multi-level –reorganizes itself on insertion and deletion.
Chapter 7 Space and Time Tradeoffs Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
§3 Separate Chaining ---- keep a list of all keys that hash to the same value struct ListNode; typedef struct ListNode *Position; struct HashTbl; typedef.
MA/CSSE 473 Day 28 Hashing review B-tree overview Dynamic Programming.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
MA/CSSE 473 Day 22 Binary Heaps Heapsort Answers to student questions Exam 2 Tuesday, Nov 4 in class.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
MA/CSSE 473 Day 24 Student questions Quadratic probing proof
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
MA/CSSE 473 Day 23 Student questions Space-time tradeoffs Hash tables review String search algorithms intro.
Final Exam Review CS Total Points – 60 Points Writing Programs – 50 Points Tracing Algorithms, determining results, and drawing pictures – 50.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
Design and Analysis of Algorithms - Chapter 71 Space-time tradeoffs For many problems some extra space really pays off: b extra space in tables (breathing.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
CISC220 Fall 2009 James Atlas Dec 04: Hashing and Maps K+W Chapter 9.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Design and Analysis of Algorithms – Chapter 71 Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu RAIK 283: Data Structures.
MA/CSSE 473 Day 25 Student questions Boyer-Moore.
Chapter 11 (Lafore’s Book) Hash Tables Hwajung Lee.
MA/CSSE 473 Day 23 Review of Binary Heaps and Heapsort
MA/CSSE 473 Day 26 Student questions Boyer-Moore B Trees.
Data Structures Using C++ 2E
Hashing Alexandra Stefan.
Exam Hints.
MA/CSSE 473 Day 24 Student questions Space-time tradeoffs
Data Structures Using C++ 2E
Hashing Exercises.
Advanced Associative Structures
CSCE350 Algorithms and Data Structure
Space-for-time tradeoffs
Collision Resolution Neil Tang 02/18/2010
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Chapter 7 Space and Time Tradeoffs
CSE 326: Data Structures Hashing
Hashing Alexandra Stefan.
Space-for-time tradeoffs
Space-for-time tradeoffs
Collision Resolution Neil Tang 02/21/2008
CSE 326: Data Structures Lecture #12 Hashing II
Space-for-time tradeoffs
Space-for-time tradeoffs
Collision Resolution: Open Addressing Extendible Hashing
Presentation transcript:

MA/CSSE 473 Day 27 Hash table review Intro to string searching

SPACE-TIME TRADEOFFS Sometimes using a little more space saves a lot of time

Space vs time tradeoffs Often we can find a faster algorithm if we are willing to use additional space. Give some examples (quiz question) Examples: Q8

Space vs time tradeoffs Often we can find a faster algorithm if we are willing to use additional space. Give some examples (quiz question) Examples: –Binary heap vs simple sorted array. Uses one extra array position –Merge sort –Radix sort and Bucket Sort –Anagram finder –Binary Search Tree (extra space for the pointers) –AVL Tree (extra space for the balance code)

HASH TABLE IMPLEMENTATION A Quick Review

Hash Table Review Section 7.4 of Levitin Excellent detailed reference: Weiss Chapter 20 Covered in 230 –Both versions of the course We do a very quick review Then prove a property that is described in 230 but typically not proved there. A brief summary of hashing with a nice animation:

Hashing Review What problem do we try to solve by hashing? What is the general idea of how hashing works? Why does it fit into Chapter 7 (space-time tradeoffs)? What are the main issues to be addressed when discussing hashing implementation? How to choose between a hash table and a binary search tree? Discuss the following questions in a group of three students

Terminology and analysis If any of this terminology is unfamiliar, you should look it up collision load factor ( ) perfect hash function open addressing –linear probing –cluster –quadratic probing –rehashing separate chaining Q1 Answer student questions on Q1 before going on to Q2

Some Hashing Details … Can be found on this page: hulman.edu/class/csse/csse230/201230/Slides/17- Graphs-HashTables.pdfhttp:// hulman.edu/class/csse/csse230/201230/Slides/17- Graphs-HashTables.pdf Similar to Weiss's presentation They are linked from here in case you didn't "get it" the first time in CSSE230. We will not go over all of them in detail in class. If you don't understand the effect of clustering, you might find the animation that is linked from the slides especially helpful.

With linear probing, if there is a collision at H, we try H, H+1, H+2, H+3,... (all modulo the table size) until we find an empty spot. –Causes (primary) clustering With quadratic probing, we try H, H+1 2. H+2 2, H+3 2,... –Eliminates primary clustering, but can cause secondary clustering. –Is it possible that it misses some available array positions? Collision Resolution: Quadratic probing Q2

Choose a prime number for the array size, then … –If the array is not more than half full, finding a place to do an insertion is guaranteed, and no cell is probed twice –Suppose the array size is P, a prime number greater than 3 –Show by contradiction that if i and j are ≤ ⌊ P/2 ⌋, and i≠j, then H + i 2 (mod P) ≢ H + j 2 (mod P). Use an algebraic trick to calculate next index –Replaces mod and general multiplication with subtraction and a bit shift –Difference between successive probes: H + (i+1) 2 = H + i 2 + (2i+1 ) [can use a bit-shift for the multiplication]. nextProbe = nextProbe + (2i+1); if (nextProbe >= P) nextProbe -= P; Hints for quadratic probing Q3

No one has been able to analyze it Experimental data shows that it works well –Provided that the array size is prime, and is the table is less than half full Quadratic probing analysis

STRING SEARCH Search for a string within another string

Brute Force String Search Example The problem: Search for the first occurrence of a pattern of length m in a text of length n. Usually, m is much smaller than n. What makes brute force so slow? When we find a mismatch, we can shift the pattern by only one character position in the text. Text: abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra Pattern: abracadabra abracadabra abracadabra abracadabra abracadabra Q4-5

Faster String Searching Brute force: worst case m(n-m+1) A little better: but still Ѳ(mn) on average –Short-circuit the inner loop Was a HW problem

Horspool's Algorithm A simplified version of the Boyer-Moore algorithm A good bridge to understanding Boyer-Moore Published in 1980 What makes brute force so slow? –When we find a mismatch, we can only shift the pattern to the right by one character position in the text. –Text: abracadabtabradabracadabcadaxbrabbracadabraxxxxxxabracadabracadabra Pattern: abracadabra abracadabra abracadabra abracadabra Can we shift farther? Like Boyer-Moore, Horspool does the comparisons in a counter-intuitive order (moves right-to-left through the pattern) Q6-8

Horspool's Main Question If there is a character mismatch, how far can we shift the pattern, with no possibility of missing a match within the text? What if the last character in the pattern is compared with a character in the text that does not occur in the pattern at all? Text:... ABCDEFG... Pattern: CSSE473

How Far to Shift? Look at first (rightmost) character in the part of the text that is compared to the pattern: The character is not in the pattern.....C {C not in pattern) BAOBAB The character is in the pattern (but not the rightmost).....O ( O occurs once in pattern) BAOBAB.....A ( A occurs twice in pattern) BAOBAB The rightmost characters do match.....B BAOBAB