Storage and Retrieval Structures by Ron Peterson.

Overview
Storage & Retrieval as an ADT
Simple implementations
 – Arrays of records
 – Sorted arrays
 – Trees
Efficiency issues
Hash tables

Storage & Retrieval (S & R) ADT
A container holding a collection of records
Each record has a “key” field
Operations:
 – Add a record
 – Remove a record by key
 – Find a record by key and retrieve a copy

Simple Implementations
Array of records
 – Insert at end
 – Find by linear search
Sorted array
 – Insert in key order
 – Find by binary search
Trees and balanced trees
 – We’ll study these later
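
The sorted-array option above can be sketched roughly as follows. This is only an illustration: the class name SortedArrayStore and its method names are assumptions, not from the slides, and Python's standard bisect module stands in for a hand-written binary search.

```python
import bisect


class SortedArrayStore:
    """Records kept in key order (hypothetical name, for illustration only)."""

    def __init__(self):
        self._keys = []     # keys, always kept sorted
        self._records = []  # records, parallel to _keys

    def add(self, key, record):
        # Binary search locates the position in O(log N) steps,
        # but inserting shifts every later element, so add is O(N).
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._records.insert(i, record)

    def find(self, key):
        # Binary search: O(log N) comparisons.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._records[i]
        return None  # key not present
```

The shifting done inside add is where the cost of keeping the array sorted shows up, even though the lookup itself is fast.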

Efficiency Issues
Regular arrays: O(N) retrieval
Sorted arrays: O(log N) retrieval, but O(N) add (insertion)
Trees: O(log N) retrieval and add while the tree stays reasonably balanced, but a degenerate tree degrades toward O(N); also backup issues
Balanced trees: O(log N), but complex, and backup issues remain
Alternative: hash table, O(C) (constant time, usually written O(1)) or close

Hash Table Motivation
What if we used an array in which every record had a unique location?
For example: an array of employee records whose key is Employee-ID, which runs from 1 to 300
Employee 17 gets put in location 17
Add and retrieve are each O(C) (constant time)
Problem: what if the Social Security number (SSN) is the key?
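
A minimal sketch of the "employee 17 goes in location 17" idea, assuming IDs from 1 to 300 as in the example. The class name DirectAddressTable and the range check are illustrative additions, not part of the slides.

```python
class DirectAddressTable:
    """One slot per possible Employee-ID (hypothetical name)."""

    def __init__(self, max_id=300):
        self._max_id = max_id
        # Slot 0 is unused so that ID n lives at index n.
        self._slots = [None] * (max_id + 1)

    def _check(self, emp_id):
        # Reject IDs outside 1..max_id instead of letting Python
        # wrap negative indexes around to the end of the list.
        if not 1 <= emp_id <= self._max_id:
            raise ValueError("Employee-ID out of range")

    def add(self, emp_id, record):
        self._check(emp_id)
        self._slots[emp_id] = record  # one array write: constant time

    def find(self, emp_id):
        self._check(emp_id)
        return self._slots[emp_id]    # one array read: constant time

    def remove(self, emp_id):
        self._check(emp_id)
        self._slots[emp_id] = None
```

This only works because the key range is small; with a 9-digit key the same layout would need a slot for every possible value.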

The Hash Table Solution
With SSN as the Employee-ID (as might be needed for Payroll):
One slot per 9-digit ID would require an array of one billion slots; not feasible!
Instead, let’s still have an array of 300 (or a few more) slots, and then figure out an easy “mapping” function:
 – LocationIndex = Hash(SSN)

Hash Table Issues
Coming up with a Hash function
 – Easy to calculate
 – Result in correct range
 – Minimize duplicate answers
Duplicates (“collisions”) are inevitable
 – Many-to-one function (keys to locations)
Need a plan for dealing with them
 – “Collision handling”

Collision Handling
A collision has to be dealt with in two situations:
 – When adding a record, and a record with a different key already occupies the location given by the Hash function
 – When retrieving any record that collided when it was added
Both cases must follow the same procedure for deciding which location to try next.

Collision Handling Methods
Just increment the location until you find an empty slot (or the key sought)
 – Called “linear probing”
 – Often a poor choice because it tends to create long runs of filled slots (“clustering”), which lengthen later probes
Jumps of increasing size (+ wrap-around)
 – Most common version is “quadratic probing”
Using an overflow area with links
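
Linear probing, the first method above, might look something like this. It is a sketch under assumptions: integer keys, a fixed table size, and no Remove (removal in a probed table needs extra bookkeeping that the slides do not cover). The class and method names are made up for illustration.

```python
class LinearProbingTable:
    """Open-addressing table with linear probing (hypothetical name)."""

    def __init__(self, size=11):
        self._size = size
        self._slots = [None] * size  # each slot: None or (key, record)

    def _probe(self, key):
        # Start at the hashed location, then step by 1 with wrap-around.
        start = key % self._size
        for step in range(self._size):
            yield (start + step) % self._size

    def add(self, key, record):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:
                self._slots[i] = (key, record)
                return i  # index actually used
        raise RuntimeError("table is full")

    def find(self, key):
        # Follows exactly the same probe sequence as add.
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                return None  # empty slot reached: key was never added
            if slot[0] == key:
                return slot[1]
        return None
```

With a table of 11 slots, keys 5, 16 and 27 all hash to location 5 and end up in slots 5, 6 and 7, which is the kind of filled-up block the slide warns about. Replacing step with step * step in _probe would give one form of quadratic probing, although the sequence may then no longer visit every slot.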

Hash function approaches
Numeric key: just use mod
 – Hash(key): return key % Size
Non-numeric key: do a weighted sum of the ASCII codes of the characters
 – Sum = Char[1] + 5*Char[2] + 17*Char[3]
 – Then Sum % Size
Special care is usually taken to avoid non-uniformity in distribution of keys.
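
Both approaches can be written in a few lines. The weights 1, 5 and 17 come from the slide; the function names, the choice to ignore characters after the third, and the table size 307 used in the usage note are assumptions made for this sketch.

```python
WEIGHTS = (1, 5, 17)  # weights from the slide, one per leading character


def hash_numeric(key, size):
    """Numeric key: reduce into the range 0..size-1 with mod."""
    return key % size


def hash_text(key, size):
    """Text key: weighted sum of character codes, then mod.

    Only the first three characters contribute (zip stops at the
    shorter sequence); that cutoff is an assumption of this sketch.
    """
    total = sum(w * ord(ch) for w, ch in zip(WEIGHTS, key))
    return total % size
```

For example, hash_text("Ann", 307) computes 65 + 5*110 + 17*110 = 2485, and 2485 % 307 is 29. Because only three characters are used, "Ann" and "Anna" land in the same slot, which is a small instance of the non-uniformity the slide mentions.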

Design of a Hash Table
Choose a size that leaves room for growth and turnover (employees leaving?)
Add, Remove, and Find all use the same
 – Hash function
 – Collision handling
So:
 – Write a Hash function
 – Choose & implement a collision handling method

A Few Final Issues
If you run out of slots, you might need to rebuild the whole table with a bigger size.
The size is often chosen as a prime number so that regular patterns (cyclicity) in the distribution of keys have the least effect.
New approaches to collision handling are continually being studied.
Hashing to pointers to linked lists (often called “chaining”) can be very effective if the Hash function is good.
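
The last two ideas, hashing to lists and rebuilding with a bigger size, can be combined in a rough sketch. Everything named here is hypothetical: Python lists stand in for linked lists, keys are assumed to be integers, and both the growth trigger (more than two records per bucket on average) and the new size (2n + 1, which is not guaranteed to be prime) are arbitrary choices made for illustration.

```python
class ChainedTable:
    """Hash table whose slots hold lists of (key, record) pairs."""

    def __init__(self, size=7):
        self._buckets = [[] for _ in range(size)]
        self._count = 0

    def _bucket(self, key):
        return self._buckets[key % len(self._buckets)]

    def add(self, key, record):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)  # replace existing record
                return
        bucket.append((key, record))
        self._count += 1
        # Arbitrary threshold: grow once buckets average over 2 records.
        if self._count > 2 * len(self._buckets):
            self._rebuild(2 * len(self._buckets) + 1)

    def find(self, key):
        for k, record in self._bucket(key):
            if k == key:
                return record
        return None

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._count -= 1
                return True
        return False

    def _rebuild(self, new_size):
        # Every record must be rehashed, because key % size changes.
        old = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(new_size)]
        for k, record in old:
            self._bucket(k).append((k, record))
```

A fuller version would probably pick the next prime above the doubled size, in line with the point about prime table sizes.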