Chapter 11. Hashing.


Contents
1. Introduction
2. A Simple Hashing Algorithm
3. Hashing Functions and Record Distributions
4. How Much Extra Memory Should Be Used?
5. Collision Resolution by Progressive Overflow
6. Storing More Than One Record per Address: Buckets
7. Making Deletions
8. Other Collision Resolution Techniques
9. Patterns of Record Access

1. Introduction
Access cost in O-notation:
- O(1): hashing
- O(N): sequential searching
- O(log2 N): binary searching
- O(logk N): B-tree (k: leaf node size)
What is hashing?
a = h(K), where h is the hash function, K is the key, and a is the home address.
Example: K = BASS, h(K) = (code of first char * code of second char) mod 1000
a = h(BASS) = (66 * 65) mod 1000 = 4290 mod 1000 = 290

Introduction: Collisions
A collision occurs when two different keys (synonyms) hash to the same address.
Example:
LOWELL => a = (76 * 79) mod 1000 = 6004 mod 1000 = 4
OLIVIER => a = (79 * 76) mod 1000 = 6004 mod 1000 = 4
Several ways to reduce the number of collisions:
1. Spread out the records: use a good hashing algorithm.
2. Use extra memory.
3. Put more than one record at a single address: buckets.
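As a quick check of the arithmetic, here is a minimal Python sketch of the toy hash function, assuming each character is represented by its ASCII code (Python's ord). The function name first_times_second_hash is hypothetical. It reproduces the BASS example and the LOWELL/OLIVIER collision.

```python
def first_times_second_hash(key: str, n_addresses: int = 1000) -> int:
    """Home address = (code of 1st char * code of 2nd char) mod n_addresses."""
    return (ord(key[0]) * ord(key[1])) % n_addresses


for name in ("BASS", "LOWELL", "OLIVIER"):
    print(name, first_times_second_hash(name))
# BASS 290
# LOWELL 4
# OLIVIER 4
```

Because multiplication is commutative, any two keys that begin with the same two letters in either order are synonyms under this function, which is one reason it spreads records poorly.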

2. A Simple Hashing Algorithm
Three steps:
1. Represent the key in numerical form.
2. Fold and add.
3. Divide by a prime number and use the remainder as the address.
Example
Step 1. Represent the key in numerical form
LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
(L O W E L L, followed by six blanks)

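A Simple Hashing Algorithm: Example (continued)
Step 2. Fold and add
76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
7679 + 8769 + 7676 + 3232 + 3232 = 30588
Adding the last pair gives 30588 + 3232 = 33820, which exceeds 32767, the maximum value of a 2-byte integer. So each intermediate sum is reduced mod 19937 instead:
7679 + 8769 = 16448 => 16448 mod 19937 = 16448
16448 + 7676 = 24124 => 24124 mod 19937 = 4187
4187 + 3232 = 7419 => 7419 mod 19937 = 7419
7419 + 3232 = 10651 => 10651 mod 19937 = 10651
10651 + 3232 = 13883 => 13883 mod 19937 = 13883
(19937 is prime. Since no pair of uppercase letters or blanks exceeds 9090 ("ZZ"), a running sum below 19937 plus one more pair stays under 32767.)
Step 3. Divide by the size of the address space
a = s mod n (s: folded sum; n: number of addresses in the file)
a = 13883 mod 100 = 83
a = 13883 mod 101 = 46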
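The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own code: keys are at most 12 characters of uppercase letters and blanks (so every character code has two decimal digits), and the name fold_and_add_hash is hypothetical.

```python
def fold_and_add_hash(key: str, n_addresses: int,
                      key_length: int = 12, modulus: int = 19937) -> int:
    """Three-step hash: numeric form, fold and add, divide by address-space size."""
    # Step 1: pad the key with blanks and take each character's code
    # (assumed two decimal digits: uppercase letters and blanks).
    codes = [ord(ch) for ch in key.ljust(key_length)]
    # Step 2: fold the codes into four-digit pairs and add them, reducing the
    # running sum mod 19937 after each addition so it stays below 32767.
    total = 0
    for i in range(0, key_length, 2):
        pair = codes[i] * 100 + codes[i + 1]
        total = (total + pair) % modulus
    # Step 3: the remainder after dividing by the address-space size is the address.
    return total % n_addresses


print(fold_and_add_hash("LOWELL", 100))  # 83
print(fold_and_add_hash("LOWELL", 101))  # 46
```

Python integers never overflow, so the mod 19937 step is kept only to mirror the 2-byte arithmetic in the example; the final remainder is the same either way.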

3. Hashing Functions and Record Distributions
Distributing records among addresses
<Figure 11.3> Different distributions of seven records (A to G) over ten addresses (1 to 10): (a) uniform distribution (best), (b) worst case, (c) random distribution (acceptable).

Hashing Functions and Record Distributions: Some Other Hashing Methods
Better than random:
- Examine keys for a pattern (e.g., Korean resident registration numbers)
- Divide the key by a prime number
Random:
- Square the key and take the middle: 453^2 = 205209, middle two digits = 52
- Radix transformation
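The two "random" methods can be sketched in Python as below. The parameters are assumptions chosen for illustration, not values from the slides: mid_square keeps the two middle digits, and radix_transform converts to base 11, reads those digits as a decimal number, and reduces it mod 99. Both function names are hypothetical.

```python
def mid_square(key: int, n_digits: int = 2) -> int:
    """Square the key and keep n_digits from the middle of the result."""
    squared = str(key * key)
    start = (len(squared) - n_digits) // 2
    return int(squared[start:start + n_digits])


def radix_transform(key: int, base: int, n_addresses: int) -> int:
    """Write the key in another base, read those digits as a decimal number,
    then reduce it to the address range."""
    digits = []                                  # least significant digit first
    while key > 0:
        key, digit = divmod(key, base)
        digits.append(digit)
    as_decimal = sum(d * 10 ** i for i, d in enumerate(digits))
    return as_decimal % n_addresses


print(mid_square(453))               # 453^2 = 205209 -> 52
print(radix_transform(453, 11, 99))  # 453 is 382 in base 11; 382 mod 99 = 85
```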

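4. How Much Extra Memory Should Be Used?
Packing density = r / N (number of records / number of available addresses)
Example: r = 75 records, N = 100 addresses => packing density = 75/100 = 0.75 (75%)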

How Much Extra Memory Should Be Used? Predicting Collisions for Different Packing Densities
<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
Packing density (%)   Synonyms (%)
10                    4.8
40                    17.6
70                    28.1
90                    34.1
100                   36.8
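The percentages in Table 11.2 agree with a Poisson model of hashing. The sketch below assumes the hash function sends each key to a uniformly random address and each address holds one record. With packing density d = r/N, an address receives x records with probability p(x) = d^x e^(-d) / x!. Each address that receives at least one record keeps exactly one at home, so the fraction of records at home is N(1 - p(0))/r = (1 - e^(-d))/d, and the rest overflow. The name percent_not_at_home is hypothetical.

```python
import math


def percent_not_at_home(packing_density: float) -> float:
    """Percent of records not stored at their home address, assuming keys hash
    uniformly at random (Poisson model) and one record per address."""
    d = packing_density
    # N * (1 - e^-d) addresses are non-empty, and each holds one record at home.
    at_home_fraction = (1.0 - math.exp(-d)) / d
    return 100.0 * (1.0 - at_home_fraction)


for density in (0.10, 0.40, 0.70, 0.90, 1.00):
    print(f"{density:.0%}: {percent_not_at_home(density):.1f}%")
```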

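5. Collision Resolution by Progressive Overflow
Open addressing with linear probing: if a record's home address is already occupied, the record is stored at the next free address.
Figure: York's home address is 3, but Jasper already occupies address 3, so York is stored at the next free address, 4. Novak's home address is 2, which Rosen occupies.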

Collision Resolution by Progressive Overflow: Search Length
Search length = number of accesses needed to retrieve a record.
Key     Home address   Actual address   Search length
Adams   0              0                1
Bates   1              1                1
Cole    1              2                2
Dean    2              3                2
Evans   0              4                5
Average search length = (1 + 1 + 2 + 2 + 5) / 5 = 2.2
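A minimal Python sketch of loading a file by progressive overflow and recording each key's search length. Since the slides give no hash function, the home addresses are taken directly from the table, and the file size of six addresses is an assumption. The name load_progressive_overflow is hypothetical. Without deletions, the search length of a record equals the number of addresses examined when it was inserted.

```python
def load_progressive_overflow(records, n_addresses):
    """Load (key, home_address) pairs in order with progressive overflow.

    Returns the file as a list of slots and the search length of each key,
    i.e. the number of addresses examined to reach it.
    """
    slots = [None] * n_addresses
    search_lengths = {}
    for key, home in records:
        addr, accesses = home, 1
        while slots[addr] is not None:       # address taken: try the next one
            if accesses == n_addresses:
                raise RuntimeError("file is full")
            addr = (addr + 1) % n_addresses  # wrap around at the end of the file
            accesses += 1
        slots[addr] = key
        search_lengths[key] = accesses
    return slots, search_lengths


records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
slots, lengths = load_progressive_overflow(records, 6)
print(slots)    # ['Adams', 'Bates', 'Cole', 'Dean', 'Evans', None]
print(lengths)  # {'Adams': 1, 'Bates': 1, 'Cole': 2, 'Dean': 2, 'Evans': 5}
print(sum(lengths.values()) / len(lengths))  # 2.2
```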

Collision Resolution by Progressive Overflow: Search Length (continued)
<Figure 11.7> Average search length versus packing density in a hashed file
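The rising curve of Figure 11.7 can be reproduced roughly by simulation. The sketch below is an illustrative experiment, not how the figure was produced. It assumes home addresses are drawn uniformly at random (with a fixed seed), loads the file by progressive overflow, and averages the number of accesses needed to reach each record. The name average_search_length is hypothetical.

```python
import random


def average_search_length(n_addresses: int, packing_density: float,
                          seed: int = 0) -> float:
    """Simulate loading a file by progressive overflow and return the average
    number of accesses needed to find each stored record."""
    if not 0 < packing_density <= 1:
        raise ValueError("packing density must be in (0, 1]")
    rng = random.Random(seed)
    occupied = [False] * n_addresses
    n_records = max(1, round(n_addresses * packing_density))
    total_accesses = 0
    for _ in range(n_records):
        addr = rng.randrange(n_addresses)    # random home address
        accesses = 1
        while occupied[addr]:                # probe the following addresses
            addr = (addr + 1) % n_addresses
            accesses += 1
        occupied[addr] = True
        total_accesses += accesses
    return total_accesses / n_records


for density in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"{density:.0%}: {average_search_length(10_000, density):.2f}")
```

The exact numbers depend on the seed and file size, but the average grows slowly at low densities and sharply as the file approaches full.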

6. Storing More Than One Record per Address: Buckets
Key     Home address
Green   0
Hall    0
Jenks   2
King    3
Land    3
Marx    3
Nutt    3
File with buckets of three records each:
Address   Bucket contents
0         Green, Hall
1         (empty)
2         Jenks
3         King, Land, Marx
4         Nutt (bucket 3 is full)
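A minimal Python sketch of loading records into buckets, where a full bucket overflows into the next one (progressive overflow at the bucket level). The home addresses come from the table. The bucket size of 3 and the file size of five addresses are read off the example layout, and the name load_buckets is hypothetical.

```python
def load_buckets(records, n_addresses, bucket_size):
    """Load (key, home_address) pairs into buckets; a full bucket overflows
    into the next bucket (progressive overflow)."""
    buckets = [[] for _ in range(n_addresses)]
    for key, home in records:
        addr = home
        for _ in range(n_addresses):
            if len(buckets[addr]) < bucket_size:
                buckets[addr].append(key)
                break
            addr = (addr + 1) % n_addresses
        else:                                # every bucket was full
            raise RuntimeError("file is full")
    return buckets


records = [("Green", 0), ("Hall", 0), ("Jenks", 2), ("King", 3),
           ("Land", 3), ("Marx", 3), ("Nutt", 3)]
for addr, bucket in enumerate(load_buckets(records, 5, 3)):
    print(addr, bucket)
```

Only Nutt leaves its home address here, whereas with one record per address Hall, Land, Marx and Nutt would all overflow.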

Storing More Than One Record per Address: Buckets. Effects of Buckets on Performance
r: number of records; N: number of addresses; b: number of records in a bucket (bucket size)
Packing density = r / (bN)
                                        File without buckets   File with buckets
Number of records (r)                   750                    750
Number of addresses (N)                 1000                   500
Bucket size (b)                         1                      2
Packing density r/(bN)                  0.75                   0.75
Ratio of records to addresses (r/N)     0.75                   1.5

Storing More Than One Record per Address: Buckets
<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes
Packing density   Bucket size 1   2      5      10
20%               9.4             2.2    0.1    0.0
50%               21.3            10.4   2.5    0.4
80%               31.2            20.4   10.3   5.3
100%              36.8            27.1   17.6   12.5
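A Poisson model can estimate these percentages. The sketch below assumes keys hash uniformly at random, so the number X of records sent to an address is approximately Poisson with mean r/N = b × packing density. A bucket of size b then overflows by E[max(X - b, 0)] records on average, which equals r/N - b + sum over x < b of (b - x)p(x). Under these assumptions the sketch reproduces the bucket-size 1 and 2 columns of Table 11.4 to within rounding; for larger buckets its values can differ from the table in the last digit. The name percent_overflow is hypothetical.

```python
import math


def percent_overflow(packing_density: float, bucket_size: int) -> float:
    """Percent of records that overflow their home bucket, assuming keys hash
    uniformly at random so records per address follow a Poisson distribution."""
    b = bucket_size
    lam = packing_density * b          # average records per address, r/N
    p = math.exp(-lam)                 # Poisson p(0)
    unused = 0.0                       # expected empty slots per bucket: sum of (b - x) p(x), x < b
    for x in range(b):
        unused += (b - x) * p
        p *= lam / (x + 1)             # p(x + 1) from p(x)
    overflow = lam - b + unused        # expected records beyond b: E[max(X - b, 0)]
    return 100.0 * overflow / lam


for density in (0.2, 0.5, 0.8, 1.0):
    row = [f"{percent_overflow(density, b):5.1f}" for b in (1, 2, 5, 10)]
    print(f"{density:.0%}", *row)
```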

7. Making Deletions
Initial state:
Address   Record
0         Adams
1         Jones
2         Morris
3         Smith

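Making Deletions (1) Tombstones for Handling Deletions
Deletion of Morris:
Address   Record
0         Adams
1         Jones
2         ###
3         Smith
If Morris's address were simply left empty, a search for Smith that passes through address 2 would stop there and report "Smith cannot be found."
###: tombstone. This mark indicates that a record once lived there but no longer does, so a search continues past it instead of stopping.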

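Making Deletions
(2) Implications of tombstones for insertions: a new record (e.g., "Smith") may be stored in a tombstone slot, but the search must first continue past the tombstone to make sure the key is not already in the file; otherwise a duplicate could be inserted.
(3) Effects of deletions and additions on performance: tombstones lengthen search paths, so the average search length deteriorates over time. The solution to this deteriorating average search length is reorganization.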
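A minimal Python sketch of a progressive-overflow file that handles deletions with tombstones. The caller supplies each key's home address. The assignments used below (Adams 0, Jones 1, Morris 1, Smith 0), the file size of six, and the class name ProgressiveOverflowFile are hypothetical, chosen only to reproduce the layout in the deletion example. Searches stop only at a never-used address, and insertion checks for a duplicate before reusing a tombstone.

```python
EMPTY = None        # address never used
TOMBSTONE = "###"   # address whose record was deleted


class ProgressiveOverflowFile:
    """Progressive-overflow file that uses tombstones for deletions.
    Callers supply each key's home address (no hash function is assumed)."""

    def __init__(self, n_addresses: int):
        self.slots = [EMPTY] * n_addresses

    def _probe(self, home: int):
        n = len(self.slots)
        for i in range(n):
            yield (home + i) % n

    def search(self, key: str, home: int):
        for addr in self._probe(home):
            if self.slots[addr] is EMPTY:
                return None          # only a never-used address ends the search
            if self.slots[addr] == key:
                return addr          # tombstones are passed over, not stopped at
        return None

    def delete(self, key: str, home: int) -> None:
        addr = self.search(key, home)
        if addr is not None:
            self.slots[addr] = TOMBSTONE

    def insert(self, key: str, home: int) -> int:
        # Search the whole probe path first so a tombstone is never reused
        # for a key that is already stored further along.
        if self.search(key, home) is not None:
            raise KeyError(f"duplicate key: {key}")
        for addr in self._probe(home):
            if self.slots[addr] is EMPTY or self.slots[addr] == TOMBSTONE:
                self.slots[addr] = key
                return addr
        raise RuntimeError("file is full")


f = ProgressiveOverflowFile(6)
for key, home in [("Adams", 0), ("Jones", 1), ("Morris", 1), ("Smith", 0)]:
    f.insert(key, home)
f.delete("Morris", 1)
print(f.slots)               # ['Adams', 'Jones', '###', 'Smith', None, None]
print(f.search("Smith", 0))  # 3
```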

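8. Other Collision Resolution Techniques
(1) Double Hashing
- A second hashing function computes an increment c from the key.
- On a collision, c is added to the address repeatedly (mod the number of addresses) until a free address is found.
- Drawback: successive probes are scattered across the file, which adds seek-time overhead.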
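A minimal sketch of a double-hashing probe sequence for integer keys. Both hash functions are hypothetical choices: h1(K) = K mod N gives the home address, and h2(K) = 1 + (K mod (N - 1)) gives the increment c. Choosing N prime makes every c in 1..N-1 relatively prime to N, so the sequence visits every address before repeating.

```python
def double_hash_probes(key: int, n_addresses: int) -> list[int]:
    """Addresses examined for `key`: home, home + c, home + 2c, ... (mod N)."""
    home = key % n_addresses                 # first hash function: home address
    c = 1 + key % (n_addresses - 1)          # second hash function: increment, never 0
    return [(home + i * c) % n_addresses for i in range(n_addresses)]


print(double_hash_probes(27, 11)[:3])  # [5, 2, 10]
print(double_hash_probes(16, 11)[:3])  # [5, 1, 8]
```

Keys 27 and 16 share home address 5 but follow different paths after the first collision, whereas with progressive overflow synonyms would follow the same path.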

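Other Collision Resolution Techniques (2) Chained Progressive Overflow
Each record keeps a link to the next record with the same home address, so a search follows the chain of synonyms instead of examining every record in between.
Key     Home address   Actual address   Search length (1)   Search length (2)
Adams   0              0                1                   1
Bates   1              1                1                   1
Cole    0              2                3                   2
Dean    1              3                3                   2
Evans   4              4                1                   1
Flint   0              5                6                   3
(1) progressive overflow: average 15/6 = 2.5
(2) chained progressive overflow: average 10/6 ≈ 1.7
File with links (-1 = end of chain):
Address   Record   Link
0         Adams    2
1         Bates    3
2         Cole     5
3         Dean     -1
4         Evans    -1
5         Flint    -1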
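A minimal Python sketch of chained progressive overflow that reproduces the link column and both search-length columns above. Home addresses come from the table, and the function names are hypothetical. As in the example, it assumes each home address is occupied by a record that actually hashes there. The general case, where an overflow record has already taken another key's home address, needs extra handling that this sketch omits.

```python
def load_chained(records, n_addresses):
    """Load (key, home_address) pairs with chained progressive overflow."""
    keys = [None] * n_addresses
    links = [-1] * n_addresses                 # -1 marks the end of a chain
    for key, home in records:
        addr = home
        for _ in range(n_addresses):           # progressive overflow for placement
            if keys[addr] is None:
                break
            addr = (addr + 1) % n_addresses
        else:
            raise RuntimeError("file is full")
        keys[addr] = key
        if addr != home:                       # append to the chain for this home address
            tail = home
            while links[tail] != -1:
                tail = links[tail]
            links[tail] = addr
    return keys, links


def chained_search_length(keys, links, key, home):
    """Number of records examined when following the chain from `home`."""
    addr, accesses = home, 1
    while keys[addr] != key:
        addr = links[addr]
        if addr == -1:
            return None                        # end of chain: key not in file
        accesses += 1
    return accesses


records = [("Adams", 0), ("Bates", 1), ("Cole", 0),
           ("Dean", 1), ("Evans", 4), ("Flint", 0)]
keys, links = load_chained(records, 6)
print(links)  # [2, 3, 5, -1, -1, -1]
print([chained_search_length(keys, links, k, h) for k, h in records])  # [1, 1, 2, 2, 1, 3]
```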

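Other Collision Resolution Techniques (3) Chaining with a Separate Overflow Area
Records that do not fit at their home address go to a separate overflow area, where synonyms are chained together; the primary data area holds only home records.
Primary data area:
Home address   Record    Link to overflow area
0              Adams     0
1              Bates     1
2              (empty)
3              (empty)
4              Evans     -1
Overflow area:
Address   Record   Link
0         Cole     2
1         Dean     -1
2         Flint    -1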
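A minimal Python sketch of chaining with a separate overflow area. Using the same keys and home addresses as the chained progressive overflow example, it rebuilds both areas shown above. The overflow area is modeled as a growable list, and the function name is hypothetical.

```python
def load_with_overflow_area(records, n_home_addresses):
    """Load (key, home_address) pairs: home records go to the primary area,
    all others to an overflow area chained per home address."""
    primary = [None] * n_home_addresses          # record stored at each home address
    primary_link = [-1] * n_home_addresses       # first overflow record for that address
    overflow, overflow_link = [], []             # overflow area grows as needed
    for key, home in records:
        if primary[home] is None:
            primary[home] = key
            continue
        overflow.append(key)
        overflow_link.append(-1)
        new = len(overflow) - 1
        if primary_link[home] == -1:             # first synonym for this address
            primary_link[home] = new
        else:                                    # append to the end of the chain
            tail = primary_link[home]
            while overflow_link[tail] != -1:
                tail = overflow_link[tail]
            overflow_link[tail] = new
    return primary, primary_link, overflow, overflow_link


records = [("Adams", 0), ("Bates", 1), ("Cole", 0),
           ("Dean", 1), ("Evans", 4), ("Flint", 0)]
primary, primary_link, overflow, overflow_link = load_with_overflow_area(records, 5)
print(primary, primary_link)    # ['Adams', 'Bates', None, None, 'Evans'] [0, 1, -1, -1, -1]
print(overflow, overflow_link)  # ['Cole', 'Dean', 'Flint'] [2, -1, -1]
```

Because overflow records never occupy home addresses, a chain can never be mixed with records from a different home address.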

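Other Collision Resolution Techniques (4) Scatter Tables: Indexing Revisited
The hashed file (addresses 0 to 4) holds only pointers; the records themselves are kept in a separate file, each with a link to its next synonym.
Record file:
Address   Record   Link
0         Adams    1
1         Cole     3
2         Bates    4
3         Flint    -1
4         Dean     -1
5         Evans    -1
Chains: Adams -> Cole -> Flint; Bates -> Dean; Evans.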

9. Patterns of Record Access
A small percentage of the records in a file account for a large percentage of the accesses.
80/20 rule: 80% of the accesses are performed on 20% of the records.