File Organizations Jan. 2008Yangjun Chen ACS-39021 Outline: Hashing (5.9, 5.10, 3 rd. ed.; 13.8, 4 th ed.) external hashing static hashing & dynamic hashing.

Slides:



Advertisements
Similar presentations
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Faloutsos & Pavlo Lecture#11 (R&G ch. 11) Hashing.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
File Organizations Sept. 2012Yangjun Chen ACS Outline: Hashing (5.9, 5.10, 3 rd. ed.; 13.8, 4 th, 5 th ed.; 17.8, 6 th ed.) external hashing static.
Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
Hash-based Indexes CS 186, Spring 2006 Lecture 7 R &G Chapter 11 HASH, x. There is no definition for this word -- nobody knows what hash is. Ambrose Bierce,
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Hashing. CENG 3512 Motivation The primary goal is to locate the desired record in a single access of disk. – Sequential search: O(N) – B+ trees: O(log.
Hash-Based Indexes The slides for this text are organized into chapters. This lecture covers Chapter 10. Chapter 1: Introduction to Database Systems Chapter.
1 Linear Hashing Appendix for Chapter 1. 2 Linear Hashing Allow a hash file to expand and shrink dynamically without needing a directory. Suppose the.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
CPSC 404, Laks V.S. Lakshmanan1 Hash-Based Indexes Chapter 11 Ramakrishnan & Gehrke (Sections )
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 11 – Hash-based Indexing.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
Chapter 11 (3 rd Edition) Hash-Based Indexes Xuemin COMP9315: Database Systems Implementation.
Copyright 2003Curt Hill Hash indexes Are they better or worse than a B+Tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
File Processing - Indirect Address Translation MVNC1 Hashing Indirect Address Translation Chapter 11.
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Hash Table indexing and Secondary Storage Hashing.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
HASH TABLES Malathi Mansanpally CS_257 ID-220. Agenda: Extensible Hash Tables Insertion Into Extensible Hash Tables Linear Hash Tables Insertion Into.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Hashing General idea: Get a large array
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 5, 6 of Elmasri “ How index-learning turns no student.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Hashing.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 17 Disk Storage, Basic File Structures, and Hashing.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Comp 335 File Structures Hashing.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
FALL 2005 CENG 351 Data Management and File Structures 1 Hashing.
March 23 & 28, Csci 2111: Data and File Structures Week 10, Lectures 1 & 2 Hashing.
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
Storage Structures. Memory Hierarchies Primary Storage –Registers –Cache memory –RAM Secondary Storage –Magnetic disks –Magnetic tape –CDROM (read-only.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
Introduction to Database, Fall 2004/Melikyan1 Hash-Based Indexes Chapter 10.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Static Hashing (using overflow for collision managment e.g., h(key) mod M h key Primary bucket pages 1 0 M-1 Overflow pages(as separate link list) Overflow.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Indexed Sequential Access Method.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
1 Lecture 21: Hash Tables Wednesday, November 17, 2004.
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Chapter 5 Record Storage and Primary File Organizations
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
Dynamic Hashing (Chapter 12)
Are they better or worse than a B+Tree?
Hashing CENG 351.
Disk Storage, Basic File Structures, and Hashing
Database Management Systems (CS 564)
Disk Storage, Basic File Structures, and Hashing
Introduction to Database Systems
static hashing & dynamic hashing hash function
Advance Database System
Database Systems (資料庫系統)
2018, Spring Pusan National University Ki-Joune Li
1.(5) Describe the working process with a database system.
Presentation transcript:

File Organizations Jan. 2008Yangjun Chen ACS Outline: Hashing (5.9, 5.10, 3 rd. ed.; 13.8, 4 th ed.) external hashing static hashing & dynamic hashing hash function mathematical function that maps a key to a bucket address collisions collision resolution scheme open addressing chaining multiple hashing linear hashing

File Organizations Jan. 2008Yangjun Chen ACS Mapping a table into a file ssnnamebdatesexaddresssalary …... Employee file mapping Block (or page) -access unit of operating system -block size: range from 512 to 4096 bytes Bucket -access unit of database system -A bucket contains one or more blocks. A file can be considered as a collection of buckets. Each bucket has an address.

File Organizations Jan. 2008Yangjun Chen ACS External Hashing Consider a file comprising a primary area and an overflow area Records hash to one of many primary buckets Records not fitting into the primary area are relegated to overflow Common implementations are static - the number of primary buckets is fixed - and we expect to need to reorganize this type of files on a regular basis.

File Organizations Jan. 2008Yangjun Chen ACS External Hashing Consider a static hash file comprising M primary buckets We need a hash function that maps the key onto {0, 1, … M-1} If M is prime and Key is numeric then Hash(Key)= Key mod M can work well A collision may occur when more than one records hash to the same address We need a collision resolution scheme for overflow handling because the number of collisions for one primary bucket can exceed the bucket capacity open addressing chaining multiple hashing

File Organizations Jan. 2008Yangjun Chen ACS Overflow handling Open addressing subsequent buckets are examined until an open record position is found no need for an overflow area, unless the number of records exceeds … consider records being inserted R1, R2, R3, R4, R5, R6, R7 with bucket capacity of 2 and hash values 1, 2, 3, 2, 2, 1, How do we handle retrieval, deletion?

File Organizations Jan. 2008Yangjun Chen ACS Overflow handling Chaining a pointer in the primary bucket points to the first overflow record overflow records for one primary bucket are chained together consider records being inserted R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11. with bucket capacity of 2 and hash values 1, 2, 3, 2, 2, 1, 4, 2, 3, 3, 3. deletions? Primary Area Overflow Area

File Organizations Jan. 2008Yangjun Chen ACS Overflow handling Multiple Hashing when collision occurs a next hash function is tried to find an unfilled bucket eventually we would resort to chaining note that open addressing can suffer from poor performance due to islands of full buckets occurring and having a tendency to get even longer - using a second hash function helps avoid that problem

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing A dynamic hash file: grows and shrinks gracefully initially the hash file comprises M primary buckets numbered 0, 1, … M-1 the hashing process is divided into several phases (phase 0, phase 1, phase 2, …). In phase j, records are hashed according to hash functions h j (key) and h j+1 (key) h j (key) = key mod (2 j *M) phase 0: h 0 (key) = key mod (2 0 *M), h 1 (key) = key mod (2 1 *M) phase 1: h 1 (key) = key mod (2 1 *M), h 2 (key) = key mod (2 2 *M) phase 2: h 2 (key) = key mod (2 2 *M), h 3 (key) = key mod (2 3 *M) …...

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing h j (key) is used first; to split, use h j+1 (key) a split pointer keeps track of which bucket to split next splitting buckets splitting occurs according to a specific rule such as - an overflow occurring, or - the load factor reaching a certain value, etc. splitting a bucket means to redistribute the records into two buckets: the original one and a new one. In phase j, to determine which ones go into the original while the others go into the new one, we use h j+1 (key) = key mod 2 j+1 *M to calculate their address. split pointer goes from 0 to 2 j *M-1 during the j th phase, j= 0, 1, 2,...

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing 1.What is a phase? 2.How to split a bucket? 3.When to split a bucket? 4.What bucket will be chosen to split next?

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example initially suppose M=4 h 0 (key) = key mod M; i.e. key mod 4 (rightmost 2 bits) h 1 (key) = key mod 2*M Capacity of a bucket is 2. As the file grows, buckets split and records are redistributed using h 1 (key) = key mode 2*M. n=0 n=1 after the split

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example collision resolution strategy: chaining split rule: if load factor > 0.70 insert the records with hash values: 0011, 0010, 0100, 0001, 1000, 1110, 0101, 1010, 0111, Buckets to be added during the expansion

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example when inserting the sixth record (using h 0 = Key mod M)we would have but the load factor 6/8= 0.75 > 0.70 and so bucket 0 must be split (using h 1 = Key mod 2M): n=0 before the split (n is the point to the bucket to be split.) n=1 after the split load factor: 6/10=0.6 no split

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=1 load factor: 7/10=0.7 no split insert(0101)

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=1 load factor: 8/10=0.8 split using h 1. insert(1010) overflow

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=2 load factor: 8/12=0.66 no split 1010 overflow 0101

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=2 load factor: 9/12=0.75 split using h overflow overflow 0101 insert(0111)

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=3 load factor: 9/14=0.642 no split insert(1100)

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=3 load factor: 10/14=0.71 split using h

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing, example n=4 load factor: 10/16=0.625 no split. At this point, all the 4 (M) buckets are split. n should be set to 0. It begins a second phase. In the second phase, we will use h 1 to insert records and h 2 to split a bucket. -note that h 1 (K) = K mod 2M and h 2 (K) = K mod 4M

File Organizations Jan. 2008Yangjun Chen ACS Linear Hashing including two Phases: - collision resolution strategy: chaining -split rule: load factor > 0.7 -initially M = 4 (M: size of the primary area) -hash functions: h i (key) = key mod 2 i  M (i = 0, 1, 2, …) -bucket capacity = 2 Trace the insertion process of the following keys into a linear hashing file: 3, 2, 4, 1, 8, 14, 5, 10, 7, 24, 17, 13, 15.

File Organizations Jan. 2008Yangjun Chen ACS The first phase – phase 0 when inserting the sixth record we would have but the load factor 6/8= 0.75 > 0.70 and so bucket 0 must be split (using h 1 = Key mod 2M): n=0 before the split (n is the point to the bucket to be split.) n=1 after the split load factor: 6/10=0.6 no split

File Organizations Jan. 2008Yangjun Chen ACS n=1 load factor: 7/10=0.7 no split insert(5)

File Organizations Jan. 2008Yangjun Chen ACS n=1 load factor: 8/10=0.8 split using h 1. insert(10) overflow

File Organizations Jan. 2008Yangjun Chen ACS n=2 load factor: 8/12=0.66 no split 10 overflow 5

File Organizations Jan. 2008Yangjun Chen ACS n=2 load factor: 9/12=0.75 split using h overflow overflow 5 insert(7)

File Organizations Jan. 2008Yangjun Chen ACS n=3 load factor: 9/14=0.642 no split insert(24)

File Organizations Jan. 2008Yangjun Chen ACS n=3 load factor: 10/14=0.71 split using h

File Organizations Jan. 2008Yangjun Chen ACS n= The second phase – phase n = 0; using h 1 = Key mod 2M to insert and h 2 = Key mod 4M to split. insert(17)

File Organizations Jan. 2008Yangjun Chen ACS n=0 load factor: 11/16=0.687 no split insert(13)

File Organizations Jan. 2008Yangjun Chen ACS n=0 load factor: 12/16=0.75 split bucket 0, using h

File Organizations Jan. 2008Yangjun Chen ACS n=1 load factor: 13/18=0.722 split bucket 1, using h insert(15)