Hash Table indexing and Secondary Storage Hashing.

In Memory
An array of B buckets, indexed from 0 to B-1. Each bucket is the head of a linked list. The bucket index is determined by a hash function h(k), where k is the key. A common hash function is h(k) = k % B, the remainder of k divided by B.
4/6/2009, COMP, Mount Allison University
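The in-memory scheme can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: it assumes non-negative integer keys and the modular hash h(k) = k mod B, and the class and method names (ChainedHashTable, insert, lookup) are made up for the example. A Python list stands in for each bucket's linked list.

```python
class ChainedHashTable:
    """In-memory hash table: B buckets, each holding a chain of (key, value) pairs."""

    def __init__(self, num_buckets):
        self.B = num_buckets
        # One chain per bucket; a Python list stands in for the linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, k):
        # Modular hash for non-negative integer keys: result is in 0..B-1.
        return k % self.B

    def insert(self, k, value):
        chain = self.buckets[self.h(k)]
        for pos, (existing, _) in enumerate(chain):
            if existing == k:
                chain[pos] = (k, value)  # key already present: overwrite
                return
        chain.append((k, value))

    def lookup(self, k):
        # Only the chain of bucket h(k) has to be scanned.
        for existing, value in self.buckets[self.h(k)]:
            if existing == k:
                return value
        return None
```

With B = 5, keys 7 and 12 both hash to bucket 2 and end up on the same chain.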

Secondary Storage Hashing
Static hash table: fixed number of buckets.
Dynamic hash table: the number of buckets can grow.

Static Hash Table
The bucket array consists of blocks, rather than pointers to linked lists. Records that the hash function maps to a certain bucket are stored in the block of that bucket. If there is no more room in the block, a chain of overflow blocks can be added to the bucket.
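A rough sketch of a static hash file with overflow chains follows. It models disk blocks as Python objects and counts block reads on lookup; the names (Block, StaticHashFile), the modular hash, and the choice of integer keys are assumptions made for illustration only.

```python
class Block:
    """A disk block holding at most `capacity` records, plus an overflow link."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next block in this bucket's chain, if any


class StaticHashFile:
    def __init__(self, num_buckets, block_capacity):
        self.B = num_buckets  # fixed for the lifetime of the file
        self.capacity = block_capacity
        self.buckets = [Block(block_capacity) for _ in range(num_buckets)]

    def insert(self, key):
        block = self.buckets[key % self.B]
        # Walk the chain until a block with free space is found,
        # appending a new overflow block when the chain is exhausted.
        while len(block.records) >= block.capacity:
            if block.overflow is None:
                block.overflow = Block(self.capacity)
            block = block.overflow
        block.records.append(key)

    def lookup(self, key):
        """Return (found, number_of_blocks_read)."""
        block = self.buckets[key % self.B]
        reads = 0
        while block is not None:
            reads += 1
            if key in block.records:
                return True, reads
            block = block.overflow
        return False, reads
```

The read counter makes the cost of long overflow chains visible: every extra block in a chain is one more disk access for keys stored near its end.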

Dynamic Hash Tables
The number of buckets B approximates the number of records divided by the number of records that fit in a block, i.e. there is about one block per bucket.
Extensible hashing: B grows by doubling.
Linear hashing: B grows by 1.

Extensible Hashing
There is an array of pointers to blocks that represent the buckets, instead of an array consisting of the data blocks themselves. The length of the array is always a power of 2, so in a growing step the number of buckets doubles. There is not necessarily a data block for every bucket: some buckets can share a block if the total number of records in those buckets fits in one block.

Extensible Hashing
The hash function computes for each key a sequence of k bits. The bucket numbers use only some of those k bits, say the i most significant bits. Therefore the bucket array has 2^i entries.
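Taking the i most significant bits of a k-bit value is a single right shift. The helper below is a small sketch; the function name and the argument order are chosen for this example only.

```python
def bucket_number(hash_bits, k, i):
    """Return the i most significant bits of a k-bit hash value."""
    if not 0 <= i <= k:
        raise ValueError("i must be between 0 and k")
    # Dropping the (k - i) low-order bits leaves the top i bits.
    return hash_bits >> (k - i)
```

For the 4-bit value 1010, one bit gives bucket 1 and two bits give bucket 10 (decimal 2).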

Extensible Hashing
Advantage: when looking for a record, we never need to search more than one data block.
Disadvantage: for large i, doubling the array size is a substantial amount of work.

Extensible Hashing
Disadvantage: for large i, the bucket array may no longer fit in memory.
Example: with i = 32, the array has 2^32 (about 4 billion) entries. If every pointer is 32 bits, i.e. 4 bytes, the array occupies 4 bytes x 4 billion = 16 GB.
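The arithmetic in this example can be checked directly. The function name below is hypothetical, and the 4-byte pointer size is taken from the slide.

```python
def directory_size_bytes(i, pointer_bytes=4):
    """Bytes needed for a bucket array of 2**i pointers."""
    return (2 ** i) * pointer_bytes


size = directory_size_bytes(32)
print(size, "bytes =", size / 2 ** 30, "GiB")
```

For i = 32 this is 2^34 bytes, which is exactly 16 GiB, so each additional bit doubles a directory that is already too large for main memory on many machines.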

Extensible Hashing
1. Every key has 4 bits; the most significant bit is used to determine the bucket number.
2. The number 1 appearing in the nub of each block (let's call it j) indicates the number of bits used to determine membership of records in this block.

Extensible Hashing
Insertion, when the record belongs in a block B that is already full:
If i = j, double the length of the bucket array from 2^i to 2^(i+1) entries and increment i by 1. After that, j < i, so the next case applies.
If j < i, split block B into two, distribute the records of B over the two blocks based on their (j+1) most significant bits, set j to j+1 for both blocks, and adjust the pointers in the bucket array to point to the proper blocks.

Extensible Hashing
1. Let's insert 1010 into this structure. It has to go to block 1, but there is no room.
2. So we have to split the block.
3. Since i = j, we first increment i and double the size of the bucket array.
4. Then we can split block 1 into two blocks.

Extensible Hashing
1. Now block 1 is split into blocks 10 and 11.
2. We now use two bits to determine the proper block for every record.
3. Note that the first block is still using one bit, therefore both buckets 00 and 01 point to it.
4. If we insert 0000, it goes to the block pointed to by buckets 00 and 01.
5. If we insert 0111, based on i = 2 it has to go to the same block, and there is no room.
6. Since j < i, we can simply split that block into two and adjust the proper bucket pointers.
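The insertion rule and the walkthrough above can be sketched as follows. This is a simplified in-memory model under several assumptions: keys are their own 4-bit hash values, a block holds 2 records, and the class names (Block, ExtensibleHashTable) are invented for the example. The slide's figure is not part of the transcript, so the starting keys used in the usage note (0001, 1001, 1100) are assumed for illustration.

```python
class Block:
    def __init__(self, j, capacity):
        self.j = j  # number of bits that decide membership in this block
        self.capacity = capacity
        self.keys = []


class ExtensibleHashTable:
    def __init__(self, key_bits=4, capacity=2):
        self.k = key_bits
        self.capacity = capacity
        self.i = 1  # bits used by the bucket array
        self.directory = [Block(1, capacity), Block(1, capacity)]

    def _bucket(self, key):
        return key >> (self.k - self.i)  # i most significant bits

    def lookup(self, key):
        return key in self.directory[self._bucket(key)].keys

    def insert(self, key):
        block = self.directory[self._bucket(key)]
        if len(block.keys) < block.capacity:
            block.keys.append(key)
            return
        if block.j == self.i:
            if self.i == self.k:
                raise OverflowError("all k bits are already in use")
            # Double the array: old entry b becomes entries 2b and 2b+1.
            self.directory = [blk for blk in self.directory for _ in range(2)]
            self.i += 1
        # Now j < i: split the block on its (j+1)-th most significant bit.
        new_j = block.j + 1
        zero = Block(new_j, self.capacity)
        one = Block(new_j, self.capacity)
        for rec in block.keys:
            bit = (rec >> (self.k - new_j)) & 1
            (one if bit else zero).keys.append(rec)
        # Re-point every bucket array entry that referenced the old block.
        for b in range(len(self.directory)):
            if self.directory[b] is block:
                bit = (b >> (self.i - new_j)) & 1
                self.directory[b] = one if bit else zero
        # Retry; a further split happens if all records fell on one side.
        self.insert(key)
```

Starting from the assumed keys 0001, 1001, and 1100, inserting 1010 doubles the array to four entries and splits block 1 into blocks 10 and 11, while entries 00 and 01 keep sharing one block. Inserting 0000 and then 0111 splits that shared block without doubling the array, since its j is smaller than i.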

Linear Hashing
The number of buckets B is always chosen so that the average number of records per bucket is a fixed fraction, say 80%, of the number of records that fill one block. Since blocks cannot always be split, overflow blocks are permitted.

Linear Hashing
The number of bits used to number the entries of the bucket array is i = ceiling(log2 B), where B is the current number of buckets. These bits are always taken from the right (low-order) end of the bit sequence produced by the hash function. We treat those bits as a binary integer m:
If m < B, bucket m exists.
If B <= m < 2^i, bucket m does not exist yet, and we place the record in bucket m - 2^(i-1).
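The addressing rule translates into a short function. The sketch below uses n for the current number of buckets and computes i as ceiling(log2 n) with integer arithmetic; the function name is an assumption for this example.

```python
def linear_bucket(hash_value, n):
    """Bucket that holds hash_value when buckets 0..n-1 currently exist."""
    # ceiling(log2 n) for n >= 2; at least one bit is always used.
    i = max(1, (n - 1).bit_length())
    m = hash_value & ((1 << i) - 1)  # the i low-order bits as an integer
    if m < n:
        return m  # bucket m exists
    # Bucket m does not exist yet: clear the leading bit of m.
    return m - (1 << (i - 1))
```

With n = 3 buckets, a hash value ending in 11 gives m = 3, which is not smaller than n, so the record is placed in bucket 3 - 2 = 1.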

Linear Hashing
1. i is the number of bits used to address the buckets; here only the rightmost bit is used.
2. n is the number of buckets.
3. r is the number of records.
4. We keep r/n <= 1.7, so the average occupancy of a bucket does not exceed 85% of the capacity of a block (a block holds 2 records in this example).

Linear Hashing
1. To insert 0101: since the bit sequence ends in 1, the record goes to bucket 1.
2. There is room, so it can go there.
3. However, r/n now exceeds 1.7, so we raise n to 3; then i = ceiling(log2 3) = 2.

Linear Hashing
1. Now we insert 0001. It has to go to bucket 01, since its last two bits are 01.
2. However, that bucket is full.
3. We add an overflow block.
4. The ratio of records to buckets is 5/3, about 1.67, which is still less than 1.7, so we don't create a new bucket.

Linear Hashing
1. Now let's insert 0111. It should go to bucket m = 11 (binary).
2. m = binary 11 = decimal 3 = n (the number of buckets), so the bucket doesn't exist.
3. We place the record in bucket m - 2^(i-1), i.e. 3 - 2 = 1 (decimal) = 01 (binary).
4. However, the ratio r/n now exceeds 1.7, so we create a new bucket, i.e. 11.

Linear Hashing
1. Suppose we look for a key whose last two bits are 10.
2. Since i = 2, we look for bucket number m = binary 10 = decimal 2.
3. Since m < n, the bucket exists.
1. Now let's look for a key whose last two bits are 11.
2. It must be in bucket 11.
3. But binary 11 = decimal 3 = n, therefore the bucket doesn't exist.
4. We redirect to bucket binary 01 = decimal 1; remember (m - 2^(i-1)).
5. If the key is not there, it surely doesn't exist.
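The linear hashing walkthrough (insert, bucket creation, and lookup) can be sketched as one small class. Several things here are assumptions: keys are their own 4-bit hash values, each bucket is a single Python list whose entries beyond the block capacity stand for overflow blocks, and a newly created bucket takes over the matching records of the bucket that had been holding them. The slides' figures are not in the transcript, so the starting keys in the usage note (0000, 1010, 1111) are assumed as well.

```python
class LinearHashTable:
    def __init__(self, capacity=2, max_ratio=1.7):
        self.capacity = capacity    # records per block
        self.max_ratio = max_ratio  # upper bound kept on r/n
        self.i = 1                  # low-order bits currently used
        self.n = 2                  # number of buckets
        self.r = 0                  # number of records
        # Entries beyond `capacity` in a list model overflow blocks.
        self.buckets = [[], []]

    def _address(self, key):
        m = key & ((1 << self.i) - 1)
        return m if m < self.n else m - (1 << (self.i - 1))

    def lookup(self, key):
        # One bucket (plus its overflow chain) is enough to decide.
        return key in self.buckets[self._address(key)]

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.r += 1
        if self.r / self.n > self.max_ratio:
            self._add_bucket()

    def _add_bucket(self):
        if self.n == 1 << self.i:
            self.i += 1  # all i-bit numbers are in use: use one more bit
        new = self.n
        self.n += 1
        self.buckets.append([])
        # Records of the new bucket were parked in bucket new - 2^(i-1).
        old = new - (1 << (self.i - 1))
        keep = []
        for rec in self.buckets[old]:
            target = self.buckets[new] if self._address(rec) == new else keep
            target.append(rec)
        self.buckets[old] = keep
```

Starting from the assumed keys 0000, 1010, and 1111, inserting 0101 pushes r/n to 2 and creates bucket 10; inserting 0001 overflows bucket 01 but keeps r/n at 5/3; inserting 0111 first lands in bucket 01 and then triggers the creation of bucket 11.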