Yet More on Indexes: Hash Tables
Source: our textbook, slides by Hector Garcia-Molina.


Main Memory Hash Tables
- A hash function h maps search keys to integers in the range 0 to B-1.
- B is the number of buckets.
- There is a B-element array; each entry holds a pointer to a linked list.
- A record with key k is put in the linked list that starts at entry h(k) of the array.

Example of Hash Table: B = 5, h(k) = k mod 5
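A minimal sketch of the main-memory scheme above, assuming a Python list stands in for each linked list and that keys are integers. It uses B = 5 and h(k) = k mod 5 as in the example; the inserted keys are hypothetical.

```python
class MainMemoryHashTable:
    """Bucket array of B entries; each entry heads a chain (a Python list here)."""

    def __init__(self, num_buckets):
        self.B = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, k):
        # Hash function from the example: h(k) = k mod B
        return k % self.B

    def insert(self, k):
        # The record goes into the chain that starts at entry h(k)
        self.buckets[self.h(k)].append(k)

    def contains(self, k):
        # Only the chain at entry h(k) has to be searched
        return k in self.buckets[self.h(k)]


table = MainMemoryHashTable(5)
for key in [3, 8, 10, 14, 22]:
    table.insert(key)
print(table.buckets)  # [[10], [], [22], [3, 8], [14]]
```

Keys 3 and 8 collide (both are 3 mod 5), so they share the chain at entry 3.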

Changes for Secondary Storage
- The bucket array contains blocks, not pointers to linked lists.
- Records that hash to a certain bucket are put in the corresponding block.
- If a bucket overflows, start a chain of overflow blocks.

Insertion into a Static Hash Table
- To insert a record with key K:
  - compute h(K);
  - insert the record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary.

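EXAMPLE: 2 records/bucket
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, then h(e) = 1
Bucket 0 holds d; bucket 1 holds a and c; bucket 2 holds b. Bucket 1 is already full when e arrives, so e goes into an overflow block chained to bucket 1.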
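The insertion procedure can be sketched as follows, assuming each bucket is a chain of blocks modeled as a list of lists (the first list is the primary block, the rest are overflow blocks) and that records are represented by their keys alone. The hash values are the ones from the example, with h(d) = 0 as the slide's figure implies; the dictionary lookup is only a stand-in for h, and the choice of 4 buckets is an assumption.

```python
class StaticHashTable:
    """Bucket n is a chain of blocks: a primary block plus any overflow blocks.
    Each block holds at most `capacity` records."""

    def __init__(self, num_buckets, capacity, h):
        self.capacity = capacity
        self.h = h  # hash function: key -> bucket number in 0..num_buckets-1
        self.buckets = [[[]] for _ in range(num_buckets)]  # one empty primary block each

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block in the chain is full: add an overflow block


# Hash values from the example; the dict only stands in for the hash function.
example_h = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}.__getitem__
table = StaticHashTable(num_buckets=4, capacity=2, h=example_h)
for key in "abcde":
    table.insert(key)
print(table.buckets)
```

The printed chain for bucket 1 is [['a', 'c'], ['e']]: a full primary block followed by one overflow block.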

Deletion from a Static Hash Table
- To delete records with key K:
  - go to the bucket numbered h(K);
  - search for records with key K, deleting any that are found;
  - possibly condense the chain of overflow blocks for that bucket.

EXAMPLE: deletion. Delete e and f; afterwards, maybe move g up from its overflow block into the space that was freed.
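A minimal sketch of deletion with condensing, assuming a bucket's chain is a list of blocks (lists of keys). This version always repacks the survivors into as few blocks as possible, which is one way to realize "possibly condense"; a real system might skip that step when it is not worth the extra I/O. The chain contents are hypothetical.

```python
def delete_and_condense(chain, key, capacity):
    """Delete every record with `key` from a bucket's chain of blocks, then
    repack the survivors so the chain uses as few blocks as possible."""
    survivors = [rec for block in chain for rec in block if rec != key]
    # Earlier blocks fill first, so records in overflow blocks move "up".
    condensed = [survivors[i:i + capacity] for i in range(0, len(survivors), capacity)]
    return condensed or [[]]  # a bucket always keeps its (possibly empty) primary block


chain = [["c", "e"], ["f", "g"]]          # primary block + one overflow block
chain = delete_and_condense(chain, "e", 2)
print(chain)                               # [['c', 'f'], ['g']]
chain = delete_and_condense(chain, "f", 2)
print(chain)                               # [['c', 'g']]: g moved up, overflow block freed
```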

Rule of thumb:
- Try to keep space utilization between 50% and 80%, where
  Utilization = (number of records stored) / (total number of records that fit)
- If utilization < 50%, space is being wasted.
- If utilization > 80%, overflows become significant; how significant depends on how good the hash function is and on the number of records per bucket.
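The rule of thumb is simple arithmetic; a small sketch, assuming capacity is counted as (blocks allocated) x (records per block). The function names and the sample numbers are illustrative only.

```python
def utilization(records_stored, blocks_allocated, records_per_block):
    """Fraction of record slots that are in use."""
    return records_stored / (blocks_allocated * records_per_block)


def assess(u, low=0.5, high=0.8):
    """Classify a utilization value against the 50%-80% rule of thumb."""
    if u < low:
        return "wasting space"
    if u > high:
        return "overflows likely significant"
    return "within rule of thumb"


# 10 blocks of 4 records each = 40 slots
for n in (12, 26, 36):
    u = utilization(n, 10, 4)
    print(n, u, assess(u))
```

With 40 slots, 12 records give 30% (wasted space), 26 give 65% (fine), and 36 give 90% (expect overflow chains).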

Efficiency of Static Hash Tables
- If the hash table is large enough and the hash function distributes keys sufficiently evenly, most buckets have no overflow blocks.
- In this case a lookup typically takes one disk I/O, and an insertion or deletion takes two (read the block, then write it back).
- This is significantly better than sequential indexes and B-trees for single-key lookups.
- (But hash tables do not support efficient range queries, as B-trees do.)
- What if there are long chains of overflow blocks?

How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing:
  - Extensible
  - Linear

Extensible Hash Tables
- Each entry in the bucket array contains a pointer to a block, instead of the block itself.
- The bucket array can grow by doubling in size.
- Certain buckets can share a block if their records fit in it.
- The hash function computes a sequence of b bits, but only the first i bits are used at any time to index into the bucket array.
- The value of i can increase (which corresponds to the bucket array doubling in size).

Extensible hashing: two ideas
(a) Of the b bits of h(K) output by the hash function, use only the first i; i grows over time.

(b) Use a directory: the first i bits of h(K), written h(K)[i], select a directory entry, which points to the bucket.
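The two ideas boil down to bit arithmetic. A minimal sketch, assuming hash values are non-negative integers of exactly b bits and the directory is a Python list of block references; the function names are hypothetical.

```python
def directory_index(hash_value, b, i):
    """First (high-order) i bits of a b-bit hash value, as an integer."""
    return hash_value >> (b - i)


def double_directory(directory):
    """Entry w becomes entries w0 and w1, both pointing where w pointed."""
    return [ptr for ptr in directory for _ in (0, 1)]


h = 0b1010  # a 4-bit hash value
print([directory_index(h, 4, i) for i in range(5)])  # [0, 1, 2, 5, 10]
print(double_directory(["A", "B"]))                  # ['A', 'A', 'B', 'B']
```

Using one more bit appends a bit on the right of the index, which is why doubling can simply duplicate each entry in place.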

Inserting into an Extensible Hash Table
- To insert a record with key K:
  - compute h(K);
  - go to the bucket indexed by the first i bits of h(K);
  - follow the pointer to get to block B;
  - if there is room in B, insert the record;
  - otherwise, let j be the number of bits of the hash value used to determine membership in B.

Insertion (continued)
- Case 1: j < i.
  - Split block B in two.
  - Distribute the records in B to the two new blocks based on the value of their (j+1)-st bit.
  - Update the header of each new block to j+1.
  - Adjust pointers in the bucket array so that entries that used to point to B now point to the correct new block.
  - If there is still no room in the appropriate block for the new record, repeat this process.

Insertion (continued)
- Case 2: j = i.
  - Increment i by 1.
  - Double the length of the bucket array.
  - The entries for w0 and w1 both point to the same block that the old entry w pointed to (the block is shared).
  - Apply Case 1 to split block B.
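The two cases can be combined into one insertion loop. This is a minimal sketch, assuming records are represented by their b-bit hash values, blocks hold `capacity` records, and the table starts from a single block with i = 0 (the slides' example starts from i = 1; starting at 0 is a simplification). The key sequence is hypothetical, but like the slide example it ends with an insert of 1001 that raises i to 3.

```python
class Block:
    def __init__(self, j):
        self.j = j          # number of hash bits that determine membership in this block
        self.records = []   # hash values stand in for whole records


class ExtensibleHashTable:
    def __init__(self, b=4, capacity=2):
        self.b = b                   # the hash function yields b bits
        self.i = 0                   # only the first i bits index the directory
        self.capacity = capacity     # records per block
        self.directory = [Block(0)]  # 2**i pointers to blocks

    def _index(self, h):
        return h >> (self.b - self.i)  # first i bits of h

    def insert(self, h):
        while True:
            block = self.directory[self._index(h)]
            if len(block.records) < self.capacity:
                block.records.append(h)
                return
            if block.j == self.i:
                # Case 2 (j = i): double the directory; w0 and w1 share w's block.
                if self.i == self.b:
                    raise OverflowError("all hash bits in use; cannot split further")
                self.directory = [blk for blk in self.directory for _ in (0, 1)]
                self.i += 1
            # Case 1 (now j < i): split on the (j+1)-st bit, then retry the insert.
            self._split(block)

    def _split(self, block):
        j = block.j
        zero, one = Block(j + 1), Block(j + 1)
        for h in block.records:
            bit = (h >> (self.b - j - 1)) & 1      # (j+1)-st bit from the left
            (one if bit else zero).records.append(h)
        for idx, blk in enumerate(self.directory):
            if blk is block:                        # redirect entries that pointed to B
                bit = (idx >> (self.i - j - 1)) & 1
                self.directory[idx] = one if bit else zero

    def lookup(self, h):
        return h in self.directory[self._index(h)].records


table = ExtensibleHashTable(b=4, capacity=2)
keys = [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]
for k in keys:
    table.insert(k)
print(table.i, len(table.directory))  # 3 8
```

The final 1001 lands in a full block whose j equals i = 2, so Case 2 doubles the directory to 8 entries before Case 1 splits the block.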

Example: h(k) is 4 bits; 2 keys/bucket. An insertion forces a new directory with i = 2.

Example continued: further insertions.

Example continued: inserting 1001 doubles the directory again, giving i = 3.

Extensible hashing: deletion. Two options:
- No merging of blocks.
- Merge blocks and cut the directory in half if possible (reverse the insertion procedure).

Extensible hashing: summary
Pros:
- Can handle growing files, with less wasted space and with no full reorganizations.
Cons:
- Indirection (not bad if the directory is in memory).
- The directory doubles in size (now it fits in memory, now it does not).

Linear Hash Tables
- The number of buckets increases more slowly than with extensible hashing.
- The number of buckets is kept such that, on average, each block is x% full (say 80%); x is the threshold.
- Overflow blocks can occur, but the average number per bucket is much less than 1.
- Use the i low-order bits of the hash value to index into the bucket array.

Linear hashing: another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the b-bit hash value; i grows over time.
(b) The bucket array grows linearly (one bucket at a time).

Inserting into a Linear Hash Table
- To insert a record with key K, where the last i bits of h(K) are a1 a2 … ai:
  - Let m be the integer represented by a1 a2 … ai in binary.
  - If m < n (the number of buckets), bucket m exists: put the record in that bucket.
  - If m ≥ n, bucket m does not exist yet, so put the record in the bucket whose index is 0 a2 … ai.
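The bucket-selection rule as a small function, assuming hash values are non-negative integers and n ≥ 2^(i-1). Under that assumption, m ≥ n implies the leading bit a1 is 1, so replacing it with 0 is the same as subtracting 2^(i-1). The function name is hypothetical.

```python
def linear_hash_bucket(h, i, n):
    """Bucket for hash value h, using its last i bits, when n buckets exist."""
    m = h & ((1 << i) - 1)        # integer value of the last i bits a1 a2 ... ai
    if m < n:
        return m                  # bucket m exists
    return m - (1 << (i - 1))     # bucket 0 a2 ... ai: clear the leading bit a1


# i = 2 and only buckets 00 and 01 exist (n = 2)
print(linear_hash_bucket(0b0101, 2, 2))  # 1: last bits 01, bucket exists
print(linear_hash_bucket(0b0110, 2, 2))  # 0: last bits 10, bucket 10 not there yet
```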

Inserting (continued)
- If there is no room in the indicated bucket, create an overflow block.
- Compare (number of records) / (number of buckets) to the threshold.
- If it exceeds the threshold, add a new bucket and rearrange records.
- If the number of buckets exceeds 2^i, increment i by 1.
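A minimal sketch of the whole insertion procedure, under several assumptions: records are represented by their hash values, each bucket is a Python list in which entries beyond `capacity` stand for overflow blocks, the table starts with n = 2 buckets and i = 1, and the growth test compares records / (buckets x records per block) with 0.8, which is one reading of "each block is x% full". When bucket n is added (binary 1 a2 … ai), only the records of bucket 0 a2 … ai can move, so only that bucket is rehashed.

```python
class LinearHashTable:
    def __init__(self, capacity=2, threshold=0.8):
        self.capacity = capacity    # records per block
        self.threshold = threshold  # target average fullness of blocks
        self.i = 1                  # number of low-order hash bits in use
        self.n = 2                  # number of buckets
        self.r = 0                  # number of records stored
        # Entries beyond `capacity` in a bucket model its overflow blocks.
        self.buckets = [[] for _ in range(self.n)]

    def _bucket(self, h):
        m = h & ((1 << self.i) - 1)
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h):
        self.buckets[self._bucket(h)].append(h)
        self.r += 1
        if self.r / (self.n * self.capacity) > self.threshold:
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                     # new bucket's index, 1 a2 ... ai in binary
        self.n += 1
        if self.n > (1 << self.i):       # more than 2**i buckets: use one more bit
            self.i += 1
        self.buckets.append([])
        old = new - (1 << (self.i - 1))  # bucket 0 a2 ... ai, whose records may move
        moving, self.buckets[old] = self.buckets[old], []
        for h in moving:
            self.buckets[self._bucket(h)].append(h)

    def lookup(self, h):
        return h in self.buckets[self._bucket(h)]


table = LinearHashTable()
for h in [0b0000, 0b1010, 0b1111, 0b0101]:
    table.insert(h)
print(table.n, table.i, table.buckets)  # 3 2 [[0], [15, 5], [10]]
```

The fourth insert pushes the load to 4 / (2 x 2) = 100% > 80%, so bucket 10 is added, i becomes 2, and 1010 moves out of bucket 00.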

Example: b = 4 bits, i = 2, 2 keys/bucket; m = 01 (max used block); buckets 10 and 11 are reserved for future growth.
Rule: if h(k)[i] ≤ m, look at bucket h(k)[i]; otherwise, look at bucket h(k)[i] - 2^(i-1).
Insert 0101: its last two bits are 01 ≤ m, so it goes to bucket 01. Buckets can have overflow chains!

Example (continued): b = 4 bits, i = 2, 2 keys/bucket, m = 01 (max used block); more records are inserted, and the future-growth buckets start being used.

Example continued: m = 11 (max used block), so all four buckets for i = 2 are in use. How do we grow beyond this? The next bucket added is 100, and i increases to 3.

Linear hashing: summary
Pros:
- Can handle growing files, with less wasted space and with no full reorganizations.
- No indirection, unlike extensible hashing.
Cons:
- Can still have overflow chains.

Comparing Index Approaches
- Hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5

Indexing vs Hashing
- Sequential indexes and B-trees are good for range searches, e.g., SELECT … FROM R WHERE R.A > 5
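An in-memory sketch of the contrast, assuming a dict stands in for a hash index and a sorted list searched with the standard `bisect` module stands in for the ordered leaf level of a B-tree. The relation R and its A values are hypothetical. The equality probe is a single dict lookup; the range query needs key order, which the dict does not keep.

```python
import bisect

# Hypothetical relation R, represented by the A value of each row.
a_values = [12, 5, 9, 1, 7, 5, 20]

# Hash-style index: A value -> row positions (good for A = 5).
hash_index = {}
for pos, a in enumerate(a_values):
    hash_index.setdefault(a, []).append(pos)

# Ordered index: (A value, row position) pairs sorted by A (good for A > 5).
ordered = sorted((a, pos) for pos, a in enumerate(a_values))
ordered_keys = [a for a, _ in ordered]


def equality_probe(value):
    return hash_index.get(value, [])


def range_greater_than(value):
    start = bisect.bisect_right(ordered_keys, value)  # first key strictly greater
    return [pos for _, pos in ordered[start:]]


print(equality_probe(5))      # [1, 5]
print(range_greater_than(5))  # [4, 2, 0, 6]
```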

Index definition in SQL
- CREATE INDEX name ON rel (attr)
- CREATE UNIQUE INDEX name ON rel (attr) (defines a candidate key)
- DROP INDEX name

Note: you cannot specify the type of index (e.g., B-tree, hashing, …) or its parameters (e.g., load factor, size of the hash table, …), at least in SQL.
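The three statements can be tried with Python's built-in `sqlite3` module against an in-memory database. This is a sketch; the table R, its columns, and the index names are hypothetical. SQLite implements its indexes as B-trees, and, in line with the note, these statements give no way to ask for a hash structure or set a load factor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE R (A INTEGER, B TEXT)")
cur.execute("CREATE INDEX r_a_idx ON R (A)")
cur.execute("CREATE UNIQUE INDEX r_b_idx ON R (B)")  # B now behaves as a candidate key
cur.executemany("INSERT INTO R VALUES (?, ?)", [(5, "x"), (7, "y"), (5, "z")])

try:
    cur.execute("INSERT INTO R VALUES (9, 'x')")  # duplicate B value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = cur.execute("SELECT B FROM R WHERE A = 5 ORDER BY B").fetchall()
cur.execute("DROP INDEX r_a_idx")
remaining = [name for (name,) in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
conn.close()
print(duplicate_rejected, rows, remaining)
```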