CS4432: Database Systems II
Hash Indexing

Hash-Based Indexes
– Adaptation of main-memory hash tables
– Support equality searches
– No range searches

Static Hashing
– Hash table with N buckets
– Since we are talking about (disk-based) databases, each bucket is one disk page
– Hashing function h(k) maps key k to one of the buckets

Example Hash Functions
– If the key k is an integer, e.g., 100
  - Hash function: k mod N
– If the key k is an n-byte character string, e.g., “abcd”
  - Hash function: (x_1 + x_2 + … + x_n) mod N, where x_i is the value of the i-th byte
– Good hash function → the expected number of keys per bucket is the same for all buckets (uniform distribution of keys)
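A minimal Python sketch of the two hash functions above. The function names, the choice N = 4, and the UTF-8 encoding used to obtain the bytes of the string are assumptions made for illustration; the slides only give the formulas.

```python
def hash_int(k: int, n_buckets: int) -> int:
    """Integer key: k mod N."""
    return k % n_buckets


def hash_string(k: str, n_buckets: int) -> int:
    """String key: add the byte values x_1 .. x_n, then take mod N."""
    return sum(k.encode("utf-8")) % n_buckets


N = 4  # assumed number of buckets, for illustration only
print(hash_int(100, N))         # 100 mod 4
print(hash_string("abcd", N))   # (97 + 98 + 99 + 100) mod 4
```

Note that the byte-sum function maps all permutations of the same characters to the same bucket, so it is simple rather than ideal with respect to the uniformity goal stated on the slide.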

Within a Bucket
– Should we keep entries sorted?
  - Yes, if we care about CPU time
  - It makes insertion and deletion a bit more expensive
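The trade-off can be sketched with Python's standard `bisect` module. Modeling a bucket as an in-memory sorted list is an assumption made for illustration; a real bucket is a disk page, and the helper names are hypothetical.

```python
import bisect


def bucket_insert(bucket: list, key) -> None:
    """Insert while keeping the bucket sorted; entries after the
    insertion point are shifted, which is the extra insertion cost."""
    bisect.insort(bucket, key)


def bucket_contains(bucket: list, key) -> bool:
    """Binary search inside the bucket instead of a linear scan."""
    i = bisect.bisect_left(bucket, key)
    return i < len(bucket) and bucket[i] == key


bucket = []
for key in [42, 7, 19]:
    bucket_insert(bucket, key)
```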

Hash Table: Insertion
– We have 4 buckets (0 to 3)
– Each bucket holds 2 keys
– Insert keys a, b, c, and d, where h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
– Result: bucket 0 holds d; bucket 1 holds a and c; bucket 2 holds b; bucket 3 is empty

Hash Table: Lookup
– Search for key = d
1- Apply the hash function to d → h(d) = 0
2- Read the disk page of bucket 0
3- Search for key d within the page
  - If the keys are sorted, use binary search
– Remember: only equality search is supported

Hash Table: Insertion with Overflow
– Insert key e → h(e) = 1
– Bucket 1 is already full (a, c): create an overflow bucket and insert e there
– The overflow bucket is another disk block
– When searching, remember to check the overflow buckets (if any exist)
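A minimal in-memory sketch of static hashing with overflow chains. The hash values for a, b, c, d, and e are the ones given in the slides; the class and attribute names are hypothetical, and pages are modeled as Python objects rather than disk blocks.

```python
class Page:
    """One disk page (primary bucket or overflow block), modeled in memory."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = []
        self.overflow = None  # next page in the overflow chain, if any


class StaticHashTable:
    def __init__(self, hash_fn, n_buckets: int = 4, capacity: int = 2):
        self.hash_fn = hash_fn
        self.buckets = [Page(capacity) for _ in range(n_buckets)]

    def insert(self, key) -> None:
        page = self.buckets[self.hash_fn(key)]
        # Walk the chain until a page with free space is found.
        while len(page.keys) >= page.capacity:
            if page.overflow is None:
                page.overflow = Page(page.capacity)  # new overflow block
            page = page.overflow
        page.keys.append(key)

    def lookup(self, key) -> bool:
        page = self.buckets[self.hash_fn(key)]
        while page is not None:  # primary page first, then overflow pages
            if key in page.keys:
                return True
            page = page.overflow
        return False


# Hash values from the slides' example; unknown keys default to bucket 0.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
table = StaticHashTable(lambda k: H.get(k, 0))
for key in "abcde":
    table.insert(key)
```

After these inserts, bucket 1 holds a and c in its primary page and e in one overflow page, so a lookup for e costs two page reads instead of one.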

Hash Table: Deletion
– Search for the key to be deleted
– In case of overflow buckets
  - After the deletion, the overflow bucket may no longer be needed
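One possible deletion policy, sketched in Python: remove the key, then repack the remaining keys toward the primary page so that empty overflow pages can be dropped. The slides do not prescribe this policy; the function name, the list-of-lists representation of a bucket chain, and the default capacity of 2 are assumptions.

```python
def delete_key(chain: list, key, capacity: int = 2) -> bool:
    """Delete key from a bucket chain.

    chain[0] is the primary page and chain[1:] are its overflow pages,
    each modeled as a list of keys. Returns True if the key was found.
    """
    keys = [k for page in chain for k in page]
    if key not in keys:
        return False
    keys.remove(key)
    # Repack survivors page by page; the primary page is always kept,
    # even when it ends up empty.
    repacked = [keys[i:i + capacity] for i in range(0, len(keys), capacity)]
    chain[:] = repacked or [[]]
    return True


chain = [["a", "c"], ["e"]]   # bucket 1 from the insertion example
found = delete_key(chain, "c")
```

Repacking touches every page in the chain, so an implementation might instead leave holes and reclaim overflow pages lazily.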

Example: Deletion
– Assume the following hash table
[Figure: hash table holding keys a, b, c, d, e, f, g]
– Delete: e, f
– Maybe move “g” up

Handling the Growth of the Hash Table
– In static hashing, the number of primary buckets is fixed
– If there are many keys, or the key distribution is bad
  - Use overflow buckets
– Bad news
  - The chain of overflow buckets may get long
  - Search time becomes slow
– Solution → dynamic hashing

Dynamic Hashing
– The number of primary buckets is not fixed; it can grow
– Techniques
  - Extensible hashing (our focus)
  - Others …

Extensible Hash Index
– What should we do when a bucket (primary page) becomes full?
– What if we reorganize the file by doubling the number of buckets?
  - Too expensive, because it requires reading and writing all pages
– Main idea of extensible hashing
  - Use a level of indirection: a directory (array of pointers) that points to the hash buckets
  - Double the number of buckets by doubling the directory
  - Split just the bucket that overflowed

Extensible Hash Index: Terminology
– Directory
– Buckets
– Global depth: number of bits used to identify the bucket (directory entry) for a key
– Local depth: kept per bucket; used at insertion time to decide whether the directory must be doubled
– For a given key k → convert it to its bits (0s and 1s)

Extensible Hashing: Example
– The directory uses 2 bits (the right-most ones) → 4 entries
– Directory size = 4
– Each bucket holds at most 4 entries
– How did we insert values 12, 10, 21?

Inserting Key 6
– Since global depth = 2, we use only the 2 right-most bits
– Key 6 → 110, so it goes to directory entry 10
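The directory lookup can be sketched as a bit mask. This assumes, as the example does, that the key's own binary representation is used directly as the hash value; the function name is hypothetical.

```python
def directory_index(key: int, global_depth: int) -> int:
    """Keep only the global_depth right-most bits of the key."""
    return key & ((1 << global_depth) - 1)


# Keys mentioned in the example, with global depth = 2.
for key in (12, 10, 21, 6):
    idx = directory_index(key, 2)
    print(key, format(key, "b"), format(idx, "02b"))
```

With global depth 2, key 12 (1100) maps to entry 00, key 10 (1010) to entry 10, key 21 (10101) to entry 01, and key 6 (110) to entry 10.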

Inserting Key 20
– Since global depth = 2, we use only the 2 right-most bits
– Bucket A is full:
  - If local depth = global depth → double the directory size

Inserting Key 20
1- Increment the global depth
2- This means → double the directory size
3- Divide the overflowing bucket into two
4- Increment the local depth of both resulting buckets
5- Re-distribute the keys between them
6- Leave all other buckets as they are
7- The number of incoming pointers to each of these other buckets is doubled
– For buckets A and A2 → keys are distributed based on 3 bits
– For the others → keys are distributed based on 2 bits

Inserting Key 9
– Key 9 → 1001 (global depth = 3, so we use the 3 right-most bits: 001)
– Key 9 → Bucket B (full)
– Since local depth < global depth
  - No need to double the directory
  - Only split the bucket
  - Increment its local depth
  - Re-distribute its keys

Inserting Key 9
[Figure: bucket B split into two buckets, one holding 1, 9 and the other holding 5, 13, …]

Extensible Hash Index Summary
– Lookup:
  - Global depth: number of bits needed to tell which bucket a datum belongs to
  - Search the bucket
– Insertion:
  - If the bucket has room, add the key
  - If there is no room:
    - We may be able to add a new page without doubling the directory (e.g., when adding 9*)
    - We may need to double the directory (e.g., when adding 20*)
  - How can we tell whether doubling is necessary?
    - Doubling is necessary if the global depth = the local depth of the overflowing bucket
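The insertion rule summarized above can be sketched as a small in-memory Python class. This is an illustrative sketch, not the lecture's implementation: the class names are hypothetical, keys are hashed by their own right-most bits, and the filler keys 4, 32, and 16 are assumed here only to fill bucket A before key 20 arrives.

```python
class Bucket:
    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, global_depth: int = 2, capacity: int = 4):
        self.global_depth = global_depth
        self.capacity = capacity  # max entries per bucket
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _index(self, key: int) -> int:
        # Directory entry = the global_depth right-most bits of the key.
        return key & ((1 << self.global_depth) - 1)

    def lookup(self, key: int) -> bool:
        return key in self.directory[self._index(key)].keys

    def insert(self, key: int) -> None:
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: the new half mirrors the old half,
            # so every existing bucket gets twice as many pointers.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split only the overflowing bucket.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        # Re-distribute the old keys plus the new one. If they all land in
        # the same bucket again, the recursive insert splits once more
        # (this sketch does not handle more than `capacity` equal keys).
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:
            self.insert(k)


index = ExtendibleHash(global_depth=2, capacity=4)
# 4, 32, 16 are assumed filler keys; the rest appear in the slides.
for k in (4, 12, 32, 16, 1, 5, 21, 13, 10, 6):
    index.insert(k)
index.insert(20)  # local depth == global depth: directory doubles
index.insert(9)   # local depth < global depth: split only
```

Under these assumptions, inserting 20 raises the global depth from 2 to 3 and splits bucket A, while inserting 9 only splits bucket B and leaves the directory at 8 entries.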