
Hash Tables
Hash function h: search key → [0…B-1]. Buckets are blocks, numbered [0…B-1].
Big idea: if a record with search key K exists, then it must be in bucket h(K).
- Cuts the search down by a factor of B.
- One disk I/O if there is only one block per bucket.
Hash-table lookup: for record(s) with search key K, compute h(K) and search that bucket.

Hash-Table Insertion
Put the record in bucket h(K) if it fits; otherwise create an overflow block.
- Overflow block(s) are part of the bucket.
Example: insert a record with search key g.
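The lookup and insertion rules above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the slides' implementation: the class name `StaticHashTable`, the constant `BLOCK_CAPACITY` (2 records per block) and the choice of `hash(key) % B` as h are all assumptions made for the example.

```python
BLOCK_CAPACITY = 2  # records per block (assumed for illustration)


class StaticHashTable:
    """Static hashing: B buckets, each a chain of blocks (primary + overflow)."""

    def __init__(self, num_buckets, hash_fn=hash):
        self.B = num_buckets
        # h maps a search key to a bucket number in [0 .. B-1]
        self.h = lambda key: hash_fn(key) % num_buckets
        # each bucket starts with one empty primary block
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key, record):
        blocks = self.buckets[self.h(key)]
        for block in blocks:
            if len(block) < BLOCK_CAPACITY:
                block.append((key, record))  # the record fits
                return
        # no room anywhere in the bucket: create an overflow block
        blocks.append([(key, record)])

    def lookup(self, key):
        # only bucket h(key) can hold the record
        return [rec
                for block in self.buckets[self.h(key)]
                for (k, rec) in block
                if k == key]
```

With one block per bucket a lookup touches a single block; every overflow block in the chain adds one more block to scan.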

What if the File Grows Too Large?
Efficiency is highest if #records < #buckets × #(records/block).
If the file grows, we need a dynamic hashing method to maintain the above relationship.
- Extensible hashing: double the number of buckets when needed.
- Linear hashing: add one more bucket as appropriate.

Dynamic Hashing Framework
Hash function h produces a sequence of k bits. Only some of the bits are used at any time to determine placement of keys in buckets.

Extensible Hashing (buckets may share blocks!)
Keep a parameter i = the number of bits from the beginning of h(K) that determine the bucket.
The bucket array now holds pointers to buckets.
- A block can serve as several buckets.
- For each block, a parameter j ≤ i tells how many bits of h(K) determine membership in the block.
- I.e., a block represents $2^{i-j}$ buckets that share the first j bits of their number.
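As a small illustration of how i and j are used, the sketch below extracts the first i bits of a k-bit hash value and counts the buckets that share a block. The function names `directory_index` and `buckets_sharing_block` are hypothetical, and hash values are assumed to be plain integers below $2^k$.

```python
def directory_index(hash_value, k, i):
    """Return the first (high-order) i bits of a k-bit hash value."""
    return hash_value >> (k - i)


def buckets_sharing_block(i, j):
    """A block that uses j <= i bits stands for 2^(i-j) bucket-array entries."""
    return 2 ** (i - j)
```

For example, with k = 4 and i = 2 the hash value 1010 selects bucket-array entry 10, and a block with j = 1 in a table with i = 3 is shared by four entries.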

Example
[Figure: an extensible hash table when i = 1]

Extensible Hash-Table Insert
If the record with key K fits in the block pointed to by h(K), put it there. If not, let this block B represent j bits.
1. j < i:
   a) split block B into two and distribute the records of B according to their (j+1)st bit;
   b) set j := j+1 for both blocks;
   c) fix the pointers in the bucket array, so that entries that formerly pointed to B now point either to B or to the new block, depending on the (j+1)st bit of the entry's number.
2. j = i:
   a) set i := i+1;
   b) double the bucket array, so that it now has $2^i$ entries (with the new value of i). Let w be an old array entry; both new entries w0 and w1 point to the same block that w used to point to;
   c) proceed as in (1).
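The two cases above can be sketched as follows. This is an in-memory illustration under stated assumptions rather than the slides' own code: keys are represented directly by their k-bit hash values, the names `Block` and `ExtensibleHashTable` are hypothetical, the table starts with i = 0 and a single block, and overflow blocks are not modeled (the sketch raises an error once all k bits are used up).

```python
class Block:
    def __init__(self, j):
        self.j = j         # number of leading hash bits that define membership
        self.records = []  # list of (hash_value, record) pairs


class ExtensibleHashTable:
    def __init__(self, k=4, block_capacity=2):
        self.k = k                   # hash values have k bits
        self.capacity = block_capacity
        self.i = 0                   # leading bits used by the bucket array
        self.directory = [Block(0)]  # bucket array: 2^i pointers to blocks

    def _index(self, h):
        return h >> (self.k - self.i)  # first i bits of h

    def lookup(self, h):
        block = self.directory[self._index(h)]
        return [rec for (hv, rec) in block.records if hv == h]

    def insert(self, h, record):
        while True:
            block = self.directory[self._index(h)]
            if len(block.records) < self.capacity:
                block.records.append((h, record))
                return
            if block.j == self.k:
                raise OverflowError("all k bits used; overflow blocks not modeled")
            if block.j == self.i:
                # case 2: double the array; entries w0 and w1 both point to w's block
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(block)  # case 1: now j < i

    def _split(self, block):
        block.j += 1
        new_block = Block(block.j)
        bit_pos = self.k - block.j  # position of the (old j + 1)st bit
        old_records, block.records = block.records, []
        for hv, rec in old_records:
            target = new_block if (hv >> bit_pos) & 1 else block
            target.records.append((hv, rec))
        # entries whose (old j + 1)st bit is 1 now point to the new block
        for idx, b in enumerate(self.directory):
            if b is block and (idx >> (self.i - block.j)) & 1:
                self.directory[idx] = new_block
```

The loop retries after each split because a split does not guarantee room: if all records in the block agree on the next bit, one half stays full and a further split (or doubling) is needed.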

Example
Insert record with h(K) = …
[Figures: the table before, and now, after the insertion]

Example: Next
Next: records with h(K) = 0000; h(K) = …
- Bucket for 0... gets split,
- but i stays at 2.
Then: record with h(K) = …
- Overflows bucket for …
- Raise i to 3.
[Figures: the table currently, and after the insertions]

Extensible Hash Tables
Advantages:
- Lookup: never search more than one data block.
  - Hope that the bucket array fits in main memory.
Defects:
- Doubling the bucket array is a substantial amount of work.
  - Interrupts access to the data file.
  - Makes certain insertions appear to take very long.
- Repeated doubling soon makes the bucket array too large to fit in main memory.
- Problem with skewed key distributions.
  - E.g., let 1 block = 2 records. Suppose that three records have hash values that happen to agree in the first 20 bits.
  - In that case we would have i = 20 and $2^{20}$ (about one million) bucket-array entries, even though we have only 3 records!

Linear Hashing
Use i bits from the right (low-order) end of h(K).
Buckets are numbered [0…n-1], where $2^{i-1} < n \le 2^i$.
Let the last i bits of h(K) be $m = (a_1, a_2, \ldots, a_i)$.
1. If m < n, then the record belongs in bucket m.
2. If $n \le m < 2^i$, then the record belongs in bucket $m - 2^{i-1}$, that is, the bucket we would get if we changed $a_1$ (which must be 1) to 0.
The values i, n, and r are also part of the structure, where r = # of records and n = # of buckets. In the example: i=1, n=2, r=3.
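The two-case rule above translates directly into a short function. The name `linear_bucket` is hypothetical, and hash values are assumed to be non-negative integers whose low-order bits are used.

```python
def linear_bucket(hash_value, i, n):
    """Bucket number for a hash value in a linear hash table with n buckets."""
    m = hash_value & ((1 << i) - 1)  # last i bits of h(K)
    if m < n:
        return m                     # case 1: bucket m exists
    return m - (1 << (i - 1))        # case 2: change the leading bit a_1 from 1 to 0
```

With i = 2 and n = 3, a key that hashes to 1010 has m = 10, which is below n, so it goes to bucket 10. A hash value ending in 11 would give m = 3, which is not below n, so it falls back to bucket 01.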

Linear Hash-Table Insert
Pick an upper limit on capacity,
- e.g., 85% (1.7 records/bucket in our example).
If an insertion exceeds the capacity limit, set n := n + 1.
If the new n is $2^i + 1$, set i := i + 1. No change in bucket numbers is needed; just imagine a leading 0.
- Need to split bucket $n - 2^{i-1}$ (with n the old bucket count and i the updated bit count) because there is now a bucket numbered (old) n.
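The insertion procedure can be sketched as below. This is an illustration under assumptions, not the slides' implementation: keys are represented by their integer hash values, the class name `LinearHashTable` is hypothetical, each bucket is a flat Python list (records beyond `block_capacity` stand in for overflow blocks), and the table starts with i = 1 and n = 2 as in the slides' example.

```python
class LinearHashTable:
    def __init__(self, block_capacity=2, threshold=0.85):
        self.capacity = block_capacity  # records per block
        self.threshold = threshold      # upper limit on average occupancy
        self.i = 1                      # low-order bits currently in use
        self.n = 2                      # number of buckets
        self.r = 0                      # number of records
        self.buckets = [[], []]

    def _bucket(self, h):
        m = h & ((1 << self.i) - 1)     # last i bits of the hash value
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h, record):
        self.buckets[self._bucket(h)].append((h, record))
        self.r += 1
        # capacity limit exceeded: add one bucket
        if self.r > self.threshold * self.capacity * self.n:
            self._add_bucket()

    def _add_bucket(self):
        self.buckets.append([])
        self.n += 1
        if self.n == (1 << self.i) + 1:
            self.i += 1                 # imagine a leading 0 on old bucket numbers
        new_no = self.n - 1             # the new bucket is numbered (old) n
        old_no = new_no - (1 << (self.i - 1))
        # split old_no: rehash its records between old_no and new_no
        old_records, self.buckets[old_no] = self.buckets[old_no], []
        for h, rec in old_records:
            self.buckets[self._bucket(h)].append((h, rec))

    def lookup(self, h):
        return [rec for (hv, rec) in self.buckets[self._bucket(h)] if hv == h]
```

Note that the bucket being split is determined by n alone, not by which bucket overflowed, so a full bucket may have to carry overflow records until its turn comes.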

Example
Insert record with h(K) = …
- Capacity limit exceeded; increment n.
Before: i=1, n=2 (# of buckets), r=3 (# of records). After: i=2, n=3, r=4.

Example
Insert record with h(K) = …
- Capacity limit not exceeded.
- But the bucket is full; add an overflow bucket.
After: i=2, n=3, r=5.

Example
Insert record with h(K) = …
- Capacity exceeded; set n = 4, add bucket …
- Split bucket 01.
After: i=2, n=4, r=7.

Lookup in Linear Hash Table
For record(s) with search key K, compute h(K); search the corresponding bucket according to the procedure described for insertion.
If the record we wish to look up isn't there, it can't be anywhere else.
E.g., look up a key that hashes to 1010, and then a key that hashes to …
State: i=2, n=3, r=4.

Exercise
Suppose we want to insert keys with hash values 0000…1111 into a linear hash table with a 100% capacity threshold. Assume that a block can hold three records.