1 Lecture 21: Hash Tables Wednesday, November 17, 2004.

Slides:



Advertisements
Similar presentations
External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Hashing. CENG 3512 Motivation The primary goal is to locate the desired record in a single access of disk. – Sequential search: O(N) – B+ trees: O(log.
CPSC 335 Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
CPSC 404, Laks V.S. Lakshmanan1 Hash-Based Indexes Chapter 11 Ramakrishnan & Gehrke (Sections )
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
File Processing : Hash 2015, Spring Pusan National University Ki-Joune Li.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Additional notes on Hashing And notes on HW4. Selected Answers to the Last Assignment The records will hash to the following buckets: K h(K) (bucket number)
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
CST203-2 Database Management Systems Lecture 7. Disadvantages on index structure: We must access an index structure to locate data, or must use binary.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key K exists, then it must be.
Hash Table indexing and Secondary Storage Hashing.
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
External Memory Hashing. Hash Tables Hash function h: search key  [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: If a record with search key.
Chapter 13 Hash Tables Section 13.4 CS 257 Dr. T.Y.Lin Abhishek Pandya ID
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
Chapter 13.4 Hash Tables Steve Ikeoka ID: 113 CS 257 – Spring 2008.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
HASH TABLES Malathi Mansanpally CS_257 ID-220. Agenda: Extensible Hash Tables Insertion Into Extensible Hash Tables Linear Hash Tables Insertion Into.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
Spring 2004 ECE569 Lecture ECE 569 Database System Engineering Spring 2004 Yanyong Zhang
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
Lecture 5 Cost Estimation and Data Access Methods.
FALL 2005 CENG 351 Data Management and File Structures 1 Hashing.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
1 CPS216: Advanced Database Systems Notes 05: Operators for Data Access (contd.) Shivnath Babu.
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module D: Hashing.
Hash Tables and Query Execution March 1st, Hash Tables Secondary storage hash tables are much like main memory ones Recall basics: –There are n.
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Hash 2004, Spring Pusan National University Ki-Joune Li.
Lecture 21: Hash Tables Monday, February 28, 2005.
Hashing CENG 351.
CPSC-608 Database Systems
Database Management Systems (CS 564)
External Memory Hashing
Extendible Indexing Dina Said
Introduction to Database Systems
External Memory Hashing
External Memory Hashing
Extendible Hashing Primarily used for storage of files on disk
Hashing.
CSE 544: Lectures 13 and 14 Storing Data, Indexes
Lecture 6: Data Storage and Indexes
Database Systems (資料庫系統)
2018, Spring Pusan National University Ki-Joune Li
CSE 326: Data Structures: Sorting
CPSC-608 Database Systems
Monday, 5/13/2002 Hash table indexes, query optimization
Wednesday, 5/8/2002 Hash table indexes, physical operators
Hash-Based Indexes Chapter 11
CS4433 Database Systems Indexing.
CS4432: Database Systems II
Presentation transcript:

1 Lecture 21: Hash Tables Wednesday, November 17, 2004

2 Outline Hash-tables (13.4) Note: Used as indexes (far less than B+ trees) Used during query processing (a lot !!)

3 In Class Suppose the B+ tree has depth 4 and degree d=200 How many records does the relation have (maximum) ? How many index blocks do we need to read and/or write during: –A key lookup –An insertion –A deletion

4 Hash Tables Secondary storage hash tables are much like main memory ones Recall basics: –There are n buckets –A hash function f(k) maps a key k to {0, 1, …, n-1} –Store in bucket f(k) a pointer to record with key k Secondary storage: bucket = block, use overflow blocks when needed

5 Assume 1 bucket (block) stores 2 keys + pointers h(e)=0 h(b)=h(f)=1 h(g)=2 h(a)=h(c)=3 Hash Table Example e b f g a c Here: h(x) = x mod 4

6 Search for a: Compute h(a)=3 Read bucket 3 1 disk access Searching in a Hash Table e b f g a c

7 Place in right bucket, if space E.g. h(d)=2 Insertion in Hash Table e b f g d a c

8 Create overflow block, if no space E.g. h(k)=1 More over- flow blocks may be needed Insertion in Hash Table e b f g d a c k

9 Hash Table Performance Excellent, if no overflow blocks Degrades considerably when number of keys exceeds the number of buckets (I.e. many overflow blocks).

10 Extensible Hash Table Allows has table to grow, to avoid performance degradation Assume a hash function h that returns numbers in {0, …, 2 k – 1} Start with n = 2 i << 2 k, only look at first i most significant bits

11 Extensible Hash Table E.g. i=1, n=2 i =2, k=4 Note: we only look at the first bit (0 or 1) 0(010) 1(011) i=

12 Insertion in Extensible Hash Table Insert (010) 1(011) 1(110) i=

13 Insertion in Extensible Hash Table Now insert 1010 Need to extend table, split blocks i becomes 2 0(010) 1(011) 1(110), 1(010) i=

14 Insertion in Extensible Hash Table 0(010) 10(11) 10(10) i= (10) 2

15 Insertion in Extensible Hash Table Now insert 0000, then 0101 Need to split block 0(010) 0(000), 0(101) 10(11) 10(10) i= (10) 2

16 Insertion in Extensible Hash Table After splitting the block 00(10) 00(00) 10(11) 10(10) i= (10) 2 01(01) 2

17 Extensible Hash Table How many buckets (blocks) do we need to touch after an insertion ? How many entries in the hash table do we need to touch after an insertion ?

18 Performance Extensible Hash Table No overflow blocks: access always one read BUT: –Extensions can be costly and disruptive –After an extension table may no longer fit in memory

19 Linear Hash Table Idea: extend only one entry at a time Problem: n= no longer a power of 2 Let i be such that 2 i <= n < 2 i+1 After computing h(k), use last i bits: –If last i bits represent a number > n, change msb from 1 to 0 (get a number <= n)

20 Linear Hash Table Example n=3 (01)00 (11)00 (10)10 i= (01)11 BIT FLIP

21 Linear Hash Table Example Insert 1000: overflow blocks… (01)00 (11)00 (10)10 i= (01)11 (10)00

22 Linear Hash Tables Extension: independent on overflow blocks Extend n:=n+1 when average number of records per block exceeds (say) 80%

23 Linear Hash Table Extension From n=3 to n=4 Only need to touch one block (which one ?) (01)00 (11)00 (10)10 i= (01)11 i= (10)10 (01)00 (11)00 n=11

24 Linear Hash Table Extension From n=3 to n=4 finished Extension from n=4 to n=5 (new bit) Need to touch every single block (why ?) (01)11 i= (10)10 (01)00 (11)00 11