CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

Slides:



Advertisements
Similar presentations
External Memory Hashing. Model of Computation Data stored on disk(s) Minimum transfer unit: a page = b bytes or B records (or block) N records -> N/B.
Advertisements

CS4432: Database Systems II Hash Indexing 1. Hash-Based Indexes Adaptation of main memory hash tables Support equality searches No range searches 2.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part C Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
Hash-Based Indexes Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
Hash-based Indexes CS 186, Spring 2006 Lecture 7 R &G Chapter 11 HASH, x. There is no definition for this word -- nobody knows what hash is. Ambrose Bierce,
1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.
Hash-Based Indexes The slides for this text are organized into chapters. This lecture covers Chapter 10. Chapter 1: Introduction to Database Systems Chapter.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.
Hashing and Indexing John Ortiz.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Chapter 11 (3 rd Edition) Hash-Based Indexes Xuemin COMP9315: Database Systems Implementation.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
Index tuning Hash Index. overview Introduction Hash-based indexes are best for equality selections. –Can efficiently support index nested joins –Cannot.
Dr. Kalpakis CMSC 661, Principles of Database Systems Index Structures [13]
CS 4432lecture #11 - indexing & hashing1 CS4432: Database Systems II Lecture #11 Professor Elke A. Rundensteiner.
1 Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 March 4, 2004 INDEXING II Lecture based on [GUW,
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
B+-tree and Hashing.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Indexing and Hashing Indexing and Hashing Basic Concepts Dense and Sparse Indices B+Trees, B-trees Dynamic Hashing Comparison of Ordered Indexing and.
1 Hash-Based Indexes Chapter Introduction  Hash-based indexes are best for equality selections. Cannot support range searches.  Static and dynamic.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #11.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 Hash-Based Indexes Chapter Introduction : Hash-based Indexes  Best for equality selections.  Cannot support range searches.  Static and dynamic.
HASH TABLES Malathi Mansanpally CS_257 ID-220. Agenda: Extensible Hash Tables Insertion Into Extensible Hash Tables Linear Hash Tables Insertion Into.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #8.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
CS 4432lecture #10 - indexing & hashing1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner.
CS 277 – Spring 2002Notes 51 CS 277: Database System Implementation Arthur Keller Notes 5: Hashing and More.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #12.
CS CS4432: Database Systems II. CS Index definition in SQL Create index name on rel (attr) (Check online for index definitions in SQL) Drop.
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #9.
1 CS143: Index. 2 Topics to Learn Important concepts –Dense index vs. sparse index –Primary index vs. secondary index (= clustering index vs. non-clustering.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
1 CS232A: Database System Principles INDEXING. 2 Given condition on attribute find qualified records Attr = value Condition may also be Attr>value Attr>=value.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing and Hash-Based Index. Selection Queries Yes! Hashing  static hashing  dynamic hashing B+-tree is perfect, but.... to answer a selection query.
CS 4432query processing1 CS4432: Database Systems II Lecture #11 Professor Elke A. Rundensteiner.
CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
Introduction to Database, Fall 2004/Melikyan1 Hash-Based Indexes Chapter 10.
1.1 CS220 Database Systems Indexing: Hashing Slides courtesy G. Kollios Boston University via UC Berkeley.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Indexed Sequential Access Method.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 10.
Chapter 5 Multidimensional Indexes. One dimensional index can be used to support multidimensional query. F1=‘abcd’ F2= 123‘abcd#123’
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Database Systems II Index Structures.
Hash-Based Indexes. Introduction uAs for any index, 3 alternatives for data entries k*: w Data record with key value k w w Choice orthogonal to the indexing.
CS4432: Database Systems II
1 Ullman et al. : Database System Principles Notes 5: Hashing and More.
CPSC 8620Notes 61 CPSC 8620: Database Management System Design Notes 6: Hashing and More.
Access Structures COMP3211 Advanced Databases Dr Nicholas Gibbins
COMP3017 Advanced Databases
CS 245: Database System Principles
CS232A: Database System Principles INDEXING
CPSC-608 Database Systems
CS 245: Database System Principles
External Memory Hashing
Yan Huang - CSCI5330 Database Implementation – Access Methods
External Memory Hashing
CS 245: Database System Principles
Index tuning Hash Index.
Database Design and Programming
Chapter 11: Indexing and Hashing
CPSC-608 Database Systems
CPSC-608 Database Systems
CS4432: Database Systems II
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More

CS 245Notes 52 key  h(key) Hashing Buckets (typically 1 disk block)

CS 245Notes Two alternatives records (1) key  h(key)

CS 245Notes 54 (2) key  h(key) Index record key 1 Two alternatives Alt (2) for “secondary” search key

CS 245Notes 55 Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h: add x 1 + x 2 + ….. x n – compute sum modulo b

CS 245Notes 56  This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function:keys/bucket is the same for all buckets

CS 245Notes 57 Within a bucket: Do we keep keys sorted? Yes, if CPU time critical & Inserts/Deletes not too frequent

CS 245Notes 58 Next: example to illustrate inserts, overflows, deletes h(K)

CS 245Notes 59 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = d a c b h(e) = 1 e

CS 245Notes a b c e d EXAMPLE: deletion Delete: e f f g maybe move “g” up c d

CS 245Notes 511 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

CS 245Notes 512 How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear

CS 245Notes 513 Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i  grows over time…

CS 245Notes 514 (b) Use directory h(K)[i ] to bucket

CS 245Notes 515 Example: h(k) is 4 bits; 2 keys/bucket i = Insert New directory i = 2 2

CS 245Notes Insert: i = Example continued

CS 245Notes i = Insert: 1001 Example continued i = 3 3

CS 245Notes 518 Extensible hashing: deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure)

CS 245Notes 519 Deletion example: Run thru insert example in reverse!

CS 245Notes 520 Extensible hashing Can handle growing files - with less wasted space - with no full reorganizations Summary + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - -

CS 245Notes 521 Linear hashing Another dynamic hashing scheme Two ideas: (a) Use i low order bits of hash grows b i (b) File grows linearly

CS 245Notes 522 Example b=4 bits, i =2, 2 keys/bucket m = 01 (max used block) Future growth buckets If h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ] - 2 i -1 Rule 0101 can have overflow chains! insert 0101

CS 245Notes 523 Example b=4 bits, i =2, 2 keys/bucket m = 01 (max used block) Future growth buckets insert

CS 245Notes 524 Example Continued: How to grow beyond this? m = 11 (max used block) i =

CS 245Notes 525 If U > threshold then increase m (and maybe i )  When do we expand file? Keep track of: # used slots total # of slots = U

CS 245Notes 526 Linear Hashing Can handle growing files - with less wasted space - with no full reorganizations No indirection like extensible hashing Summary + + Can still have overflow chains -

CS 245Notes 527 Example: BAD CASE Very full Very emptyNeed to move m here… Would waste space...

CS 245Notes 528 Hashing - How it works - Dynamic hashing - Extensible - Linear Summary

CS 245Notes 529 Next: Indexing vs Hashing Index definition in SQL Multiple key access

CS 245Notes 530 Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 Indexing vs Hashing

CS 245Notes 531 INDEXING (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 Indexing vs Hashing

CS 245Notes 532 Index definition in SQL Create index name on rel (attr) Create unique index name on rel (attr) defines candidate key Drop INDEX name

CS 245Notes 533 CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS (e.g. Load Factor, Size of Hash,...)... at least in SQL... Note

CS 245Notes 534 ATTRIBUTE LIST  MULTIKEY INDEX (next) e.g., CREATE INDEX foo ON R(A,B,C) Note

CS 245Notes 535 Motivation: Find records where DEPT = “Toy” AND SAL > 50k Multi-key Index

CS 245Notes 536 Strategy I: Use one index, say Dept. Get all Dept = “Toy” records and check their salary I1I1

CS 245Notes 537 Use 2 Indexes; Manipulate Pointers ToySal > 50k Strategy II:

CS 245Notes 538 Multiple Key Index One idea: Strategy III: I1I1 I2I2 I3I3

CS 245Notes 539 Example Record Dept Index Salary Index Name=Joe DEPT=Sales SAL=15k Art Sales Toy 10k 15k 17k 21k 12k 15k 19k

CS 245Notes 540 For which queries is this index good? Find RECs Dept = “Sales” SAL=20k Find RECs Dept = “Sales” SAL > 20k Find RECs Dept = “Sales” Find RECs SAL = 20k

CS 245Notes 541 Interesting application: Geographic Data DATA: x y...

CS 245Notes 542 Queries: What city is at ? What is within 5 miles from ? Which is closest point to ?

CS 245Notes 543 h n b i a c o d Example e g f m l k j h i a b c d e f g n o m l j k Search points near f Search points near b 5 15

CS 245Notes 544 Queries Find points with Yi > 20 Find points with Xi < 5 Find points “close” to i = Find points “close” to b =

CS 245Notes 545 Many types of geographic index structures have been suggested Quad Trees R Trees

CS 245Notes 546 Two more types of multi key indexes Grid Partitioned hash

CS 245Notes 547 Grid Index Key 2 X 1 X 2 …… X n V 1 V 2 Key 1 V n To records with key1=V 3, key2=X 2

CS 245Notes 548 CLAIM Can quickly find records with –key 1 = V i  Key 2 = X j –key 1 = V i –key 2 = X j And also ranges…. –E.g., key 1  V i  key 2 < X j

CS 245Notes 549  But there is a catch with Grid Indexes! How is Grid Index stored on disk? Like Array... X1X2X3X4 X1X2X3X4X1X2X3X4 V1V2V3 Problem: Need regularity so we can compute position of entry

CS 245Notes 550 Solution: Use Indirection Buckets V 1 V 2 V 3 * Grid only V 4 contains pointers to buckets Buckets -- X1 X2 X3

CS 245Notes 551 With indirection: Grid can be regular without wasting space We do have price of indirection

CS 245Notes 552 Can also index grid on value ranges SalaryGrid Linear Scale 123 ToySalesPersonnel 0-20K1 20K-50K2 50K-3 8

CS 245Notes 553 Grid files Good for multiple-key search Space, management overhead (nothing is free) Need partitioning ranges that evenly split keys + - -

CS 245Notes 554 Idea: Key1 Key2 Partitioned hash function h1h

CS 245Notes 555 h1(toy)=0000 h1(sales)=1001 h1(art)= h2(10k)=01100 h2(20k)=11101 h2(30k)=01110 h2(40k)=00111., EX: Insert

CS 245Notes 556 h1(toy)=0000 h1(sales)=1001 h1(art)= h2(10k)=01100 h2(20k)=11101 h2(30k)=01110 h2(40k)= Find Emp. with Dept. = Sales  Sal=40k

CS 245Notes 557 h1(toy)=0000 h1(sales)=1001 h1(art)= h2(10k)=01100 h2(20k)=11101 h2(30k)=01110 h2(40k)= Find Emp. with Sal=30k look here

CS 245Notes 558 h1(toy)=0000 h1(sales)=1001 h1(art)= h2(10k)=01100 h2(20k)=11101 h2(30k)=01110 h2(40k)= Find Emp. with Dept. = Sales look here

CS 245Notes 559 Post hashing discussion: - Indexing vs. Hashing - SQL Index Definition - Multiple Key Access - Multi Key Index Variations: Grid, Geo Data - Partitioned Hash Summary