CS4432: Database Systems II. Lecture #10: Indexing & Hashing. Professor Elke A. Rundensteiner.

Presentation transcript:

Slide 1: CS4432: Database Systems II. Lecture #10. Professor Elke A. Rundensteiner.

Slide 2: Chapter 4: INDEXING wrap-up. (1) B+-tree odds and ends. (2) Hashing (briefly). [Figure: an index maps a value to a record.]

Slide 3: B+-tree example. [Figure: example B+-tree with its root marked; the value of n did not survive extraction.]

Slide 4: Comparison: B-tree vs. indexed sequential file.
  Indexed sequential file:
  - Less space, so lookup is faster
  - Inserts are managed by an overflow area
  - Requires temporary restructuring
  - Unpredictable performance
  B-tree:
  - Consumes more space, so lookup is slower
  - Each insert/delete potentially restructures
  - Built-in restructuring
  - Predictable performance

Slide 5: With an indexed sequential file, the DBA does not know when to reorganize, and does not know how full to load the pages of a new index. So: B-trees are better …

Slide 6: À la buffering… Is LRU a good policy for B+-tree buffers? Of course not! Should try to keep the root in memory at all times (and perhaps some nodes from the second level).
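
A minimal sketch of the idea on this slide, assuming a tiny buffer pool that evicts in LRU order but never evicts pinned pages. The class name `PinningLRU` and the page ids are hypothetical, chosen only for illustration; real buffer managers are considerably more involved.

```python
from collections import OrderedDict


class PinningLRU:
    """LRU buffer pool that refuses to evict pinned pages (e.g. the B+-tree root)."""

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)      # page ids that must stay in memory
        self.frames = OrderedDict()    # page id -> True, oldest access first

    def access(self, page_id):
        """Return True on a buffer hit, False on a miss (the page is then loaded)."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
            return True
        if len(self.frames) >= self.capacity:
            # Evict the least recently used page that is not pinned.
            victim = next((p for p in self.frames if p not in self.pinned), None)
            if victim is None:
                raise RuntimeError("all frames are pinned")
            del self.frames[victim]
        self.frames[page_id] = True
        return False


pattern = ["root", "leaf1", "leaf2", "root"]
plain = PinningLRU(capacity=2)
pinned = PinningLRU(capacity=2, pinned={"root"})
plain_hits = [plain.access(p) for p in pattern]
pinned_hits = [pinned.access(p) for p in pattern]
```

With two frames, plain LRU evicts the root when `leaf2` arrives, so the final root access misses; with the root pinned it hits.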

Slide 7: Interesting problem: for a B+-tree, how large should n be? … n is the number of keys per node.

Slide 8: Assumptions: n children per node and N records in the database.
  (1) Time to read a B-tree node from disk is $t_{seek} + t_{read} \cdot n$ msec.
  (2) Once the node is in main memory, use binary search to locate the key: $a + b \log_2 n$ msec.
  (3) Need to search (read) $\log_n N$ tree nodes.
  (4) $t_{search} = \left(t_{seek} + t_{read} \cdot n + a + b \log_2 n\right) \cdot \log_n N$
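
The cost model above can be evaluated numerically. The sketch below is only an illustration: the function names and every parameter value in the usage lines are assumptions, not figures from the lecture, and the optimum is found by brute force over integer n.

```python
import math


def search_time_ms(n, N, t_seek, t_read, a, b):
    """Estimated lookup time (msec) under the model: per-node cost times log_n(N)."""
    per_node = t_seek + t_read * n + (a + b * math.log2(n))
    levels = math.log(N) / math.log(n)   # log_n(N) nodes read per lookup
    return per_node * levels


def best_fanout(N, t_seek, t_read, a, b, n_max=5000):
    """Brute-force the integer n in [2, n_max] that minimizes the modelled time."""
    return min(range(2, n_max + 1),
               key=lambda n: search_time_ms(n, N, t_seek, t_read, a, b))


# Hypothetical parameter values, chosen only to show the trend.
slow_disk = best_fanout(10**6, t_seek=10.0, t_read=0.01, a=0.001, b=0.001)
fast_disk = best_fanout(10**6, t_seek=1.0, t_read=0.01, a=0.001, b=0.001)
```

Under these assumed numbers the slower-seeking disk favours a larger node size, since each extra level of the tree costs one more seek.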

Slide 9: Can get: f(n) = time to find a record. [Figure: plot of f(n) against n, with its minimum at $n_{opt}$.] Find $n_{opt}$ by solving $f'(n) = 0$. What happens to $n_{opt}$ as the disk gets faster? As the CPU gets faster? …
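
One way to approach these questions, strictly within the slide-8 cost model and treating n as a continuous variable (both are simplifying assumptions): because $\log_n N = \ln N / \ln n$, the binary-search term $b \log_2 n \cdot \log_n N$ collapses to $b \log_2 N$, which does not depend on n.

```latex
\begin{aligned}
f(n) &= \bigl(t_{seek} + t_{read}\,n + a + b\log_2 n\bigr)\,\frac{\ln N}{\ln n}
      = \bigl(t_{seek} + a + t_{read}\,n\bigr)\,\frac{\ln N}{\ln n} + b\log_2 N, \\[4pt]
f'(n) &= \ln N \cdot \frac{t_{read}\,\ln n - \dfrac{t_{seek} + a + t_{read}\,n}{n}}{(\ln n)^2} = 0
\quad\Longrightarrow\quad
n_{opt}\,\bigl(\ln n_{opt} - 1\bigr) = \frac{t_{seek} + a}{t_{read}}.
\end{aligned}
```

The left-hand side grows with n, so in this model $n_{opt}$ shrinks as the seek time falls relative to the per-key transfer time, and b drops out entirely; a faster CPU matters only through a, which is typically small next to $t_{seek}$.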

Slide 10: Bulk Loading of B+ Tree. For a large collection of records, create a B+ tree.
  - Method 1: Repeatedly insert records: slow.
  - Method 2: Bulk loading: more efficient.

Slide 11: Bulk Loading of B+ Tree. Initialization:
  - Sort all data entries.
  - Insert a pointer to the first (leaf) page in a new (root) page.
  [Figure: a root page pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree: 3*, 4*, 6*, 9*, 10*, 11*, 12*, 13*, 20*, 22*, 23*, 31*, 35*, 36*, 38*, 41*, 44*.]

Slide 12: Bulk Loading (Contd.)
  - Index entries for leaf pages are always entered into the right-most index page. When this fills up, it splits. (The split may go up the right-most path to the root.)
  - Faster than repeated inserts, especially when one considers locking!
  [Figure: two snapshots of the build over the data entries 3* … 44*, showing the root, the index pages built so far, and the data entry pages not yet in the B+ tree.]
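
A simplified sketch of bulk loading. Note that it swaps in a level-by-level build (pack sorted entries into leaves, then build each index level from the one below) rather than the right-most-page insert-and-split procedure described on the slide; both construct the tree bottom-up from sorted data. The names `bulk_load` and `lookup`, the dict-based node layout, and the page sizes are assumptions for illustration. The last node on a level may end up under-full, which a real B+-tree would rebalance.

```python
import bisect


def bulk_load(keys, leaf_capacity=2, fanout=3):
    """Build a B+-tree-like structure bottom-up. Returns (root, leaves)."""
    entries = sorted(keys)                      # step 1: sort all data entries
    leaves = [entries[i:i + leaf_capacity]
              for i in range(0, len(entries), leaf_capacity)]
    # Each item on a level is (smallest key in subtree, node).
    level = [(leaf[0], leaf) for leaf in leaves]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {"keys": [low for low, _ in group[1:]],      # separators
                    "children": [child for _, child in group]}
            next_level.append((group[0][0], node))
        level = next_level
    root = level[0][1] if level else []
    return root, leaves


def lookup(root, key):
    """Descend from the root to a leaf and test membership there."""
    node = root
    while isinstance(node, dict):
        i = bisect.bisect_right(node["keys"], key)   # child whose range holds key
        node = node["children"][i]
    return key in node


data = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root, leaves = bulk_load(data)
```

Because the leaves are produced in one pass over sorted input, they come out in key order, which is the sequential-storage property the summary slide attributes to bulk loading.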

Slide 13: Summary of Bulk Loading.
  Method 1: multiple inserts.
  - Slow.
  - Does not give sequential storage of leaves.
  Method 2: bulk loading.
  - Has advantages for concurrency control.
  - Fewer I/Os during build.
  - Leaves will be stored sequentially (and linked).
  - Can control the "fill factor" on pages.

Slide 14: Hashing. [Figure: key → h(key) → buckets (typically 1 disk block each).]

Slide 15: Example hash function. Key = '$x_1 x_2 \ldots x_n$', an n-byte character string. Have b buckets. h: add $x_1 + x_2 + \ldots + x_n$, then compute the sum modulo b.
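
The example function can be written directly. This is a sketch assuming the key is encoded as UTF-8 bytes; the function name `h` follows the slide, while the sample keys and bucket count are made up.

```python
def h(key: str, b: int) -> int:
    """Add up the bytes of the key and reduce the sum modulo the bucket count b."""
    return sum(key.encode("utf-8")) % b


bucket_ab = h("ab", 10)   # (97 + 98) % 10 = 5
bucket_ba = h("ba", 10)   # same bytes, same sum, same bucket
```

Keys that are rearrangements of the same bytes always collide under this function, which is one reason it may not be the best choice.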

Slide 16: This may not be the best function … Read Knuth Vol. 3 if you really need to select a good function. Good hash function: the expected number of keys per bucket is the same for all buckets.

Slide 17: Within a bucket: do we keep keys sorted? Yes, if CPU time is critical and inserts/deletes are not too frequent.

Slide 18: Next: an example to illustrate inserts, overflows, and deletes. [Figure: h(K).]

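Slide 19: EXAMPLE: 2 records/bucket. INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = [value lost in extraction]; then h(e) = 1. [Figure: buckets holding d, a, c, and b; since a and c already fill bucket 1, e goes to an overflow block.]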

Slide 20: EXAMPLE: deletion. Delete: e, f. [Figure: buckets holding a, b, c, d, e, f, g before and after the deletions; after a deletion, maybe move "g" up.]
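
The insert, overflow, and delete behaviour in this example can be sketched as follows. The class `StaticHashFile` and its layout are hypothetical, and the hash values are supplied through a lookup table; h(d) = 0 is an assumption, because that value is missing from the transcript.

```python
class StaticHashFile:
    """Static hashing: fixed bucket count, overflow blocks chained per bucket."""

    def __init__(self, num_buckets, capacity, hash_fn):
        self.capacity = capacity      # records per block
        self.hash_fn = hash_fn
        # Each bucket is a chain of blocks: block 0 is primary, the rest overflow.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])           # every block is full: add an overflow block

    def delete(self, key):
        chain = self.buckets[self.hash_fn(key)]
        for block in chain:
            if key in block:
                block.remove(key)
                last = chain[-1]
                if last is not block and last:
                    block.append(last.pop())   # move a key up from the overflow
                if len(chain) > 1 and not chain[-1]:
                    chain.pop()                # drop an emptied overflow block
                return True
        return False


# h(d) = 0 is assumed; the other values are the ones given on slide 19.
codes = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
hash_file = StaticHashFile(num_buckets=4, capacity=2, hash_fn=codes.__getitem__)
for key in "abcde":
    hash_file.insert(key)
after_inserts = [list(map(list, chain)) for chain in hash_file.buckets]
hash_file.delete("c")
```

After the five inserts, bucket 1 holds a and c with e in an overflow block; deleting c pulls e up into the primary block and frees the overflow block.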

Slide 21: Rule of thumb: try to keep space utilization between 50% and 80%.
  Utilization = (# keys used) / (total # keys that fit)
  - If < 50%, wasting space.
  - If > 80%, overflows become significant; how significant depends on how good the hash function is and on # keys/bucket.
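
As a small worked illustration of the rule of thumb, with hypothetical function names and made-up numbers:

```python
def utilization(keys_used, num_buckets, keys_per_bucket):
    """Utilization = (# keys used) / (total # keys that fit)."""
    return keys_used / (num_buckets * keys_per_bucket)


def rule_of_thumb(u):
    """Classify a utilization value against the 50%-80% target range."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within target"


u = utilization(keys_used=5, num_buckets=4, keys_per_bucket=2)   # 5 / 8 = 0.625
verdict = rule_of_thumb(u)
```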

Slide 22: How do we cope with growth?
  - Overflows and reorganizations
  - Dynamic hashing
  - Extensible hashing
  - Others …

Slide 23: Extensible hashing: idea 1. (a) Use i of the b bits output by the hash function. [Figure: h(K) yields b bits, of which i are used.] i grows over time…

Slide 24: Extensible hashing: idea 2. (b) Use a directory. [Figure: h(K)[i] selects a directory entry, which points to the bucket.]

Slide 25: Example: h(k) is 4 bits; 2 keys/bucket. [Figure: directory and buckets before an insert, and the new directory after it; the key values and the values of i did not survive extraction.]

Slide 26: Example continued. [Figure: further inserts into the directory and buckets; the inserted keys and the value of i did not survive extraction.]

Slide 27: Example continued. Insert: 1001. [Figure: directory and buckets before and after the insert; afterwards i = 3.]
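
A minimal sketch of extensible hashing with a directory, assuming the leading i bits of a b-bit hash value index the directory (the slides only say "i of b bits"). The class names, the identity hash function, and all keys other than 1001 are assumptions; they were chosen so that the final insert of 1001 ends with i = 3 as on the slide.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth    # local depth: number of leading bits this bucket uses
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2, hash_fn=lambda k: k):
        self.b = hash_bits
        self.capacity = capacity
        self.hash_fn = hash_fn
        self.i = 0                        # global depth: bits used by the directory
        self.directory = [Bucket(0)]

    def _index(self, key):
        return self.hash_fn(key) >> (self.b - self.i)   # leading i bits

    def lookup(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            if bucket.depth == self.b:
                raise OverflowError("all b bits in use; an overflow block is needed")
            if bucket.depth == self.i:
                # Double the directory: each old entry becomes two adjacent entries.
                self.directory = [bk for bk in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)

    def _split(self, bucket):
        bucket.depth += 1
        new = Bucket(bucket.depth)
        shift = self.b - bucket.depth
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:                # redistribute on the newly used bit
            target = new if (self.hash_fn(k) >> shift) & 1 else bucket
            target.keys.append(k)
        for j, entry in enumerate(self.directory):
            if entry is bucket and (j >> (self.i - bucket.depth)) & 1:
                self.directory[j] = new


table = ExtensibleHash()
depths = []
# All keys except the final 1001 are assumed for illustration.
for key in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(key)
    depths.append(table.i)
```

Only the overflowing bucket is split on each step, and the directory doubles only when that bucket's local depth already equals i, which is why no full reorganization is needed as the file grows.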

Slide 28: Extensible hashing: deletion. Merge blocks and cut the directory if possible (reverse of the insert procedure).

Slide 29: Summary: extensible hashing.
  + Can handle growing files, with less wasted space and with no full reorganizations.
  - Indirection (not bad if the directory is in memory).
  - Directory doubles in size (now it fits, now it does not).

Slide 30: Indexing vs. Hashing. Hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5

Slide 31: Indexing vs. Hashing. INDEXING (including B-trees) is good for range searches, e.g., SELECT … FROM R WHERE R.A > 5
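
The contrast between the two query shapes can be sketched in memory, using a Python dict as a stand-in for a hash index and a sorted list with binary search as a stand-in for an ordered index such as a B-tree. The relation contents and function names are made up for illustration.

```python
import bisect

# Hypothetical relation R as (A value, record id) pairs.
rows = [(5, "r1"), (2, "r2"), (9, "r3"), (5, "r4"), (7, "r5")]

# Hash index on A: direct probe by key, but no useful ordering.
hash_index = {}
for a, rid in rows:
    hash_index.setdefault(a, []).append(rid)

# Ordered index on A: entries kept sorted by key.
ordered_index = sorted(rows)
ordered_keys = [a for a, _ in ordered_index]


def probe_equal(value):
    """WHERE R.A = value: a single hash probe."""
    return hash_index.get(value, [])


def range_greater(value):
    """WHERE R.A > value: binary search for the start, then a sequential scan."""
    start = bisect.bisect_right(ordered_keys, value)
    return [rid for _, rid in ordered_index[start:]]


equal_5 = probe_equal(5)
greater_5 = range_greater(5)
```

Answering the range query from the hash index alone would require examining every bucket, because hashing scatters neighbouring key values.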

Slide 32: The BIG picture…
  - Chapters 2 & 3: Storage, records, blocks...
  - Chapters 4 & 5: Access mechanisms: indexes, B-trees, hashing, multi-key
  - Chapters 6 & 7: Query processing