Chapter 12: Indexing and Hashing

Presentation transcript:

1 Chapter 12: Indexing and Hashing
  Indexing: Basic Concepts; Ordered Indices; B+-Tree Index Files
  Hashing: Static; Dynamic Hashing
  More: bitmap indexing

2 Hashing
  Static hashing
  Dynamic hashing

3 Hashing: key -> h(key) -> buckets (typically 1 disk block each)

4 Example hash function
  Key = 'x1 x2 … xn', an n-byte character string
  Have b buckets
  h: add x1 + x2 + … + xn, then compute the sum modulo b
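The hash function on this slide can be sketched in a few lines of Python. The function name `sum_mod_hash` and the choice to read the key as UTF-8 bytes are assumptions made for illustration; the slide only specifies "add the bytes, take the sum modulo b".

```python
def sum_mod_hash(key: str, b: int) -> int:
    """Add the byte values x1 + x2 + ... + xn of the key, then reduce modulo b."""
    if b <= 0:
        raise ValueError("need at least one bucket")
    # Assumption: the key is interpreted as its UTF-8 byte sequence.
    return sum(key.encode("utf-8")) % b


# Bucket number for a sample key when the file has 4 buckets.
bucket = sum_mod_hash("ab", 4)
```

The result is always in the range 0 … b-1, so it can be used directly as a bucket number.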

5 This may not be the best function …
  Good hash function: the expected number of keys/bucket is the same for all buckets
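One way to see why a byte-sum hash may not be the best function is to count how many keys land in each bucket. The sketch below is illustrative only: the helper names and the sample keys are assumptions, not part of the slides. Because addition ignores the order of the bytes, all permutations of the same characters fall into one bucket.

```python
from collections import Counter


def sum_mod_hash(key: str, b: int) -> int:
    """Byte-sum hash: add the key's bytes and reduce modulo b."""
    return sum(key.encode("utf-8")) % b


def bucket_loads(keys, b):
    """Return the number of keys that hash to each of the b buckets."""
    counts = Counter(sum_mod_hash(k, b) for k in keys)
    return [counts.get(i, 0) for i in range(b)]


# Hypothetical keys: six permutations of the same three characters.
keys = ["abc", "acb", "bac", "bca", "cab", "cba"]
loads = bucket_loads(keys, 4)  # every key ends up in the same bucket
```

A good hash function would spread such keys so that the loads are roughly equal across buckets.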

6 Within a bucket: do we keep keys sorted?
  Yes, if CPU time is critical and inserts/deletes are not too frequent

7 Next: example to illustrate inserts, overflows, deletes

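8 EXAMPLE: 2 records/bucket
  INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = …
  (Figure: buckets after these inserts hold d | a, c | b.)
  INSERT: h(e) = 1; bucket 1 already holds a and c, so e goes to an overflow block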

9 EXAMPLE: deletion
  Delete: e, f
  (Figure: buckets holding a, b, c, d, e, f, g; after a delete, maybe move “g” up.)
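The insert, overflow, and delete behaviour in this example can be sketched as a small static hash file. This is a minimal sketch under stated assumptions: the class name `StaticHashFile`, the choice of 4 buckets, and the value h(d) = 0 are hypothetical (the slide's value for h(d) did not survive), and "moving a record up" is modeled by repacking the bucket's blocks after a delete.

```python
BLOCK_CAPACITY = 2  # 2 records/bucket block, as in the example


class StaticHashFile:
    def __init__(self, num_buckets, h):
        self.h = h
        # Each bucket is a chain of blocks: the primary block, then overflow blocks.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        chain.append([key])  # every block is full: add an overflow block

    def delete(self, key):
        chain = self.buckets[self.h(key)]
        remaining = [k for block in chain for k in block if k != key]
        # Repack so records from overflow blocks move up into freed slots.
        repacked = [remaining[i:i + BLOCK_CAPACITY]
                    for i in range(0, len(remaining), BLOCK_CAPACITY)]
        chain[:] = repacked or [[]]


# Hash values from the slide, except h(d) = 0, which is an assumption.
hash_values = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}
f = StaticHashFile(4, hash_values.__getitem__)
for key in "abcde":
    f.insert(key)
after_inserts = [list(map(list, chain)) for chain in f.buckets]
f.delete("c")  # "e" moves up from the overflow block
after_delete = f.buckets[1]
```

Repacking on every delete is the simplest policy; a real file might instead leave the slot empty and only reclaim overflow blocks during reorganization.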

10 Rule of thumb: try to keep space utilization between 50% and 80%
  Utilization = (# keys used) / (total # keys that fit)
  If < 50%, wasting space
  If > 80%, overflows significant (depends on how good the hash function is and on # keys/bucket)
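As a worked instance of the rule of thumb, the sketch below computes utilization and classifies it against the 50% and 80% thresholds. It assumes that "total # keys that fit" means the number of primary buckets times the keys per bucket; the function names are hypothetical.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    # Assumption: capacity counts primary buckets only, not overflow blocks.
    return keys_used / (num_buckets * keys_per_bucket)


def rule_of_thumb(u: float) -> str:
    """Classify a utilization value using the thresholds from the slide."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within 50%-80%"


# 5 keys stored in 4 buckets of 2 keys each: 5 / 8 = 62.5% utilization.
u = utilization(5, 4, 2)
verdict = rule_of_thumb(u)
```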

11 How do we cope with growth?
  Overflows and reorganizations
  Dynamic hashing: Extensible; Linear

12 Extensible hashing: two ideas
  (a) Use i of the b bits output by the hash function h(K); i grows over time …

13 (b) Use a directory: h(K)[i] selects a directory entry, which points to the bucket

14 Example: h(k) is 4 bits; 2 keys/bucket
  (Figure: an insert overflows a bucket, giving a new directory with i = 2.)

15 Example continued
  (Figure: further inserts.)

16 Example continued
  Insert: 1001 (figure: the directory grows to i = 3)
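The two ideas (use the first i bits of h(K), and go through a directory) can be sketched as follows. This is a minimal sketch, not the deck's exact procedure: the class and attribute names are hypothetical, hash values are inserted directly as 4-bit integers, and every inserted value other than 1001 is invented for illustration because the figure's keys did not survive. Each bucket records how many leading bits it is responsible for (its local depth), which is a common way to decide between splitting one bucket and doubling the directory.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # leading hash bits shared by all keys here
        self.keys = []


class ExtensibleHash:
    def __init__(self, hash_bits=4, bucket_size=2):
        self.hash_bits = hash_bits
        self.bucket_size = bucket_size
        self.i = 1                                # bits currently used
        self.directory = [Bucket(1), Bucket(1)]   # 2**i entries

    def _index(self, h):
        return h >> (self.hash_bits - self.i)     # first i bits of h

    def lookup(self, h):
        return h in self.directory[self._index(h)].keys

    def insert(self, h):
        bucket = self.directory[self._index(h)]
        if len(bucket.keys) < self.bucket_size:
            bucket.keys.append(h)
            return
        if bucket.local_depth == self.hash_bits:
            raise OverflowError("all hash bits used; an overflow block would be needed")
        if bucket.local_depth == self.i:
            # Directory doubles: each old entry becomes two adjacent entries.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.i += 1
        # Split the full bucket on its next hash bit.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        shift = self.i - bucket.local_depth
        for idx, b in enumerate(self.directory):
            if b is bucket and (idx >> shift) & 1:
                self.directory[idx] = sibling
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
        self.insert(h)  # retry; may trigger another split


table = ExtensibleHash()
for h in (0b0001, 0b1001, 0b1100, 0b1010):   # illustrative values
    table.insert(h)
depth_after_first_batch = table.i
for h in (0b0111, 0b0000, 0b1001):           # second 1001 forces i = 3
    table.insert(h)
depth_after_second_batch = table.i
```

Note that the insert of 0000 splits a bucket without doubling the directory, because that bucket's local depth was still smaller than i.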

17 Extensible hashing: deletion
  Option 1: no merging of blocks
  Option 2: merge blocks and cut the directory if possible (reverse of the insert procedure)

18 Deletion example: run through the insert example in reverse!

19 Extensible hashing: summary
  + Can handle growing files: with less wasted space; with no full reorganizations
  - Indirection (not bad if directory in memory)
  - Directory doubles in size (now it fits, now it does not)

20 Advanced indexing
  Multiple attributes
  Bitmap indexing

21 Multiple-Key Access
  Use multiple indices for certain types of queries.
  Example:
    select account-number
    from account
    where branch-name = “Perryridge” and balance = 1000
  Possible strategies?
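One possible strategy, sketched here under assumptions, is to look up each condition in its own single-attribute index and intersect the resulting sets of record ids. The sample rows, account numbers, and helper names below are hypothetical; the slide itself leaves the strategies as an open question.

```python
from collections import defaultdict


def build_index(rows, attr):
    """Map each value of attr to the set of record ids that carry it."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[attr]].add(rid)
    return index


# Hypothetical account relation.
accounts = [
    {"account-number": "A-101", "branch-name": "Perryridge", "balance": 1000},
    {"account-number": "A-102", "branch-name": "Perryridge", "balance": 400},
    {"account-number": "A-201", "branch-name": "Brighton", "balance": 1000},
    {"account-number": "A-305", "branch-name": "Perryridge", "balance": 1000},
]

by_branch = build_index(accounts, "branch-name")
by_balance = build_index(accounts, "balance")

# Intersect the two pointer sets, then fetch only the matching records.
matching_ids = by_branch["Perryridge"] & by_balance[1000]
result = sorted(accounts[rid]["account-number"] for rid in matching_ids)
```

Other strategies would use only one of the two indices and test the remaining condition on each fetched record.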

22 Indices on Multiple Attributes
  where branch-name = “PP” and balance = 1000
  Suppose we have an index on the combined search-key (branch-name, balance).
  (Figure: index tree on (branch-name, balance); leaf entries include PP,300 PP,800 PP,1000 PP,1300 PP,1500 PP,1560.)

23 where branch-name = “PP” and balance < 1000
  Suppose we have an index on the combined search-key (branch-name, balance).
  (Figure: same index tree; search PP,0 and search PP,1000 delimit the matching range.)

24 where branch-name < “PP” and balance = 1000?
  Suppose we have an index on the combined search-key (branch-name, balance).
  NO! The matching entries are not contiguous in (branch-name, balance) order.
  (Figure: same index tree.)
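The three queries on the combined search-key can be compared with a sorted list standing in for the index leaves. This is a sketch under assumptions: a real index would be a B+-tree, the entry list is pieced together from values visible in the figure residue, and the function names are hypothetical. Python tuples compare lexicographically, which matches the ordering of a (branch-name, balance) key, and the one-element tuple `(branch,)` sorts before every `(branch, balance)`, playing the role of the slide's "search PP,0".

```python
from bisect import bisect_left

# Stand-in for the leaf level of an index on (branch-name, balance).
entries = sorted([
    ("AA", 2000), ("AA", 2300), ("AA", 2500), ("AB", 200), ("AC", 200),
    ("BB", 1000), ("CC", 200), ("DD", 200), ("DD", 300),
    ("PP", 300), ("PP", 800), ("PP", 1000), ("PP", 1300), ("PP", 1500), ("PP", 1560),
])


def equal_both(branch, balance):
    """branch-name = X and balance = Y: one search, matches are adjacent."""
    pos = bisect_left(entries, (branch, balance))
    out = []
    while pos < len(entries) and entries[pos] == (branch, balance):
        out.append(entries[pos])
        pos += 1
    return out


def branch_equal_balance_below(branch, limit):
    """branch-name = X and balance < Y: two searches delimit one contiguous range."""
    lo = bisect_left(entries, (branch,))
    hi = bisect_left(entries, (branch, limit))
    return entries[lo:hi]


def branch_below_balance_equal(branch, balance):
    """branch-name < X and balance = Y: must scan every entry below X and filter."""
    hi = bisect_left(entries, (branch,))
    scanned = entries[:hi]
    return [e for e in scanned if e[1] == balance], len(scanned)


q22 = equal_both("PP", 1000)
q23 = branch_equal_balance_below("PP", 1000)
q24, entries_scanned = branch_below_balance_equal("PP", 1000)
```

For the third query the index only narrows the search to all branches below "PP"; the balance condition has to be checked entry by entry.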

25 Bitmap Indices
  An index designed for multiple valued search keys

26 Bitmap Indices (Cont.)
  (Figure: one bitmap per unique value of gender and one per unique value of income-level; each bitmap has size = table size, i.e. one bit per record. E.g., the income-level value of record 3 is L1.)

27 Bitmap Indices (Cont.)
  Some properties of bitmap indices
  Number of bitmaps for each attribute?
  Size of each bitmap?
  When is the bitmap matrix sparse, and what attributes are good for bitmap indices?

28 Bitmap Indices (Cont.)
  Bitmap indices are generally very small compared with relation size
  E.g., if a record is 100 bytes, space for a single bitmap is 1/800 of the space used by the relation.
  If the number of distinct attribute values is 8, the bitmap index is only 1% of relation size
  What about insertion? Deletion?
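The size figures follow from one bit per record per bitmap: a 100-byte record occupies 800 bits, so one bitmap is 1/800 of the relation, and 8 bitmaps are 8/800 = 1%. A small sketch of that arithmetic, with a hypothetical function name, and ignoring any storage overhead:

```python
from fractions import Fraction


def bitmap_space_fraction(record_bytes: int, distinct_values: int = 1) -> Fraction:
    """Space of `distinct_values` bitmaps relative to the relation's size.

    Each bitmap stores 1 bit per record; each record stores record_bytes * 8 bits.
    """
    return Fraction(distinct_values, record_bytes * 8)


single_bitmap = bitmap_space_fraction(100)       # 1/800
eight_bitmaps = bitmap_space_fraction(100, 8)    # 8/800 = 1/100, i.e. 1%
```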

29 Bitmap Indices Queries
  Sample query: males with income level L1
  (Figure: bitmap for gender = m AND bitmap for income-level = L1 = result bitmap.)
  What about the number of males with income level L1? Even faster!

30 Bitmap Indices Queries
  Queries are answered using bitmap operations
  Intersection (and)
  Union (or)
  Complementation (not)
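The bitmap operations can be sketched with Python integers used as bit vectors, where bit p is set when record p has the given value. The sample rows and helper names are hypothetical and only serve to make the three operations concrete.

```python
def build_bitmaps(rows, column):
    """One bitmap per distinct value of the column; bit p stands for record p."""
    bitmaps = {}
    for pos, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << pos)
    return bitmaps


def positions(bitmap):
    """Record numbers whose bit is set."""
    return [p for p in range(bitmap.bit_length()) if (bitmap >> p) & 1]


# Hypothetical relation: (gender, income-level) for records 0..4.
rows = [("m", "L1"), ("f", "L2"), ("f", "L1"), ("m", "L4"), ("f", "L3")]
gender = build_bitmaps(rows, 0)
income = build_bitmaps(rows, 1)
all_records = (1 << len(rows)) - 1

males_l1 = gender["m"] & income["L1"]        # intersection (and)
male_or_l1 = gender["m"] | income["L1"]      # union (or)
not_male = ~gender["m"] & all_records        # complementation (not), masked to table size

# Counting needs no record access at all: just count the 1 bits.
number_of_males_l1 = bin(males_l1).count("1")
```

The mask in the complement keeps the result to one bit per existing record; without it, Python's `~` would yield a negative number.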