Indexes

Primary Indexes
Dense indexes: a pointer to every record of a sequential file (ordered by search key). This can make sense because records may be much bigger than key-pointer pairs, so:
- the index may fit in memory even if the data file does not;
- searching through the index may be faster than searching through the data file.
Sparse indexes: key-pointer pairs for only a subset of records, typically the first record in each block. This saves index space.

Dense Index

Sparse Index
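The two layouts can be sketched in Python, assuming a sequential file modeled as a list of blocks, each a list of (key, record) pairs sorted by key. The function names `build_dense` and `build_sparse` are hypothetical, and a pointer is modeled as a (block number, slot) pair for the dense index and as a block number for the sparse one.

```python
from typing import List, Tuple

Block = List[Tuple[int, str]]              # one block: (key, record) pairs sorted by key


def build_dense(blocks: List[Block]) -> List[Tuple[int, Tuple[int, int]]]:
    """One (key, pointer) entry per record; pointer = (block number, slot)."""
    return [(key, (b, s))
            for b, block in enumerate(blocks)
            for s, (key, _) in enumerate(block)]


def build_sparse(blocks: List[Block]) -> List[Tuple[int, int]]:
    """One (key, pointer) entry per block: its first key and the block number."""
    return [(block[0][0], b) for b, block in enumerate(blocks) if block]


# Hypothetical sequential file: three blocks of two records each.
data = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e"), (60, "f")]]
print(build_dense(data))   # six entries, one per record
print(build_sparse(data))  # [(10, 0), (30, 1), (50, 2)]
```

The sparse index is a third of the size here because each block holds two records; with 10 records per block, as in the numerical examples, it is a tenth of the size.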

Numerical example of a dense index. Data file: 1,000,000 tuples that fit 10 at a time into a block of 4096 bytes (4 KB) → 100,000 blocks → data file ≈ 400 MB. Index file: with typical sizes of 30 bytes per key and 8 bytes per pointer, we can fit 4096/(30+8) ≈ 100 (key, pointer) pairs in a block. So we need 1,000,000/100 = 10,000 blocks ≈ 40 MB for the index file. This might fit into available main memory.

Numerical example of a sparse index. Data file and block sizes as before. One (key, pointer) entry for the first record of every block → index file = 100,000 (key, pointer) pairs = 100,000 × 38 bytes ≈ 1,000 blocks ≈ 4 MB. If the index file fits in main memory → 1 disk I/O to find the record, given the key.
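The arithmetic of the two numerical examples can be checked with a few lines of Python. The constants are the slides' figures; note that 4096 // 38 is actually 107, and the slides round this down to 100 entries per block, which this sketch follows.

```python
import math

BLOCK_BYTES = 4096
RECORDS = 1_000_000
RECORDS_PER_BLOCK = 10
ENTRY_BYTES = 30 + 8                       # 30-byte key + 8-byte pointer

exact_fit = BLOCK_BYTES // ENTRY_BYTES     # 107 entries really fit in a block
ENTRIES_PER_BLOCK = 100                    # the slides round this down to 100

data_blocks = RECORDS // RECORDS_PER_BLOCK                  # 100,000 blocks
dense_blocks = math.ceil(RECORDS / ENTRIES_PER_BLOCK)       # one entry per record
sparse_blocks = math.ceil(data_blocks / ENTRIES_PER_BLOCK)  # one entry per data block

for name, blocks in [("data file", data_blocks), ("dense index", dense_blocks),
                     ("sparse index", sparse_blocks)]:
    print(f"{name}: {blocks:,} blocks, {blocks * BLOCK_BYTES / 1e6:.1f} MB")
```

With 1 MB = 10^6 bytes this prints about 409.6 MB, 41.0 MB and 4.1 MB, which the slides round to 400 MB, 40 MB and 4 MB.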

Multiple levels of index. An index may be large; using an index on the index may improve the search time:
1. Build a (dense or sparse) index on the data file.
2. Build a sparse index on the index file.
Example: the 2nd-level index on our sparse index has 1,000 key-pointer pairs = 10 blocks = 40 KB. 40 KB for the 2nd-level index file fits in main memory → 2 disk I/Os to find the record, given a key (one for a 1st-level index block, one for the data block).
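A minimal sketch of the multilevel calculation, assuming the slides' rounded figure of 100 entries per block and a sparse index at every level. The helpers `sparse_level_blocks` and `lookup_ios`, and the idea of a memory budget measured in blocks, are hypothetical illustrations.

```python
import math

ENTRIES_PER_BLOCK = 100                    # rounded figure from the slides
BLOCK_BYTES = 4096


def sparse_level_blocks(lower_blocks: int) -> int:
    """Blocks for a sparse index with one entry per block of the level below."""
    return math.ceil(lower_blocks / ENTRIES_PER_BLOCK)


def lookup_ios(data_blocks: int, memory_blocks: int) -> int:
    """Stack sparse levels until the top one fits in `memory_blocks`;
    a lookup then reads one block per on-disk level plus the data block."""
    ios, blocks = 1, data_blocks           # 1 = reading the data block itself
    while True:
        top = sparse_level_blocks(blocks)
        if top <= memory_blocks:           # this level can stay in main memory
            return ios
        ios += 1                           # this level stays on disk: one more read
        blocks = top


level1 = sparse_level_blocks(100_000)      # 1st-level sparse index: 1,000 blocks
level2 = sparse_level_blocks(level1)       # 2nd-level sparse index: 10 blocks
print(level1, level2, level2 * BLOCK_BYTES)  # 1000 10 40960
print(lookup_ios(100_000, memory_blocks=10))  # 2
```

With a budget of 1,000 blocks the first-level index itself fits in memory and `lookup_ios` returns 1, matching the single-level sparse example.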

Lookup for key K. Issue: sparse vs. dense?
1. Dense index: find key K.
2. Sparse index: find the largest key ≤ K.
Then follow the pointer:
a) Dense: just follow it.
b) Sparse: follow it to the block and examine the block.
Dense vs. sparse: a dense index can answer "Is there a record with key K?" from the index alone; a sparse index cannot.
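Both lookup procedures can be sketched with Python's `bisect` module over a tiny in-memory file. The data and function names are hypothetical, and keys are assumed to be unique.

```python
import bisect
from typing import Optional

# Hypothetical sequential file: blocks of (key, record) pairs, sorted by key.
blocks = [[(10, "a"), (20, "b")], [(30, "c"), (40, "d")], [(50, "e"), (60, "f")]]

# Dense index: every key, with a (block, slot) pointer.
dense_keys = [k for block in blocks for k, _ in block]
dense_ptrs = [(b, s) for b, block in enumerate(blocks) for s in range(len(block))]
# Sparse index: the first key of each block; the pointer is the block number.
sparse_keys = [block[0][0] for block in blocks]


def dense_lookup(k: int) -> Optional[str]:
    i = bisect.bisect_left(dense_keys, k)        # find key K itself
    if i < len(dense_keys) and dense_keys[i] == k:
        b, s = dense_ptrs[i]                      # just follow the pointer
        return blocks[b][s][1]
    return None                                   # "no such key" known from the index alone


def sparse_lookup(k: int) -> Optional[str]:
    i = bisect.bisect_right(sparse_keys, k) - 1   # largest key <= K
    if i < 0:
        return None
    for key, record in blocks[i]:                 # follow to the block and examine it
        if key == k:
            return record
    return None
```

For a missing key such as 35, `sparse_lookup` still has to read block 1 before it can answer, while `dense_lookup` answers from the index.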

Cost of lookup. We can do binary search: about log2(number of index blocks) I/Os to find the desired record. Every binary search of the index starts at the block in the middle, then goes to the 1/4 or 3/4 point, then to one of the 1/8, 3/8, 5/8, 7/8 points, and so on. So if we keep some of these blocks in main memory, the number of I/Os will be significantly lower. For our example, binary search in the index uses at most ⌈log2 10,000⌉ = 14 block reads (I/Os) to find the record, given the key, or far fewer if we keep some of the index blocks above in main memory.
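The caching remark can be checked with a small simulation. `top_probes` (a hypothetical helper) collects the blocks probed in the first few steps of any binary search, and `block_reads` counts the index-block reads that miss that cache. As a simplifying assumption, a search is modeled as locating the block number that holds the key.

```python
import math
from typing import Set


def top_probes(num_blocks: int, depth: int) -> Set[int]:
    """Blocks probed in the first `depth` steps of any binary search:
    the middle block, then the 1/4 and 3/4 points, then 1/8, 3/8, ..."""
    probes: Set[int] = set()
    frontier = [(0, num_blocks - 1)]
    for _ in range(depth):
        next_frontier = []
        for lo, hi in frontier:
            if lo > hi:
                continue
            mid = (lo + hi) // 2
            probes.add(mid)
            next_frontier += [(lo, mid - 1), (mid + 1, hi)]
        frontier = next_frontier
    return probes


def block_reads(num_blocks: int, target: int, cached: Set[int]) -> int:
    """Disk reads for a binary search ending at block `target`,
    not counting blocks already held in main memory."""
    lo, hi, reads = 0, num_blocks - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if mid not in cached:
            reads += 1
        if mid == target:
            break
        if target < mid:
            hi = mid - 1
        else:
            lo = mid + 1
    return reads


N = 10_000                                  # index blocks in the running example
print(math.ceil(math.log2(N)))              # 14
cache = top_probes(N, depth=7)              # 127 blocks kept in main memory
worst_uncached = max(block_reads(N, t, set()) for t in range(N))
worst_cached = max(block_reads(N, t, cache) for t in range(N))
print(worst_uncached, worst_cached)         # 14 7
```

Caching only 127 of the 10,000 index blocks halves the worst-case number of index I/Os in this model.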

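Secondary Indexes. A primary index is an index on a file sorted by its search key; such an index "controls" the placement of records. A secondary index does not control placement, and the file is generally not sorted by its search key.
- A sparse secondary index makes no sense: since records are not ordered by the search key, an entry for one record says nothing about where the other records are.
- Usually, the search key is not a "key": several records may share the same value.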

Indirect Buckets. To avoid repeating keys in the index, use a level of indirection, called buckets: each search-key value appears once in the index and points to a bucket of record pointers. Additional advantage: this allows intersecting sets of records without looking at the records themselves. Example: Movies(title, year, length, studioName), with secondary indexes on studioName and year, and the query SELECT title FROM Movies WHERE studioName = 'Disney' AND year = 1995;
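A minimal sketch of secondary indexes with buckets, assuming a toy Movies table held in a Python list. The rows are hypothetical, list positions stand in for record pointers, and Python sets stand in for bucket blocks.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical Movies(title, year, length, studioName) rows; a row's
# list position stands in for a record pointer.
movies: List[Tuple[str, int, int, str]] = [
    ("Movie A", 1995, 90, "Disney"),
    ("Movie B", 1994, 100, "Disney"),
    ("Movie C", 1995, 120, "Fox"),
    ("Movie D", 1995, 85, "Disney"),
]


def build_bucket_index(column: int) -> Dict[object, Set[int]]:
    """Each distinct value appears once and points to a bucket of record pointers."""
    index: Dict[object, Set[int]] = defaultdict(set)
    for rid, row in enumerate(movies):
        index[row[column]].add(rid)
    return index


by_studio = build_bucket_index(3)            # secondary index on studioName
by_year = build_bucket_index(1)              # secondary index on year

# studioName = 'Disney' AND year = 1995: intersect the buckets first,
# then fetch only the matching records.
matches = sorted(by_studio["Disney"] & by_year[1995])
titles = [movies[rid][0] for rid in matches]
print(titles)                                # ['Movie A', 'Movie D']
```

Only the two matching records are fetched; the intersection itself touches index buckets only.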

Inverted Indexes. An idea similar to secondary indexes, from the information-retrieval community, but:
- record → document;
- search-key value of a record → presence of a word in a document.
Usually used with buckets.

Additional Information in Buckets. We can extend the bucket entries to include the role and position of the word, e.g., fields Type and Position.
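Putting the last two slides together, an inverted index whose buckets also record word positions can be sketched as follows. The documents are made up, words are split on whitespace, and a Type (role) field could be added to each bucket entry in the same way.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical documents, keyed by document id.
docs = {1: "dense index and sparse index", 2: "inverted index for documents"}


def build_inverted(docs: Dict[int, str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each word to a bucket of (document id, position) entries."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(docs):
        for pos, word in enumerate(docs[doc_id].lower().split()):
            index[word].append((doc_id, pos))
    return index


def docs_with_all(index: Dict[str, List[Tuple[int, int]]], *words: str) -> Set[int]:
    """Documents containing every word: intersect the buckets' document ids."""
    sets = [{doc for doc, _ in index.get(w, [])} for w in words]
    return set.intersection(*sets) if sets else set()


inv = build_inverted(docs)
print(inv["index"])                           # [(1, 1), (1, 4), (2, 1)]
print(docs_with_all(inv, "index", "sparse"))  # {1}
```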

Example of index selection. Movie(title, year, length, studioName), Studio(name, address, president).
Frequent query: find movies given a studioName (Q1) → use a primary or secondary index on Movie.studioName.
If the frequent queries also include finding all movies produced by a studio, given the studio's president (Q2), and finding movies made by a studio, given its address (Q3) → use a clustered file organization, with two secondary indexes: one on president and one on address.
If we still want to answer Q1 efficiently, we now need a primary index on studioName.

Operations with Indexes. Deletions and insertions are problematic for flat indexes: eventually, we need to reorganize entries and records. E.g., inserting 15 into a flat index is already a messy approach.

B-Trees. A B-tree generalizes the multilevel index. The number of levels varies with the size of the data file, but is often 3. The B+ tree is the form we'll discuss.
- All nodes have the same format: in a B-tree of order n, each node has space for n keys and n+1 pointers.
- Useful for primary and secondary indexes, on primary keys and on nonkeys.
A leaf has at least ⌊(n+1)/2⌋ key-pointer pairs. Interior nodes use at least ⌈(n+1)/2⌉ pointers. The root has at least one key and two pointers.

B-Tree of order 3

B-Trees: a typical leaf and interior node.
Leaf (keys 57, 81, 95): pointers to the records with keys 57, 81, and 95, plus a pointer to the next leaf in sequence.
Interior node (keys 57, 81, 95): pointers to the subtrees with keys K < 57, 57 ≤ K < 81, 81 ≤ K < 95, and K ≥ 95.
57, 81, and 95 are the least keys we can reach via the corresponding pointers.

Operations in B-Trees. We illustrate with a dense index, but it is straightforward to generalize to sparse indexes. Operations: 1. Lookup 2. Insertion 3. Deletion

Lookup. Recursive procedure: If we are at a leaf, look among the keys there. If the i-th key is K, then the i-th pointer takes us to the desired record. If we are at an internal node with keys K1, K2, …, Kn, then if K < K1 we follow the first pointer, if K1 ≤ K < K2 we follow the second pointer, and so on. Try to find a record with search key 40.
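The recursive lookup can be sketched in Python as below. The node classes and the `leaf` helper are hypothetical, and the example is a small order-3 tree (n = 3) chosen for illustration.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]                            # sorted keys
    records: List[str]                         # i-th pointer -> record with the i-th key


@dataclass
class Interior:
    keys: List[int]                            # K1 < K2 < ... < Kn
    children: List[Union["Interior", Leaf]]    # len(children) == len(keys) + 1


def lookup(node: Union[Interior, Leaf], k: int) -> Optional[str]:
    if isinstance(node, Leaf):
        # At a leaf: if the i-th key is k, the i-th pointer leads to the record.
        for key, rec in zip(node.keys, node.records):
            if key == k:
                return rec
        return None
    # Interior: K < K1 -> child 0, K1 <= K < K2 -> child 1, ..., K >= Kn -> last child.
    return lookup(node.children[bisect.bisect_right(node.keys, k)], k)


def leaf(*keys: int) -> Leaf:
    return Leaf(list(keys), [f"record {k}" for k in keys])


root = Interior([13], [
    Interior([7], [leaf(2, 3, 5), leaf(7, 11)]),
    Interior([23, 31, 43], [leaf(13, 17, 19), leaf(23, 29), leaf(31, 37, 41), leaf(43, 47)]),
])

print(lookup(root, 37))  # record 37
print(lookup(root, 40))  # None: 40 would belong in the leaf holding 31, 37, 41
```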

Range Queries. Look up key 10, then follow the "next-leaf" pointers, collecting all data pointers (or retrieving the records), until you pass key 25. The number of I/Os depends on the size of the range. Example: SELECT * FROM R WHERE R.key >= 10 AND R.key <= 25;
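A sketch of the range scan, using a hypothetical node layout with explicit next-leaf pointers: descend once to the leaf for the lower bound, then walk the leaf chain until a key exceeds the upper bound. Keys are returned in place of data pointers for brevity.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None              # the "next-leaf" (sequence) pointer


@dataclass
class Interior:
    keys: List[int]
    children: List[Union["Interior", Leaf]]


def find_leaf(node: Union[Interior, Leaf], k: int) -> Leaf:
    while isinstance(node, Interior):
        node = node.children[bisect.bisect_right(node.keys, k)]
    return node


def range_query(root: Union[Interior, Leaf], lo: int, hi: int) -> List[int]:
    """Keys in [lo, hi]; a real index would collect data pointers instead."""
    leaf: Optional[Leaf] = find_leaf(root, lo)
    out: List[int] = []
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return out                     # passed the upper bound: stop
            if key >= lo:
                out.append(key)
        leaf = leaf.next                       # follow the next-leaf pointer
    return out


leaves = [Leaf(ks) for ks in ([2, 3, 5], [7, 11], [13, 17, 19], [23, 29], [31, 37, 41], [43, 47])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Interior([13], [Interior([7], leaves[:2]), Interior([23, 31, 43], leaves[2:])])

print(range_query(root, 10, 25))  # [11, 13, 17, 19, 23]
```

Only the leaves overlapping the range are read after the initial descent, which is why the I/O count grows with the size of the range.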

Insertion into B-Trees in words… We try to find a place for the new key in the appropriate leaf, and we put it there if there is room. If there is no room in the proper leaf, we “split” the leaf into two and divide the keys between the two new nodes, so each is half full or just over half full. - Split means “add a new block” The splitting of nodes at one level appears to the level above as if a new key-pointer pair needs to be inserted at that higher level. - We may thus apply this strategy to insert at the next level: if there is room, insert it; if not, split the parent node and continue up the tree. As an exception, if we try to insert into the root, and there is no room, then we split the root into two nodes and create a new root at the next higher level; - The new root has the two nodes resulting from the split as its children.
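The insertion procedure can be sketched as a small in-memory B+ tree. This is an illustration under stated assumptions, not a disk-based implementation: `n` is the maximum number of keys per node, keys are unique integers, and the class and method names are hypothetical. A leaf that overflows to n+1 keys keeps the first ⌈(n+1)/2⌉ and passes the least key of the new leaf up; an interior node that overflows to n+2 pointers keeps ⌈(n+2)/2⌉ pointers and moves its middle key up.

```python
import bisect
from typing import Any, List, Optional, Tuple


class Node:
    def __init__(self, leaf: bool) -> None:
        self.leaf = leaf
        self.keys: List[int] = []
        self.ptrs: List[Any] = []              # records (leaf) or children (interior)
        self.next: Optional["Node"] = None     # next-leaf pointer, leaves only


class BPlusTree:
    def __init__(self, n: int = 3) -> None:
        self.n = n                             # room for n keys and n+1 pointers per node
        self.root = Node(leaf=True)

    def insert(self, key: int, record: Any) -> None:
        split = self._insert(self.root, key, record)
        if split is not None:                  # the root split: create a new root above it
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys, new_root.ptrs = [sep], [self.root, right]
            self.root = new_root

    def _insert(self, node: Node, key: int, record: Any) -> Optional[Tuple[int, Node]]:
        """Insert below `node`; return (key, new node) if `node` had to split."""
        if node.leaf:
            i = bisect.bisect_left(node.keys, key)
            node.keys.insert(i, key)
            node.ptrs.insert(i, record)
            if len(node.keys) <= self.n:
                return None                    # there was room in the leaf
            mid = (len(node.keys) + 1) // 2    # keep ceil((n+1)/2) keys on the left
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.ptrs, node.ptrs = node.ptrs[mid:], node.ptrs[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right        # least key reachable via the new leaf
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.ptrs[i], key, record)
        if split is None:
            return None
        sep, child = split                     # to this level, a new key-pointer pair
        node.keys.insert(i, sep)
        node.ptrs.insert(i + 1, child)
        if len(node.keys) <= self.n:
            return None
        keep = (len(node.ptrs) + 1) // 2       # n+2 pointers: keep ceil((n+2)/2) on the left
        right = Node(leaf=False)
        up = node.keys[keep - 1]               # this key moves up and stays in neither half
        right.keys, node.keys = node.keys[keep:], node.keys[:keep - 1]
        right.ptrs, node.ptrs = node.ptrs[keep:], node.ptrs[:keep]
        return up, right

    def search(self, key: int) -> Any:
        node = self.root
        while not node.leaf:
            node = node.ptrs[bisect.bisect_right(node.keys, key)]
        i = bisect.bisect_left(node.keys, key)
        return node.ptrs[i] if i < len(node.keys) and node.keys[i] == key else None

    def leaf_keys(self) -> List[int]:
        node = self.root
        while not node.leaf:
            node = node.ptrs[0]
        out: List[int] = []
        while node is not None:                # walk the next-leaf chain
            out.extend(node.keys)
            node = node.next
        return out


tree = BPlusTree(n=3)
for k in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 40]:
    tree.insert(k, f"record {k}")
print(tree.search(40))                         # record 40
```

With n = 3, inserting 40 into a full leaf holding 31, 37, 41 leaves 31, 37 in the old leaf and moves 40, 41 to the new one; if the parent then has five pointers, it keeps three, gives two to the new node, and moves its middle key up.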

Insertion. Try to insert search key 40. First, look it up to find where it should go. It belongs in the leaf where the lookup ends, but that node is full!

Beginning of the insertion of key 40. Observe the new node and the redistribution of keys and pointers. What's the problem? The new node has no parent yet!

Continuing the insertion of key 40. We must now insert a pointer to the new leaf into its parent node. We must also associate the key 40 with this pointer, since 40 is the least key reachable through the new leaf. But the parent node is full, so it too must split!

Completing the insertion of key 40. The split creates a new interior node. We have to redistribute the old node's 3 keys and 4 pointers, together with the new pointer and key 40. We leave three pointers in the existing node and give two pointers to the new node; 43 goes in the new node. But where does key 40 go? It goes up to the parent, because 40 is the least key reachable via the new node.

Structure of B-trees. Degree n means that every node has space for n search keys and n+1 pointers, and a node is one block. Let the block size be 4096 bytes, a key 4 bytes, and a pointer 8 bytes. Solving for n: 4n + 8(n+1) ≤ 4096 ⇒ n ≤ 340. (n = degree = order = fanout)

Example: n = 340, but a typical node has 255 keys. At level 3 we have 255^2 nodes, which means about 255^3 ≈ 16 × 2^20 records can be indexed. Suppose a record is 1024 bytes: we can index a file of size 16 × 2^20 × 2^10 bytes = 16 GB. If the root is kept in main memory, accessing a record requires 3 disk I/Os.
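The block-size arithmetic and the capacity estimate can be reproduced as follows. Like the slides, the sketch approximates a typical node as having 255 children (255 keys would really give 256 pointers).

```python
BLOCK, KEY, PTR = 4096, 4, 8               # bytes

# Largest n with KEY*n + PTR*(n+1) <= BLOCK, i.e. 12n + 8 <= 4096.
n = (BLOCK - PTR) // (KEY + PTR)
print(n)                                   # 340

FANOUT = 255                               # slides' "typical" node, as 255 children
level3_nodes = FANOUT ** 2                 # root -> 255 nodes -> 255^2 nodes at level 3
records = FANOUT ** 3                      # 255 entries in each level-3 node
RECORD_BYTES = 1024
print(level3_nodes, records, records / 2**20, records * RECORD_BYTES / 2**30)
```

The last two values come out at about 15.8, which the slides round to 16 × 2^20 records and 16 GB.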

Deletion from B-trees in words. If the node from which we delete still has at least the minimum number of keys, we're done (possibly after raising a new key to the parent). If the node now has too few keys, then:
- If an adjacent sibling has more than the minimum number of keys, borrow a key-pointer pair from that sibling and adjust the keys in the parent.
- Otherwise, merge the underfull node with a sibling that has the minimum number of keys, delete a key-pointer pair from the parent, and adjust the parent recursively.
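A minimal sketch of deletion at the leaf level, assuming the leaves are sorted Python lists that are children of a single parent whose separator keys are kept in `seps` (hypothetical names). Next-leaf pointers and the recursive fix-up of an underfull parent are not shown; `min_keys` is the leaf minimum ⌊(n+1)/2⌋, which is 2 for n = 3.

```python
import bisect
from typing import List


def delete_key(leaves: List[List[int]], seps: List[int], key: int, min_keys: int) -> None:
    """Delete `key` from the leaves under one parent (separators `seps`,
    len(seps) == len(leaves) - 1), then repair an underfull leaf."""
    i = bisect.bisect_right(seps, key)             # leaf that holds `key`
    leaf = leaves[i]
    leaf.remove(key)                               # ValueError if the key is absent
    if len(leaf) >= min_keys:                      # enough keys left: done
        if i > 0:
            seps[i - 1] = leaf[0]                  # raise the new least key to the parent
        return
    left = leaves[i - 1] if i > 0 else None
    right = leaves[i + 1] if i + 1 < len(leaves) else None
    if left is not None and len(left) > min_keys:      # borrow from the left sibling
        leaf.insert(0, left.pop())
        seps[i - 1] = leaf[0]
    elif right is not None and len(right) > min_keys:  # borrow from the right sibling
        leaf.append(right.pop(0))
        seps[i] = right[0]
        if i > 0:
            seps[i - 1] = leaf[0]
    elif left is not None:                         # merge into the left sibling
        left.extend(leaf)
        del leaves[i]
        del seps[i - 1]                            # parent loses a key-pointer pair
    elif right is not None:                        # merge the right sibling into this leaf
        leaf.extend(right)
        del leaves[i + 1]
        del seps[i]
    # A lone leaf with no sibling is left to the parent-level fix-up (not shown).


leaves, seps = [[2, 3, 5], [7, 11]], [7]
delete_key(leaves, seps, 7, min_keys=2)    # leaves [11]; it borrows 5 from the left sibling
print(leaves, seps)                        # [[2, 3], [5, 11]] [5]
delete_key(leaves, seps, 11, min_keys=2)   # no sibling can lend a key, so the leaves merge
print(leaves, seps)                        # [[2, 3, 5]] []
```

After the merge the parent has no keys left, which is the situation the deletion example below resolves by borrowing at the parent level.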

Deletion. Suppose we delete key = 7.

Deletion (raising a key to the parent). This node is now less than half full, so it borrows key 5 from its sibling.

Deletion. Suppose we now delete key = 11. No sibling has enough keys to lend one.

Deletion. We merge, i.e., delete a block from the index. However, the parent ends up with no keys.

Deletion. Parent: borrow from a sibling!

The slides from here to the end of the file are by Hector Garcia-Molina

B+Tree Example

CS 245 Notes 4
Size of nodes: n+1 pointers, n keys (fixed).

Don't want nodes to be too empty. Use at least:
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to data

Full node vs. minimum node, for non-leaf and leaf nodes (figure); a pointer slot counts even if it is null.

B+tree rules (tree of order n):
(1) All leaves are at the same lowest level (balanced tree).
(2) Pointers in leaves point to records, except for the "sequence pointer".

(3) Number of pointers/keys for a B+tree:

Node type             Max ptrs   Max keys   Min ptrs (to data, for leaves)   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉                        ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋                        ⌊(n+1)/2⌋
Root                  n+1        n          1                                1
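The occupancy table translates directly into a small helper; `bounds` is a hypothetical name. For leaves, the n+1 pointers include the sequence pointer, while the minimum counts only pointers to data.

```python
from typing import Dict


def bounds(n: int, kind: str) -> Dict[str, int]:
    """Max/min pointers and keys for a node of a B+tree of order n."""
    if kind == "non-leaf":                 # non-root interior node
        min_ptrs = (n + 2) // 2            # ceil((n+1)/2)
        return {"max_ptrs": n + 1, "max_keys": n,
                "min_ptrs": min_ptrs, "min_keys": min_ptrs - 1}
    if kind == "leaf":                     # non-root leaf
        min_data = (n + 1) // 2            # floor((n+1)/2) pointers to data
        return {"max_ptrs": n + 1, "max_keys": n,   # n+1 includes the sequence pointer
                "min_ptrs": min_data, "min_keys": min_data}
    if kind == "root":
        return {"max_ptrs": n + 1, "max_keys": n, "min_ptrs": 1, "min_keys": 1}
    raise ValueError(f"unknown node kind: {kind}")


for kind in ("non-leaf", "leaf", "root"):
    print(kind, bounds(4, kind))
```

For n = 3 this gives a minimum of 2 pointers (1 key) for a non-leaf node and 2 keys for a leaf; for n = 4, 3 pointers (2 keys) and 2 keys.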

Insert into a B+tree:
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

(a) Insert key = 32

(b) Insert key = 7 (leaf overflow)

(c) Insert key = 160 (non-leaf overflow)

(d) New root, insert 45 (the splits propagate up and a new root is created)

Deletion from a B+tree:
(a) simple case (no example)
(b) coalesce with neighbor (sibling)
(c) redistribute keys
(d) cases (b) or (c) at a non-leaf

(b) Coalesce with sibling (deletion example, n=4)

(c) Redistribute keys (deletion example, n=4)

(d) Non-leaf coalesce: delete 37, n=4 (the coalescing propagates up and ends with a new root)