CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU www.cs.cmu.edu/~christos.

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Faloutsos & Pavlo Lecture#11 (R&G ch. 11) Hashing.
CMU SCS : Multimedia Databases and Data Mining Lecture#3: Primary key indexing – hashing C. Faloutsos.
Data Organization - B-trees. 11.2Database System Concepts A simple index Brighton A Downtown A Downtown A Mianus A Perry.
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
1 Lecture 8: Data structures for databases II Jose M. Peña
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Data Indexing Herbert A. Evans. Purposes of Data Indexing What is Data Indexing? Why is it important?
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science Database Applications Lecture#9: Indexing (R&G ch. 10)
1 Lecture 19: B-trees and Hash Tables Wednesday, November 12, 2003.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Primary Indexes Dense Indexes
1 Database Tuning Rasmus Pagh and S. Srinivasa Rao IT University of Copenhagen Spring 2007 February 8, 2007 Tree Indexes Lecture based on [RG, Chapter.
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
1 B+ Trees. 2 Tree-Structured Indices v Tree-structured indexing techniques support both range searches and equality searches. v ISAM : static structure;
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture#9 (R&G ch. 10) Indexing.
CMU SCS : Multimedia Databases and Data Mining Lecture#3: Primary key indexing – hashing C. Faloutsos.
Storage and Indexing February 26 th, 2003 Lecture 19.
Chapter 61 Chapter 6 Index Structures for Files. Chapter 62 Indexes Indexes are additional auxiliary access structures with typically provide either faster.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Database Management 8. course. Query types Equality query – Each field has to be equal to a constant Range query – Not all the fields have to be equal.
March 7 & 9, Csci 2111: Data and File Structures Week 8, Lectures 1 & 2 Multi-Level Indexing and B-Trees.
File Processing : Index and Hash 2015, Spring Pusan National University Ki-Joune Li.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
Adapted from Mike Franklin
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 9.
1 Indexing. 2 Motivation Sells(bar,beer,price )Bars(bar,addr ) Joe’sBud2.50Joe’sMaple St. Joe’sMiller2.75Sue’sRiver Rd. Sue’sBud2.50 Sue’sCoors3.00 Query:
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Indexing and Hashing I (based on notes by Silberchatz, Korth, and.
1 Multi-Level Indexing and B-Trees. 2 Statement of the Problem When indexes grow too large they have to be stored on secondary storage. However, there.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Storage and Indexing. How do we store efficiently large amounts of data? The appropriate storage depends on what kind of accesses we expect to have to.
CS411 Database Systems Kazuhiro Minami 10: Indexing-1.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Content based on Chapter 10 Database Management Systems, (3 rd.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Tree-Structured Indexes Chapter 10.
Database Applications (15-415) DBMS Internals- Part III Lecture 13, March 06, 2016 Mohammad Hammoud.
CS422 Principles of Database Systems Indexes Chengyu Sun California State University, Los Angeles.
CS422 Principles of Database Systems Indexes
Data Organization - B-trees
Data Indexing Herbert A. Evans.
Tree-Structured Indexes: Introduction
Tree-Structured Indexes
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
15-826: Multimedia Databases and Data Mining
C. Faloutsos Indexing and Hashing – part I
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
CPSC-310 Database Systems
15-826: Multimedia Databases and Data Mining
B+-Trees and Static Hashing
Faloutsos & Pavlo Lecture#11 (R&G ch. 11) Hashing
Tree-Structured Indexes
Lecture 19: Data Storage and Indexes
Faloutsos - Pavlo CMU SCS /615
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Tree-Structured Indexes
Lecture 21: B-Trees Monday, Nov. 19, 2001.
Adapted from Mike Franklin
Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes May 16, 2008.
15-826: Multimedia Databases and Data Mining
Lecture 20: Indexes Monday, February 27, 2006.
Tree-Structured Indexes
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU

CMU SCS Reading Material [Ramakrishnan & Gehrke, 3rd ed, ch. 10] Copyright: C. Faloutsos (2012)2

CMU SCS Copyright: C. Faloutsos (2012)3 Problem Given a large collection of (multimedia) records, find similar/interesting things, ie: Allow fast, approximate queries, and Find rules/patterns

CMU SCS Copyright: C. Faloutsos (2012)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

CMU SCS Copyright: C. Faloutsos (2012)5 Indexing - Detailed outline primary key indexing –B-trees and variants –(static) hashing –extendible hashing secondary key indexing spatial access methods text...

CMU SCS Copyright: C. Faloutsos (2012)6 In even more detail: B – trees B+ - trees, B*-trees hashing

CMU SCS Copyright: C. Faloutsos (2012)7 Primary key indexing find employee with ssn=123

CMU SCS Copyright: C. Faloutsos (2012)8 B-trees the most successful family of index schemes (B-trees, B +- trees, B * -trees) Can be used for primary/secondary, clustering/non-clustering index. balanced “n-way” search trees

CMU SCS Copyright: C. Faloutsos (2012)9 Citation Rudolf Bayer and Edward M. McCreight, Organization and Maintenance of Large Ordered Indices, Acta Informatica, 1: , Received the 2001 SIGMOD innovations award among the most cited db publications

CMU SCS Copyright: C. Faloutsos (2012)10 B-trees Eg., B-tree of order 3: <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)11 B - tree properties: each node, in a B-tree of order n: –Key order –at most n pointers –at least n/2 pointers (except root) –all leaves at the same level –if number of pointers is k, then node has exactly k-1 keys –(leaves are empty) v1v2 …v n-1 p1 pn

CMU SCS Copyright: C. Faloutsos (2012)12 Properties “block aware” nodes: each node -> disk page O(log (N)) for everything! (ins/del/search) typically, if n = , then levels utilization >= 50%, guaranteed; on average 69%

CMU SCS Copyright: C. Faloutsos (2012)13 Queries Algo for exact match query? (eg., ssn=8?) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)14 Queries Algo for exact match query? (eg., ssn=8?) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)15 Queries Algo for exact match query? (eg., ssn=8?) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)16 Queries Algo for exact match query? (eg., ssn=8?) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)17 Queries Algo for exact match query? (eg., ssn=8?) <6 >6 <9 >9 H steps (= disk accesses)

CMU SCS Copyright: C. Faloutsos (2012)18 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 )

CMU SCS Copyright: C. Faloutsos (2012)19 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)20 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)21 B-trees: Insertion Insert in leaf; on overflow, push middle up (recursively) split: preserves B - tree properties

CMU SCS Copyright: C. Faloutsos (2012)22 B-trees Easy case: Tree T0; insert ‘8’ <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)23 B-trees Tree T0; insert ‘8’ <6 >6 <9 >9 8

CMU SCS Copyright: C. Faloutsos (2012)24 B-trees Hardest case: Tree T0; insert ‘2’ <6 >6 <9 >9 2

CMU SCS Copyright: C. Faloutsos (2012)25 B-trees Hardest case: Tree T0; insert ‘2’ push middle up

CMU SCS Copyright: C. Faloutsos (2012)26 B-trees Hardest case: Tree T0; insert ‘2’ Ovf; push middle

CMU SCS Copyright: C. Faloutsos (2012)27 B-trees Hardest case: Tree T0; insert ‘2’ Final state

CMU SCS Copyright: C. Faloutsos (2012)28 B-trees: Insertion Q: What if there are two middles? (eg, order 4) A: either one is fine

CMU SCS Copyright: C. Faloutsos (2012)29 B-trees: Insertion Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) split: preserves all B - tree properties (!!) notice how it grows: height increases when root overflows & splits Automatic, incremental re-organization

CMU SCS Copyright: C. Faloutsos (2012)30 Overview B – trees –Dfn, Search, insertion, deletion B+ - trees hashing

CMU SCS Copyright: C. Faloutsos (2012)31 Deletion Rough outline of algo: Delete key; on underflow, may need to merge In practice, some implementors just allow underflows to happen…

CMU SCS Copyright: C. Faloutsos (2012)32 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)33 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)34 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)35 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’

CMU SCS Copyright: C. Faloutsos (2012)36 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) <6 >6 <9 >9 Delete & promote, ie:

CMU SCS Copyright: C. Faloutsos (2012)37 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) <6 >6 <9 >9 Delete & promote, ie:

CMU SCS Copyright: C. Faloutsos (2012)38 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) <6 >6 <9 >9 Delete & promote, ie: 3

CMU SCS Copyright: C. Faloutsos (2012)39 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) <3 >3 <9 >9 3 FINAL TREE

CMU SCS Copyright: C. Faloutsos (2012)40 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) Q: How to promote? A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree) Observation: every deletion eventually becomes a deletion of a leaf key

CMU SCS Copyright: C. Faloutsos (2012)41 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’

CMU SCS Copyright: C. Faloutsos (2012)42 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <6 >6 <9 >9 Delete & borrow, ie:

CMU SCS Copyright: C. Faloutsos (2012)43 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <6 >6 <9 >9 Delete & borrow, ie: Rich sibling

CMU SCS Copyright: C. Faloutsos (2012)44 B-trees – Deletion Case3: underflow & ‘rich sibling’ ‘rich’ = can give a key, without underflowing ‘borrowing’ a key: THROUGH the PARENT!

CMU SCS Copyright: C. Faloutsos (2012)45 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <6 >6 <9 >9 Delete & borrow, ie: Rich sibling NO!!

CMU SCS Copyright: C. Faloutsos (2012)46 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <6 >6 <9 >9 Delete & borrow, ie:

CMU SCS Copyright: C. Faloutsos (2012)47 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <6 >6 <9 >9 Delete & borrow, ie: 6

CMU SCS Copyright: C. Faloutsos (2012)48 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <6 >6 <9 >9 Delete & borrow, ie: 6

CMU SCS Copyright: C. Faloutsos (2012)49 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) <3 >3 <9 >9 Delete & borrow, through the parent 6 FINAL TREE

CMU SCS Copyright: C. Faloutsos (2012)50 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’

CMU SCS Copyright: C. Faloutsos (2012)51 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)52 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)53 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) <6 >6 <9 >9 A: merge w/ ‘poor’ sibling

CMU SCS Copyright: C. Faloutsos (2012)54 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) Merge, by pulling a key from the parent exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’ Ie.:

CMU SCS Copyright: C. Faloutsos (2012)55 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) <6 >6 A: merge w/ ‘poor’ sibling 9

CMU SCS Copyright: C. Faloutsos (2012)56 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) <6 >6 9 FINAL TREE

CMU SCS Copyright: C. Faloutsos (2012)57 B-trees – Deletion Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows?

CMU SCS Copyright: C. Faloutsos (2012)58 B-trees – Deletion Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows? A: repeat recursively

CMU SCS Copyright: C. Faloutsos (2012)59 Overview B – trees B+ - trees, B*-trees hashing

CMU SCS Copyright: C. Faloutsos (2012)60 B+ trees - Motivation if we want to store the whole record with the key –> problems (what?) <6 >6 <9 >9

CMU SCS Copyright: C. Faloutsos (2012)61 Solution: B + - trees They string all leaf nodes together AND replicate keys from non-leaf nodes, to make sure every key appears at the leaf level

CMU SCS Copyright: C. Faloutsos (2012)62 B+ trees <6 >=6>=6 <9 >=9>=9 713

CMU SCS Copyright: C. Faloutsos (2012)63 B+ trees - insertion <6 >=6>=6 <9 >=9>=9 713 Eg., insert ‘8’

CMU SCS Copyright: C. Faloutsos (2012)64 Overview B – trees B+ - trees, B*-trees hashing

CMU SCS Copyright: C. Faloutsos (2012)65 B*-trees splits drop util. to 50%, and maybe increase height How to avoid them?

CMU SCS Copyright: C. Faloutsos (2012)66 B*-trees: deferred split! Instead of splitting, LEND keys to sibling! (through PARENT, of course!) <6 >6 <9 >9 2

CMU SCS Copyright: C. Faloutsos (2012)67 B*-trees: deferred split! Instead of splitting, LEND keys to sibling! (through PARENT, of course!) <3 >3 <9 >9 2 7 FINAL TREE

CMU SCS Copyright: C. Faloutsos (2012)68 B*-trees: deferred split! Notice: shorter, more packed, faster tree It’s a rare case, where space utilization and speed improve together BUT: What if the sibling has no room for our ‘lending’?

CMU SCS Copyright: C. Faloutsos (2012)69 B*-trees: deferred split! BUT: What if the sibling has no room for our ‘lending’? A: 2-to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3. Details: too messy (and even worse for deletion)

CMU SCS Copyright: C. Faloutsos (2012)70 Conclusions Main ideas: recursive; block-aware; on overflow -> split; defer splits All B-tree variants have excellent, O(logN) worst-case performance for ins/del/search B+ tree is the prevailing indexing method More details: [Knuth vol 3.] or [Ramakrishnan & Gehrke, 3rd ed, ch. 10]