Presentation is loading. Please wait.

Presentation is loading. Please wait.

CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU www.cs.cmu.edu/~christos.

Similar presentations


Presentation on theme: "CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU www.cs.cmu.edu/~christos."— Presentation transcript:

1 CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU www.cs.cmu.edu/~christos

2 CMU SCS Reading Material [Ramakrishnan & Gehrke, 3rd ed, ch. 10] 15-826Copyright: C. Faloutsos (2012)2

3 CMU SCS 15-826Copyright: C. Faloutsos (2012)3 Problem Given a large collection of (multimedia) records, find similar/interesting things, ie: Allow fast, approximate queries, and Find rules/patterns

4 CMU SCS 15-826Copyright: C. Faloutsos (2012)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

5 CMU SCS 15-826Copyright: C. Faloutsos (2012)5 Indexing - Detailed outline primary key indexing –B-trees and variants –(static) hashing –extendible hashing secondary key indexing spatial access methods text...

6 CMU SCS 15-826Copyright: C. Faloutsos (2012)6 In even more detail: B – trees B+ - trees, B*-trees hashing

7 CMU SCS 15-826Copyright: C. Faloutsos (2012)7 Primary key indexing find employee with ssn=123

8 CMU SCS 15-826Copyright: C. Faloutsos (2012)8 B-trees the most successful family of index schemes (B-trees, B +- trees, B * -trees) Can be used for primary/secondary, clustering/non-clustering index. balanced “n-way” search trees

9 CMU SCS 15-826Copyright: C. Faloutsos (2012)9 Citation Rudolf Bayer and Edward M. McCreight, Organization and Maintenance of Large Ordered Indices, Acta Informatica, 1:173-189, 1972. Received the 2001 SIGMOD innovations award among the most cited db publications www.informatik.uni-trier.de/~ley/db/about/top.html

10 CMU SCS 15-826Copyright: C. Faloutsos (2012)10 B-trees Eg., B-tree of order 3: 1 3 6 7 9 13 <6 >6 <9 >9

11 CMU SCS 15-826Copyright: C. Faloutsos (2012)11 B - tree properties: each node, in a B-tree of order n: –Key order –at most n pointers –at least n/2 pointers (except root) –all leaves at the same level –if number of pointers is k, then node has exactly k-1 keys –(leaves are empty) v1v2 …v n-1 p1 pn

12 CMU SCS 15-826Copyright: C. Faloutsos (2012)12 Properties “block aware” nodes: each node -> disk page O(log (N)) for everything! (ins/del/search) typically, if n = 50 - 100, then 2 - 3 levels utilization >= 50%, guaranteed; on average 69%

13 CMU SCS 15-826Copyright: C. Faloutsos (2012)13 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

14 CMU SCS 15-826Copyright: C. Faloutsos (2012)14 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

15 CMU SCS 15-826Copyright: C. Faloutsos (2012)15 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

16 CMU SCS 15-826Copyright: C. Faloutsos (2012)16 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9

17 CMU SCS 15-826Copyright: C. Faloutsos (2012)17 Queries Algo for exact match query? (eg., ssn=8?) 1 3 6 7 9 13 <6 >6 <9 >9 H steps (= disk accesses)

18 CMU SCS 15-826Copyright: C. Faloutsos (2012)18 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 )

19 CMU SCS 15-826Copyright: C. Faloutsos (2012)19 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 1 3 6 7 9 13 <6 >6 <9 >9

20 CMU SCS 15-826Copyright: C. Faloutsos (2012)20 Queries what about range queries? (eg., 5<salary<8) Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) 1 3 6 7 9 13 <6 >6 <9 >9

21 CMU SCS 15-826Copyright: C. Faloutsos (2012)21 B-trees: Insertion Insert in leaf; on overflow, push middle up (recursively) split: preserves B - tree properties

22 CMU SCS 15-826Copyright: C. Faloutsos (2012)22 B-trees Easy case: Tree T0; insert ‘8’ 1 3 6 7 9 13 <6 >6 <9 >9

23 CMU SCS 15-826Copyright: C. Faloutsos (2012)23 B-trees Tree T0; insert ‘8’ 1 3 6 7 9 13 <6 >6 <9 >9 8

24 CMU SCS 15-826Copyright: C. Faloutsos (2012)24 B-trees Hardest case: Tree T0; insert ‘2’ 1 3 6 7 9 13 <6 >6 <9 >9 2

25 CMU SCS 15-826Copyright: C. Faloutsos (2012)25 B-trees Hardest case: Tree T0; insert ‘2’ 1 2 6 7 9 13 3 push middle up

26 CMU SCS 15-826Copyright: C. Faloutsos (2012)26 B-trees Hardest case: Tree T0; insert ‘2’ 6 7 9 1313 2 2 Ovf; push middle

27 CMU SCS 15-826Copyright: C. Faloutsos (2012)27 B-trees Hardest case: Tree T0; insert ‘2’ 7 9 1313 2 6 Final state

28 CMU SCS 15-826Copyright: C. Faloutsos (2012)28 B-trees: Insertion Q: What if there are two middles? (eg, order 4) A: either one is fine

29 CMU SCS 15-826Copyright: C. Faloutsos (2012)29 B-trees: Insertion Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) split: preserves all B - tree properties (!!) notice how it grows: height increases when root overflows & splits Automatic, incremental re-organization

30 CMU SCS 15-826Copyright: C. Faloutsos (2012)30 Overview B – trees –Dfn, Search, insertion, deletion B+ - trees hashing

31 CMU SCS 15-826Copyright: C. Faloutsos (2012)31 Deletion Rough outline of algo: Delete key; on underflow, may need to merge In practice, some implementors just allow underflows to happen…

32 CMU SCS 15-826Copyright: C. Faloutsos (2012)32 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 3 6 7 9 13 <6 >6 <9 >9

33 CMU SCS 15-826Copyright: C. Faloutsos (2012)33 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 6 7 9 13 <6 >6 <9 >9

34 CMU SCS 15-826Copyright: C. Faloutsos (2012)34 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ 1 6 7 9 13 <6 >6 <9 >9

35 CMU SCS 15-826Copyright: C. Faloutsos (2012)35 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’

36 CMU SCS 15-826Copyright: C. Faloutsos (2012)36 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 1 3 6 7 9 13 <6 >6 <9 >9 Delete & promote, ie:

37 CMU SCS 15-826Copyright: C. Faloutsos (2012)37 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 1 3 7 9 13 <6 >6 <9 >9 Delete & promote, ie:

38 CMU SCS 15-826Copyright: C. Faloutsos (2012)38 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 17 9 13 <6 >6 <9 >9 Delete & promote, ie: 3

39 CMU SCS 15-826Copyright: C. Faloutsos (2012)39 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) 17 9 13 <3 >3 <9 >9 3 FINAL TREE

40 CMU SCS 15-826Copyright: C. Faloutsos (2012)40 B-trees – Deletion Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) Q: How to promote? A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree) Observation: every deletion eventually becomes a deletion of a leaf key

41 CMU SCS 15-826Copyright: C. Faloutsos (2012)41 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’

42 CMU SCS 15-826Copyright: C. Faloutsos (2012)42 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 6 7 9 13 <6 >6 <9 >9 Delete & borrow, ie:

43 CMU SCS 15-826Copyright: C. Faloutsos (2012)43 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie: Rich sibling

44 CMU SCS 15-826Copyright: C. Faloutsos (2012)44 B-trees – Deletion Case3: underflow & ‘rich sibling’ ‘rich’ = can give a key, without underflowing ‘borrowing’ a key: THROUGH the PARENT!

45 CMU SCS 15-826Copyright: C. Faloutsos (2012)45 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie: Rich sibling NO!!

46 CMU SCS 15-826Copyright: C. Faloutsos (2012)46 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 69 13 <6 >6 <9 >9 Delete & borrow, ie:

47 CMU SCS 15-826Copyright: C. Faloutsos (2012)47 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 3 9 13 <6 >6 <9 >9 Delete & borrow, ie: 6

48 CMU SCS 15-826Copyright: C. Faloutsos (2012)48 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 39 13 <6 >6 <9 >9 Delete & borrow, ie: 6

49 CMU SCS 15-826Copyright: C. Faloutsos (2012)49 B-trees – Deletion Case3: underflow & ‘rich sibling’ (eg., delete 7 from T0) 1 39 13 <3 >3 <9 >9 Delete & borrow, through the parent 6 FINAL TREE

50 CMU SCS 15-826Copyright: C. Faloutsos (2012)50 B-trees – Deletion Case1: delete a key at a leaf – no underflow Case2: delete non-leaf key – no underflow Case3: delete leaf-key; underflow, and ‘rich sibling’ Case4: delete leaf-key; underflow, and ‘poor sibling’

51 CMU SCS 15-826Copyright: C. Faloutsos (2012)51 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 13 <6 >6 <9 >9

52 CMU SCS 15-826Copyright: C. Faloutsos (2012)52 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 <6 >6 <9 >9

53 CMU SCS 15-826Copyright: C. Faloutsos (2012)53 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 9 <6 >6 <9 >9 A: merge w/ ‘poor’ sibling

54 CMU SCS 15-826Copyright: C. Faloutsos (2012)54 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) Merge, by pulling a key from the parent exact reversal from insertion: ‘split and push up’, vs. ‘merge and pull down’ Ie.:

55 CMU SCS 15-826Copyright: C. Faloutsos (2012)55 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 <6 >6 A: merge w/ ‘poor’ sibling 9

56 CMU SCS 15-826Copyright: C. Faloutsos (2012)56 B-trees – Deletion Case4: underflow & ‘poor sibling’ (eg., delete 13 from T0) 1 3 6 7 <6 >6 9 FINAL TREE

57 CMU SCS 15-826Copyright: C. Faloutsos (2012)57 B-trees – Deletion Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows?

58 CMU SCS 15-826Copyright: C. Faloutsos (2012)58 B-trees – Deletion Case4: underflow & ‘poor sibling’ -> ‘pull key from parent, and merge’ Q: What if the parent underflows? A: repeat recursively

59 CMU SCS 15-826Copyright: C. Faloutsos (2012)59 Overview B – trees B+ - trees, B*-trees hashing

60 CMU SCS 15-826Copyright: C. Faloutsos (2012)60 B+ trees - Motivation if we want to store the whole record with the key –> problems (what?) 1 3 6 7 9 13 <6 >6 <9 >9

61 CMU SCS 15-826Copyright: C. Faloutsos (2012)61 Solution: B + - trees They string all leaf nodes together AND replicate keys from non-leaf nodes, to make sure every key appears at the leaf level

62 CMU SCS 15-826Copyright: C. Faloutsos (2012)62 B+ trees 1 3 6 6 9 9 <6 >=6>=6 <9 >=9>=9 713

63 CMU SCS 15-826Copyright: C. Faloutsos (2012)63 B+ trees - insertion 1 3 6 6 9 9 <6 >=6>=6 <9 >=9>=9 713 Eg., insert ‘8’

64 CMU SCS 15-826Copyright: C. Faloutsos (2012)64 Overview B – trees B+ - trees, B*-trees hashing

65 CMU SCS 15-826Copyright: C. Faloutsos (2012)65 B*-trees splits drop util. to 50%, and maybe increase height How to avoid them?

66 CMU SCS 15-826Copyright: C. Faloutsos (2012)66 B*-trees: deferred split! Instead of splitting, LEND keys to sibling! (through PARENT, of course!) 1 3 6 7 9 13 <6 >6 <9 >9 2

67 CMU SCS 15-826Copyright: C. Faloutsos (2012)67 B*-trees: deferred split! Instead of splitting, LEND keys to sibling! (through PARENT, of course!) 1 2 3 6 9 13 <3 >3 <9 >9 2 7 FINAL TREE

68 CMU SCS 15-826Copyright: C. Faloutsos (2012)68 B*-trees: deferred split! Notice: shorter, more packed, faster tree It’s a rare case, where space utilization and speed improve together BUT: What if the sibling has no room for our ‘lending’?

69 CMU SCS 15-826Copyright: C. Faloutsos (2012)69 B*-trees: deferred split! BUT: What if the sibling has no room for our ‘lending’? A: 2-to-3 split: get the keys from the sibling, pool them with ours (and a key from the parent), and split in 3. Details: too messy (and even worse for deletion)

70 CMU SCS 15-826Copyright: C. Faloutsos (2012)70 Conclusions Main ideas: recursive; block-aware; on overflow -> split; defer splits All B-tree variants have excellent, O(logN) worst-case performance for ins/del/search B+ tree is the prevailing indexing method More details: [Knuth vol 3.] or [Ramakrishnan & Gehrke, 3rd ed, ch. 10]


Download ppt "CMU SCS 15-826: Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU www.cs.cmu.edu/~christos."

Similar presentations


Ads by Google