Yan Huang - CSCI5330 Database Implementation – Access Methods

1 Yan Huang - CSCI5330 Database Implementation – Access Methods
This is a modified version of Prof. Hector Garcia-Molina’s slides. All copyrights belong to the original author. 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Access Methods

2 Basic Concepts
Search key: the set of attributes used to look up records in a file.
[Figure: an index entry holds a search-key value and a pointer to the record with that value.]

3 Index Evaluation Metrics
Access types supported efficiently, e.g.:
- Point query: find “Tom”
- Range query: find students whose age is between 20 and 40
Access time
Update time
Space overhead

4 Ordered Indices
In an ordered index, index entries are stored sorted on the search-key value, e.g., the author catalog in a library.

5 Primary Index
Primary index: also called a clustering index. The index and the file are kept in the same order on the search key.
The search key of a primary index is usually, but not necessarily, the primary key.
[Figure: a sequential file with keys 10, 20, ..., 100 and an index with entries 10, 30, 50, 70, ... sorted in the same order.]

6 Secondary Index
Secondary index: a non-clustering index. The index is sorted on the search key, but the file is stored in a different order.
[Figure: index entries 10, 20, 30, 40, 50, 60, 70, ... pointing into a file whose records are not ordered on the search key.]

7 Dense Index
Dense index: contains an index record for every search-key value.
[Figure: a dense index with entries 10, 20, 30, ..., 120 over a sequential file with keys 10, 20, ..., 100.]

8 Sparse Index
Sparse index: contains index records for only some search-key values. Applicable when records are sequentially ordered on the search key.
[Figure: a sparse index with entries 10, 30, 50, 70, 90, ..., 230 over a sequential file with keys 10, 20, ..., 100, two records per block.]
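A sparse-index lookup can be sketched as follows. This is only an illustration under stated assumptions: the two-records-per-block layout mirrors the figure, and the names `blocks`, `sparse_keys`, and `sparse_lookup` are hypothetical, not part of the slides.

```python
from bisect import bisect_right

# Hypothetical sequential file: blocks of two records, sorted on the search key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [blk[0] for blk in blocks]

def sparse_lookup(key):
    """Return (block_number, found) for key; (None, False) if key precedes the file."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None, False
    # The record, if it exists, can only be in that block, so scan it.
    return pos, key in blocks[pos]

print(sparse_lookup(40))   # key present in the second block
print(sparse_lookup(45))   # key absent, but we still know which block to check
```

The scan inside the block is what the sparse index trades for its smaller size; it only works because the file is ordered on the search key.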

9 Secondary Indexes
A sparse index does not make sense for a secondary index! The file is ordered on its sequence field, not on the indexed key, so an index entry says nothing about the keys stored near the record it points to.
[Figure: a sparse index with entries 30, 20, 80, 100, 90, ... over a file that is not ordered on the indexed key.]

10 Multilevel Index
A sparse second-level index is built on top of the first-level index.
[Figure: sequential file with keys 10, 20, ..., 100; first-level sparse index 10, 30, 50, ..., 230; second-level sparse index 10, 90, 170, 250, 330, 410, 490, 570.]

11 Multilevel Index: Secondary Indexes
The lowest level is dense; the other levels are sparse.
[Figure: a file ordered on its sequence field; a dense lowest-level index 10, 20, 30, 40, 50, 60, 70, ...; a sparse high-level index 10, 50, 90, ...]

12 Conventional Indexes
Advantages:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts are expensive

13 Outline
Conventional indexes
B+-Tree (NEXT)

14 NEXT: Another Type of Index
Give up on sequentiality of the index
Try to get “balance”

15 B+Tree Example (n=4)
[Figure: root with key 100; non-leaf nodes (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]

16 Sample Non-leaf Node
Keys: 57, 81, 95. The four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.
A key is moved (not copied) from a lower-level non-leaf node to an upper-level non-leaf node.

17 Sample Leaf Node
Keys: 57, 81, 95. The node is reached from a non-leaf node. Each key has a pointer to the record with that key (57, 81, 95), and the last pointer leads to the next leaf in sequence.
A key is copied (not moved) from a leaf node to a non-leaf node.
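The routing rule of the sample non-leaf node (keys 57, 81, 95) can be sketched with a binary search. The function name `child_index` is hypothetical; the sketch only shows how a search key selects one of the node's pointers.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Return which pointer of a non-leaf node to follow for search key k."""
    # bisect_right counts the keys that are <= k. That count is the pointer
    # position: pointer 0 covers k < keys[0], pointer j covers
    # keys[j-1] <= k < keys[j], and the last pointer covers k >= keys[-1].
    return bisect_right(keys, k)

node_keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_index(node_keys, k))
```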

18 Node Notation
[Figure: how leaf and non-leaf nodes are drawn; a leaf holding keys 30 and 35, and a non-leaf holding key 30.]

19 Size of Nodes
n pointers
n-1 keys

20 Don’t want nodes to be too empty
Use at least:
Root: 2 pointers
Non-leaf: ⌈n/2⌉ pointers
Leaf: ⌈(n-1)/2⌉ keys

21 Full Node vs. Minimum Node
[Figure: Non-leaf: full node (120, 150, 180), minimum node (30). Leaf: full node (3, 5, 11), minimum node (30, 35). A pointer counts even if null.]

22 B+tree rules tree of order n
(1) All leaves are at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”

23 B+tree rules (Cont.)
(3) Number of pointers/keys for a B+tree:

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n          n-1        ⌈n/2⌉           ⌈n/2⌉-1
Leaf (non-root)       n          n-1        ⌈(n-1)/2⌉       ⌈(n-1)/2⌉
Root                  n          n-1        2               1
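The occupancy limits can be computed for a concrete n as in the sketch below. It assumes the minimums are rounded up (ceiling), which matches the n=4 minimum nodes shown earlier (one key in a non-leaf, two keys in a leaf); the function name `bplus_bounds` and the dictionary layout are hypothetical.

```python
from math import ceil

def bplus_bounds(n):
    """Pointer/key limits per node type for a B+tree with n pointers per node.

    Assumption: minimums are rounded up, as in the n=4 example nodes.
    """
    return {
        "nonleaf": {"max_ptrs": n, "max_keys": n - 1,
                    "min_ptrs": ceil(n / 2), "min_keys": ceil(n / 2) - 1},
        "leaf":    {"max_ptrs": n, "max_keys": n - 1,
                    "min_data_ptrs": ceil((n - 1) / 2),
                    "min_keys": ceil((n - 1) / 2)},
        "root":    {"max_ptrs": n, "max_keys": n - 1,
                    "min_ptrs": 2, "min_keys": 1},
    }

for n in (4, 5):
    print(n, bplus_bounds(n))
```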

24 Insert into B+tree
(a) Simple case: space available in leaf
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root

25 (a) Insert key = 32 (n=4)
[Figure: root key 100; non-leaf key 30; leaves (3, 5, 11) and (30, 31). Key 32 fits in the leaf (30, 31), which becomes (30, 31, 32).]

26 (b) Insert key = 7 (n=4)
[Figure: root key 100; non-leaf key 30; leaves (3, 5, 11) and (30, 31). The leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11), and key 7 is copied into the parent, which becomes (7, 30).]

27 (c) Insert key = 160 (n=4)
[Figure: root key 100; non-leaf (120, 150, 180); leaves (150, 156, 179) and (180, 200). The leaf (150, 156, 179) splits into (150, 156) and (160, 179). The parent is full, so it splits into (120, 150) and (180), and key 160 moves up to the root, which becomes (100, 160).]

28 (d) New root, insert 45 (n=4)
[Figure: root (10, 20, 30); leaves (1, 2, 3), (10, 12), (20, 25), (30, 32, 40). The leaf (30, 32, 40) splits into (30, 32) and (40, 45). The old root is full, so it splits into (10, 20) and (40), and key 30 moves up into a new root.]

29 Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

30 (b) Coalesce with sibling: delete 50 (n=5)
[Figure: parent (10, 40, 100); leaves (10, 20, 30) and (40, 50). Deleting 50 leaves (40) too empty, so it is merged into its sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent.]

31 (c) Redistribute keys: delete 50 (n=5)
[Figure: parent (10, 40, 100); leaves (10, 20, 30, 35) and (40, 50). Deleting 50 leaves (40) too empty, so key 35 moves over from the sibling, giving (10, 20, 30) and (35, 40), and the parent key 40 is replaced by 35.]

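32 (d) Non-leaf coalesce: delete 37 (n=5)
[Figure: root (25); non-leaf nodes (10, 20) and (30, 40); leaves (1, 3), (10, 14), (20, 22), (25, 26), (30, 37), (40, 45). Deleting 37 makes its leaf too empty, so it is coalesced with a sibling leaf. The parent non-leaf node then becomes too empty and is coalesced with its sibling non-leaf node, pulling key 25 down from the old root. The merged node becomes the new root.]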

33 B+tree deletions in practice
Often, coalescing is not implemented: too hard and not worth it!

34 Index Definition in SQL
Create an index:
create index <index-name> on <relation-name> (<attribute-list>)
E.g.: create index gindex on country(gdp);
To drop an index:
drop index <index-name>
E.g.: drop index gindex;
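The two statements can be tried out against an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch only: the `country` table definition and its rows are made up for illustration, and the helper `index_names` is hypothetical. Other database systems may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relation and data, just so there is something to index.
conn.execute("create table country (name text, gdp real)")
conn.executemany("insert into country values (?, ?)",
                 [("A", 1.5), ("B", 20.0), ("C", 7.25)])

def index_names(connection):
    """List the indexes currently recorded in SQLite's catalog."""
    rows = connection.execute(
        "select name from sqlite_master where type = 'index'")
    return [row[0] for row in rows]

conn.execute("create index gindex on country(gdp)")
after_create = index_names(conn)

conn.execute("drop index gindex")
after_drop = index_names(conn)

print(after_create, after_drop)
```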

35 Multi-key Index
Motivation: find records where DEPT = “Toy” AND SAL > 50k

36 Strategy I
Use one index, say on Dept. Get all Dept = “Toy” records and check their salary.
[Figure: a single index I1.]

37 Strategy II
Use 2 indexes; manipulate pointers: combine the pointers found for Dept = “Toy” with the pointers found for Sal > 50k.

38 Strategy III: Multiple Key Index
One idea: an index I1 on the first attribute whose entries lead to indexes (I2, I3) on the second attribute.

39 Example
[Figure: a Dept index with entries Art, Sales, Toy; each entry leads to a Salary index (values shown: 10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k). Example record: Name=Joe, DEPT=Sales, SAL=15k.]

40 For which queries is this index good?
Find RECs Dept = “Sales” AND SAL = 20k
Find RECs Dept = “Sales” AND SAL > 20k
Find RECs Dept = “Sales”
Find RECs SAL = 20k
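One way to explore the four queries is a small two-level structure: a Dept level whose entries lead to per-department salary lists kept in sorted order. This is a sketch under assumptions: all data except Joe's record is invented, salaries are in thousands, and every function name is hypothetical.

```python
from bisect import bisect_left, bisect_right

# First level: Dept. Second level: (sorted salaries in k, record names).
index = {
    "Art":   ([10, 15], ["Ann", "Bob"]),
    "Sales": ([15, 20, 25], ["Joe", "Kim", "Lee"]),
    "Toy":   ([12, 19], ["Max", "Ned"]),
}

def dept_and_salary_eq(dept, sal):
    """Dept = dept AND SAL = sal: one first-level probe, one binary search."""
    salaries, names = index.get(dept, ([], []))
    return names[bisect_left(salaries, sal):bisect_right(salaries, sal)]

def dept_and_salary_gt(dept, sal):
    """Dept = dept AND SAL > sal: the matches are a contiguous suffix."""
    salaries, names = index.get(dept, ([], []))
    return names[bisect_right(salaries, sal):]

def dept_only(dept):
    """Dept = dept: everything under one first-level entry."""
    return list(index.get(dept, ([], []))[1])

def salary_only(sal):
    """SAL = sal: the first level gives no help; every Dept entry is visited."""
    return [n for dept in index for n in dept_and_salary_eq(dept, sal)]

print(dept_and_salary_eq("Sales", 20), dept_and_salary_gt("Sales", 20))
print(dept_only("Sales"), salary_only(15))
```

The first three queries touch a single Dept entry, while the salary-only query has to visit all of them, which suggests why the order of attributes in such an index matters.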

41 Interesting application:
Geographic Data
DATA: <X1,Y1, Attributes>, <X2,Y2, Attributes>, ...
[Figure: data points plotted on x and y axes.]

42 Queries
What city is at <Xi,Yi>?
What is within 5 miles of <Xi,Yi>?
Which is the closest point to <Xi,Yi>?

43 Example
[Figure: points a through o plotted in the plane with partition lines at coordinates such as 5, 10, 15, 20, 25, 30, 35, 40, alongside an index structure over the same points. Annotations: “Search points near f”, “Search points near b”.]

44 Queries
Find points with Yi > 20
Find points with Xi < 5
Find points “close” to i = <12,38>
Find points “close” to b = <7,24>

45 Geographic Index Structures
Many types of geographic index structures have been suggested:
- Quad Trees
- R Trees

46 Two more types of multi key indexes
Grid
Bitmap index

47 Grid Index
[Figure: a two-dimensional grid. Rows are the values V1, V2, ..., Vn of key 1; columns are the values X1, X2, ..., Xn of key 2. Each cell leads to the records with that pair of values, e.g., the cell at row V3, column X2 leads to records with key1=V3, key2=X2.]

48 CLAIM
Can quickly find records with:
- key 1 = Vi AND key 2 = Xj
- key 1 = Vi
- key 2 = Xj
And also ranges, e.g., key 1 ≥ Vi AND key 2 < Xj

49 But there is a catch with Grid Indexes!
How is a grid index stored on disk? Like an array: the cells X1, X2, X3, X4 for V1, then the cells for V2, then the cells for V3, ...
Problem: need regularity so we can compute the position of the <Vi,Xj> entry.

50 Solution: Use Indirection
[Figure: a regular grid (rows V1, V2, V3, ...; columns X1, X2, X3) whose cells point to buckets.]
*Grid only contains pointers to buckets

51 With indirection:
Grid can be regular without wasting space
We do have the price of indirection

52 Can also index grid on value ranges
Linear scale for Salary: 0-20K → 1, 20K-50K → 2, 50K-∞ → 3
Linear scale for Dept: Toy → 1, Sales → 2, Personnel → 3
[Figure: a 3 × 3 grid addressed by the two linear scales.]
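A grid addressed through linear scales, with cells pointing to buckets, might look like the sketch below. The scales come from the slide; everything else is an assumption for illustration: boundary salaries (exactly 20K or 50K) are placed in the higher range, the grid is a dictionary rather than a disk array, and the names `salary_cell`, `insert`, and `lookup` are hypothetical.

```python
from bisect import bisect_right

# Linear scales. Assumption: a boundary value belongs to the higher range.
salary_bounds = [20_000, 50_000]   # [0,20K) -> 1, [20K,50K) -> 2, [50K,inf) -> 3
dept_scale = {"Toy": 1, "Sales": 2, "Personnel": 3}

def salary_cell(salary):
    """Map a salary to its position (1..3) on the salary scale."""
    return bisect_right(salary_bounds, salary) + 1

# The grid holds only bucket references; the records live in the buckets.
grid = {}   # (dept position, salary position) -> bucket (list of records)

def insert(record):
    cell = (dept_scale[record["dept"]], salary_cell(record["sal"]))
    grid.setdefault(cell, []).append(record)
    return cell

def lookup(dept, salary):
    """Point query: compute the cell, then filter the single bucket."""
    cell = (dept_scale[dept], salary_cell(salary))
    return [r for r in grid.get(cell, [])
            if r["dept"] == dept and r["sal"] == salary]

print(insert({"name": "Joe", "dept": "Sales", "sal": 15_000}))
print(lookup("Sales", 15_000))
```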

53 Grid files
Advantage:
- Good for multiple-key search
Disadvantages:
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys

54 Example Grid File for account
Two attributes form the search key: branch-name and balance.
Divide branch-name into non-uniform intervals.
Divide balance into non-uniform intervals.
[Figure: grid array with one cell marked for branch-name < Central and 10k <= balance < 50k.]
What about Central <= branch-name < Townsend and 50k <= balance?

55 Example Grid File for account
[Figure: grid array whose cells point to buckets Bj and Bk.]

56 Grid Files (Cont.)
Linear scales must be chosen to uniformly distribute records across cells; otherwise there will be too many overflow buckets.
Periodic re-organization to increase grid size will help, but reorganization can be very expensive.
Space overhead of the grid array can be high.

57 Bitmap Indices
Another kind of index that can be used for multi-key searches

58 Bitmap Indices (Cont.)
[Figure: one bitmap for each unique value of gender and one for each unique value of income-level; each bitmap has one bit per record (size = table size). E.g., the income-level value of record 3 is L1.]

59 Bitmap Indices (Cont.)
Some properties of bitmap indices:
- Number of bitmaps for each attribute?
- Size of each bitmap?
- When is the bitmap matrix sparse, and what attributes are good for bitmap indices?

60 Bitmap Indices (Cont.)
Bitmap indices are generally very small compared with the relation size.
E.g., if a record is 100 bytes, the space for a single bitmap is 1/800 of the space used by the relation. If the number of distinct attribute values is 8, the bitmaps together are only 1% of the relation size.
What about insertion? Deletion?
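The space figures follow from one bit per record per bitmap. The arithmetic can be checked with a few lines; the variable names are hypothetical.

```python
from fractions import Fraction

record_bytes = 100
bits_per_record = record_bytes * 8       # a 100-byte record occupies 800 bits

# One bitmap stores 1 bit per record, so relative to the relation it takes:
one_bitmap = Fraction(1, bits_per_record)

# One bitmap per distinct attribute value; with 8 distinct values:
distinct_values = 8
all_bitmaps = distinct_values * one_bitmap

print(one_bitmap, all_bitmaps, float(all_bitmaps))
```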

61 Bitmap Indices Queries
Sample query: males with income level L1
[Figure: the bitmap for males, 10010, ANDed with the bitmap for income level L1 gives 10000.]
What about the number of males with income level L1? Even faster!

62 Bitmap Indices Queries
Queries are answered using bitmap operations:
- Intersection (and)
- Union (or)
- Complementation (not)
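The three operations map directly onto integer bit operations. In the sketch below the bitmap for males (10010) and the result 10000 come from the sample query; the L1 bitmap 10100 is an assumed operand chosen to be consistent with that result, and the convention that the leftmost bit is record 0 is also an assumption.

```python
N = 5                      # number of records in the table
MASK = (1 << N) - 1        # keeps complements within N bits

male = 0b10010             # from the sample query
level_l1 = 0b10100         # assumed operand, consistent with the result 10000

def show(bits):
    """Render a bitmap as an N-character bit string."""
    return format(bits, f"0{N}b")

def records(bits):
    """Record numbers whose bit is set, taking the leftmost bit as record 0."""
    return [pos for pos in range(N) if (bits >> (N - 1 - pos)) & 1]

males_l1 = male & level_l1             # intersection (and)
male_or_l1 = male | level_l1           # union (or)
female = ~male & MASK                  # complementation (not)

# Counting matches needs only the bitmap, never the records themselves.
count_males_l1 = bin(males_l1).count("1")

print(show(males_l1), records(males_l1), count_males_l1)
print(show(male_or_l1), show(female))
```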

63 Hashing
[Figure: key → h(key) selects one of the buckets (typically 1 disk block each).]

64 Two alternatives
(1) key → h(key): the buckets hold the records themselves.

65 Two alternatives (Cont.)
(2) key → h(key): the buckets hold an index of keys, each with a pointer to its record.
Alt (2) for “secondary” search key

66 Example hash function
Key = ‘x1 x2 … xn’, an n-byte character string
Have b buckets
h: add x1 + x2 + … + xn, then compute the sum modulo b
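The example hash function is short enough to write out. This sketch assumes the key is encoded as UTF-8 bytes; the name `h` follows the slide, and the sample keys are made up.

```python
def h(key: str, b: int) -> int:
    """Add up the bytes of the key and take the sum modulo b buckets."""
    return sum(key.encode("utf-8")) % b

print(h("abc", 4))
# Reordering the characters does not change the sum, so anagrams always
# collide, one hint that this may not be the best possible function.
print(h("abc", 4) == h("cab", 4))
```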

67 Good hash function
This may not be the best function …
Good hash function: the expected number of keys/bucket is the same for all buckets

68 Within a bucket
Do we keep keys sorted?
Yes, if CPU time is critical and inserts/deletes are not too frequent

69 Next: example to illustrate inserts, overflows, deletes
h(K)

70 EXAMPLE 2 records/bucket
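INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
[Figure: buckets 0 to 3 after the inserts: bucket 0 holds d, bucket 1 holds a and c, bucket 2 holds b.]
Then h(e) = 1: bucket 1 is already full, so e goes into an overflow block chained to bucket 1.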

71 EXAMPLE: deletion
Delete: e, f
[Figure: buckets 0 to 3 holding keys a to g, with an overflow block. After the deletions, maybe move “g” up from the overflow block.]

72 Rule of thumb
Try to keep space utilization between 50% and 80%.
Utilization = (# keys used) / (total # keys that fit)
If < 50%, wasting space
If > 80%, overflows significant; depends on how good the hash function is and on # keys/bucket

73 How do we cope with growth?
Overflows and reorganizations
Dynamic hashing:
- Extensible
- Linear

74 Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function; i grows over time.
[Figure: h(K) produces b bits, of which i are used.]

75 Extensible hashing: two ideas (Cont.)
(b) Use a directory: h(K)[i] selects a directory entry, which points to a bucket.

76 Example: h(k) is 4 bits; 2 keys/bucket
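[Figure: i = 1; directory entry 0 points to bucket (0001) and entry 1 points to bucket (1001, 1100); each bucket uses 1 bit. Insert 1010: the second bucket overflows, so a new directory with i = 2 (entries 00, 01, 10, 11) is built and the bucket splits into (1001, 1010) and (1100), each using 2 bits.]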

77 Example continued
[Figure: i = 2. Insert 0111: it joins 0001 in the bucket shared by entries 00 and 01. Insert 0000: that bucket overflows and splits into (0000, 0001) for entry 00 and (0111) for entry 01, each now using 2 bits; the directory does not grow. The other buckets are (1001, 1010) and (1100).]

78 Example continued
[Figure: Insert 1001: the bucket (1001, 1010) overflows and already uses 2 = i bits, so the directory doubles to i = 3 (entries 000 to 111) and the bucket splits into (1001, 1001) and (1010), each using 3 bits. The buckets (0000, 0001), (0111), and (1100) keep using 2 bits.]
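The insert example can be replayed with a compact in-memory sketch of extensible hashing. It is an illustration under assumptions, not the slides' implementation: keys are already 4-bit hash values, the directory is indexed by the first i bits, and the class and method names (`Bucket`, `ExtensibleHash`, `insert`) are hypothetical.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of leading hash bits this bucket uses
        self.keys = []

class ExtensibleHash:
    def __init__(self, hash_bits=4, capacity=2):
        self.b = hash_bits           # bits produced by the hash function
        self.capacity = capacity     # keys per bucket
        self.i = 1                   # bits currently used by the directory
        self.directory = [Bucket(1), Bucket(1)]

    def _slot(self, h):
        return h >> (self.b - self.i)      # first i bits of the hash value

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(h)
            return
        if bucket.depth == self.i:
            if self.i == self.b:
                raise OverflowError("all hash bits used; overflow chain needed")
            # Double the directory: old entry j becomes entries 2j and 2j+1.
            self.directory = [bk for bk in self.directory for _ in (0, 1)]
            self.i += 1
        # Split the full bucket on its next bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for slot, bk in enumerate(self.directory):
            if bk is bucket and (slot >> (self.i - bucket.depth)) & 1:
                self.directory[slot] = sibling
        for k in old_keys:
            self.directory[self._slot(k)].keys.append(k)
        self.insert(h)               # retry; may trigger another split

table = ExtensibleHash()
for h in (0b0001, 0b1001, 0b1100):        # starting state of the example
    table.insert(h)
for h in (0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(h)

for slot, bk in enumerate(table.directory):
    print(format(slot, "03b"), bk.depth, [format(k, "04b") for k in bk.keys])
```

Several directory entries can share one bucket, which is why a bucket split usually does not force the directory to double.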

79 Extensible hashing: deletion
Two options:
- No merging of blocks
- Merge blocks and cut directory if possible (reverse of the insert procedure)

80 Deletion example
Run through the insert example in reverse!

81 Extensible hashing Summary
Advantage:
- Can handle growing files, with less wasted space and with no full reorganizations
Disadvantages:
- Indirection (not bad if directory in memory)
- Directory doubles in size (now it fits, now it does not)

82 Linear hashing
Another dynamic hashing scheme. Two ideas:
(a) Use the i low-order bits of the b-bit hash; i grows over time
(b) File grows linearly

83 Example b=4 bits, i =2, 2 keys/bucket
[Figure: buckets 00 (0000, 1010) and 01 (1111) are in use; buckets 10 and 11 are future growth buckets; m = 01 (max used block). Insert 0101: it goes to bucket 01. Buckets can have overflow chains!]
Rule: if h(k)[i] ≤ m, then look at bucket h(k)[i]; else, look at bucket h(k)[i] - 2^(i-1)

84 Example b=4 bits, i =2, 2 keys/bucket
[Figure: the file grows one bucket at a time, starting from m = 01 (max used block). When bucket 10 is added, key 1010 moves there from bucket 00; when bucket 11 is added, key 1111 moves there from bucket 01.]

85 Example Continued: How to grow beyond this?
[Figure: with m = 11 (max used block) all 2-bit addresses are in use, so i grows from 2 to 3. New buckets 100, 101, ... are then added one at a time; key 0101 moves to bucket 101.]
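The lookup rule and the one-bucket-at-a-time growth of the example can be sketched as follows. This is a simplified illustration: buckets are plain Python lists (so an overflow chain is just a list longer than the bucket capacity), keys are already hash values, and the names `bucket_for` and `grow` are hypothetical.

```python
def bucket_for(h, i, m):
    """Apply the rule: use the i low-order bits unless that bucket does not exist yet."""
    low = h & ((1 << i) - 1)               # h(k)[i]: the i low-order bits
    return low if low <= m else low - (1 << (i - 1))

def grow(buckets, i, m):
    """Add bucket m+1 and move over the keys that now belong to it."""
    m += 1
    if m >= (1 << i):                      # all i-bit addresses are in use
        i += 1                             # start using one more bit
    buckets.append([])
    source = m - (1 << (i - 1))            # the only bucket whose keys can move
    keys, buckets[source] = buckets[source], []
    for h in keys:
        buckets[bucket_for(h, i, m)].append(h)
    return i, m

# State of the example after inserting 0101: i = 2, m = 01.
buckets = [[0b0000, 0b1010], [0b0101, 0b1111]]
i, m = 2, 1

i, m = grow(buckets, i, m)   # adds bucket 10; key 1010 moves there
i, m = grow(buckets, i, m)   # adds bucket 11; key 1111 moves there
i, m = grow(buckets, i, m)   # adds bucket 100; i becomes 3
i, m = grow(buckets, i, m)   # adds bucket 101; key 0101 moves there

for number, keys in enumerate(buckets):
    print(format(number, "03b"), [format(k, "04b") for k in keys])
```

Each growth step rehashes only one existing bucket, which is why no full reorganization is needed.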

86 When do we expand the file?
Keep track of: U = (# used slots) / (total # of slots)
If U > threshold, then increase m (and maybe i)

87 Linear Hashing Summary
Advantages:
- Can handle growing files, with less wasted space and with no full reorganizations
- No indirection like extensible hashing
Disadvantage:
- Can still have overflow chains

88 Example: BAD CASE
[Figure: some buckets are very full while others are very empty. To split the very full bucket we would need to move m there, which would waste space.]

89 Summary
Hashing
- How it works
- Dynamic hashing
  - Extensible
  - Linear

90 Indexing vs Hashing
Hashing is good for probes given a key, e.g.:
SELECT … FROM R WHERE R.A = 5

91 Indexing vs Hashing
Indexing is good for range searches, e.g.:
SELECT … FROM R WHERE R.A > 5
