1 CSCE 520 Test 2 Info Indexing Modified from slides of Hector Garcia-Molina and Jeff Ullman.



2 Physical Storage Media
- Speed of data access
- Cost per unit of data
- Reliability
  - Data loss (power failure or system crash)
  - Physical failure (storage device)
Storage types:
- Volatile storage
- Non-volatile storage

3 Memory Hierarchy
[Figure: memory hierarchy diagram with the labels cache, main memory, virtual memory, file system, disk, tertiary storage, "Programs, Main Memory DBMS", and "DBMS".]

4 Disk Access Characteristics
Move data to main memory:
- Position head on cylinder
- Find and access sector
Steps of reading a block:
- Processor and disk controller process the request
- Seek time: position the head
- Rotational latency: rotate the sector under the head
- Transfer time: sector/block read by the head
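As a rough illustration of how the read steps add up, the sketch below sums the components for one block. The function name and all drive parameters (9 ms average seek, 7200 RPM, 0.1 ms transfer, 1 ms controller overhead) are assumptions chosen for illustration, not figures from the slides.

```python
def block_read_time_ms(seek_ms, rpm, transfer_ms, overhead_ms=0.0):
    """Estimate the time to read one block, in milliseconds."""
    # One full rotation takes 60,000 ms / rpm; on average the wanted
    # sector is half a rotation away from the head.
    rotation_ms = 60_000.0 / rpm
    avg_rotational_latency_ms = rotation_ms / 2
    return overhead_ms + seek_ms + avg_rotational_latency_ms + transfer_ms

# Hypothetical drive parameters, for illustration only.
estimate = block_read_time_ms(seek_ms=9.0, rpm=7200, transfer_ms=0.1, overhead_ms=1.0)
print(f"estimated block read time: {estimate:.2f} ms")  # 14.27 ms
```

With these assumed numbers, seek time and rotational latency dominate, which is why reducing the number of block accesses matters more than in-memory processing.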

5 Disk Access Characteristics
Steps of writing a block:
- Read the block into main memory
- Change the main-memory copy of the block
- Write the new content back to disk
- Verify correctness of the write

6 How to find records efficiently?
- Primary key: sequential organization
- Search key? High I/O cost → INDEXING

7 Cost of Indexing
Where is the time spent when answering a query?
- Fast: processing in memory
- Slow: fetching from secondary storage
Cost of indexing:
- Index on several attributes: fast retrieval but slow writes (the index structure must be maintained)

8 Topics
- Conventional indexes
- B-trees
- Hashing schemes (read only)

9 Sequential File
[Figure: data file sorted on the key, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), (90, 100).]

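10 Sequential File, Dense Index
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), with a dense index holding one entry per record. Index blocks shown: (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120).]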
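A minimal sketch of a dense-index lookup, assuming the file on this slide holds two records per block and that an index entry is a (key, block number, slot) tuple. The names `dense_index` and `dense_lookup` are hypothetical, introduced only for this example.

```python
from bisect import bisect_left

# Data file: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Dense index: one (key, block number, slot) entry per record, in key order.
dense_index = [(key, b, s)
               for b, block in enumerate(blocks)
               for s, key in enumerate(block)]
index_keys = [entry[0] for entry in dense_index]

def dense_lookup(key):
    """Return (block, slot) for key, or None, using only the index."""
    i = bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return dense_index[i][1], dense_index[i][2]
    return None  # absence is decided without touching the data file

print(dense_lookup(40))  # (1, 1)
print(dense_lookup(45))  # None
```

Because every key appears in the index, a miss is detected from the index alone.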

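11 Sequential File, Sparse Index
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), with a sparse index holding one entry per data block. Index blocks shown: (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230).]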
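A sparse-index lookup can be sketched as follows, assuming the index stores the first key of each data block. `sparse_keys` and `sparse_lookup` are hypothetical names used only here.

```python
from bisect import bisect_right

# Data file: two records per block, sorted by key.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: the first key of each block, in block order.
sparse_keys = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block number, slot) for key, or None if it is absent."""
    # Find the last index entry whose key is <= the search key.
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None  # smaller than every key in the file
    # Unlike a dense index, the block itself must be read to confirm the key.
    block = blocks[b]
    return (b, block.index(key)) if key in block else None

print(sparse_lookup(60))  # (2, 1)
print(sparse_lookup(45))  # None
```

The index has one entry per block rather than per record, at the price of one data-block read even for keys that turn out to be absent.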

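12 Sequential File, Sparse 2nd level
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); first-level sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230); second-level sparse index blocks (10, 90, 170, 250), (330, 410, 490, 570), one entry per first-level index block.]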
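The two-level search can be sketched like this, using only the first-level index blocks visible on the slide. The numbering of data blocks (four per first-level index block, in index order) and the function name `locate_data_block` are assumptions of this sketch.

```python
from bisect import bisect_right

# First-level sparse index, four keys per index block.
first_level = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

# Second level: the first key of each first-level index block.
second_level = [blk[0] for blk in first_level]  # [10, 90, 170]

def locate_data_block(key):
    """Return the number of the data block that could hold key, or None."""
    # Step 1: the second level picks the first-level index block.
    top = bisect_right(second_level, key) - 1
    if top < 0:
        return None  # key is smaller than every indexed key
    # Step 2: the first-level block picks the data block.
    inner = bisect_right(first_level[top], key) - 1
    # Assumed layout: data blocks are numbered in index order.
    return top * 4 + inner

print(locate_data_block(120))  # 5
```

Each lookup reads one second-level block and one first-level block before reaching the data, instead of searching the whole first-level index.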

13 Sparse vs. Dense Tradeoff
- Sparse: less index space per record, so more of the index can be kept in memory
- Dense: can tell whether a record exists without accessing the file

14 Terms
- Index sequential file
- Search key (≠ primary key)
- Primary index (on sequencing field)
- Secondary index
- Dense index (all search key values in)
- Sparse index
- Multi-level index

15 Next:
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

16 Duplicate keys
[Figure: sequential data file in which some key values repeat. Key values shown: 10, 20, 30, 40, 45.]

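17 Duplicate keys
Dense index, one way to implement?
[Figure: the data file with duplicate keys and a dense index in which the key values 10, 20, 30 are repeated.]

18 Duplicate keys
Dense index, better way?
[Figure: the data file with duplicate keys and an index holding each key value once: 10, 20, 30, 40.]

19 Duplicate keys
Sparse index, one way?
[Figure: the data file with duplicate keys and a sparse index. Index keys shown: 10, 20, 30.]
Careful if looking for 20 or 30!

20 Duplicate keys
Sparse index, another way?
Place the first new key from each block in the index.
[Figure: the data file with duplicate keys and a sparse index. Index keys shown: 10, 20, 30; one entry is annotated "should this be 40?"]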

21 Summary: duplicate values, primary index
The index may point to the first instance of each value only.
[Figure: a file containing the values a, a, a, b and an index pointing into it.]
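When the index points only to the first instance of each value, the remaining duplicates are found by scanning forward in the sorted file. The sketch below assumes a flat list of records and a dictionary as the index; the sample data beyond "a a a b" and the name `find_all` are hypothetical.

```python
def find_all(records, index, key):
    """Return the positions of every record with this key in a sorted file."""
    start = index.get(key)
    if start is None:
        return []
    positions = []
    pos = start
    # Duplicates are adjacent in a sorted file, so scan until the key changes.
    while pos < len(records) and records[pos] == key:
        positions.append(pos)
        pos += 1
    return positions

records = ["a", "a", "a", "b", "b", "c"]  # hypothetical sorted file

# Index: each distinct value maps to the position of its first instance.
index = {}
for pos, key in enumerate(records):
    index.setdefault(key, pos)

print(find_all(records, index, "a"))  # [0, 1, 2]
```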

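22 Deletion from sparse index
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70) and (90, 110, 130, 150).]

23 Deletion from sparse index: delete record 40
[Figure: same file and index. Record 40 is removed from its block; no index entry changes.]

24 Deletion from sparse index: delete record 30
[Figure: same file and index. Record 30 is removed, 40 becomes the first record of its block, and the index entry 30 becomes 40.]

25 Deletion from sparse index: delete records 30 & 40
[Figure: same file and index. The block that held 30 and 40 becomes empty, its index entry is dropped, and the entries 50 and 70 move up.]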
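The three deletion cases on these slides can be sketched in one function, assuming the sparse index is a list holding the first key of each block. `delete_record` is a hypothetical helper, not code from the slides.

```python
def delete_record(blocks, sparse_index, key):
    """Delete key from a file of sorted blocks and fix the sparse index.

    blocks[i] is a sorted list of keys and sparse_index[i] is the first
    key of blocks[i]. Both lists are modified in place.
    """
    for i, block in enumerate(blocks):
        if key in block:
            block.remove(key)
            if not block:
                # Block became empty: drop it and its index entry, so the
                # later entries move up.
                del blocks[i]
                del sparse_index[i]
            elif sparse_index[i] != block[0]:
                # The first record of the block was deleted: update the entry.
                sparse_index[i] = block[0]
            return True
    return False

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
sparse_index = [10, 30, 50, 70]

delete_record(blocks, sparse_index, 40)
print(sparse_index)  # [10, 30, 50, 70]  (40 was not an index key)
delete_record(blocks, sparse_index, 30)
print(sparse_index)  # [10, 50, 70]      (block is now empty)
```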

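26 Deletion from dense index
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40) and (50, 60, 70, 80).]

27 Deletion from dense index: delete record 30
[Figure: same file and index. Record 30 is removed, 40 moves up in its block, and the index entry for 30 is removed so that 40 takes its place.]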

28 Insertion, sparse index case
[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free); sparse index (10, 30, 40, 60).]

29 Insertion, sparse index case: insert record 34
[Figure: 34 goes into the free slot of the block that holds 30.]
Our lucky day! We have free space where we need it!

30 Insertion, sparse index case: insert record 15
[Figure: 15 goes into the first block, 20 moves to the second block next to 30, and the index entry 30 becomes 20.]
Illustrated: immediate reorganization
Variation:
- insert new block (chained file)
- update index

31 Insertion, sparse index case: insert record 25
[Figure: 25 is placed in an overflow block.]
Overflow blocks (reorganize later...)
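A minimal sketch of two of the insertion cases, free space in the target block and overflow blocks. It assumes two records per block, a list-based sparse index, and a dictionary mapping a block number to its overflow chain; `BLOCK_CAPACITY`, `overflow`, and `insert_record` are hypothetical names. Immediate reorganization (the "insert record 15" case) is not covered.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 2  # records per block (assumed)

def insert_record(blocks, overflow, sparse_index, key):
    """Insert key using the block's free slot if any, else its overflow chain."""
    # Target block: last index entry <= key (block 0 if key is smaller than all).
    b = max(bisect_right(sparse_index, key) - 1, 0)
    if len(blocks[b]) < BLOCK_CAPACITY:
        insort(blocks[b], key)           # keep the block sorted
        sparse_index[b] = blocks[b][0]   # first key may have changed
        return "in place"
    # No room: chain the record in an overflow block and reorganize later.
    overflow.setdefault(b, []).append(key)
    return "overflow"

blocks = [[10, 20], [30], [40, 50], [60]]
sparse_index = [10, 30, 40, 60]
overflow = {}

print(insert_record(blocks, overflow, sparse_index, 34))  # in place
print(insert_record(blocks, overflow, sparse_index, 25))  # overflow
```

Overflow chains keep the insert cheap, but lookups must then also search the chain of the target block.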

32 Insertion, dense index case
- Similar
- Often more expensive...

33 Summary so far
Conventional index
- Basic ideas: sparse, dense, multi-level...
- Duplicate keys
- Deletion/Insertion
- Secondary indexes

34 Conventional indexes
Advantage:
- Simple
- Index is a sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance

35 NEXT: Another type of index
- Give up on sequentiality of index
- Try to get “balance”

36 B+Tree Example, n=3
[Figure: root (100); second level (30) and (120, 150, 180); leaves (3, 5, 11), (30, 35), (100, 101, 110), (120, 130), (150, 156, 179), (180, 200).]

37 Sample non-leaf
Keys in node: 57, 81, 95. Pointers, left to right:
- to keys k < 57
- to keys 57 ≤ k < 81
- to keys 81 ≤ k < 95
- to keys k ≥ 95
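Choosing which pointer to follow in a non-leaf node amounts to counting the node keys that are less than or equal to the search key. A minimal sketch, where `child_slot` is a hypothetical name and pointers are numbered from 0 on the left:

```python
from bisect import bisect_right

def child_slot(node_keys, k):
    """Index of the pointer to follow for search key k in a non-leaf node."""
    # Pointer i covers node_keys[i-1] <= k < node_keys[i]; bisect_right
    # counts the node keys that are <= k, which is that pointer's index.
    return bisect_right(node_keys, k)

keys = [57, 81, 95]
for k in (40, 57, 80, 81, 95, 100):
    print(k, "-> pointer", child_slot(keys, k))
```

Keys equal to a node key go to the right of it, matching the ranges 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.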

38 Sample leaf node
Keys in node: 57, 81, 95. The node is reached from a non-leaf node, and its last pointer leads to the next leaf in sequence. The other pointers lead:
- to record with key 57
- to record with key 81
- to record with key 95

39 Size of nodes: n+1 pointers, n keys (fixed)

40 Don’t want nodes to be too empty
Use at least:
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data

41 Full node vs. min. node, n=3
[Figure: non-leaf, full (120, 150, 180) and minimum (30); leaf, full (3, 5, 11) and minimum (30, 35). Annotation: "counts even if null".]

42 B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”

43 (3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
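The occupancy limits can be worked out for a concrete n with integer arithmetic. In this sketch, `occupancy_limits` and the dictionary layout are assumptions made for illustration; the formulas are the ceiling and floor of (n+1)/2 used for non-leaf and leaf nodes.

```python
def occupancy_limits(n):
    """Max/min pointer and key counts per node type for a B+tree of order n."""
    ceil_half = (n + 2) // 2   # equals ceil((n+1)/2) for integer n
    floor_half = (n + 1) // 2  # equals floor((n+1)/2)
    return {
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": ceil_half, "min_keys": ceil_half - 1},
        "leaf":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": floor_half, "min_keys": floor_half},
        "root":     {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": 1, "min_keys": 1},
    }

for node_type, limits in occupancy_limits(3).items():
    print(node_type, limits)
```

For n=3 this gives a minimum of 2 pointers and 1 key for a non-leaf node and 2 keys for a leaf, which agrees with the minimum nodes (30) and (30, 35) in the n=3 example.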

44 Insert into B+tree (read only)
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

45 (a) Insert key = 32, n=3
[Figure: parent node (30, 100); leaves (3, 5, 11) and (30, 31). Key 32 goes into the free slot of leaf (30, 31).]

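46 (b) Insert key = 7, n=3
[Figure: parent node (30, 100); leaves (3, 5, 11) and (30, 31). Leaf (3, 5, 11) is full, so it splits into (3, 5) and (7, 11), and key 7 is added to the parent.]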
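The simple case and the leaf-overflow case can be sketched together. This is an illustrative helper, not the slides' algorithm text: `insert_into_leaf` is a hypothetical name, leaves are plain sorted lists, and updating the parent (including non-leaf overflow and a new root) is left out.

```python
from bisect import insort

def insert_into_leaf(leaf, key, n):
    """Insert key into a sorted leaf that may hold at most n keys.

    Returns (left, right, separator); right and separator are None when
    no split was needed.
    """
    keys = list(leaf)
    insort(keys, key)
    if len(keys) <= n:
        return keys, None, None  # simple case: space available in leaf
    # Leaf overflow: keep the first ceil((n+1)/2) keys on the left, move
    # the rest to a new right leaf, and copy the right leaf's first key
    # up to the parent as the separator.
    mid = (n + 2) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]

print(insert_into_leaf([30, 31], 32, n=3))   # ([30, 31, 32], None, None)
print(insert_into_leaf([3, 5, 11], 7, n=3))  # ([3, 5], [7, 11], 7)
```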

47 Deletion from B+tree
(a) Simple case: no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf

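48 (b) Coalesce with sibling: delete 50, n=4
[Figure: parent node (10, 40, 100); leaves (10, 20, 30) and (40, 50). After 50 is deleted, 40 moves into the left sibling, giving (10, 20, 30, 40), and key 40 is removed from the parent.]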

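49 (c) Redistribute keys: delete 50, n=4
[Figure: parent node (10, 40, 100); leaves (10, 20, 30, 35) and (40, 50). After 50 is deleted, 35 moves from the left sibling into the right leaf, giving (10, 20, 30) and (35, 40), and the parent key 40 becomes 35.]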
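The choice between coalescing and redistributing after a leaf deletion can be sketched as below. The sketch assumes the underfull leaf has a left sibling and ignores parent updates; `fix_underfull_leaf` is a hypothetical name.

```python
def fix_underfull_leaf(left, right, n):
    """Repair two adjacent leaves after a deletion left `right` underfull.

    left and right are sorted key lists (left sibling first). Returns
    (leaves, action), where leaves is the list of resulting leaves.
    """
    min_keys = (n + 1) // 2  # floor((n+1)/2) keys per leaf
    if len(left) + len(right) <= n:
        # Everything fits in one node: coalesce.
        return [left + right], "coalesce"
    # Otherwise borrow the largest keys from the left sibling.
    while len(right) < min_keys:
        right = [left[-1]] + right
        left = left[:-1]
    return [left, right], "redistribute"

# n=4, after deleting 50 from the leaf (40, 50):
print(fix_underfull_leaf([10, 20, 30], [40], n=4))
# ([[10, 20, 30, 40]], 'coalesce')
print(fix_underfull_leaf([10, 20, 30, 35], [40], n=4))
# ([[10, 20, 30], [35, 40]], 'redistribute')
```

After a redistribution, the first key of the right leaf (35 here) becomes the new separator in the parent.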

50 B+tree deletions in practice
- Often, coalescing is not implemented
- Too hard and not worth it!

