1 Indexes on Sequential Files Source: our textbook, slides by Hector Garcia-Molina.

2 How to Represent a Relation
- Suppose we scatter its records arbitrarily among the blocks of the disk
- How to answer SELECT * FROM R?
- Scan every block:
  - ridiculously slow
  - would require lots of overhead info in each block and each record header

3 How to Represent a Relation
- Reserve some blocks for the relation
- No need to scan entire disk
- How to answer SELECT * FROM R WHERE cond?
- Scan all the records in the reserved blocks
  - Still ridiculously slow

4 Indexes
- Use indexes (special data structures) that allow us to find all the records that satisfy a condition "efficiently"
- Possible data structures:
  - simple indexes on sorted files
  - secondary indexes on unsorted files
  - B-trees
  - hash tables

5 Sorted Files
- Sorted file: records (tuples) of the file (relation) are in sorted order of the field (attribute) of interest.
- This field might or might not be a key of the relation.
- This field is called the search key.
- A sorted file is also called a sequential file.

6 Index on Sequential File
- An index is another file containing key-pointer pairs of the form (K, a)
- K is a search key
- a is an address (pointer)
- The record at address a has search key K
- Particularly useful when the search key is the primary key of the relation

7 Dense Indexes
- An index with one entry for every key in the data file
- What's the point?
- Index is much smaller than data file when record contains much more than just the search key
- If index is small enough to fit in main memory, record with a certain search key can be found quickly: binary search in memory, followed by only one disk I/O
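A minimal sketch of the dense-index lookup described above, assuming a toy in-memory model: the list `data_blocks`, the `(block_no, slot)` pointer format, and the function name `dense_lookup` are illustrative choices, not part of the slides.

```python
from bisect import bisect_left

# Hypothetical data file: a list of blocks, each holding (key, payload)
# records in sorted key order.
data_blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Dense index: one (key, pointer) entry per record, sorted by key.
# A pointer here is simply (block number, slot within the block).
dense_index = [
    (key, (b, s))
    for b, block in enumerate(data_blocks)
    for s, (key, _) in enumerate(block)
]
index_keys = [k for k, _ in dense_index]


def dense_lookup(key):
    """Return the record with the given key, or None if it is absent."""
    i = bisect_left(index_keys, key)        # binary search in memory
    if i == len(index_keys) or index_keys[i] != key:
        return None                         # absence is decided by the index alone
    block_no, slot = dense_index[i][1]
    return data_blocks[block_no][slot]      # stands in for the single disk I/O
```

For example, `dense_lookup(40)` returns `(40, "d")`, while `dense_lookup(35)` returns `None` without reading any data block.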

8 Example of a Dense Index
[Figure: a sequential file holding records 10, 20, ..., 100, two per block, beside a dense index whose entries 10, 20, 30, ... each point to the record with that key.]

9 Some Numbers
- relation with 1,000,000 tuples
- block size is 4096 bytes
- 10 records per block
- thus 100,000 blocks, > 400 Mbytes
- key field is 30 bytes
- pointer is 8 bytes
- thus at least 100 key-pointer pairs per block
- thus dense index size is 10,000 blocks, about 40 Mbytes
- since log2(10,000) ≈ 13.3, a binary search of the index on disk reads at most 14 index blocks, plus one more I/O for the data block
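The arithmetic on this slide can be checked with a few lines of Python. The variable names are illustrative, and the figure of 100 pairs per block is the slide's rounded-down value rather than the exact maximum.

```python
import math

tuples = 1_000_000
block_size = 4096            # bytes
records_per_block = 10
key_bytes, pointer_bytes = 30, 8

data_blocks = tuples // records_per_block              # 100,000 blocks
data_bytes = data_blocks * block_size                  # 409,600,000 bytes

# Exact maximum number of (key, pointer) pairs that fit in one block.
max_pairs_per_block = block_size // (key_bytes + pointer_bytes)   # 107

# The slide rounds down to 100 pairs per block.
pairs_per_block = 100
dense_index_blocks = tuples // pairs_per_block         # 10,000 blocks
dense_index_bytes = dense_index_blocks * block_size    # 40,960,000 bytes

# Worst-case number of blocks probed by binary search over n blocks
# is floor(log2(n)) + 1.
worst_case_probes = math.floor(math.log2(dense_index_blocks)) + 1  # 14
```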

10 Sparse Index
- Uses less space than a dense index
- Requires more time to find a record with a given key
- In a sparse index, there is just one (key, pointer) pair per data block.
- The key is for the first record in the block.

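11 Sparse Index Example
[Figure: the same sequential file (records 10, 20, ..., 100, two per block) beside a sparse index with entries 10, 30, 50, 70, 90, ..., one per data block, each pointing to the block whose first record has that key.]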

12 Using a Sparse Index
- To find the record with key K, search the index for the largest key ≤ K
- Use binary search to do this
- Retrieve the indicated data block
- Search the block for the record with key K
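The four steps above can be sketched as follows, assuming unique search keys and a toy in-memory file. The names `data_blocks`, `sparse_keys`, and `sparse_lookup` are hypothetical, and position i in the index is taken to point to block i.

```python
from bisect import bisect_right

# Hypothetical sequential file: blocks of (key, payload) records in key order.
data_blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Sparse index: the key of the first record in each block.
sparse_keys = [block[0][0] for block in data_blocks]   # [10, 30, 50]


def sparse_lookup(key):
    """Return the record with the given key, or None if it is absent."""
    i = bisect_right(sparse_keys, key) - 1   # largest index key <= key
    if i < 0:
        return None                          # key is smaller than every record
    for k, payload in data_blocks[i]:        # retrieve that block and search it
        if k == key:
            return (k, payload)
    return None
```

Unlike the dense case, a miss such as `sparse_lookup(35)` is only discovered after the data block has been read.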

13 Comparing Sparse and Dense Indexes
- Sparse index uses much less space
  - In the previous numeric example, sparse index size is now only 1000 index blocks, about 4 Mbytes
- Dense index, unlike sparse, lets us answer "is there a record with key K?" without having to retrieve a data block

14 Multiple Levels of Index
- Make an index for the index
- Can continue this idea for more levels, but usually only two levels in practice
- Second and higher level indexes must be sparse, otherwise no savings

15 Two-Level Index Example
[Figure: the sequential file (records 10, 20, ..., 100), its first-level sparse index 10, 30, 50, 70, 90, 110, ..., and a second-level sparse index 10, 90, 170, 250, ..., with one entry per block of the first-level index.]

16 Numeric Example Again
- Suppose we put a second-level index on the first-level sparse index
- Since first-level index uses 1000 blocks and 100 key-pointer pairs fit per block, we need 10 blocks for second-level index
- Very likely to keep the second-level index in memory
- Thus search requires at most two disk I/Os (one for block of first-level index, one for data block)
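A small sketch of a two-level lookup that counts simulated block reads, to show where the "at most two disk I/Os" figure comes from. It assumes unique keys, a second level held entirely in memory, and a deliberately tiny `PAIRS_PER_BLOCK` of 2 (the slide's figure is 100); all names are illustrative.

```python
from bisect import bisect_right

PAIRS_PER_BLOCK = 2   # tiny for illustration only

# Hypothetical data file: blocks of keys in sorted order.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# First level: sparse index on the data file, itself stored in blocks.
first_keys = [blk[0] for blk in data_blocks]            # [10, 30, 50, 70, 90]
first_level = [first_keys[i:i + PAIRS_PER_BLOCK]
               for i in range(0, len(first_keys), PAIRS_PER_BLOCK)]

# Second level: sparse index on the first level, assumed to fit in memory.
second_level = [blk[0] for blk in first_level]          # [10, 50, 90]


def lookup(key):
    """Return (found, number of simulated disk reads)."""
    reads = 0
    j = bisect_right(second_level, key) - 1    # in memory, no I/O
    if j < 0:
        return False, reads
    index_block = first_level[j]               # I/O 1: a first-level index block
    reads += 1
    i = bisect_right(index_block, key) - 1     # i >= 0 because index_block[0] <= key
    data_block = data_blocks[j * PAIRS_PER_BLOCK + i]   # I/O 2: the data block
    reads += 1
    return key in data_block, reads
```

With the slide's numbers the same structure gives 1000 first-level blocks and 1000 / 100 = 10 second-level blocks.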

17 Duplicate Search Keys
- What if more than one record has a given search key value? (Then the search key is not a key of the relation.)
- Solution 1: Use a dense index and allow duplicate search keys in it.
- To find all data records with search key K, follow all the pointers in the index with search key K
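Solution 1 might be sketched like this, assuming a flattened data file in which a record's list position serves as its address. The sample keys and the names `records`, `index_keys`, `pointers`, and `find_all` are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted data file, flattened: list position = record address.
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# Dense index with duplicates: one (key, pointer) entry per record.
index_keys = list(records)
pointers = list(range(len(records)))


def find_all(key):
    """Return the addresses of every record whose search key equals key."""
    lo = bisect_left(index_keys, key)    # first index entry with this key
    hi = bisect_right(index_keys, key)   # one past the last such entry
    return pointers[lo:hi]               # follow all of these pointers
```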

18 Solution 1 Example
[Figure: a sorted data file with duplicate search keys (values from 10 to 45) and a dense index that contains duplicate keys.]

19 Duplicate Search Keys with Dense Index
- Solution 2: only keep record in index for first data record with each search key value (saves some space in the index)
- To find all data records with search key K, follow the one pointer in the index and then move forward in the data file
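A sketch of Solution 2 under the same kind of toy model: the index keeps one entry per distinct key, pointing at the first record with that key, and the lookup then walks forward through the data file. The sample data and names are assumptions for illustration.

```python
from bisect import bisect_left

# Hypothetical sorted data file, flattened: list position = record address.
records = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# Dense index with one entry per distinct key: (key, address of first record).
index = []
for addr, key in enumerate(records):
    if not index or index[-1][0] != key:
        index.append((key, addr))
index_keys = [k for k, _ in index]


def find_all(key):
    """Return the addresses of every record whose search key equals key."""
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        return []
    addr = index[i][1]                   # follow the one pointer
    hits = []
    while addr < len(records) and records[addr] == key:
        hits.append(addr)                # then move forward in the data file
        addr += 1
    return hits
```

Here the index has 5 entries for 10 records, which is the space saving the slide refers to.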

20 Solution 2 Example
[Figure: the same data file with duplicate search keys and a dense index holding one entry per distinct key value: 10, 20, 30, 40.]

21 Duplicate Search Keys with Sparse Index
- Recall that index has an entry for just the first data record in each block
- To find all data records with key K:
  - find last entry (E1) in index with key ≤ K
  - move toward front of index until reaching entry (E2) with key < K
  - check data blocks pointed to by entries from E2 to E1 for records with search key K
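The E1/E2 procedure above can be sketched as below. The sample blocks, the `(block, slot)` result format, and the function name are assumptions for illustration, and index position i is taken to point to block i.

```python
from bisect import bisect_right

# Hypothetical data file with duplicate search keys, two records per block.
data_blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

# Sparse index: the first key of each block (duplicates can appear here too).
sparse_keys = [blk[0] for blk in data_blocks]    # [10, 10, 20, 30, 40]


def find_all(key):
    """Return (block, slot) for every record whose search key equals key."""
    e1 = bisect_right(sparse_keys, key) - 1      # last entry with key <= K
    if e1 < 0:
        return []
    e2 = e1
    # Move toward the front until an entry with key < K (or the first entry).
    while e2 > 0 and sparse_keys[e2] >= key:
        e2 -= 1
    hits = []
    for b in range(e2, e1 + 1):                  # check blocks from E2 to E1
        for slot, k in enumerate(data_blocks[b]):
            if k == key:
                hits.append((b, slot))
    return hits
```

Stepping back to E2 matters because a run of records with key K can begin in the middle of the block before the first index entry equal to K; in this sample file the first 20 sits in block 1, whose index entry is 10.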

22 Dupl. Keys w/ Sparse Index
[Figure: the data file with duplicate search keys and its sparse index. Annotation: careful if looking for 20 or 30!]

23 Variation on Previous Scheme
- Index entry for a data block holds smallest search key that is new (did not appear in a previous block)
- If there is no new search key in that block, then index entry holds the lone search key in the block
- To find all data records with key K:
  - search index for first entry whose key is either equal to K, or less than K with the next entry's key greater than K
  - if a record with key K is in that block then scan forward from there
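A sketch of this variation, assuming sorted blocks and the same made-up sample file as a plain list of lists. `build_index` and `find_all` are hypothetical names, and the forward scan stops at the first block that contains no record with key K.

```python
from bisect import bisect_left

# Hypothetical data file with duplicate search keys, two records per block.
data_blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]


def build_index(blocks):
    """One entry per block: smallest new key, else the block's lone key."""
    index, prev_last = [], None
    for blk in blocks:
        # In a sorted file, a key is new if it exceeds the previous block's last key.
        new = [k for k in blk if prev_last is None or k > prev_last]
        index.append(new[0] if new else blk[0])
        prev_last = blk[-1]
    return index


def find_all(blocks, index, key):
    """Return (block, slot) for every record whose search key equals key."""
    i = bisect_left(index, key)              # first entry with key >= K
    if i == len(index) or index[i] != key:
        i -= 1                               # else the last entry with key < K
    if i < 0:
        return []
    hits = []
    for b in range(i, len(blocks)):          # scan forward from that block
        here = [(b, s) for s, k in enumerate(blocks[b]) if k == key]
        if not here:
            break                            # K is not (or no longer) present
        hits.extend(here)
    return hits


index = build_index(data_blocks)             # [10, 20, 30, 30, 40]
```

Because each entry names the block where its key first appears, no backward walk through the index is needed.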

24 Variation Example
[Figure: the data file with duplicate search keys and the index built under the variation scheme. Annotation on one index entry: should this be 40?]

25 Inserting and Deleting Data
Recall three main techniques:
- create/delete overflow blocks
  - overflow blocks do not have entries in a sparse index
- may be able to insert new blocks in sequential order
  - new block needs an entry in a sparse index
  - changing an index can create same problems
- make room in a full block by sliding some data to an adjacent block; combine adjacent blocks if they get too empty

26 General Strategy
- When data file changes, index must adapt
- Details depend on whether index is sparse or dense and how data file modifications are implemented
- Index file is itself sequential, so same strategies as for modifying data files can be applied to index files

27 Effects of Actions on Index

Action                        Dense Index   Sparse Index
Create empty overflow block   none          none
Delete empty overflow block   none          none
Create empty (main) block     none          insert
Delete empty (main) block     none          delete
Insert record                 insert        maybe update
Delete record                 delete        maybe update
Slide record                  update        maybe update

28 Explanations for Actions
- create/destroy empty overflow block has no effect on
  - dense index since it refers to records
  - sparse index since it refers to main records
- create/destroy empty main block:
  - no effect on dense index as above
  - insert/delete entry in sparse index
- insert/delete/slide record:
  - insert/delete/update entry in dense index
  - only change sparse index if affects first record in block

29 Deletion from sparse index
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with sparse index entries 10, 30, 50, 70, ...]

30 Deletion from sparse index
[Figure: same file. Delete record 40: the index is unchanged.]

31 Deletion from sparse index
[Figure: same file. Delete record 30: the index entry 30 becomes 40.]

32 Deletion from sparse index
[Figure: same file. Delete records 30 & 40: the block becomes empty, its index entry is removed, and entries 50 and 70 move up.]
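The three deletion cases in these figures can be sketched as follows, assuming unique keys, no overflow blocks, and index position i pointing to block i. The function name `delete` and the list-based file are illustrative only.

```python
from bisect import bisect_right


def delete(blocks, index, key):
    """Delete the record with this key; return True if it was found."""
    i = bisect_right(index, key) - 1         # block that could hold the key
    if i < 0 or key not in blocks[i]:
        return False
    was_first = blocks[i][0] == key
    blocks[i].remove(key)
    if not blocks[i]:
        del blocks[i]                        # empty main block is deleted ...
        del index[i]                         # ... and so is its index entry
    elif was_first:
        index[i] = blocks[i][0]              # first record changed: update entry
    return True                              # otherwise the index is untouched


# Hypothetical file matching the figure: two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = [10, 30, 50, 70]

delete(blocks, index, 40)    # not the first record: index stays [10, 30, 50, 70]
delete(blocks, index, 30)    # block is now empty: index becomes [10, 50, 70]
```

Deleting 30 alone from a fresh copy of the file would instead hit the middle case and change the entry 30 to 40.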

33 Deletion from dense index
[Figure: data blocks (10, 20), (30, 40), (50, 60), (70, 80) with dense index entries 10, 20, 30, 40, 50, 60, 70, 80.]

34 Deletion from dense index
[Figure: same file. Delete record 30: the figure shows 40 taking the place of 30.]

35 Insertion, sparse index case
[Figure: data blocks (10, 20), (30, free), (40, 50), (60, free) with sparse index entries 10, 30, 40, 60.]

36 Insertion, sparse index case
[Figure: same file. Insert record 34: it goes into the free slot after 30.]
Our lucky day! We have free space where we need it!

37 Insertion, sparse index case
[Figure: same file. Insert record 15: 15 goes into the first block, 20 moves to the next block, and the index entry 30 becomes 20.]
Illustrated: immediate reorganization
Variation:
- insert new block (chained file)
- update index

38 Insertion, sparse index case
[Figure: same file. Insert record 25: it goes into an overflow block.]
Overflow blocks (reorganize later...)
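A rough sketch of two of the insertion cases shown above: free space in the target block (record 34) and an overflow block (record 25). Immediate reorganization (record 15) is not implemented. The sketch assumes a non-empty file with unique keys; `BLOCK_CAPACITY`, the `overflow` dictionary keyed by main-block number, and the function name are hypothetical.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 2   # records per block, as in the figures


def insert(blocks, overflow, index, key):
    """Insert key; return "in place" or "overflow" to say where it went."""
    # Target block: last index entry <= key (block 0 if key precedes them all).
    i = max(bisect_right(index, key) - 1, 0)
    if len(blocks[i]) < BLOCK_CAPACITY:
        insort(blocks[i], key)               # free space where we need it
        index[i] = blocks[i][0]              # changes only if key is the new first
        return "in place"
    # Block is full: chain the record in an overflow block for block i.
    # Overflow blocks get no entry in the sparse index.
    overflow.setdefault(i, []).append(key)
    return "overflow"


# Hypothetical file matching the figures.
blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
overflow = {}

insert(blocks, overflow, index, 34)   # block [30] has room and becomes [30, 34]
insert(blocks, overflow, index, 25)   # block [10, 20] is full, so 25 overflows
```

In both calls the sparse index is left as `[10, 30, 40, 60]`, since neither insertion changes the first record of a main block.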

39 Insertion, dense index case
- Similar
- Often more expensive...
