
1 Advanced Database Technology
Anna Östlin Pagh and Rasmus Pagh, IT University of Copenhagen, Spring 2004
February 19, 2004: INDEXING I
Lecture based on [GUW, 13.0-13.2]. Slides based on Notes 04: Indexing, for Stanford CS 245, fall 2002, by Hector Garcia-Molina.

2 Today
• Why indexing?
• Conventional indexes (dense/sparse)
• Multi-level indexes
• Secondary indexes
• Next time: B-tree and hash indexes

3 Why indexing?
• Common queries involve conditions on the values of attributes, e.g.
  SELECT * FROM R WHERE a=11
  SELECT * FROM R WHERE 0<=b and b<42
• Indexing an attribute (or set of attributes) speeds up finding tuples with specific values (and gives other speed-ups as well).
• Conceptually similar to the index in a book.

4 Problem session
• Consider an index data structure "similar to" the index in a book.
• How many steps does it take to find the occurrences of a specific term?
• What about the number of I/Os?
• Does your encyclopaedia have an index?

5 Sequential File
[Figure: data records sorted on the search key, two per block: (10, 20) (30, 40) (50, 60) (70, 80) (90, 100)]

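6 Sequential File with Dense Index
[Figure: the sequential file (10, 20) (30, 40) ... (90, 100), continuing beyond 100, with a dense index holding one entry per record: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]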
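A minimal Python sketch of a dense index on the sequential file above, assuming blocks are modelled as in-memory lists of record keys (two per block) and that the index is a pair of parallel sorted lists; the names `data_blocks`, `build_dense_index` and `dense_lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data file: a list of blocks, each a sorted list of record keys
# (a record is represented by its search key only), two records per block.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]


def build_dense_index(blocks):
    """Return parallel lists (keys, block numbers) with one entry per record."""
    keys, ptrs = [], []
    for b, block in enumerate(blocks):
        for key in block:
            keys.append(key)
            ptrs.append(b)
    return keys, ptrs


def dense_lookup(index, key):
    """Binary-search the index; return the block holding `key`, or None.

    A 'no such key' answer comes from the index alone, without reading
    any data block.
    """
    keys, ptrs = index
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return ptrs[i]
    return None


index = build_dense_index(data_blocks)
print(dense_lookup(index, 60))  # 2
print(dense_lookup(index, 65))  # None
```

Because the file is sorted, the index keys come out sorted too, so binary search applies without a separate sort step.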

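7 Sequential File with Sparse Index
[Figure: the same sequential file with a sparse index holding one entry per data block, the first key in that block: 10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230]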
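A minimal Python sketch of a sparse index lookup, assuming the same in-memory model (blocks of two sorted keys, only the first five blocks of the figure) and hypothetical helper names. The index stores the first key of each block; a search finds the last entry not larger than the key and then reads that one block.

```python
from bisect import bisect_right

# Hypothetical data file: sorted blocks of two record keys.
data_blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]


def build_sparse_index(blocks):
    """One entry per data block: the first (smallest) key stored in it."""
    return [block[0] for block in blocks]


def sparse_lookup(index, blocks, key):
    """Return the block holding `key`, or None.

    Only the block of the last index entry <= key can contain the key,
    but that block must be read to find out whether it does.
    """
    i = bisect_right(index, key) - 1
    if i < 0:
        return None  # smaller than every key in the file
    return i if key in blocks[i] else None


index = build_sparse_index(data_blocks)        # [10, 30, 50, 70, 90]
print(sparse_lookup(index, data_blocks, 40))   # 1
print(sparse_lookup(index, data_blocks, 45))   # None, known only after reading block 1
```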

8 Sequential File with Sparse 2nd Level
[Figure: the sparse index 10, 30, 50, ..., 230 is itself indexed by a sparse second level with one entry per first-level index block (four entries each): 10, 90, 170, 250, 330, 410, 490, 570]
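A minimal Python sketch of a two-level sparse index, assuming 12 hypothetical data blocks of two records (keys 10 to 240) and four entries per first-level index block, which reproduces the figure's second-level keys 10, 90, 170. Each level narrows the search to a single block, so a lookup touches one block per level plus one data block.

```python
from bisect import bisect_right

ENTRIES_PER_INDEX_BLOCK = 4  # assumption, matching the figure's grouping

# Hypothetical file: 12 data blocks of two records, keys 10, 20, ..., 240.
data_blocks = [[k, k + 10] for k in range(10, 240, 20)]

# First level: sparse index (first key of each data block), cut into index blocks.
level1 = [block[0] for block in data_blocks]  # 10, 30, 50, ..., 230
level1_blocks = [level1[i:i + ENTRIES_PER_INDEX_BLOCK]
                 for i in range(0, len(level1), ENTRIES_PER_INDEX_BLOCK)]
# Second level: sparse index on the first level (first key of each index block).
level2 = [blk[0] for blk in level1_blocks]    # 10, 90, 170


def two_level_lookup(key):
    """Return the number of the data block holding `key`, or None."""
    j = bisect_right(level2, key) - 1         # which first-level index block
    if j < 0:
        return None
    entries = level1_blocks[j]
    k = bisect_right(entries, key) - 1        # which entry inside that block
    data_no = j * ENTRIES_PER_INDEX_BLOCK + k
    return data_no if key in data_blocks[data_no] else None


print(level2)                  # [10, 90, 170]
print(two_level_lookup(140))   # 6
print(two_level_lookup(145))   # None
```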

9 Sparse vs. Dense Trade-off
• Sparse: less index space per record, so more of the index can be kept in memory.
• Dense: can tell whether a record with a given key exists without accessing the file.
• (Later: sparse is better for insertions; dense is needed for secondary indexes.)
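The space side of the trade-off can be made concrete with a small calculation. This Python sketch assumes fixed numbers of records per data block and entries per index block; the sizes in the usage lines are hypothetical, chosen only to show the ratio.

```python
from math import ceil


def index_blocks(n_records, records_per_block, entries_per_index_block, dense):
    """Blocks needed by a single-level index on a sequential file.

    A dense index has one entry per record; a sparse index has one entry
    per data block.
    """
    data_blocks = ceil(n_records / records_per_block)
    entries = n_records if dense else data_blocks
    return ceil(entries / entries_per_index_block)


# Hypothetical sizes: 1,000,000 records, 10 records per data block,
# 100 index entries per index block.
print(index_blocks(1_000_000, 10, 100, dense=True))   # 10000
print(index_blocks(1_000_000, 10, 100, dense=False))  # 1000
```

The sparse index is smaller by a factor equal to the number of records per data block, which is why more of it fits in memory.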

10 Summary of Terms
• Index on sequential file
• Search key (need not be the primary key)
• Primary index (on the sequencing field); a secondary index is on other fields
• Dense index (contains every search key value)
• Sparse index (one search key per data block)
• Multi-level index (index on an index)

11 Next:
• Duplicate keys
• Deletion/insertion
• Secondary indexes

12 Duplicate keys
[Figure: sequential file in which several records share a search-key value; keys 10, 10, 20, 20, 30, 30, 40, 45]

13 Duplicate keys
Dense index, one way to implement?
[Figure: the file with duplicate keys and a dense index on it]

14 Duplicate keys
Dense index, better way?
[Figure: the file with duplicate keys and a dense index with one entry per distinct key: 10, 20, 30, 40]
What assumption is made here?
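A minimal Python sketch of a dense index with one entry per distinct key, assuming a hypothetical sorted file whose duplicates spill across block boundaries. Each entry points at the first record with that key, as a (block, slot) pair, and a lookup scans forward from there; the helper names are illustrative only, and the scan is only correct because the file is sorted on the search key.

```python
from bisect import bisect_left

# Hypothetical sorted file with duplicate keys, two records per block.
data_blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]


def build_distinct_dense_index(blocks):
    """One entry per distinct key, pointing at its first occurrence."""
    keys, ptrs = [], []
    for b, block in enumerate(blocks):
        for s, key in enumerate(block):
            if not keys or keys[-1] != key:
                keys.append(key)
                ptrs.append((b, s))
    return keys, ptrs


def find_all(index, blocks, key):
    """Return every (block, slot) holding `key`, scanning forward from the first."""
    keys, ptrs = index
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []
    b, s = ptrs[i]
    hits = []
    while b < len(blocks):
        while s < len(blocks[b]):
            if blocks[b][s] != key:
                return hits  # sorted file: the run of `key` has ended
            hits.append((b, s))
            s += 1
        b, s = b + 1, 0
    return hits


index = build_distinct_dense_index(data_blocks)
print(index[0])                          # [10, 20, 30, 40, 45]
print(find_all(index, data_blocks, 30))  # [(2, 1), (3, 0), (3, 1)]
```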

15 Duplicate keys
Sparse index, one way?
[Figure: the file with duplicate keys and a sparse index on it]
Careful if looking for 20 or 30!
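A minimal Python sketch of why care is needed, assuming a sparse index that stores the first key of every block and the same hypothetical file layout with duplicates. Records with key 20 may start at the end of the block before the one whose index entry is 20, so this sketch begins the scan at the last block whose first key is strictly smaller than the search key.

```python
from bisect import bisect_left

# Hypothetical sorted file with duplicate keys, two records per block.
data_blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]


def build_sparse_index(blocks):
    """First key of every block (the index may itself contain duplicates)."""
    return [block[0] for block in blocks]


def find_all_sparse(index, blocks, key):
    """Return every (block, slot) holding `key`.

    Start at the last block whose first key is < key, because that block
    may already contain records with `key` at its end.
    """
    start = max(bisect_left(index, key) - 1, 0)
    hits = []
    for b in range(start, len(blocks)):
        for s, k in enumerate(blocks[b]):
            if k == key:
                hits.append((b, s))
            elif k > key:
                return hits
    return hits


index = build_sparse_index(data_blocks)             # [10, 10, 20, 30, 40]
print(find_all_sparse(index, data_blocks, 20))      # [(1, 1), (2, 0)]
```

Starting at the index entry equal to 20 (block 2) would miss the record in slot 1 of block 1.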

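16 Duplicate keys
Sparse index, another way? Place the first new key from each block in the index (a key not already present in the preceding block).
[Figure: the file with duplicate keys and the resulting sparse index]
Should this entry be 40?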

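17 Deletion from sparse index
• Delete record 40.
[Figure: data blocks (10, 20) (30, 40) (50, 60) (70, 80); sparse index 10, 30, 50, 70, 90, 110, 130, 150]
The index does not change, since 40 is not the first record in its block.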

18 Deletion from sparse index
• Delete record 30.
[Figure: same file and index; after the deletion, 40 becomes the first record of its block and the index entry 30 is replaced by 40]

19 Deletion from sparse index
• Delete records 30 and 40.
[Figure: same file and index; the block that held 30 and 40 becomes empty, its index entry is removed, and the following entries 50, 70, ... shift up]
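A minimal Python sketch covering the three sparse-index deletion cases above, assuming the index is a sorted list of (first key, block number) pairs so that an empty block can stay in the file while its entry is dropped. The function names are hypothetical.

```python
from bisect import bisect_right


def build_sparse_index(blocks):
    """Sorted list of (first key, block number) for every non-empty block."""
    return [(blk[0], b) for b, blk in enumerate(blocks) if blk]


def delete_sparse(blocks, index, key):
    """Delete record `key` and keep the sparse index consistent.

    Not first in its block: index unchanged. First in its block: the entry
    takes the block's new first key. Block now empty: the entry is removed
    and later entries shift up.
    """
    keys = [k for k, _ in index]
    i = bisect_right(keys, key) - 1
    if i < 0 or key not in blocks[index[i][1]]:
        raise KeyError(key)
    b = index[i][1]
    blocks[b].remove(key)
    if not blocks[b]:
        del index[i]
    elif index[i][0] == key:
        index[i] = (blocks[b][0], b)


# Hypothetical file: blocks of two record keys.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = build_sparse_index(blocks)
delete_sparse(blocks, index, 30)
print([k for k, _ in index])  # [10, 40, 50, 70]
delete_sparse(blocks, index, 40)
print([k for k, _ in index])  # [10, 50, 70]
```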

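20 Deletion from dense index
• Delete record 30.
[Figure: data blocks (10, 20) (30, 40) (50, 60) (70, 80); dense index 10, 20, 30, 40, 50, 60, 70, 80; after the deletion, the index entry for 30 is removed and 40 moves up in both the data block and the index]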
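A minimal Python sketch of deletion with a dense index, assuming the index is a sorted list of (key, block number) pairs, one per record. Every deletion removes exactly one index entry, unlike the sparse case; the names are hypothetical.

```python
from bisect import bisect_left


def build_dense_index(blocks):
    """Sorted list of (key, block number), one entry per record."""
    return [(key, b) for b, blk in enumerate(blocks) for key in blk]


def delete_dense(blocks, index, key):
    """Delete record `key` from the file and its entry from the dense index."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        raise KeyError(key)
    blocks[index[i][1]].remove(key)  # later records in the block move up
    del index[i]                     # later index entries move up


blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index = build_dense_index(blocks)
delete_dense(blocks, index, 30)
print(blocks[1])              # [40]
print([k for k, _ in index])  # [10, 20, 40, 50, 60, 70, 80]
```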

21 Insertion (sparse index case)
• Insert record 34.
[Figure: data blocks (10, 20) (30, free) (40, 50) (60, free); sparse index 10, 30, 40, 60; 34 goes into the free slot after 30]
Our lucky day! We have free space where we need it!
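A minimal Python sketch of the lucky case, assuming a capacity of two records per block and a sparse index stored as a list of first keys parallel to the blocks. It only inserts when the home block has a free slot and reports failure otherwise; `try_insert_in_place` is a hypothetical name.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block (an assumption matching the figure)


def try_insert_in_place(blocks, index, key):
    """Insert `key` into its home block if that block has a free slot.

    `index` holds the first key of each block. Returns False, changing
    nothing, when the home block is full.
    """
    b = max(bisect_right(index, key) - 1, 0)  # block 0 if key is the smallest
    if len(blocks[b]) >= CAPACITY:
        return False
    insort(blocks[b], key)   # keep the block sorted
    index[b] = blocks[b][0]  # changes only if key became the block's first record
    return True


blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
print(try_insert_in_place(blocks, index, 34), blocks[1])  # True [30, 34]
print(try_insert_in_place(blocks, index, 25))             # False: block (10, 20) is full
```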

22 Insertion: using overflow blocks
• Insert record 25.
[Figure: same file and index; 25 belongs in the full block (10, 20), so it is placed in an overflow block chained to that block (reorganize later...)]
Problem: overflow blocks take longer to access.
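A minimal Python sketch of insertion with overflow blocks, assuming two records per block and a dict from block number to a sorted list that stands in for that block's overflow chain (a hypothetical in-memory representation). The sparse index is left alone, and every lookup in that block's range must also read the overflow records, which is the extra cost the slide warns about.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block (assumption)


def home_block(index, key):
    """Block whose key range covers `key`, according to the sparse index."""
    return max(bisect_right(index, key) - 1, 0)


def insert_with_overflow(blocks, overflow, index, key):
    """Insert `key`, using the overflow chain when its home block is full."""
    b = home_block(index, key)
    if len(blocks[b]) < CAPACITY:
        insort(blocks[b], key)
        index[b] = blocks[b][0]
    else:
        insort(overflow.setdefault(b, []), key)


def contains(blocks, overflow, index, key):
    """A lookup reads the home block and, if present, its overflow records."""
    b = home_block(index, key)
    return key in blocks[b] or key in overflow.get(b, [])


blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
overflow = {}
insert_with_overflow(blocks, overflow, index, 25)
print(blocks[0], overflow)                    # [10, 20] {0: [25]}
print(contains(blocks, overflow, index, 25))  # True
```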

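23 Insertion: immediate reorganization
• In general: use the same technique as for inserting into a linked list.
• Insert record 15.
[Figure: same file and index; 15 goes into block (10, 20), which becomes (10, 15); 20 moves into the next block, which becomes (20, 30), and that block's index entry changes from 30 to 20]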
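A minimal Python sketch of one way to reorganize immediately, following the figure's example in which 20 is pushed into the next block. It assumes two records per block and a sparse index of first keys parallel to the blocks; when a block overflows, its largest record moves on to the next block, cascading as needed and appending a new block at the end if necessary. This is an illustrative sketch, not necessarily the exact procedure intended by the slide.

```python
from bisect import bisect_right, insort

CAPACITY = 2  # records per block (assumption)


def insert_shifting(blocks, index, key):
    """Insert `key` into its home block, pushing overflow records forward.

    `index` holds the first key of each block and is updated as records move.
    """
    b = max(bisect_right(index, key) - 1, 0)
    carry = key
    while True:
        if b == len(blocks):          # ran past the last block: add a new one
            blocks.append([])
            index.append(carry)
        insort(blocks[b], carry)
        index[b] = blocks[b][0]
        if len(blocks[b]) <= CAPACITY:
            return
        carry = blocks[b].pop()       # largest record moves to the next block
        b += 1


blocks = [[10, 20], [30], [40, 50], [60]]
index = [10, 30, 40, 60]
insert_shifting(blocks, index, 15)
print(blocks)  # [[10, 15], [20, 30], [40, 50], [60]]
print(index)   # [10, 20, 40, 60]
```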

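24 Secondary indexes
[Figure: a file sequenced on another field (the sequence field), so the values of the indexed field are unordered: blocks (30, 50) (20, 70) (80, 40) (100, 10) (90, 60)]
• Sparse index: 30, 20, 80, 100, 90, ... does not make sense!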

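25 Secondary indexes
[Figure: the same file, sequenced on another field: blocks (30, 50) (20, 70) (80, 40) (100, 10) (90, 60)]
• Dense index: 10, 20, 30, 40, 50, 60, 70, ... (one entry per record, sorted by the indexed value)
• Sparse higher level on the dense index: 10, 50, 90, ...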
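A minimal Python sketch of a secondary index, assuming the file layout from the figure, four entries per index block, and records identified by (block, slot) pointers. The dense level is sorted by value even though the file is not, so a sparse upper level can be built on top of it, giving the figure's 10, 50, 90. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right

ENTRIES_PER_INDEX_BLOCK = 4  # assumption matching the figure

# File ordered on some other field, so the indexed values are unsorted.
data_blocks = [[30, 50], [20, 70], [80, 40], [100, 10], [90, 60]]

# Dense secondary index: one (value, (block, slot)) entry per record, sorted by value.
dense = sorted((v, (b, s)) for b, blk in enumerate(data_blocks)
               for s, v in enumerate(blk))
dense_blocks = [dense[i:i + ENTRIES_PER_INDEX_BLOCK]
                for i in range(0, len(dense), ENTRIES_PER_INDEX_BLOCK)]
# Sparse upper level over the sorted dense index: first value of each index block.
upper = [blk[0][0] for blk in dense_blocks]


def secondary_lookup(value):
    """Return the (block, slot) of the record with `value`, or None."""
    j = bisect_right(upper, value) - 1
    if j < 0:
        return None
    entries = dense_blocks[j]
    i = bisect_left([v for v, _ in entries], value)
    if i < len(entries) and entries[i][0] == value:
        return entries[i][1]
    return None


print(upper)                  # [10, 50, 90]
print(secondary_lookup(80))   # (2, 0)
print(secondary_lookup(45))   # None
```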

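26 Duplicate values & secondary indexes
[Figure: a file whose indexed field has repeated values (10, 20, 30 and 40 each occurring one or more times), with a dense index holding one entry per record]
One option... Problem: uses more space than necessary.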

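27 Duplicate values & secondary indexes
[Figure: the same file with an index holding each value once (10, 20, 30, 40, 50, 60, ...), each entry pointing to a bucket of pointers to the records with that value]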
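A minimal Python sketch of the bucket idea, assuming a hypothetical unsorted file with repeated values and a dict of lists standing in for the index and bucket blocks. The index stores each distinct value once; its bucket holds (block, slot) pointers to all records with that value, in file order.

```python
from collections import defaultdict

# Hypothetical unsorted file with repeated values in the indexed field.
data_blocks = [[10, 20], [40, 20], [40, 10], [40, 10], [40, 30]]


def build_bucket_index(blocks):
    """Return (sorted distinct values, buckets of (block, slot) pointers)."""
    buckets = defaultdict(list)
    for b, blk in enumerate(blocks):
        for s, value in enumerate(blk):
            buckets[value].append((b, s))
    return sorted(buckets), dict(buckets)


values, buckets = build_bucket_index(data_blocks)
print(values)       # [10, 20, 30, 40]
print(buckets[40])  # [(1, 0), (2, 0), (3, 0), (4, 0)]
```

Here the index has 4 entries instead of the 10 a one-entry-per-record index would need; the record pointers still take space, but the key values are no longer repeated.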

28 Summary
• Indexes allow finding a particular attribute value in a few I/Os.
• Unresolved problems regarding insertions and deletions.
• Next time: also obtaining efficient updates (using B-trees). Hash indexes: sometimes more efficient.
