Presentation is loading. Please wait.

Presentation is loading. Please wait.

Indexing 4/11/2019.

Similar presentations


Presentation on theme: "Indexing 4/11/2019."— Presentation transcript:

1 Indexing 4/11/2019

2 Index Concept Main idea: A separate data structure used to locate records Many, many, many, many flavors of index organization have been proposed and tried including structures which combine hashing and indexing Various buffering schemes are somewhat orthogonal We’ll focus on General concepts [chapter ] ISAM (indexed sequential) [ch. 5.1] B-trees and B+ trees [ch ] 4/11/2019

3 Index Terminology Most generally, index is a list of value/address pairs Each pair is an index “entry” Value is the index “key” Address will point to a data record, or to a data page There might be many records on a page The assumption is that the value/address pair will be much smaller in size than the full record If index is small, a copy can be maintained in memory! Permanent disk copy is still needed 4/11/2019

4 Key Terminology Index key field Primary index Secondary index
Not necessarily the same as the primary DB key of the table! But called a “key” anyway Primary index Key is the primary (DB) key Only one index per file Secondary index Key is not the primary DB key Could be many indices per file (or none) 4/11/2019

5 More Indexing Terminology
Dense index One index entry for each record (or page) Non-dense or sparse index Less than one index entry for each record Inverted file: File which has a dense secondary index Clustering index Preserves locality: close index entries refer to close data records Multilevel indexing each level is an index to the next level down 4/11/2019

6 Indexing Pitfalls Index itself is a file
Occupies disk space Must worry about maintenance, consistency, recovery, etc. Large indices won't fit in memory May require multiple seeks to locate record entry 4/11/2019

7 Desiderata for Multilevel Indexes
Should support efficient random access Should also support efficient sequential access, if possible Should have low height Should be efficiently updatable Should be storage-efficient Top level(s) should fit in memory 4/11/2019

8 ISAM = Indexed Sequential Access Method
IBM terminology “Indexed Sequential” more general term (non-IBM) ISAM as described in textbook (5.1) is very close to B+ tree simpler versions exist Main idea: maintain sequential file but give it an index Sequentiality for efficient “batch” processing Index for random record access 4/11/2019

9 ISAM Technique Build a dense index of the pages (1st level index)
Sparse from a record viewpoint Then build an index of the 1st level index (2nd level index) Continue recursively until top level index fits on 1 page Some implementations may stop after a fixed # of levels 4/11/2019

10 Updating an ISAM File Data set must be kept sequential
So that it can be processed without the index May have to rewrite entire file to add records Could use overflow pages chained together or in fixed locations (overflow area) Index is usually NOT updated as records are added! Once in a while the whole thing is “reorganized” Data pages recopied to eliminate overflows Index recreated 4/11/2019

11 ISAM Pros, Cons Pro Cons Relatively simple
Great for true sequential access Cons Not very dynamic Inefficient if lots of overflow pages Can only be one ISAM index per file 4/11/2019

12 B-Tree B-Tree is a type of multilevel index
from another standpoint: it's a type of balanced tree Invented in 1972 by Boeing engineers R. Bayer and E. McCreight By 1979: "the standard organization for indexes in a database system" (Comer) 4/11/2019

13 B-Tree Overview Assume for now that keys are fixed-length and unique
A B-tree can be thought of as a generalized binary search tree multiple branches rather than just L or R Trees are always perfectly balanced Some wasted space in the nodes is tolerated 4/11/2019

14 B-Tree Concepts Each node contains
tree (index node) pointers, and key values (with record or page pointers) Given a key K and the two node pointers L and R around it All key values pointed to by L are < K All key values pointed to by R are > K “Order p” means (up to) p tree pointers, (up to) p-1 keys Terminology differs between authors 4/11/2019

15 B+ Tree vs. B-tree Textbook only discusses B+ trees
So do we from now on Two big differences: Original B-trees had record pointers in all of the index nodes; B+ trees only in leaf nodes Given a key K and the two node pointers L and R around it All key values pointed to by L are < K All key values pointed to by R are >= K B+ tree data pages are linked together to form a sequential file Gives the advantages of ISAM In our book, it’s a doubly-linked list 4/11/2019

16 Alternate Views of the Leaf Nodes
[cf. Chapter 4.3.1] Leaf nodes might be actual data pages Leaf nodes might contain pointers to the actual data records or pages For B+ trees, this implies the leaf node format is different from the non-leaf node format may hold different number of entries The leaf nodes can be chained together, regardless of whether the actual data pages are! 4/11/2019

17 B+Tree Growth and Change
The big idea: When a node is full, it splits. middle value is propagated upward If we’re lucky, there’s room for it in the level above two new nodes are at same level as original node Height of tree increases only when the root splits A very nice property This is what keeps the tree perfectly balanced Recommended: split only “on the way down” On deletion: two adjacent nodes recombine if both are < half full 4/11/2019

18 Variations Could redistribute records between adjacent blocks
esp. on deletion (B* tree) Variable order: accommodate varying key lengths Could store the whole record in the index block especially if records are few and small in a B+ tree, this would make sequential access especially efficient 4/11/2019

19 B+ Trees with Other Indices
Suppose you have a B+ tree for the file Leaf nodes of the index are the actual pages of the file, doubly linked together for sequential access Suppose you have some secondary indices What happens when a B+ tree node splits or merges??? 4/11/2019

20 Other Forms of Indexing
Bitmap indexes One index per value (property) of interest One bit per record TRUE if record has a particular property Indexed hash: hash function takes you to an entry in an index allows physical record locations to change Clever indexing schemes are useful in optimizing complex queries 4/11/2019


Download ppt "Indexing 4/11/2019."

Similar presentations


Ads by Google