Indexing: Marwan Al-Namari, Hassan Al-Mathami

1 Marwan Al-Namari, Hassan Al-Mathami

2 Indexing. What is indexing? Indexing is a mechanism for locating records without scanning the whole file. Why do we need indexing? We use indexing to speed up access to desired data in a database, e.g., the author catalog in a library. Popular indices: balanced trees, B+-trees, and hashes.

3 Indexing (ISAM & B-Tree): Basic Concepts; Indexed Sequential Access Method (ISAM); Ordered Indices; Multilevel Index; B+-Tree Index Files; B-Tree Index Files.

4 Basic Concepts. Search key: an attribute or set of attributes used to look up records in a file. An index file consists of records (called index entries) of the form (search-key, pointer). Index files are typically much smaller than the original file. Two basic kinds of indices: ordered indices, in which search keys are stored in sorted order; and hash indices, in which search keys are distributed uniformly across “buckets” using a “hash function”.
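A minimal Python sketch of the two basic kinds of index, assuming a hypothetical in-memory "file" of (author, title) records in which a record's list position stands in for a disk pointer. The names `records`, `ordered_lookup`, `hash_lookup` and the bucket count are illustrative assumptions, not part of the original slides.

```python
import bisect

# Hypothetical data file: a list of (author, title) records.
# A record's position in the list stands in for a disk pointer.
records = [("Tolstoy", "War and Peace"), ("Austen", "Emma"),
           ("Orwell", "1984"), ("Dickens", "Bleak House")]

# Ordered index: (search-key, pointer) entries kept sorted on the search key.
ordered_index = sorted((author, ptr) for ptr, (author, _) in enumerate(records))
ordered_keys = [key for key, _ in ordered_index]

def ordered_lookup(key):
    """Binary-search the sorted index entries; return a record pointer or None."""
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return ordered_index[i][1]
    return None

# Hash index: a hash function spreads search keys across a fixed number of buckets.
NUM_BUCKETS = 4  # assumed bucket count
hash_index = {}
for ptr, (author, _) in enumerate(records):
    hash_index.setdefault(hash(author) % NUM_BUCKETS, []).append((author, ptr))

def hash_lookup(key):
    """Hash the key to its bucket and scan only that bucket."""
    for author, ptr in hash_index.get(hash(key) % NUM_BUCKETS, []):
        if author == key:
            return ptr
    return None
```

Both lookups return the same pointer; the ordered index additionally keeps keys in sorted order, which is what makes range scans possible.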

5 Indexed Sequential Access Method. [Figure: data pages 1 to N.] If our large database is sorted, we can speed up search by doing binary search on the entire database. However, this means we must do about $\log_2 N$ disk accesses, one for each probe of the binary search. The idea of ISAM is to do a faster, approximate binary search in main memory, and use this information to do fewer disk accesses (usually only one).

6 Indexed Sequential Access Method. An index entry is a (K, P) pair, where the key K is the value of the first key on the page and the pointer P points to the page. Example: data page 7 holds the records Maggie q4, Manjula p3, Marge d5, and Monty f4, so its index entry is (Maggie, page 7).

7 Indexed Sequential Access Method. An index file is a concatenation of index entries, together with one extra pointer at the beginning: $P_0, K_1, P_1, K_2, P_2, K_3, P_3$. [Figure: example index file with entries such as Maggie → page 2 and Waylon → page 3.]

8 Indexed Sequential Access Method. Let's look at an index file (this one is the smallest possible example): a single key, 7, with one pointer on each side. Every record pointed to by the left pointer has a value less than 7; every record pointed to by the right pointer has a value greater than or equal to 7.

9 Indexed Sequential Access Method. [Figure: an index file of (key, pointer) entries pointing to data pages 1 to N.] Instead of doing binary search on the data files, we can do binary search on the index to find the largest key that is less than or equal to the search key. We then follow its pointer to retrieve the relevant block from disk. Example: when searching for 8, the binary search finds 5, so we retrieve page 2 and search it for a match (if there is one). Equivalently: find the largest entry less than or equal to the key and follow its right pointer, or find the smallest entry strictly greater than the key and follow its left pointer.

10 Indexed Sequential Access Method. [Figure: index file and data pages 1 to N.] How big should the index file be? What about more pointers per page? We could have two pointers to each page (on average, or exactly). This does not help, because we have to retrieve a whole block at a time anyway.

11 Indexed Sequential Access Method. [Figure: index file and data pages 1 to N.] How big should the index file be? What about fewer pointers per page? We could have one pointer for every two pages (on average, or exactly). This might help, because it makes the index smaller. We can do a little trick of adding “sideways” pointers between pages. These sideways pointers are actually useful for another reason: they help with range queries. However, too few pointers leads to chaining.

12 Indexed Sequential Access Method. [Figure: index file and data pages 1 to N.] We have seen that an index that is too small or too large (in other words, too few or too many pointers) can be a problem. But suppose the index does not fit in main memory? The key observation is that the index itself is a sort of database, so let's build an index on the index!

13 Ordered Indices. In an ordered index, index entries are stored sorted on the search-key value, e.g., the author catalog in a library. Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file; also called a clustering index. The search key of a primary index is usually, but not necessarily, the primary key. Secondary index: an index whose search key specifies an order different from the sequential order of the file; also called a non-clustering index. Index-sequential file: an ordered sequential file with a primary index. Indexing techniques are evaluated on the basis of:

14 Multilevel Index. If the primary index does not fit in memory, access becomes expensive. To reduce the number of disk accesses to index records, treat the primary index kept on disk as a sequential file and construct a sparse index on it: the outer index is a sparse index of the primary index, and the inner index is the primary index file. If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on. Indices at all levels must be updated on insertion into or deletion from the file.
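A two-level lookup can be sketched under stated assumptions: the inner (primary) index is split into hypothetical blocks of three entries, and the outer sparse index holds the first key of each block. Only the outer index is assumed to be in memory; reading one inner block stands for one disk access. `find_data_page` and the entry values are illustrative.

```python
import bisect

BLOCK_SIZE = 3  # assumed number of index entries per disk block

# Inner index (the primary index): (first key on data page, page number) entries,
# stored on disk as a sequential file of blocks.
inner_entries = [(1, 0), (5, 1), (16, 2), (19, 3), (31, 4), (40, 5), (52, 6)]
inner_blocks = [inner_entries[i:i + BLOCK_SIZE]
                for i in range(0, len(inner_entries), BLOCK_SIZE)]

# Outer index: a sparse index with one entry per inner block, kept in memory.
outer_keys = [block[0][0] for block in inner_blocks]   # [1, 19, 52]

def find_data_page(key):
    """Return the data page number that may hold key, or None if key is too small."""
    b = bisect.bisect_right(outer_keys, key) - 1     # largest outer key <= key
    if b < 0:
        return None
    block = inner_blocks[b]                          # one disk read of an inner block
    keys = [k for k, _ in block]
    j = bisect.bisect_right(keys, key) - 1           # largest inner key <= key
    return block[j][1]
```

Adding a third level would mean building the same kind of sparse index over `outer_keys` once it no longer fits in memory.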

15 Multilevel Index (Cont.)

16 B+-Tree Index Files. B+-tree indices are an alternative to indexed-sequential files. Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created, and periodic reorganization of the entire file is required. Advantage of B+-tree index files: the tree automatically reorganizes itself with small, local changes in the face of insertions and deletions, so reorganization of the entire file is not required to maintain performance. Disadvantages of B+-trees: extra insertion and deletion overhead, and space overhead. The advantages of B+-trees outweigh the disadvantages, and they are used extensively.

17 B+-Tree Index Files (Cont.) A B+-tree is a rooted tree satisfying the following properties: All paths from the root to a leaf are of the same length. Each node that is not a root or a leaf has between $\lceil n/2 \rceil$ and $n$ children. A leaf node has between $\lceil (n-1)/2 \rceil$ and $n-1$ values. Special cases: if the root is not a leaf, it has at least 2 children; if the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and $n-1$ values.
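The occupancy rules translate directly into code. This sketch assumes, as on the slide, that $n$ is the maximum number of pointers per node; `bplus_bounds` and `node_ok` are hypothetical helper names. With $n = 5$ it gives leaves 2 to 4 values and non-root internal nodes 3 to 5 children.

```python
import math

def bplus_bounds(n):
    """Occupancy limits for a B+-tree whose nodes hold at most n pointers."""
    return {
        "internal_children": (math.ceil(n / 2), n),      # non-root, non-leaf node
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),  # non-root leaf node
    }

def node_ok(n, is_leaf, is_root, count):
    """count = number of children for a non-leaf node, number of values for a leaf."""
    if is_root:
        # A root that is a leaf may hold 0..n-1 values; otherwise it needs >= 2 children.
        return (0 <= count <= n - 1) if is_leaf else (2 <= count <= n)
    lo, hi = bplus_bounds(n)["leaf_values" if is_leaf else "internal_children"]
    return lo <= count <= hi
```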

18 B+-Tree Node Structure. Typical node: $P_1, K_1, P_2, \ldots, P_{n-1}, K_{n-1}, P_n$. The $K_i$ are the search-key values; the $P_i$ are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes). The search keys in a node are ordered: $K_1 < K_2 < K_3 < \cdots < K_{n-1}$.

19 Leaf Nodes in B+-Trees. Properties of a leaf node: For $i = 1, 2, \ldots, n-1$, pointer $P_i$ either points to a file record with search-key value $K_i$, or to a bucket of pointers to file records, each record having search-key value $K_i$. The bucket structure is needed only if the search key does not form a primary key. If $L_i$ and $L_j$ are leaf nodes and $i < j$, then $L_i$'s search-key values are less than $L_j$'s search-key values. $P_n$ points to the next leaf node in search-key order.
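Because $P_n$ links each leaf to the next one in search-key order, a range query only needs to find the first relevant leaf and then walk sideways. A minimal sketch, assuming a hypothetical `Leaf` class and, for brevity, starting the walk at the first leaf instead of descending the tree to find it:

```python
import bisect

class Leaf:
    """Hypothetical B+-tree leaf: sorted keys K_i, record pointers P_i, next-leaf pointer P_n."""
    def __init__(self, keys, records):
        self.keys = keys          # K_1 < K_2 < ...
        self.records = records    # records[i] is the record (or bucket) for keys[i]
        self.next = None          # P_n: next leaf in search-key order

def link_leaves(leaves):
    """Chain leaves left to right through their next pointers; return the first."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(leaf, lo, hi):
    """Collect (key, record) pairs with lo <= key <= hi by following next pointers."""
    result = []
    while leaf is not None:
        start = bisect.bisect_left(leaf.keys, lo)
        for key, rec in zip(leaf.keys[start:], leaf.records[start:]):
            if key > hi:
                return result
            result.append((key, rec))
        leaf = leaf.next
    return result
```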

20 Non-Leaf Nodes in B+-Trees. Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with $m$ pointers: all the search keys in the subtree to which $P_1$ points are less than $K_1$; for $2 \le i \le m-1$, all the search keys in the subtree to which $P_i$ points have values greater than or equal to $K_{i-1}$ and less than $K_i$; all the search keys in the subtree to which $P_m$ points have values greater than or equal to $K_{m-1}$.
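Search follows directly from these rules: at each non-leaf node, the number of keys $K_j$ that are less than or equal to the search key selects the child to follow. A sketch assuming a hypothetical `Node` class; Python's `bisect.bisect_right` does the counting, and the next-leaf pointer is omitted for brevity.

```python
import bisect

class Node:
    """Hypothetical B+-tree node.

    Non-leaf: keys K_1..K_{m-1} and children P_1..P_m.
    Leaf: keys plus one record pointer per key (the next-leaf pointer is omitted).
    """
    def __init__(self, keys, children=None, records=None):
        self.keys = keys
        self.children = children
        self.records = records

def bplus_search(root, key):
    """Descend from the root to a leaf and return the record for key, or None."""
    node = root
    while node.children is not None:
        # P_1 covers keys < K_1, P_i covers K_{i-1} <= key < K_i, and P_m covers
        # keys >= K_{m-1}; bisect_right counts the K_j <= key, which is the
        # (0-based) index of exactly that child.
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None
```

Note that a key equal to a separator (e.g. 10 in a root with keys [10, 20]) is found in the right-hand subtree, matching the "greater than or equal to $K_{i-1}$" rule.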

21 Example of a B+-tree: B+-tree for account file ($n = 3$)

22 Example of a B+-tree: B+-tree for account file ($n = 5$). Leaf nodes must have between 2 and 4 values ($\lceil (n-1)/2 \rceil$ and $n-1$, with $n = 5$). Non-leaf nodes other than the root must have between 3 and 5 children ($\lceil n/2 \rceil$ and $n$, with $n = 5$). The root must have at least 2 children.

23 Updates on B+-Trees: Insertion. B+-tree before and after insertion of “Clearview”.
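The slide shows only the before/after figure for inserting “Clearview”. The leaf-level step can be sketched as below, assuming leaves hold at most $n-1$ keys and following the common rule of keeping the first $\lceil n/2 \rceil$ keys in the old leaf when it overflows. `insert_into_leaf` is a hypothetical helper; propagating the split into the parent (and possibly creating a new root) is left out.

```python
import bisect
import math

def insert_into_leaf(leaf_keys, key, n):
    """Insert key into a sorted B+-tree leaf that may hold at most n - 1 keys.

    Returns None if the leaf had room. Otherwise splits the leaf in place and
    returns (new_leaf_keys, separator); the caller would link the new leaf after
    the old one and insert (separator, pointer to new leaf) into the parent.
    """
    bisect.insort(leaf_keys, key)
    if len(leaf_keys) <= n - 1:
        return None
    # Overflow: the leaf now holds n keys. Keep the first ceil(n/2) keys in
    # the original leaf and move the rest to a new leaf.
    split = math.ceil(n / 2)
    new_leaf = leaf_keys[split:]
    del leaf_keys[split:]
    return new_leaf, new_leaf[0]   # separator: smallest key in the new leaf
```

Only the overflowing leaf and its parent change, which is the "small, local changes" property claimed for B+-trees on slide 16.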

24 B-Tree Index Files. Similar to a B+-tree, but a B-tree allows search-key values to appear only once, which eliminates redundant storage of search keys. Search keys in non-leaf nodes appear nowhere else in the B-tree, so an additional pointer field for each search key in a non-leaf node must be included. [Figure: generalized B-tree leaf node and non-leaf node; in the non-leaf node, the pointers $B_i$ are the bucket or file record pointers.]
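Because every key in a B-tree carries its own record pointer, a search can stop as soon as it meets the key, even in a non-leaf node. A minimal sketch with a hypothetical `BNode` class that also reports the depth at which the key was found:

```python
import bisect

class BNode:
    """Hypothetical B-tree node: each key has its own record pointer; a non-leaf
    node also has len(keys) + 1 children."""
    def __init__(self, keys, records, children=None):
        self.keys = keys
        self.records = records
        self.children = children   # None for a leaf

def btree_search(node, key):
    """Return (record, depth) where the key was found, or (None, depth) if absent."""
    depth = 0
    while True:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i], depth   # may stop before reaching a leaf
        if node.children is None:
            return None, depth
        node = node.children[i]   # keys[i-1] < key < keys[i]
        depth += 1
```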

25 B-Tree Index File Example: B-tree (above) and B+-tree (below) on the same data

26 B-Tree Index Files (Cont.) Advantages of B-tree indices: they may use fewer tree nodes than a corresponding B+-tree, and it is sometimes possible to find the search-key value before reaching a leaf node. Disadvantages of B-tree indices: only a small fraction of all search-key values are found early; non-leaf nodes are larger, so fan-out is reduced, and thus B-trees typically have greater depth than the corresponding B+-tree; insertion and deletion are more complicated than in B+-trees; implementation is harder than for B+-trees. Typically, the advantages of B-trees do not outweigh the disadvantages.

27 B-Trees. Type #1: Simple leaf deletion. Assuming a 5-way B-tree, as before. Tree: root [12 29 52]; leaves [2 7 9], [15 22], [31 43], [56 69 72]. Delete 2: since there are enough keys in the node, just delete it.

28 B-Trees. Type #2: Simple non-leaf deletion. Tree: root [12 29 52]; leaves [7 9], [15 22], [31 43], [56 69 72]. Delete 52: borrow the predecessor or (in this case) the successor, 56.

29 B-Trees. Type #4: Too few keys in node and its siblings. Tree: root [12 29 56]; leaves [7 9], [15 22], [31 43], [69 72]. Delete 72: too few keys! Join back together.

30 B-Trees. Type #4: Too few keys in node and its siblings. Result: root [12 29]; leaves [7 9], [15 22], [31 43 56 69].

31 B-Trees. Type #3: Enough siblings. Tree: root [12 29]; leaves [7 9], [15 22], [31 43 56 69]. Delete 22: demote the root key and promote a leaf key.

32 B-Trees. Type #3: Enough siblings. Result: root [12 31]; leaves [7 9], [15 29], [43 56 69].
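The four deletion cases on slides 27 to 32 can be replayed with a small sketch. It assumes a two-level 5-way B-tree (a root plus leaves, each non-root node keeping at least 2 keys), stores each node as a plain Python list, assumes the deleted key is present, and does not handle the root itself becoming empty; `delete` and `_remove_and_rebalance` are hypothetical names.

```python
import bisect

MIN_KEYS = 2  # assumed: a 5-way B-tree node (other than the root) keeps at least 2 keys

def _remove_and_rebalance(root, leaves, idx, key, min_keys):
    """Remove key from leaves[idx], then fix an underfull leaf via the root."""
    leaf = leaves[idx]
    leaf.remove(key)                            # assumes key is present
    if len(leaf) >= min_keys:
        return                                  # Type #1: enough keys remain
    if idx + 1 < len(leaves) and len(leaves[idx + 1]) > min_keys:
        # Type #3: right sibling has a spare key.
        right = leaves[idx + 1]
        leaf.append(root[idx])                  # demote the separating root key
        root[idx] = right.pop(0)                # promote the sibling's smallest key
    elif idx > 0 and len(leaves[idx - 1]) > min_keys:
        # Type #3 (mirror image): left sibling has a spare key.
        left = leaves[idx - 1]
        leaf.insert(0, root[idx - 1])
        root[idx - 1] = left.pop()
    elif idx > 0:
        # Type #4: siblings are minimal too, so join with the left sibling
        # and pull the separating key down between them.
        leaves[idx - 1].extend([root.pop(idx - 1)] + leaf)
        leaves.pop(idx)
    else:
        # Type #4 for the leftmost leaf: join with the right sibling.
        leaf.extend([root.pop(0)] + leaves[idx + 1])
        leaves.pop(idx + 1)

def delete(root, leaves, key, min_keys=MIN_KEYS):
    """Delete key from a two-level B-tree stored as a root key list plus leaf lists."""
    if key in root:
        # Type #2: non-leaf key; replace it with its in-order successor,
        # then delete the successor from its leaf.
        i = root.index(key)
        successor = leaves[i + 1][0]
        root[i] = successor
        _remove_and_rebalance(root, leaves, i + 1, successor, min_keys)
    else:
        idx = bisect.bisect_right(root, key)    # leaf whose key range covers key
        _remove_and_rebalance(root, leaves, idx, key, min_keys)

if __name__ == "__main__":
    root, leaves = [12, 29, 52], [[2, 7, 9], [15, 22], [31, 43], [56, 69, 72]]
    for k in (2, 52, 72, 22):
        delete(root, leaves, k)
        print("delete", k, "->", root, leaves)
```

Deleting 2, 52, 72, and 22 in turn from the slides' starting tree reproduces each intermediate tree shown above.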

33 Binary Trees vs. B-Trees. A binary tree node has at most 2 children. For large files, a binary tree will be too tall because of this limit on children and because each node holds only one key. A B-tree node can have many children, with the number determined by the disk block size. B-trees are more practical for indexing files because they easily maintain balance and can store many keys in only a few nodes.
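To put rough numbers on the height argument, this sketch counts how many levels a tree needs before its fan-out can reach a given number of keys. It ignores node occupancy details, and the fan-out of 100 used below is an assumed figure for a block-sized node, not something stated on the slide; `levels_needed` is a hypothetical name.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels h with fanout**h >= num_keys (a rough height estimate)."""
    height, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        height += 1
    return height
```

For one million keys this gives 20 levels at fan-out 2 but only 3 at fan-out 100. Each level typically costs a disk access, which is why block-sized B-tree nodes pay off.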

34 B+-Trees vs. B-Trees. B+-trees store redundant search-key values, but their non-leaf nodes are smaller, so the index has a higher fan-out. In a B+-tree, all pointers to data records exist at the leaf-level nodes. A B-tree eliminates redundancy but requires additional pointers to do so. In a B-tree, pointers to data records exist at all levels of the tree.

35 An Animation of the B-tree Algorithm: http://ats.oka.nu/b-tree/b-tree.html. Also watch the YouTube video “B-Trees”: https://www.youtube.com/watch?v=HZRPa0kMOZE

