Published by Gavin Rodgers. Modified over 5 years ago.
1
Indexing, Access and Database System Architecture
CSCI 6442 ©Copyright 2019, David C. Roberts, all rights reserved
2
Agenda
Database performance goals
DBMS use of disk
Searching
B-trees
Intro to DBMS architecture
3
Why Focus on B-Tree?
The B-tree is essential to join processing in an RDBMS.
Without the B-tree, no RDBMS!
The performance characteristics of joins are the most important factor in overall DBMS performance.
Many of the performance problems you will see are related to B-tree performance in some way.
4
DBMS Performance
Data is stored on disk.
Disk is necessary for the database to be reliably available.
A disk access takes milliseconds, while operations in RAM take nanoseconds, so disk can be hundreds of thousands to millions of times slower.
The number of disk accesses is a good measure of the cost of a DBMS operation.
5
Disk vs. RAM
RAM is accessible in any order.
Any sort of structure can be used.
Data structure courses usually cover data structures for RAM.
We’ll talk about how to make efficient use of disk.
6
Disk as Pages
Disk is composed of fixed-length records on a rotating surface.
To access information, we need to move the head and wait for the disk to rotate.
We wait the same time whether we use one byte or the whole record.
We call this fixed-length record a page.
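A minimal sketch of the page idea: because a whole page is read at once, the cost of an operation is better counted in distinct pages touched than in bytes used. The 4096-byte page size and the function names are assumptions for illustration, not taken from the slides.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def page_number(byte_offset: int, page_size: int = PAGE_SIZE) -> int:
    """Return the index of the page that holds the given byte offset."""
    return byte_offset // page_size


def pages_touched(offsets, page_size: int = PAGE_SIZE) -> int:
    """Count the distinct pages that must be read to reach every offset."""
    return len({page_number(offset, page_size) for offset in offsets})
```

Reading three values that sit on one page costs one access, while three values spread over three pages cost three, however few bytes are actually used.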
7
Search Methods
Linear search
Binary search
Binary tree-structured search
N-ary trees
B-trees
Hashing
8
Linear Search
Elements are stored in arrival order.
Search starts at the beginning and continues until the desired value is found.
The average number of accesses for n elements is approximately n/2.
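The n/2 figure can be checked with a short sketch that counts accesses. It assumes every stored value is distinct and equally likely to be searched for; the function names are illustrative.

```python
def linear_search(items, target):
    """Return (index, accesses); index is -1 when target is absent."""
    accesses = 0
    for i, value in enumerate(items):
        accesses += 1
        if value == target:
            return i, accesses
    return -1, accesses


def average_accesses(items):
    """Average access count over one successful search per stored element."""
    return sum(linear_search(items, v)[1] for v in items) / len(items)
```

For 100 distinct elements the average works out to (n + 1) / 2 = 50.5, which is the "approximately n/2" on the slide.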
9
Binary Search
Elements are stored in order by the value to be searched.
Search starts at the midpoint.
With each probe, half of the remaining candidates are eliminated.
The number of probes grows as log2(n): at most about log2(n) + 1, and slightly fewer on average.
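A sketch of binary search that also counts probes, so the logarithmic growth can be observed directly. It assumes the input list is already sorted; the names are illustrative.

```python
def binary_search(sorted_items, target):
    """Return (index, probes); index is -1 when target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1, probes
```

With 1024 sorted elements, no search needs more than 11 probes, compared with up to 1024 accesses for a linear search.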
10
Disadvantages of Binary Search
Elements must be kept in order.
Inserting one element may require reorganization of the entire list.
If stored on disk, the search jumps from bucket to bucket.
11
Using Linked Structure for Binary Search
Using links, we can separate physical organization from search sequence.
This avoids the possible need to reorganize the entire store because of a single update.
It accelerates updates and still allows fast search.
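A minimal sketch of a linked binary search structure. An insert only sets one link, so nothing already stored has to move. The class and function names are illustrative, and duplicate keys are simply ignored here.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None    # link to the subtree of smaller keys
        self.right = None   # link to the subtree of larger keys


def insert(root, key):
    """Insert key and return the root; existing nodes are never moved."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # key already present


def contains(root, key):
    """Follow links from the root until the key is found or a link is empty."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False
```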
12
Example Binary Search Tree
13
Problems with Binary Search Tree
Each node is likely to be on a different page, making inefficient use of disk accesses.
Balance of the tree is also an issue.
What if, instead of just one key at each node, we could store a whole page full of keys?
Then we would use disk efficiently and have a very shallow tree.
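To see how much shallower the tree becomes, here is a small worked calculation. It assumes an idealized, completely full tree in which each node holds a fixed number of keys and has one more child than keys; the figure of 100 keys per page is an assumption chosen for illustration.

```python
def min_levels(n_keys: int, keys_per_node: int) -> int:
    """Fewest levels a completely full tree needs to hold n_keys."""
    fanout = keys_per_node + 1
    levels, capacity = 0, 0
    while capacity < n_keys:
        levels += 1
        # A full tree with this many levels holds fanout**levels - 1 keys.
        capacity = fanout ** levels - 1
    return levels


binary_levels = min_levels(1_000_000, 1)    # one key per node
paged_levels = min_levels(1_000_000, 100)   # a page full of keys per node
```

Under these assumptions a million keys need 20 levels with one key per node but only 3 levels with 100 keys per node, so a lookup drops from about 20 page reads to about 3.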
14
Balance
15
Balance
A tree is said to be balanced if the lengths of all the paths from the root to the leaves differ by no more than one.
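The definition can be checked mechanically. This sketch applies it literally, comparing the shortest and longest root-to-leaf path; the tuple representation `(key, left, right)` is an assumption made to keep the example short.

```python
def leaf_depths(tree, depth=0):
    """Yield the depth of every leaf; tree is None or (key, left, right)."""
    if tree is None:
        return
    _, left, right = tree
    if left is None and right is None:
        yield depth
    else:
        yield from leaf_depths(left, depth + 1)
        yield from leaf_depths(right, depth + 1)


def is_balanced(tree):
    """True when root-to-leaf path lengths differ by no more than one."""
    depths = list(leaf_depths(tree))
    return not depths or max(depths) - min(depths) <= 1


even = (2, (1, None, None), (3, None, None))
lopsided = (5, (3, None, None), (8, None, (9, None, (10, None, None))))
```

Here `even` is balanced, while `lopsided` has leaves at depths 1 and 3 and is not.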
16
The problem: how to make efficient use of disk space and also keep the tree balanced.
The answer: the B-tree, the brilliant invention of Bayer and McCreight.
17
B-tree
We allow nodes to be incompletely filled in order to maintain perfect balance of the tree.
We grow the tree from the bottom; when a node is over-full, we split it and promote its middle value to the level above, adding a new node there if needed.
Deletions are the reverse of additions.
18
B-tree
19
B-tree
Each entry is understood to carry an address in the data store.
With that understood, we omit the addresses from the rest of the diagrams.
20
B-tree
21
B-tree 1
22
B-tree 1 4
23
B-tree 1 4 6
24
B-tree 1 4 6 8
25
B-tree (root: 5; leaves: 1 4 and 6 8)
26
B-tree
When a node is full, to add a value we split the node and put the middle value in the level above.
(root: 5; leaves: 1 4 and 6 8)
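The split step can be sketched in a few lines. This is a simplified illustration, not a full B-tree: it assumes a node capacity of four keys (matching the four-value node in the slides), distinct keys, and no deletion. `BNode`, `MAX_KEYS`, and `insert` are names invented for the example.

```python
import bisect

MAX_KEYS = 4  # assumed node capacity


class BNode:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list for a leaf


def _insert(node, key):
    """Insert below node; return (middle_key, right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                  # leaf: place key in sorted position
        node.keys.insert(i, key)
    else:                                  # interior: descend, absorb any split
        split = _insert(node.children[i], key)
        if split:
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2              # over-full: split around the middle
    middle = node.keys[mid]
    right = BNode(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right


def insert(root, key):
    """Insert key and return the root, which is new if the old root split."""
    split = _insert(root, key)
    if split:
        middle, right = split
        return BNode([middle], [root, right])
    return root


root = BNode()
for value in [1, 4, 6, 8, 5]:
    root = insert(root, value)
```

Inserting 1, 4, 6, 8 fills a single node; adding 5 makes it over-full, so it splits into leaves 1 4 and 6 8 with 5 promoted to a new root, as in the diagram.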
27
How It Really Looks
28
B-tree questions
How large should node size be?
How many values should it contain?
Are the values indexed by a B-tree properly called keys?
How full are B-tree nodes, on average, after the system has been operating for a while?
29
B-plus tree
Usually the B+ tree, a variation on the B-tree, is used.
B+ trees have all indexed values represented in the leaves.
Other nodes do not have pointers to rows, only pointers to other nodes.
B+ trees provide very high density of indexes.
30
B+ tree
Index set: some values, and pointers to blocks in the sequence set.
Sequence set: all values, and pointers to rows in the database.
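A sketch of how the two parts cooperate on a range query: the index set is used once to find the starting leaf, and the scan then follows the links through the sequence set. To stay short it assumes a single-level index set and leaves linked left to right; the class, function names, and row pointers such as "r4" are hypothetical.

```python
import bisect


class Leaf:
    def __init__(self, entries):
        self.entries = entries  # sorted (value, row_pointer) pairs
        self.next = None        # link to the next leaf in the sequence set


def build(leaf_entries):
    """Link the leaves and derive one separator per leaf after the first."""
    leaves = [Leaf(entries) for entries in leaf_entries]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    separators = [leaf.entries[0][0] for leaf in leaves[1:]]
    return separators, leaves


def range_scan(separators, leaves, low, high):
    """Return row pointers for every value with low <= value <= high."""
    leaf = leaves[bisect.bisect_right(separators, low)]  # index set lookup
    rows = []
    while leaf is not None:                              # sequence set walk
        for value, row in leaf.entries:
            if value > high:
                return rows
            if value >= low:
                rows.append(row)
        leaf = leaf.next
    return rows


separators, leaves = build([
    [(1, "r1"), (4, "r4")],
    [(5, "r5"), (6, "r6")],
    [(8, "r8"), (9, "r9")],
])
```

With this data, `range_scan(separators, leaves, 4, 8)` visits the three leaves in order and returns the row pointers for 4, 5, 6 and 8.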
31
B+ Tree Add Algorithm
The insert algorithm for B+ trees, by case:
Leaf page full: NO. Index page full: NO.
Action: Place the record in sorted position in the appropriate leaf page.
Leaf page full: YES. Index page full: NO.
Action: Split the leaf page. Place the middle key in the index page in sorted order. The left leaf page contains records with keys below the middle key. The right leaf page contains records with keys equal to or greater than the middle key.
Leaf page full: YES. Index page full: YES.
Action: Split the leaf page. Records with keys < middle key go to the left leaf page. Records with keys >= middle key go to the right leaf page. Split the index page. Keys < middle key go to the left index page. Keys > middle key go to the right index page. The middle key goes to the next (higher level) index. If the next level index page is full, continue splitting the index pages.
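The two split rules above differ in one detail that is easy to miss: a leaf split keeps the middle key in the right leaf (keys >= middle), while an index split moves the middle key up and keeps it in neither half. A minimal sketch, assuming sorted lists of distinct keys and using hypothetical helper names:

```python
def split_leaf(keys):
    """Split an over-full leaf; the middle key stays in the right leaf.

    Returns (left_keys, right_keys, key_to_place_in_index_page).
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid:], keys[mid]


def split_index(keys):
    """Split an over-full index page; the middle key moves up a level.

    Returns (left_keys, right_keys, key_for_next_higher_index).
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]
```

For example, `split_leaf([1, 4, 5, 6, 8])` gives leaves 1 4 and 5 6 8 with 5 copied into the index page, whereas `split_index([10, 20, 30, 40, 50])` gives 10 20 and 40 50 with 30 moved to the next level.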
32
B+ Tree Delete Algorithm
The delete algorithm for B+ trees, by case:
Leaf page below fill factor: NO. Index page below fill factor: NO.
Action: Delete the record from the leaf page. Arrange keys in ascending order to fill the void. If the key of the deleted record appears in the index page, use the next key to replace it.
Leaf page below fill factor: YES. Index page below fill factor: NO.
Action: Combine the leaf page and its sibling. Change the index page to reflect the change.
Leaf page below fill factor: YES. Index page below fill factor: YES.
Action: Combine the leaf page and its sibling. Adjust the index page to reflect the change. Combine the index page with its sibling. Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.
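A sketch of the leaf-level steps only. It assumes sorted lists of distinct keys, a fill factor of 50 percent, and a combined page that fits within one page; all three are assumptions for illustration, as are the function names. Updating the parent index after a combine is left out.

```python
def delete_from_leaf(leaf_keys, index_keys, key):
    """Remove key from the leaf; if it appears in the index page,
    replace it there with the next key from the leaf."""
    pos = leaf_keys.index(key)  # raises ValueError if key is absent
    new_leaf = leaf_keys[:pos] + leaf_keys[pos + 1:]
    new_index = list(index_keys)
    if key in new_index and pos < len(new_leaf):
        new_index[new_index.index(key)] = new_leaf[pos]
    return new_leaf, new_index


def below_fill_factor(keys, capacity, fill_factor=0.5):
    """True when the page holds fewer keys than the fill factor requires."""
    return len(keys) < capacity * fill_factor


def combine(left_keys, right_keys):
    """Combine a leaf with its right sibling (assumes the result fits)."""
    return left_keys + right_keys
```

Deleting 5 from the leaf 5 6 8 when 5 also appears in the index page leaves 6 8 in the leaf and replaces the index entry with 6.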
33
Hashing
Develop a function that maps data values into a range of storage addresses.
For each search value, use the function to compute a hash value and store the associated data at that address.
To search, just compute the hash value and look at that address.
34
Hashing
Instead of storing the data at the hash address, store a pointer to the data.
The table of pointers is called a hash table.
Using hashing for a search locates a stored value in just one access.
The number of accesses to locate a value is independent of n.
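A minimal sketch of a hash table of pointers. The slides do not say how two values that hash to the same address are handled; keeping a short list per slot is an assumption made here so the example is complete. The class name, slot count, and pointer strings are hypothetical.

```python
class HashIndex:
    def __init__(self, n_slots=8):
        # Each slot holds (key, pointer) pairs; the per-slot list is an
        # assumed way of handling keys that share a hash address.
        self.slots = [[] for _ in range(n_slots)]

    def _slot(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def put(self, key, pointer):
        slot = self._slot(key)
        for i, (existing, _) in enumerate(slot):
            if existing == key:
                slot[i] = (key, pointer)  # overwrite an existing entry
                return
        slot.append((key, pointer))

    def get(self, key):
        """Compute the hash address and look only there."""
        for existing, pointer in self._slot(key):
            if existing == key:
                return pointer
        return None


index = HashIndex()
index.put("smith", "page 12, row 3")
index.put("jones", "page 7, row 1")
```

A lookup such as `index.get("smith")` inspects a single slot, regardless of how many entries the table holds.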
35
Hashing Question
Why are B-trees, and not hashing, the most used index method for database systems, given that hashing is faster?
Hint: think about the disadvantages of hashing.
36
Introduction to Database System Architecture
37
DBMS Software Architecture
[Diagram components: Application Program with Buffer (shown four times), System Global Area, Database System]
38
Query Processing Architecture
[Diagram: SQL -> Lexical Analyzer -> Tokens -> Syntax Analyzer -> Quads -> Executor -> Results]
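As an illustration of the first stage only, here is a toy lexical analyzer that turns SQL text into tokens. It is a sketch under stated assumptions: the token kinds, the small keyword set, and the table and column names in the example are invented, and a real DBMS lexer handles far more of the language.

```python
import re

TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>\d+)
      | (?P<name>[A-Za-z_][A-Za-z_0-9]*)
      | (?P<string>'[^']*')
      | (?P<symbol><=|>=|<>|[=<>*,();.])
    )""", re.VERBOSE)

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}  # assumed subset


def tokenize(sql):
    """Return a list of (kind, text) tokens for the given SQL text."""
    tokens, pos = [], 0
    sql = sql.rstrip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "name" and text.upper() in KEYWORDS:
            kind, text = "keyword", text.upper()
        tokens.append((kind, text))
        pos = match.end()
    return tokens


tokens = tokenize("SELECT name FROM emp WHERE id >= 10")
```

The resulting token list is what the syntax analyzer would consume in the pipeline shown in the diagram.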
39
Executor Software Architecture
[Diagram components: SQL Executor, Table Management, Index Management, Row Management, Node Management, Page Management, Data Store]
40
Next week: more about DBMS architecture and performance