Physical Storage Structures


CSD305 Advanced Databases

Objectives
- Why it matters
- DBMSs are different “under the hood”
- Databases and files
- Page structure
- Heaps
- B-Tree indexes
- Index behaviour
- Buffers
- Summary

Why physical storage matters
- Relational databases hide physical storage details from users: they provide physical data independence
- The DBMS uses physical storage structures to access, manage and maintain the data (and metadata)
- The query optimizer needs to know about the physical structures to optimize its query plans

DBMSs differ at the physical level
Relational DBMSs can appear very similar at a logical level, but they are often very different at a physical level.

How queries make use of physical details
1. The SQL query is sent to the DBMS.
2. The DBMS accesses metadata (the system catalogs) to determine the physical details of the underlying tables, indexes, etc.
3. The query optimizer uses this information to plan the query.
4. The DBMS exploits the physical structures to execute the query efficiently.

Database Files

Databases and Files in SQL Server
- One database maps to:
  - one primary file (.mdf extension)
  - zero or more secondary files (.ndf extension)
  - at least one log file (.ldf extension)
- Disk space in SQL Server is divided into pages: each data file is split into a number of 8 KB pages, giving 128 pages per megabyte
- Log files do not contain pages; they contain a series of log records
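The 128-pages-per-megabyte figure follows directly from the 8 KB page size. A minimal Python sketch of the arithmetic, assuming the 8-page extent described later in this deck; the function names are illustrative only, and log files are left out because they hold log records rather than pages:

```python
# Sizing arithmetic for SQL Server data files (illustrative helper names).
PAGE_SIZE_BYTES = 8 * 1024   # 8 KB = 8192 bytes per page
PAGES_PER_EXTENT = 8         # an extent is a group of 8 pages


def pages_in_file(size_mb: int) -> int:
    """Number of 8 KB pages in a data file of size_mb megabytes."""
    return size_mb * 1024 * 1024 // PAGE_SIZE_BYTES


def extents_in_file(size_mb: int) -> int:
    """Number of 8-page extents in a data file of size_mb megabytes."""
    return pages_in_file(size_mb) // PAGES_PER_EXTENT


print(pages_in_file(1))    # 128
print(extents_in_file(1))  # 16
print(pages_in_file(100))  # 12800
```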

Logical and Physical File Names
SQL Server files have two names:
- logical_file_name: the name used to refer to the physical file in all SQL statements
- os_file_name: the name of the physical file, including the directory path; it must follow the rules for operating system file names

Database Page Structure
[Figure: an 8192-byte page containing the page header, the data rows and the row offset array]
The page size is 8 KB (8192 bytes), so SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that stores system information, including the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page. The data rows follow the header, and the row offset array sits at the end of the page.

The Page Header
- Chains the pages together using pointers (to the previous and next pages)
- Stores a variety of housekeeping information
- 96 bytes in total
[Figure: three pages, each with a page header, data rows and a row offset array, chained together]
Eight contiguous pages form an extent (64 KB); when more space is required, a new extent is allocated.

The Row offset array
- An array of 2-byte slots, one slot for each data row
- Each slot holds the offset of a data row from the start of the page
- The order of the slots determines the logical order of the rows
- Data rows are placed on the page serially, starting immediately after the header
- The row offset array starts at the end of the page, and its entries are in reverse sequence from the sequence of the rows on the page
[Figure: page header, data rows, and the row offset array at the end of the page]
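A simplified Python model of the slotted-page layout described above. It assumes the 8192-byte page, the 96-byte header and the 2-byte slots from this deck; the class name `SlottedPage` is hypothetical, and for brevity each slot keeps the row length next to its offset, whereas a real page derives the length from the row format. Rows are stored physically in arrival order, while a slot's position in the array fixes the row's logical order:

```python
PAGE_SIZE = 8192   # bytes per page
HEADER_SIZE = 96   # page header
SLOT_SIZE = 2      # bytes per row offset array entry


class SlottedPage:
    """Rows are stored after the header; slot space is accounted at the page end."""

    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.free_offset = HEADER_SIZE  # next free byte for row data
        self.slots = []                 # (offset, length) per row, in logical order

    def free_space(self) -> int:
        # Space between the end of the row data and the start of the slot array.
        return PAGE_SIZE - len(self.slots) * SLOT_SIZE - self.free_offset

    def insert(self, row: bytes, position=None) -> bool:
        """Store row after the existing rows; put its slot at `position` (default: last)."""
        if len(row) + SLOT_SIZE > self.free_space():
            return False  # does not fit: the caller must use another page
        offset = self.free_offset
        self.data[offset:offset + len(row)] = row
        self.free_offset += len(row)
        if position is None:
            position = len(self.slots)
        self.slots.insert(position, (offset, len(row)))
        return True

    def rows(self):
        """Yield rows in logical (slot) order, which may differ from physical order."""
        for offset, length in self.slots:
            yield bytes(self.data[offset:offset + length])


page = SlottedPage()
page.insert(b"B09 Kalen")
page.insert(b"C12 Pat")
page.insert(b"B70 Olga", position=1)  # stored last, but logically between B09 and C12
print(list(page.rows()))  # [b'B09 Kalen', b'B70 Olga', b'C12 Pat']
print(page.free_space())  # 8066
```

Only the slot array changed to put B70 in its logical place; no row bytes were moved.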

Organization of Records in Files
- Heap: a record can be placed anywhere in the file where there is space
- Sequential: records are stored in sequential order, based on the value of the search key of each record
- Hashing: a hash function is computed on some attribute of each record; the result specifies in which block of the file the record should be placed
- Indexed sequential: combines indexed and sequential file organization
Records of each relation may be stored in a separate file. In a clustering file organization, records of several different relations can be stored in the same file.
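Of the organizations listed, hashing is the easiest to show in a few lines. A minimal Python sketch, assuming a file of four blocks held in memory (`NUM_BLOCKS` and the helper names are hypothetical) and using `zlib.crc32` as the hash function because, unlike Python's built-in `hash` for strings, it gives the same result in every run:

```python
import zlib

NUM_BLOCKS = 4  # hypothetical number of blocks (buckets) in the file
blocks = [[] for _ in range(NUM_BLOCKS)]


def block_for(key: str) -> int:
    """Hash the search key to choose the block that stores the record."""
    return zlib.crc32(key.encode()) % NUM_BLOCKS


def insert(record: tuple) -> None:
    blocks[block_for(record[0])].append(record)


def lookup(key: str) -> list:
    # Only the one block that the key hashes to has to be read.
    return [r for r in blocks[block_for(key)] if r[0] == key]


for rec in [("A02", "Hanif"), ("B09", "Kalen"), ("C12", "Pat"), ("D34", "Amy")]:
    insert(rec)
print(lookup("C12"))  # [('C12', 'Pat')]
```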

Heap table behaviour
Default structure: the heap
- A collection of pages filled with rows
- A record can be placed anywhere in the file where there is space
- A new set of pages (an extent) is added as required
- A heap table has no clustered index (it can still have a primary key, enforced by a non-clustered index)
- Rows can be accessed in two ways:
  - a serial scan of all pages
  - through a type of bookmark known as a RID, comprising a file number, page number and slot number
- If an updated row grows too large for its page, the row is moved to a new page and a forwarding pointer is left in its original location to connect the two
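A toy Python model of heap behaviour, assuming pages that hold only two rows (`ROWS_PER_PAGE` is hypothetical) and representing a forwarding pointer as a RID stored in the row's original slot. It shows placement wherever there is room, direct access by RID, a serial scan, and a moved row whose old RID still works:

```python
from dataclasses import dataclass

ROWS_PER_PAGE = 2  # hypothetical, tiny capacity to keep the example short


@dataclass(frozen=True)
class RID:
    file: int
    page: int
    slot: int


class Heap:
    def __init__(self, file_no: int = 1):
        self.file_no = file_no
        self.pages = []  # each page is a list of slots

    def insert(self, row):
        # A row can go anywhere there is space: here, the first page with room.
        for page_no, page in enumerate(self.pages):
            if len(page) < ROWS_PER_PAGE:
                page.append(row)
                return RID(self.file_no, page_no, len(page) - 1)
        self.pages.append([row])  # no room anywhere: add a new page
        return RID(self.file_no, len(self.pages) - 1, 0)

    def fetch(self, rid: RID):
        # Direct access by RID, following a forwarding pointer if one is present.
        value = self.pages[rid.page][rid.slot]
        if isinstance(value, RID):
            value = self.pages[value.page][value.slot]
        return value

    def move_row(self, rid: RID, grown_row) -> RID:
        # The grown row no longer fits: store it elsewhere and leave a forwarding
        # pointer behind, so bookmarks holding the old RID stay valid.
        new_rid = self.insert(grown_row)
        self.pages[rid.page][rid.slot] = new_rid
        return new_rid

    def scan(self):
        # Serial scan of all pages; forwarding stubs are skipped so that each
        # row is returned once, from where it now lives.
        for page in self.pages:
            for value in page:
                if not isinstance(value, RID):
                    yield value


heap = Heap()
r1 = heap.insert("A02 Hanif")
r2 = heap.insert("B09 Kalen")
heap.move_row(r1, "A02 Hanif (grown row)")
print(heap.fetch(r1))     # A02 Hanif (grown row)
print(list(heap.scan()))  # ['B09 Kalen', 'A02 Hanif (grown row)']
```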

Sequential File Organization
- Suitable for applications that require sequential processing of the entire file
- The records in the file are ordered by a search key

Sequential File Organization (Cont.)
- Deletion: use pointer chains
- Insertion: locate the position where the record is to be inserted
  - if there is free space, insert it there
  - if there is no free space, insert the record in an overflow block
  - in either case, the pointer chain must be updated
- The file needs to be reorganized from time to time to restore sequential order
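A minimal Python sketch of these rules, assuming three records per block and a single overflow block; records are reduced to their search keys, and the class and method names are hypothetical. The pointer chain (`ptr`) always gives key order, even when a record physically sits in the overflow block:

```python
from bisect import insort

BLOCK_CAPACITY = 3  # hypothetical number of records per block


class SequentialFile:
    def __init__(self, sorted_keys):
        # Initial load: records in search-key order, packed into blocks.
        self.blocks = [sorted_keys[i:i + BLOCK_CAPACITY]
                       for i in range(0, len(sorted_keys), BLOCK_CAPACITY)]
        self.overflow = []  # one overflow block, unbounded for simplicity
        # Pointer chain: each key points to the next key in search-key order.
        self.ptr = dict(zip(sorted_keys, sorted_keys[1:] + [None]))
        self.head = sorted_keys[0]

    def insert(self, key):
        # Locate the position: follow the chain to the record's predecessor.
        prev, cur = None, self.head
        while cur is not None and cur < key:
            prev, cur = cur, self.ptr[cur]
        block = next((b for b in self.blocks if prev in b), None)
        if block is not None and len(block) < BLOCK_CAPACITY:
            insort(block, key)          # free space: insert it there
        else:
            self.overflow.append(key)   # no free space: use the overflow block
        # In either case, the pointer chain must be updated.
        self.ptr[key] = cur
        if prev is None:
            self.head = key
        else:
            self.ptr[prev] = key

    def delete(self, key):
        # Unlink the record from the pointer chain, then free its space.
        prev, cur = None, self.head
        while cur is not None and cur != key:
            prev, cur = cur, self.ptr[cur]
        if cur is None:
            return
        if prev is None:
            self.head = self.ptr[key]
        else:
            self.ptr[prev] = self.ptr[key]
        del self.ptr[key]
        for area in self.blocks + [self.overflow]:
            if key in area:
                area.remove(key)

    def in_order(self):
        key = self.head
        while key is not None:
            yield key
            key = self.ptr[key]


f = SequentialFile(["A02", "A12", "B09", "C12", "D16"])
f.insert("B77")  # B09's block is full, so B77 goes to the overflow block
f.insert("C50")  # C12's block has room
f.delete("A12")
print(list(f.in_order()))  # ['A02', 'B09', 'B77', 'C12', 'C50', 'D16']
print(f.overflow)          # ['B77']
```

The records are still read in order through the chain, but B77 is physically out of sequence; as overflow records accumulate, the file needs reorganizing.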

Clustering File Organization
- A simple file structure stores each relation in a separate file
- A clustering file organization can instead store several relations in one file
- E.g., a clustering organization of student and course records in one file:
  - useful where queries regularly involve both relations: good for queries involving students and their courses
  - bad for queries involving only student
  - results in variable-size records

Indexes - Basic Concepts
- Indexing mechanisms are used to speed up access to desired data
- Search key: an attribute or set of attributes used to look up records in a file
- An index file consists of records (called index entries) of the form: search-key | pointer
- Index files are typically much smaller than the original file
- In ordered indices, search keys are stored in sorted order
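A minimal Python sketch of an ordered index: the index entries are (search-key, pointer) pairs kept in sorted order, so a lookup can use binary search instead of reading the whole file. The pointers here are hypothetical (page, slot) pairs:

```python
from bisect import bisect_left

# Index entries (search-key, pointer), kept sorted by search key.
index = sorted([
    ("Hanif", (0, 0)), ("Ines", (0, 1)), ("Zeke", (0, 2)),
    ("Anne", (1, 0)), ("Maria", (1, 1)),
])
keys = [k for k, _ in index]


def lookup(key):
    """Binary search over the sorted search keys; return the pointer or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None


print(lookup("Maria"))  # (1, 1)
print(lookup("Bob"))    # None
```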

Index Types
SQL Server provides three types of indexes:
- Clustered: one clustered index per table; data is maintained in clustered index order
- Nonclustered: up to 999 nonclustered indexes per table (249 in SQL Server 2005 and earlier); nonclustered indexes maintain pointers to rows
- Full text: beyond the scope of this module

B+-Tree Index Files
- B+-tree indices are an alternative to indexed-sequential files
- Disadvantage of indexed-sequential files: performance degrades as the file grows, since many overflow blocks get created; periodic reorganization of the entire file is required
- Advantage of B+-tree index files: the index automatically reorganizes itself with small, local changes in the face of insertions and deletions; reorganization of the entire file is not required to maintain performance
- Disadvantages of B+-trees: extra insertion and deletion overhead, and space overhead
- The advantages of B+-trees outweigh the disadvantages, and they are used extensively

Physical Table Organization
Alternative structure: the clustered index
- A doubly linked list of pages
- Key values determine the order of data rows through the list of pages
- A B-Tree index is constructed to reference the data pages
- The database administrator decides on the clustered index key
  - It's often the primary key, but it doesn't need to be
  - It doesn't even need to be unique (the index manager will internally add a "uniquifier")

Clustered Index Properties
- Remember that a "clustered index" is a table structure, so there can be only one such index per table
- The clustered index provides:
  - direct access by key value
  - direct access by partial key value (the leftmost part)
  - sequential access in key sequence
- A new row will be inserted into its target page; if it doesn't fit, page splitting will occur

Clustered Index Mechanism
- With a clustered index, there will be one entry on the last intermediate index level page for each data page
- The data pages are the leaf (bottom) level of the index
(Assume a clustered index on last name)

B-Trees for clustered indexes
[Figure: B-tree for a clustered index]
Root page: A02 | D34 | …
Index pages: A02 | A12 | B09 | D16 | D34 | …
Data pages:
  A02 Hanif 4, A06 Ines 2, A11 Zeke 2
  A12 Anne 1, A20 Maria 4, B01 Ali, B04 Betty 1
  B09 Kalen 2, B10 Alexi 3, B70 Olga 2, C12 Pat 3
  D16 Mo 1, D17 Ali, D22 Bert 4
  D34 Amy 2, E11 Art 2, E23 Kim 4, E54 Tay 4
  …
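A Python sketch of a two-level search over the clustered index example, assuming the data pages are grouped as in the figure (the page boundaries follow the index entries A02, A12, B09, D16 and D34). Only the key and name columns are kept, and the helper names are hypothetical:

```python
from bisect import bisect_right

# Leaf (data) level: rows stored in key order, one list per data page.
data_pages = [
    [("A02", "Hanif"), ("A06", "Ines"), ("A11", "Zeke")],
    [("A12", "Anne"), ("A20", "Maria"), ("B01", "Ali"), ("B04", "Betty")],
    [("B09", "Kalen"), ("B10", "Alexi"), ("B70", "Olga"), ("C12", "Pat")],
    [("D16", "Mo"), ("D17", "Ali"), ("D22", "Bert")],
    [("D34", "Amy"), ("E11", "Art"), ("E23", "Kim"), ("E54", "Tay")],
]
# Index level: one entry (lowest key, data page number) per data page.
index_pages = [
    (["A02", "A12", "B09", "D16"], [0, 1, 2, 3]),
    (["D34"], [4]),
]
# Root: one entry (lowest key, index page number) per index page.
root = (["A02", "D34"], [0, 1])


def child_for(keys, children, key):
    """Follow the rightmost entry whose key is <= the search key."""
    i = max(bisect_right(keys, key) - 1, 0)
    return children[i]


def find(key):
    index_keys, index_children = index_pages[child_for(*root, key)]
    page = data_pages[child_for(index_keys, index_children, key)]
    return next((row for row in page if row[0] == key), None)


print(find("B70"))  # ('B70', 'Olga')
print(find("E23"))  # ('E23', 'Kim')
print(find("C00"))  # None
```

Any row is reached in three page visits (root, index page, data page), regardless of where it sits in the table.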

Page Splitting (before)
[Figure: the index and the target page before the split]
Root page (partial): D34 | …
Index page: A02 | A12 | B09 | D16
New row to be inserted: B77 Sue 1
The target page: B09 Kalen 2, B10 Alexi 3, B70 Olga 2, C12 Pat 3

Page Splitting (after)
[Figure: the index and the data pages after the split]
Root page (partial): B09 | D34 | …
Index entries: A02 | A12 | B09 | B70 | D16
Note that the index has split too!
The data has been distributed between the original page and a new page:
  Original page: B09 Kalen 2, B10 Alexi 3
  New page: B70 Olga 2, B77 Sue 1, C12 Pat 3
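A Python sketch that reproduces this data page split, assuming a data page holds at most four rows (`PAGE_CAPACITY` is hypothetical) and that a split moves the upper half of the rows to the new page. It models only the data pages and the index entries that point to them; the split of the index page itself is not modelled:

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical maximum number of rows per data page


def insert_with_split(index_keys, pages, row):
    """Insert row into its target page, splitting the page if it overflows.

    index_keys[i] is the lowest key on pages[i]: one index entry per data page.
    """
    i = max(bisect_right(index_keys, row[0]) - 1, 0)  # locate the target page
    page = pages[i]
    insort(page, row)                # rows on a page stay in key order
    index_keys[i] = page[0][0]       # keep the entry right if row is the new lowest key
    if len(page) > PAGE_CAPACITY:    # it didn't fit: split the page
        mid = len(page) // 2
        new_page = page[mid:]        # upper half moves to a new page
        del page[mid:]               # original page keeps the lower half
        pages.insert(i + 1, new_page)
        index_keys.insert(i + 1, new_page[0][0])  # the index gains an entry


index_keys = ["A02", "A12", "B09", "D16"]
pages = [
    [("A02", "Hanif")],  # other pages abbreviated
    [("A12", "Anne")],
    [("B09", "Kalen"), ("B10", "Alexi"), ("B70", "Olga"), ("C12", "Pat")],
    [("D16", "Mo")],
]
insert_with_split(index_keys, pages, ("B77", "Sue"))
print(index_keys)  # ['A02', 'A12', 'B09', 'B70', 'D16']
print(pages[2])    # [('B09', 'Kalen'), ('B10', 'Alexi')]
print(pages[3])    # [('B70', 'Olga'), ('B77', 'Sue'), ('C12', 'Pat')]
```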

Non-clustered Indexes
- Unlike the clustered index, which determines the table structure, non-clustered indexes have no effect on the structure of the data table
- Non-clustered indexes are B-Trees constructed from two types of page:
  - index pages
  - leaf pages
- Each leaf page holds a number of entries, each containing:
  - a key value for a data row
  - a bookmark

Nonclustered Index Mechanism
- The nonclustered index has an extra leaf level for page/row pointers
- Data placement is not affected by non-clustered indexes
(Assume a non-clustered index on first name)

B-Trees for non-clustered indexes
[Figure: B-tree for a non-clustered index on first name]
Root page: Alexi | Tay | …
Index pages: Alexi | Anne | Hanif | Maria | Tay | …
Leaf pages (each entry also holds a bookmark to its data row):
  Alexi, Ali, Ali, Amy
  Anne, Art, Bert, Betty
  Hanif, Ines, Kalen, Kim
  Maria, Mo, Olga, Pat
  Tay, Zeke
Data pages: A02 Hanif 4, A06 Ines 2, A11 Zeke 2, A12 Anne 1, A20 Maria 4, B01 Ali, B04 Betty 1, B09 Kalen 2, B10 Alexi 3, B70 Olga 2, C12 Pat 3, D16 Mo 1, D17 Ali, D22 Bert 4, D34 Amy 2, E11 Art 2, E23 Kim 4, E54 Tay 4, …

How does the non-clustered index work?
Given a search task that involves a key value, or a partial key value:
1. The database engine follows the index structure to the appropriate leaf page.
2. The bookmark in the leaf page entry is used to locate the data record.

What is a bookmark?
A bookmark is a means of locating a data record. There are two forms of bookmark:
- RID (row identifier) bookmarks, used when the table is a heap, made up of:
  - file number
  - page number
  - row (slot) number
- Clustered index key values, used when the table has a clustered index (note that where this is the case, a search that uses the non-clustered index will also use the clustered index!)
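A Python sketch of both bookmark forms, using four rows from the example. The heap page layout, the dictionary that stands in for the clustered index, and all names are hypothetical; the point is that a RID leads straight to a row, while a clustered-key bookmark needs a second lookup through the clustered index:

```python
from bisect import bisect_left

# The same rows stored two ways (hypothetical layout).
heap_pages = {(1, 0): [("A02", "Hanif"), ("A12", "Anne")],  # (file, page) -> slots
              (1, 1): [("B09", "Kalen"), ("B10", "Alexi")]}
clustered = {"A02": ("A02", "Hanif"), "A12": ("A12", "Anne"),   # stands in for
             "B09": ("B09", "Kalen"), "B10": ("B10", "Alexi")}  # the clustered B-tree

# Leaf level of a non-clustered index on first name, sorted by key.
# On a heap the bookmark is a RID (file, page, slot) ...
nci_on_heap = sorted([("Hanif", (1, 0, 0)), ("Anne", (1, 0, 1)),
                      ("Kalen", (1, 1, 0)), ("Alexi", (1, 1, 1))])
# ... on a table with a clustered index it is the clustered index key.
nci_on_clustered = sorted([("Hanif", "A02"), ("Anne", "A12"),
                           ("Kalen", "B09"), ("Alexi", "B10")])


def leaf_entries(leaf, name):
    """Seek to the first matching leaf entry, then read forward while the key matches."""
    i = bisect_left(leaf, (name,))
    while i < len(leaf) and leaf[i][0] == name:
        yield leaf[i][1]
        i += 1


def lookup_via_rid(name):
    return [heap_pages[(f, p)][s] for f, p, s in leaf_entries(nci_on_heap, name)]


def lookup_via_clustered_key(name):
    # The bookmark is a key, so each match also needs a clustered index lookup.
    return [clustered[key] for key in leaf_entries(nci_on_clustered, name)]


print(lookup_via_rid("Kalen"))            # [('B09', 'Kalen')]
print(lookup_via_clustered_key("Kalen"))  # [('B09', 'Kalen')]
```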

Choosing indexes
- Remember you can have one clustered index and many non-clustered indexes
- The clustered index provides the most efficient direct access
- Choose the appropriate key for frequent queries
- Every index has the potential to:
  - speed up retrieval
  - slow down inserts, updates and deletes
- So you need to weigh the costs and benefits

Creating indexes in SQL
CREATE UNIQUE CLUSTERED INDEX empNoIndex  -- UNIQUE: no duplicate key values allowed; empNoIndex: the name of the index
    ON Employees (empNo);                 -- empNo: the key

CREATE NONCLUSTERED INDEX empNameIndex
    ON Employees (empName)
    WITH (FILLFACTOR = 70);               -- make leaf pages 70% full to allow for growth

DROP INDEX myOldIndex ON Employees;
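Fill factor sets the percentage of each leaf-level page to fill when the index is built or rebuilt. A rough Python estimate of its effect, under a simplified model that ignores per-row overhead and counts only the 96-byte header and the 2-byte row offset slot; `rows_per_leaf_page` is a hypothetical helper, not a SQL Server function:

```python
PAGE_SIZE = 8192
HEADER_SIZE = 96
SLOT_SIZE = 2  # row offset array entry per row


def rows_per_leaf_page(row_bytes: int, fillfactor: int) -> int:
    """Approximate rows placed on a leaf page when the index is built with FILLFACTOR."""
    if not 0 <= fillfactor <= 100:
        raise ValueError("fillfactor must be between 0 and 100")
    if fillfactor == 0:
        fillfactor = 100  # SQL Server treats 0 and 100 the same way
    usable = (PAGE_SIZE - HEADER_SIZE) * fillfactor // 100
    return usable // (row_bytes + SLOT_SIZE)


print(rows_per_leaf_page(100, 100))  # 79
print(rows_per_leaf_page(100, 70))   # 55
```

With FILLFACTOR = 70, roughly 30% of each leaf page starts out free, so later inserts can land in existing pages before a page split is needed.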

The Buffer Pool
- The DBMS maintains a collection of pages in main memory, known as the buffer pool
- Pages are retrieved from the buffer pool in preference to disk, because it’s quicker to access RAM than disk
- In fact, you can only access a data page if it exists in memory
- As the database is used, the buffer pool fills with data pages

The Buffer Pool
[Figure: the DBMS, the database on disk and the buffer pool; the first request for a page reads it from disk into the buffer pool, and the second request is served from the buffer pool]
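A minimal Python model of the two requests in the figure: the first request for a page causes a disk read into the pool, and the second is served from memory. The class name and the dictionary standing in for the data files are hypothetical, and no eviction is modelled:

```python
class BufferPool:
    """Read-through page cache: a page must be in memory before it can be accessed."""

    def __init__(self, disk):
        self.disk = disk    # stand-in for the data files: page_id -> bytes
        self.frames = {}    # page_id -> page contents held in memory
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id not in self.frames:   # miss: read from disk into the pool
            self.frames[page_id] = self.disk[page_id]
            self.disk_reads += 1
        return self.frames[page_id]      # served from memory


disk = {(1, 0): b"page 0", (1, 1): b"page 1"}
pool = BufferPool(disk)
pool.get_page((1, 0))   # first request: physical read from disk
pool.get_page((1, 0))   # second request: found in the buffer pool
print(pool.disk_reads)  # 1
```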

Buffer Pool Management
- Buffer pool pages must be available for pages to be read into
- The “Lazywriter” process ensures that there is free space:
  - The Lazywriter sweeps the buffer pool at regular intervals
  - It checks whether each page has been referenced recently
  - If it hasn't, it makes the buffer slot available on the free list (if the page is dirty, it writes it back to disk first)
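A simplified Python sketch of Lazywriter sweeps, assuming a single "recently referenced" flag per buffer (a clock-style approximation, not SQL Server's actual bookkeeping) and hypothetical names throughout:

```python
class Frame:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.referenced = True  # set whenever the page is accessed
        self.dirty = False      # set when the page is modified in memory


def lazywriter_sweep(frames, free_list, disk):
    """One pass over the buffer pool using a single 'recently referenced' flag."""
    for slot, frame in enumerate(frames):
        if frame is None:
            continue
        if frame.referenced:
            frame.referenced = False  # referenced recently: keep it, clear the flag
        else:
            if frame.dirty:
                disk[frame.page_id] = frame.data  # write a dirty page back first
            frames[slot] = None
            free_list.append(slot)    # the buffer slot is free for new reads


disk = {}
frames = [Frame("p1", b"a"), Frame("p2", b"b")]
frames[1].dirty = True
free_list = []
lazywriter_sweep(frames, free_list, disk)  # both recently referenced: flags cleared
frames[0].referenced = True                # p1 is used again before the next sweep
lazywriter_sweep(frames, free_list, disk)  # p2 not referenced: written back and freed
print(free_list, disk)  # [1] {'p2': b'b'}
```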

Optimizer Selection Criteria
During the index selection phase of optimization, the optimizer decides which (if any) indexes best resolve the query:
1. Identify which indexes match the clauses
2. Estimate the rows to be returned
3. Estimate the page reads
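A deliberately crude Python cost model showing why the row and page-read estimates matter. It assumes a table scan reads every page, and that a non-clustered index seek reads one page per index level plus one page per matching row for the bookmark lookup; the numbers and function names are hypothetical, and a real optimizer's costing is far more detailed:

```python
def estimate_page_reads(table_pages, table_rows, selectivity, index_depth):
    """Estimated page reads per access path under a crude, hypothetical cost model."""
    est_rows = table_rows * selectivity  # estimated rows to be returned
    return {
        "table scan": table_pages,                       # read every data page
        "index seek + lookups": index_depth + est_rows,  # ~1 page per bookmark lookup
    }


def choose_plan(**stats):
    costs = estimate_page_reads(**stats)
    return min(costs, key=costs.get)


print(choose_plan(table_pages=1000, table_rows=50000, selectivity=0.0001, index_depth=3))
# index seek + lookups  (about 8 reads versus 1000)
print(choose_plan(table_pages=1000, table_rows=50000, selectivity=0.1, index_depth=3))
# table scan  (about 5003 reads versus 1000)
```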

Composite Indexes
Composite (compound) indexes may be selected by the server if the first column of the index is specified in a WHERE clause, or if it is a clustered index:
create index idx1 on employee (minit, job_id, job_lvl)

Composite Indexes (Cont’d)
create index idx1 on employee (minit, job_id, job_lvl)
Which queries may use the index?
select * from employee where minit = 'A' and job_id != 4 and job_lvl = 135
select * from employee where job_id != 4
select * from employee where minit = 'A'
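A Python sketch of the leftmost-prefix rule applied to the three queries above. It treats only equality and simple range predicates as seekable and treats `!=` as not seekable, which is a simplification (an optimizer may rewrite `!=` as two ranges); the names are hypothetical:

```python
INDEX_COLUMNS = ("minit", "job_id", "job_lvl")  # idx1 on employee


def seek_columns(predicates):
    """Return the leading index columns usable for an index seek.

    predicates maps column -> operator, e.g. {"minit": "=", "job_id": "!="}.
    Walk the key from the left: '=' extends the seek prefix, a range operator
    ends it after that column, and anything else (or a missing column) stops it.
    """
    used = []
    for col in INDEX_COLUMNS:
        op = predicates.get(col)
        if op == "=":
            used.append(col)
        elif op in ("<", "<=", ">", ">="):
            used.append(col)
            break
        else:
            break
    return used


print(seek_columns({"minit": "=", "job_id": "!=", "job_lvl": "="}))  # ['minit']
print(seek_columns({"job_id": "!="}))                                # []
print(seek_columns({"minit": "="}))                                  # ['minit']
```

Under these assumptions the first and third queries can seek on idx1 using minit, while the second does not filter on the leading column, so it cannot use the index for a seek.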

Summary
- Every DBMS is different at a physical level
- You need to understand the physical structures to understand how queries work and to improve database performance
