CS 540 Database Management Systems

Name: CS 540 Database Management Systems
Uploaded: 2017-12-05T22:34:04+00:00
Duration: PTM27S24
Channel: Marjorie Hart
Description: CS 540 Database Management Systems

CS 540 Database Management Systems
Lecture 5: DBMS Architecture, storage, and access methods

Database System Implementation
User requirements → Conceptual design (Entity Relationship (ER) model) → Schema (Relational model) → Physical storage (Files and indexes)

The advantage of RDBMS
It separates the logical level (schema) from the physical level (implementation). Physical data independence: users do not worry about how their data is stored and processed on the physical devices. It is all SQL! Their queries work over (almost) all RDBMS deployments.

Challenges in physical level
Processor: – MIPS
Main memory: around 10 Gb/sec.
Secondary storage: higher capacity and durability.
Disk random access: seek time + rotational latency + transfer time.
Seek time: 4 ms ms!
Rotational latency: 2 ms – 7 ms!
Transfer time: at most 1000 Mb/sec.
Read and write in blocks.

Gloomy future: Moore’s law
Processor speed and maximum storage capacity increase exponentially over time, and the cost per unit of storage falls. But storage (main and secondary) access time improves much more slowly.

Random access versus sequential access
Disk random access: seek time + rotational latency + transfer time.
Disk sequential access: reading blocks next to each other. There is no seek time or rotational latency between consecutive blocks, so it is much faster than random access.
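A rough back-of-the-envelope sketch of the gap between the two access patterns. The seek and rotation figures are the lower bounds quoted earlier; the block size, the reading of "1000 Mb/sec" as megabits, and all function names are assumptions made for illustration only.

```python
# Assumed figures: lower bounds quoted in the lecture, plus a hypothetical block size.
SEEK_MS = 4.0             # seek time per random access
ROTATION_MS = 2.0         # rotational latency per random access
TRANSFER_MB_PER_S = 125   # 1000 megabits/sec is about 125 megabytes/sec (assumption)
BLOCK_KB = 8              # hypothetical block size


def transfer_ms(num_blocks: int) -> float:
    """Time to move the bytes themselves, ignoring head positioning."""
    total_bytes = num_blocks * BLOCK_KB * 1024
    return total_bytes / (TRANSFER_MB_PER_S * 1_000_000) * 1000


def random_read_ms(num_blocks: int) -> float:
    """Every block pays seek time and rotational latency."""
    return num_blocks * (SEEK_MS + ROTATION_MS) + transfer_ms(num_blocks)


def sequential_read_ms(num_blocks: int) -> float:
    """Only the first block pays seek time and rotational latency."""
    return SEEK_MS + ROTATION_MS + transfer_ms(num_blocks)


if __name__ == "__main__":
    n = 1000
    print(f"random:     {random_read_ms(n):.1f} ms")
    print(f"sequential: {sequential_read_ms(n):.1f} ms")
```

Under these assumptions the positioning cost dominates random reads, which is why later slides care so much about keeping related blocks contiguous.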

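DBMS Architecture
Clients: User / Web Forms / Applications / DBA, issuing queries and transactions.
Process manager.
Query processing: Query Parser, Query Rewriter, Query Optimizer, Query Executor.
Transaction processing: Transaction Manager, Logging & Recovery, Lock Manager.
Storage side: Files & Access Methods, Buffer Manager, Storage Manager.
Main Memory holds the Buffers and Lock Tables; the Storage Manager sits on top of Storage.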

A Design Dilemma To what extent should we reuse OS services?
Reuse as much as we can: performance problem (inefficient); lack of control (incorrect crash recovery).
Replicating some OS functions (“mini OS”): have its own buffer pool; directly manage record structures with files; …

OS vs. DBMS Similarities?
What do they manage? What do they provide?

OS vs. DBMS: Similarities
Purpose of an OS: managing hardware and presenting an interface abstraction to applications.
Is a DBMS in some sense an OS? A DBMS manages data.
Both serve as an API for application development!

OS vs. DBMS: Related Concepts
Process management → What DB concepts? Process synchronization, deadlock handling.
Storage management → What DB concepts? Virtual memory, file system.

OS vs. DBMS: Differences?

OS vs. DBMS: Differences
DBMS: top-down, to encapsulate high-level semantics!
Data: data with particular logical structures.
Queries: query language with well-defined operations.
Transactions: transactions with ACID properties.
OS: bottom-up, to present low-level hardware.

Problems with DBMS on top of OS
Buffer pool management, file system, process management, consistency control, paged virtual memory.

Buffer Pool Management
Performance of system calls.
LRU replacement: query-aware replacement is needed for performance. Example: circular access 1, 2, …, n, 1, 2, …
Prefetching: the DBMS knows exactly which block is to be fetched next.
Crash recovery: needs “selected force out”.
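The circular access pattern can be simulated in a few lines. This is a minimal sketch, not the lecture's algorithm: it compares LRU with MRU (most recently used) eviction, where MRU is just one illustrative policy that a query-aware buffer manager might pick for a repeated scan. The function name and the parameters `n`, `capacity`, and `passes` are hypothetical.

```python
from collections import OrderedDict


def count_hits(accesses, capacity, policy):
    """Simulate a buffer pool and count hits; policy is 'LRU' or 'MRU'."""
    pool = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in accesses:
        if block in pool:
            hits += 1
            pool.move_to_end(block)  # mark as most recently used
            continue
        if len(pool) >= capacity:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            pool.popitem(last=(policy == "MRU"))
        pool[block] = True
    return hits


# Circular access 1, 2, ..., n repeated, with a pool one block too small.
n, capacity, passes = 5, 4, 3
accesses = [b for _ in range(passes) for b in range(1, n + 1)]
lru_hits = count_hits(accesses, capacity, "LRU")
mru_hits = count_hits(accesses, capacity, "MRU")

if __name__ == "__main__":
    print("LRU hits:", lru_hits)
    print("MRU hits:", mru_hits)
```

With LRU, each block is evicted just before it is needed again, so a scan that is only slightly larger than the pool gets no hits at all. A general-purpose OS cache cannot know the access pattern is circular, whereas the DBMS can.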

Relations vs. File system
Data object abstraction. File: array of characters. Relation: set of tuples.
Physical contiguity: large DB files want clustering of blocks. Sol1: the DBMS manages raw disks. Sol2: simulate contiguity by managing free space in the DBMS; the DBMS asks the OS for larger-than-needed-now chunks and manages the space within them itself.
Multiple trees (access methods). File access: directory hierarchy (user access method). Block access: inodes. Tuple access: DBMS indexes.

Process management
Reuse OS process management: one process for each user.
Problem: DB processes are large, so it takes a long time to switch between processes.
Problem: critical sections. Processes may have to wait for a descheduled process that holds locks.
Alternative: n server processes that handle users’ requests. This duplicates OS multi-tasking inside the servers! Communication between processes: message passing is not efficient.
Solutions: the OS implements favored processes that are not forced out and relinquish control voluntarily; faster message passing methods.

Consistency control
The OS provides some support for locking and recovery.
Locking: the OS provides locks on files; the DB requires locks on smaller units such as tuples.
Commit point: the buffer manager ensures all changes are flushed to disk, so it must know the internals of transactions.

State of the art
DBMSs duplicate some OS functionalities.
OS customized for DBMS.

Access methods
The methods that an RDBMS uses to retrieve the data.
Attribute value(s) → Tuple(s)

Types of search queries
Point query over Product(name, price): Select * From Product Where name = 'IPad-Pro';
Range query over Product(name, price): Select * From Product Where price > 2 AND price < 10;

Types of access methods
Full table scan: inefficient for both point and range queries.
Sequential access over a sorted file: efficient for both point and range queries, but the file must be kept sorted, which is inefficient to maintain.
Middle ground?
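To make the trade-off concrete, here is a small sketch that answers the range query both ways: once by scanning every row, and once by binary search over a copy of the table kept sorted on price. The rows and function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Product(name, price) rows, stored in arbitrary order.
products = [("IPad-Pro", 799), ("Cable", 5), ("Pen", 3), ("Case", 9), ("Mug", 12)]


def scan_point(rows, name):
    """Full table scan for a point query: touches every row."""
    return [r for r in rows if r[0] == name]


def scan_range(rows, low, high):
    """Full table scan for price > low AND price < high: touches every row."""
    return [r for r in rows if low < r[1] < high]


# "Sorted file": the same rows kept ordered by price.
by_price = sorted(products, key=lambda r: r[1])
prices = [r[1] for r in by_price]


def sorted_range(low, high):
    """Binary search for both ends of the range, then read one contiguous run."""
    start = bisect_right(prices, low)  # first position with price > low
    end = bisect_left(prices, high)    # first position with price >= high
    return by_price[start:end]


if __name__ == "__main__":
    print(scan_range(products, 2, 10))
    print(sorted_range(2, 10))
```

The sorted version reads only the matching run, but every insert or price update would have to keep `by_price` in order, which is the maintenance cost the slide refers to.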

Indexing An old idea

Index: a data structure that speeds up selecting tuples in a relation based on some search key.
Search key: a subset of the attributes in a relation. It may not be the same as the (primary) key.
Entries in an index: (k, r), where k is the search key value and r is the pointer to a record (record id).

Index
The data file stores the table data. The index file stores the index data structure. The index file is smaller than the data file. Ideally, the index should fit in main memory.
[Figure: entries of an index file pointing to records in the data file]
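A minimal sketch of index entries (k, r) over a data file. It assumes a record id is a (block number, slot) pair and that the index is a sorted in-memory list; both choices, and all names, are illustrative rather than taken from the lecture.

```python
from bisect import bisect_left

# Data file: a list of blocks, each holding (key, payload) records in arbitrary order.
data_file = [
    [(30, "c"), (10, "a")],
    [(20, "b"), (40, "d")],
]

# Index file: sorted entries (k, r), where r = (block number, slot in block).
index = sorted(
    (key, (block_no, slot))
    for block_no, block in enumerate(data_file)
    for slot, (key, _payload) in enumerate(block)
)
keys = [k for k, _rid in index]


def lookup(k):
    """Binary-search the index, then fetch exactly one record from the data file."""
    i = bisect_left(keys, k)
    if i == len(keys) or keys[i] != k:
        return None
    block_no, slot = index[i][1]
    return data_file[block_no][slot]


if __name__ == "__main__":
    print(lookup(20))
    print(lookup(25))
```

The index stores only keys and record ids, not payloads, which is why it is smaller than the data file and more likely to fit in main memory.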

Well known index structures
B+ trees: very popular.
Hash tables: not frequently used.

B+ trees
The index of a very large data file gets too large. How about building an index for the index file? A multi-level index, or a tree.

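B+ trees
Degree of the tree: d. Each node (except the root) stores between d and 2d keys.
Non-leaf node example: the keys 10 32 94 divide the key space [A, B) into the child ranges [A, 10), [10, 32), [32, 94), [94, B).
Leaf node example: [12 28] under [10, 32) and [32 39 41 65] under [32, 94). Each leaf key points to its record (12, 28, 32, and so on).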
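A quick calculation shows why such a tree stays shallow. This sketch assumes every node is completely full (2d keys per node, hence 2d + 1 children per non-leaf node); the function names are hypothetical.

```python
def max_keys(d: int, internal_levels: int) -> int:
    """Most leaf keys a B+ tree of degree d can hold with the given number
    of non-leaf levels above the leaves, assuming every node is full."""
    leaves = (2 * d + 1) ** internal_levels
    return leaves * 2 * d


def levels_needed(d: int, num_keys: int) -> int:
    """Smallest number of non-leaf levels whose full tree can hold num_keys."""
    levels = 0
    while max_keys(d, levels) < num_keys:
        levels += 1
    return levels


if __name__ == "__main__":
    print(max_keys(2, 2))
    print(max_keys(100, 2))
    print(levels_needed(100, 1_000_000))
```

With a realistic degree such as d = 100, two non-leaf levels already cover millions of keys, so a lookup touches only a handful of nodes.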

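Example (d = 2)
Root: 60
Internal nodes: [19 50] and [80 90 110]
Leaves under [19 50]: [12 13 17] [19 21 30 40] [50 52]
Leaves under [80 90 110]: [60 65 72] …
Each leaf key points to the record with the same key.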

Retrieving tuples using B+ tree
Point queries: start from the root and follow the links to the leaf.
Range queries: find the lowest point in the range, then follow the links between the leaf nodes.
The top levels are kept in the buffer pool.
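Both kinds of lookup can be sketched over a small hand-built tree. The tree below is only the left subtree of the d = 2 example (node [19 50] and its three leaves), used here as a root for brevity; the class and function names are illustrative.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # link to the next leaf, used by range queries


@dataclass
class Internal:
    keys: List[int]
    children: list  # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend from the root; child i covers keys in [keys[i-1], keys[i])."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def point_query(root, key):
    return key in find_leaf(root, key).keys


def range_query(root, low, high):
    """Return all keys k with low <= k <= high, in sorted order."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next  # follow the leaf-level link instead of going back up
    return result


# Left subtree of the example: [19 50] over three linked leaves.
leaf3 = Leaf([50, 52])
leaf2 = Leaf([19, 21, 30, 40], leaf3)
leaf1 = Leaf([12, 13, 17], leaf2)
root = Internal([19, 50], [leaf1, leaf2, leaf3])

if __name__ == "__main__":
    print(point_query(root, 30))
    print(range_query(root, 17, 50))
```

A range query descends the tree once and then walks sideways through the leaves, which is why the leaf links matter.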

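Inserting a new key
Pick the proper leaf node and insert the key. If the node then contains more than 2d keys, split the node and insert the extra node in the parent.
Example: a node with keys K1 K2 K3 K4 K5 and pointers R0 R1 R2 R3 R4 R5 splits into a left node (K1 K2 with R0 R1 R2) and a right node (K4 K5 with R3 R4 R5), and (K3, pointer to the right node) goes to the parent. If the split happens at the leaf level, K3 is also added to the right node.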
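The split rule can be written as two small helpers, one for leaves (the middle key is copied up and stays in the right node) and one for non-leaf nodes (the middle key moves up). This is a sketch over plain lists; the function names are hypothetical.

```python
def split_leaf(keys):
    """Split an overfull leaf holding 2d + 1 sorted keys.

    Returns (left_keys, right_keys, separator). The separator is copied
    to the parent and also remains the first key of the right leaf.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid:], keys[mid]


def split_internal(keys, children):
    """Split an overfull non-leaf node holding 2d + 1 keys and 2d + 2 children.

    Returns ((left_keys, left_children), (right_keys, right_children), separator).
    The separator moves to the parent and is kept in neither half.
    """
    mid = len(keys) // 2
    left = (keys[:mid], children[: mid + 1])
    right = (keys[mid + 1 :], children[mid + 1 :])
    return left, right, keys[mid]


if __name__ == "__main__":
    print(split_leaf(["K1", "K2", "K3", "K4", "K5"]))
    print(split_internal(["K1", "K2", "K3", "K4", "K5"],
                         ["R0", "R1", "R2", "R3", "R4", "R5"]))
```

Keeping the separator in the right leaf is what guarantees that every key still appears at the leaf level, where the record pointers live.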

Example
Insert K = 18 into the example tree (root 60; internal nodes [19 50] and [80 90 110]; leaves [12 13 17] [19 21 30 40] [50 52] [60 65 72] …).

Insertion
Insert K = 18: it goes into the first leaf, which becomes [12 13 17 18]. No split is needed.

Insertion
Insert K = 20: it goes into the second leaf, which becomes [19 20 21 30 40].

Insertion
Need to split the node: the leaf [19 20 21 30 40] holds more than 2d = 4 keys.

Insertion
Split and update the parent node: the leaf splits into [19 20] and [21 30 40], and the parent becomes [19 21 50]. The leaves are now [12 13 17 18] [19 20] [21 30 40] [50 52] [60 65 72] …
What if we need to split the root?
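The two insertions can be replayed in code. This sketch models only the left subtree of the example as one parent key list plus a list of leaves, and it does not split the parent when it overflows; the representation and names are assumptions for illustration.

```python
from bisect import bisect_right, insort

D = 2  # degree of the tree: each leaf holds between D and 2*D keys


def insert(parent_keys, leaves, key):
    """Insert key into a two-level tree (one parent above the leaves), in place."""
    i = bisect_right(parent_keys, key)  # leaf i covers this key
    leaf = leaves[i]
    insort(leaf, key)
    if len(leaf) > 2 * D:
        # Overflow: split the leaf and copy the first key of the right half up.
        mid = len(leaf) // 2
        left, right = leaf[:mid], leaf[mid:]
        leaves[i : i + 1] = [left, right]
        parent_keys.insert(i, right[0])
    # A full implementation would now split the parent if it exceeds 2*D keys,
    # repeating up to the root (a root split adds a new level).


parent_keys = [19, 50]
leaves = [[12, 13, 17], [19, 21, 30, 40], [50, 52]]
insert(parent_keys, leaves, 18)
insert(parent_keys, leaves, 20)

if __name__ == "__main__":
    print(parent_keys)
    print(leaves)
```

Inserting 18 fits in the first leaf, while inserting 20 overflows the second leaf and pushes 21 into the parent, matching the slides.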

Deletion
Delete K = 21 from the tree after the insertions (root 60; internal nodes [19 21 50] and [80 90 110]; leaves [12 13 17 18] [19 20] [21 30 40] [50 52] [60 65 72] …).

Deletion
After deleting 21, the leaf is [30 40]. Note: K = 21 may still remain in the internal levels; the parent is still [19 21 50].

Deletion
Delete K = 20: the leaf [19 20] becomes [19], which has fewer than d = 2 keys.

Deletion
We need to update the number of keys on the node. Borrow from siblings (rotate): move 18 from the left sibling, giving the leaves [12 13 17] and [18 19], and update the parent key from 19 to 18, giving [18 21 50].

Deletion
What if we cannot borrow from siblings? Example: delete K = 30. The leaf [30 40] becomes [40], and both siblings, [18 19] and [50 52], hold only d keys.

Deletion
Merge with a sibling: [18 19] and [40] become [18 19 40].

Deletion
What to do with the dangling key and pointer? Simply remove them: key 21 and the pointer to the emptied leaf are dropped from the parent.

Deletion
Final tree: root 60; internal nodes [18 50] and [80 90 110]; leaves [12 13 17] [18 19 40] [50 52] [60 65 72] …
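The three deletions can be replayed with a small leaf-level sketch. As with insertion, it models only one parent node above its leaves and does not handle underflow of the parent itself; the representation and names are assumptions for illustration.

```python
from bisect import bisect_right

D = 2  # degree of the tree: each leaf holds between D and 2*D keys


def delete(parent_keys, leaves, key):
    """Delete key from a two-level tree (one parent above the leaves), in place.

    Raises ValueError if the key is not present in its leaf.
    """
    i = bisect_right(parent_keys, key)
    leaf = leaves[i]
    leaf.remove(key)
    if len(leaf) >= D:
        return  # no underflow; the parent may still hold the deleted value
    if i > 0 and len(leaves[i - 1]) > D:
        # Borrow (rotate) the largest key of the left sibling.
        leaf.insert(0, leaves[i - 1].pop())
        parent_keys[i - 1] = leaf[0]
    elif i + 1 < len(leaves) and len(leaves[i + 1]) > D:
        # Borrow (rotate) the smallest key of the right sibling.
        leaf.append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i + 1][0]
    elif i > 0:
        # Merge into the left sibling; drop the dangling key and pointer.
        leaves[i - 1].extend(leaf)
        del leaves[i]
        del parent_keys[i - 1]
    else:
        # Leftmost leaf: merge the right sibling into it.
        leaf.extend(leaves[i + 1])
        del leaves[i + 1]
        del parent_keys[i]
    # A full implementation would now check the parent for underflow as well.


# State of the left subtree after the two insertions.
parent_keys = [19, 21, 50]
leaves = [[12, 13, 17, 18], [19, 20], [21, 30, 40], [50, 52]]
for k in (21, 20, 30):
    delete(parent_keys, leaves, k)

if __name__ == "__main__":
    print(parent_keys)
    print(leaves)
```

Deleting 21 leaves the separator in place, deleting 20 borrows 18 from the left sibling, and deleting 30 forces a merge that removes separator 21, ending in the final tree shown on the slide.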

What You Should Know
What are some major limitations of services provided by an OS in supporting a DBMS?
In response to such limitations, what does a DBMS do?
B+ tree indexing.
