5. Disk, Pages and Buffers: Why Not Store Everything in Main Memory?

Why Not Store Everything in Main Memory? Why use disks at all?
- Main memory costs too much: $1000 buys roughly 1 GB of RAM or 1 TB of disk today (maybe more!), i.e., about 1000 times as much disk as RAM per dollar.
- Main memory is volatile: we want data to be saved between program runs (data persistence).
- Main memory is smaller: system disks typically hold several orders of magnitude more data than RAM (TBs vs. GBs).
The typical storage hierarchy:
- Main memory (RAM) for data in current use.
- Disk for the main database (secondary storage).
- Tapes for archiving older versions of the data (tertiary storage).

Disks
Disks are the secondary storage device of choice. Their main advantage over tapes: random access vs. sequential access.
Data is stored and retrieved in units called disk blocks or pages. Unlike RAM, the time to retrieve a disk page varies depending on its location on disk.
(Caveat: in NUMA (non-uniform memory access) machines, e.g., NDSU CHPC's SGI Altix, retrieval time depends on location even in RAM, namely on which brick or quad holds the data.)
Therefore, the relative placement of pages on disk has a major impact on DBMS performance!

Components of a Disk
The platters spin on a central spindle; the arm assembly moves in and out to position a disk head on a desired track (tracks exist on both sides of each platter). The collection of tracks under the heads at any one time is a cylinder.
Only one head reads/writes at any one time.
The block size (the smallest unit of transfer) is a multiple of the sector size (which is fixed).
(Figure: spindle, platters, tracks, cylinder, sector, disk head, arm assembly, arm movement.)

Accessing a Disk Page
The delay to access (read/write) data on a disk block has three parts:
- seek time: moving the arm to position the head on the right track
- rotational delay: waiting for the block to rotate under the head
- transfer time: actually moving the data to/from the disk surface
Seek time and rotational delay dominate: seek times vary from about 1 to 20 ms, rotational delay from 0 to 10 ms, while the transfer rate is about 1 ms per 4KB page.
Key to lowering I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
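Those per-page delays make random and sequential I/O dramatically different in cost, which a back-of-the-envelope model makes concrete. A sketch in Python using the slide's illustrative timings (the function `io_time_ms` and its default values are invented for this example; real drives vary):

```python
def io_time_ms(n_pages, seek_ms=10.0, rotation_ms=5.0, transfer_ms=1.0,
               sequential=False):
    """Estimate the time to read n_pages 4KB pages.

    Random I/O pays seek + rotational delay for every page; a sequential
    read of well-placed pages pays them once and then just streams.
    """
    if sequential:
        return seek_ms + rotation_ms + n_pages * transfer_ms
    return n_pages * (seek_ms + rotation_ms + transfer_ms)

# 100 scattered pages vs. 100 contiguous pages:
assert io_time_ms(100) == 1600.0                  # 100 * (10 + 5 + 1) ms
assert io_time_ms(100, sequential=True) == 115.0  # 10 + 5 + 100 * 1 ms
```

The 14x gap is exactly why the next slide cares about the relative placement of pages on disk.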

Arranging Pages on Disk (Clustering)
The `next' block concept: blocks on the same track, followed by blocks on the same cylinder, followed by blocks on an adjacent cylinder.
Blocks in a file should be arranged sequentially on disk (by the above notion of `next') to minimize seek and rotational delay.
For a sequential scan, pre-fetching several pages at a time is a big win!

RAID (Redundant Array of Independent Disks)
A RAID disk array is an arrangement of several disks that gives the abstraction of a single, large disk.
RAID goals: increase performance (more concurrent read/write heads) and reliability (redundant copies of the data are kept).
RAID's two main techniques:
- Data striping: data is partitioned and "striped" across the disks; the size of a partition is called the striping unit. Partitions are distributed over several disks, allowing more read/write heads to operate in parallel.
- Redundancy: redundant information allows reconstruction of the data if one disk fails.

RAID Levels
Level 0: block striping but no redundancy (e.g., blocks 1, 2, 3, 4 on disks 1, 2, 3, 4 respectively). Faster reads (more read/write heads working in parallel).
Level 1: mirroring (two identical copies). Each disk has a mirror-image disk (check disk). Parallel reads, but a write involves both disks. Improved durability.
Level 0+1 (sometimes called level 10): block striping plus mirroring. Faster reads plus improved durability.
Level 2: simple bit-striping; not used these days.

RAID Levels 3, 4, 5
Level 3: bit-interleaved parity. Striping unit = 1 bit (e.g., bits 1, 2, 3, 4 on disks 1, 2, 3, 4 respectively) plus one check (parity) disk. Each read/write request involves all disks, so the array can process only one request at a time (but very rapidly).
Level 4: block-interleaved parity. Striping unit = 1 block (blocks 1, 2, 3, 4 on disks 1-4) plus one check disk. Parallel reads are possible for small requests; large requests can utilize the full bandwidth. Writes involve modifying both the data block and the check disk.
Level 5: block-interleaved distributed parity. Similar to level 4, but the parity blocks are distributed over all disks (striping unit = block), eliminating the parity-disk hot spot.
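The parity used in levels 3-5 is bitwise XOR: the check disk holds the XOR of the corresponding data units, so the contents of any single failed disk can be rebuilt from the survivors. A minimal sketch (the function names here are invented for this example):

```python
def parity_block(blocks):
    """XOR equal-sized blocks together (the RAID parity computation)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def reconstruct(survivors, parity):
    """Rebuild the one lost block: XOR the parity with all surviving blocks."""
    return parity_block(survivors + [parity])

# Four data blocks striped over disks 1-4, parity on the check disk:
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = parity_block(data)

# If disk 3 fails, its block is recovered from the other three + parity:
assert reconstruct([data[0], data[1], data[3]], parity) == b"CCCC"
```

This also shows why level 4 writes are a bottleneck: every block update must recompute and rewrite the parity, which level 5 spreads across all disks.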

Buffer Management in a DBMS
Data must be in RAM (the buffer pool) for the DBMS to operate on it!
(Figure: page requests from higher levels flow into the buffer pool of page frames, some occupied and some free, in main memory; pages are transferred between the frames and the database on disk.)
The choice of frame for an incoming page is dictated by the replacement policy.
A lookup table of <frame#, page-id> pairs is maintained.

When a Page Is Requested
If the requested page (from a higher level) is not in the buffer pool:
- Choose a frame for replacement.
- If that frame is dirty (its page has been changed while in RAM), write it to disk first.
- Read the requested page into that frame (and update the lookup table).
- Pin the page (designate it as temporarily non-replaceable) and return its address to the requesting higher-level process.
If requests can be predicted (e.g., in sequential scans), pages can be pre-fetched several at a time!

More on Buffer Management
The requestor of a page, when done with it, must unpin it (actually, decrement its pin count) and set the dirty bit if the page has been modified.
Because a page in the buffer pool may be requested concurrently by many higher-level processes, a pin count is used (the lookup table holds <frame#, page-id, pin-count>).
A page is a candidate for replacement iff its pin count = 0.
Note: the concurrency-control and recovery subsystems may force additional I/O when a frame is chosen for replacement (e.g., to implement a write-ahead log protocol; more on that later).
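The pin/unpin protocol of the last two slides can be sketched as a toy buffer pool. This is illustrative only: a Python dict stands in for the disk, an OrderedDict plays the lookup table, and the victim is simply the least recently used unpinned frame.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: pin counts, dirty bits, and LRU replacement
    among unpinned frames. A sketch of the slides' scheme, not a real
    DBMS component."""

    def __init__(self, n_frames, disk):
        self.n_frames, self.disk = n_frames, disk  # disk: pageid -> data
        self.frames = OrderedDict()  # pageid -> [data, pin_count, dirty]
        self.ios = 0                 # disk reads + writes performed

    def pin(self, pageid):
        """Return the page's data, pinned in the pool (read on a miss)."""
        if pageid in self.frames:
            self.frames.move_to_end(pageid)        # hit: refresh recency
        else:
            if len(self.frames) >= self.n_frames:
                # Victim = least recently used frame with pin count 0.
                # (A real manager must handle the all-pinned case.)
                victim = next(p for p, f in self.frames.items() if f[1] == 0)
                data, _, dirty = self.frames.pop(victim)
                if dirty:                          # write back before reuse
                    self.disk[victim] = data
                    self.ios += 1
            self.frames[pageid] = [self.disk[pageid], 0, False]
            self.ios += 1                          # read requested page
        self.frames[pageid][1] += 1                # pin
        return self.frames[pageid][0]

    def unpin(self, pageid, dirty=False):
        """Release one pin; remember whether the caller modified the page."""
        frame = self.frames[pageid]
        frame[1] -= 1
        frame[2] = frame[2] or dirty
```

Note how the dirty bit turns one logical replacement into two I/Os (a write-back plus a read), which is why replacement policy and write-ahead logging interact.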

Buffer Replacement
The frame to vacate is chosen by a replacement policy: least-recently-used (LRU), most-recently-used (MRU), or others.
Example (figure): a 6-frame buffer pool reading a file of pages 1, 2, 3, 4, 5, 6, 7, 8, ... in order. The first 6 reads fill the buffer; with LRU, every new read then requires a write first, flushing a frame (assuming all 6 resident pages have been dirtied). This shows that the buffer manager's knowledge of the access pattern can be important.
Extent (multi-block) pre-fetching (and extent writing) would alleviate this situation considerably.

Buffer Replacement Policy
The policy can have a big impact on the number of I/Os; the best choice depends on the access pattern.
Sequential flooding is the bad situation caused by LRU + repeated sequential scans. It can happen when # buffer frames < # pages in the sequentially scanned file: each page request causes a flush, whereas MRU + repeated sequential scans would not.
Example (figure): a file with 7 pages is read sequentially and repeatedly into a 6-frame buffer pool. Pages 1-6 are read in order; to read page 7, LRU flushes page 1. The second scan then begins by needing page 1, but it was just flushed; fetching it flushes page 2, which is needed next, and so on. After a while, every page to be read was just flushed.
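A small simulation reproduces the flooding effect (the helper `scan_ios` is invented for this example and counts only page reads, ignoring write-backs): with 6 frames and a 7-page file, LRU misses on every single request, while MRU misses only once per scan after the first.

```python
def scan_ios(n_frames, n_pages, n_scans, policy):
    """Count page reads for repeated sequential scans of an n_pages file
    under a pure LRU or MRU replacement policy (no prefetching)."""
    frames = []  # resident page ids, ordered oldest-access -> newest-access
    ios = 0
    for _ in range(n_scans):
        for page in range(n_pages):
            if page in frames:
                frames.remove(page)      # hit: refresh recency below
            else:
                ios += 1                 # miss: read page from disk
                if len(frames) == n_frames:
                    if policy == "LRU":
                        frames.pop(0)    # evict least recently used
                    else:
                        frames.pop()     # MRU: evict most recently used
            frames.append(page)
    return ios

# The slide's setup: 6 frames, 7-page file, scanned 5 times.
assert scan_ios(6, 7, 5, "LRU") == 35   # every request is a miss
assert scan_ios(6, 7, 5, "MRU") == 11   # 7 misses on scan 1, then 1 per scan
```

MRU wins here precisely because the page flushed is the one that will be needed furthest in the future of a cyclic scan.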

DBMS vs. OS File System
The OS can do disk space and buffer management, so why not let the OS manage these tasks?
- Differences in OS support lead to portability issues.
- Some OS limitations exist, e.g., files can't span disks.
- Buffer management in a DBMS requires the ability to: pin a page in the buffer pool; force a page to disk (important for implementing CC & recovery); adjust the replacement policy; and pre-fetch pages based on the access patterns of typical DB operations.

Record Formats: Fixed Length
(Figure: a record with fields F1, F2, F3, F4 of fixed lengths L1, L2, L3, L4, starting at base address B; the address of F3 is B + L1 + L2.)
Information about the field types is the same for all records in a file, so fields can be accessed via offsets computed from lengths stored in the system catalogs.
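With fixed-length fields there is no per-record directory: field addresses are pure arithmetic over the lengths recorded in the catalog. A sketch (the field lengths and names here are invented):

```python
# Hypothetical catalog entry: four fixed-length fields F1..F4, sizes in bytes.
FIELD_LENGTHS = [4, 8, 2, 6]  # L1, L2, L3, L4

def field_offset(base, lengths, i):
    """Address of field i (0-based) = base address + preceding lengths."""
    return base + sum(lengths[:i])

# With base address B = 1000, field F3 starts at B + L1 + L2:
assert field_offset(1000, FIELD_LENGTHS, 2) == 1012
```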

Record Formats: Variable Length
Two alternative formats (assuming the number of fields is fixed):
1. Fields delimited by special symbols (e.g., $), preceded by a field count.
2. An array of field offsets at the front of the record.
The second alternative offers direct access to the i-th field and efficient storage of NULLs (a special "don't know" value), at a small directory overhead.
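The offset-array alternative can be sketched as follows. This is a simplified encoding invented for this example: 2-byte end offsets, with a NULL represented as a zero-length field whose end offset repeats the previous one, which is what makes NULL storage essentially free.

```python
def encode(fields):
    """Serialize variable-length string fields with a leading offset array.

    Layout: [end_1, ..., end_n][field_1 bytes ... field_n bytes], where
    end_i is the end of field i relative to the start of the data area.
    """
    offsets, data, pos = [], b"", 0
    for f in fields:
        chunk = b"" if f is None else f.encode()
        pos += len(chunk)
        offsets.append(pos)
        data += chunk
    header = b"".join(o.to_bytes(2, "little") for o in offsets)
    return header, data

def get_field(header, data, i):
    """Direct access to field i: slice between offsets i-1 and i."""
    end = int.from_bytes(header[2 * i:2 * i + 2], "little")
    start = 0 if i == 0 else int.from_bytes(header[2 * (i - 1):2 * i], "little")
    return None if start == end else data[start:end].decode()

header, data = encode(["abc", None, "de"])
assert get_field(header, data, 1) is None   # NULL costs zero data bytes
assert get_field(header, data, 2) == "de"   # no scan past earlier fields
```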

Page Formats: Fixed-Length Records
PACKED format: record slots 1..N are stored contiguously, followed by the free space, with the number of records N stored at the end of the page.
UNPACKED, BITMAP format: M slots plus a bitmap (one bit per slot) and the slot count M at the end of the page; records may occupy any subset of the slots.
Record ID (RID) = <page id, slot #>. In the PACKED format, moving records for free-space management changes their RIDs; that may not be acceptable, since RIDs are meant to be permanent IDs.

Page Format for Variable-Length Records (UNPACKED, RECORD POINTER)
A slot directory at the end of page i holds one entry per record slot, each pointing to the start of its record on the page (so RID (i, 1), (i, 2), ..., (i, N) name slots 1..N), plus the number of slots N and a pointer to the start of the free space.
Records can be moved on the page without changing their RIDs, so this format is attractive for fixed-length records too.
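A slot-directory page can be sketched as follows (a toy version: the directory is kept as a Python list rather than physically at the end of the page, and freed space is never compacted):

```python
class SlottedPage:
    """Toy slotted page: records grow from the front of the page, the slot
    directory maps slot numbers to (offset, length), and a record's slot
    number, hence its RID, never changes when other records come or go."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []   # slot# -> (offset, length); (-1, 0) marks empty
        self.free = 0     # pointer to start of free space

    def insert(self, record):
        off = self.free
        self.data[off:off + len(record)] = record
        self.free += len(record)
        self.slots.append((off, len(record)))
        return len(self.slots) - 1      # slot number: the RID's second half

    def read(self, slot):
        off, length = self.slots[slot]
        return bytes(self.data[off:off + length])

    def delete(self, slot):
        self.slots[slot] = (-1, 0)      # keep the slot so other RIDs survive
```

Deleting a record empties its slot but shifts nothing, which is exactly the property that keeps the surviving records' RIDs permanent.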