Lecture 9: Data Storage and IO Models

Slides:



Advertisements
Similar presentations
Storing Data: Disk Organization and I/O
Advertisements

Storing Data: Disks and Files
Storing Data: Disks and Files: Chapter 9
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7.
Storing Data: Disks and Files
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Murali Mani Overview of Storage and Indexing (based on slides from Wisconsin)
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
©Silberschatz, Korth and Sudarshan11.1Database System Concepts Chapter 11: Storage and File Structure Overview of Physical Storage Media Magnetic Disks.
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
Layers of a DBMS Query optimization Execution engine Files and access methods Buffer management Disk space management Query Processor Query execution plan.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 9.
Lecture 11: DMBS Internals
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7.
Physical Storage Susan B. Davidson University of Pennsylvania CIS330 – Database Management Systems November 20, 2007.
Introduction to Database Systems 1 Storing Data: Disks and Files Chapter 3 “Yea, from the table of my memory I’ll wipe away all trivial fond records.”
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Chapter 7 “ Yea, from the table of my memory I ’ ll wipe away.
1 Storing Data: Disks and Files Chapter 9. 2 Disks and Files  DBMS stores information on (“hard”) disks.  This has major implications for DBMS design!
“Yea, from the table of my memory I’ll wipe away all trivial fond records.” -- Shakespeare, Hamlet.
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
11.1Database System Concepts. 11.2Database System Concepts Now Something Different 1st part of the course: Application Oriented 2nd part of the course:
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Storing Data: Disks and Files Content based on Chapter 9 Database Management Systems, (3.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
DMBS Architecture May 15 th, Generic Architecture Query compiler/optimizer Execution engine Index/record mgr. Buffer manager Storage manager storage.
COSC 6340: Disks 1 Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! » READ: transfer data from disk.
What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently and safely. Provide.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
1 CS122A: Introduction to Data Management Lecture #14: Indexing Instructor: Chen Li.
1 Storing Data: Disks and Files Chapter 9. 2 Objectives  Memory hierarchy in computer systems  Characteristics of disks and tapes  RAID storage systems.
Data Storage and Querying in Various Storage Devices.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
The very Essentials of Disk and Buffer Management.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Disks and Files.
CS222: Principles of Data Management Lecture #4 Catalogs, Buffer Manager, File Organizations Instructor: Chen Li.
CS522 Advanced database Systems
Database Applications (15-415) DBMS Internals- Part I Lecture 11, February 16, 2016 Mohammad Hammoud.
Chapter 2: Computer-System Structures
Storing Data: Disks and Files
Storing Data: Disks and Files
Database Applications (15-415) DBMS Internals: Part II Lecture 11, October 2, 2016 Mohammad Hammoud.
Storage and Disks.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Database Management Systems (CS 564)
End of XQuery DBMS Internals
Database Management Systems (CS 564)
Oracle SQL*Loader
Disks and Files DBMS stores information on (“hard”) disks.
Lecture 11: DMBS Internals
Storing Data: Disks and Files
Lecture 10: Buffer Manager and File Organization
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
Disk Storage, Basic File Structures, and Buffer Management
Introduction to Database Systems
Database Systems November 2, 2011 Lecture #7.
5. Disk, Pages and Buffers Why Not Store Everything in Main Memory
Storing Data: Disks and Files
Lecture 30: The IO Model 1 External Sorting
Secondary Storage Management Brian Bershad
Persistence: hard disk drive
CSE 232A Graduate Database Systems
Basics Storing Data on Disks and Files
Secondary Storage Management Hank Levy
Storing Data: Disks and Files
Lecture 18: DMBS Overview and Data Storage
Lecture 15: Data Storage Tuesday, February 20, 2001.
CSE 190D Database System Implementation
Presentation transcript:

Lecture 9: Data Storage and IO Models

Submission Project Part 1 tonight Lecture 9 Announcements Submission Project Part 1 tonight Instructions on Piazza! PS2 due on Friday at 11:59 pm Questions? Easier than PS1. Badgers Rule!

Today’s Lecture Data Storage Disk and Files Buffer Manager - Prelims

Lecture 9 > Section 1 Data Storage

What you will learn about in this section Lecture 9 > Section 1 What you will learn about in this section Life cycle of a query Architecture of a DBMS Memory Hierarchy

matches(“insurance claims”) Lecture 9 > Section 1 Life cycle of a query Query Result Query Database Server Syntax Tree Parser Query Plan Optimizer Segments Query Scheduler |…|……|………..|………..| Query Result Execute Operators where W.image.rain() Report R, Weather W Select R.text from and W.date = R.date and W.city = R.city matches(“insurance claims”) R.text. and Query

Internal Architecture of a DBMS Lecture 9 > Section 1 > Architecture of a DBMS Internal Architecture of a DBMS query Query Execution data access Storage Manager I/O access

Architecture of a Storage Manager Lecture 9 > Section 1 > Storage Manager Architecture of a Storage Manager I/O Manager Access Methods Heap File B+-tree Index Sorted File Hash Index Buffer Manager Concurrency Control Manager Recovery Manager IO Accesses In Systems, IO cost matters a ton!

Data Storage How does a DBMS store and access data? Lecture 9 > Section 1 > Data Storage Data Storage How does a DBMS store and access data? main memory (fast, temporary) disk (slow, permanent) How do we move data from disk to main memory? buffer manager How do we organize relational data into files?

Magnetic Hard Disk Drive (HDD) Lecture 9 > Section 1 > Memory Hierarchy Memory Hierarchy CPU Cache Main Memory 100s Price Access Speed Capacity A C C E S S C Y C L E S Flash Storage 105 – 106 Magnetic Hard Disk Drive (HDD) 107 – 108

Why not main memory? Relatively high cost Lecture 9 > Section 1 > Memory Hierarchy Why not main memory? Relatively high cost Main memory is not persistent! Typical storage hierarchy: Primary storage: main memory (RAM) for currently used data Secondary storage: disk for the main database Tertiary storage: tapes for archiving older versions of the data

Lecture 9 > Section 2 2. Disk and Files

What you will learn about in this section Lecture 9 > Section 2 What you will learn about in this section All about disks Accessing a disk Managing disk space

Disks Secondary storage device of choice. Lecture 9 > Section 2 > Disks Disks Secondary storage device of choice. Data is stored and retrieved in units called disk blocks Unlike RAM, time to retrieve a disk page varies depending upon location on disk. Therefore, relative placement of pages on disk has major impact on DBMS performance!

The Mechanics of a Disk Mechanical characteristics: Lecture 9 > Section 2 > Disks The Mechanics of a Disk Cylinder Spindle Tracks Disk head Sector Mechanical characteristics: Rotation speed (7200RPM) Number of platters (1-30) Number of tracks (<=10000) Number of bytes/track(105) Arm movement Platters Arm assembly

The Mechanics of a Disk Platters spin @ ~ 7200rpm Lecture 9 > Section 2 > Disks The Mechanics of a Disk Cylinder Spindle Tracks Disk head Platters spin @ ~ 7200rpm Arm assembly moves to position a head on a desired track. Tracks under heads make a cylinder (imaginary!) Only 1 head reads/writes at any time Block size : multiple of sector size (which is fixed). Sector Arm movement Platters Arm assembly

The Mechanics of a Disk Unit of read or write: Lecture 9 > Section 2 > Disks The Mechanics of a Disk Cylinder Spindle Tracks Disk head Sector Unit of read or write: disk block: k*Sector Size Once in memory: page Typically: 4k or 8k or 16k Arm movement Platters Arm assembly Access time = seek time + rotational delay + transfer time (1-20 ms) (0-10ms) (~1 ms per 8k block)

Access time = seek time + rotational delay + transfer time Lecture 9 > Section 2 > Disks The Mechanics of a Disk Cylinder Spindle Tracks Disk head GOAL: Minimize seek and rotational delay Sector “Next Block” concept Blocks on same track Blocks on same cylinder Blocks on adjacent cylinder Arm movement Platters Arm assembly Disks read/write one block at a time Access time = seek time + rotational delay + transfer time For a sequential scan, pre-fetching several pages at a time is a big win!

Lecture 9 > Section 2 > Accessing the Disk Accessing the disk (I) access time = rotational delay + seek time + transfer time rotational delay: time to wait for sector to rotate under the disk head typical delay: 0–10 ms maximum delay = 1 full rotation average delay ~ half rotation RPM Average delay 5,400 5.56 7,200 4.17 10,000 3.00 15,000 2.00

Accessing the disk (II) Lecture 9 > Section 2 > Accessing the Disk Accessing the disk (II) access time = rotational delay + seek time + transfer time seek time: time to move the arm to position disk head on the right track typical seek time: ~ 9 ms ~ 4 ms for high-end disks

Accessing the disk (III) Lecture 9 > Section 2 > Accessing the Disk Accessing the disk (III) access time = rotational delay + seek time + transfer time data transfer time: time to move the data to/from the disk surface typical rates: ~100 MB/s the access time is dominated by the seek time and rotational delay!

Example: Specs What are the I/O rates for block size 4 KB and: Lecture 9 > Section 2 > Accessing the Disk Example: Specs Seagate HDD Capacity 3 TB RPM 7,200 Average Seek Time 9 ms Max Transfer Rate 210 MB/s # Platters 3 What are the I/O rates for block size 4 KB and: random workload (~ 0.3 MB/s) sequential workload (~ 210 MB/s)

Managing Disk Space The disk space is organized into files Lecture 9 > Section 2 > Managing Disk Space Managing Disk Space The disk space is organized into files Files are made up of pages Pages contain records Data is allocated/deallocated in increments of pages Logically close pages should be nearby in the disk

SSDs (Solid State Drive) Lecture 9 > Section 2 > SSDs SSDs (Solid State Drive) SSDs use flash memory No moving parts (no rotate/seek motors) eliminates seek time and rotational delay very low power and lightweight Data transfer rates: 300-600 MB/s SSDs can read data (sequential or random) very fast!

SSDs Small storage (0.1-0.5x of HDD) expensive (20x of HDD) Lecture 9 > Section 2 > SSDs SSDs Small storage (0.1-0.5x of HDD) expensive (20x of HDD) Writes are much more expensive than reads (10x) Limited lifetime 1-10K writes per page the average failure rate is 6 years Can only read and write in blocks or pages of 2K, 4K, or more bytes. Looks like a disk.

3. Buffer Manager - Prelims Lecture 9 > Section 3 3. Buffer Manager - Prelims

What you will learn about in this section Lecture 9 > Section 3 What you will learn about in this section Buffer Manager Replacement Policy

High-level: Disk vs. Main Memory Lecture 9 > Section 3 > Storage & memory model High-level: Disk vs. Main Memory Platters Spindle Disk head Arm movement Arm assembly Tracks Sector Cylinder Disk: Slow: Sequential block access Read a blocks (not byte) at a time, so sequential access is cheaper than random Disk read / writes are expensive! Durable: We will assume that once on disk, data is safe! Cheap Random Access Memory (RAM) or Main Memory: Fast: Random access, byte addressable ~10x faster for sequential access ~100,000x faster for random access! Volatile: Data can be lost if e.g. crash occurs, power goes out, etc! Expensive: For $100, get 16GB of RAM vs. 2TB of disk!

Lecture 9 > Section 3 > The Buffer Main Memory Buffer A buffer is a region of physical memory used to store temporary data In this lecture: a region in main memory used to store intermediate data between disk and processes Key idea: Reading / writing to disk is slow- need to cache data! Disk

The (Simplified) Buffer Lecture 9 > Section 3 > The Buffer The (Simplified) Buffer Main Memory Buffer In this class: We’ll consider a buffer located in main memory that operates over pages and files: Read(page): Read page from disk -> buffer if not already in buffer Disk 1,0,3 1,0,3

The (Simplified) Buffer Lecture 9 > Section 3 > The Buffer The (Simplified) Buffer Main Memory Buffer In this class: We’ll consider a buffer located in main memory that operates over pages and files: 2 1,0,3 Read(page): Read page from disk -> buffer if not already in buffer Processes can then read from / write to the page in the buffer Disk 1,0,3

The (Simplified) Buffer Lecture 9 > Section 3 > The Buffer The (Simplified) Buffer Main Memory Buffer In this class: We’ll consider a buffer located in main memory that operates over pages and files: 1,2,3 Read(page): Read page from disk -> buffer if not already in buffer Flush(page): Evict page from buffer & write to disk Disk 1,0,3

The (Simplified) Buffer Lecture 9 > Section 3 > The Buffer The (Simplified) Buffer Main Memory Buffer In this class: We’ll consider a buffer located in main memory that operates over pages and files: 1,2,3 Read(page): Read page from disk -> buffer if not already in buffer Flush(page): Evict page from buffer & write to disk Disk 1,0,3 Release(page): Evict page from buffer without writing to disk

Managing Disk: The DBMS Buffer Lecture 9 > Section 3 > The Buffer Managing Disk: The DBMS Buffer Main Memory Buffer Database maintains its own buffer Why? The OS already does this… DB knows more about access patterns. Watch for how this shows up! (cf. Sequential Flooding) Recovery and logging require ability to flush to disk. Disk

Lecture 9 > Section 3 > The Buffer The Buffer Manager A buffer manager handles supporting operations for the buffer: Primarily, handles & executes the “replacement policy” i.e. finds a page in buffer to flush/release if buffer is full and a new page needs to be read in DBMSs typically implement their own buffer management routines

A Simplified Filesystem Model Lecture 9 > Section 3 > The Buffer A Simplified Filesystem Model For us, a page is a fixed-sized array of memory Think: One or more disk blocks Interface: write to an entry (called a slot) or set to “None” DBMS also needs to handle variable length fields Page layout is important for good hardware utilization as well (see 346) And a file is a variable-length list of pages Interface: create / open / close; next_page(); etc. Disk File 1,0,3 1,0,3 Page

Lecture 9 > Section 3 > The Buffer Buffer Manager Data must be in RAM for DBMS to operate on it All the pages may not fit into main memory Buffer manager: responsible for bringing pages from disk to main memory as needed pages brought into main memory are in the buffer pool the buffer pool is partitioned into frames: slots for holding disk pages

Buffer Manager page request buffer pool page frame disk Lecture 9 > Section 3 > The Buffer Buffer Manager page request buffer pool Upper levels: release pages when done indicate if a page is modified page frame disk

Remember: Buffer Manager Lecture 9 > Section 3 > The Buffer Remember: Buffer Manager Read(page): Read page from disk -> buffer if not already in buffer Flush(page): Evict page from buffer & write to disk Release(page): Evict page from buffer without writing to disk

Buffer replacement policy Lecture 9 > Section 3 > The Buffer Buffer replacement policy How do we choose a frame for replacement? LRU (Least Recently Used) Clock MRU (Most Recently Used) FIFO, random, … The replacement policy has big impact on # of I/O’s (depends on the access pattern) To be continued!