CS4432: Database Systems II Data Storage

Presentation transcript:

CS4432: Database Systems II Data Storage

Storage in DBMSs
- DBMSs manage large amounts of data
- How does a DBMS store and manage large amounts of data?
  - This has a significant impact on performance
- Design decisions:
  - What representations and data structures best support efficient manipulation of this data?
- To understand why a DBMS applies specific strategies
  - We must first understand how disks work

Disks and Files
- A DBMS stores information on ("hard") disks. Main memory is only for processing.
- This has major implications for DBMS design!
  - READ: transfer data from disk to main memory (RAM).
  - WRITE: transfer data from RAM to disk.
  - Both are high-cost operations relative to in-memory operations, so they must be planned carefully!

DBMS vs. OS: Who's in Control?
- The DBMS is in control of managing its data
  - It knows more about the structure
  - It knows more about the access pattern

That is why a DBMS has a Storage Manager and a Buffer Manager

Understanding Disks

Storage Hierarchy (fastest to slowest)
Cache (all levels): fastest
- Avg. size: 256 KB to 1 MB
- Read/Write Time: seconds. Random access.
- Smallest of all memory, and also the most costly.
- Usually on the same chip as the processor.
- Easy to manage in single-processor environments, more complicated in multiprocessor systems.
Main Memory
- Avg. size: 128 MB to 1 GB
- Read/Write Time: to seconds. Random access.
- Becoming more affordable.
- Volatile.
Secondary Storage
- Avg. size: 30 GB to 160 GB
- Read/Write Time: seconds. NOT random access.
- Extremely affordable: $0.68/GB!
- Can be used for the file system, virtual memory, or raw data access.
- Blocking (needs buffering).
Tertiary Storage: slowest
- Avg. size: gigabytes to terabytes
- Read/Write Time: seconds. NOT random access, or even remotely close.
- Extremely affordable: pennies/GB!
- Not efficient for any real-time database purposes; could be used in an offline processing environment.

Storage Hierarchy

Memory Hierarchy Summary
[Figure: typical capacity (bytes) versus access time (sec) for cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, offline tape]

Memory Hierarchy Summary
[Figure: dollars/MB versus access time (sec) for cache, electronic main, electronic secondary, magnetic optical disks, online tape, nearline tape & optical disks, offline tape]

Why Not Store Everything in Main Memory?
- Costs too much. $100 will buy you either 16 GB of RAM or 360 GB of disk today.
- Main memory is volatile. We want data to be saved between runs. (Obviously!)
- Typical hierarchy:
  - Main memory (RAM) -> Processing
  - Disks (secondary storage) -> Persistent storage
  - Tapes & DVDs -> Archival

Motivation
Consider the following algorithm:
    For each tuple r in relation R {
        read the tuple r
        For each tuple s in relation S {
            read the tuple s
            append the entire tuple s to r
        }
    }
What is the time complexity of this algorithm?

Motivation
Complexity:
- This algorithm is O(n^2)! Is it always?
- Yes, if we assume random access of data.
Hard disks are not efficient at random access! Unless the data is organized efficiently, the running time of this algorithm may be much worse than the O(n^2) count suggests.
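A minimal Python sketch of the nested loop from the Motivation slides, assuming both relations are small in-memory lists of tuples. The function name `nested_loop_append` and the toy relations are hypothetical, introduced only for illustration. The read counter shows where the quadratic term comes from: |R| reads of r plus |R| * |S| reads of s.

```python
def nested_loop_append(R, S):
    """Pair every tuple r in R with every tuple s in S, counting tuple reads."""
    reads = 0
    result = []
    for r in R:
        reads += 1                # read the tuple r
        for s in S:
            reads += 1            # read the tuple s
            result.append(r + s)  # append the entire tuple s to r
    return result, reads


# Hypothetical toy relations; each tuple is a Python tuple of fields.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(10,), (20,)]
pairs, reads = nested_loop_append(R, S)
print(len(pairs), reads)
```

The counter treats every read as equally cheap, which is the random-access assumption. On a disk, each of those reads may involve mechanical movement, so the same count can translate into very different running times.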

Disks: Some Facts
- Data is stored and retrieved in units called disk blocks.
  - Disk block size: 512 bytes to 4 KB or 8 KB
- Movement to main memory
  - Must read or write one block at a time
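Because data moves one block at a time, even a small request costs at least one whole block. The sketch below, assuming a flat byte-addressed file and a 4 KB block size, lists which blocks a byte range touches. The function name `blocks_touched` is hypothetical.

```python
def blocks_touched(offset, length, block_size=4096):
    """Return the block numbers needed to read `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


print(blocks_touched(100, 50))    # range lies inside a single block
print(blocks_touched(4000, 200))  # range crosses a block boundary
```

A 200-byte request that straddles a boundary costs two block reads, which is one reason record placement within blocks matters.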

Disk Components: Platter (2 surfaces)

Virtual Cylinder
[Figure labels: disk head, platter, cylinder]

Tracks divided into Sectors
[Figure labels: track, sector, gap]
- Gaps ≈ 10%
- Sectors ≈ 90%

Movements
- Arm moves in and out
  - Called seek time
  - Mechanical
- Platter rotates
  - Called latency time
  - Mechanical

Actual Disk

Disk Controller
[Figure labels: Processor, Memory, Disk Controller, Disk 1, Disk 2, ...]
1. Controls the mechanical movement
2. Transfers the data from disks to memory
3. Smart buffering and scheduling

How big is the disk if:
- There are 4 platters
- There are 8192 tracks per surface
- There are 256 sectors per track
- There are 512 bytes per sector
Size = 2 * num of platters * tracks * sectors * bytes per sector
Size = 2 * 4 * 8192 * 256 * 512 = 2^33 bytes
Size = 2^33 bytes / (1024 bytes/KB) / (1024 KB/MB) / (1024 MB/GB)
Size = 2^33 = 2^3 * 2^30 bytes = 8 GB
Remember 1 KB = 1024 bytes, not 1000!
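The capacity calculation on this slide can be checked with a few lines of Python. The function name `disk_capacity_bytes` is hypothetical; the parameter values are the ones given on the slide, and each platter is assumed to have two usable surfaces.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track, bytes_per_sector):
    """Total capacity, assuming two recording surfaces per platter."""
    surfaces = 2 * platters
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


size = disk_capacity_bytes(4, 8192, 256, 512)
print(size)             # capacity in bytes
print(size == 2 ** 33)  # matches the power-of-two form on the slide
print(size // 2 ** 30)  # capacity in GB, with 1 GB = 2^30 bytes
```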

Scale of Bytes

More Disk Terminology
- Rotation speed: the speed at which the disk rotates: 5400 RPM
- Number of tracks: typically 10,000 to 15,000
- Bytes per track: ~10^5 bytes per track

Big Question: What about access time?
[Figure labels: "I want block X", "block X in memory", "?"]
Time = Disk Controller Processing Time + Disk Delay {seek & rotation} + Transfer Time
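A rough sketch of this formula in Python, under stated assumptions. The rotation speed (5400 RPM), bytes per track (~10^5), and 4 KB block size come from earlier slides. The controller time (0.1 ms) and seek time (10 ms) are hypothetical placeholder values, not figures from the lecture. The sketch also assumes the average rotational delay is half a rotation and ignores gaps when estimating transfer time.

```python
def rotation_time_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60_000.0 / rpm


def access_time_ms(controller_ms, seek_ms, rpm, block_bytes, bytes_per_track):
    """Controller time + disk delay (seek + rotation) + transfer time, in ms."""
    rotation = rotation_time_ms(rpm)
    rotational_latency = rotation / 2                    # assumed average: half a rotation
    transfer = rotation * block_bytes / bytes_per_track  # fraction of one rotation; gaps ignored
    return controller_ms + seek_ms + rotational_latency + transfer


# controller_ms and seek_ms below are hypothetical values for illustration only.
total = access_time_ms(controller_ms=0.1, seek_ms=10.0, rpm=5400,
                       block_bytes=4096, bytes_per_track=100_000)
print(f"{total:.2f} ms")
```

With numbers like these, the mechanical terms (seek and rotation) dominate, while the transfer of a single block is comparatively small.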