CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.


Data Storage: Overview How does a DBMS store and manage large amounts of data? –(today, tomorrow) What representations and data structures best support efficient manipulations of this data? –(thereafter)

The Memory Hierarchy (fastest to slowest)

Cache (all levels)
– Avg. Size: 256 KB – 1 MB
– Read/Write Time: … seconds
– Random Access
– Smallest of all memory, and also the most costly.
– Usually on the same chip as the processor.
– Easy to manage in single-processor environments, more complicated in multiprocessor systems.

Main Memory
– Avg. Size: 128 MB – 1 GB
– Read/Write Time: … to … seconds
– Random Access
– Becoming more affordable.
– Volatile

Secondary Storage
– Avg. Size: 30 GB – 160 GB
– Read/Write Time: … seconds
– NOT Random Access
– Extremely affordable: $0.68/GB!!!
– Can be used for the file system, virtual memory, or raw data access.
– Blocking (needs buffering)

Tertiary Storage
– Avg. Size: Gigabytes – Terabytes
– Read/Write Time: … seconds
– NOT Random Access, or even remotely close
– Extremely affordable: pennies/GB!!!
– Not efficient for any real-time database purposes; could be used in an offline processing environment.

Memory Hierarchy Summary [Figure: access time (sec) plotted against typical capacity (bytes) for cache, electronic main, electronic secondary, magnetic/optical disks, online tape, nearline tape & optical disks, and offline tape]

Memory Hierarchy Summary [Figure: access time (sec) plotted against cost (dollars/MB) for cache, electronic main, electronic secondary, magnetic/optical disks, online tape, nearline tape & optical disks, and offline tape]

Motivation
Consider the following algorithm:
For each tuple r in relation R {
    Read the tuple r
    For each tuple s in relation S {
        Read the tuple s
        Append the entire tuple s to r
    }
}
What is the time complexity of this algorithm?

Motivation
Complexity:
– This algorithm is O(n^2)! Is it always?
– Yes, if we assume random access of data.
Hard disks are NOT random access! Unless the data is organized efficiently, the actual running time may be far worse than the O(n^2) operation count suggests, because each read can cost a full disk access.
We need to know how a hard disk operates to understand how to efficiently store information and optimize storage.
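As a rough illustration of the operation count, the nested loop from the Motivation slide can be sketched in Python. The function name `nested_loop_pairs`, the in-memory lists standing in for relations, and the `reads` counter are assumptions made for this sketch; they are not part of the lecture.

```python
def nested_loop_pairs(R, S):
    """Pair every tuple r in R with every tuple s in S, counting tuple reads.

    R and S are plain Python lists of tuples standing in for relations
    (an assumption of this sketch; a real DBMS would read them from disk).
    """
    reads = 0
    result = []
    for r in R:
        reads += 1                  # read the tuple r
        for s in S:
            reads += 1              # read the tuple s
            result.append(r + s)    # append the entire tuple s to r
    return result, reads


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(10,), (20,)]
pairs, reads = nested_loop_pairs(R, S)
print(len(pairs), reads)
```

With |R| = |S| = n the counter grows as n + n^2. On a disk, each of those reads may also pay seek and rotational delay, which is why the layout of the data matters.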

Disk Mechanics Many DB related issues involve hard disk I/O! Thus we will now study how a hard disk works.

Disk Mechanics [Figure: a disk drive with the disk head, platter, and cylinder labeled]

Disk Mechanics [Figure: a platter surface with a track, sector, and gap labeled]

Disk Mechanics [Figure: processor (P), memory (M), and disk controller (DC) connected to the disks]

Disk Controller
The disk controller is a processor capable of:
– Controlling the motion of the disk heads
– Selecting the surface from which to read/write
– Transferring data to/from memory

More Disk Terminology
Rotation Speed:
– The speed at which the disk rotates, e.g. 5400 RPM
Number of Tracks:
– Typically 10,000 to 15,000
Bytes per track:
– ~10^5 bytes per track

How big is the disk if:
– There are 4 platters
– There are 8192 tracks per surface
– There are 256 sectors per track
– There are 512 bytes per sector
Size = 2 surfaces/platter * number of platters * tracks per surface * sectors per track * bytes per sector
Size = 2 * 4 platters * 8192 tracks/surface * 256 sectors/track * 512 bytes/sector = 2^33 bytes
Size in GB = 2^33 bytes / (1024 bytes/KB) / (1024 KB/MB) / (1024 MB/GB)
Size = 2^33 bytes = 2^3 * 2^30 bytes = 8 GB
Remember: 1 KB = 1024 bytes, not 1000!
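The capacity arithmetic can be checked with a few lines of Python. This is only a sketch; the constant names are chosen here for illustration, and it assumes two recording surfaces per platter, as the factor of 2 in the slide's formula implies.

```python
# Disk geometry from the slide (names are illustrative, not from the lecture).
SURFACES_PER_PLATTER = 2
PLATTERS = 4
TRACKS_PER_SURFACE = 8192
SECTORS_PER_TRACK = 256
BYTES_PER_SECTOR = 512

size_bytes = (SURFACES_PER_PLATTER * PLATTERS * TRACKS_PER_SURFACE
              * SECTORS_PER_TRACK * BYTES_PER_SECTOR)

# Convert using binary units: 1 KB = 1024 bytes, 1 MB = 1024 KB, 1 GB = 1024 MB.
size_gb = size_bytes / 1024 / 1024 / 1024

print(size_bytes == 2**33, size_gb)
```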

What about access time? [Figure: the request "I want block X" is issued, and block X then arrives in memory]
Time = Disk Controller Processing Time + Disk Latency + Transfer Time

Access time, Graphically [Figure: the processor (P), memory (M), and disk controller (DC) diagram annotated with Disk Controller Processing Time, Disk Latency, and Transfer Time]

Disk Controller Processing Time
Time = Disk Controller Processing Time + Disk Latency + Transfer Time
– CPU Request → Disk Controller: nanoseconds
– Disk Controller Contention: microseconds
– Bus: microseconds
Typically a few microseconds, so this is negligible for our purposes.

Transfer Time
Time = Disk Controller Processing Time + Disk Latency + Transfer Time
Typically 10 MB/sec
Or: a 4096-byte block takes ~0.4 ms

Disk Delay
Time = Disk Controller Processing Time + Disk Latency + Transfer Time
Disk latency (also called disk delay) is more complicated:
Disk Delay = Seek Time + Rotational Latency

Seek Time
Seek time is the most critical component of disk delay.
Average Seek Times:
– Maxtor 40GB (IDE) ~10ms
– Western Digital (IDE) 20GB ~9ms
– Seagate (SCSI) 70GB ~3.6ms
– Maxtor 60GB (SATA) ~9ms

Rotational Latency [Figure: a track showing where the head is now ("Head Here") and where the wanted block is ("Block I Want")]

Average Rotational Latency
Average latency is about half of the time it takes to make one revolution:
– 3600 RPM = 8.33 ms
– 5400 RPM = 5.56 ms
– 7200 RPM = 4.17 ms
– 10,000 RPM = 3.0 ms (newer drives)
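The half-revolution rule can be written as a one-line function. This is a minimal sketch; the function name `avg_rotational_latency_ms` is made up for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of the time for one revolution."""
    ms_per_revolution = 60_000 / rpm   # 60,000 ms in a minute
    return ms_per_revolution / 2


for rpm in (5400, 7200, 10_000):
    print(f"{rpm} RPM -> {avg_rotational_latency_ms(rpm):.2f} ms")
```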

Example Disk Latency Problem
Calculate the minimum, maximum, and average disk latencies for reading a 4096-byte block on the same hard drive as before:
– 4 platters
– 8192 tracks
– 256 sectors/track
– 512 bytes/sector
– Disk rotates at 3840 RPM
– Seek time: 1 ms fixed overhead for any move between cylinders, plus 1 ms for every 500 cylinders traveled
– Gaps consume 10% of each track
A 4096-byte block is 8 sectors.
The disk makes one revolution in 1/64 of a second, so 1 rotation takes 15.6 ms.
Moving one track takes 1.002 ms.
Moving across all tracks takes 17.4 ms.
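The derived values in this example (rotation time and seek times) can be reproduced with a small sketch. It assumes the seek rule means a fixed 1 ms plus 1 ms per 500 cylinders traveled, which is the reading that yields the 1.002 ms and 17.4 ms figures. The function and constant names are hypothetical.

```python
RPM = 3840
CYLINDERS = 8192


def rotation_time_ms(rpm=RPM):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm


def seek_time_ms(cylinders_traveled):
    """Assumed seek model: 1 ms fixed cost plus 1 ms per 500 cylinders."""
    if cylinders_traveled == 0:
        return 0.0                      # head is already on the cylinder
    return 1.0 + cylinders_traveled / 500


print(rotation_time_ms())               # one rotation
print(seek_time_ms(1))                  # moving one track
print(seek_time_ms(CYLINDERS))          # moving across all tracks
```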

Solution: Minimum Latency
Assume best case:
– The head is already on the block we want!
In that case, the latency is just the time to read the 8 sectors of the 4096-byte block.
We will pass over 8 sectors and 7 gaps.
Remember: 10% of each track is gaps and 90% is information, so 36° are gaps and 324° are information.
36 × (7/256) + 324 × (8/256) = 11.109 degrees
11.109 degrees / 360 = 0.03086 rot (about 3.1% of a rotation)
0.03086 rot / 64 rot/sec = 0.000482 sec = 0.482 ms
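The minimum-latency arithmetic can be sketched as follows: the head sweeps 8 sectors and the 7 gaps between them, and the swept angle is converted to time at 64 rotations per second. The names below are illustrative only.

```python
SECTORS_PER_TRACK = 256
GAP_FRACTION = 0.10                 # gaps consume 10% of each track
ROTATIONS_PER_SEC = 3840 / 60       # 64 rotations per second


def block_read_time_ms(sectors=8):
    """Time to pass over `sectors` sectors and the gaps between them."""
    gap_degrees = 360 * GAP_FRACTION        # 36 degrees of gaps per track
    data_degrees = 360 - gap_degrees        # 324 degrees of information
    swept = (gap_degrees * (sectors - 1) / SECTORS_PER_TRACK
             + data_degrees * sectors / SECTORS_PER_TRACK)
    rotations = swept / 360
    return rotations / ROTATIONS_PER_SEC * 1000


print(f"{block_read_time_ms():.3f} ms")
```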

Solution: Maximum Latency
Now assume worst case:
– The disk head is over the innermost cylinder and the block we want is on the outermost cylinder.
– The block we want has just passed under the head, so we have to wait a full rotation.
Time = Time to move from innermost track to outermost track + Time for one full rotation + Time to read 8 sectors
= 17.4 ms (seek time) + 15.6 ms (one rotation) + 0.5 ms (from minimum latency calculation)
= 33.5 ms!!
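A quick check of the worst-case sum, written as a sketch with illustrative variable names. It assumes the seek model of 1 ms plus 1 ms per 500 cylinders and reuses the 0.482 ms block read time from the minimum-latency case.

```python
full_seek_ms = 1 + 8192 / 500       # innermost to outermost cylinder
rotation_ms = 60_000 / 3840         # one full revolution at 3840 RPM
block_read_ms = 0.482               # 8 sectors plus 7 gaps

max_latency_ms = full_seek_ms + rotation_ms + block_read_ms
print(f"{max_latency_ms:.1f} ms")
```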

Solution: Average Latency
Now assume average case:
– It will take an average amount of time to seek, and
– the block we want is ½ of a revolution away from the heads.
Time = Time to move over tracks + Time for one-half of a rotation + Time to read 8 sectors
= 6.5 ms (next slide) + 7.8 ms (0.5 rotation) + 0.5 ms (from minimum latency calculation)
= 14.8 ms

Solution: Calculating Average Seek Time [Figure: average travel distance plotted against starting track]
Integrating over this graph gives an average travel of 2730 cylinders.
Average seek time = 1 + 2730/500 = 6.5 ms
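Instead of integrating, the average travel distance can be computed by summing over all start and target cylinders. This sketch assumes both are chosen uniformly at random, and it applies the 1 ms fixed cost to the average distance as the slide does. The function name `average_travel` is hypothetical.

```python
def average_travel(n_cylinders):
    """Mean |start - target| over all ordered pairs of cylinders.

    A distance d >= 1 occurs for 2 * (n - d) ordered pairs; distance 0
    contributes nothing to the sum.
    """
    total = sum(d * 2 * (n_cylinders - d) for d in range(1, n_cylinders))
    return total / n_cylinders ** 2


avg_cylinders = average_travel(8192)            # roughly one third of 8192
avg_seek_ms = 1 + avg_cylinders / 500           # assumed seek model
half_rotation_ms = (60_000 / 3840) / 2          # half a revolution
avg_latency_ms = avg_seek_ms + half_rotation_ms + 0.482

print(f"{avg_cylinders:.0f} cylinders, seek {avg_seek_ms:.1f} ms, "
      f"total {avg_latency_ms:.1f} ms")
```

The sum works out to about one third of the number of cylinders, which matches the roughly 2730 cylinders read off the graph.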

Writing Blocks Basically same as reading! Phew!

Verifying a write
Verify: same as reading/writing,
– plus one additional revolution (15.6 ms) to come back to the block, and one more block read (0.5 ms) to verify.
So for our earlier example, to verify in each case:
– MIN: 0.5 ms + 15.6 ms + 0.5 ms = 16.6 ms
– MAX: 33.5 ms + 15.6 ms + 0.5 ms = 49.6 ms
– AVG: 14.8 ms + 15.6 ms + 0.5 ms = 30.9 ms

After seeing all of this …
Which will be faster: sequential I/O or random I/O?
What are some ways we can improve I/O times without changing the disk features?

Next … Read Sections