CS4432: Database Systems II Data Storage (Better Block Organization)

Presentation transcript:


Big Question: What about access time?
Request: "I want block X". Result: block X in memory.
Time = Disk Controller Processing Time + Disk Delay (seek & rotation) + Transfer Time

Access time, Graphically
[Figure: components labeled P, M, and DC connected to disks; the access path is annotated with Disk Controller Processing Time, Disk Delay, and Transfer Time.]

Disk Controller Processing Time
Time = Disk Controller Processing Time + Disk Delay + Transfer Time
– CPU request → disk controller: nanoseconds (10^-9 s)
– Disk controller contention: microseconds (10^-6 s)
– Bus: microseconds (10^-6 s)
Total ≈ microseconds. Negligible for our purposes.

Transfer Time
Time = Disk Controller Processing Time + Disk Delay + Transfer Time
– Typical transfer rate: 10 MB/sec
– Reading a 4 KB data block takes ~0.5 ms
– Order of 1 millisecond (or less)
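The transfer-time estimate above can be checked with a few lines of Python. This is a minimal sketch: the function name `transfer_time_ms` is hypothetical, and it assumes 1 MB = 10^6 bytes and a 4 KB block = 4096 bytes. Under those assumptions the result is about 0.4 ms, which the slide rounds to ~0.5 ms.

```python
def transfer_time_ms(block_bytes: int, rate_bytes_per_sec: float) -> float:
    """Time to move one block at a sustained transfer rate, in milliseconds."""
    return block_bytes / rate_bytes_per_sec * 1000.0


# Assumed values: 4 KB block (4096 bytes), 10 MB/sec taken as 10 * 10**6 bytes/sec.
t = transfer_time_ms(4 * 1024, 10 * 10**6)
print(f"Transfer time for one 4 KB block: {t:.2f} ms")
```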

Disk Delay
Time = Disk Controller Processing Time + Disk Delay + Transfer Time
More complicated: Disk Delay = Seek Time + Rotational Latency

Seek Time
Seek time is the most critical component of Disk Delay.
Average seek times:
– Maxtor 40 GB (IDE): ~10 ms
– Western Digital 20 GB (IDE): ~9 ms
– Seagate 70 GB (SCSI): ~3.6 ms
– Maxtor 60 GB (SATA): ~9 ms
Order of 10 milliseconds

Rotational Latency
[Figure: a disk track with the head position ("Head Here") and the target block ("Block I Want") marked.]

Average Rotational Latency
Average latency is about half of the time it takes to make one revolution:
– 3600 RPM = 8.33 ms
– 5400 RPM = 5.56 ms
– 7200 RPM = 4.17 ms
– 10,000 RPM = 3.0 ms (newer drives)
Order of a few milliseconds
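The half-revolution rule can be written out directly. The sketch below is illustrative only: the function name is hypothetical, and 3600 RPM is included because it is the rotation speed consistent with the 8.33 ms figure.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full revolution, in milliseconds."""
    full_rotation_ms = 60_000.0 / rpm  # 60,000 ms per minute / revolutions per minute
    return full_rotation_ms / 2.0


for rpm in (3600, 5400, 7200, 10_000):
    print(f"{rpm:>6} RPM -> {avg_rotational_latency_ms(rpm):.2f} ms")
```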

Accessing a Disk Block: Summary
Time to access (read/write) a disk block:
– seek time (moving arms to position disk head on track)
– rotational latency (waiting for block to rotate under head)
– transfer time (actually moving data to/from disk surface)
Seek time and rotational latency dominate:
– Seek time varies from about 1 to 20 msec
– Rotational delay varies from 0 to 10 msec
– Transfer time is about 0.5 msec per 4 KB page
Key to lower I/O cost: reduce seek/rotation latency!

Example Disk Latency Problem
Calculate the minimum, maximum, and average disk latencies for reading a 4096-byte block on the same hard drive as before:
– 4 platters
– 8192 tracks
– 256 sectors/track
– 512 bytes/sector
– Disk rotates at 3840 RPM
– Seek time: 1 ms (warm-up) + 1 ms for every 500 cylinders traveled
– Gaps consume 10% of each track
Derived values:
– Reading one sector takes 0.06 ms
– A 4096-byte block is 8 sectors
– The disk makes one revolution in 1/64 of a second, so 1 rotation takes 15.6 ms
– Moving one track takes 1.002 ms; moving across all tracks takes 17.4 ms

Best Case: Minimum Latency
Assume the best case: the head is already on the block we want. In that case, the latency is just the time to read the 8 sectors of the 4096-byte block. We pass over 8 sectors and 7 gaps. That is only the "Transfer Time" ≈ 0.06 ms x 8 ≈ 0.5 ms

Worst Case: Maximum Latency
Now assume the worst case:
– The disk head is over the innermost cylinder and the block we want is on the outermost cylinder
– The block we want has just passed under the head, so we have to wait a full rotation
Time = time to move from innermost track to outermost track + time for one full rotation + time to read 8 sectors
= 17.4 ms (seek time) + 15.6 ms (one rotation) + 0.5 ms (transfer time) = 33.5 ms!!

Average Case: Average Latency
Now assume the average case:
– It will take an average amount of time to seek, and
– The block we want is ½ of a revolution away from the heads
Time = time to move over tracks + time for one-half of a rotation + time to read 8 sectors
= 9.2 ms (approximation) + 7.8 ms (half rotation) + 0.5 ms (from min latency) = 17.5 ms
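The three cases for the example disk can be reproduced numerically. The following is a sketch under stated assumptions, not the course's reference solution: all names are hypothetical, the 10% gap space is assumed to be split evenly between sectors, and the average seek is taken as the midpoint between a one-track seek and a full-stroke seek, which is one reading that reproduces the slide's 9.2 ms approximation.

```python
RPM = 3840
TRACKS = 8192
SECTORS_PER_TRACK = 256
GAP_FRACTION = 0.10
SECTORS_PER_BLOCK = 4096 // 512  # 8 sectors per 4096-byte block

rotation_ms = 60_000 / RPM  # one full revolution: 15.625 ms


def seek_ms(cylinders: int) -> float:
    """1 ms warm-up plus 1 ms per 500 cylinders traveled; no cost if the head stays put."""
    return 0.0 if cylinders == 0 else 1.0 + cylinders / 500


# Time for one sector and one gap to pass under the head
# (assumption: gaps are evenly sized, one per sector).
sector_ms = rotation_ms * (1 - GAP_FRACTION) / SECTORS_PER_TRACK
gap_ms = rotation_ms * GAP_FRACTION / SECTORS_PER_TRACK

# A block spans 8 sectors and the 7 gaps between them.
transfer_ms = SECTORS_PER_BLOCK * sector_ms + (SECTORS_PER_BLOCK - 1) * gap_ms

min_latency = transfer_ms
max_latency = seek_ms(TRACKS) + rotation_ms + transfer_ms
avg_seek_ms = (seek_ms(1) + seek_ms(TRACKS)) / 2  # midpoint approximation (assumption)
avg_latency = avg_seek_ms + rotation_ms / 2 + transfer_ms

print(f"min {min_latency:.1f} ms, max {max_latency:.1f} ms, avg {avg_latency:.1f} ms")
```

Swapping in a different average-seek model (for example, an average travel of one third of the cylinders) changes only `avg_seek_ms`; the rest of the calculation stays the same.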

Writing Blocks
Same as reading blocks …

After seeing all of this …
Which will be faster: sequential I/O or random I/O?
– Sequential I/O: reading blocks next to each other on the same track
– Sequential I/O saves seek and rotational latency times
Next question: how to organize the data to avoid or reduce random I/Os?

Accelerating Access to Blocks

Accelerating Access to Blocks
1. Placing Related Blocks on Cylinders
2. Using Multiple Disks
3. Mirroring
4. Disk Scheduling
5. Prefetching & Buffering
Note on slide: Performed by Disk Controller

1- Placing Related Blocks on Cylinders
If blocks B1, B2, B3, and B4 will be read together, put them on the same cylinder so they can be read at once. Keep additional related blocks on the next sectors of the same track.
[Figure: blocks B1, B2, B3, B4 placed on the same cylinder.]

2- Using Multiple Disks: Striping
Use multiple smaller disks instead of one large disk. Each disk can access its data independently:
– N disks → up to N times faster access
[Figure: blocks B1 through B6 spread across Disk 1, Disk 2, and Disk 3.]
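One common way to stripe is round-robin placement, sketched below. This is an assumption for illustration: the slide does not state the exact placement rule, and `locate_block` is a hypothetical helper that uses zero-based block and disk numbers.

```python
def locate_block(block_no: int, num_disks: int) -> tuple[int, int]:
    """Round-robin striping: return (disk index, position on that disk), zero-based."""
    return block_no % num_disks, block_no // num_disks


# Lay out six blocks (B1..B6) over three disks.
NUM_DISKS = 3
for block_no in range(6):
    disk, position = locate_block(block_no, NUM_DISKS)
    print(f"B{block_no + 1} -> Disk {disk + 1}, position {position}")
```

With this layout, any N consecutive blocks land on N different disks, so a sequential read of N blocks can proceed on all disks in parallel.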

3- Mirroring
Use pairs of disks that are mirrors of each other.
– Good for failure tolerance and good for faster access
– Higher overhead for write operations

4- Disk Scheduling
The disk controller may have a sequence of block requests.
– It is not necessary to serve requests in their arrival order (FIFO) → use a better scheduling policy
– Elevator & SCAN policies

4- Disk Scheduling: SCAN
When starting a sweep (inward or outward):
– Complete the sweep until the end
– Skip any requests that arrive after the sweep has started
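The sweep idea can be sketched as follows. This is a simplified illustration, not the controller's actual algorithm: the function names and the request list are hypothetical, only requests known when the sweep starts are considered, and the head is assumed to reverse at the last pending request rather than at the physical edge of the disk.

```python
def scan_order(pending: list[int], head: int, moving_up: bool = True) -> list[int]:
    """Serve requests ahead of the head in sweep order, then the rest on the way back."""
    if moving_up:
        ahead = sorted(c for c in pending if c >= head)
        behind = sorted((c for c in pending if c < head), reverse=True)
    else:
        ahead = sorted((c for c in pending if c <= head), reverse=True)
        behind = sorted(c for c in pending if c > head)
    return ahead + behind


def total_travel(order: list[int], head: int) -> int:
    """Total number of cylinders the head crosses when serving requests in this order."""
    travel = 0
    for cylinder in order:
        travel += abs(cylinder - head)
        head = cylinder
    return travel


requests = [98, 183, 37, 122, 14, 124, 65, 67]  # illustrative cylinder numbers
start = 53
sweep = scan_order(requests, start, moving_up=True)
print("SCAN order:", sweep)
print("SCAN travel:", total_travel(sweep, start))
print("FIFO travel:", total_travel(requests, start))
```

For this request list the sweep order moves the head across far fewer cylinders than serving the requests in arrival order.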

5- Prefetching & Buffering
If the DBMS can predict the sequence of accesses, it can prefetch and buffer more blocks even before they are requested.
Example:
– Have a file: sequence of blocks B1, B2, …
– Have a program: process B1, process B2, process B3, …

Naïve Single Buffer Solution
(1) Read B1 → Buffer
(2) Process data in Buffer
(3) Read B2 → Buffer
(4) Process data in Buffer
...

Cost of Naïve Solution
Say:
P = time to process one block
R = time to read in one block
n = # blocks
Single buffer time = n(P + R)

Double Buffering
[Figure sequence: blocks A through G on disk and two buffers in memory. While the program processes A, B is read into the second buffer. Once A is done, the program processes B while C is read into the first buffer.]

Cost of Double Buffering
In double buffering, R does not involve seek or latency times (except for the first block).
What is the processing time?
P = processing time/block
R = I/O time/block
n = # blocks

Cost of Double Buffering
P = processing time/block
R = I/O time/block
n = # blocks
Double buffering time = R + nP (assuming P ≥ R, so each read finishes while the previous block is still being processed)
Single buffering time = n(R + P)
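The two cost formulas can be compared with a small sketch. The function names are hypothetical, and the `max(p, r)` term is an added generalization for the case where reading is slower than processing; when P ≥ R it reduces to the slide's R + nP.

```python
def single_buffer_time(n: int, p: float, r: float) -> float:
    """Read and process strictly alternate: n * (P + R)."""
    return n * (p + r)


def double_buffer_time(n: int, p: float, r: float) -> float:
    """Reads overlap with processing: first read, then n - 1 overlapped steps, then last process.

    Each overlapped step costs max(P, R); with P >= R the total is R + n * P.
    """
    if n == 0:
        return 0.0
    return r + (n - 1) * max(p, r) + p


n, p, r = 10, 2.0, 1.0  # illustrative values with P >= R
print("single buffering:", single_buffer_time(n, p, r))
print("double buffering:", double_buffer_time(n, p, r))
```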

Accelerating Access to Blocks: Covered
1. Placing Related Blocks on Cylinders
2. Using Multiple Disks
3. Mirroring
4. Disk Scheduling
5. Prefetching & Buffering

CS4432: Database Systems II Verification & Disk Failure

Intermittent Failures
An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller.
– The controller checks whether the sector it read is good or bad
– To check that a write is correct, a read is performed afterwards
– Whether a sector is good or bad is known by the disk controller

Checksums
Each sector has some additional bits, called the checksum (or parity bits). The checksum bits are set depending on the values of the data bits stored in that sector. With checksums, a bad read is much less likely to go undetected.

Checksums
– If the data bits contain an odd number of 1's, add a parity bit of 1
– If the data bits contain an even number of 1's, add a parity bit of 0
So the total number of 1's always becomes even.
– Sequence with an odd number of 1's → parity bit: 1
– Sequence with an even number of 1's → parity bit: 0
What is the probability of not detecting a failure?
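The parity rule can be tried out on a pair of bit strings. This is a minimal sketch: the helper names and the two example sequences are illustrative and are not taken from the slide.

```python
def even_parity_bit(bits: str) -> int:
    """Return the bit that makes the total number of 1's even."""
    return bits.count("1") % 2


def with_parity(bits: str) -> str:
    """Append the parity bit to the data bits."""
    return bits + str(even_parity_bit(bits))


def looks_valid(stored: str) -> bool:
    """A stored sequence passes the check if its number of 1's is even."""
    return stored.count("1") % 2 == 0


odd_ones = "01101000"   # three 1's, so the parity bit is 1
even_ones = "11101110"  # six 1's, so the parity bit is 0
print(with_parity(odd_ones), with_parity(even_ones))

# Flipping a single bit is detected; flipping two bits is not.
stored = with_parity(odd_ones)
one_flip = "1" + stored[1:]
two_flips = "10" + stored[2:]
print(looks_valid(stored), looks_valid(one_flip), looks_valid(two_flips))
```

The last line shows the limitation of a single parity bit: any even number of bit flips leaves the parity unchanged.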

Checksums
Assume we use N parity bits. The probability of not detecting a failure is:
– 1/2^N
– E.g., for one byte of parity bits (N = 8) → 1/2^8 = 1/256

Permanent Failure
E.g., disk damage.
Use redundant disks and mirroring.