1 Data Storage (Chap. 11) Based on Hector Garcia-Molina’s slides.

Slides:



Advertisements
Similar presentations
CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.
Advertisements

- Dr. Kalpakis CMSC Dr. Kalpakis 1 Outline In implementing DBMS we need to answer How should the system store and manage very large amounts of data?
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Disks and RAID.
CS 277 – Spring 2002Notes 21 CS 277: Database System Implementation Notes 02: Hardware Arthur Keller.
CS 245Notes 21 CS 245: Database System Principles Notes 02: Hardware Hector Garcia-Molina.
Data Storage John Ortiz. Lecture 17Data Storage2 Overview  Database stores data on secondary storage  Disk has distinct storage and access characteristics.
The Memory Hierarchy fastest, perhaps 1Mb
Storage. The Memory Hierarchy fastest, but small under a microsecond, random access, perhaps 2Gb Typically magnetic disks, magneto­ optical (erasable),
CS4432: Database Systems II Data Storage - Lecture 2 (Sections 13.1 – 13.3) Elke A. Rundensteiner.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
CPSC-608 Database Systems Fall 2008 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #6.
1 Chapter 6 Storage and Multimedia: The Facts and More.
Disk Drivers May 10, 2000 Instructor: Gary Kimura.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Professor Elke A. Rundensteiner.
1 Storage Hierarchy Cache Main Memory Virtual Memory File System Tertiary Storage Programs DBMS Capacity & Cost Secondary Storage.
1 CS143: Disks and Files. 2 System Architecture CPU Main Memory Disk Controller... Disk Word (1B – 64B) ~ x GB/sec Block (512B – 50KB) ~ x MB/sec System.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CS4432: Database Systems II Lecture 2 Timothy Sutherland.
CPSC-608 Database Systems Fall 2009 Instructor: Jianer Chen Office: HRBB 309B Phone: Notes #5.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
Avishai Wool lecture Introduction to Systems Programming Lecture 9 Input-Output Devices.
Secondary Storage CSCI 444/544 Operating Systems Fall 2008.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
CS 4432lecture #31 CS4432: Database Systems II Lecture #3 Using the Disk, and Disk Optimizations Professor Elke A. Rundensteiner.
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT.
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
Secondary Storage Management Hank Levy. 8/7/20152 Secondary Storage • Secondary Storage is usually: –anything outside of “primary memory” –storage that.
1 Introduction to Computers Day 4. 2 Storage device A functional unit into which data can be –placed –retained(stored) –retrieved(accessed)
Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 5 – Storage Organization.
CS4432: Database Systems II Data Storage (Better Block Organization) 1.
CS 346 – Chapter 10 Mass storage –Advantages? –Disk features –Disk scheduling –Disk formatting –Managing swap space –RAID.
Chapter 2 Data Storage How does a computer system store and manage very large volumes of data ?
Chapter 3 Data Storage. Media Storage Main memory (Electronic Memory): Stores data currently being used Is made of semiconductor chips. Secondary Memory.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 29 Database Systems II Secondary Storage.
IT 344: Operating Systems Winter 2010 Module 13 Secondary Storage Chia-Chi Teng CTB 265.
Chapter 111 Chapter 11: Hardware (Slides by Hector Garcia-Molina,
Disks Chapter 5 Thursday, April 5, Today’s Schedule Input/Output – Disks (Chapter 5.4)  Magnetic vs. Optical Disks  RAID levels and functions.
Chapter 2. Data Storage Chapter 2.
Physical Storage Organization. Advanced DatabasesPhysical Storage Organization2 Outline Where and How data are stored? –physical level –logical level.
11.1Database System Concepts. 11.2Database System Concepts Now Something Different 1st part of the course: Application Oriented 2nd part of the course:
CS4432: Database Systems II Data Storage 1. Storage in DBMSs DBMSs manage large amounts of data How does a DBMS store and manage large amounts of data?
CPSC-608 Database Systems Fall 2015 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #5.
CS 101 – Sept. 28 Main vs. secondary memory Examples of secondary storage –Disk (direct access) Various types Disk geometry –Flash memory (random access)
Section 13.2 – Secondary storage management (Former Student’s Note)
DBMS 2001Notes 2: Hardware1 Principles of Database Management Systems Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
CPSC 231 Secondary storage (D.H.)1 Learning Objectives Understanding disk organization. Sectors, clusters and extents. Fragmentation. Disk access time.
1 CSE232A: Database System Principles Hardware. Data + Indexes Database System Architecture Query ProcessingTransaction Management SQL query Parser Query.
COSC 6340: Disks 1 Disks and Files DBMS stores information on (“hard”) disks. This has major implications for DBMS design! » READ: transfer data from disk.
CPS216: Advanced Database Systems Notes 03: Data Access from Disks Shivnath Babu.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
CS 245Notes 21 Database System Principles Notes 02: Hardware.
1 Query Processing Exercise Session 1. 2 The system (OS or DBMS) manages the buffer Disk B1B2B3 Bn … … Program’s private memory An application program.
File organization Secondary Storage Devices Lec#7 Presenter: Dr Emad Nabil.
Computer Science 210 Computer Organization
CS 554: Advanced Database System Notes 02: Hardware
CPSC-608 Database Systems
Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin
CSE 451: Operating Systems Winter 2006 Module 13 Secondary Storage
CSE 451: Operating Systems Autumn 2003 Lecture 12 Secondary Storage
CSE 451: Operating Systems Winter 2007 Module 13 Secondary Storage
CSE 451: Operating Systems Secondary Storage
CSE 451: Operating Systems Winter 2003 Lecture 12 Secondary Storage
CSE 451: Operating Systems Spring 2005 Module 13 Secondary Storage
CSE451 File System Introduction and Disk Drivers Autumn 2002
CSE 451: Operating Systems Autumn 2004 Secondary Storage
CSE 451: Operating Systems Winter 2004 Module 13 Secondary Storage
CPS216: Advanced Database Systems Notes 04: Data Access from Disks
CS 245: Database System Principles Notes 02: Hardware
Presentation transcript:

1 Data Storage (Chap. 11) Based on Hector Garcia-Molina’s slides

2 Outline Hardware: Disks Access Times Example - Megatron 747 Optimizations Other Topics: –Storage costs –Using secondary storage –Disk failures*

3 Hardware DBMS Data Storage

4 P MC Typical Computer Secondary Storage...

5 Processor Fast, slow, reduced instruction set, with cache, pipelined… Speed: 100  500  1000 MIPS Memory Fast, slow, non-volatile, read-only,… Access time:  sec. 1  s  1 ns

6 Secondary storage Many flavors: - Disk: Floppy (hard, soft) Removable Packs Winchester Ram disks Optical, CD-ROM… Arrays - TapeReel, cartridge Robots

7 Focus on: “Typical Disk” Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap …

8 Top View

9 “Typical” Numbers Diameter: 1 inch  15 inches Cylinders:100  2000 Surfaces:1 (CDs)  (Tracks/cyl) 2 (floppies)  30 Sector Size:512B  50K Capacity:360 KB (old floppy)  4000 GB

10

11 Focus on: “Typical Disk” Terms: Platter, Head, Actuator Cylinder, Track Sector (physical), Block (logical), Gap …

12 Summary of Disk Geometry…

13 Disk Access Time block x in memory ? I want block X

14 Disk latency = Seek Time + Rotational Delay + Transfer Time + Other

15 Seek Time 3 or 20x x 1N Cylinders Traveled Time

16 Average Random Seek Time   SEEKTIME (i  j) S = N(N-1) N N i=1 j=1 j  i “Typical” S: 10 ms  40 ms

17 Rotational Delay Head Here Block I Want

18 Average Rotational Delay R = 1/2 revolution “typical” R = 8.33 ms (3600 RPM)

19 Transfer Rate: t “typical” t: 1  3 MB/second transfer time: block size t

20 Other Delays CPU time to issue I/O Contention for disk controller Contention for bus, memory “Typical” Value: 0

Conference leave –Feb. 18 (Thursday) –Feb. 24 (Tuesday) 21

22 So far: Random Block Access What about: Reading “Next” block?

23 If we do things right Time to get = Block Size + Negligible block t - skip gap - switch track - once in a while, next cylinder

24 Rule ofRandom I/O: Expensive Thumb Sequential I/O: Much less Exp:1 KB Block »Random I/O:  20 ms. »Sequential I/O:  1 ms.

25 Cost for Writing is similar to Reading …. unless we want to verify! need to add (full) rotation + Block size t

26 To Modify a Block? To Modify Block: (a) Read Block (b) Modify in Memory (c) Write Block [(d) Verify?]

27 Block Address: Physical Device Cylinder # Surface # Sector

28 Complication: Bad Blocks Messy to handle May map via software to integer sequence 1 2.Map Actual Block Addresses. m

in diameter 3600 RPM 1 surface 16 MB usable capacity (16 X 2 20 ) 128 cylinders seek time: average = 25 ms. adjacent cyl = 5 ms. An Example Megatron 747 Disk (old)

30 1 KB block = 1 sector 10% overhead between blocks capacity = 16 MB = (2 20 )16 = 2 24 # cylinders = 128 = 2 7 bytes/cyl = 2 24 /2 7 = 2 17 = 128 KB blocks/cyl = 128 KB / 1 KB = 128

RPM 60 revolutions / sec 1 rev. = msec. One track:... Time over useful data:(16.66)(0.9)=14.99 ms. Time over gaps: (16.66)(0.1) = 1.66 ms. Transfer time 1 block = 14.99/128=0.117 ms. Trans. time 1 block+gap=16.66/128=0.13ms.

32 Burst Bandwith (BB) 1 KB in ms. BB = 1/0.117 = 8.54 KB/ms or BB =8.54KB/ms x 1000 ms/1sec x 1MB/1024KB = 8540/1024 = 8.33 MB/sec

33 Sustained bandwith (over track) 128 KB in ms. SB = 128/16.66 = 7.68 KB/ms or SB = 7.68 x 1000/1024 = 7.50 MB/sec.

34 T 1 = Time to read one random block T 1 = seek + rotational delay + TT = 25 + (16.66/2) = ms.

35 Suppose OS deals with 4 KB blocks T 4 = 25 + (16.66/2) + (.117) x 1 + (.130) X 3 = ms [Compare to T 1 = ms] block

36 T T = Time to read a full track (start at any block) T T = 25 + (0.130/2) * = ms to get to first block * Actually, a bit less; do not have to read last gap.

37

38 The NEW Megatron 747 (pp. 518 and pp. 521) 8 platters, 16 Surfaces, 3.5 Inch diameter 2 14 = 16,384 Tracks/surface 128 Sectors/track 2 12 = 4096 Bytes/sector Gaps represent 10% of the track.

39 The NEW Megatron 747 (pp. 518 and pp. 521) The disk rotates at 7200 RPM; i.e., it makes one rotation in 8.33 millisecond. To move the head assembly between cylinders takes one millisecond to start and stop. It takes one millisecond for every 1000 cylinders traveled.

40 The minimum, Maximum, and average times to read a 16,384-byte block?

41 Minimum and Maximum Time

42 Average Time

43 Assignment 3 is coming soon!

44 8 GB Disk If all tracks have 256 sectors Outermost density: 100,000 bits/inch Inner density: 250,000 bits/inch 1.

45 Outer third of tracks: 320 sectors Middle third of tracks: 256 Inner third of tracks: 192 Density: 114,000  182,000 bits/inch

46 Timing for new Megatron 747 (Ex 2.3) Time to read 4096-byte block: –MIN: 0.5 ms –MAX: 33.5 ms –AVE: 14.8 ms

47

48 Outline Hardware: Disks Access Times Example: Megatron 747 Optimizations Other Topics –Storage Costs –Using Secondary Storage –Disk Failures here

49 Optimizations (in controller or O.S.) Disk Scheduling Algorithms –e.g., elevator algorithm Track (or larger) Buffer Pre-fetch Arrays* Mirrored Disks On Disk Cache

50 Double Buffering Problem: Have a File » Sequence of Blocks B1, B2… Have a Program » Process B1 » Process B2 » Process B3...

51 Single Buffer Solution (1) Read B1  Buffer (2) Process Data in Buffer (3) Read B2  Buffer (4) Process Data in Buffer...

52 SayP = time to process/block R = time to read in 1 block n = # blocks Single buffer time = n(P+R)

53 Double Buffering Memory: Disk: ABCDGEFA B done process A C B done

54 Say P  R What is processing time? P = Processing time/block R = IO time/block n = # blocks Double buffering time = R + nP Single buffering time = n(R+P)

55 Disk Arrays RAIDs (various flavors) Block Striping Mirrored logically one disk

56 On Disk Cache P MC... cache

57 Block Size Selection? Big Block  Amortize I/O Cost Big Block  Read in more useless stuff! and takes longer to read Unfortunately...

58

59 Trend As memory prices drop, blocks get bigger... Trend

60 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape typical capacity (bytes) from Gray & Reuter

61 Storage Cost access time (sec) cache electronic main electronic secondary magnetic optical disks online tape nearline tape & optical disks offline tape dollars/MB from Gray & Reuter

62 Using secondary storage effectively Example: Sorting data on disk Conclusion: –I/O costs dominate –Design algorithms to reduce I/O Also: How big should blocks be?

63 Disk Failures Partial  Total Intermittent  Permanent

64 Coping with Disk Failures Detection –e.g. Checksum Correction  Redundancy

65 At what level do we cope? Single Disk –e.g., Error Correcting Codes (pp. 562) Disk Array Logical Physical

66 Operating System e.g., Stable Storage Logical BlockCopy A Copy B

67 Database System e.g., Log Current DBLast week’s DB

68 Summary Secondary storage, mainly disks I/O times I/Os should be avoided, especially random ones….. Summary

69 Outline Hardware: Disks Access Times Example: Megatron 747 Optimizations Other Topics –Storage Costs –Using Secondary Storage –Disk Failures here