COS 318: Operating Systems Storage Devices



Today’s Topics
- Magnetic disks
- Magnetic disk performance
- Disk arrays (RAID)
- Flash memory

A Typical Magnetic Disk Controller
- External connection
  - IDE/ATA, SATA
  - SCSI, SCSI-2, Ultra SCSI, Ultra-160 SCSI, Ultra-320 SCSI
  - Fibre channel
- Cache
  - Buffer data between disk and interface
- Controller
  - Read/write operation
  - Cache replacement
  - Failure detection and recovery
[Figure: external connection, interface, DRAM cache, controller, disk]

Disk Caching
- Method
  - Use DRAM to cache recently accessed blocks
  - Most disks have 16MB
  - Some of the RAM space stores “firmware” (an embedded OS)
  - Blocks are usually replaced in LRU order
- Pros of disk caching
  - Good for reads if accesses have locality
- Cons
  - Cost
  - Need to deal with reliable writes
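The LRU replacement mentioned on this slide can be sketched in a few lines. This is only an illustration: the class name `BlockCache`, its two-block capacity, and the `fake_disk` callback are hypothetical and do not describe any real drive firmware.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU block cache (hypothetical; not a real drive's firmware)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block number -> data, least recent first

    def read(self, block_no, read_from_disk):
        """Return (data, hit) for the requested block."""
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
            return self.blocks[block_no], True
        data = read_from_disk(block_no)  # miss: go to the platters
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        return data, False


def fake_disk(block_no):
    # Stand-in for a real media read (assumption for the example).
    return f"block-{block_no}"


cache = BlockCache(capacity_blocks=2)
hits = [cache.read(n, fake_disk)[1] for n in (1, 2, 1, 3, 2)]
print(hits)
```

With only two slots, the re-read of block 1 hits, while block 2 has already been evicted by the time it is requested again, which is the locality dependence the slide points at.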

Disk Arm and Head
- Disk arm
  - A disk arm carries disk heads
- Disk head
  - Mounted on an actuator
  - Read and write on disk surface
- Read/write operation
  - Disk controller receives a command with <track#, sector#>
  - Seek the right cylinder (tracks)
  - Wait until the right sector rotates under the head
  - Perform read/write

Mechanical Components of a Disk Drive
- Tracks
  - Concentric rings around disk surface, bits laid out serially along each track
- Cylinder
  - The tracks at the same radius across all platter surfaces
  - 1000-5000 cylinders per zone, 1 spare per zone
- Sectors
  - Each track is split into arcs of the track (min unit of transfer)

Disk Sectors
- Where do they come from?
  - Formatting process
  - Logical maps to physical
- What is a sector?
  - Header (ID, defect flag, …)
  - Real space (e.g. 512 bytes)
  - Trailer (ECC code)
- What about errors?
  - Detect errors in a sector
  - Correct them with ECC
  - If not recoverable, replace it with a spare
  - Skip bad sectors in the future
[Figure: sectors i, i+1, i+2, each laid out as Hdr | 512 bytes | ECC, with one sector marked as a defect]

Disks Were Large
- First disk: IBM 305 RAMAC (1956)
  - 5MB capacity
  - 50 disks, each 24”

They Are Now Much Smaller
- Form factor: .5-1” × 4” × 5.7”; storage: 120-750GB
- Form factor: .4-.7” × 2.7” × 3.9”; storage: 60-200GB
- Form factor: .2-.4” × 2.1” × 3.4”; storage: 1GB-8GB

Areal Density vs. Moore’s Law (Mark Kryder at SNW 2006)

50 Years Later (Mark Kryder at SNW 2006)

                IBM RAMAC (1956)    Seagate Momentus (2006)   Difference
Capacity        5MB                 160GB                     32,000
Areal Density   2 Kbits/in²         130 Gbits/in²             65,000,000
Disks           50 @ 24” diameter   2 @ 2.5” diameter         1 / 2,300
Price/MB        $1,000              $0.01                     1 / 3,200,000
Spindle Speed   1,200 RPM           5,400 RPM                 5
Seek Time       600 ms              10 ms                     1 / 60
Data Rate       10 KB/s             44 MB/s                   4,400
Power           5000 W              2 W                       1 / 2,500
Weight          ~ 1 ton             4 oz                      1 / 9,000

Sample Disk Specs (from Seagate)

                                     Cheetah 15k.5                Barracuda ES.2
Capacity
  Formatted capacity (GB)            300                          1000
  Discs                              4
  Heads                              8
  Sector size (bytes)                512
Performance
  External interface                 Ultra320 SCSI, FC, S. SCSI   SATA
  Spindle speed (RPM)                15,000                       7,200
  Average latency (msec)             2.0                          4.16
  Seek time, read/write (ms)         3.5/4.0                      8.5/9.5
  Track-to-track read/write (ms)     0.2-0.4                      0.8/1.0
  External transfer (MB/sec)         300-400                      350
  Internal transfer rate (MB/sec)    89-150                       105 (sustained)
  Cache size (MB)                    16                           32
Reliability
  Recoverable read errors            1 per 10^12 bits read        1 per 10^10 bits read
  Non-recoverable read errors        1 per 10^16 bits read        1 per 10^15 bits read

Disk Performance
- Seek
  - Position heads over cylinder, typically 3.5-9.5 ms
- Rotational delay
  - Wait for a sector to rotate underneath the heads
  - Typically 8 to 4 ms per rotation (7,200 to 15,000 RPM), so ½ rotation takes 4 to 2 ms
- Transfer bytes
  - Transfer bandwidth is typically 40-125 Mbytes/sec
- Performance of transferring 1 Kbyte
  - Seek (4 ms) + half rotational delay (2 ms) + transfer (0.017 ms at 60 MB/sec)
  - Total time is 6.02 ms, or about 166 Kbytes/sec (roughly 1/360 of 60 MB/sec)!
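The arithmetic on this slide can be checked with a short sketch. It assumes the 60 MB/sec bandwidth the slide uses for comparison and treats 1 MB as 1000 KB; the function names are made up for the example.

```python
def half_rotation_ms(rpm):
    """Average rotational delay: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def access_time_ms(size_kb, seek_ms=4.0, rotation_ms=2.0, bw_mb_per_s=60.0):
    """Seek + average rotational delay + transfer time for one request."""
    # KB divided by MB/s gives milliseconds when 1 MB = 1000 KB.
    transfer_ms = size_kb / bw_mb_per_s
    return seek_ms + rotation_ms + transfer_ms


t = access_time_ms(1)            # one 1-Kbyte request
throughput_kb_s = 1 / (t / 1000)  # Kbytes per second for back-to-back requests

print(f"7,200 RPM:  {half_rotation_ms(7200):.2f} ms average latency")
print(f"15,000 RPM: {half_rotation_ms(15000):.2f} ms average latency")
print(f"1 KB access: {t:.3f} ms, {throughput_kb_s:.0f} KB/s")
```

Under these assumptions the transfer term is about 0.017 ms, so positioning accounts for essentially all of the roughly 6 ms, and the two latency values line up with the "Average latency" row of the Seagate table.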

More on Performance
- What transfer size can get 90% of the disk bandwidth?
  - Assume disk BW = 60MB/sec, ½ rotation = 2ms, average seek = 4ms
  - BW * 90% = size / (size/BW + rotation + seek)
  - size = BW * (rotation + seek) * 0.9 / 0.1 = 60MB * 0.006 * 0.9 / 0.1 = 3.24MB
- Seek and rotational times dominate the cost of small accesses
  - Disk transfer bandwidth is wasted
  - Need algorithms to reduce seek time
- Speed depends on which sectors to access

Block Size     % of Disk Transfer Bandwidth
1 Kbyte        0.28%
1 Mbyte        73.99%
3.24 Mbytes    90%
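The relation on this slide can be turned into two small helpers: one that gives the fraction of bandwidth achieved for a transfer size, and one that inverts it. The function names are hypothetical; the defaults are the slide's 60 MB/sec and 6 ms of positioning overhead.

```python
def efficiency(size_mb, bw_mb_per_s=60.0, overhead_s=0.006):
    """Fraction of the raw disk bandwidth achieved for one transfer."""
    transfer_s = size_mb / bw_mb_per_s
    return transfer_s / (transfer_s + overhead_s)


def size_for(target, bw_mb_per_s=60.0, overhead_s=0.006):
    """Transfer size (MB) needed to reach `target` (0..1) of the bandwidth."""
    return bw_mb_per_s * overhead_s * target / (1.0 - target)


print(f"{size_for(0.9):.2f} MB for 90%")
print(f"{efficiency(0.001):.2%} at 1 KB")
print(f"{efficiency(1.024):.2%} at 1024 KB")
```

The 73.99% row in the table appears to correspond to counting 1 Mbyte as 1024 Kbytes against 60,000 Kbytes/sec, which is why the last call passes 1.024; that reading is an inference from the numbers, not something the slide states.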

FIFO (FCFS) order
- Method
  - First come, first served
- Pros
  - Fairness among requests
  - In the order applications expect
- Cons
  - Arrival may be on random spots on the disk (long seeks)
  - Wild swings can happen
[Figure: head at cylinder 53, cylinders numbered up to 199; request queue 98, 183, 37, 122, 14, 124, 65, 67]
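To put a number on the "wild swings", the sketch below adds up head movement when the example queue is served strictly in arrival order. The helper name `total_seek` is made up for the illustration, and distance is measured in cylinders rather than time.

```python
def total_seek(start, order):
    """Total cylinders travelled when requests are served in the given order."""
    position, total = start, 0
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
fcfs_distance = total_seek(53, queue)  # FCFS serves the queue as it arrived
print(fcfs_distance)  # 640
```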

SSTF (Shortest Seek Time First)
- Method
  - Pick the one closest on disk
  - Rotational delay is in the calculation
- Pros
  - Try to minimize seek time
- Cons
  - Starvation
- Question
  - Can we avoid the starvation?
[Figure: head at cylinder 53, cylinders numbered up to 199; request queue 98, 183, 37, 122, 14, 124, 65, 67; service order (65, 67, 37, 14, 98, 122, 124, 183)]
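A minimal greedy sketch of SSTF reproduces the service order shown for this example. It considers cylinder distance only and ignores rotational delay, which is a simplification; `sstf` is a hypothetical helper name.

```python
def sstf(start, requests):
    """Greedy shortest-seek-first order, using cylinder distance only."""
    pending = list(requests)
    position, order = start, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


sstf_order = sstf(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(sstf_order)  # [65, 67, 37, 14, 98, 122, 124, 183]
```

Because the next request is always the nearest one, a steady stream of requests near the head can postpone a distant request indefinitely, which is the starvation the slide asks about.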

Elevator (SCAN)
- Method
  - Take the closest request in the direction of travel
  - Real implementations do not go to the end (called LOOK)
- Pros
  - Bounded time for each request
- Cons
  - Request at the other end will take a while
[Figure: head at cylinder 53, cylinders numbered up to 199; request queue 98, 183, 37, 122, 14, 124, 65, 67; service order (37, 14, 65, 67, 98, 122, 124, 183)]
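The LOOK variant can be sketched by splitting the queue around the head position and sweeping one side, then the other. The function name `look` and its `direction` parameter are assumptions made for the example; the initial downward direction matches the order given for this queue.

```python
def look(start, requests, direction="down"):
    """LOOK order: sweep one way to the last request, then reverse."""
    below = sorted((c for c in requests if c < start), reverse=True)
    above = sorted(c for c in requests if c >= start)
    return below + above if direction == "down" else above + below


look_order = look(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(look_order)  # [37, 14, 65, 67, 98, 122, 124, 183]
```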

C-SCAN (Circular SCAN)
- Method
  - Like SCAN
  - But, wrap around
  - Real implementation doesn’t go to the end (C-LOOK)
- Pros
  - Uniform service time
- Cons
  - Do nothing on the return
[Figure: head at cylinder 53, cylinders numbered up to 199; request queue 98, 183, 37, 122, 14, 124, 65, 67; service order (65, 67, 98, 122, 124, 183, 14, 37)]
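A sketch of the C-LOOK variant: serve everything at or above the head in ascending order, then jump back and serve the remainder, again ascending. The name `c_look` is hypothetical, and the sweep is assumed to run toward higher cylinder numbers as in the example.

```python
def c_look(start, requests):
    """C-LOOK order: ascending sweep, wrap around, ascending sweep again."""
    ahead = sorted(c for c in requests if c >= start)
    wrapped = sorted(c for c in requests if c < start)
    return ahead + wrapped


c_look_order = c_look(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(c_look_order)  # [65, 67, 98, 122, 124, 183, 14, 37]
```

Every request is served during a sweep in the same direction, which is where the more uniform service time comes from; the return jump itself does no useful work.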

Discussions
- Which is your favorite?
  - FIFO
  - SSTF
  - SCAN
  - C-SCAN
- Disk I/O request buffering
  - Where would you buffer requests?
  - How long would you buffer requests?

RAID (Redundant Array of Independent Disks)
- Main idea
  - Store the error correcting codes on other disks
  - General error correcting codes are too powerful
  - Use XORs or single parity
  - Upon any failure, one can recover the entire block from the spare disk (or any disk) using XORs
- Pros
  - Reliability
  - High bandwidth
- Cons
  - The controller is complex
[Figure: RAID controller attached to disks D1, D2, D3, D4, P]
P = D1 ⊕ D2 ⊕ D3 ⊕ D4
D3 = D1 ⊕ D2 ⊕ P ⊕ D4
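The two parity equations can be demonstrated on byte strings. This is a toy sketch with four-byte "blocks"; the helper name `xor_blocks` and the sample data are invented for the example.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d1, d2, d3, d4 = b"disk", b"raid", b"xors", b"data"

parity = xor_blocks(d1, d2, d3, d4)       # P  = D1 ^ D2 ^ D3 ^ D4
rebuilt = xor_blocks(d1, d2, parity, d4)  # D3 = D1 ^ D2 ^ P  ^ D4

print(rebuilt)  # b'xors'
```

XOR is its own inverse, so XOR-ing the surviving blocks with the parity cancels everything except the missing block. The same property is why single parity tolerates exactly one lost disk.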

Synopsis of RAID Levels
- RAID Level 0: Non redundant
- RAID Level 1: Mirroring
- RAID Level 2: Byte-interleaved, ECC
- RAID Level 3: Byte-interleaved, parity
- RAID Level 4: Block-interleaved, parity
- RAID Level 5: Block-interleaved, distributed parity

RAID Level 6 and Beyond
- Goals
  - Less computation and fewer updates per random write
  - Small amount of extra disk space
- Extended Hamming code
  - Remember Hamming code?
- Specialized erasure codes
  - IBM EVENODD, NetApp RAID-DP, …
- Beyond RAID-6
  - Reed-Solomon codes, using MOD 4 equations
  - Can be generalized to deal with k (>2) disk failures
[Figure: data blocks 1-15 with parity blocks A-E; a second layout extends the parity blocks to A-H]

Dealing with Disk Failures
- What failures
  - Power failures
  - Disk failures
  - Human failures
- What mechanisms required
  - NVRAM for power failures
  - Hot swappable capability
  - Monitoring hardware
  - RAID reconstruction

Next Generation: FLASH
- 1995: 16 Mb NAND flash chips
- 2005: 16 Gb NAND flash
  - Doubled each year since 1995
- Market driven by phones, cameras, iPod, …
  - Low entry cost, ~$30/chip → ~$3/chip
- 2012: 1 Tb NAND flash (Samsung prediction)
  - == 128 GB chip
  - == 1TB or 2TB “disk” for ~$400, or 128GB disk for $40, or 32GB disk for $5

Some Current FLASH Parameters
- 5,000 IOs / second
- Chip read ~ 20 MB/s, write ~ 10 MB/s
- N chips have N x bandwidth
- Latency
  - ~ 25 μs to start read, ~ 100 μs to read a “2K page”
  - ~ 2,000 μs to erase
  - ~ 200 μs to write a “2K page”
- Power ~ 1W for 8 chips and controller
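The per-chip bandwidth figures follow from the page latencies, as the rough check below shows. It assumes a "2K page" means 2 KB with 1 MB = 1000 KB, and that the ~20 MB/s read figure excludes the 25 μs start-up; both are readings of the slide rather than stated facts, and `mb_per_s` is a hypothetical helper.

```python
PAGE_KB = 2  # assumed size of a "2K page"


def mb_per_s(page_kb, latency_us):
    """Sustained rate if every page costs `latency_us` microseconds."""
    return (page_kb / 1000.0) / (latency_us / 1_000_000.0)


read_streaming = mb_per_s(PAGE_KB, 100)      # transfer time only
read_random = mb_per_s(PAGE_KB, 25 + 100)    # start-up + transfer per page
write_rate = mb_per_s(PAGE_KB, 200)

print(f"read (transfer only): {read_streaming:.0f} MB/s")
print(f"read (with start-up): {read_random:.0f} MB/s")
print(f"write:                {write_rate:.0f} MB/s")
```

Multiplying any of these by the number of chips gives the "N chips have N x bandwidth" estimate, assuming the controller can keep all chips busy in parallel.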

What’s Wrong With FLASH?
- Expensive: $/GB
  - 2x less than cheap DRAM
  - 50x more than disk today, may drop to 10x in 2012
- Limited lifetime
  - ~100k to 1M writes / page
  - Requires “wear leveling”
  - But, if you have 1B pages, then 15,000 years to “use” ½ the pages
- Current performance limitations
  - Slow to write: can only write 0’s, so erase (set all 1) then write
  - Large (e.g. 128K) segments to erase: need to manage data layout
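The "can only write 0’s" constraint is easy to see in a toy model where programming a page can clear bits but never set them. The class `FlashBlock`, its tiny page size, and the sample data are all invented for this sketch and do not model any real NAND part.

```python
class FlashBlock:
    """Toy erase block: programming ANDs data in, erase resets to all 1s."""

    def __init__(self, pages=4, page_size=4):
        self.pages = [bytearray(b"\xff" * page_size) for _ in range(pages)]
        self.erase_count = 0  # what wear leveling would try to even out

    def program(self, page_no, data):
        page = self.pages[page_no]
        for i, byte in enumerate(data):
            page[i] &= byte  # bits can go 1 -> 0, never 0 -> 1

    def erase(self):
        for page in self.pages:
            page[:] = b"\xff" * len(page)  # the whole block at once
        self.erase_count += 1


block = FlashBlock()
block.program(0, b"\x0f\x0f\x0f\x0f")
block.program(0, b"\xf0\xf0\xf0\xf0")  # overwrite without erasing
corrupted = bytes(block.pages[0])       # all zero bits, not the new data

block.erase()
block.program(0, b"\xf0\xf0\xf0\xf0")  # after an erase the write sticks
clean = bytes(block.pages[0])

print(corrupted, clean, block.erase_count)
```

Since an erase covers the whole block, updating one page in place would mean erasing its neighbours too, which is why the slide says data layout has to be managed.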

Where Is Flash Going
- Consumer products
  - Camera, cell phone, USB stick, …
- High reliability
  - Airplane, high vibration environment, dusty, …
- High performance
  - Emerging products to replace high-end storage systems with huge cache and 15,000RPM FC drives
- New flash technologies
  - No erase cycle

Summary
- Disk is complex
- Disk areal density is on Moore’s law curve
- Need large disk blocks to achieve good throughput
- OS needs to perform disk scheduling
- RAID improves reliability and throughput at a cost
- Careful designs to deal with disk failures
- Flash memory has emerged at both low end and high end and may simplify systems software