Presentation on theme: "Input/Output III Disks CS 423, Fall 2007 Klara Nahrstedt/Sam King 6/3/20141."— Presentation transcript:
Input/Output III Disks CS 423, Fall 2007 Klara Nahrstedt/Sam King 6/3/20141
Administrative MP3 – Deadline, November 5, 8am Re-grading of MP2, HW1, Midterm is closed 6/3/20142
3 Data Centers Core of enterprise computing – A cluster of specialized servers – Multiple tiers Storage Database File Applications Web
6/3/ Data Path Disks, storage server, and database server Widely used caching memory – Large access speed gaps – Different sizes – Various granularity >300 cycles ~125 ns 14 cycles 6 ns ~40 us > 5 ms
6/3/ Two Performance Trends The gaps are increasing large Source: Zhifeng Chen, Optimization of Data Access for Database Applications, PhD Thesis, 2005, UIUC
6/3/20146 Disks First, Then File Systems Form factor: Storage: 18-73GB Form factor: Storage: 4-27GB Form factor: Storage: 170MB-1GB
6/3/20147 Disk Technology Trends Disks are getting smaller for similar capacity – Spin faster, less rotational delay, higher bandwidth – Less distance for head to travel (faster seeks) – Lighter weight (for portables) Disk data is getting denser – More bits/square inch – Tracks are closer together – Doubles density every 18 months Disks are getting cheaper ($/MB) – Factor of ~2 per year since 1991 – Head close to surface
6/3/20148 Disk Organization Disk surface – Circular disk coated with magnetic material Tracks – Concentric rings around disk surface, bits laid out serially along each track Sectors – Each track is split into arc of track (min unit of transfer) sector
6/3/20149 More on Disks CDs and floppies come individually, but magnetic disks come organized in a disk pack Cylinder – Certain track of the platter Disk arm – Seek the right cylinder seek a cylinder
6/3/ Disk Examples ( Summarized Specs )
6/3/ Disk Performance Seek – Position heads over cylinder, typically ms Rotational delay – Wait for a sector to rotate underneath the heads – Typically ms (7,200 – 10,000RPM) or ½ rotation takes ms Transfer bytes – Average transfer bandwidth (15-37 Mbytes/sec) Performance of transfer 1 Kbytes – Seek (5.3 ms) + half rotational delay (3ms) + transfer (0.04 ms) – Total time is 8.34ms or 120 Kbytes/sec! What block size can get 90% of the disk transfer bandwidth?
6/3/ Disk Behaviors There are more sectors on outer tracks than inner tracks – Read outer tracks: 37.4MB/sec – Read inner tracks: 22MB/sec Seek time and rotational latency dominate the cost of small reads – A lot of disk transfer bandwidth is wasted – Need algorithms to reduce seek time Block Size (Kbytes) % of Disk Transfer Bandwidth 1Kbytes0.5% 8Kbytes3.7% 256Kbyte s 55% 1Mbytes83% 2Mbytes90%
6/3/ Observations Getting first byte from disk read is slow – high latency Peak bandwidth high, but rarely achieved Need to mitigate disk performance impact – Do extra calculations to speed up disk access Schedule requests to shorten seeks – Move some disk data into main memory – file system caching
6/3/ RAID Use parallel processing to speed up CPU performance Use parallel I/O to improve disk performance, reliability (1988, Patterson) Design new class of I/O devices called RAID – Redundant Array of Inexpensive Disks (also Redundant Array of Independent Disks) Use the RAID in OS as a SLED (Single Large Expensive Disk), but with better performance and reliability
6/3/ RAID (cont.) RAID consists of RAID SCSI controller plus a box of SCSI disks Data are divided into strips and distributed over disks for parallel operation RAID 0 … RAID 5 levels RAID 0 organization writes consecutive strips over the drives in round-robin fashion – operation is called striping RAID 1 organization uses striping and duplicates all disks RAID 2 uses words, even bytes and stripes across multiple disks; uses error codes, hence very robust scheme RAID 3, 4, 5 alterations of the previous ones
6/3/ Linux Kernel
Kernel Components Affected by Block Device Op. 6/3/ Block Device Driver Block Device Driver I/O Scheduler Layer Generic Block Layer Disk Filesystem Disk Filesystem Disk Filesystem Mapping Layer Disk Caches VFS
6/3/ Disk Scheduling Which disk request is serviced first? – FCFS – Shortest seek time first – Elevator (SCAN) – LOOK – C-SCAN (Circular SCAN) – C-LOOK Looks familiar?
6/3/ FIFO (FCFS) order Method – First come first serve Pros – Fairness among requests – In the order applications expect Cons – Arrival may be on random spots on the disk (long seeks) – Wild swing can happen Analogy: – Can elevator scheduling use FCFS? , 183, 37, 122, 14, 124, 65, 67 53
6/3/ SSTF ( Shortest Seek Time First ) Method – Pick the one closest on disk – Rotational delay is in calculation Pros – Try to minimize seek time Cons – Starvation Question – Is SSTF optimal? – Can we avoid starvation? Analogy: elevator , 183, 37, 122, 14, 124, 65, 67 (65, 67, 37, 14, 98, 122, 124, 183) 53
6/3/ Elevator (SCAN) Method – Take the closest request in the direction of travel – Real implementations do not go to the end (called LOOK) Pros – Bounded time for each request Cons – Request at the other end will take a while , 183, 37, 122, 14, 124, 65, 67 (37, 14, 0, 65, 67, 98, 122, 124, 183) 53
6/3/ C-SCAN (Circular SCAN) Method – Like SCAN – But, wrap around – Real implementation doesnt go to the end (C-LOOK) Pros – Uniform service time Cons – Do nothing on the return , 183, 37, 122, 14, 124, 65, 67 (65, 67, 98, 122, 124, 183, 199, 0, 14, 37) 53
6/3/ LOOK and C-LOOK SCAN and C-SCAN move the disk arm across the full width of the disk In practice, neither algorithm is implemented this way More commonly, the arm goes only as far as the final request in each direction. Then, it reverses direction immediately, without first going all the way to the end of the disk. These versions of SCAN and C-SCAN are called LOOK and C-LOOK
6/3/ Group Discussion Questions The disk scheduling algorithm that may cause starvation is: – FCFS or SSTF or C-SCAN or LOOK ?? From the list of disk-scheduling algorithms (FCFS, SSTF, SCAN, C- SCAN, LOOK, C-LOOK), SSTF will always give the least head movement for any set of cylinder-number requests to the disk scheduler: – True or False ?? The cylinder numbers on a disk are 0,1,…10. Currently, there are five cylinder requests on the disk scheduler queue in the following order: 1,5,4,8,7 and the head is located at position 2 and moving in the direction of increasing block numbers. The time to serve a request is proportional to the distance from the head to the cylinder number requested. If T(X) is the time it takes to service the requests currently in the queue using scheduling algorithm X, then: – T(SSTF) < T(SCAN) < T(FCFS) or – T(FCFS) < T(SSTF) < T(SCAN) or – T(SSTF) < T(FCFS) < T(SCAN) or – None of the above???
6/3/ History of Disk-related Concerns When memory was expensive – Do as little bookkeeping as possible When disks were expensive – Get every last sector of usable space When disks became more common – Make them much more reliable When processor got much faster – Make them appear faster
6/3/ Disk Versus Memory Memory Latency in 10s of processor cycles Transfer rate 300+MB/s Contiguous allocation gains ~10x Disk Latency in milliseconds Transfer rate 5-50MB/s Contiguous allocation gains ~1000x
6/3/ On-Disk Caching Method – Put RAM on disk controller to cache blocks Seagate ATA disk has.5MB, IBM Ultra160 SCSI has 16MB Some of the RAM space stores firmware (an OS) – Blocks are replaced usually in LRU order Pros – Good for reads if you have locality Cons – Expensive – Need to deal with reliable writes
6/3/ Disk Block Caches Main memory rather than disk may hold disk blocks 85% or more of all I/O requests by file system and applications can be satisfied by disk block cache BSD UNIX provides a disk block cache as part of the block-oriented device software layer It consists of between 100 and 1000 individual buffers.
6/3/ CD-ROM Compact Disk – Read Only Memory, Optical Disk – 1980 – Philips and Sony developed CD CD is prepared (WRITE OPERATION) – using a high-power infrared laser to burn 0.8 micron diameter holes in a coated glass master disk; – from the master disk, mold is created, processed and reflective layer is deposited on polycarbonate; – depressions on the polycarbonate substrate are called pits, the unburned areas between pits are called lands CD is read (READ OPERATION) – Low-power laser diode shines infrared light with a wavelength of 0.78 micron on pits and lands as they stream by. – Laser is on the polycarbonate side, so pits stick out toward the laser as bumps in the flat surface. – Pits and lands return different light to the players photodetector (pit returns less light than light bouncing off land), hence the player tells pit from land. Pit length is 0.6 micrometers. – Pit/land and land/pit transitions represent 1, absence of transition is 0. CD-Recordable, CD-Rewritables, DVD
6/3/ Disk I/O Summary Disk is an important I/O device Disk must be fast and reliable Error Handling is important; check checksum, called ECC (Error Correcting Code) – Track a bad sector – Substitute a spare for the bad sector – Shift al sectors to bypass the bad one RAID protects against few bad sectors, but does not protect against write errors laying down bad data Stable Storage may be needed