2 Review: Fig 5-11 Logical Position of Device Drivers
3 Review: Figure 5-16 Layers of I/O Systems
4 Review: Figure 5-4: Operation of a DMA transfer. The DMA controller has access to the system bus independent of the CPU.
5 5.4 Disk Hardware: Disks are characterized by the fact that reads and writes are equally fast, which makes them ideal for secondary memory. Arrays of disks are used to provide highly reliable storage. Optical disks are important for the distribution of programs, data, and movies.
6 Magnetic Disks: Some magnetic disks have little electronics and just deliver a simple serial bit stream. On these disks, the controller does most of the work. On other disks, particularly IDE (Integrated Drive Electronics) disks, the drive itself contains a microcontroller that does some of the work and allows the real controller to issue high-level commands. Overlapped seeks: a controller seeks on two or more drives at the same time. Many controllers can also read or write on one drive while seeking on one or more other drives. However, only one transfer between the controller and main memory is possible at a time.
7 Magnetic Disks: Disk parameters for the original IBM PC floppy disk and a Western Digital WD hard disk
9 Disk Hardware: (a) Physical geometry of a disk with two zones; the figure labels them 32 sectors per track and 16 sectors per track (an 18 GB hard disk, the WD18300, has 162 zones). (b) A possible virtual geometry for this disk. A mapping from virtual to real parameters is needed.
10 RAID (Redundant Array of Independent Disks): Distributing data over multiple drives is called striping. Figure: RAID levels 0 through 2 (note: there is no hierarchy among the levels); backup and parity drives are shaded. Figure annotations: disk mirroring (RAID level 1); 4 data bits plus a 3-bit Hamming code for error correction (RAID level 2).
11 RAID: RAID level 0 works best with large requests, the bigger the better. It works worst with operating systems that habitually ask for data one sector at a time. The reliability of RAID level 0 is potentially worse than that of a SLED (Single Large Expensive Disk). RAID level 1 duplicates all the disks. Write performance is no better than for a single drive, but read performance can be up to twice as good. RAID level 2 works on a word basis, possibly even a byte basis. The parity bits of the Hamming code are bits 1, 2, and 4. Imagine that the seven drives of Fig 5-19(c) were synchronized in terms of arm position and rotational position. This scheme only makes sense with a substantial number of drives (high overhead). Why?
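The RAID level 2 idea of 4 data bits protected by parity bits in positions 1, 2, and 4 can be sketched as follows. This is a minimal illustration of a 7-bit Hamming code, not the drive firmware; the function names are hypothetical, and each of the seven bits would sit on its own drive.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit word; positions 1, 2, 4 hold parity."""
    d1, d2, d3, d4 = data_bits
    word = [0] * 8  # index 0 is unused so indices match bit positions 1..7
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = word[3] ^ word[5] ^ word[7]  # covers positions 1, 3, 5, 7
    word[2] = word[3] ^ word[6] ^ word[7]  # covers positions 2, 3, 6, 7
    word[4] = word[5] ^ word[6] ^ word[7]  # covers positions 4, 5, 6, 7
    return word[1:]


def hamming74_correct(bits):
    """Return (corrected word, position of the flipped bit, or 0 if none)."""
    word = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    if syndrome:
        word[syndrome] ^= 1  # the syndrome names the damaged position
    return word[1:], syndrome


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # flip bit position 5, as if one drive returned a bad bit
fixed, position = hamming74_correct(damaged)
print(word, fixed, position)
```

Three of every seven drives hold only check bits, which is the overhead the slide refers to.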
12 RAID: Figure: RAID levels 3 through 5; backup and parity drives are shaded. Figure annotations: even or odd parity for 1-bit error correction; like RAID 0, with strip-for-strip parity written to an extra drive.
13 RAID: RAID level 3 is a simplified version of RAID level 2. RAID levels 4 and 5 work with strips again, not individual words with parity, and do not require synchronized drives. RAID level 4 performs poorly for small updates: if one sector is changed, all the drives must be read to recalculate the parity. A slight improvement is to read the old user data and the old parity and recompute the new parity from them. Either way, the load on the parity drive is heavy. RAID level 5 distributes the parity bits uniformly over all the drives. However, in the event of a drive crash, reconstructing the contents of the failed drive is complex. Skip CD-ROMs, pp. 310 to 315.
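The small-update shortcut for RAID level 4 relies on parity being a plain XOR across strips. A minimal sketch, with hypothetical function names and two-byte strips chosen only for illustration:

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_parity(data_strips):
    """Recalculate parity by reading every data strip."""
    return xor_blocks(*data_strips)


def small_write_parity(old_parity, old_data, new_data):
    """Shortcut: read only the old data and old parity, then patch the parity."""
    return xor_blocks(old_parity, old_data, new_data)


def reconstruct(surviving_strips, parity):
    """Rebuild the strip of a crashed drive from the survivors plus parity."""
    return xor_blocks(parity, *surviving_strips)


strips = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = full_parity(strips)
new_strip = b"\x00\xff"
patched = small_write_parity(parity, strips[1], new_strip)
print(patched == full_parity([strips[0], new_strip, strips[2]]))
print(reconstruct([strips[0], strips[2]], parity) == strips[1])
```

The shortcut touches two drives instead of all of them, but every write still goes through the parity drive, which is why RAID level 5 spreads the parity around.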
14 Disk Formatting: Before the disk can be used, each platter must receive a low-level format. The format consists of a series of concentric tracks, each containing some number of sectors, with short gaps between the sectors. Figure: a disk sector after low-level format (preamble, data, ECC). The preamble holds a start-of-sector bit pattern (to recognize the start of the sector), the cylinder number, and the sector number. The error-correcting code (ECC) is about 16 bytes. With the overhead of the preamble, ECC, intersector gaps, and spare sectors, the formatted capacity is often 20% less than the unformatted capacity.
15 Disk Formatting: An illustration of cylinder skew: the position of sector 0 on each track is offset from that on the previous track, so that after a track-to-track seek the head does not just miss sector 0 and have to wait a full rotation for it.
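16 Cylinder skew example: For a 10,000 rpm (rotations per minute) drive, one rotation takes 60/10,000 sec = 6 msec. If a track contains 300 sectors, a new sector passes under the head every 6 msec/300 = 20 μsec. If the track-to-track seek time is 800 μsec, then 800/20 = 40 sectors pass by during the seek, so the disk needs a cylinder skew of 40 sectors. Data rate: 300 sectors per 6 msec rotation.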
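The arithmetic of this example can be checked with a short sketch. The function names are hypothetical, and the 512-byte sector size used for the data rate is an assumption that does not appear on the slide.

```python
import math


def cylinder_skew(rpm, sectors_per_track, seek_time_us):
    """Return (rotation time in usec, time per sector in usec, skew in sectors)."""
    rotation_us = 60_000_000 / rpm
    sector_us = rotation_us / sectors_per_track
    # Round up so the head never arrives just after sector 0 has passed.
    skew = math.ceil(seek_time_us / sector_us)
    return rotation_us, sector_us, skew


def track_data_rate(rpm, sectors_per_track, sector_bytes=512):
    """Bytes per second while reading one track; sector_bytes is an assumption."""
    return sectors_per_track * sector_bytes * rpm / 60


print(cylinder_skew(10_000, 300, 800))
print(track_data_rate(10_000, 300))
```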
17 Disk Formatting: Interleaving allows time for ECC calculation and for transferring data to main memory. Figure: (a) no interleaving, (b) single interleaving, (c) double interleaving. Many modern controllers avoid the need for interleaving by buffering an entire track.
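One way to picture the three layouts is to generate the physical order of logical sectors around a track. This is a sketch only: the function name is hypothetical, and the 8-sector track is an assumed size chosen to keep the output short.

```python
def interleaved_layout(sectors_per_track, interleave):
    """Return the logical sector number stored in each physical slot.

    interleave is the number of slots skipped between consecutive
    logical sectors: 0 = none, 1 = single, 2 = double.
    """
    layout = [None] * sectors_per_track
    slot = 0
    for sector in range(sectors_per_track):
        while layout[slot] is not None:  # slot already used, slide forward
            slot = (slot + 1) % sectors_per_track
        layout[slot] = sector
        slot = (slot + interleave + 1) % sectors_per_track
    return layout


for skip in (0, 1, 2):
    print(skip, interleaved_layout(8, skip))
```

With single interleaving the controller gets one sector time between consecutive logical sectors to finish the ECC check and the memory transfer; double interleaving gives it two.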
18 Partition: Sector 0 is the master boot record, containing some boot code plus, at the end, the partition table, which gives the starting sector and size of each partition. To be able to boot from the hard disk, one partition must be marked as active in the partition table. The high-level format of each partition lays down a boot block, the free storage administration (free list or bitmap), the root directory, and an empty file system. It also puts a code in the partition table entry telling which file system is used in the partition.
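As an illustration of what the partition table holds, the sketch below decodes sector 0 under the assumption of the classic PC layout (four 16-byte entries starting at byte 446, signature bytes 0x55 0xAA at the end). The slide does not give these offsets, and the function and field names are hypothetical.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446  # assumed classic PC layout: boot code fills bytes 0..445
ENTRY_SIZE = 16


def parse_partition_table(sector0):
    """Return the non-empty partition entries found in a 512-byte sector 0."""
    if len(sector0) != SECTOR_SIZE or bytes(sector0[510:512]) != b"\x55\xaa":
        raise ValueError("not a valid master boot record")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = bytes(sector0[start:start + ENTRY_SIZE])
        # status byte, 3 skipped, file system code, 3 skipped, start, size
        status, fs_code, first, count = struct.unpack("<B3xB3xII", entry)
        if count == 0:
            continue  # unused table slot
        partitions.append({
            "active": status == 0x80,
            "fs_code": fs_code,
            "start_sector": first,
            "num_sectors": count,
        })
    return partitions


# Build a fake sector 0 with one active partition for demonstration.
sector = bytearray(SECTOR_SIZE)
sector[446:462] = struct.pack("<B3xB3xII", 0x80, 0x83, 2048, 204800)
sector[510:512] = b"\x55\xaa"
print(parse_partition_table(sector))
```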
19 Disk Arm Scheduling Algorithms: The time required to read or write a disk block is determined by three factors: (1) seek time, (2) rotational delay, (3) actual transfer time. Seek time dominates. Error checking is done by controllers. Many disk drivers maintain a table, indexed by cylinder number, with all the pending requests for each cylinder chained together in a linked list headed by the table entries.
20 Disk Arm Scheduling Algorithms: (1) First Come First Served: with the initial position at cylinder 11 and pending requests at cylinders 1, 36, 16, 34, 9, 12, in that order, the total arm motion is 111 cylinders. (2) Shortest Seek First (SSF) disk scheduling algorithm: the total is 61 cylinders. With a heavily loaded disk, the arm will tend to stay in the middle of the disk most of the time, so requests at the edges wait longer. Not fair.
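The two totals on this slide can be reproduced with a small simulation. This is a sketch with hypothetical function names; a real driver would work from its per-cylinder request table rather than a flat list.

```python
def total_motion(start, order):
    """Sum of cylinders crossed when servicing requests in the given order."""
    total, pos = 0, start
    for cylinder in order:
        total += abs(cylinder - pos)
        pos = cylinder
    return total


def fcfs(start, requests):
    """First Come First Served: service requests in arrival order."""
    order = list(requests)
    return order, total_motion(start, order)


def ssf(start, requests):
    """Shortest Seek First: always pick the pending request closest to the arm."""
    pending = list(requests)
    order, pos = [], start
    while pending:
        nearest = min(pending, key=lambda c: abs(c - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order, total_motion(start, order)


requests = [1, 36, 16, 34, 9, 12]
print(fcfs(11, requests))
print(ssf(11, requests))
```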
21 Disk Arm Scheduling Algorithms: (3) The elevator algorithm for scheduling disk requests: the total is 60 cylinders. The upper bound on the total motion is fixed (twice the number of cylinders). (4) Variation of the elevator algorithm: always scan in the same direction, to get a smaller variance in response times.
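A sketch of the elevator algorithm and its one-direction variant, using the same example as the earlier slide (arm at cylinder 11, requests at 1, 36, 16, 34, 9, 12). The function names are hypothetical, and the initial upward direction is an assumption that happens to reproduce the 60-cylinder total.

```python
def total_motion(start, order):
    """Sum of cylinders crossed when servicing requests in the given order."""
    total, pos = 0, start
    for cylinder in order:
        total += abs(cylinder - pos)
        pos = cylinder
    return total


def elevator(start, requests, direction_up=True):
    """Keep moving in one direction until no requests remain there, then reverse."""
    above = sorted(c for c in requests if c >= start)
    below = sorted((c for c in requests if c < start), reverse=True)
    order = above + below if direction_up else below + above
    return order, total_motion(start, order)


def one_way_scan(start, requests):
    """Variant: always scan upward, then jump back to the lowest pending request."""
    above = sorted(c for c in requests if c >= start)
    below = sorted(c for c in requests if c < start)
    order = above + below
    return order, total_motion(start, order)


requests = [1, 36, 16, 34, 9, 12]
print(elevator(11, requests))
print(one_way_scan(11, requests))
```

The one-way variant moves the arm further in total here, since the jump back counts as motion, but every cylinder waits at most one full sweep.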
22 Disk Arm Scheduling Algorithms: If the seek time is much faster than the rotational delay, then pending requests should be sorted by sector number, and as soon as the next sector is about to pass under the head, the arm should be zipped over to the right track to read or write it. Many disk controllers always read and cache multiple sectors in the controller’s cache memory, even when only one is requested. In its simplest mode, the cache is divided into two sections, one for reads and one for writes. The disk controller’s cache is independent of the OS’s cache. What is the difference? Other issues: multiple drives, real geometry vs. virtual geometry. Many are not actually read.
23 Error Handling: A bad sector does not correctly read back the value just written to it. If the defect cannot be covered by the ECC, the error cannot be masked. There are two approaches to bad blocks: deal with them in the controller, or deal with them in the OS.
24 Error Handling: Controller handling of bad sectors (in the initial shipment). Figure: (a) a disk track with a bad sector, (b) substituting a spare for the bad sector, (c) shifting all the sectors to bypass the bad one.
25 Error Handling: Errors can also develop after the drive is installed. If the ECC cannot handle an error, the first thing to do is try the read again. If the controller gets repeated errors on a certain sector, it can switch to a spare before the sector has died completely. Usually the scheme of Fig 5-29(b) has to be used. OS handling of bad sectors: the OS must make sure that bad sectors do not occur in any file and do not occur in the free list or bitmap. One way to do this is to create a secret file containing all the bad sectors.
26 Error Handling: Problems: (1) Backup: if the disk is backed up sector by sector rather than file by file, it is difficult to prevent read errors during backup. The only hope is that the backup program is smart enough to give up after 10 failed reads and continue with the next sector. (2) Mechanical problems: when the arm gets to its destination, the controller reads the actual cylinder number from the preamble of the next sector. If the arm is in the wrong place, a seek error has occurred. Most hard disk controllers correct seek errors automatically. Most floppy controllers leave the error to the driver, which handles it by issuing a recalibrate command to move the arm as far as it will go and reset the controller’s internal idea of the current cylinder to 0.
27 Error Handling: The disk controller chip usually has a pin that forces the controller to forget whatever it was doing and reset itself. If all else fails, the disk driver can set a bit to invoke this signal and reset the controller. In systems with real-time constraints, such as video or CD-ROM recording, recalibration inserts gaps into the bit stream and is unacceptable. Special drives that never recalibrate are available for such applications.
28 Stable Storage: Stable storage: when a write is issued, the disk either correctly writes the data or does nothing, leaving the existing data intact. Basic assumptions: (1) When a disk writes a block with an error, the error can be detected on a subsequent read by examining the ECC. (2) The probability of the same sector going bad on a second (independent) drive during a reasonable time interval is small enough to ignore. (3) The CPU can fail, in which case it just stops.
29 Stable Storage: Stable storage uses a pair of identical disks, with the corresponding blocks working together to form one error-free block. In the absence of errors, the corresponding blocks on both drives are the same. The following three operations are defined. (1) Stable writes: first write the block on drive 1, then read it back to verify it. If it was not written correctly, retry up to n times until it works. After n consecutive failures, try a spare sector until the write succeeds. After the write to drive 1 has succeeded, the corresponding block on drive 2 is written in the same way.
30 Stable Storage: (2) Stable reads: first read the block from drive 1. If the read fails with an incorrect ECC, it is tried again, up to n times. If all attempts fail, the corresponding block is read from drive 2. (3) Crash recovery: if a pair of blocks are both good and the same, nothing needs to be done. If one of them has an ECC error, the bad block is overwritten with the good one. If both are good but different, the block from drive 1 is written onto drive 2.
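The three operations can be sketched with simulated drives. Everything here is illustrative: the `Drive` class, its `corrupt` test hook, and the use of `zlib.crc32` as a stand-in for the ECC are assumptions, and the switch to a spare sector after n failed writes is left out.

```python
import zlib


class Drive:
    """Simulated drive: each block stores (data, checksum standing in for ECC)."""

    def __init__(self, nblocks):
        self.blocks = [(b"", zlib.crc32(b""))] * nblocks

    def write(self, n, data):
        self.blocks[n] = (data, zlib.crc32(data))

    def read(self, n):
        data, ecc = self.blocks[n]
        if zlib.crc32(data) != ecc:
            raise IOError("ECC error")
        return data

    def corrupt(self, n):
        """Test hook: damage the stored checksum so reads of block n fail."""
        data, ecc = self.blocks[n]
        self.blocks[n] = (data, ecc ^ 1)


def try_read(drive, n, retries):
    """Return the block contents, or None if every attempt hits an ECC error."""
    for _ in range(retries):
        try:
            return drive.read(n)
        except IOError:
            continue
    return None


def stable_write(drive1, drive2, n, data, retries=3):
    for drive in (drive1, drive2):  # drive 1 first, then drive 2
        for _ in range(retries):
            drive.write(n, data)
            if try_read(drive, n, 1) == data:  # read back to verify
                break
        else:
            raise IOError("write failed; a real driver would try a spare sector")


def stable_read(drive1, drive2, n, retries=3):
    data = try_read(drive1, n, retries)
    if data is None:
        data = try_read(drive2, n, retries)
    if data is None:
        raise IOError("both copies unreadable")
    return data


def recover(drive1, drive2, retries=3):
    for n in range(len(drive1.blocks)):
        a = try_read(drive1, n, retries)
        b = try_read(drive2, n, retries)
        if a is None and b is not None:
            drive1.write(n, b)  # drive 1 copy is bad: restore it from drive 2
        elif a is not None and a != b:
            drive2.write(n, a)  # drive 2 is bad or stale: drive 1 wins


d1, d2 = Drive(4), Drive(4)
stable_write(d1, d2, 2, b"new")
d1.corrupt(2)
print(stable_read(d1, d2, 2))
recover(d1, d2)
print(d1.read(2))
```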
31 Stable Storage: Analysis of the influence of crashes on stable writes
32 Stable Storage: Improvements: Keep track, in nonvolatile RAM, of which block was being written during a stable write, so that only one block has to be checked during recovery. The stable write can put the number of the block it is about to update in nonvolatile RAM before starting the write. If nonvolatile RAM is not available, then at the start of the stable write a fixed disk block on drive 1 is overwritten with the number of the block to be stably written. This block is then read back to verify it. Once it is correct, the corresponding block on drive 2 is written and verified. Once a day, a complete scan of both disks must be done, repairing any damage. Then even if both blocks of a pair go bad within a period of a few days, all errors are repaired correctly.