I/O and RAID
Input/Output
5.1 Principles of I/O hardware
5.2 Principles of I/O software
5.3 I/O software layers
5.4 Disks
5.5 Clocks
5.6 Character-oriented terminals
5.7 Graphical user interfaces
5.8 Network terminals
5.9 Power management
Device Controllers
I/O devices have two components: a mechanical component and an electronic component. The electronic component is the device controller, which may be able to handle multiple devices.
The controller's tasks: convert the serial bit stream into a block of bytes, perform error correction as necessary, and make the block available to main memory.
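Memory-Mapped I/O: merits
All driver code can be written in C, because device registers are ordinary memory addresses; ordinary instructions such as test&set can operate directly on a control register.
Without memory mapping, control registers sit in a separate I/O port space (in the hybrid scheme the data buffer is in memory but the control registers are on ports) and can only be reached with special instructions: IN REG,PORT and OUT PORT,REG.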
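A minimal sketch of why memory mapping lets a driver be plain C: the device's registers are reached through an ordinary pointer. The register layout (struct dev_regs, the STATUS_READY and CTRL_GO bits) is hypothetical, and the "device" is simulated by a static variable; on real hardware the pointer would hold the register block's bus address. volatile makes every access actually reach the register instead of a cached copy in a CPU register.

```c
#include <stdint.h>

/* Hypothetical register block of a simple output device. */
struct dev_regs {
    volatile uint8_t status;   /* bit 0: device ready (hypothetical) */
    volatile uint8_t control;  /* bit 0: start output (hypothetical) */
    volatile uint8_t data;     /* byte to output */
};

#define STATUS_READY 0x01u
#define CTRL_GO      0x01u

/* Simulated device; on real hardware this would be a fixed address. */
static struct dev_regs fake_device = { STATUS_READY, 0, 0 };
static struct dev_regs *const dev = &fake_device;

/* Ordinary loads and stores reach the registers: no IN/OUT needed. */
int dev_put_byte(uint8_t byte)
{
    if (!(dev->status & STATUS_READY))
        return -1;                 /* device busy */
    dev->data = byte;
    dev->control |= CTRL_GO;       /* tell the device to start */
    return 0;
}
```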
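Memory-Mapped I/O: demerits
Caching I/O registers can be disastrous: the CPU could keep reading a stale cached copy of a device register, so caching must be selectively disabled for those pages.
Complex bus architectures: with a dedicated high-speed memory bus (case (b)), I/O devices cannot see addresses sent over the memory bus, which makes memory-mapped I/O more difficult.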
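Direct Memory Access (DMA)
Does the DMA controller use physical or virtual addresses? Most DMA controllers use physical addresses, so the operating system must translate the virtual address of the buffer into a physical address before programming the controller.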
Goals of I/O Software (1)
Device independence: programs can access any I/O device without specifying the device in advance (floppy, hard drive, or CD-ROM).
Uniform naming: the name of a file or device is a string or an integer and does not depend on which machine the device is on.
Error handling: errors should be handled as close to the hardware as possible.
Synchronous vs. asynchronous transfers.
Buffered or not.
Goals of I/O Software (2)
Synchronous vs. asynchronous transfers: blocking transfers vs. interrupt-driven ones.
Buffering: data coming off a device often cannot be stored directly in its final destination.
Sharable vs. dedicated devices: disks are sharable; tape drives are not.
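Programmed I/O
Writing a string to the printer using programmed I/O:

    copy_from_user(buffer, p, count);         /* p is the kernel buffer */
    for (i = 0; i < count; i++) {             /* loop over every character */
        while (*printer_status_register != READY)
            ;                                 /* polling: busy-wait until ready */
        *printer_data_register = p[i];        /* output one character */
    }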
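The polling loop can be exercised outside a kernel with a simulated printer. In this sketch read_status and write_data stand in for the two device registers, and the printer is hypothetically ready only on every third status read; the point is that the CPU spends all those extra reads busy-waiting.

```c
#include <stddef.h>

enum { BUSY = 0, READY = 1 };

static int poll_count;             /* how many times the status was read */
static char printed[64];           /* what the simulated printer received */
static size_t printed_len;

/* Simulated status register: READY on every third read (hypothetical). */
static int read_status(void)
{
    return (++poll_count % 3 == 0) ? READY : BUSY;
}

/* Simulated data register. */
static void write_data(char c)
{
    if (printed_len < sizeof printed - 1)
        printed[printed_len++] = c;
}

/* Programmed I/O: busy-wait on the status before every character. */
void print_polled(const char *p, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        while (read_status() != READY)
            ;                      /* polling */
        write_data(p[i]);
    }
    printed[printed_len] = '\0';
}
```

Printing 8 characters costs 24 status reads here, 16 of them wasted.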
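Interrupt-Driven I/O
(a) The system call copies the user's data into the kernel, starts output of the first character, and ends by calling the scheduler, so another process runs while the printer works.
(b) When the printer is ready for the next character it raises an interrupt. The interrupt service procedure outputs the next character or, once all characters are done, unblocks the user process.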
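A sketch of the two halves, (a) the system call and (b) the interrupt service procedure, with the printer and the scheduler simulated. All names are hypothetical, and the test plays the role of the hardware by calling printer_isr once per "ready" interrupt.

```c
#include <stddef.h>
#include <string.h>

static char kbuf[64];              /* kernel copy of the user's data */
static size_t count, idx;          /* characters remaining / next index */
static char printer_out[64];       /* what the simulated printer received */
static size_t out_len;
static int user_blocked;

static void printer_write(char c) { printer_out[out_len++] = c; }

/* (a) System call: assumes 1 <= n <= sizeof kbuf. Copies the data,
   prints the first character and blocks the caller; a real kernel
   would now call the scheduler. */
void syscall_print(const char *user_buf, size_t n)
{
    memcpy(kbuf, user_buf, n);
    count = n;
    idx = 0;
    user_blocked = 1;
    printer_write(kbuf[idx++]);
    count--;
}

/* (b) Interrupt service procedure: one call per printer interrupt. */
void printer_isr(void)
{
    if (count == 0) {
        user_blocked = 0;          /* all done: unblock the user */
    } else {
        printer_write(kbuf[idx++]);
        count--;
    }
}
```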
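I/O Using DMA
(a) The system call copies the data into the kernel, sets up the DMA controller, and calls the scheduler.
(b) The DMA controller transfers the whole buffer without the CPU and interrupts once at the end; the interrupt handler acknowledges the interrupt and unblocks the user.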
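The contrast with interrupt-driven I/O is the number of interrupts: one per transfer instead of one per character. This sketch simulates a DMA controller as a struct of "registers" plus a dma_run function that plays the hardware; the layout and names are hypothetical, and real controllers are programmed with physical addresses.

```c
#include <stddef.h>

/* Hypothetical DMA controller registers (simulated in memory). */
struct dma_ctrl {
    const char *src;       /* memory address to read from */
    char *dst;             /* destination (simulated device buffer) */
    size_t count;          /* bytes left to transfer */
    void (*irq)(void);     /* interrupt raised on completion */
};

static int interrupts_taken;
static int user_blocked;

/* (b) Interrupt handler: runs once per transfer, not once per byte. */
static void dma_done_isr(void)
{
    interrupts_taken++;
    user_blocked = 0;      /* acknowledge and unblock the user */
}

/* (a) System call side: program the controller, then block. */
void start_dma(struct dma_ctrl *c, const char *buf, char *dst, size_t n)
{
    c->src = buf;
    c->dst = dst;
    c->count = n;
    c->irq = dma_done_isr;
    user_blocked = 1;      /* a real kernel would call the scheduler */
}

/* Simulated hardware: the controller moves every byte by itself,
   then interrupts the CPU once. */
void dma_run(struct dma_ctrl *c)
{
    while (c->count > 0) {
        *c->dst++ = *c->src++;
        c->count--;
    }
    c->irq();
}
```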
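I/O Software Layers
Layers of the I/O software system, from top to bottom: user-level I/O software, device-independent operating system software, device drivers, interrupt handlers, hardware.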
Interrupt Handlers (1)
1. Save the registers not already saved by the interrupt hardware.
2. Set up a context (MMU) for the interrupt service procedure.
3. Set up a stack for the interrupt service procedure.
4. Acknowledge the interrupt controller and re-enable interrupts.
5. Copy the registers from where they were saved.
6. Run the service procedure.
7. Set up the MMU context for the process to run next.
8. Load the new process's registers.
9. Start running the new process.
The interrupt handler is split into a top half, which does the urgent work immediately, and a bottom half, which does the rest later.
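The top half / bottom half split can be sketched with a small ring buffer: the top half only grabs the data and queues it, and the bottom half drains the queue later. This is a common design (Linux uses it, for example), but the names, the queue size, and the "checksum" that stands in for slow processing are all hypothetical.

```c
#include <stddef.h>

#define QSIZE 16                   /* holds at most QSIZE - 1 items */

static unsigned char queue[QSIZE];
static size_t head, tail;          /* head: next to read, tail: next to write */
static unsigned long checksum;     /* stands in for the slow processing */

/* Top half: runs in interrupt context, does the minimum, returns fast. */
int top_half(unsigned char byte_from_device)
{
    size_t next = (tail + 1) % QSIZE;
    if (next == head)
        return -1;                 /* queue full: byte dropped */
    queue[tail] = byte_from_device;
    tail = next;
    return 0;
}

/* Bottom half: runs later with interrupts enabled, drains the queue. */
size_t bottom_half(void)
{
    size_t handled = 0;
    while (head != tail) {
        checksum += queue[head];
        head = (head + 1) % QSIZE;
        handled++;
    }
    return handled;
}
```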
Device Drivers
Device Drivers in User Space
Advantages: all user-level utilities, such as libraries and debuggers, can be used; an error affects only the driver, not the whole kernel; installation is easy (dynamic and static).
Disadvantages: interrupts are not available; direct access to memory is not possible; it is slow (user/kernel switching); drivers are usually not reentrant.
Device-Independent I/O Software (1)
Functions of the device-independent I/O software:
Uniform interfacing for device drivers
Buffering
Error reporting
Allocating and releasing dedicated devices
Providing a device-independent block size
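Uniform interfacing for device drivers is often done with a table of function pointers that every driver fills in, so the device-independent layer can call any driver the same way. In this sketch struct driver_ops, the in-memory "ramdisk" driver, and dev_copy_block are hypothetical examples, not any real kernel's API.

```c
#include <stddef.h>
#include <string.h>

/* Interface every driver must provide (hypothetical). */
struct driver_ops {
    const char *name;
    int (*read)(size_t block, char *buf, size_t len);
    int (*write)(size_t block, const char *buf, size_t len);
};

#define RAM_BLOCKS 4
#define BLOCK_SIZE 16
static char ram[RAM_BLOCKS][BLOCK_SIZE];

static int ram_read(size_t block, char *buf, size_t len)
{
    if (block >= RAM_BLOCKS || len > BLOCK_SIZE) return -1;
    memcpy(buf, ram[block], len);
    return 0;
}

static int ram_write(size_t block, const char *buf, size_t len)
{
    if (block >= RAM_BLOCKS || len > BLOCK_SIZE) return -1;
    memcpy(ram[block], buf, len);
    return 0;
}

/* One concrete driver filling in the table. */
const struct driver_ops ramdisk_driver = { "ramdisk", ram_read, ram_write };

/* Device-independent code: knows only the interface, not the device. */
int dev_copy_block(const struct driver_ops *d, size_t from, size_t to)
{
    char tmp[BLOCK_SIZE];
    if (d->read(from, tmp, BLOCK_SIZE) != 0) return -1;
    return d->write(to, tmp, BLOCK_SIZE);
}
```

Adding a new device means writing one more table; dev_copy_block does not change.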
Device-Independent I/O Software (2)
A standard interface eases the development of device drivers.
Naming devices like files eases protection: the usual file protection rules also apply to devices.
Input Buffering
(a) Unbuffered input: the process handles one character at a time.
(b) Buffering in user space: the buffer may be paged out when a character arrives.
(c) Buffering in the kernel followed by copying to user space: new characters may arrive before the full kernel buffer has been copied.
(d) Double buffering in the kernel.
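Double buffering, case (d), addresses the problem in (c): when one kernel buffer fills, input switches to the other buffer while the full one is copied out. A minimal sketch, assuming a hypothetical 4-byte buffer size; the copy here runs synchronously, whereas in a kernel it could overlap with the device filling the other buffer.

```c
#include <stddef.h>

#define BUF_SIZE 4

static char bufs[2][BUF_SIZE];
static int filling;                /* index of the buffer being filled */
static size_t fill_pos;
static char user_area[64];         /* stands in for the user's buffer */
static size_t user_len;

static void copy_to_user(const char *b, size_t n)
{
    for (size_t i = 0; i < n && user_len < sizeof user_area; i++)
        user_area[user_len++] = b[i];
}

/* Called for each character arriving from the device. */
void on_char(char c)
{
    bufs[filling][fill_pos++] = c;
    if (fill_pos == BUF_SIZE) {
        int full = filling;
        filling = 1 - filling;     /* new input goes to the other buffer */
        fill_pos = 0;
        copy_to_user(bufs[full], BUF_SIZE);
    }
}
```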
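Copy Operations Due to Buffering
Networking may involve many copies of the same data: from the user buffer to the kernel, from the kernel to the network controller, and the same again in reverse on the receiving side.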
Review Layers of the I/O system and the main functions of each layer
Disk Hardware (1)
Parameter                     Barracuda ATA II   Cheetah 73
Capacity                      30 GB              73 GB
Platters                      3                  12
Heads                         6                  24
Rotational speed (RPM)        7200               10025
Sector size                   512 B              512 B
Sectors per track             63                 463
Tracks per inch               21368              18145
Seek time, read               8.2 ms             5.85 ms
Seek time, write              9.5 ms             6.35 ms
Track-to-track seek, read     1.2 ms             0.6 ms
Track-to-track seek, write    1.9 ms             0.9 ms
Disk Hardware (2)
(a) Physical geometry of a disk with two zones. (b) A possible virtual geometry for this disk.
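With a virtual geometry the OS addresses the disk as if every track had the same number of sectors, and the controller remaps each request onto the real zones. A sketch of both steps, under hypothetical numbers: one head, 24 virtual sectors per track, and two physical zones of 3 cylinders each with 32 (outer) and 16 (inner) sectors per track, so both geometries hold 144 sectors. Virtual sectors are numbered from 1, physical sector indices from 0.

```c
#include <stddef.h>

#define V_HEADS   1
#define V_SECTORS 24               /* virtual sectors per track */

struct zone { unsigned cylinders, sectors_per_track; };
static const struct zone zones[] = { { 3, 32 }, { 3, 16 } };  /* outer, inner */

/* Virtual (cylinder, head, sector) -> linear block address. */
unsigned long chs_to_lba(unsigned c, unsigned h, unsigned s)
{
    return ((unsigned long)c * V_HEADS + h) * V_SECTORS + (s - 1);
}

/* Linear block address -> physical (cylinder, sector index), walking
   the zones from the outside in. Returns -1 if out of range. */
int lba_to_physical(unsigned long lba, unsigned *cyl, unsigned *sec)
{
    unsigned base_cyl = 0;
    for (size_t z = 0; z < sizeof zones / sizeof zones[0]; z++) {
        unsigned long zone_blocks =
            (unsigned long)zones[z].cylinders * zones[z].sectors_per_track;
        if (lba < zone_blocks) {
            *cyl = base_cyl + (unsigned)(lba / zones[z].sectors_per_track);
            *sec = (unsigned)(lba % zones[z].sectors_per_track);
            return 0;
        }
        lba -= zone_blocks;
        base_cyl += zones[z].cylinders;
    }
    return -1;
}
```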
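RAID
RAID 0 (striping, no redundancy): for a single-sector request, performance is the same as a SLED (single large expensive disk).
RAID 1: mirroring.
RAID 2: bytes or words are spread across the disks; the disks should be synchronized; a Hamming code protects the data.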
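RAID 0 striping places consecutive strips round-robin across the disks. The mapping below is a sketch with the strip size and disk count as parameters; it shows why a single-sector request gains nothing over a SLED: it always lands on exactly one disk.

```c
/* Location of a logical block in a RAID 0 array. */
struct strip_loc { unsigned disk; unsigned long offset; };

/* Map a logical block to (disk, block offset on that disk). */
struct strip_loc raid0_map(unsigned long block, unsigned ndisks,
                           unsigned blocks_per_strip)
{
    unsigned long strip = block / blocks_per_strip;   /* which strip */
    struct strip_loc loc;
    loc.disk = (unsigned)(strip % ndisks);            /* round-robin */
    loc.offset = (strip / ndisks) * blocks_per_strip  /* stripe row */
                 + block % blocks_per_strip;          /* within strip */
    return loc;
}
```

With 4 disks and 8 blocks per strip, block 35 falls in strip 4, which wraps back to disk 0 at offset 11.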
Disk Hardware (4)
RAID 3: same as RAID 2, but with a single parity drive instead of a Hamming code; it can cope with a single disk failure.
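Why one parity drive is enough for a single failure: parity is the XOR of the data, so the block of any one failed drive equals the XOR of all the surviving blocks, data and parity. The same arithmetic applies whether parity is kept per byte (RAID 3) or per strip (RAID 4 and 5). Disk count and block size here are hypothetical.

```c
#include <stddef.h>

#define NDATA 4                    /* data disks */
#define BLK   8                    /* bytes per block */

void compute_parity(unsigned char data[NDATA][BLK], unsigned char parity[BLK])
{
    for (size_t i = 0; i < BLK; i++) {
        parity[i] = 0;
        for (size_t d = 0; d < NDATA; d++)
            parity[i] ^= data[d][i];
    }
}

/* Rebuild the block of failed disk `lost` from the others and parity. */
void rebuild(unsigned char data[NDATA][BLK], const unsigned char parity[BLK],
             size_t lost)
{
    for (size_t i = 0; i < BLK; i++) {
        unsigned char x = parity[i];
        for (size_t d = 0; d < NDATA; d++)
            if (d != lost)
                x ^= data[d][i];
        data[lost][i] = x;
    }
}
```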
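Disk Formatting
Cylinder skew: sector 0 of each track is offset from sector 0 of the previous track, so that after a track-to-track seek the head arrives just before the next sector to be read. This hides the track-to-track seek latency.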
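The amount of skew is the number of sectors that pass under the head during a track-to-track seek, rounded up. A sketch using integer arithmetic (seek time in microseconds) to avoid floating-point rounding; taking the Disk Hardware (1) figures at face value, the Barracuda (7200 RPM, 63 sectors/track, 1.2 ms read track-to-track) needs a skew of 10 sectors and the Cheetah (10025 RPM, 463 sectors/track, 0.6 ms) needs 47.

```c
/* Sectors passing per microsecond = rpm * spt / (60 * 10^6), so the
   skew is ceil(seek_us * spt * rpm / (60 * 10^6)). */
unsigned long skew_sectors(unsigned long rpm, unsigned long sectors_per_track,
                           unsigned long seek_us)
{
    unsigned long long num =
        (unsigned long long)seek_us * sectors_per_track * rpm;
    unsigned long long den = 60000000ULL;
    return (unsigned long)((num + den - 1) / den);   /* round up */
}
```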
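Disk Interleaving
Without interleaving, accessing consecutive sectors proceeds as follows: 1. read a sector into the controller's buffer; 2. copy the buffer to memory; 3. read the next sector, and so on. While step 2 is in progress the next sector passes under the head, so the controller must wait almost a full revolution for it, and the benefit of contiguous allocation cannot be used.
Single or double interleaving places one or two other sectors between consecutively numbered sectors, giving the controller time to copy.
Better still: buffer an entire track instead of a single sector.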
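The sector numbering for a given interleave factor can be generated by placing logical sector k at slot k × (interleave + 1) modulo the track length and moving to the next free slot on a collision. For an 8-sector track this sketch gives 0 4 1 5 2 6 3 7 for single interleaving and 0 3 6 1 4 7 2 5 for double interleaving.

```c
#include <stddef.h>

/* layout[slot] = logical sector stored in that physical slot.
   interleave 0 = none, 1 = single, 2 = double. */
void interleave_layout(int *layout, size_t n, size_t interleave)
{
    size_t step = interleave + 1;
    for (size_t slot = 0; slot < n; slot++)
        layout[slot] = -1;                /* mark all slots free */
    for (size_t k = 0; k < n; k++) {
        size_t pos = (k * step) % n;
        while (layout[pos] != -1)         /* taken: use next free slot */
            pos = (pos + 1) % n;
        layout[pos] = (int)k;
    }
}
```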
Disk Arm Scheduling Algorithms (1)
The time required to read or write a disk block is determined by three factors: seek time, rotational delay, and actual transfer time.
Seek time dominates.
Error checking is done by the controllers.
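The three factors can be added up for the two drives in Disk Hardware (1). This sketch assumes the average rotational delay is half a revolution and the transfer time is the time for one sector to pass under the head; it ignores controller overhead.

```c
/* Average time (ms) to read one sector: seek + rotation + transfer. */
double access_ms(double seek_ms, double rpm, double sectors_per_track)
{
    double rev_ms = 60000.0 / rpm;              /* one full revolution */
    double rotation_ms = rev_ms / 2.0;          /* half a turn on average */
    double transfer_ms = rev_ms / sectors_per_track;
    return seek_ms + rotation_ms + transfer_ms;
}
```

For the Barracuda this gives about 12.5 ms (8.2 + 4.17 + 0.13) and for the Cheetah about 8.86 ms (5.85 + 2.99 + 0.01); in both cases the seek is roughly two thirds of the total, which is why the arm schedule matters.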
Error Handling
(a) A disk track with a bad sector. (b) Substituting a spare for the bad sector. (c) Shifting all the sectors to bypass the bad one.
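Substituting a spare, case (b), amounts to a small remap table consulted on every access. A sketch assuming hypothetical sizes (30 usable sectors and 2 spares per track, spares numbered 30 and 31); the shifting approach of case (c) would instead renumber every sector after the bad one.

```c
#include <stddef.h>

#define SECTORS 30                 /* usable sectors per track */
#define SPARES  2                  /* spare sectors at the end of the track */

struct remap { int bad; int spare; };
static struct remap table[SPARES];
static size_t nremapped;

/* Record a newly found bad sector; returns its spare, or -1 if none left. */
int mark_bad(int sector)
{
    if (nremapped == SPARES) return -1;
    table[nremapped].bad = sector;
    table[nremapped].spare = SECTORS + (int)nremapped;
    return table[nremapped++].spare;
}

/* Translate a sector number to the physical sector actually used. */
int physical_sector(int sector)
{
    for (size_t i = 0; i < nremapped; i++)
        if (table[i].bad == sector)
            return table[i].spare;
    return sector;
}
```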
Stable Storage
A stable write either writes the correct value or leaves the disk intact.
It writes to mirrored disks: write to disk 1 and verify, then do the same to disk 2.
Non-volatile RAM speeds up this operation.
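The write-and-verify protocol can be sketched with two in-memory "disks". The fail_next_write flag is a hypothetical test hook that corrupts one write, so the verify step has something to catch; a real implementation would verify by reading the block back and checking its ECC.

```c
#include <string.h>

#define BLK 16

static char disk1[BLK], disk2[BLK];
static int fail_next_write;        /* hypothetical fault-injection hook */

static void raw_write(char *disk, const char *data)
{
    memcpy(disk, data, BLK);
    if (fail_next_write) {         /* simulate a write that goes bad */
        disk[0] ^= 0x55;
        fail_next_write = 0;
    }
}

static int verify(const char *disk, const char *data)
{
    return memcmp(disk, data, BLK) == 0;
}

/* Disk 2 is not touched until disk 1 holds a verified copy, so a
   crash at any point leaves at least one good copy.
   Returns the total number of write attempts. */
int stable_write(const char *data)
{
    int attempts = 0;
    do { raw_write(disk1, data); attempts++; } while (!verify(disk1, data));
    do { raw_write(disk2, data); attempts++; } while (!verify(disk2, data));
    return attempts;
}
```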
RS-232 Terminal Hardware
An RS-232 terminal communicates with the computer 1 bit at a time. This is called a serial line: bits go out in series, 1 bit at a time.
Names: /dev/ttyx on UNIX; COM1 and COM2 ports on Windows.
The computer and the terminal are completely independent.
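Sending a character over a serial line means framing it: a start bit, the data bits, and one or two stop bits so the receiver can find character boundaries. This sketch uses the common 8N1 format (8 data bits sent least-significant first, no parity, 1 stop bit) and represents line levels as ints in an array, purely for illustration.

```c
#include <stdint.h>

/* bits[] receives the 10 line levels in the order they are sent. */
void frame_8n1(uint8_t byte, int bits[10])
{
    bits[0] = 0;                          /* start bit */
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (byte >> i) & 1;    /* data, LSB first */
    bits[9] = 1;                          /* stop bit */
}

/* Receiver side: check framing and rebuild the byte; -1 on error. */
int unframe_8n1(const int bits[10])
{
    if (bits[0] != 0 || bits[9] != 1)
        return -1;                        /* framing error */
    int byte = 0;
    for (int i = 0; i < 8; i++)
        byte |= bits[1 + i] << i;
    return byte;
}
```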
Input Software (1)
(a) A central buffer pool. (b) A dedicated buffer for each terminal.
Display Hardware (1)
Memory-mapped displays and the parallel port.
With a memory-mapped display, the driver writes directly into the display's video RAM.
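Writing into video RAM can be sketched with the layout of the IBM PC colour text mode: 80 × 25 cells, each a character byte followed by an attribute byte, located at physical address 0xB8000. Here video RAM is an ordinary array so the code runs anywhere; the function names are hypothetical.

```c
#include <stdint.h>

#define COLS 80
#define ROWS 25

static uint8_t video_ram[ROWS * COLS * 2];   /* stands in for 0xB8000 */

/* The driver "prints" simply by storing into video RAM. */
void put_char_at(int row, int col, char c, uint8_t attr)
{
    uint8_t *cell = &video_ram[(row * COLS + col) * 2];
    cell[0] = (uint8_t)c;      /* character code */
    cell[1] = attr;            /* colour / attribute */
}

/* Writes a string on one row, clipping at the right edge. */
void put_string_at(int row, int col, const char *s, uint8_t attr)
{
    while (*s && col < COLS)
        put_char_at(row, col++, *s++, attr);
}
```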
Input Software
The keyboard delivers a key number, and the driver converts it to a character using an ASCII table.
Exceptions and adaptations are needed for other languages; many operating systems provide loadable keymaps or code pages.
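Translating key numbers through a loadable keymap can be sketched as a table lookup. The four key numbers and the two tiny maps are hypothetical; the German map swaps Y and Z, as a QWERTZ keyboard does, to show how loading a different table adapts the same driver to another language.

```c
#include <stddef.h>

#define NKEYS 4

struct keymap {
    const char *name;
    char normal[NKEYS];
    char shifted[NKEYS];
};

static const struct keymap us_map = { "us", { 'q', 'w', 'y', 'z' },
                                            { 'Q', 'W', 'Y', 'Z' } };
static const struct keymap de_map = { "de", { 'q', 'w', 'z', 'y' },
                                            { 'Q', 'W', 'Z', 'Y' } };

static const struct keymap *current = &us_map;

void load_keymap(const struct keymap *m) { current = m; }

/* Returns the character for a key number, or 0 if unmapped. */
char translate_key(unsigned key, int shift)
{
    if (key >= NKEYS)
        return 0;
    return shift ? current->shifted[key] : current->normal[key];
}
```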
Clients and servers in the M.I.T. X Window System
The SLIM Network Terminal (1) The architecture of the SLIM terminal system
The SLIM Network Terminal (2) Messages used in the SLIM protocol from the server to the terminals
Power Management (1)
Power consumption of various parts of a laptop computer
Power Management (2)
The use of zones for backlighting the display
Power Management (3)
(a) Running at full clock speed. (b) Cutting the voltage by two cuts the clock speed by two and cuts the power consumption by four.
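Why halving the voltage pays off even though the work takes longer: under the slide's model that power scales with the square of the voltage, the energy E = Pt for a fixed job is halved.

```latex
P \propto V^2:\quad P' = \left(\tfrac{1}{2}\right)^2 P = \tfrac{P}{4},
\qquad t' = 2t \ \text{(clock halved)},
\qquad E' = P' t' = \tfrac{P}{4}\cdot 2t = \tfrac{E}{2}
```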
Power Management (4)
Telling the programs to use less energy may mean a poorer user experience.
Examples: change from color output to black and white; speech recognition reduces its vocabulary; images use less resolution or detail.
AutoRAID
Problems with RAID:
It is difficult to configure: it requires detailed knowledge of the disks and the workload, and there are too many parameters to set.
It is difficult to change the configuration and difficult to add disks.
RAID 5 reserves an extra disk (a hot spare) to cope with a disk failure, but we don’t know whether this disk works, since it is not used in normal operation.
AutoRAID
The solution: offer a hierarchy of levels.
Store active data in mirrored disks and inactive data in RAID 5 arrays; depending on the activity, data are moved across this hierarchy.
For this to work, the active set must be small enough to fit in the mirrored storage and must stay unchanged for long periods, to prevent frequent data movement.
Hierarchy Implementation
Manually by the system administrator: error-prone, cannot adapt to rapidly changing access patterns, requires a highly skilled person, and makes it difficult to add new disks.
In the file system: the best place, since it is aware of the access patterns of each file, and it is software that is not in the customer’s hands.
In the array controller (behind the SCSI controller): devoid of knowledge of access patterns, but makes the array look like a normal disk.
AutoRAID Features
Mapping: maps host block addresses to physical locations.
Mirroring and RAID 5: use mirroring until disk space runs short; if space is needed, store some data as RAID 5. Keep at least 10% of all data in RAID 1.
Data are transferred between levels according to access frequency; this is automatic and done in idle time.
Online disk upgrade.
Dual controllers.
Active use of hot spares: when a disk fails, some data in level 1 are moved to level 5 to make room for reconstruction.
Hardware Configuration
Data Layout
A LUN consists of a number of PEXes (physical extents); the RB (relocation block) is the unit visible to clients.
Data Layout (2)
Mapping Tables
Migration
From level 5 to level 1 (inactive to active): when an RB is written.
From RAID 1 to RAID 5: in the background, for data that have not been changed for a long time, when space is needed in the mirrored storage.
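The two migration rules can be sketched with a per-LUN table that records, for each RB, its storage class and the time of its last write. This is a simplification under stated assumptions: the table sizes, the logical clock, and the names are hypothetical, and demotion here happens on demand when the mirror is full, whereas the real system demotes least-recently-written data in the background.

```c
#include <stddef.h>

#define NRB          8             /* RBs in the LUN */
#define MIRROR_SLOTS 3             /* RBs that fit in mirrored storage */

enum rb_class { RAID5, RAID1 };

struct rb_entry {
    enum rb_class cls;
    unsigned long last_write;      /* logical time of the last write */
};

static struct rb_entry vdt[NRB];   /* RB -> class; all start in RAID 5 */
static unsigned long now;
static size_t in_mirror;

/* Demote the least recently written mirrored RB to RAID 5. */
static void demote_one(void)
{
    size_t victim = NRB;
    for (size_t i = 0; i < NRB; i++)
        if (vdt[i].cls == RAID1 &&
            (victim == NRB || vdt[i].last_write < vdt[victim].last_write))
            victim = i;
    if (victim < NRB) {
        vdt[victim].cls = RAID5;
        in_mirror--;
    }
}

/* A write to an RB promotes it to RAID 1 (inactive -> active). */
void write_rb(size_t rb)
{
    if (vdt[rb].cls == RAID5) {
        if (in_mirror == MIRROR_SLOTS)
            demote_one();          /* make room in the mirror */
        vdt[rb].cls = RAID1;
        in_mirror++;
    }
    vdt[rb].last_write = ++now;
}
```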
Review
The paper shows exemplary research and development: hardware development ran in parallel with a simulator, and an accurate simulator made it possible to calibrate design decisions before committing to making a product.
Comparison with CLARiiON and JBOD (Just a Bunch Of Disks).
Microbenchmark results do not give much intuition.
It would be nice to show what happens when AutoRAID degrades to the mirrored mode.
Not much on why, mostly on what.
It does not really tell us much about statistical significance.