Dr. Jerry Shiao, Silicon Valley University


Operating Systems. Dr. Jerry Shiao, Silicon Valley University. Spring 2014. SILICON VALLEY UNIVERSITY CONFIDENTIAL

Mass-Storage Structure Outline. Secondary (Magnetic Disk) and Tertiary (Tape) Physical Structure: Lowest Level of File System. Secondary Storage Devices: Disk Attachment. Computer Access: Host-Attached (I/O Ports), Network-Attached Storage, Storage-Area Network. Operating System Support: Disk Scheduling (FCFS, SSTF, SCAN, C-SCAN Algorithms), Disk Management (Partitions, MBR), Swap-Space Management (File System, Virtual Memory). Mass Storage in RAID Structure (Redundant Array of Independent Disks): RAID Levels 0–6 (Variations of RAID, reliability using redundancy). Stable-Storage Implementation (No loss). Tertiary Storage Devices (Magnetic Tape). Performance Issues (Speed, Reliability, Cost). Copyright © 2009 John Wiley & Sons Inc.

Mass-Storage Structure Magnetic Disks: Secondary Storage of Computer Systems. Multiple Disk Platters rotate at 60 to 200 times per second. Transfer Rate: Rate of data flow between Disk Drive and Computer System, typically several Megabytes per second. Positioning Time: Time to move the Disk Arm to the Cylinder (Seek Time) plus time for the desired sector to rotate to the Disk Head (Rotational Latency). Seek Time and Rotational Latency are typically several milliseconds each. Disk Head travels over the disk surface, separated from it by microns. Head Crash: Disk Head contacts the disk surface. Removable Disk: Consists of one Disk Platter. Disk Controller connected to Host Controller with I/O Bus. I/O Busses: EIDE (Enhanced Integrated Drive Electronics), ATA (Advanced Technology Attachment), SATA (Serial ATA), USB (Universal Serial Bus), FC (Fibre Channel), SCSI (Small Computer Systems Interface).

Mass-Storage Structure Magnetic Disks: Secondary Storage of Computer Systems. Platter Size is related to performance. Smaller platters reduce Head movement and improve seek times (faster reads and writes). Smaller platters are stiffer and more resistant to shock and vibration. Flatness of surface is easier to manufacture. Spindle spins faster with less-powerful motors. Less power reduces noise and heat.

Mass-Storage Structure Magnetic Disks: Secondary Storage of Computer Systems. Pictured form factors: 5.25” PC Hard Disk; 3.5” Desktop PC Hard Disk (most common); 2.5” Laptop Hard Disk; PC Card; Compact Flash.

Mass-Storage Structure Magnetic Tapes: Early Secondary-Storage Medium. Holds large quantities of data (20 GBytes to 200 GBytes). Limitation: Access time is slow (about 1000 times slower than Hard Disk). Mainly used for backup and storage of infrequently used data. Tape is wound on a spool and moves across the Tape Head. Built-in Compression increases capacity. Magnetic Tapes are categorized by width: 4, 8, and 19 Millimeters; ¼ and ½ Inch. Pictured: LTO-2 (Linear Tape-Open) Tape Cartridge and SDLT (Super Digital Linear Tape) Tape Cartridge.

Mass-Storage Structure Disk Structure. Disk is mapped as a one-dimensional Array of Logical Blocks. Logical Block: Typically 512 Bytes. Sector 0: First sector of first track on outermost cylinder. Mapping proceeds through that track, then the remaining tracks of the cylinder, then the remaining cylinders from outermost to innermost. Disk Address: Cylinder number, Track number (within the cylinder), Sector number (within the track). Translating a logical block to a physical sector is difficult in practice: defective sectors are remapped, and the number of sectors per track is not constant on some drives. The farther a track is from the center, the more sectors it can hold. CLV (Constant Linear Velocity): Uniform density of bits per track. Drive increases rotational speed as the head moves from outer to inner tracks. Used in CD-ROM and DVD-ROM. CAV (Constant Angular Velocity): Rotational speed is constant, and bit density decreases from inner tracks to outer tracks. Used in hard disks.
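
The mapping from a logical block number to a Cylinder/Track/Sector address can be sketched as below. This is a minimal illustration that assumes an idealized drive with the same number of sectors on every track and no remapped bad sectors, which is exactly what real drives do not guarantee. The function names and the geometry values (4 tracks per cylinder, 63 sectors per track) are hypothetical.

```python
def lba_to_cts(block, tracks_per_cylinder, sectors_per_track):
    """Map a logical block number to (cylinder, track, sector).

    Assumes an idealized geometry: every track has the same number of
    sectors and no sector has been remapped. All indices are 0-based.
    """
    if block < 0:
        raise ValueError("logical block number must be non-negative")
    blocks_per_cylinder = tracks_per_cylinder * sectors_per_track
    cylinder, remainder = divmod(block, blocks_per_cylinder)
    track, sector = divmod(remainder, sectors_per_track)
    return cylinder, track, sector


def cts_to_lba(cylinder, track, sector, tracks_per_cylinder, sectors_per_track):
    """Inverse mapping under the same idealized geometry."""
    return (cylinder * tracks_per_cylinder + track) * sectors_per_track + sector


# Hypothetical geometry: 4 tracks per cylinder, 63 sectors per track.
print(lba_to_cts(0, 4, 63))     # (0, 0, 0): first sector, first track, outermost cylinder
print(lba_to_cts(1000, 4, 63))  # (3, 3, 55)
```

Blocks fill one track, then the other tracks of the same cylinder, and only then move to the next cylinder, so consecutive logical blocks need as few seeks as possible.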

Mass-Storage Structure Disk Attachment. Computers access disk storage through Host-Attached Storage (I/O Ports) or Network-Attached Storage (remote host). Host-Attached Storage: Accessed through local I/O ports. IDE (Integrated Drive Electronics) or ATA (Advanced Technology Attachment): Supports 2 drives per I/O Bus. SCSI (Small Computer Systems Interface): Supports 16 devices per I/O Bus, typically the Host Controller Card (SCSI initiator) and the SCSI targets. Each SCSI target can address up to 8 Logical Units. FC (Fibre Channel): Switched Fabric with 24-bit address space. Multiple hosts and Storage Devices can attach to the fabric. FC-AL (Arbitrated Loop): Supports 126 devices (drives and controllers).

Mass-Storage Structure Disk Attachment: Network-Attached Storage (NAS). Accessed remotely over an IP network using TCP or UDP. Clients use RPC (Remote Procedure Calls): NFS for UNIX, CIFS for Windows. NAS unit is usually a RAID array with software that implements the RPC interface. Lower performance than direct-attached storage devices. iSCSI: Uses the IP network protocol to carry the SCSI protocol.

Mass-Storage Structure Disk Attachment: Storage-Area Network (SAN). Private Network connecting Servers and Storage Units. Multiple Hosts (Servers) and multiple Storage Arrays attach to the same SAN. SAN Switch controls access to the Storage Units. Fibre Channel interconnects SAN components.

Mass-Storage Structure Disk Scheduling. Operating System must provide fast access time and high transfer rate (disk bandwidth) for mass storage. Access Time Components: Seek Time: Time for the disk arm to move the heads to the cylinder containing the sector. Rotational Latency: Time for the disk to rotate the sector to the head. Disk Bandwidth: Total Bytes transferred, divided by total time between first request and completion of last transfer. Improve Access Time (Seek Time and Rotational Latency) by managing the order of disk I/O Requests. Disk Queue: Holds requests made while the drive or controller is busy. Disk Scheduling: Operating System algorithm that chooses the next pending request to service.

Mass-Storage Structure Disk Scheduling: FCFS Scheduling (First Come, First Served). Fair, but does not provide the fastest service. Large swings of the disk head are possible, causing lengthy Access Time. Cylinder chart: Seek time corresponds to Head Movement between Cylinders; FCFS makes large jumps between Cylinders.
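
A short sketch of how FCFS head movement adds up. The request queue and starting cylinder below are the commonly used textbook example and are an assumption here, since the slide text does not list them. The function name is hypothetical.

```python
def fcfs_head_movement(start, requests):
    """Total cylinders travelled when requests are served in arrival order."""
    total = 0
    position = start
    for cylinder in requests:
        total += abs(cylinder - position)  # seek distance for this request
        position = cylinder
    return total


# Assumed example: head at cylinder 53, requests in arrival order.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(53, queue))  # 640
```

The swing from 183 down to 37 and back up to 122 accounts for a large share of the 640 cylinders, which is the kind of jump the later algorithms try to avoid.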

Mass-Storage Structure Disk Scheduling: SSTF Scheduling (Shortest Seek Time First). Services the requests closest to the current Head Position before moving the Head farther away (minimizes Seek Time). Similar to Shortest Job First Scheduling. Starvation is possible for requests far from the current Head Position while nearer requests keep arriving. Cylinder chart: Minimizes Cylinder access (seek) time.
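
SSTF can be sketched as a greedy loop that always picks the pending request nearest the head. The queue and starting cylinder are the same assumed textbook example (not given on the slide), the function names are hypothetical, and this sketch works on a fixed queue, so it does not show starvation from newly arriving requests.

```python
def sstf_order(start, requests):
    """Service order under Shortest Seek Time First for a fixed queue."""
    pending = list(requests)
    position = start
    order = []
    while pending:
        # Greedy choice: the pending cylinder closest to the current head position.
        nearest = min(pending, key=lambda cylinder: abs(cylinder - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def head_movement(start, order):
    """Total cylinders travelled when visiting `order` starting from `start`."""
    path = [start] + list(order)
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


queue = [98, 183, 37, 122, 14, 124, 65, 67]
order = sstf_order(53, queue)
print(order)                     # [65, 67, 37, 14, 98, 122, 124, 183]
print(head_movement(53, order))  # 236
```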

Mass-Storage Structure Disk Scheduling: SCAN Algorithm (Elevator Algorithm). Disk arm starts at one end of the disk and moves toward the other end, servicing requests at each cylinder. When the end is reached, head movement is reversed. New requests for cylinders that were just passed must wait for the head to return. Cylinder chart: Similar to SSTF. Minimizes Cylinder access (seek) time.
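
A sketch of the SCAN sweep, assuming cylinders numbered 0 to 199 and the same assumed textbook queue (neither is stated on the slide). Function and parameter names are hypothetical. The head is modelled as travelling all the way to the end of the disk in its first direction before reversing.

```python
def scan_path(start, requests, max_cylinder, direction="down"):
    """Cylinders visited by the head under SCAN, including the disk end it touches."""
    lower = sorted((c for c in requests if c < start), reverse=True)
    upper = sorted(c for c in requests if c >= start)
    if direction == "down":
        # Sweep toward cylinder 0, touch the end, then reverse and sweep upward.
        return [start] + lower + [0] + upper
    # Sweep toward the last cylinder, touch the end, then reverse and sweep downward.
    return [start] + upper + [max_cylinder] + lower


def path_length(path):
    """Total cylinders travelled along a path of head positions."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


queue = [98, 183, 37, 122, 14, 124, 65, 67]
path = scan_path(53, queue, max_cylinder=199, direction="down")
print(path)               # [53, 37, 14, 0, 65, 67, 98, 122, 124, 183]
print(path_length(path))  # 236
```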

Mass-Storage Structure Disk Scheduling: C-SCAN Algorithm. Variant of SCAN with a more uniform wait time. Head moves from one end to the other end, servicing requests. When the other end is reached, the head immediately returns to the beginning without servicing requests on the return trip. Treats cylinders as a circular list that wraps from last to first cylinder. Cylinder chart: Wraps from last cylinder to first cylinder before servicing requests again.
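
C-SCAN can be sketched the same way. The cylinder range (0 to 199), the queue, and the starting cylinder are assumptions carried over from the usual textbook example, and the names are hypothetical. The return sweep is counted as head movement here even though no requests are serviced during it.

```python
def cscan_path(start, requests, max_cylinder):
    """Head positions under C-SCAN, sweeping upward and wrapping to cylinder 0."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    # Serve upward to the last cylinder, jump back to cylinder 0 without
    # servicing anything, then continue upward through the remaining requests.
    return [start] + upper + [max_cylinder, 0] + lower


def path_length(path):
    """Total cylinders travelled along a path of head positions."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


queue = [98, 183, 37, 122, 14, 124, 65, 67]
path = cscan_path(53, queue, max_cylinder=199)
print(path)               # [53, 65, 67, 98, 122, 124, 183, 199, 0, 14, 37]
print(path_length(path))  # 382
```

Every request is served during an upward sweep, so no region of the disk is visited twice in quick succession while another region waits for a full round trip.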

Mass-Storage Structure Disk Scheduling: LOOK and C-LOOK Algorithms. LOOK is a variant of SCAN; C-LOOK is a variant of C-SCAN. Head moves in one direction servicing requests, but only goes as far as the final request in that direction instead of the end of the disk. LOOK then reverses direction; C-LOOK returns to the first pending request at the other end. SCAN and C-SCAN are typically implemented this way in practice. Cylinder chart: Wraps as soon as last request is reached.
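
The difference between LOOK and C-LOOK shows up in the order of the requests below the starting cylinder. This sketch assumes an initial upward sweep and reuses the assumed textbook queue; the function names are hypothetical.

```python
def look_order(start, requests):
    """LOOK: sweep up to the highest request, then reverse and sweep down."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted((c for c in requests if c < start), reverse=True)
    return upper + lower


def clook_order(start, requests):
    """C-LOOK: sweep up to the highest request, jump to the lowest, sweep up again."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    return upper + lower


def head_movement(start, order):
    """Total cylinders travelled when visiting `order` starting from `start`."""
    path = [start] + list(order)
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(look_order(53, queue))                      # [65, 67, 98, 122, 124, 183, 37, 14]
print(head_movement(53, look_order(53, queue)))   # 299
print(clook_order(53, queue))                     # [65, 67, 98, 122, 124, 183, 14, 37]
print(head_movement(53, clook_order(53, queue)))  # 322
```

Neither path ever touches cylinder 0 or 199, because the head turns around at the last pending request rather than at the physical end of the disk.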

Mass-Storage Structure Disk Scheduling: Selection of Disk-Scheduling Algorithm. SSTF is simple and performs better than FCFS. SCAN and C-SCAN perform better for systems with heavy disk usage. Require retrieval algorithm. File Allocation method is important: Contiguous blocks have minimum head movement. Linked or Indexed Files can have blocks scattered on the disk. Caches for directories and index blocks are needed; otherwise frequent directory access causes excess head movement. SSTF or C-LOOK (LOOK) is typically used as the default algorithm. Disk Scheduling Algorithm is written as a separate module, allowing it to be replaced with a different algorithm.

Mass-Storage Structure Disk Management. New magnetic disk is initially a blank platter of magnetic recording material. Low-Level Formatting or Physical Formatting: Fills the disk with sectors. Disk Controller is instructed how many bytes of data (256, 512, or 1024 Bytes) to place between the header and trailer of each sector. Header holds the sector number. Trailer holds the ECC (Error-Correcting Code), calculated on every Write and verified on every Read. If only a few bits are corrupted, the ECC is used to recover the actual value and the Disk Controller reports a Soft Error. Operating System creates partitions (groups of cylinders). Logical Formatting: Creation of a File System in the partition, or the partition is left “raw”. File System data structures include maps of free and allocated space (a FAT or inodes) and an initial empty directory. File System groups blocks together into clusters for I/O performance. Raw disk is viewed as a large sequential array of logical blocks (no File System services: file locking, prefetching, space allocation, file names, directories).

Mass-Storage Structure Disk Management: Bootstrap Loader. Stored in ROM at a fixed location, where the Processor starts executing when reset. Small bootstrap loader program instructs the disk controller to read the boot blocks into memory and starts executing the full Bootstrap program. Boot Block: Full Bootstrap program at a fixed location on disk (Master Boot Record) initializes CPU registers, device controllers, and contents of main memory, then loads the Operating System from a non-fixed location on disk and starts it running. Windows 2000: Boot Block is in the first sector of the hard disk (Master Boot Record). MBR contains the partition table and a flag indicating which partition is the boot partition (holding the Operating System and Device Drivers).

Mass-Storage Structure Disk Management: Bad Blocks. Sectors can become defective (moving parts with small tolerances). IDE Disk Controller requires bad blocks to be handled manually: format command marks bad blocks in the FAT during File System initialization; chkdsk command searches for bad blocks during normal operation. SCSI Disk Controller maintains a list of bad blocks, initialized during low-level formatting, and a list of spare sectors to replace bad blocks. Spare sectors are allocated on each cylinder, plus an entire spare cylinder, so a replacement can be chosen near the bad sector and the disk scheduling algorithm's optimization is not invalidated. Sector Sparing: SCSI Disk Controller detects a bad sector from its ECC and notifies the Operating System. At the next bootup, the Operating System tells the controller to replace the bad sector with a spare sector. Sector Slipping: All sectors between the bad sector and the next spare sector are moved down one sector, which frees the sector after the bad one. Bad sector is then mapped to that “free” sector.

Mass-Storage Structure Swap-Space Management. Operating Systems have merged Swapping with Virtual Memory Paging Techniques: in Paging Systems, only pages of a process are swapped out, not the entire process. Virtual Memory uses disk space as an extension of main memory. Disk access is slower than memory access, so how swap space is used and managed is critical to Operating System performance. Swap Space in a raw partition: Swap-Space Storage Manager allocates and deallocates blocks from the raw partition. Swap Space partition size is fixed. Internal Fragmentation can occur, but data in swap space is short-lived. Increasing Swap Space requires repartitioning. Swap Space in File-System Space: Large file within the File System, created and allocated with File-System APIs. Straightforward to implement, but inefficient because of File-System data structure overhead.

Mass-Storage Structure Swap-Space Management: Linux. Linux keeps swap space in a swap file on a File System or in a raw swap-space partition. Swap space is used only for anonymous memory (memory allocated for stack, heap, and uninitialized data) and for regions of memory shared by several processes. Text-segment pages are reread from the File System rather than written to swap space. Swap area consists of 4-KByte Page Slots, tracked by a Swap Map: Array of counters, each corresponding to a page slot in the swap area. Count of 0 means the slot is free; count greater than 0 gives the number of processes sharing the swapped page.
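
The swap map described above is essentially an array of reference counters. The class below is a toy model for illustration only; the class and method names are hypothetical and do not correspond to the Linux kernel's actual data structures or functions.

```python
class SwapMap:
    """Toy swap map: one counter per page slot in the swap area.

    A count of 0 means the slot is free; a count above 0 is the number
    of mappings (processes) that share the swapped-out page.
    """

    def __init__(self, num_slots):
        self.counts = [0] * num_slots

    def allocate(self):
        """Claim the first free slot for a newly swapped-out page."""
        for slot, count in enumerate(self.counts):
            if count == 0:
                self.counts[slot] = 1
                return slot
        raise MemoryError("swap area is full")

    def share(self, slot):
        """Another process maps the page already stored in `slot`."""
        if self.counts[slot] == 0:
            raise ValueError("slot is not in use")
        self.counts[slot] += 1

    def release(self, slot):
        """One mapping goes away; the slot becomes free when the count hits 0."""
        if self.counts[slot] == 0:
            raise ValueError("slot is already free")
        self.counts[slot] -= 1


swap = SwapMap(num_slots=4)
slot = swap.allocate()  # page swapped out by one process
swap.share(slot)        # a second process shares the same page
print(swap.counts)      # [2, 0, 0, 0]
```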

Mass-Storage Structure RAID Structure. RAID (Redundant Array of Independent Disks): Redundant Information stored on multiple disks for Reliability. Implementation options: Disks attached directly to I/O Bus, with the Operating System implementing RAID. Intelligent Host Controller controls multiple disks and implements RAID at the controller. Storage Array or RAID Array: Standalone unit with its own controller and cache, controls multiple disks and attaches to the host with ATA, SCSI, or FC controller. Solution to Reliability is Redundancy: Store extra information that is used in the event of disk failure to rebuild lost information. Mirrored Volume: Logical disk consists of two physical disks and every write is carried out on both disks. Simple, but expensive. Mean Time Between Failures (MTBF): Mean time to failure of a single disk (100,000 Hours). Mean Time To Repair (MTTR): Time to replace a failed disk and restore its data (10 Hours). Mean Time to Data Loss for a mirrored pair, assuming independent failures: MTBF^2 / (2 * MTTR) = 100,000^2 / (2 * 10) = 5 * 10^8 Hours (about 57,000 years). NVRAM Cache protects against power failure.
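
The mean-time-to-data-loss formula is easy to check numerically. The sketch below uses the slide's figures (100,000 hours to failure, 10 hours to repair) and assumes the two disk failures are independent; the function name is hypothetical.

```python
def mean_time_to_data_loss(mtbf_hours, mttr_hours):
    """Mean time to data loss for a mirrored pair with independent failures.

    Data is lost only if the second disk fails while the first is still
    being repaired, which gives MTBF^2 / (2 * MTTR).
    """
    return mtbf_hours ** 2 / (2 * mttr_hours)


hours = mean_time_to_data_loss(100_000, 10)
years = hours / (24 * 365)
print(hours)         # 500000000.0
print(round(years))  # 57078
```

Evaluated this way, the formula gives 5 * 10^8 hours, on the order of 57,000 years. Real mirrored pairs do worse because failures are often correlated (shared power, same manufacturing batch).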

Mass-Storage Structure RAID Structure: Performance via Parallelism. Mirroring: Read Requests can be sent to either disk, doubling the read request rate per logical disk. Data Striping across Multiple Disks: Bit-Level Striping: Splits the bits of each byte across multiple disks; bit “n” of each byte goes to disk “n”. Number of disks is a multiple of 8 or divides 8 (e.g. 4 disks or 2 disks). Every disk participates in every access (read or write). With an array of 8 disks, each access transfers 8 times as much data in the same time. Block-Level Striping: Splits blocks across multiple disks. With “m” disks, block “n” goes to disk (n mod m) + 1. Array of 4 disks: Block 0 goes to disk (0 mod 4) + 1 = 1; Block 1 goes to disk (1 mod 4) + 1 = 2. Striping Parallelism achieves: Increased throughput of multiple small accesses by load balancing. Reduced response time of large accesses.
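
The block-level striping rule, disk (n mod m) + 1, can be sketched directly. Disks are numbered from 1 as on the slide; the function names are hypothetical.

```python
def disk_for_block(block, num_disks):
    """Disk (numbered from 1) that holds logical block `block`."""
    return (block % num_disks) + 1


def stripe_layout(num_blocks, num_disks):
    """Map each disk number to the list of logical blocks stored on it."""
    layout = {disk: [] for disk in range(1, num_disks + 1)}
    for block in range(num_blocks):
        layout[disk_for_block(block, num_disks)].append(block)
    return layout


print(disk_for_block(0, 4))  # 1
print(disk_for_block(1, 4))  # 2
print(stripe_layout(8, 4))   # {1: [0, 4], 2: [1, 5], 3: [2, 6], 4: [3, 7]}
```

A read of blocks 0 through 3 touches all four disks at once, which is where the reduced response time for large accesses comes from.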

Mass-Storage Structure RAID Structure: RAID Levels. Schemes combining mirroring (reliability) and striping (performance). RAID 0: Non-Redundant Striping. Block Striping for performance, no redundancy. RAID 1: Mirrored Disks. Disk Mirroring for redundancy. RAID 2: Memory-Style Error-Correcting Codes. Bit-Level Striping with ECC bits stored on additional disks. RAID 3: Bit-Interleaved Parity. Bit-Level Striping for performance, with a single parity disk. Relies on the disk controller detecting a read/write error in each sector from the sector’s Error-Correcting Code, so one parity bit is enough to reconstruct the damaged bit. Uses dedicated parity hardware and NVRAM cache to store blocks during parity computation.

Mass-Storage Structure RAID Structure: RAID Levels (continued). RAID 4: Block-Interleaved Parity. Block Striping for performance, with a parity block on a separate disk to restore a failed block. RAID 5: Block-Interleaved Distributed Parity. Block Striping for performance, with data and parity blocks spread across all disks in the RAID. Most common parity RAID system. RAID 6: P + Q Redundancy. Similar to RAID 5, with extra redundant information that guards against multiple disk failures. RAID 0 + 1: Combination of RAID 0 (Performance) and RAID 1 (Reliability). Disks are striped and then the stripe is mirrored. Limitation: If a single disk fails, its entire stripe is unavailable. RAID 1 + 0: Combination of RAID 1 and RAID 0. Disks are mirrored in pairs and then the mirrored pairs are striped.
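
How a parity block restores a failed block in RAID 4 and RAID 5 can be illustrated with byte-wise XOR. This is a minimal sketch assuming equal-sized blocks and a single failed disk; the function name and the sample data are hypothetical, and RAID 6 needs a second, different code that is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe, each on a different disk (sample data).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on the stripe's parity disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

The rebuild has to read every surviving disk in the stripe, which is why rebuilding a parity array is slower than copying from a mirror.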

Mass-Storage Structure RAID Structure. Figure legend: C indicates a copy of the data on disk; P indicates Error-Correcting Bits.

Mass-Storage Structure RAID Structure. Figure legend: A, B, C, D, E, F are data Blocks; P1, P2, P3 are Parity blocks. Source: www.thegeekstuff.com

Mass-Storage Structure RAID Structure. RAID 0 + 1: Striping is done before the mirror. If a disk fails, the stripe set containing it cannot be used, and the array degrades to a single RAID 0 stripe. RAID 1 + 0: Disks are mirrored in pairs and then the mirrors are striped together. If a disk fails, only that disk is unavailable: its mirror partner still holds its portion of the stripe, and the other disks retain their stripe portions.

Mass-Storage Structure RAID Structure: Variations in RAID Implementations. RAID implemented within Kernel or System Software Layer: Storage hardware provides minimum features. Typically RAID 0, RAID 1, or RAID 0 + 1. RAID implemented in Host Bus-Adapter (HBA) hardware: Restrictive, only disks connected to the HBA can be in the RAID set. RAID implemented in hardware of storage array: Storage Array creates multiple RAID sets. Operating System only implements the File System. RAID implemented in SAN interconnect layer, between hosts and storage: Accepts commands from the servers and manages access to storage. Replication between Storage Arrays: Automatic duplication of writes between separate sites for redundancy and disaster recovery. Hot Spare: Disk not used for data, configured as a replacement in case of disk failure. Used to rebuild a mirrored pair so the RAID level is reestablished automatically.

Mass-Storage Structure RAID Structure: Selecting RAID Level. Rebuild Performance: Important in high-performance or interactive database systems. Easiest with RAID 1, since data is copied from another disk. Other RAID levels require accessing all other disks in the array (RAID 5 rebuild is slow). RAID 0: Used in high-performance applications where data loss is not critical. RAID 1: Popular for applications that require high reliability and fast recovery. RAID 0 + 1 and RAID 1 + 0: Used where both performance and reliability are important (small databases). RAID 5: Large volumes of data. RAID 6: Not supported by many RAID implementations. Criteria: How many disks in the RAID set? More disks give higher data transfer rates, but are more expensive. How many bits protected by each parity bit?

Mass-Storage Structure RAID Structure. RAID Problem: Protects against physical media errors, but not against data corruption or other hardware and software errors. Solaris ZFS File System: Checksums protect all data and metadata. Checksum is not kept with the block it covers but with the pointer to that block. Inode has checksum of each data block. Directory entry has checksum for the inode. Provides a high level of consistency, error detection, and error correction.

Mass-Storage Structure RAID Structure. RAID Problem: Lack of Flexibility. File Systems on fixed volumes cannot grow or shrink dynamically. ZFS combines File System Management and Volume Management. RAID sets are gathered into pools of storage; a Pool contains one or more File Systems. Entire pool free space is available to all File Systems within the pool. No artificial limit on storage and no need to relocate File Systems between volumes or resize volumes. Quotas can be configured to limit File System growth.

Mass-Storage Structure Stable-Storage Implementation. Certain applications (Write-Ahead Logging) require Stable Storage: Information is never lost. Replicate the information on multiple storage devices (disks) with independent failure modes. Coordinate the writing of updates so that stable data can be recovered after any failure during data transfer or even during recovery. Detection and recovery procedure restores a damaged data block. Procedure must maintain two physical blocks for each logical block. Write operation is complete only after both physical blocks are written, one after the other. Usually two copies are enough for Stable Storage, but an arbitrary number of copies can be kept. NVRAM cache (nonvolatile, battery-backed memory) can receive the write before it is written to disk.
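
The two-copy write and recovery procedure can be sketched as follows. This is an in-memory toy model under stated assumptions: a CRC-32 checksum (via `zlib.crc32`) stands in for the disk's error detection, the `crash_after_first` flag simulates a failure between the two writes, and the recovery rule for two valid but different copies (keep the second copy) follows the common textbook procedure. The class and method names are hypothetical.

```python
import zlib


class StableBlock:
    """One logical block kept as two physical copies, each with a checksum."""

    def __init__(self, data=b""):
        self.copies = [self._seal(data), self._seal(data)]

    @staticmethod
    def _seal(data):
        return (data, zlib.crc32(data))

    @staticmethod
    def _valid(copy):
        data, checksum = copy
        return zlib.crc32(data) == checksum

    def write(self, data, crash_after_first=False):
        """Write the first copy, then the second; done only after both succeed."""
        self.copies[0] = self._seal(data)
        if crash_after_first:  # simulated failure between the two writes
            return False
        self.copies[1] = self._seal(data)
        return True

    def recover(self):
        """Make the two copies agree again after a failure."""
        first, second = self.copies
        if not self._valid(first):
            self.copies[0] = second  # first copy damaged: take the second
        elif not self._valid(second):
            self.copies[1] = first   # second copy damaged: take the first
        elif first != second:
            self.copies[0] = second  # both valid but different: keep the second

    def read(self):
        return self.copies[0][0]


block = StableBlock(b"v1")
block.write(b"v2", crash_after_first=True)
block.recover()
print(block.read())  # b'v1'
```

After recovery the block holds either the complete old value or the complete new value, never a mixture, which is the guarantee a write-ahead log depends on.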

Mass-Storage Structure Tertiary Storage Devices. Low cost, removable media. Removable Disks: Floppy Disks, Removable Magnetic Disks. Magneto-Optic Disk: Records data on a platter coated with magnetic material and uses Laser light to record. Optical Disks: Use special materials altered by laser light instead of magnetism (Read-Only, Write-Once, and Rewritable CDs and DVDs). CD-ROM and DVD-ROM are read-only; CD-R and DVD-R are Write-Once, Read-Many (WORM) disks that record on a thin aluminum film. Magnetic Tapes: For applications not requiring fast random access. Used to hold backup copies of disk data or large volumes of data used in research and record storage. Tape Libraries: Stacker (Library holds a few tapes), Silo (Library holds thousands of tapes). Solid-State Disks (SSD): Nonvolatile, with the same characteristics as hard disks but no moving parts (no seek time or rotational latency), and faster than hard drives.

Mass-Storage Structure Tertiary Storage Devices. Operating System Support: Abstractions for removable media. Raw Device: Array of data blocks. File System: File System structures on the storage media. Operating System queues and schedules requests. Tapes: Raw storage medium. Application opens the whole tape drive as a raw device (no File System APIs and services). Application must decide how to organize the array of blocks. Tape drive is reserved for the exclusive use of that application (another application would not know how to interpret the tape). Block size is variable and is determined when the block is written. locate( ) operation positions the tape at a specific logical block number. Cannot locate into empty space beyond the written area. Last block written is followed by an end-of-tape (EOT) mark. Tape drives are Append-Only devices (updating a block in the middle of the tape erases everything beyond that block). read_position( ) returns the current logical block number of the tape head.

Mass-Storage Structure Tertiary Storage Devices. File Naming: Operating Systems mostly leave naming of removable media unresolved and depend on applications and users to determine how to access and interpret the data. UNIX has a mount table to identify the location of the media. Hierarchical Storage Management (HSM): Extends the storage hierarchy beyond primary memory and secondary storage (magnetic disk) to incorporate Tertiary Storage. Tertiary Storage is implemented as a collection of tapes or removable disks. Extends the File System: Small and frequently used files remain on magnetic disk; large and inactive files are archived to Tertiary Storage (Tape Drive). HSM is found in supercomputing centers and large companies that have large volumes of data.

Mass-Storage Structure Tertiary Storage Devices. Performance Issues: Tertiary Storage performance aspects are speed, reliability, and cost. Speed: Two aspects of speed in Tertiary Storage are Bandwidth and Latency. Sustained Bandwidth: Average data rate during a large transfer, i.e. number of bytes divided by the transfer time. Effective Bandwidth: Average over the entire I/O time, including seek( ), locate( ), and any cartridge switching time in a tape or disk library (jukebox). Bandwidth of a drive generally means the Sustained Bandwidth. Disk: Few Megabytes to > 60 MBytes per second (affected by rotational speed and ATA/SCSI controller). Tape: Few Megabytes to > 30 MBytes per second.
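
The gap between sustained and effective bandwidth is easiest to see with numbers. The figures below (a 3000 MByte transfer at 30 MBytes per second with 100 seconds of locate and cartridge-switch overhead) are made up for illustration, and the function names are hypothetical.

```python
def sustained_bandwidth(megabytes, transfer_seconds):
    """Average data rate while data is actually flowing."""
    return megabytes / transfer_seconds


def effective_bandwidth(megabytes, transfer_seconds, overhead_seconds):
    """Average data rate over the whole I/O, including seek, locate, and switching."""
    return megabytes / (transfer_seconds + overhead_seconds)


# Illustrative figures: 3000 MBytes read from tape in 100 s of streaming,
# after 100 s of cartridge switching and locate time.
print(sustained_bandwidth(3000, 100))       # 30.0
print(effective_bandwidth(3000, 100, 100))  # 15.0
```

With a fixed overhead, small transfers suffer most: the shorter the streaming time, the more the effective rate is dominated by positioning.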

Mass-Storage Structure Tertiary Storage Devices. Speed: Access Latency: Amount of time needed to locate data. Disk: Two-dimensional storage; moves arm to selected cylinder and waits for rotational latency (< 5 milliseconds). Tape: Three-dimensional storage; most of the data is buried below layers of tape wound on a reel. Winding the selected block to the tape head takes tens or hundreds of seconds (> 1000 times slower than disk). Disk Jukebox: Drive stops spinning, robotic arm switches disk cartridge, drive spins up new cartridge (several seconds), followed by the disk access latency. Average latency of tens of seconds. Tape Jukebox: Tape rewinding (< 4 minutes), robotic arm switches tape cartridge (1 or 2 minutes), drive calibration to tape (many seconds), followed by tape access latency. Average latency of hundreds of seconds. Jukebox or removable library is best devoted to storage of infrequently used data: It can support only a relatively small number of I/O requests per hour.

Mass-Storage Structure Tertiary Storage Devices. Reliability: Fixed hard disks are more reliable than removable magnetic disks. Removable disks are exposed to dust, changes in temperature and humidity, and mechanical abuse (shock, bending). Head crash in a hard disk generally destroys the platter and data. Optical disks are very reliable: the layer storing the bits is protected by a transparent plastic or glass layer. Magnetic Tape reliability varies with the tape drive: Inexpensive drives wear out a tape after a few dozen uses. Expensive drives allow a tape to be used millions of times. Head crash in a tape drive generally leaves the data cartridge unharmed. In order of reliability: Optical disks, fixed-disk drives, removable-disk drives, removable-tape drives.

Mass-Storage Structure Tertiary Storage Devices. Cost: Main memory is more expensive than disk storage by a factor of about 100. Cost of storage has fallen dramatically, and the price of disk storage has dropped more than the price of DRAM and tape. Price per megabyte of a disk drive is approaching the price of a tape cartridge without the tape drive. Small and medium-sized tape libraries have a higher storage cost than disk systems with equivalent capacity. Cost of a tape is a small fraction of the price of the tape drive, so overall cost of tape storage becomes lower as more tapes are purchased per tape drive. Tertiary Storage is approaching obsolescence as a general storage tier: it is no longer an order of magnitude less expensive than magnetic disk. Tape storage is mostly limited to backups of disk drives and archival storage in tape libraries that greatly exceed the practical storage capacity of large disk farms.

Mass-Storage Structure Tertiary Storage Devices. Cost: Price per MByte of DRAM from 1981 to 2004 (chart). 4 price crashes: 1981, 1984, 1989, and 1996, caused by excess production. Shortages in 1987 and 1993 caused price increases. As SIMM density increases, cost per MB decreases.

Mass-Storage Structure Tertiary Storage Devices. Cost: Price per MByte of Magnetic Hard Disk from 1981 to 2004 (chart). Price decline has been steady. From 1981 to 2004, price dropped by > 4 orders of magnitude ($100 / MB to $0.001 / MB). In 2004, DRAM is $0.8 / MB and Disk is $0.001 / MB, a difference of roughly 800 to 1 by these figures.

Mass-Storage Structure Tertiary Storage Devices. Cost: Price per MByte of Magnetic Tape Drive from 1981 to 2004 (chart). Tape drive prices fell steadily up to 1997. Since 1997, tape drive prices have not plummeted as rapidly as disk drive prices.

Mass-Storage Structure Summary. Disk Drives are the major Secondary-Storage I/O Devices. Structured as large one-dimensional arrays of 512-Byte logical blocks. Attached to Computer Systems through I/O Ports or through Network Connections. Disk-Scheduling Algorithms improve the effective bandwidth, average response time, and variance in response time: SSTF, SCAN, C-SCAN, LOOK, and C-LOOK. Operating System manages the disk blocks: Formats the sectors on raw hardware and partitions the disk. Boot Blocks are created. File System is created. Raw disk partition or File System is used for Swap Space. Reliability via Redundancy using Redundant Array of Independent Disks (RAID): Different levels of RAID. Tertiary Storage: Removable Disks and Tape Drives. Operating System supports Removable Disks with a File System interface and Tapes with a specialized Device Driver.