
1 CMPUT429/CMPE382 Amaral 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic C: I/O (Adapted from David A. Patterson's CS252, Spring 2001 Lecture Slides)

2 CMPUT429/CMPE382 Amaral 1/17/01 Motivation: Who Cares About I/O? CPU Performance: Improves 60% per year I/O system performance limited by mechanical delays (disk I/O) improves less than 10% per year (IO per sec) Amdahl's Law: system speed-up limited by the slowest part! 10% IO & 10x CPU => 5x Performance (lose 50% of CPU gain) 10% IO & 100x CPU => 10x Performance (lose 90% of CPU gain) I/O bottleneck: Diminishing fraction of time in CPU Diminishing value of faster CPUs
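The speedup arithmetic on this slide can be checked with a quick sketch; the function below is illustrative (not from the original deck) and assumes the 10% I/O fraction stated above.

```python
# Amdahl's Law sketch: overall speedup when only the CPU portion is accelerated.
def amdahl_speedup(io_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when the non-I/O fraction is sped up by cpu_speedup."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(amdahl_speedup(0.10, 10))    # ~5.3x  -> the slide's "5x Performance"
print(amdahl_speedup(0.10, 100))   # ~9.2x  -> the slide's "10x Performance"
```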

3 CMPUT429/CMPE382 Amaral 1/17/01 I/O Systems (block diagram: processor and cache on a Memory - I/O Bus connecting main memory and I/O controllers for disk, graphics, and network; the I/O controllers signal the processor via interrupts)

4 CMPUT429/CMPE382 Amaral 1/17/01 Outline Disk Basics Disk History Disk options in 2000 Disk fallacies and performance Tapes RAID

5 CMPUT429/CMPE382 Amaral 1/17/01 Disk Device Terminology. Several platters, with information recorded magnetically on both surfaces (usually). The actuator moves the head (at the end of an arm, one per surface) over a track (seek), selects the surface, waits for the sector to rotate under the head, then reads or writes. Cylinder: all tracks under the heads. Bits are recorded in tracks, which are in turn divided into sectors (e.g., 512 Bytes). (Diagram labels: platter, outer track, inner track, sector, actuator, head, arm.)

6 CMPUT429/CMPE382 Amaral 1/17/01 Photo of Disk Head, Arm, Actuator (photo labels: actuator, arm, head, platters (12), spindle)

7 CMPUT429/CMPE382 Amaral 1/17/01 Disk Device Performance (diagram labels: platter, arm, actuator, head, sector, inner track, outer track, controller, spindle). Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead. Seek Time? Depends on the number of tracks the arm must move and the seek speed of the disk. Rotation Time? Depends on how fast the disk rotates and how far the sector is from the head. Transfer Time? Depends on the data rate (bandwidth) of the disk (bit density) and the size of the request.

8 CMPUT429/CMPE382 Amaral 1/17/01 Disk Device Performance. Average distance of a sector from the head? 1/2 the time of a rotation. 10000 Revolutions Per Minute = 166.67 Rev/sec; 1 revolution = 1/166.67 sec = 6.00 milliseconds; 1/2 rotation (revolution) = 3.00 ms. Average number of tracks to move the arm? Sum all possible seek distances from all possible tracks / # possible (assumes the seek distance is random); this is the disk industry standard benchmark.

9 CMPUT429/CMPE382 Amaral 1/17/01 Data Rate: Inner vs. Outer Tracks. To keep things simple, disks originally kept the same number of sectors per track; since the outer track is longer, that means lower bits per inch there. Competition decided to keep BPI the same for all tracks (constant bit density): more capacity per disk, more sectors per track towards the edge. Since the disk spins at constant speed, outer tracks have a faster data rate: bandwidth on the outer track is 1.7X the inner track! (The inner track has the highest density and the outer track the lowest, so density is not really constant: 2.1X track length outer/inner, 1.7X bits outer/inner.)

10 CMPUT429/CMPE382 Amaral 1/17/01 Devices: Magnetic Disks (diagram labels: sector, track, cylinder, head, platter). Purpose: long-term, nonvolatile storage; the large, inexpensive, slow level in the storage hierarchy. Characteristics: Seek Time (~8 ms avg), positional latency, rotational latency; Transfer rate (MByte/sec, in blocks); Capacity (Gigabytes, quadruples every 2 years, aerodynamics). Example: 7200 RPM = 120 RPS => 8 ms per rev, ave rot. latency = 4 ms; 128 sectors per track => 0.25 ms per sector; 1 KB per sector => 16 MB / s. Response time = Queue + Controller + Seek + Rot + Xfer, where Service time = Controller + Seek + Rot + Xfer.

11 CMPUT429/CMPE382 Amaral 1/17/01 Disk Performance Model /Trends Capacity + 100%/year (2X / 1.0 yrs) Transfer rate (BW) + 40%/year (2X / 2.0 yrs) Rotation + Seek time – 8%/ year (1/2 in 10 yrs) MB/$ > 100%/year (2X / 1.0 yrs) Fewer chips + areal density

12 CMPUT429/CMPE382 Amaral 1/17/01 State of the Art: Barracuda 180. 181.6 GB, 3.5 inch disk; 12 platters, 24 surfaces; 24,247 cylinders; 7,200 RPM (4.2 ms avg. latency); 7.4/8.2 ms avg. seek (r/w); 64 to 35 MB/s (internal); 0.1 ms controller time; 10.3 watts (idle). Latency = Queuing Time + Controller time + Seek Time + Rotation Time + Size / Bandwidth (per access + per byte). (Diagram labels: sector, track, cylinder, head, platter, arm, track buffer.)

13 CMPUT429/CMPE382 Amaral 1/17/01 Disk Performance Example (will fix later). Calculate the time to read 64 KB (128 sectors) for the Barracuda 180 X using advertised performance; the sector is on an outer track. Disk latency = average seek time + average rotational delay + transfer time + controller overhead = 7.4 ms + 0.5 rotation/(7200 RPM) + 64 KB/(65 MB/s) + 0.1 ms = 7.4 ms + 0.5/(7200 RPM/(60000 ms/min)) + 64 KB/(65 KB/ms) + 0.1 ms = 7.4 + 4.2 + 1.0 + 0.1 ms = 12.7 ms
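A minimal sketch of this latency model, assuming the Barracuda numbers above; the function name and the KB/ms simplification mirror the slide's arithmetic and are illustrative only.

```python
# Disk latency = average seek + average rotational delay (half a revolution)
#                + transfer time + controller overhead, as on the slide.
def disk_latency_ms(seek_ms, rpm, request_kb, bandwidth_mb_per_s, controller_ms):
    rotational_ms = 0.5 * 60_000.0 / rpm            # half a revolution, in milliseconds
    transfer_ms = request_kb / bandwidth_mb_per_s   # treat 1 MB/s as 1 KB/ms, as the slide does
    return seek_ms + rotational_ms + transfer_ms + controller_ms

print(disk_latency_ms(7.4, 7200, 64, 65, 0.1))      # ~12.7 ms
```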

14 CMPUT429/CMPE382 Amaral 1/17/01 Areal Density Bits recorded along a track –Metric is Bits Per Inch (BPI) Number of tracks per surface –Metric is Tracks Per Inch (TPI) Disk Designs Brag about bit density per unit area –Metric is Bits Per Square Inch –Called Areal Density –Areal Density = BPI x TPI

15 CMPUT429/CMPE382 Amaral 1/17/01 Areal Density –Areal Density = BPI x TPI –Change slope 30%/yr to 60%/yr about 1991

16 CMPUT429/CMPE382 Amaral 1/17/01 MBits per square inch: DRAM as % of Disk over time (chart data points: 0.2 v. 1.7 Mb/si; 9 v. 22 Mb/si; 470 v. Mb/si). source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

17 CMPUT429/CMPE382 Amaral 1/17/01 Historical Perspective. 1956 IBM Ramac; early 1970s Winchester: developed for mainframe computers, proprietary interfaces; steady shrink in form factor, 27 in. to 14 in. Form factor and capacity drive the market more than performance. 1970s: mainframes, 14 inch diameter disks. 1980s: minicomputers and servers, 8 inch and 5 1/4 inch diameter disks; PCs and workstations. Late 1980s/early 1990s: mass-market disk drives become a reality (industry standards: SCSI, IPI, IDE); pizzabox PCs, 3.5 inch diameter disks; laptops and notebooks, 2.5 inch disks; palmtops didn't use disks, so 1.8 inch diameter disks didn't make it. 2000s: 1 inch for cameras, cell phones?

18 CMPUT429/CMPE382 Amaral 1/17/01 Disk History. Data density (Mbit/sq. in.) and capacity of unit shown (MBytes): 1973: 1.7 Mbit/sq. in., 140 MBytes; 1979: 7.7 Mbit/sq. in., 2,300 MBytes. source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

19 CMPUT429/CMPE382 Amaral 1/17/01 Disk History. 1989: 63 Mbit/sq. in., 60,000 MBytes; 1997: 1450 Mbit/sq. in., 2300 MBytes; 1997: 3090 Mbit/sq. in., 8100 MBytes. source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

20 CMPUT429/CMPE382 Amaral 1/17/01 1 inch disk drive! 2000 IBM MicroDrive: – 1.7 x 1.4 x 0.2 –1 GB, 3600 RPM, 5 MB/s, 15 ms seek –Digital camera, PalmPC? 2006 MicroDrive? 9 GB, 50 MB/s! –Assuming it finds a niche in a successful product –Assuming past trends continue

21 CMPUT429/CMPE382 Amaral 1/17/01 Disk Characteristics in 2000 (comparison table; $447 / $435 / $828)

22 CMPUT429/CMPE382 Amaral 1/17/01 Disk Characteristics in 2000

23 CMPUT429/CMPE382 Amaral 1/17/01 Disk Characteristics in 2000

24 CMPUT429/CMPE382 Amaral 1/17/01 Disk Characteristics in 2000

25 CMPUT429/CMPE382 Amaral 1/17/01 Fallacy: Use Data Sheet Average Seek Time. Manufacturers needed a standard for fair comparison (benchmark): calculate all seeks from all tracks, divide by the number of seeks => average. A real average would be based on how data is laid out on the disk and where real applications actually seek, then measured; usually applications tend to seek to nearby tracks, not to a random track. Rule of Thumb: observed average seek time is typically about 1/4 to 1/3 of the quoted seek time (i.e., 3X-4X faster). Barracuda 180 X avg. seek: 7.4 ms quoted => about 2.5 ms observed.

26 CMPUT429/CMPE382 Amaral 1/17/01 Fallacy: Use Data Sheet Transfer Rate. Manufacturers quote the data rate off the surface of the disk. Sectors contain an error detection and correction field (which can be 20% of the sector size) plus a sector number as well as data, and there are gaps between sectors on a track. Rule of Thumb: disks deliver about 3/4 of the internal media rate (1.3X slower) for data. For example, the Barracuda 180X quotes a 64 to 35 MB/sec internal media rate => 47 to 26 MB/sec external data rate (74%).

27 CMPUT429/CMPE382 Amaral 1/17/01 Disk Performance Example. Calculate the time to read 64 KB for the Barracuda 180 X again, this time using 1/3 of the quoted seek time and 3/4 of the internal outer-track bandwidth (12.7 ms before). Disk latency = average seek time + average rotational delay + transfer time + controller overhead = (0.33 x 7.4 ms) + 0.5 rotation/(7200 RPM) + 64 KB/(0.75 x 65 MB/s) + 0.1 ms = 2.5 ms + 0.5/(7200 RPM/(60000 ms/min)) + 64 KB/(47 KB/ms) + 0.1 ms = 2.5 + 4.2 + 1.4 + 0.1 ms = 8.2 ms (64% of 12.7)
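Re-using the hypothetical disk_latency_ms sketch from the earlier example with the two rule-of-thumb adjustments (1/3 of the quoted seek time, 3/4 of the internal media rate):

```python
# Same model as before, with the de-rated seek time and bandwidth.
print(disk_latency_ms(0.33 * 7.4, 7200, 64, 0.75 * 65, 0.1))   # ~8 ms, close to the slide's 8.2 ms
```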

28 CMPUT429/CMPE382 Amaral 1/17/01 Future Disk Size and Performance. Continued advance in capacity (60%/yr) and bandwidth (40%/yr); slow improvement in seek and rotation (8%/yr). Time to read the whole disk, by year (sequentially / randomly with 1 sector per seek): minutes / 6 hours, versus minutes / 1 week(!). Does the 3.5" form factor make sense in 5 yrs? What is the capacity, bandwidth, seek time, RPM? Assume today 80 GB, 30 MB/sec, 6 ms, RPM

29 CMPUT429/CMPE382 Amaral 1/17/01 Tape vs. Disk. Longitudinal tape uses the same technology as hard disk and tracks its density improvements. The disk head flies above the surface; the tape head lies on the surface. Disk is fixed, tape is removable. Inherent cost-performance is based on geometries: fixed rotating platters with gaps (random access, limited area, 1 media / reader) vs. removable long strips wound on a spool (sequential access, "unlimited" length, multiple media / reader). Helical Scan (VCR, camcorder, DAT): spins the head at an angle to the tape to improve density.

30 CMPUT429/CMPE382 Amaral 1/17/01 Current Drawbacks to Tape Tape wear out: –Helical 100s of passes to 1000s for longitudinal Head wear out: –2000 hours for helical Both must be accounted for in economic / reliability model Bits stretch Readers must be compatible with multiple generations of media Long rewind, eject, load, spin-up times; not inherent, just no need in marketplace Designed for archival

31 CMPUT429/CMPE382 Amaral 1/17/01 Automated Cartridge System: StorageTek Powderhorn. 6000 x 50 GB 9830 tapes = 300 TBytes in 2000 (uncompressed). Library of Congress: all information in the world; in 1992, ASCII of all books = 30 TB. Exchange up to 450 tapes per hour (8 secs/tape). 1.7 to 7.7 Mbyte/sec per reader, up to 10 readers. 7.7 feet x 10.7 feet, 8200 pounds, 1.1 kilowatts.

32 CMPUT429/CMPE382 Amaral 1/17/01 Library vs. Storage Getting books today as quaint as programming in the 1970s: –punch cards, batch processing –wander thru shelves, anticipatory purchasing Cost $1 per book to check out $30 for a catalogue entry 30% of all books never checked out Write only journals? Digital library can transform campuses

33 CMPUT429/CMPE382 Amaral 1/17/01 Whither tape? Investment in research: –90% of disks shipped in PCs; 100% of PCs have disks –~0% of tape readers shipped in PCs; ~0% of PCs have tapes Before, N disks / tape; today, N tapes / disk –40 GB/DLT tape (uncompressed) –80 to 192 GB/3.5" disk (uncompressed) Cost per GB: –In past, 10X to 100X tape cartridge vs. disk –Jan 2001: 40 GB for $53 (DLT cartridge), $2800 for reader –$1.33/GB cartridge, $2.03/GB 100 cartridges + 1 reader –($10995 for 1 reader + 15 tape autoloader, $10.50/GB) –Jan 2001: 80 GB for $244 (IDE,5400 RPM), $3.05/GB –Will $/GB tape v. disk cross in 2001? 2002? 2003? Storage field is based on tape backup; what should we do? Discussion if time permits?

34 CMPUT429/CMPE382 Amaral 1/17/01 Use Arrays of Small Disks? Disk Array: 1 disk design Conventional: 4 disk designs Low End High End Katz and Patterson asked in 1987: Can smaller disks be used to close gap in performance between disks and CPUs?

35 CMPUT429/CMPE382 Amaral 1/17/01 Advantages of Small Formfactor Disk Drives Low cost/MB High MB/volume High MB/watt Low cost/Actuator Cost and Environmental Efficiencies

36 CMPUT429/CMPE382 Amaral 1/17/01 Replace a Small Number of Large Disks with a Large Number of Small Disks! (1988 Disks; Capacity / Volume / Power / Data Rate / I/O Rate / MTTF / Cost.) IBM 3390K: 20 GBytes, 97 cu. ft., 3 KW, 15 MB/s, 600 I/Os/s, 250 KHrs, $250K. IBM 3.5" drive: 320 MBytes, 0.1 cu. ft., 11 W, 1.5 MB/s, 55 I/Os/s, 50 KHrs, $2K. x70 array: 23 GBytes, 11 cu. ft., 1 KW, 120 MB/s, 3900 IOs/s, ??? Hrs, $150K. Disk arrays have the potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability? (Array vs. 3390K: roughly 9X less volume, 3X less power, 8X the data rate, 6X the I/O rate.)

37 CMPUT429/CMPE382 Amaral 1/17/01 Array Reliability. Reliability of N disks = Reliability of 1 Disk ÷ N: 50,000 Hours ÷ 70 disks = 700 hours. Disk system MTTF drops from 6 years to 1 month! Arrays (without redundancy) are too unreliable to be useful! Hot spares support reconstruction in parallel with access: very high media availability can be achieved.

38 CMPUT429/CMPE382 Amaral 1/17/01 Redundant Arrays of (Inexpensive) Disks Files are "striped" across multiple disks Redundancy yields high data availability –Availability: service still provided to user, even if some components failed Disks will still fail Contents reconstructed from data redundantly stored in the array Capacity penalty to store redundant info Bandwidth penalty to update redundant info

39 CMPUT429/CMPE382 Amaral 1/17/01 Redundant Arrays of Inexpensive Disks RAID 1: Disk Mirroring/Shadowing Each disk is fully duplicated onto its mirror Very high availability can be achieved Bandwidth sacrifice on write: Logical write = two physical writes Reads may be optimized Most expensive solution: 100% capacity overhead ( RAID 2 not interesting, so skip) recovery group

40 CMPUT429/CMPE382 Amaral 1/17/01 Redundant Array of Inexpensive Disks RAID 3: Parity Disk. P contains the sum of the other disks per stripe, mod 2 (parity). If a disk fails, subtract P from the sum of the other disks to find the missing information. (Diagram: a logical record striped across physical records on the data disks, plus the parity disk P.)

41 CMPUT429/CMPE382 Amaral 1/17/01 RAID 3. The sum is computed across the recovery group to protect against hard disk failures and is stored in the P disk. Logically a single high-capacity, high-transfer-rate disk: good for large transfers. Wider arrays reduce capacity costs but decrease availability. 33% capacity cost for parity in this configuration.
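A small illustrative sketch (not from the slides) of the parity idea: the parity block is the XOR, i.e., the sum mod 2, of the data blocks in a stripe, so any one lost block can be rebuilt from the survivors plus P.

```python
from functools import reduce

def parity(blocks):
    """XOR all equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]   # three data disks, one stripe each
p = parity(data)                                 # contents of the parity disk P
rebuilt = parity([data[0], data[2], p])          # disk 1 lost: XOR the survivors and P
assert rebuilt == data[1]
```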

42 CMPUT429/CMPE382 Amaral 1/17/01 Inspiration for RAID 4 RAID 3 relies on parity disk to discover errors on Read But every sector has an error detection field Rely on error detection field to catch errors on read, not on the parity disk Allows independent reads to different disks simultaneously

43 CMPUT429/CMPE382 Amaral 1/17/01 Problems of Disk Arrays: Small Writes. RAID-5 small write algorithm: to update D0 to D0' in a stripe D0 D1 D2 D3 with parity P: (1) read the old data D0, (2) read the old parity P, (3) write the new data D0', (4) write the new parity P' = (D0 XOR D0') XOR P. 1 Logical Write = 2 Physical Reads + 2 Physical Writes.
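A sketch of that small-write parity update under the same assumptions (hypothetical helper names): the new parity depends only on the old data, the new data, and the old parity, which is why one logical write costs two reads and two writes.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    # P' = P xor D_old xor D_new; the rest of the stripe never needs to be read.
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)
```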

44 CMPUT429/CMPE382 Amaral 1/17/01 System Availability: Orthogonal RAIDs. (Diagram: an array controller fans out to several string controllers, each driving a string of disks.) Data Recovery Group: the unit of data redundancy. Redundant support components: fans, power supplies, controller, cables. End-to-end data integrity: internal parity-protected data paths.

45 CMPUT429/CMPE382 Amaral 1/17/01 System-Level Availability. Goal: no single points of failure. (Diagram: fully dual-redundant hosts, I/O controllers, array controllers, and recovery groups.) With duplicated paths, higher performance can be obtained when there are no failures.

46 CMPUT429/CMPE382 Amaral 1/17/01 Berkeley History: RAID-I. RAID-I (1989) consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software. Today RAID is a $19 billion industry, and 80% of non-PC disks are sold in RAIDs.

47 CMPUT429/CMPE382 Amaral 1/17/01 Summary: RAID Techniques: Goal was performance, popularity due to reliability of storage Disk Mirroring, Shadowing (RAID 1) Each disk is fully duplicated onto its "shadow" Logical write = two physical writes 100% capacity overhead Parity Data Bandwidth Array (RAID 3) Parity computed horizontally Logically a single high data bw disk High I/O Rate Parity Array (RAID 5) Interleaved parity blocks Independent reads and writes Logical write = 2 reads + 2 writes

48 CMPUT429/CMPE382 Amaral 1/17/01 Summary Storage. Disks: extraordinary advance in capacity per drive and $/GB; currently 17 Gbit/sq. in.; can it continue past 100 Gbit/sq. in.? Bandwidth and seek time are not keeping up: does the 3.5 inch form factor make sense? 2.5 inch form factor in the near future? 1.0 inch form factor in the long term? Tapes: no investment, must be backwards compatible; are they already dead? What is a tapeless backup system?

49 CMPUT429/CMPE382 Amaral 1/17/01 Reliability Definitions. Examples of why precise definitions are so important for reliability. Is a programming mistake a fault, error, or failure? Are we talking about the time it was designed or the time the program is run? If the running program doesn't exercise the mistake, is it still a fault/error/failure? If an alpha particle hits a DRAM memory cell, is it a fault/error/failure if it doesn't change the value? Is it a fault/error/failure if the memory doesn't access the changed bit? Did a fault/error/failure still occur if the memory had error correction and delivered the corrected value to the CPU?

50 CMPUT429/CMPE382 Amaral 1/17/01 IFIP Standard terminology. Computer system dependability: the quality of delivered service such that reliance can be placed on the service. Service is the observed actual behavior as perceived by other system(s) interacting with this system's users. Each module has an ideal specified behavior, where the service specification is an agreed description of the expected behavior. A system failure occurs when the actual behavior deviates from the specified behavior; the failure occurred because of an error, a defect in the module. The cause of an error is a fault. When a fault occurs it creates a latent error, which becomes effective when it is activated. When the error actually affects the delivered service, a failure occurs (the time from error to failure is the error latency).

51 CMPUT429/CMPE382 Amaral 1/17/01 Fault v. (Latent) Error v. Failure. A fault creates one or more latent errors. Properties of errors: a latent error becomes effective once activated; an error may cycle between its latent and effective states; an effective error often propagates from one component to another, thereby creating new errors. An effective error is either a formerly latent error in that component or one that propagated from another error. A component failure occurs when the error affects the delivered service. These properties are recursive and apply to any component in the system. An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error.

52 CMPUT429/CMPE382 Amaral 1/17/01 Fault v. (Latent) Error v. Failure. An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error. Is a programming mistake a fault, error, or failure? Are we talking about the time it was designed or the time the program is run? If the running program doesn't exercise the mistake, is it still a fault/error/failure? A programming mistake is a fault; the consequence is an error (or latent error) in the software; upon activation, the error becomes effective; when this effective error produces erroneous data which affect the delivered service, a failure occurs.

53 CMPUT429/CMPE382 Amaral 1/17/01 Fault v. (Latent) Error v. Failure. An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error. If an alpha particle hits a DRAM memory cell, is it a fault/error/failure if it doesn't change the value? Is it a fault/error/failure if the memory doesn't access the changed bit? Did a fault/error/failure still occur if the memory had error correction and delivered the corrected value to the CPU? An alpha particle hitting a DRAM can be a fault; if it changes the memory, it creates an error; the error remains latent until the affected memory word is read; if the affected word affects the delivered service, a failure occurs.

54 CMPUT429/CMPE382 Amaral 1/17/01 Fault v. (Latent) Error v. Failure. An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error. What if a person makes a mistake, data is altered, and service is affected? fault: error: latent: failure:

55 CMPUT429/CMPE382 Amaral 1/17/01 Fault Tolerance vs Disaster Tolerance. Fault-Tolerance (or more properly, Error-Tolerance): mask local faults (prevent errors from becoming failures); RAID disks, Uninterruptible Power Supplies, cluster failover. Disaster Tolerance: masks site errors (prevent site errors from causing service failures); protects against fire, flood, sabotage, ...; redundant system and service at a remote site; use design diversity. From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

56 CMPUT429/CMPE382 Amaral 1/17/01 Defining reliability and availability quantitatively. Users perceive a system alternating between 2 states of service with respect to the service specification: 1. service accomplishment, where service is delivered as specified; 2. service interruption, where the delivered service is different from the specified service, measured as Mean Time To Repair (MTTR). Transitions between these 2 states are caused by failures (from state 1 to state 2) or restorations (2 to 1). Module reliability: a measure of continuous service accomplishment (or of time to failure) from a reference point, e.g., Mean Time To Failure (MTTF); the reciprocal of MTTF is the failure rate. Module availability: a measure of service accomplishment with respect to the alternation between the 2 states of accomplishment and interruption = MTTF / (MTTF + MTTR).

57 CMPUT429/CMPE382 Amaral 1/17/01 Fail-Fast is Good, Repair is Needed. As MTTF >> MTTR, improving either MTTR or MTTF gives benefit. Note: Mean Time Between Failures (MTBF) = MTTF + MTTR. Lifecycle of a module: fail-fast gives short fault latency. High availability is low unavailability: Unavailability = MTTR / (MTTF + MTTR). From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

58 CMPUT429/CMPE382 Amaral 1/17/01 Dependability: The 3 ITIES. Reliability / Integrity: does the right thing (also large MTTF). Availability: does it now (also small MTTR); Availability = MTTF / (MTTF + MTTR). System Availability: if 90% of terminals are up & 99% of the DB is up => 89% of transactions are serviced on time. (Diagram: Security, Integrity, Reliability, Availability.) From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.
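A one-line check of that composite-availability figure (illustrative; assumes the terminal and database availabilities are independent and multiply):

```python
terminals, database = 0.90, 0.99
print(terminals * database)   # 0.891 -> about 89% of transactions serviced on time
```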

59 CMPUT429/CMPE382 Amaral 1/17/01 Reliability Example. If we assume the collection of modules has exponentially distributed lifetimes (the age of a component doesn't matter in the failure probability) and that modules fail independently, the overall failure rate of the collection is the sum of the failure rates of the modules. Calculate the MTTF of a disk subsystem with: 10 disks, each rated at 1,000,000 hour MTTF; 1 SCSI controller, 500,000 hour MTTF; 1 power supply, 200,000 hour MTTF; 1 fan, 200,000 hour MTTF; 1 SCSI cable, 1,000,000 hour MTTF. Failure Rate = 10 x 1/1,000,000 + 1/500,000 + 1/200,000 + 1/200,000 + 1/1,000,000 = (10 + 2 + 5 + 5 + 1)/1,000,000 = 23/1,000,000. MTTF = 1/Failure Rate = 1,000,000/23 = 43,500 hrs.
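The same calculation as a short sketch; the component table is the one from the slide, and the series-system assumption (independent, exponentially distributed lifetimes) is the one stated above.

```python
components = {                     # component -> (count, rated MTTF in hours)
    "disk":            (10, 1_000_000),
    "SCSI controller": (1,    500_000),
    "power supply":    (1,    200_000),
    "fan":             (1,    200_000),
    "SCSI cable":      (1,  1_000_000),
}
failure_rate = sum(count / mttf for count, mttf in components.values())  # failures per hour
print(1.0 / failure_rate)          # ~43,478 hours subsystem MTTF (the slide rounds to 43,500)
```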

60 CMPUT429/CMPE382 Amaral 1/17/01 What's wrong with MTTF? 1,000,000 hours MTTF is more than 100 years; practically infinity? How is it calculated? Put, say, 2000 units in a room, count failures in 60 days, and then calculate the rate; with so few failures over roughly 2.9 million unit-hours, the 1,000,000 hr MTTF claim holds. Suppose we did this with people? 1998 deaths per year in the US ("failure rate"): deaths among 5 to 14 year olds = 20/100,000 => MTTF human = 100,000/20 = 5,000 years; deaths among >85 year olds = 20,000/100,000 => MTTF human = 100,000/20,000 = 5 years. source: "Deaths: Final Data for 1998,"

61 CMPUT429/CMPE382 Amaral 1/17/01 What's wrong with MTTF? 1,000,000 hours MTTF is more than 100 years; practically infinity? But disk lifetime is 5 years! => if you replace a disk every 5 years, on average it wouldn't fail until the 21st replacement. A better unit: % that fail. Fraction failing over the lifetime if we had 1000 disks for 5 years = (1000 disks x 5 years x 365 x 24 hours) / 1,000,000 hrs/failure = 43,800,000 / 1,000,000 = 44 failures = 4.4% fail with 1,000,000 MTTF. Detailed disk specs list failures/million/month: typically about 800 failures per month per million disks at 1,000,000 MTTF, or about 1% per year for a 5 year disk lifetime.
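A sketch of the "% that fail" arithmetic, assuming the constant failure rate of 1/MTTF used above:

```python
disks, years, mttf_hours = 1000, 5, 1_000_000
disk_hours = disks * years * 365 * 24                 # total exposure in disk-hours
expected_failures = disk_hours / mttf_hours
print(expected_failures, expected_failures / disks)   # ~43.8 failures, ~4.4% of the disks
```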

62 CMPUT429/CMPE382 Amaral 1/17/01 Dependability Big Idea: No Single Point of Failure. Since hardware MTTF is often 100,000 to 1,000,000 hours and MTTR is often 1 to 10 hours, there is a good chance that if one component fails it will be repaired before a second component fails. Hence design systems with sufficient redundancy that there is no single point of failure.

63 CMPUT429/CMPE382 Amaral 1/17/01 HW Failures in Real Systems: Tertiary Disks A cluster of 20 PCs in seven 7-foot high, 19-inch wide racks with GB, 7200 RPM, 3.5-inch IBM disks. The PCs are P6-200MHz with 96 MB of DRAM each. They run FreeBSD 3.0 and the hosts are connected via switched 100 Mbit/second Ethernet

64 CMPUT429/CMPE382 Amaral 1/17/01 When To Repair? Chances of tolerating a fault are 1000:1 (class 3). A 1995 study: processor & disc rated at ~10k hr MTTF. Computed single fails vs. observed double fails: 10k processor fails, 14 double, ratio ~1000:1; 40k disc fails, 26 double, ratio ~1000:1. Hardware maintenance: on-line maintenance "works" 999 times out of 1000. The chance a duplexed disc will fail during maintenance? 1:1000. Risk is 30x higher during maintenance => do it at off-peak hours. Software maintenance: repair only virulent bugs; wait for the next release to fix benign bugs. From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

65 CMPUT429/CMPE382 Amaral 1/17/01 Sources of Failures (MTTF / MTTR). Power failure: 2000 hr / 1 hr. Phone lines: soft > 0.1 hr / 0.1 hr; hard 4000 hr / 10 hr. Hardware modules: 100,000 hr / 10 hr (many are transient). Software: 1 bug per 1000 lines of code (after vendor-user testing) => thousands of bugs in a system! Most software failures are transient: dump & restart the system. Useful fact: 8,760 hrs/year ~ 10k hr/year. From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

66 CMPUT429/CMPE382 Amaral 1/17/01 Case Study - Japan. "Survey on Computer Security", Japan Info Dev Corp., March (trans: Eiichi Watanabe). 1,383 institutions reported (6/84 - 7/85): 7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 minutes. Reported MTTF by cause (share of outages): Vendor (hardware and software) 5 months (42%); Application software 9 months (25%); Communications lines 1.5 years (12%); Operations and Environment 2 years each (9.3% and 11.2%). To get a 10-year MTTF, one must attack all these areas. From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

67 CMPUT429/CMPE382 Amaral 1/17/01 Case Studies - Tandem Trends: Reported MTTF by Component (in years, across the surveys): Software, Hardware, Maintenance, Operations, Environment; overall SYSTEM MTTF: 8, 20, 21 years. Problem: systematic under-reporting. From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

68 CMPUT429/CMPE382 Amaral 1/17/01 Is Maintenance the Key? VAX crashes in '85 and '93 [Murp95], extrapolated to '01. System management: N crashes per problem, SysAdmin action; actions: bad parameter settings, bad configuration, bad application install. HW/OS caused 70% of crashes in '85, down to 28% in '93; in '01, 10%? Rule of Thumb: maintenance costs 10X the hardware, so over a 5 year product life ~95% of the cost is maintenance.

69 CMPUT429/CMPE382 Amaral 1/17/01 OK: So Far. Hardware fail-fast is easy. Redundancy plus repair is great (Class 7 availability). Hardware redundancy & repair is via modules. How can we get instant software repair? We know how to get reliable storage: RAID, or dumps and transaction logs. We know how to get available storage: fail-soft duplexed discs (RAID 1...N). How do we get reliable execution? How do we get available execution? From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.

70 CMPUT429/CMPE382 Amaral 1/17/01 Does Hardware Fail Fast? 4 of 384 Disks that failed in Tertiary Disk

71 CMPUT429/CMPE382 Amaral 1/17/01 High Availability System Classes. Goal: Build Class 6 Systems. UnAvailability = MTTR/MTBF; it can be cut in half by cutting MTTR or MTBF. System Type / Unavailable (min/year) / Availability / Availability Class: Unmanaged, 50,000, 90.%, 1; Managed, 5,000, 99.%, 2; Well Managed, 500, 99.9%, 3; Fault Tolerant, 50, 99.99%, 4; High-Availability, 5, 99.999%, 5; Very-High-Availability, 0.5, 99.9999%, 6; Ultra-Availability, 0.05, 99.99999%, 7. From Jim Gray's talk at UC Berkeley on Fault Tolerance, 11/9/00.
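The unavailable-minutes column follows directly from the number of nines; a quick illustrative check (the slide rounds its figures):

```python
minutes_per_year = 365 * 24 * 60                      # ~525,600 minutes
for nines in range(1, 8):                             # availability classes 1..7
    unavailable = minutes_per_year * 10 ** -nines
    print(nines, round(unavailable, 2))               # ~52,560 / 5,256 / ... / 0.05 min/year
```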

72 CMPUT429/CMPE382 Amaral 1/17/01 How Realistic is "5 Nines"? HP claims the HP-9000 server HW and HP-UX OS can deliver a 99.999% availability guarantee in certain pre-defined, pre-tested customer environments. Application faults? Operator faults? Environmental faults? Collocation sites (lots of computers in 1 building on the Internet) have about 1 network outage per year (~1 day) and 1 power failure per year (~1 day). Microsoft Network was unavailable recently for a day due to a problem in the Domain Name Server: if that were the only outage in a year, that is 99.7%, or 2 Nines.

73 CMPUT429/CMPE382 Amaral 1/17/01 Summary: Dependability. Fault => latent errors in the system => failure in service. Reliability: a quantitative measure of time to failure (MTTF); assuming exponentially distributed independent failures, the system MTTF can be calculated from the MTTFs of the components. Availability: a quantitative measure of the % of time the desired service is delivered. Availability can be improved via greater MTTF or smaller MTTR (such as using standby spares). No single point of failure is a good hardware guideline, as everything can fail. Components often fail slowly. Real systems have problems in maintenance and operation as well as hardware and software.

74 CMPUT429/CMPE382 Amaral 1/17/01 Summary: Dependability. Fault => latent errors in the system => failure in service. Reliability: a quantitative measure of time to failure (MTTF); assuming exponentially distributed independent failures, the system MTTF can be calculated from the MTTFs of the components. Availability: a quantitative measure of the % of time the desired service is delivered. Availability can be improved via greater MTTF or smaller MTTR (such as using standby spares). No single point of failure is a good hardware guideline, as everything can fail. Components often fail slowly. Real systems have problems in maintenance and operation as well as hardware and software.

75 CMPUT429/CMPE382 Amaral 1/17/01 Introduction to Queueing Theory. More interested in long-term, steady-state behavior than in startup => Arrivals = Departures. Little's Law: mean number of tasks in system = arrival rate x mean response time (observed by many; Little was the first to prove it). Applies to any system in equilibrium, as long as nothing in the black box is creating or destroying tasks. (Diagram: arrivals enter the black box, departures leave.)

76 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: Notation. Queuing models assume a state of equilibrium: input rate = output rate. Notation: r = average number of arriving customers/second; T_ser = average time to service a customer (traditionally µ = 1/T_ser); u = server utilization (0..1): u = r x T_ser (i.e., r/µ); T_q = average time/customer in queue; T_sys = average time/customer in system: T_sys = T_q + T_ser; L_q = average length of queue: L_q = r x T_q; L_sys = average length of system: L_sys = r x T_sys. Little's Law: Length of system = rate x Time in system (mean number of customers = arrival rate x mean time in system). (Diagram: Proc sends requests to a queue feeding a server (IOC, Device); the queue plus server form the system.)

77 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory. Service time completions vs. waiting time for a busy server: a randomly arriving event joins a queue of arbitrary length when the server is busy, otherwise it is serviced immediately; unlimited-length queues are a key simplification. A single server queue: the combination of a servicing facility that accommodates 1 customer at a time (server) plus a waiting area (queue), together called a system. The server spends a variable amount of time with customers; how do you characterize that variability? Distribution of a random variable: histogram? curve? (Diagram: Proc, queue, server, system.)

78 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory. The server spends a variable amount of time with customers. Weighted mean m1 = (f1 x T1 + f2 x T2 + ... + fn x Tn)/F, where F = f1 + f2 + ...; variance = (f1 x T1² + f2 x T2² + ... + fn x Tn²)/F - m1² (must keep track of the unit of measure: 100 ms² vs. 0.1 s²); squared coefficient of variance: C = variance/m1², a unitless measure. Exponential distribution, C = 1: most are short relative to the average, a few others are long; 90% < 2.3 x average, 63% < average. Hypoexponential distribution, C < 1: most are close to the average; 90% < 2.0 x average, only 57% < average. Hyperexponential distribution, C > 1: further from the average; C = 2.0 => 90% < 2.8 x average, 69% < average. (Diagram: Proc, queue, server, system.)

79 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: Variable Service Time. The server spends a variable amount of time with customers: weighted mean m1 = (f1 x T1 + f2 x T2 + ... + fn x Tn)/F, where F = f1 + f2 + ...; squared coefficient of variance C. Disk response times have C ≈ 1.5 (the majority of seeks are shorter than the average), yet we usually pick C = 1.0 for simplicity. Another useful value is the average time one must wait for the server to complete the task already in progress: m1(z). It is not just 1/2 x m1, because that doesn't capture the variance; one can derive m1(z) = 1/2 x m1 x (1 + C). No variance => C = 0 => m1(z) = 1/2 x m1. (Diagram: Proc, queue, server, system.)

80 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: Average Wait Time. Calculating the average wait time in queue, T_q: if something is at the server, it takes on average m1(z) to complete; the chance the server is busy is u, so the average delay is u x m1(z); all customers already in line must also complete, each taking on average T_ser. So T_q = u x m1(z) + L_q x T_ser = 1/2 x u x T_ser x (1 + C) + L_q x T_ser; substituting L_q = r x T_q gives T_q = 1/2 x u x T_ser x (1 + C) + r x T_q x T_ser = 1/2 x u x T_ser x (1 + C) + u x T_q; hence T_q x (1 - u) = T_ser x u x (1 + C)/2, and T_q = T_ser x u x (1 + C) / (2 x (1 - u)). Notation: r = average number of arriving customers/second; T_ser = average time to service a customer; u = server utilization (0..1): u = r x T_ser; T_q = average time/customer in queue; L_q = average length of queue: L_q = r x T_q.

81 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: M/G/1 and M/M/1. Assumptions so far: system in equilibrium; the time between two successive arrivals is random; the server can start on the next customer immediately after the prior one finishes; no limit to the queue, which works First-In-First-Out; all customers in line must complete, each taking on average T_ser. This describes a memoryless or Markovian request arrival (M for C = 1, exponentially random), General service distribution (no restrictions), 1 server: an M/G/1 queue. When the service times also have C = 1, it is an M/M/1 queue: T_q = T_ser x u x (1 + C) / (2 x (1 - u)) = T_ser x u / (1 - u). (T_ser = average time to service a customer; u = server utilization = r x T_ser; T_q = average time/customer in queue.)
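A minimal sketch of these two queue-time formulas, using the slide's notation (the function names are hypothetical):

```python
def tq_mg1(t_ser: float, u: float, c: float) -> float:
    """M/G/1 average time in queue: T_q = T_ser * u * (1 + C) / (2 * (1 - u))."""
    return t_ser * u * (1 + c) / (2 * (1 - u))

def tq_mm1(t_ser: float, u: float) -> float:
    """M/M/1 is the C = 1 special case: T_q = T_ser * u / (1 - u)."""
    return t_ser * u / (1 - u)
```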

82 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: An Example. A processor sends 10 x 8 KB disk I/Os per second; requests & service are exponentially distributed; average disk service time = 20 ms. On average, how utilized is the disk? What is the number of requests in the queue? What is the average time spent in the queue? What is the average response time for a disk request? Notation: r = average number of arriving customers/second = 10; T_ser = average time to service a customer = 20 ms (0.02 s); u = server utilization (0..1) = r x T_ser = 10/s x 0.02 s = 0.2; T_q = average time/customer in queue = T_ser x u / (1 - u) = 20 x 0.2/(1 - 0.2) = 20 x 0.25 = 5 ms (0.005 s); T_sys = average time/customer in system = T_q + T_ser = 25 ms; L_q = average length of queue = r x T_q = 10/s x 0.005 s = 0.05 requests in queue; L_sys = average number of tasks in system = r x T_sys = 10/s x 0.025 s = 0.25.
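The example above, reproduced with the hypothetical helpers sketched after the previous slide:

```python
r, t_ser = 10.0, 0.020              # 10 requests/s, 20 ms average service time
u = r * t_ser                       # utilization = 0.2
t_q = tq_mm1(t_ser, u)              # 0.005 s = 5 ms average wait in queue
t_sys = t_q + t_ser                 # 0.025 s = 25 ms average response time
print(u, t_q, r * t_q, r * t_sys)   # 0.2, 5 ms, L_q = 0.05, L_sys = 0.25
```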

83 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: Another Example. A processor sends 20 x 8 KB disk I/Os per second; requests & service are exponentially distributed; average disk service time = 12 ms. On average, how utilized is the disk? What is the number of requests in the queue? What is the average time spent in the queue? What is the average response time for a disk request? Notation: r = average number of arriving customers/second = 20; T_ser = average time to service a customer = 12 ms; u = server utilization (0..1) = r x T_ser = ___/s x ___ s = ___; T_q = average time/customer in queue = T_ser x u / (1 - u) = ___ x ___/(___) = ___ x ___ = ___ ms; T_sys = average time/customer in system = T_q + T_ser = 16 ms; L_q = average length of queue = r x T_q = ___/s x ___ s = ___ requests in queue; L_sys = average number of tasks in system = r x T_sys = ___/s x ___ s = ___.

84 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: Another Example. A processor sends 20 x 8 KB disk I/Os per second; requests & service are exponentially distributed; average disk service time = 12 ms. On average, how utilized is the disk? What is the number of requests in the queue? What is the average time spent in the queue? What is the average response time for a disk request? Notation: r = average number of arriving customers/second = 20; T_ser = average time to service a customer = 12 ms; u = server utilization (0..1) = r x T_ser = 20/s x 0.012 s = 0.24; T_q = average time/customer in queue = T_ser x u / (1 - u) = 12 x 0.24/(1 - 0.24) = 12 x 0.32 = 3.8 ms; T_sys = average time/customer in system = T_q + T_ser = 15.8 ms; L_q = average length of queue = r x T_q = 20/s x 0.0038 s = 0.076 requests in queue; L_sys = average number of tasks in system = r x T_sys = 20/s x 0.016 s = 0.32.

85 CMPUT429/CMPE382 Amaral 1/17/01 A Little Queuing Theory: Yet Another Example. Suppose a processor sends 10 x 8 KB disk I/Os per second, the squared coefficient of variance C = 1.5, and the average disk service time = 20 ms. On average, how utilized is the disk? What is the number of requests in the queue? What is the average time spent in the queue? What is the average response time for a disk request? Notation: r = average number of arriving customers/second = 10; T_ser = average time to service a customer = 20 ms; u = server utilization (0..1) = r x T_ser = 10/s x 0.02 s = 0.2; T_q = average time/customer in queue = T_ser x u x (1 + C) / (2 x (1 - u)) = 20 x 0.2 x 2.5/(2 x (1 - 0.2)) = 20 x 0.3125 = 6.25 ms; T_sys = average time/customer in system = T_q + T_ser = 26 ms; L_q = average length of queue = r x T_q = 10/s x 0.006 s = 0.06 requests in queue; L_sys = average number of tasks in system = r x T_sys = 10/s x 0.026 s = 0.26.

86 CMPUT429/CMPE382 Amaral 1/17/01 Pitfall of Not using Queuing Theory 1st 32-bit minicomputer (VAX-11/780) How big should write buffer be? –Stores 10% of instructions, 1 MIPS Buffer = 1 => Avg. Queue Length = 1 vs. low response time

