J. Gray, Dependability in the Internet Era (acknowledgement: slides from J.Gray, E.Brewer)

1 J. Gray, Dependability in the Internet Era (acknowledgement: slides from J.Gray, E.Brewer)

2 The Last 10 Years: Availability Dark Ages. Ready for a Renaissance?
Things got better, then things got a lot worse!
[Chart: availability (9%, 99%, 99.9%, 99.99%, 99.999%) by year (1950 to 2010) for Computer Systems, Telephone Systems, Cell phones, Internet]

3 DEPENDABILITY: The 3 ITIES
RELIABILITY / INTEGRITY: Does the right thing. (also MTTF >> 1)
AVAILABILITY: Does it now. (also 1 >> MTTR)
Availability = MTTF / (MTTF + MTTR)
System Availability: If 90% of terminals are up & 99% of the DB is up? (=> 89% of transactions are serviced on time.)
Holistic vs. Reductionist view
[Diagram: Security, Integrity, Reliability, Availability]
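
The terminal/DB arithmetic on this slide can be checked with a minimal Python sketch. It assumes the two components fail independently and that a transaction needs both of them up; the function names are illustrative and do not come from the slides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a module is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(*parts: float) -> float:
    """Availability of a system that needs every part up at the same time.

    Assumes the parts fail independently, so the fractions multiply.
    """
    result = 1.0
    for part in parts:
        result *= part
    return result


# Slide example: 90% of terminals up and 99% of the DB up.
system = series_availability(0.90, 0.99)
print(f"{system:.1%}")  # 89.1%
```

The product 0.90 x 0.99 = 0.891 is the "89%" on the slide: the system as a whole is less available than either component.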

4 Fail-Fast is Good, Repair is Needed
Improving either MTTR or MTTF gives benefit.
[Diagram: Lifecycle of a module]
Fail-fast gives short fault latency.
High availability is low UN-availability.
Unavailability ~ MTTR / MTTF
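
A short sketch of why improving either MTTR or MTTF helps. The MTTF and MTTR figures below are made-up illustration values, not numbers from the slides.

```python
def unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Exact fraction of time a module is down: MTTR / (MTTF + MTTR)."""
    return mttr_hours / (mttf_hours + mttr_hours)


def approx_unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """The slide's approximation, valid when MTTF >> MTTR: MTTR / MTTF."""
    return mttr_hours / mttf_hours


# Hypothetical module: fails every 10,000 hours, takes 10 hours to repair.
base = unavailability(10_000, 10)
half_mttr = unavailability(10_000, 5)      # repair twice as fast
double_mttf = unavailability(20_000, 10)   # fail half as often

print(f"base:        {base:.6f}")
print(f"half MTTR:   {half_mttr:.6f}")
print(f"double MTTF: {double_mttf:.6f}")
```

Halving MTTR and doubling MTTF give the same unavailability, roughly half the baseline, which is the sense in which either improvement pays off.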

5 Disks (RAID): the BIG Success Story
Duplex or Parity: masks faults
Disks @ 1M hours (~100 years)
But
–controllers fail and
–have 1,000s of disks.
Duplexing or parity, and dual path, gives “perfect disks”.
Wal-Mart never lost a byte (thousands of disks, hundreds of failures).
Only software/operations mistakes are left.
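
To see why a 1M-hour disk MTTF still means routine failures at scale, here is a rough sketch. It assumes a constant failure rate and independent disks, which is a simplification; the 1,000-disk count is only an example of "1,000s of disks".

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def expected_failures_per_year(n_disks: int, mttf_hours: int = 1_000_000) -> float:
    """Expected disk failures per year, assuming a constant failure rate."""
    return n_disks * HOURS_PER_YEAR / mttf_hours


print(expected_failures_per_year(1))      # 0.00876
print(expected_failures_per_year(1_000))  # 8.76
```

A single disk almost never fails, but a thousand of them fail several times a year, so the redundancy (duplexing or parity) has to mask those faults.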

6 Fault Tolerance vs Disaster Tolerance
Fault Tolerance: masks local faults
–RAID disks
–Uninterruptible Power Supplies
–Cluster Failover
Disaster Tolerance: masks site failures
–Protects against fire, flood, sabotage, …
–Also, software changes, site moves, …
–Redundant system and service at remote site.

7 Availability
[Figure: availability ladder, axis marked in 9s]
Un-managed
Well-managed nodes: masks some hardware failures
Well-managed packs & clones: masks hardware failures and operations tasks (e.g. software upgrades); masks some software failures
Well-managed GeoPlex: masks site failures (power, network, fire, move, …); masks some operations failures

8 Case Studies - Tandem Trends
MTTF improved.
Shift from Hardware & Maintenance (from 50% to 10%) to Software (62%) & Operations (15%).
NOTE: Systematic under-reporting of Environment, Operations errors, Application Software.

9 Dependability Status circa 1995
~4-year MTTF: 5 9s for well-managed systems.
Fault Tolerance Works.
Hardware is GREAT (maintenance and MTTF).
Software masks most hardware faults.
Many hidden software outages in operations: New Software. Utilities.
Need to make all hardware/software changes ONLINE.

10 Progress?
MTTF improved from 1950 to 1995.
MTTR: incremental improvements since 1970 (failover).
Hardware and Software online change (pNp) is now standard.
Then the Internet arrived:
–No project can take more than 3 months.
–Time to market is everything.
–Change is good.
[Chart legend: Computer Systems, Telephone Systems, Cell phones, Internet]

11 The Internet Changed Expectations
1990:
Phones delivered 99.999%.
ATMs delivered 99.99%.
Failures were front-page news.
Few hackers.
Outages last an “hour”.
2005:
Cell phones deliver 90%.
Web sites deliver 99%.
Failures are business-page news.
Many hackers.
Outages last a “day”.
This is progress?
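
The gap between these percentages is easier to see as downtime per year. The sketch below converts the slide's four availability figures; it assumes a 365-day year, and the helper name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


services = [
    ("Phones, 1990", 99.999),
    ("ATMs, 1990", 99.99),
    ("Web sites, 2005", 99.0),
    ("Cell phones, 2005", 90.0),
]

for label, pct in services:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label}: {minutes:,.1f} min/year (~{minutes / 60:,.1f} hours)")
```

Five 9s works out to roughly 5 minutes of downtime a year and four 9s to under an hour, while 99% is about 3.65 days and 90% about 36.5 days.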

12 2006

14 Eric Brewer said it best: ACID vs BASE, the internet litmus test
ACID (Atomicity, Consistency, Isolation, Durability):
  Availability?
  Strong consistency
  Isolation
  Focus on commit
  Conservative (pessimistic)
  Difficult evolution (e.g. schema)
  Nested transactions
BASE (Basic Availability, Soft state, Eventual consistency):
  Availability FIRST
  Weak consistency: stale data is OK
  Approximate answers OK
  Best effort
  Aggressive (optimistic)
  Easier evolution
  Simpler!
  Faster
I think it is a spectrum
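
The "stale data is OK" and "eventual consistency" points on the BASE side can be illustrated with a toy Python sketch. The `Replica` class and its last-writer-wins merge rule are assumptions made for this example; they are not taken from the slides or from any particular system.

```python
class Replica:
    """Toy key-value replica (hypothetical) that converges by last-writer-wins."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        """Accept a write locally; keep it only if it is newer than what we hold."""
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        """Always answer from local state, even if it may be stale."""
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        """Background reconciliation: pull the other replica's newer entries."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], timestamp=1)
b.sync_from(a)

a.write("cart", ["book", "pen"], timestamp=2)
stale = b.read("cart")   # b has not synced yet, so it still sees the old cart
b.sync_from(a)
fresh = b.read("cart")   # after reconciliation both replicas agree

print(stale, fresh)
```

Both replicas keep answering reads and writes without coordinating (availability first), at the cost of a window in which `b` serves stale data. An ACID design would instead block or abort until both copies agree.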

