1 Long Term Storage Trends and You Jim Gray Microsoft Research 28 Sept 2006 Minoan Phaistos Disk:1700 BC About 1KB No one can read it Illiac Disk: 1968.

1 Long Term Storage Trends and You. Jim Gray, Microsoft Research, 28 Sept 2006. Minoan Phaistos Disk: 1700 BC, about 1KB, no one can read it. Illiac Disk: 1968. Storage bricks: 200x.

2 The Abstract: We are headed for a world of 10TB disk drives, 64GB flash cards, and massive main memories. This talk begins with an exploration of these storage trends and how they impact storage heat: 1. everything has to get colder, 2. utilities have to be redesigned to deal with scan times measured in days, and 3. massive replication is needed to mask failures. I assume we all agree that "tape is dead", so I am robbed of that lunatic idea, but I am still left with two crazy ideas: 4. smart disks and 5. the death of SAN. These contrarian ideas are related, of course. The second half of the talk discusses the tape postmortem and these two crazy ideas.

3 The Reality: This is an update of a 6-year-old talk: – Rules of Thumb in Data Engineering, MSR-TR-99-100, 1999; Proc. ICDE 2000 (paper and talk). It revisits those rules in light of 6 years of change and progress, plus a brief note on some recent studies.

4 What's New / Surprising: Not a big surprise, just amazing! – exponential growth in capacity – latency lags bandwidth – the 5-minute rule is now a 30-minute rule. FLASH is coming: – low-end storage (GBs now, 100s of GBs soon) – low-latency storage (a fraction of a ms) – high $/byte but good $/access. Smart disks still seem far off, but...

5 To Blob or Not To Blob (1/2): Folklore: – DBs are good for billions of small things – files are good for thousands of big things. Put another way: – DBs are bad at big objects – file systems have trouble with billions of files. This is a fact, not a law of nature: – DBs and file systems could learn each other's tricks. But… what is big and what is small? Put another way: what is the break-even size?

6 To Blob or Not To Blob (2/2): Folklore: BLOBs win for things less than 1MB. Refinement: once fragmentation is taken into account, BLOBs win below 250KB. Humor: most files are less than 250KB (but most bytes are in big files). Russell Sears, Catharine van Ingen, Jim Gray, To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?, MSR-TR-2006-45, April 2006.

7 How Reliable are Cheap Disks? (1/5): Prices, specs, and gurus suggest SCSI good, SATA bad: – 3x cheaper, but… – 10x shorter MTTF – 10x shorter warranty – 100x higher Uncorrectable Error on Read (UER) rate. The spec sheet says 1 UER every 10 terabytes! So we measured, and here is what we saw…

8 How Reliable are Cheap Disks? (2/5): Things fail much more often than predicted. Vendors say 0.5%/year; customers see ~10x that rate. Vendors say: – 60% are "no trouble found" – 30% are mis-handling (dropped/cooked/bent pins) – 10% are real failures. Will UERs be worse than the specs? We also need to worry about controllers, PCI, RAM, software, … [Chart: disk drive failures.]

9 How Reliable are Cheap Disks? (3/5): For the record, observed failure rates:

System              Part           Part-Years   Fails   Fails/Year
TerraServer SAN     SCSI 10krpm           858      24        2.8%
                    controllers            72       2        2.8%
                    san switch              9       1       11.1%
TerraServer Brick   SATA 7krpm            138      10        7.2%
Web Property 1      SCSI 10krpm        15,805     972        6.0%
                    controllers           900     139       15.4%
Web Property 2      PATA 7krpm         22,400     740        3.3%
                    motherboard         3,769      66        1.7%

Jim Gray, Catharine van Ingen, Empirical Measurements of Disk Failure Rates and Error Rates, MSR-TR-2005-166, December 2005.
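
The Fails/Year column is failures divided by part-years. The minimal Python sketch below recomputes that column from the rows transcribed above and compares each rate with the 0.5%/year that vendors quote on slide 8. The names `annual_failure_rate`, `OBSERVED`, and `VENDOR_AFR` are illustrative only.

```python
VENDOR_AFR = 0.005  # vendors quote 0.5% failures per year


def annual_failure_rate(part_years: float, fails: int) -> float:
    """Observed failures per part-year (the Fails/Year column)."""
    if part_years <= 0:
        raise ValueError("part_years must be positive")
    return fails / part_years


OBSERVED = [
    # (system, part, part-years, fails), transcribed from the table
    ("TerraServer SAN", "SCSI 10krpm", 858, 24),
    ("TerraServer SAN", "controllers", 72, 2),
    ("TerraServer SAN", "san switch", 9, 1),
    ("TerraServer Brick", "SATA 7krpm", 138, 10),
    ("Web Property 1", "SCSI 10krpm", 15_805, 972),
    ("Web Property 1", "controllers", 900, 139),
    ("Web Property 2", "PATA 7krpm", 22_400, 740),
    ("Web Property 2", "motherboard", 3_769, 66),
]

if __name__ == "__main__":
    for system, part, part_years, fails in OBSERVED:
        afr = annual_failure_rate(part_years, fails)
        print(f"{system:18} {part:12} {afr:6.1%}  {afr / VENDOR_AFR:5.1f}x the vendor figure")
```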

10 How Reliable are Cheap Disks? (4/5): The experiment: do 180,000 times (== 1.8PB ~ 1E16 bits): – create and write a 10GB disk file – read it back to check the checksum. This ran on various office systems for 4 months (~8 drive-years). Expected 114 UER events; observed 3 or 4 UER events: – two events corrected by the OS on retry, 1 real one – no disk failures – a file-system corruption (due to the controller, we guess) – many reboots due to security patches – ~4 system hangs (bad controllers/drivers). UER is better than advertised (checked end-to-end). Empirical Measurements of Disk Failure Rates and Error Rates, MSR-TR-2005-166.
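
The experiment's inner loop (write a file, then read it back and check the checksum) can be sketched in Python. This is a hypothetical reconstruction, not the harness the study used. It hashes each block with SHA-256 as it is written and compares that digest with the digest of a full re-read. One caveat applies: with small files the re-read is usually served from the OS page cache. Only a file much larger than RAM, like the experiment's 10GB files, forces the data to come back from the disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, size_bytes: int, chunk: int = 1 << 20) -> bool:
    """Write size_bytes of random data, re-read the file, and compare SHA-256 digests.

    Returns True when the data read back matches the data written.
    """
    written = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk, remaining))
            written.update(block)
            f.write(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    reread = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            reread.update(block)
    return written.digest() == reread.digest()


if __name__ == "__main__":
    trials, mismatches = 3, 0
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(trials):
            path = os.path.join(tmp, f"trial{i}.bin")
            # 4 MB per trial here; the experiment used 10GB files.
            if not write_and_verify(path, 4 << 20):
                mismatches += 1
            os.remove(path)
    print(f"{mismatches} checksum mismatches in {trials} trials")
```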

11 Moral: Design For Failure (5/5): Things break: – disks break – controllers break – systems break – software breaks – data centers break – networks break. Design for independent failure modes: – guard against operations errors – guard against sympathetic failures – guard against viruses – simple recovery is testable. "The cost of reliability is simplicity. Few are willing to pay that price." (T. Hoare)

12 It's Hard to Archive a Petabyte: It takes a LONG time to restore it: at 1GBps it takes 12 days! Store it in two (or more) places online (a geo-plex). Scrub it continuously (look for errors). On failure, – use the other copy until the failure is repaired – refresh the lost copy from the safe copy. The two copies can be organized differently (e.g., one by time, one by space).
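
The "12 days" figure is capacity divided by sustained bandwidth. The Python sketch below redoes that arithmetic with decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes/s), which is an assumption; binary units shift the answer by a few percent. It also checks the "1GBps == ~30 PB/y" rule of slide 17.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # ~31.5M: the "~30M seconds/year" of slide 17
GB = 10**9
PB = 10**15


def scan_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read (or restore) every byte once at a sustained bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return capacity_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    days = scan_seconds(1 * PB, 1 * GB) / SECONDS_PER_DAY
    print(f"Restore 1 PB at 1 GBps: {days:.1f} days")                   # 11.6 days
    print(f"1 GBps for a year moves {GB * SECONDS_PER_YEAR / PB:.1f} PB")  # 31.5 PB
```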

13 Why 4 Copies? Duplex storage masks MOST failures. But… when one copy is broken you are worried, so triplex it (à la GFS, Cosmos, Blue)… And… you need a geo-plex anyway. So why not 2+2 rather than 3+3? Symmetric and simple == good.

14 Outline: Moore's Law and consequences; Storage rules of thumb; Balanced systems rules revisited; Networking rules of thumb; Caching rules of thumb

15 Meta-Message: Technology Ratios Matter. Price and performance change. If everything changes in the same way, then nothing really changes. If some things get much cheaper/faster than others, then that is real change. Some things are not changing much: – cost of people – speed of light – … And some things are changing a LOT.

16 The Perfect Memory (ratio problems): Store name-value pairs; read the value given the name (or a predicate?) instantly! Capacity has grown ~2x/year (or 2x/2y). But ratios are changing: – latency lags bandwidth (Patterson, http://portal.acm.org/citation.cfm?id=1022596) – bandwidth lags capacity. Pipelining (prefetch) can hide latency. There is no way to fake bandwidth; you have to pay for it! [Figure: disk capacity vs. ~100 tx/s and ~100 MB/s.]

17 Find Useful Ways To Waste Space: 1 TB disks now, 100TB disks in 10 years? (or….) Cost: ~1$/GB now, 10$/TB in the future. Smart disks eventually (or now, if you count Xbox, iPod, …). Petabyte: 1,400 disks now, 140 disks in 2012. Simple math: – ~30M seconds/year – 1GBps == ~30 PB/y. Find creative ways to waste 99% of capacity but not use any bandwidth (ice-cold data). [Figure: disk capacity vs. ~100 tx/s and ~100 MB/s.]

18 Technology Trends: 1 TB disks now, 100TB disks in 10 years? (or….) Cost: ~1$/GB now, 10$/TB in the future. Smart disks eventually (or now, if you count Xbox, iPod, …). Petabyte: 1,400 disks now, 300 disks in 2010. Simple math: – ~30M seconds/year – 1GBps == ~30 PB/y. [Figure: disk capacity vs. ~100 tx/s and ~100 MB/s.]

19 Technology Trend: Implication: Find creative ways to waste 99% of capacity but not use any bandwidth (ice-cold data): – replication – snapshots – archive. Pipeline-prefetch rewards – sequential access patterns – very large transfers (large == 1MB now, large == 100MB in the future). Dataflow programming: stream data to programs. [Figure: disk capacity vs. ~100 tx/s and ~100 MB/s.]

20 Technology Trend: Implication: Q: For an infinite disk, how long does it take to – check the disk (scrub) – defragment – reorganize – back up? A: A LONG time. Doing all four takes 4x longer. Nightly/weekly << 4 x infinity. Short-term fix: – combine utility scans – one-pass algorithms – Van Ingen: Where have all the IOPS gone? MSR-TR-2005-181. [Figure: disk capacity vs. ~100 tx/s and ~100 MB/s.]
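
"Combine utility scans" means reading the disk once and feeding every block to all the utilities that need it, instead of running one full scan per utility. The Python sketch below shows that shape under stated assumptions. `one_pass_scan` and the zero-block counter are hypothetical names. A SHA-256 digest stands in for the scrub, and the counter stands in for a space-reclamation pass.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Dict, Iterable, Tuple


def one_pass_scan(stream: BinaryIO, consumers: Iterable[Callable[[bytes], None]],
                  block_size: int = 1 << 20) -> int:
    """Read the stream once, handing each block to every consumer; return bytes scanned."""
    consumers = list(consumers)
    scanned = 0
    for block in iter(lambda: stream.read(block_size), b""):
        for consume in consumers:
            consume(block)
        scanned += len(block)
    return scanned


def make_zero_block_counter() -> Tuple[Callable[[bytes], None], Dict[str, int]]:
    """A toy 'utility' that counts all-zero blocks (e.g. reclaimable space)."""
    stats = {"zero_blocks": 0}

    def consume(block: bytes) -> None:
        if not block.strip(b"\x00"):
            stats["zero_blocks"] += 1

    return consume, stats


if __name__ == "__main__":
    data = io.BytesIO(b"\x00" * 4096 + b"payload" * 1000)
    scrub = hashlib.sha256()  # stand-in for the scrub/checksum utility
    count_zeros, stats = make_zero_block_counter()
    n = one_pass_scan(data, [scrub.update, count_zeros], block_size=4096)
    print(n, stats["zero_blocks"], scrub.hexdigest()[:16])
```

The scan cost is paid once no matter how many consumers are attached. That is the point: disk arms and bandwidth are the scarce resource, while CPU per block is cheap.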

21 Bandwidth: links and parallel links. Today: – 40 Gbps per channel (λ) – 12 channels per fiber (WDM): 500 Gbps – 32 fibers/bundle = 16 Tbps/bundle. In the lab: 20 Tbps/fiber (400 x WDM). 1 Tbps = USA 1996 WAN bisection bandwidth. Serial links are fast and can be used in parallel. 1 fiber = 25 Tbps.

22 Free Storage: like free puppies. Storage is cheap (1k$/TB). Storage management is not: 100K$/TB/year (or less…). opX (operating expense) > 100 x capX (capital expense). Goal: opX << capX.

23 Trends: Moore's Law. Performance/price doubles every 18 months: 100x per decade. Progress in the next 18 months = ALL previous progress: – new storage = sum of all old storage (ever) – new processing = sum of all old processing. E. coli doubles every 20 minutes! [Image: 15 years ago.]
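
Both claims on this slide follow from steady doubling. 18-month doubling compounds to 2^(10/1.5) ≈ 100x per decade. If each period ships twice what the previous one did, the next period's output (2^n) exceeds everything shipped before (2^n - 1) by one unit. The Python sketch below checks both; the function names are illustrative.

```python
def growth_factor(years: float, doubling_years: float = 1.5) -> float:
    """Improvement after `years` if performance/price doubles every `doubling_years`."""
    return 2 ** (years / doubling_years)


def next_period_vs_all_prior(periods: int) -> tuple:
    """Units added in the next doubling period vs. the sum of all earlier periods.

    Assumes each period ships twice as much as the one before (1, 2, 4, ...).
    """
    shipped = [2**k for k in range(periods + 1)]
    return shipped[-1], sum(shipped[:-1])


if __name__ == "__main__":
    print(f"18-month doubling: {growth_factor(10):.0f}x per decade")  # 102x
    print(next_period_vs_all_prior(10))  # (1024, 1023)
```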

24 Trends: ops/s/$ Had Three Growth Phases: – 1890-1945: mechanical, relay: 7-year doubling – 1945-1985: tube, transistor, …: 2.3-year doubling – 1985-2010: microprocessor: 1.0-year doubling.

25 So: a problem. Suppose you have a ten-year compute job on the world's fastest supercomputer. What should you do? Commit 250M$ now? Or program for 9 years? Software speedup: 2^6 = 64x; Moore's law speedup: 2^6 = 64x; so a 4,000x speedup: spend 1M$ (not 250M$) on hardware, and the job runs in 2 weeks, not 10 years. Homework problem: what is the optimum strategy?
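
One simplified way to attack the homework problem is to consider only the hardware side. Assume performance/price doubles every 18 months, and that a job taking J years today takes J / 2^(w/1.5) years if you wait w years before buying. The total time w + J·2^(-w/1.5) then has a closed-form minimum. This Python sketch ignores the software speedup and the budget dimension, so it is a starting point rather than the answer.

```python
import math


def total_years(wait_years: float, job_years_today: float, doubling_years: float = 1.5) -> float:
    """Wait, then buy hardware: the job runs 2x faster for every doubling period waited."""
    return wait_years + job_years_today / 2 ** (wait_years / doubling_years)


def best_wait(job_years_today: float, doubling_years: float = 1.5) -> float:
    """Minimize total_years: setting 1 - (J ln2 / d) * 2**(-w/d) = 0 gives
    w = d * log2(J ln2 / d); clamp at zero when starting now is best."""
    w = doubling_years * math.log2(job_years_today * math.log(2) / doubling_years)
    return max(0.0, w)


if __name__ == "__main__":
    w = best_wait(10)
    print(f"wait {w:.1f} years, finish after {total_years(w, 10):.1f} years")  # wait 3.3, finish 5.5
```

Under these assumptions, the 10-year job finishes soonest if you wait about 3.3 years and then run for about 2.2 years. Starting immediately takes 10 years.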

26 Storage Capacity Beating Moore's Law: 500$/TB today (raw disk), 50$/TB by 2010. 2005: shipped 350M drives (a 28% increase over 2004), ~0.1 zettabyte (!)

27 Trends: Magnetic Storage Densities. Amazing progress; ratios have changed. Improvements: capacity 60%/y, bandwidth 40%/y, access time 16%/y. 2006: Seagate in the lab @ 275 ktpi, 1,730 kbpi, 421 Gbpsi, 735 Mbps. Limit: 50 Tbpsi (100x density).
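
Compounding the three improvement rates over a decade shows how the ratios drift apart. The Python sketch below assumes the rates hold steady for ten years. On that assumption, capacity grows about 110x while bandwidth grows about 29x, so the time to read a whole disk (capacity / bandwidth) grows about 3.8x per decade.

```python
ANNUAL_IMPROVEMENT = {"capacity": 0.60, "bandwidth": 0.40, "access time": 0.16}


def decade_gain(rate: float, years: int = 10) -> float:
    """Compound an annual improvement rate over `years`."""
    return (1 + rate) ** years


if __name__ == "__main__":
    gains = {name: decade_gain(r) for name, r in ANNUAL_IMPROVEMENT.items()}
    for name, g in gains.items():
        print(f"{name:12} {g:6.1f}x per decade")
    # Time to read the whole disk scales with capacity / bandwidth.
    print(f"full-disk scan time grows {gains['capacity'] / gains['bandwidth']:.1f}x per decade")
```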

28 Trends: Density Limits. The end is near! In 2000: products @ 23 Gbpsi, lab @ 50 Gbpsi, limit @ 60 Gbpsi. But the limit keeps rising, and there are alternatives. Today: products @ 245 Gbpsi, limit at 5 Tbpsi. [Figure: bit density vs. time, 1990-2008, in b/µm² and Gb/in², showing CD, DVD, and ODD, the wavelength limit, the superparamagnetic limit, and possible alternatives (NEMS, fluorescent? holographic, DNA?), annotated "Nope: Seagate @ 421 Gbpsi". Adapted from Franco Vitaliano, The NEW new media: the growing attraction of nonmagnetic storage, Data Storage, Feb 2000, pp 21-32.]

29 Consequence of Moore's law: we need an address bit every 18 months. Moore's law gives you 2x more in 18 months. RAM: – today we have 1 GB to 1 TB machines (30-40 bits of addressing) – in 9 years we will need 6 more bits: 36-46 bit addressing (64GB - 64TB RAM). Disks: – today we have 10 GB to 10 TB files & DBs (33-43 bit file addresses) – in 9 years we will need 6 more bits: 40-50 bit file addresses (1 PB files (!?)).
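
The address-bit arithmetic is ceil(log2(size)) today, plus one bit per 18-month doubling. In the Python sketch below, integer `bit_length` gives the ceiling exactly. Sizes are taken as powers of two (1 GB = 2^30 bytes), which is an assumption for this illustration.

```python
def address_bits(num_bytes: int) -> int:
    """Bits needed to give every byte a distinct address: ceil(log2(num_bytes))."""
    if num_bytes < 1:
        raise ValueError("num_bytes must be >= 1")
    return max(1, (num_bytes - 1).bit_length())


def bits_needed_later(bits_now: int, years: float, months_per_bit: float = 18) -> int:
    """Add one address bit per doubling period (every 18 months by default)."""
    return bits_now + int(years * 12 // months_per_bit)


if __name__ == "__main__":
    for label, size in [("1 GB RAM", 2**30), ("1 TB RAM", 2**40)]:
        now = address_bits(size)
        print(f"{label}: {now} bits now, {bits_needed_later(now, 9)} bits in 9 years")
```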

30 Architecture could change this. 1-level store: – System/38 and AS/400 have a 1-level store – it never re-uses an address – it needs 96-bit addressing today. NUMAs and clusters: – willing to buy a 100 M$ computer? – then add 6 more address bits. Only a 1-level store pushes us beyond 64 bits. Still, these are logical addresses; 64-bit physical addresses will last many years.

31 Outline: Moore's Law and consequences; Storage rules of thumb; Balanced systems rules revisited; Networking rules of thumb; Caching rules of thumb

32 How much storage do we need? Soon everything can be recorded and indexed. Most bytes will never be seen by humans. Data summarization, trend detection, and anomaly detection are key technologies. See Mike Lesk: How much information is there? http://www.lesk.com/mlesk/ksg97/ksg.html. See Lyman & Varian: How much information, http://www.sims.berkeley.edu/research/projects/how-much-info/. [Figure: a scale from Kilo through Mega, Giga, Tera, Peta, Exa, Zetta, to Yotta, placing a photo, a book, a movie, all LoC books (words), all books (multimedia), and everything recorded. Small-unit prefixes: 10^-24 yocto, 10^-21 zepto, 10^-18 atto, 10^-15 femto, 10^-12 pico, 10^-9 nano, 10^-6 micro, 10^-3 milli.]

33 Storage Latency: How Far Away is the Data? (access time in clock cycles; the analogy scales 1 clock to 1 minute)

Level                Clocks   Analogy       Human time
Registers            1        My Head       1 min
On-chip cache        2        This Room
On-board cache       10       This Campus   10 min
Memory               100      Olympia       1.5 hr
Disk                 10^6     Pluto         2 years
Tape/optical robot   10^9     Andromeda     2,000 years
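
The analogy works by treating one clock cycle as one minute of human time. The Python sketch below applies that rescaling to the clock counts in the table. It prints about 1.7 hr for memory and about 1,900 years for tape, in line with the slide's rounded 1.5 hr and 2,000 years.

```python
LEVELS = [
    # (level, approximate access time in clock cycles, analogy from the slide)
    ("Registers", 1, "My Head"),
    ("On-chip cache", 2, "This Room"),
    ("On-board cache", 10, "This Campus"),
    ("Memory", 100, "Olympia"),
    ("Disk", 10**6, "Pluto"),
    ("Tape/optical robot", 10**9, "Andromeda"),
]


def human_time(clocks: int, minutes_per_clock: float = 1.0) -> str:
    """Rescale clock cycles to human time, assuming 1 clock ~ 1 minute."""
    minutes = clocks * minutes_per_clock
    if minutes < 60:
        return f"{minutes:g} min"
    hours = minutes / 60
    if hours < 48:
        return f"{hours:.1f} hr"
    return f"{hours / (24 * 365):,.0f} years"


if __name__ == "__main__":
    for level, clocks, place in LEVELS:
        print(f"{level:20} {place:12} {human_time(clocks)}")
```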

34 Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs. [Figure: two plots against access time (10^-9 to 10^3 seconds) for cache, main memory, secondary (disc), and online, nearline, and offline tape: "Size vs Speed" (typical system size, 10^6 to 10^15 bytes) and "Price vs Speed" ($/GB, 10^-2 to 10^4).]

35 Disks: Today a disk is 30GB to 1 TB, 10-80 MBps, 5k-15k rpm (6ms-2ms rotational latency), 10ms-3ms seek. $/TB: 0.5K$ ATA, 1.2K$ SCSI. For shared disks, most time is spent waiting in a queue for access to the arm/controller. [Figure: request timelines: Seek, Rotate, Transfer for a dedicated disk vs. Wait, Seek, Rotate, Transfer for a shared one.]
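
The rotational-latency figures come from waiting half a revolution on average: 0.5 × 60,000 ms / rpm, which is 6 ms at 5k rpm and 2 ms at 15k rpm. A request's service time adds queue wait, seek, and transfer. The Python sketch below uses decimal megabytes; its drive parameters are example values picked from the slide's ranges.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the head waits half a revolution: 0.5 * 60,000 ms / rpm."""
    return 0.5 * 60_000 / rpm


def service_time_ms(seek_ms: float, rpm: float, transfer_bytes: int,
                    bandwidth_mb_s: float, queue_wait_ms: float = 0.0) -> float:
    """Queue wait + seek + rotate + transfer for one request (decimal MB)."""
    transfer_ms = transfer_bytes / (bandwidth_mb_s * 1e6) * 1e3
    return queue_wait_ms + seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


if __name__ == "__main__":
    print(avg_rotational_latency_ms(5_000), avg_rotational_latency_ms(15_000))  # 6.0 2.0
    # An 8,000-byte read on a 15k rpm, 3 ms seek, 80 MBps drive, idle vs. behind a 20 ms queue:
    print(service_time_ms(3, 15_000, 8_000, 80), service_time_ms(3, 15_000, 8_000, 80, queue_wait_ms=20))
```

For small random reads, transfer time (0.1 ms here) is noise next to positioning time. On a shared disk, queueing can dominate both.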

36 The Street Price of a Raw Disk TB: about 1K$/TB. [Chart: street prices sampled 12/1/1999, 9/1/2000, 9/1/2001, 4/1/2002, 9/20/2006.]

37 Standard Storage Metrics. Capacity: – RAM: MB and $/MB: today at 4GB and ~100$/GB – Disk: GB and $/GB: today at 700GB and 500$/TB – Tape: TB and $/TB: today at 400GB and 300$/TB (nearline). Access time (latency): – RAM: 1…100 ns – Disk: 5…15 ms – Tape: 30-second pick, 30-second position. Transfer rate: – RAM: 1-10 GB/s – Disk: ~50 MB/s (arrays can go to 1GB/s) – Tape: ~50 MB/s (arrays can go to 1GB/s).

38 New Storage Metrics: Kaps, Maps, SCAN. Kaps: how many kilobyte objects are served per second – the file server and transaction processing metric – this is the OLD metric. Maps: how many megabyte objects are served per second – the multimedia metric. SCAN: how long it takes to scan all the data – the data mining and utility metric. And – Kaps/$, Maps/$, TBscan/$.
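
Kaps and Maps are both "one random positioning plus one transfer" rates at different object sizes, and SCAN is capacity over bandwidth. The Python sketch below computes them, plus Kaps/$ and Maps/$, for a drive described by its positioning time. The example drive is the 2006 disk of slide 48: 4 ms seek + 2 ms rotate, 50 MBps, 750 GB. The 375$ price assumes its 0.5 $/GB drive cost. Units are decimal.

```python
def kaps_maps_scan(access_ms: float, mb_per_s: float, capacity_gb: float,
                   price_dollars: float) -> dict:
    """Kaps, Maps, and full-scan hours for one disk, plus Kaps/$ and Maps/$.

    access_ms is the random positioning time (seek + rotation) per request.
    """
    def requests_per_s(nbytes: int) -> float:
        return 1.0 / (access_ms / 1e3 + nbytes / (mb_per_s * 1e6))

    kaps = requests_per_s(1_000)      # kilobyte objects served per second
    maps = requests_per_s(1_000_000)  # megabyte objects served per second
    scan_hours = capacity_gb * 1e3 / mb_per_s / 3600
    return {"kaps": kaps, "maps": maps, "scan_hours": scan_hours,
            "kaps_per_$": kaps / price_dollars, "maps_per_$": maps / price_dollars}


if __name__ == "__main__":
    m = kaps_maps_scan(access_ms=6, mb_per_s=50, capacity_gb=750, price_dollars=375)
    print({k: round(v, 2) for k, v in m.items()})
```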

39 For the Record (good 2002 devices packaged in a system, http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf): X 100. The tape slice is 8Tb with 1 LTO reader at 50MBps per 100 tapes.

40 For the Record (good 2002 devices packaged in a system, http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf): Tape is 1Tb with 4 DLT readers at 5MBps each.

41 Disk Changes. Disks got cheaper: 20k$ -> 200$: – $/Kaps etc. improved 100x (Moore's law!) (or even 500x) – a one-time event (went from mainframe prices to PC prices). Disks got cooler (50x per decade): – 1990: 1 Kaps per 20 MB (1GB disk) – 2006: 1 Kaps per 10,000 MB (0.75TB disk). Disk scans take longer (10x per decade): – 1990 disk: ~1GB, 50 Kaps, and a 5-minute scan – 2006 disk: ~750GB, 150 Kaps, and a 5-hour scan. So backup/restore takes a long time (too long).

42 Storage Ratios Changed: 10x better access time; 10x more bandwidth; 100x more capacity; data 25x cooler (1 Kaps/20MB vs 1 Kaps/GB); 4,000x lower media price; 20x to 100x lower disk price; scan takes 10x longer (3 min vs 1 hr). The RAM/disk media price ratio changed: – 1970-1990: 100:1 – 1990-1995: 10:1 – 1995-1997: 50:1 – 2006: ~200:1 (100$/GB RAM vs ~0.5$/GB disk).

43 More Kaps and Kaps/$: Disk accesses got much less expensive: better disks, cheaper disks! But disk arms are expensive: they are the scarce resource. A 5-hour scan vs. 5 minutes in 1990. [Chart: 1 TB, 70 MB/s disk.] Assumptions: 15k rpm, Dell TPC-C pricing for SCSI disks, cabinets and controllers depreciated over 3 years.

44 Data on Disk Can Move to RAM in 10 years. [Chart: a 100:1 RAM/disk price gap = 10 years.]

45 The Absurd Disk Has Arrived: 1 TB, 100 MB/s, 100 Kaps. 2.5 hr scan time (poor sequential access). 1 Kaps / 10 GB (VERY cold data). It's a tape!

46 FLASH: The Gap Filler? Flash chips are 4GB today, cards 64GB. 20$/GB: – 1/5 the RAM price – but 20x the disk price, but 20x better Kaps. Predicted to double each year to a Tbit: – it has doubled each year since 1997. It will eat the disk market from below: – cameras, iPods, … then laptops… then… – similar to cost/page or cost/first-page in printers. Block-oriented read-write (2KB): 20MB/s per chip read; 16 chips in parallel (64KB page, 320MB/s); ~125 μs latency on read (25 fixed, 100 transfer). Write has 2ms latency (clear the page). Pages can only be written about 1M times.

Year   Chip (Gbit)   Package (GB)
2006        16             4
2007        32             8
2008        64            16
2009       128            32
2010       256            64
2011       512           128
2012      1024           256

~80$ per package.
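
The flash numbers compose simply. Read latency is a fixed overhead plus the time to stream one 2KB page from one chip, and aggregate bandwidth is per-chip bandwidth times the number of chips read in parallel. The Python sketch below redoes both. Taking 2KB as 2,048 bytes gives about 127 µs, close to the slide's ~125 µs. The capacity projection simply doubles from 16 Gbit in 2006, and the packages hold chip Gbit / 4 GB, mirroring the table's ratio.

```python
def flash_read_latency_us(page_bytes: int = 2048, fixed_us: float = 25.0,
                          chip_mb_per_s: float = 20.0) -> float:
    """Fixed access overhead plus the time to stream one page off one chip."""
    return fixed_us + page_bytes / (chip_mb_per_s * 1e6) * 1e6


def parallel_read_mb_per_s(chips: int = 16, chip_mb_per_s: float = 20.0) -> float:
    """Aggregate read bandwidth when every chip streams at once."""
    return chips * chip_mb_per_s


def projected_chip_gbit(year: int, base_year: int = 2006, base_gbit: int = 16) -> int:
    """Capacity doubling every year from 16 Gbit in 2006, as in the table."""
    return base_gbit * 2 ** (year - base_year)


if __name__ == "__main__":
    print(f"{flash_read_latency_us():.1f} us per 2KB page read")  # 127.4
    print(parallel_read_mb_per_s(), "MB/s from 16 chips")          # 320.0
    for year in range(2006, 2013):
        gbit = projected_chip_gbit(year)
        print(year, gbit, "Gbit chip,", gbit // 4, "GB package")
```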

47 Flash CERTAINLY Represents an Opportunity To Rethink: – a non-volatile disk buffer (inside the drive?) – a low-latency (100 µs) cache near the CPU – a WAL cache for databases – quick restart. FLASH is a block-oriented device: it likes sequential read/write; it likes big (64KB) reads/writes. Andrew Birrell, Michael Isard, Chuck Thacker, Ted Wobber, A Design for High-Performance Flash Disks, MSR-TR-2005-176, December 2005.
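
Since flash "likes big (64KB) reads/writes", one obvious pattern is to coalesce small appends, such as WAL records, into full blocks before issuing device writes. The Python class below is a hypothetical illustration of that idea, not the design in the Birrell et al. report. `write_block` stands in for whatever issues one sequential block write. The zero padding on flush is an assumption; a real log would also record each record's length so readers can skip the padding.

```python
from typing import Callable


class CoalescingWriter:
    """Gather small appends into fixed-size blocks before writing them to the device."""

    def __init__(self, write_block: Callable[[bytes], None], block_size: int = 64 * 1024):
        self.write_block = write_block  # issues one block-sized sequential write
        self.block_size = block_size
        self._buf = bytearray()

    def write(self, data: bytes) -> None:
        self._buf += data
        while len(self._buf) >= self.block_size:
            self.write_block(bytes(self._buf[:self.block_size]))
            del self._buf[:self.block_size]

    def flush(self) -> None:
        """Write any partial block, zero-padded to a full block."""
        if self._buf:
            self.write_block(bytes(self._buf.ljust(self.block_size, b"\x00")))
            self._buf.clear()


if __name__ == "__main__":
    blocks = []  # a list's append stands in for the device
    w = CoalescingWriter(blocks.append, block_size=4096)
    for i in range(100):
        w.write(f"log record {i:04d}".ljust(100).encode())  # 100-byte records
    w.flush()
    print(len(blocks), "block writes instead of 100 small ones")  # 3
```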

48 Disk vs Tape (guesstimates). Disk: – 750 GB – 50 MBps – 4 ms seek time – 2 ms rotate latency – 0.5 $/GB for the drive, 0.5 $/GB for controllers/cabinet – 3.6 PB/rack – 5-hour scan. Tape: – 400 GB (80$/cartridge) – 40 MBps – 10 sec pick time – 30-120 second seek time – 200$/TB for media, 800$/TB for drive+library – 1-week scan. The price advantage of tape is gone, and the performance advantage of disk is growing. At 1K$/TB, disk is competitive with nearline tape. [Image: CERN: 200 TB of 3480 tapes; 2 columns = 50GB; rack = 1 TB = 1.25 drives.]

49 Auto Manage Storage. 1980 rule of thumb: – a DataAdmin per 10GB, a SysAdmin per MIPS. 2006 rule of thumb: – a DataAdmin per 50TB (WITH GOOD TOOLS) – a DataAdmin per ½ TB with crappy tools! – a SysAdmin per 100 clones (varies with app). Problem: – 5TB is >5k$ today, 500$ in a few years – admin cost >> storage cost!!!! Challenge: – automate ALL storage admin tasks.

50 How to cool disk data: – cache data in main memory (see the 30-minute rule later in the presentation) – fewer, larger transfers: larger pages (512B -> 8KB -> 256KB) – sequential rather than random access: random 8KB IO is 1 MBps, sequential IO is 60 MBps (and the 60:1 ratio is growing) – RAID1 (mirroring) rather than RAID5 (parity).
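
Larger transfers cool data because each random IO pays a fixed positioning cost before any bytes move. The Python sketch below computes delivered bandwidth as bytes / (positioning + transfer). It assumes about 8 ms per random access (seek + rotation), a value chosen so that 8KB random IO comes out near the slide's 1 MBps, and the slide's 60 MBps sequential rate.

```python
def effective_mb_per_s(io_bytes: int, access_ms: float = 8.0, seq_mb_per_s: float = 60.0) -> float:
    """Delivered bandwidth for random IOs of a given size: bytes / (positioning + transfer)."""
    seconds = access_ms / 1e3 + io_bytes / (seq_mb_per_s * 1e6)
    return io_bytes / seconds / 1e6


if __name__ == "__main__":
    for size in (512, 8 * 1024, 256 * 1024, 1024 * 1024):
        print(f"{size:>8} B random IO: {effective_mb_per_s(size):5.1f} MB/s")
```

Under these assumptions, 256KB pages deliver about 21 MB/s, roughly 20x what 8KB pages deliver from the same disk arm.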

51 Stripes, Mirrors, Parity (RAID 0, 1, 5). RAID 0: stripes – bandwidth. RAID 1: mirrors, shadows, … – fault tolerance – reads faster, writes 2x slower. RAID 5: parity – fault tolerance – reads faster – writes 4x or 6x slower.

Block layout across three disks:
RAID 0:  disk 1: 0,3,6,..   disk 2: 1,4,7,..   disk 3: 2,5,8,..
RAID 1:  every disk: 0,1,2,..
RAID 5:  disk 1: 0,2,P2,..  disk 2: 1,P1,4,..  disk 3: P0,3,5,..
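
The RAID 5 layout above rotates the parity block across disks, and parity is the byte-wise XOR of the stripe's data blocks. That is why any single lost block can be rebuilt from the survivors. It is also why a small write costs four IOs: read the old data and old parity, then write the new data and new parity. The Python sketch below reproduces the three-disk layout and the XOR rebuild; the function names are illustrative.

```python
from typing import List


def raid5_layout(stripes: int, disks: int = 3) -> List[List[str]]:
    """Place data blocks and rotating parity the way the RAID 5 layout does.

    Parity for stripe s sits on disk (disks - 1 - s) mod disks; data blocks fill
    the other disks of that stripe in order.
    """
    layout: List[List[str]] = [[] for _ in range(disks)]
    block = 0
    for s in range(stripes):
        parity_disk = (disks - 1 - s) % disks
        for d in range(disks):
            if d == parity_disk:
                layout[d].append(f"P{s}")
            else:
                layout[d].append(str(block))
                block += 1
    return layout


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks: the parity, or a rebuilt block."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


if __name__ == "__main__":
    for d, column in enumerate(raid5_layout(3), start=1):
        print(f"disk {d}: {','.join(column)},..")
    d0, d1 = b"stripe-0", b"block--1"
    parity = xor_blocks(d0, d1)
    # Lose the disk holding d0 and rebuild it from the surviving data and parity.
    print("rebuilt:", xor_blocks(parity, d1) == d0)
```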

52 RAID 10 (stripes of mirrors) Wins: it wastes space but saves arms.
RAID 5 (6 disks, 1 volume): performance – 675 reads/sec – 210 writes/sec – a write is 4 logical IOs: 2 seeks + 1.7 rotations. SAVES SPACE. Performance degrades on failure.
RAID 1 (6 disks, 3 pairs): performance – 750 reads/sec – 300 writes/sec – a write is 2 logical IOs: 2 seeks + 0.7 rotations. SAVES ARMS. Performance improves on failure.

53 Best Index Page Size: >64KB, best near 100KB. A small page has few entries, so it offers little benefit; big pages waste RAM and bandwidth.
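
One way to frame this trade-off, in the spirit of Gray and Graefe's five-minute-rule papers, is to rate an index page by the index levels it saves per unit of IO time. Benefit is log2(entries per page) divided by page access time, where access time is positioning plus transfer. The Python sketch below searches page sizes under assumed parameters: 20-byte entries, 8 ms positioning, and 70 MB/s (the disk of slide 43). With these values the optimum lands in the mid-70s of KB, and the curve is quite flat between 64KB and 100KB.

```python
import math


def index_page_benefit(page_bytes: int, entry_bytes: int = 20,
                       access_ms: float = 8.0, mb_per_s: float = 70.0) -> float:
    """Index levels saved per second of IO: log2(entries per page) / page access time."""
    entries = page_bytes / entry_bytes
    access_s = access_ms / 1e3 + page_bytes / (mb_per_s * 1e6)
    return math.log2(entries) / access_s


def best_page_size(step: int = 1024, max_bytes: int = 1024 * 1024, **kw) -> int:
    """Grid-search page sizes (multiples of step) for the highest benefit."""
    return max(range(step, max_bytes + 1, step), key=lambda p: index_page_benefit(p, **kw))


if __name__ == "__main__":
    print(best_page_size() // 1024, "KB")
```

Lower positioning time shifts the optimum toward smaller pages. Higher bandwidth relative to latency shifts it toward bigger ones, which is why page sizes keep growing.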

54 Summarizing storage rules of thumb (1): Moore's law: 4x every 3 years, 100x more per decade. Ratios change!!! This implies 2 bits of addressing every 3 years. Storage capacities increase 100x/decade. Storage costs drop 100x per decade. Storage throughput increases 10x/decade. Data cools 10x/decade. Disk page sizes increase 5x per decade.

55 Summarizing storage rules of thumb (2): RAM:disk and disk:tape cost ratios are 100:1 and 1:1. Prices decline 100x per decade, so in 10 years disk data can move to RAM. A person should be able to administer a million dollars of storage: that is ~1PB today. Disks are replacing tapes as backup devices. You can't back up/restore a petabyte quickly, so geoplex it. Use mirroring rather than parity to save disk arms.

56 Outline: Moore's Law and consequences; Storage rules of thumb; Balanced systems rules revisited; Networking rules of thumb; Caching rules of thumb

57 Standard Architecture (today). [Diagram: a system bus with two PCI buses, PCI Bus 1 and PCI Bus 2.]

58 Amdahl's Balance Laws. Parallelism law: if a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S. Balanced system law: a system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps. Memory law: α = 1: the MB/MIPS ratio (called alpha, α) in a balanced system is 1. IO law: programs do one IO per 50,000 instructions.
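
Amdahl's laws reduce to a few lines of arithmetic. The maximum speedup is (S+P)/S; with n processors it is (S+P)/(S+P/n). One bit of IO per instruction means 8 instructions per byte, so 8 MIPS per MBps. The same ratio prices an IO in instructions: 8KB → 64K instructions and 64KB → 512K instructions, as slide 62 works out. The Python sketch below uses illustrative function names.

```python
def amdahl_speedup(serial: float, parallel: float, processors: float = float("inf")) -> float:
    """(S + P) / (S + P/n); with unlimited processors this is (S + P) / S."""
    return (serial + parallel) / (serial + parallel / processors)


def balanced_io_mb_per_s(mips: float, mips_per_mb_s: float = 8.0) -> float:
    """Amdahl balance: one bit of IO per instruction, i.e. 8 MIPS per MBps."""
    return mips / mips_per_mb_s


def instructions_per_io(io_bytes: int, instructions_per_byte: float = 8.0) -> float:
    """Instructions budgeted per IO of a given size (8 per byte by the balance law)."""
    return io_bytes * instructions_per_byte


if __name__ == "__main__":
    print(amdahl_speedup(1, 9), amdahl_speedup(1, 9, processors=9))        # 10.0 5.0
    print(balanced_io_mb_per_s(1400))                                      # 175.0
    print(instructions_per_io(8 * 1024), instructions_per_io(64 * 1024))   # 65536.0 524288.0
```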

59 Amdahl's Laws Valid 40 Years Later? The parallelism law is algebra: so SURE! Balanced system laws? Look at TPC results (TPC-C, TPC-H) at http://www.tpc.org/. Some imagination is needed: – What's an instruction (CPI varies from 1-3)? RISC, CISC, VLIW, … clocks per instruction, … – What's an I/O?

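60 TPC systems: Disks/CPU and Instructions/Byte. Normalize for CPI (clocks per instruction): – TPC-C has about 14 instructions per byte of IO – TPC-H has ~1 instruction per byte of IO.
– Amdahl: 8 instructions per byte of IO.
– TPC-C (random): 3000 MHz/cpu, CPI 2.1, 1400 MIPS, 8 KB/IO, 120 IO/s per disk, 25 disks/cpu, 100 MB/s per cpu, 14 instructions/byte.
– TPC-H (sequential): 2400 MHz/cpu, CPI 1.2, 2000 MIPS, 64 KB/IO, 900 IO/s per disk, 176 disks (44 disks/cpu), 2200 MB/s per cpu, 1 instruction/byte.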

61 TPC systems: What's alpha (= MB/MIPS)? Hard to say: – Intel has 32-bit addressing (= 4GB limit); known CPI – IBM, HP, Sun have a 64 GB limit; unknown CPI – look at both, and guess CPI for IBM, HP, Sun. Alpha is between 4 and 16.

System       MIPS                  Memory   Alpha   Disks/cpu
Amdahl       1                     1        1       1
tpcC Intel   4x3GHz = 6 Gips       24GB     4       25..100
tpcH Intel   4x2.4GHz = 10 Gips    64GB     16      10..40

62 Instructions per IO? We know 8 MIPS per MBps of IO, so an 8KB page is 64K instructions, and a 64KB page is 512K instructions. But sequential IO has fewer instructions/byte (3 vs 7 in tpcH vs tpcC), so a 64KB page is 200K instructions.

63 Amdahl's Balance Laws Revised. The laws are right; they just need interpretation (imagination?). Balanced system law: a system needs 8 MIPS/MBps of IO, but the instruction rate must be measured on the workload: – sequential workloads have low CPI (clocks per instruction) – random workloads tend to have higher CPI. Alpha (the MB/MIPS ratio) is rising from 1 to 16, and this trend will likely continue. One random IO per 50k instructions. Sequential IOs are larger: one sequential IO per 200k instructions.

64 PAP vs RAP (a 2006 perspective): Peak Advertised Performance vs. Real Application Performance (peak / real):
– CPU: 2 sockets x 2 cores x 4-issue x 3 GHz = 48 Bips peak; at 1-6 CPI, 2..12 Bips real
– System bus: 8 GBps / 4 GBps
– PCI-E: 2 GBps / 1 GBps
– Disk controller: 1 GBps / 500 MBps
– Disks: 70 MBps / 60 MBps
[Diagram: application, file system, and data above the CPU, system bus, PCI Bus 1 and PCI Bus 2, disk controller, and disks.]
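
Real throughput along an IO path is set by its slowest stage, so adding disks helps only until another stage saturates. The Python sketch below takes the real (RAP) figures as read from the flattened diagram, which is itself a reconstruction. It reports the bottleneck for a given number of 60 MBps disks: nine disks are enough to saturate a 500 MBps controller. The CPU line treats 4-issue as a CPI of 0.25 for the 48 Bips peak.

```python
# Real (RAP) bandwidths in MB/s, as read from the diagram above.
REAL_MB_S = {"system bus": 4000, "PCI-E": 1000, "disk controller": 500}
DISK_REAL_MB_S = 60


def path_bandwidth(n_disks: int) -> tuple:
    """Deliverable bandwidth is set by the slowest stage on the path."""
    stages = dict(REAL_MB_S, disks=n_disks * DISK_REAL_MB_S)
    name = min(stages, key=stages.get)
    return name, stages[name]


def cpu_bips(cpi: float, sockets: int = 2, cores: int = 2, ghz: float = 3.0) -> float:
    """Billions of instructions per second at a given clocks-per-instruction."""
    return sockets * cores * ghz / cpi


if __name__ == "__main__":
    for n in (4, 8, 9, 12):
        print(n, "disks ->", path_bandwidth(n))
    print("peak", cpu_bips(1 / 4), "bips; real", cpu_bips(6), "to", cpu_bips(1), "bips")
```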

65 Outline: Moore's Law and consequences; Storage rules of thumb; Balanced systems rules revisited; Networking rules of thumb; Caching rules of thumb

66 Standard IO (Infiniband) next year? Probably: – replace PCI with something better; we will still need a mezzanine bus standard – multiple serial links directly from the processor – fast (10 GBps/link) for a few meters – System Area Networks (SANs) ubiquitous (VIA morphs to Infiniband?). In 2006: Infiniband got marginalized by 10 Gbps Ethernet; it has low latency, but that is a niche. PCI-Express came along. (i.e., 2001)

67 1 GBps Ubiquitous, 10 GBps SANs in 5 years. 1 Gbps Ethernet is a reality now: – also Fibre Channel, MyriNet, GigaNet, ServerNet, ATM, … 10 Gbps x4 WDM is deployed now (OC192): – 3 Tbps WDM working in the lab. In 5 years, expect 10x, wow!! [Chart: link speeds of 5, 20, 40, 80, and 120 MBps (1 Gbps).]

68 Networking: WANs are getting faster than LANs. G8 = OC192 = 9 Gbps is standard. Link bandwidth improves 4x per 3 years. Speed of light: 60 ms round trip in the US. Software stacks have always been the problem: Time = SenderCPU + ReceiverCPU + bytes/bandwidth. This has been the problem for small (10KB or less) messages.
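
The formula Time = SenderCPU + ReceiverCPU + bytes/bandwidth explains why small messages suffer. A standard way to summarize it is the half-power point: the message size at which transfer time equals the fixed software overhead, so exactly half the link bandwidth is delivered. The Python sketch below assumes 25 µs of CPU on each side and a 1 Gbps (125 MB/s) link, both illustrative values. These give a half-power point of about 6KB, the same order as the 8KB quoted on slide 70.

```python
def message_time_s(nbytes: int, sender_cpu_s: float, receiver_cpu_s: float,
                   bandwidth_bytes_per_s: float) -> float:
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth."""
    return sender_cpu_s + receiver_cpu_s + nbytes / bandwidth_bytes_per_s


def half_power_bytes(overhead_s: float, bandwidth_bytes_per_s: float) -> float:
    """Message size at which half the link bandwidth is delivered:
    the transfer time equals the fixed software overhead."""
    return overhead_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    # Assumed: 25 us of CPU on each side, a 1 Gbps (125 MB/s) link.
    o, bw = 25e-6 + 25e-6, 125e6
    print(f"half-power point ~{half_power_bytes(o, bw):,.0f} bytes")
    for size in (1_000, 10_000, 1_000_000):
        t = message_time_s(size, 25e-6, 25e-6, bw)
        print(f"{size:>9} B: {size / t / 1e6:6.1f} MB/s delivered")
```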

69 The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/). Yesterday: – 10 MBps (100 Mbps Ethernet) – ~20 MBps tcp/ip saturates 2 CPUs – round-trip latency ~250 µs. Now: – wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, … – fast user-level communication: tcp/ip ~100 MBps at 10% CPU; round-trip latency is 15 µs – 1.6 Gbps demoed on a WAN.

70 The Network Revolution: Networking folks are finally streamlining the LAN case (SAN). Offloading the protocol to the NIC: – the ½-power point is 8KB – minimum round-trip latency is ~50 µs – 3k instructions + 0.1 instructions/byte. Li, L.; Forin, A.; Hunt, G.; Wang, Y., High-Performance Distributed Objects over a System Area Network, MSR-TR-98-68.

71 How much does wire-time cost? $/MByte?

Link                 Cost/MB     Time/MB
Gbps Ethernet        0.2 µ$      10 ms
100 Mbps Ethernet    0.3 µ$      100 ms
OC12 (650 Mbps)      0.003 $     20 ms
DSL                  0.0006 $    25 sec
POTS                 0.002 $     200 sec
Wireless             0.80 $      500 sec
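
The Time/MB column is the wire time for 8×10^6 bits at each link's rate (the slide's figures are rounded). Read backwards, it gives the line rate each figure implies. The Python sketch below does both directions with decimal units. It yields 8 ms/MB for Gbps Ethernet and about 320, 40, and 16 kbps for the DSL, POTS, and wireless rows.

```python
def seconds_per_mb(bits_per_s: float) -> float:
    """Wire time for one megabyte (8e6 bits) at a given line rate, ignoring overhead."""
    return 8e6 / bits_per_s


def implied_kbps(seconds_per_megabyte: float) -> float:
    """Line rate implied by a time-per-MB figure."""
    return 8e6 / seconds_per_megabyte / 1e3


if __name__ == "__main__":
    for name, bps in [("Gbps Ethernet", 1e9), ("100 Mbps Ethernet", 1e8), ("OC12", 650e6)]:
        print(f"{name:18} {seconds_per_mb(bps) * 1e3:6.1f} ms/MB")
    for name, secs in [("DSL", 25), ("POTS", 200), ("Wireless", 500)]:
        print(f"{name:18} ~{implied_kbps(secs):.0f} kbps")
```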

72 Data delivery costs 1$/GB today. Rent for big customers: 30$ per megabit per second per month, improved 3x in the last 6 years (!). That translates to 0.1 $/GB at each end. Overhead (routers, people, …) makes it 1$/GB at each end. You can mail a 750 GB disk for 20$: – that's 30x..3x cheaper – if overnight, it's 7 MBps – 7 disks ~ 50 MBps (1/4 Gbps). [Image: TeraScale SneakerNet: 7 x 750 GB ~ 5 TB.]
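
The 0.1 $/GB figure comes from running a 30 $/Mbps/month line flat out. One Mbps for a 30-day month moves 324 GB, and 30$/324 GB ≈ 0.09 $/GB. The Python sketch below redoes that arithmetic and compares it with mailing a 750 GB disk for 20$. The shipping bandwidth depends on the assumed transit time: the default here is 24 hours, and the slide's 7 MBps corresponds to about 30 hours.

```python
SECONDS_PER_MONTH = 30 * 86_400


def rented_dollars_per_gb(dollars_per_mbps_month: float = 30.0) -> float:
    """Cost per GB of a leased line run flat out for a 30-day month."""
    gb_per_month = 1e6 / 8 * SECONDS_PER_MONTH / 1e9  # 1 Mbps sustained, in GB
    return dollars_per_mbps_month / gb_per_month


def mailed_dollars_per_gb(postage: float = 20.0, disk_gb: float = 750.0) -> float:
    """Postage divided by the capacity of the disk in the envelope."""
    return postage / disk_gb


def sneakernet_mb_per_s(disk_gb: float = 750.0, transit_hours: float = 24.0, disks: int = 1) -> float:
    """Effective bandwidth of shipping disks: bytes carried / time in transit."""
    return disks * disk_gb * 1e3 / (transit_hours * 3600)


if __name__ == "__main__":
    print(f"rented line: {rented_dollars_per_gb():.3f} $/GB per end")  # 0.093
    print(f"mailed disk: {mailed_dollars_per_gb():.3f} $/GB")          # 0.027
    print(f"one disk overnight: {sneakernet_mb_per_s():.1f} MB/s")      # 8.7
    print(f"seven disks: {sneakernet_mb_per_s(disks=7):.0f} MB/s")      # 61
```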

73 Outline: Moore's Law and consequences; Storage rules of thumb; Balanced systems rules revisited; Networking rules of thumb; Caching rules of thumb

74 The Five Minute Rule: trade DRAM for disk accesses. Cost of an access: Drive_Cost / Accesses_per_second. Cost of a DRAM page: $/MB / Pages_per_MB. The break-even has two terms: a technology term and an economic term. Page sizes grew to compensate for changing ratios. The rule is now at 5 minutes for random, 10 seconds for sequential.

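75 The 5 Minute Rule Derived. Let T = TimeBetweenReferences to a page. The cost of a RAM page is RAM_$_Per_MB / PagesPerMB; the disk access cost per unit time is (DiskPrice / AccessesPerSecond) / T. At break-even:

$$\frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} = \frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}}$$

so

$$T = \frac{\text{PagesPerMB}}{\text{AccessesPerSecond}} \times \frac{\text{DiskPrice}}{\text{RAM\_\$\_Per\_MB}}$$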

76 Plugging in the Numbers:

             PagesPerMB/AccessesPerSecond   DiskPrice/RAM_$_Per_MB   Break-even
Random       128/120 ~ 1                    200/0.1 ~ 2,000          28 minutes
Sequential   1/60 ~ 0.01                    ~2,000                   30 seconds

The trend is toward longer times because disk prices are not changing much, while RAM prices decline 100x/decade: the 30-minute and 30-second rules.
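
The break-even interval is the product of the two ratios in the derivation: a technology term (PagesPerMB / AccessesPerSecond) times an economic term (DiskPrice / RAM_$_Per_MB). The Python sketch below evaluates it with the slide's inputs left unrounded. That gives about 36 minutes for random 8KB pages and about 33 seconds for sequential 1MB pages, consistent with the 30-minute and 30-second rules once the slide's rounding is taken into account.

```python
def break_even_seconds(pages_per_mb: float, accesses_per_s: float,
                       disk_price: float, ram_dollars_per_mb: float) -> float:
    """Five-minute-rule break-even reference interval T (seconds).

    T = (PagesPerMB / AccessesPerSecond) * (DiskPrice / RAM_$_Per_MB):
    a technology ratio times an economic ratio.
    """
    technology = pages_per_mb / accesses_per_s
    economic = disk_price / ram_dollars_per_mb
    return technology * economic


if __name__ == "__main__":
    random_t = break_even_seconds(128, 120, 200, 0.1)  # 8KB pages, 120 random IO/s
    seq_t = break_even_seconds(1, 60, 200, 0.1)        # 1MB pages, 60 sequential IO/s
    print(f"random: {random_t / 60:.0f} minutes, sequential: {seq_t:.0f} seconds")
```

A page referenced more often than once per T is cheaper to keep in RAM; one referenced less often is cheaper to re-read from disk.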

77 When to Cache Web Pages: Caching saves user time. Caching saves wire time. Caching costs storage. Caching only works sometimes: – new pages are a miss – stale pages are a miss.

78 Web Page Caching Saves People Time. Assume people cost 20$/hour (or 0.2$/hr???). Assume a 20% hit rate in the browser, 40% in the proxy. Assume 3 seconds of server time. Caching saves 28$/year to 150$/year of people time, or 0.28$ to 1.5$/year at 0.2$/hr.

79 Web Page Caching Saves Resources. Wire cost is a penny (wireless) to 100µ$ (LAN). Storage is 8 µ$/month. Break-even: wire cost = storage rent: 18 months to 300 years. Add people cost: break-even >15 years; with cheap people (0.2$/hr), >3 years.

80 Caching. Disk caching: – 30-minute rule for random IO – 30-second rule for sequential IO. Web page caching: – if a page will be re-referenced within 18 months (with free users) or 15 years (with valuable users), then cache the page in the client/proxy. Challenges: – guessing which pages will be re-referenced – detecting stale pages (page velocity).

81 Meta-Message: Technology Ratios Matter. Price and performance change. If everything changes in the same way, then nothing really changes. If some things get much cheaper/faster than others, then that is real change. Some things are not changing much: – cost of people – speed of light – … And some things are changing a LOT.

82 Outline: Moore's Law and consequences; Storage rules of thumb; Balanced systems rules revisited; Networking rules of thumb; Caching rules of thumb

83 What's New / Surprising: Not a big surprise, just amazing! – exponential growth in capacity – latency lags bandwidth lags capacity – the 5-minute rule is now a 30-minute rule. FLASH is coming: – low-end storage (GBs now, 100s of GBs soon) – low-latency storage (a fraction of a ms) – high $/byte but good $/access. Smart disks still seem far off, but...

