1 The 5 Minute Rule
Jim Gray, Microsoft Research
Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12 (today, we are here), Peta = 10^15, Exa = 10^18

2 Storage Hierarchy (9 levels)
- Cache (1, 2)
- Main (1, 2, 3 if NUMA)
- Disk (1 cached, 2)
- Tape (1 mounted, 2)

3 Meta-Message: Technology Ratios Are Important
If everything gets faster & cheaper at the same rate, then nothing really changes.
Things getting MUCH BETTER:
- communication speed & cost: 1,000x
- processor speed & cost: 100x
- storage size & cost: 100x
Things staying about the same:
- speed of light (more or less constant)
- people (10x more expensive)
- storage speed (only 10x better)

4 Today's Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Two charts covering cache, main memory, secondary disc, and online/nearline/offline tape: typical system size (bytes) vs access time (seconds), and price ($/MB) vs access time (seconds).]

5 Storage Ratios Changed
- 10x better access time
- 10x more bandwidth
- 4,000x lower media price
- DRAM/disk price ratio changed: 100:1 to 10:1 to 50:1

6 Thesis: Performance = Storage Accesses, not Instructions Executed
- In the old days we counted instructions and I/Os
- Now we count memory references
- Processors wait most of the time

7 The Pico Processor
- 1 M SPECmarks
- 10^6 clocks per fault to bulk RAM
- Event-horizon on chip
- VM reincarnated
- Multi-program cache
- Terror Bytes!

8 Storage Latency: How Far Away is the Data?
Scaled so that a register reference is about a minute away:
- Registers: my head (1 min)
- On-chip cache: this room
- On-board cache: this campus (10 min)
- Memory: Sacramento (1.5 hr)
- Disk: Pluto (2 years)
- Tape / optical robot: Andromeda (2,000 years)

9 The Five Minute Rule
- Trade DRAM for disk accesses
- Cost of an access: DriveCost / Accesses_per_second
- Cost of a DRAM page: $/MB / Pages_per_MB
- Break-even has two terms: a technology term and an economic term
- Page size grew to compensate for changing ratios
- Still at 5 minutes for random access, 1 minute for sequential
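The break-even interval on this slide can be sketched numerically: the technology term (pages per MB over accesses per second) times the economic term (drive price over DRAM price per MB) gives the re-reference interval at which caching a page in DRAM pays for itself. The device prices below are illustrative late-1990s assumptions, not figures from the talk.

```python
# Break-even reference interval: keep a page in DRAM if it is
# re-referenced more often than once per break_even_s seconds.
pages_per_mb = 1_000_000 // 8192     # 8 KB pages -> ~122 pages per MB (assumed page size)
accesses_per_sec = 64                # random accesses/sec one disk delivers (assumed)
disk_price = 2000.0                  # $ per drive (assumed)
dram_price_per_mb = 15.0             # $ per MB of DRAM (assumed)

# technology term * economic term
break_even_s = (pages_per_mb / accesses_per_sec) * (disk_price / dram_price_per_mb)
# comes out to roughly four minutes with these assumptions, i.e. "about 5 minutes"
```

As the slide notes, the ratios drifted over the years, but page sizes grew in step, so the random-access answer stayed near five minutes.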

10 [Chart: break-even analysis showing the best index page size, ~16 KB.]

11 Standard Storage Metrics
Capacity:
- RAM: MB and $/MB: today at 10 MB & 100 $/MB
- Disk: GB and $/GB: today at 10 GB and 200 $/GB
- Tape: TB and $/TB: today at 0.1 TB and 25 k$/TB (nearline)
Access time (latency):
- RAM: 100 ns
- Disk: 10 ms
- Tape: 30 second pick, 30 second position
Transfer rate:
- RAM: 1 GB/s
- Disk: 5 MB/s (arrays can go to 1 GB/s)
- Tape: 5 MB/s (striping is problematic)

12 New Storage Metrics: Kaps, Maps, SCAN?
- Kaps: how many kilobyte objects served per second
  - the file server / transaction processing metric
  - this is the OLD metric
- Maps: how many megabyte objects served per second
  - the multi-media metric
- SCAN: how long to scan all the data
  - the data mining and utility metric
- And: Kaps/$, Maps/$, TBscan/$
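Using the disk numbers from slide 11 (10 ms access, 5 MB/s transfer, 10 GB capacity), these metrics can be sketched as: serving an object costs one access plus the transfer time for its size, and SCAN is simply capacity over bandwidth.

```python
access_s = 10e-3        # disk access time, slide 11
rate_bps = 5e6          # 5 MB/s transfer rate, slide 11
capacity_b = 10e9       # 10 GB disk, slide 11

kaps = 1 / (access_s + 1e3 / rate_bps)   # 1 KB objects served per second (~98)
maps = 1 / (access_s + 1e6 / rate_bps)   # 1 MB objects served per second (~4.8)
scan_s = capacity_b / rate_bps           # 2,000 s (~33 min) to scan the disk
```

Note how the access time dominates Kaps while the transfer time dominates Maps, which is exactly why the two metrics diverge.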

13 For the Record (good 1998 devices packaged in systems)
[Table of capacity, cost, Kaps, Maps, and SCAN figures for 1998 devices; not recoverable from the transcript.]

14 How To Get Lots of Maps, SCANs
- Parallelism: use many little devices in parallel
- Parallelism: divide a big problem into many smaller ones to be solved in parallel
- Beware of the media myth
- Beware of the access time myth
- At 10 MB/s: 1.2 days to scan; 1,000x parallel: 100-second SCAN
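The arithmetic behind that last bullet, assuming the 1 TB of data implied by the slide's numbers (10 MB/s for 1.2 days is about a terabyte):

```python
data_b = 1e12           # 1 TB, implied by the slide's arithmetic (assumption)
rate_bps = 10e6         # 10 MB/s per device

serial_days = data_b / rate_bps / 86_400     # one device: ~1.16 days
parallel_s = data_b / (1_000 * rate_bps)     # 1,000 devices: 100 seconds
```

Bandwidth, unlike access time, scales linearly with the number of devices, which is the whole point of the slide.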

15 The Disk Farm On a Card
The 100 GB disc card (14"): an array of discs. Can be used as:
- 100 discs
- 1 striped disc
- 10 fault-tolerant discs
- ...etc
LOTS of accesses/second, LOTS of bandwidth.
Life is cheap, it's the accessories that cost ya. Processors are cheap, it's the peripherals that cost ya (a 10 k$ disc card).

16 Tape Farms for Tertiary Storage, Not Mainframe Silos
Many independent tape robots (like a disc farm):
- 10 k$ robot: 14 tapes, 500 GB, 5 MB/s, 20 $/GB, 30 Maps, scan in 27 hours
- 100 robots, 1 M$: 50 TB, 50 $/GB, 3 K Maps, 27 hr scan
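The 27-hour scan follows from each robot's own capacity and rate, and adding robots adds capacity and bandwidth in proportion, so the farm's scan time stays flat. A sketch using the slide's per-robot numbers:

```python
robot_cost = 10_000.0      # $ per robot (slide figure)
robot_gb = 500.0           # GB per robot (14 tapes)
tape_rate_bps = 5e6        # 5 MB/s per robot

cost_per_gb = robot_cost / robot_gb                  # 20 $/GB
scan_hr = robot_gb * 1e9 / tape_rate_bps / 3600      # ~27.8 hours per robot
# 100 robots scanning in parallel cover 100x the data (50 TB)
# in the same ~27 hours, because bandwidth scales with robot count.
```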

17 The Metrics: Disk and Tape Farms Win
[Chart comparing a 1000x disc farm, an STC tape robot (6,000 tapes, 8 readers), and a 100x DLT tape farm on GB/K$, Kaps, Maps, and SCANS/day.]
Data Motel: data checks in, but it never checks out.

18 Tape & Optical: Beware of the Media Myth
- Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)
- Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc)

19 Tape & Optical Reality: Media is 10% of System Cost
Tape needs a robot (10 k$ ... 3 m$); tapes at 20 GB each => 20 $/GB ... 200 $/GB (1x ... 10x cheaper than disc)
Optical needs a robot (100 k$); 100 platters = 200 GB (TODAY) => 400 $/GB (more expensive than mag disc)
- Robots have poor access times
- Not good for Library of Congress (25 TB)
- Data motel: data checks in but it never checks out!
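The "media is 10% of system cost" claim can be checked against the talk's own tape numbers: bare media at 1.5 $/GB (slide 18) versus tape packaged with a robot at the 20 $/GB end of the range (slide 19).

```python
media_per_gb = 1.5       # bare tape media, slide 18
system_per_gb = 20.0     # tape plus robot, cheap end of slide 19's range

media_fraction = media_per_gb / system_per_gb   # 0.075, i.e. roughly 10%
```

So the cheap media barely dents the system price; the robot and readers dominate, which is the slide's point.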

20 The Access Time Myth
The myth: seek or pick time dominates.
The reality:
(1) Queuing dominates
(2) Transfer dominates BLOBs
(3) Disk seeks are often short
Implication: many cheap servers are better than one fast expensive server:
- shorter queues
- parallel transfer
- lower cost/access and cost/byte
This is now obvious for disk arrays. This will be obvious for tape arrays.