Storage Systems CSE 598d, Spring 2007 Lecture ?: Rules of thumb in data engineering Paper by Jim Gray and Prashant Shenoy Feb 15, 2007.

Contents
Examination of rules of thumb in data engineering:
–Moore's law
–Amdahl's rules
–Gilder's law
Technological trends, and how/whether the existing rules of thumb need to be rethought

Moore's Law
Circuit densities grow 4x every 3 years:
–100x increase in a decade
–More generally: Ax every B years
–Originally stated for RAM
Implies an extra bit of addressing every 18 months:
–From 16-bit addressing in the 1970s (64 KB) to 64-bit addressing today (machines with several GB of RAM)
–Extended to CPUs and storage
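The slide's arithmetic can be checked directly; a quick sketch:

```python
# Moore's-law arithmetic from the slide: 4x density growth every 3 years.
growth_per_3_years = 4

# Over a decade: 4^(10/3) ~= 100x, matching the "100x in a decade" rule.
decade_growth = growth_per_3_years ** (10 / 3)
print(f"Density growth per decade: ~{decade_growth:.0f}x")

# 4x in 3 years = 2 doublings per 36 months = one doubling (i.e. one
# extra address bit) every 18 months.
months_per_extra_bit = 36 / 2
print(f"One extra address bit every {months_per_extra_bit:.0f} months")
```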

Disk parameters over time

Moore's law applied to HDDs
Disk capacity has increased more than 100x in the last decade!
–Areal density up from 20 Mbit/in² to 35 Gbit/in²
However, data rate has increased only 30x
–Capacity / accesses-per-second growing 10x per decade
–Capacity / bandwidth growing 10x per decade
Implications:
–Disk accesses are becoming more precious
–Disk data is becoming "cooler" (accessed less often per byte)
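One concrete consequence of these growth factors, sketched from the slide's own decade numbers (100x capacity vs. 30x data rate):

```python
# Decade growth factors from the slide.
capacity_growth = 100   # disk capacity grew ~100x
bandwidth_growth = 30   # sequential data rate grew ~30x

# Time to scan (read every byte of) a full disk therefore grew by
# the ratio of the two factors: the disk "looks bigger" faster than
# it "gets faster".
scan_time_growth = capacity_growth / bandwidth_growth
print(f"Full-disk scan time grew ~{scan_time_growth:.1f}x over the decade")
```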

Closer look at the implications
Discussion:
–Does the increase in disk capacity mean applications are also using correspondingly large stores?
–Why are disk accesses per second going up? Recall that these have grown slower than areal density.
10 years ago: 30 kaps for 1 GB of data; today: 120 kaps for 80 GB of data
–That is, only 1.5 kaps per GB
–HDD data needs to be 20x cooler than it was 10 years ago
–Use large main memories (caching)
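The "cooling" factor follows from the slide's own kaps figures (kaps is kilobyte accesses per second, in the paper's terminology); a quick sketch:

```python
# Access rates and capacities from the slide.
old_kaps, old_gb = 30, 1     # a disk ~10 years earlier: 30 kaps, 1 GB
new_kaps, new_gb = 120, 80   # a disk circa 2007: 120 kaps, 80 GB

old_kaps_per_gb = old_kaps / old_gb   # 30 kaps per GB
new_kaps_per_gb = new_kaps / new_gb   # 1.5 kaps per GB

# Each GB now gets 20x fewer accesses, so its data must be ~20x "cooler".
cooling_factor = old_kaps_per_gb / new_kaps_per_gb
print(f"Data must be ~{cooling_factor:.0f}x cooler than a decade ago")
```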

Costly disk accesses have led to...
–Preferring a few large transfers over many small ones
–Preferring sequential transfers
Log-structured file systems
–Mirroring rather than other forms of redundancy

Cost trends
Historically, tape : HDD : RAM price per byte has been 1 : 10 : 1000
–A calculation for a modern system gives 1 : 3 : 300
–Disk prices are approaching tape prices; disks are replacing tapes in several domains
–Cost/MB of RAM declines 100x in a decade: what is economical to put on disk today may be economical to put in RAM in 10 years
–RAM is taking over much of the role of the HDD, and the HDD much of the role of tape
Storage management costs exceed device costs
–Admins are required to manage more and more data
–Automation and self-manageability are becoming crucial
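A back-of-the-envelope sketch of the 100x-per-decade RAM price decline; the starting price here is an assumed, illustrative figure, not from the slide:

```python
# RAM cost/MB falls ~100x per decade (slide's figure).
ram_decline_per_decade = 100

# Assumed illustrative starting price.
ram_price_today = 0.10   # $/MB (hypothetical)

# Data that is economical on disk today may fit RAM budgets in 10 years.
ram_price_in_10y = ram_price_today / ram_decline_per_decade
print(f"Projected RAM price in a decade: ${ram_price_in_10y:.4f}/MB")
```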

Amdahl's System Balance Rules
Parallelism law:
–Expresses the maximum achievable speedup in terms of the fraction of a computation that is parallelizable
Balanced system law:
–A system needs 1 bit of IO/sec per instruction/sec
Memory law:
–The MB/MIPS ratio in a balanced system is 1
IO law:
–Programs do one IO per 50,000 instructions
How have these rules changed over time?
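The parallelism law can be written down directly; a minimal sketch:

```python
# Amdahl's parallelism law: if fraction f of a computation can be
# parallelized across n processors, the speedup is bounded by the
# serial remainder (1 - f).
def amdahl_speedup(f: float, n: int) -> float:
    return 1.0 / ((1.0 - f) + f / n)

# Even with 95% of the work parallelizable, 1000 processors yield
# less than a 20x speedup:
print(round(amdahl_speedup(0.95, 1000), 1))  # → 19.6
```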

Methodology:
–Rely on well-regarded benchmarks: TPC-C (random IO) and TPC-H (sequential IO)
Revisions to Amdahl's laws:
–Balanced system law: measure the instruction rate and IO rate on the relevant workload
–Memory law: the MB-to-MIPS ratio is rising from 1 to 4; a reiteration of the growth in RAM as disk IOs become expensive
–IO law: workload dependent; the original instructions-per-IO figure was geared toward random IO. Increased sequentiality in disk accesses (discussed earlier) means more instructions per IO.

Gilder's Law
Prediction (1995): network bandwidth would triple every year for the next 25 years
–In practice, link bandwidth triples roughly every four years
Network messages used to cost more CPU instructions and IO instructions per byte than disk accesses:
–Network protocol-processing overheads
–These overheads have been reduced by smarter NICs
Cost comparison:
–Moving data over a WAN is much more expensive than reading it from a local disk over a LAN
–Related: the cost of shipping large disk arrays or entire computers is comparable to the cost of transferring the data over the Internet
–However, this price gap is likely to decline, and bandwidth should be plentiful within a decade
Implication: local disks could then be used as caches (or prefetch buffers), with the main data store being remote
–Saves on local storage management costs
–The managed data center model is already seen!
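A rough sketch of the ship-the-disks-vs-WAN comparison; the link speed, data size, and shipping time below are assumed, illustrative figures, not from the slide:

```python
# Compare moving 1 TB over a modest WAN link with overnight shipping.
TB = 1e12                    # bytes
data_bytes = 1 * TB          # amount of data to move (assumed)
wan_bits_per_sec = 10e6      # assumed effective 10 Mbit/s WAN link

# Time to push the data through the link, in days.
transfer_days = data_bytes * 8 / wan_bits_per_sec / 86400
shipping_days = 1            # overnight courier (assumed)

print(f"WAN transfer: {transfer_days:.1f} days vs shipping: {shipping_days} day")
```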

Caching
–5-minute rule for random workloads
–1-minute rule for sequential workloads
Web caches:
–Cache everything!
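The break-even interval behind the five-minute rule (from Gray and Graefe's formulation) can be sketched as follows; the prices and rates passed in are illustrative 1997-era figures, assumed here:

```python
# Five-minute rule: keep a page cached in RAM if it is re-referenced
# within the break-even interval, where caching it costs the same as
# re-reading it from disk.
def break_even_seconds(pages_per_mb_ram: float,
                       accesses_per_sec_per_disk: float,
                       price_per_disk: float,
                       price_per_mb_ram: float) -> float:
    return ((pages_per_mb_ram / accesses_per_sec_per_disk) *
            (price_per_disk / price_per_mb_ram))

# Illustrative figures (assumed): 128 8KB-pages per MB of RAM,
# 64 accesses/sec per disk, $2000 per disk, $15 per MB of RAM.
print(round(break_even_seconds(128, 64, 2000, 15)))  # → 267 (about 5 minutes)
```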