“Five minute rule ten years later and other computer storage rules of thumb” Authors: Jim Gray, Goetz Graefe. Reviewed by: Nagapramod Mandagere, Biplob Debnath.


1 “Five minute rule ten years later and other computer storage rules of thumb” Authors: Jim Gray, Goetz Graefe. Reviewed by: Nagapramod Mandagere, Biplob Debnath

2 Outline Problem Statement Motivation Importance and Relevance Main Contributions and Validation Key Ideas Illustrations New Metrics Assumptions Re-write Today Questions

3 Problem Statement Broader Problem: Viewing developments over a long period of time to extract important technology trends. Specific Instance: Inferring rules of thumb for buffer replacement policies in a number of settings, including RAID environments. Given: Trends over time for parameters such as memory cost, disk cost, and tape cost. Find: Rules of thumb for deciding where to store data and when to replace data in the memory buffer. Objectives: Simple, extensible rules. Constraints: Hierarchical storage model.

4 Typical Database Administrator's Dilemma The performance isn't good. Am I doing something wrong? Should I cache on the client? Should I cache this data in memory? Should I store data back on disk (local or network disk)? Should I move data to tape?

5 Importance & Relevance Parameters change at different rates. Seeks/second and disk capacity: 10x to 100x. Disk MB/K$ and DRAM MB/K$: 1000x.

6 Importance & Relevance The location of data is very important. Main Memory: Very fast, expensive, limited size. Disk Storage: A lot slower than main memory, inexpensive, close to unlimited size. Tape Storage: Slowest, dirt cheap, unlimited capacity. How can one decide what data resides where? Either the system learns from data access patterns and adapts (admins hate to give up control), or the administrator controls data locality using experience or historical performance information (rules of thumb).

7 Main Contributions & Validation The Five minute rule: Randomly accessed buffer pages can be replaced if unused for more than 5 minutes. Sequentially accessed buffer pages can be replaced if unused for more than 1 minute. Metrics for storage performance characterization: Cost/Access. Maps: Megabyte accesses per second. Scan: Time it takes to sequentially read or write all the data in the device. Validation Methodology: Examples (random access, one pass sort, two pass sort) and trends observed over a period of time.

8 Key Ideas Tradeoff between the cost of RAM and the cost of disk accesses: caching pages in extra memory can save disk IOs. The break-even point is met when the rent on the extra memory for cache ($/page/sec) exactly matches the savings in disk accesses per second ($/disk_access/sec).
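The break-even condition can be written out as a formula. The form below is the break-even reference interval as it is usually quoted from Gray and Graefe's paper; the symbol names follow the parameter names used on the next slide, and the transcript itself does not show the equation.

```latex
\[
\text{BreakEvenReferenceInterval (seconds)} =
\frac{\text{PagesPerMBofRAM}}{\text{AccessesPerSecondPerDisk}}
\times
\frac{\text{PricePerDiskDrive}}{\text{PricePerMBofDRAM}}
\]
```

The first ratio depends only on technology (page size and disk access rate), the second only on prices, which is why the interval shifts as memory and disk costs fall at different rates.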

9 Illustration – Typical System in 1997 For a system with the following characteristics: PagesPerMBofRAM = 128 pages/MB (8KB pages); AccessesPerSecondPerDisk = 64 access/sec/disk; PricePerDiskDrive = 2000 $/disk (9GB + controller); PricePerMBofDRAM = 15 $/MB_DRAM. The break-even inter-reference interval is 266 seconds ~ 5 minutes
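The 266-second figure can be checked with a few lines of Python. This is a minimal sketch: the function name and argument names are illustrative, and the formula used (pages per MB divided by accesses per second per disk, times disk price divided by DRAM price per MB) is the break-even form from the paper rather than something shown in the slide text.

```python
def break_even_interval_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_dram):
    """Interval at which the cost of holding a page in RAM equals the cost
    of the disk accesses needed to re-read it."""
    technology_ratio = pages_per_mb_ram / accesses_per_sec_per_disk
    economic_ratio = price_per_disk / price_per_mb_dram
    return technology_ratio * economic_ratio


# 1997 values taken from the slide.
interval_1997 = break_even_interval_seconds(
    pages_per_mb_ram=128,          # 8 KB pages
    accesses_per_sec_per_disk=64,
    price_per_disk=2000,           # dollars, 9 GB disk plus controller
    price_per_mb_dram=15,          # dollars
)
print(f"{interval_1997:.1f} s = {interval_1997 / 60:.1f} min")
# prints "266.7 s = 4.4 min"
```

The result rounds to the slide's 266 seconds, a little under five minutes.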

10 Illustration One pass algorithms read data and never reference it again, so there is no need to cache the data in RAM. The system needs only enough buffer memory to allow data to stream from disk to main memory. Typically, two or three one-track buffers (~100 KB) are adequate per disk to buffer disk operations and allow the device to stream data to the application.

11 Illustration Two pass algorithms are sequential operations that read a large dataset and then revisit parts of the data. Examples: database join, cube, rollup, and sort operators. Sorting uses two passes if memory size is smaller than the data set size. Inter-reference time is typically about a minute (sequential data access).

12 Illustration – Two Pass Sort A one pass sort needs a larger amount of memory, and the memory it needs grows faster with the size of the input file. For files bigger than memory size, two pass is the only option.
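The difference in memory growth can be illustrated with a simplified textbook model of external merge sort. This is a sketch under stated assumptions, not the paper's exact formula: a one pass sort is assumed to hold the whole file in memory, and a two pass sort is assumed to need one I/O buffer per sorted run during the merge, which gives a memory requirement of roughly sqrt(file size x buffer size). The 64 KB buffer size and the file sizes are hypothetical.

```python
import math

MB = 1024 ** 2


def one_pass_memory_bytes(file_bytes):
    # Assumption: the whole file must fit in memory to sort it in one pass.
    return file_bytes


def two_pass_memory_bytes(file_bytes, buffer_bytes):
    # Pass 1 writes file_bytes / M sorted runs using M bytes of memory.
    # Pass 2 merges them with one buffer per run: (file_bytes / M) * buffer <= M,
    # so M must be at least sqrt(file_bytes * buffer_bytes).
    return math.sqrt(file_bytes * buffer_bytes)


buffer_bytes = 64 * 1024  # hypothetical I/O buffer size
for size_mb in (100, 1_000, 10_000):
    file_bytes = size_mb * MB
    one = one_pass_memory_bytes(file_bytes) / MB
    two = two_pass_memory_bytes(file_bytes, buffer_bytes) / MB
    print(f"file {size_mb:>6} MB: one pass {one:>8.1f} MB, two pass {two:>6.1f} MB")
```

Under this model, a 100x larger file needs 100x more memory for a one pass sort but only 10x more for a two pass sort.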

13 Disk vs Tape Tradeoff Tape: larger penalty (slower access, least cost). Solution: larger break-even point, bigger page size.

14 New Metrics Intended for data flow applications that stream huge amounts of data, such as data mining and multimedia applications. Kaps: kilobyte accesses per second. Maps: megabyte accesses per second. Scan: time taken to sequentially read or write all data on a device. These metrics, combined with rent costs, provide a price/performance metric.
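As a rough illustration of how the three metrics relate to basic device parameters, the sketch below derives them from an access latency, a sequential bandwidth, and a capacity. The model (each access costs a fixed latency plus transfer time) and all the device numbers are assumptions chosen for illustration; they are not taken from the slides.

```python
KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3


def kaps(access_latency_s, bandwidth_bytes_per_s):
    # Number of 1 KB accesses per second: latency plus time to move 1 KB.
    return 1.0 / (access_latency_s + KB / bandwidth_bytes_per_s)


def maps(access_latency_s, bandwidth_bytes_per_s):
    # Number of 1 MB accesses per second: latency plus time to move 1 MB.
    return 1.0 / (access_latency_s + MB / bandwidth_bytes_per_s)


def scan_seconds(capacity_bytes, bandwidth_bytes_per_s):
    # Time to sequentially read or write the entire device.
    return capacity_bytes / bandwidth_bytes_per_s


# Hypothetical disk: 10 ms per access, 10 MB/s sequential, 9 GB capacity.
latency_s = 0.010
bandwidth = 10 * MB
capacity = 9 * GB

print(f"Kaps: {kaps(latency_s, bandwidth):.1f}")
print(f"Maps: {maps(latency_s, bandwidth):.1f}")
print(f"Scan: {scan_seconds(capacity, bandwidth):.0f} s")
```

Small accesses are dominated by latency and large accesses by bandwidth, which is why Kaps and Maps are reported separately.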

15 Assumptions All disk storage has the same characteristics (cost/performance): the disk storage system is assumed to be homogeneous, and the more recent shift towards hierarchical/heterogeneous storage systems is not considered. The tradeoff considers only performance; security and fault tolerance are assumed to be uniform throughout.

16 Re-write Re-evaluate the rules of thumb considering more recent costs and more recent trends in storage systems, such as heterogeneous/hierarchical storage. Take into account SAN and NAS characteristics.

17 Questions??? Does the five minute rule hold good today? No (with reservations). If one changes the page size to the megabyte range, the five minute rule still applies. Pages/MB of RAM = 16 (64 KB pages); Access/sec/disk = 64; Price/disk drive = $400; Price/MB of RAM = $0.1; Break-even point ~ 1000 s. Further evidence: Jim Gray (Keynote in FAST 2004) http://www.usenix.org/events/fast05/
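The sensitivity of the break-even point to page size can be sketched in Python using the prices on this slide. The function and its names are illustrative, the formula is the break-even form from the paper, and holding the disk access rate fixed at 64 accesses per second for every page size is a simplifying assumption (larger pages take longer to transfer, so the real rate would drop).

```python
FIVE_MINUTES_S = 300


def break_even_interval_seconds(page_kb, accesses_per_sec_per_disk,
                                price_per_disk, price_per_mb_ram):
    pages_per_mb = 1024 / page_kb
    return (pages_per_mb / accesses_per_sec_per_disk) * (price_per_disk / price_per_mb_ram)


# Prices and access rate from the slide; page sizes chosen for comparison.
for page_kb in (8, 64):
    interval = break_even_interval_seconds(
        page_kb=page_kb,
        accesses_per_sec_per_disk=64,  # assumed constant across page sizes
        price_per_disk=400,            # dollars
        price_per_mb_ram=0.1,          # dollars
    )
    print(f"{page_kb:>3} KB pages: {interval:7.0f} s "
          f"({interval / FIVE_MINUTES_S:.1f}x five minutes)")
```

With 16 pages per MB (64 KB pages) this gives about 1000 seconds, matching the slide, while 8 KB pages at the same prices give a much longer interval. Larger pages pull the break-even point back down toward five minutes.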

