PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for Conserving Disk Energy
Qingbo Zhu, Asim Shankar and Yuanyuan Zhou
Presented by: Hang Zhao, Chiu Tan

PB-LRU: Partition-Based LRU
- Storage is a major energy consumer, accounting for about 27% of a data center's power budget.
- PB-LRU is a power-aware, online storage cache management algorithm.
- PB-LRU dynamically partitions the cache at run time to give each disk an energy-optimal cache size.
- It is a practical algorithm that adapts to workload changes with little parameter tuning.

Outline
- Motivation
- Background
- Why do we need PB-LRU?
- Main Idea
- Energy Estimation at Run Time
- Solving MCKP
- Evaluation & Simulation
- Conclusion

Motivation
Why is power conservation important?
- Data centers are an important component of the Internet infrastructure.
- A data center's power needs grow about 25% a year, with storage consuming 27% of the budget.
How can we reduce storage power? Simple: spin down disks when they are not in use.

Motivation (II)
But …
- There is a performance and energy penalty when a disk moves from a low-power to a high-power mode.
- Data center request volume is high, so idle periods are small, which makes full spin-up/spin-down impractical.
Solution: a multi-speed disk architecture. PB-LRU targets multi-speed disks.

Background
- Break-even time: the minimum idle time needed to justify spinning a disk down and back up.
- Oracle DPM: knows the length of the next idle period and uses it to regulate power modes optimally.
- Practical DPM: uses idle-time thresholds to decide when to power down or up. (A sketch follows.)
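
To make the break-even idea concrete, here is a minimal sketch of the break-even computation and a threshold-based practical DPM. This is not from the slides; the power and spin-up figures are illustrative assumptions.

```python
def break_even_time(active_power, low_power, spin_energy, spin_time):
    """Shortest idle period for which spinning down saves energy.

    Solves: active_power * t = spin_energy + low_power * (t - spin_time),
    i.e. staying active vs. transitioning down and back up."""
    return (spin_energy - low_power * spin_time) / (active_power - low_power)

# Illustrative (assumed) parameters, not the paper's measured values.
T_BE = break_even_time(active_power=13.5, low_power=2.5,
                       spin_energy=135.0, spin_time=10.9)  # ~9.8 s

def practical_dpm(idle_so_far, threshold=T_BE):
    """Threshold DPM: spin down once the idle time exceeds the threshold.

    Setting the threshold equal to the break-even time makes the policy
    2-competitive with the oracle (the classic ski-rental argument)."""
    return "spin_down" if idle_so_far >= threshold else "stay_active"
```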

Why do we need PB-LRU?
Earlier work: PA-LRU.
- Idea: keep blocks from less active disks in the cache, which extends those disks' idle periods.
- Cost: more misses go to the active disks.
- Justification: since active disks are already spinning, those extra misses are cheaper in terms of power consumption.

However …
PA-LRU requires complicated parameter tuning:
- Four parameters are needed.
- There is no intuitive connection between the parameters and disk power consumption or I/O times.
- This makes it difficult to adapt with simple extensions or heuristics in a real-world implementation.
PB-LRU is practical to implement!

PB-LRU: Main Idea
- Divide the cache into partitions, one per disk.
- Manage each partition independently.
- Resize the partitions periodically, because the workload is not evenly distributed across disks.

Main Idea (II)
So what do we need?
- Estimate, for each disk, the energy that would be consumed at each candidate cache size (the estimation problem).
- Use these estimates to find the partitioning that minimizes total energy consumption across all disks (the MCKP problem).
A sketch of the per-epoch loop follows.
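
As a minimal sketch of how the two pieces fit together per epoch (the names here are hypothetical; `estimate_energy` stands for the Mattson-stack estimator and `solve_mckp` for the knapsack solver described on later slides, both passed in as callables):

```python
def repartition_epoch(disks, budget, candidate_sizes, estimate_energy, solve_mckp):
    """One PB-LRU epoch, sketched: estimate per-disk energy at each
    candidate partition size, then choose one size per disk subject to
    the total cache budget, minimizing estimated total energy."""
    # energy[d][s] = estimated energy disk d would consume this epoch
    # if its partition held s blocks.
    energy = [{s: estimate_energy(d, s) for s in candidate_sizes}
              for d in disks]
    sizes = solve_mckp(energy, budget)      # one chosen size per disk
    for d, s in zip(disks, sizes):
        d.resize_partition(s)               # hypothetical partition API
```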

Estimation Problem
Q: How can we estimate each disk's energy consumption at different cache sizes, at run time?
Naive answer: use simulators, one for every cache size per disk. That requires NumCacheSizes × NumDisks simulators. Impractical!

Estimation Problem (II)
Mattson's stack algorithm takes advantage of the inclusion property of LRU:
- The contents of a cache of k blocks are a subset of the contents of a cache of k+1 blocks.
- Accessing a block at stack position i means a miss for every cache smaller than size i (and a hit for every size ≥ i).
PB-LRU uses Mattson's stack to predict, in one pass, a hit or a miss at every candidate partition size. A sketch follows.
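
A minimal Python sketch of Mattson's stack algorithm (names are mine; the reference string matches the running example on the next slides):

```python
def mattson_access(stack, block):
    """One access in Mattson's stack algorithm for LRU.

    Returns the stack distance: the access hits in every cache of size >=
    distance and misses in every smaller cache (inclusion property).
    Returns infinity for a cold miss. The stack is ordered MRU-first."""
    try:
        depth = stack.index(block) + 1       # 1-based stack position
        stack.remove(block)
    except ValueError:
        depth = float("inf")                 # first reference: cold miss
    stack.insert(0, block)                   # move/insert at MRU position
    return depth

# Example: replay the slides' reference string (T1..T6 = 5, 4, 3, 2, 1, 4)
# and count misses at each candidate cache size in a single pass.
stack, sizes = [], [1, 2, 3, 4, 5]
misses = {s: 0 for s in sizes}
for blk in [5, 4, 3, 2, 1, 4]:
    d = mattson_access(stack, blk)
    for s in sizes:
        if d > s:                            # miss for all sizes < distance
            misses[s] += 1
```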

Estimation Problem (III)
In addition, for each candidate cache size PB-LRU keeps track of the time of the previous miss and the energy consumed up to that point. With these pieces of information, the energy consumption at each candidate cache size can be estimated.

[Figure: state before an example access. Five candidate cache sizes; accesses so far at times T1–T5. Mattson stack (MRU first): 1 2 3 4 5. Per-size table of (Pre_miss, Energy): the row for size 1 holds (T5, E5). The existing real cache (RCache, 3 blocks) holds 1, 2, 3.]

[Figure: at time T6, block 4 is accessed (the access sequence over T1–T6 is 5, 4, 3, 2, 1, 4). Block 4 is the 4th element of the Mattson stack, so the access is a miss for every cache size < 4 and a hit for sizes ≥ 4. Stack after the access (MRU first): 4 1 2 3 5. Table update: sizes 1–3 record (Pre_miss, Energy) = (T6, E6), where E6 = E5 + E(T6−T5) + 10 ms × ActivePower; size 4 keeps (T5, E5). The real LRU cache (RCache) now holds 4, 2, 3.]
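
A minimal sketch of this bookkeeping, following the slide's update formula (names are hypothetical; the 10 ms miss-service time and the `interval_energy` function, which returns the energy the DPM would spend over an idle interval of a given length, are stated assumptions):

```python
class SizeEstimate:
    """Per-candidate-size record: time of previous miss and energy so far."""
    def __init__(self):
        self.pre_miss = 0.0   # time of the previous miss (T5 in the example)
        self.energy = 0.0     # estimated energy up to that miss (E5)

ACCESS_TIME = 0.010  # assumed average miss service time, 10 ms

def on_access(estimates, stack_distance, now, active_power, interval_energy):
    """Update the per-size (Pre_miss, Energy) table after one access.

    Sizes below the stack distance see a miss: add the energy the DPM
    would have spent over the idle interval plus the miss-service energy,
    i.e. E6 = E5 + E(T6 - T5) + 10 ms * ActivePower. Sizes at or above
    the distance see a hit and keep their previous entries."""
    for size, est in estimates.items():
        if size < stack_distance:
            est.energy += (interval_energy(now - est.pre_miss)
                           + ACCESS_TIME * active_power)
            est.pre_miss = now
```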

Solving MCKP
- The Multiple-Choice Knapsack Problem (MCKP) is NP-hard, but a modified version of the problem is solvable with dynamic programming. (A sketch follows.)
- General result: increase the cache size of less active disks, decrease the cache size of active disks.
- Why? The penalty for shrinking an active disk's cache is small, while the energy saved by growing an inactive disk's cache is large.
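
A minimal dynamic-programming sketch of the multiple-choice knapsack (a standard textbook formulation, not the paper's exact code; it matches the `solve_mckp` signature assumed in the earlier epoch sketch):

```python
import math

def solve_mckp(energy, budget):
    """Pick one candidate partition size per disk, minimizing total
    estimated energy subject to sum(sizes) <= budget.

    energy: list over disks; energy[d][s] = estimated energy of disk d at
    integer partition size s. Each disk's choice set should include a
    small size (e.g. 0) so a feasible solution always exists.
    Runs in O(num_disks * budget * num_choices)."""
    ndisks = len(energy)
    # best[d][b]: min energy for the first d disks using at most b units.
    best = [[math.inf] * (budget + 1) for _ in range(ndisks + 1)]
    pick = [[None] * (budget + 1) for _ in range(ndisks + 1)]
    best[0] = [0.0] * (budget + 1)
    for d in range(1, ndisks + 1):
        for b in range(budget + 1):
            for s, e in energy[d - 1].items():
                if s <= b and best[d - 1][b - s] + e < best[d][b]:
                    best[d][b] = best[d - 1][b - s] + e
                    pick[d][b] = s
    # Backtrack the chosen size for each disk.
    sizes, b = [], budget
    for d in range(ndisks, 0, -1):
        s = pick[d][b]
        sizes.append(s)
        b -= s
    return list(reversed(sizes))
```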

Evaluation Methodology
The integrated simulator:
- CacheSim plus DiskSim, with a disk power model.
Multi-speed disk model:
- Similar to the IBM Ultrastar 36Z15.
- Adds 4 lower-speed modes: 12k, 9k, 6k and 3k RPM.
- Power model: 2-competitive thresholds.

Evaluation Methodology (cont.)
The traces:
Real system traces:
- OLTP – a database storage system (21 disks, 128 MB cache).
- Cello96 – the Cello file server from HP (19 disks, 32 MB cache).
Synthetic traces, generated based on storage system workloads:
- Zipf distribution to distribute requests among 24 disks and among the blocks within each disk.
- A "hill" shape to reflect temporal locality.
- Inter-request arrival distributions: exponential and Pareto.
A trace-generation sketch follows.
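
For illustration, a minimal sketch of such a generator (all parameters are assumptions, not the paper's; the "hill"-shaped temporal locality is omitted for brevity, and only the exponential inter-arrival case is shown):

```python
import random

def zipf_weights(n, alpha=1.0):
    """Zipf-like weights: probability of rank r proportional to 1/r^alpha."""
    return [1.0 / (r + 1) ** alpha for r in range(n)]

def synth_trace(num_reqs, num_disks=24, blocks_per_disk=10_000, rate=50.0):
    """Generate (time, disk, block) requests: Zipf across disks and across
    blocks within a disk; exponential inter-arrival times (Pareto would be
    the heavy-tailed alternative mentioned on the slide)."""
    disk_w = zipf_weights(num_disks)
    block_w = zipf_weights(blocks_per_disk)
    t, trace = 0.0, []
    for _ in range(num_reqs):
        t += random.expovariate(rate)             # exponential arrivals
        disk = random.choices(range(num_disks), disk_w)[0]
        block = random.choices(range(blocks_per_disk), block_w)[0]
        trace.append((t, disk, block))
    return trace
```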

Simulation results
[Charts: energy consumption of Infinite cache, LRU, PA-LRU and PB-LRU on the two real traces. Slide annotations: one trace shows limited savings due to a high cold-miss rate (64%); on the other, PB-LRU saves 9% and outperforms LRU by 22%.]

Simulation results (cont.)
- PB-LRU achieves a 5% better response time on one trace and a 40% response-time saving on the other.
- Oracle DPM does not slow down the average response time, since it always spins the disk up in time for the next request.
- All PB-LRU results are insensitive to the epoch length.

Accuracy of Energy Estimation
- OLTP trace, 21 disks, with Practical DPM.
- The largest deviation of estimated energy from real energy is 1.8%.

Cache partition sizes
[Chart: partition sizes chosen by the MCKP over time; some partitions settle around 11–12 MB while others stay near 1 MB.]
The MCKP partitioning tends to give small sizes to disks that remain active and to increase the sizes assigned to relatively inactive disks.

Effects of spin-up cost
- As the spin-up cost grows, the break-even time increases.
- Disks therefore stay longer in low-power modes.

Sensitivity Analysis on Epoch Length
The epoch length just needs to be large enough to accommodate the "warm-up" period after re-partitioning.

Conclusion
- PB-LRU is an online storage cache replacement algorithm that partitions the total system cache among individual disks.
- It focuses on multi-disk systems running data center workloads.
- It achieves similar or better energy savings and response-time improvements with significantly less parameter tuning.

Future work
- Take prefetching into consideration when investigating the role of cache management in energy conservation.
- Optimally divide the total cache between the caching and prefetching buffers.
- Implement the disk power modeling component in a real storage system.

Impact of PB-LRU
5 citations found on Google Scholar:
- Energy conservation techniques for disk array-based servers (ICS'04)
- Performance Directed Energy Management for Main Memory and Disks (ASPLOS'04)
- Power Aware Storage Cache Management
- Power and Energy Management for Server Systems
- Management Issues