Outperforming LRU with an Adaptive Replacement Cache Algorithm
Nimrod Megiddo and Dharmendra S. Modha, IBM Almaden Research Center

Outline
– Introduction
– ARC Intuition
– Cache Replacement Algorithms
– Cache and History
– Adaptive Replacement Cache
– Experimental Results
– Conclusion

Introduction
A commonly used criterion for evaluating a replacement policy is its hit ratio: the frequency with which it finds a page in the cache. LRU is the policy of choice in cache management. Attempts to outperform LRU in practice have not succeeded because of overhead issues and the need to pretune parameters. The Adaptive Replacement Cache (ARC) is:
– Self-tuning.
– Low-overhead, responding online to changing access patterns.
– Continually balancing between the recency and frequency features of the workload, demonstrating that adaptation eliminates the need for the workload-specific pretuning that plagued many previous proposals to improve on LRU.
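
As a concrete illustration of the hit-ratio criterion, here is a minimal Python sketch of measuring it over a request trace; the generic policy interface (a request method that returns True on a hit) is a hypothetical assumption for illustration, not something from the article.

def hit_ratio(policy, trace):
    # Fraction of requests that the policy finds already in its cache.
    hits = 0
    for page in trace:
        if policy.request(page):  # hypothetical interface: True on a cache hit
            hits += 1
    return hits / len(trace)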

Introduction
– Easy to implement: its running time per request is essentially independent of the cache size. (A real-life implementation revealed that ARC has low space overhead: about 0.75 percent of the cache size.)
– Scan-resistant: it allows one-time sequential requests to pass through without polluting the cache or flushing pages that have temporal locality.

ARC Intuition
ARC maintains two LRU page lists, L1 and L2:
– L1 holds pages that have been seen only once, recently.
– L2 holds pages that have been seen at least twice, recently.
ARC caches only a fraction of the pages on these lists. Pages that have been seen twice within a short time may be thought of as having high frequency, or as having longer-term reuse potential. Hence:
– L1 captures recency.
– L2 captures frequency.
If the cache holds c pages, we strive to keep these two lists at roughly the same size, c.

ARC Intuition
Together, the two lists comprise a cache directory that holds at most 2c pages. ARC caches a variable number of the most recent pages from both L1 and L2 such that the total number of cached pages is c, and it continually adapts the precise number of pages taken from each list. To contrast an adaptive approach with a nonadaptive one, suppose FRC_p is a fixed-replacement policy that attempts to keep in cache the p most recent pages from L1 and the c - p most recent pages from L2. ARC behaves like FRC_p, except that it can vary p adaptively.
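
To make the contrast concrete, here is a minimal sketch of the fixed policy's eviction decision, assuming the cached portions of L1 and L2 are kept as OrderedDicts used as LRU lists (LRU at the front, MRU at the back); this is an illustrative reading, not the article's code.

from collections import OrderedDict

def frc_evict(T1, T2, p):
    # FRC_p with a fixed target: keep about p pages on the recency side (T1)
    # and c - p pages on the frequency side (T2); evict from whichever side
    # exceeds its target.
    if len(T1) > p or not T2:
        return T1.popitem(last=False)  # evict the LRU page of T1
    return T2.popitem(last=False)      # evict the LRU page of T2

ARC makes the same decision but, as shown later, nudges p after every history hit instead of fixing it in advance.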

ARC Intuition
Most algorithms use recency and frequency as predictors of the likelihood that pages will be reused in the future. ARC acts as an adaptive filter that detects and tracks temporal locality:
– If either recency or frequency becomes more important at some time, ARC will detect the change and adapt its investment in each of the two lists accordingly.

Cache Replacement Algorithms
Offline optimal
– Laszlo A. Belady's MIN is an optimal, offline policy: it replaces the page in the cache that has the greatest distance to its next occurrence.
Recency
– The LRU policy always replaces the least recently used page.
– This policy has undergone many improvements.
– Three of the most important related algorithms are Clock, WS (working set), and WSClock.
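
A minimal sketch of MIN's victim selection, assuming the full future trace is known (this is what makes the policy offline); the names here are illustrative.

def min_victim(cache, trace, now):
    # Belady's MIN: evict the cached page whose next occurrence is farthest away.
    def next_use(page):
        try:
            return trace.index(page, now + 1)  # position of the next request
        except ValueError:
            return float('inf')  # never requested again: the ideal victim
    return max(cache, key=next_use)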

Cache Replacement Algorithms
Frequency
– The independent reference model (IRM) captures the notion of page reference frequencies: requests received at different times are independent.
1. LFU replaces the least frequently used page.
– Drawbacks:
  LFU's running time per request is logarithmic in the cache size.
  It is oblivious to recent history.
  It adapts poorly to variable access patterns, accumulating stale pages with past high-frequency counts that may no longer be useful.
2. LRU-2 memorizes the times of each cache page's two most recent occurrences and replaces the page whose second-most-recent occurrence is least recent.

Cache Replacement Algorithms
LRU-2 drawbacks:
– It uses a priority queue, which gives it logarithmic complexity per request.
– It must tune a parameter, the correlated information period, which captures the amount of time a page that has been seen only once recently should be kept in the cache.
3. 2Q is an improved method with constant complexity, introduced to overcome the logarithmic complexity.
– It is similar to LRU-2, except that it uses simple LRU lists instead of a priority queue.
– ARC's low computational overhead resembles 2Q's.
– Drawback: it must still tune the correlated-information-period parameter.
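
A minimal sketch of LRU-2's victim selection, assuming history maps each page to the times of its most recent references; a linear scan stands in here for the priority queue the real algorithm needs (the queue is what makes LRU-2 logarithmic).

def lru2_victim(cache, history):
    # LRU-2: evict the page whose second-most-recent reference is oldest.
    def penultimate(page):
        times = history[page]  # reference times, most recent last
        # A page seen only once has no penultimate reference; treat it as
        # infinitely old, so such pages are preferred victims.
        return times[-2] if len(times) >= 2 else float('-inf')
    return min(cache, key=penultimate)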

Cache Replacement Algorithms
The Low Inter-reference Recency Set (LIRS) design builds upon 2Q:
– LIRS maintains an LRU stack of variable, unbounded size that serves as a cache directory.
– LIRS selects a few top pages from the directory, depending on two parameters that affect its performance.
– Drawback: due to a certain stack-pruning operation, LIRS has an average-case rather than worst-case constant-time overhead, which is significant.

Cache Replacement Algorithms: Summary
– In contrast to the LRU-2, 2Q, LIRS, and LRFU algorithms, which all require offline selection of tunable parameters, the ARC policy functions online and is completely self-tuning.
– ARC does not suffer from periodic rescaling of parameters because it maintains no frequency counts.
– Unlike LIRS, ARC does not require potentially unbounded space overhead.
– ARC, LIRS, and 2Q have constant-time implementation complexity, while LFU and LRU-2 have logarithmic implementation complexity.

Cache and History
Assume that c is the cache size in pages. Two constructs are introduced:
1. The DBL(2c) policy, which memorizes 2c pages and manages an imaginary cache of size 2c.
2. A class Π(c) of cache replacement policies.
– DBL(2c) maintains two LRU lists:
  L1, which contains the pages that have been seen recently only once.
  L2, which contains the pages that have been seen recently at least twice.
– A page resides in L1 if it has been requested exactly once since the last time it was removed from L1 ∪ L2, or if it was requested only once and never removed from L1 ∪ L2.
– A page resides in L2 if it has been requested more than once since the last time it was removed from L1 ∪ L2, or if it was requested more than once and was never removed from L1 ∪ L2.

Cache and History
The DBL(2c) policy functions as follows. In summary, if L1 contains exactly c pages, it replaces the LRU page in L1; otherwise, it replaces the LRU page in L2. In detail:
– Initially, the lists are empty: L1 = L2 = ∅.
– If a requested page resides in L1 ∪ L2, it is moved to the MRU position of L2; otherwise, it is moved to the MRU position of L1. In the latter case, room is made first:
  If |L1| = c, the policy removes the LRU member of L1.
  If |L1| < c and |L1| + |L2| = 2c, the policy removes the LRU member of L2.
– The algorithm tries to equalize the sizes of the two lists, and it ensures that the following invariants always hold: 0 ≤ |L1| + |L2| ≤ 2c and 0 ≤ |L1| ≤ c.
– The policies in the class Π(c) track all 2c items that would be present in a cache of size 2c managed by DBL(2c), but keep at most c of them in the actual cache.
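
A minimal sketch of DBL(2c)'s request handling, using OrderedDicts as LRU lists (front = LRU, back = MRU); an illustrative reading of the rules above, not the article's code.

from collections import OrderedDict

def dbl_request(page, L1, L2, c):
    # DBL(2c): an imaginary cache of 2c pages split across two LRU lists.
    if page in L1 or page in L2:
        # Seen again recently: move to the MRU position of L2.
        L1.pop(page, None)
        L2.pop(page, None)
        L2[page] = True
    else:
        # New page: make room first, then insert at the MRU position of L1.
        if len(L1) == c:
            L1.popitem(last=False)   # remove the LRU member of L1
        elif len(L1) + len(L2) == 2 * c:
            L2.popitem(last=False)   # remove the LRU member of L2
        L1[page] = True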

Cache and History
L1 is divided into:
– T1, which contains the top, or most recent, pages in L1.
– B1, which contains the bottom, or least recent, pages in L1.
L2 is likewise partitioned into a top T2 and a bottom B2, subject to the following conditions:
– If |L1| + |L2| < c, then B1 = B2 = ∅.
– If |L1| + |L2| ≥ c, then |T1| + |T2| = c.
– For i = 1, 2, either Ti or Bi is empty, or the LRU page in Ti is more recent than the MRU page in Bi.
– For all traces and at each time, T1 ∪ T2 contains exactly those pages that would be cached under a policy in the class Π(c).
Pages in T1 and T2 reside in the cache directory and in the cache; history pages in B1 and B2 reside only in the cache directory.

Cache and History
Once the cache directory holds 2c pages, L1 and L2 will each contain exactly c pages. ARC leverages the extra history information in B1 and B2 to effect a continual adaptation.

Adaptive Replacement Cache
A fixed replacement cache FRC_p(c), with a tunable parameter p, 0 ≤ p ≤ c, in the class Π(c), attempts to keep in cache the p most recent pages from L1 and the c - p most recent pages from L2. For a requested page x, its replacement rule is:
– If either |T1| > p, or |T1| = p and x ∈ B2, replace the LRU page in T1.
– If either |T1| < p, or |T1| = p and x ∈ B1, replace the LRU page in T2.
Here p represents the current target size for the list T1. ARC behaves like FRC_p(c), except that p changes adaptively:
– A hit in B1 suggests an increase in the size of T1; a hit in B2 suggests an increase in the size of T2. The continual updates of p effect these increases, and the learning rates depend on the relative sizes of B1 and B2.
– ARC attempts to keep T1 and B2 at roughly the same size, and likewise T2 and B1.
– On a hit in B1, p increases by max{|B2|/|B1|, 1} but never exceeds c.

Adaptive Replacement Cache
On a hit in B2, p decreases by max{|B1|/|B2|, 1} but never drops below zero. ARC never stops adapting, so it always responds to workload changes, say from IRM to SDD and vice versa. Because L1 ∪ L2 always contains the c most recently used pages, LRU cannot experience cache hits unbeknownst to ARC, but ARC can and often does experience cache hits unbeknownst to LRU. If a page is not in L1 ∪ L2, the system places it at the MRU position of L1; from there, it makes its way toward the LRU position of L1, and unless it is requested once again prior to being evicted from L1, it never enters L2. ARC is therefore scan-resistant: a long sequence of read-once requests passes through L1 without flushing out possibly important pages in L2.
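
A minimal sketch of the two steps just described, again with OrderedDicts as LRU lists; this is an illustrative reading of the rules above, not the authors' reference implementation.

def adapt(p, c, hit_in_B1, B1, B2):
    # Learning rule: move the target size p of T1 toward the list whose
    # history (B1 or B2) is receiving hits; the step size depends on the
    # relative sizes of B1 and B2.
    if hit_in_B1:
        delta = max(len(B2) / max(len(B1), 1), 1)
        return min(p + delta, c)   # never exceeds c
    delta = max(len(B1) / max(len(B2), 1), 1)
    return max(p - delta, 0)       # never drops below zero

def replace(x_in_B2, p, T1, T2, B1, B2):
    # Eviction: demote the LRU page of T1 or T2 into its history list,
    # following the FRC_p rule with the current adaptive target p.
    if T1 and (len(T1) > p or (x_in_B2 and len(T1) == p)):
        page, _ = T1.popitem(last=False)
        B1[page] = True            # keep it as a history page
    else:
        page, _ = T2.popitem(last=False)
        B2[page] = True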

Experimental Results
The performance of the various algorithms is compared on a variety of traces:
– OLTP contains an hour's worth of references to a CODASYL database.
– P1 through P14 were collected over several months from Windows NT workstations.
– ConCat was obtained by concatenating traces P1 through P14; Merge(P) was obtained by merging them using the time stamps on each request.
– DS1 is a seven-day trace from a commercial database server.
– All of these traces have a page size of 512 bytes.
– SPC1-like is a captured trace of the Storage Performance Council's SPC1-like synthetic benchmark, which contains long sequential scans in addition to random accesses and has a page size of 4 Kbytes.

Experimental Results
– Three traces, S1, S2, and S3, record disk read accesses initiated by a large commercial search engine in response to various Web search requests over several hours. These traces have a page size of 4 Kbytes.
– Merge(S) was obtained by merging S1 through S3 using the time stamps on each request.
– All hit ratios are cold-start and are reported in percentages.

The tunable parameters for FBR and LIRS were set according to their original descriptions, while the tunable parameters for LRU-2, 2Q, and LRFU were selected offline for the best result at each cache size. ARC requires no user-specified parameters. ARC outperforms LRU, LFU, FBR, LIRS, and MQ, and it performs as well as LRU-2, 2Q, LRFU, and MIN with their respective best offline parameter values.

The tunable parameters for MQ were set online, while the tunable parameters for the other algorithms were chosen offline and optimized for each cache size and workload. ARC outperforms LRU and performs competitively against 2Q, LRU-2, LRFU, LIRS, and MQ.

Comparing ARC with LRU for all traces at practically relevant cache sizes shows that, owing to its scan resistance, ARC outperforms LRU, sometimes dramatically. ARC, working online, performs as well as, and sometimes better than, FRC_p with the best offline fixed choice of the parameter p for each trace.

Figure: ARC and LRU hit ratios (in percentages) versus cache size (in pages), on a log-log scale, for traces P6, SPC1-like, and Merge(S).

Conclusion
The results show that the self-tuning, low-overhead, scan-resistant ARC cache replacement policy outperforms LRU. Using adaptation in a cache replacement policy can produce considerable performance improvements in modern caches.