Presentation on theme: "07/05/2005 The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms by Ali R. Butt, Chris Gniady, and Y. Charlie Hu, SIGMETRICS'05." Presentation transcript:

The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms, by Ali R. Butt, Chris Gniady, and Y. Charlie Hu, SIGMETRICS'05
Course: CSCI 780 – Advanced Topics on Caching Techniques in Computer and Distributed Systems
Presenter: Chuan Yue

Outline
– The Buffer Cache
– Linux Kernel Prefetching
– Adapted Buffer Cache Replacement Algorithms
– Simulation Results
– Conclusions
– Discussions

Buffer Cache in Main Memory
Two kinds of I/O operations:
– Direct-access read()/write() calls use the block-based buffer cache
– Memory-mapped I/O shares the page cache with the virtual memory system
This naturally leads to two separate caches.
Problems:
– Double buffering
– Inconsistencies
[Figure: I/O paths to disk: read()/write() I/O via the buffer cache; virtual memory and memory-mapped I/O via the page cache]

Unification of Buffer Cache and Page Cache
A unified buffer cache uses the same page cache to store:
– Virtual memory pages
– Memory-mapped pages
– Ordinary file system I/O
Issues:
– Complex interactions between the file system and the VM system
[Figure: read()/write() I/O, virtual memory, and memory-mapped I/O all reach the disk through the unified buffer cache]

Buffer Cache Management
Designing effective buffer cache replacement algorithms is a fundamental challenge in improving system performance, for both:
– The traditional file I/O system
– The virtual memory system
Various buffer cache replacement algorithms:
– LRU replacement is widely used
– LRU is unable to cope with access patterns that have weak locality
– Other well-known algorithms that use recency information: LRU-2, 2Q, LIRS, LRFU, MQ, ARC

Prefetching
Prefetching is another highly effective technique for improving I/O performance.
The main motivation for prefetching is to overlap computation with I/O and thus reduce the exposed I/O latency.
Various prefetching techniques:
– Prefetching using user-inserted hints about I/O access patterns. Drawback: it places a burden on the programmer.
– Kernel-driven file system prefetching in modern operating systems: synchronous read-ahead to amortize seek cost, and asynchronous prefetching after sequential access patterns are detected.

The Impact of Kernel Prefetching on the Performance of Buffer Cache Replacement Algorithms
Caching and prefetching interact closely:
– Prefetching file blocks into the cache can be harmful (P. Cao et al., 1995)
– Both the replacement policy and prefetching affect the buffer cache hit ratio
– Hit ratio, prefetching, and I/O clustering affect disk traffic
– Disk traffic affects file system performance
Almost all proposed buffer cache replacement algorithms did not take kernel-driven prefetching into account.
The work in this paper:
– Shows the potential performance impact of kernel prefetching on buffer cache replacement algorithms
– Presents simulation results for 8 adapted replacement algorithms

[Figure: kernel components on the path from file system operations to the disk]

Kernel Prefetching in Linux
Prefetching is based on the pattern of accesses to the file:
– Prefetching is considered only for read accesses
– It is beneficial for sequential accesses to a file
Read-ahead group and read-ahead window
Synchronous prefetching and asynchronous prefetching
[Figure: blocks 1 to 10 of a file, showing successive read-ahead groups and read-ahead windows]

Belady's algorithm can be non-optimal given kernel prefetching
Access sequence: a c e g i k m o a b c d e f g h i j k l m n o p
Without prefetching: Belady's algorithm incurs 16 cache misses; LRU incurs 23 cache misses
With prefetching: Belady's algorithm incurs 8 cache misses; LRU incurs 6 cache misses

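The two miss counts without prefetching can be reproduced with a short simulation. The following is a minimal sketch, not the authors' simulator. The cache size of 8 blocks is an assumption: the slide does not state it, but 8 is the size at which both counts come out as reported. The function names `lru_misses`, `belady_misses`, and `next_use` are made up for illustration. The prefetching case is not modeled here because it depends on the kernel's read-ahead behavior.

```python
from collections import OrderedDict

def lru_misses(trace, cache_size):
    """Count misses under plain LRU (no prefetching)."""
    cache = OrderedDict()              # least recently used first
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)   # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)   # evict the LRU block
            cache[block] = True
    return misses

def next_use(trace, start, block):
    """Index of the next reference to block at or after start, or infinity."""
    for j in range(start, len(trace)):
        if trace[j] == block:
            return j
    return float("inf")

def belady_misses(trace, cache_size):
    """Count misses under Belady's off-line algorithm (no prefetching)."""
    cache = set()
    misses = 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            # Evict the cached block whose next reference is furthest away.
            victim = max(cache, key=lambda b: next_use(trace, i + 1, b))
            cache.remove(victim)
        cache.add(block)
    return misses

# Access sequence from the slide; CACHE_SIZE = 8 is an assumption.
trace = list("acegikmo") + list("abcdefghijklmnop")
CACHE_SIZE = 8

print(lru_misses(trace, CACHE_SIZE))      # 23
print(belady_misses(trace, CACHE_SIZE))   # 16
```

Under LRU, only the second reference to block a hits: every later block is evicted just before it is needed again. Belady's algorithm instead keeps the eight blocks that will be reused, so only the eight new blocks b, d, f, ..., p miss in the second pass.
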
Prefetching has been ignored in algorithm design
Caching algorithms have been proposed and studied without considering prefetching:
– OPT
– LRU
– LRU-K [SIGMOD 1993]
– 2Q [VLDB 1994]
– LRFU [TC 2001]
– MQ [USENIX 2001]
– LIRS [SIGMETRICS 2002]
– ARC [FAST 2003]
The changes to OPT, LRU, 2Q, and LIRS are explained on the following slides.

OPT
OPT is based on Belady's cache replacement algorithm.
– It is off-line and has knowledge of future references
In the presence of Linux kernel prefetching:
– Prefetched blocks are treated as the most recently accessed blocks and are inserted into the cache according to the original OPT algorithm
– However, OPT is additionally given the ability to identify wrong prefetches immediately, i.e., prefetched blocks that will not be accessed on demand at all, or that will be accessed further in the future than all other blocks in the cache
– Wrongly prefetched blocks become immediate candidates for removal

LRU
LRU is the most widely used replacement policy.
In the presence of kernel prefetching, the adapted LRU works as follows:
– On each access, the kernel determines the number of blocks that need to be prefetched
– Prefetched blocks are inserted at the MRU position, just like regular blocks

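The adapted LRU described above can be sketched as follows. This is an illustrative sketch only: the class name `PrefetchingLRU` and its interface are hypothetical, and the caller supplies the list of blocks to prefetch, standing in for the kernel's read-ahead decision, which is not modeled. Skipping prefetches of blocks that are already cached is also an assumption.

```python
from collections import OrderedDict

class PrefetchingLRU:
    """LRU cache in which prefetched blocks enter at the MRU position."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # LRU block first, MRU block last
        self.hits = 0
        self.misses = 0

    def _insert_mru(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # evict the LRU block
        self.blocks[block] = True

    def access(self, block, prefetch=()):
        """On-demand access to block, followed by the given prefetches."""
        if block in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
        self._insert_mru(block)
        for p in prefetch:
            # Assumption: a block that is already cached is not prefetched again.
            if p not in self.blocks:
                self._insert_mru(p)

cache = PrefetchingLRU(capacity=4)
cache.access(1, prefetch=[2, 3])     # miss; blocks 2 and 3 are prefetched
cache.access(2)                      # hit on a prefetched block
cache.access(3)                      # hit on a prefetched block
cache.access(10, prefetch=[11, 12])  # miss; the prefetches push out blocks 1 and 2
print(cache.hits, cache.misses)      # 2 2
print(list(cache.blocks))            # [3, 10, 11, 12]
```

The last access shows the downside the slides discuss: because prefetched blocks enter at the MRU position, they can evict blocks that were accessed on demand.
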
2Q
Three buffers and the algorithm:
– A1in queue: all missed blocks are initially placed here
– A1out queue: when blocks are replaced from the A1in queue in FIFO order, their addresses are temporarily placed here
– Am queue: when a block is re-referenced and its address is in the A1out queue, it is promoted to the Am queue
[Figure: worked example of 2Q on a short block access sequence, showing the contents of the Am, A1in, and A1out (address-only) queues]

2Q – With Adaptation (in the presence of kernel prefetching)
Prefetched blocks are treated as on-demand blocks:
– A prefetched block is initially placed into the A1in queue
– On the subsequent on-demand access, the block stays in the A1in queue
– If the prefetched block is evicted from the A1in queue before any on-demand access, it is simply discarded rather than being moved into the A1out queue
– If a block whose address is currently in the A1out queue is prefetched, it is promoted into the Am queue as if it had been accessed on demand
[Figure: worked example of the adapted 2Q on a short sequence of demand and prefetch blocks, showing the contents of the Am, A1in, and A1out (address-only) queues]

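The four rules above can be sketched in code. This is a simplified illustration rather than the paper's implementation: the class name `Adapted2Q`, its methods, and the use of three fixed queue sizes are assumptions made to keep the example short. Treating a prefetch of a block already in Am or A1in as a no-op is also an assumption.

```python
from collections import OrderedDict

class Adapted2Q:
    """Simplified 2Q with the prefetching rules from the slide."""

    def __init__(self, a1in_size, a1out_size, am_size):
        self.a1in_size = a1in_size
        self.a1out_size = a1out_size
        self.am_size = am_size
        self.a1in = OrderedDict()    # FIFO; value is True once accessed on demand
        self.a1out = OrderedDict()   # FIFO of addresses only
        self.am = OrderedDict()      # LRU order, MRU block last

    def _promote_to_am(self, block):
        self.a1out.pop(block, None)
        if len(self.am) >= self.am_size:
            self.am.popitem(last=False)
        self.am[block] = True

    def _insert_a1in(self, block, on_demand):
        if len(self.a1in) >= self.a1in_size:
            victim, was_demanded = self.a1in.popitem(last=False)
            if was_demanded:
                # Only blocks that saw an on-demand access leave an address behind.
                if len(self.a1out) >= self.a1out_size:
                    self.a1out.popitem(last=False)
                self.a1out[victim] = True
            # A prefetched block that was never demanded is simply discarded.
        self.a1in[block] = on_demand

    def _touch(self, block, on_demand):
        """Apply one reference; return True if the block was resident."""
        if block in self.am:
            if on_demand:
                self.am.move_to_end(block)
            return True
        if block in self.a1in:
            if on_demand:
                self.a1in[block] = True   # stays in place in the FIFO
            return True
        if block in self.a1out:
            self._promote_to_am(block)    # same for demand and prefetch
        else:
            self._insert_a1in(block, on_demand)
        return False

    def access(self, block):
        """On-demand access; returns True on a cache hit."""
        return self._touch(block, on_demand=True)

    def prefetch(self, block):
        self._touch(block, on_demand=False)

q = Adapted2Q(a1in_size=2, a1out_size=4, am_size=2)
q.access(10)       # miss, enters A1in
q.prefetch(11)     # enters A1in, marked as not yet demanded
q.access(12)       # evicts 10 from A1in; its address goes to A1out
q.access(13)       # evicts 11 from A1in; never demanded, so discarded
q.prefetch(10)     # address is in A1out, so 10 is promoted to Am
print(list(q.a1in), list(q.a1out), list(q.am))   # [12, 13] [] [10]
```

Block 11 shows the third rule: it leaves no address in A1out, so a later reference to it would be treated as a first-time miss.
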
LIRS
LIRS dynamically and responsively maintains the LIR block set and the HIR block set, and keeps the LIR block set in the cache.
In the presence of kernel prefetching, the adapted LIRS works as follows:
– Prefetched blocks are not inserted into the LIRS stack S; they are inserted only into the HIR stack Q
– If a prefetched block did not have an existing entry in the LIRS stack S, the first on-demand access to the block causes it to be inserted at the top of the LIRS stack S as an HIR block
– If a prefetched block already exists in the LIRS stack S, the first on-demand access to the block is treated as a LIR block access

Performance Evaluation
Trace collection:
– Interception of I/O system calls (using a modified Linux strace utility)
– Collected I/O access type, time, file identifier (inode), and I/O size
Timing-accurate trace simulator:
– Detailed implementation of kernel prefetching and clustering
– Interface with the DiskSim simulator to simulate I/O time
– Implementation of OPT, LRU, LRU-2, LRFU, LIRS, MQ, 2Q, ARC
Metrics:
– Hit ratio
– Aggregated synchronous and asynchronous disk I/O requests
– Actual running time

Hit ratio results for cscope
– Kernel prefetching has a significant impact on the hit ratio
– The improvement differs across algorithms
– Prefetching can result in significant changes in the relative performance of replacement algorithms

Disk requests results for cscope
– The clustering of I/O requests in the presence of prefetching results in a significant reduction in the number of disk requests
– The effect is complex and closely tied to the file access patterns

Execution time results for cscope
– A reduction in the number of disk requests due to kernel prefetching does not necessarily translate into a reduction in execution time

Results for the other three sequential access applications
Glimpse:
– It also benefits from prefetching
– The changes in the relative behavior of different algorithms observed in cscope with prefetching are also observed in glimpse
Viewperf:
– It benefits the most from prefetching
– The behavior of different cache replacement algorithms is similar to that observed in cscope
Gcc:
– Many accesses are to small files, so there is little opportunity for prefetching
– All three performance metrics are almost identical with and without prefetching

Hit ratio results for tpc-h
– Prefetching provides little improvement in the hit ratio for a random access pattern

Disk requests results for tpc-h
– Most of the prefetched blocks are not accessed, and as a result the number of disk requests is doubled

Execution time results for tpc-h
– The significant increase in the number of I/Os translates into a significant increase in the execution time

Results for concurrent applications
Multi1: cscope, gcc
– Results are similar to those of cscope
Multi2: cscope, gcc, viewperf
– Results are similar to those of Multi1; however, prefetching does not improve the execution time because viewperf is CPU-bound
Multi3: glimpse, TPC-H
– Results are similar to those of tpc-h

Number and size of synchronous and asynchronous disk I/Os in cscope at 128 MB cache size
– The total number of disk requests with prefetching is at least 30% lower than without prefetching for all schemes except OPT
– Most of the reduction in disk requests comes from issuing asynchronous disk requests, which can be overlapped with CPU time

Conclusions
In this research work, the authors:
– Proposed prefetching implementations for different replacement algorithms
– Built a timing simulator to evaluate relative performance
The paper shows that:
– Prefetching impacts hit ratio, disk requests, and execution time
– Comparing hit ratios alone is insufficient
– Kernel prefetching can narrow the performance gap between different replacement algorithms
– Kernel prefetching can also change the relative performance benefits of different replacement algorithms
Future buffer caching research should:
– Take prefetching and I/O clustering into consideration
– Simulate execution time

Discussions (1)
Good points:
– No new algorithm is proposed, but the paper is the first to simulate and compare the impact of kernel prefetching on well-known buffer cache replacement algorithms
– The results are not very surprising, since the general outcome for sequential and random workloads can be guessed, but this paper is the first to report them
Bad points:
– The simulation is based only on I/O traces. It would be better if results based on VM traces were also presented.
– The simulation results for concurrent applications are not analyzed in detail (in the paper itself).
– It would be better if the unification of buffer cache and page cache in many OSes were considered, and if the competition between process page accesses and file cache page accesses were simulated and analyzed.

Discussions (2)
Some questions:
– Regarding Belady's anomaly:
  In the LIRS paper: Belady's anomaly appears in 2Q and ARC for the glimpse workload.
  In this paper: without prefetching, the simulation results did not show Belady's anomaly; with prefetching, Belady's anomaly appears in ARC for the glimpse workload.
  Why the differences? LRU has no Belady's anomaly. What about the other algorithms?
– Regarding simulations:
  Is there any relationship between the cache sizes selected in simulation and the real environment where the trace was collected?
  Is performance under thrashing conditions still worth simulating?

References
– P. Cao et al., "A Study of Integrated Prefetching and Caching Strategies," ACM SIGMETRICS, 1995
– S. Jiang and X. Zhang, "Making LRU Friendly to Weak Locality Workloads: A Novel Replacement Algorithm to Improve Buffer Cache Performance," IEEE Transactions on Computers, Vol. 54, No. 9, September 2005
– S. Jiang, F. Chen, and X. Zhang, "CLOCK-Pro: An Effective Improvement of the CLOCK Replacement," Proceedings of the 2005 USENIX Annual Technical Conference (USENIX'05)
– Rik van Riel, "Page Replacement in Linux 2.4 Memory Management," Proc. of the 2001 USENIX Technical Conference, FREENIX track
– Rik van Riel, "Towards an O(1) VM: Making Linux virtual memory management scale towards large amounts of physical memory," Proceedings of the Linux Symposium, July 2003
– Journal File Systems in Linux, accessed June 21, 2005 (http://bulma.net/impresion.phtml?nIdNoticia=1154)
– The Buffer Cache, accessed June 21, 2005 (http://www.faqs.org/docs/linux_admin/buffer-cache.html)
– Chris Gniady et al. (Purdue University), "The Performance Impact of Kernel Prefetching on Buffer Cache Replacement," ACM SIGMETRICS 2005 presentation slides
– More on File System, lecture notes, accessed June 22, 2005 (http://www.cs.rochester.edu/~kshen/csc256-spring2005/lectures/lecture16-file2.pdf)