Prefetching with Adaptive Cache Culling for Striped Disk Arrays
Sung Hoon Baek and Kyu Ho Park
shbaek@core.kaist.ac.kr, kpark@ee.kaist.ac.kr
Korea Advanced Institute of Science and Technology (KAIST)
School of Electrical Engineering and Computer Science
2008 USENIX Annual Technical Conference
Introduction: Our Work
Disk Prefetching Schemes
Accurate prediction (high overhead or impractical):
1. Offline prefetching
2. History-based prefetching
3. Application-hint-based prefetching
Sequential prediction:
4. Sequential prefetching: the most widely used, but never beneficial to non-sequential accesses
Our scheme:
- Goal: beneficial to non-sequential reads as well as sequential reads; as practical as sequential prefetching
- Approach: low prefetch cost while sacrificing prediction accuracy; consider both prefetch buffer management and prefetching itself
- For striped disk arrays: RAID-0, RAID-5, RAID-6, SSDs, etc.
Prior Work: Buffer Management for Prefetched Data
Related work: TIP [1,2]
- Deterministic cost estimation makes errors
- Scan overhead: searching for the least-valuable block is O(N)
Adaptive Strip Prefetching (the proposed scheme):
- A practical scheme with low overhead: O(1)
- Inspired by ARC and SARC, which manage cached data
- A more analytical method, for prefetched data
- Specialized for RAID

[1] R. H. Patterson et al., "Informed Prefetching and Caching," ACM SOSP, Dec. 1995
[2] A. Tomkins et al., "Informed Multiprocess Prefetching and Caching," ACM Int'l Conf. on MMCS, June 1997
Prior Work vs. Our Work
Prior work:
- Buffer management for prefetched data (TIP [1]): O(N)
- Adaptive cache management of cached data (ARC [2], SARC): O(1)
Our work:
(1) A new prefetching scheme: non-sequential reads, sequential reads, very practical, for RAID
(2) Prefetch buffer management: a similar goal to TIP and a similar method to ARC/SARC, but a more analytical method for prefetched data; resolves bad cache utilization
(3) An online cost estimator, tightly integrated with (1) and (2)

[1] R. H. Patterson et al., "Informed Prefetching and Caching," ACM SOSP, Dec. 1995
[2] N. Megiddo and D. S. Modha, "ARC: A Self-Tuning, Low Overhead Replacement Cache," USENIX FAST, 2003
RAID Layout
Problem: independency
My Work: Adaptive Strip Prefetching (ASP)
- Strip prefetching: read all blocks of a strip (segment prefetching with segment = strip); drawback: bad cache utilization, unused data pollutes the cache
- Adaptive cache culling: buffer management for prefetched data, driven by a differential feedback
- Online prefetch cost estimation
Strip Prefetching
Non-sequential reads may or may not benefit from SP. However, most non-sequential reads in real workloads also exhibit spatial locality, unlike random reads over a huge workspace, so in many cases SP provides a performance gain. For random reads over a huge workspace, SP is deactivated by the online disk simulator.
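The idea can be sketched as follows. This is a minimal toy illustration, not the kernel implementation; the strip size, the cache structure, and all names are assumptions for the sketch:

```python
# Toy sketch of strip prefetching: on a cache miss, read the entire
# strip containing the requested block, so that later reads with
# spatial locality hit in the cache.
BLOCKS_PER_STRIP = 16  # assumed strip size in blocks

cache = {}  # block number -> data

def read_strip(strip_no):
    """Simulated disk read of one whole strip (one I/O per strip)."""
    base = strip_no * BLOCKS_PER_STRIP
    return {base + i: f"data-{base + i}" for i in range(BLOCKS_PER_STRIP)}

def read_block(block_no):
    if block_no not in cache:          # cache miss: strip prefetching
        cache.update(read_strip(block_no // BLOCKS_PER_STRIP))
    return cache[block_no]

read_block(5)        # miss: prefetches blocks 0..15 of strip 0
assert 12 in cache   # a later non-sequential read within the strip now hits
```

Because the whole strip is fetched in one I/O, a later non-sequential read that lands anywhere in the same strip is served from memory; the cost is the unused blocks that pollute the cache, which the culling mechanism below addresses.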
Best Segment Size for Segment Prefetching?
Experiment: 200 threads perform random reads of a fixed size on three UltraSCSI disks (15k RPM).
[Figure: bandwidth vs. prefetch size (half a strip, one strip, two strips, three strips) for several request-size/strip-size combinations: 128 KiB/128 KiB, 256 KiB/128 KiB, 384 KiB/128 KiB, 256 KiB/256 KiB, 128 KiB/256 KiB. Prefetching one strip performs best.]
My Work: Adaptive Strip Prefetching (ASP)
- Strip prefetching: bad cache utilization, useless data pollutes the cache
- Adaptive cache culling (prefetch buffer management): mitigates the disadvantage of strip prefetching; culls uselessly prefetched data; maximizes the total hit rate (prefetch hit rate + cache hit rate) under a given cache management, via a differential feedback (an automatic mechanism)
  - Prefetch hit: a request to a prefetched block
  - Cache hit: a request to a cached block
- Online prefetching cost estimation
Block States in Adaptive Strip Prefetching
[Figure: block states in the upstream and downstream regions]
Basic Operations of ASP (1/2)
- A new strip cache is added to the upstream
- Culling moves strip caches from the upstream to the downstream and gets free block caches
- N_U: the number of upstream strip caches, a variable
- Block states: empty block, prefetched block, cached block; a strip cache groups the block caches of one strip
Basic Operations of ASP (2/2)
- Cache hit: the request is served from the cache
- Cache miss: triggers strip prefetching
- N_U: the maximum number of upstream strip caches, an adaptively controlled variable
Cache Replacement Policy
- A global LRU list, from the MRU end to the LRU end; a hit promotes a block toward the MRU end
- Without ASP: eviction from the global bottom
- With ASP: culling (prefetch buffer management)
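A toy version of this policy might look like the following. The class and its method names are my own illustration; the real driver operates on strip caches inside the kernel, not on individual Python objects:

```python
from collections import OrderedDict

# Toy global LRU list. Without ASP, the block at the LRU end is simply
# evicted. With ASP, culling instead frees a never-hit prefetched block
# first, keeping cached (already-hit) blocks in memory.
class ToyLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block -> "prefetched" | "cached"

    def touch(self, block):
        """A hit marks the block cached and promotes it to the MRU end."""
        self.blocks[block] = "cached"
        self.blocks.move_to_end(block)

    def insert_prefetched(self, block):
        self.blocks[block] = "prefetched"
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            self.cull()

    def cull(self):
        """Free from the LRU end, preferring uselessly prefetched blocks."""
        for blk, state in self.blocks.items():     # LRU -> MRU order
            if state == "prefetched":
                del self.blocks[blk]
                return
        self.blocks.popitem(last=False)            # fall back: plain LRU

lru = ToyLRU(capacity=3)
for b in (1, 2, 3):
    lru.insert_prefetched(b)
lru.touch(1)                 # block 1 becomes a cached block
lru.insert_prefetched(4)     # culling frees prefetched block 2, not block 1
assert 1 in lru.blocks and 2 not in lru.blocks
```

The design point the sketch shows: culling preserves blocks that have demonstrated reuse (cached blocks) and reclaims space from speculative prefetches that were never referenced.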
N_U vs. Hit Rate
- Prefetch hit: a hit on a prefetched block
- ΔP: partial prefetch hit rate (hit rate on prefetched blocks)
- ΔC: partial cache hit rate (hit rate on cached blocks)
[Figure: hit rate at each list position. Reducing N_U (e.g., from 9 to 7) makes room for additional cached blocks: the cache hit rate rises while the prefetch hit rate falls.]
Total Hit Rate vs. N_U (1/2)
Find the optimal N_U that maximizes the total hit rate.
Feedback control: N_U ← N_U + s × slope
Total Hit Rate vs. N_U (2/2)
- Monotonically increasing function (slope ≥ 0): N_U ← min(N_U + C × slope, N_U_max) forces N_U to its maximum value
- Monotonically decreasing function (slope ≤ 0): N_U ← max(N_U + C × slope, N_U_min) forces N_U toward zero
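The clamped update rule can be written directly. This is a sketch; the gain (s or C on the slides) and the bounds are assumed values:

```python
# Sketch of the clamped feedback update for N_U. In the differential
# feedback the slope comes from (ΔP - α·ΔC); here it is just an input.
NU_MIN, NU_MAX = 0, 64   # assumed bounds on the number of upstream strips
GAIN = 4                 # assumed feedback gain (s or C on the slides)

def update_nu(nu, slope):
    """Move N_U along the slope of the total hit rate, clamped to bounds."""
    return max(NU_MIN, min(nu + GAIN * slope, NU_MAX))

# A persistently positive slope drives N_U to its maximum ...
nu = 32
for _ in range(20):
    nu = update_nu(nu, slope=1)
assert nu == NU_MAX
# ... and a persistently negative slope drives it toward zero.
for _ in range(20):
    nu = update_nu(nu, slope=-1)
assert nu == NU_MIN
```

The two limiting cases match the slide: a monotonically increasing total hit rate saturates N_U at its maximum (all memory to prefetched strips), and a monotonically decreasing one drives it to the minimum (all memory to cached blocks).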
Derivative vs. Marginal Utility
[Figure: the marginal utility (inspired by SARC) is measured over an additional allocation, whereas the derivative is taken at the original upstream bottom]
Differential Feedback
- Culling moves strip caches from the upstream to the downstream
- Upstream bottom (U_b) and global bottom (G_b)
- ΔP: number of prefetch hits in U_b during a time interval
- ΔC: number of cache hits in G_b during a time interval
- Proportional control; further work: PID (proportional-integral-derivative) control
Differential Feedback Diagram
[Block diagram: the workload drives the cache with strip prefetching; ΔP and α·ΔC are differenced, scaled, and accumulated (with a delay and zero-order hold) to produce N_U.]
It maximizes the total hit rate under a given buffer management and resolves the disadvantage of strip prefetching.
Initial Condition
- Initially there is no downstream; the upstream bottom and the global bottom overlap
- N_a ← cache size / strip size; initialize N_U ← N_a
- No feedback until N_U + N_D ≤ N_a; strip prefetching is forced until then
Ghosts
- When a strip cache is evicted from the downstream, it becomes a ghost
- A past cached block is a block that was a cached block before its strip became a ghost
- Culling evicts neither past cached blocks nor cached blocks
[Figure: culling and eviction paths between the upstream, the downstream, and ghosts on a cache miss]
Which Strips Become Ghosts?
Our goal: easy implementation. RAID drivers manage destage caches in terms of stripes, and a stripe cache includes its strip caches.
Example:
1. Stripe 2 has live strip caches for strip 2A and strip 2B
2. Strip 2A is evicted and becomes a ghost
3. Strip 2B is evicted, and then both are completely removed
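The stripe-granularity bookkeeping in the example could be sketched like this. The class and its names are hypothetical, chosen only to mirror the three-step example above:

```python
# Toy sketch of ghost strips at stripe granularity: a stripe cache is
# removed only when all of its strip caches have been evicted; until
# then, evicted strips linger as ghosts.
class StripeCache:
    def __init__(self, strips):
        self.live = set(strips)   # strip caches still holding data
        self.ghosts = set()       # evicted strips remembered as ghosts

    def evict_strip(self, strip):
        self.live.discard(strip)
        self.ghosts.add(strip)
        if not self.live:         # last live strip gone: remove everything
            self.ghosts.clear()
            return "stripe removed"
        return f"{strip} is now a ghost"

stripe2 = StripeCache(["strip2A", "strip2B"])
assert stripe2.evict_strip("strip2A") == "strip2A is now a ghost"
assert stripe2.evict_strip("strip2B") == "stripe removed"
assert not stripe2.ghosts
```

Tying ghost lifetime to the stripe cache that the RAID driver already maintains is what makes the implementation easy: no separate ghost list with its own replacement policy is needed.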
Online Cost Estimation (1/2)
The differential feedback resolves the disadvantage of strip prefetching, but it is not beneficial to random reads: random reads cause rare prefetch hits and cache hits. The online cost estimation investigates which choice is better, strip prefetching or no prefetching, and activates/deactivates strip prefetching accordingly.
Online Cost Estimation (2/2)
Low overhead: O(1) complexity.
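The slides do not give the estimator's formula, so the following only illustrates the kind of O(1)-per-access running comparison such an estimator could keep; the cost model, the constants, and all names are assumptions, not the paper's disk simulator:

```python
# Hypothetical O(1) running cost comparison between strip prefetching
# (SP) and no prefetching. The constants are made up for illustration.
SEEK_MS, STRIP_XFER_MS, BLOCK_XFER_MS = 5.0, 2.0, 0.2  # assumed costs

class CostEstimator:
    def __init__(self):
        self.cost_sp = 0.0      # estimated total time with strip prefetching
        self.cost_nop = 0.0     # estimated total time without prefetching

    def record_access(self, is_prefetch_hit):
        # With SP, a prefetch hit costs nothing extra; a miss pays one
        # seek plus a full strip transfer. Without SP, every access pays
        # a seek plus one block transfer. Constant work per access.
        if not is_prefetch_hit:
            self.cost_sp += SEEK_MS + STRIP_XFER_MS
        self.cost_nop += SEEK_MS + BLOCK_XFER_MS

    def strip_prefetching_wins(self):
        return self.cost_sp <= self.cost_nop

est = CostEstimator()
for hit in [False, True, True, True]:   # spatial locality: SP pays off
    est.record_access(hit)
assert est.strip_prefetching_wins()

est2 = CostEstimator()
for hit in [False, False, False]:       # random reads: deactivate SP
    est2.record_access(hit)
assert not est2.strip_prefetching_wins()
```

Under spatial locality, one strip transfer is amortized over several hits, so SP wins; under purely random reads every access is a miss, the strip transfers are wasted, and the estimator would deactivate SP, which is exactly the decision the slides describe.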
Evaluation
- Implemented as a RAID-5 driver in Linux 2.6.18
- Five SCSI-320 disks (15k RPM, 73 GB); dual Xeon 3.0 GHz, 1 GB of memory
- Combinations: ASP+MSP, ASP+SEQP, MSP+SEQP, ASP+MSP+SEQP
- SEQP: sequential prefetching of Linux; SEQPX: SEQP with X KiB of prefetching size
- SP: strip prefetching; ASP: adaptive strip prefetching
- Measurement: six repetitions, low deviation
PCMark 05
- Over-provisioned memory
- General application usage: Word, WinZip, PowerCrypt, antivirus, Winamp, WMP, Internet, etc.
- Result: up to a 2.2× gain
Dbench
Dbench: a realistic workload, like a file server.
[Figure callouts: gains of 11 times, 30%, and 2.2 times]
Tiobench: Decision Correctness
Random reads yield extremely low cache/prefetch hit rates, so the feedback does not work; instead, the online cost estimator makes the decision: no prefetching.
Maximum Latency & CPU Load
Tiobench (random read): maximum latency; CPU load / throughput.
IOZone: Independency
IOZone benchmark: concurrent sequential reads.
[Figure callouts: schemes including SEQP show independency loss; schemes including MSP show parallelism loss; the best combination avoids both.]
IOZone: Stride/Reverse Read
Stride read and reverse read: schemes including ASP outperform sequential prefetching by up to 40 times.
TPC Benchmark™ H
TPC-H: a business-oriented database server benchmark; DBMS: MySQL; stride reads and non-sequential reads.
Gains of ASP+MSP over SEQP128 across queries: 27%, 134%, 721%, 24%, 41%, 199%, 52%, 27%, 37%, 20%, 73%, 141%.
Real Scenarios
- cscope: C source file indexing of the kernel source (cscope1: excludes object files; cscope2: includes object files)
- glimpse: text file indexing (/usr/share/doc) for cross references
- link: linking kernel object code
Gains: 116%, 10%, 107%, 44%.
Linux Booting
30% gain.
Summary
- Beneficial to non-sequential reads as well as sequential reads: database queries, building search indices, linking, booting, file servers, general application usage
- Prefetch buffer management (differential feedback) resolves the bad cache utilization of strip prefetching
- Online disk cost simulation resolves the bad prefetch cost of strip prefetching
- Practical, low overhead, great performance gains for practical RAID systems
Q&A
Step Response
[Figure: N_U vs. time — the desired N_U, the real N_U produced by the feedback control, a realistic N_U, and the initial N_U]
Backup Slides
Massive Stripe Prefetching (MSP)
- Our prior work, for parallelism
- ASP is good for large numbers of concurrent I/Os, but loses parallelism for small numbers of concurrent I/Os
- MSP resolves the parallelism loss: activated for a small number of concurrent sequential reads, it prefetches multiple stripes to achieve perfect parallelism of the disks
The Prefetching Size and Time of MSP
[Figure: prefetch size vs. the amount of sequential access in a file, for SEQP, MSP, and MSP+SEQP; MSP prefetches are aligned to the stripe size]
Proposed scheme: for parallelism.
The Coefficient α
α is chosen so that the amount of memory in the increased region of the upstream equals the amount of memory in the reduced region of the downstream.
Further Work
- Optimal s? Or dynamically controlling s
- Optimal size of the upstream bottom |U_b|?
- The ideal derivative has great errors and is impractical