
1 Prefetching with Adaptive Cache Culling for Striped Disk Arrays. Sung Hoon Baek (shbaek@core.kaist.ac.kr) and Kyu Ho Park (kpark@ee.kaist.ac.kr), Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering and Computer Science. 2008 USENIX Annual Technical Conference

2 Our Work (Introduction)

3 Disk Prefetching Schemes
 Accurate Prediction (high overhead, impractical)
  1. Offline Prefetching
  2. History-Based Prefetching
  3. Application-Hint-Based Prefetching
 Sequential Prediction
  4. Sequential Prefetching: the most widely used, but never beneficial to non-sequential accesses
 Our Scheme
  Goal: beneficial to non-sequential reads as well as sequential reads; as practical as sequential prefetching
  Approach: low prefetch cost at the expense of prediction accuracy; considers both prefetch buffer management and prefetching
  For striped disk arrays: RAID-0, RAID-5, RAID-6, SSD, etc.

4 Prior Work
 Buffer Management for Prefetched Data
  Related work: TIP [1,2]
   Deterministic cost estimation makes errors
   Scan overhead: searching for the least-valuable block is O(N)
 Adaptive Strip Prefetching: the proposed scheme
  A practical scheme with low overhead: O(1)
  Inspired by ARC and SARC, which manage cached data; a more analytical method, for prefetched data
  Specialized for RAID
[1] R. H. Patterson et al., "Informed Prefetching and Caching," ACM SOSP, Dec. 1995
[2] A. Tomkins et al., "Informed Multi-Process Prefetching and Caching," ACM Int'l Conf. on MMCS, June 1997

5 Prior Works vs. Our Work
 Prior works
  Buffer management for prefetched data (TIP) [1]: O(N) scan overhead
  Adaptive cache management (ARC [2], SARC): for cached data, O(1)
 Our work: three tightly integrated parts
  (1) A new prefetching: non-sequential as well as sequential reads, very practical, for RAID
  (2) Prefetch buffer management: a similar goal to TIP and a similar method to ARC/SARC, but for prefetched data and more analytical; resolves bad cache utilization
  (3) An online cost estimator: O(1)
[1] R. H. Patterson et al., "Informed Prefetching and Caching," ACM SOSP, Dec. 1995
[2] N. Megiddo and D. S. Modha, "ARC: A Self-Tuning, Low Overhead Replacement Cache," USENIX FAST, 2003

6 RAID Layout (problem: independency)
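
The layout slide can be summarized by the usual striping arithmetic. The sketch below assumes a RAID-0-style round-robin layout with illustrative values (4 disks, 16-block strips); neither value comes from the slides.

```python
# Minimal sketch of striped-array address mapping (illustrative values,
# not from the talk): 4 disks, 16 blocks per strip.
STRIP_BLOCKS = 16
NUM_DISKS = 4

def locate(block):
    """Map a logical block number to (disk, strip_on_disk, offset)."""
    strip = block // STRIP_BLOCKS          # global strip index
    offset = block % STRIP_BLOCKS          # position inside the strip
    disk = strip % NUM_DISKS               # strips rotate across disks
    strip_on_disk = strip // NUM_DISKS     # stripe (row) index on that disk
    return disk, strip_on_disk, offset
```

With these parameters, blocks 0-15 land on disk 0 and blocks 16-31 on disk 1, which is the independency the deck refers to: unrelated requests can land on different disks.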

7 My Work: Adaptive Strip Prefetching (ASP)
 Strip Prefetching
  Reads all blocks of a strip (segment prefetching with segment = strip)
  Drawback: bad cache utilization; unused data pollutes the cache
 Adaptive Cache Culling
  Buffer management for prefetched data
  Differential feedback
 Online Prefetch Cost Estimation

8 Strip Prefetching
Non-sequential reads may or may not benefit from SP. However, most non-sequential reads in real workloads exhibit spatial locality, unlike random reads over a huge workspace, so in many cases SP provides a performance gain. For random reads over a huge workspace, SP is deactivated by the online disk simulator.
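
The core operation is simple: on a read miss, fetch the whole enclosing strip with one disk access instead of one access per block. A minimal sketch, with an illustrative strip size and a caller-supplied `read_block` function (both assumptions, not from the slides):

```python
STRIP_BLOCKS = 8   # blocks per strip (illustrative)

def strip_prefetch(block, read_block, cache):
    """On a read miss, fetch every block of the strip containing `block`
    with one sequential disk access, filling the cache with the whole strip."""
    base = (block // STRIP_BLOCKS) * STRIP_BLOCKS
    for b in range(base, base + STRIP_BLOCKS):
        if b not in cache:
            cache[b] = read_block(b)
    return cache[block]
```

If the workload has spatial locality, later requests hit the prefetched neighbors; if not, the culling and cost estimation described on the following slides limit the damage.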

9 Best Segment Size for Segment Prefetching?
 Setup: 200 threads perform random reads of a fixed request size; three UltraSCSI disks (15k RPM)
 [Chart: bandwidth vs. prefetch size for segment sizes of half, one, two, and three strips, at request size / strip size combinations of 128 KiB/128 KiB, 256 KiB/128 KiB, 384 KiB/128 KiB, 256 KiB/256 KiB, and 128 KiB/256 KiB]

10 My Work: Adaptive Strip Prefetching (ASP)
 Strip Prefetching
  Drawback: bad cache utilization; useless data pollutes the cache
 Adaptive Cache Culling (prefetch buffer management)
  Mitigates the disadvantage of strip prefetching
  Culls uselessly prefetched data
  Maximizes total hit rate = prefetch hit rate + cache hit rate, under a given cache management
   Prefetch hit: a request on a prefetched block
   Cache hit: a request on a cached block
  Driven by a differential feedback (an automatic mechanism)
 Online Prefetch Cost Estimation

11 Block States in Adaptive Strip Prefetching
 [Diagram: upstream and downstream block states]

12 Basic Operations of ASP (1/2)
 Adding a new strip cache to the upstream (N_U: the number of upstream strip caches, a variable)
 Culling from the upstream into the downstream to get free block caches
 [Diagram legend: empty block, prefetched block, cached block, strip cache]

13 Basic Operations of ASP (2/2)
 Upstream (N_U: the maximum number of upstream strip caches, an adaptively controlled variable)
 [Diagram: cache-hit and cache-miss paths; a cache miss triggers strip prefetching]
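
The two operations above can be sketched as follows, assuming the upstream and downstream are ordered lists of strip caches and that culling keeps only the blocks that were actually requested (class and function names are illustrative):

```python
from collections import deque

class StripCache:
    """One strip's resident blocks, tracking which prefetched blocks were hit."""
    def __init__(self, sid, blocks):
        self.sid = sid
        self.blocks = set(blocks)   # resident (prefetched) blocks
        self.hits = set()           # blocks actually requested so far

def cull(upstream, downstream, n_u):
    """While the upstream holds more than N_U strip caches, demote the
    oldest one: free its never-referenced prefetched blocks (getting free
    block caches) and keep only the hit blocks in the downstream."""
    freed = 0
    while len(upstream) > n_u:
        sc = upstream.popleft()
        freed += len(sc.blocks - sc.hits)
        sc.blocks &= sc.hits        # drop uselessly prefetched blocks
        downstream.append(sc)
    return freed
```

This is the culling that replaces plain LRU eviction for prefetched data: useless blocks are discarded early instead of occupying the cache until they age out.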

14 Cache Replacement Policy
 A global LRU list, from MRU to LRU; a hit promotes a block to the MRU end
 Prefetch buffer management: eviction from the global bottom (without ASP) vs. culling (with ASP)

15 N_U vs. Hit Rate
 Prefetch hit: a hit on a prefetched block
 ΔP: partial prefetch hit rate (hit rate on prefetched blocks)
 ΔC: partial cache hit rate (hit rate on cached blocks)
 [Chart: hit rate per list position for N_U = 9 vs. N_U = 7; a smaller upstream leaves room for additional cached blocks (additional cache hit rate) but reduces the prefetch hit rate]

16 Total Hit Rate vs. N_U (1/2)
 Find the optimal N_U that maximizes the total hit rate
 Feedback control: N_U ← N_U + s × slope

17 Total Hit Rate vs. N_U (2/2)
 Monotonically increasing function: slope ≥ 0
  N_U ← min(N_U + s × slope, N_U_max) forces N_U to the maximum value
 Monotonically decreasing function: slope ≤ 0
  N_U ← max(N_U + s × slope, N_U_min) forces N_U toward zero
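
The clamped update rule on this slide is one line of arithmetic. A sketch, where the gain `s` and the bounds are illustrative defaults rather than values from the talk:

```python
def update_nu(n_u, slope, s=1.0, n_u_min=0, n_u_max=1024):
    """One feedback step: N_U <- N_U + s*slope, clamped to
    [N_U_min, N_U_max]. For a monotonically increasing hit-rate curve
    (slope >= 0) repeated steps drive N_U to the maximum; for a
    monotonically decreasing one (slope <= 0) they drive it to the
    minimum; otherwise N_U settles near the maximum of the curve."""
    return max(n_u_min, min(n_u_max, n_u + s * slope))
```

The clamping is what makes the two monotone cases behave as the slide describes: the update cannot overshoot the legal range of N_U.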

18 Derivative vs. Marginal Utility
 Marginal utility (inspired by SARC) vs. the derivative
 [Diagram: an additional allocation relative to the original upstream bottom]

19 Differential Feedback
 Culling moves strip caches from the upstream to the downstream
 Upstream bottom (U_b) and global bottom (G_b)
  ΔP: the number of prefetch hits in U_b during a time interval
  ΔC: the number of cache hits in G_b during a time interval
 Proportional control; further work: PID (proportional-integral-derivative) control
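
Combining the two bottom counters into a proportional controller can be sketched as below. Reading the diagram on the next slide as summing ΔP against -α·ΔC is my interpretation; `alpha`, `s`, and the bound are illustrative gains, not the paper's tuned values:

```python
def feedback_step(n_u, prefetch_hits_in_Ub, cache_hits_in_Gb,
                  alpha=1.0, s=1.0, n_u_max=1024):
    """One interval of differential feedback: dP counts prefetch hits in
    the upstream bottom U_b, dC counts cache hits in the global bottom
    G_b. Their weighted difference estimates the slope of total hit rate
    with respect to N_U, and a proportional step moves N_U that way."""
    dP, dC = len(prefetch_hits_in_Ub), len(cache_hits_in_Gb)
    slope = dP - alpha * dC     # more U_b hits: grow upstream; more G_b hits: shrink it
    return max(0, min(n_u_max, n_u + s * slope))
```

Because only hits in the two bottom regions are counted, each interval costs O(1) bookkeeping per hit, unlike TIP's O(N) scan.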

20 Differential Feedback Diagram
 [Block diagram: the workload drives the cache with strip prefetching; ΔP and -α·ΔC are summed, scaled by s, and accumulated (with a delay and a zero-order hold) into N_U]
 It maximizes the total hit rate under a given buffer management and resolves the disadvantage of strip prefetching.

21 Initial Condition
 N_a ← cache size / strip size; initially N_U ← N_a
 No downstream at the start: the upstream bottom and the global bottom overlap
 No feedback, and strip prefetching is forced, while N_U + N_D ≤ N_a
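
The startup rule is small enough to sketch directly. I read the slide's condition as: feedback stays off while N_U + N_D ≤ N_a, i.e. until the two streams together fill the cache (that reading, and the dict-based state, are my assumptions):

```python
def initial_state(cache_size, strip_size):
    """Initial condition: the whole cache starts as upstream, so N_U
    begins at N_a = cache size / strip size, there is no downstream,
    and the upstream bottom coincides with the global bottom."""
    n_a = cache_size // strip_size
    return {"n_a": n_a, "n_u": n_a, "feedback_on": False}

def maybe_enable_feedback(state, n_u, n_d):
    """Strip prefetching is forced and feedback is off until the
    upstream plus downstream exceed the cache capacity N_a."""
    if n_u + n_d > state["n_a"]:
        state["feedback_on"] = True
    return state["feedback_on"]
```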

22 Ghosts
 A past cached block was a cached block before it became a ghost
 Culling does not evict past cached blocks or cached blocks
 [Diagram: culling moves strips from the upstream to the downstream; eviction from the downstream leaves ghosts; a cache miss path is shown]

23 Which Strips Become Ghosts?
 Our goal: easy implementation
 RAID drivers manage destage caches in terms of the stripe; a stripe cache includes its strip caches
 Example:
  1. Stripe 2 has live strip caches for strip 2A and strip 2B
  2. Strip 2A is evicted, so it becomes a ghost
  3. Strip 2B is evicted, so both are completely removed
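
The three-step example above can be sketched as a small state machine over one stripe cache (class shape and method names are illustrative; the strip names follow the slide's example):

```python
class StripeCache:
    """A stripe cache owns its strip caches. An evicted strip lingers as
    a ghost until every strip of the stripe has been evicted, at which
    point the whole stripe cache (ghosts included) is removed."""
    def __init__(self, strips):
        self.live = set(strips)
        self.ghosts = set()

    def evict(self, strip):
        """Evict one strip; return True when the stripe is fully removed."""
        self.live.discard(strip)
        self.ghosts.add(strip)
        if not self.live:           # last live strip evicted
            self.ghosts.clear()     # the ghosts go with the stripe
            return True
        return False
```

Tying ghost lifetime to the stripe keeps the bookkeeping inside the structure a RAID driver already maintains for destaging, which is the "easy implementation" goal.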

24 Online Cost Estimation (1/2)
 The differential feedback resolves the disadvantage of strip prefetching, but SP is still not beneficial to random reads
  Random reads cause rare prefetch hits and cache hits
 The online cost estimation
  Investigates which choice is better: strip prefetching or no prefetching
  Activates/deactivates strip prefetching accordingly

25 Online Cost Estimation (2/2)
 Low overhead: O(1) complexity
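
The slides give no formula, so the following is a purely hypothetical estimator in the spirit of the O(1) claim (the paper's actual mechanism is an online disk simulator, per slide 8): keep running cost totals for "read whole strips" versus "read each requested block" and enable SP only while it is the cheaper choice. The cost model and all names here are illustrative.

```python
class CostEstimator:
    """Hypothetical constant-time SP-vs-no-prefetching comparison.
    Each update is O(1) arithmetic on two counters."""
    def __init__(self, t_strip, t_block):
        self.t_strip = t_strip          # cost of one whole-strip read
        self.t_block = t_block          # cost of one single-block read
        self.strips_read = 0
        self.blocks_requested = 0

    def record(self, strip_read):
        """Account one block request; strip_read says whether it
        triggered a strip fetch."""
        self.blocks_requested += 1
        if strip_read:
            self.strips_read += 1

    def sp_beneficial(self):
        # SP wins while fetching strips has cost no more than fetching
        # every requested block individually would have.
        return self.strips_read * self.t_strip <= self.blocks_requested * self.t_block
```

With spatial locality one strip fetch serves several requests, so SP stays on; under pure random reads almost every request triggers a strip fetch, the inequality flips, and SP is deactivated.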

26 Evaluation
 Implemented a RAID-5 driver in Linux 2.6.18
 Five SCSI-320 disks (15k RPM, 73 GB); dual Xeon 3.0 GHz; 1 GB of memory
 Combinations: ASP+MSP, ASP+SEQP, MSP+SEQP, ASP+MSP+SEQP
  SEQP: the sequential prefetching of Linux; SEQPX: SEQP with X KiB of prefetch size
  SP: strip prefetching; ASP: adaptive strip prefetching
 Measurement: six repetitions, low deviation

27 PCMark05 (over-provisioned memory)
 General application usage: Word, WinZip, PowerCrypt, antivirus, Winamp, WMP, Internet, etc.
 [Chart: up to a 2.2x gain]

28 Dbench
 Dbench: a realistic workload resembling a file server
 [Chart: gains of 11x, 2.2x, and 30%]

29 Tiobench: Decision Correctness
 Random reads: extremely low cache/prefetch hit rates, so the feedback does not work
 The online cost estimator makes the decision: no prefetching

30 Maximum Latency & CPU Load
 Tiobench (random read)
 [Charts: maximum latency; CPU load / throughput]

31 IOZone: Independency
 IOZone benchmark: concurrent sequential reads
 [Chart: schemes including SEQP show independency loss, schemes including MSP show parallelism loss; the best scheme avoids both]

32 IOZone: Stride/Reverse Read
 Stride read and reverse read
 [Charts: ASP-included schemes vs. sequential prefetching; up to a 40x gain]

33 TPC Benchmark H
 TPC-H: a business-oriented database server benchmark (DBMS: MySQL)
 Stride reads and non-sequential reads
 [Chart: per-query gains of ASP+MSP over SEQP128: 27%, 134%, 721%, 24%, 41%, 199%, 52%, 27%, 37%, 20%, 73%, 141%]

34 Real Scenarios
 cscope: C source-file indexing of the kernel source
  cscope1: excludes object files; cscope2: includes object files
 glimpse: text-file indexing (/usr/share/doc) for cross-referencing
 link: linking kernel object code
 [Chart: gains of 116%, 10%, 107%, and 44%]

35 Linux Booting
 [Chart: a 30% gain]

36 Summary
 Beneficial to non-sequential reads as well as sequential reads
  Database queries, building search indices, linking, booting, file server, general application usage
 Prefetch buffer management (differential feedback): resolves the bad cache utilization of strip prefetching
 Online disk cost simulation: resolves the bad prefetch cost of strip prefetching
 Practical, low overhead, and a great performance gain for practical RAID systems

37 Q&A

38 Step Response
 [Chart: N_U over time; the desired N_U, a realistic N_U, the initial N_U, and the real N_U produced by the feedback control]

39 Backup Slides

40 Massive Stripe Prefetching (MSP): Our Prior Work, for Parallelism
 ASP is good for large numbers of concurrent I/Os, but loses parallelism for small numbers of concurrent I/Os
 MSP resolves the parallelism loss
  Activated for a small number of concurrent sequential reads
  Prefetches multiple stripes, achieving perfect parallelism of the disks
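
The key property of MSP's prefetches is that they are whole stripes, aligned to stripe boundaries, so every disk serves part of one request in parallel. A sketch with illustrative values (64-block stripes, 2 stripes per prefetch; neither number is from the slides):

```python
STRIPE_BLOCKS = 64   # blocks per stripe (illustrative: e.g. 4 disks x 16-block strips)

def msp_prefetch_range(block, nstripes=2):
    """Return the [start, end) block range MSP would prefetch: whole
    stripes, aligned to the stripe boundary containing `block`, so the
    read fans out across all disks in the array."""
    base = (block // STRIPE_BLOCKS) * STRIPE_BLOCKS
    return base, base + nstripes * STRIPE_BLOCKS
```

Alignment matters: an unaligned multi-stripe read would load some disks with two strips and others with one, losing the perfect parallelism the slide claims.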

41 The Prefetch Size and Timing of MSP: Proposed Scheme, for Parallelism
 [Chart: prefetch size vs. the amount of sequential accesses in a file, for SEQP, MSP, and MSP+SEQP; MSP prefetches are aligned to the stripe size]

42 The Coefficient α
 α is chosen so that the amount of memory in the increased region of the upstream equals the amount of memory in the reduced region of the downstream

43 Further Work
 Optimal s, or dynamically controlling s
 Optimal size of the upstream bottom, |U_b|
 The ideal derivative has great errors and is impractical

