
1 Distributed Systems CS 15-440
Caching – Part IV
Lecture 17, November 15, 2017
Mohammad Hammoud

2 Today…
Last Lecture: Cache Consistency
Today's Lecture: Replacement Policies
Announcements:
Project 4 is out. It is due on November 27
The deadline for PS5 is extended to November 18 by midnight
Quiz II is on November 16 during the recitation time

3 Key Questions
What data should be cached and when? Fetch Policy
How can updates be made visible everywhere? Consistency or Update Propagation Policy
What data should be evicted to free up space? Cache Replacement Policy

6 Working Sets
Given a time interval T, WorkingSet(T) is defined as the set of distinct data objects accessed during T
It is a function of the width of T
Its size (referred to as the working set size) is all that matters
It captures the adequacy of the cache size with respect to the program's behavior
What happens if a client process performs repetitive accesses to some data, with a working set size that is larger than the underlying cache?
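Before answering that question, a minimal Python sketch makes the definition concrete (the helper name working_set_sizes is ours for illustration; it slides a window of width T over a reference trace and reports the number of distinct objects in each window):

```python
from collections import Counter

def working_set_sizes(trace, T):
    """Working set size at each position of a sliding window of width T
    over a reference trace (a list of object identifiers)."""
    window = Counter(trace[:T])              # multiset of the first T accesses
    sizes = [len(window)]                    # len() counts *distinct* objects
    for i in range(T, len(trace)):
        window[trace[i]] += 1                # slide the window one access right
        window[trace[i - T]] -= 1            # and drop the oldest access
        if window[trace[i - T]] == 0:
            del window[trace[i - T]]
        sizes.append(len(window))
    return sizes

# A tight loop over 4 pages: with T = 8 the working set size stays at 4
print(working_set_sizes([0, 1, 2, 3] * 5, 8))
```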

7 The LRU Policy: Sequential Flooding
To answer this question, assume:
Three pages, A, B, and C, as fixed-size caching units
An access pattern: A, B, C, A, B, C, etc.
A cache pool that consists of only two frames (i.e., equal-sized page containers)
Under LRU, every access evicts exactly the page that will be referenced next: A faults and evicts B, B faults and evicts C, C faults and evicts A, and so on; every single access is a page fault
Although the access pattern exhibits temporal locality, no locality was exploited! This phenomenon is known as "sequential flooding"
For this access pattern, MRU works better!
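A tiny simulation backs this up (the helper simulate and the victim-picking lambdas are ours for illustration): with two frames and the cyclic trace, an LRU victim policy yields zero hits, while an MRU victim policy scores 14 hits over the same 30 accesses:

```python
def simulate(trace, frames, pick_victim):
    """Count hits; `cache` is kept in LRU-to-MRU order (tail = MRU)."""
    cache, hits = [], 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.remove(page)                # refresh the page's recency
        elif len(cache) >= frames:
            cache.remove(pick_victim(cache))  # cache is full: evict a victim
        cache.append(page)                    # (re)insert at the MRU position
    return hits

trace = ["A", "B", "C"] * 10
print(simulate(trace, 2, lambda c: c[0]))     # LRU victim:  0 hits
print(simulate(trace, 2, lambda c: c[-1]))    # MRU victim: 14 hits
```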

8 Types of Accesses
Why did LRU not perform well with this access pattern, although it is "repeatable"? The cache size was dwarfed by the working set size
As the time interval T is increased, how would the working set size change, assuming:
Sequential accesses (e.g., unrepeatable full scans): it will monotonically increase; the working set will be very cache-unfriendly
Regular accesses, which demonstrate typical good locality: it will increase non-monotonically (e.g., increase and decrease, then increase and decrease, though not necessarily at equal widths across program phases); the working set will be cache-friendly only if the cache size is not dwarfed by its size
Random accesses, which demonstrate little or no locality (e.g., accesses to a hash table): the working set will be cache-unfriendly if its size is much larger than the cache size
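Reusing the working_set_sizes sketch from above on three made-up traces illustrates these three regimes:

```python
import random

scan = list(range(100))                       # sequential: never repeats
loop = [0, 1, 2, 3] * 25                      # regular: strong locality
rand = [random.randrange(50) for _ in range(100)]   # little locality

for name, trace in [("scan", scan), ("loop", loop), ("random", rand)]:
    sizes = working_set_sizes(trace, 16)      # window width T = 16
    print(name, max(sizes))                   # scan: 16, loop: 4, random: usually 12-16
```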

9–20 Example
Reference trace: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Three LRU caches process the trace side by side: Cache X (3 frames), Cache Y (4 frames), and Cache Z (5 frames). The table below replays the first eleven accesses; each cell shows hit (H) or miss (M), followed by that cache's LRU chain (MRU first):

Access | Cache X (3 frames) | Cache Y (4 frames) | Cache Z (5 frames)
7      | M: 7               | M: 7               | M: 7
0      | M: 0 7             | M: 0 7             | M: 0 7
1      | M: 1 0 7           | M: 1 0 7           | M: 1 0 7
2      | M: 2 1 0           | M: 2 1 0 7         | M: 2 1 0 7
0      | H: 0 2 1           | H: 0 2 1 7         | H: 0 2 1 7
3      | M: 3 0 2           | M: 3 0 2 1         | M: 3 0 2 1 7
0      | H: 0 3 2           | H: 0 3 2 1         | H: 0 3 2 1 7
4      | M: 4 0 3           | M: 4 0 3 2         | M: 4 0 3 2 1
2      | M: 2 4 0           | H: 2 4 0 3         | H: 2 4 0 3 1
3      | M: 3 2 4           | H: 3 2 4 0         | H: 3 2 4 0 1
0      | M: 0 3 2           | H: 0 3 2 4         | H: 0 3 2 4 1

Running totals after these eleven accesses: Cache X: 2 hits, 9 misses; Cache Y: 5 hits, 6 misses; Cache Z: 5 hits, 6 misses
The larger caches retain the working set, while the 3-frame cache keeps evicting pages just before they are re-referenced
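These totals can be reproduced with a few lines of Python (the helper lru_counts is ours for illustration):

```python
def lru_counts(trace, frames):
    """Return (hits, misses) for an LRU cache with `frames` slots."""
    cache, hits = [], 0                      # list kept in LRU-to-MRU order
    for page in trace:
        if page in cache:
            hits += 1
            cache.remove(page)               # move the page back to the MRU end
        elif len(cache) >= frames:
            cache.pop(0)                     # evict the LRU page at the head
        cache.append(page)
    return hits, len(trace) - hits

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0]   # the eleven accesses replayed above
for size in (3, 4, 5):
    print(size, lru_counts(trace, size))     # -> (2, 9), (5, 6), (5, 6)
```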

21 Observation: The Stack Property
Adding cache space never hurts, but it may or may not help
This is referred to as the "stack property"
LRU has the stack property, but not all replacement policies have it
E.g., FIFO does not have it
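The classic counterexample is Belady's anomaly: on the trace below, a FIFO cache with four frames incurs more page faults than one with three. A small sketch (the helper fifo_misses is ours for illustration):

```python
def fifo_misses(trace, frames):
    """Count page faults for a FIFO cache with `frames` slots."""
    cache, misses = [], 0
    for page in trace:
        if page not in cache:
            misses += 1
            if len(cache) >= frames:
                cache.pop(0)                 # evict the oldest arrival
            cache.append(page)               # hits leave FIFO order untouched
    return misses

trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_misses(trace, 3))                 # 9 page faults
print(fifo_misses(trace, 4))                 # 10 page faults: more space, more misses!
```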

22 Competing Workloads
What happens if multiple workloads run in parallel, sharing the same cache?
Thrashing (or interference) will arise, potentially polluting the cache, especially if one workload is a one-time scan
How can we isolate the effects of interference?
Apply static (or fixed) partitioning, wherein the cache is sliced into multiple fixed partitions
This requires a priori knowledge of the workloads
With full knowledge in advance, OPT can be applied!
Apply dynamic partitioning, wherein the cache is adaptively resized based on workloads' evolving access patterns
This requires monitoring and tracking the characteristics of the workloads

23 Adaptive Replacement Cache
As an example of a cache that applies dynamic partitioning, we will study the Adaptive Replacement Cache (ARC)

24 ARC Structure
ARC splits the cache into two LRU lists:
L1 = Top Part (T1) + Bottom Part (B1)
L2 = Top Part (T2) + Bottom Part (B2)

25 ARC Structure
Content:
T1 and T2 contain cached objects and history (data + metadata)
B1 and B2 contain only history (e.g., keys for the cached objects, i.e., metadata only)
Together, the four lists remember exactly twice the number of pages that fit in the cache!
This can greatly help in discovering patterns among pages that were evicted!

27 ARC Structure
Sizes:
Size(T1 + T2) = c pages
Size(T1) = p pages
Size(T2) = c – p pages
B1 and B2 together remember c recently evicted pages

28 ARC Policy
Rules:
L1 hosts pages that have been seen only once (L1 captures recency)
L2 hosts pages that have been seen at least twice (L2 captures frequency)
Key Idea:
Adaptively (in response to observed workload characteristics) decide how many pages to keep in L1 versus L2
When recency interferes with frequency, ARC detects that and acts in a way that preserves temporal locality at L2

29 ARC Policy: Details
For a requested page Q, one of four cases will happen:
Case I: a hit in T1 or T2
Move Q to the MRU position in T2
If the hit was in T1 and T2 is already at its target size, first evict the LRU page in T2 and keep a record of it in B2
Case II: a miss in T1 ∪ T2, but a hit in B1
Remove Q's record from B1 and increase T1's target size by increasing p
This automatically decreases T2's target size, since Size(T2) = c – p
Evict the LRU page in T2 and keep a record of it in B2
Fetch Q and place it at the MRU position in T2

30 ARC Policy: Details
For a requested page Q, one of four cases will happen:
Case III: a miss in T1 ∪ T2, but a hit in B2
Remove Q's record from B2 and increase T2's target size by decreasing p
This automatically decreases T1's target size, since Size(T1) = p
Evict the LRU page in T2 and keep a record of it in B2
Fetch Q and place it at the MRU position in T2
Case IV: a miss in T1 ∪ B1 ∪ T2 ∪ B2
Evict the LRU page in T1 and keep a record of it in B1
Fetch Q and place it at the MRU position in T1
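To make the four cases concrete, here is a toy Python sketch of the policy as simplified on these slides; it is not the full ARC algorithm (which, among other refinements, uses the adaptation parameter p to choose which list to evict from), and the class name ARCSketch is ours:

```python
from collections import OrderedDict

class ARCSketch:
    """Toy cache following the four cases sketched on these slides.
    A simplification for illustration, not the full ARC algorithm.
    T1/T2 hold cached pages in LRU-to-MRU order; B1/B2 hold only history."""

    def __init__(self, c):
        self.c = c                            # total cache capacity (pages)
        self.p = 0                            # T1's target size; tracked for
                                              # illustration (full ARC uses it
                                              # to pick the eviction side)
        self.t1, self.b1 = OrderedDict(), OrderedDict()   # L1 = T1 + B1
        self.t2, self.b2 = OrderedDict(), OrderedDict()   # L2 = T2 + B2

    def _replace(self, prefer_t2):
        """Evict one cached page and remember its key in the ghost list."""
        use_t2 = (prefer_t2 and len(self.t2) > 0) or len(self.t1) == 0
        top, ghost = (self.t2, self.b2) if use_t2 else (self.t1, self.b1)
        victim, _ = top.popitem(last=False)   # that list's LRU page
        ghost[victim] = None
        while len(self.b1) + len(self.b2) > self.c:   # B1 + B2 together keep
            max(self.b1, self.b2, key=len).popitem(last=False)  # c records

    def access(self, q):
        full = len(self.t1) + len(self.t2) >= self.c
        if q in self.t1 or q in self.t2:      # Case I: a cache hit
            self.t1.pop(q, None)
            self.t2.pop(q, None)
            self.t2[q] = None                 # promote Q to T2's MRU slot
            return True
        if q in self.b1:                      # Case II: a history hit in B1
            del self.b1[q]
            self.p = min(self.c, self.p + 1)  # recency paid off: grow T1's target
            if full:
                self._replace(prefer_t2=True) # slides: evict T2's LRU into B2
            self.t2[q] = None                 # Q has now been seen twice
        elif q in self.b2:                    # Case III: a history hit in B2
            del self.b2[q]
            self.p = max(0, self.p - 1)       # frequency paid off: grow T2's target
            if full:
                self._replace(prefer_t2=True)
            self.t2[q] = None
        else:                                 # Case IV: a complete miss
            if full:
                self._replace(prefer_t2=False)  # evict T1's LRU, remember in B1
            self.t1[q] = None                 # new pages enter at T1's MRU slot
        return False
```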

31 Scan-Resistance of ARC
Observe that a new page is always placed at the MRU position in T1
From there, it gradually makes its way down to the LRU position in T1
Unless it is referenced once again before eviction
But this will not happen with one-time-only scans
Hence, T2 will not be impacted by the scan!

32 Scan-Resistance of ARC
This makes ARC scan-resistant, whereby T2 will:
Be effectively isolated from the scan
Grow at the expense of T1, since more hits will occur in B2 (which causes an increase in T2's size)
Effectively handle temporal locality, even with mixed workloads (i.e., workloads with and without locality running concurrently)
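Running the ARCSketch toy from above on a made-up mixed workload shows this isolation: a small loop with locality warms T2, and a subsequent one-time scan churns only through T1:

```python
cache = ARCSketch(4)
for _ in range(3):
    for page in ("a", "b"):       # a small loop with locality: a, b, a, b, ...
        cache.access(page)        # "a" and "b" get promoted into T2
for page in range(100):           # a one-time-only sequential scan
    cache.access(page)            # scan pages churn through T1 only (Case IV)
print(list(cache.t2))             # -> ['a', 'b']: T2 survived the scan
```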

33 Next Class
Server-Side Replication

