
1 MadCache: A PC-aware Cache Insertion Policy Andrew Nere, Mitch Hayenga, and Mikko Lipasti PHARM Research Group University of Wisconsin – Madison June 20, 2010

2 Executive Summary
Problem: Changing hardware and workloads encourage investigation of cache replacement/insertion policy designs
Proposal: MadCache uses PC history to choose the cache insertion policy
–Last-level cache granularity
–Individual PC granularity
Performance improvements over LRU
–2.5% IPC improvement (single-threaded)
–4.5% weighted speedup and 6% throughput improvement (multithreaded)

3 Motivation
Importance of investigating cache insertion policies
–Direct effect on performance
–LRU has dominated hardware designs for many years
–Changing workloads, levels of caches
Shared last-level cache
–Cache behavior now depends on multiple running applications
–One streaming thread can ruin the cache for everyone

4 Previous Work
Dynamic insertion policies
–DIP – Qureshi et al. – ISCA ’07
Dueling sets select the best of multiple policies
Bimodal Insertion Policy (BIP) offers thrash protection
–TADIP – Jaleel et al. – PACT ’08
Awareness of other threads’ workloads
Utilizing program counter information
–PCs exhibit a useful amount of predictable behavior
–Dead-block prediction and prefetching – ISCA ’01
–PC-based load miss prediction – MICRO ’95
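The BIP mechanism cited above can be sketched as follows: most incoming lines are inserted at the LRU position so a streaming workload evicts its own lines quickly, while a small fraction land at MRU so genuinely reused data can survive. The epsilon value and the interface are illustrative assumptions, not details from the talk.

```python
import random

class BIPInserter:
    """Minimal sketch of the Bimodal Insertion Policy (BIP), assuming a
    bimodal throttle epsilon; the value 1/32 is an illustrative choice."""

    def __init__(self, epsilon=1 / 32, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def insertion_position(self, num_ways):
        # 0 = MRU position, num_ways - 1 = LRU position.
        if self.rng.random() < self.epsilon:
            return 0              # rare MRU insertion lets reused data stick
        return num_ways - 1       # common case: insert at LRU (thrash protection)
```

Because almost every insertion lands at LRU, a streaming access pattern churns through a single way instead of flushing the whole set.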

5 MadCache Proposal
Problem: With changing hardware and workloads, caches are subject to suboptimal insertion policies
Solution: Use PC information to create a better policy
–Adaptive default cache insertion policy
–Track PCs to determine the policy at a finer grain than DIP
–Filter out streaming PCs
Introducing MadCache!

6 MadCache Design
Tracker Sets
–Sample the behavior of the cache
–Enter the PCs into the PC-Predictor table
–Determine the default policy of the cache
Uses set dueling – Qureshi et al. – ISCA ’07
LRU and Bypassing Bimodal Insertion Policy (BBIP)
Follower Sets
–Majority of the last-level cache
–Typically follow the default policy
–Can override the default cache policy (PC-Predictor table)
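The set-dueling mechanism the tracker sets rely on can be sketched as a single saturating policy-selection counter: misses in the LRU tracker sets push it one way, misses in the BBIP tracker sets push it the other, and the follower sets adopt whichever policy is missing less. The 10-bit counter width is an illustrative assumption.

```python
class SetDueling:
    """Sketch of set dueling as MadCache uses it: dedicated LRU and BBIP
    tracker sets feed a saturating policy-selection counter (PSEL), and
    follower sets adopt the winning policy as the cache-wide default."""

    def __init__(self, bits=10):
        self.max_val = (1 << bits) - 1
        self.psel = self.max_val // 2   # start undecided

    def miss_in_lru_tracker(self):
        # An LRU tracker set missed: evidence against LRU.
        self.psel = min(self.max_val, self.psel + 1)

    def miss_in_bbip_tracker(self):
        # A BBIP tracker set missed: evidence against BBIP.
        self.psel = max(0, self.psel - 1)

    def default_policy(self):
        # The high half of PSEL selects BBIP, the low half LRU.
        return "BBIP" if self.psel > self.max_val // 2 else "LRU"
```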

7 Tracker and Follower Sets
[Diagram: last-level cache partitioned into BBIP Tracker Sets, LRU Tracker Sets, and Follower Sets; each tracker line stores a reuse bit and an index to the PC-Predictor table]
Tracker Set overhead
–1 bit to indicate if the line was accessed again
–10/11 bits to index the PC-Predictor table

8 MadCache Design
PC-Predictor table
–Stores PCs that have accessed the Tracker Sets
–Tracks behavior history using a counter
Decrement if an address is used many times in the LLC
Increment if a line is evicted and was never reused
–Per-PC default policy override
LRU (default) plus BBIP override
BBIP (default) plus LRU override
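The counter update rules above can be sketched per table entry: reuse pulls the counter toward "useful," dead evictions push it toward "streaming." The 6-bit width is taken from the PC-Predictor table slide; the threshold at which a PC is treated as streaming is an assumption, since the talk gives only the update directions.

```python
class PCPredictorEntry:
    """Sketch of one PC-Predictor table entry: a 6-bit saturating counter
    summarizing whether lines brought in by this PC get reused in the LLC.
    The threshold in looks_streaming() is an illustrative assumption."""

    MAX = 63  # 6-bit saturating counter

    def __init__(self):
        self.counter = self.MAX // 2

    def on_reuse(self):
        # A line inserted by this PC was used again in the LLC.
        self.counter = max(0, self.counter - 1)

    def on_dead_eviction(self):
        # A line inserted by this PC was evicted without being reused.
        self.counter = min(self.MAX, self.counter + 1)

    def looks_streaming(self, threshold=48):
        # High counter => this PC's lines are rarely reused.
        return self.counter >= threshold
```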

9 PC-Predictor Table
[Diagram: PC-Predictor table with a 9-bit index; each entry holds a policy bit + PC MSBs (1 + 64 bits) as the tag and a 6-bit counter]
–In parallel with a cache miss, the PC + current policy index the PC-Predictor
–On a hit in the table, follow the PC’s override policy
–On a miss in the table, follow the global default policy
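The miss-time lookup on this slide can be sketched as follows. A table hit yields the PC's override decision; a miss falls back to the global default. Using the counter's MSB as the override bit and a dict as the table are illustrative assumptions, not confirmed hardware details.

```python
def choose_insertion_policy(pc, default_policy, table):
    """Sketch of the PC-Predictor lookup performed in parallel with a
    cache miss. `table` maps (policy, pc) -> 6-bit counter value; the
    MSB-as-override interpretation is an assumption."""
    counter = table.get((default_policy, pc))
    if counter is None:
        return default_policy        # table miss: follow global default
    if counter < 32:                 # MSB of the 6-bit counter clear
        return default_policy        # hit, but no override for this PC
    # Override: an LRU default gets a BBIP override and vice versa.
    return "BBIP" if default_policy == "LRU" else "LRU"
```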

10 Multi-Threaded MadCache
Thread-aware MadCache
–Similar structures to single-threaded MadCache
–Tracks based on the current policy of the other threads
Multithreaded MadCache extensions
–Separate tracker sets for each thread
Each thread still tracks LRU and BBIP
–PC-Predictor table
Extended number of entries
Indexed by thread ID, policy, and PC
–Set dueling PER THREAD

11 Multi-threaded MadCache
[Diagram: PC-Predictor table extended to TID + policy + PC MSBs (2 + 4 + 64 bits) per tag, with a 6-bit counter and a 10-bit index; the last-level cache holds per-thread BBIP and LRU Tracker Sets (TID-0 through TID-3) plus Follower Sets]
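The "indexed by thread ID, policy, and PC" extension can be sketched as a small index-packing function addressing the enlarged table (10-bit index on this slide). The exact hash/bit layout is an illustrative assumption; real hardware might simply concatenate low-order bits.

```python
def mt_predictor_index(tid, policy, pc, index_bits=10):
    """Sketch of the multithreaded PC-Predictor index: fold the thread ID
    and current-policy bit together with PC bits to form a table index.
    The XOR-folding scheme here is an assumption for illustration."""
    mask = (1 << index_bits) - 1
    key = (tid << 1) | policy             # combine TID and policy bit
    return (key ^ (pc & mask) ^ (pc >> index_bits)) & mask
```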

12 MadCache – Example Application
Deep Packet Inspection 1
–Large match tables (1MB+) commonly used for DFA/XFA regular expression matching
–Incoming byte stream from packets causes different table traversals
Table exhibits reuse between packets
Packets are mostly streaming (backtracking is implementation dependent)
1 Evaluating GPUs for Network Packet Signature Matching – ISPASS ’09
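The access pattern this slide describes can be sketched as a DFA scan loop: each payload byte indexes the large match table to pick the next state, so the table entries are reused across packets while the packet bytes themselves stream through exactly once. The dict-of-dicts table encoding is an illustrative assumption.

```python
def scan_packet(dfa, start_state, payload):
    """Sketch of DFA match-table traversal for deep packet inspection:
    `dfa[state][byte]` gives the next state. The match table is the
    reusable working set; the payload is streaming data."""
    state = start_state
    for byte in payload:            # packet data: touched once (streaming)
        state = dfa[state][byte]    # match table: reused across packets
    return state
```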

13 MadCache – Example Application
[Diagram: the current processing element walks the Match Table as each packet streams through]
–Packets are mostly streaming
–Frequently accessed Match Table contents held in L1/L2
–Less frequently accessed elements in LLC/memory

14 MadCache – Example Application
DIP
–Would favor the BIP policy due to packet data streaming
–LLC holds a mixture of Match Table and useless packet data
MadCache
–Would identify PCs associated with the Match Table as useful
–LLC populated almost entirely by the Match Table
[Diagram: DIP LLC mixes packet data and table data; MadCache LLC holds mostly table data]

15 Experimentation
Processor: 8-stage, 4-wide pipeline
Instruction window size: 128 entries
Branch predictor: perfect
L1 inst. cache: 32KB, 64B linesize, 4-way SA, LRU, 1-cycle hit
L1 data cache: 32KB, 64B linesize, 8-way SA, LRU, 1-cycle hit
L2 cache: 32KB, 64B linesize, 8-way SA, LRU, 10-cycle hit
L3 cache (1 thread): 1MB, 64B linesize, 30-cycle hit
L3 cache (4 threads): 4MB, 64B linesize, 30-cycle hit
Main memory: 200 cycles
–15 benchmarks from SPEC CPU2006
–15 workload mixes for multithreaded experiments
–200-million-cycle simulations

16 Results – Single-threaded
IPC normalized to LRU
–2.5% improvement across the benchmarks tested
–Slight improvement over DIP

17 Results – Multithreaded
Throughput normalized to LRU
–6% improvement across the mixes tested
–DIP performs similarly to LRU

18 Results
Weighted speedup normalized to LRU
–4.5% improvement across the benchmarks tested
–DIP performs similarly to LRU

19 Future Work
MadderCache?
–Optimize the size of structures
PC-Predictor table size
Replace the CAM with a hashed PC & tag
–Detailed analysis of benchmarks with MadCache
–Extend PC predictions
Currently do not take sharers into account

20 Conclusions
Cache behavior is still evolving
–Changing cache levels, sharing, workloads
MadCache insertion policy uses PC information
–PCs exhibit a useful amount of predictable behavior
MadCache performance
–2.5% IPC improvement for single-threaded
–4.5% weighted speedup, 6% throughput improvement for 4 threads
–Sized to the competition bit budget
Preliminary investigations show little impact from reducing the structures

21 Questions?

