Bank-aware Dynamic Cache Partitioning for Multicore Architectures



Presentation on theme: "Bank-aware Dynamic Cache Partitioning for Multicore Architectures"— Presentation transcript:

1 Bank-aware Dynamic Cache Partitioning for Multicore Architectures
System Power Measurement, Modeling and Management
Dimitris Kaseridis (1), Jeff Stuecheli (1, 2), and Lizy K. John (1)
(1) University of Texas at Austin; (2) IBM Austin
Laboratory for Computer Architecture, 9/23/2009

2 Outline
Motivation/background
Cache partitioning/profiling
Proposed system
Results
Conclusion/future work

3 Motivation
Shared resources in CMPs: the last-level cache and memory bandwidth.
Sharing brings both opportunity and pitfalls:
Constructive: mixing workloads with low and high cache requirements in a shared pool.
Destructive: thrashing workloads (e.g., SPEC CPU 2000 art + mcf).
Cache partitioning is therefore required. The primary opportunity requires heterogeneous workload mixes, which are typical in consolidation and virtualization.

4 Monolithic vs NUCA vs Industry architectures
Monolithic: one large, shared, uniform-latency cache bank on a CMP. It does not exploit physical locality for private data and is slow for all cores.
CMP-NUCA: the typical proposal has a very large number of autonomous cache banks. Very flexible (e.g., 256 banks), but a non-optimal configuration: such small banks are inefficient (per-bank overhead).
Real implementations: industry uses fewer banks, i.e., NUCA with discrete cache levels; the key difference lies in the wire assumptions made in the original NUCA analysis.
[Figure: floorplans contrasting a monolithic shared cache, a many-bank NUCA, and industry designs such as IBM POWER7 and Intel Nehalem EX]

5 Baseline System
8 cores; 16 MB total capacity; 16 x 1 MB banks, 8-way associative.
Local banks: tight latency to the closest core.
Center banks: shared capacity.

6 Cache Partitioning/Profiling

7 Cache Sharing/Partitioning
The last-level cache of a CMP: resources that were once isolated are now shared, which drove the need for isolation mechanisms.
Design space:
Non-configurable: shared vs. private caches.
Static partitioning/policy: a long-term policy choice.
Dynamic: real-time, profiling-directed partitions, either by trial and error (experimenting to find the ideal configuration) or with predictive profilers that perform non-invasive state-space exploration (our system).

8 Bank-aware cache partitions
System components:
Non-invasive profiling using MSA (Mattson Stack Algorithm)
Cache allocation using marginal utility
Bank-aware LLC partitions

9 MSA Based Cache Profiling
The Mattson stack algorithm was originally proposed to concurrently simulate many cache sizes. Its structure is a true LRU cache: the stack distance from the MRU position is recorded for each reference, so misses can be calculated for any fraction of the ways.
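As a hedged sketch (Python pseudocode of the profiling idea, not the paper's hardware), a single pass over a reference trace can profile every way count at once: with true LRU, a reference at stack distance d hits in any cache with more than d ways.

```python
# Software sketch of MSA profiling for one cache set (names assumed).
# With true LRU, a reference at stack distance d hits in every cache
# with more than d ways, so one pass profiles all way counts at once.

def msa_profile(trace, max_ways):
    stack = []                               # index 0 = MRU, true-LRU order
    hits = [0] * (max_ways + 1)              # hits[w] = hits with w ways
    for tag in trace:
        if tag in stack:
            d = stack.index(tag)             # stack distance from MRU
            for w in range(d + 1, max_ways + 1):
                hits[w] += 1                 # hit for any cache with > d ways
            stack.pop(d)
        stack.insert(0, tag)                 # promote/insert at MRU
        if len(stack) > max_ways:
            stack.pop()                      # discard beyond profiled depth
    return hits
```

Misses for a partition of w ways are then `len(trace) - hits[w]`, which is exactly the per-way information the marginal-utility step consumes.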

10 Hardware MSA implementation
ways Hardware MSA implementation Naïve algorithm is prohibitive Fully associative Complete cache directory of maximum cache size for every core on the CMP (total size) Reduction Set Sampling Partial Tags Maximal Capacity Configuration in paper 12 bit tag 1/32 set sampling 9/16 bank per core 0.4% overhead of cache on chip sets Laboratory for Computer Architecture 9/23/2009
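The two main reductions can be sketched as follows (hedged: `set_bits`, `line_bits`, and the helper names are illustrative assumptions; only the 1/32 sampling ratio and 12-bit tag width come from the slide):

```python
SAMPLE = 32        # profile only 1 in 32 sets (slide's configuration)
TAG_BITS = 12      # partial-tag width (slide's configuration)

def is_sampled(addr, set_bits=10, line_bits=6):
    # Keep a reference only if its set index falls in the sampled subset,
    # shrinking the profiler's directory by roughly SAMPLE x.
    set_index = (addr >> line_bits) & ((1 << set_bits) - 1)
    return set_index % SAMPLE == 0

def partial_tag(addr, set_bits=10, line_bits=6):
    # Store only the low TAG_BITS of the tag; rare false matches are
    # the price of a much smaller directory entry.
    tag = addr >> (line_bits + set_bits)
    return tag & ((1 << TAG_BITS) - 1)
```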

11 Marginal Utility
Miss rate relative to capacity is non-linear and heavily workload dependent: miss rate drops dramatically once a workload's data structures fit in the cache. In practice, we iteratively assign cache capacity to the cores that produce the most additional hits per unit of capacity.
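The iterative assignment can be sketched as a greedy loop (a hedged illustration; `hits[c][w]`, the profiled hits for core c with w ways, is assumed to come from an MSA-style profiler):

```python
def allocate_by_marginal_utility(hits, total_ways):
    # hits[c][w] = profiled hits for core c when given w ways.
    # Greedily grant one way at a time to the core that gains the
    # most additional hits from it (its marginal utility).
    alloc = [0] * len(hits)
    for _ in range(total_ways):
        best_core, best_gain = None, -1
        for c, curve in enumerate(hits):
            if alloc[c] + 1 < len(curve):
                gain = curve[alloc[c] + 1] - curve[alloc[c]]
                if gain > best_gain:
                    best_core, best_gain = c, gain
        if best_core is None:
            break                  # every core already holds its maximum
        alloc[best_core] += 1
    return alloc
```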

12 Bank-aware LLC partitions
(a) Ideal MSA model.
(b) Banked true LRU: cascaded banks, power inefficient.
(c) Realistic banking: the allocation policy is hash or random allocation, and capacity is assigned at bank granularity with a uniform requirement across sets.

13 Bank-aware allocation heuristics
General idea: as capacity grows, coarser assignment is good enough.
Only portions of local cache banks are shared, and only between neighboring cores.
Central banks are assigned to a specific core, and any core that receives central banks is also assigned its full local capacity.

14 Cache allocation flowchart
Assign full cache banks first (steps 1-3) All cores that have multiple banks are complete Partition remaining local banks (steps 4-7) Fine tune assignment Sharing pairs Laboratory for Computer Architecture 9/23/2009
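One plausible reading of the two-phase assignment, sketched in Python (hedged: the function, the 8-way bank-sized unit, and the rounding are assumptions; only the whole-banks-first, then split-local-banks ordering comes from the slide):

```python
def assign_banks(capacity, ways_per_bank=8):
    # capacity[c] = bank-sized budget chosen by the marginal-utility
    # step (possibly fractional).  Phase 1 (steps 1-3): hand out whole
    # banks.  Phase 2 (steps 4-7): express each remainder as ways
    # inside a local bank shared with a neighboring core.
    whole_banks, shared_ways = [], []
    for c in capacity:
        whole = int(c)
        whole_banks.append(whole)
        shared_ways.append(round((c - whole) * ways_per_bank))
    return whole_banks, shared_ways
```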

15 Evaluation

16 Methodology
Workloads: 8 cores running mixes of the 26 SPEC CPU 2000 benchmarks.
Which benchmark mixes? The typical approach is to classify workloads with limited experiments; we wanted to cover a larger state space.
Monte Carlo: compare the bank-aware miss rate to the ideal assignment, showing the algorithm works for many cases.
Detailed simulation: cycle-accurate, full-system Simics + GEMS, extended with coarse banks and cache partitions.

17 Monte Carlo
How close is the bank-aware assignment to the ideal monolithic one? The graphic shows the miss-rate reduction for 1000 random SPEC CPU 2000 benchmark mixes; bank-aware and ideal miss rates show 97% correlation.
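The spirit of the experiment can be sketched with synthetic profiles (hedged: the random concave curves, the 16-way budget, and the 8-way bank size are illustrative stand-ins, not the paper's data):

```python
import random

def random_curve(total_ways, refs=1000):
    # Synthetic concave hit curve: diminishing extra hits per added way.
    gains = sorted((random.randint(0, refs) for _ in range(total_ways)),
                   reverse=True)
    hits = [0]
    for g in gains:
        hits.append(hits[-1] + g)
    return hits

def greedy(curves, units, step):
    # Greedy marginal-utility allocation at a given granularity
    # (`step` ways per grant); returns total hits achieved.
    alloc = [0] * len(curves)
    for _ in range(units):
        gains = [h[min(a + step, len(h) - 1)] - h[a]
                 for h, a in zip(curves, alloc)]
        c = gains.index(max(gains))
        alloc[c] = min(alloc[c] + step, len(curves[c]) - 1)
    return sum(h[a] for h, a in zip(curves, alloc))

random.seed(0)
curves = [random_curve(16) for _ in range(8)]       # 8 synthetic cores
fine = greedy(curves, units=16, step=1)             # ideal, way-granular
coarse = greedy(curves, units=2, step=8)            # bank-granular
```

For concave curves the way-granular greedy allocation is optimal, so `fine >= coarse` always; the Monte Carlo question is how small that gap stays across many random mixes.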

18 Workload sets for detailed simulation

19 Cycle accurate simulation
Overall miss ratio: 70% reduction over a shared cache, 25% over equal partitions.
Throughput: 43% increase over a shared cache, 11% over equal partitions.

20 Conclusion/future work
Significant miss-rate reduction and throughput improvement are possible, and partitions are very important: marginal utility can work with realistic banked CMP caches.
Future work: heterogeneous benchmark suites are needed. We cannot evaluate all combinations, and hand-chosen combinations are hard to compare across proposals.

21 Thank You, Questions? Laboratory for Computer Architecture University of Texas Austin & IBM Austin

