
1 Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005

2 Chandra, Guo, Kim, Solihin - Contention Model
Cache Sharing in CMP
[Diagram: Processor Core 1 and Processor Core 2, each with a private L1 cache, sharing a unified L2 cache]

3 Impact of Cache Space Contention
Application-specific (what)
Coschedule-specific (when)
Significant: up to 4X cache misses, 65% IPC reduction
Need a model to understand cache sharing impact

4 Related Work
Uniprocessor miss estimation:
- Cascaval et al., LCPC 1999; Chatterjee et al., PLDI 2001; Fraguela et al., PACT 1999; Ghosh et al., TOPLAS 1999; J. Lee et al., HPCA 2001; Vera and Xue, HPCA 2002; Wassermann et al., SC 1997
Context switch impact on time-shared processors:
- Agarwal, ACM Trans. on Computer Systems, 1989; Suh et al., ICS 2001
No model for cache sharing impact:
- Relatively new phenomenon: SMT, CMP
- Many possible access interleaving scenarios

5 Contributions
Inter-thread cache contention models:
- 2 heuristic models (refer to the paper)
- 1 analytical model
Input: circular sequence profiling for each thread
Output: predicted number of cache misses per thread in a co-schedule
Validation:
- Against a detailed CMP simulator
- 3.9% average error for the analytical model
Insight:
- Temporal reuse patterns → impact of cache sharing

6 Outline
Model Assumptions
Definitions
Inductive Probability Model
Validation
Case Study
Conclusions

8 Assumptions
One circular sequence profile per thread:
- Average profile yields high prediction accuracy
- Phase-specific profiles may improve accuracy
LRU replacement algorithm:
- Other policies are usually LRU approximations
Threads do not share data:
- Mostly true for serial apps
- Parallel apps: threads likely to be impacted uniformly

10 Definitions
seqX(dX, nX) = a sequence of nX accesses to dX distinct addresses, all by thread X to the same cache set
cseqX(dX, nX) (circular sequence) = a sequence in which the first and the last accesses are to the same address
Example: in the access stream A B C D A E E B,
- A B C D A is cseq(4,5)
- E E is cseq(1,2)
- B C D A E E B is cseq(5,7)
- the whole stream is seq(5,8)
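The definition above can be made concrete in code. A minimal profiler (an illustration, not the authors' tool) scans one thread's access stream to a single cache set and, for every access that revisits an earlier address, records the circular sequence (d, n) ending at it:

```python
def circular_sequences(trace):
    """For each access that revisits an address, report the circular
    sequence cseq(d, n) ending at it: the window from the previous
    access to that address through this one, inclusive, with n total
    accesses and d distinct addresses."""
    last_pos = {}          # address -> index of its most recent access
    cseqs = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            window = trace[last_pos[addr]:i + 1]   # both endpoints included
            cseqs.append((len(set(window)), len(window)))
        last_pos[addr] = i
    return cseqs

print(circular_sequences(list("ABCDAEEB")))
# → [(4, 5), (1, 2), (5, 7)], matching the example above
```

Running this over the slide's stream A B C D A E E B recovers exactly the three circular sequences annotated there.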

11 Circular Sequence Properties
Thread X runs alone in the system:
- Given a circular sequence cseqX(dX, nX), the last access is a cache miss iff dX > Assoc
Thread X shares the cache with thread Y:
- If a sequence of intervening accesses seqY(dY, nY) occurs during cseqX(dX, nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc

12 Example
Assume a 4-way associative cache:
- X's circular sequence: A B A, i.e., cseqX(2,3)
- Y's access sequence intervening during its lifetime: U V V W
No cache sharing: the last A is a cache hit
Cache sharing: is the last A a cache hit or miss?

13 Example
Assume a 4-way associative cache:
- Interleaving A U B V V A W: seqY(2,3) intervenes in cseqX's lifetime, dX + dY = 2 + 2 ≤ 4, so the last A is a cache hit
- Interleaving A U B V V W A: seqY(3,4) intervenes in cseqX's lifetime, dX + dY = 2 + 3 > 4, so the last A is a cache miss
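The two interleavings can be checked against a direct LRU simulation of a single 4-way set (a sketch, not the paper's simulator); the outcome agrees with the dX + dY > Assoc condition in both cases:

```python
def lru_hit(trace, assoc=4):
    """Simulate one LRU cache set and return whether the *last* access
    in `trace` hits. The set is kept as a list ordered MRU-first."""
    lines = []
    hit = False
    for addr in trace:
        hit = addr in lines
        if hit:
            lines.remove(addr)
        lines.insert(0, addr)      # the accessed address becomes MRU
        del lines[assoc:]          # evict anything beyond associativity
    return hit

# cseqX(2,3) = A ... B ... A, with Y's accesses interleaved:
print(lru_hit(list("AUBVVA")))   # True: seqY(2,3) intervenes, 2 + 2 <= 4
print(lru_hit(list("AUBVVWA")))  # False: seqY(3,4) intervenes, 2 + 3 > 4
```

In the miss case, W is the fifth distinct address to enter the 4-way set during cseqX's lifetime, so A is evicted before its reuse.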

15 Inductive Probability Model
For each cseqX(dX, nX) of thread X, compute Pmiss(cseqX): the probability that its last access is a miss.
Steps:
- Compute E(nY): the expected number of intervening accesses from thread Y during cseqX's lifetime
- For each possible dY, compute P(seq(dY, E(nY))): the probability of occurrence of seq(dY, E(nY))
- If dY + dX > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX)
- Misses = old_misses + Σ Pmiss(cseqX) × F(cseqX)
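The steps above can be sketched as follows. Here `p_seq(d_y, n_y)` (the probability that nY intervening accesses touch exactly dY distinct addresses) is taken as a given function, and E(nY) is approximated by scaling nX with the threads' access-rate ratio; the paper derives both more carefully, so this is only an illustration of the accumulation structure:

```python
def p_miss(d_x, n_y, p_seq, assoc=8):
    """Probability that the last access of cseqX(dX, nX) misses:
    total probability of intervening sequences seqY(dY, nY) whose
    distinct addresses push dX + dY past the associativity."""
    return sum(p_seq(d_y, n_y)
               for d_y in range(n_y + 1)
               if d_x + d_y > assoc)

def extra_misses(cseq_profile, rate_ratio, p_seq, assoc=8):
    """cseq_profile: {(dX, nX): F(cseqX)} frequency map for thread X.
    E(nY) is approximated as nX * (Y's rate / X's rate), a
    simplification of the paper's derivation."""
    total = 0.0
    for (d_x, n_x), freq in cseq_profile.items():
        e_ny = round(n_x * rate_ratio)
        total += freq * p_miss(d_x, e_ny, p_seq, assoc)
    return total
```

With a concrete `p_seq` (e.g. the dynamic program on the next slide), `extra_misses` yields the ∑ Pmiss(cseqX) × F(cseqX) term that is added to the thread's solo miss count.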

16 Computing P(seq(dY, E(nY)))
Basic idea: P(seq(d,n)) = A × P(seq(d,n-1)) + B × P(seq(d-1,n-1)), where A and B are transition probabilities:
- seq(d,n-1) + 1 access to a non-distinct address → seq(d,n)
- seq(d-1,n-1) + 1 access to a distinct address → seq(d,n)
Detailed steps in the paper
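Under the simplifying assumption that each intervening access goes to a new distinct address with a fixed probability p (the paper instead derives the transition probabilities A and B from the circular sequence profile), the recurrence unrolls as a small dynamic program:

```python
def p_seq_table(n_max, p_distinct):
    """P[d][n] = probability that n accesses touch exactly d distinct
    addresses. p_distinct is a hypothetical constant standing in for
    the paper's profile-derived transition probabilities."""
    P = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    P[1][1] = 1.0                    # one access, one distinct address
    for n in range(2, n_max + 1):
        for d in range(1, n + 1):
            P[d][n] = ((1 - p_distinct) * P[d][n - 1]       # repeat address
                       + p_distinct * P[d - 1][n - 1])      # new address
    return P

P = p_seq_table(6, p_distinct=0.3)
# Each column is a probability distribution over d:
print(sum(P[d][6] for d in range(1, 7)))   # 1.0 (up to rounding)
```

Since every seq(d,n) is reached from exactly one of the two predecessors in the diagram, each column sums to 1, which is a useful sanity check on any implementation.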

18 Validation
SESC simulator: detailed CMP + memory hierarchy
14 co-schedules of benchmarks (SPEC2K and Olden)
A co-schedule is terminated when an app completes
CMP cores: 2 cores, each 4-issue dynamic, 3.2 GHz
Base memory hierarchy:
- L1 I/D (private): each WB, 32 KB, 4-way, 64 B line
- L2 unified (shared): WB, 512 KB, 8-way, 64 B line, LRU replacement

19 Validation
Per app: actual miss increase (AM) and prediction error, Error = (PM - AM)/AM:
- gzip + applu: gzip 243% (error -25%), applu 11% (error 2%)
- gzip + apsi: gzip 180% (error -9%), apsi 0%
- mcf + art: mcf 296% (error 7%), art 0%
- mcf + gzip: mcf 18% (error 7%), gzip 102% (error 22%)
- mcf + swim: mcf 59% (error -7%), swim 0%
Larger error happens when the miss increase is very large
Overall, the model is accurate

20 Other Observations
Based on how vulnerable the apps are to cache sharing impact:
- Highly vulnerable: mcf, gzip
- Not vulnerable: art, apsi, swim
- Somewhat / sometimes vulnerable: applu, equake, perlbmk, mst
Prediction error:
- Very small, except for highly vulnerable apps
- 3.9% average, 25% maximum
- Also small for different cache associativities and sizes

22 Case Study
Profile approximated by a geometric progression:
F(cseq(1,*)) = Z, F(cseq(2,*)) = Z·r, F(cseq(3,*)) = Z·r², ..., F(cseq(A,*)) = Z·r^(A-1), ...
- Z = amplitude
- 0 < r < 1 = common ratio
- Larger r → larger working set
Impact of an interfering thread on the base thread?
- Fix the base thread; vary the interfering thread
- Miss frequency = # misses / time
- Reuse frequency = # hits / time
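The geometric profile and its working-set intuition can be sketched numerically (Z, r, the profile length, and the associativity are illustrative values, not from the paper). A larger common ratio r puts more weight on long circular sequences, i.e., those at or past the d > Assoc miss boundary:

```python
def geometric_profile(Z, r, d_max):
    """F(cseq(d, *)) = Z * r**(d - 1): frequency of circular sequences
    with d distinct addresses, for d = 1 .. d_max."""
    return [Z * r ** (d - 1) for d in range(1, d_max + 1)]

def tail_fraction(profile, assoc):
    """Fraction of circular sequences with d > assoc: the sequences
    at or past the solo-execution miss boundary d > Assoc."""
    return sum(profile[assoc:]) / sum(profile)

small_ws = geometric_profile(Z=1000, r=0.5, d_max=32)
large_ws = geometric_profile(Z=1000, r=0.9, d_max=32)
# Larger r -> more long circular sequences -> larger working set:
print(tail_fraction(small_ws, assoc=8))   # ~0.004
print(tail_fraction(large_ws, assoc=8))   # ~0.41
```

This mirrors the next two slides: with r = 0.5 almost no circular sequence is near the boundary, so only heavy reuse by the co-runner hurts, while with r = 0.9 a large fraction sits near it, so any intervening accesses can tip them into misses.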

23 Base Thread: r = 0.5 (Small Working Set)
The base thread is:
- Not vulnerable to the interfering thread's miss frequency
- Vulnerable to the interfering thread's reuse frequency

24 Base Thread: r = 0.9 (Large Working Set)
The base thread is:
- Vulnerable to both the interfering thread's miss frequency and reuse frequency

26 Conclusions
New inter-thread cache contention models
Simple to use:
- Input: circular sequence profiling per thread
- Output: number of misses per thread in co-schedules
Accurate: 3.9% average error
Useful: temporal reuse patterns → cache sharing impact
Future work:
- Predict and avoid problematic co-schedules
- Release the tool at http://www.cesr.ncsu.edu/solihin

