Yuejian Xie, Gabriel H. Loh. Core0 IL1 DL1, Core1 IL1 DL1, Last Level Cache (LLC), Core1's Data, Core0's Data.

1 Yuejian Xie, Gabriel H. Loh

2 Core0 (IL1, DL1) and Core1 (IL1, DL1) share the Last Level Cache (LLC), which holds both Core0's data and Core1's data.

3 Capacity Management
– Cores differ in how much cache space they need; allocate the proper amount of space to each core.
– Guo-MICRO07, Kim-PACT04, Srikantaiah-ASPLOS09, Qureshi-MICRO06 (UCP), …
Dead Time Management
– Evict dead lines (blocks with no reuse) sooner.
– Kaxiras-ISCA01, Qureshi-ISCA07, Jaleel-PACT07 (TADIP), …

4 Core 0 gets 5 ways; Core 1 gets 3 ways.

5 MRU-to-LRU stack: Incoming Block

6 MRU-to-LRU stack: Occupies one cache block for a long time with no benefit!

7 MRU-to-LRU stack: Incoming Block

8 MRU-to-LRU stack. Useless Block: Evicted at next eviction. Useful Block: Moved to MRU position.

9 MRU-to-LRU stack. Useless Block: Evicted at next eviction. Useful Block: Moved to MRU position.
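
The contrast in the slides above can be sketched with a toy recency stack. This is a minimal illustration, not the authors' simulator: the function name `misses_survived` and the single-set model are assumptions made here. It counts how many later misses a never-reused ("dead") block survives when it is inserted at the MRU end versus the LRU end.

```python
def misses_survived(ways: int, insert_at_mru: bool) -> int:
    """Count the later misses a never-reused block survives in one set.

    Hypothetical helper for illustration only. The stack is a list with
    index 0 as MRU and the last index as LRU.
    """
    stack = [f"old{i}" for i in range(ways)]

    def fill(tag: str) -> str:
        # Evict the LRU block, then place the new block per the policy.
        victim = stack.pop()
        if insert_at_mru:
            stack.insert(0, tag)
        else:
            stack.append(tag)
        return victim

    fill("dead")  # the block that will never be reused
    survived = 0
    n = 0
    while True:
        victim = fill(f"new{n}")
        n += 1
        if victim == "dead":
            return survived
        survived += 1


mru_case = misses_survived(8, insert_at_mru=True)   # dead block drifts down the whole stack
lru_case = misses_survived(8, insert_at_mru=False)  # dead block is the very next victim
```

In this toy model with 8 ways, the dead block inserted at MRU outlives 7 further misses, while the one inserted at LRU is removed by the next eviction.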

10 PIPP: Novel scheme for Promotion and Insertion
Eviction
– When replacing a block in a set, which block should be evicted?
Insertion
– Where should a new block be inserted?
Promotion
– When there is a hit in the cache, how should the block's position/priority be adjusted?

11 What's PIPP?
– Promotion/Insertion Pseudo-Partitioning
– Achieves both capacity and dead-time management.
Eviction
– The LRU block is the victim.
Insertion
– The core's quota worth of blocks away from LRU.
Promotion
– Toward MRU by only one position.
Figure labels (MRU-to-LRU stack): To Evict, Promote, Hit, New, Insert Position = 3 (Target Allocation)
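
The three rules on this slide can be sketched as a small model of a single cache set. This is a hedged illustration under stated assumptions, not the authors' implementation: the class name `PIPPSet`, the tag strings, and the starting contents are hypothetical, and the insertion position is interpreted by counting the LRU slot as position 1 (so a quota of 3 leaves two blocks below the new one).

```python
class PIPPSet:
    """One cache set as a priority list: index 0 is LRU, last index is MRU.

    Hypothetical model for illustration; names are not from the slides.
    """

    def __init__(self, ways, quotas):
        self.ways = ways
        self.quotas = quotas  # core id -> target allocation (in ways)
        self.blocks = []      # (tag, core) pairs, lowest priority first

    def access(self, tag, core):
        """Return True on a hit, False on a miss."""
        for i, (t, _) in enumerate(self.blocks):
            if t == tag:
                # Promotion: move up by exactly one position (no-op at MRU).
                if i + 1 < len(self.blocks):
                    self.blocks[i], self.blocks[i + 1] = (
                        self.blocks[i + 1],
                        self.blocks[i],
                    )
                return True
        # Eviction: the LRU block is the victim once the set is full.
        if len(self.blocks) == self.ways:
            self.blocks.pop(0)
        # Insertion: the requesting core's quota, counted from LRU
        # (assumed convention: LRU is position 1).
        pos = min(self.quotas[core] - 1, len(self.blocks))
        self.blocks.insert(pos, (tag, core))
        return False


s = PIPPSet(ways=8, quotas={0: 5, 1: 3})
# Hypothetical starting contents, listed LRU first.
s.blocks = [("C", 1), ("B", 1), ("5", 0), ("4", 0),
            ("3", 0), ("2", 0), ("A", 1), ("1", 0)]
miss = s.access("D", core=1)             # evicts C, inserts D at position 3
after_insert = [t for t, _ in s.blocks]
hit = s.access("D", core=1)              # D moves up by one
after_promote = [t for t, _ in s.blocks]
```

Keeping eviction fixed at the LRU end means the insertion position alone decides how long an unreferenced block can stay, which is how one list can serve both capacity and dead-time management in this sketch.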

12 Core0 quota: 5 blocks, Core1 quota: 3 blocks. Blocks in set (numbers are Core0's blocks, letters are Core1's blocks): 1, A, 2, 3, 4, 5, B, C. Request: D (Core1's quota = 3).

13 Core0 quota: 5 blocks, Core1 quota: 3 blocks. Blocks in set: 1, A, 2, 5, 3, 4, D, B. Request: 6 (Core0's quota = 5).

14 Core0 quota: 5 blocks, Core1 quota: 3 blocks. Blocks in set: 1, A, 2, 6, 3, 4, D, B. Request: 7 (Core0's quota = 5).

15 Core0 quota: 5 blocks, Core1 quota: 3 blocks. Blocks in set: 1, A, 2, 6, 3, 4, D, 7. Request: D.

16 Core0 quota: 5 blocks, Core1 quota: 3 blocks. Blocks in set: 1, A, 2, 7, 6, 4, D, 3. Request: E (Core1's quota = 3).

17 Core0 quota: 5 blocks, Core1 quota: 3 blocks. Blocks in set: 1, A, 2, 7, 6, D, 3, E. Request: 2.

18 Quota: Core0 = 6, Core1 = 4, Core2 = 4, Core3 = 2. MRU-to-LRU stack: Insert closer to LRU position.

19 Strict Partition. Core0 quota: 5 blocks, Core1 quota: 3 blocks. Figure labels: Core0's Block, Core1's Block, Request, New, MRU 0, LRU 0, MRU 1, LRU 1.

20 Pseudo Partition. Core0 quota: 5 blocks, Core1 quota: 3 blocks. Figure labels: Core0's Block, Core1's Block, Request, New, MRU, LRU.
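
One way to see the difference between the two schemes is to let only Core0 issue requests and compare how many blocks it ends up holding. The sketch below is a toy model under assumptions made here: the function names, the trace, and the position convention (LRU counted as position 1) are all hypothetical and are not taken from the slides.

```python
def strict_occupancy(accesses, quotas):
    """Strict partition: each core has a private LRU list capped at its quota."""
    lists = {core: [] for core in quotas}  # MRU at the end of each list
    for tag, core in accesses:
        lst = lists[core]
        if tag in lst:
            lst.remove(tag)          # hit: re-append below as MRU
        elif len(lst) == quotas[core]:
            lst.pop(0)               # miss: evict this core's own LRU block
        lst.append(tag)
    return {core: len(lst) for core, lst in lists.items()}


def pseudo_occupancy(accesses, quotas, ways):
    """Pseudo partition: one shared list, quota only sets the insert position."""
    shared = []  # (tag, core) pairs, LRU first
    for tag, core in accesses:
        idx = next((i for i, (t, _) in enumerate(shared) if t == tag), None)
        if idx is not None:
            if idx + 1 < len(shared):  # promote by one on a hit
                shared[idx], shared[idx + 1] = shared[idx + 1], shared[idx]
            continue
        if len(shared) == ways:
            shared.pop(0)              # evict the shared LRU block
        shared.insert(min(quotas[core] - 1, len(shared)), (tag, core))
    occupancy = {core: 0 for core in quotas}
    for _, core in shared:
        occupancy[core] += 1
    return occupancy


quotas = {0: 5, 1: 3}
trace = [(f"x{i}", 0) for i in range(20)]  # hypothetical: only Core0 is active
strict_result = strict_occupancy(trace, quotas)
pseudo_result = pseudo_occupancy(trace, quotas, ways=8)
```

With Core1 idle, the strict scheme holds Core0 to its 5 ways, while the pseudo-partitioned set lets Core0 use all 8 in this model.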

21 Directly to MRU (TADIP) versus Promote By One (PIPP). Figure labels: New, MRU, LRU.

22 Algorithm: Capacity Management; Dead-time Management; Note
LRU: no; no; Baseline, no explicit management
UCP: yes; no; Strict partitioning
TADIP: no; yes; Insert at LRU and promote to MRU on hit
PIPP: yes; yes; Pseudo-partitioning and incremental promotion

23 Simulation environment
– SimpleScalar-Zesto, out-of-order, Intel Core 2-like
– 32KB 8-way DL1 and IL1, 4MB 16-way LLC, 1.6GHz DDR2
Workload classification
– UCP2-5: UCP-friendly, 2-core, 5th workload
– DIP4-3: TADIP-friendly, 4-core, 3rd workload

24 Panels: TADIP Friendly, UCP Friendly. PIPP outperforms LRU by 19.0%, UCP by 10.6%, and TADIP by 10.1%. PIPP is too cautious here.

25 Panels: TADIP Friendly, UCP Friendly. PIPP outperforms LRU by 21.9%, UCP by 12.1%, and TADIP by 17.5%.

26 Occupancy Control. Insertion Behavior: TADIP inserts no-reuse lines at an average position of 1.7, while PIPP inserts them at 1.3 (the LRU position is 0). Pseudo-Partition Benefit.

27 Novel proposal on Insertion and Promotion
A single unified mechanism provides both capacity and dead-time management
Outperforms prior UCP and TADIP
In the full paper:
– Special version of PIPP for streaming applications
– Reducing hardware overhead
– Sensitivity analysis

32 E.g., Target Partition {5,3} – Actual Occupancy {6,2} = 1
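
The slide gives only this one example and no formula. One reading that reproduces it, offered purely as an assumption, is to count the ways currently held by the "wrong" core, which is half the sum of the per-core absolute differences. For two cores this coincides with the per-core absolute difference. The function name `partition_deviation` is hypothetical.

```python
def partition_deviation(target, actual):
    """Ways that would have to change hands to reach the target partition.

    Assumed definition (not stated on the slide): half the sum of the
    per-core absolute differences between target and actual occupancy.
    """
    if len(target) != len(actual) or sum(target) != sum(actual):
        raise ValueError("target and actual must cover the same cores and ways")
    return sum(abs(t - a) for t, a in zip(target, actual)) // 2


example = partition_deviation([5, 3], [6, 2])  # the slide's example
```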

34 Streaming Application
Detection
– #Accesses, #Misses, MissRate > threshold
Insertion
– At a fixed position (independent of quota)
– #Streaming Apps blocks away from LRU position
Promotion
– Promote by 1 with probability p_stream
– p_stream ≪ 1
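
The streaming-specific rules above could be sketched roughly as follows. Everything concrete here is an assumption: the slide does not give threshold values or say how the access, miss, and miss-rate counters are combined, so the detection predicate, the default numbers, the parameter `p_general`, and all function names are hypothetical. The position convention again counts the LRU slot as position 1.

```python
import random


def looks_streaming(accesses, misses, min_accesses=1024, miss_rate_threshold=0.5):
    """Assumed predicate: enough accesses observed and a high miss rate.

    Both default thresholds are placeholders, not values from the slides.
    """
    if accesses < min_accesses:
        return False
    return misses / accesses > miss_rate_threshold


def insertion_index(core, quotas, streaming_cores):
    """0-based insert index counted from the LRU end of the set."""
    if core in streaming_cores:
        # Fixed position, independent of quota: as many blocks away
        # from LRU as there are streaming applications.
        return len(streaming_cores) - 1
    return quotas[core] - 1


def should_promote(core, streaming_cores, p_stream, rng, p_general=1.0):
    """Promote by one with probability p_stream for streaming cores."""
    p = p_stream if core in streaming_cores else p_general
    return rng.random() < p


rng = random.Random(0)
quotas = {0: 5, 1: 3}
streaming = {1} if looks_streaming(accesses=2000, misses=1500) else set()
idx_streaming = insertion_index(1, quotas, streaming)
idx_regular = insertion_index(0, quotas, streaming)
```

Pinning streaming cores near the LRU end and rarely promoting them keeps their lines from displacing blocks that other cores still reuse.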

36 Panels: Promotion Probability for General App; Promotion Probability for Streaming App.
