Using Partial Tag Comparison in Low-Power Snoop-based Chip Multiprocessors. Ali Shafiee, Narges Shahidi, Amirali Baniasadi. Sharif University of Technology.


1 Using Partial Tag Comparison in Low-Power Snoop-based Chip Multiprocessors. Ali Shafiee, Narges Shahidi, Amirali Baniasadi. Sharif University of Technology, University of Victoria.

2 This Work: Improving Snoop Coherency. Goal: Improving energy efficiency in snoop-based CMPs. Motivation: Broadcasting and processing the entire tag is inefficient. Our Solution: Using Partial Tag Comparison (PTC) prior to snoop. Key Results: Performance (2.9%), Tag array power (52%), Bandwidth utilization (78.5%).

3 Our Solution (PTC) vs. Conventional. [Diagram: D$ caches connected through an interconnect to the upper-level cache, shown once for the conventional design and once for our solution.] Conventional: Fast (+); Power & Bandwidth. Our solution: Fast (++, early miss detection); Power & Bandwidth Efficient (+).

4 Conventional Snooping. [Diagram: two CPUs with D$ caches and a controller connected by the address bus, snoop bus, and command bus; steps numbered 1 to 5.] Redundant snoops (miss): ~70%.

5 Snoop Filters. Goal: Eliminate redundant snoop requests. Examples: RegionScout (ISCA05), CGCT (ISCA05), SSP (ASPLOS08). PTC: (1) Early miss detection using a subset of the tag bits. (2) Once a miss is detected, the snoop is avoided. How often is that possible?

6 How often is using n bits enough to detect a miss? 95+% of misses can be detected using 8 bits.

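7 [Diagram: the LSBs of the address on the address bus are looked up in the PTC-Filter next to the D$. On a miss: avoid the snoop and access the upper level. On a hit: snoop the potential targets.]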
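The filter decision on this slide can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names and the example tag values are made up, and only the 8-bit width comes from the slides. The point it shows is that a mismatch on the low tag bits guarantees a full-tag miss, while a match on the low bits is only a potential hit that still needs a real snoop.

```python
from typing import List

def partial_tag(tag: int, n_bits: int = 8) -> int:
    """Return the n_bits least-significant bits of a tag."""
    return tag & ((1 << n_bits) - 1)

def early_miss(request_tag: int, stored_tags: List[int], n_bits: int = 8) -> bool:
    """True when no stored tag shares its low n_bits with the request.

    If the low bits differ, the full tags must differ too, so the
    snoop can be skipped safely (no false negatives).
    """
    wanted = partial_tag(request_tag, n_bits)
    return all(partial_tag(t, n_bits) != wanted for t in stored_tags)

# Hypothetical tags held by one remote cache set.
stored = [0x1A2B3, 0x0FF10, 0x2C4D7, 0x31E99]

skip_snoop = early_miss(0x1A2C4, stored)      # low byte 0xC4 matches nothing
false_positive = early_miss(0x3F2B3, stored)  # low byte 0xB3 matches 0x1A2B3
print(skip_snoop, false_positive)
```

In the second call the full tags differ, yet the filter cannot rule the line out, so the snoop still goes ahead.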

8 [Diagram: PTC-Filter for a 4-way D$. For each way (0 to 3), the filter holds Core 1's LSB, Core 2's LSB, and Core 3's LSB; each entry consists of V, D, and an 8-bit LSB field.]
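One way to picture the structure on this slide is as a table indexed by remote core, set, and way. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, the set count of 256 is assumed, and the D bit is left out because the transcript does not say what it encodes.

```python
from dataclasses import dataclass
from typing import List

N_BITS = 8  # partial-tag width shown on the slide
WAYS = 4    # 4-way D$ shown on the slide

@dataclass
class FilterEntry:
    valid: bool = False  # V bit
    lsb: int = 0         # low N_BITS of the remote tag

class PTCFilter:
    """Hypothetical per-core filter mirroring the remote cores' partial tags."""

    def __init__(self, remote_cores: List[int], num_sets: int) -> None:
        self.table = {
            core: [[FilterEntry() for _ in range(WAYS)] for _ in range(num_sets)]
            for core in remote_cores
        }

    def potential_targets(self, set_index: int, tag: int) -> List[int]:
        """Return the remote cores whose set may hold the tag."""
        wanted = tag & ((1 << N_BITS) - 1)
        return [
            core
            for core, sets in self.table.items()
            if any(e.valid and e.lsb == wanted for e in sets[set_index])
        ]

ptc = PTCFilter(remote_cores=[1, 2, 3], num_sets=256)
ptc.table[2][5][0] = FilterEntry(valid=True, lsb=0xB3)  # core 2, set 5, way 0
print(ptc.potential_targets(5, 0x3F2B3))  # only core 2 needs a snoop
print(ptc.potential_targets(5, 0x1A2C4))  # empty list: snoop avoided
```

An empty result corresponds to the filter-miss case, and a non-empty result lists the caches that still have to be snooped.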

9 PTC: Filter Miss. [Diagram: two CPUs with D$ caches and a controller connected by the address bus, snoop bus, and command bus; steps numbered 1 to 3.]

10 PTC: Filter Hit. [Diagram: two CPUs with D$ caches and a controller connected by the address bus, snoop bus, and command bus; steps numbered 1 to 6.]

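11 Filter Maintenance. [Diagram: a request for address A misses in a set holding tags B, F, D, E, and A is to be placed in the position of tag F. The Pending Request Table, with fields Addr., C, W, D, records {Address=A, C=0, W=1, D=1}. The update travels over the address bus, the snoop controller, and the command bus to the PTC-Filters of Core 0 ... Core i (steps 1 to 6): place A, insert in Way 1 of core 0.]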
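The maintenance step on this slide can be sketched as follows. Everything here is illustrative: the names are hypothetical, C and W are read as the requesting core and the victim way, the D field is ignored because the transcript does not define it, and the 6 offset bits and 8 index bits assume 64 B lines and 256 sets.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

OFFSET_BITS = 6  # 64 B cache line
INDEX_BITS = 8   # assumed 256 sets
N_BITS = 8       # partial-tag width
WAYS = 4

class PendingRequest(NamedTuple):
    address: int
    core: int  # C field: core that issued the request
    way: int   # W field: way the new line will occupy

FilterTable = Dict[Tuple[int, int], List[Optional[int]]]

def split(address: int) -> Tuple[int, int]:
    """Return (set index, low N_BITS of the tag) for an address."""
    set_index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return set_index, tag & ((1 << N_BITS) - 1)

def apply_fill(filt: FilterTable, req: PendingRequest) -> None:
    """Overwrite the victim way's partial tag with the new line's partial tag."""
    set_index, lsb = split(req.address)
    ways = filt.setdefault((req.core, set_index), [None] * WAYS)
    ways[req.way] = lsb

def make_address(tag: int, set_index: int) -> int:
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (set_index << OFFSET_BITS)

filt: FilterTable = {}
addr_f = make_address(0x1A2C4, 5)  # line F, currently in way 1 of core 0
addr_a = make_address(0x3F2B3, 5)  # line A, which replaces F

apply_fill(filt, PendingRequest(addr_f, core=0, way=1))
apply_fill(filt, PendingRequest(addr_a, core=0, way=1))
print(filt[(0, 5)])  # way 1 now holds A's partial tag (0xB3 = 179)
```

Recording the way in the pending request means the remote filters can replace F's entry directly, without searching for it.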

12 Methodology. SESC simulator; 4-way CMP; SPLASH-2 benchmarks; CACTI 6.0. L2: 4 MB, 4-banked, 16-way, 10-cycle latency. 6-cycle arbitration + 2-cycle core-to-controller latency. Crossbar data network. MESI protocol. DL1/IL1: 4-way/2-way, 64 KB/32 KB, 3-cycle latency. 64 B cache line. 500-cycle memory access.
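To relate the 8-bit partial tag to the L1 configuration above, the address field widths can be worked out as in this sketch. The function name is hypothetical and the 32-bit address width is an assumption; the slides do not state it.

```python
from typing import Tuple

def cache_geometry(size_bytes: int, ways: int, line_bytes: int,
                   addr_bits: int = 32) -> Tuple[int, int, int, int]:
    """Return (sets, offset bits, index bits, tag bits).

    Assumes power-of-two sizes and an addr_bits-wide address
    (32 is an assumption, not a figure from the slides).
    """
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - offset_bits - index_bits
    return sets, offset_bits, index_bits, tag_bits

dl1 = cache_geometry(64 * 1024, ways=4, line_bytes=64)
il1 = cache_geometry(32 * 1024, ways=2, line_bytes=64)
print(dl1)  # (256, 6, 8, 18)
print(il1)  # (256, 6, 8, 18)
```

Under the 32-bit assumption, an 8-bit partial tag covers 8 of the 18 tag bits of a DL1 line.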

13 Performance. Average: 2.9%.

14 Bandwidth. Average: 78.5%.

15 Tag Power. Average: 52%.

16 Discussion. Why do benchmarks show different performance improvements? Different cache miss frequency; different early miss detection frequency; not all cache misses are on the critical path. Filter overhead: Timing: 1 cycle. Power: 78.5% of a single tag array access.

17 Summary. PTC: Using a subset of the tag bits to improve bandwidth and power efficiency. Results: Performance: 2.9%. Tag Power: 52%. Bandwidth: 78.5%.

19 Global vs. Local Miss. [Diagram: two systems of D$ caches connected through an interconnect to the upper-level cache, each asked "Have B?". Global miss: the answer is NO. Local miss: the answers are NO, YES, NO.] Local miss detection: better power/bandwidth profile. Remote miss detection (source-based approach) vs. (destination-based filter).

20 Partial tag lookup: global miss

21 Partial tag lookup: local miss

