
Counting Stream Registers: An Efficient and Effective Snoop Filter Architecture Aanjhan Ranganathan (ETH Zurich), Ali Galip Bayrak (EPFL), Theo Kluter.


1 Counting Stream Registers: An Efficient and Effective Snoop Filter Architecture Aanjhan Ranganathan (ETH Zurich), Ali Galip Bayrak (EPFL), Theo Kluter (BFH), Philip Brisk (UC Riverside), Edoardo Charbon (TU Delft), Paolo Ienne (EPFL)

2 Multicore Embedded Systems
Increasing number of multiprocessor-based embedded systems.
Requirement: low energy consumption with little compromise on performance.
Significant energy is consumed in the memory subsystem (caches, shared bus, main memory).

3 Symmetric Multiprocessor System
[Figure: CPU 1 to CPU n, each with a private instruction cache (I$) and data cache (D$), sharing a common memory]

4 Cache Coherency Problem
[Figure: the same SMP diagram, CPU 1 to CPU n with private I$ and D$ sharing a common memory]

5 Snoopy Hardware Coherence Protocols
[Figure: the same SMP diagram, CPU 1 to CPU n with private I$ and D$ sharing a common memory]
Snoop misses (lookups in caches that do not hold the requested line) consume excessive energy.

6 Snoop Filters
[Figure: the same SMP diagram with a snoop filter (SF) added]
A snoop filter lookup costs less energy than a cache lookup.
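A back-of-the-envelope energy model makes this trade-off concrete. It is a simplified sketch, not the CACTI-based accounting used in the experiments: assume N snoops reach a given cache, each cache tag lookup costs E_c, each snoop filter lookup costs E_f, and the filter rejects a fraction φ of the snoops (only snoops that would miss can be rejected). All symbols are introduced here for illustration.

```latex
\begin{aligned}
E_{\text{no filter}}   &= N\,E_c \\
E_{\text{with filter}} &= N\,E_f + (1-\varphi)\,N\,E_c \\
E_{\text{no filter}} - E_{\text{with filter}} &= N\,(\varphi\,E_c - E_f)
\end{aligned}
```

Under this model the filter saves energy only when φ > E_f / E_c, which is why a filter lookup has to be much cheaper than a cache lookup.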

7 Snoop Filters in Prior Art
JETTY (include, exclude and hybrid variants)
–Expensive in area for an embedded system.
–The energy consumed by the JETTYs themselves is significant.
Stream Registers
–Present in IBM's Blue Gene supercomputer.
–Inclusive filter.
–Use a base and mask register pair to track the cached lines.

8 Stream Registers
[Figure: example stream register table with columns Base, Mask and Valid, updated as addresses 0b1001 and 0b1010 are inserted; an unused entry has Valid = 0]
There is no general mechanism to remove an address from a stream register (SR) without compromising correctness.
After both insertions, addresses of the form 10XX result in a snoop filter hit.
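The base/mask behaviour on this slide can be modelled in a few lines of Python. This is a minimal sketch assuming the usual stream register formulation: mask bits are 1 where every tracked address agrees with the base, and a lookup hits when the address matches the base on all unmasked bits. The class name, method names and the 4-bit address width are hypothetical, chosen to match the slide's example.

```python
class StreamRegister:
    """Hypothetical model of a single stream register (base, mask, valid)."""

    def __init__(self, width: int = 4) -> None:
        self.full_mask = (1 << width) - 1  # all address bits significant
        self.base = 0
        self.mask = self.full_mask
        self.valid = False

    def insert(self, addr: int) -> None:
        """Track a newly cached address, widening the register if needed."""
        if not self.valid:
            # First address: the register matches exactly this address.
            self.base = addr
            self.mask = self.full_mask
            self.valid = True
        else:
            # Clear every mask bit on which the new address differs from the base.
            self.mask &= ~(self.base ^ addr) & self.full_mask

    def may_contain(self, addr: int) -> bool:
        """Filter hit: the address may be cached, so the cache must be snooped."""
        return self.valid and (addr & self.mask) == (self.base & self.mask)

    # Deliberately no remove(): cleared mask bits cannot be restored without
    # knowing every other address the register still has to cover.


sr = StreamRegister()
sr.insert(0b1001)
sr.insert(0b1010)
print(bin(sr.mask))  # 0b1100: the register now covers 10XX
print([bin(a) for a in range(16) if sr.may_contain(a)])  # ['0b1000', '0b1001', '0b1010', '0b1011']
```

In this model, evicting line 0b1001 from the cache changes nothing in the register, so it keeps reporting hits for that line; that is the degradation over time listed among the drawbacks.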

9 Drawbacks of Stream Register based Snoop Filters
No efficient way to update the registers when a line is removed from the cache
–Filtering performance degrades over time
–Additional logic units have been introduced (e.g., cache wrap detection), but they are not efficient

10 Our Contribution
Counting Stream Registers
–Eliminate the cache wrap detection logic
–Use a counter to track cache lines
–More robust to workload variability
–Better or similar energy savings compared to SRs

11 Counting Stream Registers
[Figure: example counting stream register table with columns Base, Mask and Counter; the counter goes from 0x01 to 0x02 as addresses 0b1001 and 0b1010 are inserted; an unused entry has Counter = 0]
Removes the need for extra logic such as cache wrap detection and active register history.
Invalidated cache lines can be tracked by decrementing the counter.
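Adding a counter to the same base/mask model gives a sketch of a counting stream register. It is a minimal model under stated assumptions, not the authors' hardware: the counter is incremented once per cache-line fill mapped to the register and decremented once per eviction or invalidation of such a line, and when it returns to zero the register is reset so that it stops producing hits. The class and method names, the 4-bit width and the reset-on-zero policy are illustrative assumptions; counter saturation is not modelled.

```python
class CountingStreamRegister:
    """Hypothetical model of a counting stream register (base, mask, counter)."""

    def __init__(self, width: int = 4) -> None:
        self.full_mask = (1 << width) - 1
        self.base = 0
        self.mask = self.full_mask
        self.count = 0  # number of currently cached lines tracked by this register

    def insert(self, addr: int) -> None:
        """Called once when a line mapped to this register is filled into the cache."""
        if self.count == 0:
            # Empty register: start tracking exactly this address.
            self.base = addr
            self.mask = self.full_mask
        else:
            # Clear mask bits on which the new address differs from the base.
            self.mask &= ~(self.base ^ addr) & self.full_mask
        self.count += 1

    def remove(self) -> None:
        """Called once when a tracked line is evicted or invalidated."""
        if self.count == 0:
            raise ValueError("remove() without a matching insert()")
        self.count -= 1
        if self.count == 0:
            # No tracked lines remain: discard the (possibly widened) mask.
            self.mask = self.full_mask

    def may_contain(self, addr: int) -> bool:
        """Filter hit: the counter is non-zero and the masked address matches."""
        return self.count > 0 and (addr & self.mask) == (self.base & self.mask)


csr = CountingStreamRegister()
csr.insert(0b1001)
csr.insert(0b1010)
print(csr.count, csr.may_contain(0b1011))  # 2 True
csr.remove()  # line 0b1001 evicted
print(csr.count, csr.may_contain(0b1011))  # 1 True
csr.remove()  # line 0b1010 evicted
print(csr.count, csr.may_contain(0b1011))  # 0 False
```

While one tracked line remains cached, the register keeps its widened 10XX mask and may still produce false hits; the difference from a plain stream register is that it clears itself once the counter reaches zero, without any cache wrap detection.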

12 Snoop Filter Architecture
[Figure: fields of a memory address as used by the snoop filter: an index into the direct-mapped snoop filter table; a set of cache lines grouped into a page; a field used for comparison with the base register]
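One plausible reading of these address fields can be sketched in Python. The field order (page offset in the low bits, then the table index, with the remaining high bits compared against the base register) and every size below are assumptions for illustration; the slide does not give them.

```python
# Hypothetical parameters: the slide does not specify any of these sizes.
# All three must be powers of two for the bit-width arithmetic below.
LINE_BYTES = 32       # bytes per cache line
LINES_PER_PAGE = 8    # cache lines grouped into one snoop filter page
SF_ENTRIES = 16       # entries in the direct-mapped snoop filter table

PAGE_BITS = (LINE_BYTES * LINES_PER_PAGE).bit_length() - 1  # log2(256) = 8
INDEX_BITS = SF_ENTRIES.bit_length() - 1                     # log2(16) = 4


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (compare_field, table_index, page_offset) for a byte address."""
    page_offset = addr & ((1 << PAGE_BITS) - 1)                   # line and byte within the page
    table_index = (addr >> PAGE_BITS) & ((1 << INDEX_BITS) - 1)   # selects the table entry
    compare_field = addr >> (PAGE_BITS + INDEX_BITS)              # matched against base/mask
    return compare_field, table_index, page_offset


print([hex(f) for f in split_address(0x12345678)])  # ['0x12345', '0x6', '0x78']
```

Under these assumptions, all lines of one page select the same table entry and differ only in the ignored page offset.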

13 Experimental Analysis
Virtex-II FPGA running OpenRISC soft cores
–Configurable number of processors, associativity and size of the data and instruction caches, cache type and coherence protocol
EEMBC MultiBench benchmarks
CACTI 5.3 energy model
–Total memory subsystem energy accounts for main memory read/write energy, data and instruction cache read/write energy, leakage and snoop energy

14 Cache Design Space Exploration

15 Results: Filtering Percentage
CSRs achieve a higher filtering percentage with fewer registers.

16 Analysis: RGB2CMYK Benchmark

17 Discussion: Energy Consumption
For most benchmarks, snoop energy was around 8-10% of the total memory subsystem energy without snoop filters.
CSR filters were more effective for certain benchmarks (H.264, image rotation)
–Better filtering performance with a smaller number of stream registers.
Small reduction in overall energy
–Platform limited to 32 MB of off-chip SDRAM
–No complex data sharing and few cases of multiple producers of the same data
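A simple bound shows why even good filtering yields only a small overall reduction here. It is a simplified sketch with hypothetical symbols: N snoops each cost a cache lookup E_c or a filter lookup E_f, the filter rejects a fraction φ of them, and s is the snoop share of the total memory subsystem energy E_tot without filtering, so that N E_c = s E_tot.

```latex
\begin{aligned}
\Delta E &= N\,(\varphi\,E_c - E_f) \\
\frac{\Delta E}{E_{\text{tot}}} &= s\left(\varphi - \frac{E_f}{E_c}\right) \le s\,\varphi \le s
\end{aligned}
```

With s around 0.08 to 0.10, as reported above, the overall saving cannot exceed roughly 8-10% of the memory subsystem energy, even with perfect filtering.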

18 Summary
Introduced a snoop filter architecture based on counting stream registers
–Lower hardware complexity and the ability to track cache line invalidations
Experimental evaluation shows a better filtering percentage than stream registers, with less performance variation across workloads.

