
HK-NUCA: Boosting Data Searches in Dynamic NUCA for CMPs Javier Lira ψ Carlos Molina ф Antonio González ψ,λ λ Intel Barcelona Research Center Intel Labs.


1 HK-NUCA: Boosting Data Searches in Dynamic NUCA for CMPs
Javier Lira ψ, Carlos Molina ф, Antonio González ψ,λ
λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com
ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net
ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu
IPDPS 2011, Anchorage, AK (USA), May 17, 2011

2 Introduction
[Figure: CMP with eight cores (Core 0 to Core 7) around a shared NUCA cache]
S-NUCA (Static NUCA)
  One possible location in the NUCA
  Simple
  Trivial search of data
  Does not leverage locality
D-NUCA (Dynamic NUCA)
  Multiple candidate banks
  Migration increases complexity
  Not easy to find data
  Optimizes cache access latency

3 Motivation
Significant performance potential
Limited by the access scheme

4 Access schemes in D-NUCA
Directory is not an alternative
  Needs to update block location on every migration
  Reduces D-NUCA potential
  Potential bottleneck
Algorithmic-based schemes

  Scheme     Performance / Energy
  Serial     Low
  Parallel   High

Partitioned multicast (hybrid access scheme)
  1st step: Local bank + central banks (9 banks)
  2nd step: The other cores' local banks
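As a rough illustration of the two-step partitioned multicast listed above, the following Python sketch counts the messages sent per request. It assumes 16 candidate banks per block (one local bank per core plus 8 central banks), which is consistent with the 9 banks probed in the first step on an 8-core CMP. The function name and the bank numbering are hypothetical and do not come from the slides.

```python
"""Message count for a two-step partitioned multicast (illustrative sketch)."""

NUM_CORES = 8                        # Core 0 to Core 7, as drawn on the slides
LOCAL_BANKS = list(range(0, 8))      # assumption: candidate bank i is local to core i
CENTRAL_BANKS = list(range(8, 16))   # assumption: 8 central candidate banks


def partitioned_multicast(core, holder):
    """Return (found, messages_sent) for a request issued by `core`.

    `holder` is the candidate bank that holds the block, or None on a miss.
    """
    # 1st step: the requester's local bank plus the central banks (9 banks).
    first_step = [LOCAL_BANKS[core]] + CENTRAL_BANKS
    messages = len(first_step)
    if holder in first_step:
        return True, messages

    # 2nd step: the local banks of the other cores (7 banks).
    second_step = [b for b in LOCAL_BANKS if b != LOCAL_BANKS[core]]
    messages += len(second_step)
    return holder in second_step, messages
```

Under these assumptions a hit in the first step costs 9 messages, while a hit in the second step or a miss costs 16.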

5 Serial vs Parallel
Reducing the number of messages required per access is crucial

6 Objectives
Optimize NUCA features
  Provide fast access when the data is near the requesting core
Reduce network contention
  Crucial in both performance and energy

7 Outline
Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

8 Methodology
Simulation tools:
  Simics + GEMS
  CACTI v6.0
Two scenarios:
  Multi-programmed: mix of SPEC CPU2006
  Parallel applications: PARSEC

  Parameter              Value
  Number of cores        8 (UltraSPARC IIIi)
  Frequency              1.5 GHz
  Main memory size       4 GBytes
  Memory bandwidth       512 Bytes/cycle
  Private L1 caches      8 x 32 KBytes, 2-way
  Shared L2 NUCA cache   8 MBytes, 128 banks
  NUCA bank              64 KBytes, 8-way
  L1 cache latency       3 cycles
  NUCA bank latency      4 cycles
  Router delay           1 cycle
  On-chip wire delay     1 cycle
  Main memory latency    250 cycles (from core)

9 Baseline architecture
D-NUCA cache
  8 MBytes
  128 banks
  Bank: 64 KBytes, 8-way
Migration scheme: Gradual Promotion
Replacement: LRU
Access: Partitioned Multicast
[Figure: CMP with eight cores (Core 0 to Core 7)]

10 Outline
Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

11 HK-NUCA
Home Knows where to find data in the NUCA cache
Home bank knows which other banks have at least one data block that it manages
There is an HK-PTR per cache set in all banks
HK-PTR: 0010110000001010
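The HK-PTR shown above can be read as a bit vector with one bit per candidate bank. The short Python sketch below decodes such a value into bank indices. It assumes that the leftmost bit corresponds to bank 0, which the slide does not specify, and the function name is hypothetical.

```python
def banks_from_hk_ptr(hk_ptr):
    """Return the indices of the candidate banks flagged in an HK-PTR string.

    Assumption: the leftmost character is the bit for candidate bank 0.
    """
    return [index for index, bit in enumerate(hk_ptr) if bit == "1"]


example = "0010110000001010"  # HK-PTR value shown on the slide
flagged = banks_from_hk_ptr(example)
print(flagged)  # prints [2, 4, 5, 12, 14]
```

With this bit ordering, only 5 of the 16 candidate banks would need to be probed for the example value.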

12 HK-NUCA
Search stages for a request from Core 0:
  (1) Fast access
  (2) Call Home
  (3) Parallel access
HK-PTR: 0010110000001010
[Figure: CMP with eight cores (Core 0 to Core 7)]
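The three stages named on the slide above can be sketched as a small Python simulation. The slide gives only the stage names, so the details here are assumptions: stage 1 probes a single bank closest to the requester, stage 2 probes the home bank and reads its HK-PTR, and stage 3 probes the remaining banks flagged in that HK-PTR. All names are hypothetical.

```python
def hk_nuca_lookup(tag, core, banks, hk_ptr, home):
    """Search for `tag`; return (bank_or_None, messages_sent).

    banks  : list of sets, banks[i] holds the tags stored in candidate bank i
    hk_ptr : list of 0/1 flags kept by the home bank, one per candidate bank
    home   : index of the home bank for this block
    Assumption: candidate bank `core` is the bank closest to the requester.
    """
    # (1) Fast access: probe the bank closest to the requesting core.
    messages = 1
    if tag in banks[core]:
        return core, messages

    # (2) Call Home: ask the home bank, unless stage 1 already reached it.
    if home != core:
        messages += 1
        if tag in banks[home]:
            return home, messages

    # (3) Parallel access: probe only the banks flagged in the HK-PTR,
    # skipping the banks that were already checked.
    targets = [b for b, flag in enumerate(hk_ptr)
               if flag and b not in (core, home)]
    messages += len(targets)
    for bank in targets:
        if tag in banks[bank]:
            return bank, messages
    return None, messages  # miss in the NUCA cache


# Usage with the HK-PTR value from the slide (leftmost bit = bank 0, assumed).
hk_ptr = [int(bit) for bit in "0010110000001010"]
banks = [set() for _ in range(16)]
banks[12].add("A")
print(hk_nuca_lookup("A", core=0, banks=banks, hk_ptr=hk_ptr, home=5))
```

In this sketch a miss costs at most two messages plus one per flagged bank, instead of one message per candidate bank.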

13 Managing Home knowledge
Actions that trigger an update of the HK-PTR:
  New data enters the cache
  Eviction from the NUCA cache
  Migration movements
Migrations are synchronized with HK-PTR updates
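The three update triggers listed above can be modelled with a small Python class. This is a sketch under a loud assumption: it keeps a per-bank block counter so that a bit is cleared only when the last block leaves a bank. The slides do not describe how the hardware decides when to clear a bit, and the class and method names are hypothetical.

```python
class HomeKnowledge:
    """Tracks, for one home set, which candidate banks hold its blocks."""

    def __init__(self, num_banks=16):
        # Assumption: per-bank block counts (a modelling convenience only).
        self.blocks_in_bank = [0] * num_banks

    @property
    def hk_ptr(self):
        """Bit string with a 1 for every bank holding at least one block."""
        return "".join("1" if count else "0" for count in self.blocks_in_bank)

    def insert(self, bank):
        """New data enters the cache in `bank`."""
        self.blocks_in_bank[bank] += 1

    def evict(self, bank):
        """A block is evicted from the NUCA cache."""
        if self.blocks_in_bank[bank] == 0:
            raise ValueError("no tracked block in that bank")
        self.blocks_in_bank[bank] -= 1

    def migrate(self, src, dst):
        """A block moves from `src` to `dst`; both bits are updated together."""
        self.evict(src)
        self.insert(dst)


hk = HomeKnowledge()
hk.insert(2)
hk.insert(2)
hk.insert(12)
hk.migrate(2, 4)
print(hk.hk_ptr)
```

Updating source and destination in one `migrate` call mirrors the slide's point that migrations are synchronized with HK-PTR updates.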

14 Overheads
Hardware
  Implementation
  HK-PTRs
Network
  Home knowledge updates

  NUCA cache   8 MBytes
  HK-PTRs      32 KBytes
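The 32 KBytes figure above can be cross-checked with a short calculation. The sketch uses the cache parameters from the methodology slide and the 16-bit HK-PTR example. The 64-byte line size is an assumption, since the slides do not state it; with that value the result matches the reported storage.

```python
NUCA_BYTES = 8 * 1024 * 1024   # 8 MBytes shared L2 NUCA cache
NUM_BANKS = 128                # banks in the NUCA cache
WAYS = 8                       # bank associativity
LINE_BYTES = 64                # assumption: line size is not given on the slides
HK_PTR_BITS = 16               # length of the HK-PTR example on the slides

bank_bytes = NUCA_BYTES // NUM_BANKS                # 64 KBytes per bank
sets_per_bank = bank_bytes // (LINE_BYTES * WAYS)   # 128 sets per bank
total_sets = sets_per_bank * NUM_BANKS              # 16384 sets in total
hk_ptr_bytes = total_sets * HK_PTR_BITS // 8        # one HK-PTR per cache set
overhead = hk_ptr_bytes / NUCA_BYTES

print(hk_ptr_bytes // 1024, "KBytes")               # prints 32 KBytes
print(f"{overhead:.2%} of the NUCA capacity")       # prints 0.39% of the NUCA capacity
```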

15 Outline
Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

16 Performance results
Overall performance improvement of 4-6%
[Figure annotations]
  Workloads with high miss rate
  Low miss rate, but high hit rate in the first two HK-NUCA stages
  Low miss rate, high hit rate in the parallel access stage of HK-NUCA

17 HK-NUCA accuracy
85% of memory requests send fewer than 6 messages to the NUCA

18 On-chip network traffic

  Access scheme           Avg. messages sent per request
  Partitioned Multicast   10.03
  HK-NUCA (3-steps)       3.82
  HK-NUCA (2-steps)       4.06
  Perfect Search          1
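The relative traffic reduction implied by the averages above can be derived with a few lines of Python. The percentages are computed here from the slide's numbers and are not stated in the talk; the dictionary and variable names are hypothetical.

```python
# Average messages sent per request, copied from the slide.
avg_messages = {
    "Partitioned Multicast": 10.03,
    "HK-NUCA (3-steps)": 3.82,
    "HK-NUCA (2-steps)": 4.06,
    "Perfect Search": 1.0,
}

baseline = avg_messages["Partitioned Multicast"]
# Fraction of messages saved relative to partitioned multicast.
reduction = {name: 1 - value / baseline for name, value in avg_messages.items()}

for name, saved in reduction.items():
    print(f"{name}: {saved:.1%} fewer messages than partitioned multicast")
```

By this calculation the 3-step and 2-step variants send roughly 62% and 60% fewer messages than partitioned multicast.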

19 Energy consumption results
HK-NUCA reduces dynamic energy consumption by more than 50%

20 Outline
Introduction and motivation
Methodology
HK-NUCA
Results
Conclusions

21 Conclusions
D-NUCA makes it possible to exploit the non-uniformity of NUCA caches
D-NUCA benefits are restricted by the access scheme used
HK-NUCA is an access scheme for D-NUCA organizations
  Allows fast accesses to data that is near the requesting core
  Home knowledge reduces miss resolution time and network contention
  Outperforms the best-performing access scheme by 6%
  Reduces dynamic energy consumption by 50%

22 HK-NUCA: Boosting data searches in Dynamic NUCA for CMPs
Questions?

23 Migration is not the problem
[Figure: S-NUCA vs. D-NUCA]
Access scheme is the main limitation in D-NUCA

