Performance Analysis of NUCA Policies for CMPs Using Parsec v2.0 Benchmark Suite Javier Lira ψ Carlos Molina ф Antonio González λ λ Intel Barcelona Research.

1 Performance Analysis of NUCA Policies for CMPs Using Parsec v2.0 Benchmark Suite
Javier Lira ψ, Carlos Molina ф, Antonio González λ
λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com
ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net
ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu
XX Jornadas de Paralelismo, A Coruña (Spain) – September 17, 2009

2 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

3 Introduction CMPs have emerged as a dominant paradigm in system design: 1. They sustain performance improvement while reducing power consumption. 2. They take advantage of thread-level parallelism. Commercial CMPs are currently available. CMPs incorporate larger, shared last-level caches. Wire delay is a key constraint.

4 NUCA Non-Uniform Cache Architecture (NUCA) was first proposed at ASPLOS 2002 by Kim et al. [1]. NUCA divides a large cache into smaller, faster banks. Banks close to the cache controller have lower latencies than banks farther away. [1] C. Kim, D. Burger and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS 2002.

5 NUCA Policies Bank Placement Policy, Bank Access Policy, Bank Replacement Policy, Bank Migration Policy

6 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

7 Methodology
Simulation tools: Simics + GEMS
CACTI v6.0
PARSEC v2.0 Benchmark Suite
Number of cores: 8, 4-way SMT
Branch predictor: YAGS
Instr. window / ROB: 64 / 128 entries
Block size: 64 Bytes
L1 cache (Instr/Data): 32 KBytes, 2-way
L2 cache (NUCA): 8 MBytes, 256 banks
NUCA bank: 32 KBytes, 8-way
L1 latency: 3 cycles
NUCA bank latency: 4 cycles
Router latency: 1 cycle
Wire delay: 1 cycle
Off-chip memory latency: 350 cycles (from core)
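The latency parameters above can be combined into a rough per-bank access cost. The following is a minimal sketch, assuming each network hop costs one router cycle plus one wire cycle in each direction; the slides give the per-component latencies but not the hop model, and the function name `nuca_access_latency` is hypothetical. It also checks that 256 banks of 32 KBytes add up to the stated 8 MBytes.

```python
# Parameters taken from the methodology slide.
NUM_BANKS = 256
BANK_SIZE_KB = 32
BANK_LATENCY = 4      # cycles to access one NUCA bank
ROUTER_LATENCY = 1    # cycles per router traversed
WIRE_DELAY = 1        # cycles per link traversed


def nuca_access_latency(hops: int) -> int:
    """Round-trip cycles to reach a bank `hops` links away.

    Assumed model (not from the slides): every hop costs one router
    traversal plus one wire traversal, paid on the way in and on the way out.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    one_way = hops * (ROUTER_LATENCY + WIRE_DELAY)
    return 2 * one_way + BANK_LATENCY


# Capacity check: 256 banks * 32 KBytes = 8192 KBytes = 8 MBytes.
total_cache_mb = NUM_BANKS * BANK_SIZE_KB // 1024
```

Under this assumed model a bank adjacent to the controller costs only the bank latency, while a bank three hops away costs three times the hop cost in each direction on top of it, which is the non-uniformity the NUCA policies try to exploit.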

8 Baseline NUCA cache architecture: 8 cores, 256 banks. [2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO 2004.

9 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

10 Bank Placement Policy [Figure: 1B + Static, 16B + Static, 16B + Local]

11 Bank Placement Policy 1B + Static placement provides a fair distribution of data. 16B configurations concentrate data in a few banks. Placement and migration policies are strictly correlated.
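The three placement configurations can be sketched as address-to-bank mappings. This is an illustrative sketch only, and every detail of the mapping is an assumption: it reads "1B" as one candidate bank per block, "16B" as a bankset of 16 candidate banks (one per cluster of 16 banks), "Static" as an initial bank chosen from address bits, and "Local" as an initial bank in the requesting core's own cluster. The function names and the core-to-cluster numbering are hypothetical.

```python
NUM_BANKS = 256
BANKSET_SIZE = 16                       # "16B": assumed 16 candidate banks per block
NUM_BANKSETS = NUM_BANKS // BANKSET_SIZE
BLOCK_BYTES = 64                        # block size from the methodology slide


def block_index(addr):
    """Drop the block-offset bits of a byte address."""
    return addr // BLOCK_BYTES


def place_1b_static(addr):
    """1B + Static (assumed): the address alone selects one of the 256 banks."""
    return block_index(addr) % NUM_BANKS


def candidate_banks(addr):
    """16B (assumed): the address selects a bankset; entry k lies in cluster k."""
    bankset = block_index(addr) % NUM_BANKSETS
    return [cluster * NUM_BANKSETS + bankset for cluster in range(BANKSET_SIZE)]


def place_16b_static(addr):
    """16B + Static (assumed): further address bits pick the initial bank."""
    way = (block_index(addr) // NUM_BANKSETS) % BANKSET_SIZE
    return candidate_banks(addr)[way]


def place_16b_local(addr, core):
    """16B + Local (assumed): start in the requester's cluster (core i -> cluster i)."""
    return candidate_banks(addr)[core]
```

With a mapping like this, 1B + Static spreads consecutive blocks over all 256 banks, whereas the Local variant places every block a core touches in that core's own cluster, which is one way data can end up concentrated in a few banks.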

12 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

13 Bank Access Policy [Figure: Serial, 9P + 7P, Parallel]

14 Bank Access Policy Power efficiency vs. performance. 9P + 7P is a trade-off, but it is still far from the performance potential. These results suggest there is broad room for improvement in this policy.
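The power versus performance trade-off between the access schemes can be illustrated by counting lookup steps (a latency proxy) and banks probed (an energy proxy). This is a minimal sketch, assuming a block has 16 candidate banks, that Serial probes them one at a time, that Parallel probes all 16 at once, and that 9P + 7P probes 9 banks in parallel and then the remaining 7 on a miss. The slides do not say which banks form the first group, so the order of `banks` here is arbitrary and `lookup_cost` is a hypothetical helper.

```python
def lookup_cost(candidates, holder, phases):
    """Return (steps, banks_probed) needed to find `holder` among `candidates`.

    `phases` lists how many banks are probed in parallel at each step.
    `holder` is the bank that holds the block, or None if it is not cached.
    """
    assert sum(phases) == len(candidates)
    steps = probed = start = 0
    for width in phases:
        group = candidates[start:start + width]
        steps += 1
        probed += len(group)
        if holder in group:
            break               # found: later phases are never issued
        start += width
    return steps, probed


banks = list(range(16))         # assumed probe order, closest bank first
SERIAL = [1] * 16               # one bank per step
TWO_PHASE = [9, 7]              # "9P + 7P"
PARALLEL = [16]                 # all banks in one step
```

For a block held in the thirteenth bank of the probe order, `lookup_cost(banks, 12, SERIAL)` needs 13 steps but probes only 13 banks, `PARALLEL` needs 1 step but always probes 16, and `TWO_PHASE` sits in between on steps; when the block is in the first group, `TWO_PHASE` probes only 9 banks in a single step.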

15 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

16 Bank Migration Policy [Figure: Static, Gradual + Swapping, Gradual + Replication]

17 Bank Migration Policy Replication reduces the effective size of the cache. Migration approaches concentrate data blocks in a few banks. The static approach distributes data blocks fairly across the whole cache. Placement and migration policies are strictly correlated.
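Gradual migration with swapping can be sketched for a single requester as follows. This is a simplified illustration, not the simulated design: it assumes one block per bank position, a bankset ordered from closest (index 0) to farthest, and that each hit moves the block exactly one bank closer by exchanging it with the block already there. The function name is hypothetical.

```python
def access_with_gradual_swap(bankset, block):
    """Serve a hit on `block` and migrate it one bank closer to the requester.

    `bankset` is a list ordered from the closest bank (index 0) to the
    farthest. Returns the index that served the hit, or None on a miss.
    """
    if block not in bankset:
        return None
    pos = bankset.index(block)
    if pos > 0:
        # Swap with the neighbour one step closer; no block is evicted.
        bankset[pos - 1], bankset[pos] = bankset[pos], bankset[pos - 1]
    return pos


bankset = ["A", "B", "C", "D"]
first_hit = access_with_gradual_swap(bankset, "D")   # served from the farthest bank
second_hit = access_with_gradual_swap(bankset, "D")  # now one bank closer
```

Swapping keeps exactly one copy of each block. A replication variant would instead leave the original in place and create a second copy closer to the requester, which is consistent with the slide's point that replication reduces the effective cache size.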

18 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

19 Bank Replacement Policy [Figure: Zero-copy, One-copy, Last Bank]

20 Bank Replacement Policy Giving a second chance to evicted data blocks provides a significant performance gain. Last Bank is a promising mechanism, but it is limited by its small size. Further exploration of this policy is required.
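The "second chance" idea behind Last Bank can be sketched as a small store that catches blocks evicted from the NUCA banks before they leave the chip. This is a minimal sketch under assumptions: the slides do not describe the mechanism's internals, so the LRU ordering, the `LastBank` class, and its method names are all hypothetical.

```python
from collections import OrderedDict


class LastBank:
    """Small LRU store for blocks evicted from the NUCA banks (assumed behavior)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()     # address -> data, oldest entry first

    def insert_evicted(self, addr, data):
        """Keep an evicted block; drop the oldest entry when the store is full."""
        self.blocks[addr] = data
        self.blocks.move_to_end(addr)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)

    def lookup(self, addr):
        """On a hit, hand the block back and remove it here; None on a miss."""
        return self.blocks.pop(addr, None)


last_bank = LastBank(capacity=2)
for addr, data in [(0x100, "a"), (0x140, "b"), (0x180, "c")]:
    last_bank.insert_evicted(addr, data)
```

With a capacity of 2, the third eviction pushes out the first block, so a later request for it still goes off-chip. Raising `capacity` shows why a small Last Bank limits the benefit and why the unbounded case is an upper bound.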

21 Outline Introduction Methodology Analysis of NUCA policies Bank Placement Policy Bank Access Policy Bank Migration Policy Bank Replacement Policy Conclusions

22 Conclusions NUCA is characterized by four policies. NUCA policies are related. Static placement with no migration: good trade-off. Bank placement and bank migration are strictly correlated. Bank access: power efficiency vs. performance. Bank replacement: performance (unbounded last bank). Still room for improvement in all policies.

23 Performance Analysis of NUCA Policies for CMPs Using Parsec v2.0 Benchmark Suite Questions?

