1 Presented by 이상욱 (Department of Computer Education) Published in: IEEE Computer Architecture Letters, Vol. 10, No. 1 Issue Date: January-June 2011 Publisher: IEEE Authors: Omer Khan (Massachusetts Institute of Technology, Cambridge, MA, USA), Mieszko Lis, Yildiz Sinangil, Srinivas Devadas DCC: A Dependable Cache Coherence Multicore Architecture

2 Table of Contents 1. INTRODUCTION 2. CACHE COHERENCE ARCHITECTURES 3. DEPENDABLE CACHE COHERENCE ARCHITECTURE 4. EVALUATION 5. CONCLUSION

3 1. INTRODUCTION Motivation –Snooping protocols do not scale to large core counts –Directory-based protocols require the storage overhead of directories –Today, computer architects are investing heavily in means of detecting and correcting errors

4 1. INTRODUCTION Snooping protocol –N transactions for an N-node system –All caches must watch every memory request from every other processor
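The scaling problem with broadcast snooping can be sketched with a back-of-the-envelope calculation; the request rates below are made-up illustrative numbers, not figures from the paper:

```python
# Why snooping does not scale: every memory request is broadcast, so each
# of the other N-1 caches must snoop every core's requests.
def snoop_checks(n_cores, requests_per_core):
    # Total snoop operations grow quadratically in the core count
    # when the per-core request rate stays fixed.
    return n_cores * requests_per_core * (n_cores - 1)

print(snoop_checks(4, 100))    # → 1200
print(snoop_checks(64, 100))   # → 403200
```

Going from 4 to 64 cores multiplies the snoop work by over 300x, which is why broadcast protocols are limited to small core counts.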

5 1. INTRODUCTION Directory-based protocol –Requires directory storage overhead proportional to (# lines × # processors) –The complexity of directory protocols is attributed to directory indirections
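To make the (# lines × # processors) overhead concrete, here is a rough estimate for a hypothetical full-map directory; all parameters (64 cores, 256 KB per-core cache, 64-byte lines) are illustrative assumptions, not figures from the paper:

```python
# Rough full-map directory overhead estimate.
# All parameters are illustrative, not from the paper.
cores = 64
cache_per_core = 256 * 1024   # bytes
line_size = 64                # bytes

lines_per_core = cache_per_core // line_size   # 4096 lines per core
total_lines = cores * lines_per_core           # 262144 lines chip-wide
bits_per_entry = cores                         # one presence bit per core
directory_bits = total_lines * bits_per_entry
directory_bytes = directory_bits // 8

print(lines_per_core, total_lines, directory_bytes)   # → 4096 262144 2097152
```

Under these assumptions the directory adds 2 MB of presence bits for 16 MB of cache, i.e. 12.5% storage overhead, and the per-entry cost grows linearly with the core count.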

6 1. INTRODUCTION This paper proposes a novel dependable cache coherence architecture (DCC) that combines traditional directory-based coherence (DirCC) with a novel execution-migration-based coherence architecture (EM)

7 1. INTRODUCTION EM protocol –Ensures that no writable data is ever shared among caches, and therefore does not require directories –When a thread needs to access an address cached on another core, the hardware efficiently migrates the thread’s execution context to that core

8 Baseline architecture 2. CACHE COHERENCE ARCHITECTURES Multicore chip that is fully distributed across tiles with a uniform address space shared by all tiles

9 DirCC protocol 2. CACHE COHERENCE ARCHITECTURES The directory protocol brings data to the locus of computation: when a memory instruction refers to an address that is not locally cached, the instruction stalls while the coherence protocol brings the data into the local cache Cons –Long cache-miss access latency –One address may be stored in many local caches –Many shared copies must be invalidated on a write
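The invalidation cost listed under Cons can be illustrated with a minimal directory-bookkeeping sketch; this `Directory` class, its states, and its message counting are simplified assumptions for illustration, not the protocol from the paper:

```python
# Minimal sketch of directory-based coherence (DirCC) bookkeeping:
# the directory tracks which cores hold each line, so a write must
# send one invalidation per remote sharer.
class Directory:
    def __init__(self):
        self.sharers = {}   # address -> set of core ids holding a copy
        self.owner = {}     # address -> core id with write permission

    def read(self, core, addr):
        """Core requests a shared copy; any exclusive owner is downgraded."""
        msgs = 0
        if addr in self.owner:  # downgrade the current writer to a sharer
            self.sharers.setdefault(addr, set()).add(self.owner.pop(addr))
            msgs += 1
        self.sharers.setdefault(addr, set()).add(core)
        return msgs

    def write(self, core, addr):
        """Core requests exclusive access; all other copies are invalidated."""
        victims = set(self.sharers.get(addr, set()))
        if addr in self.owner:
            victims.add(self.owner[addr])
        victims.discard(core)
        self.sharers[addr] = set()
        self.owner[addr] = core
        return len(victims)   # number of invalidation messages sent

d = Directory()
for c in range(4):
    d.read(c, 0x100)        # four cores share the line
print(d.write(0, 0x100))    # → 3 invalidations to the other sharers
```

The more cores share a line, the more invalidation traffic a single write generates, which is exactly the "many shared copies" cost on this slide.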

10 EM protocol 2. CACHE COHERENCE ARCHITECTURES The execution migration protocol always brings the computation to the data: when a memory instruction requests an address not cached by the current core, the execution context (architectural state in registers and the TLB) moves to the core that is home for that data Although the EM protocol efficiently exploits spatial locality, the opportunities for exploiting temporal locality are limited to register values

11 EM protocol 2. CACHE COHERENCE ARCHITECTURES Migration ① Core C executes a memory access for address A ② It first computes the home core H for A ③ If H=C, it is a core hit → the request for A is forwarded to the cache hierarchy If H≠C, it is a core miss → core C halts execution and migrates the architectural state to H → the thread context is loaded in the remote core H
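Steps ①–③ can be sketched as follows; the `home_core` mapping (line interleaving across 16 cores) and the contents of the migrated context are assumptions for illustration, not the paper's exact mechanism:

```python
# Sketch of the EM core-hit / core-miss decision (steps ①-③ above).
NUM_CORES = 16
LINE_SIZE = 64

def home_core(addr):
    # Assumed static address-to-home mapping: interleave cache lines
    # round-robin across the cores.
    return (addr // LINE_SIZE) % NUM_CORES

def migrate(context, dest):
    # Stand-in for the hardware context-transfer network: move the
    # architectural state (registers, TLB) to the destination core.
    context["core"] = dest

def access(current_core, addr, context):
    h = home_core(addr)              # step ②: compute the home core H
    if h == current_core:            # step ③: H=C is a core hit
        return current_core, "cache_access"
    migrate(context, dest=h)         # H≠C is a core miss: migrate to H
    return h, "migrated"

ctx = {"core": 0, "regs": [0] * 32}
core, action = access(0, 0x40, ctx)  # home of 0x40 is core 1, so core 0 misses
print(core, action)                  # → 1 migrated
```

Note that the data itself never moves; only the thread's execution context does, which is why no directory is needed to track copies.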

12 EM protocol 2. CACHE COHERENCE ARCHITECTURES Migration Execution context transfer framework

13 DCC protocol 3. DEPENDABLE CACHE COHERENCE ARCHITECTURE DirCC + EM The DCC architecture enables runtime transitions between the two coherence protocols
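A minimal sketch of what a runtime protocol transition might look like, assuming a quiesce-then-switch sequence; the class name and the quiesce/flush steps are hypothetical illustrations, not the paper's mechanism:

```python
# Illustrative sketch of a DCC-style mode switch: a core can run under
# either coherence protocol and transition between them at a safe point.
class DCCCore:
    def __init__(self):
        self.mode = "DirCC"          # assumed default coherence protocol

    def transition(self, new_mode):
        if new_mode == self.mode:
            return
        self.quiesce()               # drain in-flight coherence traffic
        self.flush_private_state()   # e.g. write back dirty shared lines
        self.mode = new_mode

    def quiesce(self):
        pass                         # placeholder for draining the NoC

    def flush_private_state(self):
        pass                         # placeholder for cache writeback

core = DCCCore()
core.transition("EM")
print(core.mode)                     # → EM
```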

14 Default processor configuration 4. EVALUATION

15 LU 4. EVALUATION SPLASH-2 LU_NON_CONTIGUOUS –Read/write data sharing, which causes mass evictions DirCC –Capacity and coherence miss rate: 9% –AML: 35.2 cycles EM –Core miss rate: 65% –Hops per migration: 12 hops –Migration overhead: 51 cycles –AML: 28.4 cycles

16 RAYTRACE 4. EVALUATION SPLASH-2 RAYTRACE –Read-only data sharing DirCC –Capacity and coherence miss rate: 1.5% –AML: 5.8 cycles EM –Core miss rate: 29% –Hops per migration: 11 hops –Migration overhead: 47 cycles –AML: 15 cycles

17 Evaluation result 4. EVALUATION Average memory latency –LU: 1.25X advantage for EM over DirCC –RAYTRACE: 2.6X advantage for DirCC over EM Depending on the data sharing patterns of an application, either cache coherence protocol can perform better than the other
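The reported ratios can be checked against the AML numbers from the two workloads (35.2 vs. 28.4 cycles for LU, 5.8 vs. 15 cycles for RAYTRACE):

```python
# Verify the reported AML advantage ratios from the evaluation slides.
dircc_lu, em_lu = 35.2, 28.4   # LU: EM wins
dircc_rt, em_rt = 5.8, 15.0    # RAYTRACE: DirCC wins

print(round(dircc_lu / em_lu, 2))   # → 1.24  (reported as 1.25X for EM)
print(round(em_rt / dircc_rt, 2))   # → 2.59  (reported as 2.6X for DirCC)
```

Both slide figures are consistent with the raw latencies to within rounding.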

18 5. CONCLUSION Today, microprocessor designers are investing heavily in means of detecting and correcting errors This paper proposed a novel dependable cache coherence architecture (DCC) that provides architectural redundancy for maintaining coherence between on-chip caches.

