DCC: A Dependable Cache Coherence Multicore Architecture. Omer Khan, Mieszko Lis, Yildiz Sinangil, Srinivas Devadas. Published in: IEEE Computer Architecture Letters, Vol. 10, No. 1, January-June 2011. Presenter: Sang-Wook Lee, Department of Computer Education.

Presentation transcript:

DCC: A Dependable Cache Coherence Multicore Architecture
Omer Khan (Massachusetts Institute of Technology, Cambridge, MA, USA), Mieszko Lis, Yildiz Sinangil, Srinivas Devadas
Published in: IEEE Computer Architecture Letters, Vol. 10, No. 1, January-June 2011
Presenter: Sang-Wook Lee, Department of Computer Education

Table of Contents
1. INTRODUCTION
2. CACHE COHERENCE ARCHITECTURES
3. DEPENDABLE CACHE COHERENCE ARCHITECTURE
4. EVALUATION
5. CONCLUSION

1. INTRODUCTION
Motivation
– Snooping protocols do not scale to large core counts
– Directory-based protocols require the storage overhead of directories
– Today, computer architects are investing heavily in means of detecting and correcting errors

1. INTRODUCTION
Snooping protocol
– N transactions for an N-node system
– All caches need to watch every memory request from every processor
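As a back-of-the-envelope illustration of the scaling problem above (my sketch, not from the paper), total snoop work grows quadratically with the number of cores:

```python
# Illustrative sketch (not from the paper): every memory request is
# broadcast on the shared bus, so all N caches must snoop it.

def snoop_traffic(num_cores, requests_per_core):
    """Total snoop events: each request is observed by every cache."""
    total_requests = num_cores * requests_per_core
    return total_requests * num_cores  # every cache watches every request

# Snoop work grows quadratically with core count:
print(snoop_traffic(4, 100))   # 1600
print(snoop_traffic(64, 100))  # 409600
```

Going from 4 to 64 cores (16x) multiplies the snoop work by 256x, which is why snooping is limited to small core counts.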

1. INTRODUCTION
Directory-based protocol
– Requires the storage overhead of directories (# lines × # processors)
– The complexity of directory protocols is attributed to directory indirections
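The "# lines × # processors" overhead can be made concrete with a small calculation. The cache size and core count below are illustrative choices, not the paper's configuration, and only presence bits are counted (state bits are omitted):

```python
# Illustrative full-map directory overhead: one presence bit per
# (cache line, core) pair, per the "# lines x # processors" formula.

def directory_bits(num_lines, num_cores):
    """Presence-bit storage for a full-map directory (state bits omitted)."""
    return num_lines * num_cores

# Example (assumed parameters): a 1 MB cache with 64 B lines, 64 cores.
lines = (1 * 1024 * 1024) // 64          # 16384 lines
print(directory_bits(lines, 64) // 8)    # 131072 bytes = 128 KB of presence bits
```

The overhead scales linearly with core count, which is the scalability cost the slide refers to.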

1. INTRODUCTION
This paper proposes a novel dependable cache coherence architecture (DCC) that combines traditional directory-based coherence (DirCC) with an execution-migration-based coherence architecture (EM).

1. INTRODUCTION
EM protocol
– Ensures that no writable data is ever shared among caches, and therefore does not require directories
– When a thread needs to access an address cached on another core, the hardware efficiently migrates the thread's execution context to that core

2. CACHE COHERENCE ARCHITECTURES
Baseline architecture
– A multicore chip that is fully distributed across tiles, with a uniform address space shared by all tiles

2. CACHE COHERENCE ARCHITECTURES
DirCC protocol
– The directory protocol brings data to the locus of the computation
– When a memory instruction refers to an address that is not locally cached, the instruction stalls while the coherence protocol brings the data into the local cache
Cons
– Long cache-miss access latency
– One address may be stored in many local caches
– Many shared copies must be invalidated on a write

2. CACHE COHERENCE ARCHITECTURES
EM protocol
– The execution migration protocol always brings the computation to the data
– When a memory instruction requests an address not cached by the current core, the execution context (the architectural state in the registers and the TLB) moves to the core that is home for that data
– Although the EM protocol efficiently exploits spatial locality, the opportunities for exploiting temporal locality are limited to register values

2. CACHE COHERENCE ARCHITECTURES
EM protocol: migration
① Core C executes a memory access for address A
② It first computes the home core H for A
③ If H = C, it is a core hit → the request for A is forwarded to the cache hierarchy
   If H ≠ C, it is a core miss → core C halts execution and migrates the architectural state to H → the thread context is loaded on the remote core H
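The core-hit / core-miss decision above can be sketched as follows. The address-interleaved `home_core` mapping and the line-size and core-count parameters are illustrative assumptions; the slide does not specify the actual mapping:

```python
# Illustrative sketch of the EM access path described in the steps above.
# Assumption: the home core is derived by simple address interleaving.

LINE_SIZE = 64   # assumed cache-line size (bytes)
NUM_CORES = 64   # assumed core count

def home_core(addr):
    """Map an address to its unique home core (assumed interleaving)."""
    return (addr // LINE_SIZE) % NUM_CORES

def migrate(context, dest):
    """Ship the execution context (registers + TLB state) to core `dest`."""
    pass  # placeholder for the on-chip network transfer

def access(current_core, addr, context):
    h = home_core(addr)           # step 2: compute home core H for A
    if h == current_core:         # step 3a: core hit
        return ("cache_access", current_core)
    migrate(context, dest=h)      # step 3b: core miss -> migrate to H
    return ("migrated", h)

print(access(0, 0x00, None))  # ('cache_access', 0)
print(access(0, 0x40, None))  # ('migrated', 1)
```

Because each address has exactly one home core, no writable line ever exists in two caches, which is what lets EM dispense with directories.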

2. CACHE COHERENCE ARCHITECTURES
EM protocol: migration (figure: execution context transfer framework)

3. DEPENDABLE CACHE COHERENCE ARCHITECTURE
DCC protocol: DirCC + EM
– The DCC architecture enables runtime transitions between the two coherence protocols

4. EVALUATION
Default processor configuration (table not reproduced in this transcript)

4. EVALUATION
LU (SPLASH-2 LU_NON_CONTIGUOUS)
– Read/write data sharing, which causes mass evictions
DirCC
– Capacity and coherence misses: 9%
– AML: 35.2 cycles
EM
– Core misses: 65%
– Hops per migration: 12
– Migration overhead: 51 cycles
– AML: 28.4 cycles

4. EVALUATION
RAYTRACE (SPLASH-2)
– Read-only data sharing
DirCC
– Capacity and coherence misses: 1.5%
– AML: 5.8 cycles
EM
– Core misses: 29%
– Hops per migration: 11
– Migration overhead: 47 cycles
– AML: 15 cycles

4. EVALUATION
Evaluation result: average memory latency
– LU: 1.25× advantage for EM over DirCC
– RAYTRACE: 2.6× advantage for DirCC over EM
– Depending on an application's data-sharing patterns, either cache coherence protocol can perform better than the other
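The reported ratios can be checked directly against the AML numbers from the preceding slides:

```python
# AML (average memory latency) figures from the LU and RAYTRACE slides.
lu_dircc, lu_em = 35.2, 28.4     # cycles
ray_dircc, ray_em = 5.8, 15.0    # cycles

print(round(lu_dircc / lu_em, 2))    # 1.24 -> the ~1.25x EM advantage on LU
print(round(ray_em / ray_dircc, 2))  # 2.59 -> the ~2.6x DirCC advantage on RAYTRACE
```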

5. CONCLUSION
– Today, microprocessor designers are investing heavily in means of detecting and correcting errors
– This paper proposed a novel dependable cache coherence architecture (DCC) that provides architectural redundancy for maintaining coherence between on-chip caches