DCC: A Dependable Cache Coherence Multicore Architecture. Omer Khan, Mieszko Lis, Yildiz Sinangil, Srinivas Devadas. Published in: IEEE Computer Architecture Letters, Vol. 10, No. 1, January-June 2011. Presenter: Sang-Wook Lee, Department of Computer Education.

Presentation transcript:

DCC: A Dependable Cache Coherence Multicore Architecture
Omer Khan (Massachusetts Institute of Technology, Cambridge, MA, USA), Mieszko Lis, Yildiz Sinangil, Srinivas Devadas
Published in: IEEE Computer Architecture Letters, Vol. 10, No. 1, January-June 2011
Presenter: Sang-Wook Lee, Department of Computer Education

Table of Contents
1. INTRODUCTION
2. CACHE COHERENCE ARCHITECTURES
3. DEPENDABLE CACHE COHERENCE ARCHITECTURE
4. EVALUATION
5. CONCLUSION

1. INTRODUCTION
Motivation
– Snooping protocols do not scale to large core counts
– Directory-based protocols require the storage overhead of directories
– Today, computer architects are investing heavily in means of detecting and correcting errors

1. INTRODUCTION
Snooping protocol
– N transactions for an N-node system
– All caches need to watch every memory request from every processor
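As a back-of-the-envelope illustration of the scaling problem above (my sketch, not from the paper), total snoop work grows quadratically with the number of cores:

```python
# Illustrative sketch (not from the paper): every memory request is
# broadcast on the shared bus, so all N caches must snoop it.

def snoop_traffic(num_cores, requests_per_core):
    """Total snoop events: each request is observed by every cache."""
    total_requests = num_cores * requests_per_core
    return total_requests * num_cores  # every cache watches every request

# Snoop work grows quadratically with core count:
print(snoop_traffic(4, 100))   # 1600
print(snoop_traffic(64, 100))  # 409600
```

Going from 4 to 64 cores (16x) multiplies the snoop work by 256x, which is why snooping is limited to small core counts.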

1. INTRODUCTION
Directory-based protocol
– Requires the storage overhead of directories (# lines × # processors)
– The complexity of directory protocols is attributed to directory indirections
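The "# lines × # processors" overhead can be made concrete with a small calculation. The cache size and core count below are illustrative choices, not the paper's configuration, and only presence bits are counted (state bits are omitted):

```python
# Illustrative full-map directory overhead: one presence bit per
# (cache line, core) pair, per the "# lines x # processors" formula.

def directory_bits(num_lines, num_cores):
    """Presence-bit storage for a full-map directory (state bits omitted)."""
    return num_lines * num_cores

# Example (assumed parameters): a 1 MB cache with 64 B lines, 64 cores.
lines = (1 * 1024 * 1024) // 64          # 16384 lines
print(directory_bits(lines, 64) // 8)    # 131072 bytes = 128 KB of presence bits
```

The overhead scales linearly with core count, which is the scalability cost the slide refers to.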

1. INTRODUCTION
This paper proposes a novel dependable cache coherence architecture (DCC) that combines traditional directory-based coherence (DirCC) with an execution-migration-based coherence architecture (EM).

1. INTRODUCTION
EM protocol
– Ensures that no writable data is ever shared among caches, and therefore does not require directories
– When a thread needs to access an address cached on another core, the hardware efficiently migrates the thread's execution context to that core

2. CACHE COHERENCE ARCHITECTURES
Baseline architecture
– A multicore chip that is fully distributed across tiles, with a uniform address space shared by all tiles

2. CACHE COHERENCE ARCHITECTURES
DirCC protocol
– The directory protocol brings data to the locus of the computation
– When a memory instruction refers to an address that is not locally cached, the instruction stalls while the coherence protocol brings the data into the local cache
Cons
– Long cache-miss access latency
– One address may be stored in many local caches
– Many shared copies must be invalidated on a write

2. CACHE COHERENCE ARCHITECTURES
EM protocol
– The execution migration protocol always brings the computation to the data
– When a memory instruction requests an address not cached by the current core, the execution context (the architectural state in the registers and the TLB) moves to the core that is home for that data
– Although the EM protocol efficiently exploits spatial locality, the opportunities for exploiting temporal locality are limited to register values

2. CACHE COHERENCE ARCHITECTURES
EM protocol: migration
① Core C executes a memory access for address A
② It first computes the home core H for A
③ If H = C, it is a core hit → the request for A is forwarded to the cache hierarchy
   If H ≠ C, it is a core miss → core C halts execution and migrates the architectural state to H → the thread context is loaded on the remote core H
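The core-hit / core-miss decision above can be sketched as follows. The address-interleaved `home_core` mapping and the line-size and core-count parameters are illustrative assumptions; the slide does not specify the actual mapping:

```python
# Illustrative sketch of the EM access path described in the steps above.
# Assumption: the home core is derived by simple address interleaving.

LINE_SIZE = 64   # assumed cache-line size (bytes)
NUM_CORES = 64   # assumed core count

def home_core(addr):
    """Map an address to its unique home core (assumed interleaving)."""
    return (addr // LINE_SIZE) % NUM_CORES

def migrate(context, dest):
    """Ship the execution context (registers + TLB state) to core `dest`."""
    pass  # placeholder for the on-chip network transfer

def access(current_core, addr, context):
    h = home_core(addr)           # step 2: compute home core H for A
    if h == current_core:         # step 3a: core hit
        return ("cache_access", current_core)
    migrate(context, dest=h)      # step 3b: core miss -> migrate to H
    return ("migrated", h)

print(access(0, 0x00, None))  # ('cache_access', 0)
print(access(0, 0x40, None))  # ('migrated', 1)
```

Because each address has exactly one home core, no writable line ever exists in two caches, which is what lets EM dispense with directories.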

2. CACHE COHERENCE ARCHITECTURES
EM protocol: migration (figure: execution context transfer framework)

3. DEPENDABLE CACHE COHERENCE ARCHITECTURE
DCC protocol: DirCC + EM
– The DCC architecture enables runtime transitions between the two coherence protocols

4. EVALUATION
Default processor configuration (table not reproduced in this transcript)

4. EVALUATION
LU (SPLASH-2 LU_NON_CONTIGUOUS)
– Read/write data sharing, which causes mass evictions
DirCC
– Capacity and coherence misses: 9%
– AML: 35.2 cycles
EM
– Core misses: 65%
– Hops per migration: 12
– Migration overhead: 51 cycles
– AML: 28.4 cycles

4. EVALUATION
RAYTRACE (SPLASH-2)
– Read-only data sharing
DirCC
– Capacity and coherence misses: 1.5%
– AML: 5.8 cycles
EM
– Core misses: 29%
– Hops per migration: 11
– Migration overhead: 47 cycles
– AML: 15 cycles

4. EVALUATION
Evaluation result: average memory latency
– LU: 1.25× advantage for EM over DirCC
– RAYTRACE: 2.6× advantage for DirCC over EM
– Depending on an application's data-sharing patterns, either cache coherence protocol can perform better than the other
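The reported ratios can be checked directly against the AML numbers from the preceding slides:

```python
# AML (average memory latency) figures from the LU and RAYTRACE slides.
lu_dircc, lu_em = 35.2, 28.4     # cycles
ray_dircc, ray_em = 5.8, 15.0    # cycles

print(round(lu_dircc / lu_em, 2))    # 1.24 -> the ~1.25x EM advantage on LU
print(round(ray_em / ray_dircc, 2))  # 2.59 -> the ~2.6x DirCC advantage on RAYTRACE
```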

5. CONCLUSION
– Today, microprocessor designers are investing heavily in means of detecting and correcting errors
– This paper proposed a novel dependable cache coherence architecture (DCC) that provides architectural redundancy for maintaining coherence between on-chip caches