Lucía G. Menezo, Valentín Puente, José Ángel Gregorio. University of Cantabria (Spain). MOSAIC :

University of Cantabria. Edinburgh - PACT 2013
Outline: Motivation; Directory Schemas (In-cache, Sparse); MOSAIC (Coherence Protocol, Examples); Evaluation Results; Conclusions

Performance improvement: more processors per chip.
Major challenge: the off-chip bandwidth wall.
Introducing cache into the chip leads to complex on-chip cache hierarchies.
The coherence protocol has a fundamental role to play.

Which coherence protocol should be used with a large number of cores?
Broadcast-based protocols: high energy requirements.
Directory-based protocols: more storage needed for sharing information.
MOSAIC: a new coherence protocol.
  Directory without inclusiveness.
  Token Coherence to guarantee correctness.

In-cache directory:
Each block in the LLC includes the tag, the data and the sharers information.
The LLC receives the requests, so it needs precise knowledge.
Inclusiveness is necessary: any block in the private levels needs to be allocated in the LLC.
Advantage: the coherence protocol is less complex.
Disadvantage: all LLC blocks have storage overhead.
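The storage overhead of keeping sharers information in every LLC block can be estimated with a rough sketch. It assumes a full-map sharer vector (one presence bit per core) and a 64-byte block, and it ignores tag and state bits; none of these parameters come from the slides.

```python
def in_cache_directory_overhead(num_cores, block_bytes=64):
    """Extra storage per LLC block, as a fraction of the data size.

    Assumption: full-map sharer vector with one presence bit per core.
    Tag and state bits are ignored.
    """
    sharer_bits = num_cores          # one bit per private cache
    data_bits = block_bytes * 8
    return sharer_bits / data_bits


for cores in (16, 64, 256):
    print(cores, "cores:", in_cache_directory_overhead(cores))
```

Under these assumptions the overhead grows linearly with the core count, and it is paid by every LLC block whether or not the block is shared.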

[Figure: processors (P) and private caches, interconnection network, LLC + in-cache directory.]

[Figure: processors (P) and private caches, interconnection network, LLC + in-cache directory with data and sharers fields. Annotation: "Overhead!!!"]

Sparse directory:
Directory entries are separated from the data.
Entries are allocated on demand.
Overhead is proportional to the aggregate size of the private levels (not the LLC).
Capacity and associativity have to be sufficient to keep the private-level cache tags.

[Figure: processors (P) and private caches, interconnection network, LLC (data) and a separate sparse directory (sharers).]

Duplicate-tag directory: holds all the tags of the private levels.
Associativity = # cores * private cache associativity.
# sets = # private cache sets.
Example: 16 cores with a 4-way 32KB L1 give a 64-way directory.
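The two sizing rules for a duplicate-tag directory can be checked with a short calculation. The function name and the 64-byte block size are assumptions made for this sketch; the slide gives only the core count, L1 size and L1 associativity.

```python
def duplicate_tag_geometry(num_cores, l1_bytes, l1_ways, block_bytes=64):
    """Geometry of a duplicate-tag directory that mirrors every private L1.

    Assumption: 64-byte blocks (not stated in the slides).
    """
    l1_sets = l1_bytes // (l1_ways * block_bytes)
    return {
        "ways": num_cores * l1_ways,   # associativity = # cores * L1 associativity
        "sets": l1_sets,               # same number of sets as one private cache
    }


geometry = duplicate_tag_geometry(num_cores=16, l1_bytes=32 * 1024, l1_ways=4)
print(geometry)
```

With the slide's example the directory needs 64 ways, which illustrates why the associativity becomes impractical as the core count grows.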

Sparse directory with reduced associativity:
Decrease associativity: now << # cores * private cache associativity.
Increase the number of sets.
Each entry holds a tag and its sharers: one tag may be in various private caches.
More than one tag can compete for the same entry: conflicts.
Inclusiveness is needed: a conflict invalidates private data (recall messages).
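The effect of a conflict in a low-associativity sparse directory can be illustrated with a toy model. The class below is hypothetical: it indexes sets by block address modulo the number of sets, uses FIFO replacement, and reports each evicted entry as a recall that would invalidate the block in its sharers. None of these policies are taken from the slides.

```python
class SparseDirectory:
    """Toy inclusive sparse directory (illustrative only)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One dict per set: block address -> set of sharer ids.
        # Dicts preserve insertion order, which gives FIFO replacement.
        self.sets = [dict() for _ in range(num_sets)]

    def add_sharer(self, block_addr, core):
        """Record that `core` caches `block_addr`.

        Returns a list of recalls: (evicted block, sharers to invalidate).
        """
        entries = self.sets[block_addr % self.num_sets]
        recalls = []
        if block_addr not in entries and len(entries) == self.ways:
            victim = next(iter(entries))            # oldest entry in the set
            recalls.append((victim, sorted(entries.pop(victim))))
        entries.setdefault(block_addr, set()).add(core)
        return recalls


directory = SparseDirectory(num_sets=2, ways=1)
first = directory.add_sharer(0, core=0)     # allocates an entry
second = directory.add_sharer(0, core=1)    # same block, new sharer
third = directory.add_sharer(2, core=3)     # maps to the same set: conflict
print(first, second, third)
```

In this model the third access forces block 0 out of both private caches even though neither cache ran out of space, which is the cost that a directory without inclusiveness aims to remove.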

MOSAIC:
In-cache or sparse directory: it doesn't matter.
No inclusiveness: no invalidations of data in private caches.
Reconstruction of the sharing information on demand.
Uses token counting to avoid extra traffic and guarantee correctness.
Token Coherence protocol:
  Initially, each block has # tokens == # processors.
  Read request: data and 1 token.
  Write request: data and all tokens.
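The token rules can be sketched as a small bookkeeping class. This is a minimal illustration, not the protocol implementation: the class and method names are hypothetical, and placing all tokens at "memory" initially is an assumption.

```python
class TokenBlock:
    """Token bookkeeping for one cache block (illustrative only)."""

    def __init__(self, num_procs):
        self.total = num_procs
        # Assumption: every token starts at memory/LLC.
        self.held = {"memory": num_procs}

    def move(self, src, dst, count):
        """Transfer `count` tokens; tokens are never created or destroyed."""
        if count < 0 or self.held.get(src, 0) < count:
            raise ValueError("source does not hold enough tokens")
        self.held[src] -= count
        self.held[dst] = self.held.get(dst, 0) + count
        assert sum(self.held.values()) == self.total

    def can_read(self, holder):
        return self.held.get(holder, 0) >= 1           # at least one token

    def can_write(self, holder):
        return self.held.get(holder, 0) == self.total  # all tokens


block = TokenBlock(num_procs=4)
block.move("memory", "P0", 1)     # read by P0: data and 1 token
p0_reads = block.can_read("P0")
p0_writes = block.can_write("P0")
block.move("memory", "P1", 3)     # write by P1 collects every token...
block.move("P0", "P1", 1)         # ...including the one held by P0
p1_writes = block.can_write("P1")
print(p0_reads, p0_writes, p1_writes, block.can_read("P0"))
```

Because a writer must hold every token, no reader can exist at the same time, so correctness does not depend on the directory's sharing information being exact.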

[Figure: MOSAIC organization. Private caches P0, P1 and P2, each line holding State, Num. Tokens and Data; on-chip network; Last Level Cache split into Data_slice and Dir_slice (Sharers); memory controller.]

When the data is not present in the LLC: broadcast for reconstruction.
Private caches report the number of tokens they hold.
Token counting avoids negative acknowledgements and timeouts.
The reconstruction message piggybacks the type of request and the requestor.
Key: the directory may replace entries silently, with no invalidations.
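One way to see why token counting removes the need for negative acknowledgements is the sketch below. It is an interpretation of the points above, not the authors' implementation: the function name, the arguments and the completion rule (reconstruction is complete once every token is accounted for) are assumptions.

```python
def reconstruct_sharers(total_tokens, tokens_at_llc, private_tokens):
    """Rebuild the sharers list from token reports.

    private_tokens maps each private cache to the tokens it holds.
    Caches holding no tokens send nothing (no negative acknowledgement).
    Returns (sharers, complete); complete is True once all tokens are
    accounted for, so no timeout is needed to close the reconstruction.
    """
    sharers = []
    accounted = tokens_at_llc
    for core, tokens in private_tokens.items():
        if tokens == 0:
            continue                 # silent: this cache holds nothing
        sharers.append(core)
        accounted += tokens
    return sharers, accounted == total_tokens


done = reconstruct_sharers(4, 1, {"P0": 0, "P1": 2, "P2": 1, "P3": 0})
pending = reconstruct_sharers(4, 0, {"P1": 2})
print(done, pending)
```

In the second call only two of four tokens have been reported, so the directory knows that more replies are still in flight without waiting for explicit "not present" messages.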

[Figure: read example, drawn as a sequence diagram over P0, P1, P2, P3, Dir and LLC. Messages shown: Read, Reconstruction, Info 1 token, Info 2 tokens (owner), Data + token, Unblock (info 1 token); then a second Read, Forward GETS to Owner, Data + token, Unblock. States shown: Invalid, IS, S, O, C, A. Token counts shown: 3 tokens, 1 token. Directory contents over time: Sharers [P2], Owner: ? / Sharers [P2, P1], Owner: P1 / Sharers [P2, P1, P0], Owner: P1 / Sharers [P2, P1, P0, P3], Owner: P1.]

[Figure: write example, drawn as a sequence diagram over P0, P1, P2, P3, Dir and LLC. Labels shown: Directory Eviction, Write, Reconstruction, Data + 3 tokens, 1 token, Unblock (info all tokens). States shown: Invalid, IS, S, O, C, A, IM, M. Token counts shown: 3 tokens, 1 token. Directory contents at the end: Sharers [P0], Owner: P0.]

[Figure: evaluated system layout with Core 0 to Core 7, 16 routers (R) and LLC slices Slice 0 to Slice 15.]

GEMS: full-system evaluation.
SLICC: Specification Language for Implementing Cache Coherence.

[Figure: normalized execution time. Directory: KB, 16K entries (8 bytes per entry).]

[Figure: normalized number of misses.]

[Figure: normalized execution time. Directory configurations: KB, 16K entries (8 bytes per entry); 16KB, 2K entries.]

[Figure: directory configuration: KB, 2K entries.]

[Figure: average network link utilization.]

[Figure: normalized link utilization, 16-core configuration.]

MOSAIC Coherence Protocol:
Low complexity and great scalability.
Very low storage overhead.
No noticeable energy cost.
An alternative for future many-core cache-coherent CMPs.
Bandwidth scalability of a directory, elegance of Token Coherence.

[Figure: normalized execution time. Same experiment with BASE: 20% impact in some cases. L1: 4-way 32KB / L2: 8-way 256KB. Directory sizes: x2 full dir, 1/10 full dir.]

[Figure: normalized dynamic energy.]