
Presentation transcript:

Inputs, Metrics, Code, Results

Figure: multicore processor chip. Labels: main memory; interconnection network; per core, a private L1 data cache and a cache controller.


Columns: Name, #cores, C Latency, M Latency, M Blocks, Cache Blocks, Input Size, Store %
Number of invalidate messages in MSI and MESI; number of write-backs in MSI and MOSI:
L K1K 0
L K1K 40
L K1K 80
M K1K10K0
M K1K10K40
M K1K10K80
H K1K100K0
H K1K100K40
H K1K100K80

Columns: Name, #cores, C Latency, M Latency, M Blocks, Cache Blocks, Input Size, Store %
Sensitivity of write-backs to the cache size:
MC K1010K50
MC K10010K50
MC K1K10K50
Sensitivity of write-backs to the number of cores:
MW K10010K50
MW K10010K50
MW K10010K50
Goals: number of invalidate messages (MESI), number of write-backs (MOSI)


Introduction, Snooping, Directory, Conclusion


Figure: distributed-directory system. Labels: interconnection network; per node, main memory, a core, a private L1 data cache, a cache controller, a directory controller, and a directory.

In this presentation, we present the results of implementing a multiprocessor system model with a distributed directory.

Figure: several cores, each with a cache controller, and a directory controller for a cache block.

The cache controller sends a request to the directory.

Figure: several cores, each with a cache controller, and a directory controller for a cache block. Label on the figure: bottleneck.

Figure: several cores, each with a cache controller, and directory controllers for a cache block.

The cache controller responds to every request by unicasting a message.
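The request and unicast-response pattern on the directory slides can be sketched as a toy directory controller. This is a minimal sketch, not the presenters' simulator: the class names, the message names (`Data`, `Inv`, `Fwd-GetS`, `Fwd-GetM`), and the MOSI-style choice that an owner keeps supplying data to readers are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DirectoryEntry:
    """Hypothetical per-block record: one optional owner plus a set of sharers."""
    owner: Optional[int] = None
    sharers: set[int] = field(default_factory=set)


class Directory:
    """Toy directory controller that counts point-to-point (unicast) messages."""

    def __init__(self) -> None:
        self.entries: dict[int, DirectoryEntry] = {}
        self.unicasts = 0

    def _send(self, dest: int, msg: str, log: list) -> None:
        # Every message goes to exactly one destination; nothing is broadcast.
        self.unicasts += 1
        log.append((dest, msg))

    def get_shared(self, block: int, requester: int) -> list:
        """Read miss: forward to the owner if one exists, else reply with data."""
        entry = self.entries.setdefault(block, DirectoryEntry())
        log: list = []
        if entry.owner is not None and entry.owner != requester:
            self._send(entry.owner, "Fwd-GetS", log)  # owner supplies the data
        else:
            self._send(requester, "Data", log)        # memory supplies the data
        if entry.owner != requester:
            entry.sharers.add(requester)
        return log

    def get_modified(self, block: int, requester: int) -> list:
        """Write miss: invalidate the other sharers, then hand over ownership."""
        entry = self.entries.setdefault(block, DirectoryEntry())
        log: list = []
        for sharer in sorted(entry.sharers - {requester}):
            self._send(sharer, "Inv", log)
        if entry.owner is None:
            self._send(requester, "Data", log)
        elif entry.owner != requester:
            self._send(entry.owner, "Fwd-GetM", log)
        entry.owner = requester
        entry.sharers = set()
        return log
```

Because every request for a block passes through its one directory entry, the message count at that controller grows with the number of requesting cores, which is one way to read the "bottleneck" label on the earlier slide.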

Message types and states

- MOSI_protocol_cache_request: executes a cache controller request
- MOSI_protocol_directory_request: executes a directory controller response
- I_state_cache: performs the cache actions when the block is in the I state
- Transition_I_to_SD: performs the cache actions when the block is in the I state and wants to change to the S state with condition D
- Directory_I: performs the directory action upon receiving a message from a cache controller for a block in the I state
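To make the per-state handlers listed above concrete, here is a minimal sketch of the I to S path. It assumes that "condition D" means the cache has issued its request and is waiting for data, which is how transient states are commonly written; the slides do not say so explicitly. The function names mirror the slide's routine names, but the signatures, event strings, and return values are hypothetical.

```python
from enum import Enum


class State(Enum):
    I = "I"        # invalid
    IS_D = "IS_D"  # assumed transient state: leaving I for S, waiting for data
    S = "S"        # shared


class CacheLine:
    def __init__(self):
        self.state = State.I
        self.data = None


def i_state_cache(line, event, outbox):
    """Cache actions in I: a load miss issues GetS and enters the transient state."""
    if line.state is State.I and event == "load":
        outbox.append("GetS")     # request a shared copy from the directory
        line.state = State.IS_D
        return "stall"            # the core waits until data arrives
    return "ignore"


def transition_i_to_sd(line, event, payload=None):
    """Cache actions in the assumed IS_D state: arriving data completes the move to S."""
    if line.state is State.IS_D and event == "data":
        line.data = payload
        line.state = State.S
        return "hit"
    return "stall"                # anything else keeps the core stalled
```

A load to an invalid line therefore costs one stall in `i_state_cache`, further stalls while waiting in `transition_i_to_sd`, and completes only when the data message arrives.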

MOSI protocol. Number of cores: 8; number of requests/cycle: 4.
Plot: write-backs per memory reference vs. L1 block size (bytes).
Plot: write-backs vs. L1 cache size (KB).
Plot: write-backs vs. L1 block size (bytes).
Fixed parameters shown on the plots: block size = 16 bytes; cache size = 128 bytes.

Number of write-backs: mean(MOSI/MSI) =

Number of blocks/cache: 1000; number of caches: 100; number of requests/cycle: 4.
Number of stalls: mean(MOSI/MSI) =

Number of blocks/cache: 1000; number of caches: 100; number of requests/cycle: 4.
Number of cycles: mean(MOSI/MSI) = 1.459

Number of blocks/cache: 1000; number of caches: 100; number of requests/cycle: 4.
mean(MOSI/MSI) = 1.345
mean(MOSI/MSI) = 1.273
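The mean(MOSI/MSI) figures summarize how a metric under MOSI compares with the same metric under MSI. A minimal sketch of one plausible reading follows: the per-configuration ratios are averaged. The slides do not state how the mean was taken, so this interpretation, the function name `mean_ratio`, and the numbers in the usage line are all assumptions; the numbers are placeholders, not results from the presentation.

```python
from statistics import mean


def mean_ratio(mosi, msi):
    """Average of per-configuration MOSI/MSI ratios (assumed interpretation)."""
    if len(mosi) != len(msi):
        raise ValueError("need one MSI measurement per MOSI measurement")
    return mean(a / b for a, b in zip(mosi, msi))


# Placeholder measurements for two configurations, not data from the slides.
print(mean_ratio([2, 9], [1, 3]))
```

Note that averaging the ratios is not the same as dividing the totals: for the placeholder inputs above the ratios are 2 and 3, while the ratio of the sums is 11/4.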
