Caching in multiprocessor systems Tiina Niklander In AMICT 2009, Petrozavodsk 19.5.2009.

Background
More transistors on one chip:
- Multiple cores
- Larger cache, multiple on-chip caches
- More functionality (more functional units, dedicated multimedia / deciphering cell, integrated GPU)
Multiple cores introduce:
- Cache organization: private vs. shared caches
- Cache coherence

Cache organization
Common organization: L1 is private, the last-level cache is shared.
With three levels:
- L1: private
- L2: private or shared?
- L3: shared

Private vs. shared cache
Caches can be fully private, fully shared, or partially shared.
F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp.
Figure: Shared L2 (every core can access all of the L2); Private L2 (a pair of processors shares one L2).
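
To make "fully private, fully shared, partially shared" concrete, here is a minimal sketch that maps each core to the cache instance it uses at one level. It assumes cores numbered 0..N-1 and a hypothetical `sharing_degree` parameter: the number of neighbouring cores that share one cache instance (1 = private, N = fully shared, 2 = the pairwise sharing in the figure).

```python
def cache_instance(core: int, num_cores: int, sharing_degree: int) -> int:
    """Index of the cache instance used by `core` at one cache level.

    sharing_degree == 1          -> fully private (one cache per core)
    sharing_degree == num_cores  -> fully shared (one cache for all cores)
    anything in between          -> partially shared (groups of neighbours)
    """
    if sharing_degree < 1 or num_cores % sharing_degree != 0:
        raise ValueError("sharing_degree must be >= 1 and divide num_cores")
    return core // sharing_degree


def sharers(instance: int, sharing_degree: int) -> list[int]:
    """Cores that share the given cache instance."""
    start = instance * sharing_degree
    return list(range(start, start + sharing_degree))


if __name__ == "__main__":
    cores = 8
    for name, degree in [("private", 1), ("pairwise", 2), ("shared", cores)]:
        layout = [cache_instance(c, cores, degree) for c in range(cores)]
        print(f"{name:8s} L2 instance per core: {layout}")
```

The design question in the slides is where to put this knob for L2 and L3: a larger sharing degree gives each core more capacity, while a smaller one gives lower latency and less contention.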

Shared cache
- Coherence is simple (there is just one copy of the data)
- Latencies differ, depending on where the cache bank is located relative to the CPU
- Cores compete for cache access (a core may have to wait for another core)
M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In SC2008. IEEE, 2008, pp.
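
The varying latency is the non-uniform cache access (NUCA) effect: a large shared cache is split into banks spread over the chip, and reaching a far bank takes more interconnect hops. A minimal sketch, assuming a hypothetical 4x4 mesh with one core and one L2 bank per tile, static interleaving of cache lines across banks, and made-up cycle counts (`BANK_CYCLES`, `HOP_CYCLES`); real values depend on the chip.

```python
MESH_WIDTH = 4                        # hypothetical 4x4 mesh, one bank per tile
NUM_BANKS = MESH_WIDTH * MESH_WIDTH
LINE_SIZE = 64                        # bytes per cache line (assumed)
BANK_CYCLES = 10                      # assumed bank access time
HOP_CYCLES = 2                        # assumed cost per mesh hop, one direction


def home_bank(addr: int) -> int:
    """Static NUCA: consecutive cache lines are interleaved across banks."""
    return (addr // LINE_SIZE) % NUM_BANKS


def tile_xy(tile: int) -> tuple[int, int]:
    return tile % MESH_WIDTH, tile // MESH_WIDTH


def access_latency(core_tile: int, addr: int) -> int:
    """Round-trip latency from a core to the bank that holds `addr`."""
    cx, cy = tile_xy(core_tile)
    bx, by = tile_xy(home_bank(addr))
    hops = abs(cx - bx) + abs(cy - by)          # Manhattan distance on the mesh
    return BANK_CYCLES + 2 * hops * HOP_CYCLES  # request + reply traversal


if __name__ == "__main__":
    addr = 0x0                                  # maps to bank 0 (top-left tile)
    print("core 0  ->", access_latency(0, addr), "cycles")   # local bank
    print("core 15 ->", access_latency(15, addr), "cycles")  # opposite corner
```

With these assumed numbers the same line costs 10 cycles from core 0 and 34 cycles from core 15. Migration-based NUCA designs such as the cited one aim to move data closer to the cores that use it; this sketch only shows the static case.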

Private cache
- No access competition, smaller latencies
- But coherence becomes an issue! The same data can be in multiple caches, so a write must invalidate the other copies.
Cache partitioning:
- Design time: fixed partitioning
- Run time: fixed partitioning (a configuration issue)
- Dynamic (based on current need)
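
One way to read "dynamic partitioning based on current need" is way partitioning of a set-associative cache used by several cores, recomputed periodically. The sketch below is an assumption, not the method from the slides: it uses a greedy allocation in the spirit of utility-based cache partitioning, giving each spare way to the core whose hit count would grow most. The per-core hit curves (`hits[c][w]` = hits core c would get with w ways) stand in for hypothetical hardware monitors.

```python
def partition_ways(hits: list[list[int]], total_ways: int) -> list[int]:
    """Greedy way partitioning driven by measured hit curves.

    hits[c][w] is the number of hits core c would get with w ways
    (w = 0..total_ways). Every core gets at least one way; each remaining
    way goes to the core with the largest marginal gain.
    """
    num_cores = len(hits)
    if total_ways < num_cores:
        raise ValueError("need at least one way per core")
    alloc = [1] * num_cores
    for _ in range(total_ways - num_cores):
        gains = [hits[c][alloc[c] + 1] - hits[c][alloc[c]] for c in range(num_cores)]
        best = max(range(num_cores), key=lambda c: gains[c])
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    streaming = [0, 5, 5, 5, 5, 5, 5, 5, 5]          # extra ways do not help
    cache_friendly = [0, 10, 20, 30, 40, 45, 48, 49, 50]
    print(partition_ways([streaming, cache_friendly], total_ways=8))
```

Here the streaming core keeps only its minimum of one way and the cache-friendly core receives the other seven. A greedy step like this is optimal only when the hit curves show diminishing returns; published schemes add lookahead for other shapes.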

Cache coherence
Protocols: MESI, MSI, MOSI, MOESI
Invalidation message: RFO (Read For Ownership)
Each cache snoops the bus to monitor memory operations.
MESI: allowed combinations of states for the same line in two caches

        M  E  S  I
    M   N  N  N  Y
    E   N  N  N  Y
    S   N  N  Y  Y
    I   Y  Y  Y  Y

M: Modified (O: Owned, in MOSI/MOESI), E: Exclusive, S: Shared, I: Invalid
Y: combination allowed, N: combination not allowed
Source: Wikipedia
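
The MESI compatibility table can be checked against a small model of a snooping protocol. This is a minimal sketch, assuming one shared bus, a single cache line, atomic bus transactions and no modelling of write-back data; the class and method names are hypothetical. A read miss issues a bus read that the other caches snoop; a write from S or I issues an RFO that invalidates every other copy. After each operation the model asserts that every pair of states is one the table allows.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


# Allowed (cache A, cache B) state pairs from the MESI table:
# anything paired with I, or both caches in S.
ALLOWED = {
    (a, b)
    for a in State
    for b in State
    if a is State.I or b is State.I or (a is State.S and b is State.S)
}


class Bus:
    """Single shared bus; every cache snoops every transaction."""

    def __init__(self, num_caches: int):
        self.states = [State.I] * num_caches  # state of one line in each cache

    def _others(self, cpu: int) -> list[int]:
        return [i for i in range(len(self.states)) if i != cpu]

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.I:
            return  # read hit in M, E or S: no bus traffic
        # Bus read: valid copies snoop it and drop to S (an M copy would
        # also supply / write back the data, which is not modelled here).
        shared = False
        for i in self._others(cpu):
            if self.states[i] is not State.I:
                self.states[i] = State.S
                shared = True
        self.states[cpu] = State.S if shared else State.E
        self._check()

    def write(self, cpu: int) -> None:
        if self.states[cpu] in (State.S, State.I):
            # RFO (read for ownership): invalidate every other copy
            for i in self._others(cpu):
                self.states[i] = State.I
        # From E the upgrade to M is silent; from M nothing changes.
        self.states[cpu] = State.M
        self._check()

    def _check(self) -> None:
        n = len(self.states)
        for a in range(n):
            for b in range(a + 1, n):
                assert (self.states[a], self.states[b]) in ALLOWED


if __name__ == "__main__":
    bus = Bus(3)
    for op, cpu in [("read", 0), ("read", 1), ("write", 2), ("read", 0)]:
        getattr(bus, op)(cpu)
        print(f"{op:5s} by {cpu}:", [s.name for s in bus.states])
```

The sequence above goes E,I,I then S,S,I, then the write by cache 2 invalidates both readers (I,I,M), and the final read turns the M copy into S (S,I,S).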

(Distributed) cooperative caches
Add a directory structure that knows the data locations in the local caches.
Cache-to-cache copying:
- When the data is in another cache (the directory locates it)
- On eviction (the block is stored temporarily in another cache)
E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In PACT'08. ACM 2008, pp.
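
Both cache-to-cache copy cases can be sketched in a few lines. This is a minimal model, assuming read-only data, a single central directory (a dict from block to the set of caches holding it), tiny fully associative private caches with LRU replacement, and a simplistic spill policy that moves the last on-chip copy of an evicted block to the emptiest peer. All names are hypothetical, and the paper's actual directory organisation and spill policy are not modelled.

```python
from collections import OrderedDict


class CoopCaches:
    """Private caches that cooperate through a directory (read-only data)."""

    def __init__(self, num_caches: int, capacity: int):
        self.caches = [OrderedDict() for _ in range(num_caches)]  # LRU order
        self.capacity = capacity
        self.directory: dict[int, set[int]] = {}  # block -> caches holding it
        self.stats = {"local": 0, "remote": 0, "memory": 0}

    def _insert(self, cid: int, block: int, spill: bool = True) -> None:
        cache = self.caches[cid]
        if len(cache) >= self.capacity:
            victim, _ = cache.popitem(last=False)   # evict the LRU block
            self.directory[victim].discard(cid)
            if spill and not self.directory[victim]:
                self._spill(cid, victim)            # it was the last on-chip copy
        cache[block] = True
        self.directory.setdefault(block, set()).add(cid)

    def _spill(self, source: int, block: int) -> None:
        """Keep an evicted block on chip by placing it in the emptiest peer."""
        peers = [c for c in range(len(self.caches)) if c != source]
        target = min(peers, key=lambda c: len(self.caches[c]))
        if len(self.caches[target]) < self.capacity:  # only use free space
            self._insert(target, block, spill=False)

    def access(self, cid: int, block: int) -> str:
        cache = self.caches[cid]
        if block in cache:
            cache.move_to_end(block)
            kind = "local"
        elif self.directory.get(block):
            kind = "remote"   # directory located a peer copy: cache-to-cache
            self._insert(cid, block)
        else:
            kind = "memory"   # no on-chip copy: fetch from main memory
            self._insert(cid, block)
        self.stats[kind] += 1
        return kind


if __name__ == "__main__":
    cc = CoopCaches(num_caches=2, capacity=2)
    for block in [1, 2, 3, 1, 1]:
        print(block, cc.access(0, block))
```

In the usage example, block 1 is evicted from cache 0 when block 3 arrives, spilled into cache 1, and the next access to it is served cache-to-cache instead of from memory.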

New improvement ideas for cache performance 1/2
- Split the cache for different tasks
- Dynamically allocate cache areas
- Software-controlled eviction
  GOAL: a thread moves data it no longer needs, but which is strongly shared, to the shared cache to improve the performance of other threads.
  A new instruction, evict, tells the processor to move some data from the private L1 or L2 to the shared L3.
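
The benefit of the evict hint can be illustrated with a toy model of private caches plus one shared L3. This is a hypothetical sketch: `evict_to_l3` only stands in for the proposed evict instruction and is not a real ISA instruction or API, and the latency constants are made up. A producer core writes some blocks; a consumer core then reads them, either from the producer's private cache or, if the producer issued the hint, from L3.

```python
PRIVATE_HIT, L3_HIT, REMOTE_PRIVATE, MEMORY = 4, 30, 80, 200  # assumed cycles


class TwoLevel:
    """Private cache per core plus one shared L3 (toy model)."""

    def __init__(self, num_cores: int):
        self.private = [set() for _ in range(num_cores)]
        self.l3: set[int] = set()

    def load(self, core: int, block: int) -> int:
        """Return the assumed load latency and fill the private cache."""
        if block in self.private[core]:
            return PRIVATE_HIT
        if block in self.l3:
            latency = L3_HIT
        elif any(block in p for p in self.private):
            latency = REMOTE_PRIVATE  # fetched from another core's private cache
        else:
            latency = MEMORY
        self.private[core].add(block)
        return latency

    def store(self, core: int, block: int) -> None:
        """The writer keeps the only valid copy; other copies are invalidated."""
        for i, p in enumerate(self.private):
            if i != core:
                p.discard(block)
        self.l3.discard(block)
        self.private[core].add(block)

    def evict_to_l3(self, core: int, block: int) -> None:
        """Hypothetical 'evict' hint: move a block from private cache to L3."""
        if block in self.private[core]:
            self.private[core].discard(block)
            self.l3.add(block)


def consumer_cycles(use_hint: bool, blocks: int = 4) -> int:
    model = TwoLevel(num_cores=2)
    for b in range(blocks):
        model.store(0, b)              # core 0 produces the data
    if use_hint:
        for b in range(blocks):
            model.evict_to_l3(0, b)    # core 0 no longer needs it
    return sum(model.load(1, b) for b in range(blocks))  # core 1 consumes it


if __name__ == "__main__":
    print("without hint:", consumer_cycles(False), "cycles")
    print("with hint:   ", consumer_cycles(True), "cycles")
```

With these assumed costs the consumer needs 320 cycles without the hint and 120 with it; the real gain depends on how much cheaper an L3 hit is than a transfer from another core's private cache.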

New improvement ideas for cache performance 2/2
- Helper threads
  GOAL: an additional thread executes parts of the code ahead of the actual thread to 'prefetch' data into the cache.
- Generate memory traces for the programmer, to help tune software performance
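
A helper thread runs a stripped-down copy of the loop (only the address computation) some iterations ahead of the main thread, so the data is already cached when the main thread needs it. The sketch below simulates this sequentially instead of using real threads, assuming a hypothetical prefetch `distance`, an unbounded cache, prefetches that always arrive in time, and made-up hit and miss costs. It also records the main thread's memory trace, the kind of information the slide suggests giving to the programmer.

```python
HIT, MISS = 1, 100  # assumed cycle costs


def run(addresses: list[int], distance: int = 0) -> tuple[int, list[tuple[int, str]]]:
    """Simulate the main thread's loads, optionally with a helper thread.

    distance == 0: no helper thread. Otherwise, at iteration i the helper
    prefetches the block the main thread will need at iteration i + distance.
    Returns the total cycles and the memory trace as (address, "hit"/"miss").
    """
    cache: set[int] = set()
    cycles = 0
    trace = []
    for i, addr in enumerate(addresses):
        if distance > 0 and i + distance < len(addresses):
            cache.add(addresses[i + distance])  # helper stays `distance` ahead
        if addr in cache:
            cycles += HIT
            trace.append((addr, "hit"))
        else:
            cycles += MISS
            trace.append((addr, "miss"))
            cache.add(addr)
    return cycles, trace


if __name__ == "__main__":
    addrs = [7, 3, 12, 5, 9, 0, 14, 2, 11, 6]  # e.g. nodes of a linked list
    for d in (0, 2):
        cycles, trace = run(addrs, distance=d)
        print(f"distance={d}: {cycles} cycles, trace starts {trace[:3]}")
```

Only the first `distance` loads miss once the helper is running. In a real system the helper must stay far enough ahead to hide the miss latency, but not so far that prefetched blocks are evicted before use; the unbounded cache here hides that second effect.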

Conclusion
- Focus is on fine-tuning cache performance
- Cache coherence itself was solved earlier (and is not always used, if non-coherent usage is allowed)
- L2 and L3 caches: shared or private
- Cache partitioning
- Support for software-based improvements: eviction hints, traces, prefetching (e.g. helper threads)

References
S. Fide, S. Jenks: Proactive use of shared L3 caches to enhance cache communications in multi-core processors. IEEE Computer Architecture Letters, vol. 7 (2008), pp.
E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In Conf. on Parallel Architectures and Compilation Techniques, PACT'08. ACM, 2008, pp.
M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In Proc. of the 2008 ACM/IEEE Conf. on Supercomputing. IEEE, 2008, pp.
L. Peng et al.: Memory hierarchy performance measurement of commercial dual-core desktop processors. Journal of Systems Architecture 54 (2008), pp.
F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp.
J. Zhang, X. Fan, S.H. Liu: A Pollution Alleviate L2 Cache Replacement Policy for Chip Multiprocessor Architecture. In Int. Conf. on Networking, Architecture and Storage. IEEE, 2008, pp.