Computer Architectures & Networks, WS 2004/2005
Dec. 7th, 2005
The Memory Hierarchy / Caches
Carsten Trinitis
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR), Technische Universität München

Programmability

- Caches have no impact on programmability:
  - Caches are designed to be transparent
  - The programmer has no direct influence on them
- BUT: caches have a large performance impact
  - They need to be used efficiently: high locality!
  - E.g. try to reuse data that is already in the cache
- HPC applications need to be tailored to caches
  - Adapt to the cache sizes and cache line sizes (see the blocked-transpose sketch below)
  - A good understanding of the architecture is required
  - BUT: significant performance gains
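A minimal sketch of what such tailoring can look like (illustrative code, not from the original slides; N and BLOCK are assumed values that would have to be tuned to the actual cache and line sizes of the target machine): a blocked matrix transpose reuses each fetched cache line many times before it is evicted, whereas the naive version touches a new line on almost every store.

    #include <stddef.h>

    #define N     4096   /* matrix dimension (illustrative)              */
    #define BLOCK 64     /* tile edge; tune so two tiles fit into L1     */

    /* Naive transpose: B is written column by column, so nearly every
     * store touches a different cache line, and lines are evicted before
     * they can be reused.                                                */
    void transpose_naive(const double *A, double *B)
    {
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                B[j * N + i] = A[i * N + j];
    }

    /* Blocked transpose: process BLOCK x BLOCK tiles so that the lines
     * of both the source and the destination tile stay cache-resident
     * while the whole tile is handled (temporal and spatial locality).  */
    void transpose_blocked(const double *A, double *B)
    {
        for (size_t ii = 0; ii < N; ii += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t j = jj; j < jj + BLOCK; j++)
                        B[j * N + i] = A[i * N + j];
    }

Since N is a multiple of BLOCK here, no boundary handling is needed; a production version would also pick BLOCK per target cache size.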

Are Caches THE Solution?

- Caches have been very effective
  - They transparently help to close the memory-CPU gap
  - Present in all modern computer systems
- But: difficult to control
  - Software must be adapted to fully exploit caches (see the loop-interchange sketch below)
  - This can be very cumbersome (use analysis tools!)
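As one concrete example of such an adaptation (a sketch with assumed array dimensions, not taken from the slides): simply interchanging two loops so that a row-major C array is walked along its rows turns a cache-hostile access pattern into a sequential one. Analysis tools such as cachegrind or hardware performance counters make the difference in miss rates directly visible.

    #include <stddef.h>

    #define ROWS 2048
    #define COLS 2048

    /* Column-wise traversal of a row-major array: consecutive accesses
     * are COLS * sizeof(double) bytes apart, so once the array exceeds
     * the cache size almost every load misses.                          */
    double sum_bad(double a[ROWS][COLS])
    {
        double s = 0.0;
        for (size_t j = 0; j < COLS; j++)
            for (size_t i = 0; i < ROWS; i++)
                s += a[i][j];
        return s;
    }

    /* Interchanged loops: the inner loop walks one row sequentially, so
     * each 64-byte cache line delivers 8 consecutive doubles before it
     * is evicted.                                                        */
    double sum_good(double a[ROWS][COLS])
    {
        double s = 0.0;
        for (size_t i = 0; i < ROWS; i++)
            for (size_t j = 0; j < COLS; j++)
                s += a[i][j];
        return s;
    }

Both functions compute the same sum; only the traversal order, and with it the cache behaviour, differs.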

Other Techniques / Pros & Cons

- Prefetching
  - Try to preload data that will "potentially" be used soon
  - Pro: data can be requested ahead of time
  - Con: may waste bandwidth on loads that are never used
  - Controlled by hardware:
    - Speculative loads
  - Controlled by the programmer / compiler:
    - Prefetch statements are inserted into the code (see the sketch below)
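A hedged sketch of programmer/compiler-inserted prefetching, using the GCC/Clang builtin __builtin_prefetch (the prefetch distance of 16 elements is an assumed tuning parameter, not a value from the slides):

    /* Software prefetching sketch: while element i is processed, the
     * data for iteration i + DIST is requested, so the memory latency
     * overlaps with the work on earlier elements. DIST must be tuned to
     * the memory latency and to the cost of the loop body.              */
    #define DIST 16

    void scale(double *dst, const double *src, long n, double factor)
    {
        for (long i = 0; i < n; i++) {
            if (i + DIST < n)
                __builtin_prefetch(&src[i + DIST], 0, 1); /* read, low reuse */
            dst[i] = factor * src[i];
        }
    }

If the prefetch distance is too small the data still arrives late; if it is too large, or the data is never actually used, the prefetch only wastes bandwidth and cache capacity, which is exactly the trade-off listed above.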

Ideas for the Future

- Enhance the cache model
  - Higher associativity
  - Larger cache blocks
  - Adaptable cache parameters
- Reconfigurable architectures
  - Adjust the memory hierarchy to the application
  - Use of distinct local memory pools
  - Hardware remapping techniques
- New models of computation
  - Move away from CPU-centric towards memory-centric models (processing in memory, PIM)

Summary

- The memory system
- The memory wall problem
- Caches and their problems
  - Issues of locality & alignment
- Programmability & software techniques
- Beyond caches
  - E.g. prefetching