Computer Architecture: A Quantitative Approach, Fifth Edition
Chapter 2: Memory Hierarchy Design
The University of Adelaide, School of Computer Science
Copyright © 2012, Elsevier Inc. All rights reserved.

Memory Hierarchy: Introduction

Causes of Misses (the Three Cs)
- Compulsory: the first reference to a block
- Capacity: blocks are discarded because the cache cannot hold the working set, and must later be retrieved
- Conflict: the program makes repeated references to multiple addresses from different blocks that map to the same location in the cache

Memory Hierarchy Basics
- Note that speculative and multithreaded processors may execute other instructions during a miss
- This reduces the performance impact of misses

Memory Hierarchy Basics
Six basic cache optimizations:
1. Larger block size: reduces compulsory misses, but increases capacity and conflict misses and increases the miss penalty
2. Larger total cache capacity: reduces the miss rate, but increases hit time and power consumption
3. Higher associativity: reduces conflict misses
4. More cache levels: reduces overall memory access time
5. Giving priority to read misses over writes: use a write buffer; reduces the miss penalty
6. Avoiding address translation during cache indexing: reduces hit time
Their combined effect shows up in the average memory access time, sketched below.
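
The payoff of these optimizations can be quantified through the average memory access time (AMAT). A minimal sketch for a two-level hierarchy; the latencies and miss rates below are illustrative assumptions, not measurements:

```c
#include <stdio.h>

/* AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2) */
int main(void) {
    double l1_hit = 1.0;        /* cycles (assumed) */
    double l1_miss_rate = 0.05;
    double l2_hit = 10.0;       /* cycles (assumed) */
    double l2_miss_rate = 0.20; /* local miss rate of L2 (assumed) */
    double l2_penalty = 100.0;  /* cycles to main memory (assumed) */

    double amat = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * l2_penalty);
    printf("AMAT = %.2f cycles\n", amat);  /* 1 + 0.05*(10 + 0.2*100) = 2.50 */
    return 0;
}
```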

Advanced Cache Optimizations

1. Small and Simple First-Level Caches
- Critical timing path: address the tag memory, then compare tags, then select the correct way's data
- Direct-mapped caches can overlap the tag compare with transmission of the data
- Lower associativity reduces power because fewer cache lines are accessed
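
To make the timing path concrete, here is a small sketch of how an address is split for a direct-mapped cache; the geometry is an assumption for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed geometry: 32 KiB direct-mapped cache, 64-byte blocks. */
#define BLOCK_BITS 6   /* 64-byte block offset  */
#define INDEX_BITS 9   /* 32768 / 64 = 512 sets */

int main(void) {
    uint32_t addr   = 0x12345678;
    uint32_t offset = addr & ((1u << BLOCK_BITS) - 1);
    uint32_t index  = (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag    = addr >> (BLOCK_BITS + INDEX_BITS);
    /* The hit path reads tags[index], compares it to tag, then selects data. */
    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```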

2. Way Prediction
- To improve hit time, predict the way so the mux can be preset
- A mis-prediction results in a longer hit time
- Prediction accuracy: over 90% for two-way, over 80% for four-way; the I-cache achieves better accuracy than the D-cache
- First used on the MIPS R10000 in the mid-90s; also used on the ARM Cortex-A8
- Extension: predict the block as well ("way selection"), which increases the mis-prediction penalty
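
A hypothetical sketch of the idea: keep one predicted way per set, probe that way first, and fall back to a full lookup on a way mis-prediction. All structure and function names here are illustrative, not a real design:

```c
#include <stdint.h>

#define SETS 256
#define WAYS 4

typedef struct {
    uint32_t tag[SETS][WAYS];
    int      valid[SETS][WAYS];
    uint8_t  predicted_way[SETS];   /* presets the output multiplexor */
} cache_t;

/* Returns the matching way, or -1 on a miss.
   *slow is set when the predicted way was wrong (longer hit time). */
int lookup(cache_t *c, uint32_t set, uint32_t tag, int *slow) {
    int p = c->predicted_way[set];
    *slow = 0;
    if (c->valid[set][p] && c->tag[set][p] == tag)
        return p;                    /* fast hit: mux was preset correctly */
    *slow = 1;                       /* mis-prediction: check all ways */
    for (int w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->predicted_way[set] = (uint8_t)w;  /* retrain the predictor */
            return w;
        }
    }
    return -1;                       /* miss */
}
```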

3. Pipelining the Cache
- Pipeline cache access to improve bandwidth
- Examples: Pentium: 1 cycle; Pentium Pro through Pentium III: 2 cycles; Pentium 4 through Core i7: 4 cycles
- Increases the branch mis-prediction penalty
- Makes it easier to increase associativity

4. Nonblocking Caches
- Allow hits to proceed before earlier misses complete
  - "Hit under miss"
  - "Hit under multiple miss" (the L2 must support multiple outstanding misses)
- In general, processors can hide the L1 miss penalty but not the L2 miss penalty
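
A nonblocking cache tracks outstanding misses in miss status holding registers (MSHRs) so later accesses can proceed. A minimal, hypothetical sketch (the MSHR count and field layout are assumptions):

```c
#include <stdint.h>

#define MSHRS 8   /* assumed number of outstanding misses supported */

typedef struct {
    int      busy;
    uint64_t block_addr;   /* block being fetched from the next level */
} mshr_t;

static mshr_t mshr[MSHRS];

/* On a miss: allocate an MSHR, or merge with an in-flight fetch.
   Returns 0 if the miss is tracked, -1 if all MSHRs are busy
   (only then must the cache actually block the processor). */
int handle_miss(uint64_t block_addr) {
    int free_slot = -1;
    for (int i = 0; i < MSHRS; i++) {
        if (mshr[i].busy && mshr[i].block_addr == block_addr)
            return 0;                  /* secondary miss: merge, no new fetch */
        if (!mshr[i].busy && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;                     /* structural stall: MSHRs exhausted */
    mshr[free_slot].busy = 1;          /* primary miss: start a fetch */
    mshr[free_slot].block_addr = block_addr;
    return 0;
}
```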

5. Multibanked Caches
- Organize the cache as independent banks to support simultaneous accesses
- The ARM Cortex-A8 supports one to four banks for L2; the Intel i7 uses four banks for L1 and eight banks for L2
- Interleave banks according to the block address
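
Sequential interleaving spreads consecutive block addresses across the banks. A small sketch; the bank count is assumed:

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64u
#define N_BANKS    4u   /* assumed; must be a power of two here */

/* With sequential interleaving, bank = block address mod number of banks. */
static unsigned bank_of(uint64_t addr) {
    uint64_t block = addr / BLOCK_SIZE;
    return (unsigned)(block & (N_BANKS - 1));
}

int main(void) {
    for (uint64_t a = 0; a < 6 * BLOCK_SIZE; a += BLOCK_SIZE)
        printf("block at 0x%llx -> bank %u\n",
               (unsigned long long)a, bank_of(a));  /* banks 0,1,2,3,0,1 */
    return 0;
}
```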

6. Critical Word First, Early Restart
- Critical word first: request the missed word from memory first and send it to the processor as soon as it arrives
- Early restart: request words in normal order, but send the missed word to the processor as soon as it arrives
- The effectiveness of these strategies depends on the block size and on the likelihood of another access to the portion of the block that has not yet been fetched
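
A sketch of the wrap-around fill order used by critical word first; the block geometry is an assumption:

```c
#include <stdio.h>

#define WORDS_PER_BLOCK 8   /* assumed: 64-byte block, 8-byte words */

/* Fetch order for critical word first: start at the word that missed and
   wrap around until the whole block is filled. The processor can restart
   as soon as iteration 0 delivers the critical word. */
int main(void) {
    int critical = 5;       /* word offset of the demand miss */
    for (int i = 0; i < WORDS_PER_BLOCK; i++) {
        int word = (critical + i) % WORDS_PER_BLOCK;
        printf("fill word %d%s\n", word, i == 0 ? "  <- send to CPU" : "");
    }
    return 0;
}
```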

7. Merging Write Buffer
- When storing to a block that is already pending in the write buffer, update the existing write buffer entry
- Reduces stalls due to a full write buffer
- Does not apply to I/O addresses
(Figure: write buffer contents without merging vs. with merging.)
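
A hypothetical sketch of the merge check: each buffer entry holds one block address plus per-word valid bits, and a new store merges when its block is already present. Entry geometry is assumed:

```c
#include <stdint.h>

#define ENTRIES 4
#define WORDS   4   /* assumed: four 64-bit words per buffer entry */

typedef struct {
    int      valid;
    uint64_t block_addr;
    uint8_t  word_mask;        /* which words hold pending data */
    uint64_t data[WORDS];
} wb_entry_t;

static wb_entry_t buf[ENTRIES];

/* Returns 0 on success, -1 if the buffer is full (processor stalls). */
int write_buffer_store(uint64_t addr, uint64_t value) {
    uint64_t block = addr / (WORDS * 8);
    unsigned word  = (unsigned)((addr / 8) % WORDS);
    int free_slot = -1;
    for (int i = 0; i < ENTRIES; i++) {
        if (buf[i].valid && buf[i].block_addr == block) {
            buf[i].data[word] = value;          /* merge into existing entry */
            buf[i].word_mask |= 1u << word;
            return 0;
        }
        if (!buf[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;                              /* full: stall */
    buf[free_slot].valid = 1;
    buf[free_slot].block_addr = block;
    buf[free_slot].word_mask = 1u << word;
    buf[free_slot].data[word] = value;
    return 0;
}
```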

8. Compiler Optimizations
- Loop interchange: swap nested loops to access memory in sequential order
- Blocking: instead of accessing entire rows or columns, subdivide matrices into blocks; this requires more memory accesses but improves the locality of those accesses
Both transformations are sketched below.
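
Both transformations are easy to show in C. The array sizes and the tile size B are assumed values; in practice B is tuned to the cache capacity:

```c
#define N 512
double x[N][N];

/* Loop interchange: iterating j in the inner loop makes accesses
   sequential in memory, since C stores arrays row-major; with the loops
   the other way around, each access would stride N doubles. */
void scale_rows(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = 2.0 * x[i][j];    /* unit-stride inner loop */
}

/* Blocking (tiling) for matrix multiply: operate on B x B submatrices so
   each tile of y, z, and w stays resident in the cache. The caller must
   zero w first; the kk loop accumulates the partial products. */
#define B 32
void matmul_blocked(double y[N][N], double z[N][N], double w[N][N]) {
    for (int jj = 0; jj < N; jj += B)
        for (int kk = 0; kk < N; kk += B)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < jj + B; j++) {
                    double r = 0.0;
                    for (int k = kk; k < kk + B; k++)
                        r += y[i][k] * z[k][j];
                    w[i][j] += r;       /* accumulate the tile product */
                }
}
```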

9. Hardware Prefetching
- Fetch two blocks on a miss: the requested block and the next sequential block
(Figure: performance gains from hardware prefetching on the Pentium 4.)
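
A toy sketch of next-line prefetching in a direct-mapped cache model with a one-entry stream buffer; the whole model is hypothetical and exists only to show the policy:

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64u
#define SETS 1024u

static uint64_t tags[SETS];
static int      valid[SETS];
static uint64_t stream_buf;    /* one-entry prefetch (stream) buffer */
static int      stream_valid;

/* On a demand miss, fill the missed block and fetch block+1 into the
   stream buffer; a later access that hits the stream buffer avoids the
   full miss penalty. */
void access(uint64_t addr) {
    uint64_t block = addr / BLOCK_SIZE;
    uint64_t set   = block % SETS;
    if (valid[set] && tags[set] == block)
        return;                                   /* cache hit */
    if (stream_valid && stream_buf == block)
        printf("prefetch hit: block %llu\n", (unsigned long long)block);
    else
        printf("miss: fetch block %llu\n", (unsigned long long)block);
    valid[set] = 1; tags[set] = block;            /* fill demand block */
    stream_buf = block + 1; stream_valid = 1;     /* prefetch next block */
}

int main(void) {
    for (uint64_t a = 0; a < 4 * BLOCK_SIZE; a += BLOCK_SIZE)
        access(a);   /* sequential walk: first access misses, rest hit */
    return 0;
}
```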

10. Compiler Prefetching
- Insert prefetch instructions before the data is needed
- Prefetches are non-faulting: they don't cause exceptions
- Register prefetch: loads the data into a register
- Cache prefetch: loads the data only into the cache
- Combine with loop unrolling and software pipelining
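
A sketch of cache prefetching using GCC/Clang's __builtin_prefetch (a real builtin); the prefetch distance of 16 elements is an assumed tuning choice, not a recommendation:

```c
#define N 100000
double a[N], b[N];

double dot(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++) {
        if (i + 16 < N) {
            __builtin_prefetch(&a[i + 16], 0, 1);  /* read, low temporal reuse */
            __builtin_prefetch(&b[i + 16], 0, 1);
        }
        sum += a[i] * b[i];
    }
    return sum;
}
```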

Summary
(Figure: table summarizing the ten advanced optimizations and their impact on performance and power.)