Computer Organization CS224 Fall 2012 Lessons 41 & 42.


Measuring Cache Performance (§5.3 Measuring and Improving Cache Performance)
- Components of CPU time
  - Program execution cycles (includes cache hit time)
  - Memory stall cycles (mainly from cache misses)
- With simplifying assumptions:
  Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                      = (Instructions / Program) × (Misses / Instruction) × Miss penalty

Cache Performance Example
- Given
  - I-cache miss rate = 2%
  - D-cache miss rate = 4%
  - Miss penalty = 100 cycles
  - Base CPI (ideal cache) = 2.0
  - Loads & stores are 36% of instructions
- Miss cycles per instruction
  - I-cache: 1.0 × 0.02 × 100 = 2.0
  - D-cache: 0.36 × 0.04 × 100 = 1.44
- Actual CPI = 2.0 + 2.0 + 1.44 = 5.44
  - The ideal-cache CPU would be 5.44/2 = 2.72 times faster
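The arithmetic on this slide can be checked with a short sketch (the variable names are illustrative, not from the slides):

```python
# Recompute the slide's cache-performance example.
base_cpi = 2.0              # CPI with an ideal (always-hit) cache
miss_penalty = 100          # cycles per miss
icache_miss_rate = 0.02
dcache_miss_rate = 0.04
loads_stores = 0.36         # fraction of instructions that access data

# Stall cycles per instruction = accesses/instr × miss rate × penalty.
icache_stalls = 1.0 * icache_miss_rate * miss_penalty          # every instruction is fetched
dcache_stalls = loads_stores * dcache_miss_rate * miss_penalty # only loads/stores touch data

actual_cpi = base_cpi + icache_stalls + dcache_stalls  # 5.44
slowdown = actual_cpi / base_cpi                       # 2.72
```

Note that the instruction fetch contributes a full access per instruction (hence the 1.0 factor), while data accesses are weighted by the load/store fraction.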

Average Access Time
- Hit time is also important for performance
- Average memory access time (AMAT)
  - AMAT = Hit time + Miss rate × Miss penalty
- Example
  - CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
  - AMAT = 1 ns + (0.05 × 20 ns) = 2 ns (2 cycles per instruction)
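The AMAT formula is easy to wrap in a small helper; here is a minimal sketch using the slide's numbers (the function name is my own, not from the lecture):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, all arguments in the same time unit."""
    return hit_time + miss_rate * miss_penalty

# Slide's example: 1 ns hit time, 5% miss rate, 20 ns miss penalty -> 2 ns
example_amat = amat(1.0, 0.05, 20.0)
```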

Performance Summary
- As CPU performance increases, the miss penalty becomes more significant
  - Decreasing base CPI: a greater proportion of time is spent on memory stalls
  - Increasing clock rate: memory stalls account for more CPU cycles
- Can't neglect cache behavior when evaluating system performance

Interactions with Software
- Misses depend on memory access patterns
  - Algorithm behavior
  - Compiler optimization for memory access
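A classic illustration of access-pattern effects is traversal order over a 2-D array: row-major order walks memory sequentially (good spatial locality), while column-major order strides across rows, which in a language with contiguous row storage such as C causes far more cache misses. This sketch only shows the two access patterns; pure-Python lists won't exhibit the timing difference the way a compiled language would:

```python
# Same computation, two traversal orders. In C-like row-major storage,
# sum_row_major has stride-1 accesses; sum_col_major strides by a whole row.
N = 256
matrix = [[i * N + j for j in range(N)] for i in range(N)]

def sum_row_major(m):
    total = 0
    for i in range(len(m)):          # outer loop over rows
        for j in range(len(m[0])):   # inner loop walks a row sequentially
            total += m[i][j]
    return total

def sum_col_major(m):
    total = 0
    for j in range(len(m[0])):       # outer loop over columns
        for i in range(len(m)):      # inner loop jumps between rows
            total += m[i][j]
    return total
```

Both functions return the same sum; only the memory access pattern, and hence the miss behavior on real hardware, differs.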

Replacement Policy
- Direct mapped: no choice
- Set associative
  - Prefer a non-valid (empty) entry, if there is one
  - Otherwise, choose among the entries in the set
- Least-recently used (LRU)
  - Choose the block unused for the longest time
  - Simple for 2-way, manageable for 4-way, too hard beyond that
- Random
  - Gives approximately the same performance as LRU for high associativity
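LRU replacement within a single set can be modeled in a few lines; this is a behavioral sketch (class and method names are mine), not how the hardware tracks recency:

```python
from collections import OrderedDict

class LRUSet:
    """One set of an n-way set-associative cache with LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, ordered oldest-first

    def access(self, tag):
        """Return True on hit; on miss, insert tag, evicting the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # mark as most-recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)    # evict least-recently used
        self.blocks[tag] = None
        return False
```

For a 2-way set and the access string A B A C B, this yields miss, miss, hit, miss (evicting B), miss (evicting A), matching a hand trace of LRU.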

Multilevel Caches
- Primary cache attached to the CPU: small, but fast
- Level-2 cache services misses from the primary cache: larger and slower, but still faster than main memory
- Main memory services L-2 cache misses
- Some high-end systems include an L-3 cache

Multilevel Cache Example
- Given
  - CPU base CPI = 1, clock rate = 4 GHz
  - Miss rate/instruction = 2%
  - Main memory access time = 100 ns
- With just primary cache
  - Miss penalty = 100 ns / 0.25 ns = 400 cycles
  - Effective CPI = 1 + 0.02 × 400 = 9

Example (cont.)
- Now add an L-2 cache
  - Access time = 5 ns
  - Global miss rate to main memory = 0.5%
- Primary miss with L-2 hit
  - L-1 miss penalty = 5 ns / 0.25 ns = 20 cycles
- Primary miss with L-2 miss
  - L-2 miss penalty = 400 cycles
- CPI = 1 + (0.02 × 20) + (0.005 × 400) = 3.4
- Performance ratio = 9/3.4 = 2.6
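The two-level example above can be recomputed in a short sketch (variable names are illustrative):

```python
# Two-level cache example from the slides: 4 GHz clock, base CPI = 1.
base_cpi = 1.0
clock_ns = 0.25                 # 1 / 4 GHz
mem_ns, l2_ns = 100.0, 5.0      # main memory and L-2 access times
l1_miss_rate = 0.02             # L-1 misses per instruction
global_miss_rate = 0.005        # misses that go all the way to memory

mem_penalty = mem_ns / clock_ns   # 400 cycles
l2_penalty = l2_ns / clock_ns     # 20 cycles

# One level only: every L-1 miss pays the full memory penalty.
cpi_l1_only = base_cpi + l1_miss_rate * mem_penalty                  # 9.0

# Two levels: every L-1 miss pays the L-2 penalty; only global misses
# additionally pay the memory penalty.
cpi_two_level = (base_cpi
                 + l1_miss_rate * l2_penalty
                 + global_miss_rate * mem_penalty)                   # 3.4

speedup = cpi_l1_only / cpi_two_level                                # ~2.6
```

Note the structure of the two-level formula: the 2% L-1 miss rate multiplies the L-2 penalty, and the 0.5% global miss rate multiplies the memory penalty on top of that.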

Multilevel Cache Considerations
- Primary cache: focus on minimal hit time
- L-2 cache
  - Focus on a low miss rate to avoid main memory accesses
  - Hit time has less overall impact
- Results
  - The L-1 cache is usually smaller than a single-level cache would be
  - L-1 block size is smaller than L-2 block size