1 Recap: Memory Hierarchy

2 Unified vs. Separate Level 1 Cache
Unified Level 1 Cache (Princeton Memory Architecture): a single level 1 cache is used for both instructions and data.
Separate instruction/data Level 1 caches (Harvard Memory Architecture): the level 1 (L1) cache is split into two caches, one for instructions (the instruction cache, L1 I-cache) and the other for data (the data cache, L1 D-cache).
[Diagram: on the left, a processor (control, datapath, registers) connected to one unified L1 cache; on the right, the same processor connected to separate L1 I-cache and L1 D-cache.]

3 Memory Access Tree For Unified Level 1 Cache
CPU memory access branches two ways:
    L1 Hit:  fraction = hit rate = H1; access time = 1; stalls = H1 x 0 = 0 (no stall)
    L1 Miss: fraction = miss rate = (1 - H1); access time = M + 1; stall cycles per access = M x (1 - H1)
AMAT = H1 x 1 + (1 - H1) x (M + 1) = 1 + M x (1 - H1)
Stall Cycles Per Access = AMAT - 1 = M x (1 - H1)
(M = miss penalty, H1 = Level 1 hit rate, 1 - H1 = Level 1 miss rate)
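
These formulas are easy to sanity-check numerically. Below is a minimal Python sketch; the function name and the example values (H1 = 97%, M = 50 cycles) are illustrative assumptions, not from the slides:

    def unified_amat(h1, m):
        # AMAT for a unified L1: a hit costs 1 cycle, a miss costs M + 1 cycles
        amat = h1 * 1 + (1 - h1) * (m + 1)
        stalls = amat - 1  # equivalently, M x (1 - H1)
        return amat, stalls

    amat, stalls = unified_amat(h1=0.97, m=50)
    print(f"AMAT = {amat:.2f}, stalls = {stalls:.2f}")
    # AMAT = 2.50, stalls = 1.50, i.e. 1 + 50 x 0.03 and 50 x 0.03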

4 Memory Access Tree For Separate Level 1 Caches
CPU memory access splits into instruction references and data references:
    Instruction L1 Hit:  access time = 1; stalls = 0
    Instruction L1 Miss: access time = M + 1; stalls per access = %instructions x (1 - Instruction H1) x M
    Data L1 Hit:  access time = 1; stalls = 0
    Data L1 Miss: access time = M + 1; stalls per access = %data x (1 - Data H1) x M
Stall Cycles Per Access = %instructions x (1 - Instruction H1) x M + %data x (1 - Data H1) x M
AMAT = 1 + Stall Cycles Per Access
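
The split-cache tree can be checked the same way. In this sketch (the function name is an assumption), the example plugs in the 8 KB miss rates from the table on the next slide and the 75% instruction-reference mix used in the example on slide 7:

    def split_amat(frac_instr, instr_h1, data_h1, m):
        # Each cache's miss rate is weighted by its share of the access stream
        frac_data = 1 - frac_instr
        stalls = frac_instr * (1 - instr_h1) * m + frac_data * (1 - data_h1) * m
        return 1 + stalls  # AMAT = 1 + stall cycles per access

    print(f"{split_amat(0.75, 0.9890, 0.8981, 50):.2f}")  # 2.69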

5 Cache Organization: Separate Instruction and Data Caches?
Miss rates (SPEC92):

    Size      Instruction Cache   Data Cache   Unified Cache
    1 KB      3.06%               24.61%       13.34%
    2 KB      2.26%               20.57%       9.78%
    4 KB      1.78%               15.94%       7.24%
    8 KB      1.10%               10.19%       4.57%
    16 KB     0.64%               6.47%        2.87%
    32 KB     0.39%               4.82%        1.99%
    64 KB     0.15%               3.77%        1.35%
    128 KB    0.02%               2.88%        0.95%

Why separate?
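
One way to read this table: compare a split 8 KB + 8 KB organization against a unified 16 KB cache of the same total size. A short calculation (using the 75%/25% instruction/data mix from the example on slide 7; variable names are illustrative) shows the split design actually misses more often:

    frac_instr, frac_data = 0.75, 0.25
    split_miss = frac_instr * 0.0110 + frac_data * 0.1019  # 8 KB rows of the table
    unified_miss = 0.0287                                  # 16 KB unified row
    print(f"split: {split_miss:.2%}  unified: {unified_miss:.2%}")
    # split: 3.37%  unified: 2.87% -- the unified cache has the lower miss rate,
    # so the case for splitting rests on bandwidth (next slide), not on miss rate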

6 Why have separate caches?
Bandwidth: separate caches let us access instructions and data in parallel (fewer structural hazards).
Most programs don't modify their instructions
– the I-cache can be simpler than the D-cache, since instruction references are never writes
The instruction stream has high locality of reference, so even a small cache achieves high hit rates
– data references never interfere with instruction references

7 Cache Performance Example
To compare a split 8-KB instruction cache plus an 8-KB data cache against a unified 16-KB cache, we assume a hit takes 1 clock cycle, a miss takes 50 clock cycles, and a load or store takes one extra clock cycle on the unified cache (there is only one cache port). 75% of memory accesses are instruction references. Using the SPEC92 miss rates from slide 5 (instruction: 1.10%, data: 10.19%, unified 16 KB: 2.87%):
Average memory access time = 1 + stall cycles per access = 1 + %instructions x (instruction miss rate x miss penalty) + %data x (data miss rate x miss penalty)
For the split cache:
AMAT(split) = 1 + 75% x (1.10% x 50) + 25% x (10.19% x 50) = 2.69 cycles
For the unified cache (every data access pays one extra cycle for the shared port):
AMAT(unified) = 75% x (1 + 2.87% x 50) + 25% x (1 + 1 + 2.87% x 50) = 2.68 cycles
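
The whole example fits in a few lines of Python (a sketch only; variable names are assumed):

    frac_instr, frac_data = 0.75, 0.25
    miss_penalty = 50  # cycles

    # Split caches: 1-cycle hit, no port conflict
    amat_split = 1 + frac_instr * 0.0110 * miss_penalty + frac_data * 0.1019 * miss_penalty

    # Unified cache: loads and stores pay one extra cycle because
    # instructions and data contend for the single cache port
    amat_unified = (frac_instr * (1 + 0.0287 * miss_penalty)
                    + frac_data * (1 + 1 + 0.0287 * miss_penalty))

    print(f"split: {amat_split:.3f}  unified: {amat_unified:.3f}")
    # split: 2.686  unified: 2.685 -- the slide rounds these to 2.69 and 2.68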