Memory Hierarchy Design Adapted from CS 252 Spring 2006 UC Berkeley Copyright (C) 2006 UCB.

Since 1980, CPU has outpaced DRAM... CPU performance (1/latency) has improved about 60% per year (2X in 1.5 years), while DRAM has improved about 9% per year (2X in 10 years), so the gap grew about 50% per year. [Figure: performance vs. year for CPU and DRAM.]
Q. How do architects address this gap?
A. Put smaller, faster "cache" memories between the CPU and DRAM. Create a "memory hierarchy".

1977: DRAM faster than microprocessors. In the Apple ][ (1977, Steve Wozniak and Steve Jobs), the CPU cycle time was 1000 ns while DRAM access time was 400 ns.

Levels of the Memory Hierarchy (upper levels are smaller and faster; lower levels are larger and cheaper per bit):

  Level         Capacity        Access Time
  Registers     100s of bytes   <10s of ns
  Cache         KBytes          10-100 ns
  Main Memory   MBytes          200 ns - 500 ns
  Disk          GBytes          10 ms (10,000,000 ns)
  Tape          infinite        sec-min

Staging/transfer units between adjacent levels:
  Registers <-> Cache: instruction operands (1-8 bytes), managed by the program/compiler
  Cache <-> Main Memory: blocks, managed by the cache controller
  Main Memory <-> Disk: pages (512 bytes - 4 KBytes), managed by the OS
  Disk <-> Tape: files (MBytes), managed by the user/operator

Cost per bit also falls at each lower level of the hierarchy.

Memory Hierarchy: Apple iMac G5 (1.6 GHz)

  Level     Size   Latency (cycles, time)
  Reg       1K     1, 0.6 ns
  L1 Inst   64K    3, 1.9 ns
  L1 Data   32K    3, 1.9 ns
  L2        512K   11, 6.9 ns
  DRAM      256M   88, 55 ns
  Disk      80G    10^7, 12 ms

Goal: the illusion of a large, fast, cheap memory. Let programs address a memory space that scales to the disk size, at a speed that is usually as fast as register access. Registers are managed by the compiler, caches by hardware, and main memory and disk by the OS, hardware, and the application.

iMac’s PowerPC 970: all caches on-chip. Registers (1K), L1 (64K instruction), L1 (32K data), 512K L2.

Memory Hierarchy Provides (virtually) unlimited amounts of fast memory by taking advantage of locality.
–Principle of locality (p. 47): programs tend to reuse data and instructions they have used recently (temporal locality) and data near recent accesses (spatial locality).
–Most programs do not access all code or data uniformly.

The Principle of Locality: programs access a relatively small portion of the address space at any instant of time. Two different types of locality:
–Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse).
–Spatial Locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access).
For the last 15 years, hardware has relied on locality for speed. Locality is a property of programs that is exploited in machine design.
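A small sketch of the two access patterns above. The array size and traversal orders are illustrative; in Python the cache effect is only schematic, but in a language with contiguous arrays (such as C) the row-major loop is much faster because consecutive accesses fall in the same cache line.

```python
# Two traversals of the same matrix: both compute the same sum, but the
# row-major one has good spatial locality (adjacent addresses in order)
# while the column-major one strides across rows.
N = 512
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    total = 0
    for i in range(len(m)):         # consecutive elements of a row are
        for j in range(len(m[0])):  # adjacent in memory: good locality
            total += m[i][j]
    return total

def sum_column_major(m):
    total = 0
    for j in range(len(m[0])):      # jumps a whole row between accesses:
        for i in range(len(m)):     # poor spatial locality
            total += m[i][j]
    return total
```

Both loops also show temporal locality on the loop variables and on `total`, which is why compilers keep them in registers.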

Programs with locality cache well... [Figure: memory address vs. time, one dot per access, showing bands of spatial locality, bands of temporal locality, and a region of bad locality behavior.] Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3), 1971.

Multi-Level Memory Hierarchy [Figure: typical values at each level; levels become smaller, faster, and more expensive per byte toward the processor.]

Memory Hierarchy Goal: to provide a memory system with cost almost as low as the cheapest level of memory and speed almost as fast as the fastest level. All data in one level are also found in the level below (the upper level is a subset of the lower level).

Why Important? Microprocessor performance improved 35% per year until 1986, and 55% per year since then, while memory latency improved only about 7% per year. The result is the processor-memory performance gap.

Memory Hierarchy: Terminology
Hit: the data appears in some block in the upper level (example: Block X).
–Hit Rate: the fraction of memory accesses found in the upper level.
–Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss.
Miss: the data must be retrieved from a block in the lower level (Block Y).
–Miss Rate = 1 - (Hit Rate).
–Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor.
Hit Time << Miss Penalty (500 instructions on the Alpha 21264!)

Cache Measures
Hit rate: fraction of accesses found in that level.
–Usually so high that we talk about the miss rate instead.
–Miss rate fallacy: just as MIPS can mislead about CPU performance, miss rate can mislead about memory performance; what matters is average memory access time.
Average memory-access time = Hit time + Miss rate x Miss penalty (ns or clocks)
Miss penalty: time to replace a block from the lower level, including time to replace it in the CPU.
–Access time: time to reach the lower level = f(latency to lower level).
–Transfer time: time to transfer the block = f(bandwidth between upper and lower levels).
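The AMAT formula above can be evaluated directly; the numbers below are illustrative, not from the slides.

```python
# Average memory access time (AMAT) = Hit time + Miss rate * Miss penalty.
# Units are whatever you feed in (cycles here).
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Illustrative: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # 6.0 cycles on average per access
```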

Cache Performance
Memory stall cycles = Number of misses x Miss penalty
  = IC x (Misses / Instruction) x Miss penalty
  = IC x (Memory accesses / Instruction) x Miss rate x Miss penalty

Example Assume we have a computer where the clocks per instruction (CPI) is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits?
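A worked solution as a sketch, assuming one instruction fetch plus 0.5 data accesses per instruction (so 1.5 memory accesses per instruction), as in the Hennessy & Patterson treatment of this example.

```python
# CPI with cache misses = CPI_perfect + stall cycles per instruction,
# using the memory-stall formula from the previous slide.
cpi_perfect = 1.0
accesses_per_instr = 1.0 + 0.5   # instruction fetch + loads/stores (50%)
miss_rate = 0.02
miss_penalty = 25

stall_cpi = accesses_per_instr * miss_rate * miss_penalty  # 0.75
cpi_real = cpi_perfect + stall_cpi                         # 1.75
speedup = cpi_real / cpi_perfect
print(speedup)  # 1.75: the all-hits computer is 1.75x faster
```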

4 Questions for Memory Hierarchy Q1: Where can a block be placed in the upper level? (Block placement) Q2: How is a block found if it is in the upper level? (Block identification) Q3: Which block should be replaced on a miss? (Block replacement) Q4: What happens on a write? (Write strategy)

Block Placement
Direct mapped
–(block address) mod (number of blocks in cache)
Fully associative
–A block can be placed anywhere in the cache.
Set associative
–(block address) mod (number of sets in cache)
–n-way set associative: n blocks in a set
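A sketch of the mapping rules above for a hypothetical 8-block cache. Direct mapped is just 1-way set associative, and fully associative is one set holding all the blocks.

```python
# Which set a block address maps to: (block address) mod (number of sets).
def set_index(block_address, num_sets):
    return block_address % num_sets

num_blocks = 8
# Direct mapped: 8 sets of 1 block each.
print(set_index(12, 8))   # block 12 maps to set 4
# 2-way set associative: 4 sets of 2 blocks each.
print(set_index(12, 4))   # block 12 maps to set 0
# Fully associative: 1 set of 8 blocks; every block maps to set 0.
print(set_index(12, 1))   # 0
```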


The vast majority of processor caches today are direct mapped, two-way set associative, or four-way set associative.

Block Identification
A block address is divided into three fields:
–Tag: used to check all the blocks in the set.
–Index: used to select the set.
–Block offset: the address of the desired data within the block.
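A sketch of extracting these fields, assuming a hypothetical geometry: 64-byte blocks (6 offset bits), 128 sets (7 index bits), 32-bit byte addresses.

```python
# Split a byte address into (tag, index, block offset) fields.
BLOCK_OFFSET_BITS = 6   # log2(64-byte block)
INDEX_BITS = 7          # log2(128 sets)

def decompose(addr):
    offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = decompose(0x12345678)
print(hex(tag), index, offset)  # 0x91a2 89 56
```

Reassembling `(tag << 13) | (index << 6) | offset` recovers the original address, which is a handy sanity check.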

DECstation 3100 cache block size = one word

64KB Cache Using 4-Word Blocks

Block Size vs. Miss Rate
Increasing the block size generally decreases the miss rate.
–Larger blocks exploit spatial locality; instruction references benefit especially.
A serious problem with just increasing the block size: the miss penalty increases.
–Miss penalty: the time required to fetch the block from the next lower level of the hierarchy and load it into the cache.
–Miss penalty = (latency to the first word) + (transfer time for the rest of the block)
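The miss-penalty formula can be sketched with hypothetical memory timings; the 80 ns first-word latency and 10 ns per additional word below are assumptions, not numbers from the slides.

```python
# Miss penalty = latency to the first word + transfer time for the rest.
LATENCY_FIRST_WORD_NS = 80   # hypothetical
TRANSFER_PER_WORD_NS = 10    # hypothetical, per 4-byte word

def miss_penalty(block_size_bytes, word_bytes=4):
    words = block_size_bytes // word_bytes
    return LATENCY_FIRST_WORD_NS + (words - 1) * TRANSFER_PER_WORD_NS

for block in (16, 32, 64):
    print(block, miss_penalty(block))  # penalty grows with block size
```

This is the tension the slide describes: doubling the block size may cut the miss rate, but here it also raises each miss from 110 ns (16-byte block) to 230 ns (64-byte block).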


Block Identification If the total cache size is kept the same, increasing associativity –increases the number of blocks per set –decreases the size of the index –increases the size of the tag

Block Replacement
With direct mapped:
–Only one block frame is checked for a hit, and only that block can be replaced.
With fully associative or set associative:
–Random
–Least-recently used (LRU)
–First in, first out (FIFO)
See Figure 5.6 on p. 400.
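A minimal sketch of LRU bookkeeping for a single set, using an `OrderedDict` to track recency. This is a software model of the policy, not a hardware design.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement among `num_ways` block frames."""
    def __init__(self, num_ways):
        self.num_ways = num_ways
        self.blocks = OrderedDict()  # ordered least- to most-recently used

    def access(self, tag):
        """Return True on a hit, False on a miss (which fills a frame)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # now most recently used
            return True
        if len(self.blocks) >= self.num_ways:
            self.blocks.popitem(last=False)   # evict least recently used
        self.blocks[tag] = True
        return False

s = LRUSet(2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)  # [False, False, True, False, False]
```

The access to "C" evicts "B" (least recently used after "A" was re-touched), so the final "B" misses again.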

Write Strategy
Write through
–The information is written both to the block in the cache and to the block in the lower-level memory.
–Easier to implement; the cache is always clean.
Write back
–The information is written only to the block in the cache. The modified block is written to main memory only when it is replaced.
–Uses less memory bandwidth and less power.
Write through with a write buffer
–Combines the two: the buffer lets the CPU continue while writes drain to memory.

What happens on a write? Comparing the two policies:
–Policy: write-through writes data to the cache block and also to lower-level memory; write-back writes data only to the cache and updates the lower level when a block falls out of the cache.
–Debugging: write-through is easy; write-back is hard.
–Do read misses produce writes? Write-through: no; write-back: yes.
–Do repeated writes make it to the lower level? Write-through: yes; write-back: no.
Additional option: let writes to an un-cached address allocate a new cache line ("write-allocate").

Write Buffers for Write-Through Caches
[Diagram: Processor -> Cache -> Write Buffer -> Lower-Level Memory. The write buffer holds data awaiting write-through to lower-level memory.]
Q. Why a write buffer? A. So the CPU doesn't stall waiting for the write to complete.
Q. Why a buffer, why not just one register? A. Bursts of writes are common.
Q. Are Read After Write (RAW) hazards an issue for the write buffer? A. Yes! Either drain the buffer before the next read, or check the write buffer on a read and send the read first only when there is no conflict.

Write Miss Policy
Write allocate
–The block is allocated on a write miss, followed by the actions of a write hit.
No-write allocate
–Write misses do not affect the cache; the block is modified only in the lower-level memory.
–Unusual.

Example Assume a fully associative write-back cache with many cache entries that starts empty. What are the number of hits and misses when using no-write allocate versus write allocate? Write Mem[100]; Read Mem[200]; Write Mem[200]; Write Mem[100];
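A sketch that counts hits and misses for the sequence above under both policies. It is simplified: the cache is large enough that replacement never occurs, matching the "many cache entries" assumption.

```python
# Simulate a fully associative cache (tracked as a set of cached
# addresses) under the two write-miss policies.
def run(ops, write_allocate):
    cache, hits, misses = set(), 0, 0
    for op, addr in ops:
        if addr in cache:
            hits += 1
        else:
            misses += 1
            # Reads always allocate; writes allocate only under write allocate.
            if op == "read" or write_allocate:
                cache.add(addr)
    return hits, misses

ops = [("write", 100), ("read", 200), ("write", 200), ("write", 100)]
print(run(ops, write_allocate=False))  # (1, 3): only Write Mem[200] hits
print(run(ops, write_allocate=True))   # (2, 2): Mem[100] stays cached too
```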

Write Miss Policy Write-back caches normally use write allocate. Write-through caches often use no-write allocate.

Example Assuming a cache of 16K blocks (1 block = 1 word), 32-bit addressing, and a byte-addressable machine, find the total number of sets and the total number of tag bits for caches that are direct-mapped, 2-way set associative, 4-way set associative, and fully associative.
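A worked solution as a sketch. Since the machine is byte-addressable and a block is one 4-byte word, the block offset is 2 bits; the rest of the 32-bit address splits into index and tag.

```python
import math

ADDR_BITS = 32
NUM_BLOCKS = 16 * 1024   # 16K blocks
OFFSET_BITS = 2          # 4 bytes per one-word block

def sets_and_tag_bits(ways):
    num_sets = NUM_BLOCKS // ways
    index_bits = int(math.log2(num_sets))
    tag_bits_per_block = ADDR_BITS - index_bits - OFFSET_BITS
    return num_sets, NUM_BLOCKS * tag_bits_per_block  # total tag storage

for ways, name in [(1, "direct mapped"), (2, "2-way"),
                   (4, "4-way"), (NUM_BLOCKS, "fully associative")]:
    num_sets, total_tag_bits = sets_and_tag_bits(ways)
    print(f"{name}: {num_sets} sets, {total_tag_bits // 1024} Kbits of tags")
```

Doubling the associativity halves the sets, removes one index bit, and adds one tag bit per block: 256 Kbits of tags direct-mapped, 272 Kbits 2-way, 288 Kbits 4-way, and 480 Kbits fully associative (30-bit tags, 1 set).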

Cache Performance Average memory access time = Hit time + Miss rate X Miss penalty