The Goal: illusion of large, fast, cheap memory


The Goal: illusion of large, fast, cheap memory

Fact: large memories are slow; fast memories are small.
How do we create a memory that is large, cheap, and fast (most of the time)?
- Hierarchy
- Parallelism

Types of Memories

SRAM:
- value is stored on a pair of inverting gates
- very fast, but takes up more space than DRAM (4 to 6 transistors per cell)
DRAM:
- value is stored as a charge on a capacitor (must be refreshed)
- very small, but slower than SRAM (by a factor of 5 to 10)

Exploiting Memory Hierarchy

Users want large and fast memories! (1997 figures:)
- SRAM access times are 2-25 ns at a cost of $100 to $250 per Mbyte.
- DRAM access times are 60-120 ns at a cost of $5 to $10 per Mbyte.
- Disk access times are 10 to 20 million ns at a cost of $0.10 to $0.20 per Mbyte.
Try and give it to them anyway: build a memory hierarchy.

An Expanded View of the Memory System

[Figure: processor (control, datapath) backed by a chain of memories, ordered from fastest, smallest, and most expensive to slowest, biggest, and cheapest.]

The memory system of a modern computer consists of a series of black boxes ranging from the fastest to the slowest. Besides varying in speed, these boxes also vary in size (smallest to biggest) and cost (highest to lowest). What makes this kind of arrangement work is one of the most important principles in computer design: the principle of locality.

Memory Hierarchy: How Does it Work?

- Temporal Locality (locality in time): keep the most recently accessed data items closer to the processor.
- Spatial Locality (locality in space): move blocks consisting of contiguous words to the upper levels.

How does the memory hierarchy work? It is rather simple, at least in principle. To take advantage of temporal locality, the hierarchy keeps the most recently accessed data items close to the processor, because chances are the processor will access them again soon. To take advantage of spatial locality, we move not only the item that has just been accessed to the upper level, but also the data items adjacent to it.

[Figure: blocks Blk X and Blk Y moving between the upper level (to/from the processor) and lower-level memory.]
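A minimal C sketch (added here for illustration; not part of the original slides) of how both kinds of locality show up in ordinary code:

```c
#include <stdio.h>

/* Summing an array exhibits both kinds of locality:
 * - Spatial: a[i] accesses march through contiguous addresses, so each
 *   cache block fetched brings the next several elements in "for free".
 * - Temporal: sum and i are reused on every iteration, so they stay in
 *   registers or the nearest cache level. */
int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    long sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += a[i];            /* sequential accesses: spatial locality */

    printf("%ld\n", sum);
    return 0;
}
```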

Memory Hierarchy of a Modern Computer System

By taking advantage of the principle of locality:
- Present the user with as much memory as is available in the cheapest technology.
- Provide access at the speed offered by the fastest technology.

[Figure: processor (registers, on-chip cache, control, datapath) backed by a Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), and Tertiary Storage (Disk). Speed grows from 1s of ns at the registers to 10,000,000,000s of ns (10s of seconds) at tertiary storage; size grows from 100s of bytes to Ts of bytes.]

The design goal is to present the user with as much memory as is available in the cheapest technology, while, by taking advantage of the principle of locality, providing an average access speed very close to that offered by the fastest technology. (We will go over this in detail in the next lecture on caches.)

How is the hierarchy managed?

- Registers <-> memory: by the compiler (programmer?)
- Cache <-> memory: by the hardware
- Memory <-> disks: by the hardware and operating system (virtual memory), and by the programmer (files)

Hits vs. Misses

Read hits:
- this is what we want!
Read misses:
- stall the CPU, fetch the block from memory, deliver it to the cache, restart
Write hits:
- can replace the data in cache and memory (write-through)
- or write the data only into the cache, and write it back to memory later (write-back)
Write misses:
- read the entire block into the cache, then write the word
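The two write-hit policies can be contrasted in a few lines of C. Everything below (the CacheLine type, its field names, the toy memory array) is hypothetical, a single-line model rather than a real cache:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-line cache model, invented for illustration. */
typedef struct {
    uint32_t addr;
    uint32_t data;
    bool     valid;
    bool     dirty;              /* used only by the write-back policy */
} CacheLine;

uint32_t memory[1 << 20];        /* toy backing store (addresses < 1M) */

/* Write-through: on a write hit, update the cache AND memory. */
void write_through(CacheLine *line, uint32_t addr, uint32_t data) {
    line->addr = addr;
    line->data = data;
    memory[addr] = data;         /* memory always stays consistent */
}

/* Write-back: on a write hit, update only the cache and mark it dirty;
 * memory is updated later, when the line is evicted. */
void write_back(CacheLine *line, uint32_t addr, uint32_t data) {
    line->addr  = addr;
    line->data  = data;
    line->dirty = true;          /* defer the memory update */
}

void evict(CacheLine *line) {
    if (line->valid && line->dirty)
        memory[line->addr] = line->data;   /* flush the deferred write */
    line->valid = false;
    line->dirty = false;
}

int main(void) {
    CacheLine line = {0};
    write_back(&line, 42, 7);
    line.valid = true;
    evict(&line);                /* memory[42] becomes 7 only here */
    return 0;
}
```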

Cache

Two issues:
- How do we know if a data item is in the cache?
- If it is, how do we find it?
Our first example: block size is one word of data, "direct mapped":
- For each item of data at the lower level, there is exactly one location in the cache where it might be.
- e.g., lots of items at the lower level share locations in the upper level

Direct Mapped Cache

Mapping: a block's cache location is its address modulo the number of blocks in the cache.
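Assuming a small power-of-two cache, the placement rule is one line of C; the sizes below are made up for the sketch:

```c
#include <stdio.h>

/* Direct-mapped placement rule. When the number of blocks is a power
 * of two, the modulo reduces to keeping the low-order bits of the
 * block address. */
int main(void) {
    unsigned num_blocks    = 8;                 /* assumed cache size  */
    unsigned block_address = 29;

    unsigned index = block_address % num_blocks;         /* = 5       */
    unsigned same  = block_address & (num_blocks - 1);   /* low 3 bits */

    printf("block %u maps to cache index %u (%u)\n",
           block_address, index, same);
    return 0;
}
```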

Direct Mapped Cache for MIPS

What kind of locality are we taking advantage of? (With one-word blocks, only temporal locality.)

Direct Mapped Cache

Taking advantage of spatial locality: use multiword blocks, so that a miss brings in the referenced word's neighbors as well.
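For a concrete parameter choice (hypothetical: 64 blocks of four words each, 32-bit byte addresses), the address fields fall out like this:

```c
#include <stdio.h>

/* Address breakdown for a direct-mapped cache with multiword blocks.
 * Assumed parameters: 64 blocks of 4 words (16 bytes) each, so 4 offset
 * bits, 6 index bits, and the remaining 22 bits of tag. */
int main(void) {
    unsigned addr = 0x0000BEEF;

    unsigned byte_offset  =  addr        & 0x3;   /* 2 bits: byte in word  */
    unsigned block_offset = (addr >> 2)  & 0x3;   /* 2 bits: word in block */
    unsigned index        = (addr >> 4)  & 0x3F;  /* 6 bits: which block   */
    unsigned tag          =  addr >> 10;          /* identifies the block  */

    printf("tag=0x%x index=%u word=%u byte=%u\n",
           tag, index, block_offset, byte_offset);
    return 0;
}
```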

Choosing a block size

Large block sizes help with spatial locality, but...
- It takes time to read the memory in: larger block sizes increase the time for misses.
- It reduces the number of blocks in the cache: number of blocks = cache size / block size (e.g., an 8 KB cache with 32-byte blocks holds 256 blocks).
Need to find a middle ground: 16-64 bytes works nicely.
Use split (instruction/data) caches because there is more spatial locality in code.

Performance

Simplified model:
  execution time = (execution cycles + stall cycles) × cycle time
  stall cycles = # of instructions × miss ratio × miss penalty
Two ways of improving performance:
- decreasing the miss ratio
- decreasing the miss penalty
What happens if we increase block size?
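Plugging assumed numbers into the model (the 500 MHz clock and 100-cycle penalty anticipate the multilevel-cache example later in the deck):

```c
#include <stdio.h>

/* A worked instance of the simplified model above; all numbers are
 * assumptions chosen for illustration. */
int main(void) {
    double instructions = 1e6;    /* instructions executed              */
    double base_cpi     = 1.0;    /* cycles per instruction, no stalls  */
    double miss_ratio   = 0.05;   /* misses per instruction             */
    double miss_penalty = 100;    /* cycles per miss                    */
    double cycle_time   = 2e-9;   /* seconds (500 MHz clock)            */

    double exec_cycles  = instructions * base_cpi;
    double stall_cycles = instructions * miss_ratio * miss_penalty;
    double exec_time    = (exec_cycles + stall_cycles) * cycle_time;

    printf("effective CPI = %.2f, execution time = %.4f s\n",
           (exec_cycles + stall_cycles) / instructions, exec_time);
    /* effective CPI = 1 + 0.05 * 100 = 6.00 */
    return 0;
}
```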

Decreasing miss ratio with associativity

Fully Associative vs. Direct Mapped

Fully associative caches provide much greater flexibility:
- Nothing gets "thrown out" of the cache until it is completely full.
Direct-mapped caches are more rigid:
- Any cached data goes directly where the index says to, even if the rest of the cache is empty.
A problem, though...
- Fully associative caches require a complete search through all the tags to see if there's a hit.
- Direct-mapped caches only need to look in one place.

A Compromise

[Figure: a 2-way set associative cache (8 sets, indexes 0-7) and a 4-way set associative cache (4 sets, indexes 0-3), each entry holding a valid bit, tag, and data.]

- 2-way set associative: each address has two possible locations with the same index; one fewer index bit, so 1/2 the indexes.
- 4-way set associative: each address has four possible locations with the same index; two fewer index bits, so 1/4 the indexes.
Address = Tag | Index | Block offset
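A sketch of a 2-way set associative lookup in C; the set count, way count, and all names are assumptions matching the figure's 8-set arrangement:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 2-way set associative cache: 8 sets of 2 ways,
 * one-word blocks. */
#define NUM_SETS 8
#define NUM_WAYS 2

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;
} Line;

Line cache[NUM_SETS][NUM_WAYS];

/* Returns true on a hit. Hardware checks every way of the chosen set
 * in parallel; here it is a small loop. */
bool lookup(uint32_t block_address, uint32_t *data_out) {
    uint32_t index = block_address % NUM_SETS;  /* set selection  */
    uint32_t tag   = block_address / NUM_SETS;  /* remaining bits */

    for (int way = 0; way < NUM_WAYS; way++) {
        Line *line = &cache[index][way];
        if (line->valid && line->tag == tag) {
            *data_out = line->data;
            return true;                        /* hit  */
        }
    }
    return false;                               /* miss */
}
```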

Decreasing miss penalty with multilevel caches

Add a second level cache:
- often the primary cache is on the same chip as the processor
- use SRAMs to add another cache above primary memory (DRAM)
- the miss penalty goes down if the data is in the 2nd level cache
Example: CPI of 1.0 on a 500 MHz machine with a 5% miss rate and 200 ns DRAM access. Adding a 2nd level cache with 20 ns access time decreases the miss rate to main memory to 2%.
Using multilevel caches:
- try to optimize the hit time on the 1st level cache
- try to optimize the miss rate on the 2nd level cache
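The arithmetic behind this example, as a small C program (interpreting the 2% as the fraction of accesses that still go all the way to DRAM, which is one reading of the slide):

```c
#include <stdio.h>

/* Working through the multilevel-cache example above. */
int main(void) {
    double cycle_ns     = 2.0;                    /* 500 MHz clock */
    double dram_penalty = 200.0 / cycle_ns;       /* 100 cycles    */
    double l2_penalty   = 20.0  / cycle_ns;       /*  10 cycles    */

    /* Without L2: every L1 miss (5%) pays the full DRAM penalty. */
    double cpi_no_l2 = 1.0 + 0.05 * dram_penalty;              /* 6.0 */

    /* With L2: all 5% of L1 misses pay the L2 penalty, and the 2%
     * that also miss in L2 pay the DRAM penalty on top. */
    double cpi_l2 = 1.0 + 0.05 * l2_penalty + 0.02 * dram_penalty; /* 3.5 */

    printf("CPI without L2 = %.1f, with L2 = %.1f, speedup = %.2f\n",
           cpi_no_l2, cpi_l2, cpi_no_l2 / cpi_l2);             /* 1.71 */
    return 0;
}
```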