Caches Hiding Memory Access Times

[Figure, shown twice: five-stage pipelined datapath – PC, instruction memory, register file, sign extend, shift left 2, ALU, data memory, multiplexers, and control logic, annotated with the stages INSTRUCTION FETCH, INSTR DECODE / REG FETCH, EXECUTE / ADDRESS CALC, MEMORY ACCESS, WRITE BACK]

Memory
 Main memory (DRAM) is much slower than the processor
 Secondary memory is even slower
 The gaps have widened over the past fifteen years, and continue to widen
 A memory hierarchy is used to hide memory access latency

Memory Hierarchy
On-chip (first-level) cache – SRAM
Second-level cache – SRAM
Third-level cache
Main memory – DRAM
Secondary memory – flash memory, hard disk

Memory Technology (2008)

Type            Speed (access time)    Cost per GB
SRAM            a few ns               $2,000-$5,000
DRAM            tens of ns             $20-$75
Magnetic disk   5-20 × 10^6 ns         $0.20-$2.00
Flash (NOR)     80-10,000 ns           $65
Flash (NAND)    25,000 ns and up       $4

Why does a memory hierarchy work?
1. Spatial locality
 –Items to be used next are likely to be near items just used
 –Code is sequential
 –Arrays and records are contiguous in memory
2. Temporal locality
 –Items just used are likely to be used again soon
 –Loops in code
 –Multiple accesses to the same data items
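The two kinds of locality can be made concrete with a short sketch (Python; the 16-byte block size is an assumption for illustration): a sequential scan touches few distinct memory blocks, and repeated use of one variable touches only one.

```python
BLOCK_BYTES = 16  # assumed cache block size for this illustration

def blocks_touched(addresses, block_bytes=BLOCK_BYTES):
    """Count the distinct cache blocks covered by a sequence of byte addresses."""
    return len({addr // block_bytes for addr in addresses})

# Spatial locality: 64 consecutive 4-byte words
sequential = [i * 4 for i in range(64)]
# Temporal locality: 64 accesses to the same counter variable
repeated = [0x1000] * 64

print(blocks_touched(sequential))  # 16 blocks serve 64 accesses
print(blocks_touched(repeated))    # 1 block serves 64 accesses
```

Either way, one block fetched from memory serves many processor accesses, which is why moving whole blocks pays off.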

Structure of a memory hierarchy
Information in memory is moved in blocks
 –Usually more than one word at a time
 –Called "blocks", "lines", or "pages"
Location within a unit
 –Determined by some of the address bits
 –Or anywhere within a set of blocks
The address of a word/byte is split
 –Upper bits are the tag, stored with the block
 –Middle bits address the line or block where it is stored
 –Lower bits address the word within the block and the byte within the word
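The three-way address split can be sketched in Python. The bit widths are parameters, since each cache design chooses its own (a hypothetical helper, not from the slides):

```python
def split_address(addr, offset_bits, index_bits):
    """Split a byte address into (tag, line index, offset within block)."""
    offset = addr & ((1 << offset_bits) - 1)                  # lower bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # middle bits
    tag = addr >> (offset_bits + index_bits)                  # upper bits
    return tag, index, offset

# e.g. with a 2-bit offset and a 6-bit line index:
print(split_address(0x12345678, 2, 6))  # (0x123456, 30, 0)
```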

Structure of a memory hierarchy
Items in a higher level of the hierarchy (closer to the processor) are also present in the lower level
If items are changed in the higher level, the lower level must also be updated at some point
When space in the higher level runs out, there must be a rule for removing items to make more space

Memory Level Issues
 Where does the true image of data/code reside?
 –The higher level contains working copies; the true image is in the lower level of memory
  But the higher level may hold the true image for a while, until the lower level is updated
  It may take time for a changed value to percolate down to the main-memory level
 –Virtual memory (main memory vs. hard disk)
  The true image may be in main memory or in secondary memory -- really a point of view

Terminology for levels of memory
Registers – part of the processor
Main memory – the active memory in the machine
 –Typically DRAM, which loses its contents when powered off
Cache – memory between main memory and the processor
 –Current processors have two or three levels of cache
 –Levels 1 and 2 are usually on chip, level 3 sometimes
Virtual memory
 –Stored on hard disk
 –Used to make the available memory larger than physical memory
 –Used to provide separate memory images for separate processes

Cache Issues
 How is the cache organized and addressed?
 When the cache is written to, how is the memory image updated?
 When the cache is full and new items need to be brought in -- what is removed and replaced?

Cache, Example 1
 256-byte cache (64 4-byte words)
 Each cache "line" or "block" holds one word (4 bytes)
 The byte within a line is addressed by the lowest two bits of the address
 The cache line is addressed by the next six bits of the address
 Each cache line has a "tag" matching the high 24 bits of the memory address
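The field widths in this example follow directly from the sizes; a quick sanity check in Python:

```python
CACHE_BYTES = 256
LINE_BYTES = 4   # one 4-byte word per line
ADDR_BITS = 32

num_lines = CACHE_BYTES // LINE_BYTES            # 256 / 4   = 64 lines
offset_bits = LINE_BYTES.bit_length() - 1        # log2(4)   = 2 byte-offset bits
index_bits = num_lines.bit_length() - 1          # log2(64)  = 6 line-index bits
tag_bits = ADDR_BITS - index_bits - offset_bits  # 32 - 6 - 2 = 24 tag bits

print(num_lines, offset_bits, index_bits, tag_bits)  # 64 2 6 24
```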

[Figures: direct-mapped cache layout – the 32-bit byte address split into a 24-bit tag, a 6-bit line index, and a 2-bit byte offset, with the tag stored alongside each cache line]

Cache Access
1. Find the cache line address (address bits 2-7)
2. Compare the tag to the high 24 bits
 –If matched: cache hit
  »Find the byte address; read or write the item
 –If not matched: cache miss -- go to memory
  »For a read: retrieve the item, write it to the cache, then use it
  »For a write: write to memory and to the cache
3. Direct-mapped cache -- every address can go to only one cache line!
4. What happens when the cache is written to?
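The access steps above can be sketched as a tiny direct-mapped cache model (hypothetical Python; it tracks tags only, with 64 one-word lines as in the example):

```python
class DirectMappedCache:
    """Minimal direct-mapped cache model: tags only, no data storage."""
    def __init__(self, num_lines=64, line_bytes=4):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # None means the line is invalid

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = (addr // self.line_bytes) % self.num_lines  # index bits
        tag = addr // (self.line_bytes * self.num_lines)   # upper bits
        if self.tags[line] == tag:
            return True
        self.tags[line] = tag  # miss: fetch from memory, replace the line
        return False

cache = DirectMappedCache()
print(cache.access(0x100))  # False: compulsory miss
print(cache.access(0x100))  # True: now a hit
print(cache.access(0x200))  # False: 0x200 maps to the same line (collision)
```

Note the last access: two addresses 256 bytes apart collide on the same line, which motivates the set-associative organization later in the deck.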

Write Policy
 Write Through
 –Write to memory and to the cache
 –The time to write to memory could delay the instruction
  »A write buffer can hide this latency
 Write Back (also called Copy Back)
 –Write only to the cache
  »Mark the cache line as "dirty", using an additional bit
 –When a dirty cache line is replaced, write it back to memory
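One way to see the difference is to count main-memory write transactions for a burst of stores to a single cached line (a deliberately simplified model; it ignores the write buffer and assumes the line is eventually evicted):

```python
def memory_writes(policy, stores_to_line, evicted=True):
    """Main-memory write transactions caused by repeated stores to one cached line."""
    if policy == "write-through":
        return stores_to_line  # every store also goes to memory
    if policy == "write-back":
        # the dirty line is flushed once, on eviction
        return 1 if (stores_to_line and evicted) else 0
    raise ValueError(policy)

print(memory_writes("write-through", 100))  # 100 memory writes
print(memory_writes("write-back", 100))     # 1 memory write
```

Write back trades memory traffic for bookkeeping: it needs the dirty bit and a write-on-eviction path.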

Accelerating Memory Access
 Bus bandwidth limited
 –Wider bus, or burst mode
 Memory width limited
 –Wider memory access
 Memory addressing limited
 –Burst-mode access
  »One address retrieves several successive words from memory
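The payoff of burst mode can be seen with assumed timings (30 cycles of address/access latency per request and 1 cycle per word on the bus; both numbers are illustrative, not from the slides):

```python
LATENCY = 30   # cycles to send the address and access memory (assumed)
TRANSFER = 1   # cycles to move one word over the bus (assumed)

def penalty_word_at_a_time(words):
    """Each word pays the full latency: separate request per word."""
    return words * (LATENCY + TRANSFER)

def penalty_burst(words):
    """One request, then successive words stream out back to back."""
    return LATENCY + words * TRANSFER

print(penalty_word_at_a_time(4))  # 124 cycles for a 4-word block
print(penalty_burst(4))           # 34 cycles for the same block
```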

Improving memory-to-cache throughput

Accelerating Memory Access
 How can the cache take advantage of faster memory access?
 Store more than one word per "line" in the cache
 –Any cache miss brings the whole line containing the item into the cache
 Takes advantage of spatial locality
 –The next item needed is likely to be at the next address

Cache with multi-word lines
 256-byte cache, 64 4-byte words
 Each block (line) contains four words (16 bytes)
 –2 bits to address the byte within a word
 –2 bits to address the word within a line
 The cache contains sixteen four-word blocks
 –4 bits to address a cache block (line)
 Each cache line has a tag field for the upper 24 bits of the address
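For this configuration the address splits into four fields instead of three; a small sketch in Python:

```python
def split(addr):
    """Split a 32-bit address for a 256-byte cache with 16-byte lines."""
    byte = addr & 0x3           # byte within word  (2 bits)
    word = (addr >> 2) & 0x3    # word within line  (2 bits)
    index = (addr >> 4) & 0xF   # cache line index  (4 bits)
    tag = addr >> 8             # tag               (remaining 24 bits)
    return tag, index, word, byte

print(split(0xABCD))  # (0xAB, 12, 3, 1)
```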

[Figures: multi-word-line cache – the address split into a 24-bit tag, 4-bit line index, 2-bit word select, and 2-bit byte select; the tag compare produces Hit (sent to Control), and a MUX selects the requested word as Data]

[Figure: the pipelined datapath with the instruction memory and data memory replaced by an instruction cache and a data cache, each with Mem Addr, Mem Data, and Hit signals connected to control]

Instruction Cache Hit / Miss
 On either hit or miss:
 –The instruction is fetched from the cache and placed in the pipeline buffer register
 –The PC is latched into the Memory Address Register
 On a hit:
 –Control sees the hit, and execution continues
 –Mem Addr is unused

Instruction Cache Hit / Miss
 On a miss:
 –Control sees the miss, and execution stalls
 –The PC is reset to PC - 4
 –Values fetched from the registers are unused
 –A memory read cycle is started, using Mem Addr
 When the memory read completes:
 –The value is stored in the cache, and a new tag is written
 –Instruction execution restarts, this time as a cache hit

Set Associative Cache
 Direct-mapped cache
 –Misses caused by collisions: two addresses mapping to the same cache line
 Set associative
 –Two or more (a power of 2) lines for each line address
 –More than one item with the same cache line address can be in the cache
 –The tags of all lines in the set must be checked: a matching tag yields a hit; if none match, it is a miss
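A minimal sketch of a two-way set-associative lookup with LRU replacement (hypothetical Python; tags only, with 8 sets of 16-byte lines as assumed parameters):

```python
class TwoWaySetAssocCache:
    """Two-way set-associative cache model with per-set LRU replacement."""
    def __init__(self, num_sets=8, line_bytes=16):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # each set holds up to 2 tags, ordered least- to most-recently used
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill a way (evicting LRU if full)."""
        block = addr // self.line_bytes
        ways = self.sets[block % self.num_sets]
        tag = block // self.num_sets
        if tag in ways:            # hit: the tags of both ways are checked
            ways.remove(tag)
            ways.append(tag)       # mark as most recently used
            return True
        if len(ways) == 2:
            ways.pop(0)            # evict the least recently used way
        ways.append(tag)
        return False

cache = TwoWaySetAssocCache()
print(cache.access(0))    # False: miss
print(cache.access(128))  # False: miss, same set, fills the second way
print(cache.access(0))    # True: both blocks stay resident
```

Addresses 0 and 128 map to the same set here; in a direct-mapped cache of the same size they would evict each other, but with two ways both remain cached.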

[Figure: two-way set-associative cache, indexed by line address]

[Figure: set-associative cache lookup – the valid bit and tag are checked in each way; a matching compare produces Hit, and a MUX selects the data word by word and byte address]

4-way Set Associative (from book)

Cache Summary -- types
 Direct mapped
 –Each address maps to exactly one line in the cache
 –The line size may accommodate several words
 Set associative
 –A set of lines serves each line address
 –Needs a replacement policy to choose which line to purge when the set is full
 –More flexible, but more complex
 –Fully associative = a single set containing all lines

Cache Summary
 Cache hit
 –The item is found in the cache
 –The CPU continues at full speed
 –Need to verify the valid bit and the tag match
 Cache miss
 –The item must be retrieved from memory
 –The whole cache line is retrieved
 –The CPU stalls for the memory access
  »Out-of-order execution? Switch to another thread?

Cache Summary
 Write policies
 –Write through (always write to memory)
 –Write back (uses a "dirty" bit)
 Associative cache replacement policies
 –LRU (least recently used)
 –Random