Computer Organization Rabie A. Ramadan Lecture 9

Cache Mapping Schemes

Cache memory is smaller than the main memory, so only a few blocks can be loaded into the cache at a time. The cache does not use the same addresses as main memory, which raises the question: which block in the cache corresponds to which block in the memory? The processor uses the Memory Management Unit (MMU) to convert the requested memory address to a cache address.

Direct Mapping Assigns cache mappings using a modular approach: j = i mod n, where j = cache block number, i = memory block number, and n = number of cache blocks.
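To make the rule concrete, here is a minimal Python sketch of the modular mapping (the function name and example values are mine, not from the slides):

```python
def direct_map(i, n):
    """Direct mapping: memory block i goes to cache block j = i mod n."""
    return i % n

# With n = 10 cache blocks, memory blocks 3, 13, 23, ... all share cache block 3:
print(direct_map(23, 10))  # -> 3
```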

Example Given M memory blocks to be mapped to 10 cache blocks, show the direct mapping scheme. How do you know which block is currently in the cache?

Direct Mapping (Cont.) Bits in the main memory address are divided into three fields. Word  identifies a specific word in the block. Block  identifies a unique block in the cache. Tag  identifies which block from the main memory is currently in the cache.
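A small sketch (my own illustration) of how these fields can be extracted from an address, using the field widths of the example that follows:

```python
def split_address(addr, block_bits, word_bits):
    """Split a main-memory address into (tag, block, word) fields."""
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = addr >> (word_bits + block_bits)
    return tag, block, word

# With a 16-bit address, a 7-bit block field, and a 4-bit word field
# (the layout of the 4K/128/16 example below), the tag is the top 5 bits:
print(split_address(0xABCD, block_bits=7, word_bits=4))  # -> (21, 60, 13)
```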

Example Consider, for example, the case of a main memory consisting of 4K blocks, a cache memory consisting of 128 blocks, and a block size of 16 words. Show the direct mapping and the main memory address format.
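Working the numbers out from the slide's parameters (a derivation, not on the original slide): the word field needs log2(16) = 4 bits, the block field needs log2(128) = 7 bits, and main memory has 4K = 2^12 blocks, so the tag field needs 12 - 7 = 5 bits, for a 16-bit main memory address in total.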

Example (Cont.)

Direct Mapping Advantages: Easy; does not require any search technique to find a block in the cache; replacement is straightforward. Disadvantages: Many blocks in main memory are mapped to the same cache block, so blocks may compete for one cache block while other cache blocks remain empty, giving poor cache utilization.

Group Activity Consider the case of a main memory consisting of 4K blocks, a cache memory consisting of 8 blocks, and a block size of 4 words. Show the direct mapping and the main memory address format.

Given the following direct mapping chart, what are the cache and memory locations required by the following addresses:

Fully Associative Mapping Allows any memory block to be placed anywhere in the cache. A search technique is required to match the requested block number against the tag field of each cache block.

Example We have a main memory with 2^14 words, a cache with 16 blocks, and a block size of 8 words. How many tag and word field bits? The word field requires 3 bits; the tag field requires 11 bits, since 2^14 / 8 = 2^11 = 2048 memory blocks.

Which MM block is in the cache? Naïve method: tag fields are associated with each cache block; compare the tag field of the address with the tag entry of each cache block to check for a hit. CAM (Content Addressable Memory): words can be fetched on the basis of their contents, rather than on the basis of their addresses or locations. For example: find the addresses of all “Smiths” in Dallas.
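As a rough software analogy (my own sketch; a real CAM performs all of these comparisons at once, in parallel hardware):

```python
def associative_lookup(cache_tags, requested_tag):
    """Compare the requested tag against every cache block's stored tag."""
    for block_index, stored_tag in enumerate(cache_tags):
        if stored_tag == requested_tag:
            return block_index  # hit
    return None  # miss

cache_tags = [0x1A, 0x2B, 0x3C, 0x4D]
print(associative_lookup(cache_tags, 0x3C))  # -> 2 (hit in cache block 2)
```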

Fully Associative Mapping Advantages: Flexibility; good utilization of the cache. Disadvantages: Requires a tag search; the associative (parallel) search might require an extra hardware unit; requires a replacement strategy when the cache is full; expensive.

N-way Set Associative Mapping Combines direct and fully associative mapping. The cache is divided into sets of blocks, and all sets are the same size. Main memory blocks are mapped to a specific set based on: s = i mod S, where s = the set to which memory block i maps and S = total number of sets. An incoming block may be placed in any cache block inside its set.
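A minimal sketch of the set-selection rule (illustrative names and values, not from the slides):

```python
def set_index(i, num_sets):
    """Set-associative mapping: memory block i goes to set s = i mod S."""
    return i % num_sets

# A 128-block cache organized 4-way has 128 // 4 = 32 sets:
print(set_index(100, 32))  # memory block 100 -> set 4
```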

N-way Set Associative Mapping Tag field  uniquely identifies the targeted block within the determined set. Word field  identifies the element (word) within the block that is requested by the processor. Set field  identifies the set

N-way Set Associative Mapping

Group Activity Compute the three parameters (Word, Set, and Tag) for a memory system having the following specification: the size of the main memory is 4K blocks, the size of the cache is 128 blocks, and the block size is 16 words. Assume that the system uses 4-way set-associative mapping.

Answer The word field needs log2(16) = 4 bits. The cache has 128 / 4 = 32 sets, so the set field needs 5 bits. Main memory has 4K = 2^12 blocks, so the tag field needs 12 - 5 = 7 bits.

N-way Set Associative Mapping Advantages: Moderate utilization of the cache. Disadvantage: Still needs a tag search inside the set.

If the cache is full and there is a need for block replacement, which one should be replaced?

Cache Replacement Policies Random: Simple; requires a random generator. First In First Out (FIFO): Replace the block that has been in the cache the longest; requires keeping track of each block's lifetime. Least Recently Used (LRU): Replace the block that has gone unused for the longest time; requires keeping track of block usage history.
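A tiny LRU model, as an illustration only (it uses Python's OrderedDict; a hardware cache would track recency with counters or status bits):

```python
from collections import OrderedDict

class LRUCache:
    """On a miss with a full cache, evict the least recently used block."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> data

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # miss: evict the LRU block
        self.blocks[block] = "data"
        return "miss"

cache = LRUCache(capacity=2)
print([cache.access(b) for b in [1, 2, 1, 3, 2]])
# -> ['miss', 'miss', 'hit', 'miss', 'miss']
```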

Cache Replacement Policies (Cont.) Most Recently Used (MRU): Replace the block that has been used most recently; requires keeping track of block history. Optimal: Hypothetical; must know the future.

Example Consider the case of a 4x8 two-dimensional array of numbers, A. Assume that each number in the array occupies one word and that the array elements are stored in column-major order in the main memory from location 1000 to location 1031. The cache consists of eight blocks, each consisting of just two words. Assume also that, whenever needed, the LRU replacement policy is used. We would like to examine the changes in the cache if direct mapping is used as the following sequence of requests for the array elements is made by the processor:
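To make the setup concrete (the actual request sequence appears on a slide image; the helper names below are mine), column-major addressing and the direct-mapped cache block for an element:

```python
def element_address(row, col, base=1000, num_rows=4):
    """Column-major storage: A[row][col] is at base + col * num_rows + row."""
    return base + col * num_rows + row

def cache_block(addr, words_per_block=2, num_blocks=8):
    """Direct mapping: (memory block number) mod (number of cache blocks)."""
    return (addr // words_per_block) % num_blocks

addr = element_address(2, 3)    # A[2][3] -> 1000 + 3*4 + 2 = 1014
print(addr, cache_block(addr))  # memory block 1014 // 2 = 507 -> cache block 3
```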

Array elements in the main memory

Conclusion 16 cache misses and not a single hit; 12 replacements; only 4 of the cache blocks are used.

Group Activity Do the same for fully associative and 4-way set-associative mappings.

Pipelining

Basic Idea  Assembly line  Divide the execution of a task among a number of stages  A task is divided into subtasks to be executed in sequence  Performance improvement compared to sequential execution

Pipeline [Diagram: a job divided into tasks 1..m flowing as a stream through pipeline stages 1..n]

5 Tasks on a 4-stage pipeline [Diagram: space-time (Gantt) chart showing Tasks 1-5 advancing through the four stages over time]

Speedup For a stream of m tasks through an n-stage pipeline with stage time t: T(Seq) = n * m * t; T(Pipe) = n * t + (m - 1) * t = (n + m - 1) * t; Speedup = T(Seq) / T(Pipe) = (n * m) / (n + m - 1).

Efficiency With the same T(Seq) and T(Pipe): Efficiency = Speedup / n = m / (n + m - 1).

Throughput With the same T(Seq) and T(Pipe): Throughput = number of tasks executed per unit of time = m / ((n + m - 1) * t).
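Putting the three formulas together in one small sketch (the function and parameter names are mine):

```python
def pipeline_metrics(n, m, t):
    """Speedup, efficiency, and throughput of m tasks on an n-stage pipeline with stage time t."""
    t_seq = n * m * t
    t_pipe = (n + m - 1) * t
    speedup = t_seq / t_pipe     # = n*m / (n + m - 1)
    efficiency = speedup / n     # = m / (n + m - 1)
    throughput = m / t_pipe      # tasks per unit time
    return speedup, efficiency, throughput

# 4-stage pipeline, 5 tasks, unit stage time -> speedup = 20/8 = 2.5
print(pipeline_metrics(n=4, m=5, t=1.0))  # -> (2.5, 0.625, 0.625)
```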

Instruction Pipeline  Pipeline stall  Some of the stages might need more time to perform their functions.  E.g., the pipeline stalls after I2.  This is called a “bubble” or “pipeline hazard”.

Example  Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS)?  Assume that I 5 fetching process depends on the results of the I 4 evaluation.

Answer

Example Delay due to branch

Pipeline and Instruction Dependency Instruction Dependency The operation performed by a stage depends on the operation(s) performed by other stage(s), e.g. a conditional branch.  Instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored.  The branch takes 3 units of time.

Pipeline and Data Dependency  Data Dependency:  A source operand of instruction Ii depends on the result of executing a preceding instruction Ij, where i > j.  E.g., Ii cannot proceed unless the results of Ij are saved.

Example  ADD R 1, R 2, R 3 R 3  R 1 + R 2  I i  SL R 3, R 3  SL( R 3 )  I i+1  SUB R 5, R 6, R 4 R 4  R 5 – R 6  I i+2  Assume that we have five stages in the pipeline:  IF (Instruction Fetch)  ID (Instruction Decode)  OF (Operand Fetch)  IE (Instruction Execute)  IS (Instruction Store) Show a Gantt chart for this code? Shift Left

Answer  R 3 in both I i and I i+1 needs to be written  Therefore, the problem is a Write after Write Data Dependency

Stalls Due to Data Dependency  Write after write  Read after write  Write after read  Read after read does not cause a stall
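As an illustration (my own sketch with a toy register-set format, not from the slides), classifying the dependencies between two instructions by comparing their source and destination registers:

```python
def classify_hazards(reads1, writes1, reads2, writes2):
    """Classify the data dependencies of a later instruction (2) on an earlier one (1)."""
    hazards = []
    if set(writes1) & set(reads2):
        hazards.append("RAW (read after write)")
    if set(writes1) & set(writes2):
        hazards.append("WAW (write after write)")
    if set(reads1) & set(writes2):
        hazards.append("WAR (write after read)")
    # Read-after-read is deliberately absent: it causes no stall.
    return hazards

# ADD R1, R2, R3 (reads R1, R2; writes R3) followed by SL R3, R3 (reads and writes R3):
print(classify_hazards(["R1", "R2"], ["R3"], ["R3"], ["R3"]))
# -> ['RAW (read after write)', 'WAW (write after write)']
```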

Read after write

Example Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. It is required to show the succession of these instructions in the pipeline. Show all types of data dependency, and compute the speedup and efficiency.

Answer