An Introduction to Cache

MSJ-2 Cache Viewed as a Parking Lot at ERAU
Let's consider the parking lot behind King as our cache.
Suppose we numbered each parking spot; although ERAU does not do this, many big parking structures do.
Further suppose, just for illustrative purposes, that we had exactly 16 parking places in the King lot, numbered in hex with 0 through F (hey, King is an engineering building, people here are supposed to know hex ;-)
[Figure: parking lot with slots numbered 0 through F]

MSJ-3 Parking the Car Legally
Now suppose ERAU's parking regulations stated that a faculty member could only use the slot whose number matched the last digit on his or her license plate.
Suppose I ask you to go see if my car is in the parking lot and all you know is my license plate number.
You proceed directly (direct mapped!) to slot number 5. But just because there's a car there doesn't mean it's mine; lots of cars have license plate numbers that end in 5. So you'll have to check the license plate of the car in slot #5 to see if it's mine.
Now suppose that the cost of checking the digits on the license plate grows non-linearly with the number of digits (the analogy is getting a bit strained, but it will have to do ;-). Well, you don't need to check all 7 digits; all you have to do is check the first 6 digits (ABC234). You don't need to check the 5, the last digit of the license plate, since the car couldn't legally be in slot #5 unless the last digit of its license plate were a 5.
[Figure: parking lot with slots numbered 0 through F]
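
The parking rule can be sketched in a few lines of Python. This is only an illustration of the analogy, not part of the original slides: the function names are hypothetical, and the full plate "ABC2345" is an assumption pieced together from the six leading digits (ABC234) and the final 5 mentioned above.

```python
from typing import List, Optional


def slot_for(plate: str) -> int:
    """The only legal slot is the one named by the plate's last hex digit."""
    return int(plate[-1], 16)


def is_my_car_parked(lot: List[Optional[str]], plate: str) -> bool:
    """Go directly to the one legal slot and compare the leading digits only."""
    parked = lot[slot_for(plate)]
    if parked is None:
        return False  # the slot is empty
    # The last digit must already match (the car is parked legally),
    # so only the leading digits need to be checked.
    return parked[:-1] == plate[:-1]


# A 16-slot lot, slots numbered 0x0 through 0xF, initially empty.
lot: List[Optional[str]] = [None] * 16
lot[slot_for("ABC2345")] = "ABC2345"  # assumed plate, parked in slot 5

print(is_my_car_parked(lot, "ABC2345"))  # True: leading digits match
print(is_my_car_parked(lot, "DEF6785"))  # False: slot 5 holds another car
```

Note that the search never scans the lot: one slot is inspected and one comparison is made, which is the whole point of a direct mapping.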

MSJ-4 The Parking Lot as a Cache
The parking lot is a direct mapped cache. The parking spaces are block frames, the cars are blocks. Each block frame can hold exactly one block.
The license plate is a memory address. "New York" and "Empire State" are irrelevant to finding my car in the parking lot, and parts of a memory address will similarly be irrelevant to how the cache works.
The last digit is the block frame # that this block can occupy in our (direct mapped) cache.
The remaining digits are the only information (called the tag) from the license plate that we have to check to see whether our block is the one in the block frame or some other block is parked there.
I chose this license plate picture from the web since it rather fortuitously had only hex digits in it. In a real cache, of course, we'll be looking at binary bits pulled from a physical memory address. The bits may or may not line up perfectly on 4 bit nibble boundaries.
[Figure: a New York license plate next to the parking lot, now labeled as a cache, with block frames numbered 0 through F]

MSJ-5 Interpreting the Physical Address
For example, here's a 32 bit physical address shown in hex. Here's what that would be in binary. Here's how a cache might interpret these bits: tag 0x2468a, block frame # 0x33.
The block frame number (0x33, in this example) is our parking slot number.
Just as we ignored "New York" in our license plate and parking lot example, some of the bits will be ignored by the cache (they are used by the alignment network, however).
Everything else is the tag: what you checked when you went to the correct block frame in the parking lot and wanted to see if it was my car that was parked there.
Only if all the fields were multiples of 4 bits wide would everything line up neatly in hex digits so that, for example, the hex for the tag could be seen in the hex for the overall address as easily as it was in our original license plate example. But the size of each field is dictated by cache and memory design parameters and so is often not a multiple of 4 bits.
In reality, it's the bit patterns that matter, not their hex names; but if we want to talk about these things, hex is a lot simpler to rattle off out loud. My point here is that the hex representation for a tag, for example, may not be easily discernible from the hex of the original physical address; we have to look at the bit patterns in isolation, independent of their alignment in the physical address itself. E.g., we can see the binary bit pattern for the tag in the binary bit pattern for the address, but we don't see 0x2468a in the hex for the address.
E.g., if the cache had more block frames, we'd need more bits to hold the block frame number (0xb3 in place of 0x33), and then the tag would change as well (0x091a2 in place of 0x2468a), since some of its rightmost (least significant) bits would be confiscated to make room for the enlarged block frame #.
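
The field extraction described on this slide can be sketched with shifts and masks. The address 0x12345678 and the field widths (21 bit tag, 6 bit block frame #, 5 bit offset) are assumptions, chosen only because together they reproduce the tag 0x2468a and block frame # 0x33 quoted above; `split_address` is a hypothetical helper name.

```python
def split_address(addr: int, frame_bits: int, offset_bits: int):
    """Split a physical address into (tag, block frame #, offset)."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits
    frame = (addr >> offset_bits) & ((1 << frame_bits) - 1)  # middle bits
    tag = addr >> (offset_bits + frame_bits)                 # everything else
    return tag, frame, offset


addr = 0x12345678  # assumed example address

# 6 bit block frame # (64 block frames), 5 bit offset (32 byte blocks)
tag, frame, offset = split_address(addr, frame_bits=6, offset_bits=5)
print(hex(tag), hex(frame), hex(offset))  # 0x2468a 0x33 0x18

# Same address, but a cache with 256 block frames (8 bit block frame #):
# the two extra frame bits are taken from the low end of the tag.
tag, frame, offset = split_address(addr, frame_bits=8, offset_bits=5)
print(hex(tag), hex(frame), hex(offset))  # 0x91a2 0xb3 0x18
```

Under these assumptions neither 0x2468a nor 0x33 appears as a run of digits in 0x12345678, because the field boundaries do not fall on 4 bit nibble boundaries.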

MSJ-6 Direct Mapped Cache in More Detail
[Figure: main memory drawn as a column of numbered blocks, each one memory width (= block size) wide; the cache drawn as a column of numbered block frames, each holding a tag and its content (a block); the physical address divided into block # and offset, with the block # further divided into tag and block frame #]
The physical address of a requested item in memory controls the operation of the memory hierarchy. An address is interpreted differently by main memory and cache.
Main memory is organized as a set of sequential blocks. A block (a.k.a. a cache line or cache grain) is the quantum of transfer between main memory and cache. Even if the CPU wants just a single byte from a byte-addressable memory, main memory will transfer up an entire block. It's the alignment network that later pulls out and aligns the part that the CPU actually wants.
Main memory uses the block number to find the block in memory.
The width of the main memory (i.e., the block size) determines the number of bits needed for the offset; e.g., for a block size of 16 bytes, we'd need 4 bits to specify the starting position of the bytes the alignment network must extract and align for a CPU register. An instruction's opcode (e.g., LB for load byte, LW for load word) specifies the number of bytes required. Only the alignment network uses the byte offset field of a physical address; it's not used by either the main memory or the cache.
The cache breaks up the bits of the block number into two fields: the tag and the block frame #.
The size of the cache in block frames determines the number of bits needed for the block frame #. E.g., if the cache contains 8 block frames, 3 bits (8 = 2^3) will be needed to uniquely specify a block frame #. All the other bits of the block number form the tag used by the cache.
The cache (our parking lot) is a set of block frames, each of which is analogous to a numbered slot in our parking lot. Each block frame can contain a single block of memory (analogous to our car) and the tag of that block (the leading digits of a license plate).
When the cache gets a request for a block not currently in the cache (we'll see how this decision is made in just a minute), memory is told to send up the requested block, which is then placed in the designated block frame (parking slot). The cache extracts the tag from the address and places it in the tag portion of the block frame.
Presented with a physical address, the cache determines if the requested block is in cache by going to the block frame and comparing the tag of the requested block with the tag of the resident block (if any).
Cache hit: If they match, the block is sent to the alignment network, which uses the offset to extract the requested bytes from the block and align them properly for the destination CPU register.
Cache miss: If the tags don't match, the cache tells main memory to send up the requested block and then places it in its block frame, overwriting any block that used to be there and placing the new block's tag alongside it in the frame.
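
The hit and miss behavior described on this slide can be simulated in a short Python sketch. It is a toy model under stated assumptions, not the design of any real cache: the class and method names are hypothetical, the cache sizes are assumed to be powers of two, main memory is modeled as a `bytes` object, and a request is assumed not to cross a block boundary.

```python
class DirectMappedCache:
    """Toy direct mapped cache: each block frame holds one (tag, block) pair."""

    def __init__(self, num_frames: int, block_size: int, memory: bytes):
        # Both sizes are assumed to be powers of two.
        self.frame_bits = num_frames.bit_length() - 1   # 8 frames -> 3 bits
        self.offset_bits = block_size.bit_length() - 1  # 16 bytes -> 4 bits
        self.block_size = block_size
        self.memory = memory
        self.frames = [None] * num_frames               # all frames empty

    def load(self, addr: int, nbytes: int = 1):
        """Return (hit, data) for a load of nbytes (1 for LB, 4 for LW)."""
        offset = addr & (self.block_size - 1)        # alignment network only
        block_number = addr >> self.offset_bits      # what main memory uses
        frame = block_number & ((1 << self.frame_bits) - 1)
        tag = block_number >> self.frame_bits

        entry = self.frames[frame]
        hit = entry is not None and entry[0] == tag
        if not hit:
            # Miss: memory sends up the whole block, which overwrites
            # whatever was in the frame; its tag is stored alongside it.
            start = block_number * self.block_size
            block = self.memory[start:start + self.block_size]
            self.frames[frame] = (tag, block)

        # "Alignment network": pull the requested bytes out of the block.
        block = self.frames[frame][1]
        return hit, block[offset:offset + nbytes]


memory = bytes(range(256))                 # byte at address a has value a
cache = DirectMappedCache(8, 16, memory)   # 8 block frames, 16 byte blocks

print(cache.load(0x25))  # (False, b'%')    cold miss, block 2 -> frame 2
print(cache.load(0x2A))  # (True, b'*')     same block, so a hit
print(cache.load(0xA5))  # (False, b'\xa5') block 10 also maps to frame 2
print(cache.load(0x25))  # (False, b'%')    block 2 was overwritten
```

The last two loads show the cost of a direct mapping: blocks 2 and 10 share block frame 2, so alternating between them misses every time even though the other seven frames are empty.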