
1 IT253: Computer Organization Lecture 11: Memory Tonga Institute of Higher Education

2 The Big Picture

3 What is Memory? (Review) A large, linear array of bytes.
–Each byte has its own address in memory. Most ISAs have instructions that use byte addressing (an address starts every 8 bits).
–Data is aligned on word boundaries. This means things like integers and instructions are 32 bits long (1 word).
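A minimal C sketch of this idea (not from the slides): every byte has its own address, so the addresses of consecutive 32-bit integers differ by 4. The array name and printout are just for illustration.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int32_t words[3] = {10, 20, 30};   /* each element is 1 word = 4 bytes */
        for (int i = 0; i < 3; i++) {
            /* consecutive word addresses differ by 4, because every byte
               has its own address and a word spans 4 of them */
            printf("words[%d] at address %p\n", i, (void *)&words[i]);
        }
        return 0;
    }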

4 How we think of memory now When we built our processor, we pretended memory worked very simply, so that we could get instructions and data from it.

5 What do we really need for memory? We need four parts for our memory:
–The cache, the fastest memory, which the processor uses directly
–The memory bus and I/O bus
–Memory (or RAM)
–Hard disks

6 Part I: Inside the Processor The processor uses an internal cache (inside the processor) and an external cache nearby. This is called a two-level cache. If data can't be kept in the cache, it goes to main memory.

7 Part II: Main Memory Main memory is the RAM in the computer. It is usually built from DRAM (dynamic random access memory).

8 Memory Types Explained RAM – Random Access Memory
–Random – any location can be accessed at any time
–DRAM – dynamic RAM: high density, cheap, slow, low power usage. Dynamic means it needs to be "refreshed". This is the main memory.
–SRAM – static RAM: low density, high power, expensive, fast. Static – the memory lasts as long as the power stays on. Caches are made out of this.
Non-Random Access Memory
–Some memory technologies are sequential (like a tape). You need to go through a lot of memory to find the spot you want.

9 RAM What's important to know about RAM?
–Latency – the time it takes for a word to be read from memory
–Bandwidth – the average number of words read per second
If a programmer can fit the whole program in the cache, it will run much faster. Every time the CPU goes to RAM, it must wait a long time to get the data. We can make our programs faster if all the instructions stay inside the cache.

10 SRAM We can make an SRAM cell (one that does not need to be refreshed) with 6 transistors, and then put cells together to make a bigger SRAM. This is a 16-word SRAM diagram. It is addressed with 4 bits (2^4 = 16), and each word holds 8 bits.

11 The SRAM diagram Like everything else, we can draw one simple box to describe an SRAM.
–WE_L – Write Enable
–OE_L – Output Enable
We need Output Enable and Write Enable because we use the same D bus for both input and output, to save space inside the processor. A is the address that we are writing to or reading from. The number of address bits depends on how many words are inside the SRAM.
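To make the box concrete, here is a hypothetical C model of the slide's SRAM: 16 words selected by a 4-bit address, with WE_L and OE_L gating a shared data value. The function name and the use of a plain array are assumptions for illustration, not a description of real hardware.

    #include <stdint.h>
    #include <stdbool.h>

    static uint8_t sram[16];              /* 16 words of 8 bits each */

    /* One access on the shared D bus. WE_L low writes *d into the array;
       otherwise OE_L low drives the stored word back onto *d.
       (The _L suffix means a signal is active when low.) */
    void sram_access(uint8_t a, uint8_t *d, bool we_l, bool oe_l) {
        a &= 0x0F;                        /* only 4 address bits: 2^4 = 16 */
        if (!we_l)
            sram[a] = *d;                 /* write: D bus acts as an input */
        else if (!oe_l)
            *d = sram[a];                 /* read: D bus acts as an output */
    }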

12 DRAM What we know about DRAM:
–Needs to be refreshed regularly
–Holds a lot of data in a small space
–Uses very little power
–Has Output Enable
–Has Write Enable

13 The 1-transistor DRAM memory To store a single bit, we need just 1 transistor.
–To Write: select the row and put the bit on the bit line
–To Read: select the row and sense what comes out on the bit line (only a very small charge). Then rewrite the value, because the read drains the stored charge.
–To Refresh: just do a read, which rewrites the value
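A small C model of the destructive read and the refresh-by-read, assuming a toy 4x4 cell array; the names and sizes are made up for the sketch.

    #include <stdbool.h>

    #define ROWS 4
    #define COLS 4

    static bool cell[ROWS][COLS];      /* one capacitor's charge per cell */

    /* Sensing drains the capacitor, so the value must be written back. */
    bool dram_read(int row, int col) {
        bool value = cell[row][col];   /* sense the few electrons */
        cell[row][col] = false;        /* the read destroyed the charge... */
        cell[row][col] = value;        /* ...so rewrite the value at once */
        return value;
    }

    /* Refresh is just a read of every column; the rewrite restores the row. */
    void dram_refresh(int row) {
        for (int col = 0; col < COLS; col++)
            dram_read(row, col);
    }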

14 Simple DRAM grouping The DRAM cells are put together in an array, where it is possible to access one bit at a time

15 Complicated DRAM grouping The real way DRAM is put together is in layers. Usually 8 layers are put together; the row and column numbers go to all the layers, which return 8 bits (1 byte) at a time.
Example:
–A 2 Mbit DRAM = 256K x 8 layers
–512 rows x 512 columns x 8 planes
–512 x 512 = 262,144 = 256K

16 Diagram for DRAM
–RAS_L – when asserted, A contains the row address
–CAS_L – when asserted, A contains the column address
–WE_L – write enable
–OE_L – output enable
–D – the data that will be either input or output (to save space, we use the same line for input and output)
(The _L suffix means a signal is asserted when it is 0, not 1.)

17 DRAMs through History
–Fast Page Mode DRAM – allowed selecting memory through rows and columns and could automatically fetch the next byte, saving time. It was introduced in 1992 for PCs.
–Synchronous DRAM (SDRAM) – gives a clock signal to the RAM so that it can "pipeline" data, meaning it can work on more than one access at a time. Introduced in 1997 and very common.
–Double Data Rate RAM (DDR-RAM) – can transfer data twice per clock cycle. Introduced in 2000 and used in all new computers.
–Rambus DRAM (RDRAM) – uses a special signaling method that allows faster clock speeds, but is made only by the Rambus company. Introduced in 2001, it was popular for a short time before Intel stopped supporting it.

18 Summary of DRAM and SRAM DRAM
–Slow, cheap, low power
–Good for giving the user a lot of memory at a low price
–Uses 1 transistor to store one bit
SRAM
–Fast, expensive, high power
–Good for people who need speed
–Uses 6 transistors to store one bit

19 Caches Why do we want a cache? If DRAM is slow and SRAM is fast, then we can make the average memory access time very small, as long as most accesses are found in SRAM. We use SRAM to build a small memory that works very quickly: the cache.

20 Different Levels of Memory: The Memory Hierarchy

21 Cache Ideas: Locality Locality – the idea that most of the things you need are close to you. 90 percent of the time, you will be using 10 percent of the code. Two types of locality:
–Temporal – locality in time – if something is used, it will probably be used again in the near future
–Spatial – locality in space – if something is used, then things that are near it will probably be used as well
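A tiny C example of both kinds at once (illustrative, not from the slides): summing an array reuses the same variables every iteration and walks neighbouring elements.

    #include <stddef.h>

    long sum_array(const int *a, size_t n) {
        long sum = 0;                  /* reused every iteration: temporal */
        for (size_t i = 0; i < n; i++)
            sum += a[i];               /* a[i+1] sits next to a[i]: spatial */
        return sum;
    }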

22 How the levels work together The levels of memory are always working together, moving data closer to the fastest level (the cache). The levels copy data between themselves. Block – the smallest piece of data that is copied between levels.

23 The Memory Hierarchy Hit – the data we want is in the memory level we are searching (the example in the picture is Block X)
–Hit Rate – the fraction of accesses that find the data in that memory level
–Hit Time – the time it takes to get a piece of data from the higher level into the processor
Miss – the data is not in the higher level and needs to come from the lower level
–Miss Rate = 1 – Hit Rate
–Miss Penalty – the time it takes to load the data from the lower level into the higher level and send it to the processor
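These terms combine into the standard average-access-time formula: average time = hit time + miss rate x miss penalty. The C one-liner below is a sketch, not from the slides, and the numbers in the comment are illustrative.

    /* E.g. a 1-cycle hit, 10% miss rate and 20-cycle penalty average
       out to 1 + 0.10 * 20 = 3 cycles per access. */
    double average_access_time(double hit_time, double miss_rate,
                               double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }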

24 A simple cache: Direct Mapped In the simplest version, each block is 1 byte: the first cache index holds a byte from the beginning of a word, and the next cache indexes automatically hold the next bytes from main memory. Thus we are using 1-byte blocks in the cache.

25 Direct Mapped Cache A direct mapped cache – a cache of fixed-size blocks. Each block holds data from main memory.
Parts of a direct mapped cache:
–Data – the actual data
–Tag – a special number for each block
–Index – the spot in the cache that holds the data
Parts of a direct mapped cache address:
–Tag Array – the list of tags that identify what's in the cache. A tag will tell us if the data we are looking for is in the cache. Each cache entry has a special, unique tag; if that tag is not in the cache, then we know it is a miss and we need to get the data from main memory.
–Cache Index – the location of a block in the cache
–Block Offset – the byte location in the cache block

26 Direct Mapped Caches The processor uses addresses that map into the cache. The address has special parts, just like an instruction format. With the different pieces of the address we can figure out where to find the data in the cache.
If the cache size is 2^M bytes and the block size is 2^L bytes, then there are 2^(M-L) blocks. If we use 32-bit addresses, then:
–The lowest L bits are the block offset
–The next (M-L) bits are the cache index
–The last (32-M) bits are the tag bits (the tag identifies the address of the data in the cache)

27 Direct Mapped Cache Example Example: a 1 KB cache with 32-byte blocks
–Cache Index = (Address % 1024) / 32
–Block Offset = Address % 32
–Tag = Address / 1024 (the tag holds the address of the data in the cache)
–Valid Bit – says whether the data in the cache entry is good or bad
32 cache blocks x 32-byte blocks = 1024 bytes = 1 KB cache
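The same split can be done with shifts and masks, since 1024 = 2^10 and 32 = 2^5. A small sketch; the example address is arbitrary.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t address = 0x12B4;                /* arbitrary example address */
        uint32_t offset = address & 0x1F;         /* low 5 bits: address % 32 */
        uint32_t index  = (address >> 5) & 0x1F;  /* next 5 bits: (addr % 1024) / 32 */
        uint32_t tag    = address >> 10;          /* the rest: address / 1024 */
        printf("tag=0x%X index=%u offset=%u\n",
               (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }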

28 Direct Mapped Cache Example The cache tag is checked to see if the cache entry really is the data we want. If it is not, we get it from RAM.

29 Direct Mapped Cache Example Example of a Cache Miss

30 Direct Mapped Cache Example A Cache Hit

31 The Block Size Decision The goal is to find the right block size so that you get mostly cache hits, but also so that when you do miss, the penalty is not too bad.
–Larger block size – better spatial locality
–But it takes longer to load a new block into the cache
–If the block size is too big, there are too few blocks in the cache and you will get many misses again

32 A Better Cache: Associative Cache An N-Way Set Associative Cache works differently from the direct mapped cache. In the N-way set, there are N entries for each cache index, so it is like N direct mapped caches working in parallel. All the entries in one set are selected, and then only the one with the matching cache tag is chosen.

33 Pros and Cons: Set Associative Cache The set associative cache gives us many benefits:
–Higher hit rate for the same size cache
–Fewer conflict misses
–Can have a larger cache without changing the number of bits used for the cache index
But there are also bad things:
–You need to compare N tags to choose the right piece of data (so we get a time delay for a MUX)
–The data is only available after we decide whether it's a hit or a miss (with direct mapped, we can assume it's a hit and fix the mistake if it's not)

34 Cache Questions Draw a 32 KB cache with 4-byte blocks that is 2-way set associative.
If you have a 256-byte direct mapped cache with 16-byte blocks, and you have the following tags in your tag array, choose which address will result in a hit in the cache:
Tag array: Index 0 = 0xEF4021, Index 1 = 0xEF4022, Index 2 = 0x430322, Index 3 = 0x320933, Index 4 = 0xA34E44
1. 0x43032263
2. 0x43032202
3. 0xEF402114
4. 0xA34E4441
5. 0x32093301

35 Sources for Cache Misses What can cause a cache miss?
–Compulsory: when you start a computer, all the data in the cache is no good (also called a 'cold start'). Nothing we can do about it.
–Conflict: multiple memory locations map to the same cache spot. You can increase the cache size or increase the associativity.
–Capacity: the cache cannot contain all the blocks needed by the program. Increase the cache size.
–Invalidation: something else changes the data (like some sort of input).

36 A Simple Chart for Cache misses

37 Replacing Blocks in Cache We need a way to decide how to replace blocks in the cache.
–For a direct mapped cache, there is no policy, because we just throw away the block that is in the new block's place.
–For an N-Way Set Associative cache, we have N blocks to choose from to throw away when we need to make room for the new block.
This is called the Cache Block Replacement Policy.

38 Cache Block Replacement Policy
–Random Replacement – hardware randomly selects a block to throw out
–First In, First Out (FIFO) – hardware keeps a list of the order in which blocks came into the cache, and throws out the one that came in first
–Least Recently Used (LRU) – hardware keeps track of when each block was last used; the one that has not been used for the longest is deleted (see the sketch below)
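A minimal C sketch of LRU victim selection, assuming a 4-way set and a per-way timestamp; the struct layout and names are made up for illustration.

    #include <stdint.h>

    #define WAYS 4

    struct way { uint32_t tag; uint64_t last_used; int valid; };

    /* Pick the block to throw out: a free way if one exists, otherwise
       the way that has gone longest without being used. */
    int lru_victim(struct way set[WAYS]) {
        int victim = 0;
        for (int w = 0; w < WAYS; w++) {
            if (!set[w].valid)
                return w;                         /* empty slot: use it first */
            if (set[w].last_used < set[victim].last_used)
                victim = w;                       /* older use = better victim */
        }
        return victim;
    }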

39 Cache Write Policy There are a few ways we can write data to the cache as well. Our problem is that we need to keep the data in memory and in the cache the same. Two options to do this (both sketched below):
–Write Back: store data only in the cache. When the cache block is replaced, write it back to memory. There is only one up-to-date copy, so we must use special controls to make sure we don't make mistakes.
–Write Through: write to memory and to the cache at the same time. We use a small buffer that saves copies of stores before they get written to main memory, because writing to main memory takes longer than writing to the cache.
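A hedged C sketch of the two policies for one cache block; the struct, the dirty bit used as the "special control", and the `mem` pointer (the block's home location in main memory) are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    struct block { uint32_t tag; uint8_t data[32]; bool dirty; };

    /* Write through: cache and memory are updated together, so memory
       always holds a current copy (a write buffer would hide the latency). */
    void write_through(struct block *b, int off, uint8_t v, uint8_t *mem) {
        b->data[off] = v;
        mem[off]     = v;
    }

    /* Write back: only the cache is updated; the dirty bit says the
       block must be copied back to memory when it is evicted. */
    void write_back(struct block *b, int off, uint8_t v) {
        b->data[off] = v;
        b->dirty = true;
    }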

40 Questions for the memory hierarchy Designers of memory systems need to know the answers to these questions before they start building:
1. Where is a block placed in the upper level of memory? (Block Placement)
2. How is a block found if it's in the upper level? (Block Identification)
3. Which block should be replaced on a miss? (Block Replacement)
4. What happens on a write? (Write Strategy)

41 Cache Performance CPU time = (CPU execution clock cycles + Memory stall clock cycles) x Clock cycle time
Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty
We can figure out how well our cache will work with formulas like these.
–Example: suppose 1 instruction takes one clock cycle, the miss penalty is 20 cycles, the miss rate is 10%, and there are 1,000 instructions and 300 memory accesses.
–Then: Memory stall clock cycles = 300 x 0.10 x 20 = 600 cycles
–CPU time = (1000 + 600) x 1 = 1,600 cycles to do 1,000 instructions
This means we are spending 37.5% of our time on memory access!
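The slide's arithmetic, reproduced as a runnable check (the numbers are the slide's own):

    #include <stdio.h>

    int main(void) {
        double stall  = 300 * 0.10 * 20;          /* 600 stall cycles   */
        double cycles = 1000 + stall;             /* 1,600 total cycles */
        printf("total=%.0f cycles, memory share=%.1f%%\n",
               cycles, 100.0 * stall / cycles);   /* prints 37.5%       */
        return 0;
    }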

42 How to improve cache performance Reduce the miss rate – remember the 4 reasons for a miss:
–Compulsory (at first there is nothing in the cache, so everything misses)
–Capacity (we can't fit everything inside the cache)
–Conflict (blocks map to the same spot and push each other out)
–Invalidation (nothing we can do about this)
Reduce the miss penalty. Reduce the time for a hit in the cache. So can we improve cache performance with our programming? Yes!

43 Ways to improve cache performance with programming With instructions:
–Loop interchange – change the nesting of loops so that data is accessed in the order it is laid out in memory, using the cache wisely (example on slide 44)
–Combining loops – combine two loops that use much of the same data and some of the same variables (example on slide 45)
With data in memory:
–Merging arrays – put arrays together: use 1 array of records that hold two types of data instead of two arrays, each holding a different type (example on slide 46)
–Pointers – use pointers to pass data around instead of copying big blocks that must be moved in and out of the cache

44 Loop Interchange Example
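Since the slide's figure is not in the transcript, here is a representative C version of loop interchange; the array sizes are arbitrary.

    #define M 5000
    #define N 100

    /* Before: x is stored row by row, but the loops walk it column by
       column, jumping N ints between consecutive accesses. */
    void before(int x[M][N]) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < M; i++)
                x[i][j] = 2 * x[i][j];
    }

    /* After interchange: the inner loop walks consecutive elements,
       so every byte of a cache block is used before it is evicted. */
    void after(int x[M][N]) {
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                x[i][j] = 2 * x[i][j];
    }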

45 Loop Combining Example
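Again the figure itself is missing, so this is a representative sketch of combining (fusing) two loops that share data; names and sizes are illustrative.

    #define N 1000

    /* Before: by the time the second loop runs, the a[] and c[] blocks
       the first loop touched may already have been evicted. */
    void separate(double a[N], double b[N], double c[N], double d[N]) {
        for (int i = 0; i < N; i++)
            a[i] = b[i] * c[i];
        for (int i = 0; i < N; i++)
            d[i] = a[i] + c[i];
    }

    /* After combining: a[i] and c[i] are reused while still in cache. */
    void combined(double a[N], double b[N], double c[N], double d[N]) {
        for (int i = 0; i < N; i++) {
            a[i] = b[i] * c[i];
            d[i] = a[i] + c[i];
        }
    }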

46 Merging Arrays Example
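A representative sketch of merging two parallel arrays into one array of records, as slide 43 describes; the names are illustrative.

    #define SIZE 1000

    /* Before: val[i] and key[i] live in two different arrays, so a
       lookup touches two different cache blocks. */
    int val[SIZE];
    int key[SIZE];

    /* After merging: each val sits next to its key, so both usually
       arrive in the same cache block. */
    struct record { int val; int key; };
    struct record merged[SIZE];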

47 Changing code A lot of the time, the compiler will change your code into a more optimized version using techniques like these. It will try hard to make sure cache misses do not happen often: it will reorder some instructions, look at memory accesses for possible conflicts, and try to fix them.

48 Summary The chapter about memory covers a great deal, from the way memory is built to the way it works. There are different levels of memory that work together. The cache is the fastest and most important memory, so we have special rules about how to make it work well. We can affect memory speed ourselves through better coding.

