Presentation on theme: "Memory Hierarchy How to improve memory access. Outline Locality Structure of memory hierarchy Cache Virtual memory."— Presentation transcript:

1 Memory Hierarchy How to improve memory access

2 Outline Locality Structure of memory hierarchy Cache Virtual memory

3 Locality
Principle of locality: programs access a relatively small portion of their address space at any instant of time.
Temporal locality: if an item is referenced, it tends to be referenced again soon.
Spatial locality: if an item is referenced, items near it tend to be referenced soon.
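Both kinds of locality appear in even the simplest loop. A minimal Python sketch (illustrative only, not from the slides):

```python
# A loop that exhibits both kinds of locality.
data = list(range(1000))   # elements stored contiguously, like an array

total = 0
for i in range(len(data)):
    total += data[i]   # `total` and `i` are reused every iteration -> temporal locality
                       # data[0], data[1], ... are adjacent in memory -> spatial locality

print(total)   # 499500
```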

4 Memory Hierarchy
Multiple levels of memory with different speeds and sizes.
Gives users the perception that the memory is as large as the largest level and as fast as the fastest.
The unit of memory considered in the memory hierarchy is a block.
Levels (fastest to slowest): CPU registers, SRAM, DRAM, magnetic disk.

5 Structure of memory hierarchy
[Figure: pyramid with CPU registers at the top, then SRAM, DRAM, and magnetic disk; going down the hierarchy, speed and cost per bit decrease while size increases.]

6 Structure of memory hierarchy

Memory type          Access time   Cost per bit
Registers            ~0.2 ns       —
SRAM (static RAM)    0.5–5 ns      $4,000–$10,000
DRAM (dynamic RAM)   50–70 ns      $100–$200
Magnetic disk        5–20 ms       $0.50–$2

7 Cache
A level of the memory hierarchy between the CPU and main memory.
[Figure: registers → cache → memory → disk, with the ideal at each level: everything you need is in a register / in the cache / in memory.]

8 How to improve memory access time
[Figure: the disk holds all blocks a–h, memory and the cache hold copies of a subset of them, and the CPU's registers hold individual items A–D; each level stores a subset of the level below.]

9 Address Space
Suppose 1 block = 256 bytes = 2^8 bytes, the cache has 8 blocks, the memory has 32 blocks, and the disk has 64 blocks. Then:
the cache has 8 × 2^8 = 2^11 bytes,
the memory has 32 × 2^8 = 2^13 bytes,
the disk has 64 × 2^8 = 2^14 bytes.
For the cache, a block number has 3 bits and an address has 11 bits.
For the memory, a block number has 5 bits and an address has 13 bits.
For the disk, a block number has 6 bits and an address has 14 bits.
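The slide's arithmetic can be checked mechanically; a small Python sketch reproducing it:

```python
# Reproduce the slide's arithmetic: size and address width at each level.
BLOCK_BYTES = 2**8                                 # 1 block = 256 bytes -> 8 offset bits
levels = {"cache": 8, "memory": 32, "disk": 64}    # blocks per level

summary = {}
for name, blocks in levels.items():
    block_bits = blocks.bit_length() - 1           # log2(blocks); all are powers of two
    summary[name] = (blocks * BLOCK_BYTES, block_bits, block_bits + 8)
    print(name, summary[name])
# cache  -> (2048, 3, 11): 2^11 bytes, 3-bit block number, 11-bit address
# memory -> (8192, 5, 13)
# disk   -> (16384, 6, 14)
```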

10 Address Space
Address = block number || offset within the block.
Address in cache: xxx || xxxxxxxx (3-bit block number, 8-bit offset).
Address in memory: xxxxx || xxxxxxxx (5-bit block number, 8-bit offset).
Address in disk: xxxxxx || xxxxxxxx (6-bit block number, 8-bit offset).
[Figure: the 8 cache blocks, 32 memory blocks, and 64 disk blocks listed by binary block number.]

11 Hit / Miss
Hit: the requested data is found in the upper level of the hierarchy.
Hit rate (hit ratio): the fraction of memory accesses found in the upper level.
Hit time: the time to access data when it hits (= time to check whether the data is in the upper level + access time).
Miss: the requested data is not found in the upper level, but is in a lower level of the hierarchy.
Miss rate (miss ratio): 1 − hit rate.
Miss penalty: the time to get a block of data into the upper level, and then into the CPU.
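These quantities combine in the standard average-memory-access-time formula (implied by the definitions above, though not stated on the slide). A quick sketch with made-up numbers:

```python
# AMAT = hit time + miss rate * miss penalty (standard formula).
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    return hit_time_ns + miss_rate * miss_penalty_ns

# Illustrative numbers only (not from the slides):
print(amat(1.0, 0.05, 100.0))   # 1 ns hits, 5% miss rate, 100 ns penalty -> 6.0 ns
```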

12 Cache
A level of the memory hierarchy between the CPU and main memory. To access data:
The CPU requests data from the cache.
Check whether the data is in the cache.
Cache hit: transfer the requested data from the cache to the CPU.
Cache miss: transfer a block containing the requested data from memory to the cache, then transfer the requested data from the cache to the CPU.
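The procedure above can be sketched with a dictionary standing in for the cache (block size, cache size, and the eviction policy here are arbitrary assumptions):

```python
# Minimal cache hit/miss sketch; dicts stand in for cache and memory.
BLOCK_SIZE = 4
CACHE_BLOCKS = 4
memory = {b: f"block{b}" for b in range(16)}   # block number -> block data
cache = {}                                      # block number -> block data

def access(addr):
    """Return 'hit' or 'miss' for a byte address."""
    block = addr // BLOCK_SIZE
    if block in cache:                  # cache hit: data served from the cache
        return "hit"
    if len(cache) >= CACHE_BLOCKS:      # cache full: evict a block first
        cache.pop(next(iter(cache)))    # naive eviction, illustration only
    cache[block] = memory[block]        # miss: copy the block from memory
    return "miss"

print(access(8))   # first touch of block 2 -> miss
print(access(9))   # same block -> hit (spatial locality)
print(access(8))   # same byte again -> hit (temporal locality)
```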

13 How cache works
[Figure: animation — the CPU requests blocks A, B, C, D, E, F in turn from a four-block cache; the first request for each block misses and loads it from memory, a repeated request hits, and once the cache is full a block must be replaced.]

14 Where to place a block in cache
Direct-mapped cache: each memory location is mapped to exactly one location in the cache. (But one cache location can hold different memory locations at different times.)
Other mappings can be used.
[Figure: cache blocks c0–c3 and memory blocks b0, b1, b2, …, with the cache–memory mapping drawn as arrows.]

15 Direct-mapped cache
[Figure: a 4-block cache (indices 00–11) and a memory of 4-byte blocks with addresses 000000–111111; each memory block maps to cache index = block number mod 4.]
1 block = 4 bytes.

16 Fully-associative cache
[Figure: the same 4-block cache and memory; a memory block may be placed in any of the cache blocks.]
1 block = 4 bytes.

17 Set-associative cache
[Figure: the cache blocks are grouped into sets; a memory block maps to one set (block number mod number of sets) and may occupy any block within that set.]
1 block = 4 bytes.
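The three placement policies of slides 14–17 differ only in how many candidate cache locations a memory block has. A small sketch (the cache geometry is an assumption, not from the slides):

```python
# Candidate cache locations for a memory block under each placement policy.
CACHE_BLOCKS = 8
WAYS = 2
NUM_SETS = CACHE_BLOCKS // WAYS   # 4 sets for a 2-way set-associative cache

def direct_mapped(block_number):
    return [block_number % CACHE_BLOCKS]          # exactly one candidate slot

def set_associative(block_number):
    s = block_number % NUM_SETS                   # which set the block maps to
    return [s * WAYS + w for w in range(WAYS)]    # any way within that set

def fully_associative(block_number):
    return list(range(CACHE_BLOCKS))              # anywhere in the cache

print(direct_mapped(13))      # [5]
print(set_associative(13))    # [2, 3]  (set 1, ways 0 and 1)
print(fully_associative(13))  # [0, 1, ..., 7]
```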

18 Determine if a block is in the cache
For each block in the cache:
Valid bit: indicates that the block contains valid data.
Tag: identifies which memory block is stored in this cache block.
Example:
If the valid bit is false, no block from memory is stored in that cache block.
If the valid bit is true, the tag holds the (upper bits of the) memory address of the stored block.

19 Example: direct-mapped
[Figure: the 4-block cache with a valid bit and tag per block; a valid entry's tag combines with the cache index to identify the memory block it holds.]

20 Example: Fully-associative
[Figure: the 4-block cache with a valid bit and tag per block; here the tag is the full memory block number, since a block may be anywhere in the cache.]

21 Example: set-associative cache
[Figure: the cache organized as sets of two blocks, each with a valid bit and tag; the tag identifies which of the memory blocks mapping to that set is present.]

22 Access a direct-mapped cache
[Figure: the memory address is split into tag, cache index, and block offset; the index selects one cache entry, the stored tag is compared (=) with the address tag, and the result ANDed with the valid bit signals a hit. The cache address is the index concatenated with the offset.]
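The datapath above can be sketched directly in code (the 2-bit offset and 3-bit index are assumed sizes):

```python
# Sketch of the direct-mapped lookup: split the address, index the cache, compare tags.
OFFSET_BITS, INDEX_BITS = 2, 3          # 4-byte blocks, 8-entry cache (assumptions)

def split(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

# cache[index] = (valid, tag, block_data)
cache = [(False, 0, None)] * (1 << INDEX_BITS)

def lookup(addr):
    tag, index, _ = split(addr)
    valid, stored_tag, _ = cache[index]
    return valid and stored_tag == tag   # hit = valid bit AND tag match

cache[3] = (True, 2, "some block")       # pretend tag 2 lives at index 3
print(lookup((2 << 5) | (3 << 2) | 1))   # same tag and index -> True  (hit)
print(lookup((1 << 5) | (3 << 2) | 1))   # same index, other tag -> False (miss)
```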

23 Access a fully-associative cache
[Figure: the memory address is split into tag (the full block number) and offset; the tag is compared in parallel against every cache entry, and a valid entry with a matching tag signals a hit.]

24 Access a set-associative cache
[Figure: the memory address is split into tag, set index, and offset; the index selects a set, and the address tag is compared against every way in that set.]

25 Access a set-associative cache
[Figure: each way's comparison (stored tag = address tag, ANDed with that way's valid bit) produces a per-way hit signal (hit0, hit1); the access hits if any way matches.]

26 Block size vs. Miss rate

27 Handling Cache Misses
If an instruction is not in the cache, we have to wait for memory to respond and write the data into the cache (multiple cycles). This causes a processor stall.
Steps to handle a miss:
Send PC − 4 to memory (the PC has already been incremented, so this is the address of the instruction that missed).
Read from memory into the cache and wait for the result.
Update the cache information (tag + valid bit).
Restart the instruction execution.

28 Handling Writes
Write-through: when data is written, both the cache and the memory are updated.
Keeps the copies in cache and memory consistent.
Slow, because writing to memory is slower.
Improve by using a write buffer that holds data waiting to be written to memory; the processor can then continue execution.
Write-back: when data is written, only the cache is updated.
Memory becomes inconsistent with the cache, but writes are faster.
Once a modified block is removed from the cache, it must be written back to memory.
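The contrast between the two policies can be sketched in a few lines (a dict stands in for each level; the dirty-flag bookkeeping is an illustrative simplification):

```python
# Sketch contrasting write-through and write-back.
memory = {0: "old"}
cache = {}
dirty = {}            # addr -> True if the cached copy is newer than memory

def write_through(addr, value):
    cache[addr] = value
    memory[addr] = value          # memory updated on every write: consistent, slower

def write_back(addr, value):
    cache[addr] = value
    dirty[addr] = True            # memory now stale; write happens only on eviction

def evict(addr):
    if dirty.pop(addr, False):
        memory[addr] = cache[addr]   # deferred write of the modified block
    cache.pop(addr, None)

write_back(0, "new")
print(memory[0])   # still "old": memory is inconsistent until eviction
evict(0)
print(memory[0])   # "new": the block was written back
```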

29 Performance Improvement
Increase hit rate / reduce miss rate:
Increase the cache size.
Choose a suitable block size.
Good cache associativity.
Good replacement policy.
Reduce the miss penalty:
Multilevel cache (a second-level cache catches first-level misses before going to memory).

30 Multilevel Cache
[Figure: CPU → L1 cache → L2 cache → memory.]

31 L1 and L2 cache sizes of various processors

Processor                 L1 cache   L2 cache
Pentium                   16 KB      —
Pentium Pro               16 KB      256/512 KB
Pentium MMX               32 KB      —
Pentium II and III        32 KB      —
Celeron                   32 KB      128 KB
Pentium III Cumine        32 KB      256 KB
AMD K6 and K6-2           64 KB      —
AMD K6-3                  64 KB      256 KB
AMD K7 Athlon             128 KB     —
AMD Duron                 128 KB     64 KB
AMD Athlon Thunderbird    128 KB     256 KB

32 Virtual Memory
Similar to cache:
Based on the principle of locality.
Memory is divided into equal blocks called pages.
If a requested page is not found in memory, a page fault occurs.
Allows efficient and safe sharing of memory among multiple programs: each program has its own address space.
Virtually extends the memory size: a program can be larger than the memory.

33 Virtual Memory
[Figure: programs A, B, and C each have their own virtual address space; address translation maps virtual addresses to physical addresses in main memory, with overflow pages kept in the disk swap space.]

34 Virtual Memory
[Figure: program A's virtual address space mapped onto main memory.]
The virtual address space can be larger than the physical address space.

35 Address Calculation
A virtual address consists of a virtual page number and a page offset; a physical address consists of a physical page number and the same page offset. Address translation (via the page table) replaces the virtual page number with a physical page number.

36 Page Table
[Figure: the page table register points to the page table, which is indexed by the virtual page number; each entry holds a valid bit and a physical page number. The physical address is the physical page number concatenated with the page offset.]
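The translation step can be sketched directly; the 4 KB page size and the table contents below are assumptions for illustration (the slide gives no page size):

```python
# Sketch of page-table translation: VPN indexes the table; the offset is kept.
PAGE_OFFSET_BITS = 12                   # 4 KB pages (an assumed size)

page_table = {0x1: (True, 0x2A),        # VPN -> (valid bit, physical page number)
              0x2: (False, None)}       # valid bit 0 -> page fault

def translate(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise LookupError("page fault")  # handled by the OS (next slide)
    return (ppn << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))   # VPN 0x1 -> PPN 0x2A, offset 0xABC -> 0x2aabc
```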

37 Page fault
When the valid bit of the requested page = 0, a page fault occurs.
Handling a page fault:
Get the requested page from disk (using information in the page table).
Find an available page frame in memory:
If there is one, put the requested page in and update its entry in the page table.
If there is none, find a page to be replaced (according to the page replacement policy), replace it, and update both entries in the page table.

38 Page Replacement
Page replacement policy:
Least recently used (LRU): replace the page that has not been used for the longest time.
Updating replaced pages:
If the replaced page was changed (written to), it must be written back to disk.
Write-back is more efficient than write-through here.
If the replaced page was not changed, no write-back is necessary.
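LRU can be sketched with an ordered dictionary whose insertion order tracks recency (a simplification; real systems approximate LRU with reference bits):

```python
# Sketch of LRU page replacement using an OrderedDict.
from collections import OrderedDict

class LRUPages:
    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()    # page -> None; dict order = recency

    def touch(self, page):
        """Access a page; return the evicted page on a fault, else None."""
        if page in self.resident:
            self.resident.move_to_end(page)   # hit: mark most recently used
            return None
        victim = None
        if len(self.resident) >= self.frames:
            victim, _ = self.resident.popitem(last=False)  # evict least recent
        self.resident[page] = None
        return victim

mem = LRUPages(frames=2)
mem.touch("A"); mem.touch("B"); mem.touch("A")
print(mem.touch("C"))   # evicts "B", the least recently used page
```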

39 Other information in page tables
Use/reference bit: used for the LRU policy.
Dirty bit: indicates the page has been modified and must be written back to disk when replaced.

40 Translation-lookaside buffer (TLB)
A cache that stores recently used page table entries, for efficiency.
When the operating system switches from process A to process B (a context switch), A's page table entries in the TLB must be replaced by B's.
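A minimal sketch of a TLB in front of per-process page tables, flushed on a context switch (the table contents and flush-on-switch design are illustrative assumptions; real TLBs may instead tag entries with a process id):

```python
# Sketch of a TLB caching page table entries, flushed on context switch.
page_tables = {"A": {1: 0x10, 2: 0x20},   # per-process page tables (VPN -> PPN)
               "B": {1: 0x30}}

tlb = {}              # VPN -> PPN, entries for the *current* process only

def lookup(vpn, process):
    if vpn in tlb:
        return tlb[vpn], "TLB hit"
    ppn = page_tables[process][vpn]       # slow path: walk the page table
    tlb[vpn] = ppn                        # fill the TLB for next time
    return ppn, "TLB miss"

def context_switch():
    tlb.clear()       # A's translations must not be used for B

print(lookup(1, "A"))   # miss, then cached
print(lookup(1, "A"))   # hit
context_switch()
print(lookup(1, "B"))   # miss again: the TLB was flushed
```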

41 [Figure: the disk swap space holds the pages of processes A, B, and C; memory holds part of each process, the currently used page table (cached in the TLB), and the currently used data and program (cached for the CPU).]

42 Three C's
Cache misses fall into three categories: compulsory misses (the first access to a block), capacity misses (the cache cannot hold all the blocks the program needs), and conflict misses (blocks compete for the same location in direct-mapped or set-associative caches).

43 Effects of the three C's
[Figure: miss rate vs. cache size for one-way, two-way, and four- and eight-way set associativity; higher associativity reduces conflict misses.] Compulsory misses are too small to be seen in this graph.

44 Design factors

