
1 The Memory Hierarchy
Lecture 32, CA&O, Engr. Umbreen Sabir, 21/05/2009

2 Translation-Lookaside Buffer (TLB)
• The TLB optimizes the translation process and reduces memory access time.
• The TLB is a cache that holds recently used page table mappings: each tag holds a virtual page number and its data holds the corresponding physical page number.
• Each TLB entry also holds a reference bit, a valid bit and a dirty bit.
• A TLB miss means one of two things: the page is in the page table and its mapping is simply loaded into the TLB (much more frequent), or the page is not in memory and a page fault exception is raised.
• On a miss, the CPU selects which TLB entry to replace; the victim's reference and dirty bits are then written back into the page table.
• TLB miss rates are 0.01-1% and the miss penalty is 10-100 clock cycles, much smaller than a page fault!
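A minimal sketch of such a TLB in C, assuming the fully associative organization and the entry fields listed above; the sizes and names are illustrative, not any real processor's layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 16   /* illustrative size */

typedef struct {
    uint64_t vpn;    /* tag: virtual page number        */
    uint64_t ppn;    /* data: physical page number      */
    bool valid;      /* entry holds a live mapping      */
    bool ref;        /* set on each use, for replacement */
    bool dirty;      /* page has been written           */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully associative: compare the virtual page number against every
 * entry. Returns true on a hit and writes the physical page number
 * through *ppn; a miss means the page table must be consulted. */
bool tlb_lookup(uint64_t vpn, uint64_t *ppn) {
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].ref = true;   /* record the reference */
            *ppn = tlb[i].ppn;
            return true;         /* TLB hit */
        }
    }
    return false;                /* TLB miss */
}
```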

3 Example:
• Consider a virtual memory system with a 40-bit virtual byte address, 16 KB pages and a 36-bit physical byte address.
• What is the total size of the page table for each process on this machine, assuming that the valid, protection, dirty and use bits take a total of 4 bits and that all the virtual pages are in use?
• Assume that disk addresses are not stored in the page table.
• Page table size = #entries × entry size
• #entries = #pages in the virtual address space = 2^40 bytes / (16 × 2^10 bytes/page) = 2^40 / 2^14 = 2^26 entries
• The width of each entry is 4 + 36 = 40 bits
• Thus the size of the page table is 2^26 × 40 bits = (2^26 × 40) / 2^3 bytes = 5 × 2^26 bytes ≈ 335 MB
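The arithmetic can be checked with a few lines of C; the constants simply restate the example's parameters.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t virt_bits  = 40;       /* virtual byte address width   */
    uint64_t page_bits  = 14;       /* 16 KB page = 2^14 bytes      */
    uint64_t entry_bits = 4 + 36;   /* status bits + physical address */

    uint64_t entries     = 1ULL << (virt_bits - page_bits);  /* 2^26 */
    uint64_t table_bytes = entries * entry_bits / 8;         /* 5 * 2^26 */

    printf("entries    = %llu\n", (unsigned long long)entries);
    printf("table size = %llu bytes (~%.1f MB)\n",
           (unsigned long long)table_bytes, table_bytes / 1e6);
    return 0;
}
```

This prints 67108864 entries and 335544320 bytes, matching the 5 × 2^26 ≈ 335 MB on the slide.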

4 TLB and cache working together (Intrinsity FastMATH processor)
• 4 KB pages; the TLB has 16 entries and is fully associative, so all entries must be compared. Each entry is 64 bits: 20 tag bits (virtual page #), 20 data bits (physical page #), plus valid, reference and dirty bits, etc.
• One of the extra bits is a write-access bit. It prevents programs from writing into pages for which they have only read access, as part of the protection mechanism.
• Three kinds of misses can occur: a cache miss, a TLB miss and a page fault.
• A TLB miss in this design takes 16 cycles on average.
• On a page fault, the CPU saves the process state, gives control of the CPU to another process, and then brings the page in from disk.

5 [Figure-only slide: no transcript text survives]

6 How are TLB misses and page faults handled?
• TLB miss: no entry in the TLB matches the virtual address. In that case, if the page is in memory (as indicated by the page table), the physical page number from the page table is placed in the TLB.
• That kind of TLB miss is handled by the OS in software. Once the mapping is in the TLB, the instruction that caused the TLB miss is re-executed.
• If the valid bit of the retrieved page table entry is 0, the page is not in memory and a page fault exception is raised.
• When a page fault occurs, the OS takes control, saves the state of the process that caused the page fault, and stores the address of the faulting instruction in the EPC.
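A sketch of how a software TLB-miss handler could look, following the steps above; the page-table-entry layout and the helpers tlb_insert and raise_page_fault are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 14   /* 16 KB pages, as in the earlier example */

typedef struct {
    uint64_t ppn;   /* physical page number        */
    bool valid;     /* is the page resident in memory? */
} pte_t;

extern pte_t page_table[];                        /* per-process page table */
extern void tlb_insert(uint64_t vpn, uint64_t ppn);
extern void raise_page_fault(uint64_t vpn);

/* Invoked by the OS when no TLB entry matches the virtual address.
 * After it returns, the faulting instruction is re-executed. */
void tlb_miss_handler(uint64_t vaddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    pte_t pte = page_table[vpn];

    if (pte.valid)
        tlb_insert(vpn, pte.ppn);   /* common case: just refill the TLB */
    else
        raise_page_fault(vpn);      /* page not in memory: page fault   */
}
```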

7 How are TLB misses and page faults handled? (cont.)
• The OS then finds a place for the page by discarding an old one (if the victim is dirty it first has to be saved to disk).
• After that, the OS starts the transfer of the needed page from the hard disk and gives control of the CPU to another process (the transfer takes millions of cycles).
• Once the page has been transferred, the OS reads the EPC and returns control to the offending process so it can complete.
• Also, if the instruction that caused the page fault was a sw, the write control line for the data memory is de-asserted to prevent the sw from completing.
• When an exception occurs, the processor sets a bit that disables further exceptions, so that a subsequent exception cannot overwrite the EPC.
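The same sequence written as a hypothetical handler; every helper function named here is an assumption standing in for real OS machinery, and the scheduling step is schematic.

```c
#include <stdbool.h>
#include <stdint.h>

extern uint64_t pick_victim_page(void);        /* replacement policy       */
extern bool     page_is_dirty(uint64_t ppn);
extern void     write_page_to_disk(uint64_t ppn);
extern void     start_disk_read(uint64_t vpn, uint64_t ppn);
extern void     schedule_other_process(void);  /* hide millions of cycles  */
extern void     resume_at_epc(void);           /* re-run faulting instruction */

void page_fault_handler(uint64_t faulting_vpn) {
    uint64_t victim = pick_victim_page();      /* discard an old page      */

    if (page_is_dirty(victim))
        write_page_to_disk(victim);            /* save the dirty page first */

    start_disk_read(faulting_vpn, victim);     /* fetch the needed page    */
    schedule_other_process();                  /* run someone else meanwhile */

    /* ...once the disk transfer completes... */
    resume_at_epc();                           /* offending process resumes */
}
```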

8 The influence of block size
• In general, a larger block size takes advantage of spatial locality, BUT:
• A larger block size means a larger miss penalty: it takes longer to fill the block.
• If the block size is too big relative to the cache size, the miss rate goes up, because the cache has too few blocks.
• In general, Average Access Time = Hit Time × (1 - Miss Rate) + Miss Penalty × Miss Rate
[Figures: Miss Penalty vs. Block Size (rising); Miss Rate vs. Block Size (falls while exploiting spatial locality, then rises as fewer blocks compromise temporal locality); Average Access Time vs. Block Size (eventually rises from the increased miss penalty and miss rate)]
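The slide's formula is easy to evaluate; the sketch below plugs in made-up miss rates and penalties for a few block sizes to show the U-shaped average access time, with the sweet spot in the middle.

```c
#include <stdio.h>

/* The average-access-time formula as stated on the slide. */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time * (1.0 - miss_rate) + miss_penalty * miss_rate;
}

int main(void) {
    /* Illustrative numbers only: larger blocks lower the miss rate at
     * first, but the miss penalty keeps growing. */
    double hit = 1.0;                                  /* cycles */
    int    block[]        = {16,   32,   64,   128};   /* bytes  */
    double miss_rate[]    = {0.10, 0.06, 0.05, 0.07};
    double miss_penalty[] = {20.0, 30.0, 50.0, 90.0};  /* cycles */

    for (int i = 0; i < 4; i++)
        printf("block %3d B: AMAT = %.2f cycles\n",
               block[i], amat(hit, miss_rate[i], miss_penalty[i]));
    return 0;
}
```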

9 The influence of associativity
• Every change that improves the miss rate can also negatively affect overall performance.
• Example: we can reduce the miss rate by increasing associativity (about a 30% gain for small caches when going from direct-mapped to two-way set associative).
• But high associativity does not make sense for modern caches, which are large: the hardware costs more (more comparators) and the access time is larger.
• While full associativity does not pay for caches, it is good for paged memory, because misses (page faults) are very expensive. A large page size also keeps the page table small.

10 The influence of associativity (SPEC2000)
[Figure: miss rates across associativities, with separate panels for small caches and large caches]

11 Memory write options
• There are two options: write-through (used for caches) and write-back (used for paged memory).
• With write-back, pages are written to disk only if they were modified prior to being replaced.
• The advantages of write-back are that multiple writes to a given page require only one write to the disk, and that write uses the disk's high bandwidth rather than going one word at a time.
• Individual words can be written into the page much faster (at cache rate) than if they were written through to disk.
• The advantage of write-through is that misses are simpler to handle and the policy is easier to implement (using a write buffer).
• In the future, more caches will use write-back because of the growing CPU-memory gap.
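A minimal write-back sketch in C: stores only set the dirty bit, and a single bulk write to the lower level happens at replacement. The line layout and the lower-level write helper are illustrative, not a specific design.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     dirty;
    uint8_t  data[64];   /* block (or page) payload */
} line_t;

extern void write_back_to_lower_level(line_t *line);

/* Repeated writes only touch the fast upper level and set the dirty
 * bit; nothing goes downstream yet. */
void write_word(line_t *line, int offset, uint32_t value) {
    memcpy(&line->data[offset], &value, sizeof value);
    line->dirty = true;
}

/* On replacement, one bulk transfer covers all accumulated writes. */
void evict(line_t *line) {
    if (line->valid && line->dirty)
        write_back_to_lower_level(line);   /* single high-bandwidth write */
    line->valid = false;
    line->dirty = false;
}
```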

12 Processor-DRAM memory gap (latency)
[Figure: processor vs. DRAM performance over time; DRAM improves only about 7% per year]
Solutions to reduce the gap:
- An L3 cache
- Have the L2 and L3 caches do something useful while idle

13 Sources of (cache) misses
• Compulsory (cold start or process migration; first reference): the first access to a block.
• A "cold" fact of life: there is not a whole lot you can do about it.
• Note: if you are going to run billions of instructions, compulsory misses are insignificant.
• Conflict (collision): multiple memory locations (blocks) map to the same cache location. Solution 1: increase the cache size. Solution 2: increase associativity. (See the sketch below.)
• Capacity: the cache cannot contain all the blocks accessed by the program. Solution: increase the cache size.
• Invalidation: another process (e.g., I/O) updates memory.
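The conflict case is easy to demonstrate: two addresses whose set indices coincide evict each other in a direct-mapped cache. The parameters below (64-byte blocks, 256 sets) are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 6   /* 64-byte blocks */
#define SET_BITS   8   /* 256 sets       */

/* The set index is the address bits just above the block offset. */
static unsigned set_index(uint64_t addr) {
    return (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);
}

int main(void) {
    uint64_t a = 0x10000;
    uint64_t b = a + (1u << (BLOCK_BITS + SET_BITS)); /* one cache apart */

    printf("index(a) = %u, index(b) = %u\n", set_index(a), set_index(b));
    /* Equal indices: alternating accesses a, b, a, b, ... miss every
     * time in a direct-mapped cache, even though only two blocks are
     * in use. A two-way set associative cache removes this conflict. */
    return 0;
}
```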

14 Total miss rate vs. cache type and size
[Figure: miss rate broken down by cause; the gap between one-way and two-way shows the additional conflict misses of a direct-mapped cache, the gap between two-way and four-way shows the additional conflict misses of a two-way cache, and capacity misses shrink as caches grow]

15 Design alternatives
• Increase cache size: decreases capacity misses, but may increase access time.
• Increase associativity: decreases the conflict miss rate, but may increase access time.
• Increase block size: decreases the miss rate thanks to spatial locality, but increases the miss penalty; very large blocks may increase the miss rate for small caches.
• So the design of memory hierarchies is interesting.

16 Processor-DRAM memory gap for multi-cores
[Figure: performance vs. number of cores, showing performance degradation for memory-intensive applications as cores are added]

