Presentation on theme: "CMPE 421 Parallel Computer Architecture"— Presentation transcript:
1 CMPE 421 Parallel Computer Architecture, PART 5: More Elaborations with Cache & Virtual Memory
2 Cache Optimization Categories: Reducing Miss Penalty
- Multilevel caches
- Critical word first: don't wait for the full block to be loaded before sending the requested word and restarting the CPU.
- Read miss before write miss: this optimization serves reads before writes have been completed. Consider:
  SW R3, 512(R0)   ; M[512] ← R3   (cache index 0)
  LW R1, 1024(R0)  ; R1 ← M[1024]  (cache index 0)
  LW R2, 512(R0)   ; R2 ← M[512]   (cache index 0)
  If the write buffer hasn't completed writing to location 512 in memory, the read of location 512 will put the old, wrong value into the cache block, and then into R2.
- Victim caches
3 Victim Caches
- One approach to lowering miss penalty is to remember what was discarded, in case it is needed again.
- A victim cache contains only blocks that are discarded from a cache because of a miss ("victims"); it is checked on a miss to see if it has the desired data before going to the next lower level of memory.
- The AMD Athlon has a victim cache with eight entries.
- Jouppi found that victim caches of one to five entries are effective at reducing misses, especially for small, direct-mapped data caches. Depending on the program, a four-entry victim cache might remove one quarter of the misses in a 4-KB direct-mapped data cache.
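The lookup-and-swap behavior described above can be sketched in C. This is a minimal simulation, not a hardware description: the cache sizes, FIFO victim replacement, and all names are illustrative assumptions, and only block addresses (no data) are tracked.

```c
/* Sketch: direct-mapped cache backed by a tiny fully associative victim
 * cache. Sizes and FIFO victim replacement are illustrative assumptions. */
#include <stdbool.h>
#include <assert.h>

#define MAIN_SETS   4   /* tiny direct-mapped main cache */
#define VICTIM_WAYS 2   /* fully associative victim cache */

typedef struct { bool valid; unsigned tag; } Line;

static Line main_cache[MAIN_SETS];
static Line victim[VICTIM_WAYS];   /* victim entries hold full block addresses */
static int  victim_next = 0;       /* FIFO replacement pointer */

/* Returns true on a hit in either the main cache or the victim cache. */
bool cache_access(unsigned block_addr) {
    unsigned set = block_addr % MAIN_SETS;
    unsigned tag = block_addr / MAIN_SETS;

    if (main_cache[set].valid && main_cache[set].tag == tag)
        return true;                              /* main-cache hit */

    for (int i = 0; i < VICTIM_WAYS; i++) {
        if (victim[i].valid && victim[i].tag == block_addr) {
            /* Victim hit: swap the victim block with the conflicting one. */
            Line evicted = main_cache[set];
            main_cache[set].valid = true;
            main_cache[set].tag   = tag;
            victim[i].valid = evicted.valid;
            victim[i].tag   = evicted.valid ? evicted.tag * MAIN_SETS + set : 0;
            return true;
        }
    }

    /* Miss in both: the displaced block becomes a victim. */
    if (main_cache[set].valid) {
        victim[victim_next].valid = true;
        victim[victim_next].tag   = main_cache[set].tag * MAIN_SETS + set;
        victim_next = (victim_next + 1) % VICTIM_WAYS;
    }
    main_cache[set].valid = true;
    main_cache[set].tag   = tag;
    return false;
}
```

For example, accessing blocks 0 and 4 (which conflict in set 0) and then block 0 again turns what would have been a conflict miss into a victim-cache hit.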
4 Cache Optimization Categories: Reducing the Miss Rate
- Larger block size
- Larger cache size
- Higher associativity
- Way prediction / pseudo-associativity: in way prediction, extra bits are kept in the cache to predict the way of the next cache access.
- Compiler optimizations
Reducing the time to hit in the cache:
- Small and simple caches
- Avoiding address translation
- Pipelined cache access
5 Cache Optimization
Compiler-based cache optimization reduces the miss rate without any hardware change.
For instructions:
- Reorder procedures in memory to reduce conflicts
- Use profiling to determine likely conflicts among groups of instructions
For data:
- Merging arrays: improve spatial locality with a single array of compound elements instead of two separate arrays
- Loop interchange: change the nesting of loops to access data in the order it is stored in memory
- Loop fusion: combine two independent loops that have the same looping structure and overlapping variables
- Blocking: improve temporal locality by accessing "blocks" of data repeatedly instead of going down whole columns or rows
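Blocking is the only technique in this list not illustrated on the next slides, so here is a minimal sketch: the classic blocked matrix multiply. The matrix size N and block factor B are illustrative assumptions; B is tuned so that a BxB tile stays in the cache while it is reused.

```c
/* Blocking sketch: tiled matrix multiply C = C + A * Bm.
 * N and B are illustrative; choose B so a BxB tile fits in the cache. */

#define N 8   /* matrix dimension */
#define B 4   /* block (tile) size */

void matmul_blocked(double A[N][N], double Bm[N][N], double C[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                /* Work on one BxB tile of each operand at a time, so the
                 * tiles are reused from the cache instead of memory. */
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++) {
                        double sum = C[i][j];
                        for (int k = kk; k < kk + B; k++)
                            sum += A[i][k] * Bm[k][j];
                        C[i][j] = sum;
                    }
}
```

The unblocked version streams whole rows and columns through the cache; the blocked version keeps each tile resident across its B reuses, trading loop overhead for far fewer capacity misses.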
6 Examples
- Merging arrays reduces misses by improving spatial locality: arrays that are accessed simultaneously are combined into one array of compound elements.
- Loop interchange gives sequential accesses instead of striding through memory every 100 words, again improving spatial locality.
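The two captions above can be made concrete with source-level sketches; the array names and the size of 100 are illustrative (the 100-word stride matches the slide's example).

```c
/* Sketches of the two data-layout optimizations above. Names and sizes
 * are illustrative assumptions. */

#define SIZE 100

/* Merging arrays: val[i] and key[i] are always used together, so one
 * struct array gives one miss per element pair instead of two. */
struct merged { int val; int key; };
struct merged merged_array[SIZE];   /* instead of: int val[SIZE], key[SIZE]; */

/* Loop interchange: C stores x row-major, so putting j in the inner loop
 * walks memory sequentially instead of striding SIZE words per access. */
void interchange(int x[SIZE][SIZE]) {
    /* before: for (j ...) for (i ...) x[i][j] = 2 * x[i][j]; */
    for (int i = 0; i < SIZE; i++)        /* after: rows in the outer loop */
        for (int j = 0; j < SIZE; j++)
            x[i][j] = 2 * x[i][j];
}
```

Neither transformation changes the results computed, only the order in which memory is touched.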
7 Examples
- Some programs have separate sections of code that access the same arrays (performing different computations on common data).
- Fusing multiple loops into a single loop allows the data in the cache to be used repeatedly before being swapped out.
- Loop fusion reduces misses through improved temporal locality (rather than the spatial locality improved by array merging and loop interchange).
- Accessing arrays "a" and "c" would have caused twice the number of misses without loop fusion.
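A minimal loop-fusion sketch, using the slide's array names "a" and "c"; the exact computations and the size are illustrative assumptions, since the slide's original code is not in the transcript.

```c
/* Loop fusion sketch: two loops over the same arrays become one, so each
 * a[i] and c[i] is reused while still in the cache. The computations and
 * the array size are illustrative assumptions. */

#define LEN 100

int a[LEN], b[LEN], c[LEN], d[LEN];

void fused(void) {
    /* before: one loop computed a[i] = b[i] + 1; a second, separate loop
     * computed d[i] = a[i] + c[i], re-fetching a[] and c[] from memory. */
    for (int i = 0; i < LEN; i++) {
        a[i] = b[i] + 1;        /* first computation */
        d[i] = a[i] + c[i];     /* second computation reuses a[i] while hot */
    }
}
```

Fusion is legal here because the second statement only reads values the first statement has already produced in the same iteration.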
11 VIRTUAL MEMORY
You're running a huge program that requires 32 MB, but your PC has only 16 MB available... Rewrite your program so that it implements overlays:
- Execute the first portion of code (fit it in the available memory).
- When you need more memory, find some memory that isn't needed right now.
- Save it to disk.
- Use the memory for the latter portion of code.
- And so on...
Memory is to disk as registers are to memory: the disk serves as an extension of memory, and main memory can act as a "cache" for the secondary stage (magnetic disk).
12 A Memory Hierarchy: Extend the Hierarchy with the Disk
[Figure: CPU with registers; load/I-fetch and store paths through the cache to main memory (DRAM), extended to the disk. Hardware manages movement between cache and memory; software manages movement between memory and disk.]
- Main memory acts like a cache for the disk.
- Cache: about $20/MByte, <2 ns access time, 512 KB typical
- Memory: about $0.15/MByte, 50 ns access time, 256 MB typical
- Disk: about $0.0015/MByte, 15 ms (15,000,000 ns) access time, 40 GB typical
The operating system is responsible for managing the movement of memory between disk and main memory, and for keeping the address translation table accurate.
13 Virtual Memory
Idea: keep only the portions of a program (code, data) that are currently needed in main memory. Currently unused data is saved on disk, ready to be brought in when needed. The result appears as a very large virtual memory (limited only by the disk size).
Advantages:
- Programs that require large amounts of memory can be run (as long as they don't need it all at once).
- Multiple programs can be in virtual memory at once; only active programs will be loaded into memory.
- A program can be written (linked) to use whatever addresses it wants to; it doesn't matter where it is physically loaded.
- When a program is loaded, it doesn't need to be placed in contiguous memory locations.
Disadvantages:
- The memory a program needs may all be on disk.
- The operating system has to manage virtual memory.
14 Virtual Memory
- We will focus on using the disk as a storage area for chunks of main memory that are not being used.
- The basic concepts are similar to providing a cache for main memory, although we now view part of the hard disk as backing the memory.
- Only a few programs are active at a time.
- An active program might not need all the memory that has been reserved for it (the rest is stored on the hard disk).
15 The Virtual Memory Concept
- Virtual memory space: all possible memory addresses (4 GB in 32-bit systems). All that can be conceived of.
- Disk swap space: area on the hard disk that can be used as an extension of memory (typically equal to RAM size). All that can be used.
- Main memory: physical memory (typically 1 GB). All that physically exists.
16 The Virtual Memory Concept
- Error: this address can be conceived of, but doesn't correspond to any memory. Accessing it will produce an error.
- Disk address (not in main memory): this address can be accessed, but it is currently only on disk and must be read into main memory before being used. A table maps from its virtual address to the disk location.
- Physical address (with a disk address): this address can be accessed immediately, since it is already in memory. A table maps from its virtual address to its physical address. There will also be a backup location on disk.
17 The Process
The CPU deals with virtual addresses. Steps to accessing memory with a virtual address:
1. Convert the virtual address to a physical address.
   - This needs a special table (virtual address → physical address).
   - The table may indicate that the desired address is on disk but not in physical memory; then read the location from disk into memory (this may require moving something else out of memory to make room).
2. Do the memory access using the physical address.
   - Check the cache first (note: the cache uses only physical addresses).
   - Update the cache if needed.
18 Structure of Virtual Memory
Return to our library analogy:
- A virtual address is like the title of a book.
- A physical address is like the location of that book in the library.
[Figure: the virtual address from the processor goes to the address translator, which sends the physical address to memory; on a page fault, an elaborate software page-fault handling algorithm takes over.]
19 Translation (hardware that translates virtual addresses to physical addresses)
- Since the hardware accesses memory, we need to convert from a logical address to a physical address in hardware.
- The Memory Management Unit (MMU) provides this functionality.
[Figure: the CPU sends a virtual (logical) address to the MMU, which outputs the physical (real) address into physical memory, addressed 0 to 2^n − 1.]
20 Address Translation
In virtual memory, blocks of memory (called pages) are mapped from one set of addresses (called virtual addresses) to another set (called physical addresses).
21 Page Faults
- The virtual page number is used to index the page table. If the valid bit is on, the page table supplies the physical page number (i.e., the starting address of the page in memory) corresponding to the virtual page.
- If the valid bit is off, the page currently resides only on disk, at a specified address. In many systems, the table of physical page addresses and disk page addresses, while logically one table, is stored in two separate data structures. Dual tables are justified in part because we must keep the disk addresses of all the pages, even if they are currently in main memory.
- If the valid bit for a virtual page is off, a page fault occurs and the operating system must be given control. Once the operating system gets control, it must find the page in the next level of the hierarchy (usually magnetic disk) and decide where to place the requested page in main memory.
22 Terminology
- Page: the unit of memory transferred between disk and main memory.
- Page fault: when a program accesses a virtual memory location that is not currently in main memory.
- Address translation: the process of finding the physical address that corresponds to a virtual address.
Cache ⇒ Virtual memory:
- Block ⇒ Page
- Cache miss ⇒ Page fault
- Block addressing ⇒ Address translation
23 Differences Between Virtual and Cache Memory
- The miss penalty is huge (millions of cycles). Solution: increase the block size (page size) to around 8 KB, because transfers have a large startup time but data transfer is relatively fast once started.
- Even on faults (misses), VM must provide information on the disk location, so the VM system must have an entry for all possible locations.
- When there is a hit, the VM system provides the physical address in memory, not the actual data (in the cache we have the data itself). This saves room: one address rather than 8 KB of data.
- Since the miss penalty is so huge, VM systems typically have a very small miss (page fault) rate.
24 In Virtual Memory Systems
- Pages should be large enough to amortize the high access time (4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB).
- Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative).
- A sophisticated LRU replacement policy is preferable.
- Page faults can be handled in software.
- Write-back is used (a write-through scheme does not work): we need a scheme that reduces the number of disk writes.
25 Keeping Track of Pages: The Page Table
- All programs use the same virtual addressing space, so each program must have its own memory mapping: each program has its own page table to map virtual addresses to physical addresses.
- The page table resides in memory and is pointed to by the page table register.
- The page table has an entry for every possible page (in principle, not in practice...), so no tags are necessary.
- A valid bit indicates whether the page is in memory or on disk.
26 Virtual to Physical Mapping
Both virtual and physical addresses are broken down into a page number and a page offset (index). There is no tag: all entries are unique.
Example:
- 4 GB (32-bit) virtual address space
- 32 MB (25-bit) physical address space
- 8 KB (13-bit) page size (block size)
Virtual address: virtual page number (bits 31-13) | page offset (bits 12-0)
Physical address: physical page number (bits 24-13) | page offset (bits 12-0)
Translation (note: this may involve reading from disk; page tables are stored in main memory):
- A 32-bit virtual address is given to the VM hardware.
- The virtual page number (index) is derived from it by removing the page (block) offset.
- The virtual page number is looked up in a page table.
- When found, the entry is either the physical page number, if in memory (V = 1), or the disk address, if not in memory (a page fault, V = 0).
- If not found, the address is invalid.
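The translation steps above can be sketched in C with the slide's parameters (32-bit virtual addresses, 25-bit physical addresses, 8 KB pages). The tiny page table and its contents are illustrative assumptions.

```c
/* Address-translation sketch for 32-bit VA, 25-bit PA, 8 KB (13-bit) pages.
 * The page table contents used here are illustrative. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 13                      /* 8 KB pages */
#define VPN_BITS  (32 - PAGE_BITS)        /* 19-bit virtual page number */
#define PPN_BITS  (25 - PAGE_BITS)        /* 12-bit physical page number */

typedef struct { bool valid; uint32_t ppn; } PTE;

/* Translate vaddr; returns false on a page fault (valid bit off). */
bool translate(const PTE *page_table, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;            /* page table index */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    if (!page_table[vpn].valid)
        return false;                                /* page fault */
    *paddr = (page_table[vpn].ppn << PAGE_BITS) | offset;
    return true;
}
```

Note that the page offset passes through the translation unchanged; only the page number is replaced.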
28 Virtual Memory Sizing
Quantities to work out for a virtual memory system:
- Bits for the page address (offset)
- Bits for the virtual page number
- Number of virtual pages
- Entries in the page table
- Bits for the physical page number
- Number of physical pages
- Bits per page table line
- Total page table size
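The checklist above can be worked as small helper functions. The 4-byte page-table entry used in the test is an assumption; the address and page-size parameters are taken from the earlier slide's example (32-bit VA, 25-bit PA, 8 KB pages).

```c
/* Sizing helpers for the virtual-memory checklist. Parameter values are
 * supplied by the caller; the PTE size used below is an assumption. */
#include <stdint.h>

int vpn_bits(int va_bits, int page_bits) { return va_bits - page_bits; }
int ppn_bits(int pa_bits, int page_bits) { return pa_bits - page_bits; }

/* Number of pages in an address space = 2^(address bits - offset bits);
 * the virtual-page count is also the number of page table entries. */
uint64_t num_pages(int addr_bits, int page_bits) {
    return 1ull << (addr_bits - page_bits);
}

uint64_t page_table_bytes(int va_bits, int page_bits, int pte_bytes) {
    return num_pages(va_bits, page_bits) * pte_bytes;
}
```

With the example parameters this gives a 19-bit VPN, a 12-bit PPN, 2^19 = 524,288 page table entries, and (assuming 4-byte entries) a 2 MB page table per process.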
29 Write Issues
Write-through: update both disk and memory.
+ Easy to implement
- Requires a write buffer
- Requires a separate disk write for every write to memory
- A write miss requires reading in the page first, then writing back the single word
Write-back: write only to main memory; write to the disk only when the block is replaced.
+ Writes are fast
+ Multiple writes to a page are combined into one disk write
- Must keep track of when a page has been written (dirty bit)
30 Page Replacement Policy
Exact least recently used (LRU) is possible but expensive, so use approximate LRU:
- A use bit (or reference bit) is added to every page table line.
- On a hit, the PPN is used to form the address and the reference bit is turned on, so the bit is set at every access.
- The OS periodically clears all use bits.
- The page to replace is chosen among the ones whose use bit is zero; one such entry is picked as the victim (e.g., at random).
- If the OS chooses to replace the page, the dirty bit indicates whether the page must be written out before its location in memory can be given to another page.
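The use-bit and dirty-bit bookkeeping above can be sketched as follows; the frame count and the choice of "first frame with use bit zero" as the victim are illustrative assumptions (the slide allows any such frame, e.g. chosen at random).

```c
/* Approximate-LRU sketch: a use (reference) bit set on every access and
 * periodically cleared by the OS; victims are picked among use == 0 frames.
 * Frame count and victim choice are illustrative. */
#include <stdbool.h>

#define FRAMES 4

typedef struct { bool use; bool dirty; } Frame;
Frame frames[FRAMES];

void touch(int f, bool write) {          /* called on every access (hit) */
    frames[f].use = true;
    if (write) frames[f].dirty = true;   /* page must be written back later */
}

void clear_use_bits(void) {              /* done periodically by the OS */
    for (int i = 0; i < FRAMES; i++) frames[i].use = false;
}

/* Pick a victim: the first frame with use == 0 (any such frame would do). */
int pick_victim(void) {
    for (int i = 0; i < FRAMES; i++)
        if (!frames[i].use) return i;
    return 0;   /* all recently used: fall back to frame 0 */
}
```

On replacement, the victim's dirty bit decides whether a disk write is needed before the frame is reused, which is exactly the write-back policy from the previous slide.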
31 Virtual Memory Example
System with a 20-bit virtual address, 16 KB pages, and 256 KB of physical memory. The page offset takes 14 bits, leaving 6 bits for the V.P.N. and 4 bits for the P.P.N.
The page table is indexed by the virtual page number; each line holds a valid bit and either a physical page number or a disk sector address.
- Access to a virtual page whose valid bit is 1: the table supplies the physical page number (e.g., PPN = 0010), which is concatenated with the page offset to form the physical address.
- Access to a virtual page whose valid bit is 0: a page fault to the page's disk sector. Pick a page to "kick out" of memory (using LRU; assume LRU selects the victim for this example), then read the data from sector 1239 into the freed frame (PPN = 1010).