2 Data Organization

Big endian - most significant byte stored in the first memory location; each additional byte stored in the next location.
Little endian - least significant byte stored in the first memory location; each additional byte stored in the next location.
Alignment
- Data often requires more than one byte to represent a value.
- Memory is byte addressed, so multi-byte values must be stored in more than one location.
- Neither format is better than the other; a CPU expects data to be stored in one or the other. Problems arise when data is transferred between computers using different organizations.
5 Memory Configuration

Single chip - address bus, data bus, and control bus are all connected to the one memory chip.
Multiple chips - address bus and control bus are connected to all chips; different bits of the data bus are connected to the data pins of different chips.
6 Computer Architectures

von Neumann - instructions and data stored in the same memory module.
Harvard - separate memory modules for instructions and data.
Modern PCs - a Harvard organization is used in cache memory (separate instruction and data caches).
7 Memory Hierarchy

Hierarchical memory system: registers, cache, main memory, secondary memory.
- One of the most important considerations in understanding the performance capabilities of a processor.
- Some types of memory are far less efficient, but cheaper, than others, so computer systems use a combination of memory types to provide the best performance at the best cost (the hierarchical memory approach).
- In general, the faster a memory is, the more expensive it is per bit of storage.
- By using a hierarchy of memories, each with different access speeds and storage capacities, a computer system can exhibit performance above what would be possible without combining the various types.
- Memory is classified by its distance from the processor, measured in the number of machine cycles it takes to access the memory (closer = faster).
8 Memory Hierarchy Terminology

Hit - the requested data resides in a given level of memory.
Miss - the requested data is not found in the given level of memory.
Hit rate - the percentage of memory accesses found in a given level of memory.
Miss rate - the percentage of memory accesses not found in a given level of memory (1 - hit rate).
Hit time - the time required to access the requested data in a given level of memory.
Miss penalty - the time required to process a miss.
- Typically, we are concerned with the hit rate only for the upper levels of memory.
- The miss penalty includes the time to replace a block in upper-level memory plus the additional time to deliver the requested data to the processor. The time to process a miss is typically significantly larger than the time to process a hit.
9 Memory Hierarchy Access Time

Registers       1 ns - 2 ns      System
L1 Cache        3 ns - 10 ns     System
L2 Cache        25 ns - 50 ns    System
Main Memory     30 ns - 90 ns    System
Fixed disk      5 ms - 29 ms     Online
Optical disk    100 ms - 5 s     Near line
Magnetic tape   10 s - 3 m       Offline
10 Locality of Reference

Temporal locality - recently accessed items tend to be accessed again in the near future.
Spatial locality - accesses tend to be clustered in the address space (e.g., arrays or loops).
Sequential locality - instructions tend to be accessed sequentially.
- Processors access memory in a patterned way: if memory location X is accessed at time t, there is a high probability that location X+1 will be accessed in the near future.
- Locality of reference can be exploited by implementing memory as a hierarchy: when a miss is processed, instead of simply transferring the requested data to a higher level, the entire block containing the data is transferred. Since the additional data in the block is likely to be needed in the near future, it can then be loaded quickly from the faster memory.
- This principle lets a system use a small amount of very fast memory to effectively accelerate the majority of memory accesses.
11 Cache

- Small, very high-speed memory that temporarily stores data from frequently used memory locations.
- Connected to main memory.
- L1 is smaller (8K or 16K) and resides on the processor; L2 is typically 256K or 512K and resides between the CPU and main memory.
- Purpose is to speed up memory accesses by storing recently used data closer to the CPU instead of fetching it from main memory every time.
- Cache is composed of SRAM.
- Cache is not accessed by address; it is accessed by content (content-addressable memory).
12 Cache Mapping Schemes

The mapping scheme determines where data is placed when it is originally copied into cache and provides a method for the CPU to find previously copied data when searching the cache.
- Direct mapped cache
- Fully associative cache
- Set associative cache
For a cache to be functional it must store useful data, but the data isn't useful if the CPU can't find it. When accessing data or instructions, the CPU first generates a main memory address. If the data has been copied to cache, its address in cache is not the same as the main memory address. How does the CPU find the data when it is in cache? It uses a specific mapping scheme that "converts" the main memory address into a cache location by giving special significance to the bits in the main memory address. The bits are divided into distinct groups called fields; depending on the mapping scheme there may be 2 or 3 fields, and how the fields are used depends on the scheme as well.
13 Direct Mapped Cache

Modular approach: block X of main memory maps to block X mod N of cache, where N is the total number of blocks in cache.
In direct mapping the binary main memory address is partitioned into three fields: tag, block, and word.
- There are more main memory blocks than cache blocks, so main memory blocks compete for cache locations.
- Inexpensive but restrictive: a given block of memory can only be placed in one particular block of cache.
14 Example

A small system has 16 words of main memory divided into 8 blocks (each block holds 2 words). Assume the cache is 4 blocks in size (a total of 8 words).
- A main memory address has 4 bits (2^4 = 16 words).
- The 4-bit address is divided into three fields: word field 1 bit, block field 2 bits, tag field 1 bit.
Mapping:
Main Memory                    Maps to Cache
Block 0 (addresses 0, 1)       Block 0
Block 1 (addresses 2, 3)       Block 1
Block 2 (addresses 4, 5)       Block 2
Block 3 (addresses 6, 7)       Block 3
Block 4 (addresses 8, 9)       Block 0
Block 5 (addresses 10, 11)     Block 1
Block 6 (addresses 12, 13)     Block 2
Block 7 (addresses 14, 15)     Block 3
15 Main Memory Address 9 = 1001 (binary)

Split into fields:
- tag = 1 (1 bit)
- block = 00 (2 bits)
- word = 1 (1 bit)
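The field split above can be sketched with a few bit operations. The defaults match the example's geometry (4-bit addresses, 1-bit word, 2-bit block, 1-bit tag); the function name is illustrative, not from the slides:

```python
def direct_mapped_fields(address, word_bits=1, block_bits=2, tag_bits=1):
    """Split a main memory address into (tag, block, word) fields.

    Defaults match the slides' example: 16 words of memory,
    2-word blocks, a 4-block direct mapped cache.
    """
    word = address & ((1 << word_bits) - 1)                 # low bits
    block = (address >> word_bits) & ((1 << block_bits) - 1)
    tag = (address >> (word_bits + block_bits)) & ((1 << tag_bits) - 1)
    return tag, block, word

# Address 9 = 1001 in binary → tag 1, block 00, word 1
print(direct_mapped_fields(9))  # (1, 0, 1)
```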
16 Fully Associative Cache

- Built from associative memory so it can be searched in parallel.
- A single search must compare the requested tag to all tags in cache to determine whether the block is present.
- Special hardware is required to allow associative searching (expensive).
- A block of memory can be placed in any block of cache, so it is not as restrictive as direct mapping.
- Requires a larger tag to be stored, which results in a larger cache.
17 Set Associative Cache

N-way set associative cache mapping is a combination of direct mapped and fully associative mapping.
- The address maps a block to a set of cache blocks.
- The address is partitioned into three fields: tag, set, and word.
- This scheme sits in the middle between fully associative and direct mapped.
- All sets in the cache must be the same size: in a 2-way set associative cache each set is two blocks, in an 8-way cache there are 8 cache blocks per set, and so on.
- The tag and word fields are the same as in direct mapping; the set field indicates into which cache set the main memory block maps.
Example: 2-way set associative mapping with a main memory of 2^14 words and a cache of 16 blocks, each of 8 words, gives 8 sets in cache. The main memory address has to be 14 bits long; the set field is then 3 bits, the word field is 3 bits, and the tag field is the remaining 8 bits.
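The field widths in the example follow directly from the cache geometry. A small sketch that derives them, assuming all sizes are powers of two (the function name is illustrative):

```python
import math

def field_widths(mem_words, cache_blocks, words_per_block, ways):
    """Compute (tag, set, word) field widths for an N-way cache.

    Assumes every size is a power of two, as in the slides' example.
    """
    addr_bits = int(math.log2(mem_words))         # total address bits
    word_bits = int(math.log2(words_per_block))   # offset within a block
    sets = cache_blocks // ways                   # blocks per set → set count
    set_bits = int(math.log2(sets))
    tag_bits = addr_bits - set_bits - word_bits   # whatever is left
    return tag_bits, set_bits, word_bits

# Slides' example: 2^14-word memory, 16 blocks of 8 words, 2-way
print(field_widths(2**14, 16, 8, 2))  # (8, 3, 3)
```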
18 Main Memory

- Medium speed; much larger than cache.
- Complemented by a very large secondary memory.
- Composed of DRAM.
19 Secondary Memory

- Very large, with slower access.
- Hard disk, removable media.
20 Virtual Memory

Key terms: virtual address, physical address, mapping, page frames, pages, paging, fragmentation, page fault.
- Virtual memory uses hard disk space as an extension of RAM, increasing the available address space a process can use. This allows a program to run when only specific pieces of it are present in memory; parts not currently being used are stored in the page file on disk.
- Even 512 MB of RAM is not enough memory to hold multiple applications and the OS concurrently.
- The area on the hard drive used for virtual memory is called the page file.
- The most common way to implement virtual memory is paging.
Virtual address - the logical or program address that the process uses. Whenever the CPU generates an address, it is always in terms of the virtual address space.
Physical address - the real address in physical memory.
Mapping - the mechanism by which virtual addresses are translated into physical ones (similar to cache mapping).
Page frames - the equal-size chunks or blocks into which main memory is divided.
Pages - the chunks or blocks into which virtual memory (the logical address space) is divided, each equal in size to a page frame. Virtual pages are stored on disk until needed.
Paging - the process of copying a virtual page from disk to a page frame in main memory. Paging is the most popular implementation of virtual memory; VM can also be implemented with segmentation or a combination of paging and segmentation.
Fragmentation - memory that becomes unusable (the system allocates more memory to a process than it needs because it must allocate whole pages).
Page fault - a requested page is not in main memory and must be copied into memory from disk.
- The success of paging is very dependent on the locality principle, just like caching.
21 Paging

- Physical memory is allocated to processes in fixed-size chunks (page frames).
- Every process has its own page table, which (typically) resides in main memory.
- The page table has N rows, where N is the number of virtual pages in the process, and stores the physical location of each virtual page of the process.
- Valid bit: 0 means the page is not in main memory; 1 means the page is in main memory.
- Additional fields can be added to the page table to provide more information:
  - Dirty bit (aka modify bit) - indicates whether the page has been changed. If the page has not been modified it does not need to be rewritten to disk.
  - Usage bit - indicates page usage: set to 1 whenever the page is accessed, set to 0 after a certain period of time.
22 How Paging Works

The OS dynamically translates a virtual address generated by a process into the physical address in main memory where the data actually resides.
1. Extract the page number.
2. Extract the offset.
3. Translate the page number into a physical page frame number using the page table.
To convert the address, the virtual address is divided into two fields: page and offset. The offset field represents the offset within the page where the data is located.
23 How Paging Works cont'd

Look up the page number in the page table and check the valid bit.
If valid bit = 0, the system generates a page fault and the OS must intervene:
1. Locate the page on disk.
2. Find a free page frame.
3. Copy the page into the free page frame.
4. Update the page table.
5. Resume execution of the process.
- If a process has free frames in main memory when a page fault occurs, the newly retrieved page can be placed in any of those free frames. If the memory allocated to the process is full, a victim page must be selected.
- Replacement algorithms used to select a victim page include FIFO, Random, and LRU (least recently used).
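The fault-handling steps above can be sketched as follows. This is a simplified illustration, not the OS's actual code: the disk copy is elided, the data structures are hypothetical, and FIFO is used as the victim-selection policy (one of the options named on the slide):

```python
import collections

FRAMES = 4
free_frames = list(range(FRAMES))   # initially all frames are free
fifo = collections.deque()          # arrival order, for FIFO replacement
page_table = {}                     # page → frame (valid iff present)

def handle_page_fault(page):
    """Sketch of the slide's steps, with FIFO victim selection."""
    if free_frames:                 # step 2: find a free page frame
        frame = free_frames.pop()
    else:                           # memory full → select a victim page
        victim = fifo.popleft()
        frame = page_table.pop(victim)
    # steps 1 and 3 (locate page on disk, copy it in) are elided here
    page_table[page] = frame        # step 4: update the page table
    fifo.append(page)
    return frame                    # step 5: the process resumes
```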
24 How Paging Works cont'd

If valid bit = 1, the page is in memory:
1. Replace the virtual page number with the actual frame number.
2. Access the data at the offset in the physical page frame (concatenate the offset onto the frame number for the given virtual page).
25 Example

A process has a virtual address space of 2^8 words; physical memory is 4 page frames, each 32 words in length.
- The virtual address is 8 bits.
- The physical address is 7 bits, because 4 frames of 32 words each is 128 words = 2^7.
- In this example the system has no cache.
26 Example cont'd

The virtual address has 2 fields:
- page - 3 bits
- offset - 5 bits (the offset must be 5 bits because 2^5 = 32; 5 bits are needed to address the 32 words in a page)
The system generates virtual address 13 (00001101 in binary):
- page = 000, offset = 01101
The page field is used as an index into the page table. The 0th entry in the page table shows that virtual page 0 maps to physical page frame 2 (10 in binary). The translated physical address becomes page frame 2, offset 13: combining the page frame (10) and the offset (01101) gives the physical address 1001101. The physical address has only 7 bits: 2 for the frame (4 frames) and 5 for the offset.
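The translation in this example can be sketched in a few lines. The page table here is hypothetical: the slide only gives the entry for virtual page 0 (which maps to frame 2); everything else is made up for illustration:

```python
OFFSET_BITS = 5                        # 32-word pages → 5-bit offset

# Hypothetical page table: page → (valid bit, frame number).
# Only the page-0 entry comes from the slides' example.
page_table = {0: (1, 2)}

def translate(vaddr):
    """Translate an 8-bit virtual address to a 7-bit physical one."""
    page = vaddr >> OFFSET_BITS                    # upper 3 bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)      # lower 5 bits
    valid, frame = page_table.get(page, (0, None))
    if not valid:
        raise LookupError(f"page fault on page {page}")
    return (frame << OFFSET_BITS) | offset         # concatenate frame, offset

# Virtual address 13 → frame 2, offset 13 → 1001101 in binary
print(bin(translate(13)))  # 0b1001101
```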
27 Access Time

There is a time penalty associated with virtual memory: two physical memory accesses for each memory access the processor generates, one to read the page table and one for the actual data.
28 Disadvantages

- Extra resource consumption: memory overhead for storing page tables.
- Special hardware and OS support required.
29 Advantages

- Programs are no longer restricted by the amount of physical memory available; virtual memory allows us to run programs whose virtual address space is larger than physical memory.
- Easier to write programs, since programmers don't need to worry about physical address space limitations.
- Allows multitasking.
30 Segmentation

- The virtual address space is divided into logical, variable-length units called segments.
- To copy a segment into memory, the OS looks for a chunk of free memory large enough to hold it.
- Each segment has a base address (where it is located in memory) and a bounds limit (indicating its size).
- The segment table stores these base/bounds pairs.
- Physical memory isn't really divided into anything.
- Memory accesses are translated into a segment number and an offset within the segment. A check is performed to make sure the offset is within the segment. If the offset is within bounds, the base value for the segment (from the segment table) is added to the offset to yield the physical address.
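The base/bounds translation described above can be sketched as follows. The segment table values are hypothetical, chosen only for illustration:

```python
# Hypothetical segment table: segment number → (base address, bounds limit)
segment_table = {0: (1000, 400), 1: (2400, 120)}

def translate_segment(segment, offset):
    """Base/bounds translation: check the offset, then add the base."""
    base, bounds = segment_table[segment]
    if offset >= bounds:               # offset must lie within the segment
        raise MemoryError(f"offset {offset} outside segment {segment}")
    return base + offset               # base + offset = physical address

print(translate_segment(1, 100))  # 2400 + 100 = 2500
```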
31 Segmentation

- External fragmentation: as segments are copied into and out of memory, the free chunks of memory are broken up, eventually leaving many small chunks, none big enough for any segment. Enough total memory may exist, but it exists as a large number of small, unusable holes.
- Garbage collection combats external fragmentation: it shuffles occupied chunks of memory to collect the smaller, fragmented chunks into fewer, larger, usable chunks, similar to defragmenting a disk drive.
32 Paging and Segmentation

Systems can use a combination of the two:
- The virtual address space is divided into segments of variable length, and segments are divided into fixed-size pages. Main memory is divided into frames of the same size.
- Each segment has its own page table.
- A virtual address is divided into 3 fields: segment, page number, and offset. The segment field points the system to the correct page table, the page number is used as an index into that page table, and the offset is the offset within the page.
Trade-offs:
- Paging is easier to manage: allocation, freeing, swapping, and relocating are easy when everything is the same size.
- Segmentation has less overhead, since segments are usually larger than pages.
- Segmentation eliminates internal fragmentation; paging eliminates external fragmentation.
- Segmentation has the ability to support sharing and protection, which is difficult with paging.
The combination is advantageous because it allows for segmentation from the user's point of view and paging from the system's point of view.
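The three-field translation in the combined scheme can be sketched as follows. The per-segment page tables and their contents are hypothetical, invented only to show how the segment field selects a page table:

```python
OFFSET_BITS = 5                    # assume 32-word pages, as in slide 25

# Hypothetical tables: one page table per segment (page → frame).
segment_page_tables = {
    0: {0: 3, 1: 7},
    1: {0: 1},
}

def translate_combined(segment, page, offset):
    """Segment field selects a page table; the page number indexes it;
    the offset is kept as-is within the resulting frame."""
    page_table = segment_page_tables[segment]   # 1. find the page table
    frame = page_table[page]                    # 2. look up the frame
    return (frame << OFFSET_BITS) | offset      # 3. concatenate with offset

# Segment 0, page 1, offset 9 → frame 7, offset 9 → 7*32 + 9 = 233
print(translate_combined(0, 1, 9))  # 233
```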