
1 Advanced Computer Architecture, Lecture 9. By Rohit Khokher, Department of Computer Science, Sharda University, Greater Noida, India. C. Singh, June 7-8, 2010, IWW 2010, Istanbul, Turkey.

2 Cache
– A small amount of fast memory
– Sits between normal main memory and the CPU
– May be located on the CPU chip or module

3 Cache/Main Memory Structure

4 Cache operation – overview
– The CPU requests the contents of a memory location
– The cache is checked for this data
– If present, the data is delivered from the cache (fast)
– If not present, the required block is read from main memory into the cache, then delivered from the cache to the CPU
– The cache includes tags to identify which block of main memory is in each cache slot
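To make the flow concrete, here is a minimal Python sketch of the read path described above. The class name, the direct-mapped line choice, and the toy sizes are illustrative assumptions, not part of the original slides.

    # Minimal sketch of a cache read: check the cache first, fall back to
    # main memory on a miss, then fill the cache line and return the data.
    class SimpleCache:
        def __init__(self, num_lines, block_size):
            self.num_lines = num_lines
            self.block_size = block_size
            self.lines = {}                            # line index -> (tag, block data)

        def read(self, address, main_memory):
            block = address // self.block_size         # which memory block
            offset = address % self.block_size         # word within the block
            line = block % self.num_lines              # direct-mapped line choice
            tag = block // self.num_lines
            entry = self.lines.get(line)
            if entry is not None and entry[0] == tag:  # hit: serve from the cache
                return entry[1][offset]
            start = block * self.block_size            # miss: fetch the whole block
            data = main_memory[start:start + self.block_size]
            self.lines[line] = (tag, data)
            return data[offset]

    memory = list(range(64))                 # toy "main memory" of 64 words
    cache = SimpleCache(num_lines=4, block_size=4)
    print(cache.read(10, memory))            # miss: block fetched, returns 10
    print(cache.read(11, memory))            # hit in the same block, returns 11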

5 Cache Read Operation - Flowchart

6 Cache Design
– Size
– Mapping Function
– Replacement Algorithm
– Write Policy
– Block Size
– Number of Caches

7 Size
– Cost: more cache is more expensive
– Speed: more cache is faster (up to a point), but checking the cache for data takes time

8 Comparison of Cache Sizes

Processor | Type | Year of Introduction | L1 cache | L2 cache | L3 cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | — | —
PDP-11/70 | Minicomputer | 1975 | 1 KB | — | —
VAX 11/780 | Minicomputer | 1978 | 16 KB | — | —
IBM 3033 | Mainframe | 1978 | 64 KB | — | —
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | — | —
Intel 80486 | PC | 1989 | 8 KB | — | —
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | —
PowerPC 601 | PC | 1993 | 32 KB | — | —
PowerPC 620 | PC | 1996 | 32 KB/32 KB | — | —
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | —
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | —
IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | —
CRAY MTA | Supercomputer | 2000 | 8 KB | 2 MB | —
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | —
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | —

(Entries such as 8 KB/8 KB give the sizes of the split instruction and data caches.)

9 Mapping Function
A mapping function decides which main memory block occupies which line of the cache. Because there are fewer cache lines than main memory blocks, an algorithm is needed to make this decision. There are three cache mapping functions, i.e., methods of addressing used to locate data within a cache:
– Direct
– Fully Associative
– Set Associative
Each of these depends on two facts, described on the next two slides.

10 First Concept
RAM is divided into blocks of memory locations. In other words, memory locations are grouped into blocks of 2^n locations, where n is the number of bits used to identify a word within a block. These n bits are found at the least-significant end of the physical address. The example on the next slide has n = 2, indicating that each block of memory contains 2^2 = 4 memory locations.

11 Therefore, for this example, the two least significant bits of an address indicate the location within a block, while the remaining bits indicate the block number. The table below shows an example with a 20-bit address and four words per block. Notice that for each group of four words, the word bits take on each of the four possible values allowed with 2 bits while the block identification bits remain constant.

Block | Address | Block identification bits | Word bits
Block 0 | 0x00000 | 00 0000 0000 0000 0000 | 00
Block 0 | 0x00001 | 00 0000 0000 0000 0000 | 01
Block 0 | 0x00002 | 00 0000 0000 0000 0000 | 10
Block 0 | 0x00003 | 00 0000 0000 0000 0000 | 11
Block 1 | 0x00004 | 00 0000 0000 0000 0001 | 00
Block 1 | 0x00005 | 00 0000 0000 0000 0001 | 01
Block 1 | 0x00006 | 00 0000 0000 0000 0001 | 10
Block 1 | 0x00007 | 00 0000 0000 0000 0001 | 11
Block 2 | 0x00008 | 00 0000 0000 0000 0010 | 00
Block 2 | 0x00009 | 00 0000 0000 0000 0010 | 01
Block 2 | 0x0000A | 00 0000 0000 0000 0010 | 10
Block 2 | 0x0000B | 00 0000 0000 0000 0010 | 11
Block 3 | 0x0000C | 00 0000 0000 0000 0011 | 00
Block 3 | 0x0000D | 00 0000 0000 0000 0011 | 01
Block 3 | 0x0000E | 00 0000 0000 0000 0011 | 10
Block 3 | 0x0000F | 00 0000 0000 0000 0011 | 11
...and so on, until we get to the last block:
Block 2^18 - 1 | 0xFFFFC | 11 1111 1111 1111 1111 | 00
Block 2^18 - 1 | 0xFFFFD | 11 1111 1111 1111 1111 | 01
Block 2^18 - 1 | 0xFFFFE | 11 1111 1111 1111 1111 | 10
Block 2^18 - 1 | 0xFFFFF | 11 1111 1111 1111 1111 | 11
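A short Python sketch of this split, assuming the 20-bit addresses and n = 2 word bits from the example above (the function name is illustrative):

    # Split a physical address into block number and word-within-block,
    # assuming n = 2 word bits (4 words per block) and 20-bit addresses.
    N_WORD_BITS = 2

    def split_address(address):
        word = address & ((1 << N_WORD_BITS) - 1)   # least-significant n bits
        block = address >> N_WORD_BITS              # remaining bits
        return block, word

    print(split_address(0x0000A))   # -> (2, 2): block 2, third word in the block
    print(split_address(0xFFFFF))   # -> (262143, 3): last block (2**18 - 1), last word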

12 Second Concept
The cache is organized into lines, each of which contains enough space to store exactly one block of data plus a tag that uniquely identifies where that block came from in memory.

13 Direct Mapping
This is the simplest form of mapping. One block from main memory maps to only one possible line of cache memory. Since there are more blocks of main memory than there are lines of cache, many blocks in main memory map to the same line in cache memory. The cache line is found with the following formula:
α = β % γ
where α is the cache line number, β is the block number in main memory, γ is the total number of lines in cache memory, and % is the modulus operator.
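A minimal Python sketch of this formula; the cache size of 4 lines is an assumed example value:

    # Direct mapping: cache line = block number mod number of cache lines.
    NUM_CACHE_LINES = 4          # example value; real caches are much larger

    def cache_line(block_number):
        return block_number % NUM_CACHE_LINES

    # Blocks 0, 4, 8, ... all compete for the same cache line (line 0).
    for block in (0, 1, 4, 5, 8):
        print(f"block {block} -> line {cache_line(block)}")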

14 (figure)

15 The address for this example is broken down as follows. It is divided into three parts: the (s - r) most significant bits are the tag, stored in a cache line alongside the block held in that line; the r middle bits identify the line in which the block is always stored; and the w least significant bits identify each word within the block.

| Tag (s - r bits, 8 in this example) | Line (r bits) | Word (w bits) |

This means that:
– The number of addressable units = 2^(s+w) words or bytes
– The block size (cache line width, not including the tag) = 2^w words or bytes
– The number of blocks in main memory = 2^s (i.e., all the bits that are not in w)
– The number of lines in cache = m = 2^r
– The size of the tag stored in each line of the cache = (s - r) bits
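As a sketch, here is how the tag, line, and word fields could be extracted from an address in Python. The field widths used (w = 2, r = 14, s = 22, giving an 8-bit tag and 24-bit addresses) are assumed example values chosen only to match the 8-bit tag mentioned above:

    # Decompose an address into tag / line / word fields for direct mapping.
    # Assumed example widths: w = 2 word bits, r = 14 line bits, s = 22,
    # so the tag is s - r = 8 bits and addresses are s + w = 24 bits wide.
    W, R, S = 2, 14, 22

    def decompose(address):
        word = address & ((1 << W) - 1)            # w least significant bits
        line = (address >> W) & ((1 << R) - 1)     # r middle bits
        tag = address >> (W + R)                   # s - r most significant bits
        return tag, line, word

    tag, line, word = decompose(0x16339C)
    print(f"tag={tag:#04x} line={line:#06x} word={word}")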

16 (figure)

17 Disadvantage
There is a fixed cache location for any given block in main memory. If two blocks of memory that share the same cache line are referenced continually, cache misses occur and the two blocks are continuously swapped in and out, resulting in slower memory access because of the time taken to reach main memory.

18 Associative Mapping
– A main memory block can load into any line of the cache
– The memory address is interpreted as a tag and a word
– The tag uniquely identifies a block of memory
– Every line's tag is examined for a match
– Cache searching becomes expensive
The address in this mapping:
| Tag (s bits) | Word (w bits) |

19 For associative mapping:
– The number of addressable units = 2^(s+w) words or bytes
– The block size (cache line width, not including the tag) = 2^w words or bytes
– The number of blocks in main memory = 2^s (i.e., all the bits that are not in w)
– The number of lines in cache is not dependent on any part of the memory address
– The size of the tag stored in each line of the cache = s bits
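A minimal Python sketch of an associative lookup, in which the full s-bit tag (the block number) is compared against every occupied line; the names and sizes are illustrative assumptions:

    # Fully associative lookup: the tag must be compared against every line.
    W = 2                                            # word bits; the tag is the block number

    def associative_lookup(cache_lines, address):
        tag = address >> W                           # s-bit tag
        word = address & ((1 << W) - 1)
        for stored_tag, block_data in cache_lines:   # search every line
            if stored_tag == tag:
                return block_data[word]              # hit
        return None                                  # miss

    lines = [(0x3FFF, [10, 11, 12, 13]), (0x0002, [20, 21, 22, 23])]
    print(associative_lookup(lines, 0x0009))   # address 0x9 -> tag 0x2, word 1 -> 21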

20 (figure)

21 Set Associative
– The cache is divided into a number of sets
– Each set contains a number of lines
– A given block maps to any line in a given set, e.g. block B can be in any line of set i
– e.g. with 2 lines per set (2-way associative mapping), a given block can be in either of 2 lines, in only one set
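A small Python sketch of a 2-way set-associative lookup, assuming the set is chosen as block number mod number of sets (the names and sizes are illustrative):

    # 2-way set-associative lookup: pick the set, then search only its lines.
    NUM_SETS = 4
    W = 2                                           # word bits (4 words per block)

    def set_assoc_lookup(sets, address):
        block = address >> W
        word = address & ((1 << W) - 1)
        set_index = block % NUM_SETS                # which set the block maps to
        tag = block // NUM_SETS                     # remaining bits form the tag
        for stored_tag, data in sets[set_index]:    # at most 2 comparisons per lookup
            if stored_tag == tag:
                return data[word]
        return None

    sets = [[] for _ in range(NUM_SETS)]
    sets[1].append((0, [40, 41, 42, 43]))           # block 1 stored in set 1
    print(set_assoc_lookup(sets, 0x6))              # block 1, word 2 -> 42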

22 (figure)

23 (figure)

24 Replacement Algorithms
For direct mapping, where there is only one possible line for a block of memory, no replacement algorithm is needed. For associative and set-associative mapping, however, an algorithm is required, and for maximum speed it is implemented in hardware. Four of the most common algorithms are:
– Least recently used (LRU): replaces the candidate line that has been in the cache longest with no reference to it.
– First in first out (FIFO): replaces the candidate line that has been in the cache longest.
– Least frequently used (LFU): replaces the candidate line that has had the fewest references.
– Random replacement: randomly chooses a line to replace from among the candidate lines. Studies have shown that this yields only slightly inferior performance to the other algorithms.
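As an illustration of one of these policies, here is a minimal Python sketch of LRU replacement within one set of candidate lines, using OrderedDict from the standard library; the class name, capacity of 2, and fetch callback are assumed example choices:

    # LRU within one set: on a hit, mark the line most-recently used;
    # on a miss with the set full, evict the least-recently-used line.
    from collections import OrderedDict

    class LRUSet:
        def __init__(self, capacity=2):
            self.capacity = capacity
            self.lines = OrderedDict()          # tag -> block data, oldest first

        def access(self, tag, fetch_block):
            if tag in self.lines:
                self.lines.move_to_end(tag)     # hit: refresh recency
                return self.lines[tag]
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[tag] = fetch_block(tag)  # miss: fill from memory
            return self.lines[tag]

    lru = LRUSet(capacity=2)
    fetch = lambda tag: f"block {tag}"
    lru.access(1, fetch)
    lru.access(2, fetch)
    lru.access(1, fetch)                        # touch block 1 again
    lru.access(3, fetch)                        # evicts block 2, not block 1
    print(list(lru.lines))                      # -> [1, 3]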

25 Write Policy
A write policy is important because, if changes are made to a line in cache memory, the corresponding changes must be made to the block in main memory before the line is removed from the cache. The problems to contend with: more than one device may have access to main memory (e.g. I/O modules), and if more than one processor on the same bus has its own cache, the problem becomes more complex, since any change in either a cache or main memory could invalidate the others.

WRITE THROUGH
The simplest technique is called write through. With this technique, both main memory and the cache are written to whenever a write operation is performed, ensuring that main memory is always valid. The main disadvantage is that it may generate substantial main memory traffic, causing a bottleneck and decreasing performance.

26 WRITE BACK
An alternative technique, known as write back, minimizes main memory writes. Updates are made only in the cache, and an update bit associated with the line is set. Main memory is updated only when the line is replaced in the cache and its update bit has been set. The problem with this technique is that all changes to main memory have to be made through the cache in order not to invalidate parts of main memory, which may itself cause a bottleneck.
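A condensed Python sketch contrasting the two policies on a single cache line; the class and function names are illustrative, and the update bit is only consulted on eviction for the write-back case:

    # One cache line under the two write policies described above.
    class CacheLine:
        def __init__(self, block):
            self.block = block        # cached copy of the memory block
            self.dirty = False        # "update bit" used by write back

    def write_through(line, memory, offset, value, base):
        line.block[offset] = value
        memory[base + offset] = value       # main memory updated on every write

    def write_back(line, memory, offset, value, base):
        line.block[offset] = value
        line.dirty = True                   # defer the main memory update

    def evict(line, memory, base):
        if line.dirty:                      # write back only if the line changed
            memory[base:base + len(line.block)] = line.block
            line.dirty = False

    memory = [0] * 8
    line = CacheLine(memory[4:8])
    write_through(line, memory, offset=0, value=7, base=4)
    print(memory[4])                        # 7: memory updated immediately
    write_back(line, memory, offset=1, value=99, base=4)
    print(memory[5])                        # still 0: update deferred
    evict(line, memory, base=4)
    print(memory[5])                        # now 99 after the write back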

27 Line Size
When a block of data is retrieved from main memory and put into the cache, the desired word and a number of adjacent words are retrieved. As the block size increases from a very small size, the hit ratio will at first increase because of the principle of locality of reference, which says that words in the vicinity of a referenced word are likely to be referenced in the near future. As the block size increases further, however, the hit ratio begins to decrease, because the probability of using the newly fetched information becomes less than the probability of reusing the information that had to be replaced.

28 Number of Caches
Two aspects of this are multilevel caches and unified versus split caches.

Multilevel
Owing to increased logic density, it has become possible to place a cache on the same chip as the processor. This reduces execution time, since less activity over an external bus is needed. Even though an on-chip cache exists, it is typically desirable to have an off-chip cache as well: if a miss occurs on the level 1 (on-chip) cache, then instead of retrieving the data from the slower main memory, the information may be retrieved from the level 2 cache which, although slower than the level 1 cache, is still appreciably faster than main memory. Some level 2 caches are now placed on-chip, with a level 3 cache implemented off-chip.

29 Unified/Split
Two types of words are stored in a cache: data and instructions. It has become common to split the cache in two to separate them. Two potential advantages of a unified cache are:
– A higher hit rate than split caches, because the load between instruction and data fetches is balanced automatically.
– Only one cache needs to be designed and implemented.
The key advantage of the split cache design is that it eliminates contention for the cache between the instruction fetch/decode unit and the execution unit. This is important for designs that rely on pipelining of instructions.

30 (figure)

