An Introduction to Cache Design 2016/10/1\course\cpeg323-08F\Topic7a1.

Presentation transcript:

1 An Introduction to Cache Design

2 Cache: "A safe place for hiding and storing things." (Webster's Dictionary)

3 "Even with the inclusion of cache, almost all CPUs are still mostly strictly limited by the cache access time: in most cases, if the cache access time were decreased, the machine would speed up accordingly." - Alan Smith. Even more so for MPs!

4 While one can imagine reference patterns that can defeat existing cache memory designs, it is the author's experience that cache memories improve performance for any program or workload which actually does useful computation.

5 Optimizing the Design of a Cache Memory
Generally has four aspects:
1. Maximizing the probability of finding a memory reference's target in the cache (the hit ratio).
2. Minimizing the time to access information that is indeed in the cache (access time).
3. Minimizing the delay due to a miss.
4. Minimizing the overheads of updating main memory, maintaining cache coherence, etc.

6 Key Factor in Design Decisions for VM and Cache
access-time(MainMem) / access-time(Cache) = 4 ~ 20
access-time(SecondaryMem) / access-time(MainMem) = 10^4 ~ 10^6
Cache control is usually implemented in hardware!!

7 Technology in the 1990s: technology in the 2000s?

8 Technology in 2004: technology in 2008? See P&H Fig., p. 469, 3rd Ed.

9 Technology in 2008: See P&H Fig., p. 453, 4th Ed.

10 Cache in the Memory Hierarchy: Processor - Cache - Main Memory - Secondary Memory

11 Emerging Memory Device Technologies. Source: "Emerging Nanoscale Memory and Logic Devices: A Critical Assessment", Hutchby et al., IEEE Computer, May 2008

12 Emerging Memory Device Technologies. Source: "Emerging Nanoscale Memory and Logic Devices: A Critical Assessment", Hutchby et al., IEEE Computer, May 2008


14 Source: Kogge, Peter, ACS Productivity Workshop, 2008

15 Four Questions for Classifying Memory Hierarchies: The fundamental principles that drive all memory hierarchies allow us to use terms that transcend the levels we are talking about. These same principles allow us to pose four questions about any level of the hierarchy:

16 Four Questions for Classifying Memory Hierarchies
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

17 These questions will help us gain an understanding of the different tradeoffs among memories at different levels of a hierarchy.

18 Concept of Cache Miss and Cache Hit
Address 01173 returns data 30: tag 0117 matches directory entry 0117X, and offset 3 selects the fourth word of that line. (8-line cache, lines 0-7; four entries shown.)
Line | TAGS  | DATA
  0  | 0117X | 35, 72, 55, 30, 64, 23, 16, 14
  1  | 7620X | 11, 31, 26, 22, 55, ...
  2  | 3656X | 71, 72, 44, 50, ...
  3  | 1741X | 33, 35, 07, 65, ...

19 Access Time
t_eff : effective cache access time
t_cache : cache access time
t_main : main memory access time
h : hit ratio
t_eff = h * t_cache + (1 - h) * t_main

20 Example
Let t_cache = 10 ns (1-4 clock cycles), t_main = 50 ns (8-32 clock cycles), h = 0.95. Then:
t_eff = 10 x 0.95 + 50 x 0.05 = 9.5 + 2.5 = 12 ns
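The arithmetic above can be checked with a short sketch (Python used here as a calculator; the function name is ours, not from the slides):

```python
def t_eff(h, t_cache, t_main):
    """Effective access time: t_eff = h * t_cache + (1 - h) * t_main."""
    return h * t_cache + (1 - h) * t_main

# Slide 20's numbers: t_cache = 10 ns, t_main = 50 ns, h = 0.95
print(round(t_eff(0.95, 10, 50), 2))  # prints 12.0 (ns)
```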

21 Hit Ratio
- Must be high enough (say > 90%) to obtain the desired level of performance
- Small changes in h have an amplified effect on performance
- Never a constant, even for the same machine

22 Sensitivity of Performance w.r.t. h (hit ratio)
t_eff = h * t_cache + (1 - h) * t_main
      = t_cache * [ h + (1 - h) * (t_main / t_cache) ]
      ~ t_cache * [ 1 + (1 - h) * (t_main / t_cache) ]    (since h ~ 1)
Since t_main / t_cache ~ 10, the magnification factor of changes in h is about 10 times.
Conclusion: very sensitive.
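The magnification effect is easy to see numerically. A small sketch (our own illustration, using the slide's assumed ratio t_main / t_cache = 10):

```python
# t_eff / t_cache = h + (1 - h) * (t_main / t_cache)
ratio = 10  # t_main / t_cache, as assumed on the slide

for h in (0.90, 0.95, 0.99):
    rel = h + (1 - h) * ratio  # effective time in units of t_cache
    print(f"h = {h:.2f}: t_eff = {rel:.2f} x t_cache")
```

A 5-point improvement in h (0.90 to 0.95) cuts the effective time from about 1.9 to about 1.45 times t_cache, i.e. the change in h is magnified roughly tenfold in t_eff.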

23 Remember: h ~ 1.
Example: let h = 0.90. If h increases by 0.05 (0.90 to 0.95), then (1 - h) = 0.05, and t_eff ~ t_cache * (1 + 0.05 x 10) = t_cache * (1 + 0.5), down from t_cache * (1 + 1.0) at h = 0.90.

24 Basic Terminology
Cache line (block): "A collection of contiguous data that are treated as a single entity of cache storage." Like the size of a room; typically 1 ~ 16 words.
Cache directory: "The portion of a cache that holds the access keys that support associative access." Like the keys to the rooms; the cache may use associativity to find the "right directory" entry by matching.

25 Cache Organization
Fully associative: a block can be in any block frame.
Direct mapped: a block can be in only one block frame.
Set-associative: a block can be in a group of block frames.

26 An Example
Memory size = 256K words x 4 B/word = 1 MB
Cache size = 2K words = 8 KB
Block size = 16 words/block = 64 B/block
So main memory has 256K / 16 = 16K blocks (16,384), and the cache has 2K / 16 = 128 block frames.
Address = 18 bits (word address: 256K words = 2^8 x 2^10) + 2 bits (byte-in-word: 2^2 bytes/word) = 20 bits.
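The slide's numbers can be rederived mechanically (Python as a calculator; variable names are ours):

```python
WORD_BYTES  = 4            # 4 bytes per word
mem_words   = 256 * 1024   # 256K words of main memory
cache_words = 2 * 1024     # 2K words of cache
block_words = 16           # 16 words per block

print(mem_words * WORD_BYTES)     # 1048576 bytes = 1 MB
print(cache_words * WORD_BYTES)   # 8192 bytes = 8 KB
print(mem_words // block_words)   # 16384 main-memory blocks (16K)
print(cache_words // block_words) # 128 cache block frames
# byte-address width: log2(total bytes)
print((mem_words * WORD_BYTES).bit_length() - 1)  # 20 bits
```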

27 Fully Associative
Features:
- Any block in main memory can be placed in any block frame in the cache.
- All entries (block frames) are compared simultaneously (by associative search).

28 A Special Case
Simplest example: a block = a word, so the entire memory word address becomes the cache "tag" (e.g., address 027560 is stored as tag 027560 alongside its data word).
Advantages: no thrashing (quick reorganization); very "flexible," with a higher probability that a word resides in cache.
Disadvantage: the overhead of associative search (cost + time).

29 Fully associative cache organization

30 Direct Mapping
No associative match: from the memory address, we index "directly" to the block frame in the cache where the block must reside. A comparison is then used to determine whether it is a miss or a hit.

31 Direct Mapping (cont'd)
Advantages: simplest; fast (less logic); low cost (only one comparator is needed, hence the cache can be built from standard memory parts).
Disadvantage: "thrashing" (two active blocks that map to the same frame keep evicting each other).

32 Example
Since the cache has only 128 block frames, the degree of multiplexing is:
16384 main-memory blocks / 128 frames = 2^7 = 128 blocks/frame
i.e., 2^7 blocks "fall" into each block frame (a set of size 1). The low-order 7 bits of the block address index the corresponding frame; the high-order 7 bits are used as the tag.
Disadvantage: "thrashing".

33 Direct Mapping

34 Direct Mapping (cont'd)
Mapping (indexing): block address mod (# of blocks in cache), in this case mod 2^7.
Advantage: the low-order log2(# of cache blocks) bits of the block address can be used directly for indexing.
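The mod-based mapping above can be sketched for the running example (our own illustration: 16K main-memory blocks, 128 frames, so 7 index bits and 7 tag bits):

```python
NUM_FRAMES = 128  # 2^7 block frames in the cache

def direct_map(block_addr):
    """Split a 14-bit block address into (tag, frame index)."""
    index = block_addr % NUM_FRAMES   # low-order 7 bits select the frame
    tag   = block_addr // NUM_FRAMES  # high-order 7 bits stored as the tag
    return tag, index

# Two blocks 128 apart land in the same frame: thrashing candidates.
print(direct_map(5))        # (0, 5)
print(direct_map(5 + 128))  # (1, 5)  -- same frame, different tag
```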

35 Set-Associative
A compromise between direct-mapped and fully associative. The cache is divided into S sets, S = 2, 4, 8, ...
If the cache has M blocks then, all together, there are E = M/S blocks/set.
In our example (2-way), S = 128/2 = 64 sets.

36 2-Way Set-Associative
The 6-bit set field indexes to the right set; then the 8-bit tag is used for an associative match.

37 Associativity with 8-block cache

38 For the 2-way set-associative organization:
2^14 main-memory blocks (16K) / 2^6 sets = 2^8 blocks per set of 2 block frames.
Address breakdown: tag (8 bits) | set (6 bits) | word (4 bits) | byte (2 bits).
The 6 set bits index into the right set; the high-order 8 bits are the tag. Hence an associative match of the 8-bit tag against the tags of the set's 2 blocks is required.
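The 8 | 6 | 4 | 2 field split above can be expressed as bit operations (a sketch; the function name is ours):

```python
def split_addr(addr):
    """Split a 20-bit byte address into tag(8) | set(6) | word(4) | byte(2)."""
    byte = addr & 0b11            # 2 low bits: byte within word
    word = (addr >> 2) & 0xF      # 4 bits: word within block
    s    = (addr >> 6) & 0x3F     # 6 bits: selects one of 64 sets
    tag  = (addr >> 12) & 0xFF    # 8 bits: matched against the set's 2 tags
    return tag, s, word, byte

print(split_addr(0b10101010_110011_1010_01))  # (170, 51, 10, 1)
```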

39 Sector Mapping Cache (IBM 360/85)
- 16 sectors x 16 blocks/sector
- 1 sector = multiple consecutive blocks
- Cache miss: sector replacement, but only one block is moved on demand (valid bit per block)
- A sector in memory can be placed in any sector in cache
Address fields: sector (tag), bits 0-6 | block, bits 7-13 | word, bits 14-17, i.e. widths 7 | 7 | 4.

40 Sector Mapping Cache

41 Sector Mapping Cache (cont'd)
Cache has 128 blocks / (16 blocks/sector) = 8 sectors.
Main memory has 16K blocks / (16 blocks/sector) = 1K sectors.
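The sector-mapped policy described above (replace a whole sector's mapping on a sector miss, but fetch only the demanded block, tracking the rest with valid bits) can be sketched as a toy model. This is our own illustration, not the IBM 360/85's actual logic:

```python
BLOCKS_PER_SECTOR = 16

class Sector:
    """One cache sector: a memory-sector tag plus a valid bit per block."""
    def __init__(self):
        self.tag = None
        self.valid = [False] * BLOCKS_PER_SECTOR

def access(sector, mem_sector, block):
    """Return 'hit', 'block miss', or 'sector miss', updating state."""
    if sector.tag == mem_sector:
        if sector.valid[block]:
            return "hit"
        sector.valid[block] = True            # fetch just this block on demand
        return "block miss"
    sector.tag = mem_sector                   # sector replacement...
    sector.valid = [False] * BLOCKS_PER_SECTOR
    sector.valid[block] = True                # ...but move only one block
    return "sector miss"

s = Sector()
print(access(s, 7, 3))  # sector miss
print(access(s, 7, 3))  # hit
print(access(s, 7, 4))  # block miss
```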

42 Example: see P&H Fig. 7.7 (3rd Ed.) or Fig. 5.7 (4th Ed.)

43 Total # of Bits in a Cache
Total # of bits = (# of blocks) x (# of tag bits + # of data bits per block + # of bits in the valid field)
Example: direct-mapped cache with 4 KB of data, 1-word blocks, 32-bit addresses.
4 KB = 1K words = 2^10 words = 2^10 blocks (2^0 words/block, 2^2 bytes/word).
Tag bits = 32 - (10 + 0 + 2) = 20.
Total bits = 2^10 x (20 + 32x1 + 1) = 53 x 2^10 bits = 53 Kbits = 6.625 KB.

44 Another Example: FastMATH
A fast embedded microprocessor that uses the MIPS architecture and a simple cache implementation: 16 KB of data, 16-word blocks, 32-bit addresses.
2^14 bytes x (1 word / 4 bytes) x (1 block / 16 words) = 2^14 / (2^2 x 2^4) = 2^8 blocks (2^4 words/block, 2^2 bytes/word).
Tag bits = 32 - (8 + 4 + 2) = 18.
Total bits = 2^8 x (18 + 32x16 + 1) = 531 x 2^8 = 135,936 bits = 132.75 Kbits (about 16.6 KB).
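Both bit-count examples follow the same formula, so they can be checked with one helper (a sketch; the function and its parameter names are ours):

```python
def cache_total_bits(data_bytes, block_words, addr_bits=32, word_bytes=4):
    """Total storage bits for a direct-mapped cache: tag + data + valid."""
    block_bytes = block_words * word_bytes
    num_blocks  = data_bytes // block_bytes
    index_bits  = num_blocks.bit_length() - 1   # log2(# blocks)
    offset_bits = block_bytes.bit_length() - 1  # block offset + byte offset
    tag_bits    = addr_bits - index_bits - offset_bits
    data_bits   = block_words * 32              # 32 bits per word
    return num_blocks * (tag_bits + data_bits + 1)  # +1 valid bit per block

print(cache_total_bits(4 * 1024, 1))    # 54272 bits  = 53 Kbits (slide 43)
print(cache_total_bits(16 * 1024, 16))  # 135936 bits = 132.75 Kbits (slide 44)
```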

45 Example: FastMATH. See P&H Fig. 7.9 (3rd Ed.) or Fig. 5.9 (4th Ed.)

