1 William Stallings Computer Organization and Architecture 7th Edition
Chapter 4 Cache Memory (2/2)

2 More on Direct Mapping
Since the mapping function of direct mapping is i = j mod m, we can view a direct-mapped cache as in the following figure.
i : cache line number
j : main memory block number
m : number of lines in the cache
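As a minimal illustration of the formula (not from the slides), the mapping in C; the function and parameter names are assumed:

    /* Direct mapping: memory block j always maps to cache line i = j mod m.
       block_number (j) and num_lines (m) are illustrative names. */
    unsigned direct_map_line(unsigned block_number, unsigned num_lines)
    {
        return block_number % num_lines;   /* i = j mod m */
    }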

3 More on Direct Mapping – cache miss
In this example, a main memory address = tag (6 bits) + cache line index (9 bits); addresses are written in octal, and the word size is 8 bits. When the processor wants to read the data at memory address 01777:
(1) Find index 777 in the cache.
(2) Compare the address tag 01 with the tag value stored in that cache line.
(3) Since the tag value in the cache is 00, a cache miss occurs.
(4) The processor accesses main memory, and the word 65 is fetched.
(5) The value 65 is also written into cache line 777, with the new tag value 01.
[Figure: (a) main memory contents in octal, e.g. address 00000 holds 12, address 01777 holds 65, address 77777 holds FF; (b) the cache before and after the miss, with line 777's tag and data updated at step (5).]
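A C sketch of the lookup these steps describe, using the example's 6-bit tag and 9-bit index; the type and function names (cache_line_t, mem_read, cached_read) are assumptions, not from the slides:

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_LINES 512                     /* 9-bit index -> 2^9 lines */

    typedef struct {
        bool    valid;
        uint8_t tag;                          /* 6-bit tag from the example */
        uint8_t data;                         /* 8-bit word */
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    extern uint8_t mem_read(uint16_t addr);   /* hypothetical main-memory access */

    /* Read one word through a direct-mapped cache (steps 1-5 above). */
    uint8_t cached_read(uint16_t addr)
    {
        uint16_t index = addr & 0x1FF;        /* low 9 bits: cache line index */
        uint8_t  tag   = (addr >> 9) & 0x3F;  /* top 6 bits: tag */

        if (cache[index].valid && cache[index].tag == tag)
            return cache[index].data;         /* hit */

        uint8_t word = mem_read(addr);        /* miss: fetch from main memory */
        cache[index].valid = true;            /* step 5: update line with */
        cache[index].tag   = tag;             /* new tag and data */
        cache[index].data  = word;
        return word;
    }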

4 Associative Mapping
Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache.
The memory address is interpreted as tag and word.
The tag uniquely identifies a block of memory.
To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's tag for a match.
Cache searching gets expensive.
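Hardware compares all tags in parallel; a software model can only scan them, which is exactly what makes this organization expensive. A sketch with assumed names (assoc_line_t, assoc_lookup):

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct { bool valid; uint32_t tag; } assoc_line_t;  /* illustrative */

    /* Fully associative lookup: a block can occupy any line, so every
       line's tag must be examined for a match. */
    int assoc_lookup(const assoc_line_t *lines, int num_lines, uint32_t tag)
    {
        for (int i = 0; i < num_lines; i++)
            if (lines[i].valid && lines[i].tag == tag)
                return i;   /* hit: line i holds the block */
        return -1;          /* miss: fetch the block from main memory */
    }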

5 Fully Associative Cache Organization

6 Associative Mapping Example
The 24-bit main memory address 16339C yields the 22-bit tag value 058CE7 (the address shifted right by the 2 word bits).
Likewise, main memory address FFFFFC >> 2 bits gives the tag value 3FFFFF.

7 Associative Mapping Address Structure
24-bit address = Tag (22 bits) + Word (2 bits)
A 22-bit tag is stored with each 32-bit block of data.
Compare the tag field with the tag entries in the cache to check for a hit.
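A small sketch of this field split, checking the example values from the previous slide; the helper names are assumed:

    #include <stdint.h>
    #include <assert.h>

    /* Associative mapping: 24-bit address = 22-bit tag + 2-bit word. */
    static uint32_t assoc_tag(uint32_t addr)   { return addr >> 2; }
    static uint32_t word_offset(uint32_t addr) { return addr & 0x3; }

    int main(void)
    {
        assert(assoc_tag(0x16339C) == 0x058CE7);  /* example from the slides */
        assert(assoc_tag(0xFFFFFC) == 0x3FFFFF);
        assert(word_offset(0x16339C) == 0x0);
        return 0;
    }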

8 Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in cache = undetermined
Size of tag = s bits

9 Associative Mapping Pros & Cons
Advantage: flexibility (any block can be placed in any line).
Disadvantages: cost, and the complex circuitry needed for simultaneous tag comparison.

10 Set Associative Mapping
A compromise that shows the strengths of both the direct and associative mappings.
The cache is divided into a number of sets; each set contains a number of lines.
A given block maps to any line in a given set, e.g. block B can be in any line of set i.
e.g. with 2 lines per set (2-way associative mapping), a given block can be in one of the 2 lines of exactly one set.

11 Set Associative Mapping
The cache is divided into v sets of k lines each:
m = v x k, where m : number of cache lines
i = j mod v, where i : cache set number, j : main memory block number, v : number of sets
A given block maps to any line in its set.
This is called a k-way set associative cache; 2-way and 4-way are common.

12 Set Associative Mapping Example
m = 16 lines, v = 8 sets, so k = 2 lines/set: 2-way set associative mapping.
Assume 32 blocks in memory; then i = j mod v gives:
set 0 : blocks 0, 8, 16, 24
set 1 : blocks 1, 9, 17, 25
:
set 7 : blocks 7, 15, 23, 31
Since each set of the cache has 2 lines, a memory block can be in either of the 2 lines of its set; e.g., block 17 can be assigned to either line 0 or line 1 in set 1. (See the sketch below.)
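A minimal sketch of the block-to-set calculation for this example; the variable names are illustrative:

    #include <stdio.h>

    /* i = j mod v: with v = 8 sets, blocks 1, 9, 17, 25 all land in set 1;
       within that set a block may occupy either of the k = 2 lines. */
    int main(void)
    {
        const unsigned v = 8;                       /* number of sets */
        unsigned blocks[] = { 1, 9, 17, 25 };
        for (int n = 0; n < 4; n++)
            printf("block %2u -> set %u\n", blocks[n], blocks[n] % v);
        return 0;   /* prints set 1 for every block listed */
    }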

13 Set Associative Mapping Example
13-bit set number.
The block number in main memory is taken modulo 2^13 to select the set.
Addresses that differ only in the 9 tag bits map to the same set: e.g., 000000, 008000, 010000, 018000, ... all map to set 0, since they have the same values in the set field (address bits [14:2]).

14 Two Way Set Associative Cache Organization

15 Set Associative Mapping Address Structure
24-bit address = Tag (9 bits) + Set (13 bits) + Word (2 bits)
Use the set field to determine which cache set to look in; compare the tag field to see if we have a hit.
e.g., the address with tag 1FF and set+word field 7FFC (i.e., FFFFFC) and the address with tag 001 and set+word field 7FFC (i.e., 00FFFC) both map to set number 1FFF, but with different tags.

16 Two Way Set Associative Mapping Example
(1) Set number: memory address bits [14:0] (set + word) >> 2 gives the set number, bits [14:2]:
0000 -> set 0000, 0004 -> set 0001, 339C -> set 0CE7, 7FFC -> set 1FFF, 7FF8 -> set 1FFE
(2) Since the cache is 2-way associative, each set has 2 cache lines, so two addresses with the same set field can reside in the cache at once.
(3) Tag: the most significant 9 bits of the memory address, bits [23:15].
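The field extraction in steps (1) and (3) written out directly, checking the slide's values; the helper names are assumed:

    #include <stdint.h>
    #include <assert.h>

    /* 24-bit address = tag [23:15] (9 bits) + set [14:2] (13 bits) + word [1:0]. */
    static uint32_t set_number(uint32_t addr) { return (addr >> 2) & 0x1FFF; }
    static uint32_t tag_bits(uint32_t addr)   { return addr >> 15; }

    int main(void)
    {
        assert(set_number(0x00339C) == 0x0CE7);   /* values from the slide */
        assert(set_number(0x007FFC) == 0x1FFF);
        assert(set_number(0x007FF8) == 0x1FFE);
        assert(tag_bits(0xFFFFFC)   == 0x1FF);    /* tag 1FF, set 1FFF */
        assert(tag_bits(0x00FFFC)   == 0x001);    /* tag 001, same set 1FFF */
        return 0;
    }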

17 Set Associative Mapping Summary
Address length = (s + w) bits
Number of addressable units = 2^(s+w) words or bytes
Block size = line size = 2^w words or bytes
Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
Number of lines in a set = k
Number of sets = v = 2^d
Number of lines in cache = k * v = k * 2^d
Size of tag = (s - d) bits

18 Remarks
Why is the simultaneous comparison cheaper here, compared to associative mapping?
The tag is much smaller, and only the k tags within one set are compared.
Relationship to the first two mappings: both are extreme cases of set associative mapping.
k = 1, v = m : direct mapping (1 line/set)
k = m, v = 1 : associative mapping (one big set)

19 Replacement Algorithms (1) Direct mapping
When a new block is brought into the cache, one of the existing blocks must be replaced.
In direct mapping, the replacement algorithm is trivial:
No choice: each block maps to exactly one line, so replace that line.

20 Replacement Algorithms (2) Associative & Set Associative
The algorithm must be implemented in hardware (for speed).
Least recently used (LRU): e.g., in a 2-way set associative cache, which of the 2 blocks is the LRU? (See the sketch below.)
First in first out (FIFO): replace the block that has been in the cache longest.
Least frequently used (LFU): replace the block which has had the fewest hits.
Random.
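For the 2-way case, the LRU question on the slide has a classic one-bit answer; a sketch under that assumption, with illustrative names:

    #include <stdbool.h>
    #include <stdint.h>

    /* 2-way set associative LRU with one bit per set: on every access,
       record the OTHER line as least recently used; on replacement, that
       bit already names the victim. */
    typedef struct {
        uint32_t tag[2];
        bool     valid[2];
        int      lru;                      /* index (0 or 1) of the LRU line */
    } set2_t;

    void touch(set2_t *s, int way) { s->lru = 1 - way; }  /* way is now MRU */
    int  victim(const set2_t *s)   { return s->lru; }     /* line to evict */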

21 Write Policy
A cache block must not be overwritten unless main memory is up to date, because:
Multiple CPUs may have individual caches.
I/O may address main memory directly.

22 Write through
All writes go to main memory as well as to the cache, so both copies always agree.
Multiple CPUs can monitor main memory traffic to keep their local caches up to date.
Disadvantages: lots of traffic (can become a bottleneck), and writes are slowed down.
Remember bogus write-through caches!

23 Write back
Updates are initially made in the cache only.
An update bit for the cache slot is set when the update occurs.
When a block is to be replaced, it is written to main memory only if the update bit is set, i.e., only if the cache line is dirty, i.e., only if at least one word in the cache line has been updated. (See the sketch below.)
Drawbacks: other caches can get out of sync, and I/O must access main memory through the cache.
N.B. About 15% of memory references are writes.
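A sketch of the dirty-bit bookkeeping the slide describes; all names (wb_line_t, mem_write_line, and the 64-byte line size) are hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool     valid, dirty;   /* dirty = the "update bit" on the slide */
        uint32_t tag;
        uint8_t  data[64];       /* illustrative 64-byte line */
    } wb_line_t;

    extern void mem_write_line(uint32_t tag, const uint8_t *data); /* hypothetical */

    /* On a write hit, update the cache only and set the dirty bit. */
    void write_hit(wb_line_t *line, unsigned offset, uint8_t value)
    {
        line->data[offset] = value;
        line->dirty = true;
    }

    /* On replacement, write the line back to memory only if it is dirty. */
    void evict(wb_line_t *line)
    {
        if (line->valid && line->dirty)
            mem_write_line(line->tag, line->data);
        line->valid = line->dirty = false;
    }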

24 Line Size
Block size = line size.
As the block size increases from very small, the hit ratio first increases because of the principle of locality.
As the block size becomes very large, the hit ratio decreases, because:
the number of blocks that fit in the cache decreases, and
the probability of referencing all the words in a block decreases.
A block size of 4 to 8 addressable units is reasonable.

25 Number of Caches
Two aspects: the number of levels, and unified vs. split.

26 Multilevel Caches
A modern CPU has an on-chip cache (L1) that increases overall performance:
e.g., 80486: 8KB; Pentium: 16KB; PowerPC: up to 64KB.
A secondary, off-chip cache (L2) provides high-speed access to main memory; generally 512KB or less.
Current processors include the L2 cache on the processor chip.

27 Unified vs. Split
Unified cache:
Stores data and instructions in one cache.
Flexible: it can balance the load between data and instruction fetches, giving a higher hit ratio.
Only one cache to design and implement.
Split cache:
Two caches, one for data and one for instructions.
The trend is toward split caches:
Good for superscalar machines that support parallel execution, prefetching, and pipelining.
Overcomes cache contention.

28 Pentium 4 Cache
80386 – no on-chip cache.
80486 – 8KB, using 16-byte lines and a four-way set associative organization.
Pentium (all versions) – two on-chip L1 caches: data & instructions.
Pentium III – L3 cache added off chip (extra cache built into the motherboard, between the processor and main memory).
Pentium 4:
L1 caches: 8KB, 64-byte lines, four-way set associative.
L2 cache: feeds both L1 caches; 256KB, 128-byte lines, 8-way set associative.
L3 cache on chip.
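As a quick check of these parameters: an 8KB, 4-way L1 cache with 64-byte lines has 8192 / 64 = 128 lines, grouped into 128 / 4 = 32 sets, so 5 address bits select the set and 6 bits select the byte within a line.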

29 Intel Cache Evolution

30 Pentium 4 Block Diagram

31 Pentium 4 Core Processor
Fetch/decode unit: fetches instructions from the L2 cache, decodes them into micro-ops, and stores the micro-ops in the L1 (trace) cache.
Out-of-order execution logic: schedules micro-ops based on data dependences and resource availability; may execute speculatively.
Execution units: execute micro-ops, taking data from the L1 cache and leaving results in registers.
Memory subsystem: the L2 cache and the system bus.

32 Pentium 4 Design Reasoning
Decodes instructions into RISC-like micro-ops before the L1 cache:
Micro-ops are fixed length, which suits superscalar pipelining and scheduling.
Pentium instructions are long and complex; performance is improved by separating decoding from scheduling and pipelining. (More later: ch. 14.)
The data cache is write back, but can be configured to write through.
The L1 cache is controlled by 2 bits in a register:
CD = cache disable
NW = not write-through
There are 2 instructions: one to invalidate (flush) the cache, and one to write back and then invalidate.
L2 and L3 caches: 8-way set-associative, line size 128 bytes.

33 PowerPC Cache Organization
601 – a single 32KB cache, 8-way set associative.
603 – 16KB (2 x 8KB), two-way set associative.
604 – 32KB.
620 – 64KB.
G3 & G4 – 64KB L1 cache, 8-way set associative; 256KB, 512KB, or 1MB L2 cache, two-way set associative.
G5 – 32KB instruction cache, 64KB data cache.

34 PowerPC G5 Block Diagram

35 Internet Sources
Manufacturer sites: Intel, IBM/Motorola.
Search on "cache".

