2007 Sept. 14. SYSC 2001* - Fall 2007. SYSC2001-Ch4.ppt
Chapter 4: Cache Memory
4.1 Memory system
4.2 Cache principles
4.3 Cache design
4.4 Examples

Slide 2: Memory: Location
- Registers: inside the CPU
  - Fastest; on the CPU chip
- Cache: very fast, semiconductor, close to the CPU
- Internal or main memory
  - Typically semiconductor media (transistors)
  - Fast, random access, on the system bus
- External or secondary memory
  - Peripheral storage devices (e.g. disk, tape)
  - Slower, often magnetic media, possibly on a slower bus

Slide 3: Memory: Capacity
- Word size: number of bits in the natural unit of organization
  - Usually related to the length of an instruction or the number of bits used to represent an integer
- Capacity is expressed as a number of words or a number of bytes
  - Usually a power of 2, e.g. 1 KB = 1024 bytes. Why?
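
One way to approach the "why?": an n-bit address can select exactly 2^n distinct locations, so capacities naturally come out as powers of 2. A minimal Python sketch (the function name is illustrative, not from the slides):

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct locations an address of this many bits can select."""
    return 2 ** address_bits


print(addressable_locations(10))  # 1024 locations: 1 KB if each holds a byte
print(addressable_locations(24))  # 16777216 locations: 16 MB if each holds a byte
```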

Slide 4: Other Memory System Characteristics
- Unit of transfer: number of bits read from, or written into, memory at a time
  - Internal: usually governed by data bus width
  - External: usually a block of words, e.g. 512 or more
- Addressable unit: smallest location that can be uniquely addressed
  - Internal: word or byte
  - External: device dependent, e.g. a disk "cluster"

Slide 5: Sequential Access Method
- Start at the beginning and read through in order
- Access time depends on the location of the data and the previous location
- e.g. tape
[Diagram labels: first location, start, read to here, location of interest]

Slide 6: Direct Access
- Individual blocks have a unique address
- Access is by jumping to the vicinity, plus a sequential search (or waiting, e.g. waiting for the disk to rotate)
- Access time depends on the target location and the previous location
- e.g. disk
[Diagram labels: block i, jump to here, read to here]

Slide 7: Random Access Method
- Individual addresses identify specific locations
- Access time is independent of location or previous access
- e.g. RAM (main memory types: Ch. 5.1)
[Diagram label: read here]

Slide 8: Performance (Speed)
- Access time
  - Time between presenting the address and getting the valid data (memory or other storage)
- Memory cycle time
  - Some time may be required for the memory to "recover" before the next access
  - cycle time = access time + recovery time
- Transfer rate
  - Rate at which data can be moved
  - For random access memory: transfer rate = 1 / (cycle time) = (cycle time)^-1
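
The two formulas on this slide can be tried out with a small Python sketch. The 60 ns access time and 40 ns recovery time are made-up numbers chosen only for illustration, and the function names are hypothetical:

```python
def cycle_time_ns(access_ns: float, recovery_ns: float) -> float:
    """cycle time = access time + recovery time"""
    return access_ns + recovery_ns


def transfers_per_second(cycle_ns: float) -> float:
    """Transfer rate for random access memory = 1 / cycle time."""
    return 1e9 / cycle_ns  # 1e9 nanoseconds in one second


cycle = cycle_time_ns(60, 40)        # assumed example values
print(cycle)                         # 100
print(transfers_per_second(cycle))   # 10000000.0
```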

Slide 9: Memory Hierarchy
- Size? Speed? Cost?
- Registers (in CPU): smallest, fastest, most expensive, most frequently accessed
- Internal memory (may include one or more levels of cache): medium size, quick, price varies
- External memory (backing store): largest, slowest, cheapest, least frequently accessed

Slide 10: Memory Hierarchy - Diagram
[Diagram: going down the hierarchy, cost per bit, speed and access frequency decrease; capacity and access time increase]

Slide 11: Performance & Hierarchy List
Ordered from faster, +$/byte (top) to slower, -$/byte (bottom):
- Registers
- Level 1 cache
- Level 2 cache
- Main memory
- Disk cache
- Disk
- Optical
- Tape
(Caches are coming soon: 2 slides!)

Slide 12: Locality of Reference (circa 1968)
- During program execution, memory references tend to cluster, e.g. loops
- Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed infrequently. (Tanenbaum)
- Temporal locality of reference (LOR): a recently executed instruction is likely to be executed again soon
- Spatial LOR: instructions with addresses close to a recently executed instruction are likely to be executed soon
- The same principles apply to data references

Slide 13: Cache
- Small amount of fast memory (smaller than main memory)
- Sits between normal main memory and the CPU
- May be located on the CPU chip or module
- The cache views main memory as organized in "blocks" (recall "blocks" from a few slides back)
[Diagram labels: word transfer, cache, block transfer]

Slide 14: Why does Caching Improve Speed?
Example:
- Main memory has 100,000 words; access time is 0.1 μs.
- Cache has 1000 words; access time is 0.01 μs.
- If a word is in the cache (hit), it can be accessed directly by the processor. If it is only in main memory (miss), it must first be transferred to the cache before access.
- Suppose that 95% of access requests are hits (the key proviso).
- Average time to access a word:
  (0.95)(0.01 μs) + (0.05)(0.1 μs + 0.01 μs) = 0.015 μs
  This is close to cache speed.
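
The average access time calculation above can be checked with a short Python sketch. The function name is illustrative; the model assumes, as the slide does, that a miss costs one main memory access plus one cache access:

```python
def average_access_time(hit_ratio: float, cache_time: float, memory_time: float) -> float:
    """Average access time when a miss costs a memory access plus a cache access."""
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_time + miss_ratio * (memory_time + cache_time)


t = average_access_time(hit_ratio=0.95, cache_time=0.01, memory_time=0.1)
print(f"{t:.3f} us")  # 0.015 us
```

Trying other hit ratios in the same function shows how strongly the result depends on the hit ratio staying high.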

Slide 15: Cache Read Operation
- The CPU requests the contents of a memory location
- Check the cache for the contents of that location
  - Present (cache hit!): get the data from the cache (fast)
  - Not present (cache miss!): read the required block from main memory into the cache, then deliver the data from the cache to the CPU
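
A rough Python sketch of the read flow above. It is a simplification under stated assumptions: blocks hold 4 words, the cache is a dictionary keyed by block number, and it never fills up (so no replacement is needed). None of these names come from the slides.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)


def read(address, cache, main_memory):
    """Return (value, hit). On a miss the whole block is loaded into the cache first."""
    block_number = address // BLOCK_SIZE
    hit = block_number in cache
    if not hit:
        start = block_number * BLOCK_SIZE
        cache[block_number] = main_memory[start:start + BLOCK_SIZE]
    # In both cases the data is delivered from the cache.
    return cache[block_number][address % BLOCK_SIZE], hit


main_memory = list(range(100, 132))  # 32 words holding the values 100..131
cache = {}
print(read(5, cache, main_memory))  # (105, False): miss, block 1 is loaded
print(read(6, cache, main_memory))  # (106, True): hit in the same block
```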

Slide 16: Cache Design
- Size
- Mapping function
- Replacement algorithm
- Write policy
- Block size
- Number of caches

Slide 17: Size
- Cost
  - More cache is expensive
- Speed
  - More cache is faster (up to a point)
  - Checking the cache for data takes time

Slide 18: Mapping Function
- How do cache contents map to main memory contents?
- Use the tag (and maybe the line address) to identify the block address
[Diagram labels: cache (line, tag, data block; e.g. tag 000, data xxx); main memory (address, contents; block i, block j, ...)]

Slide 19: Cache Basics
- Cache line vs. main memory location
  - Same concept; avoid confusion (?)
  - A line has an address and contents
- The contents of a cache line are divided into tag and data fields
  - Fixed-width fields, used differently!
  - The data field holds the contents of a block of main memory
  - The tag field helps identify the start address of the block of memory that is in the data field
- Cache line width is bigger than memory location width!

Slide 20: Mapping Function Example
- Cache of 64 KBytes
  - Holds up to 64 KBytes of main memory contents
  - 16 K (2^14) lines; each line is 5 bytes wide = 40 bits
    - Tag field: 1 byte
    - Data field: 4 bytes (4-byte blocks of main memory)
- 16 MBytes of main memory
  - 24-bit address: 2^24 = 16 M
- Will consider DIRECT and ASSOCIATIVE mappings

Slide 21: Direct Mapping
- Each block of main memory maps to only one cache line
  - i.e. if a block is in the cache, it must be in one specific place, based on its address!
- Split the address into two parts
  - The least significant w bits identify a unique word in the block
  - The most significant s bits specify one memory block
    - Split the s bits into a cache line address field of r bits and a tag field of the s - r most significant bits
- Address layout (s + w bits): | tag: s - r bits | line address: r bits | word: w bits |
- The line field identifies the line containing the block!

Slide 22: Direct Mapping: Address Structure for Example
- 24-bit address: | tag (s - r): 8 bits | line address (r): 14 bits | word (w): 2 bits |
- s = 22-bit block identifier
- 2-bit word identifier (4-byte block)
- Two blocks may have the same r value, but then they always have different tag values!
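
The 8/14/2 address split can be sketched with shifts and masks in Python. The function name and the sample address 0x16339C are arbitrary choices for illustration, not values from the slides:

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2  # field widths from the example


def split_direct(address: int):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=0
```

Changing only the top 8 bits of the address (e.g. 0x00339C) gives the same line with a different tag, which is the case described in the last bullet.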

Slide 23: Direct Mapping Cache Line Table

cache line | main memory blocks held
0          | 0, m, 2m, 3m, ..., 2^s - m
1          | 1, m+1, 2m+1, ..., 2^s - m + 1
...        | ...
m-1        | m-1, 2m-1, 3m-1, ..., 2^s - 1

- m = 2^14, s = 22, each block = 4 bytes
- But... a line can contain only one of these at a time!
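
The table is equivalent to taking the block number modulo m. A small Python sketch of that reading, with hypothetical helper names:

```python
M = 2 ** 14  # number of cache lines (m in the table)
S_BITS = 22  # s in the table: there are 2^s memory blocks


def line_for_block(block_number: int) -> int:
    """Cache line that a main memory block maps to under direct mapping."""
    return block_number % M


def blocks_for_line(line: int, count: int = 3):
    """First `count` blocks that map to `line`, plus the last such block."""
    first = [line + k * M for k in range(count)]
    last = 2 ** S_BITS - M + line
    return first, last


print(blocks_for_line(0))  # ([0, 16384, 32768], 4177920)
print(line_for_block(2 ** S_BITS - 1))  # 16383, i.e. line m-1
```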

Slide 24: Direct Mapping Cache Organization
[Figure]

Slide 25: Direct Mapping for Example
[Figure annotations]
- 4-byte blocks; block start addresses are shown
- tag = 8 bits, r = 14 bits (line), word = 2 bits; m = 2^r
- s = 22 bits = tag + line
- tag + line + word = 24 bits

Slide 26: Direct Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size - tag size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = m = 2^r
- Size of tag = (s - r) bits
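
As a quick check, the summary formulas can be evaluated for the running example (s = 22, w = 2, r = 14). This is only a sketch; the function and key names are made up for illustration:

```python
def direct_mapping_summary(s: int, w: int, r: int) -> dict:
    """Evaluate the direct mapping summary formulas for given field widths."""
    return {
        "address_bits": s + w,
        "addressable_units": 2 ** (s + w),
        "block_size": 2 ** w,
        "memory_blocks": 2 ** s,
        "cache_lines": 2 ** r,
        "tag_bits": s - r,
    }


for name, value in direct_mapping_summary(s=22, w=2, r=14).items():
    print(f"{name}: {value}")
```

For the example this gives a 24-bit address, 16 M addressable bytes, 4-byte blocks, 4 M memory blocks, 16 K cache lines and an 8-bit tag.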

Slide 27: Direct Mapping Pros & Cons
- Simple
- Inexpensive
- Fixed location for a given block
  - If a program repeatedly accesses 2 blocks that map to the same line, the cache miss rate is very high
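
The downside in the last bullet can be seen in a tiny simulation. This Python sketch assumes the example cache (2^14 lines, 4-byte blocks) and an initially empty cache; the addresses are invented so that two of them collide on line 0:

```python
LINES = 2 ** 14   # cache lines in the example
BLOCK_BYTES = 4   # bytes per block in the example


def count_misses(addresses) -> int:
    """Count misses in a direct-mapped cache that starts empty."""
    cache = {}  # line number -> tag currently stored in that line
    misses = 0
    for address in addresses:
        block = address // BLOCK_BYTES
        line, tag = block % LINES, block // LINES
        if cache.get(line) != tag:
            misses += 1
            cache[line] = tag  # the new block overwrites the old one
    return misses


conflicting = [0x000000, 0x010000] * 50  # same line, different tags
friendly = [0x000000, 0x000004] * 50     # neighbouring blocks, different lines
print(count_misses(conflicting))  # 100: every access misses
print(count_misses(friendly))     # 2: only the first touch of each block misses
```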

Slide 28: Associative Memory
- Read: specify a tag field value and a word select
- Checks all lines and finds the matching tag
  - Returns the contents of the data field at the selected word
- Access time is independent of location or previous access
- Write: to the data field at tag value + word select
  - What if there are no words with a matching tag? (Ch. 4.3)

Slide 29: Associative Mapping
- A main memory block can load into any line of the cache
- The memory address is interpreted as a tag and a word select within the block
- The tag uniquely identifies a block of memory! (s = tag)
- Every line's tag is examined for a match
- Cache searching gets expensive
- Does not use a line address!

Slide 30: Fully Associative Cache Organization
[Figure]
- No line field!

Slide 31: Associative Mapping Example
[Figure]
- Tag = most significant 22 bits of the address
- Typo: leading F missing!

Slide 32: Associative Mapping Address Structure
- Address layout: | tag: 22 bits | word: 2 bits |
- A 22-bit tag is stored with each 32-bit block of data
- Compare the tag field with the tag entry in the cache to check for a hit
- The least significant 2 bits of the address identify which 8-bit word is required from the 32-bit data block
- e.g.

Address | Tag    | Data     | Cache line
FFFFFC  | 3FFFFF | 24682468 | any, e.g. 3FFF
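
A Python sketch of the associative lookup, using the row from the table. Two assumptions are baked in: real hardware compares all tags in parallel rather than looping, and word 0 is taken to be the first byte of the data field. The function names are hypothetical.

```python
WORD_BITS = 2  # 2-bit word select; the remaining 22 bits are the tag


def split_associative(address: int):
    """Split a 24-bit address into (tag, word) for associative mapping."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(address: int, lines):
    """Compare the tag against every line; return the selected byte, or None on a miss."""
    tag, word = split_associative(address)
    for stored_tag, data in lines:
        if stored_tag == tag:
            return data[word]
    return None


lines = [(0x3FFFFF, bytes.fromhex("24682468"))]  # one occupied line, from the table
tag, word = split_associative(0xFFFFFC)
print(f"tag={tag:06X} word={word}")  # tag=3FFFFF word=0
print(lookup(0xFFFFFC, lines))       # 36, i.e. 0x24
print(lookup(0x000000, lines))       # None: no line holds this tag
```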

Slide 33: Associative Mapping Summary
- Address length = (s + w) bits
- Number of addressable units = 2^(s+w) words or bytes
- Block size = line size - tag size = 2^w words or bytes
- Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
- Number of lines in cache = undetermined
- Size of tag = s bits

Slide 34: Set Associative Mapping
- The cache is divided into a number of sets
- Each set contains k lines: k-way associative
- A given block maps to any line in a given set
  - e.g. block B can be in any line of set i
- e.g. 2 lines per set
  - 2-way associative mapping
  - A given block can be in one of 2 lines in only one set

Slide 35: K-Way Set Associative Cache Organization
[Figure]
- Direct + associative mapping: set select (direct), tag (associative)

Slide 36: Set Associative Mapping Address Structure
- Address layout: | tag: 9 bits | set: 13 bits | word: 2 bits |
- Use the set field to determine which set of cache lines to look in (direct)
- Within this set, compare tag fields to see if we have a hit (associative)
- e.g.

Address | Tag | Data     | Set number
FFFFFC  | 1FF | 12345678 | 1FFF
00FFFF  | 001 | 11223344 | 1FFF

- Same set, different tag, different word

Slide 37: e.g. Breaking into Tag, Set, Word
- Given Tag = 9 bits, Set = 13 bits, Word = 2 bits
- Given address FFFFFD (base 16)
- What are the values of Tag, Set, Word?
  - The first 9 bits are Tag, the next 13 are Set, the next 2 are Word
  - Rewrite the address in base 2: 1111 1111 1111 1111 1111 1101
  - Split into fields: Tag = 1 1111 1111, Set = 1 1111 1111 1111, Word = 01
  - Group each field in groups of 4 bits starting at the right, adding zero bits as necessary to the leftmost group:
    0001 1111 1111 | 0001 1111 1111 1111 | 0001
  - Result: 1FF, 1FFF, 1 (Tag, Set, Word)
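
The same breakdown can be done with shifts and masks. A minimal Python sketch for the 9/13/2 split (the function name is illustrative); it also covers the two addresses FFFFFC and 00FFFF, which share set 1FFF:

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # field widths from the example


def split_set_associative(address: int):
    """Split a 24-bit address into (tag, set number, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    set_number = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_number, word


for address in (0xFFFFFD, 0xFFFFFC, 0x00FFFF):
    tag, set_number, word = split_set_associative(address)
    print(f"{address:06X}: tag={tag:03X} set={set_number:04X} word={word}")
# FFFFFD: tag=1FF set=1FFF word=1
# FFFFFC: tag=1FF set=1FFF word=0
# 00FFFF: tag=001 set=1FFF word=3
```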

Slide 38: Replacement Algorithms: Direct Mapping
- What if we are bringing in a new block, but no line is available in the cache?
- Must replace (overwrite) a line. Which one?
- Direct mapping: no choice
  - Each block maps to only one line
  - Replace that line

Slide 39: Replacement Algorithms: Associative & Set Associative
- Hardware-implemented algorithm (for speed)
- Least Recently Used (LRU)
  - e.g. in 2-way set associative: which of the 2 blocks is LRU?
- First In First Out (FIFO)
  - Replace the block that has been in the cache longest
- Least Frequently Used (LFU)
  - Replace the block that has had the fewest hits
- Random
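
For the 2-way case, LRU only needs to remember which of the two lines was used less recently. A Python sketch of one such set follows; it models the idea in software (real caches do this in hardware), and the class and attribute names are invented for illustration:

```python
class TwoWaySet:
    """One set of a 2-way set associative cache with LRU replacement."""

    def __init__(self):
        self.tags = [None, None]  # tag held in each of the 2 lines
        self.lru = 0              # index of the least recently used line

    def access(self, tag) -> bool:
        """Return True on a hit; on a miss, replace the LRU line."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way  # the other line is now the least recently used
        return hit


cache_set = TwoWaySet()
print([cache_set.access(tag) for tag in "ABACAB"])
# [False, False, True, False, True, False]
```

In this trace, C replaces B (not A) because A was touched more recently, and the final B then replaces C for the same reason.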

Slide 40: Write Policy
- Must not overwrite a cache block unless main memory is up to date
- Complication: multiple CPUs may have individual caches!
- Complication: I/O may address main memory too (read and write)!
- N.B. 15% of memory references are writes

Slide 41: Write Through Method
- All writes go to main memory as well as the cache
- Each of multiple CPUs can monitor main memory traffic to keep its own local cache up to date
- Lots of traffic
- Slows down writes

Slide 42: Write Back Method
- Updates are initially made in the cache only
- The update (dirty) bit for a cache slot is set when an update occurs
- If a block is to be replaced, write it to main memory only if the update bit is set
- Other caches get out of sync
- I/O must access main memory through the cache
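
The role of the update (dirty) bit can be sketched with a single cache line in Python. This is a toy model under simplifying assumptions: main memory is a dictionary keyed by block number, and the class and method names are made up for illustration:

```python
class WriteBackLine:
    """A single cache line using the write back method."""

    def __init__(self):
        self.block = None   # number of the memory block currently held
        self.data = None
        self.dirty = False  # the update (dirty) bit

    def load(self, block, memory) -> bool:
        """Bring in `block`; write the old block back first only if it is dirty."""
        wrote_back = False
        if self.block is not None and self.dirty:
            memory[self.block] = self.data
            wrote_back = True
        self.block, self.data, self.dirty = block, memory[block], False
        return wrote_back

    def write(self, value):
        """Update the cache only and set the dirty bit."""
        self.data = value
        self.dirty = True


memory = {0: "old", 1: "other"}
line = WriteBackLine()
line.load(0, memory)
line.write("new")
print(memory[0])             # old: main memory is out of date for now
print(line.load(1, memory))  # True: the dirty block was written back
print(memory[0])             # new
```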

Slide 43: Multiple Caches on One Processor
- Two levels
  - L1: close to the processor (often on chip)
  - L2: between L1 and main memory
- Check L1 first; on a miss, check L2; on an L2 miss, get the data from memory
[Diagram labels: processor, L1, L2, local bus, system bus, to high speed bus]
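
The lookup order above can be sketched in Python with dictionaries standing in for the two caches. One assumption goes beyond the slide: on a miss, the data is copied into both L2 and L1 on its way to the processor. Capacity limits and replacement are ignored.

```python
def read(address, l1: dict, l2: dict, memory: dict):
    """Return (value, source), where source names the level that supplied the data."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        source = "L2"
    else:
        l2[address] = memory[address]  # L2 miss: fetch from main memory
        source = "memory"
    l1[address] = l2[address]          # fill L1 so the next access hits there
    return l1[address], source


memory = {0x10: 42}
l1, l2 = {}, {}
print(read(0x10, l1, l2, memory))  # (42, 'memory')
print(read(0x10, l1, l2, memory))  # (42, 'L1')
```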

Slide 44: Unified vs. Split Caches
- Unified: both instructions and data in the same cache
- Split: separate caches for instructions and data
  - Separate local buses to each cache
  - Increased concurrency: pipelining allows instruction fetch to be concurrent with operand access (Ch. 12)

Slide 45: Pentium Family Cache Evolution
- 80386: no on-chip cache
- 80486: 8 KBytes, using 16-byte lines and a four-way set associative organization
- Pentium (all versions): two on-chip L1 (split) caches, for data and instructions

Slide 46: Pentium 4 Cache
- Pentium 4: split L1 caches
  - 8 KBytes
  - 128 lines of 64 bytes each
  - Four-way set associative = 32 sets
- Unified L2 cache, feeding both L1 caches
  - 256 KBytes
  - 2048 (2K) lines of 128 bytes each
  - 8-way set associative = 256 sets
- How many bits? w (word), s (set)
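
One way to work out the "how many bits?" question is to derive the line and set counts from the sizes given and take log2. A Python sketch follows; the function name is hypothetical, and it assumes byte addressing and power-of-2 sizes:

```python
def cache_geometry(cache_bytes: int, line_bytes: int, ways: int) -> dict:
    """Derive line count, set count and address field widths for a set associative cache."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "word_bits": line_bytes.bit_length() - 1,  # log2, valid for powers of 2
        "set_bits": sets.bit_length() - 1,
    }


print(cache_geometry(8 * 1024, 64, 4))
# {'lines': 128, 'sets': 32, 'word_bits': 6, 'set_bits': 5}
print(cache_geometry(256 * 1024, 128, 8))
# {'lines': 2048, 'sets': 256, 'word_bits': 7, 'set_bits': 8}
```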

Slide 47: Pentium 4 Diagram (Simplified)
[Diagram labels: L1 instructions, L1 data, unified L2]

Slide 48: PowerPC Cache Evolution
- 601: single 32 KB, 8-way set associative
- 603: 16 KB (2 x 8 KB), two-way set associative
- 604: 32 KB
- 610: 64 KB
- G3 & G4
  - 64 KB L1 cache, 8-way set associative
  - 256 KB, 512 KB or 1 MB L2 cache, two-way set associative

Slide 49: PowerPC G4
[Diagram labels: unified L2, L1 instructions, L1 data]
