
1 © Karen Miller, 2011
What do we want from our computers?
- correct results: we assume this feature, but consider... who defines what is correct?
- fast: fast at what? (easy answer: fast at my programs)

2 © Karen Miller, 2011
[Figure: price vs. performance plot — performance runs from slow to fast, price from ¢ to $$$]

3 © Karen Miller, 2011
Architectural features: ways of increasing speed generally fall into 2 categories:
1. parallelism
2. memory hierarchies

4 © Karen Miller, 2011
Parallelism. Suppose we have 3 tasks: t1, t2, and t3, and they are independent. A serial implementation on 1 computer runs t1, then t2, then t3, one after another. A parallel implementation (given that we have 3 computers) runs t1, t2, and t3 at the same time.
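
A minimal sketch of this idea in C (assuming POSIX threads; the task functions are hypothetical stand-ins for real work): the three independent tasks run back-to-back, then again as three threads standing in for three computers.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-ins for three independent tasks. */
void *t1(void *arg) { puts("task 1 done"); return NULL; }
void *t2(void *arg) { puts("task 2 done"); return NULL; }
void *t3(void *arg) { puts("task 3 done"); return NULL; }

int main(void) {
    /* Serial: 1 computer runs the tasks one after another. */
    t1(NULL); t2(NULL); t3(NULL);

    /* Parallel: 3 threads (standing in for 3 computers) run them at once. */
    pthread_t a, b, c;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_create(&c, NULL, t3, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_join(c, NULL);
    return 0;
}
```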

5 © Karen Miller, 2011
Memory woes: the processor (P) and memory (M) are physically separate, which makes memory accesses SLOW! Co-locate P and M? Very expensive! Or the memory is too small!

6 © Karen Miller, 2011
A HW design technique to make some memory accesses complete faster is the implementation of hierarchical memory, also known as caching.

7 © Karen Miller, 2011
Recall the fetch and execute cycle:
- fetch instruction (*)
- PC update
- decode
- get operands (* for a load)
- do operation
- store result (* for a store)
(*) requires a memory access
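
The cycle can be sketched as a toy simulator in C (every opcode and encoding here is made up for illustration); note which steps touch memory.

```c
#include <stdio.h>

/* A toy machine; all opcodes and encodings are hypothetical. */
typedef enum { LOAD, ADD, STORE, HALT } op_t;
typedef struct { op_t op; int rd, rs, addr; } instr_t;

instr_t imem[] = {              /* instruction memory */
    { LOAD,  0, 0, 0 },         /* r0 = mem[0]        */
    { ADD,   1, 0, 0 },         /* r1 = r1 + r0       */
    { STORE, 0, 1, 1 },         /* mem[1] = r1        */
    { HALT,  0, 0, 0 },
};
int dmem[16] = { 42 };          /* data memory        */
int regs[4];

int main(void) {
    for (int pc = 0; ; ) {
        instr_t i = imem[pc];   /* fetch instruction: a memory access        */
        pc = pc + 1;            /* PC update                                 */
        switch (i.op) {         /* decode; get operands; do operation; store */
        case LOAD:  regs[i.rd] = dmem[i.addr];            break; /* 2nd memory access */
        case ADD:   regs[i.rd] = regs[i.rd] + regs[i.rs]; break;
        case STORE: dmem[i.addr] = regs[i.rs];            break; /* 2nd memory access */
        case HALT:  printf("dmem[1] = %d\n", dmem[1]);    return 0;
        }
    }
}
```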

8 © Karen Miller, 2011
Now look at the memory access patterns of lots of programs. In general, memory access patterns are not random. They exhibit locality:
1. temporal
2. spatial

9 © Karen Miller, 2011
Temporal locality: recently referenced memory locations are likely to be referenced again (soon!)

loop: instr 1  @ A1
      instr 2  @ A2
      instr 3  @ A3
      b loop   @ A4

Instruction stream references: A1 A2 A3 A4 A1 A2 A3 A4 A1 A2 A3...
Note that the same memory location is repeatedly read (for the fetch).
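
For instance (a hypothetical loop in C): the loop body's few instruction addresses, and the variable sum, are referenced on every iteration.

```c
#include <stdio.h>

int main(void) {
    int sum = 0;
    /* The loop body's instructions, and the variable sum, are
       referenced on every one of the 1000 iterations: temporal locality. */
    for (int i = 0; i < 1000; i++)
        sum += i;
    printf("%d\n", sum);   /* 499500 */
    return 0;
}
```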

10 © Karen Miller, 2011
Spatial locality: memory locations near referenced locations are likely to also be referenced.

[Figure: an array laid out contiguously in memory]

Code must do something to each element of the array. It must load each element...
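
For instance (hypothetical C): consecutive array elements sit at adjacent addresses, so each access lands near the previous one.

```c
#include <stdio.h>

#define N 1000

int main(void) {
    int a[N];
    /* a[i] and a[i+1] sit at adjacent addresses, so each access is
       near the previous one: spatial locality. */
    for (int i = 0; i < N; i++)
        a[i] = 2 * i;
    printf("%d\n", a[N - 1]);   /* 1998 */
    return 0;
}
```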

11 © Karen Miller, 2011
The fetch of the code exhibits a high degree of spatial locality: I2 is next to I1. If these instructions are not branches, then we fetch I1, I2, I3, etc.

12 © Karen Miller, 2011
A cache is designed to hold copies of a subset of memory locations. It is:
- smaller (in terms of bytes) than main memory
- faster than main memory
- co-located: processor and cache are on the same chip

13 © Karen Miller, 2011
[Image: Intel 386 chip (1985)]

14 © Karen Miller, 2011
[Image: Pentium II (1997)]

15 © Karen Miller, 2011
P sends a memory request to C (the cache).
- hit: the requested location's copy is in C
- miss: the requested location's copy is NOT in C, so send the memory access to M
[Figure: P and C together on one chip; M separate]

16 © Karen Miller, 2011
Needed terminology:

miss ratio = (# of misses) / (total # of accesses)
hit ratio  = (# of hits) / (total # of accesses), or 1 - miss ratio

You already assumed that: total # of accesses = # of misses + # of hits

17 © Karen Miller, 2011
So, when designing a cache, keep the bytes likely to be referenced (again), and their neighbors, in the cache... So, what is in the cache is different for each different program. On average, for a given program:

Average Memory Access Time (AMAT) = Tc + (miss ratio)(Tm)

18 © Karen Miller, 2011
For example: Tc = 1 nsec, Tm = 20 nsec, and a specific program has 98% hits...

AMAT = 1 + (.02)(20) = 1.4 nsec

Each individual memory access takes 1 nsec (hit) or 21 nsec (miss).
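
That arithmetic can be checked with a short C function (the name amat and its parameters are just for illustration):

```c
#include <stdio.h>

/* AMAT = Tc + (miss ratio)(Tm), from the previous slide. */
static double amat(double tc_ns, double tm_ns, double miss_ratio) {
    return tc_ns + miss_ratio * tm_ns;
}

int main(void) {
    /* Tc = 1 nsec, Tm = 20 nsec, 98% hits => miss ratio 0.02 */
    printf("AMAT = %.1f nsec\n", amat(1.0, 20.0, 0.02));   /* 1.4 nsec */
    return 0;
}
```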

19 © Karen Miller, 2011
Divide all of memory up into fixed-size blocks... Copy the entire block into the cache. Make the block size greater than 1 word.

[Figure: memory divided into blocks, with 1 block highlighted]

20 © Karen Miller, 2011
An unrealistic cache, with 4 block frames:

[Figure: 4 block frames, labeled block 00, block 01, block 10, and block 11]

21 © Karen Miller, 2011
Each main memory block maps to a specific block frame. 2 bits of the address define this mapping.

[Figure: main memory blocks mapping to cache frames 00, 01, 10, 11]

22 © Karen Miller, 2011
Take advantage of spatial locality by making the block size greater than 1 word. On a miss, copy the entire block into the cache, and then keep it there as long as possible. (Why?)

How the cache uses the address to do a lookup:

| ? | index # | byte/word within block |

The index # gives which block frame.
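
A sketch of splitting an address into those fields in C; the field widths (2 offset bits, 2 index bits) are assumptions matching the 4-frame example:

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 2   /* 4 bytes per block (assumption) */
#define INDEX_BITS  2   /* 4 block frames    (assumption) */

int main(void) {
    uint32_t addr   = 0x1A7;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                  /* byte w/i block */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* which frame    */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);                /* all the rest   */
    printf("tag=%u index=%u offset=%u\n", tag, index, offset);           /* 26, 1, 3       */
    return 0;
}
```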

23 © Karen Miller, 2011
- Which block frame is known as the index # or (sometimes) the line #.
- But many main memory blocks map to the same cache block frame... only one may be in the frame at a time!
- We must distinguish which one is in the frame right now.

24 © Karen Miller, 2011
tag:
- the most significant bits of the block's address
- used to distinguish which main memory block is in the cache block frame
- the tag is kept in the cache together with its data block

25 © Karen Miller, 2011
How the address is utilized by the cache (so far):

| tag | index # | byte w/i block |

[Figure: frames 00-11, each holding a tag and a data block]

26 © Karen Miller, 2011
- Still missing... we must distinguish block frames that have nothing in them from ones that hold a block from main memory (consider power-up for a computer system: nothing is in the cache).
- We need 1 bit per block, most often called a valid bit (sometimes called a present bit).

27 © Karen Miller, 2011
Cache access (or cache lookup):
- The index # is used to find the correct block frame.
- Is the block frame valid?
  - YES: compare the address tag to the block frame's tag:
    - match: HIT
    - no match: MISS
  - NO: MISS
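
Putting slides 22 through 27 together, a minimal direct-mapped lookup sketch in C (all sizes, names, and the pre-filled frame in main are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 2
#define INDEX_BITS  2
#define BLOCK_BYTES (1u << OFFSET_BITS)
#define NFRAMES     (1u << INDEX_BITS)

struct frame {
    bool     valid;               /* valid bit                      */
    uint32_t tag;                 /* which memory block is here now */
    uint8_t  data[BLOCK_BYTES];   /* copy of that block             */
};

static struct frame cache[NFRAMES];

/* Returns true on a HIT and fills *byte; false means MISS (go to M). */
static bool lookup(uint32_t addr, uint8_t *byte) {
    uint32_t offset = addr & (BLOCK_BYTES - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (NFRAMES - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    struct frame *f = &cache[index];    /* index # finds the frame    */
    if (f->valid && f->tag == tag) {    /* valid, and tags match: HIT */
        *byte = f->data[offset];
        return true;
    }
    return false;                       /* invalid, or no match: MISS */
}

int main(void) {
    /* Pretend a miss already copied in the block holding 0x1A4..0x1A7. */
    cache[1] = (struct frame){ .valid = true, .tag = 0x1A7 >> 4,
                               .data = { 10, 20, 30, 40 } };
    uint8_t b;
    if (lookup(0x1A7, &b)) printf("HIT: %d\n", b);   /* prints HIT: 40 */
    else                   printf("MISS\n");
    return 0;
}
```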

28 © Karen Miller, 2011
Completed diagram of the cache:

| tag | index # | byte w/i block |

[Figure: frames 00-11, each now holding a valid bit, a tag, and a data block]

29 © Karen Miller, 2011
This cache is called direct mapped, or 1-way set associative, or set associative with a set size of 1. Each index # maps to exactly 1 block frame.

30 © Karen Miller, 2011
[Figure: direct mapped — a single column of V/Tag/Data frames, 3 bits for index #; 2-way set associative — two columns of V/Tag/Data frames, 2 bits for index #; same amount of data in both]

31 © Karen Miller, 2011
How about 4-way set associative, or 8-way set associative? For a fixed number of block frames:
- larger set size tends to lead to higher hit ratios (good)
- larger set size means that the amount of HW (circuitry) goes up, and Tc increases (bad)
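
For comparison, a sketch of how the lookup changes when each set holds several frames (hypothetical 2-way sizes; real hardware compares all the ways in parallel, so the loop here is only a software analogy):

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 2
#define INDEX_BITS  2                /* 4 sets (assumption)   */
#define WAYS        2                /* 2-way set associative */
#define NSETS       (1u << INDEX_BITS)
#define BLOCK_BYTES (1u << OFFSET_BITS)

struct frame { bool valid; uint32_t tag; uint8_t data[BLOCK_BYTES]; };
static struct frame cache[NSETS][WAYS];

static bool lookup(uint32_t addr, uint8_t *byte) {
    uint32_t offset = addr & (BLOCK_BYTES - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (NSETS - 1);   /* selects a SET now */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    /* Check every frame in the set; HW does all the tag compares at once. */
    for (int way = 0; way < WAYS; way++) {
        struct frame *f = &cache[index][way];
        if (f->valid && f->tag == tag) { *byte = f->data[offset]; return true; }
    }
    return false;   /* MISS in every way of the set */
}

int main(void) {
    cache[1][1] = (struct frame){ .valid = true, .tag = 0x1A7 >> 4,
                                  .data = { 1, 2, 3, 4 } };
    uint8_t b;
    return (lookup(0x1A7, &b) && b == 4) ? 0 : 1;   /* HIT in way 1 of set 1 */
}
```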

32 © Karen Miller, 2011
Implementing writes:

1. write through — change the data in the cache, and also send the write to main memory. Slow (bad), but very little circuitry (good).

33 © Karen Miller, 2011
2. write back — at first, change the data in the cache only; write to memory only when necessary. A dirty bit is set on a write, to identify blocks to be written back to memory. When a program completes, all dirty blocks must be written to memory...

34 © Karen Miller, 2011
2. write back (continued)
- faster (good): multiple stores to the same location result in only 1 main memory access
- more circuitry (bad): must maintain the dirty bit
- dirty miss: a miss caused by a read or write to a block not in the cache, where the required block frame has its dirty bit set. So, there is a write of the dirty block, followed by a read of the requested block.
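
A sketch of a write-back store in C, adding a dirty bit to the direct-mapped frame; the block-transfer helpers and the fake memory array are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 2
#define INDEX_BITS  2
#define BLOCK_BYTES (1u << OFFSET_BITS)
#define NFRAMES     (1u << INDEX_BITS)

struct frame { bool valid, dirty; uint32_t tag; uint8_t data[BLOCK_BYTES]; };
static struct frame cache[NFRAMES];

static uint8_t memory[4096];   /* fake main memory, for the sketch only */

/* Assumed memory interface: move one whole block at a time. */
static void read_block(uint32_t tag, uint32_t index, uint8_t *data) {
    uint32_t base = ((tag << INDEX_BITS) | index) << OFFSET_BITS;
    for (uint32_t i = 0; i < BLOCK_BYTES; i++) data[i] = memory[base + i];
}
static void write_block(uint32_t tag, uint32_t index, const uint8_t *data) {
    uint32_t base = ((tag << INDEX_BITS) | index) << OFFSET_BITS;
    for (uint32_t i = 0; i < BLOCK_BYTES; i++) memory[base + i] = data[i];
}

static void store_byte(uint32_t addr, uint8_t byte) {
    uint32_t offset = addr & (BLOCK_BYTES - 1);
    uint32_t index  = (addr >> OFFSET_BITS) & (NFRAMES - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    struct frame *f = &cache[index];

    if (!f->valid || f->tag != tag) {      /* MISS                              */
        if (f->valid && f->dirty)          /* dirty miss: write old block first */
            write_block(f->tag, index, f->data);
        read_block(tag, index, f->data);   /* then read the requested block     */
        f->valid = true;
        f->tag   = tag;
        f->dirty = false;
    }
    f->data[offset] = byte;   /* change data in the cache only...       */
    f->dirty = true;          /* ...and remember to write it back later */
}

int main(void) {
    store_byte(0x1A7, 42);    /* repeated stores to one location cost   */
    store_byte(0x1A7, 43);    /* no extra main memory writes            */
    return 0;
}
```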

35 © Karen Miller, 2011
How about 2 separate caches?

I-cache:
- for instructions only
- can be rather small, and still have excellent performance

D-cache:
- for data only
- needs to be fairly large

36 © Karen Miller, 2011
We can send memory accesses to the 2 caches independently... (increased parallelism)

[Figure: P sends instruction fetches to the I-cache and loads/stores to the D-cache; both caches connect to M]

37 © Karen Miller, 2011
- The cache next to the processor is called an L1 cache (level 1).
- This hierarchy works so well that most systems have 2 levels of cache.

[Figure: P - C - M becomes P - L1 - L2 - M]
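
A standard extension of the slide-17 formula (not stated on these slides, but widely used): with two levels, a miss in L1 pays the L2 access time, and a miss in L2 pays the trip to M. A quick check in C with made-up numbers:

```c
#include <stdio.h>

int main(void) {
    /* Made-up numbers: L1 hit 1 nsec, L2 hit 5 nsec, memory 20 nsec,
       5% L1 miss ratio, 20% L2 miss ratio (of accesses that reach L2). */
    double t_l1 = 1.0, t_l2 = 5.0, t_m = 20.0;
    double m_l1 = 0.05, m_l2 = 0.20;

    /* AMAT = T_L1 + (m_L1)(T_L2 + (m_L2)(T_m)) */
    double amat = t_l1 + m_l1 * (t_l2 + m_l2 * t_m);
    printf("AMAT = %.2f nsec\n", amat);   /* 1 + 0.05 * (5 + 0.2 * 20) = 1.45 */
    return 0;
}
```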

