William Stallings Computer Organization and Architecture 7th Edition Chapter 4 Cache Memory (2/2)

More on Direct Mapping Since the mapping function of direct mapping is i = j mod m, we can view the direct-mapped cache as in the following figure, where i is the cache line number, j is the main memory block number, and m is the number of lines in the cache.

More on Direct Mapping – cache miss In this example, a main memory address = tag (6 bits) + cache line index (9 bits); addresses are written in octal, and the word size is 8 bits. Cache line 777 initially holds tag 00 and data 97. When the processor wants to read the data at memory address 01777: (1) Find index 777 in the cache. (2) Compare the tag 01 of the address with the tag value in that cache line. (3) Since the tag value in the cache is 00, a cache miss occurs. (4) The processor accesses main memory, and the word 65 is fetched. (5) The value 65 is written to cache line 777 with the new tag value 01. [Figure: (a) main memory contents from address 00000 to 77777; (b) cache contents before and after the miss, showing line 777 updated from (tag 00, data 97) to (tag 01, data 65).]
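
As a minimal sketch (in Python, with hypothetical names; the address split and data values are taken from the figure above), the direct-mapped lookup works as follows:

    # Direct-mapped lookup: 15-bit address = tag (6 bits) + line index (9 bits),
    # as in the octal example above.  Structure and values are illustrative only.
    cache = {0o777: (0o00, 0x97)}     # line index -> (tag, data); line 777 holds tag 00
    main_memory = {0o01777: 0x65}     # the word the processor is about to read

    def read(addr):
        line = addr & 0o777           # low 9 bits: cache line index
        tag = addr >> 9               # high 6 bits: tag
        entry = cache.get(line)
        if entry is not None and entry[0] == tag:
            return entry[1]           # tags match: cache hit
        word = main_memory[addr]      # cache miss: fetch the word from main memory
        cache[line] = (tag, word)     # update the line with the new tag and data
        return word

    read(0o01777)   # miss: returns 0x65 and updates line 777 to (tag 01, data 65)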

Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. The memory address is interpreted as tag and word; the tag uniquely identifies a block of memory. To determine whether a block is in the cache, the cache control logic must simultaneously examine every line's tag for a match, so cache searching gets expensive.

Fully Associative Cache Organization

Associative Mapping Example The 24-bit main memory address 16339C gives the 22-bit tag value 058CE7 (the address shifted right by the 2 word bits). Likewise, main memory address FFFFFC >> 2 bits gives the tag value 3FFFFF.

Associative Mapping Address Structure 24-bit address: Tag 22 bits | Word 2 bits. A 22-bit tag is stored with each 32-bit block of data. Compare the tag field with the tag entries in the cache to check for a hit.
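
A minimal sketch of this decomposition in Python (the function name is illustrative; the values come from the example slide above):

    def assoc_fields(addr24):
        # Fully associative mapping: tag = upper 22 bits, word = lower 2 bits.
        return addr24 >> 2, addr24 & 0x3

    assert assoc_fields(0x16339C) == (0x058CE7, 0x0)   # tag 058CE7
    assert assoc_fields(0xFFFFFC) == (0x3FFFFF, 0x0)   # tag 3FFFFF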

Associative Mapping Summary Address length = (s + w) bits Number of addressable units = 2^(s+w) words or bytes Block size = line size = 2^w words or bytes Number of blocks in main memory = 2^(s+w) / 2^w = 2^s Number of lines in cache = undetermined Size of tag = s bits

Associative Mapping Pros & Cons Advantage: flexible. Disadvantages: cost; complex circuitry for simultaneous comparison.

Set Associative Mapping A compromise that combines the strengths of both direct and associative mapping. The cache is divided into a number of sets, and each set contains a number of lines. A given block maps to any line in a given set; e.g., block B can be in any line of set i. With 2 lines per set (2-way associative mapping), a given block can be in one of 2 lines in exactly one set.

Set Associative Mapping The cache is divided into v sets of k lines each: m = v × k, where m is the number of cache lines, and i = j mod v, where i is the cache set number, j the main memory block number, and v the number of sets. A given block maps to any line in a given set. This is a k-way set associative cache; 2-way and 4-way are common.

Set Associative Mapping Example m = 16 lines, v = 8 sets, so k = 2 lines/set: 2-way set associative mapping. Assume 32 blocks in memory; then i = j mod v gives:
Set   Blocks
0     0, 8, 16, 24
1     1, 9, 17, 25
:     :
7     7, 15, 23, 31
Since each cache set has 2 lines, a memory block can be in either of the 2 lines of its set; e.g., block 17 can be assigned to either line 0 or line 1 in set 1 (see the sketch below).
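
The table can be reproduced directly from i = j mod v; a sketch in Python:

    v, num_blocks = 8, 32      # 8 sets, 32 main memory blocks
    # For each set i, list the blocks j that map to it (i = j mod v).
    sets = {i: [j for j in range(num_blocks) if j % v == i] for i in range(v)}
    print(sets[0])   # [0, 8, 16, 24]
    print(sets[1])   # [1, 9, 17, 25]
    print(sets[7])   # [7, 15, 23, 31]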

Set Associative Mapping Example 13-bit set number. The block number in main memory is taken modulo 2^13, so addresses 000000, 00A000, 00B000, 00C000, … map to the same set (all of these examples have the same values in their least significant 12 bits).

Two Way Set Associative Cache Organization

Set Associative Mapping Address Structure 24-bit address: Tag 9 bits | Set 13 bits | Word 2 bits. Use the set field to determine which cache set to look in; compare the tag field to see if we have a hit. E.g.:
Address (tag + rest)   Tag   Data       Set number
1FF 7FFC               1FF   12345678   1FFF
001 7FFC               001   11223344   1FFF
Address (24 bits) = Tag (9) + Set (13) + Word (2): 1 1111 1111 | 1 1111 1111 1111 | 00, giving set number 1FFF.

Two Way Set Associative Mapping Example (1) Set number: take mem addr bits [14:0] (set + word) and shift right by 2; the set number is bits [14:2]:
0x0000 → 0x0000
0x0004 → 0x0001
0x339C → 0x0CE7
0x7FFC → 0x1FFF
0x7FF8 → 0x1FFE
(2) Since the cache is 2-way associative, each set has 2 cache lines. (3) Tag: the most significant 9 bits, mem addr [23:15].
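
A sketch of this field extraction in Python (the function name is illustrative):

    def set_assoc_fields(addr24):
        # Tag = bits [23:15], set = bits [14:2], word = bits [1:0].
        word = addr24 & 0x3
        set_no = (addr24 >> 2) & 0x1FFF    # 13-bit set number
        tag = addr24 >> 15                 # 9-bit tag
        return tag, set_no, word

    assert set_assoc_fields(0x00339C) == (0x000, 0x0CE7, 0x0)
    assert set_assoc_fields(0xFFFFFC) == (0x1FF, 0x1FFF, 0x0)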

Set Associative Mapping Summary Address length = (s + w) bits Number of addressable units = 2^(s+w) words or bytes Block size = line size = 2^w words or bytes Number of blocks in main memory = 2^(s+w) / 2^w = 2^s Number of lines in set = k Number of sets = v = 2^d Number of lines in cache = kv = k × 2^d Size of tag = (s − d) bits

Remarks Why is the simultaneous comparison cheaper here, compared to associative mapping? The tag is much smaller, and only the k tags within one set are compared. Relationship to the first two mappings: both are extreme cases of set associative mapping. k = 1, v = m → direct mapping (1 line/set); k = m, v = 1 → associative mapping (one big set).

Replacement Algorithms (1) Direct mapping When a new block is brought into the cache, one of the existing blocks must be replaced. In direct mapping there is no choice: each block maps to exactly one line, so that line is replaced.

Replacement Algorithms (2) Associative & Set Associative The algorithm must be implemented in hardware (for speed). Least recently used (LRU): e.g., in a 2-way set associative cache, which of the 2 blocks is the LRU one? (See the sketch below.) First in first out (FIFO): replace the block that has been in the cache longest. Least frequently used (LFU): replace the block that has had the fewest hits. Random.
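
For a 2-way set, LRU tracking is simple; a minimal sketch in Python (the class is illustrative, not the hardware mechanism, which typically keeps a single USE bit per set):

    class TwoWaySet:
        # One cache set with k = 2 lines; self.lines keeps the LRU line first.
        def __init__(self):
            self.lines = []                # list of (tag, data), LRU order

        def access(self, tag, fetch):
            for i, (t, d) in enumerate(self.lines):
                if t == tag:               # hit: move the line to the MRU position
                    self.lines.append(self.lines.pop(i))
                    return d
            if len(self.lines) == 2:       # miss with a full set: evict the LRU line
                self.lines.pop(0)
            data = fetch(tag)              # fetch the block from main memory
            self.lines.append((tag, data))
            return data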

Write Policy A cache block must not be overwritten unless main memory is up to date, because multiple CPUs may have individual caches and I/O may address main memory directly.

Write through All writes go to main memory as well as the cache, so both copies always agree. Multiple CPUs can monitor main memory traffic to keep their local caches up to date. Drawbacks: lots of memory traffic, which can cause a bottleneck, and writes are slowed down. (Beware of bogus write-through caches!)

Write back Updates are initially made in the cache only, and the update bit for the cache slot is set when an update occurs. When a block is to be replaced, it is written to main memory only if its update bit is set, i.e., only if the cache line is dirty, i.e., only if at least one word in the cache line has been updated. Drawbacks: other caches can get out of sync, and I/O must access main memory through the cache. N.B. About 15% of memory references are writes.
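
A minimal sketch of write back with a dirty (update) bit, in Python (the structure is illustrative):

    cache = {}     # line index -> {"tag": ..., "data": ..., "dirty": bool}
    memory = {}    # (tag, line) -> data, standing in for main memory blocks

    def write(line, tag, data):
        victim = cache.get(line)
        if victim and victim["tag"] != tag and victim["dirty"]:
            # Replacing a dirty line: write it back to main memory first.
            memory[(victim["tag"], line)] = victim["data"]
        # The write itself updates only the cache and sets the dirty bit.
        cache[line] = {"tag": tag, "data": data, "dirty": True}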

Line Size Block size = line size. As block size increases from very small, the hit ratio first increases because of the principle of locality. As block size becomes very large, the hit ratio decreases, because the number of blocks in the cache decreases and the probability of referencing all words in a block decreases. 4 to 8 addressable units per block is a reasonable size.

Number of Caches Two aspects: the number of levels, and unified vs. split.

Multilevel Caches A modern CPU has an on-chip cache (L1) that increases overall performance, e.g., 80486: 8 KB; Pentium: 16 KB; PowerPC: up to 64 KB. A secondary, off-chip cache (L2) provides high-speed access to main memory, generally 512 KB or less. Current processors include the L2 cache on the processor chip.

Unified vs. Split Unified cache: stores data and instructions in one cache; flexible and can balance the load between data and instruction fetches, giving a higher hit ratio; only one cache to design and implement. Split cache: two caches, one for data and one for instructions. The trend is toward split caches: they suit superscalar machines that support parallel execution, prefetching, and pipelining, and they overcome cache contention.

Pentium 4 Cache
80386 – no on-chip cache
80486 – 8 KB, using 16-byte lines and a four-way set associative organization
Pentium (all versions) – two on-chip L1 caches, for data and instructions
Pentium III – L3 cache (extra cache between processor and main memory, built into the motherboard) added off chip
Pentium 4 – L1 caches: 8 KB, 64-byte lines, four-way set associative; L2 cache: feeds both L1 caches, 256 KB, 128-byte lines, 8-way set associative; L3 cache: on chip

Intel Cache Evolution

Pentium 4 Block Diagram

Pentium 4 Core Processor Fetch/decode unit: fetches instructions from the L2 cache, decodes them into micro-ops, and stores the micro-ops in the L1 cache. Out-of-order execution logic: schedules micro-ops based on data dependences and resources; may execute speculatively. Execution units: execute micro-ops, taking data from the L1 cache and leaving results in registers. Memory subsystem: L2 cache and system bus.

Pentium 4 Design Reasoning Instructions are decoded into RISC-like micro-ops before the L1 cache. Micro-ops are fixed length, which suits superscalar pipelining and scheduling; Pentium instructions are long and complex, so performance is improved by separating decoding from scheduling and pipelining (more later – ch. 14). The data cache is write back, and can be configured to write through. The L1 cache is controlled by 2 bits in a register: CD (cache disable) and NW (not write-through). Two instructions are provided: one invalidates (flushes) the cache, the other writes back and then invalidates. The L2 and L3 caches are 8-way set associative with a line size of 128 bytes.

PowerPC Cache Organization
601 – single 32 KB cache, 8-way set associative
603 – 16 KB (2 x 8 KB), two-way set associative
604 – 32 KB
620 – 64 KB
G3 & G4 – 64 KB L1 cache, 8-way set associative; 256 KB, 512 KB, or 1 MB L2 cache, two-way set associative
G5 – 32 KB instruction cache, 64 KB data cache

PowerPC G5 Block Diagram

Internet Sources Manufacturer sites: Intel, IBM/Motorola. Search on "cache".