Module IV Memory Organization.


Set Associative Cache It combines the concepts of direct and associative mapping. The cache lines are grouped into sets. The number of lines in a set can vary from 2 to 16. Part of the address specifies which set holds the address; the data can be stored in any of the lines in that set.

Set Associative Cache Two lines per set is called two-way set associative. Each line has its own tag. A set is selected using the set index bits of the address.

Set Associative Cache Assume you have a 16-bit memory address, 2 KB of cache, 16-byte lines, and a 2-way set-associative organization. The memory address fields are derived as follows: Word bits = log2(16) = 4. Number of lines = 2 KB / 16 B = 2^11 / 2^4 = 2^7 = 128. Number of sets = 128 / 2 = 64. Set bits = log2(64) = 6. Tag bits = 16 - (4 + 6) = 6. Address layout: Tag (6 bits) | Set (6 bits) | Word (4 bits).
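The field widths above can be checked with a short calculation. A minimal sketch; the variable names are illustrative, not part of the slide:

```python
import math

# Parameters taken from the slide: 16-bit address, 2 KB cache,
# 16-byte lines, 2-way set associative.
address_bits = 16
cache_bytes = 2 * 1024
line_bytes = 16
ways = 2

word_bits = int(math.log2(line_bytes))            # 4
num_lines = cache_bytes // line_bytes             # 2^11 / 2^4 = 128
num_sets = num_lines // ways                      # 64
set_bits = int(math.log2(num_sets))               # 6
tag_bits = address_bits - set_bits - word_bits    # 6

print(tag_bits, set_bits, word_bits)  # 6 6 4
```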

Example Suppose we want to read or write a byte at the address 357A (hex). Then Tag = 13, Set = 23, Word = 10. If a line in set 23 of the cache has tag 13, the data at 357A is in the cache (a hit). Otherwise a miss has occurred, and the contents of one of the cache lines of set 23 are replaced by the contents of memory line 001101010111 (binary) = 855.
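The decomposition of 357A can be reproduced with a few bit operations, using the 6/6/4 tag/set/word split derived above (variable names are illustrative):

```python
addr = 0x357A                     # 0011 0101 0111 1010 in binary

word = addr & 0xF                 # low 4 bits
set_index = (addr >> 4) & 0x3F    # next 6 bits
tag = addr >> 10                  # top 6 bits
block = addr >> 4                 # tag and set together: the memory line number

print(tag, set_index, word, block)  # 13 23 10 855
```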

Simulation Consider a line size of 4 bytes. The number of cache lines is 8 and the cache is 2-way set associative, so the number of sets = 8 / 2 = 4. The number of main memory lines is 24.

Simulation Memory references (line numbers): 2, 7, 15, 22, 17, 16, 14, 18, 8, 4, 15, 18. Each reference maps to set (line number mod 4); within a full set, the least recently used line is replaced. The trace proceeds as follows:

Ref | Set (mod 4) | Set contents after access | Result
 2  | 2 | {2}      | MISS
 7  | 3 | {7}      | MISS
15  | 3 | {7, 15}  | MISS
22  | 2 | {2, 22}  | MISS
17  | 1 | {17}     | MISS
16  | 0 | {16}     | MISS
14  | 2 | {22, 14} | MISS (line 2 replaced)
18  | 2 | {14, 18} | MISS (line 22 replaced)
 8  | 0 | {16, 8}  | MISS
 4  | 0 | {8, 4}   | MISS (line 16 replaced)
15  | 3 | {7, 15}  | HIT
18  | 2 | {14, 18} | HIT

Of the 12 references, 10 are misses and 2 are hits.
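The miss/hit sequence above can be reproduced with a short simulation, assuming LRU replacement within each set (which matches the replacements shown). A minimal sketch at line-number granularity; tags and data are omitted:

```python
from collections import OrderedDict

def simulate(refs, num_sets=4, ways=2):
    """Simulate a set-associative cache with LRU replacement.

    Each set is an OrderedDict whose insertion order tracks recency:
    the first key is always the least recently used line.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for line in refs:
        s = sets[line % num_sets]
        if line in s:
            s.move_to_end(line)           # refresh recency on a hit
            results.append("HIT")
        else:
            if len(s) == ways:
                s.popitem(last=False)     # evict the LRU line
            s[line] = True
            results.append("MISS")
    return results

refs = [2, 7, 15, 22, 17, 16, 14, 18, 8, 4, 15, 18]
print(simulate(refs))  # 10 misses followed by 2 hits
```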

Replacement Algorithms Because of its simplicity of implementation, LRU (Least Recently Used) is the most popular replacement algorithm: replace the block in the set that has gone unreferenced the longest. Another method is FIFO: replace the block in the set that has been in the cache longest. Still another method is LFU (Least Frequently Used): replace the block in the set that has had the fewest references. A technique not based on usage is to pick a line at random from among the candidate lines. Studies show that random replacement provides inferior performance to algorithms based on usage.
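The four policies can be expressed as small victim-selection functions. A hedged sketch: the per-line bookkeeping fields (last-use time, load time, reference count) are assumed to be maintained elsewhere by the cache controller, and the field names are illustrative:

```python
import random

# Each candidate line is a dict with assumed bookkeeping fields:
# 'last_used' (time of last access), 'loaded_at' (time loaded), 'refs' (count).

def lru_victim(lines):
    # Least Recently Used: evict the line untouched for the longest time.
    return min(lines, key=lambda l: l["last_used"])

def fifo_victim(lines):
    # First In First Out: evict the line resident in the cache longest.
    return min(lines, key=lambda l: l["loaded_at"])

def lfu_victim(lines):
    # Least Frequently Used: evict the line with the fewest references.
    return min(lines, key=lambda l: l["refs"])

def random_victim(lines):
    # Not usage-based: pick any candidate line at random.
    return random.choice(lines)
```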

Write Policy There are 2 policies: write-back and write-through. Write-back: results are written only to the cache; a modified line is written to main memory when it is replaced. Advantage: faster writes. Disadvantage: main memory data can be out of date. Write-through: every write goes to both the cache and main memory. Advantage: main memory always holds valid data. Disadvantage: writes take longer and generate more memory traffic.
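The traffic difference between the two policies can be sketched as follows. This is a minimal model, not a real controller design; the class names and the dict-based cache are illustrative assumptions:

```python
class WriteThroughCache:
    def __init__(self):
        self.cache, self.memory_writes = {}, 0

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory_writes += 1          # every write also goes to memory

class WriteBackCache:
    def __init__(self):
        self.cache, self.dirty, self.memory_writes = {}, set(), 0

    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)             # main memory is now out of date

    def evict(self, addr):
        if addr in self.dirty:           # write back only dirty lines
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.cache.pop(addr, None)

wt, wb = WriteThroughCache(), WriteBackCache()
for _ in range(10):                      # ten writes to the same location
    wt.write(0x40, 1)
    wb.write(0x40, 1)
wb.evict(0x40)
print(wt.memory_writes, wb.memory_writes)  # 10 1
```

Ten repeated writes to one location cost ten memory writes under write-through but only one (at eviction) under write-back.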

Line Size Two specific effects come into play: Larger blocks reduce the number of blocks that fit into a cache, so data may be overwritten shortly after it is fetched. As a block becomes larger, the additional words are farther from the requested word and therefore less likely to be needed in the near future.

Number of Caches Two aspects of this design issue: the number of levels of caches, and the use of unified versus split caches. Multilevel caches: the simplest such organization is known as a two-level cache. Nowadays three cache levels are common: L1, L2, and L3.

Number of Caches Unified versus split cache. Unified cache: a single cache is used to store both data and instructions. Split cache: uses two caches, one dedicated to instructions and one dedicated to data. These two caches exist at the same level, typically as two L1 caches.

Advantages of Unified Cache It has a higher hit rate than split caches, because it balances the load between instruction and data fetches automatically. Only one cache needs to be designed and implemented. Advantage of Split Cache It eliminates contention for the cache between the instruction fetch/decode unit and the execution unit.

Cache Coherency Needed when more than one device (typically a processor) shares main memory while having its own cache. If data in one cache are altered, this invalidates not only the corresponding word in main memory, but also that same word in other caches. Even if a write-through policy is used, the other caches may contain invalid data. A system that prevents this problem is said to maintain cache coherency.

Cache Coherency Possible approaches to cache coherency include the following: Bus watching with write-through Hardware transparency Noncacheable memory

Bus watching with write-through Each cache controller monitors the address lines to detect write operations to memory by other bus masters. If another master writes to a location in shared memory that also resides in the cache memory, the cache controller invalidates that cache entry.
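The snooping mechanism can be sketched as follows. A hedged model, assuming write-through: class and method names are illustrative, and real controllers operate on cache lines rather than single addresses:

```python
class SnoopingCache:
    def __init__(self, bus):
        self.lines = {}
        bus.attach(self)

    def snoop(self, addr):
        # Another bus master wrote addr to shared memory: drop the stale copy.
        self.lines.pop(addr, None)

class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, writer, addr, value):
        writer.lines[addr] = value       # writer updates its own copy
        # Write-through: memory is also updated; every other cache
        # controller watching the bus invalidates its matching entry.
        for c in self.caches:
            if c is not writer:
                c.snoop(addr)

bus = Bus()
a, b = SnoopingCache(bus), SnoopingCache(bus)
a.lines[0x100] = 7
b.lines[0x100] = 7
bus.write(a, 0x100, 9)                   # a writes; b's copy is invalidated
print(0x100 in a.lines, 0x100 in b.lines)  # True False
```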

Hardware Transparency Additional hardware is used to ensure that all updates to main memory via cache are reflected in all caches. If one processor modifies a word in its cache, this update is written to main memory. In addition, any matching words in other caches are similarly updated.

Noncacheable memory Only a portion of main memory is shared by more than one processor, and this is designated as noncacheable. In such a system, all accesses to shared memory are cache misses, because the shared memory is never copied into the cache. The noncacheable memory can be identified using chip-select logic or high-address bits.

Memory Interleaving Reduces memory access time. Main memory is divided into a number of modules, and addresses are arranged such that successive bytes are stored in different modules. When the CPU accesses successive locations, the accesses can proceed in parallel across modules, reducing the effective access time.

Memory Interleaving The lower-order k bits are used to select a module, and the higher-order m bits select a location within the module. The system should have exactly 2^k modules; otherwise there will be gaps of non-existent locations.
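The address split can be sketched directly. A minimal example assuming k = 2, i.e. four modules (the parameter values are illustrative):

```python
k = 2                      # assumed: 2^k = 4 memory modules
num_modules = 1 << k

def decompose(addr):
    module = addr & (num_modules - 1)   # lower-order k bits select the module
    offset = addr >> k                  # higher-order bits select the location
    return module, offset

# Successive addresses land in successive modules, so a run of
# sequential accesses can proceed in parallel across the modules.
print([decompose(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```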

Associative Memory To search for an object in an ordinary memory, the number of memory accesses depends on the location of the object and the efficiency of the search algorithm. The time to find an object can be reduced if objects are selected based on their contents rather than their addresses. This type of memory is called Associative Memory or Content Addressable Memory (CAM).

Block Diagram

Associative Memory It consists of a memory array with match logic for m words of n bits each. The Argument Register (A) and the Key Register (K) each hold n bits. Every word in memory is compared with A in parallel, and each match sets the corresponding bit in the match register. A read can then be performed based on the match register contents.

Associative Memory K is used to mask A: only those bits of A whose corresponding bits in K are set participate in the comparison. Example
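The masked parallel comparison can be sketched in software. A hedged model: the hardware compares all words simultaneously, which is represented here by a loop, and the sample word values are illustrative:

```python
def cam_match(words, A, K):
    """Return the indices of stored words matching argument A under mask K.

    A bit position participates in the comparison only where K has a 1;
    (w ^ A) has a 1 exactly where w and A differ, so masking with K and
    testing for zero implements the masked match.
    """
    return [i for i, w in enumerate(words) if (w ^ A) & K == 0]

words = [0b1010, 0b1110, 0b0011, 0b1011]   # 4-bit memory array
A = 0b1010                                  # argument register
K = 0b1100                                  # key register: compare only the two high bits

print(cam_match(words, A, K))   # [0, 3] -- words whose high two bits are 10
print(cam_match(words, A, 0b1111))  # [0] -- full mask: exact match only
```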

Associative Memory Cells