Memory Operation and Performance

Presentation on theme: "Memory Operation and Performance"— Presentation transcript:

1 Memory Operation and Performance
Lecture 8 – If we understand the memory architecture, we can develop programs that run faster.

2 Topics Memory Systems – where programs reside while they execute.
Caches – a cache sits between the CPU and main memory; it is faster memory used to speed up program execution at a small extra cost. Virtual Memory (VM) – uses the limited main memory to execute a program that is larger than the physical main memory. In this lecture we develop programs that use the cache memory well.

3 Memory system Memory Technology – RAM, ROM, Static RAM (SRAM) and Dynamic RAM (DRAM). Locality of Reference – keep the variables and code currently in use close to the CPU to speed up operation. Memory Hierarchies – disk -> main memory -> L2 cache -> L1 cache -> register (ordered from slowest to fastest).

4 Memory and CPU
[Figure: the program resides in main memory; the cache and registers sit inside the CPU.]

5 Memory hierarchies
[Figure: the memory hierarchy; levels within the CPU are about 10 ns, with slower levels at roughly 70 ns and 300 ns.]

6 Common Memory Technologies
Static Random Access Memory (SRAM)
Dynamic Random Access Memory (DRAM)
Magnetic disks
Magnetic tapes
Optical disks
Permanent Storage

7 Speed
[Figure: access speed comparison between cache memory (about 10 ns) and main memory.]

8 Cost
[Figure: cost comparison between cache memory and main memory.]

9 Difference between DRAM and SRAM
The most important information these graphs give us is not, for example, that it takes 4 nanoseconds to access a byte in static RAM, but that it takes much longer to access dynamic RAM than to access static RAM.

10 Trade-off between cost and speed
If cost were no object, all computers would use static RAM exclusively, to take advantage of its speed. If speed of access were not an issue, computers would use disk or DRAM exclusively, to minimize cost. The conclusion: DRAM for main memory and SRAM for cache memory.

11 Size and Cost
[Figure: relative size and cost of disk, main memory, and cache.]

12 Locality of Reference Unable to predict future memory accesses exactly, computers do the next best thing: they guess which addresses (i.e., which code segment) will be needed next. Just like humans, they base these guesses on the assumption that the immediate future will be similar to the immediate past: memory addresses that have been accessed recently are likely to be accessed again. This is called the principle of locality of reference.

13 Principle of Locality References to a single address occur close together in time – for example, a variable i that is used repeatedly (this is called temporal locality). References to addresses that are near each other occur close together in time – for example, accessing i and then j, which were declared together as int i, j; and sit next to each other in memory (this is called spatial locality).

14 Principle of locality The principle of locality of reference is not a guarantee, but a guess. Empirically, however, there is little doubt that programs behave according to this principle. Think about it: if you will need the same variable i again soon, it is better to keep it in the cache than to release it back to main memory.

15 Locality in a Code Fragment
sum = 0; for (i = 0; i < MAX; i++) sum += array[i]; array[0], array[1], array[2], and so on are adjacent in memory, so if we can load them into cache memory and keep them there, the loop runs faster.
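As a self-contained sketch (MAX, the array size, and its contents are made up here for illustration), the fragment might look like this as a complete C program:

#include <stdio.h>

#define MAX 1000                      /* assumed array size */

int main(void)
{
    int array[MAX];
    for (int i = 0; i < MAX; i++)
        array[i] = i;                 /* fill with sample data */

    /* The loop from the slide: sum and i are reused every iteration
       (temporal locality), and consecutive array elements are adjacent
       in memory (spatial locality), so they stay in the cache. */
    int sum = 0;
    for (int i = 0; i < MAX; i++)
        sum += array[i];

    printf("sum = %d\n", sum);
    return 0;
}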

16 CPU instructions for the fragment
[Figure: the CPU instructions generated for the fragment, at addresses frag, frag+1, frag+2, and so on.] Keeping these instructions in cache memory while the loop executes makes it run faster.

17 Explanation of Locality
Referring to the instructions on the previous slide: the instruction at frag+2 is likely to be executed again soon, since the loop repeats (temporal locality – closeness in time), and the instructions from frag+2 to frag+6 inclusive are likely to be executed together (spatial locality – closeness in memory location).

18 memory hierarchy Data is kept in the slow, large memory;
small parts of it are moved into the small, fast memory as they are about to be needed. This fast memory is called cache memory. "Cache" means hidden (it is invisible to the user). The idea is to keep the frequently used data and code in cache memory.

19 How to achieve a memory hierarchy?
To do this perfectly, a system needs to be able to predict what data will be needed before it is actually used, so that the data can be moved to fast memory in time to keep the CPU from waiting on the slow access times of large memories.

20 Hit Ratio – the percentage of accesses that find their data in the cache
The percentage of accesses for which the prediction is successful is called the hit ratio. A high hit ratio therefore results in faster program execution. If 60 out of every 100 accesses result in a hit, the hit ratio is expressed as 0.6 or 60%. Typical hit ratios in today's computers are almost always above 0.90.

21 Determination of hit ratio – an example
The access time for main memory is 60 ns and for cache memory is 10 ns. If the hit ratio is 0.9 (9 out of 10 accesses find the data or code in the fast cache memory), determine the average memory access time. Average access time = hit ratio x cache access time + (1 – hit ratio) x main memory access time = 0.9 x 10 + 0.1 x 60 = 9 + 6 = 15 ns. Without the cache, every access takes 60 ns; with the cache, the average access takes 15 ns – you can see the benefit. In that case, you might ask, why don't we replace main memory entirely with cache memory? Money!
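A minimal C sketch of the same calculation (the hit ratio and access times are simply the example's numbers):

#include <stdio.h>

int main(void)
{
    double hit_ratio = 0.9;     /* 9 out of 10 accesses hit the cache */
    double cache_ns  = 10.0;    /* cache access time in ns */
    double memory_ns = 60.0;    /* main memory access time in ns */

    double avg = hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns;
    printf("average access time = %.1f ns\n", avg);   /* prints 15.0 ns */
    return 0;
}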

22 The Importance of a Good Hierarchy
The efficiency of the memory hierarchy is of the highest importance. Today's computers are limited not by the speed of their CPUs but by the speed of their memory systems and disks (disk I/O and memory access time). A small improvement in the hit ratio can yield a large improvement in the speed of execution.

23 Where is the bottleneck?
Example: a modern Intel Pentium-class processor can execute hundreds of millions of instructions per second (its clock speed is 4 GHz or more). Each of those instructions must be fetched from memory. If we can improve the memory access time, we can improve the overall performance.

24 The bottleneck is memory, not the CPU
This is because the speed of the CPU is improving faster than the speed of DRAM, even though both are improving quickly. SRAM is faster than DRAM.

25 Graph showing CPU, DRAM & SRAM
[Figure: performance trends over time for the CPU, SRAM, and DRAM (main memory).]

26 Effect of hit ratio
A small change in the hit ratio produces a large change in the system's performance.

27 Design of the Hierarchy
register -> cache memory (SRAM) -> main memory (DRAM) -> disk. Typical computer designs have at least four levels in the hierarchy. Registers are closest to the CPU, smallest in size, and also the fastest memory in the system. Furthest from the CPU is the disk, where large inactive parts of programs sleep for long periods of time. In between are the caches and main memory.

28 Four-level hierarchy – on each transfer, 8192 bytes are loaded into main memory, 32 bytes into the cache, and 8 bytes into a register.

29 Caches Caching refers to the levels of the memory hierarchy that stand between the CPU registers and main memory (DRAM). These caches are typically implemented with static RAM. Topics: Cache Design Parameters, A Diversity of Caches, Looking at the Caches, Cache-Aware Programming.

30 Cache Design Parameters
Fetch Strategy – load the data/code from main memory into a cache line (say, 32 bytes) [sequence 1]. Placement Strategy – decide which cache line the data goes into [sequence 2]. Replacement Strategy – select which cache line to evict back to main memory when room is needed [sequence 3]. Update Strategy – decide when main memory is updated after a write [sequence 4]. Size.

31 Fetch Strategy
When the cache misses (the CPU does not find the data in the cache), the data must be found in the next level of the memory hierarchy – main memory. The "chunks" in which all transfers to and from the cache are done are called cache lines. Even if the program only loads a 4-byte integer, the remaining 28 bytes of the cache line (4 + 28 = 32) are filled as well. Cache lines are typically on the order of 16 or 32 bytes long. Transfer units: disk (block) -> main memory (page) -> cache (line).

32 Effect after a cache miss
When the cache misses on an address X, it fetches addresses [X, X+32) into the cache, hoping that the program will get around to requesting the rest (that is, X+1 through X+31). The risk is that the program will not request the rest, and the space consumed could have been better used to cache other values. Example: when we load int i, the rest of the line (j, k, and the start of the for loop's code) is loaded into the cache line as well:
int i, j, k;               /* 12 bytes */
for (i = 0; i < 10; i++) { /* disassemble the program to find the code size */

33 Example – assume that a cache consists of 128 lines, each of 32 bytes
[Figure: the cache as 128 lines (line 0 through line 127), each 32 bytes wide.] When the first byte of a line is loaded, the rest of that 32-byte line is loaded at the same time.

34 Determine the following
int c, m, y, k;
for (i = 0; i < 16; i++) {
    for (j = 0; j < 16; j++) {
        square[i][j].c = 0;
        square[i][j].m = 0;
        square[i][j].y = 1;
        square[i][j].k = 0;
    }
}
What is the total number of writes? What is the total number of writes that miss in the cache? What is the miss rate?

35 Miss and Write
[Figure: the write to square[0][0].c (4 bytes) misses in the cache and brings the remaining 28 bytes of its cache line in with it.]

36 Answer What is the total number of writes?
16 x 16 elements x 4 writes per element = 1024 writes. What is the total number of writes that miss in the cache? Each element holds four 4-byte ints (16 bytes), so two elements (2 x 4 = 8 writes) share one 32-byte cache line and only the first write to each line misses: 1024 / 8 = 16 x 8 = 128 misses. What is the miss rate? 128 / 1024 = (16 x 8) / (16 x 16 x 4) = 1/8 = 12.5%.
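These counts can be checked with a small C simulation. It assumes square is a 16 x 16 array of a struct with four int fields (16 bytes per element, so two elements share one 32-byte line), that the array starts on a cache-line boundary, and that the cache is large enough that no line is evicted during the loop; the struct name is made up:

#include <stdio.h>
#include <string.h>

#define LINE_SIZE 32                    /* bytes per cache line */
#define N 16

struct pixel { int c, m, y, k; };       /* assumed 16-byte element */

int main(void)
{
    struct pixel square[N][N];
    char seen[sizeof(square) / LINE_SIZE];   /* which lines have been touched */
    memset(seen, 0, sizeof(seen));

    long writes = 0, misses = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            for (int f = 0; f < 4; f++) {    /* writes to .c, .m, .y, .k */
                long offset = (char *)&square[i][j] - (char *)square
                              + f * (long)sizeof(int);
                long line = offset / LINE_SIZE;
                if (!seen[line]) { seen[line] = 1; misses++; }
                writes++;
            }
        }
    }
    printf("writes = %ld, misses = %ld, miss rate = %.1f%%\n",
           writes, misses, 100.0 * misses / writes);
    /* prints: writes = 1024, misses = 128, miss rate = 12.5% */
    return 0;
}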

37 Placement Strategy (1) – fully-associative cache
(No need to memorise.) When a value is moved into the cache, its address in memory must be remembered for matching against CPU requests. This can be done in many ways. Perhaps the simplest way is to store both the value and its address in the cache. When an address is referenced, it is then matched against all the addresses in the cache. This type of cache is called a fully-associative cache.

38 Placement Strategy (2) – a direct cache
Because the cache is smaller than memory, the mapping from memory to cache location is many-to-one: each cache location may hold values from several memory addresses. Therefore, the address still needs to be cached together with the value. But because only certain addresses can reside at one cache location, only the part of the address that varies needs to be stored. This is called the tag.

40 Direct cache An address X must be placed in cache location (X / l) % N, where l is the size of a cache line in bytes and N is the number of lines in the cache.
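A sketch of this mapping in C (the line size and number of lines are illustrative, matching the earlier 128-line, 32-byte example):

#include <stdio.h>

#define LINE_SIZE 32     /* l: bytes per cache line */
#define NUM_LINES 128    /* N: number of lines in the cache */

/* Direct-mapped placement: address X goes to line (X / l) % N. */
unsigned long cache_index(unsigned long x)
{
    return (x / LINE_SIZE) % NUM_LINES;
}

/* The remaining high-order part of the address is kept as the tag. */
unsigned long cache_tag(unsigned long x)
{
    return x / ((unsigned long)LINE_SIZE * NUM_LINES);
}

int main(void)
{
    unsigned long x = 0x12345;
    printf("address 0x%lx -> line %lu, tag %lu\n",
           x, cache_index(x), cache_tag(x));
    return 0;
}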

41 Example Now suppose that, in the following code, the addresses of a and b differ only in the tag field: for (i = 0; i < M; i++) { a[i] = b[i]; } In that case, for each i, a[i] and b[i] will both map to the same cache line.

42 Thrashing This is a collision that happens repeatedly,
causing thrashing: a condition in which data is transferred back and forth with no benefit. In this case the CPU is very busy but the response is very slow, because almost all of the CPU's time is spent swapping data in and out.

43 Replacement Strategy – when the cache is full, which line is moved out?
The replacement strategy determines the choice of a "victim" cache line to evict (move out) when there is no room for an incoming value. The cache might be full, or the set into which the value is assigned in a set-associative cache might be full. In either case, one of the lines in the set must be evicted.

44 Replacement Strategy – the random method
When a victim must be chosen from a full cache, or from a full set in a set-associative cache, choosing it at random is much better than always picking the same victim.
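A minimal sketch of the random method (the parameter 'ways' is an assumption, standing for the number of lines in a set):

#include <stdlib.h>

/* Pick a victim line at random from a full set of 'ways' lines. */
int choose_victim_random(int ways)
{
    return rand() % ways;
}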

45 Update Strategy There are two main strategies: write-through and write-back. A write-through strategy dictates that the memory copy is updated whenever the cached copy is updated (data is written to main memory at the same time as it is cached). A write-back strategy dictates that the memory copy is not updated until the cache copy is evicted (data is written to main memory only when it is forced out of the cache).
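A rough sketch of the two strategies, using a made-up single cache line to show where main memory gets written:

#include <string.h>

#define LINE_SIZE 32

struct line {                 /* one hypothetical cache line */
    unsigned long tag;
    int  dirty;               /* used only by write-back */
    char data[LINE_SIZE];
};

/* Write-through: update the cached copy and main memory together. */
void write_through(struct line *l, char *memory, unsigned long addr, char value)
{
    l->data[addr % LINE_SIZE] = value;
    memory[addr] = value;                  /* memory updated immediately */
}

/* Write-back: update only the cache and mark the line dirty. */
void write_back(struct line *l, unsigned long addr, char value)
{
    l->data[addr % LINE_SIZE] = value;
    l->dirty = 1;
}

/* On eviction, a dirty write-back line is flushed to main memory. */
void evict(struct line *l, char *memory, unsigned long line_base)
{
    if (l->dirty)
        memcpy(&memory[line_base], l->data, LINE_SIZE);
    l->dirty = 0;
}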

46 Summary Cache is used to speed up operation.
Cache is fast (expensive, static RAM) memory, while main memory is slow (cheap, dynamic RAM). To execute code, the CPU searches the cache before searching main memory. That is why it is faster to group code and data together so that they can be retained in the cache memory (temporal locality – time, and spatial locality – space). The hit ratio is the fraction of accesses satisfied by the cache; the higher the ratio, the better the performance (hit ratios are typically above 90%).

