Topics covered: Memory subsystem CSE243: Introduction to Computer Architecture and Hardware/Software Interface.

1 Example #1: Effect of Interleaving Consider a cache that has 8 words per block. On a read miss, the block that contains the desired word must be copied from the main memory into the cache. Assume that the hardware has the following properties: it takes 1 clock cycle to send an address to the main memory; the first word is accessed in 8 clock cycles, and each subsequent word is accessed in 4 clock cycles; and one clock cycle is necessary to send a word to the cache. How many clock cycles does it take to send the block of words to the cache? The total time taken is 1 + 8 + (7 x 4) + 1 = 38 clock cycles.

2 Example #1: Effect of Interleaving If the memory is constructed as four interleaved modules, then when the starting address of the block arrives at the memory, all four modules begin accessing the required data using the high-order bits of the address. After 8 clock cycles, each module has one word of data in its DBR. These words are transferred to the cache one word at a time during the next 4 clock cycles, and during this time the next word in each module is accessed. It then takes another 4 clock cycles to transfer these words to the cache. Therefore the total time taken is 1 + 8 + 4 + 4 = 17 clock cycles. The speedup obtained from interleaving is 38/17 ≈ 2.2.
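As a quick check, the two block-transfer times above can be recomputed in a short Python sketch (the parameter names are mine; the cycle counts are the ones given in the example):

```python
# Timing parameters from the example.
ADDR_CYCLES = 1       # send the address to main memory
FIRST_WORD = 8        # access latency of the first word
NEXT_WORD = 4         # access latency of each subsequent word
XFER = 1              # send one word to the cache
WORDS_PER_BLOCK = 8
MODULES = 4

# Non-interleaved: words arrive one after another; only the final
# transfer cycle is not overlapped with a memory access.
t_plain = ADDR_CYCLES + FIRST_WORD + (WORDS_PER_BLOCK - 1) * NEXT_WORD + XFER

# Four-way interleaved: all modules access in parallel (8 cycles),
# then each batch of 4 words is sent to the cache in 4 cycles while
# the modules fetch the next batch.
batches = WORDS_PER_BLOCK // MODULES
t_interleaved = ADDR_CYCLES + FIRST_WORD + batches * MODULES * XFER

print(t_plain)        # 38
print(t_interleaved)  # 17
print(round(t_plain / t_interleaved, 1))  # 2.2
```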

3 Example #2: Effect of cache on processor performance Consider the impact of the cache on the overall performance of the computer. Let h be the hit rate, M the miss penalty, that is, the time to access information in the main memory, and C the time to access information in the cache. Then the average access time experienced by the processor is (see page 332 of the textbook): t_avg = hC + (1 - h)M. Let us consider the following example. If the computer has no cache, then it takes 10 clock cycles for every memory read access. For a computer which has a cache that holds 8-word blocks and an interleaved main memory, it takes 17 clock cycles to transfer a block from the main memory to the cache. Assume that 30% of the instructions require a memory access, so there are 130 memory accesses for every 100 instructions executed, that the cache access time is 1 clock cycle, and that the hit rates in the cache are 0.95 for instructions and 0.9 for data. Then the improvement in performance is: 130x10 / [100(0.95x1 + 0.05x17) + 30(0.9x1 + 0.1x17)] = 1300/258 ≈ 5.04
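The average-access-time formula and the 5.04 improvement factor can be sketched as follows (a minimal sketch using the numbers assumed in the example: C = 1 cycle, miss penalty M = 17 cycles):

```python
def avg_access_time(h, C, M):
    """Average access time for a single-level cache: t_avg = h*C + (1-h)*M."""
    return h * C + (1 - h) * M

# Numbers from the example.
t_instr = avg_access_time(0.95, 1, 17)   # 100 instruction accesses per 100 instructions
t_data = avg_access_time(0.90, 1, 17)    # 30 data accesses per 100 instructions

time_with_cache = 100 * t_instr + 30 * t_data   # cycles per 100 instructions
time_without_cache = 130 * 10                   # every access costs 10 cycles

print(round(time_without_cache / time_with_cache, 2))  # 5.04
```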

4 Example #3: Effect of L1 & L2 cache Consider the impact of the L1 and L2 caches on the overall performance of the processor. Let h1 be the hit rate in cache L1, h2 the hit rate in cache L2, C1 the time to access information in the L1 cache, C2 the time to access information in the L2 cache, and M the time to access information in the main memory. Then the average access time experienced by the processor is (see page 335 of the textbook): t_avg = h1·C1 + (1 - h1)·h2·C2 + (1 - h1)(1 - h2)·M
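The two-level formula can be sketched directly in code. The slide gives no concrete numbers for this example, so the values below (h1 = 0.95, h2 = 0.90, C1 = 1, C2 = 10, M = 100 cycles) are purely illustrative assumptions:

```python
def avg_access_time_2level(h1, h2, C1, C2, M):
    """t_avg = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M"""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

# Hypothetical values; not taken from the slide.
t = avg_access_time_2level(h1=0.95, h2=0.90, C1=1, C2=10, M=100)
print(round(t, 2))  # 1.9
```

Note how a small L2 miss rate on top of L1 keeps the average close to the L1 access time even with a 100-cycle memory.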

5 Example #4: Set-associative cache A computer system has a main memory of 64K 16-bit words. It has a cache of 128 blocks with 16 words per block, organized in a block-set-associative manner with 2 blocks per set. (a) Calculate the number of bits in each of the TAG, SET and WORD fields of the main memory address format. (b) Assume that the cache is initially empty. Suppose that the processor fetches 2080 words from locations 0, 1, 2, ..., 2079, in that order. It then repeats this fetch sequence nine more times. If the cache is 10 times faster than the main memory, estimate the improvement factor resulting from the use of the cache. Assume that the LRU algorithm is used for block replacement. (a) The main memory address is 16 bits. The WORD field needs 4 bits (16 words per block). There are 128/2 = 64 sets, so the SET field needs 6 bits. The TAG field needs the remaining 16 - (6+4) = 6 bits.
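The field-width calculation in part (a) follows mechanically from the cache geometry, as this short sketch shows (constant names are mine):

```python
import math

MEM_WORDS = 64 * 1024    # 64K words -> 16-bit address
CACHE_BLOCKS = 128
WORDS_PER_BLOCK = 16
BLOCKS_PER_SET = 2

addr_bits = int(math.log2(MEM_WORDS))                      # 16
word_bits = int(math.log2(WORDS_PER_BLOCK))                # 4
set_bits = int(math.log2(CACHE_BLOCKS // BLOCKS_PER_SET))  # 6 (64 sets)
tag_bits = addr_bits - set_bits - word_bits                # 6

print(tag_bits, set_bits, word_bits)  # 6 6 4
```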

6 Example #4: Set-associative cache Words 0, 1, 2, ..., 2079 occupy blocks 0 to 129 in the main memory. After blocks 0 through 127 have been read from the main memory into the cache on the first pass, the cache is full. Because the replacement algorithm is LRU, the main memory blocks that map to the first two of the 64 cache sets are always overwritten before they can be used on a successive pass. In particular, main memory blocks 0, 64 and 128 continually displace each other in competing for the 2 block positions in cache set 0. Similarly, main memory blocks 1, 65 and 129 continually displace each other in competing for the 2 block positions in cache set 1. Main memory blocks that occupy the last 62 sets are fetched once in the first pass and remain in the cache for the next 9 passes. On the first pass all 130 blocks must be fetched from the main memory. On each of the next 9 passes, the blocks in the last 62 sets of the cache (62x2 = 124 blocks) are found in the cache; the remaining 6 blocks (130 - 124) must be fetched from the main memory. Improvement factor = Time without cache / Time with cache = 10x130x10t / (1x130x11t + 9(124x1t + 6x11t)) = 13000t/3140t ≈ 4.14
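The hit/miss counts claimed above (130 block misses on the first pass, 6 on each later pass) can be verified with a small LRU cache simulator; this is a sketch I wrote for the example's geometry, not code from the lecture:

```python
from collections import OrderedDict

WORDS_PER_BLOCK, NUM_SETS, WAYS = 16, 64, 2

# One OrderedDict per set; insertion order doubles as LRU order.
sets = [OrderedDict() for _ in range(NUM_SETS)]
misses = 0

for _ in range(10):                      # ten passes over the sequence
    for addr in range(2080):             # words 0..2079
        block = addr // WORDS_PER_BLOCK
        s, tag = block % NUM_SETS, block // NUM_SETS
        ways = sets[s]
        if tag in ways:
            ways.move_to_end(tag)        # hit: mark most recently used
        else:
            misses += 1
            if len(ways) == WAYS:
                ways.popitem(last=False) # evict least recently used
            ways[tag] = True

print(misses)  # 184  (130 on the first pass + 6 on each of the next 9)

# Block-level timing as in the slide: a missed block costs 11t, a hit 1t.
hits = 10 * 130 - misses
time_with = misses * 11 + hits * 1
time_without = 10 * 130 * 10
print(round(time_without / time_with, 2))  # 4.14
```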