CSC 4250 Computer Architectures December 8, 2006 Chapter 5. Memory Hierarchy.

Main Memory
Assume the following performance:
 4 clock cycles to send the address
 56 clock cycles of access time per word
 4 clock cycles to send a word of data
Choose a cache block of 4 words. Then:
 miss penalty is 4×(4+56+4) = 256 clock cycles
 bandwidth is 32/256 = 1/8 byte per clock cycle
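The arithmetic above can be sketched directly; the cycle counts come from the slide, while the variable names are illustrative (the slide's 32-byte block implies 8-byte words):

```python
# Miss penalty and bandwidth for a 4-word block fetched one word at a time.
send_addr = 4     # cycles to send the address
access = 56       # cycles of access time per word
send_word = 4     # cycles to send one word of data
block_words = 4   # words per cache block
word_bytes = 8    # 8-byte words, so a 4-word block is 32 bytes

miss_penalty = block_words * (send_addr + access + send_word)
bandwidth = block_words * word_bytes / miss_penalty
print(miss_penalty)  # 256 clock cycles
print(bandwidth)     # 0.125 bytes per clock cycle (= 1/8)
```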

Fig Three examples of bus width, memory width, & memory interleaving to achieve higher memory bandwidth

Techniques for Higher Bandwidth
1. Wider main memory ─ Quadrupling the width of the cache and the memory quadruples the memory bandwidth. With a main memory width of 4 words, the miss penalty drops from 256 cycles to 4+56+4 = 64 cycles.
2. Simple interleaved memory ─ Sending an address to four banks permits them all to read simultaneously. The miss penalty is now 4+56+(4×4) = 76 clock cycles.
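A quick sketch of the three organizations' miss penalties for a 4-word block, using the slide's cycle counts (variable names are mine):

```python
send_addr, access, send_word = 4, 56, 4  # cycles, from the slide

# 1-word-wide memory: four complete, sequential word accesses
narrow = 4 * (send_addr + access + send_word)
# 4-word-wide memory: a single access fetches the whole block
wide = send_addr + access + send_word
# Four interleaved banks: one shared access, then four staged word transfers
interleaved = send_addr + access + 4 * send_word

print(narrow, wide, interleaved)  # 256 64 76
```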

Example
What can interleaving and wide memory buy? Consider the following machine:
 Block size = 1 word
 Memory bus width = 1 word
 Miss rate = 3%
 Memory accesses per instruction = 1.2
 Cache miss penalty = 64 cycles
 Average CPI (ignoring cache misses) = 2
If we change the block size to 2 words, the miss rate falls to 2%, and a 4-word block has a miss rate of 1.2%. What is the improvement in performance of interleaving two ways and four ways versus doubling the width of memory and the bus?

Solution (1)
CPI for the computer using 1-word blocks = 2 + (1.2×3%×64) = 4.30
 Since the clock cycle time and instruction count won't change in this example, we calculate performance improvement by just comparing CPI.
Increasing the block size to 2 words gives these options:
 64-bit bus and memory, no interleaving: 2 + (1.2×2%×2×64) = 5.07
 64-bit bus and memory, interleaving: 2 + (1.2×2%×(4+56+8)) = 3.63
 128-bit bus and memory, no interleaving: 2 + (1.2×2%×1×64) = 3.54
 Thus, doubling the block size slows down the straightforward implementation (5.07 versus 4.30), while interleaving or wider memory is 1.19 or 1.22 times faster, respectively.

Solution (2)
Increasing the block size to 4 words gives these options:
 64-bit bus and memory, no interleaving: 2 + (1.2×1.2%×4×64) = 5.69
 64-bit bus and memory, interleaving: 2 + (1.2×1.2%×(4+56+16)) = 3.09
 128-bit bus and memory, no interleaving: 2 + (1.2×1.2%×2×64) = 3.84
Again, the larger block hurts performance for the simple case (5.69 versus 4.30), although the interleaved 64-bit memory is now fastest ─ 1.39 times faster versus 1.12 for the wider memory and bus.
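The CPI arithmetic in both solutions follows one formula; a small sketch (the function name is mine) reproduces the slides' numbers:

```python
def cpi(miss_rate, miss_penalty, base=2.0, accesses_per_instr=1.2):
    """Effective CPI = base CPI + memory accesses/instr x miss rate x miss penalty."""
    return base + accesses_per_instr * miss_rate * miss_penalty

print(round(cpi(0.03, 64), 2))               # 4.3   (1-word blocks, baseline)
# 2-word blocks (2% miss rate):
print(round(cpi(0.02, 2 * 64), 2))           # 5.07  (64-bit bus, no interleaving)
print(round(cpi(0.02, 4 + 56 + 2 * 4), 2))   # 3.63  (64-bit bus, interleaved)
print(round(cpi(0.02, 1 * 64), 2))           # 3.54  (128-bit bus, no interleaving)
# 4-word blocks (1.2% miss rate):
print(round(cpi(0.012, 4 * 64), 2))          # 5.69
print(round(cpi(0.012, 4 + 56 + 4 * 4), 2))  # 3.09
print(round(cpi(0.012, 2 * 64), 2))          # 3.84
```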

Interleaved Memory
Interleaved memory is logically a wide memory, except that accesses to banks are staged over time to share internal resources. How many banks should be included? One metric, used in vector computers, is:
 Number of banks ≥ Number of clock cycles to access a word in a bank

Virtual Memory
At any instant in time, computers are running multiple processes, each with its own address space. It is too expensive to dedicate a full address space worth of memory to each process, especially since many processes use only a small part of their address space. We need a way to share a smaller amount of physical memory among many processes. One approach, virtual memory, divides physical memory into blocks and allocates them to different processes. A protection scheme must restrict each process to only the blocks belonging to that process.

Fig A program in its contiguous virtual address space

Comparison with Caches
 "Page" or "segment" is used for block; "page fault" or "address fault" is used for miss.
 The CPU produces virtual addresses that are translated by a combination of hardware and software into physical addresses, which access main memory. This process is called memory mapping or address translation.
 Replacement on cache misses is primarily controlled by hardware, while virtual memory replacement is primarily controlled by the operating system.
 The size of the processor address determines the size of virtual memory, but the cache size is independent of the processor address size.

Figure: Typical ranges of parameters for caches and virtual memory

Parameter          | L1 cache              | Virtual memory
-------------------|-----------------------|---------------------------
Block (page) size  | 16–128 B              | 4,096–65,536 B
Hit time           | 1–3 clock cycles      | 50–150 clock cycles
Miss penalty       | 8–150 clock cycles    | 10^6–10^7 clock cycles
  (access time)    | (6–130 clock cycles)  | (8×10^5–8×10^6 clock cycles)
  (transfer time)  | (2–20 clock cycles)   | (2×10^5–2×10^6 clock cycles)
Miss rate          | 0.1–10%               | 10^-5–10^-3 %
Address mapping    | 25–45-bit physical address to 14–20-bit cache address | 32–64-bit virtual address to 25–45-bit physical address

Figure How paging and segmentation divide a program

Fig. Paging versus segmentation. Why two words per address for a segment?

                        | Page                                  | Segment
Words per address       | One                                   | Two (segment and offset)
Programmer visible?     | Invisible                             | May be visible
Replacing a block       | Trivial (all blocks are the same size)| Hard (must find a contiguous, variable-sized, unused portion of main memory)
Memory use inefficiency | Internal fragmentation (unused portion of page) | External fragmentation (unused pieces of main memory)
Efficient disk traffic  | Yes (adjust page size to balance access time and transfer time) | Not always (small segments may transfer just a few bytes)

Four Questions
1. Where can a block be placed in main memory? The miss penalty is high, so: direct-mapped, fully associative, or set associative?
2. How is a block found if it is in main memory? Paging and segmentation (tag, index, offset fields).
3. Which block should be replaced on a virtual memory miss? Random, LRU, or FIFO.
4. What is the write policy? Write through, write back, write allocate, or no-write allocate?

Paging
Paging uses a data structure that is indexed by the page number; this structure contains the physical address of the block. The offset is concatenated to the physical page address. The structure takes the form of a page table. Indexed by the virtual page number, the size of the table equals the number of pages in the virtual address space. Given a 32-bit virtual address, 4 KB pages, and four bytes per page table entry, the size of the page table would be (2^32 / 2^12) × 2^2 = 2^22 bytes, or 4 MB.
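The page-table size calculation can be checked mechanically (variable names are mine; the parameters are the slide's):

```python
va_bits = 32      # virtual address bits
page_bits = 12    # 4 KB page -> 12 offset bits
pte_bytes = 4     # bytes per page table entry

entries = 2 ** (va_bits - page_bits)  # number of virtual pages: 2^20
size = entries * pte_bytes            # 2^22 bytes
print(entries)          # 1048576
print(size // 2**20)    # 4 (MB)
```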

Figure Mapping of virtual address to physical address via page table How can we reduce address translation time?

Figure Again
Use these values: 64-bit virtual address, 8 KB page size. What is the number of entries in the page table? What is the size of the page table?
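One way to work out the question above, assuming 8-byte PTEs as on the Alpha slides that follow (the point being that a flat table is hopelessly large):

```python
va_bits = 64
offset_bits = 13   # 8 KB page -> 13 offset bits
pte_bytes = 8      # assumption: 8-byte PTEs, as on the Alpha slides

entries = 2 ** (va_bits - offset_bits)  # 2^51 entries
size = entries * pte_bytes              # 2^54 bytes
print(entries == 2**51)    # True
print(size // 2**50)       # 16 (petabytes) -- far too big for a flat page table
```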

Fig Mapping of Alpha virtual address

Alpha Memory Management 1
 64-bit address space
 43-bit virtual address
 Three segments:
  1. seg0: bits = 00…0
  2. seg1: bits = 11…1
  3. kseg
 Segment kseg is reserved for the operating system
 User processes use seg0
 Page tables reside in seg1

Alpha Memory Management 2
 PTE (page table entry) is 64 bits (8 bytes)
 Each page table has 1,024 PTEs
 Page size is thus 8 KB
 Virtual address is 43 bits (why?)
 Physical page number is 28 bits
 Physical address is thus 41 bits (why?)
 Possible to increase the page size to 16, 32, or 64 KB
 If page size = 64 KB, then the virtual and physical addresses become 55 and 44 bits, resp. (why?)
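The "(why?)" questions are bit arithmetic. This sketch assumes the three-level page table shown in the Alpha mapping figure (function and variable names are mine):

```python
pte_bytes = 8            # 64-bit PTEs
physical_page_bits = 28  # physical page number width, from the slide

def address_bits(page_bytes, levels=3):
    """Virtual/physical address widths for a multi-level page table."""
    offset_bits = page_bytes.bit_length() - 1                 # log2(page size)
    index_bits = (page_bytes // pte_bytes).bit_length() - 1   # log2(PTEs per table)
    virtual = levels * index_bits + offset_bits
    physical = physical_page_bits + offset_bits
    return virtual, physical

print(address_bits(8 * 1024))   # (43, 41): 3x10 index bits + 13 offset bits
print(address_bits(64 * 1024))  # (55, 44): 3x13 index bits + 16 offset bits
```

An 8 KB page holds 8192/8 = 1,024 PTEs, so each level contributes 10 index bits; growing the page to 64 KB widens both the offset (16 bits) and each index field (13 bits), which is how the virtual address reaches 55 bits.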

Fig Overview of Alpha memory hierarchy