
Administration Midterm on Thursday Oct 28. Covers material through 10/21. Histogram of grades for HW#1 posted on newsgroup. Sample problem set (and solutions) on pipelining are posted on the web page. Last year’s practice exam, and last year’s midterm (with solutions) are on web site (under “Exams”)

Main Memory Background
Performance of main memory:
Latency: determines cache miss penalty. Access time: time between a request and when the word arrives. Cycle time: minimum time between requests.
Bandwidth: matters for I/O and for large-block (L2) miss penalties.
Main memory is DRAM: Dynamic Random Access Memory. Dynamic because it must be refreshed periodically (every 8 ms, about 1% of the time). Addresses are divided into 2 halves (memory as a 2D matrix): RAS, or Row Access Strobe, and CAS, or Column Access Strobe.
Cache uses SRAM: Static Random Access Memory. No refresh (6 transistors/bit vs. 1 transistor for DRAM). Size: DRAM/SRAM ~ 4-8; cost & cycle time: SRAM/DRAM ~ 8-16.

DRAM Organization
Row and column addresses are sent separately because pins and packaging are expensive, so the address bus is half width. RAS (Row Access Strobe) typically comes first; some DRAMs allow multiple CAS accesses for the same RAS (page mode).
Refresh: reads are destructive (they wipe out the data), so every read is followed by a write-back. In addition, refresh is periodic: each row must be read every 8 ms. Refresh cost is O(sqrt(capacity)), since a square array has sqrt(capacity) rows.

4 Key DRAM Timing Parameters
tRAC: minimum time from RAS falling to valid data output. This is the number quoted as the speed of a DRAM when you buy one; a typical 4 Mbit DRAM has tRAC = 60 ns.
tRC: minimum time from the start of one row access to the start of the next. tRC = 110 ns for a 4 Mbit DRAM with a tRAC of 60 ns.
tCAC: minimum time from CAS falling to valid data output. 15 ns for a 4 Mbit DRAM with a tRAC of 60 ns.
tPC: minimum time from the start of one column access to the start of the next. 35 ns for a 4 Mbit DRAM with a tRAC of 60 ns.

Example Memory Performance
Single access: a RAS delivers valid data after 60 ns, a CAS follows 15 ns later, but the next RAS must wait until the 110 ns row cycle has elapsed. The bad news: you pay the full row cycle even though data was valid after 60 ns. (Row Access: 60 ns; Row Cycle: 110 ns; Column Access: 15 ns; Column Cycle: 35 ns.)

Example Memory Performance
Multiple accesses: after one RAS, page mode allows back-to-back CAS accesses to the same open row, each delivering data in 15 ns on a 35 ns column cycle. The good news: consecutive accesses to the same row avoid the 110 ns row cycle. But refresh is a RAS...

DRAM Performance
A 60 ns (tRAC) DRAM can perform a row access only every 110 ns (tRC), and can perform a column access (tCAC) in 15 ns, but the time between column accesses is at least 35 ns (tPC). In practice, external address delays and bus turnaround push that to 40 to 50 ns. These times do not include the time to drive the addresses off the microprocessor, nor the memory-controller overhead!
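The timing parameters above make a simple back-of-the-envelope model. A minimal sketch (the function names are illustrative, not from the lecture) comparing n reads that each open a new row against n page-mode reads within one open row:

```python
# Timing parameters for the 4 Mbit, 60 ns DRAM quoted above (all in ns).
T_RAC = 60   # row access: RAS falling to valid data
T_RC = 110   # row cycle: min time between row accesses
T_CAC = 15   # column access: CAS falling to valid data
T_PC = 35    # column cycle: min time between column accesses

def random_reads_ns(n):
    """Each read opens a new row, so throughput is limited by tRC."""
    return n * T_RC

def page_mode_reads_ns(n):
    """One row access, then back-to-back column cycles in the open row."""
    return T_RAC + (n - 1) * T_PC if n else 0

print(random_reads_ns(4), page_mode_reads_ns(4))  # 440 165
```

Four random reads take 440 ns while four page-mode reads take 165 ns, which is why page mode (and refresh, which forces a RAS) matters so much.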

DRAM Trends
DRAMs: capacity +60%/yr, cost -30%/yr; 2.5X cells/area and 1.5X die size roughly every 3 years. A '98 DRAM fab line costs $2B. It is a commodity, second-source industry => high volume, low profit, conservative. Order of importance: 1) cost/bit, 2) capacity. First-generation RAMBUS: 10X bandwidth at +30% cost => little impact. Gigabit DRAM will take over the market.

Main Memory Performance
Simple: CPU, cache, bus, and memory are all the same width (32 or 64 bits).
Wide: CPU/mux is 1 word; mux/cache, bus, and memory are N words (Alpha: 64 bits & 256 bits; UltraSPARC: 512).
Interleaved: CPU, cache, and bus are 1 word; memory is N modules (e.g. 4 modules); the example is word interleaved (logically "wide").

Why not have wider memory?
Pins and packaging are expensive. The CPU accesses a word at a time, so a multiplexer is needed in the critical path. The unit of expansion is larger. ECC: you need to read the full ECC block on every write to a portion of the block.

Main Memory Performance
Timing model (word size is 32 bits): 1 cycle to send the address, 6 cycles access time, 1 cycle to send data. The cache block is 4 words.
Simple M.P. = 4 x (1 + 6 + 1) = 32
Wide M.P. = 1 + 6 + 1 = 8
Interleaved M.P. = 1 + 6 + 4x1 = 11
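The miss-penalty arithmetic above can be sketched directly (variable names are illustrative):

```python
# Miss penalty in cycles for a 4-word cache block under the timing
# model above: 1 cycle to send the address, 6 cycles of access time,
# 1 cycle to transfer a word.
ADDR, ACCESS, XFER, WORDS = 1, 6, 1, 4

simple = WORDS * (ADDR + ACCESS + XFER)       # one word at a time
wide = ADDR + ACCESS + XFER                   # whole block in one go
interleaved = ADDR + ACCESS + WORDS * XFER    # accesses overlap across
                                              # banks; transfers serialize
print(simple, wide, interleaved)  # 32 8 11
```

Interleaving gets most of the benefit of a wide memory without widening the bus.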

Main Memory Performance
Timing model (word size is 32 bits): 1 cycle to send the address, 6 cycles access time, 2 (or more) cycles to send data. The cache block is 4 words. Interleaved M.P. = 1 + 6 + 4x1 = 11. For independent reads or writes, you don't need to wait the full latency, as long as the next operation goes to a different bank.

Independent Memory Banks
Memory banks for independent accesses vs. faster sequential accesses: useful for multiprocessors, I/O, and a CPU with hit under n misses (a non-blocking cache).
Superbank: all of the memory active on one block transfer (also simply called a bank).
Bank: the portion within a superbank that is word interleaved (also called a subbank).

Independent Memory Banks
How many banks? Number of banks >= number of clock cycles to access a word in a bank. This rule holds for sequential accesses; otherwise the access stream returns to the original bank before it has the next word ready. Increasing DRAM capacity => fewer chips per system => harder to have many banks.
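The bank-count rule can be checked with a tiny simulation (a hypothetical sketch, assuming word-interleaved addressing and one new sequential access issued per clock):

```python
def sequential_conflict_free(n_banks, access_cycles, n_accesses):
    """Issue one sequential word access per clock. A bank stays busy
    for access_cycles after an access is issued. Return True if no
    access ever finds its bank still busy."""
    busy_until = [0] * n_banks
    for i in range(n_accesses):
        t = i                    # access i is issued at clock i
        bank = i % n_banks       # word-interleaved bank selection
        if t < busy_until[bank]:
            return False         # returned to the bank too soon
        busy_until[bank] = t + access_cycles
    return True

# With 6-cycle banks, 6 banks suffice but 4 do not:
print(sequential_conflict_free(6, 6, 100), sequential_conflict_free(4, 6, 100))
```

This reproduces the rule: with fewer banks than access cycles, the stream stalls on a still-busy bank.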

DRAMs per PC over Time
Number of DRAM chips needed, by minimum PC memory size and DRAM generation:

Minimum       '86    '89    '92    '96    '99    '02
Memory Size   1 Mb   4 Mb   16 Mb  64 Mb  256 Mb 1 Gb
  4 MB         32      8
  8 MB                16      4
 16 MB                        8      2
 32 MB                               4      1
 64 MB                               8      2
128 MB                                      4      1
256 MB                                      8      2

Fast Memory Systems: DRAM-specific
Multiple CAS accesses per RAS go by several names (page mode). Extended Data Out (EDO): 30% faster in page mode.
New DRAMs address the processor-memory gap. RAMBUS reinvents the DRAM interface: each chip is a module rather than a slice of memory; a short bus runs between the CPU and the chips; the chip does its own refresh; a variable amount of data is returned; 1 byte / 2 ns (500 MB/s per chip).
Synchronous DRAM: 2 banks on chip, a clock signal to the DRAM, transfers synchronous to the system clock (66 - 150 MHz).
RAMBUS was first seen as a niche (e.g. video memory), but is now poised to become a standard.

Main Memory Organization DRAM/Disk interface

Four Questions for Memory Hierarchy Designers Q1: Where can a block be placed in the upper level? (Block placement) Q2: How is a block found if it is in the upper level? (Block identification) Q3: Which block should be replaced on a miss? (Block replacement) Q4: What happens on a write? (Write strategy)

Four Questions Applied to Virtual Memory
Q1: Where can a block be placed in the upper level? (Block placement: fully associative vs. page coloring)
Q2: How is a block found if it is in the upper level? (Block identification: translation & lookup, page tables and the TLB)
Q3: Which block should be replaced on a miss? (Block replacement: random vs. LRU)
Q4: What happens on a write? (Write strategy: copy-on-write, protection, etc.)
Further questions: protection? demand-load vs. prefetch? fixed vs. variable size? unit of transfer vs. frame size? Much of this is decided in software.

Paging Organization
The page is the unit of information transferred between secondary and main storage (M). Virtual and physical address spaces are partitioned into blocks of equal size: pages (in virtual memory) and page frames (in physical memory). [Figure: the reg / cache / mem / disk hierarchy, with pages moving between disk and memory frames.]

Addresses in Virtual Memory
There are 3 addresses to consider:
Physical address: where in main memory the frame is stored.
Virtual address: a logical address, relative to a process / name space / page table.
Disk address: where on disk the page is stored. Disk addresses can be either physical (specifying cylinder, block, etc.) or indirect (another level of naming, even a file system or segment).
Virtual addresses logically include a process_id concatenated to the n-bit address.

Virtual Address Space and Physical Address Space Sizes
From the point of view of the hierarchy, the disk will have more capacity than DRAM. BUT, this does not mean the virtual address space must be bigger than the physical address space: virtual addresses also provide protection and a naming mechanism. A long, long time ago, some machines had a physical address space bigger than the virtual address space, and more core than virtual address space (multiple processes resident in memory at the same time).

Address Map
V = {0, 1, ..., n - 1} virtual address space
M = {0, 1, ..., m - 1} physical address space
MAP: V --> M U {0}, the address mapping function (n > m today; n = m and n < m have occurred historically)
MAP(a) = a' if data at virtual address a is present at physical address a' in M
MAP(a) = 0 if data at virtual address a is not present in M (a missing-item fault)
[Figure: the processor presents virtual address a to the address translation mechanism, which yields physical address a' into main memory; on a fault, the fault handler is invoked and the OS performs the transfer of the missing page from secondary memory.]

Paging Organization
Virtual memory (V.A.) is divided into 1K pages: page 0 at address 0, page 1 at 1024, ..., page 31 at 31744. Physical memory (P.A.) is divided into 1K frames: frame 0 at address 0, frame 1 at 1024, ..., frame 7 at 7168. The page is both the unit of mapping and the unit of transfer from virtual to physical memory.
Address mapping: a virtual address splits into a page number and a 10-bit displacement. The page number indexes the page table, which is itself located in physical memory (found via the Page Table Base Register); each entry holds access rights, a valid bit, and the physical frame address. The physical memory address is formed from the frame address plus the displacement; actually, concatenation is more likely than addition.
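The translation above can be sketched in a few lines, using the 1K page size from the slide (the page-table contents here are made up for illustration):

```python
PAGE_BITS = 10
PAGE_SIZE = 1 << PAGE_BITS   # 1K pages, as in the slide

# Toy page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 31: 0}

def translate(va):
    vpn = va >> PAGE_BITS           # virtual page number
    disp = va & (PAGE_SIZE - 1)     # 10-bit displacement within the page
    frame = page_table.get(vpn)
    if frame is None:
        raise KeyError("page fault at VA %d" % va)
    # Concatenation (shift and OR), not addition, as noted above.
    return (frame << PAGE_BITS) | disp

print(translate(1024 + 5))   # page 1 -> frame 3: 3*1024 + 5 = 3077
```

A missing entry corresponds to a page fault, which the OS would service by loading the page from disk.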

Virtual Address and a Cache (Virtually Addressed Caches Revisited)
[Figure: the CPU issues a VA, translation produces a PA, the cache is accessed with the PA, and on a miss the PA goes to main memory.] It takes an extra memory access to translate a VA to a PA. This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.
ASIDE: Why access the cache with the PA at all? VA caches have a problem! The synonym / alias problem: two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address! On an update, you must update all cache entries with the same physical address, or memory becomes inconsistent. Determining this requires significant hardware, essentially an associative lookup on the physical address tags to see if you have multiple hits. Alternatively, software can enforce an alias boundary: aliases must agree in their low-order address bits (of both VA and PA) up to the cache size.
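A small numeric sketch of the synonym problem (the cache and page parameters are assumed, not from the lecture): in a virtually-indexed cache that is larger than the page size, some index bits come from the virtual page number, so two virtual aliases of the same physical line can land in different sets.

```python
PAGE = 4 * 1024     # assumed 4 KB page size
LINE = 64           # assumed 64-byte cache lines
SETS = 256          # 256 sets -> 16 KB direct-mapped cache
CACHE = SETS * LINE
assert CACHE > PAGE  # index bits extend above the page offset

def cache_set(addr):
    """Set index for a virtually-indexed cache."""
    return (addr // LINE) % SETS

# Two virtual aliases of one physical address, one page apart
# (a common way synonyms arise):
va1 = 0x10040
va2 = va1 + PAGE

print(cache_set(va1), cache_set(va2))  # different sets: two stale copies
```

With a cache no larger than the page size, all index bits fall within the page offset, both aliases index the same set, and the synonym problem disappears; this is exactly the software-enforced alias boundary mentioned above.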

TLBs
A way to speed up translation is to use a special cache of recently used page table entries. This has many names, but the most frequently used is Translation Lookaside Buffer, or TLB. A TLB entry holds: virtual address, physical address, and dirty, ref, valid, and access bits. It is really just a cache on the page-table mappings; TLB access time is comparable to cache access time (much less than main memory access time).